Exercises for Moderne Methoden der Datenanalyse (Modern Methods of Data Analysis)
Exercise 5: Estimation of upper limits and hypothesis testing


Institut für Experimentelle Kernphysik (EKP)
Prof. Dr. M. Feindt, Dr. T. Kuhr

M. Röhrken, B. Kronenbitter, Dr. A. Zupanc

25. November 2010


In this exercise we consider Poisson processes, in which the measured number of detected events is distributed according to the probability function:

$$ P(n|\mu_t) = \frac{\mu_t^{n} e^{-\mu_t}}{n!}, \tag{1} $$

where n is the number of detected events and µ_t the true (or expected) number of events.

In the presence of signal and background, the expected number of events is µ_t = µ_t,S + µ_t,B.
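As a small illustration of Eq. (1), the probability function can be evaluated as in the sketch below. The function name `poisson_pmf` and the numerical values of the signal and background expectations are assumptions made for this example only; the sheet itself refers to ROOT, while plain Python is used here.

```python
import math

def poisson_pmf(n: int, mu: float) -> float:
    """P(n | mu) = mu^n e^{-mu} / n!, evaluated in log space for stability."""
    if mu == 0.0:
        # With zero expectation only n = 0 can occur.
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

# Illustrative (made-up) expectations: signal and background simply add.
mu_s, mu_b = 2.0, 1.0
mu_t = mu_s + mu_b

# Probabilities of observing n = 0..9 events for this expectation.
probs = [poisson_pmf(n, mu_t) for n in range(10)]
```

Working with logarithms avoids overflow of µ_t^n and n! when n becomes large.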

• Exercise 5.1: Classical (frequentist) approach

Let us assume the expected number of background events to be negligible: µ_t,B = 0.

Using the classical approach, we will compute a 68% confidence interval on µ_t = µ_t,S if the measured number of events is n₀ = 3.

– Find µ₁ < n₀ and µ₂ > n₀ such that:

$$ \sum_{n=n_0}^{\infty} P(n|\mu_1) = 0.16, \tag{2} $$

$$ \sum_{n=0}^{n_0} P(n|\mu_2) = 0.16. \tag{3} $$

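– Compute the one-sided 90% confidence level upper limit and lower limit on µ_t,S. The strategy is to use only one of the two formulas above for each limit, with the right-hand side set to 0.10: Eq. (3) for the upper limit and Eq. (2) for the lower limit.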

– Compare to table 32.3 in PDG (http://pdg.lbl.gov/2008/reviews/statrpp.pdf).
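One possible way to solve Eqs. (2) and (3) numerically is a simple bisection on µ, sketched below. The helper names (`poisson_cdf`, `bisect`, `lower_limit`, `upper_limit`) and the search brackets are assumptions of this sketch, not part of the exercise.

```python
import math

def poisson_cdf(n0: int, mu: float) -> float:
    """sum_{n=0}^{n0} P(n | mu); returns 0 for n0 < 0."""
    if n0 < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for n in range(1, n0 + 1):
        term *= mu / n
        total += term
    return total

def bisect(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def lower_limit(n0: int, alpha: float) -> float:
    """Eq. (2): sum_{n >= n0} P(n | mu1) = alpha, searched in (0, n0)."""
    if n0 == 0:
        return 0.0  # no solution for n0 = 0; the lower limit is 0
    return bisect(lambda mu: 1.0 - poisson_cdf(n0 - 1, mu) - alpha, 1e-9, float(n0))

def upper_limit(n0: int, alpha: float) -> float:
    """Eq. (3): sum_{n <= n0} P(n | mu2) = alpha, searched in (n0, n0 + 50)."""
    return bisect(lambda mu: poisson_cdf(n0, mu) - alpha, float(n0), n0 + 50.0)

n0 = 3
mu1, mu2 = lower_limit(n0, 0.16), upper_limit(n0, 0.16)      # 68% interval
lo90, up90 = lower_limit(n0, 0.10), upper_limit(n0, 0.10)    # one-sided 90% limits
```

The 90% values obtained for n₀ = 3 can be compared directly with the corresponding row of the PDG table.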


• Exercise 5.2: Likelihood approach

The likelihood function for a Poisson process, supposing one single measurement, is:

$$ L(n_0|\mu_t) = \frac{\mu_t^{n_0} e^{-\mu_t}}{n_0!}, \tag{4} $$

where n₀ is the number of measured events.

– Draw the −2 lnL curve as a function of µ_t, performing a scan over a meaningful range of values. Where is the minimum −2 lnL_min of this curve?

– The boundaries of the 68% confidence level interval correspond to the points where 2·∆ lnL = 2·(−lnL + lnL_min) is 1. Where are they?

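– The 90% confidence level upper limit corresponds to the point with µ_t > n₀ where 2·∆ lnL = s², with s the number of Gaussian standard deviations that corresponds to a one-sided 90% confidence level. What is the upper limit in this case?

To translate a one-sided CL into s, and from there into the proper ∆ lnL, you can use ROOT:

$$ s = \sqrt{2} \cdot \mathtt{TMath::ErfInverse}(2 \cdot CL - 1), \qquad 2 \cdot \Delta \ln L = s^2. \tag{5} $$

Check that for CL = 0.90, s = 1.28 and therefore 2·∆ lnL = 1.64. For more details see these lectures http://www.hef.kun.nl/~wes/stat_course/statist_2002.pdf, in particular chapter 8.4.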
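The scan described in this exercise could look roughly like the sketch below, which evaluates −2 lnL from Eq. (4) on a grid, locates the minimum, and interpolates the points where the curve rises by a given amount above it. The function names, the scan range and the step size are assumptions of this sketch. The last line evaluates the expression √2·ErfInverse(2·CL − 1) of Eq. (5) through the equivalent inverse Gaussian CDF from the Python standard library instead of ROOT.

```python
import math
from statistics import NormalDist

def nll2(n0: int, mu: float) -> float:
    """-2 ln L(n0 | mu) for a single Poisson measurement, Eq. (4)."""
    return -2.0 * (n0 * math.log(mu) - mu - math.lgamma(n0 + 1))

def scan(n0: int, mu_min: float, mu_max: float, steps: int):
    """Evaluate -2 ln L on a regular grid of mu values."""
    mus = [mu_min + i * (mu_max - mu_min) / steps for i in range(steps + 1)]
    return mus, [nll2(n0, mu) for mu in mus]

def crossings(mus, values, threshold: float):
    """Linearly interpolated mu where (values - min(values)) crosses threshold."""
    v_min = min(values)
    found = []
    for i in range(len(mus) - 1):
        a = values[i] - v_min - threshold
        b = values[i + 1] - v_min - threshold
        if a * b < 0:
            found.append(mus[i] + (mus[i + 1] - mus[i]) * a / (a - b))
    return found

n0 = 3
mus, values = scan(n0, 0.05, 15.0, 2990)      # grid step of 0.005
mu_hat = mus[values.index(min(values))]       # position of the minimum
interval_68 = crossings(mus, values, 1.0)     # points where the rise equals 1

# sqrt(2) * erfinv(2*CL - 1) is the inverse standard normal CDF evaluated at CL.
z_90 = NormalDist().inv_cdf(0.90)
```

The same `crossings` call with a different threshold gives the point needed for the upper limit; only the crossing with µ_t > n₀ is relevant there.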

• Exercise 5.3: Bayesian approach

The Bayesian posterior probability P(µ_t|n₀) is given by Bayes' theorem:

$$ P(\mu_t|n_0) = \frac{L(n_0|\mu_t)\, P(\mu_t)}{\int_{\text{all } \mu_t} L(n_0|\mu_t)\, P(\mu_t)\, d\mu_t}. \tag{6} $$

P(µ_t) is called the prior probability on µ_t and describes our prior belief about the distribution of this parameter. We'll try 2 priors:

– P(µ_t) constant for µ_t > 0 and zero otherwise,

– P(µ_t) proportional to 1/µ_t for µ_t > 0 and zero otherwise.

Now:

– Compute and draw the posterior probability in both cases.

– What are the 90% credibility upper and lower limits with this method (with each of the 2 prior distributions)?

Finally: compare the upper and lower limits obtained with the 3 methods.
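A minimal numerical sketch of Eq. (6) is given below. It assumes that both priors can be written as µ_t raised to a power (0 for the constant prior, −1 for the 1/µ_t prior), that the integral can be truncated at a large `mu_max`, and that the 90% upper (lower) limit is read as the point below (above) which 90% of the posterior lies. These choices, like the function names, are assumptions of this sketch.

```python
import math

def posterior_grid(n0: int, prior_power: int, mu_max: float = 60.0, steps: int = 60000):
    """Normalised posterior density on a midpoint grid.

    The prior is taken proportional to mu**prior_power for mu > 0.
    Constant factors such as 1/n0! cancel in the normalisation of Eq. (6).
    Note: for n0 = 0 the 1/mu prior gives a non-normalisable posterior.
    """
    h = mu_max / steps
    mus = [(i + 0.5) * h for i in range(steps)]
    dens = [math.exp((n0 + prior_power) * math.log(mu) - mu) for mu in mus]
    norm = sum(dens) * h
    return mus, [d / norm for d in dens], h

def quantile(mus, dens, h: float, q: float) -> float:
    """Smallest grid edge at which the integrated posterior reaches q."""
    acc = 0.0
    for mu, d in zip(mus, dens):
        acc += d * h
        if acc >= q:
            return mu + 0.5 * h
    return mus[-1] + 0.5 * h

n0 = 3
limits = {}
for name, power in (("flat", 0), ("1/mu", -1)):
    mus, dens, h = posterior_grid(n0, power)
    # (lower limit, upper limit): 10% and 90% quantiles of the posterior
    limits[name] = (quantile(mus, dens, h, 0.10), quantile(mus, dens, h, 0.90))
```

The lists `mus` and `dens` can be passed to any plotting tool to draw the two posterior curves.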

• Exercise 5.4: Classical upper limits in presence of background

Now µ_t,B is not negligible anymore.


– Compute with the classical method the 90% confidence level upper limits on µ_t,S as a function of µ_t,B. The convention is that one subtracts the number of background events from the limit on µ_t,S obtained with no background events.

What is the drawback of this procedure?

– Make a plot of those limits for n₀ = 0, n₀ = 1, n₀ = 2, ... You can draw one curve for every value of n₀ in the same canvas.

– Normalize CL_SB by CL_B and make the plot again using CL_S instead of CL_SB, as was done in exercise 1. CL_SB and CL_B are defined below. CL_SB measures the compatibility of the experiment with the signal-plus-background hypothesis, while CL_B measures the compatibility with the background-only hypothesis.

$$ CL_{SB} = \sum_{n=0}^{n_0} P(n|\mu_{t,S}+\mu_{t,B}), \tag{7} $$

$$ CL_{B} = \sum_{n=0}^{n_0} P(n|\mu_{t,B}), \tag{8} $$

$$ CL_{S} = CL_{SB}/CL_{B}. \tag{9} $$
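The two procedures of this exercise can be sketched as follows: the background-subtracted limit uses the no-background 90% limit minus µ_t,B, and the CL_S limit solves CL_S = 0.10 with Eqs. (7) to (9). All function names, the search bracket and the grid of background values are assumptions of this sketch; plotting is left to the tool of your choice.

```python
import math

def poisson_cdf(n0: int, mu: float) -> float:
    """sum_{n=0}^{n0} P(n | mu)."""
    term = math.exp(-mu)
    total = term
    for n in range(1, n0 + 1):
        term *= mu / n
        total += term
    return total

def bisect(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def ul_no_background(n0: int, cl: float = 0.90) -> float:
    """Classical upper limit for mu_B = 0: sum_{n <= n0} P(n | mu) = 1 - cl."""
    return bisect(lambda mu: poisson_cdf(n0, mu) - (1.0 - cl), 0.0, n0 + 50.0)

def ul_subtracted(n0: int, mu_b: float, cl: float = 0.90) -> float:
    """Convention from the text: no-background limit minus expected background."""
    return ul_no_background(n0, cl) - mu_b

def cl_s(n0: int, mu_s: float, mu_b: float) -> float:
    """Eq. (9): CL_S = CL_SB / CL_B."""
    return poisson_cdf(n0, mu_s + mu_b) / poisson_cdf(n0, mu_b)

def ul_cls(n0: int, mu_b: float, cl: float = 0.90) -> float:
    """Signal expectation at which CL_S drops to 1 - cl."""
    return bisect(lambda mu_s: cl_s(n0, mu_s, mu_b) - (1.0 - cl), 0.0, n0 + 50.0)

# One curve per n0, sampled on a grid of background expectations.
backgrounds = [0.5 * i for i in range(21)]          # mu_B from 0 to 10
curves_subtracted = {n0: [ul_subtracted(n0, b) for b in backgrounds] for n0 in range(4)}
curves_cls = {n0: [ul_cls(n0, b) for b in backgrounds] for n0 in range(4)}
```

Comparing the two sets of curves at large µ_t,B and small n₀ is a good starting point for the question about the subtraction procedure.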

• Exercise 5.5: Signal significance

We know the signal is expected to be µ_t,S = 15 and the background µ_t,B = 40, while the measurement is still n₀ = 56.

– What is the probability to measure n₀ or fewer events if you expect only background?

– What is the probability to measure more than n₀ events if you expect signal and background?

– What is the corresponding significance (i.e. the “number of sigmas” of a Gaussian distribution corresponding to this probability)? You can use the formula

$$ p = \int_{s}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt. $$

ROOT also offers the function Erf in the TMath namespace.

– One among the many significance estimators is the so-called S_L2, which has many desirable features. It is defined as:

$$ S_{L2} = \sqrt{2 \ln Q}, \qquad Q = L_{SB}/L_{B}, \tag{10} $$

where L_SB and L_B are the likelihoods in the signal+background and background-only hypotheses. That is, both are calculated using the same dataset, but with the signal+background and background-only models respectively.
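The quantities asked for in this exercise can be computed along the lines of the sketch below. It uses the Python standard library in place of ROOT's TMath functions, and it assumes that the significance is derived from the background-only tail probability P(n > n₀ | µ_t,B); that choice, like the function names, is an assumption of this sketch rather than something fixed by the text.

```python
import math
from statistics import NormalDist

def poisson_cdf(n0: int, mu: float) -> float:
    """sum_{n=0}^{n0} P(n | mu)."""
    term = math.exp(-mu)
    total = term
    for n in range(1, n0 + 1):
        term *= mu / n
        total += term
    return total

def p_from_sigma(s: float) -> float:
    """One-sided Gaussian tail: p = integral_s^inf of the unit normal density."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))

def sigma_from_p(p: float) -> float:
    """Inverse of p_from_sigma: number of sigmas for a one-sided p-value."""
    return NormalDist().inv_cdf(1.0 - p)

def s_l2(n0: int, mu_s: float, mu_b: float) -> float:
    """Eq. (10) for one Poisson measurement: ln Q = n0 ln((s+b)/b) - s."""
    ln_q = n0 * math.log((mu_s + mu_b) / mu_b) - mu_s
    return math.sqrt(2.0 * ln_q)

mu_s, mu_b, n0 = 15.0, 40.0, 56

p_le_b = poisson_cdf(n0, mu_b)                  # P(n <= n0 | background only)
p_gt_sb = 1.0 - poisson_cdf(n0, mu_s + mu_b)    # P(n > n0 | signal + background)

# Assumption: significance from the background-only upper tail P(n > n0 | B).
z = sigma_from_p(1.0 - p_le_b)
significance_l2 = s_l2(n0, mu_s, mu_b)
```

In the likelihood ratio the factor n₀! cancels, which is why `s_l2` needs only the two expectations and the observed count.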