

8.10. Hypothesis Testing

The goal of hypothesis testing is to measure how likely or unlikely the outcome of an experiment is under the assumption of a certain hypothesis H0 [54]. To quantify the result of a statistical test, a scalar quantity called the test statistic λ is defined. In the context of the likelihood method (see discussion in section 8.2), the test statistic can be defined as

λ = 2 log( L(n̂s, γ̂) / L(0) ), (8.31)

where L(n̂s, γ̂) is the likelihood maximum and L(0) is the likelihood function evaluated under the assumption of the hypothesis to be tested (H0). In the context of this analysis, H0 always describes the background-only hypothesis: the data set does not contain any signal from a point source⁶.

The more consistent the outcome of the likelihood maximization L(n̂s, γ̂) is with the null hypothesis H0, the closer the value of λ is to zero. Due to statistical fluctuations of real data, even if H0 is true, the λ distribution will not result in a δ-peak at λ = 0 only, but in a distribution around it.

⁶ If ns = 0, the value of the spectral index γ is degenerate and thus not of interest.


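As an illustration of this construction, the following is a minimal toy sketch in Python. It is not the analysis code of this work: the event model (a Gaussian signal on a uniform background in one dimension, with ns as the only fit parameter) and all names are simplifying assumptions; only the structure λ = 2 log( L(n̂s) / L(0) ) with the constraint n̂s ≥ 0 mirrors the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Toy model (assumption, not the analysis of this work): N events on [-5, 5],
# uniform background, Gaussian signal around a fixed source position.
rng = np.random.default_rng(42)
N = 1000
events = rng.uniform(-5.0, 5.0, size=N)  # background-only toy data

def neg_log_likelihood(ns, x, src=0.0, sigma=0.5):
    """-log L(ns) for a mixture of ns signal-like and N - ns background-like events."""
    signal_pdf = norm.pdf(x, loc=src, scale=sigma)
    background_pdf = 1.0 / 10.0  # uniform density on [-5, 5]
    return -np.sum(np.log(ns / len(x) * signal_pdf + (1.0 - ns / len(x)) * background_pdf))

# Maximize L(ns) under the constraint ns >= 0
res = minimize_scalar(neg_log_likelihood, bounds=(0.0, N), args=(events,), method="bounded")

# Test statistic: lambda = 2 log( L(ns_hat) / L(0) ); under-fluctuations pile up near 0
lam = max(2.0 * (neg_log_likelihood(0.0, events) - res.fun), 0.0)
print(f"ns_hat = {res.x:.2f}, lambda = {lam:.3f}")
```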

p-Values

The p-value is a scalar which quantifies the consistency of an experimental outcome with a hypothesis H0. If λexp is the experimental outcome and H0(λ) is the distribution of the background test statistic, the measured p-value is defined as

p = 1 − ∫_{0}^{λexp} H0(λ) dλ.

The p-value is the probability of obtaining an experimental result which is equally or more inconsistent with the expectation from the null hypothesis H0, under the assumption that H0 is true. The smaller the p-value, the larger the inconsistency.

The p-value is also referred to as the significance. Significance can also be measured in units of standard deviations σ of a normal distribution. The typical requirement for a discovery in particle physics is 5σ, which corresponds to a p-value of roughly 6·10⁻⁷. This requirement means that the null hypothesis H0 is discarded only if the measured outcome differs so strongly from the expectation of H0 that a result as extreme as the measured one, or an even more extreme one, occurs through fluctuations of H0 in only about one out of ten million trials. At this point, one might consider an alternative hypothesis H1 and claim a discovery.
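A quick way to convert between p-values and significances in units of σ is the standard normal distribution, for example via scipy. This is a minimal sketch; the two-sided convention is assumed here because it reproduces the quoted ~6·10⁻⁷ for 5σ.

```python
from scipy.stats import norm

def sigma_to_pvalue(n_sigma: float) -> float:
    """p-value for an n-sigma significance (two-sided convention)."""
    return 2.0 * norm.sf(n_sigma)  # sf = 1 - cdf (survival function)

def pvalue_to_sigma(p: float) -> float:
    """Significance in sigma for a given p-value (two-sided convention)."""
    return norm.isf(p / 2.0)

print(sigma_to_pvalue(5.0))    # ~5.7e-07, the "roughly 6e-7" quoted above
print(pvalue_to_sigma(5.7e-7)) # ~5.0
```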

Hypothesis Testing

In hypothesis testing, one wants to test two hypotheses against each other. H0 is called the null hypothesis and typically corresponds to the established model, while the alternative hypothesis H1 typically incorporates H0 plus an additional, so far unknown component. This unknown component can be a new resonance in a spectrum or a neutrino point source. In the context of this work, H0 is also called the background hypothesis and H1 the signal hypothesis⁷.

⁷ The signal hypothesis contains the background hypothesis plus a point source signal on top.

Before the experiment is performed, the required significance for discarding H0 is selected. This could be 90%, 5σ, or any other value; it has to be chosen before the experiment is actually performed. The chosen significance level then corresponds to a threshold value λthres for the test statistic, see equation 8.31. Thus, if the experimental outcome λexp is below the threshold value λthres, the null hypothesis is accepted and H1 is rejected.

If λexp > λthres, the null hypothesis H0 is rejected and the alternative hypothesis H1 is accepted. When testing the null hypothesis H0 against the signal hypothesis H1, there are four scenarios:

            accept H0        accept H1
H0 true     ✓                Error Type I
H1 true     Error Type II    ✓

Table 8.1.: Possible outcomes of a hypothesis test involving two hypotheses. Check marks denote the correct decision.

Correct Hypothesis Selected

The hypothesis test identified the correct hypothesis. This is the ideal case and is indicated by the check marks in table 8.1.

Error Type I An error of type I, also called a false positive, is the rejection of the null hypothesis H0 even though it is true. The chance α for an error of type I is given by

α = ∫_{λthres}^{∞} H0(λ) dλ

and is identical to the p-value evaluated at λthres.

Error Type II An error of type II, also called a false negative, is the rejection of the signal hypothesis H1 even though it is true. Its chance β is given by

β = ∫_{0}^{λthres} H1(λ) dλ.
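Given samples of the test statistic under both hypotheses (for example from toy experiments), both error rates can be estimated as simple tail fractions. This is a minimal sketch, where the arrays of λ values are assumed to come from background-only and signal-injected trials.

```python
import numpy as np

def type_1_error(lambda_bkg: np.ndarray, lambda_thres: float) -> float:
    """alpha: fraction of background-only trials with lambda above the threshold."""
    return float(np.mean(lambda_bkg > lambda_thres))

def type_2_error(lambda_sig: np.ndarray, lambda_thres: float) -> float:
    """beta: fraction of signal trials with lambda at or below the threshold."""
    return float(np.mean(lambda_sig <= lambda_thres))

# Usage with hypothetical trial results:
# alpha = type_1_error(lambda_bkg_trials, lambda_thres)
# beta = type_2_error(lambda_sig_trials, lambda_thres)
```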

Choice of Test Statistic Threshold

The selection of λthres is a compromise between an error of type I and an error of type II. If λthres is set to a large value, the chance of falsely claiming a discovery is very low, but the chance to identify H1 is very low, too. In contrast, the choice of a small λthres leads to more likely discoveries, but also to more false claims caused merely by background fluctuations. There is no optimal choice of λthres since it is a matter of statistical interpretation. A smaller value of the type I error α is often called conservative. In the context of potential discoveries and fundamental claims, analyses are often designed to be more on the conservative side.

Characterizing the Performance of an Analysis

Studying both the type I and type II errors of an analysis gives a measure of its strength. Typically this is done by computing the required signal strength that leads to a signal hypothesis H1 which fulfills certain choices of α and β.

Sensitivity

In IceCube, the sensitivity is defined as the required flux for which α = 0.5 and β = 0.1. This is also called the median sensitivity at 90% confidence level. For this work, the sensitivity is the main quantity to describe the performance. The sensitivity is given in a unit of signal strength; this can be the neutrino flux, the neutrino fluence, or a similar unit.

Discovery Potential

The discovery potential is defined as the required flux for a type I error of 5σ in 50% of the experiments (β = 0.5). Compared to the sensitivity, the discovery potential is much more sensitive to the background distribution. Also, the flux required to fulfill the discovery potential requirement is typically larger than for the sensitivity. A practical challenge when computing the discovery potential is that the background test statistic distribution H0(λ) has to be known out to the 5σ regime. If generated by simulation, this requires about 10 million test experiments, which can be computationally challenging. Figure 8.6 illustrates both sensitivity and discovery potential.

Figure 8.6.: Schematic plot illustrating the definitions of sensitivity and discovery potential. The black line is the background distribution H0(λ); the red and blue lines are the signal distributions H1(λ) for sensitivity and discovery potential.
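The order of magnitude of the required number of trials follows directly from the 5σ p-value itself. A back-of-the-envelope sketch; the two-sided convention and the factor of a few for populating the tail are assumptions for illustration.

```python
from scipy.stats import norm

p_5sigma = 2 * norm.sf(5.0)            # ~5.7e-7 (two-sided convention, as above)
trials_per_tail_entry = 1 / p_5sigma   # ~1.7 million trials per trial beyond 5 sigma

# Populating the tail with a handful of trials instead of a single one
# leads to the roughly 10 million test experiments quoted in the text.
print(f"{trials_per_tail_entry:.2e}")
```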

Computing Sensitivity The sensitivity is estimated from simulation. First, several hundred simulations with only background events are performed. The likelihood function is maximized on each of these data sets and the resulting test statistic values λ are stored. They are then ordered. This leads to the inverse cumulative distribution of H0,

1 − ∫_{0}^{λ} H0(λ′) dλ′,

where H0(λ) is the background test statistic distribution. To compute the sensitivity, the median of the distribution H0(λ) is computed. Since the likelihood maximization is restricted to ns ≥ 0, any under-fluctuation is fitted to ns = 0 and thus there is a pile-up at λ = 0, so the median of the background is typically zero or close to it.
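A condensed sketch of this step, assuming a hypothetical function run_background_trial() that returns the fitted test statistic of one background-only toy experiment; the stand-in distribution inside it (a 50/50 mixture of zero and a χ² with 2 degrees of freedom) is only a common approximation of H0(λ), not the result of this work.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_background_trial() -> float:
    """Stand-in for a full background-only likelihood fit. Lambda is drawn from
    a 50/50 mixture of zero (the pile-up from fits with ns_hat = 0) and a
    chi-square with 2 degrees of freedom; in the real analysis this would be
    the maximized test statistic of one background-only data set."""
    return 0.0 if rng.random() < 0.5 else rng.chisquare(df=2)

n_trials = 500  # "several hundred" background-only simulations
lambda_bkg = np.sort([run_background_trial() for _ in range(n_trials)])

# Median of H0(lambda): typically zero or close to it due to the pile-up at 0
lambda_median = np.median(lambda_bkg)

# Inverse cumulative distribution 1 - CDF, evaluated at the ordered lambdas
inv_cdf = 1.0 - np.arange(1, n_trials + 1) / n_trials
```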

Then the procedure is repeated with additional signal events from a simulated point source with a certain strength i. This generates the distribution Hi(λ), where the subscript i denotes the injected flux of strength i in arbitrary units. The integral

Pdetect = ∫_{λmedian}^{∞} Hi(λ′) dλ′ (8.32)

is computed. It describes the chance to measure a test statistic value λ > λmedian. Several scenarios with different signal strengths i are computed, and the injected signal strength is plotted against the corresponding outcome of equation 8.32. The distribution is interpolated to find the required flux to be above the threshold in 90% of the cases.

Figure 8.7.: Interpolation to find the sensitivity flux: the chance for λ above the threshold versus the injected signal flux strength. Three interpolation methods are compared: a linear interpolation, a polynomial fit, and the fit function f(x) = (1 − e^{−ax})/2 + 1/2.

Figure 8.7 shows the process of interpolation to find the required sensitivity flux. Several interpolations are performed for testing: a linear interpolation, a polynomial fit, and a dedicated fit function f(x) = (1 − e^{−ax})/2 + 1/2. Computing the sensitivity requires many simulations with various signal strengths and is computationally very demanding. Therefore, the discovery potential has not been computed in this thesis, since it would have required even more computational effort. The limit for claiming a discovery has been set to 5σ, consistent with the standard in particle physics. For performance studies, the sensitivity has been used to quantify the analysis performance instead of the discovery potential.
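A minimal sketch of this interpolation step, assuming arrays of injected strengths and the corresponding detection chances from equation 8.32; scipy's curve_fit and brentq are used here, and the example data points are purely illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit, brentq

def f(x, a):
    """Dedicated fit function from the text: f(0) = 0.5 (background median), f -> 1."""
    return 0.5 * (1.0 - np.exp(-a * x)) + 0.5

# Illustrative inputs: injected signal strengths i and the detection chance
# P_detect(i) from equation 8.32 (in a real run these come from the toy trials)
strengths = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5])
p_detect = np.array([0.50, 0.62, 0.71, 0.79, 0.85, 0.89, 0.93])

# Fit the smooth curve and solve f(x) = 0.9 for the sensitivity flux
(a_hat,), _ = curve_fit(f, strengths, p_detect, p0=[1.0])
sensitivity_flux = brentq(lambda x: f(x, a_hat) - 0.9, 0.0, 10.0)
print(f"sensitivity flux ~ {sensitivity_flux:.2f} (arbitrary units)")
```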