
Systematic Uncertainties and Statistical Tools

7.2. Statistical Tools

7.2.1. Profile Likelihood Ratio

In order to test for the presence of the $t\bar{t}H$ signal, a binned profile likelihood fit is performed simultaneously in all the considered analysis regions, using the distributions of $H_\mathrm{T}^{\mathrm{had}}$ in the control regions and of the MVA discriminant in the signal regions.

Each bin of each distribution in each of the regions has an expected number of events given by the equation:

\[
E_{ij} = \mu s_{ij} + b_{ij} \tag{7.2}
\]

where $s_{ij}$ and $b_{ij}$ represent the number of expected events associated to either signal or background processes in the bin $i$ of the histogram $j$. Since the data follow a Poisson distribution around the number of expected events, it is possible to define a purely statistical binned likelihood function $L(\mu)$ as the product of the Poisson probability terms over each bin of each of the considered distributions:


\[
L(\mu) = \prod_{j=1}^{9} \prod_{i=1}^{n_\mathrm{bins}(j)} \frac{(E_{ij})^{n_{ij}}}{n_{ij}!}\, e^{-E_{ij}}, \tag{7.3}
\]

where $n_\mathrm{bins}(j)$ is the number of bins of the histogram $j$ and $n_{ij}$ is the observed number of events in the bin $i$ of the histogram $j$. The free parameter $\mu$ is estimated by maximising $L(\mu)$, or equivalently minimising its negative logarithm, in a fit procedure. The error on the best estimate of $\mu$ is obtained through a scan of the values of the likelihood as a function of $\mu$.
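The maximisation of a binned Poisson likelihood of the form of Eq. (7.3) can be sketched in a few lines. The two-bin templates below are hypothetical numbers chosen for illustration, not the inputs of this analysis:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

# Hypothetical two-bin templates (not the analysis' real inputs)
s = np.array([5.0, 12.0])   # expected signal per bin, s_ij
b = np.array([80.0, 40.0])  # expected background per bin, b_ij
n = np.array([86.0, 52.0])  # observed counts, n_ij

def nll(mu):
    """Negative log of Eq. (7.3): sum over bins of -log Poisson(n | mu*s + b)."""
    e = mu * s + b
    return np.sum(e - n * np.log(e) + gammaln(n + 1.0))

# Maximise L(mu) by minimising -log L(mu)
res = minimize_scalar(nll, bounds=(-1.0, 5.0), method="bounded")
mu_hat = res.x
print(f"best-fit mu = {mu_hat:.3f}")
```

Since the observed counts here were chosen close to the $\mu = 1$ expectation, the fit returns a signal strength near one.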

The $1\sigma$ band is set by finding the points at which the logarithm of the likelihood decreases by 0.5 with respect to its maximum (equivalently, where $-2\log L$ increases by one).
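The likelihood scan for the $1\sigma$ band can be sketched as follows, assuming a hypothetical single-bin counting experiment:

```python
import math
from scipy.optimize import minimize_scalar, brentq

# Hypothetical single-bin counting experiment (illustration only)
s, b, n = 10.0, 50.0, 62.0  # expected signal, background, observed events

def nll(mu):
    """-log L up to a constant, for a single Poisson bin with E = mu*s + b."""
    e = mu * s + b
    return e - n * math.log(e)

mu_hat = minimize_scalar(nll, bounds=(-3.0, 6.0), method="bounded").x
nll_min = nll(mu_hat)

# 1-sigma interval: the points where -log L rises by 0.5 above its minimum
lo = brentq(lambda m: nll(m) - nll_min - 0.5, -3.0, mu_hat)
hi = brentq(lambda m: nll(m) - nll_min - 0.5, mu_hat, 6.0)
print(f"mu = {mu_hat:.2f} +{hi - mu_hat:.2f} -{mu_hat - lo:.2f}")
```

For this model the maximum-likelihood estimate is analytic, $\hat{\mu} = (n - b)/s = 1.2$, and the scan returns a slightly asymmetric interval around it.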

In real cases, the expected number of events for signal and background processes is affected by both statistical and systematic uncertainties. The $k$ systematic uncertainties are considered directly in the definition of the likelihood, through a collection of $k$ continuous parameters $\theta_k$, referred to as Nuisance Parameters (NPs). By varying the values of the NPs, one changes both the shape and the normalisation of the predictions, so $s_{ij}$ and $b_{ij}$ also depend on the NPs, referred to in the following as $\theta$. By maximising the likelihood, the values of $\theta$ that best improve the agreement between the expected and observed numbers of events are found. The NPs are inserted in the definition of the likelihood through their probability density functions $\rho(\theta)$:

\[
L(\mu, \theta) = L(\mu) \prod_{k} \rho(\theta_k) \tag{7.4}
\]

The $\rho(\theta)$ are also referred to as penalty terms or prior distributions on $\theta$. The assumed functional form of the priors depends on the considered nuisance parameter. Three different types are used in this analysis [144]:

• Gaussian prior distribution: this is the assumed shape for most of the NPs. The associated function is:

\[
\rho(\theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left( -\frac{(\theta_k - \bar{\theta}_k)^2}{2\sigma_k^2} \right) \tag{7.5}
\]

where the central value $\bar{\theta}_k$ is the measured value of a certain systematic variation and $\sigma_k$ is the uncertainty associated to it. The usage of a Gaussian distribution prevents the fit from preferring very large deviations from the measured value in the minimisation procedure.

• Log-normal prior distribution: this shape is used for those NPs associated to quantities that are required to be positive, such as normalisations. The associated function is:

\[
\rho(\theta_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left( -\frac{\left[\log(\theta_k/\bar{\theta}_k)\right]^2}{2\left(\log(\sigma_k)\right)^2} \right) \frac{1}{\theta_k} \tag{7.6}
\]

• Gamma prior distribution: the gamma distribution is associated to the NPs which are introduced to take into account the statistical uncertainty on the number of selected MC events. It takes the form:

\[
\rho(\theta_k) = \frac{A}{\Gamma(B)} (A\theta_k)^B e^{-A\theta_k} \tag{7.7}
\]

where $A = (1/\sigma_k^\mathrm{rel})^2$, $\sigma_k^\mathrm{rel}$ is the relative statistical uncertainty of the considered bin, and $B = N - 1$, with $N$ the bin content rounded to the nearest integer.
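All three prior shapes are standard distributions and can be evaluated with `scipy.stats`; the parameter values below are made up for illustration. Note that scipy's `lognorm` takes $\log(\sigma_k)$ as its width, matching the exponent of Eq. (7.6) with a slightly different prefactor convention, and the gamma shape parameter is taken as $B + 1$ so that the density is normalised:

```python
import math
from scipy import stats

# Hypothetical parameter values for illustration
theta_bar, sigma = 0.0, 1.0    # Gaussian prior: centred NP, unit width
sigma_norm = 1.1               # log-normal width, e.g. a 10% normalisation
N = 25                         # MC bin content for the gamma prior
sigma_rel = 1.0 / math.sqrt(N) # relative MC statistical uncertainty
A = (1.0 / sigma_rel) ** 2     # A = (1/sigma_rel)^2 as in Eq. (7.7)
B = N - 1

gauss = stats.norm(loc=theta_bar, scale=sigma)              # Eq. (7.5)
lognorm = stats.lognorm(s=math.log(sigma_norm), scale=1.0)  # Eq. (7.6), theta_bar_k = 1
gamma = stats.gamma(a=B + 1, scale=1.0 / A)                 # Eq. (7.7), normalised form

print(gauss.pdf(0.0), lognorm.pdf(1.0), gamma.pdf(1.0))
```

With this parametrisation the gamma prior has mean one, i.e. the nominal MC prediction, and the log-normal has median one.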

By convention, the NPs are defined such that $\theta = 0$ corresponds to the nominal value of the prediction, while the values $\pm 1$ correspond to $\pm 1\sigma$ variations of the systematic uncertainty associated to the considered $\theta$:

\[
\theta' = \frac{\theta - \bar{\theta}}{\sigma}. \tag{7.8}
\]

After the maximisation procedure is concluded, the values $\hat{\theta}$ and $\hat{\mu}$ are defined as the ones which maximise the likelihood. If the observed data are not sensitive to a given source of systematic uncertainty, the best value of the corresponding $\theta_k$ stays at 0 and its error is consistent with the input uncertainty. In the opposite case, the fit can shift (pull) the best value of a given NP to achieve a better data/MC agreement, or produce a reduction (constraint) of the error associated to a nuisance parameter. The latter happens when the large effects of a given systematic uncertainty are not supported by the available data. Constraints provided by the data can help to increase the sensitivity of the measurement.
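A minimal sketch of how a pull and a constraint arise, assuming a hypothetical two-bin model with a single background-normalisation NP whose 10% prior uncertainty enters as a Gaussian penalty term:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical two-bin model: signal peaks in bin 1, background in bin 2;
# one NP scales the background normalisation, with a 10% prior uncertainty
s = np.array([8.0, 2.0])
b = np.array([50.0, 150.0])
n = np.array([60.0, 175.0])
sig_b = 0.10

def nll(params):
    mu, theta = params
    e = mu * s + b * (1.0 + sig_b * theta)  # predictions depend on the NP
    return np.sum(e - n * np.log(e)) + 0.5 * theta**2  # Poisson + Gaussian penalty

res = minimize(nll, x0=[1.0, 0.0], method="BFGS")
mu_hat, theta_hat = res.x

# Post-fit NP uncertainty from the (approximate) inverse Hessian; a value
# below 1 means the data constrain the NP beyond its unit prior width
theta_err = float(np.sqrt(res.hess_inv[1, 1]))
print(f"mu_hat = {mu_hat:.2f}, theta_hat = {theta_hat:.2f} +- {theta_err:.2f}")
```

Here the excess in the background-dominated bin cannot be absorbed by the signal strength alone, so the fit pulls the background NP upwards and, because both bins watch the same NP, returns a post-fit error smaller than the unit prior width.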

Statistical fluctuations can produce additional shape differences in the considered distributions, changing the result of the fit. To avoid this, a smoothing procedure is applied, merging bins until the shape differences are significant compared to the statistical fluctuations. To neglect those uncertainties which do not play a role in the fit, all the systematic uncertainties affecting the total normalisation or the total shape by less than 0.5% are dropped. This procedure is referred to as pruning. Pruning does not affect the result of the fit. The tool used to implement the profile likelihood fit is the RooFit framework [145].
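A pruning criterion of this kind could be sketched as follows; the 0.5% threshold is the one quoted above, while the helper name and the exact normalisation/shape split are illustrative assumptions, not the analysis' actual implementation:

```python
import numpy as np

def keep_systematic(nominal, varied, threshold=0.005):
    """Keep a systematic only if its normalisation OR shape effect exceeds 0.5%."""
    norm_effect = abs(varied.sum() / nominal.sum() - 1.0)
    # Shape effect: compare the variation after removing the normalisation change
    shape = varied * (nominal.sum() / varied.sum())
    shape_effect = np.max(np.abs(shape / nominal - 1.0))
    return norm_effect > threshold or shape_effect > threshold

nominal = np.array([100.0, 200.0, 150.0])
tiny = nominal * 1.001                            # 0.1% overall shift: pruned
large = nominal * np.array([1.05, 0.95, 1.0])     # few-% shape distortion: kept
print(keep_systematic(nominal, tiny), keep_systematic(nominal, large))
```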

The likelihood definition also gives the possibility to define confidence intervals. The test statistics $q_\mu$ are defined as profile likelihood ratios:

\[
q_\mu = -2\log(\lambda(\mu)) = -2\log\left( \frac{L(\mu, \hat{\hat{\theta}})}{L(\hat{\mu}, \hat{\theta})} \right), \tag{7.9}
\]

where $\hat{\hat{\theta}}$ are the values of the NPs that maximise the likelihood for a given value of $\mu$, with the constraint $0 \leq \hat{\mu} \leq \mu$, since physics only allows $\hat{\mu}$ to be positive and the exclusion limit $\mu$ needs to be greater than the best estimator. By using the Wilks and Wald theorems [146], which hold for sufficiently large dataset statistics, the asymptotic approximation is obtained:

\[
q_\mu = -2\log(\lambda(\mu)) \simeq \frac{(\mu - \hat{\mu})^2}{\sigma^2}, \tag{7.10}
\]

where $\sigma^2$ represents the variance of the likelihood estimate of $\mu$. Such a parameter is calculated making use of the so-called Asimov dataset [143], an artificial dataset in which all observed quantities are set equal to their expected values¹. A dataset defined in this way is such that, when it is used to evaluate the estimators of all parameters, the true parameter values are obtained. The Asimov dataset has the particularity that all the pulls of the NPs are zero by definition.
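The role of the Asimov dataset and of the asymptotic formula of Eq. (7.10) can be illustrated with a hypothetical two-bin model without nuisance parameters, so that the profiling in Eq. (7.9) is trivial and $q_\mu$ reduces to the plain likelihood ratio:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical two-bin model (illustration only); the Asimov dataset sets
# the observed counts exactly equal to the expectation at mu' = 1
s = np.array([6.0, 14.0])
b = np.array([90.0, 35.0])
mu_true = 1.0
n_asimov = mu_true * s + b          # observed = expected, no fluctuations

def nll(mu):
    """-log L up to a constant, Eq. (7.3) without nuisance parameters."""
    e = mu * s + b
    return np.sum(e - n_asimov * np.log(e))

# Fitting the Asimov dataset recovers the true parameter value by construction
mu_hat = minimize_scalar(nll, bounds=(-2.0, 5.0), method="bounded").x

def q(mu):
    """Test statistic of Eq. (7.9); no NPs to profile in this toy model."""
    return 2.0 * (nll(mu) - nll(mu_hat))

# sigma of Eq. (7.10) estimated from the Asimov test statistic
mu_test = 1.5
sigma = abs(mu_test - mu_hat) / np.sqrt(q(mu_test))
print(f"mu_hat = {mu_hat:.3f}, sigma ~ {sigma:.3f}")
```

Because no fluctuations are injected, the minimum sits exactly at the true value $\mu' = 1$, and the curvature of $q_\mu$ around it gives the Asimov estimate of $\sigma$.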