
5. General Analysis Strategy

5.5. Cross Section Extraction Using Profile Likelihood Fitting

5.5.1. The Idea

The aim of the presented analyses is to extract the best-fit value of the top quark pair production cross section, $\sigma_{t\bar{t}}$, from a fit of templates derived from Monte Carlo simulated events to a data distribution. Here, this distribution is the likelihood discriminant $D$, as described previously, available for events in the $\mu$+jets and $e$+jets channels with different jet multiplicities ($n_{\mathrm{jets}} = 3, 4, \geq 5$).

While here the discriminant $D$ is divided into 20 bins per channel and jet multiplicity, and a global discriminant with 120 bins (2 channels × 3 jet multiplicities × 20 bins) is formed for the final fit, the fitting procedure itself is valid for any binned distribution. The best-fit value of $\sigma_{t\bar{t}}$ is derived not only as a function of the signal and background normalizations, but also as a function of several nuisance parameters representing sources of systematic uncertainty. Allowing these systematic effects to vary in the fit improves the knowledge of the uncertainties themselves and can reduce the overall uncertainty of the measurement.
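The bookkeeping behind the 120-bin global discriminant can be sketched as follows. This is a minimal Python illustration, assuming the per-category histograms of $D$ are already available as lists of 20 bin contents; the function name `build_global_discriminant`, the category labels and their ordering are hypothetical, and the toy contents are placeholders rather than analysis numbers.

```python
# Category labels and their ordering are illustrative assumptions.
CHANNELS = ("mu+jets", "e+jets")
JET_BINS = ("3 jets", "4 jets", ">=5 jets")
N_BINS = 20  # bins of D per channel and jet multiplicity


def build_global_discriminant(histograms):
    """Concatenate per-category histograms of D into one global histogram.

    `histograms` maps (channel, jet_bin) -> list of N_BINS bin contents.
    The fixed loop order defines which global bin k belongs to which
    category, so data and every template must be built with the same order.
    """
    global_bins = []
    for channel in CHANNELS:
        for jet_bin in JET_BINS:
            contents = histograms[(channel, jet_bin)]
            if len(contents) != N_BINS:
                raise ValueError(f"{channel}, {jet_bin}: expected {N_BINS} bins")
            global_bins.extend(contents)
    return global_bins


if __name__ == "__main__":
    # Placeholder contents: every bin of category number c holds the value c.
    toy = {}
    for c, key in enumerate((ch, jb) for ch in CHANNELS for jb in JET_BINS):
        toy[key] = [float(c)] * N_BINS
    combined = build_global_discriminant(toy)
    print(len(combined))  # 2 channels x 3 jet multiplicities x 20 bins
```

Keeping a single, fixed category order is the only real requirement: the likelihood compares data and prediction bin by bin, so any mismatch in ordering would silently compare unrelated bins.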

The likelihood function and the minimization technique, implemented using ROOT [95] and its built-in minimization package Minuit2 [96], as well as the technique used to create and handle continuous systematic uncertainties, are described in the following.

5.5.2. The Likelihood Function

The top quark pair production cross section, $\sigma_{t\bar{t}}$, is extracted from a likelihood fit of signal and background templates to data. A likelihood function $\mathcal{L}$ is defined as described in the following, and minimizing $-\ln\mathcal{L}$ yields the preferred value of the parameter of interest, here the top quark pair production cross section.

For a binned distribution $D$ with bins indexed by $k$, the predictions for signal and background processes can be compared to data assuming Poisson probabilities, which leads to the extended maximum likelihood function

$$
\mathcal{L}_{\beta}(\vec{\beta}) = \prod_{k} \frac{\mu_k(\vec{\beta})^{n_k}\, e^{-\mu_k(\vec{\beta})}}{n_k!}. \tag{5.8}
$$

The number of observed data events in bin $k$ is denoted by $n_k$, while the expected number of events, i.e. the sum of signal and background events, is denoted by $\mu_k$. Each signal and background process $j$ is associated with a normalization parameter $\beta_j$ (the top quark pair production normalization is associated with $\beta_0$), which is $\beta_j = 1.0$ for the initial prediction from Monte Carlo simulation, normalized to the luminosity of the given data set. The number of expected events in bin $k$ can therefore be expressed as the sum over the predictions of all processes $j$ in this bin, $\mu_k(\vec{\beta}) = \sum_j \beta_j \nu_{jk}$, where $\nu_{jk}$ is the predicted number of events of process $j$ in bin $k$. The $\beta_j$ are the parameters adjusted to minimize $-\ln\mathcal{L}$, and their preferred values are returned as the final result of the fit. However, while the parameter of interest, $\beta_0 = \sigma_{t\bar{t},\mathrm{measured}}/\sigma_{t\bar{t},\mathrm{predicted}}$, is left free to take any value, the parameters $\beta_j$ of the background processes are constrained by the uncertainties of their predictions. This knowledge about the uncertainties is implemented in a Gaussian term

$$
G_{\beta} = \prod_{j} \frac{1}{\sqrt{2\pi}\,\Delta_j} \exp\!\left(-\frac{(\beta_j - 1)^2}{2\Delta_j^2}\right), \tag{5.9}
$$

which multiplies $\mathcal{L}_{\beta}$; here $\Delta_j$ is the relative uncertainty on the predicted normalization of background process $j$. Up to this point, only the $\beta_j$ are varied in the minimization, i.e. only the amounts of signal and background.
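To make Eqs. (5.8) and (5.9) concrete, the negative logarithm of $\mathcal{L}_\beta \times G_\beta$ can be evaluated directly. The following is a minimal stdlib-Python sketch, not the ROOT implementation used in the analysis; the names `expected_events` and `nll_beta`, the nested-list layout `nu[j][k]`, and the toy numbers are assumptions made for illustration.

```python
import math


def expected_events(beta, nu):
    """mu_k(beta) = sum_j beta_j * nu_jk, with nu[j][k] the prediction of process j in bin k."""
    n_bins = len(nu[0])
    return [sum(b * nu_j[k] for b, nu_j in zip(beta, nu)) for k in range(n_bins)]


def nll_beta(beta, nu, data, widths):
    """Return -ln(L_beta * G_beta) following Eqs. (5.8) and (5.9).

    widths[j] is the Gaussian width Delta_j of process j; None marks an
    unconstrained normalization (here the signal parameter beta_0).
    Every mu_k must be positive, otherwise math.log raises ValueError.
    """
    mu = expected_events(beta, nu)
    # Poisson part: -ln(mu^n e^(-mu) / n!) = mu - n ln(mu) + ln(n!)
    value = sum(m - n * math.log(m) + math.lgamma(n + 1) for m, n in zip(mu, data))
    # Gaussian part: -ln[exp(-(b-1)^2 / (2 D^2)) / (sqrt(2 pi) D)]
    for b, d in zip(beta, widths):
        if d is not None:
            value += 0.5 * ((b - 1.0) / d) ** 2 + math.log(math.sqrt(2.0 * math.pi) * d)
    return value


if __name__ == "__main__":
    nu = [[10.0, 30.0], [40.0, 20.0]]  # toy: process 0 = signal, 1 = background
    data = [52, 48]                    # toy observed counts
    widths = [None, 0.1]               # signal free, background constrained to 10 %
    for beta in ([1.0, 1.0], [1.2, 1.0], [1.0, 1.2]):
        print(beta, round(nll_beta(beta, nu, data, widths), 3))
```

Working with $-\ln\mathcal{L}$ turns the products over bins and processes into sums, which is numerically far more stable than multiplying many small probabilities; `math.lgamma(n + 1)` supplies $\ln n_k!$ without computing the factorial.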

In the presented analyses, sources of systematic uncertainty are also included in the fitting procedure as nuisance parameters $\delta_i$, so that a preferred value is obtained for each of them as well. This extends the likelihood function to

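$$
\mathcal{L}_{\beta,\delta}(\vec{\beta},\vec{\delta}) = \prod_{k} \frac{\mu_k(\vec{\beta},\vec{\delta})^{n_k}\, e^{-\mu_k(\vec{\beta},\vec{\delta})}}{n_k!} \times \prod_{j} \frac{1}{\sqrt{2\pi}\,\Delta_j} \exp\!\left(-\frac{(\beta_j - 1)^2}{2\Delta_j^2}\right) \times \prod_{i} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{\delta_i^2}{2}\right). \tag{5.10}
$$

An additional term adds Gaussian constraints on the nuisance parameters $\delta_i$, and, moreover, the number of expected events $\mu_k$ becomes a function of both the parameters $\beta_j$ and $\delta_i$: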

$$
\mu_k(\vec{\beta},\vec{\delta}) = \sum_{j} \beta_j\, \nu_{jk} \prod_{i} \varepsilon_{jik}(\delta_i). \tag{5.11}
$$

The term $\varepsilon_{jik}$, which describes the variation of the bin contents as a function of the nuisance parameter $\delta_i$, is needed because initially only the $\pm 1\sigma$ variations of each source of systematic uncertainty, corresponding to $\delta_i = \pm 1.0$, are known. For each source of systematic uncertainty $i$, the deviation from the nominal prediction for process $j$ in bin $k$ can be expressed as the ratio of the $\pm 1\sigma$ shifted prediction $\nu^{\pm}_{jik}$ to the nominal prediction $\nu_{jk}$:

$$
\lambda^{\pm}_{jik} = \frac{\nu^{\pm}_{jik}}{\nu_{jk}}. \tag{5.12}
$$

To translate this knowledge into an expected number of events per process and bin for any value of $\delta_i$, a procedure called vertical template morphing [97] is implemented. As illustrated in Figure 5.7, a quadratic interpolation based on Lagrange polynomials is performed in the range $[-1\sigma, 1\sigma]$, using the up and down variations of systematic uncertainty $i$ together with the nominal prediction in each bin.

Outside the range $[-1\sigma, 1\sigma]$, a linear extrapolation is used, chosen such that the function remains continuous and differentiable at $\delta_i = \pm 1$: the slope of each linear branch equals the derivative of the quadratic at the corresponding boundary. The template morphing procedure of quadratic interpolation and linear extrapolation for a given source of systematics $i$, physics process $j$ and bin $k$ is expressed by the shift parameter $\varepsilon_{jik}$,

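$$
\varepsilon_{jik}(\delta_i) =
\begin{cases}
\lambda^{+}_{jik} + (\delta_i - 1)\left[\left(\tfrac{3}{2}\lambda^{+}_{jik} - 1\right) + \left(\tfrac{1}{2}\lambda^{-}_{jik} - 1\right)\right] & \text{for } \delta_i > 1,\\[4pt]
\tfrac{1}{2}\delta_i\left[(\delta_i - 1)\lambda^{-}_{jik} + (\delta_i + 1)\lambda^{+}_{jik}\right] - (\delta_i - 1)(\delta_i + 1) & \text{for } |\delta_i| \le 1,\\[4pt]
\lambda^{-}_{jik} + (\delta_i + 1)\left[\left(-\tfrac{1}{2}\lambda^{+}_{jik} + 1\right) + \left(-\tfrac{3}{2}\lambda^{-}_{jik} + 1\right)\right] & \text{for } \delta_i < -1,
\end{cases}
\tag{5.13}
$$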

and the product over all systematic uncertainties included in the likelihood function yields the number of expected events for a given process. Initially, the nuisance parameters are set to $\delta_i = 0.0 \pm 1.0$, representing the nominal prediction with a $1\sigma$ uncertainty; both the central value and the uncertainty can change in the fitting procedure. This procedure is only valid for systematic uncertainties that can be assumed to follow a continuous distribution in the $[-1\sigma, 1\sigma]$ range, and therefore not all sources of systematic uncertainty described in section 5.6 can be included in the fit function.

[Figure 5.7, image not reproduced: horizontal axis $\delta_i$ with ticks at −1, 0, 1; levels labelled "1σ down", "nominal" and "1σ up".]

Figure 5.7.: Illustration of the quadratic interpolation in vertical template morphing to translate the available ±1σ shifted distributions for a systematic uncertainty $i$ into a continuous function of the parameter $\delta_i$. An example distribution showing possible shifts in a full distribution is displayed, and the interpolation is applied in each bin and process separately.
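The piecewise definition of Eq. (5.13) and its use in Eq. (5.11) translate almost line by line into code. The sketch below is plain Python with hypothetical function names and an assumed data layout (`nu[j][k]`, `lam_up[j][i][k]`, `lam_down[j][i][k]`); it evaluates the shift factor for one bin and builds $\mu_k(\vec{\beta},\vec{\delta})$. It is an illustration of the formulas, not the analysis code.

```python
def epsilon(delta, lam_up, lam_down):
    """Shift factor eps_jik(delta_i) of Eq. (5.13) for one process, systematic and bin."""
    if delta > 1.0:
        # Linear extrapolation; slope = derivative of the quadratic at delta = +1.
        slope = (1.5 * lam_up - 1.0) + (0.5 * lam_down - 1.0)
        return lam_up + (delta - 1.0) * slope
    if delta < -1.0:
        # Linear extrapolation; slope = derivative of the quadratic at delta = -1.
        slope = (-0.5 * lam_up + 1.0) + (-1.5 * lam_down + 1.0)
        return lam_down + (delta + 1.0) * slope
    # Lagrange polynomial through (-1, lam_down), (0, 1) and (1, lam_up).
    return (0.5 * delta * ((delta - 1.0) * lam_down + (delta + 1.0) * lam_up)
            - (delta - 1.0) * (delta + 1.0))


def expected_events(beta, nu, lam_up, lam_down, delta):
    """mu_k(beta, delta) = sum_j beta_j nu_jk prod_i eps_jik(delta_i), Eq. (5.11)."""
    n_bins = len(nu[0])
    mu = []
    for k in range(n_bins):
        total = 0.0
        for j, b in enumerate(beta):
            factor = 1.0
            for i, d in enumerate(delta):
                factor *= epsilon(d, lam_up[j][i][k], lam_down[j][i][k])
            total += b * nu[j][k] * factor
        mu.append(total)
    return mu


if __name__ == "__main__":
    for d in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
        print(d, round(epsilon(d, 1.10, 0.95), 4))
    # One process (j = 0), one systematic (i = 0), two bins; toy numbers.
    nu = [[100.0, 50.0]]
    lam_up = [[[1.10, 1.00]]]
    lam_down = [[[0.90, 1.00]]]
    print(expected_events([1.0], nu, lam_up, lam_down, [0.0]))
    print(expected_events([1.0], nu, lam_up, lam_down, [1.5]))
```

At $\delta_i = -1, 0, 1$ the function reproduces $\lambda^-$, the nominal value 1, and $\lambda^+$, and the matching slopes at $\pm 1$ keep $-\ln\mathcal{L}$ smooth, which helps a gradient-based minimizer such as Minuit2.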

Once the full likelihood function $\mathcal{L}_{\beta,\delta}(\vec{\beta},\vec{\delta})$ is defined, $-\ln\mathcal{L}$ is minimized with Minuit2, yielding fit results for all normalization parameters $\beta_j$ and nuisance parameters $\delta_i$. Symmetric uncertainties and the full covariance matrix of all fit parameters are derived from the Hessian matrix, i.e. the second derivatives of $-\ln\mathcal{L}$ at the minimum, while asymmetric uncertainties can be obtained from the MINOS algorithm available in the Minuit2 package.
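The symmetric (Hessian) uncertainty can be illustrated with a single Poisson bin, where it is known analytically: for $\mu = \beta\nu$ the minimum of $-\ln\mathcal{L}$ lies at $\hat\beta = n/\nu$, the curvature there is $\nu^2/n$, and the uncertainty is therefore $\sqrt{n}/\nu$. The sketch below estimates the curvature with a central finite difference instead of calling Minuit2; the helper names and the toy numbers ($n = 100$, $\nu = 80$) are assumptions. With several parameters, the covariance matrix would be the inverse of the full Hessian matrix rather than one over a single curvature.

```python
import math


def nll_single_bin(beta, nu, n):
    """-ln L for one Poisson bin with mu = beta * nu (the constant ln n! is dropped)."""
    mu = beta * nu
    return mu - n * math.log(mu)


def hessian_error(f, x_min, h=1e-4):
    """1-sigma error 1/sqrt(f'') from a central finite-difference second derivative."""
    curvature = (f(x_min + h) - 2.0 * f(x_min) + f(x_min - h)) / h**2
    if curvature <= 0:
        raise ValueError("x_min is not a minimum of f")
    return 1.0 / math.sqrt(curvature)


if __name__ == "__main__":
    n_obs, nu = 100, 80.0
    beta_hat = n_obs / nu  # analytic minimum of nll_single_bin
    sigma = hessian_error(lambda b: nll_single_bin(b, nu, n_obs), beta_hat)
    print(beta_hat, sigma, math.sqrt(n_obs) / nu)  # numeric vs analytic error
```

The numerical and analytic errors agree closely here because $-\ln\mathcal{L}$ is nearly parabolic around the minimum; when it is not, the symmetric Hessian error and the asymmetric MINOS errors can differ noticeably.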

Finally, the full likelihood is reduced to a one-dimensional function $\mathcal{L}_{\beta_0}$ that depends only on the parameter of interest, $\beta_0 = \sigma_{t\bar{t},\mathrm{measured}}/\sigma_{t\bar{t},\mathrm{predicted}}$, in the profiling step [98]. After the global minimum (gmin) has been found in the general minimization, the likelihood as a function of $\beta_0$ is obtained from a local minimization (lmin) in which $\beta_0$ is held fixed and all other parameters are minimized, and is expressed relative to the global minimum:

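$$
-\ln \mathcal{L}_{\beta_0}(\beta_0) = \ln \mathcal{L}_{\beta,\delta}\big(\vec{\beta}^{\,\mathrm{gmin}}, \vec{\delta}^{\,\mathrm{gmin}}\big) - \ln \mathcal{L}_{\beta,\delta}\big(\beta_0, \beta_1^{\mathrm{lmin}}, \ldots, \vec{\delta}^{\,\mathrm{lmin}}\big). \tag{5.14}
$$

This profile likelihood behaves like a log-likelihood function. It can be used to test the performance of the likelihood, to extract the $1\sigma$ uncertainty on $\beta_0$ from the points where $-\ln\mathcal{L}_{\beta_0}$ rises by 0.5 above its minimum (which is zero by construction), and to understand the dependence of $\beta_0$ on each of the other fit parameters.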
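The profiling step of Eq. (5.14) can be demonstrated on a toy with two bins, one free signal normalization $\beta_0$ and one Gaussian-constrained background normalization $\beta_1$. In this sketch a hand-written golden-section search stands in for Minuit2 and a bisection finds the points where $-\ln\mathcal{L}_{\beta_0}$ reaches 0.5; the templates, data and the 10 % constraint are invented, and all function names are hypothetical. Constant terms such as $\ln n_k!$ and the Gaussian normalization are dropped because they cancel in the difference of Eq. (5.14).

```python
import math

SIGNAL = [20.0, 5.0]       # toy nu_0k
BACKGROUND = [10.0, 40.0]  # toy nu_1k
DATA = [32, 44]            # toy n_k
BKG_WIDTH = 0.1            # toy Delta_1
INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0


def nll(beta0, beta1):
    """-ln L up to constants: Poisson terms plus the Gaussian constraint on beta1."""
    value = 0.5 * ((beta1 - 1.0) / BKG_WIDTH) ** 2
    for s, b, n in zip(SIGNAL, BACKGROUND, DATA):
        mu = beta0 * s + beta1 * b
        value += mu - n * math.log(mu)
    return value


def golden_min(f, lo, hi, tol=1e-9):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    a, b = lo, hi
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)


def profile(beta0):
    """Local minimization (lmin): beta0 held fixed, beta1 minimized."""
    return golden_min(lambda beta1: nll(beta0, beta1), 0.01, 5.0)[1]


def crossing(f, lo, hi, level=0.5, tol=1e-10):
    """Bisection for f(x) = level when f - level changes sign on [lo, hi]."""
    above_lo = f(lo) > level
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > level) == above_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Global minimum (gmin): minimizing the profile over beta0 reaches the same point.
BETA0_HAT, NLL_GMIN = golden_min(profile, 0.01, 5.0)


def delta_nll(beta0):
    """-ln L_beta0(beta0) of Eq. (5.14): local minimum minus global minimum."""
    return profile(beta0) - NLL_GMIN


if __name__ == "__main__":
    low = crossing(delta_nll, 0.01, BETA0_HAT)
    high = crossing(delta_nll, BETA0_HAT, 5.0)
    print(f"beta0 = {BETA0_HAT:.3f} -{BETA0_HAT - low:.3f} +{high - BETA0_HAT:.3f}")
```

Because $\mu_k$ is linear in $\beta_0$ and $\beta_1$, this toy $-\ln\mathcal{L}$ is convex, so the one-dimensional searches are safe; a real fit with many correlated parameters needs a proper multidimensional minimizer. The resulting interval is in general asymmetric, in the same spirit as the MINOS errors.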

The stability of the described procedure and its sensitivity to various effects have to be evaluated carefully, as described in section 5.7 and in the corresponding analysis chapters.

In the document: Precision Measurements of the Top Quark Pair Production Cross Section in the Single Lepton Channel with the ATLAS Experiment (pages 101-104)