
In order to further validate the robustness of the fit method, another pseudo-dataset is constructed from all nominal simulated background samples except for the t¯t+jets background, which is replaced by an alternative sample generated with Powheg+Pythia6.

This sample is not used in the definition of any uncertainty and is similar to the sample used in the search for t¯tH(H → b¯b) based on √s = 8 TeV ATLAS data collected during Run 1 of the LHC [115]. If the extraction of the signal strength were biased, the fit to this pseudo-dataset would yield a value of µ that is incompatible with zero under the signal-plus-background hypothesis, which assumes the presence of a t¯tH signal. No such bias is observed after performing this fit [4]. Finally, the ability of the fit to constrain the systematic uncertainties is validated in this fit as well.
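The logic of this closure test can be illustrated with a toy example. The sketch below is not the profile likelihood fit used in the analysis: it has no nuisance parameters, the yields are hypothetical, and the helper names `nll` and `fit_mu` are introduced only for illustration. It fits µ to background-only pseudo-data with a simple binned Poisson likelihood and checks that the result is compatible with zero.

```python
import math

def nll(mu, signal, background, observed):
    """Binned Poisson negative log-likelihood for signal strength mu (constants dropped)."""
    total = 0.0
    for s, b, n in zip(signal, background, observed):
        expected = mu * s + b
        total += expected - n * math.log(expected)
    return total

def fit_mu(signal, background, observed, lo=-2.0, hi=5.0, steps=7001):
    """Scan mu on a uniform grid and return the value that minimises the NLL."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda mu: nll(mu, signal, background, observed))

# Hypothetical per-bin yields, NOT taken from the analysis.
signal = [2.0, 5.0, 9.0]
background = [400.0, 150.0, 40.0]

# Background-only pseudo-data: an unbiased fit should return mu close to zero.
mu_hat = fit_mu(signal, background, observed=background)
print(f"fitted mu = {mu_hat:.3f}")
```

In the real test the pseudo-data come from an alternative generator rather than from the nominal background itself, so a non-zero µ would indicate that a mismodelled background can be absorbed into the signal.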

11.2 Results before and after the fit to data

This section presents the pre-fit and post-fit distributions of the discriminants used in the combined profile likelihood fit. Figure 11.1 summarises the event yields in the analysis regions of the single lepton and dilepton channels before and after the fit assuming the presence of a t¯tH signal. Similarly, the scalar sum of the pT of the selected jets, HThad, is shown in the two t¯t+≥1c CRs of the single lepton channel in Figure 11.2. Figures 11.3–11.7 show the output distributions of the Classification BDT in the SRs of the dilepton and single lepton channels. All these distributions are modelled well pre-fit within the uncertainties assigned to the respective predictions. In the post-fit distributions, the agreement between data and simulation improves because the profile likelihood fit is able to adjust the nuisance parameters accordingly [4]. This is particularly important for the two k-factors of the t¯t + heavy flavour background, whose post-fit values are k(t¯t+≥1b) = 1.24 ± 0.10 and k(t¯t+≥1c) = 1.63 ± 0.23, respectively [4]. The quoted uncertainties do not include any theory-related uncertainties in the t¯t+≥1b and t¯t+≥1c cross-sections. The post-fit uncertainties are significantly reduced as well, because the fit constrains and creates correlations between the nuisance parameters [4].

Aside from this, the input variables to the classification BDT are checked post-fit in the respective SR and no significant disagreement between data and simulation is found [4].

As an example, Figure 11.8 illustrates the post-fit predictions of the Higgs boson candidate mass distribution in the single lepton and dilepton channels.


Figure 11.1: Comparison of the predicted event yields to the observed events in data in all considered analysis regions in the single lepton channel before (top left) and after the combined fit (top right) and in the dilepton channel before (bottom left) and after the combined fit (bottom right). The filled red area (dashed red line) represents the t¯tH signal stacked on top of the background (shown separately) normalised to the SM cross-section before the fit and to the extracted µ value after the fit. The total uncertainty in the simulated yields is represented by the hatched area, while the histograms before the fit do not consider an uncertainty in the t¯t+≥1b and t¯t+≥1c normalisations.


[Figure 11.2 panels. x-axis: HThad [GeV]; y-axis: Events / 100 GeV; ratio panel: Data / Pred.]

Figure 11.2: Comparison of the predicted event yields to the observed events in data as a function of HThad in the single lepton t¯t+≥1c CR with exactly five jets before (top left) and after the combined fit (top right) and in the corresponding CR with at least six jets before (bottom left) and after the combined fit (bottom right). The filled red area represents the t¯tH signal stacked on top of the background normalised to the SM cross-section before the fit and to the extracted µ value after the fit. The total uncertainty in the simulated yields is represented by the hatched area, while the histograms before the fit do not consider an uncertainty in the t¯t+≥1b and t¯t+≥1c normalisations.


Figure 11.3: Comparison of the predicted event yields to the observed events in data as a function of the Classification BDT output in the single lepton SR5j2 before (top left) and after the combined fit (top right) and in the single lepton SR5j1 before (bottom left) and after the combined fit (bottom right). The filled red area represents the t¯tH signal stacked on top of the background normalised to the SM cross-section before the fit and to the extracted µ value after the fit. The dashed red line shows the t¯tH signal separately, normalised to the total background prediction. The total uncertainty in the simulated yields is represented by the hatched area, while the histograms before the fit do not consider an uncertainty in the t¯t+≥1b and t¯t+≥1c normalisations.


Figure 11.4: Comparison of the predicted event yields to the observed events in data as a function of the Classification BDT output in the single lepton SR≥6j3 before (top left) and after the combined fit (top right) and in the single lepton SR≥6j2 before (bottom left) and after the combined fit (bottom right). The filled red area represents the t¯tH signal stacked on top of the background normalised to the SM cross-section before the fit and to the extracted µ value after the fit. The dashed red line shows the t¯tH signal separately, normalised to the total background prediction. The total uncertainty in the simulated yields is represented by the hatched area, while the histograms before the fit do not consider an uncertainty in the t¯t+≥1b and t¯t+≥1c normalisations.


Figure 11.5: Comparison of the predicted event yields to the observed events in data as a function of the Classification BDT output in the single lepton SR≥6j1 before (left) and after the combined fit (right). The filled red area represents the t¯tH signal stacked on top of the background normalised to the SM cross-section before the fit and to the extracted µ value after the fit. The dashed red line shows the t¯tH signal separately, normalised to the total background prediction. The total uncertainty in the simulated yields is represented by the hatched area, while the histograms before the fit do not consider an uncertainty in the t¯t+≥1b and t¯t+≥1c normalisations.


Figure 11.6: Comparison of the predicted event yields to the observed events in data as a function of the Classification BDT output in the dilepton SR≥4j3 before (top left) and after the combined fit (top right) and in the dilepton SR≥4j2 before (bottom left) and after the combined fit (bottom right). The filled red area represents the t¯tH signal stacked on top of the background normalised to the SM cross-section before the fit and to the extracted µ value after the fit. The dashed red line shows the t¯tH signal separately, normalised to the total background prediction. The total uncertainty in the simulated yields is represented by the hatched area, while the histograms before the fit do not consider an uncertainty in the t¯t+≥1b and t¯t+≥1c normalisations.


Figure 11.7: Comparison of the predicted event yields to the observed events in data as a function of the Classification BDT output in the dilepton SR≥4j1 before (left) and after the combined fit (right). The filled red area represents the t¯tH signal stacked on top of the background normalised to the SM cross-section before the fit and to the extracted µ value after the fit. The dashed red line shows the t¯tH signal separately, normalised to the total background prediction. The total uncertainty in the simulated yields is represented by the hatched area, while the histograms before the fit do not consider an uncertainty in the t¯t+≥1b and t¯t+≥1c normalisations.


Figure 11.8: Comparison of the predicted event yields to the observed events in data as a function of the Higgs boson candidate mass from the reconstructed BDT trained without using Higgs boson information in the dilepton SR≥4j1 (left) and the single lepton SR≥6j1 (right) after the combined fit. The filled red area represents the t¯tH signal stacked on top of the background normalised to the SM cross-section before the fit and to the extracted µ value after the fit. The dashed red line shows the t¯tH signal separately, normalised to the total background prediction. The total uncertainty in the simulated yields is represented by the hatched area.

The signal strength is extracted from the combined fit to data, namely the simultaneous fit of all fifteen single lepton and dilepton regions. The best-fit value is [4]:

\[ \mu = 0.84 \pm 0.29\,(\text{stat.})\,^{+0.57}_{-0.54}\,(\text{syst.}) = 0.84\,^{+0.64}_{-0.61}. \tag{11.1} \]

The observed uncertainty is identical to the one expected from the fit to the Asimov dataset. A separate fit is also performed in which both the single lepton and dilepton channels are included in the combined fit, but allowed to have independent signal strength parameters. The corresponding results are:

\[ \mu_{\text{dilepton}} = -0.24\,^{+1.02}_{-1.05}, \qquad \mu_{\text{single-lepton}} = 0.95\,^{+0.65}_{-0.62}. \tag{11.2} \]

The probability to observe a discrepancy between these two parameters that is equal to or larger than the quoted values is 19% [4]. The three signal strength parameters and their respective uncertainties are depicted in Figure 11.9.

[Figure 11.9 plot. x-axis: best-fit µ = σ(t¯tH)/σSM(t¯tH); labels: ATLAS, √s = 13 TeV, 36.1 fb−1, mH = 125 GeV. Values are given as total (statistical, systematic):
Dilepton (combined fit): −0.24 +1.02/−1.05 (+0.54/−0.52, +0.87/−0.91)
Single Lepton (combined fit): 0.95 +0.65/−0.62 (+0.31/−0.31, +0.57/−0.54)
Combined: 0.84 +0.64/−0.61 (+0.29/−0.29, +0.57/−0.54)]

Figure 11.9: The signal strength parameter values µ of the simultaneous profile likelihood fit to ATLAS data in all analysis regions. Shown is the combined result as well as the individual results of the single lepton and dilepton channels, where the signal strengths are treated as uncorrelated while keeping all nuisance parameters correlated across the channels.

The statistical uncertainty on µ is determined by repeating the fit to data after fixing all nuisance parameters to their post-fit values, except for the normalisation factors that are allowed to float freely, namely k(t¯t+≥1b), k(t¯t+≥1c) and µ. The total systematic uncertainty is then computed by subtracting this statistical uncertainty in quadrature from the total uncertainty. The statistical uncertainty is only a minor component of the total uncertainty; the systematic component dominates.
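As a quick numerical check of this procedure, the following sketch subtracts the statistical uncertainty in quadrature from the total uncertainty, using the values quoted in Equation 11.1. The function name `systematic_component` is introduced here for illustration only.

```python
import math

def systematic_component(total, statistical):
    """Subtract the statistical uncertainty in quadrature from the total uncertainty."""
    return math.sqrt(total**2 - statistical**2)

# Values quoted in Equation 11.1: total +0.64 / -0.61, statistical +-0.29.
syst_up = systematic_component(0.64, 0.29)
syst_down = systematic_component(0.61, 0.29)

print(f"syst = +{syst_up:.2f} / -{syst_down:.2f}")  # +0.57 / -0.54
```

The result reproduces the systematic component of +0.57/−0.54 quoted in Equation 11.1.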

In addition to the combined fit above, another fit configuration is studied in which the single lepton and dilepton regions are fitted completely independently of each other. The corresponding signal strength parameters are:

\[ \mu_{\text{dilepton}} = 0.11\,^{+1.36}_{-1.41}, \qquad \mu_{\text{single-lepton}} = 0.67\,^{+0.71}_{-0.69}. \tag{11.3} \]

As expected, the total uncertainties of both results are significantly larger than those of the combined fit. It is also important to note that the best-fit value of both parameters is smaller than the observed combined result of µ quoted in Equation 11.1. This has been found to originate from the large correlations between the systematic uncertainties associated with the simulated background contributions in both channels [4].

The impact of each independent source of uncertainty on µ in the combined fit is listed in Table 11.1. The total systematic uncertainty is clearly dominated by the modelling uncertainty on the t¯t+≥1b background prediction. The second source in the ranking is the limited number of events in the MC samples, which leads to significant statistical fluctuations in certain distributions; this source also includes the uncertainty component in the data-driven estimation of the non-prompt and fake lepton background in the single lepton channel. After these, the most highly ranked sources are the uncertainties in the b-tagging efficiency SFs, the jet energy scale and resolution, and the MC simulation of the t¯tH signal.

Figure 11.10 shows the twenty nuisance parameters from independent sources of systematic uncertainty that have the largest impact on the total uncertainty on µ, ranked by decreasing impact. The best-fit value of each nuisance parameter and its post-fit uncertainty are also depicted. The nuisance parameters with the largest impact on the signal strength parameter uncertainty, ∆µ, are all related to the imperfect modelling of the t¯t+≥1b background: the highest contribution to the total uncertainty originates from the comparison between the nominal prediction of t¯t+≥1b and the one simulated by the Sherpa5F setup, which represents the uncertainty in the choice of the ME modelling. The three highest contributing nuisance parameters after this are also related to the t¯t+≥1b modelling. In addition, the list contains sources of systematic uncertainty that are associated with the modelling of the t¯tH signal and of the t¯t+≥1c and t¯t+light background processes, as well as experimental sources of uncertainty, for example the b-tagging efficiency and the jet energy scale and resolution.

However, their impact on ∆µ is significantly lower than that of the sources related to the background modelling. The twenty nuisance parameters listed here account for 95% of the total uncertainty on the best-fit value of µ [4].

At this point, a trend appears indicating that the physics modelling is the critical limiting factor of this search. As more data are recorded and the experimental conditions improve over the next years, the systematic uncertainties originating from our imperfect understanding of these physics processes remain the biggest challenge. In order to improve the modelling of the t¯tH signal and the t¯t+b¯b background processes, various studies have been performed up until now and they show promising results. Some of these studies are presented in Chapter 12.

Uncertainty source                                          ∆µ
t¯t+≥1b modelling                                           +0.46   −0.46
Statistical uncertainty of background model                 +0.29   −0.31
b-tagging efficiency and mis-tag rates                      +0.16   −0.16
Jet energy scale and resolution                             +0.14   −0.14
t¯tH signal modelling                                       +0.22   −0.05
t¯t+≥1c modelling                                           +0.09   −0.11
JVT and pile-up modelling                                   +0.03   −0.05
Other background modelling                                  +0.08   −0.08
t¯t+light modelling                                         +0.06   −0.03
Luminosity                                                  +0.03   −0.02
Light lepton (e, µ) ID, isolation & trigger uncertainty     +0.03   −0.04
Total systematic uncertainty                                +0.57   −0.54
t¯t+≥1b normalisation                                       +0.09   −0.10
t¯t+≥1c normalisation                                       +0.02   −0.03
Intrinsic statistical uncertainty                           +0.21   −0.20
Total statistical uncertainty                               +0.29   −0.29
Total uncertainty                                           +0.64   −0.61

Table 11.1: List of sources of systematic uncertainty in the t¯tH(H → b¯b) search ranked in decreasing order by their impact on the signal strength uncertainty ∆µ [4]. The ‘statistical uncertainty of the background model’ refers to the statistical uncertainties in the number of simulated events and in the estimation of the non-prompt and fake lepton contribution in the single lepton channel.
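The individual entries of Table 11.1 do not simply add in quadrature to the quoted total. The sketch below computes the naive quadrature sum of the listed systematic sources for comparison. Attributing the difference to correlations between nuisance parameters is an interpretation based on the remark above that the fit creates such correlations; it is not a statement taken from the table itself.

```python
import math

# (upward, downward) impacts on mu of the systematic sources listed in Table 11.1
systematic_sources = {
    "tt+>=1b modelling": (0.46, 0.46),
    "background model statistics": (0.29, 0.31),
    "b-tagging": (0.16, 0.16),
    "jet energy scale and resolution": (0.14, 0.14),
    "ttH signal modelling": (0.22, 0.05),
    "tt+>=1c modelling": (0.09, 0.11),
    "JVT and pile-up": (0.03, 0.05),
    "other background modelling": (0.08, 0.08),
    "tt+light modelling": (0.06, 0.03),
    "luminosity": (0.03, 0.02),
    "light lepton": (0.03, 0.04),
}

def quadrature_sum(values):
    """Square root of the sum of squares, valid only for independent components."""
    return math.sqrt(sum(v * v for v in values))

naive_up = quadrature_sum(up for up, _ in systematic_sources.values())
naive_down = quadrature_sum(down for _, down in systematic_sources.values())

print(f"naive quadrature sum: +{naive_up:.2f} / -{naive_down:.2f}")
print("quoted total systematic: +0.57 / -0.54")
```

The naive sum is larger than the quoted total systematic uncertainty, which shows that the components cannot be treated as independent.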

[Figure 11.10 plot. Legend: pre-fit impact on µ, post-fit impact on µ.]

Figure 11.10: List of the most significant nuisance parameters that encode systematic uncertainties that impact the combined signal strength parameter µ during the combined fit [4]. The parameters are ranked in decreasing order by their contribution to the total uncertainty, but only the twenty parameters with the highest impact are presented. The empty (filled) blue rectangles represent the pre-fit (post-fit) contributions to ∆µ that result from shifting the nuisance parameter upwards, while the teal rectangles show the corresponding downward shift. The black points indicate the relative pulls of the nuisance parameters. In the bottom x-axis label, ˆθ (θ0) refers to the best-fit (nominal) parameter value and ∆θ is the pre-fit uncertainty of the parameter. k(t¯t+≥1b) is freely floating in the fit with a pre-fit value and uncertainty of 1.0 and thus has no pre-fit impact on µ. Some experimental uncertainties contain the label NP I and/or NP II, which refers to the first and/or second nuisance parameter encoding this uncertainty, respectively, ordered by their impact on µ.

The most precise result, quoted in Equation 11.1, corresponds to an excess of events over the expected SM background with an observed (expected) significance of 1.4 (1.6) standard deviations (σ). A significance of at least 5.0σ is required to claim an observation. Using the CLs method [133–135], a signal strength larger than 2.0 is excluded at 95% confidence level, as depicted in Figure 11.11. The expected significance and exclusion limit are determined based on the post-fit background estimate [4].
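To put these significances into perspective, they can be converted into p-values. The sketch below assumes the one-sided Gaussian tail convention commonly used in particle physics; the function names are introduced here for illustration only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_value_from_significance(z):
    """One-sided tail probability of a standard normal distribution beyond z sigma."""
    return 1.0 - STANDARD_NORMAL.cdf(z)

def significance_from_p_value(p):
    """Inverse conversion: number of standard deviations for a one-sided p-value."""
    return STANDARD_NORMAL.inv_cdf(1.0 - p)

for z in (1.4, 1.6, 5.0):
    print(f"{z:.1f} sigma -> p = {p_value_from_significance(z):.3g}")
```

Under this convention, the observed 1.4σ corresponds to a p-value of about 8%, whereas the 5.0σ observation threshold corresponds to roughly 3 × 10⁻⁷.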

[Figure 11.11 plot. x-axis: 95% CL upper limit on σ/σSM(t¯tH); rows: Combined, Single Lepton (combined fit), Dilepton (combined fit); legend: Observed, Expected ± 1σ, Expected ± 2σ, Expected (µ = 1); labels: ATLAS, √s = 13 TeV, 36.1 fb−1, mH = 125 GeV.]

Figure 11.11: Shown are the upper limits on the signal strength at 95% CL for the individual channels and the combined result [4]. The individual limits are derived using the same approach as the quoted µ values in Equation 11.3. The black solid line represents the observed limits, while the black (red) dotted line shows the expected limits for the background-only (signal-plus-background) hypothesis. The green (yellow) area depicts the 1σ (2σ) uncertainty range on the expected limits for the background-only hypothesis.

In Figure 11.12, the event yields in data are compared to the combined post-fit prediction for all fifteen analysis regions. They are grouped together and ordered by the signal-to-background ratio in the respective bins of the final discriminants that enter the fit. The predictions are shown for the two fit hypotheses, namely the background-only scenario and the signal-plus-background scenario. The t¯tH signal contribution is scaled to either the observed signal strength or to the upper limit quoted above [4].
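The regrouping of discriminant bins by their signal-to-background ratio can be sketched as follows. All yields are hypothetical placeholders, the bin width of 0.2 in log10(S/B) is an assumption, and the function name `regroup_by_log_sb` is introduced only for illustration; this is not the code used in the analysis.

```python
import math
from collections import defaultdict

def regroup_by_log_sb(bins, width=0.2):
    """Sum the yields of all discriminant bins that fall into the same log10(S/B) interval.

    Each entry of `bins` is a tuple (signal, background, data).
    Returns a dict mapping the lower edge of each log10(S/B) interval to its summed yields.
    """
    grouped = defaultdict(lambda: {"signal": 0.0, "background": 0.0, "data": 0})
    for signal, background, data in bins:
        index = math.floor(math.log10(signal / background) / width)
        grouped[index]["signal"] += signal
        grouped[index]["background"] += background
        grouped[index]["data"] += data
    return {round(index * width, 10): yields for index, yields in sorted(grouped.items())}

# Hypothetical (signal, background, data) yields from bins of different analysis regions.
discriminant_bins = [
    (1.0, 300.0, 310),
    (1.2, 450.0, 440),
    (5.0, 150.0, 160),
    (9.0, 40.0, 47),
]

combined = regroup_by_log_sb(discriminant_bins)
for lower_edge, yields in combined.items():
    print(lower_edge, yields)
```

Bins from different regions with a similar purity end up in the same interval, so the most signal-like bins of all channels are collected at the high end of the combined histogram.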

[Figure 11.12 plot. x-axis: log10(S/B); y-axis: Events / 0.2; lower panel: (Data − Bkgd.) / Bkgd. Unc.; legend: Data, t¯tH (µfit = 0.84), t¯tH (µ95% excl. = 2.0), Background, Bkgd. Unc., Bkgd. (µ = 0); labels: ATLAS, √s = 13 TeV, 36.1 fb−1, t¯tH (b¯b) Combined, Dilepton and Single Lepton, Post-fit.]

Figure 11.12: The combined post-fit event yields of both the dilepton and single lepton channels as a function of log10(S/B) [4]. S (B) corresponds to the number of observed signal (background) events after the fit. In this histogram, the final-discriminant bins in all dilepton and single lepton analysis categories are combined into bins of log10(S/B) in which the signal is normalised to the SM prediction used to determine log10(S/B). The red (orange) area represents the number of signal events scaled to the best-fit value (95% CL exclusion limit) and is stacked on top of the expected background. In addition to this, the ratio of the difference between observed data events and B with respect to the uncertainty on B, in other words the pull on data relative to B, is compared to the pulls from the signal-plus-background hypothesis predictions. These are represented by the solid red line (dashed orange line) referring to the presence of a t¯tH signal with µ = 0.84 (µ = 2.0). The dashed black line shows the pull of B for the background-only hypothesis. The error bars on the data events are purely statistical.

CHAPTER 12

t¯t+b¯b and t¯tH modelling studies

In the previous chapter, the results from the combined profile likelihood fit to data in the search for the t¯tH(H → b¯b) process at √s = 13 TeV are presented. The leading five nuisance parameters in terms of their impact on the uncertainty on the t¯tH signal strength, as shown in Figure 11.10, are the following:

1. The comparison between the nominal t¯t+jets generator Powheg+Pythia8 and an alternative five flavour generator Sherpa5F. The difference between these two generators lies in the ME calculation and the PS and hadronisation model as well as the corresponding matching scheme. Along with the comparison to Powheg interfaced to Herwig7, the uncertainty on the chosen model to calculate the ME can be assessed.

2. The comparison between Powheg+Pythia8 and a four flavour scheme generator setup which is labelled Sherpa4F and used to calculate the t¯t+b¯b ME with NLO precision. This alternative prediction is used to model the t¯t+≥1b background with the best available precision, namely by reweighting the sub-categories (t¯t+b, t¯t+B, t¯t+b¯b and t¯t+≥3b) of the nominal prediction to this four flavour scheme
