Optimisation of observables - Search for top associated Higgs production

5.4.1 Optimisation of observables

The set of observables that are input to the BDT is determined algorithmically. An initial list of observables is generated by computing ∆R, minimal ∆R, ∆φ, ∆η, k_T¹, invariant mass and scalar p_T sum for all combinations of the following reconstructed objects: τ_had, both light leptons, the three leading jets in p_T as well as the last jet, the first two b–tagged jets, and E_T^miss. The p_T of each of these objects is also added. This results in a list of 275 potential observables to serve as input to the BDT. The goal of the algorithm that chooses the final set is to select the smallest number of observables with the best performance. Since the trade-off between these criteria is subjective, an arbitrary choice is made to reduce the set of observables to one dozen.
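As an illustration of two of the pairwise quantities listed above, the following minimal sketch computes ∆R and k_T for a pair of objects. It assumes the standard definition ∆R = sqrt(∆η² + ∆φ²) and reads the footnote's "maximum of 400 GeV" as a cap on the k_T value. The dictionary layout (`pt`, `eta`, `phi`) and the function names are hypothetical and not taken from the analysis code.

```python
import math

K_T_MAX_GEV = 400.0  # cap from the footnote; applying it to k_T itself is an assumption


def delta_phi(phi_a, phi_b):
    """Azimuthal separation wrapped into the interval (-pi, pi]."""
    dphi = (phi_a - phi_b) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(a, b):
    """Angular distance sqrt(d_eta^2 + d_phi^2) between two objects."""
    return math.hypot(a["eta"] - b["eta"], delta_phi(a["phi"], b["phi"]))


def k_t(a, b):
    """k_T = min(pT_a, pT_b) * DeltaR(a, b), capped at K_T_MAX_GEV."""
    return min(min(a["pt"], b["pt"]) * delta_r(a, b), K_T_MAX_GEV)


# Hypothetical objects, pT in GeV.
lepton = {"pt": 50.0, "eta": 0.0, "phi": 0.0}
jet = {"pt": 30.0, "eta": 0.0, "phi": 1.0}
print(delta_r(lepton, jet), k_t(lepton, jet))
```

The wrap in `delta_phi` matters for objects on either side of φ = ±π, which would otherwise appear far apart.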

The algorithm trains a BDT with the full set of observables and discards the least important 20%. The importance of an observable is ranked by counting how often it is used in a tree split, with each split weighted by the gain in separation achieved in the child nodes [136]. The BDT is then trained again on the remaining observables, and the discarding is repeated until a dozen observables remain.
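The elimination loop described above can be sketched as follows. This is a minimal illustration, not the analysis code: `rank_importance` is a hypothetical stand-in for training the BDT and reading off its observable ranking, and rounding the 20% down to an integer is an assumption. With that rounding, the schedule starting from 275 passes through 48 observables, the intermediate iteration shown in Fig. 5.11(a), and ends at exactly 12.

```python
def prune_observables(observables, rank_importance, target=12):
    """Repeatedly drop the least important fifth until `target` observables remain.

    `rank_importance(names)` must return a dict mapping each name to an
    importance value (higher means more important).
    """
    current = list(observables)
    history = [len(current)]
    while len(current) > target:
        importance = rank_importance(current)
        # 20% rounded down, at least one, and never below the target size.
        n_drop = max(1, min(len(current) // 5, len(current) - target))
        current.sort(key=lambda name: importance[name], reverse=True)
        current = current[: len(current) - n_drop]
        history.append(len(current))
    return current, history


def toy_importance(names):
    """Toy stand-in for a BDT ranking: lower-numbered observables rank higher."""
    return {name: -int(name.split("_")[1]) for name in names}


selected, history = prune_observables(
    [f"obs_{i}" for i in range(275)], toy_importance
)
print(history)
```

In a real setting the ranking would be recomputed from a freshly trained BDT in every iteration, since the importance of an observable can change once correlated observables have been removed.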

A ROC curve is a graphical summary of the performance of a classifier and shows the background rejection (1 − background efficiency) versus the signal efficiency [137]. A better classifier therefore has a curve closer to the top right corner. Each point on the ROC curve corresponds to a specific cut on the classifier output. Moving along the curve from left to right, the cut threshold changes from rejecting all background (but keeping no signal either) to rejecting no background (but keeping all events). In a search analysis signal events are rare, so a cut threshold with high signal efficiency is usually chosen. This means that only the right-hand side of the ROC curve is relevant for estimating the performance of a classifier.
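To make the construction concrete, the sketch below builds the (signal efficiency, background rejection) points from two lists of classifier scores and integrates the curve with the trapezoidal rule. It is a simplified, unweighted illustration: the function names are hypothetical, event weights are ignored, and an event is taken to pass the cut when its score is strictly above the threshold.

```python
def roc_curve(signal_scores, background_scores):
    """Return (signal efficiency, background rejection) points.

    Points are ordered from the loosest cut (all events kept) to the
    tightest cut (all events rejected).
    """
    thresholds = sorted(set(signal_scores) | set(background_scores))
    points = [(1.0, 0.0)]  # cut below every score: no background rejected
    for cut in thresholds:
        sig_eff = sum(s > cut for s in signal_scores) / len(signal_scores)
        bkg_rej = sum(b <= cut for b in background_scores) / len(background_scores)
        points.append((sig_eff, bkg_rej))
    return points


def roc_integral(points):
    """Area under the curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x0 - x1) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for three signal and three background events.
points = roc_curve([0.9, 0.6, 0.4], [0.7, 0.3, 0.1])
print(roc_integral(points))
```

A perfectly separating classifier gives an integral of 1, and a classifier that cannot distinguish the two samples gives 0.5.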

Figure 5.11(a) shows the ROC curves for three iterations (first, last and one intermediate). For each iteration there are four curves, one for each BDT of the 4-fold training. When ROC curves have similar shapes, as is the case here, their integral can be used as a figure of merit to compare them. To ascertain that the final set of observables performs well, the area under the ROC curve (ROC integral) is plotted in Fig. 5.11(b) for each iteration of the optimisation. It can be seen that no performance is lost by reducing the set of observables from 275 to a dozen.

[Figure 5.11, panel (a) "ROC curve": background rejection (1 − eff) versus signal efficiency. Legend, average ROC integral per number of observables: 275 observables, 77.76; 48 observables, 78.57; 12 observables, 78.03. Panel (b) "ROC integral": ROC integral [%] versus number of observables.]

Figure 5.11: ROC curves for the first, last, and an intermediate iteration and ROC integrals for all iterations of the optimisation of BDT observables. For each iteration the four folds of the training are shown.

¹ k_T = min(p_T^a, p_T^b) · ∆R(a, b), with a maximum of 400 GeV.

The optimisation results in the set of observables in Tab. 5.7. Two changes are made to the optimised set of a dozen observables. First, the transverse momentum of the leading b–tagged jet is removed because it is strongly correlated with the leading jet p_T. Second, the number of jets and the number of b–tagged jets are added. Because they take integer values, unlike the other observables, which are continuous, they are not included in the optimisation algorithm. The final set therefore contains 13 observables.

The range of momentum-based and jet-counting observables is restricted for the purpose of training the BDT. The restrictions are motivated by aspects of the training algorithm as well as considerations of data/MC agreement.

During training, the BDT algorithm scans the full range of each observable for optimal cut thresholds. With a larger range the search space grows, and so does the distance between the cut thresholds that are considered. The momentum-based observables have long, continuously falling distributions. The allowed values for these are therefore restricted so as to essentially remove the tail. All events are still used: an event whose value exceeds the maximum is assigned the maximum rather than its actual value. Capping the allowed range prevents the search for cut thresholds from becoming too coarse. Furthermore, little information about tt̄H is expected to be found at large values of p_T.

The allowed number of (b–tagged) jets is capped at 6 (2) for a different reason. Large jet multiplicities are caused by higher-order QCD radiation. In MC, which is used to train the BDT, this radiation is partially approximated by parton shower algorithms, which may not model data well. This region of potential mismodelling is avoided by capping the values.
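The two restrictions discussed above can be illustrated with a short sketch. The jet caps of 6 and 2 are taken from the text; the momentum cap, the uncapped range and the number of scanned thresholds `n_cuts` are hypothetical values chosen only to show the effect, and the uniform threshold grid is an assumption about how the range is scanned.

```python
N_JETS_CAP = 6   # from the text
N_BJETS_CAP = 2  # from the text


def cap(value, maximum):
    """Keep the event but replace values above `maximum` by `maximum`."""
    return min(value, maximum)


def threshold_spacing(lower, upper, n_cuts):
    """Distance between candidate cuts on a uniform grid over [lower, upper]."""
    return (upper - lower) / n_cuts


# Hypothetical numbers: a p_T tail reaching 1000 GeV versus a cap at 300 GeV.
print(threshold_spacing(0.0, 1000.0, 20))  # coarse grid over the full tail
print(threshold_spacing(0.0, 300.0, 20))   # finer grid after capping
print(cap(8, N_JETS_CAP), cap(3, N_BJETS_CAP))
```

Capping rather than discarding keeps the event yield unchanged, so the training sample does not shrink.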

All BDT inputs and the output are shown in Figs. 5.12 to 5.15. All distributions show good agreement between prediction and data. Kolmogorov-Smirnov tests between the BDT outputs obtained from seen and unseen data also show that no overtraining is present in this BDT.
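The quantity at the heart of such an overtraining check is the two-sample Kolmogorov-Smirnov statistic, the largest distance between the empirical cumulative distributions of the two sets of BDT outputs. The sketch below computes only this statistic for unweighted samples. It is a simplified illustration with hypothetical names and values, not the test as implemented in the analysis, and it omits both event weights and the conversion to a p-value.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Largest distance between the empirical CDFs of two unweighted samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, value) / len(a)
        cdf_b = bisect.bisect_right(b, value) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Hypothetical BDT outputs on events seen and not seen during training.
seen = [-0.4, -0.1, 0.2, 0.5]
unseen = [-0.3, 0.0, 0.1, 0.6]
print(ks_statistic(seen, unseen))
```

A small statistic means the output distributions on seen and unseen events are compatible, which is what the absence of overtraining looks like in this check.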

Observable                                                        Abbreviation used in figures

Lepton properties
Invariant mass of light lepton pair                               m_ℓℓ
Sum p_T of light leptons
p_T of τ_had                                                      τ_had p_T

Jet properties
Leading jet p_T
Sum p_T of jets
Sum p_T of b–tagged jets
Number of jets
Number of b–tagged jets

Angular distances
Smallest ∆R distance between a light lepton and a jet             min ∆R(lj)
Smallest ∆R distance between a light lepton and a b–tagged jet    min ∆R(lb)
Smallest ∆R distance between a non-tagged jet and a b–tagged jet  min ∆R(jb)
∆R distance between the leading light lepton and the τ_had        ∆R(l₀, τ_had)
∆R distance between the sub-leading light lepton and the τ_had    ∆R(l₁, τ_had)

Table 5.7: Chosen set of observables for BDT.

5.4 Suppression of tt̄ background using BDT

[Figure 5.12, six panels for √s = 13 TeV, 36.1 fb⁻¹ in the 2ℓ OS + 1τ_had region. Each panel shows Events / bin and a Data / Pred. ratio; the tt̄H signal is scaled by a factor of 20. x-axes: τ_had p_T [GeV], leading jet p_T [GeV], number of jets (capped at 6), number of b-tagged jets (capped at 2), sum p_T of jets [GeV], sum p_T of b-tagged jets [GeV].]

Figure 5.12: BDT input observables.

Top row: Leading tau p_T. Leading jet p_T.

Middle row: Number of jets. Number of b–tagged jets.

Bottom row: Scalar sum of jet p_T. Scalar sum of b–tagged jet p_T.

[Figure 5.13, six panels for √s = 13 TeV, 36.1 fb⁻¹ in the 2ℓ OS + 1τ_had region. Each panel shows Events / bin and a Data / Pred. ratio; the tt̄H signal is scaled by a factor of 20. x-axes: min ∆R(lj), min ∆R(lb), min ∆R(jb), m_ℓℓ [GeV], ∆R(l₀, τ_had), ∆R(l₁, τ_had).]

Figure 5.13: BDT input observables.

Top row: Minimum ∆R between a lepton and a jet. Minimum ∆R between a lepton and a b–tagged jet.

Middle row: Minimum ∆R between a jet and a b–tagged jet. Invariant mass of the dilepton system. Here the Z–veto is clearly visible.

Bottom row: ∆R between leading lepton and tau. ∆R between subleading lepton and tau.


[Figure 5.14, one panel for √s = 13 TeV, 36.1 fb⁻¹ in the 2ℓ OS + 1τ_had region: Events / bin and a Data / Pred. ratio versus the sum p_T of light leptons [GeV]; the tt̄H signal is scaled by a factor of 20.]

Figure 5.14: BDT input observable. Scalar sum of lepton p_T.

[Figure 5.15, one panel for √s = 13 TeV, 36.1 fb⁻¹ in the 2ℓ OS + 1τ_had region: Events / bin and a Data / Pred. ratio versus the BDT output, which ranges from −1 to 1; the tt̄H signal is scaled by a factor of 20.]

Figure 5.15: BDT output in 2ℓ(OS)1τ_had.
