
shape as do the ratios S/B. In accordance with the efficiencies seen in figure 5.3, the significances have a similar shape across categories. The significance is lower by a factor of around 0.7 in the 4 b-tagged category, and this factor increases for higher masses. One explanation for this increase is the behavior of the efficiencies seen in figure 5.3, where the efficiency for the 4 b-tagged category does not fall as fast as that for the 3 b-tagged category. This results in a relatively faster increase of S for the 4 b-tagged jet category than for the other category.

5.3 Novosibirsk function and background fit

The background model is taken from measured data. The vast majority of the background arises from QCD events, and Monte Carlo simulated QCD events are possibly not as accurate as this data-driven method. A further benefit is the smaller statistical uncertainty, because the data contain more events than the Monte Carlo samples. In fact, the 3 b-tagged jet category has so many events in the control region that its distributions had to be pre-scaled by randomly selecting only 10% of the triggered events, so that the number of events in the control region is of roughly the same order of magnitude as in the signal region. With fewer events the uncertainties rise, which makes the distribution easier to fit, as less structure is visible.
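A minimal sketch of such a pre-scale, assuming each event is kept independently with a probability of 10%. How the random selection was actually implemented is not stated in the text; the event list and the seed are hypothetical.

```python
import random


def prescale(events, keep_fraction=0.10, seed=12345):
    """Keep each event independently with probability keep_fraction."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return [event for event in events if rng.random() < keep_fraction]


# Hypothetical stand-in for the triggered control-region events.
events = list(range(100_000))
kept = prescale(events)
print(f"kept fraction: {len(kept) / len(events):.3f}")
```

A random selection of this kind reduces the number of events without changing the expected shape of the di-jet mass distribution.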

A comparison between the di-jet mass distributions of the control regions of both categories can be seen in figure 5.5. The two distributions differ little in shape, which means there is little to no bias from the different selections of the two categories. The number of events in the 4 b-tagged jet category's control region is generally lower by a factor of 0.2 to 0.3.

For masses higher than 1200 GeV the uncertainties go up due to the relatively low number of events in the 4 b-tagged jet category.

In order to produce a smooth di-jet mass distribution free from statistical fluctuations, the resulting distribution from the control region is fitted with the turn-on Novosibirsk function F(M12).

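This function is given by

\[ F(M_{12}) = f(M_{12}) \cdot g(M_{12}) \]

with the turn-on

\[ f(M_{12}) = \frac{1}{2} \left[ \mathrm{Erf}\left( p_{\mathrm{slope}} \cdot (M_{12} - p_{\mathrm{turnon}}) \right) + 1 \right] \]

where Erf is the Gaussian error function

\[ \mathrm{Erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, \mathrm{d}t \]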

[Figure 5.5 plot. Upper panel: di-jet mass distributions of both control regions in arbitrary units (a. u.); lower panel: ratio of the event counts of the two categories; mass axis from 0 to 1600.]

Figure 5.5: Comparison of background events between both categories. Both distributions have been scaled to the same value for display in the upper part. The lower part shows the ratio between the two (without the scaling of the upper part). Note that the bbnb distribution has been pre-scaled as described in the text. For masses below 300 GeV there are fewer events because of the trigger selection.

Here p_slope and p_turnon are the parameters of the turn-on f, and p_peak, p_tail and p_width are those of the Novosibirsk function g; all five are free parameters determined by the fit. The turn-on function is able to reproduce the peak at low masses originating from the triggers, while the Novosibirsk function works well to model the dependence up to high masses in the distribution.
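As a minimal sketch of how F(M12) could be evaluated: the turn-on follows the formula given above, while the Novosibirsk factor uses a commonly used parametrisation (as in, for example, RooFit's RooNovosibirsk). That choice is an assumption and not necessarily the exact form used in this analysis. The shape is left unnormalised, and the parameter values at the bottom are hypothetical, not fit results.

```python
import math

# Constant of the ASSUMED Novosibirsk parametrisation: 2 * sqrt(ln 4).
XI = 2.0 * math.sqrt(math.log(4.0))


def turn_on(m12, p_slope, p_turnon):
    """Turn-on f(M12) = 1/2 * [Erf(p_slope * (M12 - p_turnon)) + 1]."""
    return 0.5 * (math.erf(p_slope * (m12 - p_turnon)) + 1.0)


def novosibirsk(m12, p_peak, p_width, p_tail):
    """Unnormalised Novosibirsk shape g(M12) (assumed parametrisation)."""
    if abs(p_tail) < 1e-7:
        # Limit of a vanishing tail parameter: a Gaussian around the peak.
        return math.exp(-0.5 * ((m12 - p_peak) / p_width) ** 2)
    arg = 1.0 - (m12 - p_peak) * p_tail / p_width
    if arg <= 0.0:
        # Outside the domain of the logarithm the shape is set to zero.
        return 0.0
    sigma0 = (2.0 / XI) * math.asinh(p_tail * XI / 2.0)
    return math.exp(-0.5 * (math.log(arg) / sigma0) ** 2 - 0.5 * sigma0 ** 2)


def background_shape(m12, p_slope, p_turnon, p_peak, p_width, p_tail):
    """F(M12) = f(M12) * g(M12)."""
    return turn_on(m12, p_slope, p_turnon) * novosibirsk(m12, p_peak, p_width, p_tail)


# Hypothetical parameter values, chosen only to give a falling spectrum.
params = dict(p_slope=0.02, p_turnon=250.0, p_peak=270.0, p_width=60.0, p_tail=-0.8)
for mass in (240.0, 300.0, 600.0, 1200.0, 1700.0):
    print(mass, background_shape(mass, **params))
```

With a negative tail parameter the shape falls off slowly towards high masses, which is the behavior needed for the di-jet mass spectrum.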

This function has no physical meaning in this context and is used only as an analytical tool for subtracting the background. For this reason bias tests are necessary, but due to time constraints they are not performed in this thesis. The fit is performed over a wide mass range from 240 GeV to 1700 GeV, and χ² as well as the p-value are used to check the validity of the fit.

Ideally χ²/ndf is close to 1, and the p-value should not be small, since a small p-value indicates that the function describes the data poorly.
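The relation between χ², the number of degrees of freedom and the p-value can be illustrated with a short sketch. It assumes that the p-value is the upper tail probability of the χ² distribution, and evaluates it with a series expansion of the incomplete gamma function so that only the standard library is needed. The inputs are the χ² and ndf values shown in the fit plots for the two categories.

```python
import math


def chi2_p_value(chi2, ndf):
    """Upper tail probability of a chi-square distribution with ndf degrees of freedom."""
    a, x = 0.5 * ndf, 0.5 * chi2
    if x <= 0.0:
        return 1.0
    # Series expansion of the regularised lower incomplete gamma function P(a, x).
    term = 1.0 / a
    total = term
    denominator = a
    for _ in range(10_000):
        denominator += 1.0
        term *= x / denominator
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower


for chi2, ndf in [(65.9, 67), (77.9, 68)]:
    print(f"chi2/ndf = {chi2 / ndf:.2f}, p-value = {chi2_p_value(chi2, ndf):.2f}")
```

The series is adequate for p-values of moderate size as encountered here; for very small p-values the subtraction from 1 loses precision and a dedicated library routine would be preferable.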

The results can be seen in figure 5.6 for the 4 b-tagged jet category and in figure 5.7 for the 3 b-tagged jet category. For the 4 b-tagged jet category the χ²/ndf is approximately equal to 1 and the p-value is 0.51, while the fit for the 3 b-tagged jet category is a bit worse, with a χ²/ndf of 1.1 and a p-value of only 0.19. One can also see some structure in the pulls for masses between 400 GeV and 800 GeV, which is, however, not significant. This difference between the two categories might be caused by the higher number of events in the 3 b-tagged jet category, and it is possible that with more events in the 4 b-tagged jet category some structure in the pulls would arise as well. Better fits might be achieved by using an extended Novosibirsk function or by splitting the fit into sub-ranges and fitting low masses and high masses separately. The extended Novosibirsk function would use an additional term that is quadratic in the argument of the exponential function, so this option involves an additional free parameter. The latter option involves splitting the fit, which can result in a discontinuity later on.

After the fits are done, histograms with the same binning as the ones created from Monte Carlo are created using the fits. This results in the templates later used for obtaining the limits as described in chapter 7.
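One way to turn a fitted function into a histogram template is sketched below. It assumes that each bin is filled with the function value at the bin centre multiplied by the bin width (a midpoint approximation of the integral); the method actually used is not specified in the text. The fit function and the 20 GeV binning are hypothetical placeholders.

```python
import math


def make_template(fit_function, bin_edges):
    """Return one expected yield per bin using the midpoint rule."""
    contents = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        centre = 0.5 * (low + high)
        contents.append(fit_function(centre) * (high - low))
    return contents


def toy_fit(m12):
    """Hypothetical falling shape standing in for the fitted function."""
    return 1000.0 * math.exp(-m12 / 300.0)


# Assumed binning: 240 GeV to 1700 GeV in steps of 20 GeV (74 edges, 73 bins).
bin_edges = [240.0 + 20.0 * i for i in range(74)]
template = make_template(toy_fit, bin_edges)
print(len(template), template[0], template[-1])
```

Using the same bin edges as the Monte Carlo histograms keeps the background template directly comparable with the signal templates.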

[Figure 5.6 plots, panels (a) and (b). CMS Work in progress; χ²/ndf = 65.9/67 = 1.0; mass axis from 400 to 1600; lower panels: pulls.]

Figure 5.6: Background fit for the 4 b-tagged jet category in linear (a) and logarithmic scale (b). Black dots depict measured data while the red line is the turn-on Novosibirsk function fit. The pulls represent the difference between data and fit divided by the uncertainty.
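The pulls described in the caption can be computed per bin as in the following sketch. The bin contents are hypothetical, and the use of Poisson uncertainties (the square root of the observed count) is an assumption for illustration.

```python
import math


def compute_pulls(data, fit, uncertainty):
    """Pull per bin: (data - fit) / uncertainty."""
    return [(d - f) / u for d, f, u in zip(data, fit, uncertainty)]


data = [105.0, 98.0, 87.0, 70.0]            # hypothetical observed bin contents
fit = [100.0, 95.0, 90.0, 72.0]             # hypothetical fit values per bin
uncertainty = [math.sqrt(n) for n in data]  # assumed Poisson uncertainties

bin_pulls = compute_pulls(data, fit, uncertainty)
chi2 = sum(pull * pull for pull in bin_pulls)  # chi-square is the sum of squared pulls
print(bin_pulls, chi2)
```

For a good fit the pulls scatter around zero without visible structure, mostly within about two units.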

[Figure 5.7 plots, panels (a) and (b). CMS Work in progress; χ²/ndf = 77.9/68 = 1.1; mass axis from 400 to 1600; lower panels: pulls.]

Figure 5.7: Background fit for the 3 b-tagged jet category in linear (a) and logarithmic scale (b). Black dots depict measured data while the red line is the turn-on Novosibirsk function fit. The pulls represent the difference between data and fit divided by the uncertainty.


Chapter 6

Comparison between the two b-tagging algorithms CSVv2 and DeepCSV

Besides the b-tagging algorithm DeepCSV, which this thesis mainly uses, another frequently used, older algorithm is CSVv2, which uses the same set of observables [7]. To evaluate their respective performance for jets of different flavors, events produced by Monte Carlo simulation, which include information about the actual jet flavor, are used. In particular, these are top-antitop events, which yield a large amount of QCD decays similar to the background in this analysis. The results can be seen in figure 6.1.

For the medium working point and bottom quark jets, DeepCSV clearly shows a higher efficiency than CSVv2, with the difference becoming even greater at higher pT. Around 180 GeV DeepCSV reaches an efficiency close to 0.7. Overall, both algorithms show a decreasing efficiency towards higher pT. Charm quark jets are falsely identified as b jets with an efficiency of a little over 0.1. While both algorithms are similar for this flavor, below 300 GeV DeepCSV yields better results than CSVv2, because its mistag efficiency is lower by around 0.03. For jets of other flavors, the algorithms show little difference at this working point. This is different for the loose working point, where DeepCSV shows a higher efficiency for light flavor and charm quark jets (falsely identified as b jets), which means a higher fake rate. In the interval from 400 GeV to 600 GeV the difference is around 0.05 for light flavor and around 0.1 for charm quark jets. For the correctly tagged bottom quark jets, however, DeepCSV still shows better results. At the tight working point, light flavor and charm quark jets are suppressed and the efficiencies again show little difference between the algorithms. For the bottom quark jets DeepCSV again outperforms CSVv2.
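The efficiencies discussed here are ratios of tagged to all jets per pT bin and true flavor. A minimal sketch of that bookkeeping, with a hypothetical jet record of the form (pT, true flavor, tagged) and hypothetical bin edges:

```python
from collections import defaultdict


def tagging_efficiency(jets, pt_edges):
    """Fraction of b-tagged jets per (true flavor, pT bin index)."""
    tagged = defaultdict(int)
    total = defaultdict(int)
    for pt, flavor, is_tagged in jets:
        for index, (low, high) in enumerate(zip(pt_edges[:-1], pt_edges[1:])):
            if low <= pt < high:
                total[(flavor, index)] += 1
                tagged[(flavor, index)] += int(is_tagged)
                break
    return {key: tagged[key] / total[key] for key in total}


# Hypothetical simulated jets: (pT in GeV, true flavor, passed the b-tag).
jets = [
    (50.0, "b", True), (60.0, "b", True), (70.0, "b", False),
    (55.0, "c", False), (150.0, "b", True), (160.0, "udsg", False),
]
efficiency = tagging_efficiency(jets, pt_edges=[0.0, 100.0, 200.0])
print(efficiency)
```

For bottom quark jets a high value is desirable, whereas for charm and light flavor jets the same quantity is the fake rate and should be low.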

Figure 6.2 shows a check of whether any bias arises from the newer DeepCSV algorithm compared to the older CSVv2. The ratio between the mass distributions of the background events selected with these two algorithms is approximately equal to 1, without any strong structure deviating from 1. This makes it plausible to assume that there is in fact no bias from the use of DeepCSV compared to CSVv2.

Figure 6.1: Ratio between the number of b-tagged events and the total number of events for CSVv2 and DeepCSV at the medium (a), loose (b) and tight (c) working points as a function of the transverse momentum pT of the jet, for jets of various flavors.

[Figure 6.2 plot. CMS Work in progress, 36.26 fb^-1 (13 TeV). Legend: Four b-jets DeepCSV, Four b-jets CSVv2. Upper panel: a. u. versus di-jet mass in GeV (0 to 1600); lower panel: ratio of the event counts of the two algorithms.]

Figure 6.2: Comparison of background events of the 3 b-tagged jets category between CSVv2 and DeepCSV. Both distributions have been scaled to the same value for display in the upper part. The lower part shows the ratio between the two (without the scaling of the upper part).

Comparisons between the resulting efficiencies of both categories from Monte Carlo events can be seen in figure 6.3. For both categories one can see that with DeepCSV the efficiencies increase by a factor between 1.1 and 1.4 compared to CSVv2, with a mean around 1.2. Especially for higher masses (above 800 GeV) DeepCSV seems to give better results. This agrees with the earlier observation that the improvement gained by DeepCSV becomes larger at higher pT. There is no noticeable difference in shape between the two categories.

Finally, the significance estimation S/√B is compared in figure 6.4. While there is still an increase in significance for DeepCSV, the factor by which it increases is only around 1.1, which is lower than the efficiency increase discussed before. This is to be expected, as the number of events in the background control region also increases due to the higher efficiency. Assuming the same factor of 1.2 for the control region, one immediately obtains the factor 1.2/√1.2 ≈ 1.1 for the significance estimation. The efficiency for the control region can rise easily because of the increased false identification rate of DeepCSV at the loose working point, which is the working point used for the control region.
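Written out, under the assumption stated above that the signal yield S and the background yield B both scale by the same factor of 1.2 (primed quantities denote DeepCSV, unprimed ones CSVv2; this notation is introduced only for this illustration):

```latex
\frac{S'}{\sqrt{B'}}
  = \frac{1.2\,S}{\sqrt{1.2\,B}}
  = \sqrt{1.2}\,\frac{S}{\sqrt{B}}
  \approx 1.10\,\frac{S}{\sqrt{B}}
```

More generally, a common efficiency gain k in signal and background improves S/√B only by √k.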

[Figure 6.3 plots, panels (a) and (b). Upper panels: efficiency (axis range 200 to 1400); lower panels: efficiency ratio of DeepCSV to CSVv2.]

Figure 6.3: Efficiencies of the signal region of the two algorithms CSVv2 and DeepCSV for the 4 b-tagged jets category (a) and the 3 b-tagged jet category (b). The lower part shows the efficiency of DeepCSV divided by the one of CSVv2.

[Figure 6.4 plots, panels (a) and (b). Upper panels: significance (axis range 200 to 1400); lower panels: significance ratio of DeepCSV to CSVv2.]

Figure 6.4: Significance estimations S/√B of CSVv2 and DeepCSV for the 4 b-tagged jet category (a) and the 3 b-tagged jet category (b). The lower part shows the significance estimation of DeepCSV divided by the one of CSVv2.

Chapter 7

Results on the expected limits

7.1 Systematic uncertainties

Various systematic uncertainties arise from the methods used in this analysis. The following have been considered before computing the limits:

• Five uncertainties arise from the free parameters of the background fit. The values of these uncertainties are obtained as a result of the fit and are applied as shape uncertainties obtained by varying each affected parameter.

• The signal is affected by an uncertainty on the online b-tagging, which is assumed to be log-normally distributed with a relative uncertainty of 5%.

• Likewise, the signal luminosity is assumed to be log-normally distributed and to have a relative uncertainty of 3%.

• The b-tagging scale factors applied as described in section 5.1 have uncertainties. Their values depend on the pT of the jets and are therefore applied as a shape uncertainty.

• The jet energy scale, which is a scaling of the calorimeter measurements, comes with an uncertainty and is also applied as a shape uncertainty.

• Further uncertainties include those from sources such as bias as mentioned in section 5.3, jet energy resolution, pile-up reweighting and efficiencies. These (aside from the bias uncertainties) are very likely to have an impact of less than 3% on the results and have also been omitted in this thesis.
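How the two log-normal uncertainties in the list act on the signal yield can be sketched as follows. The convention assumed here, in which a relative uncertainty of 5% multiplies the yield by 1.05^θ with a standard-normal nuisance parameter θ, is common in limit-setting tools but is not spelled out in the text. The nominal yield and the seed are hypothetical.

```python
import random


def lognormal_scale(kappa, theta):
    """Multiplicative yield factor for a log-normal nuisance (assumed convention)."""
    return kappa ** theta


def sample_signal_yield(nominal, kappas, rng):
    """Draw one signal yield with independent log-normal variations applied."""
    scale = 1.0
    for kappa in kappas:
        scale *= lognormal_scale(kappa, rng.gauss(0.0, 1.0))
    return nominal * scale


rng = random.Random(1)
kappas = [1.05, 1.03]  # online b-tagging (5%) and luminosity (3%)
yields = [sample_signal_yield(100.0, kappas, rng) for _ in range(10_000)]
median = sorted(yields)[len(yields) // 2]
print(f"median yield: {median:.1f}, min: {min(yields):.1f}, max: {max(yields):.1f}")
```

A log-normal form keeps the varied yield positive for any value of θ, which is one reason it is often preferred over a Gaussian for normalisation uncertainties.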