Bachelorarbeit

Deutsches Elektronen-Synchrotron, Hamburg

Sensitivity Study for the B+ → K*(892)+ µ+µ− Decay at the Belle II Experiment

Jasper Riebesehl

Gutachter: Prof. Dr. Caren Hagner, Universität Hamburg & Dr. Alexander Glazov, Deutsches Elektronen-Synchrotron

Abstract

With B meson decays it is possible to probe the Standard Model of particle physics without the need to investigate high energy scales. In previous analyses, the quark transition b → s ℓ+ℓ− showed signs of New Physics in kinematic observables that deviated from the Standard Model predictions. Among others, the decay B+ → K*(892)+ µ+µ− is particularly suited for this search since it is highly suppressed in the Standard Model. New Physics effects could therefore have amplitudes similar to Standard Model effects, which makes them detectable.

In this thesis the sensitivity of the Belle II experiment to this particular decay is determined.

Using simulated particle decays, the number of signal and background candidates is estimated for data sets with various integrated luminosities chosen to be relevant for the Belle II data taking period. A new B meson reconstruction is presented which uses the Belle II Analysis Framework and a boosted decision tree algorithm to classify individual events by their likelihood of being a signal event. This boosted decision tree is trained using 29 observables which are derived from the particle decay and show separation between the shapes of signal and background events.

The presented analysis outperforms the predecessor Belle analysis: the reconstruction efficiency is doubled and the expected amount of background is reduced by a factor of eight.


Zusammenfassung

Durch die Untersuchung von B-Mesonen-Zerfällen ist es möglich, die Voraussagen des Standardmodells der Teilchenphysik zu überprüfen, ohne auf hohe Energieskalen ausweichen zu müssen. In vorherigen Analysen von Zerfällen mit dem Quark-Übergang b → s ℓ+ℓ− wurden bereits Anzeichen für Neue Physik in mehreren kinematischen Observablen gefunden. Neben anderen bietet sich der Zerfall B+ → K*(892)+ µ+µ− für die Suche nach Neuer Physik an, da er im Standardmodell stark unterdrückt ist. Effekte jenseits des Standardmodells könnten daher ähnliche Amplituden haben, was sie detektierbar macht.

In dieser Arbeit wird die Sensitivität des Belle II Experiments gegenüber diesem Zerfall ermittelt. Mit simulierten Teilchenzerfällen wird die Anzahl von Signal- und Hintergrundkandidaten für Datensätze mit verschiedenen integrierten Luminositäten ermittelt, welche relevant für den Belle II Datennahme-Zeitplan gewählt sind. Eine neue B-Mesonen-Rekonstruktion, die das Belle II Analysis Framework und einen Boosted-Decision-Tree-Algorithmus verwendet, wird vorgestellt. Dieser Boosted Decision Tree wird mit 29 Variablen trainiert, welche aus dem Zerfall berechnet werden und eine Auftrennung zwischen den Signal- und Hintergrundkandidaten zeigen.

Die präsentierte Analyse übertrifft die Vorgängeranalyse von Belle insofern, als die Rekonstruktionseffizienz verdoppelt und die Anzahl der Hintergrundkandidaten um einen Faktor acht verringert werden konnte.


Contents

1. Introduction
2. Theory Overview
2.1. Flavor Structure in the Standard Model
2.2. New Physics Sensitivity of B+ → K*(892)+ µ+µ−
3. Experimental Setup
3.1. Introduction
3.2. SuperKEKB
3.3. The Belle II Detector
3.3.1. Vertex Detectors
3.3.2. Central Drift Chamber
3.3.3. Particle Identification
3.3.4. Electromagnetic Calorimeter
3.3.5. KL and Muon Detection
3.3.6. Detector Solenoid
3.4. Data Taking Period
3.5. Belle II Analysis Framework
3.5.1. Python Steering
3.5.2. Monte Carlo Simulation
4. Analysis
4.1. B Meson Reconstruction
4.1.1. Selection Criteria
4.2. Machine Learning
4.2.1. Multivariate Analysis Features
4.2.2. Best Candidate Selection
4.3. Efficiency Estimation
4.3.1. Receiver Operator Characteristics
4.3.2. Figure of Merit
4.4. Results
5. Conclusion
A. Appendix
A.1. Features


1. Introduction

On a fundamental level, the Standard Model of particle physics is one of the most successful and most thoroughly tested theories in physics. Since the second half of the last century it has been built up into the theory it is today. It predicted, with great accuracy, the existence of the gluon, the massive W± and Z0 bosons, and the charm, bottom, and top quarks before their experimental discovery.

It is, however, not complete, as it leaves some phenomena unexplained. The most fundamental one, apart from not including gravity, is probably the imbalance of matter and antimatter in the universe, which cannot be explained by the allowed sources of CP violation in the Standard Model alone. Other sources of CP violation must lie beyond the Standard Model, showing that the theory needs to be expanded.

To come closer to the grand goal of particle physics, a theory of everything that explains all phenomena, two complementary experimental approaches exist. At the energy frontier, confirmation of the existence of new particles is provided by producing them directly at energy scales of up to 14 TeV. One example of this is the discovery of the Higgs boson at the LHC in 2012.

This example illustrates how theory, which predicted the existence of the particle as early as the 1960s, and experiments worked together to gain knowledge about the nature of the universe.

On the intensity or precision frontier, hints of new particles are gathered indirectly. Electroweak processes that contain flavor changing neutral currents are highly susceptible to influences of particles that arise in theoretical New Physics models. By detecting deviations from Standard Model predictions it is possible to learn about those particles without creating them directly. This method can be sensitive to new particles with masses up to order O(100 TeV).

The Belle II experiment at SuperKEKB pursues the intensity approach. It is a B factory, creating ϒ(4S) mesons that decay into BB̄ meson pairs without other particles that contaminate the event, providing a clean and controlled analysis environment.

The focus of this thesis lies on a specific B meson decay. The rare B+ → K*(892)+ µ+µ− decay has a branching ratio of only B(B+ → K*(892)+ µ+µ−) = 9.6 · 10⁻⁷, less than one in a million, which creates the necessity to analyze a large amount of data to be statistically relevant. This decay proves interesting for multiple reasons: it features a forward-backward asymmetry, several kinematic variables that indicate New Physics and, together with its sister decay B+ → K*(892)+ e+e−, the possibility to measure whether lepton universality is broken.

The goal of this thesis is to create a B meson reconstruction method that provides both good efficiency and purity. With it, the reconstruction efficiency for different integrated luminosities that are meaningful for the Belle II data taking period is estimated.


Outline In section 2, a brief overview of the Standard Model is given, followed by an explanation of the flavor structure in the Standard Model and of the specific decay studied in this thesis.

Section 3 contains information about the experimental setup, in particular how the Belle II detector is structured and how the analysis of data is done in the Belle II software framework.

Section 4, the main chapter, describes the data analysis. Every step is detailed from start to finish to illustrate the process. In section 5 a conclusion is given.


CHAPTER 2. THEORY OVERVIEW

for the weak interaction: the neutral Z0 and the charged W±. While the Z0 can interact with every particle in the Standard Model, the W± can only interact with charged ones. Only W± bosons are capable of changing quark flavor; the neutral Z0 is not. Because of charge conservation, processes like b → s are not possible at tree level, since b and s have the same electric charge.

This is known as the GIM (Glashow, Iliopoulos and Maiani) mechanism, which forbids these Flavor Changing Neutral Currents at tree level. That is to say, the neutral gauge bosons (γ, g, Z0) cannot have flavor-changing interactions with fermions. Those processes can only occur, highly suppressed, in higher-order loop diagrams.

Originally it was thought that a change of flavor could only occur within one generation, for example in the process u → d + W+. However, the mass eigenstates of quarks are not the same as the weak interaction eigenstates, which introduces the mixing of quark flavors in the mass eigenstates of the down-type quarks. This translates into a probability of an inter-generation transition like s → u + W− happening.

This mixing is described by the CKM matrix V_CKM, named after Cabibbo, Kobayashi and Maskawa. It is complex-valued, unitary and has four free parameters. Three of these are mixing angles between different quark generations and one is a CP-violating phase.

( d )          ( d )        ( V_ud  V_us  V_ub ) ( d )
( s )        = V_CKM ( s )        = ( V_cd  V_cs  V_cb ) ( s )
( b ) weak     ( b ) mass   ( V_td  V_ts  V_tb ) ( b ) mass    (2.1)

            ( 0.974  0.225  0.004 )
|V_CKM| ≈   ( 0.225  0.973  0.041 )
            ( 0.009  0.040  0.999 )    (2.2)

The probability of a quark transition is proportional to the magnitude of the corresponding matrix element squared. Eq. 2.2 is taken from [3], where each element is determined and averaged from different processes.

The diagonal elements correspond to transitions within a single generation (like u → d + W+). Their magnitudes are close to one, making these kinds of transitions the most likely. The transition u → s + W+, for example, is less likely since |V_us| is only 0.225.
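As a quick numerical illustration, the relative transition probabilities implied by eq. 2.2 follow directly from the squared magnitudes. The dictionary below simply transcribes the approximate values quoted above; the helper function is invented for illustration.

```python
# Relative quark-transition probabilities from the approximate CKM
# magnitudes of eq. 2.2; the probability weight of a transition is |V_ij|^2.
V = {
    ("u", "d"): 0.974, ("u", "s"): 0.225, ("u", "b"): 0.004,
    ("c", "d"): 0.225, ("c", "s"): 0.973, ("c", "b"): 0.041,
    ("t", "d"): 0.009, ("t", "s"): 0.040, ("t", "b"): 0.999,
}

def transition_weight(up, down):
    """Probability weight of the charged-current transition between up and down."""
    return V[(up, down)] ** 2

# Within-generation transitions dominate over cross-generation ones:
print(round(transition_weight("u", "d"), 3))   # -> 0.949
print(round(transition_weight("u", "s"), 3))   # -> 0.051
```

The roughly factor-twenty difference between the two weights is exactly the within-generation dominance described in the text.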

2.2. New Physics Sensitivity of B+ → K*(892)+ µ+µ−

The decay B+ → K*+ µ+µ− featured in this thesis contains the flavor changing neutral current process b → s and is therefore forbidden in the Standard Model at tree level. While no tree level Feynman diagram can be found, it is possible to find diagrams that include a loop. Feynman diagrams with loops are of higher order and usually suppressed compared to tree level diagrams.

The suppression is caused by the high mass of the W± boson and the flavor change between generations. Fig. 2.2 shows three different diagrams, two of which are possible in the Standard Model.

Both Standard Model diagrams contain at least one W± with two interaction vertices where a change of flavor occurs. Fig. 2.2 (a) displays a penguin diagram in which the b̄ quark decays into an up-type anti-quark and a W. The W emits a γ or Z0 that decays into the two muons, and afterwards interacts with the up-type anti-quark to form an s̄ quark. In fig. 2.2 (b) two separate W± interact with the b̄ quark and the up-type anti-quark. The W+ decays into a µ+ and a muon neutrino νµ; the νµ connects to the second muon vertex, where the other µ and the W− meet. Fig. 2.2 (c) displays a diagram that is possible in the Minimal Supersymmetric Standard Model [4], a New Physics model. It is very similar to (b), only the two W± are exchanged for two charged Higgs bosons.

Since the Standard Model suppression is large and the branching ratio is small, contributions of potential New Physics effects could have an order of magnitude comparable to Standard Model effects.

Figure 2.2.: Feynman diagrams for B+ → K*+ µ+µ−: (a) penguin diagram, (b) W+W− box diagram, (c) Minimal Supersymmetric Standard Model diagram. Credit for templates: Simon Wehle, [5]

One of the earliest indications that B+ → K*(892)+ µ+µ− is sensitive to New Physics effects was the measurement of the forward-backward asymmetry A_FB. It expresses that a different number of decay products have a momentum in the forward direction with respect to the K*+ than in the backward direction. This is caused by two interfering penguin diagrams in which only the intermediate boson, a γ or a Z0, is different. For certain regions of the invariant

mass of the lepton pair, q², A_FB agrees both with the Standard Model prediction and with predictions where certain Wilson coefficients were adjusted. Wilson coefficients are used to theoretically describe the kinematics of the decay in an ansatz called the effective Hamiltonian.

The adjustment would simulate how New Physics effects could influence the decay [6, 7].

The theoretical uncertainties of A_FB are fairly high because of hadronic form factor uncertainties; therefore the decay is described differently. Kinematically, four variables are needed to form a basis that describes the decay. A common choice is q² together with three different angles between the decay products. With several transformations a basis can be found that is mostly independent of hadronic uncertainties, so a higher accuracy of the theory prediction can be achieved. In recent measurements, most of these observables were in agreement with Standard Model predictions, but the observable P5′ showed a 3.4σ deviation from the Standard Model [8].

Together with this decay's sister decay B+ → K*(892)+ e+e−, it is also possible to test a fundamental principle within the Standard Model. Lepton flavor universality suggests that it should not matter whether the lepton pair in the decay, apart from mass differences, is an electron or a muon pair. Therefore, the ratio

R_K*+ = B(B+ → K*(892)+ µ+µ−) / B(B+ → K*(892)+ e+e−)    (2.3)

should be around a value of one. At this point in time R_K*+ has not been measured. However, R_K*0 and R_K, which corresponds to B+ → K+ ℓ+ℓ−, were measured by the LHCb collaboration and found to deviate from the Standard Model predictions. Both observables were found to be around 2.6σ below the Standard Model prediction [9, 10].

The data that Belle II will collect in the future will make these measurements more precise, and the usual 5σ required in particle physics to claim a discovery might be achieved.


3. Experimental Setup

3.1. Introduction

The Belle II experiment is located at the SuperKEKB particle accelerator in Tsukuba, Japan, and is the direct, upgraded successor of the Belle experiment. The SuperKEKB accelerator is designed to produce B meson pairs for the detector in a clean experimental environment, without perturbing amounts of other particles. With a comparatively low main collision energy, this experiment pursues the precision approach of particle physics. It is a second generation B factory, after Belle at KEKB and BaBar at the PEP-II accelerator.

3.2. SuperKEKB

The accelerator uses the tunnels of its predecessor KEKB and has the same basic principle. It consists of two beam pipes, one low energy ring (LER) for 4.0 GeV positrons and one high energy ring (HER) for 7.0 GeV electrons. The accelerator is depicted schematically in fig. 3.1.

At the interaction point, an electron annihilates with a positron at a center-of-mass energy of

√s ≈ 2 √(E_LER · E_HER) = 10.58 GeV ≈ m_ϒ(4S).

Since the energy is chosen to be slightly higher than the mass of the ϒ(4S) meson, it is frequently created. At its creation it has almost no momentum and is therefore almost at rest. The ϒ(4S) decays with more than 96 % probability [3] into a BB̄ pair, and in many cases the e+e− collision will exclusively produce a ϒ(4S), without any other particles.
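The center-of-mass energy above follows from the two beam energies alone, since the beam masses are negligible here. A one-line check with the beam energies from the text:

```python
import math

# sqrt(s) = 2 * sqrt(E_LER * E_HER) for head-on collisions of
# effectively massless beams (electron mass neglected).
def sqrt_s(e_ler, e_her):
    """Center-of-mass energy in GeV for beam energies in GeV."""
    return 2.0 * math.sqrt(e_ler * e_her)

print(round(sqrt_s(4.0, 7.0), 2))   # -> 10.58, approximately m_Y(4S)
```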

The other feature is the asymmetric beam energy. If a BB̄ pair is created, its rest frame is boosted, giving the B mesons a longer lifetime in the laboratory frame. This allows for a more precise detection of decay vertices, since the B meson travels slightly farther in the detector.

The main reason for the upgrade of KEKB to SuperKEKB was to increase the luminosity of the accelerator. By doubling the current in the beam pipes and shrinking the cross section of the interaction region (IR) by a factor of 20, SuperKEKB's instantaneous luminosity of L = 8 · 10³⁵ cm⁻² s⁻¹ will be 40 times larger than KEKB's.

By immensely scaling down the vertical size of the beam, which results in a smaller IR, the problem of beam-induced background arises. The main source is the Touschek effect, where two particles in a bunch scatter and their energies are changed from the nominal bunch energy.

It is estimated that this effect will be about 20 times stronger than in KEKB. Together with some other sources, it is expected that the beam background will be much higher, but as of now it is not known how much impact this will have on the detection efficiency [11, 12].

Figure 3.1.: Schematic view of the SuperKEKB accelerator. Credit: KEK

3.3. The Belle II Detector

The Belle II detector consists of several sub-detectors, each of which has a specific scope.

All of them are arranged in multiple layers going outwards from the IR. The innermost component surrounding the IR is a two-layer silicon pixel detector (PXD). Combined with the silicon vertex detector (SVD), it forms a unit to measure vertex positions. The central drift chamber (CDC) follows around these inner layers, enclosed by the particle identification (PID) system, consisting of an array of time-of-propagation (TOP) counters in the radial direction and the Aerogel Ring-Imaging Cherenkov detector (ARICH) in the forward direction.

Finally, an electromagnetic calorimeter (ECL) and the KL and muon detector (KLM) conclude this summary of the sub-detectors. The full detector is depicted schematically in fig. 3.2.

3.3.1. Vertex Detectors

The innermost detector system of Belle II is the pixel detector (PXD). It consists of two cylindrical layers of several planes, arranged in an overlapping fashion. With radii of 14 mm and 22 mm, they are very close to the IR. Each plane is a two-dimensional array of 50 µm thin DEPFET sensors with a size of 50 µm × 50 µm. With this small size, a good vertex resolution can be achieved despite the expected high beam background. The silicon vertex detector (SVD) is composed of four cylindrical layers of double-sided silicon strip detectors. Just like the PXD's layers, each layer is made of overlapping planes of detectors. The radius of the inner layer is 38 mm and the outer radius is 140 mm, an upgrade over Belle's SVD outer radius of 88 mm. Because of higher beam background at radii up to about 10 cm, which the CDC is not expected to be able to handle, the larger coverage with silicon detectors is necessary. The purpose of these systems is the measurement of vertices of decaying particles, mainly B mesons.

Figure 3.2.: Schematic view of the cross section of the Belle II detector, from the top. Credit: [13]

3.3.2. Central Drift Chamber

The central drift chamber (CDC) is a detector for ionizing radiation. It is a cylindrical chamber filled with a 50:50 gas mixture of helium and ethane. Thin charged wires run across it to collect the electrons of gas particles that were ionized by a highly energetic particle flying through. With the timing information and the positioning of the wires, a charged track can be reconstructed, including its momentum.

The CDC can also be used to identify particles based on their energy loss traversing the gas. This is done by comparing the energy loss dE/dx of each individual particle, together with its momentum, to the theoretical energy loss for each type of charged particle in the material, which helps in the overall estimation of the particle's identity.

3.3.3. Particle Identification

As the name suggests, the particle identification system's purpose is the differentiation between distinct particles, in particular between kaons and pions. It consists of two subsystems, the Time-Of-Propagation (TOP) counter and the Aerogel Ring-Imaging Cherenkov detector (ARICH). Both share the same basic idea: measuring the velocity of a particle via Cherenkov radiation. Whenever a highly energetic particle passes through a radiator material, in this case quartz or aerogel, it emits Cherenkov photons in a cone with a specific opening angle. The angle


depends on the velocity, therefore measuring that angle yields the velocity. Together with the momentum information gained by the vertex detector and the CDC, a mass hypothesis can be calculated. The hypothesis is then compared to the nominal masses from PDG [3]. The particle associated with the nominal mass closest to the mass hypothesis is the most likely.
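The angle-to-mass-hypothesis step can be sketched as follows. The refractive index, momentum and angle below are illustrative numbers rather than Belle II constants, the nominal masses are rounded PDG-like values, and natural units with c = 1 are used.

```python
import math

# Sketch of the Cherenkov-based PID idea: from the measured Cherenkov angle
# we get the velocity via cos(theta_c) = 1 / (n * beta); combined with the
# momentum, this yields a mass hypothesis m = p * sqrt(1/beta^2 - 1).
def mass_hypothesis(p, theta_c, n):
    """Mass in GeV from momentum p (GeV) and Cherenkov angle theta_c (rad)
    in a radiator with refractive index n (natural units, c = 1)."""
    beta = 1.0 / (n * math.cos(theta_c))
    return p * math.sqrt(1.0 / beta**2 - 1.0)

NOMINAL = {"pi": 0.1396, "K": 0.4937, "p": 0.9383}   # GeV, rounded values

def identify(p, theta_c, n):
    """Return the hypothesis whose nominal mass is closest to the measurement."""
    m = mass_hypothesis(p, theta_c, n)
    return min(NOMINAL, key=lambda name: abs(NOMINAL[name] - m))

# A 2 GeV kaon in quartz (n ~ 1.47) emits at roughly theta_c ~ 0.795 rad:
print(identify(2.0, 0.795, 1.47))   # -> K
```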

3.3.4. Electromagnetic Calorimeter

The electromagnetic calorimeter (ECL) is an assembly of CsI(Tl) crystals attached around the CDC. CsI(Tl) has a short radiation length and a high light output, making it a good scintillation material. When a photon or an electron hits one of the crystals, an electromagnetic shower occurs due to pair production and bremsstrahlung. The intensity of the shower's light is then measured by photodiodes, resulting in a value for the energy deposition.

Because of the large angular coverage of the ECL, photons can be detected with high efficiency and electrons can be identified. Their energy corresponds to the deposited energy.

3.3.5. KL and Muon Detection

The KL and muon detector (KLM) is the outermost part of the Belle II detector and is made of alternating iron plates and resistive plate chambers (RPCs). RPCs function similarly to the CDC, in that a charged particle ionizes molecules along its path and the secondary electrons are amplified and measured. Next to serving as a flux return for the magnetic field, the main purpose of the KLM is to detect KL mesons and muons. The thick iron plates provide several interaction lengths to make KLs shower hadronically, and the shower can be detected by the RPCs.

3.3.6. Detector Solenoid

Between the ECL and the KLM, a superconducting solenoid magnet is located. The superconductor used is a niobium-titanium copper composite (NbTi/Cu). It creates an approximately homogeneous magnetic field of 1.5 T. The tracks of charged particles traversing the field are bent into a circular path, which allows the measurement of the particle's momentum.
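The relation behind this measurement is the standard curvature formula p_T [GeV] ≈ 0.3 · q · B[T] · r[m] for a particle of charge q (in units of e); a minimal sketch with an invented track radius:

```python
# Transverse momentum from track curvature in a solenoidal field,
# using the standard relation p_T [GeV] = 0.3 * q * B[T] * r[m].
def transverse_momentum(radius_m, b_field_t=1.5, charge=1.0):
    """Transverse momentum in GeV for a track of curvature radius radius_m."""
    return 0.3 * charge * b_field_t * radius_m

# A singly charged track curling with a 1 m radius in the 1.5 T field:
print(round(transverse_momentum(1.0), 2))   # -> 0.45 (GeV)
```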

All information in this overview is taken from [12], which offers highly detailed technical descriptions of each component.

3.4. Data Taking Period

At this point in time, construction of the upgrade from Belle to Belle II and from KEKB to SuperKEKB is complete. The data taking period is visualized in fig. 3.3. After thorough testing, the planned start of data taking with all detector components in place is scheduled for late 2018.

Over time, the instantaneous luminosity is raised until it reaches its peak around mid 2022.

Plateaus in the integrated luminosity are caused by maintenance breaks.


laborations like ROOT¹ from CERN. To simplify the physics analysis, the framework has an interface allowing it to be used with Python. Just as described above, all modules can be loaded into a path that is defined by a simple Python script. In addition to loading modules, this steering file can also contain any Python functionality, making it quite versatile while the syntax remains simple.
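The module-and-path idea can be illustrated with a small plain-Python sketch. This mimics the steering pattern described above; the class and function names are invented for illustration and are not the actual basf2 API.

```python
# Schematic illustration of the module/path steering pattern: modules are
# queued in a path, and a process loop calls each module once per event.
# All names here are invented; this is NOT the real basf2 interface.
class Module:
    def event(self, data):
        raise NotImplementedError

class Generator(Module):
    def event(self, data):
        # Pretend to generate a particle list for this event.
        data["particles"] = ["mu+", "mu-", "K*+"]

class Reconstructor(Module):
    def event(self, data):
        # Pretend to reconstruct candidates from the generated particles.
        data["candidates"] = [p for p in data["particles"] if p.startswith("mu")]

def process(path, n_events=1):
    """Run every module in the path once per event, like a steering file does."""
    events = []
    for _ in range(n_events):
        data = {}
        for module in path:
            module.event(data)
        events.append(data)
    return events

path = [Generator(), Reconstructor()]
out = process(path, n_events=2)
print(out[0]["candidates"])   # -> ['mu+', 'mu-']
```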

3.5.2. Monte Carlo Simulation

To get an estimate of the signal yields for the decay, a toy study with simulated data is conducted.

In basf2, simulations of particle collisions, also called Monte Carlo (MC) events, are generated using a variety of external packages. Among others, EvtGen [15] and PYTHIA [16] are used to generate a chosen particle production and decay, based on random number generators. Next, the detector response to the generated decay is simulated using the GEANT4 [17] software. The results should mimic the detector response to a real decay as closely as possible. Finally, the particle decay gets reconstructed from the detector response. Tracks from charged particles get fitted, ECL hits get grouped together and several other operations are performed. At this point, the Monte Carlo data is ready for use in a physics analysis.

All these steps can be done locally. Large samples of different kinds of Monte Carlo data are produced in central Monte Carlo campaigns. With each new release version of basf2, a new data set is generated. This data is accessible through the GRID, a global network of clusters for distributed computing.

The analysis in this thesis requires two different types of MC data sets.

Signal MC Only signal events are contained in this set. These are events where a B+B− pair is created and one of the mesons decays via B+ → K*+ µ+µ−. This also includes the charge-conjugated decay B− → K*− µ+µ−. The other meson decays generically according to its branching ratios. To have enough training data for the analysis, 1,000,000 events are generated locally.

Table 3.1.: Composition of generic MC

Type          | Content                     | Percentage
Continuum MC  | e+e− → qq̄, q = u, d, c, s   | 69 %
Mixed MC      | ϒ(4S) → B0B̄0                | 16 %
Charged MC    | ϒ(4S) → B+B−                | 15 %

Generic MC This set contains events that are considered background events. On one hand, these are events where a ϒ(4S) is created but neither meson of the BB̄ pair decays via

¹ https://root.cern.ch/

B+ → K*+ µ+µ−, which is not even possible if it is a B0B̄0 pair. On the other hand, events where not a ϒ(4S) but a quark pair is created are also included in the generic MC. These processes follow e+e− → qq̄, q = u, d, c, s and are called continuum events.

The generic MC is composed of these different backgrounds such that the percentage of each background component mimics the probability of a real collision being of that kind of background. The percentages are listed in tab. 3.1. 1 ab⁻¹ of generic MC events was obtained through the GRID.


4. Analysis

The main part of this thesis is a sensitivity study to determine how many signal candidates of the rare decay B+ → K*+ µ+µ− can be expected for data sets with various luminosities at the Belle II experiment. The luminosities are chosen to be 0.711, 1, 5, 10, 25 and 50 ab⁻¹.
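For orientation, the raw number of signal decays produced (before any reconstruction or selection effects) can be estimated for these luminosities. The branching ratio is the value quoted in the introduction; the ϒ(4S) production cross section of roughly 1.1 nb and B(ϒ(4S) → B+B−) ≈ 0.5 are assumptions commonly used for such back-of-the-envelope estimates, not numbers taken from this thesis.

```python
# Rough produced-signal estimate, before reconstruction efficiency.
# Assumed inputs (not from the thesis text):
#   sigma_Y4S ~ 1.1 nb for e+e- -> Y(4S) at 10.58 GeV,
#   B(Y(4S) -> B+B-) ~ 0.5, and either charged B can decay to signal.
SIGMA_Y4S_NB = 1.1          # assumed production cross section in nb
BR_CHARGED = 0.5            # assumed B(Y(4S) -> B+B-)
BR_SIGNAL = 9.6e-7          # B(B+ -> K*(892)+ mu+ mu-) from the text

def expected_signal_decays(lumi_ab):
    """Signal decays produced in a sample of lumi_ab inverse attobarns."""
    n_y4s = lumi_ab * 1e9 * SIGMA_Y4S_NB        # 1 ab^-1 = 1e9 nb^-1
    return n_y4s * BR_CHARGED * 2 * BR_SIGNAL   # two charged B per event

for lumi in [0.711, 1, 5, 10, 25, 50]:
    print(lumi, round(expected_signal_decays(lumi)))
```

Under these assumptions, roughly a thousand signal decays are produced per ab⁻¹, which illustrates why a high-efficiency reconstruction matters.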

4.1. B Meson Reconstruction

The studies are done using the MC data described in sec. 3.5.2, which is evaluated using basf2. First, the raw data undergoes several reconstruction steps to build up a full B meson decay chain. Every event, both from background MC and signal MC, is processed as follows.

1. The charged muon, kaon and pion tracks are selected.

2. The K*+ is reconstructed via its two main decay channels:

a) K*+ → K+ π0, where the π0 is reconstructed from two photons. Each photon is reconstructed by an algorithm that groups ECL hits close to each other and creates possible photon candidates.

b) K*+ → K0_S π+, where the K0_S is reconstructed by an algorithm that looks for two oppositely charged particle tracks which have the same spatial origin.

3. The B meson is reconstructed via B+ → K*+ µ+µ−.

Every step is also done for the equivalent charge-conjugated variant. In further steps, all variables described in section 4.2.1 are calculated and written into a file for analysis use.
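The combinatorial nature of steps 2 and 3 can be sketched with toy candidate lists: every kaon is paired with every pion to form K* candidates, and every K* candidate with every muon pair. The candidate labels are invented placeholders, not real track objects.

```python
from itertools import product

# Toy illustration of combinatorial reconstruction.  Candidate counts
# multiply at every step, which is why the candidate lists grow quickly.
kaons    = ["K+_1", "K+_2"]
pions    = ["pi0_1"]
mu_plus  = ["mu+_1"]
mu_minus = ["mu-_1", "mu-_2"]

# Step 2: one K* candidate per (kaon, pion) combination.
kstar_candidates = list(product(kaons, pions))

# Step 3: one B candidate per (K*, mu+, mu-) combination.
b_candidates = list(product(kstar_candidates, mu_plus, mu_minus))

print(len(kstar_candidates), len(b_candidates))   # -> 2 4
```

At most one of these candidates per event can be the true decay, which is exactly the combinatorial background the selection criteria below are meant to suppress.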

4.1.1. Selection Criteria

Since most events have various combinatorial possibilities for reconstructing each intermediate particle, a set of selection constraints has to be applied. This limits the amount of computational work and saves time. In this first selection step, the amount of combinatorial background is drastically decreased.

PID A particle identification (PID) can be performed with the detector response. For each reconstructed particle, six different probabilities are calculated, each corresponding to one of the »stable« charged particles: electrons, muons, kaons, pions, deuterons and protons.

Their lifetimes, and therefore their mean free paths, are long enough that they do not decay inside the detector, which makes them stable for the purposes of this experiment.

To calculate a PID value, the interaction of each particle with different sub-detectors is taken into consideration individually. For each sub-detector, six likelihoods for the stable hypotheses are determined.

∆ln(L_α) = ln(L_hyp) − ln(L_α)    (4.1)

With eq. 4.1, a logarithmic difference in the combined likelihood from all sub-detectors for a specific particle hypothesis is calculated. L_α is the sum of the likelihoods of the different sub-detectors for particle hypothesis α, while L_hyp is the summed likelihood for the hypothesis of the particle itself, which is arbitrarily chosen. The PID value can be extracted by normalizing ∆ln(L) to a scale from zero to one. This results in a powerful discriminator between opposing particle hypotheses [11].
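One common way to turn such log-likelihoods into a value between zero and one is a likelihood ratio over all hypotheses. This is a sketch of the idea, not necessarily the exact basf2 normalization, and the example numbers are invented.

```python
import math

# Normalize per-hypothesis log-likelihoods into a PID value on [0, 1]
# via the likelihood ratio L_hyp / sum over all hypotheses of L_alpha.
def pid_value(log_likelihoods, hypothesis):
    """log_likelihoods: dict mapping hypothesis -> summed ln L over sub-detectors."""
    # Subtract the maximum before exponentiating, for numerical stability.
    m = max(log_likelihoods.values())
    exps = {h: math.exp(ll - m) for h, ll in log_likelihoods.items()}
    return exps[hypothesis] / sum(exps.values())

lls = {"mu": -10.0, "pi": -13.0, "K": -15.0}   # invented example numbers
print(round(pid_value(lls, "mu"), 3))   # -> 0.946
```

By construction the values over all hypotheses sum to one, so a value close to one singles out one hypothesis clearly.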

To find the best PID constraint that preserves enough efficiency, the PID information for every reconstructed muon, kaon and pion in a sample of 100,000 signal MC events is analyzed. Only the probability for a muon candidate to be a muon, and similarly for kaons and pions, is looked at. Other variants, such as the probability of e.g. a muon candidate being a kaon, are not taken into account.

Since the data is generated, every particle along the decay chain is known. Solely those particles that satisfy the following conditions are considered true candidates:

• For K and π:

– The particle candidate is a generated K/π
– The particle candidate's mother particle is a K*(892)+
– The particle candidate's grandmother particle is a B+

• For µ:

– The particle candidate is a generated µ
– The particle candidate's mother particle is a B+

All other particles are neglected because they do not matter for the main decay and are considered background.

The efficiency and purity for each constraint are calculated as

efficiency = N(true|selected) / N(true)    (4.2)

purity = N(true|selected) / (N(true|selected) + N(false|selected)).    (4.3)

The index 'selected' indicates that only candidates that fulfill the PID constraint are included, whereas a lack of this index includes all candidates.
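Eqs. 4.2 and 4.3 translate directly into code; the counts below are invented example numbers.

```python
# Direct implementation of eqs. 4.2 and 4.3.
def efficiency(n_true_selected, n_true):
    """Fraction of all true candidates that survive the selection."""
    return n_true_selected / n_true

def purity(n_true_selected, n_false_selected):
    """Fraction of selected candidates that are actually true."""
    return n_true_selected / (n_true_selected + n_false_selected)

# Example: 800 of 1000 true candidates survive a cut, along with 200 fakes.
print(efficiency(800, 1000))   # -> 0.8
print(purity(800, 200))        # -> 0.8
```

Tightening a cut typically trades efficiency for purity, which is why both are tracked for every constraint.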


layer and, in the case of a good decision tree, one subset should contain either only true or only false candidates. Generally this is not the case, and a single tree is not a very strong model.

To create a stronger model, multiple decision trees are combined in a process called boosting. To start with, a decision tree limited to a depth of a few layers is trained. Also referred to as a weak learner, this model will predict some of the data correctly and some of it incorrectly. This imperfect learner is then passed to the boosting algorithm, which tries to find another weak learner that improves on the shortcomings of the first. Both are combined to form another imperfect learner which is slightly better than its predecessor. This step can theoretically be repeated indefinitely [18].

The combination of many weak learners, also referred to as estimators, results in a model that has a lot of separation power, referred to as a strong learner. After the training, the model can be applied to test data that is similar to the training data and has the same features. All data points in the test set are run through the tree and a probability for each one being a signal is calculated.

In the case of a good model, this divides the test set into two groups, with few data points having a probability around 50 %, which would indicate that the tree cannot classify those points well.
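The boosting loop described above can be made concrete with a minimal AdaBoost-style sketch using one-dimensional threshold "stumps" as weak learners. This is a generic textbook construction on invented toy data, not the classifier actually used in the analysis.

```python
import math

# Minimal AdaBoost sketch: weak learners are 1-D threshold stumps, and each
# boosting round re-weights the data so misclassified points matter more.
def stump_predict(threshold, direction, x):
    return direction if x > threshold else -direction

def train_adaboost(xs, ys, n_estimators=5):
    n = len(xs)
    weights = [1.0 / n] * n
    model = []                       # list of (alpha, threshold, direction)
    thresholds = sorted(set(xs))
    for _ in range(n_estimators):
        # Find the stump with the lowest weighted error.
        best = None
        for t in thresholds:
            for d in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if stump_predict(t, d, x) != y)
                if best is None or err < best[0]:
                    best = (err, t, d)
        err, t, d = best
        err = max(err, 1e-10)        # avoid division by zero for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, d))
        # Re-weight: misclassified points get exponentially more weight.
        weights = [w * math.exp(-alpha * y * stump_predict(t, d, x))
                   for x, y, w in zip(xs, ys, weights)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return model

def predict(model, x):
    score = sum(a * stump_predict(t, d, x) for a, t, d in model)
    return 1 if score > 0 else -1

# Toy data: signal (y = +1) sits at large x, background (y = -1) at small x,
# with an overlap region no single stump can resolve.
xs = [0.1, 0.2, 0.3, 0.45, 0.5, 0.6, 0.7, 0.9]
ys = [-1, -1, -1, 1, -1, 1, 1, 1]
model = train_adaboost(xs, ys)
print([predict(model, x) for x in [0.15, 0.8]])   # -> [-1, 1]
```

Production BDT libraries add refinements (deeper trees, gradient boosting, regularization), but the re-weight-and-combine loop is the same idea as described above.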

4.2.1. Multivariate Analysis Features

The data set that has to be classified is the generic background mixed with the signal MC, both of which are discussed in section 3.5.2. The set is split into two subsets of equal size, where the signal MC is marked as signal for the classifier. One of the sets is used to train the classifier while the other one is used as a test set from which the results are later obtained. This split prevents the classifier from learning the training set by heart and failing to generalize to similar events.

To train the boosted decision tree, features from the data set have to be selected. There is no general procedure that always finds the optimal features; they are selected through trial and error. Tab. 4.3 contains every feature used to train the boosted decision tree.

The histograms for each feature, comparing the signal shape to the different background shapes that form the generic background, are displayed in the appendix in fig. A.1.

Continuum Suppression To identify and suppress continuum events (e+e− → qq̄, q = u, d, s, c), several sets of variables exist. The idea is to use the topological differences between continuum decays and real BB̄ events. If a ϒ(4S) is created, it has only a small momentum, is approximately at rest, and therefore decays almost isotropically into a BB̄ pair. If instead something other than a ϒ(4S) is created, it has enough momentum that the decay forms two opposite jets of daughter particles.

For particles in the event with momenta p⃗_i (i = 1, ..., N), the thrust axis T̂ is defined as the unit vector along which their total momentum projection is maximal. The magnitude of the thrust axis,

T_max = ( ∑_{i=1}^{N} |T̂ · p⃗_i| ) / ( ∑_{i=1}^{N} |p⃗_i| ) ,    (4.9)
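Eq. 4.9 can be evaluated exactly for small N by using the fact that the optimal axis is parallel to some signed sum of the momenta (so all 2^N sign combinations are tried). The following brute-force sketch is illustrative only and assumes tracks given as 3-momentum tuples:

```python
import math
from itertools import product

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def thrust(momenta):
    """Thrust magnitude: max over axes of sum|n.p_i| / sum|p_i| (eq. 4.9).

    Exact but exponential in N: the optimal axis is parallel to a signed
    sum of the momenta, so every sign combination is tried.
    """
    total = sum(norm(p) for p in momenta)
    best = 0.0
    for signs in product((1, -1), repeat=len(momenta)):
        axis = [sum(s * p[k] for s, p in zip(signs, momenta))
                for k in range(3)]
        n = norm(axis)
        if n == 0:
            continue
        unit = [c / n for c in axis]
        best = max(best, sum(abs(dot(unit, p)) for p in momenta) / total)
    return best
```

Two back-to-back momenta give the maximal, jet-like value T = 1, while three orthogonal momenta of equal magnitude give the much smaller value 1/√3.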


4.2. MACHINE LEARNING

Table 4.3.: Features used for classifier training

Feature — Description
∆E — ∆E = E_B − E_beam
M_K*(892)+ — Mass of the reconstructed K*(892)+
σ_M_K*(892)+ — Significance of M_K*(892)+
Vertex P value — Measure of the quality of the decay-vertex fit. The vertex is calculated using the charged tracks of the decay. A high value indicates that the charged tracks originate close to each other.
Thrust B — Magnitude of the B thrust axis
cos(θ_B,ROE) — Cosine of the angle between the thrust axis of the B meson and the thrust axis of the Rest of Event (ROE)
Cleo cones — see below
Modified Fox-Wolfram moments — see below
R2 — Reduced Fox-Wolfram moment
M²_miss — Squared missing mass
E_T — Transverse energy
E_ROE — Energy of unused tracks and clusters in the ROE
M_ROE — Invariant mass of unused tracks and clusters in the ROE
E_extra,ROE — Extra energy from ECLClusters in the calorimeter that is not associated with the given particle
nE_extra,ROE — Extra energy from neutral ECLClusters in the calorimeter that is not associated with the given particle

can be defined as a derived quantity [19]. It serves as a measure of the general direction of multiple particles. The B thrust axis is therefore the thrust axis computed from all daughter particles of the B meson.

To put this concept to use, the CLEO collaboration developed the Cleo cones in 1996 as variables for charmless B decays. Nine cones around the thrust axis of the B candidate are defined, with apex angles each 10° larger than the previous one. The momentum flow x_i (i = 1, ..., 9) through these cones is defined as the scalar sum of the momenta of all tracks passing through the respective cone. Each event is folded such that the forward and backward directions are combined.

The momentum flow through each cone of an isotropic decay should look different from the flow of a jet-like decay [20].
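A plausible sketch of this momentum-flow calculation (the function name is hypothetical, and the thrust axis is assumed to be already known as a unit vector):

```python
import math

def cleo_cones(thrust_axis, momenta):
    """Momentum flow x_1..x_9 through nine 10-degree cones around the
    thrust axis, with the forward and backward hemispheres folded together."""
    flows = [0.0] * 9
    for p in momenta:
        mag = math.sqrt(sum(c * c for c in p))
        if mag == 0:
            continue
        cos_a = sum(t * c for t, c in zip(thrust_axis, p)) / mag
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle > 90.0:                  # fold the backward hemisphere forward
            angle = 180.0 - angle
        cone = min(int(angle // 10), 8)   # cone index 0..8, 10 degrees each
        flows[cone] += mag
    return flows
```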

Another way to quantify the shape differences between the decays is the definition of Fox-Wolfram moments. They are obtained by

H_l = ( ∑_{m,n} |p⃗_m||p⃗_n| P_l(cos θ_mn) ) / E_vis²    (4.10)

where p⃗_i is the momentum of the i-th particle, P_l the l-th order Legendre polynomial, θ_mn the angle between particles m and n, and E_vis the total visible energy in the event [21]. The ratio R_l = H_l/H_0 is often used as a feature; especially R_2 shows strong separation power between continuum and BB̄ events.

To further refine these moments, each calculation can be done using not all but only specific particles. The particles are separated into groups depending on whether they are a daughter of the signal B meson (s) or a member of the rest of event (o). The modified Fox-Wolfram moments H_l^ss, H_l^so and H_l^oo, also known as KSFW moments, are calculated using only the specified particle groups. Several of these are used in the training of the classifier.
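The moments of eq. 4.10 can be sketched in plain Python. As a simplifying assumption not made in the thesis, E_vis is approximated here by the scalar momentum sum, i.e. massless particles:

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) via the Bonnet recurrence."""
    p0, p1 = 1.0, x
    if l == 0:
        return p0
    for n in range(1, l):
        p0, p1 = p1, ((2 * n + 1) * x * p1 - n * p0) / (n + 1)
    return p1

def fox_wolfram(momenta, l_max=2):
    """H_0..H_{l_max} from eq. 4.10, approximating E_vis by sum|p| (massless)."""
    mags = [math.sqrt(sum(c * c for c in p)) for p in momenta]
    e_vis = sum(mags)
    H = [0.0] * (l_max + 1)
    for pi, mi in zip(momenta, mags):
        for pj, mj in zip(momenta, mags):
            cos_t = sum(a * b for a, b in zip(pi, pj)) / (mi * mj)
            cos_t = max(-1.0, min(1.0, cos_t))      # guard rounding errors
            for l in range(l_max + 1):
                H[l] += mi * mj * legendre(l, cos_t) / e_vis ** 2
    return H

def r2(momenta):
    """Reduced Fox-Wolfram moment R_2 = H_2 / H_0."""
    H = fox_wolfram(momenta, 2)
    return H[2] / H[0]
```

A perfectly jet-like (back-to-back) event gives R_2 = 1, while three orthogonal momenta of equal magnitude, as a stand-in for an isotropic event, give R_2 = 0, matching the separation power described above.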

Correlation It is important that none of the features of the signal MC data set correlate too strongly with the beam-constrained mass Mbc. This variable is used to obtain the numbers of signal and background candidates for the efficiency estimate; it therefore must not carry any bias introduced by the classifier, which could be caused by correlated features. A biased classifier would leave more background candidates tagged as signal candidates with an Mbc value close to the nominal B mass, resembling signal candidates.

Linear correlation can be quantified using the Pearson correlation coefficient. It is defined as

ρ_xy ≡ ∑_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / sqrt( ∑_{i=1}^{n} (x_i − x̄)² · ∑_{i=1}^{n} (y_i − ȳ)² )    (4.11)

where x̄ is the mean value of data set x.
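Eq. 4.11 translates directly into code; a plain-Python sketch:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of eq. 4.11."""
    n = len(xs)
    mx = sum(xs) / n                  # mean of x
    my = sum(ys) / n                  # mean of y
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Perfectly linearly dependent data gives ±1, while data with no linear relationship gives a coefficient of zero even if a non-linear dependence exists, which is exactly the caveat discussed below.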

Fig. 4.8 is a heat map showing all possible correlations between the classifier features. It is symmetric, since the arguments of the Pearson coefficient commute. Each feature correlates maximally with itself, giving a coefficient of one on the diagonal. The bottom-most row displays the correlation of Mbc with every other feature. The strongest correlation observed here is the one with ∆E, ρ_∆E,Mbc = 0.22. This compromise has to be made because ∆E has by far the greatest separation power of all features. No other feature shows any significant linear correlation with Mbc.

Two features may also correlate non-linearly rather than linearly. This case is not covered by this simple evaluation, which should nevertheless indicate how correlation affects the final classification result.

4.2.2. Best Candidate Selection

For most events there are multiple candidates that fulfill the discussed constraints, although there can be at most one true candidate per event. To drastically reduce this combinatorial background, only the candidate with the ∆E value closest to zero is kept in each event; since a correctly reconstructed B meson has ∆E ≈ 0, this selection is likely to pick the one true candidate out of the several false ones. This is only possible because ∆E and Mbc are not too strongly correlated. Alternatively, one could keep only the candidate with the highest signal probability output by the boosted decision tree. Both options were tested against each other, with the ∆E-based candidate selection proving superior.
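One plausible implementation of a ∆E-based best candidate selection (the dictionary layout and the choice of |∆E| closest to zero are assumptions for illustration):

```python
def best_candidate(candidates):
    """Keep the single candidate whose Delta E is closest to zero.

    `candidates` is a list of dicts with a 'delta_e' key (hypothetical
    layout); a true B candidate has E_B ~ E_beam, i.e. Delta E ~ 0.
    """
    return min(candidates, key=lambda c: abs(c["delta_e"]))
```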

4.3. Efficiency Estimation

As mentioned in sec. 4.2.1, a training and a test data set of equal size are created. The classifier is trained on the former and afterwards applied to the test set. Each candidate in every event now has a calculated probability of being a signal candidate. To further reduce background candidates, the best candidate selection described above is applied.

The package XGBoost [22], which implements a boosted decision tree method, is used in this thesis since it runs fast and reliably. To confirm the good performance of XGBoost, it is compared to different classifiers.

4.3.1. Receiver Operating Characteristic

To quantify the performance of the classifier against other classifier models, or against the same model with different parameters, a Receiver Operating Characteristic (ROC) curve is used. The true positive rate (TPR) of the classifier is plotted against the false positive rate (FPR) at different threshold settings. A perfect classifier's ROC curve would include the point (0, 1), since it represents no false positives while the TPR is one. The area under the curve, evaluated on a test data set, is a measure of the quality of the classifier.

A variation of this is a purity versus efficiency curve where both quantities are plotted against each other. Here a perfect classifier retains 100% efficiency while the purity is also 100%.
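TPR, FPR and purity at a given score threshold can be computed as follows (an illustrative sketch; here the efficiency corresponds to the TPR and the purity to the precision):

```python
def roc_point(labels, scores, threshold):
    """TPR, FPR and purity of a score cut.

    labels: 1 for signal, 0 for background; scores: classifier outputs.
    Sweeping the threshold traces out the ROC and purity-vs-efficiency curves.
    """
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    tpr = tp / pos if pos else 0.0                 # efficiency
    fpr = fp / neg if neg else 0.0
    purity = tp / (tp + fp) if tp + fp else 0.0    # precision
    return tpr, fpr, purity
```

For a perfectly separating score, some threshold yields the ideal point: TPR of one at an FPR of zero, i.e. 100 % efficiency at 100 % purity.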

To justify the choice of classifier model in this thesis, different models are compared via purity versus efficiency curves displayed in fig. 4.10. All classifiers were trained on the same data and features. XGBoost visibly performs best.

To approximate the best parameters of XGBoost for this particular problem, multiple XGBoost classifiers are tested against each other. The varied parameters are the maximum depth a single tree (weak learner) can have and the number of estimators used in the training. The result is displayed in fig. 4.11.

Above a certain number of estimators and a certain maximum depth, the scores are very similar. With a score of 0.974, the classifier with a maximum depth of four and 800 estimators came out on top. This configuration, however, produces an output that appears to be correlated with Mbc. Therefore, a slightly less powerful but also less correlated configuration was chosen: a classifier with maximum depth three and 300 estimators.


4.4. RESULTS

At this point the differences in the choice of features and classifier parameters become apparent. As discussed in sec. 4.3.1, not the best-performing classifier was chosen, because the »stronger« classifiers would also strengthen the correlation between the classifier output and Mbc. One has to balance good separation and efficiency against a low correlation between these variables.


5. Conclusion

In this thesis, the Belle II Analysis Framework was used to generate and analyze large amounts of simulated data. I reconstructed the rare B meson decay B+ → K*(892)+ µ+ µ− and selectively filtered it by applying hard constraints on different variables, choosing specific candidates based on a ranked feature, and using a classifying algorithm. A boosted decision tree was chosen as the algorithm because it performed best with its default parameters. After I tuned some of these parameters for this task, a good separation between the signal and the background class was achieved.

The reconstruction efficiency and the numbers of signal versus background candidates in the signal region of Mbc were calculated for different integrated luminosities, chosen in steps relevant for the Belle II data-taking period. Compared to the results of Belle [5] on the same decay, the figure-of-merit value for the reconstruction was more than doubled. While the expected amount of signal improved from 26 to 55 candidates, the expected amount of background was reduced from 122 to only 15. These numbers demonstrate the significant improvement in the reconstruction of this decay, achieved through the introduction of a machine learning algorithm, the new software framework, and the upgraded detector.

In conclusion, a reconstruction was found that allows an estimate for future measurements of R_K*. The latest paper published by the LHCb collaboration [10] used roughly 640 signal candidates from the B0 → K*0 µ+ µ− decay to calculate R_K*0. This channel can be reconstructed well since it does not require the reconstruction of neutral particles in the final state. In the B0 → K*0 e+ e− channel, however, only 200 candidates could be used, which accounts for a large portion of the uncertainty on R_K*. At the Belle II experiment the electron channel reconstruction is expected to perform similarly to the muon channel reconstruction. Therefore, by the time 5 ab⁻¹ of data are collected, Belle II's measurement of R_K*+ should at least be comparable to LHCb's in terms of sensitivity to deviations from the Standard Model. 5 ab⁻¹ should be reached in 2021, giving a prospect of New Physics in the near future.


Bibliography

[1] S. L. Glashow, "Partial Symmetries of Weak Interactions," Nucl. Phys. 22 (1961) 579–588.

[2] S. Weinberg, "A Model of Leptons," Phys. Rev. Lett. 19 (Nov, 1967) 1264–1266. https://link.aps.org/doi/10.1103/PhysRevLett.19.1264.

[3] Particle Data Group Collaboration, C. Patrignani et al., "Review of Particle Physics," Chin. Phys. C40 no. 10, (2016) 100001.

[4] C. Csaki, "The Minimal supersymmetric standard model (MSSM)," Mod. Phys. Lett. A11 (1996) 599, arXiv:hep-ph/9606414 [hep-ph].

[5] S. Wehle, Angular Analysis of B → K*ℓℓ and search for B+ → K+ττ at the Belle experiment. PhD thesis, Universität Hamburg, 2016.

[6] K. Abe and the Belle Collaboration, "Measurement of the Differential q2 Spectrum and Forward-Backward Asymmetry for B → K(*)ℓ+ℓ−," ArXiv High Energy Physics - Experiment e-prints (Oct., 2004), hep-ex/0410006.

[7] A. Ishikawa and the Belle Collaboration, "Measurement of Forward-Backward Asymmetry and Wilson Coefficients in B → K*ℓ+ℓ−," ArXiv High Energy Physics - Experiment e-prints (Aug., 2005), hep-ex/0508009.

[8] LHCb Collaboration, R. Aaij et al., "Angular analysis of the B0 → K*0 µ+µ− decay using 3 fb−1 of integrated luminosity," ArXiv e-prints (Dec., 2015), arXiv:1512.04442 [hep-ex].

[9] LHCb Collaboration, R. Aaij, B. Adeva, et al., "Test of Lepton Universality Using B+ → K+ℓ+ℓ− Decays," Physical Review Letters 113 no. 15, (Oct., 2014) 151601, arXiv:1406.6482 [hep-ex].

[10] LHCb Collaboration, R. Aaij, B. Adeva, et al., "Test of lepton universality with B0 → K*0ℓ+ℓ− decays," ArXiv e-prints (May, 2017), arXiv:1705.05802 [hep-ex].

[11] E. Kou, P. Urquijo, the Belle II Collaboration and the B2TiP theory community, "The Belle II Physics Book," not yet published.

[12] T. Abe, I. Adachi, Adamczyk, et al., "Belle II Technical Design Report," ArXiv e-prints (Nov., 2010), arXiv:1011.0352 [physics.ins-det].

[13] P. C., dE/dx particle identification and pixel detector data reduction for the Belle II experiment. Diploma thesis, KIT, 2012.

[14] A. Moll, "The Software Framework of the Belle II Experiment," Journal of Physics: Conference Series 331 no. 3, (2011) 032024. http://stacks.iop.org/1742-6596/331/i=3/a=032024.

[15] D. J. Lange, "The EvtGen particle decay simulation package," Nuclear Instruments and Methods in Physics Research A 462 no. 1, (2001) 152–155. http://www.sciencedirect.com/science/article/pii/S0168900201000894.

[16] T. Sjöstrand, S. Ask, et al., "An introduction to PYTHIA 8.2," Computer Physics Communications 191 (June, 2015) 159–177, arXiv:1410.3012 [hep-ph].

[17] S. Agostinelli, J. Allison, et al., "Geant4 - a simulation toolkit," Nuclear Instruments and Methods in Physics Research A 506 no. 3, (2003) 250–303. http://www.sciencedirect.com/science/article/pii/S0168900203013688.

[18] H. Daumé III, A Course in Machine Learning. Self-published, 2017. http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf.

[19] A. J. Bevan, B. Golob, T. Mannel, S. Prell, B. D. Yabsley, H. Aihara, F. Anulli, N. Arnaud, T. Aushev, and M. Beneke, "The Physics of the B Factories," European Physical Journal C 74 (Nov., 2014) 3026, arXiv:1406.6311 [hep-ex].

[20] D. M. Asner et al., "Search for exclusive charmless hadronic B decays," Physical Review D 53 (Feb., 1996) 1039–1050, hep-ex/9508004.

[21] G. C. Fox and S. Wolfram, "Event shapes in e+e− annihilation," Nuclear Physics B 149 no. 3, (1979) 413–496. http://www.sciencedirect.com/science/article/pii/0550321379900038.

[22] T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," ArXiv e-prints (Mar., 2016), arXiv:1603.02754 [cs.LG].


A. Appendix

A.1. Features

In this section every feature used for the classifier training is displayed. Each figure contains five differently colored sets. While the mixed and charged sets correspond to the background sets described in 3.5.2, the continuum background is divided into charm and "uds". Charm contains all background produced via e+e− → cc̄, while "uds" contains the rest of the continuum background, i.e. e+e− → qq̄, q = u, d, s. This separation was made because charm behaves differently from "uds" in some variables. Every distribution is normalized so that the shape differences become apparent.


I hereby confirm that I wrote this thesis independently, that I used no aids other than those stated, in particular no internet sources not named in the list of references, and that this thesis has not previously been submitted to any other examination procedure. The submitted written version corresponds to the one on the electronic storage medium. I agree to the publication of this bachelor thesis.

Jasper Riebesehl
10 November 2017, Hamburg
