Online Track and Vertex Reconstruction on GPUs for the Mu3e Experiment

(1)

Online Track and Vertex Reconstruction on GPUs for the Mu3e Experiment

Dorothea vom Bruch

March 7th, 2017

Connecting the Dots / Workshop on Intelligent Trackers 2017

(2)

Mu3e Signal

Signal:

Coincident in time

Single vertex

$\sum_i E_i = m_\mu$

$\sum_i \vec{p}_i = 0$

[Figure: the three signal tracks e+, e+, e-]

Search for charged lepton flavour-violating decay μ+ → e+ e- e+ with a sensitivity in branching ratio better than 10⁻¹⁶

(3)

The Mu3e Detector

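[Figure: schematic of the detector in the magnetic field B with the incoming μ beam. Labelled components: target, inner pixel layers, scintillating fibres, outer pixel layers, recurl pixel layers, scintillator tiles. Dimensions shown: 10 cm, 4.5 cm]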

(5)

Readout Scheme

FPGA: Field-Programmable Gate Array

GPU: Graphics Processing Unit

Front-end (inside magnet):
  2844 pixel sensors, up to 45 links at 1.25 Gbit/s, 86 FPGAs
  3072 fibre readout channels, 12 FPGAs
  6272 tiles, 14 FPGAs
  One 6 Gbit/s link from each front-end FPGA to the switching boards

Switching boards: 12 links at 10 Gbit/s per switching board

Filter farm: 12 GPU PCs, 8 inputs each

Gbit Ethernet to the data collection server, then mass storage

(6)

Readout Scheme

From Switching board: get 50 ns time slices of data containing full detector information

(7)

Readout Rate

Detector          Data rate [Gbit/s]
Pixel detector    40
Fiber detector    20
Tile detector     negligible
Total             ~60

At a rate of 10⁸ muons / s

Triggerless, zero-suppressed readout

Need factor ~80 reduction to reach 100 MB/s
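
The reduction factor follows from the rates listed on this slide. A minimal sketch of the arithmetic, assuming decimal units (1 MB = 10⁶ bytes) and neglecting the tile detector; the function name is illustrative only:

```python
# Data rates quoted on the slide, in Gbit/s (tile detector neglected).
PIXEL_GBIT_S = 40
FIBER_GBIT_S = 20


def reduction_factor(input_gbit_s: float, output_mb_s: float) -> float:
    """Ratio of input to output data rate, using 1 MB = 10^6 bytes."""
    input_mb_s = input_gbit_s * 1e9 / 8 / 1e6  # Gbit/s -> MB/s
    return input_mb_s / output_mb_s


factor = reduction_factor(PIXEL_GBIT_S + FIBER_GBIT_S, 100)
print(factor)  # 75.0, consistent with the quoted factor of ~80
```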

(9)

Selection Process

How do we find the three signal tracks?

1) Track fitting

2) Vertex search

[Figure: the three signal tracks e+, e+, e-]

(10)

Geometrical Selection

[Figure: hits 0, 1 and 2 in three consecutive layers, shown in two projections (axes r, x, y)]

Between hits 0 and 1: cut on z1 - z0 and Φ1 - Φ0

Between hits 1 and 2: cut on z2 - z1 and Φ2 - Φ1

Reduce 3-hit combinations by factor 50
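
The cuts on z and Φ differences can be sketched as follows. This is an illustration only, not the FPGA implementation: hits are assumed to be (phi, z) tuples, and the thresholds `max_dz` and `max_dphi` are hypothetical parameters whose values are not given in the talk.

```python
import math
from itertools import product


def delta_phi(phi_a, phi_b):
    """Signed azimuthal difference phi_b - phi_a, wrapped into [-pi, pi)."""
    return (phi_b - phi_a + math.pi) % (2 * math.pi) - math.pi


def pair_passes(hit_a, hit_b, max_dz, max_dphi):
    """Cut on the z- and phi-difference of two hits in subsequent layers."""
    dz = hit_b[1] - hit_a[1]
    dphi = delta_phi(hit_a[0], hit_b[0])
    return abs(dz) <= max_dz and abs(dphi) <= max_dphi


def select_triplets(layer0, layer1, layer2, max_dz, max_dphi):
    """Return all 3-hit combinations that survive both pairwise cuts."""
    triplets = []
    for h0, h1 in product(layer0, layer1):
        if not pair_passes(h0, h1, max_dz, max_dphi):
            continue  # reject early, before looping over the third layer
        for h2 in layer2:
            if pair_passes(h1, h2, max_dz, max_dphi):
                triplets.append((h0, h1, h2))
    return triplets


# Toy hits as (phi [rad], z [mm]); thresholds are hypothetical.
layer0 = [(0.0, 0.0)]
layer1 = [(0.05, 1.0), (1.0, 1.0)]
layer2 = [(0.1, 2.0), (0.1, 50.0)]
selected = select_triplets(layer0, layer1, layer2, max_dz=5.0, max_dphi=0.2)
```

Rejecting a pair before looping over the third layer is what keeps the number of combinations passed on to the fit small.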

(14)

Fitting

Use Multiple Scattering Fit (see talk by A. Kozlinskiy)

Fit hits in first three layers

Propagate to 4th layer

Select hit in 4th layer closest to propagated position

Redo fit with a second triplet, cut on χ²

After all selections:

98.5 % of true 4-hit MC tracks selected

74 % of 4-hit tracks are true MC tracks

(15)

Vertex Estimate: XY-Plane

Study each combination of two e+, one e-

In xy-plane: find intersections of track circles

Calculate weights of intersections based on uncertainties due to
  multiple scattering
  pixel size

[Figure: track circles of e+, e+ and e- in the xy-plane]
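
Finding the intersections of two track circles in the xy-plane is a standard geometry problem. A minimal sketch, assuming each track is described by a circle centre and radius in the transverse plane (the function name and argument layout are illustrative, not taken from the Mu3e code):

```python
import math


def circle_intersections(c0, r0, c1, r1):
    """Intersection points of two circles given as centre (x, y) and radius.

    Returns a list with 0, 1 or 2 points.
    """
    (x0, y0), (x1, y1) = c0, c1
    d = math.hypot(x1 - x0, y1 - y0)
    # No intersection: concentric, too far apart, or one inside the other.
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    # Distance from c0 to the chord through the intersection points.
    a = (r0**2 - r1**2 + d**2) / (2 * d)
    # Half the chord length.
    h = math.sqrt(max(r0**2 - a**2, 0.0))
    xm = x0 + a * (x1 - x0) / d
    ym = y0 + a * (y1 - y0) / d
    if h == 0:
        return [(xm, ym)]  # circles touch in a single point
    ox = h * (y1 - y0) / d
    oy = h * (x1 - x0) / d
    return [(xm + ox, ym - oy), (xm - ox, ym + oy)]


points = sorted(circle_intersections((0.0, 0.0), 5.0, (8.0, 0.0), 5.0))
print(points)  # [(4.0, -3.0), (4.0, 3.0)]
```

For three tracks there are three circle pairs, so up to six intersection points enter the weighted mean.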

(19)

Vertex Estimate

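[Figure: weighted mean of the intersections and the points of closest approach PCAxy 1, PCAxy 2 and PCAxy 3 on the three tracks, in the xy-plane]

Calculate weighted mean of intersections from three different tracks

Find point of closest approach (PCAxy) to weighted mean in xy-plane on each track

Calculate z-position PCAz and weight at PCAxy

Find weighted mean in z-coordinate

Achieve vertex resolution of ~400 μm sigma

$$\chi^2 = \sum_{i=1}^{3} \left[ \left( \frac{\mathrm{PCA}_{xy,i} - \overline{xy}}{\sigma_{\mathrm{PCA}_{xy,i}}} \right)^2 + \left( \frac{\mathrm{PCA}_{z,i} - \bar{z}}{\sigma_{\mathrm{PCA}_{z,i}}} \right)^2 \right]$$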
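
The weighted mean and the χ² of this slide can be sketched in a few lines. Several assumptions are made here that the slide does not state: weights are taken as 1/σ², the xy residual is taken as the Euclidean distance between each PCAxy and the weighted-mean position, and all names are illustrative.

```python
import math


def weighted_mean(values, sigmas):
    """Mean of values with weights 1 / sigma^2 (assumed weighting)."""
    weights = [1.0 / s**2 for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def vertex_chi2(vertex_xy, pca_xy, sigma_xy, pca_z, sigma_z):
    """Chi2 of three tracks with respect to a common vertex estimate.

    vertex_xy: weighted mean of the circle intersections, (x, y)
    pca_xy:    point of closest approach to vertex_xy on each track
    pca_z:     z-position of each track at its PCAxy
    Returns (chi2, z_mean).
    """
    z_mean = weighted_mean(pca_z, sigma_z)
    chi2 = 0.0
    for (x, y), s_xy, z, s_z in zip(pca_xy, sigma_xy, pca_z, sigma_z):
        d_xy = math.hypot(x - vertex_xy[0], y - vertex_xy[1])
        chi2 += (d_xy / s_xy) ** 2 + ((z - z_mean) / s_z) ** 2
    return chi2, z_mean


# Toy numbers in mm, chosen only to make the arithmetic easy to follow.
chi2, z_mean = vertex_chi2(
    vertex_xy=(0.0, 0.0),
    pca_xy=[(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)],
    sigma_xy=[1.0, 1.0, 1.0],
    pca_z=[1.0, -1.0, 0.0],
    sigma_z=[1.0, 1.0, 1.0],
)
```

Tracks from a common vertex give a small χ², while random combinations tend to give large values, which is what the cut exploits.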

(20)

χ² Distribution

[Figure: χ² distribution from 0 to 100 for signal and random combinations; number of entries on a logarithmic scale]

(21)

Cut Effects

Signal reference: full offline track reconstruction and offline vertex fit

[Figure: fraction of signal frames accepted and fraction of background frames accepted; legend entries: background, tight signal cut]

(22)

Fast Reconstruction on GPU

Use time slices of 50 ns for track & vertex search

→ Process 20·10⁶ time slices per second

Plan for 12 filter farm PCs with one GPU each

→ Process at least 1.7·10⁶ time slices per second on each PC

→ Use GPUs:

Thousands of cores

Optimal parallel performance

Best suited for many floating-point operations / second
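
The throughput requirement follows directly from the slice length and the number of PCs. A short sketch of the arithmetic (variable names are illustrative):

```python
SLICE_LENGTH_NS = 50
N_FILTER_FARM_PCS = 12

# One time slice every 50 ns, counted with integers to avoid rounding.
slices_per_second = 10**9 // SLICE_LENGTH_NS

# Load per PC if the slices are shared evenly among the filter farm PCs.
slices_per_pc = slices_per_second / N_FILTER_FARM_PCS

print(slices_per_second)              # 20000000
print(round(slices_per_pc / 1e6, 1))  # 1.7 (million time slices per second)
```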

(23)

Selection on GPU

PCIe FPGA:
  Inputs: hits in layers 1, 2, 3 and 4, recurl station hits, timing information
  Coordinate transformation
  Geometrical three-hit selection

DMA to GPU memory

GPU:
  Three-hit fit
  Propagation, four-hit fit
  Positive tracks, negative tracks
  Vertex selection

Selection decision written to GPU memory and returned via DMA

(25)

Parallelization Track Fit

[Diagram: time slices 1 ... N processed in parallel, each by threads 1 ... N]

Fit for one combination of three hits

Propagation to 4th layer

Loop over hits in 4th layer: check if hit exists in proximity of propagated track, re-fit

Wait for all cores in one time slice to be done with previous steps

16 x 8192 50 ns time slices

96 threads / time slice

Total of 12.6 million threads to be distributed among 2560 cores
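
The thread count can be checked, and the mapping from a global thread index to a time slice illustrated, with a short sketch. It assumes that 16 x 8192 is simply the total number of time slices and that consecutive groups of 96 threads belong to the same time slice, analogous to a block index and a thread index on the GPU; the talk does not spell out the actual grid layout.

```python
# Numbers quoted on the slide.
N_TIME_SLICES = 16 * 8192
THREADS_PER_SLICE = 96
N_CORES = 2560

N_THREADS = N_TIME_SLICES * THREADS_PER_SLICE


def locate(global_thread_id):
    """Map a global thread index to (time slice index, thread within slice).

    Assumed layout: threads of one time slice are numbered consecutively.
    """
    return divmod(global_thread_id, THREADS_PER_SLICE)


print(N_THREADS)               # 12582912, i.e. about 12.6 million
print(N_THREADS // N_CORES)    # 4915 threads per core, rounded down
print(locate(96))              # (1, 0): first thread of the second time slice
```

Because there are far more threads than cores, the GPU scheduler keeps the cores busy while some threads wait at the per-slice synchronisation point.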

(27)

Parallelization Vertex Selection

[Diagram: time slices 1 ... N processed in parallel, each by threads 1 ... N]

For one electron & one positron from this 50 ns time slice:
  Loop over all other positrons
  Find vertex estimate

Decide whether to keep this time slice
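
The combinatorics of the vertex selection can be sketched sequentially. On the GPU each (electron, positron) pair would be handled by its own thread; here plain loops stand in for the threads. The predicate `passes_vertex_cut` is a hypothetical placeholder for the vertex estimate and its cuts, and each positron pair is visited only once, which is a choice made for this sketch.

```python
from itertools import product


def candidate_triplets(positrons, electrons):
    """Yield every combination of two positrons and one electron once."""
    for (i, first), electron in product(enumerate(positrons), electrons):
        # Loop over all other positrons; starting at i + 1 avoids
        # visiting the same positron pair twice.
        for second in positrons[i + 1:]:
            yield first, second, electron


def keep_time_slice(positrons, electrons, passes_vertex_cut):
    """Keep the time slice if any combination passes the vertex selection."""
    return any(
        passes_vertex_cut(p1, p2, e)
        for p1, p2, e in candidate_triplets(positrons, electrons)
    )


triplets = list(candidate_triplets(["p0", "p1", "p2"], ["e0", "e1"]))
print(len(triplets))  # 6 = (3 choose 2) positron pairs x 2 electrons
```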

(28)

Performance

Optimizations performed:

Memory layout and access pattern

Register usage

Grid dimensions

Approximations

Currently process 2·10⁶ time slices / s on one NVIDIA GTX 1080 at a muon stopping rate of 7·10⁷ Hz

(30)

Backup

(31)

Muon Stopping Rate Study I

[Figure: signal frames accepted (0.86 to 1) and background frames accepted (0 to 0.06) versus muon stopping rate on target [Hz], 4·10⁷ to 1.2·10⁸; legend entries: background, tight signal cut, truth signal, loose signal cut]

(32)

Muon Stopping Rate Study II

[Figure: frames / s (0 to 4·10⁶) versus muon stopping rate on target, 4·10⁷ to 1.4·10⁸]

[Figure: frames with hit overflow and frames with triplet overflow versus muon stopping rate on target, 4·10⁷ to 1.4·10⁸]

(33)

The Mu3e Experiment

Search for charged lepton flavour-violating decay μ+ → e+ e- e+ with a sensitivity in branching ratio better than 10⁻¹⁶

Branching ratio suppressed in Standard Model to below 10⁻⁵⁴

Any hint of a signal indicates new physics:

Supersymmetry

Grand unified models

Extended Higgs sector

...

(34)

Mupix7: Efficiency

Mupix7, HV = -85 V

(36)

Mupix: Mechanics

50 μm silicon

~50 μm flexprint: Kapton, aluminum, copper

25 μm Kapton foil

→ O(0.1 %) radiation length

(37)

Sensitivity Study

[Figure: events per 0.2 MeV/c² versus reconstructed mass m_rec [MeV/c²], 96 to 110, on a logarithmic scale, for Mu3e Phase I with 10¹⁵ muon stops at 10⁸ muons/s. Curves: μ → eee at branching ratios 10⁻¹², 10⁻¹³, 10⁻¹⁴ and 10⁻¹⁵; μ → eeeνν; Bhabha + Michel]

(38)

Multiple Scattering

Muons decay at rest

→ momentum < 53 MeV/c

Momentum resolution to first order:

σp/p ∼ θMS

Use recurling tracks for momentum measurement

(39)

MuPix Prototype

Readout electronics on chip

Fast LVDS link: 1.25 Gbit/s

MuPix7: latest prototype

Thinned to 50 μm

32 x 40 pixel matrix

Pixel size: 103 μm x 80 μm

3.2 x 3.2 mm²

(40)

Muon Beam

@ Paul Scherrer Institute (PSI)

590 MeV cyclotron

2.2 mA proton beam

Most powerful proton beam worldwide

Target E: 28 MeV/c surface muons to πE5 beamline

(41)

Data Transfer

Transfer data from FPGA to RAM via direct memory access (DMA)

Tested at 1.5 GB/s: BER ≤ 4·10⁻¹⁶ (at 95% confidence level)

Tested on beam test campaigns

Will be used for readout of next MuPix prototype

LVDS connector for data cable from MuPix chip

(42)

Multiple Scattering Fit

Electrons: 12 – 53 MeV/c

Resolution dominated by multiple Coulomb scattering

Ignore hit uncertainty

Three consecutive hits: “triplet”

Multiple scattering at middle hit of triplet

Minimize multiple scattering

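$$\chi^2 = \frac{\Phi_{MS}^2}{\sigma_{\Phi}^2} + \frac{\theta_{MS}^2}{\sigma_{\theta}^2}$$

[Figure: triplet of hits in two projections, with path segments S01 and S12 and the scattering angles Φ_MS and θ_MS at the middle hit]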

(43)

Geometrical Selection

In subsequent layers, cut on:
  z-difference of hits
  Φ-difference of hits (for example Φ1 - Φ0)

After all cuts: reduce 3-hit combinations by factor 50

(44)

Radius Distribution

[Figure: radius distribution from -400 to 400, number of events / 4 mm, for positrons and electrons]

(45)

Z distance

(46)

Uncertainty at Intersection

$\sigma_{MS,\mathrm{PCA}} = \sigma_{MS,\text{first layer}} \cdot s \approx 0.8\ \mathrm{mm}$

$\sigma = 0.08\ \mathrm{mm} / \sqrt{12} = 0.02\ \mathrm{mm}$ (pixel size)

Take both into account when calculating weights

[Figure: multiple scattering sigma at first layer [rad], 0 to 0.1; number of events / mrad]

[Figure: path length in xy-plane from first layer to PCA [mm], 0 to 50; number of events / 0.5 mm]
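
The two uncertainty contributions can be combined into a weight as sketched below. The slide only says that both are taken into account; adding them in quadrature and using 1/σ² as the weight are assumptions of this sketch, as are the function names and the example values of the scattering angle and path length.

```python
import math

PIXEL_SIZE_MM = 0.08


def pixel_sigma(pixel_size_mm=PIXEL_SIZE_MM):
    """Position uncertainty of a uniform distribution over one pixel."""
    return pixel_size_mm / math.sqrt(12)


def ms_sigma_at_pca(theta_ms_rad, path_length_mm):
    """Multiple scattering angle at the first layer projected over the path s."""
    return theta_ms_rad * path_length_mm


def intersection_weight(theta_ms_rad, path_length_mm):
    """Weight 1 / sigma^2 with both contributions added in quadrature (assumed)."""
    sigma2 = ms_sigma_at_pca(theta_ms_rad, path_length_mm) ** 2 + pixel_sigma() ** 2
    return 1.0 / sigma2


print(round(pixel_sigma(), 2))  # 0.02 (mm)
# Hypothetical example: 0.04 rad over 20 mm gives 0.8 mm.
example_sigma = ms_sigma_at_pca(0.04, 20.0)
```

With these numbers the multiple scattering term dominates the pixel term by more than an order of magnitude.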

(47)

Offline Reconstruction Reference

Full detector simulation is available

For this study:

Simulated signal events with one signal decay / 50 ns frame

Simulated background events with ordinary muon decays

Full offline reconstruction includes:

Track reconstruction with hits from all layers and recurl stations

Matching and linking of recurling track pieces

Linearised vertex fit for low momentum tracks in magnetic field

(48)


Selection on GPU

Obtain 50 ns data slices on DAQ computer, so-called frames

Need to process 20·10⁶ frames / s

Will have about 10 DAQ computers

→ Process 2·10⁶ frames / s on each computer

FPGA:
  Geometric selection cuts
  Save hit positions of the three hits belonging to one triplet and hits in fourth layer

DMA transfer from FPGA to GPU

GPU:
  Fits with three and four hits
  Vertex selection
  Save frame decision

(49)

Vertex Position Distribution

[Figures: true - estimated vertex position, number of events / 0.1 mm, each with a Gaussian fit; 7603 entries per histogram]

Histogram                    Mean [mm]   RMS [mm]   χ² / ndf     Constant       Fit mean [mm]         Fit sigma [mm]
(axis label not recovered)   0.02629     0.8089     40.6 / 6     1065 ± 18.8    0.01068 ± 0.00355     0.2314 ± 0.0037
x                            0.01541     1.332      84.29 / 12   600.6 ± 11.1   0.002901 ± 0.006102   0.3914 ± 0.0068
y                            0.04704     1.331      84.32 / 14   613 ± 10.9     0.004342 ± 0.005676   0.3941 ± 0.0058

(50)

Combined Momentum and Energy

[Figure: combined momentum magnitude [MeV/c], 0 to 100; number of events / MeV/c; signal and random combinations]

[Figure: combined energy [MeV], 0 to 200; number of events / MeV; signal and random combinations]

(52)

Distance to Target

[Figure: distance to target [mm], 0 to 20; number of events / 0.1 mm; signal and random combinations]

(54)

Pixel Detector

High Voltage Monolithic Active Pixel Sensors (HV-MAPS)

Fast charge collection via drift

Thinned down to 50 μm

Pixel size: 80 μm x 80 μm

Chip size: 2 cm x 2 cm

Thickness chip & readout: O(0.1 %) radiation length

(55)

Readout Scheme

Front-end board:
  Sort hits according to time stamps
  Send off via optical links

Switching board:
  Merge data from different detector regions
  Pack into 50 ns time slices
  Send off via optical links

PCIe board:
  First data selection
  Transfer data to RAM of PC via PCIe
