Online Track and Vertex Reconstruction on GPUs for the Mu3e Experiment
Dorothea vom Bruch
March 28th, 2017
DPG Frühjahrstagung 2017, T46: Elektronik
The Mu3e Experiment
Search for the charged lepton flavour-violating decay $\mu^+ \to e^+ e^- e^+$ with a sensitivity in branching ratio better than $10^{-16}$
Branching ratio suppressed in the Standard Model to below $10^{-54}$
Any hint of a signal → new physics
● Supersymmetry
● Grand unified models
● Extended Higgs sector
● ...
Mu3e Signal
Signal
● Coincident in time
● Single vertex
● $\sum \vec{p}_i = 0$
● $E = m_\mu$
Random Combinations
● Not coincident in time
● No single vertex
● $\sum \vec{p}_i \neq 0$
● $E \neq m_\mu$
The Mu3e Detector
[Figure: detector schematic showing the target, inner pixel layers, scintillating fibres, outer pixel layers, recurl pixel layers and scintillator tiles, with the μ beam and the magnetic field B; scale marks 10 cm and 4.5 cm]
Readout Scheme
FPGA: Field-Programmable Gate Array; GPU: Graphics Processing Unit
[Diagram: readout chain]
● Front-end (inside magnet):
– 2844 pixel sensors → up to 45 links at 1.25 Gbit/s → 86 FPGAs
– 3072 fibre readout channels → 12 FPGAs
– 6272 tiles → 14 FPGAs
● One 6 Gbit/s link from each front-end FPGA → switching boards
● Twelve 10 Gbit/s links per switching board → 12 PCs with one GPU and 8 inputs each
● Gbit Ethernet
From Switching board: get 50 ns time slices of data containing full detector information
Readout Rate
Detector          Data rate [Gbit/s]
Pixel detector    40
Fibre detector    20
Tile detector     negligible
Total             ~60
At a rate of $10^8$ muons/s
Triggerless, zero-suppressed readout
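As a rough cross-check, the quoted total data rate can be converted into an average data volume per 50 ns time slice. The sketch below uses only numbers quoted on these slides; the variable names are illustrative and not taken from any Mu3e software.

```python
# Average load per 50 ns time slice, from the numbers quoted on the slides.
TOTAL_RATE_GBIT_PER_S = 60.0   # ~60 Gbit/s (pixel + fibre detectors)
MUON_RATE_PER_S = 1e8          # 10^8 muons/s
SLICE_LENGTH_NS = 50.0         # length of one time slice

slices_per_second = 1e9 / SLICE_LENGTH_NS
bits_per_slice = TOTAL_RATE_GBIT_PER_S * 1e9 / slices_per_second
bytes_per_slice = bits_per_slice / 8.0
muons_per_slice = MUON_RATE_PER_S / slices_per_second

print(f"{bits_per_slice:.0f} bits ({bytes_per_slice:.0f} bytes) per time slice")
print(f"{muons_per_slice:.0f} muons per time slice on average")
```

These are averages; individual time slices fluctuate around them.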
Selection Process
How do we find the three signal tracks?
1) Selection cuts
2) Track fitting
3) Vertex search
Geometrical Selection
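In subsequent layers, cut on:
● $\Phi_1 - \Phi_0$ (xy-plane)
● $z_1 - z_0$ (zr-plane)
[Figures: hits 0, 1 and 2 in consecutive layers, shown in the xy-plane and in the zr-plane; result after all cuts]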
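A minimal sketch of such a geometrical pre-selection for a pair of hits in subsequent layers is given below. The function names and the threshold parameters `max_dphi` and `max_dz` are hypothetical; the slides do not quote the actual cut values.

```python
import math

def delta_phi(phi1, phi0):
    """Azimuthal difference phi1 - phi0, wrapped into (-pi, pi]."""
    d = (phi1 - phi0) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def passes_geometric_cuts(hit0, hit1, max_dphi, max_dz):
    """Check a hit pair from subsequent layers.

    hit0, hit1: (x, y, z) positions in mm.
    max_dphi [rad] and max_dz [mm] are hypothetical tuning parameters.
    """
    phi0 = math.atan2(hit0[1], hit0[0])
    phi1 = math.atan2(hit1[1], hit1[0])
    return (abs(delta_phi(phi1, phi0)) <= max_dphi
            and abs(hit1[2] - hit0[2]) <= max_dz)

# Example with made-up hit positions and thresholds.
print(passes_geometric_cuts((20.0, 0.0, 0.0), (29.0, 5.0, 10.0), 0.8, 30.0))
```

Cheap cuts of this kind reduce the number of three-hit combinations before the more expensive fit is run.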
Multiple Scattering Fit
● Electrons: 12 – 53 MeV/c
● Resolution dominated by multiple Coulomb scattering
● Ignore hit uncertainty
● Three consecutive hits: “triplet”
● Multiple scattering at middle hit of triplet
● Minimize multiple scattering:
$\chi^2 = \frac{\Phi_{MS}^2}{\sigma_{MS,\Phi}^2} + \frac{\theta_{MS}^2}{\sigma_{MS,\theta}^2}$
[Figure: triplet of hits]
→ Talk by A. Kozlinskiy (T 116.1, Thursday, 16:45)
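The objective of the triplet fit can be written down in a few lines. The sketch below only evaluates the $\chi^2$ for given scattering angles; it does not perform the minimization over the track parameters, and the function name and example values are illustrative assumptions.

```python
def triplet_chi2(phi_ms, theta_ms, sigma_phi, sigma_theta):
    """Multiple scattering chi2 at the middle hit of a triplet.

    phi_ms, theta_ms: scattering angles in the transverse and
    longitudinal plane [rad].
    sigma_phi, sigma_theta: expected scattering widths [rad].
    """
    return (phi_ms / sigma_phi) ** 2 + (theta_ms / sigma_theta) ** 2

# Made-up example: one sigma in phi, two sigma in theta.
chi2 = triplet_chi2(phi_ms=0.01, theta_ms=0.02, sigma_phi=0.01, sigma_theta=0.01)
print(chi2)
```

In the fit itself both angles depend on the assumed track curvature, and the curvature that minimizes this sum is taken as the result.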
Fitting
● Fit hits in first three layers
● Propagate to 4th layer
● Select hit in 4th layer closest to propagated position
● Redo fit with a second triplet, cut on $\chi^2$
After all selections:
● 98.5 % of true 4-hit MC tracks selected
● 74 % of 4-hit tracks are true MC tracks
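The hit selection in the 4th layer described above can be sketched as a nearest-neighbour search around the propagated position. The function name and the `max_distance` search window are hypothetical; the slides do not state the window size.

```python
import math

def closest_hit(propagated, hits_layer4, max_distance):
    """Return the 4th-layer hit closest to the propagated position.

    propagated: (x, y, z) of the extrapolated three-hit track [mm].
    hits_layer4: list of (x, y, z) hit positions [mm].
    max_distance: hypothetical search window [mm]; None is returned
    if no hit lies within it.
    """
    best, best_d = None, max_distance
    for hit in hits_layer4:
        d = math.dist(propagated, hit)
        if d <= best_d:
            best, best_d = hit, d
    return best

hits = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]
print(closest_hit((0.9, 1.0, 1.1), hits, max_distance=2.0))
```

The selected hit then forms the second triplet (hits 2, 3 and 4) for the re-fit.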
Vertex Estimate: XY-Plane
● Study each combination of two e+, one e-
● In xy-plane: find intersections of track circles
● Calculate weights of intersections based on uncertainties due to
– multiple scattering
– pixel size
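Finding the intersections of two track circles in the xy-plane is a standard geometry problem. The following sketch shows one way to do it; the function name and the example circles are illustrative and not taken from the Mu3e code.

```python
import math

def circle_intersections(c0, r0, c1, r1):
    """Return the 0, 1 or 2 intersection points of two circles.

    c0, c1: circle centres (x, y); r0, r1: radii (same length unit).
    """
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > r0 + r1 or d < abs(r0 - r1):
        return []  # concentric, too far apart, or one inside the other
    # Distance from c0 to the chord through the intersections.
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2.0 * d)
    h = math.sqrt(max(r0 ** 2 - a ** 2, 0.0))  # half chord length
    mx, my = c0[0] + a * dx / d, c0[1] + a * dy / d
    if h == 0.0:
        return [(mx, my)]  # circles touch in one point
    return [(mx + h * dy / d, my - h * dx / d),
            (mx - h * dy / d, my + h * dx / d)]

print(circle_intersections((0.0, 0.0), 5.0, (6.0, 0.0), 5.0))
```

For three tracks this is evaluated for each pair of circles, and each intersection then receives a weight.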
[Figure: track circles of two e+ and one e- in the xy-plane]
Vertex Estimate
[Figure: three track circles in the xy-plane with their points of closest approach PCAxy 1, PCAxy 2 and PCAxy 3 to the weighted mean]
● Calculate weighted mean of intersections from three different tracks
● Find point of closest approach (PCAxy) to weighted mean in xy-plane on each track
● Calculate z-position PCAz and weight at PCAxy
● Find weighted mean in z-coordinate
● Achieve a vertex resolution of σ ≈ 400 μm
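The steps above can be sketched with two small helpers: a weighted mean and the point of closest approach on a circle. The function names are hypothetical, and the choice of weights (for example inverse variances) is an assumption, since the slides only state that weights follow from the uncertainties.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean of scalar values (used for the z-coordinate)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_mean_xy(points, weights):
    """Weighted mean of (x, y) intersection points."""
    return (weighted_mean([p[0] for p in points], weights),
            weighted_mean([p[1] for p in points], weights))

def pca_on_circle(center, radius, point):
    """Point on a track circle closest to a given point in the xy-plane."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        raise ValueError("point coincides with the circle centre")
    return (center[0] + radius * dx / d, center[1] + radius * dy / d)

# Made-up example: two intersections, the second weighted three times higher.
vertex_xy = weighted_mean_xy([(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0])
print(vertex_xy, pca_on_circle((0.0, 0.0), 2.0, (0.0, 5.0)))
```

The z-position of each track at its PCAxy would then enter `weighted_mean` with its own weight to give the z-coordinate of the vertex estimate.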
Cut Effects
Signal reference: full offline track reconstruction and offline vertex fit
[Plot: signal frames accepted (0.986 to 1.000) and background frames accepted (0 to 0.35) after successive cuts: no cuts; $\chi^2$; & target distance; & momentum magnitude; & total energy; & recurler cut. Series: background, tightsignalcut]
Less than 2%
Fast Reconstruction on GPU
● Use time slices of 50 ns for track & vertex search
→ Process $20 \cdot 10^6$ time slices per second
● Plan for 12 filter farm PCs with one GPU each
→ Process at least $1.7 \cdot 10^6$ time slices per second
● Thousands of cores
● Optimal parallel performance
● Best suited for many floating-point operations / second
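The throughput requirement follows directly from the time slice length and the number of GPUs. A short worked calculation, using only the numbers on this slide (variable names are illustrative):

```python
# Required processing rate of the filter farm.
SLICE_LENGTH_S = 50e-9   # one time slice covers 50 ns
N_GPUS = 12              # 12 filter farm PCs with one GPU each

slices_per_second = 1.0 / SLICE_LENGTH_S            # whole farm
slices_per_second_per_gpu = slices_per_second / N_GPUS

print(f"farm:    {slices_per_second:.2e} time slices/s")
print(f"per GPU: {slices_per_second_per_gpu:.2e} time slices/s")
```

The per-GPU value of about $1.67 \cdot 10^6$ is what the slide rounds up to "at least $1.7 \cdot 10^6$".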
Selection on GPU
[Diagram: data flow of the online selection]
● FPGA: hits from layers 1 to 4, recurl station hits, timing information → coordinate transformation, geometrical three-hit selection
● PCIe → GPU memory
● GPU: three-hit fit → propagation, four-hit fit → positive tracks, negative tracks → vertex selection
● GPU memory → selection decision
Performance
Optimizations performed:
● Memory layout and access pattern
● Register usage
● Grid dimensions
Currently process $2 \cdot 10^6$ time slices/s on one NVIDIA GTX 1080 at a muon stopping rate of $7 \cdot 10^7$ Hz
Backup
Other Mu3e Talks:
● L. Huth: Test beam results for neutron and proton irradiated MuPix7 prototypes, T26, Monday, 17:45
● H. Augustin: The MuPix8, HK 18, Tuesday, 11:00
● T. Kar: Large Area Monolithic Pixel Detectors for HL-LHC & Future High Rate Experiments, HK 18, Tuesday, 11:15
● J. Kroeger: Flexprint Design Studies for the Mu3e Experiment, T46, Tuesday, 12:15
● U. Hartenstein: Track Based Alignment for the Mu3e Detector, T89, Wednesday, 17:00
● A.-K. Perrevoort: Searches for New Physics with the Mu3e Experiment, T78, Wednesday, 17:35
● A. Herkert: A Thin Silicon Pixel Tracker for the Mu3e Experiment, T94, Wednesday, 18:30
● S. Dittmeier: Readout of the Mu3e pixel detector, T94, Wednesday, 18:50
Institutions
● University of Geneva
● Heidelberg University
● Karlsruhe Institute of Technology
● Mainz University
● Paul Scherrer Institut
● ETH Zurich
● University of Zurich
Parallelization Track Fit
[Diagram: time slices 1, 2, ..., N processed side by side, each by its own group of threads 1, 2, ...]
● Fit for one combination of three hits
● Propagation to 4th layer
● Loop over hits in 4th layer: check if hit exists in proximity of propagated track, re-fit
● Wait for all cores in one time slice to be done with previous steps
16 × 8192 time slices of 50 ns
96 threads / time slice
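The mapping of work to threads can be illustrated with a CPU-side emulation in Python. The slides give 96 threads per time slice but not the indexing scheme, so the strided assignment of three-hit combinations to threads below is an assumption, and all function names are hypothetical.

```python
THREADS_PER_SLICE = 96  # from the slide

def combinations_for_thread(thread_id, n_combinations,
                            n_threads=THREADS_PER_SLICE):
    """Indices of the three-hit combinations handled by one thread.

    Assumed scheme: a strided loop, so thread t takes t, t + 96, ...
    """
    return range(thread_id, n_combinations, n_threads)

def process_time_slice(n_combinations):
    """Emulate all threads of one time slice sequentially.

    Returns a dict mapping combination index -> thread that handled it.
    """
    owner = {}
    for thread_id in range(THREADS_PER_SLICE):
        for combo in combinations_for_thread(thread_id, n_combinations):
            # A real kernel would fit, propagate and re-fit here.
            owner[combo] = thread_id
    # On a GPU, a barrier (e.g. __syncthreads() in CUDA) would follow,
    # so that all threads of the time slice finish before the next step.
    return owner

owner = process_time_slice(250)
print(len(owner), owner[100], owner[249])
```

With a strided loop, neighbouring threads work on neighbouring combinations, which tends to help coalesced memory access on GPUs.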
Parallelization Vertex Selection
[Diagram: time slices 1, 2, ..., N processed side by side, each by its own group of threads 1, 2, ...]
● For one electron & one positron from this 50 ns time slice:
– Loop over all other positrons
– Find vertex estimate
● Decide whether to keep this time slice
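The combinatorics of the vertex selection can be sketched as a generator over track index triples. Restricting the inner loop to positrons with a higher index, so that each combination of two e+ and one e- is visited once, is an assumption of this sketch and not stated on the slide; the function name is hypothetical.

```python
def vertex_candidates(n_electrons, n_positrons):
    """Yield (electron, positron, other_positron) index triples.

    Each (electron, positron) pair corresponds to the work of one thread;
    the innermost loop runs over the other positrons of the time slice.
    """
    for e in range(n_electrons):
        for p in range(n_positrons):
            # Assumption: only q > p, to avoid testing a triple twice.
            for q in range(p + 1, n_positrons):
                yield (e, p, q)

candidates = list(vertex_candidates(n_electrons=2, n_positrons=3))
print(len(candidates), candidates[:3])
```

A time slice would be kept if at least one of these candidates passes the vertex and kinematic cuts.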
Muon Stopping Rate Study I
[Plot: signal frames accepted (0.86 to 1) and background frames accepted (0 to 0.06) versus muon stopping rate on target [Hz] ($4 \cdot 10^7$ to $1.2 \cdot 10^8$). Series: background, tightsignalcut, truthsignal, losesignalcut]
Muon Stopping Rate Study II
[Plot: frames / s (0 to $4 \cdot 10^6$) versus muon stopping rate on target ($4 \cdot 10^7$ to $1.4 \cdot 10^8$)]
[Plot: frames with hit overflow and frames with triplet overflow (0 to 0.01) versus muon stopping rate on target ($4 \cdot 10^7$ to $1.4 \cdot 10^8$)]
Multiple Scattering
● Muons decay at rest
→ momentum < 53 MeV/c
● Momentum resolution to first order:
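$\sigma_p / p \sim \theta_{MS} / \Omega$
● Use recurling tracks ($\Omega \sim \pi$) for momentum measurement
[Figure: recurling track in the magnetic field B with scattering angle $\theta_{MS}$]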
Multiple Scattering Fit
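[Figure: triplet in the xy-plane and in the zr-plane with arc lengths $S_{01}$ and $S_{12}$ and scattering angles $\Phi_{MS}$ and $\theta_{MS}$]
$\chi^2 = \frac{\Phi_{MS}^2}{\sigma_{MS,\Phi}^2} + \frac{\theta_{MS}^2}{\sigma_{MS,\theta}^2}$
Data Transfer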
● Transfer data from FPGA to RAM via direct memory access (DMA)
● Tested at 1.5 GB/s: BER ≤ $4 \cdot 10^{-16}$ (at 95% confidence level)
● Tested on beam test campaigns
● Will be used for readout of next MuPix prototype
[Figure label: LVDS connector for data]
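To get a feeling for the quoted bit error rate (BER) limit, one can estimate how much data must be transferred to reach it. The sketch below assumes that no bit errors were observed, in which case the standard Poisson upper limit at confidence level CL is BER ≤ −ln(1 − CL) / N for N transferred bits, and it assumes 1 GB = $10^9$ bytes. Both assumptions are not stated on the slide.

```python
import math

CONFIDENCE_LEVEL = 0.95
BER_LIMIT = 4e-16                  # quoted upper limit
LINK_RATE_BIT_PER_S = 1.5e9 * 8    # 1.5 GB/s, assuming 1 GB = 1e9 bytes

# Zero observed errors (assumption): BER <= -ln(1 - CL) / N.
bits_needed = -math.log(1.0 - CONFIDENCE_LEVEL) / BER_LIMIT
seconds_needed = bits_needed / LINK_RATE_BIT_PER_S
days_needed = seconds_needed / 86400.0

print(f"{bits_needed:.2e} bits, about {days_needed:.1f} days at full rate")
```

Under these assumptions the limit corresponds to roughly a week of error-free transfer at the tested rate.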
DMA: Implementation
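[Diagram: memory allocated through the CUDA API; contiguous virtual memory maps to several chunks of physical memory, each described by an address and a length (length 1, length 2, length 3)]
● Write addresses and lengths to the FPGA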
Radius Distribution
[Plot: number of events / 4 mm for positrons and electrons; remaining label: Z distance]
Uncertainty at Intersection
$\sigma_{MS,PCA} = \sigma_{MS,\text{first layer}} \cdot s \approx 0.8$ mm
Take both into account when ...
[Plot: multiple scattering sigma at first layer [rad], 0 to 0.1; number of events / mrad]
[Plot: path length in xy-plane from first layer to PCA [mm], 0 to 50; number of events / 0.5 mm]
[Plot: number of events / 0.1 mm. Fit box: Entries 7603, Mean −0.02629, RMS 0.8089, $\chi^2$/ndf 40.6 / 6, Constant 1065 ± 18.8, Mean −0.01068 ± 0.00355, Sigma 0.2314 ± 0.0037]
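The relation above can be turned into a small numerical sketch. The example values (0.04 rad scattering width, 20 mm path length) are made up to reproduce the quoted ≈ 0.8 mm. Treating the pixel contribution as pitch/√12, adding both contributions in quadrature and using an inverse-variance weight are assumptions of this sketch, not statements from the slides.

```python
import math

def sigma_ms_at_pca(sigma_ms_rad, path_length_mm):
    """Position uncertainty at the PCA from scattering at the first layer."""
    return sigma_ms_rad * path_length_mm

def sigma_pixel(pixel_size_mm):
    """Assumed position resolution of a pixel: pitch / sqrt(12)."""
    return pixel_size_mm / math.sqrt(12.0)

def intersection_weight(sigma_ms_mm, sigma_pix_mm):
    """Assumed inverse-variance weight with both terms added in quadrature."""
    return 1.0 / (sigma_ms_mm ** 2 + sigma_pix_mm ** 2)

s_ms = sigma_ms_at_pca(0.04, 20.0)   # made-up values giving 0.8 mm
s_pix = sigma_pixel(0.08)            # 80 um pixel size
print(s_ms, s_pix, intersection_weight(s_ms, s_pix))
```

With these numbers the multiple scattering term dominates over the pixel size term by more than an order of magnitude.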
Vertex Position Distribution
[Plot: true − estimated vertex position in x [mm], −10 to 10; number of events / 0.1 mm. Fit box: Entries 7603, Mean 0.01541, RMS 1.332, $\chi^2$/ndf 84.29 / 12, Constant 600.6 ± 11.1, Mean −0.002901 ± 0.006102, Sigma 0.3914 ± 0.0068]
[Plot: true − estimated vertex position in y [mm], −10 to 10; number of events / 0.1 mm. Fit box: Entries 7603, Mean 0.04704, RMS 1.331, $\chi^2$/ndf 84.32 / 14, Constant 613 ± 10.9, Mean −0.004342 ± 0.005676, Sigma 0.3941 ± 0.0058]
χ 2 Distribution
[Plot: number of entries (logarithmic scale) versus $\chi^2$ (0 to 100) for signal and random combinations]
Combined Momentum and Energy
[Plot: number of events / MeV/c versus combined momentum magnitude [MeV/c] (0 to 100) for signal and random combinations]
[Plot: number of events / MeV versus combined energy [MeV] (0 to 200) for signal and random combinations]
Distance to Target
[Plot: number of events / 0.1 mm versus distance to target (0 to 20 mm) for signal and random combinations]
Pixel Detector
● High Voltage Monolithic Active Pixel Sensors (HV-MAPS)
● Fast charge collection via drift
● Thinned down to 50 μm
● Pixel size: 80 μm × 80 μm
● Chip size: 2 cm × 2 cm
● Thickness chip & readout: O(0.1 %) radiation length
I. Peric, P. Fischer et al, NIM A 582 (2007) 876
[Figure: HV-MAPS cross-section with p-substrate, n-well, traversing particle and E field]
MuPix: Mechanics
● 50 μm silicon
● ~50 μm flexprint: Kapton, aluminum, copper
● 25 μm Kapton foil
→ O(0.1 %) radiation length
Sensitivity Study
[Plot: events per 0.2 MeV/c² (logarithmic scale) for μ → eee signals at branching ratios of $10^{-12}$, $10^{-13}$, $10^{-14}$ and $10^{-15}$ and for the Bhabha + Michel background; Mu3e Phase I, $10^{15}$ muon stops at $10^8$ muons/s]