
(1)

GPU-based online track reconstruction for the MuPix-telescope

Carsten Grzesik for the Mu3e collaboration

February 29, 2016

(2)

Motivation

Mu3e experiment

- high data rate: ∼1 Tbit/s
- online track reconstruction
- reduction factor: ∼1000

MuPix telescope

- test setup: pixel sensors, readout and online reconstruction
- high beam rates: O(1 MHz)
- max. output rate: 4 × 1.25 Gbit/s
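The rates quoted above can be put side by side with a few lines of arithmetic. This is only a back-of-the-envelope sketch in Python: it uses the approximate figures from the slide, reads "4 × 1.25 Gbit/s" as four outputs of 1.25 Gbit/s each, and treats all numbers as raw rates without any encoding or protocol overhead (an assumption, since the slide does not say). All names are illustrative.

```python
# Back-of-the-envelope rate arithmetic using the approximate slide figures.
# Assumption: raw rates only, no encoding or protocol overhead considered.

MU3E_INPUT_TBIT_S = 1.0   # ~1 Tbit/s into the Mu3e online reconstruction
REDUCTION_FACTOR = 1000   # ~1000, reduction by online track reconstruction

TELESCOPE_OUTPUTS = 4     # read as four outputs (interpretation of "4 x 1.25")
OUTPUT_RATE_GBIT_S = 1.25 # 1.25 Gbit/s per output


def mu3e_output_gbit_s() -> float:
    """Approximate Mu3e rate after online reduction, in Gbit/s."""
    return MU3E_INPUT_TBIT_S * 1000.0 / REDUCTION_FACTOR


def telescope_max_gbit_s() -> float:
    """Maximum telescope output rate, in Gbit/s."""
    return TELESCOPE_OUTPUTS * OUTPUT_RATE_GBIT_S


def gbit_to_gbyte(rate_gbit_s: float) -> float:
    """Convert Gbit/s to GB/s (8 bits per byte)."""
    return rate_gbit_s / 8.0


if __name__ == "__main__":
    print(f"Mu3e after reduction: ~{mu3e_output_gbit_s():.1f} Gbit/s")
    print(f"Telescope maximum: {telescope_max_gbit_s():.2f} Gbit/s "
          f"= {gbit_to_gbyte(telescope_max_gbit_s()):.3f} GB/s")
```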

(3)

High Voltage Monolithic Active Pixel Sensor

- 180 nm HV-CMOS technology
- reverse biased up to 90 V
- thin depletion region
- thinning to 50 µm
- readout logic directly on chip
- zero-suppressed serial data output at 1.25 Gbit/s
- details in T72.1/2/3

I. Perić, Nucl. Instrum. Meth., 2007, A582, 876

(4)

Setup

[Figure: telescope setup sketch; label: Beam]

- MuPix7
- sensor size: 3.2 × 3.2 mm²
- serial data output via LVDS

(5)

Data Transmission - DMA

- serial data output from sensor planes
- merge and sort on FPGA
- PCIe connection to PC
- Direct Memory Access (DMA) to GPU not available
- DMA via main memory
- data rate: ≤ 1.5 GB/s

[Figure: data path diagram; labels: MuPix, FPGA, PCIe, CPU, RAM, GPU]

(6)

Graphics Processing Unit

- programming: CUDA API
- commercial gaming GPUs
- GTX 980: 2048 cores @ 1.3 GHz
- straight track model → few calculation steps
- combinatorics of hits → lots of memory loads
- memory-bound algorithm → need high memory throughput
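To illustrate why a straight-track model needs only a few arithmetic steps while the hit combinatorics dominate the work, here is a minimal sketch in Python (chosen over CUDA for readability). It is a generic unweighted least-squares line fit over all hit combinations, not the collaboration's code; the function names, plane positions, hit values, and the single resolution `sigma` are assumptions made up for this example.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def fit_line(z: Sequence[float], x: Sequence[float]) -> Tuple[float, float]:
    """Unweighted least-squares fit of x = x0 + slope * z."""
    n = len(z)
    if n < 2 or n != len(x):
        raise ValueError("need at least two (z, x) pairs of equal length")
    mean_z = sum(z) / n
    mean_x = sum(x) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    if szz == 0.0:
        raise ValueError("all planes share the same z position")
    szx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
    slope = szx / szz
    return mean_x - slope * mean_z, slope


def residuals(z: Sequence[float], x: Sequence[float],
              x0: float, slope: float) -> List[float]:
    """Measured minus fitted position on each plane."""
    return [xi - (x0 + slope * zi) for zi, xi in zip(z, x)]


def chi2(res: Sequence[float], sigma: float) -> float:
    """Sum of squared residuals in units of the assumed resolution sigma."""
    return sum((r / sigma) ** 2 for r in res)


def best_candidate(z: Sequence[float],
                   hits_per_plane: Sequence[Sequence[float]],
                   sigma: float) -> Optional[Tuple[float, Tuple[float, ...]]]:
    """Try every combination of one hit per plane and keep the lowest chi2.

    The fit itself is cheap; the number of combinations (and hence the
    number of hit loads) grows as the product of the hit counts per plane.
    """
    best = None
    for combo in product(*hits_per_plane):
        x0, slope = fit_line(z, combo)
        value = chi2(residuals(z, combo, x0, slope), sigma)
        if best is None or value < best[0]:
            best = (value, combo)
    return best


if __name__ == "__main__":
    planes_z = [0.0, 1.0, 2.0, 3.0]                   # hypothetical positions
    hits = [[0.0, 5.0], [1.0], [2.0, -3.0], [3.0]]    # hypothetical hits
    print(best_candidate(planes_z, hits, sigma=0.03))
```

With two hits on two of the four planes, this toy frame already produces four candidates to fit.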

(7)

Memory Coalescing

[Figure: threads accessing memory; labels: Memory, Threads, Memory load]

- example for 16 threads/SM
- 16 threads perform the same operation at the same time (e.g. memory load)
- for consecutive data in memory → grouped into 1 load operation
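To make the grouping concrete, the toy model below counts how many aligned memory segments a group of 16 threads touches for a given access pattern. It is a simplified illustration in Python, not a model of any specific GPU: the 64-byte segment size, the 4-byte element size, and all names are assumptions chosen so that 16 consecutive elements fill exactly one segment.

```python
from typing import List, Sequence

N_THREADS = 16       # thread group size used in the slide's example
ELEMENT_BYTES = 4    # assumption: one 32-bit value per thread
SEGMENT_BYTES = 64   # assumption: size of one aligned load segment


def thread_addresses(stride_elements: int, base: int = 0) -> List[int]:
    """Byte address read by each thread when thread t reads element t * stride."""
    return [base + t * stride_elements * ELEMENT_BYTES for t in range(N_THREADS)]


def segments_touched(addresses: Sequence[int],
                     segment_bytes: int = SEGMENT_BYTES) -> int:
    """Number of distinct aligned segments covering the given addresses.

    In this toy model each distinct segment costs one load operation.
    """
    return len({addr // segment_bytes for addr in addresses})


if __name__ == "__main__":
    for stride in (1, 2, 16):
        loads = segments_touched(thread_addresses(stride))
        print(f"stride {stride:2d}: {loads} load operation(s)")
```

In this model consecutive elements need a single load, while a stride of 16 elements needs one load per thread.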

(8)

GPU implementation

- parallelization: one timeframe per thread → no communication required across thread boundaries
- hits from consecutive frames next to each other (coalesced memory access)
- need to sort the data by plane and time

memory pos.       0      1      2      ...  31
hit.plane.frame   0.0.0  0.0.1  0.0.2  ...  0.0.31

memory pos.       32     33     ...  ...  63
hit.plane.frame   1.0.0  1.0.1  ...  ...  1.0.31  ...

memory pos.       256    257    ...
hit.plane.frame   0.1.0  0.1.1  ...
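The layout in the table above can be written as a single index formula. The sketch below, in Python for readability, is inferred only from the positions shown: 32 frames per block, and 8 hit slots per plane because plane 1 starts at position 256 = 8 × 32. The constant and function names are hypothetical, and how frames beyond 31 are stored is not shown on the slide, so the sketch rejects them.

```python
FRAMES_PER_BLOCK = 32     # positions 0..31 hold frames 0..31 of one hit slot
HIT_SLOTS_PER_PLANE = 8   # inferred: plane 1 starts at position 256 = 8 * 32


def memory_position(hit: int, plane: int, frame: int) -> int:
    """Position of entry hit.plane.frame in the layout inferred from the table.

    The frame index varies fastest, so threads handling consecutive frames
    read consecutive positions (coalesced access).
    """
    if not 0 <= frame < FRAMES_PER_BLOCK:
        raise ValueError("frame outside the block shown on the slide")
    if not 0 <= hit < HIT_SLOTS_PER_PLANE:
        raise ValueError("hit slot outside the inferred range")
    if plane < 0:
        raise ValueError("plane must be non-negative")
    return (plane * HIT_SLOTS_PER_PLANE + hit) * FRAMES_PER_BLOCK + frame


if __name__ == "__main__":
    # Threads 0..3 each process one frame and load hit 0 on plane 0.
    print([memory_position(0, 0, frame) for frame in range(4)])
    print(memory_position(1, 0, 0), memory_position(0, 1, 0))
```

Putting the frame index last in the address is what makes one thread per timeframe compatible with coalesced loads: neighbouring threads always touch neighbouring positions.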

(9)

Setup - DESY Testbeam

planned:

[Figure: data path diagram; labels: MuPix, FPGA, PCIe, CPU, RAM, GPU]

implemented:

[Figure: data path diagram; labels: MuPix, FPGA, PCIe, CPU, RAM, GPU, Sorting]

(10)

Results - DESY Testbeam

[Figure: histogram res1_x in two panels, "CPU residuals [µm]" and "GPU residuals [µm]", logarithmic count axis; statistics box in both panels: Entries 636869, Mean 5.006, RMS 33.35]

(11)

Results - DESY Testbeam

[Figure: histogram res_hist, "Residual of Residuals [µm]"; Entries 636869, Mean 0.0001312, RMS 0.0001586]

- deviation < 1 nm
- bias towards larger CPU values
- execution differences between CPU and GPU (e.g. floating point precision)

(12)

Summary and Outlook

- GPU tracking implemented
- DMA working up to 1.5 GB/s
- offline tracking on GPU gives reasonable results

To do:

- finally test FPGA firmware
- use GPU tracking online
- optimization of GPU code

Acknowledgments

The measurements leading to these results have been performed at the Test Beam Facility at DESY Hamburg (Germany), a member of the Helmholtz Association (HGF).

(13)

Backup

- memory-bound GPU kernels → 32-bit floating point
- IEEE 754 floating point arithmetic
- GPU uses Fused Multiply Add (FMA)
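The effect of FMA on a result can be shown with a small emulation. The sketch below is in Python and is not GPU code: it rounds to 32-bit floats via the `struct` module and compares a multiply-add with two roundings against one with a single final rounding. The helper names and the input values are assumptions picked to make the difference visible. The fused variant is exact only when `a * b + c` is representable in double precision, which holds for the example values.

```python
import struct


def to_f32(value: float) -> float:
    """Round a Python float (double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def mul_add_separate(a: float, b: float, c: float) -> float:
    """Multiply and add as two float32 operations (two roundings)."""
    product = to_f32(a * b)   # product of two float32 values is exact in double
    return to_f32(product + c)


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulated FMA: a * b + c with a single rounding to float32.

    Emulation caveat: the sum is formed in double precision, so this matches
    a true FMA only when a * b + c is exactly representable as a double.
    """
    return to_f32(a * b + c)


if __name__ == "__main__":
    a = to_f32(1.0 + 2.0 ** -13)     # hypothetical inputs, exact in float32
    c = -to_f32(1.0 + 2.0 ** -12)
    print("separate:", mul_add_separate(a, a, c))
    print("fused:   ", mul_add_fused(a, a, c))
```

For these inputs the separate version loses the small term of the product when it is rounded, while the fused version keeps it, so the two results differ even though both follow IEEE 754.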

[Figure: "residual vs value: chi2"; x axis: chi2(double), y axis: chi2(GPU)-chi2(double); Entries 1310720, Mean x 11.43, Mean y 5.017e-07, RMS x 8.625, RMS y 9.745e-07]

(14)

Backup

[Figure: histogram htemp, (chi2_float-chi2_double)/chi2_float with selection chi2_float > 0, logarithmic count axis; Entries 999799, Mean 0.02322, RMS 0.1488]

- IEEE 754:
  - float ULP: 10⁻⁷
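The order of magnitude of the float ULP can be checked directly from the bit pattern of a 32-bit float. The sketch below is in Python and uses only the standard `struct` module; the helper names are made up for this example. It computes the spacing to the next larger float32 and is meant for positive finite values below the largest float32.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x after rounding to a 32-bit float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_from_bits(bits: int) -> float:
    """32-bit float with the given bit pattern, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def ulp_f32(x: float) -> float:
    """Spacing between float32(x) and the next larger float32.

    Intended for positive finite x below the largest float32.
    """
    bits = f32_bits(x)
    return f32_from_bits(bits + 1) - f32_from_bits(bits)


if __name__ == "__main__":
    for value in (1.0, 56.0, 1000.0):
        print(f"ULP at {value}: {ulp_f32(value):.3e}")
```

Near 1.0 the spacing is 2⁻²³, about 1.2 × 10⁻⁷, which matches the 10⁻⁷ scale quoted above; the absolute spacing grows with the magnitude of the value.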

(15)

Backup

[Figure: two chi2 histograms (htemp), logarithmic count axis, chi2 axis from 0 to 16000; first: Entries 626660, Mean 56.56, RMS 146.1; second, with selection chi2 == chi2: Entries 636869, Mean 56.58, RMS 146.2]
