HV-MAPS Tracking Telescope: Fast Data Transfer with Direct Memory Access
Dorothea vom Bruch for the Mu3e Collaboration
BTTB Workshop Feb 3, 2016
Introduction
See the talk by Lennart Huth: HV-MAPS tracking telescope
Motivation
4 × 1.25 Gbit/s LVDS links
Max. 30 Mhits / s / plane
More data than we can write to disk
Need GPU online reconstruction for selection
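A hedged back-of-the-envelope for scale: the four links deliver up to 4 × 1.25 Gbit/s = 5 Gbit/s ≈ 625 MB/s of raw bandwidth, well above the sequential write rate of a typical hard disk (of order 100 MB/s), hence the need for online selection.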
Readout Scheme
[Photos: readout PC, back (plug) side and front (CD-drive) side, showing the FPGA board, control adapter, data cables, the PCIe connection, and the graphics processing unit (GPU)]
Readout Scheme
[Photo: FPGA board with control adapter and data cables]
FPGA board commercially available from Altera
Data stream from four planes
Timestamps of 32 ns
Hit sorter:
  Sort according to timestamps (a minimal software model is sketched below)
  Up to 15 hits / timestamp
  Receive trigger signals
  Send off data, either:
    Polling of data, initiated by the CPU, OR
    Direct Memory Access (DMA), initiated by the FPGA
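For illustration, a minimal software model of the bucketing step; the real hit sorter runs in FPGA firmware, and the Hit layout and names below are assumptions rather than the actual telescope data format:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Assumed hit layout; the real telescope data format differs.
struct Hit {
    uint8_t  plane;      // one of the four telescope planes (0..3)
    uint16_t column, row;
    uint16_t timestamp;  // 32 ns granularity
};

// Bucket hits by timestamp, keeping at most 15 hits per timestamp
// (the limit stated above); overflow hits are dropped.
std::map<uint16_t, std::vector<Hit>> sortHits(const std::vector<Hit>& in) {
    std::map<uint16_t, std::vector<Hit>> buckets;
    for (const Hit& h : in) {
        auto& bucket = buckets[h.timestamp];
        if (bucket.size() < 15)
            bucket.push_back(h);
    }
    return buckets;
}
```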
Readout Software
[Diagram: readout software. GUI main window steering: receive data → pre-processing → file writer (hard disk), monitoring, and online tracking & efficiencies]
Polling Data
[Diagram: polling. The CPU sends read requests to the FPGA's data memory; the requested data are copied into main memory (sketched below)]
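A hedged sketch of the polling mode in C++, assuming the FPGA data memory is mapped into the host address space (e.g. through a PCIe BAR); names and types are illustrative:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy n words out of the memory-mapped FPGA data memory into main
// memory. Every element access is an individual read request issued
// by the CPU, which is why polling tops out near 30 MB/s.
std::vector<uint32_t> pollWords(volatile const uint32_t* fpgaMem,
                                std::size_t n) {
    std::vector<uint32_t> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = fpgaMem[i];
    return out;
}
```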
Direct Memory Access (DMA)
[Diagram: DMA. The FPGA holds (address, length) pairs in its address memory and writes data from its data buffer directly into main memory at those locations; interrupt messages inform the CPU (descriptor sketch below)]
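The (address, length) bookkeeping in the diagram can be pictured as a small descriptor table; the sketch below is illustrative, not the actual firmware or driver interface:

```cpp
#include <cstddef>
#include <cstdint>

// One entry of the address memory shown in the diagram: where in main
// memory the FPGA may write, and how many bytes fit there.
struct DmaDescriptor {
    uint64_t physAddress;  // physical address in main memory
    uint32_t length;       // block length in bytes
};

// The driver fills a table like this once and hands it to the FPGA;
// the FPGA then streams data to these locations without CPU involvement
// and sends an interrupt message when a block is complete.
constexpr std::size_t kNumDescriptors = 3;  // "Address 1..3" in the diagram
DmaDescriptor descriptorTable[kNumDescriptors];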
Polling Data vs. Direct Memory Access
Polling Data:
  Read request from the computer, then write from FPGA to main memory
  Computer controls the copying
  Limited to ∼30 MB/s

Direct Memory Access:
  Write from FPGA to main memory
  FPGA controls the copying; the CPU is informed about the process via interrupt messages
  Theoretically limited by the PCIe bandwidth (4 GB/s)
Rate Tests
On FPGA: 256 kB memory buffer for data to be sent off
Send in chunks of 4 kB
Every 256 kB, the PC is notified via an interrupt message where to read next
For the speed test: copy data to the memory of the graphics processing unit (GPU), as sketched below
Check for errors
Not with the telescope readout, only testing the data transfer
At 1.5 GB/s: measured bit error rate ≤ 4×10⁻¹⁶
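A minimal sketch of the host-to-GPU copy step using the CUDA runtime API, assuming pinned host memory and reusing the 256 kB buffer size from above; everything else is illustrative:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>

int main() {
    const std::size_t kBufferSize = 256 * 1024;  // 256 kB, as on the FPGA

    // Pinned (page-locked) host memory so the copy can run at full PCIe speed
    uint8_t* host = nullptr;
    cudaHostAlloc(reinterpret_cast<void**>(&host), kBufferSize,
                  cudaHostAllocDefault);

    uint8_t* device = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&device), kBufferSize);

    // In the rate test, this copy runs once per interrupt notification
    cudaMemcpy(device, host, kBufferSize, cudaMemcpyHostToDevice);

    cudaFree(device);
    cudaFreeHost(host);
    return 0;
}
```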
DMA with Telescope
[Diagram: DMA data are written into a ring buffer in main memory; the pre-processing reads behind the write pointer (a consumer sketch follows below)]
Test with a data generator on the FPGA
Produces hits from four planes
Tested at 300 MB/s
Tested at DESY in October 2015
Run for two days continuously
Errors occurring with a probability of 10⁻⁴
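A hedged sketch of the ring-buffer consumer, assuming the CPU learns the current write position from the FPGA's interrupt messages; the sizes, names, and index protocol are assumptions:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRingSize = 1 << 16;  // ring size in words (assumed)
uint32_t ring[kRingSize];                   // filled by the FPGA via DMA
std::atomic<std::size_t> writeIndex{0};     // advanced on each interrupt

// Stand-in for the real pre-processing of one data word.
void preprocess(uint32_t /*word*/) {}

// Pre-processing thread: consume everything behind the write pointer,
// running forever alongside the DMA transfers.
void consume() {
    std::size_t readIndex = 0;
    for (;;) {
        while (readIndex != writeIndex.load(std::memory_order_acquire)) {
            preprocess(ring[readIndex]);
            readIndex = (readIndex + 1) % kRingSize;
        }
    }
}
```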
Online Track Reconstruction on GPUs
Sort hits according to planes
Prepare memory for coalesced access on the GPU
Copy data to GPU memory
Fit straight tracks (see the kernel sketch below)
Calculate efficiency
Track rate of ∼700 kHz
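A hedged CUDA sketch of the straight-track fit, one thread per track candidate, with hits stored plane-major so that consecutive threads read consecutive addresses (the coalesced layout mentioned above); plane positions, array layout, and names are assumptions, not the actual Mu3e code:

```cpp
#include <cuda_runtime.h>

// Least-squares fit of x = slope * z + intercept through the four plane
// hits of one track candidate. Hits are stored plane-major,
// x[p * nTracks + t], so neighbouring threads access neighbouring words.
__global__ void fitTracks(const float* x, const float* z,
                          float* slope, float* intercept, int nTracks) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nTracks) return;

    float sz = 0.f, sx = 0.f, szz = 0.f, szx = 0.f;
    for (int p = 0; p < 4; ++p) {
        float zi = z[p];                 // z position of plane p
        float xi = x[p * nTracks + t];   // coalesced read
        sz += zi; sx += xi; szz += zi * zi; szx += zi * xi;
    }
    float denom = 4.f * szz - sz * sz;
    slope[t]     = (4.f * szx - sz * sx) / denom;
    intercept[t] = (sx * szz - sz * szx) / denom;
}
```

One thread per track keeps the fit branch-free; the plane-major layout is what makes the reads coalesced.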
GPU Reconstruction: Results from DESY (10/2015)
[Histograms: residuals in x for plane 1 (res1_x), reconstructed on the CPU (left) and on the GPU (right); both distributions are identical: 636869 entries, mean 5.006, RMS 33.35]
DMA and GPU Reconstruction - Current Status
[Diagram: current status. GUI main window steering: DMA data receipt → pre-processing → file writer (hard disk), monitoring, and online tracking & efficiencies on the GPU]
Combine DMA and GPU Reconstruction
[Diagram: planned scheme. The FPGA performs the pre-processing and sends two DMA streams into main memory: hits sorted for GPU tracking, feeding the online tracking & efficiencies on the GPU, and hits for offline analysis, feeding the file writer (hard disk) and the monitoring]
Master's thesis in progress by Carsten Grzesik
Summary
Data transmission via DMA @ 1.5 GB/s
Simulated telescope data @ 300 MB/s
Tested telescope readout via DMA @ DESY
Tested online track reconstruction on GPU @ DESY
Outlook
Transfer sorted hits directly from FPGA to GPU memory via main memory
Online track reconstruction and efficiency calculation independent of file writing for offline analysis

Goal:
Telescope with large chips
Capable of high rates (∼20 Mhits / plane / s)
Fast online track reconstruction
Iterative alignment procedure
Online efficiency calculation