Development of high speed waveform sampling ASICs
Stefan Ritt - Paul Scherrer Institute, Switzerland
NSNI – 2010, Mumbai, India
Question …
4 channels, 5 GSPS, 1 GHz BW, 8 bit (6-7), 15k$ (700kRs)
4 channels, 5 GSPS, 1 GHz BW, 11.5 bits, 1k$ (50kRs), USB Power
Switched Capacitor Array
Shift Register
Clock IN
Out
“Time stretcher” GHz → MHz
Waveform stored
Inverter “Domino” ring chain
0.2-2 ns
FADC 33 MHz
Switched Capacitor Array
• Cons
• No continuous acquisition
• Limited sampling depth
• Nonlinear timing
• Pros
• High speed (up to 5 GSPS) high resolution (13 bit SNR)
• High channel density (16 channels on 5x5 mm²)
• Low power (10-40 mW / channel)
• Low cost (~ 10$ / channel)
Goal: Minimize Limitations
Design Options
• CMOS process (typically 0.35 … 0.13 µm) → sampling speed
• Number of channels, sampling depth, differential input
• PLL for frequency stabilization
• Input buffer or passive input
• Analog output or (Wilkinson) ADC
• Internal trigger
PLL
ADC
Trigger
Write Circuitry
How to sample the input signal
Simple inverter chain
[Figure: animation frames showing the 0/1 logic states along the inverter chain as the wave propagates]
Design of Inverter Chain
PMOS > NMOS
PMOS < NMOS
“Tail Biting”
enable
1 2 3 4
1 2 3 4
speed
Phase Locked Loop
On-chip PLL can lock sampling frequency to external reference clock
T Q
Phase Comparator
External Reference Clock
Inverter Chain
loop filter down
1
2
sampling speed control
PLL
up
Timing jitter
[Figure: sampling intervals $\Delta t_1 \dots \Delta t_5$; plots of TD and TI]
• Inverter chain has transistor variations → $\Delta t_i$ between samples differ
“Fixed pattern aperture jitter”
• “Differential temporal nonlinearity” $TD_i = \Delta t_i - \Delta t_\mathrm{nominal}$
• “Integral temporal nonlinearity” $TI_i = \sum_{j=1}^{i} \Delta t_j - i \cdot \Delta t_\mathrm{nominal}$
• “Random aperture jitter” = variation of $\Delta t_i$ between measurements
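The two nonlinearity definitions above can be sketched in a few lines, assuming the per-cell sampling intervals have already been measured. The function and argument names are illustrative only and are not taken from any DRS software.

```python
import numpy as np

def temporal_nonlinearity(dt, dt_nominal):
    """Return (TD, TI) for measured sampling intervals dt.

    TD_i = dt_i - dt_nominal                 (differential)
    TI_i = sum_{j<=i} dt_j - i * dt_nominal  (integral)
    """
    dt = np.asarray(dt, dtype=float)
    td = dt - dt_nominal
    # Cumulative sum of the intervals minus the ideal time of sample i
    ti = np.cumsum(dt) - np.arange(1, len(dt) + 1) * dt_nominal
    return td, ti
```

Note that TI is simply the running sum of TD, which is why small per-cell deviations can accumulate into a much larger integral deviation along the chain.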
Fixed jitter calibration
• Fixed jitter is constant over time, can be measured and corrected for
• Several methods are commonly used
• Most use sine wave with random phase and correct for TDi on a statistical basis
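One possible statistical approach can be sketched as a simulation: for a sine wave with random phase, the chance that a rising zero crossing falls between two neighbouring samples is proportional to the width of that sampling interval. This is a sketch of the general idea only, not necessarily the method used by any of the groups mentioned. The cell widths, sine frequency and the assumption that the total window length is known (for example fixed by the PLL) are illustrative.

```python
import numpy as np

def estimate_cell_widths(true_dt_ns, f_sine_ghz, n_events, seed=0):
    """Estimate each sampling interval from zero-crossing occupancy.

    true_dt_ns : simulated "true" cell widths in ns (unknown in a real chip)
    f_sine_ghz : sine frequency in GHz; the period must be much longer
                 than any cell width so that no crossing is missed
    """
    rng = np.random.default_rng(seed)
    true_dt_ns = np.asarray(true_dt_ns, dtype=float)
    # Sampling instants: samples i and i+1 bound interval i
    t = np.concatenate(([0.0], np.cumsum(true_dt_ns)))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_events, 1))
    v = np.sin(2.0 * np.pi * f_sine_ghz * t[None, :] + phase)
    # A rising zero crossing lies in interval i if the sign flips there
    rising = (v[:, :-1] < 0.0) & (v[:, 1:] >= 0.0)
    counts = rising.sum(axis=0)
    # Occupancy fraction times the (assumed known) total window length
    return counts / counts.sum() * t[-1]
```

The statistical error per cell shrinks with the square root of the number of crossings collected, so many events are needed to reach picosecond-level corrections.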
Fixed Pattern Jitter Results
• TDi typically ~50 ps RMS @ 5 GHz
• TIi goes up to ~600 ps
• Jitter is mostly constant over time → can be measured and corrected
• Residual random jitter 3-4 ps RMS
Achievable Timing Resolution
After proper timing calibration, a “split pulse timing accuracy” of typically ~10 ps can be achieved
D. Breton
What determines the BW?
• The analog bandwidth is given by the parasitic capacitance of the input bus and the input impedance
• Typically 20 fF/cell + 20 pF (bus), 2-3 Ω for bond wire → 1 GHz BW
• An active input buffer does not really help
$f_{3\,\mathrm{dB}} = \frac{1}{2\pi RC} = 1.8\ \mathrm{GHz}$
“The best buffer is no buffer”
– G. Varner
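As a rough illustration of the single-pole RC estimate on this slide, the sketch below evaluates f_3dB = 1/(2πRC). The specific R and C values in the usage note are assumptions chosen for illustration (a few ohms in series, with a bus capacitance plus 1024 cells of 20 fF each); they are not stated as such in the talk.

```python
import math

def f_3db(r_ohm, c_farad):
    """-3 dB corner frequency in Hz of a single-pole RC low-pass."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

# Assumed lumped capacitance: 20 pF bus plus 1024 cells of 20 fF each
C_TOTAL = 20e-12 + 1024 * 20e-15
```

With these assumed values, `f_3db(2.2, C_TOTAL)` comes out close to the 1.8 GHz quoted above, and larger series resistance lowers the corner frequency proportionally.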
Cascaded Switched Capacitor Array
• Combines the advantage of a short input stage (32 cells) with a deep secondary sampling stage (32x32 cells)
• Estimated input BW: 5 GHz
• Sampling speed: 10 GSPS (130 nm)
• 100 ps sample time – 3.1 ns hold time
• Matches BW of fastest detectors (G-APD, MCP-PMT)
Next generation of SCAs
[Figure: cascaded array with shift register and input]
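The quoted sample and hold times follow from the sampling speed and the length of the input stage. The sketch below shows one reading that is consistent with the numbers on the slide, assuming a cell holds its value for the rest of the cycle until the 32-cell input stage wraps around; the function name is illustrative.

```python
def cascade_timing(sampling_rate_hz, n_input_cells):
    """Return (sample_time, hold_time) in seconds for the input stage."""
    sample_time = 1.0 / sampling_rate_hz
    # Each cell is revisited once per full cycle of the input stage
    cycle_time = n_input_cells * sample_time
    hold_time = cycle_time - sample_time
    return sample_time, hold_time
```

For 10 GSPS and 32 input cells this gives a 100 ps sample time and a 3.1 ns hold time, which is the window available to transfer each sample into the secondary stage.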
Readout Circuitry
How to read out sampled waveforms
Analog Readout Methods
[Figure: analog readout circuits for a sampling cell (write/read switches, input Uin, storage capacitor C). Labels: R (700 Ω); “Differential Pair” with bias current Ib, Ib/2 per branch, output Vout; C (200 fF); I ~ kT]
Digital Readout
Wilkinson-type ADC requires only one comparator per sampling cell
[Figure: Wilkinson ADC with 12-bit counter, DAC, ramp voltage, and per-cell comparators with latches, split between ASIC and FPGA]
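The Wilkinson principle can be sketched as a behavioural model: a shared ramp rises while a counter runs, and each cell's comparator latches the counter value when the ramp passes the stored voltage. This is a simplified software model under assumed voltage range and ideal comparators, not a description of any specific chip.

```python
def wilkinson_convert(cell_voltages, v_min=0.0, v_max=1.0, n_bits=12):
    """Convert stored cell voltages to codes with one shared ramp.

    All cells are converted in parallel: one counter, one ramp,
    one comparator and one latch per cell.
    """
    n_codes = 1 << n_bits
    step = (v_max - v_min) / n_codes
    codes = [None] * len(cell_voltages)
    for count in range(n_codes):
        ramp = v_min + count * step
        for i, v in enumerate(cell_voltages):
            # Comparator fires once: latch the counter for this cell
            if codes[i] is None and ramp >= v:
                codes[i] = count
    # Cells whose voltage was never reached saturate at full scale
    return [c if c is not None else n_codes - 1 for c in codes]
```

The conversion time is set by the counter length (4096 clock cycles for 12 bits) regardless of how many cells are converted, which is the trade-off for needing only one comparator per cell.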
How to minimize dead time ?
• Fast analog readout: 30 ns / sample
• Parallel readout
• Region-of-interest readout
• Simultaneous write / read
[Figure: DRS4 functional block diagram. Blocks: domino wave circuit, PLL, write shift register, write config register, stop shift register, read shift register, channels 0-8, MUX, LVDS. Pins: IN0-IN8, OUT0-OUT7, OUT8/MUXOUT, ENABLE, REFCLK, PLLLCK, PLLOUT, DSPEED, DENABLE, DWRITE, DTAP, A0-A3, RSRLOAD, WSRIN, WSROUT, SRIN, SROUT, BIAS, O-OFS, ROFS, AGND, AVDD]
AD9222 12 bit 8 channels
DRS4
DRS4 ROI readout mode
readout shift register
Trigger stop
normal trigger stop after latency
Delay
delayed trigger stop
Patent pending!
33 MHz
e.g. 100 samples @ 33 MHz → 3 µs dead time → 300,000 events / sec.
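The dead-time estimate is simple arithmetic: the number of samples read out divided by the readout clock. A minimal sketch, assuming readout time is the only contribution to dead time (the function name is illustrative):

```python
def dead_time_and_rate(n_samples, readout_clock_hz):
    """Return (dead_time_s, max_event_rate_hz) for a region-of-interest readout."""
    dead_time_s = n_samples / readout_clock_hz
    return dead_time_s, 1.0 / dead_time_s
```

For 100 samples at 33 MHz this gives about 3 µs and a maximum rate of roughly 3 x 10^5 events per second, in line with the rounded numbers quoted. Reading the full 1024 cells instead would take about ten times longer.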
Simultaneous Write/Read
[Figure: channels 0-7 under FPGA control; one channel is written while another (e.g. channel 0) is read out]
8-fold analog multi-event buffer
Expected crosstalk ~few mV
Current SCA ASICs
Chip family         | SAM [1]    | LAB [2]        | DRS [3]          | Anusmriti [4]
Max. sampling speed | 2.5 GSPS   | 3.7 GSPS       | 6 GSPS           | 0.5 GSPS
Analog bandwidth    | 300 MHz    | 900 MHz        | 950 MHz          | ?
Number of channels  | 2          | 1-16           | 9                | 1
SNR                 | 13.4 bits  | 10 bits        | 11.4 bits        | ?
Sampling depth      | 144-2520   | 256-64k        | 1025-8192        | 128
Readout time        | 650 µs     | 150 µs – 10 ms | 30 ns * nsamples | 128 µs
Input buffers       | YES        | YES            | NO               | YES
Internal PLL        | YES        | NO             | YES              | YES
ADC                 | External   | Internal       | External         | External
Power/channel       | 150-500 mW | 15-50 mW       | 14-45 mW         | 400 mW
[1] E. Delagnes, D. Breton et al., NIM A567 (2006) 21
[2] G. Varner et al., NIM A583 (2007) 447
[3] S. Ritt, NIM A518 (2004) and http://drs.web.psi.ch
Advanced Topics
Triggering, Channel Cascading, Waveform Analysis
How to measure best timing?
Simulation of MCP with realistic noise and different discriminators
J.-F. Genat et al., arXiv:0810.5590 (2008)
Flash ADC Technique
[Figure: traditional chain with PMT/APD or wire, Q-sensitive preamplifier, shaper, 60 MHz 12 bit FADC, and TDC]
• Shaper is used to optimize signals for “slow” 60 MHz FADC
• Shaping stage can only remove information from the signal
• Shaping is unnecessary if FADC is “fast” enough
[Figure: waveform-digitizing chain with PMT/APD or wire, transimpedance preamplifier, “fast” 12 bit FADC, and digital processing (amplitude, time, baseline restoration)]
How fast is “fast”
• Nyquist-Shannon: Sampling rate must be at least 2x the highest frequency coming from detector
• Analog Bandwidth must match signal from detector
• Fastest pulses coming from Micro-Channel-Plate PMTs (3 µm pores)
• MCP-PMTs: 70 ps rise time → 4-5 GHz BW → 10 GSPS
• Cable should not limit bandwidth → Put digitizer onto detector
• Higher sampling speed only improves statistics
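The step from rise time to bandwidth to sampling rate can be sketched with the common rule of thumb BW ≈ 0.35 / t_rise, which holds for a single-pole (10-90 %) response. The rule of thumb is an assumption here; the slide does not say how its 4-5 GHz figure was obtained.

```python
def bandwidth_from_rise_time(t_rise_s, k=0.35):
    """Approximate analog bandwidth in Hz for a 10-90 % rise time.

    k = 0.35 is the usual single-pole rule of thumb (an assumption).
    """
    return k / t_rise_s

def min_sampling_rate(bandwidth_hz):
    """Nyquist-Shannon: sample at least twice the highest frequency."""
    return 2.0 * bandwidth_hz
```

For a 70 ps rise time this yields about 5 GHz of bandwidth and therefore a minimum of about 10 GSPS, consistent with the numbers on the slide.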
[Figure: shift register and input driving a fast sampling stage followed by a secondary sampling stage]
Aimed parameters: 5 GHz Bandwidth, 10 GSPS Sampling Rate
Trigger and DAQ on same board
• All SCA applications need some kind of trigger → split signals
• Using a multiplexer in DRS4, input signals can be simultaneously digitized at 65 MHz and sampled in the DRS
• FPGA can make local trigger (or global one) and stop DRS upon a trigger
• DRS readout (5 GSPS) through same 8-channel FADCs
[Figure: board block diagram with analog front end, DRS4 with MUX, 12 bit 65 MHz FADC, LVDS link, FPGA, SRAM, trigger line, and global trigger bus]
“Free” local trigger capability without additional hardware
Daisy-chaining of channels
[Figure: channels 0-7 sharing one domino wave; enable inputs, clocked with 0/1 patterns, select which channels are written]
DRS4 can be partitioned in: 8x1024, 4x2048, 2x4096, 1x8192 cells
Interleaved sampling
delays (167 ps / 8 = 21 ps)
6 GSPS * 8 = 48 GSPS
Possible if delay is implemented on PCB
G. Varner et al., Nucl.Instrum.Meth. A583, 447 (2007)
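Interleaving can be sketched as follows: if channel k sees the same signal delayed by k/8 of one sampling period, the eight sample streams can be merged sample by sample into one stream at eight times the rate. This is an idealized sketch that assumes perfectly known delays and identical channel gains; the names are illustrative.

```python
import numpy as np

def interleave(channel_samples):
    """Merge samples of shape (n_channels, n_samples) into one waveform.

    Channel k, sample n is assumed to be taken at
    t = n * T + k * T / n_channels, with T the per-channel sampling period.
    """
    a = np.asarray(channel_samples)
    # Transpose so that consecutive output samples cycle through the channels
    return a.T.reshape(-1)

def interleaved_time_axis(n_channels, n_samples, fs_hz):
    """Time of each merged sample for a per-channel rate fs_hz."""
    return np.arange(n_channels * n_samples) / (fs_hz * n_channels)
```

In practice the per-channel delays and gains differ slightly, so a calibration step would be needed before merging.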
On-line waveform display
click template
fit
pedestal histo
848 PMTs
“virtual oscilloscope”
Pulse shape discrimination
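[Equation: pulse-shape model V(t) with amplitudes A, B, C, start time t_0, step function θ(t − t_0) and exponential terms of the form e^{−(t − t_0)/τ}; the individual terms are labelled: Leading edge, Decay time, AC-coupling, Reflections]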
Example: α/γ source in liquid xenon detector (or: γ/p in air shower)
τ-distribution: τ = 21 ns and τ = 34 ns
Waveforms can be clearly distinguished
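A minimal sketch of discrimination by decay time: fit a single exponential to the falling edge of each waveform and compare the fitted time constant with a cut placed between the two populations. The log-linear fit, the 5 % tail threshold and the cut value are assumptions made for illustration and stand in for the fuller pulse-shape model on the slide.

```python
import numpy as np

def decay_time(t, v, tail_fraction=0.05):
    """Estimate the exponential decay constant of a positive pulse.

    Fits log(v) versus t on the falling edge, from the peak down to
    tail_fraction of the peak height.
    """
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    peak = int(np.argmax(v))
    tail_t, tail_v = t[peak:], v[peak:]
    mask = tail_v > tail_fraction * v[peak]
    slope, _ = np.polyfit(tail_t[mask], np.log(tail_v[mask]), 1)
    return -1.0 / slope

def classify(tau_ns, cut_ns=27.5):
    """Label a pulse as 'fast' or 'slow'; the cut value is an assumed midpoint."""
    return "fast" if tau_ns < cut_ns else "slow"
```

With real data, noise on the tail would call for a weighted fit or a full template fit rather than this simple log-linear regression.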
Template Fit
• Determine “standard” PMT pulse by averaging over many events → “Template”
• Find hit in waveform
• Shift (“TDC”) and scale (“ADC”) template to hit
• Minimize χ²
• Compare fit with waveform
• Repeat if above threshold
• Store ADC & TDC values
Experiment 500 MHz sampling
Pile-up can be detected if two hits are separated in time by ~rise time of signal
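The shift-and-scale step of the template fit can be sketched as a scan over integer shifts, with the best scale at each shift found analytically by least squares. This sketch assumes unit weights (equal noise on every sample), integer-sample shifts and a single hit; the repeat-for-pile-up loop and sub-sample interpolation are left out.

```python
import numpy as np

def template_fit(waveform, template):
    """Return (shift, scale, chi2) of the best single-template fit.

    chi2 is the sum of squared residuals over the whole waveform,
    with the template placed at `shift` and multiplied by `scale`.
    """
    w = np.asarray(waveform, dtype=float)
    tpl = np.asarray(template, dtype=float)
    tt = float(tpl @ tpl)
    ww = float(w @ w)
    best = None
    for shift in range(len(w) - len(tpl) + 1):
        proj = float(w[shift:shift + len(tpl)] @ tpl)
        scale = proj / tt                # least-squares amplitude ("ADC")
        chi2 = ww - proj * proj / tt     # residual at that amplitude
        if best is None or chi2 < best[2]:
            best = (shift, scale, chi2)
    return best
```

To handle pile-up along the lines of the bullets above, one could subtract the fitted template from the waveform and run the fit again on the residual while it stays above threshold.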
Do we still need crates?
• An empty crate slot costs ~1k$ (crate, interface/computer, cooling)
• Crate topologies require long cables → Reduction of bandwidth
• Alternative: Put electronics on detectors
MEG 3000 channels
G. Varner, Belle-TOF
cPCI
H. Friedrich
GBit Ethernet
Experiments using SCA ASICs
MAGIC-II
MEG 3000 channels
ANITA
ANTARES
H.E.S.S.
Belle-TOF
Conclusions
• Fast waveform digitizing with SCA chips will have a big impact on experiments in the near future, replacing traditional ADCs and TDCs
• SCA community growing! Exchange of experience is important. Joining is easy (e.g. USB evaluation boards)
• New generation of SCA chips on the horizon