A new method to generate and reduce one-loop amplitudes in OpenLoops 2
M. F. Zoller
in collaboration with F. Buccioni and S. Pozzorini
PSI, Villigen - Theoretical Particle Physics Seminar - 04/10/2017
Outline
I. Numerical amplitude generation in OpenLoops
II. New colour and helicity treatment
III. On-the-fly Reduction
IV. Numerical stability
V. Summary and Outlook
I. Numerical amplitude generation in OpenLoops
• Fully automated numerical algorithm for tree and one-loop amplitudes (h = helicity configuration):
$$W_0 = \sum_h \sum_{\rm col} |M_0(h)|^2, \qquad W_1 = \sum_h \sum_{\rm col} 2\,{\rm Re}\big[M_0^*(h)\, M_1(h)\big], \qquad W_1^{\text{loop-ind}} = \sum_h \sum_{\rm col} |M_1(h)|^2$$
Tree-level and one-loop amplitudes are sums of Feynman diagrams:
$$M_0 = \sum_d M_0^{(d)}, \qquad M_1 = \sum_d M_1^{(d)}$$
• hybrid tree-loop recursion ⇒ high CPU efficiency and numerical stability
• NLO QCD and NLO EW corrections fully implemented
• OpenLoops is interfaced to Sherpa, Powheg, Herwig, Whizard, Geneva, Munich, Matrix
• OpenLoops 1 publicly available at openloops.hepforge.org [Cascioli, Lindert, Maierhöfer, Pozzorini]
– Third party tools for the tensor integral reduction to scalar MIs:
Cuttools 1.9.5 [Ossola, Papadopoulos, Pittau ’08], OneLoop 3.6.1 [van Hameren ’10], Collier 1.2 [Denner, Dittmaier, Hofer ’16]
– High tensor rank in loop momentum q ⇒ high complexity
– Stability in the IR region is challenging for 2 → 4 processes
Long-term goal: NNLO automation for 2 → 2 and 2 → 3 processes
– Two-loop amplitude construction and reduction needed ⇒ avoid high tensor rank complexity
– Numerical stability at NLO for 2 → 4 is crucial
• OpenLoops 2 to be published soon [Buccioni, Lindert, Maierhöfer, Pozzorini, M.Z.]
– Amplitude construction and integrand reduction merged ⇒ On-the-fly Reduction
⇒ tensor rank ≤ 2 at all times
– Stability issues addressed in a targeted way
Tree level amplitudes
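$$M_0 = \sum_d M_0^{(d)}$$
Each diagram factorizes into a colour factor and a colour-stripped amplitude: $M_l^{(d)} = C_l^{(d)} A_l^{(d)}$.
The colour-stripped $A_0^{(d)}$ are split into subtrees by cutting an internal line, for example:
[Diagram: a tree diagram cut at an internal line into the two subtrees $w_a$ and $w_b$]
⇒ Numerical merging of subtrees performed recursively:
[Diagram: subtree $w_a$ with off-shell line $\sigma_a$, built from a vertex joining the subtrees $w_b$ and $w_c$]
$$w_a^\alpha(k_a,h_a) = \frac{X^\alpha_{\beta\gamma}(k_b,k_c)}{k_a^2 - m_a^2}\; w_b^\beta(k_b,h_b)\; w_c^\gamma(k_c,h_c)$$
with momentum $k_a = k_b + k_c$ and for all possible helicity configurations $h_a = h_b + h_c$.
⇒ Once computed, subtrees are reused in multiple Feynman diagrams at tree and loop level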
One-loop amplitude
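$$A_1^{(d)} = \int d^Dq\; \frac{{\rm Tr}\big[N(q,h)\big]}{\bar D_0 \bar D_1 \cdots \bar D_{N-1}}$$
[Diagram: one-loop diagram with loop momentum q, propagators $D_0, D_1, D_2, \dots, D_{N-1}$ and attached subtrees $w_1, w_2, \dots, w_{N-1}, w_N$; cutting the loop open at $\bar D_0$ gives the open loop $[N(q,h)]_{\beta_0}^{\beta_N}$ with the subtrees $w_1, \dots, w_N$ attached between the indices $\beta_0$ and $\beta_N$]
• propagators $D_i = (q+p_i)^2 - m_i^2$
• spinor/Lorentz indices $\beta_i$ ⇒ trace: contraction with $\delta^{\beta_0}_{\beta_N}$
• helicity configurations of subtree $w_i$: $h_i$
• helicity configurations of $A_1^{(d)}$: $h = h_1 + \dots + h_N$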
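Numerator factorizes into segments:
$$\big[N(q,h)\big]_{\beta_0}^{\beta_N} = \Big[\prod_{i=1}^{N} S_i(q,h_i)\Big]_{\beta_0}^{\beta_N} = \big[S_1(q,h_1)\big]_{\beta_0}^{\beta_1}\, \big[S_2(q,h_2)\big]_{\beta_1}^{\beta_2} \cdots \big[S_N(q,h_N)\big]_{\beta_{N-1}}^{\beta_N}$$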
In the SM a segment (external subtree(s) + one loop vertex + propagator) is a q-polynomial of rank r ≤ 1:
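3-point segment:
[Diagram: loop line from $\beta_{i-1}$ to $\beta_i$ with propagator $D_i$ and one attached subtree $w_i$ carrying momentum $k_i$]
$$\big[S_i(q,h_i)\big]_{\beta_{i-1}}^{\beta_i} = \Big\{ \big[Y^i_{\sigma_i}\big]_{\beta_{i-1}}^{\beta_i} + \big[Z^i_{\nu;\sigma_i}\big]_{\beta_{i-1}}^{\beta_i}\, q^\nu \Big\}\; w_i^{\sigma_i}(k_i,h_i)$$
4-point segment:
[Diagram: loop line from $\beta_{i-1}$ to $\beta_i$ with propagator $D_i$ and two attached subtrees $w_{i_1}$, $w_{i_2}$ carrying momenta $k_{i_1}$, $k_{i_2}$]
$$\big[S_i(q,h_i)\big]_{\beta_{i-1}}^{\beta_i} = \big[Y^i_{\sigma_1\sigma_2}\big]_{\beta_{i-1}}^{\beta_i}\; w_{i_1}^{\sigma_1}(k_{i_1},h_{i_1})\; w_{i_2}^{\sigma_2}(k_{i_2},h_{i_2}), \qquad h_i = h_{i_1} + h_{i_2}$$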
The OpenLoops dressing step
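Define the partially dressed numerator $N_n(q,\hat h_n) = S_1(q,h_1)\cdots S_n(q,h_n)$ with $\hat h_n = \sum_{i=1}^{n} h_i$.
[Diagram: open loop from $\beta_0$ to $\beta_N$; the segments $w_1 D_1,\, w_2 D_2, \dots, w_k D_k$ (up to the index $\beta_k$) are dressed, the segments $w_{k+1} D_{k+1}, \dots, w_{N-1} D_{N-1},\, w_N D_0$ are undressed]
Dressing step: $N_n(q,\hat h_n) = N_{n-1}(q,\hat h_{n-1})\, S_n(q,h_n)$ with initial condition $N_0 = \mathbb{1}$ (rank $R \le n$), performed numerically for the tensor coefficients in
$$N(q,\hat h_n) = \sum_{r=0}^{R} N_{\mu_1\dots\mu_r}(\hat h_n)\; q^{\mu_1}\cdots q^{\mu_r},$$
$$\big[N_{\mu_1\dots\mu_r}(\hat h_n)\big]_{\beta_0}^{\beta_n} = \Big\{ \big[N_{\mu_1\dots\mu_r}(\hat h_{n-1})\big]_{\beta_0}^{\beta_{n-1}} \big[Y^n_{\sigma_n}\big]_{\beta_{n-1}}^{\beta_n} + \big[N_{\mu_2\dots\mu_r}(\hat h_{n-1})\big]_{\beta_0}^{\beta_{n-1}} \big[Z^n_{\mu_1;\sigma_n}\big]_{\beta_{n-1}}^{\beta_n} \Big\}\; w_n^{\sigma_n}(k_n,h_n)$$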
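The dressing recursion can be illustrated with a small toy model. The sketch below is not OpenLoops code: it assumes that each segment has already been contracted with its subtree, so that it is simply a matrix polynomial Y + Z^ν q_ν, and it absorbs the metric into Z so that q_ν is contracted component by component. The names `dress`, `evaluate`, `matmul` and `matadd` are hypothetical helpers chosen for this example.

```python
import math

DIM = 4  # number of loop-momentum components q_0 .. q_3


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def dress(numerator, y, z):
    """One dressing step N_n = N_{n-1} (Y + Z^nu q_nu).

    numerator maps a sorted index tuple (mu_1, ..., mu_r) to the matrix
    coefficient of q_{mu_1} ... q_{mu_r}; y is a matrix, z a list of DIM matrices.
    """
    dressed = {}

    def accumulate(key, matrix):
        dressed[key] = matadd(dressed[key], matrix) if key in dressed else matrix

    for indices, coeff in numerator.items():
        accumulate(indices, matmul(coeff, y))            # rank unchanged
        for nu in range(DIM):                            # rank raised by one
            accumulate(tuple(sorted(indices + (nu,))), matmul(coeff, z[nu]))
    return dressed


def evaluate(numerator, q):
    """Evaluate the tensor polynomial at a numerical loop momentum q."""
    size = len(next(iter(numerator.values())))
    total = [[0.0] * size for _ in range(size)]
    for indices, coeff in numerator.items():
        weight = math.prod(q[mu] for mu in indices)
        total = matadd(total, [[weight * x for x in row] for row in coeff])
    return total
```

Starting from `{(): identity}` and calling `dress` once per segment stores only symmetrized coefficients, so the dictionary holds 5 entries after one step and 15 after two.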
Colour, helicity and diagram sums in OpenLoops 1
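• for each diagram d and global helicity configuration h construct ${\rm Tr}\big[N_N^{(d)}(q,h)\big]$
• colour sum with Born: $V_N^{(d)}(q,h) = 2\sum_{\rm col} M_0(h)^*\, C^{(d)}\, {\rm Tr}\big[N_N^{(d)}(q,h)\big]$
• helicity sum: $V_N^{(d)}(q,0) = \sum_h V_N^{(d)}(q,h)$
• sum same-topology diagrams, reduce and evaluate integrals: $\int d^Dq\; \dfrac{\sum_d {\rm Tr}\big[V_N^{(d)}(q,0)\big]}{\bar D_0 \cdots \bar D_{N-1}}$
⇒ parent-child trick (recycling of colour-stripped partially dressed numerators)
[Diagram: the parent numerator $N_{N-2}$ with subtrees $w_1, \dots, w_{N-2}$ is reused in child diagrams with subtrees $w_1, \dots, w_k, w_{k+1}, w_{k+2}$, which are obtained either as $N_{N-2}\, S_{N-1}\, S_N$ or as $N_{N-2}\, \tilde S_{N-1}$]
New idea: formulate the OpenLoops recursion directly for the colour-helicity summed interference with the Born amplitude, $V_N^{(d)}(q,0)$.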
II. New colour and helicity treatment
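Consider the colour-helicity summed numerator
$$V_N(q,0) = \sum_h 2\Big(\sum_{\rm col} M_0(h)^*\, C\Big) N_N(q,h) = \sum_{h_1\dots h_N} \underbrace{2\sum_{\rm col} M_0(h)^*\, C}_{=V_0(h)}\; S_1(q,h_1)\cdots S_N(q,h_N)$$
and formulate the recursion for the partially dressed numerator with nested helicity sums
$$V_n(q,\check h_n) = \sum_{h_n}\Big[\dots \sum_{h_2}\Big[\sum_{h_1} V_0(h)\, S_1(q,h_1)\Big] S_2(q,h_2) \cdots\Big] S_n(q,h_n) \qquad \forall\; \check h_n = h_{n+1} + \dots + h_N$$
[Diagram: $V_n = \sum_{h_1\dots h_n} \sum_{\rm col}$ (LO diagram) × (NLO open loop), with dressed subtrees $w_1, \dots, w_k$ and undressed subtrees $w_{k+1}, \dots, w_N$]
and a dressing step as
$$V_n(q,\check h_n) = \sum_{h_n} V_{n-1}(q,\check h_{n-1})\, S_n(q,h_n)$$
⇒ Remaining helicity degrees of freedom are those of the undressed segments!
Parent-child trick not possible (different colour factors) ⇒ OpenLoops Merging instead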
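The effect of the nested helicity sums on the bookkeeping can be seen in a toy example. The sketch below is an illustration under simplifying assumptions, not the OpenLoops implementation: segments are plain numbers per helicity (no q-dependence, no spinor/Lorentz structure), every subtree has two helicity states, and `dress_and_sum` and `helicity_summed_numerator` are hypothetical names.

```python
import itertools

HELICITIES = (-1, +1)


def dress_and_sum(v_prev, segment):
    """Dressing step V_n = sum over h_n of V_{n-1} S_n(h_n).

    v_prev is keyed by the helicities (h_n, h_{n+1}, ..., h_N) of the segments
    that were still undressed before this step; segment maps h_n to a number.
    """
    v_new = {}
    for helicities, value in v_prev.items():
        h_n, remaining = helicities[0], helicities[1:]
        v_new[remaining] = v_new.get(remaining, 0.0) + value * segment[h_n]
    return v_new


def helicity_summed_numerator(v0, segments):
    """Apply the nested sums for all segments; also return the table sizes."""
    v, sizes = dict(v0), [len(v0)]
    for segment in segments:
        v = dress_and_sum(v, segment)
        sizes.append(len(v))
    return v[()], sizes


# Toy input: V_0(h) for N = 3 subtrees and one number per segment helicity.
v0 = {h: 0.1 * (i + 1) for i, h in enumerate(itertools.product(HELICITIES, repeat=3))}
segments = [{-1: 0.5, +1: 2.0}, {-1: -1.0, +1: 0.25}, {-1: 3.0, +1: 1.5}]
total, sizes = helicity_summed_numerator(v0, segments)
```

In this toy setup the number of stored helicity configurations halves with every dressing step, which mirrors the statement that only the helicities of the undressed segments remain.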
The OpenLoops Merging
Sum partially dressed open loops, $V_n(q,\check h_n) = \sum_\alpha V_n^{(\alpha)}(q,\check h_n)$, with
• the same topology $\bar D_0, \dots, \bar D_{N-1}$
• the same undressed segments $S_{n+1}, \dots, S_N$
since
$$\sum_\alpha \frac{V_n^{(\alpha)}\; S_{n+1}\cdots S_N}{\bar D_0 \bar D_1 \cdots \bar D_{N-1}} = \frac{V_n\; S_{n+1}\cdots S_N}{\bar D_0 \bar D_1 \cdots \bar D_{N-1}}$$
Example:
[Diagram: four open loops $N^{(1)}, \dots, N^{(4)}$, in which the external legs $e_1, e_2, e_3$ are attached in different ways to the dressed part, all share the undressed part $D_n,\, w_{n+1},\, D_{n+1}, \dots, w_N,\, D_0$ and are summed into a single merged open loop $N$]
• dressing steps for $S_{n+1}, \dots, S_N$ performed only once for the merged object
• crucial for combination with on-the-fly integrand reduction (see later)
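Merging works because the dressing step is linear in the partially dressed numerator. The following toy sketch makes this explicit under strong simplifications that are assumptions of this example only: the loop momentum has a single component, coefficients are plain numbers, and each segment is y + z·q. The helpers `dress` and `merge` are hypothetical.

```python
def dress(coeffs, y, z):
    """Multiply the polynomial sum_r coeffs[r] q^r by the segment (y + z q)."""
    out = [0.0] * (len(coeffs) + 1)
    for rank, c in enumerate(coeffs):
        out[rank] += c * y        # rank unchanged
        out[rank + 1] += c * z    # rank raised by one
    return out


def merge(open_loops):
    """Add the coefficients of open loops that share topology and undressed segments."""
    length = max(len(c) for c in open_loops)
    return [sum(c[r] for c in open_loops if r < len(c)) for r in range(length)]


def dress_all(coeffs, segments):
    """Dress with every remaining segment in turn."""
    for y, z in segments:
        coeffs = dress(coeffs, y, z)
    return coeffs


open_loops = [[1.0, 2.0], [0.5, -1.0], [3.0]]   # three partially dressed numerators
remaining = [(2.0, 1.0), (-1.0, 0.5)]           # the shared undressed segments

merged_first = dress_all(merge(open_loops), remaining)                    # 2 dressing steps
dressed_first = merge([dress_all(c, remaining) for c in open_loops])     # 6 dressing steps
```

Both orders give the same coefficients, but merging first needs one dressing chain instead of one per open loop.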
Amplitude generation and tensor reduction in OpenLoops 1
Example: number of tensor coefficients as a function of the number n of attached external legs (the tensor rank grows with n)

n (# of attached external legs)   1    2    3    4    5    6    7
rank                              1    2    3    4    5    6    7
# of tensor coefficients          5   15   35   70  126  210  330

[Plot: OpenLoops builds the numerator up to n = 7 (330 coefficients); Collier or CutTools then perform the numerical tensor integral reduction to scalar MIs]
The complexity grows rapidly with the tensor rank.
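The counts 5, 15, 35, ..., 330 quoted in the example coincide with the number of independent components of totally symmetric tensors up to rank R in four dimensions, which is the binomial coefficient C(R+4, 4). The short check below reproduces them; the function names are chosen for this illustration only.

```python
import math


def symmetric_components(rank, dim=4):
    """Independent components of a totally symmetric rank-r tensor in dim dimensions."""
    return math.comb(rank + dim - 1, dim - 1)


def tensor_coefficients(max_rank, dim=4):
    """Coefficients of a polynomial in q^mu containing all ranks 0 .. max_rank."""
    return sum(symmetric_components(r, dim) for r in range(max_rank + 1))


counts = [tensor_coefficients(rank) for rank in range(1, 8)]
```

Since C(R+4, 4) grows like R^4/24, the growth is steep but polynomial in the rank.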
III. On-the-fly Reduction
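Use reduction identities valid at integrand level [del Aguila, Pittau ’05]
$$q^\mu q^\nu = A^{\mu\nu} + B^{\mu\nu}_\lambda\, q^\lambda = \Big[A_{-1}^{\mu\nu} + A_0^{\mu\nu} D_0\Big] + \Big[B_{-1,\lambda}^{\mu\nu} + \sum_{i=0}^{3} B_{i,\lambda}^{\mu\nu} D_i\Big] q^\lambda, \qquad D_i = (q+p_i)^2 - m_i^2,$$
in order to reduce the factorized open-loop integrand:
$$\frac{V_N(q)}{D_0 \cdots D_{N-1}} = \frac{S_1(q)\, S_2(q) \cdots S_n(q) \cdots S_N(q)}{D_0 D_1 D_2 D_3 \cdots D_{N-1}}$$
Integrand reduction is applicable after n steps ∀ n ≥ 2 (independently of future steps!)
$$\Rightarrow\quad \frac{V^{\mu\nu} q_\mu q_\nu}{\bar D_0 \cdots \bar D_{N-1}} = \frac{V_{-1}^\mu q_\mu + V_{-1}}{\bar D_0 \cdots \bar D_{N-1}} + \sum_{i=0}^{3} \frac{V_i^\mu q_\mu + V_i}{\bar D_0 \cdots \bar D_{i-1} \bar D_{i+1} \cdots \bar D_{N-1}}$$
• q-dependence reconstructed in terms of 4 propagators ⇒ new topologies with pinched propagators
• $A^{\mu\nu}$, $B^{\mu\nu}_\lambda$ depend on external momenta $p_1, p_2, p_3$
⇒ Compute with momentum-space basis $l_1^\mu = p_1^\mu - \alpha_1 p_2^\mu$, $l_2^\mu = p_2^\mu - \alpha_2 p_1^\mu$, $l_3, l_4 \perp l_1, l_2$, $l_i^2 = 0$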
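The mechanism behind the pinched topologies can be seen in the simplest building block of such identities: a scalar product of q with an external momentum is a linear combination of propagators and constants, so every propagator in the numerator cancels one in the denominator. The sketch below checks the elementary relation 2 q·p_i = D_i − D_0 − p_i² + m_i² − m_0² (with p_0 = 0) numerically. It is only this textbook relation, not the full del Aguila-Pittau identity, and the helper names are hypothetical.

```python
def minkowski_dot(a, b):
    """Scalar product with metric (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def propagator(q, p, mass):
    """D = (q + p)^2 - m^2 for real four-vectors q and p."""
    shifted = [qi + pi for qi, pi in zip(q, p)]
    return minkowski_dot(shifted, shifted) - mass**2


def q_dot_p_from_propagators(q, p_i, m_i, m_0):
    """Rewrite q.p_i through the propagators D_i and D_0 (with p_0 = 0)."""
    d_i = propagator(q, p_i, m_i)
    d_0 = propagator(q, [0.0, 0.0, 0.0, 0.0], m_0)
    return 0.5 * (d_i - d_0 - minkowski_dot(p_i, p_i) + m_i**2 - m_0**2)


q = [1.3, -0.4, 2.2, 0.7]
p1 = [4.0, 1.0, -2.0, 0.5]
reconstructed = q_dot_p_from_propagators(q, p1, m_i=1.5, m_0=0.3)
```

Dividing such a relation by the product of all propagators gives one term with the full denominator and terms in which D_i or D_0 is cancelled, which are the pinched topologies.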
Amplitude generation and tensor reduction in OpenLoops 2
Example: number of tensor coefficients as a function of the number n of attached external legs

n (# of attached external legs)               1    2    3    4    5    6    7
rank without reduction                        1    2    3    4    5    6    7
# of tensor coefficients without reduction    5   15   35   70  126  210  330

[Plot: the OpenLoops 1 curve rises to 330 coefficients at n = 7; with on-the-fly reduction (OpenLoops + OFR) the rank is reduced during the construction, which generates 4 pinched subtopologies, then 4 double-pinched subtopologies, and so on]
Complexity associated with tensor rank remains small!
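Problem: huge proliferation of topologies due to pinching of propagators
$$\Rightarrow\quad \frac{V^{\mu\nu} q_\mu q_\nu}{\bar D_0 \cdots \bar D_{N-1}} = \Big[\underbrace{\Big(V_{-1}^\mu + \sum_{i=0}^{3} V_i^\mu \bar D_i\Big) q_\mu}_{\text{rank 1}} + \underbrace{V_{-1} + V_0 \bar D_0}_{\text{rank 0}} + \underbrace{\tilde V_{-1}\, \tilde q^2}_{\text{rational term}}\Big] \frac{1}{\bar D_0 \cdots \bar D_{N-1}}$$
[Diagram: the rank-2 open loop $V^{\mu\nu} q_\mu q_\nu$ with subtrees $w_1, w_2, w_3, \dots, w_N$ equals the unpinched open loop with numerator $V_{-1}^\mu q_\mu + \tilde V_{-1} \tilde q^2$ plus four open loops with one pinched propagator each and numerators $V_1^\mu q_\mu$, $V_2^\mu q_\mu$, $V_3^\mu q_\mu$ and $V_0^\mu q_\mu$]
⇒ factor ∼ 5 higher complexity after each reduction step!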
Solution: OpenLoops Merging
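• Contract pinched propagator between dressed segments
[Diagram: the segments $w_i D_i$ and $w_{i+1} D_{i+1}$ with pinched $D_i$ are contracted into a single segment with subtrees $w_i, w_{i+1}$ and propagator $D_{i+1}$]
• Merge with all (pinched and unpinched) diagrams with the same topology and undressed segments
[Diagram: the open loops $N^{(1)}$ and $N^{(2)}$, which share $w_n,\, w_{n+1},\, D_{n+1},\, w_{n+2}, \dots, w_N$, are merged into one open loop $N$]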
• No extra cost for pinched topologies after merging
• Algorithm:
– Start with highest point diagrams → merging with lower point diagrams
– OpenLoops 2 recursion step: dress one segment → reduce if necessary → merge
Technicalities
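• Important: cutting rule, i.e. choice of $\bar D_0$.
[Diagram: the closed loop with propagators $D_0, D_1, D_2, \dots, D_{N-1}$ and subtrees $w_1, w_2, \dots, w_{N-1}, w_N$ is cut into the open loop $\beta_0,\; w_1 D_1,\; w_2 D_2, \dots, w_k D_k,\; \beta_k,\; w_{k+1} D_{k+1}, \dots, w_{N-1} D_{N-1},\; w_N D_0,\; \beta_N$]
⇒ One specific external particle always in $w_1$.
⇒ Unique rule for dressing direction based on external particles in $w_2$ and $w_N$.
• Treatment of pinches of $\bar D_0 = q^2 - m_0^2$ ($p_0 = p_N = 0$)
[Diagram: start from the open loop with subtrees $w_1\,(k_1), \dots, w_{n+1}, \dots, w_{N-1}\,(k_{N-1}),\, w_N\,(k_N)$ and $p_N = 0$; shift the cut so that $w_N$ comes first and $p_{N-1} = 0$; dress $S_N$; contract $w_N$ and $w_1$ into one segment]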
Final integral reduction
• reduce bubbles, rank-1 triangles and boxes with integral level identities [del Aguila, Pittau ’05]
• reduce rank-1 and rank-0 integrals with N ≥ 5 propagators to scalar boxes via simple OPP relations [Ossola, Papadopoulos, Pittau ’07]
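$$\frac{V + V^\mu q_\mu}{\bar D_0 \bar D_1 \cdots \bar D_{N-1}} = \sum_{i_0<i_1<i_2<i_3}^{N-1} \frac{d(i_0 i_1 i_2 i_3)}{\bar D_{i_0} \bar D_{i_1} \bar D_{i_2} \bar D_{i_3}}$$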
• use Collier 1.2 [Denner, Dittmaier, Hofer ’16] for scalar boxes, triangles, bubbles, tadpoles
IV. Numerical Stability
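$$q^\mu q^\nu = \Big[A_{-1}^{\mu\nu} + A_0^{\mu\nu} D_0\Big] + \Big[B_{-1,\lambda}^{\mu\nu} + \sum_{i=0}^{3} B_{i,\lambda}^{\mu\nu} D_i\Big] q^\lambda$$
$A_i^{\mu\nu}$, $B_{i,\lambda}^{\mu\nu}$ computed from the reduction basis $l_i(p_1,p_2)$ with i = 1, 2, 3, 4 and the third momentum $p_3$:
$$A_i^{\mu\nu} = \frac{1}{\gamma}\, a_i^{\mu\nu}, \qquad B_{i,\lambda}^{\mu\nu} = \frac{1}{\gamma^2} \big[b^{(1)}_{i,\lambda}\big]^{\mu\nu} + \frac{1}{\gamma} \big[b^{(2)}_{i,\lambda}\big]^{\mu\nu}$$
$$\gamma = \gamma(p_1,p_2) = \frac{4\,\Delta(p_1,p_2)}{p_1 p_2 \pm \sqrt{\Delta(p_1,p_2)}} \qquad \text{with} \qquad \Delta = (p_1 p_2)^2 - p_1^2\, p_2^2$$
Severe numerical instabilities for $\gamma \propto \Delta(p_1,p_2) \to 0$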
• Freedom to choose two momenta from p1, p2, p3
⇒ maximize γ in on-the-fly reduction with N ≥ 4 propagators.
⇒ avoid small Gram determinants until triangle reduction
• For N = 3: identify problematic kinematic configurations and use targeted expansions.
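The role of γ can be made concrete by constructing two light-like basis vectors from p1 and p2. The sketch below assumes the form l1 = p1 − α1 p2, l2 = p2 − α2 p1 with α1 = p1²/(p1·p2 ± √Δ) and α2 = p2²/(p1·p2 ± √Δ); this normalization is an assumption of the example, not taken from the slides. With it, l1² = l2² = 0 and 2 l1·l2 equals the γ defined above. The function names are hypothetical, and Δ > 0 is assumed.

```python
import math


def minkowski_dot(a, b):
    """Scalar product with metric (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def light_like_basis(p1, p2, sign=+1):
    """Return (l1, l2, gamma) for real momenta with positive Gram determinant Delta."""
    p1p2 = minkowski_dot(p1, p2)
    gram = p1p2**2 - minkowski_dot(p1, p1) * minkowski_dot(p2, p2)   # Delta
    root = p1p2 + sign * math.sqrt(gram)
    alpha1 = minkowski_dot(p1, p1) / root
    alpha2 = minkowski_dot(p2, p2) / root
    l1 = tuple(x - alpha1 * y for x, y in zip(p1, p2))
    l2 = tuple(y - alpha2 * x for x, y in zip(p1, p2))
    gamma = 4.0 * gram / root
    return l1, l2, gamma


p1 = (5.0, 1.0, 2.0, 3.0)
p2 = (3.0, 0.5, -1.0, 1.0)
l1, l2, gamma = light_like_basis(p1, p2)
```

Because the reduction coefficients carry factors 1/γ and 1/γ², a pair (p1, p2) with small Δ makes them large, which is why choosing the pair with the largest γ among p1, p2, p3 helps.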
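Problematic kinematic configuration: t-channel diagrams with
[Diagram: triangle with loop momenta $q$, $q + p_1$, $q + p_2$ and external momenta $p_1$, $p_2 - p_1$, $-p_2$]
$$p_1^2 = -p^2 < 0, \qquad p_2^2 = -p^2 (1+\delta), \qquad 0 \le \delta \ll 1, \qquad (p_2 - p_1)^2 = 0$$
$$\Rightarrow\quad \sqrt{\Delta} = \frac{p^2}{2}\, \delta \quad \Rightarrow\quad \gamma = -p^2 \delta^2$$
⇒ expand basis momenta $l_i$, reduction formula and scalar integrals in δ, e.g. massless rank 1:
$$C^\mu = \frac{2}{\delta^2 p^2} \Big\{ B_0(-p^2,0,0) \Big[-p_1^\mu \big(1+\tfrac{\delta}{2}\big) + p_2^\mu\Big] + B_0\big(-p^2(1+\delta),0,0\big) \Big[p_1^\mu (1+\delta) - p_2^\mu \big(1+\tfrac{\delta}{2}\big)\Big] \Big\} + \frac{1}{\delta}\, C_0\big(-p^2,-p^2(1+\delta),0,0,0\big) \Big[-p_1^\mu (1+\delta) + p_2^\mu\Big]$$
$$= \frac{p_1^\mu + p_2^\mu}{2p^2} \Big[-B_0(-p^2,0,0) + 1\Big] + \delta\, \frac{p_1^\mu + 2p_2^\mu}{6p^2}\, B_0(-p^2,0,0) + O(\delta^2)$$
with $C_0(p_1,p_2,m_0,m_1,m_2) \sim \int d^Dq\, \frac{1}{\bar D_0 \bar D_1 \bar D_2}$ and $B_0(p_1,m_0,m_1) \sim \int d^Dq\, \frac{1}{\bar D_0 \bar D_1}$.
Implemented: direct expansions for the full reduction of rank ≤ 3 triangles to scalars for all relevant mass configurations up to and including O(δ²) [soon O(δ⁴)].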
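The benefit of a targeted expansion can be checked numerically for the massless rank-1 triangle in the t-channel kinematics. The sketch below is an independent illustration, not OpenLoops code. Its coefficient formulas come from a standard Passarino-Veltman reduction worked out for this example, with the massless scalar integrals in dimensional regularization written as B0(−p²(1+δ)) = b0 − ln(1+δ) and C0 = ln(1+δ)/(p²δ) · (b0 − 2 − ln(1+δ)/2). Here `b0` stands for the value of B0(−p²,0,0) treated as a plain number, which is an assumption of the example; all function names are hypothetical.

```python
import math


def gamma_t_channel(p_sq, delta):
    """gamma = 4 Delta / (p1.p2 + sqrt(Delta)) from the invariants alone."""
    p1_sq = -p_sq
    p2_sq = -p_sq * (1.0 + delta)
    p1p2 = 0.5 * (p1_sq + p2_sq)            # follows from (p2 - p1)^2 = 0
    gram = p1p2**2 - p1_sq * p2_sq          # Delta
    return 4.0 * gram / (p1p2 + math.sqrt(gram))


def rank1_triangle_exact(p_sq, delta, b0):
    """Coefficients (C1, C2) of C^mu = C1 p1^mu + C2 p2^mu, unexpanded form."""
    log_term = math.log(1.0 + delta)
    b0_shifted = b0 - log_term               # B0(-p^2 (1 + delta), 0, 0)
    c0 = log_term / (p_sq * delta) * (b0 - 2.0 - 0.5 * log_term)
    prefactor = 2.0 / (delta**2 * p_sq)      # the dangerous 1/delta^2
    c1 = (prefactor * ((1.0 + delta) * b0_shifted - (1.0 + 0.5 * delta) * b0)
          - (1.0 + delta) * c0 / delta)
    c2 = prefactor * (b0 - (1.0 + 0.5 * delta) * b0_shifted) + c0 / delta
    return c1, c2


def rank1_triangle_expanded(p_sq, delta, b0):
    """The same coefficients expanded to first order in delta."""
    leading = (1.0 - b0) / (2.0 * p_sq)
    return (leading + delta * b0 / (6.0 * p_sq),
            leading + delta * 2.0 * b0 / (6.0 * p_sq))


exact = rank1_triangle_exact(2.0, 1e-3, 0.7)
expanded = rank1_triangle_expanded(2.0, 1e-3, 0.7)
```

For moderate δ the two forms agree up to O(δ²). For very small δ the unexpanded form subtracts numbers of order 1/δ² and loses digits in double precision, while the expanded form stays smooth.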
CPU performance: OpenLoops 1 + Collier/Cuttools vs OpenLoops 2
Runtimes (10^-3 s) per phase-space point
Last column: timing ratio between the fastest OL1 + reduction library and OL2

Process             OL1 (Collier)    OL1 (Cuttools)   OL2              OL1/OL2
u ū → t t̄           0.2355           0.4034           0.2385           0.99
u ū → t t̄ g         4.259            7.066            3.490            1.2
u ū → t t̄ g g       1.154 · 10^2     1.612 · 10^2     0.7505 · 10^2    1.5
g g → t t̄           1.408            2.486            1.019            1.4
g g → t t̄ g         35.03            50.23            22.93            1.5
g g → t t̄ g g       1.330 · 10^3     1.519 · 10^3     0.6010 · 10^3    2.2
u d̄ → W⁺ g          0.2972           0.6274           0.3255           0.91
u d̄ → W⁺ g g        5.690            11.30            5.222            1.1
u d̄ → W⁺ g g g      1.787 · 10^2     2.380 · 10^2     1.078 · 10^2     1.7
u ū → W⁺ W⁻         0.2622           0.4140           0.1756           1.5
u ū → W⁺ W⁻ g       8.528            12.04            7.011            1.2
u ū → W⁺ W⁻ g g     2.441 · 10^2     2.817 · 10^2     1.278 · 10^2     1.9

Speedup by a factor of 1.5 to 2.2 with respect to OpenLoops 1 for the 2 → 4 processes!
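The last column of the timing table can be recomputed from the other three. The snippet below uses the runtimes quoted above (in units of 10^-3 s) for a few of the processes; the dictionary layout and the function name are choices made for this example.

```python
# Runtimes per phase-space point in 10^-3 s: (OL1 + Collier, OL1 + Cuttools, OL2)
TIMINGS = {
    "u ubar -> t tbar": (0.2355, 0.4034, 0.2385),
    "u ubar -> t tbar g g": (115.4, 161.2, 75.05),
    "g g -> t tbar g g": (1330.0, 1519.0, 601.0),
    "u dbar -> W+ g": (0.2972, 0.6274, 0.3255),
    "u dbar -> W+ g g g": (178.7, 238.0, 107.8),
    "u ubar -> W+ W- g g": (244.1, 281.7, 127.8),
}


def speedup(collier, cuttools, ol2):
    """Ratio of the fastest OL1 + reduction-library runtime to the OL2 runtime."""
    return min(collier, cuttools) / ol2


ratios = {process: speedup(*times) for process, times in TIMINGS.items()}
```

In all rows of the table Collier is the faster of the two OpenLoops 1 options, so the ratio is effectively OL1 (Collier) divided by OL2.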
Stability of OpenLoops 1 and 2 in double precision: 2 → 3 processes (at $\sqrt{\hat s}$ = 1 TeV)
Probability of relative accuracy A or less (wrt OL1 + Cuttools in quad precision, 10^6 uniform random points)
[Plots: cumulative fraction of points (logarithmic scale from 10^-6 to 1) versus accuracy (from −15 to 5) for gg → t t̄ + g and u d̄ → W⁺ + 2g; curves: Quadruple Precision, OpenLoops1 + Collier, OpenLoops1 + Cuttools, OpenLoops2]
• Hard cuts: $p_T$ > 50 GeV and $\Delta R_{ij}$ > 0.5 for final-state QCD partons
($\Delta R_{ij} = \sqrt{(\eta_i - \eta_j)^2 + (\phi_i - \phi_j)^2}$, $\phi_i$ azimuthal angle, $\eta_i$ rapidity)
• Behaviour in the tails crucial for real-life applications
• 1 to 3 orders of magnitude improvement wrt OL1 + Cuttools and Collier in DP
Excellent stability thanks to on-the-fly reduction and minimal Δ-expansions
Soft region under investigation ⇒ important for real-virtual part of NNLO
Stability of OpenLoops 1 and 2 in double precision: 2 → 4 processes (at $\sqrt{\hat s}$ = 1 TeV)
Probability of relative accuracy A or less (wrt OL1 + Cuttools in quad precision, 10^6 uniform random points)
[Plots, preliminary: cumulative fraction of points (logarithmic scale from 10^-6 to 1) versus accuracy (from −15 to 5) for gg → t t̄ + 2g and u d̄ → W⁺ + 3g; curves: Quadruple Precision, OpenLoops1 + Collier, OpenLoops1 + Cuttools, OpenLoops2]
• Same hard cuts as for 2 → 3
• Orders of magnitude improvement wrt Cuttools and similar or better stability wrt Collier
• Further improvements in the tail under investigation
Very good stability thanks to on-the-fly reduction and minimal Δ-expansions
V. Summary and Outlook
• New algorithm for construction and reduction of 1-loop amplitudes in a single recursion
• Drastic reduction of complexity at all stages of the calculation (rank ≤ 2)
• New colour and helicity treatment + OpenLoops merging ⇒ significant gain in CPU efficiency
• Same level of automation and same interface as OpenLoops 1
• Dedicated stability analysis possible in a single dressing and reduction tool
⇒ Simple targeted expansions provide excellent numerical stability in the hard regions
• Future projects:
– improvement of stability in real-virtual NNLO contributions (soft region)
– extension to 2 loops