Parameter tuning for the nonuniform FFT and applications

(1)

Franziska Nestler Chemnitz University of Technology

Faculty of Mathematics

Mecklenburg Workshop

Approximation Methods and Fast Algorithms September 10 – 14, 2018, Hasenwinkel

(2)


Overview

1 Parameter tuning for the NFFT

• Introduction to NFFT

• Error in the $L^2$-norm

• Parameter choice and comparison of window functions

2 Application: interactions in particle systems

• Coulomb problem and Ewald summation

• A fast NFFT based algorithm

• Errors and parameter choice

(3)


FFT for nonequispaced data – NFFT

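Trigonometric polynomial:

\[ f(x) := \sum_{k \in I_M} \hat f_k \, e^{2\pi i k^\top x}, \qquad x \in \mathbb{T}^d \]

Notation: Set $I_M := \{-M/2, \dots, M/2 - 1\}^d \subset \mathbb{Z}^d$ for $M \in 2\mathbb{N}$.

Torus $\mathbb{T} := \mathbb{R}/\mathbb{Z} \simeq [-1/2, 1/2)$.

(inverse) FFT:

\[ f_j := \sum_{k \in I_M} \hat f_k \, e^{2\pi i k^\top j / M}, \qquad j \in I_M, \quad N := |I_M| = M^d \]

NFFT:

\[ f(x_j) := \sum_{k \in I_M} \hat f_k \, e^{2\pi i k^\top x_j}, \qquad x_j \in \mathbb{T}^d, \quad j = 1, \dots, N \]

adjoint NFFT:

\[ h(k) := \sum_{j=1}^{N} f_j \, e^{-2\pi i k^\top x_j}, \qquad x_j \in \mathbb{T}^d, \quad k \in I_M \]

Complexity: $\mathcal{O}(|I_M| \log |I_M| + N)$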

[Dutt, Rokhlin 1993] [Beylkin 1995] [Potts, Steidl, Tasche 2001] [Greengard, Lee 2004]
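For reference, the two sums defined on this slide can be evaluated directly in $\mathcal{O}(N\,|I_M|)$ operations. The following is a minimal sketch for $d = 1$ (the function names `ndft` and `ndft_adjoint` are illustrative, not taken from any library); it is the slow baseline that the NFFT approximates.

```python
import numpy as np

def ndft(f_hat, x):
    """Direct evaluation of f(x_j) = sum_{k in I_M} f_hat[k] * exp(2*pi*i*k*x_j), d = 1."""
    M = len(f_hat)
    k = np.arange(-M // 2, M // 2)            # I_M = {-M/2, ..., M/2 - 1}
    A = np.exp(2j * np.pi * np.outer(x, k))   # N x M nonequispaced Fourier matrix
    return A @ f_hat

def ndft_adjoint(f, x, M):
    """Direct evaluation of h(k) = sum_j f[j] * exp(-2*pi*i*k*x_j) for k in I_M."""
    k = np.arange(-M // 2, M // 2)
    A = np.exp(2j * np.pi * np.outer(x, k))
    return A.conj().T @ f

rng = np.random.default_rng(0)
M, N = 16, 40
f_hat = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x = rng.uniform(-0.5, 0.5, N)                 # nodes on the torus [-1/2, 1/2)
f = ndft(f_hat, x)
```

On the equispaced nodes $x_j = j/M$, $j \in I_M$, the first routine reduces to the (inverse) FFT sum from the slide, which gives a simple sanity check.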


(6)

Idea

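Trace the problem back to the equidistant case via convolution with a window function.

iFFT:

\[ g_l := \frac{1}{M^d} \sum_{k \in I_M} \frac{1}{c_k(\tilde\varphi)} \, \hat f_k \, e^{2\pi i k^\top l / M} \qquad (l \in I_M) \]

Convolution with window:

\[ f(x_j) \approx \sum_{l \in I_M} g_l \, \tilde\varphi\!\left(x_j - \tfrac{l}{M}\right) =: f_{\approx}(x_j) \]

[Figure: window function $\tilde\varphi(x)$ on $[-1/2, 1/2)$ with grid spacing $1/M$ and support $[-m/M, m/M]$.]

Higher accuracy: enlarge the support parameter $m \in \mathbb{N}$ and the oversampling factor $\sigma \ge 1$.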


(9)

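Idea (with oversampling)

iFFT:

\[ g_l := \frac{1}{\sigma^d M^d} \sum_{k \in I_M} \frac{1}{c_k(\tilde\varphi)} \, \hat f_k \, e^{2\pi i k^\top l / (\sigma M)} \qquad (l \in I_{\sigma M}) \]

Convolution with window:

\[ f(x_j) \approx \sum_{l \in I_{\sigma M}} g_l \, \tilde\varphi\!\left(x_j - \tfrac{l}{\sigma M}\right) =: f_{\approx}(x_j) \]

[Figure: window function $\tilde\varphi(x)$ on $[-1/2, 1/2)$ with grid spacing $1/(\sigma M)$ and support $[-m/(\sigma M), m/(\sigma M)]$.]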
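The two steps above (deconvolution plus oversampled iFFT, then a local convolution with the window) can be sketched for $d = 1$ as follows. This is an illustrative sketch, not the NFFT library implementation: it assumes the centered cardinal B-spline window $\varphi(x) = B_{2m}(\sigma M x)$ with $c_k(\tilde\varphi) = \frac{1}{\sigma M}\,\mathrm{sinc}^{2m}(\pi k/(\sigma M))$, assumes $\sigma M$ is an even integer, and uses the hypothetical names `bspline` and `nfft_1d`.

```python
import numpy as np
from math import comb, factorial

def bspline(n, t):
    """Centered cardinal B-spline B_n(t), supported on [-n/2, n/2] (truncated-power formula)."""
    t = np.asarray(t, dtype=float)
    total = np.zeros_like(t)
    for j in range(n + 1):
        total += (-1) ** j * comb(n, j) * np.maximum(t + n / 2 - j, 0.0) ** (n - 1)
    return total / factorial(n - 1)

def nfft_1d(f_hat, x, m=4, sigma=2.0):
    """Approximate f(x_j) = sum_k f_hat[k] e^{2 pi i k x_j} with a B-spline window."""
    M = len(f_hat)
    n = int(round(sigma * M))                  # oversampled grid size sigma*M
    k = np.arange(-M // 2, M // 2)
    c = np.sinc(k / n) ** (2 * m) / n          # c_k; np.sinc(t) = sin(pi t)/(pi t)
    g_hat = np.zeros(n, dtype=complex)
    g_hat[k % n] = f_hat / c                   # deconvolution and zero padding
    g = np.fft.ifft(g_hat)                     # g_l, includes the factor 1/(sigma*M)
    l0 = np.floor(n * x).astype(int)
    f = np.zeros(len(x), dtype=complex)
    for s in range(-m + 1, m + 1):             # the 2m grid points inside the support
        l = l0 + s
        f += g[l % n] * bspline(2 * m, n * x - l)
    return f

rng = np.random.default_rng(1)
M = 32
f_hat = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x = rng.uniform(-0.5, 0.5, 50)
approx = nfft_1d(f_hat, x, m=4, sigma=2.0)
k = np.arange(-M // 2, M // 2)
exact = np.exp(2j * np.pi * np.outer(x, k)) @ f_hat   # direct evaluation for comparison
max_err = np.max(np.abs(approx - exact))
```

For this window the aliasing ratios satisfy $|c_{k+r\sigma M}/c_k| = |k/(k + r\sigma M)|^{2m}$, so with $m = 4$ and $\sigma = 2$ the pointwise error stays below roughly $1.6 \cdot 10^{-4} \sum_k |\hat f_k|$; larger $m$ or $\sigma$ tightens this.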

(10)


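Error in the $L^2$-norm

In the following we consider the case $d := 1$.

Let $\varphi$ be a rapidly decreasing function. Commonly, we choose

\[ \tilde\varphi(x) := \sum_{n \in \mathbb{Z}} \varphi(x + n). \]

For compactly supported functions $\varphi$ we have

\[ \| f - f_{\approx} \|_{L_2}^2 = \sum_{k \in I_M} |\hat f_k|^2 \sum_{r \in \mathbb{Z} \setminus \{0\}} \frac{c_{k + r\sigma M}^2(\tilde\varphi)}{c_k^2(\tilde\varphi)}. \]

Strongly depends on:

• window function $\varphi$ with support $[-m/(\sigma M), m/(\sigma M)]$

• support parameter $m$

• oversampling factor $\sigma$

• Fourier coefficients $\hat f_k$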
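The error formula can be evaluated numerically once the Fourier coefficients $c_k(\tilde\varphi)$ of the window are available as a function. Below is a small sketch under stated assumptions: the sum over $r \ne 0$ is truncated at $|r| \le R$, the B-spline coefficients $c_k = \frac{1}{\sigma M}\mathrm{sinc}^{2m}(\pi k/(\sigma M))$ serve as the example window, and the names `l2_error_sum` and `bspline_coeff` are illustrative.

```python
import numpy as np

def l2_error_sum(f_hat, c, n, R=50):
    """sum_k |f_hat_k|^2 * sum_{0<|r|<=R} c(k + r*n)^2 / c(k)^2, where n = sigma*M."""
    M = len(f_hat)
    k = np.arange(-M // 2, M // 2)
    r = np.concatenate([np.arange(-R, 0), np.arange(1, R + 1)])   # r != 0
    alias = c(k[:, None] + r[None, :] * n) ** 2                   # shape (M, 2R)
    return float(np.sum(np.abs(f_hat) ** 2 * alias.sum(axis=1) / c(k) ** 2))

def bspline_coeff(m, n):
    """c_k of the B-spline window B_{2m}(n x); np.sinc(t) = sin(pi t)/(pi t)."""
    return lambda k: np.sinc(k / n) ** (2 * m) / n

M = 64
k = np.arange(-M // 2, M // 2)
f_hat = 1.0 / (1.0 + k ** 2)          # example coefficients used later in the talk
sigma = 1.25
n = int(sigma * M)                    # 80
errors = [l2_error_sum(f_hat, bspline_coeff(m, n), n) for m in range(2, 9)]
```

The list `errors` shows how the squared error shrinks as the support parameter $m$ grows from 2 to 8 for fixed $\sigma$.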


(12)

Compactly supported window functions

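• Centered cardinal B-spline: with support $[-m/(\sigma M), m/(\sigma M)]$

\[ \varphi(x) = B_{2m}(\sigma M x), \qquad c_k(\tilde\varphi) = \frac{1}{\sigma M} \, \mathrm{sinc}^{2m}\!\left(\frac{\pi k}{\sigma M}\right) \]

• Kaiser-Bessel (Bessel-$I_0$): with a shape parameter $b > 0$

\[ \varphi(x) = \tfrac{1}{2} I_0\!\left(b \sqrt{m^2 - \sigma^2 M^2 x^2}\right) \cdot \chi_{[-m/(\sigma M),\, m/(\sigma M)]}(x) \]

\[ c_k(\tilde\varphi) = \frac{1}{\sigma M}
\begin{cases}
\dfrac{\sinh\!\left(m \sqrt{b^2 - 4\pi^2 k^2/(\sigma^2 M^2)}\right)}{\sqrt{b^2 - 4\pi^2 k^2/(\sigma^2 M^2)}} & : |k| \le \dfrac{\sigma M b}{2\pi}, \\[2ex]
m \, \mathrm{sinc}\!\left(m \sqrt{4\pi^2 k^2/(\sigma^2 M^2) - b^2}\right) & : \text{else}
\end{cases} \]

• Truncated Gaussian: with a shape parameter $b > 0$

\[ \varphi(x) = \frac{1}{\sqrt{\pi b}} \, e^{-\sigma^2 M^2 x^2 / b} \cdot \chi_{[-m/(\sigma M),\, m/(\sigma M)]}(x) \]

\[ c_k(\tilde\varphi) = \frac{e^{-b \pi^2 k^2 / (\sigma^2 M^2)}}{\sigma M} \, \mathrm{Re}\!\left[ \mathrm{erf}\!\left( \frac{m}{\sqrt{b}} + \frac{i \pi k \sqrt{b}}{\sigma M} \right) \right] \]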
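As a plausibility check of the Kaiser-Bessel coefficients, one can compare the closed form $\frac{1}{\sigma M}\sinh(m\sqrt{b^2-\omega^2})/\sqrt{b^2-\omega^2}$ for $\omega = 2\pi k/(\sigma M) < b$ (and $\frac{m}{\sigma M}\,\mathrm{sinc}(m\sqrt{\omega^2-b^2})$ otherwise) against a numerical quadrature of the defining integral $\int \varphi(x)\,e^{-2\pi i k x}\,dx$. The sketch below uses Gauss-Legendre nodes; the function names and the chosen test parameters are assumptions for illustration.

```python
import numpy as np

def kb_coeff(k, m, n, b):
    """Closed-form c_k of the window 1/2*I0(b*sqrt(m^2 - n^2 x^2)) on [-m/n, m/n], n = sigma*M."""
    w = 2.0 * np.pi * np.asarray(k, dtype=float) / n
    d = b ** 2 - w ** 2
    root = np.sqrt(np.abs(d))
    out = np.empty_like(root)
    pos = d > 0
    out[pos] = np.sinh(m * root[pos]) / root[pos]
    out[~pos] = m * np.sinc(m * root[~pos] / np.pi)   # = sin(m*root)/root
    return out / n

def kb_coeff_quadrature(k, m, n, b, nodes=400):
    """The same coefficient via Gauss-Legendre quadrature (substitution y = n*x = m*t)."""
    t, wts = np.polynomial.legendre.leggauss(nodes)   # nodes and weights on [-1, 1]
    y = m * t
    window = 0.5 * np.i0(b * np.sqrt(m ** 2 - y ** 2))
    w = 2.0 * np.pi * np.asarray(k, dtype=float)[:, None] / n
    # the window is even, so only the cosine part of exp(-i*w*y) contributes
    return (m / n) * np.sum(wts * window * np.cos(w * y), axis=1)

m, n = 4, 80                                   # e.g. M = 64, sigma = 1.25
b = 2.0 * np.pi * (1.0 - 1.0 / (2.0 * 1.25))
k = np.arange(-60, 61)                         # covers both branches of the formula
closed = kb_coeff(k, m, n, b)
numeric = kb_coeff_quadrature(k, m, n, b)
max_diff = np.max(np.abs(closed - numeric))
```

Here `max_diff` should be at the level of rounding errors relative to the largest coefficient $c_0$.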


(15)

Estimates for certain window functions

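\[ \| f - f_{\approx} \|_{L_2}^2 = \sum_{k \in I_M} |\hat f_k|^2 \underbrace{\sum_{r \in \mathbb{Z} \setminus \{0\}} \frac{c_{k + r\sigma M}^2(\tilde\varphi)}{c_k^2(\tilde\varphi)}}_{\le\, s(k)} \le \sum_{k \in I_M} |\hat f_k|^2 \, s(k). \]

B-spline [Steidl 1998]:

\[ s(k) = \frac{8m}{4m - 1} \left( \frac{|k|/(\sigma M)}{|k|/(\sigma M) - 1} \right)^{4m} \]

Bessel [N. 2016]: For $R_k > \frac{|k|}{\sigma M} + \frac{b}{2\pi}$

\[ s(k) = \sum_{0 < |r| \le R_k} \frac{c_{k + r\sigma M}^2(\tilde\varphi)}{c_k^2(\tilde\varphi)} + \frac{1}{c_k^2(\tilde\varphi)} \cdot \frac{ \ln\!\left( \dfrac{2\pi(|k|/(\sigma M) - R_k) - b}{2\pi(|k|/(\sigma M) - R_k) + b} \right) + \ln\!\left( \dfrac{2\pi(|k|/(\sigma M) + R_k) + b}{2\pi(|k|/(\sigma M) + R_k) - b} \right) }{4\pi b \, \sigma^2 M^2} \]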
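The B-spline bound can be checked against the aliasing sum itself. For the B-spline window, $|\sin(\pi(k + r\sigma M)/(\sigma M))| = |\sin(\pi k/(\sigma M))|$, so each ratio $c_{k+r\sigma M}^2/c_k^2$ equals $(k/(k + r\sigma M))^{4m}$. The sketch below (illustrative names, sum truncated at $|r| \le R$) compares this sum with $s(k) = \frac{8m}{4m-1}\big(\frac{u}{u-1}\big)^{4m}$, $u = |k|/(\sigma M)$.

```python
import numpy as np

def bspline_alias_sum(k, n, m, R=200):
    """sum_{0<|r|<=R} (k / (k + r*n))^(4m), the aliasing sum of the B-spline window, n = sigma*M."""
    r = np.concatenate([np.arange(-R, 0), np.arange(1, R + 1)])   # r != 0
    k = np.asarray(k, dtype=float)[:, None]
    return np.sum((k / (k + r * n)) ** (4 * m), axis=1)

def steidl_bound(k, n, m):
    """s(k) = 8m/(4m-1) * (u/(u-1))^(4m) with u = |k|/n."""
    u = np.abs(np.asarray(k, dtype=float)) / n
    return 8 * m / (4 * m - 1) * (u / (u - 1)) ** (4 * m)

M, sigma, m = 64, 1.25, 4
n = int(sigma * M)
k = np.arange(-M // 2, M // 2)
exact = bspline_alias_sum(k, n, m)
bound = steidl_bound(k, n, m)
```

Both quantities are dominated by the indices near $|k| = M/2$, which is why the decay of $\hat f_k$ matters so much for the overall error.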


(18)

Parameter choice

Error estimate:

\[ \| f - f_{\approx} \|_{L_2}^2 \le \sum_{k \in I_M} |\hat f_k|^2 \, s(k). \qquad (1) \]

Questions:

• How large should $m$ and $\sigma$ be chosen in order to reach a certain accuracy $\varepsilon$?

• How to choose the shape parameter $b$?

• Which window function performs best?

$\hat f_k$ unknown → worst case error analysis

$\hat f_k$ known → compute (1)


(20)

The optimal shape parameter

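In case of the Bessel window function a worst case error analysis suggests [Potts, Steidl 2003]

\[ b := 2\pi \left( 1 - \frac{1}{2\sigma} \right) \approx \begin{cases} 3.14 & : \sigma = 1.0, \\ 3.77 & : \sigma = 1.25. \end{cases} \]

Example: Known Fourier coefficients.

[Figure: two panels showing $\| f - f_{\approx} \|_{L_2}^2$ versus the shape parameter $b$, for $\hat f_k := \frac{1}{1 + k^2}$, $k \in I_{64}$ (left) and $\hat f_k := e^{-0.2 k^2}$, $k \in I_{64}$ (right).]

Estimated (solid) and measured (dotted) errors. Parameters: $m = 4$, $\sigma \in \{1.0, 1.25\}$.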
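The worst-case recommendation $b = 2\pi\,(1 - \frac{1}{2\sigma})$ is easy to reproduce; the following lines recompute the two values quoted above (the function name is illustrative).

```python
import math

def kb_shape_parameter(sigma):
    """Worst-case shape parameter b = 2*pi*(1 - 1/(2*sigma)) for the Bessel window."""
    return 2.0 * math.pi * (1.0 - 1.0 / (2.0 * sigma))

for sigma in (1.0, 1.25):
    print(f"sigma = {sigma}: b = {kb_shape_parameter(sigma):.2f}")
# prints b = 3.14 for sigma = 1.0 and b = 3.77 for sigma = 1.25
```

When the Fourier coefficients are known, the talk suggests optimizing $b$ on the error estimate instead of using this worst-case value.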


(23)

Modified B-spline window

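Introduce a shape parameter $b \in \tfrac{1}{2}\mathbb{N}$ and set

\[ \varphi(x) := B_{2b}\!\left( \frac{\sigma M b}{m} \, x \right), \qquad \mathrm{supp}(\varphi) = \left[ -\frac{m}{\sigma M}, \frac{m}{\sigma M} \right]. \]

Estimate:

For some $R_k \in \mathbb{N}$ we obtain the upper bound [N. 2016]

\[ s(k) = \sum_{0 < |r| \le R_k} \frac{c_{k + r\sigma M}^2(\tilde\varphi)}{c_k^2(\tilde\varphi)} + \frac{1}{\mathrm{sinc}^{4b}\!\left( \frac{m \pi k}{\sigma M b} \right)} \cdot \frac{ \left( \frac{k}{\sigma M} + R_k \right)^{1 - 4b} - \left( \frac{k}{\sigma M} - R_k \right)^{1 - 4b} }{ \left( \frac{\pi m}{b} \right)^{4b} (4b - 1) }. \]

[Figure: two panels, $\hat f_k := \frac{1}{1 + k^2}$ (left) and $\hat f_k := e^{-0.2 k^2}$ (right), $m = 4$, $\sigma \in \{1.0, 1.25\}$.]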


(26)

Parameter tuning

(1) Choice of the shape parameter:

• Input: Fourier coefficients $\hat f_k$, support parameter $m$, oversampling factor $\sigma$.

• Determine the optimal $b$ based on the error estimates and a simple binary search algorithm.

(2) Oversampling factor $\sigma$:

• Input: Fourier coefficients $\hat f_k$, support parameter $m$, required accuracy $\varepsilon$.

• Use a simple binary search algorithm to determine the minimal required $\sigma$.

→ For each considered $\sigma$ optimize $b$ based on (1).

(3) Minimal runtime:

• Input: Fourier coefficients $\hat f_k$, required accuracy $\varepsilon$.

• Consider different support parameters $m$.

• Determine viable parameter combinations based on (1) and (2).

• The runtime-minimizing combination will depend on your implementation, software, hardware etc.

\[ \min_{\{m, \sigma, b\}} t_{\mathrm{measure}} \]
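Step (2) can be sketched as a bisection on $\sigma$. The example below is a simplified illustration, not the tuning code behind the talk: it uses the B-spline bound $s(k) = \frac{8m}{4m-1}\big(\frac{u}{1-u}\big)^{4m}$, $u = |k|/(\sigma M)$, which has no shape parameter and decreases monotonically in $\sigma$, treats $\sigma$ as a continuous variable (in practice $\sigma M$ has to be an integer grid size), and uses hypothetical function names.

```python
import numpy as np

def bspline_error_estimate(f_hat, m, sigma):
    """Right-hand side of estimate (1) with the B-spline bound s(k)."""
    M = len(f_hat)
    k = np.arange(-M // 2, M // 2)
    u = np.abs(k) / (sigma * M)
    s = 8 * m / (4 * m - 1) * (u / (1 - u)) ** (4 * m)
    return float(np.sum(np.abs(f_hat) ** 2 * s))

def minimal_sigma(f_hat, m, eps, lo=1.0, hi=4.0, tol=1e-6):
    """Smallest sigma in [lo, hi] (up to tol) with estimated squared error <= eps."""
    if bspline_error_estimate(f_hat, m, hi) > eps:
        raise ValueError("accuracy not reachable for sigma <= hi")
    if bspline_error_estimate(f_hat, m, lo) <= eps:
        return lo
    while hi - lo > tol:                       # invariant: estimate(lo) > eps >= estimate(hi)
        mid = 0.5 * (lo + hi)
        if bspline_error_estimate(f_hat, m, mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

M = 64
k = np.arange(-M // 2, M // 2)
f_hat = 1.0 / (1.0 + k ** 2)
eps = 1e-8
sigma_min = minimal_sigma(f_hat, m=4, eps=eps)
```

For windows with a shape parameter, each evaluation inside the loop would additionally optimize $b$ as in step (1), and step (3) would time the resulting candidates $\{m, \sigma, b\}$ on the target machine.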


(29)

Comparison of different window functions

[Figure: two panels showing $\| f - f_{\approx} \|_{L_2}^2$ versus the support parameter $m$.]

Estimated errors $\| f - f_{\approx} \|^2$ for different window functions:

* Bessel, o B-spline, + Gaussian (all with optimized $b$)

$\hat f_k := \frac{1}{1 + k^2}$ (left) and $\hat f_k := e^{-0.2 k^2}$ (right), $m \in \{2, \dots, 8\}$, $\sigma \in \{1.0, 1.25\}$

(30)


The Coulomb problem

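Charged particle system: Let $N$ charges $q_j \in \mathbb{R}$ at positions $x_j \in \mathbb{R}^3$ be given.

We are interested in the electrostatic potentials ($x_{ij} := x_i - x_j$):

\[ \phi(j) := \sum_{\substack{i = 1 \\ i \ne j}}^{N} \frac{q_i}{\| x_{ij} \|}, \qquad j = 1, \dots, N. \qquad \rightarrow \mathcal{O}(N^2)? \]

3d-periodic boundary conditions: Choose $S := \mathbb{Z}^3$, $x_j \in [-L/2, L/2)^3$ and set

\[ \phi(j) := \sum_{n \in S} \sum_{\substack{i = 1 \\ i \ne j \text{ if } n = 0}}^{N} \frac{q_i}{\| x_{ij} + L n \|} \]

→ crystals, etc.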
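The $\mathcal{O}(N^2)$ cost of the non-periodic sum is visible in a direct implementation, which forms all pairwise distances. A minimal sketch (open boundary conditions only, illustrative function name):

```python
import numpy as np

def coulomb_potentials(q, x):
    """phi(j) = sum_{i != j} q_i / ||x_i - x_j|| by direct O(N^2) summation."""
    diff = x[:, None, :] - x[None, :, :]       # diff[i, j] = x_i - x_j
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)             # exclude the self-interaction i == j
    return (q[:, None] / dist).sum(axis=0)     # sum over i for each j

q = np.array([1.0, -1.0, 2.0])
x = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
phi = coulomb_potentials(q, x)
```

In the periodic setting the additional sum over $n \in \mathbb{Z}^3$ converges only conditionally, which motivates the Ewald splitting discussed next.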


(34)

Ewald summation

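\[ \phi(j) = \sum_{n} \sum_{i} \frac{q_i}{\| x_{ij} + L n \|} \]

Idea: Split the kernel function into two parts [Ewald 1921]

\[ \frac{1}{r} = \underbrace{\frac{f(\alpha r)}{r}}_{\text{long ranged, continuous}} + \underbrace{\frac{1 - f(\alpha r)}{r}}_{\text{short ranged, singularity}} \]

[Figure: plot illustrating the kernel splitting.]

\[ \phi(j) = \underbrace{\sum_{n} \sum_{i} \frac{q_i \, f(\alpha \| x_{ij} + L n \|)}{\| x_{ij} + L n \|}}_{\text{Fourier space}} + \underbrace{\sum_{n} \sum_{i} q_i \, \frac{1 - f(\alpha \| x_{ij} + L n \|)}{\| x_{ij} + L n \|}}_{\text{direct via truncation } \| x_{ij} + L n \| \le r_{\mathrm{cut}}} \]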
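To make the splitting concrete, the sketch below evaluates both parts for the classical choice $f = \mathrm{erf}$ (so that $1 - f = \mathrm{erfc}$). The function name and the sample values of $r$ and $\alpha$ are assumptions for illustration.

```python
import math

def ewald_split(r, alpha):
    """Return (long, short) with long + short = 1/r, using f = erf."""
    long_part = math.erf(alpha * r) / r        # smooth, bounded as r -> 0
    short_part = math.erfc(alpha * r) / r      # singular at r = 0, decays rapidly
    return long_part, short_part

alpha = 1.0
long_far, short_far = ewald_split(5.0, alpha)      # short part is negligible here
long_near, short_near = ewald_split(1e-8, alpha)   # long part tends to 2*alpha/sqrt(pi)
```

The rapid decay of the short range part is what justifies truncating it at $r_{\mathrm{cut}}$, while the smoothness of the long range part yields rapidly decaying Fourier coefficients.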


(38)

Long range parts

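Transformation into Fourier space yields [Ewald 1921]

\[ \phi^{\mathrm{long}}(j) = \sum_{k \in \mathbb{Z}^3} \sum_{i=1}^{N} q_i \, \hat\psi(k) \, e^{2\pi i k^\top x_{ij} / L}. \]

Classical choice: $f(\alpha r) = \mathrm{erf}(\alpha r)$

→ $\hat\psi(k) = (\pi L)^{-1} \| k \|^{-2} \, e^{-\pi^2 \| k \|^2 / (\alpha^2 L^2)}$

→ rapidly decreasing Fourier coefficients

→ good convergence, truncation of infinite sum

$\mathcal{O}(N \log N)$ algorithm: P²NFFT [Pippig, Potts 2011]

\[ \phi^{\mathrm{long}}(j) \approx \sum_{k \in I_M} \hat\psi(k) \underbrace{\left( \sum_{i=1}^{N} q_i \, e^{2\pi i k^\top x_i / L} \right)}_{\text{adjoint NFFT}} e^{-2\pi i k^\top x_j / L}, \]

where the outer sum over $k$ is evaluated by an NFFT.

Analogously: compute forces $F(j) = -q_j \nabla_{x_j} \phi(j)$ ($q_j \cdot$ vector valued NFFT)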
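The structure "adjoint transform, multiplication with $\hat\psi(k)$, forward transform" can be sketched with direct (slow) Fourier sums in place of the NFFTs. This is an illustration under stated assumptions, not the P²NFFT implementation: the term $k = 0$, where $\hat\psi$ is singular, is skipped (as is customary for charge-neutral systems), and the function name and test data are hypothetical.

```python
import numpy as np

def long_range_potential(q, x, L, alpha, M):
    """phi_long(j) ~ sum_{k in I_M, k != 0} psi_hat(k) * S(k) * exp(-2 pi i k.x_j / L)."""
    k1 = np.arange(-M // 2, M // 2)
    k = np.stack(np.meshgrid(k1, k1, k1, indexing="ij"), axis=-1).reshape(-1, 3)
    k = k[np.any(k != 0, axis=1)]                        # drop k = 0 (assumption)
    k2 = np.sum(k ** 2, axis=1)
    psi_hat = np.exp(-np.pi ** 2 * k2 / (alpha * L) ** 2) / (np.pi * L * k2)
    phase = np.exp(2j * np.pi * (k @ x.T) / L)           # phase[k, i] = e^{2 pi i k.x_i / L}
    S = phase @ q                                        # "adjoint NFFT" step, done directly
    return (psi_hat * S) @ phase.conj()                  # "NFFT" step, done directly

rng = np.random.default_rng(2)
L, alpha, M = 10.0, 0.7, 8
x = rng.uniform(-L / 2, L / 2, size=(4, 3))
q = np.array([1.0, -1.0, 1.0, -1.0])                     # charge-neutral example
phi_long = long_range_potential(q, x, L, alpha, M)
```

Replacing the two matrix products by an adjoint NFFT and an NFFT is what brings the cost down to $\mathcal{O}(N \log N)$ for suitably chosen $M$.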


(41)

Error analysis

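In applications: Interested in the root mean square (rms) force error

\[ \Delta F := \sqrt{ \frac{1}{N} \sum_{j=1}^{N} \| F(j) - F_{\approx}(j) \|^2 }. \]

Two different types of errors:

1 Ewald truncation errors [Kolafa, Perram 1992]

→ cutoff radius $r_{\mathrm{cut}}$, splitting parameter $\alpha$, cutoff parameter $M$ (mesh size)

2 NFFT approximation errors (long range part)

→ window function $\varphi$, support parameter $m$, oversampling factor $\sigma$, shape parameter $b$

\[ \Delta F_{\mathrm{NFFT}} \approx \sqrt{\frac{1}{N}} \, \sum_{j=1}^{N} q_j^2 \, \sqrt{ \sum_{k \in I_M} \| k \|^2 \hat\psi(k)^2 \left[ \left( \sum_{r \in \mathbb{Z}^3} \frac{c_{k + r\sigma M}^2(\tilde\varphi)}{c_k^2(\tilde\varphi)} \right)^{2} - 1 \right] } \]

Here $\| k \|^2 \hat\psi(k)^2$ plays the role of $\hat f_k^2$ (given analytically), and the bracket contains the error sums (as before, now 3d).

[Hockney, Eastwood 1988] [Pippig 2015] [N. 2016]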
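The rms force error is straightforward to measure once reference forces are available. A minimal sketch, assuming the forces are stored as arrays of shape $(N, 3)$ (the function name and the toy data are illustrative):

```python
import numpy as np

def rms_force_error(F_exact, F_approx):
    """Delta F = sqrt( (1/N) * sum_j ||F(j) - F_approx(j)||^2 ) for arrays of shape (N, 3)."""
    return float(np.sqrt(np.mean(np.sum((F_exact - F_approx) ** 2, axis=1))))

F_exact = np.zeros((2, 3))
F_approx = np.array([[3.0, 4.0, 0.0],
                     [0.0, 0.0, 0.0]])
delta_F = rms_force_error(F_exact, F_approx)   # sqrt((25 + 0) / 2)
```

Such a measured value can be compared with the a priori estimate $\Delta F_{\mathrm{NFFT}}$ to validate a parameter choice.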


(44)

Parameter tuning approach

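Accuracy tuning:

Input: Required level of accuracy $\varepsilon$, near field cutoff radius $r_{\mathrm{cut}}$.

1 Compute optimal $\alpha$ and $M$ based on

\[ \Delta F_{\mathrm{short}} \overset{!}{\approx} \varepsilon \quad \text{and} \quad \Delta F_{\mathrm{long}} \overset{!}{\approx} \varepsilon. \]

2 Tune NFFT parameters: $\Delta F_{\mathrm{NFFT}} \overset{!}{\approx} \varepsilon$

For a set of possible support parameters $m$ compute the corresponding

• required oversampling factors $\sigma$,

• optimal shape parameters $b$.

Output: $\alpha$, $M$ and several combinations $\{m, \sigma, b\}$.

Runtime optimization: depending on implementation, software, hardware

\[ \min_{\{r_{\mathrm{cut}}, \alpha, M\}} \; \min_{\{m, \sigma, b\}} t_{\mathrm{measure}} \]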


(47)

Results / examples

[N. 2016, 2018]

• Spending some oversampling is oftentimes advantageous in terms of computational costs.

• Bessel (with optimizedb) outperforms the B-spline window in most cases.

B-spline

m    σ_required    runtime
3    1.72          0.125
4    1.23          0.077
5    1.08          0.069
6    1.02          0.074
7    1.00          0.079

Bessel

m    σ_required    runtime
3    1.28          0.074
4    1.00          0.056
5    1.00          0.062
6    1.00          0.069
7    1.00          0.078

Example: Possible combinations of $m$, $\sigma$ and corresponding runtimes (for $\{r_{\mathrm{cut}}, \alpha, M\}$ and $\varepsilon = 10^{-5}$ fixed).
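The inner minimization over $\{m, \sigma, b\}$ amounts to picking the fastest viable combination from such a table. The following lines do this for the values listed above (runtimes copied from the tables; the variable and function names are illustrative).

```python
# (m, sigma_required, runtime) as listed in the tables above
bspline_rows = [(3, 1.72, 0.125), (4, 1.23, 0.077), (5, 1.08, 0.069),
                (6, 1.02, 0.074), (7, 1.00, 0.079)]
bessel_rows = [(3, 1.28, 0.074), (4, 1.00, 0.056), (5, 1.00, 0.062),
               (6, 1.00, 0.069), (7, 1.00, 0.078)]

def fastest(rows):
    """Return the (m, sigma, runtime) triple with the smallest measured runtime."""
    return min(rows, key=lambda row: row[2])

best_bspline = fastest(bspline_rows)   # (5, 1.08, 0.069)
best_bessel = fastest(bessel_rows)     # (4, 1.0, 0.056)
```

In this example the B-spline window is fastest with a small amount of oversampling ($m = 5$, $\sigma = 1.08$), whereas the Bessel window reaches the accuracy without oversampling at $m = 4$ and is faster overall.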

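\[ \min_{\{r_{\mathrm{cut}}, \alpha, M\}} \; \min_{\{m, \sigma, b\}} t_{\mathrm{measure}} \]

[Figure: overall runtime versus near field cutoff $r_{\mathrm{cut}}$.]

Example: Typical trend of runtime with respect to the near field cutoff $r_{\mathrm{cut}}$ for fixed $\varepsilon$ (B-spline vs. Bessel).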