Fast summation algorithms of radial functions

(1)

Fast Fourier transforms at nonequispaced nodes and applications

Franziska Nestler, Daniel Potts
Department of Mathematics, Technische Universität Chemnitz

CECAM Summer School, Jülich, 11th September 2013

(2)

1 Introduction: FFT

2 NFFT

3 NFFT based fast summation

4 Application to particle simulation

(3)

"The FFT is, without doubt, one of the most important algorithms in applied mathematics and engineering." (V. Olshevsky)

"The Fast Fourier transform (FFT) is one of the truly great computational developments of this century. It has changed the face of science and engineering so that it is not an exaggeration to say that life as we know it would be very different without FFT." (Charles Van Loan)

1805: Carl Friedrich Gauß used an algorithm similar to the FFT.

1903: Runge

1942: Danielson and Lanczos

1965: Cooley and Tukey

[Portraits: Gauß, Runge, Lanczos, Tukey]

(4)

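Problem: fast computation of

$$ f(w_j) = \sum_{k=-M/2}^{M/2-1} \hat f_k \, e^{-2\pi i k w_j} \qquad (j = -N/2, \dots, N/2-1) $$

$$ h(k) = \sum_{j=-N/2}^{N/2-1} f_j \, e^{2\pi i k w_j} \qquad (k = -M/2, \dots, M/2-1) $$

with $w_j \in \mathbb{T} := [-1/2, 1/2)$.

For equispaced nodes $w_j$ and $M = N$:

$$ w_j := \frac{j}{M} \qquad (j = -M/2, \dots, M/2-1) $$

FFT in $O(M \log M)$ instead of $O(M^2)$ flops.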
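The equispaced case can be checked numerically. The following is a minimal sketch, assuming NumPy and an even $M$; the function names `dft_direct` and `dft_fast` are illustrative and not part of the slides. It compares the $O(M^2)$ matrix-vector product with the FFT, using index shifts to map the centred ranges $-M/2, \dots, M/2-1$ onto NumPy's $0, \dots, M-1$ ordering.

```python
import numpy as np

def dft_direct(f_hat):
    """Evaluate f(w_j) = sum_k f_hat[k] exp(-2 pi i k w_j) at w_j = j/M in O(M^2)."""
    M = f_hat.size                       # assumed even
    idx = np.arange(-M // 2, M // 2)     # k and j both run over -M/2 .. M/2-1
    A = np.exp(-2j * np.pi * np.outer(idx, idx) / M)
    return A @ f_hat

def dft_fast(f_hat):
    """Same sums in O(M log M): shift to 0..M-1 ordering, FFT, shift back."""
    return np.fft.fftshift(np.fft.fft(np.fft.ifftshift(f_hat)))
```

Both functions take and return arrays ordered by the centred index, so their results can be compared entry by entry.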

(5)

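Problem (NFFT): evaluation of the 1-periodic function

$$ f(w) = \sum_{k=-M/2}^{M/2-1} \hat f_k \, e^{-2\pi i k w} $$

at arbitrary knots $w_j \in \mathbb{T}$ $(j = -N/2, \dots, N/2-1)$.

Idea:

1. Approximate $f$ by $s_1$, with $m := \sigma M$ $(\sigma > 1)$ and $\tilde\varphi(x) := \sum_{k \in \mathbb{Z}} \varphi(x + k)$:

$$ s_1(w) := \sum_{l=-m/2}^{m/2-1} g_l \, \tilde\varphi\left(w - \frac{l}{m}\right) $$

2. Approximate $s_1$ by $s$, with $p \ll m$ and $\psi(x) := \varphi(x) \cdot \chi_{[-p/m,\, p/m]}(x)$:

$$ s(w) := \sum_{l=-m/2}^{m/2-1} g_l \, \tilde\psi\left(w - \frac{l}{m}\right) = \sum_{l=[wm]-p}^{[wm]+p} g_l \, \tilde\psi\left(w - \frac{l}{m}\right) $$

3. $f(w_j) \approx s_1(w_j) \approx s(w_j)$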

(6)

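Approximate

$$ f(w) = \sum_{k=-M/2}^{M/2-1} \hat f_k \, e^{-2\pi i k w} $$

by

$$ s_1(w) = \sum_{l=-m/2}^{m/2-1} g_l \, \tilde\varphi\left(w - \frac{l}{m}\right) = \sum_{k=-\infty}^{\infty} \hat g_k \, c_k(\tilde\varphi) \, e^{-2\pi i k w} \approx \sum_{k=-m/2}^{m/2-1} \hat g_k \, c_k(\tilde\varphi) \, e^{-2\pi i k w} $$

1. Set

$$ \hat g_k := \begin{cases} \hat f_k / c_k(\tilde\varphi) & k = -M/2, \dots, M/2-1, \\ 0 & k = -m/2, \dots, -M/2-1,\; M/2, \dots, m/2-1. \end{cases} $$

2. Compute by FFT($m$):

$$ g_l = \frac{1}{m} \sum_{k=-M/2}^{M/2-1} \hat g_k \, e^{-2\pi i k l / m} \qquad (l = -m/2, \dots, m/2-1) $$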

(7)

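Algorithm 1D (NFFT)

1. For $k = -M/2, \dots, M/2-1$ compute

$$ \hat g_k := \hat f_k / c_k(\tilde\varphi). $$

2. For $l = -m/2, \dots, m/2-1$ compute by FFT($m$)

$$ g_l := \frac{1}{m} \sum_{k=-M/2}^{M/2-1} \hat g_k \, e^{-2\pi i k l / m}. $$

3. For $j = -N/2, \dots, N/2-1$ compute

$$ f(w_j) \approx s(w_j) := \sum_{l=[w_j m]-p}^{[w_j m]+p} g_l \, \tilde\psi\left(w_j - \frac{l}{m}\right). $$

Arithmetic operations:

$$ O(M + m \log m + (2p+1) N) = O(M \log M + p N) $$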
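The three steps can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the NFFT library's implementation, and it rests on several assumptions: the Gaussian window with shape parameter $b = \frac{2\sigma}{2\sigma-1}\frac{p}{\pi}$, its Fourier coefficients $c_k(\tilde\varphi) = \frac{1}{m} e^{-b(\pi k/m)^2}$ (obtained from the Fourier transform of a Gaussian), even $M$ and $m$, and $[w_j m]$ read as rounding to the nearest integer. The names `nfft_1d` and `ndft_direct` are hypothetical.

```python
import numpy as np

def ndft_direct(f_hat, w):
    """Reference O(MN) evaluation of f(w_j) = sum_k f_hat[k] exp(-2 pi i k w_j)."""
    M = f_hat.size
    k = np.arange(-M // 2, M // 2)
    return np.exp(-2j * np.pi * np.outer(w, k)) @ f_hat

def nfft_1d(f_hat, w, sigma=2.0, p=8):
    """Approximate the same sums via deconvolution, FFT and local window sums."""
    M = f_hat.size
    m = int(sigma * M)                                # oversampled FFT length
    b = (2 * sigma / (2 * sigma - 1)) * p / np.pi     # Gaussian shape parameter
    k = np.arange(-M // 2, M // 2)
    c_k = np.exp(-b * (np.pi * k / m) ** 2) / m       # assumed Fourier coefficients

    # Step 1: g_hat_k = f_hat_k / c_k, zero-padded to the index range -m/2 .. m/2-1.
    g_hat = np.zeros(m, dtype=complex)
    g_hat[m // 2 - M // 2 : m // 2 + M // 2] = f_hat / c_k

    # Step 2: g_l = (1/m) sum_k g_hat_k exp(-2 pi i k l / m) by one FFT of length m.
    g = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g_hat))) / m

    # Step 3: sum the 2p+1 grid points around each node; g is m-periodic in l.
    l = np.rint(w * m).astype(int)[:, None] + np.arange(-p, p + 1)[None, :]
    phi = np.exp(-(m * w[:, None] - l) ** 2 / b) / np.sqrt(np.pi * b)
    return np.sum(g[(l + m // 2) % m] * phi, axis=1)
```

Comparing `nfft_1d(f_hat, w)` with `ndft_direct(f_hat, w)` for random data gives a quick check of the approximation quality for a chosen $\sigma$ and $p$.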

(8)

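Matrix-vector notation:

$$ f = A \hat f, $$

where $A$ may be factorised approximately as follows:

$$ A \approx C F D. $$

Each of the three matrices corresponds to a step in the NFFT algorithm:

1. $D \in \mathbb{R}^{M \times M}$ is a diagonal matrix:

$$ D := \mathrm{diag}\left( \frac{1}{m \, c_k(\tilde\varphi)} \right)_{k=-M/2}^{M/2-1} $$

2. $F \in \mathbb{C}^{m \times M}$ is a truncated Fourier matrix:

$$ F := \left( e^{-2\pi i k l / m} \right)_{l=-m/2,\; k=-M/2}^{m/2-1,\; M/2-1} $$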

(9)

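3. $C \in \mathbb{R}^{N \times m}$ is a sparse band matrix with $2p+1$ non-zero entries per row:

$$ C := \left( c_{j,l} \right)_{j=-N/2,\; l=-m/2}^{N/2-1,\; m/2-1}, \qquad c_{j,l} = \begin{cases} \tilde\psi\left(w_j - \frac{l}{m}\right) & \text{if } l \in \{ \lfloor w_j m \rfloor - p, \dots, \lceil w_j m \rceil + p \}, \\ 0 & \text{otherwise.} \end{cases} $$

[Figure: Structure of the matrix $C$. Non-zero entries are indicated by dots. The row index $j$ runs from $-N/2$ to $N/2-1$, the column index $l$ runs from $-m/2$ to $m/2-1$. Parameters used were $N = M = 64$, $m = 128$ and $p = 5$; Legendre nodes were used for the $w_j$.]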
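For small sizes, the factorisation $A \approx CFD$ can be verified with dense matrices. The sketch below makes the following assumptions: a Gaussian window with $c_k(\tilde\varphi) = \frac{1}{m} e^{-b(\pi k/m)^2}$, even $M$ and $m$, and a band of exactly $2p+1$ entries centred at the grid point nearest to $w_j m$. The helper name `nfft_matrices` is hypothetical.

```python
import numpy as np

def nfft_matrices(w, M, sigma=2.0, p=8):
    """Return dense C (N x m), F (m x M) and D (M x M) with A ~ C @ F @ D."""
    m = int(sigma * M)
    b = (2 * sigma / (2 * sigma - 1)) * p / np.pi
    k = np.arange(-M // 2, M // 2)
    l = np.arange(-m // 2, m // 2)
    c_k = np.exp(-b * (np.pi * k / m) ** 2) / m        # assumed Gaussian coefficients

    D = np.diag(1.0 / (m * c_k))                       # deconvolution step
    F = np.exp(-2j * np.pi * np.outer(l, k) / m)       # truncated Fourier matrix
    C = np.zeros((w.size, m))                          # sparse band, stored densely here
    for j, wj in enumerate(w):
        ls = int(np.rint(wj * m)) + np.arange(-p, p + 1)
        # columns are indexed by l + m/2 and wrap around periodically
        C[j, (ls + m // 2) % m] = np.exp(-(m * wj - ls) ** 2 / b) / np.sqrt(np.pi * b)
    return C, F, D
```

In practice $C$ would be stored in a sparse format and $F$ applied by an FFT; the dense form only serves to inspect the structure and the approximation error.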

(10)

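Error estimates:

$$ |f(w_j) - s(w_j)| \le E_a(w_j) + E_t(w_j) $$

aliasing error: $E_a(w_j) := |f(w_j) - s_1(w_j)|$

truncation error: $E_t(w_j) := |s_1(w_j) - s(w_j)|$

$$ E_a(w_j) \le \|\hat f\|_1 \max_{-M/2 \le k < M/2} \sum_{\substack{r=-\infty \\ r \ne 0}}^{\infty} \left| \frac{c_{k+mr}(\tilde\varphi)}{c_k(\tilde\varphi)} \right| $$

$$ E_t(w_j) \le \frac{\|\hat f\|_1}{m} \max_{-M/2 \le k < M/2} \frac{1}{|c_k(\tilde\varphi)|} \sum_{l=-m/2}^{m/2-1} \left| \tilde\varphi\left(w_j - \frac{l}{m}\right) - \tilde\psi\left(w_j - \frac{l}{m}\right) \right| $$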

(11)

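Window functions $\tilde\varphi(w) = \sum_{k \in \mathbb{Z}} \varphi(w + k)$:

• Gaussian (Dutt, Rokhlin 1993; Steidl 1998)

$$ \varphi(w) = (\pi b)^{-1/2} \, e^{-(mw)^2 / b}, \qquad b := \frac{2\sigma}{2\sigma - 1} \, \frac{p}{\pi} $$

• B-splines (Beylkin 1995; Potts, Steidl, Tasche 1998)

$$ \varphi(w) = B_{2p}(mw) $$

• Sinc function (Potts 2001)

$$ \varphi(w) = \frac{(2\sigma - 1) M}{2p} \, \mathrm{sinc}^{2p}\left( \frac{\pi (2\sigma - 1) M w}{2p} \right) $$

• Kaiser-Bessel function (Fourmont 2001, Jackson 1991)

$$ |w| \le \frac{p}{m}: \quad \varphi(w) = \frac{1}{\pi} \, \frac{\sinh\left( b \sqrt{p^2 - m^2 w^2} \right)}{\sqrt{p^2 - m^2 w^2}}, \qquad b := \pi \left( 2 - \frac{1}{\sigma} \right) $$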

(12)

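Error estimates for special window functions $\varphi$:

$$ |f(w_j) - s(w_j)| \le C(\sigma, p) \, \|\hat f\|_1 $$

with

$$ C(\sigma, p) := \begin{cases} 4 \, e^{-p\pi (1 - 1/(2\sigma - 1))} & \text{for Gaussian,} \\[4pt] 4 \left( \dfrac{1}{2\sigma - 1} \right)^{2p} & \text{for B-splines,} \\[4pt] \dfrac{1}{p-1} \left( \dfrac{2}{\sigma^{2p}} + \left( \dfrac{\sigma}{2\sigma - 1} \right)^{2p} \right) & \text{for sinc,} \\[4pt] 4\pi \left( \sqrt{p} + p \right) \sqrt[4]{1 - \dfrac{1}{\sigma}} \; e^{-2\pi p \sqrt{1 - 1/\sigma}} & \text{for Kaiser-Bessel.} \end{cases} $$

For fixed $\sigma > 1$, the error decays exponentially with $p$.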
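To get a feeling for these constants, they can be tabulated for a given oversampling factor. The following is a small sketch using only the Python standard library; the function name `nfft_error_constant` and the window labels are illustrative choices.

```python
import math

def nfft_error_constant(window, sigma, p):
    """Evaluate the bound C(sigma, p) from the slide for one window function."""
    if window == "gaussian":
        return 4 * math.exp(-p * math.pi * (1 - 1 / (2 * sigma - 1)))
    if window == "bspline":
        return 4 * (1 / (2 * sigma - 1)) ** (2 * p)
    if window == "sinc":
        return (2 / sigma ** (2 * p) + (sigma / (2 * sigma - 1)) ** (2 * p)) / (p - 1)
    if window == "kaiser-bessel":
        root = math.sqrt(1 - 1 / sigma)
        # the fourth root of (1 - 1/sigma) is the square root of `root`
        return (4 * math.pi * (math.sqrt(p) + p) * math.sqrt(root)
                * math.exp(-2 * math.pi * p * root))
    raise ValueError(f"unknown window: {window}")

# Example: print the bounds for sigma = 2 and a few cut-off parameters p.
for p in (2, 4, 8):
    row = [nfft_error_constant(w, 2.0, p)
           for w in ("gaussian", "bspline", "sinc", "kaiser-bessel")]
    print(p, ["%.2e" % c for c in row])
```

Note that the sinc bound requires $p > 1$ because of the factor $1/(p-1)$.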

(13)

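[Figure: relative error $E_2$ plotted against the cut-off parameter $p$ (from 2 to 20) on a logarithmic axis from $10^{-16}$ to $10^{0}$, for the Gaussian, Kaiser-Bessel, B-spline and sinc window functions.]

The error $E_2$ in double precision for $d = 1$ with parameters $M = 1024$, $N = 2000$, $\sigma = 2$, where

$$ E_2 = \frac{\|f - s\|_2}{\|f\|_2} = \left( \sum_{j=-N/2}^{N/2-1} |f_j - s(w_j)|^2 \right)^{1/2} \cdot \left( \sum_{j=-N/2}^{N/2-1} |f_j|^2 \right)^{-1/2} $$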

(14)

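NFFT: fast computation of

$$ f(w_j) = \sum_{k=-M/2}^{M/2-1} \hat f_k \, e^{-2\pi i k w_j} \qquad (j = -N/2, \dots, N/2-1) $$

Matrix-vector form:

$$ \hat f := (\hat f_k)_{k=-M/2}^{M/2-1}, \qquad f := (f(w_j))_{j=-N/2}^{N/2-1}, \qquad A := \left( e^{-2\pi i k w_j} \right)_{j=-N/2,\; k=-M/2}^{N/2-1,\; M/2-1} $$

$$ f = A \hat f \approx C F D \hat f $$

NFFT$^H$ (adjoint, not inverse!): fast computation of

$$ h(k) = \sum_{j=-N/2}^{N/2-1} f_j \, e^{2\pi i k w_j} \qquad (k = -M/2, \dots, M/2-1) $$

The factorisation that was derived for $A$ allows us to derive an NFFT$^H$ algorithm simply by taking the conjugate transpose of $A$:

$$ A^H \approx D^H F^H C^H $$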
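The adjoint transform applies the three factors in reverse order: spreading onto the oversampled grid ($C^H$), an inverse-sign FFT ($F^H$) and the diagonal scaling ($D^H$). The sketch below illustrates this under stated assumptions: a Gaussian window with $c_k(\tilde\varphi) = \frac{1}{m} e^{-b(\pi k/m)^2}$, even $M$ and $m$, and $2p+1$ grid points per node centred at the nearest grid point. The names `nfft_adjoint_1d` and `ndft_adjoint_direct` are hypothetical.

```python
import numpy as np

def ndft_adjoint_direct(f, w, M):
    """Reference O(MN) evaluation of h(k) = sum_j f_j exp(+2 pi i k w_j)."""
    k = np.arange(-M // 2, M // 2)
    return np.exp(2j * np.pi * np.outer(k, w)) @ f

def nfft_adjoint_1d(f, w, M, sigma=2.0, p=8):
    """Approximate h = A^H f via D^H F^H C^H."""
    m = int(sigma * M)
    b = (2 * sigma / (2 * sigma - 1)) * p / np.pi
    k = np.arange(-M // 2, M // 2)
    c_k = np.exp(-b * (np.pi * k / m) ** 2) / m        # assumed Gaussian coefficients

    # C^H: spread each f_j onto its 2p+1 neighbouring grid points (periodic wrap).
    l = np.rint(w * m).astype(int)[:, None] + np.arange(-p, p + 1)[None, :]
    phi = np.exp(-(m * w[:, None] - l) ** 2 / b) / np.sqrt(np.pi * b)
    g = np.zeros(m, dtype=complex)
    np.add.at(g, (l + m // 2) % m, f[:, None] * phi)

    # F^H: g_hat_k = sum_l g_l exp(+2 pi i k l / m); NumPy's ifft carries a 1/m factor.
    g_hat = m * np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(g)))

    # D^H: keep the M central frequencies and divide by m * c_k.
    return g_hat[m // 2 - M // 2 : m // 2 + M // 2] / (m * c_k)
```

`np.add.at` is used for the spreading step because several nodes may contribute to the same grid point.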

(15)

NFFT (multivariate case)

Fast computation of the sums

$$ f(w_j) = \sum_{k_1=-M/2}^{M/2-1} \cdots \sum_{k_d=-M/2}^{M/2-1} \hat f_k \, e^{-2\pi i k \cdot w_j} \qquad (j = -N/2, \dots, N/2-1) $$

$$ h(k) = \sum_{j=-N/2}^{N/2-1} f_j \, e^{2\pi i k \cdot w_j}, \qquad k \in \{-M/2, \dots, M/2-1\}^d =: I_M^d $$

For equispaced nodes $w_j := \frac{j}{M}$ $(N = M^d)$: FFT (fast Fourier transform) in $O(M^d \log M)$.

For arbitrary nodes $w_j \in [-1/2, 1/2)^d$: NFFT (nonequispaced FFT) in $O(M^d \log M + p^d N)$.

(16)

Software available:

NFFT: C subroutine library (Keiner, Kunis, Potts 2002–2013), http://www.tu-chemnitz.de/~potts/nfft

Generalizations:

Nonequispaced in time and frequency (NNFFT), nonequispaced DCT/DST, hyperbolic cross, NFFT on the sphere, iterative solution of the inverse transforms

Applications:

fast summation, fast Gauss transform, summation on the sphere, MRI, polar FFT, Radon transform, CT, ridgelet transform

Documentation:

NFFT3 Tutorial (Keiner, Kunis, Potts)

(17)

Fast summation algorithms of radial functions

Problem: fast computation of

$$ f(x_j) := \sum_{k=1}^{N} \alpha_k \, K(x_j - x_k) \qquad (j = 1, \dots, N) $$

with nodes $x_j \in \mathbb{R}^d$ and radial functions $K(x) = K(\|x\|)$. In matrix-vector form:

$$ f = K \alpha $$

$K$ are special kernels, e.g.

singular kernels: $\dfrac{1}{|x|}$, $\dfrac{1}{x^2}$, $\log|x|$, $x^2 \log|x|$

nonsingular kernels: $(x^2 + c^2)^{\pm 1/2}$, $e^{-\delta x^2}$

Applications: integral equations, scattered data approximation, image processing, discrete Gauss transform, ...
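As a baseline for the fast algorithms, the sum can be evaluated directly in $O(N^2)$ operations. The following is a minimal sketch, assuming NumPy and a nonsingular kernel (for singular kernels the diagonal term $k = j$ would need separate treatment); the names `direct_sum` and `gaussian_kernel` are illustrative.

```python
import numpy as np

def direct_sum(x, alpha, kernel):
    """Compute f_j = sum_k alpha_k K(||x_j - x_k||) by forming the dense N x N matrix."""
    diff = x[:, None, :] - x[None, :, :]      # pairwise differences, shape (N, N, d)
    r = np.linalg.norm(diff, axis=-1)         # pairwise distances ||x_j - x_k||
    return kernel(r) @ alpha

def gaussian_kernel(delta):
    """Nonsingular kernel K(r) = exp(-delta r^2) from the list above."""
    return lambda r: np.exp(-delta * r ** 2)
```

Both memory and run time grow quadratically with $N$ here, which is what the NFFT based fast summation avoids.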

(19)

Known methods for products of vectors with specially structured dense matrices

$$ f = K \alpha $$

panel clustering, fast multipole method, wavelet methods

Standard algorithm for equispaced nodes: $K$ is a Toeplitz matrix

$$ f = \mathrm{FFT}\left( \mathrm{diag}(b) \, \mathrm{FFT}^H(\alpha) \right) $$

Idea for nonequispaced nodes: replace FFT by NFFT

$$ f = \mathrm{NFFT}\left( \mathrm{diag}(\tilde b) \, \mathrm{NFFT}^H(\alpha) \right) + \text{near field} $$
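For equispaced nodes, the FFT based product can be realised by embedding the Toeplitz matrix into a circulant matrix of twice the size, which the FFT diagonalises. The sketch below shows this common construction; it is one possible realisation of the standard algorithm, not necessarily the exact variant meant on the slide, and the function name `toeplitz_matvec` is hypothetical.

```python
import numpy as np

def toeplitz_matvec(col, row, alpha):
    """Multiply the N x N Toeplitz matrix with first column `col` and first row `row`
    (row[0] == col[0]) by `alpha` in O(N log N) via circulant embedding."""
    N = alpha.size
    # First column of the 2N x 2N circulant: col, one padding zero, reversed row tail.
    c = np.concatenate([col, [0.0], row[:0:-1]])
    b = np.fft.fft(c)                                    # eigenvalues of the circulant
    y = np.fft.ifft(b * np.fft.fft(alpha, n=2 * N))      # circulant times zero-padded alpha
    return y[:N]                                         # first N entries: Toeplitz product
```

For a radial kernel on the grid $x_j = j/N$, both `col` and `row` hold the values $K(d/N)$ for $d = 0, \dots, N-1$.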

(20)

Problem: fast evaluation of

$$ f(x) := \sum_{k=1}^{N} \alpha_k \, K(x - x_k) = \sum_{k=1}^{N} \alpha_k \, K(\|x - x_k\|) $$

at the $N$ given nodes $x = x_j \in \mathbb{R}^d$.

Singular kernels: $\dfrac{1}{|x|}$, $\dfrac{1}{x^2}$, ...

Regularize $K$:

• near $0$, $\|x\| \le \varepsilon_I$

• at the boundary, $\frac12 - \varepsilon_B \le \|x\| \le \frac12$ (assume $\|x_j - x_k\| \le \frac12 - \varepsilon_B$)

• result: a smooth and periodic function $K_R$

Approximation:

$$ K_R(x) \approx K_{RF}(x) := \sum_{l \in I_m^d} b_l \, e^{2\pi i l x} $$

[Figure: the regularized kernel, with axis marks at $-\frac12 + \varepsilon_B$, $-\varepsilon_I$, $+\varepsilon_I$ and $\frac12 - \varepsilon_B$.]

(23)

Regularization by algebraic polynomials

Given: $a_j$, $b_j$, $j = 0, \dots, q-1$

Compute: polynomial $P$ with

$$ P^{(j)}(c - r) = a_j \qquad (j = 0, \dots, q-1) \tag{1} $$

$$ P^{(j)}(c + r) = b_j \qquad (j = 0, \dots, q-1) \tag{2} $$

Theorem (Two point Taylor interpolation):
For given $a_j$, $b_j$ $(j = 0, \dots, q-1)$ there exists a unique polynomial $P$ of degree $2q-1$ which satisfies the conditions (1) and (2):

$$ P(x) = \frac{1}{2^q} \sum_{j=0}^{q-1} \sum_{k=0}^{q-1-j} \frac{r^j}{j! \, 2^k} \binom{q-1+k}{k} \left[ (1+y)^{j+k} (1-y)^q a_j + (-1)^j (1-y)^{j+k} (1+y)^q b_j \right], $$

where $y := \dfrac{x - c}{r}$. For symmetric functions: $(-1)^j b_j = a_j$.

Around $0$: $a_j = K^{(j)}(-\varepsilon_I)$, $b_j = K^{(j)}(\varepsilon_I)$

At the boundary: $a_j = K^{(j)}(1/2 - \varepsilon_B)$, $b_j = K^{(j)}(-1/2 + \varepsilon_B)$
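The interpolation formula translates directly into code. The following sketch evaluates $P(x)$ from the derivative data $a_j$, $b_j$ using only the standard library; the function name `two_point_taylor` is illustrative. Since the interpolant of degree $2q-1$ is unique, feeding in the derivative data of a cubic with $q = 2$ must reproduce that cubic, which provides a simple check.

```python
from math import comb, factorial

def two_point_taylor(x, a, b, c, r):
    """Evaluate the two point Taylor polynomial P with
    P^(j)(c - r) = a[j] and P^(j)(c + r) = b[j] for j = 0, ..., q-1."""
    q = len(a)
    y = (x - c) / r
    total = 0.0
    for j in range(q):
        for k in range(q - j):                      # k = 0, ..., q-1-j
            coeff = r ** j / (factorial(j) * 2 ** k) * comb(q - 1 + k, k)
            total += coeff * (
                (1 + y) ** (j + k) * (1 - y) ** q * a[j]
                + (-1) ** j * (1 - y) ** (j + k) * (1 + y) ** q * b[j]
            )
    return total / 2 ** q

# Example: data of K(x) = x^3 on [c - r, c + r] = [0.25, 0.75] with q = 2.
a = [0.25 ** 3, 3 * 0.25 ** 2]    # K and K' at c - r
b = [0.75 ** 3, 3 * 0.75 ** 2]    # K and K' at c + r
print(two_point_taylor(0.5, a, b, c=0.5, r=0.25))
```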

(24)

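Splitting:

$$ K(x) = [K(x) - K_R(x)] + K_R(x) =: K_{NE}(x) + K_R(x) $$

Approximation $K_R(x) \approx K_{RF}(x)$:

$$ f(x) \approx \tilde f(x) := f_{NE}(x) + f_{RF}(x) $$

Near field ($\|x - x_k\| \le \varepsilon_I$, direct):

$$ f_{NE}(x) := \sum_{k=1}^{N} \alpha_k \, K_{NE}(x - x_k) $$

Fourier method:

$$ f_{RF}(x) := \sum_{k=1}^{N} \alpha_k \, K_{RF}(x - x_k) $$

$$ f_{RF}(x_j) = \sum_{k=1}^{N} \alpha_k \sum_{l \in I_m^d} b_l \, e^{2\pi i l (x_j - x_k)} = \underbrace{\sum_{l \in I_m^d} b_l \underbrace{\left( \sum_{k=1}^{N} \alpha_k \, e^{-2\pi i l x_k} \right)}_{\text{NFFT}^H} e^{2\pi i l x_j}}_{\text{NFFT}} $$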
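The point of the last identity is that the double sum separates into an adjoint transform, a diagonal scaling by $b_l$, and a forward transform. The sketch below checks this in one dimension with dense (slow) trigonometric matrices standing in for NFFT$^H$ and NFFT; in a fast implementation these two matrix products would be replaced by the corresponding NFFT routines. The function names and the use of arbitrary coefficients $b_l$ are assumptions for illustration.

```python
import numpy as np

def far_field_naive(x, alpha, b):
    """O(N^2 m): evaluate sum_k alpha_k K_RF(x_j - x_k) with K_RF(x) = sum_l b_l e^{2 pi i l x}."""
    m = b.size
    l = np.arange(-m // 2, m // 2)
    diff = x[:, None] - x[None, :]                                   # x_j - x_k
    K_RF = np.tensordot(np.exp(2j * np.pi * diff[..., None] * l), b, axes=1)
    return K_RF @ alpha

def far_field_factored(x, alpha, b):
    """O(N m): the same values via adjoint sum, diagonal scaling and forward sum."""
    m = b.size
    l = np.arange(-m // 2, m // 2)
    a_hat = np.exp(-2j * np.pi * np.outer(l, x)) @ alpha             # role of NFFT^H
    return np.exp(2j * np.pi * np.outer(x, l)) @ (b * a_hat)         # role of NFFT
```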

(25)

Particle-particle NFFT (P²NFFT)

Coulomb potential in charged particle systems:

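$$ \phi(x_j) := \sum_{\substack{i=1 \\ i \ne j}}^{N} \frac{q_i}{\|x_i - x_j\|} $$

Approach:

• set $K(\|x\|) := \|x\|^{-1}$

• let $\|x_i - x_j\| \le h \left( \frac12 - \varepsilon_B \right)$

• construct an $h$-periodic regularization

• fast computation of the far field by NFFT based fast summation

$$ \phi_{RF}(x_j) = \underbrace{\sum_{l \in I_m^3} b_l \underbrace{\left( \sum_{i=1}^{N} q_i \, e^{2\pi i l x_i / h} \right)}_{\text{NFFT}^H} e^{-2\pi i l x_j / h}}_{\text{NFFT}} $$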

(26)

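$$ \phi(x_j) := \sum_{n \in S} {\sum_{i=1}^{N}}' \frac{q_i}{\|x_i - x_j + n\|} $$

subject to periodic boundary conditions, with $x_j \in B_1 \mathbb{T} \times B_2 \mathbb{T} \times B_3 \mathbb{T}$.

Fully periodic: $S = B_1 \mathbb{Z} \times B_2 \mathbb{Z} \times B_3 \mathbb{Z}$

• Ewald summation

Ewald splitting:

$$ \frac{1}{r} = \frac{\mathrm{erf}(\alpha r)}{r} + \frac{\mathrm{erfc}(\alpha r)}{r} $$

[Figure: plot of the splitting over $r$.]

• $\mathrm{erf}(x) := \dfrac{2}{\sqrt{\pi}} \displaystyle\int_0^x e^{-t^2} \, dt$ (error function)

• $\mathrm{erfc}(x) := 1 - \mathrm{erf}(x)$ (complementary error function)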

(27)

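• Ewald summation

$$ \sum_{n \in S} {\sum_{i=1}^{N}}' \frac{q_i}{\|x_i - x_j + n\|} = \sum_{n \in S} {\sum_{i=1}^{N}}' q_i \, \frac{\mathrm{erfc}(\alpha \|x_i - x_j + n\|)}{\|x_i - x_j + n\|} + \sum_{n \in S} \sum_{i=1}^{N} q_i \, \frac{\mathrm{erf}(\alpha \|x_i - x_j + n\|)}{\|x_i - x_j + n\|} - \frac{2\alpha}{\sqrt{\pi}} \, q_j $$

• short range part: direct evaluation after truncation

• $\lim_{r \to 0} \dfrac{\mathrm{erf}(\alpha r)}{r} = \dfrac{2\alpha}{\sqrt{\pi}}$ $\Rightarrow$ subtract the self potential

• transform the long range part into a sum in Fourier space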
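Both the splitting of $1/r$ and the limit that gives the self potential are easy to confirm numerically. This is a small sketch using the standard library's `math.erf` and `math.erfc`; the function name `ewald_split` is illustrative.

```python
import math

def ewald_split(r, alpha):
    """Return (short range, long range) parts of 1/r: erfc(alpha r)/r and erf(alpha r)/r."""
    return math.erfc(alpha * r) / r, math.erf(alpha * r) / r

alpha = 2.0
for r in (0.1, 0.5, 1.0):
    short, long_ = ewald_split(r, alpha)
    print(r, short + long_, 1.0 / r)          # the two parts add up to 1/r

# The long range part stays bounded at the origin; its limit is the self term.
print(ewald_split(1e-6, alpha)[1], 2 * alpha / math.sqrt(math.pi))
```

The short range part decays quickly and can be truncated, while the long range part is smooth, which is what makes it suitable for a Fourier space treatment.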

(28)

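• Ewald summation

• compute the long range part using NFFTs (Hedman, Laaksonen 2006)

$$ \phi^L(x_j) = \frac{4\pi}{B_1 B_2 B_3} \underbrace{\sum_{k \ne 0} \frac{e^{-\|k\|^2 / (4\alpha^2)}}{\|k\|^2} \underbrace{\left( \sum_{i=1}^{N} q_i \, e^{i k \cdot x_i} \right)}_{\text{NFFT}^H} e^{-i k \cdot x_j}}_{\text{NFFT}}, \qquad k \in \frac{2\pi}{B_1} \mathbb{Z} \times \frac{2\pi}{B_2} \mathbb{Z} \times \frac{2\pi}{B_3} \mathbb{Z} $$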
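A useful sanity check for an Ewald implementation is that the total potential does not depend on the splitting parameter $\alpha$ when the system is charge neutral. The sketch below evaluates the short range, long range and self terms directly (without NFFT acceleration) for a small cubic box; the cubic box $B_1 = B_2 = B_3 = B$, the truncation parameters `n_real` and `n_four`, and the function name `ewald_potential` are assumptions for illustration.

```python
import itertools
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def _grid(n):
    """All integer triples with entries in -n .. n."""
    return np.array(list(itertools.product(range(-n, n + 1), repeat=3)))

def ewald_potential(x, q, alpha, B=1.0, n_real=2, n_four=8):
    """Potentials phi(x_j) in a cubic box of side B by truncated Ewald sums."""
    N = q.size
    phi = np.zeros(N)

    # Short range part: erfc-screened sum over periodic images, skipping i = j, n = 0.
    shifts = B * _grid(n_real)
    for j in range(N):
        for i in range(N):
            r = np.linalg.norm(x[i] - x[j] + shifts, axis=1)
            r = r[r > 0]
            phi[j] += q[i] * np.sum(_erfc(alpha * r) / r)

    # Long range part: Fourier sum over k != 0 with structure factor S(k).
    n = _grid(n_four)
    k = 2 * np.pi / B * n[np.any(n != 0, axis=1)]
    k2 = np.sum(k ** 2, axis=1)
    S = np.exp(1j * k @ x.T) @ q                      # S(k) = sum_i q_i e^{i k . x_i}
    weights = np.exp(-k2 / (4 * alpha ** 2)) / k2
    phi += 4 * np.pi / B ** 3 * np.real(np.exp(-1j * x @ k.T) @ (weights * S))

    # Self potential correction.
    return phi - 2 * alpha / math.sqrt(math.pi) * q
```

Calling `ewald_potential` with two different values of `alpha` on a neutral configuration should give matching potentials, as long as the truncation parameters are large enough for both values.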

(29)

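2d-periodic: $S = B_1 \mathbb{Z} \times B_2 \mathbb{Z} \times \{0\}$

• Ewald summation, long range part: $k \in \frac{2\pi}{B_1} \mathbb{Z} \times \frac{2\pi}{B_2} \mathbb{Z}$

$$ \phi^L(x_j) = \frac{\pi}{B_1 B_2} \sum_{k \ne 0} \frac{\Theta(\|k\|, x_{ij,3})}{\|k\|} \, e^{i k \cdot (x_{ij,1},\, x_{ij,2})} $$

• Idea: regularize the functions $\Theta(k, \cdot)$ (Nestler, Potts 2013)

[Figure: regularization of $\Theta(k, \cdot)$, with axis marks at $-(B_3 + \varepsilon)$, $-B_3$, $0$, $B_3$ and $B_3 + \varepsilon$.]

$$ \Theta(k, \cdot) \approx \sum_{l=-M/2}^{M/2-1} b_{k,l} \, e^{\pi i l x / (B_3 + \varepsilon)} $$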

(30)

Summary:

• $S = \{0\}^3$: NFFT based fast summation in 3d

• fully periodic: Ewald + NFFT

• 2d-periodic: Ewald + NFFT based fast summation in 1d

• 1d-periodic: Ewald + NFFT based fast summation in 2d

Structure:

$$ \text{Near field} + \underbrace{C F D}_{\text{NFFT}} \; \underbrace{\tilde D}_{\text{diag}} \; \underbrace{D^H F^H C^H}_{\text{NFFT}^H} $$

(31)

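Calculation of the fields

$$ E_j := -\nabla \phi(y) \big|_{y = x_j} $$

Long range part for fully periodic boundary conditions, two possibilities:

1. $ik$ differentiation (apply $\nabla$ to the Fourier series)

$$ E^L_j = \frac{4\pi i}{B_1 B_2 B_3} \sum_{k \ne 0} \frac{e^{-\|k\|^2 / (4\alpha^2)}}{\|k\|^2} \, k \, S(k) \, e^{-i k \cdot x_j} \qquad \text{with} \qquad S(k) := \sum_{i=1}^{N} q_i \, e^{i k \cdot x_i} $$

2. analytic differentiation (apply $\nabla$ to the NFFT window function)

$$ \nabla \phi^L(x_j) = \frac{4\pi}{B_1 B_2 B_3} \nabla \sum_{k \ne 0} \frac{e^{-\|k\|^2 / (4\alpha^2)}}{\|k\|^2} \, S(k) \, e^{-i k \cdot x_j} \approx \frac{4\pi}{B_1 B_2 B_3} \sum_{l \in I_n^3} g_l \, \nabla \tilde\varphi\left( x_j - \tfrac{1}{m} l \right) $$

Analogously for other types of boundary conditions.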
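The $ik$ differentiation formula can be checked against a finite difference of the long range potential. The sketch below evaluates both Fourier sums directly (no NFFT) in a cubic box; the cubic box, the truncation parameter `n_four` and the function names are assumptions for illustration.

```python
import itertools
import numpy as np

def _fourier_terms(x, q, alpha, B, n_four):
    """Return wave vectors k != 0 and the products weight(k) * S(k)."""
    n = np.array(list(itertools.product(range(-n_four, n_four + 1), repeat=3)))
    k = 2 * np.pi / B * n[np.any(n != 0, axis=1)]
    k2 = np.sum(k ** 2, axis=1)
    S = np.exp(1j * k @ x.T) @ q                       # S(k) = sum_i q_i e^{i k . x_i}
    return k, np.exp(-k2 / (4 * alpha ** 2)) / k2 * S

def phi_long(y, x, q, alpha, B=1.0, n_four=6):
    """Long range potential at an arbitrary point y."""
    k, ws = _fourier_terms(x, q, alpha, B, n_four)
    return 4 * np.pi / B ** 3 * np.real(np.sum(ws * np.exp(-1j * k @ y)))

def field_long(y, x, q, alpha, B=1.0, n_four=6):
    """Long range field E = -grad(phi) at y: each Fourier term is multiplied by i k."""
    k, ws = _fourier_terms(x, q, alpha, B, n_four)
    return np.real(4j * np.pi / B ** 3 * (ws * np.exp(-1j * k @ y)) @ k)
```

In the NFFT based method, the three components of the field correspond to three forward transforms applied to the coefficients multiplied by the components of $k$.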

(32)

Conclusions

• NFFT: fast evaluation of trigonometric sums for nonequispaced data

• software available

• important: NFFT based fast summation

• application to particle simulation: methods for all types of boundary conditions

http://www.tu-chemnitz.de/~potts/nfft
http://www.tu-chemnitz.de/~nesfr