• Keine Ergebnisse gefunden

7.3 Numerical realisation

7.3.1 Linear least squares problems

The problem to minimise the functional (7.12) is known aslinear least squares prob-lem. The main result for linear least squares problems is summarised in the following theorem.

Theorem 7.9. LetA ∈Cm×n andb ∈Cm withm, n∈N andm > n. Thenx∈Cn is a solution of the linear least squares problem

x∈minRn

kAx−bk2 (7.16)

if and only if x is a solution of the normal equation

AAx=Ab. (7.17)

Proof. See e.g. [7, Theorem 1.1.2].

The matrix AA is hermitian and positive semi-definite. Furthermore AA is positive definite if and only if N(A) = {0}. In this case the normal equation possesses a unique solution xˆ given through

ˆ

x= (AA)−1Ab.

In the case that N(A) 6= {0}, which means that A does not have full rank, the normal equation and hence the linear least square problem possesses more than one

7.3 Numerical realisation

Algorithm 7.1 The conjugate gradient method for linear least squares problems Input data:

A ∈Cm×n % the system matrix

x(0) ∈Cn % initial guess

b ∈Cm % the right hand side

Initialisation:

r(0) =b−Ax(0) s(0) =Ar(0) p(0) =s(0)

FOR k = 0,1, . . . DO q(k) =Ap(k)

αk =ks(k)k22/kq(k)k22 x(k+1) =x(k)kp(k) r(k+1) =r(k)−αkq(k) s(k+1) =Ar(k+1)

βk =ks(k+1)k22/ks(k)k22 p(k+1) =s(k+1)kp(k) END

Output:

x(k) % approximation to the linear least squares problem s(k) % the residual of the normal equation

solution. But under all of these solutions, there is only one with minimal norm. This solution is given throughAb, whereAdenotes thepseudoinverseorMoore-Penrose inverse of A.

The conjugate gradient method for least squares problem (CGLS method), as it is described e.g. in [7, Chapter 7.4], is a well suited algorithm to compute this minimal norm least squares solution. The general algorithm is shown in Algorithm 7.1.

We make the following definition.

Definition 7.10. For a given matrix B ∈ Cn×n and vector c ∈ Cn the Krylov subspace Kk(B, c) is given through

Kk(B, c) := span{c, Bc, . . . , Bk−1c}. (7.18) Now we can state the next theorem that contains the main property of the CGLS method.

Theorem 7.11. The k-th iterate x(k) of the CGLS method is an element of the

affine subspace

x(k) ∈x(0)+Kk(AA, s(0)). (7.19) Furthermore, x(k) minimises the functional (7.12) over all elements in the affine subspace.

Proof. See e.g. [7, Chapter 7.4].

On can show that the CGLS method with initial guessx(0) converges to the unique solutionx˜of the linear least squares problem that minimisesk˜x−x(0)k. Thus, for the choice x(0) = 0 the CGLS method converges to the minimal norm solutionxˆ=Ab.

The convergence does not require the matrix AA to be definit.

We note that the algorithm can be carried out without storing the matricesAand A, if one supplies subroutines that carry out the matrix-vector productA·ψand A·ψ. Using the two-dimensional matrices G,L ∈ C[×(2N)]×[×(2N)]

given through (6.29) and (6.30), we can write the matrices Aand A as

A= (I+L+G)·Pτ and A =Pτ ·(I +L+G).

The matrix L and L are sparse banded block matrices with band blocks of equal band-width. Thus the matrix-vector multiplication can be done in O(](2N)) op-erations. As in the case of the BMIA, we need to speed up the matrix-vector multiplication of the dense matrices G and G.

An optimised version of the CGLS method applied to the functional (7.12) is given in pseudo code in Algorithm 7.2

For some preliminary numerical results for the case of the rough surface scattering problem we refer the reader to [31].

7.3 Numerical realisation

Algorithm 7.2The CGLS method applied to the rough surface scattering problem Input data:

φz ∈C×(2N) % right hand side

ψ0∈C×(2N) % initial guess

A ∈C[×(2N)]×[×(2N)] % large dense rectangular matrix

Tol % tolerance

Initialisation:

r =φz−A·ψ0 s =A·r p =s γold =ksk22 γnewold

WHILE γnew> TolDO s =A·p α =γold/ksk22 ψ =ψ+αp r =r−αs s =A·r γnew =ksk22 β =γnewold p =s+βp γoldnew END

Output:

ψ.

Chapter 8

Fast matrix-vector multiplications for integral operators with

difference kernels

In this chapter we present the use of the fast Fourier transform (FFT) to speed up the matrix-vector multiplication for the fully discrete version of certain integral operators. More precisely, we are interested in the multi-dimensional analogon of integral operators of the form

(Bψ)(x) :=

Z b a

k(x−y)ψ(y)dy, x∈[a, b], for real numbers a < b in the cases, where

(i) k and ψ are piecewise, continuous periodic functions with period b−a, (ii) k and ψ are continuous functions on [−(b−a), b−a]respectively [a, b].

A simple computation shows that (Bψ)(x+a) =

Z b a

k(x−(y−a))ψ(y)dy

= Z b−a

0

k(x−z)ψ(z+a)dz, x∈(0, b−a).

Thus we see that it suffices to consider the above two cases only for operators of the form

(Aψ)(x) :=

Z p 0

k(x−y)ψ(y)dy, x∈[0, p] (8.1) for some p:=b−a >0.

In analogy to the discretisation (5.2) we introduce a semi-discrete approximation in the form

(Ahψ)(x) :=

N−1

X

j=0

kh,j(x)ψ(hj), x∈[0, p], (8.2)

where has a matrix representationA with

(A)i,j =hk h(i−j)

, i, j = 0, . . . , N −1.

Hence the i-th entry of the matrix-vector product of the matrix A with a vector ψ= (ψi)i=0,...,N−1 ∈CN×1 with

If done in a naive way, this matrix-vector multiplication needsO(N2)evaluations of multiplications and summations. Fortunately, the matrix has a very special structure that can be exploited. Due to the equidistantly spaced grid we see that in the case of a periodic kernel function, the matrixA is of the form

 matrix is obviously fully determined by its first rowor column.

In the case of a non-periodic kernel function, the matrix A is of the form

8.1 The multi-dimensional case for periodic kernel

where tj :=h k(hj)forj ∈Z. Such a matrix is calledToeplitz matrix and it is fully determined by its first row and column.

It is possible to design fast matrix-vector multiplications for these kind of matrices that reduce the normal amount of work from O(N2) down to O(NlogN). These fast algorithms are based on the following two properties of circulant and Toeplitz matrices:

• any circulant matrix is diagonalised by the Fourier matrix, i.e. the eigenvectors of a circulant matrix are the columns of the Fourier matrix,

• any Toeplitz matrix of size N ×N can be embedded into a circulant matrix of size 2N ×2N, cf. Example 8.3.

We show in the next section that the discretisation of integral operators with difference kernels in Rd lead to special matrices that are generalisation of circulant and Toeplitz matrices. They are d-level block matrices, where each block and level is circular or toeplitz. Though we only need two- and three-level block matrices we give a general approach that works for arbitrary dimensions.

8.1 The multi-dimensional case for periodic kernel

Consider, for some p∈Rd with p > 0, the multi-dimensional analogon of (8.1), i.e.

(Aψ)(x) :=

Z

[0,p]

k(x−y)ψ(y)dy, x∈[0, p]

with multi-periodic kernel and density function with period vector p. For N ∈ Nd we define the step size

h:=p÷2N ∈Rd>0.

Note that the step-size is a vector so that the grid points are given through hj for j ∈Zd.

Remark 8.1. We chose an even number of discretisation points, as this will enable us later on to use the fast Fourier transform for the matrix-vector multiplication.

This will work best if N is some power of two.

The semi-discrete and fully discrete approximation are given through (Ahψ)(x) :=

2N−1

XXXXXXXXX

j=0

kh,j(x)ψ(hj), x∈[0, p]

with

kh,j(x) := ](h)k(x−hj), j ∈Zd, x∈[0, p]

and

respectively. The matrix representation of the fully discrete version is ad-level block circulant matrixA∈C[×(2N)]×[×(2N)] with

(A)i,j =](h)k h(i−j)

, 0≤i, j ≤2N −1.

A different way to get the same discretisation is the following. Instead of replacing the integral by a quadrature formula (we are using the composite trapezoidal rule), we replace the kernel and density by theird-dimensional trigonometric interpolation polynomial of degree at most |N|. Thus the semi-discrete approximation for the Fourier method is given through

(ADFTh ψ)(x) :=

Z

[0,p]

kN(x−y)ψN(y)dy, x∈[0, p], (8.4) where kN and ψN are the d-dimensional trigonometric interpolation polynomials given through (4.6) with coefficients given through (4.7). Inserting the corresponding representations of kN and ψN, we can use the convolution theorem for periodic functions(Theorem 4.3) to rewrite the integral to yield

(ADFTh ψ)(x) =

The fully discrete version is given through (ADFTh ψ)(hl) =](p)

N−1

XXXXXXXXX

m=−N

k(m) ˜˜ ψ(m)e2πihm,l÷2Ni, l= 0, . . . ,2N −1. (8.5) We note that this is a discrete version of theconvolution theorem for periodic func-tions (Theorem 4.3) that can be used for fast summation algorithm. To compute this expression we see that it suffices to calculate threed-dimensional discrete Fourier transform (DFT) of length N :=](2N). If this is done with a d-dimensional FFT, we only need O(3·Nlog(N)) operations as compared to O(N∗2) operations for the standard algorithm.

The following Lemma shows that both algorithm give exactly the same result.

Lemma 8.2. (ADFTh ψ)(hl) = (Ahψ)(hl), l = 0, . . . ,2N −1.