(1)

ITERATIVE PROJECTION METHODS FOR

SPARSE LINEAR SYSTEMS AND

EIGENPROBLEMS

CHAPTER 6 : MINIMIZING METHODS

Heinrich Voss voss@tu-harburg.de

Hamburg University of Technology Institute of Numerical Simulation

(2)

CGN Methods

There are several generalizations of CG (and CR) for nonsymmetric linear problems.

The most obvious approach is to apply CG to an equivalent linear system which is symmetric and positive definite, e.g.

$A^TAx = A^Tb$

or

$AA^Ty = b,\qquad x = A^Ty.$

These methods are called CGN methods, where the letter 'N' indicates that the normal equations are solved.

(3)

CGN methods ct.

For $A^TAx = A^Tb$ the $k$-th CG iterate $x^k$ minimizes the functional

$\|x - x^*\|_{A^TA}^2 = (x - x^*)^TA^TA(x - x^*) = \|A(x - x^*)\|_2^2 = \|r\|_2^2$

in the Krylov space

$x^0 + \mathcal{K}_k(A^Tr^0, A^TA) = x^0 + \operatorname{span}\{A^Tr^0, (A^TA)A^Tr^0, \dots, (A^TA)^{k-1}A^Tr^0\}.$

This method is called the CGNR method, where the letter 'R' indicates that the residual is minimized.

The CGNR method was suggested by Hestenes and Stiefel (1952). The matrix $A^TA$ is in general more populated than $A$; however, the algorithm does not require $A^TA$ to be formed explicitly (see the next slide).

(4)

CGNR

The matrix $A^TA$ which appears in the derivation of the CGNR method is usually not as sparse as the original matrix $A$. However, it does not have to be formed explicitly in the algorithm.

1: $r^0 = b - Ax^0$
2: $d^1 = A^Tr^0$
3: $\alpha_0 = \|d^1\|_2^2$
4: for $k = 1, 2, \dots$ until convergence do
5: $s^k = Ad^k$
6: $\tau_k = \alpha_{k-1}/\|s^k\|_2^2$
7: $x^k = x^{k-1} + \tau_kd^k$
8: $r^k = r^{k-1} - \tau_ks^k$
9: $\beta_k = 1/\alpha_{k-1}$
10: $t^k = A^Tr^k$
11: $\alpha_k = \|t^k\|_2^2$
12: $\beta_k = \alpha_k\beta_k$
13: $d^{k+1} = t^k + \beta_kd^k$
14: end for
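The following is a minimal NumPy sketch of the CGNR iteration above; only products with $A$ and $A^T$ are used, so $A^TA$ is never formed. The function name, the relative stopping criterion, and the iteration cap are our own choices, not part of the slides.

```python
import numpy as np

def cgnr(A, b, x0=None, tol=1e-10, maxiter=None):
    """CGNR sketch: CG applied to A^T A x = A^T b without forming A^T A."""
    n = b.size
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                       # residual of the original system
    d = A.T @ r                         # first search direction d^1 = A^T r^0
    alpha = d @ d
    for _ in range(maxiter or 2 * n):
        s = A @ d
        tau = alpha / (s @ s)
        x += tau * d
        r -= tau * s
        t = A.T @ r
        alpha_new = t @ t
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        beta = alpha_new / alpha        # beta_k = alpha_k / alpha_{k-1}
        alpha = alpha_new
        d = t + beta * d
    return x
```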

(5)

CGNR

Cost of CGNR

2 matrix-vector products
2 scalar products
3 _axpy operations

Storage requirements: 4 vectors

To investigate the convergence of the CGNR method we note that

$x^k = x^0 + q_{k-1}(A^TA)A^Tr^0$

for some polynomial $q_{k-1}$ of degree $k-1$. Subtracting the exact solution $x^*$ on both sides and observing $r^0 = Ae^0$ one obtains

$e^k = p_k(A^TA)e^0,\qquad p_k(z) = 1 - zq_{k-1}(z),$

and since $Ap_k(A^TA) = p_k(AA^T)A$ and $r^k = Ae^k$, multiplication with $A$ gives

$r^k = p_k(AA^T)r^0.$

(6)

Convergence of CGNR

Replacing $p_k$ by a shifted and scaled Chebyshev polynomial one gets, analogously to the convergence result for the CG method:

Theorem 6.1

Let $A$ be nonsingular, $\sigma_1$ its largest and $\sigma_n$ its smallest singular value, and let

$R := \frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n} = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}.$

Then it holds

$\|r^k\|_2 \le \frac{2R^k}{1 + R^{2k}}\|r^0\|_2 \le 2\left(\frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}\right)^k\|r^0\|_2.$

(7)

Example

[Figure: convergence of CG and CGNR for $-\Delta u = 0$, $h = 1/128$; curves $\|x_{CG}\|_2$ and $\|x_{CGNR}\|_2$.]

(8)

Example

[Figure: CGNR for $-\Delta u - 256u_y = 0$, $h = 1/128$; residual norm $\|r_{CGNR}\|_2$.]

(9)

Example

[Figure: CGNR for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$; residual norm $\|r_{CGNR}\|_2$.]

(10)

Example

[Figure: CGNR for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$; residual norm $\|r_{CGNR}\|_2$.]

(11)

CGNE

For $AA^Ty = b$, $x = A^Ty$, the iterates $y^k$ minimize the functional

$\|y - y^*\|_{AA^T}^2 = (y - y^*)^TAA^T(y - y^*) = \|A^T(y - y^*)\|_2^2 = \|x - x^*\|_2^2$

over $y^0 + \mathcal{K}_k(r^0, AA^T)$; equivalently, the iterates $x^k = A^Ty^k$ lie in

$x^0 + A^T\mathcal{K}_k(r^0, AA^T) = x^0 + \operatorname{span}\{A^Tr^0, A^T(AA^T)r^0, \dots, A^T(AA^T)^{k-1}r^0\}.$

This method is called the CGNE method, where the letter 'E' indicates that the error is minimized.

(12)

CGNE

Again, $AA^T$ does not have to be formed explicitly in the algorithm. Moreover, the method can be implemented in terms of the vectors $x^k$ without reference to the vectors $y^k$ (where $x^k = A^Ty^k$). The cost and storage requirements are the same as for CGNR.

1: $r^0 = b - Ax^0$
2: $d^1 = A^Tr^0$
3: $\alpha_0 = \|r^0\|_2^2$
4: for $k = 1, 2, \dots$ until convergence do
5: $\tau_k = \alpha_{k-1}/\|d^k\|_2^2$
6: $x^k = x^{k-1} + \tau_kd^k$
7: $r^k = r^{k-1} - \tau_kAd^k$
8: $\beta_k = 1/\alpha_{k-1}$
9: $\alpha_k = \|r^k\|_2^2$
10: $\beta_k = \alpha_k\beta_k$
11: $d^{k+1} = A^Tr^k + \beta_kd^k$
12: end for
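A matching NumPy sketch of the CGNE iteration above, again without forming $AA^T$; the function name and stopping rule are ours.

```python
import numpy as np

def cgne(A, b, x0=None, tol=1e-10, maxiter=None):
    """CGNE sketch: CG applied to A A^T y = b, implemented in terms of x = A^T y."""
    n = b.size
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    d = A.T @ r
    alpha = r @ r
    for _ in range(maxiter or 2 * n):
        tau = alpha / (d @ d)
        x += tau * d
        r -= tau * (A @ d)
        alpha_new = r @ r
        if np.sqrt(alpha_new) <= tol * np.linalg.norm(b):
            break
        beta = alpha_new / alpha        # beta_k = alpha_k / alpha_{k-1}
        alpha = alpha_new
        d = A.T @ r + beta * d
    return x
```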

(13)

Convergence of CGNE

The $k$-th iterate $x^k = A^Ty^k$ satisfies

$A^{-T}x^k = y^k = y^0 + \tilde q_{k-1}(AA^T)r^0 = A^{-T}x^0 + \tilde q_{k-1}(AA^T)r^0,$

where $r^0 = b - Ax^0 = b - AA^Ty^0$, i.e.

$x^k = x^0 + A^T\tilde q_{k-1}(AA^T)r^0.$

Subtracting the solution $x^*$ yields

$e^k = e^0 + A^T\tilde q_{k-1}(AA^T)Ae^0,$

and from

$A^T\tilde q_{k-1}(AA^T)A = A^T\sum_{j=0}^{k-1}\alpha_j(AA^T)^jA = \sum_{j=0}^{k-1}\alpha_jA^T(AA^T)^jA = \sum_{j=0}^{k-1}\alpha_j(A^TA)^{j+1}$

we finally get

$e^k = \tilde p_k(A^TA)e^0,\qquad \tilde p_k(0) = 1.$

(14)

Convergence of CGNE ct.

Replacing $\tilde p_k$ by a shifted and scaled Chebyshev polynomial one gets

Theorem 6.1a

Let $A$ be nonsingular, $\sigma_1$ its largest and $\sigma_n$ its smallest singular value, and let

$R := \frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n} = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}.$

Then it holds

$\|e^k\|_2 \le \frac{2R^k}{1 + R^{2k}}\|e^0\|_2 \le 2\left(\frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}\right)^k\|e^0\|_2.$

(15)

CGNR, CGNE

The bounds given in Theorems 6.1 and 6.1a illustrate the main drawback of the CGN methods: the condition number is squared. In particular, if $A$ is symmetric and positive definite, the speed of convergence is slowed down to that of the steepest descent method.

However, there are special situations where CGNR and CGNE are the optimal extensions of CG (cf. Nachtigal, Reddy & Trefethen (1992) and Freund, Golub & Nachtigal (1992)), but usually they are not satisfactory generalizations of the CG method.

As for the CG method, the convergence properties of CGNE and CGNR can be improved considerably by preconditioning. Since the matrices $A^TA$ and $AA^T$ are much more populated than the sparse matrix $A$, the incomplete Cholesky factorization of these matrices is not a practical choice; instead one can work with the incomplete LQ factorization of $A$ described on the next slide.

(16)

Preconditioning by incomplete LQ factorization

Let $A = LQ$ be the LQ factorization of $A$, where $L$ is a lower triangular matrix and $Q$ is an orthogonal matrix. Then

$AA^T = (LQ)(LQ)^T = LQQ^TL^T = LL^T,$

and $LL^T$ is the exact Cholesky factorization of $AA^T$, which is obtained without forming the matrix $AA^T$.

To obtain a sparse approximation, Saad (1988) proposed the incomplete LQ factorization of A. Analogously to the incomplete Cholesky factorization, most of the elements of L and Q are dropped during the factorization process according to some strategy that avoids excessive fill-in.

The LQ factorization can be calculated by applying the Gram-Schmidt process to the rows of A. This approach is known to be unstable when it is used to orthogonalize a large number of vectors. Notice, however, that the rows of Q remain very sparse in the incomplete LQ factorization, and therefore any given row of A will be orthogonal to most of the previous rows of Q. Hence, within the incomplete LQ factorization the Gram-Schmidt process is much less susceptible to numerical difficulties.

(17)

Incomplete LQ factorization ct.

Details of the implementation of the incomplete LQ factorization and its use as a preconditioner in the CGNE method can be found in Saad (1988) and in Saad's book Iterative Methods for Sparse Linear Systems.

Similarly the CGNR method can be preconditioned using the incomplete LQ factorization of AT.

(18)

Preconditioning by SSOR

Consider the $j$-th elementary step of the Gauß-Seidel method for the linear system

$AA^Ty = b,$

in which the $j$-th component of $y$ is modified such that the $j$-th equation of the system is satisfied.

Let $y^{new} = y^{old} + \delta_je^j$; then $\delta_j\in\mathbb{R}$ is chosen such that

$0 = (e^j)^T(b - AA^Ty^{new}) = (e^j)^T(b - AA^Ty^{old} - \delta_jAA^Te^j)$

$\Rightarrow\qquad \delta_j = \frac{b_j - (A^Te^j)^TA^Ty^{old}}{\|A^Te^j\|_2^2} = \frac{b_j - a_j'x^{old}}{\|(a_j')^T\|_2^2},$

where $a_j'$ denotes the $j$-th row of $A$ and $x^{old} = A^Ty^{old}$. The update of $x$ is given by

$x^{new} = A^Ty^{new} = A^T(y^{old} + \delta_je^j) = x^{old} + \delta_j(a_j')^T.$

(19)

SSOR for $AA^Ty = b$, $x = A^Ty$

1: Let $y$ be an initial approximation
2: $x = A^Ty$
3: for $j = 1, 2, \dots, n, n, n-1, \dots, 1$ do
4: $\delta = \omega(b_j - a_j'x)/\|a_j'\|_2^2$
5: $y_j = y_j + \delta$
6: $x = x + \delta(a_j')^T$
7: end for

Comments

[1.] Since $A$ is nonsingular, the matrix $AA^T$ is positive definite, and therefore the SSOR method converges for every $\omega\in(0, 2)$ if the for-loop is repeated. A serious difficulty is to determine the optimal relaxation parameter.

[2.] The matrix $AA^T$ can be dense or, in general, much less sparse than $A$. However, the cost of the implementation above depends only on the nonzero structure of $A$.
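A minimal dense NumPy sketch of one SSOR sweep above, carried out in terms of $x = A^Ty$; the function name is ours, and no attempt is made to exploit sparsity.

```python
import numpy as np

def ssor_sweep_aat(A, b, y, x, omega):
    """One SSOR sweep for A A^T y = b in terms of x = A^T y (dense sketch)."""
    n = A.shape[0]
    # forward sweep 1..n followed by backward sweep n..1 (row n is visited twice)
    for j in list(range(n)) + list(range(n - 1, -1, -1)):
        a_j = A[j, :]                                   # j-th row of A
        delta = omega * (b[j] - a_j @ x) / (a_j @ a_j)
        y[j] += delta
        x += delta * a_j
    return y, x
```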

(20)

CGNE-SSOR

In the following implementation of CGNE with preconditioning by SSOR the call "SSOR-CGNE(r, $\omega$, z)" indicates that one sweep of the SSOR method with parameter $\omega$ and initial approximation $z = 0$ is executed for the linear system $AA^Tz = r$.

1: $r^0 = b - Ax^0$
2: SSOR-CGNE($r^0, \omega, z^0$)
3: $d^1 = A^Tz^0$
4: $\alpha_0 = (z^0)^Tr^0$
5: for $k = 1, 2, \dots$ until convergence do
6: $\tau_k = \alpha_{k-1}/\|d^k\|_2^2$
7: $x^k = x^{k-1} + \tau_kd^k$
8: $r^k = r^{k-1} - \tau_kAd^k$
9: $\beta_k = 1/\alpha_{k-1}$
10: SSOR-CGNE($r^k, \omega, z^k$)
11: $\alpha_k = (z^k)^Tr^k$
12: $\beta_k = \alpha_k\beta_k$
13: $d^{k+1} = A^Tz^k + \beta_kd^k$
14: end for

(21)

SSOR for $A^TAx = A^Tb$

Similarly, if $a_j$ denotes the $j$-th column of $A$, the SSOR method for $A^TAx = A^Tb$ with relaxation parameter $\omega$ reads

1: Let $x$ be an initial approximation
2: $z = Ax$
3: $c = A^Tb$
4: for $j = 1, 2, \dots, n, n, n-1, \dots, 1$ do
5: $\delta = \omega(c_j - z^Ta_j)/\|a_j\|_2^2$
6: $x_j = x_j + \delta$
7: $z = z + \delta a_j$
8: end for

Comment

The SSOR method above can also be applied to nonsquare matrices $A\in\mathbb{R}^{m\times n}$ with $m > n$ and $\operatorname{rank}(A) = n$. In this case it converges to the unique solution of the least squares problem $\|Ax - b\|_2 = \min!$.

(22)

CGNR-SSOR

In the following implementation of preconditioned CGNR the call "SSOR-CGNR(r, $\omega$, z)" indicates that one sweep of SSOR with parameter $\omega$ and initial approximation $z = 0$ is executed for the linear system $A^TAz = r$.

1: $r^0 = A^T(b - Ax^0)$
2: SSOR-CGNR($r^0, \omega, z^0$)
3: $d^1 = z^0$
4: $\alpha_0 = (z^0)^Tr^0$
5: for $k = 1, 2, \dots$ until convergence do
6: $s^k = Ad^k$
7: $\tau_k = \alpha_{k-1}/\|s^k\|_2^2$
8: $x^k = x^{k-1} + \tau_kd^k$
9: $r^k = r^{k-1} - \tau_kA^Ts^k$
10: $\beta_k = 1/\alpha_{k-1}$
11: SSOR-CGNR($r^k, \omega, z^k$)
12: $\alpha_k = (z^k)^Tr^k$
13: $\beta_k = \alpha_k\beta_k$
14: $d^{k+1} = z^k + \beta_kd^k$
15: end for

(23)
(24)
(25)

Preconditioning

CGNR (and CGNE) can also be preconditioned without taking advantage of the special structure of the system matrix, by applying the methods to the system

$M^{-1}Ax = M^{-1}b$ (left preconditioning)

or to

$AM^{-1}y = b$, $y = Mx$ (right preconditioning)

where in both cases M is an approximation to A such that linear systems Mu = c can be solved easily.

In the following examples we used $M = LU$, where

[L, U] = luinc(A, '0') is the no-fill incomplete LU decomposition, or

[L, U] = luinc(A, τ) is an incomplete LU factorization of A with drop threshold τ.
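luinc is MATLAB's incomplete LU routine. As a rough Python analogue (our assumption, not part of the slides), scipy's spilu provides an incomplete LU with a drop tolerance that can be wrapped as the preconditioner $M^{-1}$:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
# simple nonsymmetric model matrix standing in for the convection-diffusion examples
A = sp.diags([-1.0, 2.0, -0.5], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-3)                      # incomplete LU with threshold
M_inv = spla.LinearOperator(A.shape, matvec=ilu.solve)  # action of M^{-1} = (LU)^{-1}

x0 = np.zeros(n)
r = M_inv.matvec(b - A @ x0)    # left-preconditioned residual M^{-1}(b - A x0)
print(np.linalg.norm(r))
```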

(26)

Example

[Figure: CGNR with left preconditioning for $-\Delta u - 256u_y = 0$, $h = 1/128$; residual histories for the preconditioners listed below.]

no precond.: 5.877E8 flops

nofill luinc: 1.108E8 + 1.115E5 flops

luinc(a,0.1): 3.414E7 + 4.4225E6 flops

luinc(a,0.01): 1.555E7 + 4.4225E6 flops

luinc(a,0.001): 6.298E6 + 4.4225E6 flops

(27)

Example

[Figure: CGNR with left preconditioning for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$; residual histories for the preconditioners listed below.]

no precond.: 6.982E8 flops

nofill luinc: 1.273E8 + 1.115E5 flops

luinc(a,0.1): 4.346E7 + 4.4225E6 flops

luinc(a,0.01): 1.755E7 + 4.4225E6 flops

luinc(a,0.001): 6.298E6 + 4.4225E6 flops

(28)

Example

[Figure: CGNR with left preconditioning for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$; residual histories for the preconditioners listed below.]

no precond.: 2.711E9 flops

nofill luinc: 5.014E8 + 1.115E5 flops

luinc(a,0.1): 1.934E8 + 4.4225E6 flops

luinc(a,0.01): 2.144E8 + 4.4225E6 flops

luinc(a,0.001): 5.665E7 + 4.4225E6 flops

(29)

Sparse approximate inverses

For symmetric problems the preconditioner had to be symmetric as well. For nonsymmetric problems this is no longer necessary. We just have to find a matrix $C = M^{-1}$ such that $\|AC - I\| \approx 0$ (right preconditioning) or $\|CA - I\| \approx 0$ (left preconditioning). Applying the preconditioner then simply amounts to multiplication by $C$.

The Frobenius norm is a particularly convenient choice since

$\|AC - I\|_F^2 = \sum_{j=1}^{n}\|(AC - I)e^j\|_2^2.$

(30)

SPAI ct.

Hence, if $c^j$ denotes the $j$-th column of $C$, we have to solve the $n$ independent least squares problems

$\|Ac^j - e^j\|_2 = \min!,\qquad j = 1, \dots, n.$

Solving these problems exactly is equivalent to determining the inverse of $A$ and would be much too expensive. Therefore, for every $j = 1, \dots, n$ we choose a sparsity pattern $G(j)$, denote by $\hat c^j$ a vector which contains only the elements with indices in $G(j)$, and by $\hat A_j$ the matrix whose columns are the columns of $A$ corresponding to $G(j)$ and whose rows $i$ are those for which $a_{ik}\neq 0$ for some $k\in G(j)$. Since $A$ is sparse, $\hat A_j$ is a small matrix, and the resulting least squares problems

$\|\hat A_j\hat c^j - \hat e^j\|_2 = \min!\qquad(*)$

are small and cheap to solve.
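A small sketch (our own, with hypothetical names) of how one such reduced least squares problem can be set up and solved with a dense solver once the pattern $G(j)$ is fixed:

```python
import numpy as np
import scipy.sparse as sp

def spai_column(A, j, G):
    """Solve the reduced problem (*) for column j of C on the sparsity pattern G."""
    A = sp.csc_matrix(A)
    rows = np.unique(A[:, G].nonzero()[0])     # rows i with a_{ik} != 0 for some k in G
    A_hat = A[rows, :][:, G].toarray()         # small dense submatrix \hat A_j
    e_hat = (rows == j).astype(float)          # restriction of e^j to the selected rows
    c_hat, *_ = np.linalg.lstsq(A_hat, e_hat, rcond=None)
    c = np.zeros(A.shape[1])
    c[G] = c_hat                               # j-th column of the approximate inverse C
    return c
```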

(31)

Choice of sparsity pattern

Grote & Huckle (1997) proposed an incremental method. Starting from a set of indices $G_0(j)$ (usually corresponding to a diagonal $C$ or to the structure of $A$) they solve the least squares problem (*) and then, iteratively, enlarge the set of indices.

Let

$\|\hat A_j^{(p)}\hat c^{j,p} - e^j\|_2 = \min!\qquad(**)$

be the least squares problem in the $p$-th step of the iteration, let $c^j\in\mathbb{R}^n$ be the solution of (**) (i.e. $\hat c^{j,p}$ augmented by zero components), and denote by $r = Ac^j - e^j$ the corresponding residual.

(32)

Choice of sparsity pattern ct.

Let $\mathcal{L} = \{k : r_k\neq 0\}$, and for $\ell\in\mathcal{L}$ let $\mathcal{N}_\ell = \{k : a_{\ell k}\neq 0,\ k\notin G_p(j)\}$. Then the candidates for indices of new nonzero components in the solution vector are chosen from

$\bigcup_{\ell\in\mathcal{L}}\mathcal{N}_\ell.$

For $k$ in this set we consider the problem

$\|r + \mu_kAe^k\|_2 = \|A(c^j + \mu_ke^k) - e^j\|_2 = \min!\qquad\Longrightarrow\qquad \mu_k = -\frac{r^TAe^k}{\|Ae^k\|_2^2}.$

It can be shown that the squared norm of the new residual is

$\|r\|_2^2 - \frac{(r^TAe^k)^2}{\|Ae^k\|_2^2},$

and that there exist indices with $r^TAe^k\neq 0$. From these indices one selects those that yield the largest reduction of the residual.

(33)

Choice of sparsity pattern ct.

This iteration step is repeated until the norm of the residual is smaller than a prescribed bound or until we have reached the maximum storage that is allowed for column j.

Note that putting new indices into $G_p(j)$ adds columns and rows to the matrix $\hat A_j^{(p)}$. Methods exist to update the QR factorization accordingly.

Approximate inverse preconditioning is an active field of research. It is particularly interesting for parallel machines. Gould & Scott (1995) described some improvements of the method. A symmetric approximate inverse

(34)

Nonsymmetric problems

Desirable are generalizations of the CG (or CR) method which have similar properties to CG without squaring the condition number of the system matrix. Outstanding properties of the CG method:

— based on a short recurrence (low storage requirements)
— minimizes the error over an ascending sequence of Krylov spaces

Theorem of Faber & Manteuffel (1984): Methods with these two properties exist only for a very limited class of matrices.

For 3-term recurrences, only for matrices

$A = e^{i\theta}(H + \sigma I),\qquad H = H^H,\ \theta\in\mathbb{R},\ \sigma\in\mathbb{C}.$

(35)

Generalized Conjugate Residuals

Eisenstat, Elman & Schultz (1983) studied the Generalized Conjugate Residual method (GCR method for short), which is a direct generalization of the CR method (and of MINRES) to nonsymmetric systems of linear equations.

By the Gram-Schmidt process a basis $\{d^1, \dots, d^k\}$ of the Krylov space $\mathcal{K}_k(r^0, A)$ is constructed which is orthogonal with respect to the scalar product

$\langle x, y\rangle_{A^TA} = x^TA^TAy.$

Since $d^1, \dots, d^k$ are $A^TA$-conjugate directions, the minimization problem

$\|x - x^*\|_{A^TA} = \min!,\qquad x\in x^0 + \mathcal{K}_k(r^0, A)$

decouples into a sequence of line searches along $d^1, \dots, d^k$.

Hence, the following algorithm generates iterates $x^k$ which minimize the Euclidean norm of the residual $\|b - Ax\|_2 = \|x - x^*\|_{A^TA}$ over $x^0 + \mathcal{K}_k(r^0, A)$.

(36)

GCR method

1: Choose an initial guess $x^0$
2: $r^0 = b - Ax^0$
3: $d^1 = r^0$
4: $s^1 = Ad^1$
5: for $k = 1, 2, \dots$ until convergence do
6: $\alpha_k = \|s^k\|_2^2$
7: $\tau_k = (r^{k-1})^Ts^k/\alpha_k$
8: $x^k = x^{k-1} + \tau_kd^k$
9: $r^k = r^{k-1} - \tau_ks^k$
10: $d^{k+1} = r^k$
11: $s^{k+1} = Ad^{k+1}$
12: for $j = 1 : k$ do
13: $\gamma_{k,j} = (s^{k+1})^Ts^j/\|s^j\|_2^2$
14: $d^{k+1} = d^{k+1} - \gamma_{k,j}d^j$
15: $s^{k+1} = s^{k+1} - \gamma_{k,j}s^j$
16: end for
17: end for
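A minimal dense NumPy sketch of the GCR algorithm above; all previous directions are stored, which is exactly the cost issue discussed later. Names and the stopping rule are ours.

```python
import numpy as np

def gcr(A, b, x0=None, tol=1e-10, maxiter=None):
    """GCR sketch: residual minimization with A^T A-orthogonal search directions."""
    n = b.size
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    D, S = [r.copy()], [A @ r]                 # directions d^k and their images s^k = A d^k
    for _ in range(maxiter or n):
        d, s = D[-1], S[-1]
        tau = (r @ s) / (s @ s)
        x += tau * d
        r -= tau * s
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        d_new, s_new = r.copy(), A @ r
        for dj, sj in zip(D, S):               # make d_new A^T A-orthogonal to all d^j
            gamma = (s_new @ sj) / (sj @ sj)
            d_new -= gamma * dj
            s_new -= gamma * sj
        D.append(d_new)
        S.append(s_new)
    return x
```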

(37)

Generalized Conjugate Residuals

Without further assumptions on the matrix $A$ the GCR method may break down because $d^{k+1}$ becomes the null vector.

Consider for instance

$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},\qquad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix},\qquad x^0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$

Then at step $k = 1$ one obtains $(r^0)^TAd^1 = 0$, hence $x^1 = x^0$, and the orthogonalization yields $d^2 = 0$, so that $\tau_2$ is undefined.

It can be shown (cf. Elman (1982)) that the GCR method does not break down if $A$ is a positive real matrix, i.e. if

$M = \tfrac{1}{2}(A + A^T)$

is positive definite. Hence, for this type of matrices the GCR method (in exact arithmetic) yields the solution in a finite number of steps.

(38)

Comments on GCR

Axelsson (1980) proposed the following method for nonsymmetric linear systems:

For $x^0$, $r^0$, and $d^1$ as in the GCR method let

$x^k = x^{k-1} + \sum_{j=1}^{k}\eta_{k,j}d^j$
$r^k = b - Ax^k$
$\beta_k = (Ar^k)^TAd^k/\|Ad^k\|_2^2$
$d^{k+1} = r^k - \beta_kd^k$

where the scalars $\eta_{k,j}$ are chosen such that $\|r^k\|_2$ is minimal, i.e. from the linear system of equations

$(d^j)^TA^T(Ay - r^{k-1}) = 0,\quad j = 1, \dots, k,\qquad y = \sum_{j=1}^{k}\eta_{k,j}d^j.$

(39)

Comments on GCR

The work and storage requirements of the GCR method are prohibitively high.

Vinsome (1976) proposed a method called ORTHOMIN, which is a truncated GCR method and which is considerably less expensive per iteration step. For an integer $m\ll n$ the search direction $d^{k+1}$, $k > m$, is forced to be $A^TA$-orthogonal only to the last $m$ directions $d^k, \dots, d^{k-m+1}$, but not to all preceding directions $d^j$. This method is usually referred to as ORTHOMIN(m).

Eisenstat, Elman & Schultz (1983) alternatively suggested restarting the GCR method every $m+1$ steps, i.e. the current iterate $x^{j(m+1)}$, $j = 0, 1, 2, \dots$, is chosen as a new initial guess for the GCR method. The storage requirements of this method, called GCR(m), are the same as for ORTHOMIN(m), since $m$ preceding directions have to be stored, but the work per iteration is smaller, since on average only $m/2$ vectors $d^j$ are used to compute the current search direction.

(40)

ORTHODIR

Young & Jea (1980) studied methods which are closely related to ORTHOMIN.

In ORTHOMIN they replaced the minimization problem in the $k$-th step by

$\|x^k - x^*\|_{ZA} \le \|x - x^*\|_{ZA}\quad\text{for every } x\in x^0 + \mathcal{K}_k(r^0, A),$

where $Z\in\mathbb{R}^{n\times n}$ is an arbitrary matrix such that $ZA$ is positive definite.

Moreover, they considered an algorithm called ORTHODIR (or ORTHODIR(m) for the truncated form), which differs from the GCR method only in the choice of the search directions. These are given by

$\tilde d^{k+1} = A\tilde d^k - \sum_{j=1}^{k}\tilde\gamma_{k,j}\tilde d^j,$

where the scalars $\tilde\gamma_{k,j}$ are chosen such that the search directions $\tilde d^j$ are $ZA$-conjugate.

Notice that ORTHODIR can only break down with $\tilde d^{k+1} = 0$ if the solution $x^*$ lies in $x^0 + \mathcal{K}_k(r^0, A)$.

(41)

ORTHORES

Young & Jea (1980) introduced a method called ORTHORES (and ORTHORES(m) for the truncated variant), where the iterates are constructed by the recurrence relation

$x^k = \gamma_kr^{k-1} + \sum_{j=0}^{k-1}\delta_{k,j}x^j$

and $\gamma_k$ and $\delta_{k,j}$ are determined such that the residual vectors $r^j$ are semi-orthogonal with respect to $Z$, i.e.

$(Zr^k)^Tr^j = 0\quad\text{for } j < k.$

For the truncated version ORTHORES(m) this requirement must be fulfilled for $k - m \le j < k$.

For the choice of $Z$, Young and Jea considered among others

$Z = A^T,\qquad Z = A^T(A + A^T),\qquad Z = A^TM$

(42)

Generalized Minimal Residual Method (GMRES)

A different approach to the iterative solution of general linear systems, which generalizes the MINRES method and which is mathematically equivalent to the GCR method, was introduced by Saad & Schultz (1986). Unlike the GCR method, this method, called the Generalized Minimal RESidual method (GMRES for short), does not break down for general matrices that are not positive real.

GMRES constructs a sequence

$x^k\in x^0 + \mathcal{K}_k(r^0, A)$

of vectors such that the Euclidean norm of the residual becomes minimal.

To this end an orthonormal basis $q^1, \dots, q^k$ of $\mathcal{K}_k(r^0, A)$ is determined using Arnoldi's method (i.e. the modified Gram-Schmidt process), such that the problem

$\|r^k\|_2 = \|b - A(x^0 + Q_ky)\|_2 = \min!,\qquad y\in\mathbb{R}^k,$

reduces to a small least squares problem (see the following slides).

(43)

Arnoldi method

1: Let $q^1$ be an initial vector with $\|q^1\|_2 = 1$
2: for $k = 1, 2, \dots$ do
3: $q^{k+1} = Aq^k$
4: for $j = 1 : k$ do
5: $h_{jk} = (q^{k+1})^Tq^j$
6: $q^{k+1} = q^{k+1} - h_{jk}q^j$
7: end for
8: $h_{k+1,k} = \|q^{k+1}\|_2$
9: $q^{k+1} = q^{k+1}/h_{k+1,k}$
10: end for
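A minimal NumPy sketch of the Arnoldi process above (modified Gram-Schmidt variant), returning $Q_{m+1}$ and the $(m+1)\times m$ Hessenberg matrix $\tilde H_m$; the function name and the breakdown handling are ours.

```python
import numpy as np

def arnoldi(A, q1, m):
    """Build an orthonormal Krylov basis Q and Hessenberg H with A Q[:, :m] = Q H."""
    n = q1.size
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for k in range(m):
        w = A @ Q[:, k]
        for j in range(k + 1):                 # modified Gram-Schmidt
            H[j, k] = Q[:, j] @ w
            w -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0.0:                 # breakdown: Krylov space is A-invariant
            return Q[:, :k + 1], H[:k + 1, :k]
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H

# quick check of the Arnoldi relation A Q_m = Q_{m+1} H_tilde on a random matrix
A = np.random.rand(50, 50)
Q, H = arnoldi(A, np.random.rand(50), 10)
print(np.linalg.norm(A @ Q[:, :10] - Q @ H))   # ~ machine precision
```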

(44)

GMRES

With $Q_k = (q^1, \dots, q^k)$ and $\tilde H_k = (h_{j\ell})\in\mathbb{R}^{(k+1)\times k}$, where $h_{j\ell} = 0$ for $j > \ell + 1$, it holds

$AQ_k = Q_{k+1}\tilde H_k,$

and for $x = x^0 + Q_ky$, $r = b - Ax$ and $q^1 = r^0/\|r^0\|_2$

$\|r\|_2 = \|b - A(x^0 + Q_ky)\|_2 = \|r^0 - AQ_ky\|_2 = \|r^0 - Q_{k+1}\tilde H_ky\|_2 = \|Q_{k+1}(\|r^0\|_2e^1 - \tilde H_ky)\|_2 = \|\,\|r^0\|_2e^1 - \tilde H_ky\,\|_2.$

Hence, the problem of minimizing the Euclidean norm of the residual over a Krylov space yields a least squares problem whose system matrix $\tilde H_k$ has upper Hessenberg form.

(45)

GMRES ct.

If $U^k\in\mathbb{R}^{(k+1)\times(k+1)}$ is an orthogonal matrix such that

$U^k\tilde H_k = \begin{pmatrix} R_k \\ 0^T \end{pmatrix},\qquad R_k \text{ upper triangular},$

and if $u^k$ is the first column of $U^k$, then it holds

$\left\|\,\|r^0\|_2e^1 - \tilde H_ky\right\|_2^2 = \left\|U^k(\|r^0\|_2e^1 - \tilde H_ky)\right\|_2^2 = \left\|\,\|r^0\|_2u^k - \begin{pmatrix} R_k \\ 0^T \end{pmatrix}y\right\|_2^2 = \left\|\,\|r^0\|_2(u^k_1, \dots, u^k_k)^T - R_ky\right\|_2^2 + \|r^0\|_2^2\,|u^k_{k+1}|^2.$

(46)

GMRES ct.

Hence, the least squares problem is equivalent to the linear system

$R_ky = \|r^0\|_2\,(u^k_1, \dots, u^k_k)^T,$

where $R_k$ is an upper triangular matrix, and the norm of the residual satisfies

$\|b - Ax^k\|_2 = \|r^0\|_2\,|u^k_{k+1}|,$

i.e. the accuracy of the method can be monitored without explicitly forming $x^k$ or $y^k$.

If the residual norm is sufficiently small, GMRES is terminated, and only after this final step of the Arnoldi method the triangular system is solved and $x^k = x^0 + Q_ky^k$ is formed.

(47)

GMRES ct.

Since $\tilde H_k$ is an upper Hessenberg matrix, the matrix $U^k$ can be calculated efficiently using Givens reflections. Let

$G_j = \begin{pmatrix} I_{j-2} & 0 & 0 \\ 0^T & c_j & s_j \\ 0^T & s_j & -c_j \end{pmatrix}\in\mathbb{R}^{j\times j},\qquad c_j^2 + s_j^2 = 1,$

denote a Givens reflection acting on the $(j-1)$-st and $j$-th component of a vector.

Let $U^1 = G_2$ be such that

$U^1\tilde H_1 = \begin{pmatrix} c_2 & s_2 \\ s_2 & -c_2 \end{pmatrix}\begin{pmatrix} h_{11} \\ h_{21} \end{pmatrix} = \begin{pmatrix} \sqrt{h_{11}^2 + h_{21}^2} \\ 0 \end{pmatrix} =: \begin{pmatrix} \hat h_{11} \\ 0 \end{pmatrix}.$

(48)

GMRES ct.

Assume that the orthogonal matrix

$U^{k-1} = G_k\begin{pmatrix} G_{k-1} & 0 \\ 0^T & 1 \end{pmatrix}\cdots\begin{pmatrix} G_2 & O \\ O & I_{k-1} \end{pmatrix} =: G_k\begin{pmatrix} U^{k-2} & 0 \\ 0^T & 1 \end{pmatrix}$

is chosen such that

$U^{k-1}\tilde H_{k-1} = \begin{pmatrix} \hat h_{11} & \hat h_{12} & \dots & \hat h_{1,k-1} \\ 0 & \hat h_{22} & \dots & \hat h_{2,k-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \hat h_{k-1,k-1} \\ 0 & 0 & \dots & 0 \end{pmatrix} =: \begin{pmatrix} R_{k-1} \\ 0^T \end{pmatrix}\in\mathbb{R}^{k\times(k-1)}.$

(49)

GMRES ct.

Since multiplication of a matrix by $G_{k+1}$ from the left leaves the first $k-1$ rows invariant, one obtains the upper triangular matrix in the $k$-th step of the method from

$U^k\tilde H_k = G_{k+1}\begin{pmatrix} U^{k-1} & 0 \\ 0^T & 1 \end{pmatrix}\begin{pmatrix} \tilde H_{k-1} & h^k \\ 0^T & h_{k+1,k} \end{pmatrix} = \begin{pmatrix} I_{k-1} & 0 & 0 \\ 0^T & c_{k+1} & s_{k+1} \\ 0^T & s_{k+1} & -c_{k+1} \end{pmatrix}\begin{pmatrix} R_{k-1} & \hat h^k \\ 0^T & g_k \\ 0^T & h_{k+1,k} \end{pmatrix} = \begin{pmatrix} R_k \\ 0^T \end{pmatrix}$

where

$U^{k-1}\begin{pmatrix} h_{1k} \\ \vdots \\ h_{k-1,k} \\ h_{kk} \end{pmatrix} =: U^{k-1}h^k =: \begin{pmatrix} \hat h_{1k} \\ \vdots \\ \hat h_{k-1,k} \\ g_k \end{pmatrix} =: \begin{pmatrix} \hat h^k \\ g_k \end{pmatrix},\ \dots$

(50)

GMRES ct.

$\dots,\qquad s_{k+1} := \frac{h_{k+1,k}}{\sqrt{g_k^2 + h_{k+1,k}^2}},\qquad c_{k+1} := \frac{g_k}{\sqrt{g_k^2 + h_{k+1,k}^2}}.$

In a similar manner as in the SYMMLQ method, the elements $\hat h_{j\ell}$ can be computed along with the Arnoldi process, and the reflections can be applied to the right-hand side $\|r^0\|_2e^1$ as the iteration proceeds.

Observe, however, that unlike in the SYMMLQ method the nontrivial entries $c_j$ and $s_j$ of $G_j$ have to be stored, since they are needed in the transformation of the $k$-th column $h^k$ of $\tilde H_k$.

The diagonal elements of the triangular matrix $R_k$ are given by

$\hat h_{jj} = \sqrt{g_j^2 + h_{j+1,j}^2}.$

Hence, $R_k$ is nonsingular if the Arnoldi process constructs $k$ orthogonal vectors $q^j$. GMRES breaks down if and only if $h_{j+1,j} = 0$ for some $j \le k$, i.e. only if the solution is already contained in $x^0 + \mathcal{K}_j(r^0, A)$.

(51)

GMRES algorithm

$q^1 = b - Ax^0$
$z_1 = \|q^1\|_2$
$q^1 = q^1/z_1$
$k = 1$
while $|z_k| >$ tol do
  $q^{k+1} = Aq^k$
  for $i = 1 : k$ do
    $h_{ik} = (q^i)^Tq^{k+1}$
    $q^{k+1} = q^{k+1} - h_{ik}q^i$
  end for
  $h_{k+1,k} = \|q^{k+1}\|_2$
  $q^{k+1} = q^{k+1}/h_{k+1,k}$
  for $i = 1 : k-1$ do
    $\begin{pmatrix} h_{i,k} \\ h_{i+1,k} \end{pmatrix} = \begin{pmatrix} c_{i+1} & s_{i+1} \\ s_{i+1} & -c_{i+1} \end{pmatrix}\begin{pmatrix} h_{i,k} \\ h_{i+1,k} \end{pmatrix}$
  end for

(52)

GMRES algorithm ct.

  $\alpha = \sqrt{h_{kk}^2 + h_{k+1,k}^2}$
  $s_{k+1} = h_{k+1,k}/\alpha$;  $c_{k+1} = h_{kk}/\alpha$
  $h_{kk} = \alpha$
  $z_{k+1} = s_{k+1}z_k$
  $z_k = c_{k+1}z_k$
  $k = k + 1$
end while
$y_k = z_k/h_{kk}$
for $i = k-1 : -1 : 1$ do
  $y_i = (z_i - \sum_{j=i+1}^{k}h_{ij}y_j)/h_{ii}$
end for
$x^k = x^0 + \sum_{j=1}^{k}y_jq^j$
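A compact NumPy sketch of the GMRES algorithm above, combining the Arnoldi process with the Givens-reflection update of $\tilde H_k$ and of the right-hand side $\|r^0\|_2e^1$; the relative stopping rule, dense storage, and names are our own choices.

```python
import numpy as np

def gmres_sketch(A, b, x0=None, tol=1e-10, maxiter=200):
    """Unrestarted GMRES with Givens reflections; returns x_k and residual history."""
    n = b.size
    x0 = np.zeros(n) if x0 is None else x0
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q = np.zeros((n, maxiter + 1)); Q[:, 0] = r0 / beta
    H = np.zeros((maxiter + 1, maxiter))
    cs, sn = np.zeros(maxiter), np.zeros(maxiter)
    z = np.zeros(maxiter + 1); z[0] = beta          # rotated right-hand side ||r0|| e1
    res = [beta]
    for k in range(maxiter):
        w = A @ Q[:, k]                             # Arnoldi step (modified Gram-Schmidt)
        for j in range(k + 1):
            H[j, k] = Q[:, j] @ w
            w -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] > 0.0:
            Q[:, k + 1] = w / H[k + 1, k]
        for j in range(k):                          # apply the stored reflections G_2..G_k
            hj, hj1 = H[j, k], H[j + 1, k]
            H[j, k] = cs[j] * hj + sn[j] * hj1
            H[j + 1, k] = sn[j] * hj - cs[j] * hj1
        alpha = np.hypot(H[k, k], H[k + 1, k])      # new reflection eliminating H[k+1, k]
        cs[k], sn[k] = H[k, k] / alpha, H[k + 1, k] / alpha
        H[k, k], H[k + 1, k] = alpha, 0.0
        z[k + 1] = sn[k] * z[k]
        z[k] = cs[k] * z[k]
        res.append(abs(z[k + 1]))                   # current residual norm, for free
        if res[-1] <= tol * beta:
            break
    m = k + 1
    y = np.linalg.solve(np.triu(H[:m, :m]), z[:m])  # back substitution R_k y = z
    return x0 + Q[:, :m] @ y, res
```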

(53)

Cost of GMRES

Cost of the $k$-th step (neglecting the transformation of $\tilde H_k$ to triangular form):
1 matrix-vector product
k+1 scalar products
k+1 _axpy operations

Determining $x^m\in x^0 + \mathcal{K}_m(r^0, A)$ costs:
m+1 matrix-vector products
(m+1)(m+2)+m level-1 operations
m+1 vectors of storage

(54)

Example

[Figure: GMRES for $-\Delta u - 256u_y = 0$, $h = 1/128$; residual norm $\|r_{GMRES}\|_2$.]

(55)

Example

[Figure: GMRES for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$; residual norm $\|r_{GMRES}\|_2$.]

(56)

Example

[Figure: GMRES for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$; residual norm $\|r_{GMRES}\|_2$.]

(57)

Example

[Figure: GMRES for $-\Delta u - 256u_y - 8192u = 0$, $h = 1/128$; residual norm $\|r_{GMRES}\|_2$.]

(58)

Theorem 6.2

Let $A$ be positive real, i.e. the symmetric part $M := 0.5(A + A^T)$ of $A$ is positive definite. Then it holds

$\|r^k\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{k/2}\|r^0\|_2.$

Here $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimal and maximal eigenvalue, respectively.

(59)

Proof

By construction

$\|r^k\|_2 = \min_{q\in\Pi_k,\,q(0)=1}\|q(A)r^0\|_2,$

and for every $\alpha\in\mathbb{R}$ we get with the polynomial $\tilde q(\lambda) = (1 + \alpha\lambda)^k$

$\min_{q\in\Pi_k,\,q(0)=1}\|q(A)\|_2 \le \|\tilde q(A)\|_2 \le \|I + \alpha A\|_2^k.$

From $x^TMx \ge \lambda_{\min}(M)\,x^Tx$ and $x^TA^TAx \le \lambda_{\max}(A^TA)\,x^Tx$ for every $x\neq 0$ we obtain for every $\alpha\in\mathbb{R}$

$\|I + \alpha A\|_2^2 = \max_{\|x\|_2=1}x^T(I + \alpha A)^T(I + \alpha A)x = \max_{\|x\|_2=1}\bigl(1 + \alpha x^T(A + A^T)x + \alpha^2x^TA^TAx\bigr),$

(60)

Proof ct.

and for $\alpha < 0$

$\|I + \alpha A\|_2^2 \le 1 + 2\alpha\lambda_{\min}(M) + \alpha^2\lambda_{\max}(A^TA).$

The quadratic expression on the right-hand side attains its minimum for

$\alpha = -\lambda_{\min}(M)/\lambda_{\max}(A^TA).$

Hence,

$\|I + \alpha A\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{1/2},$

and the assertion follows.

(61)

Theorem 6.3

Let $A = T\Lambda T^{-1}$ be diagonalizable with a nonsingular matrix $T$ and a diagonal matrix $\Lambda$. Then it holds

$\|r^k\|_2 \le \kappa_2(T)\,\|r^0\|_2\cdot\min_{q\in\Pi_k,\,q(0)=1}\ \max_{\lambda\in\sigma(A)}|q(\lambda)|.$

If $A$ is normal, then

$\|r^k\|_2 \le \|r^0\|_2\cdot\min_{q\in\Pi_k,\,q(0)=1}\ \max_{\lambda\in\sigma(A)}|q(\lambda)|.$

(62)

Proof

With $\tilde\Pi_k := \{q\in\Pi_k : q(0) = 1\}$ it holds

$\|r^k\|_2 = \min_{q\in\tilde\Pi_k}\|q(A)r^0\|_2 = \min_{q\in\tilde\Pi_k}\|q(T\Lambda T^{-1})r^0\|_2 = \min_{q\in\tilde\Pi_k}\|Tq(\Lambda)T^{-1}r^0\|_2 \le \|T\|_2\|T^{-1}\|_2\,\min_{q\in\tilde\Pi_k}\|q(\Lambda)\|_2\,\|r^0\|_2 = \kappa_2(T)\,\|r^0\|_2\,\min_{q\in\tilde\Pi_k}\ \max_{\lambda\in\sigma(A)}|q(\lambda)|.$

If A is normal, then A can be transformed to diagonal form unitarily, and κ2(T ) = 1.

(63)

Remark

If $A$ is normal, then (cf. Greenbaum & Gurvits (1994)) the bound on the speed of convergence is sharp, i.e. for $A$ there exists an initial residual $r^0$ such that the inequality holds with '='. In this case the speed of convergence depends on the spectrum of the matrix.

For nonnormal matrices it was shown by Greenbaum, Ptak and Strakos (1996) that for every monotonically decreasing sequence of positive real numbers there exists a matrix and an initial vector such that the residuals of the GMRES method attain exactly the elements of the given sequence. The matrix can moreover be chosen with arbitrarily prescribed eigenvalues.

Hence, for nonnormal matrices the speed of convergence of the GMRES method is completely independent of the spectrum of the system matrix.

(64)

Example

The following example demonstrates that GMRES may make progress only in the $n$-th step, although the spectrum may be very well clustered. Hence, the speed of convergence of GMRES for nonnormal matrices is independent of the spectrum of the system matrix.

Let

$A = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ -a_0 & -a_1 & -a_2 & \dots & -a_{n-1} \end{pmatrix}$

be a Frobenius matrix. Assume that the initial vector is chosen such that $r^0 = e^1$.

Then $Ar^0$ is a multiple of $e^n$, $A^2r^0$ is a linear combination of $e^n$ and $e^{n-1}$, etc.

All $A^kr^0$, $k = 1, \dots, n-1$, are orthogonal to $r^0$. Hence the optimal approximation in $x^0 + \mathcal{K}_k(r^0, A)$ is $x^k = x^0$ for $k = 1, \dots, n-1$, and only in the $n$-th step is the solution reached. By choosing the $a_j$ suitably, any prescribed spectrum can be realized.
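A small NumPy check of this orthogonality argument, using for concreteness the well-clustered spectrum of the example on the next slide (the script itself is our addition):

```python
import numpy as np

n = 10
roots = 1 + 0.1 * np.exp(0.2j * np.pi * np.arange(n))   # well-clustered prescribed spectrum
coeffs = np.real(np.poly(roots))        # p(lam) = lam^n + a_{n-1} lam^{n-1} + ... + a_0
A = np.diag(np.ones(n - 1), 1)          # superdiagonal of ones
A[-1, :] = -coeffs[:0:-1]               # last row: -a_0, -a_1, ..., -a_{n-1}
b = np.zeros(n); b[0] = 1.0             # r^0 = e^1

v = b.copy()
for k in range(1, n):
    v = A @ v                           # v = A^k r^0
    print(k, abs(b @ v))                # all zero: GMRES cannot leave x^0 before step n
```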

(65)

Remark

With this result in mind it is easy to construct an example for which CGNR clearly outperforms GMRES:

For $A\in\mathbb{R}^{10\times10}$ as on the last slide, where the roots of the polynomial

$p(\lambda) = \lambda^{10} + \sum_{j=0}^{9}a_j\lambda^j$

are $\lambda_j = 1 + 0.1\exp(0.2\pi ji)$, $j = 0, \dots, 9$, with $b = e^1 = (1, 0, \dots, 0)^T$ and $x^0 = 0$, GMRES stagnates with $x^1 = \dots = x^9 = 0$ and reaches the solution only in step 10, whereas CGNR reduces the Euclidean norm of the residual rapidly.

(66)

Constructing a problem with given convergence curve

Following Greenbaum, Ptak & Strakos (1996) we construct a problem with a given convergence curve and any prescribed eigenvalues.

Let $x^0 = 0$ and $r^0 = b$, and let $W = \{w^1, \dots, w^n\}$ be an orthonormal basis of the Krylov space $\mathcal{K}_n(b, A)$ such that $\operatorname{span}\{w^1, \dots, w^k\} = \mathcal{K}_k(b, A)$ for $k = 1, \dots, n$.

From the minimization property

$\|r^k\|_2 = \min\{\|r^0 - u\|_2 : u\in A\mathcal{K}_k(r^0, A)\}$

it is clear that $b$ can be expanded as

$b = \sum_{j=1}^{n}\langle b, w^j\rangle w^j,\qquad |\langle b, w^j\rangle| = \sqrt{\|r^{j-1}\|^2 - \|r^j\|^2},\quad r^0 = b,\ \|r^n\| = 0.$

(67)

Constructing a problem with . . . ct.

Given a nonincreasing positive sequence $f(0)\ge f(1)\ge\dots\ge f(n-1) > 0$, define $f(n) = 0$ and

$g(k) = \sqrt{f(k-1)^2 - f(k)^2},\qquad k = 1, 2, \dots, n.$

The conditions $\|b\| = f(0)$, $\|r^j\| = f(j)$, $j = 1, 2, \dots, n-1$, will then be satisfied if the coordinates of $b$ in the basis $W$ are determined by the prescribed sequence of residual norms, i.e. if $|\langle b, w^j\rangle| = g(j)$.

(68)

Constructing a problem with . . . ct.

Let $\Lambda = \{\lambda_1, \dots, \lambda_n\}$, $\lambda_j\neq 0$, $j = 1, \dots, n$, be a set of points in the complex plane. Consider the monic polynomial

$\prod_{j=1}^{n}(\lambda - \lambda_j) =: \lambda^n - \sum_{j=0}^{n-1}\alpha_j\lambda^j.$

Clearly, $\alpha_0\neq 0$.

We construct the matrix $A$ as the representation of a linear operator $A : \mathbb{C}^n\to\mathbb{C}^n$ with respect to the standard basis $e^1, \dots, e^n$.

(69)

Constructing a problem with . . . ct.

Let $\{v^1, \dots, v^n\}$ be any orthonormal basis of $\mathbb{C}^n$, let $V = [v^1, \dots, v^n]$, and let $b$ satisfy

$V^Hb = (g(1), \dots, g(n))^T.$

Note that given $b$ with $\|b\| = f(0)$, $V$ can be chosen accordingly or, alternatively, given $V$, $b$ can be chosen.

Since g(n) 6= 0, the set of vectors B = {b, v1, . . . ,vn−1} is linearly

independent, and also forms a basis of Cn.

Let $B = [b, v^1, \dots, v^{n-1}]$. Then the operator $A$ is determined by the equations

$Ab := v^1$
$Av^j := v^{j+1},\qquad j = 1, \dots, n-2$
$Av^{n-1} := \alpha_0b + \sum_{j=1}^{n-1}\alpha_jv^j.$

(70)

Constructing a problem with . . . ct.

The matrix representation of A with respect to the basis B is

AB=      0 . . . 0 α0 1 0 α1 . .. ... ... 1 αn−1     

which is the companion matrix corresponding to the set of eigenvalues Λ.

Changing the basis from $B$ to the standard basis $\{e^1, \dots, e^n\}$ yields the desired matrix $A = BA_BB^{-1}$.
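A rough NumPy sketch of this construction (our own illustration; $V$ is taken as the identity for simplicity, and only the prescribed spectrum is verified here):

```python
import numpy as np

n = 6
f = np.array([1.0, 0.8, 0.5, 0.3, 0.1, 0.01])            # prescribed residual norms f(0..n-1)
lam = 1.0 + 0.5 * np.exp(2j * np.pi * np.arange(n) / n)  # prescribed nonzero eigenvalues

g = np.sqrt(f**2 - np.r_[f[1:], 0.0]**2)   # g(k) = sqrt(f(k-1)^2 - f(k)^2), k = 1..n
V = np.eye(n)                              # any orthonormal basis; identity for simplicity
b = V @ g                                  # coordinates of b in V are the g(k)

alpha = -np.poly(lam)[:0:-1].real          # monic poly: lam^n - sum_j alpha_j lam^j
B = np.column_stack([b, V[:, :-1]])        # basis {b, v^1, ..., v^{n-1}}
A_B = np.zeros((n, n))
A_B[1:, :-1] = np.eye(n - 1)               # A b = v^1, A v^j = v^{j+1}
A_B[:, -1] = alpha                         # A v^{n-1} = alpha_0 b + sum_j alpha_j v^j
A = B @ A_B @ np.linalg.inv(B)             # change of basis to the standard one

print(np.sort_complex(np.linalg.eigvals(A)))   # recovers the prescribed eigenvalues
```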

(71)

Restarted GMRES

The main disadvantage of GMRES is that the cost and the storage requirements grow linearly with the number of iterations.

The usual way to cope with this problem is restarting. After a given number $m$ of steps the current least squares problem is solved, and the solution $x^m$ is chosen as the initial vector for another $m$ GMRES steps. This method, called GMRES(m), is repeated until convergence.

There are no rules for choosing m. If m is chosen too large the method can become too expensive, and if m is chosen too small the convergence can become very slow, or the method can even stagnate without convergence.
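In scipy's gmres the restart length is exposed directly, so the effect of GMRES(m) can be tried out as below; the tridiagonal model matrix is only a stand-in, not the slides' test problem.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 400
A = sp.diags([-1.0, 2.1, -0.9], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

for m in (2, 4, 8, 16, 32):
    x, info = spla.gmres(A, b, restart=m, maxiter=2000)
    print(m, info, np.linalg.norm(b - A @ x))   # info == 0 signals convergence
```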

(72)

Example

For

$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},\qquad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix},\qquad x^0 = 0,$

$x^1 = x^0$ solves the least squares problem

$\|Ax - b\|_2 = \min!,\qquad x = x^0 + \alpha b,$

so GMRES(1) stagnates.

(73)

Convergence of GMRES(m)

For positive real $A$ we get from the first convergence theorem for GMRES (Theorem 6.2) the residual bounds for GMRES(m)

$\|r^{jm+i}\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{i/2}\|r^{jm}\|_2,\qquad \|r^{jm}\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{m/2}\|r^{(j-1)m}\|_2,$

and therefore

$\|r^k\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{k/2}\|r^0\|_2.$

(74)

Example

[Figure: residual histories of GMRES and GMRES(m), m = 2, 4, 8, 16, 32, 64, for $-\Delta u - 256u_y - \dots$]

(75)

Example

For

$-\Delta u - 128u_y - 163.84u = 0$, $h = 1/128$,

we get

Type        iterations   flops
GMRES       190          1.20e9
GMRES(2)    stationary
GMRES(4)    964          4.39e8
GMRES(8)    656          3.66e8
GMRES(16)   624          5.01e8
GMRES(32)   704          9.24e8
GMRES(64)   704          1.65e9

(76)

Example

[Figure: GMRES and GMRES(m) for $-\Delta u - 256u_y - 819.2u = 0$, $h = 1/128$; residual histories.]

(77)

Preconditioning

Again, GMRES and GMRES(m) can be preconditioned by applying the methods to the system

$M^{-1}Ax = M^{-1}b$ (left preconditioning)

or to

$AM^{-1}y = b$, $y = Mx$ (right preconditioning)

where in both cases M is an approximation to A such that linear systems Mu = c can be solved easily.

In the following examples we used $M = LU$, where [L, U] = luinc(A, τ) is an incomplete LU factorization of A with drop threshold τ.

(78)

Example


gmres without preconditioning, 1.37E9 flops

nofill luinc, 1.01E9+1.12E5 flops

luinc(a,0.1), 3.10E7+4.22E6 flops

luinc(a,0.01), 1.28E7+4.22E6 flops

luinc(a,0.001), 5.26E6+4.22E6 flops

$-\Delta u - 256u_y = 0$, $h = 1/128$, left preconditioning

(79)

Example


gmres without preconditioning, 1.37E9 flops

nofill luinc, 9.72E8+1.12E5 flops

luinc(a,0.1), 2.70E7+4.22E6 flops

luinc(a,0.01), 1.15E7+4.22E6 flops

luinc(a,0.001), 5.27E6+4.22E6 flops

$-\Delta u - 256u_y = 0$, $h = 1/128$, right preconditioning

(80)

Example


gmres without preconditioning, 1.55E9 flops

nofill luinc, 1.20E9+1.12E5 flops

luinc(a,0.1), 3.74E7+4.22E6 flops

luinc(a,0.01), 1.57E7+4.22E6 flops

luinc(a,0.001), 7.34E6+4.22E6 flops

$-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$, left preconditioning

(81)

Example


gmres without preconditioning, 1.55E9 flops

nofill luinc, 1.16E9+1.12E5 flops

luinc(a,0.1), 3.74E7+4.22E6 flops

luinc(a,0.01), 1.57E7+4.22E6 flops

luinc(a,0.001), 6.28E6+4.22E6 flops

$-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$, right preconditioning

(82)

Example


GMRES for $-\Delta u - 256u_y - 8192u = 0$ with left (red) and right (green) preconditioning

1.03e8 flops

2.25e9 flops
