ITERATIVE PROJECTION METHODS FOR
SPARSE LINEAR SYSTEMS AND
EIGENPROBLEMS
CHAPTER 6 : MINIMIZING METHODS
Heinrich Voss voss@tu-harburg.de
Hamburg University of Technology Institute of Numerical Simulation
CGN Methods
There are several generalizations of CG (and CR) for nonsymmetric linear problems.
The most obvious approach is to apply CG to an equivalent linear system which is symmetric and positive definite, e.g.

$A^TAx = A^Tb$

or

$AA^Ty = b, \quad x = A^Ty.$

These methods are called CGN methods, where the letter 'N' indicates that the normal equations are solved.
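As a quick illustration of the idea (not the CGN algorithms proper), one can hand the normal equations to an off-the-shelf CG solver, applying $A^TA$ only through matrix-vector products; the test matrix and sizes below are made up:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned nonsymmetric test matrix
b = rng.standard_normal(n)

# A^T A is symmetric positive definite, so CG applies; A^T A is never formed.
normal_op = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v))
x, info = cg(normal_op, A.T @ b)

res = np.linalg.norm(b - A @ x)  # residual of the original system
```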
CGN methods ct.
For $A^TAx = A^Tb$ the $k$-th CG iterate $x_k$ minimizes the functional

$\|x - x_*\|_{A^TA}^2 = (x - x_*)^TA^TA(x - x_*) = \|A(x - x_*)\|_2^2 = \|r\|_2^2$

in the Krylov space

$x_0 + \mathcal{K}_k(A^Tr_0, A^TA) = x_0 + \operatorname{span}\{A^Tr_0, (A^TA)A^Tr_0, \dots, (A^TA)^{k-1}A^Tr_0\}.$

This method is called CGNR method, where the letter 'R' indicates that the residual is minimized.
The CGNR method was suggested by Hestenes and Stiefel (1952). The matrix $A^TA$ in general is more populated than $A$; however, the algorithm does not require forming it explicitly.
CGNR
The matrix $A^TA$ which appears in the derivation of the CGNR method is usually not as sparse as the original matrix $A$. However, it need not be formed explicitly in the algorithm.
1: $r_0 = b - Ax_0$
2: $d_1 = A^Tr_0$
3: $\alpha_0 = \|d_1\|_2^2$
4: for $k = 1, 2, \dots$ until convergence do
5: $s_k = Ad_k$
6: $\tau_k = \alpha_{k-1}/\|s_k\|_2^2$
7: $x_k = x_{k-1} + \tau_k d_k$
8: $r_k = r_{k-1} - \tau_k s_k$
9: $\beta_k = 1/\alpha_{k-1}$
10: $t_k = A^Tr_k$
11: $\alpha_k = \|t_k\|_2^2$
12: $\beta_k = \alpha_k\beta_k$
13: $d_{k+1} = t_k + \beta_k d_k$
14: end for
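The algorithm translates directly into NumPy; the test matrix below is an arbitrary well-conditioned nonsymmetric example:

```python
import numpy as np

def cgnr(A, b, x0, maxiter=200, tol=1e-12):
    """CGNR: CG for A^T A x = A^T b, without forming A^T A (steps as above)."""
    x = x0.copy()
    r = b - A @ x                    # step 1
    d = A.T @ r                      # step 2
    alpha = d @ d                    # step 3
    for _ in range(maxiter):
        s = A @ d                    # step 5
        tau = alpha / (s @ s)        # step 6
        x += tau * d                 # step 7
        r -= tau * s                 # step 8
        beta = 1.0 / alpha           # step 9
        t = A.T @ r                  # step 10
        alpha = t @ t                # step 11
        beta *= alpha                # step 12
        d = t + beta * d             # step 13
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(1)
n = 40
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = cgnr(A, b, np.zeros(n))
res = np.linalg.norm(b - A @ x)
```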
Cost of CGNR

2 matrix-vector products, 2 scalar products, 3 _axpy operations per step.

Storage requirements: 4 vectors
To investigate the convergence of the CGNR method we note that

$x_k = x_0 + q_{k-1}(A^TA)A^Tr_0$

for some polynomial $q_{k-1}$ of degree $k-1$. Subtracting the exact solution $x_*$ on both sides and observing $r_0 = Ae_0$ one obtains

$e_k = p_k(A^TA)e_0, \quad p_k(z) = 1 - zq_{k-1}(z),$

and since $Ap_k(A^TA) = p_k(AA^T)A$ and $r_k = Ae_k$, multiplication by $A$ gives

$r_k = p_k(AA^T)r_0.$
Convergence of CGNR
Replacing $p_k$ by a shifted and scaled Chebyshev polynomial one gets, similarly to the convergence result for the CG method:
Theorem 6.1
Let $A$ be nonsingular, $\sigma_1$ its largest and $\sigma_n$ its smallest singular value, and let

$R := \frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n} = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}.$

Then it holds

$\|r_k\|_2 \le \frac{2R^k}{1 + R^{2k}}\|r_0\|_2 \le 2\left(\frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}\right)^k\|r_0\|_2.$
Example
Figure: convergence histories of $\|x_{CG}\|_2$ and $\|x_{CGNR}\|_2$ for $-\Delta u = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{CGNR}\|_2$ for $-\Delta u - 256u_y = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{CGNR}\|_2$ for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{CGNR}\|_2$ for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$.

CGNE
For $AA^Ty = b$, $x = A^Ty$, the iterates $y_k$ minimize the functional

$\|y - y_*\|_{AA^T}^2 = (y - y_*)^TAA^T(y - y_*) = \|A^T(y - y_*)\|_2^2 = \|x - x_*\|_2^2$

in the Krylov space $y_0 + \mathcal{K}_k(r_0, AA^T)$, i.e. $x_k = A^Ty_k$ lies in

$x_0 + A^T\mathcal{K}_k(r_0, AA^T) = x_0 + \operatorname{span}\{A^Tr_0, A^T(AA^T)r_0, \dots, A^T(AA^T)^{k-1}r_0\}.$

This method is called CGNE method, where the letter 'E' indicates that the error is minimized.
CGNE
Again, $AA^T$ need not be formed explicitly in the algorithm. Moreover, the method can be implemented in terms of the vectors $x_k = A^Ty_k$ without reference to $y_k$. The cost and storage requirements are the same as for CGNR.
1: $r_0 = b - Ax_0$
2: $d_1 = A^Tr_0$
3: $\alpha_0 = \|r_0\|_2^2$
4: for $k = 1, 2, \dots$ until convergence do
5: $\tau_k = \alpha_{k-1}/\|d_k\|_2^2$
6: $x_k = x_{k-1} + \tau_k d_k$
7: $r_k = r_{k-1} - \tau_k Ad_k$
8: $\beta_k = 1/\alpha_{k-1}$
9: $\alpha_k = \|r_k\|_2^2$
10: $\beta_k = \alpha_k\beta_k$
11: $d_{k+1} = A^Tr_k + \beta_k d_k$
12: end for
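A NumPy transcription of the listed steps, again on a made-up well-conditioned test matrix:

```python
import numpy as np

def cgne(A, b, x0, maxiter=200, tol=1e-12):
    """CGNE: CG for A A^T y = b, carried out in terms of x = A^T y (steps as above)."""
    x = x0.copy()
    r = b - A @ x                    # step 1
    d = A.T @ r                      # step 2
    alpha = r @ r                    # step 3
    for _ in range(maxiter):
        tau = alpha / (d @ d)        # step 5
        x += tau * d                 # step 6
        r -= tau * (A @ d)           # step 7
        beta = 1.0 / alpha           # step 8
        alpha = r @ r                # step 9
        beta *= alpha                # step 10
        d = A.T @ r + beta * d       # step 11
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(2)
n = 40
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = cgne(A, b, np.zeros(n))
res = np.linalg.norm(b - A @ x)
```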
Convergence of CGNE
The $k$-th iterate $x_k = A^Ty_k$ satisfies

$A^{-T}x_k = y_k = y_0 + \tilde q_{k-1}(AA^T)r_0 = A^{-T}x_0 + \tilde q_{k-1}(AA^T)r_0,$

i.e.

$x_k = x_0 + A^T\tilde q_{k-1}(AA^T)r_0.$

Subtracting the solution $x_*$ yields

$e_k = e_0 + A^T\tilde q_{k-1}(AA^T)Ae_0,$

and from

$A^T\tilde q_{k-1}(AA^T)A =: A^T\sum_{j=0}^{k-1}\alpha_j(AA^T)^jA = \sum_{j=0}^{k-1}\alpha_j(A^TA)^{j+1}$

we finally get

$e_k = \tilde p_k(A^TA)e_0, \quad \tilde p_k(0) = 1.$
Convergence of CGNE ct.
Replacing ˜pk by a shifted and scaled Chebyshev polynomial one gets
Theorem 6.1a
Let $A$ be nonsingular, $\sigma_1$ its largest and $\sigma_n$ its smallest singular value, and let

$R := \frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n} = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}.$

Then it holds

$\|e_k\|_2 \le \frac{2R^k}{1 + R^{2k}}\|e_0\|_2 \le 2\left(\frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}\right)^k\|e_0\|_2.$
CGNR, CGNE
The bounds given in Theorems 6.1 and 6.1a illustrate the main drawback of the CGN methods: the contraction factor is governed by $\kappa_2(A)$ instead of $\sqrt{\kappa_2(A)}$. In particular, if $A$ is symmetric and positive definite, the speed of convergence is slowed down to that of the steepest descent method.
However, there are special situations where CGNR and CGNE are the optimal extensions of CG (cf. Nachtigal, Reddy & Trefethen (1992) and Freund, Golub & Nachtigal (1992)), but usually they are not satisfactory generalizations of the CG method.
As for the CG method, the convergence properties of CGNE and CGNR can be improved considerably by preconditioning. Since the matrices $A^TA$ and $AA^T$ are much more populated than the sparse matrix $A$, the incomplete Cholesky factorization cannot be applied to them directly.
Preconditioning by incomplete LQ factorization
Let $A = LQ$ be the LQ factorization of $A$, where $L$ is a lower triangular matrix and $Q$ is an orthogonal matrix. Then

$AA^T = (LQ)(LQ)^T = LQQ^TL^T = LL^T,$

and $LL^T$ is the exact Cholesky factorization of $AA^T$, which is obtained without forming the matrix $AA^T$.

To obtain a sparse approximation, Saad (1988) proposed the incomplete LQ factorization of $A$. Analogously to the incomplete Cholesky factorization, most of the elements of $L$ and $Q$ are dropped during the factorization process according to some strategy to avoid excessive fill-in.
The LQ factorization can be calculated by applying the Gram-Schmidt process to the rows of $A$. This approach is known to be unstable if it is used to orthogonalize a large number of vectors. Notice, however, that the rows of $Q$ remain very sparse in the incomplete LQ factorization, and therefore any given row of $A$ will be orthogonal to most of the previous rows of $Q$. Hence, within the incomplete LQ factorization the Gram-Schmidt process is much less susceptible to numerical difficulties.
Incomplete LQ factorization ct.
Details of the implementation of the incomplete LQ factorization and its use as a preconditioner in the CGNE method can be found in Saad (1988) and in Saad's book Iterative Methods for Sparse Linear Systems.

Similarly, the CGNR method can be preconditioned using the incomplete LQ factorization of $A^T$.
Preconditioning by SSOR
Consider the $j$-th elementary step of the Gauß-Seidel method for the linear system

$AA^Ty = b,$

where the $j$-th component of $y$ is modified such that the $j$-th equation of the system is satisfied.

Let $y_{new} = y_{old} + \delta_je^j$. Then $\delta_j \in \mathbb{R}$ is chosen such that

$0 = (e^j)^T(b - AA^Ty_{new}) = (e^j)^T(b - AA^Ty_{old} - \delta_jAA^Te^j)$

$\Rightarrow\quad \delta_j = \frac{b_j - (A^Te^j)^TA^Ty_{old}}{\|A^Te^j\|_2^2} = \frac{b_j - a'_jx_{old}}{\|(a'_j)^T\|_2^2},$

where $a'_j$ denotes the $j$-th row of $A$, and $x_{old} = A^Ty_{old}$. The update of $x$ is given by

$x_{new} = A^Ty_{new} = A^T(y_{old} + \delta_je^j) = x_{old} + \delta_j(a'_j)^T.$
SSOR for $AA^Ty = b$, $x = A^Ty$

1: Let $y$ be an initial approximation
2: $x = A^Ty$
3: for $j = 1, 2, \dots, n, n, n-1, \dots, 1$ do
4: $\delta = \omega(b_j - a'_jx)/\|a'_j\|_2^2$
5: $y_j = y_j + \delta$
6: $x = x + \delta(a'_j)^T$
7: end for

Comments
[1.] Since $A$ is nonsingular, the matrix $AA^T$ is positive definite, and therefore the SSOR method converges for every $\omega \in (0, 2)$ if the for-statement is repeated. A serious difficulty is to determine the optimum relaxation parameter.

[2.] The matrix $AA^T$ can be dense or in general much less sparse than $A$. However, the cost of the implementation above depends only on the nonzero structure of $A$.
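One sweep of this method can be sketched in NumPy as follows; the dense test matrix is made up, only rows of $A$ are accessed, and $AA^T$ is never formed:

```python
import numpy as np

def ssor_sweep(A, b, y, x, omega=1.0):
    """One SSOR sweep for A A^T y = b, with x = A^T y kept up to date."""
    n = A.shape[0]
    for j in list(range(n)) + list(range(n - 1, -1, -1)):  # forward, then backward
        aj = A[j, :]                                       # j-th row of A
        delta = omega * (b[j] - aj @ x) / (aj @ aj)
        y[j] += delta
        x += delta * aj
    return y, x

rng = np.random.default_rng(3)
n = 8
A = rng.standard_normal((n, n)) + 20 * np.eye(n)  # made-up well-conditioned matrix
b = rng.standard_normal(n)
y, x = np.zeros(n), np.zeros(n)
for _ in range(200):
    y, x = ssor_sweep(A, b, y, x)
res = np.linalg.norm(b - A @ (A.T @ y))
```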
CGNE-SSOR
In the following implementation of CGNE with preconditioning by SSOR, the call "SSOR-CGNE(r, ω, z)" indicates that one sweep of the SSOR method with parameter $\omega$ and initial approximation $z = 0$ is executed for the linear system $AA^Tz = r$.
1: r0=b − Ax0
2: SSOR-CGNE(r0, ω,z0)
3: d1=ATz0
4: α0= (z0)Tr0
5: for k = 1, 2, . . . until convergence do
6: $\tau_k = \alpha_{k-1}/\|d_k\|_2^2$
7: $x_k = x_{k-1} + \tau_kd_k$
8: $r_k = r_{k-1} - \tau_kAd_k$
9: $\beta_k = 1/\alpha_{k-1}$
10: SSOR-CGNE($r_k, \omega, z_k$)
11: $\alpha_k = (z_k)^Tr_k$
12: $\beta_k = \alpha_k\beta_k$
13: $d_{k+1} = A^Tz_k + \beta_kd_k$
14: end for
SSOR for $A^TAx = A^Tb$

Similarly, if $a_j$ denotes the $j$-th column of $A$, the SSOR method for $A^TAx = A^Tb$ with relaxation parameter $\omega$ reads:

1: Let $x$ be an initial approximation
2: $z = Ax$
3: $c = A^Tb$
4: for $j = 1, 2, \dots, n, n, n-1, \dots, 1$ do
5: $\delta = \omega(c_j - z^Ta_j)/\|a_j\|_2^2$
6: $x_j = x_j + \delta$
7: $z = z + \delta a_j$
8: end for

Comment
The SSOR method above can be applied to nonsquare matrices $A \in \mathbb{R}^{m\times n}$ where $m > n$ and $\operatorname{rank}(A) = n$. In this case it converges to the unique solution of the least squares problem $\|Ax - b\|_2 = \min!$.
CGNR-SSOR
In the following implementation of preconditioned CGNR, the call "SSOR-CGNR(r, ω, z)" indicates that one sweep of SSOR with parameter $\omega$ and initial approximation $z = 0$ is executed for the linear system $A^TAz = r$.
1: r0=AT(b − Ax0)
2: SSOR-CGNR(r0, ω,z0)
3: d1=z0
4: α0= (z0)Tr0
5: for k = 1, 2, . . . until convergence do
6: $s_k = Ad_k$
7: $\tau_k = \alpha_{k-1}/\|s_k\|_2^2$
8: $x_k = x_{k-1} + \tau_kd_k$
9: $r_k = r_{k-1} - \tau_kA^Ts_k$
10: $\beta_k = 1/\alpha_{k-1}$
11: SSOR-CGNR($r_k, \omega, z_k$)
12: $\alpha_k = (z_k)^Tr_k$
13: $\beta_k = \alpha_k\beta_k$
14: $d_{k+1} = z_k + \beta_kd_k$
15: end for
Preconditioning
CGNR (and CGNE) can also be preconditioned without taking advantage of the special structure of the system matrix, by applying the methods to the system

$M^{-1}Ax = M^{-1}b$ (left preconditioning)

or to

$AM^{-1}y = b, \quad y = Mx$ (right preconditioning),

where in both cases $M$ is an approximation to $A$ such that linear systems $Mu = c$ can be solved easily.
In the following examples we used $M = LU$ where

[L, U] = luinc(A,'0') is the no-fill incomplete LU decomposition,

or

[L, U] = luinc(A, τ) is an incomplete LU factorization of A with threshold τ.
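In SciPy, the analogue of MATLAB's threshold-based luinc is scipy.sparse.linalg.spilu. The sketch below (grid size and drop tolerance are illustrative choices) builds the incomplete factors for a 2D Poisson model matrix and uses them as a left preconditioner inside SciPy's gmres, which is discussed later in this chapter:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spilu

m = 30                                             # illustrative grid size
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
I = sp.eye(m)
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()        # 2D Poisson model matrix
b = np.ones(m * m)

ilu = spilu(A, drop_tol=1e-3)                      # threshold-based incomplete LU
M = LinearOperator(A.shape, matvec=ilu.solve)      # action of (LU)^{-1}

x, info = gmres(A, b, M=M)                         # preconditioned GMRES
res = np.linalg.norm(b - A @ x)
```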
Example
CGNR with left preconditioning for $-\Delta u - 256u_y = 0$, $h = 1/128$ (convergence histories):

no precond.: 5.877E8 flops
nofill luinc: 1.108E8 + 1.115E5 flops
luinc(a,0.1): 3.414E7 + 4.4225E6 flops
luinc(a,0.01): 1.555E7 + 4.4225E6 flops
luinc(a,0.001): 6.298E6 + 4.4225E6 flops
Example
CGNR with left preconditioning for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$ (convergence histories):

no precond.: 6.982E8 flops
nofill luinc: 1.273E8 + 1.115E5 flops
luinc(a,0.1): 4.346E7 + 4.4225E6 flops
luinc(a,0.01): 1.755E7 + 4.4225E6 flops
luinc(a,0.001): 6.298E6 + 4.4225E6 flops
Example
CGNR with left preconditioning for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$ (convergence histories):

no precond.: 2.711E9 flops
nofill luinc: 5.014E8 + 1.115E5 flops
luinc(a,0.1): 1.934E8 + 4.4225E6 flops
luinc(a,0.01): 2.144E8 + 4.4225E6 flops
luinc(a,0.001): 5.665E7 + 4.4225E6 flops
Sparse approximate inverses
For symmetric problems the preconditioner had to be symmetric as well. For nonsymmetric problems this is no longer necessary. We just have to find a matrix $C = M^{-1}$ such that $\|AC - I\| \approx 0$ (right preconditioning) or $\|CA - I\| \approx 0$ (left preconditioning). Then applying the preconditioner simply amounts to a matrix-vector multiplication.

The Frobenius norm is particularly interesting to choose since

$\|AC - I\|_F^2 = \sum_{j=1}^{n}\|(AC - I)e^j\|_2^2.$
SPAI ct.
Hence, if $c_j$ denotes the $j$-th column of $C$, then we have to solve the $n$ independent least squares problems

$\|Ac_j - e^j\|_2 = \min!, \quad j = 1, \dots, n.$

Solving these problems exactly is equivalent to determining the inverse of $A$ and would be much too expensive. Therefore, for every $j = 1, \dots, n$ we choose a sparsity pattern $G(j)$, denote by $\hat c_j$ a vector which contains only the elements in $G(j)$, and by $\hat A_j$ a matrix whose columns are the columns of $A$ corresponding to $G(j)$ and whose rows $i$ are such that $a_{ik} \ne 0$ for some $k \in G(j)$. Since $A$ is sparse, $\hat A_j$ will be a small matrix, and the resulting least squares problems

$\|\hat A_j\hat c_j - \hat e^j\|_2 = \min! \quad (*)$

are cheap to solve.
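A minimal static-pattern sketch of this idea (dense arrays, for clarity): take $G(j)$ to be the nonzero pattern of $A$'s $j$-th column and solve each small problem with numpy.linalg.lstsq; the tridiagonal test matrix is made up:

```python
import numpy as np

def spai_static(A):
    """Approximate inverse with fixed pattern G(j) = pattern of A's j-th column."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for j in range(n):
        G = np.nonzero(A[:, j])[0]                         # sparsity pattern G(j)
        rows = np.nonzero(np.abs(A[:, G]).sum(axis=1))[0]  # rows hit by columns in G(j)
        Ahat = A[np.ix_(rows, G)]                          # small matrix \hat A_j
        ehat = (rows == j).astype(float)                   # e^j restricted to those rows
        chat, *_ = np.linalg.lstsq(Ahat, ehat, rcond=None)
        C[G, j] = chat
    return C

n = 20
A = np.diag(4.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
C = spai_static(A)
err = np.linalg.norm(A @ C - np.eye(n), "fro")
```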
Choice of sparsity pattern
Grote & Huckle (1997) proposed an incremental method. Starting from a set of indices $G_0(j)$ (usually corresponding to a diagonal $C$ or to the structure of $A$), they solve the least squares problem (*) and then, iteratively, enlarge the set of indices.

Let

$\|\hat A_j^{(p)}\hat c_{j,p} - e^j\|_2 = \min! \quad (**)$

be the least squares problem in the $p$-th step of the iteration, let $c_j \in \mathbb{R}^n$ be the solution of (**) (i.e. $\hat c_{j,p}$ augmented by zero components), and denote by $r = Ac_j - e^j$ the corresponding residual.
Choice of sparsity pattern ct.
Let $L = \{k : r_k \ne 0\}$, and for $\ell \in L$ let $N_\ell = \{k : a_{\ell k} \ne 0,\ k \notin G_p(j)\}$. Then the candidates for indices of new nonzero components in the solution vector are chosen in

$\bigcup_{\ell\in L}N_\ell.$

For $k$ in this set we consider the problem

$\|r + \mu_kAe^k\|_2 = \|A(c_j + \mu_ke^k) - e^j\|_2 = \min! \quad\Rightarrow\quad \mu_k = -\frac{r^TAe^k}{\|Ae^k\|_2^2}.$

It can be shown that the square of the norm of the new residual is

$\|r\|_2^2 - \frac{(r^TAe^k)^2}{\|Ae^k\|_2^2},$

and that there exist indices such that $r^TAe^k \ne 0$. From these indices one selects those giving the largest reduction of the residual.
Choice of sparsity pattern ct.
This iteration step is repeated until the norm of the residual is smaller than a prescribed bound, or until we have reached the maximum storage that is allowed for column $j$.

Note that putting new indices into $G_p(j)$ will add columns and rows to the matrix $\hat A_j^{(p)}$. Methods exist to update the QR factorization of the enlarged matrix efficiently.

Approximate inverse preconditioning is an active field of research. It is particularly interesting for parallel machines. Gould & Scott (1995) described some improvements of the method. A symmetric approximate inverse can be constructed along the same lines.
Nonsymmetric problems
Desirable are generalizations of CG (or CR) methods which have similar properties as CG without squaring the condition number of the system matrix. Outstanding properties of the CG method:

- based on a short recurrence (low storage requirements)
- minimizes the error on an ascending sequence of Krylov spaces

Theorem of Faber & Manteuffel (1984)

Methods with these two properties exist only for a very limited class of matrices. For 3-term recurrences, only for matrices

$A = e^{i\theta}(H + \sigma I), \quad H = H^H,\ \theta \in \mathbb{R},\ \sigma \in \mathbb{C}.$
Generalized Conjugate Residuals
Eisenstat, Elman & Schultz (1983) studied the Generalized Conjugate Residual Method (GCR method for short), which is a direct generalization of the CR method (and of MINRES) to nonsymmetric systems of linear equations.

By the Gram-Schmidt process a basis $\{d_1, \dots, d_k\}$ of the Krylov space $\mathcal{K}_k(r_0, A)$ is constructed which is orthogonal with respect to the scalar product

$\langle x, y\rangle_{A^TA} = x^TA^TAy.$

Since $d_1, \dots, d_k$ are $A^TA$-conjugate directions, the minimization problem

$\|x - x_*\|_{A^TA} = \min!, \quad x \in x_0 + \mathcal{K}_k(r_0, A),$

is decoupled into a sequence of line searches along $d_1, \dots, d_k$.

Hence, the following algorithm generates iterates $x_k$ which minimize the Euclidean norm of the residual $\|b - Ax\|_2 = \|x - x_*\|_{A^TA}$ on $x_0 + \mathcal{K}_k(r_0, A)$.
GCR method
1: Choose an initial guess x0
2: r0=b − Ax0
3: d1=r0
4: s1=Ad1
5: for k = 1, 2, . . . until convergence do
6: $\alpha_k = \|s_k\|_2^2$
7: $\tau_k = (r_{k-1})^Ts_k/\alpha_k$
8: $x_k = x_{k-1} + \tau_kd_k$
9: $r_k = r_{k-1} - \tau_ks_k$
10: $d_{k+1} = r_k$
11: $s_{k+1} = Ad_{k+1}$
12: for $j = 1 : k$ do
13: $\gamma_{k,j} = (s_{k+1})^Ts_j/\alpha_j$
14: $d_{k+1} = d_{k+1} - \gamma_{k,j}d_j$
15: $s_{k+1} = s_{k+1} - \gamma_{k,j}s_j$
16: end for
17: end for
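The method can be sketched in NumPy as follows; the positive real test matrix (skew part plus a multiple of the identity) is a made-up choice that rules out breakdown:

```python
import numpy as np

def gcr(A, b, x0, tol=1e-12, maxiter=None):
    """GCR as listed above; assumes A is positive real, so no breakdown occurs."""
    x = x0.copy()
    r = b - A @ x
    d, s, alpha = [r.copy()], [A @ r], []
    maxiter = maxiter or len(b)
    for k in range(maxiter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        alpha.append(s[k] @ s[k])
        tau = (r @ s[k]) / alpha[k]
        x += tau * d[k]
        r -= tau * s[k]
        dn, sn = r.copy(), A @ r
        for j in range(k + 1):            # make d_{k+1} conjugate w.r.t. A^T A
            gamma = (sn @ s[j]) / alpha[j]
            dn -= gamma * d[j]
            sn -= gamma * s[j]
        d.append(dn)
        s.append(sn)
    return x

rng = np.random.default_rng(4)
n = 30
S = rng.standard_normal((n, n))
A = S - S.T + 5.0 * np.eye(n)    # positive real: symmetric part is 5 I
b = rng.standard_normal(n)
x = gcr(A, b, np.zeros(n))
res = np.linalg.norm(b - A @ x)
```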
Generalized Conjugate Residuals
Without further assumptions on the matrix $A$, the GCR method may break down because $d_{k+1}$ becomes the null vector. Consider for instance

$A = \begin{pmatrix} 0 & 1\\ -1 & 0\end{pmatrix}, \quad b = \begin{pmatrix}1\\1\end{pmatrix}, \quad x_0 = \begin{pmatrix}0\\0\end{pmatrix}.$

Then at step $k = 1$ one obtains $(r_0)^TAd_1 = 0$, hence $\tau_1 = 0$, $r_1 = r_0$, and $d_2 = r_1 - \gamma_{1,1}d_1 = 0$, so that $\tau_2$ is undefined.
It can be shown (cf. Elman (1982)) that the GCR method does not break down if $A$ is a positive real matrix, i.e. if

$M = \frac{1}{2}(A + A^T)$

is positive definite. Hence, for this type of matrices the GCR method (in exact arithmetic) yields the solution in a finite number of steps.
Comments on GCR
Axelsson (1980) proposed the following method for nonsymmetric linear systems: For $x_0$, $r_0$, and $d_1$ as in the GCR method let

$x_k = x_{k-1} + \sum_{j=1}^{k}\eta_{k,j}d_j, \quad r_k = b - Ax_k,$

$\beta_k = (Ar_k)^TAd_k/\|Ad_k\|_2^2, \quad d_{k+1} = r_k - \beta_kd_k,$

where the scalars $\eta_{k,j}$ are chosen such that $\|r_k\|_2$ is minimal, i.e. from the linear system of equations

$(d_j)^TA^T(Ay - r_{k-1}) = 0, \quad j = 1, \dots, k, \quad y = \sum_{j=1}^{k}\eta_{k,j}d_j.$
Comments on GCR
The work and storage requirements of the GCR method are prohibitively high. Vinsome (1976) proposed a method called ORTHOMIN which is a truncated GCR method and which is considerably less expensive per iteration step. For an integer $m \ll n$ the search direction $d_{k+1}$, $k > m$, is forced to be $A^TA$-orthogonal only to the last $m$ directions $d_k, \dots, d_{k-m+1}$, but not to all preceding directions $d_j$. This method usually is referred to as ORTHOMIN(m).

Eisenstat, Elman & Schultz (1983) alternatively suggested to restart the GCR method every $m + 1$ steps, i.e. the current iterate $x_{j(m+1)}$, $j = 0, 1, 2, \dots$, is chosen as a new initial guess for the GCR method. The storage requirements of this method, called GCR(m), are the same as for ORTHOMIN(m), since $m$ preceding directions have to be stored, but the work per iteration is smaller, since on average only $m/2$ vectors $d_j$ are used to compute the current iterate.
ORTHODIR
Young & Jea (1980) studied methods which are closely related to ORTHOMIN. In ORTHOMIN they replaced the minimization problem in the $k$-th step by

$\|x_k - x_*\|_{ZA} \le \|x - x_*\|_{ZA} \quad\text{for every } x \in x_0 + \mathcal{K}_k(r_0, A),$

where $Z \in \mathbb{R}^{n\times n}$ is an arbitrary matrix such that $ZA$ is positive definite.

Moreover, they considered an algorithm called ORTHODIR (or ORTHODIR(m) for the truncated form), which differs from the GCR method only by the choice of the search directions. These are given by

$\tilde d_{k+1} = A\tilde d_k - \sum_{j=1}^{k}\tilde\gamma_{k,j}\tilde d_j,$

where the scalars $\tilde\gamma_{k,j}$ are chosen such that the search directions $\tilde d_j$ are $ZA$-conjugate.

Notice that ORTHODIR can only break down with $\tilde d_{k+1} = 0$ if the solution $x_*$ lies in $x_0 + \mathcal{K}_k(r_0, A)$.
ORTHORES
Young & Jea (1980) introduced a method called ORTHORES (and ORTHORES(m) for the truncated variant), where the iterates are constructed by the recurrence relation

$x_k = \gamma_kr_{k-1} + \sum_{j=0}^{k-1}\delta_{k,j}x_j,$

and $\gamma_k$ and $\delta_{k,j}$ are determined such that the residual vectors $r_j$ are semi-orthogonal with respect to $Z$, i.e.

$(Zr_k)^Tr_j = 0 \quad\text{for } j < k.$

For the truncated version ORTHORES(m) this requirement must be fulfilled for $k - m \le j < k$.

For the choice of $Z$, Young and Jea considered among others

$Z = A^T, \quad Z = A^T(A + A^T), \quad Z = A^TM.$
Generalized Minimal Residual Method (GMRES)
A different approach to the iterative solution of general linear systems, which generalizes the MINRES method and which is mathematically equivalent to the GCR method, was introduced by Saad & Schultz (1986). Differently from the GCR method, this method, which is called Generalized Minimal RESidual Method (GMRES for short), does not break down for general nonsingular matrices, including those which are not positive real.

GMRES constructs a sequence

$x_k \in x_0 + \mathcal{K}_k(r_0, A)$

of vectors such that the Euclidean norm of the residual becomes minimal. To this end an orthonormal basis $q_1, \dots, q_k$ of $\mathcal{K}_k(r_0, A)$ is determined using Arnoldi's method (i.e. the modified Gram-Schmidt process), such that the problem

$\|r_k\|_2 = \|b - A(x_0 + Q_ky)\|_2 = \min!, \quad y \in \mathbb{R}^k,$

becomes a small least squares problem.
Arnoldi method
1: Let $q_1$ be an initial vector with $\|q_1\|_2 = 1$.
2: for $k = 1, 2, \dots$ do
3: $q_{k+1} = Aq_k$
4: for $j = 1 : k$ do
5: $h_{jk} = (q_{k+1})^Tq_j$
6: $q_{k+1} = q_{k+1} - h_{jk}q_j$
7: end for
8: $h_{k+1,k} = \|q_{k+1}\|_2$
9: $q_{k+1} = q_{k+1}/h_{k+1,k}$
10: end for
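The Arnoldi process can be sketched compactly in NumPy (a fixed number of steps on a made-up random matrix); the two defining relations, orthonormality of the $q_j$ and $AQ_k = Q_{k+1}\tilde H_k$, are easy to check numerically:

```python
import numpy as np

def arnoldi(A, q1, m):
    """m steps of Arnoldi with modified Gram-Schmidt.
    Returns Q with m+1 orthonormal columns and the (m+1) x m Hessenberg H tilde."""
    n = len(q1)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for k in range(m):
        w = A @ Q[:, k]
        for j in range(k + 1):
            H[j, k] = w @ Q[:, j]
            w -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H

rng = np.random.default_rng(5)
n, m = 30, 10
A = rng.standard_normal((n, n))
Q, H = arnoldi(A, rng.standard_normal(n), m)
```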
GMRES
With $Q_k = (q_1, \dots, q_k)$ and

$\tilde H_k = (h_{j\ell}) \in \mathbb{R}^{(k+1)\times k} \quad\text{such that } h_{j\ell} = 0 \text{ for } j - \ell > 1,$

it holds

$AQ_k = Q_{k+1}\tilde H_k,$

and for $x = x_0 + Q_ky$, $r = b - Ax$ and $q_1 = r_0/\|r_0\|_2$

$\|r\|_2 = \|b - A(x_0 + Q_ky)\|_2 = \|r_0 - AQ_ky\|_2 = \|r_0 - Q_{k+1}\tilde H_ky\|_2 = \|Q_{k+1}(\|r_0\|_2e_1 - \tilde H_ky)\|_2 = \big\|\,\|r_0\|_2e_1 - \tilde H_ky\,\big\|_2.$

Hence, the problem to minimize the Euclidean norm of the residual on a Krylov space yields a least squares problem whose system matrix $\tilde H_k$ has upper Hessenberg form.
GMRES ct.
If $U_k \in \mathbb{R}^{(k+1)\times(k+1)}$ is an orthogonal matrix such that

$U_k\tilde H_k = \begin{pmatrix}R_k\\ 0^T\end{pmatrix}, \quad R_k \text{ upper triangular},$

and if $u^k$ is the first column of $U_k$, then it holds

$\big\|\,\|r_0\|_2e_1 - \tilde H_ky\,\big\|_2^2 = \big\|U_k(\|r_0\|_2e_1 - \tilde H_ky)\big\|_2^2 = \left\|\,\|r_0\|_2u^k - \begin{pmatrix}R_k\\ 0^T\end{pmatrix}y\,\right\|_2^2 = \big\|\,\|r_0\|_2(u_1^k, \dots, u_k^k)^T - R_ky\,\big\|_2^2 + \|r_0\|_2^2\cdot|u_{k+1}^k|^2.$
GMRES ct.
Hence, the least squares problem is equivalent to the linear system

$R_ky = \|r_0\|_2(u_1^k, \dots, u_k^k)^T,$

where $R_k$ is an upper triangular matrix, and the norm of the residual satisfies

$\|b - Ax_k\|_2 = \|r_0\|_2\cdot|u_{k+1}^k|,$

i.e. the accuracy of the method can be monitored without explicitly forming $x_k$ or $y_k$.

If the residual norm is sufficiently small, GMRES is terminated, and only after this final step of the Arnoldi method the triangular system is solved and $x_k = x_0 + Q_ky_k$ is computed.
GMRES ct.
Since $\tilde H_k$ is an upper Hessenberg matrix, the matrix $U_k$ can be calculated efficiently using Givens reflections. Let

$G_j = \begin{pmatrix}I_{j-2} & 0 & 0\\ 0^T & c_j & s_j\\ 0^T & s_j & -c_j\end{pmatrix} \in \mathbb{R}^{j\times j}, \quad c_j^2 + s_j^2 = 1,$

denote a Givens reflection acting on the $(j-1)$-st and $j$-th component of a vector.

Let $U_1 = G_2$ be such that

$U_1\tilde H_1 = \begin{pmatrix}c_2 & s_2\\ s_2 & -c_2\end{pmatrix}\begin{pmatrix}h_{11}\\ h_{21}\end{pmatrix} = \begin{pmatrix}\sqrt{h_{11}^2 + h_{21}^2}\\ 0\end{pmatrix} =: \begin{pmatrix}\hat h_{11}\\ 0\end{pmatrix}.$
GMRES ct.
Assume that the orthogonal matrix

$U_{k-1} = G_k\begin{pmatrix}G_{k-1} & 0\\ 0^T & 1\end{pmatrix}\cdots\begin{pmatrix}G_2 & O\\ O & I_{k-1}\end{pmatrix} =: G_k\begin{pmatrix}U_{k-2} & 0\\ 0^T & 1\end{pmatrix}$

is chosen such that

$U_{k-1}\tilde H_{k-1} = \begin{pmatrix}\hat h_{11} & \hat h_{12} & \dots & \hat h_{1,k-1}\\ 0 & \hat h_{22} & \dots & \hat h_{2,k-1}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \hat h_{k-1,k-1}\\ 0 & 0 & \dots & 0\end{pmatrix} =: \begin{pmatrix}R_{k-1}\\ 0^T\end{pmatrix} \in \mathbb{R}^{k\times(k-1)}.$
GMRES ct.
Since the multiplication of a matrix by $G_{k+1}$ from the left leaves the first $k-1$ rows invariant, one obtains the upper triangular matrix in the $k$-th step of the method by

$U_k\tilde H_k = G_{k+1}\begin{pmatrix}U_{k-1} & 0\\ 0^T & 1\end{pmatrix}\begin{pmatrix}\tilde H_{k-1} & h^k\\ 0^T & h_{k+1,k}\end{pmatrix} = \begin{pmatrix}I_{k-1} & 0 & 0\\ 0^T & c_{k+1} & s_{k+1}\\ 0^T & s_{k+1} & -c_{k+1}\end{pmatrix}\begin{pmatrix}R_{k-1} & \hat h^k\\ 0^T & g_k\\ 0^T & h_{k+1,k}\end{pmatrix} = \begin{pmatrix}R_k\\ 0^T\end{pmatrix},$

where

$U_{k-1}h^k = U_{k-1}\begin{pmatrix}h_{1k}\\ \vdots\\ h_{k-1,k}\\ h_{kk}\end{pmatrix} =: \begin{pmatrix}\hat h_{1k}\\ \vdots\\ \hat h_{k-1,k}\\ g_k\end{pmatrix} =: \begin{pmatrix}\hat h^k\\ g_k\end{pmatrix}, \dots$
GMRES ct.
$\dots, \quad s_{k+1} := \frac{h_{k+1,k}}{\sqrt{g_k^2 + h_{k+1,k}^2}}, \quad c_{k+1} := \frac{g_k}{\sqrt{g_k^2 + h_{k+1,k}^2}}.$

In a similar manner as in the SYMMLQ method, the elements $\hat h_{j\ell}$ can be obtained along with the Arnoldi process, and the reflections can be applied to the right-hand side $\|r_0\|_2e_1$ as the iteration proceeds.

Observe, however, that differently from the SYMMLQ method, the nontrivial entries $c_j$ and $s_j$ of $G_j$ have to be stored, since they are needed in the transformation of the $k$-th column $h^k$ of $\tilde H_k$.

The diagonal elements of the triangular matrix $R_k$ are given by

$\hat h_{jj} = \sqrt{g_j^2 + h_{j+1,j}^2}.$

Hence, $R_k$ is nonsingular if the Arnoldi process constructs $k$ orthogonal vectors $q_j$. GMRES breaks down if and only if $h_{j+1,j} = 0$ for some $j$, and in this case the solution is already contained in $x_0 + \mathcal{K}_j(r_0, A)$.
GMRES algorithm
$q_1 = b - Ax_0$; $z_1 = \|q_1\|_2$; $q_1 = q_1/z_1$; $k = 1$
while $|z_k| >$ tol do
  $q_{k+1} = Aq_k$
  for $i = 1 : k$ do
    $h_{ik} = (q_i)^Tq_{k+1}$
    $q_{k+1} = q_{k+1} - h_{ik}q_i$
  end for
  $h_{k+1,k} = \|q_{k+1}\|_2$
  $q_{k+1} = q_{k+1}/h_{k+1,k}$
  for $i = 1 : k - 1$ do
    $\begin{pmatrix}h_{i,k}\\ h_{i+1,k}\end{pmatrix} = \begin{pmatrix}c_{i+1} & s_{i+1}\\ s_{i+1} & -c_{i+1}\end{pmatrix}\begin{pmatrix}h_{i,k}\\ h_{i+1,k}\end{pmatrix}$
  end for
  $\alpha = \sqrt{h_{kk}^2 + h_{k+1,k}^2}$
  $s_{k+1} = h_{k+1,k}/\alpha$; $c_{k+1} = h_{kk}/\alpha$
  $h_{kk} = \alpha$
  $z_{k+1} = s_{k+1}z_k$
  $z_k = c_{k+1}z_k$
  $k = k + 1$
end while
$y_k = z_k/h_{kk}$
for $i = k - 1 : -1 : 1$ do
  $y_i = (z_i - \sum_{j=i+1}^{k}h_{ij}y_j)/h_{ii}$
end for
$x_k = x_0 + \sum_{j=1}^{k}y_jq_j$

Cost of GMRES
Cost of the $k$-th step (neglecting the transformation of $\tilde H_k$ to triangular form):
1 matrix-vector product, $k+1$ scalar products, $k+1$ _axpy operations.

Determining $x_m \in x_0 + \mathcal{K}_m(r_0, A)$ costs:
$m+1$ matrix-vector products, $(m+1)(m+2)+m$ level-1 operations, $m+1$ vectors to store.
Example
Figure: convergence history of $\|r_{GMRES}\|_2$ for $-\Delta u - 256u_y = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{GMRES}\|_2$ for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{GMRES}\|_2$ for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{GMRES}\|_2$ for $-\Delta u - 256u_y - 8192u = 0$, $h = 1/128$.

Theorem 6.2
Let $A$ be positive real, i.e. the symmetric part $M := 0.5(A + A^T)$ of $A$ is positive definite. Then it holds

$\|r_k\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{k/2}\|r_0\|_2.$

Here $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimal and maximal eigenvalue, respectively.
Proof
By construction

$\|r_k\|_2 = \min_{q\in\Pi_k,\,q(0)=1}\|q(A)r_0\|_2,$

and for every $\alpha \in \mathbb{R}$ we get with the polynomial $\tilde q(\lambda) = (1 + \alpha\lambda)^k$

$\min_{q\in\Pi_k,\,q(0)=1}\|q(A)\|_2 \le \|\tilde q(A)\|_2 \le \|I + \alpha A\|_2^k.$

From $x^TMx \ge \lambda_{\min}(M)x^Tx$ and $x^TA^TAx \le \lambda_{\max}(A^TA)x^Tx$ for every $x \ne 0$ we obtain for every $\alpha \in \mathbb{R}$

$\|I + \alpha A\|_2^2 = \max_{\|x\|_2=1}x^T(I + \alpha A)^T(I + \alpha A)x = \max_{\|x\|_2=1}\big(1 + \alpha x^T(A + A^T)x + \alpha^2x^TA^TAx\big)$
Proof ct.
and for $\alpha < 0$

$\|I + \alpha A\|_2^2 \le 1 + 2\alpha\lambda_{\min}(M) + \alpha^2\lambda_{\max}(A^TA).$

The quadratic expression on the right-hand side attains its minimum for

$\alpha = -\lambda_{\min}(M)/\lambda_{\max}(A^TA).$

Hence,

$\|I + \alpha A\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{1/2},$

which proves the theorem.
Theorem 6.3
Let $A = T\Lambda T^{-1}$ be diagonalizable with a regular matrix $T$ and a diagonal matrix $\Lambda$. Then it holds

$\|r_k\|_2 \le \kappa_2(T)\|r_0\|_2\cdot\min_{q\in\Pi_k,\,q(0)=1}\max_{\lambda\in\sigma(A)}|q(\lambda)|.$

If $A$ is normal, then

$\|r_k\|_2 \le \|r_0\|_2\cdot\min_{q\in\Pi_k,\,q(0)=1}\max_{\lambda\in\sigma(A)}|q(\lambda)|.$
Proof
With $\tilde\Pi_k := \{q \in \Pi_k : q(0) = 1\}$ it holds

$\|r_k\|_2 = \min_{q\in\tilde\Pi_k}\|q(A)r_0\|_2 = \min_{q\in\tilde\Pi_k}\|q(T\Lambda T^{-1})r_0\|_2 = \min_{q\in\tilde\Pi_k}\|Tq(\Lambda)T^{-1}r_0\|_2 \le \|T\|_2\cdot\|T^{-1}\|_2\,\min_{q\in\tilde\Pi_k}\|q(\Lambda)\|_2\,\|r_0\|_2 = \kappa_2(T)\|r_0\|_2\min_{q\in\tilde\Pi_k}\max_{\lambda\in\sigma(A)}|q(\lambda)|.$

If $A$ is normal, then $A$ can be transformed to diagonal form unitarily, and $\kappa_2(T) = 1$.
Remark
If $A$ is normal, then (cf. Greenbaum & Gurvits (1994)) the bound on the speed of convergence is sharp, i.e. for $A$ there exists an initial residual $r_0$ such that the inequality holds with '='. In this case the speed of convergence depends on the spectrum of the matrix.

For nonnormal matrices it was shown by Greenbaum, Ptak and Strakos (1996) that for every monotonically decreasing sequence of positive real numbers there exists a matrix and an initial vector such that the residuals in the GMRES method attain the elements of the given sequence. The matrix can be determined such that it has predetermined eigenvalues.

Hence, the speed of convergence of the GMRES method is completely independent of the spectrum of the system matrix.
Example
The following example demonstrates that GMRES may make progress only in the $n$-th step, although the spectrum may be clustered very well. Hence, the speed of convergence of GMRES for nonnormal matrices is independent of the spectrum of the system matrix.

Let

$A = \begin{pmatrix}0 & 1 & 0 & \dots & 0\\ 0 & 0 & 1 & \dots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \dots & 1\\ -a_0 & -a_1 & -a_2 & \dots & -a_{n-1}\end{pmatrix}$

be a Frobenius (companion) matrix. Assume that the initial vector is chosen such that $r_0 = e_1$.

Then $Ar_0$ is a multiple of $e_n$, $A^2r_0$ is a linear combination of $e_n$ and $e_{n-1}$, etc. All $A^kr_0$, $k = 1, \dots, n-1$, are orthogonal to $r_0$. Hence, the optimal approximation in $x_0 + \mathcal{K}_k(r_0, A)$ is $x_k = x_0$, $k = 1, \dots, n-1$, and only in the $n$-th step is the solution reached. By choosing the $a_j$ suitably, every spectrum can be prescribed.
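The stagnation is easy to verify numerically: for $k < n$ the space $A\mathcal{K}_k(r_0, A)$ is orthogonal to $r_0 = e_1$, so the optimal residual norm stays $1$ (the coefficients $a_j$ below are arbitrary made-up values):

```python
import numpy as np

n = 8
a = np.linspace(1.0, 2.0, n)          # arbitrary coefficients a_0, ..., a_{n-1}
A = np.diag(np.ones(n - 1), 1)        # Frobenius (companion) matrix
A[-1, :] = -a
r0 = np.zeros(n)
r0[0] = 1.0                           # r_0 = e_1 (x_0 = 0, b = e_1)

# columns A r_0, A^2 r_0, ..., A^n r_0; the first k of them span A K_k(r_0, A)
AK = np.column_stack([np.linalg.matrix_power(A, j + 1) @ r0 for j in range(n)])
res = []
for k in range(1, n + 1):
    y, *_ = np.linalg.lstsq(AK[:, :k], r0, rcond=None)   # min ||r_0 - A K_k y||
    res.append(np.linalg.norm(r0 - AK[:, :k] @ y))
```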
Remark
With this result in mind it is easy to construct an example for which CGNR clearly outperforms GMRES: For $A \in \mathbb{R}^{10\times 10}$ as on the last slide, where the roots of the polynomial

$p(\lambda) = \lambda^{10} + \sum_{j=0}^{9}a_j\lambda^j$

are

$\lambda_j = 1 + 0.1\exp(0.2\pi ji), \quad j = 0, \dots, 9,$

$b = e_1 = (1, 0, \dots, 0)^T$ and $x_0 = 0$, GMRES stagnates with $x_1 = \dots = x_9 = 0$, $x_{10} = e_2$, whereas CGNR reduces the Euclidean norm of the residual substantially from the very first steps.
Constructing a problem with given convergence curve
Following Greenbaum, Ptak & Strakos (1996) we construct a problem with a given convergence curve and any prescribed eigenvalues.

Let $x_0 = 0$ and $r_0 = b$, and let $W = \{w_1, \dots, w_n\}$ be an orthonormal basis of the Krylov space $\mathcal{K}_n(b, A)$ such that $\operatorname{span}\{w_1, \dots, w_k\} = \mathcal{K}_k(b, A)$ for $k = 1, \dots, n$.
From the minimization property

$\|r_k\|_2 = \min\{\|r_0 - u\| : u \in A\mathcal{K}_k(r_0, A)\}$

it is clear that $b$ can be expanded as

$b = \sum_{j=1}^{n}\langle b, w_j\rangle w_j, \quad\text{where } |\langle b, w_j\rangle| = \sqrt{\|r_{j-1}\|^2 - \|r_j\|^2},\ r_0 = b,\ \|r_n\| = 0.$
Constructing a problem with . . . ct.
Given a nonincreasing positive sequence $f(0) \ge f(1) \ge \dots \ge f(n-1) > 0$, define $f(n) = 0$ and

$g(k) = \sqrt{f(k-1)^2 - f(k)^2}, \quad k = 1, 2, \dots, n.$
The conditions $\|b\| = f(0)$, $\|r_j\| = f(j)$, $j = 1, 2, \dots, n-1$, will then be satisfied if the coordinates of $b$ in the basis $W$ are determined by the prescribed sequence of residual norms.
Constructing a problem with . . . ct.
Let $\Lambda = \{\lambda_1, \dots, \lambda_n\}$, $\lambda_j \ne 0$, $j = 1, \dots, n$, be a set of points in the complex plane. Consider the monic polynomial

$\prod_{j=1}^{n}(\lambda - \lambda_j) =: \lambda^n - \sum_{j=0}^{n-1}\alpha_j\lambda^j.$

Clearly, $\alpha_0 \ne 0$.
We construct the matrix $A$ as the representation of a linear operator $A : \mathbb{C}^n \to \mathbb{C}^n$ with respect to the standard basis $e_1, \dots, e_n$.
Constructing a problem with . . . ct.
Let $V = \{v_1, \dots, v_n\}$ be any orthonormal basis of $\mathbb{C}^n$, let $V = [v_1, \dots, v_n]$, and let $b$ satisfy

$V^Hb = (g(1), \dots, g(n))^T.$
Note that given $b$ with $\|b\| = f(0)$, the basis $V$ can be chosen accordingly; alternatively, given $V$, the vector $b$ can be chosen.
Since $g(n) \ne 0$, the set of vectors $B = \{b, v_1, \dots, v_{n-1}\}$ is linearly independent, and also forms a basis of $\mathbb{C}^n$.

Let $B = [b, v_1, \dots, v_{n-1}]$. Then the operator $A$ is determined by the equations

$Ab := v_1,$
$Av_j := v_{j+1}, \quad j = 1, \dots, n-2,$
$Av_{n-1} := \alpha_0b + \sum_{j=1}^{n-1}\alpha_jv_j.$
Constructing a problem with . . . ct.
The matrix representation of $A$ with respect to the basis $B$ is

$A_B = \begin{pmatrix}0 & \dots & 0 & \alpha_0\\ 1 & & & \alpha_1\\ & \ddots & & \vdots\\ & & 1 & \alpha_{n-1}\end{pmatrix},$

which is the companion matrix corresponding to the set of eigenvalues $\Lambda$. Changing the basis from $B$ to the standard basis $\{e_1, \dots, e_n\}$ yields the desired matrix $A = BA_BB^{-1}$.
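The construction can be carried out numerically. Below, the convergence curve f and the spectrum are arbitrary made-up choices, V is taken to be the identity basis, and the GMRES residual norms are evaluated by least squares over $A\mathcal{K}_k(b, A)$:

```python
import numpy as np

n = 5
f = np.array([1.0, 0.5, 0.25, 0.125, 0.0625, 0.0])   # prescribed ||r_k||, f(n) = 0
lam = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # prescribed eigenvalues

g = np.sqrt(f[:-1] ** 2 - f[1:] ** 2)                # g(1), ..., g(n)
b = g.copy()                                          # V = I, so V^H b = g

coeffs = np.poly(lam)                                # monic polynomial with roots lam
alpha = -coeffs[1:][::-1]                            # lambda^n = sum_j alpha_j lambda^j

AB = np.zeros((n, n))                                # companion matrix w.r.t. basis B
AB[1:, :-1] = np.eye(n - 1)                          # A b = v_1, A v_j = v_{j+1}
AB[:, -1] = alpha                                    # A v_{n-1} = alpha_0 b + ...
B = np.column_stack([b, np.eye(n)[:, :-1]])          # B = [b, v_1, ..., v_{n-1}]
A = B @ AB @ np.linalg.inv(B)

r_norms = [np.linalg.norm(b)]                        # GMRES residuals via least squares
K = b[:, None]
for k in range(1, n + 1):
    AK = A @ K
    y, *_ = np.linalg.lstsq(AK, b, rcond=None)
    r_norms.append(np.linalg.norm(b - AK @ y))
    K = np.column_stack([K, np.linalg.matrix_power(A, k) @ b])
```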
Restarted GMRES
The main disadvantage of GMRES is that the cost and the storage requirements grow linearly with the number of iterations.
The usual way to cope with this problem is restarting. After a given number $m$ of steps, the current least squares problem is solved, and the solution $x_m$ is chosen to be the initial vector for another $m$ GMRES steps. This method, called GMRES(m), is repeated until convergence.
There are no rules for choosing m. If m is chosen too large the method can become too expensive, and if m is chosen too small the convergence can become very slow, or the method can even stagnate without convergence.
Example
For

$A = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}, \quad b = \begin{pmatrix}1\\ 1\end{pmatrix}, \quad x_0 = 0,$

$x_1 = x_0$ solves the least squares problem

$\|Ax - b\|_2 = \min!, \quad x = x_0 + \alpha b,$

and hence GMRES(1) stagnates.
Convergence of GMRES(m)
For positive real $A$ we get from the first convergence theorem for GMRES(m) the residual bounds

$\|r_{jm+i}\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{i/2}\|r_{jm}\|_2, \quad \|r_{jm}\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{m/2}\|r_{(j-1)m}\|_2,$

and therefore

$\|r_k\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{k/2}\|r_0\|_2.$
Example
Figure: convergence histories of GMRES and GMRES(m), m = 2, 4, 8, 16, 32, 64, for $-\Delta u - 256u\dots$, $h = 1/128$.
Example
For

$-\Delta u - 128u_y - 163.84u = 0, \quad h = 1/128,$

we get

Type: iterations / flops
GMRES: 190 / 1.20e9
GMRES(2): stationary
GMRES(4): 964 / 4.39e8
GMRES(8): 656 / 3.66e8
GMRES(16): 624 / 5.01e8
GMRES(32): 704 / 9.24e8
GMRES(64): 704 / 1.65e9
Example
Figure: convergence histories of GMRES and GMRES(m) for $-\Delta u - 256u_y - 819.2u = 0$, $h = 1/128$.
Preconditioning
Again, GMRES and GMRES(m) can be preconditioned by applying the methods to the system
M−1Ax = M−1b left preconditioning
or to
AM−1y = b, y = Mx right preconditioning
where in both cases M is an approximation to A such that linear systems Mu = c can be solved easily.
In the following examples we used M = LU where [L, U] = luinc(A, τ ) is an incomplete LU factorization of A with threshold τ .
Example
GMRES with left preconditioning for $-\Delta u - 256u_y = 0$, $h = 1/128$ (convergence histories):

gmres without preconditioning: 1.37E9 flops
nofill luinc: 1.01E9 + 1.12E5 flops
luinc(a,0.1): 3.10E7 + 4.22E6 flops
luinc(a,0.01): 1.28E7 + 4.22E6 flops
luinc(a,0.001): 5.26E6 + 4.22E6 flops
Example
GMRES with right preconditioning for $-\Delta u - 256u_y = 0$, $h = 1/128$ (convergence histories):

gmres without preconditioning: 1.37E9 flops
nofill luinc: 9.72E8 + 1.12E5 flops
luinc(a,0.1): 2.70E7 + 4.22E6 flops
luinc(a,0.01): 1.15E7 + 4.22E6 flops
luinc(a,0.001): 5.27E6 + 4.22E6 flops
Example
GMRES with left preconditioning for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$ (convergence histories):

gmres without preconditioning: 1.55E9 flops
nofill luinc: 1.20E9 + 1.12E5 flops
luinc(a,0.1): 3.74E7 + 4.22E6 flops
luinc(a,0.01): 1.57E7 + 4.22E6 flops
luinc(a,0.001): 7.34E6 + 4.22E6 flops
Example
GMRES with right preconditioning for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$ (convergence histories):

gmres without preconditioning: 1.55E9 flops
nofill luinc: 1.16E9 + 1.12E5 flops
luinc(a,0.1): 3.74E7 + 4.22E6 flops
luinc(a,0.01): 1.57E7 + 4.22E6 flops
luinc(a,0.001): 6.28E6 + 4.22E6 flops
Example
Figure: GMRES for $-\Delta u - 256u_y - 8192u = 0$, $h = 1/128$, with left (red) and right (green) preconditioning; 1.03e8 and 2.25e9 flops.