ITERATIVE PROJECTION METHODS FOR
SPARSE LINEAR SYSTEMS AND
EIGENPROBLEMS
CHAPTER 6 : MINIMIZING METHODS
Heinrich Voss voss@tu-harburg.de
Hamburg University of Technology Institute of Numerical Simulation
CGN Methods
There are several generalizations of CG (and CR) for nonsymmetric linear problems.
The most obvious approach is to apply CG to an equivalent linear system which is symmetric and positive definite, e.g.

$A^TAx = A^Tb$

or

$AA^Ty = b, \quad x = A^Ty.$

These methods are called CGN methods, where the letter 'N' indicates that the normal equations are solved.
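As a quick illustration of the idea (not the CGN algorithms proper), one can hand the normal equations to an off-the-shelf CG solver, applying $A^TA$ only through matrix-vector products; the test matrix and sizes below are made up:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned nonsymmetric test matrix
b = rng.standard_normal(n)

# A^T A is symmetric positive definite, so CG applies; A^T A is never formed.
normal_op = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v))
x, info = cg(normal_op, A.T @ b)

res = np.linalg.norm(b - A @ x)  # residual of the original system
```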
CGN methods ct.
For $A^TAx = A^Tb$ the $k$-th CG iterate $x_k$ minimizes the functional

$\|x - x_*\|_{A^TA}^2 = (x - x_*)^TA^TA(x - x_*) = \|A(x - x_*)\|_2^2 = \|r\|_2^2$

in the Krylov space

$x_0 + \mathcal{K}_k(A^Tr_0, A^TA) = x_0 + \operatorname{span}\{A^Tr_0, (A^TA)A^Tr_0, \dots, (A^TA)^{k-1}A^Tr_0\}.$

This method is called CGNR method, where the letter 'R' indicates that the residual is minimized.
The CGNR method was suggested by Hestenes and Stiefel (1952). The matrix $A^TA$ in general is more populated than $A$; however, the algorithm does not require forming it explicitly.
CGNR
The matrix $A^TA$ which appears in the derivation of the CGNR method is usually not as sparse as the original matrix $A$. However, it need not be formed explicitly in the algorithm.
1: $r_0 = b - Ax_0$
2: $d_1 = A^Tr_0$
3: $\alpha_0 = \|d_1\|_2^2$
4: for $k = 1, 2, \dots$ until convergence do
5: $s_k = Ad_k$
6: $\tau_k = \alpha_{k-1}/\|s_k\|_2^2$
7: $x_k = x_{k-1} + \tau_k d_k$
8: $r_k = r_{k-1} - \tau_k s_k$
9: $\beta_k = 1/\alpha_{k-1}$
10: $t_k = A^Tr_k$
11: $\alpha_k = \|t_k\|_2^2$
12: $\beta_k = \alpha_k\beta_k$
13: $d_{k+1} = t_k + \beta_k d_k$
14: end for
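The algorithm translates directly into NumPy; the test matrix below is an arbitrary well-conditioned nonsymmetric example:

```python
import numpy as np

def cgnr(A, b, x0, maxiter=200, tol=1e-12):
    """CGNR: CG for A^T A x = A^T b, without forming A^T A (steps as above)."""
    x = x0.copy()
    r = b - A @ x                    # step 1
    d = A.T @ r                      # step 2
    alpha = d @ d                    # step 3
    for _ in range(maxiter):
        s = A @ d                    # step 5
        tau = alpha / (s @ s)        # step 6
        x += tau * d                 # step 7
        r -= tau * s                 # step 8
        beta = 1.0 / alpha           # step 9
        t = A.T @ r                  # step 10
        alpha = t @ t                # step 11
        beta *= alpha                # step 12
        d = t + beta * d             # step 13
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(1)
n = 40
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = cgnr(A, b, np.zeros(n))
res = np.linalg.norm(b - A @ x)
```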
Cost of CGNR

2 matrix-vector products, 2 scalar products, 3 _axpy operations per step.

Storage requirements: 4 vectors
To investigate the convergence of the CGNR method we note that

$x_k = x_0 + q_{k-1}(A^TA)A^Tr_0$

for some polynomial $q_{k-1}$ of degree $k-1$. Subtracting the exact solution $x_*$ on both sides and observing $r_0 = Ae_0$ one obtains

$e_k = p_k(A^TA)e_0, \quad p_k(z) = 1 - zq_{k-1}(z),$

and since $Ap_k(A^TA) = p_k(AA^T)A$ and $r_k = Ae_k$, multiplication by $A$ gives

$r_k = p_k(AA^T)r_0.$
Convergence of CGNR
Replacing $p_k$ by a shifted and scaled Chebyshev polynomial one gets, similarly to the convergence result for the CG method:
Theorem 6.1
Let $A$ be nonsingular, $\sigma_1$ its largest and $\sigma_n$ its smallest singular value, and let

$R := \frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n} = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}.$

Then it holds

$\|r_k\|_2 \le \frac{2R^k}{1 + R^{2k}}\|r_0\|_2 \le 2\left(\frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}\right)^k\|r_0\|_2.$
Example
Figure: convergence histories of $\|x_{CG}\|_2$ and $\|x_{CGNR}\|_2$ for $-\Delta u = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{CGNR}\|_2$ for $-\Delta u - 256u_y = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{CGNR}\|_2$ for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{CGNR}\|_2$ for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$.

CGNE
For $AA^Ty = b$, $x = A^Ty$, the iterates $y_k$ minimize the functional

$\|y - y_*\|_{AA^T}^2 = (y - y_*)^TAA^T(y - y_*) = \|A^T(y - y_*)\|_2^2 = \|x - x_*\|_2^2$

in the Krylov space $y_0 + \mathcal{K}_k(r_0, AA^T)$, i.e. $x_k = A^Ty_k$ lies in

$x_0 + A^T\mathcal{K}_k(r_0, AA^T) = x_0 + \operatorname{span}\{A^Tr_0, A^T(AA^T)r_0, \dots, A^T(AA^T)^{k-1}r_0\}.$

This method is called CGNE method, where the letter 'E' indicates that the error is minimized.
CGNE
Again, $AA^T$ need not be formed explicitly in the algorithm. Moreover, the method can be implemented in terms of the vectors $x_k = A^Ty_k$ without reference to $y_k$. The cost and storage requirements are the same as for CGNR.
1: $r_0 = b - Ax_0$
2: $d_1 = A^Tr_0$
3: $\alpha_0 = \|r_0\|_2^2$
4: for $k = 1, 2, \dots$ until convergence do
5: $\tau_k = \alpha_{k-1}/\|d_k\|_2^2$
6: $x_k = x_{k-1} + \tau_k d_k$
7: $r_k = r_{k-1} - \tau_k Ad_k$
8: $\beta_k = 1/\alpha_{k-1}$
9: $\alpha_k = \|r_k\|_2^2$
10: $\beta_k = \alpha_k\beta_k$
11: $d_{k+1} = A^Tr_k + \beta_k d_k$
12: end for
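A NumPy transcription of the listed steps, again on a made-up well-conditioned test matrix:

```python
import numpy as np

def cgne(A, b, x0, maxiter=200, tol=1e-12):
    """CGNE: CG for A A^T y = b, carried out in terms of x = A^T y (steps as above)."""
    x = x0.copy()
    r = b - A @ x                    # step 1
    d = A.T @ r                      # step 2
    alpha = r @ r                    # step 3
    for _ in range(maxiter):
        tau = alpha / (d @ d)        # step 5
        x += tau * d                 # step 6
        r -= tau * (A @ d)           # step 7
        beta = 1.0 / alpha           # step 8
        alpha = r @ r                # step 9
        beta *= alpha                # step 10
        d = A.T @ r + beta * d       # step 11
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(2)
n = 40
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = cgne(A, b, np.zeros(n))
res = np.linalg.norm(b - A @ x)
```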
Convergence of CGNE
The $k$-th iterate $x_k = A^Ty_k$ satisfies

$A^{-T}x_k = y_k = y_0 + \tilde q_{k-1}(AA^T)r_0 = A^{-T}x_0 + \tilde q_{k-1}(AA^T)r_0,$

i.e.

$x_k = x_0 + A^T\tilde q_{k-1}(AA^T)r_0.$

Subtracting the solution $x_*$ yields

$e_k = e_0 + A^T\tilde q_{k-1}(AA^T)Ae_0,$

and from

$A^T\tilde q_{k-1}(AA^T)A =: A^T\sum_{j=0}^{k-1}\alpha_j(AA^T)^jA = \sum_{j=0}^{k-1}\alpha_j(A^TA)^{j+1}$

we finally get

$e_k = \tilde p_k(A^TA)e_0, \quad \tilde p_k(0) = 1.$
Convergence of CGNE ct.
Replacing ˜pk by a shifted and scaled Chebyshev polynomial one gets
Theorem 6.1a
Let $A$ be nonsingular, $\sigma_1$ its largest and $\sigma_n$ its smallest singular value, and let

$R := \frac{\sigma_1 - \sigma_n}{\sigma_1 + \sigma_n} = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}.$

Then it holds

$\|e_k\|_2 \le \frac{2R^k}{1 + R^{2k}}\|e_0\|_2 \le 2\left(\frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}\right)^k\|e_0\|_2.$
CGNR, CGNE
The bounds given in Theorems 6.1 and 6.1a illustrate the main drawback of the CGN methods: the contraction factor is governed by $\kappa_2(A)$ instead of $\sqrt{\kappa_2(A)}$. In particular, if $A$ is symmetric and positive definite, the speed of convergence is slowed down to that of the steepest descent method.
However, there are special situations where CGNR and CGNE are the optimal extensions of CG (cf. Nachtigal, Reddy & Trefethen (1992) and Freund, Golub & Nachtigal (1992)), but usually they are not satisfactory generalizations of the CG method.
As for the CG method, the convergence properties of CGNE and CGNR can be improved considerably by preconditioning. Since the matrices $A^TA$ and $AA^T$ are much more populated than the sparse matrix $A$, the incomplete Cholesky factorization cannot be applied to them directly.
Preconditioning by incomplete LQ factorization
Let $A = LQ$ be the LQ factorization of $A$, where $L$ is a lower triangular matrix and $Q$ is an orthogonal matrix. Then

$AA^T = (LQ)(LQ)^T = LQQ^TL^T = LL^T,$

and $LL^T$ is the exact Cholesky factorization of $AA^T$, which is obtained without forming the matrix $AA^T$.

To obtain a sparse approximation, Saad (1988) proposed the incomplete LQ factorization of $A$. Analogously to the incomplete Cholesky factorization, most of the elements of $L$ and $Q$ are dropped during the factorization process according to some strategy to avoid excessive fill-in.
The LQ factorization can be calculated by applying the Gram-Schmidt process to the rows of $A$. This approach is known to be unstable if it is used to orthogonalize a large number of vectors. Notice, however, that the rows of $Q$ remain very sparse in the incomplete LQ factorization, and therefore any given row of $A$ will be orthogonal to most of the previous rows of $Q$. Hence, within the incomplete LQ factorization the Gram-Schmidt process is much less susceptible to numerical difficulties.
Incomplete LQ factorization ct.
Details of the implementation of the incomplete LQ factorization and its use as a preconditioner in the CGNE method can be found in Saad (1988) and in Saad's book Iterative Methods for Sparse Linear Systems.

Similarly, the CGNR method can be preconditioned using the incomplete LQ factorization of $A^T$.
Preconditioning by SSOR
Consider the $j$-th elementary step of the Gauß-Seidel method for the linear system

$AA^Ty = b,$

where the $j$-th component of $y$ is modified such that the $j$-th equation of the system is satisfied.

Let $y_{new} = y_{old} + \delta_je^j$. Then $\delta_j \in \mathbb{R}$ is chosen such that

$0 = (e^j)^T(b - AA^Ty_{new}) = (e^j)^T(b - AA^Ty_{old} - \delta_jAA^Te^j)$

$\Rightarrow\quad \delta_j = \frac{b_j - (A^Te^j)^TA^Ty_{old}}{\|A^Te^j\|_2^2} = \frac{b_j - a'_jx_{old}}{\|(a'_j)^T\|_2^2},$

where $a'_j$ denotes the $j$-th row of $A$, and $x_{old} = A^Ty_{old}$. The update of $x$ is given by

$x_{new} = A^Ty_{new} = A^T(y_{old} + \delta_je^j) = x_{old} + \delta_j(a'_j)^T.$
SSOR for $AA^Ty = b$, $x = A^Ty$

1: Let $y$ be an initial approximation
2: $x = A^Ty$
3: for $j = 1, 2, \dots, n, n, n-1, \dots, 1$ do
4: $\delta = \omega(b_j - a'_jx)/\|a'_j\|_2^2$
5: $y_j = y_j + \delta$
6: $x = x + \delta(a'_j)^T$
7: end for

Comments
[1.] Since $A$ is nonsingular, the matrix $AA^T$ is positive definite, and therefore the SSOR method converges for every $\omega \in (0, 2)$ if the for-statement is repeated. A serious difficulty is to determine the optimum relaxation parameter.

[2.] The matrix $AA^T$ can be dense or in general much less sparse than $A$. However, the cost of the implementation above depends only on the nonzero structure of $A$.
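One sweep of this method can be sketched in NumPy as follows; the dense test matrix is made up, only rows of $A$ are accessed, and $AA^T$ is never formed:

```python
import numpy as np

def ssor_sweep(A, b, y, x, omega=1.0):
    """One SSOR sweep for A A^T y = b, with x = A^T y kept up to date."""
    n = A.shape[0]
    for j in list(range(n)) + list(range(n - 1, -1, -1)):  # forward, then backward
        aj = A[j, :]                                       # j-th row of A
        delta = omega * (b[j] - aj @ x) / (aj @ aj)
        y[j] += delta
        x += delta * aj
    return y, x

rng = np.random.default_rng(3)
n = 8
A = rng.standard_normal((n, n)) + 20 * np.eye(n)  # made-up well-conditioned matrix
b = rng.standard_normal(n)
y, x = np.zeros(n), np.zeros(n)
for _ in range(200):
    y, x = ssor_sweep(A, b, y, x)
res = np.linalg.norm(b - A @ (A.T @ y))
```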
CGNE-SSOR
In the following implementation of CGNE with preconditioning by SSOR, the call "SSOR-CGNE(r, ω, z)" indicates that one sweep of the SSOR method with parameter $\omega$ and initial approximation $z = 0$ is executed for the linear system $AA^Tz = r$.
1: r0=b − Ax0
2: SSOR-CGNE(r0, ω,z0)
3: d1=ATz0
4: α0= (z0)Tr0
5: for k = 1, 2, . . . until convergence do
6: $\tau_k = \alpha_{k-1}/\|d_k\|_2^2$
7: $x_k = x_{k-1} + \tau_kd_k$
8: $r_k = r_{k-1} - \tau_kAd_k$
9: $\beta_k = 1/\alpha_{k-1}$
10: SSOR-CGNE($r_k, \omega, z_k$)
11: $\alpha_k = (z_k)^Tr_k$
12: $\beta_k = \alpha_k\beta_k$
13: $d_{k+1} = A^Tz_k + \beta_kd_k$
14: end for
SSOR for $A^TAx = A^Tb$

Similarly, if $a_j$ denotes the $j$-th column of $A$, the SSOR method for $A^TAx = A^Tb$ with relaxation parameter $\omega$ reads:

1: Let $x$ be an initial approximation
2: $z = Ax$
3: $c = A^Tb$
4: for $j = 1, 2, \dots, n, n, n-1, \dots, 1$ do
5: $\delta = \omega(c_j - z^Ta_j)/\|a_j\|_2^2$
6: $x_j = x_j + \delta$
7: $z = z + \delta a_j$
8: end for

Comment
The SSOR method above can be applied to nonsquare matrices $A \in \mathbb{R}^{m\times n}$ where $m > n$ and $\operatorname{rank}(A) = n$. In this case it converges to the unique solution of the least squares problem $\|Ax - b\|_2 = \min!$.
CGNR-SSOR
In the following implementation of preconditioned CGNR, the call "SSOR-CGNR(r, ω, z)" indicates that one sweep of SSOR with parameter $\omega$ and initial approximation $z = 0$ is executed for the linear system $A^TAz = r$.
1: r0=AT(b − Ax0)
2: SSOR-CGNR(r0, ω,z0)
3: d1=z0
4: α0= (z0)Tr0
5: for k = 1, 2, . . . until convergence do
6: $s_k = Ad_k$
7: $\tau_k = \alpha_{k-1}/\|s_k\|_2^2$
8: $x_k = x_{k-1} + \tau_kd_k$
9: $r_k = r_{k-1} - \tau_kA^Ts_k$
10: $\beta_k = 1/\alpha_{k-1}$
11: SSOR-CGNR($r_k, \omega, z_k$)
12: $\alpha_k = (z_k)^Tr_k$
13: $\beta_k = \alpha_k\beta_k$
14: $d_{k+1} = z_k + \beta_kd_k$
15: end for
Preconditioning
CGNR (and CGNE) can also be preconditioned without taking advantage of the special structure of the system matrix, by applying the methods to the system

$M^{-1}Ax = M^{-1}b$ (left preconditioning)

or to

$AM^{-1}y = b, \quad y = Mx$ (right preconditioning),

where in both cases $M$ is an approximation to $A$ such that linear systems $Mu = c$ can be solved easily.
In the following examples we used $M = LU$ where

[L, U] = luinc(A,'0') is the no-fill incomplete LU decomposition,

or

[L, U] = luinc(A, τ) is an incomplete LU factorization of A with threshold τ.
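In SciPy, the analogue of MATLAB's threshold-based luinc is scipy.sparse.linalg.spilu. The sketch below (grid size and drop tolerance are illustrative choices) builds the incomplete factors for a 2D Poisson model matrix and uses them as a left preconditioner inside SciPy's gmres, which is discussed later in this chapter:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spilu

m = 30                                             # illustrative grid size
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
I = sp.eye(m)
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()        # 2D Poisson model matrix
b = np.ones(m * m)

ilu = spilu(A, drop_tol=1e-3)                      # threshold-based incomplete LU
M = LinearOperator(A.shape, matvec=ilu.solve)      # action of (LU)^{-1}

x, info = gmres(A, b, M=M)                         # preconditioned GMRES
res = np.linalg.norm(b - A @ x)
```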
Example
CGNR with left preconditioning for $-\Delta u - 256u_y = 0$, $h = 1/128$ (convergence histories):

no precond.: 5.877E8 flops
nofill luinc: 1.108E8 + 1.115E5 flops
luinc(a,0.1): 3.414E7 + 4.4225E6 flops
luinc(a,0.01): 1.555E7 + 4.4225E6 flops
luinc(a,0.001): 6.298E6 + 4.4225E6 flops
Example
CGNR with left preconditioning for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$ (convergence histories):

no precond.: 6.982E8 flops
nofill luinc: 1.273E8 + 1.115E5 flops
luinc(a,0.1): 4.346E7 + 4.4225E6 flops
luinc(a,0.01): 1.755E7 + 4.4225E6 flops
luinc(a,0.001): 6.298E6 + 4.4225E6 flops
Example
CGNR with left preconditioning for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$ (convergence histories):

no precond.: 2.711E9 flops
nofill luinc: 5.014E8 + 1.115E5 flops
luinc(a,0.1): 1.934E8 + 4.4225E6 flops
luinc(a,0.01): 2.144E8 + 4.4225E6 flops
luinc(a,0.001): 5.665E7 + 4.4225E6 flops
Sparse approximate inverses
For symmetric problems the preconditioner had to be symmetric as well. For nonsymmetric problems this is no longer necessary. We just have to find a matrix $C = M^{-1}$ such that $\|AC - I\| \approx 0$ (right preconditioning) or $\|CA - I\| \approx 0$ (left preconditioning). Then applying the preconditioner simply amounts to a matrix-vector multiplication.

The Frobenius norm is particularly interesting to choose since

$\|AC - I\|_F^2 = \sum_{j=1}^{n}\|(AC - I)e^j\|_2^2.$
SPAI ct.
Hence, if $c_j$ denotes the $j$-th column of $C$, then we have to solve the $n$ independent least squares problems

$\|Ac_j - e^j\|_2 = \min!, \quad j = 1, \dots, n.$

Solving these problems exactly is equivalent to determining the inverse of $A$ and would be much too expensive. Therefore, for every $j = 1, \dots, n$ we choose a sparsity pattern $G(j)$, denote by $\hat c_j$ a vector which contains only the elements in $G(j)$, and by $\hat A_j$ a matrix whose columns are the columns of $A$ corresponding to $G(j)$ and whose rows $i$ are such that $a_{ik} \ne 0$ for some $k \in G(j)$. Since $A$ is sparse, $\hat A_j$ will be a small matrix, and the resulting least squares problems

$\|\hat A_j\hat c_j - \hat e^j\|_2 = \min! \quad (*)$

are cheap to solve.
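A minimal static-pattern sketch of this idea (dense arrays, for clarity): take $G(j)$ to be the nonzero pattern of $A$'s $j$-th column and solve each small problem with numpy.linalg.lstsq; the tridiagonal test matrix is made up:

```python
import numpy as np

def spai_static(A):
    """Approximate inverse with fixed pattern G(j) = pattern of A's j-th column."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for j in range(n):
        G = np.nonzero(A[:, j])[0]                         # sparsity pattern G(j)
        rows = np.nonzero(np.abs(A[:, G]).sum(axis=1))[0]  # rows hit by columns in G(j)
        Ahat = A[np.ix_(rows, G)]                          # small matrix \hat A_j
        ehat = (rows == j).astype(float)                   # e^j restricted to those rows
        chat, *_ = np.linalg.lstsq(Ahat, ehat, rcond=None)
        C[G, j] = chat
    return C

n = 20
A = np.diag(4.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
C = spai_static(A)
err = np.linalg.norm(A @ C - np.eye(n), "fro")
```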
Choice of sparsity pattern
Grote & Huckle (1997) proposed an incremental method. Starting from a set of indices $G_0(j)$ (usually corresponding to a diagonal $C$ or to the structure of $A$), they solve the least squares problem (*) and then, iteratively, enlarge the set of indices.

Let

$\|\hat A_j^{(p)}\hat c_{j,p} - e^j\|_2 = \min! \quad (**)$

be the least squares problem in the $p$-th step of the iteration, let $c_j \in \mathbb{R}^n$ be the solution of (**) (i.e. $\hat c_{j,p}$ augmented by zero components), and denote by $r = Ac_j - e^j$ the corresponding residual.
Choice of sparsity pattern ct.
Let $L = \{k : r_k \ne 0\}$, and for $\ell \in L$ let $N_\ell = \{k : a_{\ell k} \ne 0,\ k \notin G_p(j)\}$. Then the candidates for indices of new nonzero components in the solution vector are chosen in

$\bigcup_{\ell\in L}N_\ell.$

For $k$ in this set we consider the problem

$\|r + \mu_kAe^k\|_2 = \|A(c_j + \mu_ke^k) - e^j\|_2 = \min! \quad\Rightarrow\quad \mu_k = -\frac{r^TAe^k}{\|Ae^k\|_2^2}.$

It can be shown that the square of the norm of the new residual is

$\|r\|_2^2 - \frac{(r^TAe^k)^2}{\|Ae^k\|_2^2},$

and that there exist indices such that $r^TAe^k \ne 0$. From these indices one selects those giving the largest reduction of the residual.
Choice of sparsity pattern ct.
This iteration step is repeated until the norm of the residual is smaller than a prescribed bound, or until we have reached the maximum storage that is allowed for column $j$.

Note that putting new indices into $G_p(j)$ will add columns and rows to the matrix $\hat A_j^{(p)}$. Methods exist to update the QR factorization of the enlarged matrix efficiently.

Approximate inverse preconditioning is an active field of research. It is particularly interesting for parallel machines. Gould & Scott (1995) described some improvements of the method. A symmetric approximate inverse can be constructed along the same lines.
Nonsymmetric problems
Desirable are generalizations of CG (or CR) methods which have similar properties as CG without squaring the condition number of the system matrix. Outstanding properties of the CG method:

- based on a short recurrence (low storage requirements)
- minimizes the error on an ascending sequence of Krylov spaces

Theorem of Faber & Manteuffel (1984)

Methods with these two properties exist only for a very limited class of matrices. For 3-term recurrences, only for matrices

$A = e^{i\theta}(H + \sigma I), \quad H = H^H,\ \theta \in \mathbb{R},\ \sigma \in \mathbb{C}.$
Generalized Conjugate Residuals
Eisenstat, Elman & Schultz (1983) studied the Generalized Conjugate Residual Method (GCR method for short), which is a direct generalization of the CR method (and of MINRES) to nonsymmetric systems of linear equations.

By the Gram-Schmidt process a basis $\{d_1, \dots, d_k\}$ of the Krylov space $\mathcal{K}_k(r_0, A)$ is constructed which is orthogonal with respect to the scalar product

$\langle x, y\rangle_{A^TA} = x^TA^TAy.$

Since $d_1, \dots, d_k$ are $A^TA$-conjugate directions, the minimization problem

$\|x - x_*\|_{A^TA} = \min!, \quad x \in x_0 + \mathcal{K}_k(r_0, A),$

is decoupled into a sequence of line searches along $d_1, \dots, d_k$.

Hence, the following algorithm generates iterates $x_k$ which minimize the Euclidean norm of the residual $\|b - Ax\|_2 = \|x - x_*\|_{A^TA}$ on $x_0 + \mathcal{K}_k(r_0, A)$.
GCR method
1: Choose an initial guess x0
2: r0=b − Ax0
3: d1=r0
4: s1=Ad1
5: for k = 1, 2, . . . until convergence do
6: $\alpha_k = \|s_k\|_2^2$
7: $\tau_k = (r_{k-1})^Ts_k/\alpha_k$
8: $x_k = x_{k-1} + \tau_kd_k$
9: $r_k = r_{k-1} - \tau_ks_k$
10: $d_{k+1} = r_k$
11: $s_{k+1} = Ad_{k+1}$
12: for $j = 1 : k$ do
13: $\gamma_{k,j} = (s_{k+1})^Ts_j/\alpha_j$
14: $d_{k+1} = d_{k+1} - \gamma_{k,j}d_j$
15: $s_{k+1} = s_{k+1} - \gamma_{k,j}s_j$
16: end for
17: end for
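The method can be sketched in NumPy as follows; the positive real test matrix (skew part plus a multiple of the identity) is a made-up choice that rules out breakdown:

```python
import numpy as np

def gcr(A, b, x0, tol=1e-12, maxiter=None):
    """GCR as listed above; assumes A is positive real, so no breakdown occurs."""
    x = x0.copy()
    r = b - A @ x
    d, s, alpha = [r.copy()], [A @ r], []
    maxiter = maxiter or len(b)
    for k in range(maxiter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        alpha.append(s[k] @ s[k])
        tau = (r @ s[k]) / alpha[k]
        x += tau * d[k]
        r -= tau * s[k]
        dn, sn = r.copy(), A @ r
        for j in range(k + 1):            # make d_{k+1} conjugate w.r.t. A^T A
            gamma = (sn @ s[j]) / alpha[j]
            dn -= gamma * d[j]
            sn -= gamma * s[j]
        d.append(dn)
        s.append(sn)
    return x

rng = np.random.default_rng(4)
n = 30
S = rng.standard_normal((n, n))
A = S - S.T + 5.0 * np.eye(n)    # positive real: symmetric part is 5 I
b = rng.standard_normal(n)
x = gcr(A, b, np.zeros(n))
res = np.linalg.norm(b - A @ x)
```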
Generalized Conjugate Residuals
Without further assumptions on the matrix $A$, the GCR method may break down because $d_{k+1}$ becomes the null vector. Consider for instance

$A = \begin{pmatrix} 0 & 1\\ -1 & 0\end{pmatrix}, \quad b = \begin{pmatrix}1\\1\end{pmatrix}, \quad x_0 = \begin{pmatrix}0\\0\end{pmatrix}.$

Then at step $k = 1$ one obtains $(r_0)^TAd_1 = 0$, hence $\tau_1 = 0$, $r_1 = r_0$, and $d_2 = r_1 - \gamma_{1,1}d_1 = 0$, so that $\tau_2$ is undefined.
It can be shown (cf. Elman (1982)) that the GCR method does not break down if $A$ is a positive real matrix, i.e. if

$M = \frac{1}{2}(A + A^T)$

is positive definite. Hence, for this type of matrices the GCR method (in exact arithmetic) yields the solution in a finite number of steps.
Comments on GCR
Axelsson (1980) proposed the following method for nonsymmetric linear systems: For $x_0$, $r_0$, and $d_1$ as in the GCR method let

$x_k = x_{k-1} + \sum_{j=1}^{k}\eta_{k,j}d_j, \quad r_k = b - Ax_k,$

$\beta_k = (Ar_k)^TAd_k/\|Ad_k\|_2^2, \quad d_{k+1} = r_k - \beta_kd_k,$

where the scalars $\eta_{k,j}$ are chosen such that $\|r_k\|_2$ is minimal, i.e. from the linear system of equations

$(d_j)^TA^T(Ay - r_{k-1}) = 0, \quad j = 1, \dots, k, \quad y = \sum_{j=1}^{k}\eta_{k,j}d_j.$
Comments on GCR
The work and storage requirements of the GCR method are prohibitively high. Vinsome (1976) proposed a method called ORTHOMIN which is a truncated GCR method and which is considerably less expensive per iteration step. For an integer $m \ll n$ the search direction $d_{k+1}$, $k > m$, is forced to be $A^TA$-orthogonal only to the last $m$ directions $d_k, \dots, d_{k-m+1}$, but not to all preceding directions $d_j$. This method usually is referred to as ORTHOMIN(m).

Eisenstat, Elman & Schultz (1983) alternatively suggested to restart the GCR method every $m + 1$ steps, i.e. the current iterate $x_{j(m+1)}$, $j = 0, 1, 2, \dots$, is chosen as a new initial guess for the GCR method. The storage requirements of this method, called GCR(m), are the same as for ORTHOMIN(m), since $m$ preceding directions have to be stored, but the work per iteration is smaller, since on average only $m/2$ vectors $d_j$ are used to compute the current iterate.
ORTHODIR
Young & Jea (1980) studied methods which are closely related to ORTHOMIN. In ORTHOMIN they replaced the minimization problem in the $k$-th step by

$\|x_k - x_*\|_{ZA} \le \|x - x_*\|_{ZA} \quad\text{for every } x \in x_0 + \mathcal{K}_k(r_0, A),$

where $Z \in \mathbb{R}^{n\times n}$ is an arbitrary matrix such that $ZA$ is positive definite.

Moreover, they considered an algorithm called ORTHODIR (or ORTHODIR(m) for the truncated form), which differs from the GCR method only by the choice of the search directions. These are given by

$\tilde d_{k+1} = A\tilde d_k - \sum_{j=1}^{k}\tilde\gamma_{k,j}\tilde d_j,$

where the scalars $\tilde\gamma_{k,j}$ are chosen such that the search directions $\tilde d_j$ are $ZA$-conjugate.

Notice that ORTHODIR can only break down with $\tilde d_{k+1} = 0$ if the solution $x_*$ lies in $x_0 + \mathcal{K}_k(r_0, A)$.
ORTHORES
Young & Jea (1980) introduced a method called ORTHORES (and ORTHORES(m) for the truncated variant), where the iterates are constructed by the recurrence relation

$x_k = \gamma_kr_{k-1} + \sum_{j=0}^{k-1}\delta_{k,j}x_j,$

and $\gamma_k$ and $\delta_{k,j}$ are determined such that the residual vectors $r_j$ are semi-orthogonal with respect to $Z$, i.e.

$(Zr_k)^Tr_j = 0 \quad\text{for } j < k.$

For the truncated version ORTHORES(m) this requirement must be fulfilled for $k - m \le j < k$.

For the choice of $Z$, Young and Jea considered among others

$Z = A^T, \quad Z = A^T(A + A^T), \quad Z = A^TM.$
Generalized Minimal Residual Method (GMRES)
A different approach to the iterative solution of general linear systems, which generalizes the MINRES method and which is mathematically equivalent to the GCR method, was introduced by Saad & Schultz (1986). Differently from the GCR method, this method, which is called Generalized Minimal RESidual Method (GMRES for short), does not break down for general nonsingular matrices, including those which are not positive real.

GMRES constructs a sequence

$x_k \in x_0 + \mathcal{K}_k(r_0, A)$

of vectors such that the Euclidean norm of the residual becomes minimal. To this end an orthonormal basis $q_1, \dots, q_k$ of $\mathcal{K}_k(r_0, A)$ is determined using Arnoldi's method (i.e. the modified Gram-Schmidt process), such that the problem

$\|r_k\|_2 = \|b - A(x_0 + Q_ky)\|_2 = \min!, \quad y \in \mathbb{R}^k,$

becomes a small least squares problem.
Arnoldi method
1: Let $q_1$ be an initial vector with $\|q_1\|_2 = 1$.
2: for $k = 1, 2, \dots$ do
3: $q_{k+1} = Aq_k$
4: for $j = 1 : k$ do
5: $h_{jk} = (q_{k+1})^Tq_j$
6: $q_{k+1} = q_{k+1} - h_{jk}q_j$
7: end for
8: $h_{k+1,k} = \|q_{k+1}\|_2$
9: $q_{k+1} = q_{k+1}/h_{k+1,k}$
10: end for
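The Arnoldi process can be sketched compactly in NumPy (a fixed number of steps on a made-up random matrix); the two defining relations, orthonormality of the $q_j$ and $AQ_k = Q_{k+1}\tilde H_k$, are easy to check numerically:

```python
import numpy as np

def arnoldi(A, q1, m):
    """m steps of Arnoldi with modified Gram-Schmidt.
    Returns Q with m+1 orthonormal columns and the (m+1) x m Hessenberg H tilde."""
    n = len(q1)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for k in range(m):
        w = A @ Q[:, k]
        for j in range(k + 1):
            H[j, k] = w @ Q[:, j]
            w -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        Q[:, k + 1] = w / H[k + 1, k]
    return Q, H

rng = np.random.default_rng(5)
n, m = 30, 10
A = rng.standard_normal((n, n))
Q, H = arnoldi(A, rng.standard_normal(n), m)
```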
GMRES
With $Q_k = (q_1, \dots, q_k)$ and

$\tilde H_k = (h_{j\ell}) \in \mathbb{R}^{(k+1)\times k} \quad\text{such that } h_{j\ell} = 0 \text{ for } j - \ell > 1,$

it holds

$AQ_k = Q_{k+1}\tilde H_k,$

and for $x = x_0 + Q_ky$, $r = b - Ax$ and $q_1 = r_0/\|r_0\|_2$

$\|r\|_2 = \|b - A(x_0 + Q_ky)\|_2 = \|r_0 - AQ_ky\|_2 = \|r_0 - Q_{k+1}\tilde H_ky\|_2 = \|Q_{k+1}(\|r_0\|_2e_1 - \tilde H_ky)\|_2 = \big\|\,\|r_0\|_2e_1 - \tilde H_ky\,\big\|_2.$

Hence, the problem to minimize the Euclidean norm of the residual on a Krylov space yields a least squares problem whose system matrix $\tilde H_k$ has upper Hessenberg form.
GMRES ct.
If $U_k \in \mathbb{R}^{(k+1)\times(k+1)}$ is an orthogonal matrix such that

$U_k\tilde H_k = \begin{pmatrix}R_k\\ 0^T\end{pmatrix}, \quad R_k \text{ upper triangular},$

and if $u^k$ is the first column of $U_k$, then it holds

$\big\|\,\|r_0\|_2e_1 - \tilde H_ky\,\big\|_2^2 = \big\|U_k(\|r_0\|_2e_1 - \tilde H_ky)\big\|_2^2 = \left\|\,\|r_0\|_2u^k - \begin{pmatrix}R_k\\ 0^T\end{pmatrix}y\,\right\|_2^2 = \big\|\,\|r_0\|_2(u_1^k, \dots, u_k^k)^T - R_ky\,\big\|_2^2 + \|r_0\|_2^2\cdot|u_{k+1}^k|^2.$
GMRES ct.
Hence, the least squares problem is equivalent to the linear system

$R_ky = \|r_0\|_2(u_1^k, \dots, u_k^k)^T,$

where $R_k$ is an upper triangular matrix, and the norm of the residual satisfies

$\|b - Ax_k\|_2 = \|r_0\|_2\cdot|u_{k+1}^k|,$

i.e. the accuracy of the method can be monitored without explicitly forming $x_k$ or $y_k$.

If the residual norm is sufficiently small, GMRES is terminated, and only after this final step of the Arnoldi method the triangular system is solved and $x_k = x_0 + Q_ky_k$ is computed.
GMRES ct.
Since $\tilde H_k$ is an upper Hessenberg matrix, the matrix $U_k$ can be calculated efficiently using Givens reflections. Let

$G_j = \begin{pmatrix}I_{j-2} & 0 & 0\\ 0^T & c_j & s_j\\ 0^T & s_j & -c_j\end{pmatrix} \in \mathbb{R}^{j\times j}, \quad c_j^2 + s_j^2 = 1,$

denote a Givens reflection acting on the $(j-1)$-st and $j$-th component of a vector.

Let $U_1 = G_2$ be such that

$U_1\tilde H_1 = \begin{pmatrix}c_2 & s_2\\ s_2 & -c_2\end{pmatrix}\begin{pmatrix}h_{11}\\ h_{21}\end{pmatrix} = \begin{pmatrix}\sqrt{h_{11}^2 + h_{21}^2}\\ 0\end{pmatrix} =: \begin{pmatrix}\hat h_{11}\\ 0\end{pmatrix}.$
GMRES ct.
Assume that the orthogonal matrix

$U_{k-1} = G_k\begin{pmatrix}G_{k-1} & 0\\ 0^T & 1\end{pmatrix}\cdots\begin{pmatrix}G_2 & O\\ O & I_{k-1}\end{pmatrix} =: G_k\begin{pmatrix}U_{k-2} & 0\\ 0^T & 1\end{pmatrix}$

is chosen such that

$U_{k-1}\tilde H_{k-1} = \begin{pmatrix}\hat h_{11} & \hat h_{12} & \dots & \hat h_{1,k-1}\\ 0 & \hat h_{22} & \dots & \hat h_{2,k-1}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \hat h_{k-1,k-1}\\ 0 & 0 & \dots & 0\end{pmatrix} =: \begin{pmatrix}R_{k-1}\\ 0^T\end{pmatrix} \in \mathbb{R}^{k\times(k-1)}.$
GMRES ct.
Since the multiplication of a matrix by $G_{k+1}$ from the left leaves the first $k-1$ rows invariant, one obtains the upper triangular matrix in the $k$-th step of the method by

$U_k\tilde H_k = G_{k+1}\begin{pmatrix}U_{k-1} & 0\\ 0^T & 1\end{pmatrix}\begin{pmatrix}\tilde H_{k-1} & h^k\\ 0^T & h_{k+1,k}\end{pmatrix} = \begin{pmatrix}I_{k-1} & 0 & 0\\ 0^T & c_{k+1} & s_{k+1}\\ 0^T & s_{k+1} & -c_{k+1}\end{pmatrix}\begin{pmatrix}R_{k-1} & \hat h^k\\ 0^T & g_k\\ 0^T & h_{k+1,k}\end{pmatrix} = \begin{pmatrix}R_k\\ 0^T\end{pmatrix},$

where

$U_{k-1}h^k = U_{k-1}\begin{pmatrix}h_{1k}\\ \vdots\\ h_{k-1,k}\\ h_{kk}\end{pmatrix} =: \begin{pmatrix}\hat h_{1k}\\ \vdots\\ \hat h_{k-1,k}\\ g_k\end{pmatrix} =: \begin{pmatrix}\hat h^k\\ g_k\end{pmatrix}, \dots$
GMRES ct.
$\dots, \quad s_{k+1} := \frac{h_{k+1,k}}{\sqrt{g_k^2 + h_{k+1,k}^2}}, \quad c_{k+1} := \frac{g_k}{\sqrt{g_k^2 + h_{k+1,k}^2}}.$

In a similar manner as in the SYMMLQ method, the elements $\hat h_{j\ell}$ can be obtained along with the Arnoldi process, and the reflections can be applied to the right-hand side $\|r_0\|_2e_1$ as the iteration proceeds.

Observe, however, that differently from the SYMMLQ method, the nontrivial entries $c_j$ and $s_j$ of $G_j$ have to be stored, since they are needed in the transformation of the $k$-th column $h^k$ of $\tilde H_k$.

The diagonal elements of the triangular matrix $R_k$ are given by

$\hat h_{jj} = \sqrt{g_j^2 + h_{j+1,j}^2}.$

Hence, $R_k$ is nonsingular if the Arnoldi process constructs $k$ orthogonal vectors $q_j$. GMRES breaks down if and only if $h_{j+1,j} = 0$ for some $j$, and in this case the solution is already contained in $x_0 + \mathcal{K}_j(r_0, A)$.
GMRES algorithm
$q_1 = b - Ax_0$; $z_1 = \|q_1\|_2$; $q_1 = q_1/z_1$; $k = 1$
while $|z_k| >$ tol do
  $q_{k+1} = Aq_k$
  for $i = 1 : k$ do
    $h_{ik} = (q_i)^Tq_{k+1}$
    $q_{k+1} = q_{k+1} - h_{ik}q_i$
  end for
  $h_{k+1,k} = \|q_{k+1}\|_2$
  $q_{k+1} = q_{k+1}/h_{k+1,k}$
  for $i = 1 : k - 1$ do
    $\begin{pmatrix}h_{i,k}\\ h_{i+1,k}\end{pmatrix} = \begin{pmatrix}c_{i+1} & s_{i+1}\\ s_{i+1} & -c_{i+1}\end{pmatrix}\begin{pmatrix}h_{i,k}\\ h_{i+1,k}\end{pmatrix}$
  end for
  $\alpha = \sqrt{h_{kk}^2 + h_{k+1,k}^2}$
  $s_{k+1} = h_{k+1,k}/\alpha$; $c_{k+1} = h_{kk}/\alpha$
  $h_{kk} = \alpha$
  $z_{k+1} = s_{k+1}z_k$
  $z_k = c_{k+1}z_k$
  $k = k + 1$
end while
$y_k = z_k/h_{kk}$
for $i = k - 1 : -1 : 1$ do
  $y_i = (z_i - \sum_{j=i+1}^{k}h_{ij}y_j)/h_{ii}$
end for
$x_k = x_0 + \sum_{j=1}^{k}y_jq_j$

Cost of GMRES
Cost of the $k$-th step (neglecting the transformation of $\tilde H_k$ to triangular form):
1 matrix-vector product, $k+1$ scalar products, $k+1$ _axpy operations.

Determining $x_m \in x_0 + \mathcal{K}_m(r_0, A)$ costs:
$m+1$ matrix-vector products, $(m+1)(m+2)+m$ level-1 operations, $m+1$ vectors to store.
Example
Figure: convergence history of $\|r_{GMRES}\|_2$ for $-\Delta u - 256u_y = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{GMRES}\|_2$ for $-\Delta u - 256u_y - 163.84u = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{GMRES}\|_2$ for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$.

Example

Figure: convergence history of $\|r_{GMRES}\|_2$ for $-\Delta u - 256u_y - 8192u = 0$, $h = 1/128$.

Theorem 6.2
Let $A$ be positive real, i.e. the symmetric part $M := 0.5(A + A^T)$ of $A$ is positive definite. Then it holds

$\|r_k\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{k/2}\|r_0\|_2.$

Here $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimal and maximal eigenvalue, respectively.
Proof
By construction

$\|r_k\|_2 = \min_{q\in\Pi_k,\,q(0)=1}\|q(A)r_0\|_2,$

and for every $\alpha \in \mathbb{R}$ we get with the polynomial $\tilde q(\lambda) = (1 + \alpha\lambda)^k$

$\min_{q\in\Pi_k,\,q(0)=1}\|q(A)\|_2 \le \|\tilde q(A)\|_2 \le \|I + \alpha A\|_2^k.$

From $x^TMx \ge \lambda_{\min}(M)x^Tx$ and $x^TA^TAx \le \lambda_{\max}(A^TA)x^Tx$ for every $x \ne 0$ we obtain for every $\alpha \in \mathbb{R}$

$\|I + \alpha A\|_2^2 = \max_{\|x\|_2=1}x^T(I + \alpha A)^T(I + \alpha A)x = \max_{\|x\|_2=1}\big(1 + \alpha x^T(A + A^T)x + \alpha^2x^TA^TAx\big)$
Proof ct.
and for $\alpha < 0$

$\|I + \alpha A\|_2^2 \le 1 + 2\alpha\lambda_{\min}(M) + \alpha^2\lambda_{\max}(A^TA).$

The quadratic expression on the right-hand side attains its minimum for

$\alpha = -\lambda_{\min}(M)/\lambda_{\max}(A^TA).$

Hence,

$\|I + \alpha A\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{1/2},$

which proves the theorem.
Theorem 6.3
Let $A = T\Lambda T^{-1}$ be diagonalizable with a regular matrix $T$ and a diagonal matrix $\Lambda$. Then it holds

$\|r_k\|_2 \le \kappa_2(T)\|r_0\|_2\cdot\min_{q\in\Pi_k,\,q(0)=1}\max_{\lambda\in\sigma(A)}|q(\lambda)|.$

If $A$ is normal, then

$\|r_k\|_2 \le \|r_0\|_2\cdot\min_{q\in\Pi_k,\,q(0)=1}\max_{\lambda\in\sigma(A)}|q(\lambda)|.$
Proof
With $\tilde\Pi_k := \{q \in \Pi_k : q(0) = 1\}$ it holds

$\|r_k\|_2 = \min_{q\in\tilde\Pi_k}\|q(A)r_0\|_2 = \min_{q\in\tilde\Pi_k}\|q(T\Lambda T^{-1})r_0\|_2 = \min_{q\in\tilde\Pi_k}\|Tq(\Lambda)T^{-1}r_0\|_2 \le \|T\|_2\cdot\|T^{-1}\|_2\,\min_{q\in\tilde\Pi_k}\|q(\Lambda)\|_2\,\|r_0\|_2 = \kappa_2(T)\|r_0\|_2\min_{q\in\tilde\Pi_k}\max_{\lambda\in\sigma(A)}|q(\lambda)|.$

If $A$ is normal, then $A$ can be transformed to diagonal form unitarily, and $\kappa_2(T) = 1$.
Remark
If $A$ is normal, then (cf. Greenbaum & Gurvits (1994)) the bound on the speed of convergence is sharp, i.e. for $A$ there exists an initial residual $r_0$ such that the inequality holds with '='. In this case the speed of convergence depends on the spectrum of the matrix.

For nonnormal matrices it was shown by Greenbaum, Ptak and Strakos (1996) that for every monotonically decreasing sequence of positive real numbers there exists a matrix and an initial vector such that the residuals in the GMRES method attain the elements of the given sequence. The matrix can be determined such that it has predetermined eigenvalues.

Hence, the speed of convergence of the GMRES method is completely independent of the spectrum of the system matrix.
Example
The following example demonstrates that GMRES may make progress only in the $n$-th step, although the spectrum may be clustered very well. Hence, the speed of convergence of GMRES for nonnormal matrices is independent of the spectrum of the system matrix.

Let

$A = \begin{pmatrix}0 & 1 & 0 & \dots & 0\\ 0 & 0 & 1 & \dots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \dots & 1\\ -a_0 & -a_1 & -a_2 & \dots & -a_{n-1}\end{pmatrix}$

be a Frobenius (companion) matrix. Assume that the initial vector is chosen such that $r_0 = e_1$.

Then $Ar_0$ is a multiple of $e_n$, $A^2r_0$ is a linear combination of $e_n$ and $e_{n-1}$, etc. All $A^kr_0$, $k = 1, \dots, n-1$, are orthogonal to $r_0$. Hence, the optimal approximation in $x_0 + \mathcal{K}_k(r_0, A)$ is $x_k = x_0$, $k = 1, \dots, n-1$, and only in the $n$-th step is the solution reached. By choosing the $a_j$ suitably, every spectrum can be prescribed.
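The stagnation is easy to verify numerically: for $k < n$ the space $A\mathcal{K}_k(r_0, A)$ is orthogonal to $r_0 = e_1$, so the optimal residual norm stays $1$ (the coefficients $a_j$ below are arbitrary made-up values):

```python
import numpy as np

n = 8
a = np.linspace(1.0, 2.0, n)          # arbitrary coefficients a_0, ..., a_{n-1}
A = np.diag(np.ones(n - 1), 1)        # Frobenius (companion) matrix
A[-1, :] = -a
r0 = np.zeros(n)
r0[0] = 1.0                           # r_0 = e_1 (x_0 = 0, b = e_1)

# columns A r_0, A^2 r_0, ..., A^n r_0; the first k of them span A K_k(r_0, A)
AK = np.column_stack([np.linalg.matrix_power(A, j + 1) @ r0 for j in range(n)])
res = []
for k in range(1, n + 1):
    y, *_ = np.linalg.lstsq(AK[:, :k], r0, rcond=None)   # min ||r_0 - A K_k y||
    res.append(np.linalg.norm(r0 - AK[:, :k] @ y))
```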
Remark
With this result in mind it is easy to construct an example for which CGNR clearly outperforms GMRES: For $A \in \mathbb{R}^{10\times 10}$ as on the last slide, where the roots of the polynomial

$p(\lambda) = \lambda^{10} + \sum_{j=0}^{9}a_j\lambda^j$

are

$\lambda_j = 1 + 0.1\exp(0.2\pi ji), \quad j = 0, \dots, 9,$

$b = e_1 = (1, 0, \dots, 0)^T$ and $x_0 = 0$, GMRES stagnates with $x_1 = \dots = x_9 = 0$, $x_{10} = e_2$, whereas CGNR reduces the Euclidean norm of the residual substantially from the very first steps.
Constructing a problem with given convergence curve
Following Greenbaum, Ptak & Strakos (1996) we construct a problem with a given convergence curve and any prescribed eigenvalues.

Let $x_0 = 0$ and $r_0 = b$, and let $W = \{w_1, \dots, w_n\}$ be an orthonormal basis of the Krylov space $\mathcal{K}_n(b, A)$ such that $\operatorname{span}\{w_1, \dots, w_k\} = \mathcal{K}_k(b, A)$ for $k = 1, \dots, n$.
From the minimization property

$\|r_k\|_2 = \min\{\|r_0 - u\| : u \in A\mathcal{K}_k(r_0, A)\}$

it is clear that $b$ can be expanded as

$b = \sum_{j=1}^{n}\langle b, w_j\rangle w_j, \quad\text{where } |\langle b, w_j\rangle| = \sqrt{\|r_{j-1}\|^2 - \|r_j\|^2},\ r_0 = b,\ \|r_n\| = 0.$
Constructing a problem with . . . ct.
Given a nonincreasing positive sequence $f(0) \ge f(1) \ge \dots \ge f(n-1) > 0$, define $f(n) = 0$ and

$g(k) = \sqrt{f(k-1)^2 - f(k)^2}, \quad k = 1, 2, \dots, n.$
The conditions $\|b\| = f(0)$, $\|r_j\| = f(j)$, $j = 1, 2, \dots, n-1$, will then be satisfied if the coordinates of $b$ in the basis $W$ are determined by the prescribed sequence of residual norms.
Constructing a problem with . . . ct.
Let $\Lambda = \{\lambda_1, \dots, \lambda_n\}$, $\lambda_j \ne 0$, $j = 1, \dots, n$, be a set of points in the complex plane. Consider the monic polynomial

$\prod_{j=1}^{n}(\lambda - \lambda_j) =: \lambda^n - \sum_{j=0}^{n-1}\alpha_j\lambda^j.$

Clearly, $\alpha_0 \ne 0$.
We construct the matrix $A$ as the representation of a linear operator $A : \mathbb{C}^n \to \mathbb{C}^n$ with respect to the standard basis $e_1, \dots, e_n$.
Constructing a problem with . . . ct.
Let $V = \{v_1, \dots, v_n\}$ be any orthonormal basis of $\mathbb{C}^n$, let $V = [v_1, \dots, v_n]$, and let $b$ satisfy

$V^Hb = (g(1), \dots, g(n))^T.$
Note that given $b$ with $\|b\| = f(0)$, the basis $V$ can be chosen accordingly; alternatively, given $V$, the vector $b$ can be chosen.
Since $g(n) \ne 0$, the set of vectors $B = \{b, v_1, \dots, v_{n-1}\}$ is linearly independent, and also forms a basis of $\mathbb{C}^n$.

Let $B = [b, v_1, \dots, v_{n-1}]$. Then the operator $A$ is determined by the equations

$Ab := v_1,$
$Av_j := v_{j+1}, \quad j = 1, \dots, n-2,$
$Av_{n-1} := \alpha_0b + \sum_{j=1}^{n-1}\alpha_jv_j.$
Constructing a problem with . . . ct.
The matrix representation of $A$ with respect to the basis $B$ is

$A_B = \begin{pmatrix}0 & \dots & 0 & \alpha_0\\ 1 & & & \alpha_1\\ & \ddots & & \vdots\\ & & 1 & \alpha_{n-1}\end{pmatrix},$

which is the companion matrix corresponding to the set of eigenvalues $\Lambda$. Changing the basis from $B$ to the standard basis $\{e_1, \dots, e_n\}$ yields the desired matrix $A = BA_BB^{-1}$.
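The construction can be carried out numerically. Below, the convergence curve f and the spectrum are arbitrary made-up choices, V is taken to be the identity basis, and the GMRES residual norms are evaluated by least squares over $A\mathcal{K}_k(b, A)$:

```python
import numpy as np

n = 5
f = np.array([1.0, 0.5, 0.25, 0.125, 0.0625, 0.0])   # prescribed ||r_k||, f(n) = 0
lam = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # prescribed eigenvalues

g = np.sqrt(f[:-1] ** 2 - f[1:] ** 2)                # g(1), ..., g(n)
b = g.copy()                                          # V = I, so V^H b = g

coeffs = np.poly(lam)                                # monic polynomial with roots lam
alpha = -coeffs[1:][::-1]                            # lambda^n = sum_j alpha_j lambda^j

AB = np.zeros((n, n))                                # companion matrix w.r.t. basis B
AB[1:, :-1] = np.eye(n - 1)                          # A b = v_1, A v_j = v_{j+1}
AB[:, -1] = alpha                                    # A v_{n-1} = alpha_0 b + ...
B = np.column_stack([b, np.eye(n)[:, :-1]])          # B = [b, v_1, ..., v_{n-1}]
A = B @ AB @ np.linalg.inv(B)

r_norms = [np.linalg.norm(b)]                        # GMRES residuals via least squares
K = b[:, None]
for k in range(1, n + 1):
    AK = A @ K
    y, *_ = np.linalg.lstsq(AK, b, rcond=None)
    r_norms.append(np.linalg.norm(b - AK @ y))
    K = np.column_stack([K, np.linalg.matrix_power(A, k) @ b])
```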
Restarted GMRES
The main disadvantage of GMRES is that the cost and the storage requirements grow linearly with the number of iterations.
The usual way to cope with this problem is restarting. After a given number $m$ of steps, the current least squares problem is solved, and the solution $x_m$ is chosen to be the initial vector for another $m$ GMRES steps. This method, called GMRES(m), is repeated until convergence.
There are no rules for choosing m. If m is chosen too large the method can become too expensive, and if m is chosen too small the convergence can become very slow, or the method can even stagnate without convergence.
Example
For

$A = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}, \quad b = \begin{pmatrix}1\\ 1\end{pmatrix}, \quad x_0 = 0,$

$x_1 = x_0$ solves the least squares problem

$\|Ax - b\|_2 = \min!, \quad x = x_0 + \alpha b,$

and hence GMRES(1) stagnates.
Convergence of GMRES(m)
For positive real $A$ we get from the first convergence theorem for GMRES(m) the residual bounds

$\|r_{jm+i}\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{i/2}\|r_{jm}\|_2, \quad \|r_{jm}\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{m/2}\|r_{(j-1)m}\|_2,$

and therefore

$\|r_k\|_2 \le \left(1 - \frac{\lambda_{\min}(M)^2}{\lambda_{\max}(A^TA)}\right)^{k/2}\|r_0\|_2.$
Example
Figure: convergence histories of GMRES and GMRES(m), m = 2, 4, 8, 16, 32, 64, for $-\Delta u - 256u\dots$, $h = 1/128$.
Example
For

$-\Delta u - 128u_y - 163.84u = 0, \quad h = 1/128,$

we get

Type: iterations / flops
GMRES: 190 / 1.20e9
GMRES(2): stationary
GMRES(4): 964 / 4.39e8
GMRES(8): 656 / 3.66e8
GMRES(16): 624 / 5.01e8
GMRES(32): 704 / 9.24e8
GMRES(64): 704 / 1.65e9
Example
Figure: convergence histories of GMRES and GMRES(m) for $-\Delta u - 256u_y - 819.2u = 0$, $h = 1/128$.
Preconditioning
Again, GMRES and GMRES(m) can be preconditioned by applying the methods to the system
M−1Ax = M−1b left preconditioning
or to
AM−1y = b, y = Mx right preconditioning
where in both cases M is an approximation to A such that linear systems Mu = c can be solved easily.
In the following examples we used M = LU where [L, U] = luinc(A, τ ) is an incomplete LU factorization of A with threshold τ .
Example
GMRES with left preconditioning for $-\Delta u - 256u_y = 0$, $h = 1/128$ (convergence histories):

gmres without preconditioning: 1.37E9 flops
nofill luinc: 1.01E9 + 1.12E5 flops
luinc(a,0.1): 3.10E7 + 4.22E6 flops
luinc(a,0.01): 1.28E7 + 4.22E6 flops
luinc(a,0.001): 5.26E6 + 4.22E6 flops
Example
GMRES with right preconditioning for $-\Delta u - 256u_y = 0$, $h = 1/128$ (convergence histories):

gmres without preconditioning: 1.37E9 flops
nofill luinc: 9.72E8 + 1.12E5 flops
luinc(a,0.1): 2.70E7 + 4.22E6 flops
luinc(a,0.01): 1.15E7 + 4.22E6 flops
luinc(a,0.001): 5.27E6 + 4.22E6 flops
Example
GMRES with left preconditioning for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$ (convergence histories):

gmres without preconditioning: 1.55E9 flops
nofill luinc: 1.20E9 + 1.12E5 flops
luinc(a,0.1): 3.74E7 + 4.22E6 flops
luinc(a,0.01): 1.57E7 + 4.22E6 flops
luinc(a,0.001): 7.34E6 + 4.22E6 flops
Example
GMRES with right preconditioning for $-\Delta u - 256u_y - 1638.4u = 0$, $h = 1/128$ (convergence histories):

gmres without preconditioning: 1.55E9 flops
nofill luinc: 1.16E9 + 1.12E5 flops
luinc(a,0.1): 3.74E7 + 4.22E6 flops
luinc(a,0.01): 1.57E7 + 4.22E6 flops
luinc(a,0.001): 6.28E6 + 4.22E6 flops
Example
Figure: GMRES for $-\Delta u - 256u_y - 8192u = 0$, $h = 1/128$, with left (red) and right (green) preconditioning; 1.03e8 and 2.25e9 flops.