(1)

ITERATIVE PROJECTION METHODS FOR

SPARSE LINEAR SYSTEMS AND

EIGENPROBLEMS

CHAPTER 8 : QMR METHODS

Heinrich Voss

voss@tu-harburg.de

Hamburg University of Technology Institute of Numerical Simulation

(2)

Two-sided Lanczos method

A different approach to the biconjugate gradient method is based on the following two-sided Lanczos or nonsymmetric Lanczos algorithm, which was proposed by Lanczos (1950) as a method to transform a general matrix to tridiagonal form.

The two-sided Lanczos process determines basis vectors $v^j$ and $w^j$ of the Krylov spaces $\mathcal{K}_k(v^1,A)$ and $\mathcal{K}_k(w^1,A^T)$, respectively (i.e. for "both sides of the matrix"), such that $v^j$ and $w^j$ are biorthogonal, i.e.

$(w^j)^T v^k = 0$ for $j \neq k$.

In the following algorithm the coefficients are chosen such that $(w^j)^T v^k = \delta_{jk}$.

Other “normalizations” are possible and are in use.

(3)

Two-sided Lanczos method

1: Choose $v^1, w^1$ with $(v^1)^T w^1 \neq 0$; $v^0 = 0$; $w^0 = 0$; $\beta_0 = 0$; $\gamma_0 = 0$
2: for $k = 1, 2, \dots$ do
3:   $\tilde v = A v^k$; $\tilde w = A^T w^k$
4:   $\alpha_k = \tilde v^T w^k$
5:   $\tilde v = \tilde v - \alpha_k v^k - \beta_{k-1} v^{k-1}$
6:   $\tilde w = \tilde w - \alpha_k w^k - \gamma_{k-1} w^{k-1}$
7:   $\gamma_k = \|\tilde v\|_2$
8:   if $\gamma_k == 0$ then
9:     STOP
10:  end if
11:  $v^{k+1} = \tilde v/\gamma_k$
12:  $\beta_k = \tilde w^T v^{k+1}$
13:  if $\beta_k == 0$ then
14:    STOP
15:  end if
16:  $w^{k+1} = \tilde w/\beta_k$
17: end for
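As an illustration, a minimal NumPy sketch of this process is given below (the helper name two_sided_lanczos and its interface are not from the slides; it uses the normalization above and simply stops on the two breakdown conditions discussed later).

```python
import numpy as np

def two_sided_lanczos(A, v1, w1, k_max):
    """Sketch of the two-sided Lanczos process: returns the basis vectors
    V = [v^1, v^2, ...], W = [w^1, w^2, ...] and the recurrence
    coefficients alpha_k, beta_k, gamma_k."""
    V, W = [v1], [w1]
    alpha, beta, gamma = [], [], []
    for k in range(k_max):
        v_t = A @ V[k]                        # v~ = A v^k
        w_t = A.T @ W[k]                      # w~ = A^T w^k
        a = v_t @ W[k]                        # alpha_k = v~^T w^k
        alpha.append(a)
        v_t = v_t - a * V[k] - (beta[k-1] * V[k-1] if k > 0 else 0.0)
        w_t = w_t - a * W[k] - (gamma[k-1] * W[k-1] if k > 0 else 0.0)
        g = np.linalg.norm(v_t)               # gamma_k = ||v~||_2
        gamma.append(g)
        if g == 0.0:                          # regular (lucky) termination
            break
        V.append(v_t / g)                     # v^{k+1}
        b = w_t @ V[k + 1]                    # beta_k = w~^T v^{k+1}
        beta.append(b)
        if b == 0.0:                          # serious breakdown
            break
        W.append(w_t / b)                     # w^{k+1}
    return V, W, alpha, beta, gamma
```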

(4)

Two-sided Lanczos ct.

With $V_k := (v^1, \dots, v^k)$, $W_k := (w^1, \dots, w^k)$ and

$H_k = \begin{pmatrix} \alpha_1 & \beta_1 & & & \\ \gamma_1 & \alpha_2 & \beta_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \gamma_{k-2} & \alpha_{k-1} & \beta_{k-1} \\ & & & \gamma_{k-1} & \alpha_k \end{pmatrix}$

the two-sided Lanczos process can be written as

$A V_k = V_k H_k + \gamma_k v^{k+1}(e^k)^T, \qquad A^T W_k = W_k H_k^T + \beta_k w^{k+1}(e^k)^T.$

The biorthogonality takes the form

$W_k^T V_k = I_k.$
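A quick numerical check of these relations, using the hypothetical two_sided_lanczos helper sketched above (random test matrix, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 60))
v1 = rng.standard_normal(60); v1 /= np.linalg.norm(v1)

k = 8
V, W, al, be, ga = two_sided_lanczos(A, v1, v1.copy(), k)
Vk = np.column_stack(V[:k]); Wk = np.column_stack(W[:k])
Hk = np.diag(al[:k]) + np.diag(be[:k-1], 1) + np.diag(ga[:k-1], -1)

# A V_k = V_k H_k + gamma_k v^{k+1} (e^k)^T  (exact up to rounding)
print(np.linalg.norm(A @ Vk - Vk @ Hk - ga[k-1] * np.outer(V[k], np.eye(k)[-1])))
# W_k^T V_k = I_k  (biorthogonality; degrades slowly in finite precision)
print(np.linalg.norm(Wk.T @ Vk - np.eye(k)))
```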

(5)

Two-sided Lanczos ct.

From

$\mathcal{K}_k(v^1, A) = \mathrm{span}\{v^1, \dots, v^k\}$

and

$\mathcal{K}_k(w^1, A^T) = \mathrm{span}\{w^1, \dots, w^k\}$

it follows that the $k$-th approximation of the BiCG method can be determined from the tridiagonal system

$W_k^T\bigl(A(x^0 + V_k y^k) - b\bigr) = H_k y^k - W_k^T r^0 = 0,$

i.e. $H_k y^k = \beta_0 e^1$ for $v^1 = r^0/\beta_0$.

(6)

Breakdown

The two-sided Lanczos process can break down for two reasons.

If $\gamma_k = 0$, i.e. $v^{k+1} = 0$, then $\mathcal{K}_k(r^0, A)$ is an invariant subspace of $A$, and for $v^1 = r^0/\beta_0$

$A V_k y^k = V_k H_k y^k = r^0 = \beta_0 V_k e^1 \quad\Longleftrightarrow\quad V_k(H_k y^k - \beta_0 e^1) = 0.$

Hence, the solution of $Ax = b$ has the representation $x^0 + V_k y^k$, and the system is solved exactly. This breakdown is called regular termination or lucky termination.

Secondly, the method can break down with $v^{k+1} \neq 0$ and $\beta_k = 0$, i.e. $(v^{k+1})^T \tilde w = 0$. This case is called serious breakdown. One does not obtain biorthogonal bases of the Krylov spaces of dimension $k+1$ or higher from the two-sided Lanczos process (with initial vectors $v^1$ and $w^1$).

(7)

Two-sided Lanczos ct.

With r0=b − Ax0= β 0v1and ˜ Hk =  Hk γk(ek)T 

the residuum of x := x0+Vky is given by

b − Ax = b − A(x0+Vky ) = r0− AVky

= β0v1− Vk +1H˜ky = Vk +1(β0e1− ˜Hky ).

The GMRES solution is obtained minimizing

kb − Axk2= kVk +1(β0e1− ˜Hky )k2.

Observe however, that Vk +1does not have orthonormal columns, and

(8)

Quasi-Minimal Residual Method

We neglect the matrix $V_{k+1}$, solve the least squares problem

$\|\beta_0 e^1 - \tilde H_k y\|_2 = \min!$

and set $x^k = x^0 + V_k y^k$.

This can be interpreted as minimizing the residual with respect to the transformed norm

$\|W_{k+1}^T r\|_2 = \|W_{k+1}^T V_{k+1}(\beta_0 e^1 - \tilde H_k y)\|_2 = \|\beta_0 e^1 - \tilde H_k y\|_2 = \min!$

Hence, to enable a short recurrence the norm is modified in every single step.
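Purely for illustration (this is not the short-recurrence algorithm of the following slides), the quasi-minimization can be carried out with a dense least squares solve on the small matrix $\tilde H_k$. The helper qmr_from_lanczos below is hypothetical; it reuses the two_sided_lanczos sketch from above, starts with the admissible choice $w^1 = v^1 = r^0/\beta_0$, and assumes that no breakdown occurs.

```python
import numpy as np

def qmr_from_lanczos(A, b, x0, k):
    """Naive QMR sketch: k steps of two-sided Lanczos started with
    v^1 = w^1 = r^0/beta_0, then minimize ||beta_0 e^1 - H~_k y||_2."""
    r0 = b - A @ x0
    beta0 = np.linalg.norm(r0)
    v1 = r0 / beta0
    V, W, al, be, ga = two_sided_lanczos(A, v1, v1.copy(), k)
    H = np.diag(al[:k]) + np.diag(be[:k-1], 1) + np.diag(ga[:k-1], -1)
    H_tilde = np.vstack([H, ga[k-1] * np.eye(k)[-1]])   # append gamma_k (e^k)^T
    rhs = np.zeros(k + 1); rhs[0] = beta0                # beta_0 e^1
    y, *_ = np.linalg.lstsq(H_tilde, rhs, rcond=None)    # quasi-residual minimization
    return x0 + np.column_stack(V[:k]) @ y
```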

(9)

QMR method

Similarly as in the GMRES method the least squares problem in the QMR method can be solved using Givens reflections.

Similarly as in MINRES, $x^k = x^0 + V_k y^k$ can be updated; we do not have to store all previous search directions, but only two of them.

The residual norms are not monotonically decreasing; however, increases usually appear only in very few iterations and are of small magnitude.

(10)

QMR with 3 term recurrence

$r = b - Ax$; $\beta = \|r\|$; $v^1 = r/\beta$; $w^1 = v^1$; $z_1 = 1$
$\beta_0 = 0$; $\gamma_0 = 0$; $v^0 = 0$; $w^0 = 0$; $p^{-1} = 0$; $p^{-2} = 0$
for $k = 1, 2, \dots$ until convergence do
  $\tilde v = A v^k$; $\tilde w = A^T w^k$
  $\alpha_k = \tilde v^T w^k$
  $\tilde v = \tilde v - \alpha_k v^k - \beta_{k-1} v^{k-1}$; $\tilde w = \tilde w - \alpha_k w^k - \gamma_{k-1} w^{k-1}$
  $\gamma_k = \|\tilde v\|_2$; $v^{k+1} = \tilde v/\gamma_k$
  $\beta_k = \tilde w^T v^{k+1}$; $w^{k+1} = \tilde w/\beta_k$
  if $k > 2$ then
    $\begin{pmatrix} \rho \\ \beta_{k-1} \end{pmatrix} = \begin{pmatrix} c_{k-2} & s_{k-2} \\ s_{k-2} & -c_{k-2} \end{pmatrix}\begin{pmatrix} 0 \\ \beta_{k-1} \end{pmatrix}$
  end if
  if $k > 1$ then
    $\begin{pmatrix} \beta_{k-1} \\ \alpha_k \end{pmatrix} = \begin{pmatrix} c_{k-1} & s_{k-1} \\ s_{k-1} & -c_{k-1} \end{pmatrix}\begin{pmatrix} \beta_{k-1} \\ \alpha_k \end{pmatrix}$
  end if

(11)

QMR algorithm ct.

  $\eta = \sqrt{\alpha_k^2 + \gamma_k^2}$
  $s_k = \gamma_k/\eta$; $c_k = \alpha_k/\eta$
  $\alpha_k = \eta$
  $z_{k+1} = s_k z_k$; $z_k = c_k z_k$
  $p^{k-1} = (v^k - \beta_{k-1} p^{k-2} - \rho p^{k-3})/\alpha_k$
  $x = x + \beta z_k p^{k-1}$
end for

(12)

Cost of QMR

2 matrix-vector products
3 scalar products
9 _axpy

Storage requirements: 10 vectors

(13)

QMR ct.

$\|r^k\|_2$ is not determined in the QMR method.

However, it holds

$b - Ax = V_{k+1}(\beta_0 e^1 - \tilde H_k y).$

If the two-sided Lanczos process is normalized such that $\|v^k\|_2 = 1$ for every $k$, then it follows

$\|r^k\|_2 \le \|V_{k+1}\|_2\,\|\beta_0 e^1 - \tilde H_k y\|_2 \le \sqrt{k+1}\,\|\beta_0 e^1 - \tilde H_k y\|_2,$

and the termination can be based on this moderate overestimation of $\|r^k\|_2$ by the quasi-residual norm.

(14)

QMR & GMRES

The norm of the QMR residual can be related to that of the optimal GMRES residual as follows:

Theorem 8.1
If $r^k_G$ and $r^k_Q$ denote the GMRES and QMR residual, respectively, then it holds

$\|r^k_Q\|_2 \le \kappa(V_{k+1})\,\|r^k_G\|_2,$

where $V_{k+1}$ is the matrix of basis vectors for $\mathcal{K}_{k+1}(r^0, A)$ constructed by the two-sided Lanczos method and $\kappa(V_{k+1})$ denotes its condition number.

Proof: For the GMRES residual it holds that

$\|r^k_G\|_2 = \min_y \|V_{k+1}(\beta e^1 - \tilde H_k y)\| \ge \sigma_{\min}(V_{k+1})\,\min_y \|\beta e^1 - \tilde H_k y\|_2,$

where $\sigma_{\min}(V_{k+1})$ is the smallest singular value of $V_{k+1}$. Combining this with

$\|r^k_Q\|_2 \le \|V_{k+1}\|_2\,\|\beta e^1 - \tilde H_k y\|_2$ for every $y \in \mathbb{R}^k$

gives the desired result.

(15)

QMR & BiCG

Following the work of Cullum & Greenbaum (1996) we establish a relationship between the residual norms in the BiCG and the QMR method.

To this end we first consider the relationship between upper Hessenberg linear systems and least squares problems. Let $H_k$, $k = 1, 2, \dots$, be a family of upper Hessenberg matrices where $H_{k-1}$ is the $(k-1)$-by-$(k-1)$ principal submatrix of $H_k$, and let

$\tilde H_k = \begin{pmatrix} H_k \\ h_{k+1,k}(e^k)^T \end{pmatrix}.$

The matrix $\tilde H_k$ can be factored in the form $\tilde H_k = F^T R$, where $F \in \mathbb{R}^{(k+1)\times(k+1)}$ is orthogonal and $R \in \mathbb{R}^{(k+1)\times k}$ is upper triangular.

(16)

QMR & BiCG ct.

The factorization $\tilde H_k = F^T R$ can be performed using plane rotations in the manner described for the GMRES algorithm:

$(F_k \cdots F_1)\tilde H_k = R, \qquad\text{where}\qquad F_j = \mathrm{diag}\!\left(I_{j-1},\ \begin{pmatrix} c_j & -s_j \\ s_j & c_j \end{pmatrix},\ I_{k-j}\right).$

Note that the first $k-1$ sines and cosines $s_j, c_j$, $j = 1, \dots, k-1$, are those used in the factorization of $\tilde H_{k-1}$.

Let $\beta > 0$ be given, and assume that $H_k$ is nonsingular. Let $\tilde y^k$ be the solution of the linear system $H_k y = \beta e^1$, let $y^k$ denote the solution of the least squares problem $\|\tilde H_k y - \beta e^1\|_2 = \min$, and let

$\tilde v^k = \tilde H_k \tilde y^k - \beta e^1 \quad\text{and}\quad v^k = \tilde H_k y^k - \beta e^1.$

(17)

Lemma 8.2

$\|v^k\| = \beta\,|s_1 s_2 \cdots s_k| \quad\text{and}\quad \|\tilde v^k\| = \beta\,\frac{1}{|c_k|}\,|s_1 \cdots s_k| \qquad (1)$

$\|\tilde v^k\| = \frac{\|v^k\|}{\sqrt{1 - (\|v^k\|/\|v^{k-1}\|)^2}} \qquad (2)$

Proof: The least squares problem with $\tilde H_k$ can be written in the form

$\min_y \|\tilde H_k y - \beta e^1\| = \min_y \|F(\tilde H_k y - \beta e^1)\| = \min_y \|Ry - \beta F e^1\|,$

and the solution $y^k$ is determined by solving the triangular system corresponding to the leading $k$ components of $Ry - \beta F e^1$.

The remainder is therefore zero except for the last component, which is just the last entry of

$-\beta F e^1 = -\beta F_k \cdots F_1 e^1, \quad\text{i.e.}\quad -\beta s_1 \cdots s_k.$

(18)

Proof ct.

For $\tilde y^k = \beta H_k^{-1} e^1$ we have

$\tilde v^k = \beta \tilde H_k H_k^{-1} e^1 - \beta e^1,$

which is zero except for the last entry, and this one is $\beta h_{k+1,k}$ times the $(k,1)$ entry of $H_k^{-1}$.

$H_k$ can be factored in the form $\tilde F^T \tilde R$, where $\tilde F = \tilde F_{k-1} \cdots \tilde F_1$, and $\tilde F_j$ is the $k$-by-$k$ principal submatrix of $F_j$.

The matrix $\tilde H_k$ after applying the first $k-1$ plane rotations has the form

$(F_{k-1}\cdots F_1)\tilde H_k = \begin{pmatrix} * & * & \cdots & * \\ & * & \cdots & * \\ & & \ddots & \vdots \\ & & & r \\ & & & h \end{pmatrix},$

where $r$ is the $(k,k)$-entry of $\tilde R$ and $h = h_{k+1,k}$.

(19)

Proof ct.

The $k$-th rotation is chosen such as to annihilate the nonzero entry in the last row:

$c_k = \frac{r}{\sqrt{r^2 + h^2}}, \qquad s_k = -\frac{h}{\sqrt{r^2 + h^2}}.$

Since $H_k^{-1} = \tilde R^{-1}\tilde F$, the $(k,1)$-entry of this is $\frac{1}{r}$ times the $(k,1)$-entry of $\tilde F = \tilde F_{k-1} \cdots \tilde F_1$, and this is just $s_1 \cdots s_{k-1}$.

It follows that the nonzero entry of $\tilde v^k$ is $\beta\,\frac{h_{k+1,k}}{r}\, s_1 \cdots s_{k-1}$. Finally, using the fact that $|s_k/c_k| = |h/r| = |h_{k+1,k}/r|$ we obtain the second equality in (1).

From equation (1) it is clear that

$\frac{\|v^k\|}{\|v^{k-1}\|} = |s_k| \quad\text{and}\quad \frac{\|\tilde v^k\|}{\|v^k\|} = \frac{1}{|c_k|}.$

(20)

Quasi-residual

An immediate consequence of Lemma 8.2 is the following relationship between the BiCG residual and the quantity

$z^k_Q = \beta e^1 - \tilde H_k y^k_Q,$

which is related to the residual $r^k_Q$ of the QMR algorithm:

$r^k_Q = V_{k+1} z^k_Q, \qquad \|r^k_Q\| \le \sqrt{k+1}\,\|z^k_Q\|.$

We will refer to $z^k_Q$ as the QMR quasi-residual, the vector whose norm is actually minimized in the QMR algorithm.

(21)

Theorem 8.3

Assume that the Lanczos vectors at steps 1 through $k$ are defined and that the tridiagonal matrix $H_k$ generated by the Lanczos algorithm at step $k$ is nonsingular. Then the BiCG residual $r^k_B$ and the QMR quasi-residual $z^k_Q$ are related by

$\|r^k_B\| \le \frac{\|z^k_Q\|}{\sqrt{1 - (\|z^k_Q\|/\|z^{k-1}_Q\|)^2}}.$

Proof: From $A V_k = V_{k+1}\tilde H_k$, $x^k = x^0 + V_k y^k$ and $H_k y^k = \beta e^1$ it follows that the BiCG residual can be written as

$r^k_B = r^0 - A V_k y^k_B = r^0 - V_{k+1}\tilde H_k y^k_B = V_{k+1}(\beta e^1 - \beta\tilde H_k H_k^{-1} e^1).$

The quantity in parentheses has only one nonzero entry (in its last component), and from $\|v^{k+1}\| = 1$ we get

$\|r^k_B\| = \|\beta e^1 - \beta\tilde H_k H_k^{-1} e^1\|.$

By Lemma 8.2 this is $\|\tilde v^k\|$, while $\|z^k_Q\| = \|v^k\|$, so the assertion follows from (2).

(22)

Discussion

In most cases, the quasi-residual norms and the actual residual norms in the QMR algorithm are of the same order of magnitude:

$\sigma_{\min}(V_{k+1})\,\|z^k_Q\| \le \|r^k_Q\| \le \sqrt{k+1}\,\|z^k_Q\|.$

Theorem 8.3 shows that if the QMR quasi-residual norm is reduced by a significant factor at step $k$, then the BiCG residual norm will be approximately equal to the QMR quasi-residual norm at step $k$, since the denominator on the right hand side will be close to 1.

If the QMR quasi-residual norm remains almost constant, then the denominator will be close to 0, and the BiCG residual norm will be much larger.

Note that the graph of the QMR quasi-residual norm must be very flat before the BiCG residual norm is orders-of-magnitude larger. Peaks in the BiCG residual norm always correspond to plateaus of the QMR quasi-residual norm.

(23)
(24)

Preconditioning

QMR can be preconditioned as before by solving the linear system $M^{-1}Ax = M^{-1}b$ or $AM^{-1}y = b$, $x = M^{-1}y$,

where M ≈ A and linear systems with coefficient matrix M can be solved inexpensively.

The following examples demonstrate the efficiency of left preconditioning.
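A minimal sketch of left preconditioning in this spirit, assuming SciPy's sparse module is available: the test matrix below is a hypothetical 5-point discretization resembling the following examples, and spilu with a drop tolerance is used in the role that MATLAB's luinc plays in the flop counts reported below.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Hypothetical test problem: 5-point discretization of -Delta u - 256 u_y
# on the unit square with mesh size h = 1/128.
m = 127
h = 1.0 / (m + 1)
I = sp.identity(m, format='csr')
D2 = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m)) / h**2   # negative 2nd derivative
D1 = sp.diags([-1.0, 1.0], [-1, 1], shape=(m, m)) / (2.0 * h)       # central 1st derivative
A = (sp.kron(I, D2) + sp.kron(D2, I) - 256.0 * sp.kron(D1, I)).tocsc()
b = A @ np.ones(m * m)                      # manufactured right-hand side

ilu = spla.spilu(A, drop_tol=1e-2)          # M (approximately A), cheap to solve with
MinvA = spla.LinearOperator(
    A.shape,
    matvec=lambda y: ilu.solve(A @ y),                    # y -> M^{-1} A y
    rmatvec=lambda y: A.T @ ilu.solve(y, trans='T'))      # y -> (M^{-1} A)^T y
x, info = spla.qmr(MinvA, ilu.solve(b))     # QMR applied to M^{-1} A x = M^{-1} b
print(info, np.linalg.norm(b - A @ x))
```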

(25)

Example

[Figure: convergence histories of QMR for $-\Delta u - 256\,u_y = 0$, $h = 1/128$]

without preconditioning: 3.668E8 flops
nofill luinc: 6.836E7 + 1.12E5 flops
luinc(a,0.1): 3.360E7 + 4.23E6 flops
luinc(a,0.01): 1.864E7 + 4.23E6 flops
luinc(a,0.001): 9.784E6 + 4.225E6 flops

(26)

Example

[Figure: convergence histories of QMR for $-\Delta u - 256\,u_y - 163.84\,u = 0$, $h = 1/128$]

without preconditioning: 3.130E8 flops
nofill luinc: 7.463E7 + 1.12E5 flops
luinc(a,0.1): 3.624E7 + 4.23E6 flops
luinc(a,0.01): 2.140E7 + 4.23E6 flops
luinc(a,0.001): 1.129E7 + 4.23E6 flops

(27)

Example

[Figure: convergence histories of QMR for $-\Delta u - 256\,u_y - 1638.4\,u = 0$, $h = 1/128$]

without preconditioning
nofill luinc: 8.968E7 + 1.12E5 flops
luinc(a,0.1): 4.019E7 + 4.23E6 flops
luinc(a,0.01): 2.140E7 + 4.23E6 flops

(28)

Two-term recurrence

In addition to the two-sided Lanczos process with 3-term recurrence, a coupled 2-term Lanczos process was introduced by Freund & Nachtigal (1994) to determine biorthogonal bases $\{v^j\}$ and $\{w^j\}$. In this case additional bases $\{p^j\}$ of $\mathcal{K}_k(v^1, A)$ and $\{q^j\}$ of $\mathcal{K}_k(w^1, A^T)$ are determined.

Let $P_k = [p^1, \dots, p^k]$, $Q_k = [q^1, \dots, q^k]$. We want to recover from these matrices the Lanczos bases $v^j$, $w^j$, $j = 1, \dots, k$. That is, we have to find $R_k, \tilde R_k \in \mathbb{R}^{k\times k}$ and $L_k, \tilde L_k \in \mathbb{R}^{(k+1)\times k}$ such that

$V_k = P_k R_k \quad\text{and}\quad W_k = Q_k \tilde R_k$

$A P_k = V_{k+1} L_k \quad\text{and}\quad A^T Q_k = W_{k+1}\tilde L_k$

From $\mathrm{span}\{v^1, \dots, v^j\} = \mathrm{span}\{p^1, \dots, p^j\} = \mathcal{K}_j(v^1, A)$ for $j = 1, \dots, k$ it follows that $R_k$ (and for analogous reasons $\tilde R_k$) is an upper triangular matrix, and

$A p^j = A\sum_{i=1}^{j} r^{(-1)}_{ij} v^i = \sum_{i=1}^{j} r^{(-1)}_{ij} A v^i = \sum_{i=1}^{j+1} \rho_{ij} v^i$

(where $r^{(-1)}_{ij}$ denotes the entries of $R_k^{-1}$) demonstrates that $L_k$ (and correspondingly $\tilde L_k$) is an upper Hessenberg matrix.

(29)

Two-term recurrence ct.

Let $\ell^k$ be the last column of $L_k$. From $W_k^T V_k = I_k$ we get

$A p^k = V_{k+1}\ell^k \quad\text{and}\quad V_k W_k^T A p^k = V_k W_k^T V_{k+1}\ell^k = V_k [I_k, 0]\,\ell^k,$

and assuming $\ell_{k+1,k} = 1$, which is just a scaling of $p^k$, it follows that

$v^{k+1} = A p^k - V_k W_k^T A p^k = A p^k - V_k \tilde R_k^T Q_k^T A p^k,$

and analogously

$w^{k+1} = A^T q^k - W_k V_k^T A^T q^k = A^T q^k - W_k R_k^T P_k^T A^T q^k.$

These two equations demonstrate that $Q_k^T A p^k$ determines the length of the recurrences for the Lanczos vectors $v^k$ and $w^k$.

To obtain a short recurrence for $v^{k+1}$ and $w^{k+1}$ we require $A$-orthogonality of $P_k$ and $Q_k$:

(30)

Two-term recurrence ct.

From

$v^k = \sum_{j=1}^{k} r_{jk} p^j \quad\Longrightarrow\quad p^k = \frac{1}{r_{kk}}\Big(v^k - \sum_{j=1}^{k-1} r_{jk} p^j\Big)$

it follows for $i = 1, \dots, k-2$

$0 = (q^i)^T A p^k = \frac{1}{r_{kk}}\,(q^i)^T A\Big(v^k - \sum_{j=1}^{k-1} r_{jk} p^j\Big) = \frac{1}{r_{kk}}\,(q^i)^T A v^k - \frac{1}{r_{kk}}\sum_{j=1}^{k-1} r_{jk}\,(q^i)^T A p^j = \frac{1}{r_{kk}}\,(A^T q^i)^T v^k - \frac{r_{ik}}{r_{kk}}\, d_i,$

where $d_i = (q^i)^T A p^i$. Now $A^T q^i \in \mathcal{K}_{i+1}(w^1, A^T)$ implies $(A^T q^i)^T v^k = 0$, from which we obtain $r_{ik} = 0$ for $i = 1, \dots, k-2$.

Hence, $R_k$ (and analogously $\tilde R_k$) is a bidiagonal upper triangular matrix.

(31)

Two-term recurrence ct.

Thus $p^k = v^k + \alpha p^{k-1}$, and multiplying by $(q^{k-1})^T A$ from the left yields

$0 = (q^{k-1})^T A p^k = (q^{k-1})^T A v^k + \alpha\,(q^{k-1})^T A p^{k-1},$

i.e.

$p^k = v^k - p^{k-1}\,\frac{(q^{k-1})^T A v^k}{(q^{k-1})^T A p^{k-1}}.$

Analogously, we obtain

$q^k = w^k - q^{k-1}\,\frac{(p^{k-1})^T A^T w^k}{(q^{k-1})^T A p^{k-1}}.$

(32)

Two-term recurrence ct.

From $Q_k^T A P_k = D_k$ and

$v^{k+1} = A p^k - V_k \tilde R_k^T Q_k^T A p^k,$

it follows that $v^{k+1} = A p^k + \alpha v^k$ for some $\alpha \in \mathbb{R}$, and $(w^k)^T v^{k+1} = 0$ yields

$v^{k+1} = A p^k - v^k\,\frac{(w^k)^T A p^k}{(w^k)^T v^k},$

and similarly

$w^{k+1} = A^T q^k - w^k\,\frac{(v^k)^T A^T q^k}{(w^k)^T v^k}.$

Hence, $L_k$ and $\tilde L_k$ are lower bidiagonal $(k+1)\times k$ Hessenberg matrices.

(33)

Two-term recurrence ct.

In conclusion, we obtain the following coupled two-term recurrence:

$p^k = v^k - p^{k-1}\,\dfrac{(q^{k-1})^T A v^k}{(q^{k-1})^T A p^{k-1}}$

$q^k = w^k - q^{k-1}\,\dfrac{(p^{k-1})^T A^T w^k}{(q^{k-1})^T A p^{k-1}}$

$v^{k+1} = A p^k - v^k\,\dfrac{(w^k)^T A p^k}{(w^k)^T v^k}$

$w^{k+1} = A^T q^k - w^k\,\dfrac{(v^k)^T A^T q^k}{(w^k)^T v^k}$

$x^k = x^0 + P_k y^k$, $y^k \in \mathbb{R}^k$ implies

$r^k = r^0 - A P_k y^k = r^0 - V_{k+1} L_k y^k = V_{k+1}(e^1 - L_k y^k),$

and the QMR method can be performed by minimizing $\|e^1 - L_k y\|_2$ as before.
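A minimal NumPy sketch of this coupled recurrence (hypothetical helper; no normalization, no look-ahead, no breakdown checks) which only generates the four sequences, e.g. to check numerically that $W_k^T V_k$ and $Q_k^T A P_k$ are diagonal:

```python
import numpy as np

def coupled_two_term(A, v1, w1, k_max):
    """Sketch of the coupled two-term recurrence above."""
    V, W = [v1], [w1]          # Lanczos vectors v^k, w^k
    P, Q = [], []              # A-biorthogonal direction vectors p^k, q^k
    for k in range(k_max):
        v, w = V[k], W[k]
        if k == 0:
            p, q = v.copy(), w.copy()
        else:
            d = Q[k-1] @ (A @ P[k-1])                      # (q^{k-1})^T A p^{k-1}
            p = v - P[k-1] * ((Q[k-1] @ (A @ v)) / d)
            q = w - Q[k-1] * ((P[k-1] @ (A.T @ w)) / d)
        P.append(p); Q.append(q)
        delta = w @ v                                       # (w^k)^T v^k
        Ap, Atq = A @ p, A.T @ q
        V.append(Ap - v * ((w @ Ap) / delta))               # v^{k+1}
        W.append(Atq - w * ((v @ Atq) / delta))             # w^{k+1}
    return V, W, P, Q
```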

(34)

Properties of 2-term QMR

Work and storage requirements are low and comparable to the original implementation.

Direction vectors used to update xk also have a two-term recurrence.

Numerical examples indicate that the numerical properties are better. Typically, $\kappa(L_k) < \kappa(H_k)$, and sometimes $\kappa(L_k) \ll \kappa(H_k)$.

(35)

Example

[Figure: residual norms $\|r_{QMR,3\,term}\|_2$ and $\|r_{QMR,2\,term}\|_2$ for $-\Delta u - 256\,u_y = 0$, $h = 1/128$]

(36)

Example

[Figure: residual norms $\|r_{QMR,2\,term}\|_2$ and $\|r_{QMR,3\,term}\|_2$ for $-\Delta u - 256\,u_y - 163.84\,u = 0$, $h = 1/128$]

(37)

Example

[Figure: residual norms $\|r_{QMR,3\,term}\|_2$ and $\|r_{QMR,2\,term}\|_2$ for $-\Delta u - 256\,u_y - 1638.4\,u = 0$, $h = 1/128$]

(38)

Look–ahead

The look-ahead Lanczos process prevents possible breakdowns of the classical Lanczos method by relaxing the biorthogonality condition.

Look-ahead Lanczos methods were introduced by Parlett, Taylor & Liu (1985) for solving eigenvalue problems numerically; in connection with QMR methods for sparse linear systems they were studied by Freund, Gutknecht & Nachtigal (1993, 1994).

It fills the gap between regular vectors $v^{k_j}$ and $v^{k_{j+1}}$ (and between $w^{k_j}$ and $w^{k_{j+1}}$) with $k_{j+1} > k_j + 1$ by adding vectors

$v^k \in \mathcal{K}_k(v^1, A)\setminus\mathcal{K}_{k-1}(v^1, A), \qquad w^k \in \mathcal{K}_k(w^1, A^T)\setminus\mathcal{K}_{k-1}(w^1, A^T), \qquad k = k_j, \dots, k_{j+1}-1.$

These inner vectors can be chosen such that the vectors $v^j$ and $w^k$ from different blocks are biorthogonal, where the vectors $v^i$, $i = k_j, \dots, k_{j+1}-1$, are referred to as the $j$-th block.

(39)

Two-sided Lanczos method

1: Choose $v^1, w^1$ with $(v^1)^T w^1 \neq 0$; $v^0 = 0$; $w^0 = 0$; $\beta_0 = 0$; $\gamma_0 = 0$
2: for $k = 1, 2, \dots$ do
3:   $\tilde v = A v^k$; $\tilde w = A^T w^k$
4:   $\alpha_k = \tilde v^T w^k$
5:   $\tilde v = \tilde v - \alpha_k v^k - \beta_{k-1} v^{k-1}$
6:   $\tilde w = \tilde w - \alpha_k w^k - \gamma_{k-1} w^{k-1}$
7:   $\gamma_k = \|\tilde v\|_2$
8:   if $\gamma_k == 0$ then
9:     STOP
10:  end if
11:  $v^{k+1} = \tilde v/\gamma_k$
12:  $\beta_k = \tilde w^T v^{k+1}$
13:  if $\beta_k == 0$ then
14:    STOP
15:  end if
16:  $w^{k+1} = \tilde w/\beta_k$
17: end for

(40)

Look–ahead ct.

Basic idea:

Compute pairs of inner vectors $v^j$ and $w^j$, $j = k+1, \dots, k'$, that satisfy a relaxed biorthogonality condition

$(w^i)^T v^j = (w^j)^T v^i = 0, \qquad i = 1, \dots, k-1,$

until it is safe to compute the next pair of regular vectors $v^{k'+1}$ and $w^{k'+1}$.

The look-ahead Lanczos method generates blocks of vectors of length $k'+1-k$

$V^{(\ell)} = [v^k\ v^{k+1}\ \dots\ v^{k'}], \qquad W^{(\ell)} = [w^k\ w^{k+1}\ \dots\ w^{k'}]$

($k' = k$ for the standard Lanczos algorithm) such that

$[v^1\ v^2\ \dots\ v^k] = [V^{(1)}\ V^{(2)}\ \dots\ V^{(\ell)}], \qquad [w^1\ w^2\ \dots\ w^k] = [W^{(1)}\ W^{(2)}\ \dots\ W^{(\ell)}].$

The biorthogonality requirement of the standard Lanczos method is replaced by the block biorthogonality

$(W^{(j)})^T V^{(\ell)} = (W^{(\ell)})^T V^{(j)} = 0, \qquad j = 1, \dots, \ell-1,$

and $H_k = W_k^T A V_k$ becomes block tridiagonal.

(41)

Look ahead Lanczos method

Choose $v^1, w^1$ with $\|v^1\|_2 = 1$, $\|w^1\|_2 = 1$
$\tilde V_1 = v^1$; $\tilde W_1 = w^1$; $\tilde D_1 := \tilde W_1^T \tilde V_1$
$\tilde V_0 = \emptyset$; $\tilde W_0 = \emptyset$; $v^0 = 0$; $w^0 = 0$
$k_1 = 1$; $p = 1$; $\rho_1 = 1$; $\xi_1 = 1$
for $k = 1, 2, \dots$ do
  if $\det \tilde D_p \neq 0$ then
    $\tilde v^{k+1} = A v^k - \tilde V_p \tilde D_p^{-1}\tilde W_p^T A v^k - \tilde V_{p-1}\tilde D_{p-1}^{-1}\tilde W_{p-1}^T A v^k$
    $\tilde w^{k+1} = A^T w^k - \tilde W_p \tilde D_p^{-1}\tilde V_p^T A^T w^k - \tilde W_{p-1}\tilde D_{p-1}^{-1}\tilde V_{p-1}^T A^T w^k$
    $k_{p+1} = k+1$; $p = p+1$; $\tilde V_p = \emptyset$; $\tilde W_p = \emptyset$
  else
    $\tilde v^{k+1} = A v^k - \zeta_k v^k - (\eta_k/\rho_k) v^{k-1} - \tilde V_{p-1}\tilde D_{p-1}^{-1}\tilde W_{p-1}^T A v^k$
    $\tilde w^{k+1} = A^T w^k - \zeta_k w^k - (\eta_k/\rho_k) w^{k-1} - \tilde W_{p-1}\tilde D_{p-1}^{-1}\tilde V_{p-1}^T A^T w^k$
  end if

(42)

Look ahead Lanczos ct.

  $\rho_{k+1} = \|\tilde v^{k+1}\|_2$
  $\xi_{k+1} = \|\tilde w^{k+1}\|_2$
  if $\rho_{k+1}\xi_{k+1} == 0$ then
    STOP
  end if
  $v^{k+1} = \tilde v^{k+1}/\rho_{k+1}$
  $w^{k+1} = \tilde w^{k+1}/\xi_{k+1}$
  $\tilde V_p = (\tilde V_p, v^{k+1})$
  $\tilde W_p = (\tilde W_p, w^{k+1})$
  $\tilde D_p = \tilde W_p^T \tilde V_p$
end for

(43)

Remarks

1. $\zeta_k$ and $\eta_k$ are coefficients which (with the exception of $\eta_{k_j} = 0$) can be chosen arbitrarily. Since the blocks built in the look-ahead Lanczos method are usually small (Freund, Gutknecht & Nachtigal report that the maximum size they observed in their numerical examples was 4), the choice of $\zeta_k$ and $\eta_k$ does not greatly influence the overall iteration.

2. In the algorithm above, inner vectors are only determined if a serious breakdown would appear in exact arithmetic. In practice a look-ahead Lanczos algorithm must also be able to handle near-breakdowns, i.e. $(w^k)^T v^k \approx 0$, but $v^k \not\approx 0$, $w^k \not\approx 0$.

To check the regularity of $\tilde D_p$, QMRPACK monitors the minimal singular value of $\tilde D_p$.

3. If only regular steps are performed then the look-ahead Lanczos process reduces to the two-sided Lanczos method.

(44)

Two-term recurrence

For the coupled 2-term recurrences breakdown is possible not only because $(w^k)^T v^k \approx 0$, but also because $d_k = (q^k)^T A p^k \approx 0$.

The remedy: look-ahead in both sequences.

There exists a (public domain) FORTRAN implementation by Freund & Nachtigal (1993).

(45)

TFQMR

There exists a transpose-free QMR method (Freund (1993)).

The name is a little misleading. Actually, TFQMR is not a transpose-free variant of the QMR method, but a QMR variant of the transpose-free CGS method.

(46)

TFQMR ct.

The CGS method is given by

1: $r^0 = b - Ax^0$; $d^1 = r^0$; $u^0 = r^0$
2: Choose $\tilde r^0$ with $\alpha_0 = (\tilde r^0)^T r^0 \neq 0$
3: for $k = 1, 2, \dots$ until convergence do
4:   $s^k = A d^k$
5:   $\gamma_k = (\tilde r^0)^T s^k$
6:   $\tau_k = \alpha_{k-1}/\gamma_k$
7:   $q^k = u^{k-1} - \tau_k s^k$
8:   $w^k = u^{k-1} + q^k$
9:   $x^k = x^{k-1} + \tau_k w^k$
10:  $r^k = r^{k-1} - \tau_k A w^k$
11:  $\beta_k = 1/\alpha_{k-1}$; $\alpha_k = (\tilde r^0)^T r^k$; $\beta_k = \alpha_k\beta_k$
12:  $u^k = r^k + \beta_k q^k$
13:  $d^{k+1} = u^k + \beta_k(q^k + \beta_k d^k)$
14: end for
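A direct NumPy transcription of this pseudocode (only a sketch: fixed iteration count, no breakdown safeguards; the function name cgs is ours, not from the slides, and the choice tilde r^0 = r^0 is one admissible option):

```python
import numpy as np

def cgs(A, b, x, n_iter):
    r = b - A @ x
    r_tld = r.copy()                 # tilde r^0 := r^0 (one admissible choice)
    d = r.copy(); u = r.copy()
    alpha = r_tld @ r                # alpha_0 = (tilde r^0)^T r^0, assumed nonzero
    for _ in range(n_iter):
        s = A @ d
        gamma = r_tld @ s
        tau = alpha / gamma
        q = u - tau * s
        w = u + q
        x = x + tau * w
        r = r - tau * (A @ w)
        beta = 1.0 / alpha
        alpha = r_tld @ r
        beta = alpha * beta          # beta_k = alpha_k / alpha_{k-1}
        u = r + beta * q
        d = u + beta * (q + beta * d)
    return x
```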

(47)

TFQMR ct.

The CGS algorithm can be interpreted in the following way: $x^k$ is updated in two steps, $x^{k+1/2} = x^k + \tau_k u^k$ and $x^{k+1} = x^{k+1/2} + \tau_k q^k$, corresponding to the increase of the degree of the residual polynomial by 2 in each step. To avoid indices which are multiples of 1/2 we double the indices in the CGS method. Then the statements in the for-loop obtain the following form:

1: $\tau_{2k} = (\tilde r^0)^T r^{2k}/(\tilde r^0)^T A d^{2k}$
2: $q^{2k} = u^{2k} - \tau_{2k} A d^{2k}$
3: $x^{2k+2} = x^{2k} + \tau_{2k}(u^{2k} + q^{2k})$
4: $r^{2k+2} = r^{2k} - \tau_{2k} A(u^{2k} + q^{2k})$
5: $\beta_{2k} = (\tilde r^0)^T r^{2k+2}/(\tilde r^0)^T r^{2k}$
6: $u^{2k+2} = r^{2k+2} + \beta_{2k} q^{2k}$
7: $d^{2k+2} = u^{2k+2} + \beta_{2k}(q^{2k} + \beta_{2k} d^{2k})$

(48)

TFQMR ct.

$x^{2k+2}$ is updated in two steps:

$x^{2k+1} = x^{2k} + \tau_{2k} u^{2k}$

$x^{2k+2} = x^{2k+1} + \tau_{2k} q^{2k}$

Putting $u^{2k+1} := q^{2k}$ and $\tau_{2k+1} := \tau_{2k}$, both statements read

$x^k = x^{k-1} + \tau_{k-1} u^{k-1}.$

For even $k$ this is exactly the approximation from the CGS method; for odd $k$ these approximations of the solution do not appear in the CGS method.

(49)

TFQMR ct.

With the matrix

$U_k = (u^0, \dots, u^{k-1})$

and the vector

$z^k = (\tau_0, \tau_1, \dots, \tau_{k-1})^T$

the approximation $x^k$ can be written as

$x^k = x^0 + U_k z^k = x^{k-1} + \tau_{k-1} u^{k-1},$

and the residual is

$r^k = r^0 - A U_k z^k = r^{k-1} - \tau_{k-1} A u^{k-1}.$

The last equation yields

$A u^k = \frac{1}{\tau_k}(r^k - r^{k+1}).$

(50)

TFQMR ct.

Hence, $A U_k = R_{k+1}\tilde B_k$ with $R_{k+1} = (r^0, r^1, \dots, r^k)$ and

$\tilde B_k = \begin{pmatrix} 1 & & & & \\ -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \\ & & & & -1 \end{pmatrix}\,\mathrm{diag}\Big(\frac{1}{\tau_0}, \frac{1}{\tau_1}, \dots, \frac{1}{\tau_{k-1}}\Big),$

and the residual is

$r^k = r^0 - A U_k z^k = R_{k+1}(e^1 - \tilde B_k z^k).$

(51)

TFQMR ct.

If the columns of $R_{k+1}$ are scaled by the diagonal matrix $\Delta_{k+1} = \mathrm{diag}\{\delta_0, \dots, \delta_k\}$, then one obtains

$r^k = R_{k+1}\Delta_{k+1}^{-1}\Delta_{k+1}(e^1 - \tilde B_k z^k) = R_{k+1}\Delta_{k+1}^{-1}(\delta_0 e^1 - \tilde H_k z^k)$

with $\tilde H_k := \Delta_{k+1}\tilde B_k$.

It is easily seen that the approximations $x^{2k}$ of the CGS method are given by

$x^{2k} = x^0 + U_{2k} H_{2k}^{-1}(\delta_0 e^1).$

(52)

TFQMR ct.

Solving (similarly as in the transition from the BiCG to the GMRES method) the least squares problem

$\|\delta_0 e^1 - \tilde H_k z\|_2 = \min!$

one obtains the approximation

$x^k = x^0 + U_k z^k$

of the TFQMR method.

To implement this method we still have to derive how to update $x^k$, since otherwise all previous vectors $u^k$ would have to be stored. This can be found in Saad: Iterative Methods for Sparse Linear Systems.

(53)

TFQMR algorithm

$r = b - Ax$; $\tilde r = r$; $w = r$; $u = r$; $v = Au$; $d = 0$
$\tau = \|r\|_2$; $\theta = 0$; $\eta = 0$; $\rho = \tilde r^T r$; $z_0 = Au$
for $k = 1, 2, \dots$ do
  $\alpha_1 = \rho/(v^T\tilde r)$; $\alpha_0 = \alpha_1$
  $u_0 = u$; $u = u - \alpha_0 v$
  $w = w - \alpha_0 z_0$
  $d = u_0 + (\theta^2\eta/\alpha_0)\, d$
  $\theta = \|w\|_2/\tau$; $c = 1/\sqrt{1+\theta^2}$; $\tau = \tau\theta c$; $\eta = c^2\alpha_0$
  $x = x + \eta d$
  $z_1 = Au$
  $w = w - \alpha_1 z_1$
  $d = u + (\theta^2\eta/\alpha_1)\, d$
  $\theta = \|w\|_2/\tau$; $c = 1/\sqrt{1+\theta^2}$; $\tau = \tau\theta c$; $\eta = c^2\alpha_1$
  $x = x + \eta d$
  $\beta = 1/\rho$; $\rho = \tilde r^T w$; $\beta = \beta\rho$
  $u = w + \beta u$
  $z_0 = Au$
  $v = z_0 + \beta(z_1 + \beta v)$
end for
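A direct NumPy transcription of the pseudocode above (again only a sketch with a fixed iteration count and no convergence test or breakdown safeguards):

```python
import numpy as np

def tfqmr(A, b, x, n_iter):
    r = b - A @ x
    r_tld = r.copy()
    w = r.copy(); u = r.copy()
    v = A @ u; z0 = v.copy()
    d = np.zeros_like(b)
    tau = np.linalg.norm(r); theta = 0.0; eta = 0.0
    rho = r_tld @ r
    for _ in range(n_iter):
        alpha0 = rho / (v @ r_tld); alpha1 = alpha0
        u0 = u.copy()
        u = u - alpha0 * v
        w = w - alpha0 * z0
        d = u0 + (theta**2 * eta / alpha0) * d
        theta = np.linalg.norm(w) / tau
        c = 1.0 / np.sqrt(1.0 + theta**2)
        tau = tau * theta * c; eta = c**2 * alpha0
        x = x + eta * d
        z1 = A @ u
        w = w - alpha1 * z1
        d = u + (theta**2 * eta / alpha1) * d
        theta = np.linalg.norm(w) / tau
        c = 1.0 / np.sqrt(1.0 + theta**2)
        tau = tau * theta * c; eta = c**2 * alpha1
        x = x + eta * d
        beta = rho
        rho = r_tld @ w
        beta = rho / beta
        u = w + beta * u
        z0 = A @ u
        v = z0 + beta * (z1 + beta * v)
    return x
```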

(54)

Cost of TFQMR

2 matrix-vector products
4 scalar products
10 _axpy

Storage requirements: 10 vectors

(55)
(56)

Example

(57)
(58)

QMRBiCGStab

In the same way as the QMR variant TFQMR was deduced from the CGS method, Chan, Gallopoulos, Simoncini, Szeto and Tong (1994) derived the QMR variant QMRBiCGStab from the BiCGStab method.

(59)

QMRBiCGStab algorithm

1: $r = b - Ax$; $\tilde r = r$; $u = r$; $v = Au$; $d = 0$
2: $\tau = \|r\|_2$; $\rho = \tilde r^T r$; $\theta = 0$; $\eta = 0$
3: for $k = 1, 2, \dots$ until convergence do
4:   $\alpha = \rho/(\tilde r^T v)$; $s = r - \alpha v$
5:   $\tilde\theta = \|s\|_2/\tau$; $c = 1/\sqrt{1+\tilde\theta^2}$
6:   $d = u + (\theta^2\eta/\alpha)\, d$
7:   $\eta = c^2\alpha$; $x = x + \eta d$
8:   $\tau = \tau\tilde\theta c$
9:   $w = As$
10:  $\omega = w^T s/(w^T w)$
11:  $r = s - \omega w$
12:  $\theta = \|r\|_2/\tau$; $c = 1/\sqrt{1+\theta^2}$
13:  $d = s + (\tilde\theta^2\eta/\omega)\, d$
14:  $\eta = c^2\omega$
15:  $x = x + \eta d$
16:  $\tau = \tau\theta c$; $\beta = 1/\rho$; $\rho = \tilde r^T r$; $\beta = \beta\rho\alpha/\omega$
17:  $u = r + \beta(u - \omega v)$
18:  $v = Au$
19: end for
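As for TFQMR, a direct NumPy transcription of this pseudocode may serve as a sketch (fixed iteration count, no safeguards; the function name qmr_bicgstab is ours, not from the slides):

```python
import numpy as np

def qmr_bicgstab(A, b, x, n_iter):
    r = b - A @ x
    r_tld = r.copy(); u = r.copy(); v = A @ u
    d = np.zeros_like(b)
    tau = np.linalg.norm(r); rho = r_tld @ r
    theta = 0.0; eta = 0.0
    for _ in range(n_iter):
        alpha = rho / (r_tld @ v)
        s = r - alpha * v
        theta_t = np.linalg.norm(s) / tau          # theta~ of the first half step
        c = 1.0 / np.sqrt(1.0 + theta_t**2)
        d = u + (theta**2 * eta / alpha) * d
        eta = c**2 * alpha
        x = x + eta * d
        tau = tau * theta_t * c
        w = A @ s
        omega = (w @ s) / (w @ w)
        r = s - omega * w
        theta = np.linalg.norm(r) / tau            # theta of the second half step
        c = 1.0 / np.sqrt(1.0 + theta**2)
        d = s + (theta_t**2 * eta / omega) * d
        eta = c**2 * omega
        x = x + eta * d
        tau = tau * theta * c
        beta = rho
        rho = r_tld @ r
        beta = (rho / beta) * alpha / omega
        u = r + beta * (u - omega * v)
        v = A @ u
    return x
```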

(60)

Cost of QMRBiCGStab

2 matrix-vector products
6 scalar products
8 _axpy

Storage requirements: 8 vectors

(61)

Example

[Figure: residual norms $\|r_{TFQMR}\|_2$ and $\|r_{QMRBCGStab}\|_2$; TFQMR and QMRBCGStab for $-\Delta u - 256\,u_y = 0$, $h = 1/128$]

(62)

Example

[Figure: residual norms $\|r_{TFQMR}\|_2$ and $\|r_{QMRBCGStab}\|_2$; TFQMR and QMRBCGStab for $-\Delta u - 256\,u_y - 163.84\,u = 0$, $h = 1/128$]

(63)

Example

[Figure: residual norms $\|r_{QMRBCGStab}\|_2$ and $\|r_{TFQMR}\|_2$; TFQMR and QMRBCGStab for $-\Delta u - 256\,u_y - 1638.4\,u = 0$, $h = 1/128$]
