ITERATIVE PROJECTION METHODS FOR
SPARSE LINEAR SYSTEMS AND
EIGENPROBLEMS
CHAPTER 10: KRYLOV SUBSPACE METHODS
Heinrich Voss
voss@tu-harburg.de
Hamburg University of Technology, Institute of Numerical Simulation
Lanczos and Arnoldi Method
We already discussed in Sections 4, 5, and 6 how to construct an orthonormal basis $V_m$ of the Krylov space $\mathcal K_m(v_1, A)$ by the Arnoldi method and, for symmetric matrices, by the Lanczos method.
Along with the basis one obtains the orthogonal projection $H_m := V_m^T A V_m$ ($H_m := V_m^H A V_m$ in the complex case) of $A$ onto $\mathcal K_m(v_1, A)$, which is a Hessenberg matrix for a general matrix $A$ and tridiagonal for symmetric (Hermitian) $A$.
In Section 8 we discussed the two-sided Lanczos process, which constructs bases $V_m$ and $W_m$ of the Krylov spaces $\mathcal K_m(v_1, A)$ and $\mathcal K_m(w_1, A^T)$, respectively, along with the oblique projection of $A$ onto $\mathcal K_m(v_1, A)$ along $\mathcal K_m(w_1, A^T)$:
$$T_m = W_m^T A V_m.$$
$T_m$ is a tridiagonal matrix. If look-ahead is included, then $T_m$ is block-tridiagonal.
Arnoldi method; compact form
$$A V_m = V_m H_m + h_{m+1,m}\, v_{m+1} e_m^T, \qquad V_m^T V_m = I_m, \quad V_m^T v_{m+1} = 0,$$
$$V_m^T A V_m = H_m = \begin{pmatrix} h_{11} & h_{12} & h_{13} & \dots & h_{1m} \\ h_{21} & h_{22} & h_{23} & \dots & h_{2m} \\ 0 & h_{32} & h_{33} & \dots & h_{3m} \\ & \ddots & \ddots & \ddots & \vdots \\ & & & h_{m,m-1} & h_{mm} \end{pmatrix}.$$
The eigenvalue problem $H_m s = \theta s$ can be solved inexpensively by the QR algorithm.
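The compact form can be checked numerically. The following is a minimal numpy sketch (not from the lecture; matrix size, seed, and the helper name `arnoldi` are made up for illustration): it builds $V_{m+1}$ and the augmented Hessenberg matrix, then verifies $A V_m - V_m H_m = h_{m+1,m} v_{m+1} e_m^T$ and the orthonormality of the basis.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Arnoldi with modified Gram-Schmidt.
    Returns V (n x (m+1)) and the augmented Hessenberg H ((m+1) x m)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for k in range(j + 1):            # orthogonalize against v_1, ..., v_{j+1}
            H[k, j] = V[:, k] @ w
            w -= H[k, j] * V[:, k]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
n, m = 200, 20
A = rng.standard_normal((n, n))
V, H = arnoldi(A, rng.standard_normal(n), m)
Vm, Hm = V[:, :m], H[:m, :]

# compact form: A V_m - V_m H_m = h_{m+1,m} v_{m+1} e_m^T
R = A @ Vm - Vm @ Hm
R[:, m - 1] -= H[m, m - 1] * V[:, m]
print(np.linalg.norm(R))                      # near machine precision
print(np.linalg.norm(Vm.T @ Vm - np.eye(m)))  # orthonormal basis
theta = np.linalg.eigvals(Hm)                 # Ritz values (QR algorithm inside LAPACK)
```

The small dense eigenproblem for $H_m$ is solved by `np.linalg.eigvals`, which calls the QR algorithm mentioned above.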
Lucky termination
The Arnoldi method may terminate with $h_{j+1,j} = 0$ for some $j$. Then $v_{j+1} = 0$, and therefore
$$A v_j = \sum_{k=1}^{j} h_{kj} v_k \in \mathcal K_j(A, v_1).$$
For $i < j$ it holds by construction that
$$A v_i = \sum_{k=1}^{i+1} h_{ki} v_k \in \mathcal K_j(A, v_1).$$
Hence, $\mathcal K_j(A, v_1)$ is an invariant subspace of $A$, and therefore every eigenvalue $\theta_i^{(j)}$ of $H_j$ is an eigenvalue of $A$, and the corresponding Ritz vectors are eigenvectors of $A$.
Error indicator
If $\theta_i^{(m)}$ are the Ritz values (eigenvalues of $H_m$), $s_i^{(m)}$ the corresponding eigenvectors, and $x_i^{(m)} = V_m s_i^{(m)}$ the Ritz vectors, then (as for the Lanczos method) it holds that
$$(A - \theta_i^{(m)} I)\, x_i^{(m)} = h_{m+1,m}\, v_{m+1}\, e_m^T s_i^{(m)},$$
which implies
$$\|(A - \theta_i^{(m)} I)\, x_i^{(m)}\|_2 = h_{m+1,m}\, |s_{m,i}^{(m)}|.$$
If $A$ is symmetric, then by the theorem of Krylov & Bogoliubov there exists an eigenvalue $\tilde\lambda$ of $A$ such that
$$|\tilde\lambda - \theta_i^{(m)}| \le h_{m+1,m}\, |s_{m,i}^{(m)}|.$$
Notice that this error bound can be computed without determining the Ritz vector $x_i^{(m)} = V_m s_i^{(m)}$.
If $A$ is non-normal, then the Krylov & Bogoliubov theorem does not hold. Nonetheless, the backward error $h_{m+1,m}\, |s_{m,i}^{(m)}|$ is still used as an error indicator in this case.
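The identity and the Krylov & Bogoliubov bound can be illustrated as follows (a sketch; the symmetric test matrix, seed, and dimensions are invented): the cheap indicator $h_{m+1,m}|s_{m,i}^{(m)}|$ is compared against the true residual norms, and each Ritz value is checked to lie within the indicator of some eigenvalue of $A$.

```python
import numpy as np

def arnoldi(A, v1, m):
    # standard Arnoldi; for symmetric A, H_m is tridiagonal up to roundoff
    n = A.shape[0]
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for k in range(j + 1):
            H[k, j] = V[:, k] @ w
            w -= H[k, j] * V[:, k]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(1)
n, m = 300, 25
A = rng.standard_normal((n, n)); A = (A + A.T) / 2     # symmetric test matrix
V, H = arnoldi(A, rng.standard_normal(n), m)
theta, S = np.linalg.eigh(H[:m, :])                    # Ritz values and vectors of H_m
cheap = H[m, m - 1] * np.abs(S[m - 1, :])              # h_{m+1,m} |s_{m,i}|
X = V[:, :m] @ S                                       # Ritz vectors x_i = V_m s_i
true = np.linalg.norm(A @ X - X * theta, axis=0)       # ||(A - theta_i I) x_i||_2
print(np.max(np.abs(cheap - true)))                    # identity holds to roundoff

# Krylov & Bogoliubov: each theta_i lies within cheap[i] of some eigenvalue of A
lam = np.linalg.eigvalsh(A)
dist = np.min(np.abs(lam[:, None] - theta[None, :]), axis=0)
print(np.all(dist <= cheap + 1e-8))
```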
Arnoldi method
1: choose initial vector $v_1$ with $\|v_1\| = 1$, $V_1 = [v_1]$
2: compute $w = A v_1$, $h = v_1^T w$, $r = w - v_1 h$, $H_1 = [h]$, $\beta = \|r\|_2$
3: for $j = 1, 2, \dots$ do
4:   $v_{j+1} = r / \beta$
5:   $V_{j+1} = [V_j, v_{j+1}]$, $\hat H_j = \begin{pmatrix} H_j \\ \beta e_j^T \end{pmatrix}$
6:   $w = A v_{j+1}$
7:   $h = V_{j+1}^T w$, $r = w - V_{j+1} h$
8:   if $\|r\|_2 < \eta \|h\|_2$ then
9:     $s = V_{j+1}^T r$, $r = r - V_{j+1} s$
10:    $h = h + s$
11:   end if
12:   $H_{j+1} = [\hat H_j, h]$, $\beta = \|r\|_2$
13:   compute approximate eigenvalues of $H_{j+1}$
14:   test for convergence
15: end for
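The listing above, including the conditional reorthogonalization in steps 8-11, can be sketched in numpy as follows (the value of $\eta$ and the test matrix are choices for illustration, not prescribed by the slide):

```python
import numpy as np

def arnoldi_reorth(A, v1, m, eta=1 / np.sqrt(2)):
    """Arnoldi with classical Gram-Schmidt and one conditional
    reorthogonalization pass, following steps 4-12 of the listing."""
    n = A.shape[0]
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        Vj = V[:, : j + 1]
        h = Vj.T @ w                        # step 7: h = V^T w
        r = w - Vj @ h
        if np.linalg.norm(r) < eta * np.linalg.norm(h):
            s = Vj.T @ r                    # steps 9-10: reorthogonalize once
            r -= Vj @ s
            h += s
        H[: j + 1, j] = h
        H[j + 1, j] = np.linalg.norm(r)     # beta
        V[:, j + 1] = r / H[j + 1, j]
    return V, H

rng = np.random.default_rng(2)
n, m = 150, 30
A = rng.standard_normal((n, n))
V, H = arnoldi_reorth(A, rng.standard_normal(n), m)
print(np.linalg.norm(V.T @ V - np.eye(m + 1)))   # basis stays orthonormal
print(np.linalg.norm(A @ V[:, :m] - V @ H))      # A V_m = V_{m+1} H~_m
```

The "twice is enough" heuristic behind the $\eta$-test keeps classical Gram-Schmidt numerically orthogonal at the cost of at most one extra pass per step.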
Cost
For large matrices the Arnoldi method becomes costly, both in terms of computation and storage.
We need to keep $m$ vectors of length $n$ plus an $m \times m$ Hessenberg matrix. For the arithmetic cost, we need to multiply $v_{j+1}$ by $A$, at a cost of $2 N_z$ operations, where $N_z$ is the number of nonzero elements of $A$, and then orthogonalize the result against $j$ basis vectors, at a cost of $4(j+1)n$ operations.
Thus, an $m$-dimensional Arnoldi process costs approximately $nm + 0.5 m^2$ in storage and approximately $2 m N_z + 2 n m^2$ arithmetic operations.
In the symmetric case only the last two basis vectors $v_j$ are needed when determining the projection $H_m$. The previous vectors are not even needed to determine an error bound, and can be kept in secondary storage until the Ritz vectors are computed.
Convergence
The Arnoldi method is a generalization of the power method, where approximations to eigenvectors are obtained from the last iterate only (or the last two iterates in the case of a complex eigenvalue).
As for the power method, we can therefore expect fast convergence to the eigenvalue of maximal modulus and to the corresponding eigenvector.
One can influence the convergence of the power method by shifts, either to separate the wanted eigenvalue further from the remaining spectrum or to enforce convergence to a different eigenvalue.
The Arnoldi method is independent of shifts, since
$$\mathcal K_m(v_1, A) = \mathcal K_m(v_1, A + \alpha I) \quad \text{for all } \alpha \in \mathbb C.$$
Hence, we can expect convergence of the Arnoldi method to extreme eigenvalues first.
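The shift invariance can be observed directly: running Arnoldi on $A$ and on $A + \alpha I$ with the same starting vector produces the same basis, so the projected matrices differ exactly by $\alpha I$. A small sketch (matrix, shift, and seed are invented):

```python
import numpy as np

def arnoldi(A, v1, m):
    n = A.shape[0]
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for k in range(j + 1):
            H[k, j] = V[:, k] @ w
            w -= H[k, j] * V[:, k]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(3)
n, m, alpha = 120, 15, 7.5
A = rng.standard_normal((n, n))
v1 = rng.standard_normal(n)
_, H1 = arnoldi(A, v1, m)
_, H2 = arnoldi(A + alpha * np.eye(n), v1, m)
# same Krylov space and basis, hence H_m(A + alpha I) = H_m(A) + alpha I,
# and the Ritz values are simply shifted by alpha
print(np.linalg.norm(H2[:m, :] - (H1[:m, :] + alpha * np.eye(m))))
```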
Example
Eigenvalues (blue plus) of a random tridiagonal $100 \times 100$ matrix and approximations (red circle) after 10 steps of Arnoldi: [figure]
Lanczos algorithm
We analyze the convergence of the Lanczos method.
1: Choose initial vector $v_1$ with $\|v_1\| = 1$
2: Set $v_0 = 0$; $\beta_0 = 0$
3: for $j = 1, 2, \dots$ do
4:   $v_{j+1} = A v_j - \beta_{j-1} v_{j-1}$
5:   $\alpha_j = (v_j)^H v_{j+1}$
6:   $v_{j+1} = v_{j+1} - \alpha_j v_j$
7:   $\beta_j = \|v_{j+1}\|$
8:   $v_{j+1} = v_{j+1} / \beta_j$
9: end for
The matrix of the projected problem then is the tridiagonal matrix
$$T_m = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \beta_2 & \\ & \ddots & \ddots & \ddots \\ & & \beta_{m-1} & \alpha_m \end{pmatrix}.$$
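The three-term recurrence above can be sketched in a few lines of numpy (no reorthogonalization, so only suitable for moderate $m$ before orthogonality is lost; the test matrix and seed are invented). The Ritz values, i.e. the eigenvalues of $T_m$, must lie inside the spectrum of $A$:

```python
import numpy as np

def lanczos(A, v1, m):
    """Plain Lanczos three-term recurrence; returns the tridiagonal T_m."""
    n = A.shape[0]
    alpha = np.zeros(m); beta = np.zeros(m)
    v_old = np.zeros(n)
    v = v1 / np.linalg.norm(v1)
    b = 0.0
    for j in range(m):
        w = A @ v - b * v_old          # step 4
        alpha[j] = v @ w               # step 5
        w -= alpha[j] * v              # step 6
        b = np.linalg.norm(w)          # step 7
        beta[j] = b
        v_old, v = v, w / b            # step 8
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

rng = np.random.default_rng(4)
n = 400
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric test matrix
T = lanczos(A, rng.standard_normal(n), 30)
theta = np.linalg.eigvalsh(T)
lam = np.linalg.eigvalsh(A)
print(lam[0], theta[0], theta[-1], lam[-1])   # Ritz values inside the spectrum
```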
Monotonicity
Assume that the eigenvalues of $A$ and the Ritz values with respect to $\mathcal K_m$ are ordered by magnitude:
$$\lambda_1 \le \lambda_2 \le \dots \le \lambda_n, \qquad \theta_1^{(m)} \le \theta_2^{(m)} \le \dots \le \theta_m^{(m)}.$$
Then the minmax principle yields ($S$ denotes a subspace of $\mathbb C^m$ and $\tilde S$ a subspace of $\mathbb C^n$)
$$\theta_j^{(m)} = \min_{\dim S = j} \max_{y \in S} \frac{y^H T_m y}{y^H y} = \min_{\dim S = j} \max_{y \in S} \frac{y^H V_m^H A V_m y}{y^H V_m^H V_m y} = \min_{\dim \tilde S = j,\, \tilde S \subset \mathcal K_m} \max_{x \in \tilde S} \frac{x^H A x}{x^H x}$$
$$\ge \min_{\dim \tilde S = j,\, \tilde S \subset \mathcal K_{m+1}} \max_{x \in \tilde S} \frac{x^H A x}{x^H x} = \theta_j^{(m+1)} \ge \min_{\dim \tilde S = j} \max_{x \in \tilde S} \frac{x^H A x}{x^H x} = \lambda_j.$$
Each (finite) sequence $\{\theta_j^{(m)}\}_{m = j, j+1, j+2, \dots}$ is therefore monotonically decreasing and bounded below by $\lambda_j$.
Likewise, the sequence of $j$-th largest eigenvalues of $T_m$ is monotonically increasing and bounded above by the $j$-th largest eigenvalue of $A$.
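The monotonicity is easy to observe numerically, since $T_m$ is the leading $m \times m$ principal submatrix of $T_M$ for $m \le M$. A sketch (test matrix and sizes are invented): the smallest Ritz value $\theta_1^{(m)}$ should decrease with $m$ and stay above $\lambda_1$.

```python
import numpy as np

def lanczos(A, v1, m):
    # plain Lanczos three-term recurrence; returns T_m
    n = A.shape[0]
    alpha = np.zeros(m); beta = np.zeros(m)
    v_old = np.zeros(n); v = v1 / np.linalg.norm(v1); b = 0.0
    for j in range(m):
        w = A @ v - b * v_old
        alpha[j] = v @ w; w -= alpha[j] * v
        b = np.linalg.norm(w); beta[j] = b
        v_old, v = v, w / b
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

rng = np.random.default_rng(5)
n, M = 300, 25
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
T = lanczos(A, rng.standard_normal(n), M)
lam1 = np.linalg.eigvalsh(A)[0]
# T_m is the leading principal m x m block of T_M
theta1 = [np.linalg.eigvalsh(T[:m, :m])[0] for m in range(1, M + 1)]
print(np.all(np.diff(theta1) <= 1e-10))   # monotonically decreasing
print(min(theta1) >= lam1 - 1e-8)         # bounded below by lambda_1
```

The decrease of $\theta_1^{(m)}$ in $m$ is exactly the Cauchy interlacing property of nested principal submatrices of the symmetric matrix $T_M$.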
Convergence
Before deriving results about the speed of convergence of the Lanczos method, we first prove a bound for the angle between an eigenvector of $A$ and a Krylov space $\mathcal K_m(v_1, A)$. Denote by $u_i$ a system of orthonormal eigenvectors corresponding to the eigenvalues $\lambda_i$.
LEMMA 10.1
Let $P_i$ be the orthogonal projector onto the eigenspace corresponding to $\lambda_i$. If $P_i v_1 \ne 0$, then
$$\tan \delta(u_i, \mathcal K_m) = \min_{p \in \Pi_{m-1},\, p(\lambda_i) = 1} \|p(A) y_i\|_2 \, \tan \delta(u_i, v_1),$$
where
$$y_i = \begin{cases} \dfrac{(I - P_i) v_1}{\|(I - P_i) v_1\|_2} & \text{if } (I - P_i) v_1 \ne 0, \\[2mm] 0 & \text{otherwise.} \end{cases}$$
Proof
The Krylov space $\mathcal K_m(v_1, A)$ consists of all vectors which can be written as $x = q(A) v_1$, where $q \in \Pi_{m-1}$ is any polynomial of degree at most $m - 1$.
With the orthogonal decomposition
$$x = q(A) v_1 = q(A) P_i v_1 + q(A)(I - P_i) v_1$$
it holds for the angle $\delta(x, u_i)$ between $x$ and $u_i$ that
$$\tan \delta(x, u_i) = \frac{\|q(A)(I - P_i) v_1\|_2}{\|q(A) P_i v_1\|_2} = \frac{\|q(A) y_i\|_2}{|q(\lambda_i)|} \cdot \frac{\|(I - P_i) v_1\|_2}{\|P_i v_1\|_2},$$
and the scaling $p(\lambda) := q(\lambda) / q(\lambda_i)$ yields
$$\tan \delta(x, u_i) = \|p(A) y_i\|_2 \, \tan \delta(v_1, u_i),$$
from which we get the statement by minimizing over all $x \in \mathcal K_m(v_1, A)$.
Convergence ct.
Inserting any polynomial of degree $m - 1$ which satisfies $p(\lambda_i) = 1$, one obtains an upper bound for $\tan \delta(u_i, \mathcal K_m(v_1, A))$ from the last lemma.
We already have
THEOREM 10.2
Let $\alpha, \beta, \gamma \in \mathbb R$ with $\alpha < \beta$ and $\gamma \notin (\alpha, \beta)$. Then the minimization problem
$$\min_{p \in \Pi_m,\, p(\gamma) = 1} \ \max_{t \in [\alpha, \beta]} |p(t)|$$
has a unique solution, given by the scaled Chebyshev polynomial
$$\tilde c_m(t) := \begin{cases} c_m\!\left(1 + 2\,\frac{t - \beta}{\beta - \alpha}\right) \Big/ c_m\!\left(1 + 2\,\frac{\gamma - \beta}{\beta - \alpha}\right) & \text{for } \gamma > \beta, \\[2mm] c_m\!\left(1 + 2\,\frac{\alpha - t}{\beta - \alpha}\right) \Big/ c_m\!\left(1 + 2\,\frac{\alpha - \gamma}{\beta - \alpha}\right) & \text{for } \gamma < \alpha. \end{cases}$$
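The case $\gamma < \alpha$ is the relevant one when $\lambda_1$ lies below the rest of the spectrum. A small numerical sketch (the interval and degree are invented): the scaled Chebyshev polynomial satisfies $\tilde c_m(\gamma) = 1$, and its maximum modulus on $[\alpha, \beta]$ equals $1 / c_m\!\left(1 + 2\frac{\alpha - \gamma}{\beta - \alpha}\right)$.

```python
import numpy as np

def cheb(m, t):
    """Chebyshev polynomial c_m, evaluated on and off [-1, 1]."""
    t = np.asarray(t, dtype=float)
    out = np.empty_like(t)
    inside = np.abs(t) <= 1
    out[inside] = np.cos(m * np.arccos(t[inside]))
    out[~inside] = (np.sign(t[~inside]) ** m
                    * np.cosh(m * np.arccosh(np.abs(t[~inside]))))
    return out

alpha, beta, gamma, m = 2.0, 100.0, 1.0, 8        # gamma < alpha case
denom = np.cosh(m * np.arccosh(1 + 2 * (alpha - gamma) / (beta - alpha)))

def p(t):                                          # the scaled Chebyshev polynomial
    return cheb(m, 1 + 2 * (alpha - t) / (beta - alpha)) / denom

ts = np.linspace(alpha, beta, 10001)
print(p(np.array([gamma]))[0])                     # normalization p(gamma) = 1
print(np.max(np.abs(p(ts))), 1.0 / denom)          # grid max equals 1/c_m(...)
```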
THEOREM 10.3
The angle $\delta(u_i, \mathcal K_m(v_1, A))$ between the exact eigenvector $u_i$ and the $m$-th Krylov space satisfies the inequality
$$\tan \delta(u_i, \mathcal K_m) \le \frac{\kappa_i}{c_{m-i}(1 + 2 \rho_i)} \tan \delta(u_i, v_1), \qquad (1)$$
where
$$\kappa_1 = 1, \qquad \kappa_i = \prod_{j=1}^{i-1} \frac{\lambda_n - \lambda_j}{\lambda_i - \lambda_j} \quad \text{for } i > 1, \qquad (2)$$
and
$$\rho_i = \frac{\lambda_{i+1} - \lambda_i}{\lambda_n - \lambda_{i+1}}. \qquad (3)$$
In particular, for $i = 1$ one gets the estimate
$$\tan \delta(u_1, \mathcal K_m(v_1, A)) \le \frac{1}{c_{m-1}(1 + 2 \rho_1)} \tan \delta(v_1, u_1), \qquad \text{where } \rho_1 = \frac{\lambda_2 - \lambda_1}{\lambda_n - \lambda_2}.$$
Crucial for the convergence is the distance of the two smallest eigenvalues relative to the width of the entire spectrum.
Proof
We consider the case $i = 1$ first.
Expanding the vector $y_i$ in the basis $\{u_i\}$ of eigenvectors yields
$$y_i = \sum_{j=1}^{n} \alpha_j u_j, \qquad \text{where } \sum_{j=1}^{n} |\alpha_j|^2 = 1,$$
from which we get
$$\|p(A) y_1\|^2 = \sum_{j=2}^{n} |p(\lambda_j) \alpha_j|^2 \le \max_{j=2,\dots,n} |p(\lambda_j)|^2 \le \max_{\lambda \in [\lambda_2, \lambda_n]} |p(\lambda)|^2,$$
and the statement follows from Theorem 10.2.
Proof ct.
For $i > 1$ we consider in Lemma 10.1 polynomials of the form
$$p(\lambda) := \frac{(\lambda - \lambda_1) \cdots (\lambda - \lambda_{i-1})}{(\lambda_i - \lambda_1) \cdots (\lambda_i - \lambda_{i-1})}\, q(\lambda) \qquad \text{with } q \in \Pi_{m-i} \text{ and } q(\lambda_i) = 1.$$
Then one gets as before
$$\|p(A) y_i\|_2 \le \max_{\lambda \in [\lambda_{i+1}, \lambda_n]} \left| \prod_{j=1}^{i-1} \frac{\lambda - \lambda_j}{\lambda_i - \lambda_j}\, q(\lambda) \right| \le \prod_{j=1}^{i-1} \frac{\lambda_n - \lambda_j}{\lambda_i - \lambda_j}\ \max_{\lambda \in [\lambda_{i+1}, \lambda_n]} |q(\lambda)|.$$
The result follows by minimizing this expression over all polynomials $q$ satisfying the constraint $q(\lambda_i) = 1$.
THEOREM 10.4 (Kaniel & Paige; first eigenvalue)
Let $A \in \mathbb C^{n \times n}$ be Hermitian with eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ and corresponding orthonormal eigenvectors $u_1, \dots, u_n$.
If $\theta_1^{(m)} \le \dots \le \theta_m^{(m)}$ denote the eigenvalues of the matrix $T_m$ obtained after $m$ steps of Lanczos' method, then
$$0 \le \theta_1^{(m)} - \lambda_1 \le (\lambda_n - \lambda_1) \left( \frac{\tan \delta(u_1, v_1)}{c_{m-1}(1 + 2 \rho_1)} \right)^2, \qquad \text{where } \rho_1 = \frac{\lambda_2 - \lambda_1}{\lambda_n - \lambda_2}.$$
Crucial for the speed of convergence is the growth of $c_{m-1}(1 + 2 \rho_1)$, i.e. the separation of the first two eigenvalues relative to the width of the entire spectrum of $A$.
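The bound of Theorem 10.4 can be checked on a diagonal test matrix, where the $\lambda_i$ and $u_i$ are known exactly (a sketch; the spectrum, seed, and helper names are invented for illustration):

```python
import numpy as np

def lanczos_T(d, v1, m):
    # Lanczos for the diagonal matrix A = diag(d); returns T_m
    n = d.size
    alpha = np.zeros(m); beta = np.zeros(m)
    v_old = np.zeros(n); v = v1 / np.linalg.norm(v1); b = 0.0
    for j in range(m):
        w = d * v - b * v_old
        alpha[j] = v @ w; w -= alpha[j] * v
        b = np.linalg.norm(w); beta[j] = b
        v_old, v = v, w / b
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

rng = np.random.default_rng(6)
d = np.sort(np.concatenate(([1.0], rng.uniform(20.0, 100.0, 99))))
v1 = rng.standard_normal(100)
m = 10
theta1 = np.linalg.eigvalsh(lanczos_T(d, v1, m))[0]
lam1, lam2, lamn = d[0], d[1], d[-1]
rho1 = (lam2 - lam1) / (lamn - lam2)
c = v1[0] / np.linalg.norm(v1)          # cos of angle(u_1, v_1), since u_1 = e_1
tan2 = (1 - c ** 2) / c ** 2            # tan^2 delta(u_1, v_1)
cm1 = np.cosh((m - 1) * np.arccosh(1 + 2 * rho1))   # c_{m-1}(1 + 2 rho_1)
bound = (lamn - lam1) * tan2 / cm1 ** 2
print(theta1 - lam1, bound)             # 0 <= theta_1 - lambda_1 <= bound
```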
Proof
The left inequality follows from Rayleigh's principle. We have
$$\theta_1^{(m)} = \min_{x \in \mathcal K_m(v_1, A),\, x \ne 0} \frac{x^H A x}{x^H x},$$
and, since each $x \in \mathcal K_m(v_1, A)$ can be represented as $x = q(A) v_1$ for some $q \in \Pi_{m-1}$, it follows that
$$\theta_1^{(m)} - \lambda_1 = \min_{x \in \mathcal K_m(v_1, A),\, x \ne 0} \frac{x^H (A - \lambda_1 I) x}{x^H x} = \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{(v_1)^H q(A)^H (A - \lambda_1 I)\, q(A) v_1}{(v_1)^H q(A)^2 v_1}.$$
Proof ct.
With $v_1 = \sum_{j=1}^{n} \alpha_j u_j$ it holds that
$$\theta_1^{(m)} - \lambda_1 = \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{\sum_{j=2}^{n} (\lambda_j - \lambda_1) |\alpha_j q(\lambda_j)|^2}{\sum_{j=1}^{n} |\alpha_j q(\lambda_j)|^2} \le (\lambda_n - \lambda_1) \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{\sum_{j=2}^{n} |\alpha_j q(\lambda_j)|^2}{|\alpha_1 q(\lambda_1)|^2}$$
$$\le (\lambda_n - \lambda_1) \min_{q \in \Pi_{m-1},\, q \ne 0} \max_{j=2,\dots,n} \frac{|q(\lambda_j)|^2}{|q(\lambda_1)|^2} \cdot \frac{\sum_{j=2}^{n} |\alpha_j|^2}{|\alpha_1|^2}.$$
Defining $p(\lambda) = q(\lambda)/q(\lambda_1)$, and observing that the set of all such $p$, as $q$ ranges over $\Pi_{m-1}$, is the set of all polynomials of degree not exceeding $m - 1$ satisfying the constraint $p(\lambda_1) = 1$, we get
$$\theta_1^{(m)} - \lambda_1 \le (\lambda_n - \lambda_1) \tan^2 \delta(u_1, v_1) \min_{p \in \Pi_{m-1},\, p(\lambda_1) = 1} \max_{\lambda \in [\lambda_2, \lambda_n]} |p(\lambda)|^2,$$
and the statement follows from Theorem 10.2.
Example
The matrix $A$ has the eigenvalue $\lambda_1 = 1$ and 99 eigenvalues uniformly distributed in $[\alpha, \beta]$.

 it.   [α,β] = [20,100]            [α,β] = [2,100]
       error        bound          error        bound
  2    3.3890e+001  3.4832e+003    1.9123e+001  1.1385e+005
  3    1.9708e+001  1.0090e+002    1.0884e+001  7.5862e+004
  4    6.6363e+000  1.5083e+001    5.8039e+000  5.5241e+004
  5    1.5352e+000  2.2440e+000    4.1850e+000  3.8224e+004
 10    7.9258e-005  1.6293e-004    1.8948e+000  4.2934e+003
 15    4.3730e-009  1.1827e-008    1.0945e+000  4.1605e+002
 20    2.9843e-013  8.5861e-013    1.6148e-001  3.9725e+001
 25                                1.4843e-002  3.7876e+000
 30                                1.6509e-003  3.6109e-001
 35                                2.0658e-004  3.4423e-002
 40                                5.8481e-006  3.2816e-003
 45                                9.5770e-007  3.1285e-004
 50                                3.0155e-009  2.9824e-005
 55                                8.0487e-012  2.8432e-006
 60                                3.1530e-014  2.7105e-007
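The qualitative behavior of the table can be reproduced with a few lines of numpy. This is a sketch only: the exact matrix and starting vector from the slide are unknown, and "uniformly distributed" is assumed here to mean equally spaced eigenvalues.

```python
import numpy as np

def lanczos_T(d, v1, m):
    # Lanczos for the diagonal matrix A = diag(d); returns T_m
    n = d.size
    alpha = np.zeros(m); beta = np.zeros(m)
    v_old = np.zeros(n); v = v1 / np.linalg.norm(v1); b = 0.0
    for j in range(m):
        w = d * v - b * v_old
        alpha[j] = v @ w; w -= alpha[j] * v
        b = np.linalg.norm(w); beta[j] = b
        v_old, v = v, w / b
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

def error(a, b, m, rng):
    # theta_1^{(m)} - lambda_1 for lambda_1 = 1 and 99 equally spaced
    # eigenvalues in [a, b]
    d = np.concatenate(([1.0], np.linspace(a, b, 99)))
    v1 = rng.standard_normal(100)
    return np.linalg.eigvalsh(lanczos_T(d, v1, m))[0] - 1.0

rng = np.random.default_rng(7)
for m in (2, 5, 10, 20):
    print(m, error(20.0, 100.0, m, rng), error(2.0, 100.0, m, rng))
# the [20,100] errors collapse quickly; the [2,100] errors decay far more slowly
```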
THEOREM 10.5 (Kaniel & Paige; higher eigenvalues)
Under the conditions of Theorem 10.4 it holds that
$$0 \le \theta_j^{(m)} - \lambda_j \le (\lambda_n - \lambda_1) \left( \frac{\kappa_j^{(m)} \tan \delta(v_1, u_j)}{c_{m-j}(1 + 2 \rho_j)} \right)^2$$
with $\rho_j = (\lambda_{j+1} - \lambda_j)/(\lambda_n - \lambda_{j+1})$, and
$$\kappa_1^{(m)} \equiv 1, \qquad \kappa_j^{(m)} = \prod_{i=1}^{j-1} \frac{\lambda_n - \theta_i^{(m)}}{\lambda_j - \theta_i^{(m)}}.$$
The general case $j > 1$ can be proved by using the maxmin characterization of $\theta_j^{(m)}$ due to Courant and Fischer.
Analogous results hold for the largest eigenvalues and Ritz values.
Convergence of Arnoldi Method
For nonsymmetric matrices and the Arnoldi process, the speed of convergence was analyzed by Saad (1983).
This was done by considering the distance of a particular eigenvector $u_1$ of $A$ from the subspace $\mathcal K_m(v_1, A)$.
Let
$$\varepsilon^{(m)} := \min_{p \in \Pi^*_{m-1}} \ \max_{\lambda \in \sigma(A) \setminus \{\lambda_1\}} |p(\lambda)|,$$
where $\sigma(A)$ is the spectrum of $A$, and $\Pi^*_{m-1}$ denotes the set of polynomials of degree at most $m - 1$ such that $p(\lambda_1) = 1$.
The following lemma relates this quantity to $\|(I - P_m) u_1\|$, where $P_m$ denotes the projector onto $\mathcal K_m(v_1, A)$.
Convergence of Arnoldi Method ct.
LEMMA 10.6
Assume that $A$ is diagonalizable and that the initial vector $v_1$ of Arnoldi's method has the expansion $v_1 = \sum_{j=1}^{n} \alpha_j u_j$ with respect to an eigenbasis $u_1, \dots, u_n$ of $A$, where $\|u_j\| = 1$ and $\alpha_1 \ne 0$. Then the following inequality holds:
$$\|(I - P_m) u_1\| \le \xi\, \varepsilon^{(m)}, \qquad \text{where } \xi = \sum_{j=2}^{n} |\alpha_j| / |\alpha_1|.$$
Hence, upper bounds on $\varepsilon^{(m)}$ estimate the speed of convergence of the Arnoldi method.
THEOREM 10.7
Assume that all eigenvalues of $A$ except $\lambda_1$ lie in an ellipse with center $c$, focal points $c - e$ and $c + e$, and major semi-axis $a$. Then it holds that
$$\varepsilon^{(m)} \le \frac{c_{m-1}\!\left(\frac{a}{e}\right)}{c_{m-1}\!\left(\frac{\lambda_1 - c}{e}\right)},$$
where $c_{m-1}$ denotes the Chebyshev polynomial of degree $m - 1$.
The relative difference between the right- and left-hand sides converges to 0.
Refined Ritz vector
After a Ritz pair (θ, y) has been determined, the approximation y to the eigenvector can be improved by solving the optimization problem

‖Az − θz‖_2 = min!,  z ∈ K_m(v_1, A),  ‖z‖_2 = 1.

This improvement was introduced by Jia (1997) and is called the refined Ritz vector, although a solution in general is not a Ritz vector corresponding to θ.

Given a Ritz pair, the refined Ritz vector can be obtained from the augmented Hessenberg matrix H̃_m ∈ R^{(m+1)×m}, i.e. the Hessenberg matrix H_m with the additional row (0, ..., 0, h_{m+1,m}) appended.
Refined Ritz vector ct.
Every z ∈ K_m(v_1, A) can be written as z = V_m t, and ‖z‖_2 = 1 holds if and only if ‖t‖_2 = 1. Hence,

‖(A V_m − θ V_m) t‖_2 = ‖(V_{m+1} H̃_m − θ V_m) t‖_2 = ‖V_{m+1} (H̃_m − θ I_{m+1,m}) t‖_2 = ‖(H̃_m − θ I_{m+1,m}) t‖_2,

where I_{m+1,m} denotes the identity matrix I_m with a zero row appended. This expression attains its minimum under the constraint ‖t‖_2 = 1 at the right singular vector of H̃_m − θ I_{m+1,m} corresponding to the smallest singular value.
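This minimization can be sketched in a few lines of NumPy; the function name `refined_ritz` and its calling convention are illustrative, not from the text. Here `V_mp1` holds the Arnoldi basis [v_1, ..., v_{m+1}] and `H_tilde` the augmented Hessenberg matrix:

```python
import numpy as np

def refined_ritz(V_mp1, H_tilde, theta):
    """Refined Ritz vector for the Ritz value theta.

    V_mp1   : n x (m+1) Arnoldi basis [v_1, ..., v_{m+1}]
    H_tilde : (m+1) x m augmented Hessenberg matrix
    Returns z = V_m t minimizing ||A z - theta z||_2 over unit z in K_m.
    """
    m = H_tilde.shape[1]
    I_pad = np.eye(m + 1, m)                       # I_{m+1,m}
    # right singular vector belonging to the smallest singular value
    _, _, Vt = np.linalg.svd(H_tilde - theta * I_pad)
    t = Vt[-1]
    return V_mp1[:, :m] @ t
```

Since V_m has orthonormal columns and t is a unit vector, the returned z automatically satisfies ‖z‖_2 = 1, and its residual can never exceed that of the Ritz vector for the same θ.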
Orthogonality of basis vectors
In exact arithmetic the Lanczos method generates an orthonormal basis of the Krylov space K_m(v_1, A). In the algorithm only the orthogonality with respect to the two basis vectors v_j and v_{j−1} obtained in the last two steps is enforced; with respect to the earlier vectors v_i it follows from the symmetry of A.

It can be shown that in floating point arithmetic the orthogonality is destroyed when a sequence θ_{i(j)}, j = 1, 2, ..., of Ritz values has converged to an eigenvalue λ̃ of A, i.e. when the residual β_j s_{j,i(j)} has become small.

Thereafter all vectors v_j acquire a component in the direction of the eigenspace of the converged eigenvalue, and a duplicate copy of that eigenvalue shows up in the spectrum of the tridiagonal matrix T_m.

This effect was first observed and studied by Paige (1971). A detailed discussion is contained in the monograph of Parlett (1998).
Orthogonality of basis vectors ct.
Note that these multiple Ritz values have nothing to do with possible multiple eigenvalues of the given matrix; they occur simply as a consequence of a converged eigenvalue.

For multiple eigenvalues the Lanczos (and Arnoldi) method in exact arithmetic can only detect one eigenvector, namely the projection of the initial vector v_1 onto the corresponding eigenspace. Further eigenvectors can only be obtained by restarting with a different initial vector or by a block Lanczos (Arnoldi) method.

A simple trick to detect duplicate copies of an eigenvalue, advocated by Cullum & Willoughby (1986), is the following: compute the eigenvalues of the reduced matrix T̂_m ∈ R^{(m−1)×(m−1)} obtained from T_m by deleting the first row and column. Those eigenvalues of T_m that differ from eigenvalues of T̂_m by less than a small multiple of machine precision are the unwanted ones, i.e. the ones due to loss of orthogonality.
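The Cullum–Willoughby comparison can be sketched directly in NumPy; the helper name and the tolerance argument are illustrative, not from the text:

```python
import numpy as np

def cullum_willoughby(alpha, beta, tol=1e-10):
    """Flag spurious eigenvalues of the Lanczos tridiagonal T_m.

    alpha : diagonal of T_m (length m)
    beta  : subdiagonal of T_m (length m-1)
    An eigenvalue of T_m that agrees (within tol) with an eigenvalue of
    the matrix obtained by deleting the first row and column of T_m is
    declared spurious (caused by loss of orthogonality).
    """
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    ev_T = np.linalg.eigvalsh(T)
    ev_hat = np.linalg.eigvalsh(T[1:, 1:])   # first row/column deleted
    spurious = np.array([np.min(np.abs(ev_hat - t)) < tol for t in ev_T])
    return ev_T, spurious
```

As a sanity check: if β_1 = 0, T_m decouples into [α_1] and T̂_m, so the eigenvalues of the trailing block are flagged, while a genuine unreduced tridiagonal triggers no flags (its eigenvalues interlace strictly with those of T̂_m).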
Complete reorthogonalization
In each step the new vector v_{j+1} is reorthogonalized against all previous vectors v_i. With classical Gram–Schmidt this means: v_{j+1} is replaced by

ṽ_{j+1} = v_{j+1} − V_j V_j^H v_{j+1}.

If the norm is decreased by a nontrivial amount, say ‖ṽ_{j+1}‖ < (1/√2) ‖v_{j+1}‖, the reorthogonalization has to be repeated.

Complete reorthogonalization is very reliable, but very expensive.
Lanczos with complete reorthogonalization
1: Choose initial vector v_1 with ‖v_1‖ = 1
2: Set v_0 = 0; β_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:   v_{j+1} = A v_j − β_{j−1} v_{j−1}
5:   α_j = v_j^H v_{j+1}
6:   v_{j+1} = v_{j+1} − α_j v_j
7:   v_{j+1} = v_{j+1} − V V^H v_{j+1}
8:   β_j = ‖v_{j+1}‖
9:   v_{j+1} = v_{j+1} / β_j
10:  V = [V, v_{j+1}]
11:  Solve projected eigenproblem T_j s = θ s
12:  Test for convergence
13: end for
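The loop above can be sketched as follows in NumPy (dense matrix, no convergence test; the function name is illustrative, not from the text):

```python
import numpy as np

def lanczos_full_reorth(A, v1, m):
    """Lanczos with complete reorthogonalization (classical Gram-Schmidt);
    returns the basis V = [v_1, ..., v_{m+1}] and the entries of T_m."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]       # three-term recurrence
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        # complete reorthogonalization against ALL previous vectors
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]
    return V, alpha, beta
```

The extra projection in each step costs O(nj) per step, i.e. O(nm²) overall, which is exactly the expense the selective and partial variants below try to avoid.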
Example
Convergence of the Lanczos process with complete reorthogonalization for A = diag(rand(100,1)). [figure]
THEOREM 10.8 (Paige)
Let V_k = [v_1, ..., v_k] be the matrix of vectors actually obtained in the Lanczos algorithm, Θ_k = diag{θ_1, ..., θ_k} and S_k = [s_1, ..., s_k] such that T_k S_k = S_k Θ_k and S_k^H S_k = I_k. Let y_{k,i} = V_k s_i be the corresponding Ritz vectors. Then it holds that

(y_{k,i})^H v_{k+1} = O(ε ‖A‖_2) / (β_k |s_{k,i}|),

where s_{k,i} denotes the last component of s_i.

Hence, the component (y_{k,i})^H v_{k+1} of the computed Lanczos vector v_{k+1} in the direction of the Ritz vector y_{k,i} is proportional to the reciprocal of the error bound β_k |s_{k,i}| for the Ritz value θ_i.
Selective reorthogonalization
By Paige's theorem the vectors v_j lose orthogonality because the vector v_{j+1} obtained in the last step has a large component in the direction of the Ritz vector y = [v_1, ..., v_j] s corresponding to a converged Ritz value θ (convergence being measured by the error bound β_j |s_j|).

This suggests monitoring the error bounds β_j |s_j| for all eigenvectors s of T_j in every iteration step, and reorthogonalizing v_{j+1} against the Ritz vector y:

v_{j+1} = v_{j+1} − (y^H v_{j+1}) y.

This so-called selective reorthogonalization is applied if β_j |s_j| < √ε ‖T_j‖ (actually ‖A‖ would be needed on the right-hand side, but ‖A‖ is not available).
Selective reorthogonalization
1: Choose initial vector v_1 with ‖v_1‖ = 1
2: Set v_0 = 0; β_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:   v_{j+1} = A v_j − β_{j−1} v_{j−1}
5:   α_j = v_j^H v_{j+1}
6:   v_{j+1} = v_{j+1} − α_j v_j
7:   β_j = ‖v_{j+1}‖
8:   Solve tridiag{β_{i−1}, α_i, β_i} S = S Θ
9:   for i = 1, ..., j do
10:    if β_j |s_j^(i)| < √ε max(diag Θ) then
11:      y = [v_1, ..., v_j] s^(i)
12:      v_{j+1} = v_{j+1} − (y^H v_{j+1}) y
13:    end if
14:  end for
15:  β_j = ‖v_{j+1}‖
16:  v_{j+1} = v_{j+1} / β_j
17:  V = [V, v_{j+1}]
18: end for
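A NumPy sketch of this loop, using max|θ_i| as the cheap stand-in for ‖T_j‖ as in line 10 above (function name illustrative, no convergence test):

```python
import numpy as np

def lanczos_selective_reorth(A, v1, m):
    """Lanczos with selective reorthogonalization (sketch): v_{j+1} is
    purged only of Ritz vectors whose error bound beta_j * |s_{j,i}|
    has dropped below sqrt(eps) * ||T_j||."""
    eps = np.finfo(float).eps
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        bj = np.linalg.norm(w)
        # Ritz pairs of the current tridiagonal T_j
        T = (np.diag(alpha[:j + 1])
             + np.diag(beta[:j], 1) + np.diag(beta[:j], -1))
        theta, S = np.linalg.eigh(T)
        for i in range(j + 1):
            if bj * abs(S[j, i]) < np.sqrt(eps) * np.abs(theta).max():
                y = V[:, :j + 1] @ S[:, i]     # converged Ritz vector
                w -= (y @ w) * y
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]
    return V, alpha, beta
```

Solving the full tridiagonal eigenproblem in every step, as written here and in the algorithm above, is itself not cheap; practical codes update the Ritz information incrementally.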
Partial reorthogonalization
It can be shown (Simon (1984)) that the properties of the Lanczos method are largely retained as long as the basis is semiorthogonal, i.e.

V_j^H V_j = I_j + E  with  ‖E‖_2 = √ε,

where ε denotes the rounding unit.

If the tridiagonal matrix T_j is determined using a semiorthogonal basis V_j, then there exists an orthonormal basis N_j of span V_j such that

T_j = V_j^H A V_j = N_j^H A N_j + G  with  ‖G‖_2 = O(ε ‖A‖_2).

Hence, the eigenvalues of the problem projected onto span V_j are obtained to full precision.
Partial reorthogonalization
1: Choose initial vector v_1 with ‖v_1‖ = 1
2: Set v_0 = 0; β_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:   v_{j+1} = A v_j − β_{j−1} v_{j−1}
5:   α_j = v_j^H v_{j+1}
6:   v_{j+1} = v_{j+1} − α_j v_j
7:   β_j = ‖v_{j+1}‖
8:   v_{j+1} = v_{j+1} / β_j
9:   if ‖[V, v_{j+1}]^H [V, v_{j+1}] − I_{j+1}‖ > √ε then
10:    v_{j+1} = v_{j+1} − V V^H v_{j+1}
11:    β_j = ‖v_{j+1}‖
12:    v_{j+1} = v_{j+1} / β_j
13:  end if
14:  V = [V, v_{j+1}]
15: end for
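A NumPy sketch of this variant; instead of forming [V, v_{j+1}]^H [V, v_{j+1}] it checks max_i |v_i^H v_{j+1}| > √ε, which is equivalent in spirit (a practical implementation such as Simon's estimates these inner products by a recurrence rather than computing them, which would be as expensive as complete reorthogonalization):

```python
import numpy as np

def lanczos_partial_reorth(A, v1, m):
    """Lanczos that reorthogonalizes only when semiorthogonality
    of the basis is lost (sketch, no convergence test)."""
    eps = np.finfo(float).eps
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        beta[j] = np.linalg.norm(w)
        vnext = w / beta[j]
        Vj = V[:, :j + 1]
        # semiorthogonality test against all previous basis vectors
        if np.abs(Vj.T @ vnext).max() > np.sqrt(eps):
            vnext -= Vj @ (Vj.T @ vnext)
            vnext /= np.linalg.norm(vnext)
        V[:, j + 1] = vnext
    return V, alpha, beta
```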
Explicit restarts
The growing storage and arithmetic cost may make restarts of the Arnoldi algorithm necessary.
Since the Arnoldi method naturally starts with one vector, one of the most straightforward restarting schemes is to reduce the whole basis into one vector and start the new Arnoldi iteration with it.
If only one eigenvalue is required (for instance the one with the largest real part), we can choose to restart with the corresponding Ritz vector.
If more than one eigenvalue is wanted, we may add all Ritz vectors together to form one starting vector, or use a block version of the Lanczos algorithm that has the same block size as the number of wanted eigenvalues.
These options are simple to implement but not nearly as effective as the more sophisticated ones such as the implicit restarting scheme and the thick restart scheme.
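The single-vector restart strategy can be sketched as follows; the function names are illustrative, and for a real nonsymmetric A with a complex dominant Ritz pair, taking the real part of the Ritz vector is a common simplification:

```python
import numpy as np

def arnoldi(A, v1, m):
    """m steps of Arnoldi: returns V (n x (m+1)) and H ((m+1) x m)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def restarted_arnoldi(A, v0, m, restarts):
    """Explicit restart with the Ritz vector belonging to the
    Ritz value of largest real part."""
    v = v0
    for _ in range(restarts):
        V, H = arnoldi(A, v, m)
        theta, S = np.linalg.eig(H[:m, :m])
        i = np.argmax(theta.real)
        v = np.real(V[:, :m] @ S[:, i])   # restart vector
    return theta[i], v / np.linalg.norm(v)
```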
Implicit restarts (Sorensen 1992)
With m = k + p steps of the Arnoldi method one determines a factorization

A V_m = V_m H_m + r_m (e_m)^H.

With p steps of the QR algorithm with implicit shifts applied to H_m one gets

A V_m^+ = V_m^+ H_m^+ + r_m (e_m)^H Q,   (*)

where Q = Q_1 Q_2 ··· Q_p with the orthogonal matrices Q_j from the p QR steps, and V_m^+ = V_m Q, H_m^+ = Q^H H_m Q.

The leading k − 1 components of (e_m)^H Q are 0. Hence, the leading k columns of (*) have the form

A V_k^+ = V_k^+ H_k^+ + r_k^+ (e_k)^H

with the updated residual r_k^+ = V_m^+ e_{k+1} h_{k+1,k}^+ + r_m Q(m, k).
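A NumPy sketch of one such implicit restart with exact shifts (here the m − k Ritz values of smallest real part are discarded as an example choice; real shifts are assumed, complex conjugate pairs would require double shifts):

```python
import numpy as np

def implicit_restart(V, H, r, k):
    """One implicit restart: given A V_m = V_m H_m + r (e_m)^H, apply
    p = m - k shifted QR steps and truncate to a length-k factorization
    A V_k+ = V_k+ H_k+ + r+ (e_k)^H."""
    m = H.shape[0]
    # exact shifts: discard the m - k Ritz values of smallest real part
    shifts = np.sort(np.linalg.eigvals(H).real)[: m - k]
    Q_acc = np.eye(m)
    for mu in shifts:
        Q, _ = np.linalg.qr(H - mu * np.eye(m))
        H = Q.T @ H @ Q                  # one QR step; H stays Hessenberg
        Q_acc = Q_acc @ Q
    Vp = V @ Q_acc
    # leading k-1 entries of (e_m)^H Q are zero, hence the new residual
    r_new = Vp[:, k] * H[k, k - 1] + r * Q_acc[m - 1, k - 1]
    return Vp[:, :k], H[:k, :k], r_new
```

The returned triple again satisfies the Arnoldi relation of length k, so the iteration can be expanded back to length m without discarding the accumulated spectral information.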