
ITERATIVE PROJECTION METHODS FOR SPARSE LINEAR SYSTEMS AND EIGENPROBLEMS

CHAPTER 10: KRYLOV SUBSPACE METHODS

Heinrich Voss

voss@tu-harburg.de

Hamburg University of Technology, Institute of Numerical Simulation

Lanczos and Arnoldi Method

We already discussed in Sections 4, 5, and 6 how to construct an orthonormal basis $V_m$ of the Krylov space $K_m(v_1, A)$ by the Arnoldi method and, for symmetric matrices, by the Lanczos method.

Along with the bases one obtains the orthogonal projection $H_m := V_m^T A V_m$ ($H_m := V_m^H A V_m$ in the complex case) of $A$ onto $K_m(v_1, A)$, which is a Hessenberg matrix for a general matrix $A$ and tridiagonal for symmetric (Hermitean) $A$.

In Section 8 we discussed the two-sided Lanczos process, which constructs bases $V_m$ and $W_m$ of the Krylov spaces $K_m(v_1, A)$ and $K_m(w_1, A^T)$, respectively, along with the oblique projection of $A$ onto $K_m(v_1, A)$ along $K_m(w_1, A^T)$:

$$T_m = W_m^T A V_m.$$

$T_m$ is a tridiagonal matrix. If look-ahead is included, then $T_m$ is block tridiagonal.


Arnoldi method; compact form

$$A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T, \qquad V_m^T V_m = I_m, \qquad V_m^T v_{m+1} = 0,$$

$$V_m^T A V_m = H_m = \begin{pmatrix} h_{11} & h_{12} & h_{13} & \dots & h_{1m} \\ h_{21} & h_{22} & h_{23} & \dots & h_{2m} \\ 0 & h_{32} & h_{33} & \dots & h_{3m} \\ & \ddots & \ddots & \ddots & \vdots \\ & & & h_{m,m-1} & h_{mm} \end{pmatrix}.$$

The eigenvalue problem

$$H_m s = \theta s$$

can be solved inexpensively by the QR algorithm.


Lucky termination

The Arnoldi method may terminate with $h_{j+1,j} = 0$ for some $j$. Then $v_{j+1} = 0$, and therefore

$$A v_j = \sum_{k=1}^{j} h_{kj} v_k \in K_j(A, v_1).$$

For $i < j$ it holds by construction

$$A v_i = \sum_{k=1}^{i+1} h_{ki} v_k \in K_j(A, v_1).$$

Hence, $K_j(A, v_1)$ is an invariant subspace of $A$, and therefore every eigenvalue $\theta_i^{(j)}$ of $H_j$ is an eigenvalue of $A$, and the corresponding Ritz vectors are eigenvectors of $A$.


Error indicator

If $\theta_i^{(m)}$ are the Ritz values (eigenvalues of $H_m$), $s_i^{(m)}$ the corresponding eigenvectors, and $x_i^{(m)} = V_m s_i^{(m)}$ the Ritz vectors, then (as for the Lanczos method) it holds

$$(A - \theta_i^{(m)} I)\, x_i^{(m)} = h_{m+1,m}\, v_{m+1}\, e_m^T s_i^{(m)},$$

which implies

$$\|(A - \theta_i^{(m)} I)\, x_i^{(m)}\|_2 = h_{m+1,m}\, |s_{m,i}^{(m)}|.$$

If $A$ is symmetric, then by the Krylov & Bogoliubov theorem there exists an eigenvalue $\tilde\lambda$ of $A$ such that

$$|\tilde\lambda - \theta_i^{(m)}| \le h_{m+1,m}\, |s_{m,i}^{(m)}|.$$

Notice that this error bound can be computed without determining the Ritz vector $x_i^{(m)} = V_m s_i^{(m)}$.

If $A$ is non-normal, then the Krylov & Bogoliubov theorem does not hold. Nonetheless, in this case the backward error $h_{m+1,m}\, |s_{m,i}^{(m)}|$ is used as an error indicator.
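A minimal NumPy sketch of this test (the function name and the assumption that the full (m+1) x m Arnoldi matrix H_bar is available are illustrative, not part of the slides); it evaluates the indicator h_{m+1,m}|s_{m,i}| for all Ritz pairs without forming Ritz vectors:

import numpy as np

def ritz_residual_bounds(H_bar):
    """Residual norms h_{m+1,m}|s_{m,i}| for all Ritz pairs of the
    square part H_m of the (m+1) x m Arnoldi matrix H_bar."""
    m = H_bar.shape[1]
    Hm = H_bar[:m, :]                # square Hessenberg matrix H_m
    h_next = H_bar[m, m - 1]         # h_{m+1,m}
    theta, S = np.linalg.eig(Hm)     # Ritz values and eigenvectors of H_m
    bounds = np.abs(h_next) * np.abs(S[m - 1, :])   # h_{m+1,m}|s_{m,i}|
    return theta, bounds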


Arnoldi method

1: choose initial vector v_1 with ||v_1|| = 1, V_1 = [v_1]
2: compute w = A v_1, h = (v_1)^T w, r = w - v_1 h, H_1 = [h], beta = ||r||_2
3: for j = 1, 2, ... do
4:    v_{j+1} = r / beta
5:    V_{j+1} = [V_j, v_{j+1}],  Hhat_j = [H_j; beta e_j^T]
6:    w = A v_{j+1}
7:    h = V_{j+1}^T w,  r = w - V_{j+1} h
8:    if ||r||_2 < eta ||h||_2 then
9:       s = V_{j+1}^T r,  r = r - V_{j+1} s
10:      h = h + s
11:   end if
12:   H_{j+1} = [Hhat_j, h],  beta = ||r||_2
13:   compute approximate eigenvalues of H_{j+1}
14:   test for convergence
15: end for
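A minimal NumPy sketch of the iteration above (a sketch only: the function name, the choice eta = 1/sqrt(2), and the loop organization are implementation choices; it assumes A supports the @ matrix-vector product and that no lucky breakdown is wanted beyond an early return):

import numpy as np

def arnoldi(A, v1, m, eta=1/np.sqrt(2)):
    """m steps of Arnoldi with one step of classical Gram-Schmidt
    reorthogonalization; returns V (n x (m+1)) and H ((m+1) x m)."""
    n = v1.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w                 # orthogonalize against v_1, ..., v_{j+1}
        r = w - V[:, :j + 1] @ h
        if np.linalg.norm(r) < eta * np.linalg.norm(h):   # cancellation: repeat once
            s = V[:, :j + 1].T @ r
            r -= V[:, :j + 1] @ s
            h += s
        H[:j + 1, j] = h
        H[j + 1, j] = np.linalg.norm(r)
        if H[j + 1, j] == 0:                   # lucky termination
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = r / H[j + 1, j]
    return V, H

# Ritz values after m steps: np.linalg.eigvals(H[:m, :m])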

Cost

For large matrices the Arnoldi method becomes costly, both in terms of computation and storage.

We need to keep $m$ vectors of length $n$ plus an $m \times m$ Hessenberg matrix. For the arithmetic costs, we need to multiply $v_{j+1}$ by $A$, at the cost of $2 N_z$, where $N_z$ is the number of nonzero elements of $A$, and then orthogonalize the result against $j$ basis vectors, at the cost of $4(j+1)n$.

Thus, an $m$-dimensional Arnoldi process costs $\approx nm + 0.5 m^2$ in storage and $\approx 2 m N_z + 2 n m^2$ in arithmetic operations.

In the symmetric case only the last two basis vectors $v_j$ are needed when determining the projection $H_m$. The previous vectors are not even needed to determine an error bound, and can be stored on secondary storage until the Ritz vectors are computed.


Convergence

The Arnoldi method is a generalization of the power method, where approximations to eigenvectors are obtained from the last iterate only (or the two last iterates in case of a complex eigenvalue).

As for the power method, we therefore can expect fast convergence to the eigenvalue which is maximal in modulus and to the corresponding eigenvector.

One can influence the convergence of the power method by shifts, either to separate the wanted eigenvalue more from the remaining spectrum or to enforce convergence to a different eigenvalue.

The Arnoldi method is independent of shifts since

$$K_m(v_1, A) = K_m(v_1, A + \alpha I) \quad \text{for all } \alpha \in \mathbb{C}.$$

Hence, we can expect convergence of the Arnoldi method to extreme eigenvalues first.
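The shift invariance is easy to verify numerically; the following small sketch (random test matrix, dimensions, and the shift alpha are illustrative choices, not from the slides) compares orthonormal bases of K_m(v_1, A) and K_m(v_1, A + alpha I) via principal angles:

import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
n, m, alpha = 100, 8, 3.7
A = rng.standard_normal((n, n))
v1 = rng.standard_normal(n)

def krylov_basis(M, v, m):
    """Orthonormal basis of K_m(v, M) built by Gram-Schmidt (Arnoldi-style)."""
    Q = np.zeros((v.size, m))
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(1, m):
        w = M @ Q[:, j - 1]
        w -= Q[:, :j] @ (Q[:, :j].T @ w)
        Q[:, j] = w / np.linalg.norm(w)
    return Q

B1 = krylov_basis(A, v1, m)
B2 = krylov_basis(A + alpha * np.eye(n), v1, m)
print(np.max(subspace_angles(B1, B2)))   # of the order of machine precision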


Example

Eigenvalues (blue plus) of a random tridiagonal (100, 100) matrix and approximations (red circle) after 10 steps of Arnoldi.

Lanczos algorithm

We analyze the convergence of the Lanczos method.

1: choose initial vector v_1 with ||v_1|| = 1
2: set v_0 = 0; beta_0 = 0
3: for j = 1, 2, ... do
4:    v_{j+1} = A v_j - beta_{j-1} v_{j-1}
5:    alpha_j = (v_j)^H v_{j+1}
6:    v_{j+1} = v_{j+1} - alpha_j v_j
7:    beta_j = ||v_{j+1}||
8:    v_{j+1} = v_{j+1} / beta_j
9: end for

The matrix of the projected problem then is the tridiagonal matrix

$$T_m = \begin{pmatrix} \alpha_1 & \beta_1 & 0 & \dots & 0 \\ \beta_1 & \alpha_2 & \beta_2 & & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & \beta_{m-1} \\ 0 & \dots & 0 & \beta_{m-1} & \alpha_m \end{pmatrix}.$$
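A minimal NumPy sketch of this three-term recurrence (illustrative names; it assumes a real symmetric A and that no breakdown beta_j = 0 occurs):

import numpy as np

def lanczos(A, v1, m):
    """m steps of the three-term Lanczos recurrence (no reorthogonalization).
    Returns the m x m tridiagonal matrix T_m."""
    n = v1.size
    alpha = np.zeros(m)
    beta = np.zeros(m)
    v_prev = np.zeros(n)
    v = v1 / np.linalg.norm(v1)
    b_prev = 0.0
    for j in range(m):
        w = A @ v - b_prev * v_prev
        alpha[j] = v @ w
        w -= alpha[j] * v
        beta[j] = np.linalg.norm(w)            # assumes beta[j] > 0 (no breakdown)
        v_prev, v, b_prev = v, w / beta[j], beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return T

# Ritz values: np.linalg.eigvalsh(lanczos(A, v1, m))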


Monotonicity

Assume that the eigenvalues of $A$ and the Ritz values with respect to $K_m$ are ordered increasingly:

$$\lambda_1 \le \lambda_2 \le \dots \le \lambda_n, \qquad \theta_1^{(m)} \le \theta_2^{(m)} \le \dots \le \theta_m^{(m)}.$$

Then the minmax principle yields ($S$ denotes a subspace of $\mathbb{C}^m$ and $\tilde S$ a subspace of $\mathbb{C}^n$)

$$\theta_j^{(m)} = \min_{\dim S = j} \max_{y \in S} \frac{y^H T_m y}{y^H y}
= \min_{\dim S = j} \max_{y \in S} \frac{y^H V_m^H A V_m y}{y^H V_m^H V_m y}
= \min_{\dim \tilde S = j,\ \tilde S \subset K_m} \max_{x \in \tilde S} \frac{x^H A x}{x^H x}$$
$$\ge \min_{\dim \tilde S = j,\ \tilde S \subset K_{m+1}} \max_{x \in \tilde S} \frac{x^H A x}{x^H x} = \theta_j^{(m+1)}
\ge \min_{\dim \tilde S = j} \max_{x \in \tilde S} \frac{x^H A x}{x^H x} = \lambda_j.$$

Each (finite) sequence $\{\theta_j^{(m)}\}_{m = j, j+1, j+2, \dots}$ therefore is monotonically decreasing and bounded below by $\lambda_j$.

Likewise, the sequences of the $j$-th largest eigenvalues of $T_m$ are monotonically increasing and bounded above by the $j$-th largest eigenvalue of $A$.


Convergence

Before deriving results about the speed of convergence of the Lanczos method we first prove a bound for the angle between an eigenvector of $A$ and a Krylov space $K_m(v_1, A)$. Denote by $u_i$ a system of orthonormal eigenvectors corresponding to the eigenvalues $\lambda_i$.

LEMMA 10.1

Let $P_i$ be the orthogonal projector onto the eigenspace corresponding to $\lambda_i$. If $P_i v_1 \ne 0$ then

$$\tan\delta(u_i, K_m) = \min_{p \in \Pi_{m-1},\, p(\lambda_i) = 1} \|p(A) y_i\|_2 \, \tan\delta(u_i, v_1),
\qquad \text{where } y_i = \begin{cases} \dfrac{(I - P_i) v_1}{\|(I - P_i) v_1\|_2} & \text{if } (I - P_i) v_1 \ne 0, \\ 0 & \text{otherwise.} \end{cases}$$


Proof

The Krylov space $K_m(v_1, A)$ consists of all vectors which can be written as $x = q(A) v_1$, where $q \in \Pi_{m-1}$ is any polynomial of degree at most $m - 1$.

With the orthogonal decomposition

$$x = q(A) v_1 = q(A) P_i v_1 + q(A)(I - P_i) v_1$$

it holds for the angle $\delta(x, u_i)$ between $x$ and $u_i$

$$\tan\delta(x, u_i) = \frac{\|q(A)(I - P_i) v_1\|_2}{\|q(A) P_i v_1\|_2}
= \frac{\|q(A) y_i\|_2}{|q(\lambda_i)|} \cdot \frac{\|(I - P_i) v_1\|_2}{\|P_i v_1\|_2},$$

and the scaling $p(\lambda) := q(\lambda)/q(\lambda_i)$ yields

$$\tan\delta(x, u_i) = \|p(A) y_i\|_2 \, \tan\delta(v_1, u_i),$$

from which we get the statement by minimizing over all $x \in K_m(v_1, A)$.


Convergence ct.

Inserting any polynomial of degree $m - 1$ which satisfies $p(\lambda_i) = 1$, one obtains an upper bound for $\tan\delta(u_i, K_m(v_1, A))$ from the last lemma.

We already have

THEOREM 10.2

Let $\alpha, \beta, \gamma \in \mathbb{R}$ with $\alpha < \beta$ and $\gamma \notin (\alpha, \beta)$. Then the minimization problem

$$\min_{p \in \Pi_m,\, p(\gamma) = 1} \ \max_{t \in [\alpha, \beta]} |p(t)|$$

has a unique solution, which is given by the scaled Chebyshev polynomial

$$\tilde c_m(t) := \begin{cases}
c_m\!\left(1 + 2\dfrac{t - \beta}{\beta - \alpha}\right) \Big/ c_m\!\left(1 + 2\dfrac{\gamma - \beta}{\beta - \alpha}\right) & \text{for } \gamma > \beta, \\[2mm]
c_m\!\left(1 + 2\dfrac{\alpha - t}{\beta - \alpha}\right) \Big/ c_m\!\left(1 + 2\dfrac{\alpha - \gamma}{\beta - \alpha}\right) & \text{for } \gamma < \alpha.
\end{cases}$$


THEOREM 10.3

The angle $\delta(u_i, K_m(v_1, A))$ between the exact eigenvector $u_i$ and the $m$-th Krylov space satisfies the inequality

$$\tan\delta(u_i, K_m) \le \frac{\kappa_i}{c_{m-i}(1 + 2\rho_i)} \tan\delta(u_i, v_1) \qquad (1)$$

where

$$\kappa_1 = 1, \qquad \kappa_i = \prod_{j=1}^{i-1} \frac{\lambda_n - \lambda_j}{\lambda_i - \lambda_j} \ \text{ for } i > 1 \qquad (2)$$

and

$$\rho_i = \frac{\lambda_{i+1} - \lambda_i}{\lambda_n - \lambda_{i+1}}. \qquad (3)$$

In particular, for $i = 1$ one gets the estimate

$$\tan\delta(u_1, K_m(v_1, A)) \le \frac{1}{c_{m-1}(1 + 2\rho_1)} \tan\delta(v_1, u_1), \qquad \text{where } \rho_1 = \frac{\lambda_2 - \lambda_1}{\lambda_n - \lambda_2}.$$

Crucial for the convergence is the distance of the two smallest eigenvalues relative to the width of the entire spectrum.


Proof

We consider the case $i = 1$ first.

Expanding the vector $y_i$ in the basis $\{u_i\}$ of eigenvectors yields

$$y_i = \sum_{j=1}^n \alpha_j u_j, \qquad \text{where } \sum_{j=1}^n |\alpha_j|^2 = 1,$$

from which we get

$$\|p(A) y_1\|^2 = \sum_{j=2}^n |p(\lambda_j)\alpha_j|^2 \le \max_{j=2,\dots,n} |p(\lambda_j)|^2 \le \max_{\lambda \in [\lambda_2, \lambda_n]} |p(\lambda)|^2,$$

and the statement follows from Theorem 10.2.


Proof ct.

For $i > 1$ we consider in Lemma 10.1 polynomials of the form

$$p(\lambda) := \frac{(\lambda - \lambda_1)\cdots(\lambda - \lambda_{i-1})}{(\lambda_i - \lambda_1)\cdots(\lambda_i - \lambda_{i-1})}\, q(\lambda)
\qquad \text{with } q \in \Pi_{m-i} \text{ and } q(\lambda_i) = 1.$$

Then one gets as before

$$\|p(A) y_i\|_2 \le \max_{\lambda \in [\lambda_{i+1}, \lambda_n]} \left| \prod_{j=1}^{i-1} \frac{\lambda - \lambda_j}{\lambda_i - \lambda_j}\, q(\lambda) \right|
\le \prod_{j=1}^{i-1} \frac{\lambda_n - \lambda_j}{\lambda_i - \lambda_j} \ \max_{\lambda \in [\lambda_{i+1}, \lambda_n]} |q(\lambda)|.$$

The result follows by minimizing this expression over all polynomials $q$ satisfying the constraint $q(\lambda_i) = 1$.


THEOREM 10.4 (Kaniel & Paige; first eigenvalue)

Let $A \in \mathbb{C}^{n \times n}$ be Hermitean with eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ and corresponding orthonormal eigenvectors $u_1, \dots, u_n$.

If $\theta_1^{(m)} \le \dots \le \theta_m^{(m)}$ denote the eigenvalues of the matrix $T_m$ obtained after $m$ steps of Lanczos' method, then

$$0 \le \theta_1^{(m)} - \lambda_1 \le (\lambda_n - \lambda_1) \left( \frac{\tan\delta(u_1, v_1)}{c_{m-1}(1 + 2\rho_1)} \right)^2, \qquad \text{where } \rho_1 = \frac{\lambda_2 - \lambda_1}{\lambda_n - \lambda_2}.$$

Crucial for the speed of convergence is the growth of $c_{m-1}(1 + 2\rho_1)$ with $m$, i.e. the separation of the first two eigenvalues relative to the width of the entire spectrum of $A$.


Proof

The left inequality follows from Rayleigh's principle.

We have

$$\theta_1^{(m)} = \min_{x \in K_m(v_1, A),\, x \ne 0} \frac{x^H A x}{x^H x},$$

and, since each $x \in K_m(v_1, A)$ can be represented as $x = q(A) v_1$ for some $q \in \Pi_{m-1}$, it follows

$$\theta_1^{(m)} - \lambda_1 = \min_{x \in K_m(v_1, A),\, x \ne 0} \frac{x^H (A - \lambda_1 I) x}{x^H x}
= \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{(v_1)^H q(A)^H (A - \lambda_1 I)\, q(A) v_1}{(v_1)^H q(A)^2 v_1}.$$


Proof ct.

With $v_1 = \sum_{j=1}^n \alpha_j u_j$ it holds

$$\theta_1^{(m)} - \lambda_1 = \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{\sum_{j=2}^n (\lambda_j - \lambda_1)|\alpha_j q(\lambda_j)|^2}{\sum_{j=1}^n |\alpha_j q(\lambda_j)|^2}
\le (\lambda_n - \lambda_1) \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{\sum_{j=2}^n |\alpha_j q(\lambda_j)|^2}{|\alpha_1 q(\lambda_1)|^2}$$
$$\le (\lambda_n - \lambda_1) \min_{q \in \Pi_{m-1},\, q \ne 0} \max_{j=2,\dots,n} \frac{|q(\lambda_j)|^2}{|q(\lambda_1)|^2} \cdot \frac{\sum_{j=2}^n |\alpha_j|^2}{|\alpha_1|^2}.$$

Defining $p(\lambda) = q(\lambda)/q(\lambda_1)$, and observing that the set of all such $p$ when $q$ ranges over $\Pi_{m-1}$ is the set of all polynomials of degree not exceeding $m - 1$ satisfying the constraint $p(\lambda_1) = 1$, we get

$$\theta_1^{(m)} - \lambda_1 \le (\lambda_n - \lambda_1) \tan^2\delta(u_1, v_1) \min_{p \in \Pi_{m-1},\, p(\lambda_1) = 1} \ \max_{\lambda \in [\lambda_2, \lambda_n]} |p(\lambda)|^2,$$

and the statement follows from Theorem 10.2.


Example

Matrix A has eigenvalue λ_1 = 1 and 99 eigenvalues uniformly distributed in [α, β].

 it.   [α,β]=[20,100]              [α,β]=[2,100]
       error        bound          error        bound
  2    3.3890e+001  3.4832e+003    1.9123e+001  1.1385e+005
  3    1.9708e+001  1.0090e+002    1.0884e+001  7.5862e+004
  4    6.6363e+000  1.5083e+001    5.8039e+000  5.5241e+004
  5    1.5352e+000  2.2440e+000    4.1850e+000  3.8224e+004
 10    7.9258e-005  1.6293e-004    1.8948e+000  4.2934e+003
 15    4.3730e-009  1.1827e-008    1.0945e+000  4.1605e+002
 20    2.9843e-013  8.5861e-013    1.6148e-001  3.9725e+001
 25                                1.4843e-002  3.7876e+000
 30                                1.6509e-003  3.6109e-001
 35                                2.0658e-004  3.4423e-002
 40                                5.8481e-006  3.2816e-003
 45                                9.5770e-007  3.1285e-004
 50                                3.0155e-009  2.9824e-005
 55                                8.0487e-012  2.8432e-006
 60                                3.1530e-014  2.7105e-007
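The Chebyshev bound of Theorem 10.4 is straightforward to evaluate; the following sketch (illustrative names; tan delta(u_1, v_1) = 1 is an arbitrary choice, so the printed numbers are not meant to reproduce the table) uses c_k(x) = cosh(k arccosh x) for x >= 1:

import numpy as np

def kaniel_paige_bound(lam1, lam2, lamn, tan_delta, m):
    """Upper bound of Theorem 10.4 for theta_1^{(m)} - lambda_1."""
    rho1 = (lam2 - lam1) / (lamn - lam2)
    cheb = np.cosh((m - 1) * np.arccosh(1.0 + 2.0 * rho1))   # c_{m-1}(1 + 2*rho_1)
    return (lamn - lam1) * (tan_delta / cheb) ** 2

# e.g. lambda_1 = 1 and remaining spectrum in [2, 100]:
for m in (5, 10, 20):
    print(m, kaniel_paige_bound(1.0, 2.0, 100.0, 1.0, m))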

THEOREM 10.5 (Kaniel & Paige; higher eigenvalues)

Under the conditions of Theorem 10.4 it holds

$$0 \le \theta_j^{(m)} - \lambda_j \le (\lambda_n - \lambda_1) \left( \frac{\kappa_j^{(m)} \tan\delta(v_1, u_j)}{c_{m-j}(1 + 2\rho_j)} \right)^2$$

with $\rho_j = (\lambda_{j+1} - \lambda_j)/(\lambda_n - \lambda_{j+1})$, and

$$\kappa_1^{(m)} \equiv 1, \qquad \kappa_j^{(m)} = \prod_{i=1}^{j-1} \frac{\lambda_n - \theta_i^{(m)}}{\lambda_j - \theta_i^{(m)}}.$$

The general case $j > 1$ can be proved by using the maxmin characterization of $\theta_j^{(m)}$ of Courant and Fischer.

Analogous results hold for the largest eigenvalues and Ritz values.


Convergence of Arnoldi Method

For nonsymmetric matrices and the Arnoldi process the speed of convergence was analyzed by Saad (1983).

This was done by considering the distance of a particular eigenvector $u_1$ of $A$ from the subspace $K_m(v_1, A)$.

Let

$$\varepsilon^{(m)} := \min_{p \in \Pi^*_{m-1}} \ \max_{\lambda \in \sigma(A)\setminus\{\lambda_1\}} |p(\lambda)|,$$

where $\sigma(A)$ is the spectrum of $A$, and $\Pi^*_{m-1}$ denotes the set of polynomials of maximum degree $m - 1$ such that $p(\lambda_1) = 1$.

The following lemma relates this quantity to $\|(I - P_m) u_1\|$, where $P_m$ denotes the projector onto $K_m(v_1, A)$.


Convergence of Arnoldi Method ct.

LEMMA 10.6

Assume that $A$ is diagonalizable and that the initial vector $v_1$ of Arnoldi's method has the expansion $v_1 = \sum_{j=1}^n \alpha_j u_j$ with respect to an eigenbasis $u_1, \dots, u_n$ of $A$, where $\|u_j\| = 1$ and $\alpha_1 \ne 0$. Then the following inequality holds:

$$\|(I - P_m) u_1\| \le \xi\, \varepsilon^{(m)}, \qquad \text{where } \xi = \sum_{j=2}^n |\alpha_j| / |\alpha_1|.$$

Hence, upper bounds of $\varepsilon^{(m)}$ estimate the speed of convergence of the Arnoldi method.

THEOREM 10.7

Assume that all eigenvalues of $A$ but $\lambda_1$ are lying in an ellipse with center $c$, focal points $c - e$ and $c + e$ and large semiaxis $a$. Then it holds

$$\varepsilon^{(m)} \le \frac{c_{m-1}(a/e)}{c_{m-1}\!\left(\dfrac{\lambda_1 - c}{e}\right)},$$

where $c_{m-1}$ denotes the Chebyshev polynomial of degree $m - 1$.

The relative difference between the right- and left-hand side converges to 0.


Refined Ritz vector

After a Ritz pair $(\theta, y)$ has been determined, the approximation $y$ to the eigenvector can be improved by solving the optimization problem

$$\|A z - \theta z\|_2 = \min!, \qquad z \in K_m(v_1, A), \quad \|z\|_2 = 1.$$

This improvement was introduced by Jia (1997) and was called refined Ritz vector, although a solution in general is not a Ritz vector corresponding to $\theta$.

Given a Ritz pair, the refined Ritz vector can be obtained from the augmented Hessenberg matrix

$$\tilde H_m = \begin{pmatrix} h_{11} & h_{12} & h_{13} & \dots & h_{1m} \\ h_{21} & h_{22} & h_{23} & \dots & h_{2m} \\ 0 & h_{32} & h_{33} & \dots & h_{3m} \\ & \ddots & \ddots & \ddots & \vdots \\ & & & h_{m,m-1} & h_{mm} \\ & & & & h_{m+1,m} \end{pmatrix} \in \mathbb{R}^{(m+1) \times m}.$$


Refined Ritz vector ct.

$z \in K_m(v_1, A)$ can be written as $z = V_m t$, and $\|z\|_2 = 1$ holds if and only if $\|t\|_2 = 1$.

Hence,

$$\|(A V_m - \theta V_m) t\|_2 = \|(V_{m+1} \tilde H_m - \theta V_m) t\|_2 = \|V_{m+1}(\tilde H_m - \theta I_{m+1,m}) t\|_2 = \|(\tilde H_m - \theta I_{m+1,m}) t\|_2,$$

and this expression attains its minimum under the constraint $\|t\|_2 = 1$ for the right singular vector of $\tilde H_m - \theta I_{m+1,m}$ corresponding to the smallest singular value.
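A minimal sketch of this computation (assuming V and the augmented matrix H_bar = tilde H_m come from an Arnoldi run as sketched above; names are illustrative):

import numpy as np

def refined_ritz_vector(V, H_bar, theta):
    """Refined Ritz vector for the Ritz value theta:
    minimize ||(H_bar - theta*I_{m+1,m}) t||_2 over ||t||_2 = 1."""
    m = H_bar.shape[1]
    I_aug = np.eye(m + 1, m)                     # rectangular identity I_{m+1,m}
    _, sigma, Wt = np.linalg.svd(H_bar - theta * I_aug)
    t = Wt[-1, :]                                # right singular vector of smallest sigma
    z = V[:, :m] @ t                             # refined Ritz vector, ||z||_2 = 1
    return z, sigma[-1]                          # sigma[-1] equals ||A z - theta z||_2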


Orthogonality of basis vectors

In exact arithmetic the Lanczos method generates an orthonormal basis of the Krylov space $K_m(v_1, A)$. In the algorithm only the orthogonality with respect to the two basis vectors $v_j$ and $v_{j-1}$ obtained in the last two steps is enforced; with respect to the previous $v_i$ it follows from the symmetry of $A$.

It can be shown that in floating point arithmetic the orthogonality is destroyed when a sequence $\theta_i^{(j)}$, $j = 1, 2, \dots$, of Ritz values has converged to an eigenvalue $\tilde\lambda$ of $A$, i.e. if the residual $\beta_j s_{j,i}^{(j)}$ has become small.

Thereafter all $v_j$ obtain a component in the direction of the eigenspace of the converged eigenvalue, and a duplicate copy of that eigenvalue will show up in the spectrum of the tridiagonal matrix $T_m$.

This effect was first observed and studied by Paige (1971). A detailed discussion is contained in the monograph of Parlett (1998).


Orthogonality of basis vectors ct.

Note that these multiple Ritz values have nothing to do with possible multiple eigenvalues of the given matrix; they occur simply as a result of a converged eigenvalue.

For multiple eigenvalues the Lanczos (and Arnoldi) method in exact arithmetic can only detect one eigenvector, namely the projection of the initial vector $v_1$ onto the corresponding eigenspace. Further eigenvectors can only be obtained by restarting with a different initial vector or by a block Lanczos (Arnoldi) method.

A simple trick to detect duplicate copies of an eigenvalue, advocated by Cullum & Willoughby (1986), is the following.

Compute the eigenvalues of the reduced matrix $\hat T_m \in \mathbb{R}^{(m-1) \times (m-1)}$ obtained from $T_m$ by deleting the first row and column. Those eigenvalues that differ by less than a small multiple of machine precision from the eigenvalues of $T_m$ are the unwanted eigenvalues, i.e. the ones due to loss of orthogonality.
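A small sketch of this test (illustrative names; alpha and beta are assumed to be the diagonal and off-diagonal of T_m, and the tolerance factor is a free choice):

import numpy as np

def spurious_ritz_values(alpha, beta, tol_factor=10.0):
    """alpha: diagonal of T_m (length m); beta: off-diagonal (length m-1).
    Flags Ritz values of T_m that reappear, up to a small multiple of machine
    precision, as eigenvalues of T_m with its first row and column deleted."""
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    theta = np.linalg.eigvalsh(T)
    theta_hat = np.linalg.eigvalsh(T[1:, 1:])        # T_m without first row/column
    tol = tol_factor * np.finfo(float).eps * max(np.abs(theta).max(), 1.0)
    spurious = np.array([np.abs(theta_hat - t).min() < tol for t in theta])
    return theta, spurious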


Complete reorthogonalization

In each step the new vector $v_{j+1}$ is reorthogonalized against all previous vectors $v_i$.

With classical Gram–Schmidt this means: $v_{j+1}$ is replaced by

$$\tilde v_{j+1} = v_{j+1} - V_j V_j^H v_{j+1}.$$

If the norm is decreased by a nontrivial amount, say

$$\|\tilde v_{j+1}\| < \tfrac{1}{\sqrt{2}}\, \|v_{j+1}\|,$$

the reorthogonalization will have to be repeated.

Complete reorthogonalization is very reliable, but very expensive.


Lanczos with complete reorthogonalization

1: choose initial vector v_1 with ||v_1|| = 1
2: set v_0 = 0; beta_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:    v_{j+1} = A v_j - beta_{j-1} v_{j-1}
5:    alpha_j = (v_j)^H v_{j+1}
6:    v_{j+1} = v_{j+1} - alpha_j v_j
7:    v_{j+1} = v_{j+1} - V V^H v_{j+1}
8:    beta_j = ||v_{j+1}||
9:    v_{j+1} = v_{j+1} / beta_j
10:   V = [V, v_{j+1}]
11:   solve projected eigenproblem T_j s = theta s
12:   test for convergence
13: end for

Example

Convergence of the Lanczos process with complete reorthogonalization for A = diag(rand(100,1)).

THEOREM 10.8 (Paige)

Let $V_k = [v_1, \dots, v_k]$ be the matrix of vectors actually obtained in the Lanczos algorithm, $\Theta_k = \operatorname{diag}\{\theta_1, \dots, \theta_k\}$ and $S_k = [s_1, \dots, s_k]$ such that $T_k S_k = S_k \Theta_k$ and $S_k^H S_k = I_k$.

Let $y_{k,i} = V_k s_i$ be the corresponding Ritz vectors. Then it holds

$$(y_{k,i})^H v_{k+1} = \frac{O(\varepsilon \|A\|_2)}{\beta_k\, |s_{k,i}|}.$$

Hence, the component $(y_{k,i})^H v_{k+1}$ of the computed Lanczos vector $v_{k+1}$ in the direction of the Ritz vector $y_{k,i}$ is proportional to the reciprocal of the error bound for the Ritz value $\theta_i$.


Selective reorthogonalization

By Paige's theorem the $v_j$ lose orthogonality since the vector $v_{j+1}$ obtained in the last step has a large component with respect to the Ritz vector $y = [v_1, \dots, v_j]\, s$ corresponding to the converged Ritz value $\theta$ (measured by the error bound $\beta_j |s_j|$).

This suggests to monitor the error bounds $\beta_j |s_j|$ for all eigenvectors $s$ of $T_j$ in every iteration step, and to reorthogonalize $v_{j+1}$ against the Ritz vector $y$:

$$v_{j+1} = v_{j+1} - (y^H v_{j+1})\, y.$$

This so-called selective reorthogonalization is applied if

$$\beta_j |s_j| < \sqrt{\varepsilon}\, \|T_j\|$$

(actually $\|A\|$ would have been needed on the right-hand side, but $\|A\|$ is not available).


Selective reorthogonalization

1: choose initial vector v_1 with ||v_1|| = 1
2: set v_0 = 0; beta_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:    v_{j+1} = A v_j - beta_{j-1} v_{j-1}
5:    alpha_j = (v_j)^H v_{j+1}
6:    v_{j+1} = v_{j+1} - alpha_j v_j
7:    beta_j = ||v_{j+1}||
8:    solve tridiag{beta_{i-1}, alpha_i, beta_i} S = S Theta
9:    for i = 1, ..., j do
10:      if beta_j |s^{(i)}_j| < sqrt(eps) * max(diag Theta) then
11:         y = [v_1, ..., v_j] s^{(i)}
12:         v_{j+1} = v_{j+1} - (y^H v_{j+1}) y
13:      end if
14:   end for
15:   beta_j = ||v_{j+1}||
16:   v_{j+1} = v_{j+1} / beta_j
17:   V = [V, v_{j+1}]
18: end for
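A minimal NumPy/SciPy sketch of the loop above (a sketch only: the function name, the use of scipy.linalg.eigh_tridiagonal, and the absolute value in the threshold are implementation choices; breakdown beta_j = 0 is not handled):

import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_selective(A, v1, m):
    """Lanczos with selective reorthogonalization against converged Ritz vectors."""
    n = v1.size
    V = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m)
    V[:, 0] = v1 / np.linalg.norm(v1)
    b_prev = 0.0
    for j in range(m):
        w = A @ V[:, j] - (b_prev * V[:, j - 1] if j > 0 else 0.0)
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        beta[j] = np.linalg.norm(w)
        # Ritz pairs of T_j and their error bounds beta_j * |s_{j,i}|
        theta, S = eigh_tridiagonal(alpha[:j + 1], beta[:j])
        for i in range(j + 1):
            if beta[j] * abs(S[j, i]) < np.sqrt(np.finfo(float).eps) * np.abs(theta).max():
                y = V[:, :j + 1] @ S[:, i]       # (nearly) converged Ritz vector
                w -= (y @ w) * y                 # selective reorthogonalization
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]
        b_prev = beta[j]
    return alpha, beta, V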

Partial reorthogonalization

It can be shown (Simon (1984)) that the properties of the Lanczos method are widely retained as long as the basis is semiorthogonal, i.e.

$$V_j^H V_j = I_j + E \quad \text{with } \|E\|_2 = \sqrt{\varepsilon},$$

where $\varepsilon$ denotes the rounding unit.

If the tridiagonal matrix $T_j$ is determined using a semiorthogonal basis $V_j$, then there exists an orthonormal basis $N_j$ of $\operatorname{span} V_j$ such that

$$T_j = V_j^H A V_j = N_j^H A N_j + G \quad \text{with } \|G\|_2 = O(\varepsilon \|A\|_2).$$

Hence, the eigenvalues of the problem projected to $\operatorname{span} V_j$ are obtained with full precision.


Partial reorthogonalization

1: choose initial vector v_1 with ||v_1|| = 1
2: set v_0 = 0; beta_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:    v_{j+1} = A v_j - beta_{j-1} v_{j-1}
5:    alpha_j = (v_j)^H v_{j+1}
6:    v_{j+1} = v_{j+1} - alpha_j v_j
7:    beta_j = ||v_{j+1}||
8:    v_{j+1} = v_{j+1} / beta_j
9:    if ||[V, v_{j+1}]^H [V, v_{j+1}] - I_{j+1}|| > sqrt(eps) then
10:      v_{j+1} = v_{j+1} - V V^H v_{j+1}
11:      beta_j = ||v_{j+1}||
12:      v_{j+1} = v_{j+1} / beta_j
13:   end if
14:   V = [V, v_{j+1}]
15: end for


Explicit restarts

The growing storage and arithmetic cost may make restarts of the Arnoldi algorithm necessary.

Since the Arnoldi method naturally starts with one vector, one of the most straightforward restarting schemes is to reduce the whole basis into one vector and start the new Arnoldi iteration with it.

If only one eigenvalue is required (for instance the one with the largest real part), we can choose to restart with the corresponding Ritz vector.

If more than one eigenvalue is wanted, we may add all Ritz vectors together to form one starting vector, or use a block version of the Lanczos algorithm that has the same block size as the number of wanted eigenvalues.

These options are simple to implement but not nearly as effective as the more sophisticated ones such as the implicit restarting scheme and the thick restart scheme.
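A compact sketch of the simplest variant, restarting with the Ritz vector of the Ritz value with largest real part (it reuses the arnoldi sketch given earlier; names, the tolerance, and the crude handling of complex Ritz vectors are illustrative simplifications, and no lucky breakdown is assumed):

import numpy as np

def explicitly_restarted_arnoldi(A, v0, m, n_restarts, tol=1e-10):
    """Repeatedly run m Arnoldi steps and restart with the Ritz vector
    belonging to the Ritz value of largest real part."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_restarts):
        V, H = arnoldi(A, v, m)                    # sketch from above
        theta, S = np.linalg.eig(H[:m, :m])
        i = np.argmax(theta.real)                  # wanted Ritz value
        resid = abs(H[m, m - 1]) * abs(S[m - 1, i])
        v = (V[:, :m] @ S[:, i]).real              # restart vector (real part only)
        v /= np.linalg.norm(v)
        if resid < tol:
            break
    return theta[i], v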


Implicit restarts (Sorensen 1992)

With $m = k + p$ steps of the Arnoldi method one determines a factorization

$$A V_m = V_m H_m + r_m (e_m)^H.$$

With $p$ steps of the QR algorithm with implicit shifts applied to $H_m$ one gets

$$A V_m^+ = V_m^+ H_m^+ + r_m (e_m)^H Q \qquad (*)$$

where $Q = Q_1 Q_2 \cdots Q_p$, the $Q_j$ are the orthogonal matrices from the $p$ QR steps, and $V_m^+ = V_m Q$, $H_m^+ = Q^H H_m Q$.

The leading $k - 1$ components of $(e_m)^H Q$ are 0. Hence, the leading $k$ columns of (*) have the form

$$A V_k^+ = V_k^+ H_k^+ + (r_k)^+ (e_k)^H$$

with the updated residual

$$(r_k)^+ = V_m^+ e_{k+1}\, h^+_{k+1,k} + r_m\, Q(m, k).$$
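A sketch of one such restart (for clarity it performs p explicitly shifted QR steps on H_m instead of the implicitly shifted version, which is mathematically equivalent; real shifts are assumed, and the names and the V/H layout follow the Arnoldi sketch given earlier):

import numpy as np
from scipy.linalg import qr

def implicit_restart(V, H, k, shifts):
    """From an m-step factorization A V_m = V_m H_m + r_m e_m^T
    (V: n x (m+1), H: (m+1) x m) to a k-step factorization."""
    m = H.shape[1]
    Vm, Hm = V[:, :m], H[:m, :m].copy()
    r = V[:, m] * H[m, m - 1]                    # residual vector r_m
    Q_total = np.eye(m)
    for sigma in shifts:                         # p = m - k shifts (e.g. unwanted Ritz values)
        Q, _ = qr(Hm - sigma * np.eye(m))
        Hm = Q.T @ Hm @ Q                        # one shifted QR step
        Q_total = Q_total @ Q
    V_plus = Vm @ Q_total
    # leading k columns: A V_k^+ = V_k^+ H_k^+ + r_k^+ e_k^T
    r_plus = V_plus[:, k] * Hm[k, k - 1] + r * Q_total[m - 1, k - 1]
    return V_plus[:, :k], Hm[:k, :k], r_plus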
