ITERATIVE PROJECTION METHODS FOR
SPARSE LINEAR SYSTEMS AND
EIGENPROBLEMS
CHAPTER 10: KRYLOV SUBSPACE METHODS
Heinrich Voss
voss@tu-harburg.de
Hamburg University of Technology, Institute of Numerical Simulation
Lanczos and Arnoldi Method
We already discussed in Sections 4, 5, and 6 how to construct an orthonormal basis $V_m$ of the Krylov space $\mathcal K_m(v_1, A)$ by the Arnoldi method and, for symmetric matrices, by the Lanczos method.
Along with the basis one obtains the orthogonal projection $H_m := V_m^T A V_m$ ($H_m := V_m^H A V_m$ in the complex case) of $A$ onto $\mathcal K_m(v_1, A)$, which is a Hessenberg matrix for a general matrix $A$ and tridiagonal for symmetric (Hermitian) $A$.
In Section 8 we discussed the two-sided Lanczos process, which constructs bases $V_m$ and $W_m$ of the Krylov spaces $\mathcal K_m(v_1, A)$ and $\mathcal K_m(w_1, A^T)$, respectively, along with the oblique projection of $A$ onto $\mathcal K_m(v_1, A)$ along $\mathcal K_m(w_1, A^T)$:
$$T_m = W_m^T A V_m.$$
$T_m$ is a tridiagonal matrix. If look-ahead is included, then $T_m$ is block-tridiagonal.
Arnoldi method; compact form
$$A V_m = V_m H_m + h_{m+1,m}\, v_{m+1} e_m^T, \qquad V_m^T V_m = I_m, \quad V_m^T v_{m+1} = 0,$$
$$V_m^T A V_m = H_m = \begin{pmatrix} h_{11} & h_{12} & h_{13} & \dots & h_{1m} \\ h_{21} & h_{22} & h_{23} & \dots & h_{2m} \\ 0 & h_{32} & h_{33} & \dots & h_{3m} \\ & \ddots & \ddots & \ddots & \vdots \\ & & & h_{m,m-1} & h_{mm} \end{pmatrix}.$$
The eigenvalue problem $H_m s = \theta s$ can be solved inexpensively by the QR algorithm.
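The compact form can be checked numerically. The following is a minimal numpy sketch (not from the lecture; matrix size, seed, and the helper name `arnoldi` are made up for illustration): it builds $V_{m+1}$ and the augmented Hessenberg matrix, then verifies $A V_m - V_m H_m = h_{m+1,m} v_{m+1} e_m^T$ and the orthonormality of the basis.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Arnoldi with modified Gram-Schmidt.
    Returns V (n x (m+1)) and the augmented Hessenberg H ((m+1) x m)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for k in range(j + 1):            # orthogonalize against v_1, ..., v_{j+1}
            H[k, j] = V[:, k] @ w
            w -= H[k, j] * V[:, k]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
n, m = 200, 20
A = rng.standard_normal((n, n))
V, H = arnoldi(A, rng.standard_normal(n), m)
Vm, Hm = V[:, :m], H[:m, :]

# compact form: A V_m - V_m H_m = h_{m+1,m} v_{m+1} e_m^T
R = A @ Vm - Vm @ Hm
R[:, m - 1] -= H[m, m - 1] * V[:, m]
print(np.linalg.norm(R))                      # near machine precision
print(np.linalg.norm(Vm.T @ Vm - np.eye(m)))  # orthonormal basis
theta = np.linalg.eigvals(Hm)                 # Ritz values (QR algorithm inside LAPACK)
```

The small dense eigenproblem for $H_m$ is solved by `np.linalg.eigvals`, which calls the QR algorithm mentioned above.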
Lucky termination
The Arnoldi method may terminate with $h_{j+1,j} = 0$ for some $j$. Then $v_{j+1} = 0$, and therefore
$$A v_j = \sum_{k=1}^{j} h_{kj} v_k \in \mathcal K_j(A, v_1).$$
For $i < j$ it holds by construction that
$$A v_i = \sum_{k=1}^{i+1} h_{ki} v_k \in \mathcal K_j(A, v_1).$$
Hence, $\mathcal K_j(A, v_1)$ is an invariant subspace of $A$, and therefore every eigenvalue $\theta_i^{(j)}$ of $H_j$ is an eigenvalue of $A$, and the corresponding Ritz vectors are eigenvectors of $A$.
Error indicator
If $\theta_i^{(m)}$ are the Ritz values (eigenvalues of $H_m$), $s_i^{(m)}$ the corresponding eigenvectors, and $x_i^{(m)} = V_m s_i^{(m)}$ the Ritz vectors, then (as for the Lanczos method) it holds that
$$(A - \theta_i^{(m)} I)\, x_i^{(m)} = h_{m+1,m}\, v_{m+1}\, e_m^T s_i^{(m)},$$
which implies
$$\|(A - \theta_i^{(m)} I)\, x_i^{(m)}\|_2 = h_{m+1,m}\, |s_{m,i}^{(m)}|.$$
If $A$ is symmetric, then by the theorem of Krylov & Bogoliubov there exists an eigenvalue $\tilde\lambda$ of $A$ such that
$$|\tilde\lambda - \theta_i^{(m)}| \le h_{m+1,m}\, |s_{m,i}^{(m)}|.$$
Notice that this error bound can be computed without determining the Ritz vector $x_i^{(m)} = V_m s_i^{(m)}$.
If $A$ is non-normal, then the Krylov & Bogoliubov theorem does not hold. Nonetheless, the backward error $h_{m+1,m}\, |s_{m,i}^{(m)}|$ is still used as an error indicator in this case.
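The identity and the Krylov & Bogoliubov bound can be illustrated as follows (a sketch; the symmetric test matrix, seed, and dimensions are invented): the cheap indicator $h_{m+1,m}|s_{m,i}^{(m)}|$ is compared against the true residual norms, and each Ritz value is checked to lie within the indicator of some eigenvalue of $A$.

```python
import numpy as np

def arnoldi(A, v1, m):
    # standard Arnoldi; for symmetric A, H_m is tridiagonal up to roundoff
    n = A.shape[0]
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for k in range(j + 1):
            H[k, j] = V[:, k] @ w
            w -= H[k, j] * V[:, k]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(1)
n, m = 300, 25
A = rng.standard_normal((n, n)); A = (A + A.T) / 2     # symmetric test matrix
V, H = arnoldi(A, rng.standard_normal(n), m)
theta, S = np.linalg.eigh(H[:m, :])                    # Ritz values and vectors of H_m
cheap = H[m, m - 1] * np.abs(S[m - 1, :])              # h_{m+1,m} |s_{m,i}|
X = V[:, :m] @ S                                       # Ritz vectors x_i = V_m s_i
true = np.linalg.norm(A @ X - X * theta, axis=0)       # ||(A - theta_i I) x_i||_2
print(np.max(np.abs(cheap - true)))                    # identity holds to roundoff

# Krylov & Bogoliubov: each theta_i lies within cheap[i] of some eigenvalue of A
lam = np.linalg.eigvalsh(A)
dist = np.min(np.abs(lam[:, None] - theta[None, :]), axis=0)
print(np.all(dist <= cheap + 1e-8))
```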
Arnoldi method
1: choose initial vector $v_1$ with $\|v_1\| = 1$, $V_1 = [v_1]$
2: compute $w = A v_1$, $h = v_1^T w$, $r = w - v_1 h$, $H_1 = [h]$, $\beta = \|r\|_2$
3: for $j = 1, 2, \dots$ do
4:   $v_{j+1} = r / \beta$
5:   $V_{j+1} = [V_j, v_{j+1}]$, $\hat H_j = \begin{pmatrix} H_j \\ \beta e_j^T \end{pmatrix}$
6:   $w = A v_{j+1}$
7:   $h = V_{j+1}^T w$, $r = w - V_{j+1} h$
8:   if $\|r\|_2 < \eta \|h\|_2$ then
9:     $s = V_{j+1}^T r$, $r = r - V_{j+1} s$
10:    $h = h + s$
11:   end if
12:   $H_{j+1} = [\hat H_j, h]$, $\beta = \|r\|_2$
13:   compute approximate eigenvalues of $H_{j+1}$
14:   test for convergence
15: end for
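The listing above, including the conditional reorthogonalization in steps 8-11, can be sketched in numpy as follows (the value of $\eta$ and the test matrix are choices for illustration, not prescribed by the slide):

```python
import numpy as np

def arnoldi_reorth(A, v1, m, eta=1 / np.sqrt(2)):
    """Arnoldi with classical Gram-Schmidt and one conditional
    reorthogonalization pass, following steps 4-12 of the listing."""
    n = A.shape[0]
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        Vj = V[:, : j + 1]
        h = Vj.T @ w                        # step 7: h = V^T w
        r = w - Vj @ h
        if np.linalg.norm(r) < eta * np.linalg.norm(h):
            s = Vj.T @ r                    # steps 9-10: reorthogonalize once
            r -= Vj @ s
            h += s
        H[: j + 1, j] = h
        H[j + 1, j] = np.linalg.norm(r)     # beta
        V[:, j + 1] = r / H[j + 1, j]
    return V, H

rng = np.random.default_rng(2)
n, m = 150, 30
A = rng.standard_normal((n, n))
V, H = arnoldi_reorth(A, rng.standard_normal(n), m)
print(np.linalg.norm(V.T @ V - np.eye(m + 1)))   # basis stays orthonormal
print(np.linalg.norm(A @ V[:, :m] - V @ H))      # A V_m = V_{m+1} H~_m
```

The "twice is enough" heuristic behind the $\eta$-test keeps classical Gram-Schmidt numerically orthogonal at the cost of at most one extra pass per step.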
Cost
For large matrices the Arnoldi method becomes costly, both in terms of computation and storage.
We need to keep $m$ vectors of length $n$ plus an $m \times m$ Hessenberg matrix. For the arithmetic cost, we need to multiply $v_{j+1}$ by $A$, at a cost of $2 N_z$ operations, where $N_z$ is the number of nonzero elements of $A$, and then orthogonalize the result against $j$ basis vectors, at a cost of $4(j+1)n$ operations.
Thus, an $m$-dimensional Arnoldi process costs approximately $nm + 0.5 m^2$ in storage and approximately $2 m N_z + 2 n m^2$ arithmetic operations.
In the symmetric case only the last two basis vectors $v_j$ are needed when determining the projection $H_m$. The previous vectors are not even needed to determine an error bound, and can be kept in secondary storage until the Ritz vectors are computed.
Convergence
The Arnoldi method is a generalization of the power method, where approximations to eigenvectors are obtained from the last iterate only (or the last two iterates in the case of a complex eigenvalue).
As for the power method, we can therefore expect fast convergence to the eigenvalue of maximal modulus and to the corresponding eigenvector.
One can influence the convergence of the power method by shifts, either to separate the wanted eigenvalue further from the remaining spectrum or to enforce convergence to a different eigenvalue.
The Arnoldi method is independent of shifts, since
$$\mathcal K_m(v_1, A) = \mathcal K_m(v_1, A + \alpha I) \quad \text{for all } \alpha \in \mathbb C.$$
Hence, we can expect convergence of the Arnoldi method to extreme eigenvalues first.
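The shift invariance can be observed directly: running Arnoldi on $A$ and on $A + \alpha I$ with the same starting vector produces the same basis, so the projected matrices differ exactly by $\alpha I$. A small sketch (matrix, shift, and seed are invented):

```python
import numpy as np

def arnoldi(A, v1, m):
    n = A.shape[0]
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for k in range(j + 1):
            H[k, j] = V[:, k] @ w
            w -= H[k, j] * V[:, k]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(3)
n, m, alpha = 120, 15, 7.5
A = rng.standard_normal((n, n))
v1 = rng.standard_normal(n)
_, H1 = arnoldi(A, v1, m)
_, H2 = arnoldi(A + alpha * np.eye(n), v1, m)
# same Krylov space and basis, hence H_m(A + alpha I) = H_m(A) + alpha I,
# and the Ritz values are simply shifted by alpha
print(np.linalg.norm(H2[:m, :] - (H1[:m, :] + alpha * np.eye(m))))
```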
Example
Eigenvalues (blue plus) of a random tridiagonal $100 \times 100$ matrix and approximations (red circle) after 10 steps of Arnoldi: [figure]
Lanczos algorithm
We analyze the convergence of the Lanczos method.
1: Choose initial vector $v_1$ with $\|v_1\| = 1$
2: Set $v_0 = 0$; $\beta_0 = 0$
3: for $j = 1, 2, \dots$ do
4:   $v_{j+1} = A v_j - \beta_{j-1} v_{j-1}$
5:   $\alpha_j = (v_j)^H v_{j+1}$
6:   $v_{j+1} = v_{j+1} - \alpha_j v_j$
7:   $\beta_j = \|v_{j+1}\|$
8:   $v_{j+1} = v_{j+1} / \beta_j$
9: end for
The matrix of the projected problem then is the tridiagonal matrix
$$T_m = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \beta_2 & \\ & \ddots & \ddots & \ddots \\ & & \beta_{m-1} & \alpha_m \end{pmatrix}.$$
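The three-term recurrence above can be sketched in a few lines of numpy (no reorthogonalization, so only suitable for moderate $m$ before orthogonality is lost; the test matrix and seed are invented). The Ritz values, i.e. the eigenvalues of $T_m$, must lie inside the spectrum of $A$:

```python
import numpy as np

def lanczos(A, v1, m):
    """Plain Lanczos three-term recurrence; returns the tridiagonal T_m."""
    n = A.shape[0]
    alpha = np.zeros(m); beta = np.zeros(m)
    v_old = np.zeros(n)
    v = v1 / np.linalg.norm(v1)
    b = 0.0
    for j in range(m):
        w = A @ v - b * v_old          # step 4
        alpha[j] = v @ w               # step 5
        w -= alpha[j] * v              # step 6
        b = np.linalg.norm(w)          # step 7
        beta[j] = b
        v_old, v = v, w / b            # step 8
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

rng = np.random.default_rng(4)
n = 400
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric test matrix
T = lanczos(A, rng.standard_normal(n), 30)
theta = np.linalg.eigvalsh(T)
lam = np.linalg.eigvalsh(A)
print(lam[0], theta[0], theta[-1], lam[-1])   # Ritz values inside the spectrum
```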
Monotonicity
Assume that the eigenvalues of $A$ and the Ritz values with respect to $\mathcal K_m$ are ordered by magnitude:
$$\lambda_1 \le \lambda_2 \le \dots \le \lambda_n, \qquad \theta_1^{(m)} \le \theta_2^{(m)} \le \dots \le \theta_m^{(m)}.$$
Then the minmax principle yields ($S$ denotes a subspace of $\mathbb C^m$ and $\tilde S$ a subspace of $\mathbb C^n$)
$$\theta_j^{(m)} = \min_{\dim S = j} \max_{y \in S} \frac{y^H T_m y}{y^H y} = \min_{\dim S = j} \max_{y \in S} \frac{y^H V_m^H A V_m y}{y^H V_m^H V_m y} = \min_{\dim \tilde S = j,\, \tilde S \subset \mathcal K_m} \max_{x \in \tilde S} \frac{x^H A x}{x^H x}$$
$$\ge \min_{\dim \tilde S = j,\, \tilde S \subset \mathcal K_{m+1}} \max_{x \in \tilde S} \frac{x^H A x}{x^H x} = \theta_j^{(m+1)} \ge \min_{\dim \tilde S = j} \max_{x \in \tilde S} \frac{x^H A x}{x^H x} = \lambda_j.$$
Each (finite) sequence $\{\theta_j^{(m)}\}_{m = j, j+1, j+2, \dots}$ is therefore monotonically decreasing and bounded below by $\lambda_j$.
Likewise, the sequence of $j$-th largest eigenvalues of $T_m$ is monotonically increasing and bounded above by the $j$-th largest eigenvalue of $A$.
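The monotonicity is easy to observe numerically, since $T_m$ is the leading $m \times m$ principal submatrix of $T_M$ for $m \le M$. A sketch (test matrix and sizes are invented): the smallest Ritz value $\theta_1^{(m)}$ should decrease with $m$ and stay above $\lambda_1$.

```python
import numpy as np

def lanczos(A, v1, m):
    # plain Lanczos three-term recurrence; returns T_m
    n = A.shape[0]
    alpha = np.zeros(m); beta = np.zeros(m)
    v_old = np.zeros(n); v = v1 / np.linalg.norm(v1); b = 0.0
    for j in range(m):
        w = A @ v - b * v_old
        alpha[j] = v @ w; w -= alpha[j] * v
        b = np.linalg.norm(w); beta[j] = b
        v_old, v = v, w / b
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

rng = np.random.default_rng(5)
n, M = 300, 25
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
T = lanczos(A, rng.standard_normal(n), M)
lam1 = np.linalg.eigvalsh(A)[0]
# T_m is the leading principal m x m block of T_M
theta1 = [np.linalg.eigvalsh(T[:m, :m])[0] for m in range(1, M + 1)]
print(np.all(np.diff(theta1) <= 1e-10))   # monotonically decreasing
print(min(theta1) >= lam1 - 1e-8)         # bounded below by lambda_1
```

The decrease of $\theta_1^{(m)}$ in $m$ is exactly the Cauchy interlacing property of nested principal submatrices of the symmetric matrix $T_M$.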
Convergence
Before deriving results about the speed of convergence of the Lanczos method, we first prove a bound for the angle between an eigenvector of $A$ and a Krylov space $\mathcal K_m(v_1, A)$. Denote by $u_i$ a system of orthonormal eigenvectors corresponding to the eigenvalues $\lambda_i$.
LEMMA 10.1
Let $P_i$ be the orthogonal projector onto the eigenspace corresponding to $\lambda_i$. If $P_i v_1 \ne 0$, then
$$\tan \delta(u_i, \mathcal K_m) = \min_{p \in \Pi_{m-1},\, p(\lambda_i) = 1} \|p(A) y_i\|_2 \, \tan \delta(u_i, v_1),$$
where
$$y_i = \begin{cases} \dfrac{(I - P_i) v_1}{\|(I - P_i) v_1\|_2} & \text{if } (I - P_i) v_1 \ne 0, \\[2mm] 0 & \text{otherwise.} \end{cases}$$
Proof
The Krylov space $\mathcal K_m(v_1, A)$ consists of all vectors which can be written as $x = q(A) v_1$, where $q \in \Pi_{m-1}$ is any polynomial of degree at most $m - 1$.
With the orthogonal decomposition
$$x = q(A) v_1 = q(A) P_i v_1 + q(A)(I - P_i) v_1$$
it holds for the angle $\delta(x, u_i)$ between $x$ and $u_i$ that
$$\tan \delta(x, u_i) = \frac{\|q(A)(I - P_i) v_1\|_2}{\|q(A) P_i v_1\|_2} = \frac{\|q(A) y_i\|_2}{|q(\lambda_i)|} \cdot \frac{\|(I - P_i) v_1\|_2}{\|P_i v_1\|_2},$$
and the scaling $p(\lambda) := q(\lambda) / q(\lambda_i)$ yields
$$\tan \delta(x, u_i) = \|p(A) y_i\|_2 \, \tan \delta(v_1, u_i),$$
from which we get the statement by minimizing over all $x \in \mathcal K_m(v_1, A)$.
Convergence ct.
Inserting any polynomial of degree $m - 1$ which satisfies $p(\lambda_i) = 1$, one obtains an upper bound for $\tan \delta(u_i, \mathcal K_m(v_1, A))$ from the last lemma.
We already have
THEOREM 10.2
Let $\alpha, \beta, \gamma \in \mathbb R$ with $\alpha < \beta$ and $\gamma \notin (\alpha, \beta)$. Then the minimization problem
$$\min_{p \in \Pi_m,\, p(\gamma) = 1} \ \max_{t \in [\alpha, \beta]} |p(t)|$$
has a unique solution, given by the scaled Chebyshev polynomial
$$\tilde c_m(t) := \begin{cases} c_m\!\left(1 + 2\,\frac{t - \beta}{\beta - \alpha}\right) \Big/ c_m\!\left(1 + 2\,\frac{\gamma - \beta}{\beta - \alpha}\right) & \text{for } \gamma > \beta, \\[2mm] c_m\!\left(1 + 2\,\frac{\alpha - t}{\beta - \alpha}\right) \Big/ c_m\!\left(1 + 2\,\frac{\alpha - \gamma}{\beta - \alpha}\right) & \text{for } \gamma < \alpha. \end{cases}$$
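The case $\gamma < \alpha$ is the relevant one when $\lambda_1$ lies below the rest of the spectrum. A small numerical sketch (the interval and degree are invented): the scaled Chebyshev polynomial satisfies $\tilde c_m(\gamma) = 1$, and its maximum modulus on $[\alpha, \beta]$ equals $1 / c_m\!\left(1 + 2\frac{\alpha - \gamma}{\beta - \alpha}\right)$.

```python
import numpy as np

def cheb(m, t):
    """Chebyshev polynomial c_m, evaluated on and off [-1, 1]."""
    t = np.asarray(t, dtype=float)
    out = np.empty_like(t)
    inside = np.abs(t) <= 1
    out[inside] = np.cos(m * np.arccos(t[inside]))
    out[~inside] = (np.sign(t[~inside]) ** m
                    * np.cosh(m * np.arccosh(np.abs(t[~inside]))))
    return out

alpha, beta, gamma, m = 2.0, 100.0, 1.0, 8        # gamma < alpha case
denom = np.cosh(m * np.arccosh(1 + 2 * (alpha - gamma) / (beta - alpha)))

def p(t):                                          # the scaled Chebyshev polynomial
    return cheb(m, 1 + 2 * (alpha - t) / (beta - alpha)) / denom

ts = np.linspace(alpha, beta, 10001)
print(p(np.array([gamma]))[0])                     # normalization p(gamma) = 1
print(np.max(np.abs(p(ts))), 1.0 / denom)          # grid max equals 1/c_m(...)
```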
THEOREM 10.3
The angle $\delta(u_i, \mathcal K_m(v_1, A))$ between the exact eigenvector $u_i$ and the $m$-th Krylov space satisfies the inequality
$$\tan \delta(u_i, \mathcal K_m) \le \frac{\kappa_i}{c_{m-i}(1 + 2 \rho_i)} \tan \delta(u_i, v_1), \qquad (1)$$
where
$$\kappa_1 = 1, \qquad \kappa_i = \prod_{j=1}^{i-1} \frac{\lambda_n - \lambda_j}{\lambda_i - \lambda_j} \quad \text{for } i > 1, \qquad (2)$$
and
$$\rho_i = \frac{\lambda_{i+1} - \lambda_i}{\lambda_n - \lambda_{i+1}}. \qquad (3)$$
In particular, for $i = 1$ one gets the estimate
$$\tan \delta(u_1, \mathcal K_m(v_1, A)) \le \frac{1}{c_{m-1}(1 + 2 \rho_1)} \tan \delta(v_1, u_1), \qquad \text{where } \rho_1 = \frac{\lambda_2 - \lambda_1}{\lambda_n - \lambda_2}.$$
Crucial for the convergence is the distance of the two smallest eigenvalues relative to the width of the entire spectrum.
Proof
We consider the case $i = 1$ first.
Expanding the vector $y_i$ in the basis $\{u_i\}$ of eigenvectors yields
$$y_i = \sum_{j=1}^{n} \alpha_j u_j, \qquad \text{where } \sum_{j=1}^{n} |\alpha_j|^2 = 1,$$
from which we get
$$\|p(A) y_1\|^2 = \sum_{j=2}^{n} |p(\lambda_j) \alpha_j|^2 \le \max_{j=2,\dots,n} |p(\lambda_j)|^2 \le \max_{\lambda \in [\lambda_2, \lambda_n]} |p(\lambda)|^2,$$
and the statement follows from Theorem 10.2.
Proof ct.
For $i > 1$ we consider in Lemma 10.1 polynomials of the form
$$p(\lambda) := \frac{(\lambda - \lambda_1) \cdots (\lambda - \lambda_{i-1})}{(\lambda_i - \lambda_1) \cdots (\lambda_i - \lambda_{i-1})}\, q(\lambda) \qquad \text{with } q \in \Pi_{m-i} \text{ and } q(\lambda_i) = 1.$$
Then one gets as before
$$\|p(A) y_i\|_2 \le \max_{\lambda \in [\lambda_{i+1}, \lambda_n]} \left| \prod_{j=1}^{i-1} \frac{\lambda - \lambda_j}{\lambda_i - \lambda_j}\, q(\lambda) \right| \le \prod_{j=1}^{i-1} \frac{\lambda_n - \lambda_j}{\lambda_i - \lambda_j}\ \max_{\lambda \in [\lambda_{i+1}, \lambda_n]} |q(\lambda)|.$$
The result follows by minimizing this expression over all polynomials $q$ satisfying the constraint $q(\lambda_i) = 1$.
THEOREM 10.4 (Kaniel & Paige; first eigenvalue)
Let $A \in \mathbb C^{n \times n}$ be Hermitian with eigenvalues $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ and corresponding orthonormal eigenvectors $u_1, \dots, u_n$.
If $\theta_1^{(m)} \le \dots \le \theta_m^{(m)}$ denote the eigenvalues of the matrix $T_m$ obtained after $m$ steps of Lanczos' method, then
$$0 \le \theta_1^{(m)} - \lambda_1 \le (\lambda_n - \lambda_1) \left( \frac{\tan \delta(u_1, v_1)}{c_{m-1}(1 + 2 \rho_1)} \right)^2, \qquad \text{where } \rho_1 = \frac{\lambda_2 - \lambda_1}{\lambda_n - \lambda_2}.$$
Crucial for the speed of convergence is the growth of $c_{m-1}(1 + 2 \rho_1)$, i.e. the separation of the first two eigenvalues relative to the width of the entire spectrum of $A$.
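The bound of Theorem 10.4 can be checked on a diagonal test matrix, where the $\lambda_i$ and $u_i$ are known exactly (a sketch; the spectrum, seed, and helper names are invented for illustration):

```python
import numpy as np

def lanczos_T(d, v1, m):
    # Lanczos for the diagonal matrix A = diag(d); returns T_m
    n = d.size
    alpha = np.zeros(m); beta = np.zeros(m)
    v_old = np.zeros(n); v = v1 / np.linalg.norm(v1); b = 0.0
    for j in range(m):
        w = d * v - b * v_old
        alpha[j] = v @ w; w -= alpha[j] * v
        b = np.linalg.norm(w); beta[j] = b
        v_old, v = v, w / b
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

rng = np.random.default_rng(6)
d = np.sort(np.concatenate(([1.0], rng.uniform(20.0, 100.0, 99))))
v1 = rng.standard_normal(100)
m = 10
theta1 = np.linalg.eigvalsh(lanczos_T(d, v1, m))[0]
lam1, lam2, lamn = d[0], d[1], d[-1]
rho1 = (lam2 - lam1) / (lamn - lam2)
c = v1[0] / np.linalg.norm(v1)          # cos of angle(u_1, v_1), since u_1 = e_1
tan2 = (1 - c ** 2) / c ** 2            # tan^2 delta(u_1, v_1)
cm1 = np.cosh((m - 1) * np.arccosh(1 + 2 * rho1))   # c_{m-1}(1 + 2 rho_1)
bound = (lamn - lam1) * tan2 / cm1 ** 2
print(theta1 - lam1, bound)             # 0 <= theta_1 - lambda_1 <= bound
```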
Proof
The left inequality follows from Rayleigh's principle. We have
$$\theta_1^{(m)} = \min_{x \in \mathcal K_m(v_1, A),\, x \ne 0} \frac{x^H A x}{x^H x},$$
and, since each $x \in \mathcal K_m(v_1, A)$ can be represented as $x = q(A) v_1$ for some $q \in \Pi_{m-1}$, it follows that
$$\theta_1^{(m)} - \lambda_1 = \min_{x \in \mathcal K_m(v_1, A),\, x \ne 0} \frac{x^H (A - \lambda_1 I) x}{x^H x} = \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{(v_1)^H q(A)^H (A - \lambda_1 I)\, q(A) v_1}{(v_1)^H q(A)^2 v_1}.$$
Proof ct.
With $v_1 = \sum_{j=1}^{n} \alpha_j u_j$ it holds that
$$\theta_1^{(m)} - \lambda_1 = \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{\sum_{j=2}^{n} (\lambda_j - \lambda_1) |\alpha_j q(\lambda_j)|^2}{\sum_{j=1}^{n} |\alpha_j q(\lambda_j)|^2} \le (\lambda_n - \lambda_1) \min_{q \in \Pi_{m-1},\, q \ne 0} \frac{\sum_{j=2}^{n} |\alpha_j q(\lambda_j)|^2}{|\alpha_1 q(\lambda_1)|^2}$$
$$\le (\lambda_n - \lambda_1) \min_{q \in \Pi_{m-1},\, q \ne 0} \max_{j=2,\dots,n} \frac{|q(\lambda_j)|^2}{|q(\lambda_1)|^2} \cdot \frac{\sum_{j=2}^{n} |\alpha_j|^2}{|\alpha_1|^2}.$$
Defining $p(\lambda) = q(\lambda)/q(\lambda_1)$, and observing that the set of all such $p$, as $q$ ranges over $\Pi_{m-1}$, is the set of all polynomials of degree not exceeding $m - 1$ satisfying the constraint $p(\lambda_1) = 1$, we get
$$\theta_1^{(m)} - \lambda_1 \le (\lambda_n - \lambda_1) \tan^2 \delta(u_1, v_1) \min_{p \in \Pi_{m-1},\, p(\lambda_1) = 1} \max_{\lambda \in [\lambda_2, \lambda_n]} |p(\lambda)|^2,$$
and the statement follows from Theorem 10.2.
Example
The matrix $A$ has the eigenvalue $\lambda_1 = 1$ and 99 eigenvalues uniformly distributed in $[\alpha, \beta]$.

 it.   [α,β] = [20,100]            [α,β] = [2,100]
       error        bound          error        bound
  2    3.3890e+001  3.4832e+003    1.9123e+001  1.1385e+005
  3    1.9708e+001  1.0090e+002    1.0884e+001  7.5862e+004
  4    6.6363e+000  1.5083e+001    5.8039e+000  5.5241e+004
  5    1.5352e+000  2.2440e+000    4.1850e+000  3.8224e+004
 10    7.9258e-005  1.6293e-004    1.8948e+000  4.2934e+003
 15    4.3730e-009  1.1827e-008    1.0945e+000  4.1605e+002
 20    2.9843e-013  8.5861e-013    1.6148e-001  3.9725e+001
 25                                1.4843e-002  3.7876e+000
 30                                1.6509e-003  3.6109e-001
 35                                2.0658e-004  3.4423e-002
 40                                5.8481e-006  3.2816e-003
 45                                9.5770e-007  3.1285e-004
 50                                3.0155e-009  2.9824e-005
 55                                8.0487e-012  2.8432e-006
 60                                3.1530e-014  2.7105e-007
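The qualitative behavior of the table can be reproduced with a few lines of numpy. This is a sketch only: the exact matrix and starting vector from the slide are unknown, and "uniformly distributed" is assumed here to mean equally spaced eigenvalues.

```python
import numpy as np

def lanczos_T(d, v1, m):
    # Lanczos for the diagonal matrix A = diag(d); returns T_m
    n = d.size
    alpha = np.zeros(m); beta = np.zeros(m)
    v_old = np.zeros(n); v = v1 / np.linalg.norm(v1); b = 0.0
    for j in range(m):
        w = d * v - b * v_old
        alpha[j] = v @ w; w -= alpha[j] * v
        b = np.linalg.norm(w); beta[j] = b
        v_old, v = v, w / b
    return np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)

def error(a, b, m, rng):
    # theta_1^{(m)} - lambda_1 for lambda_1 = 1 and 99 equally spaced
    # eigenvalues in [a, b]
    d = np.concatenate(([1.0], np.linspace(a, b, 99)))
    v1 = rng.standard_normal(100)
    return np.linalg.eigvalsh(lanczos_T(d, v1, m))[0] - 1.0

rng = np.random.default_rng(7)
for m in (2, 5, 10, 20):
    print(m, error(20.0, 100.0, m, rng), error(2.0, 100.0, m, rng))
# the [20,100] errors collapse quickly; the [2,100] errors decay far more slowly
```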
THEOREM 10.5 (Kaniel & Paige; higher eigenvalues)
Under the conditions of Theorem 10.4 it holds that
$$0 \le \theta_j^{(m)} - \lambda_j \le (\lambda_n - \lambda_1) \left( \frac{\kappa_j^{(m)} \tan \delta(v_1, u_j)}{c_{m-j}(1 + 2 \rho_j)} \right)^2$$
with $\rho_j = (\lambda_{j+1} - \lambda_j)/(\lambda_n - \lambda_{j+1})$, and
$$\kappa_1^{(m)} \equiv 1, \qquad \kappa_j^{(m)} = \prod_{i=1}^{j-1} \frac{\lambda_n - \theta_i^{(m)}}{\lambda_j - \theta_i^{(m)}}.$$
The general case $j > 1$ can be proved by using the maxmin characterization of $\theta_j^{(m)}$ due to Courant and Fischer.
Analogous results hold for the largest eigenvalues and Ritz values.
Convergence of Arnoldi Method
For nonsymmetric matrices and the Arnoldi process, the speed of convergence was analyzed by Saad (1983).
This was done by considering the distance of a particular eigenvector $u_1$ of $A$ from the subspace $\mathcal K_m(v_1, A)$.
Let
$$\varepsilon^{(m)} := \min_{p \in \Pi^*_{m-1}} \ \max_{\lambda \in \sigma(A) \setminus \{\lambda_1\}} |p(\lambda)|,$$
where $\sigma(A)$ is the spectrum of $A$, and $\Pi^*_{m-1}$ denotes the set of polynomials of degree at most $m - 1$ such that $p(\lambda_1) = 1$.
The following lemma relates this quantity to $\|(I - P_m) u_1\|$, where $P_m$ denotes the projector onto $\mathcal K_m(v_1, A)$.
Convergence of Arnoldi Method ct.
LEMMA 10.6
Assume that $A$ is diagonalizable and that the initial vector $v_1$ of Arnoldi's method has the expansion $v_1 = \sum_{j=1}^{n} \alpha_j u_j$ with respect to an eigenbasis $u_1, \dots, u_n$ of $A$, where $\|u_j\| = 1$ and $\alpha_1 \ne 0$. Then the following inequality holds:
$$\|(I - P_m) u_1\| \le \xi\, \varepsilon^{(m)}, \qquad \text{where } \xi = \sum_{j=2}^{n} |\alpha_j| / |\alpha_1|.$$
Hence, upper bounds on $\varepsilon^{(m)}$ estimate the speed of convergence of the Arnoldi method.
THEOREM 10.7
Assume that all eigenvalues of $A$ except $\lambda_1$ lie in an ellipse with center $c$, focal points $c - e$ and $c + e$, and major semi-axis $a$. Then it holds that
$$\varepsilon^{(m)} \le \frac{c_{m-1}\!\left(\frac{a}{e}\right)}{c_{m-1}\!\left(\frac{\lambda_1 - c}{e}\right)},$$
where $c_{m-1}$ denotes the Chebyshev polynomial of degree $m - 1$.
The relative difference between the right- and left-hand sides converges to 0.
Refined Ritz vector
After a Ritz pair (θ, y) has been determined, the approximation y to the eigenvector can be improved by solving the optimization problem

‖Az − θz‖_2 = min!,  z ∈ K_m(v_1, A),  ‖z‖_2 = 1.

This improvement was introduced by Jia (1997) and is called the refined Ritz vector, although a solution in general is not a Ritz vector corresponding to θ.

Given a Ritz pair, the refined Ritz vector can be obtained from the augmented Hessenberg matrix H̃_m ∈ R^{(m+1)×m}, i.e. the Hessenberg matrix H_m with the additional row (0, ..., 0, h_{m+1,m}) appended.
Refined Ritz vector ct.
Every z ∈ K_m(v_1, A) can be written as z = V_m t, and ‖z‖_2 = 1 holds if and only if ‖t‖_2 = 1. Hence,

‖(A V_m − θ V_m) t‖_2 = ‖(V_{m+1} H̃_m − θ V_m) t‖_2 = ‖V_{m+1} (H̃_m − θ I_{m+1,m}) t‖_2 = ‖(H̃_m − θ I_{m+1,m}) t‖_2,

where I_{m+1,m} denotes the identity matrix I_m with a zero row appended. This expression attains its minimum under the constraint ‖t‖_2 = 1 at the right singular vector of H̃_m − θ I_{m+1,m} corresponding to the smallest singular value.
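This minimization can be sketched in a few lines of NumPy; the function name `refined_ritz` and its calling convention are illustrative, not from the text. Here `V_mp1` holds the Arnoldi basis [v_1, ..., v_{m+1}] and `H_tilde` the augmented Hessenberg matrix:

```python
import numpy as np

def refined_ritz(V_mp1, H_tilde, theta):
    """Refined Ritz vector for the Ritz value theta.

    V_mp1   : n x (m+1) Arnoldi basis [v_1, ..., v_{m+1}]
    H_tilde : (m+1) x m augmented Hessenberg matrix
    Returns z = V_m t minimizing ||A z - theta z||_2 over unit z in K_m.
    """
    m = H_tilde.shape[1]
    I_pad = np.eye(m + 1, m)                       # I_{m+1,m}
    # right singular vector belonging to the smallest singular value
    _, _, Vt = np.linalg.svd(H_tilde - theta * I_pad)
    t = Vt[-1]
    return V_mp1[:, :m] @ t
```

Since V_m has orthonormal columns and t is a unit vector, the returned z automatically satisfies ‖z‖_2 = 1, and its residual can never exceed that of the Ritz vector for the same θ.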
Orthogonality of basis vectors
In exact arithmetic the Lanczos method generates an orthonormal basis of the Krylov space K_m(v_1, A). In the algorithm only the orthogonality with respect to the two basis vectors v_j and v_{j−1} obtained in the last two steps is enforced; with respect to the earlier vectors v_i it follows from the symmetry of A.

It can be shown that in floating point arithmetic the orthogonality is destroyed when a sequence θ_{i(j)}, j = 1, 2, ..., of Ritz values has converged to an eigenvalue λ̃ of A, i.e. when the residual β_j s_{j,i(j)} has become small.

Thereafter all vectors v_j acquire a component in the direction of the eigenspace of the converged eigenvalue, and a duplicate copy of that eigenvalue shows up in the spectrum of the tridiagonal matrix T_m.

This effect was first observed and studied by Paige (1971). A detailed discussion is contained in the monograph of Parlett (1998).
Orthogonality of basis vectors ct.
Note that these multiple Ritz values have nothing to do with possible multiple eigenvalues of the given matrix; they occur simply as a consequence of a converged eigenvalue.

For multiple eigenvalues the Lanczos (and Arnoldi) method in exact arithmetic can only detect one eigenvector, namely the projection of the initial vector v_1 onto the corresponding eigenspace. Further eigenvectors can only be obtained by restarting with a different initial vector or by a block Lanczos (Arnoldi) method.

A simple trick to detect duplicate copies of an eigenvalue, advocated by Cullum & Willoughby (1986), is the following: compute the eigenvalues of the reduced matrix T̂_m ∈ R^{(m−1)×(m−1)} obtained from T_m by deleting the first row and column. Those eigenvalues of T_m that differ from eigenvalues of T̂_m by less than a small multiple of machine precision are the unwanted ones, i.e. the ones due to loss of orthogonality.
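The Cullum–Willoughby comparison can be sketched directly in NumPy; the helper name and the tolerance argument are illustrative, not from the text:

```python
import numpy as np

def cullum_willoughby(alpha, beta, tol=1e-10):
    """Flag spurious eigenvalues of the Lanczos tridiagonal T_m.

    alpha : diagonal of T_m (length m)
    beta  : subdiagonal of T_m (length m-1)
    An eigenvalue of T_m that agrees (within tol) with an eigenvalue of
    the matrix obtained by deleting the first row and column of T_m is
    declared spurious (caused by loss of orthogonality).
    """
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    ev_T = np.linalg.eigvalsh(T)
    ev_hat = np.linalg.eigvalsh(T[1:, 1:])   # first row/column deleted
    spurious = np.array([np.min(np.abs(ev_hat - t)) < tol for t in ev_T])
    return ev_T, spurious
```

As a sanity check: if β_1 = 0, T_m decouples into [α_1] and T̂_m, so the eigenvalues of the trailing block are flagged, while a genuine unreduced tridiagonal triggers no flags (its eigenvalues interlace strictly with those of T̂_m).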
Complete reorthogonalization
In each step the new vector v_{j+1} is reorthogonalized against all previous vectors v_i. With classical Gram–Schmidt this means: v_{j+1} is replaced by

ṽ_{j+1} = v_{j+1} − V_j V_j^H v_{j+1}.

If the norm is decreased by a nontrivial amount, say ‖ṽ_{j+1}‖ < (1/√2) ‖v_{j+1}‖, the reorthogonalization has to be repeated.

Complete reorthogonalization is very reliable, but very expensive.
Lanczos with complete reorthogonalization
1: Choose initial vector v_1 with ‖v_1‖ = 1
2: Set v_0 = 0; β_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:   v_{j+1} = A v_j − β_{j−1} v_{j−1}
5:   α_j = v_j^H v_{j+1}
6:   v_{j+1} = v_{j+1} − α_j v_j
7:   v_{j+1} = v_{j+1} − V V^H v_{j+1}
8:   β_j = ‖v_{j+1}‖
9:   v_{j+1} = v_{j+1} / β_j
10:  V = [V, v_{j+1}]
11:  Solve projected eigenproblem T_j s = θ s
12:  Test for convergence
13: end for
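The loop above can be sketched as follows in NumPy (dense matrix, no convergence test; the function name is illustrative, not from the text):

```python
import numpy as np

def lanczos_full_reorth(A, v1, m):
    """Lanczos with complete reorthogonalization (classical Gram-Schmidt);
    returns the basis V = [v_1, ..., v_{m+1}] and the entries of T_m."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]       # three-term recurrence
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        # complete reorthogonalization against ALL previous vectors
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]
    return V, alpha, beta
```

The extra projection in each step costs O(nj) per step, i.e. O(nm²) overall, which is exactly the expense the selective and partial variants below try to avoid.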
Example
Convergence of the Lanczos process with complete reorthogonalization for A = diag(rand(100,1)). [figure]
THEOREM 10.8 (Paige)
Let V_k = [v_1, ..., v_k] be the matrix of vectors actually obtained in the Lanczos algorithm, Θ_k = diag{θ_1, ..., θ_k} and S_k = [s_1, ..., s_k] such that T_k S_k = S_k Θ_k and S_k^H S_k = I_k. Let y_{k,i} = V_k s_i be the corresponding Ritz vectors. Then it holds that

(y_{k,i})^H v_{k+1} = O(ε ‖A‖_2) / (β_k |s_{k,i}|),

where s_{k,i} denotes the last component of s_i.

Hence, the component (y_{k,i})^H v_{k+1} of the computed Lanczos vector v_{k+1} in the direction of the Ritz vector y_{k,i} is proportional to the reciprocal of the error bound β_k |s_{k,i}| for the Ritz value θ_i.
Selective reorthogonalization
By Paige's theorem the vectors v_j lose orthogonality because the vector v_{j+1} obtained in the last step has a large component in the direction of the Ritz vector y = [v_1, ..., v_j] s corresponding to a converged Ritz value θ (convergence being measured by the error bound β_j |s_j|).

This suggests monitoring the error bounds β_j |s_j| for all eigenvectors s of T_j in every iteration step, and reorthogonalizing v_{j+1} against the Ritz vector y:

v_{j+1} = v_{j+1} − (y^H v_{j+1}) y.

This so-called selective reorthogonalization is applied if β_j |s_j| < √ε ‖T_j‖ (actually ‖A‖ would be needed on the right-hand side, but ‖A‖ is not available).
Selective reorthogonalization
1: Choose initial vector v_1 with ‖v_1‖ = 1
2: Set v_0 = 0; β_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:   v_{j+1} = A v_j − β_{j−1} v_{j−1}
5:   α_j = v_j^H v_{j+1}
6:   v_{j+1} = v_{j+1} − α_j v_j
7:   β_j = ‖v_{j+1}‖
8:   Solve tridiag{β_{i−1}, α_i, β_i} S = S Θ
9:   for i = 1, ..., j do
10:    if β_j |s_j^(i)| < √ε max(diag Θ) then
11:      y = [v_1, ..., v_j] s^(i)
12:      v_{j+1} = v_{j+1} − (y^H v_{j+1}) y
13:    end if
14:  end for
15:  β_j = ‖v_{j+1}‖
16:  v_{j+1} = v_{j+1} / β_j
17:  V = [V, v_{j+1}]
18: end for
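A NumPy sketch of this loop, using max|θ_i| as the cheap stand-in for ‖T_j‖ as in line 10 above (function name illustrative, no convergence test):

```python
import numpy as np

def lanczos_selective_reorth(A, v1, m):
    """Lanczos with selective reorthogonalization (sketch): v_{j+1} is
    purged only of Ritz vectors whose error bound beta_j * |s_{j,i}|
    has dropped below sqrt(eps) * ||T_j||."""
    eps = np.finfo(float).eps
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        bj = np.linalg.norm(w)
        # Ritz pairs of the current tridiagonal T_j
        T = (np.diag(alpha[:j + 1])
             + np.diag(beta[:j], 1) + np.diag(beta[:j], -1))
        theta, S = np.linalg.eigh(T)
        for i in range(j + 1):
            if bj * abs(S[j, i]) < np.sqrt(eps) * np.abs(theta).max():
                y = V[:, :j + 1] @ S[:, i]     # converged Ritz vector
                w -= (y @ w) * y
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]
    return V, alpha, beta
```

Solving the full tridiagonal eigenproblem in every step, as written here and in the algorithm above, is itself not cheap; practical codes update the Ritz information incrementally.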
Partial reorthogonalization
It can be shown (Simon (1984)) that the properties of the Lanczos method are largely retained as long as the basis is semiorthogonal, i.e.

V_j^H V_j = I_j + E  with  ‖E‖_2 = √ε,

where ε denotes the rounding unit.

If the tridiagonal matrix T_j is determined using a semiorthogonal basis V_j, then there exists an orthonormal basis N_j of span V_j such that

T_j = V_j^H A V_j = N_j^H A N_j + G  with  ‖G‖_2 = O(ε ‖A‖_2).

Hence, the eigenvalues of the problem projected onto span V_j are obtained to full precision.
Partial reorthogonalization
1: Choose initial vector v_1 with ‖v_1‖ = 1
2: Set v_0 = 0; β_0 = 0; V = [v_1]
3: for j = 1, 2, ... do
4:   v_{j+1} = A v_j − β_{j−1} v_{j−1}
5:   α_j = v_j^H v_{j+1}
6:   v_{j+1} = v_{j+1} − α_j v_j
7:   β_j = ‖v_{j+1}‖
8:   v_{j+1} = v_{j+1} / β_j
9:   if ‖[V, v_{j+1}]^H [V, v_{j+1}] − I_{j+1}‖ > √ε then
10:    v_{j+1} = v_{j+1} − V V^H v_{j+1}
11:    β_j = ‖v_{j+1}‖
12:    v_{j+1} = v_{j+1} / β_j
13:  end if
14:  V = [V, v_{j+1}]
15: end for
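A NumPy sketch of this variant; instead of forming [V, v_{j+1}]^H [V, v_{j+1}] it checks max_i |v_i^H v_{j+1}| > √ε, which is equivalent in spirit (a practical implementation such as Simon's estimates these inner products by a recurrence rather than computing them, which would be as expensive as complete reorthogonalization):

```python
import numpy as np

def lanczos_partial_reorth(A, v1, m):
    """Lanczos that reorthogonalizes only when semiorthogonality
    of the basis is lost (sketch, no convergence test)."""
    eps = np.finfo(float).eps
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        beta[j] = np.linalg.norm(w)
        vnext = w / beta[j]
        Vj = V[:, :j + 1]
        # semiorthogonality test against all previous basis vectors
        if np.abs(Vj.T @ vnext).max() > np.sqrt(eps):
            vnext -= Vj @ (Vj.T @ vnext)
            vnext /= np.linalg.norm(vnext)
        V[:, j + 1] = vnext
    return V, alpha, beta
```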
Explicit restarts
The growing storage and arithmetic cost may make restarts of the Arnoldi algorithm necessary.
Since the Arnoldi method naturally starts with one vector, one of the most straightforward restarting schemes is to reduce the whole basis into one vector and start the new Arnoldi iteration with it.
If only one eigenvalue is required (for instance the one with the largest real part), we can choose to restart with the corresponding Ritz vector.
If more than one eigenvalue is wanted, we may add all Ritz vectors together to form one starting vector, or use a block version of the Lanczos algorithm that has the same block size as the number of wanted eigenvalues.
These options are simple to implement but not nearly as effective as the more sophisticated ones such as the implicit restarting scheme and the thick restart scheme.
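The single-vector restart strategy can be sketched as follows; the function names are illustrative, and for a real nonsymmetric A with a complex dominant Ritz pair, taking the real part of the Ritz vector is a common simplification:

```python
import numpy as np

def arnoldi(A, v1, m):
    """m steps of Arnoldi: returns V (n x (m+1)) and H ((m+1) x m)."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def restarted_arnoldi(A, v0, m, restarts):
    """Explicit restart with the Ritz vector belonging to the
    Ritz value of largest real part."""
    v = v0
    for _ in range(restarts):
        V, H = arnoldi(A, v, m)
        theta, S = np.linalg.eig(H[:m, :m])
        i = np.argmax(theta.real)
        v = np.real(V[:, :m] @ S[:, i])   # restart vector
    return theta[i], v / np.linalg.norm(v)
```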
Implicit restarts (Sorensen 1992)
With m = k + p steps of the Arnoldi method one determines a factorization

A V_m = V_m H_m + r_m (e_m)^H.

With p steps of the QR algorithm with implicit shifts applied to H_m one gets

A V_m^+ = V_m^+ H_m^+ + r_m (e_m)^H Q,   (*)

where Q = Q_1 Q_2 ··· Q_p with the orthogonal matrices Q_j from the p QR steps, and V_m^+ = V_m Q, H_m^+ = Q^H H_m Q.

The leading k − 1 components of (e_m)^H Q are 0. Hence, the leading k columns of (*) have the form

A V_k^+ = V_k^+ H_k^+ + r_k^+ (e_k)^H

with the updated residual r_k^+ = V_m^+ e_{k+1} h_{k+1,k}^+ + r_m Q(m, k).
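A NumPy sketch of one such implicit restart with exact shifts (here the m − k Ritz values of smallest real part are discarded as an example choice; real shifts are assumed, complex conjugate pairs would require double shifts):

```python
import numpy as np

def implicit_restart(V, H, r, k):
    """One implicit restart: given A V_m = V_m H_m + r (e_m)^H, apply
    p = m - k shifted QR steps and truncate to a length-k factorization
    A V_k+ = V_k+ H_k+ + r+ (e_k)^H."""
    m = H.shape[0]
    # exact shifts: discard the m - k Ritz values of smallest real part
    shifts = np.sort(np.linalg.eigvals(H).real)[: m - k]
    Q_acc = np.eye(m)
    for mu in shifts:
        Q, _ = np.linalg.qr(H - mu * np.eye(m))
        H = Q.T @ H @ Q                  # one QR step; H stays Hessenberg
        Q_acc = Q_acc @ Q
    Vp = V @ Q_acc
    # leading k-1 entries of (e_m)^H Q are zero, hence the new residual
    r_new = Vp[:, k] * H[k, k - 1] + r * Q_acc[m - 1, k - 1]
    return Vp[:, :k], H[:k, :k], r_new
```

The returned triple again satisfies the Arnoldi relation of length k, so the iteration can be expanded back to length m without discarding the accumulated spectral information.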