
© 2016 Jens-Peter M. Zemke

VARIANTS OF IDR WITH PARTIAL ORTHONORMALIZATION

JENS-PETER M. ZEMKE

Abstract. We present four variants of IDR(s) that generate vectors such that consecutive blocks of s+1 vectors are orthonormal. IDR methods are based on tuning parameters: an initially chosen, so-called shadow space, and the so-called seed values. We collect possible choices for the seed values.

We prove that under certain conditions all four variants are mathematically equivalent and discuss possible breakdowns. We give an error analysis of all four variants and a numerical comparison in the context of the solution of linear systems and eigenvalue problems.

Key words. IDR; partial orthonormalization; minimum norm expansion; error analysis

AMS subject classifications. 65F25 (primary); 65F10; 65F15; 65F50

1. Introduction. We present four computationally different IDR(s) variants that are based on orthonormalization of every s+1 vectors computed in the recurrence.

IDR(s) [15] is a recent Krylov subspace method for the solution of linear systems [15, 18, 17] or eigenvalue problems [5, 9]. IDR is an acronym for induced dimension reduction, a quite recent¹ technique in the setting of Krylov subspace methods. There exist several different implementations of IDR(s), but the implementation in [17] is the only published one that computes vectors such that every s+1 consecutive vectors in the same space are orthonormalized. We call IDR methods with this property "IDR with partial orthonormalization" and present three other IDR variants with partial orthonormalization. We prove that in the generic case the four variants are mathematically equivalent, with the exception of possible additional breakdowns of the variant in [17]. We classify breakdowns of all four variants and give a simple a posteriori error analysis, i.e., the recurrence error is bounded in terms of the computed quantities. IDR is related to the two-sided Lanczos process and suffers from the same possible breakdowns, making an a priori error analysis more or less impossible.

1.1. Motivation. In the IDR variant in [15] several vectors in the same space are computed that are differences of residual vectors corresponding to a linear system of equations to be solved. This minimizes the number of vectors that have to be stored, at the price of additional instabilities. In [18] linear combinations of these vectors are used that simplify the algorithm and speed up the solution process of the small linear systems that arise in IDR(s). It is hard to predict whether this local basis transformation makes the method more stable than the original one. As a remedy we used in [17] orthonormalization of the computed vectors in the same space. Numerical experiments suggest that the latter variant is the most stable of these three variants.

At the same time we experimented in [9] with different ways of generating the new vectors in the spaces, combined with the orthonormalization used in [17]. In this note we introduce the four most interesting variants we tested, prove the mathematical equivalence in the generic case, give a common rough error analysis of all four variants, and showcase with two toy examples the typical behavior of the four variants in the context of linear systems and eigenvalue problems.

Version of March 21, 2016, 17:13.

Institut für Mathematik E-10, Lehrstuhl Numerische Mathematik, Technische Universität Hamburg-Harburg, D-21073 Hamburg, Germany (zemke@tu-harburg.de).

¹Original IDR [20] dates to 1979, but the more interesting variant IDR(s) [15] dates to 2008.

1.2. Notation. We use standard notation. Matrices are denoted by capital bold letters A ∈ C^{n×n}, their columns by small bold letters a_j, 1 ≤ j ≤ n, and their elements by small letters a_{i,j}, 1 ≤ i, j ≤ n. The identity matrix is denoted by I_n ∈ C^{n×n}, its columns by e_j, 1 ≤ j ≤ n, its elements by the Kronecker delta δ_{i,j}, 1 ≤ i, j ≤ n. O denotes a zero matrix, o a zero vector. Ā is the (elementwise) complex conjugate of A. A lower bar appended to a matrix or vector, like H̲_m ∈ C^{(m+1)×m} or e̲_1 ∈ C^{m+1}, indicates one extra row appended at the bottom of H_m ∈ C^{m×m} or e_1 ∈ C^m, with the exception of z̲_m ∈ C^m, x̲_m ∈ C^n, and r̲_m ∈ C^n, which are quantities related to H̲_m and e̲_1. The inverse, Moore–Penrose pseudo-inverse, transpose, and complex conjugate transpose are denoted by appending ^{−1}, ^{+}, ^{T}, and ^{H}, respectively. Spaces are denoted by capital calligraphic letters; vectors from these spaces are usually denoted by the same small bold letter. For x ∈ R, ⌊x⌋ ∈ Z is the largest integer with ⌊x⌋ ≤ x. Similarly, ⌈x⌉ ∈ Z is the smallest integer with x ≤ ⌈x⌉. Inclusion of sets is denoted by ⊆, strict inclusion is denoted by ⊂.

1.3. Outline. In section 2 we gather basic definitions and present the IDR theorem, the core of all IDR methods. Section 3 contains a generic IDR algorithm and the four IDR algorithms with partial orthonormalization. We introduce the concept of the so-called generalized Hessenberg decomposition that describes the computed quantities in theory, and give a brief sketch of how to apply IDR in the context of linear systems and eigenvalue problems. Section 4 is devoted to the choice of the so-called seed values. In section 5 the mathematical equivalence of the four algorithms is analyzed and different types of breakdown are classified. Section 6 is devoted to an error analysis of all four IDR algorithms with partial orthonormalization. In section 7 we present two numerical examples, one for a linear system and one for an eigenvalue problem. We conclude in section 8 with how to select the appropriate variant.

2. Basics. IDR methods comprise a class of Sonneveld methods; Sonneveld methods comprise a class of Krylov subspace methods. Our definition of Krylov subspaces is tailored to define a class of Sonneveld methods that includes the prototype IDR(s) of [15]:

Definition 2.1 (Krylov subspaces). Let A ∈ C^{n×n} and q ∈ C^n. We define the right Krylov subspaces

$$\mathcal{K}_j := \mathcal{K}_j(A, q) := \operatorname{span}\{q, Aq, \dots, A^{j-1}q\} = \{p(A)q \mid \deg(p) < j\}, \quad j \ge 1,$$
$$\mathcal{K}_0 := \mathcal{K}_0(A, q) := \{o\}, \qquad \mathcal{K} := \mathcal{K}(A, q) := \mathcal{K}_n(A, q).$$ (2.1)

Let additionally Q̂ = (q̂_1, ..., q̂_s) ∈ C^{n×s} with full rank s, typically s ≪ n. We define the left block Krylov subspaces

$$\hat{\mathcal{K}}_0 := \mathcal{K}_0(A^H, \hat{Q}) := \{o\}, \qquad \hat{\mathcal{K}}_j := \mathcal{K}_j(A^H, \hat{Q}) := \Bigl\{\, \sum_{i=0}^{j-1} (A^H)^i \hat{Q} c_i \;\Big|\; c_i \in \mathbb{C}^s \Bigr\} = \sum_{i=1}^{s} \mathcal{K}_j(A^H, \hat{q}_i), \quad j \ge 1.$$ (2.2)
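As a concrete illustration (ours, not from the paper), the following NumPy sketch assembles the generating vectors of (2.1) and (2.2) column-wise; the columns are not orthonormalized and the test data are arbitrary.

import numpy as np

def right_krylov_basis(A, q, j):
    """Columns q, Aq, ..., A^{j-1} q spanning K_j(A, q), cf. (2.1)."""
    cols = [q]
    for _ in range(j - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

def left_block_krylov_basis(A, Qhat, j):
    """Columns of Qhat, A^H Qhat, ..., (A^H)^{j-1} Qhat spanning K_j(A^H, Qhat), cf. (2.2)."""
    blocks = [Qhat]
    for _ in range(j - 1):
        blocks.append(A.conj().T @ blocks[-1])
    return np.column_stack(blocks)

# tiny example with arbitrary data
rng = np.random.default_rng(0)
n, s, j = 8, 2, 3
A = rng.standard_normal((n, n))
q = rng.standard_normal(n)
Qhat = rng.standard_normal((n, s))
print(right_krylov_basis(A, q, j).shape)          # (8, 3)
print(left_block_krylov_basis(A, Qhat, j).shape)  # (8, 6)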

Just like Krylov subspace methods are based on Krylov subspaces, Sonneveld methods are based on Sonneveld spaces [13, Definition 2.2, p. 2690]:

Definition 2.2 (Sonneveld/IDR spaces; seed polynomials/values). Let p ∈ C[z], A ∈ C^{n×n}, q ∈ C^n, and Q̂ ∈ C^{n×s} with full rank s. We define the Sonneveld space

$$\mathcal{P}(p, A, q, \hat{Q}) := p(A)\,\bigl(\mathcal{K}(A, q) \cap \mathcal{K}_{\deg(p)}(A^H, \hat{Q})^{\perp}\bigr).$$ (2.3)

In this paper we focus on the IDR spaces

$$\mathcal{G}_j := \mathcal{P}(M_j, A, q, \hat{Q}), \quad j \ge 0,$$ (2.4)

where the seed polynomials M_j, j ≥ 0, are defined recursively by

$$M_0(z) := 1, \qquad M_j(z) := (z - \mu_j)\,M_{j-1}(z), \quad j \ge 1.$$ (2.5)

The roots µ_j, j ≥ 1, are called seed values.


The following theorem states some well-known properties of IDR spaces. In particular, they are nested and can be represented recursively without referring to A^H:

Theorem 2.3 (IDR Theorem). Let S := {v ∈ C^n | Q̂^H v = o} = K_1(A^H, Q̂)^⊥. Then

$$\mathcal{G}_0 = \mathcal{K} = \mathcal{K}(A, q),$$
$$\mathcal{G}_j = (A - \mu_j I)\,\mathcal{V}_{j-1}, \quad \text{where } \mathcal{V}_{j-1} := \mathcal{G}_{j-1} \cap \mathcal{S}, \quad j \ge 1.$$ (2.6)

In particular, it holds that G_j ⊆ G_{j−1}, j ≥ 1.

Suppose that G_0 and S do not share a nontrivial invariant subspace of A, and that µ_j ∉ spec(A), j ≥ 1. Then there exists a uniquely defined j* ∈ N_0, j* ≤ n, such that the first j* inclusions are strict,

$$\mathcal{G}_j \subset \mathcal{G}_{j-1}, \quad 1 \le j \le j^*, \quad \text{and} \quad \mathcal{G}_{j^*} = \{o\}.$$ (2.7)

Proof. See [15], [11, Theorem 11, p. 1104, Note 2, p. 1105].

By (2.7) and (2.6) of Theorem 2.3 it follows that

$$0 < \dim(\mathcal{G}_{j-1}) - \dim(\mathcal{G}_j) \le \operatorname{codim}(\mathcal{S}) = s, \quad 1 \le j \le j^*.$$ (2.8)

In [15, p. 1043] it is stated without proof that if S is a random space, then, with probability one, dim(G_{j−1}) − dim(G_j) = s, 1 ≤ j ≤ ⌊n/s⌋ = j* − 1. This is referred to as the generic case in [15].

Sonneveld methods and IDR methods are methods that compute approximations (e.g., to eigenvectors or to the solution of a linear system) that are linear combinations of vectors in a Sonneveld space and in an IDR space, respectively. IDR methods are Sonneveld methods, but not vice versa.

The approximations computed by a Sonneveld method take the form G_m c_m for c_m ∈ C^m and G_m = (g_1, ..., g_m) with columns g_1, g_2, ..., g_m ∈ K_m, m ≥ 1. In contrast to Krylov subspace methods like the Arnoldi method [1] and the Lanczos method [7, 8], we do not enforce rank(G_m) = m in a Sonneveld method. In an IDR method, typically m − ⌊m/(s+1)⌋ ≤ rank(G_m) ≤ m; compare with the structure of G_{m+1} in (2.11).

In the generic case, dim(G_{j−1}/G_j) = s, 1 ≤ j < j*. The known IDR algorithms compute s+1 linearly independent vectors g_{(j−1)(s+1)+1}, ..., g_{j(s+1)} that lie in the set G_{j−1} \ G_j, 1 ≤ j < j*; the first s vectors comprise the representatives of a basis of the quotient space G_{j−1}/G_j, the last vector is an auxiliary vector to guarantee that the intersection span(g_{(j−1)(s+1)+1}, ..., g_{j(s+1)}) ∩ S contains a non-trivial vector. To ease the presentation of the algorithms in the next section, we define the local IDR matrices

$$G_s^{(j-1)} := \bigl(g_{(j-1)(s+1)+1}, \dots, g_{j(s+1)-1}\bigr), \qquad G_{s+1}^{(j-1)} := \bigl(G_s^{(j-1)}, g_{j(s+1)}\bigr), \qquad j \ge 1,$$ (2.9)

and the local IDR vectors

$$g_k^{(j-1)} := g_{(j-1)(s+1)+k}, \quad 1 \le k \le s+1, \quad j \ge 1.$$ (2.10)

The matrix G_{s+1}^{(j−1)} contains all s+1 vectors in G_{j−1} \ G_j, the matrix G_s^{(j−1)} only the representatives of the basis of G_{j−1}/G_j. The global matrix G_{m+1} is given in terms of local matrices and vectors by

$$G_{m+1} = \bigl(G_{s+1}^{(0)}, G_{s+1}^{(1)}, \dots, G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_k^{(j)}\bigr), \quad j := \Bigl\lfloor \frac{m+1}{s+1} \Bigr\rfloor, \quad k := (m+1) - j(s+1), \quad m \ge 0.$$ (2.11)
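The block indexing of (2.10), (2.11), and later (3.1) is easy to get wrong in an implementation; the following helper (a hypothetical illustration of ours) maps the global index k of g_k to the block superscript j and the local index.

def local_index(k, s):
    """Map the global 1-based index k of g_k to (j, local k), cf. (2.10) and (3.1):
    g_k = g^{(j)}_{local} with j = floor((k-1)/(s+1)) and local = k - j*(s+1)."""
    j = (k - 1) // (s + 1)
    local = k - j * (s + 1)
    return j, local

# for s = 2 the vectors g_1, ..., g_6 split into the blocks G^{(0)}_3 and G^{(1)}_3
print([local_index(k, 2) for k in range(1, 7)])
# [(0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3)]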


3. Algorithms. In this section we describe algorithms that compute unique vectors

$$g_k \in \mathcal{K}_k \setminus \mathcal{K}_{k-1}, \qquad g_k \in \mathcal{G}_j, \quad j = \Bigl\lfloor \frac{k-1}{s+1} \Bigr\rfloor, \quad 1 \le k \le m+1,$$ (3.1)

based on the assumption that the vectors constructed in two consecutive G_j spaces are linearly independent except possibly the last, and that no linear combinations of the first s vectors constructed in each G_j are in the kernel of Q̂^H,

$$\operatorname{rank}(G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_s^{(j)}) = 2s+1, \quad 1 \le j < \Bigl\lfloor \frac{m+1}{s+1} \Bigr\rfloor,$$ (3.2a)
$$\operatorname{rank}(G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_{m-j(s+1)}^{(j)}) = s+1+(m-j(s+1)), \quad j = \Bigl\lfloor \frac{m+1}{s+1} \Bigr\rfloor,$$ (3.2b)
$$\operatorname{rank}(\hat{Q}^H G_s^{(j)}) = s, \quad 0 \le j < \Bigl\lfloor \frac{m+1}{s+1} \Bigr\rfloor;$$ (3.2c)

this we term the generic case. First we present a generic IDR algorithm to compute the vectors g_1, ..., g_{m+1} under this assumption. We derive four computationally different variants of it, these are named srIDR, fmIDR, mnIDR, and ovIDR. Afterwards we specialize the generic IDR algorithm to an IDR algorithm that has the property that all G_{s+1}^{(j)} are orthonormal,

$$\bigl(G_{s+1}^{(j)}\bigr)^H G_{s+1}^{(j)} = I_{s+1}, \quad j \ge 0.$$ (3.3)

We term this algorithm IDR with partial orthonormalization.

3.1. Generic IDR. In this section we derive our generic IDR algorithm that includes all known IDR algorithms as special cases.

Let the function [h_{1,0}, H_m, G_{m+1}] ← Krylov(A, q, m) denote a generic Krylov subspace method that computes a matrix G_{m+1} with im(G_{m+1}) = K_{m+1}(A, q), a scalar h_{1,0} such that G_{m+1} e_1 h_{1,0} = q, and an extended Hessenberg matrix H_m, such that the Hessenberg decomposition

$$A G_m = G_{m+1} H_m$$ (3.4)

is satisfied. Algorithm 1 with a rule for the computation of the scalars h_{i,j} ∈ C results in any Krylov subspace method; these scalars might be given, e.g., the power method is obtained for h_{i,k} = δ_{i−1,k}, or they might be computed in Algorithm 1; an example is Arnoldi's process, given here in pseudocode as Algorithm 3.

Algorithm 1 Krylov (generic variant)
input: A ∈ C^{n×n}; q ∈ C^n; m ∈ N.
output: h_{1,0} ∈ C; H_m ∈ C^{(m+1)×m}; G_{m+1} ∈ C^{n×(m+1)}.
1: g_1 ← q/h_{1,0};
2: for k = 1 : m do
3:   g_{k+1} ← (A g_k − Σ_{i=1}^{k} g_i h_{i,k}) / h_{k+1,k};
4: end for
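A direct NumPy transcription of Algorithm 1 might look as follows (a sketch; the names and the callback interface are ours). The rule passed as h_rule fixes the free scalars; power_rule is a normalized variant of the power-method choice h_{i,k} = δ_{i−1,k} mentioned above.

import numpy as np

def krylov_generic(A, q, m, h_rule):
    """Algorithm 1: h_rule(k, Gk, w) must return the column (h_{1,k}, ..., h_{k+1,k})."""
    n = len(q)
    h10 = np.linalg.norm(q)              # one possible choice of the free scalar h_{1,0}
    G = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    G[:, 0] = q / h10
    for k in range(1, m + 1):
        w = A @ G[:, k - 1]
        h = h_rule(k, G[:, :k], w)       # scalars h_{1,k}, ..., h_{k+1,k}
        H[:k + 1, k - 1] = h
        G[:, k] = (w - G[:, :k] @ h[:-1]) / h[-1]
    return h10, H, G

# h_{i,k} = 0 for i <= k, h_{k+1,k} = ||A g_k||: normalized power iteration
power_rule = lambda k, Gk, w: np.append(np.zeros(k, dtype=complex), np.linalg.norm(w))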

To highlight the dependency on the basis used to define the shadow space we write ker(Q̂^H) in place of S in the IDR algorithms. The generic IDR algorithm is given as Algorithm 2.

In Algorithm 2 there is a lot of freedom: the choice of the starting Krylov subspace method (line 1), the computation of the seed values (line 5), the solution of the s consecutive underdetermined systems (line 9), and the choice of the scalars h_{i,k}^{(j)} (line 7, line 12).


Algorithm 2 IDR (generic variant)
input: A ∈ C^{n×n}; q ∈ C^n; Q̂ ∈ C^{n×s}; m ∈ N.
output: G_{m+1} ∈ C^{n×(m+1)}; ... % see (2.11)
1: [h_{1,0}^{(0)}, H_s^{(0)}, G_{s+1}^{(0)}] ← Krylov(A, q, s); % im(G_{s+1}^{(0)}) = K_{s+1} ⊂ G_0
2: for j = 1, ... do
3:   c_0^{(j)} ← (Q̂^H G_s^{(j−1)})^{−1} (Q̂^H g_{s+1}^{(j−1)}); % see (3.2)
4:   v_0^{(j)} ← g_{s+1}^{(j−1)} − G_s^{(j−1)} c_0^{(j)}; % v_0^{(j)} ∈ im(G_{s+1}^{(j−1)}) ∩ ker(Q̂^H) ⊂ V_{j−1}
5:   choose µ_j; % discussed in section 4
6:   r_1^{(j)} ← A v_0^{(j)} − v_0^{(j)} µ_j; % r_1^{(j)} ∈ (A − µ_j I) V_{j−1} = G_j
7:   g_1^{(j)} ← r_1^{(j)} / h_{1,0}^{(j)}; % g_1^{(j)} ∈ G_j
8:   for k = 1 : s do
9:     solve Q̂^H (G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_{k−1}^{(j)}) c_k^{(j)} = Q̂^H g_k^{(j)}; % see (3.8)
10:    v_k^{(j)} ← g_k^{(j)} − (G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_{k−1}^{(j)}) c_k^{(j)}; % v_k^{(j)} ∈ V_{j−1}
11:    r_{k+1}^{(j)} ← A v_k^{(j)} − v_k^{(j)} µ_j; % r_{k+1}^{(j)} ∈ G_j
12:    g_{k+1}^{(j)} ← (r_{k+1}^{(j)} − Σ_{i=1}^{k} g_i^{(j)} h_{i,k}^{(j)}) / h_{k+1,k}^{(j)}; % g_{k+1}^{(j)} ∈ G_j
13:  end for
14: end for

The choice of the scalars in the starting Krylov method and in line 7, line 12 will be used to derive our IDR algorithm with partial orthonormalization; the solution of the underdetermined systems in line 9 defines our four variants srIDR, fmIDR, mnIDR, and ovIDR. Mathematically speaking, the selection of the seed values is not very important; from a numerical point of view it is. The selection of appropriate seed values will be discussed in section 4.
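Lines 3-4 amount to a simple oblique projection; in NumPy they might read as follows (a sketch assuming, as in (3.2c), that Q̂^H G_s^{(j−1)} is nonsingular; the function name is ours).

import numpy as np

def first_intersection_vector(Qhat, G_s, g_last):
    """Lines 3-4 of Algorithm 2: c0 = (Qhat^H G_s)^{-1} Qhat^H g_last,
    v0 = g_last - G_s c0, so that Qhat^H v0 = 0 and the component of g_last is one."""
    c0 = np.linalg.solve(Qhat.conj().T @ G_s, Qhat.conj().T @ g_last)
    v0 = g_last - G_s @ c0
    return c0, v0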

We assume that (3.2) is satisfied and show that for any fixed choice of the free parameters, provided the algorithm does not break down, it generates vectors g_k that satisfy (3.1): In line 1 a matrix G_{s+1}^{(0)} is computed with im(G_{s+1}^{(0)}) = K_{s+1} ⊆ G_0. We recall that G_{s+1}^{(j−1)} satisfies assumption (3.2) and that im(G_{s+1}^{(j−1)}) ⊆ G_{j−1}. By assumption (3.2c), the Sonneveld coefficients c_0^{(j)} can be computed in line 3, and determine a vector v_0^{(j)} in the intersection V_{j−1} = G_{j−1} ∩ S in line 4. By a dimensional argument this vector is unique up to scaling if dim(im(G_{s+1}^{(j−1)}) + S) = n, since

$$\dim(\operatorname{im}(G_{s+1}^{(j-1)}) \cap \mathcal{S}) = \underbrace{\operatorname{rank}(G_{s+1}^{(j-1)})}_{s+1} + \underbrace{\dim(\mathcal{S})}_{n-s} - \underbrace{\dim(\operatorname{im}(G_{s+1}^{(j-1)}) + \mathcal{S})}_{\le n} \ge 1.$$ (3.5)

It is easy to prove that (3.2c) implies dim(im(G_{s+1}^{(j−1)}) + S) = n. To ensure a nonzero component in the direction of the latest g_{s+1}^{(j−1)}, we scale v_0^{(j)} such that this component is one, which results in the linear system in line 3. We are free to choose a new seed value in line 5, which uniquely determines the next IDR space G_j. Possible selection schemes are discussed in section 4. In line 6 a first vector r_1^{(j)} ∈ G_j is computed. As no more information about G_j is available at this step, the only possible transformation is a scaling, e.g., normalization. This is done in line 7 and results in the first g_1^{(j)} ∈ G_j. To repeat the whole procedure on the next level, we need s additional vectors g_{k+1}^{(j)} ∈ G_j, 1 ≤ k ≤ s. Here we make use of the fact that G_j ⊆ G_{j−1}, which implies that im(G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_k^{(j)}) ⊆ G_{j−1}. By assumption (3.2a) and (3.2b),

$$\operatorname{rank}(G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_k^{(j)}) = s+k+1,$$ (3.6)


so we can use a dimensional argument like in (3.5):

$$\dim(\operatorname{im}((G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_k^{(j)})) \cap \mathcal{S}) = \underbrace{\operatorname{rank}(G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_k^{(j)})}_{s+k+1} + \underbrace{\dim(\mathcal{S})}_{n-s} - \underbrace{\dim(\operatorname{im}((G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_k^{(j)})) + \mathcal{S})}_{\le n} \ge k+1.$$ (3.7)

Again, assumption (3.2c) implies dim(im((G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_k^{(j)})) + S) = n. The vectors v_1^{(j)}, ..., v_k^{(j)} in the intersection im(G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_k^{(j)}) ∩ S ⊆ G_{j−1} ∩ S = V_{j−1} are not uniquely defined. We are looking for a linear combination of the vectors in G_{j−1} that are known at this step, which lies in S. To ensure that the right Krylov subspace is expanded, K_k ⊊ K_{k+1}, we scale the component of the current vector g_k^{(j)} to one, which results in the underdetermined linear systems in line 9. By assumption (3.2c), the s×(s+k) matrix in line 9 has full rank s for 1 ≤ k ≤ s,

$$s \ge \operatorname{rank}(\hat{Q}^H (G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_{k-1}^{(j)})) \ge \operatorname{rank}(\hat{Q}^H G_s^{(j-1)}) = s,$$ (3.8)

thus the underdetermined systems are all solvable. We present four variants that result in uniquely defined vectors v_1^{(j)}, ..., v_k^{(j)}. Typically the four variants compute different v_1^{(j)}, ..., v_k^{(j)} (a small code sketch of the four coefficient choices follows after the list):

• srIDR: We set the k first components of c_k^{(j)} to zero; this results in the shortest recurrence possible, as we no longer need the vectors g_1^{(j−1)}, ..., g_k^{(j−1)}. This might not always be possible, as the rank of the matrix

$$M_{\mathrm{sr}}^{(j,k)} := \hat{Q}^H \bigl(g_{k+1}^{(j-1)}, \dots, g_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_{k-1}^{(j)}\bigr) \in \mathbb{C}^{s \times s}$$ (3.9)

might be less than s. The Sonneveld coefficients of srIDR are given by

$$c_k^{(j),\mathrm{sr}} := \begin{pmatrix} o_k \\ \bigl(M_{\mathrm{sr}}^{(j,k)}\bigr)^{-1} \hat{Q}^H g_k^{(j)} \end{pmatrix}.$$ (3.10)

This variant is used in [15, 16, 18, 17].

• fmIDR: We set the k last components of c_k^{(j)} to zero; this results by assumption (3.2c) in a non-singular matrix

$$M_{\mathrm{fm}}^{(j)} := M_{\mathrm{fm}}^{(j,k)} := \hat{Q}^H G_s^{(j-1)} \in \mathbb{C}^{s \times s}$$ (3.11)

that is used for s+1 consecutive steps; we use the factored matrix more than once. The Sonneveld coefficients of fmIDR are given by

$$c_k^{(j),\mathrm{fm}} := \begin{pmatrix} \bigl(M_{\mathrm{fm}}^{(j)}\bigr)^{-1} \hat{Q}^H g_k^{(j)} \\ o_k \end{pmatrix}.$$ (3.12)

This variant is used in [13].

• mnIDR: We compute the minimum-norm solution of the underdetermined system and use the full-rank matrix

$$M_{\mathrm{mn}}^{(j,k)} := \hat{Q}^H (G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_{k-1}^{(j)}) \in \mathbb{C}^{s \times (s+k)}.$$ (3.13)

The Sonneveld coefficients of mnIDR are given by

$$c_k^{(j),\mathrm{mn}} := \bigl(M_{\mathrm{mn}}^{(j,k)}\bigr)^{+} \hat{Q}^H g_k^{(j)}.$$ (3.14)

This variant has been described first in [9]; it is used in numerical examples in [22].


• ovIDR: We use the k degrees of freedom to orthogonalize against the computed vectors v_0^{(j)}, ..., v_{k−1}^{(j)}, which have to be stored, thereby increasing the storage. We define

$$V_k^{(j)} := \bigl(v_0^{(j)}, \dots, v_{k-1}^{(j)}\bigr) \in \mathbb{C}^{n \times k}$$ (3.15)

and

$$M_{\mathrm{ov}}^{(j,k)} := (\hat{Q}, V_k^{(j)})^H (G_{s+1}^{(j-1)}, g_1^{(j)}, \dots, g_{k-1}^{(j)}) \in \mathbb{C}^{(s+k) \times (s+k)}.$$ (3.16)

The Sonneveld coefficients of ovIDR are given by

$$c_k^{(j),\mathrm{ov}} := \bigl(M_{\mathrm{ov}}^{(j,k)}\bigr)^{-1} (\hat{Q}, V_k^{(j)})^H g_k^{(j)}.$$ (3.17)

This variant has not been published before.

Regardless of the variant used, in line 10 unique vectors v_k^{(j)} ∈ V_{j−1} in the intersection are computed. These are mapped to vectors r_{k+1}^{(j)} ∈ G_j in line 11. As in step k of the inner loop already k previously computed vectors g_1^{(j)}, ..., g_k^{(j)} exist, we can compute linear combinations with these without leaving G_j, and we can scale the result. This is done in line 12, where g_{k+1}^{(j)} ∈ G_j is computed. In this manner the algorithm computes s+1 vectors g_1^{(j)}, ..., g_{s+1}^{(j)} ∈ G_j and we can move to the next level.

The srIDR variant in [15] is based on scalars h_{i,k} that sum column-wise to zero, Σ_{i=1}^{k+1} h_{i,k} = 0; the srIDR variant in [18] uses these scalars to enforce e_i^T Q̂^H g_{k+1}^{(j)} = 0, 1 ≤ i ≤ k ≤ s; the fmIDR variant in [13] uses them to orthonormalize the vectors v_0^{(j)}, ..., v_s^{(j)}. A natural idea, first mentioned in [9] and first published for srIDR in [17], is to orthonormalize the resulting vectors g_1^{(j)}, ..., g_{s+1}^{(j)}; the more general algorithm is described in the next subsection.

3.2. IDR with partial orthonormalization. In this subsection we specialize Algorithm 2 to an IDR algorithm with partial orthonormalization. We replace the generic Krylov method given in Algorithm 1 by Arnoldi's process [1], [h_{1,0}, H_m, G_{m+1}] ← Arnoldi(A, q, m), see Algorithm 3. This ensures that G_{s+1}^{(0)} is orthonormal, see (3.3).

Algorithm 3 Arnoldi
input: A ∈ C^{n×n}; q ∈ C^n; m ∈ N.
output: h_{1,0} ∈ C; H_m ∈ C^{(m+1)×m}; G_{m+1} ∈ C^{n×(m+1)}.
1: h_{1,0} ← ‖q‖;
2: g_1 ← q/h_{1,0};
3: for k = 1 : m do
4:   r_{k+1} ← A g_k;
5:   for i = 1 : k do
6:     h_{i,k} ← g_i^H r_{k+1};
7:   end for
8:   p_{k+1} ← r_{k+1} − Σ_{i=1}^{k} g_i h_{i,k};
9:   h_{k+1,k} ← ‖p_{k+1}‖;
10:  g_{k+1} ← p_{k+1}/h_{k+1,k};
11: end for
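A direct NumPy transcription of Algorithm 3 (ours; it is reused in the sketch of Algorithm 4 below):

import numpy as np

def arnoldi(A, q, m):
    """Algorithm 3: returns h10, the extended Hessenberg H ((m+1) x m), and G (n x (m+1))."""
    n = len(q)
    G = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    h10 = np.linalg.norm(q)
    G[:, 0] = q / h10
    for k in range(m):
        r = A @ G[:, k]
        H[:k + 1, k] = G[:, :k + 1].conj().T @ r
        p = r - G[:, :k + 1] @ H[:k + 1, k]
        H[k + 1, k] = np.linalg.norm(p)
        G[:, k + 1] = p / H[k + 1, k]
    return h10, H, G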

We add the orthonormalization scheme to Algorithm 2 to ensure (3.3), and replace the solution of the underdetermined systems in line 9 of Algorithm 2 by one of the variants srIDR, fmIDR, mnIDR, or ovIDR to obtain Algorithm 4, IDR with partial orthonormalization.
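To make the bookkeeping concrete, the following NumPy sketch implements Algorithm 4 restricted to the mnIDR variant with the Rayleigh-quotient seed values (4.6); it reuses the arnoldi function sketched above, returns the matrices of the generalized Hessenberg decomposition (3.24) introduced in the next subsection, and omits all breakdown checks. Function and variable names are ours; only the pseudocode of Algorithm 4 is from the paper.

import numpy as np

def rayleigh_seed(v, Av):
    # seed value (4.6): Rayleigh quotient of v
    return np.vdot(v, Av) / np.vdot(v, v)

def idr_po_mn(A, q, Qhat, nblocks):
    """Sketch of Algorithm 4, mnIDR variant. Returns G (n x (m+1)), U, H (both (m+1) x m)
    and D (m x m) such that A @ G[:, :m] @ U[:m, :] ~= G @ (H + U @ D), cf. (3.24)."""
    n, s = Qhat.shape
    m = (nblocks + 1) * (s + 1) - 1
    G = np.zeros((n, m + 1), dtype=complex)
    U = np.zeros((m + 1, m), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    D = np.zeros((m, m), dtype=complex)
    QH = Qhat.conj().T

    h10, H0, G0 = arnoldi(A, q, s)           # line 1: orthonormal G^{(0)}_{s+1}
    G[:, :s + 1], H[:s + 1, :s], U[:s, :s] = G0, H0, np.eye(s)

    col = s                                  # next column of U, H, D (0-based)
    for j in range(1, nblocks + 1):
        lo = (j - 1) * (s + 1)               # first column of G^{(j-1)}_{s+1} inside G
        mu = 0.0
        for k in range(s + 1):
            if k == 0:                       # lines 3-4
                W = G[:, lo:lo + s]
                gk = G[:, lo + s]
                ck = np.linalg.solve(QH @ W, QH @ gk)
            else:                            # lines 14-15 (mnIDR) and 21
                W = G[:, lo:lo + s + k]
                gk = G[:, lo + s + k]
                ck = np.linalg.pinv(QH @ W) @ (QH @ gk)
            v = gk - W @ ck
            U[lo:lo + W.shape[1], col] = -ck
            U[lo + s + k, col] = 1.0         # unit diagonal entry of U
            Av = A @ v
            if k == 0:
                mu = rayleigh_seed(v, Av)    # line 5
            r = Av - mu * v                  # lines 6 / 22
            D[col, col] = mu
            Gj = G[:, lo + s + 1:lo + s + 1 + k]   # already computed g^{(j)}_1, ..., g^{(j)}_k
            h = Gj.conj().T @ r              # lines 23-25
            p = r - Gj @ h                   # line 26
            hnext = np.linalg.norm(p)        # lines 7 / 27
            G[:, lo + s + 1 + k] = p / hnext # lines 8 / 28
            H[lo + s + 1:lo + s + 1 + k, col] = h
            H[lo + s + 1 + k, col] = hnext
            col += 1
    return h10, G, U, H, D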

3.3. The generalized Hessenberg decomposition. In this subsection we collect the relations between the vectors constructed and the scalars used into matrix equations that will be useful later on.


Algorithm 4 IDR (partial orthonormalization)
input: A ∈ C^{n×n}; q ∈ C^n; Q̂ ∈ C^{n×s}; m ∈ N.
output: G_{m+1} ∈ C^{n×(m+1)}; ... % see (2.11)
1: [h_{1,0}^{(0)}, H_s^{(0)}, G_{s+1}^{(0)}] ← Arnoldi(A, q, s); % im(G_{s+1}^{(0)}) = K_{s+1} ⊂ G_0
2: for j = 1, ... do
3:   c_0^{(j)} ← (Q̂^H G_s^{(j−1)})^{−1} (Q̂^H g_{s+1}^{(j−1)}); % see (3.2)
4:   v_0^{(j)} ← g_{s+1}^{(j−1)} − G_s^{(j−1)} c_0^{(j)}; % v_0^{(j)} ∈ im(G_{s+1}^{(j−1)}) ∩ ker(Q̂^H) ⊂ V_{j−1}
5:   choose µ_j; % discussed in section 4
6:   r_1^{(j)} ← A v_0^{(j)} − v_0^{(j)} µ_j; % r_1^{(j)} ∈ (A − µ_j I) V_{j−1} = G_j
7:   h_{1,0}^{(j)} ← ‖r_1^{(j)}‖;
8:   g_1^{(j)} ← r_1^{(j)} / h_{1,0}^{(j)}; % g_1^{(j)} ∈ G_j
9:   for k = 1 : s do
10:    if srIDR then
11:      c_k^{(j)} ← [ o_k ; (Q̂^H (g_{k+1}^{(j−1)}, ..., g_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_{k−1}^{(j)}))^{−1} Q̂^H g_k^{(j)} ];
12:    else if fmIDR then
13:      c_k^{(j)} ← [ (Q̂^H G_s^{(j−1)})^{−1} Q̂^H g_k^{(j)} ; o_k ];
14:    else if mnIDR then
15:      c_k^{(j)} ← (Q̂^H (G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_{k−1}^{(j)}))^{+} Q̂^H g_k^{(j)};
16:    else if ovIDR then
17:      c_k^{(j)} ← ((Q̂, V_k^{(j)})^H (G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_{k−1}^{(j)}))^{−1} (Q̂, V_k^{(j)})^H g_k^{(j)};
18:    else
19:      solve Q̂^H (G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_{k−1}^{(j)}) c_k^{(j)} = Q̂^H g_k^{(j)}; % see (3.8)
20:    end if
21:    v_k^{(j)} ← g_k^{(j)} − (G_{s+1}^{(j−1)}, g_1^{(j)}, ..., g_{k−1}^{(j)}) c_k^{(j)}; % v_k^{(j)} ∈ V_{j−1}
22:    r_{k+1}^{(j)} ← A v_k^{(j)} − v_k^{(j)} µ_j; % r_{k+1}^{(j)} ∈ G_j
23:    for i = 1 : k do
24:      h_{i,k}^{(j)} ← (g_i^{(j)})^H r_{k+1}^{(j)};
25:    end for
26:    p_{k+1}^{(j)} ← r_{k+1}^{(j)} − Σ_{i=1}^{k} g_i^{(j)} h_{i,k}^{(j)};
27:    h_{k+1,k}^{(j)} ← ‖p_{k+1}^{(j)}‖;
28:    g_{k+1}^{(j)} ← p_{k+1}^{(j)} / h_{k+1,k}^{(j)}; % g_{k+1}^{(j)} ∈ G_j
29:  end for
30: end for

We define the local matrices

$$R_{s+1}^{(j)} := \bigl(r_1^{(j)}, \dots, r_{s+1}^{(j)}\bigr) \in \mathbb{C}^{n \times (s+1)}$$ (3.18)

collecting the vectors computed in line 6 and line 11 of Algorithm 2, and use V_{s+1}^{(j)} as defined by (3.15). In the call to Krylov in Algorithm 2 in line 1 and in line 12 of Algorithm 2, GR decompositions² of (q, A G_s^{(0)}) and R_{s+1}^{(j)}, 1 ≤ j, are computed, respectively:

$$\bigl(q, A G_s^{(0)}\bigr) = G_{s+1}^{(0)} \bigl(e_1 h_{1,0}^{(0)}, H_s^{(0)}\bigr), \qquad R_{s+1}^{(j)} = G_{s+1}^{(j)} \bigl(e_1 h_{1,0}^{(j)}, H_s^{(j)}\bigr).$$ (3.19)

In Algorithm 3 and Algorithm 4 these are QR decompositions.

We define the global matrix V_m in terms of local vectors and matrices by

$$V_m := \bigl(g_1^{(0)}, \dots, g_s^{(0)}, V_{s+1}^{(1)}, \dots, V_{s+1}^{(j-1)}, v_0^{(j)}, \dots, v_{k-1}^{(j)}\bigr).$$ (3.20)

²A GR decomposition is a decomposition of the form "general matrix" times "upper triangular matrix", see [19].


In line 4 and in line 10 of Algorithm 2 the vectors v_0^{(j)}, ..., v_s^{(j)} ∈ V_{j−1} are computed as linear combinations

$$V_{s+1}^{(j)} = \bigl(G_{s+1}^{(j-1)}\ \ G_{s+1}^{(j)}\bigr)
\begin{pmatrix}
-c_0^{(j)} & & \\
 & \ddots & \\
 & & -c_s^{(j)} \\
1 & & \\
 & \ddots & \\
 & & 1 \\
o_{s+1} & & \\
 & \ddots & \\
 & & o_1
\end{pmatrix}
=: \bigl(G_{s+1}^{(j-1)}\ \ G_{s+1}^{(j)}\bigr)\, U_{s+1}^{(j)}.$$ (3.21)

The local matrices R_{s+1}^{(j)} are given by

$$R_{s+1}^{(j)} = A V_{s+1}^{(j)} - V_{s+1}^{(j)} \operatorname{diag}(\mu_j, \dots, \mu_j).$$ (3.22)

Combining equations (3.19), (3.21) and (3.22), we obtain the coupling between two local blocks G_{s+1}^{(j−1)} and G_{s+1}^{(j)}, j ≥ 1,

$$A \bigl(G_{s+1}^{(j-1)}\ \ G_{s+1}^{(j)}\bigr) U_{s+1}^{(j)}
= \bigl(G_{s+1}^{(j-1)}\ \ G_{s+1}^{(j)}\bigr)
\left( \begin{pmatrix} o_{s+1} & O_{s+1,s} \\ e_1 h_{1,0}^{(j)} & H_s^{(j)} \end{pmatrix} + U_{s+1}^{(j)} \operatorname{diag}(\mu_j, \dots, \mu_j) \right).$$ (3.23)

Gluing these relations together and topping them with the first equation in (3.19), we obtain the generalized Hessenberg decomposition

$$A G_m U_m = G_{m+1} \underline{H}_m^{\mathrm{total}}, \qquad \underline{H}_m^{\mathrm{total}} := (\underline{H}_m + \underline{U}_m D_m)$$ (3.24)

of IDR that captures the recurrences of the vectors g_k, 1 ≤ k ≤ m+1, in both Algorithm 2 and Algorithm 4; the structure of the resulting matrices is described below.

The matrix U̲_m ∈ C^{(m+1)×m} has I_s as leading s×s block, followed by all U_{s+1}^{(j)}, j ≥ 1, aligned such that all ones are on the diagonal; the last block column may have less than s+1 columns. The matrix U_m results from U̲_m by stripping off the last (zero) row; U_m is unit upper triangular, banded with upper bandwidth 2s and has a staircase-like structure, see the example (3.25) taken from [22], where U_m and H_m are depicted for s = 2 and m = 9 = 3(s+1),

U_9 =
◦ · • • • · · · ·
· ◦ • • • · · · ·
· · ◦ • • · · · ·
· · · ◦ • • • • ·
· · · · ◦ • • • ·
· · · · · ◦ • • ·
· · · · · · ◦ • •
· · · · · · · ◦ •
· · · · · · · · ◦

H_9 =
• • · · · · · · ·
◦ • · · · · · · ·
· ◦ · · · · · · ·
· · ◦ • • · · · ·
· · · ◦ • · · · ·
· · · · ◦ · · · ·
· · · · · ◦ • • ·
· · · · · · ◦ • ·
· · · · · · · ◦ ·
(3.25)

Circles in U_9 depict the unit diagonal elements; bullets in U_9 depict the Sonneveld coefficients −c_k^{(j)} defined by the IDR variant, e.g., srIDR (line 3 & line 11 of Algorithm 4), fmIDR (line 3 & line 13 of Algorithm 4), mnIDR (line 3 & line 15 of Algorithm 4), or ovIDR (line 3 & line 17 of Algorithm 4). The matrices U_m^sr and U_m^fm have additional known zero elements, e.g., the upper bandwidth of U_m^sr drops from 2s to s; the structure is depicted here for s = 2 and m = 9 = 3(s+1),

U_9^sr =
◦ · • · · · · · ·
· ◦ • • · · · · ·
· · ◦ • • · · · ·
· · · ◦ • • · · ·
· · · · ◦ • • · ·
· · · · · ◦ • • ·
· · · · · · ◦ • •
· · · · · · · ◦ •
· · · · · · · · ◦

U_9^fm =
◦ · • • • · · · ·
· ◦ • • • · · · ·
· · ◦ · · · · · ·
· · · ◦ · • • • ·
· · · · ◦ • • • ·
· · · · · ◦ · · ·
· · · · · · ◦ · •
· · · · · · · ◦ •
· · · · · · · · ◦
(3.26)

The matrix H̲_m ∈ C^{(m+1)×m} has H̲_s^{(0)} as leading (s+1)×s block, followed by all upper triangular basis transformation matrices

$$\bigl(e_1 h_{1,0}^{(j)}, H_s^{(j)}\bigr) \in \mathbb{C}^{(s+1)\times(s+1)}, \quad j \ge 1,$$ (3.27)

aligned such that the nonzero scaling elements h_{k+1,k}^{(j)} are on the first subdiagonal; the last block column may have less than s+1 columns. The band matrix H̲_m is an unreduced extended upper Hessenberg matrix with upper bandwidth s−1. The example (3.25) reveals the structure of H_9 for s = 2. Circles in H_9 depict the nonzero scaling elements h_{k+1,k}^{(j)}, 0 ≤ k ≤ s, j ≥ 0, omitting h_{1,0}^{(0)}; bullets depict the other elements h_{i,k}^{(j)}, 1 ≤ i ≤ k ≤ s, j ≥ 0, that are used in Algorithm 2 and Algorithm 4.

The diagonal matrix D_m ∈ C^{m×m} is obtained by taking an s×s zero matrix and diagonally gluing together all diagonal matrices µ_j I_{s+1} from (3.23), i.e.,

$$D_m = \operatorname{diag}(\underbrace{0, \dots, 0}_{s\ \text{times}},\ \underbrace{\mu_1, \dots, \mu_1}_{s+1\ \text{times}},\ \underbrace{\mu_2, \dots, \mu_2}_{s+1\ \text{times}},\ \dots,\ \underbrace{\mu_j, \dots, \mu_j}_{k\ \text{times}}), \qquad j = \Bigl\lfloor \frac{m}{s+1} \Bigr\rfloor,$$ (3.28)

where k = m + 1 − j(s+1).
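Assuming the idr_po_mn sketch above, the assembly of D_m from the seed values and a numerical check of the decomposition (3.24) might look as follows (illustration only; the test data are arbitrary).

import numpy as np

def build_D(mu, s, m):
    """Assemble D_m of (3.28) from the seed values mu[0] = mu_1, mu[1] = mu_2, ..."""
    diag = np.zeros(m, dtype=complex)
    for j, mu_j in enumerate(mu, start=1):
        lo = s + (j - 1) * (s + 1)
        diag[lo:lo + s + 1] = mu_j           # s+1 copies of mu_j (fewer in the last block)
    return np.diag(diag)

# numerical check of (3.24) with the quantities from the earlier sketch
rng = np.random.default_rng(1)
n, s, nblocks = 30, 2, 3
A = rng.standard_normal((n, n))
q = rng.standard_normal(n)
Qhat = rng.standard_normal((n, s))
h10, G, U, H, D = idr_po_mn(A, q, Qhat, nblocks)
m = U.shape[1]
Htotal = H + U @ D
print(np.linalg.norm(A @ G[:, :m] @ U[:m, :] - G @ Htotal))   # small, of the order of rounding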

The generalized Hessenberg decomposition (3.24) is the basis for different algorithms to approximate solutions of linear systems and eigenvectors. These are introduced rather briefly in the next subsections.

3.3.1. Linear systems. In this subsection we use boldface r to denote residual vectors. We want to approximate the solution x of a given linear system Ax = b. Let x_0 be an initial approximation and define the residual r_0 := b − Ax_0. Suppose that Algorithm 4 is invoked with q = r_0. Then by (3.24)

$$A G_m U_m = G_{m+1} \underline{H}_m^{\mathrm{total}} = G_m H_m^{\mathrm{total}} + g_{m+1} h_{m+1,m} e_m^T, \qquad G_{m+1} \underline{e}_1 h_{1,0}^{(0)} = G_m e_1 h_{1,0}^{(0)} = r_0.$$ (3.29)

In the OR approach [9, p. 1048], we define the mth OR solution z_m ∈ C^m and the mth OR iterate x_m ∈ C^n by

$$z_m := \bigl(H_m^{\mathrm{total}}\bigr)^{-1} e_1 h_{1,0}^{(0)}, \qquad x_m := x_0 + V_m z_m.$$ (3.30)

The mth OR solution need not exist. By (3.29), the mth OR residual r_m ∈ C^n is given by

$$r_m := b - A x_m = -g_{m+1} h_{m+1,m} e_m^T z_m, \qquad \|r_m\| = |h_{m+1,m} e_m^T z_m|;$$ (3.31)

thus, the mth residual is parallel to the next vector g_{m+1}. The computation of x_m is possible without the need to store all vectors g_1, ..., g_m.

In the MR approach [9, p. 1048], we define the mth MR solution z_m ∈ C^m and the mth MR iterate x_m ∈ C^n by

$$z_m := \bigl(\underline{H}_m^{\mathrm{total}}\bigr)^{+} \underline{e}_1 h_{1,0}^{(0)}, \qquad x_m := x_0 + V_m z_m.$$ (3.32)

The mth MR solution always exists. The mth MR residual r_m ∈ C^n is defined by and can be bounded using (3.29) by

$$r_m := b - A x_m, \qquad \|r_m\| \le \|G_{m+1}\|\,\|\underline{H}_m^{\mathrm{total}} z_m - \underline{e}_1 h_{1,0}^{(0)}\|.$$ (3.33)

By [9, Lemma 4, p. 1058], ‖G_{m+1}‖ ≤ √⌈(m+1)/(s+1)⌉ in case of Algorithm 4. An implementation of the MR approach for the srIDR variant of IDR with partial orthonormalization is given in [17].
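On top of the quantities returned by the idr_po_mn sketch, the OR iterate (3.30) and the MR iterate (3.32) can be formed in a few lines (ours; V_m is built explicitly here for clarity, which a practical implementation would avoid).

import numpy as np

def or_mr_iterates(A, b, x0, G, U, H, D, h10):
    """OR iterate (3.30) and MR iterate (3.32) from the decomposition (3.24)."""
    m = U.shape[1]
    Htot = H + U @ D                          # extended H^total_m, (m+1) x m
    Vm = G[:, :m] @ U[:m, :]                  # global matrix V_m of (3.20)
    rhs = np.zeros(m + 1, dtype=complex)
    rhs[0] = h10                              # e_1 h^{(0)}_{1,0}, since r_0 = G e_1 h10
    z_or = np.linalg.solve(Htot[:m, :], rhs[:m])          # (3.30), may fail to exist
    z_mr = np.linalg.lstsq(Htot, rhs, rcond=None)[0]      # (3.32), always exists
    x_or = x0 + Vm @ z_or
    x_mr = x0 + Vm @ z_mr
    return x_or, x_mr

# usage with the data from the previous sketch: r0 = b - A x0 must be the starting vector q
x0 = np.zeros(A.shape[0])
b = q                                          # so that r0 = b - A x0 = q
x_or, x_mr = or_mr_iterates(A, b, x0, G, U, H, D, h10)
print(np.linalg.norm(b - A @ x_or), np.linalg.norm(b - A @ x_mr))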

3.3.2. Eigenvalue problems. The seed values are eigenvalues of the Sonneveld pencil (H_m^total, U_m), see [22]. The other eigenvalues θ_j can be used as approximations to eigenvalues of A; corresponding approximate eigenvectors y_j are given by

$$H_m^{\mathrm{total}} s_j = \theta_j U_m s_j, \qquad y_j := V_m s_j.$$ (3.34)

It is possible to define other pencils based on the seed values, Sonneveld coefficients and orthonormalization coefficients to compute eigenvalues, see [22], or to extend this Ritz approach to the so-called harmonic Ritz approach [9, p. 1047].
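The Ritz approximations (3.34) can be computed from the square Sonneveld pencil with scipy.linalg.eig (a sketch; recall that the seed values reappear among the eigenvalues and have to be discarded).

import numpy as np
from scipy.linalg import eig

def sonneveld_ritz(G, U, H, D):
    """Eigenpairs of the Sonneveld pencil (H^total_m, U_m), cf. (3.34)."""
    m = U.shape[1]
    Htot_sq = (H + U @ D)[:m, :]              # square H^total_m
    U_sq = U[:m, :]                           # square U_m
    theta, S = eig(Htot_sq, U_sq)             # H^total_m s = theta U_m s
    Y = G[:, :m] @ U_sq @ S                   # approximate eigenvectors y_j = V_m s_j
    return theta, Y

theta, Y = sonneveld_ritz(G, U, H, D)         # using the data from the earlier sketch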


4. Seed values. From a mathematical point of view the selection of the seed values is not that important; the induced dimension reduction occurs independently of their selection, as long as no seed value is an eigenvalue of A. A natural idea is to use a fixed seed value, µ_j = µ, j ≥ 1, e.g., µ = 0. The latter choice results in a singular H_m^total = H_m for all m > s, and the OR approach (3.30) fails for all m > s; the MR approach (3.32) stagnates for all m > s [9, Lemma 3, p. 1057].

A constant µ results in a Jordan block at µ in the Sonneveld pencil, as H_m^total − µU_m = H_m + U_m(D_m − µI_m) has the same nonzero structure as H_m; i.e., H_m^total − µU_m has the eigenvalue 0 with algebraic multiplicity at least ⌊m/(s+1)⌋ and geometric multiplicity 1, compare with the example (3.25). This might cause problems with numerical eigenvalue computations if A has eigenvalues close to µ.

Numerically, IDR and other Sonneveld methods deviate from their exact counterparts and ghost eigenvalues close to the seed values are computed. Numerical experiments indicate that the best constant seed value is the mean µ = trace(A)/n of the eigenvalues of A.

More interesting are seed value selection schemes that take local information into account when computing µ_j, mostly the vectors v_0^{(j)} and Av_0^{(j)}. We present a few general schemes, divided into those designed for linear systems and those designed for eigenvalue problems. A new scheme is presented that combines the ideas underlying both approaches.

4.1. Seed values for linear systems. OR methods construct residuals parallel to the vectors g_{k+1}. The residuals in a Krylov subspace method can always be written as r_k = r_k(A) r_0, where the residual polynomial r_k satisfies r_k(0) = 1. To minimize the residual, we think in terms of residual polynomials and replace z − µ_j by the differently scaled 1 − zω_j, where ω_j = µ_j^{−1}, and minimize the scaled r_1^{(j)} with respect to ω_j,

$$\min_{\omega_j \in \mathbb{C}} \|v_0^{(j)} - A v_0^{(j)} \omega_j\| \quad \Rightarrow \quad \omega_j = \frac{(A v_0^{(j)})^H v_0^{(j)}}{(A v_0^{(j)})^H (A v_0^{(j)})},$$ (4.1)

i.e., we define µ_j by

$$\mu_j := \omega_j^{-1} = \frac{(A v_0^{(j)})^H (A v_0^{(j)})}{(A v_0^{(j)})^H v_0^{(j)}}.$$ (4.2)

This results in a harmonic Rayleigh quotient, i.e., the resulting µ_j are inverses of elements of the field of values of the inverse of A [10], since

$$\mu_j = \frac{\tilde{v}^H \tilde{v}}{\tilde{v}^H A^{-1} \tilde{v}}, \qquad \text{where } \tilde{v} := A v_0^{(j)}.$$ (4.3)

In [12] the resulting linear polynomials 1 − zω_j are termed MR(1)-polynomials. This approach is used in [20, 15, 18] and results in seed values that are not too close to zero. It turns out that it is unstable for A such that the field of values includes zero, since then the seed values may become very large.
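In code, the minimal-residual choice (4.1)-(4.2) is a one-liner (ours):

import numpy as np

def seed_minres(v, Av):
    """Seed value (4.2): mu_j = (Av)^H (Av) / ((Av)^H v), the inverse of the
    minimizer omega_j of ||v - Av * omega|| from (4.1)."""
    return np.vdot(Av, Av) / np.vdot(Av, v)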

A modification known as the "vanilla" technique has been developed and motivated for BiCGStab and related methods in [12, Theorem 3.1, p. 210; Eqn. (28), p. 213]: compute the minimizer (4.1) and the cosine c of the Hermitean angle between Av_0^{(j)} and v_0^{(j)},

$$c := \frac{|(v_0^{(j)})^H A^H v_0^{(j)}|}{\|A v_0^{(j)}\|\,\|v_0^{(j)}\|}.$$ (4.4)


If c < κ, i.e., if this angle is too large³, then ω_j is rescaled and the new value

$$\tilde{\omega}_j := \frac{\kappa}{c}\,\omega_j, \qquad \mu_j := \tilde{\omega}_j^{-1} = \kappa^{-1} \cdot \frac{(A v_0^{(j)})^H (A v_0^{(j)})}{(A v_0^{(j)})^H v_0^{(j)}} \cdot \frac{|(A v_0^{(j)})^H v_0^{(j)}|}{\|A v_0^{(j)}\|\,\|v_0^{(j)}\|} = \kappa^{-1} \cdot \frac{\|A v_0^{(j)}\|}{\|v_0^{(j)}\|} \cdot \operatorname{sign}\!\Bigl(\frac{(v_0^{(j)})^H A v_0^{(j)}}{(v_0^{(j)})^H v_0^{(j)}}\Bigr)$$ (4.5)

is used. This modification ensures that all computed seed values are only moderately outside the field of values of A and not too close to zero.
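A sketch of the "vanilla" safeguard (4.4)-(4.5) as quoted from [12] above, with κ = 0.7 (names are ours):

import numpy as np

def seed_vanilla(v, Av, kappa=0.7):
    """Seed value with the 'vanilla' safeguard of [12], cf. (4.4)-(4.5)."""
    omega = np.vdot(Av, v) / np.vdot(Av, Av)          # minimizer (4.1)
    c = abs(np.vdot(Av, v)) / (np.linalg.norm(Av) * np.linalg.norm(v))   # cosine (4.4)
    if c < kappa:                                     # angle too large: rescale omega
        omega = omega * kappa / c
    return 1.0 / omega                                # mu_j = omega_j^{-1}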

4.2. Seed values for eigenvalue problems. A natural idea in eigenvalue computations is to minimize r_1^{(j)} with respect to µ_j; this gives the Rayleigh quotient with v_0^{(j)}, and as a consequence, r_1^{(j)} is perpendicular to v_0^{(j)},

$$\min_{\mu_j \in \mathbb{C}} \|A v_0^{(j)} - v_0^{(j)} \mu_j\| \quad \Rightarrow \quad \mu_j := \frac{(v_0^{(j)})^H A v_0^{(j)}}{(v_0^{(j)})^H v_0^{(j)}}, \qquad r_1^{(j)} \perp v_0^{(j)}.$$ (4.6)

This technique ensures that all computed seed values are in the field of values of A. If the field of values encloses zero, a zero or small seed value may occur; this leads to problems in the OR and MR approaches.

We could use other Rayleigh quotients. We can ensure that the last diagonal element in H_{s+1}^total is the same as in Arnoldi's process, if we set

$$\mu_1 = \frac{g_{s+1}^H A g_{s+1}}{g_{s+1}^H g_{s+1}}, \qquad \text{i.e., we set} \qquad \mu_j := \frac{(g_{s+1}^{(j-1)})^H A g_{s+1}^{(j-1)}}{(g_{s+1}^{(j-1)})^H g_{s+1}^{(j-1)}}.$$ (4.7)

In [10], ⌊m/(s+1)⌋ extra multiplications by A are invested to compute Ritz values using Arnoldi's method, which are then used as seed values.

4.3. A balanced approach to seed values. The approaches (4.2) and (4.6) minimize the norm of a multiple of r_1^{(j)} subject to a scaling of the vector Av_0^{(j)}, see (4.2) and (4.1), or subject to a scaling of the vector v_0^{(j)}, see (4.6).

To treat both ingredients equally, we normalize both Av_0^{(j)} and v_0^{(j)} to get rid of scaling issues with large or small A, solve

$$\min_{\alpha, \beta \in \mathbb{C}} \Bigl\| \frac{A v_0^{(j)}}{\|A v_0^{(j)}\|}\,\alpha - \frac{v_0^{(j)}}{\|v_0^{(j)}\|}\,\beta \Bigr\| \qquad \text{s.t.} \qquad \Bigl\| \begin{pmatrix} \alpha \\ -\beta \end{pmatrix} \Bigr\| = 1,$$ (4.8)

and set µ_j = β/α · ‖Av_0^{(j)}‖/‖v_0^{(j)}‖. This is a mixture of an eigenvalue based and a linear system solver based approach; we expect the seed values to be away from zero for non-singular A and not too large.

The solution of (4.8) is given by the right singular vector of the smallest singular value of the matrix

$$B := \Bigl( \frac{A v_0^{(j)}}{\|A v_0^{(j)}\|} \quad \frac{v_0^{(j)}}{\|v_0^{(j)}\|} \Bigr).$$ (4.9)

We compute this singular vector as the eigenvector to the smallest eigenvalue of

$$B^H B = \begin{pmatrix} 1 & \bar{b} \\ b & 1 \end{pmatrix}, \qquad b := \frac{(v_0^{(j)})^H A v_0^{(j)}}{\|A v_0^{(j)}\|\,\|v_0^{(j)}\|}.$$ (4.10)

³In [12] the value κ = 0.7 is used as upper bound, which corresponds to a rounded value of the obvious choice √2/2 = cos(π/4).


The eigenvector to the smallest eigenvalue 1 − |b| is given by

$$\begin{pmatrix} 1 & \bar{b} \\ b & 1 \end{pmatrix} \begin{pmatrix} |b| \\ -b \end{pmatrix} = \begin{pmatrix} |b| \\ -b \end{pmatrix} (1 - |b|),$$ (4.11)

which leads to a "simplified vanilla scheme" that we call the "cinnamon" technique,

$$\mu_j := \frac{\|A v_0^{(j)}\|}{\|v_0^{(j)}\|} \cdot \frac{b}{|b|} = \frac{\|A v_0^{(j)}\|}{\|v_0^{(j)}\|} \cdot \operatorname{sign}\!\Bigl(\frac{(v_0^{(j)})^H A v_0^{(j)}}{(v_0^{(j)})^H v_0^{(j)}}\Bigr).$$ (4.12)

This scheme is a mixture between an eigenvalue based and an SVD based approach: we take the direction given by the Rayleigh quotient, but the length given by the amplification of v_0^{(j)} by A. These values will be on the annulus defined by {z ∈ C | σ_n(A) ≤ |z| ≤ σ_1(A)} and in the direction of the field of values of A. This approach might cause problems: if Av_0^{(j)} ⊥ v_0^{(j)}, both singular values coincide and the sign in (4.12) is not defined.
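The "cinnamon" choice (4.12) in code (ours; the degenerate case Av_0^{(j)} ⊥ v_0^{(j)} discussed above is left untreated):

import numpy as np

def seed_cinnamon(v, Av):
    """Seed value (4.12): length ||Av||/||v||, direction sign(v^H A v)."""
    b = np.vdot(v, Av) / (np.linalg.norm(Av) * np.linalg.norm(v))
    return (np.linalg.norm(Av) / np.linalg.norm(v)) * b / abs(b)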

4.4. Additional orthogonality. Additional orthogonality between the last g_{s+1}^{(j−1)} and the new g_1^{(j)} can be enforced by setting

$$\mu_j := \frac{(g_{s+1}^{(j-1)})^H A v_0^{(j)}}{(g_{s+1}^{(j-1)})^H v_0^{(j)}}, \qquad \text{then} \qquad g_1^{(j)} \parallel r_1^{(j)} = A v_0^{(j)} - v_0^{(j)} \mu_j \perp g_{s+1}^{(j-1)}.$$ (4.13)

Unfortunately, numerical experiments⁴ indicate that this approach is very unstable; many of the resulting values of µ_j lie far outside the field of values of A.

5. Mathematical equivalence and classification of breakdowns. The following theorem states that the four variants of IDR with partial orthonormalization are all equivalent as long as assumption (3.2) holds true, except that the srIDR variant may break down.

Theorem 5.1. Suppose that Q̂ is chosen such that assumption (3.2) holds true, and that one of the following seed value selection schemes

• preselected seed values, e.g., constant or a list of given seed values,

• a local seed value selection scheme, i.e., (4.2), (4.5), (4.6), (4.7), (4.12), (4.13)

is used. Then the variants fmIDR, mnIDR, and ovIDR of IDR with partial orthonormalization are mathematically equivalent, i.e., given the same input data, they compute the same vectors g_k, k ≥ 1.

There exist cases where assumption (3.2) holds true and the srIDR variant of IDR with partial orthonormalization breaks down, which we term a pivot breakdown. When no pivot breakdown in the srIDR variant occurs, it constructs the same vectors g_k, k ≥ 1, as the other three variants.

Proof. All four variants of IDR with partial orthonormalization are completely deterministic. We first suppose that no variant breaks down and prove that the spaces G_j and the vectors g_1^{(j)}, ..., g_{s+1}^{(j)} ∈ G_j, j ≥ 0, are the same in all four variants, regardless of the choice of the seed value selection scheme listed in the theorem.

The IDR spaces G_j, j ≥ 1, are uniquely defined by the seed values µ_j, j ≥ 1, which in turn are either fixed or computed based on the vector v_0^{(j)} and, possibly, the vector g_{s+1}^{(j−1)}. The vector v_0^{(j)} is the same vector in all four variants if the s+1 orthonormal vectors g_1^{(j−1)}, ..., g_{s+1}^{(j−1)} ∈ G_{j−1} are the same in all four variants, which implies that when additionally all G_ℓ, ℓ < j, are the same, then in this case the next G_j is the same in all four variants.

⁴This observation was also made by Martin van Gijzen, who first came up with this idea and experimented with this kind of additional orthogonality.

Pending that decision, the EU and its Member States fully support the OPCW Action Plan on National Implementation by providing assistance to other States Parties in meeting