
ORIGINAL PAPER

Krylov subspace methods for estimating operator‑vector multiplications in Hilbert spaces

Yuka Hashimoto¹,² · Takashi Nodera³

Received: 17 April 2020 / Revised: 26 January 2021 / Accepted: 2 February 2021 / Published online: 21 February 2021

© The Author(s) 2021

Abstract

The Krylov subspace method has been investigated and refined for approximating the behaviors of finite or infinite dimensional linear operators. It has been used for approximating eigenvalues, solutions of linear equations, and operator functions acting on vectors. Recently, for time-series data analysis, much attention is being paid to the Krylov subspace method as a viable method for estimating the multiplications of a vector by an unknown linear operator referred to as a transfer operator. In this paper, we investigate a convergence analysis for Krylov subspace methods for estimating operator-vector multiplications.

Keywords Krylov subspace method · Operator-vector multiplication · Unbounded operator · Transfer operator · Nonlinear dynamical system · Infinite dimensional Hilbert space

Mathematics Subject Classification 65F60 · 37M10

* Yuka Hashimoto
yuka.hashimoto.rw@hco.ntt.co.jp; yukahashimoto@math.keio.ac.jp

Takashi Nodera
nodera@math.keio.ac.jp

1 NTT Network Technology Laboratories, NTT Corporation, 3-9-11, Midori-cho, Musashino, Tokyo 180-8585, Japan

2 School of Fundamental Science and Technology, Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku, Yokohama, Kanagawa 223-8522, Japan

3 Department of Mathematics, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku, Yokohama, Kanagawa 223-8522, Japan

1 Introduction

Linear operators are used in various tasks in engineering and scientific research, such as simulations and data analysis. A classical example of a linear operator is a differential operator used for describing various natural phenomena. Krylov subspace methods have been actively researched in numerical linear algebra for approximating the behavior of such given operators, for example, the approximation of eigenvalues, solutions of linear equations, and operator functions acting on vectors, which provide approximations of solutions or information on the solutions of differential equations [16, 19, 24, 27, 31–33, 35, 36, 43, 45]. In many cases, problems in infinite dimensional spaces such as differential equations are discretized by, for example, a finite difference method [8] or a finite element method [1], and are transformed into finite dimensional problems with matrices, after which the Krylov subspace methods are applied to the matrices. On the other hand, Krylov subspace methods for operators in infinite dimensional Hilbert spaces without discretization have also been investigated, and more general results than those for matrices have been developed [6, 9–12, 15, 26, 28, 30].

Meanwhile, linear operators that represent time evolutions in dynamical systems, called transfer operators, are being investigated in relation to various fields such as machine learning, physics, molecular dynamics, and control engineering [18, 21, 25, 40–42]. Unknown transfer operators are estimated through data generated by dynamical systems. Since transfer operators are generally linear even if the underlying dynamical systems are nonlinear, Krylov subspace methods can be used to understand nonlinear dynamical systems [3, 4, 14, 20]. To make the algorithm computable only with the data, transfer operators are often discussed in relation to RKHSs (reproducing kernel Hilbert spaces). The Arnoldi and shift-invert Arnoldi methods have been proposed as Krylov subspace methods for transfer operators in RKHSs. The Arnoldi method is a standard Krylov subspace method [3, 35], but for its convergence, the operators it is applied to have to be bounded. However, not all transfer operators are bounded [17]. For example, transfer operators defined in the RKHS associated with the Gaussian kernel are unbounded if the dynamical system is nonlinear and deterministic. Thus, the shift-invert Arnoldi method was also considered [14]. When applying the shift-invert Arnoldi method, a shifted and inverted operator $(\gamma I - K)^{-1}$, for some $\gamma$ not contained in the spectrum of $K$, is considered instead of the unbounded $K$.

The main difference between the classical settings assumed for the Krylov subspace methods mentioned in the first paragraph and those for transfer operators is whether the information of the model is given or not. In the classical setting, a differential operator is given, and a model-driven approach with the operator is applied. On the other hand, in the above setting for transfer operators, neither a dynamical system nor a transfer operator is given. Instead, data generated by the system is given. A data-driven approach is applied in this case. The purpose of applying the Krylov subspace method also differs in some situations. When we apply it to a transfer operator, denoted as $K$, one important task is to estimate $K^n v$ for a given vector $v$ and some $n$, because $K$ is unknown. On the other hand, in the classical setting for a given operator such as a differential operator, denoted as $A$, $Av$ for a given vector $v$ is already known, because both $A$ and $v$ are known. The main task of the Krylov subspace method in such a setting is to estimate $f(A)v$ for a given vector $v$ and a function $f$ such as $f(z) = z^{-1}$ and $f(z) = e^z$, but not $f(z) = z^n$. For this reason, Krylov subspace methods for estimating operator-vector multiplications have not been investigated in numerical linear algebra. Meanwhile, although Krylov subspace methods for estimating operator-vector multiplications have been proposed in machine learning, the convergence analysis for them has not been fully investigated.

The objective of this paper is to analyze the convergence of such Krylov subspace methods for estimating operator-vector multiplications. We define a "residual" for approximating $K^n v$ for a vector $v$ and analyze the convergence of the residuals of the Krylov approximations. The classical Krylov subspace methods for estimating $f(A)v$ are frequently associated with residuals. For $f(z) = z^{-1}$, for example, the GMRES (generalized minimal residual method) approximation minimizes the residual in a Krylov subspace, and the convergence of the residual is superlinear [9, 28, 44]. Moreover, in BiCG (biconjugate gradient) type approximations, the residual or a value relevant to the residual is orthogonal to the Krylov subspace [43]. For a more general $f$, a generalized residual has been proposed for evaluating the convergence of approximations [2, 13, 16, 34]. In our case, we show that the Arnoldi approximation converges to the minimizer of the residual. To illustrate this point, an error bound for a Krylov approximation of an operator function acting on a vector is used [10–12, 15, 27]. For the shift-invert Arnoldi method, the convergence analysis is not straightforward. At first glance, the problem of estimating $K^n v$ seems to be the same as that of estimating the operator function $f(K)$ acting on the vector $v$, where $f(z) = z^n$. However, the situation differs from that of the classical Krylov subspace methods in terms of operator functions. The existing error bound for the Krylov approximation of $f(K)v$ requires an assumption of the holomorphicity of $f$ on the spectrum of $K$. On the other hand, the function $f(z) = (\gamma - z^{-1})^n$, for which "$f((\gamma I - K)^{-1}) = K^n$" holds formally, is not holomorphic at 0, but 0 is contained in the spectrum of $(\gamma I - K)^{-1}$ if $K$ is unbounded. We resolve this problem through the factor $K^{-n}$ that appears in the residual.

This paper is structured in the following manner. In Sect. 2, to explain why operator-vector multiplications need to be estimated for data analysis, we review the definition of a transfer operator and the Krylov subspace methods for it. In Sect. 3, we generalize the problem to Krylov approximations for estimating operator-vector multiplications for linear operators in a Hilbert space and investigate a convergence analysis. In Sect. 4, we empirically confirm the results investigated in Sect. 3. Section 5 is the conclusion.

1.1 Notations

Linear operators are denoted with standard capital letters, except for $m \times m$ matrices, which are denoted in bold. Calligraphic capital letters and italicized Greek capital letters denote sets. The inner product and norm are denoted as $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, respectively.

2 Background

In this section, we briefly review the definition of Perron–Frobenius operators and Krylov subspace methods for Perron–Frobenius operators [14, 20]. Perron–Frobenius operators are transfer operators often discussed in relation to RKHSs, and their Krylov subspaces naturally appear [18, 20, 22]. The adjoint operators of Perron–Frobenius operators are referred to as Koopman operators [23], which are also transfer operators and have been researched for data-driven approaches [3, 4, 21, 40, 42].

2.1 Perron–Frobenius operator in RKHS

Consider the following dynamical system with random noise [14]:

$$X_{t+1} = h(X_t) + \xi_t, \tag{1}$$

where $t \ge 0$, $(\Omega, \mathcal{F})$ is a measurable space, $(\mathcal{X}, \mathcal{B})$ is a Borel measurable and locally compact Hausdorff vector space, $X_t$ and $\xi_t$ are random variables from $\Omega$ to $\mathcal{X}$, and $h : \mathcal{X} \to \mathcal{X}$ is a generally nonlinear map. Assume $\{\xi_t\}_{t \in \mathbb{N}}$ is an i.i.d. stochastic process and that $\xi_t$ is also independent of $X_t$. The random variable $\xi_t$ corresponds to random noise in $\mathcal{X}$.

Let $P$ be a probability measure on $\Omega$. The nonlinear time evolution of $X_t$ in the dynamical system (1) is regarded as a linear time evolution of the push-forward measure $X_{t*}P$, defined by $X_{t*}P(B) = P(X_t^{-1}(B))$ for $B \in \mathcal{B}$. To describe the time evolution in a Hilbert space, an RKHS [37] is used. An RKHS is a Hilbert space constructed by a map $k : \mathcal{X} \times \mathcal{X} \to \mathbb{C}$ called a positive definite kernel. For $x \in \mathcal{X}$, the map $\phi : \mathcal{X} \to \mathcal{H}_k$ defined as $\phi(x) = k(x, \cdot)$ is called a feature map. Let $\mathcal{H}_{k,0}$ be a vector space defined as

$$\mathcal{H}_{k,0} = \left\{ \sum_{i=1}^{n} c_i \phi(x_i) \;\middle|\; n \in \mathbb{N},\; c_1, \ldots, c_n \in \mathbb{C},\; x_1, \ldots, x_n \in \mathcal{X} \right\}.$$

In $\mathcal{H}_{k,0}$, the inner product associated with $k$ is defined, and the completion of $\mathcal{H}_{k,0}$, which is denoted as $\mathcal{H}_k$, is called an RKHS. An observation $z \in \mathcal{X}$ is regarded as a vector $\phi(z)$ in $\mathcal{H}_k$ through $\phi$. Moreover, if $k$ is bounded, continuous, and $c_0$-universal, then the space of all complex-valued finite regular Borel measures on $\mathcal{X}$, which is denoted as $\mathcal{M}(\mathcal{X})$, is densely embedded into $\mathcal{H}_k$. That is, the map $\Phi : \mathcal{M}(\mathcal{X}) \to \mathcal{H}_k$ defined as $\mu \mapsto \int_{x \in \mathcal{X}} \phi(x)\, d\mu(x)$ is injective [39] and $\Phi(\mathcal{M}(\mathcal{X}))$ is dense in $\mathcal{H}_k$ [14]. Here, $c_0$-universal means that $\mathcal{H}_k$ is dense in the space of all continuous functions that vanish at infinity. The map $\Phi$ is called a kernel mean embedding [29]. For example, the Gaussian kernel $e^{-c\|x-y\|_2^2}$ and the Laplacian kernel $e^{-c\|x-y\|_1}$ with $c > 0$ for $x, y \in \mathbb{R}^d$ are bounded and continuous $c_0$-universal kernels. Therefore, a complex-valued finite regular Borel measure $\mu$ is regarded as a vector $\Phi(\mu)$ in the dense subset of the Hilbert space $\mathcal{H}_k$. Since the map $\Phi : \mathcal{M}(\mathcal{X}) \to \mathcal{H}_k$ is linear, it is possible to define a linear operator $K : \Phi(\mathcal{M}(\mathcal{X})) \to \mathcal{H}_k$, which is called a Perron–Frobenius operator, in $\mathcal{H}_k$ as follows:

$$K\Phi(\mu) = \Phi(\beta_{t*}(\mu \otimes P)), \tag{2}$$

where $\beta_t : \mathcal{X} \times \Omega \to \mathcal{X}$ is defined as $(x, \omega) \mapsto h(x) + \xi_t(\omega)$. Since $\xi_t$ and $X_t$ are independent, $\Phi(\beta_{t*}(X_{t*}P \otimes P)) = \Phi((h(X_t) + \xi_t)_* P)$ holds, and $K$ maps $\Phi(X_{t*}P)$ to $\Phi(X_{t+1*}P)$. In addition, since $\{\xi_t\}_{t \in \mathbb{N}}$ is an i.i.d. process, it can be shown that $K$ does not depend on $t$.
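Computationally, the embedding of an empirical measure is just the average of feature vectors, $\Phi\big(\frac{1}{N}\sum_i \delta_{x_i}\big) = \frac{1}{N}\sum_i \phi(x_i)$, and inner products between such embeddings reduce to averages of kernel evaluations. The following minimal sketch (our own illustration with the Gaussian kernel; the helper names are hypothetical, not from the paper) shows this reduction:

```python
import numpy as np

def gauss_kernel(X, Y, c=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-c * ||x - y||_2^2) for rows of X, Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-c * d2)

def embedding_inner(X, Y, c=1.0):
    """<Phi(mu), Phi(nu)> for empirical measures mu, nu supported on the rows
    of X and Y: a double average of kernel evaluations over the supports."""
    return gauss_kernel(X, Y, c).mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))   # samples defining mu
Y = rng.normal(0.5, 1.0, size=(200, 1))   # samples defining nu
print(embedding_inner(X, X), embedding_inner(X, Y))
```

This is the reason the methods below are computable from data alone: every quantity they need is an inner product of kernel means.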

2.2 Krylov subspace methods for Perron–Frobenius operators

Let $\{x_0, x_1, \ldots\} \subseteq \mathcal{X}$ be observed time-series data from the dynamical system (1), i.e., $x_t = X_t(\omega_0)$ for some $\omega_0 \in \Omega$. By using Krylov subspace methods, we estimate $K^n \phi(x_t)$ for $x_t \in \mathcal{X}$ to predict $\phi(x_{t+n})$ through available data. In the $m$th Krylov step, the data is split into $S$ datasets. Examples of the choice for $S$ are $S = m+1$ and $S = M$ for a sufficiently large natural number $M$. Let

$$\mu_{t,N}^S = \frac{1}{N} \sum_{i=0}^{N-1} \delta_{x_{t+iS}} \quad (t = 0, \ldots, m)$$

be empirical measures with the datasets, where $N \in \mathbb{N}$ and $\delta_x$ denotes the Dirac measure at $x \in \mathcal{X}$. It is assumed that $\mu_{t,N}^S$ weakly converges to a finite regular Borel measure $\mu_t^S$ as $N \to \infty$ for $t = 0, \ldots, m$.

A Krylov subspace is constructed with $\Phi(\mu_t^S)$. To construct the Krylov subspace only with the observed data $\{x_0, x_1, \ldots\}$, the following equality of the average of the noise $\xi_t$ is assumed for any measurable and integrable function $f$:

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=0}^{N-1} \int_{\omega \in \Omega} f(h(x_{t+iS}) + \xi_t(\omega))\, dP(\omega) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=0}^{N-1} f(h(x_{t+iS}) + \xi_{t+iS}(\eta)) \quad \text{a.s. } \eta \in \Omega. \tag{3}$$

The left- and right-hand sides of Eq. (3) represent the space average and time average of $\xi_t$, respectively. The same types of assumptions as Eq. (3) have also been considered in other studies [4, 42]. By applying the above settings and assumptions, the Arnoldi and shift-invert Arnoldi approximations for the Perron–Frobenius operator $K$, defined as Eq. (2), are computed as explained in the following subsections.
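In other words, $\mu_{t,N}^S$ is supported on the subsampled points $x_t, x_{t+S}, \ldots, x_{t+(N-1)S}$. A small sketch of this index bookkeeping (the helper name and toy series are ours, for illustration only):

```python
import numpy as np

def krylov_supports(x, m, S, N):
    """Support points of mu_{t,N}^S = (1/N) sum_{i=0}^{N-1} delta_{x_{t+iS}}
    for t = 0, ..., m, taken from a time series x."""
    assert m + (N - 1) * S < len(x), "time series too short for this m, S, N"
    return [x[t + S * np.arange(N)] for t in range(m + 1)]

x = np.arange(100.0)                       # placeholder scalar time series
supports = krylov_supports(x, m=3, S=4, N=10)
print(supports[1][:3])                     # x_1, x_5, x_9
```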

2.3 The Arnoldi method

In this section, the Perron–Frobenius operator $K$ is assumed to be bounded. Under the assumption (3), the following equation is derived, since $\Phi$ is continuous:

$$\lim_{N \to \infty} K\Phi(\mu_{t,N}^S) = \Phi(\mu_{t+1}^S) \quad (t = 0, \ldots, m-1). \tag{4}$$

Thus, if $K$ is bounded, $K\Phi(\mu_t^S) = \Phi(\mu_{t+1}^S)$ holds. Therefore, if the set of vectors $\{\Phi(\mu_0^S), \ldots, \Phi(\mu_{m-1}^S)\}$ is linearly independent, the following space, denoted as $\mathcal{K}_m(K, \Phi(\mu_0^S))$, is an $m$-dimensional Krylov subspace of the operator $K$ and vector $\Phi(\mu_0^S)$:

$$\mathcal{K}_m(K, \Phi(\mu_0^S)) = \mathrm{Span}\{\Phi(\mu_0^S), \ldots, \Phi(\mu_{m-1}^S)\}.$$

Remark 2.1 If $S$ depends on $m$, the initial vector $\Phi(\mu_0^S)$ depends on $m$. In this case, the inclusion $\mathcal{K}_{m-1}(K, \Phi(\mu_0^S)) \subseteq \mathcal{K}_m(K, \Phi(\mu_0^S))$ does not always hold. On the other hand, if $S$ does not depend on $m$, $\Phi(\mu_0^S)$ does not depend on $m$ and the inclusion $\mathcal{K}_{m-1}(K, \Phi(\mu_0^S)) \subseteq \mathcal{K}_m(K, \Phi(\mu_0^S))$ holds.

Let $q_1, \ldots, q_m$ be an orthonormal basis of the Krylov subspace $\mathcal{K}_m(K, \Phi(\mu_0^S))$ obtained through the Gram–Schmidt orthonormalization and $Q_m : \mathbb{C}^m \to \mathcal{H}_k$ be defined as $[c_1, \ldots, c_m] \mapsto \sum_{i=1}^{m} c_i q_i$. Note that $Q_m Q_m^*$, where $*$ means adjoint, is a projection operator onto the Krylov subspace. There exists an invertible matrix $\mathbf{R}_m \in \mathbb{C}^{m \times m}$ such that $[\Phi(\mu_0^S), \ldots, \Phi(\mu_{m-1}^S)] = Q_m \mathbf{R}_m$. This makes it possible to compute the following Arnoldi approximation of $K\phi(z)$ for an observable $z \in \mathcal{X}$ only with the observed data $\{x_0, x_1, \ldots\}$:

$$K\phi(z) \approx Q_m Q_m^* K Q_m Q_m^* \phi(z) = Q_m Q_m^* [\Phi(\mu_1^S), \ldots, \Phi(\mu_m^S)] \mathbf{R}_m^{-1} Q_m^* \phi(z) = Q_m \tilde{\mathbf{K}}_m Q_m^* \phi(z),$$

where $\tilde{\mathbf{K}}_m = Q_m^* K Q_m = Q_m^* [\Phi(\mu_1^S), \ldots, \Phi(\mu_m^S)] \mathbf{R}_m^{-1}$.
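Since $Q_m$ enters only through inner products, all of the above can be computed from kernel evaluations: with $G_0[i,j] = \langle \Phi(\mu_i^S), \Phi(\mu_j^S) \rangle$ and the Cholesky factorization $G_0 = \mathbf{R}_m^* \mathbf{R}_m$, we get $[\Phi(\mu_0^S), \ldots, \Phi(\mu_{m-1}^S)] = Q_m \mathbf{R}_m$, and hence $\tilde{\mathbf{K}}_m = \mathbf{R}_m^{-*} G_1 \mathbf{R}_m^{-1}$ with $G_1[i,j] = \langle \Phi(\mu_i^S), \Phi(\mu_{j+1}^S) \rangle$. The following end-to-end sketch on a toy system of our own choosing (not the paper's experiments) implements this and evaluates the estimated function $K\phi(z)$ at test points via the reproducing property:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def kernel(X, Y, c=0.1):
    """Gaussian kernel matrix between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-c * d2)

# Toy system x_{t+1} = h(x_t) + noise, observed as scalars (our own choice)
rng = np.random.default_rng(1)
h = lambda x: 0.9 * np.sin(x)
x = np.zeros((500, 1))
for t in range(499):
    x[t + 1] = h(x[t]) + 0.05 * rng.normal()

m, S, N = 5, 1, 400
sup = [x[t + S * np.arange(N)] for t in range(m + 1)]   # supports of mu_{t,N}^S

# P[s, t] = <Phi(mu_s), Phi(mu_t)>: mean of kernel values over the two supports
P = np.array([[kernel(sup[s], sup[t]).mean() for t in range(m + 1)]
              for s in range(m + 1)])
G0, G1 = P[:m, :m], P[:m, 1:m + 1]

R = cholesky(G0 + 1e-10 * np.eye(m))      # G0 = R^T R, so [v_0..v_{m-1}] = Q_m R
Rinv = solve_triangular(R, np.eye(m))
Kt = Rinv.T @ G1 @ Rinv                   # tilde K_m = Q_m^* [v_1..v_m] R^{-1}

z = x[0]                                  # estimate K phi(z), roughly phi(x_1)
g = np.array([kernel(sup[i], z[None, :]).mean() for i in range(m)])
coef = Rinv @ (Kt @ (Rinv.T @ g))         # coefficients of the estimate in v_0..v_{m-1}

y = np.linspace(-2.0, 2.0, 5)[:, None]    # test points for evaluating functions
est = sum(c * kernel(sup[i], y).mean(axis=0) for i, c in enumerate(coef))
print(est)                                # estimated (K phi(z))(y)
print(kernel(x[1][None, :], y)[0])        # phi(x_1)(y) = k(x_1, y), for comparison
```

The small diagonal shift before the Cholesky factorization is a standard numerical safeguard; it is our addition and not part of the method itself.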

2.4 The shift-invert Arnoldi method

The convergence of the Arnoldi method along $m$ is not always attained if $K$ is unbounded [14]. According to Ikeda et al. [17], not all Perron–Frobenius operators are bounded. For this reason, the shift-invert Arnoldi method is also considered.

Let $\gamma \notin \Lambda(K)$ be fixed, where $\Lambda(K)$ is the spectrum of $K$ (under the assumption that such a $\gamma$ exists, i.e., $\Lambda(K) \neq \mathbb{C}$), and consider using the bounded bijective operator $(\gamma I - K)^{-1}$ instead of the unbounded $K$. Under the assumption (3), the following equation is derived:

$$\lim_{N \to \infty} (\gamma I - K)^{-1} u_{t+1,N}^S = u_t^S, \tag{5}$$

where $u_{t,N}^S = \sum_{i=0}^{t} \binom{t}{i} (-1)^i \gamma^{t-i} \Phi(\mu_{i,N}^S)$ and $u_t^S = \sum_{i=0}^{t} \binom{t}{i} (-1)^i \gamma^{t-i} \Phi(\mu_i^S)$. Since $(\gamma I - K)^{-1}$ is bounded, $(\gamma I - K)^{-1} u_{t+1}^S = u_t^S$ holds. Therefore, if the set of vectors $\{u_1^S, \ldots, u_m^S\}$ is linearly independent, then the space spanned by $\{u_1^S, \ldots, u_m^S\}$ is an $m$-dimensional Krylov subspace of the operator $(\gamma I - K)^{-1}$ and vector $u_m^S$. Similar to the Arnoldi method, let $q_1, \ldots, q_m$ be an orthonormal basis of the Krylov subspace $\mathcal{K}_m((\gamma I - K)^{-1}, u_m^S)$ obtained through the Gram–Schmidt orthonormalization and $Q_m : \mathbb{C}^m \to \mathcal{H}_k$ be defined as $[c_1, \ldots, c_m] \mapsto \sum_{i=1}^{m} c_i q_i$. There exists an invertible matrix $\mathbf{R}_m \in \mathbb{C}^{m \times m}$ satisfying $[u_1^S, \ldots, u_m^S] = Q_m \mathbf{R}_m$. If $K$ is unbounded, $Kv$ for $v \in \mathcal{H}_k$ is not always defined. However, if $v \in \Phi(\mathcal{M}(\mathcal{X}))$, $Kv$ is defined, in which case $Kv$ is represented as $(\gamma I - ((\gamma I - K)^{-1})^{-1}) v$. On the basis of this observation, the following shift-invert Arnoldi approximation of $K\phi(z)$ for $z \in \mathcal{X}$ is deduced if $\tilde{\mathbf{L}}_m$, which is defined as $\tilde{\mathbf{L}}_m = Q_m^* (\gamma I - K)^{-1} Q_m = Q_m^* [u_0^S, \ldots, u_{m-1}^S] \mathbf{R}_m^{-1}$, is invertible:

$$K\phi(z) \approx Q_m f_\gamma(Q_m^* (\gamma I - K)^{-1} Q_m) Q_m^* \phi(z) = Q_m f_\gamma(Q_m^* [u_0^S, \ldots, u_{m-1}^S] \mathbf{R}_m^{-1}) Q_m^* \phi(z) = Q_m \tilde{\mathbf{K}}_m Q_m^* \phi(z),$$

where $f_\gamma(z) = \gamma - z^{-1}$ for $z \in \mathbb{C}$ and $\tilde{\mathbf{K}}_m = f_\gamma(\tilde{\mathbf{L}}_m)$.
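The vectors $u_t^S$ are finite linear combinations of the $\Phi(\mu_i^S)$, so the same Gram-matrix machinery applies: every inner product $\langle u_s^S, u_t^S \rangle$ follows by bilinearity from $P[i,j] = \langle \Phi(\mu_i^S), \Phi(\mu_j^S) \rangle$, and $\tilde{\mathbf{K}}_m = f_\gamma(\tilde{\mathbf{L}}_m) = \gamma \mathbf{I} - \tilde{\mathbf{L}}_m^{-1}$. A sketch of this linear algebra (here $P$ is a random symmetric positive definite stand-in so the snippet runs standalone; in practice it would be the kernel Gram matrix from the previous sketch):

```python
import numpy as np
from math import comb
from scipy.linalg import cholesky, solve_triangular, inv

m, gamma = 5, 2.0

def u_weights(t):
    """Coefficients of u_t^S = sum_i binom(t,i) (-1)^i gamma^(t-i) Phi(mu_i^S)
    in the basis Phi(mu_0^S), ..., Phi(mu_m^S)."""
    w = np.zeros(m + 1)
    for i in range(t + 1):
        w[i] = comb(t, i) * (-1) ** i * gamma ** (t - i)
    return w

A = np.random.default_rng(2).normal(size=(m + 1, m + 1))
P = A @ A.T + np.eye(m + 1)             # stand-in for <Phi(mu_i^S), Phi(mu_j^S)>

W = np.stack([u_weights(t) for t in range(m + 1)])   # rows: u_0^S, ..., u_m^S
Gu = W @ P @ W.T                        # <u_s^S, u_t^S> by bilinearity

R = cholesky(Gu[1:, 1:])                # [u_1^S .. u_m^S] = Q_m R_m
Rinv = solve_triangular(R, np.eye(m))
Lm = Rinv.T @ Gu[1:, :m] @ Rinv         # tilde L_m = Q_m^* [u_0^S .. u_{m-1}^S] R_m^{-1}
Km = gamma * np.eye(m) - inv(Lm)        # tilde K_m = f_gamma(tilde L_m)
print(np.round(Km, 2))                  # projected operator (toy values only)
```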

3 Convergence analysis

In this section, we provide a convergence analysis of the Arnoldi method and the shift-invert Arnoldi method described in Sect. 2. The problem is generalized to a separable complex Hilbert space $\mathcal{H}$ and a linear operator $K$ on $\mathcal{H}$ by setting $v = \phi(z)$, $v_0 = \Phi(\mu_0^S)$, and $v_i = K^i v_0$ for $i = 1, \ldots, m$.

In Sect. 3.1, we generalize the problem. In Sect. 3.2, we define a residual of an approximation of $K^n v$. Then, we investigate the relationship between the two methods and the residuals in Sects. 3.3 and 3.4.

3.1 The general setting for Krylov subspace methods for estimating operator‑vector multiplications

Let $\mathcal{H}$ be a separable complex Hilbert space, let $K : \mathcal{D} \to \mathcal{H}$ be an unknown linear map, where $\mathcal{D}$ is a dense subset of $\mathcal{H}$, and let $v$ and $v_0$ be given vectors in $\mathcal{H}$. We assume $K^i v_0 \in \mathcal{D}$ for any natural number $i$, since by the definition of the Perron–Frobenius operator $K$, $Kv \in \mathcal{D}$ holds for $\mathcal{D} = \Phi(\mathcal{M}(\mathcal{X}))$. The purpose of the Krylov subspace method is to estimate $K^n v$ only with $v, v_0, \ldots, v_m$, where $v_i = K^i v_0$.

Assume the dimension of $\mathcal{K}_m(K, v_0)$ is $m$. Let $q_1, \ldots, q_m$ be an orthonormal basis of the Krylov subspace $\mathcal{K}_m(K, v_0)$ obtained through the Gram–Schmidt orthonormalization and $Q_m : \mathbb{C}^m \to \mathcal{H}$ be defined as $[c_1, \ldots, c_m] \mapsto \sum_{i=1}^{m} c_i q_i$. Then, there exists an invertible matrix $\mathbf{R}_m \in \mathbb{C}^{m \times m}$ such that $[v_0, \ldots, v_{m-1}] = Q_m \mathbf{R}_m$. The Arnoldi approximation of $K^n v$, which is denoted as $a_m^{\mathrm{Arnoldi}}$, is defined as

$$a_m^{\mathrm{Arnoldi}} = Q_m \tilde{\mathbf{K}}_m^n Q_m^* v,$$

where $\tilde{\mathbf{K}}_m = Q_m^* K Q_m$, which can be represented as $Q_m^* [v_1, \ldots, v_m] \mathbf{R}_m^{-1}$.

Analogously, let $\gamma \notin \Lambda(K)$, let $q_1, \ldots, q_m$ be an orthonormal basis of the Krylov subspace $\mathcal{K}_m((\gamma I - K)^{-1}, u_m)$ obtained through the Gram–Schmidt orthonormalization, where $u_m = \sum_{i=0}^{m} \binom{m}{i} (-1)^i \gamma^{m-i} v_i$, and let $Q_m : \mathbb{C}^m \to \mathcal{H}$ be defined as $[c_1, \ldots, c_m] \mapsto \sum_{i=1}^{m} c_i q_i$. Then, there exists an invertible matrix $\mathbf{R}_m \in \mathbb{C}^{m \times m}$ such that $[u_1, \ldots, u_m] = Q_m \mathbf{R}_m$. Let $\tilde{\mathbf{L}}_m = Q_m^* (\gamma I - K)^{-1} Q_m$, which can be represented as $Q_m^* [u_0, \ldots, u_{m-1}] \mathbf{R}_m^{-1}$. If $\tilde{\mathbf{L}}_m$ is invertible, the shift-invert Arnoldi approximation of $K^n v$, which is denoted as $a_m^{\mathrm{SIA}}$, is defined as

$$a_m^{\mathrm{SIA}} = Q_m \tilde{\mathbf{K}}_m^n Q_m^* v,$$

where $\tilde{\mathbf{K}}_m = f_\gamma(\tilde{\mathbf{L}}_m)$ and $f_\gamma(z) = \gamma - z^{-1}$. Here, the expression of $\tilde{\mathbf{K}}_m$ for the shift-invert Arnoldi method differs from that for the Arnoldi method.

3.2 A residual of an approximation of operator‑vector multiplication

Assume $0 \notin \Lambda(K)$. We define a residual of an approximation $a_m$ of $K^n v$ as follows:

$$\mathrm{res}(a_m) = v - K^{-n} a_m. \tag{6}$$

Although the approximation error $K^n v - a_m$ is generally not available since $K^n v$ is unknown, $K^{-n} a_m$ is available in some cases. For example, if $K$ is a Perron–Frobenius operator and we know past observations $x_{-1}, \ldots, x_{-n}$, then we can calculate $K^{-n} \Phi(\mu_t^S)$ for $t = 0, \ldots, m-1$. Then, we can also calculate $K^{-n} a_m^{\mathrm{Arnoldi}}$. In fact, the residual (6) is a reasonable criterion for evaluating the convergence of the approximation for two reasons. First, the residual of an approximation $a_m$ of the solution of a linear equation $Ax = b$ is defined as $b - A a_m$. If the problem of approximating $K^n v$ is regarded as that of solving $K^{-n} x = v$, the residual of the approximation $a_m$ is $v - K^{-n} a_m$. Second, the following proposition shows that the value $v - K^{-n} a_m$ can be decomposed into a generalized residual of the Krylov approximation, proposed by Saad [34] and Hochbruck et al. [16], and the error with respect to projecting $v$ onto the Krylov subspace.

Proposition 3.1 Assume $0 \notin \Lambda(K)$. Let $a_m = Q_m \tilde{\mathbf{K}}_m^n Q_m^* v$ be the Arnoldi or shift-invert Arnoldi approximation of $K^n v$ and let $f(z) = z^{-1}$. In addition, let $r_m$ be the generalized residual of $a_m$ with respect to $f(K^{-n})v$, i.e.,

$$r_m = \frac{1}{2\pi \mathrm{i}} \int_{z \in \Gamma} f(z) \left( (zI - K^{-n}) Q_m (z\mathbf{I} - \tilde{\mathbf{K}}_m^{-n})^{-1} Q_m^* v - v \right) dz,$$

where $\mathrm{i}$ is the imaginary unit and $\Gamma$ is a rectifiable Jordan curve enclosing $\Lambda(\tilde{\mathbf{K}}_m^{-n})$ but not enclosing 0. Then, the residual of $a_m$ defined as (6) is decomposed as follows:

$$\mathrm{res}(a_m) = r_m + \left( v - Q_m Q_m^* v \right).$$

Proof Since $0 \notin \Lambda(K)$ and $\Lambda(\tilde{\mathbf{K}}_m) = \Lambda(Q_m^* K Q_m) \subseteq \Lambda(K)$ hold, we have $0 \notin \Lambda(\tilde{\mathbf{K}}_m)$. Thus, we obtain $0 \notin \Lambda(\tilde{\mathbf{K}}_m^{-1})$, and there exists a rectifiable Jordan curve $\Gamma$ such that $f$ is holomorphic in the region enclosed by $\Gamma$ and continuous on $\Gamma$. Therefore, $\int_{z \in \Gamma} f(z)\, dz = 0$, and by Cauchy's integral formula, the following equalities are derived:

$$\begin{aligned}
r_m &= \frac{1}{2\pi \mathrm{i}} \int_{z \in \Gamma} f(z) \left( (zI - K^{-n}) Q_m (z\mathbf{I} - \tilde{\mathbf{K}}_m^{-n})^{-1} Q_m^* v - v \right) dz \\
&= Q_m f(\tilde{\mathbf{K}}_m^{-n}) \tilde{\mathbf{K}}_m^{-n} Q_m^* v - K^{-n} Q_m f(\tilde{\mathbf{K}}_m^{-n}) Q_m^* v \\
&= Q_m Q_m^* v - K^{-n} a_m,
\end{aligned}$$

which completes the proof of the proposition. ◻
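Continuing the matrix stand-in from Sect. 3.1, both terms of the decomposition are computable: $K^{-n} a_m$ needs only the action of $K^{-n}$, and the proof above identifies $r_m = Q_m Q_m^* v - K^{-n} a_m$ in closed form. A short continuation of the earlier sketch (same `K`, `Q`, `a_arnoldi`, `v`, `n`; run after that snippet) that checks the decomposition to machine precision:

```python
# Continuation of the sketch in Sect. 3.1 (same K, Q, a_arnoldi, v, n).
Kinv_n = np.linalg.matrix_power(np.linalg.inv(K), n)

res = v - Kinv_n @ a_arnoldi                # res(a_m) = v - K^{-n} a_m, Eq. (6)
r_m = Q @ (Q.T @ v) - Kinv_n @ a_arnoldi    # generalized residual (closed form)
proj_err = v - Q @ (Q.T @ v)                # error of projecting v onto K_m(K, v_0)

print(np.linalg.norm(res - (r_m + proj_err)))   # ~ 1e-16: Proposition 3.1
```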

3.3 Convergence analysis for the Arnoldi method

In this section, we assume $K$ is bounded and $0 \notin \Lambda(K)$.

The Arnoldi approximation $a_m^{\mathrm{Arnoldi}} = Q_m \tilde{\mathbf{K}}_m^n Q_m^* v$ is obtained through two projections. First, the vector $v \in \mathcal{H}$ is projected onto the Krylov subspace $\mathcal{K}_m(K, v_0)$. Then, $K$ acts on the projected vector in $\mathcal{K}_m(K, v_0)$, and the result is projected back onto the Krylov subspace again. Note that we do not need the first projection in the classical Krylov subspace method for approximating $f(A)v$ for a given linear operator $A$, vector $v$, and function $f$, since we can compute $A^i v$ for $i = 1, \ldots, m-1$ and construct the Krylov subspace of $A$ and $v$. On the other hand, we cannot construct the Krylov subspace of $K$ and $v$ in our current case, since $K$ is unknown and only $K^i v_0$, not $K^i v$, for $i = 1, \ldots, m-1$ and a vector $v_0$ are given. This prevents us from evaluating the convergence speed of the approximation error or residual directly, since the convergence speed of the approximation depends on that of the projected vector $Q_m Q_m^* v$ to the original vector $v$. Therefore, we first consider the minimizer of the residual in a Krylov subspace and evaluate the difference between the Arnoldi approximation and the minimizer.

In fact, since the projection $Q_m Q_m^*$ is orthogonal, the projected vector $Q_m Q_m^* v$ minimizes the difference from the original vector $v$, i.e.,

$$\operatorname*{arg\,min}_{u \in \mathcal{K}_m(K, v_0)} \|v - u\| = Q_m Q_m^* v. \tag{7}$$

Since each $u \in \mathcal{K}_m(K, v_n)$ satisfies $K^{-n} u \in \mathcal{K}_m(K, v_0)$, Eq. (7) implies that the inequality $\|v - K^{-n} \tilde{a}_m\| \le \|v - K^{-n} u\|$ holds, where $\tilde{a}_m = K^n Q_m Q_m^* v \in \mathcal{K}_m(K, v_n)$. Therefore, $\tilde{a}_m$ minimizes $\|v - K^{-n} u\|$ for all $u \in \mathcal{K}_m(K, v_n)$, i.e.,

$$\operatorname*{arg\,min}_{u \in \mathcal{K}_m(K, v_n)} \|v - K^{-n} u\| = \tilde{a}_m.$$

However, in practice, $\mathcal{K}_m(K, v_n)$ is unavailable only with $v, v_0, \ldots, v_m$. Therefore, $\tilde{a}_m$ is also unavailable. Thus, $a_m^{\mathrm{Arnoldi}}$, instead of $\tilde{a}_m$, is used for estimating $K^n v$.

We evaluate the difference between $a_m^{\mathrm{Arnoldi}}$ and $\tilde{a}_m$. Let $\mathbb{D}_\rho = \{z \in \mathbb{C} \mid |z| \le \rho\}$ be the disk of radius $\rho > 0$, let $W(K) = \{\langle v, Kv \rangle \mid v \in \mathcal{D},\ \|v\| = 1\}$ be the numerical range of $K$, and let $\overline{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$ be the extended complex plane. Moreover, let $\alpha_\rho$ be a conformal map from $\overline{\mathbb{C}} \setminus \overline{W(K)}$ to $\overline{\mathbb{C}} \setminus \mathbb{D}_\rho$ that satisfies $\alpha_\rho(\infty) = \infty$ and $\lim_{z \to \infty} \alpha_\rho(z)/z = 1$, and let $\Gamma_r$ be the region enclosed by the contour $\{z \in \mathbb{C} \mid |\alpha_\rho(z)| = r\}$ for $r > \rho$. Here, $\overline{W(K)}$ is the closure of $W(K)$, and by the Riemann mapping theorem, the map $\alpha_\rho$ exists. The following theorem is deduced.

Theorem 3.2 Let $n < m$, and let $p_{m-n-1}$ and $\tilde{p}_{n-1}$ be polynomials of order $m-n-1$ and $n-1$ that satisfy $a_m^{\mathrm{Arnoldi}} = K^n p_{m-n-1}(K) v_0 + \tilde{p}_{n-1}(K) v_0$. Assume the set $\{v_0, \ldots, v_{m-1}\}$ is linearly independent. If the function $f_m$ defined as $f_m(z) = z^{-n} \tilde{p}_{n-1}(z)$ is holomorphic in $\Gamma_r$, the residual of $a_m^{\mathrm{Arnoldi}}$ is evaluated as follows:

$$\|\mathrm{res}(a_m^{\mathrm{Arnoldi}}) - \mathrm{res}(\tilde{a}_m)\| \le 2 C_1 C_2(m) \|v_0\| \frac{(\rho/r)^m}{1 - (\rho/r)},$$

where $C_1 > 0$ is a constant and $C_2(m) > 0$ depends on $m$.

We use the following lemma for deriving Theorem 3.2.

Lemma 3.3 Let $n < m$. Assume $0 \notin W(K)$ and the set $\{v_0, \ldots, v_{m-1}\}$ is linearly independent. Then, the following equality is deduced:

$$\mathrm{res}(a_m^{\mathrm{Arnoldi}}) - \mathrm{res}(\tilde{a}_m) = Q_m \tilde{\mathbf{K}}_m^{-n} \tilde{p}_{n-1}(\tilde{\mathbf{K}}_m) Q_m^* v_0 - K^{-n} \tilde{p}_{n-1}(K) v_0. \tag{8}$$

Proof The identity $p(K) v_0 = Q_m p(\tilde{\mathbf{K}}_m) Q_m^* v_0$ holds for any polynomial $p$ of an order less than or equal to $m-1$. In addition, by the assumption $0 \notin W(K)$ and the inclusion $W(\tilde{\mathbf{K}}_m) \subseteq W(K)$, $\tilde{\mathbf{K}}_m$ is invertible. As a result, the following equalities are derived:

$$\begin{aligned}
\mathrm{res}(a_m^{\mathrm{Arnoldi}}) - \mathrm{res}(\tilde{a}_m)
&= K^{-n} \tilde{a}_m - K^{-n} a_m^{\mathrm{Arnoldi}} \\
&= Q_m \tilde{\mathbf{K}}_m^{-n} \tilde{\mathbf{K}}_m^n Q_m^* v - \left( p_{m-n-1}(K) v_0 + K^{-n} \tilde{p}_{n-1}(K) v_0 \right) \\
&= Q_m \tilde{\mathbf{K}}_m^{-n} Q_m^* \left( K^n p_{m-n-1}(K) v_0 + \tilde{p}_{n-1}(K) v_0 \right) - \left( p_{m-n-1}(K) v_0 + K^{-n} \tilde{p}_{n-1}(K) v_0 \right) \\
&= Q_m \tilde{\mathbf{K}}_m^{-n} \left( \tilde{\mathbf{K}}_m^n p_{m-n-1}(\tilde{\mathbf{K}}_m) Q_m^* v_0 + \tilde{p}_{n-1}(\tilde{\mathbf{K}}_m) Q_m^* v_0 \right) - \left( Q_m p_{m-n-1}(\tilde{\mathbf{K}}_m) Q_m^* v_0 + K^{-n} \tilde{p}_{n-1}(K) v_0 \right) \\
&= Q_m \tilde{\mathbf{K}}_m^{-n} \tilde{p}_{n-1}(\tilde{\mathbf{K}}_m) Q_m^* v_0 - K^{-n} \tilde{p}_{n-1}(K) v_0,
\end{aligned}$$

which completes the proof of the lemma. ◻

The vector $Q_m \tilde{\mathbf{K}}_m^{-n} \tilde{p}_{n-1}(\tilde{\mathbf{K}}_m) Q_m^* v_0$ on the right-hand side of Eq. (8) is equivalent to the Arnoldi approximation of the operator function $f_m(K)$ acting on the vector $v_0$ [12, 15]. Note that since $\tilde{p}_{n-1}$ depends on $m$, $f_m$ depends on $m$. By using this fact, we now prove Theorem 3.2.

Proof (Proof of Theorem 3.2) Let $\mathcal{P}_{m-1}$ be the set of all polynomials of order less than or equal to $m-1$. By Crouzeix et al. [5], the following bound is deduced for any $p \in \mathcal{P}_{m-1}$:

$$\begin{aligned}
\|\mathrm{res}(a_m^{\mathrm{Arnoldi}}) - \mathrm{res}(\tilde{a}_m)\|
&= \|Q_m f_m(\tilde{\mathbf{K}}_m) Q_m^* v_0 - f_m(K) v_0\| \\
&\le \|Q_m f_m(\tilde{\mathbf{K}}_m) Q_m^* v_0 - Q_m p(\tilde{\mathbf{K}}_m) Q_m^* v_0\| + \|f_m(K) v_0 - p(K) v_0\| \\
&\le 2 C_1 \|v_0\| \|f_m - p\|_{\infty, W(K)},
\end{aligned}$$

where $0 < C_1 \le 1 + \sqrt{2}$. Here, for a linear operator $K$ and a map $f$ that is holomorphic in the interior of $\overline{W(K)}$ and continuous on $\overline{W(K)}$, the norm $\|f\|_{\infty, W(K)}$ is defined as $\|f\|_{\infty, W(K)} = \sup_{z \in \overline{W(K)}} |f(z)|$. By taking the infimum among $p \in \mathcal{P}_{m-1}$, we obtain

$$\|\mathrm{res}(a_m^{\mathrm{Arnoldi}}) - \mathrm{res}(\tilde{a}_m)\| \le 2 C_1 \|v_0\| \inf_{p \in \mathcal{P}_{m-1}} \|f_m - p\|_{\infty, W(K)}. \tag{9}$$

In fact, the infimum in the inequality (9) can be taken among $p \in \{\tilde{p} \in \mathcal{P}_{m-1} \mid \|\tilde{p}\|_{\infty, W(K)} \le 2 \|f_m\|_{\infty, W(K)}\}$, which is a compact space. Indeed, for a polynomial $p \in \mathcal{P}_{m-1}$ satisfying $\|p\|_{\infty, W(K)} > 2 \|f_m\|_{\infty, W(K)}$, we have

$$\|f_m - p\|_{\infty, W(K)} > \|f_m\|_{\infty, W(K)} = \|f_m - 0\|_{\infty, W(K)},$$

and $0 \in \mathcal{P}_{m-1}$. Therefore, the infimum in the inequality (9) can be replaced with the minimum. By Ellacott [7, Corollary 2.2], this factor is bounded as

$$\min_{p \in \mathcal{P}_{m-1}} \|f_m - p\|_{\infty, W(K)} \le C_2(m) \frac{(\rho/r)^m}{1 - (\rho/r)},$$

where $C_2(m) = \max_{z \in \Gamma_r} |f_m(z)|$, which completes the proof of the theorem. ◻

In fact, the following proposition guarantees that the order of the increase in the factor $C_2(m)$ is at most $m$ in the case where the initial vector $v_0$ does not depend on $m$ (see Remark 2.1). Thus, in this case, the Arnoldi approximation $a_m^{\mathrm{Arnoldi}}$ approaches $\tilde{a}_m$, the minimizer of the residual, in the order of $m \alpha^m$ for some $0 < \alpha < 1$.

Proposition 3.4 Assume the set $\{v_0, \ldots, v_{m-1}\}$ is linearly independent and $v_0$ does not depend on $m$. If the function $f_m$ is holomorphic in $\Gamma_r$, then the factor $C_2(m)$ is bounded as

$$C_2(m) \le C_2(1) + (m-1) C_3$$

for some constant $C_3 > 0$ that does not depend on $m$.

Proof We first evaluate the coefficients of the polynomial $\tilde{p}_{n-1}$. Since $p_{m-n-1}$ and $\tilde{p}_{n-1}$ depend on $m$, we denote them as $p_{m-n-1}^m$ and $\tilde{p}_{n-1}^m$ in this proof. Let $\tilde{p}_{n-1}^m(z) = \sum_{i=0}^{n-1} c_i(m) z^i$ and $p_{m-n-1}^m(z) = \sum_{i=n}^{m-1} c_i(m) z^{i-n}$, where $c_i(m) \in \mathbb{C}$. Then, by the definitions of $\tilde{p}_{n-1}^m$ and $p_{m-n-1}^m$, we have

$$\|a_m^{\mathrm{Arnoldi}} - a_{m-1}^{\mathrm{Arnoldi}}\| = \left\| \sum_{i=0}^{m-1} c_i(m) v_i - \sum_{i=0}^{m-2} c_i(m-1) v_i \right\| \ge \left| \left\langle \tilde{q}_i,\; \sum_{i=0}^{m-1} c_i(m) v_i - \sum_{i=0}^{m-2} c_i(m-1) v_i \right\rangle \right| = |\langle \tilde{q}_i, v_i \rangle|\, |c_i(m) - c_i(m-1)|$$

for $i = 0, \ldots, n-1$, where $\tilde{q}_i$ is a normalized vector in the orthogonal complement of the space spanned by $\{v_0, \ldots, v_{i-1}, v_{i+1}, \ldots\}$. In addition, we have

$$\|a_m^{\mathrm{Arnoldi}} - a_{m-1}^{\mathrm{Arnoldi}}\| = \|Q_m \tilde{\mathbf{K}}_m^n Q_m^* v - Q_{m-1} \tilde{\mathbf{K}}_{m-1}^n Q_{m-1}^* v\| \le 2 \|K^n\| \|v\|.$$

As a result, the following inequality is derived:

$$|c_i(m) - c_i(m-1)| \le \frac{2 \|K^n\| \|v\|}{|\langle \tilde{q}_i, v_i \rangle|}. \tag{10}$$

We now evaluate $C_2(m)$. By the inequality (10) and the holomorphicity of $f_m$, we obtain

$$\begin{aligned}
C_2(m) &= \sup_{z \in \Gamma_r} |z^{-n} \tilde{p}_{n-1}^m(z)| \\
&\le \sup_{z \in \Gamma_r} |z^{-n} \tilde{p}_{n-1}^m(z) - z^{-n} \tilde{p}_{n-1}^{m-1}(z)| + \sup_{z \in \Gamma_r} |z^{-n} \tilde{p}_{n-1}^{m-1}(z)| \\
&= \sup_{z \in \Gamma_r} \left| \sum_{i=0}^{n-1} z^{-n+i} (c_i(m) - c_i(m-1)) \right| + C_2(m-1) \\
&\le C_3 + C_2(m-1),
\end{aligned} \tag{11}$$

where

$$C_3 = \sum_{i=0}^{n-1} \frac{2 \|K^n\| \|v\|}{|\langle \tilde{q}_i, v_i \rangle|} \sup_{z \in \Gamma_r} |z^{-n+i}|.$$

Applying the inequality (11) recursively completes the proof of the proposition. ◻

The decrease in the value $\|\mathrm{res}(a_m^{\mathrm{Arnoldi}}) - \mathrm{res}(\tilde{a}_m)\|$ is confirmed numerically in Sect. 4.2.

3.4 Convergence analysis for the shift‑invert Arnoldi method

The convergence of the Arnoldi method is not guaranteed when $K$ is unbounded. Moreover, although Theorem 3.2 requires an assumption on the numerical range of $K$, it is generally hard to calculate the numerical range of $K$, a linear operator in an infinite dimensional space. Therefore, we also consider the shift-invert Arnoldi method.

The shift-invert Arnoldi approximation $a_m^{\mathrm{SIA}} = Q_m \tilde{\mathbf{K}}_m^n Q_m^* v$ can also be obtained through two projections, similar to the Arnoldi method. However, in this case, instead of $K$, a polynomial of $(\gamma I - K)^{-1}$ that approximates $K$ acts on the vector, which is the projection of $v$ onto $\mathcal{K}_m((\gamma I - K)^{-1}, u_m)$.

Let $n < m$. To address $K^{-n}$ in the residual, we slightly modify the Krylov subspace $\mathcal{K}_m((\gamma I - K)^{-1}, u_m)$ and define a space $\tilde{\mathcal{K}}_m((\gamma I - K)^{-1}, w_{m-n})$ as follows:

$$\begin{aligned}
\tilde{\mathcal{K}}_m((\gamma I - K)^{-1}, w_{m-n})
&:= \mathrm{Span}\{w_1, \ldots, w_{m-n}, K^{-1} w_{m-n}, \ldots, K^{-n} w_{m-n}\} \\
&= \mathrm{Span}\{(\gamma I - K)^{-m+n+1} w_{m-n}, \ldots, (\gamma I - K)^{-1} w_{m-n}, w_{m-n}, K^{-1} w_{m-n}, \ldots, K^{-n} w_{m-n}\} \\
&= \mathrm{Span}\{(\gamma I - K)^{-m+n+1} K^{-n} w_{m-n}, \ldots, (\gamma I - K)^{-1} K^{-n} w_{m-n}, w_{m-n}, K^{-1} w_{m-n}, \ldots, K^{-n} w_{m-n}\} \\
&= \mathrm{Span}\{K^{-n} w_1, \ldots, K^{-n} w_{m-n-1}, w_{m-n}, K^{-1} w_{m-n}, \ldots, K^{-n} w_{m-n}\},
\end{aligned}$$
