
3.3 Interpolatory methods for large-scale matrix equations

3.3.7 Sylvester equations

As is well-known, see [117, Section 8.3.1], the latter approach often results in a very slow convergence rate for common PDEs. Since the efficiency of Algorithm 3.3.1 also depends on the convergence rate, we cannot expect it to outperform state-of-the-art low-rank techniques when it comes to computational efficiency. On the other hand, it obviously has the advantage of locally minimizing the residual for a given rank.

Consider now the objective function f(Θ) = tr(B^T X C^T) with X fulfilling (3.47). As is easily seen, this function results from a slight modification of the H2-norm of a dynamical system and thus can be computed as

f(Θ) = vec(BC)^T (−E ⊗ A − H ⊗ M)^{-1} vec(BC).   (3.49)

For later purposes, it is helpful to note that f is invariant under orthonormal transformations.
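For small dense test data, (3.49) can be evaluated directly via its Kronecker formulation. The following is a minimal numpy sketch under that assumption; the helper name f_theta is ours, and column-major (Fortran-order) reshaping is used so that reshape corresponds to the vec operator.

```python
import numpy as np

def f_theta(A, E, M, H, B, C):
    """Evaluate f(Theta) as in (3.49) for small dense data.

    vec(X) solves (-E (x) A - H (x) M) vec(X) = vec(BC), which is the
    vectorized generalized Sylvester equation A X E + M X H + B C = 0
    for symmetric E, H, and f(Theta) = vec(BC)^T vec(X) = tr(B^T X C^T).
    """
    rhs = (B @ C).reshape(-1, order="F")   # vec(BC), column stacking
    K = -np.kron(E, A) - np.kron(H, M)     # -E (x) A - H (x) M
    return rhs @ np.linalg.solve(K, rhs)   # vec(BC)^T vec(X)
```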

Lemma 3.3.2. Let Θ = (A, E, M, H, B, C) denote a set of matrices and let X be the solution of the associated Sylvester equation (3.47). Assume that Λ is a diagonal matrix containing the eigenvalues of the matrix pencil (H, E) and that Q is a matrix containing an orthogonal set of eigenvectors. Then it holds

f(Θ) = tr(B^T X C^T) = tr(B^T Y C̃^T),

where C̃ = CQ and Y is the solution of AY + MYΛ + BC̃ = 0.

Proof. Let Q be the matrix of eigenvectors for the matrix pencil (H, E), i.e., assume that it holds Q^T EQ = I and Q^T HQ = Λ, where Λ is a diagonal matrix consisting of the eigenvalues. Since H = H^T ≺ 0 and E = E^T ≻ 0, this is always possible. If we now postmultiply equation (3.47) with Q, we get

AXEQ + MXHQ + BCQ = 0.

Due to the orthonormality of Q, this can be transformed into

AXQQ^T EQ + MXQQ^T HQ + BCQ = 0.

If we denote Y = XQ and C̃ = CQ, it follows that AY + MYΛ + BC̃ = 0, which implies that

tr(B^T Y C̃^T) = tr(B^T XQQ^T C^T) = tr(B^T X C^T).
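The diagonalization used in the lemma is easy to check numerically: scipy.linalg.eigh applied to the pencil (H, E) returns eigenvectors normalized such that Q^T EQ = I, after which the transformed equation decouples column by column. A small sketch (the helper name is ours), assuming A + λ_i M is nonsingular for all eigenvalues λ_i:

```python
import numpy as np
from scipy.linalg import eigh

def f_theta_diagonalized(A, E, M, H, B, C):
    """Evaluate f(Theta) via Lemma 3.3.2 (sketch for small dense data).

    eigh(H, E) yields lam, Q with Q^T E Q = I and Q^T H Q = diag(lam),
    so Y solving A Y + M Y Lam + B (CQ) = 0 is obtained columnwise as
    y_i = -(A + lam_i M)^{-1} B (CQ)[:, i].
    """
    lam, Q = eigh(H, E)                # pencil eigendecomposition
    Ctil = C @ Q
    Y = np.column_stack([
        np.linalg.solve(-(A + li * M), B @ Ctil[:, i])
        for i, li in enumerate(lam)
    ])
    return np.trace(B.T @ Y @ Ctil.T)  # equals tr(B^T X C^T)
```

Comparing f_theta and f_theta_diagonalized on random symmetric test data illustrates the invariance claim.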


Assume now that we have constructed a reduced set of matrices Θ̂ = (Â, Ê, M̂, Ĥ, B̂, Ĉ) by the following projection:

Â = V^T AV,  Ê = W^T EW,  M̂ = V^T MV,
Ĥ = W^T HW,  B̂ = V^T B,  Ĉ = CW.   (3.50)

Next, for Θ and Θ̂ we define the corresponding error set

Θ_err = (A_err, E_err, M_err, H_err, B_err, C_err), with

A_err = [−A 0; 0 Â],  E_err = [−E 0; 0 Ê],  M_err = [−M 0; 0 M̂],
H_err = [−H 0; 0 Ĥ],  B_err = [B; B̂],  C_err = [C Ĉ].

Similar to the previous cases, it is easy to show that a crucial lower bound is given by the objective function f evaluated at the error set Θ_err.

Corollary 3.3.1. Let Θ and Θ̂ denote two sets of matrices associated with large and reduced generalized Sylvester equations of the form (3.47), respectively. Then, for the associated error set Θ_err, it holds that

f(Θ_err) ≤ f(Θ) − f(Θ̂).
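For experimentation, the error set can be assembled directly with block matrices; combined with the f_theta sketch above, this makes the bound of Corollary 3.3.1 easy to check on small examples. The helper name and the use of scipy.linalg.block_diag are our choices:

```python
import numpy as np
from scipy.linalg import block_diag

def error_set(A, E, M, H, B, C, Ar, Er, Mr, Hr, Br, Cr):
    """Assemble Theta_err from a full set and a reduced set (sketch).

    The 'r' suffix stands for the hatted (reduced) quantities of (3.50).
    """
    return (block_diag(-A, Ar), block_diag(-E, Er),
            block_diag(-M, Mr), block_diag(-H, Hr),
            np.vstack([B, Br]), np.hstack([C, Cr]))

# f_theta(*error_set(...)) is then bounded by the difference of the
# full and reduced objective values, as stated in Corollary 3.3.1.
```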

Hence, analogously to H2-optimal model order reduction, we want to find a local minimizer of f(Θ_err). This can be done by deriving first-order necessary conditions based on the computation formula for f(Θ_err). Due to the structural similarity to the previous sections, we only briefly mention how to proceed. For convenience, let us start with the case B = b and C = c^T. First of all, according to Lemma 3.3.2, we may w.l.o.g. assume that H_err = Λ_err and E_err = I. Consequently, the objective function simplifies according to

f(Θ_err) = b_err^T X_err c_err
         = (c_err^T ⊗ b_err^T)(−Λ_err ⊗ M_err − I ⊗ A_err)^{-1}(c_err ⊗ b_err)
         = ∑_{i=1}^{n+n̂} b_err^T(−λ_i M_err − A_err)^{-1} b_err (c_err^{(i)})^2,

where c_err^{(i)} denotes the i-th component of c_err. Setting the derivative of f(Θ_err) with respect to ĉ^{(j)} equal to zero yields

2 ĉ^{(j)} b_err^T(−λ̂_j M_err − A_err)^{-1} b_err = 0,

with λ̂_j being the j-th eigenvalue of (Ĥ, Ê). However, in terms of interpolation, the above means that

b^T(−λ̂_j M − A)^{-1} b = b̂^T(−λ̂_j M̂ − Â)^{-1} b̂.   (3.51)

Similarly, for the derivative with respect to λ̂_j, we obtain

b^T(−λ̂_j M − A)^{-1} M (−λ̂_j M − A)^{-1} b = b̂^T(−λ̂_j M̂ − Â)^{-1} M̂ (−λ̂_j M̂ − Â)^{-1} b̂.   (3.52)

Hence, these conditions are obviously an extension of the Hermite interpolation conditions for H2-optimality.
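Once a reduced set of matrices is available (see Algorithm 3.3.2 below), condition (3.51) can be checked numerically. A minimal sketch, with Ar, Mr, br standing for the hatted reduced quantities:

```python
import numpy as np

def check_interpolation(A, M, b, Ar, Mr, br, lam):
    """Return both sides of (3.51) at the mirrored eigenvalue lam
    (a sketch; Ar, Mr, br denote the hatted reduced matrices)."""
    lhs = b @ np.linalg.solve(-lam * M - A, b)
    rhs = br @ np.linalg.solve(-lam * Mr - Ar, br)
    return lhs, rhs   # agree up to round-off at a fixed point
```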

On the other hand, we have

f(Θ_err) = b_err^T X_err c_err = c_err^T X_err^T b_err,

and the same argumentation leads to

G(−µ̂_j) = Ĝ(−µ̂_j),  G′(−µ̂_j) = Ĝ′(−µ̂_j),   (3.53)

with

G(s) = c^T(sE − H)^{-1} c,  Ĝ(s) = ĉ^T(sÊ − Ĥ)^{-1} ĉ

and µ̂_j being the eigenvalues of the matrix pencil (Â, M̂). Hence, we propose Algorithm 3.3.2 for iteratively constructing a reduced set of matrices fulfilling these conditions.

Algorithm 3.3.2 IRKA for symmetric Sylvester equations ((Sy)²IRKA)

Input: Initial selection of real interpolation points σ_i and µ_i for i = 1, . . . , n̂ and a convergence tolerance tol.
Output: X_n̂ = V X̂ W^T fulfilling first-order necessary conditions

1: Choose V and W s.t. V = span{(σ_1 M − A)^{-1} b, . . . , (σ_n̂ M − A)^{-1} b}, W = span{(µ_1 E − H)^{-1} c, . . . , (µ_n̂ E − H)^{-1} c} and V^T V = W^T W = I.
2: while relative change in {σ_i, µ_i} > tol do
3:   Â = V^T AV, M̂ = V^T MV, Ê = W^T EW, Ĥ = W^T HW
4:   assign σ_i ← −λ_i(Ĥ, Ê) and µ_i ← −λ_i(Â, M̂) for i = 1, . . . , n̂
5:   update V and W s.t. V = span{(σ_1 M − A)^{-1} b, . . . , (σ_n̂ M − A)^{-1} b}, W = span{(µ_1 E − H)^{-1} c, . . . , (µ_n̂ E − H)^{-1} c} and V^T V = W^T W = I.
6: end while
7: b̂ = V^T b, ĉ = W^T c
8: Solve Â X̂ Ê + M̂ X̂ Ĥ + b̂ ĉ^T = 0.
9: Set X_n̂ = V X̂ W^T.

Remark 3.3.6. Due to the connection to optimal H2-model reduction, it should be mentioned that instead of Step 5 of Algorithm 3.3.2, one can alternatively solve two reduced Sylvester equations of the form

AVÊ + MVĤ + bĉ^T = 0,  EWÂ + HWM̂ + cb̂^T = 0.

For a robust solver for these types of equations, we refer to, e.g., [25].
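The iteration of Algorithm 3.3.2 can be sketched compactly with dense linear algebra. The following Python sketch (function and variable names are ours) assumes symmetric data with E, M positive definite and A, H negative definite, so that scipy's symmetric pencil solver applies in Step 4; it is an illustration, not a robust large-scale implementation.

```python
import numpy as np
from scipy.linalg import eigh, orth

def sy2irka(A, E, M, H, b, c, sigma, mu, tol=1e-8, maxit=50):
    """Dense-algebra sketch of Algorithm 3.3.2, (Sy)^2 IRKA.

    Assumes E, M symmetric positive definite and A, H symmetric
    negative definite; sigma, mu are the initial interpolation points.
    """
    sigma, mu = np.asarray(sigma, float), np.asarray(mu, float)
    for _ in range(maxit):
        # Steps 1/5: orthonormal bases of the rational Krylov spaces
        V = orth(np.column_stack(
            [np.linalg.solve(s * M - A, b) for s in sigma]))
        W = orth(np.column_stack(
            [np.linalg.solve(m * E - H, c) for m in mu]))
        # Step 3: projected matrices as in (3.50)
        Ar, Mr = V.T @ A @ V, V.T @ M @ V
        Er, Hr = W.T @ E @ W, W.T @ H @ W
        # Step 4: mirror the eigenvalues of (Hr, Er) and (Ar, Mr)
        sigma_new = -eigh(Hr, Er, eigvals_only=True)
        mu_new = -eigh(Ar, Mr, eigvals_only=True)
        done = (np.linalg.norm(np.sort(sigma_new) - np.sort(sigma))
                <= tol * np.linalg.norm(sigma))
        sigma, mu = sigma_new, mu_new
        if done:
            break
    # Steps 7-9: solve the reduced equation and lift the solution
    br, cr = V.T @ b, W.T @ c
    K = -np.kron(Er, Ar) - np.kron(Hr, Mr)
    Xr = np.linalg.solve(K, np.outer(br, cr).reshape(-1, order="F"))
    Xr = Xr.reshape(len(sigma), len(sigma), order="F")
    return V @ Xr @ W.T

```

Upon convergence, the returned matrix realizes X_n̂ of Step 9, and the interpolation conditions can be verified with the check_interpolation sketch above.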

It remains to show that, in the case of convergence of Algorithm 3.3.2, the lower bound of Corollary 3.3.1 is actually attained. For this, we assume the following splitting of the solution of (3.47) for the error system:

X_err = [X Y; Z X̂] = [X 0; 0 0] + [0 Y; 0 X̂] + [0 0; Z X̂] − [0 0; 0 X̂].

Hence, we get

f(Θ_err) = f(Θ) − f(Θ̂) + b_err^T [Y; X̂] ĉ + b̂^T [Z X̂] c_err.

A closer look at the right-hand side of the previous equation reveals that

b_err^T [Y; X̂] ĉ = b^T Y ĉ + b̂^T X̂ ĉ,

where Y is the solution of

−AY − MYΞ + b ĉ^T = 0.

Here, we again assumed that the reduced matrix pencil (Ĥ, Ê) is given in its eigenvalue decomposition and that the eigenvalues are contained in the diagonal matrix Ξ. As a consequence, it holds that

(ĉ^T ⊗ b^T) vec(Y) = −(ĉ^T ⊗ b^T)(−Ξ ⊗ M − I ⊗ A)^{-1}(ĉ ⊗ b).

On the other hand, we know that

(ĉ^T ⊗ b̂^T) vec(X̂) = (ĉ^T ⊗ b̂^T)(−Ξ ⊗ M̂ − I ⊗ Â)^{-1}(ĉ ⊗ b̂),

which, together with the interpolation conditions, yields b_err^T [Y; X̂] ĉ = 0. Similarly, we can show that b̂^T [Z X̂] c_err = 0.

Analogously to the proof of Theorem 3.3.1, one can eventually show that

vec(X − V X̂ W^T)^T (−L_S) vec(X − V X̂ W^T) = f(Θ) − f(Θ̂).

Altogether, we have thus proven our main result.

Theorem 3.3.4. Let Θ = (A, E, M, H, b, c^T) denote a set of matrices determining a Sylvester equation as in (3.47) with solution X. Let further X_n̂ be computed by Algorithm 3.3.2 with convergence tolerance 0. Then X_n̂ is a local minimizer of

min_{X_k ∈ M} { vec(X − X_k)^T (−L_S) vec(X − X_k) }.
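For small dense data, the objective of Theorem 3.3.4 can be evaluated directly. A small sketch, assuming (as suggested by (3.49)) that L_S = E ⊗ A + H ⊗ M denotes the vectorized Sylvester operator:

```python
import numpy as np

def energy_error(A, E, M, H, X, Xn):
    """Evaluate vec(X - Xn)^T (-L_S) vec(X - Xn) with
    L_S = E (x) A + H (x) M (sketch for small dense data)."""
    LS = np.kron(E, A) + np.kron(H, M)
    e = (X - Xn).reshape(-1, order="F")
    return e @ (-LS) @ e  # equals f(Theta) - f(Theta_hat) on convergence
```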

Extension to the MIMO case

So far, we have proved the result for a right-hand side of rank 1, i.e., b and c being vectors. As the extension to the 'MIMO' case is straightforward, we only sketch the necessary steps in the following. What remains to be clarified are suitable optimality conditions for the MIMO case in terms of either matrix equations or tangential interpolation conditions.


For this, let us have a look at the objective function f evaluated at the error set,

f(Θ_err) = tr(B_err^T X_err C_err^T),

where X_err = [X Y; Z X̂] is partitioned as before. As we have done for the case of the Lyapunov residual, the first step is to compute the derivative of f with respect to an arbitrary parameter γ that might be one of the entries of A, B, C, E, H or M, respectively.

Accordingly, we obtain

∂f/∂γ = tr( (∂X_err/∂γ) C_err^T B_err^T ) + tr( X_err ∂(C_err^T B_err^T)/∂γ ).

Taking into account that X_err is the solution of the generalized Sylvester equation

A_err X_err E_err + M_err X_err H_err + B_err C_err = 0,

a careful analysis leads to

∂f/∂γ = tr( X_err (∂E_err/∂γ) X_err^T A_err ) + tr( X_err E_err X_err^T (∂A_err/∂γ) )
      + tr( X_err (∂H_err/∂γ) X_err^T M_err ) + tr( X_err H_err X_err^T (∂M_err/∂γ) )
      + 2 tr( X_err ∂(C_err^T B_err^T)/∂γ ).

Depending on the specific choice of γ, we can derive different optimality conditions. For example, setting γ = Ê_{i,j} leads to the condition

−Y^T AY + X̂^T Â X̂ = 0.   (3.54a)

Similarly, for the derivatives with respect to Â_{i,j}, Ĥ_{i,j}, M̂_{i,j}, B̂_{i,j} and Ĉ_{i,j}, we get

−ZEZ^T + X̂ Ê X̂^T = 0,   (3.54b)
−Y^T MY + X̂^T M̂ X̂ = 0,   (3.54c)
−ZHZ^T + X̂ Ĥ X̂^T = 0,   (3.54d)
Y^T B + X̂^T B̂ = 0,   (3.54e)
ZC^T + X̂ Ĉ^T = 0.   (3.54f)
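The off-diagonal blocks Y and Z of X_err follow from the (1,2) and (2,1) blocks of the error Sylvester equation, so conditions (3.54a)-(3.54f) can be checked numerically once a reduced set has been computed. A small dense sketch (helper names ours; the 'r' suffix again denotes the hatted quantities):

```python
import numpy as np

def cross_blocks(A, E, M, H, B, C, Ar, Er, Mr, Hr, Br, Cr):
    """Solve the off-diagonal blocks of X_err (small dense sketch):
    A Y Er + M Y Hr = B Cr  and  Ar Z E + Mr Z H = Br C."""
    n, r = A.shape[0], Ar.shape[0]
    vecY = np.linalg.solve(np.kron(Er, A) + np.kron(Hr, M),
                           (B @ Cr).reshape(-1, order="F"))
    vecZ = np.linalg.solve(np.kron(E, Ar) + np.kron(H, Mr),
                           (Br @ C).reshape(-1, order="F"))
    return vecY.reshape(n, r, order="F"), vecZ.reshape(r, n, order="F")

# With Xr denoting the reduced solution, the residuals
# Y.T @ B + Xr.T @ Br and Z @ C.T + Xr @ Cr.T vanish at a local
# optimum, cf. (3.54e)-(3.54f).
```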

We already know that there is an equivalence between Sylvester equations and the concept of tangential interpolation, see again [62]. Hence, it is not surprising that the above matrix-equation-based conditions can alternatively be replaced by demanding that a reduced-order transfer function matrix tangentially interpolates the original transfer function matrix at given interpolation points. To be precise, let

G(s) = C(sE − H)^{-1} C^T ∈ R(s)^{m×m} and F(s) = B^T(sM − A)^{-1} B ∈ R(s)^{m×m}.

Moreover, assume that (Q, Λ) and (R, Ξ) are the eigenvalue decompositions of the matrix pencils (Ĥ, Ê) and (Â, M̂), respectively. Here, Λ = diag(λ_1, . . . , λ_n̂) and Ξ = diag(µ_1, . . . , µ_n̂) contain the eigenvalues while Q and R consist of a set of Ê- and M̂-orthogonal eigenvectors. A locally optimal reduced set Θ̂ of matrices now has to fulfill

G(−µ_j) b̃_j = Ĝ(−µ_j) b̃_j,   (3.55a)
b̃_j^T G(−µ_j) = b̃_j^T Ĝ(−µ_j),   (3.55b)
b̃_j^T G′(−µ_j) b̃_j = b̃_j^T Ĝ′(−µ_j) b̃_j,   (3.55c)
F(−λ_j) c̃_j = F̂(−λ_j) c̃_j,   (3.55d)
c̃_j^T F(−λ_j) = c̃_j^T F̂(−λ_j),   (3.55e)
c̃_j^T F′(−λ_j) c̃_j = c̃_j^T F̂′(−λ_j) c̃_j,   (3.55f)

with B̃ = B̂^T R and C̃ = ĈQ denoting tangential directions. For the sake of completeness, in Algorithm 3.3.3 we now see a matrix version that upon convergence yields a local minimizer for the MIMO case. Consequently, we have the following result extending Theorem 3.3.4.
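A tangential condition such as (3.55a) can be checked in the same spirit; the sketch below evaluates both sides at s = −µ_j for a given direction b̃_j (again with the 'r' suffix denoting hatted quantities):

```python
import numpy as np

def tangential_match(H, E, C, Hr, Er, Cr, mu, btil):
    """Return G(-mu) btil and G_hat(-mu) btil for (3.55a) (sketch)."""
    lhs = C @ np.linalg.solve(-mu * E - H, C.T @ btil)
    rhs = Cr @ np.linalg.solve(-mu * Er - Hr, Cr.T @ btil)
    return lhs, rhs   # coincide at a locally optimal reduced set
```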

Corollary 3.3.2. Let Θ = (A, E, M, H, B, C) denote a set of matrices determining a Sylvester equation as in (3.47) with solution X. Let further X_n̂ be computed by Algorithm 3.3.3 with convergence tolerance 0. Then X_n̂ is a local minimizer of

min_{X_k ∈ M} { vec(X − X_k)^T (−L_S) vec(X − X_k) }.
