
Regularization of Total Least Squares Problems

Heinrich Voss

voss@tu-harburg.de

Hamburg University of Technology, Institute of Numerical Simulation


Outline

1 Total Least Squares Problems

2 Numerical solution of TLS problems

3 Regularization of TLS problems (RTLSQEP)

4 Nonlinear maxmin characterization

5 RTLSQEP: Numerical considerations

6 Numerical example

7 Regularization of TLS problems (RTLSLEP)

8 Determining the regularization parameter

Total Least Squares Problems

The ordinary Least Squares (LS) method assumes that the system matrix A of a linear model is error free and that all errors are confined to the right-hand side b. However, in engineering applications this assumption is often unrealistic. Many problems in data estimation lead to linear systems where both the matrix A and the right-hand side b are contaminated by noise, for example if A itself is only available through measurements or if A is an idealized approximation of the true operator.

If the true values of the observed variables satisfy linear relations, and if the errors in the observations are independent random variables with zero mean and equal variance, then the total least squares (TLS) approach often gives better estimates than LS.

Given $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $m \ge n$: find $\Delta A \in \mathbb{R}^{m\times n}$, $\Delta b \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$ such that

$$\|[\Delta A, \Delta b]\|_F^2 = \min! \quad \text{subject to} \quad (A + \Delta A)x = b + \Delta b, \tag{1}$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

Total Least Squares Problems cnt.

Although the name "total least squares" was introduced only recently in the literature by Golub and Van Loan (1980), this fitting method is not new and has a long history in the statistical literature, where it is known as orthogonal regression, errors-in-variables, or measurement errors, and in image deblurring as blind deconvolution.

The univariate problem (n = 1) was already discussed by Adcock (1877), and it was rediscovered many times, often independently.

About 30 to 40 years ago, the technique was extended by Sprent (1969) and Gleser (1981) to the multivariate case (n > 1).

More recently, the total least squares method has also stimulated interest outside statistics. In numerical linear algebra it was first studied by Golub and Van Loan (1980). Their analysis and their algorithm are based on the singular value decomposition.

TLS analysis

If $b + \Delta b$ is in the range of $A + \Delta A$, then there is a vector $x \in \mathbb{R}^n$ such that $(A + \Delta A)x = b + \Delta b$, i.e.

$$([A, b] + [\Delta A, \Delta b]) \begin{pmatrix} x \\ -1 \end{pmatrix} = 0. \tag{2}$$

Let the singular value decomposition of $C := [A, b]$ be given by

$$[A, b] = U\Sigma V^T \tag{3}$$

with $U \in \mathbb{R}^{m\times(n+1)}$, $V \in \mathbb{R}^{(n+1)\times(n+1)}$ having orthonormal columns and $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_{n+1})$, where

$$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k > \sigma_{k+1} = \dots = \sigma_{n+1}.$$

Then

$$\sigma_{n+1} = \min\{\|[\Delta A, \Delta b]\|_F : \operatorname{rank}([A + \Delta A, b + \Delta b]) < n + 1\} \tag{4}$$

and the minimum is attained by $[\Delta A, \Delta b] = -[A, b]vv^T$, where $v$ is any unit vector in the right singular space

$$\mathcal{S} := \operatorname{span}\{v_{k+1}, \dots, v_{n+1}\}. \tag{5}$$

TLS analysis cnt.

Denote by $\mathcal{S}$ the set of right singular vectors corresponding to $\sigma_{n+1}$. Suppose that there exists a vector $v \in \mathcal{S}$ having the form

$$v = \begin{pmatrix} y \\ \alpha \end{pmatrix}, \quad y \in \mathbb{R}^n, \ \alpha \ne 0.$$

If

$$x := -\frac{1}{\alpha}\, y \quad \text{and} \quad [\Delta A, \Delta b] = -[A, b]vv^T, \tag{6}$$

then

$$([A, b] + [\Delta A, \Delta b]) \begin{pmatrix} x \\ -1 \end{pmatrix} = [A, b](I - vv^T)(-v/\alpha) = 0,$$

and $x$ solves the TLS problem.

If $e_{n+1} = (0, \dots, 0, 1)^T$ is orthogonal to $\mathcal{S}$, then the TLS problem has no solution.
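The generic-case construction above translates directly into a few lines of numerical code. The following is a minimal numpy sketch (the function name and the error handling are mine, not from the slides): it forms $[A, b]$, takes the right singular vector belonging to $\sigma_{n+1}$, and returns $x = -y/\alpha$.

```python
import numpy as np

def tls_svd(A, b):
    """Generic-case TLS solution via the SVD of [A, b].

    Assumes sigma_{n+1} is simple and the last component of the
    corresponding right singular vector is nonzero."""
    m, n = A.shape
    _, _, Vt = np.linalg.svd(np.column_stack([A, b]))
    v = Vt[-1]                  # right singular vector for sigma_{n+1}
    y, alpha = v[:n], v[n]
    if abs(alpha) < 1e-14:
        raise ValueError("e_{n+1} is (numerically) orthogonal to S: no TLS solution")
    return -y / alpha           # x = -y/alpha
```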

TLS analysis cnt.

If $\sigma_{n+1}$ is a repeated singular value of $[A, b]$, then the TLS problem may lack a unique solution. However, in this case it is possible to single out a unique minimum norm TLS solution.

Let $Q$ be an orthogonal matrix of order $n - k + 1$ with the property that

$$[v_{k+1}, \dots, v_{n+1}]\,Q = \begin{pmatrix} W & y \\ 0 & \alpha \end{pmatrix}, \quad y \in \mathbb{R}^n, \tag{7}$$

then $x_{TLS} := -y/\alpha$ is the minimum norm TLS solution.

A unique solution can only occur if $k = n$, i.e. $\sigma_{n+1}$ is simple, and in case $k = n$ there exists a unique solution if the last component of $v_{n+1}$ is different from 0. Obviously, if $\sigma_n'$ is the smallest singular value of $A$ and $\sigma_{n+1} < \sigma_n'$, then the TLS solution is unique, and for $\sigma_{n+1} = \sigma_n'$ it is not.

Uniquely solvable case

Theorem. The TLS problem has a unique solution if and only if

$$\sigma_{n+1} = \sigma_{\min}([A, b]) < \sigma_{\min}(A) = \sigma_n'. \tag{8}$$

If the TLS problem is uniquely solvable, then a closed form solution is

$$x_{TLS} = (A^TA - \sigma_{n+1}^2 I)^{-1}A^Tb, \tag{9}$$

which looks similar to the normal equations for LS problems.

Comparing (9) to the normal equations for RLS problems with Tikhonov regularization, we observe some kind of deregularization in the TLS case. This deregularization results in a larger norm of the solution, i.e. the following relation holds:

$$\|x_{TLS}\| \ge \|x_{LS}\| \ge \|x_\lambda\|, \tag{10}$$

where $x_\lambda$ is the solution of the Tikhonov regularized problem for some $\lambda \ge 0$.
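For a quick sanity check of (9) and (10), one can compare the closed form with the SVD route on random data. A hedged sketch, reusing tls_svd from the sketch above (the test data are my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

sigma = np.linalg.svd(np.column_stack([A, b]), compute_uv=False)[-1]

# closed form (9): x_TLS = (A^T A - sigma_{n+1}^2 I)^{-1} A^T b
x_tls = np.linalg.solve(A.T @ A - sigma**2 * np.eye(A.shape[1]), A.T @ b)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_tls, tls_svd(A, b)))              # (9) agrees with the SVD route
print(np.linalg.norm(x_tls) >= np.linalg.norm(x_ls))  # inequality (10)
```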

Uniquely solvable case cnt.

The inequalities (10) can be obtained directly with the help of the SVD of $A = U'\Sigma'V'^T$:

$$\Big\|\sum_{i=1}^{n} \frac{\sigma_i'(b^Tu_i')}{\sigma_i'^2 - \sigma_{n+1}^2}\,v_i'\Big\| \ \ge\ \Big\|\sum_{i=1}^{n} \frac{b^Tu_i'}{\sigma_i'}\,v_i'\Big\| \ \ge\ \Big\|\sum_{i=1}^{n} \frac{\sigma_i'(b^Tu_i')}{\sigma_i'^2 + \lambda}\,v_i'\Big\|, \tag{11}$$

since the coefficients satisfy $\sigma_i'^2 - \sigma_{n+1}^2 \le \sigma_i'^2 \le \sigma_i'^2 + \lambda$ term by term.

If $\sigma_{n+1} \ne 0$ the strict inequality $\|x_{TLS}\| > \|x_{LS}\|$ holds; otherwise the system $Ax = b$ is consistent.

Alternative characterization

Let

$$f(x) := \frac{\|Ax - b\|^2}{1 + \|x\|^2}, \quad x \in \mathbb{R}^n. \tag{12}$$

It holds that

$$\sigma_{n+1}^2 = \min_{v \ne 0} \frac{v^T[A, b]^T[A, b]v}{v^Tv} \le f(x) = \frac{[x^T, -1][A, b]^T[A, b][x^T, -1]^T}{1 + \|x\|^2}.$$

If the TLS problem is solvable, then we have $\sigma_{n+1}^2 = f(x_{TLS})$; if the TLS problem is not solvable, then there exists $x \in \mathbb{R}^n$, $\|x\| = 1$, with $\|Ax\| = \sigma_n' = \sigma_{n+1}$, and it follows that

$$\lim_{k \to \infty} f(kx) = \lim_{k \to \infty} \frac{\|kAx - b\|^2}{1 + k^2} = \|Ax\|^2 = \sigma_{n+1}^2.$$

Thus the solution of the TLS problem can be characterized by the minimization problem

$$x_{TLS} = \operatorname*{argmin}_{x \in \mathbb{R}^n} \frac{\|Ax - b\|^2}{1 + \|x\|^2}. \tag{13}$$
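Characterization (13) is easy to test numerically: at $x_{TLS}$ the objective $f$ should equal $\sigma_{n+1}^2$. Continuing the sketch above (A, b, x_tls, sigma as before):

```python
# f(x) = ||Ax - b||^2 / (1 + ||x||^2); at x_TLS it equals sigma_{n+1}^2
f = lambda x: np.linalg.norm(A @ x - b)**2 / (1.0 + x @ x)
print(np.isclose(f(x_tls), sigma**2))
```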

Numerical solution of TLS problems

Newton's method

In the generic case the TLS solution $x = x_{TLS}$ satisfies the system of equations

$$\begin{pmatrix} A^TA & A^Tb \\ b^TA & b^Tb \end{pmatrix}\begin{pmatrix} x \\ -1 \end{pmatrix} = \lambda \begin{pmatrix} x \\ -1 \end{pmatrix}, \tag{14}$$

where $\lambda = \sigma_{n+1}^2$ is the square of the smallest singular value of $[A, b]$. Hence, $(\lambda, (x^T, -1)^T)$ is the eigenpair of the symmetric matrix $[A, b]^T[A, b]$ corresponding to its smallest eigenvalue. A widely used method for computing it is inverse iteration, where it is natural in this case to normalize the eigenvector approximation such that its last component is $-1$.

Inverse iteration is equivalent to Newton's method for the system of $n + 1$ nonlinear equations in $x$ and $\lambda$

$$\begin{pmatrix} f(x, \lambda) \\ g(x, \lambda) \end{pmatrix} := \begin{pmatrix} -A^Tr - \lambda x \\ -b^Tr + \lambda \end{pmatrix} = 0 \tag{15}$$

with $r := b - Ax$.

Newton's method cnt.

The Jacobian matrix of this system is

$$J = \begin{pmatrix} A^TA - \lambda I & -x \\ b^TA & 1 \end{pmatrix} =: \begin{pmatrix} \tilde J & -x \\ b^TA & 1 \end{pmatrix} \tag{16}$$

with $\tilde J = A^TA - \sigma^2 I$, $\lambda = \sigma^2$.

We assume in the following that $\sigma < \sigma_n'$ such that $\tilde J$ is positive definite. Let the current approximation be $(x_k, \lambda^{(k)})$. Then the corrections are obtained from the linear system

$$\begin{pmatrix} \tilde J & -x_k \\ b^TA & 1 \end{pmatrix}\begin{pmatrix} \Delta x_k \\ \Delta\lambda^{(k)} \end{pmatrix} = -\begin{pmatrix} f_k \\ g_k \end{pmatrix} := -\begin{pmatrix} -A^Tr_k - \lambda^{(k)}x_k \\ -b^Tr_k + \lambda^{(k)} \end{pmatrix}. \tag{17}$$

Newton's method cnt.

To solve this system we take advantage of the block LU factorization of $J$:

$$J = \begin{pmatrix} I & 0 \\ z^T & 1 \end{pmatrix}\begin{pmatrix} \tilde J & -x_k \\ 0 & \tau \end{pmatrix} \tag{18}$$

where $\tilde Jz = A^Tb$, $\tau = 1 + z^Tx_k = 1 + b^TA\tilde J^{-1}x_k$.

To improve the numerical stability we note that

$$\tilde Jz = A^Tb = A^Tr_k + \tilde Jx_k + \lambda^{(k)}x_k = \tilde Jx_k - f_k,$$

and hence $z$ can be computed from $z = x_k - \tilde J^{-1}f_k$. By (block) forward and backward substitution we obtain

$$\Delta\lambda^{(k)} = \beta := (z^Tf_k - g_k)/\tau, \qquad \Delta x_k = -\tilde J^{-1}f_k + \beta\,\tilde J^{-1}x_k.$$

Hence, in each Newton step two symmetric linear systems have to be solved:

$$\tilde Jy_k = x_k \quad \text{and} \quad \tilde Jw_k = -f_k \quad \text{with } \tilde J = A^TA - \sigma^2 I.$$

Newton's method cnt.

Require: Initial approximation $x$
1: $r = b - Ax$
2: $\sigma^2 = r^Tr$
3: for $k = 1, 2, \dots$ until $\delta <$ tol do
4:   $f = -A^Tr - \sigma^2 x$
5:   $g = -b^Tr + \sigma^2$
6:   $\delta = \sqrt{f^Tf + g^2}$
7:   solve $(A^TA - \sigma^2 I)[y, w] = [x, -f]$
8:   $z = x + w$
9:   $\beta = (z^Tf - g)/(z^Tx + 1)$
10:  $x = z + \beta y$
11:  $\sigma^2 = \sigma^2 + \beta$
12:  $r = b - Ax$
13: end for

Rayleigh quotient iteration

Newton's method is known to converge quadratically. One can even get cubic convergence if $\sigma^2$ is not updated by $\sigma^2 \leftarrow \sigma^2 + \beta$ but by the Rayleigh quotient

$$\rho = \frac{[x^T, -1][A, b]^T[A, b][x^T, -1]^T}{x^Tx + 1} = \frac{\|Ax - b\|^2}{1 + \|x\|^2} = \frac{r^Tr}{x^Tx + 1}.$$

Then one gets the following Rayleigh quotient iteration:

Require: Initial approximation $x$
1: for $k = 1, 2, \dots$ until $\delta <$ tol do
2:   $r = b - Ax$
3:   $\sigma^2 = r^Tr/(1 + x^Tx)$
4:   $f = -A^Tr - \sigma^2 x$
5:   $g = -b^Tr + \sigma^2$
6:   $\delta = \sqrt{f^Tf + g^2}$
7:   solve $(A^TA - \sigma^2 I)[y, w] = [x, -f]$
8:   $z = x + w$
9:   $\beta = (z^Tf - g)/(z^Tx + 1)$
10:  $x = z + \beta y$
11: end for
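The two listings differ only in how $\sigma^2$ is updated, so a single sketch can cover both. The numpy translation below is mine (the function name and the rayleigh flag are assumptions; line 5 of the listings is read as $g = -b^Tr + \sigma^2$, matching (15)):

```python
import numpy as np

def tls_rqi(A, b, x0, tol=1e-12, maxit=50, rayleigh=True):
    """Newton / Rayleigh quotient iteration for the smallest eigenpair of
    [A, b]^T [A, b], normalized so the last eigenvector component is -1.

    rayleigh=True  : sigma^2 <- r^T r / (1 + x^T x)  (cubic convergence)
    rayleigh=False : sigma^2 <- sigma^2 + beta       (Newton, quadratic)"""
    x = np.asarray(x0, dtype=float).copy()
    n = A.shape[1]
    r = b - A @ x
    s2 = r @ r / (1.0 + x @ x)
    for _ in range(maxit):
        f = -A.T @ r - s2 * x
        g = -b @ r + s2
        if np.sqrt(f @ f + g * g) < tol:
            break
        # two symmetric solves with J~ = A^T A - sigma^2 I (lines 7-10)
        y, w = np.linalg.solve(A.T @ A - s2 * np.eye(n),
                               np.column_stack([x, -f])).T
        z = x + w
        beta = (z @ f - g) / (z @ x + 1.0)
        x = z + beta * y
        r = b - A @ x
        s2 = r @ r / (1.0 + x @ x) if rayleigh else s2 + beta
    return x, s2
```

Started from the LS solution, e.g. `x, s2 = tls_rqi(A, b, x_ls)`, the returned s2 should converge to $\sigma_{n+1}^2$.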

Regularization of TLS problems (RTLSQEP)

Regularized Total Least Squares Problem

If $A$ and $[A, b]$ are ill-conditioned, regularization is necessary. We consider the quadratically constrained total least squares problem.

Let $L \in \mathbb{R}^{k\times n}$, $k \le n$, and $\delta > 0$. Then the quadratically constrained formulation of the Regularized Total Least Squares (RTLS) problem reads:

Find $\Delta A \in \mathbb{R}^{m\times n}$, $\Delta b \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$ such that

$$\|[\Delta A, \Delta b]\|_F^2 = \min! \quad \text{subject to} \quad (A + \Delta A)x = b + \Delta b, \ \|Lx\|^2 \le \delta^2.$$

Using the orthogonal distance, this problem can be rewritten as: Find $x \in \mathbb{R}^n$ such that

$$\frac{\|Ax - b\|^2}{1 + \|x\|^2} = \min! \quad \text{subject to} \quad \|Lx\|^2 \le \delta^2.$$
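A typical choice of $L$ in this setting is a discrete derivative operator (the slides later mention the discrete first and second order derivatives as examples). A minimal sketch of the first-order case:

```python
import numpy as np

def first_derivative(n):
    """Discrete first-order derivative: L is (n-1) x n, so k < n here."""
    L = np.zeros((n - 1, n))
    i = np.arange(n - 1)
    L[i, i], L[i, i + 1] = -1.0, 1.0
    return L
```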

Regularized Total Least Squares Problem cnt.

If $\delta > 0$ is chosen small enough (e.g. $\delta < \|Lx_{TLS}\|$ where $x_{TLS}$ is the solution of the TLS problem), then the constraint $\|Lx\|^2 \le \delta^2$ is active, and the RTLS problem reads: Find $x \in \mathbb{R}^n$ such that

$$\frac{\|Ax - b\|^2}{1 + \|x\|^2} = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2. \tag{1}$$

Problem (1) is equivalent to the quadratic optimization problem

$$\|Ax - b\|^2 - f_*(1 + \|x\|^2) = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2, \tag{2}$$

where

$$f_* = \inf\{f(x) : \|Lx\|^2 = \delta^2\},$$

i.e. $x_*$ is a global minimizer of problem (1) if and only if it is a global minimizer of (2).

This suggests considering the following family of quadratic optimization problems: For fixed $y \in \mathbb{R}^n$ find $x \in \mathbb{R}^n$ such that

$$g(x; y) := \|Ax - b\|^2 - f(y)(1 + \|x\|^2) = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2. \tag{P_y}$$

Regularized Total Least Squares Problem cnt.

Lemma 1
Problem $(P_y)$ admits a global minimizer if and only if

$$f(y) \le \min_{x \in \mathcal{N}(L),\, x \ne 0} \frac{x^TA^TAx}{x^Tx}, \tag{19}$$

where $\mathcal{N}(L)$ denotes the null space of $L$.

If $L$ is square and nonsingular, the feasible region $\mathcal{F} = \{x : \|Lx\|^2 = \delta^2\}$ is a nondegenerate ellipsoid; therefore any quadratic function attains its global minimum on $\mathcal{F}$. On the other hand, $\mathcal{N}(L^TL) = \{0\}$, and any vector $x$ satisfies (19).

If $L^TL$ is singular, any $x$ can be uniquely written as $x = x_1 + x_2$ with $x_1^Tx_2 = 0$, $x_2 \in \mathcal{N}(L^TL)$, $x_1 \in \operatorname{range}(L^TL)$, and the unboundedness of $g(\cdot\,; y)$ can only occur from the contribution of $x_2$.

Hence, $g(\cdot\,; y)$ is bounded below if and only if $x_2^THx_2 \ge 0$ for any $x_2 \in \mathcal{N}(L^TL)$, where $H = A^TA - f(y)I$ is the Hessian of $g(\cdot\,; y)$, from which (19) is readily obtained.

RTLSQEP Method

Lemma 2
Assume that $y$ satisfies the conditions of Lemma 1 and $\|Ly\| = \delta$, and let $z$ be a global minimizer of problem $(P_y)$. Then it holds that

$$f(z) \le f(y).$$

Proof:

$$(1 + \|z\|^2)(f(z) - f(y)) = g(z; y) \le g(y; y) = (1 + \|y\|^2)(f(y) - f(y)) = 0.$$

This monotonicity result suggests the following algorithm:

Require: $x_0$ satisfying the conditions of Lemma 1 and $\|Lx_0\| = \delta$.
for $m = 0, 1, 2, \dots$ until convergence do
  Determine a global minimizer $x_{m+1}$ of
    $g(x; x_m) = \min!$ subject to $\|Lx\|^2 = \delta^2$.
end for

This is exactly the RTLSQEP method (regularized total least squares via quadratic eigenvalue problems) introduced by Sima, Van Huffel, and Golub (2004), but motivated in a different way.

RTLSQEP cnt.

From Lemma 2 it follows that

$$0 \le f(x_{m+1}) \le f(x_m),$$

i.e. $\{f(x_m)\}$ is monotonically decreasing and bounded below.

Problem $(P_{x_m})$ can be solved via the first order necessary optimality conditions

$$(A^TA - f(x_m)I)x + \lambda L^TLx = A^Tb, \qquad \|Lx\|^2 = \delta^2.$$

Although $g(\cdot\,; x_m)$ in general is not convex, these conditions are even sufficient if the Lagrange parameter is chosen maximal.

RTLSQEP cnt.

Theorem 1
Assume that $(\hat\lambda, \hat x)$ solves the first order conditions

$$(A^TA - f(y)I)x + \lambda L^TLx = A^Tb, \qquad \|Lx\|^2 = \delta^2. \tag{20}$$

If $\|Ly\| = \delta$ and $\hat\lambda$ is the maximal Lagrange multiplier, then $\hat x$ is a global minimizer of problem $(P_y)$.

Proof
The statement follows immediately from the following equation, which can be shown similarly as in Gander (1981): If $(\lambda_j, z_j)$, $j = 1, 2$, are solutions of (20), then it holds that

$$g(z_2; y) - g(z_1; y) = \frac{1}{2}(\lambda_1 - \lambda_2)\|L(z_1 - z_2)\|^2.$$

A quadratic eigenproblem

The first order conditions

$$(A^TA - f(y)I)x + \lambda L^TLx = A^Tb, \qquad \|Lx\|^2 = \delta^2 \tag{*}$$

can be solved via a quadratic eigenvalue problem.

If $L$ is square and nonsingular, let $z := Lx$. Then it holds that

$$Wz + \lambda z := L^{-T}(A^TA - f(y)I)L^{-1}z + \lambda z = L^{-T}A^Tb =: h, \qquad z^Tz = \delta^2.$$

With

$$u := (W + \lambda I)^{-2}h \ \Rightarrow\ h^Tu = z^Tz = \delta^2 \ \Rightarrow\ h = \delta^{-2}hh^Tu.$$

Hence,

$$(W + \lambda I)^2u - \delta^{-2}hh^Tu = 0.$$

If $\hat\lambda$ is the right-most real eigenvalue and the corresponding eigenvector is scaled such that $h^Tu = \delta^2$, then the solution of problem $(*)$ is recovered as $x = L^{-1}(W + \hat\lambda I)u$.
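For square nonsingular $L$ the whole RTLSQEP loop can be prototyped with a dense linearization of the quadratic eigenproblem: substituting $v = (W + \lambda I)u$ turns $(W + \lambda I)^2u = \delta^{-2}hh^Tu$ into the linear eigenproblem $\lambda\,[u; v] = [[-W, I], [\delta^{-2}hh^T, -W]]\,[u; v]$. The sketch below is mine, not the authors' reference code (dense, no attempt at efficiency; the real-eigenvalue filter and tolerances are assumptions):

```python
import numpy as np

def qep_rightmost(W, h, delta):
    """Right-most real eigenvalue and u-part of (W + lam I)^2 u = delta^-2 h h^T u,
    via the linearization lam*[u; v] = [[-W, I], [delta^-2 h h^T, -W]] [u; v]."""
    k = W.shape[0]
    M = np.block([[-W, np.eye(k)],
                  [np.outer(h, h) / delta**2, -W]])
    vals, vecs = np.linalg.eig(M)
    real = np.abs(vals.imag) < 1e-8 * (1.0 + np.abs(vals.real))
    i = np.argmax(np.where(real, vals.real, -np.inf))
    lam = vals[i].real
    u = vecs[:k, i].real
    return lam, u * (delta**2 / (h @ u))         # scale so h^T u = delta^2

def rtlsqep(A, b, L, delta, x0, tol=1e-10, maxit=100):
    """RTLSQEP iteration for square nonsingular L (a sketch following the slides)."""
    k = L.shape[0]
    Linv = np.linalg.inv(L)
    B = Linv.T @ (A.T @ A) @ Linv                # L^-T A^T A L^-1
    E = Linv.T @ Linv                            # L^-T L^-1
    h = Linv.T @ (A.T @ b)
    x = x0 * (delta / np.linalg.norm(L @ x0))    # enforce ||L x0|| = delta
    f = lambda x: np.linalg.norm(A @ x - b)**2 / (1.0 + x @ x)
    for _ in range(maxit):
        W = B - f(x) * E                         # W = L^-T (A^T A - f(x_m) I) L^-1
        lam, u = qep_rightmost(W, h, delta)
        x_new = Linv @ ((W + lam * np.eye(k)) @ u)   # x = L^-1 (W + lam I) u
        if np.linalg.norm(x_new - x) <= tol * (1.0 + np.linalg.norm(x_new)):
            return x_new
        x = x_new
    return x
```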

A quadratic eigenproblem cnt.

Assume that $k < n$ and that $L$ has linearly independent rows.

Let $L^TL = USU^T$ be the spectral decomposition of $L^TL$. Then the first order condition is equivalent to

$$\big((AU)^T(AU) - f(y)I\big)z + \lambda Sz = (AU)^Tb, \qquad z^TSz = \delta^2$$

with $z = U^Tx$.

Partitioning the matrices and vectors in block form

$$(AU)^T(AU) = \begin{pmatrix} T_1 & T_2 \\ T_2^T & T_4 \end{pmatrix}, \quad S = \begin{pmatrix} S_1 & 0 \\ 0 & 0 \end{pmatrix}, \quad (AU)^Tb = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}, \quad z = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix},$$

where the leading blocks have dimension $k$, one gets

$$\begin{pmatrix} T_1 - f(y)I_k & T_2 \\ T_2^T & T_4 - f(y)I_{n-k} \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} + \lambda\begin{pmatrix} S_1z_1 \\ 0 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.$$

A quadratic eigenproblem cnt.

Solving the second component for $z_2$,

$$z_2 = (T_4 - f(y)I_{n-k})^{-1}(c_2 - T_2^Tz_1),$$

and substituting into the first component, one gets

$$\big(T_1 - f(y)I_k - T_2(T_4 - f(y)I_{n-k})^{-1}T_2^T\big)z_1 + \lambda S_1z_1 = c_1 - T_2(T_4 - f(y)I_{n-k})^{-1}c_2.$$

Hence, one gets the quadratic eigenvalue problem

$$(W + \lambda I)^2u - \delta^{-2}hh^Tu = 0,$$

with

$$W = S_1^{-1/2}\big(T_1 - f(y)I_k - T_2(T_4 - f(y)I_{n-k})^{-1}T_2^T\big)S_1^{-1/2}, \qquad h = S_1^{-1/2}\big(c_1 - T_2(T_4 - f(y)I_{n-k})^{-1}c_2\big).$$
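The reduction above is mechanical to set up. A sketch of the bookkeeping (a helper of my own, following the block notation of the slides; the eigenvalues of $L^TL$ are sorted so that the $k$ positive ones come first, and full row rank of $L$ is assumed):

```python
import numpy as np

def qep_data(A, b, L, fy):
    """Assemble W and h of the reduced QEP for full-row-rank L with k < n."""
    k, n = L.shape
    s, U = np.linalg.eigh(L.T @ L)           # L^T L = U S U^T, ascending order
    order = np.argsort(s)[::-1]              # nonzero eigenvalues first
    s, U = s[order], U[:, order]
    AU = A @ U
    T, c = AU.T @ AU, AU.T @ b
    T1, T2, T4 = T[:k, :k], T[:k, k:], T[k:, k:]
    c1, c2 = c[:k], c[k:]
    G = np.linalg.solve(T4 - fy * np.eye(n - k),
                        np.column_stack([T2.T, c2]))   # (T4 - f I)^-1 [T2^T, c2]
    S1ih = np.diag(s[:k] ** -0.5)            # S_1^{-1/2}
    W = S1ih @ (T1 - fy * np.eye(k) - T2 @ G[:, :k]) @ S1ih
    h = S1ih @ (c1 - T2 @ G[:, -1])
    return W, h, U, S1ih, T2, T4, c2
```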

A quadratic eigenproblem cnt.

If $(\lambda, u)$ is the eigenpair corresponding to the right-most eigenvalue, $u$ is normalized such that $u^Th = \delta^2$, and $w = (W + \lambda I)u$, then the solution of the first order conditions is recovered by $x = Uz$ where

$$z = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} S_1^{-1/2}w \\ (T_4 - f(y)I)^{-1}(c_2 - T_2^TS_1^{-1/2}w) \end{pmatrix}.$$

For $k < n$ the quadratic eigenproblem looks very complicated. Notice, however, that $n - k$ usually is very small ($n - k = 1$ and $n - k = 2$ for the discrete first and second order derivative), and quite often fast decomposition techniques can be used.

Moreover, $L$ with $k < n$ can often be replaced by some nonsingular approximation.

Convergence of RTLSQEP

Theorem
Any limit point $x_*$ of the sequence $\{x_m\}$ constructed by RTLSQEP is a global minimizer of the optimization problem

$$f(x) = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2.$$

Proof: Let $x_*$ be a limit point of $\{x_m\}$, and let $\{x_{m_j}\}$ be a subsequence converging to $x_*$. Then $x_{m_j}$ solves the first order conditions

$$(A^TA - f(x_{m_j-1})I)x_{m_j} + \lambda_{m_j}L^TLx_{m_j} = A^Tb.$$

By the monotonicity of $f(x_m)$ it follows that $f(x_{m_j-1})$ converges to $f(x_*)$.

Since $W(y)$ and $h(y)$ depend continuously on $f(y)$, the sequence of right-most eigenvalues $\{\lambda_{m_j}\}$ converges to some $\lambda_*$, and $x_*$ satisfies

$$(A^TA - f(x_*)I)x_* + \lambda_*L^TLx_* = A^Tb, \qquad \|Lx_*\|^2 = \delta^2,$$

where $\lambda_*$ is the maximal Lagrange multiplier.

Hence, $x_*$ is a global minimizer of

$$g(x; x_*) = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2,$$

and for $y \in \mathbb{R}^n$ with $\|Ly\|^2 = \delta^2$ it follows that

$$0 = g(x_*; x_*) \le g(y; x_*) = \|Ay - b\|^2 - f(x_*)(1 + \|y\|^2) = (f(y) - f(x_*))(1 + \|y\|^2),$$

i.e. $f(y) \ge f(x_*)$.


Nonlinear maxmin characterization

Let $T(\lambda) \in \mathbb{C}^{n\times n}$, $T(\lambda) = T(\lambda)^H$, $\lambda \in J \subset \mathbb{R}$, where $J$ is an open interval (maybe unbounded).

For every fixed $x \in \mathbb{C}^n$, $x \ne 0$, assume that the real function $f(\cdot\,; x): J \to \mathbb{R}$, $f(\lambda; x) := x^HT(\lambda)x$, is continuous, and that the real equation $f(\lambda; x) = 0$ has at most one solution $\lambda =: p(x)$ in $J$.

Then the equation $f(\lambda; x) = 0$ implicitly defines a functional $p$ on some subset $D$ of $\mathbb{C}^n$, which we call the Rayleigh functional.

Assume further that

$$(\lambda - p(x))f(\lambda; x) > 0 \quad \text{for every } x \in D,\ \lambda \ne p(x).$$

maxmin characterization (V., Werner 1982)

Let $\sup_{v \in D} p(v) \in J$ and assume that there exists a subspace $W \subset \mathbb{C}^n$ of dimension $\ell$ such that

$$W \cap D \ne \emptyset \quad \text{and} \quad \inf_{v \in W \cap D} p(v) \in J.$$

Then $T(\lambda)x = 0$ has at least $\ell$ eigenvalues in $J$, and for $j = 1, \dots, \ell$ the $j$-largest eigenvalue $\lambda_j$ can be characterized by

$$\lambda_j = \max_{\dim V = j,\ V \cap D \ne \emptyset}\ \inf_{v \in V \cap D} p(v). \tag{1}$$

For $j = 1, \dots, \ell$ every $j$-dimensional subspace $V \subset \mathbb{C}^n$ with

$$V \cap D \ne \emptyset \quad \text{and} \quad \lambda_j = \inf_{v \in V \cap D} p(v)$$

is contained in $D \cup \{0\}$, and the maxmin characterization of $\lambda_j$ can be replaced by

$$\lambda_j = \max_{\dim V = j,\ V\setminus\{0\} \subset D}\ \min_{v \in V\setminus\{0\}} p(v).$$

Back to the quadratic eigenproblem

For

$$T(\lambda)x := \big((W + \lambda I)^2 - \delta^{-2}hh^T\big)x = 0, \tag{QEP}$$

$$f(\lambda; x) = x^HT(\lambda)x = \lambda^2\|x\|^2 + 2\lambda\, x^HWx + \|Wx\|^2 - |x^Hh|^2/\delta^2, \quad x \ne 0,$$

is a parabola which attains its minimum at

$$\lambda = -\frac{x^HWx}{x^Hx}.$$

Let $J = (-\lambda_{\min}, \infty)$ where $\lambda_{\min}$ is the minimum eigenvalue of $W$. Then $f(\lambda; x) = 0$ has at most one solution $p(x) \in J$ for every $x \ne 0$. Hence, the Rayleigh functional $p$ of (QEP) corresponding to $J$ is defined, and the general conditions are satisfied.

Back to the quadratic eigenproblem cnt.

Let $x_{\min}$ be an eigenvector of $W$ corresponding to $\lambda_{\min}$. Then

$$f(-\lambda_{\min}; x_{\min}) = x_{\min}^H(W - \lambda_{\min}I)^2x_{\min} - |x_{\min}^Hh|^2/\delta^2 = -|x_{\min}^Hh|^2/\delta^2 \le 0.$$

Hence, if $x_{\min}^Hh \ne 0$ then $x_{\min} \in D$.

If $x_{\min}^Hh = 0$ and the minimum eigenvalue $\mu_{\min}$ of $T(-\lambda_{\min})$ is negative, then for the corresponding eigenvector $y_{\min}$ it holds that

$$f(-\lambda_{\min}; y_{\min}) = y_{\min}^HT(-\lambda_{\min})y_{\min} = \mu_{\min}\|y_{\min}\|^2 < 0,$$

and $y_{\min} \in D$.

If $x_{\min}^Hh = 0$ and $T(-\lambda_{\min})$ is positive semi-definite, then

$$f(-\lambda_{\min}; x) = x^HT(-\lambda_{\min})x \ge 0 \quad \text{for every } x \ne 0, \ \text{and} \ D = \emptyset.$$
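These membership tests are straightforward to experiment with numerically. A sketch of $p(x)$ for (QEP), again my own: it solves the parabola $f(\lambda; x) = 0$ in closed form and returns the root in $J = (-\lambda_{\min}, \infty)$, or None when $x \notin D$.

```python
import numpy as np

def rayleigh_functional(W, h, delta, x):
    """p(x) for T(lam) = (W + lam I)^2 - delta^-2 h h^T, on J = (-lam_min, inf)."""
    a = x @ x                                   # coefficient of lam^2
    bq = 2.0 * (x @ W @ x)                      # coefficient of lam
    c = (W @ x) @ (W @ x) - (x @ h)**2 / delta**2
    disc = bq * bq - 4.0 * a * c
    if disc < 0:
        return None                             # no real root: x not in D
    root = (-bq + np.sqrt(disc)) / (2.0 * a)    # larger root of the parabola
    lam_min = np.linalg.eigvalsh(W)[0]
    return root if root > -lam_min else None    # the root must lie in J
```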
