
Regularization of Total Least Squares Problems

Heinrich Voss

voss@tu-harburg.de

Hamburg University of Technology, Institute of Numerical Simulation


Outline

1 Total Least Squares Problems

2 Numerical solution of TLS problems

3 Regularization of TLS problems (RTLSQEP)

4 Nonlinear maxmin characterization

5 RTLSQEP: Numerical considerations

6 Numerical example

7 Regularization of TLS problems (RTLSLEP)

8 Determining the regularization parameter

Total Least Squares Problems

The ordinary Least Squares (LS) method assumes that the system matrix A of a linear model is error free and that all errors are confined to the right-hand side b. However, in engineering applications this assumption is often unrealistic. Many problems in data estimation lead to linear systems where both the matrix A and the right-hand side b are contaminated by noise, for example if A itself is only available through measurements or if A is an idealized approximation of the true operator.

If the true values of the observed variables satisfy linear relations, and if the errors in the observations are independent random variables with zero mean and equal variance, then the total least squares (TLS) approach often gives better estimates than LS.

Given $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $m \ge n$: find $\Delta A \in \mathbb{R}^{m\times n}$, $\Delta b \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$ such that

$$\|[\Delta A, \Delta b]\|_F^2 = \min! \quad \text{subject to} \quad (A + \Delta A)x = b + \Delta b, \tag{1}$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

Total Least Squares Problems cnt.

Although the name "total least squares" was introduced only recently in the literature by Golub and Van Loan (1980), this fitting method is not new and has a long history in the statistical literature, where it is known as orthogonal regression, errors-in-variables, or measurement errors, and in image deblurring as blind deconvolution.

The univariate problem (n = 1) was already discussed by Adcock (1877), and it was rediscovered many times, often independently.

About 30 to 40 years ago, the technique was extended by Sprent (1969) and Gleser (1981) to the multivariate case (n > 1).

More recently, the total least squares method has also stimulated interest outside statistics. In numerical linear algebra it was first studied by Golub and Van Loan (1980). Their analysis and their algorithm are based on the singular value decomposition.

TLS analysis

If $b + \Delta b$ is in the range of $A + \Delta A$, then there is a vector $x \in \mathbb{R}^n$ such that $(A + \Delta A)x = b + \Delta b$, i.e.

$$([A, b] + [\Delta A, \Delta b]) \begin{pmatrix} x \\ -1 \end{pmatrix} = 0. \tag{2}$$

Let the singular value decomposition of $C := [A, b]$ be given by

$$[A, b] = U\Sigma V^T \tag{3}$$

with $U \in \mathbb{R}^{m\times(n+1)}$, $V \in \mathbb{R}^{(n+1)\times(n+1)}$ having orthonormal columns and $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_{n+1})$, where

$$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k > \sigma_{k+1} = \dots = \sigma_{n+1}.$$

Then

$$\sigma_{n+1} = \min\{\|[\Delta A, \Delta b]\|_F : \operatorname{rank}([A + \Delta A, b + \Delta b]) < n + 1\} \tag{4}$$

and the minimum is attained by $[\Delta A, \Delta b] = -[A, b]vv^T$, where $v$ is any unit vector in the right singular space

$$\mathcal{S} := \operatorname{span}\{v_{k+1}, \dots, v_{n+1}\}. \tag{5}$$

TLS analysis cnt.

Denote by $\mathcal{S}$ the set of right singular vectors corresponding to $\sigma_{n+1}$. Suppose that there exists a vector $v \in \mathcal{S}$ having the form

$$v = \begin{pmatrix} y \\ \alpha \end{pmatrix}, \quad y \in \mathbb{R}^n, \ \alpha \ne 0.$$

If

$$x := -\frac{1}{\alpha}\, y \quad \text{and} \quad [\Delta A, \Delta b] = -[A, b]vv^T, \tag{6}$$

then

$$([A, b] + [\Delta A, \Delta b]) \begin{pmatrix} x \\ -1 \end{pmatrix} = [A, b](I - vv^T)(-v/\alpha) = 0,$$

and $x$ solves the TLS problem.

If $e_{n+1} = (0, \dots, 0, 1)^T$ is orthogonal to $\mathcal{S}$, then the TLS problem has no solution.
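The generic-case construction above translates directly into a few lines of numerical code. The following is a minimal numpy sketch (the function name and the error handling are mine, not from the slides): it forms $[A, b]$, takes the right singular vector belonging to $\sigma_{n+1}$, and returns $x = -y/\alpha$.

```python
import numpy as np

def tls_svd(A, b):
    """Generic-case TLS solution via the SVD of [A, b].

    Assumes sigma_{n+1} is simple and the last component of the
    corresponding right singular vector is nonzero."""
    m, n = A.shape
    _, _, Vt = np.linalg.svd(np.column_stack([A, b]))
    v = Vt[-1]                  # right singular vector for sigma_{n+1}
    y, alpha = v[:n], v[n]
    if abs(alpha) < 1e-14:
        raise ValueError("e_{n+1} is (numerically) orthogonal to S: no TLS solution")
    return -y / alpha           # x = -y/alpha
```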

TLS analysis cnt.

If $\sigma_{n+1}$ is a repeated singular value of $[A, b]$, then the TLS problem may lack a unique solution. However, in this case it is possible to single out a unique minimum norm TLS solution.

Let $Q$ be an orthogonal matrix of order $n - k + 1$ with the property that

$$[v_{k+1}, \dots, v_{n+1}]\,Q = \begin{pmatrix} W & y \\ 0 & \alpha \end{pmatrix}, \quad y \in \mathbb{R}^n, \tag{7}$$

then $x_{TLS} := -y/\alpha$ is the minimum norm TLS solution.

A unique solution can only occur if $k = n$, i.e. $\sigma_{n+1}$ is simple, and in case $k = n$ there exists a unique solution if the last component of $v_{n+1}$ is different from 0. Obviously, if $\sigma_n'$ is the smallest singular value of $A$ and $\sigma_{n+1} < \sigma_n'$, then the TLS solution is unique, and for $\sigma_{n+1} = \sigma_n'$ it is not.

Uniquely solvable case

Theorem. The TLS problem has a unique solution if and only if

$$\sigma_{n+1} = \sigma_{\min}([A, b]) < \sigma_{\min}(A) = \sigma_n'. \tag{8}$$

If the TLS problem is uniquely solvable, then a closed form solution is

$$x_{TLS} = (A^TA - \sigma_{n+1}^2 I)^{-1}A^Tb, \tag{9}$$

which looks similar to the normal equations for LS problems.

Comparing (9) to the normal equations for RLS problems with Tikhonov regularization, we observe some kind of deregularization in the TLS case. This deregularization results in a larger norm of the solution, i.e. the following relation holds:

$$\|x_{TLS}\| \ge \|x_{LS}\| \ge \|x_\lambda\|, \tag{10}$$

where $x_\lambda$ is the solution of the Tikhonov regularized problem for some $\lambda \ge 0$.
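For a quick sanity check of (9) and (10), one can compare the closed form with the SVD route on random data. A hedged sketch, reusing tls_svd from the sketch above (the test data are my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

sigma = np.linalg.svd(np.column_stack([A, b]), compute_uv=False)[-1]

# closed form (9): x_TLS = (A^T A - sigma_{n+1}^2 I)^{-1} A^T b
x_tls = np.linalg.solve(A.T @ A - sigma**2 * np.eye(A.shape[1]), A.T @ b)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_tls, tls_svd(A, b)))              # (9) agrees with the SVD route
print(np.linalg.norm(x_tls) >= np.linalg.norm(x_ls))  # inequality (10)
```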

Uniquely solvable case cnt.

The inequalities (10) can be obtained directly with the help of the SVD of $A = U'\Sigma'V'^T$:

$$\Big\|\sum_{i=1}^{n} \frac{\sigma_i'(b^Tu_i')}{\sigma_i'^2 - \sigma_{n+1}^2}\,v_i'\Big\| \ \ge\ \Big\|\sum_{i=1}^{n} \frac{b^Tu_i'}{\sigma_i'}\,v_i'\Big\| \ \ge\ \Big\|\sum_{i=1}^{n} \frac{\sigma_i'(b^Tu_i')}{\sigma_i'^2 + \lambda}\,v_i'\Big\|, \tag{11}$$

since the coefficients satisfy $\sigma_i'^2 - \sigma_{n+1}^2 \le \sigma_i'^2 \le \sigma_i'^2 + \lambda$ term by term.

If $\sigma_{n+1} \ne 0$ the strict inequality $\|x_{TLS}\| > \|x_{LS}\|$ holds; otherwise the system $Ax = b$ is consistent.

Alternative characterization

Let

$$f(x) := \frac{\|Ax - b\|^2}{1 + \|x\|^2}, \quad x \in \mathbb{R}^n. \tag{12}$$

It holds that

$$\sigma_{n+1}^2 = \min_{v \ne 0} \frac{v^T[A, b]^T[A, b]v}{v^Tv} \le f(x) = \frac{[x^T, -1][A, b]^T[A, b][x^T, -1]^T}{1 + \|x\|^2}.$$

If the TLS problem is solvable, then we have $\sigma_{n+1}^2 = f(x_{TLS})$; if the TLS problem is not solvable, then there exists $x \in \mathbb{R}^n$, $\|x\| = 1$, with $\|Ax\| = \sigma_n' = \sigma_{n+1}$, and it follows that

$$\lim_{k \to \infty} f(kx) = \lim_{k \to \infty} \frac{\|kAx - b\|^2}{1 + k^2} = \|Ax\|^2 = \sigma_{n+1}^2.$$

Thus the solution of the TLS problem can be characterized by the minimization problem

$$x_{TLS} = \operatorname*{argmin}_{x \in \mathbb{R}^n} \frac{\|Ax - b\|^2}{1 + \|x\|^2}. \tag{13}$$
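Characterization (13) is easy to test numerically: at $x_{TLS}$ the objective $f$ should equal $\sigma_{n+1}^2$. Continuing the sketch above (A, b, x_tls, sigma as before):

```python
# f(x) = ||Ax - b||^2 / (1 + ||x||^2); at x_TLS it equals sigma_{n+1}^2
f = lambda x: np.linalg.norm(A @ x - b)**2 / (1.0 + x @ x)
print(np.isclose(f(x_tls), sigma**2))
```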

Numerical solution of TLS problems

Newton's method

In the generic case the TLS solution $x = x_{TLS}$ satisfies the system of equations

$$\begin{pmatrix} A^TA & A^Tb \\ b^TA & b^Tb \end{pmatrix}\begin{pmatrix} x \\ -1 \end{pmatrix} = \lambda \begin{pmatrix} x \\ -1 \end{pmatrix}, \tag{14}$$

where $\lambda = \sigma_{n+1}^2$ is the square of the smallest singular value of $[A, b]$. Hence, $(\lambda, (x^T, -1)^T)$ is the eigenpair of the symmetric matrix $[A, b]^T[A, b]$ corresponding to its smallest eigenvalue. A widely used method for computing it is inverse iteration, where it is natural in this case to normalize the eigenvector approximation such that its last component is $-1$.

Inverse iteration is equivalent to Newton's method for the system of $n + 1$ nonlinear equations in $x$ and $\lambda$

$$\begin{pmatrix} f(x, \lambda) \\ g(x, \lambda) \end{pmatrix} := \begin{pmatrix} -A^Tr - \lambda x \\ -b^Tr + \lambda \end{pmatrix} = 0 \tag{15}$$

with $r := b - Ax$.

Newton's method cnt.

The Jacobian matrix of this system is

$$J = \begin{pmatrix} A^TA - \lambda I & -x \\ b^TA & 1 \end{pmatrix} =: \begin{pmatrix} \tilde J & -x \\ b^TA & 1 \end{pmatrix} \tag{16}$$

with $\tilde J = A^TA - \sigma^2 I$, $\lambda = \sigma^2$.

We assume in the following that $\sigma < \sigma_n'$ such that $\tilde J$ is positive definite. Let the current approximation be $(x_k, \lambda^{(k)})$. Then the corrections are obtained from the linear system

$$\begin{pmatrix} \tilde J & -x_k \\ b^TA & 1 \end{pmatrix}\begin{pmatrix} \Delta x_k \\ \Delta\lambda^{(k)} \end{pmatrix} = -\begin{pmatrix} f_k \\ g_k \end{pmatrix} := -\begin{pmatrix} -A^Tr_k - \lambda^{(k)}x_k \\ -b^Tr_k + \lambda^{(k)} \end{pmatrix}. \tag{17}$$

Newton's method cnt.

To solve this system we take advantage of the block LU factorization of $J$:

$$J = \begin{pmatrix} I & 0 \\ z^T & 1 \end{pmatrix}\begin{pmatrix} \tilde J & -x_k \\ 0 & \tau \end{pmatrix} \tag{18}$$

where $\tilde Jz = A^Tb$, $\tau = 1 + z^Tx_k = 1 + b^TA\tilde J^{-1}x_k$.

To improve the numerical stability we note that

$$\tilde Jz = A^Tb = A^Tr_k + \tilde Jx_k + \lambda^{(k)}x_k = \tilde Jx_k - f_k,$$

and hence $z$ can be computed from $z = x_k - \tilde J^{-1}f_k$. By (block) forward and backward substitution we obtain

$$\Delta\lambda^{(k)} = \beta := (z^Tf_k - g_k)/\tau, \qquad \Delta x_k = -\tilde J^{-1}f_k + \beta\,\tilde J^{-1}x_k.$$

Hence, in each Newton step two symmetric linear systems have to be solved:

$$\tilde Jy_k = x_k \quad \text{and} \quad \tilde Jw_k = -f_k \quad \text{with } \tilde J = A^TA - \sigma^2 I.$$

Newton's method cnt.

Require: Initial approximation $x$
1: $r = b - Ax$
2: $\sigma^2 = r^Tr$
3: for $k = 1, 2, \dots$ until $\delta <$ tol do
4:   $f = -A^Tr - \sigma^2 x$
5:   $g = -b^Tr + \sigma^2$
6:   $\delta = \sqrt{f^Tf + g^2}$
7:   solve $(A^TA - \sigma^2 I)[y, w] = [x, -f]$
8:   $z = x + w$
9:   $\beta = (z^Tf - g)/(z^Tx + 1)$
10:  $x = z + \beta y$
11:  $\sigma^2 = \sigma^2 + \beta$
12:  $r = b - Ax$
13: end for

Rayleigh quotient iteration

Newton's method is known to converge quadratically. One can even get cubic convergence if $\sigma^2$ is not updated by $\sigma^2 \leftarrow \sigma^2 + \beta$ but by the Rayleigh quotient

$$\rho = \frac{[x^T, -1][A, b]^T[A, b][x^T, -1]^T}{x^Tx + 1} = \frac{\|Ax - b\|^2}{1 + \|x\|^2} = \frac{r^Tr}{x^Tx + 1}.$$

Then one gets the following Rayleigh quotient iteration:

Require: Initial approximation $x$
1: for $k = 1, 2, \dots$ until $\delta <$ tol do
2:   $r = b - Ax$
3:   $\sigma^2 = r^Tr/(1 + x^Tx)$
4:   $f = -A^Tr - \sigma^2 x$
5:   $g = -b^Tr + \sigma^2$
6:   $\delta = \sqrt{f^Tf + g^2}$
7:   solve $(A^TA - \sigma^2 I)[y, w] = [x, -f]$
8:   $z = x + w$
9:   $\beta = (z^Tf - g)/(z^Tx + 1)$
10:  $x = z + \beta y$
11: end for
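The two listings differ only in how $\sigma^2$ is updated, so a single sketch can cover both. The numpy translation below is mine (the function name and the rayleigh flag are assumptions; line 5 of the listings is read as $g = -b^Tr + \sigma^2$, matching (15)):

```python
import numpy as np

def tls_rqi(A, b, x0, tol=1e-12, maxit=50, rayleigh=True):
    """Newton / Rayleigh quotient iteration for the smallest eigenpair of
    [A, b]^T [A, b], normalized so the last eigenvector component is -1.

    rayleigh=True  : sigma^2 <- r^T r / (1 + x^T x)  (cubic convergence)
    rayleigh=False : sigma^2 <- sigma^2 + beta       (Newton, quadratic)"""
    x = np.asarray(x0, dtype=float).copy()
    n = A.shape[1]
    r = b - A @ x
    s2 = r @ r / (1.0 + x @ x)
    for _ in range(maxit):
        f = -A.T @ r - s2 * x
        g = -b @ r + s2
        if np.sqrt(f @ f + g * g) < tol:
            break
        # two symmetric solves with J~ = A^T A - sigma^2 I (lines 7-10)
        y, w = np.linalg.solve(A.T @ A - s2 * np.eye(n),
                               np.column_stack([x, -f])).T
        z = x + w
        beta = (z @ f - g) / (z @ x + 1.0)
        x = z + beta * y
        r = b - A @ x
        s2 = r @ r / (1.0 + x @ x) if rayleigh else s2 + beta
    return x, s2
```

Started from the LS solution, e.g. `x, s2 = tls_rqi(A, b, x_ls)`, the returned s2 should converge to $\sigma_{n+1}^2$.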

Regularization of TLS problems (RTLSQEP)

Regularized Total Least Squares Problem

If $A$ and $[A, b]$ are ill-conditioned, regularization is necessary. We consider the quadratically constrained total least squares problem.

Let $L \in \mathbb{R}^{k\times n}$, $k \le n$, and $\delta > 0$. Then the quadratically constrained formulation of the Regularized Total Least Squares (RTLS) problem reads:

Find $\Delta A \in \mathbb{R}^{m\times n}$, $\Delta b \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$ such that

$$\|[\Delta A, \Delta b]\|_F^2 = \min! \quad \text{subject to} \quad (A + \Delta A)x = b + \Delta b, \ \|Lx\|^2 \le \delta^2.$$

Using the orthogonal distance, this problem can be rewritten as: Find $x \in \mathbb{R}^n$ such that

$$\frac{\|Ax - b\|^2}{1 + \|x\|^2} = \min! \quad \text{subject to} \quad \|Lx\|^2 \le \delta^2.$$
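A typical choice of $L$ in this setting is a discrete derivative operator (the slides later mention the discrete first and second order derivatives as examples). A minimal sketch of the first-order case:

```python
import numpy as np

def first_derivative(n):
    """Discrete first-order derivative: L is (n-1) x n, so k < n here."""
    L = np.zeros((n - 1, n))
    i = np.arange(n - 1)
    L[i, i], L[i, i + 1] = -1.0, 1.0
    return L
```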

Regularized Total Least Squares Problem cnt.

If $\delta > 0$ is chosen small enough (e.g. $\delta < \|Lx_{TLS}\|$ where $x_{TLS}$ is the solution of the TLS problem), then the constraint $\|Lx\|^2 \le \delta^2$ is active, and the RTLS problem reads: Find $x \in \mathbb{R}^n$ such that

$$\frac{\|Ax - b\|^2}{1 + \|x\|^2} = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2. \tag{1}$$

Problem (1) is equivalent to the quadratic optimization problem

$$\|Ax - b\|^2 - f_*(1 + \|x\|^2) = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2, \tag{2}$$

where

$$f_* = \inf\{f(x) : \|Lx\|^2 = \delta^2\},$$

i.e. $x_*$ is a global minimizer of problem (1) if and only if it is a global minimizer of (2).

This suggests considering the following family of quadratic optimization problems: For fixed $y \in \mathbb{R}^n$ find $x \in \mathbb{R}^n$ such that

$$g(x; y) := \|Ax - b\|^2 - f(y)(1 + \|x\|^2) = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2. \tag{P_y}$$

Regularized Total Least Squares Problem cnt.

Lemma 1
Problem $(P_y)$ admits a global minimizer if and only if

$$f(y) \le \min_{x \in \mathcal{N}(L),\, x \ne 0} \frac{x^TA^TAx}{x^Tx}, \tag{19}$$

where $\mathcal{N}(L)$ denotes the null space of $L$.

If $L$ is square and nonsingular, the feasible region $\mathcal{F} = \{x : \|Lx\|^2 = \delta^2\}$ is a nondegenerate ellipsoid; therefore any quadratic function attains its global minimum on $\mathcal{F}$. On the other hand, $\mathcal{N}(L^TL) = \{0\}$, and any vector $x$ satisfies (19).

If $L^TL$ is singular, any $x$ can be uniquely written as $x = x_1 + x_2$ with $x_1^Tx_2 = 0$, $x_2 \in \mathcal{N}(L^TL)$, $x_1 \in \operatorname{range}(L^TL)$, and the unboundedness of $g(\cdot\,; y)$ can only occur from the contribution of $x_2$.

Hence, $g(\cdot\,; y)$ is bounded below if and only if $x_2^THx_2 \ge 0$ for any $x_2 \in \mathcal{N}(L^TL)$, where $H = A^TA - f(y)I$ is the Hessian of $g(\cdot\,; y)$, from which (19) is readily obtained.

RTLSQEP Method

Lemma 2
Assume that $y$ satisfies the conditions of Lemma 1 and $\|Ly\| = \delta$, and let $z$ be a global minimizer of problem $(P_y)$. Then it holds that

$$f(z) \le f(y).$$

Proof:

$$(1 + \|z\|^2)(f(z) - f(y)) = g(z; y) \le g(y; y) = (1 + \|y\|^2)(f(y) - f(y)) = 0.$$

This monotonicity result suggests the following algorithm:

Require: $x_0$ satisfying the conditions of Lemma 1 and $\|Lx_0\| = \delta$.
for $m = 0, 1, 2, \dots$ until convergence do
  Determine a global minimizer $x_{m+1}$ of
    $g(x; x_m) = \min!$ subject to $\|Lx\|^2 = \delta^2$.
end for

This is exactly the RTLSQEP method (regularized total least squares via quadratic eigenvalue problems) introduced by Sima, Van Huffel, and Golub (2004), but motivated in a different way.

RTLSQEP cnt.

From Lemma 2 it follows that

$$0 \le f(x_{m+1}) \le f(x_m),$$

i.e. $\{f(x_m)\}$ is monotonically decreasing and bounded below.

Problem $(P_{x_m})$ can be solved via the first order necessary optimality conditions

$$(A^TA - f(x_m)I)x + \lambda L^TLx = A^Tb, \qquad \|Lx\|^2 = \delta^2.$$

Although $g(\cdot\,; x_m)$ in general is not convex, these conditions are even sufficient if the Lagrange parameter is chosen maximal.

RTLSQEP cnt.

Theorem 1
Assume that $(\hat\lambda, \hat x)$ solves the first order conditions

$$(A^TA - f(y)I)x + \lambda L^TLx = A^Tb, \qquad \|Lx\|^2 = \delta^2. \tag{20}$$

If $\|Ly\| = \delta$ and $\hat\lambda$ is the maximal Lagrange multiplier, then $\hat x$ is a global minimizer of problem $(P_y)$.

Proof
The statement follows immediately from the following equation, which can be shown similarly as in Gander (1981): If $(\lambda_j, z_j)$, $j = 1, 2$, are solutions of (20), then it holds that

$$g(z_2; y) - g(z_1; y) = \frac{1}{2}(\lambda_1 - \lambda_2)\|L(z_1 - z_2)\|^2.$$

A quadratic eigenproblem

The first order conditions

$$(A^TA - f(y)I)x + \lambda L^TLx = A^Tb, \qquad \|Lx\|^2 = \delta^2 \tag{*}$$

can be solved via a quadratic eigenvalue problem.

If $L$ is square and nonsingular, let $z := Lx$. Then it holds that

$$Wz + \lambda z := L^{-T}(A^TA - f(y)I)L^{-1}z + \lambda z = L^{-T}A^Tb =: h, \qquad z^Tz = \delta^2.$$

With

$$u := (W + \lambda I)^{-2}h \ \Rightarrow\ h^Tu = z^Tz = \delta^2 \ \Rightarrow\ h = \delta^{-2}hh^Tu.$$

Hence,

$$(W + \lambda I)^2u - \delta^{-2}hh^Tu = 0.$$

If $\hat\lambda$ is the right-most real eigenvalue and the corresponding eigenvector is scaled such that $h^Tu = \delta^2$, then the solution of problem $(*)$ is recovered as $x = L^{-1}(W + \hat\lambda I)u$.
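For square nonsingular $L$ the whole RTLSQEP loop can be prototyped with a dense linearization of the quadratic eigenproblem: substituting $v = (W + \lambda I)u$ turns $(W + \lambda I)^2u = \delta^{-2}hh^Tu$ into the linear eigenproblem $\lambda\,[u; v] = [[-W, I], [\delta^{-2}hh^T, -W]]\,[u; v]$. The sketch below is mine, not the authors' reference code (dense, no attempt at efficiency; the real-eigenvalue filter and tolerances are assumptions):

```python
import numpy as np

def qep_rightmost(W, h, delta):
    """Right-most real eigenvalue and u-part of (W + lam I)^2 u = delta^-2 h h^T u,
    via the linearization lam*[u; v] = [[-W, I], [delta^-2 h h^T, -W]] [u; v]."""
    k = W.shape[0]
    M = np.block([[-W, np.eye(k)],
                  [np.outer(h, h) / delta**2, -W]])
    vals, vecs = np.linalg.eig(M)
    real = np.abs(vals.imag) < 1e-8 * (1.0 + np.abs(vals.real))
    i = np.argmax(np.where(real, vals.real, -np.inf))
    lam = vals[i].real
    u = vecs[:k, i].real
    return lam, u * (delta**2 / (h @ u))         # scale so h^T u = delta^2

def rtlsqep(A, b, L, delta, x0, tol=1e-10, maxit=100):
    """RTLSQEP iteration for square nonsingular L (a sketch following the slides)."""
    k = L.shape[0]
    Linv = np.linalg.inv(L)
    B = Linv.T @ (A.T @ A) @ Linv                # L^-T A^T A L^-1
    E = Linv.T @ Linv                            # L^-T L^-1
    h = Linv.T @ (A.T @ b)
    x = x0 * (delta / np.linalg.norm(L @ x0))    # enforce ||L x0|| = delta
    f = lambda x: np.linalg.norm(A @ x - b)**2 / (1.0 + x @ x)
    for _ in range(maxit):
        W = B - f(x) * E                         # W = L^-T (A^T A - f(x_m) I) L^-1
        lam, u = qep_rightmost(W, h, delta)
        x_new = Linv @ ((W + lam * np.eye(k)) @ u)   # x = L^-1 (W + lam I) u
        if np.linalg.norm(x_new - x) <= tol * (1.0 + np.linalg.norm(x_new)):
            return x_new
        x = x_new
    return x
```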

A quadratic eigenproblem cnt.

Assume that $k < n$ and that $L$ has linearly independent rows.

Let $L^TL = USU^T$ be the spectral decomposition of $L^TL$. Then the first order condition is equivalent to

$$\big((AU)^T(AU) - f(y)I\big)z + \lambda Sz = (AU)^Tb, \qquad z^TSz = \delta^2$$

with $z = U^Tx$.

Partitioning the matrices and vectors in block form

$$(AU)^T(AU) = \begin{pmatrix} T_1 & T_2 \\ T_2^T & T_4 \end{pmatrix}, \quad S = \begin{pmatrix} S_1 & 0 \\ 0 & 0 \end{pmatrix}, \quad (AU)^Tb = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}, \quad z = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix},$$

where the leading blocks have dimension $k$, one gets

$$\begin{pmatrix} T_1 - f(y)I_k & T_2 \\ T_2^T & T_4 - f(y)I_{n-k} \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} + \lambda\begin{pmatrix} S_1z_1 \\ 0 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.$$

A quadratic eigenproblem cnt.

Solving the second component for $z_2$,

$$z_2 = (T_4 - f(y)I_{n-k})^{-1}(c_2 - T_2^Tz_1),$$

and substituting into the first component, one gets

$$\big(T_1 - f(y)I_k - T_2(T_4 - f(y)I_{n-k})^{-1}T_2^T\big)z_1 + \lambda S_1z_1 = c_1 - T_2(T_4 - f(y)I_{n-k})^{-1}c_2.$$

Hence, one gets the quadratic eigenvalue problem

$$(W + \lambda I)^2u - \delta^{-2}hh^Tu = 0,$$

with

$$W = S_1^{-1/2}\big(T_1 - f(y)I_k - T_2(T_4 - f(y)I_{n-k})^{-1}T_2^T\big)S_1^{-1/2}, \qquad h = S_1^{-1/2}\big(c_1 - T_2(T_4 - f(y)I_{n-k})^{-1}c_2\big).$$
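The reduction above is mechanical to set up. A sketch of the bookkeeping (a helper of my own, following the block notation of the slides; the eigenvalues of $L^TL$ are sorted so that the $k$ positive ones come first, and full row rank of $L$ is assumed):

```python
import numpy as np

def qep_data(A, b, L, fy):
    """Assemble W and h of the reduced QEP for full-row-rank L with k < n."""
    k, n = L.shape
    s, U = np.linalg.eigh(L.T @ L)           # L^T L = U S U^T, ascending order
    order = np.argsort(s)[::-1]              # nonzero eigenvalues first
    s, U = s[order], U[:, order]
    AU = A @ U
    T, c = AU.T @ AU, AU.T @ b
    T1, T2, T4 = T[:k, :k], T[:k, k:], T[k:, k:]
    c1, c2 = c[:k], c[k:]
    G = np.linalg.solve(T4 - fy * np.eye(n - k),
                        np.column_stack([T2.T, c2]))   # (T4 - f I)^-1 [T2^T, c2]
    S1ih = np.diag(s[:k] ** -0.5)            # S_1^{-1/2}
    W = S1ih @ (T1 - fy * np.eye(k) - T2 @ G[:, :k]) @ S1ih
    h = S1ih @ (c1 - T2 @ G[:, -1])
    return W, h, U, S1ih, T2, T4, c2
```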

A quadratic eigenproblem cnt.

If $(\lambda, u)$ is the eigenpair corresponding to the right-most eigenvalue, $u$ is normalized such that $u^Th = \delta^2$, and $w = (W + \lambda I)u$, then the solution of the first order conditions is recovered by $x = Uz$ where

$$z = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} S_1^{-1/2}w \\ (T_4 - f(y)I)^{-1}(c_2 - T_2^TS_1^{-1/2}w) \end{pmatrix}.$$

For $k < n$ the quadratic eigenproblem looks very complicated. Notice, however, that $n - k$ usually is very small ($n - k = 1$ and $n - k = 2$ for the discrete first and second order derivative), and quite often fast decomposition techniques can be used.

Moreover, $L$ with $k < n$ can often be replaced by some nonsingular approximation.

Convergence of RTLSQEP

Theorem
Any limit point $x_*$ of the sequence $\{x_m\}$ constructed by RTLSQEP is a global minimizer of the optimization problem

$$f(x) = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2.$$

Proof: Let $x_*$ be a limit point of $\{x_m\}$, and let $\{x_{m_j}\}$ be a subsequence converging to $x_*$. Then $x_{m_j}$ solves the first order conditions

$$(A^TA - f(x_{m_j-1})I)x_{m_j} + \lambda_{m_j}L^TLx_{m_j} = A^Tb.$$

By the monotonicity of $f(x_m)$ it follows that $f(x_{m_j-1})$ converges to $f(x_*)$.

Since $W(y)$ and $h(y)$ depend continuously on $f(y)$, the sequence of right-most eigenvalues $\{\lambda_{m_j}\}$ converges to some $\lambda_*$, and $x_*$ satisfies

$$(A^TA - f(x_*)I)x_* + \lambda_*L^TLx_* = A^Tb, \qquad \|Lx_*\|^2 = \delta^2,$$

where $\lambda_*$ is the maximal Lagrange multiplier.

Hence, $x_*$ is a global minimizer of

$$g(x; x_*) = \min! \quad \text{subject to} \quad \|Lx\|^2 = \delta^2,$$

and for $y \in \mathbb{R}^n$ with $\|Ly\|^2 = \delta^2$ it follows that

$$0 = g(x_*; x_*) \le g(y; x_*) = \|Ay - b\|^2 - f(x_*)(1 + \|y\|^2) = (f(y) - f(x_*))(1 + \|y\|^2),$$

i.e. $f(y) \ge f(x_*)$.


Nonlinear maxmin characterization

Let $T(\lambda) \in \mathbb{C}^{n\times n}$, $T(\lambda) = T(\lambda)^H$, $\lambda \in J \subset \mathbb{R}$, where $J$ is an open interval (maybe unbounded).

For every fixed $x \in \mathbb{C}^n$, $x \ne 0$, assume that the real function $f(\cdot\,; x): J \to \mathbb{R}$, $f(\lambda; x) := x^HT(\lambda)x$, is continuous, and that the real equation $f(\lambda; x) = 0$ has at most one solution $\lambda =: p(x)$ in $J$.

Then the equation $f(\lambda; x) = 0$ implicitly defines a functional $p$ on some subset $D$ of $\mathbb{C}^n$, which we call the Rayleigh functional.

Assume further that

$$(\lambda - p(x))f(\lambda; x) > 0 \quad \text{for every } x \in D,\ \lambda \ne p(x).$$

maxmin characterization (V., Werner 1982)

Let $\sup_{v \in D} p(v) \in J$ and assume that there exists a subspace $W \subset \mathbb{C}^n$ of dimension $\ell$ such that

$$W \cap D \ne \emptyset \quad \text{and} \quad \inf_{v \in W \cap D} p(v) \in J.$$

Then $T(\lambda)x = 0$ has at least $\ell$ eigenvalues in $J$, and for $j = 1, \dots, \ell$ the $j$-largest eigenvalue $\lambda_j$ can be characterized by

$$\lambda_j = \max_{\dim V = j,\ V \cap D \ne \emptyset}\ \inf_{v \in V \cap D} p(v). \tag{1}$$

For $j = 1, \dots, \ell$ every $j$-dimensional subspace $V \subset \mathbb{C}^n$ with

$$V \cap D \ne \emptyset \quad \text{and} \quad \lambda_j = \inf_{v \in V \cap D} p(v)$$

is contained in $D \cup \{0\}$, and the maxmin characterization of $\lambda_j$ can be replaced by

$$\lambda_j = \max_{\dim V = j,\ V\setminus\{0\} \subset D}\ \min_{v \in V\setminus\{0\}} p(v).$$

Back to the quadratic eigenproblem

For

$$T(\lambda)x := \big((W + \lambda I)^2 - \delta^{-2}hh^T\big)x = 0, \tag{QEP}$$

$$f(\lambda; x) = x^HT(\lambda)x = \lambda^2\|x\|^2 + 2\lambda\, x^HWx + \|Wx\|^2 - |x^Hh|^2/\delta^2, \quad x \ne 0,$$

is a parabola which attains its minimum at

$$\lambda = -\frac{x^HWx}{x^Hx}.$$

Let $J = (-\lambda_{\min}, \infty)$ where $\lambda_{\min}$ is the minimum eigenvalue of $W$. Then $f(\lambda; x) = 0$ has at most one solution $p(x) \in J$ for every $x \ne 0$. Hence, the Rayleigh functional $p$ of (QEP) corresponding to $J$ is defined, and the general conditions are satisfied.

Back to the quadratic eigenproblem cnt.

Let $x_{\min}$ be an eigenvector of $W$ corresponding to $\lambda_{\min}$. Then

$$f(-\lambda_{\min}; x_{\min}) = x_{\min}^H(W - \lambda_{\min}I)^2x_{\min} - |x_{\min}^Hh|^2/\delta^2 = -|x_{\min}^Hh|^2/\delta^2 \le 0.$$

Hence, if $x_{\min}^Hh \ne 0$ then $x_{\min} \in D$.

If $x_{\min}^Hh = 0$ and the minimum eigenvalue $\mu_{\min}$ of $T(-\lambda_{\min})$ is negative, then for the corresponding eigenvector $y_{\min}$ it holds that

$$f(-\lambda_{\min}; y_{\min}) = y_{\min}^HT(-\lambda_{\min})y_{\min} = \mu_{\min}\|y_{\min}\|^2 < 0,$$

and $y_{\min} \in D$.

If $x_{\min}^Hh = 0$ and $T(-\lambda_{\min})$ is positive semi-definite, then

$$f(-\lambda_{\min}; x) = x^HT(-\lambda_{\min})x \ge 0 \quad \text{for every } x \ne 0, \ \text{and} \ D = \emptyset.$$
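These membership tests are straightforward to experiment with numerically. A sketch of $p(x)$ for (QEP), again my own: it solves the parabola $f(\lambda; x) = 0$ in closed form and returns the root in $J = (-\lambda_{\min}, \infty)$, or None when $x \notin D$.

```python
import numpy as np

def rayleigh_functional(W, h, delta, x):
    """p(x) for T(lam) = (W + lam I)^2 - delta^-2 h h^T, on J = (-lam_min, inf)."""
    a = x @ x                                   # coefficient of lam^2
    bq = 2.0 * (x @ W @ x)                      # coefficient of lam
    c = (W @ x) @ (W @ x) - (x @ h)**2 / delta**2
    disc = bq * bq - 4.0 * a * c
    if disc < 0:
        return None                             # no real root: x not in D
    root = (-bq + np.sqrt(disc)) / (2.0 * a)    # larger root of the parabola
    lam_min = np.linalg.eigvalsh(W)[0]
    return root if root > -lam_min else None    # the root must lie in J
```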
