Least upper bound of truncation error of low‑rank matrix approximation algorithm using QR decomposition with pivoting


ORIGINAL PAPER


Haruka Kawamura¹ · Reiji Suda¹

Received: 31 May 2020 / Revised: 24 January 2021 / Accepted: 1 February 2021 / Published online: 24 February 2021

Abstract

Low-rank approximation by QR decomposition with pivoting (pivoted QR) is known to be less accurate than singular value decomposition (SVD); however, its amount of computation is smaller than that of SVD. In this study, the least upper bound of the ratio of the truncation error, defined by $\|A-BC\|_2$, using pivoted QR to that using SVD is proved to be $\sqrt{\frac{4^k-1}{3}(n-k)+1}$ for $A\in\mathbb{R}^{m\times n}$ ($m\ge n$) approximated as a product of $B\in\mathbb{R}^{m\times k}$ and $C\in\mathbb{R}^{k\times n}$.

Keywords Error analysis · Pivoting · QR decomposition · Singular values

Mathematics Subject Classification 65F55 · 15A45

1 Introduction

1.1 Low‑rank approximation

Low-rank matrix approximation involves approximating a matrix by a matrix whose rank is less than that of the original matrix. Let $A\in\mathbb{R}^{m\times n}$; then, a rank-$k$ approximation of $A$ is given by

$$A \approx BC$$

where $B\in\mathbb{R}^{m\times k}$ and $C\in\mathbb{R}^{k\times n}$. Low-rank matrix approximation appears in many applications such as data mining [5] and machine learning [14]. It also plays an important role in tensor decompositions [12].

* Reiji Suda: reiji@is.s.u-tokyo.ac.jp

Haruka Kawamura: kawamulahaluka@gmail.com

¹ The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan


This paper discusses truncation errors of low-rank matrix approximation using QR decomposition with pivoting, or pivoted QR. Rounding errors are not considered, and the norm is the 2-norm unless otherwise stated. $A\in\mathbb{R}^{m\times n}$ (without loss of generality, we assume that $m\ge n$) is approximated by a product of $B\in\mathbb{R}^{m\times k}$ and $C\in\mathbb{R}^{k\times n}$, and the truncation error is defined by $\|A-BC\|_2$.
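As a small illustration of this definition, the following sketch evaluates $\|A-BC\|_2$ for a given pair of factors. It is not part of the paper's method: the helper names `matmul`, `spectral_norm`, and `truncation_error` are hypothetical, and the 2-norm is only estimated by power iteration on the Gram matrix, which assumes that the all-ones starting vector is not orthogonal to the dominant right singular vector.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def spectral_norm(M, iters=500):
    """Estimate the 2-norm of M by power iteration on M^T M (illustrative only)."""
    Mt = [list(col) for col in zip(*M)]
    G = matmul(Mt, M)                  # Gram matrix M^T M
    v = [1.0] * len(G)                 # assumed not orthogonal to the top eigenvector
    lam = 0.0
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(row, v)) for row in G]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                # M^T M v = 0: treat the norm as zero
            return 0.0
        v = [x / norm for x in w]
        lam = norm                     # converges to the largest eigenvalue of G
    return math.sqrt(lam)

def truncation_error(A, B, C):
    """Truncation error ||A - BC||_2 of the approximation A ~ BC."""
    BC = matmul(B, C)
    E = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, BC)]
    return spectral_norm(E)

# A is 3 x 2; B (3 x 1) and C (1 x 2) keep only the dominant direction.
A = [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
B = [[1.0], [0.0], [0.0]]
C = [[3.0, 0.0]]
err = truncation_error(A, B, C)
```

In this toy case the residual $A-BC$ has a single nonzero entry equal to 1, so the computed error equals the second singular value of $A$.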

It is well known that for any matrix $A\in\mathbb{R}^{m\times n}$ ($m\ge n$), there are orthogonal matrices $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ and a diagonal matrix $\Sigma\in\mathbb{R}^{n\times n}$ with nonnegative diagonal elements that satisfy

$$A = U\begin{pmatrix}\Sigma\\ O\end{pmatrix}V^T.$$

This is a singular value decomposition (SVD) of $A$. We define $\sigma_i(A)$ for $i=1,2,\dots,n$ by

$$\operatorname{diag}(\sigma_1(A),\sigma_2(A),\dots,\sigma_n(A)) = \Sigma,$$

and assume that $\sigma_1(A)\ge\sigma_2(A)\ge\cdots\ge\sigma_n(A)\ge 0$ without loss of generality. The values $\sigma_i(A)$ are the singular values of $A$. $A$ has rank $k$ if and only if $\sigma_k(A)>0=\sigma_{k+1}(A)$. Let

$$\Sigma_k = \operatorname{diag}(\sigma_1(A),\sigma_2(A),\dots,\sigma_k(A)).$$

Then,

$$\min_{\operatorname{rank}(X)\le k}\|A-X\|_2 = \left\|A-U\begin{pmatrix}\Sigma_k & O\\ O & O\end{pmatrix}V^T\right\|_2 = \sigma_{k+1}(A)$$

holds [8]. Therefore, this is a rank-$k$ approximation of $A$ whose truncation error is the smallest in the 2-norm. We define the truncation error of low-rank approximation by SVD as

$$\mathrm{SVD}_k(A) = \sigma_{k+1}(A).$$

The amount of computation required to calculate SVD is $O(nm\min(n,m))$.

Pivoted QR was proposed by Golub in 1965 [7]. Because the amount of computation required to calculate the low-rank approximation by pivoted QR is $O(nmk)$, it is cheaper than SVD and hence useful in many applications such as solving rank-deficient least squares problems [2]. It consists of QR decomposition and pivoting.

For any matrix $A$, there exist $Q\in\mathbb{R}^{m\times n}$ and an upper triangular matrix $R\in\mathbb{R}^{n\times n}$ that satisfy $A=QR$ and $Q^TQ=I_n$. This is a QR decomposition of $A$. We use pivoting to determine the permutation matrix $\Pi_{\mathrm{grd}}$ and apply the QR decomposition algorithm to $A\Pi_{\mathrm{grd}}$. The subscript grd signifies the greedy method, which is explained below.

Hereafter, $QR$ denotes a QR decomposition $A\Pi_{\mathrm{grd}}=QR$. Let $Q$ and $R$ be partitioned as

$$Q=\begin{pmatrix}Q_{1k} & Q_{2k}\end{pmatrix},\quad R=\begin{pmatrix}R_{1k} & R_{2k}\\ O & R_{3k}\end{pmatrix}$$

where $Q_{1k}\in\mathbb{R}^{m\times k}$ and $R_{1k}\in\mathbb{R}^{k\times k}$. Then, we can approximate $A$ by $Q_{1k}\begin{pmatrix}R_{1k} & R_{2k}\end{pmatrix}\Pi_{\mathrm{grd}}^T$, and

$$\left\|A-Q_{1k}\begin{pmatrix}R_{1k} & R_{2k}\end{pmatrix}\Pi_{\mathrm{grd}}^T\right\|_2 = \|R_{3k}\|_2$$

holds. We define the truncation error of low-rank approximation by pivoted QR as

$$\mathrm{pivotQR}_k(A) = \|R_{3k}\|_2.$$

In this study, the greedy method is used in pivoting to make $\|R_{3k}\|_2$ small. Pivoting is performed such that the elements of $R=(r_{ij})$ satisfy the following inequalities [1, p. 103]:

$$r_{ll}^2 \ge \sum_{i=l}^{j} r_{ij}^2 \quad (l=1,2,\dots,n-1,\ j=l+1,l+2,\dots,n). \tag{1}$$

Condition (1) is not used to analyze the error for $l=k+1,k+2,\dots,n-1$.

The greedy method of pivoting is not always optimal. QR decompositions of $A\Pi_{\mathrm{RR}}$, where $\Pi_{\mathrm{RR}}$ is chosen such that $R_{\mathrm{RR}}$ has a small lower right block and where $Q_{\mathrm{RR}}R_{\mathrm{RR}}$ is a QR decomposition of $A\Pi_{\mathrm{RR}}$, are called rank-revealing QR (RRQR). The following theorem was shown by Hong et al. in 1992 [9].

Theorem 1 Let $m\ge n>k$, and $A\in\mathbb{R}^{m\times n}$. Then, there exists a permutation matrix $\Pi\in\mathbb{R}^{n\times n}$ such that the diagonal blocks of $R=\begin{pmatrix}R_1 & R_2\\ O & R_3\end{pmatrix}$, the upper triangular factor of the QR decomposition of $A\Pi$ with $R_1\in\mathbb{R}^{k\times k}$, satisfy the following inequality:

$$\|R_3\|_2 \le \sqrt{k(n-k)+\min(k,n-k)}\,\sigma_{k+1}(A).$$

Finding the optimal permutation matrix is not practical from the viewpoint of computational complexity.

1.2 Truncation error of pivoted QR

Pivoted QR sometimes results in a large truncation error. A well-known example was shown by Kahan [10]; the Kahan matrix is given in Sect. 4. In 1968, Faddeev et al. [6] showed that

$$\mathrm{pivotQR}_{n-1}(A) \le \frac{\sqrt{4^n+6n-1}}{3}\,\mathrm{SVD}_{n-1}(A).$$

Furthermore,

$$\mathrm{pivotQR}_k(A) \le \frac{n\sqrt{4^k+6k-1}}{3}\,\mathrm{SVD}_k(A)$$

holds [3].

However, a survey in 2017 stated that "very little is known in theory about its behaviour" [13, p. 2218] with regard to pivoted QR; thus, there is still room for further research on pivoted QR.

Our previous work showed that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is $\sqrt{\frac{4^{n-1}+2}{3}}$ when an $m\times n$ ($m\ge n$) matrix is approximated by a matrix whose rank is $n-1$, i.e., for $k=n-1$ [11]. The tight upper bound for all $k$ is proved in the rest of this paper.

We assume that all matrices and vectors in this paper are real; however, the discussion in this paper can easily be extended to complex numbers, and the same results can be obtained.

2 Preliminaries

In this section, we define the notation and examine the basic properties needed to analyze the truncation errors. First, we introduce the notation resi.

Proposition 1 [1, p. 16] For $A\in\mathbb{R}^{m\times n}$, there exists $X\in\mathbb{R}^{n\times m}$ that satisfies

$$AXA=A,\quad XAX=X,\quad (AX)^T=AX,\quad (XA)^T=XA,$$

and $X$ is uniquely determined by the four conditions.

Definition 1 For $A\in\mathbb{R}^{m\times n}$ ($m\ge n$), the generalized inverse of $A$ is defined by $X\in\mathbb{R}^{n\times m}$ that satisfies the four conditions in Proposition 1 and is denoted by $A^\dagger$.

The following notation is closely related to the truncation error of pivoted QR.

Definition 2 Let $A\in\mathbb{R}^{m\times n}$ ($m\ge n$) and $B\in\mathbb{R}^{m\times l}$. We define $\operatorname{resi}(A,B)$ as

$$\operatorname{resi}(A,B) = B-AA^\dagger B.$$

We denote the inner product of two vectors $x$ and $y$ as $(x,y)$.

Example 1 For $x\in\mathbb{R}^n$ and $y\in\mathbb{R}^n$, if $x\ne 0$, then the following holds:

$$\operatorname{resi}(x,y) = y-\frac{(x,y)}{\|x\|^2}x.$$

The following lemma will be used to identify resi.
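Before the lemma is stated, the formula in Example 1 can be sanity-checked numerically. The sketch below is only an illustration with hypothetical helper names (`inner`, `resi_vec`); it evaluates $\operatorname{resi}(x,y)$ for two concrete vectors and confirms that the result is orthogonal to $x$, which is the property that the lemma expresses for general matrices.

```python
def inner(x, y):
    """Inner product (x, y) of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def resi_vec(x, y):
    """resi(x, y) = y - (x, y) / ||x||^2 * x for a nonzero vector x."""
    xx = inner(x, x)
    if xx == 0.0:
        raise ValueError("x must be nonzero")
    c = inner(x, y) / xx
    return [yi - c * xi for xi, yi in zip(x, y)]

x = [1.0, 1.0, 0.0]
y = [2.0, 0.0, 3.0]
r = resi_vec(x, y)          # (x, y) / ||x||^2 = 2 / 2 = 1, so r = y - x
orthogonality = inner(x, r) # x^T resi(x, y), expected to vanish
```

Here `r` is the component of `y` that remains after removing its projection onto `x`.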

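Lemma 1 Let $A\in\mathbb{R}^{m\times n}$ ($m\ge n$), $B\in\mathbb{R}^{m\times l}$, and $X\in\mathbb{R}^{n\times l}$. Then,

$$A^TAX-A^TB=O \iff \operatorname{resi}(A,B)=B-AX$$

holds.

Proof If $\operatorname{resi}(A,B)=B-AX$ holds, then

$$A^TAX-A^TB = -A^T\operatorname{resi}(A,B) = (A^TAA^\dagger-A^T)B = (A^T(AA^\dagger)^T-A^T)B = ((AA^\dagger A)^T-A^T)B = O$$

holds. If $A^TAX-A^TB=O$ holds, then

$$\operatorname{resi}(A,B)-B+AX = AX-AA^\dagger B = AA^\dagger AX-AA^\dagger B = (AA^\dagger)^TAX-(AA^\dagger)^TB = A^{\dagger T}(A^TAX-A^TB) = O$$

holds. ◻

Lemma 2 [1, p. 5] Let $A\in\mathbb{R}^{m\times n}$ ($m\ge n$), $b\in\mathbb{R}^m$, and $x\in\mathbb{R}^n$. $\|b-Ax\|\le\|b-Ay\|$ holds for any $y\in\mathbb{R}^n$ if and only if $A^T(Ax-b)=0$ holds.

Using Lemmas 1 and 2, we can obtain the following lemma.

Lemma 3 Let $A\in\mathbb{R}^{m\times n}$ ($m\ge n$), $b\in\mathbb{R}^m$, and $x\in\mathbb{R}^n$. $\|b-Ax\|\le\|b-Ay\|$ holds for any $y\in\mathbb{R}^n$ if and only if $\operatorname{resi}(A,b)=b-Ax$ holds.

Lemma 4 Let $m\ge n>k$, $A\in\mathbb{R}^{m\times n}$, and $B\in\mathbb{R}^{m\times l}$. Let $A$ be partitioned as

$$A=\begin{pmatrix}A_{1k} & A_{2k}\end{pmatrix}$$

where $A_{1k}\in\mathbb{R}^{m\times k}$. Then,

$$\operatorname{resi}(A,B) = \operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B))$$

holds.

Proof From the definition of resi, we can see that

$$\operatorname{resi}(A_{1k},A_{2k}) = A_{2k}-A_{1k}X, \tag{2}$$

$$\operatorname{resi}(A_{1k},B) = B-A_{1k}Y, \tag{3}$$

and

$$\operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B)) = \operatorname{resi}(A_{1k},B)-\operatorname{resi}(A_{1k},A_{2k})Z \tag{4}$$

hold where $X=A_{1k}^\dagger A_{2k}$, $Y=A_{1k}^\dagger B$, and $Z=\operatorname{resi}(A_{1k},A_{2k})^\dagger\operatorname{resi}(A_{1k},B)$. Thus,

$$\operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B)) = B-A_{1k}Y-A_{2k}Z+A_{1k}XZ \tag{5}$$

holds from (2), (3), and (4). Lemma 1 proves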

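$$A_{1k}^TA_{2k} = A_{1k}^TA_{1k}X \tag{6}$$

from (2),

$$A_{1k}^TB = A_{1k}^TA_{1k}Y \tag{7}$$

from (3), and

$$\operatorname{resi}(A_{1k},A_{2k})^T\bigl(\operatorname{resi}(A_{1k},B)-\operatorname{resi}(A_{1k},A_{2k})Z\bigr) = O \tag{8}$$

from (4). We can see that

$$A_{1k}^T\operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B)) = A_{1k}^T(B-A_{1k}Y-A_{2k}Z+A_{1k}XZ) = O \tag{9}$$

from (5), (6), and (7). We can see that

$$\begin{aligned}
A_{2k}^T\operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B))
&= (A_{2k}-A_{1k}X)^T\operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B))\\
&= \operatorname{resi}(A_{1k},A_{2k})^T\bigl(\operatorname{resi}(A_{1k},B)-\operatorname{resi}(A_{1k},A_{2k})Z\bigr)\\
&= O
\end{aligned} \tag{10}$$

from (2), (4), (8), and (9). Then, (9) and (10) can be combined as

$$\begin{pmatrix}A_{1k}^T\\ A_{2k}^T\end{pmatrix}\operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B)) = A^T\operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B)) = O. \tag{11}$$

Next, (5) can be rewritten as

$$\operatorname{resi}(\operatorname{resi}(A_{1k},A_{2k}),\operatorname{resi}(A_{1k},B)) = B-A\begin{pmatrix}Y-XZ\\ Z\end{pmatrix}.$$

From this and (11), we have

$$A^T\left(B-A\begin{pmatrix}Y-XZ\\ Z\end{pmatrix}\right) = O.$$

Application of Lemma 1 to this proves the lemma. ◻

QR decomposition and resi have the following relation. Note that the QR decomposition in this lemma is without pivoting.

Lemma 5 Let $m\ge n>l$, $A\in\mathbb{R}^{m\times n}$, and $A=QR$ be a QR decomposition partitioned as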

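$$A=\begin{pmatrix}A_{1l} & A_{2l}\end{pmatrix},\quad Q=\begin{pmatrix}Q_{1l} & Q_{2l}\end{pmatrix},\quad R=\begin{pmatrix}R_{1l} & R_{2l}\\ O & R_{3l}\end{pmatrix}$$

where $A_{1l}\in\mathbb{R}^{m\times l}$, $Q_{1l}\in\mathbb{R}^{m\times l}$, and $R_{1l}\in\mathbb{R}^{l\times l}$. If $\operatorname{rank}(A_{1l})=l$ holds, then

$$\operatorname{resi}(A_{1l},A_{2l}) = Q_{2l}R_{3l}$$

holds.

Proof We have

$$A_{1l}=Q_{1l}R_{1l},\quad A_{2l}=Q_{1l}R_{2l}+Q_{2l}R_{3l}.$$

Let

$$X=R_{1l}^{-1}R_{2l}.$$

Then, we have

$$Q_{2l}R_{3l} = A_{2l}-A_{1l}X.$$

Furthermore,

$$A_{1l}^T(A_{2l}-A_{1l}X) = A_{1l}^TQ_{2l}R_{3l} = R_{1l}^TQ_{1l}^TQ_{2l}R_{3l} = R_{1l}^T\,O\,R_{3l} = O$$

holds. Application of Lemma 1 to this proves the lemma. ◻

Then, we return to pivoted QR. Let

$$A\Pi_{\mathrm{grd}} = \begin{pmatrix}a_{\pi_1} & a_{\pi_2} & \cdots & a_{\pi_n}\end{pmatrix} = \begin{pmatrix}A_{1k} & A_{2k}\end{pmatrix}$$

where $A_{1k}\in\mathbb{R}^{m\times k}$. From Lemma 5, we can see that

$$\begin{aligned}
(1) &\iff \left\|\begin{pmatrix}r_{ll} & r_{l+1,l} & \cdots & r_{nl}\end{pmatrix}^T\right\|_2 \ge \left\|\begin{pmatrix}r_{lj} & r_{l+1,j} & \cdots & r_{nj}\end{pmatrix}^T\right\|_2\\
&\iff \left\|Q_{2(l-1)}\begin{pmatrix}r_{ll} & r_{l+1,l} & \cdots & r_{nl}\end{pmatrix}^T\right\|_2 \ge \left\|Q_{2(l-1)}\begin{pmatrix}r_{lj} & r_{l+1,j} & \cdots & r_{nj}\end{pmatrix}^T\right\|_2\\
&\iff \left\|\operatorname{resi}\bigl(\begin{pmatrix}a_{\pi_1} & \cdots & a_{\pi_{l-1}}\end{pmatrix},a_{\pi_l}\bigr)\right\|_2 \ge \left\|\operatorname{resi}\bigl(\begin{pmatrix}a_{\pi_1} & \cdots & a_{\pi_{l-1}}\end{pmatrix},a_{\pi_j}\bigr)\right\|_2
\end{aligned}$$

for $l=1,2,\dots,k$ and $j=l+1,l+2,\dots,n$ and

$$\mathrm{pivotQR}_k(A) = \|R_{3k}\|_2 = \|Q_{2k}R_{3k}\|_2 = \|\operatorname{resi}(A_{1k},A_{2k})\|_2$$

if $\operatorname{rank}(A_{1k})=k$ holds. The last equation suggests that, as long as $\operatorname{rank}(A_{1k})=k$ holds, the value of $\mathrm{pivotQR}_k(A)$ is determined only from $A_{1k}$ and $A_{2k}$, or equivalently from $\Pi_{\mathrm{grd}}$, and is independent of how (or by what algorithm) the QR decomposition is computed.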
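The residual characterization above can be turned into a small experiment. The sketch below is an illustration under stated assumptions, not the authors' implementation: the names `greedy_residual` and `spectral_norm_cols` are hypothetical, the greedy choice is realized by repeatedly selecting the remaining column with the largest residual norm and projecting it out of the others, $k<n$ is assumed, and the 2-norm is estimated by power iteration with an all-ones starting vector (assumed not orthogonal to the dominant singular vector).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def greedy_residual(cols, k):
    """Greedy pivoting on a list of column vectors (assumes k < len(cols)).

    Returns (perm, resid): the indices of the chosen columns in order, and the
    residuals of the unchosen columns after projecting out the chosen ones."""
    resid = {j: list(c) for j, c in enumerate(cols)}
    perm = []
    for _ in range(k):
        # choose the remaining column with the largest residual norm
        j = max(resid, key=lambda t: dot(resid[t], resid[t]))
        norm = math.sqrt(dot(resid[j], resid[j]))
        if norm == 0.0:                     # rank(A) reached before k steps
            break
        q = [x / norm for x in resid.pop(j)]
        perm.append(j)
        for t in list(resid):               # project q out of every other column
            c = dot(q, resid[t])
            resid[t] = [b - c * a for a, b in zip(q, resid[t])]
    return perm, [resid[t] for t in sorted(resid)]

def spectral_norm_cols(cols, iters=500):
    """Estimate the 2-norm of the matrix whose columns are `cols`."""
    G = [[dot(u, v) for v in cols] for u in cols]   # Gram matrix
    v = [1.0] * len(cols)
    lam = 0.0
    for _ in range(iters):
        w = [dot(row, v) for row in G]
        nw = math.sqrt(dot(w, w))
        if nw == 0.0:
            return 0.0
        v = [x / nw for x in w]
        lam = nw
    return math.sqrt(lam)

# Columns of A = diag(3, 2, 1); with k = 1 the greedy step picks column 0.
cols = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
perm, resid = greedy_residual(cols, 1)
pivot_qr_err = spectral_norm_cols(resid)    # ||resi(A_1k, A_2k)||_2
```

For this diagonal example the residual norm coincides with $\sigma_2(A)=2$, so the ratio of the two truncation errors is 1; the interesting cases are matrices for which the greedy choice is far from optimal.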

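3 Evaluation from above

We bound $\frac{\mathrm{pivotQR}_k(A)}{\mathrm{SVD}_k(A)}$ from above in this section. Since $\mathrm{pivotQR}_k(A)=\mathrm{SVD}_k(A)=0$ holds if $\operatorname{rank}(A)\le k$ holds, we only consider the case $\operatorname{rank}(A)>k$. Let $A=U\Sigma V^T$ be one SVD. Since $A\Pi_{\mathrm{grd}}=U\Sigma(\Pi_{\mathrm{grd}}^TV)^T$ and $(\Pi_{\mathrm{grd}}^TV)^T(\Pi_{\mathrm{grd}}^TV)=I_n$ hold, $U\Sigma(\Pi_{\mathrm{grd}}^TV)^T$ is one SVD of $A\Pi_{\mathrm{grd}}$. Then, we can see that

$$\mathrm{SVD}_k(A) = \sigma_{k+1}(A) = \sigma_{k+1}(A\Pi_{\mathrm{grd}}).$$

Hereafter, we change what $A$ represents. The previous $A\Pi_{\mathrm{grd}}$ is replaced by $A$. Let $A\in\mathbb{R}^{m\times n}$ that satisfies

$$\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_i\bigr)\right\| \ge \left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_j\bigr)\right\| \quad (i=1,\dots,k,\ j=i+1,\dots,n) \tag{12}$$

be partitioned as

$$A=\begin{pmatrix}a_1 & a_2 & \cdots & a_n\end{pmatrix}=\begin{pmatrix}A_{1k} & A_{2k}\end{pmatrix}$$

where $A_{1k}\in\mathbb{R}^{m\times k}$ and $\operatorname{rank}(A_{1k})=k$. We should compare $\sigma_{k+1}(A)=\mathrm{SVD}_k(A)$ and $\|\operatorname{resi}(A_{1k},A_{2k})\|_2=\mathrm{pivotQR}_k(A)$.

Lemma 6 Let $m\ge n$, $A\in\mathbb{R}^{m\times n}$, and $B\in\mathbb{R}^{m\times l}$. For any $v\in\mathbb{R}^l$,

$$\operatorname{resi}(A,B)v = \operatorname{resi}(A,Bv)$$

holds.

Proof From the definition of resi,

$$\operatorname{resi}(A,B) = B-AA^\dagger B$$

holds. Thus,

$$\operatorname{resi}(A,B)v = Bv-AA^\dagger Bv = \operatorname{resi}(A,Bv)$$

holds. ◻

We can see that

$$\|\operatorname{resi}(A_{1k},A_{2k})\|_2 = \max_{z\in\mathbb{R}^{n-k},\,\|z\|=1}\|\operatorname{resi}(A_{1k},A_{2k})z\| = \max_{z\in\mathbb{R}^{n-k},\,\|z\|=1}\|\operatorname{resi}(A_{1k},A_{2k}z)\|$$

from the definition of the 2-norm and Lemma 6. Now, we introduce an essential theorem of this paper.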
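Before turning to that theorem, Lemma 6 can be checked on a small example. The sketch below is illustrative only: the function `resi` here is a hypothetical helper that computes the residual by Gram-Schmidt orthogonalization and therefore assumes that the columns of $A$ are linearly independent (in which case $AA^\dagger$ is the orthogonal projector onto their span). Matrices are passed as lists of columns.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def resi(A_cols, b):
    """resi(A, b) = b - A A^+ b, assuming the columns of A are linearly independent."""
    basis = []
    for a in A_cols:                      # Gram-Schmidt on the columns of A
        a = list(a)
        for q in basis:
            c = dot(q, a)
            a = [x - c * y for x, y in zip(a, q)]
        norm = math.sqrt(dot(a, a))
        basis.append([x / norm for x in a])
    r = list(b)
    for q in basis:                       # remove the projection of b onto span(A)
        c = dot(q, r)
        r = [x - c * y for x, y in zip(r, q)]
    return r

A_cols = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
B_cols = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]
v = [2.0, -1.0]

# Left-hand side: resi(A, B) v, computed column by column.
R_cols = [resi(A_cols, b) for b in B_cols]
lhs = [sum(vj * col[i] for vj, col in zip(v, R_cols)) for i in range(3)]

# Right-hand side: resi(A, B v).
Bv = [sum(vj * col[i] for vj, col in zip(v, B_cols)) for i in range(3)]
rhs = resi(A_cols, Bv)
```

Both sides keep only the third coordinate of $Bv$, because the columns of `A_cols` span the first two coordinate directions.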

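Theorem 2 Let $m\ge n>1$, $A\in\mathbb{R}^{m\times n}$, $\operatorname{rank}(A)=n$, and $A$ be partitioned as

$$A=\begin{pmatrix}a_1 & a_2 & \cdots & a_n\end{pmatrix}.$$

We define $\hat{A}_i$ as

$$\hat{A}_i=\begin{pmatrix}a_1 & \cdots & a_{i-1} & a_{i+1} & \cdots & a_n\end{pmatrix}$$

for $i=1,2,\dots,n$, and $d_i$ as

$$d_i=\operatorname{resi}(\hat{A}_i,a_i)$$

for $i=1,2,\dots,n$. Then, $d_i\ne 0$ for $i=1,2,\dots,n$ and

$$\frac{\|a_1\|}{\|d_1\|} \le \sum_{i=2}^{n}\frac{\|a_i\|}{\|d_i\|}$$

hold.

Proof Since $\operatorname{rank}(A)=n$, $\{a_1,a_2,\dots,a_n\}$ is linearly independent. Because $d_i$ is a linear combination of $\{a_1,a_2,\dots,a_n\}$ with the coefficient of $a_i$ being 1, $d_i\ne 0$ holds for $i=1,2,\dots,n$. From the definition of resi,

$$d_1=a_1-\hat{A}_1x_1$$

holds, where $x_1=\hat{A}_1^\dagger a_1$. Let $x_1=\begin{pmatrix}x_{12} & x_{13} & \cdots & x_{1n}\end{pmatrix}^T$. Let $i$ be one of $2,3,\dots,n$. We can see that

$$\|d_i\| \le \left\|a_i-\hat{A}_i\begin{pmatrix}\frac{1}{x_{1i}}\\ -\frac{x_{12}}{x_{1i}}\\ \vdots\\ -\frac{x_{1,i-1}}{x_{1i}}\\ -\frac{x_{1,i+1}}{x_{1i}}\\ \vdots\\ -\frac{x_{1n}}{x_{1i}}\end{pmatrix}\right\| = \frac{\|d_1\|}{|x_{1i}|}$$

holds if $x_{1i}\ne 0$ from Lemma 3. Thus,

$$|x_{1i}| \le \frac{\|d_1\|}{\|d_i\|} \tag{13}$$

holds. This (13) also holds if $x_{1i}=0$. We define $y\in\mathbb{R}^m$ as

$$y=d_1+x_{1n}a_n=a_1-\sum_{i=2}^{n-1}x_{1i}a_i.$$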

Since $\{a_1,a_2,\dots,a_{n-1}\}$ is linearly independent, $y\ne 0$ holds. As Lemma 1 gives $\hat{A}_1^Td_1=0$, we have $(a_n,d_1)=0$. Thus,

$$x_{1n}=\frac{(a_n,y)}{\|a_n\|^2}$$

holds. We can see that

$$\|d_n\| \le \left\|a_n-\frac{(y,a_n)}{\|y\|^2}y\right\|$$

holds from Lemma 3 because $y$ is a linear combination of $a_i$ ($i=1,2,\dots,n-1$). Since

$$\|d_n\|^2 = \frac{\|d_n\|^2}{\|d_1\|^2}\left\|y-\frac{(a_n,y)}{\|a_n\|^2}a_n\right\|^2 = \frac{\|d_n\|^2}{\|d_1\|^2}\|y\|^2\left(1-\frac{(a_n,y)^2}{\|y\|^2\|a_n\|^2}\right),$$

$$\left\|a_n-\frac{(y,a_n)}{\|y\|^2}y\right\|^2 = \|a_n\|^2\left(1-\frac{(a_n,y)^2}{\|y\|^2\|a_n\|^2}\right)$$

and $\|d_n\|>0$ hold,

$$\frac{\|a_n\|}{\|d_n\|} \ge \frac{\|y\|}{\|d_1\|}$$

holds. Furthermore, since

$$\|y\| \ge \|a_1\|-\sum_{i=2}^{n-1}|x_{1i}|\,\|a_i\| \ge \|a_1\|-\sum_{i=2}^{n-1}\frac{\|d_1\|}{\|d_i\|}\|a_i\|$$

holds from (13),

$$\frac{\|a_n\|}{\|d_n\|} \ge \frac{\|a_1\|}{\|d_1\|}-\sum_{i=2}^{n-1}\frac{\|a_i\|}{\|d_i\|}$$

holds, and the theorem has been proved. ◻

We refer to an essential theorem by Hong et al.

Theorem 3 [9, p. 218] Let $m\ge n>l$, $A\in\mathbb{R}^{m\times n}$, and $A=QR=U\Sigma V^T$ be a QR decomposition and an SVD, respectively. Let $R$ and $V$ be partitioned as

$$R=\begin{pmatrix}R_{1l} & R_{2l}\\ O & R_{3l}\end{pmatrix},\quad V=\begin{pmatrix}V_{1l} & V_{2l}\\ V_{3l} & V_{4l}\end{pmatrix}$$

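where $R_{1l}\in\mathbb{R}^{l\times l}$ and $V_{1l}\in\mathbb{R}^{l\times l}$. Then,

$$\|R_{3l}\|_2\,\sigma_{n-l}(V_{4l}) \le \sigma_{l+1}(A)$$

holds.

In the present study, this theorem is only used for $l=n-1$. The following lemma provides an inequality between resi and the singular value.

Lemma 7 Under the same assumptions as Theorem 2,

$$1 \le (\sigma_n(A))^2\sum_{i=1}^{n}\frac{1}{\|d_i\|^2}$$

holds.

Proof Let $A=U\Sigma V^T$ be an SVD partitioned as

$$V=\begin{pmatrix}V_1 & v_2\end{pmatrix},\quad v_2=\begin{pmatrix}v_{21} & v_{22} & \cdots & v_{2n}\end{pmatrix}^T$$

where $V_1\in\mathbb{R}^{n\times(n-1)}$. Let $e_i$ be the $i$th column of $I_n$ for $i=1,2,\dots,n$. Define a permutation matrix $\Pi_i$ as

$$\Pi_i=\begin{pmatrix}e_1 & \cdots & e_{i-1} & e_{i+1} & \cdots & e_n & e_i\end{pmatrix}$$

for $i=1,2,\dots,n$. Since

$$\begin{pmatrix}\hat{A}_i & a_i\end{pmatrix} = A\Pi_i = U\Sigma(\Pi_i^TV)^T$$

and $(\Pi_i^TV)^T(\Pi_i^TV)=I_n$, $U\Sigma(\Pi_i^TV)^T$ is one SVD of $\begin{pmatrix}\hat{A}_i & a_i\end{pmatrix}$. Let $A\Pi_i=Q_iR_i$ be a QR decomposition partitioned as

$$Q_i=\begin{pmatrix}Q_{i1} & q_{i2}\end{pmatrix},\quad R_i=\begin{pmatrix}R_{i1} & r_{i2}\\ O & r_{i3}\end{pmatrix}$$

where $Q_{i1}\in\mathbb{R}^{m\times(n-1)}$ and $R_{i1}\in\mathbb{R}^{(n-1)\times(n-1)}$. Using Theorem 3,

$$\sigma_n(A)=\sigma_n(A\Pi_i) \ge |v_{2i}|\,|r_{i3}|$$

holds. We can see that

$$\|d_i\| = \|\operatorname{resi}(\hat{A}_i,a_i)\| = \|r_{i3}q_{i2}\| = |r_{i3}|$$

holds from Lemma 5. Thus,

$$\sigma_n(A) \ge |v_{2i}|\,\|d_i\|$$

holds. Then,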

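$$1=\sum_{i=1}^{n}(v_{2i})^2 \le (\sigma_n(A))^2\sum_{i=1}^{n}\frac{1}{\|d_i\|^2}$$

holds. ◻

Proposition 2 Let $m\ge n>k$ and $A\in\mathbb{R}^{m\times n}$ satisfy (12), with $A$ partitioned as

$$A=\begin{pmatrix}a_1 & a_2 & \cdots & a_n\end{pmatrix}=\begin{pmatrix}A_{1k} & A_{2k}\end{pmatrix}$$

where $A_{1k}\in\mathbb{R}^{m\times k}$. Let $A$ satisfy $\operatorname{rank}(A_{1k})=k$. Then, for all $z\in\mathbb{R}^{n-k}$ with $\|z\|=1$,

$$\|\operatorname{resi}(A_{1k},A_{2k}z)\| \le \sqrt{\frac{4^k-1}{3}(n-k)+1}\;\sigma_{k+1}(A)$$

holds.

Proof From (12) and Lemma 6, the following holds for $i=1,2,\dots,k$:

$$\begin{aligned}
(n-k)\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_i\bigr)\right\|^2
&\ge \sum_{j=k+1}^{n}\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_j\bigr)\right\|^2\\
&= \left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A_{2k}\bigr)\right\|_F^2\\
&\ge \left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A_{2k}\bigr)\right\|_2^2\\
&\ge \left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A_{2k}z\bigr)\right\|^2.
\end{aligned} \tag{14}$$

Define $A'$ as

$$A'=\begin{pmatrix}a_1 & a_2 & \cdots & a_k & A_{2k}z\end{pmatrix}.$$

If $\operatorname{rank}(A')\ne k+1$, then $\{a_1,a_2,\dots,a_k,A_{2k}z\}$ is linearly dependent. Since $\operatorname{rank}(A_{1k})=k$, $\{a_1,a_2,\dots,a_k\}$ is linearly independent, and $A_{2k}z$ can be expressed as a linear combination of $\{a_1,a_2,\dots,a_k\}$. Then, we have $\operatorname{resi}(A_{1k},A_{2k}z)=0$ from Lemma 3, and the conclusion holds. Therefore, we only consider the case $\operatorname{rank}(A')=k+1$ in the remainder of this proof. We define $d'_i$ as

$$d'_i=\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1} & a_{i+1} & \cdots & a_k & A_{2k}z\end{pmatrix},a_i\bigr) \quad (i=1,2,\dots,k).$$

From Lemma 4, we can see that

$$d'_j=\operatorname{resi}\Bigl(\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A'_{ijk}\bigr),\ \operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_j\bigr)\Bigr)$$

holds for $i=1,2,\dots,k$ and $j=i,i+1,\dots,k$, where $A'_{ijk}=\begin{pmatrix}a_i & \cdots & a_{j-1} & a_{j+1} & \cdots & a_k & A_{2k}z\end{pmatrix}$, and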

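$$\operatorname{resi}(A_{1k},A_{2k}z)=\operatorname{resi}\Bigl(\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},\begin{pmatrix}a_i & \cdots & a_k\end{pmatrix}\bigr),\ \operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A_{2k}z\bigr)\Bigr)$$

holds for $i=1,2,\dots,k$. Using Theorem 2 on $\operatorname{resi}\bigl(\begin{pmatrix}a_1 & a_2 & \cdots & a_{i-1}\end{pmatrix},\begin{pmatrix}a_i & a_{i+1} & \cdots & a_k & A_{2k}z\end{pmatrix}\bigr)$, we can see that

$$\begin{aligned}
&\frac{\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_i\bigr)\right\|}{\left\|\operatorname{resi}\Bigl(\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A'_{iik}\bigr),\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_i\bigr)\Bigr)\right\|}\\
&\quad\le \sum_{j=i+1}^{k}\frac{\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_j\bigr)\right\|}{\left\|\operatorname{resi}\Bigl(\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A'_{ijk}\bigr),\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_j\bigr)\Bigr)\right\|}\\
&\qquad+\frac{\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A_{2k}z\bigr)\right\|}{\left\|\operatorname{resi}\Bigl(\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},\begin{pmatrix}a_i & \cdots & a_k\end{pmatrix}\bigr),\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A_{2k}z\bigr)\Bigr)\right\|}
\end{aligned}$$

holds. Thus,

$$\begin{aligned}
\frac{\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_i\bigr)\right\|}{\|d'_i\|}
&\le \sum_{j=i+1}^{k}\frac{\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_j\bigr)\right\|}{\|d'_j\|}+\frac{\left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},A_{2k}z\bigr)\right\|}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|}\\
&\le \left\|\operatorname{resi}\bigl(\begin{pmatrix}a_1 & \cdots & a_{i-1}\end{pmatrix},a_i\bigr)\right\|\left(\sum_{j=i+1}^{k}\frac{1}{\|d'_j\|}+\frac{\sqrt{n-k}}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|}\right)
\end{aligned}$$

holds for $i=1,2,\dots,k$ from (12) and (14). Thus,

$$\frac{1}{\|d'_i\|} \le \sum_{j=i+1}^{k}\frac{1}{\|d'_j\|}+\frac{\sqrt{n-k}}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|} \quad (i=1,2,\dots,k) \tag{15}$$

holds. We want to show that

$$\frac{1}{\|d'_i\|} \le \frac{2^{k-i}\sqrt{n-k}}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|} \quad (i=1,2,\dots,k) \tag{16}$$

and prove this using induction in the order of $i=k,k-1,\dots,1$. Applying (15) for $i=k$ gives

$$\frac{1}{\|d'_k\|} \le \frac{\sqrt{n-k}}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|} = \frac{2^{k-k}\sqrt{n-k}}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|}.$$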

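Thus, (16) is shown in case $i=k$. Then, we prove that (16) holds for $i=l$, assuming that (16) holds for $i=l+1,l+2,\dots,k$. We can see that

$$\frac{1}{\|d'_l\|} \le \sum_{j=l+1}^{k}\frac{1}{\|d'_j\|}+\frac{\sqrt{n-k}}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|} \le \frac{\sqrt{n-k}}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|}\left(\sum_{j=l+1}^{k}2^{k-j}+1\right) = \frac{2^{k-l}\sqrt{n-k}}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|}$$

holds from (15) and the assumption of induction. Thus, (16) has been shown in case $i=1,2,\dots,k$. Using Lemma 7 on $A'$,

$$\begin{aligned}
1 &\le (\sigma_{k+1}(A'))^2\left(\sum_{i=1}^{k}\frac{1}{\|d'_i\|^2}+\frac{1}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|^2}\right)\\
&\le \frac{(\sigma_{k+1}(A'))^2}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|^2}\left((n-k)\sum_{i=1}^{k}4^{k-i}+1\right)\\
&= \frac{(\sigma_{k+1}(A'))^2}{\|\operatorname{resi}(A_{1k},A_{2k}z)\|^2}\left(\frac{4^k-1}{3}(n-k)+1\right)
\end{aligned}$$

holds. Thus,

$$\|\operatorname{resi}(A_{1k},A_{2k}z)\| \le \sqrt{\frac{4^k-1}{3}(n-k)+1}\;\sigma_{k+1}(A')$$

holds. Now, if we can show that

$$\sigma_{k+1}(A') \le \sigma_{k+1}(A),$$

then the proof is complete. Considering the fact that

$$\sigma_{k+1}(A)=\max_{\Theta,\ \dim\Theta=k+1}\ \min_{x\in\Theta,\ \|x\|=1}\|Ax\|,$$

we want a subspace $\Theta$ that satisfies

$$\min_{x\in\Theta,\ \|x\|=1}\|Ax\| \ge \sigma_{k+1}(A').$$

Let

$$\Theta'=\operatorname{span}\left\{e_1,e_2,\dots,e_k,\begin{pmatrix}0\\ z\end{pmatrix}\right\}.$$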

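Then, we have $\dim(\Theta')=k+1$ since $\left\{e_1,e_2,\dots,e_k,\begin{pmatrix}0\\ z\end{pmatrix}\right\}$ is linearly independent.

Let $y=(y_i)\in\mathbb{R}^{k+1}$. Since $\begin{pmatrix}e_1 & e_2 & \cdots & e_k & \begin{pmatrix}0\\ z\end{pmatrix}\end{pmatrix}^T\begin{pmatrix}e_1 & e_2 & \cdots & e_k & \begin{pmatrix}0\\ z\end{pmatrix}\end{pmatrix}=I_{k+1}$ holds,

$$\|y\|=1 \iff \left\|\sum_{i=1}^{k}y_ie_i+y_{k+1}\begin{pmatrix}0\\ z\end{pmatrix}\right\|=1 \tag{17}$$

holds. For all $y\in\mathbb{R}^{k+1}$ that satisfy the right-hand side of (17),

$$\left\|A\left(\sum_{i=1}^{k}y_ie_i+y_{k+1}\begin{pmatrix}0\\ z\end{pmatrix}\right)\right\| = \|A'y\| \ge \sigma_{k+1}(A')$$

holds. Then,

$$\sigma_{k+1}(A) \ge \min_{x\in\Theta',\ \|x\|=1}\|Ax\| \ge \sigma_{k+1}(A')$$

holds. ◻

Thus, we have proved that

$$\mathrm{pivotQR}_k(A) \le \sqrt{\frac{4^k-1}{3}(n-k)+1}\;\mathrm{SVD}_k(A).$$

4 Evaluation from below

In this section, we show that the inequality proved in the previous section is tight. An example of matrix $R_h$ with real-valued parameter $h$ that satisfies

$$\frac{\mathrm{pivotQR}_k(R_h)}{\mathrm{SVD}_k(R_h)} \xrightarrow[h\to 0]{} \sqrt{\frac{4^k-1}{3}(n-k)+1}$$

is shown. $R_h$ is as follows:

$$R_h=\begin{pmatrix}
\begin{matrix}1 & 0 & \cdots & 0\\ 0 & h & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & h^k\end{matrix} & O\\
O & O
\end{pmatrix}
\begin{pmatrix}
1 & -\sqrt{1-h^2} & \cdots & -\sqrt{1-h^2} & \cdots & -\sqrt{1-h^2}\\
0 & 1 & \ddots & \ddots & \ddots & \vdots\\
\vdots & \ddots & \ddots & -\sqrt{1-h^2} & \cdots & -\sqrt{1-h^2}\\
0 & \cdots & 0 & 1 & \cdots & 1\\
\multicolumn{6}{c}{O}
\end{pmatrix}.$$

The Kahan matrix is [10]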

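$$K_n=\begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & h & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & h^{n-1}\end{pmatrix}
\begin{pmatrix}1 & -\sqrt{1-h^2} & \cdots & -\sqrt{1-h^2}\\ 0 & 1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & -\sqrt{1-h^2}\\ 0 & \cdots & 0 & 1\end{pmatrix}.$$

Therefore, $R_h$ is the same as the Kahan matrix in case $m=n=k+1$ and is an extension of the Kahan matrix otherwise.

Proposition 3 Let $m\ge n>k$. Define $\Sigma_h\in\mathbb{R}^{m\times n}$, $W_h=(w_{hij})\in\mathbb{R}^{n\times n}$, and $R_h\in\mathbb{R}^{m\times n}$ as follows:

$$\Sigma_h=\begin{pmatrix}\operatorname{diag}(1,h,\dots,h^k,0,0,\dots,0)\\ O\end{pmatrix},$$

$$w_{hij}=\begin{cases}1 & (i=j \text{ and } 1\le i\le k) \text{ or } (i=k+1 \text{ and } k+1\le j\le n),\\ -\sqrt{1-h^2} & (i<j \text{ and } 1\le i\le k),\\ 0 & \text{otherwise},\end{cases}$$

and $R_h=\Sigma_hW_h$ where $0<h<1$. Then,

$$\lim_{h\to 0}\frac{\mathrm{pivotQR}_k(R_h)}{\mathrm{SVD}_k(R_h)} = \sqrt{\frac{4^k-1}{3}(n-k)+1}$$

holds.

Proof Let $Q=\begin{pmatrix}I_n\\ O\end{pmatrix}\in\mathbb{R}^{m\times n}$ and $R=\operatorname{diag}(1,h,\dots,h^k,0,0,\dots,0)W_h\in\mathbb{R}^{n\times n}$. Since $R$ is an upper triangular matrix and $Q^TQ=I_n$ holds, $R_h=QR$ is one QR decomposition. We check (1) for this $R$. Since

$$(\text{left side of (1)}) = h^{2l-2} = (1-h^2)\sum_{i=l}^{\min(j,k+1)-1}h^{2i-2}+h^{2\min(j,k+1)-2} = (\text{right side of (1)}),$$

(1) holds for $l=1,2,\dots,k+1$, $j=l+1,l+2,\dots,n$. Obviously (1) also holds for $l=k+2,k+3,\dots,n-1$, $j=l+1,l+2,\dots,n$. As in Sect. 2, let $R$ be partitioned as

$$R=\begin{pmatrix}R_{1k} & R_{2k}\\ O & R_{3k}\end{pmatrix}$$

where $R_{1k}\in\mathbb{R}^{k\times k}$. Then,

$$\mathrm{pivotQR}_k(R_h)=\|R_{3k}\|_2$$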

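holds. Define $V\in\mathbb{R}^{(n-k)\times(n-k)}$ and $v_1\in\mathbb{R}^{n-k}$ as follows:

$$V=\begin{pmatrix}v_1 & v_2 & \cdots & v_{n-k}\end{pmatrix},\quad v_1=\frac{1}{\sqrt{n-k}}\begin{pmatrix}1 & 1 & \cdots & 1\end{pmatrix}^T$$

where $v_2,v_3,\dots,v_{n-k}$ are chosen such that $V^TV=I_{n-k}$ holds. We can choose them freely as long as this is satisfied. Since

$$R_{3k}=h^k\sqrt{n-k}\begin{pmatrix}v_1 & 0 & \cdots & 0\end{pmatrix}^T$$

holds, $\|R_{3k}\|_2=h^k\sqrt{n-k}$ holds. We consider the value of $\mathrm{SVD}_k(R_h)=\sigma_{k+1}(R_h)$. Considering the fact that

$$\sigma_{k+1}(R_h)=\min_{\Theta,\ \dim\Theta=n-k}\ \max_{x\in\Theta,\ \|x\|=1}\|R_hx\|,$$

we want a subspace $\Theta$ whose $\max_{x\in\Theta,\|x\|=1}\|R_hx\|$ is small. Since $v_1^Tv_i=0$ holds for $i=2,3,\dots,n-k$,

$$R_h\begin{pmatrix}0\\ v_i\end{pmatrix}=\begin{pmatrix}R_{2k}\\ R_{3k}\\ O\end{pmatrix}v_i=\begin{pmatrix}-\sqrt{(n-k)(1-h^2)}\,h^0\\ -\sqrt{(n-k)(1-h^2)}\,h^1\\ \vdots\\ -\sqrt{(n-k)(1-h^2)}\,h^{k-1}\\ \sqrt{n-k}\,h^k\\ 0\\ \vdots\\ 0\end{pmatrix}v_1^Tv_i=0$$

holds for $i=2,3,\dots,n-k$. We define $y_j=1$ for $j=k+1,\dots,n$ and define $y_j$ from $j=k$ down to $j=1$ as $y_j=\sqrt{1-h^2}\sum_{i=j+1}^{n}y_i$. We define $y\in\mathbb{R}^n$ as $\begin{pmatrix}y_1 & y_2 & \cdots & y_n\end{pmatrix}^T$. Then,

$$R_hy=\begin{pmatrix}0 & 0 & \cdots & 0 & (n-k)h^k & 0 & 0 & \cdots & 0\end{pmatrix}^T$$

holds. Since

$$\lim_{h\to 0}\|y\| = \left\|\begin{pmatrix}(n-k)2^{k-1} & (n-k)2^{k-2} & \cdots & (n-k)2^0 & 1 & 1 & \cdots & 1\end{pmatrix}\right\| = \sqrt{\frac{4^k-1}{3}(n-k)^2+n-k}$$

holds,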