

2.5. Singular values

$i_1 < \cdots < i_h$ represent the indexes corresponding to nonzero rows in $\bar R$. Using this notation, $[y_1 \cdots y_k]L_1^T L_2^T \cdots L_j^T = [\bar q_1\ \bar q_2 \cdots \bar q_j\ y_{j+1} \cdots y_k]$, $j \le k$, and $[\bar q_1 \cdots \bar q_k]L_{k+1}^T = [q_1 \cdots q_h]$ is a restatement of the above Gram-Schmidt orthogonalization.
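As a concrete illustration, the following numpy sketch (all names mine; close in spirit to the column-oriented modified Gram-Schmidt process referenced in the comments below, not a verbatim transcription of <2.3>/<2.9>) records linearly dependent columns as zero rows of $\bar R$ instead of pivoting:

```python
import numpy as np

def gram_schmidt(Y, tol=1e-12):
    """Modified Gram-Schmidt; columns of Y that depend linearly on their
    predecessors leave a zero row in the triangular factor (no pivoting)."""
    Y = np.asarray(Y, dtype=float)
    m, k = Y.shape
    R = np.zeros((k, k))
    qs, idx = [], []                 # accepted orthonormal columns, indexes i_1 < ... < i_h
    for j in range(k):
        v = Y[:, j].copy()
        for i, q in zip(idx, qs):    # orthogonalize against earlier q's one at a time
            R[i, j] = q @ v
            v -= R[i, j] * q
        norm = np.linalg.norm(v)
        if norm > tol:               # nonzero residual: a new basis direction
            R[j, j] = norm
            qs.append(v / norm)
            idx.append(j)
        # else: row j of R stays zero (linear dependence)
    Q = np.column_stack(qs)
    return Q, R[idx, :]              # keep only the nonzero rows of R-bar

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 4))
Y[:, 2] = Y[:, 0] + Y[:, 1]          # force a dependent column
Q, R = gram_schmidt(Y)
assert np.allclose(Q @ R, Y) and np.allclose(Q.T @ Q, np.eye(Q.shape[1]))
```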

[Figure 2.5 — illustration with axes $\operatorname{span}\{x\}$ and $(\operatorname{span}\{x\})^\perp$, the points $x$, $y$, and $\tilde y$, and the principal semi-axes $\sigma_1u_1$ and $\sigma_2u_2$ of the ellipse $\{\sum_{i\le 2}\langle u_i,\cdot\rangle^2/\sigma_i^2 = 1\}$.]

Figure 2.5. The figure illustrates that the set of images $\{[x\ y]c \mid \|c\| = 1\}$ of unit length $c\in\mathbb R^2$ under $[x\ y]$ forms an ellipse $\{\sum_{i\le 2}\langle u_i,\cdot\rangle^2/\sigma_i^2 = 1\}$ with principal semi-axes $\sigma_1u_1$ and $\sigma_2u_2$. Moreover, the residual $\tilde y$ from orthogonally projecting $y$ onto $\operatorname{span}\{x\}$ lies outside of that ellipse.

The coefficients $\sigma_1,\dots,\sigma_h$ are called singular values. Accordingly, the representation $Y = \sum_{i\le h}\sigma_iu_i\langle v_i,\cdot\rangle$ is referred to as a singular value decomposition of $Y$. The latter generalizes figure 2.5: the set of images $\{Yc \mid \|c\| = 1\}$ of unit length vectors $c\in\mathbb R^k$ forms an ellipse $\{\sum_{i\le k}\langle u_i,\cdot\rangle^2/\sigma_i^2 = 1\}$.
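This geometry is easy to probe numerically. The following sketch (assuming numpy; all variable names are mine) checks that the images of unit vectors satisfy the stated ellipse equation:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(3, 2))                      # a map R^2 -> R^3 in matrix form
U, s, Vt = np.linalg.svd(Y, full_matrices=False) # Y = sum_i s[i] u_i <v_i, .>

theta = np.linspace(0, 2 * np.pi, 200)
C = np.vstack([np.cos(theta), np.sin(theta)])    # unit length c in R^2
images = Y @ C                                   # the set {Yc : ||c|| = 1}

# ellipse equation: sum_i <u_i, Yc>^2 / sigma_i^2 = 1 for every unit c
lhs = ((U.T @ images) ** 2 / s[:, None] ** 2).sum(axis=0)
assert np.allclose(lhs, 1.0)
```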

2.5.2. Unitarily invariant norms

Singular vectors of an element $Y\in W^{\times k}$ may not be unique. In contrast, the singular values of $Y$ are uniquely determined via $\sigma_j = \inf_{X\in W^{\times k},\,\operatorname{rk}X<j}\ \sup_{\|u\|=1}\|(Y-X)u\|$, $j\le h = \operatorname{rk}Y$. Firstly, the latter expression cannot exceed $\sigma_j$ as $X = \sum_{i<j}\sigma_iu_i\langle v_i,\cdot\rangle\in W^{\times k}$ has rank $j-1$ and $\sup_{\|u\|=1}\|(Y-X)u\| = \sigma_j$. Secondly, if $X\in W^{\times k}$ has rank less than $j$, then $Xv_1,\dots,Xv_j$—wherein $v_1,\dots,v_j$ provide (an initial stretch of a sequence of) right singular vectors of $Y$, as constructed in section 2.5.1—are $j$ elements of the at most $(j-1)$-dimensional space $\operatorname{img}X$ and therefore exhibit linear dependence. Thus, there exists a unit length element $v\in\operatorname{span}\{v_1,\dots,v_j\}\cap\ker X$ and consequently $\|(Y-X)v\|\ge\sigma_j$. The notation $\sigma_1(Y),\dots,\sigma_h(Y)$ highlights the uniqueness implied by the approximation error characterization. In addition, the latter suggests setting $\sigma_{h+1}(Y) = \cdots = \sigma_k(Y) = 0$ and implies the invariance of singular values to composition—from left or right—with suitable unitary maps. The most important elements of the singular value sequence $\sigma_1,\dots,\sigma_k$ are the maximal and the least nonzero singular value, which thereby deserve their own special symbols: $\sigma_{\max} = \sigma_1$ and $\sigma_{\min,\neq 0} = \sigma_h$, respectively.
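The approximation error characterization can be verified numerically; a short sketch (names mine) compares $\sigma_j$ with the operator-norm error of the rank-$(j-1)$ truncation of the singular value decomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

for j in range(1, len(s) + 1):
    X = U[:, :j-1] @ np.diag(s[:j-1]) @ Vt[:j-1, :]   # rank j-1 truncation
    err = np.linalg.norm(Y - X, ord=2)                # sup_{||u||=1} ||(Y-X)u||
    assert np.isclose(err, s[j-1])                    # attains the infimum sigma_j
```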

Several norms on $W^{\times k}$ (and thus $\mathbb R^{m\times k}$) measure the length of an element $Y\in W^{\times k}$ in terms of its singular values. Such norms are invariant to composition with unitary maps and are therefore called unitarily invariant. The relevant examples include

(d1) the Frobenius norm of $Y\in W^{\times k}$ defined by $\|Y\| = \langle Y, Y\rangle^{1/2}$—see (d) in section 2.1.3—which equals the square root of $\sum_{i\le k}\sigma_i^2(Y)$;

(d2) the operator norm $\|Y\|_{op} = \sup_{\|c\|=1}\|Yc\|$, which coincides with $\sigma_{\max}(Y)$; and,

(d3) lastly, the sum $\sum_{i\le k}\sigma_i(Y)$, which supplies the nuclear norm $\|Y\|_{nuc}$ of $Y$.
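In numpy terms, (d1)–(d3) and the unitary invariance read as follows; this is an illustrative sketch, with all identifiers chosen here:

```python
import numpy as np

rng = np.random.default_rng(3)
Y = rng.normal(size=(5, 3))
s = np.linalg.svd(Y, compute_uv=False)

assert np.isclose(np.linalg.norm(Y, 'fro'), np.sqrt((s ** 2).sum()))  # (d1)
assert np.isclose(np.linalg.norm(Y, 2), s.max())                      # (d2)
assert np.isclose(np.linalg.norm(Y, 'nuc'), s.sum())                  # (d3)

# unitary invariance: composing with a unitary map leaves all three unchanged
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
for ord_ in ('fro', 2, 'nuc'):
    assert np.isclose(np.linalg.norm(Q @ Y, ord_), np.linalg.norm(Y, ord_))
```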

The triangle inequality for the nuclear norm $\|\cdot\|_{nuc}$ follows from its characterization as the dual norm of $\|\cdot\|_{op}$. In general, the dual norm of an arbitrary norm $\|\cdot\|_0$ on a Euclidean space is given by $\sup_{\|x\|_0=1}\langle x,\cdot\rangle$. Section 2.1.3 shows that $\|x\| = \langle x, x\rangle^{1/2}$ equals its own dual, which implies the Cauchy-Schwarz inequality $|\langle x, y\rangle|\le\|x\|\|y\|$. More generally, one has $|\langle x, y\rangle|\le\big(\sup_{\|z\|_0=1}\langle z, y\rangle\big)\|x\|_0$. This inequality suggests a symmetric relation. In fact, the norm $\|\cdot\|_0$ equals the dual norm of $\sup_{\|x\|_0=1}\langle x,\cdot\rangle$.

The nuclear norm $\|\cdot\|_{nuc}$ in (d3) satisfies $\langle Y, X\rangle = \sum_{i\le h}\sigma_i\langle u_i, Xv_i\rangle\le\sum_{i\le h}\sigma_i$ whenever $\|X\|_{op} = 1$ and $Y = \sum_{i\le h}\sigma_iu_i\langle v_i,\cdot\rangle$ represents a singular value decomposition of $Y$. Equality holds for $X = \sum_{i\le h}u_i\langle v_i,\cdot\rangle$. Thus, $\|\cdot\|_{nuc}$ and $\|\cdot\|_{op}$ are duals. The resulting inequality $|\langle Y, X\rangle|\le\|Y\|_{op}\|X\|_{nuc}$ can be refined using the representation

$W^{\times k}\ni Y = \sum_{i\le\operatorname{rk}Y}\sigma_iu_i\langle v_i,\cdot\rangle = \sum_{i\le\operatorname{rk}Y}(\sigma_i - \sigma_{i+1})\sum_{j\le i}u_j\langle v_j,\cdot\rangle = \sum_{i\le\operatorname{rk}Y}c_iY_i,$

wherein $\sigma_{\operatorname{rk}Y+1} = 0$, $c_i = \sigma_i - \sigma_{i+1}\ge 0$, and all singular values of $Y_i = \sum_{j\le i}u_j\langle v_j,\cdot\rangle$ are unity. If $X\in W^{\times k}$ is analogously represented as $\sum_{i\le\operatorname{rk}X}\bar c_iX_i$, then

$\langle X, Y\rangle \le \sum_{i\le\operatorname{rk}Y}\sum_{j\le\operatorname{rk}X}c_i\bar c_j|\langle Y_i, X_j\rangle| \le \sum_{i\le\operatorname{rk}Y}\sum_{j\le\operatorname{rk}X}c_i\bar c_j\min\{i, j\} = \sum_{i\le\operatorname{rk}Y}\sum_{j\le\operatorname{rk}X}c_i\bar c_j\operatorname{tr}B_iB_j = \sum_{i\le k}\sigma_i(Y)\sigma_i(X),$ <2.10>

wherein $B_i = \sum_{j\le i}B_{j,j}$ with $B_{j,j}$ being the $j,j$-th element of the standard basis of $\mathbb R^{k\times k}$, and the second step follows from $\|\cdot\|_{op}/\|\cdot\|_{nuc}$-duality. The comparison between the leftmost and rightmost term in <2.10> is known as the von Neumann trace inequality.
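A quick numerical probe of the von Neumann trace inequality (an illustrative sketch; names mine):

```python
import numpy as np

rng = np.random.default_rng(4)
for _ in range(100):
    X = rng.normal(size=(5, 4))
    Y = rng.normal(size=(5, 4))
    inner = np.trace(Y.T @ X)                  # <Y, X> on R^{5x4}
    sY = np.linalg.svd(Y, compute_uv=False)
    sX = np.linalg.svd(X, compute_uv=False)
    assert inner <= (sY * sX).sum() + 1e-10    # von Neumann trace inequality
```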

Finally, the representations in (d1), (d2), and (d3) imply the inequalities

$\|Y\|_{op}^2 \le \|Y\|^2 \le \sum_{i,j\le\operatorname{rk}Y}\sigma_i\sigma_j = \|Y\|_{nuc}^2 = \langle(\sigma_1,\dots,\sigma_h), (1,\dots,1)\rangle^2 \le \operatorname{rk}Y\,\|Y\|^2 \le (\operatorname{rk}Y)^2\|Y\|_{op}^2$

for $Y\in W^{\times k}$, wherein the second-to-last step is the Cauchy-Schwarz inequality. The rank-related factors in the preceding display refer to a given $Y\in W^{\times k}$. The respective upper subspace compatibility constants—as defined in section 2.1.2—are given by $(\min\{\dim W, k\})^{1/2}$ and its square—the maximum possible rank of $Y\in W^{\times k}$.

2.5.3. Singular space pairs

Left and right singular vectors are—at most—unique up to a sign choice. In particular, $\sigma_ju_j\langle v_j,\cdot\rangle$ remains unchanged if both $u_j$ and $v_j$ are multiplied by $-1$. At the other extreme, a unitary map from $\mathbb R^k$ to a Euclidean space $W$ has all its singular values equal to unity, and every orthonormal basis of $\mathbb R^k$ may serve as its right singular vectors.

This section considers $Y\in W^{\times k}$ with $\operatorname{rk}Y = h > 0$ to address the general case. If the least nonzero singular value $\sigma_h$ of $Y$ is attained at two unit length elements $v_h$ and $v_h'\notin\operatorname{span}\{v_h\}$, then the residual $\tilde v_h'$ from orthogonally projecting $v_h'$ onto $\operatorname{span}\{v_h\}$—a linear combination of $v_h, v_h'$—is a nonzero element of $(\ker Y_0)^\perp$, wherein $Y_0 = Y - \sigma_hu_h\langle v_h,\cdot\rangle$ and $u_h$ symbolizes the left singular vector corresponding to $v_h$. Hence, this residual provides a valid choice for $v_{h-1}$ in form of $\tilde v_h'/\|\tilde v_h'\|$. Consequently, $\langle Y\tilde v_h', Yv_h\rangle = 0$ implies $\sigma_h^2 = \|Yv_h'\|^2 = \|Y\tilde v_h'\|^2 + \|Y\hat v_h'\|^2 = \|Y\tilde v_h'\|^2 + \sigma_h^2\langle v_h, v_h'\rangle^2$, wherein $\hat v_h' = v_h' - \tilde v_h' = v_h\langle v_h, v_h'\rangle$. Hence, one has $\|Y\tilde v_h'\|^2 = \sigma_h^2(1 - \langle v_h', v_h\rangle^2) = \sigma_h^2\|\tilde v_h'\|^2$ and thereby $\sigma_h = \sigma_{h-1}$. Elements $c_1v_h + c_2v_{h-1}$ of $\operatorname{span}\{v_h, v_{h-1}\}$, wherein $v_{h-1} = \tilde v_h'/\|\tilde v_h'\|$, satisfy $\|Y(c_1v_h + c_2v_{h-1})\|^2 = \sigma_h^2(c_1^2 + c_2^2)$, thus, are elements of $\{\|Y\cdot\| = \sigma_h\|\cdot\|\}$.

Either $\operatorname{span}\{v_h, v_{h-1}\}$ equals the latter set or that set contains an element $v_{h-1}'\notin\operatorname{span}\{v_h, v_{h-1}\}$. In the latter case, the residual $\tilde v_{h-1}'$ from orthogonally projecting $v_{h-1}'$ onto the span of $v_h$ and $v_{h-1}$ is nonzero and leads to a candidate $\tilde v_{h-1}'/\|\tilde v_{h-1}'\|$ for $v_{h-2}$. Then it follows that $\langle Y\tilde v_{h-1}', Yv\rangle = 0$ for all $v\in\operatorname{span}\{v_h, v_{h-1}\}$ and thereby $\sigma_{h-2} = \sigma_h$. Hence, $\operatorname{span}\{v_h, v_{h-1}, v_{h-2}\}\subset\{\|Y\cdot\| = \sigma_h\|\cdot\|\}$. If the latter sets differ, then a further iteration is possible. The recursion stops after $m_h\le h$ steps. It identifies $\{\|Y\cdot\| = \sigma_h\|\cdot\|\}$ as a subspace and generates a corresponding orthonormal basis $v_{h-m_h+1},\dots,v_h$.

This recursive argument is applicable to the reduced map $Y - \sigma_h\sum_{j=0}^{m_h-1}u_{h-j}\langle v_{h-j},\cdot\rangle$, wherein $u_j$ again represents the left singular vector corresponding to $v_j$, and so forth. Eventually, these arguments construct subspaces $V_j = \{\|Y\cdot\| = \bar\sigma_j\|\cdot\|\}$, $j\le s$, corresponding to the distinct (nonzero) singular values $\bar\sigma_1 > \cdots > \bar\sigma_s(=\sigma_h) > 0$ of $Y$.

The characterization $V_j = \{\|Y\cdot\| = \bar\sigma_j\|\cdot\|\}$ guarantees that these subspaces are uniquely determined by $Y$. Elements $v\in V_i$, $v'\in V_j$, $i\ne j$, are orthogonal, and the sum $\sum_{j\le s}V_j = \{\sum_{j\le s}v_j \mid v_j\in V_j,\ j\le s\}$ of these subspaces equals $(\ker Y)^\perp$. The dimension $\dim V_j$ supplies the multiplicity $m_j$ of $\bar\sigma_j$, and therefore one has $h = \operatorname{rk}Y = \sum_{j\le s}m_j$. Moreover, the images $U_j$ of $Y$ restricted to $V_j$ satisfy $U_i\perp U_j$, $i\ne j$, and their sum equals $\operatorname{img}Y$. The pairs $(V_j, U_j)$ are called the singular subspace pairs for $Y$. The valid selections of right and left singular vectors of $Y$ consist of arbitrarily chosen orthonormal bases for $V_j$, $j\le s$, and the corresponding scaled images, respectively.
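The following sketch (names mine) illustrates this freedom for a singular value of multiplicity two: rotating an orthonormal basis within the corresponding singular subspace and rescaling the images yields an equally valid singular value decomposition of the same $Y$.

```python
import numpy as np

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.normal(size=(4, 3)))     # orthonormal columns in W = R^4
V, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # orthonormal basis of R^3
s = np.array([2.0, 2.0, 1.0])                    # sigma_1 = sigma_2: multiplicity 2
Y = U @ np.diag(s) @ V.T

# rotate the right singular vectors within the first singular subspace V_1 ...
t = 0.7
G = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
V2 = V.copy(); V2[:, :2] = V[:, :2] @ G
# ... and take the correspondingly scaled images as left singular vectors
U2 = Y @ V2 / s                                  # columns Y v_i / sigma_i

assert np.allclose(U2 @ np.diag(s) @ V2.T, Y)    # an equally valid SVD of Y
```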

2.5.4. Singular vectors of symmetric matrices

The symmetry of $A\in\mathbb S^m$ is reflected by its singular vectors.

Lemma 2.4. If $A\in\mathbb S^m$ is of rank $h > 0$, then there exists a choice of right singular vectors $v_1,\dots,v_h$ with corresponding left singular vectors $u_1,\dots,u_h$ such that $u_i = \pm v_i$.

A proof of lemma 2.4 starts on page 40 in appendix 2.b. This result guarantees that the approximation error characterization, the duality relations, and the inequalities in section 2.5.2 are equally valid—and follow by the same arguments—if the symmetric matrices are considered in isolation. In particular, any two symmetric matrices $A, B\in\mathbb S^m$ satisfy $\langle A, B\rangle\le\sum_{i\le m}\sigma_i(A)\sigma_i(B)\le\|A\|_{op}\|B\|_{nuc}$, wherein equality is possible.

A representation $A = \sum_{i\le h}\pm\sigma_iv_i\langle v_i,\cdot\rangle$ as in lemma 2.4 is called a spectral decomposition of $A$. This expression reveals that the singular subspaces $V_j$ are sums of the two subspaces $V_j^+ = \ker(\bar\sigma_j\operatorname{id} - A)$ and $V_j^- = \ker(\bar\sigma_j\operatorname{id} + A)$ with $V_j^+\subset(V_j^-)^\perp$. Thus, $A$ is of the form $\sum_{i\le s}\bar\sigma_i(P_{V_i^+} - P_{V_i^-})$, wherein $\bar\sigma_1,\dots,\bar\sigma_s$, $P_V$, and $\operatorname{id}$ represent the distinct nonzero singular values of $A$, the orthogonal projector onto a subspace $V\subset\mathbb R^m$, and the identity map on $\mathbb R^m$, respectively. Such (projector-based) representations are unique and show that the linear space of symmetric matrices $\mathbb S^m$ provides the ($\subset$-)smallest subspace of $\mathbb R^{m\times m}$ containing all orthogonal projectors onto subspaces of $\mathbb R^m$. By definition, at most one of the two subspaces $V_j^+$ and $V_j^-$, $j\le s$, may equal $\{0\}$. In particular, if $A$ is positive semidefinite, then $V_j^- = \{0\}$ for all $j\le s$ and $\operatorname{tr}A = \sum_{i\le\operatorname{rk}A}\sigma_i\operatorname{tr}(u_iu_i^T) = \|A\|_{nuc}$ as $\operatorname{tr}(u_iu_i^T) = \sum_{j\le m}u_{j,i}^2 = \|u_i\|^2 = 1$.
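A sketch of the projector-based representation in terms of numpy's symmetric eigendecomposition (the grouping by distinct absolute values is my own bookkeeping):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                       # a symmetric matrix

w, Q = np.linalg.eigh(A)                # A = Q diag(w) Q^T
rebuilt = np.zeros_like(A)
for sigma in np.unique(np.abs(w)):      # distinct singular values of A
    Qp = Q[:, np.isclose(w, sigma)]     # orthonormal basis of V^+ = ker(sigma*id - A)
    Qm = Q[:, np.isclose(w, -sigma)]    # orthonormal basis of V^- = ker(sigma*id + A)
    # projector-based term: sigma * (P_{V^+} - P_{V^-})
    rebuilt += sigma * (Qp @ Qp.T - Qm @ Qm.T)

assert np.allclose(rebuilt, A)
```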

The above terminology allows a complete characterization of the $Y\in W^{\times k}$ such that $\langle Y, X\rangle = \|X\|_{op}\|Y\|_{nuc}$ for a given $X\in W^{\times k}$. A proof starts on page 40 in appendix 2.b.

Lemma 2.5. Let $X$ be a nonzero linear map from $\mathbb R^k$ to a Euclidean space $W$ and $X = \sum_{i\le\operatorname{rk}X}\sigma_iu_i\langle v_i,\cdot\rangle$ be any singular value decomposition of $X$; then the equality $\langle Y, X\rangle = \|X\|_{op}$ holds for a unit $\|\cdot\|_{nuc}$-length $Y\in W^{\times k}$ if and only if there exists a positive semidefinite $S\in\mathbb S^{m_1}$ with $\operatorname{tr}S = 1$ and

$Y = [u_1\cdots u_{m_1}]\,S\,\langle\langle[v_1\cdots v_{m_1}],\cdot\rangle\rangle,$

wherein $m_1$ denotes the multiplicity of the largest (distinct) singular value $\bar\sigma_1$ of $X$.

If $A\in\mathbb S^m$ and the columns $u_1^+,\dots,u_{m_1'}^+$ of $U_1^+$ and $u_1^-,\dots,u_{m_1''}^-$ of $U_1^-$ form orthonormal bases of $V_1^\pm = \ker(A\mp\bar\sigma_1(A)\operatorname{id})$, respectively, then $m_1' + m_1'' = m_1$, the multiplicity of $\bar\sigma_1(A)$, and $(u_i^+, u_i^+)$ as well as $(u_j^-, -u_j^-)$ provide suitable singular vector pairs. Corollary 2.6—proved on page 41 in appendix 2.b—states the resulting representation.

Corollary 2.6. For every symmetric $B\in\{\|\cdot\|_{nuc} = 1\}$ with $\langle A, B\rangle = \|A\|_{op}$, there exist positive semidefinite $S^+\in\mathbb S^{m_1'}$ and $S^-\in\mathbb S^{m_1''}$ such that

$B = U_1^+S^+\langle\langle U_1^+,\cdot\rangle\rangle - U_1^-S^-\langle\langle U_1^-,\cdot\rangle\rangle$

and $\operatorname{tr}S^+ + \operatorname{tr}S^- = 1$. Moreover, there exists a selection of bases such that $S^+$ and $S^-$ are diagonal matrices, that is, all non-diagonal entries of these matrices are zero.
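The equality case of lemma 2.5 can be reproduced numerically; the sketch below (all names mine) builds $X$ with a largest singular value of multiplicity $m_1 = 2$, picks an arbitrary positive semidefinite $S$ with unit trace, and checks the asserted properties of $Y = U_1S\langle\langle V_1,\cdot\rangle\rangle$:

```python
import numpy as np

rng = np.random.default_rng(7)
U1, _ = np.linalg.qr(rng.normal(size=(5, 2)))    # left block for sigma-bar_1 = 3
V1, _ = np.linalg.qr(rng.normal(size=(4, 2)))    # right block for sigma-bar_1 = 3
u3, v3 = rng.normal(size=5), rng.normal(size=4)
u3 -= U1 @ (U1.T @ u3); u3 /= np.linalg.norm(u3) # orthogonal to the columns of U1
v3 -= V1 @ (V1.T @ v3); v3 /= np.linalg.norm(v3) # orthogonal to the columns of V1
X = 3.0 * U1 @ V1.T + 1.0 * np.outer(u3, v3)     # singular values 3, 3, 1

# any positive semidefinite S with trace 1 yields an extremal Y = U1 S V1^T
G = rng.normal(size=(2, 2)); S = G @ G.T; S /= np.trace(S)
Y = U1 @ S @ V1.T

assert np.isclose(np.linalg.norm(Y, 'nuc'), 1.0)             # unit nuclear norm
assert np.isclose(np.trace(Y.T @ X), np.linalg.norm(X, 2))   # <Y,X> = ||X||_op
```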

Comments and references

Section 2.1 Halmos (1974) covers most of the topics of section 2.1 in depth. Note that the usage of the term coordinates in this text is nonstandard. A more detailed treatment of matrix norms may be found in Golub and Van Loan (2013, sec. 2.3). Vershynin (2012, sec. 5.2.2) defines ε-nets and covering numbers. Lemma 2.1 equals his lemma 5.2. At first sight, the covering of the ε/2-balls by $\{\|\cdot\|\le 1 + \varepsilon/2\}$ in the proof given in appendix 2.b may seem overly generous. However, more refined replacements for this set do not lead to more informative upper bounds on the covering number $N(\{\|\cdot\| = 1\}, \|\cdot\|, \varepsilon)$.

The presentation of Euclidean geometry in Strang (2005, chapter 3) exhibits a similar style as section 2.1.3. Specifically, his figures 3.1a, 3.6, and 3.7 closely resemble figure 2.1 [Panel (A)], figure 2.3 [Panel (A)], and figure 2.1 [Panel (B)], respectively. Kailath et al. (2000, appendix 4.A) supply the examples of section 2.1.1 except for the symmetric matrices. Borwein and Lewis (2010, sec. 1.2) fill this gap. The matrix-like notation for linear maps $\mathbb R^k\to V$ is borrowed from Morf and Kailath (1975, section IV).

Section 2.2 The motivation of the concept of a unitary map is taken from Halmos (1974, §73). The term representation is borrowed from Parzen (1961, def. 4A).

If $y_1,\dots,y_k\in\mathbb R^m$, then $Y = QR$ in lemma 2.2 is known as a QR decomposition (Golub and Van Loan, 2013, sec. 5.2). Björck (1996, sec. 2.4.2) treats several variants of Gram-Schmidt orthogonalization. The formulation in <2.3> replicates his algorithm 2.4.4 and is therein termed column oriented modified Gram-Schmidt process. The present treatment of the case with linear dependence is nonstandard; the usual treatment (Björck, 1996, rem. 2.4.5) involves a rearrangement of the input sequence—called pivoting.

Section 2.3 Halmos (1974, §73, §41) provides a similar treatment of orthogonal and oblique projectors. Furthermore, his sections §18, §19, and §20 prove the assertions about complements. Figure 2 of Wedin (1983, p. 266) illustrates the corresponding decomposition into two projections in similar fashion as in panel (A) of figure 2.4. The notation for orthogonal and oblique projectors is close to Galántai (2008, sec. 2).

Section 2.4 The linear map $\langle\langle Y,\cdot\rangle\rangle$ (on $W$) amounts to the adjoint of $Y$ (Halmos, 1974, §44). Gramians are defined in Kailath et al. (2000, appendix 4.A, (4.A.2)). Doz et al. (2011, sec. 3) employ Gramian substitutes to derive (associated) oblique projections.

The corollary of lemma 2.3 on the existence of a Euclidean space supporting $\langle\cdot,\cdot\rangle$ is often proved by showing—by induction—that a Cholesky factorization with positive semidefinite input $A$ finishes successfully (Trefethen and Bau, 1997, par. before thm. 23.1). The geometric approach taken here is an elementary version of property (4) of Aronszajn (1950, part I, sec. 2). The proof of the chosen inner product being well defined parallels Schölkopf and Smola (2002, sec. 2.2, pp. 32–33).

Golub and Van Loan (2013, sec. 4.2) treat Cholesky factorization for linearly independent $y_1,\dots,y_k$; however, they use the term Cholesky factor for the lower triangular $R^T$. Their equivalent to <2.9> in Golub and Van Loan (2013, algorithm 4.2.1)—called gaxpy Cholesky—differs accordingly. Therein, linear independence leads to a (unique) Cholesky factor with positive diagonal entries; this is the common usage of this term.

Kailath et al. (2000, prob. 12.3) mention the version of $\bar R$—the triangular matrix generated by <2.9>—with nonnegative diagonal entries but under a different name. They also discuss the Gram-Schmidt/Cholesky correspondence in their section 4.4. Eubank (2006, sec. 1.2.3) stresses the identical output of two very similar algorithms.

The representation of a Cholesky factorization as pre-multiplications with lower triangular matrices is from Trefethen and Bau (1997, lecture 23, pp. 173–174, algorithm 23.1).

Section 2.5 Anderson (1958, sec. 11.2, thm. 11.2.1) constructs (left) singular vectors under the alternative label principal components but in opposite order. His construction treats singular values and vectors via eigen-theory; this approach is a widespread alternative to the topics of this section (Stewart and Sun, 1990, I.3, I.4).

Trefethen and Bau (1997, lecture 4) motivate the notion of a (reduced) singular value decomposition by geometric arguments; in particular, figure 2.5 resembles their figure 4.1. Golub and Van Loan (2013, proof of thm. 2.4.1, thm. 2.4.8) justify the orthogonality of $u_{h-1}$ and $u_h$ in essentially the same way but in reverse direction. They refer to the approximation error characterization of singular values as the Eckart-Young theorem and provide the corresponding argument.

Recht et al. (2010, sec. 2, prop. 2.1) represent the norms (d1), (d2), and (d3) using singular values. Their section 2 also defines the concept of a dual norm and derives the nuclear norm/operator norm duality. The nuclear norm $\|\cdot\|_{nuc}$ is also known as the trace norm or Schatten-1-norm. The Ky-Fan-$h$-norm equals the sum of the $h$ largest singular values; thus, both $\|\cdot\|_{op}$ and $\|\cdot\|_{nuc}$ are of this type. Alternative names for $\|\cdot\|$ in (d1) include Schatten-2-norm and Hilbert-Schmidt norm. Unitarily invariant norms are the topic of Stewart and Sun (1990, II.3). The proof of the von Neumann trace inequality is due to Grigorieff (1991). Stewart (1973, def. 6.1) contains a comparable, but less restrictive notion of singular space pairs. Halmos (1974, §79, thm. 1) presents the projector-based spectral decomposition. Lemma 2.5 amounts to theorem 4.3 of Ziętak (1988).

Appendixes Pollard (2002, ch. 2) contains the relevant results on $L_2$-spaces. Therein, part (iii) of Pollard (2002, ch. 2, sec. 6, lem. 26) supplies the techniques used to quantify the influence of the choice of basis element representatives.

The present approach yields a reproducing kernel Hilbert space (Schölkopf and Smola, 2002, def. 2.9). Aronszajn (1950, sec. 3) shows that for a given choice of representatives $q_1,\dots,q_k$ the associated reproducing kernel is of the form $K(\omega, \omega') = \sum_{i\le k}q_i(\omega)q_i(\omega')$. Its reproducing property yields the evaluation functional $f_\omega(y) = \langle\sum_{i\le k}q_i(\omega)q_i, y\rangle$, which is implicitly used in section 2.4.1.

Anderson, T. W. (1958). An introduction to multivariate statistical analysis. Wiley series in probability and mathematical statistics. New York: Wiley.

Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society 68(3), 337–404.

Björck, Å. (1996). Numerical methods for least squares problems. SIAM.

Borwein, J. M. and A. S. Lewis (2010). Convex analysis and nonlinear optimization: theory and examples (2 ed.), Volume 3. Springer.

Doz, C., D. Giannone, and L. Reichlin (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics 164(1), 188–205.

Eubank, R. L. (2006). A Kalman filter primer, Volume 186 of Statistics. Chapman & Hall/CRC.

Galántai, A. (2008). Subspaces, angles and pairs of orthogonal projections. Linear and Multilinear Algebra 56(3), 227–260.

Golub, G. H. and C. F. Van Loan (2013). Matrix computations (4 ed.). Johns Hopkins studies in the mathematical sciences. Baltimore, Md.: Johns Hopkins Univ. Press.

Grigorieff, R. D. (1991). A note on von Neumann's trace inequality. Math. Nachr. 151, 327–328.

Halmos, P. R. (1974). Finite-dimensional vector spaces (Reprint of the 2nd ed.). New York: Springer.

Kailath, T., A. H. Sayed, and B. Hassibi (2000). Linear estimation, Volume 1. Prentice Hall, New Jersey.

Morf, M. and T. Kailath (1975). Square-root algorithms for least-squares estimation. IEEE Transactions on Automatic Control 20(4), 487–497.

Parzen, E. (1961). An approach to time series analysis. The Annals of Mathematical Statistics 32(4), 951–989.

Pollard, D. (2002). A user's guide to measure theoretic probability, Volume 8 of Cambridge series in statistical and probabilistic mathematics. Cambridge: Cambridge University Press.

Recht, B., M. Fazel, and P. A. Parrilo (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471–501.

Schölkopf, B. and A. J. Smola (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond. Adaptive computation and machine learning. Cambridge, Mass.: MIT Press.

Stewart, G. W. (1973). Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Review 15(4), 727–764.

Stewart, G. W. and J. Sun (1990). Matrix perturbation theory. Computer science and scientific computing. Boston: Acad. Press.

Strang, G. (2005). Linear algebra and its applications (4 ed.). Cengage Learning.

Trefethen, L. N. and D. Bau (1997). Numerical linear algebra. Philadelphia: SIAM, Soc. for Industrial and Applied Math.

Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Y. C. Eldar and G. Kutyniok (Eds.), Compressed Sensing, pp. 210–268. Cambridge University Press.

Wedin, P. Å. (1983). On angles between subspaces of a finite dimensional inner product space. In Matrix Pencils, pp. 263–285. Springer.

Ziętak, K. (1988). On the characterization of the extremal points of the unit sphere of matrices. Linear Algebra and its Applications 106, 57–75.

Appendix

2.a. Square integrable functions

The basic entities of the present section are µ-square integrable functions defined on a common measure space $(\Omega, \mathcal F, \mu)$ with finite measure µ. The geometry of the linear space spanned by such functions derives from an inner product, whose definition relies on the µ-square integrability of the respective functions, not the finiteness of µ. In fact, if the functions $x$ and $y$ are µ-square integrable, then $\int|x(\omega)y(\omega)|\,\mu(d\omega)\le\int\frac{1}{2}\big(x^2(\omega) + y^2(\omega)\big)\,\mu(d\omega)$ as $0\le\big(x(\omega)\pm y(\omega)\big)^2 = 2\big(\big(x^2(\omega) + y^2(\omega)\big)/2 \pm x(\omega)y(\omega)\big)$ for every $\omega\in\Omega$, which ensures that $\langle\cdot,\cdot\rangle$ is well defined as a real-valued map. However, the finiteness of µ guarantees that the set $L_2 = L_2(\Omega, \mathcal F, \mu)$ of µ-square integrable functions on $(\Omega, \mathcal F)$ includes the $\mathcal F/\mathcal R^1$-measurable functions—$\mathcal R^1$ being the Borel σ-field of the $|\cdot|$-topology—with finite range. In fact, if µ is finite, then the indicators of elements of $\mathcal F$ and thus linear combinations of the latter are µ-square integrable. In particular, if $y$ is $\mathcal F/\mathcal R^1$-measurable, then its composition with the sign function—given by $\operatorname{sign}(x) = -1, 0, 1$ for, respectively, negative, zero, or positive $x\in\mathbb R$—is µ-square integrable. Thus, if µ is finite, then $\int|y(\omega)|\,\mu(d\omega) = \langle y, \operatorname{sign}(y)\rangle$ shows that µ-square integrability of $y$ implies its µ-integrability.

If $\mathcal F$ contains nonempty sets of µ-measure zero, then their indicators witness the existence of nonzero elements $y$ of $L_2$ with $\|y\| = \big(\int y^2(\omega)\,\mu(d\omega)\big)^{1/2} = 0$. Here, these are called representatives of zero, and their existence degrades $\|\cdot\|$ to a seminorm. The conventional way out is to take $y = x$ if $y - x$ is a representative of zero by partitioning $L_2$ into equivalence classes $[[y]] = \{\|\cdot - y\| = 0\}$. As the set $[[0]]$ of representatives of zero forms a subspace under pointwise linear operations, the set $\{[[y]] \mid y\in L_2\}$ of equivalence classes forms a linear space $\mathcal L_2$—a so-called quotient space—with the linear operations $a[[y]] = [[ay]]$ and $[[y]] + [[x]] = [[y + x]]$. Adapting $\langle\cdot,\cdot\rangle$ to operate on $\mathcal L_2$ via $\langle[[y]], [[x]]\rangle = \langle y, x\rangle$ yields a well defined inner product on that space. Thus, any finite dimensional subspace of $\mathcal L_2$ provides a Euclidean space. The accustomed notation double uses $y$ instead of the pair $y$ and $[[y]]$, but supplements relations such as $=$ and $\le$ with "almost everywhere" qualifiers to warn against an unwarranted pointwise interpretation.

An alternative strategy—used herein—amounts to choosing a subspace of $L_2$ containing a single point from each element—an equivalence class—of the $\mathcal L_2$-subspace under consideration. In the present setting this approach dispenses with the tiresome "almost everywhere" qualification and, more importantly, comes with the advantage that pointwise evaluation remains well defined. This construction builds on an initial choice of an $\mathcal F/\mathcal R^1$-measurable element $q_i$ from each equivalence class $[[q_i]]$ of an orthonormal basis $[[q_1]],\dots,[[q_m]]$ of the relevant $\mathcal L_2$-subspace. Then, a suitable element of $[[y]]$ follows from combining $q_1,\dots,q_m$ with the coordinates $\langle[[y]], [[q_1]]\rangle,\dots,\langle[[y]], [[q_m]]\rangle$ of $[[y]]$ with respect to $[[q_1]],\dots,[[q_m]]$. In fact, the resulting $L_2$-subspace $W = \operatorname{span}\{q_1,\dots,q_m\}$ has pointwise zero as its sole representative of zero. More generally, $y, x\in[[y]]\cap W$ implies $y - x\in[[0]]\cap W$, thus, the pointwise equality $y = x$. Hence, $\langle\cdot,\cdot\rangle$ and therefore $\|\cdot\|$ are an inner product and a norm on $W$, respectively.

The choice of the basis representatives does not affect the geometry—induced by $\langle\cdot,\cdot\rangle$—of the resulting $L_2$-(sub)space $W$. Moreover, an alternative choice of an $\mathcal F/\mathcal R^1$-measurable $q_i'\in[[q_i]]$ differs from $q_i$ solely on the µ-measure zero set $N_i = \{q_i\ne q_i'\}\in\mathcal F$. Therefore, all elements of the two linear spaces $\operatorname{span}\{q_1,\dots,q_k\}$ and $\operatorname{span}\{q_1',\dots,q_k'\}$ differ from their respective counterparts at most on the µ-measure zero set $\cup_{i\le k}N_i$. In particular, the image measure $\mu\circ y^{-1}$, that is, $\mathcal R^1\ni B\mapsto(\mu\circ y^{-1})B = \mu\{y\in B\}$, of a given linear combination $y = \sum_{i\le k}c_iq_i$ does not depend on the choice of basis element representatives. In fact,

$\mu\big\{\sum_{i\le k}c_iq_i\in B\big\} = \mu\Big(\big\{\sum_{i\le k}c_iq_i\in B\big\}\cap\cap_{i\le k}\{q_i = q_i'\}\Big) = \mu\big\{\sum_{i\le k}c_iq_i'\in B\big\}.$

An analogous argument also applies to the image measure of $\omega\mapsto\big(y_1(\omega),\dots,y_\ell(\omega)\big)$—defined on the Borel σ-field of the norm topology on $\mathbb R^\ell$—wherein $y_i = \sum_{j\le k}c_{j,i}q_j$ with $c_{j,i}\in\mathbb R$, $\ell\in\mathbb N$.

2.b. Proofs

Proof of lemma 2.1. A finite subset $\{x_1,\dots,x_q\}$ of $\{\|\cdot\| = 1\}$ is ε-separated if $d(x_i, x_j) = \|x_i - x_j\| > \varepsilon$ for all $i, j\le q$ with $i\ne j$. The construction of a $\subset$-maximal element of the set $S$ of ε-separated subsets of $\{\|\cdot\| = 1\}$ succeeds by starting at an arbitrary unit length $x_1$ and recursively adding unit length $x_n$ with $d(x_n, x_j) > \varepsilon$, $j < n$. Compactness of $\{\|\cdot\| = 1\}$ guarantees that the construction terminates after $q\,(\in\mathbb N)$ steps. If $z\notin\cup_{i\le q}\{d(x_i,\cdot)\le\varepsilon\}$ for some unit length $z$, then $\{z, x_1,\dots,x_q\}$ is ε-separated, which contradicts the $\subset$-maximality of $\{x_1,\dots,x_q\}$. Hence, $\{x_1,\dots,x_q\}$ provides an ε-net. The ε/2-balls $\{d(x_i,\cdot)\le\varepsilon/2\}$, $i\le q$, are pairwise disjoint, and their union amounts to a subset of $\{\|\cdot\|\le 1 + \varepsilon/2\}$. Hence, additivity, translation invariance, and the scaling property of the Lebesgue measure ν on $\mathbb R^k$ imply

$q\big(\tfrac{\varepsilon}{2}\big)^k\nu\{\|\cdot\|\le 1\} \le \big(1 + \tfrac{\varepsilon}{2}\big)^k\nu\{\|\cdot\|\le 1\}.$

Furthermore, $\nu\{\|\cdot\|\le 1\} > 0$ together with the definition of a covering number implies $(1 + 2/\varepsilon)^k\ge q\ge N(\{\|\cdot\| = 1\}, \|\cdot\|, \varepsilon)$.
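The greedy construction of a maximal ε-separated set is directly implementable; the following sketch (sampling-based, hence only approximately maximal; names mine) also checks the volumetric bound:

```python
import numpy as np

rng = np.random.default_rng(8)
eps, k = 0.5, 2

# greedily build an eps-separated subset of the unit sphere from random draws
candidates = rng.normal(size=(5000, k))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
net = []
for x in candidates:
    if all(np.linalg.norm(x - y) > eps for y in net):
        net.append(x)

# the volumetric bound from the proof: q <= (1 + 2/eps)^k
assert len(net) <= (1 + 2 / eps) ** k
print(len(net), "points; bound", (1 + 2 / eps) ** k)
```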

Proof of lemma 2.3. For any two elements $Ya$ and $Yb$ of $\operatorname{span}\{y_1,\dots,y_k\}$ define $\langle Ya, Yb\rangle = \langle a, Gb\rangle$. This expression inherits symmetry from $G$. Hence, the map $\langle\cdot,\cdot\rangle: V\times V\to\mathbb R$ is well defined as $Ya = Ya'$ implies $a - a'\in\ker Y = \ker G$. Bilinearity of $\langle\cdot,\cdot\rangle$ follows from its definition. Finally, positive semi-definiteness of $G$ ensures $\langle Ya, Ya\rangle\ge 0$. Therein, equality holds if and only if $a\in\ker G = \ker Y$, that is, $Ya = 0$.
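A sketch of this definition in code (names mine): the inner product on $\operatorname{span}\{y_1,\dots,y_k\}$ is computed from the Gramian alone and agrees with the ambient inner product whenever one is available.

```python
import numpy as np

rng = np.random.default_rng(9)
Y = rng.normal(size=(6, 3))        # columns y_1, y_2, y_3
G = Y.T @ Y                        # the Gramian <y_i, y_j>

def inner(a, b, G=G):
    """<Ya, Yb> defined through the Gramian alone."""
    return a @ G @ b

a, b = rng.normal(size=3), rng.normal(size=3)
# agrees with the ambient inner product of the images Ya and Yb
assert np.isclose(inner(a, b), (Y @ a) @ (Y @ b))
assert inner(a, a) >= 0            # positive semidefiniteness of G
```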

Proof of lemma 2.4. If $(V_j, U_j)$ symbolizes the $j$-th singular subspace pair for $A$, then $u\in U_j$ and the symmetry of $A$ imply the (in)equalities $\|Au\| = \sup_{\|v\|=1}\langle Au, v\rangle = \sup_{\|v\|=1}\langle u, P_{U_j}Av\rangle = \sup_{\|v\|=1}\langle u, AP_{V_j}v\rangle\le\bar\sigma_j\|u\|$. Herein, $P_L$ denotes the orthogonal projector onto a subspace $L$. Such maps are contractions; hence, $\|AP_{V_j}v\| = \bar\sigma_j\|P_{V_j}v\|\le\bar\sigma_j$. The first equality expresses the self-duality of $\|\cdot\|$. The third equality is due to $\operatorname{img}AP_{V_j}\subset U_j$. The subsequent inequality is an application of the Cauchy-Schwarz inequality. Consideration of $j = s$—the number of distinct nonzero singular values of $A$—implies $V_s = U_s$ as $U_j\subset\sum_{i\le s}U_i = \operatorname{img}A = (\ker A)^\perp = \sum_{i\le s}V_i$. In particular, if $s > 1$, then $U_{s-1}$ is a subspace of $V_s^\perp\cap\operatorname{img}A$. Consequently, $\|Au\|\ge\bar\sigma_{s-1}\|u\|$ for $u\in U_{s-1}$ and so forth. The equalities $U_j = V_j$ and polarization imply that restricting $A/\bar\sigma_j$ to $V_j$ provides a unitary map $V_j\to V_j$. If $\operatorname{id}$ denotes the identity map on $V_j$, $j\le s$, then $\langle u, (\operatorname{id} - A^2/\bar\sigma_j^2)v\rangle = 0$ for all $u, v\in V_j$ implies $\operatorname{img}(\operatorname{id} - A^2/\bar\sigma_j^2) = \{0\}$. Thus, one factor in $(\operatorname{id} - A/\bar\sigma_j)(\operatorname{id} + A/\bar\sigma_j) = \operatorname{id} - A^2/\bar\sigma_j^2$ must have a nontrivial kernel, that is, $Av'\in\{\bar\sigma_jv', -\bar\sigma_jv'\}$ for some unit length $v'\in V_j$. If $V_j\cap(\operatorname{span}\{v'\})^\perp$ is nontrivial, then the same argument applies. The recursion continues until all directions in $V_j$ are exhausted.
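Numerically, the sign pattern asserted by lemma 2.4 is visible in any singular value decomposition of a symmetric matrix; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(10)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                       # symmetric, generically of full rank

U, s, Vt = np.linalg.svd(A)
# each left singular vector is +/- the corresponding right singular vector
signs = np.sum(U * Vt.T, axis=0)        # <u_i, v_i> for each i
assert np.allclose(np.abs(signs), 1.0)  # u_i = +v_i or u_i = -v_i
```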

Proof of lemma 2.5. Let $X = \sum_{i\le\operatorname{rk}X}\sigma_iu_i\langle v_i,\cdot\rangle$ and $Y = \sum_{i\le\operatorname{rk}Y}\sigma_i'u_i'\langle v_i',\cdot\rangle$ be singular value decompositions of $X$ and $Y$, respectively. If needed, then $\sigma_{\operatorname{rk}X+p} = 0 = \sigma'_{\operatorname{rk}Y+p}$ for all $p\ge 1$. The meaning of $u_{\operatorname{rk}X+p}$, $v_{\operatorname{rk}X+p}$, $u'_{\operatorname{rk}Y+p}$, and $v'_{\operatorname{rk}Y+p}$, $p\ge 1$, is immaterial. The distinct singular values of $X$ are represented by $\bar\sigma_1,\dots,\bar\sigma_s$, wherein $s$ provides the number of distinct singular values of $X$ and $\bar\sigma_{s+p} = 0$ if $p\ge 1$. In addition, $m_1$ symbolizes the multiplicity of $\bar\sigma_1$. The equality $\|X\|_{op} = \bar\sigma_1\sum_{i\le\operatorname{rk}Y}\sigma_i' = \langle Y, X\rangle\le\bar\sigma_1\sum_{i\le m_1}\sigma_i' + \bar\sigma_2\sum_{i>m_1}\sigma_i'$ reveals $\operatorname{rk}Y\le m_1$. Furthermore, the Cauchy-Schwarz inequality yields

$\bar\sigma_1\sum_{i\le m_1}\sigma_i' = \langle X, Y\rangle = \sum_{i\le m_1}\sigma_i'\Big(\bar\sigma_1\sum_{j\le m_1}\langle u_i', u_j\rangle\langle v_j, v_i'\rangle\Big) + \sum_{i\le m_1}\sigma_i'\Big(\sum_{m_1<j\le\operatorname{rk}X}\sigma_j\langle u_i', u_j\rangle\langle v_j, v_i'\rangle\Big)$

$\le \bar\sigma_1\sum_{i\le m_1}\sigma_i'\sum_{j\le m_1}|\langle u_i', u_j\rangle||\langle v_j, v_i'\rangle| + \bar\sigma_2\sum_{i\le m_1}\sigma_i'\sum_{m_1<j\le\operatorname{rk}X}|\langle u_i', u_j\rangle||\langle v_j, v_i'\rangle|$

$\le \bar\sigma_1\sum_{i\le m_1}\sigma_i'\|a_i\|\|b_i\| + \bar\sigma_2\sum_{i\le m_1}\sigma_i'\sqrt{1 - \|a_i\|^2}\sqrt{1 - \|b_i\|^2} \le \bar\sigma_1\sum_{i\le m_1}\sigma_i' + 0,$ <2.11>

wherein $a_i = (a_{1,i},\dots,a_{m_1,i}) = (\langle u_i', u_1\rangle,\dots,\langle u_i', u_{m_1}\rangle)$ as well as $b_i = (b_{1,i},\dots,b_{m_1,i}) = (\langle v_i', v_1\rangle,\dots,\langle v_i', v_{m_1}\rangle)$. The second inequality is due to the invariance of $\|\cdot\|$ to changes of the signs of the entries of its argument,

$1 = \|u_i'\|^2 \ge \|P_{\operatorname{img}X}u_i'\|^2 = \Big\|\sum_{j\le\operatorname{rk}X}u_j\langle u_j, u_i'\rangle\Big\|^2 = \sum_{j\le m_1}a_{j,i}^2 + \sum_{m_1<j\le\operatorname{rk}X}\langle u_i', u_j\rangle^2,$

and $1 - \|b_i\|^2\ge\sum_{m_1<j\le\operatorname{rk}X}\langle v_i', v_j\rangle^2$. Moreover, the Cauchy-Schwarz inequality yields

$\|a_i\|\|b_i\| + \sqrt{(1 - \|a_i\|^2)(1 - \|b_i\|^2)} = \big\langle\big(\|a_i\|, \sqrt{1 - \|a_i\|^2}\big), \big(\|b_i\|, \sqrt{1 - \|b_i\|^2}\big)\big\rangle \le 1$

and thereby $\bar\sigma_1\|a_i\|\|b_i\| + \bar\sigma_2\sqrt{1 - \|a_i\|^2}\sqrt{1 - \|b_i\|^2}\le\bar\sigma_1$, which in turn generates the final inequality in <2.11>. The resulting equalities in <2.11> require $\|a_i\| = 1 = \|b_i\|$ and $a_i = b_i$. In fact, the first, second, and third inequality in <2.11> necessarily hold for each of the two main summands individually. Consequently, $u_i' = \sum_{j\le m_1}a_{j,i}u_j = U_1a_i$, $v_i' = \sum_{j\le m_1}a_{j,i}v_j = V_1a_i$, and $Y = \sum_{i\le m_1}\sigma_i'u_i'\langle v_i',\cdot\rangle = U_1B\langle\langle V_1,\cdot\rangle\rangle$, wherein $U_1 = [u_1\cdots u_{m_1}]$, $V_1 = [v_1\cdots v_{m_1}]$, and $B = \sum_{i\le m_1}\sigma_i'a_ia_i^T$. The matrix $B$ is positive semidefinite and satisfies $\|B\|_{nuc} = \operatorname{tr}B = \sum_{i\le m_1}\sigma_i'\operatorname{tr}(a_ia_i^T) = \|Y\|_{nuc} = 1$ as $\operatorname{tr}(a_ia_i^T) = \langle a_i, a_i\rangle = \|a_i\|^2 = 1$.

Conversely, if $Y = U_1B\langle\langle V_1,\cdot\rangle\rangle$, then

$\Big\langle U_1B\langle\langle V_1,\cdot\rangle\rangle,\ \bar\sigma_1U_1\langle\langle V_1,\cdot\rangle\rangle + \sum_{m_1<j\le\operatorname{rk}X}\sigma_ju_j\langle v_j,\cdot\rangle\Big\rangle = \bar\sigma_1\operatorname{tr}\big(\langle\langle U_1, U_1\rangle\rangle B\langle\langle V_1, V_1\rangle\rangle\big) = \bar\sigma_1\operatorname{tr}B$

guarantees $\langle Y, X\rangle = \|X\|_{op}\|Y\|_{nuc} = \bar\sigma_1$.

Proof of corollary 2.6. Let $U_1$ consist of the columns (in the given order) $u_1^+,\dots,u_{m_1'}^+, u_1^-,\dots,u_{m_1''}^-$ and likewise $V_1$ of $u_1^+,\dots,u_{m_1'}^+, -u_1^-,\dots,-u_{m_1''}^-$. Then, lemma 2.5 ensures the existence of a positive semidefinite $S\in\mathbb S^{m_1}$ such that $B = U_1S\langle\langle V_1,\cdot\rangle\rangle$ and $\operatorname{tr}S = 1$. Symmetry of $B$ implies the equality of

$\langle u_i^+, Bu_j^-\rangle = \langle e_i, -Se_{m_1'+j}\rangle = -s_{i,m_1'+j}$ and $\langle Bu_i^+, u_j^-\rangle = \langle Se_i, e_{m_1'+j}\rangle = s_{m_1'+j,i},$

wherein $e_i$ symbolizes the $i$-th standard basis element of $\mathbb R^{m_1}$. Symmetry of $S$ therefore guarantees that $s_{i,m_1'+j} = s_{m_1'+j,i} = 0$ for all $i\le m_1'$, $j\le m_1''$. The existence of a spectral decomposition validates the final claim.