
2. Euclidean space basics 15

2.4. Gramians

2.4.1. Inner product matrices

The k×q matrix ⟨⟨Y, X⟩⟩ with i,j-th entry ⟨y_i, x_j⟩ summarizes the geometric relations between two column sequences y_1, …, y_k and x_1, …, x_q in a Euclidean space W. If k = q, then the inner product ⟨·,·⟩ on W^{×k} equals the trace tr⟨⟨·,·⟩⟩; in particular, ⟨⟨·,·⟩⟩ = ⟨·,·⟩ for k = q = 1. If W = R^m, then the equality ⟨⟨A, ·⟩⟩ = Aᵀ mimics ⟨a, ·⟩ = aᵀ. More generally, the similar appearance of ⟨·,·⟩ and ⟨⟨·,·⟩⟩ acknowledges the overlap of the respective feature sets of ⟨·,·⟩ and ⟨⟨·,·⟩⟩: firstly, every instance of ⟨⟨·,·⟩⟩, defined on W^{×k} × W^{×q}, is bilinear; secondly, it exhibits symmetry to the extent that ⟨⟨X, Y⟩⟩ = ⟨⟨Y, X⟩⟩ᵀ; thirdly, the Gramian ⟨⟨Y, Y⟩⟩ of Y = [y_1 ⋯ y_k], or 'of y_1, …, y_k', is positive semidefinite, that is, ⟨a, ⟨⟨Y, Y⟩⟩a⟩ ≥ 0 for all a ∈ R^k.

The final property follows from ⟨a, ⟨⟨Y, Y⟩⟩a⟩ = ‖Ya‖², which also implies ker⟨⟨Y, Y⟩⟩ = ker Y. Thus, positive definiteness of ⟨⟨Y, Y⟩⟩, that is, ⟨a, ⟨⟨Y, Y⟩⟩a⟩ > 0 for all a ≠ 0, is tantamount to ker Y = {0}. Furthermore, ker⟨⟨Y, ·⟩⟩ = (img Y)^⊥, and ⟨⟨Y, Y⟩⟩ ∈ S^k guarantees ⟨⟨⟨Y, Y⟩⟩a, b⟩ = ⟨a, ⟨⟨Y, Y⟩⟩b⟩ for a, b ∈ R^k, which leads to img⟨⟨Y, ·⟩⟩ = img⟨⟨Y, Y⟩⟩ = (ker Y)^⊥.
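In the coordinate case W = R^m, the Gramian is ⟨⟨Y, Y⟩⟩ = YᵀY, and the listed properties can be verified numerically. A minimal numpy sketch with an invented matrix Y:

```python
import numpy as np

# W = R^3 with the standard inner product, so the Gramian <<Y, Y>> is Y^T Y.
# Y is an invented illustration whose third column equals y1 + y2,
# hence ker Y is nontrivial.
Y = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 3.0]])
G = Y.T @ Y                      # entries <y_i, y_j>

# <a, G a> = ||Y a||^2 >= 0: positive semidefiniteness.
a = np.array([0.3, -1.1, 0.7])
quad = a @ G @ a
assert np.isclose(quad, np.linalg.norm(Y @ a) ** 2) and quad >= 0

# ker <<Y, Y>> = ker Y: b = (1, 1, -1) lies in both kernels.
b = np.array([1.0, 1.0, -1.0])
assert np.allclose(Y @ b, 0) and np.allclose(G @ b, 0)
```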

Example (e) in section 2.1 has its own customary nomenclature and notation regarding inner product matrices and more specifically Gramians. In this example,

(e) the functions y_1, …, y_k and x_1, …, x_q are P-square integrable random variables defined on a probability space (Ω, F, P). The random vectors y = (y_1, …, y_k) and x = (x_1, …, x_q), that is, F/ℛ^j-measurable functions Ω ∋ ω ↦ (z_1(ω), …, z_j(ω)) ∈ R^j with ℛ^j symbolizing the Borel σ-field of the norm topology on R^j, allow the alternative representations c ↦ yᵀc and c ↦ xᵀc of the linear maps Y = [y_1 ⋯ y_k] and X = [x_1 ⋯ x_q], respectively. Moreover, the inner product ⟨·,·⟩ defined on W = img [1 Y X], after adjusting the representation as in appendix 2.a if needed, has the form of the P-expectation ⟨x, y⟩ = E xy = ∫ x(ω) y(ω) dP(ω) of the pointwise product xy of x, y ∈ W. Herein, 1 denotes the constant function ω ↦ 1.

Consequently, the inner product matrix ⟨⟨Y, X⟩⟩ and the Gramian ⟨⟨Y, Y⟩⟩ equal E yxᵀ and E yyᵀ, respectively. Therein, the expectations of the random matrices yxᵀ and yyᵀ, that is, F/ℛ^{d_1×d_2}-measurable maps Ω → R^{d_1×d_2} with d_1, d_2 ∈ N as well as ℛ^{d_1×d_2} symbolizing the Borel σ-field of the respective norm topology, are defined entry-wise; that is, E y_i x_j = ⟨y_i, x_j⟩ provides the i,j-th entry of E yxᵀ.

The projection of z ∈ W onto the subspace span{1} of W equals the expectation or mean ⟨1, z⟩1 = Ez of z. The latter equality implicitly identifies a function of the type ω ↦ c with c ∈ R; this convention is applied throughout the text. The corresponding residual z − Ez embodies the part of z that varies across different arguments ω; its squared length E(z − Ez)² = var(z) is therefore called the variance of z. The Gramian of the residuals y_1 − Ey_1, …, y_k − Ey_k provides the variance matrix var(y) of the sequence y_1, …, y_k or (equivalently) the random vector y.
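On a finite probability space with uniform P, the expectation is a plain average, and the identities above can be checked directly. A hedged numpy sketch with invented sample values:

```python
import numpy as np

# Finite uniform sample space with n = 5 outcomes; row i of `samples`
# lists the values of the random variable y_i (invented for illustration).
samples = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                    [2.0, 2.0, 0.0, 4.0, 2.0]])
n = samples.shape[1]

Ey = samples.mean(axis=1)              # mean E y_i = <1, y_i>
G = samples @ samples.T / n            # Gramian <<Y, Y>> = E y y^T
resid = samples - Ey[:, None]          # residuals y_i - E y_i
var_y = resid @ resid.T / n            # variance matrix var(y)

# var(y) is the Gramian of the residuals: E y y^T - (E y)(E y)^T.
assert np.allclose(var_y, G - np.outer(Ey, Ey))
```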

Gramians succinctly summarize the superiority of the composition X̂_V of the linear map X = [x_1 ⋯ x_q] with the orthogonal projector onto a subspace V of W over the composition X̂_{V/U} of X with the oblique projector onto V along a complement U ≠ V^⊥. More specifically, the residual maps X̃_V = X̃ = X − X̂ and X̃_{V/U} = X̃_/ = X − X̂_/ satisfy

  ⟨a, ⟨⟨X̃_/, X̃_/⟩⟩a⟩ = ‖X̃a + X̂a − X̂_/a‖² = ⟨a, ⟨⟨X̃, X̃⟩⟩a⟩ + ‖(P_V − P_{V/U})Xa‖²   ⟨2.5⟩

for all a ∈ R^q. This equality directly follows from the linearity of projectors and the connection ‖x‖ = √⟨x, x⟩. A general comment on the role of the latter is in order.
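In coordinates, ⟨2.5⟩ is a Pythagorean identity: the residual (I − P_V)Xa is orthogonal to V, while (P_V − P_{V/U})Xa lies in V. The numpy sketch below checks this for an invented V = span{v}, complement U, and map X:

```python
import numpy as np

# Orthogonal vs. oblique projection onto V = span{v} in R^3 (invented data).
v = np.array([1.0, 1.0, 0.0])
U = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])                      # complement U != V-perp

P_orth = np.outer(v, v) / (v @ v)               # P_V = P_{V/V-perp}
B = np.column_stack([v, U])                     # basis adapted to V + U = R^3
P_obl = B @ np.diag([1.0, 0.0, 0.0]) @ np.linalg.inv(B)   # P_{V/U}

X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])                      # X = [x1 x2]
Xt = X - P_orth @ X                             # orthogonal residual map
Xt_obl = X - P_obl @ X                          # oblique residual map

a = np.array([0.7, -1.3])
lhs = np.linalg.norm(Xt_obl @ a) ** 2           # <a, <<X~/, X~/>> a>
rhs = np.linalg.norm(Xt @ a) ** 2 + np.linalg.norm((P_orth - P_obl) @ X @ a) ** 2
assert np.isclose(lhs, rhs)                     # identity <2.5>
assert lhs >= np.linalg.norm(Xt @ a) ** 2       # orthogonal residual never longer
```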

Complementary subspaces and oblique projectors are purely linear concepts in the sense that these notions are meaningful in the absence of a norm or an inner product.

The inner product ⟨·,·⟩ or, by polarization, its induced norm determines the meaning of orthogonality. It thereby singles out a specific complement V^⊥ of a subspace V of W and a single projector P_V = P_{V/V^⊥} onto V as the orthogonal complement of V and the orthogonal projector onto V, respectively. This projector enjoys the ‖·‖-optimality in ⟨2.5⟩. A different inner product ⟨·,·⟩_* on W gives rise to another Euclidean space, usually with a different complement V^{⊥_*} and projector P_{V/V^{⊥_*}} being the orthogonal ones. However, within the space (W, ⟨·,·⟩) the projector P_{V/V^{⊥_*}} is (in general) merely oblique.

This consideration of alternative inner products ⟨·,·⟩_* provides an important source of oblique projectors in this text and facilitates the associated computations, as iterative schemes as in section 2.2.2 become applicable. Section 4.2.2 continues this line of argument. Sections 4.1.1 and 4.2.1 further investigate the final summand in ⟨2.5⟩.

An inner product ⟨·,·⟩ on W endows every sequence y_1, …, y_k with a Gramian ⟨⟨Y, Y⟩⟩, that is, a positive semidefinite element of S^k with kernel ker Y. Lemma 2.3 provides a converse statement and an important source of further inner products ⟨·,·⟩ on span{y_1, …, y_k}. A proof of this assertion starts on page 40 in appendix 2.b.

Lemma 2.3. If G ∈ S^k is positive semidefinite and ker G = ker Y, then there exists an inner product ⟨·,·⟩ on span{y_1, …, y_k} with ⟨y_i, y_j⟩ = g_{i,j}.

The required kernel equality in lemma 2.3 is tantamount to img⟨⟨Y, ·⟩⟩ = (ker Y)^⊥ = (ker G)^⊥ = img G. In particular, every row Y(ω) = (y_1(ω), …, y_k(ω)) ∈ R^k of Y exhibits a representation of the form ⟨⟨Y, Q⟩⟩c, wherein Q ∈ (img Y)^{×h} is ⟨·,·⟩-unitary with columns q_1, …, q_h and c = (q_1(ω), …, q_h(ω)). Therefore, the kernel condition in lemma 2.3 requires the column space img G of G to contain all rows of Y.

Lemma 2.2 yields a Cholesky decomposition of the Gramian ⟨⟨Y, Y⟩⟩, that is,

  ⎡ ⟨y_1, y_1⟩ ⋯ ⟨y_1, y_k⟩ ⎤
  ⎢     ⋮      ⋱     ⋮     ⎥  =  Rᵀ R ,
  ⎣ ⟨y_k, y_1⟩ ⋯ ⟨y_k, y_k⟩ ⎦

wherein the Gramian ⟨⟨Y, Y⟩⟩ on the left factors into the transpose of the Cholesky factor

  R = ⎡ r_{1,1} ⋯ r_{1,h} ⋯ r_{1,k} ⎤
      ⎢         ⋱    ⋮    ⋱    ⋮   ⎥
      ⎣           r_{h,h} ⋯ r_{h,k} ⎦

times R itself; R is of row echelon form with h rows.

Here, matrices R as in lemma 2.2 are referred to as Cholesky factors. Section 2.4.2 recovers such factors directly from ⟨⟨Y, Y⟩⟩ via an implicit Gram-Schmidt orthogonalization.

As a corollary to lemma 2.3, for every positive semidefinite matrix G ∈ S^k there exists a Euclidean space V with inner product ⟨·,·⟩ and a spanning sequence y_1, …, y_k ∈ V such that ⟨y_i, y_j⟩ = g_{i,j}. In fact, the columns g_1, …, g_k of G span a subspace of R^k, and the kernel equality required by lemma 2.3 holds trivially. Consequently, the factorization process in section 2.4.2 finishes successfully whenever it is applied to a (nonzero) symmetric and positive semidefinite matrix.
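The corollary admits a concrete construction: a spectral decomposition of G yields a sequence whose Gramian is G. A numpy sketch with an invented rank-deficient G ∈ S^3 (here G = AᵀA for A = [[1, 1, 0], [1, 0, 1]]):

```python
import numpy as np

# Invented positive semidefinite G in S^3 of rank 2.
G = np.array([[2.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

w, Q = np.linalg.eigh(G)              # G = Q diag(w) Q^T, eigenvalues ascending
w = np.where(w < 1e-10, 0.0, w)       # treat round-off as exact zeros
Y = np.diag(np.sqrt(w)) @ Q.T         # columns y_1, y_2, y_3 in V = R^3

assert np.allclose(Y.T @ Y, G)        # <y_i, y_j> = g_{i,j}
assert np.linalg.matrix_rank(Y) == np.linalg.matrix_rank(G)   # ker Y = ker G
```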

2.4.2. Cholesky factorization

This section considers a nontrivial sequence y_1, …, y_k with Y = [y_1 ⋯ y_k] and Gramian ⟨⟨Y, Y⟩⟩. In this case, the Cholesky factorization comprises k major steps and a final reduction. The j-th of these steps parallels the j-th major step of a Gram-Schmidt orthogonalization. It transforms the j-th row of the Gramian ⟨⟨Y, Y⟩⟩ to the j-th row of a preliminary upper triangular matrix R̄ using the first j−1 rows of the latter matrix. To this end, it employs a sequence of triangularization steps, paralleling the orthogonalization steps in the Gram-Schmidt orthogonalization, and a scaling step. If needed, the reduction extracts a row echelon matrix R from R̄; otherwise R = R̄.

The first major step of a Gram-Schmidt orthogonalization considers y_1 alone, thus, includes no initial orthogonalization steps. In case y_1 ≠ 0, it scales y_1 to obtain q̄_1 = y_1/r̄_{1,1}, wherein r̄_{1,1} = ±‖y_1‖; if y_1 = 0, it concludes with q̄_1 = 0, r̄_{1,1} = 0. Based on q̄_1, the calculation of the coordinates r̄_{1,ℓ} = ⟨q̄_1, y_ℓ⟩, 2 ≤ ℓ ≤ k, is straightforward and is deferred to the second to k-th major step, respectively. In comparison, the first major step of the factorization considers merely the first row of the Gramian ⟨⟨Y, Y⟩⟩. No triangularization is required, as upper triangularity places no constraints on the first row of R̄. The case ⟨y_1, y_1⟩ = 0 implies ⟨y_1, y_j⟩ = 0 for 2 ≤ j ≤ k and, as the first row of ⟨⟨Y, Y⟩⟩ already equals that of R̄, requires no action. Conversely, if ⟨y_1, y_1⟩ > 0, then the equality r̄²_{1,1} = ⟨y_1, y_1⟩ implies r̄_{1,1} = ±‖y_1‖. Thus, the first major step concludes with scaling the second to k-th entry of the first row of ⟨⟨Y, Y⟩⟩ by 1/r̄_{1,1} to obtain the elements r̄_{1,ℓ} = ⟨y_1, y_ℓ⟩/r̄_{1,1} = ⟨q̄_1, y_ℓ⟩, ℓ ≥ 2, of the first row of the preliminary matrix R̄.

The j-th (j > 1) major step of a Gram-Schmidt orthogonalization completes the orthogonalization of y_1, …, y_j starting from q̄_1, …, q̄_{j−1}, y_j. On the way it obtains the coordinates r̄_{1,j}, …, r̄_{j−1,j} and finally finishes by scaling ỹ_j^{(j−1)} if necessary. In comparison, the j-th major step of the factorization calculates the j-th row of R̄ based on its top j−1 rows and the j-th row ⟨⟨y_j, Y⟩⟩ of ⟨⟨Y, Y⟩⟩. It starts with j−1 triangularization steps, paralleling the above orthogonalization steps, to eliminate the initial j−1 entries of ⟨⟨y_j, Y⟩⟩ and finally scales the reduced row; a visual outline is given in ⟨2.6⟩.

  ⎡ r̄_{1,1}  r̄_{1,2}  ⋯  r̄_{1,j−1}    r̄_{1,j}   ⋯  r̄_{1,k}  ⎤
  ⎢          r̄_{2,2}  ⋯  r̄_{2,j−1}    r̄_{2,j}   ⋯  r̄_{2,k}  ⎥   (j−1 top rows of R̄)
  ⎢                   ⋱      ⋮           ⋮       ⋱     ⋮     ⎥
  ⎢                      r̄_{j−1,j−1}  r̄_{j−1,j}  ⋯  r̄_{j−1,k} ⎥
  ⎣ ⟨y_j, y_1⟩ ⟨y_j, y_2⟩ ⋯ ⟨y_j, y_{j−1}⟩ ⟨y_j, y_j⟩ ⋯ ⟨y_j, y_k⟩ ⎦   (j-th row of ⟨⟨Y, Y⟩⟩)   ⟨2.6⟩

Therein, the i-th triangularization step, i = 1, …, j−1, eliminates the entry ⟨y_j, y_i⟩ of the final row.

More specifically, the first triangularization step subtracts r̄_{1,j} times the first row of R̄ from the final row in ⟨2.6⟩. Thus, the ℓ-th transformed entry equals

  ⟨y_j, y_ℓ⟩ − r̄_{1,j} r̄_{1,ℓ} = ⟨y_j, y_ℓ⟩ − r̄_{1,j}⟨q̄_1, y_ℓ⟩
    = ⟨y_j − q̄_1 r̄_{1,j}, y_ℓ⟩ = ⟨ỹ_j^{(1)}, y_ℓ⟩ = ⟨ỹ_j^{(1)}, ỹ_ℓ^{(1)} + q̄_1 r̄_{1,ℓ}⟩ = ⟨ỹ_j^{(1)}, ỹ_ℓ^{(1)}⟩ ,   ⟨2.7⟩

wherein the notation is borrowed from ⟨2.3⟩: ỹ_s^{(1)} symbolizes the residual from orthogonally projecting y_s, s ≤ k, onto span{y_1}. In particular, the equality ỹ_1^{(1)} = 0 ensures that the first element of the final row in ⟨2.6⟩ disappears. The following triangularization steps are analogous and implement the orthogonalization against y_2, …, y_{j−1}. Hence, these steps turn the final row of ⟨2.6⟩ into

  ( 0  0  ⋯  0  ⟨ỹ_j^{(j−1)}, ỹ_j^{(j−1)}⟩  ⟨ỹ_j^{(j−1)}, ỹ_{j+1}^{(j−1)}⟩  ⋯  ⟨ỹ_j^{(j−1)}, ỹ_k^{(j−1)}⟩ ) .   ⟨2.8⟩

If y_j ∈ span{y_1, …, y_{j−1}}, thus ỹ_j^{(j−1)} = 0, then the row in ⟨2.8⟩ equals zero, that is, the j-th row of a preliminary upper triangular matrix R̄ produced during a Gram-Schmidt orthogonalization. Consequently, the procedure may advance to the next major step. An alternative given in ⟨2.9⟩ multiplies this zero row by zero to endow every major step with a scaling operation. The case ỹ_j^{(j−1)} ≠ 0 allows one of the two possible choices r̄_{j,j} = ±‖ỹ_j^{(j−1)}‖. Subsequently, scaling ⟨2.8⟩ by 1/r̄_{j,j} to obtain r̄_{j,ℓ} = ⟨q̄_j, ỹ_ℓ^{(j−1)}⟩ = ⟨ỹ_j^{(j−1)}, ỹ_ℓ^{(j−1)}⟩/r̄_{j,j} is meaningful and concludes the construction of the j-th row of R̄.

A complete description is given in display ⟨2.9⟩. Therein, major steps are indexed by j, triangularization steps by i, and elements of the current row (the final row in ⟨2.6⟩ at the start of the j-th major step) by ℓ. This indexing parallels the above discussion. If the equality k = h = rk Y holds, that is, ker Y = ker⟨⟨Y, Y⟩⟩ = {0}, then R̄ is upper triangular with nonzero diagonal elements, thus of row echelon form. Otherwise, dropping the zero rows of R̄ yields a Cholesky factor R as in lemma 2.2.

 1   r̄_{i,j}^{(0)} = ⟨y_i, y_j⟩ ,  i, j ≤ k
 2   for j = 1, …, k
 3       for i = 1, …, j−1
 4           for ℓ = 1, …, k
 5               r̄_{j,ℓ}^{(i)} = r̄_{j,ℓ}^{(i−1)} − r̄_{i,j} r̄_{i,ℓ}
 6       if r̄_{j,j}^{(j−1)} ≠ 0
 7           s_j = ±( r̄_{j,j}^{(j−1)} )^{−1/2}
 8       else
 9           s_j = 0
10       for ℓ = 1, …, k
11           r̄_{j,ℓ} = r̄_{j,ℓ}^{(j−1)} s_j
                                                              ⟨2.9⟩
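A direct transcription of ⟨2.9⟩ into Python, with the positive sign choice s_j > 0 and a small tolerance replacing the exact test r̄_{j,j}^{(j−1)} ≠ 0, might read as follows (a hedged sketch, not the text's own code); the final line implements the reduction to row echelon form:

```python
import numpy as np

def cholesky_row_echelon(G, tol=1e-12):
    """Factor a positive semidefinite matrix G as R.T @ R following <2.9>:
    k major steps (triangularization plus scaling) and a final reduction
    that drops the zero rows of Rbar."""
    Rbar = np.array(G, dtype=float)
    k = Rbar.shape[0]
    for j in range(k):
        for i in range(j):                        # triangularization steps
            Rbar[j, :] -= Rbar[i, j] * Rbar[i, :]
        pivot = Rbar[j, j]                        # equals <ytilde_j, ytilde_j>
        s = 1.0 / np.sqrt(pivot) if pivot > tol else 0.0
        Rbar[j, :] *= s                           # scaling step
    keep = [j for j in range(k) if abs(Rbar[j]).max() > tol]
    return Rbar[keep, :]                          # row echelon Cholesky factor R

# Invented rank-2 Gramian of y_1, y_2, y_3 in R^2 with y_1 = y_2 + y_3.
G = np.array([[2.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
R = cholesky_row_echelon(G)
assert np.allclose(R.T @ R, G)
assert R.shape == (2, 3)                          # h = rk Y = 2 rows survive
```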

The factorization ⟨2.9⟩ applies to the rows of ⟨⟨Y, Y⟩⟩, in order to obtain R, the same operations that a Gram-Schmidt orthogonalization with corresponding sign choices executes on the columns of Y = [y_1 ⋯ y_k] to obtain Q. The first part of ⟨2.7⟩ and the scaling applied to ⟨2.8⟩ exemplify this observation. Viewing the factorization as a sequence of premultiplications with suitable matrix factors yields a concise statement. For example,

  L_3 = ⎡      1            0        0  ⎤ = ⎡ 1  0   0  ⎤ ⎡ 1      0      0 ⎤ ⎡     1      0  0 ⎤
        ⎢      0            1        0  ⎥   ⎢ 0  1   0  ⎥ ⎢ 0      1      0 ⎥ ⎢     0      1  0 ⎥
        ⎣ −r̄_{1,3}s_3  −r̄_{2,3}s_3  s_3 ⎦   ⎣ 0  0  s_3 ⎦ ⎣ 0  −r̄_{2,3}  1 ⎦ ⎣ −r̄_{1,3}  0  1 ⎦

implements the third major step of ⟨2.9⟩ for k = 3. The first and second major steps exhibit analogous factors L_1 and L_2, respectively, leading to L_3 L_2 L_1 ⟨⟨Y, Y⟩⟩ = R̄.

The general case ⟨⟨Y, Y⟩⟩ ∈ R^{k×k} uses k lower triangular factors L_1, …, L_k such that L_k L_{k−1} ⋯ L_1 ⟨⟨Y, Y⟩⟩ = R̄. The reduction step amounts to a factor L_{k+1} ∈ R^{h×k} with rows e_{i_1}, …, e_{i_h}, wherein e_ℓ denotes the ℓ-th standard basis element of R^k and i_1 < i_2 < ⋯ < i_h represent the indexes corresponding to nonzero rows in R̄. Using this notation,

  [y_1 ⋯ y_k] L_1ᵀ L_2ᵀ ⋯ L_jᵀ = [q̄_1 q̄_2 ⋯ q̄_j y_{j+1} ⋯ y_k] ,  j ≤ k ,  and
  [q̄_1 ⋯ q̄_k] L_{k+1}ᵀ = [q_1 ⋯ q_h]

is a restatement of the above Gram-Schmidt orthogonalization.
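For k = 3 the factor viewpoint can be checked numerically. The sketch below builds L_1, L_2, L_3 with positive sign choices for an invented positive definite Gramian G and confirms L_3 L_2 L_1 G = R̄ with R̄ᵀR̄ = G:

```python
import numpy as np

# Invented positive definite Gramian for k = 3.
G = np.array([[4.0, 2.0, 2.0],
              [2.0, 2.0, 1.0],
              [2.0, 1.0, 5.0]])
k = G.shape[0]

M = G.copy()                        # current matrix L_j ... L_1 @ G
factors = []
for j in range(k):
    row = M[j].copy()
    for i in range(j):              # triangularization coefficients rbar_{i,j}
        row -= M[i, j] * M[i]
    s = 1.0 / np.sqrt(row[j])       # positive sign choice s_j > 0
    Lj = np.eye(k)
    Lj[j, j] = s
    for i in range(j):
        Lj[j, i] = -M[i, j] * s     # row j of L_j: (-rbar_{1,j} s_j, ..., s_j)
    M = Lj @ M
    factors.append(Lj)

L1, L2, L3 = factors
Rbar = L3 @ L2 @ L1 @ G
assert np.allclose(Rbar, M)
assert np.allclose(Rbar, np.triu(Rbar))      # Rbar is upper triangular
assert np.allclose(Rbar.T @ Rbar, G)         # and a Cholesky factor of G
```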