

4.4. Prediction algorithms

single $\omega \in \Omega$. However, the bounds resulting from corollary 4.5 refer to the norm $\|\cdot\|$, which merely provides an average (across $\omega$) distance measure. Secondly, the findings of section 4.2.3 apply as solely projections in $W_0$ are of concern, that is, the bounds may be rather “conservative”. Finally, if $\ker A^T = \{0\}$, then an increase of the error variances $\rho_{t,i}^2$, $(t,i) \in I_{\text{obs}}$, that is, the diagonal elements of $S$, decreases the first factor in the final term in <4.14>, but also increases the $\|\cdot\|$-length of the residuals resulting from orthogonal projection onto $V$, that is, the second factor of the final term in <4.14>.

from $\mathbb{R}^{\mathrm{rk}X+k}$ with $k = \sum_{t \le n} k_t > 0$ to $W$ is unitary. Hence, the columns of the matrix $B$ form a representation—in the sense of section 2.2.1—of the columns of $[Y\ X]$.

In this setting, the goal is to evaluate the orthogonal projections $P_V x_{t,j}$ of $x_{t,j}$, $(t,j) \in I_x$, onto $V = \operatorname{span}\{y_{t,i} \mid (t,i) \in I_{\text{obs}}\}$ at some $\omega \in \Omega$. The available information comprises the images $y_{t,i}(\omega)$, $(t,i) \in I_{\text{obs}}$, as well as the Gramian $\langle\langle X, X\rangle\rangle$ of $X$, the aggregation matrices $A_t$, and $S_t$, $t \le n$. Thus, the images $y_{t,i}(\omega)$ are referred to as observations; the corresponding functions $y_{t,i}$ provide the observables. The recovery of $B$ from these inputs requires transforming $\langle\langle X, X\rangle\rangle$ into $R_X$ via the Cholesky factorization <2.9>.

The first step of the calculation of the required images $(P_V x_{t,j})(\omega)$, $(t,j) \in I_x$, modifies the representation in <4.16>. More specifically, lemma 2.2 also applies to the coordinate matrix $B$. Composing the resulting unitary map $Q_B$ from $\mathbb{R}^{\mathrm{rk}B}$ to $\operatorname{img}B$ with $[Q_X\ V]$ yields a unitary map $Q_{Y,X} = [Q_X\ V]Q_B$ from $\mathbb{R}^{\mathrm{rk}[Y\ X]}$ to the image $\operatorname{img}[Y\ X]$ such that

$$[Y_1\ Y_2\ \cdots\ X] = \underbrace{[Q_1\ Q_2\ \cdots\ Q_{n+1}]}_{\text{unitary map } Q_{Y,X}}\ \underbrace{\begin{bmatrix} R_{1,1} & R_{1,2} & \ldots & R_{1,x} \\ & R_{2,2} & \ldots & R_{2,x} \\ & & \ddots & \vdots \\ & & & R_{n+1,x} \end{bmatrix}}_{\text{coordinate matrix } R_B}, \tag{4.17}$$

wherein the block columns of $R_B$ contain the coordinate vectors of $Y_1$, $Y_2$, \ldots, $X$ with respect to $Q_1$, $Q_2$, \ldots, $Q_{n+1}$.

This display presupposes $k_1, k_2 \ge 1$ with $\operatorname{img}Y_2 \not\subset \operatorname{img}Y_1$ and $\operatorname{img}X \not\subset \operatorname{img}Y$ to concretize the appearance. The transition from <4.16> to <4.17> amounts to a change of the basis on the right hand side of <4.16> alongside a corresponding adjustment of the coordinates. Section 4.4.2 advances this point of view and discusses an alternative (to the Gram-Schmidt orthogonalization) implementation of this transition.

If the coordinate matrix $R_B$ is available alongside the observations $y_{t,i}(\omega)$, $(t,i) \in I_{\text{obs}}$, then a recursive evaluation of the first $k^0 = \mathrm{rk}Y$ columns of $Q_{Y,X}$ at $\omega$ is possible. In fact, section 2.4.1 shows that the rows $Y_1(\omega') = \big(y_{1,1}(\omega'), \ldots, y_{1,k_1}(\omega')\big)$, $\omega' \in \Omega$, lie in $\operatorname{img}\langle\langle Y_1, Y_1\rangle\rangle = \operatorname{img}\langle\langle R_{1,1}, R_{1,1}\rangle\rangle = \operatorname{img}R_{1,1}^T$. In particular, the equality $Y_1 = Q_1 R_{1,1}$ ensures that the row $Q_1(\omega) = \big(q_1(\omega), \ldots, q_{k_1^0}(\omega)\big)$ of $Q_1$ satisfies the relation

$$R_{1,1}^T \begin{bmatrix} q_1(\omega) \\ \vdots \\ q_{k_1^0}(\omega) \end{bmatrix} = \begin{bmatrix} y_{1,1}(\omega) \\ \vdots \\ y_{1,k_1}(\omega) \end{bmatrix}, \tag{4.18}$$

wherein $k_1^0 = \mathrm{rk}Y_1$ and $k_1, k_1^0 > 0$ is assumed. Moreover, this equality uniquely determines $Q_1(\omega)$ as the rows of the row echelon matrix $R_{1,1}$, that is, the columns of its transpose $R_{1,1}^T$, are linearly independent—as elements of $\mathbb{R}^{k_1}$. Section 4.4.2 discusses a recursive procedure for the calculation of the vector $Q_1(\omega) \in \mathbb{R}^{k_1^0}$ in case $k_1 = k_1^0$.

If $k_2 \ge 1$ and $\operatorname{img}Y_2 \not\subset \operatorname{img}Y_1$, then the columns $q_{k_1^0+1}, \ldots, q_{k_1^0+k_2^0}$ of $Q_2$ in <4.17> provide an orthonormal basis of the $k_2^0$-dimensional subspace $\operatorname{img}P_{(\operatorname{img}Y_1)^\perp}Y_2$ of $W$. The evaluation of these columns uses the already available vector $Q_1(\omega)$ alongside the equality $Y_2 = Q_1 R_{1,2} + Q_2 R_{2,2}$, which is implied by the representation in <4.17>. More specifically, the resulting pointwise (with respect to $\omega$) relation

$$R_{2,2}^T \begin{bmatrix} q_{k_1^0+1}(\omega) \\ \vdots \\ q_{k_1^0+k_2^0}(\omega) \end{bmatrix} = \begin{bmatrix} y_{2,1}(\omega) \\ \vdots \\ y_{2,k_2}(\omega) \end{bmatrix} - R_{1,2}^T \begin{bmatrix} q_1(\omega) \\ \vdots \\ q_{k_1^0}(\omega) \end{bmatrix},$$

wherein $k_1^0 > 0$ is assumed, uniquely determines $Q_2(\omega)$ due to the row echelon form of $R_{2,2}$. The evaluation of further basis elements proceeds in analogy. Finally, combining the images $q_i(\omega)$, $i \le k^0 = \mathrm{rk}Y$, with the coordinates of $x_{t,j}$, $(t,j) \in I_x$, with respect to the columns $q_1, \ldots, q_{k^0}$ of $[Q_1 \cdots Q_n]$ in <4.17> yields the required images $(P_V x_{t,j})(\omega)$ of $\omega$ under the orthogonal projections of $x_{t,j}$, $(t,j) \in I_x$, onto the subspace $V$.
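As a concrete illustration of this two-block recursion, the following sketch evaluates $Q_1(\omega)$ and $Q_2(\omega)$ by forward substitution; it assumes the full-rank case $k_1 = k_1^0$ and $k_2 = k_2^0$, so that $R_{1,1}$ and $R_{2,2}$ are square with nonzero diagonals, and all names are illustrative rather than taken from the text.

```python
import numpy as np
from scipy.linalg import solve_triangular

# R11 (k1 x k1), R12 (k1 x k2), R22 (k2 x k2): blocks of the coordinate
# matrix R_B in <4.17>; y1, y2: the observation vectors Y_1(omega), Y_2(omega).
def evaluate_q(R11, R12, R22, y1, y2):
    # <4.18>: R_{1,1}^T Q_1(omega) = Y_1(omega); the transpose is lower
    # triangular, so forward substitution applies.
    q1 = solve_triangular(R11.T, y1, lower=True)
    # Pointwise relation above: R_{2,2}^T q2 = Y_2(omega) - R_{1,2}^T q1.
    q2 = solve_triangular(R22.T, y2 - R12.T @ q1, lower=True)
    return q1, q2
```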

Section 4.4.3 shows how a more refined recursive strategy makes it possible to exploit a special and complementary structure of $\langle\langle X, X\rangle\rangle$ and the aggregation matrix $A$.

4.4.2. Basis changes by reflection

The section considers a finite sequence $z_1, \ldots, z_\ell$, $\ell \in \mathbb{N}$, of linearly independent elements of a Euclidean space $W$ with inner product $\langle\cdot,\cdot\rangle$. The sequence $v_1, \ldots, v_\ell$ forms an orthonormal basis of $W_0 = \operatorname{span}\{z_j \mid j \le \ell\}$, and the $i,j$-th entry of $B \in \mathbb{R}^{\ell\times\ell}$ equals $b_{i,j} = \langle v_i, z_j\rangle$. Consequently, one has the equality

$$[z_1\ \cdots\ z_\ell] = \underbrace{[v_1\ \cdots\ v_\ell]}_{\text{unitary map } V}\ \underbrace{\begin{bmatrix} b_{1,1} & \ldots & b_{1,\ell} \\ \vdots & \ddots & \vdots \\ b_{\ell,1} & \ldots & b_{\ell,\ell} \end{bmatrix}}_{B}, \tag{4.19}$$

wherein the $j$-th column $\big(\langle v_1, z_j\rangle, \ldots, \langle v_\ell, z_j\rangle\big)$ of $B$ collects the coordinates of $z_j$ with respect to $v_1, \ldots, v_\ell$.

Therein, the columns of $B$ provide a representation of $z_1, \ldots, z_\ell$ in the sense of section 2.2.1. In this setting, the goal is to obtain a representation as in lemma 2.2, that is,

$$[z_1\ \cdots\ z_\ell] = \underbrace{[q_1\ \cdots\ q_\ell]}_{\substack{\text{unitary map from } \mathbb{R}^\ell \\ \text{to } W_0 = \operatorname{span}\{z_1,\ldots,z_\ell\}}}\ \underbrace{\begin{bmatrix} r_{1,1} & \ldots & r_{1,\ell} \\ & \ddots & \vdots \\ & & r_{\ell,\ell} \end{bmatrix}}_{\substack{\text{triangular coordinate matrix } R \\ \text{with nonzero diagonal elements}}}.$$

Reflections about hyperplanes in $\mathbb{R}^\ell$ of the form $\{\langle u, \cdot\rangle = 0\}$ with nonzero $u$, that is, orthogonal complements of subspaces $\operatorname{span}\{u\}$, are key to an alternative computational strategy, which yields the same output as the Gram-Schmidt orthogonalization <2.3> when applied to $z_1, \ldots, z_\ell$ with suitable sign choices. Such reflections amount to unitary maps from $\mathbb{R}^\ell$ to $\mathbb{R}^\ell$. Thus, their columns form orthonormal bases of $\mathbb{R}^\ell$, which implies that the Gramian $\langle\langle O, O\rangle\rangle = O^T O$ of such a reflection $O$ equals the $\ell\times\ell$ identity matrix $I$.

Lemma 4.6 asserts that the alternative strategy leads to the same output as <2.3>. More specifically, a sequence of reflections $O_1, \ldots, O_\ell$ recovers the upper triangular coordinate matrix $R$ produced by a Gram-Schmidt orthogonalization via $R = O_\ell \cdots O_1 B$.

Lemma 4.6. If a sequence $z_1, \ldots, z_\ell$ of linearly independent elements of a Euclidean space $W$ exhibits a representation as in <4.19>, wherein $v_1, \ldots, v_\ell$ amounts to an orthonormal basis of $W_0 = \operatorname{span}\{z_i \mid i \le \ell\}$, then there exist reflections $O_1, \ldots, O_\ell$ such that

$$V B = \underbrace{V O_1^T O_2^T \cdots O_\ell^T}_{Q = [q_1 \cdots q_\ell]}\ \underbrace{O_\ell \cdots O_1 B}_{R} = Q R,$$

wherein $Q$, $R$ denote a unitary map and an upper triangular matrix, respectively, which equal the output of <2.3> when applied to $z_1, \ldots, z_\ell$ with suitable sign choices.

If the coordinate matrix $R$ with respect to $q_1, \ldots, q_\ell$ of $z_1, \ldots, z_\ell$ is available in addition to the images $z_1(\omega), \ldots, z_\ell(\omega)$, then $q_1(\omega), \ldots, q_\ell(\omega)$ may be obtained based on

$$\begin{bmatrix} r_{1,1} & & & \\ r_{1,2} & r_{2,2} & & \\ \vdots & \vdots & \ddots & \\ r_{1,\ell} & r_{2,\ell} & \ldots & r_{\ell,\ell} \end{bmatrix} \begin{bmatrix} q_1(\omega) \\ q_2(\omega) \\ \vdots \\ q_\ell(\omega) \end{bmatrix} = \begin{bmatrix} z_1(\omega) \\ z_2(\omega) \\ \vdots \\ z_\ell(\omega) \end{bmatrix},$$

wherein $\ell > 2$ is assumed for the sake of presentation. More specifically, $r_{j,j} \ne 0$, $j \le \ell$, implies the equalities $q_1(\omega) = z_1(\omega)/r_{1,1}$, $q_2(\omega) = \big(z_2(\omega) - r_{1,2}q_1(\omega)\big)/r_{2,2}$, and so forth. The computational recipe <4.20> exploits these relations to calculate $q_i' = q_i(\omega)$.

1   $z_j^{(0)} = z_j(\omega)$, $j \le \ell$
2   for $i = 1, \ldots, \ell$
3       $q_i' = z_i^{(i-1)}/r_{i,i}$
4       for $j = i+1, \ldots, \ell$
5           $z_j^{(i)} = z_j^{(i-1)} - r_{i,j}q_i'$
<4.20>
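A direct transcription of recipe <4.20> into code may look as follows; the function name and the dense matrix representation of $R$ are illustrative assumptions, not part of the text.

```python
import numpy as np

def recipe_4_20(R, z):
    """Calculate q_i' = q_i(omega) from the upper triangular coordinate
    matrix R (nonzero diagonal) and the images z_j(omega)."""
    ell = len(z)
    z = np.array(z, dtype=float)           # line 1: z_j^(0) = z_j(omega)
    q = np.zeros(ell)
    for i in range(ell):                   # line 2: i = 1, ..., ell
        q[i] = z[i] / R[i, i]              # line 3: q_i' = z_i^(i-1)/r_{i,i}
        z[i + 1:] -= R[i, i + 1:] * q[i]   # lines 4-5: z_j^(i) updates
    return q
```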

The remainder of this section justifies the assertion of lemma 4.6.

Panel (A) of figure 4.7 illustrates the reflection of a nonzero element $x \in \mathbb{R}^3$ about a reflection hyperplane $H = \{\langle u, \cdot\rangle = 0\}$ into its mirror image $x'$. This transformation amounts to first projecting $x$ onto $H$ to obtain the orthogonal projection $\hat{x} = P_H x$ and thereby the corresponding residual $\tilde{x} = x - P_H x = P_{H^\perp}x$. Next, this residual is subtracted from $\hat{x}$ to reach $x'$ on the opposite side of—but with equal distance $\inf_{y \in H}\|x - y\| = \|\tilde{x}\|$ to—the subspace $H$. Linearity of projectors and Pythagoras's theorem ensure that the map given by $x \mapsto x' = \hat{x} - \tilde{x}$, called Householder transform, is linear and length preserving. Moreover, the equalities $P_H(\hat{x} - \tilde{x}) = \hat{x}$ and $P_{H^\perp}(\hat{x} - \tilde{x}) = -\tilde{x}$ show that the reflection of $x' = \hat{x} - \tilde{x}$ about $H$ equals $(x')' = \hat{x} - (-\tilde{x}) = x$. Consequently, Householder transforms are also bijective and therefore provide unitary maps from $\mathbb{R}^\ell$ to $\mathbb{R}^\ell$.

Figure 4.7
The figure illustrates the geometry of a reflection and shows that the image of a given $b_1 \in \mathbb{R}^2$ under a suitable reflection lies on the first coordinate axis. Panel (A) illustrates the reflection of a nonzero element $x \in \mathbb{R}^3$ about a hyperplane $H = (\operatorname{span}\{u\})^\perp$. Panel (B) shows the transformation of a nonzero $b_1 \in \mathbb{R}^2$ into a multiple $-\|b_1\|e_1$ of the first standard basis element. The transition from $b_1$ to $-\|b_1\|e_1$ proceeds by reflection about $H = (\operatorname{span}\{b_1 + \|b_1\|e_1\})^\perp$.

The utility of reflections comes from their ability to transform a point $x$ into any target $x'$ of the same length by adequate choice of $H$. In fact, a reflection modifies $x$ merely with respect to its component $\tilde{x}$ in $H^\perp$. Thus, as shown in panel (A) of figure 4.7, the difference $x - x'$ lies in the one dimensional subspace $H^\perp$. Unit dimension ensures that if $x \ne x'$, then scaling the difference $x - x'$ leads to an orthonormal basis of $H^\perp$ in form of $u = (x - x')/\|x - x'\|$. The residual $x - u\langle u, x\rangle$ from orthogonally projecting $x$ onto $H^\perp$ equals the orthogonal projection of $x$ onto $H$. Thus, a reflection transforming $x$ into $x'$ is given by $\operatorname{id} - 2u\langle u, \cdot\rangle$, wherein $\operatorname{id}$ symbolizes the identity map on $\mathbb{R}^\ell$. If $x = x'$, then $\operatorname{id}$ maps $x$ to $x'$ and setting $u$ to zero generalizes the previous construct.

The situation of lemma 4.6 is given by $Z = VB$, wherein the kernel of $Z = [z_1 \cdots z_\ell]$ equals $\{0\}$, $V = [v_1 \cdots v_\ell]$ is unitary, and $b_{i,j} = \langle v_i, z_j\rangle$. Linear independence of $z_1, \ldots, z_\ell$, $\ell \ge 1$, implies that $z_1$ and thus its length $\|z_1\|$ is nonzero. The two elements $\pm\|b_1\|e_1$—$e_1$ being the first standard basis element of $\mathbb{R}^\ell$ as in example (a) of section 2.1—share their length with $b_1 = (b_{1,1}, \ldots, b_{\ell,1})$ and hence $z_1$. Therefore, a suitable Householder transform $O_1$ yields $O_1 b_1 = \pm\|b_1\|e_1$. Panel (B) of figure 4.7 illustrates the construction of $O_1$ for the choice of $-\|b_1\|e_1$ and a nonzero element $b_1$ of $\mathbb{R}^2$. Therein, the element needed for constructing $O_1$—previously denoted by $u$—follows by scaling $b_1 + \|b_1\|e_1$.
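The construction in panel (B) admits a compact numerical sketch; the helper below and the concrete $b_1$ are hypothetical and merely verify that $O_1 b_1 = -\|b_1\|e_1$ and that $O_1$ is unitary.

```python
import numpy as np

def householder_first_column(b1):
    # Reflection about (span{b1 + ||b1|| e1})-perp, cf. panel (B) of
    # figure 4.7; assumes b1 differs from -||b1|| e1, so the scaling
    # below is well defined.
    e1 = np.zeros_like(b1)
    e1[0] = 1.0
    w = b1 + np.linalg.norm(b1) * e1
    u = w / np.linalg.norm(w)              # orthonormal basis of H-perp
    return np.eye(len(b1)) - 2.0 * np.outer(u, u)   # O_1 = I - 2 u u^T

b1 = np.array([3.0, 4.0])
O1 = householder_first_column(b1)
print(O1 @ b1)        # approximately [-5, 0], i.e. -||b1|| e1
print(O1 @ O1)        # identity matrix: O_1 is symmetric and unitary
```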

The composition $VO_1$ of two unitary maps is itself of that kind, and consequently its images of the standard basis elements $e_1, \ldots, e_\ell$ form another orthonormal basis of $W_0 = \operatorname{span}\{z_1, \ldots, z_\ell\}$. Moreover, the Householder transform $O_1$ represents the multiplication with the symmetric matrix $I - 2uu^T$, wherein $u$ denotes an orthonormal basis of the respective $H^\perp$. The resulting equality $O_1^T = O_1$ proves—once more—that $O_1 O_1 = I$ with $I$ being the identity matrix. This equality underlies the first basis change given by

$$Z = VB = (VO_1)(O_1B) = \underbrace{\big[q_1\ v_2^{(1)}\ \cdots\ v_\ell^{(1)}\big]}_{VO_1}\ \underbrace{\begin{bmatrix} r_{1,1} & r_{1,2} & \ldots & r_{1,\ell} \\ & b_{2,2}^{(1)} & \ldots & b_{2,\ell}^{(1)} \\ & \vdots & \ddots & \vdots \\ & b_{\ell,2}^{(1)} & \ldots & b_{\ell,\ell}^{(1)} \end{bmatrix}}_{O_1B},$$

wherein the first row contains the coordinates of $z_1, \ldots, z_\ell$ with respect to $q_1$ and the remaining rows contain their coordinates with respect to $v_2^{(1)}, \ldots, v_\ell^{(1)}$.

This display assumes $\ell > 1$. In any case, one has $z_1 = r_{1,1}q_1$, that is, $q_1 = \pm z_1/\|z_1\|$ and $r_{1,1} = \pm\|z_1\|$. If $\ell \ge 2$, then the orthogonal projection of $z_j$, $j \ge 2$, onto $\operatorname{span}\{z_1\}$ equals $r_{1,j}q_1$. Hence, the quantities $q_1$ and $r_{1,j}$, $j \le \ell$, coincide with those generated by the Gram-Schmidt orthogonalization <2.3>. Moreover, the sequence $v_2^{(1)}, \ldots, v_\ell^{(1)}$ forms an orthonormal basis of $(\operatorname{span}\{z_1\})^\perp$ (in $W_0$). Consequently, $b_j^{(1)} = (b_{2,j}^{(1)}, \ldots, b_{\ell,j}^{(1)})$ equals the coordinate vector with respect to this basis of the residual $z_j^{(1)} = z_j - r_{1,j}q_1$ for all $j \ge 2$. Linear independence guarantees that these residuals and $b_j^{(1)}$ are nonzero.

If $\ell \ge 2$, then a suitable Householder transform $O_2'$ reflects the coordinate vector $b_2^{(1)}$ with respect to $v_2^{(1)}, \ldots, v_\ell^{(1)}$ into one of the two vectors $\pm\|b_2^{(1)}\|e_1$. The relevant properties of $O_2'$ carry over to $O_2 = \begin{bmatrix} 1 & \\ & O_2' \end{bmatrix}$, which drives the next basis change. If $\ell > 2$, then

$$Z = (VO_1O_2)(O_2O_1B) = \big[q_1\ q_2\ v_3^{(2)}\ \cdots\ v_\ell^{(2)}\big] \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & \ldots & r_{1,\ell} \\ & r_{2,2} & r_{2,3} & \ldots & r_{2,\ell} \\ & & b_{3,3}^{(2)} & \ldots & b_{3,\ell}^{(2)} \\ & & \vdots & \ddots & \vdots \\ & & b_{\ell,3}^{(2)} & \ldots & b_{\ell,\ell}^{(2)} \end{bmatrix},$$

wherein the first two rows contain the coordinates of $z_1, \ldots, z_\ell$ with respect to $q_1$ and $q_2$, the remaining rows contain their coordinates with respect to $v_3^{(2)}, \ldots, v_\ell^{(2)}$, and the presentation presupposes $\ell > 3$. Proceeding in this fashion verifies the assertion of lemma 4.6 for an arbitrary sequence length $\ell \in \mathbb{N}$.
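The whole sweep $R = O_\ell \cdots O_1 B$ can be sketched as follows. The sign of each reflection target is chosen adaptively to avoid cancellation—which is one of the "suitable sign choices" admitted by lemma 4.6—and linearly independent columns are assumed; the routine is an illustration, not a production implementation.

```python
import numpy as np

def reflect_to_triangular(B):
    """Apply Householder reflections O_1, ..., O_ell to the trailing
    blocks of B until O_ell ... O_1 B is upper triangular (lemma 4.6)."""
    ell = B.shape[1]
    R = B.astype(float).copy()
    Q = np.eye(ell)                         # accumulates O_1^T O_2^T ... O_ell^T
    for i in range(ell):
        b = R[i:, i]                        # coordinate vector b_i^(i-1)
        e1 = np.zeros_like(b)
        e1[0] = 1.0
        sgn = 1.0 if b[0] >= 0 else -1.0    # sign choice avoiding w = 0
        w = b + sgn * np.linalg.norm(b) * e1
        u = w / np.linalg.norm(w)
        O = np.eye(ell)                     # O_i = diag(I, I - 2 u u^T)
        O[i:, i:] -= 2.0 * np.outer(u, u)
        R = O @ R
        Q = Q @ O.T
    return Q, R

B = np.random.default_rng(0).standard_normal((4, 4))
Q, R = reflect_to_triangular(B)
print(np.allclose(Q @ R, B), np.allclose(np.tril(R, -1), 0.0))   # True True
```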

4.4.3. Recursive processing

This section designs a recursive procedure to evaluate orthogonal projections. The setting amounts to an extended special case of the scenario in section 4.4.1. More specifically, two sequences of real-valued functions $w_{t,j}$, $(t,j) \in I_x = \{1, \ldots, n\}\times\{1, \ldots, m\}$, $m, n \in \mathbb{N}$, and $v_{t,i}$, $(t,i) \in I_{\text{obs}} \ne \emptyset$, on a common set $\Omega$ form an orthonormal basis of a Euclidean space $(W, \langle\cdot,\cdot\rangle)$. In particular, the second index set $I_{\text{obs}}$ is a finite and nonempty subset of $\mathbb{N}\times\mathbb{N}$. The discussion mostly focuses on the additional elements $x_{t,j}$, $(t,j) \in I_x$, and $y_{t,i}$, $(t,i) \in I_{\text{obs}}$, of $W$. In fact, the task is to evaluate at $\omega$ the orthogonal projections $P_V x_{t,j}$ of $x_{t,j}$ onto $V = \operatorname{span}\{y_{t,i} \mid (t,i) \in I_{\text{obs}}\}$ based on coordinate information and the observations $y_{t,i}(\omega)$, $(t,i) \in I_{\text{obs}}$. The functions $x_{t,j}$ and $y_{t,i}$ are given by

$$X_1 = [x_{1,1}\ \cdots\ x_{1,m}] = W_1L_1^T, \qquad X_t = [x_{t,1}\ \cdots\ x_{t,m}] = X_{t-1}\Theta^T + \rho W_t,\ \ 2 \le t \le n, \quad\text{and} \tag{4.21a}$$

$$Y_t = [y_{t,1}\ \cdots\ y_{t,k_t}] = \overbrace{[X_{t-1}B_t^T\ \ X_t]}^{Z_t}\ \overbrace{\begin{bmatrix} -I & 0 \\ B_t^T & J_t^T \end{bmatrix}}^{A_t^T} + V_tS_t^T, \quad t \le n. \tag{4.21b}$$

The first part <4.21a> defines the quantities $x_{t,j}$. Therein, the linear maps $W_t$, $t \le n$, amount to $[w_{t,1}\ \cdots\ w_{t,m}]$, the kernel of the $m\times m$ matrix $L_1^T$ equals $\{0\}$, and $\rho > 0$. These restrictions guarantee linear independence of $x_{t,j}$, $(t,j) \in I_x$. Moreover, $\Theta \in \mathbb{R}^{m\times m}$.

The second part <4.21b> specifies the observables $y_{t,i}$. Here, the first summand equals

$$Z_tA_t^T = [X_{t-1}B_t^T\ \ X_t]\begin{bmatrix} -I & 0 \\ B_t^T & J_t^T \end{bmatrix} = [X_{t-1}\ X_t]\begin{bmatrix} -B_t^T & 0 \\ B_t^T & J_t^T \end{bmatrix}$$

with $J_t^T \in \mathbb{R}^{m\times k_t'}$, $B_t^T \in \mathbb{R}^{m\times k_t''}$, $k_t' + k_t'' = k_t > 0$ being the number of observations at $t$, and $k_t', k_t'' \in \mathbb{N}\cup\{0\}$. If either of $k_t'$ and $k_t''$ equals zero, then the other coincides with $k_t$, and $A_t^T$ consists of only one of the shown block columns. More specifically, the equality $k_t'' = 0$ implies $Z_t = X_t$ and $A_t^T = J_t^T$. This case holds for $t = 1$, and hence there is no need to ponder the meaning of $X_0$. If $k_t' > 0$, then the columns of $J_t^T$ form a sequence of linearly independent elements of $\mathbb{R}^m$. Finally, the second summand of $Y_t$ equals the composition $V_tS_t^T$ of the linear map $V_t = [v_{t,1}\ \cdots\ v_{t,k_t}]$ with the matrix $S_t^T \in \mathbb{R}^{k_t\times k_t}$.
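To make the specification tangible, the following sketch simulates one realization of <4.21a> and <4.21b> under simplifying assumptions: $k_t'' = 0$ for all $t$ (so $Z_t = X_t$ and $A_t^T = J_t^T$), $L_1 = I$, constant $J_t = J$ and $S_t = S$, and independent standard Gaussian coordinates for $W_t$ and $V_t$. All concrete values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 50, 2, 1
Theta = np.array([[0.9, 0.1],
                  [0.0, 0.7]])              # state transition Theta
rho = 0.5                                   # innovation scale rho > 0
J = np.ones((k, m))                         # J^T has linearly indep. columns
S = 0.3 * np.eye(k)                         # error loading S^T

x = np.zeros((n, m))                        # rows X_t(omega)
y = np.zeros((n, k))                        # rows Y_t(omega): the observations
x[0] = rng.standard_normal(m)               # X_1 = W_1 L_1^T with L_1 = I
y[0] = J @ x[0] + S @ rng.standard_normal(k)
for t in range(1, n):
    x[t] = Theta @ x[t - 1] + rho * rng.standard_normal(m)   # <4.21a>
    y[t] = J @ x[t] + S @ rng.standard_normal(k)             # <4.21b>, k_t'' = 0
```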

The subsequent discussion assumes linear independence of the observables $y_{t,i}$, $(t,i) \in I_{\text{obs}} = \bigcup_{t\le n}\big(\{t\}\times\{1, \ldots, k_t\}\big)$. The latter amounts to linear independence of the columns of the matrix shown in figure 4.8 as $x_{t,j}$, $v_{t,i}$, $(t,j) \in I_x$, $(t,i) \in I_{\text{obs}}$, are linearly independent. The equalities $\ker S_t^T = \{0\}$, $t \le n$, or alternatively $k_t'' = 0$, $t \le n$, guarantee this condition. Otherwise, if $k_t'' > 0$ and $\ker S_t^T \ne \{0\}$ for some $t \le n$, then the requirement of linearly independent observables restricts consecutive matrices $J_t$, $J_{t-1}$ and $\ker B_t^T$, $t \ge 2$.

Changing the focus from $x_{t,j}$, $(t,j) \in I_x$, to the functions $z_{t,j}$, $t \le n$, $j \le k_t''+m$, lowers the complexity of the notation when designing a computational strategy to evaluate the mentioned predictions. In fact, the equalities $x_{t,j-k_t''} = z_{t,j}$, $j \ge k_t''+1$, imply that the computation of the rows $(P_VZ_t)(\omega)$ of $P_VZ_t$, $t \le n$, settles the original prediction task.

Figure 4.8
The figure shows the matrix of coordinates of the columns of $[Y_1\ \cdots\ Y_n]$ with respect to the columns of $[X_1\ \ldots\ X_n\ V_1\ \ldots\ V_n]$ as specified in <4.21a> and <4.21b> and for the case $n > 3$: the block column of $Y_t$ carries $-B_t^T$ in the row of $X_{t-1}$, $B_t^T$ and $J_t^T$ in the row of $X_t$ (with $J_1^T$ alone at $t = 1$), and $S_t^T$ in the row of $V_t$. If the equality $k_t'' = 0$ holds—as in case $t = 1$—or $k_t' = 0$ for some $t \le n$, then $k_t' = k_t > 0$ or $k_t'' = k_t > 0$ and the block column containing $B_t^T$ and $J_t^T$, respectively, disappears.

In addition, the specification in <4.21a> may be replaced by the equalities

$$Z_1 = W_1L_1^T, \qquad Z_t = Z_{t-1}\overbrace{\begin{bmatrix} 0 & 0 \\ B_t^T & \Theta^T \end{bmatrix}}^{T_t^T} + W_t\,\overbrace{[\,0\ \ \rho I\,]}^{K^T}, \quad 2 \le t \le n, \tag{4.22}$$

wherein $I$ denotes the $m\times m$ identity matrix, and $n \ge 2$ is assumed. If the number $k_{t-1}''$ of columns of $B_{t-1}^T$ equals zero—as in case $t = 2$—then the first block row of $T_t^T$ disappears. The same applies to the first block column of $T_t^T$ and $K^T$ if $k_t'' = 0$. If the equalities $k_{t-1}'' = 0$ and $k_t'' = 0$ hold simultaneously, then $Z_t = X_t$, $T_t = \Theta$, and $K = \rho I$.

The following presentation considers the case $n > 4$. In this setting, the strategy of section 4.4.1 is applicable. However, adapting the computations to the specification <4.22> and <4.21b> makes it possible to exploit this particular structure. The rearranged equivalent

$$[Y_1\ Z_1\ Y_2\ Z_2\ \ldots\ Y_n\ Z_n] = [W_1\ V_1\ W_2\ V_2\ \ldots\ W_n\ V_n]B \tag{4.23}$$

to <4.16>, wherein the matrix $B$ is as shown in figure 4.9, facilitates this endeavor.

The computational strategy developed here proceeds in two stages. The first stage—referred to as filtering—exchanges the orthonormal basis consisting of the columns of $W_t$ and $V_t$, $t \le n$, for another orthonormal basis which contains an orthonormal basis of $V = \operatorname{span}\{y_{t,i} \mid (t,i) \in I_{\text{obs}}\}$ as a subsequence. Successive evaluation at $\omega$ of the orthogonal projections of the columns of $Z_1$ onto $V^{(1)} = \operatorname{img}Y_1$, the orthogonal projections of the columns of $Z_2$ onto $V^{(2)} = \operatorname{img}[Y_1\ Y_2]$, and so forth allows a considerable reduction of the computations needed to evaluate the elements of this subsequence. The second stage—called smoothing—calculates the required images $(P_Vx_{t,j})(\omega)$, $(t,j) \in I_x$, based on the output of the first stage. Herein, successive calculation of the entries of $(P_VZ_{n-1})(\omega)$, the entries of $(P_VZ_{n-2})(\omega)$, and so forth further avoids redundant calculations.

The first step of the filtering stage replaces the columns of $W_1$ and $V_1$. More specifically, a suitable sequence of $k_1$ Householder transforms—chosen with respect to the columns of $Y_1$, but applied to the relevant rows of all columns of $B$—yields

$$[Y_1\ Z_1\ Y_2\ Z_2\ \cdots] = [Q_1\ V_1'\ W_2\ V_2\ \cdots] \begin{bmatrix} R_{1,Y_1} & R_{1,Z_1} & R_{1,Z_1}T_2^TA_2^T & R_{1,Z_1}T_2^T & \ldots \\ & L_{2,*}^T & L_{2,*}^TT_2^TA_2^T & L_{2,*}^TT_2^T & \ldots \\ & & K^TA_2^T & K^T & \ldots \\ & & S_2^T & & \\ & & & & \ddots \end{bmatrix}.$$

Therein, the first block row contains the coordinates, with respect to $Q_1$, of the projections onto $V^{(1)} = \operatorname{img}Y_1$, whereas the remaining block rows contain the coordinates, with respect to the columns of $V_1'$, $W_2$, $V_2$, \ldots as modified by the Householder transforms, of the projections onto $[V^{(1)}]^\perp$. In particular, the coordinates of $P_{(V^{(1)})^\perp}Y_2$, $P_{(V^{(1)})^\perp}Z_2$, $P_{(V^{(1)})^\perp}Y_3$, $P_{(V^{(1)})^\perp}Z_3$, \ldots with respect to $W_2' = [V_1'\ W_2]$, $V_2$, $W_3$, $V_3$, \ldots amount to

$$\begin{bmatrix} L_2^TA_2^T & L_2^T & L_2^TT_3^TA_3^T & L_2^TT_3^T & \ldots \\ S_2^T & & & & \\ & & K^TA_3^T & K^T & \ldots \\ & & S_3^T & & \\ & & & & \ddots \end{bmatrix} \qquad\text{with}\qquad L_2^T = \begin{bmatrix} L_{2,*}^TT_2^T \\ K^T \end{bmatrix}.$$

Therein, the use of solely $k_1$ Householder transforms does not ensure a particular structure of $L_{2,*}^T$, which is therefore treated as a general $m\times m$ matrix. In contrast, linear independence of $y_{1,1}, \ldots, y_{1,k_1}$ implies that $R_{1,Y_1}$ is a $k_1\times k_1$ upper triangular matrix with nonzero diagonal elements. This structure facilitates the evaluation of the columns of $Q_1$, which form an orthonormal basis of $V^{(1)}$, via the equality $Y_1 = Q_1R_{1,Y_1}$. In particular, the recursive strategy in <4.20> is applicable to the calculation of the entries $q_{1,1}(\omega), \ldots, q_{1,k_1}(\omega)$ of $Q_1(\omega)$. Once the row $Q_1(\omega) \in \mathbb{R}^{k_1}$ is available, the images of $\omega$ under the columns of $P_{V^{(1)}}Z_1$, $P_{V^{(1)}}Z_2$, and $P_{(V^{(1)})^\perp}Y_2$ may be obtained via

$$P_{V^{(1)}}Z_1 = Q_1R_{1,Z_1}, \qquad P_{V^{(1)}}Z_2 = \big(P_{V^{(1)}}Z_1\big)T_2^T, \qquad\text{and}\qquad P_{(V^{(1)})^\perp}Y_2 = Y_2 - \big(P_{V^{(1)}}Z_2\big)A_2^T.$$

Figure 4.9
The figure illustrates the structure of the coordinate matrix $B$ in <4.23> under the assumption that $n > 3$; the shown structure derives from the specification in <4.22> and <4.21b>. With block rows corresponding to $W_1, V_1, W_2, V_2, \ldots, W_n, V_n$ and block columns to the coordinates of $Y_1, Z_1, Y_2, Z_2, \ldots, Y_n, Z_n$, the matrix reads

$$B = \begin{bmatrix} L_1^TA_1^T & L_1^T & L_1^TT_2^TA_2^T & L_1^TT_2^T & \ldots & L_1^TT_2^T\cdots T_n^TA_n^T & L_1^TT_2^T\cdots T_n^T \\ S_1^T & & & & & & \\ & & K^TA_2^T & K^T & \ldots & K^TT_3^T\cdots T_n^TA_n^T & K^TT_3^T\cdots T_n^T \\ & & S_2^T & & & & \\ & & & & \ddots & \vdots & \vdots \\ & & & & & K^TA_n^T & K^T \\ & & & & & S_n^T & \end{bmatrix}.$$

Moreover, the coordinates of the orthogonal projections of the columns of $Z_t$ and $Y_t$, $t \ge 2$, onto $(V^{(1)})^\perp$—shown in the lower part of the previous display—exhibit the same overall structure as those of $Z_t$ and $Y_t$, $t \ge 1$. Consequently, the previous steps can be repeated with (the coordinate matrices of) $Z_t$, $Y_t$, $t \ge 1$, replaced by (those of) the projections $P_{(V^{(1)})^\perp}Z_t$, $P_{(V^{(1)})^\perp}Y_t$, $t \ge 2$, as well as $W_t$, $V_t$, $t \ge 1$, replaced by $W_2' = [V_1'\ W_2]$ (a linear map with $2m$ columns), $V_2$, $W_t$, $V_t$, $t \ge 3$. A suitable sequence of $k_2$ Householder transforms applied to the parts of the coordinate vectors corresponding to $W_2'$, $V_2$ yields

$$[\cdots\ Y_2\ Z_2\ Y_3\ Z_3\ \cdots] = [\cdots\ Q_2\ V_2'\ W_3\ V_3\ \cdots] \begin{bmatrix} \ddots & & & & & \\ & R_{2,Y_2} & R_{2,Z_2} & R_{2,Z_2}T_3^TA_3^T & R_{2,Z_2}T_3^T & \ldots \\ & & L_{3,*}^T & L_{3,*}^TT_3^TA_3^T & L_{3,*}^TT_3^T & \ldots \\ & & & K^TA_3^T & K^T & \ldots \\ & & & S_3^T & & \\ & & & & & \ddots \end{bmatrix},$$

wherein the shown block columns contain the coordinates of $P_{(V^{(1)})^\perp}Y_2$, $P_{(V^{(1)})^\perp}Z_2$, $P_{(V^{(1)})^\perp}Y_3$, and $P_{(V^{(1)})^\perp}Z_3$, the block row of $Q_2$ contains coordinates of projections onto $V^{(2)} = \operatorname{img}[Y_1\ Y_2]$, and the lower block rows contain coordinates of projections onto $(V^{(2)})^\perp$.

Herein, linear independence of $y_{1,1}, \ldots, y_{1,k_1}, y_{2,1}, \ldots, y_{2,k_2}$ ensures that $R_{2,Y_2} \in \mathbb{R}^{k_2\times k_2}$ is upper triangular with nonzero diagonal elements. Hence, the equality $P_{(V^{(1)})^\perp}Y_2 = Q_2R_{2,Y_2}$ allows the computation of the row $Q_2(\omega) \in \mathbb{R}^{k_2}$ via <4.20>. The columns of the corresponding linear map $Q_2$ form an orthonormal basis of the subspace $\operatorname{img}P_{(V^{(1)})^\perp}Y_2$, which equals the orthogonal complement of $V^{(1)}$ in $V^{(2)}$. The images under the relevant projections onto $V^{(2)}$ and $(V^{(2)})^\perp$, respectively, follow from

$$P_{V^{(2)}}Z_2 = P_{V^{(1)}}Z_2 + Q_2R_{2,Z_2}, \qquad P_{V^{(2)}}Z_3 = \big(P_{V^{(1)}}Z_2\big)T_3^T + \big(Q_2R_{2,Z_2}\big)T_3^T = \big(P_{V^{(2)}}Z_2\big)T_3^T,$$
$$\text{and}\qquad P_{(V^{(2)})^\perp}Y_3 = Y_3 - \big(P_{V^{(2)}}Z_3\big)A_3^T.$$

Finally, the linear map $W_3' = [V_2'\ W_3]$ exhibits $3m$ columns and thus $L_3^T \in \mathbb{R}^{3m\times(k_3''+m)}$.

The next steps of the filtering stage proceed in analogy. Display <4.24> contains a complete description. The notation used therein is in accordance with the previous discussion. In addition, the symbols $y_t$, $\tilde{y}_{t+1|t}$, $\hat{z}_{t|j}$, $j \in \{t-1, t\}$, and $q_t$ refer to the vectors $Y_t(\omega)$, $\big(P_{(V^{(t)})^\perp}Y_{t+1}\big)(\omega)$, $\big(P_{V^{(j)}}Z_t\big)(\omega)$, and $Q_t(\omega)$, respectively. Then,

1    $\tilde{y}_{1|0} = y_1$
2    $\hat{z}_{1|0} = 0$
3    for $t = 1, \ldots, n$
4        $D_t = \begin{bmatrix} L_t^TA_t^T & L_t^T \\ S_t^T & 0 \end{bmatrix} \xrightarrow{\text{Householder transforms}} D_t' = \begin{bmatrix} R_{t,Y_t} & R_{t,Z_t} \\ & L_{t+1,*}^T \end{bmatrix}$
5        solve $R_{t,Y_t}^Tq_t = \tilde{y}_{t|t-1}$
6        $\hat{z}_{t|t} = \hat{z}_{t|t-1} + R_{t,Z_t}^Tq_t$
7        if $t < n$
8            $L_{t+1}^T = \begin{bmatrix} L_{t+1,*}^TT_{t+1}^T \\ K^T \end{bmatrix}$
9            $\hat{z}_{t+1|t} = T_{t+1}\hat{z}_{t|t}$
10           $\tilde{y}_{t+1|t} = y_{t+1} - A_{t+1}\hat{z}_{t+1|t}$
<4.24>

wherein the number of rows of the remainder $L_{t+1,*}^T$ equals $tm$. In fact, the representation of $Z_{t_0+1}$ after processing line 8 of <4.24> with $t = t_0 < n$ has the form

$$Z_{t_0+1} = \big[\underbrace{Q_1\ Q_2\ \ldots\ Q_{t_0}}_{\substack{\text{orthon. basis of the } (\sum_{t\le t_0}k_t)\text{-dim.} \\ \text{lin. space } \operatorname{img}[Y_1\,\cdots\,Y_{t_0}]}}\ V_{t_0}'\ W_{t_0+1}\big] \begin{bmatrix} R_{1,Z_1}T_2^T\cdots T_{t_0+1}^T \\ R_{2,Z_2}T_3^T\cdots T_{t_0+1}^T \\ \vdots \\ R_{t_0,Z_{t_0}}T_{t_0+1}^T \\ L_{t_0+1}^T \end{bmatrix},$$

wherein the map $[Q_1\ \cdots\ Q_{t_0}\ V_{t_0}'\ W_{t_0+1}]$ provides an orthonormal basis of the $(\sum_{t\le t_0}k_t + t_0m + m)$-dimensional linear space $\operatorname{img}[W_1\ V_1\ \cdots\ W_{t_0}\ V_{t_0}\ W_{t_0+1}]$, $L_{t_0+1} = [T_{t_0+1}L_{t_0+1,*}\ \ K]$, and the presentation considers the case $t_0 > 2$. However, the rank of $Z_{t_0+1}$ and therefore the rank of its coordinate block column in the previous display equals at most $k_{t_0+1}''+m$. If $(t_0+1)m > \mathrm{rk}\,P_{(V^{(t_0)})^\perp}Z_{t_0+1} = \mathrm{rk}\,L_{t_0+1}^T$, then an intermediate auxiliary basis change allows a reduction of the number of rows of $L_{t_0+1}^T$. This additional transformation is also needed to develop the smoothing stage.
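A compact numerical sketch of <4.24> for the simplified case $k_t'' = 0$ (so $Z_t = X_t$, $T_t = \Theta$, $K^T$ degenerates to $\rho I$, and $A_t = J$) may read as follows. It reuses the hypothetical quantities from the simulation sketch above and replaces the explicit Householder sweep by a QR factorization of $D_t$, which performs the same triangularization up to sign choices that cancel in the subsequent steps.

```python
import numpy as np
from scipy.linalg import solve_triangular

def filter_4_24(y, L1, Theta, rho, J, S):
    """Evaluate z^_{t|t} = (P_{V^(t)} Z_t)(omega), t <= n, following <4.24>."""
    n, k = y.shape
    m = L1.shape[0]
    LT = L1.T                               # L_t^T
    y_pred = y[0].copy()                    # line 1: y~_{1|0} = y_1
    z_hat = np.zeros(m)                     # line 2: z^_{1|0} = 0
    out = np.zeros((n, m))
    for t in range(n):
        # line 4: triangularize D_t = [[L_t^T A_t^T, L_t^T], [S_t^T, 0]]
        D = np.block([[LT @ J.T, LT],
                      [S.T, np.zeros((k, m))]])
        R = np.linalg.qr(D, mode='r')       # QR stands in for the Householder sweep
        R_Y, R_Z = R[:k, :k], R[:k, k:]     # R_{t,Y_t}, R_{t,Z_t}
        L_star = R[k:k + m, k:]             # L_{t+1,*}^T
        q = solve_triangular(R_Y.T, y_pred, lower=True)   # line 5
        z_hat = z_hat + R_Z.T @ q                         # line 6
        out[t] = z_hat
        if t < n - 1:
            LT = np.vstack([L_star @ Theta.T, rho * np.eye(m)])   # line 8
            z_hat = Theta @ z_hat                                 # line 9
            y_pred = y[t + 1] - J @ z_hat                         # line 10
    return out

z_filtered = filter_4_24(y, np.eye(m), Theta, rho, J, S)
```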

The output generated during the filtering stage is comparable to that of the Gram-Schmidt orthogonalization <2.3>. Each iteration—indexed by $t$—of <4.24> considers a sequence $y_{t,1}, \ldots, y_{t,k_t}$ instead of a single $y_t$ as in <2.3>, but also involves an orthogonalization part (lines 4 and 10) and a scaling step (line 5). However, the complexity of the orthogonalization part—measured as the number of orthogonalization steps $k_t$—does not increase systematically with $t$ in <4.24>—or equivalently $j$ in <2.3>. This reduction in computational effort provides the gain from exploiting the present structure.

The recursion <4.24> does not yield the coordinates with respect to $Q_{t+1}, \ldots, Q_n$ of the columns of $P_VX_t$, $t < n$. These coordinates are, however, needed in the smoothing stage. A reconsideration of the first step of the filtering stage yields a suitable extension of <4.24> in form of the above mentioned auxiliary basis change. More specifically, the first step of filtering yields the modified representation

$$[Y_1\ Z_1\ Y_2\ Z_2\ \cdots] = [Q_1\ V_1'\ W_2\ V_2\ \cdots] \begin{bmatrix} R_{1,Y_1} & R_{1,Z_1} & R_{1,Z_1}T_2^TA_2^T & R_{1,Z_1}T_2^T & R_{1,Z_1}T_2^TT_3^TA_3^T & \ldots \\ & \begin{bmatrix} L_{2,*}^T \\ 0 \end{bmatrix} & L_2^TA_2^T & L_2^T & L_2^TT_3^TA_3^T & \ldots \\ & & S_2^T & & & \\ & & & & & \ddots \end{bmatrix}.$$

Therein, the $2m\times(k_2''+m)$ matrix $L_2^T$ exhibits an extended singular value decomposition

$$L_2^T = \begin{bmatrix} L_{2,*}^TT_2^T \\ K^T \end{bmatrix} = \underbrace{\big[\,\bar{u}_1\ \cdots\ \bar{u}_{\mathrm{rk}L_2^T}\ \ \bar{u}_{\mathrm{rk}L_2^T+1}\ \cdots\ \bar{u}_{2m}\,\big]}_{\bar{U}_2} \begin{bmatrix} \bar{D}_2 \\ 0 \end{bmatrix}\bar{V}_2^T,$$

wherein $\bar{u}_1, \ldots, \bar{u}_{\mathrm{rk}L_2^T}$ denote the left singular vectors of $L_2^T$, the columns $\bar{u}_{\mathrm{rk}L_2^T+1}, \ldots, \bar{u}_{2m}$ extend them to an orthonormal basis of $\mathbb{R}^{2m}$, the diagonal matrix $\bar{D}_2$ contains the nonzero singular values $\sigma_1(L_2^T), \ldots, \sigma_{\mathrm{rk}L_2^T}(L_2^T)$, and the rows of $\bar{V}_2^T$ provide the right singular vectors of $L_2^T$. Herein, $\mathrm{rk}\,L_2^T < 2m$ is assumed for the sake of presentation, but $\mathrm{rk}\,L_2^T = 2m$ is possible. In the latter case, the lower block of zeros in the singular value matrix as well as the extension $\bar{u}_{\mathrm{rk}L_2^T+1}, \ldots, \bar{u}_{2m}$ disappears. In any case, one has the (in)equalities $m \le \mathrm{rk}\,L_2^T = \mathrm{rk}\,P_{(V^{(1)})^\perp}Z_2 \le \mathrm{rk}\,Z_2 \le k_2''+m$, wherein the first inequality is due to the presence of $K^T$. This extended singular value decomposition leads to the equalities

$$W_2'\bar{U}_2 = [V_1'\ W_2]\,\bar{U}_2 = [W_2''\ W_{2,*}''] \qquad\text{and}\qquad \bar{U}_2^T\begin{bmatrix} L_{2,*}^T \\ 0 \end{bmatrix} = \begin{bmatrix} R_{2,Z_1} \\ L_{2,**}^T \end{bmatrix}. \tag{4.25}$$

Therein, the leftmost term of the first equality amounts to the composition of two unitary maps and thus is itself unitary. The right-hand side of the previous display shows the coordinates of $P_{(V^{(1)})^\perp}Z_1$ with respect to the columns of $[W_2''\ W_{2,*}'']$ as $\bar{U}_2\bar{U}_2^T = \sum_{i\le 2m}\bar{u}_i\langle\bar{u}_i, \cdot\rangle \in \mathbb{R}^{2m\times 2m}$ represents the orthogonal projector $P_{\mathbb{R}^{2m}}$, hence, equals the identity matrix $I$. The partition of this coordinate matrix corresponds to that of the matrix $\bar{U}_2$, that is, $R_{2,Z_1}$ is the $\mathrm{rk}L_2^T\times(k_2''+m)$ matrix containing the inner products of the original coordinate vectors with $\bar{u}_1, \ldots, \bar{u}_{\mathrm{rk}L_2^T}$. This notation leads to

$$[Y_1\ Z_1\ Y_2\ Z_2\ \cdots] = [Q_1\ W_2''\ V_2\ \cdots\ W_{2,*}''] \begin{bmatrix} R_{1,Y_1} & R_{1,Z_1} & R_{1,Z_1}T_2^TA_2^T & R_{1,Z_1}T_2^T & R_{1,Z_1}T_2^TT_3^TA_3^T & \ldots \\ & \bar{L}_2^TB_2^T & \bar{L}_2^TA_2^T & \bar{L}_2^T & \bar{L}_2^TT_3^TA_3^T & \ldots \\ & & S_2^T & & & \\ & & & & & \ddots \\ & L_{2,**}^T & & & & \end{bmatrix},$$

wherein $\bar{L}_2^T = \bar{D}_2\bar{V}_2^T = [\bar{u}_1\ \cdots\ \bar{u}_{\mathrm{rk}L_2^T}]^TL_2^T$, $B_2^T = \bar{V}_2\bar{D}_2^{-1}R_{2,Z_1}$, and therefore $\bar{L}_2^TB_2^T = R_{2,Z_1}$. Herein, the case of $L_{2,*}^T$ being an $m\times(k_2''+m)$ zero matrix is possible and indicates $\operatorname{img}Z_1 \subset \operatorname{img}Y_1$. Then, all entries of the matrices $R_{2,Z_1}$, $L_{2,**}^T$, and $B_2^T$ equal zero.

The coordinate matrix in the previous display—ignoring its final block row—exhibits the same structure as before the auxiliary basis change but with $L_2^T$ replaced by $\bar{L}_2^T$. Consequently, the second filtering step may proceed as above to obtain an orthonormal basis of $\operatorname{img}P_{(V^{(1)})^\perp}Y_2$ by transformation of the columns of the linear map $[W_2''\ V_2]$. Inserting an auxiliary basis change of the form <4.25> and some rearrangements of the basis elements at the end of every filtering step yields the matrices $B_2^T, B_3^T, \ldots, B_n^T$, which are needed in the smoothing stage. Exchanging line 8 of <4.24> with

8    $L_{t+1}'^T = \begin{bmatrix} L_{t+1,*}^TT_{t+1}^T \\ K^T \end{bmatrix} = \big[\,\bar{u}_1\ \cdots\ \bar{u}_{\mathrm{rk}(L_{t+1}')^T}\,\big]\,\bar{D}_{t+1}\bar{V}_{t+1}^T$
9    $B_{t+1}^T = \bar{V}_{t+1}\bar{D}_{t+1}^{-1}\big[\,\bar{u}_1\ \cdots\ \bar{u}_{\mathrm{rk}(L_{t+1}')^T}\,\big]^T\begin{bmatrix} L_{t+1,*}^T \\ 0 \end{bmatrix}$
10   $L_{t+1}^T = \bar{D}_{t+1}\bar{V}_{t+1}^T$
<4.26>

provides a suitably extended filtering procedure, wherein the notation is adapted to that of <4.24>. A matrix $L_{t+1,*}^T$ obtained using the extension <4.26> contains at most $k_t''+m$ rows; thus, there is no systematic increase of the row count. Moreover, the orthonormal basis $q_{1,1}', \ldots, q_{1,k_1}', \ldots, q_{n,k_n}'$ of $V$ evaluated when using the above extension coincides with the orthonormal basis $q_{1,1}, \ldots, q_{n,k_n}$ considered by the unmodified recursions <4.24> up to sign changes. In fact, both bases lead to a representation—in the sense of section 2.2.1—of the columns of $Y = [Y_1\ \ldots\ Y_n]$ in form of a $k\times k$ upper triangular matrix with nonzero diagonal elements. As a consequence, the same equality assertion applies to the corresponding coordinates of the projections onto $V^{(t)}$, $t \le n$.

The extended filtering stage concludes with a representation of the columns of the linear map $[Y_1\ Z_1\ Y_2\ Z_2\ \cdots\ Y_n\ Z_n]$ as shown in figure 4.10. Consequently, the smoothing stage may proceed as shown in <4.27>. The latter uses the same notation as <4.24>.

Figure 4.10
The figure shows a representation—in the sense of section 2.2.1—of the columns of the linear map $[Y_1\ Z_1\ \ldots\ Y_n\ Z_n]$ obtained (implicitly) during the extended filtering stage described in <4.24> and <4.26> and if each filtering step is supplemented with a rearrangement of the basis elements as indicated in the display above <4.26>.

It further assumes that the output of the extended filtering stage, in particular $B_2, \ldots, B_n$, $q_1, \ldots, q_n$, and $\hat{z}_{1|1}, \ldots, \hat{z}_{n-1|n-1}$, is available. The vector $\hat{z}_{n|n}$ generated during the final filtering step equals the row $(P_VZ_n)(\omega)$ of $P_VZ_n$ and thus requires no further treatment.

1    $r_{n-1|n} = B_nR_{n,Z_n}^Tq_n$
2    for $t = n-1, \ldots, 1$
3        $\hat{z}_{t|n} = \hat{z}_{t|t} + r_{t|t+1}$
4        if $t > 1$
5            $r_{t-1|t} = B_t\big(r_{t|t+1} + R_{t,Z_t}^Tq_t\big)$
<4.27>

Comments and references

Section 4.1 Stewart and Sun (1990, sec. I.5, exercise 3) provide the definition of $\theta_{\max}$ in <4.2b>; the notation is borrowed from Böttcher and Spitkovsky (2010, ex. 3.5). The cosines of the angles $\theta_1, \ldots, \theta_\ell$ are sometimes called canonical correlations (Anderson, 1958, sec. 12.2, def. 12.2.1). The content of lemma 4.2 can be found in Böttcher and Spitkovsky (2010) and Galántai (2008). Wedin (1983, sec. 1) serves as a role model for the discussion in sections 4.1.1 and 4.1.2; his figure 4 closely resembles panel (A) of figure 4.2. The latter investigation implies $P_VP_Uv_i' = (\cos^2\theta_i)v_i'$ and $P_UP_Vu_i = (\cos^2\theta_i)u_i$ (Galántai, 2008, cor. 3), that is, $v_i'$ and $u_i$ provide eigenvectors of $P_VP_U$ and $P_UP_V$, respectively, associated with the eigenvalue $\cos^2\theta_i$. In particular, $U$ and $V$ uniquely determine all of their principal angles—not just $\theta_{\max}$ and $\theta_{\min,\ne 0}$. Wedin (1983, app. 1, (A5)) generalizes the upper bound resulting from <4.1> and <4.3b>. Alternatively, Zhu and Knyazev (2013, thm. 4.1, rem. 4.1, tbl. 2 (1,2-entry)) show that if $V$ and $U$ are equal dimensional, then the $i$-th singular value of $P_VP_{U/V} = P_V - P_{V/U}$ equals $\tan\theta_i$, which implies <4.1>. Their figure 1 illustrates this phenomenon.

Section 4.2 Björck (1996, sec. 5.1.1) considers the sequential least-squares problem in <4.7> as a generalization of the classical constrained least-squares problem. Comparable convergence assertions to those at the end of section 4.2.2 are given by Lawson and Hanson (1974, ch. 22), Stewart (1997), Ansley and Kohn (1985), and Koopman (1997) amongst others. The related considerations in De Jong (1991) and Eubank (2006, sec. 6.2.2) (implicitly) utilize the $\langle\cdot,\cdot\rangle$-related construct.

The discussion of the case $\operatorname{img}[Y\ x] \cap U = \{0\}$ can be extended to $\operatorname{img}Y \cap U = \{0\}$ by formal manipulation. If the latter holds, then $\ker[Y\ x] \subset \ker[\hat{Y}\ \hat{x}]$ is possible. Nonetheless, the bilinear map $\langle\cdot,\cdot\rangle$ may be defined—in analogy—on $W_0\times W_0$; however, it does not generally provide an inner product. Then, the orthogonality considerations of the main text still apply. In particular, the $\langle\cdot,\cdot\rangle$-orthogonal projector equals $P_{V/V_x}$, but a zero $\langle\cdot,\cdot\rangle$-residual length is possible even if $x$ is not an element of $V = \operatorname{img}Y$.

Section 4.3 Cressie (1991, sec. 3.4) refers to similar predictions—but based on an orthogonal projector instead of an oblique projector—as universal kriging predictions; this term is the usual one in spatial statistics (Sherman, 2011, sec. 2.4). Cressie (1991, sec. 3.4.5) also mentions the geometric perspective taken here. Best linear unbiased prediction (BLUP) is another catchword for predictions based on an orthogonal projector (Robinson, 1991). Doran (1992) verifies that—in the setting considered in section 4.4.3—predictions based on orthogonal projections interpolate observed values.

The computation of the (matrix of) coordinates with respect to the columns of $P_{(\operatorname{span}\{1\})^{\perp_*}}\bar{Y}$ of the $\langle\cdot,\cdot\rangle$-orthogonal projections of the columns of $P_{(\operatorname{span}\{1\})^{\perp_*}}\bar{X}$ onto $\operatorname{img}P_{(\operatorname{span}\{1\})^{\perp_*}}\bar{Y}$ provides an alternative to the approach of this text, which ultimately calculates weighted sums of $q_1(\omega), \ldots, q_{k^0}(\omega)$ instead of $\big(P_{(\operatorname{span}\{1\})^{\perp_*}}\bar{y}_{t,i}\big)(\omega)$, $(t,i) \in I_{\text{obs}}$. The ratio $\langle a, \operatorname{var}(\tilde{\bar{x}})a\rangle/\langle a, \operatorname{var}(\hat{\bar{x}})a\rangle$ in the upper bound in corollary 4.5 equals

$$\frac{\langle a, \operatorname{var}(\tilde{\bar{x}})a\rangle}{\langle a, \operatorname{var}(\bar{x})a\rangle - \langle a, \operatorname{var}(\tilde{\bar{x}})a\rangle} = \frac{\langle a, \operatorname{var}(\tilde{\bar{x}})a\rangle/\langle a, \operatorname{var}(\bar{x})a\rangle}{1 - \langle a, \operatorname{var}(\tilde{\bar{x}})a\rangle/\langle a, \operatorname{var}(\bar{x})a\rangle} = \frac{r(a)}{1-r(a)}.$$

Hence, the first upper bound equals $\sup_{a\in\operatorname{img}A^T} r(a)/[1-r(a)] = \sup_a r(a)/\big(1-\sup_a r(a)\big)$ as $r \mapsto r/(1-r)$ is monotone increasing on $[0,1)$. The latter equals $(1-R_{\min}^2)/R_{\min}^2$, wherein $R_{\min}^2 = \inf_{a\in\operatorname{img}A^T}\big(1-r(a)\big)$ has the interpretation of a minimal (population) coefficient of determination across the random variables $\bar{x}^Ta$, $a \in \operatorname{img}A^T$.

Section 4.4 The discussion of section 4.4.1 resembles Morf and Kailath (1975, sec. IV);

their equation (40) contains the equation <4.18>. Golub and Van Loan (2013, sec. 5.1.2, 5.2.2) derive the representation of Householder transforms given in section 4.4.2 as well as the associated triangularization algorithm. The latter considers all of $z_1, \ldots, z_\ell$ from the start, whereas the Gram-Schmidt process <2.3> introduces them one after the other. Reorganizing <2.3> to obey the former strategy amounts to orthogonalization of $\tilde{z}_{j+1}^{(j-1)}, \ldots, \tilde{z}_\ell^{(j-1)}$ against $q_j$ immediately following its calculation. Björck (1996, algorithm 2.4.3) calls this modification row oriented modified Gram-Schmidt process. Panels (A) and (B) of figure 4.7 are akin to Trefethen and Bau (1997, fig. 10.1, 10.2), respectively.

Section 4.4.2 requires linear independence of $z_1, \ldots, z_\ell$. A generalization as in section 2.2.2 is immediate but not needed here. In fact, an implementation using finite precision arithmetic requires more refined methods of handling linear dependence, as the identification of zero elements is nontrivial if rounding errors are present. Golub and Van Loan (2013, sec. 5.4.2) consider a popular (rearranging) technique called pivoting.

Eubank (2006, ch. 2–5) develops the (Kalman) filtering and (Kalman) smoothing recursions by geometric arguments. Paige (1985) provides a similar presentation. In the usual terminology, lines 10 and 4–6 of <4.24> amount to the measurement update; lines 8 and 9 form the time update. The former constructs the matrix $D_t$ and then modifies it by pre-multiplication with special matrices. Kailath et al. (2000, ch. 12) discuss such array algorithms in depth, including their geometry. Zhang and Li (1996) suggest using singular value decompositions for filtering and smoothing. Extending the algorithm consisting of <4.24>, <4.26>, and <4.27> to yield Gramians of the corresponding residuals leads to so-called square-root algorithms (Morf and Kailath, 1975).

Anderson, T. W. (1958).An introduction to multivariate statistical analysis. Wiley series in probability and mathematical statistics: Probability and mathematical statistics. New York: Wiley.