
3. Regularized least-squares estimation

3.4. A poor man’s factor model

3.4.1. Temporal dependence

This section considers the span $W$ of a finite sequence of $P$-square integrable random variables $v_{t,j}$ defined on a probability space $(\Omega, \mathcal{F}, P)$, with the index $(t, j)$ ranging over a subset $I_v$ of $\mathbb{N} \times \mathbb{N}$. The inner product $\langle \cdot, \cdot \rangle$ is given by the $P$-expectation $\mathrm{E}xy = \int x(\omega) y(\omega) P(d\omega) = \langle x, y \rangle$ of the product $xy$, as in example (e) of sections 2.1 and 2.4.1, and equips this linear space with a Euclidean geometry such that $v_{t,j}$, $(t, j) \in I_v$, form an orthonormal basis of $W$. Additional random variables $x_{t,j}$ with $(t, j)$ ranging over a subset $I_x$ of $\mathbb{N} \times \mathbb{N}$ result as linear combinations of the basis elements $v_{t,j}$, $(t, j) \in I_v$. This setup allows a formal representation as at the end of appendix 2.a. For now, the vector $\alpha \in \mathbb{R}^k$, $k \in \mathbb{N}$, gathers all coordinates of $x_{t,j}$, $(t, j) \in I_x$, with respect to $v_{t,j}$, $(t, j) \in I_v$. Section 3.5 focuses on the task of estimating a transformation $\Theta_* \in S^m$ of $\alpha$ using a single realization (the data) $x_{t,j}(\omega)$, $(t, j) \in I_x$, and knowledge of the overall structure, including that $\alpha$ lies in a subset $M \subset \mathbb{R}^k$. In this framework, a successful (relative to $M$) estimation strategy approximately recovers the respective transformation $\Theta_*(\alpha')$ from data generated using $\alpha' \in M$, irrespective of the particular value of $\alpha' \in M$.

The space $W$ is spanned by random variables $v_{t,j}$. Herein, the first index $t$ ranges from $1-l$ to $n$ for $j \le h$ and from one to $n$ for $h+1 \le j \le m$, for some $m, n \in \mathbb{N}$, $l \ge 0$, and $0 \le h \le m$. These random variables are independent with zero mean $\mathrm{E}v_{t,j} = 0$ and $\mathrm{E}v_{t,j}^2 = 1$ and thus form an orthonormal basis of $W$. The data used in the following sections equal one realization of the columns of $X_t = [x_{t,1} \cdots x_{t,m}]$ given by

$$
X_t = F_t U_1^T + \rho V_{t,2} U_2^T, \qquad
F_t = [f_{t,1} \cdots f_{t,h}] = V_{t,1} A_0 + \sum_{i \le l} V_{t-i,1} A_i, \qquad 1 \le t \le n.
\tag*{$\langle 3.8 \rangle$}
$$

Herein, $V_{t,1} = [v_{t,1} \cdots v_{t,h}]$, $1-l \le t \le n$, and $V_{t,2} = [v_{t,h+1} \cdots v_{t,m}]$, $1 \le t \le n$.

In addition, $A_0, \dots, A_l \in \mathbb{R}^{h \times h}$ are diagonal matrices, $\rho > 0$, and the columns of $U = [U_1 \; U_2] = [u_1 \dots u_h \; u_{h+1} \dots u_m]$ form an orthonormal basis of $\mathbb{R}^m$. If $h > 0$, then all diagonal entries of $A_0$ as well as at least one diagonal entry of $A_l$ are nonzero. The former requirement guarantees that the $x_{t,j}$ are linearly independent; the latter gives meaning to $l$.
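The construction in <3.8> can be sketched numerically. The following is a minimal illustration, not part of the original text: the function name `simulate_factor_model`, the storage of the diagonals of $A_0, \dots, A_l$ as rows of an array `A`, and the use of standard normal draws for the basis elements are all assumptions (the specification only requires independence, zero mean, and unit variance).

```python
import numpy as np

def simulate_factor_model(n, m, h, A, rho, U, rng):
    """Draw one realization of X_1, ..., X_n following <3.8>.

    A   : array of shape (l + 1, h); row i holds the diagonal of A_i
    U   : (m, m) orthogonal matrix, U = [U1 U2] with U1 of shape (m, h)
    Returns X (n, m) and F (n, h); row t - 1 holds the realization of X_t, F_t.
    """
    l = A.shape[0] - 1
    U1, U2 = U[:, :h], U[:, h:]
    # Assumption: Gaussian basis elements (independent, mean 0, variance 1).
    V1 = rng.standard_normal((n + l, h))   # rows correspond to t = 1-l, ..., n
    V2 = rng.standard_normal((n, m - h))   # rows correspond to t = 1, ..., n
    F = np.zeros((n, h))
    for i in range(l + 1):
        # Rows l-i, ..., l-i+n-1 of V1 hold V_{t-i,1} for t = 1, ..., n;
        # multiplying by the row A[i] applies the diagonal matrix A_i.
        F += V1[l - i : l - i + n] * A[i]
    X = F @ U1.T + rho * V2 @ U2.T
    return X, F

rng = np.random.default_rng(0)
n, m, h, rho = 200, 6, 2, 0.5
A = np.array([[1.0, 0.8], [0.5, -0.3], [0.2, 0.1]])   # l = 2
U, _ = np.linalg.qr(rng.standard_normal((m, m)))       # random orthonormal basis
X, F = simulate_factor_model(n, m, h, A, rho, U, rng)
```

Because the columns of `U` are orthonormal, `X @ U[:, :h]` returns the simulated factors `F`.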

Finally, if $h = 0$, then all quantities related to the first summand of $X_t$ and, in particular, the second equation of the specification <3.8> disappear. Then, $x_{t,1}, \dots, x_{t,m}$ are even mutually orthogonal. Below, this extreme case usually receives, but also requires, no explicit mention in order to simplify the exposition. The same applies to the case $h = m$, which eliminates all quantities related to the second summand of $X_t$ such as $U_2$ and $\rho$.

[Figure 3.7: diagram with axes labelled "time" and "space" showing four groups of entries: observables $x_{t,j}$, factors $f_{t,j}$, factor basis elements $v_{t,j}$ with $j \le h$, and non-factor basis elements $v_{t,j}$ with $j > h$.]

Figure 3.7. The figure illustrates the construction of $x_{t,1}, \dots, x_{t,m}$ in <3.8> as linear combinations of the basis elements $v_{t,h+1}, \dots, v_{t,m}$ and the factors $f_{t,1}, \dots, f_{t,h}$. Herein, the case $h = 0$ (no factors) is allowed, but the figure concerns the case $1 < h < m$. The factors equal linear combinations of a "rolling window" of the basis elements $v_{t,j}$, $j \le h$, with identical second index $j$. Dashed lines surround the columns of $X_t$ and $F_t$, respectively.

The random variables $x_{t,j}$, $(t, j) \in I_x$, represent a (numerical) characteristic, referred to as $x$, of $m$ spatial entities at $n$ points in time. In particular, the first index $t$ indicates the respective time point; the second index $j$ points to the location in space.

This interpretation suggests calling the subspaces $\operatorname{img} X_{t-1}$ and $\operatorname{img} [X_{t-1} \cdots X_1]$ the recent past and the past of $x$ at $t > 1$, respectively. Elements of the innovation space $\operatorname{span}\{v_{t,1}, \dots, v_{t,m}\}$ at $t$ lie in the orthogonal complement of the past of $x$ at $t$. Their part in $\operatorname{span}\{v_{t,h+1}, \dots, v_{t,m}\}$ exerts only momentary influence. In contrast, $v_{t,1}, \dots, v_{t,h}$ enter in the construction of the factors $f_{t,1}, \dots, f_{t,h}$ at $t$ and thereby impact the columns of $X_{t+1}, \dots, X_n$. These factors $f_{t,1} = X_t u_1, \dots, f_{t,h} = X_t u_h$ lie in $\operatorname{img} X_t$ by virtue of the (pairwise) orthogonality of the columns of $U = [U_1 \; U_2] = [u_1 \dots u_h \; u_{h+1} \dots u_m]$.
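The identity $f_{t,j} = X_t u_j$ follows directly from <3.8> and the orthonormality of the columns of $U$. The short derivation below spells this out; it uses only $U_1^T u_j = e_j$ and $U_2^T u_j = 0$ for $j \le h$, with $e_j$ the $j$-th standard basis vector of $\mathbb{R}^h$.

```latex
X_t u_j
  = F_t U_1^T u_j + \rho V_{t,2} U_2^T u_j
  = F_t e_j + \rho V_{t,2} \cdot 0
  = f_{t,j}, \qquad j \le h .
```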

Each factor sequence $f_{1,j}, \dots, f_{n,j}$, $j \le h$, embodies one of a small number ($h$ is thought to be "much smaller" than $m$) of underlying determinants of $x$. The elements $f_{1,j}, \dots, f_{n,j}$ of the $j$-th factor sequence equal linear combinations of overlapping subsets of the basis elements $v_{1-l,j}, \dots, v_{n,j}$. Thus, the factor variables $f_{t,j}$ are generally independent across $j \le h$ but dependent across the time index $t \le n$ unless $l = 0$. Figure 3.7 contains a visual summary of the construction in <3.8> for the case $1 < h < m$ and, in particular, highlights the overlap of the subsets of basis elements $v_{1-l,h}, \dots, v_{n,h}$ needed to construct the members of the $h$-th factor sequence $f_{1,h}, \dots, f_{n,h}$.

The coefficient matrices $U_1$ and $U_2$ govern the dependence among the columns of $X_t$ and are discussed in more detail in section 3.4.2. The equality $A_0 = \rho I$ generates a notable special case. Herein, $I = [e_1 \cdots e_h]$ denotes the $h \times h$ identity matrix, and (thus) $e_1, \dots, e_h$ symbolizes the standard basis of $\mathbb{R}^h$. Then, the specification <3.8> becomes

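$$
X_t = \Bigl( \sum_{i \le l} V_{t-i,1} A_i \Bigr) U_1^T
  + \rho \, [V_{t,1} \; V_{t,2}] \begin{bmatrix} U_1^T \\ U_2^T \end{bmatrix},
\qquad 1 \le t \le n,
\tag*{$\langle 3.9 \rangle$}
$$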

wherein the first term disappears if $l = 0$. Moreover, the columns of the final term, which equals a scaled composition of unitary maps, amount to $m$ pairwise orthogonal elements of the innovation space (at $t$) of length $\rho$. These columns represent idiosyncratic innovations to the individual $x_{t,j}$. In particular, $U_1$ controls the entire spatial (across $j$) dependence between the observables $x_{t,1}, \dots, x_{t,m}$.

Many properties of the setting in <3.8> are reflected by the implied (unordered) spectral decompositions of the symmetric inner product matrices $\langle\!\langle X_t, X_{t-s} \rangle\!\rangle$ given by

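$$
\langle\!\langle X_t, X_{t-s} \rangle\!\rangle =
\begin{cases}
U \begin{bmatrix} \sum_{i=0}^{l} A_i^2 & 0 \\ 0 & \rho^2 I \end{bmatrix} U^T, & s = 0, \\[1ex]
U_1 \Bigl( \sum_{i=0}^{l-s} A_i A_{i+s} \Bigr) U_1^T, & 0 < s \le \min\{l, t-1\}, \\[1ex]
m \times m \text{ zero matrix}, & s > \min\{l, t-1\}.
\end{cases}
\tag*{$\langle 3.10 \rangle$}
$$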

Firstly, time invariance of the coefficients in <3.8> ensures the absence of $t$ on the right-hand side of <3.10>. Secondly, the inner product matrices $\langle\!\langle X_t, X_{t-s} \rangle\!\rangle$ are symmetric due to the specific separation of time and space dependence. Thirdly, if $s > 0$, then $\operatorname{rk} \langle\!\langle X_t, X_{t-s} \rangle\!\rangle$ does not exceed the number of factors $h$, which provide the sole link across $t$.
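The three cases in <3.10> can be checked without simulation by working with coordinates relative to the orthonormal basis $v_{t,j}$: if the columns of a matrix $C_t$ hold the coordinates of $x_{t,1}, \dots, x_{t,m}$, then the inner product matrix equals $C_t^T C_{t-s}$. The sketch below follows this idea. The helper names, the ordering of the basis elements, and the storage of the diagonals of $A_i$ as rows of `A` are assumptions made for illustration.

```python
import numpy as np

def coordinates(t, n, m, h, A, rho, U):
    """Coordinates of x_{t,1}, ..., x_{t,m} (as columns) w.r.t. the basis v.

    Assumed basis ordering: factor elements v_{s,j}, s = 1-l..n, j <= h,
    followed by non-factor elements v_{s,j}, s = 1..n, h < j <= m.
    """
    l = A.shape[0] - 1
    U1, U2 = U[:, :h], U[:, h:]
    C1 = np.zeros((n + l, h, m))
    C2 = np.zeros((n, m - h, m))
    for i in range(l + 1):
        # x_{t,k} loads on v_{t-i,j} with weight (A_i)_{jj} (U1)_{kj}
        C1[t - i - 1 + l] = A[i][:, None] * U1.T
    C2[t - 1] = rho * U2.T
    return np.concatenate([C1.reshape(-1, m), C2.reshape(-1, m)])

def gram(t, s, **par):
    """<<X_t, X_{t-s}>> computed from the coordinate matrices."""
    return coordinates(t, **par).T @ coordinates(t - s, **par)

def gram_formula(t, s, h, A, rho, U):
    """Right-hand side of <3.10>."""
    l, m, U1 = A.shape[0] - 1, U.shape[0], U[:, :h]
    if s == 0:
        d = np.concatenate([(A ** 2).sum(axis=0), np.full(m - h, rho ** 2)])
        return U @ np.diag(d) @ U.T
    if s <= min(l, t - 1):
        return U1 @ np.diag((A[: l + 1 - s] * A[s:]).sum(axis=0)) @ U1.T
    return np.zeros((m, m))

rng = np.random.default_rng(1)
n, m, h, rho = 6, 5, 2, 0.7
A = np.array([[1.0, 0.9], [0.6, -0.4], [0.3, 0.2]])   # l = 2
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
par = dict(n=n, m=m, h=h, A=A, rho=rho, U=U)
checks = [np.allclose(gram(5, s, **par), gram_formula(5, s, h, A, rho, U))
          for s in range(4)]
```

For $t = 5$ and $l = 2$, the lags $s = 0, 1, 2, 3$ cover all three cases of <3.10>.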

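These properties become evident when projecting the elements $x_{t,1}, \dots, x_{t,m}$, $1 < t \le n$, onto the recent past $\operatorname{img} X_{t-1}$ of $x$. In fact, the coordinate matrix $\Theta_*$ with respect to $X_{t-1}$ of the composition $P_{\operatorname{img} X_{t-1}} X_t = X_{t-1} \Theta_*$ coincides for all $t \ge 2$. It is uniquely determined by the condition $\langle\!\langle X_{t-1}, X_{t-1} \rangle\!\rangle \Theta_* = \langle\!\langle X_{t-1}, X_t \rangle\!\rangle$ and thus equals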

$$
\Theta_* = U_1 \Gamma_* U_1^T
  = U_1 \Bigl[ \Bigl( \sum_{i=0}^{l} A_i^2 \Bigr)^{-1} \sum_{i=0}^{l-1} A_i A_{i+1} \Bigr] U_1^T,
\tag*{$\langle 3.11 \rangle$}
$$

wherein the superscript $^{-1}$ marks the inverse of (the bijective linear map) $\sum_{i=0}^{l} A_i^2$. The (bracketed) diagonal matrix $\Gamma_* \in \mathbb{R}^{h \times h}$ provides the coordinates in $P_{\operatorname{img} F_{t-1}} F_t = F_{t-1} \Gamma_*$. If either $h = 0$ or $h > 0$ together with $l = 0$, then $\Theta_*$ equals the $m \times m$ zero matrix.
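As a plausibility check, the closed form <3.11> can be compared with a direct solution of the defining condition, using the inner product matrices from <3.10> for $s = 0$ and $s = 1$. The function name `transition_matrix` and the row-wise storage of the diagonals of $A_i$ are illustrative assumptions.

```python
import numpy as np

def transition_matrix(A, U1):
    """Theta_* = U1 Gamma_* U1^T as in <3.11>; row i of A is the diagonal of A_i."""
    # Diagonal of Gamma_*: sum_i a_i a_{i+1} divided by sum_i a_i^2, per factor.
    gamma = (A[:-1] * A[1:]).sum(axis=0) / (A ** 2).sum(axis=0)
    return U1 @ np.diag(gamma) @ U1.T, gamma

rng = np.random.default_rng(2)
m, h, rho = 5, 2, 0.7
A = np.array([[1.0, 0.9], [0.6, -0.4], [0.3, 0.2]])   # l = 2
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
U1 = U[:, :h]

Theta, gamma = transition_matrix(A, U1)

# Inner product matrices from <3.10> for s = 0 and s = 1.
d0 = np.concatenate([(A ** 2).sum(axis=0), np.full(m - h, rho ** 2)])
G0 = U @ np.diag(d0) @ U.T
G1 = U1 @ np.diag((A[:-1] * A[1:]).sum(axis=0)) @ U1.T

# Solve <<X_{t-1}, X_{t-1}>> Theta = <<X_{t-1}, X_t>> directly.
Theta_solved = np.linalg.solve(G0, G1)

# With l = 0 (only A_0), the transition matrix vanishes.
Theta_l0, _ = transition_matrix(A[:1], U1)
```

Both routes give the same symmetric matrix, and the case $l = 0$ returns the zero matrix, in line with the remark above.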

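If $h \ge 1$ and $l \ge 1$, then these considerations lead to the alternative representation

$$
X_t = X_{t-1} \Theta_* + (H_t + R_t) = X_{t-1} \Theta_* + \bar{E}_t,
\tag*{$\langle 3.12 \rangle$}
$$

$$
H_t = \Bigl( \sum_{i \le l} V_{t-i,1} (A_i - A_{i-1} \Gamma_*) - V_{t-l-1,1} A_l \Gamma_* \Bigr) U_1^T, \quad 2 \le t \le n,
\qquad
R_t = [V_{t,1} \; V_{t,2}]
  \begin{bmatrix} A_0 & 0 \\ 0 & \rho I \end{bmatrix}
  \begin{bmatrix} U_1^T \\ U_2^T \end{bmatrix}.
$$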

The inner product matrix $\langle\!\langle X_{t-s}, R_t \rangle\!\rangle$ has all its entries equal to zero if $s \ge 1$. If $l > 0$, then the same applies to $\langle\!\langle X_{t-s}, H_t \rangle\!\rangle$ for $s = 1$ but generally fails for $t \ge 3$ and $2 \le s \le \min\{l+1, t-1\}$, as elements of $\operatorname{img} H_t$ are not contained in the innovation space at $t$. However, if $A_0 = \rho I$, $A_i = \rho D^i$, $1 \le i \le l$, for some diagonal matrix $D \in \mathbb{R}^{h \times h}$ with diagonal entries $|d_{i,i}| < 1$, then elements of $\operatorname{img} \bar{E}_t$ approach the subspace $\operatorname{img} R_t$ of the innovation space as $l \to \infty$. More specifically, one may consider a sequence of Euclidean spaces of the above type, indexed by $k \in \mathbb{N}$, such that $l = l_k$ increases in parallel with the sequence index $k$. No further definition is required as $m$ is shared across these spaces, and $A_i = \rho D^i$ is valid for all $i \in \mathbb{N}$. Then all of these spaces come with a measure of distance $\sup_{x \in \operatorname{img} \bar{E}_t \cap \{\|\cdot\| = 1\}} \|P_{(\operatorname{img} R_t)^\perp} x\|$, and the sequence of these distances approaches zero. Moreover, this case features the equality $A_0 = \rho I$, thus is a special case of <3.9>, and therefore exhibits $\langle\!\langle R_t, R_t \rangle\!\rangle = \rho^2 I$. In the above "asymptotic" sense, the symmetric matrix $\Theta_*$ controls the transition from the recent past to the present and is therefore called the transition matrix. If $l = 0$, then $X_t = R_t$. Thus, the transition matrix $\Theta_*$ is zero, and these considerations are meaningless.
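The "asymptotic" statement can be illustrated for a single factor ($h = 1$) with $A_i = \rho d^i$. The sketch below tracks $\Gamma_*$ and the squared length of the coefficient vector of $H_t$ in <3.12> as $l$ grows. This squared length is a simple proxy chosen here for illustration, not the supremum distance defined in the text; the function name and the parameter values are likewise hypothetical.

```python
import numpy as np

def gamma_and_h_norm(d, rho, l):
    """Single factor with A_i = rho * d**i, i = 0..l.

    Returns Gamma_* from <3.11> and the squared length of the coefficients
    of H_t in <3.12> with respect to v_{t-1}, ..., v_{t-l-1}.
    """
    a = rho * d ** np.arange(l + 1)
    gamma = (a[:-1] * a[1:]).sum() / (a ** 2).sum()
    # Coefficients (A_i - A_{i-1} Gamma_*), i = 1..l, and -A_l Gamma_*.
    coeff = np.append(a[1:] - a[:-1] * gamma, -a[-1] * gamma)
    return gamma, (coeff ** 2).sum()

results = {l: gamma_and_h_norm(d=0.6, rho=0.5, l=l) for l in (1, 2, 4, 8, 16)}
for l, (gamma, h_norm) in results.items():
    print(f"l = {l:2d}   Gamma_* = {gamma:.6f}   |H_t coefficients|^2 = {h_norm:.2e}")
```

In this example $\Gamma_*$ moves towards $d$ and the contribution of $H_t$ shrinks as $l$ increases, so that $\bar{E}_t$ is increasingly dominated by $R_t$.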

3.4.2. Spatial dependence

Figure 3.7 proposes two views on the observables: firstly, as $m$ time series $x_{1,j}, \dots, x_{n,j}$ (dotted lines), that is, sequences of random variables indexed by time, and, secondly, as $n$ random fields $x_{t,1}, \dots, x_{t,m}$ (dashed lines), that is, sequences of random variables indexed by space. From a constructional point of view, the presentation in <3.8> stresses the first of these interpretations: the observable time series result as linear combinations of the factor time series $f_{1,j}, \dots, f_{n,j}$, $j \le h$, and the non-factor time series $v_{1,j}, \dots, v_{n,j}$, $h+1 \le j \le m$. The random vectors $x_t = (x_{t,1}, \dots, x_{t,m})$, $t \le n$, facilitate a presentation stressing the second interpretation. More specifically, expressing the relations <3.8> in terms of these random vectors and the similarly defined random vectors $f_t = (f_{t,1}, \dots, f_{t,h})$, $v_t^{(1)} = (v_{t,1}, \dots, v_{t,h})$, and $v_t^{(2)} = (v_{t,h+1}, \dots, v_{t,m})$ leads to

$$
x_t = U_1 f_t + \rho U_2 v_t^{(2)}, \qquad
f_t = A_0 v_t^{(1)} + \sum_{i \le l} A_i v_{t-i}^{(1)}, \qquad t \le n,
\tag*{$\langle 3.13 \rangle$}
$$

wherein the second summand of the second equation is present only if $l > 0$. In particular, the formulation in <3.13> emphasizes that realizations $x_t(\omega) \in \mathbb{R}^m$, given by $\bigl(x_{t,1}(\omega), \dots, x_{t,m}(\omega)\bigr)$, of the random vectors $x_t$ consist of two mutually orthogonal parts.

The first part $U_1 f_t(\omega) = \sum_{j \le h} f_{t,j}(\omega) u_j$ reflects the influence of the factors. The second part $\rho U_2 v_t^{(2)}(\omega) = \sum_{j=h+1}^{m} \rho v_{t,j}(\omega) u_j$ captures deviations associated with the specific time point $t$. In particular, the columns $u_1, \dots, u_h$ of $U_1$ may be understood as $h$ "spatial patterns" whose strengths at time $t$ are determined by $f_{t,1}, \dots, f_{t,h}$, respectively.

These patterns $u_1, \dots, u_h$ amount to functions, as explained in example (a) of section 2.1.1, on the space index set $\{1, \dots, m\}$. Herein, some form of smoothness of the "spatial patterns" $u_j$ is expected. Squared difference quotients of the form $\bigl(u_j(i') - u_j(i)\bigr)^2 / \operatorname{dist}(i', i)^2 = w_{i',i} (u_{i',j} - u_{i,j})^2$, $i' \ne i$, measure their roughness, wherein $\operatorname{dist}(i', i) = \operatorname{dist}(i, i')$ and $w_{i',i} \ge 0$ denote a symmetric notion of distance between locations $i'$ and $i$ and the square of its reciprocal, respectively. The subsequent discussion refers to $\operatorname{dist}$ only through $w_{i,i'} = w_{i',i}$, $i \ne i'$. In fact, the role of $\operatorname{dist}$ is to facilitate the interpretation, and using $w_{i,i'} = 0$ to represent "infinite distance" introduces no technical complications.

If one sets $w_{i,i} = 0$ for all $i \le m$, then the integral of the difference quotients corresponding to a fixed $j \le h$ with respect to the product (counting) measure on $\{1, \dots, m\} \times \{1, \dots, m\}$ may be expressed in the form $\sum_{i', i \le m} w_{i',i} (u_{i',j} - u_{i,j})^2 = 2 \langle u_j, \Lambda u_j \rangle$, wherein the matrix $\Lambda$ is defined in the following display. This equality implies that

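$$
\Lambda =
\begin{bmatrix}
\sum_{i' \le m} w_{i',1} & & \\
& \ddots & \\
& & \sum_{i' \le m} w_{i',m}
\end{bmatrix}
-
\begin{bmatrix}
0 & w_{1,2} & \cdots & w_{1,m} \\
w_{1,2} & 0 & \cdots & w_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
w_{1,m} & w_{2,m} & \cdots & 0
\end{bmatrix}
\tag*{$\langle 3.14 \rangle$}
$$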

is positive semidefinite and is subsequently assumed to be nonzero, that is, at least one pair $i, j \le m$ exhibits finite distance. The form of $\Lambda$ implies $(1, \dots, 1) \in \ker \Lambda$, which fits the role of $u \mapsto \langle u, \Lambda u \rangle$ as a measure of roughness and reveals $\operatorname{rk} \Lambda < m$. More precisely, one has

$$
\operatorname{rk} \Lambda = \inf \bigl\{ m - k \bigm| \text{there exists a partition } C_1, \dots, C_k \text{ of } \{1, \dots, m\} \text{ with } i \in C_s \not\ni i' \Rightarrow w_{i,i'} = 0 \bigr\}.
$$

In fact, the infimum $m - k_*$ is attained due to the well-ordering principle. If $C_1, \dots, C_{k_*}$ form a corresponding partition, $a_j = \sum_{i \in C_j} e_i$, $j \le k_*$, and $R$ provides a Cholesky factor of $\Lambda$, then $\|R a_j\|^2 = \langle a_j, \Lambda a_j \rangle = \tfrac{1}{2} \sum_{i, i' \le m} w_{i,i'} (a_{i,j} - a_{i',j})^2 = 0$, as $i \in C_s$, $i' \in C_t$ with either $s = t$ and therefore $a_{i,j} = a_{i',j}$, or $s \ne t$ and therefore $w_{i,i'} = 0$. Conversely, if $a \in \mathbb{R}^m$ exhibits entries $a_i \ne a_{i'}$ with $w_{i,i'} \ne 0$, then $\langle a, \Lambda a \rangle > 0$.
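The properties of $\Lambda$ listed above can be verified on a small example. The following sketch builds $\Lambda$ from a weight matrix, checks the quadratic-form identity, and compares $\operatorname{rk} \Lambda$ with $m$ minus the number of blocks of the finest partition without cross-block weights (the connected components of the graph with an edge wherever $w_{i,i'} > 0$). The locations, the grouping, and the helper names are hypothetical choices made for illustration.

```python
import numpy as np

def roughness_matrix(W):
    """Lambda = diag(column sums of W) - W as in <3.14>.

    W is assumed symmetric and nonnegative with zero diagonal.
    """
    return np.diag(W.sum(axis=0)) - W

def count_components(W):
    """Number of blocks of the finest partition with zero weight across blocks."""
    m = W.shape[0]
    seen, k = np.zeros(m, dtype=bool), 0
    for start in range(m):
        if seen[start]:
            continue
        k += 1
        seen[start] = True
        stack = [start]
        while stack:                      # depth-first search over positive weights
            i = stack.pop()
            for j in np.flatnonzero(W[i] > 0):
                if not seen[j]:
                    seen[j] = True
                    stack.append(j)
    return k

# Five locations in two groups; locations in different groups are treated
# as "infinitely distant" (weight zero).
pos = np.array([0.0, 1.0, 2.0, 10.0, 13.0])
group = np.array([0, 0, 0, 1, 1])
dist = np.abs(pos[:, None] - pos[None, :])
W = np.zeros((5, 5))
mask = (group[:, None] == group[None, :]) & (dist > 0)
W[mask] = 1.0 / dist[mask] ** 2           # w = squared reciprocal distance

Lam = roughness_matrix(W)
u = np.array([1.0, 2.0, 4.0, -1.0, 0.5])
lhs = sum(W[i, j] * (u[i] - u[j]) ** 2 for i in range(5) for j in range(5))
rhs = 2 * u @ Lam @ u
```

Here `lhs` and `rhs` agree, the vector of ones lies in the kernel of `Lam`, and the rank of `Lam` equals $5 - 2 = 3$.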

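Due to its symmetry, $\Lambda$ exhibits a spectral decomposition $\Lambda = \sum_{i \le \operatorname{rk} \Lambda} \sigma_i(\Lambda) \, o_i \langle o_i, \cdot \rangle$, wherein $o_1, \dots, o_{\operatorname{rk} \Lambda}$ represents an orthonormal sequence of singular vectors of the form given in lemma 2.4. In this notation, the suggested measure of roughness of $u_j$ equals $\|\Lambda^{1/2} u_j\|^2$, wherein $\Lambda^{1/2} = \sum_{i \le \operatorname{rk} \Lambda} \sigma_i^{1/2}(\Lambda) \, o_i \langle o_i, \cdot \rangle$ does not depend on the particular choice of singular vectors. The same applies to the alternative roughness matrix $\Lambda^{q/2} = \sum_{i \le \operatorname{rk} \Lambda} \sigma_i^{q/2}(\Lambda) \, o_i \langle o_i, \cdot \rangle$, wherein $q > 0$ allows adjustment of the weights $\sigma_i^{q/2}(\Lambda)$ for a given distance. More specifically, $q < 1$ downplays differences in the singular values; $q > 1$ amplifies these differences. In addition, symmetry and $\operatorname{img} \Lambda = \operatorname{span}\{o_1, \dots, o_{\operatorname{rk} \Lambda}\} = \operatorname{img} \Lambda^{q/2}$ ensure that $\ker \Lambda = \ker \Lambda^{q/2}$ and $\operatorname{rk} \Lambda^{q/2} < m$, for all $q > 1$.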
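A minimal sketch of how $\Lambda^{q/2}$ might be computed numerically is given below. It relies on the fact that, for a symmetric positive semidefinite matrix, the singular values and singular vectors coincide with the eigenvalues and eigenvectors, so `numpy.linalg.eigh` can be used. The function name, the relative tolerance for discarding zero eigenvalues, and the path-graph example are assumptions.

```python
import numpy as np

def roughness_power(Lam, q, tol=1e-10):
    """Lambda^{q/2} from the spectral decomposition of a symmetric
    positive semidefinite matrix, using only its positive eigenvalues."""
    sigma, O = np.linalg.eigh(Lam)
    keep = sigma > tol * sigma.max()       # drop (numerically) zero eigenvalues
    return (O[:, keep] * sigma[keep] ** (q / 2)) @ O[:, keep].T

# Example: four locations on a line with unit weights between neighbours.
m = 4
W = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
Lam = np.diag(W.sum(axis=0)) - W

root = roughness_power(Lam, q=1)           # Lambda^{1/2}
u = np.array([0.5, -1.0, 2.0, 0.0])
roughness = np.sum((root @ u) ** 2)        # equals <u, Lambda u>
```

The result does not depend on which orthonormal eigenvectors `eigh` returns, and every power $q > 0$ leaves the kernel (here spanned by the vector of ones) unchanged.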

The sum $\|\Lambda^{q/2} U_1\|^2 = \sum_{j \le h} \|\Lambda^{q/2} u_j\|^2$ measures the total roughness of the (spatial) patterns $u_j$. The alternative quantity $\|\Lambda^{q/2} \Theta_*\|^2$ amounts to a weighted sum, with weights equal to the squared diagonal entries of $\Gamma_*$, of the individual roughness terms $\|\Lambda^{q/2} u_j\|^2$. Any valid choice for the above sequence $o_1, \dots, o_{\operatorname{rk} \Lambda}$ of singular vectors for $\Lambda$ can be extended to an orthonormal basis $o_1, \dots, o_m$ of $\mathbb{R}^m$. If $\operatorname{rk} \Lambda < m - 1$ or if $\dim \ker \bigl( \Lambda \pm \bar{\sigma}_j(\Lambda) \operatorname{id} \bigr) > 1$ for some $j$, wherein $\bar{\sigma}_j(\Lambda)$ and $\operatorname{id}$ denote the $j$-th distinct singular value of $\Lambda$ and the identity map on $\mathbb{R}^m$, respectively, then, according to section 2.5.4, the choice of singular vectors and $o_{\operatorname{rk} \Lambda + 1}, \dots, o_m$ involves some ambiguity beyond sign choices.
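The weighted-sum form of $\|\Lambda^{q/2} \Theta_*\|^2$ can be made explicit as follows. The derivation assumes that the norm of a matrix is the one induced by the trace inner product $\langle A, B \rangle = \operatorname{tr}(A^T B)$, and it writes $\gamma_j$ for the $j$-th diagonal entry of $\Gamma_*$ (a symbol introduced here for illustration). It uses $\Theta_* = U_1 \Gamma_* U_1^T$ from <3.11>, the symmetry of $\Lambda^{q/2}$, $U_1^T U_1 = I$, and the cyclic property of the trace.

```latex
\|\Lambda^{q/2} \Theta_*\|^2
  = \operatorname{tr}\bigl( \Theta_*^T \Lambda^{q} \Theta_* \bigr)
  = \operatorname{tr}\bigl( U_1 \Gamma_* U_1^T \Lambda^{q} U_1 \Gamma_* U_1^T \bigr)
  = \operatorname{tr}\bigl( \Gamma_* U_1^T \Lambda^{q} U_1 \Gamma_* \bigr)
  = \sum_{j \le h} \gamma_j^2 \, \langle u_j, \Lambda^{q} u_j \rangle
  = \sum_{j \le h} \gamma_j^2 \, \|\Lambda^{q/2} u_j\|^2 .
```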

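However, these arbitrary choices are practically immaterial to the subsequent discussion as they do not affect the key quantities derived from the chosen basis. Two observations are essential in this regard. Firstly, one has $\operatorname{span}\{o_1, \dots, o_{\operatorname{rk} \Lambda}\} = (\ker \Lambda)^\perp = \operatorname{img} \Lambda$.

Secondly, positive semidefiniteness of $\Lambda$ implies $\ker \bigl( \Lambda + \bar{\sigma}_j(\Lambda) \operatorname{id} \bigr) = \{0\}$ for all distinct singular values. Hence, $L_k^\perp = \operatorname{span}\{o_1, \dots, o_{k-1}\}$, $1 < k \le \operatorname{rk} \Lambda + 1$, is unequivocal whenever either $k = \operatorname{rk} \Lambda + 1$ or $1 < k \le \operatorname{rk} \Lambda$ together with $\sigma_{k-1}(\Lambda) > \sigma_k(\Lambda)$.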

Every orthonormal basis $o_1, \dots, o_m$ of $\mathbb{R}^m$ induces, comparable to $e_i$ and $\bar{B}_{i,j}$ in examples (a) and (c) of section 2.1.1, an orthonormal basis $\bar{O}_{i,j}$, $i \le j \le m$, of $S^m$, which is given by $\bar{O}_{i,i} = o_i o_i^T$ and $\bar{O}_{i,j} = (o_i o_j^T + o_j o_i^T) / \sqrt{2}$ for $i < j$. In terms of the latter, a "small" value of $\|\Lambda^{1/2} \Theta_*\|^2$ (relative to the other parameters such as $A_0, \dots, A_l$, and $\rho$) corresponds to the transition matrix $\Theta_*$ being close to the $k$-model space

$$
V_k = \operatorname{span}\{\bar{O}_{i,j} \mid j \ge i \ge k\} = \{A \in S^m \mid \operatorname{img} A \subset L_k\}, \qquad L_k = \operatorname{span}\{o_k, \dots, o_m\},
$$

for some "large" $k \in \mathbb{N}$. The latter is herein restricted to $k \le \operatorname{rk} \Lambda + 1 \le m$ with $\sigma_{k-1}(\Lambda) > \sigma_k(\Lambda)$ if $1 < k \le \operatorname{rk} \Lambda$ to ensure an unambiguous definition. In general, the proximity of $\Theta_*$ to $V_k$ may be expressed in terms of the squared residual length $\|P_{V_k^\perp} \Theta_*\|^2 = \bigl\|\Theta_* - \sum_{j \ge i \ge k} \langle \Theta_*, \bar{O}_{i,j} \rangle \bar{O}_{i,j}\bigr\|^2$, which should be "small" relative to $\|\Theta_*\|^2$.
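The projection onto the $k$-model space can be sketched with the basis $\bar{O}_{i,j}$. The code below assumes the trace inner product on $S^m$, orders the eigenvectors of $\Lambda$ by decreasing eigenvalue so that $o_1$ belongs to the largest singular value, and uses a hypothetical transition matrix built from the two smoothest directions. It also compares the basis expansion with the equivalent expression $P_{L_k} \Theta_* P_{L_k}$.

```python
import numpy as np

def sym_basis(O, i, j):
    """Basis element O-bar_{i,j} of S^m (0-based indices, i <= j)."""
    oi, oj = O[:, i], O[:, j]
    if i == j:
        return np.outer(oi, oi)
    return (np.outer(oi, oj) + np.outer(oj, oi)) / np.sqrt(2.0)

def project_on_model_space(Theta, O, k):
    """Projection of the symmetric matrix Theta onto V_k (k is 1-based)."""
    m = O.shape[0]
    P = np.zeros_like(Theta)
    for i in range(k - 1, m):
        for j in range(i, m):
            B = sym_basis(O, i, j)
            P += np.sum(Theta * B) * B     # <Theta, B> under the trace inner product
    return P

# Roughness matrix of five locations on a line with unit neighbour weights.
m = 5
W = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
Lam = np.diag(W.sum(axis=0)) - W
sigma, O = np.linalg.eigh(Lam)
O = O[:, ::-1]                             # o_1 belongs to the largest singular value

# Hypothetical transition matrix with two smooth patterns u_1 = o_m, u_2 = o_{m-1}.
U1 = O[:, [m - 1, m - 2]]
Theta = U1 @ np.diag([0.5, 0.3]) @ U1.T

k = m - 1
proj = project_on_model_space(Theta, O, k)
residual = np.linalg.norm(Theta - proj)    # zero, since Theta lies in V_{m-1}
PL = O[:, k - 1:] @ O[:, k - 1:].T         # orthogonal projector onto L_k
residual_last = np.linalg.norm(Theta - project_on_model_space(Theta, O, m))
```

In this example the residual vanishes for $k = m - 1$, while for $k = m$ only the constant pattern is retained and the residual length equals the weight $0.3$ of the discarded pattern.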

In document: A framework for spatiotemporal prediction with small and heterogeneous data - and an application to consumer price indexes - (pages 62-67)