

2.1.2 Principal Component Analysis

Although there are many differences between PCA and FA, the two concepts are sometimes treated as interchangeable.

For this purpose, we capture the definition of PCA (Jolliffe, 2002, pp. 1-6, Section 1.1) in the following lemma, before we discuss its advantages and disadvantages compared to FA. Both techniques can provide the same results; however, this holds only under specific conditions, which we also address. Finally, we state a non-exhaustive list of papers estimating FMs using PCA. In the sequel, let $\mathbb{R}^{+}$ be the positive real line and let $\|u\|_{2} = \sqrt{u' u}$ denote the Euclidean norm or 2-norm of the vector $u \in \mathbb{R}^{N}$.

Lemma 2.1.5 (Principal Components)

Assume $X_t \in \mathbb{R}^{N}$ is a random vector, where $\lambda_1 > \dots > \lambda_N \in \mathbb{R}^{+}$ are the descendingly ordered eigenvalues of its covariance matrix $\Sigma_X \in \mathbb{R}^{N \times N}$ with orthonormal eigenvectors $u_1, \dots, u_N \in \mathbb{R}^{N}$. That is, for $1 \leq l < k \leq N$ the eigenvectors satisfy: $u_k' u_l = 0$ (orthogonal) and $\|u_k\|_2^2 = u_k' u_k = 1$ (normalized). Then, the $k$-th principal component $u_k' X_t$ maximizes the variance in the elements of $X_t$, that is, $u_k' \Sigma_X u_k$, and is uncorrelated with all previous principal components $u_l' X_t$ with $1 \leq l \leq k-1$. Furthermore, the variance of the $k$-th principal component satisfies: $\operatorname{Var}[u_k' X_t] = \lambda_k$.

Proof:

The method of Lagrange multipliers with Lagrange multiplier $\lambda$ and normalization constraint $\|u_1\|_2^2 = 1$ yields the following maximization problem for the first principal component:

$u_1' \Sigma_X u_1 - \lambda \left( u_1' u_1 - 1 \right).$

Now, taking the partial derivatives with respect to the vector $u_1$ and setting the resulting system of linear equations to zero yields:

$\left( \Sigma_X - \lambda I_N \right) u_1 = 0_N,$

which is solved by all eigenvalues and their associated eigenvectors. Because $u_1' \Sigma_X u_1 = \lambda u_1' u_1 = \lambda$ is the quantity we wish to maximize, $\lambda$ has to be the largest eigenvalue. Next, the fact that the principal components $u_2' X_t$ and $u_1' X_t$ are uncorrelated follows from the assumed orthogonality of the vectors $u_1$ and $u_2$: $\operatorname{Cov}[u_2' X_t, u_1' X_t] = u_2' \Sigma_X u_1 = \lambda u_2' u_1 = 0$. Using Lagrange multipliers $\lambda$ and $\phi$ together with the orthonormality of the vectors $u_1$ and $u_2$, the method of Lagrange multipliers results in the following maximization problem for the second principal component:

$u_2' \Sigma_X u_2 - \lambda \left( u_2' u_2 - 1 \right) - \phi \, u_2' u_1.$

The partial derivatives with respect to $u_2$ lead to the following system of equations:

$2 \Sigma_X u_2 - 2 \lambda u_2 - \phi u_1 = 0_N.$

Multiplying both sides of the above equation from the left by $u_1'$ gives $\phi = 0$, and we end up with:

$\Sigma_X u_2 - \lambda u_2 = 0_N.$

By similar reasoning as before, we conclude that $\lambda$ is the second largest eigenvalue of $\Sigma_X$ and $u_2$ is its normalized eigenvector. An iterative application of this procedure eventually proves the statement for all principal components $u_k' X_t$ with $3 \leq k \leq N$. $\Box$

Note, Lemma 2.1.5 assumes all eigenvalues of the covariance matrix $\Sigma_X$ to be distinct and positive. For $n$ equal eigenvalues with $2 \leq n \leq N$, the $n$-dimensional space spanned by their eigenvectors is unique, but the eigenvectors themselves are exchangeable and thus not clearly identifiable (Jolliffe, 2002, p. 27, Section 2.4). The normalization $u_k' u_k = 1$ in Lemma 2.1.5 ensures that the maximum is attained for finite $u_k$, but it is only one, perhaps the most common one, of several alternatives (Jolliffe, 2002, p. 5, Section 1.1). In empirical studies, the covariance matrix $\Sigma_X$ is usually replaced by the empirical covariance matrix $\hat{\Sigma}_X$ in (2.2).
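To make Lemma 2.1.5 concrete, the following Python sketch simulates a panel, replaces $\Sigma_X$ by the empirical covariance matrix $\hat{\Sigma}_X$ and verifies that the variance of the $k$-th principal component equals the $k$-th eigenvalue and that distinct components are uncorrelated. All names ($N$, $T$, rng) and the simulated data are purely illustrative and not part of the text.

import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 100_000                     # cross-section size, sample length
A = rng.standard_normal((N, N))
Sigma_X = A @ A.T                     # a positive definite covariance matrix
X = rng.multivariate_normal(np.zeros(N), Sigma_X, size=T)  # rows are X_t'

Sigma_hat = np.cov(X, rowvar=False)   # empirical covariance matrix
lam, U = np.linalg.eigh(Sigma_hat)    # ascending eigenvalues, orthonormal vectors
order = np.argsort(lam)[::-1]         # reorder descendingly as in Lemma 2.1.5
lam, U = lam[order], U[:, order]

PC = X @ U                            # k-th column holds the series u_k' X_t

print(np.allclose(PC.var(axis=0, ddof=1), lam))                        # True
print(np.allclose(np.cov(PC, rowvar=False), np.diag(lam), atol=1e-8))  # True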

For the distinguishing features of PCA and FA, we follow Jolliffe (2002, pp. 150-161, Sections 7.1-7.3).

First, FA assumes an underlying model as in Definitions 2.1.3 and 2.1.4, whereas PCA is a non-parametric approach and does not assume such a model. Second, for the same panel data the number of factors and principal components might differ. Suppose there is a time series that is uncorrelated with the remaining ones of the panel data. Then, in PCA this time series likely becomes a principal component, but not a factor in FA. In the case of PCA, it specifies its own eigenvector $u_i \in \mathbb{R}^{N}$ of the covariance matrix $\Sigma_X$. In the end, it depends on the total number of principal components $K$ and the variation covered by $u_i' \Sigma_X u_i$ whether the principal component $u_i' X_t$ is chosen or not. If a time series behaves independently of the remaining ones, FA assigns this individual nature to an idiosyncratic shock instead of a factor, since the factors cover the communalities of the panel data. This fact highlights the third characteristic: in PCA, the focus lies on the diagonal elements of the covariance matrix $\Sigma_X$, while in the case of FA the off-diagonal entries matter more. Fourth, especially in empirical studies, the true number of factors or principal components is unknown and therefore has to be estimated. If the number of principal components increases from $K_1$ to $K_2$, $K_2 - K_1$ new principal components are added to the original $K_1$ ones. By contrast, if the number of factors increases from $K_1$ to $K_2$, $K_2$ new factors are determined, which do not necessarily comprise the former $K_1$ ones (see the sketch below). Fifth, principal components arise from an exact linear function of the panel data, that is, $u_k' X_t$, while factors are a linear combination of the panel data and errors. Due to these differences, Jolliffe (2002, p. 150, Chapter 7) assessed the use of PCA as part of FA as "bending the rules that govern factor analysis".
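The fourth difference can be checked numerically. In the sketch below, sklearn's PCA and FactorAnalysis serve merely as off-the-shelf estimators, not as the estimation methods of this thesis, and the toy panel is again illustrative: enlarging the number of components leaves the leading principal components untouched, whereas refitted factor loadings generally change.

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 8)) @ rng.standard_normal((8, 8))  # toy panel

pca2 = PCA(n_components=2).fit(X)
pca4 = PCA(n_components=4).fit(X)
print(np.allclose(pca4.components_[:2], pca2.components_))  # True: nested

fa2 = FactorAnalysis(n_components=2, random_state=0).fit(X)
fa4 = FactorAnalysis(n_components=4, random_state=0).fit(X)
print(np.allclose(fa4.components_[:2], fa2.components_))    # typically False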

Despite the differences between PCA and FA, PCA often provides initial parameter estimates for FA. Similar to Jolliffe (2002, p. 157, Eq. 7.2.3), we have:

Remark 2.1.6 (PCA for Parameter Initialization in Factor Analysis)

Assume the SFM in Definition 2.1.3 and let $\lambda_1 > \dots > \lambda_N$ be the descendingly sorted eigenvalues of the covariance matrix $\Sigma_X$ with orthonormal eigenvectors $u_1, \dots, u_N \in \mathbb{R}^{N}$. Then, for the parameters of a Static Factor Model initialized using PCA it holds:

$X_t = \left[ u_1, \dots, u_K \right] \left( u_1' X_t, \dots, u_K' X_t \right)' + \epsilon_t, \qquad \epsilon_t = \sum_{k=K+1}^{N} u_k u_k' X_t,$

which coincides with the ASFM in Definition 2.1.3. In general, we cannot assume that the idiosyncratic shocks are cross-sectionally uncorrelated, so the conditions of an ESFM might be violated.
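A minimal numerical sketch of Remark 2.1.6 under these assumptions: the leading eigenvectors act as initial loadings, the first $K$ principal components as initial factors, and the implied idiosyncratic shocks turn out to be cross-sectionally correlated. The variable names (W0, F0) are ours, not the text's.

import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 50_000, 2
W = rng.standard_normal((N, K))          # true loadings
F = rng.standard_normal((T, K))          # true factors
eps = 0.1 * rng.standard_normal((T, N))  # idiosyncratic shocks
X = F @ W.T + eps                        # SFM panel, rows are X_t'

Sigma_hat = np.cov(X, rowvar=False)
lam, U = np.linalg.eigh(Sigma_hat)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

W0 = U[:, :K]            # initial loadings: leading eigenvectors
F0 = X @ W0              # initial factors: first K principal components
resid = X - F0 @ W0.T    # implied idiosyncratic shocks

Omega = np.cov(resid, rowvar=False)
off_diag = Omega - np.diag(np.diag(Omega))
print(np.abs(off_diag).max())  # clearly nonzero: shocks are cross-correlated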

Under certain conditions, PCA and FA can be reconciled. For the ESFMs in Definition 2.1.3 with isotropic shocks, i.e., $\Sigma = \sigma^2 I_N$, Tipping and Bishop (1999) showed how to determine principal components using MLE. To highlight the underlying probabilistic framework, they introduced the term Probabilistic Principal Component Analysis (PPCA). In Section 3.1.1, we will reapply their estimation procedure. This is why we repeat their MLE parameter estimates in Theorem 3.1.3. Schneeweiss and Mathes (1995) pursued a similar idea by analyzing how small the deviations between factors and principal components can be. For further reading on the reconciliation of PCA and FA see, e.g., Jolliffe (2002, pp. 158-161, Chapter 7.3).
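For orientation, the closed-form maximum likelihood estimates of Tipping and Bishop (1999) can be sketched as follows: the noise variance estimate is the average of the $N - K$ discarded eigenvalues, and the loadings are the leading eigenvectors scaled column-wise by $(\lambda_k - \hat{\sigma}^2)^{1/2}$, unique up to rotation. The function ppca_mle is our own illustrative wrapper; the exact statement follows in Theorem 3.1.3.

import numpy as np

def ppca_mle(X: np.ndarray, K: int):
    """ML estimates of a factor model with isotropic shocks (PPCA)."""
    Sigma_hat = np.cov(X, rowvar=False)
    lam, U = np.linalg.eigh(Sigma_hat)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    sigma2 = lam[K:].mean()                   # average discarded eigenvalue
    W = U[:, :K] * np.sqrt(lam[:K] - sigma2)  # scale k-th eigenvector
    return W, sigma2

rng = np.random.default_rng(3)
X = (rng.standard_normal((20_000, 3)) @ rng.standard_normal((3, 7))
     + 0.5 * rng.standard_normal((20_000, 7)))   # true noise variance 0.25
W_hat, sigma2_hat = ppca_mle(X, K=3)
print(sigma2_hat)   # close to 0.25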

PCA and FA share an important feature, namely, both techniques admit a reduction in dimension when panel data is condensed into a few principal components or factors. Since PCA is a well-known concept in the literature, especially for now- and forecasting applications, we review some work in this area. Stock and Watson (2002a,b) forecasted univariate time series based on factors, which obey an Approximate FM and are estimated using principal components. In addition, they suggested the combination of PCA and an EM algorithm for parameter estimation with incomplete panel data, which is revived in Schumacher and Breitung (2008) and Marcellino and Schumacher (2010). The two-step estimation method for the FAVARs in Bernanke et al. (2005) first extracts factors from panel data using PCA and then applies an Ordinary Least Squares Regression (OLS) for estimating the coefficient matrices of the factor dynamics. Bai and Ng (2002, 2006, 2008a,b) derived panel and information criteria for model selection, proved consistency and asymptotic intervals of predicted variables, provided a general overview and considered non-linear or targeted predictions, when factors are estimated using PCA. As in Bernanke et al. (2005), De Mol et al. (2006) compared Bayesian and PCA-based estimation methods. Doz et al. (2011) proposed a two-step estimation method for ADFMs, which first combines PCA and OLS; in the second step, the factors are reestimated by the KS. This approach was applied or modified in Giannone et al. (2004, 2008), Hogrefe (2008) and Angelini et al. (2010). Bai and Ng (2013) studied conditions under which PCA provides asymptotically unique factor estimates, that is, they aimed to remove the rotational indeterminacy of the factor estimates. Finally, Stock and Watson (2011) summarized recent developments regarding FMs and thereby collected contributions and results on PCA in this field.