

3.4.3 Prior covariance shrinkage estimation

The evaluation of the flow equation (3.37) requires a prior covariance estimate, which can be derived in several ways. The simplest is to estimate the covariance matrix from the prior particles; this is referred to as the sample covariance estimate $S$. $S$ is an unbiased estimator of the true prior covariance $P$, and it is the maximum likelihood estimate if the data are Gaussian distributed. For nonlinear models and non-Gaussian noises, however, the Gaussian assumption no longer holds. Moreover, $S$ can progressively become ill-conditioned, i.e. the spread of its eigenvalues grows with time. This is especially the case when the ratio $d/N_p$ is non-negligible, where $d$ is the state vector dimension and $N_p$ is the number of particles. As a consequence, the matrix inversion can lead to stability problems.

For the case $d > N_p$, the resulting sample covariance matrix is not even full rank and hence not invertible.
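To make the conditioning issue concrete, the following Python sketch (the helper name `sample_covariance` is ours, not from the references) estimates $S$ from a particle set and inspects its condition number; with $d$ close to $N_p$ the eigenvalue spread is typically large, and with $d > N_p$ the matrix is singular.

```python
import numpy as np

def sample_covariance(X):
    """Unbiased sample covariance S from particles.

    X : (Np, d) array, one particle per row.
    """
    Np = X.shape[0]
    Xc = X - X.mean(axis=0)            # center the particles
    return (Xc.T @ Xc) / (Np - 1)      # (d, d) estimate

# Illustration: d close to Np gives a badly conditioned S;
# d > Np would make S rank-deficient (not invertible at all).
rng = np.random.default_rng(0)
d, Np = 40, 50
X = rng.standard_normal((Np, d))
S = sample_covariance(X)
print(np.linalg.cond(S))               # large spread of eigenvalues
```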

An alternative method suggested by the authors in [DH09a] is to run an EKF/UKF in parallel to the DHF and to use the prior covariance matrix generated by that filter. We refer to such a matrix as $P_{XKF}$, where XKF denotes an extended or unscented version of the KF. While this method is better than using the raw sample covariance estimate, it ties the DHF estimation accuracy to that of the EKF/UKF. $P_{XKF}$ can also exhibit a wide spread of eigenvalues.

Therefore, we look for an alternative method of covariance matrix estimation. Such a method should have two properties: the resulting matrix should always be positive definite (PD), and it should be well-conditioned [SS05b]. One approach could be to start with the sample covariance and ensure that the matrix is always PD; such a matrix, however, might not be well-conditioned.

Alternatively, variance reduction techniques could be used to obtain a well-conditioned matrix, but this can be computationally expensive [SS05a]. Another approach, used in the multivariate statistics literature for the estimation of covariance matrices, is known as shrinkage estimation. The use of such methods dates back to the work of Stein [Ste56]. The main idea is to merge the raw estimate $S$, which is unbiased but typically has high variance, with a more structured but typically biased target $B$ through a scale factor, to obtain the combined estimate $\hat{P}$. The objective is to reduce the estimation error, typically in the mean squared error (MSE) sense, by achieving an optimal trade-off between the biased estimator $B$ and the unbiased estimator $S$. The scale factor is also called the shrinkage intensity $\rho$, as it optimally shrinks the eigenvalues of $S$ towards the mean of the eigenvalues of the true covariance matrix $P$ [LW04b]. The resulting covariance matrix $\hat{P}$ is biased, but it improves on the two aforementioned properties and is expected to lower the estimation error. Several shrinkage estimators with different target covariance matrices are described in the literature. In the current work, we describe some of the more established ones. In subsections 3.4.3.1 to 3.4.3.3, the shrinkage estimators are defined through a convex combination of the matrices $B$ and $S$. The objective then becomes to find an optimal shrinkage intensity that minimizes the cost function,

$$\min_{\rho}\; \mathbb{E}\left[\,\|\hat{P} - P\|^2\,\right] \qquad (3.38)$$

where $\hat{P} = \rho B + (1-\rho)\, S$.
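As a minimal illustration of this convex combination, the sketch below (the function name `shrink` is ours) forms $\hat{P}$ from $S$, a target $B$, and a given intensity $\rho$; the estimators in the following subsections differ only in their choice of $B$ and in how $\rho$ is computed.

```python
import numpy as np

def shrink(S, B, rho):
    """Generic shrinkage estimate P = rho * B + (1 - rho) * S.

    S   : (d, d) sample covariance (unbiased, high variance)
    B   : (d, d) structured target (biased, low variance)
    rho : shrinkage intensity in [0, 1]
    """
    rho = float(np.clip(rho, 0.0, 1.0))
    return rho * B + (1.0 - rho) * S
```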

3.4.3.1 Shrinkage towards the Identity matrix

Shrinkage towards the identity matrix is described in [LW04b]. The two main objectives are: 1) to obtain an asymptotically consistent estimator that is more accurate than the sample covariance matrix $S$, and 2) one that is also well-conditioned. No prior structure is assumed for the target matrix $B$, as this could lead to an increased bias. Instead, a simple matrix with equal variance terms and zero cross-covariances (a scaled identity) is chosen as the target. The shrinkage estimator has the following form,

$$\hat{P} = \frac{\beta^2}{\delta^2}\,\mu I + \frac{\alpha^2}{\delta^2}\, S, \qquad (3.39)$$

The estimator $\hat{P}$ asymptotically shrinks the eigenvalues of $S$ towards the mean eigenvalue of the true covariance matrix $P$, in the quadratic mean sense. The terms $\alpha$, $\beta$, $\delta$ and $\mu$ depend on the unobserved true covariance matrix $P$. Therefore, a consistent estimator of $\hat{P}$ is derived under the assumptions of general asymptotics. We term this estimator $\hat{P}_{LW0}$, and it has the following form,

$$\hat{P}_{LW0} = \frac{b_n^2}{d_n^2}\, m_n I + \frac{a_n^2}{d_n^2}\, S, \qquad (3.40)$$

where $m_n = \mathrm{tr}(S)/d$ and

$$d_n^2 = \|S - m_n I\|^2, \qquad \bar{b}_n^2 = \frac{1}{N_p^2} \sum_{i=1}^{N_p} \left\| X_n^i (X_n^i)^T - S \right\|^2, \qquad b_n^2 = \min(\bar{b}_n^2,\, d_n^2), \qquad a_n^2 = d_n^2 - b_n^2.$$

Here $\|\cdot\|$ denotes the Frobenius norm and $X_n^i$ is the $i$-th particle. The shrinkage intensity $\rho$ is given by $b_n^2/d_n^2$. It is shown that the MSE of $\hat{P}_{LW0}$ asymptotically approaches that of $\hat{P}$, i.e.

$$\mathbb{E}\left[\|\hat{P} - P\|^2\right] - \mathbb{E}\left[\|\hat{P}_{LW0} - P\|^2\right] \to 0.$$

One main advantage of this estimator is that it does not assume any particular distribution for the data, and it is therefore distribution-free.

3.4.3.2 Shrinkage towards the constant correlation matrix

This estimator is derived in [LW04a], in the context of portfolio optimization. The target matrix is chosen according to the constant correlation model, in which all pairwise correlations are assumed identical and equal to the average of the sample correlations. We denote this estimator by $\hat{P}_{LW1}$. The target matrix $B$ is given by

$$B_{ij} = \begin{cases} S_{ii} & : i = j \\ \bar{r}\,\sqrt{S_{ii} S_{jj}} & : i \neq j \end{cases}$$


where $\bar{r}$ is the average sample correlation, defined as

$$\bar{r} = \frac{2}{d(d-1)} \sum_{i=1}^{d-1} \sum_{j=i+1}^{d} \frac{S_{ij}}{\sqrt{S_{ii} S_{jj}}}. \qquad (3.41)$$

The shrinkage intensity is defined as $\rho = \max\{0, \min\{1, \kappa/N_p\}\}$, with $\kappa$ given as $\kappa = (\hat{\pi} - \hat{\varrho})/\hat{\gamma}$. $\hat{\pi}$ denotes the sum of the asymptotic variances of the entries of the sample covariance matrix $S$, while $\hat{\varrho}$ denotes the sum of the asymptotic covariances of the entries of the shrinkage target $B$ with the entries of the sample covariance matrix. $\hat{\gamma}$ gives a measure of the misspecification of the shrinkage target. The hats $(\hat{\cdot})$ indicate that these are estimates of the true values, which are not known. $\hat{\pi}$ and $\hat{\varrho}$ are given by,

$$\hat{\pi} = \frac{1}{N_p} \sum_{i=1}^{d} \sum_{j=1}^{d} \sum_{k=1}^{N_p} \left\{ (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j) - S_{ij} \right\}^2$$

$$\hat{\varrho} = \sum_{i=1}^{d} \hat{\pi}_{ii} + \sum_{i=1}^{d} \sum_{\substack{j=1 \\ j \neq i}}^{d} \frac{\bar{r}}{2} \left\{ \sqrt{\frac{S_{jj}}{S_{ii}}}\, \vartheta_{ii,ij} + \sqrt{\frac{S_{ii}}{S_{jj}}}\, \vartheta_{jj,ij} \right\}, \qquad (3.42)$$

where

$$\vartheta_{ii,ij} = \frac{1}{N_p} \sum_{k=1}^{N_p} \left\{ (x_{ik} - \bar{x}_i)^2 - S_{ii} \right\} \left\{ (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j) - S_{ij} \right\}$$

$$\vartheta_{jj,ij} = \frac{1}{N_p} \sum_{k=1}^{N_p} \left\{ (x_{jk} - \bar{x}_j)^2 - S_{jj} \right\} \left\{ (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j) - S_{ij} \right\}. \qquad (3.43)$$

Finally, $\hat{\gamma}$ is given by

$$\hat{\gamma} = \sum_{i=1}^{d} \sum_{\substack{j=1 \\ j \neq i}}^{d} \left( B_{ij} - S_{ij} \right)^2. \qquad (3.44)$$
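The following sketch assembles $\hat{P}_{LW1}$ from (3.41)-(3.44). It is our own vectorized reading of the formulas above (the name `lw_constant_correlation` is not from [LW04a]), with the $\vartheta$ terms computed for all index pairs at once and the maximum likelihood normalization used for $S$.

```python
import numpy as np

def lw_constant_correlation(X):
    """Shrinkage towards the constant-correlation target, cf. [LW04a].

    X : (Np, d) particles (rows). Returns (P_LW1, rho).
    """
    Np, d = X.shape
    Xc = X - X.mean(axis=0)
    S = (Xc.T @ Xc) / Np
    s = np.sqrt(np.diag(S))                       # standard deviations

    # Average sample correlation rbar, eq. (3.41)
    R = S / np.outer(s, s)
    rbar = (R.sum() - d) / (d * (d - 1))

    # Constant-correlation target B
    B = rbar * np.outer(s, s)
    np.fill_diagonal(B, np.diag(S))

    # pi_hat: asymptotic variances of the entries of S
    Y = Xc[:, :, None] * Xc[:, None, :]           # (Np, d, d) products
    phi = ((Y - S)**2).mean(axis=0)               # phi[i, j] = pi_hat_ij
    pi_hat = phi.sum()

    # theta[i, j] = theta_{ii,ij}; theta[j, i] = theta_{jj,ij}, eq. (3.43)
    theta = ((Xc**2 - np.diag(S))[:, :, None] * (Y - S)).mean(axis=0)

    # rho_hat, eq. (3.42)
    rho_hat = np.trace(phi)
    for i in range(d):
        for j in range(d):
            if i != j:
                rho_hat += 0.5 * rbar * (s[j] / s[i] * theta[i, j]
                                         + s[i] / s[j] * theta[j, i])

    gamma_hat = np.sum((B - S)**2)                # eq. (3.44)

    kappa = (pi_hat - rho_hat) / gamma_hat
    rho = max(0.0, min(1.0, kappa / Np))
    return rho * B + (1.0 - rho) * S, rho
```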

3.4.3.3 Shrinkage towards the perfect positive correlation matrix

The authors in [LW03] suggest a single-factor matrix as the shrinkage target. The paper is concerned with estimating the structure of risk in the stock market and with modeling stock returns; the fact that stock returns are positively correlated with each other is exploited. The shrinkage target is given by,

$$B_{ij} = \begin{cases} S_{ii} & : i = j \\ \sqrt{S_{ii} S_{jj}} & : i \neq j \end{cases}$$

The resulting linear estimator is denoted by $\hat{P}_{LW2}$. The shrinkage intensity has the same form as for $\hat{P}_{LW1}$, but with a slightly different formula for $\hat{\varrho}$, which is given below.

$$\hat{\varrho} = \sum_{i=1}^{d} \hat{\pi}_{ii} + \sum_{i=1}^{d} \sum_{\substack{j=1 \\ j \neq i}}^{d} \frac{1}{2} \left\{ \sqrt{\frac{S_{jj}}{S_{ii}}}\, \vartheta_{ii,ij} + \sqrt{\frac{S_{ii}}{S_{jj}}}\, \vartheta_{jj,ij} \right\} \qquad (3.45)$$
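Since only the target changes with respect to $\hat{P}_{LW1}$, a sketch of the perfect positive correlation target suffices; the intensity computation carries over with $\bar{r}$ replaced by 1, per (3.45). The helper name is ours.

```python
import numpy as np

def perfect_correlation_target(S):
    """Shrinkage target of [LW03]: B_ij = sqrt(S_ii * S_jj), B_ii = S_ii.

    Equivalent to setting all sample correlations to one.
    """
    s = np.sqrt(np.diag(S))
    B = np.outer(s, s)                 # off-diagonal: sqrt(S_ii * S_jj)
    np.fill_diagonal(B, np.diag(S))    # diagonal kept at S_ii (explicit)
    return B
```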

3.4.3.4 Empirical Bayesian Estimator

In [Haf80], an estimator for multivariate Gaussian data is derived. It is given by a linear combination of the sample covariance matrix $S$ and a scaled identity matrix, where the scaling factor is estimated from the data. We denote this estimator by $\hat{P}_{EB}$, and it is given by

$$\hat{P}_{EB} = \frac{N_p d - 2N_p - 2}{N_p^2\, d} \left[\det(S)\right]^{\frac{1}{d}} I + \frac{N_p}{N_p + 1}\, S \qquad (3.46)$$
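A direct transcription of (3.46) into NumPy might look as follows. The log-determinant is used instead of $\det(S)$ for numerical robustness (our choice, not part of [Haf80]), and $S$ is assumed to have full rank.

```python
import numpy as np

def empirical_bayes_estimator(S, Np):
    """Empirical Bayesian estimator P_EB of [Haf80], cf. (3.46)."""
    d = S.shape[0]
    # [det(S)]^(1/d) via the log-determinant to avoid overflow/underflow;
    # requires S to be full rank (positive determinant).
    sign, logdet = np.linalg.slogdet(S)
    mu = np.exp(logdet / d)
    scale = (Np * d - 2 * Np - 2) / (Np**2 * d)
    return scale * mu * np.eye(d) + (Np / (Np + 1)) * S
```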

3.4.3.5 Stein-Haff Estimator

This estimator is described in [Ste75]. The general form of the estimator is $V(S)\,\Phi(l(S))\,V(S)^T$, where the matrix $V(S)$ contains the eigenvectors of the sample covariance matrix $S$, while $\Phi(l(S))$ is a matrix-valued function of the eigenvalues $l(S)$ of $S$. The data are assumed to be normally distributed, and the sample covariance estimate $S$ is therefore Wishart distributed, $S \sim \mathcal{W}(P, N_p)$. The Stein-Haff estimator is denoted by $\hat{P}_{SH}$. It is constructed by leaving the eigenvectors of $S$ unchanged while replacing the eigenvalues $l_i$ by

$$\tilde{l}_i = N_p l_i \bigg/ \left( N_p - d + 1 + 2 l_i \sum_{\substack{j=1 \\ j \neq i}}^{d} \frac{1}{l_i - l_j} \right).$$

The eigenvalues can get disordered by this transformation and might even become negative, which could cause the covariance estimate to lose its positive definiteness. Therefore, another algorithm called isotonic regression is used in conjunction with the transformation [LP85]. This leads to the eigenvalues $\tilde{l} = \{\tilde{l}_1, \tilde{l}_2, \ldots, \tilde{l}_d\}^T$. Hence the estimate $\hat{P}_{SH}$ is given by $V(S)\, D(\tilde{l})\, V(S)^T$, where $D(\tilde{l})$ is the diagonal matrix of the transformed eigenvalues.
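The sketch below implements the eigenvalue transformation together with a simple pool-adjacent-violators routine standing in for the isotonic regression step of [LP85]. Both function names are ours; distinct eigenvalues are assumed, and the small floor on the eigenvalues is an added safeguard, not part of [Ste75].

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    y = y.astype(float)
    w = np.ones_like(y)                # block weights (pooled counts)
    i = 0
    while i < len(y) - 1:
        if y[i] > y[i + 1]:            # violator: pool the two blocks
            pooled = (w[i] * y[i] + w[i + 1] * y[i + 1]) / (w[i] + w[i + 1])
            y[i] = pooled
            w[i] += w[i + 1]
            y = np.delete(y, i + 1)
            w = np.delete(w, i + 1)
            i = max(i - 1, 0)          # re-check against the previous block
        else:
            i += 1
    return np.repeat(y, w.astype(int))

def stein_haff_estimator(S, Np):
    """Stein-Haff estimator: keep eigenvectors of S, transform eigenvalues."""
    l, V = np.linalg.eigh(S)
    l, V = l[::-1], V[:, ::-1]         # descending order: l_1 >= ... >= l_d
    d = len(l)

    lt = np.empty(d)
    for i in range(d):
        denom = Np - d + 1 + 2 * l[i] * sum(
            1.0 / (l[i] - l[j]) for j in range(d) if j != i)
        lt[i] = Np * l[i] / denom

    # Isotonic step: restore descending order, then clip at zero, since
    # the raw transform can disorder or negate the eigenvalues.
    lt = -pava(-lt)                    # enforce a non-increasing sequence
    lt = np.maximum(lt, 1e-12)
    return (V * lt) @ V.T              # V diag(lt) V^T
```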

3.4.3.6 Minimax Estimator

The final shrinkage estimator considered is derived in [Ste82]. Again, a Gaussian assumption is made. This estimator is termed minimax because, under a certain loss function, it has the lowest worst-case error [LW04b]. Its structure is similar to $\hat{P}_{SH}$, but the sample eigenvalues are replaced by $\tilde{l}_i = \frac{N_p}{N_p + d - 1 - 2i}\, l_i$. This estimator is denoted here by $\hat{P}_{MX}$. Isotonic regression is not applied in this case.
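A sketch of $\hat{P}_{MX}$ with the constants exactly as given in the text; it assumes $N_p$ is sufficiently larger than $d$ so that all denominators stay positive. The function name is ours.

```python
import numpy as np

def minimax_estimator(S, Np):
    """Minimax estimator [Ste82]: eigenvectors of S are kept, eigenvalue
    l_i is replaced by Np * l_i / (Np + d - 1 - 2i), with i the 1-based
    rank of the eigenvalue in descending order."""
    l, V = np.linalg.eigh(S)
    l, V = l[::-1], V[:, ::-1]          # descending: l_1 >= ... >= l_d
    d = len(l)
    i = np.arange(1, d + 1)             # eigenvalue rank
    lt = Np * l / (Np + d - 1 - 2 * i)  # no isotonic regression applied
    return (V * lt) @ V.T               # V diag(lt) V^T
```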

There is another interesting covariance estimator by Ledoit and Wolf [LW12], in which a nonlinear transformation of the sample eigenvalues is considered. It requires solving a nonlinear optimization problem using sequential linear programming. The new nonlinear estimator is shown to outperform the linear shrinkage estimators described earlier in this section. In the current work, we do not consider this method.
