
5.1.2 Asymptotic Theory for Relative Entropy Measures

As discussed in the introduction to this chapter and listed in Table 5.1, relative entropy measures are constructed as Kullback-Leibler divergences. The discussion in Section 5.1.2 focuses first on the asymptotics of MI as a showcase of the method; the simulation study for MI will, however, be accompanied by a second simulation study. I also look at the specifics of TE, as another example of a relative entropy measure and a special case of CMI, later in Section 5.1.2. While discretization approaches are deemed appropriate for the estimation of MI, coarse-graining methods have some unsatisfying characteristics for TE (see Kaiser and Schreiber 2002). Therefore, even though I first focus on MI (and leave time series considerations aside), I consider the asymptotic behavior and estimation technique for TE the major contribution of this chapter.

Mutual Information

While the concept of MI is defined in general for K variables, it is useful to limit the discussion here to the three-variable case in order to alleviate notational complexities. A limitation to two variables would be possible; however, for the extension to conditional MI (and TE as a special case), I use the three-variable case, in which MI is defined as

\[
\begin{aligned}
I(X, Y, Z) &= E\!\left[\log\!\left(\frac{f_{X,Y,Z}(x, y, z)}{f_X(x)\, f_Y(y)\, f_Z(z)}\right)\right] \qquad (5.19)\\
&= \int_{x \in X}\int_{y \in Y}\int_{z \in Z} f_{X,Y,Z}(x, y, z)\, C(x, y, z)\, dz\, dx\, dy\\
&= \int_{x \in X}\int_{y \in Y}\int_{z \in Z} f_{X,Y,Z}(x, y, z)\, \log\!\left(\frac{f_{X,Y,Z}(x, y, z)}{f_X(x)\, f_Y(y)\, f_Z(z)}\right) dz\, dx\, dy. \qquad (5.20)
\end{aligned}
\]

This definition allows for two approaches to calculate MI. The first one calculates the sample mean equivalent of Equation (5.19) over all observations $(x_i, y_i, z_i)$ with $i \in \{1, \dots, N\}$.

\[
\hat{I}(X, Y, Z) = E\!\left[\log\!\left(\frac{f_{X,Y,Z}(x, y, z)}{f_X(x)\, f_Y(y)\, f_Z(z)}\right)\right] \approx \frac{1}{N}\sum_{i=1}^{N} \log\!\left(\frac{\hat{f}_{X,Y,Z}(x_i, y_i, z_i)}{\hat{f}_X(x_i)\, \hat{f}_Y(y_i)\, \hat{f}_Z(z_i)}\right) \qquad (5.21)
\]

This approach uses the representativity of the sample to circumvent the integration over a grid of artificial support points. That integration is the other calculation approach: the numerical integration suggested by the integral formulation in Equation (5.20), which entails integrating over a grid that covers the sample space in the various dimensions.

If I choose M grid points, then for the three variables X, Y, and Z the integral would need to be evaluated at $M^3$ grid points. The number of points grows exponentially with each dimension, making the calculation practically infeasible already for a small number of variables. The expectation formulation is much more feasible.
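To make the expectation-based approach concrete, the following is a minimal Python sketch of the plug-in sample-mean estimator in Equation (5.21). It assumes that density estimates evaluated at the observed points are already available as arrays; the function and argument names are placeholders and not part of the notation used in this chapter.

```python
import numpy as np

def mi_sample_mean(f_xyz_hat, f_x_hat, f_y_hat, f_z_hat):
    """Plug-in sample-mean estimate of I(X, Y, Z), cf. Equation (5.21).

    Each argument is a length-N array of density estimates evaluated at the
    observed points (x_i, y_i, z_i); how these estimates are obtained
    (e.g. from smoothed quantile regressions) is left open here.
    """
    contributions = np.log(f_xyz_hat / (f_x_hat * f_y_hat * f_z_hat))
    return contributions.mean()  # average of the summands C(x_i, y_i, z_i)
```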

For MI, another possibility of estimation seems interesting: one can estimate the involved densities by means of kernel density techniques. For conditional MI and TE, however, where the conditional densities are needed directly in the calculation, the quantile regression approach emerges much more naturally, as can be seen in Section 5.1.2, since it strongly limits the required computational resources. So even though the approach may not be the most flexible way to estimate MI, I use the method of smoothed quantile regression estimates for MI estimation as a showcase in order to make the concept more accessible. Also, the estimates come with standard errors and are testable.

For the purpose of calculating the MI contributions via conditional densities, one can rewrite the joint density as

\[
f_{X,Y,Z}(x, y, z) = f_{X \mid Y,Z}(x \mid y, z)\, f_{Y \mid Z}(y \mid z)\, f_Z(z).
\]

This makes the summand for the ith observation in Equation (5.21), since the marginal $f_Z(z_i)$ cancels against the corresponding factor in the denominator,

\[
C(x_i, y_i, z_i) = \log\!\left(\frac{\hat{f}_{X \mid Y,Z}(x_i \mid y_i, z_i)\, \hat{f}_{Y \mid Z}(y_i \mid z_i)}{\hat{f}_X(x_i)\, \hat{f}_Y(y_i)}\right).
\]
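As a small illustration of this form (again with placeholder names, and assuming the conditional and marginal density estimates at observation i are already available), one summand could be computed as:

```python
import numpy as np

def mi_contribution(f_x_given_yz, f_y_given_z, f_x, f_y):
    """One summand C(x_i, y_i, z_i) of Equation (5.21), written with the
    conditional-density decomposition; the factor f_Z has already cancelled.
    All four arguments are scalar density estimates at observation i.
    """
    return np.log(f_x_given_yz * f_y_given_z / (f_x * f_y))
```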

This decomposition provides the basis for the quantile regression based approach. All constituents of the joint density can be estimated from the parameter estimates of the respective quantile regression such that

\[
\hat{f}_{X,Y,Z}(x, y, z) = \hat{\gamma}_1(\hat{\theta}_{X \mid Y,Z})\, \hat{\gamma}_1(\hat{\theta}_{Y \mid Z})\, \hat{\gamma}_1(\hat{\theta}_Z).
\]

Note that the summands in the sample mean estimate $\hat{I}(X, Y, Z)$ can again be conceived as functions of the parameter estimates from the various quantile regressions.

Also recall that each of the density estimates in C may converge to the true density value at a different rate. This is due to the result of Martins-Filho and Saraiva (2012) for the asymptotic convergence of local polynomial regression parameters estimated with stochastic bandwidths. The result for the local polynomial regression parameter associated with the first-order derivative is reproduced for convenience in Equation (5.12). Therefore, in order to derive an asymptotic distribution of a test statistic, I need to align the convergence of the density estimates. For the construction of a test statistic, I therefore divide each of the density estimates by the square root of the third power of its bandwidth. In effect, since MI is formulated in logarithms, this is equivalent to subtracting a constant term from the MI estimate. For the term normalized in this way, a function of normalized density estimates, I conjecture that $\sqrt{Q}$-normality is sustained. The simulation results in Section 5.2.3 underpin this conjecture.

For the construction of the test statistic, I treat the bandwidths as essentially fixed and independent of the regression estimates $\hat{\theta}$. The delta method for the adjusted contributions can thus be derived to be

\[
\lim_{N \to \infty} \sqrt{Q}\,\frac{\hat{C}(\hat{\theta}) - C - \mathcal{C}}{\hat{\theta} - \theta}
= \lim_{N \to \infty} \frac{1}{\hat{\theta} - \theta}\,\sqrt{Q}\,\log\!\left(\frac{h_{X \mid Y,Z}^{-1}\big(\hat{f}_{X \mid Y,Z} - f_{X \mid Y,Z}\big)\, h_{Y \mid Z}^{-1}\big(\hat{f}_{Y \mid Z} - f_{Y \mid Z}\big)}{h_X^{-1}\big(\hat{f}_X - f_X\big)\, h_Y^{-1}\big(\hat{f}_Y - f_Y\big)}\right)
= \left.\frac{\partial \hat{C}_{x_i, y_i, z_i}}{\partial \bar{\theta}_{lm}}\right|_{\bar{\theta}_{lm} = \hat{\theta}_{lm}},
\]

where the correcting term $\mathcal{C} = \frac{3}{2}\log\!\left(\frac{h_{X \mid Y,Z}\, h_{Y \mid Z}}{h_X h_Y}\right)$ is used.
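As a minimal numerical sketch of this normalization (bandwidth argument names are placeholders; the bandwidths themselves come from the estimation step in Section 5.1.1), the correcting term could be computed as:

```python
import numpy as np

def correcting_term(h_x_given_yz, h_y_given_z, h_x, h_y):
    """Correcting term (3/2) * log(h_{X|Y,Z} * h_{Y|Z} / (h_X * h_Y)).

    Subtracting it from a contribution is the same as dividing every density
    estimate by the 3/2 power of its bandwidth before taking the log ratio.
    """
    return 1.5 * np.log((h_x_given_yz * h_y_given_z) / (h_x * h_y))
```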

In order to calculate the variance of the MI estimate, not only the variance of each summand is needed, but also the covariances among all of the summands. Again, the asymptotic convergence results for the density estimates developed in Section 5.1.1 are of importance.

Knowing the limiting distribution of $\hat{\theta}$, one can work out the limiting distribution of each summand $\hat{C}(x, y, z)$ and the covariance between any two summands $C_i = C(x_i, y_i, z_i)$ and $C_j = C(x_j, y_j, z_j)$ using the delta method. All that is needed are the gradients of the summands with respect to the parameter estimates $\hat{\theta}$.

Collecting the derivatives of the ith contribution $C_i$ with respect to $\theta$ in a matrix $\Upsilon_i$, the elements of $\Upsilon_i$ can be written as

\[
\left.\frac{\partial \hat{C}_{x_i, y_i, z_i}}{\partial \bar{\theta}_{lm}}\right|_{\bar{\theta}_{lm} = \hat{\theta}_{lm}} =
\begin{cases}
\dfrac{\partial \hat{f}_{X \mid Y,Z}}{\hat{f}_{X \mid Y,Z}} & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{X \mid Y,Z}\\[1.5ex]
\dfrac{\partial \hat{f}_{Y \mid Z}}{\hat{f}_{Y \mid Z}} & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Y \mid Z}\\[1.5ex]
0 & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Z}\\[1.5ex]
-\dfrac{1}{\hat{f}_Y}\,\partial \hat{f}_Y & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Y}\\[1.5ex]
-\dfrac{1}{\hat{f}_X}\,\partial \hat{f}_X & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{X}
\end{cases}
\]

where the elements $\partial \hat{f}$ can be replaced with the respective elements of H and $\hat{f}$ with the density estimates.
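The case distinction translates directly into code. The sketch below stacks the blocks of $\mathrm{vec}(\Upsilon_i)$ in one possible ordering of the parameter vector; the derivative vectors (the relevant rows of H) and the density estimates are assumed to be given, and all names are placeholders.

```python
import numpy as np

def upsilon_vec(d_f_x_given_yz, d_f_y_given_z, d_f_x, d_f_y,
                f_x_given_yz, f_y_given_z, f_x, f_y, n_params_z):
    """Stack vec(Upsilon_i) for one observation following the case
    distinction above.

    The d_f_* arguments are arrays of density derivatives with respect to
    the parameters of the corresponding quantile regression; the f_*
    arguments are the scalar density estimates.  The parameters of the
    regression for f_Z do not enter the contribution, so their block is
    zero.  The block order must match the ordering used in Avar(theta_hat).
    """
    return np.concatenate([
        np.asarray(d_f_x_given_yz) / f_x_given_yz,   # theta_{X|Y,Z} block
        np.asarray(d_f_y_given_z) / f_y_given_z,     # theta_{Y|Z} block
        np.zeros(n_params_z),                        # theta_Z block
        -np.asarray(d_f_y) / f_y,                    # theta_Y block
        -np.asarray(d_f_x) / f_x,                    # theta_X block
    ])
```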

Note that when the representativity of the sample is not used and MI is estimated by integrating over the estimated joint density function, the derivatives of the contributions Ci need to be extended by additional terms.

Knowing $\Upsilon_i$ and its variance, the limiting distribution of the contribution can be approximated using the delta method (cp. Oehlert 1992, Hayashi 2000), which leads to

\[
\hat{C}_i(\hat{\theta}) + \mathcal{C} \sim \mathcal{N}\!\left(C_i,\; \frac{1}{QN}\,\mathrm{vec}(\Upsilon_i)^{\top} \mathrm{Avar}(\hat{\theta})\, \mathrm{vec}(\Upsilon_i)\right). \qquad (5.22)
\]

In order to calculate MI, the conventions $0 \log\frac{0}{0} = 0$, $0 \log\frac{0}{f_Y} = 0$ as well as $\log\frac{f_X}{0} = \infty$ need to be introduced (cp. Cover and Thomas 2005).³⁶ Therewith, the covariance between $\hat{C}_i(\hat{\theta})$ and $\hat{C}_j(\hat{\theta})$ may be approximated by (cp. Klein 1953)

\[
\mathrm{cov}\!\left(\hat{C}_i(\hat{\theta}), \hat{C}_j(\hat{\theta})\right) = \frac{1}{QN}\,\mathrm{vec}(\Upsilon_i)^{\top} \mathrm{Avar}(\hat{\theta})\, \mathrm{vec}(\Upsilon_j). \qquad (5.23)
\]

Based on the estimation of MI by the sample mean equivalent of Equation (5.21), the variance of the MI estimate can be computed as

\[
\begin{aligned}
\mathrm{var}(\hat{I}_{X,Y,Z}) &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \mathrm{cov}\!\left(\hat{C}_i(\hat{\theta}), \hat{C}_j(\hat{\theta})\right)\\
&= \frac{1}{QN}\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{1}{N}\,\mathrm{vec}(\Upsilon_i)^{\top}\, \mathrm{Avar}(\hat{\theta})\, \frac{1}{N}\,\mathrm{vec}(\Upsilon_j)\\
&= \frac{1}{QN}\left[\frac{1}{N}\sum_{i=1}^{N}\mathrm{vec}(\Upsilon_i)\right]^{\top} \mathrm{Avar}(\hat{\theta}) \left[\frac{1}{N}\sum_{j=1}^{N}\mathrm{vec}(\Upsilon_j)\right]. \qquad (5.24)
\end{aligned}
\]

36 Note that during implementation, one can choose to numerically represent infinity by a sufficiently large value. However, I choose to exclude such points in the calculation, since dragging these values through all calculations results in numerical instabilities.
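Putting the pieces together, the variance in Equation (5.24) reduces to a single quadratic form in the averaged gradient vector. A sketch, assuming the stacked gradients and an estimate of $\mathrm{Avar}(\hat{\theta})$ are available (names are placeholders, and observations affected by the conventions above are assumed to have been dropped beforehand, as in the implementation described in the footnote):

```python
import numpy as np

def mi_variance(upsilon_vecs, avar_theta, Q):
    """Variance of the MI estimate following Equation (5.24).

    upsilon_vecs : (N, p) array whose i-th row is vec(Upsilon_i);
    avar_theta   : (p, p) estimate of Avar(theta_hat);
    Q            : scaling constant from the sqrt(Q) convergence rate.
    """
    N = upsilon_vecs.shape[0]
    mean_vec = upsilon_vecs.mean(axis=0)      # (1/N) * sum_i vec(Upsilon_i)
    return mean_vec @ avar_theta @ mean_vec / (Q * N)
```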

This approach also leads me to conjecture that the limiting distribution of $\widehat{\mathrm{var}}(\hat{I}_{X,Y,Z})$ is a $\chi^2$ distribution. For the application, however, I am only interested in the distribution of $\hat{I}_{X,Y,Z}$.

Transfer Entropy

Similar to Equations (5.19) and (5.20), transfer entropy, as a special case of conditional MI in a time series context with time-ordered random variables, can be constructed as

\[
\begin{aligned}
T_{X \to Y} &= I(Y_t, X_{t-1} \mid Y_{t-1})\\
&= E[\Theta_t] = E\!\left[\log\!\left(\frac{f_{Y_t \mid X_{t-1}, Y_{t-1}}(y_t \mid x_{t-1}, y_{t-1})}{f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1})}\right)\right]\\
&= \iiint_{\mathbb{R}^3} f_{Y_t, X_{t-1}, Y_{t-1}}(y_t, x_{t-1}, y_{t-1})\, \log\!\left(\frac{f_{Y_t \mid X_{t-1}, Y_{t-1}}(y_t \mid x_{t-1}, y_{t-1})}{f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1})}\right) dy_t\, dx_{t-1}\, dy_{t-1}.
\end{aligned}
\]

Note that only two quantile regressions are necessary to calculate the measure. Given stationary and ergodic time series for $y_t$, $x_t$ and $z_t$, one can approximate TE via a sample mean

\[
\hat{T}_{X \to Y} = E\!\left[\log\!\left(\frac{f_{Y_t \mid X_{t-1}, Y_{t-1}}(y_t \mid x_{t-1}, y_{t-1})}{f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1})}\right)\right] \approx \frac{1}{T}\sum_{t=1}^{T} \log\!\left(\frac{\hat{f}_{Y_t \mid X_{t-1}, Y_{t-1}}(y_t \mid x_{t-1}, y_{t-1})}{\hat{f}_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1})}\right). \qquad (5.25)
\]
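For TE, the sample-mean estimator requires only the two conditional density estimates, which keeps the computation light. A minimal sketch, assuming the estimates at the observed points are given as arrays (placeholder names):

```python
import numpy as np

def te_sample_mean(f_y_given_xy_lag, f_y_given_y_lag):
    """Plug-in sample-mean estimate of T_{X->Y}, cf. Equation (5.25).

    f_y_given_xy_lag : length-T array of estimates of f(y_t | x_{t-1}, y_{t-1});
    f_y_given_y_lag  : length-T array of estimates of f(y_t | y_{t-1}).
    """
    theta_t = np.log(f_y_given_xy_lag / f_y_given_y_lag)  # summands Theta_t
    return theta_t.mean()
```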

The derivative of each summand is then given by

\[
\left.\frac{\partial \hat{\Theta}_t}{\partial \bar{\theta}_{lm}}\right|_{\bar{\theta}_{lm} = \hat{\theta}_{lm}} =
\begin{cases}
\dfrac{\partial \hat{f}_{Y_t \mid X_{t-1}, Y_{t-1}}}{\hat{f}_{Y_t \mid X_{t-1}, Y_{t-1}}} & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Y_t \mid X_{t-1}, Y_{t-1}}\\[1.5ex]
-\dfrac{\partial \hat{f}_{Y_t \mid Y_{t-1}}}{\hat{f}_{Y_t \mid Y_{t-1}}} & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Y_t \mid Y_{t-1}}
\end{cases}
\]

The variance of $\hat{T}_{X \to Y}$ may be estimated analogously to MI.
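Analogously to the MI case, the gradient of a TE summand has only two non-zero blocks, one per quantile regression. A sketch under the same assumptions as before (placeholder names; derivative vectors and density estimates taken as given):

```python
import numpy as np

def te_summand_gradient(d_f_num, d_f_den, f_num, f_den):
    """Gradient of one summand Theta_t with respect to the stacked
    parameters of the two quantile regressions.

    d_f_num, d_f_den : derivatives of f(y_t | x_{t-1}, y_{t-1}) and
                       f(y_t | y_{t-1}) with respect to their own
                       regression parameters;
    f_num, f_den     : the corresponding scalar density estimates.
    """
    return np.concatenate([np.asarray(d_f_num) / f_num,
                           -np.asarray(d_f_den) / f_den])
```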