
5.1.2 Asymptotic Theory for Relative Entropy Measures

As discussed in the introduction to this chapter and listed in Table 5.1, relative entropy measures are constructed as Kullback-Leibler divergences. The discussion in Section 5.1.2 focuses first on the asymptotics of MI as a showcase of the method; the simulation study for MI will, however, be accompanied by a second simulation study. I also look at the specifics of TE, as another example of a relative entropy measure and a special case of CMI, later in Section 5.1.2. While discretization approaches are deemed appropriate for the estimation of MI, coarse-graining methods have some unsatisfying characteristics for TE (see Kaiser and Schreiber 2002). Therefore, even though I first focus on MI (and leave time series considerations aside), I consider the asymptotic behavior and estimation technique for TE the major contribution of this chapter.

Mutual Information

While the concept of MI is defined in general for K variables, it is useful to limit the discussion here to the three-variable case in order to alleviate notational complexities. A limitation to two variables would be possible; however, for the extension to conditional MI (and TE as a special case), I use the three-variable case, in which MI is defined as

\[
\begin{aligned}
I(X, Y, Z) &= E\!\left[\log\!\left(\frac{f_{X,Y,Z}(x, y, z)}{f_X(x)\, f_Y(y)\, f_Z(z)}\right)\right] \qquad (5.19)\\
&= \int_{x \in X}\int_{y \in Y}\int_{z \in Z} f_{X,Y,Z}(x, y, z)\, C(x, y, z)\, dz\, dx\, dy\\
&= \int_{x \in X}\int_{y \in Y}\int_{z \in Z} f_{X,Y,Z}(x, y, z)\, \log\!\left(\frac{f_{X,Y,Z}(x, y, z)}{f_X(x)\, f_Y(y)\, f_Z(z)}\right) dz\, dx\, dy. \qquad (5.20)
\end{aligned}
\]

This definition allows for two approaches to calculate MI. The first one calculates the sample mean equivalent of Equation (5.19) over all observations $(x_i, y_i, z_i)$ with $i \in \{1, \dots, N\}$.

\[
\hat{I}(X, Y, Z) = E\!\left[\log\!\left(\frac{f_{X,Y,Z}(x, y, z)}{f_X(x)\, f_Y(y)\, f_Z(z)}\right)\right] \approx \frac{1}{N}\sum_{i=1}^{N} \log\!\left(\frac{\hat{f}_{X,Y,Z}(x_i, y_i, z_i)}{\hat{f}_X(x_i)\, \hat{f}_Y(y_i)\, \hat{f}_Z(z_i)}\right) \qquad (5.21)
\]

This approach uses the representativity of the sample to circumvent the integration over a grid of artificial support points. That integration is the other calculation approach: the numerical integration suggested by the integral formulation in Equation (5.20), which entails integrating over a grid that covers the sample space in the various dimensions.

If I choose M grid points, then for the three variables X, Y, and Z the integral would need to be evaluated at $M^3$ grid points. The number of points grows exponentially with each dimension, making the calculation practically infeasible already for a small number of variables. The expectation formulation is much more feasible.
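To make the expectation-based approach concrete, the following is a minimal Python sketch of the plug-in sample-mean estimator in Equation (5.21). It assumes that density estimates evaluated at the observed points are already available as arrays; the function and argument names are placeholders and not part of the notation used in this chapter.

```python
import numpy as np

def mi_sample_mean(f_xyz_hat, f_x_hat, f_y_hat, f_z_hat):
    """Plug-in sample-mean estimate of I(X, Y, Z), cf. Equation (5.21).

    Each argument is a length-N array of density estimates evaluated at the
    observed points (x_i, y_i, z_i); how these estimates are obtained
    (e.g. from smoothed quantile regressions) is left open here.
    """
    contributions = np.log(f_xyz_hat / (f_x_hat * f_y_hat * f_z_hat))
    return contributions.mean()  # average of the summands C(x_i, y_i, z_i)
```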

For MI, another possibility of estimation seems interesting: one can estimate the involved densities by means of kernel density techniques. For conditional MI and TE, however, where the conditional densities are needed directly in the calculation, the quantile regression approach emerges much more naturally, as can be seen in Section 5.1.2, since it strongly limits the required computational resources. So even though the approach may not be the most flexible way to estimate MI, I use the method of smoothed quantile regression estimates for MI estimation as a showcase in order to make the concept more accessible. Also, the estimates come with standard errors and are testable.

For the purpose of calculating the MI contributions via conditional densities, one can rewrite the joint density as

\[
f_{X,Y,Z}(x, y, z) = f_{X \mid Y,Z}(x \mid y, z)\, f_{Y \mid Z}(y \mid z)\, f_Z(z).
\]

This makes the summand for the ith observation in Equation (5.21), since the marginal $f_Z(z_i)$ cancels against the corresponding factor in the denominator,

\[
C(x_i, y_i, z_i) = \log\!\left(\frac{\hat{f}_{X \mid Y,Z}(x_i \mid y_i, z_i)\, \hat{f}_{Y \mid Z}(y_i \mid z_i)}{\hat{f}_X(x_i)\, \hat{f}_Y(y_i)}\right).
\]
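As a small illustration of this form (again with placeholder names, and assuming the conditional and marginal density estimates at observation i are already available), one summand could be computed as:

```python
import numpy as np

def mi_contribution(f_x_given_yz, f_y_given_z, f_x, f_y):
    """One summand C(x_i, y_i, z_i) of Equation (5.21), written with the
    conditional-density decomposition; the factor f_Z has already cancelled.
    All four arguments are scalar density estimates at observation i.
    """
    return np.log(f_x_given_yz * f_y_given_z / (f_x * f_y))
```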

This decomposition provides the basis for the quantile regression based approach. All constituents of the joint density can be estimated from the parameter estimates of the respective quantile regression such that

\[
\hat{f}_{X,Y,Z}(x, y, z) = \hat{\gamma}_1(\hat{\theta}_{X \mid Y,Z})\, \hat{\gamma}_1(\hat{\theta}_{Y \mid Z})\, \hat{\gamma}_1(\hat{\theta}_Z).
\]

Note that the summands in the sample mean estimate $\hat{I}(X, Y, Z)$ can again be conceived as functions of the parameter estimates from the various quantile regressions.

Also recall that each of the density estimates in C may converge to the true density value at a different rate. This is due to the result of Martins-Filho and Saraiva (2012) for the asymptotic convergence of local polynomial regression parameters estimated with stochastic bandwidths. The result for the local polynomial regression parameter associated with the first-order derivative is reproduced for convenience in Equation (5.12). Therefore, in order to derive an asymptotic distribution of a test statistic, I need to align the convergence of the density estimates. For the construction of a test statistic, I therefore divide each of the density estimates by the square root of the third power of its bandwidth. In effect, since MI is formulated in logarithms, this is equivalent to subtracting a constant term from the MI estimate. For the term normalized in this way, a function of normalized density estimates, I conjecture that $\sqrt{Q}$-normality is sustained. The simulation results in Section 5.2.3 underpin this conjecture.

For the construction of the test statistic, I treat the bandwidths as essentially fixed and independent of the regression estimates $\hat{\theta}$. The delta method for the adjusted contributions can thus be derived to be

\[
\lim_{N \to \infty} \sqrt{Q}\,\frac{\hat{C}(\hat{\theta}) - C - \mathcal{C}}{\hat{\theta} - \theta}
= \lim_{N \to \infty} \frac{1}{\hat{\theta} - \theta}\,\sqrt{Q}\,\log\!\left(\frac{h_{X \mid Y,Z}^{-1}\big(\hat{f}_{X \mid Y,Z} - f_{X \mid Y,Z}\big)\, h_{Y \mid Z}^{-1}\big(\hat{f}_{Y \mid Z} - f_{Y \mid Z}\big)}{h_X^{-1}\big(\hat{f}_X - f_X\big)\, h_Y^{-1}\big(\hat{f}_Y - f_Y\big)}\right)
= \left.\frac{\partial \hat{C}_{x_i, y_i, z_i}}{\partial \bar{\theta}_{lm}}\right|_{\bar{\theta}_{lm} = \hat{\theta}_{lm}},
\]

where the correcting term $\mathcal{C} = \frac{3}{2}\log\!\left(\frac{h_{X \mid Y,Z}\, h_{Y \mid Z}}{h_X h_Y}\right)$ is used.
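As a minimal numerical sketch of this normalization (bandwidth argument names are placeholders; the bandwidths themselves come from the estimation step in Section 5.1.1), the correcting term could be computed as:

```python
import numpy as np

def correcting_term(h_x_given_yz, h_y_given_z, h_x, h_y):
    """Correcting term (3/2) * log(h_{X|Y,Z} * h_{Y|Z} / (h_X * h_Y)).

    Subtracting it from a contribution is the same as dividing every density
    estimate by the 3/2 power of its bandwidth before taking the log ratio.
    """
    return 1.5 * np.log((h_x_given_yz * h_y_given_z) / (h_x * h_y))
```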

In order to calculate the variance of the MI estimate, not only the variance of each summand is needed, but also the covariances among all of the summands. Again, the asymptotic convergence results for the density estimates developed in Section 5.1.1 are of importance.

Knowing the limiting distribution of $\hat{\theta}$, one can work out the limiting distribution of each summand $\hat{C}(x, y, z)$ and the covariance between any two summands $C_i = C(x_i, y_i, z_i)$ and $C_j = C(x_j, y_j, z_j)$ using the delta method. All that is needed are the gradients of the summands with respect to the parameter estimates $\hat{\theta}$.

Collecting the derivatives of the ith contribution $C_i$ with respect to $\theta$ in a matrix $\Upsilon_i$, the elements of $\Upsilon_i$ can be written as

\[
\left.\frac{\partial \hat{C}_{x_i, y_i, z_i}}{\partial \bar{\theta}_{lm}}\right|_{\bar{\theta}_{lm} = \hat{\theta}_{lm}} =
\begin{cases}
\dfrac{\partial \hat{f}_{X \mid Y,Z}}{\hat{f}_{X \mid Y,Z}} & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{X \mid Y,Z}\\[1.5ex]
\dfrac{\partial \hat{f}_{Y \mid Z}}{\hat{f}_{Y \mid Z}} & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Y \mid Z}\\[1.5ex]
0 & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Z}\\[1.5ex]
-\dfrac{1}{\hat{f}_Y}\,\partial \hat{f}_Y & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Y}\\[1.5ex]
-\dfrac{1}{\hat{f}_X}\,\partial \hat{f}_X & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{X}
\end{cases}
\]

where the elements $\partial \hat{f}$ can be replaced with the respective elements of H and $\hat{f}$ with the density estimates.
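The case distinction translates directly into code. The sketch below stacks the blocks of $\mathrm{vec}(\Upsilon_i)$ in one possible ordering of the parameter vector; the derivative vectors (the relevant rows of H) and the density estimates are assumed to be given, and all names are placeholders.

```python
import numpy as np

def upsilon_vec(d_f_x_given_yz, d_f_y_given_z, d_f_x, d_f_y,
                f_x_given_yz, f_y_given_z, f_x, f_y, n_params_z):
    """Stack vec(Upsilon_i) for one observation following the case
    distinction above.

    The d_f_* arguments are arrays of density derivatives with respect to
    the parameters of the corresponding quantile regression; the f_*
    arguments are the scalar density estimates.  The parameters of the
    regression for f_Z do not enter the contribution, so their block is
    zero.  The block order must match the ordering used in Avar(theta_hat).
    """
    return np.concatenate([
        np.asarray(d_f_x_given_yz) / f_x_given_yz,   # theta_{X|Y,Z} block
        np.asarray(d_f_y_given_z) / f_y_given_z,     # theta_{Y|Z} block
        np.zeros(n_params_z),                        # theta_Z block
        -np.asarray(d_f_y) / f_y,                    # theta_Y block
        -np.asarray(d_f_x) / f_x,                    # theta_X block
    ])
```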

Note that when the representativity of the sample is not used and MI is estimated by integrating over the estimated joint density function, the derivatives of the contributions Ci need to be extended by additional terms.

Knowing $\Upsilon_i$ and its variance, the limiting distribution of the contribution can be approximated using the delta method (cp. Oehlert 1992, Hayashi 2000), which leads to

\[
\hat{C}_i(\hat{\theta}) + \mathcal{C} \sim \mathcal{N}\!\left(C_i,\; \frac{1}{QN}\,\mathrm{vec}(\Upsilon_i)^{\top} \mathrm{Avar}(\hat{\theta})\, \mathrm{vec}(\Upsilon_i)\right). \qquad (5.22)
\]

In order to calculate MI, the conventions $0 \log\frac{0}{0} = 0$, $0 \log\frac{0}{f_Y} = 0$ as well as $\log\frac{f_X}{0} = \infty$ need to be introduced (cp. Cover and Thomas 2005).³⁶ Therewith, the covariance between $\hat{C}_i(\hat{\theta})$ and $\hat{C}_j(\hat{\theta})$ may be approximated by (cp. Klein 1953)

\[
\mathrm{cov}\!\left(\hat{C}_i(\hat{\theta}), \hat{C}_j(\hat{\theta})\right) = \frac{1}{QN}\,\mathrm{vec}(\Upsilon_i)^{\top} \mathrm{Avar}(\hat{\theta})\, \mathrm{vec}(\Upsilon_j). \qquad (5.23)
\]

Based on the estimation of MI by the sample mean equivalent of Equation (5.21), the variance of the MI estimate can be computed as

\[
\begin{aligned}
\mathrm{var}(\hat{I}_{X,Y,Z}) &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \mathrm{cov}\!\left(\hat{C}_i(\hat{\theta}), \hat{C}_j(\hat{\theta})\right)\\
&= \frac{1}{QN}\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{1}{N}\,\mathrm{vec}(\Upsilon_i)^{\top}\, \mathrm{Avar}(\hat{\theta})\, \frac{1}{N}\,\mathrm{vec}(\Upsilon_j)\\
&= \frac{1}{QN}\left[\frac{1}{N}\sum_{i=1}^{N}\mathrm{vec}(\Upsilon_i)\right]^{\top} \mathrm{Avar}(\hat{\theta}) \left[\frac{1}{N}\sum_{j=1}^{N}\mathrm{vec}(\Upsilon_j)\right]. \qquad (5.24)
\end{aligned}
\]

36 Note that during implementation, one can choose to numerically represent infinity by a sufficiently large value. However, I choose to exclude such points in the calculation, since dragging these values through all calculations results in numerical instabilities.
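Putting the pieces together, the variance in Equation (5.24) reduces to a single quadratic form in the averaged gradient vector. A sketch, assuming the stacked gradients and an estimate of $\mathrm{Avar}(\hat{\theta})$ are available (names are placeholders, and observations affected by the conventions above are assumed to have been dropped beforehand, as in the implementation described in the footnote):

```python
import numpy as np

def mi_variance(upsilon_vecs, avar_theta, Q):
    """Variance of the MI estimate following Equation (5.24).

    upsilon_vecs : (N, p) array whose i-th row is vec(Upsilon_i);
    avar_theta   : (p, p) estimate of Avar(theta_hat);
    Q            : scaling constant from the sqrt(Q) convergence rate.
    """
    N = upsilon_vecs.shape[0]
    mean_vec = upsilon_vecs.mean(axis=0)      # (1/N) * sum_i vec(Upsilon_i)
    return mean_vec @ avar_theta @ mean_vec / (Q * N)
```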

This approach also leads me to conjecture that the limiting distribution of $\widehat{\mathrm{var}}(\hat{I}_{X,Y,Z})$ is a $\chi^2$ distribution. For the application, however, I am only interested in the distribution of $\hat{I}_{X,Y,Z}$.

Transfer Entropy

Similar to Equations (5.19) and (5.20), transfer entropy, as a special case of conditional MI in a time series context with time-ordered random variables, can be constructed as

\[
\begin{aligned}
T_{X \to Y} &= I(Y_t, X_{t-1} \mid Y_{t-1})\\
&= E[\Theta_t] = E\!\left[\log\!\left(\frac{f_{Y_t \mid X_{t-1}, Y_{t-1}}(y_t \mid x_{t-1}, y_{t-1})}{f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1})}\right)\right]\\
&= \iiint_{\mathbb{R}^3} f_{Y_t, X_{t-1}, Y_{t-1}}(y_t, x_{t-1}, y_{t-1})\, \log\!\left(\frac{f_{Y_t \mid X_{t-1}, Y_{t-1}}(y_t \mid x_{t-1}, y_{t-1})}{f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1})}\right) dy_t\, dx_{t-1}\, dy_{t-1}.
\end{aligned}
\]

Note that only two quantile regressions are necessary to calculate the measure. Given stationary and ergodic time series for $y_t$, $x_t$ and $z_t$, one can approximate TE via a sample mean

\[
\hat{T}_{X \to Y} = E\!\left[\log\!\left(\frac{f_{Y_t \mid X_{t-1}, Y_{t-1}}(y_t \mid x_{t-1}, y_{t-1})}{f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1})}\right)\right] \approx \frac{1}{T}\sum_{t=1}^{T} \log\!\left(\frac{\hat{f}_{Y_t \mid X_{t-1}, Y_{t-1}}(y_t \mid x_{t-1}, y_{t-1})}{\hat{f}_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1})}\right). \qquad (5.25)
\]
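For TE, the sample-mean estimator requires only the two conditional density estimates, which keeps the computation light. A minimal sketch, assuming the estimates at the observed points are given as arrays (placeholder names):

```python
import numpy as np

def te_sample_mean(f_y_given_xy_lag, f_y_given_y_lag):
    """Plug-in sample-mean estimate of T_{X->Y}, cf. Equation (5.25).

    f_y_given_xy_lag : length-T array of estimates of f(y_t | x_{t-1}, y_{t-1});
    f_y_given_y_lag  : length-T array of estimates of f(y_t | y_{t-1}).
    """
    theta_t = np.log(f_y_given_xy_lag / f_y_given_y_lag)  # summands Theta_t
    return theta_t.mean()
```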

The derivative of each summand is then given by

\[
\left.\frac{\partial \hat{\Theta}_t}{\partial \bar{\theta}_{lm}}\right|_{\bar{\theta}_{lm} = \hat{\theta}_{lm}} =
\begin{cases}
\dfrac{\partial \hat{f}_{Y_t \mid X_{t-1}, Y_{t-1}}}{\hat{f}_{Y_t \mid X_{t-1}, Y_{t-1}}} & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Y_t \mid X_{t-1}, Y_{t-1}}\\[1.5ex]
-\dfrac{\partial \hat{f}_{Y_t \mid Y_{t-1}}}{\hat{f}_{Y_t \mid Y_{t-1}}} & \text{if } \hat{\theta}_{lm} \in \hat{\theta}_{Y_t \mid Y_{t-1}}
\end{cases}
\]

The variance of $\hat{T}_{X \to Y}$ may be estimated analogously to MI.
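Analogously to the MI case, the gradient of a TE summand has only two non-zero blocks, one per quantile regression. A sketch under the same assumptions as before (placeholder names; derivative vectors and density estimates taken as given):

```python
import numpy as np

def te_summand_gradient(d_f_num, d_f_den, f_num, f_den):
    """Gradient of one summand Theta_t with respect to the stacked
    parameters of the two quantile regressions.

    d_f_num, d_f_den : derivatives of f(y_t | x_{t-1}, y_{t-1}) and
                       f(y_t | y_{t-1}) with respect to their own
                       regression parameters;
    f_num, f_den     : the corresponding scalar density estimates.
    """
    return np.concatenate([np.asarray(d_f_num) / f_num,
                           -np.asarray(d_f_den) / f_den])
```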