…_{j≥1} and (f_{t,2}(j))_{j≥1} = (b_j(θ))_{j≥1}, while for (b)

σ_t(θ) − σ̄_t(θ) = Φ_t

with

(f_{t,1}(j))_{j≥1} = (b_{j+t}(θ))_{j≥1} and (f_{t,2}(j))_{j≥1} = (b_j(θ))_{j≥1}.

Though Giraitis et al. (2003c) consider a more specific situation, the proofs of their lemmas B.1 and B.3 still hold, leading to

E|Φ_t|³ ≤ Σ_{k_1,k_2,k_3=1}^∞ E[ |Φ_t^{(k_1)} Φ_t^{(k_2)} Φ_t^{(k_3)}| ],

and

E[ |Φ_t^{(k_1)} Φ_t^{(k_2)} Φ_t^{(k_3)}| ] ≤ D_{t,1}³ D_{t,2}^{k_1+k_2+k_3−3},

where

D_{t,i} = |µ_3|^{1/3} ‖f_{t,i}‖_3 + 3ζ ‖f_{t,i}‖_2

and ζ is defined as in assumption (M3). Hence,

E|Φ_t|³ ≤ D_{t,1}³ / (1 − D_{t,2})³.
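The last bound is obtained by summing the term-wise estimate over k_1, k_2, k_3 and using the geometric series, which requires D_{t,2} < 1:

Σ_{k_1,k_2,k_3=1}^∞ D_{t,1}³ D_{t,2}^{k_1+k_2+k_3−3} = D_{t,1}³ ( Σ_{k=1}^∞ D_{t,2}^{k−1} )³ = D_{t,1}³ / (1 − D_{t,2})³.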

Since Θ is compact, we get in (a) D_{t,1} < C_1 and D_{t,2} < 1 − C_2, where the constants C_1 < ∞ and 0 < C_2 < 1 are independent of θ. Furthermore, in (b), D_{t,1} → 0 as t → ∞ uniformly for all θ ∈ Θ. Note that ‖f_{t,1}‖_2 may be greater than 1 and that we only used ‖f_{t,2}‖_2 < 1. The proof is thus finished by proposition 6.1(d) together with the same arguments as under assumption (Mp).

6.2.2 Estimation with exact conditional variances

In a first step, we will assume that σ_t(θ) can be calculated exactly, i.e. as if we knew the infinite past (X_s)_{s<t}. To avoid the problem of an unbounded σ_t^{−1}(θ), we modify the objective function as follows.

Definition 6.1 Let h > 0. Given X_s, s ≤ t, the modified conditional maximum likelihood estimator of the parameter vector θ is defined by

θ_n^{(h)} := argmin_{θ∈Θ} L_{n,h}(θ),

where the objective function is given by

L_{n,h}(θ) := (1/n) Σ_{t=1}^n l_{t,h}(θ) := (1/n) Σ_{t=1}^n [ (X_t² + h)/(σ_t²(θ) + h) + ln(σ_t²(θ) + h) ],   (6.10)

and σ_t(θ) is given by (6.3).
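To make (6.10) concrete, the following sketch evaluates the objective for a given parameter vector. It is only an illustration, not the implementation used in this chapter: the function names are ours, the conditional scale is computed from a truncated sum over the observed past (in the spirit of σ̄_t(θ) in (6.21)) rather than the exact σ_t(θ) of (6.3), and the coefficients are assumed to follow the parametrization b_j(θ) = c j^{d−1} that appears in the proofs below.

import numpy as np

def sigma_bar(x, a, c, d):
    # Truncated conditional scale: sigma_bar_t = a + sum_{j=1}^{t-1} b_j * X_{t-j},
    # with the assumed coefficients b_j = c * j**(d - 1).
    n = len(x)
    b = c * np.arange(1, n + 1) ** (d - 1.0)
    sig = np.full(n, a, dtype=float)
    for t in range(1, n):
        sig[t] = a + b[:t] @ x[t - 1::-1]   # uses only the observed past X_1, ..., X_{t-1}
    return sig

def L_nh(theta, x, h):
    # Modified objective (6.10), with sigma_t replaced by its truncated version.
    a, c, d = theta
    s2h = sigma_bar(x, a, c, d) ** 2 + h
    return np.mean((x ** 2 + h) / s2h + np.log(s2h))

For h = 0 the ratio (X_t² + h)/(σ̄_t²(θ) + h) blows up whenever σ̄_t(θ) is close to zero, which is exactly the behavior visible in figure 6.1.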

Obviously, E[X_t²] < ∞ and E[σ_t²(θ)] < ∞ imply E[l_{t,h}(θ)] < ∞, guaranteeing integrability of L_{n,h}(θ) and solving the second problem mentioned in the introduction. The effect of the additional parameter h is illustrated in figure 6.1, where the computable version of the objective function L̄_{n,h}(θ), with σ_t(θ) replaced by σ̄_t(θ) (see (6.21) below), is plotted as a function of d for different values of h for a simulated LARCH process. One can see that for h = 0 there are several peaks due to vanishing values in the denominator, making L̄_{n,0}(θ) numerically intractable.

For increasing h these peaks seem to disappear, for which reason we call h the smoothing parameter. On the other hand, we have to ensure that θ_n^{(h)} does not become biased for positive h. This is done by the additional h in the numerator and in the logarithm in (6.10). The reason that L_{n,h}(θ) is still asymptotically minimized at the true parameter value θ_o can be explained intuitively as follows.

Denote by

L_h(θ) := E[l_{t,h}(θ)]

the expected value of the individual terms in L_{n,h}(θ) and consider the process

Y_t = X_t + ζ_t,  t ∈ Z,

where ζ_t is a Gaussian i.i.d. sequence that is independent of X_t with E[ζ_t] = 0 and var(ζ_t) = h > 0. Then var(Y_t | X_s, s ≤ t − 1) = σ_t² + h, and the conditional log-likelihood function of Y_1, . . . , Y_n is given (up to a constant) by

L_{n,Y}(θ) = (1/n) Σ_{t=1}^n [ (X_t + ζ_t)²/(σ_t²(θ) + h) + ln(σ_t²(θ) + h) ].

Independence of X_t and ζ_t leads to

E[L_{n,Y}(θ)] = E[ E[X_t² + 2X_tζ_t + ζ_t² | X_s, s ≤ t] / (σ_t²(θ) + h) + ln(σ_t²(θ) + h) ]
             = E[ (X_t² + h)/(σ_t²(θ) + h) + ln(σ_t²(θ) + h) ]
             = L_h(θ).



Figure 6.1: For h = 0.01, 0.001, 0.0001 and 0, the function L̄_{n,h}, see (6.21), is plotted as a function of d with fixed a = 1 and c = 0.1. In each plot the same simulated path of X_t is used, where the true parameter value is θ_o = (1, 0.4, 0.1)^T and n = 2000. The vertical lines indicate the true value of d.
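The qualitative behavior shown in figure 6.1 can be reproduced with a short simulation. The sketch below is not the code behind the figure; it assumes the coefficient parametrization b_j = c j^{d−1}, truncates the infinite sums at the available past, and simply profiles the objective in d with a and c held fixed, as in the figure.

import numpy as np

rng = np.random.default_rng(1)

def simulate_larch(n, a, c, d, burn=2000):
    # X_t = sigma_t * eps_t with sigma_t = a + sum_{j>=1} b_j X_{t-j}, b_j = c * j**(d-1);
    # the infinite sum is truncated at the simulated past and a burn-in period is discarded.
    total = n + burn
    b = c * np.arange(1, total + 1) ** (d - 1.0)
    x = np.zeros(total)
    eps = rng.standard_normal(total)
    for t in range(total):
        sigma = a + (b[:t] @ x[t - 1::-1] if t > 0 else 0.0)
        x[t] = sigma * eps[t]
    return x[burn:]

def L_bar(x, a, c, d, h):
    # Computable objective from (6.10) with the truncated conditional scale.
    n = len(x)
    b = c * np.arange(1, n + 1) ** (d - 1.0)
    sig = np.array([a + (b[:t] @ x[t - 1::-1] if t > 0 else 0.0) for t in range(n)])
    s2h = sig ** 2 + h
    return np.mean((x ** 2 + h) / s2h + np.log(s2h))

x = simulate_larch(n=2000, a=1.0, c=0.1, d=0.4)
d_grid = np.linspace(0.05, 0.49, 23)
for h in (1e-2, 1e-3, 1e-4):
    profile = [L_bar(x, 1.0, 0.1, d, h) for d in d_grid]
    print(h, d_grid[int(np.argmin(profile))])   # minimizer expected to lie near d_o = 0.4

In practice one would of course minimize over all three parameters simultaneously, e.g. with a box-constrained numerical optimizer, rather than over a grid in d alone.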

Hence, the asymptotic objective function L_h(θ) corresponding to θ_n^{(h)} coincides, up to a constant, with the conditional log-likelihood function of the sequence Y_t and therefore should possess a minimum at the true θ_o. In the next lemma, this is proved rigorously.

Lemma 6.3 Let (A5.1), (B) and (S) hold. We then have:

(a) For all t ∈ Z: P(σ_t = 0) = 0.

(b) If σ_t²(θ) = σ_t²(θ_o) a.s., then θ = θ_o.

(c) For every θ ∈ Θ\{θ_o}, L_h(θ) > L_h(θ_o).

Proof: (a) Define for t ∈ Z the sets

N_t := {ω ∈ Ω | σ_t = 0}

and denote by N_t^c the complement of N_t. On N_t ∩ N_{t−1}^c we then have

0 = σ_t = a + b_1 ε_{t−1} σ_{t−1} + Σ_{j=2}^∞ b_j X_{t−j}

and thus

ε_{t−1} = −(1/(b_1 σ_{t−1})) { a + Σ_{j=2}^∞ b_j X_{t−j} }.

The right-hand side is measurable with respect to the sigma-algebra F_{t−2} and hence independent of the left-hand side. By (A5.1), ε_{t−1} has a continuous distribution, and thus this is only possible if

P(N_t ∩ N_{t−1}^c) = 0.

To justify the latter argument, consider two independent random variables F and G, where F has a continuous distribution. Then the distribution of the difference F − G is continuous as well (see e.g. Gut 2005, p. 170) and thus the set where F and G coincide has probability zero.

On the sets

N_{t,k} := N_t ∩ ( ⋂_{i=1}^{k−1} N_{t−i} ) ∩ N_{t−k}^c,   k ≥ 2,

repeat the same arguments as above to get a representation of ε_{t−k} in terms of F_{t−k−1}-measurable random variables and conclude that P(N_{t,k}) = 0 for all k ≥ 2, and thus

P(N_t) = P( ⋃_{k=1}^∞ N_{t,k} ) = 0.

Note that the set {ω ∈ Ω | ∃ t_0 < t : σ_s = 0 for all s ≤ t_0} has probability zero, since otherwise equation (5.2) would not be fulfilled.

(b) Given θ = (d, c, a)^T and θ_o = (d_o, c_o, a_o)^T with σ_t²(θ) = σ_t²(θ_o) a.s., we show θ = θ_o. Recall that σ_t(θ) may be negative and σ_t²(θ) = σ_t²(θ_o) does not immediately imply σ_t(θ) = σ_t(θ_o). Define

A_t := {ω ∈ Ω | σ_t(θ) = σ_t(θ_o)}

and

Ā_t := A_t^C ∩ {ω ∈ Ω | σ_t²(θ) = σ_t²(θ_o)},

where A_t^C denotes the complement of A_t. Note that

Ā_t = {ω ∈ Ω | σ_t(θ) = −σ_t(θ_o)}.


As in part (a), on Ā_t ∩ N_{t−1}^c we can solve for ε_{t−1} in terms of X_{t−2}, X_{t−3}, . . .; as above, the right-hand side is measurable with respect to the sigma-algebra F_{t−2} and hence independent of the left-hand side. Continuity of ε_{t−1} again implies

P(Ā_t ∩ N_{t−1}^c) = 0,

and hence, using part (a), σ_t(θ) = σ_t(θ_o) a.s. Taking expectation yields a = a_o, since E[X_t] = 0. Finally, the variance in (6.11) is given by

0 = E[σ_t²] Σ_{j=1}^∞ (c_o j^{d_o−1} − c j^{d−1})²,

implying c_o j^{d_o−1} = c j^{d−1} for all j ≥ 1 and thus c = c_o, d = d_o.
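The step to the coefficient equation uses that the X_{t−j} are uncorrelated with mean zero and common variance E[X_t²] = E[σ_t²]; writing b_j(θ) = c j^{d−1}, the variance of the difference of the two representations of σ_t is

0 = Var( Σ_{j≥1} (b_j(θ_o) − b_j(θ)) X_{t−j} ) = E[σ_t²] Σ_{j≥1} (c_o j^{d_o−1} − c j^{d−1})²,

so every summand must vanish; taking j = 1 gives c = c_o, and comparing any j ≥ 2 then gives d = d_o (provided c ≠ 0).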

(c) From E(ε_t²) = 1 and by taking the conditional expectation E[X_t² | X_s, s ≤ t − 1] = σ_t²(θ_o), we get

L_h(θ) − L_h(θ_o) = E[ (σ_t²(θ_o) + h)/(σ_t²(θ) + h) − 1 − ln( (σ_t²(θ_o) + h)/(σ_t²(θ) + h) ) ] ≥ 0,

since x − 1 − ln x ≥ 0 for all x > 0, with equality if and only if x = 1. Equality L_h(θ) = L_h(θ_o) therefore forces σ_t²(θ) = σ_t²(θ_o) a.s., and the proof of (c) is completed by applying part (b).

We will now study the asymptotic properties of θ_n^{(h)} by the standard procedure described in (4.2). For consistency, we need the next lemma, which concerns uniform convergence of L_{n,h}(θ) and the corresponding derivatives as n tends to infinity. Therefore denote the gradient of L_{n,h}(θ) by

L′_{n,h}(θ) := (1/n) Σ_{t=1}^n (∂/∂θ) l_{t,h}(θ).

The corresponding limits will be denoted by

L′_h(θ) := E[ (∂/∂θ) l_{t,h}(θ) ]  and  L″_h(θ) := E[ (∂²/∂θ∂θ^T) l_{t,h}(θ) ].

Lemma 6.4 Let h > 0 and assumptions (A5.1), (B) and (S) hold.

(a) If further (M3) or (M3) holds, then

In each particular case L_h(θ), L′_h(θ) resp. L″_h(θ) are continuous in θ.

Proof: There are several ways of proving almost sure uniform convergence, see e.g. Andrews (1992). We will proceed as follows: recall that uniform convergence of a sequence of real-valued deterministic continuous functions is equivalent to pointwise convergence together with equicontinuity of the given sequence, see e.g. Lang (1969). Thus, to prove a.s. uniform convergence of a stochastic function of the form

c_n(θ) = (1/n) Σ_{t=1}^n d_t(θ),

where d_t(θ) is a differentiable stationary and ergodic sequence, it has to be shown that both properties hold for the sum c_n(θ) with probability one. Therefore we apply in a first step the ergodic theorem to d_t(θ) for fixed θ to derive almost sure pointwise convergence. In a second step we will prove a.s. equicontinuity, which is obviously implied by (6.18); see also Straumann (2004). Note that by the mean value theorem the left-hand side of (6.18) is dominated by |θ − θ̃| sup_{θ∈Θ} |(∂/∂θ) c_n(θ)|, which by the ergodic theorem has a limit that is finite with probability one provided E[ sup_{θ∈Θ} |(∂/∂θ) d_t(θ)| ] < ∞. If c_n(θ) is vector-valued, one proceeds analogously, replacing | · | by ‖ · ‖. Here recall that the mean value theorem does not generalize to the vector-valued case; however, the inequality

‖c_n(θ) − c_n(θ̃)‖ ≤ ‖θ − θ̃‖ sup_{θ*∈Θ} ‖(∂/∂θ) c_n(θ*)‖

still holds, see Edwards (1995).

We start with the proof of (6.15). There is a finite constant K with

sup_{θ∈Θ} E|l_{t,h}(θ)| ≤ K(E[X_t²] + h) + K sup_{θ∈Θ} E[σ_t²(θ)] < ∞.

Thus for each single θ ∈ Θ, L_{n,h}(θ) → L_h(θ) a.s. by ergodicity of the sequences X_t² and σ_t(θ). Ergodicity of sup_{θ∈Θ} |(∂/∂θ) l_{t,h}(θ)| follows as in proposition 5.1. Thus it suffices to show that E[ sup_{θ∈Θ} |(∂/∂θ) l_{t,h}(θ)| ] is finite; this, together with Hölder's inequality and lemma 6.2, yields the required bound.

In (6.16) and (6.17), pointwise convergence follows again from ergodicity and the particular moment assumption. Uniform convergence is also proved as above.

Note that the matrix norm of (∂²/∂θ∂θ^T) l_{t,h}(θ) is dominated by a linear combination of expressions whose corresponding expected values are finite, and (6.16) follows. Analogously, under (M5), (6.17) follows by considering a similar linear combination involving also terms indexed by i, j, k ∈ {1, 2, 3}, for which again lemma 6.2 can be applied.

Thus, by lemmas 6.3 and 6.4(a), the conditions for consistency of θ_n^{(h)} according to (4.22) are fulfilled, implying the following result.

Theorem 6.2 Let h > 0 and assume that (A5.1), (B), (S) and further (M3) or (M3) hold. Then θ_n^{(h)} is a strongly consistent estimator for θ_o, i.e. almost surely

θ_n^{(h)} → θ_o  as n → ∞.

Next, the asymptotic distribution of θ_n^{(h)} is derived. For the specification of the asymptotic covariance matrix of θ_n^{(h)}, define the matrices

G_h := E[ ((∂/∂θ) l_{t,h}(θ_o)) ((∂/∂θ) l_{t,h}(θ_o))^T ]  and  H_h := E[ (∂²/∂θ∂θ^T) l_{t,h}(θ_o) ] = L″_h(θ_o).

We will see in the proof of the next theorem that the asymptotic distribution of θ_n^{(h)} is essentially determined by the limiting behavior of L′_{n,h}(θ) at the true parameter θ_o. Note that

E[ (∂/∂θ) l_{t,h}(θ_o) | F_{t−1} ] = 0,

and thus a limit theorem for martingale differences can be applied.
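This conditional-mean property can be checked directly; the following routine computation (in our notation, with σ̇_t := ∂σ_t(θ)/∂θ, and not taken verbatim from the text) shows it:

(∂/∂θ) l_{t,h}(θ) = ( 1/(σ_t²(θ) + h) − (X_t² + h)/(σ_t²(θ) + h)² ) · 2 σ_t(θ) σ̇_t(θ)
                  = 2 σ_t(θ) σ̇_t(θ) (σ_t²(θ) − X_t²) / (σ_t²(θ) + h)².

At θ = θ_o we have X_t² = σ_t² ε_t², and σ_t, σ̇_t are F_{t−1}-measurable, so

E[ (∂/∂θ) l_{t,h}(θ_o) | F_{t−1} ] = 2 σ_t³ σ̇_t (σ_t² + h)^{−2} E[ 1 − ε_t² | F_{t−1} ] = 0,

since E[ε_t²] = 1 and ε_t is independent of F_{t−1}.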

Theorem 6.3 Let h > 0 and θ_o be an interior point of Θ. Then under assumptions (A5.1), (B), (S) and (M5),

n^{1/2} (θ_n^{(h)} − θ_o) →_d N(0, H_h^{−1} G_h H_h^{−1})

as n → ∞, i.e. the estimator θ_n^{(h)} is asymptotically normally distributed with an n^{1/2}-rate of convergence.

Proof: For each component of the vector L′_{n,h}(θ) we apply the mean value theorem. This leads to

0 = L′_{n,h}(θ_n^{(h)}) = L′_{n,h}(θ_o) + L̃″_{n,h} · (θ_n^{(h)} − θ_o)

with a three-dimensional matrix L̃″_{n,h} ∈ R^{3×3} that coincides with

L″_{n,h}(θ) = (1/n) Σ_{t=1}^n (∂²/∂θ∂θ^T) l_{t,h}(θ),

except that each row is evaluated at an intermediate point between θ_n^{(h)} and θ_o. Moreover, (∂/∂θ) l_{t,h}(θ) evaluated at θ = θ_o is a vector of stationary, ergodic martingale differences with finite variance. Hence, from theorem 23.1 in Billingsley (1968) and the Cramér-Wold device,

n^{1/2} L′_{n,h}(θ_o) →_d N(0, G_h)

as n → ∞. From lemma 6.4(c) and consistency of θ_n^{(h)}, we get

L̃″_{n,h} → H_h

almost surely as n → ∞. By lemma 6.5 below, H_h is invertible. This together with Slutsky's theorem concludes the proof.

Thus we have shown that the estimator θ_n^{(h)} exhibits the same asymptotic properties as the estimators in the short-memory GARCH and ARCH(∞) models, respectively; see section 4.2. In particular, in the case d_o > 0, the rate of convergence is not affected by the presence of long memory in the squared values.
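For practical inference the unknown matrices G_h and H_h have to be replaced by estimates. A possible plug-in approach is sketched below; it is not part of the thesis, reuses the truncated conditional scale with the assumed coefficients b_j = c j^{d−1}, and approximates the derivatives of l_{t,h} numerically.

import numpy as np

def l_th(theta, x, h):
    # Per-observation terms l_{t,h}(theta), using the truncated conditional scale.
    a, c, d = theta
    n = len(x)
    b = c * np.arange(1, n + 1) ** (d - 1.0)
    sig = np.array([a + (b[:t] @ x[t - 1::-1] if t > 0 else 0.0) for t in range(n)])
    s2h = sig ** 2 + h
    return (x ** 2 + h) / s2h + np.log(s2h)

def sandwich_cov(theta_hat, x, h, eps=1e-4):
    # Plug-in estimate of the asymptotic covariance H_h^{-1} G_h H_h^{-1} / n,
    # based on central finite differences of l_{t,h} at the fitted parameter.
    theta_hat = np.asarray(theta_hat, dtype=float)
    p, n = len(theta_hat), len(x)
    scores = np.zeros((n, p))
    for i in range(p):
        tp, tm = theta_hat.copy(), theta_hat.copy()
        tp[i] += eps; tm[i] -= eps
        scores[:, i] = (l_th(tp, x, h) - l_th(tm, x, h)) / (2 * eps)
    G = scores.T @ scores / n                        # estimate of G_h
    def grad(theta):                                 # gradient of L_{n,h}
        g = np.zeros(p)
        for i in range(p):
            tp, tm = theta.copy(), theta.copy()
            tp[i] += eps; tm[i] -= eps
            g[i] = (l_th(tp, x, h).mean() - l_th(tm, x, h).mean()) / (2 * eps)
        return g
    H = np.zeros((p, p))
    for i in range(p):
        tp, tm = theta_hat.copy(), theta_hat.copy()
        tp[i] += eps; tm[i] -= eps
        H[:, i] = (grad(tp) - grad(tm)) / (2 * eps)  # estimate of H_h, column by column
    Hinv = np.linalg.inv(0.5 * (H + H.T))            # symmetrize before inverting
    return Hinv @ G @ Hinv / n                       # approximate covariance of theta_n^(h)

Square roots of the diagonal entries then give approximate standard errors for the components of θ_n^{(h)}.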

The remaining step in this section is devoted to the verification of the invertibility of the matrix H_h, which was used in the proof of theorem 6.3.

Lemma 6.5 Let h > 0 and (A5.1), (B), (S) and (M5) hold. Then the matrices G_h and H_h are positive definite.

Proof: Given λ ∈ R³ with λ ≠ 0, we have to show that λ^T G_h λ > 0 and λ^T H_h λ > 0. Note that all terms in both expected values are non-negative and recall that P(σ_t² = 0) = P(σ_t⁴ = 0) = 0 by lemma 6.3. Thus it remains to show that if there is a vector λ = (λ_1, λ_2, λ_3)^T ∈ R³ such that

(λ^T σ̇_t)² = 0   (6.20)

almost surely, then we must have λ = 0. From (6.20) together with lemma 6.1, we get

(1/σ_{t−1}) { λ_1 + Σ_{j=2}^∞ (λ_2 j^{d_o−1} + λ_3 c_o log(j) j^{d_o−1}) X_{t−j} } = −λ_2 ε_{t−1}.
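For orientation (spelled out here under the assumed parametrization b_j(θ) = c j^{d−1}), the components of σ̇_t are

∂σ_t/∂a = 1,  ∂σ_t/∂c = Σ_{j≥1} j^{d_o−1} X_{t−j},  ∂σ_t/∂d = Σ_{j≥1} c_o log(j) j^{d_o−1} X_{t−j};

since log(1) = 0 the ∂/∂d component has no j = 1 term, while the j = 1 term of the ∂/∂c component equals X_{t−1} = σ_{t−1} ε_{t−1} and produces the right-hand side above after dividing by σ_{t−1}.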

The left-hand side is measurable with respect to F_{t−2} and thus independent of the right-hand side, which implies λ_2 = 0. Hence,

λ_1 + Σ_{j=2}^∞ λ_3 c_o log(j) j^{d_o−1} X_{t−j} = 0.

Taking expectation yields λ_1 = 0, whereas considering the variance leads to λ_3 = 0.