

$$\lambda_1 + \sum_{j=2}^{\infty}\bigl(\lambda_2\,j^{d_o-1} + \lambda_3\,c_o\log(j)\,j^{d_o-1}\bigr)X_{t-j} = -\lambda_2\,\varepsilon_{t-1}.$$

The left-hand side is measurable with respect to $\mathcal F_{t-2}$ and thus independent of the right-hand side, which implies $\lambda_2 = 0$. Hence,

$$\lambda_1 + \sum_{j=2}^{\infty}\lambda_3\,c_o\log(j)\,j^{d_o-1}X_{t-j} = 0.$$

Taking expectation yields $\lambda_1 = 0$, whereas considering the variance leads to $\lambda_3 = 0$.

6.2.3 Estimation given the finite past

Given a finite sample $X_1,\dots,X_n$, we cannot calculate the infinite series $\sigma_t(\theta)$ and thus the objective function of the estimator $\theta_n^{(h)}$ is infeasible. Therefore we replace $\sigma_t(\theta)$ by $\bar\sigma_t(\theta)$, see (6.4), and define the computable version of the modified conditional maximum likelihood estimator $\theta_n^{(h)}$:

Definition 6.2 Let $h > 0$. For a sample $X_1,\dots,X_n$, the feasible estimator of the parameter vector $\theta$ is defined by
$$\bar\theta_n^{(h)} := \operatorname*{arg\,min}_{\theta\in\Theta}\,\bar L_{n,h}(\theta),$$
where the feasible objective function is given by
$$\bar L_{n,h}(\theta) := \frac{1}{n}\sum_{t=1}^{n}\left[\frac{X_t^2+h}{\bar\sigma_t^2(\theta)+h} + \ln\bigl(\bar\sigma_t^2(\theta)+h\bigr)\right] \qquad (6.21)$$
and $\bar\sigma_t(\theta)$ is given by (6.4).
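As a purely illustrative sketch of how the feasible objective can be computed in practice, the following Python fragment implements (6.21) under the assumption that the coefficients of (6.4) are parameterized hyperbolically as $b_j(\theta)=c\,j^{d-1}$ with $\theta=(b_0,c,d)$; this parameterization, as well as the helper names sigma_bar and L_bar, are assumptions made here for illustration and are not taken from the thesis.

\begin{verbatim}
import numpy as np

def sigma_bar(X, theta):
    # Truncated volatility approximation bar{sigma}_t(theta), t = 1, ..., n.
    # Assumed parameterization: theta = (b0, c, d), b_j(theta) = c * j**(d - 1);
    # the actual definition of bar{sigma}_t(theta) is (6.4), not reproduced here.
    X = np.asarray(X, dtype=float)
    b0, c, d = theta
    n = len(X)
    sbar = np.empty(n)
    for t in range(n):                  # position t corresponds to time point t + 1
        j = np.arange(1, t + 1)         # only the available past observations enter
        sbar[t] = b0 + np.sum(c * j ** (d - 1.0) * X[t - j])
    return sbar

def L_bar(theta, X, h):
    # Feasible objective function bar{L}_{n,h}(theta) of (6.21).
    X = np.asarray(X, dtype=float)
    sbar2 = sigma_bar(X, theta) ** 2
    return np.mean((X ** 2 + h) / (sbar2 + h) + np.log(sbar2 + h))
\end{verbatim}

The feasible estimator $\bar\theta_n^{(h)}$ would then be obtained by minimizing L_bar numerically over $\Theta$, for instance with scipy.optimize.minimize subject to the constraints defining $\Theta$.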

The statistical properties of $\bar\theta_n^{(h)}$ certainly depend on the asymptotic behavior of the difference of the objective functions $|L_{n,h}(\theta)-\bar L_{n,h}(\theta)|$ as $n$ tends to infinity. We will show that this error converges to zero uniformly in $\theta$, implying consistency of $\bar\theta_n^{(h)}$. However, it will turn out below that the convergence is too slow to deduce the asymptotic distribution of $\bar\theta_n^{(h)}$ from the asymptotic normality of the infeasible estimator $\theta_n^{(h)}$.

Denote the gradient of $\bar L_{n,h}(\theta)$ by $\bar L'_{n,h}(\theta)$ and the corresponding Hessian matrix by $\bar L''_{n,h}(\theta)$. These functions are defined analogously to $L'_{n,h}(\theta)$ and $L''_{n,h}(\theta)$ with $\sigma_t(\theta)$ replaced by $\bar\sigma_t(\theta)$, see (6.12)-(6.14).
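Since (6.12)-(6.14) are not reproduced in this section, the following display is given only for orientation; it is the gradient of a generic summand of (6.21), obtained by straightforward differentiation, and should agree with the cited formulas up to notation:
$$\partial_\theta\!\left[\frac{X_t^2+h}{\sigma_t^2(\theta)+h}+\ln\bigl(\sigma_t^2(\theta)+h\bigr)\right] = \frac{2\,\sigma_t(\theta)\,\dot\sigma_t(\theta)}{\sigma_t^2(\theta)+h}\left(1-\frac{X_t^2+h}{\sigma_t^2(\theta)+h}\right), \qquad \dot\sigma_t(\theta)=\partial_\theta\sigma_t(\theta),$$
and the feasible versions are obtained by replacing $\sigma_t(\theta)$, $\dot\sigma_t(\theta)$ with $\bar\sigma_t(\theta)$, $\dot{\bar\sigma}_t(\theta)$.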

Lemma 6.6 Let $h > 0$ and assumptions (A5.1), (B) and (S) hold.

(a) If further (M3) or (M3′) holds, then
$$E\Bigl[\sup_{\theta\in\Theta}\bigl|L_{n,h}(\theta)-\bar L_{n,h}(\theta)\bigr|\Bigr]\to 0 \quad\text{as } n\to\infty. \qquad (6.22)$$

(b) If further (M4′′) holds, then the analogous results hold for the gradients,
$$E\Bigl[\sup_{\theta\in\Theta}\bigl\|L'_{n,h}(\theta)-\bar L'_{n,h}(\theta)\bigr\|\Bigr]\to 0, \qquad (6.23)$$
and for the Hessian matrices,
$$E\Bigl[\sup_{\theta\in\Theta}\bigl\|L''_{n,h}(\theta)-\bar L''_{n,h}(\theta)\bigr\|\Bigr]\to 0, \qquad (6.24)$$
as $n\to\infty$.

Proof: From the mean value theorem applied to the functions $(x^2+h)^{-1}$ and $\ln(x^2+h)$ (note that the corresponding derivatives are bounded) we get
$$\sup_{\theta\in\Theta}\bigl|L_{n,h}(\theta)-\bar L_{n,h}(\theta)\bigr| \le \frac{K}{n}\sum_{t=1}^{n}\bigl(X_t^2+1\bigr)\,\sup_{\theta\in\Theta}\bigl|\sigma_t(\theta)-\bar\sigma_t(\theta)\bigr|,$$
where $K$ is a finite constant. Next, by lemma 6.2b, assumption (M3) respectively (M3′) implies
$$E\Bigl[\sup_{\theta\in\Theta}\bigl|\bar\sigma_t(\theta)-\sigma_t(\theta)\bigr|^3\Bigr]\to 0 \quad\text{as } t\to\infty.$$
Together with Hölder's inequality and Cesàro summability, this implies $E[\sup_{\theta\in\Theta}|L_{n,h}(\theta)-\bar L_{n,h}(\theta)|]\to 0$, and thus the proof of (a) is finished since $E[|X_t|^3]<\infty$.
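Spelled out, the Hölder step presumably used here (with exponents $p=3/2$ and $q=3$) reads, for a single summand,
$$E\Bigl[X_t^2\,\sup_{\theta\in\Theta}\bigl|\sigma_t(\theta)-\bar\sigma_t(\theta)\bigr|\Bigr] \le \bigl(E|X_t|^3\bigr)^{2/3}\Bigl(E\Bigl[\sup_{\theta\in\Theta}\bigl|\sigma_t(\theta)-\bar\sigma_t(\theta)\bigr|^3\Bigr]\Bigr)^{1/3},$$
so that each summand tends to zero as $t\to\infty$ and Cesàro summability transfers this to the average over $t=1,\dots,n$.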

For (6.23) consider the decomposition of $L'_{n,h}(\theta)-\bar L'_{n,h}(\theta)$ into the average over $t=1,\dots,n$ of the differences of the corresponding summands, see (6.12). Hence an application of the mean value theorem leads to an upper bound for $\sup_{\theta\in\Theta}\|L'_{n,h}(\theta)-\bar L'_{n,h}(\theta)\|$ in terms of $\sup_{\theta\in\Theta}|\sigma_t(\theta)-\bar\sigma_t(\theta)|$ and $\sup_{\theta\in\Theta}\|\dot\sigma_t(\theta)-\dot{\bar\sigma}_t(\theta)\|$. As above, assumption (M4′′) ensures that all expected values are finite and further, by using lemma 6.2b, one gets that the expected supremum of each summand difference tends to zero as $t\to\infty$, whereby the proof is finished by Cesàro summability. The remaining part (6.24) can be proven analogously.

The preceding result and lemma 6.4 can now be combined to derive consistency of the feasible estimator $\bar\theta_n^{(h)}$.

Theorem 6.4 Let $h > 0$ and assume that (A5.1), (B) and (S) hold. If further (M3) or (M3′) holds, then
$$\bar\theta_n^{(h)} \to \theta_o \quad\text{as } n\to\infty,$$
where convergence holds in probability.

Proof: By lemmas 6.4 and 6.6, we get uniform convergence in probability of $\bar L_{n,h}(\theta)$ to $L_h(\theta)$. Thus the conditions described in (4.22) are fulfilled.

Obtaining the asymptotic distribution of $\bar\theta_n^{(h)}$ is more complicated. The reason is the slow convergence of $|\sigma_t(\theta)-\bar\sigma_t(\theta)|$ to zero. To be more specific, note that
$$E\bigl[\sigma_t(\theta)-\bar\sigma_t(\theta)\bigr]^2 = E[\sigma_t^2]\sum_{j=t}^{\infty} b_j^2(\theta) \sim c_1\,t^{2d-1} \quad\text{as } t\to\infty,$$
with a constant $c_1<\infty$.
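The stated rate can be made plausible as follows: assuming the hyperbolic decay $b_j(\theta)\sim c\,j^{d-1}$ as $j\to\infty$ (as used elsewhere in this chapter; the constant $c$ and memory parameter $d$ are components of $\theta$), an integral comparison gives
$$\sum_{j=t}^{\infty} b_j^2(\theta) \sim c^2\sum_{j=t}^{\infty} j^{2(d-1)} \sim c^2\int_t^{\infty} x^{2d-2}\,dx = \frac{c^2}{1-2d}\,t^{2d-1},$$
which is finite for $d<\tfrac12$, so that one may take $c_1 = \tfrac{c^2}{1-2d}\,E[\sigma_t^2]$.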

As in the proof of theorem 6.3, Taylor series expansion yields
$$0 = \bar L'_{n,h}\bigl(\bar\theta_n^{(h)}\bigr) = \bar L'_{n,h}(\theta_o) + \widetilde{\bar L}''_n\cdot\bigl(\bar\theta_n^{(h)}-\theta_o\bigr).$$

Again $\widetilde{\bar L}''_n$ coincides with the Hessian matrix $\bar L''_{n,h}(\theta)$, where the latter is evaluated in each row at a value $\tilde\theta_n^{\,j}$ with $\|\tilde\theta_n^{\,j}-\theta_o\| \le \|\bar\theta_n^{(h)}-\theta_o\|$, $j=1,2,3$. By lemmas 6.4 and 6.6 together with consistency of $\bar\theta_n^{(h)}$, the matrix $\widetilde{\bar L}''_n$ converges to $H_h$ in probability as $n$ tends to infinity, and thus the asymptotic distribution of $\bar\theta_n^{(h)}$ is given, up to the factor $H_h$, by the asymptotic distribution of $\bar L'_{n,h}(\theta_o)$. The latter is the same as the one of $L'_{n,h}(\theta_o)$ provided that

$$D_n := \sqrt{n}\,\bigl(L'_{n,h}(\theta_o)-\bar L'_{n,h}(\theta_o)\bigr) \to 0 \qquad (6.26)$$
in probability as $n\to\infty$. However, it is not clear whether (6.26) holds in general, in particular if $d_o$ is positive and thus the convergence of $|\sigma_t(\theta)-\bar\sigma_t(\theta)|$ to zero is very slow. First, we study the asymptotic behavior of $D_n$ and thus the asymptotic distribution of $\bar\theta_n^{(h)}$ in the short memory case where $d_o<0$:

Proposition 6.2 Let $h>0$ and $\theta_o$ be an interior point of $\Theta$ with $d_o<0$. Then under assumptions (A5.1), (B), (S) and (M6),
$$n^{1/2}\bigl(\bar\theta_n^{(h)}-\theta_o\bigr) \xrightarrow{d} N\bigl(0,\,H_h^{-1}G_hH_h^{-1}\bigr)$$
as $n\to\infty$, i.e. in the short memory situation, the feasible estimator $\bar\theta_n^{(h)}$ is asymptotically normally distributed with $n^{1/2}$-rate of convergence.

Proof: The proposition follows by proving $D_n\to 0$ in the $L^1$-norm. Recall that $\sigma_t=\sigma_t(\theta_o)$ and denote $\bar\sigma_t := \bar\sigma_t(\theta_o)$, $\dot\sigma_t := \partial_\theta\sigma_t(\theta_o)$ and $\dot{\bar\sigma}_t := \partial_\theta\bar\sigma_t(\theta_o)$. Decompose $D_n = \frac{1}{\sqrt n}(\Sigma_1+\Sigma_2)$. The sum $\Sigma_2$ consists of uncorrelated random variables and thus
$$E\Bigl\|\tfrac{1}{\sqrt n}\Sigma_2\Bigr\|^2 \to 0$$
as $n$ tends to infinity by lemma 6.2b and Cesàro summability. We further decompose $\Sigma_1$ into $\Sigma_{1,1}+\Sigma_{1,2}$, which leads to the upper bound (6.27) for $E\|\frac{1}{\sqrt n}\Sigma_{1,1}\|$, where the constant $K$ originates from the application of the mean value theorem to the function $\frac{x^2}{(x^2+h)^2}$. Hence, by assumption (M6), all expected values are finite and $E[\sigma_t-\bar\sigma_t]^4\to 0$, implying $E\|\frac{1}{\sqrt n}\Sigma_{1,2}\|\to 0$. In the short memory situation $d_o<0$, the bound (6.27) converges to zero.

Note that assumption (M6) was needed in the proof of proposition 6.2 to express the upper bound (6.27) in terms of $E[\sigma_t-\bar\sigma_t]^2$, for which the rate of decay is known.

In the long-memory case, i.e. $d_o>0$, the proof of the preceding proposition does not hold anymore, since the upper bound (6.27) for $E\|D_n\|$ (more precisely for $E\|\frac{1}{\sqrt n}\Sigma_{1,1}\|$) does not converge to zero. We therefore propose an alternative estimator, at the cost of a slower rate of convergence:

Definition 6.3 Let $h>0$ and $0<\beta<1$. Define $m(n) = \lfloor n^\beta\rfloor - 1$. For a sample $X_1,\dots,X_n$, the estimator $\theta_n^{(h,\beta)}$ of the parameter vector $\theta$ is defined by
$$\theta_n^{(h,\beta)} := \operatorname*{arg\,min}_{\theta\in\Theta}\,\tilde L_{n,h}(\theta),$$
where the truncated objective function is given by
$$\tilde L_{n,h}(\theta) := \frac{1}{m(n)+1}\sum_{t=n-m(n)}^{n}\left[\frac{X_t^2+h}{\bar\sigma_t^2(\theta)+h} + \ln\bigl(\bar\sigma_t^2(\theta)+h\bigr)\right].$$
Here, $\lfloor\cdot\rfloor$ denotes the floor function, i.e. $\lfloor x\rfloor$ is the largest integer not exceeding $x$.
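A minimal sketch of the truncated objective, reusing the hypothetical helper sigma_bar from the sketch after definition 6.2 (again purely illustrative and not taken from the thesis):

\begin{verbatim}
import numpy as np

def L_tilde(theta, X, h, beta):
    # Truncated feasible objective tilde{L}_{n,h}(theta) of definition 6.3.
    # Only the summands t = n - m(n), ..., n are kept, m(n) = floor(n**beta) - 1,
    # while bar{sigma}_t(theta) is still computed from all available past values.
    X = np.asarray(X, dtype=float)
    n = len(X)
    m = int(np.floor(n ** beta)) - 1
    sbar2 = sigma_bar(X, theta) ** 2    # hypothetical helper defined earlier
    idx = slice(n - m - 1, n)           # the last m(n) + 1 time points
    return np.mean((X[idx] ** 2 + h) / (sbar2[idx] + h) + np.log(sbar2[idx] + h))
\end{verbatim}

The estimator $\theta_n^{(h,\beta)}$ is then the minimizer of L_tilde over $\Theta$, exactly as for $\bar\theta_n^{(h)}$.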

The purpose of the additional function $m(n)$ can be explained as follows: As described above, the asymptotic distributions of $\theta_n^{(h)}$ and $\bar\theta_n^{(h)}$ may differ due to the slow convergence of $|\sigma_t(\theta_o)-\bar\sigma_t(\theta_o)|$ to zero. More precisely, the difference of the objective functions $D_n$ mainly comes from poor estimates of $\sigma_t(\theta_o)$ for low $t$, since then only a small number of past values $X_1,\dots,X_{t-1}$ is used for the calculation of the approximation $\bar\sigma_t(\theta_o)$, and thus a large deviation of $\bar\sigma_t(\theta_o)$ from $\sigma_t(\theta_o)$ can occur. In the definition of $\theta_n^{(h,\beta)}$, we try to avoid this problem by skipping the first $n-m(n)-1$ summands, i.e. we only use the part of the sum which is based on the most reliable values of $\bar\sigma_t(\theta_o)$. The function $m(n)$ is chosen in such a way that the corresponding bound (6.27) of the difference converges to zero (see the proof of theorem 6.5 below). Note however that all available values of $X_s$, $s=1,\dots,t-1$, are used for the calculation of $\bar\sigma_t(\theta)$, $t=1,\dots,n$. The estimator of definition 6.3 has the following properties:

Theorem 6.5 Let $h>0$ and $\theta_o$ be in the interior of $\Theta$. Further assume that (A5.1), (B) and (S) hold.

(a) If (M3) or (M3′) holds and $0<\beta<1$, then $\theta_n^{(h,\beta)}$ converges to $\theta_o$ in the $L^1$-norm, i.e. $\theta_n^{(h,\beta)}$ is a weakly consistent estimator.

(b) If (M6) holds and $0<\beta<1-2d_o$, then
$$n^{\beta/2}\bigl(\theta_n^{(h,\beta)}-\theta_o\bigr)\xrightarrow{d} N\bigl(0,\,H_h^{-1}G_hH_h^{-1}\bigr)$$
as $n$ tends to infinity.

Proof: (a) Define
$$\check L_{n,h}(\theta) := \frac{1}{m(n)+1}\sum_{t=n-m(n)}^{n}\left[\frac{X_t^2+h}{\sigma_t^2(\theta)+h} + \ln\bigl(\sigma_t^2(\theta)+h\bigr)\right].$$

Then the same arguments as in the proof of lemma 6.6 can be applied to show that $\sup_{\theta\in\Theta}\|\check L_{n,h}(\theta)-\tilde L_{n,h}(\theta)\|$ converges to zero in the $L^1$-norm and that analogous results hold for the gradient and Hessian matrix of $\tilde L_{n,h}(\theta)$. Moreover, since $X_t$ and $\sigma_t(\theta)$ are stationary, the distributions of $\check L_{n,h}(\theta)$ and

$$\check L^*_{n,h}(\theta) := \frac{1}{m(n)+1}\sum_{t=1}^{m(n)+1}\left[\frac{X_t^2+h}{\sigma_t^2(\theta)+h} + \ln\bigl(\sigma_t^2(\theta)+h\bigr)\right]$$

coincide. This leads to $\sup_{\theta\in\Theta}\|\check L_{n,h}(\theta)-L_h(\theta)\| \to 0$, since $\check L^*_{n,h}(\theta)$ is a subsequence of $L_{n,h}(\theta)$. Altogether, $\tilde L_{n,h}(\theta)$ converges to $L_h(\theta)$ in probability uniformly in $\theta$, and thus consistency of $\theta_n^{(h,\beta)}$ follows.

(b) Again, by Taylor expansion,
$$0 = \tilde L'_{n,h}\bigl(\theta_n^{(h,\beta)}\bigr) = \tilde L'_{n,h}(\theta_o) + \widetilde{\tilde L}''_n\cdot\bigl(\theta_n^{(h,\beta)}-\theta_o\bigr),$$
where $\tilde L'_{n,h}(\theta)$ denotes the gradient of $\tilde L_{n,h}(\theta)$ and $\widetilde{\tilde L}''_n$ coincides with the Hessian matrix of $\tilde L_{n,h}(\theta)$ evaluated in each row in such a way that $\widetilde{\tilde L}''_n \to H_h$ in probability (compare with the preceding proofs). Thus the asymptotic distribution of $\tilde L'_{n,h}(\theta_o)$ has to be derived. First note again that the distributions of $\check L'_{n,h}(\theta_o)$ and $\check L^{*\prime}_{n,h}(\theta_o)$ coincide and that the latter is a subsequence of $L'_{n,h}(\theta_o)$, which is a martingale difference. Thus

$$\sqrt{m(n)+1}\;\check L'_{n,h}(\theta_o) \xrightarrow{d} N(0,\,G_h).$$
It remains to show that
$$\tilde D_n := \sqrt{m(n)+1}\,\bigl(\tilde L'_{n,h}(\theta_o)-\check L'_{n,h}(\theta_o)\bigr)$$
converges to zero. This can be shown by decomposing $\tilde D_n$ as in the proof of proposition 6.2 (recall $D_n = \frac{1}{\sqrt n}(\Sigma_{1,1}+\Sigma_{1,2}+\Sigma_2)$), changing the upper bound (6.27) into

$$K\,\frac{m(n)+1}{\sqrt{m(n)+1}}\;n^{d_o-\frac12} \;=\; K\,\frac{\lfloor n^\beta\rfloor}{\sqrt{\lfloor n^\beta\rfloor}}\;n^{d_o-\frac12} \;\sim\; K\,n^{d_o-\frac12+\frac\beta2} \;\to\; 0$$
as $n$ tends to infinity (note that $\frac\beta2 < \frac12 - d_o$).

Thus the asymptotic properties of $\theta_n^{(h,\beta)}$ depend on the value of $d_o$. If $d_o>0$ is close to zero, then the best rate of convergence $n^{\beta/2}$ is close to $n^{1/2}$. However, for strong long memory with $d_o$ close to $1/2$, the upper bound for $\beta$, given by $1-2d_o$, is very small. Thus, the number of $\sigma_t$'s used for estimation is very small compared to $n$ and the rate of convergence of $\theta_n^{(h,\beta)}$ is very slow. Though consistency holds for all $\beta\in(0,1)$, the asymptotic distribution of $\theta_n^{(h,\beta)}$ for $\beta\ge 1-2d_o$ remains an open problem.
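A brief numerical illustration of this trade-off (illustrative numbers only): for $d_o=0.1$, theorem 6.5(b) allows any $\beta<0.8$, so the rate $n^{\beta/2}$ can be pushed close to $n^{0.4}$; for $d_o=0.4$, only $\beta<0.2$ is admissible. In the latter case a sample of size $n=10^4$ yields at most $m(n)+1=\lfloor n^{\beta}\rfloor\approx\lfloor 10^{0.8}\rfloor=6$ summands in $\tilde L_{n,h}(\theta)$, which makes the slow rate $n^{\beta/2}$ very tangible.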