\[
\lambda_1 + \sum_{j=2}^{\infty}\left(\lambda_2\, j^{d_o-1} + \lambda_3\, c_o \log(j)\, j^{d_o-1}\right) X_{t-j} = -\lambda_2\, \varepsilon_{t-1}.
\]
The left-hand side is measurable with respect to $\mathcal{F}_{t-2}$ and thus independent of the right-hand side, which implies $\lambda_2 = 0$. Hence,
\[
\lambda_1 + \sum_{j=2}^{\infty} \lambda_3\, c_o \log(j)\, j^{d_o-1}\, X_{t-j} = 0.
\]
Taking expectations yields $\lambda_1 = 0$, whereas considering the variance leads to $\lambda_3 = 0$.
6.2.3 Estimation given the finite past
Given a finite sample $X_1, \ldots, X_n$, we cannot calculate the infinite series $\sigma_t(\theta)$, and thus the objective function of the estimator $\theta_n^{(h)}$ is infeasible. Therefore we replace $\sigma_t(\theta)$ by $\bar\sigma_t(\theta)$, see (6.4), and define the computable version of the modified conditional maximum likelihood estimator $\theta_n^{(h)}$:
Definition 6.2 Let $h > 0$. For a sample $X_1, \ldots, X_n$, the feasible estimator of the parameter vector $\theta$ is defined by
\[
\bar\theta_n^{(h)} := \arg\min_{\theta \in \Theta} \bar L_{n,h}(\theta),
\]
where the feasible objective function is given by
\[
\bar L_{n,h}(\theta) := \frac{1}{n} \sum_{t=1}^{n} \left[ \frac{X_t^2 + h}{\bar\sigma_t^2(\theta) + h} + \ln\!\left(\bar\sigma_t^2(\theta) + h\right) \right] \tag{6.21}
\]
and $\bar\sigma_t(\theta)$ is given by (6.4).
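To make the computation concrete, the following is a minimal sketch of the feasible objective. It assumes a hypothetical LARCH parameterization $\theta = (a, c, d)$ with $\sigma_t(\theta) = a + \sum_{j \ge 1} b_j(\theta) X_{t-j}$ and $b_j(\theta) = c\, j^{d-1}$, which matches the coefficient asymptotics used in this chapter; the exact form of $\bar\sigma_t(\theta)$ in (6.4) may differ in detail.

```python
import numpy as np

def sigma_bar(theta, X):
    """Truncated volatility approximation: for each t, only the observed
    past X_1, ..., X_{t-1} is used instead of the infinite series.
    Hypothetical parameterization: sigma_t = a + sum_j c * j**(d-1) * X_{t-j}."""
    a, c, d = theta
    n = len(X)
    sbar = np.empty(n)
    for t in range(n):                      # index t corresponds to time t+1
        j = np.arange(1, t + 1)             # available lags 1, ..., t
        b = c * j ** (d - 1.0)              # assumed coefficients b_j(theta)
        sbar[t] = a + np.dot(b, X[t - j])   # empty sum for the first observation
    return sbar

def L_bar(theta, X, h):
    """Feasible objective (6.21), to be minimized over theta in Theta."""
    sbar2 = sigma_bar(theta, X) ** 2
    return np.mean((X**2 + h) / (sbar2 + h) + np.log(sbar2 + h))
```

A minimizer can then be obtained with any standard numerical optimizer, e.g. scipy.optimize.minimize applied to the map $\theta \mapsto$ L_bar(theta, X, h).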
The statistical properties of $\bar\theta_n^{(h)}$ certainly depend on the asymptotic behavior of the difference of the objective functions, $|L_{n,h}(\theta) - \bar L_{n,h}(\theta)|$, as $n$ tends to infinity.
We will show that this error converges to zero uniformly in $\theta$, implying consistency of $\bar\theta_n^{(h)}$. However, it will turn out below that the convergence is too slow to deduce the asymptotic distribution of $\bar\theta_n^{(h)}$ from the asymptotic normality of the infeasible estimator $\theta_n^{(h)}$.
Denote the gradient of $\bar L_{n,h}(\theta)$ by $\bar L'_{n,h}(\theta)$ and the corresponding Hessian matrix by $\bar L''_{n,h}(\theta)$. These functions are defined analogously to $L'_{n,h}(\theta)$ and $L''_{n,h}(\theta)$, with $\sigma_t(\theta)$ replaced by $\bar\sigma_t(\theta)$; see (6.12)–(6.14).
Lemma 6.6 Let $h > 0$ and assumptions (A5.1), (B) and (S) hold.
(a) If further (M3) or (M3′) holds, then
\[
E\Big[\sup_{\theta \in \Theta} \big|L_{n,h}(\theta) - \bar L_{n,h}(\theta)\big|\Big] \to 0 \quad \text{as } n \to \infty. \tag{6.22}
\]
(b) If further (M4′′) holds, then
\[
E\Big[\sup_{\theta \in \Theta} \big\|L'_{n,h}(\theta) - \bar L'_{n,h}(\theta)\big\|\Big] \to 0 \tag{6.23}
\]
and
\[
E\Big[\sup_{\theta \in \Theta} \big\|L''_{n,h}(\theta) - \bar L''_{n,h}(\theta)\big\|\Big] \to 0 \tag{6.24}
\]
as $n \to \infty$.
Proof: From the mean value theorem applied to the functions $(x^2+h)^{-1}$ and $\ln(x^2+h)$ (note that the corresponding derivatives are bounded) we get
\[
\sup_{\theta \in \Theta} \big|L_{n,h}(\theta) - \bar L_{n,h}(\theta)\big| \le \frac{K}{n} \sum_{t=1}^{n} \big(X_t^2 + 1\big) \sup_{\theta \in \Theta} \big|\bar\sigma_t(\theta) - \sigma_t(\theta)\big|,
\]
where $K$ is a finite constant. Next, by Lemma 6.2(b), assumption (M3) respectively (M3′) implies
\[
E\Big[\sup_{\theta \in \Theta} \big|\bar\sigma_t(\theta) - \sigma_t(\theta)\big|^3\Big] \to 0 \quad \text{as } t \to \infty.
\]
Together with Hölder's inequality and Cesàro summability, this implies
\[
E\Big[\sup_{\theta \in \Theta} \big|L_{n,h}(\theta) - \bar L_{n,h}(\theta)\big|\Big] \to 0 \quad \text{as } n \to \infty,
\]
and thus the proof of (a) is finished since $E[|X_t|^3] < \infty$.
For (6.23), consider the decomposition of $L'_{n,h}(\theta) - \bar L'_{n,h}(\theta)$ into its summands. An application of the mean value theorem then bounds $\sup_{\theta \in \Theta} \|L'_{n,h}(\theta) - \bar L'_{n,h}(\theta)\|$ by an average of terms involving $\sup_{\theta \in \Theta} |\bar\sigma_t(\theta) - \sigma_t(\theta)|$ and the corresponding gradients. As above, assumption (M4′′) ensures that all expected values are finite and further, by using Lemma 6.2(b), one gets
\[
E\Big[\sup_{\theta \in \Theta} \big\|L'_{n,h}(\theta) - \bar L'_{n,h}(\theta)\big\|\Big] \to 0 \quad \text{as } n \to \infty,
\]
whereby the proof is finished by Cesàro summability. The remaining part (6.24) can be proven analogously.
The preceding result and Lemma 6.4 can now be combined to derive consistency of the feasible estimator $\bar\theta_n^{(h)}$.
Theorem 6.4 Let $h > 0$ and assume that (A5.1), (B) and (S) hold. If further (M3) respectively (M3′) holds, then
\[
\bar\theta_n^{(h)} \to \theta_o \quad \text{as } n \to \infty,
\]
where convergence holds in probability.
Proof: By Lemmas 6.4 and 6.6, we get uniform convergence in probability of $\bar L_{n,h}(\theta)$ to $L_h(\theta)$. Thus the conditions described in (4.22) are fulfilled.
Obtaining the asymptotic distribution of $\bar\theta_n^{(h)}$ is more complicated. The reason is the slow convergence of $|\sigma_t(\theta) - \bar\sigma_t(\theta)|$ to zero. To be more specific, note that
\[
E\big[\sigma_t(\theta) - \bar\sigma_t(\theta)\big]^2 = E[\sigma_t^2] \sum_{j=t}^{\infty} b_j^2(\theta) \sim c_1 t^{2d-1} \quad \text{as } t \to \infty,
\]
with a constant $c_1 < \infty$.
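For readers who want to verify the stated rate, here is a one-line check; it assumes the coefficient asymptotics $b_j(\theta) \sim c\, j^{d-1}$ (with $d < 1/2$) that appear throughout this chapter, e.g. in the identifiability argument above:
\[
\sum_{j=t}^{\infty} b_j^2(\theta) \sim c^2 \sum_{j=t}^{\infty} j^{2d-2} \sim c^2 \int_t^{\infty} x^{2d-2}\, dx = \frac{c^2}{1-2d}\, t^{2d-1},
\]
so that $c_1 = c^2 E[\sigma_t^2]/(1-2d)$. For $d < 0$ the tail decays faster than $t^{-1}$, whereas for $d$ close to $1/2$ it decays arbitrarily slowly.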
As in the proof of Theorem 6.3, a Taylor series expansion yields
\[
0 = \bar L'_{n,h}\big(\bar\theta_n^{(h)}\big) = \bar L'_{n,h}(\theta_o) + \widetilde{\bar L}''_n \cdot \big(\bar\theta_n^{(h)} - \theta_o\big).
\]
Again, $\widetilde{\bar L}''_n$ coincides with the Hessian matrix $\bar L''_{n,h}(\theta)$, where the latter is evaluated in each row at a value $\tilde\theta_n^{(j)}$ with $\|\tilde\theta_n^{(j)} - \theta_o\| \le \|\bar\theta_n^{(h)} - \theta_o\|$, $j = 1, 2, 3$. By Lemmas 6.4 and 6.6, together with consistency of $\bar\theta_n^{(h)}$, the matrix $\widetilde{\bar L}''_n$ converges to $H_h$ in probability as $n$ tends to infinity, and thus the asymptotic distribution of $\bar\theta_n^{(h)}$ is given, up to the factor $H_h^{-1}$, by the asymptotic distribution of $\bar L'_{n,h}(\theta_o)$. The latter is the same as that of $L'_{n,h}(\theta_o)$, provided that
\[
D_n := \sqrt{n}\, \big(L'_{n,h}(\theta_o) - \bar L'_{n,h}(\theta_o)\big) \to 0 \tag{6.26}
\]
in probability as $n \to \infty$. However, it is not clear whether (6.26) holds in general, in particular if $d_o$ is positive and thus the convergence of $|\sigma_t(\theta) - \bar\sigma_t(\theta)|$ to zero is very slow. First, we study the asymptotic behavior of $D_n$, and thus the asymptotic distribution of $\bar\theta_n^{(h)}$, in the short memory case where $d_o < 0$:
Proposition 6.2 Let $h > 0$ and $\theta_o$ be an interior point of $\Theta$ with $d_o < 0$. Then, under assumptions (A5.1), (B), (S) and (M6),
\[
n^{1/2}\big(\bar\theta_n^{(h)} - \theta_o\big) \xrightarrow{d} \mathcal{N}\big(0,\, H_h^{-1} G_h H_h^{-1}\big)
\]
as $n \to \infty$; i.e., in the short memory situation, the feasible estimator $\bar\theta_n^{(h)}$ is asymptotically normally distributed with $n^{-1/2}$-rate of convergence.
Proof: The proposition follows by proving $D_n \to 0$ in the $L^1$-norm. Recall that $\sigma_t = \sigma_t(\theta_o)$ and denote $\bar\sigma_t := \bar\sigma_t(\theta_o)$, $\dot\sigma_t := \frac{\partial}{\partial\theta}\sigma_t(\theta_o)$ and $\dot{\bar\sigma}_t := \frac{\partial}{\partial\theta}\bar\sigma_t(\theta_o)$. Using these abbreviations, $D_n$ can be decomposed as $D_n = \frac{1}{\sqrt n}(\Sigma_1 + \Sigma_2)$. The sum $\Sigma_2$ consists of uncorrelated random variables and thus
\[
E\Big\|\frac{1}{\sqrt n}\Sigma_2\Big\|^2 \to 0
\]
as $n$ tends to infinity by Lemma 6.2(b) and Cesàro summability. We further decompose $\Sigma_1$ into $\Sigma_{1,1} + \Sigma_{1,2}$, where the constant $K$ below originates from the application of the mean value theorem to the function $\frac{x^2}{(x^2+h)^2}$. By assumption (M6), all expected values are finite and $E[\sigma_t - \bar\sigma_t]^4 \to 0$, implying $E\big\|\frac{1}{\sqrt n}\Sigma_{1,2}\big\| \to 0$. For the remaining term, one obtains a bound of the form
\[
E\Big\|\frac{1}{\sqrt n}\Sigma_{1,1}\Big\| \le \frac{K}{\sqrt n} \sum_{t=1}^{n} \big(E[\sigma_t - \bar\sigma_t]^2\big)^{1/2}. \tag{6.27}
\]
Since $E[\sigma_t - \bar\sigma_t]^2 \sim c_1 t^{2d_o-1}$, this bound is of order $n^{d_o}$; in the short memory situation $d_o < 0$, the bound (6.27) therefore converges to zero.
Note that assumption (M6) was needed in the proof of Proposition 6.2 to express the upper bound (6.27) in terms of $E[\sigma_t - \bar\sigma_t]^2$, for which the rate of decay is known.
In the long-memory case, i.e. $d_o > 0$, the proof of the preceding proposition does not hold anymore, since the upper bound (6.27) for $E\|D_n\|$ (more precisely, for $E\|\frac{1}{\sqrt n}\Sigma_{1,1}\|$) does not converge to zero. We therefore propose an alternative estimator, at the cost of a slower rate of convergence:
Definition 6.3 Let $h > 0$ and $0 < \beta < 1$. Define $m(n) = \lfloor n^\beta \rfloor - 1$. For a sample $X_1, \ldots, X_n$, the truncated estimator of the parameter vector $\theta$ is defined by
\[
\theta_n^{(h,\beta)} := \arg\min_{\theta \in \Theta} \tilde L_{n,h}(\theta),
\]
where the truncated objective function is given by
\[
\tilde L_{n,h}(\theta) := \frac{1}{m(n)+1} \sum_{t=n-m(n)}^{n} \left[ \frac{X_t^2 + h}{\bar\sigma_t^2(\theta) + h} + \ln\!\left(\bar\sigma_t^2(\theta) + h\right) \right].
\]
Here, $\lfloor \cdot \rfloor$ denotes the floor function, i.e. $\lfloor x \rfloor$ is the largest integer not exceeding $x$.
The purpose of the additional function $m(n)$ can be explained as follows. As described above, the asymptotic distributions of $\theta_n^{(h)}$ and $\bar\theta_n^{(h)}$ may differ due to the slow convergence of $|\sigma_t(\theta_o) - \bar\sigma_t(\theta_o)|$ to zero. More precisely, the difference of the objective functions $D_n$ mainly comes from poor estimates of $\sigma_t(\theta_o)$ for small $t$, since then only a small number of past values $X_1, \ldots, X_{t-1}$ is used for the calculation of the approximation $\bar\sigma_t(\theta_o)$, and thus a large deviation of $\bar\sigma_t(\theta_o)$ from $\sigma_t(\theta_o)$ can occur. In the definition of $\theta_n^{(h,\beta)}$, we try to avoid this problem by skipping the first $n - m(n) - 1$ summands, i.e. we only use the part of the sum which is based on the most reliable values of $\bar\sigma_t(\theta_o)$. The function $m(n)$ is chosen in such a way that the corresponding bound (6.27) of the difference converges to zero (see the proof of Theorem 6.5 below). Note, however, that all available values of $X_s$, $s = 1, \ldots, t-1$, are still used for the calculation of $\bar\sigma_t(\theta_o)$, $t = 1, \ldots, n$. The estimator of Definition 6.3 has the following properties:
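As a minimal illustration of Definition 6.3, the following sketch computes the truncated objective; it reuses the hypothetical sigma_bar() from the sketch after Definition 6.2 and, like it, rests on an assumed parameterization not spelled out here.

```python
import numpy as np

def m_of_n(n, beta):
    """Truncation level m(n) = floor(n**beta) - 1 from Definition 6.3."""
    return int(np.floor(n ** beta)) - 1

def L_tilde(theta, X, h, beta):
    """Truncated objective: averages the likelihood terms only over
    t = n - m(n), ..., n, where sigma_bar is based on a long observed
    past and hence most reliable."""
    n = len(X)
    m = m_of_n(n, beta)
    sbar2 = sigma_bar(theta, X) ** 2   # each sigma_bar_t still uses all of X_1, ..., X_{t-1}
    idx = np.arange(n - m - 1, n)      # 0-based indices for times n - m(n), ..., n
    return np.mean((X[idx] ** 2 + h) / (sbar2[idx] + h) + np.log(sbar2[idx] + h))
```

Note that the truncation affects only which summands enter the average, not how each $\bar\sigma_t(\theta)$ is computed.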
Theorem 6.5 Let $h > 0$ and $\theta_o$ be in the interior of $\Theta$. Further assume that (A5.1), (B) and (S) hold.
(a) If (M3) or (M3′) holds and $0 < \beta < 1$, then $\theta_n^{(h,\beta)}$ converges to $\theta_o$ in the $L^1$-norm, i.e. $\theta_n^{(h,\beta)}$ is a weakly consistent estimator.
(b) If (M6′) holds and $0 < \beta < 1 - 2d_o$, then
\[
n^{\beta/2}\big(\theta_n^{(h,\beta)} - \theta_o\big) \xrightarrow{d} \mathcal{N}\big(0,\, H_h^{-1} G_h H_h^{-1}\big)
\]
as $n$ tends to infinity.
Proof: (a) Define
\[
\check L_{n,h}(\theta) := \frac{1}{m(n)+1} \sum_{t=n-m(n)}^{n} \left[ \frac{X_t^2 + h}{\sigma_t^2(\theta) + h} + \ln\!\left(\sigma_t^2(\theta) + h\right) \right].
\]
Then, the same arguments as in the proof of Lemma 6.6 can be applied to show that $\sup_{\theta \in \Theta} |\check L_{n,h}(\theta) - \tilde L_{n,h}(\theta)|$ converges to zero in the $L^1$-norm, and that the analogous results hold for the gradient and Hessian matrix of $\tilde L_{n,h}(\theta)$. Moreover, since $X_t$ and $\sigma_t(\theta)$ are stationary, the distributions of $\check L_{n,h}(\theta)$ and
\[
\check L^{*}_{n,h}(\theta) := \frac{1}{m(n)+1} \sum_{t=1}^{m(n)+1} \left[ \frac{X_t^2 + h}{\sigma_t^2(\theta) + h} + \ln\!\left(\sigma_t^2(\theta) + h\right) \right]
\]
coincide. This leads to $\sup_{\theta \in \Theta} |\check L^{*}_{n,h}(\theta) - L_h(\theta)| \to 0$, since $\check L^{*}_{n,h}(\theta) = L_{m(n)+1,h}(\theta)$ is a subsequence of $L_{n,h}(\theta)$. Altogether, $\tilde L_{n,h}(\theta)$ converges to $L_h(\theta)$ in probability, uniformly in $\theta$, and thus consistency of $\theta_n^{(h,\beta)}$ follows.
(b) Again, by Taylor expansion,
\[
0 = \tilde L'_{n,h}\big(\theta_n^{(h,\beta)}\big) = \tilde L'_{n,h}(\theta_o) + \widetilde{\tilde L}''_n \cdot \big(\theta_n^{(h,\beta)} - \theta_o\big),
\]
where $\tilde L'_{n,h}(\theta)$ denotes the gradient of $\tilde L_{n,h}(\theta)$ and $\widetilde{\tilde L}''_n$ coincides with the Hessian matrix of $\tilde L_{n,h}(\theta)$, evaluated in each row in such a way that $\widetilde{\tilde L}''_n \to H_h$ in probability (compare with the preceding proofs). Thus the asymptotic distribution of $\tilde L'_{n,h}(\theta_o)$ has to be derived. First note again that the distributions of $\check L'_{n,h}(\theta_o)$ and $\check L^{*\prime}_{n,h}(\theta_o)$ coincide, and that the latter is a subsequence of $L'_{n,h}(\theta_o)$, which is a martingale difference. Thus
\[
\sqrt{m(n)+1}\; \check L'_{n,h}(\theta_o) \xrightarrow{d} \mathcal{N}(0, G_h).
\]
It remains to show that
\[
\tilde D_n := \sqrt{m(n)+1}\; \big(\tilde L'_{n,h}(\theta_o) - \check L'_{n,h}(\theta_o)\big)
\]
converges to zero. This can be shown by decomposing $\tilde D_n$ as in the proof of Proposition 6.2 (recall $D_n = \frac{1}{\sqrt n}(\Sigma_{1,1} + \Sigma_{1,2} + \Sigma_2)$), changing the upper bound (6.27) into
\[
K\, \frac{m(n)+1}{\sqrt{m(n)+1}}\, n^{d_o - \frac12} = K\, \frac{\lfloor n^\beta \rfloor}{\sqrt{\lfloor n^\beta \rfloor}}\, n^{d_o - \frac12} \sim K\, n^{d_o - \frac12 + \frac\beta2} \to 0
\]
as $n$ tends to infinity (note that $\frac\beta2 < \frac12 - d_o$).
Thus the asymptotic properties of $\theta_n^{(h,\beta)}$ depend on the value of $d_o$. If $d_o > 0$ is close to zero, then the best achievable rate of convergence $n^{\beta/2}$ is close to $n^{1/2}$. However, for strong long memory with $d_o$ close to $1/2$, the upper bound for $\beta$, given by $1 - 2d_o$, is very small. Thus, the number of $\bar\sigma_t$'s used for estimation is very small compared to $n$, and the rate of convergence of $\theta_n^{(h,\beta)}$ is very slow. Though consistency holds for all $\beta \in (0,1)$, the asymptotic distribution of $\theta_n^{(h,\beta)}$ for $\beta \ge 1 - 2d_o$ remains an open problem.
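To make this trade-off tangible with concrete (purely illustrative) numbers, Theorem 6.5(b) gives, for example,
\[
d_o = 0.05:\quad \beta < 0.9,\ \text{best rate close to } n^{0.45}; \qquad d_o = 0.45:\quad \beta < 0.1,\ \text{best rate close to } n^{0.05}.
\]
In the latter case, even a sample of size $n = 10^6$ yields only about $m(n) + 1 = \lfloor n^\beta \rfloor < 10^{0.6} \approx 4$ usable summands in $\tilde L_{n,h}$, which illustrates how severely strong long memory restricts the truncated estimator.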