
Conditional maximum likelihood estimation

Estimation for ARCH(∞)

In this section we present a very general result on parameter estimation for stationary ARCH(∞) processes due to Robinson and Zaffaroni (2006). Let X_t be a solution of the equations

X_t = ε_t σ_t, σ_t² = a_o + Σ_{j=1}^∞ b_{o,j} X_{t−j}²,

characterized by a finite-dimensional parameter vector θ_o which is to be estimated given a sample (X_1, …, X_n). More precisely, they consider functions b_j(ζ) of a vector ζ ∈ R^r such that there exists some ζ_o with

b_j(ζ_o) = b_{o,j}, j ≥ 1.

Then the unknown parameter value is θ_o = (a_o, ζ_o)^T. An approximate maximum likelihood estimator, standard for conditionally heteroskedastic processes, can be explained as follows: Define for θ = (a, ζ)^T ∈ Θ

σ_t²(θ) = a + Σ_{j=1}^∞ b_j(ζ) X_{t−j}².

Neglecting constants, the conditional maximum likelihood estimator is therefore defined by

θ̃_n = argmin_{θ∈Θ} L_n(θ), where L_n(θ) = (1/n) Σ_{t=1}^n ( ln σ_t²(θ) + X_t²/σ_t²(θ) ).

CHAPTER 4. VOLATILITY MODELS - STATISTICAL INFERENCE 75

However, in practice only finitely many X_t, t = 1, …, n, can be observed, and thus σ_t²(θ) has to be approximated, e.g., by

σ̄_t²(θ) = a + Σ_{j=1}^{t−1} b_j(ζ) X_{t−j}², 1 ≤ t ≤ n.

Then the feasible analogue of θ̃_n is given by

θ̂_n = argmin_{θ∈Θ} L̄_n(θ),

where L̄_n(θ) is defined as L_n(θ) with σ_t²(θ) replaced by σ̄_t²(θ). Robinson and Zaffaroni (2006) then study the asymptotic properties of θ̃_n and θ̂_n under a set of assumptions which we formulate in a slightly simplified version.
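To make the construction of θ̂_n concrete, here is a minimal numerical sketch. Everything parametric in it is an assumption for illustration only: a hypothetical geometric family b_j(ζ) = ζ^j, standard normal innovations, and a truncation lag for the infinite sums (Robinson and Zaffaroni 2006 consider far more general coefficient families). The feasible objective uses only the observed past X_{t−1}, …, X_1.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
TRUNC = 50  # truncation lag for the infinite sums (an approximation)

def simulate_arch(n, a, zeta, burn=300):
    """Simulate X_t = eps_t*sigma_t, sigma_t^2 = a + sum_j b_j(zeta) X_{t-j}^2
    for the hypothetical geometric family b_j(zeta) = zeta**j."""
    b = zeta ** np.arange(1, TRUNC + 1)
    x = np.zeros(n + burn)
    for t in range(n + burn):
        m = min(t, TRUNC)
        sig2 = a + b[:m] @ (x[t - m:t][::-1] ** 2)   # lags 1..m
        x[t] = rng.standard_normal() * np.sqrt(sig2)
    return x[burn:]

def objective(theta, x):
    """Feasible Gaussian objective: mean of log sigma_bar_t^2 + X_t^2/sigma_bar_t^2,
    where sigma_bar_t^2 uses only the observed past X_{t-1}, ..., X_1."""
    a, zeta = theta
    val = 0.0
    for t in range(1, len(x)):
        m = min(t, TRUNC)
        b = zeta ** np.arange(1, m + 1)
        sig2 = a + b @ (x[t - m:t][::-1] ** 2)
        val += np.log(sig2) + x[t] ** 2 / sig2
    return val / len(x)

x = simulate_arch(400, a=0.1, zeta=0.3)
res = minimize(objective, x0=np.array([0.3, 0.5]), args=(x,),
               bounds=[(1e-4, 2.0), (1e-3, 0.95)])
a_hat, zeta_hat = res.x
```

The box constraints mimic the compact parameter space Θ of assumption (C4.2) below.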

(C4.1) The innovations εt are i.i.d. standard normal.

(C4.2) θ_o is an interior point of Θ = [a, ā] × Ξ, where Ξ ⊂ R^r is a compact set.

(C4.3) For all j ≥ 1,

inf_{ζ∈Ξ} b_j(ζ) > 0 and sup_{ζ∈Ξ} b_j(ζ) ≤ K j^{−d−1}

for some d > 0, and further b_{o,j} ≤ K b_{o,k} for 1 ≤ k ≤ j.

(C4.4) There exists d_o > 1/2 such that

b_{o,j} ≤ K j^{−1−d_o}.

(C4.5) For each ζ ∈ Ξ, there are integers 1 ≤ j_1(ζ) < … < j_r(ζ) < ∞ such that

rank( ∂b_{j_1(ζ)}(ζ)/∂ζ, …, ∂b_{j_r(ζ)}(ζ)/∂ζ ) = r.

(C4.6) For all j ≥ 1, b_j(ζ) has continuous k-th derivatives on Ξ such that (with ζ_i denoting the i-th element of ζ)

| ∂^k b_j(ζ) / (∂ζ_{i_1} ⋯ ∂ζ_{i_k}) | ≤ K b_j(ζ)^{1−η}

for all η > 0 and all i_h = 1, …, r, h = 1, …, k, k ≤ l.

(C4.7) (X_t)_{t∈Z} is a strictly stationary and ergodic ARCH(∞) process with E|X_t|^{2ρ} < ∞ for some ρ ∈ ((d+1)^{−1}, 1) ∩ (4/(2d_o+3), 1).

We should mention that Robinson and Zaffaroni (2006) give a much more general condition than (C4.1); in particular, they allow for non-Gaussian innovations, leading to a so-called pseudo maximum likelihood estimator θ̂_n. Moreover, note that X_t may have infinite variance, so the results can be applied to processes similar to FIGARCH; see sections 3 and 4 in Robinson and Zaffaroni (2006).

The following theorem can then be proven.

Theorem 4.5 (a) Assume (C4.1)-(C4.7), where (C4.6) holds with l = 1. Then θ̃_n and θ̂_n are strongly consistent for θ_o, i.e., almost surely

θ̃_n → θ_o and θ̂_n → θ_o as n → ∞.

(b) If further (C4.6) holds with l = 3, then θ̃_n and θ̂_n are asymptotically normal:

n^{1/2}(θ̃_n − θ_o) →_d N(0, H^{−1} G H^{−1})

and

n^{1/2}(θ̂_n − θ_o) →_d N(0, H^{−1} G H^{−1}),

where the matrices G, H are given by G = 2M, H = M with

M = E[ (∂/∂θ ln σ_t²(θ_o)) (∂/∂θ ln σ_t²(θ_o))^T ].

(Actually, assumption (C4.4) is only used in the proof of asymptotic normality of θ̂_n.) These assumptions should be compared to our results in chapter 6, where we derive a similar result for long memory LARCH processes. Above, the coefficients and their derivatives are uniformly bounded by an absolutely summable sequence, while the moment assumptions are rather weak. In chapter 6, on the other hand, the situation regarding the coefficients is less convenient, since the coefficients of a LARCH process need not be summable. Thus methods different from those used in Robinson and Zaffaroni (2006) have to be applied to derive the asymptotic behavior of the parameter estimators. See the introduction of section 6.2, where further difficulties are described.

Consistency of extremum-type estimators

Estimators defined as minimizers (respectively maximizers) of an objective function have been studied for a long time; see, e.g., Jennrich (1969) and Huber (1967). We briefly describe in a general framework how consistency can be proven for such estimators, since the results in section 6.2 are derived analogously.

To this end, consider a stochastic function L_n(θ) of a vector θ that belongs to a parameter space Θ. Usually, Θ is the set of all 'possible' parameter vectors, and it is assumed that there is a 'true' parameter θ_o which is unknown. L_n(θ) is called the objective function and

θ̂_n = argmin_{θ∈Θ} L_n(θ) (4.21)

is the corresponding estimator for θo. Consider the following conditions:

(D4.1) Θ ⊂ R^r is compact.

(D4.2) L_n(θ) is continuous in θ and converges in probability to a non-stochastic function L(θ), uniformly in θ ∈ Θ, as n → ∞.

(D4.3) L(θ) has a unique minimum at θ_o.

We then have:

Proposition 4.3 Under (D4.1)-(D4.3),

θ̂_n → θ_o (4.22)

in probability as n → ∞. Moreover, if the convergence in (D4.2) holds almost surely, then θ̂_n converges to θ_o almost surely.

Proof: See Jennrich (1969).

Note that, due to continuity and compactness, there exists at least one minimizer in (4.21); however, there may be more than one. In the latter case, one can always choose one minimizer such that the preceding proposition still holds. In many cases, as in theorem 4.5 above, L_n(θ) is a sum (or mean) of a stationary sequence, and thus a uniform weak (or strong) law of large numbers has to be proven for consistency of the corresponding estimator; compare also section 6.2.
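The mechanism behind Proposition 4.3 can be illustrated with a toy objective (all parameter values below are assumptions for the demo): for i.i.d. data Z_i with mean θ_o, the objective L_n(θ) = n^{−1} Σ (Z_i − θ)² converges uniformly on a compact Θ to L(θ) = var(Z_i) + (θ − θ_o)², which has its unique minimum at θ_o, so the minimizer approaches θ_o as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_o = 1.5                       # 'true' parameter, assumed for the demo

def L_n(theta, z):
    # objective function: mean of a stationary (here i.i.d.) sequence
    return np.mean((z - theta) ** 2)

# (D4.1): compact parameter space Theta = [0, 3], discretized for minimization
theta_grid = np.linspace(0.0, 3.0, 3001)

estimates = []
for n in (10, 100, 10000):
    z = theta_o + rng.standard_normal(n)           # data with mean theta_o
    values = [L_n(th, z) for th in theta_grid]
    estimates.append(theta_grid[int(np.argmin(values))])
# L_n(theta) -> L(theta) = 1 + (theta - theta_o)^2 uniformly on Theta,
# so the grid minimizers approach theta_o = 1.5
```

Here the minimizer is essentially the sample mean, so the uniform law of large numbers required in the text reduces to the ordinary one.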


Chapter 5 LARCH

A linear ARCH (LARCH) process (X_t, σ_t)_{t∈Z} is defined by the equations

X_t = ε_t σ_t, (5.1)

σ_t = b_0 + Σ_{j=1}^∞ b_j X_{t−j}, (5.2)

where the conditions on the coefficients (b_j)_{j=0,1,…} are given below. The essential modification, compared to the definition of ARCH(∞) processes, is that (5.2) is formulated for σ_t and X_t instead of σ_t² and X_t². In the next subsection it will become clear how this difference makes it possible to use very slowly decaying coefficients (b_j)_{j=0,1,…}, similar to FARIMA weights, consequently leading to long memory in the squares X_t².
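A simulation sketch makes the definition concrete (the parameter values are hypothetical, and the infinite sum in (5.2) is truncated at lag J, so this only approximates a stationary LARCH path): the coefficients b_j = c·j^{d−1} are square-summable but not absolutely summable, and σ_t may take negative values.

```python
import numpy as np

rng = np.random.default_rng(7)

# hypothetical parameters: FARIMA-like coefficients b_j = c * j**(d-1)
d, c, b0 = 0.3, 0.2, 1.0
J = 1000                                  # truncation lag (approximation)
jj = np.arange(1, J + 1)
b = c * jj ** (d - 1.0)
# sum of b_j^2 stays below 1, although sum of |b_j| diverges as J grows

n, burn = 2000, 2000
x = np.zeros(n + burn)
sigma = np.zeros(n + burn)
for t in range(n + burn):
    m = min(t, J)
    sigma[t] = b0 + b[:m] @ x[t - m:t][::-1]     # (5.2), truncated at lag J
    x[t] = rng.standard_normal() * sigma[t]      # (5.1); sigma_t may be negative
x, sigma = x[burn:], sigma[burn:]

share_negative = np.mean(sigma < 0)              # typically small but nonzero
```

The condition that will make this work, Σ b_j² < 1, is exactly (A5.2) below.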

The literature on LARCH processes starts with Robinson (1991), who considered model (5.1)-(5.2) as one of several parametric alternatives in the context of hypothesis testing. A rigorous treatment of probabilistic aspects, such as stationarity and moment conditions, was given in Giraitis et al. (2000b, 2003c). The most important results, which will be used in chapter 6, are described in sections 5.2 and 5.3. Moreover, they provide a limit theorem for sums of integer powers of X_t, which is given in section 5.4, together with an asymptotic result for sums of functions f(X_t) and f(σ_t) studied in a paper by Berkes and Horváth (2003).



5.1 Basic properties

Stationarity

First, we consider the question whether a stationary solution of the two equations exists. Analogously to (3.3), a candidate for σ_t can be obtained by iteratively substituting (5.1) into (5.2) and vice versa, leading to

σ_t = b_0 + b_0 Σ_{l=1}^∞ Σ_{j_1,…,j_l ≥ 1} b_{j_1} b_{j_2} ⋯ b_{j_l} ε_{t−j_1} ε_{t−j_1−j_2} ⋯ ε_{t−j_1−…−j_l}. (5.3)

The summands form an orthogonal system in L²(Ω), and thus convergence in (5.3) follows by examining the variance. Let E[ε_t²] = 1 and denote ‖b‖₂² = Σ_{j=1}^∞ b_j². Then ‖b‖₂ < 1 ensures convergence of the series (5.3), since then

var(σ_t) = b_0² ‖b‖₂² / (1 − ‖b‖₂²) < ∞.

Moreover, the process X_t = ε_t σ_t then apparently solves equations (5.1) and (5.2).
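The orthogonality argument behind the variance formula can be checked numerically. Writing v_k for the total squared-coefficient weight of the k-th order Volterra terms in (5.3) (the sum of (b_{j_1}⋯b_{j_l})² over j_1+…+j_l = k), splitting off the first index gives the convolution recursion v_k = b_k² + Σ_{j<k} b_j² v_{k−j}, and var(σ_t) = b_0² Σ_k v_k. The sketch below, with hypothetical geometric coefficients b_j = c·ρ^j, verifies that the partial sums reproduce the closed form b_0²‖b‖₂²/(1 − ‖b‖₂²).

```python
import numpy as np

# hypothetical geometric coefficients b_j = c * rho**j with ||b||_2 < 1
c, rho, b0 = 0.8, 0.5, 1.0
K = 60                                       # truncation level for the Volterra sum
b2 = (c * rho ** np.arange(1, K + 1)) ** 2   # b_j^2, j = 1..K

# v[k] = sum over compositions j_1+...+j_l = k of (b_{j_1}*...*b_{j_l})^2,
# i.e. the variance contribution of the k-th orthogonal Volterra term;
# splitting off the first index j yields the convolution recursion below
v = np.zeros(K + 1)
for k in range(1, K + 1):
    v[k] = b2[k - 1] + sum(b2[j - 1] * v[k - j] for j in range(1, k))

norm2 = c ** 2 * rho ** 2 / (1 - rho ** 2)   # ||b||_2^2 in closed form
var_volterra = b0 ** 2 * v.sum()
var_closed = b0 ** 2 * norm2 / (1 - norm2)   # b_0^2 ||b||^2 / (1 - ||b||^2)
```

Since all omitted terms are positive and geometrically small, the truncated sum approaches the closed form from below.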

Note that only a condition on the squared coefficients is needed; in particular, the coefficients need neither be absolutely summable, nor is a positivity constraint required to ensure positivity of σ_t². Hence, it is possible to choose more general coefficients than for ARCH(∞) models. We summarize the conditions as follows:

(A5.1) The ε_t are i.i.d. random variables defined on a probability space (Ω, A, P) with E[ε_t] = 0 and E[ε_t²] = 1. Moreover, the ε_t have a continuous marginal distribution.

(A5.2) b_0 ≠ 0 and ‖b‖₂² = Σ_{j=1}^∞ b_j² < 1.

Giraitis et al. (2003c) extend the preceding arguments by also addressing the case b_0 = 0 and by showing uniqueness of σ_t for b_0 ≠ 0:

Theorem 5.1 Let assumption (A5.1) hold.

(i) Under (A5.2), a unique strictly and second order stationary solution σ_t of (5.1) and (5.2) exists and is given by (5.3). Conversely, b_0 ≠ 0 and the existence of a second order stationary solution σ_t imply ‖b‖₂ < 1.

(ii) If b_0 = 0 and ‖b‖₂ < 1, then σ_t = 0 a.s. is the unique solution of (5.1) and (5.2).

Proof: See theorem 2.1 in Giraitis et al. (2003c).

In (A5.1), we require continuity of the distribution of εt, which is clearly not necessary for stationarity of σt. However, later in section 6.2, we frequently need to cite (A5.1) together with this additional condition and thus it is already included at this point.

Obviously, X_t is a martingale difference, since σ_t is measurable with respect to F_{t−1} = σ(ε_s, s ≤ t−1). Thus, by the Wold decomposition (5.2) of σ_t in terms of the uncorrelated X_{t−j}, standard arguments from the long memory literature for linear processes imply the following corollary:

Corollary 5.1 Let (A5.1) and (A5.2) hold with b_j ∼ L(j) j^{d−1} as j → ∞ and d ∈ (0, 1/2). Then cov(σ_t, σ_{t+k}) ∼ L_1(k) k^{2d−1} as k → ∞, where L(·) and L_1(·) denote slowly varying functions.

This similarity to linear long memory processes is probably the reason for the letter L in the acronym LARCH. In section 5.3 we will see that the long memory property is also retained in squares and in higher order powers of the actually observable process Xt.
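A quick numerical sanity check of the decay rate in Corollary 5.1 (a sketch, with the slowly varying factor taken to be constant): since the X_{t−j} in (5.2) are uncorrelated with equal variance, cov(σ_t, σ_{t+k}) is proportional to Σ_j b_j b_{j+k}, and for b_j = j^{d−1} this sum should scale like k^{2d−1}, so doubling k multiplies it by roughly (1/2)^{1−2d}.

```python
import numpy as np

d = 0.3
J = 10 ** 6                               # summation cutoff (approximation)
j = np.arange(1, J + 1, dtype=float)
bj = j ** (d - 1.0)                       # b_j = j^(d-1), slowly varying part = 1

def acov(k):
    """sum_j b_j * b_{j+k}, proportional to cov(sigma_t, sigma_{t+k})
    because the X_{t-j} in (5.2) are uncorrelated with equal variance."""
    return float(np.sum(bj[: J - k] * (j[: J - k] + k) ** (d - 1.0)))

# halving the lag should multiply the autocovariance by about 2**(1 - 2d)
ratio = acov(200) / acov(400)
```

The agreement is only asymptotic, so moderate lags match the theoretical factor up to a few percent.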

Finally, observe that σ_t² coincides with the conditional variance var(X_t | F_{t−1}), and thus |σ_t| is the conditional standard deviation of X_t, also called volatility. By (5.2), σ_t may be negative, and therefore it somewhat lacks the usual interpretation of a volatility. However, by choosing a suitable parameterization, the probability P(σ_t < 0) can be made very small (or even zero). Moreover, if ε_t has a symmetric distribution, then ε_t σ_t and ε_t |σ_t| are identically distributed.

Ergodicity

In addition to stationarity, it is possible to show that σ_t admits further important properties such as ergodicity. The results in section 6.2 concerning consistency of different parameter estimators are based on the following result (see Beran and Schützner 2008b):

Proposition 5.1 Let (A5.1) and (A5.2) hold. Then the unique solution σt of (5.1) and (5.2) is ergodic.

Proof: For the proof of ergodicity, it is sufficient to find a measurable function f : R^N → R with σ_t = f(ε_{t−1}, ε_{t−2}, …), where equality holds almost surely (see, for example, theorem 3.5.8 in Stout 1974). First note that convergence of the infinite sum in (5.3) is independent of the order of summation, since the series of squared coefficients is absolutely summable. Hence, we make use of the following alternative representation of σ_t: Define

f_k(x_1, x_2, …) = Σ_{j_i ≥ 1, l ≤ k, j_1+⋯+j_l = k} b_{j_1} b_{j_2} ⋯ b_{j_l} x_{j_1} x_{j_1+j_2} ⋯ x_{j_1+…+j_l}

and

M_t(k) = f_k(ε_{t−1}, ε_{t−2}, …).

Then

σ_t = b_0 + b_0 Σ_{k=1}^∞ M_t(k)

and for every fixed t ∈ Z, M_t(k), k = 1, 2, …, is a martingale difference with respect to F_k^t = σ(M_t(l), l ≤ k). An application of the martingale convergence theorem (see, e.g., Billingsley 1995) yields that

S_t(m) = Σ_{k=1}^m M_t(k) → Σ_{k=1}^∞ M_t(k)

as m → ∞ almost surely. Hence, the desired representation is given by

f(x) = Σ_{k=1}^∞ f_k(x) if x ∈ C, and f(x) = 0 if x ∉ C,

where x = (x_1, x_2, …) ∈ R^N and C = {x ∈ R^N : lim_{m→∞} Σ_{k=1}^m f_k(x) exists}. For the measurability of f, see corollary 2.1.3 in Straumann (2004).
