
Asymptotic Statistical Theory for Long Memory Volatility Models

Dissertation

submitted for the academic degree of

Doctor of Natural Sciences (Dr. rer. nat.)

at the

Universität Konstanz
Faculty of Sciences, Department of Mathematics and Statistics

submitted by

Martin Schützner

Referees:

Prof. Dr. Jan Beran, Universität Konstanz
Prof. Dr. Siegfried Heiler, Universität Konstanz

Date of the oral examination: 16 June 2009

Konstanzer Online-Publikations-System (KOPS)
URN: http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-82995
URL: http://kops.ub.uni-konstanz.de/volltexte/2009/8299/


To Bianca


Acknowledgment

This work was financially supported by the German Research Foundation (DFG).

I am deeply grateful to my supervisor Jan Beran for his guidance and for the interesting and demanding topic of this thesis, which turned out to be very fruitful and surprisingly manifold. Moreover, I would like to thank him for many ideas, perfect working conditions and the opportunity to travel to many international conferences. My special thanks also go to Siegfried Heiler for his support since the beginning of my studies and for refereeing this PhD thesis.

Konstanz, June 2009
Martin Schützner


Abstract

In this thesis, statistical theory for time series with conditional heteroskedasticity and long memory in volatility is studied. We present appropriate models and consider several problems regarding parametric estimation. First, we discuss the question whether the asymptotic properties of M-estimators of location are affected by slowly decaying autocorrelations in squares. It turns out that under certain symmetry assumptions, consistency and the usual central limit theorem still hold. On the other hand, deviations from these assumptions can lead to non-standard behavior, in particular non-gaussian limiting distributions. For the asymptotic analysis, a connection to Appell polynomials and linear long memory processes is derived. Furthermore, we focus on the parametric LARCH model and investigate a modified conditional maximum likelihood estimator. Consistency and asymptotic normality are derived. The proofs differ substantially from the case of related models such as ARCH(∞), since the volatility of a LARCH process is not separated from zero. Moreover, the long memory property leads to additional difficulties, for instance a slower rate of convergence. Consequently, we discuss the question how more efficient estimators can be defined for models with slowly decaying autocorrelations. To this end, we exploit the result that long memory can be explained by contemporaneous aggregation. Based on a panel scheme of random AR(1) processes, a new estimator for the long memory parameter of the aggregated process is introduced and asymptotic properties are proven. The results indicate that the described procedure could lead to improved statistical methods, in particular for heteroskedastic models with long memory.


Zusammenfassung (German summary)

This thesis studies statistical methods for time series with conditional heteroskedasticity and long-range dependence in volatility. Appropriate stochastic processes are presented and several problems related to parameter estimation are treated. First, we study the question whether slowly decaying autocorrelations can affect the asymptotic properties of M-estimators of location parameters. It turns out that this is not the case when certain symmetry conditions are satisfied. On the other hand, deviations from these conditions lead to different behavior; in particular, the limiting distribution need no longer be normal. For the derivation of the asymptotic results, a connection to Appell polynomials and linear processes with long-range dependence is established. Furthermore, we focus on the parametric LARCH model and investigate conditional maximum likelihood estimators. Consistency and asymptotic normality are shown, where the proofs differ substantially from the methods for related models such as ARCH(∞). The main reason is that the volatility of a LARCH process can become arbitrarily small.

Moreover, the long-range dependence leads to further difficulties, such as a slower rate of convergence. This raises the question of how more efficient methods can be defined for such time series. We take up the result that long-range dependence can be generated by aggregating simpler base processes, and consider a panel scheme of AR(1) processes with random coefficients. In this model, a new estimation method for the aggregated process, based on estimators for the base processes, is introduced. The results show that the described procedure can lead to more efficient estimators, in particular for heteroskedastic models with long-range dependence.


Contents

1 Introduction

2 Long memory and limit theorems
2.1 Basic concepts
2.2 Limit theorems
2.3 Appell polynomials
2.3.1 Definition and examples
2.3.2 Diagrams and the central limit theorem
2.3.3 Expansion of entire functions

3 Volatility models
3.1 The ARCH(∞) model
3.2 LM-SV, LARCH and related models

4 Volatility models - statistical inference
4.1 Location estimation
4.1.1 Symmetric innovations
4.1.2 General innovations
4.1.3 Examples
4.1.4 Cumulants and diagrams
4.2 Conditional maximum likelihood estimation

5 LARCH
5.1 Basic properties
5.2 Moment assumptions
5.3 Long memory and leverage effect
5.4 Limit theorems

6 LARCH - statistical inference
6.1 Location estimation
6.2 Maximum likelihood type estimation
6.2.1 Preliminaries
6.2.2 Estimation with exact conditional variances
6.2.3 Estimation given the finite past
6.2.4 Simulations

7 Aggregation and Estimation
7.1 Introduction
7.2 Aggregating asymptotically stationary AR(1) processes
7.3 Estimation procedure
7.4 Simulations
7.5 Bias and MSE of the serial correlation coefficient

8 Concluding remarks

Bibliography

Appendix

Chapter 1

Introduction

Many financial time series such as log-returns of asset prices or exchange rates seem to be stationary and uncorrelated. On the other hand, they simultaneously possess time-varying volatility and strong dependencies, measured by the autocovariances of some non-linear transformation such as the absolute values or squares.

To model this kind of behavior, Engle (1982) proposed in his path-breaking paper the autoregressive conditional heteroskedastic (ARCH) process. Since then, an enormous number of related models have been introduced in financial time series analysis and have become essential tools in practice, for instance in prediction, risk assessment, risk management, option pricing and portfolio management. The innovative feature of ARCH is conditional heteroskedasticity, i.e. time-varying (and stochastic) conditional variance, which improved econometric methods to such an extent that Engle was awarded the Nobel Memorial Prize in Economic Sciences in 2003. In its simplest form, the ARCH(1) process can be written as

$$X_t = \varepsilon_t \sigma_t, \qquad \sigma_t^2 = \beta_0 + \beta_1 X_{t-1}^2, \qquad t \in \mathbb{Z},$$

where $\beta_0 > 0$, $0 \le \beta_1 < 1$ are model parameters and the so-called innovations $(\varepsilon_t)_{t\in\mathbb{Z}}$ are an i.i.d. sequence with zero mean and unit variance. Thus, for a causal stationary solution (in the sense that $X_t$ is measurable with respect to $\mathcal{F}_t = \sigma\{\varepsilon_s, s\le t\}$), one gets

$$E[X_t \mid \mathcal{F}_{t-1}] = 0, \qquad \operatorname{var}(X_t \mid \mathcal{F}_{t-1}) = \sigma_t^2$$

and further

$$\operatorname{cov}(X_t, X_s) = 0 \ (t \ne s), \qquad \operatorname{cov}(X_t^2, X_s^2) = \beta_1^{|t-s|}\operatorname{var}(X_0^2) \ne 0. \tag{1.1}$$
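These two properties can be seen in a quick simulation. The following sketch is ours, not from the thesis: the function names are hypothetical, and standard normal innovations are assumed. It iterates the ARCH(1) recursion and compares the sample autocorrelations of the series and of its squares.

```python
import numpy as np

def simulate_arch1(n, beta0, beta1, burn=1000, seed=1):
    """Iterate X_t = eps_t * sigma_t with sigma_t^2 = beta0 + beta1 * X_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = eps[t] * np.sqrt(beta0 + beta1 * x[t - 1] ** 2)
    return x[burn:]                      # discard the burn-in period

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

x = simulate_arch1(200_000, beta0=0.5, beta1=0.3)
print(acf(x, 1))       # close to 0: the returns are uncorrelated
print(acf(x ** 2, 1))  # clearly positive: the squares are correlated
```

With $\beta_1 = 0.3$, the lag-one autocorrelation of the squares is close to $\beta_1$, as (1.1) predicts, while the returns themselves show no apparent autocorrelation.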

Figure 1.1: Daily log-returns of DAX from November 1990 until June 2008: series, acf, squared series, acf of squares, log-log plot of the periodogram and variance plot in log-log scale. The latter has been generated by dividing the whole series into subseries of decreasing lengths, where for each length the sample variance of the subseries means has been calculated.

Obviously, the conditional standard deviation $\operatorname{sd}(X_t \mid \mathcal{F}_{t-1}) = \sigma_t$ (named volatility) is time varying, and together with (1.1) the basic properties of financial time series are captured by ARCH. Bollerslev (1986) introduced the generalized ARCH (GARCH(p,q)) model given by the equations

$$X_t = \varepsilon_t \sigma_t, \qquad \sigma_t^2 = \beta_0 + \sum_{i=1}^{p} \alpha_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \beta_j X_{t-j}^2, \tag{1.2}$$

where $\beta_0 > 0$, $\alpha_i, \beta_j \ge 0$ and $p, q \ge 0$. Though GARCH allows for rather flexible and parsimonious modelling, it can be shown (see section 3.1) that the autocovariances of the squares, if they exist, decay exponentially fast, analogously to (1.1). This is in contrast to several studies of the observation that empirical autocorrelations of squared log-returns often persist over long stretches of time (see Dacorogna et al. 1993, Ding et al. 1993, Baillie et al. 1996, Breidt et al. 1998, Liu 2000, Andersen et al. 2001, Beran and Ocker 2001, Mikosch and Starica 2003). Compare, for illustration, the correlogram of log-returns and

squared log-returns of the DAX stock index series in figure 1.1: While there is no apparent autocorrelation in the original series, the autocorrelations of the squares decay very slowly, in particular much more slowly than the exponentially decreasing autocorrelations of squared GARCH processes. Further, the log-log plot of the periodogram exhibits a negative slope near the origin, and the variance plot shows a similar behavior, both indicating long memory in squares. By long memory we mean that the autocovariances of a second-order stationary process are not absolutely summable. For instance, let

$$\operatorname{cov}(X_t^2, X_{t+k}^2) \sim b k^{2d-1} \quad \text{as } k \to \infty \tag{1.3}$$

with constants $b > 0$ and $d > 0$; then

$$\sum_{k \in \mathbb{Z}} |\operatorname{cov}(X_t^2, X_{t+k}^2)| = \infty. \tag{1.4}$$

During the last 15 years or so, modifications of ARCH and GARCH have been introduced to include the possibility of slowly decaying autocovariances in squares or volatility, so-called long memory volatility models. Examples are ARCH(∞), FIGARCH, FIEGARCH, LARCH, quadratic ARCH(∞) and stochastic volatility models (LM-SV) - though, as it turned out, not all of these models exhibit long memory in the sense of (1.4), even when their correlations decay hyperbolically. Note that (1.3) with $d \in (0, \tfrac{1}{2})$ (a range of values empirically often observed in financial data) leads to $2d-1 \in (-1, 0)$ and thus to long memory (1.4), whereas $d < 0$ leads to summable autocovariances.
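A small numeric check (ours, with the constant $b$ in (1.3) set to 1) illustrates the dichotomy between (1.3) and (1.4): for $d \in (0, \tfrac{1}{2})$ the exponent $2d-1$ lies in $(-1, 0)$, so the partial sums of $k^{2d-1}$ diverge like $M^{2d}/(2d)$.

```python
import numpy as np

def partial_sum(d, M):
    """Partial sum of the model autocovariances k^(2d-1) from (1.3), with b = 1."""
    k = np.arange(1, M + 1, dtype=float)
    return float(np.sum(k ** (2 * d - 1)))

d = 0.2   # 2d - 1 = -0.6 > -1: the long memory case
for M in (10**3, 10**4, 10**5, 10**6):
    print(M, partial_sum(d, M))
# The sums grow like M^(2d)/(2d) without bound, in line with (1.4);
# for d < 0 the exponent drops below -1 and the series would converge.
```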

The problem of statistical inference for short memory volatility models is extensively studied, see e.g. Straumann (2004). On the other hand, statistical analysis for long memory models is still a topic with many open research problems.

This thesis mainly studies three such problems. First, we consider sequences $Y_t = \mu + X_t$, where $X_t = \varepsilon_t\sigma_t$ is a volatility model satisfying (1.3) without further parametric specifications. The question is then whether (1.3) has an effect on M-estimators of the location parameter $\mu$, compared to the case of independence or short memory in $X_t^2$. It turns out that the answer essentially depends on the symmetry of the innovations $\varepsilon_t$. While symmetric $\varepsilon_t$ imply the usual asymptotic properties of M-estimators, asymmetry can lead to non-standard convergence rates, non-gaussian asymptotic distributions and thus to different confidence intervals and tests. In this context, a connection to Appell polynomials and limit

theorems for linear long memory processes is derived.

Secondly, we focus on the parametric LARCH process, which was introduced by Robinson (1991) to allow for long memory in volatility and asymmetries such as the leverage effect. The latter describes the empirically observed property that volatility is more sensitive to falling markets than to rising markets, which cannot be replicated by the classical GARCH model, where $\sigma_t$ is a function of absolute past returns. Though theoretical properties of LARCH have been investigated in the recent literature, there is a lack of estimation methods and corresponding asymptotic theory. We show in this thesis that standard methods lead to difficulties, and propose modified parameter estimators for which consistency and asymptotic normality are derived. As a result, the usual $\sqrt{n}$-normalization still holds for short memory versions of the model, whereas for general LARCH processes the rate of convergence is affected by the long memory property. There are two reasons for the emphasis on LARCH. On the one hand, the long memory property can be derived easily, so the probabilistic aspects can be treated rigorously. On the other hand, there are no latent processes in the definition of LARCH, and direct estimation of the model parameters is possible. Compare this, e.g., to stochastic volatility models, where long memory is generated by an unobservable process.

The findings lead to the question how more efficient estimators can be defined for LARCH processes in particular, and for long memory models in general. In this context, we should recall that a possible explanation of long memory is given by contemporaneous aggregation, in the sense of summing or averaging across different micro-level short memory processes. It is then possible that the aggregated process has long memory, where the long memory parameter is characterized exclusively by the distribution of the parameters of the micro-level processes. Thus, inference for the long memory parameter can be based on estimators for the short memory processes, which often have a faster rate of convergence compared to corresponding estimators for long memory models, as is the case for LARCH. We investigate the basic problem whether the described procedure works in the elementary case of aggregating random AR(1) processes. In a panel set-up, where the AR(1) parameters have a Beta-type distribution, we introduce a new parameter estimator for the (possibly) long memory aggregated process and derive the main asymptotic properties.

In addition to the statistical problems mentioned above, we present related models and results as prerequisites or supplements to our study. The thesis is structured as follows: In chapter 2 we state central and non-central limit theorems for gaussian and linear long memory processes. Moreover, we introduce Hermite and Appell polynomials, where the connection of the latter to limit theorems is described in detail. The results will be used in chapter 4. In chapter 3 we give a brief overview of available volatility models and discuss the existence of long memory in the different processes. The problem of M-estimation is studied in chapter 4, where we also present a standard method for parametric estimation of ARCH(∞) processes, the conditional maximum likelihood estimator. Chapter 5 gives an introduction to the LARCH model. Subsequently, chapter 6 contains the asymptotic study of the modified maximum likelihood estimator together with a result regarding M-estimation for LARCH processes. Finally, the problem of aggregation and estimation is described in chapter 7. The thesis is completed with concluding remarks in chapter 8, a bibliography and an appendix containing the R program code used for the simulations in section 6.2.4.


Chapter 2

Long memory and limit theorems

In this chapter, we collect some well-known results that will be used later in this thesis. The main focus lies on the presentation of limit theorems for sequences of long memory processes. After introducing some standard notation in section 2.1, we consider in section 2.2 the case of functionals of gaussian processes, where the asymptotic behavior of partial sums essentially depends on the connection to Hermite polynomials, and we further present the generalizations to linear processes and corresponding Appell polynomials. The results are used, for instance, in section 4.1.2 for the derivation of similar limit theorems in the context of volatility models. The necessary theory of Appell polynomials, which can be used for the proof of the central limit theorem for linear long memory processes, is described in section 2.3. There, we also outline the problem of Appell polynomial expansions of entire functions, which will also be used in section 4.1.2. For more detailed descriptions of long memory we refer to Beran (1994), Doukhan et al. (2003) and Robinson (2003).

2.1 Basic concepts

We start with a definition of what we mean by long memory:

Definition 2.1 Let $(X_t)_{t\in\mathbb{Z}}$ be a stochastic process.

(a) $(X_t)_{t\in\mathbb{Z}}$ is called strictly stationary if, for all $(t_1, \ldots, t_k)^T \in \mathbb{Z}^k$, $k \ge 1$, the joint distributions of $(X_{t_1+t}, \ldots, X_{t_k+t})$ do not depend on $t \in \mathbb{Z}$.

(b) Define $\mu(t) = E[X_t]$ and $\gamma(t,s) = \operatorname{cov}(X_t, X_s)$. Then $(X_t)_{t\in\mathbb{Z}}$ is called second-order stationary if $E[X_t^2] < \infty$ for all $t \in \mathbb{Z}$, $\mu(\cdot)$ is constant and $\gamma(t,s)$ only depends on $|t-s|$. In this case, denote

$$\gamma_X(k) := \gamma(0, k),$$

called the autocovariance function of $(X_t)_{t\in\mathbb{Z}}$.

(c) A second-order stationary process $(X_t)_{t\in\mathbb{Z}}$ has long memory (or long-range dependence) if

$$\sum_{k \in \mathbb{Z}} |\gamma_X(k)| = \infty.$$

Otherwise, $(X_t)_{t\in\mathbb{Z}}$ has short memory.

As an example, we consider a linear process exhibiting long memory. To this end, let $\xi_t$, $t \in \mathbb{Z}$, be an i.i.d. sequence of random variables with $E[\xi_t] = 0$, $E[\xi_t^2] < \infty$, and coefficients

$$b_j = c j^{d-1}, \quad j = 1, 2, \ldots,$$

where $c > 0$ and $d \in (0, \tfrac{1}{2})$. Then, define the linear process

$$X_t = \sum_{j=1}^{\infty} b_j \xi_{t-j}, \quad t \in \mathbb{Z}.$$

The autocovariances are thus given by

$$E[X_0 X_k] = \sum_{j=1}^{\infty} b_j b_{j+k} = c^2 \sum_{j=1}^{\infty} j^{d-1}(j+k)^{d-1} = k^{2d-1} c^2 \sum_{j=1}^{\infty} \left(\frac{j}{k}\right)^{d-1} \left(1 + \frac{j}{k}\right)^{d-1} \frac{1}{k} =: k^{2d-1}\Lambda(k).$$

Note that the following truncated sum is a Riemann approximation of the corresponding integral,

$$\sum_{j=1}^{[Nk]} \left(\frac{j}{k}\right)^{d-1} \left(1 + \frac{j}{k}\right)^{d-1} \frac{1}{k} \;\to\; \int_0^N x^{d-1}(1+x)^{d-1}\,dx, \quad \text{as } k \to \infty,$$

where convergence is uniform in $N > 0$. This leads to

$$\lim_{k\to\infty}\lim_{N\to\infty} \sum_{j=1}^{[Nk]} \left(\frac{j}{k}\right)^{d-1} \left(1 + \frac{j}{k}\right)^{d-1} \frac{1}{k} = \lim_{N\to\infty}\lim_{k\to\infty} \sum_{j=1}^{[Nk]} \left(\frac{j}{k}\right)^{d-1} \left(1 + \frac{j}{k}\right)^{d-1} \frac{1}{k} = \int_0^{\infty} x^{d-1}(1+x)^{d-1}\,dx < \infty.$$

In particular, $\Lambda(\cdot)$ is a slowly varying function, in the sense that $\Lambda(\cdot)$ is positive and $\lim_{j\to\infty} \Lambda(aj)/\Lambda(j) = 1$ for all $a > 0$. Hence, by Karamata's theorem,

$$\sum_{k=1}^{M} \gamma_X(k) = \sum_{k=1}^{M} \Lambda(k) k^{2d-1} \sim \Lambda(M)\,\frac{1}{2d}\,M^{2d} \to \infty \quad \text{as } M \to \infty.$$

Here, "$\sim$" means that the ratio of the expressions on the left and right hand side converges to one. Consequently, $X_t$ has long memory.
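The hyperbolic decay of these autocovariances can also be seen numerically. The sketch below is our illustration, not from the thesis; the truncation point `nmax` is an arbitrary choice. It evaluates the series for $\gamma_X(k)$ directly and checks that quadrupling the lag shrinks the autocovariance by roughly $4^{2d-1}$.

```python
import numpy as np

def gamma_X(k, d, c=1.0, nmax=500_000):
    """Truncated autocovariance  c^2 * sum_j j^(d-1) (j+k)^(d-1)
    of the linear process with coefficients b_j = c j^(d-1)."""
    j = np.arange(1, nmax, dtype=float)
    return c * c * float(np.sum(j ** (d - 1) * (j + k) ** (d - 1)))

d = 0.2
g100, g400 = gamma_X(100, d), gamma_X(400, d)
# Hyperbolic decay gamma_X(k) ~ Lambda(k) k^(2d-1): quadrupling the lag
# should shrink the value by roughly 4^(2d-1) ~ 0.44 (the slowly varying
# factor makes the observed ratio approach this limit only slowly).
print(g400 / g100)
```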

Finally, we introduce some notation which will be used throughout this thesis: Let $(Y_n)_{n\ge 1}$ and $Y$ be random variables. Then $Y_n \xrightarrow{d} Y$, $Y_n \xrightarrow{p} Y$, $Y_n \xrightarrow{L^2} Y$ stand as usual for convergence in distribution, in probability and in the $L^2$-norm, respectively. Moreover, weak convergence of a continuous time process $Y_n(u)$, $u \in [0,1]$, to $Y(u)$, $u \in [0,1]$, in the space $D[0,1]$ equipped with the Skorohod metric is denoted by $Y_n(u) \xrightarrow{D} Y(u)$. In addition, the Landau symbols $o(\cdot)$ and $O(\cdot)$ are used.

2.2 Limit theorems

The asymptotic behavior of partial sums $\sum_{t=1}^{n} Y_t$ for stationary processes $Y_t$ with linear long-range dependence is well known (see e.g. Rosenblatt 1961, Taqqu 1975, 1979, Dobrushin and Major 1979, Surgailis 1981, 1982, Giraitis 1983, 1985, Giraitis and Surgailis 1985, 1986, 1999, Avram and Taqqu 1987, Dehling and Taqqu 1989, Arcones and Yu 1994, Ho and Hsing 1996, 1997). One should already mention that asymptotic theory for processes that exhibit long-range dependence in volatility, which will be defined later, is much less developed. Two recent references are, for instance, Berkes and Horváth (2003) and Giraitis et al. (2000b), where certain limit theorems for LARCH processes are derived. The results of the latter two papers are summarized in section 5.4.

Here, we describe limit theorems for $Y_t = f(X_t)$, where (i) $X_t$ is gaussian and $f$ admits a Hermite expansion, and (ii) $X_t$ is linear and $f$ admits an Appell expansion. We start with case (i) and let $(X_t)_{t\in\mathbb{Z}}$ be a stationary gaussian process with $E[X_t] = 0$, $E[X_t^2] = 1$ and covariances such that

$$\gamma_X(t) = E[X_0 X_t] \sim t^{-q}\Lambda(t), \quad t \to \infty, \tag{2.1}$$

for some $0 < q < 1$ and a slowly varying function $\Lambda(t)$ (in fact $\Lambda$ may take negative values as well). For a function $H$ with $E[H(X_0)] = 0$ and $E[H(X_0)^2] < \infty$, we

derive the asymptotic behavior of

$$S_{n,H}(u) := \sum_{t=1}^{[nu]} H(X_t).$$

(We will also use the notation $S_{n,H} := S_{n,H}(1)$.) To this end, define Hermite polynomials as follows:

Definition 2.2 For $m \ge 0$, the Hermite polynomials $H_m$ are defined by

$$H_m(x) = (-1)^m e^{x^2/2}\,\frac{d^m}{dx^m}\,e^{-x^2/2}.$$
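As an aside (not part of the thesis), the equivalent three-term recurrence $H_{m+1}(x) = xH_m(x) - mH_{m-1}(x)$ gives a convenient way to evaluate these polynomials, and Gauss quadrature for the weight $e^{-x^2/2}$ lets one check their orthogonality numerically:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def hermite(m, x):
    """Probabilists' Hermite polynomial H_m(x) via the three-term recurrence
    H_{m+1} = x H_m - m H_{m-1}, equivalent to Definition 2.2."""
    h0, h1 = np.ones_like(x), x
    if m == 0:
        return h0
    for k in range(1, m):
        h0, h1 = h1, x * h1 - k * h0
    return h1

# Gauss quadrature nodes/weights for the weight exp(-x^2/2):
nodes, weights = hermegauss(30)
weights = weights / weights.sum()   # normalize into a probability measure

def moment(f):
    """E[f(X)] for X ~ N(0,1), exact for polynomials of degree < 60."""
    return float(np.sum(weights * f(nodes)))

# E[H_n(X) H_m(X)] = delta_{n,m} n!  for X ~ N(0,1):
print(moment(lambda t: hermite(2, t) * hermite(3, t)))   # ~ 0
print(moment(lambda t: hermite(3, t) ** 2))              # ~ 3! = 6
```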

Since $(H_m)_{m\ge 0}$ constitutes a complete orthogonal system in the space $\{H : \mathbb{R} \to \mathbb{R} : E[H(X_0)] = 0,\ E[H^2(X_0)] < \infty\}$, see e.g. Abramowitz and Stegun (1972), the following expansion holds:

$$H(X_t) = \sum_{m=0}^{\infty} c_m H_m(X_t),$$

where $\sum_{m=0}^{\infty} c_m^2\, m! < \infty$; that means the series converges in $L^2$, since the variance of $H_m(X_t)$ is given by $\operatorname{var}(H_m(X_t)) = m!(\gamma_X(0))^m$, see (2.2). Moreover, Hermite polynomials are uncorrelated and thus the coefficients can be calculated by $c_m = \frac{1}{m!}E[H(X_0)H_m(X_0)]$. The lowest integer $m^* \in \mathbb{N}$ with $c_{m^*} \ne 0$ is then called the Hermite rank of $H$. Denote by $\gamma_H(t)$ the covariance function of $H(X_t)$ and note that the (cross-)covariances of Hermite polynomials are given by

$$E[H_k(X_0)H_j(X_t)] = \delta_{k,j}\, k!\, (\gamma_X(t))^k. \tag{2.2}$$

(The latter can be shown, e.g., by the diagram formula in theorem 2.5 below.) Thus

$$\gamma_H(t) = \sum_{n,m=m^*}^{\infty} c_n c_m E[H_n(X_t)H_m(X_0)] = \gamma_X(t)^{m^*} \sum_{m=m^*}^{\infty} c_m^2\, m!\, \gamma_X(t)^{m-m^*}. \tag{2.3}$$

For $m > 1/q$, the covariances of $H_m(X_t)$ are absolutely summable, since

$$\sum_{t=1}^{\infty} |\gamma_X(t)|^{m} = \sum_{t=1}^{\infty} t^{-qm}\Lambda^{m}(t) < \infty.$$

Hence, (2.3) leads to

$$\sum_{t=1}^{\infty} |\gamma_H(t)| \le c \sum_{t=1}^{\infty} |\gamma_X(t)|^{m^*},$$

where the constant $c$ is such that $c \ge \sum_{m=m^*}^{\infty} c_m^2\, m!\, \gamma_X(t)^{m-m^*}$ for all $t \in \mathbb{Z}$ (note that $\gamma_X(t) \to 0$), i.e. one gets summability of the covariances of the process $H(X_t)$. Conversely, (2.3) and $\gamma_X(t) \to 0$ imply

$$|\gamma_H(t)| \ge |\gamma_X(t)|^{m^*} c_{m^*}^2\, m^*!/2 \quad \text{for } t \text{ large}.$$

Consequently, we have the equivalence

$$\sum_{t=0}^{\infty} |\gamma_H(t)| < \infty \iff \sum_{t=0}^{\infty} |\gamma_X(t)|^{m^*} < \infty,$$

meaning that a function $H$ with Hermite rank $m^* > 1/q$ leads to short memory of the process $H(X_t)$. Indeed, Giraitis and Surgailis (1985) proved the following theorem:

Theorem 2.1 Let $(X_t)_{t\in\mathbb{Z}}$ be a zero mean gaussian process for which (2.1) holds. If $\sum_{t=0}^{\infty} |\gamma_H(t)| < \infty$ (i.e. $m^* > 1/q$) and $\sigma^2 := \sum_{t=0}^{\infty} \gamma_H(t) > 0$, then

$$n^{-1/2} S_{n,H}(u) \xrightarrow{D} \sigma B(u),$$

where $B(u)$ is a standard Brownian motion.

Next, we consider the case $m < 1/q$. First, we derive the order of growth of the variance of $S_{n,m} := S_{n,H_m}$ in the simple case $\gamma_X(t) = ct^{-q}$ for a constant $0 < c < \infty$:

$$\begin{aligned}
\operatorname{var}(S_{n,m}) &= n \operatorname{var}(H_m(X_0)) + 2n \sum_{t=1}^{n-1}\left(1 - \frac{t}{n}\right) E[H_m(X_0)H_m(X_t)] \\
&= n \operatorname{var}(H_m(X_0)) + 2 c^m m!\, n \sum_{t=1}^{n-1}\left(1 - \frac{t}{n}\right) t^{-qm} \\
&= n \operatorname{var}(H_m(X_0)) + 2 c^m m!\, n \sum_{t=1}^{n-1} t^{-qm} - 2 c^m m!\, n \sum_{t=1}^{n-1} \frac{t}{n}\, t^{-qm}.
\end{aligned} \tag{2.4}$$

Thus, for $n \to \infty$,

$$\sum_{t=1}^{n-1} t^{-qm} = n^{1-qm} \sum_{t=1}^{n-1} \left(\frac{t}{n}\right)^{-qm} \frac{1}{n} \sim n^{1-qm} \int_0^1 x^{-qm}\,dx,$$

since the sum is a Riemann approximation of the latter integral. Analogously, we get for the last term in (2.4) that

$$\sum_{t=1}^{n-1} t^{1-qm}\, \frac{1}{n} = n^{1-qm} \sum_{t=1}^{n-1} \left(\frac{t}{n}\right)^{1-qm} \frac{1}{n} \sim n^{1-qm} \int_0^1 x^{1-qm}\,dx,$$

and hence $\lim_{n\to\infty} \operatorname{var}(S_{n,m})/n^{2-qm} = \text{const}$. Consequently, since $2 - qm > 1$, the usual central limit theorem with standard $\sqrt{n}$-scaling cannot hold for sums of $H_m(X_t)$. Indeed, even a non-normal limiting distribution can arise, and the same is true for $S_{n,H}(u)$, as the next theorem, which goes back to Taqqu (1979) and Dobrushin and Major (1979), states.

Theorem 2.2 Let $m^*$ be the Hermite rank of $H$ and $(X_t)_{t\in\mathbb{Z}}$ a zero-mean gaussian process for which (2.1) holds with $0 < q < 1/m^*$. Further, denote $A_n = n^{1-m^*q/2}\Lambda^{m^*/2}(n)$; then

$$\frac{1}{A_n}\, S_{n,H}(u) = \frac{1}{A_n}\sum_{t=1}^{[nu]} H(X_t) \xrightarrow{D} c_{m^*}\mathcal{H}_{m^*}(u)$$

as $n \to \infty$, where $c_{m^*}$ is the $m^*$-th coefficient in the Hermite expansion of $H$.

Here, the Hermite process $\mathcal{H}_k(u)$ of order $k \ge 1$ is defined by

$$\mathcal{H}_k(u) = C_q \int_{\mathbb{R}^k} \left( \int_0^u \prod_{j=1}^{k} (s - y_j)_+^{-(1+q)/2}\, ds \right) dB(y_1)\cdots dB(y_k),$$

where $B(\cdot)$ is a standard Brownian motion and $C_q$ is a positive constant (see e.g. Taqqu 1979). For $k = 1$, $\mathcal{H}_k(u)$ is fractional Brownian motion, and thus gaussian, while $\mathcal{H}_k(u)$ has non-normal marginal distributions for $k \ge 2$.

The preceding theorem can be understood as a reduction principle, in the sense that the asymptotic properties of $S_{n,H}(u)$ depend only on the Hermite rank $m^*$ of $H$. In particular, the sum of $H(X_t)$ and the sum of $c_{m^*}H_{m^*}(X_t)$ have the same limiting distribution.
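The growth rate $\operatorname{var}(S_{n,m}) \sim \text{const} \cdot n^{2-qm}$ derived in (2.4) can be checked deterministically by plugging the model covariance $\gamma_X(t) = ct^{-q}$ into the variance formula. This is our sketch; the choices of $c$, $q$ and $m$ below are arbitrary, subject to $qm < 1$.

```python
import numpy as np
from math import factorial

def var_Snm(n, m, q, c=0.5):
    """Evaluate (2.4) with gamma_X(t) = c * t^(-q) for t >= 1:
    var(S_{n,m}) = n*m! + 2*m!*c^m*n * sum_{t=1}^{n-1} (1 - t/n) t^(-q*m),
    using var(H_m(X_0)) = m! for a standardized gaussian process."""
    t = np.arange(1, n, dtype=float)
    s = float(np.sum((1.0 - t / n) * t ** (-q * m)))
    return n * factorial(m) + 2.0 * factorial(m) * c**m * n * s

q, m = 0.3, 2            # q*m = 0.6 < 1: the long memory regime
for n in (10**3, 10**4, 10**5):
    # The normalized variance var(S_{n,m}) / n^(2-qm) settles at a constant:
    print(n, var_Snm(n, m, q) / n ** (2 - q * m))
```

The printed ratios stabilize as $n$ grows, while $\operatorname{var}(S_{n,m})/n$ diverges, confirming that the standard $\sqrt{n}$-scaling fails in this regime.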

Giraitis (1985), Surgailis (1982) and Avram and Taqqu (1987) showed that this reduction principle, together with the central limit theorem 2.1, can be generalized to linear processes (our case (ii)), which is described in the following. Let $X_t$ be given by

$$X_t = \sum_{s\in\mathbb{Z}} b(t+s)\,\xi_s, \quad t \in \mathbb{Z}, \tag{2.5}$$

with coefficients

$$b(t) = \Lambda(|t|)\,|t|^{d-1}, \quad t \in \mathbb{Z}, \tag{2.6}$$

where $\Lambda$ is a slowly varying function and $d \in (0, \tfrac{1}{2})$. (Here, the reason for the two-sided moving average representation of $X_t$ is a more convenient notation in section 2.3.2.) Moreover, the $\xi_s$, $s \in \mathbb{Z}$, are i.i.d. with $E[\xi_s] = 0$, $E[\xi_s^2] = 1$ and $E[|\xi_s|^k] < \infty$ for all $k \ge 0$. The covariance function is then given by $\gamma_X(t) = E[X_0 X_t] = \Lambda_1(t)\, t^{2d-1}$ for a slowly varying function $\Lambda_1$. Due to the more general marginal distributions of $X_t$, the Hermite polynomials have to be replaced by Appell polynomials, defined as follows:

Definition 2.3 Let $X$ be a random variable with finite moments up to order $M$, i.e. $E[|X|^M] < \infty$. Then the corresponding Appell polynomials $A_m$ are defined by $A_0(x) = 1$ and, for $m = 1, \ldots, M$, recursively by

$$\frac{d}{dx}A_m(x) = m A_{m-1}(x), \qquad E[A_m(X)] = \delta_{0,m}.$$

In the next section, we give a more detailed description of the polynomials $A_m$. However, one immediately sees that each $A_m$ is a polynomial of degree $m$ for $m \ge 0$. Thus, every polynomial $G$ of degree $M$ can be uniquely expanded as

$$G(X_t) = \sum_{m=0}^{M} c_m A_m(X_t).$$

Corresponding to the gaussian case, the lowest $m$ with $c_m \ne 0$ is called the Appell rank and is denoted by $m^*$. Write $S_{n,G}(u) := \sum_{t=1}^{[nu]} G(X_t)$ and let $\gamma_G(t)$ be the covariance function of $G(X_t)$. Then the following theorem, due to Giraitis (1985), is the (partial) analogue of theorem 2.1.

Theorem 2.3 Let (2.5) and (2.6) hold and denote by $m^*$ the Appell rank of $G$. For $m^* \ge 2$, the conditions

$$\sum_{t=0}^{\infty} |\gamma_G(t)| < \infty \quad \text{and} \quad \sum_{t=0}^{\infty} |\gamma_X(t)|^{m^*} < \infty \tag{2.7}$$

are equivalent. Moreover, if one of them holds and $\sigma^2 := \sum_{t=0}^{\infty} \gamma_G(t) > 0$, then

$$n^{-1/2} S_{n,G}(u) \xrightarrow{D} \sigma B(u). \tag{2.8}$$

Hence, if $m^* > 1/(1-2d)$, (2.7) is fulfilled and the central limit theorem holds. The next theorem considers the case $m^* < 1/(1-2d)$. The result was originally proven by Surgailis (1982), whereas the connection to Appell polynomials was carried out in Avram and Taqqu (1987), see also Surgailis (2003).

Theorem 2.4 For $1 \le m < 1/(1-2d)$, let $A_m$ denote the $m$-th Appell polynomial corresponding to the linear process (2.5) and define

$$S_{n,A_m}(u) := \sum_{t=1}^{[nu]} A_m(X_t).$$

Then $B_{n,m}^2 := E[S_{n,A_m}^2(1)] \sim b_m \Lambda^m(n)\, n^{2-m(1-2d)}$ with a constant $b_m > 0$, and

$$B_{n,m}^{-1}\, S_{n,A_m}(u) \xrightarrow{D} \mathcal{H}_m(u)$$

as $n$ tends to infinity.

Thus, given a polynomial function $G$ with Appell rank $m^* < (1-2d)^{-1}$, one gets $\operatorname{var}(S_{n,A_m}(u)) = o(B_{n,m^*}^2)$ for $m > m^*$, and the leading term in the Appell expansion of $G(X_t)$ is $c_{m^*}A_{m^*}(X_t)$. Therefore, the preceding non-central limit theorem also holds for $G(X_t)$.

Finally, note that we have only considered polynomials $G$ so far, since the Appell expansion of more general functions is rather complicated, see the next section. However, one should mention that Giraitis (1985) also proved that an expansion

$$G(X_t) = \sum_{m=m^*}^{\infty} c_m A_m(X_t)$$

still holds for entire functions $G$ satisfying a certain growth condition (see theorem 2.7 below), by which theorem 2.3 can be proven for $G(X_t)$ if $m^* > (1-2d)^{-1}$. Thus, if $m^* < (1-2d)^{-1}$, theorem 2.4 can also be applied to $G(X_t)$ by decomposing $G = G_1 + G_2$, where $G_1$ is a polynomial with Appell rank $m^*(1) < (1-2d)^{-1}$ and $G_2$ is an entire function with rank $m^*(2) > (1-2d)^{-1}$. Then theorem 2.4 can be applied to $G_1(X_t)$, while the variance of $S_{n,G_2}(u)$ is of order $O(n)$, and hence of smaller order than $\operatorname{var}(S_{n,G_1}(u))$, leading to asymptotic negligibility of $S_{n,G_2}(u)$.

2.3 Appell polynomials

For detailed descriptions of Appell polynomials and their role in the context of limit theorems for linear processes, see Avram and Taqqu (1987), Giraitis and Surgailis (1986), Giraitis (1985), Surgailis (2003) and Schützner (2006). Here, we give the definition and some examples in section 2.3.1, the connection to cumulants (the so-called diagram formula) and the application to the central limit theorem in section 2.3.2, and describe the problem of Appell polynomial expansions in section 2.3.3.

2.3.1 Definition and examples

We start with an alternative definition of Appell polynomials. First, assume that the moment-generating function $M_X(z) = E[e^{zX}]$ exists for $|z| < r$ with $r > 0$. Then $M_X(0) = 1$ and $1/M_X(z)$ is analytic in a neighborhood of $0$, leading to a power series in the variable $z$ for $\exp(zx)/M_X(z)$ with fixed $x \in \mathbb{R}$.

Definition 2.4 Let $X$ be a random variable with $M_X(z) = E[e^{zX}] < \infty$ for $|z| < r$. Then the Appell polynomials $A_m(x)$ are defined by

$$\sum_{m=0}^{\infty} \frac{z^m}{m!} A_m(x) = \frac{\exp(zx)}{E[e^{zX}]}. \tag{2.9}$$

The function $\frac{\exp(zx)}{E[e^{zX}]}$ is called the generating function of $(A_m)_{m\ge 0}$.

Note that for this definition, the moment-generating function actually need not exist as an analytical object. By matching coefficients with respect to the generating function

$$\frac{\exp(zx)}{\sum_{m=0}^{M} \frac{z^m}{m!}E[X^m]},$$

it suffices that moments of $X$ up to order $M \le \infty$ are finite. Consequently, if $M < \infty$, the Appell polynomials $A_m$ can only be defined for $m \le M$.
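For a distribution with finitely many known moments, this coefficient matching can be carried out mechanically. The sketch below is ours, not from the thesis: it computes the exponential-generating-function coefficients of $1/M_X(z)$ from the moments and assembles $A_m(x)$; with standard normal moments it recovers the Hermite polynomials.

```python
from math import comb

def appell_from_moments(mu, M):
    """Appell polynomials from moments via the generating function (2.9):
    exp(zx) / M_X(z), with mu[j] = E[X^j] and mu[0] = 1.
    Returns coefficient lists, lowest degree first."""
    # EGF coefficients r_n of 1/M_X(z), from  sum_j C(n,j) mu[j] r_{n-j} = 0  (n >= 1):
    r = [1.0]
    for n in range(1, M + 1):
        r.append(-sum(comb(n, j) * mu[j] * r[n - j] for j in range(1, n + 1)))
    # Multiplying the two EGFs gives  A_m(x) = sum_k C(m,k) r_{m-k} x^k:
    return [[comb(m, k) * r[m - k] for k in range(m + 1)] for m in range(M + 1)]

# Standard normal moments 1, 0, 1, 0, 3 recover the Hermite polynomials:
A = appell_from_moments([1.0, 0.0, 1.0, 0.0, 3.0], 3)
print(A[2], A[3])   # coefficients of x^2 - 1  and  x^3 - 3x
```

The same routine applied to the exponential moments $m!/\lambda^m$ yields the polynomials $x^m - (m/\lambda)x^{m-1}$ derived later in this section.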

For the first derivative $A_m'(x) = \frac{d}{dx}A_m(x)$, we have

$$\sum_{m=1}^{\infty} \frac{z^m}{m!} A_m'(x) = \frac{d}{dx}\,\frac{\exp(zx)}{E[e^{zX}]} = \frac{z\exp(zx)}{E[e^{zX}]} = \sum_{m=0}^{\infty} \frac{z^{m+1}}{(m+1)!}(m+1)A_m(x).$$

Thus

$$A_m'(x) = m A_{m-1}(x). \tag{2.10}$$

Moreover, by taking the expected value in (2.9), we get

$$\sum_{m=0}^{\infty} \frac{z^m}{m!} E[A_m(X)] = E\left[\frac{\exp(zX)}{E[e^{zX}]}\right] = 1,$$

yielding

$$E[A_m(X)] = \delta_{0,m}. \tag{2.11}$$

Note, however, that the justification of this argument relies on convergence of the series in $L^2(X)$ (or at least in $L^1(X)$). On the other hand, (2.11) can also be shown by combinatorial arguments, see e.g. Schützner (2006), where the equivalence of definitions 2.4 and 2.3 is also derived.

We now calculate $(A_m)_{m\ge 0}$ in two examples. First, let $X$ be distributed according to an exponential distribution with density

$$f(x) = \lambda e^{-\lambda x}, \quad x > 0,\ \lambda > 0.$$

Then $M_X(z)$ is given by

$$M_X(z) = \frac{1}{1 - \lambda^{-1}z}, \quad |z| < \lambda.$$

Thus we have to compare the coefficients in the following equation:

$$\sum_{m=0}^{\infty} \frac{z^m}{m!}A_m(x) = \exp(zx)(1 - \lambda^{-1}z) = \sum_{m=0}^{\infty} \frac{z^m}{m!}x^m - \sum_{m=0}^{\infty} \frac{z^{m+1}}{\lambda\, m!}x^m = 1 + \sum_{m=1}^{\infty} \frac{z^m}{m!}\left(x^m - \frac{m}{\lambda}x^{m-1}\right).$$

Hence,

$$A_0(x) = 1, \qquad A_m(x) = x^m - \frac{m}{\lambda}x^{m-1}, \quad m \ge 1.$$

Observe that $E[X^m] = \frac{m!}{\lambda^m}$ implies $E[X^n - \frac{n}{\lambda}X^{n-1}] = 0$, and thus definition 2.3 is also fulfilled.
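The identity $E[A_m(X)] = \delta_{0,m}$ from definition 2.3 can be confirmed numerically for this example. The sketch is ours, with the hypothetical choice $\lambda = 2$:

```python
from math import factorial

lam = 2.0   # hypothetical rate parameter

def moment(m):
    """E[X^m] = m! / lam^m for X ~ Exponential(lam)."""
    return factorial(m) / lam ** m

def E_Am(m):
    """E[A_m(X)] for the Appell polynomials A_m(x) = x^m - (m/lam) x^(m-1)."""
    if m == 0:
        return 1.0
    return moment(m) - (m / lam) * moment(m - 1)

print([E_Am(m) for m in range(5)])   # [1.0, 0.0, 0.0, 0.0, 0.0]
```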

Next, we show that Appell polynomials are indeed a generalization of Hermite polynomials. To this end, let $X$ be standard normally distributed, leading to the moment generating function $M_X(z) = \exp(z^2/2)$, and consider
\[
\sum_{m=0}^{\infty} \frac{z^m}{m!}\, A_m(x) = \exp\left( zx - \frac{z^2}{2} \right).
\]
Thus
\[
A_m(x) = \frac{d^m}{dz^m} \exp\left( \frac{x^2}{2} - \frac{1}{2}(x - z)^2 \right) \bigg|_{z=0} = e^{x^2/2}\, \frac{d^m}{dz^m}\, e^{-\frac{1}{2}(x - z)^2} \bigg|_{z=0} = (-1)^m e^{x^2/2}\, \frac{d^m}{dx^m}\, e^{-\frac{1}{2}(x - z)^2} \bigg|_{z=0} = H_m(x);
\]
compare to definition 2.2.
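As a check on this identification, the same moment-matching recursion $A_n(x) = x^n - \sum_{m<n} \binom{n}{m} \mu_{n-m} A_m(x)$, fed with the standard normal moments $\mu_{2k} = (2k-1)!!$ (odd moments zero), reproduces the probabilists' Hermite polynomials, which satisfy the three-term recurrence $H_{m+1}(x) = x H_m(x) - m H_{m-1}(x)$. A minimal illustrative sketch (names are our own):

```python
from fractions import Fraction
from math import comb

def appell_from_moments(mu, M):
    # mu[p] = E[X^p] with mu[0] = 1; returns A_0..A_M (ascending coefficients)
    A = [[Fraction(1)]]
    for n in range(1, M + 1):
        coeffs = [Fraction(0)] * n + [Fraction(1)]
        for m in range(n):
            c = comb(n, m) * mu[n - m]
            for i, a in enumerate(A[m]):
                coeffs[i] -= c * a
        A.append(coeffs)
    return A

M = 6
# standard normal moments: E[X^{2k}] = (2k-1)!!, odd moments vanish
mu = [Fraction(1)]
for p in range(1, M + 1):
    mu.append(Fraction(0) if p % 2 else (p - 1) * mu[p - 2])

# probabilists' Hermite polynomials via H_{m+1}(x) = x H_m(x) - m H_{m-1}(x)
H = [[Fraction(1)], [Fraction(0), Fraction(1)]]
for m in range(1, M):
    nxt = [Fraction(0)] + H[m]          # multiply H_m by x
    for i, h in enumerate(H[m - 1]):
        nxt[i] -= m * h                 # subtract m * H_{m-1}
    H.append(nxt)

print(appell_from_moments(mu, M) == H)  # True
```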

Finally, we will show that the $H_m$ are the only Appell polynomials that are orthogonal; i.e., assuming $E[A_n(X) A_m(X)] = 0$ for $n \ne m$, we have to show that $X$ is normally distributed. Denote $\mu_m = E[X^m]$ and note that
\[
\varphi_\chi(z) = \ln\left( \sum_{m=0}^{\infty} \frac{z^m}{m!}\, \mu_m \right) = \sum_{m=1}^{\infty} \frac{z^m}{m!}\, \chi_m
\]
is the cumulant generating function, where the $\chi_m$ are the cumulants of $X$. Since $A_m(x) = \frac{d^m}{dz^m}\, e^{zx - \varphi_\chi(z)} \big|_{z=0}$, one can inductively show that
\[
A_{m+1}(X) = X A_m(X) - \sum_{k=0}^{m} \binom{m}{k} \chi_{m-k+1} A_k(X). \tag{2.12}
\]
Let $m \ge 2$. Then, by orthogonality, we have $E[A_1(X) A_m(X)] = E[(X - \mu_1) A_m(X)] = 0$, which together with $E[A_m(X)] = 0$ implies $E[X A_m(X)] = 0$. Thus taking expectations on both sides of (2.12) leads to
\[
0 = E\left[ \sum_{k=0}^{m} \binom{m}{k} \chi_{m-k+1} A_k(X) \right] = \chi_{m+1},
\]
since $E[A_k(X)] = \delta_{0,k}$ leaves only the $k = 0$ term. Hence all cumulants of order at least $3$ vanish, and $X$ has moment generating function $\exp(\varphi_\chi(z)) = \exp(\chi_1 z + \chi_2 z^2/2)$, corresponding to the normal distribution.
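The recursion (2.12) can be verified numerically in the exponential example above, where $A_m(x) = x^m - (m/\lambda) x^{m-1}$ and the cumulants are $\chi_m = (m-1)!/\lambda^m$. A small illustrative sketch (the choice $\lambda = 3$ is arbitrary):

```python
from fractions import Fraction
from math import comb, factorial

lam = Fraction(3)

def A(m):
    # Appell polynomials of the Exp(lam) distribution (ascending coefficients):
    # A_0 = 1 and A_m(x) = x^m - (m/lam) x^{m-1} for m >= 1
    if m == 0:
        return [Fraction(1)]
    c = [Fraction(0)] * (m + 1)
    c[m] = Fraction(1)
    c[m - 1] = -Fraction(m) / lam
    return c

def chi(p):
    # cumulants of Exp(lam): chi_p = (p-1)!/lam^p
    return Fraction(factorial(p - 1)) / lam**p

# verify (2.12): A_{m+1}(x) = x A_m(x) - sum_k C(m,k) chi_{m-k+1} A_k(x)
for m in range(6):
    rhs = [Fraction(0)] + A(m)                     # x * A_m(x)
    for k in range(m + 1):
        c = comb(m, k) * chi(m - k + 1)
        for i, a in enumerate(A(k)):
            rhs[i] -= c * a
    assert rhs == A(m + 1)
print("recursion (2.12) holds for m = 0,...,5")
```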

Wick products

There are several more general versions of Appell polynomials in the multivariate case. Here, we will introduce the so-called Wick products, which are useful for the presentation of the diagram formula in the next section.

Definition 2.5 For random variables $X_1, \ldots, X_m$ the Wick product $:X_1, \ldots, X_m:$ is defined by $:\emptyset: \, = 1$ for $m = 0$ and then recursively for $m \ge 1$ by
\[
\frac{\partial}{\partial X_i} :X_1, \ldots, X_m: \; = \; :X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_m:, \qquad E[\,:X_1, \ldots, X_m:\,] = \delta_{0,m}.
\]
The Appell polynomials are then given by $A_m(X) = \, :\underbrace{X, \ldots, X}_{m \text{ times}}:$.


2.3.2 Diagrams and the central limit theorem

Now, we want to outline the connection between Appell polynomials and the central limit theorem given in theorem 2.3. The link is provided by the diagram formula, for which the introduction of joint cumulants and diagrams is needed.

Cumulants

We start with some notation. For an arbitrary index set $I$, let $(X_i)_{i \in I}$ be a system of random variables with finite moments. Given a finite subset $W \subset I$, denote $X^W = \prod_{i \in W} X_i$ and $X'_W = \{X_i,\ i \in W\}$. For $X'_W$ with $|W| = m$, the joint cumulants are defined by
\[
\chi(X'_W) := \chi(X_i,\ i \in W) := \frac{\partial^m}{\partial z_1 \cdots \partial z_m} \ln E\left[ e^{\sum_{j=1}^{m} z_j X_{i_j}} \right] \bigg|_{z_1 = \cdots = z_m = 0}.
\]
If $X_i = X$ for all $i \in W$, then $\chi_m(X) := \chi(\{\underbrace{X, \ldots, X}_{m}\})$ is the $m$-th cumulant of $X$. Thus the cumulants are the coefficients of the joint cumulant generating function. If the latter is not finite, it can be replaced by the logarithm of the joint characteristic function. The definition immediately leads to the following properties:

(i) Multilinearity: If $X_{i_k} = \sum_{j=1}^{\infty} c_{i_k j} Y_j$ with random variables $Y_j$, $j = 1, 2, \ldots$, constants $c_{i_k j} \in \mathbb{R}$, and $W = \{i_k : k = 1, \ldots, m\} \subset I$, then
\[
\chi(X'_W) = \sum_{j_1, \ldots, j_m = 1}^{\infty} \left( \prod_{k=1}^{m} c_{i_k j_k} \right) \chi(\{Y_{j_1}, \ldots, Y_{j_m}\}).
\]

(ii) If there is a partition $W = W_1 \cup W_2$ for which $\{X_i,\ i \in W_1\}$ and $\{X_i,\ i \in W_2\}$ are independent, then $\chi(X'_W) = 0$.

(iii) If $X'_W$ are jointly normally distributed and $|W| \ge 3$, then $\chi(X'_W) = 0$.

The aim is now to express the joint cumulants of Appell polynomials, or more generally of Wick products, $\chi(:X^W:)$, in terms of $\chi(X'_U)$, $U \subset W$. This will be the content of the diagram formula. Later, we will see that $\chi(X'_U)$ can be calculated explicitly if $X_i$ corresponds to a linear process.
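In the univariate case, the definition can be inverted recursively: differentiating $\ln M_X(z)$ yields the standard identity $\chi_n = \mu_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \chi_k\, \mu_{n-k}$. The following illustrative Python sketch (function name is our own) implements this identity:

```python
from fractions import Fraction
from math import comb

def cumulants_from_moments(mu):
    """mu[p] = E[X^p] for p = 0..M with mu[0] = 1; returns chi_1..chi_M
    via the recursion chi_n = mu_n - sum_{k<n} C(n-1, k-1) chi_k mu_{n-k}."""
    chi = [None]  # chi[0] unused
    for n in range(1, len(mu)):
        c = Fraction(mu[n])
        for k in range(1, n):
            c -= comb(n - 1, k - 1) * chi[k] * Fraction(mu[n - k])
        chi.append(c)
    return chi[1:]

# standard normal (moments 1, 0, 1, 0, 3, 0, 15): only chi_2 = 1 is non-zero
print(cumulants_from_moments([1, 0, 1, 0, 3, 0, 15]))
# Poisson(1) (moments are the Bell numbers): all cumulants equal 1
print(cumulants_from_moments([1, 1, 2, 5, 15, 52]))
```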

Diagram formalism

We now present the diagram formula for Wick products, which is the key to the proof of central limit theorems for linear processes. Though this formula provides a very elegant representation of cumulants, an extensive notation by means of so-called diagrams is necessary.

Let $W = W_1 \cup \cdots \cup W_k$ denote a table whose rows consist of the tuples $W_j = \{(j,1), \ldots, (j,m_j)\}$, $j = 1, \ldots, k$. (For intuition, $W$ is an ordinary table whose rows $W_j$ are arranged one upon the other and which consists altogether of $m := m_1 + \cdots + m_k$ tuples.) A diagram $\gamma = (V)_r = (V_1, \ldots, V_r)$ is a partition of $W$, i.e. $W = \bigcup_{q=1}^{r} V_q$ with $V_i, V_j$ disjoint for $i \ne j$. The class of all diagrams over $W$ is denoted by $\Gamma_W$. A diagram $\gamma = (V_1, \ldots, V_r)$ is called connected if the rows $W_1, \ldots, W_k$ cannot be divided into two subgroups which are partitioned separately by $\gamma$, i.e. if there is no division $K_1 \cup K_2 = \{1, 2, \ldots, k\}$ with $K_1 \cap K_2 = \emptyset$, $K_1, K_2 \ne \emptyset$, such that for all $q = 1, \ldots, r$ we have either $V_q \subset \bigcup_{j \in K_1} W_j$ or $V_q \subset \bigcup_{j \in K_2} W_j$. The class of all connected diagrams is denoted by $\Gamma^c_W$. Finally, $V_q$ is called an edge of the diagram $\gamma = (V_1, \ldots, V_r)$. An edge $V_q$ is flat if there is a $j$ with $V_q \subset W_j$; $\Gamma^{\not-}_W$ denotes the class of diagrams without flat edges and $\Gamma^{\not-,c}_W = \Gamma^{\not-}_W \cap \Gamma^c_W$ the class of connected diagrams without flat edges.
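For small tables these classes can be enumerated by brute force. The illustrative sketch below (our own code) lists all partitions of the table cells and classifies them; for a table with two rows of two elements each it finds $|\Gamma_W| = 15$, $|\Gamma^c_W| = 11$ and $|\Gamma^{\not-}_W| = |\Gamma^{\not-,c}_W| = 3$ (the two pairings across the rows and the single four-element edge).

```python
def set_partitions(items):
    # enumerate all partitions of a list, represented as lists of blocks
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def classify(row_sizes):
    # count (all diagrams, connected, without flat edges, connected without flat edges)
    k = len(row_sizes)
    cells = [(j, i) for j in range(k) for i in range(row_sizes[j])]
    counts = [0, 0, 0, 0]
    for part in set_partitions(cells):
        # a flat edge is a block whose cells all lie in one row
        flat = any(len({j for (j, _) in block}) == 1 for block in part)
        # connected: starting from row 0, grow the set of reachable rows
        linked = {0}
        changed = True
        while changed:
            changed = False
            for block in part:
                rows = {j for (j, _) in block}
                if rows & linked and not rows <= linked:
                    linked |= rows
                    changed = True
        conn = linked == set(range(k))
        counts[0] += 1
        counts[1] += conn
        counts[2] += not flat
        counts[3] += conn and not flat
    return tuple(counts)

print(classify([2, 2]))  # (15, 11, 3, 3)
```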

Theorem 2.5 Let $X_{(j,i)}$, $(j,i) \in W$, be random variables with finite moments, where $W = \{(j,i) : i = 1, \ldots, m_j \text{ and } j = 1, \ldots, k\}$. Then, for ordinary products we have
\[
E\left[ \prod_{j=1}^{k} X^{W_j} \right] = \sum_{\gamma = (V)_r \in \Gamma_W} \chi(X'_{V_1}) \cdots \chi(X'_{V_r}), \qquad \chi(X^{W_1}, \ldots, X^{W_k}) = \sum_{\gamma = (V)_r \in \Gamma^c_W} \chi(X'_{V_1}) \cdots \chi(X'_{V_r}),
\]
while for Wick products
\[
E\left[ \prod_{j=1}^{k} :X^{W_j}: \right] = \sum_{\gamma = (V)_r \in \Gamma^{\not-}_W} \chi(X'_{V_1}) \cdots \chi(X'_{V_r}), \qquad \chi(:X^{W_1}:, \ldots, :X^{W_k}:) = \sum_{\gamma = (V)_r \in \Gamma^{\not-,c}_W} \chi(X'_{V_1}) \cdots \chi(X'_{V_r}).
\]
More generally, for combinations of ordinary and Wick products,
\[
\chi(:X^{W_1}:, \ldots, :X^{W_s}:, X^{W_{s+1}}, \ldots, X^{W_k}) = \sum \chi(X'_{V_1}) \cdots \chi(X'_{V_r}),
\]
where the sum is taken over all connected diagrams without flat edges in $W_1, \ldots, W_s$.

Proof: See Giraitis and Surgailis (1986).
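The first formula can be made concrete in the simplest case of a single row ($k = 1$) and an ordinary product, where it reduces to the classical relation $E[X^m] = \sum_{\gamma \in \Gamma_W} \prod_q \chi_{|V_q|}$, the sum running over all partitions of $\{1, \ldots, m\}$. An illustrative sketch (our own code): for the standard normal only $\chi_2 = 1$ is non-zero, so only pair partitions survive, while for a Poisson(1) variable all cumulants equal $1$ and the moments are the Bell numbers.

```python
def set_partitions(items):
    # enumerate all partitions of a list, represented as lists of blocks
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def moment_from_cumulants(chi, m):
    """E[X^m] = sum over partitions of {1,...,m} of prod_blocks chi_{|block|}
    (theorem 2.5, first formula, with a single row and ordinary product).
    chi[p] is the p-th cumulant; chi[0] is unused."""
    total = 0
    for part in set_partitions(list(range(m))):
        prod = 1
        for block in part:
            prod *= chi[len(block)]
        total += prod
    return total

# standard normal: chi_2 = 1, all other cumulants 0 -> E[X^4] = 3 pairings
chi_normal = [0, 0, 1, 0, 0, 0, 0]
print(moment_from_cumulants(chi_normal, 4))   # 3
# Poisson(1): all cumulants 1 -> the moments are the Bell numbers
chi_pois = [0] + [1] * 6
print(moment_from_cumulants(chi_pois, 6))     # 203
```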

Diagrams and limit theorems

The usefulness of the diagram formula becomes clear if one applies the following approach to weak convergence:

Theorem 2.6 Let $(Y_n)_{n \ge 1}$ be a sequence of random variables with finite moments. Further assume that there exists a random variable $Y$ such that for all $k \ge 0$
\[
\chi_k(Y_n) \to \chi_k(Y) \quad \text{as } n \to \infty.
\]
Then, if the cumulants (or, equivalently, the moments) of $Y$ uniquely determine the distribution of $Y$, we have $Y_n \overset{d}{\to} Y$ as $n \to \infty$.

Proof: By theorem 2.5, the moments of all orders of $Y_n$ converge to the corresponding moments of $Y$ as $n \to \infty$. Thus the statement of the theorem follows as proven in Gut (2007, theorem 8.6).

The proof of asymptotic normality (2.8) in theorem 2.3 can now be described as follows; compare e.g. Breuer and Major (1983) and Giraitis and Surgailis (1985). We will follow the lines of Surgailis (2003).

(1) By theorem 2.6, it suffices to show
\[
\chi_k(n^{-1/2} S_{n,G}) \to 0 \quad \text{as } n \to \infty
\]
for all $k \ge 3$. (Note that $\chi_k(Y) = 0$ for $k \ge 3$ if $Y$ is normally distributed, and further that the normal distribution is uniquely determined by its moments.)

(2) Multilinearity of cumulants leads to
\[
\chi_k(n^{-1/2} S_{n,G}) = n^{-k/2} \sum_{m_1, \ldots, m_k = 1}^{M} c_{m_1} \cdots c_{m_k}\, \chi(S_{n,A_{m_1}}, \ldots, S_{n,A_{m_k}}),
\]
where $S_{n,A_{m_j}} := \sum_{t=1}^{n} A_{m_j}(X_t)$ for $j = 1, \ldots, k$.
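In the special case of i.i.d. summands, step (1) is already transparent: cumulants are additive over independent summands and homogeneous of degree $k$ under scaling, so $\chi_k(n^{-1/2} S_n) = n^{1-k/2} \chi_k(\xi_1) \to 0$ for every $k \ge 3$. A trivial numerical illustration (our own sketch; the value $\chi_3 = 2$ is the third cumulant of Exp(1)):

```python
def cumulant_of_normalized_sum(chi_k, k, n):
    # chi_k(n^{-1/2} S_n) for S_n a sum of n i.i.d. variables with k-th
    # cumulant chi_k: additivity gives the factor n, homogeneity of
    # degree k under scaling gives the factor n^{-k/2}
    return n ** (1 - k / 2) * chi_k

# third cumulant of Exp(1) is 2; for the standardized sum it decays like n^{-1/2}
for n in (10, 100, 1000):
    print(n, cumulant_of_normalized_sum(2.0, 3, n))
```

For $k = 2$ the exponent $1 - k/2$ vanishes, i.e. the variance is preserved, as it must be.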

(3) The diagram formula can be applied to evaluate the cumulants of the Appell polynomials, so we get for $m_1, \ldots, m_k \ge 1$:
\[
\chi(S_{n,A_{m_1}}, \ldots, S_{n,A_{m_k}}) = \sum_{t_1, \ldots, t_k = 1}^{n} \chi(A_{m_1}(X_{t_1}), \ldots, A_{m_k}(X_{t_k})) = \sum_{\gamma \in \Gamma^{\not-,c}_W} J_{n,\gamma},
\]
where
\[
J_{n,\gamma} = \sum_{t_1, \ldots, t_k = 1}^{n} \chi_{V_1} \cdots \chi_{V_r}, \qquad \gamma = (V_1, \ldots, V_r) \in \Gamma_W.
\]
The cumulants $\chi_V$ for $V \subset W$ are defined as follows: for given $t_1, \ldots, t_k$ and table $W = W_1 \cup \cdots \cup W_k$ with $W_j = \{(j,i) : i = 1, \ldots, m_j\}$, we fill the rows $W_j$ with the random variables $X_{(j,i)} := X_{t_j}$ for all $i = 1, \ldots, m_j$ ($1 \le j \le k$). Then we denote
\[
\chi_V := \chi(X'_V) = \chi(\underbrace{X_{t_1}, \ldots, X_{t_1}}_{|V \cap W_1| \text{ times}}, \ldots, \underbrace{X_{t_k}, \ldots, X_{t_k}}_{|V \cap W_k| \text{ times}}).
\]

(4) Linearity of the process $X_t$ (see (2.5), and recall that the $\xi_s$ are independent) leads to
\[
\chi_V = \tilde{\chi}_{|V|} \sum_{s \in \mathbb{Z}} b(t_1 + s)^{|V \cap W_1|} \cdots b(t_k + s)^{|V \cap W_k|}, \tag{2.13}
\]
where $\tilde{\chi}_p = \chi_p(\xi_0)$ is the $p$-th cumulant of $\xi_0$. Using (2.13), the product $\chi_{V_1} \cdots \chi_{V_r}$ can be written (with a constant $c$) as
\[
\chi_{V_1} \cdots \chi_{V_r} = c \sum_{(s_{ji})} \prod_{j=1}^{k} \prod_{i=1}^{m_j} b(t_j + s_{ji})\, \delta_{V_1} \cdots \delta_{V_r}, \tag{2.14}
\]
where
\[
\delta_{V_q} = \mathbf{1}\{ s_{ji} = s_{j'i'} \text{ for all } (j,i), (j',i') \in V_q \}, \qquad q = 1, \ldots, r.
\]
The sum in (2.14) is taken over all 'matrices' $(s_{ji})_{j=1,\ldots,k;\ i=1,\ldots,m_j}$ with $s_{ji} \in \mathbb{Z}$. (However, note that we only get a non-zero summand if the $s_{ji}$ coincide within each edge $V_q$, $q = 1, \ldots, r$.) Thus, the following representation holds:
\[
J_{n,\gamma} = \sum_{(s_{ji})} \sum_{t_1, \ldots, t_k = 1}^{n} \prod_{j=1}^{k} \prod_{i=1}^{m_j} b(t_j + s_{ji}) \prod_{q=1}^{r} \delta_{V_q}. \tag{2.15}
\]
The problem is then to show $n^{-k/2} J_{n,\gamma} \to 0$ as $n \to \infty$. This will be done in the remaining two steps.

(5) Consider the case $\sum_{s \in \mathbb{Z}} |b(s)| < \infty$. Then one can show by combinatorial arguments that $J_{n,\gamma} = O(n)$ for every diagram $\gamma \in \Gamma^{\not-,c}_W$ and every table $W$ consisting of $k \ge 2$ rows (see Surgailis 2003, proposition 6.1).

(6) Finally, for coefficients (2.6), one decomposes $b(s) = b_{<K}(s) + b_{>K}(s)$, where $b_{<K}(s) = b(s)\mathbf{1}\{|s| < K\}$ and $b_{>K}(s) = b(s)\mathbf{1}\{|s| \ge K\}$, and applies step (5) to $J^K_{n,\gamma}$ (here, $J^K_{n,\gamma}$ is defined analogously to (2.15) with $b(\cdot)$ replaced by $b_{<K}(\cdot)$). The proof is then finished by proving $|J_{n,\gamma} - J^K_{n,\gamma}| \le \epsilon(K)\, n^{k/2}$, where $\epsilon(K)$ is independent of $n$ and converges to zero as $K \to \infty$, for $\gamma \in \Gamma^{\not-,c}_W$ and a table $W$ consisting of rows $W_j$ with $|W_j| = m_j \ge m$ (see Surgailis 2003, lemma 6.1). Hence, only this last step depends on the Appell rank $m$ of $G$.

These steps, in particular steps (5) and (6), which we did not carry out here, will be adapted for a volatility model in section 4.1.4.
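The linear growth in step (5) can be illustrated numerically in the simplest non-trivial configuration: $k = 2$ rows of size $2$ and the diagram pairing the rows columnwise, for which (2.13)--(2.15) reduce to $J_{n,\gamma} = \tilde{\chi}_2^2 \sum_{t_1,t_2=1}^{n} g(t_1 - t_2)^2$ with $g(h) = \sum_s b(s) b(s+h)$. The sketch below (our own, with the arbitrary summable choice $b(s) = 0.5^s$, $s \ge 0$, and $\tilde{\chi}_2 = 1$) shows that $J_{n,\gamma}$ roughly doubles when $n$ does:

```python
def J(n, rho=0.5, tail=200):
    # J_{n,gamma} for the columnwise pairing diagram of a 2x2 table,
    # with chi~_2 = 1 and the summable coefficients b(s) = rho^s, s >= 0:
    # sum_{t1,t2=1}^n g(t1-t2)^2, grouped by the lag h = |t1 - t2|
    g = [sum(rho ** s * rho ** (s + h) for s in range(tail)) for h in range(n)]
    return n * g[0] ** 2 + 2 * sum((n - h) * g[h] ** 2 for h in range(1, n))

print(J(100) / J(50))  # close to 2, consistent with J_{n,gamma} = O(n)
```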

2.3.3 Expansion of entire functions

Obviously, the asymptotic results for $G(X_t)$ given in theorems 2.3 and 2.4 are of limited applicability in statistics, since $G$ has to be a polynomial. For instance, the study of robust location estimators (see section 4.1) directly leads to bounded $G$, excluding polynomials. Therefore, the question whether there are functions for which an Appell expansion of infinite order holds is of high importance in statistics. In this section, we point out that entire functions satisfying some growth condition are natural candidates in this context.

To this end, note that in (2.9) we can multiply both sides by $A(\omega)^{-1} := M_X(\omega)$ and get
\[
\exp(z\omega) = \frac{1}{A(\omega)} \sum_{m=0}^{\infty} \frac{\omega^m}{m!}\, A_m(z), \qquad z \in \mathbb{C}. \tag{2.16}
\]
(Note the change of variables from $\sum_{m=0}^{\infty} \frac{z^m}{m!} A_m(x)$ to $\sum_{m=0}^{\infty} \frac{\omega^m}{m!} A_m(z)$.) Thus, the definition of Appell polynomials already delivers an expansion of the entire function $\exp(\cdot\, \omega)$, where $\omega$ is such that the moment generating function $M_X(\omega)$ is finite and greater than zero. We can then use results from complex analysis to
