PROPERTIES OF THE NONPARAMETRIC AUTOREGRESSIVE BOOTSTRAP

J. FRANKE, J.-P. KREISS, E. MAMMEN AND M. H. NEUMANN

Abstract. We prove geometric ergodicity and absolute regularity of the nonparametric autoregressive bootstrap process. To this end, we revisit this problem for nonparametric autoregressive processes and give some quantitative conditions (i.e., with explicit constants) under which the mixing coefficients of such processes can be bounded by some exponentially decaying sequence. This is achieved by using well-established coupling techniques. Then we apply the result to the bootstrap process and propose some particular estimators of the autoregression function and of the density of the innovations for which the bootstrap process has the desired properties. Moreover, by using some "decoupling" argument, we show that the stationary density of the bootstrap process converges to that of the original process.

As an illustration, we use the proposed bootstrap method to construct simultaneous confidence bands and supremum-type tests for the autoregression function as well as to approximate the distribution of the least squares estimator in a certain parametric model.

Date: October 26, 1998.

1991 Mathematics Subject Classification. Primary 62G09; secondary 62M10.

Key words and phrases. Bootstrap, nonparametric autoregression, coupling, geometric ergodicity, consistency.


1. Introduction

Since the seminal paper of Efron (1979), bootstrap methods have become a widely accepted and powerful tool to estimate the distribution as well as related quantities of certain statistics of interest. Typical fields of application are the construction of confidence sets for parameters or the closely related problem of determining the critical region for tests. The basic idea of the bootstrap in its original form is to mimic, on the basis of a single sample at hand, the whole structure of the data generating process. In the context of time series, this leads to the additional challenge of estimating the dependence structure of the process.

We assume throughout the present paper that data are generated by a nonparametric autoregressive process. Franke, Kreiss and Mammen (1997) discussed different bootstrap methods in this context. Besides two regression-type approaches including the wild bootstrap, they investigated the nonparametric autoregressive bootstrap, which was first proposed by Franke and Wendel (1992) and Kreutzberger (1993), and proved its consistency for the pointwise behaviour of nonparametric estimators of the mean and the variance function. In subsequent papers, Neumann and Kreiss (1997) and Kreiss, Neumann and Yao (1998) showed the validity of the wild bootstrap beyond the pointwise distribution. The ultimate goal of the present paper is to open such a wide field of applications for the autoregressive bootstrap scheme.

For this purpose, we first prove important basic properties of the bootstrap process such as absolute regularity and the convergence of the stationary distribution to that of the original process. Since the autoregressive bootstrap process is in particular a Markov chain, we can partially apply well-established techniques to prove the desired results. However, in contrast to many qualitative results in this field which simply state a certain rate for the decay of the mixing coefficients, we need here uniformity w.r.t. some parameters of the process varying within certain limits. This is because the properties of the bootstrap process depend on the original sample, which is itself random. Hence, we will restate some well-known results with an explicit description of how constants depend on certain features of the process. To make the paper understandable for statisticians who are not specialists in Markov chain theory, we present self-contained versions of all major proofs.

These results can be used to prove consistency of the autoregressive bootstrap in several instances. We illustrate this by constructing simultaneous confidence bands and supremum-type tests for the autoregression function as well as by approximating the distribution of a least squares estimator in a certain parametric model.

2. Mixing of Markov chains revisited: A set of sufficient conditions for geometric ergodicity

Throughout the present paper, our minimal assumption on the data generating process is that $\{X_t\}$ forms a Markov chain. Properties like ergodicity and mixing are usually derived under two main assumptions: first, the existence of some "drift" towards a certain compact set $K$, and second, some condition on the conditional distribution of future states, given that $X_{t-1}$ falls into $K$. The latter condition ensures that information about previous states will be forgotten sufficiently fast by the Markov chain. Here is the first of our main conditions on the Markov chain:

(A1)

There exists a compact set $K$ such that

(i) there exist $\lambda > 1$ and $\varepsilon > 0$ with
$$E\left( |X_t| \mid X_{t-1} = x \right) \le \lambda^{-1}|x| - \varepsilon \quad \text{for all } x \notin K,$$

(ii) there exists $A < \infty$ with
$$\sup_{x \in K}\left\{ E\left( |X_t| \mid X_{t-1} = x \right) \right\} \le A.$$

The drift criterion already ensures that the set $K$ is reached from every point with probability 1. However, it is not yet clear which particular point in $K$ is the first one visited by the Markov chain. If, for example, $K$ contains more than one absorbing set, then it is a priori not clear to which of these sets the Markov chain will converge.

Moreover, it might also happen that the Markov chain is periodic, that is, it moves periodically through a finite cycle of disjoint sets. There are well-known techniques to handle such cases; however, in order to facilitate the technical part of this paper, we will impose a condition that excludes them.

(A2)

(i) $K$ is a small set, that is, there exist $n_0 \in \mathbb{N}$, $\gamma > 0$ and a probability measure $\nu$ such that
$$\inf_{x \in K}\left\{ P^{n_0}(x, B) \right\} \ge \gamma\,\nu(B)$$
holds for all measurable sets $B$. $P^n(x, \cdot)$ denotes the $n$-step transition probability of the Markov chain started in $x$.

(ii) There exists $\delta > 0$ such that
$$\inf_{x \in K}\left\{ P(x, K) \right\} \ge \delta.$$

Remark 1.

(i) Classical properties like irreducibility, aperiodicity and the existence of a unique stationary density follow readily from (A1) and (A2); see the proof of Theorem 2.1.

(ii) To ensure aperiodicity and irreducibility, one often assumes instead of (A2) that the innovations, $\varepsilon_t = X_t - m(X_{t-1})$, are i.i.d. with an everywhere positive density. However, as noted by Meyn and Tweedie (1993, page 99), such a condition is unnecessarily restrictive. A possible condition which immediately implies (A2) and does not require an everywhere positive density of the innovations is the following one:

(A2')

The conditional distribution $\mathcal{L}(X_t \mid X_{t-1} = x)$ has a density $p(y \mid x)$ which fulfills, for some $c, \varepsilon > 0$,
$$p(y \mid x) \ge c > 0 \quad \text{for all } x, y \in K \text{ with } |x - y| \le \varepsilon.$$


(iii) Assumption (A2) allows the distribution of the innovations $\varepsilon_t = X_t - m(X_{t-1})$ to depend on $X_{t-1}$, which in particular allows for conditional heteroscedasticity. We prove our results in this section in this general context, whereas we restrict them when dealing with the autoregressive bootstrap in the next section.

(iv) If $\{X_t\}$ can be written as $X_t = m(X_{t-1}) + \varepsilon_t$, where the innovations $\varepsilon_t$ are i.i.d. with mean 0 and $E|\varepsilon_t| < \infty$, then (A1) follows from
$$\limsup_{|x| \to \infty}\left\{ |m(x)/x| \right\} < 1.$$
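As a quick illustration, the following minimal Python sketch (ours, not part of the paper) simulates a chain of this type and approximates the conditional mean $E(|X_t| \mid X_{t-1} = x)$ by Monte Carlo; the specific function $m$ and all constants are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: a chain X_t = m(X_{t-1}) + eps_t with
# limsup_{|x|->oo} |m(x)/x| = 0.5 < 1, so that Remark 1(iv) yields (A1).
rng = np.random.default_rng(0)

def m(x):
    return 0.5 * x + np.sin(x)   # hypothetical choice; |m(x)/x| -> 0.5

T = 10_000
X = np.empty(T + 1)
X[0] = 0.0
eps = rng.normal(size=T)         # i.i.d. innovations with mean 0
for t in range(T):
    X[t + 1] = m(X[t]) + eps[t]

# Monte Carlo check of the drift: for large |x| the conditional mean of |X_t|
# should stay below lambda^{-1}|x| - eps for some lambda > 1.
for x in (5.0, 10.0, 20.0):
    cond_mean = np.abs(m(x) + rng.normal(size=50_000)).mean()
    print(f"x = {x:5.1f}:  E(|X_t| | X_(t-1) = x) ~ {cond_mean:6.3f}  vs  |x| = {x:5.1f}")
```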

The following lemma provides an important result about exponential moments of return times to $K$. The return time is defined as $\tau_K = \inf\{t \ge 1 \mid X_t \in K\}$. Moreover, we denote by $E_x$ the conditional expectation under the condition that $X_0 = x$.

Lemma 2.1.

Suppose that (A1) is fulfilled. Then

(i) $E_x \lambda^{\tau_K} \le \varepsilon^{-1}|x|$ for all $x \notin K$,

(ii) $E_x \lambda^{\tau_K} \le \lambda\left( 1 + \varepsilon^{-1}A \right)$ for all $x \in K$.

Lemma 2.1 is the main tool to prove, in conjunction with assumption (A2), geometric ergodicity of the Markov chain, that is,
$$\int \left\| P^n(x, \cdot) - \pi \right\|_{\mathrm{Var}}\, \mu(dx) \le C\rho^{-n} \qquad (2.1)$$
for some $\rho > 1$, where $\|\cdot\|_{\mathrm{Var}}$ stands for the total variation norm and $\mu$ stands for $\pi$ if $\{X_t\}$ is started with the stationary distribution $\pi$, or for the Dirac measure $\delta_{x_0}$ if $\{X_t\}$ is started at some nonrandom point $x_0$.

Exponential ergodicity will be proved via coupling of two Markov chains, one started at some nonrandom point $x$, and the other one started with initial distribution $\pi$. We pair both chains in such a way that they are completely identical to each other after they have arrived at any state simultaneously. The coupling of $\{X_t\}$ and $\{X'_t\}$ is actually organized in two steps. Both chains are run independently until they reach the set $K$ simultaneously, perhaps still at different points $x$ and $x'$. By (A2), the set $K$ is an appropriate place for an attempt to initiate an exact pairing, which may occur after $n_0$ further steps with a probability of at least $\gamma$. Lemma 2.1 guarantees, in conjunction with (A2)(ii), that a simultaneous entry in the set $K$ occurs sufficiently often. This leads to the following theorem:

Theorem 2.1.

Suppose that (A1) and (A2) are fulfilled. Then (2.1) holds true with some $\rho > 1$ and $C < \infty$ which only depend on $K$, $A$, $\lambda$, $\varepsilon$, $n_0$, $\gamma$ and $\delta$.

Having proved geometric ergodicity, we obtain the desired geometric absolute regularity immediately from Proposition 1 of Davydov (1973). The coefficient of absolute regularity is defined as follows.


Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{U}$ and $\mathcal{V}$ be two $\sigma$-subfields of $\mathcal{A}$. The coefficient of absolute regularity ($\beta$-mixing coefficient) is defined as
$$\beta(\mathcal{U}, \mathcal{V}) = E\left[ \sup_{V \in \mathcal{V}}\left\{ |P(V \mid \mathcal{U}) - P(V)| \right\} \right] = \sup \frac{1}{2} \sum_{i,j} \left| P(U_i)P(V_j) - P(U_i \cap V_j) \right|,$$
where the supremum in the last expression is taken over all finite partitions $(U_i)_{i \in I}$ and $(V_j)_{j \in J}$ of $\Omega$ with $U_i \in \mathcal{U}$ and $V_j \in \mathcal{V}$.

In our particular case of a possibly nonstationary process $\{X_t\}_{t=0,1,\ldots}$, we adopt the definition of Davydov (1973), namely
$$\beta(s) = \sup_t E\left[ \sup_{B \in \mathcal{M}^\infty_{t+s}}\left| P(B \mid \mathcal{M}^t_0) - P(B) \right| \right],$$
where $\mathcal{M}^v_u = \sigma(X_u, \ldots, X_v)$. [Note that Davydov had an additional factor of 2 in comparison with our definition of $\beta(s)$.]

The following lemma shows the close connection of ergodicity and absolute regularity for Markov chains.

Lemma 2.2.

(adapted from Davydov (1973))

Let $\{X_t\}$ be a Markov chain with marginal distributions $X_t \sim \pi_t$. Then
$$\beta(s) = \frac{1}{2} \sup_t \int \pi_t(dx) \left\| P^s(x, \cdot) - \pi_{t+s} \right\|_{\mathrm{Var}}.$$

Now we obtain, in conjunction with Theorem 2.1, the desired mixing property of the Markov chain. Recall that $\mu$ is used to denote the initial distribution, that is, $X_0 \sim \mu$.

Corollary 2.1.

Suppose that (A1) and (A2) are fulfilled. Then $\beta(n) \le C\rho^{-n}$.

So far we have derived sufficient conditions for geometric ergodicity in the general context of a Markov chain $\{X_t\}$. The nonparametric autoregressive bootstrap, which we study in the next section, is tailored for the special case that $\{X_t\}$ can be written in the form of a nonparametric autoregressive model,
$$X_t = m(X_{t-1}) + \varepsilon_t, \qquad (2.2)$$
where the innovations $\varepsilon_t$ are independent, identically distributed random variables with mean 0. It can be easily seen that the following condition implies (A1) and (A2):

(A3)

$\{X_t\}$ obeys (2.2), where

(i) $|m(x)| \le C_1 + C_2|x|$ for all $x$ and some $C_1 < \infty$, $C_2 < 1$,

(ii) $E|\varepsilon_t| < \infty$,

(iii) $p_\varepsilon(x) \ge C_3 > 0$ for all $x \in \left[ -C_4 - \sup_{x \in K}\{m(x) - x\},\; C_4 - \inf_{x \in K}\{m(x) - x\} \right]$ and some $C_4 > 0$, where $K = [-C_5, C_5]$, $C_5 > (C_1 + E|\varepsilon_t|)/(1 - C_2)$.

3. The nonparametric autoregressive bootstrap

In this section we will investigate important basic properties of the autoregressive bootstrap, and therefore we restrict the quite general structure of the data generating process as considered in the previous section to the special case (2.2), where the $\varepsilon_t$ are i.i.d. with mean 0 and variance $\sigma^2$. To ensure that mixing properties hold for $\{X_t\}$, we assume that the $\varepsilon_t$ have a density $p_\varepsilon$.

The nonparametric autoregressive bootstrap is a generalization of an idea of Efron and Tibshirani (1986) and Holbert and Son (1986) for the case of linear autoregression, and was first proposed by Franke and Wendel (1992) and Kreutzberger (1993).

It was proved in Franke et al. (1997) that this method is asymptotically consistent for the pointwise properties of kernel estimators of $m$. We continue this investigation and derive some important properties of this bootstrap method which allow this technique to be applied also to other problems, such as the construction of simultaneous confidence bands and supremum-type tests for the autoregression function, as well as the approximation of the distribution of a least squares estimator in a certain parametric model.

3.1. Some basic properties of the autoregressive bootstrap.

The implementation of the nonparametric autoregressive bootstrap requires explicit estimates $\hat m$ and $\hat p_\varepsilon$ of $m$ and $p_\varepsilon$, respectively. Before we propose some particular estimators, we formulate quite general conditions that ensure ergodicity and absolute regularity of the bootstrap process as well as some consistency properties. The bootstrap process is generated according to the equation
$$X^*_t = \hat m(X^*_{t-1}) + \varepsilon^*_t, \quad t = 1, \ldots, T, \qquad (3.1)$$
where the $\varepsilon^*_t$ are i.i.d. with density $\hat p_\varepsilon$. Under the conditions given below, there exists a stationary distribution $\pi^*$. For simplicity we assume that $\{X^*_t\}$ is stationary, that is, $X^*_0 \sim \pi^*$.
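In algorithmic form, the recursion (3.1) is straightforward. The following minimal sketch (ours, not the paper's prescription) assumes an estimated autoregression function m_hat and residuals eps_hat are already available, and mimics sampling from a kernel-smoothed $\hat p_\varepsilon$ by adding Gaussian kernel noise to a resampled residual (a convenience assumption on the kernel):

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_path(m_hat, eps_hat, T, h, x0=0.0):
    """Generate X*_1, ..., X*_T via the recursion X*_t = m_hat(X*_{t-1}) + eps*_t.

    Drawing a residual uniformly and adding N(0, h^2) noise is equivalent to
    sampling from the Gaussian-kernel density estimate of the residuals.
    """
    eps_star = rng.choice(eps_hat, size=T) + h * rng.normal(size=T)
    X_star = np.empty(T + 1)
    X_star[0] = x0                # in practice one would discard a burn-in stretch
    for t in range(T):
        X_star[t + 1] = m_hat(X_star[t]) + eps_star[t]
    return X_star[1:]
```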

To prove ergodicity and absolute regularity of $\{X^*_t\}$, we need only some analog to (A3) for $\hat m$ and $\hat p_\varepsilon$ in place of $m$ and $p_\varepsilon$, respectively. On the other hand, such a result alone would be of little use, because one applies bootstrap methods to imitate some features of the original process. One of the minimal requirements is certainly that the stationary distribution of $\{X^*_t\}$ approximates that of $\{X_t\}$ in some appropriate sense. This will be ensured by suitable conditions on the consistency of the estimates $\hat m$ and $\hat p_\varepsilon$. We make throughout this paper the convention that $\delta > 0$ denotes an arbitrarily small and $\lambda < \infty$ an arbitrarily large constant. Moreover, we use the letter $\vartheta > 0$ to denote some appropriately chosen positive constant. Besides (A3),


we will assume

(A4)

There exists an appropriate sequence of sets $\Omega_T \subseteq \mathbb{R}^{T+1}$ with $P((X_0, \ldots, X_T) \notin \Omega_T) = o(1)$, such that for $(X_0, \ldots, X_T) \in \Omega_T$ the following properties are fulfilled:

(i) $|\hat m(x)| \le C_1 + C_2|x|$, for some $C_1 < \infty$ and $C_2 < 1$. [W.l.o.g. we assume that $C_1$ and $C_2$ coincide with the constants in (A3).]

(ii) $\sup_{x \in \mathcal{X}_T}\left\{ |\hat m(x) - m(x)| \right\} = O(T^{-\vartheta})$ for an appropriate sequence of sets $\mathcal{X}_T \subseteq \mathbb{R}$ with $P(X_t \notin \mathcal{X}_T) = O(T^{-\vartheta})$,

(iii) $\|\hat p_\varepsilon - p_\varepsilon\|_\infty \le CT^{-\vartheta}$,

(iv) $\int |\hat p_\varepsilon(x) - p_\varepsilon(x)|\, dx \le CT^{-\vartheta}$,

(v) for all $M$ there exists some $C_M < \infty$ such that
$$\int |\varepsilon|^M \hat p_\varepsilon(\varepsilon)\, d\varepsilon \le C_M.$$

We propose in the next subsection particular estimators $\hat m$ and $\hat p_\varepsilon$ that satisfy (A4) under suitable conditions. Under (A3) and (A4), $\hat m$ and $\hat p_\varepsilon$ fulfill the conditions of (A3) (possibly with different constants) with high probability. Hence, according to Theorem 2.1, $\{X^*_t\}$ is geometrically ergodic, which implies geometric absolute regularity. This is formalized in the following theorem:

Theorem 3.1.

Suppose that the data generating process obeys (2.2) and that (A3) and (A4) are fulfilled. Let $\beta^*(n)$ be the coefficient of absolute regularity of the process $\{X^*_t\}$. Then there exists some $\hat\rho > 1$ such that $\beta^*(n) \le C\hat\rho^{-n}$ holds if $(X_0, \ldots, X_T) \in \Omega_T$.

In the proofs of the previous theorems, we use coupling of Markov chains to get geometric ergodicity. To prove closeness of the stationary distributions of $\{X_t\}$ and $\{X^*_t\}$, we use the opposite approach, which we call decoupling: we start both chains at a common point, $X_0 = X^*_0 = x_0$, and analyze the decoupling of appropriately paired versions of them. Since, according to (A4), the transition probabilities are similar, we can couple both chains in such a way that $P(X_n \ne X^*_n)$ increases slowly. On the other hand, both chains are geometrically ergodic. Therefore, $P^n(x_0, \cdot)$ and $P^{*n}(x_0, \cdot)$ converge quite fast to $\pi$ and $\pi^*$, respectively. This idea leads to the following theorem, which characterizes the closeness of the respective stationary distributions $\pi$ and $\pi^*$.

Theorem 3.2.

Suppose that the data generating process obeys (2.2) and that (A3) and (A4) are fulfilled. Then
$$\sup_{B \text{ measurable}}\left\{ \left[ \ell(B)\, T^{-\vartheta} + T^{-\lambda} \right]^{-1} \left| \pi(B) - \pi^*(B) \right| \right\} \le C$$
holds if $(X_0, \ldots, X_T) \in \Omega_T$, where $\ell(\cdot)$ denotes the Lebesgue measure.


3.2. Particular estimators of $m$ and $p_\varepsilon$.

The consistency of the autoregressive bootstrap follows from suitable consistency properties of $\hat m$ and $\hat p_\varepsilon$. Franke et al. (1997) proved an appropriate kind of uniform consistency of $\hat m$ on a sequence of sets $[-\gamma_T, \gamma_T]$, $\gamma_T \to \infty$, under the additional assumption that the stationary density is not less than $c_T$ ($c_T \to 0$ with a suitable rate) on $[-\gamma_T, \gamma_T]$. Here we try to avoid this condition and impose regularity conditions solely on $m$ and $p_\varepsilon$. To be able to estimate $m$ with a sufficient accuracy, we assume that

(A5)

m is Lipschitz continuous.

To facilitate our proofs, in particular that of the consistency of a certain estimator of m, we assume that

(A6)

All moments of $\varepsilon_t$ are finite.

In contrast to regression-type methods such as the wild bootstrap, it is also important to estimate the distribution of the innovations $\varepsilon_t$ consistently. We will assume that

(A7)

p" is Lipschitz and of bounded total variation.

In view of the different size of the stationary density in different regions, it seems natural to use a nearest neighbor estimator of $m$, which is defined as
$$\hat m_N(x) = N^{-1} \sum_{t:\, X_{t-1} \in \hat{\mathcal{N}}_N(x)} X_t. \qquad (3.2)$$
The (random) neighborhoods $\hat{\mathcal{N}}_N(x) = [x - \hat b_N(x), x + \hat b_N(x)]$ are chosen such that $\#\{t \le T \mid X_{t-1} \in \hat{\mathcal{N}}_N(x)\} = N$, where $N = N(T) \to \infty$ as $T \to \infty$. Instead of $\hat m_N$ one could also use other nonparametric estimators, such as kernel or local polynomial estimators with appropriate adjustments of the bandwidths in regions of a low stationary density.
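A direct implementation of (3.2) might look as follows (a minimal sketch under our own conventions; ties among distances are ignored):

```python
import numpy as np

def m_hat_NN(x, X, N):
    """Nearest-neighbor estimate of m(x) from observations X_0, ..., X_T:
    average the N values X_t whose predecessors X_{t-1} lie closest to x."""
    pred, succ = X[:-1], X[1:]                  # pairs (X_{t-1}, X_t)
    idx = np.argsort(np.abs(pred - x))[:N]      # indices of the N nearest predecessors
    return succ[idx].mean()
```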

Since many assertions in this article are of the type that a certain random variable is below some threshold with a high probability, we introduce the following notation.

Definition 3.1.

Let $\{Z_T\}$ be a sequence of random variables and let $\{\gamma_T\}$ and $\{\delta_T\}$ be sequences of positive reals. We write
$$Z_T = \tilde O(\gamma_T, \delta_T)$$
if
$$P(|Z_T| > C\gamma_T) \le C\delta_T$$
holds for $T \ge 1$ and some $C < \infty$.


This definition is obviously stronger than the usual $O_P$, and it is well suited for our particular purposes of constructing confidence bands and nonparametric tests; see its application in Section 4.

The following lemma provides a useful result about the uniform convergence properties of $\hat m_N$.

Lemma 3.1.

Suppose that the data generating process obeys (2.2) and that (A3), (A5) and (A6) are fulfilled. Then there exists a sequence of sets $\mathcal{X}_T \subseteq \mathbb{R}$ with $P(X_t \notin \mathcal{X}_T) = O(T^{-\vartheta})$ and
$$\sup_{x \in \mathcal{X}_T}\left\{ |\hat m_N(x) - m(x)| \right\} = \tilde O\left( N/T + N^{-1/2}\log(T),\; T^{-\lambda} \right).$$

Define
$$\hat p_\varepsilon(x) = \frac{1}{T} \sum_{t=1}^T \frac{1}{h} K\left( \frac{x - \hat\varepsilon_t}{h} \right),$$
where $\hat\varepsilon_t = X_t - \hat m_N(X_{t-1})$ are the residuals.

Lemma 3.2.

Suppose that the data generating process obeys (2.2) and that (A3) and (A5) to (A7) are fulfilled. Furthermore, let $h$ and $N$ be chosen such that $h = O(T^{-\vartheta_0})$, $h^{-1} = O(N^{1/2}T^{-\vartheta_0})$ and $N = O(T^{1-\vartheta_0})$ for some $\vartheta_0 > 0$. Then there exists some $\vartheta > 0$ such that

(i) $\|\hat p_\varepsilon - p_\varepsilon\|_\infty = \tilde O(T^{-\vartheta}, T^{-\lambda})$,

(ii) $\int |\hat p_\varepsilon(x) - p_\varepsilon(x)|\, dx = \tilde O(T^{-\vartheta}, T^{-\lambda})$.

4. Application to parametric and nonparametric estimates of the autoregression function

In the first part of this section we use the proposed bootstrap method to construct simultaneous confidence bands and supremum-type tests for the autoregression function. Similar results for a regression-type bootstrap, the so-called wild bootstrap, can be found in Neumann and Kreiss (1997). The validity of the wild bootstrap in the context of nonparametric estimation in autoregression relies on the fact that the underlying statistic forms a sum of martingale differences. Moreover, bootstrap methods based on the (fictive) assumption of independent random variables are consistent for many statistics based on nonparametric estimators in the context of general processes, since the effect of weak dependence vanishes asymptotically; see, e.g., Neumann (1996, 1997). Usually, this is not true for parametric estimation. In such a situation a process bootstrap as proposed in this paper is really necessary for consistency, since the whole dependence structure of the underlying process has to be mimicked. One may argue that this may motivate the use of the process bootstrap even for nonparametric estimation. However, for nonparametric estimation, a rigorous comparison of the process bootstrap with other resampling schemes would require higher order methods.

4.1. Application to supremum-type statistics: confidence bands and goodness-of-fit tests.

We suppose that the data generating process obeys (2.2). A simultaneous confidence band for $m$ is usually based on and centered around some nonparametric estimator $\hat m_h(x)$. For simplicity, one can take a Nadaraya-Watson kernel estimator,

$$\hat m_h(x) = \frac{\sum_{t=1}^T K\left( \frac{x - X_{t-1}}{h} \right) X_t}{\sum_{t=1}^T K\left( \frac{x - X_{t-1}}{h} \right)}. \qquad (4.1)$$
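For reference, a vectorized sketch of (4.1) with a Gaussian kernel (our choice; any kernel $K$ could be substituted):

```python
import numpy as np

def m_hat_NW(x, X, h):
    """Nadaraya-Watson estimate of m on a grid x from observations X_0, ..., X_T."""
    pred, succ = X[:-1], X[1:]
    u = (np.atleast_1d(x)[:, None] - pred[None, :]) / h
    W = np.exp(-0.5 * u**2)                     # kernel weights K((x - X_{t-1})/h)
    return (W * succ).sum(axis=1) / W.sum(axis=1)
```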

The difference of $\hat m_h(x)$ and $m(x)$ can be decomposed into a stochastic term,
$$\left( \sum_t K\left( (x - X_{t-1})/h \right) \right)^{-1} \sum_t K\left( (x - X_{t-1})/h \right) \left[ X_t - m(X_{t-1}) \right], \qquad (4.2)$$
and a bias-type term,
$$\left( \sum_t K\left( (x - X_{t-1})/h \right) \right)^{-1} \sum_t K\left( (x - X_{t-1})/h \right) \left[ m(X_{t-1}) - m(x) \right]. \qquad (4.3)$$
[We call the latter expression "bias-type term" rather than "bias term" since it is only asymptotically nonrandom.]

For the construction of confidence intervals or bands, one may account for the bias-type term by separate adjustments, i.e., it is not necessary to imitate it by the bootstrap. Usual techniques are undersmoothing or explicit bias correction; see, e.g., Neumann and Kreiss (1997) for a discussion in the context of nonparametric autoregression. In order to find an appropriate width of the confidence band, it remains to get knowledge about the stochastic term. This term can be approximated by $\left[ (p * K(\cdot/h))(x) \right]^{-1} (1/T) \sum_t K((x - X_{t-1})/h)\, \varepsilon_t$, where $p$ denotes the density of $\pi$. Hence, we have to approximate the distribution of

$$S_T = \sup_{x \in [a,b]}\left\{ \left[ (p * K(\cdot/h))(x) \right]^{-1} \frac{1}{T} \sum_t K\left( \frac{x - X_{t-1}}{h} \right) \varepsilon_t \right\}. \qquad (4.4)$$

For a parametric hypothesis $H_0: m \in \mathcal{M} = \{m_\theta \mid \theta \in \Theta\}$ we can use the test statistic
$$W_T = \sup_{x \in \mathbb{R}}\left\{ \sum_t K\left( \frac{x - X_{t-1}}{h} \right) \left[ X_t - m_{\hat\theta}(X_{t-1}) \right] \right\}, \qquad (4.5)$$
where $m_{\hat\theta}$ is any estimator that satisfies, on the hypothesis $m \in \mathcal{M}$,
$$\sup_{x \in \mathbb{R}}\left\{ \sum_t K\left( \frac{x - X_{t-1}}{h} \right) \left[ m_{\hat\theta}(X_{t-1}) - m(X_{t-1}) \right] \right\} = o_P\left( (Th)^{1/2}(\log T)^{-1/2} \right). \qquad (4.6)$$
For the determination of critical values we have to approximate the distribution of $W_T$. A sufficient condition for (4.6) to be fulfilled is obviously that $m_{\hat\theta}$ itself


converges on the hypothesis in the supremum norm to $m$ with a faster rate than $(Th)^{-1/2}(\log T)^{-1/2}$. If (4.6) is actually satisfied, it suffices to find a consistent approximation to the distribution of the statistic
$$U_T = \sup_{x \in \mathbb{R}}\left\{ \sum_t K\left( \frac{x - X_{t-1}}{h} \right) \varepsilon_t \right\}, \qquad (4.7)$$
which is closely related to $S_T$ in (4.4).

The distributions of $S_T$ and $U_T$ can be approximated by those of appropriate bootstrap statistics. We discuss only the approximation of $U_T$ by
$$U^*_T = \sup_{x \in \mathbb{R}}\left\{ \sum_t K\left( \frac{x - X^*_{t-1}}{h} \right) \varepsilon^*_t \right\} \qquad (4.8)$$
more closely. Whereas a purely analytic approach of showing consistency is presumably quite cumbersome for such supremum-type functionals, a proof via strong approximations is much more convenient.
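In practice, the law of $U^*_T$ is simulated: generate bootstrap paths, recompute the supremum statistic on a grid of $x$-values, and take an upper quantile as critical value. The following minimal sketch (ours) reuses the hypothetical helpers bootstrap_path and m_hat from the sketches above; the grid approximation of the supremum and the absolute value (for a two-sided band) are our own conventions:

```python
import numpy as np

def sup_statistic(X, eps, grid, h):
    """Approximate sup_x | sum_t K((x - X_{t-1})/h) eps_t | over a grid of points."""
    u = (grid[:, None] - X[:-1][None, :]) / h
    K = np.exp(-0.5 * u**2)                     # Gaussian kernel, our choice
    return np.abs(K @ eps).max()

def bootstrap_critical_value(m_hat, eps_hat, T, h, grid, B=500, alpha=0.05):
    """Monte Carlo approximation of the (1 - alpha)-quantile of U*_T."""
    stats = []
    for _ in range(B):
        X_star = bootstrap_path(m_hat, eps_hat, T, h)
        eps_star = X_star[1:] - m_hat(X_star[:-1])    # bootstrap innovations
        stats.append(sup_statistic(X_star, eps_star, grid, h))
    return np.quantile(stats, 1.0 - alpha)
```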

Lemma 4.1.

Suppose that the data generating process obeys (2.2) and that (A3) and (A4) are fulfilled. Then there exists, on a sufficiently large probability space, a pairing of $(X_0, \varepsilon_1, \ldots, \varepsilon_T)$ and $(X^*_0, \varepsilon^*_1, \ldots, \varepsilon^*_T)$ such that
$$\sup_{x \in \mathbb{R}}\left\{ \sum_t K\left( \frac{x - X_{t-1}}{h} \right) \varepsilon_t - \sum_t K\left( \frac{x - X^*_{t-1}}{h} \right) \varepsilon^*_t \right\} = o_P\left( (Th)^{1/2}(\log T)^{-1/2} \right)$$
holds uniformly over all bootstrap distributions $\mathcal{L}\left( (X^*_0, \varepsilon^*_1, \ldots, \varepsilon^*_T) \mid X_0, \ldots, X_T \right)$ for $(X_0, \ldots, X_T) \in \Omega_T$, where $\Omega_T$ is an appropriate set with $P(\Omega_T^c) = o(1)$.

This strong approximation result basically says that the stochastic behaviour of the process $\{\sum_t K((x - X_{t-1})/h)\varepsilon_t\}_{x \in \mathbb{R}}$ is well approximated by that of the bootstrap counterpart $\{\sum_t K((x - X^*_{t-1})/h)\varepsilon^*_t\}_{x \in \mathbb{R}}$. This implies in particular that the distribution of $U_T$ is consistently approximated by that of $U^*_T$. As can be seen from Lemma 3.2 in Neumann and Kreiss (1997), the rate of $o_P((Th)^{1/2}(\log T)^{-1/2})$ for the approximation error is just sufficient for the validity of the bootstrap in the context of supremum-type functionals. Hence, we may apply the nonparametric autoregressive bootstrap to determine the critical value for a supremum-type test based on $W_T$. For the same reason it can also be used for the construction of simultaneous confidence bands.

4.2. Application to a problem of parametric inference.

As an illustration of a situation where the nonparametric autoregressive bootstrap procedure (cf. Section 3) is really necessary, we consider the following example. Suppose that we intend to fit a parametric model,
$$X_t = m_\theta(X_{t-1}) + \varepsilon_t,$$
to the time series. For the sake of simplicity, let us deal with the simplest case, i.e., $m_\theta(u) = \theta m_o(u)$ for some known function $m_o$ and the least squares estimator $\hat\theta$,


which satisfies
$$\sqrt{T}\left( \hat\theta - \theta \right) = \frac{ \frac{1}{\sqrt{T}} \sum_t \left( X_t - \theta m_o(X_{t-1}) \right) m_o(X_{t-1}) }{ \frac{1}{T} \sum_t m_o(X_{t-1})^2 }.$$

Recall that we do not assume that the parametric model coincides with the underlying model. If we assume (A1), (A2), (A3)(i) for $m_o$ and $E|X_t|^\delta < \infty$ for some $\delta > 4$, then we obtain from a CLT for strongly mixing processes, cf. Bosq (1996, Theorem 1.7), asymptotic normality for the least squares estimator $\hat\theta$, namely
$$\sqrt{T}\left( \hat\theta - \theta \right) \Longrightarrow N\left( 0,\; \sigma_\infty^2 / (E m_o(X_0)^2)^2 \right).$$

In the case of model inadequacy, the parameter $\theta$ is defined in the sense of the best approximation, that is,
$$\theta = \arg\min_{\tilde\theta}\left\{ E\left( X_1 - \tilde\theta m_o(X_0) \right)^2 \right\} = \frac{E X_1 m_o(X_0)}{E m_o(X_0)^2}.$$
The term
$$\begin{aligned}
\sigma_\infty^2 &= E\left[ (X_1 - \theta m_o(X_0))^2 m_o(X_0)^2 \right] + 2\sum_{k=1}^\infty \mathrm{Cov}\left[ (X_1 - \theta m_o(X_0)) m_o(X_0),\; (X_{k+1} - \theta m_o(X_k)) m_o(X_k) \right] \\
&= E\varepsilon_1^2\, E m_o(X_0)^2 + E\left[ (m(X_0) - \theta m_o(X_0))^2 m_o(X_0)^2 \right] + 2\sum_{k=1}^\infty \mathrm{Cov}\left[ (X_1 - \theta m_o(X_0)) m_o(X_0),\; (X_{k+1} - \theta m_o(X_k)) m_o(X_k) \right]
\end{aligned}$$
depends on the whole dependence structure of the process. The application of the wild bootstrap will lead in any case to an asymptotic normal distribution with variance $E\varepsilon_1^2 / E m_o(X_0)^2$, which is in general not equal to $\sigma_\infty^2 / (E m_o(X_0)^2)^2$. In contrast, the process bootstrap described in Section 3 leads to consistency. This is the content of the following result.

Lemma 4.2.

Suppose that the data generating process obeys (2.2) and that (A3), (A4) and (A7) are fulfilled. Then
$$\sqrt{T}\left( \frac{\sum_t X^*_t m_o(X^*_{t-1})}{\sum_t m_o(X^*_{t-1})^2} - \theta^* \right) \Longrightarrow N\left( 0,\; \sigma_\infty^2 / (E m_o(X_0)^2)^2 \right)$$
holds if $(X_0, \ldots, X_T) \in \Omega_T$; $\theta^*$ denotes the value of the optimal fit (in the $L_2$-sense) of the parametric model in the bootstrap world, i.e.,
$$\theta^* = \frac{E^* X^*_1 m_o(X^*_0)}{E^* m_o(X^*_0)^2} \longrightarrow \frac{E X_1 m_o(X_0)}{E m_o(X_0)^2} = \theta \quad \text{as } T \to \infty.$$
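A minimal sketch (ours) of how Lemma 4.2 is used in practice: recompute the least squares ratio on autoregressive bootstrap paths and take the spread of $\sqrt{T}(\theta^*_b - \cdot)$ as an approximation to the law of $\sqrt{T}(\hat\theta - \theta)$. Centering at the Monte Carlo mean is a practical surrogate for the exact $\theta^*$, and bootstrap_path is again the hypothetical helper from above:

```python
import numpy as np

def theta_ls(X, m_o):
    """Least squares ratio: sum_t X_t m_o(X_{t-1}) / sum_t m_o(X_{t-1})^2."""
    pred, succ = X[:-1], X[1:]
    return (succ * m_o(pred)).sum() / (m_o(pred) ** 2).sum()

def bootstrap_theta_law(m_hat, eps_hat, m_o, T, h, B=1000):
    reps = np.array([theta_ls(bootstrap_path(m_hat, eps_hat, T, h), m_o)
                     for _ in range(B)])
    # sqrt(T)(theta*_b - theta*), with theta* replaced by the Monte Carlo mean
    return np.sqrt(T) * (reps - reps.mean())
```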


5. Proofs

Proof of Lemma 2.1. A condensed proof of this lemma has already been given in Nummelin and Tuominen (1982).

(i) Let $x \notin K$. We get immediately from (A1)(i)
$$|x| - \lambda E(|X_1| \mid X_0 = x) \ge \lambda\varepsilon. \qquad (5.1)$$
Analogously we have
$$I(y \in K^c)\left[ |y| - \lambda E(|X_2| \mid X_1 = y) \right] \ge \lambda\varepsilon\, I(y \in K^c).$$
Multiplying both sides with $\lambda$ and taking the expectation over $X_1$ under the condition $X_0 = x$, we obtain
$$E_x\left( I(X_1 \in K^c)\left[ \lambda|X_1| - \lambda^2|X_2| \right] \right) \ge \lambda^2\varepsilon\, P_x(X_1 \in K^c). \qquad (5.2)$$
By analogous considerations, we get
$$E_x\left( I(X_1, \ldots, X_k \in K^c)\left[ \lambda^k|X_k| - \lambda^{k+1}|X_{k+1}| \right] \right) \ge \lambda^{k+1}\varepsilon\, P_x(X_1, \ldots, X_k \in K^c). \qquad (5.3)$$
Now we obtain from (5.1) to (5.3) that
$$|x| \ge \varepsilon \sum_{k=0}^\infty \lambda^{k+1} P_x(X_1, \ldots, X_k \in K^c) = \varepsilon \sum_{k=0}^\infty \lambda^{k+1} P_x(\tau_K > k) \ge \varepsilon\, E_x \lambda^{\tau_K}.$$

(ii) For $x \in K$, we obtain that
$$E_x \lambda^{\tau_K} = \int_K P(x, dy)\,\lambda + \int_{K^c} P(x, dy)\,\lambda\, E_y \lambda^{\tau_K} \le \lambda P_x(\tau_K = 1) + \lambda\varepsilon^{-1} \int_{K^c} P(x, dy)\,|y| \le \lambda\left( 1 + \varepsilon^{-1}A \right).$$
[Notice that the term "$\lambda P_x(\tau_K = 1)$" was missing in Theorem 3.1 of Nummelin and Tuominen (1982) as well as on page 90 in Doukhan (1994).]

Proof of Theorem 2.1. (i) Some preliminaries: irreducibility, recurrence and the existence of $\pi$.

First we check irreducibility of $\{X_t\}$, since this simplifies the analysis by excluding the case of more than one absorbing set. By Lemma 2.1, $\varphi = \nu(\cdot \cap K)$ is obviously an irreducibility measure. According to Proposition 4.2.2 from Meyn and Tweedie (1993, p. 88), there also exists a maximal irreducibility measure $\psi$.

Since $K$ is a small set with $P_x(\tau_K < \infty) = 1$ for all $x$, we obtain from Theorem 8.3.6 in Meyn and Tweedie (1993, p. 187) that $\{X_t\}$ is recurrent. ($\{X_t\}$ is called recurrent if it is $\psi$-irreducible and $\sum_{n=1}^\infty P^n(x, A) = \infty$ for each $x \in \mathbb{R}$ and every measurable set $A$ with $\psi(A) > 0$.)

Since $\{X_t\}$ is recurrent, we conclude from Theorem 10.4.4 of Meyn and Tweedie (1993, p. 242) that there exists a unique invariant measure, which we denote by $\pi$.


(ii) Coupling

Our proof of geometric ergodicity is mainly based on an appropriate coupling of one Markov chain started in some state $x$ with another chain having an initial distribution equal to $\pi$. This is one of the classical approaches to prove ergodicity of Markov chains; see, for example, Lindvall (1992) and Meyn and Tweedie (1993). The most substantial novelty of our proof is that we focus on explicit constants, which are necessary in view of the randomness of the parameters of the bootstrap process.

Coupling consists of establishing an appropriate pairing of two Markov chains,
$$X_0, X_1, \ldots \quad \text{with } X_0 \sim \delta_x,$$
and
$$X'_0, X'_1, \ldots \quad \text{with } X'_0 \sim \pi,$$
on a joint probability space. Let $\tau$ be the first time that both chains reach any state simultaneously. By the Markov property, we can pair these chains in such a way that $X_t \equiv X'_t$ for all $t \ge \tau$. We call the time $\tau$ the coupling time of the two processes. It is easy to see that
$$\left\| P^n(x, \cdot) - \pi \right\|_{\mathrm{Var}} = \sup_{f:\, |f| \le 1}\left| \int P^n(x, dy) f(y) - \int \pi(dy) f(y) \right| \le 2 P_x(X_n \ne X'_n) \le 2 P_x(\tau > n). \qquad (5.4)$$
For Markov chains with an accessible atom $\alpha$ (a set $\alpha$ is called an atom if there exists a probability measure $\nu$ such that $P(x, B) = \nu(B)$ for all $x \in \alpha$) the construction of such a pairing is not difficult: one simply lets both chains run independently until they reach $\alpha$ simultaneously, and from that time on both chains are identical.

In our context, which includes the case of purely continuous distributions of the innovations $\varepsilon_t$, the existence of an accessible atom is not guaranteed. However, under assumption (A2)(i), we may use the splitting device of Nummelin (1978) and Athreya and Ney (1978) to introduce an appropriate substitute, which we also denote by $\alpha$, and which is an atom for the $n_0$-skeleton of the chain, that is,
$$P^{n_0}(x, B) = \nu(B) \quad \text{for all } x \in \alpha,\ B \text{ measurable}.$$
Hence, we can couple $\{X_t\}$ and $\{X'_t\}$ in such a way that $X_t \equiv X'_t$ for all $t \ge \tau + n_0$, where $\tau$ is the time of the first common visit to the state $\alpha$.

To define a suitable substitute for the atom $\alpha$, we apply the idea of Athreya and Ney (1978) and use an additional randomization with the aid of independent random variables $N_t$ and $N'_t$, $t = 1, \ldots, T$, with $P(N_t = 1) = P(N'_t = 1) = \gamma$. If $X_t$ hits $K$, then we define the $n_0$-step transition probability equal to $\nu(\cdot)$ if $N_t = 1$, and equal to $\left[ P^{n_0}(X_t, \cdot) - \gamma\nu(\cdot) \right]/(1 - \gamma)$ if $N_t = 0$. (The same is done for the chain $\{X'_t\}$ in dependence on the value of $N'_t$.) In other words, $X_t$ hits the atom $\alpha$ if $X_t \in K$ and $N_t = 1$.
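The following toy sketch (ours, with all concrete choices made up for illustration: a linear $m$, $K = [-1, 1]$, $n_0 = 1$, standard normal innovations) implements this splitting. For $x \in K$ the Gaussian transition density $\varphi(y - 0.5x)$ is minorized by $g(y) = \varphi(|y| + 0.5)$, so we may set $\gamma = \int g$ and $\nu = g/\gamma$; two chains couple once both hit the atom simultaneously.

```python
import numpy as np
from math import erf, exp, pi, sqrt

rng = np.random.default_rng(3)
phi = lambda y: exp(-0.5 * y * y) / sqrt(2.0 * pi)     # N(0,1) density
Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))       # N(0,1) cdf
m = lambda x: 0.5 * x
g = lambda y: phi(abs(y) + 0.5)                        # minorant of phi(y - m(x)), x in K
gamma = 2.0 * (1.0 - Phi(0.5))                         # total mass of g

def draw_nu():
    while True:                                        # rejection sampler for nu = g/gamma
        y = rng.normal()
        if rng.uniform() < g(y) / phi(y):              # g(y) <= phi(y), ratio in [0,1]
            return y

def step(x):
    """One split transition; returns (next state, atom hit?)."""
    if abs(x) <= 1.0:                                  # x in K: split the kernel
        if rng.uniform() < gamma:                      # N_t = 1: jump from nu
            return draw_nu(), True
        while True:                                    # residual kernel, by rejection
            y = m(x) + rng.normal()
            if rng.uniform() > g(y) / phi(y - m(x)):
                return y, False
    return m(x) + rng.normal(), False                  # outside K: plain transition

X, Xp = 8.0, rng.normal()                              # chain from x, and a second chain
for t in range(1, 500):
    (X, hit), (Xp, hitp) = step(X), step(Xp)
    if hit and hitp:                                   # simultaneous atom hit:
        Xp = X                                         # both continue with the same nu-draw
        print("coupled at time", t)
        break
```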

(iii) An experiment consisting of successive trials


In view of (5.4), it remains to find a pairing of $\{X_t\}$ and $\{X'_t\}$ such that
$$\int P_x(\tau + n_0 > n)\, \pi(dx) \le C\rho^{-n}, \qquad (5.5)$$
where $P_x$ refers to an initial condition of $X_0 = x$ for the Markov chain.

To bound the probability $P_x(\tau + n_0 > n)$, we consider successive trials of the chains $\{X_t\}$ and $\{X'_t\}$ to hit the state $\alpha$ at the same time. We define stopping times $\sigma_i$ and $\sigma'_i$ that refer to certain events that $\{X_t\}$ and $\{X'_t\}$ visit $K$. Let
$$\sigma_0 = \min_j\left\{ X_j \in K \right\}$$
and
$$\sigma'_0 = \min_j\left\{ X'_j \in K \mid j \ge \sigma_0 \right\}.$$
Further, we define inductively
$$\sigma_i = \min_j\left\{ X_j \in K \mid j \ge \sigma'_{i-1} \right\}$$
and
$$\sigma'_i = \min_j\left\{ X'_j \in K \mid j \ge \sigma_i \right\}.$$
It is clear that $\sigma_i$ and $\sigma'_i$ are indeed stopping times with respect to the $\sigma$-field $\mathcal{B}_i = \sigma(X_0, \ldots, X_{\sigma_i}, X'_0, \ldots, X'_{\sigma'_i})$. These stopping times are defined in such a way that
$$\sigma_0 \le \sigma'_0 \le \sigma_1 \le \sigma'_1 \le \cdots \le \sigma_i \le \sigma'_i \le \cdots$$

The time $\tau$ corresponds to the first joint visit of the Markov chains $\{X_t\}$ and $\{X'_t\}$ at $\alpha$. Accordingly, we call a trial $(\sigma_i, \sigma'_i, N_{\sigma_i}, N'_{\sigma'_i})$ successful if $\sigma_i = \sigma'_{i-1}$, $N_{\sigma_i} = N'_{\sigma'_{i-1}} = 1$ or $\sigma'_i = \sigma_i$, $N'_{\sigma'_i} = N_{\sigma_i} = 1$. Our next step consists of showing that the conditional probability of a success of a trial $(\sigma_i, \sigma'_i, N_{\sigma_i}, N'_{\sigma'_i})$, given $\mathcal{B}_{i-1}$, is bounded away from zero. Actually, we were not able to prove that $P(\sigma_i = \sigma'_{i-1}, N_{\sigma_i} = N'_{\sigma'_{i-1}} = 1 \mid \mathcal{B}_{i-1})$ can be bounded in such a way. It might happen that $\sigma'_{i-1} - \sigma_{i-1}$ is arbitrarily large.

[Since we do not have an explicit lower bound for $\inf_{j \le L} \inf_{x \in K} P^j(x, K)$, we cannot derive an explicit lower bound for $P(\sigma_i = \sigma'_{i-1}, N_{\sigma_i} = N'_{\sigma'_{i-1}} = 1 \mid \mathcal{B}_{i-1})$.] However, fortunately, we can find such a bound for $P(\sigma'_i = \sigma_i, N'_{\sigma'_i} = N_{\sigma_i} = 1 \mid \mathcal{B}_{i-1})$. This explains why we are considering such "double trials" $(\sigma_i, \sigma'_i, N_{\sigma_i}, N'_{\sigma'_i})$ rather than single trials.

Using the last-exit representation, we find by (ii) of Lemma 2.1 that
$$\begin{aligned}
P\left( \sigma_i - \sigma'_{i-1} \ge L \mid \mathcal{B}_{i-1} \right) &= \sum_{s=0}^{\sigma'_{i-1} - \sigma_{i-1}} \int_K P^{\sigma'_{i-1} - \sigma_{i-1} - s}_{X_{\sigma_{i-1}}}(dy)\, P_y(\tau_K \ge L + s) \\
&\le \sum_{t=L}^\infty \sup_{y \in K}\left\{ P_y(\tau_K \ge t) \right\} \le \sum_{t=L}^\infty \sup_{y \in K}\left\{ E_y \lambda^{\tau_K}\, \lambda^{-t} \right\} \le C \sum_{t=L}^\infty \lambda^{-t} \longrightarrow 0 \quad \text{as } L \to \infty. \qquad (5.6)
\end{aligned}$$
