B. Supplement: Technical details

B.2. Proof of Theorem 2

Then for $t = 2,\dots,n$, we have that
\[
E\Big[\big(T_1^{\{l\}}(\zeta_1)-T_1^{\{l\}}(\zeta_2)\big)\big(T_t^{\{l\}}(\zeta_1)-T_t^{\{l\}}(\zeta_2)\big)\Big] = \lambda^T \Gamma^{t-1}\lambda .
\]
Now
\[
0 = E\big[T_1^{\{l\}}(\zeta_1)-T_1^{\{l\}}(\zeta_2)\big] = \lambda^T\pi ,
\]
and therefore for some $c>0$,
\[
\lambda^T \Gamma^{t-1}\lambda \le c\, r^{t-1}\, \|\lambda\|^2,
\]
where $0\le r<1$ can be chosen slightly larger than the second-largest eigenvalue of $\Gamma$, see e.g. Seneta (2006, theorem 1.2). By (25), we get for some $c_1>0$ that $\|\lambda\|^2 \le c_1\,\|\zeta_1-\zeta_2\|_1^2$. Therefore
\[
\frac{2}{n}\sum_{t=2}^{n}(n+1-t)\,E\Big[\big(T_1^{\{l\}}(\zeta_1)-T_1^{\{l\}}(\zeta_2)\big)\big(T_t^{\{l\}}(\zeta_1)-T_t^{\{l\}}(\zeta_2)\big)\Big] \le 2\,c\,c_1 \sum_{t=2}^{n} r^{t-1}\,\|\zeta_1-\zeta_2\|_1^2,
\]
which concludes the proof.
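Explicitly, since $0\le r<1$, the right-hand side can be bounded uniformly in $n$ by summing the geometric series (a routine step spelled out here for completeness):
\[
2\,c\,c_1 \sum_{t=2}^{n} r^{t-1}\,\|\zeta_1-\zeta_2\|_1^2
\;\le\; 2\,c\,c_1\,\|\zeta_1-\zeta_2\|_1^2 \sum_{t=2}^{\infty} r^{t-1}
\;=\; \frac{2\,c\,c_1\, r}{1-r}\,\|\zeta_1-\zeta_2\|_1^2 .
\]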

Let us next turn to consistency of the QMLE.

Recall that we assume the entries of $\vartheta_1,\dots,\vartheta_{k_0}$ to be distinct and ordered, $\vartheta_1 < \dots < \vartheta_{k_0}$, and write $\nu := (\nu_1,\dots,\nu_{d_1})$, $\theta = (\nu,\vartheta_1,\dots,\vartheta_{k_0})$. Let $\pi_j := P(S_t = j)$ for $j\in\{1,\dots,k_0\}$ denote the true stationary probability of the Markov chain for state $j$ and $\pi := (\pi_1,\dots,\pi_{k_0})$. The assumption of irreducibility, see A1, implies $\pi_j > 0$.

For the QMLE under the hypothesis we write $\hat\theta^{(k_0)} = \hat\theta = (\hat\nu,\hat\vartheta_1,\dots,\hat\vartheta_{k_0})$, where $\hat\vartheta_1 \le \dots \le \hat\vartheta_{k_0}$. For the QMLE $(\tilde\theta,\tilde\pi)$ under our specific alternative with $2k_0$ states, see (6), we write
\[
\tilde\theta = (\tilde\nu,\tilde\vartheta_1,\dots,\tilde\vartheta_{2k_0}),\qquad
\tilde\pi = \big(\tilde\beta_1\tilde\pi_1,(1-\tilde\beta_1)\tilde\pi_1,\dots,\tilde\beta_{k_0}\tilde\pi_{k_0},(1-\tilde\beta_{k_0})\tilde\pi_{k_0}\big),
\]
where each $\tilde\beta_j \in J$.

Lemma 7. Under Assumptions A1, A2 and A3, we have that

a. $\hat\nu \to \nu$, $\hat\pi_j \to \pi_j$ and $\hat\vartheta_j \to \vartheta_j$, $j = 1,\dots,k_0$, in probability,

b. $\tilde\nu \to \nu$, $\tilde\pi_j \to \pi_j$ and $\tilde\vartheta_{2j-1},\tilde\vartheta_{2j} \to \vartheta_j$, $j = 1,\dots,k_0$, in probability.

Proof of Lemma 7. a. Let $\bar\Theta_1$ be the closure of $\Theta_1$ in $\bar{\mathbb R}^{d_1}$, where $\bar{\mathbb R} = \mathbb R\cup\{+\infty,-\infty\}$, and similarly for $\bar\Theta_2$. For parameters $\theta = (\nu,\vartheta_1,\dots,\vartheta_{k_0})$, $\nu\in\bar{\mathbb R}^{d_1}$, $\vartheta_j\in\bar\Theta_2$, and $k_0$ weights $\pi$ let
\[
G_{\theta,\pi}(t,s) = \sum_{j=1}^{k_0} \pi_j\, I(\vartheta_j\le t,\ \nu_1\le s_1,\dots,\nu_{d_1}\le s_{d_1}),\qquad t\in\bar\Theta_2,\ s\in\bar\Theta_1,
\]
denote the corresponding mixing distribution with at most $k_0$ components. Let $d_w(\cdot,\cdot)$ denote a metric which metrizes weak convergence of probabilities on $\bar\Theta_2\times\bar\Theta_1$. Our claim follows from the weak convergence
\[
d_w\big(G_{\hat\theta,\hat\pi},\, G_\theta\big) \to 0 \quad\text{in probability}, \tag{26}
\]
since by assumption $G_\theta$ has $k_0$ distinct support points, so that the (ordered) support points and weights of $G_{\hat\theta,\hat\pi}$ must converge as well.

To show (26), we apply the classical consistency result by Wald (1949), in the version of theorem 5.14 in van der Vaart (1998) for general M-estimators. Since the result only relies on a law of large numbers for an integrable function of the observations, the theorem also applies in the case of stationary, ergodic observations (the $P$ in van der Vaart is the univariate marginal distribution). In our case, the parameter is the mixing distribution $G_{\theta,\pi}$, $\nu\in\bar\Theta_1$, $\vartheta_j\in\bar\Theta_2$, $j=1,\dots,k_0$, which ranges through a compact set by compactness of $\bar\Theta_1$ and $\bar\Theta_2$, and the criterion function is the log mixing density, given for $\nu\in\Theta_1$, $\vartheta_j\in\Theta_2$, $j=1,\dots,k_0$, by

\[
m_{G_{\theta,\pi}}(x) := \log \int_{\Theta_2\times\Theta_1} f(x;s,t)\, dG_{\theta,\pi}(t,s) = \log f_{\mathrm{mix}}^{(k_0)}(x;\theta,\pi),
\]
and $m_{G_{\theta,\pi}}(x) = -\infty$ if the parameters are not all contained in $\Theta_1$ and $\Theta_2$. The quasi log-likelihood of section 2, expressed in terms of the mixing distribution, is thus given by
\[
l_n\big(G_{\theta,\pi}\big) = \sum_{t=1}^{n} m_{G_{\theta,\pi}}(X_t).
\]

It remains to check the assumptions for Theorem 5.14 in van der Vaart (1998).

First, by identifiability of finite mixtures and the existence of the Kullback–Leibler divergence, Assumption A3 a. and d., together with the definiteness of the Kullback–Leibler divergence and the boundary convention $m_{G_{\theta,\pi}}(x) = -\infty$, the set of maximizers of $E\, m_{G_{\theta,\pi}}(X_1)$ in $G_{\theta,\pi}$ is the singleton $G_\theta$, and as noted above the space of mixing distributions is compact.
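In more detail, the Kullback–Leibler step reads as follows (a routine elaboration, with $m_G$ the log mixing density as above and $X_1$ having marginal density $f_{\mathrm{mix}}^{(k_0)}(\cdot;\theta,\pi)$): for any competing mixing distribution $G_{\theta',\pi'}$ with parameters in $\Theta_1$, $\Theta_2$,
\[
E\, m_{G_\theta}(X_1) - E\, m_{G_{\theta',\pi'}}(X_1)
= E\,\log\frac{f_{\mathrm{mix}}^{(k_0)}(X_1;\theta,\pi)}{f_{\mathrm{mix}}^{(k_0)}(X_1;\theta',\pi')}\ \ge\ 0,
\]
with equality only if $f_{\mathrm{mix}}^{(k_0)}(\cdot;\theta',\pi') = f_{\mathrm{mix}}^{(k_0)}(\cdot;\theta,\pi)$ almost everywhere, which by identifiability of finite mixtures forces $G_{\theta',\pi'} = G_\theta$; parameters outside $\Theta_1$, $\Theta_2$ are excluded by the boundary convention $m_{G_{\theta',\pi'}} = -\infty$.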

Now, condition (5.13) in van der Vaart (1998) is immediate from the uniform boundedness condition, Assumption A3 c. For condition (5.12), if $d_w(G_{\theta_l,\pi_l}, G_{\theta,\pi}) \to 0$ as $l\to\infty$, where all mixing distributions as above have at most $k_0$ support points, then the support points of the $G_{\theta_l,\pi_l}$ must converge to some support point of $G_{\theta,\pi}$, or their weight converges to $0$. Further, the sum of the weights of the support points converging to a specific support point of $G_{\theta,\pi}$ converges to the weight of that support point. Therefore, (5.12) in van der Vaart (1998) follows from the continuity and limit properties of the densities, Assumption A2 and Assumption A3 b.

Finally, by definition of $G_{\hat\theta,\hat\pi}$ we have that
\[
l_n\big(G_{\hat\theta,\hat\pi}\big) \ge l_n\big(G_{\theta}\big), \tag{27}
\]
so that (26) finally follows from theorem 5.14 in van der Vaart (1998).

b. Now consider mixing distributions $G_{\theta,\pi}$ with up to $2k_0$ states for parameters $\theta = (\nu,\vartheta_1,\dots,\vartheta_{2k_0})$ and $2k_0$-dimensional weights $\pi$ (potentially with zero entries). We shall show that
\[
d_w\big(G_{\tilde\theta,\tilde\pi},\, G_\theta\big) \to 0 \quad\text{in probability};
\]
then, by the specific forms of the parameter vector $\tilde\theta$ and the weight vector $\tilde\pi$, the claim in part b. follows. In order to apply theorem 5.14 in van der Vaart (1998), by the arguments in part a. we only need to check that

\[
l_n\big(G_{\tilde\theta,\tilde\pi}\big) \ge l_n(G_\theta) + o_P(1). \tag{28}
\]
Now $G_{\hat\theta,\hat\pi}$ can evidently be written as an element of
\[
\big\{G_{\theta,\pi} :\ \pi\in\Omega_{2k_0}(J),\ \nu\in\Theta_1,\ \vartheta_{2j-1},\vartheta_{2j}\in I_j,\ j=1,\dots,k_0\big\}.
\]
Since $G_{\tilde\theta,\tilde\pi}$ is by definition the maximizer of $l_n$ over this class, we have
\[
l_n\big(G_{\tilde\theta,\tilde\pi}\big) \ge l_n\big(G_{\hat\theta,\hat\pi}\big),
\]
which together with (27) implies (28).

Setting
\[
s_n := \sum_{t=1}^{n} \tilde b_{2t},
\]
we have the following quadratic approximation to the test statistic.

Lemma 8. For the test statistic we have under Assumptions A1–A5 that
\[
R_n - \sup_{z\ge 0}\big\{ 2\, z^T s_n - n\, z^T \Sigma_{22}\, z \big\} = o_P(1), \tag{29}
\]
where $\{z\ge 0\} := \{(z_1,\dots,z_{k_0}) : z_j\ge 0,\ j=1,\dots,k_0\}$.

Proof of Lemma 8. The proof is quite similar to those in Chen, Chen and Kalbfleisch (2004) and Li and Chen (2010).

Decompose
\[
R_n = 2\big(l_n^{(2k_0)}(\tilde\theta,\tilde\pi) - l_n^{(k_0)}(\theta,\pi)\big) - 2\big(l_n^{(k_0)}(\hat\theta,\hat\pi) - l_n^{(k_0)}(\theta,\pi)\big) =: R_n^{(1)} - R_n^{(0)}.
\]
Consider $R_n^{(1)}$: We have $R_n^{(1)} = 2\sum_{t=1}^{n}\log(1+\delta_t)$, where
\[
\delta_t := \frac{f_{\mathrm{mix}}^{(2k_0)}(X_t;\tilde\theta,\tilde\pi) - f_{\mathrm{mix}}^{(k_0)}(X_t)}{f_{\mathrm{mix}}^{(k_0)}(X_t)}.
\]
First we derive an upper bound on $R_n^{(1)}$. Since $\log(1+x)\le x - x^2/2 + x^3/3$, we shall consider $\sum_{t=1}^{n}\delta_t^l$ for $l=1,2,3$.
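The elementary inequality used here can be checked directly (a standard calculus argument, valid for all $x > -1$): with $h(x) := x - x^2/2 + x^3/3 - \log(1+x)$,
\[
h'(x) = 1 - x + x^2 - \frac{1}{1+x} = \frac{(1-x+x^2)(1+x) - 1}{1+x} = \frac{x^3}{1+x},
\]
so $h$ is decreasing on $(-1,0)$ and increasing on $(0,\infty)$; together with $h(0)=0$ this gives $h\ge 0$, i.e. $\log(1+x) \le x - x^2/2 + x^3/3$.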

For $t=1,\dots,n$ we have
\[
\sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j)\,\Delta_{tj}(\tilde\nu,\vartheta)
= \sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j)\,\frac{f(X_t;\tilde\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)}
- \frac{f(X_t;\tilde\nu,\vartheta_{k_0})}{f_{\mathrm{mix}}^{(k_0)}(X_t)}\sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j)
= \sum_{j=1}^{k_0}(\tilde\pi_j-\pi_j)\,\frac{f(X_t;\tilde\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)},
\]
since
\[
\sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j) = (1-\tilde\pi_{k_0}) - (1-\pi_{k_0}) = -\tilde\pi_{k_0} + \pi_{k_0} = -(\tilde\pi_{k_0}-\pi_{k_0}).
\]

Now we subtract the right-hand side of the previous equation and add back its left-hand side, that is, we add zero to $\delta_t$. Regrouping the terms gives

\[
\begin{aligned}
\delta_t = {}& \sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j)\,\Delta_{tj}(\tilde\nu,\vartheta)
+ \frac{f_{\mathrm{mix}}^{(k_0)}\big(X_t;(\tilde\nu,\vartheta),\pi\big) - f_{\mathrm{mix}}^{(k_0)}(X_t)}{f_{\mathrm{mix}}^{(k_0)}(X_t)} \\
&+ \sum_{j=1}^{k_0} \tilde\pi_j\tilde\beta_j\, \frac{f(X_t;\tilde\nu,\tilde\vartheta_{2j-1}) - f(X_t;\tilde\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)}
+ \sum_{j=1}^{k_0} \tilde\pi_j(1-\tilde\beta_j)\, \frac{f(X_t;\tilde\nu,\tilde\vartheta_{2j}) - f(X_t;\tilde\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)}.
\end{aligned}
\tag{30}
\]
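That (30) is indeed just a regrouping of $\delta_t$ can be verified by collapsing the terms again (a routine check, using $\tilde\pi_j\tilde\beta_j + \tilde\pi_j(1-\tilde\beta_j) = \tilde\pi_j$ and the previous display):
\[
\sum_{j=1}^{k_0}\tilde\pi_j\tilde\beta_j\,\frac{f(X_t;\tilde\nu,\tilde\vartheta_{2j-1})-f(X_t;\tilde\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)}
+\sum_{j=1}^{k_0}\tilde\pi_j(1-\tilde\beta_j)\,\frac{f(X_t;\tilde\nu,\tilde\vartheta_{2j})-f(X_t;\tilde\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)}
= \frac{f_{\mathrm{mix}}^{(2k_0)}(X_t;\tilde\theta,\tilde\pi) - \sum_{j=1}^{k_0}\tilde\pi_j f(X_t;\tilde\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)},
\]
while the first two terms of (30) equal $\big(\sum_{j=1}^{k_0}\tilde\pi_j f(X_t;\tilde\nu,\vartheta_j) - f_{\mathrm{mix}}^{(k_0)}(X_t)\big)\big/f_{\mathrm{mix}}^{(k_0)}(X_t)$, so the four terms sum to $\delta_t$.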

Now we expand each of the terms in (30). To start, for $t=1,\dots,n$ and $j=1,\dots,k_0-1$,
\[
\begin{aligned}
\Delta_{tj}(\tilde\nu,\vartheta) &= \Delta_{tj}(\nu,\vartheta) + \big(\Delta_{tj}(\tilde\nu,\vartheta) - \Delta_{tj}(\nu,\vartheta)\big) \\
&= \Delta_{tj}(\nu,\vartheta) + (\tilde\nu-\nu)^T \nabla_\nu \Delta_{tj}(\bar\nu,\vartheta) \\
&= \Delta_{tj}(\nu,\vartheta) + \sum_{l=1}^{d_1} (\tilde\nu_l-\nu_l)\,\big[R_t^{\{l\}}(\bar\nu,\vartheta_j) - R_t^{\{l\}}(\bar\nu,\vartheta_{k_0})\big]
\end{aligned}
\]
for some $\bar\nu$ between $\tilde\nu$ and $\nu$, and where $R_t^{\{l\}}$ is defined in (22). Therefore, we obtain

\[
\sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j)\,\Delta_{tj}(\tilde\nu,\vartheta) = \sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j)\,\Delta_{tj}(\nu,\vartheta) + \varepsilon^{(A)}_{tn},
\]
where
\[
\varepsilon^{(A)}_{tn} := \sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j) \sum_{l=1}^{d_1}(\tilde\nu_l-\nu_l)\,\big[R_t^{\{l\}}(\bar\nu,\vartheta_j) - R_t^{\{l\}}(\bar\nu,\vartheta_{k_0})\big].
\]
Therefore,

\[
\sum_{t=1}^{n}\sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j)\,\Delta_{tj}(\tilde\nu,\vartheta) = \sum_{t=1}^{n}\sum_{j=1}^{k_0-1}(\tilde\pi_j-\pi_j)\,\Delta_{tj}(\nu,\vartheta) + \varepsilon^{(A)}_{n}, \tag{31}
\]
where, due to Lemmas 6 and 7,
\[
\varepsilon^{(A)}_{n} = \sum_{t=1}^{n} \varepsilon^{(A)}_{tn} = o_P(n^{1/2}) \sum_{j=1}^{k_0-1} (\tilde\pi_j-\pi_j).
\]

The second part of (30) can be expanded similarly. Here, for brevity, we omit $X_t$ and $k_0$ in the marginal mixture, i.e. we write $f_{t,\mathrm{mix}}(\nu)$ for $f_{\mathrm{mix}}^{(k_0)}\big(X_t;(\nu,\vartheta),\pi\big)$. We obtain

\[
\begin{aligned}
\frac{f_{t,\mathrm{mix}}(\tilde\nu) - f_{t,\mathrm{mix}}(\nu)}{f_{t,\mathrm{mix}}(\nu)}
&= (\tilde\nu-\nu)^T\, \frac{\nabla_\nu f_{t,\mathrm{mix}}(\nu)}{f_{t,\mathrm{mix}}(\nu)}
+ \tfrac12\, (\tilde\nu-\nu)^T\, \frac{\nabla_{\nu\nu} f_{t,\mathrm{mix}}(\bar\nu)}{f_{t,\mathrm{mix}}(\nu)}\, (\tilde\nu-\nu) \\
&= \sum_{l=1}^{d_1} (\tilde\nu_l-\nu_l)\, U_t^{\{l\}}(\nu,\vartheta)
+ \tfrac12 \sum_{l,i=1}^{d_1} (\tilde\nu_l-\nu_l)(\tilde\nu_i-\nu_i)\, V_t^{\{l,i\}}(\bar\nu,\vartheta) \\
&=: \sum_{l=1}^{d_1} (\tilde\nu_l-\nu_l)\, U_t^{\{l\}}(\nu,\vartheta) + \varepsilon^{(B)}_{tn},
\end{aligned}
\]
where $\bar\nu$ is again between $\tilde\nu$ and $\nu$, and $V_t^{\{l,i\}}(\bar\nu,\vartheta)$ is defined in (22). By Lemmas 6 and 7 it follows that

\[
\sum_{t=1}^{n} \frac{f_{\mathrm{mix}}^{(k_0)}\big(X_t;(\tilde\nu,\vartheta),\pi\big) - f_{\mathrm{mix}}^{(k_0)}\big(X_t;(\nu,\vartheta),\pi\big)}{f_{\mathrm{mix}}^{(k_0)}\big(X_t;(\nu,\vartheta),\pi\big)}
= \sum_{t=1}^{n} \sum_{l=1}^{d_1} U_t^{\{l\}}(\nu,\vartheta)\,(\tilde\nu_l-\nu_l) + \varepsilon^{(B)}_{n}, \tag{32}
\]
where
\[
\varepsilon^{(B)}_{n} = \sum_{t=1}^{n} \varepsilon^{(B)}_{tn} = o_P(n^{1/2}) \sum_{l=1}^{d_1} (\tilde\nu_l-\nu_l).
\]

To expand the remaining term in (30), we now consider
\[
\big(f(X_t;\tilde\nu,\tilde\vartheta_{2j-i}) - f(X_t;\tilde\nu,\vartheta_j)\big)\big/ f_{\mathrm{mix}}^{(k_0)}(X_t)
\]
for $t=1,\dots,n$, $j=1,\dots,k_0$ and $i=0,1$. We have
\[
\frac{f(X_t;\tilde\nu,\tilde\vartheta_{2j-i}) - f(X_t;\tilde\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)}
= \frac{f(X_t;\nu,\tilde\vartheta_{2j-i}) - f(X_t;\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)} + \varepsilon^{(C1)}_{tjin},
\]
where
\[
\begin{aligned}
\varepsilon^{(C1)}_{tjin} :={}& \sum_{l=1}^{d_1} (\tilde\nu_l-\nu_l)\, \frac{f_{\nu_l}(X_t;\bar\nu_j,\tilde\vartheta_{2j-i}) - f_{\nu_l}(X_t;\bar\nu_j,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)} \\
={}& \sum_{l=1}^{d_1} (\tilde\nu_l-\nu_l)\,(\tilde\vartheta_{2j-i}-\vartheta_j)\, \frac{f_{\nu_l\vartheta}(X_t;\bar\nu_j,\bar\vartheta_{2j-i})}{f_{\mathrm{mix}}^{(k_0)}(X_t)},
\end{aligned}
\]
and $\bar\vartheta_{2j-i}$ and $\bar\nu_j$ lie between the appropriate parameters.

Moreover,
\[
\frac{f(X_t;\nu,\tilde\vartheta_{2j-i}) - f(X_t;\nu,\vartheta_j)}{f_{\mathrm{mix}}^{(k_0)}(X_t)}
= Y_{tj}\,(\tilde\vartheta_{2j-i}-\vartheta_j) + \tfrac12\, Y''_{tj}\,(\tilde\vartheta_{2j-i}-\vartheta_j)^2 + \tfrac16\, Y'''_{tj}(\bar\vartheta_{2j-i})\,(\tilde\vartheta_{2j-i}-\vartheta_j)^3 .
\]
Therefore, setting $\varepsilon^{(C2)}_{tjin} := \tfrac16\, Y'''_{tj}(\bar\vartheta_{2j-i})\,(\tilde\vartheta_{2j-i}-\vartheta_j)^3$, let us define the error term by

\[
\varepsilon^{(C)}_{tn} := \sum_{j=1}^{k_0} \Big[\, \tilde\pi_j\tilde\beta_j\,\big(\varepsilon^{(C1)}_{tj1n} + \varepsilon^{(C2)}_{tj1n}\big) + \tilde\pi_j(1-\tilde\beta_j)\,\big(\varepsilon^{(C1)}_{tj0n} + \varepsilon^{(C2)}_{tj0n}\big) \Big].
\]
We obtain that

\[
\begin{aligned}
\sum_{t=1}^{n}\sum_{j=1}^{k_0} \Big[\, \tilde\pi_j\tilde\beta_j\, & \big(f(X_t;\tilde\nu,\tilde\vartheta_{2j-1}) - f(X_t;\tilde\nu,\vartheta_j)\big)
+ \tilde\pi_j(1-\tilde\beta_j)\,\big(f(X_t;\tilde\nu,\tilde\vartheta_{2j}) - f(X_t;\tilde\nu,\vartheta_j)\big) \Big]\Big/ f_{\mathrm{mix}}^{(k_0)}(X_t) \\
&= \sum_{t=1}^{n}\sum_{j=1}^{k_0} \big(\tilde\pi_j m_{1j}\, Y_{tj} + \tilde\pi_j m_{2j}\, Y''_{tj}\big) + \varepsilon^{(C)}_{n},
\end{aligned}
\tag{33}
\]

where
\[
m_{hj} := \tilde\beta_j\,(\tilde\vartheta_{2j-1}-\vartheta_j)^h + (1-\tilde\beta_j)\,(\tilde\vartheta_{2j}-\vartheta_j)^h, \qquad h = 1,2,
\]
and
\[
\varepsilon^{(C)}_{n} := \sum_{t=1}^{n} \varepsilon^{(C)}_{tn} = o_P(n^{1/2}) \sum_{j=1}^{k_0} \tilde\pi_j\,(m_{1j}+m_{2j})
\]
by Lemmas 6 and 7. Due to equations (31), (32) and (33) we may write

\[
\sum_{t=1}^{n} \delta_t = \sum_{t=1}^{n} b_t^T \tau + \varepsilon_n,
\]
where $\varepsilon_n = \varepsilon^{(A)}_{n} + \varepsilon^{(B)}_{n} + \varepsilon^{(C)}_{n}$ and
\[
\tau = \big(\tilde\pi_1-\pi_1,\dots,\tilde\pi_{k_0-1}-\pi_{k_0-1},\ \tilde\nu_1-\nu_1,\dots,\tilde\nu_{d_1}-\nu_{d_1},\ \tilde\pi_1 m_{11},\dots,\tilde\pi_{k_0} m_{1k_0},\ \tilde\pi_1 m_{21},\dots,\tilde\pi_{k_0} m_{2k_0}\big)^T. \tag{34}
\]

Using $|x|\le 1+x^2$ we further see that
\[
|\varepsilon_n| \le o_P(1) \sum_{j=1}^{3k_0-1+d_1} n^{1/2}\,|\tau_j| \le o_P(1) \sum_{j=1}^{3k_0-1+d_1} \big(n\,\tau_j^2 + 1\big) = o_P(n)\,\tau^T\tau + o_P(1).
\]
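Here the elementary inequality is applied componentwise with $x = n^{1/2}\tau_j$ (a routine intermediate step):
\[
o_P(1)\sum_{j=1}^{3k_0-1+d_1} n^{1/2}|\tau_j| \;\le\; o_P(1)\sum_{j=1}^{3k_0-1+d_1}\big(1 + n\,\tau_j^2\big) \;=\; o_P(1)\,(3k_0-1+d_1) + o_P(1)\, n\,\tau^T\tau \;=\; o_P(n)\,\tau^T\tau + o_P(1),
\]
since the number of components is fixed.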

Turning to $\sum_{t=1}^{n}\delta_t^2$ we have
\[
\sum_{t=1}^{n} \delta_t^2 = \sum_{t=1}^{n} \big(b_t^T\tau\big)^2 + \varepsilon^{(Q)}_{n},
\]
where
\[
\varepsilon^{(Q)}_{n} := \sum_{t=1}^{n} \big(\varepsilon^{(A)}_{tn} + \varepsilon^{(B)}_{tn} + \varepsilon^{(C)}_{tn}\big)^2
+ 2 \sum_{t=1}^{n} b_t^T\tau\, \big(\varepsilon^{(A)}_{tn} + \varepsilon^{(B)}_{tn} + \varepsilon^{(C)}_{tn}\big).
\]
Now

\[
\begin{aligned}
|\varepsilon^{(A)}_{tn}| &\le o_P(1)\, g(X_t)^{1/3} \sum_{j=1}^{k_0-1} |\tilde\pi_j-\pi_j|, \\
|\varepsilon^{(B)}_{tn}| &\le o_P(1)\, g(X_t)^{1/3} \sum_{l=1}^{d_1} |\tilde\nu_l-\nu_l|, \\
|\varepsilon^{(C)}_{tn}| &\le o_P(1)\, g(X_t)^{1/3} \sum_{j=1}^{k_0} \tilde\pi_j\,\big(|m_{1j}|+m_{2j}\big).
\end{aligned}
\]

By integrability of $g(X_t)$, we get from the ergodic theorem
\[
\sum_{t=1}^{n} \big(\varepsilon^{(A)}_{tn} + \varepsilon^{(B)}_{tn} + \varepsilon^{(C)}_{tn}\big)^2
\le 4 \sum_{t=1}^{n} \Big[\big(\varepsilon^{(A)}_{tn}\big)^2 + \big(\varepsilon^{(B)}_{tn}\big)^2 + \big(\varepsilon^{(C)}_{tn}\big)^2\Big]
\le o_P(n)\,\tau^T\tau + o_P(1) = o_P\big(n\,\tau^T\tau\big) + o_P(1).
\]
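Spelling this out for the first term (the other two are handled analogously; the factor $4$ bounds the cross terms of the square):
\[
\sum_{t=1}^{n}\big(\varepsilon^{(A)}_{tn}\big)^2
\le o_P(1)\Big(\sum_{j=1}^{k_0-1}|\tilde\pi_j-\pi_j|\Big)^{2}\sum_{t=1}^{n} g(X_t)^{2/3}
\le o_P(1)\,(k_0-1)\,\tau^T\tau\; n\,\frac1n\sum_{t=1}^{n} g(X_t)^{2/3}
= o_P(n)\,\tau^T\tau,
\]
since $\frac1n\sum_{t=1}^n g(X_t)^{2/3}$ converges almost surely by the ergodic theorem and $\big(\sum_j|\tilde\pi_j-\pi_j|\big)^2 \le (k_0-1)\sum_j(\tilde\pi_j-\pi_j)^2 \le (k_0-1)\,\tau^T\tau$.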

As in Li and Chen (2010), by the Cauchy–Schwarz inequality the second error term in the expansion of $\sum_{t=1}^{n}\delta_t^2$ is of no larger order. Since the remainder term in the expansion of $\sum_{t=1}^{n}\delta_t^3$ is also $o_P\big(n\,\tau^T\tau\big)$, we obtain the following bound for $R^{(1)}_n$:

\[
R_n^{(1)} \le 2 \sum_{t=1}^{n} b_t^T\tau - \sum_{t=1}^{n}\big(b_t^T\tau\big)^2 + \tfrac23 \sum_{t=1}^{n}\big(b_t^T\tau\big)^3 + o_P\big(n\,\tau^T\tau\big). \tag{35}
\]
In order to estimate the cubic term, from
\[
\frac1n \sum_{t=1}^{n} b_t b_t^T \xrightarrow{\ \mathrm{a.s.}\ } E\big(b_1 b_1^T\big)
\]
we obtain
\[
\sum_{t=1}^{n}\big(b_t^T\tau\big)^2 = n\,\tau^T\Sigma\,\tau\,\big(1+o_P(1)\big).
\]

Because of the positive definiteness of $\Sigma$, we further get
\[
\sum_{t=1}^{n}\big(b_t^T\tau\big)^2 + o_P\big(n\,\tau^T\tau\big) = n\,\tau^T\Sigma\,\tau\,\big(1+o_P(1)\big) + o_P(1)
\]

and n

t=1(bTtτ)3

n

t=1(bTtτ)2 ≤max(|τ|) =oP(1).

Thus, (35) reduces to the following bound:
\[
R^{(1)}_n \le 2 \sum_{t=1}^{n} b_t^T\tau - n\,\tau^T\Sigma\,\tau\,\big(1+o_P(1)\big) + o_P(1).
\]

Now, analogously to Li and Chen (2010), the upper bound just established for $R^{(1)}_n$ is bounded by $O_P(1)$, and thus we deduce $\tau = O_P(n^{-1/2})$. As for $R^{(0)}_n$, the classical expansion is
\[
R^{(0)}_n = \Big(\sum_{t=1}^{n} b_{1t}\Big)^T \big(n\,\Sigma_{11}\big)^{-1} \Big(\sum_{t=1}^{n} b_{1t}\Big) + o_P(1).
\]

Therefore,

\[
\begin{aligned}
R^{(1)}_n - R^{(0)}_n
&\le \sup_{\tau\in\mathbb R^{3k_0-1+d_1}} \Big\{ 2\sum_{t=1}^{n} b_t^T\tau - n\,\tau^T\Sigma\,\tau \Big\}
- \Big(\sum_{t=1}^{n} b_{1t}\Big)^T \big(n\Sigma_{11}\big)^{-1} \Big(\sum_{t=1}^{n} b_{1t}\Big) + o_P(1) \\
&= \sup_{\tau_1\in\mathbb R^{2k_0-1+d_1}} \Big\{ 2\sum_{t=1}^{n} b_{1t}^T\tau_1 - n\,\tau_1^T\Sigma_{11}\,\tau_1 \Big\}
+ \sup_{\{\tau_2\ge 0\}} \Big\{ 2\,\tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\,\tau_2^T\Sigma_{22}\,\tau_2 \Big\} \\
&\qquad - \Big(\sum_{t=1}^{n} b_{1t}\Big)^T \big(n\Sigma_{11}\big)^{-1} \Big(\sum_{t=1}^{n} b_{1t}\Big) + o_P(1) \\
&= \sup_{\{\tau_2\ge 0\}} \Big\{ 2\,\tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\,\tau_2^T\Sigma_{22}\,\tau_2 \Big\} + o_P(1),
\end{aligned}
\]
where $\{z\ge 0\} := \{(z_1,\dots,z_{k_0}) : z_j\ge 0,\ j=1,\dots,k_0\}$.

The reasoning why this upper bound is attained in our setting is analogous to that in the i.i.d. case without structural parameters, i.e. to Li and Chen (2010).

Let $\bar\tau := (\tilde\tau_1,\bar\tau_2)$, with
\[
\begin{aligned}
\tilde\tau_1 &= \arg\sup_{\tau_1\in\mathbb R^{2k_0-1+d_1}} \Big\{ 2\sum_{t=1}^{n} b_{1t}^T\tau_1 - n\,\tau_1^T\Sigma_{11}\,\tau_1 \Big\} = n^{-1}\,\Sigma_{11}^{-1} \sum_{t=1}^{n} b_{1t} = O_P(n^{-1/2}), \\
\bar\tau_2 &= \arg\sup_{\tau_2\ge 0} \Big\{ 2\,\tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\,\tau_2^T\Sigma_{22}\,\tau_2 \Big\},
\end{aligned}
\tag{36}
\]
denote the vector attaining the upper bound of the previous display, where the order assessment of $\tilde\tau_1$ is due to the CLT for stationary, weakly dependent processes.

The unrestricted maximizer of the second function in (36) is $n^{-1}\Sigma_{22}^{-1} s_n = O_P(n^{-1/2})$, since $n^{-1/2} s_n$ is asymptotically normal. This implies that the unrestricted, and hence also the restricted, optimum of the second function is bounded by $n^{-1} s_n^T \Sigma_{22}^{-1} s_n = O_P(1)$. Therefore, we also have $\bar\tau_2 = O_P(n^{-1/2})$, because otherwise we would get a contradiction to the $O_P(1)$ upper bound.
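For completeness, the unrestricted maximizer mentioned here is obtained from the first-order condition of the strictly concave quadratic function (a one-line computation, using positive definiteness of $\Sigma_{22}$):
\[
\frac{\partial}{\partial \tau_2}\Big\{2\,\tau_2^T s_n - n\,\tau_2^T\Sigma_{22}\,\tau_2\Big\} = 2\,s_n - 2n\,\Sigma_{22}\,\tau_2 = 0
\quad\Longleftrightarrow\quad \tau_2 = \big(n\,\Sigma_{22}\big)^{-1} s_n = n^{-1}\Sigma_{22}^{-1} s_n,
\]
with optimal value $n^{-1} s_n^T \Sigma_{22}^{-1} s_n$, which is the $O_P(1)$ bound used above.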

Denote by $(\bar\theta,\bar\pi)$ the parameter leading, under the same mapping as in (34), to $\bar\tau$. Due to the non-negativity restriction in (36) and $\bar\tau = O_P(n^{-1/2})$, its existence is obvious. Further, since $\bar\tau = O_P(n^{-1/2})$,
\[
\bar\pi - \pi = O_P(n^{-1/2}), \qquad \bar\nu - \nu = O_P(n^{-1/2}),
\]
\[
\bar\vartheta_{2j-1} - \vartheta_j = O_P(n^{-1/4}), \qquad \bar\vartheta_{2j} - \vartheta_j = O_P(n^{-1/4}), \qquad j = 1,\dots,k_0.
\]
Now, due to the previous order assessment and a further expansion, see Chen, Chen and Kalbfleisch (2004, proof of Lemma 2) for a similar argument, we obtain

\[
\begin{aligned}
\bar R^{(1)}_n :&= 2\big(l^{(2k_0)}_n(\bar\theta,\bar\pi) - l^{(k_0)}_n(\theta,\pi)\big)
= 2\sum_{t=1}^{n} b_t^T\bar\tau - n\,\bar\tau^T\Sigma\,\bar\tau + o_P(1) \\
&= \sup_{\tau\in\mathbb R^{3k_0-1+d_1}} \Big\{ 2\sum_{t=1}^{n} b_t^T\tau - n\,\tau^T\Sigma\,\tau \Big\} + o_P(1),
\end{aligned}
\]
and thus

\[
\bar R^{(1)}_n - R^{(0)}_n = \sup_{\{\tau_2\ge 0\}} \Big\{ 2\,\tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\,\tau_2^T\Sigma_{22}\,\tau_2 \Big\} + o_P(1).
\]

Since $R^{(1)}_n \ge \bar R^{(1)}_n$ due to the maximizing property of the QMLE under the alternative, it holds that
\[
R^{(1)}_n - R^{(0)}_n \ge \bar R^{(1)}_n - R^{(0)}_n = \sup_{\{\tau_2\ge 0\}} \Big\{ 2\,\tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\,\tau_2^T\Sigma_{22}\,\tau_2 \Big\} + o_P(1).
\]

This ends the proof of Lemma 8.

To conclude the proof of Theorem 2 we show that $(\tilde b_{2t})_t$ is a martingale difference sequence, which is quite analogous to the case in Appendix 1. Then (7) follows as in the i.i.d. setting of Li and Chen (2010).

Consider the filtration $(\mathcal F_t)_{t\in\mathbb N}$ with
\[
\mathcal F_t := \sigma(S_j, b_j;\ j\le t), \qquad t\in\mathbb N.
\]
Then $\mathcal L(b_t\,|\,\mathcal F_{t-1}) = \mathcal L(b_t\,|\,S_{t-1})$, and therefore also $\mathcal L(\tilde b_{2t}\,|\,\mathcal F_{t-1}) = \mathcal L(\tilde b_{2t}\,|\,S_{t-1})$.

Thus, it remains to show that
\[
E\big[\tilde b_{2t}\,\big|\, S_{t-1}=j\big] = 0, \qquad j = 1,\dots,k_0. \tag{37}
\]
Let $\lambda_h := E(b_1\,|\,S_1=h)$ and $\gamma_{jh} := P(S_t = h\,|\,S_{t-1}=j)$ for $h=1,\dots,k_0$. As the Markov chain can take $k_0$ states under the hypothesis, it follows that
\[
E\big(b_t\,\big|\,S_{t-1}=j\big) = \sum_{h=1}^{k_0} \gamma_{jh}\,\lambda_h
\quad\text{and}\quad
E\big(b_{lt}\,\big|\,S_{t-1}=j\big) = \sum_{h=1}^{k_0} \gamma_{jh}\,\lambda_{hl} \quad\text{for } l = 1,2,
\]
where we partition $\lambda_h^T = \big(\lambda_{h1}^T,\lambda_{h2}^T\big)$ with $\lambda_{h1}\in\mathbb R^{2k_0-1+d_1}$. We get

\[
E\big[\tilde b_{2t}\,\big|\,S_{t-1}=j\big]^T = \sum_{h=1}^{k_0} \gamma_{jh}\,\lambda_{h2}^T - \Big(\sum_{h=1}^{k_0} \gamma_{jh}\,\lambda_{h1}^T\Big)\,\Sigma_{11}^{-1}\Sigma_{12}. \tag{38}
\]
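Equation (38) follows directly once $\tilde b_{2t}$ is written in its projected form; here we use $\tilde b_{2t} = b_{2t} - \Sigma_{21}\Sigma_{11}^{-1} b_{1t}$, which is how the projected score appears to be defined in Appendix 1 (assumed notation, not restated here):
\[
E\big[\tilde b_{2t}\,\big|\,S_{t-1}=j\big]^T
= E\big[b_{2t}\,\big|\,S_{t-1}=j\big]^T - E\big[b_{1t}\,\big|\,S_{t-1}=j\big]^T\,\Sigma_{11}^{-1}\Sigma_{12}
= \sum_{h=1}^{k_0}\gamma_{jh}\,\lambda_{h2}^T - \Big(\sum_{h=1}^{k_0}\gamma_{jh}\,\lambda_{h1}^T\Big)\Sigma_{11}^{-1}\Sigma_{12}.
\]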

Since $0 = E(b_1) = \sum_{h=1}^{k_0} \pi_h\lambda_h$, we obtain
\[
\lambda_{k_0} = \sum_{h=1}^{k_0-1} c_h\,\lambda_h, \qquad\text{with } c_h := -\pi_h/\pi_{k_0}, \tag{39}
\]
and inserting (39) in (38) gives, setting $d_h := \gamma_{jh} + \gamma_{jk_0}\, c_h$ for $h = 1,\dots,k_0-1$,

\[
E\big[\tilde b_{2t}\,\big|\,S_{t-1}=j\big]^T = \sum_{h=1}^{k_0-1} d_h\,\lambda_{h2}^T - \Big(\sum_{h=1}^{k_0-1} d_h\,\lambda_{h1}^T\Big)\,\Sigma_{11}^{-1}\Sigma_{12}. \tag{40}
\]
Now observe that
\[
E\big(\Delta_{1h}\, b_1\big) = \lambda_h - \lambda_{k_0}, \qquad h = 1,\dots,k_0-1.
\]

Let
\[
S := \begin{pmatrix} 0_{d_1\times(k_0-1)} \\ I_{k_0-1} \\ 0_{(2k_0)\times(k_0-1)} \end{pmatrix}, \qquad
T := \begin{pmatrix} 0_{d_1\times(k_0-1)} \\ I_{k_0-1} \\ 0_{k_0\times(k_0-1)} \end{pmatrix};
\]

then from the definition of $\Sigma$ and (39) we get
\[
\Sigma S = \big(E(\Delta_{11} b_1),\ \dots,\ E(\Delta_{1k_0-1} b_1)\big)
= \Big(\lambda_1 - \sum_{h=1}^{k_0-1} c_h\lambda_h,\ \dots,\ \lambda_{k_0-1} - \sum_{h=1}^{k_0-1} c_h\lambda_h\Big) =: \Lambda, \tag{41}
\]

where $0_\cdot$ denotes matrices of zeros and $I_\cdot$ identity matrices, all of the appropriate dimensions. This result also holds for the partitioned $\lambda$ vectors, i.e.

\[
\Sigma_{l1} T = \Big(\lambda_{1l} - \sum_{h=1}^{k_0-1} c_h\lambda_{hl},\ \dots,\ \lambda_{(k_0-1)l} - \sum_{h=1}^{k_0-1} c_h\lambda_{hl}\Big), \qquad l = 1,2.
\]

As in Appendix 1, one shows that
\[
\operatorname{span}(\Lambda) = \operatorname{span}\{\lambda_1,\dots,\lambda_{k_0-1}\},
\]
where $\operatorname{span}(\Lambda)$ denotes the space spanned by the columns of $\Lambda$. Therefore, there is a matrix $M\in\mathbb R^{(k_0-1)\times(k_0-1)}$ such that
\[
\Lambda M = \big(d_1\lambda_1,\ \dots,\ d_{k_0-1}\lambda_{k_0-1}\big),
\]
and thus from (41)

\[
\Sigma S M = \big(d_1\lambda_1,\ \dots,\ d_{k_0-1}\lambda_{k_0-1}\big),
\]
and hence for the submatrices of $\Sigma$,
\[
\Sigma_{l1} T M = \big(d_1\lambda_{1l},\ \dots,\ d_{k_0-1}\lambda_{(k_0-1)l}\big), \qquad l = 1,2.
\]

This implies
\[
(1,\dots,1)\, M^T T^T \Sigma_{1l} = \sum_{h=1}^{k_0-1} d_h\,\lambda_{hl}^T, \qquad l = 1,2.
\]

Using this first for $l=1$ and then for $l=2$, we get
\[
\Big(\sum_{h=1}^{k_0-1} d_h\,\lambda_{h1}^T\Big)\,\Sigma_{11}^{-1}\Sigma_{12}
= (1,\dots,1)\, M^T T^T \Sigma_{11}\,\Sigma_{11}^{-1}\Sigma_{12}
= (1,\dots,1)\, M^T T^T \Sigma_{12}
= \sum_{h=1}^{k_0-1} d_h\,\lambda_{h2}^T,
\]
which due to (40) implies (37). This ends the proof of Theorem 2.

C. Supplement: Details on Simulations, additional results,