B. Supplement: Technical details
B.2. Proof of Theorem 2
Testing for the number of states in hidden Markov models
Then for $t = 2, \dots, n$, we have that
$E\big[\big(T_1^{\{l\}}(\zeta_1) - T_1^{\{l\}}(\zeta_2)\big)\big(T_t^{\{l\}}(\zeta_1) - T_t^{\{l\}}(\zeta_2)\big)\big] = \lambda^T \Gamma^{t-1} \lambda.$
Now
$0 = E\big[T_1^{\{l\}}(\zeta_1) - T_1^{\{l\}}(\zeta_2)\big] = \lambda^T \pi,$
and therefore for some $c > 0$,
$\lambda^T \Gamma^{t-1} \lambda \le c\, r^{t-1} \|\lambda\|^2,$
where $0 \le r < 1$ can be chosen slightly larger than the second-largest eigenvalue of $\Gamma$, see e.g. Seneta (2006, theorem 1.2). By (25), we get for some $c_1 > 0$ that $\|\lambda\|^2 \le c_1 \|\zeta_1 - \zeta_2\|_1^2$. Therefore
$\frac{2}{n} \sum_{t=2}^{n} (n+1-t)\, E\big[\big(T_1^{\{l\}}(\zeta_1) - T_1^{\{l\}}(\zeta_2)\big)\big(T_t^{\{l\}}(\zeta_1) - T_t^{\{l\}}(\zeta_2)\big)\big] \le 2 c c_1 \sum_{t=2}^{\infty} r^{t-1} \|\zeta_1 - \zeta_2\|_1^2,$
which concludes the proof.
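The geometric decay bound above can be illustrated numerically. The sketch below uses a toy irreducible, aperiodic $3\times 3$ transition matrix (an arbitrary choice for illustration, not from the paper), picks a vector $\lambda$ with $\lambda^T\pi = 0$, and records the empirical constant $c$ for which $|\lambda^T\Gamma^{t-1}\lambda| \le c\, r^{t-1}\|\lambda\|^2$ with $r$ slightly above the modulus of the second-largest eigenvalue:

```python
import numpy as np

# Toy 3-state transition matrix (illustrative assumption, not from the paper)
Gamma = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.5, 0.2],
                  [0.2, 0.3, 0.5]])

# stationary distribution: normalized left eigenvector for eigenvalue 1
w, V = np.linalg.eig(Gamma.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# any lambda orthogonal to pi, mimicking 0 = lambda^T pi in the proof
lam = np.array([1.0, -1.0, 0.5])
lam = lam - (lam @ pi) / (pi @ pi) * pi

# r slightly larger than the modulus of the second-largest eigenvalue of Gamma
moduli = np.sort(np.abs(np.linalg.eigvals(Gamma)))[::-1]
r = moduli[1] + 0.05

# empirical constant c with |lam^T Gamma^{t-1} lam| <= c r^{t-1} ||lam||^2
ratios = [abs(lam @ np.linalg.matrix_power(Gamma, t - 1) @ lam)
          / (r ** (t - 1) * (lam @ lam))
          for t in range(2, 30)]
c_emp = max(ratios)
```

Since $\Gamma^{t-1}$ converges to $\mathbf{1}\pi^T$ at the rate of the second eigenvalue modulus and $\pi^T\lambda = 0$, the ratios stay bounded, so a finite $c$ as in the proof exists.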
Let us next turn to consistency of the QMLE.
Recall that we assume the entries of $\vartheta_1^*, \dots, \vartheta_{k_0}^*$ to be distinct and ordered, $\vartheta_1^* < \dots < \vartheta_{k_0}^*$, and set $\nu^* := (\nu_1^*, \dots, \nu_{d_1}^*)$, $\theta^* = (\nu^*, \vartheta_1^*, \dots, \vartheta_{k_0}^*)$. Let $\pi_j^* := P(S_t = j)$ for $j \in \{1, \dots, k_0\}$ denote the true stationary probability of the Markov chain for state $j$, and $\pi^* := (\pi_1^*, \dots, \pi_{k_0}^*)$. The assumption of irreducibility, see A1, implies $\pi_j^* > 0$.
For the QMLE under the hypothesis we write $\hat\theta^{(k_0)} = \hat\theta = (\hat\nu, \hat\vartheta_1, \dots, \hat\vartheta_{k_0})$, where $\hat\vartheta_1 \le \dots \le \hat\vartheta_{k_0}$. For the QMLE $(\tilde\theta, \tilde\pi)$ under our specific alternative with $2k_0$ states, see (6), we write
$\tilde\theta = (\tilde\nu, \tilde\vartheta_1, \dots, \tilde\vartheta_{2k_0}), \quad \tilde\pi = \big(\tilde\beta_1 \tilde\pi_1,\ (1-\tilde\beta_1)\tilde\pi_1,\ \dots,\ \tilde\beta_{k_0}\tilde\pi_{k_0},\ (1-\tilde\beta_{k_0})\tilde\pi_{k_0}\big),$
where each $\tilde\beta_j \in J$.
Lemma 7. Under Assumptions A1, A2 and A3, we have that
a. $\hat\nu \to \nu^*$, $\hat\pi_j \to \pi_j^*$ and $\hat\vartheta_j \to \vartheta_j^*$, $j = 1, \dots, k_0$, in probability,
b. $\tilde\nu \to \nu^*$, $\tilde\pi_j \to \pi_j^*$ and $\tilde\vartheta_{2j-1}, \tilde\vartheta_{2j} \to \vartheta_j^*$, $j = 1, \dots, k_0$, in probability.
Proof of Lemma 7. a. Let $\bar\Theta_1$ be the closure of $\Theta_1$ in $\bar{\mathbb{R}}^{d_1}$, where $\bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty, -\infty\}$, and similarly for $\bar\Theta_2$. For parameters $\theta = (\nu, \vartheta_1, \dots, \vartheta_{k_0})$, $\nu \in \bar{\mathbb{R}}^{d_1}$, $\vartheta_j \in \bar\Theta_2$, and $k_0$ weights $\pi$ let
$G_{\theta,\pi}(t, s) = \sum_{j=1}^{k_0} \pi_j\, I(\vartheta_j \le t,\ \nu_1 \le s_1, \dots, \nu_{d_1} \le s_{d_1}), \quad t \in \bar\Theta_2,\ s \in \bar\Theta_1,$
denote the corresponding mixing distribution with at most $k_0$ components. Let $d_w(\cdot,\cdot)$ denote a metric which metrizes weak convergence of probability measures on $\bar\Theta_2 \times \bar\Theta_1$. Our claim follows from the weak convergence
$d_w\big(G_{\hat\theta,\hat\pi},\ G_{\theta^*,\pi^*}\big) \to 0 \quad \text{in probability}, \qquad (26)$
since by assumption $G_{\theta^*,\pi^*}$ has $k_0$ distinct support points, so that the (ordered) support points and weights of $G_{\hat\theta,\hat\pi}$ must converge as well.
To show (26), we apply the classical consistency result by Wald (1949), in the version of theorem 5.14 in van der Vaart (1998) for general M-estimators. Since the result only relies on a law of large numbers for an integrable function of the observations, the theorem also applies in the case of stationary, ergodic observations (the $P$ in van der Vaart is the univariate marginal distribution). In our case, the parameter is the mixing distribution $G_{\theta,\pi}$, $\nu \in \bar\Theta_1$, $\vartheta_j \in \bar\Theta_2$, $j = 1, \dots, k_0$, which ranges through a compact set by compactness of $\bar\Theta_2$ and $\bar\Theta_1$, and the criterion function is the mixing density for $\nu \in \Theta_1$, $\vartheta_j \in \Theta_2$, $j = 1, \dots, k_0$,
$m_{G_{\theta,\pi}}(x) := \int_{\bar\Theta_2 \times \bar\Theta_1} f(x; s, t)\, dG_{\theta,\pi}(t, s) = f_{mix}^{(k_0)}(x; \theta, \pi),$
and $m_{G_{\theta,\pi}}(x) = -\infty$ if the parameters are not all contained in $\Theta_1$ and $\Theta_2$. The quasi log-likelihood of section 2, expressed in terms of the mixing distribution, is thus given by
$l_n\big(G_{\theta,\pi}\big) = \sum_{t=1}^{n} m_{G_{\theta,\pi}}(X_t).$
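To make the criterion concrete, here is a minimal numerical sketch. The normal state-dependent densities, the two-state transition matrix, and all parameter values are illustrative assumptions, not the paper's setup: the quasi log-likelihood simply evaluates the marginal mixture density $m_{G_{\theta,\pi}}$ at each observation of a simulated HMM path, ignoring the serial dependence.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def quasi_loglik(x, weights, means, sd):
    # l_n(G) = sum_t log m_G(X_t), with m_G(x) = sum_j pi_j f(x; nu, vartheta_j)
    mix = sum(w * normal_pdf(x, m, sd) for w, m in zip(weights, means))
    return float(np.log(mix).sum())

rng = np.random.default_rng(0)
Gamma = np.array([[0.9, 0.1], [0.2, 0.8]])        # toy transition matrix
means_true, sd_true = np.array([-1.0, 2.0]), 0.5
n = 2000
s = np.zeros(n, dtype=int)
for t in range(1, n):
    s[t] = rng.choice(2, p=Gamma[s[t - 1]])
X = means_true[s] + sd_true * rng.standard_normal(n)

pi_stat = np.array([2.0 / 3.0, 1.0 / 3.0])        # stationary distribution of Gamma
ll_true = quasi_loglik(X, pi_stat, means_true, sd_true)
ll_wrong = quasi_loglik(X, pi_stat, np.array([0.0, 0.0]), sd_true)
```

Even though the observations are dependent, the ergodic theorem makes $n^{-1} l_n$ concentrate, and the true mixing distribution maximizes the limit; this is exactly what the Wald-type consistency argument exploits.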
It remains to check the assumptions of theorem 5.14 in van der Vaart (1998). First, by identifiability of finite mixtures and the existence of the Kullback–Leibler divergence, Assumptions A3 a. and d., from the definiteness of the Kullback–Leibler divergence and the boundary condition $m_{G_{\theta,\pi}}(x) = -\infty$, the set of maximizers of $E\, m_{G_{\theta,\pi}}(X_1)$ in $G_{\theta,\pi}$ is the singleton $\{G_{\theta^*,\pi^*}\}$, and as noted above the space of mixing distributions is compact.
Now, condition (5.13) in van der Vaart (1998) is immediate from the uniform boundedness condition, Assumption A3 c. For condition (5.12), if $d_w(G_{\theta_l,\pi_l}, G_{\theta,\pi}) \to 0$, $l \to \infty$, where all mixing distributions as above have at most $k_0$ support points, then the support points of the $G_{\theta_l,\pi_l}$ must converge to some support point of $G_{\theta,\pi}$, or their weight converges to 0. Further, the sum of the weights of the support points converging to a specific support point of $G_{\theta,\pi}$ converges to the weight of that support point. Therefore, (5.12) in van der Vaart (1998) follows by the continuity and limit properties of the densities, Assumption A2 and Assumption A3 b.
Finally, by definition of $G_{\hat\theta,\hat\pi}$ we have that
$l_n\big(G_{\hat\theta,\hat\pi}\big) \ge l_n\big(G_{\theta^*,\pi^*}\big), \qquad (27)$
so that (26) finally follows from theorem 5.14 in van der Vaart (1998).
b. Now consider mixing distributions $G_{\theta,\pi}$ with up to $2k_0$ states for parameters $\theta = (\nu, \vartheta_1, \dots, \vartheta_{2k_0})$ and $2k_0$-dimensional weights $\pi$ (potentially with zero entries). We shall show that
$d_w\big(G_{\tilde\theta,\tilde\pi},\ G_{\theta^*,\pi^*}\big) \to 0 \quad \text{in probability};$
then, by the specific forms of the parameter vector $\tilde\theta$ and the weight vector $\tilde\pi$, the claim in part b. follows. In order to apply theorem 5.14 in van der Vaart (1998), by the arguments in part a. we only need to check that
$l_n\big(G_{\tilde\theta,\tilde\pi}\big) \ge l_n\big(G_{\theta^*,\pi^*}\big) + o_P(1). \qquad (28)$
Now $G_{\hat\theta,\hat\pi}$ can evidently be written as an element of
$\{G_{\theta,\pi} :\ \pi \in \Omega_{2k_0}(J),\ \nu \in \Theta_1,\ \vartheta_{2j-1}, \vartheta_{2j} \in I_j,\ j = 1, \dots, k_0\}.$
Since $G_{\tilde\theta,\tilde\pi}$ is by definition the maximizer of $l_n$ over this class, we have
$l_n\big(G_{\tilde\theta,\tilde\pi}\big) \ge l_n\big(G_{\hat\theta,\hat\pi}\big),$
which together with (27) implies (28).
Setting
$s_n := \sum_{t=1}^{n} \tilde b_{2t},$
we have the following quadratic approximation to the test statistic.
Lemma 8. Under Assumptions A1–A5, for the test statistic we have that
$R_n - \sup_{z \ge 0}\big(2 z^T s_n - n\, z^T \Sigma_{22} z\big) = o_P(1), \qquad (29)$
where $\{z \ge 0\} := \{(z_1, \dots, z_{k_0}) :\ z_j \ge 0,\ j = 1, \dots, k_0\}$.
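The limiting functional in (29) is the maximum of a concave quadratic over the non-negative orthant. For small $k_0$ it can be computed exactly by enumerating active sets and checking the Karush–Kuhn–Tucker conditions; the following sketch is an illustrative implementation of that idea, not code from the paper:

```python
import itertools
import numpy as np

def sup_nonneg_quadratic(s, Sigma, n):
    """Maximize 2 z^T s - n z^T Sigma z over z >= 0 (Sigma positive definite).

    Enumerates candidate active sets A, solves the unconstrained problem on A,
    and keeps KKT-feasible points; z = 0 is always a feasible candidate."""
    k = len(s)
    best = 0.0
    for m in range(1, k + 1):
        for A in map(list, itertools.combinations(range(k), m)):
            zA = np.linalg.solve(Sigma[np.ix_(A, A)], s[A]) / n
            if np.all(zA >= 0):
                z = np.zeros(k)
                z[A] = zA
                # gradient 2s - 2n Sigma z must be <= 0 off the active set
                if np.all(2 * s - 2 * n * (Sigma @ z) <= 1e-9):
                    best = max(best, 2 * z @ s - n * z @ Sigma @ z)
    return best
```

For diagonal $\Sigma_{22}$ the value reduces to $\sum_j \max(s_{n,j}, 0)^2 / (n\,\sigma_{jj})$, the familiar form behind mixtures of $\chi^2$ limits.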
Proof of Lemma 8. The proof is quite similar to those in Chen, Chen and Kalbfleisch (2004) and Li and Chen (2010). Decompose
$R_n = 2\big(l_n^{(2k_0)}(\tilde\theta, \tilde\pi) - l_n^{(k_0)}(\theta^*, \pi^*)\big) - 2\big(l_n^{(k_0)}(\hat\theta, \hat\pi) - l_n^{(k_0)}(\theta^*, \pi^*)\big) =: R_n^{(1)} - R_n^{(0)}.$
Consider $R_n^{(1)}$: we have $R_n^{(1)} = 2 \sum_{t=1}^{n} \log(1 + \delta_t)$, where
$\delta_t := \frac{f_{mix}^{(2k_0)}(X_t; \tilde\theta, \tilde\pi) - f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)}.$
First we derive an upper bound on $R_n^{(1)}$. Since $\log(1+x) \le x - x^2/2 + x^3/3$, we shall consider $\sum_{t=1}^{n} \delta_t^l$ for $l = 1, 2, 3$.
For $t = 1, \dots, n$ we have
$\sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*)\, \Delta_{tj}(\tilde\nu, \vartheta^*) = \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*)\, \frac{f(X_t; \tilde\nu, \vartheta_j^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} - \frac{f(X_t; \tilde\nu, \vartheta_{k_0}^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*) = \sum_{j=1}^{k_0} (\tilde\pi_j - \pi_j^*)\, \frac{f(X_t; \tilde\nu, \vartheta_j^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)},$
since
$\sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*) = (1 - \tilde\pi_{k_0}) - (1 - \pi_{k_0}^*) = -\tilde\pi_{k_0} + \pi_{k_0}^* = -(\tilde\pi_{k_0} - \pi_{k_0}^*).$
Now, we subtract the right side of the previous equation and add the resulting zero to $\delta_t$. This gives
$\delta_t = \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*)\, \Delta_{tj}(\tilde\nu, \vartheta^*) + \frac{f_{mix}^{(k_0)}\big(X_t; (\tilde\nu, \vartheta^*), \pi^*\big) - f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} + \sum_{j=1}^{k_0} \tilde\pi_j \tilde\beta_j\, \frac{f(X_t; \tilde\nu, \tilde\vartheta_{2j-1}) - f(X_t; \tilde\nu, \vartheta_j^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} + \sum_{j=1}^{k_0} \tilde\pi_j (1 - \tilde\beta_j)\, \frac{f(X_t; \tilde\nu, \tilde\vartheta_{2j}) - f(X_t; \tilde\nu, \vartheta_j^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)}. \qquad (30)$
Now we expand each of the terms in (30). To start, for $t = 1, \dots, n$ and $j = 1, \dots, k_0 - 1$,
$\Delta_{tj}(\tilde\nu, \vartheta^*) = \Delta_{tj}(\nu^*, \vartheta^*) + \big[\Delta_{tj}(\tilde\nu, \vartheta^*) - \Delta_{tj}(\nu^*, \vartheta^*)\big] = \Delta_{tj}(\nu^*, \vartheta^*) + (\tilde\nu - \nu^*)^T \nabla_\nu \Delta_{tj}(\bar\nu, \vartheta^*) = \Delta_{tj}(\nu^*, \vartheta^*) + \sum_{l=1}^{d_1} (\tilde\nu_l - \nu_l^*)\, \big[R_t^{\{l\}}(\bar\nu, \vartheta_j^*) - R_t^{\{l\}}(\bar\nu, \vartheta_{k_0}^*)\big]$
for some $\bar\nu$ between $\tilde\nu$ and $\nu^*$, and where $R_t^{\{l\}}$ is defined in (22). Therefore, we obtain
$\sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*)\, \Delta_{tj}(\tilde\nu, \vartheta^*) = \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*)\, \Delta_{tj}(\nu^*, \vartheta^*) + \varepsilon_{tn}^{(A)},$
where
$\varepsilon_{tn}^{(A)} := \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*) \sum_{l=1}^{d_1} (\tilde\nu_l - \nu_l^*)\, \big[R_t^{\{l\}}(\bar\nu, \vartheta_j^*) - R_t^{\{l\}}(\bar\nu, \vartheta_{k_0}^*)\big].$
Therefore,
$\sum_{t=1}^{n} \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*)\, \Delta_{tj}(\tilde\nu, \vartheta^*) = \sum_{t=1}^{n} \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*)\, \Delta_{tj}(\nu^*, \vartheta^*) + \varepsilon_n^{(A)}, \qquad (31)$
where due to Lemmas 6 and 7,
$\varepsilon_n^{(A)} = \sum_{t=1}^{n} \varepsilon_{tn}^{(A)} = o_P(n^{1/2}) \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*).$
The second part of (30) can be expanded similarly. Here, for brevity, we omit $X_t, \pi^*, \vartheta^*, k_0$ in the marginal mixture, i.e. we write $f_{t,mix}(\nu)$ for $f_{mix}^{(k_0)}\big(X_t; (\nu, \vartheta^*), \pi^*\big)$. We obtain
$\frac{f_{t,mix}(\tilde\nu) - f_{t,mix}(\nu^*)}{f_{t,mix}(\nu^*)} = \frac{(\tilde\nu - \nu^*)^T \nabla_\nu f_{t,mix}(\nu^*) + \tfrac{1}{2} (\tilde\nu - \nu^*)^T \nabla_{\nu\nu} f_{t,mix}(\bar\nu)\, (\tilde\nu - \nu^*)}{f_{t,mix}(\nu^*)} = \sum_{l=1}^{d_1} (\tilde\nu_l - \nu_l^*)\, U_t^{\{l\}}(\nu^*, \vartheta^*) + \tfrac{1}{2} \sum_{l,i=1}^{d_1} (\tilde\nu_l - \nu_l^*)(\tilde\nu_i - \nu_i^*)\, V_t^{\{l,i\}}(\bar\nu, \vartheta^*) =: \sum_{l=1}^{d_1} (\tilde\nu_l - \nu_l^*)\, U_t^{\{l\}}(\nu^*, \vartheta^*) + \varepsilon_{tn}^{(B)},$
where $\bar\nu$ is again between $\tilde\nu$ and $\nu^*$, and $V_t^{\{l,i\}}(\nu, \vartheta)$ is defined in (22). By Lemmas 6 and 7 it follows that
$\sum_{t=1}^{n} \frac{f_{mix}^{(k_0)}\big(X_t; (\tilde\nu, \vartheta^*), \pi^*\big) - f_{mix}^{(k_0)}\big(X_t; (\nu^*, \vartheta^*), \pi^*\big)}{f_{mix}^{(k_0)}\big(X_t; (\nu^*, \vartheta^*), \pi^*\big)} = \sum_{t=1}^{n} \sum_{l=1}^{d_1} U_t^{\{l\}}(\nu^*)\, (\tilde\nu_l - \nu_l^*) + \varepsilon_n^{(B)}, \qquad (32)$
where
$\varepsilon_n^{(B)} = \sum_{t=1}^{n} \varepsilon_{tn}^{(B)} = o_P(n^{1/2}) \sum_{l=1}^{d_1} (\tilde\nu_l - \nu_l^*).$
To expand the remaining term in (30), we now consider $\big[f(X_t; \tilde\nu, \tilde\vartheta_{2j-i}) - f(X_t; \tilde\nu, \vartheta_j^*)\big] / f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)$ for $t = 1, \dots, n$, $j = 1, \dots, k_0$ and $i = 0, 1$. We have
$\frac{f(X_t; \tilde\nu, \tilde\vartheta_{2j-i}) - f(X_t; \tilde\nu, \vartheta_j^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} = \frac{f(X_t; \nu^*, \tilde\vartheta_{2j-i}) - f(X_t; \nu^*, \vartheta_j^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} + \varepsilon_{tjin}^{(C1)},$
where
$\varepsilon_{tjin}^{(C1)} := \sum_{l=1}^{d_1} (\tilde\nu_l - \nu_l^*)\, \frac{f_{\nu_l}(X_t; \bar\nu_j, \tilde\vartheta_{2j-i}) - f_{\nu_l}(X_t; \bar\nu_j, \vartheta_j^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} = \sum_{l=1}^{d_1} (\tilde\nu_l - \nu_l^*)\, \frac{(\tilde\vartheta_{2j-i} - \vartheta_j^*)\, f_{\nu_l \vartheta}(X_t; \bar\nu_j, \bar\vartheta_{2j-i})}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)},$
and $\bar\vartheta_{2j-i}$ and $\bar\nu_j$ lie between the appropriate parameters. Moreover,
$\frac{f(X_t; \nu^*, \tilde\vartheta_{2j-i}) - f(X_t; \nu^*, \vartheta_j^*)}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} = Y_{tj}'\, (\tilde\vartheta_{2j-i} - \vartheta_j^*) + \tfrac{1}{2} Y_{tj}''\, (\tilde\vartheta_{2j-i} - \vartheta_j^*)^2 + \tfrac{1}{6} Y_{tj}'''(\nu^*, \vartheta_{2j-i}')\, (\tilde\vartheta_{2j-i} - \vartheta_j^*)^3,$
where $\vartheta_{2j-i}'$ lies between $\tilde\vartheta_{2j-i}$ and $\vartheta_j^*$. Therefore, setting $\varepsilon_{tjin}^{(C2)} := \tfrac{1}{6} Y_{tj}'''(\nu^*, \vartheta_{2j-i}')\, (\tilde\vartheta_{2j-i} - \vartheta_j^*)^3$, let us define the error term
$\varepsilon_{tn}^{(C)} := \sum_{j=1}^{k_0} \Big[\tilde\pi_j \tilde\beta_j\, \big(\varepsilon_{tj1n}^{(C1)} + \varepsilon_{tj1n}^{(C2)}\big) + \tilde\pi_j (1 - \tilde\beta_j)\, \big(\varepsilon_{tj0n}^{(C1)} + \varepsilon_{tj0n}^{(C2)}\big)\Big].$
We obtain that
$\sum_{t=1}^{n} \sum_{j=1}^{k_0} \frac{\tilde\pi_j \tilde\beta_j\, \big[f(X_t; \tilde\nu, \tilde\vartheta_{2j-1}) - f(X_t; \tilde\nu, \vartheta_j^*)\big] + \tilde\pi_j (1 - \tilde\beta_j)\, \big[f(X_t; \tilde\nu, \tilde\vartheta_{2j}) - f(X_t; \tilde\nu, \vartheta_j^*)\big]}{f_{mix}^{(k_0)}(X_t; \theta^*, \pi^*)} = \sum_{t=1}^{n} \sum_{j=1}^{k_0} \big(\tilde\pi_j m_{1j} Y_{tj}' + \tilde\pi_j m_{2j} Y_{tj}''\big) + \varepsilon_n^{(C)}, \qquad (33)$
where
$m_{hj} := \tilde\beta_j\, (\tilde\vartheta_{2j-1} - \vartheta_j^*)^h + (1 - \tilde\beta_j)\, (\tilde\vartheta_{2j} - \vartheta_j^*)^h$
for $h = 1, 2$, and
$\varepsilon_n^{(C)} := \sum_{t=1}^{n} \varepsilon_{tn}^{(C)} = o_P(n^{1/2}) \sum_{j=1}^{k_0} \tilde\pi_j\, (m_{1j} + m_{2j})$
by Lemmas 6 and 7. Due to equations (31), (32) and (33) we may write
$\sum_{t=1}^{n} \delta_t = \sum_{t=1}^{n} b_t^T \tau + \varepsilon_n,$
where $\varepsilon_n = \varepsilon_n^{(A)} + \varepsilon_n^{(B)} + \varepsilon_n^{(C)}$ and
$\tau = \big(\tilde\pi_1 - \pi_1^*, \dots, \tilde\pi_{k_0-1} - \pi_{k_0-1}^*,\ \tilde\nu_1 - \nu_1^*, \dots, \tilde\nu_{d_1} - \nu_{d_1}^*,\ \tilde\pi_1 m_{11}, \dots, \tilde\pi_{k_0} m_{1k_0},\ \tilde\pi_1 m_{21}, \dots, \tilde\pi_{k_0} m_{2k_0}\big)^T. \qquad (34)$
Using $|x| \le 1 + x^2$ we further see that
$|\varepsilon_n| \le o_P(1) \sum_{j=1}^{3k_0-1+d_1} n^{1/2} |\tau_j| \le o_P(1) \sum_{j=1}^{3k_0-1+d_1} (n \tau_j^2 + 1) = o_P(n)\, \tau^T \tau + o_P(1).$
Turning to $\sum_{t=1}^{n} \delta_t^2$, we have
$\sum_{t=1}^{n} \delta_t^2 = \sum_{t=1}^{n} (b_t^T \tau)^2 + \varepsilon_n^{(Q)},$
where
$\varepsilon_n^{(Q)} := \sum_{t=1}^{n} \big(\varepsilon_{tn}^{(A)} + \varepsilon_{tn}^{(B)} + \varepsilon_{tn}^{(C)}\big)^2 + 2 \sum_{t=1}^{n} b_t^T \tau\, \big(\varepsilon_{tn}^{(A)} + \varepsilon_{tn}^{(B)} + \varepsilon_{tn}^{(C)}\big).$
Now
$|\varepsilon_{tn}^{(A)}| \le o_P(1)\, g(X_t)^{1/3} \sum_{j=1}^{k_0-1} (\tilde\pi_j - \pi_j^*),$
$|\varepsilon_{tn}^{(B)}| \le o_P(1)\, g(X_t)^{1/3} \sum_{l=1}^{d_1} (\tilde\nu_l - \nu_l^*),$
$|\varepsilon_{tn}^{(C)}| \le o_P(1)\, g(X_t)^{1/3} \sum_{j=1}^{k_0} \tilde\pi_j\, (m_{1j} + m_{2j}).$
By integrability of $g(X_t)$, we get from the ergodic theorem
$\sum_{t=1}^{n} \big(\varepsilon_{tn}^{(A)} + \varepsilon_{tn}^{(B)} + \varepsilon_{tn}^{(C)}\big)^2 \le 4 \sum_{t=1}^{n} \big[(\varepsilon_{tn}^{(A)})^2 + (\varepsilon_{tn}^{(B)})^2 + (\varepsilon_{tn}^{(C)})^2\big] \le o_P(n)\, \tau^T \tau + o_P(1) = O_P(\varepsilon_n) + o_P(1).$
As in Li and Chen (2010), by the Cauchy–Schwarz inequality the second error term in the expansion of $\sum_{t=1}^{n} \delta_t^2$ is of no higher order. Since the remainder term in the expansion of $\sum_{t=1}^{n} \delta_t^3$ is also $O_P(\varepsilon_n)$, we obtain the following bound for $R_n^{(1)}$:
$R_n^{(1)} \le 2 \sum_{t=1}^{n} b_t^T \tau - \sum_{t=1}^{n} (b_t^T \tau)^2 + \tfrac{2}{3} \sum_{t=1}^{n} (b_t^T \tau)^3 + O_P(\varepsilon_n). \qquad (35)$
In order to estimate the cubic term, from
$n^{-1} \sum_{t=1}^{n} b_t b_t^T \xrightarrow{a.s.} E(b_1 b_1^T)$
we obtain
$\sum_{t=1}^{n} (b_t^T \tau)^2 = n\, \tau^T \Sigma \tau\, \big(1 + o_P(1)\big).$
Because of the positive definiteness of $\Sigma$, we further get
$\sum_{t=1}^{n} (b_t^T \tau)^2 + O_P(\varepsilon_n) = n\, \tau^T \Sigma \tau\, \big(1 + o_P(1)\big) + o_P(1)$
and
$\frac{\big|\sum_{t=1}^{n} (b_t^T \tau)^3\big|}{\sum_{t=1}^{n} (b_t^T \tau)^2} \le \max_{1 \le t \le n} |b_t^T \tau| = o_P(1).$
Thus, (35) reduces to the following bound:
$R_n^{(1)} \le 2 \sum_{t=1}^{n} b_t^T \tau - n\, \tau^T \Sigma \tau\, \big(1 + o_P(1)\big) + o_P(1).$
Now, analogously to Li and Chen (2010), the upper bound for $R_n^{(1)}$ just established is bounded by $O_P(1)$, and thus we deduce $\tau = O_P(n^{-1/2})$. As for $R_n^{(0)}$, the classic expansion is
$R_n^{(0)} = \Big(\sum_{t=1}^{n} b_{1t}\Big)^T \big(n \Sigma_{11}\big)^{-1} \sum_{t=1}^{n} b_{1t} + o_P(1).$
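The closed form behind this classic expansion, $\sup_{\tau_1}\big[2 a^T\tau_1 - n\,\tau_1^T\Sigma_{11}\tau_1\big] = a^T(n\Sigma_{11})^{-1}a$ with $a = \sum_t b_{1t}$, can be confirmed numerically; every value in the sketch below is an arbitrary placeholder, nothing comes from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
A = rng.standard_normal((p, p))
Sigma11 = A @ A.T + p * np.eye(p)   # an arbitrary positive definite matrix
a = rng.standard_normal(p)          # stands in for sum_t b_{1t}

# maximizer of the concave quadratic and its value
tau_opt = np.linalg.solve(n * Sigma11, a)
val_opt = 2 * a @ tau_opt - n * tau_opt @ Sigma11 @ tau_opt
closed_form = a @ np.linalg.solve(n * Sigma11, a)

# concavity: random perturbations of the maximizer never do better
worse = True
for _ in range(200):
    tau = tau_opt + 0.1 * rng.standard_normal(p)
    if 2 * a @ tau - n * tau @ Sigma11 @ tau > val_opt + 1e-10:
        worse = False
```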
Therefore,
$R_n^{(1)} - R_n^{(0)} \le \sup_{\{\tau \in \mathbb{R}^{3k_0-1+d_1} :\, \tau_2 \ge 0\}} \Big[2 \sum_{t=1}^{n} b_t^T \tau - n\, \tau^T \Sigma \tau\Big] - \Big(\sum_{t=1}^{n} b_{1t}\Big)^T \big(n \Sigma_{11}\big)^{-1} \sum_{t=1}^{n} b_{1t} + o_P(1)$
$= \sup_{\tau_1 \in \mathbb{R}^{2k_0-1+d_1}} \Big[2 \sum_{t=1}^{n} b_{1t}^T \tau_1 - n\, \tau_1^T \Sigma_{11} \tau_1\Big] + \sup_{\{\tau_2 \ge 0\}} \Big[2\, \tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\, \tau_2^T \Sigma_{22} \tau_2\Big] - \Big(\sum_{t=1}^{n} b_{1t}\Big)^T \big(n \Sigma_{11}\big)^{-1} \sum_{t=1}^{n} b_{1t} + o_P(1)$
$= \sup_{\{\tau_2 \ge 0\}} \Big[2\, \tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\, \tau_2^T \Sigma_{22} \tau_2\Big] + o_P(1),$
where $\{z \ge 0\} := \{(z_1, \dots, z_{k_0}) :\ z_j \ge 0,\ j = 1, \dots, k_0\}$; here the supremum may be restricted to $\tau_2 \ge 0$, since the corresponding components $\tilde\pi_j m_{2j}$ of $\tau$ are non-negative.
The reasoning why this upper bound is attained in our setting is analogous to that in the i.i.d. case without structural parameters, i.e. to that in Li and Chen (2010).
Let $\tau^* := (\tilde\tau_1^*, \tau_2^*)$, with
$\tilde\tau_1^* = \arg\sup_{\tau_1 \in \mathbb{R}^{2k_0-1+d_1}} \Big[2 \sum_{t=1}^{n} b_{1t}^T \tau_1 - n\, \tau_1^T \Sigma_{11} \tau_1\Big] = n^{-1} \Sigma_{11}^{-1} \sum_{t=1}^{n} b_{1t} = O_P(n^{-1/2}),$
$\tau_2^* = \arg\sup_{\tau_2 \ge 0} \Big[2\, \tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\, \tau_2^T \Sigma_{22} \tau_2\Big], \qquad (36)$
denote the vector attaining the upper bound of the previous display, where the order assessment for $\tilde\tau_1^*$ is due to the CLT for stationary, weakly dependent processes.
The unrestricted optimal point of the second function in (36) is $n^{-1} \Sigma_{22}^{-1} s_n = O_P(n^{-1/2})$, since $n^{-1/2} s_n$ is asymptotically normal. This implies that the unrestricted, and hence also the restricted, optimum of the second function is bounded by $n^{-1} s_n^T \Sigma_{22}^{-1} s_n = O_P(1)$. Therefore, we also have $\tau_2^* = O_P(n^{-1/2})$, because otherwise we would get a contradiction to the $O_P(1)$ upper bound.
Denote by $(\bar\theta, \bar\pi)$ the parameter leading, under the same mapping as in (34), to $\tau^*$. Due to the non-negativity restriction in (36) and $\tau^* = O_P(n^{-1/2})$, its existence is obvious. Further, since $\tau^* = O_P(n^{-1/2})$,
$\bar\pi - \pi^* = O_P(n^{-1/2}), \quad \bar\nu - \nu^* = O_P(n^{-1/2}),$
$\bar\vartheta_{2j-1} - \vartheta_j^* = O_P(n^{-1/4}), \quad \bar\vartheta_{2j} - \vartheta_j^* = O_P(n^{-1/4}), \quad j = 1, \dots, k_0.$
Now, due to the previous order assessment and a further expansion, see Chen, Chen and Kalbfleisch (2004, proof of Lemma 2) for a similar argument, we obtain
$\bar R_n^{(1)} := 2\big(l_n^{(2k_0)}(\bar\theta, \bar\pi) - l_n^{(k_0)}(\theta^*, \pi^*)\big) = 2 \sum_{t=1}^{n} b_t^T \tau^* - n\, (\tau^*)^T \Sigma \tau^* + o_P(1) = \sup_{\{\tau \in \mathbb{R}^{3k_0-1+d_1} :\, \tau_2 \ge 0\}} \Big[2 \sum_{t=1}^{n} b_t^T \tau - n\, \tau^T \Sigma \tau\Big] + o_P(1),$
and thus
$\bar R_n^{(1)} - R_n^{(0)} = \sup_{\{\tau_2 \ge 0\}} \Big[2\, \tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\, \tau_2^T \Sigma_{22} \tau_2\Big] + o_P(1).$
Since $R_n^{(1)} \ge \bar R_n^{(1)}$ due to the maximizing property of the QMLE under the alternative, it holds that
$R_n^{(1)} - R_n^{(0)} \ge \bar R_n^{(1)} - R_n^{(0)} = \sup_{\{\tau_2 \ge 0\}} \Big[2\, \tau_2^T \sum_{t=1}^{n} \tilde b_{2t} - n\, \tau_2^T \Sigma_{22} \tau_2\Big] + o_P(1).$
This ends the proof of Lemma 8.
To conclude the proof of Theorem 2, we show that $(\tilde b_{2t})_t$ is a martingale difference sequence, which is quite analogous to the case in Appendix 1. Then (7) follows as in the i.i.d. setting of Li and Chen (2010).
Consider the filtration $(\mathcal F_t)_{t \in \mathbb N}$ with
$\mathcal F_t := \sigma(S_j, b_j;\ j \le t) \quad \text{for } t \in \mathbb N.$
Then $\mathcal L(b_t \,|\, \mathcal F_{t-1}) = \mathcal L(b_t \,|\, S_{t-1})$, and therefore also $\mathcal L(\tilde b_{2t} \,|\, \mathcal F_{t-1}) = \mathcal L(\tilde b_{2t} \,|\, S_{t-1})$. Thus, it remains to show that
$E\big[\tilde b_{2t} \,|\, S_{t-1} = j\big] = 0, \quad j = 1, \dots, k_0. \qquad (37)$
Let $\lambda_h := E(b_1 \,|\, S_1 = h)$ and $\gamma_{jh} := P(S_t = h \,|\, S_{t-1} = j)$ for $h = 1, \dots, k_0$. As the Markov chain can adopt $k_0$ states under the hypothesis, it follows that
$E(b_t \,|\, S_{t-1} = j) = \sum_{h=1}^{k_0} \gamma_{jh} \lambda_h \quad \text{and} \quad E(b_{lt} \,|\, S_{t-1} = j) = \sum_{h=1}^{k_0} \gamma_{jh} \lambda_{hl} \quad \text{for } l = 1, 2,$
where we partition $\lambda_h^T = (\lambda_{h1}^T, \lambda_{h2}^T)$ with $\lambda_{h1} \in \mathbb{R}^{2k_0-1+d_1}$. We get
$E\big[\tilde b_{2t} \,|\, S_{t-1} = j\big]^T = \sum_{h=1}^{k_0} \gamma_{jh} \lambda_{h2}^T - \Big(\sum_{h=1}^{k_0} \gamma_{jh} \lambda_{h1}^T\Big) \Sigma_{11}^{-1} \Sigma_{12}. \qquad (38)$
Since $0 = E(b_1) = \sum_{h=1}^{k_0} \pi_h^* \lambda_h$, we obtain
$\lambda_{k_0} = \sum_{h=1}^{k_0-1} c_h \lambda_h, \quad \text{with } c_h := -\pi_h^* / \pi_{k_0}^*, \qquad (39)$
and inserting (39) in (38) gives, setting $d_h := \gamma_{jh} + \gamma_{j k_0} c_h$ for $h = 1, \dots, k_0 - 1$,
$E\big[\tilde b_{2t} \,|\, S_{t-1} = j\big]^T = \sum_{h=1}^{k_0-1} d_h \lambda_{h2}^T - \Big(\sum_{h=1}^{k_0-1} d_h \lambda_{h1}^T\Big) \Sigma_{11}^{-1} \Sigma_{12}. \qquad (40)$
Now observe that
$E(\Delta_{1h} b_1) = \lambda_h - \lambda_{k_0}, \quad h = 1, \dots, k_0 - 1.$
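The linear-algebra step from (38) to (40) is easy to verify numerically: draw arbitrary vectors $\lambda_h$ subject to $\sum_h \pi_h^* \lambda_h = 0$ and an arbitrary transition-probability row $(\gamma_{j1}, \dots, \gamma_{j k_0})$, and check that $\sum_h \gamma_{jh}\lambda_h = \sum_{h<k_0} d_h \lambda_h$. All values below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
k0, dim = 4, 5
pi_star = rng.dirichlet(np.ones(k0))     # stationary weights, all positive
lam = rng.standard_normal((k0, dim))
# enforce 0 = E(b_1) = sum_h pi*_h lambda_h by solving for lambda_{k0}, as in (39)
lam[-1] = -(pi_star[:-1] @ lam[:-1]) / pi_star[-1]

c = -pi_star[:-1] / pi_star[-1]          # c_h = -pi*_h / pi*_{k0}
gamma_j = rng.dirichlet(np.ones(k0))     # a row of the transition matrix
d = gamma_j[:-1] + gamma_j[-1] * c       # d_h = gamma_{jh} + gamma_{j k0} c_h
```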
Let
$S := \begin{pmatrix} 0_{d_1 \times (k_0-1)} \\ I_{k_0-1} \\ 0_{(2k_0) \times (k_0-1)} \end{pmatrix}, \quad T := \begin{pmatrix} 0_{d_1 \times (k_0-1)} \\ I_{k_0-1} \\ 0_{k_0 \times (k_0-1)} \end{pmatrix},$
where $0_\cdot$ denotes matrices of zeros and $I_\cdot$ identity matrices, all with the appropriate dimensions. Then from the definition of $\Sigma$ and (39) we get
$\Sigma S = \big(E(\Delta_{11} b_1), \dots, E(\Delta_{1 k_0-1} b_1)\big) = \Big(\lambda_1 - \sum_{h=1}^{k_0-1} c_h \lambda_h,\ \dots,\ \lambda_{k_0-1} - \sum_{h=1}^{k_0-1} c_h \lambda_h\Big) =: \Lambda. \qquad (41)$
This result also holds for the partitioned $\lambda$ vectors, i.e.
$\Sigma_{l1} T = \Big(\lambda_{1l} - \sum_{h=1}^{k_0-1} c_h \lambda_{hl},\ \dots,\ \lambda_{(k_0-1)l} - \sum_{h=1}^{k_0-1} c_h \lambda_{hl}\Big), \quad l = 1, 2.$
As in Appendix 1, one shows that
$\mathrm{span}(\Lambda) = \mathrm{span}\{\lambda_1, \dots, \lambda_{k_0-1}\},$
where $\mathrm{span}(\Lambda)$ denotes the space spanned by the columns of $\Lambda$. Therefore, there is a matrix $M \in \mathbb{R}^{(k_0-1) \times (k_0-1)}$ such that
$\Lambda M = \big(d_1 \lambda_1, \dots, d_{k_0-1} \lambda_{k_0-1}\big),$
and thus from (41)
$\Sigma S M = \big(d_1 \lambda_1, \dots, d_{k_0-1} \lambda_{k_0-1}\big),$
and hence for the submatrices of $\Sigma$,
$\Sigma_{l1} T M = \big(d_1 \lambda_{1l}, \dots, d_{k_0-1} \lambda_{(k_0-1)l}\big), \quad l = 1, 2.$
This implies
$(1, \dots, 1)\, M^T T^T \Sigma_{1l} = \sum_{h=1}^{k_0-1} d_h \lambda_{hl}^T, \quad l = 1, 2.$
Using this subsequently for $l = 1$ and $l = 2$, we get
$\Big(\sum_{h=1}^{k_0-1} d_h \lambda_{h1}^T\Big) \Sigma_{11}^{-1} \Sigma_{12} = (1, \dots, 1)\, M^T T^T \Sigma_{11} \Sigma_{11}^{-1} \Sigma_{12} = \sum_{h=1}^{k_0-1} d_h \lambda_{h2}^T,$
which due to (40) implies (37). This ends the proof of Theorem 2.
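As a Monte Carlo sanity check of the conditional-expectation identity $E(b_t \,|\, S_{t-1} = j) = \sum_h \gamma_{jh}\lambda_h$ used above, one can simulate a small HMM. The two-state chain, the Gaussian emissions, and the stand-in statistic $b_t = (X_t, X_t^2)$ below are illustrative assumptions, not the paper's $b_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
Gamma = np.array([[0.9, 0.1], [0.3, 0.7]])   # toy transition matrix
means = np.array([-1.0, 1.0])
n = 100_000
s = np.zeros(n, dtype=int)
for t in range(1, n):
    s[t] = rng.choice(2, p=Gamma[s[t - 1]])
X = means[s] + rng.standard_normal(n)
b = np.stack([X, X ** 2], axis=1)            # stand-in for the vector b_t

# lambda_h = E(b_1 | S_1 = h), estimated from the simulated path
lam = np.array([b[s == h].mean(axis=0) for h in range(2)])
# E(b_t | S_{t-1} = j) should equal sum_h gamma_jh lambda_h
max_err = max(
    np.abs(b[1:][s[:-1] == j].mean(axis=0) - Gamma[j] @ lam).max()
    for j in range(2)
)
```

Since $b_t$ depends on the past only through $S_t$, conditioning on $S_{t-1} = j$ mixes the state-wise means $\lambda_h$ with the transition weights $\gamma_{jh}$; the martingale-difference property (37) for $\tilde b_{2t}$ rests on exactly this structure.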