# Proofs

In document Penalized likelihood based tests for regime switching in autoregressive models (Page 45-65)

almost surely and uniformly over $(\alpha,\vartheta_1,\vartheta_2,\eta)\in[\delta,1-\delta]\times\Theta^2\times H$. Let $\omega$ be a point in the sample space for which (2.5.2) holds, and note that the set of all such points has probability 1.

Suppose that at such an $\omega$ the claim of the theorem fails and, for example (the argument for the other parameters is the same), $\hat\vartheta_1$ does not converge to $\vartheta_0$. Then there must exist a subsequence $(n')$ such that $\hat\vartheta_{1n'}\to\vartheta^*\neq\vartheta_0$. Consider

$$\Omega_0=\{(\alpha,\vartheta_1,\vartheta_2,\eta):|\vartheta_1-\vartheta_0|\ge\varepsilon,\ \alpha\in[\delta,1-\delta]\},$$

where $\varepsilon=|\vartheta^*-\vartheta_0|/2$. Then for all large $n'$, $(\hat\alpha,\hat\vartheta_1,\hat\vartheta_2,\hat\eta)$ at the sample point $\omega$ belongs to $\Omega_0$. By Assumption 2.2 and Lemma 2.9, $Q(\alpha,\vartheta_1,\vartheta_2,\eta)<0$ for all $(\alpha,\vartheta_1,\vartheta_2,\eta)\in\Omega_0$. It then follows that

$$pl_{n'}(\hat\alpha,\hat\vartheta_1,\hat\vartheta_2,\hat\eta)-pl_{n'}(1/2,\hat\vartheta_0,\hat\vartheta_0,\hat\eta_0)<0$$

for all large enough $n'$. But this contradicts $(\hat\alpha,\hat\vartheta_1,\hat\vartheta_2,\hat\eta)$ being the modified maximum likelihood estimator, and so $\hat\vartheta_1\to\vartheta_0$ at $\omega$. Thus $\hat\vartheta_1\to\vartheta_0$ almost surely.

Proof of Theorem 2.2. Let

$$\begin{aligned}
r_n(\alpha,\vartheta_1,\vartheta_2,\eta) &= 2\{pl_n(\alpha,\vartheta_1,\vartheta_2,\eta)-pl_n(1/2,\hat\vartheta_0,\hat\vartheta_0,\hat\eta_0)\},\\
r_{1n}(\alpha,\vartheta_1,\vartheta_2,\eta) &= 2\{pl_n(\alpha,\vartheta_1,\vartheta_2,\eta)-pl_n(1/2,\vartheta_0,\vartheta_0,\eta_0)\},\\
r_{2n} &= 2\{pl_n(1/2,\vartheta_0,\vartheta_0,\eta_0)-pl_n(1/2,\hat\vartheta_0,\hat\vartheta_0,\hat\eta_0)\}.
\end{aligned}$$

Therefore, $M_n=r_n(\hat\alpha,\hat\vartheta_1,\hat\vartheta_2,\hat\eta)$ and $r_n(\alpha,\vartheta_1,\vartheta_2,\eta)=r_{1n}(\alpha,\vartheta_1,\vartheta_2,\eta)+r_{2n}$. We first examine $r_{1n}$: Expand

$$r_{1n}(\alpha,\vartheta_1,\vartheta_2,\eta)=2\sum_{i=1}^n\log(1+\delta_i)+2p(\alpha)-2p(1/2)\tag{2.5.3}$$

with

$$\delta_i=(1-\alpha)\left\{\frac{g(X_i|X_{i-1}^p;\vartheta_1,\eta)}{g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}-1\right\}+\alpha\left\{\frac{g(X_i|X_{i-1}^p;\vartheta_2,\eta)}{g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}-1\right\}.\tag{2.5.4}$$

We can write $\delta_i$ as

$$\begin{aligned}
\delta_i ={}& (1-\alpha)(\vartheta_1-\vartheta_0)\,\frac{g(X_i|X_{i-1}^p;\vartheta_1,\eta)-g(X_i|X_{i-1}^p;\vartheta_0,\eta)}{(\vartheta_1-\vartheta_0)\,g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}
+\alpha(\vartheta_2-\vartheta_0)\,\frac{g(X_i|X_{i-1}^p;\vartheta_2,\eta)-g(X_i|X_{i-1}^p;\vartheta_0,\eta)}{(\vartheta_2-\vartheta_0)\,g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}\\
&+(\eta_1-\eta_{1,0})\,\frac{g(X_i|X_{i-1}^p;\vartheta_0,\eta)-g(X_i|X_{i-1}^p;\vartheta_0,\eta_{1,0},\eta_2,\dots,\eta_d)}{(\eta_1-\eta_{1,0})\,g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}\\
&+(\eta_2-\eta_{2,0})\,\frac{g(X_i|X_{i-1}^p;\vartheta_0,\eta_{1,0},\eta_2,\dots,\eta_d)-g(X_i|X_{i-1}^p;\vartheta_0,\eta_{1,0},\eta_{2,0},\eta_3,\dots,\eta_d)}{(\eta_2-\eta_{2,0})\,g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}\\
&\ \ \vdots\\
&+(\eta_d-\eta_{d,0})\,\frac{g(X_i|X_{i-1}^p;\vartheta_0,\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)-g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}{(\eta_d-\eta_{d,0})\,g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}\\
={}& (1-\alpha)(\vartheta_1-\vartheta_0)Y_i(\vartheta_1,\eta)+\alpha(\vartheta_2-\vartheta_0)Y_i(\vartheta_2,\eta)\\
&+(\eta_1-\eta_{1,0})U_{i\eta_1}(\eta)+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d),
\end{aligned}\tag{2.5.5}$$

where $U_{i\eta_j}(\cdot)$ is defined in (2.2.6). Now, for $j=1,2$,

$$\begin{aligned}
Y_i(\vartheta_j,\eta)={}&\{Y_i(\vartheta_j,\eta)-Y_i(\vartheta_j,\eta_1,\dots,\eta_{d-1},\eta_{d,0})\}\\
&+\{Y_i(\vartheta_j,\eta_1,\dots,\eta_{d-1},\eta_{d,0})-Y_i(\vartheta_j,\eta_1,\dots,\eta_{d-2},\eta_{d-1,0},\eta_{d,0})\}\\
&\ \ \vdots\\
&+\{Y_i(\vartheta_j,\eta_1,\eta_{2,0},\dots,\eta_{d-1,0},\eta_{d,0})-Y_i(\vartheta_j,\eta_0)\}\\
&+(\vartheta_j-\vartheta_0)\{Z_i(\vartheta_j)-Z_i\}+(\vartheta_j-\vartheta_0)Z_i+Y_i
\end{aligned}\tag{2.5.6}$$

and

$$\begin{aligned}
U_{i\eta_d}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)={}&U_{i\eta_d}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)-U_{i\eta_d}+U_{i\eta_d},\\
U_{i\eta_{d-1}}(\eta_{1,0},\dots,\eta_{d-2,0},\eta_{d-1},\eta_d)={}&U_{i\eta_{d-1}}(\eta_{1,0},\dots,\eta_{d-2,0},\eta_{d-1},\eta_d)-U_{i\eta_{d-1}}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)\\
&+U_{i\eta_{d-1}}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)-U_{i\eta_{d-1}}+U_{i\eta_{d-1}},\\
&\ \ \vdots\\
U_{i\eta_1}(\eta)={}&U_{i\eta_1}(\eta)-U_{i\eta_1}(\eta_1,\dots,\eta_{d-1},\eta_{d,0})\\
&+U_{i\eta_1}(\eta_1,\dots,\eta_{d-1},\eta_{d,0})-U_{i\eta_1}(\eta_1,\dots,\eta_{d-2},\eta_{d-1,0},\eta_{d,0})\\
&\ \ \vdots\\
&+U_{i\eta_1}(\eta_1,\eta_{2,0},\dots,\eta_{d-1,0},\eta_{d,0})-U_{i\eta_1}+U_{i\eta_1}.
\end{aligned}\tag{2.5.7}$$

Plugging (2.5.6) and (2.5.7) into (2.5.5), we can write

$$\delta_i=(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+m_1Y_i+m_2Z_i+\varepsilon_{in},\tag{2.5.8}$$

where

$$m_1=(1-\alpha)(\vartheta_1-\vartheta_0)+\alpha(\vartheta_2-\vartheta_0),\qquad m_2=(1-\alpha)(\vartheta_1-\vartheta_0)^2+\alpha(\vartheta_2-\vartheta_0)^2,$$

and $\varepsilon_{in}$ is a remainder term. Note at this stage that the sequences $(U_{i\eta_j})_{i\ge1}$, $j=1,\dots,d$, $(Y_i)_{i\ge1}$ and $(Z_i)_{i\ge1}$ are square integrable (Assumption 2.4) stationary martingale difference sequences w.r.t. the filtration generated by the observations $(X_i)$.

Let $\varepsilon_n=\sum_{i=1}^n\varepsilon_{in}$. By Assumption 2.7,

$$\begin{aligned}
\varepsilon_n={}&\sqrt n\,(\eta_d-\eta_{d,0})^2O_P(1)+\sqrt n\,(\eta_{d-1}-\eta_{d-1,0})\Big\{\sum_{j=d-1}^d(\eta_j-\eta_{j,0})\Big\}O_P(1)\\
&\ \ \vdots\\
&+\sqrt n\,(\eta_1-\eta_{1,0})\Big\{\sum_{j=1}^d(\eta_j-\eta_{j,0})\Big\}O_P(1)\\
&+\sqrt n\,(1-\alpha)(\vartheta_1-\vartheta_0)\Big\{\sum_{j=1}^d(\eta_j-\eta_{j,0})\Big\}O_P(1)+\sqrt n\,\alpha(\vartheta_2-\vartheta_0)\Big\{\sum_{j=1}^d(\eta_j-\eta_{j,0})\Big\}O_P(1)\\
&+\sqrt n\,(1-\alpha)(\vartheta_1-\vartheta_0)^3O_P(1)+\sqrt n\,\alpha(\vartheta_2-\vartheta_0)^3O_P(1).
\end{aligned}$$

Let us now restrict our attention to a small neighborhood of $(\eta_{1,0},\dots,\eta_{d,0},\vartheta_0)$ as suggested by the consistency results in Theorem 2.1(ii). Therefore, we may regard $\eta_1-\eta_{1,0},\dots,\eta_d-\eta_{d,0},\vartheta_1-\vartheta_0,\vartheta_2-\vartheta_0$ as $o_P(1)$ and we get

$$\begin{aligned}
\varepsilon_n={}&\sqrt n\,(\eta_d-\eta_{d,0})o_P(1)+\sqrt n\,(\eta_{d-1}-\eta_{d-1,0})o_P(1)+\dots+\sqrt n\,(\eta_1-\eta_{1,0})o_P(1)\\
&+\sqrt n\,(1-\alpha)(\vartheta_1-\vartheta_0)o_P(1)+\sqrt n\,\alpha(\vartheta_2-\vartheta_0)o_P(1)\\
&+\sqrt n\,(1-\alpha)(\vartheta_1-\vartheta_0)^2o_P(1)+\sqrt n\,\alpha(\vartheta_2-\vartheta_0)^2o_P(1).
\end{aligned}$$

Since $|x|\le1+x^2$ (applied with $x=\sqrt n\,(\eta_j-\eta_{j,0})$ etc.), we obtain

$$|\varepsilon_n|\le n\{(\eta_1-\eta_{1,0})^2+\dots+(\eta_d-\eta_{d,0})^2+m_1^2+m_2^2\}\,o_P(1)+o_P(1).$$

By Assumption 2.6 there is a $\lambda>0$ such that for all $(\alpha_1,\dots,\alpha_{d+2})\in\mathbb R^{d+2}\setminus\{0\}$ we have

$$E\{\alpha_1U_{1\eta_1}+\dots+\alpha_dU_{1\eta_d}+\alpha_{d+1}Y_1+\alpha_{d+2}Z_1\}^2\ge\lambda(\alpha_1^2+\dots+\alpha_{d+2}^2).\tag{2.5.9}$$

The ergodic theorem, Assumption 2.4 and (2.5.9) imply

$$\begin{aligned}
&\frac{\sum_{i=1}^n|(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+m_1Y_i+m_2Z_i|^3}{\sum_{i=1}^n\{(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+m_1Y_i+m_2Z_i\}^2}\\
&\qquad=\frac{E|(\eta_1-\eta_{1,0})U_{1\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{1\eta_d}+m_1Y_1+m_2Z_1|^3}{E\{(\eta_1-\eta_{1,0})U_{1\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{1\eta_d}+m_1Y_1+m_2Z_1\}^2}\,O_P(1)\\
&\qquad\le\frac{|\eta_1-\eta_{1,0}|^3+\dots+|\eta_d-\eta_{d,0}|^3+|m_1|^3+|m_2|^3}{(\eta_1-\eta_{1,0})^2+\dots+(\eta_d-\eta_{d,0})^2+m_1^2+m_2^2}\,O_P(1)\\
&\qquad\le\{|\eta_1-\eta_{1,0}|+\dots+|\eta_d-\eta_{d,0}|+|m_1|+|m_2|\}\,O_P(1)=o_P(1).
\end{aligned}$$

Therefore, using the properties of the penalty function in the second step, we get

$$\begin{aligned}
r_{1n}(\alpha,\vartheta_1,\vartheta_2,\eta)&=2\sum_{i=1}^n\log(1+\delta_i)+2p(\alpha)-2p(1/2)\\
&\le 2\sum_{i=1}^n\delta_i-\sum_{i=1}^n\delta_i^2+\frac23\sum_{i=1}^n\delta_i^3\\
&\le 2\sum_{i=1}^n\{(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+m_1Y_i+m_2Z_i\}\\
&\quad-\sum_{i=1}^n\{(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+m_1Y_i+m_2Z_i\}^2\{1+o_P(1)\}\\
&\quad+o_P(1).
\end{aligned}$$

We orthogonalize:

$$\tilde U_{i\eta_1}=U_{i\eta_1},\qquad \tilde U_{i\eta_2}=U_{i\eta_2}-\frac{\sum_{k=1}^n\tilde U_{k\eta_1}U_{k\eta_2}}{\sum_{k=1}^n(\tilde U_{k\eta_1})^2}\,\tilde U_{i\eta_1},\qquad\dots\tag{2.5.10}$$

$$\tilde Y_i=Y_i-\sum_{j=1}^d\frac{\sum_{k=1}^n\tilde U_{k\eta_j}Y_k}{\sum_{k=1}^n(\tilde U_{k\eta_j})^2}\,\tilde U_{i\eta_j},\qquad
\tilde Z_i=Z_i-\frac{\sum_{k=1}^nZ_k\tilde Y_k}{\sum_{k=1}^n\tilde Y_k^2}\,\tilde Y_i-\sum_{j=1}^d\frac{\sum_{k=1}^n\tilde U_{k\eta_j}Z_k}{\sum_{k=1}^n(\tilde U_{k\eta_j})^2}\,\tilde U_{i\eta_j}.$$

Therefore

$$(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+m_1Y_i+m_2Z_i
=t_1\tilde U_{i\eta_1}+t_2\tilde U_{i\eta_2}+\dots+t_d\tilde U_{i\eta_d}+t_{d+1}\tilde Y_i+t_{d+2}\tilde Z_i$$

with some coefficients $t_j$, where in particular $t_{d+2}=m_2$.

Computing the maximum of the quadratic function

$$q(t_1,\dots,t_{d+2})=2\sum_{i=1}^n\{t_1\tilde U_{i\eta_1}+\dots+t_d\tilde U_{i\eta_d}+t_{d+1}\tilde Y_i+t_{d+2}\tilde Z_i\}
-\sum_{i=1}^n\{t_1\tilde U_{i\eta_1}+\dots+t_d\tilde U_{i\eta_d}+t_{d+1}\tilde Y_i+t_{d+2}\tilde Z_i\}^2,$$

we get an asymptotic upper bound for $r_{1n}$ as follows. Due to the constraint $t_{d+2}\ge0$,

$$(\tilde t_1,\dots,\tilde t_{d+2})=\operatorname*{arg\,max}_{t_1,\dots,t_{d+2}}q(t_1,\dots,t_{d+2})
=\left(\frac{\sum\tilde U_{i\eta_1}}{\sum(\tilde U_{i\eta_1})^2},\dots,\frac{\sum\tilde U_{i\eta_d}}{\sum(\tilde U_{i\eta_d})^2},\frac{\sum\tilde Y_i}{\sum\tilde Y_i^2},\frac{(\sum\tilde Z_i)^+}{\sum\tilde Z_i^2}\right)\tag{2.5.11}$$

and therefore an upper bound for $r_{1n}$ is given by

$$r_{1n}(\hat\alpha,\hat\vartheta_1,\hat\vartheta_2,\hat\eta)\le\frac{(\sum\tilde U_{i\eta_1})^2}{\sum(\tilde U_{i\eta_1})^2}+\dots+\frac{(\sum\tilde U_{i\eta_d})^2}{\sum(\tilde U_{i\eta_d})^2}+\frac{(\sum\tilde Y_i)^2}{\sum\tilde Y_i^2}+\frac{\{(\sum\tilde Z_i)^+\}^2}{\sum\tilde Z_i^2}+o_P(1).\tag{2.5.12}$$

For $\alpha=1/2$ and the values $\tilde\vartheta_1,\tilde\vartheta_2$ and $\tilde\eta$ given implicitly in (2.5.11), this upper bound is attained.
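Because the orthogonalized columns are empirically uncorrelated, the quadratic $q$ decouples coordinatewise, and the constrained maximizer in (2.5.11) simply replaces the numerator of the last coordinate by its positive part. A small numeric sketch of this step, with synthetic orthonormal columns in place of $\tilde U,\tilde Y,\tilde Z$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# orthonormal columns stand in for the orthogonalized score sequences, so the
# quadratic q decouples coordinatewise (cross terms vanish by orthogonality)
X = np.linalg.qr(rng.standard_normal((n, 5)))[0]
S = X.sum(axis=0)             # column sums: the numerators in (2.5.11)
Q = (X ** 2).sum(axis=0)      # sums of squares: the denominators in (2.5.11)

def q(t):
    # q(t) = 2 sum_i <t, X_i> - sum_i <t, X_i>^2
    return 2 * t @ S - (t ** 2 * Q).sum()

t_hat = S / Q                          # unconstrained coordinatewise maximizers
t_hat[-1] = max(S[-1], 0.0) / Q[-1]    # last coordinate constrained to t >= 0
bound = (S[:-1] ** 2 / Q[:-1]).sum() + max(S[-1], 0.0) ** 2 / Q[-1]
print(np.isclose(q(t_hat), bound))
```

The value at the constrained maximizer reproduces the right-hand side of (2.5.12): each free coordinate contributes $S_j^2/Q_j$, and the sign-restricted one contributes $\{(S_{d+2})^+\}^2/Q_{d+2}$.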

Expanding $r_{2n}$ in a similar way as $r_{1n}$ (see below),

$$-r_{2n}=2\{pl_n(1/2,\hat\vartheta_0,\hat\vartheta_0,\hat\eta_0)-pl_n(1/2,\vartheta_0,\vartheta_0,\eta_0)\}
=\frac{(\sum\tilde U_{i\eta_1})^2}{\sum(\tilde U_{i\eta_1})^2}+\dots+\frac{(\sum\tilde U_{i\eta_d})^2}{\sum(\tilde U_{i\eta_d})^2}+\frac{(\sum\tilde Y_i)^2}{\sum\tilde Y_i^2}+o_P(1).$$

Therefore,

$$M_n=\frac{\{(\sum\tilde Z_i)^+\}^2}{\sum\tilde Z_i^2}+o_P(1).$$

Let $(\hat U_{i\eta_j}),(\hat Y_i),(\hat Z_i)$ be the square integrable stationary martingale difference sequences obtained by replacing in (2.5.10) at each stage the empirical scalar products by their expected versions (e.g. $\hat U_{i\eta_2}=U_{i\eta_2}-\frac{E\,U_{1\eta_1}U_{1\eta_2}}{E\,U_{1\eta_1}^2}\,U_{i\eta_1}$). Then $n^{-1/2}\sum_i(\tilde Z_i-\hat Z_i)=o_P(1)$, and therefore our result follows from the ergodic theorem (applied to the denominator) and the central limit theorem for stationary ergodic martingale difference sequences (applied to the numerator).


Expansion of $r_{2n}$. We expand

$$r_{2n}=2\sum_{i=1}^n\log(1+\delta_i),\qquad \delta_i=\frac{g(X_i|X_{i-1}^p;\vartheta,\eta)}{g(X_i|X_{i-1}^p;\vartheta_0,\eta_0)}-1.\tag{2.5.13}$$

Write $\delta_i$ as

$$\delta_i=(\vartheta-\vartheta_0)Y_i(\vartheta,\eta)+(\eta_1-\eta_{1,0})U_{i\eta_1}(\eta)+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d).$$

Now,

$$\begin{aligned}
Y_i(\vartheta,\eta)={}&\{Y_i(\vartheta,\eta)-Y_i(\vartheta,\eta_1,\dots,\eta_{d-1},\eta_{d,0})\}\\
&+\{Y_i(\vartheta,\eta_1,\dots,\eta_{d-1},\eta_{d,0})-Y_i(\vartheta,\eta_1,\dots,\eta_{d-2},\eta_{d-1,0},\eta_{d,0})\}\\
&\ \ \vdots\\
&+\{Y_i(\vartheta,\eta_1,\eta_{2,0},\dots,\eta_{d-1,0},\eta_{d,0})-Y_i(\vartheta,\eta_0)\}+\{Y_i(\vartheta,\eta_0)-Y_i\}+Y_i
\end{aligned}$$

and

$$\begin{aligned}
U_{i\eta_d}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)={}&U_{i\eta_d}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)-U_{i\eta_d}+U_{i\eta_d},\\
U_{i\eta_{d-1}}(\eta_{1,0},\dots,\eta_{d-2,0},\eta_{d-1},\eta_d)={}&U_{i\eta_{d-1}}(\eta_{1,0},\dots,\eta_{d-2,0},\eta_{d-1},\eta_d)-U_{i\eta_{d-1}}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)\\
&+U_{i\eta_{d-1}}(\eta_{1,0},\dots,\eta_{d-1,0},\eta_d)-U_{i\eta_{d-1}}+U_{i\eta_{d-1}},\\
&\ \ \vdots\\
U_{i\eta_1}(\eta)={}&U_{i\eta_1}(\eta)-U_{i\eta_1}(\eta_1,\dots,\eta_{d-1},\eta_{d,0})\\
&+U_{i\eta_1}(\eta_1,\dots,\eta_{d-1},\eta_{d,0})-U_{i\eta_1}(\eta_1,\dots,\eta_{d-2},\eta_{d-1,0},\eta_{d,0})\\
&\ \ \vdots\\
&+U_{i\eta_1}(\eta_1,\eta_{2,0},\dots,\eta_{d-1,0},\eta_{d,0})-U_{i\eta_1}+U_{i\eta_1}.
\end{aligned}$$

Then

$$\delta_i=(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+(\vartheta-\vartheta_0)Y_i+\varepsilon_{in},\tag{2.5.14}$$

where $\varepsilon_{in}$ is a remainder term. Let $\varepsilon_n=\sum_{i=1}^n\varepsilon_{in}$. By Assumption 2.7 we show

$$\begin{aligned}
\varepsilon_n={}&\sqrt n\,(\eta_d-\eta_{d,0})^2O_P(1)+\sqrt n\,(\eta_{d-1}-\eta_{d-1,0})\Big\{\sum_{j=d-1}^d(\eta_j-\eta_{j,0})\Big\}O_P(1)+\dots\\
&+\sqrt n\,(\eta_1-\eta_{1,0})\Big\{\sum_{j=1}^d(\eta_j-\eta_{j,0})\Big\}O_P(1)
+\sqrt n\,(\vartheta-\vartheta_0)\Big\{\sum_{j=1}^d(\eta_j-\eta_{j,0})\Big\}O_P(1).
\end{aligned}$$

Let us now restrict our attention to a small neighborhood of $(\eta_{1,0},\dots,\eta_{d,0})$ as suggested by the consistency results in Theorem 2.1(i). Therefore, we may regard $\eta_1-\eta_{1,0},\dots,\eta_d-\eta_{d,0}$ as $o_P(1)$ and we get

$$\varepsilon_n=\sqrt n\,(\eta_d-\eta_{d,0})o_P(1)+\sqrt n\,(\eta_{d-1}-\eta_{d-1,0})o_P(1)+\dots+\sqrt n\,(\eta_1-\eta_{1,0})o_P(1)+\sqrt n\,(\vartheta-\vartheta_0)o_P(1).$$

Using $|x|\le1+x^2$ we obtain

$$|\varepsilon_n|\le n\{(\eta_1-\eta_{1,0})^2+\dots+(\eta_d-\eta_{d,0})^2+(\vartheta-\vartheta_0)^2\}\,o_P(1)+o_P(1).$$

By Assumption 2.6, there exists a $\lambda>0$ such that

$$E\{\alpha_1U_{1\eta_1}+\dots+\alpha_dU_{1\eta_d}+\alpha_{d+1}Y_1\}^2\ge\lambda(\alpha_1^2+\dots+\alpha_{d+1}^2)\tag{2.5.15}$$

for all $(\alpha_1,\dots,\alpha_{d+1})\in\mathbb R^{d+1}\setminus\{0\}$. The ergodic theorem, Assumption 2.4 and (2.5.15) imply

$$\begin{aligned}
&\frac{\sum_{i=1}^n|(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+(\vartheta-\vartheta_0)Y_i|^3}{\sum_{i=1}^n\{(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+(\vartheta-\vartheta_0)Y_i\}^2}\\
&\qquad=\frac{E|(\eta_1-\eta_{1,0})U_{1\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{1\eta_d}+(\vartheta-\vartheta_0)Y_1|^3}{E\{(\eta_1-\eta_{1,0})U_{1\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{1\eta_d}+(\vartheta-\vartheta_0)Y_1\}^2}\,O_P(1)\\
&\qquad\le\frac{|\eta_1-\eta_{1,0}|^3+\dots+|\eta_d-\eta_{d,0}|^3+|\vartheta-\vartheta_0|^3}{(\eta_1-\eta_{1,0})^2+\dots+(\eta_d-\eta_{d,0})^2+(\vartheta-\vartheta_0)^2}\,O_P(1)\\
&\qquad\le\{|\eta_1-\eta_{1,0}|+\dots+|\eta_d-\eta_{d,0}|+|\vartheta-\vartheta_0|\}\,O_P(1)=o_P(1).
\end{aligned}$$

Therefore, we get

$$\begin{aligned}
r_{2n}&=2\sum_{i=1}^n\log(1+\delta_i)\\
&\le 2\sum_{i=1}^n\delta_i-\sum_{i=1}^n\delta_i^2+\frac23\sum_{i=1}^n\delta_i^3\\
&\le 2\sum_{i=1}^n\{(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+(\vartheta-\vartheta_0)Y_i\}\\
&\quad-\sum_{i=1}^n\{(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+(\vartheta-\vartheta_0)Y_i\}^2\{1+o_P(1)\}+o_P(1).
\end{aligned}$$

We proceed as in (2.5.10) and obtain

$$(\eta_1-\eta_{1,0})U_{i\eta_1}+\dots+(\eta_d-\eta_{d,0})U_{i\eta_d}+(\vartheta-\vartheta_0)Y_i=t_1\tilde U_{i\eta_1}+t_2\tilde U_{i\eta_2}+\dots+t_d\tilde U_{i\eta_d}+t_{d+1}\tilde Y_i$$

with some coefficients $t_j$. Computing the maximum of the quadratic function

$$q(t_1,\dots,t_{d+1})=2\sum_{i=1}^n\{t_1\tilde U_{i\eta_1}+\dots+t_d\tilde U_{i\eta_d}+t_{d+1}\tilde Y_i\}-\sum_{i=1}^n\{t_1\tilde U_{i\eta_1}+\dots+t_d\tilde U_{i\eta_d}+t_{d+1}\tilde Y_i\}^2,$$

we get

$$(\tilde t_1,\dots,\tilde t_{d+1})=\operatorname*{arg\,max}_{t_1,\dots,t_{d+1}}q(t_1,\dots,t_{d+1})
=\left(\frac{\sum\tilde U_{i\eta_1}}{\sum(\tilde U_{i\eta_1})^2},\dots,\frac{\sum\tilde U_{i\eta_d}}{\sum(\tilde U_{i\eta_d})^2},\frac{\sum\tilde Y_i}{\sum\tilde Y_i^2}\right).\tag{2.5.16}$$

An upper bound for $r_{2n}$ is then given by

$$r_{2n}\le\frac{(\sum\tilde U_{i\eta_1})^2}{\sum(\tilde U_{i\eta_1})^2}+\dots+\frac{(\sum\tilde U_{i\eta_d})^2}{\sum(\tilde U_{i\eta_d})^2}+\frac{(\sum\tilde Y_i)^2}{\sum\tilde Y_i^2}+o_P(1).$$

For the values $\tilde\vartheta$ and $\tilde\eta$ which are implicitly given in (2.5.16), that upper bound is attained.

Proof of Lemma 2.2. First, we consider model (2.2.1). Let $\mu(\zeta,\phi_1,\dots,\phi_p;x_0^p)=\zeta+\phi_1x_0+\dots+\phi_px_{1-p}$. Then

$$\begin{aligned}
U_{1\phi_j}&=\frac{\partial_\mu f\big(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma\big)\,X_{1-j}}{f\big(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma\big)},\qquad j=1,\dots,p,\\
U_{1\sigma}&=\frac{\partial_\sigma f\big(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma\big)}{f\big(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma\big)},\qquad
Y_1=\frac{\partial_\mu f\big(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma\big)}{f\big(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma\big)},\\
Z_1&=\frac{\partial^2_\mu f\big(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma\big)}{f\big(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma\big)}.
\end{aligned}$$

The covariance matrix is non-degenerate if and only if these random variables are linearly independent (in $L^2$). Therefore, suppose that for constants $b_j$,

$$b_1Z_1+b_2Y_1+b_3U_{1\sigma}+\sum_{j=1}^p b_{j+3}U_{1\phi_j}=0\quad\text{a.s.}\tag{2.5.17}$$

Since the distribution of $X_1,\dots,X_{1-p}$ is equivalent to Lebesgue measure on $\mathbb R^{p+1}$, in the sense that the associated probability measure and the Lebesgue measure on $\mathbb R^{p+1}$ are mutually absolutely continuous, (2.5.17) is equivalent to

$$b_1\partial^2_\mu f+b_2\partial_\mu f+b_3\partial_\sigma f+\sum_{j=1}^p b_{j+3}\partial_\mu f\,x_{1-j}=0\quad\text{Leb.-a.s.},$$

where $f=f\big(x_1;\mu(\zeta,\phi_1,\dots,\phi_p;x_0^p),\sigma\big)$. From (2.3.3), it follows that $b_1=b_3=0$ and that

$$b_2+\sum_{j=1}^p b_{j+3}x_{1-j}=0\quad\text{Leb.-a.s.},$$

so that $b_2=b_4=\dots=b_{p+3}=0$.

Let $j_0$ be the index of the autoregressive parameter which switches according to the hidden state $S_t$ in model (2.2.2), and let $\mu(\zeta,\phi_1,\dots,\phi_p;x_0^p)=\zeta+\phi_1x_0+\dots+\phi_px_{1-p}$. Then

$$\begin{aligned}
U_{1\zeta}&=\frac{\partial_\mu f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)},\\
U_{1\phi_\tau}&=\frac{\partial_\mu f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)\,X_{1-\tau}}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}=U_{1\zeta}X_{1-\tau},\qquad 1\le\tau\ne j_0\le p,\\
Y_1&=\frac{\partial_\mu f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)\,X_{1-j_0}}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}=U_{1\zeta}X_{1-j_0},\\
Z_1&=\frac{\partial^2_\mu f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)\,X_{1-j_0}^2}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)},\qquad
U_{1\sigma}=\frac{\partial_\sigma f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}.
\end{aligned}$$

The covariance matrix of $(U_{1\zeta},U_{1\phi_1},\dots,U_{1\phi_{j_0-1}},U_{1\phi_{j_0+1}},\dots,U_{1\phi_p},U_{1\sigma},Y_1,Z_1)$ is non-degenerate if and only if these random variables are linearly independent (in $L^2$). Therefore, suppose that for some constants $b_j$

$$b_1U_{1\zeta}+b_2U_{1\sigma}+b_3Z_1+\sum_{\tau\ne j_0}b_{3+\tau}U_{1\phi_\tau}+b_{3+j_0}Y_1=0\quad\text{a.s.}\tag{2.5.18}$$

holds. Since the distribution of $X_1,\dots,X_{1-p}$ is equivalent to Lebesgue measure on $\mathbb R^{p+1}$, (2.5.18) is equivalent to

$$b_1\partial_\mu f+b_2\partial_\sigma f+b_3\partial^2_\mu f\,x_{1-j_0}^2+\sum_{\tau=1}^p b_{\tau+3}\partial_\mu f\,x_{1-\tau}=0\quad\text{Leb.-a.s.}\tag{2.5.19}$$

From equation (2.3.3), it follows that

$$b_2=0,\qquad b_3x_{1-j_0}^2=0\quad\text{Leb.-a.s.},\qquad b_1+\sum_{\tau=1}^p b_{\tau+3}x_{1-\tau}=0\quad\text{Leb.-a.s.},$$

so that $b_3=0$ and $b_1=b_4=\dots=b_{p+3}=0$.

Proof of Lemma 2.3. The characteristic function of the $t$-distribution is given by (cf. Hurst 1995)

$$\varphi(t)=\frac{K_{\nu/2}(\sqrt\nu\,|t|)\,(\sqrt\nu\,|t|)^{\nu/2}}{\Gamma(\tfrac\nu2)\,2^{\nu/2-1}},$$

where $\Gamma(\cdot)$ is the Gamma function and $K_p(\cdot)$ is the modified Bessel function of the second kind and order $p$ (cf. Andrews 1986, Chp. 6). Therefore, the characteristic function of the corresponding location-scale family is

$$\varphi(t;\mu,\sigma)=e^{i\mu t}\varphi(\sigma t)=e^{i\mu t}\,\frac{K_m(\sqrt\nu\,\sigma|t|)\,(\sqrt\nu\,\sigma|t|)^m}{\Gamma(m)\,2^{m-1}},\tag{2.5.20}$$

where we put $m=\tfrac\nu2$. The partial derivatives are

$$\begin{aligned}
\frac{\partial\varphi(t;\mu,\sigma)}{\partial\mu}&=it\,e^{i\mu t}\,\frac{K_m(\sqrt\nu\,\sigma|t|)\,(\sqrt\nu\,\sigma|t|)^m}{\Gamma(m)\,2^{m-1}},\\
\frac{\partial^2\varphi(t;\mu,\sigma)}{\partial\mu^2}&=-t^2\,e^{i\mu t}\,\frac{K_m(\sqrt\nu\,\sigma|t|)\,(\sqrt\nu\,\sigma|t|)^m}{\Gamma(m)\,2^{m-1}},\\
\frac{\partial\varphi(t;\mu,\sigma)}{\partial\sigma}&=-|t|\sqrt\nu\,e^{i\mu t}\,\frac{K_{m-1}(\sqrt\nu\,\sigma|t|)\,(\sqrt\nu\,\sigma|t|)^m}{\Gamma(m)\,2^{m-1}},\\
\frac{\partial^2\varphi(t;\mu,\sigma)}{\partial\sigma^2}&=\frac{|t|\sqrt\nu\,e^{i\mu t}}{\Gamma(m)\,2^{m-1}}\,(\sqrt\nu\,|t|)^m\sigma^{m-1}\big\{\sqrt\nu\,\sigma|t|\,K_{m-2}(\sqrt\nu\,\sigma|t|)-K_{m-1}(\sqrt\nu\,\sigma|t|)\big\},
\end{aligned}$$

cf. Andrews (1986).

(i). Taking the Fourier transform in (2.3.4) and interchanging integral and derivative gives

$$a_1\frac{\partial\varphi(t;\mu,\sigma)}{\partial\mu}+a_2\frac{\partial^2\varphi(t;\mu,\sigma)}{\partial\mu^2}+a_3\frac{\partial\varphi(t;\mu,\sigma)}{\partial\sigma}=0\quad\text{for all }t\in\mathbb R.\tag{2.5.21}$$

Plugging the partial derivatives into (2.5.21), dividing by $t\,e^{i\mu t}(\sqrt\nu\,|t|)^m\sigma^{m-1}/\{\Gamma(m)2^{m-1}\}$, and putting $x=\sqrt\nu\,\sigma|t|$ gives

$$a_1i\sigma K_m(x)-a_2\sigma t\,K_m(x)-a_3\sqrt\nu\,\sigma\operatorname{sign}(t)K_{m-1}(x)=0,\qquad t\in\mathbb R.\tag{2.5.22}$$

Choosing $t=1$ and $t=-1$ and adding, we get $a_1=0$. Next, dividing by $t\,K_m(x)$ and letting $t\to\infty$ (hence $x\to\infty$), since $K_{m-1}(x)/K_m(x)\to1$ (Andrews 1986), we get $a_2=0$ and finally $a_3=0$.

(ii). Taking the Fourier transform in (2.3.5) and interchanging integral and derivative gives

$$b_1\frac{\partial\varphi(t;\mu,\sigma)}{\partial\mu}+b_2\frac{\partial\varphi(t;\mu,\sigma)}{\partial\sigma}+b_3\frac{\partial^2\varphi(t;\mu,\sigma)}{\partial\sigma^2}=0\quad\text{for all }t\in\mathbb R.\tag{2.5.23}$$

Plugging the partial derivatives into (2.5.23), dividing by $t\,e^{i\mu t}(\sqrt\nu\,|t|)^m\sigma^{m-1}/\{\Gamma(m)2^{m-1}\}$, and putting $x=\sqrt\nu\,\sigma|t|$ gives

$$b_1i\sigma K_m(x)-b_2\sqrt\nu\,\sigma\operatorname{sign}(t)K_{m-1}(x)+b_3\sqrt\nu\operatorname{sign}(t)\big\{xK_{m-2}(x)-K_{m-1}(x)\big\}=0,\qquad t\in\mathbb R.\tag{2.5.24}$$

Choosing $t=1$ and $t=-1$ and adding, we get $b_1=0$. Therefore equation (2.5.24) reduces to

$$b_2\sigma K_{m-1}(x)-b_3\big\{xK_{m-2}(x)-K_{m-1}(x)\big\}=0,\qquad t\in\mathbb R.\tag{2.5.25}$$

Dividing by $xK_{m-2}(x)$ and letting $x\to\infty$, since $K_{m-1}(x)/K_{m-2}(x)\to1$ (Andrews 1986), we get $b_3=0$ and therefore $b_2=0$.

Proof of Lemma 2.4. The proof basically follows the proof of Lemma 2.2 for model (2.2.2) up to equation (2.5.19). For the normal distribution, (2.5.19) is equivalent to

$$b_1\partial_\mu f+b_2\partial_\sigma f+\frac{b_3}{\sigma}\partial_\sigma f\,x_{1-j_0}^2+\sum_{\tau=1}^p b_{\tau+3}\partial_\mu f\,x_{1-\tau}=0\quad\text{Leb.-a.s.},\tag{2.5.26}$$

since $\sigma\,\partial^2_\mu f=\partial_\sigma f$ holds for the normal distribution. From Lemma 2.6, to be shown, it follows that

$$b_2+\frac{b_3}{\sigma}x_{1-j_0}^2=0\quad\text{Leb.-a.s.},\qquad b_1+\sum_{\tau=1}^p b_{\tau+3}x_{1-\tau}=0\quad\text{Leb.-a.s.},$$

so that $b_2=b_3=0$ and $b_1=b_4=\dots=b_{p+3}=0$.

Proof of Lemma 2.5. Let $\mu(\zeta,\phi_1,\dots,\phi_p;x_0^p)=\zeta+\phi_1x_0+\dots+\phi_px_{1-p}$. Then

$$\begin{aligned}
U_{1\zeta}&=\frac{\partial_\mu f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)},\\
U_{1\phi_\tau}&=\frac{\partial_\mu f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)\,X_{1-\tau}}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}=U_{1\zeta}X_{1-\tau},\qquad \tau=1,\dots,p,\\
Y_1&=\frac{\partial_\sigma f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)},\qquad
Z_1=\frac{\partial^2_\sigma f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}{f(X_1;\mu(\zeta,\phi_1,\dots,\phi_p;X_0^p),\sigma)}.
\end{aligned}$$

The covariance matrix of $(U_{1\zeta},U_{1\phi_1},\dots,U_{1\phi_p},Y_1,Z_1)$ is non-degenerate if and only if these random variables are linearly independent (in $L^2$). Therefore, suppose that for some constants $b_j$

$$b_1Z_1+b_2Y_1+b_3U_{1\zeta}+\sum_{\tau=1}^p b_{3+\tau}U_{1\phi_\tau}=0\quad\text{a.s.}\tag{2.5.27}$$

holds. Since the distribution of $X_1,\dots,X_{1-p}$ is equivalent to Lebesgue measure on $\mathbb R^{p+1}$, (2.5.27) is equivalent to

$$b_1\partial^2_\sigma f+b_2\partial_\sigma f+b_3\partial_\mu f+\sum_{\tau=1}^p b_{3+\tau}\partial_\mu f\,x_{1-\tau}=0\quad\text{Leb.-a.s.}\tag{2.5.28}$$

with $f=f(x_1;\mu(\zeta,\phi_1,\dots,\phi_p;x_0^p),\sigma)$. From (2.3.6) it follows that $b_1=b_2=0$ and

$$b_3+\sum_{\tau=1}^p b_{3+\tau}x_{1-\tau}=0\quad\text{Leb.-a.s.},$$

so that $b_3=\dots=b_{3+p}=0$.

Proof of Lemma 2.6. The characteristic function of a normally distributed random variable with expectation $\mu$ and standard deviation $\sigma>0$ is

$$\varphi(t;\mu,\sigma)=\exp\Big(it\mu-\frac{\sigma^2t^2}{2}\Big).$$

The partial derivatives are

$$\frac{\partial\varphi(t;\mu,\sigma)}{\partial\sigma}=-\sigma t^2\varphi(t;\mu,\sigma),\qquad
\frac{\partial^2\varphi(t;\mu,\sigma)}{\partial\sigma^2}=(\sigma^2t^4-t^2)\varphi(t;\mu,\sigma),\qquad
\frac{\partial\varphi(t;\mu,\sigma)}{\partial\mu}=it\,\varphi(t;\mu,\sigma).$$

Taking the Fourier transform in (2.3.7) and interchanging integral and derivative gives

$$a_1\frac{\partial\varphi(t;\mu,\sigma)}{\partial\mu}+a_2\frac{\partial\varphi(t;\mu,\sigma)}{\partial\sigma}+a_3\frac{\partial^2\varphi(t;\mu,\sigma)}{\partial\sigma^2}=0\quad\text{for all }t\in\mathbb R.\tag{2.5.29}$$

Plugging the partial derivatives into (2.5.29) and dividing by $\varphi(t;\mu,\sigma)$ yields

$$a_1it-a_2\sigma t^2+a_3(\sigma^2t^4-t^2)=0\quad\text{for all }t\in\mathbb R,$$

which is equivalent to

$$a_1it+(-a_2\sigma-a_3)t^2+a_3\sigma^2t^4=0\quad\text{for all }t\in\mathbb R.\tag{2.5.30}$$

Plugging $t=-1$ and $t=1$ into (2.5.30) and subtracting, we get $a_1=0$. Since the monomials $\{1,t,\dots,t^4\}$ form a basis of the vector space of all polynomials of degree less than or equal to 4, we get $a_2=a_3=0$. The result follows by the inversion formula for probability density functions (see e.g. Billingsley, 1995).
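The last step, that a polynomial identity in $t$ forces all coefficients to vanish, can also be illustrated numerically: evaluating the map $(a_1,a_2,a_3)\mapsto a_1it-a_2\sigma t^2+a_3(\sigma^2t^4-t^2)$ at three distinct nonzero points already pins the coefficients down. A sketch (with $\sigma=1$ chosen arbitrarily for the illustration):

```python
import numpy as np

sigma = 1.0                      # arbitrary fixed sigma > 0 for the illustration
ts = np.array([1.0, 2.0, 3.0])   # any three distinct nonzero evaluation points

# columns: the coefficient functions of a1, a2, a3 in the identity above (2.5.30)
M = np.column_stack([1j * ts, -sigma * ts ** 2, sigma ** 2 * ts ** 4 - ts ** 2])

# full column rank means M a = 0 only for a = 0, i.e. the identity forces
# a1 = a2 = a3 = 0
print(np.linalg.matrix_rank(M) == 3)
```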

Proof of Lemma 2.8. Let $\sigma(\vartheta,\phi_1,\dots,\phi_p;x_0^p)=(\vartheta+\phi_1x_0^2+\dots+\phi_px_{1-p}^2)^{1/2}$. Then, setting $\sigma=\sigma(\vartheta,\phi_1,\dots,\phi_p;X_0^p)$,

$$\begin{aligned}
U_{1\phi_j}&=\frac{\partial_\sigma f(X_1;\sigma)\,X_{1-j}^2}{2\sigma\,f(X_1;\sigma)},\qquad j=1,\dots,p,\\
Y_1&=\frac{\partial_\sigma f(X_1;\sigma)/(2\sigma)}{f(X_1;\sigma)},\qquad
Z_1=\frac{\partial^2_\sigma f(X_1;\sigma)/(4\sigma^2)-\partial_\sigma f(X_1;\sigma)/(4\sigma^3)}{f(X_1;\sigma)}.
\end{aligned}$$

Again, the covariance matrix is non-degenerate if and only if these random variables are linearly independent in $L^2$. Therefore, suppose that for constants $b_j$,

$$b_1Z_1+b_2Y_1+\sum_{j=1}^p b_{j+2}U_{1\phi_j}=0\quad\text{a.s.}\tag{2.5.31}$$

Again, (2.5.31) is equivalent to

$$b_1\Big\{\frac{\partial^2_\sigma f}{2\sigma}-\frac{\partial_\sigma f}{2\sigma^2}\Big\}+b_2\partial_\sigma f+\sum_{j=1}^p b_{j+2}\partial_\sigma f\,x_{1-j}^2=0\quad\text{Leb.-a.s.},$$

where $f=f\big(x_1;\sigma(\vartheta,\phi_1,\dots,\phi_p;x_0^p)\big)$. From (2.3.9), $b_1=0$ (as the coefficient of $\partial^2_\sigma f$) and

$$b_2+\sum_{j=1}^p b_{j+2}x_{1-j}^2=0\quad\text{Leb.-a.s.},$$

so that $b_2=b_3=\dots=b_{p+2}=0$.

Proof of Theorem 2.3. It is clear that

$$EM_n^{(K)}\le M_n\le\frac{\{(\sum\tilde Z_i)^+\}^2}{\sum\tilde Z_i^2}+o_P(1).$$

Since one of the starting values in the EM-test is assumed to be $\alpha_J=1/2$ and since the EM algorithm only increases the value of the likelihood (even though applied to a penalized quasi-likelihood, see below for the argument), using the same argument as at the end of the proof of Theorem 2.2, we have

$$EM_n^{(K)}\ge\frac{\{(\sum\tilde Z_i)^+\}^2}{\sum\tilde Z_i^2}+o_P(1)$$

and the claim follows.

Derivation of the EM property. For the argument, given the sample $X_1=x_1,\dots,X_n=x_n$, we work with a (hypothetical) independent regime $(S_k)_{k\ge0}$. The parameter vector is then given by $\psi^T=(\alpha,\vartheta_1,\vartheta_2,\eta^T)\in\mathbb R^{d+3}$, where $\alpha$ is the probability of state 2 for the independent regime. Let

(i) $S=(S_1,\dots,S_n)$, $X=(X_1,\dots,X_n)$, $x=(x_1,\dots,x_n)$ and $s=(s_1,\dots,s_n)$,
(ii) $q$ be the joint pdf of $(X,S)$ given $X_0^p,\psi$ (under this artificial model),
(iii) $r$ be the pdf of $S$ given $X,X_0^p,\psi$ (also under this artificial model),

so that

$$p(x|x_0^p,\psi)\,r(s|x,x_0^p,\psi)=q(x,s|x_0^p,\psi).\tag{2.5.32}$$

Explicitly, we have

$$\begin{aligned}
p(x|x_0^p,\psi)&=\prod_{k=1}^n\{(1-\alpha)g(x_k|x_{k-1}^p;\vartheta_1,\eta)+\alpha g(x_k|x_{k-1}^p;\vartheta_2,\eta)\},\\
r(s|x,x_0^p,\psi)&=\prod_{k=1}^n\frac{(1-\alpha)^{1\{s_k=1\}}\alpha^{1\{s_k=2\}}g(x_k|x_{k-1}^p;\vartheta_{s_k},\eta)}{(1-\alpha)g(x_k|x_{k-1}^p;\vartheta_1,\eta)+\alpha g(x_k|x_{k-1}^p;\vartheta_2,\eta)},\\
q(x,s|x_0^p,\psi)&=\prod_{k=1}^n(1-\alpha)^{1\{s_k=1\}}\alpha^{1\{s_k=2\}}g(x_k|x_{k-1}^p;\vartheta_{s_k},\eta).
\end{aligned}$$

Denote by $E_{\psi^{(k)}}$ expectation w.r.t. the (artificial) distribution including the independent regime under the parameter $\psi^{(k)}$. From (2.5.32), we get

$$pl_n(\psi)=\bar Q(\psi|\psi^{(k)})-R(\psi|\psi^{(k)}),$$

where

$$\bar Q(\psi|\psi^{(k)})=E_{\psi^{(k)}}\{\log(q(X,S|X_0^p,\psi))\,|\,X,X_0^p\}+p(\alpha),\qquad
R(\psi|\psi^{(k)})=E_{\psi^{(k)}}\{\log(r(S|X,X_0^p,\psi))\,|\,X,X_0^p\}$$

and $\psi^{(k)}$ is the current value of $\psi$. Then

$$\bar Q(\psi^{(k+1)}|\psi^{(k)})\ge\bar Q(\psi^{(k)}|\psi^{(k)})\ \Longrightarrow\ pl_n(\psi^{(k+1)})\ge pl_n(\psi^{(k)}).\tag{2.5.33}$$

Proof of (2.5.33). Using Jensen's inequality we get

$$\begin{aligned}
R(\psi^{(k+1)}|\psi^{(k)})-R(\psi^{(k)}|\psi^{(k)})
&=E_{\psi^{(k)}}\left\{\log\frac{r(S|X,X_0^p,\psi^{(k+1)})}{r(S|X,X_0^p,\psi^{(k)})}\,\Big|\,X,X_0^p\right\}\\
&\le\log E_{\psi^{(k)}}\left\{\frac{r(S|X,X_0^p,\psi^{(k+1)})}{r(S|X,X_0^p,\psi^{(k)})}\,\Big|\,X,X_0^p\right\}=0,
\end{aligned}$$

and therefore

$$\begin{aligned}
pl_n(\psi^{(k)})&=\bar Q(\psi^{(k)}|\psi^{(k)})-R(\psi^{(k)}|\psi^{(k)})\\
&\le\bar Q(\psi^{(k+1)}|\psi^{(k)})-R(\psi^{(k)}|\psi^{(k)})\\
&\le\bar Q(\psi^{(k+1)}|\psi^{(k)})-R(\psi^{(k+1)}|\psi^{(k)})\\
&=pl_n(\psi^{(k+1)}).
\end{aligned}$$

Next we show that $\bar Q(\psi^{(k+1)}|\psi^{(k)})\ge\bar Q(\psi^{(k)}|\psi^{(k)})$ holds for the updates obtained by the ECM algorithm (as proposed in Meng and Rubin, 1993). Relabel $\psi=(\psi_1,\dots,\psi_{d+3})$ and for $1\le r\le d+3$ let

$$\pi_{\{t_1,\dots,t_r\}}:\mathbb R^{d+3}\to\mathbb R^r,\qquad \pi_{\{t_1,\dots,t_r\}}(\psi_1,\dots,\psi_{d+3})=(\psi_{t_1},\dots,\psi_{t_r}),$$

let $P_1,\dots,P_q$ be any partition of $\{1,\dots,d+3\}$, and set $-P_j=\{1,\dots,d+3\}\setminus P_j$. The ECM algorithm proceeds as follows.

Step 1: Compute $\psi^{(k+1/q)}=\operatorname*{arg\,max}_\psi\bar Q(\psi|\psi^{(k)})$ subject to $\pi_{-P_1}(\psi)=\pi_{-P_1}(\psi^{(k)})$.

Step 2: Compute $\psi^{(k+2/q)}=\operatorname*{arg\,max}_\psi\bar Q(\psi|\psi^{(k)})$ subject to $\pi_{-P_2}(\psi)=\pi_{-P_2}(\psi^{(k+1/q)})$.

...

Step q: Compute $\psi^{(k+q/q)}=\operatorname*{arg\,max}_\psi\bar Q(\psi|\psi^{(k)})$ subject to $\pi_{-P_q}(\psi)=\pi_{-P_q}(\psi^{(k+(q-1)/q)})$.

The updated value is given by $\psi^{(k+1)}=\psi^{(k+q/q)}$. Then, by construction, we have

$$\bar Q(\psi^{(k+1)}|\psi^{(k)})\ge\bar Q(\psi^{(k+(q-1)/q)}|\psi^{(k)})\ge\dots\ge\bar Q(\psi^{(k+1/q)}|\psi^{(k)})\ge\bar Q(\psi^{(k)}|\psi^{(k)}),$$

which implies (2.5.33).

Since

$$\bar Q(\psi|\psi^{(k)})=\sum_{i=1}^n\big\{(1-w_i^{(k)})\log\big((1-\alpha)g(X_i|X_{i-1}^p;\vartheta_1,\eta)\big)+w_i^{(k)}\log\big(\alpha g(X_i|X_{i-1}^p;\vartheta_2,\eta)\big)\big\}+p(\alpha)$$

with

$$w_i^{(k)}=\frac{\alpha^{(k)}g(X_i|X_{i-1}^p;\vartheta_2^{(k)},\eta^{(k)})}{(1-\alpha^{(k)})g(X_i|X_{i-1}^p;\vartheta_1^{(k)},\eta^{(k)})+\alpha^{(k)}g(X_i|X_{i-1}^p;\vartheta_2^{(k)},\eta^{(k)})},$$

the algorithm in our EM-test is the ECM algorithm with $P_1=\{\alpha,\vartheta_1,\vartheta_2\}$ and $P_2=\{\eta\}$.
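The E-step/CM-step cycle above can be sketched on a toy two-component normal mean mixture (a stand-in for the switching AR model; the penalty $p(\alpha)=\log(4\alpha(1-\alpha))$ and the starting values are assumptions made for this illustration, not taken from the text). The ascent property (2.5.33) is then visible numerically:

```python
import math
import random

random.seed(1)
# toy two-component normal mean mixture standing in for the switching AR model
x = [random.gauss(0, 1) for _ in range(150)] + [random.gauss(2, 1) for _ in range(50)]
n = len(x)

def npdf(v, mu, sig):
    return math.exp(-0.5 * ((v - mu) / sig) ** 2) / (sig * math.sqrt(2 * math.pi))

def pen_loglik(alpha, z1, z2, sig):
    # penalized quasi log-likelihood; the Chen-Li-type penalty
    # p(alpha) = log(4 alpha (1 - alpha)) is an assumption made for this sketch
    return sum(math.log((1 - alpha) * npdf(v, z1, sig) + alpha * npdf(v, z2, sig))
               for v in x) + math.log(4 * alpha * (1 - alpha))

alpha, z1, z2, sig = 0.5, -1.0, 1.0, 2.0
values = [pen_loglik(alpha, z1, z2, sig)]
for _ in range(20):
    # E-step: posterior weights of state 2 under the current parameter value
    w = [alpha * npdf(v, z2, sig)
         / ((1 - alpha) * npdf(v, z1, sig) + alpha * npdf(v, z2, sig)) for v in x]
    W = sum(w)
    # CM-step 1 (block P1 = {alpha, zeta_1, zeta_2}; sigma held fixed);
    # alpha has a closed form under the assumed penalty
    alpha = (W + 1) / (n + 2)
    z1 = sum((1 - wi) * v for wi, v in zip(w, x)) / (n - W)
    z2 = sum(wi * v for wi, v in zip(w, x)) / W
    # CM-step 2 (block P2 = {sigma}; the other block held at its updated value)
    sig = math.sqrt(sum((1 - wi) * (v - z1) ** 2 + wi * (v - z2) ** 2
                        for wi, v in zip(w, x)) / n)
    values.append(pen_loglik(alpha, z1, z2, sig))

# ascent property (2.5.33): each ECM update cannot decrease the penalized likelihood
print(all(b >= a - 1e-9 for a, b in zip(values, values[1:])))
```

Each CM-step is an exact maximizer of $\bar Q$ over its block, which is all that (2.5.33) requires; the monotone increase of the penalized likelihood is therefore guaranteed, not just observed.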

## Autoregressive model with normal innovations

In this chapter we discuss testing for homogeneity in a linear switching autoregressive model with possibly switching intercept under the alternative and normal innovations. Even for HMMs with state-dependent distributions $P(X_t\le x|S_t=i)=\Phi\big((x-\zeta_i)/\sigma\big)$, no feasible methods for testing the hypothesis $H:m=1$ against $K:m\ge2$ have been available yet (see e.g. Piger 2009). For mixture models, Chen and Li (2009) recently developed the so-called EM-test for testing for homogeneity in a normal mixture model in the presence of a structural parameter. They show that the asymptotic distribution of the EM-test statistic is a simple function of the $\frac12\chi^2_0+\frac12\chi^2_1$ and $\chi^2_1$ distributions.

### 3.1 Example 2.1.1 (reconsidered)

As noted in the previous chapter, the LRT for testing for homogeneity in a switching autoregressive model does not admit a usual $\chi^2$ distribution, since parameters of the full model are not identifiable under the hypothesis. For example, the hypothesis of a single regime in model (3.1.1), i.e. $M=\{1\}$, can be represented by $H:\zeta_1=\zeta_2$, under which the parameters $a_{12}$ and $a_{21}$ are not identifiable. In addition to that, testing for homogeneity in the model

$$X_t=\zeta_{S_t}+\sum_{j=1}^p\phi_jX_{t-j}+\epsilon_t,\qquad \epsilon_t\overset{iid}\sim N(0,1),\tag{3.1.1}$$

is much more involved, since $\sigma\,\frac{\partial^2f(x;\mu,\sigma)}{\partial\mu^2}=\frac{\partial f(x;\mu,\sigma)}{\partial\sigma}$ holds for the normal distribution. Therefore, Assumption 2.6 is not satisfied for model (3.1.1) and the previously introduced MQLRT for testing for homogeneity does not admit a simple $\frac12\chi^2_0+\frac12\chi^2_1$ distribution. This problem also arises in the related problem of testing for homogeneity in homoscedastic two-component normal mixtures, which is a special case of our problem obtained by letting $p=0$ and $S_t\overset{iid}\sim\mathrm{Mult}\big(1;(1-\alpha,\alpha)\big)$, and which has been studied extensively by Chen and Chen (2003), Qin and Smith (2004) and Chen and Li (2009). Chen and Chen (2003) derive an asymptotic upper bound for the MLRT for testing for homogeneity in normal mixture models in the presence of a structural parameter, which is strengthened by Qin and Smith (2004). They give a stochastic upper bound for the MLRT which has a $\frac12\chi^2_1+\frac12\chi^2_2$ distribution. But it is not at all clear whether this upper bound is also attained. Chen and Li (2009) investigate an EM-test for testing for homogeneity in this model.

In the following we extend the EM-test of Chen and Li (2009) to linear switching autoregressive models with possibly switching intercept under the alternative and normal innovations. To this end, we suppose that under the null hypothesis, i.e. no regime switch, $(X_k)_k$ is a causal AR($p$) process. This assumption assures that the order $p$ as well as the parameters of the autoregressive process are uniquely defined, cf. Kreiss and Neuhaus (2006). Throughout this chapter we assume $\sigma\in[\delta,\infty)$, $\delta>0$, and $\zeta\in\Theta$ and $\phi=(\phi_1,\dots,\phi_p)^T\in H$, where $\Theta$ and $H$ are arbitrary subsets of $\mathbb R$ and $\mathbb R^p$, respectively.

### 3.1.1 Penalized maximum likelihood

As in Chapter 2, following Cho and White (2007), we consider a model which ignores the serial correlation in $(S_k)_k$ but captures the serial correlation of the process $(X_k)_k$. Even if we ignore the serial correlation in $(S_k)_k$, we are able to test for the number of regimes. Let $X_1,\dots,X_n$ be a random sample of size $n$ from model (3.1.1). We do not work with the (full) likelihood conditional on the initial observations $(X_0,\dots,X_{-p+1})$ and the initial state $S_0=i_0$. Instead, we consider the quasi log-likelihood function, which is given by

$$l_n(\psi)=\sum_{t=1}^n\log\big\{(1-\alpha)g(X_t|X_{t-1}^p;\zeta_1,\phi,\sigma)+\alpha g(X_t|X_{t-1}^p;\zeta_2,\phi,\sigma)\big\},\tag{3.1.2}$$

with $\psi=(\alpha,\zeta_1,\zeta_2,\phi^T,\sigma)^T$. Here, the parameter $(1-\alpha,\alpha)$ corresponds to the stationary distribution of the hidden Markov chain $(S_k)_k$, cf. Remark 1.1. Since we assume that the innovations $(\epsilon_k)_k$ are independent and identically normally distributed with expectation 0 and scale parameter $\sigma$, the conditional density (w.r.t. Lebesgue measure on $\mathbb R$) of $X_t$ given $X_{t-1}^p=x_{t-1}^p$ and $S_t=i$ is

$$g(x_t|x_{t-1}^p;\zeta_i,\phi,\sigma)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_t-\zeta_i-\sum_{j=1}^p\phi_jx_{t-j})^2}{2\sigma^2}\right).$$

In the next section, we give a test for the hypothesis of no regime switch, i.e.

$$H:\alpha(1-\alpha)(\zeta_1-\zeta_2)=0$$

in model (3.1.1), using a penalized version of (3.1.2).
