
Hidden Markov Models

The hidden Markov model is a statistical model in which the system modelled is a Markov chain with unknown parameters, and the challenge is to determine the hidden parameters from the observable data. Hidden Markov models were introduced for speech recognition in the 1960s and are now widely used in temporal pattern recognition.

We first introduce the fully observed Markov model in which the states are visible to the observer.

Then we proceed to the hidden Markov model, where the states are not observable, but each state has a probability distribution over the generated output data. This information can be used to determine the most likely state sequence that generated the output data.

6.1 Fully Observed Markov Model

We introduce a variant of the Markov chain model that will serve as a preliminary model for the hidden Markov model.

For this, we take an alphabet $\Sigma$ with $l$ symbols, an alphabet $\Sigma'$ with $l'$ symbols, and a positive integer $n$. We consider words $\sigma = \sigma_1 \ldots \sigma_n$ and $\tau = \tau_1 \ldots \tau_n$ of length $n$ over $\Sigma$ and $\Sigma'$, respectively.

These words are used to label the entries of an integral block matrix

$$A^{(l,l'),n} = (A_{\sigma,\tau})_{\sigma\in\Sigma^n,\,\tau\in\Sigma'^n}, \qquad (6.1)$$

whose entries $A_{\sigma,\tau}$ are pairs of matrices $(w, w')$ such that $w = (w_{rs})$ is an $l\times l$ matrix and $w' = (w'_{st})$ is an $l\times l'$ matrix. The entry $w_{rs} = w_{rs}(\sigma)$ counts the number of occurrences in $\sigma$ of the length-2 word $rs$, and the entry $w'_{st} = w'_{st}(\sigma,\tau)$ counts the number of indices $i$, $1\le i\le n$, such that $\sigma_i = s$ and $\tau_i = t$.
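As a small illustration (our own sketch, not from the text), the following Maple procedure computes the pair $(w, w')$ for a given pair of words, with the symbols of $\Sigma$ and $\Sigma'$ encoded as integers $1,\ldots,l$ and $1,\ldots,l'$; the procedure name countpair is ours.

# Sketch: compute the count matrices (w, w') for words sigma and tau,
# given as lists of integers; w[r,s] counts consecutive pairs rs in sigma,
# wp[s,t] counts positions i with sigma[i] = s and tau[i] = t.
countpair := proc(sigma, tau, l, lp)
  local w, wp, n, i, r, s;
  n := nops(sigma);
  w := array(1..l, 1..l): wp := array(1..l, 1..lp):
  for r from 1 to l do
    for s from 1 to l do w[r,s] := 0 od:
    for s from 1 to lp do wp[r,s] := 0 od:
  od:
  for i from 1 to n-1 do
    w[sigma[i], sigma[i+1]] := w[sigma[i], sigma[i+1]] + 1
  od:
  for i from 1 to n do
    wp[sigma[i], tau[i]] := wp[sigma[i], tau[i]] + 1
  od:
  [eval(w), eval(wp)]
end:
countpair([1,1,2,2], [1,2,1,2], 2, 2);   # sigma = FFLL, tau = htht

The entries of the returned pair sum to $(n-1) + n = 2n-1$, in line with the column-sum property stated below.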

We may view the pairs $(w, w')$ as the columns of the matrix $A^{(l,l'),n}$. Then the matrix $A^{(l,l'),n}$ has $d = l\cdot l + l\cdot l' = l^2 + l\cdot l'$ rows, labelled first by the length-2 words $rs$ over $\Sigma$ and then by the pairs $st$ in $\Sigma\times\Sigma'$. Moreover, the matrix has $m = l^n\cdot l'^n$ columns, labelled by the pairs of length-$n$ words $\sigma$ and $\tau$ over $\Sigma$ and $\Sigma'$, respectively. The matrix $A^{(l,l'),n}$ has the property that each of its columns sums to $(n-1) + n = 2n-1$, since each word of length $n$ has $n-1$ consecutive length-2 subwords and two words of length $n$ pair in $n$ positions. Thus the matrix $A^{(l,l'),n}$ defines a toric model $f = f^{(l,l'),n} : \mathbb{R}^d \to \mathbb{R}^m$ given as
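For instance, with $l = l' = 2$ and $n = 4$ (the setting of Example 6.1 below), the matrix $A^{(2,2),4}$ has $d = 2^2 + 2\cdot 2 = 8$ rows and $m = 2^4\cdot 2^4 = 256$ columns, and each column sums to $2\cdot 4 - 1 = 7$.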

$$(\theta, \theta') \mapsto (p_{\sigma,\tau})_{\sigma\in\Sigma^n,\,\tau\in\Sigma'^n}, \qquad (6.2)$$

where

$$p_{\sigma,\tau} = \frac{1}{l}\,\theta'_{\sigma_1\tau_1}\theta_{\sigma_1\sigma_2}\theta'_{\sigma_2\tau_2}\theta_{\sigma_2\sigma_3}\cdots\theta_{\sigma_{n-1}\sigma_n}\theta'_{\sigma_n\tau_n}. \qquad (6.3)$$

Here we assume a uniform initial distribution on the states in the alphabet $\Sigma$ as described by (7.1). Apart from the initial factor $1/l$, each term $p_{\sigma,\tau}$ has $n + (n-1) = 2n-1$ factors. The parameter space $\Theta$ of the model is the Cartesian product of the set of positive $l\times l$ matrices $\theta$ and the set of positive $l\times l'$ matrices $\theta'$; that is, $\Theta = \mathbb{R}^{l\times l}_{>0} \times \mathbb{R}^{l\times l'}_{>0}$. The matrix $\theta$ encodes a toric Markov chain, while the matrix $\theta'$ encodes the interplay between the two alphabets. The state space of the model is $\Sigma^n\times\Sigma'^n$. This model is called a fully observed toric Markov model.

Example 6.1. Consider a dishonest dealer in a casino tossing coins. We know that she may use a fair or a loaded coin, the latter of which is supposed to have probability 0.75 of showing heads. We also know that she tends not to change coins; a switch happens with probability 0.1 (Fig. 6.1). Given a sequence of coin tosses, we wish to determine when she used the loaded coin and when the fair one.

Fig. 6.1. Transition graph of the casino model.

This model can be described by a toric Markov model with the alphabets $\Sigma = \{F, L\}$, where $F$ stands for fair and $L$ stands for loaded, and $\Sigma' = \{h, t\}$, where $h$ stands for heads and $t$ stands for tails. The corresponding $8\times 256$ matrix $A^{(2,2),4}$ for sequences of length $n = 4$ consists of the columns $A_{\sigma,\tau}$, where $\sigma$ and $\tau$ range over all words of length 4 over $\Sigma$ and $\Sigma'$, respectively; e.g., for the pairs $(FFLL, htht)$ and $(FFFL, hhhh)$. The column for $(FFLL, htht)$ is the pair of count matrices

$$w = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad w' = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},$$

recording one occurrence each of $FF$, $FL$, and $LL$ in $\sigma$, and one occurrence each of $Fh$, $Ft$, $Lh$, and $Lt$ in the paired positions. The model has $d = 8$ parameters given by the matrices

$$\theta = \begin{pmatrix} \theta_{FF} & \theta_{FL} \\ \theta_{LF} & \theta_{LL} \end{pmatrix} \quad\text{and}\quad \theta' = \begin{pmatrix} \theta'_{Fh} & \theta'_{Ft} \\ \theta'_{Lh} & \theta'_{Lt} \end{pmatrix}.$$


Suppose the dealer tosses four coins in a row, so that we consider sequences of length $n = 4$. Then the model has $m = (2\cdot 2)^4 = 256$ states. The fully observed toric Markov model is defined by the mapping

$$f : \mathbb{R}^8 \to \mathbb{R}^{256} : (\theta, \theta') \mapsto (p_{\sigma,\tau})_{\sigma\in\Sigma^4,\,\tau\in\Sigma'^4},$$

where

$$p_{\sigma_1\sigma_2\sigma_3\sigma_4,\tau_1\tau_2\tau_3\tau_4} = \frac{1}{2}\,\theta'_{\sigma_1\tau_1}\theta_{\sigma_1\sigma_2}\theta'_{\sigma_2\tau_2}\theta_{\sigma_2\sigma_3}\theta'_{\sigma_3\tau_3}\theta_{\sigma_3\sigma_4}\theta'_{\sigma_4\tau_4}.$$

For instance, in view of the above pairs of sequences, we obtain

$$p_{FFLL,htht} = \frac{1}{2}\,\theta'_{Fh}\theta_{FF}\theta'_{Ft}\theta_{FL}\theta'_{Lh}\theta_{LL}\theta'_{Lt} = \frac{1}{2}\,\theta_{FF}\theta_{FL}\theta_{LL}\,\theta'_{Fh}\theta'_{Ft}\theta'_{Lh}\theta'_{Lt}$$

and

$$p_{FFFL,hhhh} = \frac{1}{2}\,\theta'_{Fh}\theta_{FF}\theta'_{Fh}\theta_{FF}\theta'_{Fh}\theta_{FL}\theta'_{Lh} = \frac{1}{2}\,\theta_{FF}^2\theta_{FL}\,\theta'^{3}_{Fh}\theta'_{Lh}. \qquad\diamond$$

Second, we introduce the fully observed Markov model as a submodel of the toric Markov model.

For this, the parameter space of the fully observed toric Markov model is restricted to the set of pairs of matrices $(\theta, \theta')$ whose row sums are equal to 1. The parameter space of the fully observed Markov model is thus a subset $\Theta_1$ of $\mathbb{R}^{l\times(l-1)}_{>0} \times \mathbb{R}^{l\times(l'-1)}_{>0}$, and the number of parameters is $d = l\cdot(l-1) + l\cdot(l'-1) = l\cdot(l + l' - 2)$. A pair of matrices $(\theta, \theta')$ in $\Theta_1$ provides an $l\times l$ matrix $\theta$ describing transition probabilities and an $l\times l'$ matrix $\theta'$ providing emission probabilities. The value $\theta_{ij}$ represents the probability to pass from state $i\in\Sigma$ to state $j\in\Sigma$ in one step, and the value $\theta'_{ij}$ is the probability to emit the symbol $j\in\Sigma'$ in state $i\in\Sigma$. The fully observed Markov model is given by the mapping $f^{(l,l'),n} : \mathbb{R}^d \to \mathbb{R}^m$ restricted to the parameter space $\Theta_1$. Each point $p$ in the image $f^{(l,l'),n}(\Theta_1)$ is called a marginal probability. We usually assume that the initial distribution on the first state in $\Sigma$ is uniform.
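To make the meaning of transition and emission probabilities concrete, here is a small Maple sketch (our own addition, not from the text) that draws one pair $(\sigma,\tau)$ from the model under the uniform initial distribution; the procedure names draw and samplepair are hypothetical.

# Sketch: sample (sigma, tau) from the fully observed Markov model.
# T: l x l transition matrix, E: l x lp emission matrix, both row stochastic.
with(linalg):
draw := proc(p)                # draw an index according to probability vector p
  local r, k, acc;
  r := evalf(rand()/10^12);    # uniform random number in [0, 1)
  acc := 0: k := 1:
  while k < vectdim(p) and acc + p[k] <= r do
    acc := acc + p[k]: k := k + 1:
  od:
  k
end:
samplepair := proc(T, E, l, n)
  local sigma, tau, i;
  sigma := vector(n): tau := vector(n):
  sigma[1] := draw(vector([seq(1/l, i=1..l)])):   # uniform initial state
  tau[1] := draw(row(E, sigma[1])):
  for i from 2 to n do
    sigma[i] := draw(row(T, sigma[i-1])):         # transition step
    tau[i] := draw(row(E, sigma[i])):             # emission step
  od:
  [eval(sigma), eval(tau)]
end:

For the casino model one would call, e.g., samplepair(matrix([[0.9, 0.1], [0.1, 0.9]]), matrix([[0.5, 0.5], [0.75, 0.25]]), 2, 4);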

Let $u = (u_{\sigma,\tau}) \in \mathbb{N}_0^{l^n\times l'^n}$ be a frequency vector representing $N$ observed sequence pairs in $\Sigma^n\times\Sigma'^n$. That is, $u_{\sigma,\tau}$ counts the number of times the pair $(\sigma,\tau)$ is observed, so that $\sum_{\sigma,\tau} u_{\sigma,\tau} = N$. The sufficient statistic $v = A^{(l,l'),n}\cdot u$ can be regarded as a pair of matrices $(v, v')$, where $v = (v_{rs})$ is an $l\times l$ matrix whose entries $v_{rs}$ count the occurrences of $rs\in\Sigma^2$ as a consecutive pair in the sequences $\sigma$ of the observed pairs $(\sigma,\tau)$, and $v' = (v'_{st})$ is an $l\times l'$ matrix whose entries $v'_{st}$ count the occurrences of $st\in\Sigma\times\Sigma'$ at the same position in the observed sequence pairs $(\sigma,\tau)$.
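In practice one need not form the large matrix $A^{(l,l'),n}$ to obtain $v$; the following Maple sketch (ours, reusing the countpair procedure from above) accumulates $(v, v')$ directly from the observed pairs.

# Sketch: accumulate the sufficient statistic (v, v') from a list of
# observed pairs, each given as [sigma, tau] with integer-coded words.
with(linalg):                  # for matrix and matadd
suffstat := proc(data, l, lp)
  local v, vp, c, p;
  v := matrix(l, l, 0):        # pairwise transition counts over all sigma
  vp := matrix(l, lp, 0):      # positionwise emission counts over all pairs
  for p in data do
    c := countpair(p[1], p[2], l, lp):
    v := matadd(v, c[1]): vp := matadd(vp, c[2]):
  od:
  [eval(v), eval(vp)]
end:
# Example: suffstat([ [[1,1,2,2],[1,2,1,2]], [[1,1,1,2],[1,1,1,1]] ], 2, 2);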

The likelihood function of the fully observed Markov model is given as

$$L(\theta,\theta') = (\theta,\theta')^{A^{(l,l'),n}\cdot u} = \theta^{v}\cdot(\theta')^{v'}, \qquad (\theta,\theta')\in\Theta_1. \qquad (6.4)$$

Proposition 6.2. In the fully observed Markov chain model $f^{(l,l'),n}$, the maximum likelihood estimate for the frequency data $u\in\mathbb{N}_0^{l^n\times l'^n}$ with sufficient statistic $v = A^{(l,l'),n}\cdot u$ is the matrix pair $(\hat\theta, \hat\theta')$ in $\Theta_1$ such that

$$\hat\theta_{\sigma_1\sigma_2} = \frac{v_{\sigma_1\sigma_2}}{\sum_{\sigma\in\Sigma} v_{\sigma_1\sigma}} \quad\text{and}\quad \hat\theta'_{\sigma_1\tau_1} = \frac{v'_{\sigma_1\tau_1}}{\sum_{\tau\in\Sigma'} v'_{\sigma_1\tau}}, \qquad \sigma_1,\sigma_2\in\Sigma,\ \tau_1\in\Sigma'.$$
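A Maple sketch of the row normalization in Prop. 6.2 (our own addition; it assumes every state actually occurs, so that no row of counts is zero):

# Sketch: maximum likelihood estimates per Prop. 6.2, obtained by
# normalizing each row of a count matrix; zero rows are not handled.
rownormalize := proc(v, rows, cols)
  local th, r, s, total;
  th := array(1..rows, 1..cols):
  for r from 1 to rows do
    total := 0:
    for s from 1 to cols do total := total + v[r,s] od:
    for s from 1 to cols do th[r,s] := v[r,s]/total od:
  od:
  eval(th)
end:
# thetahat := rownormalize(v, l, l):  thetahatp := rownormalize(vp, l, lp):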

The proof is analogous to that of Prop. 4.11, since the log-likelihood function similarly decouples into independent parts.
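In brief (our expansion of the argument): taking logarithms in (6.4) gives

$$\log L(\theta,\theta') = \sum_{r,s\in\Sigma} v_{rs}\log\theta_{rs} + \sum_{s\in\Sigma,\,t\in\Sigma'} v'_{st}\log\theta'_{st},$$

a sum of separate concave terms for each row of $\theta$ and of $\theta'$; maximizing each row subject to its entries summing to 1 yields the normalized counts of Prop. 6.2.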

Example 6.3 (Maple). We reconsider the dealer's example. The parameter space $\Theta_1$ of the fully observed Markov model can be viewed as the set of all pairs of probability matrices

$$\theta = \begin{pmatrix} \theta_{FF} & 1-\theta_{FF} \\ 1-\theta_{LL} & \theta_{LL} \end{pmatrix} \quad\text{and}\quad \theta' = \begin{pmatrix} \theta'_{Fh} & 1-\theta'_{Fh} \\ 1-\theta'_{Lt} & \theta'_{Lt} \end{pmatrix},$$

where $\theta_{FF} = \theta_{LL} = 0.9$ is the probability to stay with the fair or loaded coin, $\theta'_{Fh} = 0.5$ is the probability to observe heads with the fair coin, and $\theta'_{Lh} = 1 - \theta'_{Lt} = 0.75$ is the probability to observe heads with the loaded coin.

This model has onlyd= 4 parameters.

Suppose a game involves tossing the coin $n = 4$ times. Then the model has $m = (2\cdot 2)^4 = 256$ states.

This Markov model is given by the mapping $f^{(2,2),4} : \mathbb{R}^4 \to \mathbb{R}^{256}$ with marginal probabilities

$$p_{\sigma_1\sigma_2\sigma_3\sigma_4,\tau_1\tau_2\tau_3\tau_4} = \frac{1}{2}\,\theta'_{\sigma_1\tau_1}\theta_{\sigma_1\sigma_2}\theta'_{\sigma_2\tau_2}\theta_{\sigma_2\sigma_3}\theta'_{\sigma_3\tau_3}\theta_{\sigma_3\sigma_4}\theta'_{\sigma_4\tau_4}.$$

For instance, suppose that in a game the fair coin was used for the first two tosses and the loaded coin for the last two, and heads was observed each time. The corresponding probability is given by

$$p_{FFLL,hhhh} = \frac{1}{2}\,\theta'_{Fh}\theta_{FF}\theta'_{Fh}(1-\theta_{FF})(1-\theta'_{Lt})\theta_{LL}(1-\theta'_{Lt}).$$
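With the given numerical values $\theta_{FF} = \theta_{LL} = 0.9$, $\theta'_{Fh} = 0.5$, and $\theta'_{Lt} = 0.25$, this evaluates (our arithmetic) to

$$p_{FFLL,hhhh} = \frac{1}{2}\cdot 0.5\cdot 0.9\cdot 0.5\cdot 0.1\cdot 0.75\cdot 0.9\cdot 0.75 \approx 0.0057.$$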

We compute the likelihood function symbolically. For this, we load the packages

with(combinat): with(linalg):

and initialize

n := 4: l := 2: lp := 2:     # lp stands for l', since ' is not valid in a name
m := (l * lp)^n:
T := array([ [tFF, 1 - tFF], [1 - tLL, tLL] ]):   # transition matrix theta
E := array([ [tFh, 1 - tFh], [1 - tLt, tLt] ]):   # emission matrix theta'
P := array(1..l^n, 1..lp^n):                      # marginal probabilities

The marginal values are computed by the following code, where the subset R[i] marks the positions of sigma equal to L and the subset S[j] marks the positions of tau equal to t:

R := powerset([1,2,3,4]):
S := powerset([1,2,3,4]):
for i from 1 to nops(R) do
  x := vector([1,1,1,1]):      # state word: 1 = F, 2 = L
  for u from 1 to nops(R[i]) do
    x[R[i,u]] := 2
  od:
  for j from 1 to nops(S) do
    y := vector([1,1,1,1]):    # output word: 1 = h, 2 = t
    for v from 1 to nops(S[j]) do
      y[S[j,v]] := 2
    od:
    P[i,j] := 1/2 * E[x[1],y[1]] * T[x[1],x[2]] * E[x[2],y[2]]
                  * T[x[2],x[3]] * E[x[3],y[3]]
                  * T[x[3],x[4]] * E[x[4],y[4]]
  od:
od:
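As a quick sanity check (our addition), substituting the numerical parameter values should make the 256 marginal probabilities sum to 1:

check := add(add(P[a,b], b=1..lp^n), a=1..l^n):
evalf(subs({tFF=0.9, tLL=0.9, tFh=0.5, tLt=0.25}, check));   # expect 1.0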
