
Markov Chains


We consider discrete-time, discrete-state-space Markov chains. A Markov chain is an infinite sequence of random variables $(X_t)$ indexed by time $t \geq 0$. The set of all possible values of $X_t$ is the state space.

The state space is assumed to be a finite or countable set $S$. The sequence $(X_t)$ is a Markov chain if it satisfies the Markov property,

$$P(X_{t+1} = j \mid X_0 = i_0, \ldots, X_{t-1} = i_{t-1}, X_t = i) = P(X_{t+1} = j \mid X_t = i), \qquad (8.6)$$

for all states $i, j \in S$ and $t \geq 0$. That is, the transition probabilities depend only on the current state, not on the past.

A Markov chain is homogeneous if the probabilities of going from one state to another in one step are independent of the current step; that is,

$$P(X_{t+1} = j \mid X_t = i) = P(X_t = j \mid X_{t-1} = i), \quad i, j \in S, \; t \geq 1. \qquad (8.7)$$

Let $(X_t)$ be a homogeneous Markov chain and let the state space $S$ be finite; we put $S = [n]$. Then the transition probabilities $P(X_{t+1} \mid X_t)$ can be represented by an $n \times n$ transition probability matrix $P = (p_{ij})$, where the entry $p_{ij}$ is the probability that the chain makes a transition from state $i$ to state $j$ in one step. Note that conservation of probability requires the matrix $P$ to be row stochastic, i.e., $\sum_j p_{ij} = 1$.
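For illustration, a homogeneous chain on a finite state space can be simulated directly from its transition matrix. The following R sketch (with a small row-stochastic example matrix of our own choosing, not one from the text) samples each step from the row of the current state, which is exactly the Markov property:

> P <- matrix( c(0.9, 0.1,
+                0.4, 0.6), nrow=2, byrow=TRUE )
# sample a trajectory of 50 steps starting in state 1;
# the next state depends only on the current state
> x <- integer(51); x[1] <- 1
> for (t in 1:50) x[t+1] <- sample( 1:2, size=1, prob=P[x[t],] )
> x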


Let $p_{ij}^{(k)}$ denote the probability that the chain moves from state $i$ to state $j$ in $k \geq 1$ steps. The $k$-step transition probabilities satisfy the Chapman-Kolmogorov equation

$$p_{ij}^{(k)} = \sum_{z \in S} p_{iz}^{(l)} \, p_{zj}^{(k-l)}, \quad i, j \in S, \; 0 < l < k. \qquad (8.8)$$

It follows that the $k$-step transition probabilities are the entries of the matrix $P^k$; i.e., $P^{(k)} = (p_{ij}^{(k)}) = P^k$, where $P^k$ is the $k$-th power of $P$.
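The identity $P^{(k)} = P^k$ and the Chapman-Kolmogorov equation can be checked numerically. A minimal R sketch, again with a small row-stochastic matrix of our own choosing:

> P <- matrix( c(0.9, 0.1,
+                0.4, 0.6), nrow=2, byrow=TRUE )
# k-th matrix power by repeated multiplication
> matpow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify=FALSE))
# Chapman-Kolmogorov: P^4 = P^1 P^3 = P^2 P^2
> all.equal( matpow(P,4), matpow(P,1) %*% matpow(P,3) )
> all.equal( matpow(P,4), matpow(P,2) %*% matpow(P,2) )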

Let $Q = (Q(i))$ be a distribution on the state space. Then the distribution $Q' = (Q'(i))$ on the state space at the next time instant is given by

$$Q'(j) = \sum_{i \in S} p_{ij} \, Q(i), \quad j \in S, \qquad (8.9)$$

or in shorthand notation,

$$Q' = PQ, \qquad (8.10)$$

where $P$ can be viewed as an operator on the space of probability distributions on the state set $S$.

An induction argument shows that the evolution of the Markov chain through $t \geq 1$ time steps is given by

$$P(X_t = j) = \sum_{i \in S} p_{ij}^{(t)} \, P(X_0 = i), \quad j \in S.$$

Starting from the initial distribution $Q_0$, given by a column vector $q_0 = (q_0(1), \ldots, q_0(n))$ with $q_0(i) = P(X_0 = i)$, $i \in S$, the marginal distribution $Q_t$ after $t$ time steps, given by the column vector $q_t = (q_t(1), \ldots, q_t(n))$ with $q_t(i) = P(X_t = i)$, $i \in S$, satisfies the equation

$$q_t = P^t q_0, \quad t \geq 1. \qquad (8.11)$$

Thus the operator defined by $P$ on the space of probability distributions on the state set $S$ is linear.
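Equation (8.11) can be traced in R. Note that for a row-stochastic matrix, applying the operator to a column vector amounts to multiplying by the transpose, since $q_t(j) = \sum_i p_{ij}^{(t)} q_0(i)$. A minimal sketch with hypothetical values of our own:

> P <- matrix( c(0.9, 0.1,
+                0.4, 0.6), nrow=2, byrow=TRUE )
> matpow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify=FALSE))
> q0 <- c(0.5, 0.5)
# marginal distribution after 10 steps: q_t(j) = sum_i p_ij^(t) q_0(i)
> qt <- as.vector( t(matpow(P,10)) %*% q0 )
> qt
> sum(qt)    # remains a probability distribution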

A distribution $q = (q(i))$, written as a column vector over the state set $S$, is stationary if it satisfies the matrix equation

$$q = Pq. \qquad (8.12)$$

Thus a stationary distribution is an eigenvector of the transition matrix with eigenvalue 1. Finding the stationary distributions of a homogeneous Markov chain is, in general, a non-trivial task.
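Numerically, a stationary distribution of a finite chain can be obtained from an eigendecomposition. A minimal R sketch (our own example; for a row-stochastic $P$, a stationary distribution is a left eigenvector, i.e., an eigenvector of the transpose of $P$ with eigenvalue 1):

> P <- matrix( c(0.9, 0.1,
+                0.4, 0.6), nrow=2, byrow=TRUE )
> e <- eigen( t(P) )
# eigenvector for the eigenvalue closest to 1, normalized to sum 1
> v <- Re( e$vectors[ , which.min(abs(e$values - 1)) ] )
> q <- v / sum(v)
> q                         # stationary distribution, here (0.8, 0.2)
> as.vector( t(P) %*% q )   # equals q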

Example 8.9 (Maple). Consider the transition probability matrix for the Jukes-Cantor model

$$P = \begin{pmatrix}
1-3a & a & a & a \\
a & 1-3a & a & a \\
a & a & 1-3a & a \\
a & a & a & 1-3a
\end{pmatrix}.$$

Since the matrix must be row stochastic, we have $0 < a < 1/3$.

Taking $a = 0.1$, the 2-step and 16-step transition matrices are

$$P^2 = \begin{pmatrix}
0.52 & 0.16 & 0.16 & 0.16 \\
0.16 & 0.52 & 0.16 & 0.16 \\
0.16 & 0.16 & 0.52 & 0.16 \\
0.16 & 0.16 & 0.16 & 0.52
\end{pmatrix}, \qquad
P^{16} \approx \begin{pmatrix}
0.2502 & 0.2499 & 0.2499 & 0.2499 \\
0.2499 & 0.2502 & 0.2499 & 0.2499 \\
0.2499 & 0.2499 & 0.2502 & 0.2499 \\
0.2499 & 0.2499 & 0.2499 & 0.2502
\end{pmatrix}.$$

The transition probabilities in each row converge to the same stationary distribution $\pi$ on the four states, given by $\pi(i) = 1/4$ for $1 \leq i \leq 4$ (Sect. 7.6).

In view of the initial distribution $(0.1, 0.2, 0.2, 0.5)$, we obtain after 2 steps and after 16 steps the distributions $(0.1960, 0.2320, 0.2320, 0.3400)$ and $(0.2499, 0.2499, 0.2500, 0.2500)$, respectively.
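The original example performs this computation in Maple; an equivalent minimal sketch in R (our own code, not the book's Maple) is:

# Jukes-Cantor transition matrix with a = 0.1
> a <- 0.1
> P <- matrix( a, nrow=4, ncol=4 ); diag(P) <- 1 - 3*a
> matpow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify=FALSE))
> matpow(P, 2)     # 2-step transition matrix
> matpow(P, 16)    # 16-step transition matrix
# distributions after 2 and 16 steps (P is symmetric here,
# so multiplying the column vector q0 by P^t matches (8.9))
> q0 <- c(0.1, 0.2, 0.2, 0.5)
> as.vector( matpow(P, 2) %*% q0 )
> as.vector( matpow(P, 16) %*% q0 )

♦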

We study the convergence of Markov chains. For this, we consider a homogeneous Markov chain with finite state space $S$ and transition probability matrix $P = (p_{ij})$. To this end, we need a metric on the space of probability distributions on the state space.

Proposition 8.10. A metric on the space of probability distributions on the state space $S$ is given by

$$d(Q, Q') = \sum_{s \in S} |Q(s) - Q'(s)|. \qquad (8.13)$$

We assume that the transition probabilities satisfy the strong ergodic condition; that is, all transition probabilities $p_{ij}$ are positive. First, we provide a fixed-point result that guarantees the existence and uniqueness of fixed points.


Theorem 8.11. If the transition probabilities $P$ satisfy the strong ergodic condition, they define a contraction mapping with respect to the metric; that is, there is a non-negative real number $\alpha < 1$ such that for all probability distributions $Q$ and $Q'$ on the state space $S$,

$$d(PQ, PQ') \leq \alpha \cdot d(Q, Q'). \qquad (8.14)$$

Proof. Let $Q$ and $Q'$ be distinct distributions on $S$. Define

$$\Delta Q(s) = Q(s) - Q'(s), \quad s \in S.$$

Split the sum over $S$ into the two partial sums $\sum_{t^+} + \sum_{t^-}$, where $t^+$ and $t^-$ are the states where $\Delta Q$ is positive or negative, respectively. This gives us

$$d(PQ, PQ') = \sum_{j \in S} \Big| \sum_{i \in S} p_{ij} \, \Delta Q(i) \Big|,$$

and since all $p_{ij}$ are positive, the cancellation between the positive and negative parts of $\Delta Q$ yields a constant $\alpha < 1$ with $d(PQ, PQ') \leq \alpha \cdot d(Q, Q')$. ♦

Theorem 8.12. Let the transition probabilities $P$ satisfy the strong ergodic condition. For each probability distribution $Q$ on the state space, the sequence $(Q, PQ, P^2Q, \ldots)$ has the property that for each number $\epsilon > 0$, there exists a positive integer $N$ such that for all steps $n$ and $m$ larger than $N$,

$$0 \leq d(P^n Q, P^m Q) < \epsilon. \qquad (8.15)$$

The sequence $(Q, PQ, P^2Q, \ldots)$ converges to the probability distribution

$$Q^* = \lim_{n \to \infty} P^n Q \qquad (8.16)$$

such that $Q^*$ is the unique fixed point of $P$.

Proof. First, by repeated application of the triangle inequality and Thm. 8.11, we obtain

$$d(P^n Q, P^m Q) \leq d(P^N Q, P^n Q) + d(P^N Q, P^m Q), \quad \min\{m, n\} \geq N \geq 0.$$

For $n \geq N$ we have $d(P^N Q, P^n Q) = d(P^N Q, P^N(P^{n-N} Q)) \leq \alpha^N \, d(Q, P^{n-N} Q) \leq 2\alpha^N$, and likewise for the second term. Hence, choosing $N$ so large that $4\alpha^N < \epsilon$ gives

$$d(P^n Q, P^m Q) < \epsilon.$$

Thus the sequence $(P^n Q)$ is a Cauchy sequence and hence convergent by completeness of $\mathbb{R}^n$. Second, let $Q^*$ denote the limit point. We have

$$0 \leq d(P^{n+1} Q, P Q^*) \leq \alpha \cdot d(P^n Q, Q^*), \quad n \geq 0.$$

But $\alpha \cdot d(P^n Q, Q^*)$ tends to 0 as $n$ goes to infinity. Thus $P^{n+1} Q$ goes to $P Q^*$ when $n$ tends to infinity. Since limits are unique, it follows that $Q^* = P Q^*$.

Third, assume there is another distribution $\tilde{Q}$ satisfying $P \tilde{Q} = \tilde{Q}$. Then $0 \leq d(Q^*, \tilde{Q}) = d(P Q^*, P \tilde{Q}) \leq \alpha \cdot d(Q^*, \tilde{Q})$. Thus $0 \leq (1 - \alpha) \, d(Q^*, \tilde{Q}) \leq 0$, which shows that $d(Q^*, \tilde{Q}) = 0$ and hence $Q^* = \tilde{Q}$. ♦

A probability distribution $Q$ on the state space $S$ is called a fixed point or a stationary distribution of the operator $P$ provided that $PQ = Q$.
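The geometric convergence guaranteed by Thms. 8.11 and 8.12 can be observed numerically. A small R sketch of our own, using the Jukes-Cantor matrix of Example 8.9 and the metric of Prop. 8.10:

> a <- 0.1
> P <- matrix( a, nrow=4, ncol=4 ); diag(P) <- 1 - 3*a
> d <- function(q1, q2) sum( abs(q1 - q2) )   # metric of Prop. 8.10
> q <- c(0.1, 0.2, 0.2, 0.5)
> pstat <- rep(1/4, 4)                        # fixed point of P
# the distance to the fixed point shrinks by the factor 0.6 each step
> for (n in 1:6) { q <- as.vector(P %*% q); print( d(q, pstat) ) }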

Example 8.13 (R). A one-dimensional random walk is a discrete-time Markov chain whose state space is given by the set of integers $S = \mathbb{Z}$. For some number $p$ with $0 < p < 1$, the transition probabilities (the probability $p_{i,j}$ of moving from state $i$ to state $j$) are given by

$$p_{i,j} = \begin{cases} p & \text{if } j = i + 1, \\ 1 - p & \text{if } j = i - 1, \\ 0 & \text{otherwise}, \end{cases}$$

for all $i, j \in \mathbb{Z}$.

In a random walk, at each transition a step of unit length is made at random, to the right with probability $p$ and to the left with probability $1 - p$. A random walk can be interpreted as the betting of a gambler who bets 1 Euro on a sequence of $p$-Bernoulli trials and wins or loses 1 Euro at each transition; if $X_0 = 0$, the state of the process at time $n$ is her gain or loss after $n$ trials. The probability of starting in state 0 and returning to state 0 after $2n$ steps is

$$p_{00}^{(2n)} = \binom{2n}{n} p^n (1-p)^n.$$

It can be shown that $\sum_{n=1}^{\infty} p_{00}^{(2n)} < \infty$ if and only if $p \neq 1/2$. Thus the expected number of returns to 0 is finite if and only if $p \neq 1/2$.
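This dichotomy can be checked numerically with a short R sketch of our own (not from the text), which computes partial sums of the return probabilities on the log scale to avoid overflow of the binomial coefficients:

# partial sum of p_00^(2n) = choose(2n,n) p^n (1-p)^n for n = 1..N
> ret <- function(p, N) { n <- 1:N
+   sum( exp( lchoose(2*n, n) + n*log(p) + n*log(1-p) ) ) }
> ret(0.5, 10000)   # keeps growing as N increases (divergent series)
> ret(0.7, 10000)   # already stable near its finite limit

A random walk with Bernoulli probability $p = 0.7$ can be generated over a short time span as follows.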

> n <- 400
# n p-Bernoulli trials
> X <- sample( c(-1,1), size=n, replace=TRUE, prob=c(0.3,0.7) )
# trajectory: partial sums starting at 0
> D <- as.integer( c(0, cumsum(X)) )
> plot( 0:n, D, type="l", main="", xlab="i" )

A trajectory of the process starting at state 0 is given in Fig. 8.1.

Fig. 8.1. Partial realization of a random walk with $p = 0.7$.

Another way to define a one-dimensional random walk is to take a sequence $(X_t)_{t \geq 1}$ of independent, identically distributed random variables, where each variable has state space $\{\pm 1\}$. Put $S_0 = 0$ and consider the partial sums $S_n = \sum_{i=1}^{n} X_i$. The sequence $(S_t)_{t \geq 0}$ is a simple random walk on $\mathbb{Z}$. The series given by the sum of sequences of 1's and $-1$'s provides the walking distance if each part of the walk is of unit length. In the case $p = 1/2$ we speak of a symmetric random walk. In a symmetric random walk, all states are recurrent; i.e., the chain returns to each state with probability 1. A symmetric random walk can be generated over a short time span as follows.

> n <- 400
# n Bernoulli trials with p=1/2
> X <- sample( c(-1,1), size=n, replace=TRUE )
# trajectory: partial sums starting at 0
> S <- as.integer( c(0, cumsum(X)) )
> plot( 0:n, S, type="l", main="", xlab="i" )

A trajectory of the process starting at $S_0 = 0$ is given in Fig. 8.2. The process returns to 0 several times within the given time span. This can be seen by invoking the command which(S==0). ♦
