
Markov Chains


We consider discrete-time, discrete-state-space Markov chains. A Markov chain is an infinite sequence of random variables $(X_t)$ indexed by time $t \geq 0$. The set of all possible values of $X_t$ is the state space.

The state space is assumed to be a finite or countable set $S$. The sequence $(X_t)$ is a Markov chain if it satisfies the Markov property,

$$P(X_{t+1} = j \mid X_0 = i_0, \ldots, X_{t-1} = i_{t-1}, X_t = i) = P(X_{t+1} = j \mid X_t = i), \qquad (8.6)$$

for all states $i, j \in S$ and $t \geq 0$. That is, the transition probabilities depend only on the current state, not on the past.

A Markov chain is homogeneous if the probabilities of going from one state to another in one step are independent of the current step; that is,

$$P(X_{t+1} = j \mid X_t = i) = P(X_t = j \mid X_{t-1} = i), \quad i, j \in S, \; t \geq 1. \qquad (8.7)$$

Let $(X_t)$ be a homogeneous Markov chain and let the state space $S$ be finite; we put $S = [n]$. Then the transition probabilities $P(X_{t+1} \mid X_t)$ can be represented by an $n \times n$ transition probability matrix $P = (p_{ij})$, where the entry $p_{ij}$ is the probability that the chain makes a transition from state $i$ to state $j$ in one step. Note that conservation of probability requires the matrix $P$ to be row stochastic, i.e., $\sum_j p_{ij} = 1$.
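For illustration, a homogeneous chain on a finite state space can be simulated directly from its transition matrix. The following R sketch (with a small row-stochastic example matrix of our own choosing, not one from the text) samples each step from the row of the current state, which is exactly the Markov property:

> P <- matrix( c(0.9, 0.1,
+                0.4, 0.6), nrow=2, byrow=TRUE )
# sample a trajectory of 50 steps starting in state 1;
# the next state depends only on the current state
> x <- integer(51); x[1] <- 1
> for (t in 1:50) x[t+1] <- sample( 1:2, size=1, prob=P[x[t],] )
> x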


Let $p_{ij}^{(k)}$ denote the probability that the chain moves from state $i$ to state $j$ in $k \geq 1$ steps. The $k$-step transition probabilities satisfy the Chapman-Kolmogorov equation

$$p_{ij}^{(k)} = \sum_{z \in S} p_{iz}^{(l)} \, p_{zj}^{(k-l)}, \quad i, j \in S, \; 0 < l < k. \qquad (8.8)$$

It follows that the $k$-step transition probabilities are the entries of the matrix $P^k$; i.e., $P^{(k)} = (p_{ij}^{(k)}) = P^k$, where $P^k$ is the $k$-th power of $P$.
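The identity $P^{(k)} = P^k$ and the Chapman-Kolmogorov equation can be checked numerically. A minimal R sketch, again with a small row-stochastic matrix of our own choosing:

> P <- matrix( c(0.9, 0.1,
+                0.4, 0.6), nrow=2, byrow=TRUE )
# k-th matrix power by repeated multiplication
> matpow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify=FALSE))
# Chapman-Kolmogorov: P^4 = P^1 P^3 = P^2 P^2
> all.equal( matpow(P,4), matpow(P,1) %*% matpow(P,3) )
> all.equal( matpow(P,4), matpow(P,2) %*% matpow(P,2) )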

Let $Q = (Q(i))$ be a distribution on the state space. Then the distribution $Q' = (Q'(i))$ on the state space at the next time instant is given by

$$Q'(j) = \sum_{i \in S} p_{ij} \, Q(i), \quad j \in S, \qquad (8.9)$$

or in shorthand notation,

$$Q' = PQ, \qquad (8.10)$$

where $P$ can be viewed as an operator on the space of probability distributions on the state set $S$.

An induction argument shows that the evolution of the Markov chain through $t \geq 1$ time steps is given by

$$P(X_t = j) = \sum_{i \in S} p_{ij}^{(t)} \, P(X_0 = i), \quad j \in S.$$

Starting from the initial distribution $Q_0$, given by a column vector $q_0 = (q_0(1), \ldots, q_0(n))$ with $q_0(i) = P(X_0 = i)$, $i \in S$, the marginal distribution $Q_t$ after $t$ time steps, given by the column vector $q_t = (q_t(1), \ldots, q_t(n))$ with $q_t(i) = P(X_t = i)$, $i \in S$, satisfies the equation

$$q_t = P^t q_0, \quad t \geq 1. \qquad (8.11)$$

Thus the operator defined by $P$ on the space of probability distributions on the state set $S$ is linear.
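Equation (8.11) can be traced in R. Note that for a row-stochastic matrix, applying the operator to a column vector amounts to multiplying by the transpose, since $q_t(j) = \sum_i p_{ij}^{(t)} q_0(i)$. A minimal sketch with hypothetical values of our own:

> P <- matrix( c(0.9, 0.1,
+                0.4, 0.6), nrow=2, byrow=TRUE )
> matpow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify=FALSE))
> q0 <- c(0.5, 0.5)
# marginal distribution after 10 steps: q_t(j) = sum_i p_ij^(t) q_0(i)
> qt <- as.vector( t(matpow(P,10)) %*% q0 )
> qt
> sum(qt)    # remains a probability distribution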

A distribution $q = (q(i))$, written as a column vector over the state set $S$, is stationary if it satisfies the matrix equation

$$q = Pq. \qquad (8.12)$$

Thus a stationary distribution is an eigenvector of the transition matrix with eigenvalue 1. Finding the stationary distributions of a homogeneous Markov chain is, in general, a non-trivial task.
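Numerically, a stationary distribution of a finite chain can be obtained from an eigendecomposition. A minimal R sketch (our own example; for a row-stochastic $P$, a stationary distribution is a left eigenvector, i.e., an eigenvector of the transpose of $P$ with eigenvalue 1):

> P <- matrix( c(0.9, 0.1,
+                0.4, 0.6), nrow=2, byrow=TRUE )
> e <- eigen( t(P) )
# eigenvector for the eigenvalue closest to 1, normalized to sum 1
> v <- Re( e$vectors[ , which.min(abs(e$values - 1)) ] )
> q <- v / sum(v)
> q                         # stationary distribution, here (0.8, 0.2)
> as.vector( t(P) %*% q )   # equals q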

Example 8.9 (Maple). Consider the transition probability matrix for the Jukes-Cantor model

$$P = \begin{pmatrix}
1-3a & a & a & a \\
a & 1-3a & a & a \\
a & a & 1-3a & a \\
a & a & a & 1-3a
\end{pmatrix}.$$

Since the matrix must be row stochastic, we have $0 < a < 1/3$.

Taking $a = 0.1$, the 2-step and 16-step transition matrices are

$$P^2 = \begin{pmatrix}
0.52 & 0.16 & 0.16 & 0.16 \\
0.16 & 0.52 & 0.16 & 0.16 \\
0.16 & 0.16 & 0.52 & 0.16 \\
0.16 & 0.16 & 0.16 & 0.52
\end{pmatrix}, \qquad
P^{16} \approx \begin{pmatrix}
0.2502 & 0.2499 & 0.2499 & 0.2499 \\
0.2499 & 0.2502 & 0.2499 & 0.2499 \\
0.2499 & 0.2499 & 0.2502 & 0.2499 \\
0.2499 & 0.2499 & 0.2499 & 0.2502
\end{pmatrix}.$$

The transition probabilities in each row converge to the same stationary distribution $\pi$ on the four states, given by $\pi(i) = 1/4$ for $1 \leq i \leq 4$ (Sect. 7.6).

In view of the initial distribution $(0.1, 0.2, 0.2, 0.5)$, we obtain after 2 steps and after 16 steps the distributions $(0.1960, 0.2320, 0.2320, 0.3400)$ and $(0.2499, 0.2499, 0.2500, 0.2500)$, respectively.
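The original example performs this computation in Maple; an equivalent minimal sketch in R (our own code, not the book's Maple) is:

# Jukes-Cantor transition matrix with a = 0.1
> a <- 0.1
> P <- matrix( a, nrow=4, ncol=4 ); diag(P) <- 1 - 3*a
> matpow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify=FALSE))
> matpow(P, 2)     # 2-step transition matrix
> matpow(P, 16)    # 16-step transition matrix
# distributions after 2 and 16 steps (P is symmetric here,
# so multiplying the column vector q0 by P^t matches (8.9))
> q0 <- c(0.1, 0.2, 0.2, 0.5)
> as.vector( matpow(P, 2) %*% q0 )
> as.vector( matpow(P, 16) %*% q0 )

♦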

We study the convergence of Markov chains. For this, we consider a homogeneous Markov chain with finite state space $S$ and transition probability matrix $P = (p_{ij})$. To this end, we need a metric on the space of probability distributions on the state space.

Proposition 8.10. A metric on the space of probability distributions on the state space $S$ is given by

$$d(Q, Q') = \sum_{s \in S} |Q(s) - Q'(s)|. \qquad (8.13)$$

We assume that the transition probabilities satisfy the strong ergodic condition; that is, all transition probabilities $p_{ij}$ are positive. First, we provide a fixed-point result that guarantees the existence and uniqueness of fixed points.


Theorem 8.11. If the transition probabilities $P$ satisfy the strong ergodic condition, they define a contraction mapping with respect to the metric; that is, there is a non-negative real number $\alpha < 1$ such that for all probability distributions $Q$ and $Q'$ on the state space $S$,

$$d(PQ, PQ') \leq \alpha \cdot d(Q, Q'). \qquad (8.14)$$

Proof. Let $Q$ and $Q'$ be distinct distributions on $S$. Define

$$\Delta Q(s) = Q(s) - Q'(s), \quad s \in S.$$

Split the sum over $S$ into the two partial sums $\sum_{t^+} + \sum_{t^-}$, where $t^+$ and $t^-$ are the states where $\Delta Q$ is positive or negative, respectively. This gives us

$$d(PQ, PQ') = \sum_{j \in S} \Big| \sum_{i \in S} p_{ij} \, \Delta Q(i) \Big|,$$

and since all $p_{ij}$ are positive, the cancellation between the positive and negative parts of $\Delta Q$ yields a constant $\alpha < 1$ with $d(PQ, PQ') \leq \alpha \cdot d(Q, Q')$. ♦

Theorem 8.12. Let the transition probabilities $P$ satisfy the strong ergodic condition. For each probability distribution $Q$ on the state space, the sequence $(Q, PQ, P^2Q, \ldots)$ has the property that for each number $\epsilon > 0$, there exists a positive integer $N$ such that for all steps $n$ and $m$ larger than $N$,

$$0 \leq d(P^n Q, P^m Q) < \epsilon. \qquad (8.15)$$

The sequence $(Q, PQ, P^2Q, \ldots)$ converges to the probability distribution

$$Q^* = \lim_{n \to \infty} P^n Q \qquad (8.16)$$

such that $Q^*$ is the unique fixed point of $P$.

Proof. First, by repeated application of the triangle inequality and Thm. 8.11, we obtain

$$d(P^n Q, P^m Q) \leq d(P^N Q, P^n Q) + d(P^N Q, P^m Q), \quad \min\{m, n\} \geq N \geq 0.$$

For $n \geq N$ we have $d(P^N Q, P^n Q) = d(P^N Q, P^N(P^{n-N} Q)) \leq \alpha^N \, d(Q, P^{n-N} Q) \leq 2\alpha^N$, and likewise for the second term. Hence, choosing $N$ so large that $4\alpha^N < \epsilon$ gives

$$d(P^n Q, P^m Q) < \epsilon.$$

Thus the sequence $(P^n Q)$ is a Cauchy sequence and hence convergent by completeness of $\mathbb{R}^n$. Second, let $Q^*$ denote the limit point. We have

$$0 \leq d(P^{n+1} Q, P Q^*) \leq \alpha \cdot d(P^n Q, Q^*), \quad n \geq 0.$$

But $\alpha \cdot d(P^n Q, Q^*)$ tends to 0 as $n$ goes to infinity. Thus $P^{n+1} Q$ goes to $P Q^*$ when $n$ tends to infinity. Since limits are unique, it follows that $Q^* = P Q^*$.

Third, assume there is another distribution $\tilde{Q}$ satisfying $P \tilde{Q} = \tilde{Q}$. Then $0 \leq d(Q^*, \tilde{Q}) = d(P Q^*, P \tilde{Q}) \leq \alpha \cdot d(Q^*, \tilde{Q})$. Thus $0 \leq (1 - \alpha) \, d(Q^*, \tilde{Q}) \leq 0$, which shows that $d(Q^*, \tilde{Q}) = 0$ and hence $Q^* = \tilde{Q}$. ♦

A probability distribution $Q$ on the state space $S$ is called a fixed point or a stationary distribution of the operator $P$ provided that $PQ = Q$.
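The geometric convergence guaranteed by Thms. 8.11 and 8.12 can be observed numerically. A small R sketch of our own, using the Jukes-Cantor matrix of Example 8.9 and the metric of Prop. 8.10:

> a <- 0.1
> P <- matrix( a, nrow=4, ncol=4 ); diag(P) <- 1 - 3*a
> d <- function(q1, q2) sum( abs(q1 - q2) )   # metric of Prop. 8.10
> q <- c(0.1, 0.2, 0.2, 0.5)
> pstat <- rep(1/4, 4)                        # fixed point of P
# the distance to the fixed point shrinks by the factor 0.6 each step
> for (n in 1:6) { q <- as.vector(P %*% q); print( d(q, pstat) ) }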

Example 8.13 (R). A one-dimensional random walk is a discrete-time Markov chain whose state space is given by the set of integers $S = \mathbb{Z}$. For some number $p$ with $0 < p < 1$, the transition probabilities (the probability $p_{i,j}$ of moving from state $i$ to state $j$) are given by

$$p_{i,j} = \begin{cases} p & \text{if } j = i + 1, \\ 1 - p & \text{if } j = i - 1, \\ 0 & \text{otherwise}, \end{cases}$$

for all $i, j \in \mathbb{Z}$.

In a random walk, at each transition a step of unit length is made at random, to the right with probability $p$ and to the left with probability $1 - p$. A random walk can be interpreted as the betting of a gambler who bets 1 Euro on a sequence of $p$-Bernoulli trials and wins or loses 1 Euro at each transition; if $X_0 = 0$, the state of the process at time $n$ is her gain or loss after $n$ trials. The probability of starting in state 0 and returning to state 0 after $2n$ steps is

$$p_{00}^{(2n)} = \binom{2n}{n} p^n (1-p)^n.$$

It can be shown that $\sum_{n=1}^{\infty} p_{00}^{(2n)} < \infty$ if and only if $p \neq 1/2$. Thus the expected number of returns to 0 is finite if and only if $p \neq 1/2$.
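This dichotomy can be checked numerically with a short R sketch of our own (not from the text), which computes partial sums of the return probabilities on the log scale to avoid overflow of the binomial coefficients:

# partial sum of p_00^(2n) = choose(2n,n) p^n (1-p)^n for n = 1..N
> ret <- function(p, N) { n <- 1:N
+   sum( exp( lchoose(2*n, n) + n*log(p) + n*log(1-p) ) ) }
> ret(0.5, 10000)   # keeps growing as N increases (divergent series)
> ret(0.7, 10000)   # already stable near its finite limit

A random walk with Bernoulli probability $p = 0.7$ can be generated over a short time span as follows.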

> n <- 400
# n p-Bernoulli trials
> X <- sample( c(-1,1), size=n, replace=TRUE, prob=c(0.3,0.7) )
# trajectory: partial sums starting at 0
> D <- as.integer( c(0, cumsum(X)) )
> plot( 0:n, D, type="l", main="", xlab="i" )

A trajectory of the process starting at state 0 is given in Fig. 8.1.

Fig. 8.1. Partial realization of a random walk with $p = 0.7$.

Another way to define a one-dimensional random walk is to take a sequence $(X_t)_{t \geq 1}$ of independent, identically distributed random variables, where each variable has state space $\{\pm 1\}$. Put $S_0 = 0$ and consider the partial sums $S_n = \sum_{i=1}^{n} X_i$. The sequence $(S_t)_{t \geq 0}$ is a simple random walk on $\mathbb{Z}$. The series given by the sum of sequences of 1's and $-1$'s provides the walking distance if each part of the walk is of unit length. In the case $p = 1/2$ we speak of a symmetric random walk. In a symmetric random walk, all states are recurrent; i.e., the chain returns to each state with probability 1. A symmetric random walk can be generated over a short time span as follows.

> n <- 400
# n Bernoulli trials with p=1/2
> X <- sample( c(-1,1), size=n, replace=TRUE )
# trajectory: partial sums starting at 0
> S <- as.integer( c(0, cumsum(X)) )
> plot( 0:n, S, type="l", main="", xlab="i" )

A trajectory of the process starting at $S_0 = 0$ is given in Fig. 8.2. The process returns to 0 several times within the given time span. This can be seen by invoking the command which(S==0). ♦
