
3.3.2. Recursion for Sampling Probabilities

Together with the multiplicity vector n = (n_1, \dots, n_d), the matrix S can be used to describe a numbered or unnumbered sample configuration.

Note that the genetree of every sample obtained from the stationary distribution of the Λ-Fleming-Viot process or the seven-step procedure can be transformed into a matrix S, whereas the converse is not true. Denote by O_j the set of types carrying mutation j. A matrix S can be transformed into a genetree if for every pair j, j′ the relation O_j ∩ O_{j′} ≠ ∅ implies that either O_j ⊆ O_{j′} or O_j ⊇ O_{j′} holds. If this condition is satisfied, Gusfield's algorithm can be applied to obtain the associated genetree. We refer to [BB08, Section 2.1] or [G91] for more details.
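The nesting condition can be tested directly on the columns of S. The following Python sketch is our own illustration (names hypothetical, not taken from [BB08] or [G91]); it checks every pair of mutations for the containment property and thus decides whether Gusfield's algorithm is applicable.

\begin{verbatim}
def compatible_with_genetree(S):
    # S[i][j] == 1 means that type i carries mutation j.  The matrix can
    # be transformed into a genetree iff for every pair of mutations the
    # carrier sets O_j are nested or disjoint.
    num_mut = len(S[0]) if S else 0
    O = [{i for i, row in enumerate(S) if row[j] == 1}
         for j in range(num_mut)]
    for j in range(num_mut):
        for jp in range(j + 1, num_mut):
            if O[j] & O[jp] and not (O[j] <= O[jp] or O[jp] <= O[j]):
                return False  # overlapping but not nested: no genetree
    return True

# Mutations 0 and 1 are nested, mutation 2 is disjoint: compatible.
S_ok  = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
# Mutations 0 and 1 overlap in type 0 only, without nesting: incompatible.
S_bad = [[1, 1, 0], [1, 0, 0], [0, 1, 0]]
assert compatible_with_genetree(S_ok) and not compatible_with_genetree(S_bad)
\end{verbatim}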

Sequence data obtained from biological populations naturally leads to the notion of this matrix S. However, if the conditions for the infinitely-many-sites model are violated, then it is not possible to obtain a sample configuration that can be treated by the recursions we derive in the next section. We give the details on how to cope with violations of the infinitely-many-sites model to obtain feasible datasets in Section 5.6.

advantageous since these factors are rather involved and it is cumbersome to calculate them for a given sample. The maximum likelihood estimator is unchanged if the likelihood is multiplied by a constant that depends only on the data. Thus we will apply these modified recursions in Chapter 5.

(iii) Furthermore, recursion (3.10) can be employed to define a recursion for the sampling probabilities of sample configurations with ordered types via p(t,n) := \frac{c(t,n)}{d!}\, p[t,n].

Note that this recursion also does not include the factors c(·,·).

In the following we give three proofs of Theorem 3.23. The first proof was given by Birkner and Blath in [BB08] and is based on the duality of Z^Λ to the Λ-coalescent established by the lookdown construction of the process X_n. The second proof is an extension of Theorem 4.1 from [EG87], proving the recursion by considering the stationarity condition for the generator of the Λ-Fleming-Viot process. The last proof is based on an urn-like Markov chain U^{(n)} that can be used to simulate samples and was also introduced in [BB08, Section 7.2]. This Markov urn closely resembles the Hoppe urn and will play an important role in Chapter 4. Due to the connections between the different models,

\[
p[t,n] = P_\theta\big\{X_n(t) \in [t,n]\big\} = E_{\tilde\mu}\big\langle \mathbf 1_{[t,n]},\, \mu^{\otimes n}\big\rangle = P_\theta\big\{U^{(n)}(\tau(\partial)-1) \in [t,n]\big\}
\]

holds for the sample configuration [t,n], where we assume the lookdown process to be in equilibrium at time t and \tilde\mu is the stationary distribution of Z^Λ. The random exit time of the Markov urn is denoted by τ(∂). The parameters of the underlying evolutionary model are denoted by θ = (r, Λ).

Proof Based on the Lookdown Construction

In Section 3.3.1 we introduced the seven-step procedure from [BB08, Section 3] that can be used to obtain a labelled numbered sample with ordered types. By the duality between the Λ-coalescent and the Λ-Fleming-Viot process, the distribution of samples obtained by this procedure is the same as the distribution of samples in the Λ-Fleming-Viot process at stationarity. The following proof is based on this observation.

Proof of Theorem 3.23. To obtain the sampling probability of the configuration [t,n], consider the first n levels of the stationary lookdown process X_n(t) at time 0 and denote by −τ_{recent} the most recent instant before time 0 at which at least one of the types residing at levels 1, \dots, n changed. Mutation occurs on every line at rate r, and mergers of any number of lines occur at total rate λ_n. Hence τ_{recent} is exponentially distributed with parameter r_n = nr + λ_n, and X_n(t) is already stationary at (−τ_{recent})−. Conditioning on the event occurring at time −τ_{recent} yields

\[
\begin{aligned}
p(z,a) = {} & \frac{1}{r_n} \sum_{i:\,|A_i|\ge 2}\; \sum_{k=2}^{|A_i|} \binom{|A_i|}{k} \lambda_{n,k}\; p\big(z,\, a-(k-1)e_i\big) \\
& + \frac{r}{r_n}\,\frac{1}{s} \sum_{\substack{i:\,|A_i|=1,\ x_{i0}\text{ unique},\\ s(x_i)\ne x_j\,\forall j}} p\big(s_i(z),\, a\big) \\
& + \frac{r}{r_n}\,\frac{1}{s}\,\frac{1}{d} \sum_{\substack{i:\,|A_i|=1,\\ x_{i0}\text{ unique}}}\; \sum_{j:\,s(x_i)=x_j} p\big(r_i(z),\, r_i(a+e_j)\big),
\end{aligned}
\tag{3.11}
\]

for the probability of observing an ordered labelled sample with numbered types (z,a) ∈ T_{s,d} × A_{n,d} that belongs to the equivalence class [t,n]. By a slight abuse of notation, a − (k−1)e_i denotes the ordered partition where |A_j| is unchanged for all j ≠ i and |A_i| is decreased by k − 1. Since this does not in general result in an ordered partition of {1, \dots, n−k+1}, we renumber the elements keeping their ranking; by exchangeability this does not change the corresponding probability. The partition a + e_j denotes the straightforward extension of a to an ordered partition of {1, \dots, n+1} where |A_j| is increased by 1.

The terms in the first sum on the right hand side of equation (3.11) subsume the possible transitions if the most recent event was a merging event. The merger could have occurred in a group of lines having the same type. For a given type i there are |A_i| lines that could possibly merge, and we must choose k of them, resulting in the binomial coefficient in the sum. The probability that a k-merger occurs equals the rate λ_{n,k} divided by the total rate r_n. The last merger could also have been larger than any |A_j| or involved ancestral lines bearing different types, but since the probability of observing the sample given those events is zero, the corresponding terms do not appear in the sum.

The most recent event could also have been a mutation, but then the multiplicity of the respective type has to be one and the type has to be unique since, as before, the terms corresponding to events not consistent with the sample configuration vanish. There are s different mutation labels. Therefore the probability that the most recent event produced the particular mutation at the tip is \frac{r}{r_n}\frac{1}{s}. The second sum in equation (3.11) captures those events where type i is still unique in the sample after removing the most recent mutation. In the case that removing the outermost mutation yields a type j that is already present in the sample, type i and its multiplicity are erased and the multiplicity of type j is increased by 1. Furthermore, we get an additional factor of \frac{1}{d}. This stems from the fact that the types are ordered, and thus, looking forward in time, there are d possibilities to insert the new type into the type vector of length d − 1. Since the sample configuration specifies the position, the probability has to be divided by d.

By exchangeability all (z,a) that belong to the equivalence class [t,n] have the same probability. Thus the probability that the observed sample configuration belongs to the equivalence class [t,n] is given by

\[
p[t,n] = s!\; \frac{n!}{n_1! \cdots n_d!}\; \frac{d!}{c(t,n)}\; p(z,a),
\]

where equation (3.11) is multiplied by the number of elements in the equivalence class [t,n] stated in Lemma 3.20. Upon multiplying, observe that the factor \frac{1}{d} in the last sum cancels, since all resulting configurations have one type less. Furthermore, the factor \frac{1}{s} in the last two sums cancels, since all resulting samples have one segregating site less. The multinomial factor can be treated as in [BB08, Section 4.2], and thus recursion (3.10) is proved.

Remark 3.25. In [EG87, Theorem 5.6] Ethier and Griffiths derived the recursion under Kingman's coalescent and the infinitely-many-sites model by conditioning on the number of mutations in the coalescent tree between time 0 and the most recent coalescence event. Their proof can be extended to the Λ-coalescent case. The cases where the number of mutations is greater than or equal to 1 can be treated as in the proof of [EG87, Theorem 5.6]. In the case that no mutations occurred before the most recent merging event, we have to add the probabilities for all possible k-mergers, and thus recursion (3.10) is obtained.

The next proof relies on the generator (3.7) of the Λ-Fleming-Viot process Z^Λ.

Generator Approach

In the following we will mimic Theorem 4.1, Corollary 4.2 and Remark 4.4 from [EG87] to derive recursion (3.10). This proof will not make use of the duality relation; rather, it involves the generator of the Λ-Fleming-Viot process with mutation.

Proof of Theorem 3.23. The right hand side of equation (3.7) can be rewritten as

\[
\sum_{\substack{J \subset \{1,\dots,n\},\\ |J| \ge 2}} \lambda_{n,|J|}\, \big\langle f(x_J),\, \mu^{\otimes n}\big\rangle \;+\; r \sum_{i=1}^{n} \big\langle I_i f,\, \mu^{\otimes n}\big\rangle \;-\; r_n\, \big\langle f,\, \mu^{\otimes n}\big\rangle, \tag{3.12}
\]

where I_i is as defined in equation (3.8).

Now choose f := \mathbf 1_{[t,a]} ∈ B(E^n) for some [t,a] ∈ [t,n]. This choice of f yields

\[
Gf(\mu) = \big\langle \mathbf 1_{[t,a]},\, \mu^{\otimes n}\big\rangle = \mu^{\otimes n}\big[t,a\big], \tag{3.13}
\]

which is the probability of observing the sample [t,a] if the Λ-Fleming-Viot process is in state μ. Applying the generator (3.12) to (3.13) gives

\[
\begin{aligned}
LGf(\mu) = {} & \sum_{i:\,n_i \ge 2}\; \sum_{k=2}^{n_i} \binom{n_i}{k} \lambda_{n,k}\; \mu^{\otimes(n-k+1)}\big[t,\, a-(k-1)e_i\big] \\
& + r \sum_{\substack{i:\,n_i=1,\ x_{i0}\text{ unique},\\ s(x_i)\ne x_j\,\forall j}} \mu^{\otimes n}\big[s_i(t),\, a\big]
  + r \sum_{\substack{i:\,n_i=1,\\ x_{i0}\text{ unique}}}\; \sum_{j:\,s(x_i)=x_j} \mu^{\otimes n}\big[r_i(t),\, r_i(a+e_j)\big] \\
& - (\lambda_n + nr)\, \mu^{\otimes n}\big[t,a\big].
\end{aligned}
\tag{3.14}
\]

Now E_{\tilde\mu} LGf(\mu) = 0 holds for the stationary distribution \tilde\mu of Z^Λ, see for example [EK86, Proposition 4.9.2]. Applying this to (3.14), denoting \int \mu^{\otimes n}[t,a]\, d\tilde\mu by p[t,a] and adding (\lambda_n + nr)\, p[t,a] to both sides results in

\[
\begin{aligned}
(\lambda_n + nr)\, p[t,a] = {} & \sum_{i:\,n_i \ge 2}\; \sum_{k=2}^{n_i} \binom{n_i}{k} \lambda_{n,k}\; p\big[t,\, a-(k-1)e_i\big] \\
& + r \sum_{\substack{i:\,n_i=1,\ x_{i0}\text{ unique},\\ s(x_i)\ne x_j\,\forall j}} p\big[s_i(t),\, a\big]
  + r \sum_{\substack{i:\,n_i=1,\\ x_{i0}\text{ unique}}}\; \sum_{j:\,s(x_i)=x_j} p\big[r_i(t),\, r_i(a+e_j)\big].
\end{aligned}
\tag{3.15}
\]

Due to Lemma 3.21, equation (3.15) has to be multiplied by \frac{n!}{n_1! \cdots n_d!}\, \frac{d!}{c(t,n)} to obtain the probability p[t,n]. This proves the recursive formula (3.10).

We now turn to the last proof of recursion (3.10), which is based on a certain Markov urn model.

Markov Urn

In this section we introduce an urn-like discrete-time Markov chain U^{(n)} with killing that starts in the configuration [(0), (1)] =: (root) and can be used to simulate a sample of size n. This Markov chain was introduced in [BB08, Section 7.2]. A similar chain for the Kingman case was already introduced in [EG87]. A Markov chain with killing in discrete or continuous time is a regular Markov chain with an extended state space S ∪ {∂} where, in addition to the transitions with probabilities (p_{x,y})_{x,y∈S}, the Markov chain progresses to the cemetery state ∂ from a state x with probability p_{x,∂}. This cemetery state cannot be left, thus p_{∂,x} = 0 holds for all x ∈ S. Birkner and Blath showed that the pre-exit distribution of the Markov urn, U^{(n)}(τ(∂)−1), is equal to the sampling distribution of the stationary Λ-Fleming-Viot process Z^Λ.
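A minimal sketch of this mechanism, with illustrative names and a toy two-state chain (not the Markov urn itself): the chain is simulated until it steps into the cemetery state, and the pre-exit state is returned.

\begin{verbatim}
import random

CEMETERY = "∂"

def run_until_killed(p, start, rng):
    # p maps each state x to a list of (y, p_{x,y}) pairs; the cemetery
    # state has no outgoing transitions and therefore no entry in p.
    state = start
    while True:
        targets, probs = zip(*p[state])
        nxt = rng.choices(targets, weights=probs)[0]
        if nxt == CEMETERY:
            return state  # the pre-exit state, visited at time tau(del)-1
        state = nxt

# Toy chain: from "a" move to "b" or be killed; from "b" return to "a".
toy = {"a": [("b", 0.5), (CEMETERY, 0.5)], "b": [("a", 1.0)]}
print(run_until_killed(toy, "a", random.Random(0)))  # always exits from "a"
\end{verbatim}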

To derive the transition probabilities of the Markov urn U^{(n)}, Birkner and Blath consider the block counting process {D(t)}_{t≥0} = {|Π^Λ(t)|}_{t≥0} of the Λ-coalescent, which is a continuous-time Markov chain on \mathbb N with transition rates

\[
q_{i,j} := \binom{i}{i-j+1}\, \lambda_{i,\,i-j+1}
\]

for i > j. Note that 1 is an absorbing state that can be identified with the cemetery state. The jump probabilities for the skeleton chain are given by

\[
p_{i,j} := \frac{q_{i,j}}{\lambda_i},
\]

where λ_i is the total jump rate as defined in Section 1.3.2. Let D^{(n)} be the instance of the block counting process started in n. Denote by

\[
g(n, m) := E \int_0^{\infty} \mathbf 1\big\{D^{(n)}_s = m\big\}\, ds \tag{3.16}
\]

the time-continuous Green function, that is, the expected time the process D^{(n)} spends in m. Birkner and Blath use Nagasawa's formula to show that for 0 ≤ t ≤ τ the reversed process \tilde D^{(n)}(t) := D^{(n)}\big((\tau - t)-\big) has transition rates

\[
\tilde q^{(n)}_{j,i} = \frac{g(n,i)}{g(n,j)}\, q_{i,j}, \tag{3.17}
\]

where τ := \inf\{t : D^{(n)}(t) = 1\} is the random time when the process D^{(n)} hits one. Nagasawa's formula for continuous-time Markov chains is given in [RW94, Chapter III.46]. In Lemma 4.5 we will show the formula for the time-discrete case in more detail. The entrance law is given by P\{\tilde D^{(n)}(0) = k\} = g(n,k)\, q_{k,1}. The cemetery state ∂ can only be reached from the state n, and the transition rate is given by \tilde q^{(n)}_{n,\partial} = \lambda_n.
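These quantities can be computed numerically. The following sketch is our own illustration and assumes the Kingman case Λ = δ_0 for concreteness, so that λ_{b,2} = 1 and all other merger rates vanish; it builds the rates q_{i,j}, computes the Green function g(n, m) from the hitting probabilities of the skeleton chain, and reverses the rates according to (3.17).

\begin{verbatim}
from math import comb

def kingman_lambda(b, k):
    return 1.0 if k == 2 else 0.0   # only pairwise mergers, each at rate 1

def block_counting_rates(n, lam=kingman_lambda):
    # q[i][j] = C(i, i-j+1) * lambda_{i, i-j+1} for n >= i > j >= 1
    return {i: {j: comb(i, i - j + 1) * lam(i, i - j + 1)
                for j in range(1, i)} for i in range(2, n + 1)}

def green_function(n, q):
    # g(n, m) = P{D^(n) ever visits m} / lambda_m, computed top-down
    hit = {n: 1.0}
    for i in range(n - 1, 1, -1):
        hit[i] = sum(hit[k] * q[k][i] / sum(q[k].values())
                     for k in range(i + 1, n + 1))
    return {m: hit[m] / sum(q[m].values()) for m in range(2, n + 1)}

def reversed_rates(n, q):
    # Nagasawa: q~^(n)_{j,i} = g(n, i) / g(n, j) * q_{i,j} for j < i
    g = green_function(n, q)
    return {(j, i): g[i] / g[j] * q[i][j] for i in g for j in g if j < i}

q = block_counting_rates(5)
print(green_function(5, q))          # Kingman: g(5, m) = 1 / C(m, 2)
print(reversed_rates(5, q)[(2, 3)])  # q~^(5)_{2,3} = C(2, 2) = 1
\end{verbatim}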

The following lemma shows that the rate of leaving a given state is the same for the block counting process D^{(n)} and for the reversed process \tilde D^{(n)}.

Lemma 3.26. The rates of the block counting process and the rates of its inverse satisfy

\[
\lambda_i = \sum_{j=1}^{i-1} q_{i,j} = \sum_{j=i+1}^{n} \tilde q^{(n)}_{i,j} = \tilde q^{(n)}_i.
\]

Proof. Since the inverse \tilde D^{(n)}(t) of the block counting process D^{(n)}(t) is obtained by reversing the latter pathwise, the equality

\[
E \int \mathbf 1\big\{D^{(n)}(s) = i\big\}\, ds = E \int \mathbf 1\big\{\tilde D^{(n)}(s) = i\big\}\, ds
\]

holds. The process D^{(n)}(t) is monotone decreasing, thus \tilde D^{(n)}(t) is monotone increasing. Since

\[
E \int \mathbf 1\big\{D^{(n)}(s) = i\big\}\, ds = P\big\{\exists s : D^{(n)}(s) = i\big\}\, \frac{1}{\lambda_i}, \qquad
E \int \mathbf 1\big\{\tilde D^{(n)}(s) = i\big\}\, ds = P\big\{\exists s : \tilde D^{(n)}(s) = i\big\}\, \frac{1}{\tilde q^{(n)}_i}
\]

and P\big\{\exists s : D^{(n)}(s) = i\big\} = P\big\{\exists s : \tilde D^{(n)}(s) = i\big\} hold, the statement of the lemma follows.

Lemma 3.26 yields the following corollary.

Corollary 3.27. The total rate of any event happening in the lookdown process satisfies

\[
r_n = nr + \lambda_n = nr + \tilde q^{(j)}_n
\]

for j > n.

Birkner and Blath used the transition rates of the inverse block counting process to derive the transition probabilities of the Markov urn (U^{(n)}(t))_{t∈\mathbb N_0}. Note that if the inverse of the block counting process \tilde D^{(n)} is in state k, then the number of blocks or ancestral lines increases by l with probability \tilde q^{(n)}_{k,k+l} / \tilde q^{(n)}_k, and the number of blocks cannot increase to more than n. Thus in general the final size of the sample has to be specified in advance.

For the following definition denote by n the final sample size and let k := \sum_{i=1}^{d} n_i.

Definition 3.28 (Markov urn). Set U^{(n)}(0) = (root). The dynamics of the Markov chain starts at t = 1. The initial distribution is given by

\[
P\big\{U^{(n)}(1) = \big[(0), (k)\big]\big\} = P\big\{\tilde D^{(n)}(0) = k\big\}.
\]

The transitions of U^{(n)} are given by

\[
(t,n) \to
\begin{cases}
\big(t,\, n + l e_i\big) & \text{w.p. } \dfrac{1}{r_k}\, \dfrac{n_i}{k}\, \tilde q^{(n)}_{k,k+l}, \\[2ex]
\big(a_i(t),\, n\big) & \text{w.p. } \dfrac{r}{r_k} \quad \text{if } n_i = 1, \\[2ex]
\big(e_i(t),\, e(n - e_i)\big) & \text{w.p. } \dfrac{r}{r_k}\, \dfrac{1}{d+1}\, n_i \quad \text{if } n_i > 1,
\end{cases}
\tag{3.20}
\]

where (t,n) denotes the current state, i denotes the type that is involved in the transition event with 1 ≤ i ≤ d, and r_k := kr + \tilde q^{(n)}_k. The function a_i(t) attaches a mutation to the type i. The operator e_i(t) copies type i, attaches a mutation and appends the resulting type to the vector t. The expression e(n_1, \dots, n_d) denotes the vector (n_1, \dots, n_d, 1). The exit probability is given by

\[
P\big\{(t,n) \to \partial\big\} = \frac{\tilde q^{(n)}_n}{nr + \tilde q^{(n)}_n}\; \mathbf 1\Big\{n = \sum\nolimits_i n_i\Big\}.
\]
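To make the definition concrete, the following sketch simulates one draw from the urn, specialised to the Kingman case so that the entrance law puts all mass on k = 2 and every up-jump has l = 1 with \tilde q^{(n)}_{k,k+1} = \binom{k}{2}. The representation of a type as the tuple of its mutation labels (outermost first) and all names are our own; the factor \frac{1}{d+1} of (3.20) plays no role here because the output is read as an unordered configuration (cf. Remark 3.29).

\begin{verbatim}
import random
from math import comb
from itertools import count

def simulate_kingman_urn(n, r, rng):
    # One draw of U^(n)(tau(del) - 1) for Lambda = delta_0 (Kingman).
    labels = count()                 # fresh mutation labels
    types, mult = [()], [2]          # entrance law: start in [(0), (2)]
    while True:
        k = sum(mult)
        q_up = comb(k, 2)            # q~^(n)_k = lambda_k = C(k, 2)
        r_k = k * r + q_up
        if k == n and rng.random() < q_up / r_k:
            return types, mult       # pre-exit state: killed from size n
        i = rng.choices(range(len(types)), weights=mult)[0]  # n_i / k
        if k < n and rng.random() < q_up / r_k:
            mult[i] += 1             # up-jump k -> k+1: a line of type i
        elif mult[i] == 1:
            types[i] = (next(labels),) + types[i]            # a_i(t)
        else:                        # e_i(t): a copy of type i branches off
            mult[i] -= 1
            types.append((next(labels),) + types[i])
            mult.append(1)

rng = random.Random(7)
draws = [simulate_kingman_urn(2, 1.0, rng) for _ in range(20000)]
print(sum(m == [2] for _, m in draws) / len(draws))
# ~ 1/(1+2r) = 1/3, the probability p[0,0] of Example 3.32 below
\end{verbatim}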

Remark 3.29. (i) The factor \frac{1}{d+1} appears in the third case of equation (3.20) since in this case a new type is added to the sample configuration. This type can be inserted before the first type, or after any other type. Thus there are d + 1 possibilities. All possibilities have the same probability, so for convenience we add the type at the end of the type vector.

(ii) It is possible to introduce a Markov chain on the space of sample configurations with unordered types, but this involves combinatorial factors derived from c(t,n). For convenience we use U in the sequel.

It can be shown that the pre-exit distribution of this Markov chain pushed forward to samples with unordered types, P\{U^{(n)}(τ(∂)−1) ∈ [t,n]\}, also satisfies recursion (3.10).

To show this we first investigate the relationship between certain Green functions of different Markov chains more closely. The definition of the Green function in discrete time closely resembles the time-continuous definition (3.16).

Definition 3.30. For every i ∈ \mathbb N the Green function

\[
G^{(i)}(t,n) := E \sum_{k=0}^{\infty} \mathbf 1\big\{U^{(i)}(k) = (t,n)\big\}
\]

yields the expected time the process U^{(i)} spends in the state (t,n).

Due to the form of the transition probabilities of the Markov chain U^{(i)} (every transition strictly increases the number of lines or the number of mutations), every state can be visited at most once and is immediately left afterwards. Thus the Green function is given by

\[
G^{(i)}(t,n) = P\big\{\exists k : U^{(i)}(k) = (t,n)\big\}.
\]

As the Markov urn U^{(i)} can be pushed forward, we can also define Green functions for samples with unordered types. Due to Observation 3.19 the equality

\[
G^{(i)}[t,n] = \frac{d!}{c(t,n)}\, G^{(i)}(t,n) = P\big\{\exists k : U^{(i)}(k) \in [t,n]\big\} \tag{3.21}
\]

holds. This is closely connected to the statement in Remark 3.24(iii).

Furthermore, the following relation between Green functions of Markov chains with different final sample sizes holds (for ordered as well as unordered types).

Lemma 3.31. Given the above definitions,

\[
G^{(n)}\big(t,\, n-(k-1)e_i\big) = \frac{q_{n,\,n-k+1}}{\tilde q^{(n)}_{n-k+1,\,n}}\; \frac{\lambda_{n-k+1}}{\lambda_n}\; G^{(n-k+1)}\big(t,\, n-(k-1)e_i\big) \tag{3.22}
\]

holds for the Green functions G^{(n)} and G^{(n-k+1)}.

Proof. We derive relation (3.22) for G^{(i)}(t,n) and show that it also holds for unordered types. First note that equation (3.22) can be rewritten to obtain

\[
\lambda_n\, \tilde q^{(n)}_{n-k+1,\,n}\, G^{(n)}\big(t,\, n-(k-1)e_i\big) = \binom{n}{k}\, \lambda_{n,k}\, \lambda_{n-k+1}\, G^{(n-k+1)}\big(t,\, n-(k-1)e_i\big),
\]

that is,

\[
\lambda_n\, \tilde q^{(n)}_{n-k+1,\,n} \sum_{U} P\big\{U^{(n)} = U\big\} = \binom{n}{k}\, \lambda_{n,k}\, \lambda_{n-k+1} \sum_{U} P\big\{U^{(n-k+1)} = U\big\}, \tag{3.23}
\]

where the sum on both sides extends over all admissible sequences of states that end with \big(t,\, n-(k-1)e_i\big). Note that there are only finitely many such sequences. We claim that for every admissible sequence U the corresponding summand on the left hand side equals the corresponding summand on the right hand side, thus

\[
\lambda_n\, \tilde q^{(n)}_{n-k+1,\,n}\, P\big\{U^{(n)} = U\big\} = \binom{n}{k}\, \lambda_{n,k}\, \lambda_{n-k+1}\, P\big\{U^{(n-k+1)} = U\big\} \tag{3.24}
\]

should hold for each U ∈ \big\{U : U(\tau(\partial)-1) = \big[t,\, n-(k-1)e_i\big]\big\}. Note that upon fixing U we fix the sequence of events that produced the sample, and the probability is given by the product of the one-step transition probabilities. Since by Corollary 3.27 r_j^{(n)} = r_j^{(n-k+1)} holds for all j < n−k+1, the probabilities for the mutation steps and also the combinatoric terms appearing in the probabilities for the transitions increasing the number of ancestral lines are the same on both sides. Both sides only differ in the probabilities for the jumps in the number of ancestral lines. Let 1 = l_1 < l_2 < \dots < l_{m-1} < l_m = n be the sequence of ancestral line numbers specified by U together with the final jump to n (in particular l_{m-1} = n−k+1). Note that

\[
\tilde q^{(n)}_{n-k+1,\,n}\, P\big\{U^{(n)} = U\big\} = g(n, l_2)\, q_{l_2,1} \prod_{i=2}^{m-1} \frac{\tilde q^{(n)}_{l_i,\, l_{i+1}}}{r^{(n)}_{l_i}}\; R(U) \tag{3.25}
\]

and

\[
\binom{n}{k}\, \lambda_{n,k}\, P\big\{U^{(n-k+1)} = U\big\} = \binom{n}{k}\, \lambda_{n,k}\, g(n-k+1, l_2)\, q_{l_2,1} \prod_{i=2}^{m-2} \frac{\tilde q^{(n-k+1)}_{l_i,\, l_{i+1}}}{r^{(n-k+1)}_{l_i}}\; R(U) \tag{3.26}
\]

hold for the same quantity R(U) subsuming the terms identical on both sides. The denominators are identical in both expressions, thus we focus on the numerators; by definition (3.17),

\[
g(j, l_2)\, q_{l_2,1} \prod_{i=2}^{m-1} \tilde q^{(j)}_{l_i,\, l_{i+1}}
= g(j, l_2)\, q_{l_2,1} \prod_{i=2}^{m-1} \frac{g(j, l_{i+1})\, q_{l_{i+1},\, l_i}}{g(j, l_i)}
= q_{l_2,1} \prod_{i=2}^{m-1} q_{l_{i+1},\, l_i}\; g(j, l_m) \tag{3.27}
\]

holds for j ≥ n. Inserting equation (3.27) combined with equation (3.25) into the left hand side of equation (3.24) and using definition (3.17) yields

\[
\lambda_n\, q_{l_2,1} \prod_{i=2}^{m-1} q_{l_{i+1},\, l_i}\; g(n,n) = q_{l_2,1} \prod_{i=2}^{m-1} q_{l_{i+1},\, l_i},
\]

since g(n,n) = \frac{1}{\lambda_n}. Note that we omitted the denominators and R(\cdot). This expression also follows from the right hand side of equation (3.24) if we apply the same chain of arguments to equation (3.26). Thus equality (3.24) holds for each U, and hence equality (3.23) follows. The terms can be rearranged and combined with equation (3.21) to obtain the statement of the lemma.

This lemma can now be used to show that the probability that the pre-exit state of the urn lies in the equivalence class [t,n] satisfies recursion (3.10). In [BB08, Theorem 7.2] Birkner and Blath use the duality between the Λ-coalescent and the Λ-Fleming-Viot process to show that the distribution of the pre-exit states in the Markov urn is identical to the sampling probabilities under the stationary distribution of the Λ-Fleming-Viot process with mutation. We follow a different chain of arguments here.

Proof of Theorem 3.23. It follows from conditioning on the previous state of the Markov chain and equation (3.21) that the Green function G^{(n)}[t,n] satisfies the recursive formula

\[
\begin{aligned}
G^{(n)}[t,n] = {} & \sum_{i:\,n_i \ge 2}\; \sum_{k=2}^{n_i} \frac{1}{r_{n-k+1}}\, \frac{n_i-k+1}{n-k+1}\, \tilde q^{(n)}_{n-k+1,\,n}\, \frac{c(t,\, n-(k-1)e_i)}{c(t,n)}\, G^{(n)}\big(t,\, n-(k-1)e_i\big) \\
& + \sum_{\substack{i:\,n_i=1,\ x_{i0}\text{ unique},\\ s(x_i)\ne x_j\,\forall j}} \frac{r}{r_n}\, \frac{c(s_i(t),\, n)}{c(t,n)}\, G^{(n)}\big(s_i(t),\, n\big) \\
& + \sum_{\substack{i:\,n_i=1,\\ x_{i0}\text{ unique}}}\; \sum_{j:\,s(x_i)=x_j} \frac{r\,(n_j+1)}{r_n}\, \frac{c(r_i(t),\, r_i(n+e_j))}{c(t,n)}\, G^{(n)}\big(r_i(t),\, r_i(n+e_j)\big),
\end{aligned}
\]

yielding

\[
\begin{aligned}
G^{(n)}[t,n] = {} & \sum_{i:\,n_i \ge 2}\; \sum_{k=2}^{n_i} \frac{1}{r_{n-k+1}}\, \frac{n_i-k+1}{n-k+1}\, \tilde q^{(n)}_{n-k+1,\,n}\, \frac{c(t,\, n-(k-1)e_i)}{c(t,n)}\, G^{(n)}\big(t,\, n-(k-1)e_i\big) \\
& + \frac{r}{r_n} \sum_{\substack{i:\,n_i=1,\ x_{i0}\text{ unique},\\ s(x_i)\ne x_j\,\forall j}} \frac{c(s_i(t),\, n)}{c(t,n)}\, G^{(n)}\big(s_i(t),\, n\big) \\
& + \frac{r}{r_n} \sum_{\substack{i:\,n_i=1,\\ x_{i0}\text{ unique}}}\; \sum_{j:\,s(x_i)=x_j} (n_j+1)\, \frac{c(r_i(t),\, r_i(n+e_j))}{c(t,n)}\, G^{(n)}\big(r_i(t),\, r_i(n+e_j)\big).
\end{aligned}
\tag{3.28}
\]

Applying Lemma 3.31 to each summand of the first term on the right hand side of equation (3.28) gives

\[
\begin{aligned}
\frac{1}{r_{n-k+1}}\, \frac{n_i-k+1}{n-k+1}\, \tilde q^{(n)}_{n-k+1,\,n}\, G^{(n)}\big(t,\, n-(k-1)e_i\big)
&= \frac{1}{r_{n-k+1}}\, \frac{n_i-k+1}{n-k+1}\, q_{n,\,n-k+1}\, \frac{\lambda_{n-k+1}}{\lambda_n}\, G^{(n-k+1)}\big(t,\, n-(k-1)e_i\big) \\
&= \frac{q_{n,\,n-k+1}}{\lambda_n}\, \frac{n_i-k+1}{n-k+1}\, \frac{\lambda_{n-k+1}}{r_{n-k+1}}\, G^{(n-k+1)}\big(t,\, n-(k-1)e_i\big) \\
&= \frac{q_{n,\,n-k+1}}{\lambda_n}\, \frac{n_i-k+1}{n-k+1}\, p\big(t,\, n-(k-1)e_i\big)
\end{aligned}
\]

for all i and k such that n_i ≥ 2 and 2 ≤ k ≤ n_i, where we omitted the combinatorial factors. Substituting this into equation (3.28) and multiplying by the exit probability \tilde q^{(n)}_n / r_n gives recursion (3.10).

The following example shows that the combinatorial factors c(·,·) need to be included in recursion (3.10).

Example 3.32. Consider samples of size 2. For convenience we denote the sample \big[\big((j_1, \dots, 0),\, (j_2, \dots, 0)\big),\, (1,1)\big] by [j_1, j_2]; thus j_i is the number of mutations type i bears, with i ∈ {1,2}. Note that [0,0] denotes the sample configuration with two individuals bearing the same type.

Since the sample configurations are unlabelled and unnumbered and the types are unordered, the equality [j_1, j_2] = [j_2, j_1] holds. Recursion (3.10) can be used to show that p[0,0] = 1/(1+2r),

\[
p[j_1, j_2] = 2 \binom{j_1+j_2}{j_1} \left(\frac{r}{1+2r}\right)^{j_1+j_2} \frac{1}{1+2r} \tag{3.29}
\]

if j_1 ≠ j_2, and

\[
p[j_1, j_1] = \binom{2j_1}{j_1} \left(\frac{r}{1+2r}\right)^{2j_1} \frac{1}{1+2r} \tag{3.30}
\]

hold for all Λ (see also [BB08, Equation 24]).

Summing over all possible sample configurations of size 2 yields

\[
p[0,0] + \sum_{m=1}^{\infty} \sum_{n=\lceil m/2 \rceil}^{m} p[n, m-n]. \tag{3.31}
\]

Note that p[j_1, j_2] = \frac{1}{2} p[j_1, j_2] + \frac{1}{2} p[j_2, j_1] holds if j_1 ≠ j_2. This observation together with equations (3.29) and (3.30) can be used to show that equation (3.31) equals

\[
\begin{aligned}
\frac{1}{1+2r} + \sum_{m=1}^{\infty} \sum_{n=0}^{m} \binom{m}{n} \left(\frac{r}{1+2r}\right)^{m} \frac{1}{1+2r}
&= \frac{1}{1+2r} \left( 1 + \sum_{m=1}^{\infty} \left(\frac{r}{1+2r}\right)^{m} \sum_{n=0}^{m} \binom{m}{n} \right) \\
&= \frac{1}{1+2r} \left( 1 + \sum_{m=1}^{\infty} \left(\frac{2r}{1+2r}\right)^{m} \right)
= \frac{1}{1+2r} \left( 1 + \frac{1}{1 - \frac{2r}{1+2r}} - 1 \right)
= \frac{1+2r}{1+2r} = 1.
\end{aligned}
\]
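The computation can also be checked numerically; the following sketch (our own illustration, truncating the series at m = 200) evaluates (3.29), (3.30) and the sum (3.31) for a sample value of r.

\begin{verbatim}
from math import comb, ceil

def p(j1, j2, r):
    # Sampling probability p[j1, j2] of Example 3.32 (sample size 2).
    m = j1 + j2
    base = (r / (1 + 2 * r)) ** m / (1 + 2 * r)
    return (2 if j1 != j2 else 1) * comb(m, j1) * base

r = 0.7
total = p(0, 0, r) + sum(p(n, m - n, r)
                         for m in range(1, 200)       # truncated series
                         for n in range(ceil(m / 2), m + 1))
print(total)  # 0.9999... for any r > 0
\end{verbatim}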

Without the combinatorial factors c(·,·) this sum would be larger than one and p would not be a probability measure.