• Keine Ergebnisse gefunden

Hobolth, Uyenoyama & Wiuf’s Scheme

4. Importance Sampling in the Infinitely-Many-Sites-Model 89

4.2. Importance Sampling Schemes

4.2.3. Hobolth, Uyenoyama & Wiuf’s Scheme

In [HUW08] the authors introduced a proposal distribution for the infinitely many sites model where the underlying genealogy is given by Kingman’s coalescent. The basic idea behind their proposal distribution is to treat each mutation as if it is the only mutation in the genetree. In the case of just one mutation, the optimal proposal weights can be calculated explicitly. The proposal weights from each mutation are then suitably combined to obtain a proposal distribution for samples with an arbitrary number of mutations. We present a possible generalisation of their proposal distribution to the Λ-coalescent scenario and give a short summary of the work in [HUW08] later.

Recall from Section 3.3.1 that in the infinitely many sites model the genetree of the sample configuration can also be represented by a matrixS = (si,m)1≤i≤d,1≤m≤s, where si,m= 1 if and only if typeibears a mutation at the segregating sitem. We denote by dm =Pd

i=1si,mni the number of individuals in the sample bearing a mutation at site m.

For convenience denote by

Mnd :=

(0),(0,1)

,(n−d, d)

the sample configuration with one segregating site wheredindividuals bear the mutation at the segregating site and n−ddo not. Denote by

Mn0 :=

(0)

,(n)

n − d

d

(a) The sample configuration [Mnd].

n

(b) The sample configuration [Mn0].

Figure 4.1.: The sample configurations [Mnd] and [Mn0]. The sample has one segrat-ing site respectively no segregatsegrat-ing site.

the configuration where all n individuals share the same type. These sample config-urations are illustrated in Figure 4.1. Note that in the sequel whenever we present illustrations of such samples we will use the corresponding equivalence class under ≈ out of convenience.

In case of one segregating site let p(n, d) for n≥2 be the probability that the most recent event affected an individual out of thedindividuals bearing the mutation, that is

pθ(n, d) =

(Pd−1

l=1 Qθ(Mnd → Mn−ld−l) ifd >1 Qθ(Mn1 → Mn0) ifd= 1.

In the first case the most recent event was a merger of any size involving individuals bearing the mutation, whereas in the second case the last event was the origin of the mutation.

To utilise this probability for a general sample (t,n) fix a type i and a segregating site m. Say that the type i is involved in this most recent evolutionary event with probability

u(m)θ (i) :=

(pθ(n, dm)dni

m ifsi,m = 1,

(1−pθ(n, dm))n−dni

m ifsi,m = 0.

In the first case typei bears the mutation at sitem, thus the probability for the most recent event affecting individuals carrying the mutation has to be multiplied by ndi since this is the fraction of individuals with typeiamong the individuals bearing the mutation at sitem. In the second case a similar factor shows up due to the same reasoning among the individuals not carrying the mutation. In the remainder we refer to the genetree Mndm as thecompressed genetree.

For a general sample consider the contribution of all segregating sites. We expect the method to perform well, since types bearing more mutations will get a larger weight and thus are involved in the most recent event with a higher probability, which seems rather intuitive.

Definition 4.15(Proposal distributionQHθ1). Denote by (t,n) the current state of the

0 0

1 2

4 2 5 4

2 1 3 3

i

carrying:

0 4 =nd1

1 8 =d1

i

0 8

5 4 i

not carrying:

0 11

2 1

i 0 9

3 3 i

0 10

4 2 i

Figure 4.2.: A sample configuration is depicted on the left and a typeiis marked. On the right all possible compressed genetrees are listed. The type corresponding to i is marked in the compressed genetrees. Either type i corresponds to the type carrying the mutation or not.

proposal Markov chainQHθ1. Propose type ito be involved in the most recent event with probability

QHθ1(i|t,n)∝ (P

mu(m)θ (i) if iis allowed to act

0 else, (4.20)

where type i is allowed to act if either ni ≥ 2 or type i corresponds to a leaf node in the associated genetree. If the multiplicity of the chosen type is one remove the outmost mutation. If the multiplicity is larger than one, perform a merger inside this group. The size of the merger is chosen to be l+ 1 with probability proportional to Qθ(Mndo → Mn−ldo−l), where do is the number of individuals in the sample bearing the outmost mutation of the chosen type and1≤l≤ni−1. If the selected typeicarries no mutation the merging size is chosen proportional toQθ(Mn0 → Mn−l0 )whereQθ(Mn0 → M10) = g(PiG(Mni,n)q(n,1))n

0) is the probability of jumping to the terminal state.

Remark 4.16. The quantities pθ(n, d) and the proposal of the merging size involve the optimal proposal distribution for samples with at most one segregating site so these quantities can be computed explicitly.

Figure 4.2 depicts a sample configuration and all corresponding compressed genetrees with one mutation. If the underlying genealogy is given by Kingman’s coalescent, then the transition probabilities can be computed explicitly.

Explicit Formulae for the Kingman case

The authors in [HUW08] developed their proposal distribution in the Kingman case (Λ = δ0). In this case the transition probabilities of the Markov urn introduced in Definition 3.28 take the form

(t,n)→



(t,n+ei

w.p. nnin−1+θn−1 ai(t),n

w.p. n1n−1+θθ ifni = 1 ei(t),e(n−ei)

w.p. d+11 nnin−1+θθ ifni >1,

(4.21)

whereU(nf)(1) = [(0)],(2)

almost surely andai,ei and eas in Definition 3.28. This Markov chain was introduced in [EG87, Section 5] and restated in [HUW08]. Note that the transition probabilities do not depend on the final sample size, soU(nf) =U. This fact was also remarked in the proof of Proposition 1 from [BB08]. Furthermore, either the number of lines or the number of mutations increase by one in each step and therefore the sample complexity comp(t,n) = s+n increases in each step by exactly one where, as before,n=Pd

i=1ni andsis the number of segregating sites in (t,n). It follows that the Markov urn can visit a given state (t,n) only at the times+n−1 and immediately leaves it in the next step, thus the Green function simplifies to

G(nf)(t,n) =Eµ X

k=0

1{U(k)=(t,n)}=Pθ

U(s+n−1) = (t,n) (4.22)

and the probability of observing the sample (t,n) at stationarity is given by p(t,n) =Pθ

U(s+n−1) = (t,n) n−1 n−1 +θ.

The authors in [HUW08] investigated the case of only one segregating site in the sample and showed that

Pθ{U(n) =Mnd}=

n−d+1X

k=2

θ k−1 +θ

n−2Y

l=1

l l+θ

n−d−1 k−2

n−1 k−1

−1

(4.23)

holds, see [HUW08, Appendix]. A main ingredient of the proof is a result from [S00b]

on the time at which the unique mutation occurred in the coalescent tree.

Recall that in Kingman’s coalescent only binary merging is possible. For d ≥ 2 Theorem 4.4 and the special form of the Green function (4.22) yield

pθ(n, d) =Pθ

U(n−1) =Mn−1d−1U(n) =Mnd

=Pθ

U(n) =Mnd

U(n−1) =Mn−1d−1 Pθ

U(n−1) =Mn−1d−1 Pθ

U(n) =Mnd

= d−1 n−1

n−2 n−2 +θ

Pn−d+1

k=2 θ

k−1+θ

Qn−3

l=1 l l+θ

n−d−1 k−2

n−2

k−1

−1

Pn−d+1

k0=2 θ k0−1+θ

Qn−2

l=1 l l+θ

n−d−1 k0−2

n−1

k0−1

−1

=

Pn−d+1

k=2 1

k−1+θd−1 n−k

n−d−1 k−2

n−1

k−1

−1

Pn−d+1

k0=2 1 k0−1+θ

n−d−1 k0−2

n−1

k0−1

−1

for the probability that individuals bearing the mutation were involved in the most recent event, which is a merger of size 2 among those individuals in this case. This coincides with equation (11) in [HUW08]. For d = 1 the probability that the most

recent event was the origin of the mutation is given by P

U(n−1) =Mn0

U(n) =Mn1

=P

U(n) =Mn1

U(n−1) =Mn0

P

U(n−1) =Mn0

P

U(n) =Mn1

= 1 2

θ n−1 +θ

Qn−1

l=1 l

Pn l+θ

k=2 θ

k−1+θ

Qn−1 l=1 l

l+θ n−2 k−2

n−1

k−1

−1

= 1 2

1 n−1+θ

Pn

k=2 1

k−1+θk−1 n−1

,

which basically coincides with equation 14 in [HUW08]. The factor 12 vanishes if Mnd is replaced by [Mnd].

Regarding Pairs of Mutations

In this section we extend the approach of [HUW08] to consider pairs of mutations. First note that there are two kinds of structurally distinct genetrees with two mutations.

Definition 4.17. Let

Mnd1,d2 :=

(0),(0,1),(0,2)

,(n−d1−d2, d1, d2) ,

be the genetree, where the two mutations are on different branches. The number of individuals carrying mutation 1 is d1 and d2 gives the number of individuals bearing mutation 2. Denote by

Ndn1,d2 :=

(0),(0,1),(0,1,2)

,(n−d1, d1−d2, d2)

the sample configuration where the mutations are on the same branch. The number of individuals carrying only mutation 1 is d1−d2 and both mutations are carried by d2 individuals.

The two possible types of genetrees are depicted in Figure 4.3.

Remark 4.18. (i) Note that [Mnd1,d2] = [Mnd2,d1] holds ford1+d2≤n. Furthermore, note that Mnd1,d2 with d1 +d2 =n, Ndn2,d2 and Nn,dn 2 denote valid genetrees with two segregating sites, whereas by a slight abuse of notation Mnd1,0 = Mn0,d1 = Mnd1 and Ndn1,0 =Mnd1 denote genetrees with only one segregating site. As before we denote by Mn0,0 =N0,0n =Mn0 the sample of size nwith no segregating sites.

To consider pairs of mutations determining the probabilities of performing a step involving type i in a general sample configuration [t,n] it is necessary to know the relation of the outmost mutation of type i to two given mutations m1 and m2 in the genetree. In other words, the type in the compressed genetree corresponding to typei has to be determined. Based on this information the corresponding most recent event

0 nd1d2

1 d1 2 d2

(a) The genetree for the sam-ple configuration [Mnd

1,d2]

0 nd1

1 d1d2

2 d2

(b) The genetree for the sam-ple configuration [Ndn

1,d2]

Figure 4.3.: The two different sample configurations of sizenwith two segregating sites (or mutations). The number of individuals carrying mutation 1 isd1 andd2 individuals carry mutation 2

in the compressed tree can be chosen. Figure 4.4 shows two examples of compressing the genetree for two given mutations. Due to symmetry reasons the relation can be described by one of five distinct cases.

Definition 4.19. The relation between the type iand the mutations m1 and m2 is Case I if si,m1 = 1 and si,m2 = 1,

Case II if si,m1 = 1, si,m2 = 0 and m1, m2 are on the same branch, Case III if si,m1 = 1, si,m2 = 0 and m1, m2 are on different branches, Case IV if si,m1 = 0, si,m2 = 0 and m1, m2 are on the same branch, or Case V if si,m1 = 0, si,m2 = 0 and m1, m2 are on different branches.

In Case I type ibears both mutations. In Case II and III typeibears one mutation and in Case IV and V the type bears none of the two mutations. In the first case the two mutations lie on the same branch of the genetree. In the remaining four cases this property may or may not hold. The five cases are depicted in Figure 4.6 and the pseudo-code in Figure 4.5 can be used to identify the relation of the outmost mutation of typeito the mutationsm1and m2 in the genetree and the case to be used in a given setting.

In each case the most recent event affecting one of the three types is either a mutation event (not for the root type) or a merging event. For convenience denote by

d1 =dm1 = Xd

i=1

si,m1di

the number of individuals carrying mutationm1 and by d2 =dm2 =

Xd

i=1

si,m2di

the number of individuals bearing mutationm2. Without loss of generality the mutation labels can be chosen such thatd1 ≥d2 ⇒m1< m2 holds ifm1 andm2 are on the same branch.

0 3

1

4 1 5

8 2

2 5 3

6 1 i

0 9

1 1

8 2 i

(a) The two mutations considered in this example are mutation 1 and 8. The com-pressed genetree is of the formNdn

1,d2 and typeicorresponds to mutation 1.

0 3

1

4 1 5

8 2

2 5 3

6 1 i

0 9

5 2 3 1

i

(b) In this example mutation 3 and 5 are con-sidered. The compressed genetree is of the form Mnd

1,d2 and typeicorreponds to the root type.

Figure 4.4.: Two examples of genetree compressions. The type iis identified with one of the three types in the compressed sample by this procedure.

In Case III and V the compressed genetree is of the form Mnd1,d2. In Case V the probability that the most recent event reduced the multiplicity of the type bearing no mutation by l is given by Qθ(Mnd1,d2 → Mn−ld1,d2) if n−d1−d2 > l. In Case III the probability that the multiplicity of the type bearing mutation 1 is decreased bylis given by Qθ(Mnd1,d2 → Mn−ld1−l,d2) if d1 > 1. If d1 = 1, the probability that the most recent event was the origin of mutation 1 is given byQθ(Mn1,d2 → Mn0,d2). By symmetry Case III includes the case that typeicorresponds to m2 in the compressed genetree.

In Case I, II and IV the compressed genetree is of the form Ndn1,d2. The most recent event affected the type bearing no mutation in Case IV. The probability that the mul-tiplicity of the type is decreased bylis given byQθ(Ndn1,d2 → Ndn−l1,d2) ifn−d1 >1. The multiplicity of the type bearing only mutation 1 gets decreased by l with probability Qθ(Ndn1,d2 → Ndn−l1−l,d2) ifd1−d2 >1 in Case II. Note that in Case II the probability for the most recent event being the origin of mutation 1 is zero since in general mutation 2 is a successor of mutation 1 in the genetree and must therefore be removed from the genetree before mutation 1 can be removed. Decreasing the multiplicity of the type bearing both mutations bylin Case I occurs with probabilityQθ(Ndn1,d2 → Ndn−l1−l,d2−l) if d2 > 1. In Case I mutation 2 can originate from the most recent event and the probability is given byQθ(Ndn1,1 → Mnd1,0) if the multiplicity of the type bearing both mutations is one.

The optimal transition probabilities for samples with two mutations can be calculated numerically. To determine the transitions of the proposal Markov chain until it hits the root configuration M10 the transition probabilities for the cases with one mutation or zero mutations also have to be provided. The above considerations can now be combined in several ways to obtain proposal distributions for general sample configurations.

We now present several proposal distributions combining the considerations from above in different ways.

1 i f same branch (m1,m2) : 2 i f ( same branch (i,m1) ) :

3 i f (i < m1) : + Case IV

4 e l s e:

5 i f same branch (i,m2) :

6 i f (i < m2) :

+ Case II

7 e l s e:

+

Case I

8 e l s e:

+ Case II

9 e l s e: +Case IV

10 e l s e:

11 i f same branch (i,m1) :

12 i f (i < m1) : + Case V

13 e l s e:

+ Case III

14 e l s e:

15 i f same branch (i,m2) :

16 i f (i < m2) : + Case V

17 e l s e:

+Case III

18 e l s e: x Case V

Figure 4.5.: Pseudo-code to check the constellation of typei(or the outmost mutation of this type) and the two mutationsm1andm2in the genetree. The predicate same branch(m1, m2) is true if and only if the two mutations are on the same branch. In the pictograms the hollow dot depicts the root. The two filled dots correspond to the two mutations and the cross, ‘+’ or ‘x’, gives the position of the outmost mutation of type i. The case corresponding to the given constellation is indicated to the right of the arrow. Furthermore, m1 < m2 holds if mutation m2 is a descendant of mutation m1 in the genetree.

0 nd1

m1 d1d2

m2 d2 (a) Case I

0 nd1

m1 d1d2

m2 d2 (b) Case II

0 nd1d2

m1 d1 m2 d2 (c) Case III

0 nd1

m1 d1d2

m2 d2 (d) Case IV

0 nd1d2

m1 d1 m2 d2 (e) Case V

Figure 4.6.: The five different cases of types being affected by the most recent event in the two genetrees corresponding to configurationsMnd1,d2 (Case III and V) and Ndn1,d2 (Case I, II and IV).

Two-Step Procedure

One way to extend the proposal distributionQH1 to consider pairs of mutation is again a two step procedure as in Definition 4.15, where in the first step a type that is involved in the most recent event is chosen and the size of the possible merger is determined in the second step. Fix a configuration (t,n). For each typei,1≤i≤dand all unordered pairs of mutations in the genetree {m1, m2} let

u{mθ 1,m2}(i) :=











































ni

d2

1{d2>1}

dX2−1

l=1

Qθ(Ndn1,d2,Ndn−l1−l,d2−l) +1{d2=1}Qθ(Ndn1,1,Ndn1,0)

in Case I

ni

d1−d21{d1−d2>1}

d1−dX2−1

l=1

Qθ(Ndn1,d2,Ndn−l1−l,d2) in Case II

ni

d1

1{d1>1}

dX1−1

l=1

Qθ(Mnd1,d2,Mn−ld1−l,d2) +1{d1=1}Qθ(Mn1,d2,Mn0,d2)

in Case III

ni

n−d1

1{n−d1>1}

n−dX1−1

l=1

Qθ(Ndn1,d2,Ndn−l1,d2) in Case IV

ni

n−d1−d21{n−d1−d2>1}

n−dX1−d2−1

l=1

Qθ(Mnd1,d2,Mn−ld1,d2) in Case V be the probability that if the genetree is compressed to the one consisting only of mutation m1 and m2, then the most recent event involved an individual bearing type i. Againdj =dmj denotes the number of individuals bearing the respective mutation.

In Case I and III type i identifies with one of the types in the compressed tree that could have been involved in either a merging event or a mutation event, depending on the number of types bearing the corresponding mutation. In the other three cases the type could have been involved in a merging event if the number of individuals bearing the corresponding mutation is greater than or equal to 2. Note that we write Qθ(H, H) instead of Qθ(H → H) for simplicity and that in Case I, III and IV the

order of the mutations in the compressed tree Ndn1,d2 is determined by their order in the original genetree. As in equation (4.21) the prefactors stem from the fact that ni individuals bearing typeihave to be chosen from a larger group of individuals bearing the respective mutation.

Based on this probabilities define the distributionQHθ2f1 with probabilities QHθ2f1(i|t,n)∝

(P

{m1,m2}u{mθ 1,m2}(i) ifiis allowed to act,

0 otherwise, (4.24)

for all typesi∈ {1, . . . , d}, where the sum extends over all unordered pairs of mutations present in the sample. This distribution can be used to propose a type to be involved in the most recent event. One way of choosing the size of the possible merger is analogous to the one pursued inQH1 in that again the probabilities of the merger size in a specific sample, now with two mutations, are considered.

Definition 4.20 (Proposal distributionQHθ2f1,1). Let (t,n) be the current sample con-figuration. Choose typeito be involved in the most recent event according toQHθ2f1(i|t,n) defined in equation (4.24). If ni= 1, remove the outmost mutation of typei. Ifni ≥2, letobe the outmost mutation of typei. Reduce the multiplicity of typeibylwith proba-bilityQθ(Ndno,do−ni → Ndn−lo−l,do−ni). If typeiis the root type, reduce the multiplicity byl with probabilityQθ(Mnn−ni → Mn−ln−l−ni). Denote this proposal distribution byQHθ2f1,1.

Another way of proposing the merger size is to again consider all mutations present in the sample. Define the quantity

u{m1,m2}

θ (i, l) :=















ni

d2Qθ(Ndn1,d2 → Ndn−l1−l,d2−l) in Case I

ni

d1−d2Qθ(Ndn1,d2 → Ndn−l1−l,d2) in Case II

ni

d1Qθ(Mnd1,d2 → Mn−ld1−l,d2) in Case III

ni

n−d1Qθ(Ndn1,d2 → Ndn−l1,d2) in Case IV

ni

n−d1−d2Qθ(Mnd1,d2 → Mn−ld1,d2) in Case V

for a given unordered pair of mutations{m1, m2} and a given 1≤l≤ni−1, providing the weight of anl+ 1-merger in the compressed genetree. Based on this quantities the following proposal distribution can be defined:

Definition 4.21 (Proposal distributionQHθ2f1,2). Let (t,n) be the current sample con-figuration. Choose typeito be involved in the most recent event according toQHθ2f1(i|t,n) defined in equation (4.24). If ni= 1, remove the outmost mutation of typei. Ifni ≥2, choose to decreaseni byl with probability proportional to

X

m1,m2

u{m1,m2}

θ (i, l),

where the sum extends over all pairs of mutations present in the current sample. Denote this proposal distribution by QHθ2f1,2.

Whereas the proposal distributions in this paragraph perform two steps to choose the transition of the Markov chain, we introduce proposal distributions performing only one step in the next paragraph.

One-Step procedure

The ideas of [HUW08] can be extended in another direction by choosing the type in-volved in the most recent event and the size of the possible merger in one step. In this paragraph we present two proposal distributions that let pairs of mutations valuate all possible transitions and then the most recent step is chosen proportionally to the weights.

Note that a possible transition of the proposal Markov chain can be characterised by a pair (i, l) with 1 ≤ i ≤ d and 0 ≤ l ≤ ni−1, where i denotes the type that is involved in the most recent event and l denotes the amount by which the multiplicity is decreased. Denote byl= 0 the case that the outmost mutation of type iis removed from the genetree. The probability for an event (i,0) is zero ifni ≥2 since a type has to be singleton in order to remove the outmost mutation. Forni ≥2 andl >0 let

v{mθ 1,m2}(i, l) :=















ni

d2Qθ(Ndn1,d2 → Ndn−l1−l,d2−l) in Case I

ni

d1−d2Qθ(Ndn1,d2 → Ndn−l1−l,d2) in Case II

ni

d1Qθ(Mnd1,d2 → Mn−ld1−l,d2) in Case III

ni

n−d1Qθ(Ndn1,d2 → Ndn−l1,d2) in Case IV

ni

n−d1−d2Qθ(Mnd1,d2 → Mn−ld1,d2) in Case IV

be the probability that if the genetree would just consist of mutationm1 and m2, then the multiplicity of the type that corresponds toiin the compressed genetree is reduced by l. Again the optimal transition probability has to be multiplied withni divided by the multiplicity of the corresponding type.

If the multiplicity of type iis one, then

v{mθ 1,m2}(i,0) :=















ni

d2Qθ(Ndn1,1→ Ndn1,0)1{d2=1} in Case I

0 in Case II

ni

d1Qθ(Mnd2,1 → Mnd2,0)1{d1=1} in Case III

0 in Case IV

0 in Case V

(4.25)

yields the probability that in the compressed genetree consisting of mutationm1andm2

the most recent event removed the outmost mutation of the type that is identified with i. Note that this probability is zero if the respectivedmjis greater than one, because in this case the outmost mutation cannot be removed in the compressed tree. Furthermore in Case II, IV and V the mutation cannot be removed since it is not a leaf node in the genetree. These quantities can be combined as follows:

Definition 4.22 (Proposal DistributionQHθ2f2,1). Denote by(t,n) the current sample configuration. The proposal distribution QHθ2f2,1 proposes the event (i, l) for 1≤i≤d and 0≤l≤ni−1 to be the most recent evolutionary event with probability

Qfθ2,1(i, l|t,n)∝ (P

{m1,m2}v{mθ 1,m2}(i, l) if iis allowed to act

0 otherwise,

where the sum extends over all pairs of mutations present in the current sample and again type iis allowed to act if ni≥2 or it is a leaf node in the genetree.

Note that in a given sample (t,n), by equation (4.25) the contribution of the pair {m1, m2} to the event (i,0) of removing the outmost mutation of a leaf typei is zero if the corresponding dmj is greater than one. In a general genetree this case appears rather frequently and thus we argue that the proposal distribution QHθ2f2,1 underrates mutation events. This effect is illustrated in Figure 4.8. To circumvent this effect, we modify the proposal distribution by summing only over those pairs of mutations where one of the mutations coincides with the outmost mutation of the current type. This should reduce the number of pairs that put too much emphasis on the merging events and establishes a more balanced proposal distribution.

Definition 4.23 (Proposal DistributionQHθ2f2,2). Denote by(t,n) the current sample configuration. The proposal distribution QHθ2f2,2 chooses the event (i, l) for 1 ≤ i ≤d an0≤l≤ni−1 with probability

QHθ2f2,2(i, l|t,n)∝ (P

m6=o(i)vθ{o(i),m}(i, l) if i is allowed to act

0 otherwise,

whereo(i)denotes the outmost mutation of typeiand the sum extends over all mutations except this outmost mutation.

Remark 4.24. (i) Another promising side effect of this method is that it reduces the complexity of proposing a step from quadratic to linear in the number of mutations.

(ii) The same authors who derived the probabilities (4.23) in the case of Kingman’s co-alescent and one segregating site present equations for the case of two segregating sites in Section 4 of [HW09]. They note in Section 7 that their results “could potentially be used to further improve the proposal distribution for inference in coalescent mod-els.” Indeed, their results can be applied to derive explicit formulas for the quantities um1,m2(l), um

1,m2(i, l), and vm1,m2(l) that govern the proposal distributions regarding pairs of mutations.

(iii) Note that the idea to let mutations valuate all possible transitions can also be applied for the case when just one mutation at a time is considered in the sense ofQHθ1. The next proposal distribution combines the single-mutation approach with the pair approach.

Sesqui-proposal

The complexity of the proposal distributions regarding pairs of mutations is quadratic in the number of mutations, whereas the proposal distributions regarding all mutations have linear complexity. We will see in Section 4.3 that the real time to compute steps for the distributions differ. However, we noted that the method determining the size of the merger in proposal distributionQHθ2f1,1 from Definition 4.20 performs well. Thus a promising candidate concerning speed and performance is given by the combination of proposing a type in the first step considering all mutations separately and then choosing the merging size in the second step by the method from QHθ2f1,1.

Definition 4.25 (Proposal Distribution QSQHθ ). Choose type i to be involved in the most recent event considering all mutations according to the distribution QHθ1 defined in (4.20). If a singleton type is chosen, remove the outmost mutation, whereas in the case of a non-singleton type withni ≥2the multiplicity is decreased bylwith probability Qθ(Ndno,do−ni → Ndn−lo−l,do−ni), where 1≤l≤ni−1.