
5. Inference of Evolutionary Parameters from Datasets 125

5.7. Concluding Remarks

In Section 4.3.4 we argue that the proposal distribution $Q^{H1}_\theta$ performs well and can be used for the sample sizes in the analyses of the present chapter. We introduce several methods to further improve the performance of the maximum likelihood analysis. We show that the maximum likelihood estimates of the tree-shape parameter in the different datasets from marine species do not correspond to Kingman-type genealogies. In the case of the Atlantic cod, we argue that this is due to sweepstake-like reproduction behaviour. However, the empirical distributions of the maximum likelihood estimators show that for the given sample sizes the variance is still rather large.

Thus we cannot present conclusive statistical evidence of the non-Kingmanness of the Atlantic cod datasets.

Further improvements of the likelihood computation methods are needed in order to cope with larger sample sizes and to present empirical distributions for them. However, in view of the results in [D96], the variance of the maximum likelihood estimators might not decrease as strongly as expected when the sample size increases, due to the correlation structure of the sampled individuals. Thus it might be necessary to consider more general approaches, for example incorporating recombination and analysing multi-locus data, to obtain more precise estimates of the evolutionary parameters.

We now describe the tools and datasets that can be found on the compact disc attached to this work.

A.1. MetaGeneTree

The program MetaGeneTree can be used to compute exact likelihoods, or to estimate likelihoods via importance sampling, for a given sample configuration [t,n] under certain Λ-coalescent models and the infinitely-many-sites model. It implements the exact recursion given in Section 3.3 to calculate c(t,n)p[t,n] in an iterative fashion. Furthermore, an implementation of importance sampling in the sense of equation (4.4) is provided for all proposal distribution schemes from Section 4.2.

The program can be found in the folder meta gene tree/ on the compact disc. The documentation in HTML format is provided in the subdirectory html/ and can also be generated from the file my doxyfile. To compile the program, run ./configure && make in the main folder. The program checks for the libraries cppunit, available at http://sourceforge.net/projects/cppunit/, and the GNU Scientific Library gsl, available at http://www.gnu.org/software/gsl/. Depending on the local library situation, it might be necessary to reconfigure the installation files by running the command autoreconf -f -i -v before the configure script. After the program has been compiled successfully, the executable binary is located at bin/MetaGeneTree. Before describing the full usage and giving some examples of calling the program, we first give some details on the methods implemented in the program and refer to the documentation for further details.

The iterative computation of the exact likelihoods terminates in reasonable time only for samples of small complexity. Based on the likelihoods of all subconfigurations (t′,n′) of (t,n) that have m mutations, the likelihoods of all subconfigurations having m+1 mutations can be computed. Proceeding iteratively, the program arrives at the likelihood of (t,n).

It is also possible to evaluate the recursion only for the subconfigurations having a given number of mutations (this is controlled by the argument -n). These values can then be used directly as a basis for importance sampling estimates in the sense of Section 5.1.2, or stored in a file for later use.

The second major feature of the program is the importance sampling implementation.

It can be used to estimate the likelihood of a given sample (t,n) via the estimator given in equation (4.4). The program simulates a given number of independent histories (or runs), computes the importance weights, and combines them to return the resulting estimate. The proposal distribution used to simulate the histories can be chosen via the argument -d. We provide implementations for all proposal distributions from Section 4.2.
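A minimal sketch of how independent importance weights can be combined, assuming that the estimator in equation (4.4) is the usual average of the weights w_i = p(H_i)/q(H_i) over R simulated histories and that the reported standard deviation is the standard error of that average. The function name is hypothetical and not part of MetaGeneTree.

```python
import math
from typing import Sequence, Tuple


def is_estimate(weights: Sequence[float]) -> Tuple[float, float]:
    """Combine the importance weights w_i = p(H_i) / q(H_i) of R independent
    histories into a likelihood estimate and its estimated standard error."""
    r = len(weights)
    if r < 2:
        raise ValueError("at least two independent runs are needed")
    mean = sum(weights) / r
    # Unbiased sample variance of the individual weights.
    var = sum((w - mean) ** 2 for w in weights) / (r - 1)
    # Standard error of the mean of r i.i.d. weights.
    return mean, math.sqrt(var / r)


estimate, std_error = is_estimate([1.0, 2.0, 3.0, 4.0])
```

Because the weights are i.i.d., the standard error in this sketch shrinks like 1/√R, which is why the number of runs given by -r governs the precision of the estimate.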

Both methods can be used to estimate likelihoods for a group of parameters based on a single driving value, according to the procedure introduced in Section 5.1.1. For details on the implementations of the different methods we refer to the documentation.
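The idea of reusing one set of histories for many parameters can be illustrated with generic importance-sampling reweighting: histories are drawn once from the proposal at the driving value, and each target parameter only changes the numerator of the weight. This is a sketch under that assumption, not the exact procedure of Section 5.1.1; the callables log_p and log_q_driving are hypothetical stand-ins for the model and proposal probabilities of a history.

```python
import math
from typing import Callable, Dict, Hashable, Sequence, Tuple

Params = Tuple[float, float]  # (mutation rate, tree-shape parameter)


def driving_value_estimates(
    histories: Sequence[Hashable],
    log_q_driving: Callable[[Hashable], float],
    log_p: Callable[[Hashable, Params], float],
    params: Sequence[Params],
) -> Dict[Params, float]:
    """Estimate the likelihood at every parameter pair from histories that
    were simulated once under the proposal for the driving value."""
    estimates = {}
    for theta in params:
        # Weights are formed in log space to avoid underflow of tiny probabilities.
        weights = [math.exp(log_p(h, theta) - log_q_driving(h)) for h in histories]
        estimates[theta] = sum(weights) / len(weights)
    return estimates


# Toy check: four "histories" 0..3, each with proposal probability 0.25 and used
# once as a deterministic stand-in for random draws; the target assigns
# p_theta(h) = 0.1 * (h + 1) * rate, so the exact total is the rate itself.
toy = driving_value_estimates(
    histories=[0, 1, 2, 3],
    log_q_driving=lambda h: math.log(0.25),
    log_p=lambda h, th: math.log(0.1 * (h + 1) * th[0]),
    params=[(1.0, 1.5), (2.0, 1.5)],
)
```

The design choice worth noting is that the expensive part, simulating histories, happens once; only the cheap evaluation of the model probability is repeated per parameter pair, at the price of higher variance for parameters far from the driving value.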

We now present a few examples on how to use the program.

If the cppunit test suite is available, the call make check runs a number of tests that should ensure that the program works properly. These tests include relations for the rates and comparisons of the exactly calculated likelihood with the estimated value; see the source code for more details on the tests. In particular, lines 50 and 51 of the file tests/complete test.cpp can be adapted to check the correctness of the different proposal distributions.

Example A.1. Assume throughout this example that the binary file of the program is found at bin/MetaGeneTree. The sample configuration to be analysed is passed to the program in an input file whose path is specified on the command line. The file is interpreted as follows: each line contains a multiplicity followed by a colon and a type. The type is given as a space-separated list of integers corresponding to mutation numbers; note that each line has to be terminated by the root mutation 0. The given multiplicity is used as the multiplicity of this type in the sample configuration. If the file does not represent a valid sample configuration according to the three conditions given in Section 3.3.1, an error is returned. An input file might, for example, look as follows:

3 : 2 1 0
2 : 5 0
1 : 4 3 1 0

This input file corresponds to the sample configuration $\langle ((2,1,0),(5,0),(4,3,1,0)), (3,2,1) \rangle$.
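For illustration, a reader for this input format might look as follows. It is only a sketch: it checks that every line ends with the root 0 and that each mutation has the same ancestral path wherever it occurs, which is a necessary tree condition but is not claimed to coincide with the three conditions of Section 3.3.1. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Type = Tuple[int, ...]


def parse_sample(text: str) -> Tuple[List[Type], List[int]]:
    """Parse lines 'multiplicity : m_k ... m_1 0' into types and multiplicities."""
    types: List[Type] = []
    mults: List[int] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        mult_part, sep, type_part = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: missing ':'")
        mult = int(mult_part)
        mutations = tuple(int(x) for x in type_part.split())
        if mult < 1:
            raise ValueError(f"line {lineno}: multiplicity must be positive")
        if not mutations or mutations[-1] != 0:
            raise ValueError(f"line {lineno}: type must end with the root mutation 0")
        types.append(mutations)
        mults.append(mult)
    check_tree_consistency(types)
    return types, mults


def check_tree_consistency(types: List[Type]) -> None:
    """Partial check: every mutation must have the same ancestral path in all types."""
    path_of: Dict[int, Type] = {}
    for t in types:
        for i, m in enumerate(t):
            if path_of.setdefault(m, t[i:]) != t[i:]:
                raise ValueError(f"mutation {m} has inconsistent ancestry")


def segregating_sites(types: List[Type]) -> int:
    """Number of distinct mutations, excluding the root 0."""
    return len({m for t in types for m in t if m != 0})


types, mults = parse_sample("3 : 2 1 0\n2 : 5 0\n1 : 4 3 1 0\n")
```

For the example file this yields the three types and the multiplicities 3, 2, 1 given above, with 5 segregating sites.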

If this input file is named input.tre, then a possible program call is

bin/MetaGeneTree -f input.tre -a 1.5 -m 1 -n 10

The program then analyses the tree in the file input.tre under the parameters r = 1 (mutation rate, -m) and α = 1.5 (tree-shape parameter of the Beta-coalescent, -a) for the underlying evolutionary model. Furthermore, the likelihood values for all subconfigurations having 10 mutations are to be computed. Since the given sample has only 5 segregating sites, this amounts to evaluating the recursion for the full sample, so its likelihood is computed exactly. The program returns the output

1 1.5 9.60339e-05

The first value is the mutation rate, the second value the tree parameter and the third value the exactly computed likelihood.

To use the importance sampling method together with the driving value method from Section 5.1.1, an additional file with a list of parameter pairs has to be provided. In this file every line contains two space-separated values, interpreted as the mutation rate and the tree-shape parameter, respectively. Assume the file ex.par looks like this:

1 1.75
2 1.5
3 1.25
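Since the driving value method evaluates a whole list of parameter pairs, a parameter file for a rectangular grid (for instance to draw a likelihood surface) can be generated with a few lines. This sketch assumes only the two-column format just described; the grid values and the file name grid.par are illustrative.

```python
from itertools import product
from typing import Iterable


def format_param_file(rates: Iterable[float], shapes: Iterable[float]) -> str:
    """Return one 'mutation_rate tree_shape' pair per line for the full grid."""
    lines = [f"{r:g} {a:g}" for r, a in product(rates, shapes)]
    return "\n".join(lines) + "\n"


grid = format_param_file([0.5, 1.0, 1.5], [1.25, 1.5, 1.75])
# with open("grid.par", "w") as fh:  # hypothetical file name, then pass -pa grid.par
#     fh.write(grid)
```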

The program has to be called as follows:

MetaGeneTree -f input.tre -a 1.5 -m 1 -n 0 -pa ex.par -r 10000 -s 4711 -d 8

The primary parameters given by -a and -m are used as driving values, and likelihoods are estimated for the parameters in the file. Since histories (independent runs) are now simulated under a given proposal distribution (-d 8), the number of independent runs (-r) and a seed (-s) have to be specified. The option -n 0 tells the program to calculate the likelihoods for all subconfigurations with 0 segregating sites; thus it simulates and evaluates the full path of each history back to the root configuration. The output is:

1 1.75 0.000125178 1.26983e-06
2 1.5 4.91742e-05 3.40743e-07
3 1.25 1.24948e-05 1.11985e-07

As before, the first two values give the respective mutation rate and tree-shape parameter. The third value is the estimated likelihood and the fourth value the estimated standard deviation.
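The four-column output lends itself to simple post-processing. The sketch below assumes exactly the layout just described. It reads the rows, picks the parameter pair with the largest estimated likelihood among those evaluated (on a coarse list this is only the best listed pair, not a maximum likelihood estimate), and computes the relative standard deviation of each estimate. All names are illustrative.

```python
from typing import List, NamedTuple


class Estimate(NamedTuple):
    rate: float        # mutation rate
    shape: float       # tree-shape parameter
    likelihood: float  # estimated likelihood
    std: float         # estimated standard deviation


def parse_output(text: str) -> List[Estimate]:
    """Read whitespace-separated result rows with four columns each."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:  # skip anything that is not a result row
            rows.append(Estimate(*map(float, fields)))
    return rows


output = """1 1.75 0.000125178 1.26983e-06
2 1.5 4.91742e-05 3.40743e-07
3 1.25 1.24948e-05 1.11985e-07"""
rows = parse_output(output)
best = max(rows, key=lambda e: e.likelihood)
relative_errors = [e.std / e.likelihood for e in rows]
```

For the output above, the pair (1, 1.75) has the largest estimate, and all three estimates have a relative standard deviation of about one percent.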

To combine the importance sampling method with the pre-calculation, a number between zero and the number of mutations in the sample has to be specified via the -n parameter. The program call

MetaGeneTree -f input.tre -w 0.5 -m 1 -n 2 -r 10000 -s 4711 -d 8

thus results in the pre-calculation of the likelihoods for all subconfigurations having 2 mutations, followed by the simulation of the remaining path according to the proposal distribution -d 8 to obtain the estimate. The program returns

1 0.5 2.70136e-05 1.5913e-07

where again the first two values are the parameters of the analysis, followed by the estimated likelihood and the estimated standard deviation.
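The interplay of -n with importance sampling can be sketched as follows, assuming the usual construction. A history is simulated backwards in time with the proposal until its configuration has n mutations. The accumulated ratio of model to proposal transition probabilities is then multiplied by the pre-calculated exact likelihood of the configuration reached. The callables propose and mutations and the table exact are hypothetical placeholders, not MetaGeneTree internals.

```python
import random
from typing import Callable, Dict, Hashable, Tuple

Config = Hashable  # placeholder for a sample configuration


def hybrid_weight(
    start: Config,
    n_stop: int,
    mutations: Callable[[Config], int],
    propose: Callable[[Config, random.Random], Tuple[Config, float, float]],
    exact: Dict[Config, float],
    rng: random.Random,
) -> float:
    """One importance weight: follow the proposal backwards in time until the
    configuration has at most n_stop mutations, then multiply the accumulated
    ratio p/q by the pre-calculated exact likelihood of that configuration."""
    config, weight = start, 1.0
    while mutations(config) > n_stop:
        # propose returns the next configuration together with its transition
        # probability p under the model and q under the proposal.
        config, p, q = propose(config, rng)
        weight *= p / q
    return weight * exact[config]


# Toy chain: a configuration is just its number of mutations, every step removes
# one mutation, and the model and proposal assign probabilities 0.5 and 0.25.
toy_weight = hybrid_weight(
    start=5,
    n_stop=2,
    mutations=lambda c: c,
    propose=lambda c, rng: (c - 1, 0.5, 0.25),
    exact={2: 0.1},
    rng=random.Random(4711),
)
```

Averaging many such weights gives the estimate. Only the steps above level n are random, which is why pre-calculation can reduce the variance, at the cost of computing the exact values for all subconfigurations at that level.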

We now describe the command line options of the program (see also the usage of the program).

-a value Analyse the given sample under the Beta-coalescent $\{\Lambda^{\beta}_{\alpha}\}_{0<\alpha\le 2}$ from Section 5.2.2; value provides the parameter for this family.

-c file Assumes that the likelihoods for subconfigurations of the given (-n) size have been previously calculated and stored in file. The values are loaded, roughly checked, and then the values are used as a basis for the importance sampling estimation specified by the other parameters.

-d value Specifies the proposal distribution used for the analysis. The value can be a number between 1 and 19, but we only present the options corresponding to proposal distributions from Section 4.2 and refer to the documentation for the remaining distributions. The possible values are:

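1 – $Q^{GT}_\theta$
7 – $Q^{SD}_\theta$
8 – $Q^{H1}_\theta$
10 – $Q^{H2,f_{2,1}}_\theta$
12 – $Q^{H2,f_{2,2}}_\theta$
14 – $Q^{H2,f_{1,1}}_\theta$
15 – $Q^{H2,f_{1,2}}_\theta$
18 – $Q^{SQH}_\theta$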

-e value Analyse the given sample under the coalescent obtained for γ = 2 in Section 5.2.1 and use value as the parameter for this family.

-f file Specifies the file containing the sample configuration for the analysis.

-g file Pre-calculate the likelihoods of all subconfigurations having a given number of mutations (-n) and then store the results in file.

-h Print the usage.

-l value Specify the limit for splitting runs into groups. If the number of requested independent runs given by the parameter -r is too large, it might be more efficient to analyse smaller groups of runs and suitably combine the results to return an estimate. If the given number of runs is larger than value, then $\sqrt{\#\text{runs}}$ groups of size $\sqrt{\#\text{runs}}$ are formed, analysed separately, and then the results are combined.

-m value The mutation rate for the underlying evolutionary model.

-n value The likelihoods for all subconfigurations having value mutations are calculated using the exact recursion. If the number is greater than or equal to the number of mutations in the sample, then the likelihood of the sample is calculated exactly. If it is smaller, then the values are pre-calculated and combined with the importance weights of partially sampled histories in order to estimate the likelihood. If -n is zero, then the full paths of the histories are simulated.

-pa file Specify a parameter file where each line consists of a mutation rate and a tree-shape parameter for the coalescent model that is used for parameter -a. If this file is specified and the number given by -n is smaller than the number of mutations in the sample, then the driving value method (see Section 5.1.1) is performed. Otherwise the likelihoods for the given parameters are calculated exactly.

-pe file Specify a parameter file where each line consists of a mutation rate and a tree-shape parameter for the coalescent model that is used for parameter -e. If this file is specified and the number given by -n is smaller than the number of mutations in the sample, then the driving value method (see Section 5.1.1) is performed. Otherwise the likelihoods for the given parameters are calculated exactly.

-pw file Specify a parameter file where each line consists of a mutation rate and a tree-shape parameter for the coalescent model that is used for parameter -w. If this file is specified and the number given by -n is smaller than the number of mutations in the sample, then the driving value method (see Section 5.1.1) is performed. Otherwise the likelihoods for the given parameters are calculated exactly.

-r value The number of independent runs of the Markov chain or histories to be simulated in order to estimate the likelihood value.

-s value Specify the seed.

-t value The value is either a single value or a vector given as “(⟨value⟩, ..., ⟨value⟩)”. If it is a single value, then the probability, given the sample and parameters, that the most recent common ancestor lived before time value is returned. If value is a vector, then this analysis is performed for each entry.

-v Produce more verbose output.

-w value Analyse the given sample under the single-atom coalescent $\{\Lambda^{\mathrm{atom}}_{\psi}\}_{0\le\psi\le 1}$ obtained for 1 < γ < 2 in Section 5.2.1 and use value as the parameter for this family.

-x Print all importance sampling weights obtained by the independent runs.
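How the options -n and -l select a computation can be summarised in a short sketch. Two details are assumptions of this sketch: the rounding used when the number of runs is not a perfect square, and the size-weighted average used to combine groups. The descriptions of -n and -l only state which method runs, and that about √#runs groups of size √#runs are formed and suitably combined. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def analysis_mode(n: int, sample_mutations: int) -> str:
    """Computation selected by -n, following the option descriptions."""
    if n >= sample_mutations:
        return "exact recursion"
    if n == 0:
        return "importance sampling of full histories"
    return "pre-calculation combined with importance sampling"


def group_sizes(runs: int, limit: int) -> List[int]:
    """Split runs into roughly sqrt(runs) groups of roughly sqrt(runs) runs
    once runs exceeds limit; any remainder forms one extra, smaller group."""
    if runs <= limit:
        return [runs]
    size = math.isqrt(runs)
    full, rest = divmod(runs, size)
    return [size] * full + ([rest] if rest else [])


def combine_groups(groups: Sequence[Tuple[int, float]]) -> float:
    """Combine (group size, mean weight) pairs into the overall mean weight."""
    total = sum(size for size, _ in groups)
    return sum(size * mean for size, mean in groups) / total
```

With -r 10000 and a limit below 10000, for instance, the sketch forms 100 groups of 100 runs, and the size-weighted combination returns the same value as averaging all weights at once.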

A.2. TreeCount

The tool TreeCount can be used to calculate the number of subconfigurations of a given sample (z,n) that have a given number of mutations. Note that the mutations are labelled. The program is located in the directory tree count/, and the command ./configure && make produces the executable binary bin/TreeCount. Again, it may be necessary to reconfigure first via autoreconf -f -i -v. The program can be called with the following options:

-f file This file contains the sample configuration (z,n).

-h Print the usage.

-v Produce more verbose output.

The program counts the number of subconfigurations for each number of mutations up to the number of mutations in the sample and returns the total number of subconfigurations. If the verbose option -v is set, the number of subconfigurations for each mutation number is also returned.

A.3. Data

Parts of the data for the different analyses performed in this work are also located on the compact disc.

Performance Comparison

In the folder performance comparison/ the data for the analysis in Section 4.3 are provided. The various scripts in the main folder can be used to generate sample configurations, analyse them, and collect the number of runs and the times needed for the analysis.

The subdirectory lines plot/ provides the data for Figure 4.9. The other subdirectories contain the simulated trees and the collected data from the program calls for the empirical runtime distributions presented throughout Section 4.3.

Empirical Distributions

In the directory empirical distributions/ the data for the analysis presented in Section 5.5 are provided. The subfolders contain trees simulated under the respective parameters. The script driving graduiert approximately.py implements the method described in Section 5.3.1 and can be used to approximate the maximum likelihood parameter values for the different samples. The files *.mles provide the estimates, where the first two values in a row are the mutation rate and the α parameter, respectively, and the last value gives the corresponding tree number.
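The *.mles files can be summarised into the empirical distribution of the α estimator with a short script. This sketch assumes whitespace-separated columns laid out as just described and ignores any columns between the second and the last. It is not one of the scripts on the compact disc, and the example rows are invented for illustration.

```python
import statistics
from typing import Dict, Tuple


def read_mles(text: str) -> Dict[int, Tuple[float, float]]:
    """Map tree number -> (mutation rate, alpha). The first two columns are the
    estimates and the last column is the tree number; any columns in between
    are ignored here."""
    mles: Dict[int, Tuple[float, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        mles[int(float(fields[-1]))] = (float(fields[0]), float(fields[1]))
    return mles


def alpha_summary(mles: Dict[int, Tuple[float, float]]) -> Tuple[float, float]:
    """Empirical mean and standard deviation of the alpha estimates."""
    alphas = [alpha for _, alpha in mles.values()]
    return statistics.mean(alphas), statistics.stdev(alphas)


# Made-up rows for illustration only, not values from the compact disc.
example = "1.1 1.6 0\n0.9 1.4 1\n1.0 1.5 2\n"
mean_alpha, sd_alpha = alpha_summary(read_mles(example))
```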

Real Data

The directory real data analysis/ contains the data for the analysis presented in Section 5.6. The subdirectories contain the tree files corresponding to the datasets, with the file extension .tre. The first-level subdirectories contain the samples of the Case 1) analysis, whereas the Case 2) samples are provided in consistent data/.

The data for the likelihood surfaces are located in the corresponding subdirectories in the files called complete output. If the directory name includes sssss, then it refers to the analysis based on the frequency spectrum, whereas the absence of this tag refers to the likelihood analysis based on the full sample configuration. The “exact” folders provide exactly calculated likelihoods. “lonely” refers to the method where the estimation at every parameter point is performed independently of the others, whereas “proposal” refers to the described method combining estimates for one parameter pair obtained with different driving values. The scripts located in consistent data/ can be used to perform the respective analyses.

[A04] Árnason, E.: Mitochondrial cytochrome b DNA variation in the high-fecundity Atlantic cod: trans-Atlantic clines and shallow gene genealogy. Genetics, 166(4):1871–1885, (2004).

[ABT99] Arratia, R.; Barbour, A. D.; Tavaré, S.: The Poisson-Dirichlet distribution and the scale-invariant Poisson process. Combin. Probab. Comput., 8(5):407–416, (1999).

[AH07] Alkemper, R.; Hutzenthaler, M.: Graphical representation of some duality relations in stochastic population models. Electron. Comm. Probab., 12:206–220, (2007).

[AP96] Árnason, E.; Pálsson, S.: Mitochondrial cytochrome b DNA sequence variation of Atlantic cod, Gadus morhua, from Norway. Mol. Ecol., 5:715–724, (1996).

[APKS00] Árnason, E.; Petersen, P. H.; Kristinsson, K.; Sigurgíslason, H.: Mitochondrial cytochrome b DNA sequence variation of Atlantic cod from Iceland and Greenland. J. Fish Biol., 56:409–430, (2000).

[APP98] Árnason, E.; Petersen, P. H.; Pálsson, S.: Mitochondrial cytochrome b DNA sequence variation of Atlantic cod, Gadus morhua, from the Baltic and the White Seas. Hereditas, 129(1):37–43, (1998).

[AS05] Athreya, S. R.; Swart, J. M.: Branching-coalescing particle systems. Probab. Theory Related Fields, 131(3):376–414, (2005).

[BB08] Birkner, M.; Blath, J.: Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. J. Math. Biol., 57(3):435–465, (2008).

[BB09] Birkner, M.; Blath, J.: Measure-valued diffusions, general coalescents and population genetic inference. In: Trends in Stochastic Analysis, London Mathematical Society Lecture Note Series (No. 353), pages 329–363. Cambridge University Press, (2009).

[BBB94] Boom, J. D. G.; Boulding, E. G.; Beckenbach, A. T.: Mitochondrial DNA variation in introduced populations of Pacific oyster, Crassostrea gigas, in British Columbia. Can. J. Fish. Aquat. Sci., 51:1608–1614, (1994).

[BBC+05] Birkner, M.; Blath, J.; Capaldo, M.; Etheridge, A.; Möhle, M.; Schweinsberg, J.; Wakolbinger, A.: Alpha-stable branching and beta-coalescents. Electron. J. Probab., 10:no. 9, 303–325 (electronic), (2005).

[BBM+09] Birkner, M.; Blath, J.; Möhle, M.; Steinrücken, M.; Tams, J.: A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks. ALEA Lat. Am. J. Probab. Math. Stat., 6:25–61, (2009).


[BBS09a] Birkner, M.; Blath, J.; Steinrücken, M.: Analysis of DNA sequence variation under highly skewed offspring distributions and application to Atlantic cod and Pacific oyster data (working title). Manuscript, (2009).

[BBS09b] Birkner, M.; Blath, J.; Steinrücken, M.: Importance sampling for Λ-coalescents in the infinitely many sites model (working title). Manuscript, (2009).

[BLG03] Bertoin, J.; Le Gall, J.-F.: Stochastic flows associated to coalescent processes. Probab. Theory Related Fields, 126(2):261–288, (2003).

[BLG05] Bertoin, J.; Le Gall, J.-F.: Stochastic flows associated to coalescent processes. II. Stochastic differential equations. Ann. Inst. H. Poincaré Probab. Statist., 41(3):307–333, (2005).

[BLG06] Bertoin, J.; Le Gall, J.-F.: Stochastic flows associated to coalescent processes. III. Limit theorems. Illinois J. Math., 50(1-4):147–181 (electronic), (2006).

[BS98] Bolthausen, E.; Sznitman, A.-S.: On Ruelle’s probability cascades and an abstract cavity method. Comm. Math. Phys., 197(2):247–276, (1998).

[C74] Cannings, C.: The latent roots of certain Markov chains arising in genetics: a new approach. I. Haploid models. Adv. in Appl. Probab., 6:260–290, (1974).

[C75] Cannings, C.: The latent roots of certain Markov chains arising in genetics: a new approach. II. Further haploid models. Adv. in Appl. Probab., 7:264–282, (1975).

[C99] Chang, J. T.: Recent common ancestors of all present-day individuals. Adv. in Appl. Probab., 31(4):1002–1038, (1999). With discussion and reply by the author.

[CM91] Carr, S. M.; Marshall, H.: Detection of intraspecific DNA sequence variation in the mitochondrial cytochrome b gene of Atlantic cod (Gadus morhua) by the polymerase chain reaction. Can. J. Fish. Aquat. Sci., 48:48–52, (1991).

[CSHW95] Carr, S. M.; Snellen, A. J.; Howse, K. A.; Wroblewski, J. S.: Mitochondrial DNA sequence variation and genetic stock structure of Atlantic cod (Gadus morhua) from bay and offshore locations on the Newfoundland continental shelf. Mol. Ecol., 4(1):79–88, (1995).

[D86] Donnelly, P.: Dual processes in population genetics. In: Stochastic spatial processes (Heidelberg, 1984), volume 1212 of Lecture Notes in Math., pages 94–105. Springer, Berlin, (1986).

[D96] Donnelly, P.: Interpreting genetic variability: The effects of shared evolutionary history. Ciba Found. Symp., 197:25–40, (1996).

[D08] Durrett, R.: Probability Models for DNA Sequence Evolution (Probability and Its Applications). Springer, Berlin, 2nd edition, (2008).

[DGP06] Dong, R.; Gnedin, A.; Pitman, J.: Exchangeable partitions derived from Markovian coalescents. http://arxiv.org/abs/math/0603745, version 1, (2006).

[DIG04] De Iorio, M.; Griffiths, R. C.: Importance sampling on coalescent histories. I. Adv. in Appl. Probab., 36(2):417–433, (2004).

[DK96] Donnelly, P.; Kurtz, T. G.: A countable representation of the Fleming-Viot measure-valued diffusion. Ann. Probab., 24(2):698–742, (1996).

[DK99] Donnelly, P.; Kurtz, T. G.: Particle representations for measure-valued population models. Ann. Probab., 27(1):166–205, (1999).

[DS04] Durrett, R.; Schweinsberg, J.: Approximating selective sweeps. Theor. Popul. Biol., 66(2):129–138, (2004).

[DS05] Durrett, R.; Schweinsberg, J.: A coalescent model for the effect of advantageous mutations on the genealogy of a population. Stochastic Process. Appl., 115(10):1628–1657, (2005).

[E72] Ewens, W. J.: The sampling theory of selectively neutral alleles. Theoret. Popul. Biol., 3:87–112; erratum, ibid. 3 (1972), 240; erratum, ibid. 3 (1972), 376, (1972).

[E00] Etheridge, A. M.: An introduction to superprocesses, volume 20 of University Lecture Series. American Mathematical Society, Providence, RI, (2000).

[EG87] Ethier, S. N.; Griffiths, R. C.: The infinitely-many-sites model as a measure-valued diffusion. Ann. Probab., 15(2):515–545, (1987).

[EK86] Ethier, S. N.; Kurtz, T. G.: Markov processes. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, (1986).

[EK93] Ethier, S. N.; Kurtz, T. G.: Fleming-Viot processes in population genetics. SIAM J. Control Optim., 31(2):345–386, (1993).

[EW06] Eldon, B.; Wakeley, J.: Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics, 172:2621–2633, (2006).

[F22] Fisher, R. A.: On the dominance ratio. Proc. Roy. Soc. Edin., 42:321–431, (1922).

[FL93] Fu, Y.-X.; Li, W.-H.: Statistical tests of neutrality of mutations. Genetics, 133(3):693–709, (1993).

[FM09] Freund, F.; Möhle, M.: On the time back to the most recent common ancestor and the external branch length of the Bolthausen-Sznitman coalescent. Submitted to Markov Process. Related Fields, (2009).

[FV79] Fleming, W. H.; Viot, M.: Some measure-valued Markov processes in population genetics theory. Indiana Univ. Math. J., 28(5):817–843, (1979).

[G87] Griffiths, R. C.: Counting genealogical trees. J. Math. Biol., 25(4):423–431, (1987).

[G91] Gusfield, D.: Efficient algorithms for inferring evolutionary trees. Networks, 21(1):19–28, (1991).

[GJS08] Griffiths, R. C.; Jenkins, P. A.; Song, Y. S.: Importance sampling and the two-locus model with subdivided population structure. Adv. in Appl. Probab., 40(2):473–500, (2008).

[GPW09] Greven, A.; Pfaffelhuber, P.; Winter, A.: Convergence in distribution of random metric measure spaces (Λ-coalescent measure trees). Probab. Theory Related Fields, 145(1-2):285–322, (2009).

[GT94] Griffiths, R. C.; Tavaré, S.: Ancestral inference in population genetics. Statist. Sci., 9(3):307–319, (1994).

[GT97] Griffiths, R. C.; Tavaré, S.: Computational methods for the coalescent. IMA Vol. Math. Applic., 87:165–182, (1997).

[H84] Hoppe, F. M.: Pólya-like urns and the Ewens’ sampling formula. J. Math. Biol., 20(1):91–94, (1984).

[H94] Hedgecock, D.: Does variance in reproductive success limit effective population size of marine organisms? In: Genetics and Evolution of Aquatic Organisms, pages 123–134. Chapman and Hall, London, (1994).

[HUW08] Hobolth, A.; Uyenoyama, M. K.; Wiuf, C.: Importance sampling for the infinite sites model. Stat. Appl. Genet. Mol. Biol., 7, Iss. 1:Article 32, (2008).

[HW09] Hobolth, A.; Wiuf, C.: The genealogy, site frequency spectrum and ages of two nested mutant alleles. Theor. Popul. Biol., 75(4):260–265, (2009).

[JB96] Johansen, S.; Bakke, I.: The complete mitochondrial DNA sequence of Atlantic cod (Gadus morhua): relevance to taxonomic studies among codfishes. Mol. Mar. Biol. Biotechnol., 5(3):203–214, (1996).

[JS04] Jagers, P.; Sagitov, S.: Convergence to the coalescent in populations of substantially varying size. J. Appl. Probab., 41(2):368–378, (2004).

[K69] Kimura, M.: The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics, 61(4):893–903, (1969).

[K75] Kingman, J. F. C.: Random discrete distributions. J. Royal Statist. Soc. B, 37:1–22, (1975).

[K82a] Kingman, J. F. C.: The coalescent. Stochastic Process. Appl., 13(3):235–248, (1982).

[K82b] Kingman, J. F. C.: On the genealogy of large populations. J. Appl. Probab., 19A:27–43, (1982). Essays in statistical science.

[K93] Kingman, J. F. C.: Poisson processes, volume 3 of Oxford Studies in Probability. The Clarendon Press Oxford University Press, New York, (1993). Oxford Science Publications.

[K07] Klenke, A.: Probability Theory: A Comprehensive Course (Universitext). Springer, Berlin, 1st edition, (2007).

[KM72] Karlin, S.; McGregor, J.: Addendum to a paper of W. Ewens. Theor. Popul. Biol., 3:113–116, (1972).

[KR08] Kurtz, T. G.; Rodrigues, E.: Poisson representations of branching Markov and measure-valued branching processes. Preprint, (2008).

[KT81] Karlin, S.; Taylor, H. M.: A second course in stochastic processes. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York, (1981).

[L85] Liggett, T. M.: Interacting particle systems, volume 276 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, New York, (1985).

[M58] Moran, P. A. P.: Random processes in genetics. Proc. Camb. Philos. Soc., 54:60–71, (1958).

[M99] Möhle, M.: The concept of duality and applications to Markov processes arising in neutral population genetics models. Bernoulli, 5(5):761–777, (1999).

[M00] Möhle, M.: Total variation distances and rates of convergence for ancestral coalescent processes in exchangeable population models. Adv. Appl. Probab., 32:983–993, (2000).

[M01] Möhle, M.: Forward and backward diffusion approximations for haploid exchangeable population models. Stochastic Process. Appl., 95:133–149, (2001).

[M06] Möhle, M.: On sampling distributions for coalescent processes with simultaneous multiple collisions. Bernoulli, 12(1):35–53, (2006).

[MS01] Möhle, M.; Sagitov, S.: A classification of coalescent processes for haploid exchangeable population models. Ann. Probab., 29(4):1547–1562, (2001).

[MS03] Möhle, M.; Sagitov, S.: Coalescent patterns in diploid exchangeable population models. J. Math. Biol., 47(4):337–352, (2003).

[N64] Nagasawa, M.: Time reversions of Markov processes. Nagoya Math. J., 24:177–204, (1964).

[P99] Pitman, J.: Coalescents with multiple collisions. Ann. Probab., 27(4):1870–1902, (1999).

[P01] Pogson, G. H.: Nucleotide polymorphism and natural selection at the pantophysin (Pan I) locus in the Atlantic cod, Gadus morhua (L.). Genetics, 157(1):317–330, (2001).

[PC93] Pepin, P.; Carr, S. M.: Morphological, meristic, and genetic analysis of stock structure in juvenile Atlantic cod (Gadus morhua) from the Newfoundland shelf. Can. J. Fish. Aquat. Sci., 50:1924–1933, (1993).

[R59] Rosenblatt, M.: Functions of a Markov process that are Markovian. J. Math. Mech., 8:585–596, (1959).

[RW94] Rogers, L. C.; Williams, D.: Diffusions, Markov Processes and Martingales, volume 1. Wiley, New York, 2nd edition, (1994).

[RY05] Revuz, D.; Yor, M.: Continuous Martingales and Brownian Motion (Grundlehren der mathematischen Wissenschaften Volume 293 A Series of Comprehensive Studies in Mathematics). Springer, Berlin, 3rd edition, (2005).

[S99] Sagitov, S.: The general coalescent with asynchronous mergers of ancestral lines. J. Appl. Probab., 36(4):1116–1125, (1999).

[S00a] Schweinsberg, J.: Coalescents with simultaneous multiple collisions. Electron. J. Probab., 5:Paper no. 12, 50 pp. (electronic), (2000).

[S00b] Stephens, M.: Times on trees, and the age of an allele. Theor. Popul. Biol., 57(2):109–119, (2000).

[S03a] Sagitov, S.: Convergence to the coalescent with simultaneous multiple mergers. J. Appl. Probab., 40(4):839–854, (2003).

[S03b] Schweinsberg, J.: Coalescent processes obtained from supercritical Galton-Watson processes. Stochastic Process. Appl., 106(1):107–139, (2003).

[SA03] Sigurgíslason, H.; Árnason, E.: Extent of mitochondrial DNA sequence variation in Atlantic cod from the Faroe Islands: a resolution of gene genealogy. Heredity, 91(6):557–564, (2003).