

5. Inference of Evolutionary Parameters from Datasets

5.6. Real data

5.6.4. Possible biological causes for shallow genealogies

                    Case 1)                          Case 2)
No.  Ref.       (r̂, α̂) (fs)    (r̃, α̃) (flh)    (r̂, α̂) (fs)    (r̃, α̃) (flh)
1    [AP96]     (2.4, 1.8)      (0.75, 2.0)      (1.15, 1.58)    (0.7, 1.65)
2    [APP98]    (1.3, 1.72)     (0.35, 1.5)      (1.15, 1.58)    (0.6, 1.55)
3    [APKS00]   (1.2, 1.62)     (0.6, 1.5)       (1.1, 1.5)      (0.7, 1.5)
4    [CM91]     (0.8, 1.33)     (0.75, 1.4)      (0.8, 1.33)     (0.8, 1.4)
5    [CSHW95]   (1.1, 1.53)     (0.6, 1.45)      (1.25, 1.58)    (0.6, 1.5)
6    [PC93]     (0.6, 1.3)      (0.65, 1.4)      (0.6, 1.3)      (0.6, 1.4)
7    [SA03]     (0.8, 1.32)     (0.7, 1.35)      (0.85, 1.3)     (0.7, 1.3)

Table 5.6.: Inference results for Atlantic cod datasets under the Beta-coalescent based on summary statistics (fs) and on the complete information (flh).

                    Case 1)                          Case 2)
No.  Ref.       (r̂, ψ̂) (fs)    (r̃, ψ̃) (flh)    (r̂, ψ̂) (fs)    (r̃, ψ̃) (flh)
1    [AP96]     (0.65, 0.015)   (0.6, 0.025)     (0.8, 0.036)    (0.9, 0.03)
2    [APP98]    (0.5, 0.024)    (0.45, 0.045)    (0.8, 0.036)    (0.75, 0.045)
3    [APKS00]   (0.7, 0.039)    (0.75, 0.04)     (0.85, 0.06)    (1.05, 0.04)
4    [CM91]     (0.85, 0.102)   (1.05, 0.07)     (0.85, 0.102)   (1.05, 0.07)
5    [CSHW95]   (1, 0.024)      (1.2, 0.02)      (0.95, 0.021)   (1.6, 0)
6    [PC93]     (0.75, 0.078)   (1.05, 0.035)    (0.75, 0.078)   (1.05, 0.035)
7    [SA03]     (0.95, 0.084)   (1.2, 0.06)      (0.95, 0.096)   (1.2, 0.065)

Table 5.7.: Inference results for Atlantic cod datasets under the ψ-coalescent based on summary statistics (fs) and on the complete information (flh).

In the next section we discuss the evidence for non-Kingman type genealogies in the Atlantic cod data.

sites and even balanced polymorphisms due to selection by the cellular machinery” [A04, p. 1882].

Concerning indirect selection acting on the mitochondrial genome, Árnason states that frequent selective sweeps of mitochondrial variation could have brought haplotypes to high frequencies through linkage. Indeed, it is shown for example in [DS04, DS05] that recurrent selective sweeps could explain multiple mergers in genealogies. Árnason does not dismiss this hypothesis but considers it rather unlikely since “the main difficulty is to explain why there would be so much adaptive evolution going on for mitochondrial activity in cod” [A04, p. 1883].

High frequencies of different haplotypes might also be explained by population structure, either resulting from the population splitting into various subgroups in different refuges during the ice age(s) or due to local environmental adaptations which are linked to the observed region of the genome. However, this would lead to deep branches in the genealogy as, for example, observed in [P01]. Thus Árnason argues that “the shallowness of the genealogy is evidence against these explanations” [A04, p. 1883].

Furthermore, he points out that under these hypotheses, due to considerable environmental differences, one would expect the Baltic cod to be most genetically divergent, which is not the case.

Since the Atlantic cod shows high fitness variance, Árnason considers a sweepstake-like mechanism (in the sense of e.g. [H94]), where a few highly successful individuals give birth to a major part of the next generation, to be the most plausible cause of the observed shallowness of the genealogies. Multiple merger coalescents can lead to shallow genealogies and therefore present an appealing framework to deal with these sweepstake scenarios. Since such mechanisms also predict high temporal variance in the type configurations, Árnason claims that “studies of temporal variation are called for to test [the hypothesis] and better resolve the differences between historical and contemporary factors influencing variance in offspring number and effective population sizes” [A04, p. 1883].

It has been discussed that the reproduction mechanism of the Pacific oyster exhibits sweepstake-like features, cf. for example [H94] or [EW06]. But, as already mentioned above, the population considered in [BBB94] was introduced a century ago, and thus it is likely that the population went through a relatively recent bottleneck. The possibility that the bottleneck influenced the genealogy cannot be ruled out, and therefore the observed shallowness cannot be attributed solely to the natural reproduction mechanism of the Pacific oyster.

Still, in view of Section 2.6.1, for example, multiple merger coalescents, or even more general models with simultaneous multiple mergers, might also be a suitable tool to address bottleneck scenarios.

Model selection

We argued that multiple merger coalescents might be a good way to model neutral populations with a sweepstake-like reproduction mechanism.

The model of [EW06] introduced in Section 5.2.1 captures the basic features of sweepstakes as it allows single individuals to give birth to a substantial fraction of the next generation. However, the authors point out that this is a simple idealised model. Indeed, it seems unclear why there should be reproduction sweepstakes in a real population where each time exactly one uniformly chosen individual produces a fixed fraction of the offspring in the next generation, and this fraction stays constant over time. Still, this model serves well as a toy model to investigate characteristic patterns generated by sweepstake-like reproduction mechanisms, as for example in [EW06].
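The idealised reproduction step criticised above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sweepstake_step`, the choice that the winner itself survives, and the uniform choice of the replaced individuals are hypothetical simplifications and not the exact formulation of [EW06].

```python
import random
from collections import Counter


def sweepstake_step(population, psi, rng):
    """One sweepstake event: a uniformly chosen winner replaces a fixed
    fraction psi of the population with copies of its own type.

    Assumption (not taken from the source): the winner survives and the
    replaced individuals are drawn uniformly from the remaining ones.
    """
    n = len(population)
    n_offspring = min(int(psi * n), n - 1)   # fixed fraction, constant over time
    winner = rng.randrange(n)                # exactly one uniformly chosen parent
    others = [i for i in range(n) if i != winner]
    replaced = set(rng.sample(others, n_offspring))
    return [population[winner] if i in replaced else population[i]
            for i in range(n)]


rng = random.Random(1)
start = list(range(100))                     # 100 distinct ancestral types
after = sweepstake_step(start, 0.25, rng)
largest_family = Counter(after).most_common(1)[0][1]
```

Starting from 100 distinct types with psi = 0.25, a single event leaves one family of 26 individuals (the winner and its 25 offspring), which illustrates how one reproduction event can merge many lineages at once.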

We argue that Schweinsberg’s population model and the associated coalescent introduced in Section 5.2.2 present a proper way to model neutral populations showing sweepstake-like reproduction as, for example, discussed in [H94]. In the first step it allows every individual to have a large (heavy-tailed) number of offspring, independently of the other individuals. As mentioned before, this reflects the high fertility of each individual. In the second step a small number of offspring, compared to the total number of potential offspring, is chosen to form the next generation. We already pointed out that this mechanism mimics high mortality early in life, which is also a characteristic feature of type-III survivorship curves.
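The two steps just described can be sketched as a single forward generation. The sketch rests on assumptions that are not in the text above: the offspring law is taken to be a discretised Pareto distribution with P(X >= k) = k^(-alpha), the function name `schweinsberg_generation` is hypothetical, and the `counts` argument of `random.sample` requires Python 3.9 or later.

```python
import random
from collections import Counter


def schweinsberg_generation(n, alpha, rng):
    """Return the parent label of each of the n individuals in the next generation."""
    # Step 1: every individual independently produces a heavy-tailed
    # number of potential offspring, here with P(X >= k) = k**(-alpha).
    offspring = []
    for _ in range(n):
        u = 1.0 - rng.random()               # uniform on (0, 1], never zero
        offspring.append(int(u ** (-1.0 / alpha)))
    # Step 2: n of the potential offspring are sampled without replacement
    # to form the next generation (high mortality early in life).
    return rng.sample(range(n), k=n, counts=offspring)


rng = random.Random(7)
parents = schweinsberg_generation(200, 1.5, rng)
family_sizes = Counter(parents)              # surviving offspring per parent
```

Because every individual has at least one potential offspring in this sketch, the pool always contains at least n juveniles and the sampling step cannot fail. Occasional very large entries in `family_sizes` correspond to the sweepstake winners.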

One can argue that there is no obvious reason why the offspring distribution should exhibit power-law tail behaviour or why the number of individuals that form the next generation should be a given constant which does not vary over time. Still, the advantages of the Beta-coalescents are that they do not a priori restrict the number of sweepstake-winning individuals to one and that they impose no external restriction on the fraction of individuals that stem from a single parent. Furthermore, additional parameters are needed when introducing finer structure for the reproduction mechanism, and this might also depend on the local environment. Thus the Beta-coalescents seem a natural way to model sweepstake-like behaviour while still providing a one-parameter subfamily of coalescent processes, and are therefore suitable for statistical enquiries.
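To make the one-parameter family concrete, the following sketch evaluates merger rates, assuming the usual Beta(2 - α, α) parameterisation with 1 < α < 2, in which a fixed group of k out of b lineages merges at rate B(k - α, b - k + α) / B(2 - α, α). The parameterisation used in Section 5.2.2 is not reproduced here, so this formula and the function names are assumptions made for illustration. The boundary case α = 2, which corresponds to Kingman's coalescent, is excluded from the sketch.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """Rate at which a fixed set of k out of b lineages merges in the
    Beta(2 - alpha, alpha)-coalescent (assumed parameterisation)."""
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if not 1 < alpha < 2:
        raise ValueError("need 1 < alpha < 2")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))


# Rates for every merger size among b = 5 lineages at alpha = 1.5.
rates = {k: beta_coalescent_rate(5, k, 1.5) for k in range(2, 6)}
```

Under this parameterisation a pair among two lineages merges at rate 1 for every α, and rates of larger mergers shrink as α approaches 2, which is why α acts as a single shape parameter measuring the distance from the Kingman case.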

Observed non-Kingmanness

We are not able to present confidence intervals for our maximum likelihood estimators.

Furthermore, as shown in Section 5.5, the variance of the empirical distributions is substantial for sample sizes of about 100 individuals, and we have to expect the confidence intervals to be of the same order of magnitude. Thus it is hard to argue that the point estimates presented in Table 5.6, Table 5.7 and Table 5.2 provide the “real” shape parameter of the underlying genealogies. But in view of the approximate marginal empirical distributions, especially Figure 5.7, it seems unlikely that the observed sample variability is produced under Kingman’s coalescent, since the point estimates from different localities agree quite well on values away from the Kingman axis. Thus we do not claim that the presented coalescents and associated forward models describe the truth about the reproductive behaviour of the Atlantic cod or the Pacific oyster, but we argue that they are quite helpful in giving sound evidence that the presented data are not based on neutral Kingman-type genealogies.

In the last section we present an application of the Λ-coalescent to another interesting population genetic question.

[Figure 5.15: two panels plotting cumulative probability (cum. prob., 0.0 to 1.0) against time in coalescent time units (1 to 4). Panel (a): the distribution functions for three datasets from Case 1), labelled A98, S03 and C91. Panel (b): the distribution functions for three datasets from Case 2), labelled A00, C91 and P93.]

Figure 5.15.: The cumulative distribution function of the time to the most recent common ancestor conditioned on the dataset. For each case only the largest, the smallest and an intermediate function is shown.