Empirical Runtime Distributions

4. Importance Sampling in the Infinitely-Many-Sites-Model 89

4.3. Performance Comparison

4.3.2. Empirical Runtime Distributions - Random Samples

We now investigate the performance of the different proposal distributions on samples simulated with the the Markov urn from Definition 3.28 under the Beta-coalescent. For given parametersθ= (r, α) we simulate a sample (t,n) and then estimatec(t,n)L_[t,n](θ) according to equation (4.4) based on several histories independently sampled from the given proposal distribution. Due to Remark 3.24(ii) we multiply the likelihood by c(t,n) to simplify the combinatorial structure of the computations. In the following we occasionally refer to one sampled history as one run. Due to the law of large numbers the estimate converges to the true value with an increasing number of runs.

For a given number of runs the standard deviation would provide a measure of ac-curacy for the estimator. This value is again a random quantity but can be estimated bysd(c(t,b n)L_[t,n](θ)) :=q

¯ v

M, where ¯v is the empirical variance of the single estimates from each run and M is the number of independent runs. Based on this define the

relative error of the estimate by

err := sdb L_[t,n](θ) L_[t,n](θ) .

Intuitively speaking, the negative base-10 logarithm of the relative error roughly gives the number of correct digits of the estimate and it decreases as the variance of the estimator decreases.

One Sample

We first investigate the decrease of the relative error for one sample as the number of runs increases. In Figure 4.9 the base-10 logarithm of the relative error achieved by the different proposal distributions is plotted against the number of runs, again on a base-10 logarithmic scale. The underlying sample was generated randomly with parameters (r = 1.52, α= 1.00) and sample size 15. The likelihood was estimated for the same parameters. The solid horizontal line at 0 denotes a relative error of one, which corresponds to the standard deviation being on the same order of magnitude as the estimate. The dotted line at -2 corresponds to a relative error of -0.01.

Figure 4.9(a) shows that Q^H_θ¹ outperforms Q^GT_θ and Q^SD_θ for this sample, since a relative error of 0.01 can be achieved with a smaller number of independent runs.

Q^Hθ²^f^2,1 performs slightly worse thenQ^Hθ¹ butQ^Hθ²^f^2,2 outperforms both of them, which is shown in Figure 4.9(b). Finally, it is shown in Figure 4.9(c) that the distributions Q^Hθ¹ and Q^SQH_θ are comparable and so are Q^H_θ²^f^1,1 and Q^H_θ²^f^1,2, whereas the latter pair is slightly better. The proposal distributionQ^H_θ²^f^2,2 performs best for this sample.

Many Samples

In the previous paragraphs we compared the performance of the different proposal distri-butions for the inference of the likelihood of a specific sample under specific parameters.

To complement this we now simulate 100 samples under a given pair of parameters and estimate the likelihood for these samples with the same parameters.

We start the analysis with 100 independent runs. Increase this number in each step by multiplying the previous number with four. Perform the given number of runs and combine the result with all results from previous stages. Thus the number of runs increases linearly in the logarithmic scale. We increase the number of runs stepwise until the relative error is less than 0.01. Histograms for the number of runs to achieve this are given in Figure 4.10(a) for the parameters (1, 1.5) and in Figure 4.11(a) for the parameters (1, 2) under certain proposal distributions. Since one simulated sample for α= 2 showed no mutations, we assumed the number of runs to be one.

Another way to measure the time required by a certain method to achieve a relative error below 0.01 is the actual time in seconds the computer needed to perform the computations. The simulations were run on a Cluster with 22 CPUs of which 13 were Opteron 252 CPUs each consisting of 2 cores with 2.6 GHz. The remaining 9 CPUs were

2 3 4 5 6 7

0-1-2-3

Q^GT_θ Q^SD_θ Q^H_θ¹

log(#)

log(err)

(a) Performance ofQ^H_θ¹,Q^GT_θ andQ^SD_θ .

2 3 4 5 6 7

0-1-2-3

Q^H_θ¹ Q^H_θ²^f²^,¹ Q^H_θ²^f²^,²

log(#)

log(err)

(b) Performance of Q^H_θ¹, Q^H_θ²^f^2,1 and Q^H_θ²^f²^,².

2 3 4 5 6 7

0-1-2-3

Q^H_θ¹ Q^H_θ²^f¹^,¹ Q^H_θ²^f¹^,² Q^SQH_θ

log(#)

log(err)

Figure 4.9.: The relative error (err) plotted against the number of runs (#) needed to achieve this error for the different proposal distribution. Both axis are plotted in a base-10 logarithmic scale. The sample of size 15 was simulated under (r = 1.52, α= 1.00) and analysed with the same parameters. The solid line corresponds to a relative error of 1 and the dotted line to 0.01

010203040506070

-0.4 1 2.3 3.6 4.8 6 7.2

Q^GT_θ Q^H_θ¹ Q^H_θ²^f²^,² Q^H_θ²^f¹^,¹

log(#)

count

(a) Histogram of the base-10 logarithmic number of runs needed to obtain a relative error less than 0.01.

010203040506070

-1.5 -0.5 0.5 1.5 2.5

Q^GT_θ Q^H_θ¹ Q^H_θ²^f²^,² Q^H_θ²^f¹^,¹

log(t)

count

(b) Histogram of the base-10 logarithmic real-time needed to obtain a relative error less than 0.01.

Figure 4.10.: Empirical distributions for the number of runs and the real-time for 100 samples of size 15 with, simulated withr= 1 andα= 1.5. The likelihood was computed for the same parameters.

DualCore-Opteron 2218 with each core having 2.6 GHz. The base-10 logarithms of the corresponding times are given in Figure 4.10(b) and Figure 4.11(b) for certain proposal distributions. We assumed a duration of zero for the sample showing no mutation.

Certainly, these times are implementation dependent but as already pointed out, there exist fundamental differences in the algorithms for different proposal distributions.

Whereas the complexity of the proposal distributions introduced in Section 4.2.1 and Section 4.2.2 is basically determined by the number of types and possible steps, the proposal distributions introduced in Section 4.2.3 depend linearly or quadratically on the number of mutations in the sample configuration.

Figure 4.10 and 4.11 show that the ranking of the proposal distributions according to the necessary number of runs basically agrees with the one in Figure 4.9 except that Q^H_θ²^f^1,1 outperforms the other methods in the Kingman case (α = 2). However, the ranking obtained from the real runtimes is different. In both cases theQ^H_θ²^f^1,1 requires the largest amount of time. This is due to the fact that this proposal distribution is quadratic in the number of segregating sites. Thus one independent run takes more time than for example in theQ^GTθ case. In the real time ranking the proposal distribution Q^GT_θ performs well and is the best candidate for r = 1 and α = 2. The proposal distributions Q^Hθ¹ and Q^H_θ²^f^2,2 show a similar performance where the latter is slightly advantageous.

010203040506070

-0.4 1 2.3 3.6 4.8 6 7.2

Q^GT_θ Q^H_θ¹ Q^H_θ²^f²^,² Q^H_θ²^f¹^,¹

log(#)

count

(a) Histogram of the base-10 logarithmic number of runs needed to obtain a relative error less than 0.01.

010203040506070

-1.5 -0.5 0.5 1.5 2.5

Q^GT_θ Q^H_θ¹ Q^H_θ²^f²^,² Q^H_θ²^f¹^,¹

log(t)

count

(b) Histogram of the base-10 logarithmic real-time needed to obtain a relative error less than 0.01.

Figure 4.11.: Empirical distributions for the number of runs and the real-time for 100 samples of size 15 with, simulated with r = 1 and α = 2. Again, the likelihood was computed for the same parameters.

Im Dokument Multiple Merger Coalescents and Population Genetic Inference (Seite 123-127)