
3.2.3 An adaptive Metropolis-Hastings algorithm for posterior sampling

A possible candidate for posterior sampling in our setting is the so-called adaptive algorithm proposed by Haario et al. (2001), a modification of the classical Metropolis-Hastings algorithm. The intuition behind the approach is to adjust the transition proposal distribution q on the run according to the samples generated so far. A natural way of doing this with the standard Metropolis-Hastings algorithm is to first generate a rough sample of the posterior using a possibly inefficient proposal q and afterwards define a new, informed proposal function based on the generated sample to achieve better performance in a second, comprehensive run. The adaptive method takes this approach to the next level, as the proposal function q is adjusted at each step. Thus, the chain continuously "learns" to propose parameter vectors within regions of high posterior density more frequently. Furthermore, the step size is adjusted to obtain a reasonable acceptance rate.

This approach yields decreasing autocorrelation of the MCMC sample and thus a faster convergence rate to the posterior distribution than the classical Metropolis-Hastings algorithm, as shown via simulation by Haario et al. (2001). The algorithm therefore requires fewer costly evaluations of the posterior and likelihood function to generate a representative sample. The adaptive procedure works as illustrated in Algorithm 2.

The adaptive part of the algorithm is found in Step 1 of Algorithm 2. During an initial period (j ≤ K) the transition kernel has a fixed covariance matrix Σ_0 which has to be provided in advance. Classical candidates for this are the prior distribution's covariance matrix or, if available, an educated guess of the posterior's covariance matrix.

After the initial period has passed, the transition density is defined depending on the chain's past values. The reasoning behind this definition is that the chain itself already yields a rough sample from the posterior and therefore provides information regarding the posterior covariance. Using this information ensures that the algorithm is more likely to search within areas of high posterior density and thus avoids improbable parameter vectors due to a badly chosen proposal density.

It is important to point out that the adaptive algorithm does not yield a Markov chain. However, its ergodic properties are still fulfilled, as shown by Roberts and Rosenthal (2007, Corollary 6) and Haario et al. (2001, Theorem 2), such that the resulting sample, if sufficiently large, is still representative of the posterior distribution.

The scaling parameter s_d controls the acceptance ratio of the chain and should ensure an efficient search. According to Gelman et al. (1996) an efficient choice is given by

$$ s_d = \frac{2.4^2}{d}, $$

with d being the model parameter dimension. This scaling parameter achieves a large step size while still maintaining high acceptance rates in the case where the unscaled proposal covariance is of similar magnitude as the posterior covariance. There are also approaches to compute the scaling factor adaptively during the run in order to maximise the expected squared jump size $E\big[\|\vartheta^{(j+1)} - \vartheta^{(j)}\|^2\big]$, see e.g. Pasarica and Gelman (2010). However, this method has so far only been investigated for the univariate case.

Altogether, the adaptive algorithm proposed by Haario et al. (2001) works very efficiently compared to a classical Metropolis-Hastings procedure, as shown in their simulation-based analysis. The superior performance is most pronounced in cases where the initial proposal density does not match the correlation structure of the posterior distribution or where the proposed step sizes would be far too small or too large compared to the posterior variance. In these cases the adaptive method is able to considerably

Algorithm 2: Adaptive Metropolis-Hastings

Input: π(ϑ | D): (unnormalized) posterior density
Input: Σ_0: initial covariance matrix of the proposal distribution
Input: s_d: scaling factor for the sample covariance matrix
Input: ϑ^(0): initial parameter value of the chain
Input: J: length of the chain
Input: K: length of the initial period
Output: Θ = (ϑ^(j))_{j=1,…,J}: sample from the posterior distribution

for j = 1 to J do

1. Set the Gaussian proposal density q_j to be
$$ q_j(\vartheta, \vartheta^*) = \varphi(\vartheta^*; \vartheta, \Sigma_j), $$
where φ is the multivariate normal density with
$$ \Sigma_j = \begin{cases} \Sigma_0, & \text{if } j \le K, \\ s_d \, \widehat{\operatorname{Cov}}\big(\vartheta^{(0)}, \ldots, \vartheta^{(j-1)}\big), & \text{if } j > K, \end{cases} $$
where $\widehat{\operatorname{Cov}}\big(\vartheta^{(0)}, \ldots, \vartheta^{(j-1)}\big)$ denotes the empirical covariance and s_d is a predefined scaling factor.

2. Generate a candidate vector ϑ* using the proposal distribution q_j(ϑ^(j−1), ·).

3. Compute the acceptance probability α = A(ϑ^(j−1), ϑ*) with
$$ A\big(\vartheta^{(j-1)}, \vartheta^*\big) = \min\left\{ 1, \; \frac{\pi(\vartheta^* \,|\, D)}{\pi(\vartheta^{(j-1)} \,|\, D)} \right\}. $$

4. Set ϑ^(j) = ϑ* with probability α and ϑ^(j) = ϑ^(j−1) otherwise.

end

improve the proposal distribution on the run. This makes the presented procedure particularly suitable for Bayesian inference in complex ODE-based settings such as in the context of modelling dynamic disease transmission, because in these models the parameters to be estimated often exhibit high correlation and the marginal posterior variances are difficult to guess in advance. This would make the construction of a suitable fixed proposal distribution practically impossible.
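To make the loop structure of Algorithm 2 concrete, the following is a minimal Python sketch. All function and variable names are our own, and the small diagonal jitter added to the empirical covariance (to keep it positive definite early in the adaptation phase) is a practical safeguard rather than part of the algorithm as stated above.

```python
import numpy as np

def adaptive_mh(log_post, theta0, Sigma0, J, K, rng=None):
    """Sketch of the adaptive Metropolis-Hastings loop (Algorithm 2).

    log_post : log of the (unnormalized) posterior density pi(theta | D)
    theta0   : initial parameter vector, shape (d,)
    Sigma0   : initial proposal covariance, shape (d, d)
    J, K     : chain length and length of the initial period
    """
    rng = np.random.default_rng(rng)
    d = len(theta0)
    s_d = 2.4**2 / d                 # scaling of Gelman et al. (1996)
    jitter = 1e-8 * np.eye(d)        # keeps Sigma_j positive definite
    chain = np.empty((J + 1, d))
    chain[0] = theta0
    lp_curr = log_post(theta0)
    for j in range(1, J + 1):
        # Step 1: fixed covariance during the initial period,
        # scaled empirical covariance of the past values afterwards
        if j <= K:
            Sigma_j = Sigma0
        else:
            Sigma_j = s_d * np.cov(chain[:j].T) + jitter
        # Step 2: propose a candidate from the Gaussian kernel
        cand = rng.multivariate_normal(chain[j - 1], Sigma_j)
        # Steps 3-4: accept with probability min(1, posterior ratio),
        # evaluated on the log scale for numerical stability
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp_curr:
            chain[j], lp_curr = cand, lp_cand
        else:
            chain[j] = chain[j - 1]
    return chain[1:]
```

Note that only one new posterior evaluation is needed per iteration; the adaptation itself reuses the stored chain values.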

3.3 Marginal likelihood and model selection

One aim within the task of estimating the rotavirus vaccine effectiveness through a transmission model in Chapter 5 was to find the model among an ensemble of models ℳ that is most suitable to explain the observed rotavirus incidence data.

As outlined in Chapter 2, in Bayesian statistics the validity of a model i ∈ ℳ is measured by the marginal likelihood f^(M)(D | i) of the data given the model, i.e.

$$ f^{(M)}(D \,|\, i) = \int f^{(i)}\big(D \,|\, \vartheta^{(i)}\big) \, \pi^{(i)}\big(\vartheta^{(i)}\big) \, d\vartheta^{(i)}, $$

which can be interpreted as the expectation of the model-specific likelihood function f^(i) with respect to the prior distribution π^(i).
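This expectation interpretation directly suggests a naive Monte Carlo estimator: average the likelihood over draws from the prior. The following sketch (all names our own) illustrates the idea on the log scale; in practice this estimator has very high variance whenever the likelihood is concentrated relative to the prior, which is the usual situation for informative data.

```python
import numpy as np

def prior_mc_log_marginal_likelihood(loglik, prior_sampler, n=10_000, rng=0):
    """Naive Monte Carlo estimate of log f(D) = log E_prior[f(D | theta)].

    loglik        : log-likelihood as a function of theta
    prior_sampler : function rng -> one draw from the prior
    """
    rng = np.random.default_rng(rng)
    ll = np.array([loglik(prior_sampler(rng)) for _ in range(n)])
    # log-mean-exp for numerical stability
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))
```

For a toy conjugate model (one observation y = 1 from N(θ, 1) with prior θ ~ N(0, 1)), the estimate can be checked against the exact marginal N(0, 2).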

Within this section we will present methods for computing or estimating this marginal likelihood for any given model. In this context, an estimation approach proposed by Chib and Jeliazkov (2001) based on the detailed-balance condition will be explained in more detail. Based on their method we will derive novel marginal likelihood estimation algorithms which better address the challenges of ODE-based models, i.e. a high parameter dimension as well as the computational effort for evaluating these models. The new algorithms' performance is later investigated within a simulation-based setting.

Since in the following we will discuss how to compute the marginal likelihood for a specific model, the index (i) will be dropped and the model-specific likelihood, prior and posterior functions will be denoted by f(D | ·), π(·) and π(· | D), respectively.

Similarly to sampling from the posterior distribution, it is often difficult to compute the marginal likelihood of a given model analytically, since it involves an integration problem which can be solved only for certain classes of prior distribution and likelihood function, e.g. members of the exponential family and their conjugate priors. In other cases the marginal likelihood has to be approximated by numerical methods (Kass and Raftery, 1995).
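As an illustration of the tractable conjugate case, consider a hypothetical beta-binomial example (not a model from this thesis): for k successes in n Bernoulli trials with a Beta(a, b) prior on the success probability, the integral has the closed form C(n, k) B(a + k, b + n − k) / B(a, b), which can be evaluated directly.

```python
import math

def beta_binomial_log_marginal(k, n, a, b):
    """Closed-form log marginal likelihood of the beta-binomial model:
    f(D) = C(n, k) * B(a + k, b + n - k) / B(a, b),
    with B the beta function, computed via log-gamma for stability."""
    log_beta = lambda x, y: math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_binom + log_beta(a + k, b + n - k) - log_beta(a, b)
```

With the uniform prior a = b = 1 the marginal likelihood reduces to 1/(n + 1) for every k, a convenient sanity check.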

For the same reasons as stated in Section 3.2.3 we are interested in an approach that requires only few evaluations of the posterior. Thus, an estimation procedure which could utilize an existing posterior sample would be preferable.

Raftery et al. (2007) outlined an approach which is based solely on an existing posterior sample and does not require any additional posterior evaluations. Their procedure utilizes the harmonic mean identity of the marginal likelihood, i.e. the inverted expectation of the inverted likelihood function with respect to the posterior,

$$ f^{(M)}(D) = \left( \int f(D \,|\, \vartheta)^{-1} \, \pi(\vartheta \,|\, D) \, d\vartheta \right)^{-1}. $$

Thus, the marginal likelihood can easily be estimated using the likelihood values from a posterior sample, which yields the Monte Carlo estimate

$$ \hat{f}^{(M)}(D) = \left( K^{-1} \sum_{j=1}^{K} f(D \,|\, \vartheta_j)^{-1} \right)^{-1}, $$

where (ϑ_j)_{j=1,…,K} is a sample from the posterior. However, for many applications the resulting estimator has infinite variance and is often unstable, as the integrand is large in regions where the posterior, i.e. the integrator, is small, so that the corresponding sample is very sparse in those regions. While Raftery et al. (2007) propose parameter dimension reduction methods to stabilize the estimator, these methods require knowing the marginal posterior distribution of a reduced parameter vector, which is unfortunately not available within a complex ODE-based model.
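In practice the harmonic mean estimate is best computed on the log scale via a log-sum-exp transformation, since the inverted likelihood values can overflow. A minimal sketch (names our own) of the estimator above, which stabilizes the arithmetic but of course not the estimator's intrinsic variance, could be:

```python
import numpy as np

def harmonic_mean_log_ml(loglik_samples):
    """Harmonic mean estimate of log f(D) from log-likelihood values
    evaluated at posterior draws:
        log fhat = log K - logsumexp(-loglik_samples).
    The log scale avoids overflow; the estimator itself can still
    have very large (even infinite) variance."""
    ll = np.asarray(loglik_samples, dtype=float)
    neg = -ll
    m = neg.max()
    # -( m + log mean exp(neg - m) ) = log K - logsumexp(neg)
    return -(m + np.log(np.mean(np.exp(neg - m))))
```

If all log-likelihood values are equal to a constant c, the estimate is exactly c, which serves as a quick correctness check.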

A sampling-based estimator for the marginal model likelihood, suggested by Friel and Pettitt (2008), utilizes the identity

$$ \log f^{(M)}(D) = \int_0^1 E_t\big[\log f(D \,|\, \vartheta)\big] \, dt, $$

where the expectation E_t is taken with respect to the so-called power posterior π_t(ϑ | D) ∝ f(D | ϑ)^t π(ϑ) (the concept of the power posterior was already utilized for the likelihood adjustment procedure presented in Chapter 2.2). An existing posterior sample may be used to approximate the part of the integral with t ∈ [1 − ε, 1], e.g. by using importance weights. However, for the calculation of the remaining integral one needs additional samples from the power posteriors π_t for various t ∈ [0, 1 − ε], which would require the same computational effort as generating the original posterior sample.
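Assuming a hypothetical helper that can draw from π_t for any given temperature t (this is exactly the expensive part discussed above), the identity can be approximated with a temperature grid and the trapezoidal rule. The following sketch uses invented names throughout:

```python
import numpy as np

def thermodynamic_log_ml(sample_power_posterior, loglik, temps):
    """Sketch of the power-posterior identity of Friel and Pettitt (2008):
        log f(D) = int_0^1 E_t[log f(D | theta)] dt,
    approximated by the trapezoidal rule over the grid `temps` in [0, 1].

    sample_power_posterior(t) is assumed to return draws from
    pi_t(theta | D) ~ f(D | theta)^t * pi(theta)   (hypothetical helper)
    """
    temps = np.asarray(temps, dtype=float)
    # estimate E_t[log f(D | theta)] at each temperature by a sample mean
    means = np.array([
        np.mean([loglik(th) for th in sample_power_posterior(t)])
        for t in temps
    ])
    # trapezoidal quadrature over the temperature grid
    return float(np.sum(np.diff(temps) * (means[1:] + means[:-1]) / 2.0))
```

For the toy conjugate model y = 1 with y | θ ~ N(θ, 1) and θ ~ N(0, 1), the power posterior is available in closed form, π_t = N(t/(1 + t), 1/(1 + t)), so the estimate can be compared with the exact value log N(1; 0, 2).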

An approach that samples across several competing models and their corresponding parameter spaces is the reversible jump MCMC method presented by Green (1995). This method does not require marginal likelihood calculation, as it directly estimates the joint distribution of models and parameters. Unfortunately, efficient proposal distributions for jumping between different models are difficult to construct, especially when the probability mass is highly concentrated within the individual models (Brooks et al., 2003).

3.3.1 Marginal likelihood estimation based on the detailed-balance