
2.2 Estimating GMM parameters

2.2.3 Gibbs sampling

Gibbs sampling (Geman and Geman 1984) is a Markov-chain Monte Carlo (MCMC) algorithm for sampling from the posterior p(θ|D). Whereas the EM algorithm was used to obtain a single estimate θ̂ of the model parameters, a Gibbs sampler produces multiple parameter samples {θ1, θ2, . . . , θT}. The EM estimate θ̂ approximates the posterior mode θMAP, while the Gibbs samples typically explore the region around the mode. The variation of the parameter values θt around the mode conveys the precision with which the model parameters can be determined given the data.

As in the case of the EM algorithm, Gibbs sampling also makes use of the component assignments as latent variables Z.

Figure 2.8: Using the EM algorithm to estimate the parameters of a two-dimensional isotropic Gaussian mixture with three components. The data (blue dots) was sampled from a three-component mixture. (Top left) The initial three components (red circles) are centered around the origin, but after 20 EM iterations (top middle) the components match the data well. The means of the components are shown as black dots, with black lines linking means from successive iterations. (Middle and bottom row) The algorithm has converged after about 12 iterations. (Top right) The log-posterior increases after every iteration, and the component size decreases to match the value of 1 used to generate the original data.

The idea is to use a Gibbs sampler to draw samples from the extended posterior p(θ, Z|D):

{(θ1, z1), (θ2, z2), . . . , (θT, zT)},   (2.40)

and discard the component assignments to obtain samples from the marginal distribution p(θ|D):

{θ1, θ2, . . . , θT}.   (2.41)

The idea behind Gibbs sampling is to partition the parameters that constitute θ and Z into sets, and iteratively sample each set of parameters conditioned on the other parameters. In our case the parameters are partitioned into four sets: the means µ, the precision s, the weights w, and the component assignments z.
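The sample collection of Eqns. 2.40 and 2.41 might be organised as in the following sketch; run_gibbs, gibbs_sweep and init are hypothetical helper names (gibbs_sweep is sketched below, after Eqns. 2.42–2.45), not functions defined in this chapter.

    def run_gibbs(X, K, T, gibbs_sweep, init):
        """Collect T samples from the extended posterior p(theta, z | D),
        then discard z to obtain samples from the marginal p(theta | D)."""
        mu, s, w, z = init(X, K)                      # theta_0 and z_0
        joint_samples = []                            # {(theta_t, z_t)}  (Eqn. 2.40)
        for t in range(T):
            mu, s, w, z = gibbs_sweep(X, mu, s, w, z)
            joint_samples.append((mu.copy(), s, w.copy(), z.copy()))
        # Marginal samples {theta_t}: drop the component assignments (Eqn. 2.41)
        return [(mu_t, s_t, w_t) for (mu_t, s_t, w_t, _) in joint_samples]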

Figure 2.9: Using Gibbs sampling to estimate the parameters of the same two-dimensional mixture model using the same data as in Fig. 2.8. (Bottom two rows) Compared to the EM algorithm, the Gibbs sampler converges faster to the correct solution. (Top middle) The Gibbs samples (dotted red circles) are all close to the optimal solution found by the EM algorithm. Increasing the number of data points would decrease the amount of variation. (Top right) The log-posterior is no longer monotonically increasing, but instead oscillates just below the optimal value found by the EM algorithm.

Given the current parameter values (µt, st, wt, zt), the values for the next step are sampled from the following conditional distributions, one after the other:

p(zt+1 | µt, st, wt, D)   (2.42)
p(µt+1 | st, wt, zt+1, D)   (2.43)
p(st+1 | µt+1, wt, zt+1, D)   (2.44)
p(wt+1 | µt+1, st+1, zt+1, D)   (2.45)
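One Gibbs sweep then simply draws the four blocks in the order of Eqns. 2.42–2.45. In the sketch below, sample_z, sample_mu, sample_s and sample_w are hypothetical stand-ins for draws from the conditional distributions; only sample_z is spelled out further below.

    def make_gibbs_sweep(sample_z, sample_mu, sample_s, sample_w):
        """Build a Gibbs sweep from per-block samplers: each block is drawn
        conditioned on the most recent values of the other blocks."""
        def gibbs_sweep(X, mu, s, w, z):
            z  = sample_z(X, mu, s, w)    # Eqn. 2.42: p(z_{t+1} | mu_t, s_t, w_t, D)
            mu = sample_mu(X, s, w, z)    # Eqn. 2.43: p(mu_{t+1} | s_t, w_t, z_{t+1}, D)
            s  = sample_s(X, mu, w, z)    # Eqn. 2.44: p(s_{t+1} | mu_{t+1}, w_t, z_{t+1}, D)
            w  = sample_w(X, mu, s, z)    # Eqn. 2.45: p(w_{t+1} | mu_{t+1}, s_{t+1}, z_{t+1}, D)
            return mu, s, w, z
        return gibbs_sweep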

The Gibbs sampler is initialised in the same way that the EM algorithm was initialised, for example by sampling θ0 = (µ0, s0, w0) from the prior, and z0 from its conditional distribution.
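Such an initialisation might look as follows. The specific priors used here (a broad Gaussian on the means, a Gamma prior on the precision, a symmetric Dirichlet on the weights) are illustrative assumptions rather than the exact priors defined earlier in the chapter, and sample_z is the conditional sampler for z sketched further below.

    import numpy as np

    def init_gibbs(X, K, sample_z, rng=None):
        """Illustrative initialisation: draw theta_0 from (assumed) priors,
        then z_0 from its conditional given theta_0."""
        if rng is None:
            rng = np.random.default_rng()
        D = X.shape[1]
        mu0 = rng.normal(X.mean(axis=0), X.std(), size=(K, D))  # assumed Gaussian prior
        s0 = rng.gamma(shape=1.0, scale=1.0)                    # assumed Gamma prior
        w0 = rng.dirichlet(np.ones(K))                          # assumed symmetric Dirichlet
        z0 = sample_z(X, mu0, s0, w0, rng)                      # Eqn. 2.42 at t = 0
        return mu0, s0, w0, z0

With functools.partial(init_gibbs, sample_z=sample_z), this matches the init(X, K) signature assumed in the run_gibbs sketch above.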

The expressions for the conditional distributions are (see Appendix B for the derivation):

p(zi = k | µ, s, w, D) = rik   (2.46)

There are strong similarities between the Gibbs sampling algorithm and the EM algorithm.

Both algorithms produce a sequence of parameter values. While the EM algorithm discards all but the last term of the sequence, the Gibbs sampling algorithm typically discards a number of samples at the start of the sequence (the burn-in period), and then picks every n-th sample (with n = 50, for example) to reduce the dependency between successive samples.
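Burn-in and thinning can be applied directly to the collected list of samples; the burn-in length of 500 below is an arbitrary illustrative value, and n = 50 is the example value from the text.

    def thin(samples, burn_in=500, n=50):
        """Discard the burn-in period, then keep every n-th sample to reduce
        the dependency between successive Gibbs samples."""
        return samples[burn_in::n]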

The division of each step of the EM algorithm into an E-step and an M-step is also reflected by the conditional distributions of the Gibbs sampler. The soft assignments rik that are computed during the E-step also appear in the conditional distribution for the hard assignments zi (Eqn. 2.46). The other conditional distributions are for sampling the model parameters, which corresponds to the M-step.
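This reuse of the soft assignments can be made explicit: the same rik computed for the E-step serve as the probabilities of the categorical draw in Eqn. 2.46. The sketch below assumes the isotropic components with a single shared precision s used in this example, so the Gaussian normalisation constants cancel across components.

    import numpy as np

    def responsibilities(X, mu, s, w):
        """Soft assignments r_ik, proportional to w_k * exp(-s/2 * ||x_i - mu_k||^2)
        for isotropic components with shared precision s."""
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)        # (N, K)
        logp = np.log(w)[None, :] - 0.5 * s * d2
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        return r / r.sum(axis=1, keepdims=True)

    def sample_z(X, mu, s, w, rng=None):
        """Hard assignments: z_i ~ Categorical(r_i1, ..., r_iK), so that
        the expected value of each z_ik is r_ik."""
        if rng is None:
            rng = np.random.default_rng()
        r = responsibilities(X, mu, s, w)
        return np.array([rng.choice(len(w), p=r_i) for r_i in r])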

Furthermore, each of the four EM update equations is exactly the same as either the mean or the mode of the corresponding Gibbs sampling conditional distribution. For the assignments, zi is sampled from a categorical distribution where the expected value for each zik is rik. The mean (and mode) of the Gaussian distribution for µk (Eqn. 2.47) coincides with the update for µk (Eqn. 2.34). The conditional distribution for s is the Gamma distribution (Eqn. 2.48), whose mode (ã − 1)/b̃ is exactly the update for s (Eqn. 2.38). Finally, the conditional distribution for the weights is a Dirichlet distribution (Eqn. 2.51), with its mode given by the corresponding EM update (Eqn. 2.39).
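The Gamma mode quoted above is easy to check numerically for arbitrary illustrative values of ã and b̃ (the exact values arising from the data are not reproduced here):

    import numpy as np
    from scipy.stats import gamma

    a_tilde, b_tilde = 7.0, 2.5                              # arbitrary illustrative values
    grid = np.linspace(1e-3, 10.0, 100_000)
    pdf = gamma.pdf(grid, a=a_tilde, scale=1.0 / b_tilde)    # rate b corresponds to scale 1/b
    print(grid[np.argmax(pdf)], (a_tilde - 1.0) / b_tilde)   # both approximately 2.4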

One consequence of the similarity between the two algorithms is that they require a similar amount of computation per step. The computationally most intensive part is computing the soft assignments rik, which are needed in both cases. The additional computations needed for Gibbs sampling are insignificant in comparison.

Fig. 2.9 shows how Gibbs sampling can be used instead of EM for the example from Fig. 2.8. It suggests that one advantage of Gibbs sampling is that it converges faster than EM. Another advantage is that it explores the posterior distribution, instead of providing just a single estimate.

Figure 2.10: (Left) The Gibbs sampling and EM algorithms need to be modified to work with images as input for application to cryo-EM. (Middle) The same image at a lower resolution. (Right) The two algorithms introduced above use point clouds as input.

EM can be viewed as a special case of Gibbs sampling obtained by increasing the amount of data, which decreases the width of the Gibbs sampling conditional distributions. As shown above, the update equations for EM can (at least in this case) be derived from the conditional distributions of Gibbs sampling. For these reasons, we will focus on Gibbs sampling in the rest of this chapter.