
1.4 Algorithms for inferring initial models

1.4.3 Algorithms based on projection matching

An alternative to the above two-step approach is to estimate both the orientation parameters and the electron density parameters at the same time. Most algorithms that follow this approach are based on projection matching.

Projection matching (Penczek et al. 1994) is the standard algorithm used in the refinement step of the cryo-EM pipeline (Section 1.2). It improves the resolution of the reconstruction by refining the orientation parameters. It is an iterative algorithm that alternates between updating the electron density and updating the orientations.

At every iteration, the current estimate of the electron density is projected along a discrete grid covering the range of possible projection directions. Every particle image is compared to every projection image by computing the cross-correlation coefficient. The orientation of the projection image maximising the cross-correlation is used to update the orientation parameters of the particle image.

Using the updated orientation parameters, a new electron density is reconstructed with one of the reconstruction algorithms from Section 1.3.

These two steps are repeated, usually a predetermined number of times. After a few iterations, there are typically only small changes in the orientation parameters, and therefore the discrete grid over all orientations can be replaced by a smaller grid in the neighbourhood of the previous orientation.
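The loop described above can be sketched in a toy two-dimensional setting, where a 2D density is projected to 1D images. Everything here is illustrative: the function names are invented, the smearing-based reconstruction is deliberately crude, and real implementations work with 2D projections of a 3D density rather than this simplified analogue.

```python
import numpy as np
from scipy.ndimage import rotate

def project(density, angle_deg):
    # 1D projection of a 2D density: rotate, then sum along one axis
    return rotate(density, angle_deg, reshape=False, order=1).sum(axis=0)

def best_orientation(image, density, angles):
    # hard assignment: the grid orientation whose reference projection
    # has the highest cross-correlation coefficient with the image
    scores = [np.corrcoef(image, project(density, a))[0, 1] for a in angles]
    return angles[int(np.argmax(scores))]

def backproject(images, angles, shape):
    # crude reconstruction: smear each 1D projection back across the
    # plane, rotate it into place, and average
    recon = np.zeros(shape)
    for img, a in zip(images, angles):
        smear = np.tile(img, (shape[0], 1))
        recon += rotate(smear, -a, reshape=False, order=1)
    return recon / len(images)

def projection_matching(images, init_density, grid, n_iter=3):
    # alternate between orientation assignment and reconstruction
    density = init_density
    for _ in range(n_iter):
        assigned = [best_orientation(img, density, grid) for img in images]
        density = backproject(images, assigned, density.shape)
    return density, assigned
```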

Projection matching is widely used, and works well if the initial electron density does not differ too much from the ‘true’ electron density. Note that there is no explicit or implicit cost function that is being optimised, and therefore no guarantee that the algorithm will converge.

Furthermore, there are many algorithmic settings which can influence the final result, such as the choice of discrete grid of orientations at every step, and the function to compare projection images to particle images. Determining appropriate settings requires an experienced user, and can lead to results that are biased.

The simplest initial model algorithm based on projection matching is the random model algorithm (Sanz-Garcia et al. 2010; Yan et al. 2007); a similar algorithm is used by the software suite EMAN2 (Tang et al. 2007). Both start from a random model and refine it using projection matching, with class averages used in place of individual particle images.

Creating an initial random model can be done in different ways. Sanz-Garcia et al. (2010) assigned random orientations to the input images and used them to reconstruct an initial random model. In contrast, EMAN2 applies a low-pass filter to three-dimensional random noise.
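The EMAN2-style construction might be sketched as follows. This is an illustrative assumption, not EMAN2's actual code: a Gaussian filter stands in for the low-pass step, and the volume shape, filter width, and function name are all invented for the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def random_initial_model(shape=(64, 64, 64), sigma=4.0, seed=0):
    # EMAN2-style starting volume: low-pass-filtered 3D random noise.
    # A Gaussian filter plays the role of the low-pass filter here;
    # the filter actually used by EMAN2 may differ.
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(shape)
    return gaussian_filter(noise, sigma=sigma)
```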

Because it uses projection matching, the result of the random model algorithm will be biased by the initial random model. To make it more robust, the algorithm is usually repeated several times starting from different random models. The corresponding final models are then ranked using different strategies. One strategy is to compute the cross-correlations between the input images and the projections of the final model using the estimated orientations. Sanz-Garcia et al. (2010) evaluated this and several other strategies based on Fourier shell correlation (FSC, see page 53), principal component analysis (PCA) and map variance. They concluded that no single strategy is always reliable, and highlighted the importance of comparing different models by eye.
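The cross-correlation ranking strategy can be sketched in the same toy 2D setting (1D projections of a 2D density). The names `project` and `model_score` are hypothetical, not functions from the cited software; the score is simply the mean cross-correlation between each input image and the candidate model's projection at that image's estimated orientation.

```python
import numpy as np
from scipy.ndimage import rotate

def project(density, angle_deg):
    # 1D projection of a 2D density (toy analogue of a cryo-EM projection)
    return rotate(density, angle_deg, reshape=False, order=1).sum(axis=0)

def model_score(model, images, angles):
    # rank candidate models by the mean cross-correlation between each
    # input image and the model's projection at the estimated orientation
    ccs = [np.corrcoef(img, project(model, a))[0, 1]
           for img, a in zip(images, angles)]
    return float(np.mean(ccs))
```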

Similarly to the algorithms based on common lines, random model algorithms also work better if the structure can be assumed to be symmetric, as was done by Yan et al. (2007).

Sanz-Garcia et al. (2010) explored asymmetric structures, and found that their method works if there are prominent structural features, but struggles with round and relatively featureless structures.

To summarise, random model algorithms only work on some structures, and lack a reliable way of comparing multiple resulting models. They may also require careful tuning of algorithmic parameters, such as the angular step size in each projection matching iteration (Sanz-Garcia et al. 2010).

Two recent algorithms (Elmlund et al. 2013; Sorzano et al. 2015) modify the projection matching strategy to try not to get trapped in local optima. In particular, they modify the orientation assignment step. In projection matching, each data image is assigned a single orientation, corresponding to the most similar projection image. But there are often several projection images that are similar to the data image, possibly corresponding to very different orientations.

This can be due to the noise in the data image, or because the current estimate of the electron density still differs too much from the ‘true’ density. Instead of trying to choose between these equally worthy candidates, Elmlund et al. (2013) and Sorzano et al. (2015) assign multiple projection images to a single data image. A weight is attached to each projection image to quantify how similar it is to the data image. During the subsequent reconstruction step of projection matching, all the assigned projection images are used, possibly including copies of the same image, one for each data image to which it was assigned.

Replacing hard orientation assignments by soft assignments in this way makes the algorithm more robust. But to turn the above general description into a concrete algorithm, several details have to be specified, such as choosing which projection images to assign to a data image, and how to compute the weights.
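A minimal sketch of such a soft assignment step, again in the toy 2D setting: keeping the k most similar reference projections and weighting them by their cross-correlations. Both choices here (a fixed k, and shifted-and-normalised weights) are illustrative placeholders, not the actual rules of either paper.

```python
import numpy as np
from scipy.ndimage import rotate

def project(density, angle_deg):
    # 1D projection of a 2D density
    return rotate(density, angle_deg, reshape=False, order=1).sum(axis=0)

def soft_assign(image, density, grid, k=3):
    # soft assignment: keep the k most similar reference projections
    # instead of one, with weights derived from their cross-correlations
    ccs = np.array([np.corrcoef(image, project(density, a))[0, 1]
                    for a in grid])
    top = np.argsort(ccs)[-k:]
    w = ccs[top] - ccs[top].min() + 1e-9   # shift to be non-negative
    w /= w.sum()                           # normalise to sum to one
    return [(grid[i], wi) for i, wi in zip(top, w)]
```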

Both Elmlund et al. (2013) and Sorzano et al. (2015) use cross-correlations to compare projections and data images. But Sorzano et al. (2015) then choose a predetermined percentage of images with the highest cross-correlations, letting the percentage threshold decrease from 15% to 0.01% over several iterations. In contrast, Elmlund et al. (2013) set the threshold to the highest cross-correlation from the previous iteration, and use only a random subset of the highest cross-correlations.
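Sorzano et al.'s percentage-based selection might look roughly like the following; the function is a hypothetical sketch, not their implementation, and in their algorithm the fraction is annealed from 15% down to 0.01% over the iterations.

```python
import numpy as np

def select_top_fraction(ccs, frac):
    # keep the indices of the highest `frac` fraction of
    # cross-correlations (always keeping at least one image)
    ccs = np.asarray(ccs)
    n = max(1, int(round(frac * len(ccs))))
    return np.argsort(ccs)[-n:]
```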

To determine the weights, Sorzano et al. (2015) directly normalise the cross-correlations themselves, while Elmlund et al. (2013) first apply two successive transformations to the cross-correlations, each involving the exponential function, before normalising the result.
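The two weighting styles can be caricatured as follows. Both functions are illustrative sketches: the direct normalisation loosely mirrors Sorzano et al. (2015), while a single softmax-style transform stands in for the two successive exponential transformations of Elmlund et al. (2013), with `beta` an invented sharpness parameter.

```python
import numpy as np

def weights_normalised(ccs):
    # direct normalisation of the cross-correlations (in the spirit of
    # Sorzano et al. 2015; the details of their scheme may differ)
    ccs = np.clip(np.asarray(ccs, dtype=float), 0.0, None)
    return ccs / ccs.sum()

def weights_exponential(ccs, beta=10.0):
    # softmax-style weighting: exponentiate the cross-correlations
    # before normalising (a one-step caricature of the two exponential
    # transformations of Elmlund et al. 2013)
    ccs = np.asarray(ccs, dtype=float)
    e = np.exp(beta * (ccs - ccs.max()))   # subtract max for stability
    return e / e.sum()
```

The exponential variant concentrates the weight on the best matches more aggressively than direct normalisation, which is the practical difference between the two styles.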

These choices constitute two ad hoc approaches, with many algorithmic parameters that the user can adjust, which can bias the results. We will see below that a similar algorithm follows naturally from a more principled approach, with fewer algorithmic parameters, and ones that are easier to interpret.

These two algorithms also differ in other ways. Instead of using class averages as data images, Elmlund et al. (2013) start with the individual particle images. In this way, the class averaging step is made part of the initial model inference algorithm. The resulting algorithm is computationally very demanding, requiring around 5000 to 10000 CPU hours. In comparison, the algorithm by Sorzano et al. (2015) is relatively fast, needing just a few hours starting from the class averages.