5 EOM 2.0: new capabilities

5.3 Point group symmetry

The impressive results achieved over the last years using SAXS ensemble analysis (see Chapter 4) pave the way to further, even more challenging studies. Consequently, numerous requests were received from the user community to apply SAXS ensemble analysis to very complicated scenarios, e.g. particles mixing structured and unstructured portions. In particular, a challenging and practically important case is that of macromolecules consisting of a conserved symmetric core together with some disordered regions. As briefly mentioned before, proteins showing point group symmetry are not easy to handle in an ensemble approach, where a large number of independent conformations must be generated in a short time as a preparation step. The main reason is the long computation time needed to build feasible models with symmetry, and none of the SAXS ensemble tools described above is capable of providing such models. Below, the problem of modelling particles with point symmetry is addressed and the solution implemented in EOM 2.0 is presented. For brevity, point group symmetry is often referred to simply as symmetry in EOM 2.0 and is indicated with the lowercase letter p (e.g., p2, p3, …, p222).

In order to model a chain free of steric clashes, the distance between a new dummy residue (DR) and every previously positioned DR needs to be checked. DRs at a distance lower than the defined threshold (usually 3.8 Å) must be repositioned. This implies that the modelling procedure has a computational cost proportional to $O(n^{2})$ (using the big O notation), where $n$ is the number of residues in the polypeptide chain. For particles with symmetry, the time complexity rises to $O(n^{m+1})$, where $m$ is the number of symmetric chains that compose the particle (symmetry p2 -> m=2, p3 -> m=3, p4 -> m=4, …, p62 -> m=12, …, p222 -> m=4). The probability of having two DRs at a distance lower than the threshold increases exponentially with the number of chains within the particle. Accordingly, the number of checks necessary to exclude any clash increases too. The high probability of steric clashes is therefore a severe bottleneck for using a linear modelling approach in the presence of symmetry.
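The quadratic cost of the clash check for a single chain can be illustrated with a minimal sketch. It is not the EOM 2.0 implementation: the function names `clashes` and `count_checks` are hypothetical, and the chain is assumed to be built one DR at a time, each new DR being compared against all DRs already placed.

```python
import math

CLASH_THRESHOLD = 3.8  # Angstrom, the usual threshold quoted in the text


def clashes(new_dr, placed_drs, threshold=CLASH_THRESHOLD):
    """Return True if new_dr lies closer than threshold to any placed DR."""
    return any(math.dist(new_dr, dr) < threshold for dr in placed_drs)


def count_checks(n):
    """Distance checks needed to place n DRs one by one.

    The k-th DR is compared with the k previously placed ones, so the
    total is 0 + 1 + ... + (n - 1) = n(n - 1)/2, i.e. O(n^2).
    """
    return sum(range(n))
```

For a chain of 100 residues this already amounts to 4950 distance evaluations, before any repositioning of clashing DRs is taken into account.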

The re-designed EOM 2.0 allows for an efficient modeling of particles with symmetry.

This is done in two different ways: either (i) using oligomerized sub-units coming from high-resolution techniques or (ii) by generating symmetry at runtime as specified by the user. Both cases are presented below.

Proteins with disordered regions are very often decomposed into their constituent domains when characterized by high-resolution methods, and the atomic structures are available from public protein databases. The domains (or subunits) often form oligomers obeying point group symmetry. These symmetric oligomers provide a significant amount of information that must be fully considered during the modelling.

Although in some cases the packing forces in the crystals may generate artificial oligomers, the oligomerization interfaces from MX are rather often also observed when the particles are studied in solution using SAXS. The potentially flexible portions in a full length protein may (at least to some extent) retain the symmetry observed for the high resolution core. Generating symmetric configurations of such a full length protein significantly restricts the conformational space that the genetic algorithm has to explore.

The importance and utility of this approach are best demonstrated by a practical case that depicts a typical problem and its solution offered by the new EOM 2.0 algorithm (Fig. 5.9). In one of the collaborative user projects at the EMBL SAXS beamline, a flexible multi-domain protein (Nucleophosmin) containing a long (122 AA) inter-domain region was studied.

Crystallization of the full length protein did not succeed, presumably due to the disorder of this inter-domain region. Crystallographic characterization of the single sub-units was hence made and a SAXS study was conducted on the full-length construct. Unexpectedly, the N-terminal domain arranged as a pentamer in the crystal whereas the C-terminal domain appeared as a monomer. This information is extremely important and must be taken into account in the analysis of the SAXS data from the full length protein.

Figure 5.9 Case study on a flexible multi-domain particle with a long inter-domain region (122 AA) and an N-terminal tail (31 AA), both disordered. The domains are available as crystal structures showing different oligomerization arrangements: pentamer (N-terminal domain) and monomer (C-terminal domain). The full length protein is observed in solution as a pentamer.

In the scenario illustrated in Fig. 5.9, EOM 2.0 models the full length particle by extracting the transformation matrix from the MX core containing the pentameric N-terminal domain. Once a single full length chain is modelled following the generation procedure described in section 5.1, symmetry operations are applied using the transformation matrix from the core. These symmetry operations guarantee the absence of steric clashes in the entire system provided that two consecutive chains do not clash with each other. Thus the pool will contain symmetric full length protein molecules of the type illustrated in Fig. 5.10(A), where the pentameric N-terminal domain is kept in its original arrangement and only the peripheral C-terminal domains, tethered by the linker, are appropriately moved and rotated.
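The acceptance criterion stated above can be sketched as follows. This is only an illustration under simplifying assumptions: the transformation is taken to be a pure rotation about the z axis by 360°/(number of chains) rather than a matrix extracted from a crystal structure, each chain is a plain list of DR coordinates, and all function names are hypothetical.

```python
import math


def rotate_z(point, angle_deg):
    """Rotate a 3D point about the z axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def symmetric_mates(chain, n_chains):
    """Return n_chains copies of chain related by rotations about z."""
    step = 360.0 / n_chains
    return [[rotate_z(p, k * step) for p in chain] for k in range(n_chains)]


def chains_clash(chain_a, chain_b, threshold=3.8):
    """True if any DR of chain_a is closer than threshold to a DR of chain_b."""
    return any(math.dist(p, q) < threshold for p in chain_a for q in chain_b)


def accept_symmetric_model(chain, n_chains, threshold=3.8):
    """Apply the criterion quoted in the text: only two consecutive
    chains are tested, instead of all pairs of chains."""
    mates = symmetric_mates(chain, n_chains)
    return not chains_clash(mates[0], mates[1], threshold)
```

Testing a single pair of neighbouring chains is what keeps the cost of the symmetric generation close to that of a single chain.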

Figure 5.10 Pool reconstruction performed using EOM 2.0 for the case study in Fig. 5.9. Transparency is used to show multiple representative conformations present in the pool. (A) Flexible pentameric particle modelled by extending the symmetry present in the pentameric core to the entire particle. (B) Flexible pentameric particle modelled as asymmetric, with symmetry present only in the core.

Clearly, symmetry may not necessarily be preserved in the flexible parts and the possibility to add asymmetric flexible sections is also available (Fig. 5.10(B)). In this case, every chain is modelled as an independent one and no symmetric operations are performed. This generation requires longer computational times since the clash checks must be performed for all individual chains.

In principle, the symmetry observed for one of the subunits in the crystal does not necessarily extend to the full length protein. Moreover, since homologous domains can also be used for modelling in SAXS, the oligomerization, and thus the symmetric arrangement, of a homologue may well differ from that of the actual protein. If no reliable high-resolution information about the mode of oligomerization is available, generating a symmetric model is a feasible option for the modelling. Therefore, EOM 2.0 also offers the possibility to generate symmetric conformations using a user-defined oligomerization interface (range of residues and distance). Computer tools like PISA (Krissinel and Henrick 2007) are able to suggest potential oligomerization interfaces for such modelling. In EOM 2.0, a domain forming the asymmetric part of the structure is first generated with an appropriately positioned interface, and then all the other domains and/or missing regions within the chain are modelled according to the applied symmetry. Fig. 5.11 shows the case where an oligomerization interface (β-sheet) as well as a distance (10 Å) are provided and the oligomerization is then performed by EOM 2.0. For simplicity of representation, only the N-terminal domain from Fig. 5.9 has been used here. However, the strategy can of course be extended to more complex multi-domain particles.

Figure 5.11 Graphical representation of the strategy used by EOM 2.0 to generate symmetric models using a user-defined interface. (A) N-terminal domain from the case study in Fig. 5.9 with the oligomerization interface (β-sheet in the dashed red box) and a Cα-Cα distance (10 Å). (B) Symmetry generation strategy, performed by moving the domain to the center of coordinates in order to apply symmetry operations according to the defined oligomerization interface.

Once the interface and a distance are specified, a residue is randomly selected within the interface (Fig. 5.11(A)). The particle is then moved to the center of coordinates (with the selected residue at the origin), randomly rotated and then shifted by a distance $\mathrm{shift}$ obtained from the chord theorem, $\mathrm{dist} = 2\sin(\alpha/2)\,\mathrm{shift}$, with $\alpha = 360^{\circ}/(\text{number of chains})$ (Fig. 5.11(B)). Applying the symmetry transformations then generates symmetric mates at the specified residue-residue distance (dist = 10 Å for the case in Fig. 5.11). If no clashes occur, the oligomerization is accepted. Using this approach, each model in the pool will show a different oligomerization arrangement depending on: (i) the distance chosen, (ii) the random rotation performed before the shift and (iii) the defined interface. The distance and the size of the interface must therefore be chosen appropriately: a larger interface (with a high number of residues involved) as well as a large distance will generate oligomerizations with higher diversity, whereas smaller ones will force the tool to generate similar oligomerization arrangements (Fig. 5.12).
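The chord relation used for the shift can be checked numerically with a short sketch. It assumes a cyclic symmetry about the z axis, takes the angle between neighbouring chains as 360°/(number of chains) and omits the random rotation. The names `radial_shift` and `place_mates` are hypothetical and only track the position of the selected interface residue in each symmetric mate.

```python
import math


def radial_shift(dist, n_chains):
    """Shift from the symmetry axis giving a residue-residue distance dist
    between neighbouring mates: dist = 2 * sin(alpha / 2) * shift."""
    alpha = 2.0 * math.pi / n_chains
    return dist / (2.0 * math.sin(alpha / 2.0))


def place_mates(dist, n_chains):
    """Positions of the selected interface residue in all symmetric mates."""
    r = radial_shift(dist, n_chains)
    alpha = 2.0 * math.pi / n_chains
    return [(r * math.cos(k * alpha), r * math.sin(k * alpha), 0.0)
            for k in range(n_chains)]


# Example: pentamer with a requested distance of 10 Angstrom.
mates = place_mates(10.0, 5)
neighbour_distance = math.dist(mates[0], mates[1])
```

Under these assumptions, `neighbour_distance` reproduces the requested distance up to floating point error, which is the property the shift is designed to enforce.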

Figure 5.12 Selection of symmetric models generated using the same interface (as in Fig. 5.11(A)) but different Cα-Cα distances (20 Å vs. 10 Å). Smaller distances (lower dashed box) result in limited differences in the oligomerization, such that the effect is similar to that of an MD approach.

In this approach the main rotational axis is usually z. If needed (e.g., symmetry p222, p42, etc.), rotations around y are also computed. The strategy described above also allows the modelling of non-crystallographic symmetry, such that other kinds of particles (e.g., viruses) can also be modelled (e.g., symmetry p7, p9, p11, etc.). In all the cases listed above, the final pool models are free of steric clashes.
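How the rotations around z and y combine into a full set of symmetry operations can be sketched as below. This is an assumption-laden illustration rather than the EOM 2.0 code: the additional axis is taken to be a twofold (180°) rotation around y, operations are represented as plain 3x3 matrices, and `symmetry_operations` is a hypothetical name.

```python
import math


def rot_z(angle):
    """3x3 rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Twofold rotation about y (assumed form of the additional operation).
ROT_Y_180 = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def symmetry_operations(n, dihedral=False):
    """Rotation matrices for symmetry pn (dihedral=False) or pn2 (True)."""
    ops = [rot_z(2.0 * math.pi * k / n) for k in range(n)]
    if dihedral:
        # Each z rotation is followed by the twofold rotation about y.
        ops += [matmul(ROT_Y_180, op) for op in ops]
    return ops
```

With this construction `symmetry_operations(6, dihedral=True)` yields 12 operations and `symmetry_operations(2, dihedral=True)` yields 4, in line with the chain counts quoted earlier for p62 and p222. An odd order such as p7 is handled in the same way as any other value of `n`.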

Finally, Fig. 5.13 shows examples of yet more complicated cases that can be handled by EOM 2.0.

Figure 5.13 Graphical representation of some case studies involving very complicated particles that can now be modelled with an ensemble approach using EOM 2.0. (A) Multi-domain particle with p62 symmetry reconstructed through a user-defined oligomerization interface. (B) Large virus particle (~2 MDa) composed of 2 domains (available at atomic resolution) connected by a long disordered region.