
4 Materials and methods

4.2 NMSim and methodological comparisons

To analyze the usefulness and the limitations of the NMSim approach, it was compared with alternative approaches on a test case: the hen egg white lysozyme (HEWL) protein. HEWL conformations207 from a state-of-the-art MD56-58 simulation and from different experimental structures were compared with the conformations obtained from the most efficient geometry-based methods, i.e., FRODA,64 CONCOORD,62,63 and NMSim.

4.2.1 Analysis of MD, NMSim, FRODA, CONCOORD and experimental HEWL ensembles

The MD trajectory was taken from a recent study by A. Koller et al.,207 in which a 100 ns MD simulation of HEWL (PDB code 1hel)208 was performed with AMBER9 under periodic boundary conditions in the NVT ensemble. The Amber 99SB force field was used with the TIP3P water model at 300 K. This simulation took approximately 4 months on 4 CPUs of a Linux cluster. Here, 1,000 equally spaced conformations were selected from the trajectory; these form the MD ensemble used in this study.

The NMSim program was applied to the same starting structure with the default parameter set (see Appendix A). In total, 10,000 conformations were generated using 1,000 simulation cycles and 10 NMSim cycles. This simulation took 30 hours on a 64-bit desktop computer. Every 10th structure was then selected for the NMSim ensemble.

The FRODA64 simulation was performed with the latest available version (6.2) using the default parameter set. However, the hydrophobic cutoff -c was set to 0.35 Å, because the default cutoff of 0.5 Å resulted in a highly rigid protein with no relative motions.

For the other parameters, the default values were used. In total, 10 million conformations were generated, and every 1000th conformation was saved during the simulation, giving 10,000 saved conformations. For the analysis, every 10th saved conformation was selected; these form the FRODA ensemble of 1000 conformations. This simulation took 6 days on a 64-bit desktop computer. It is important to note that, despite generating 10 million conformations and using approx. 6 days of computational time, the FRODA trajectory was less explorative in terms of RMSD from the starting structure than the NMSim trajectory. The average backbone and heavy-atom RMSD of every structure to its previous structure in the FRODA ensemble are 0.25 Å and 0.5 Å, respectively, compared to 0.4 Å and 0.6 Å in the NMSim ensemble.
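The average RMSD of every structure to its previous structure can be sketched as follows. This is a minimal illustration, not the code used in this study: it assumes that each pair of consecutive structures is optimally superimposed before the RMSD is computed (the text does not state this), and the function names `superposed_rmsd` and `mean_consecutive_rmsd` are hypothetical.

```python
import numpy as np

def superposed_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Uses the singular values of the covariance matrix (Kabsch approach),
    so the rotation matrix itself never has to be built.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove the translation by centering both coordinate sets.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)
    # Flip the smallest singular value if the best fit would be a reflection.
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(p * p) + np.sum(q * q) - 2.0 * s.sum()) / len(p)
    return float(np.sqrt(max(msd, 0.0)))

def mean_consecutive_rmsd(frames):
    """Mean RMSD of every frame to its predecessor in an ordered ensemble."""
    values = [superposed_rmsd(a, b) for a, b in zip(frames[:-1], frames[1:])]
    return float(np.mean(values))
```

Passing only the backbone atoms or only the heavy atoms of each frame would give the two kinds of averages quoted above.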

The latest available version (2.0) of the CONCOORD62 program was run with the default parameter set. As recommended on the CONCOORD home page, the van der Waals parameters “yamber2” and the bonded parameters “Engh-Huber” were used.

CONCOORD is a pure conformation generation method that produces no pathways or trajectories. Every conformation is generated by distorting the starting structure and then correcting it, and hence does not depend on simulation time; therefore, only 1000 structures were generated for the ensemble. Generating the 1000 conformations took only 53 minutes on a 64-bit desktop computer.

To compare the conformations generated by the different approaches with the experimentally observed conformations of HEWL, an ensemble of experimental structures was compiled. Experimental structures that show 100 % sequence similarity with the sequence of the starting structure of HEWL (PDB code 1hel)208 were downloaded from the RCSB Protein Data Bank.202 In the case of NMR structures, each model was treated separately. Structures with a Cα RMSD of less than 0.5 Å to the starting structure were removed. The resulting experimental ensemble contains 130 different X-ray crystal structures and NMR structures.209 These structures are listed in Appendix B.

4.2.2 Rotamer states and derived measures: rotamericity, heterogeneity, and occupancy

To compare side-chain conformational sampling in different methods, the Penultimate rotamer library179 was used in this study. A side-chain conformer was assigned to a rotameric state if every χ-angle of that residue fell within ±30° of the corresponding χ-angle of any of the rotameric states available to that particular residue. Different rotamer-derived measures were then used for the analysis.
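The ±30° assignment rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the library is passed in as a plain dictionary, the first matching state is returned, and the χ values in the usage example are placeholders rather than entries of the Penultimate rotamer library.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def assign_rotamer(chis, library, tol=30.0):
    """Return the name of the first rotamer state matching all chi angles.

    chis    : chi angles of one side-chain conformer, in degrees
    library : dict mapping rotamer name -> list of reference chi angles
    tol     : allowed deviation per chi angle (30 degrees in the text)
    Returns None if the conformer is non-rotameric.
    """
    for name, ref in library.items():
        if len(ref) != len(chis):
            continue
        if all(angle_diff(c, r) <= tol for c, r in zip(chis, ref)):
            return name
    return None

# Placeholder library for a residue with a single chi angle (assumed values).
toy_library = {"p": [60.0], "t": [180.0], "m": [-60.0]}
print(assign_rotamer([65.0], toy_library))   # matches "p"
print(assign_rotamer([120.0], toy_library))  # no state within 30 degrees
```

The wrap-around handling in `angle_diff` matters for states near ±180°, where, for example, −175° and 180° differ by only 5°.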

The rotamericity measure is used to compare the quality of side-chain conformations in different ensembles. The rotamericity of a residue in a protein sequence is defined as the ratio of the number of conformers in which the residue occupies any of its possible rotamer states to the total number of conformers in the ensemble. Note that, in this study, the rotamericity is calculated for each residue in the sequence. This is in contrast to the rotamericity per amino acid type used by Schrauber et al.192 The rotamericity of amino acids has been used previously, for example, to show that a substantial number of side-chains are under strain192 and that ligand binding induces non-rotamericity.210
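The per-residue rotamericity can be sketched as follows. This is a minimal illustration; the input layout (one list per conformer, holding a rotamer name or `None` for each residue) and the function name are assumptions made for this example.

```python
def rotamericity(assignments):
    """Per-residue fraction of conformers in which the residue is rotameric.

    assignments : list over conformers; each entry is a list over residues
                  holding a rotamer name, or None if non-rotameric.
    """
    n_conformers = len(assignments)
    n_residues = len(assignments[0])
    return [
        sum(conf[i] is not None for conf in assignments) / n_conformers
        for i in range(n_residues)
    ]

# Four conformers of a hypothetical two-residue protein.
ensemble = [["p", "t"], [None, "t"], ["m", "t"], [None, None]]
print(rotamericity(ensemble))  # [0.5, 0.75]
```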

To analyze the potential of different methods in sampling different rotamer states, different measures are introduced. The heterogeneity measure of a residue in a protein sequence is defined as the ratio of the total number of distinct rotamer states of the residue observed in an ensemble to the total number of available rotamer states for that residue in the rotamer library.179 This measure defines how well the different methods explore the available side-chain conformational space.

The heterogeneity is normalized by the number of available rotamer states of a residue. According to the Penultimate rotamer library,179 long side-chains such as Arg and Lys have 34 and 27 rotamer states, respectively, whereas side-chains such as Cys and Ser have only 3 rotamer states. These uneven normalization factors need to be taken into account when interpreting the heterogeneity measure. Therefore, the occupancy measure was also introduced, i.e., the heterogeneity measure without normalization. The occupancy measure of a residue in a protein sequence is defined as the total number of distinct rotamer states of the residue observed in an ensemble. Furthermore, the occupancy vector is introduced, which is simply a vector containing the occupancy value of every residue in a protein. The correlation coefficient between the occupancy vectors is then calculated to compare the patterns of rotamers sampled in the different ensembles.
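The occupancy, heterogeneity, and occupancy-vector comparison can be sketched as follows. This is a minimal illustration: the function names and input layout are assumptions, and the Pearson correlation coefficient is assumed because the text does not specify which correlation coefficient was used.

```python
import numpy as np

def occupancy(assignments):
    """Occupancy vector: number of distinct rotamer states seen per residue.

    assignments : list over conformers; each entry is a list over residues
                  holding a rotamer name, or None if non-rotameric.
    """
    n_residues = len(assignments[0])
    return [
        len({conf[i] for conf in assignments if conf[i] is not None})
        for i in range(n_residues)
    ]

def heterogeneity(assignments, n_available):
    """Occupancy divided by the number of library states for each residue."""
    return [occ / n for occ, n in zip(occupancy(assignments), n_available)]

def occupancy_correlation(occ_a, occ_b):
    """Pearson correlation between two occupancy vectors (assumed choice)."""
    return float(np.corrcoef(occ_a, occ_b)[0, 1])

# Three conformers of a hypothetical three-residue protein.
ensemble = [["p", "t", "a"], ["m", "t", "b"], [None, "t", "c"]]
print(occupancy(ensemble))                 # [2, 1, 3]
print(heterogeneity(ensemble, [4, 2, 6]))  # [0.5, 0.5, 0.5]
```

In this toy example, all three residues have the same heterogeneity although their occupancies differ, which illustrates why both measures are reported.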

4.2.3 Structure quality using Procheck

The quality of a subset of the structures obtained from the different types of methods was analyzed using the Procheck194 program. Here, 100 equally spaced structures were taken from the ensemble of every method for the analysis. To better judge the structure quality, 100 high-resolution crystal structures from Richardson’s lab211 (here named EXPTOP) were also used for the analysis, in addition to the 130 experimental structures of HEWL. The averages and the standard deviations were calculated for the different properties obtained from Procheck.

The G-factor provides a measure of how normal a given stereo-chemical property is.

In Procheck, it is computed for dihedral angles (i.e., ϕ−ψ combination, χ1−χ2 combination, χ1 torsion for those residues that do not have a χ2, combined χ3 and χ4 torsion angles, ω torsion angles) and covalent geometry (main-chain bond lengths, main-chain bond angles). The G-factor is a log-odds score based on the observed distributions of these stereo-chemical parameters. A low G-factor indicates that the property corresponds to a low-probability conformation.