
5 Results and discussions

5.2 Comparison of the performance of NMSim to other conformation generation

5.2.4 Side-chain flexibility and rotamers

To compare side-chain quality and flexibility in terms of rotamer sampling, the rotamer derived measures (see section 4.2.2), i.e., heterogeneity, occupancy, and rotamericity, were calculated for the structural ensembles of the different methods.

Rotamers have been successfully used to account for side-chain flexibility in docking applications.77,245-247 With the increasing amount of experimental data, many rotamer libraries have been published.179,190,191

In this study, the Penultimate rotamer library179 from the Richardson lab has been used. A recent review has regarded the Penultimate rotamer library as the best among the available backbone-independent rotamer libraries.191

To analyze how well the different methods sample the available rotamer states, the rotamer heterogeneity of each HEWL residue was calculated over the structural ensemble of each method. The rotamer heterogeneity derived from the 130 experimental structures was taken as reference (see Figure 5.9). CONCOORD, which was found to explore the backbone conformational space well, explores rotamer states poorly compared to the experimentally observed rotamer states (see Figure 5.9 d). No residue in the CONCOORD ensemble explores the full range (i.e., heterogeneity = 1) of available rotamer states, whereas the experimental structures show a heterogeneity of 1 for 13 out of 103 residues (i.e., excluding GLY, ALA, and PRO). Furthermore, almost all heterogeneity values observed in the CONCOORD ensemble are lower than the experimentally derived values. This is an interesting observation, since conformations in CONCOORD are generated from randomized atomic positions62 and would thus be expected to sample diverse sets of rotamer states. This poor sampling of side-chains should be considered before using CONCOORD structures in side-chain-sensitive applications such as ligand docking.

Figure 5.9: The rotamer heterogeneity of HEWL residues in the structural ensembles of 1000 structures obtained from MD (red in panel a), NMSim (green in panel b), FRODA (blue in panel c) and CONCOORD (magenta in panel d). The rotamer heterogeneity values derived from 130 experimental structures (cyan in a-d) are shown as reference in all graphs.

MD, NMSim, and FRODA show a similar pattern of rotamer heterogeneity, which also resembles the pattern derived from the 130 experimental structures (Figure 5.9 a-c). Small differences occur in the mobile region of HEWL (i.e., residues 40 to 60), where all methods, especially FRODA, show lower heterogeneity than is experimentally observed. In contrast, in the tail region, all methods show higher heterogeneity than the experiments. Comparing the methods reveals that MD explores more rotamer states than NMSim, whereas NMSim samples more states than FRODA. This can also be seen from the average heterogeneity values over the 103 HEWL residues for the different methods (see Table 5.5).

The average of the “rotamer occupancy” measure (see section 4.2.2) can be used to quantify the diversity of the rotamer states captured in an ensemble, and thus reflects the flexibility available to side-chains. It should be noted that the highest possible average rotamer occupancy for an HEWL ensemble is ~10; this value would be reached in the hypothetical case that every residue of HEWL (103 residues, excluding GLY, ALA, and PRO) samples all rotamer states available for it in the rotamer library. MD, NMSim, FRODA, and CONCOORD on average sample 5.78, 4.97, 3.14, and 1.63 rotamer states, respectively, out of 10 (see Table 5.5). CONCOORD thus shows around 2.7 times less diversity in rotamer states than the experimentally observed value of 4.41. This again shows that the conformational space available to side-chains is restricted in structures generated by CONCOORD. In contrast, NMSim shows a high average occupancy compared to FRODA and CONCOORD, which justifies the specific modeling of rotamer states in geometry-based conformational modeling. Table 5.5 also lists the correlation coefficients between the occupancy vectors (103-dimensional vectors containing the occupancy values of the residues; see section 4.2.2), which compare the patterns of rotamers sampled in the different ensembles. NMSim has correlation coefficients of 0.71 and 0.81 with the experimental and the MD derived vectors, respectively, which are higher than those of FRODA and CONCOORD.
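The comparison of occupancy vectors can be illustrated with a short sketch. It assumes that the correlation coefficient is the Pearson coefficient, which the text does not state explicitly; the function name `pearson` and the toy occupancy vectors are hypothetical and are not taken from the HEWL data. The last lines only re-check the ratio between the experimental and the CONCOORD average occupancy from Table 5.5.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long vectors.

    Assumption: the thesis only says "correlation coefficient"; Pearson is
    used here for illustration.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (norm_x * norm_y)


# Hypothetical occupancy vectors for four residues (the real vectors have
# 103 entries, one per residue other than GLY, ALA, and PRO).
exp_occupancy = [3, 1, 5, 2]
method_occupancy = [4, 1, 6, 2]
r = pearson(exp_occupancy, method_occupancy)

# Ratio of average occupancies reported in Table 5.5 (EXP vs. CONCOORD).
ratio_exp_to_concoord = 4.407 / 1.631
```

A coefficient close to 1 means that the two ensembles sample many rotamer states at the same residues and few states at the same residues, independent of the absolute number of states sampled.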

In order to analyze the probability of each residue in a protein sequence to be found in any rotamer state, the rotamericity measure is calculated over an ensemble of structures (see section 4.2.2). This measure is related to the quality of side-chains in the ensemble in terms of rotamers. The average rotamericity for 103 residues (Table 5.5) shows a higher value for CONCOORD than for NMSim and FRODA. This can be expected if a method tends to keep the rotamer state found in the starting structure over the trajectory/ensemble. However, the average rotamericity values for NMSim (0.698) and FRODA (0.685) are comparable to the experimentally found value of 0.731, whereas the value for MD is even 0.816 (see Table 5.5).

Table 5.5: The rotamer derived measures for different structural ensembles.

Methods     Average values a)                                     Occupancy vector e)
            Heterogeneity b)   Occupancy c)   Rotamericity d)     EXP      MD
EXP         0.498              4.407          0.731               1.000    0.861
MD          0.537              5.786          0.816               0.861    1.000
NMSim       0.459              4.970          0.698               0.713    0.808
FRODA       0.338              3.145          0.685               0.569    0.733
CONCOORD    0.228              1.631          0.752               0.438    0.520

a) The averages of the different measures are calculated over 103 out of 129 residues of HEWL (excluding GLY, ALA, and PRO).
b) The heterogeneity measure of a residue in a protein sequence is defined as the ratio of the total number of distinct rotamer states of the residue observed in an ensemble to the total number of available rotamer states for that residue in the rotamer library.179
c) The occupancy measure of a residue in a protein sequence is defined as the total number of distinct rotamer states of the residue observed in an ensemble.
d) The rotamericity of a residue in a protein sequence is defined as the ratio of the total number of occurrences of the residue in any of the possible rotamers to the total number of conformers in the ensemble.
e) The correlation coefficients between the occupancy vectors of the different methods. The occupancy vector of HEWL is a 103-dimensional vector containing the occupancy values of the residues.
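The three rotamer derived measures defined in the footnotes of Table 5.5 can be sketched for a single residue as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `rotamer_measures`, the representation of a non-rotameric conformer as `None`, and the rotamer labels in the example are hypothetical.

```python
from typing import Dict, Optional, Sequence


def rotamer_measures(
    states: Sequence[Optional[str]], n_library_states: int
) -> Dict[str, float]:
    """Occupancy, heterogeneity, and rotamericity of one residue.

    states: one entry per conformer of the ensemble; a rotamer label if the
        side-chain matches a library rotamer, or None otherwise (assumed
        encoding for this sketch).
    n_library_states: number of rotamer states available for this residue
        type in the rotamer library.
    """
    if not states:
        raise ValueError("the ensemble must contain at least one conformer")
    if n_library_states <= 0:
        raise ValueError("the library must offer at least one rotamer state")

    rotameric = [s for s in states if s is not None]
    # Occupancy: number of distinct rotamer states observed in the ensemble.
    occupancy = len(set(rotameric))
    # Heterogeneity: observed distinct states relative to available states.
    heterogeneity = occupancy / n_library_states
    # Rotamericity: fraction of conformers found in any rotamer state.
    rotamericity = len(rotameric) / len(states)
    return {
        "occupancy": occupancy,
        "heterogeneity": heterogeneity,
        "rotamericity": rotamericity,
    }


# Hypothetical residue with 4 library rotamers, observed in 4 conformers.
example = rotamer_measures(["mt", "mt", "tp", None], n_library_states=4)
```

In this example the residue visits two of four available states (occupancy 2, heterogeneity 0.5) and is rotameric in three of four conformers (rotamericity 0.75). Averaging each measure over all residues except GLY, ALA, and PRO would give ensemble-level values of the kind listed in Table 5.5.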

5.2.5 Structure quality using Procheck

The quality of a subset of the structures obtained from the different methods was analyzed using the Procheck194 program (see section 4.2.3). The averages and the standard deviations were calculated for the different properties obtained from Procheck.

Table 5.6 summarizes the Procheck results including Ramachandran plot distribution, G-factors, and planar groups.

Procheck divides the Ramachandran plot into four areas: core, additionally-allowed, generously-allowed, and disallowed. Every method shows a good Ramachandran plot distribution with almost zero percent of the structures located in disallowed or generously-allowed regions and with a highly populated core region. Specific modeling of ϕ/ψ constraints in NMSim results in the highest average core region population (i.e., 92 %) of all methods. Remarkably, this is in agreement with the high-resolution experimental structures EXPTOP.

The Procheck G-factor provides a measure of how normal a given stereo-chemical property is. This value is computed for dihedral angles and covalent geometry. A low G-factor indicates that the property corresponds to a low-probability conformation; ideally, the G-factor value should be above -0.5, whereas structures with values below -1.0 may need investigation. Table 5.6 shows that for every method except MD the overall G-factor is higher than -0.5. Notably, the covalent G-factor (i.e., main-chain bond lengths and main-chain bond angles) for MD is as low as -1.5. NMSim on average achieves 100 % planarity for the planar groups. Compared with the experimental sets EXP/EXPTOP, the other methods also give acceptable planarity for planar groups, except for MD, which gives around 56 % planarity on average. In short, the structure quality properties for all methods are within acceptable ranges when compared to the properties derived from experimental structures, except for main-chain bond lengths and side-chain planarity in MD derived structures. The poor quality of MD structures is understandable, as the MD simulation is performed at 300 K, whereas geometry-based methods implicitly minimize each structure during correction cycles.
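The G-factor thresholds quoted above can be expressed as a small helper. This is only a sketch of the rule of thumb stated in the text: the function name `classify_g_factor` and the label "borderline" for values between the two thresholds are assumptions made here and are not Procheck output.

```python
def classify_g_factor(g: float) -> str:
    """Classify a G-factor using the thresholds quoted in the text.

    > -0.5          : "acceptable" (the ideal range)
    < -1.0          : "needs investigation"
    otherwise       : "borderline" (hypothetical label for the range
                      between the two thresholds, boundaries included)
    """
    if g > -0.5:
        return "acceptable"
    if g < -1.0:
        return "needs investigation"
    return "borderline"


# The covalent G-factor of about -1.5 reported for the MD ensemble.
md_covalent = classify_g_factor(-1.5)
```

Applied to the value of about -1.5 reported for the covalent G-factor of the MD ensemble, the helper returns "needs investigation", in line with the discussion above.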

Table 5.6: The averages and standard deviations for quantities determining structure quality.

Methods     Ramachandran plot a)     G-factor b)     Planar groups c)

a) Ramachandran plots of the structural ensembles.
b) Averages/standard deviations of Procheck derived G-factors for the structural ensembles. A low G-factor indicates that the property corresponds to a low-probability conformation. Ideally, the G-factor should be above -0.5, whereas a value below -1.0 indicates that the structure may need investigation. Procheck calculates G-factors for dihedral angles, covalent geometry, and overall.
c) Averages/standard deviations of percentages of side-chain planarity found in the structural ensembles.

In short, the NMSim approach described in chapter 3 was validated on hen egg white lysozyme in this section. NMSim sufficiently samples both backbone and side-chain conformations when experimental structures and conformations from a state-of-the-art MD simulation are taken as reference. A comparison of the different geometry-based simulation approaches shows that FRODA is restricted in sampling the backbone conformational space and CONCOORD is restricted in sampling the side-chain conformational space. NMSim produces structures of good structural quality. Furthermore, the explicit modeling of rotamer states in NMSim improves the quality of side-chain conformations, both compared to NMSim without rotamer modeling and compared to the other geometry-based approaches. The NMSim approach will be used for exploring biologically relevant motions in the following section.

5.3 Performance of NMSim in exploring biologically relevant