3. Results and Discussion
3.2. Molecular dynamic simulations of the CTLD of perlucin and MBP-A
3.2.2. Secondary structure of the CTLD of perlucin and MBP-A
The first characteristic extracted from the simulated trajectories was the average secondary structure of each residue (see Nelson & Cox [2013] or Richardson [2007] for general information on secondary structure). The AMBER “ptraj” software module uses the DSSP algorithm (Kabsch & Sander [1983]) to assign secondary structure elements to the residues of a trajectory. “ptraj” discriminates between the following elements: parallel strand, anti-parallel strand, α-helix, 3/10-helix, π-helix and turn. Each of these structural elements is classified according to its hydrogen bond pattern (see Kabsch & Sander [1983] for details). The relevant hydrogen bonds are those that form between the hydrogen bound to the backbone nitrogen of a residue j and the backbone oxygen of another residue i.
The helical structures differ in the sequence distance between the two residues that are involved in the hydrogen bond formation. In the familiar α-helix, the backbones of residues i and i + 4 are connected via one hydrogen bond. The 3/10-helix is “tighter” since residues i and i + 3 are connected. In contrast, the π-helix is “looser” than the α-helix since residues i and i + 5 are connected. At least two consecutive hydrogen bonds must be formed to define a helix. If only one hydrogen bond is formed, the residues are classified as a turn.
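The hydrogen bond rules above can be illustrated with a minimal sketch. It is a strongly simplified reading of the DSSP helix and turn definitions, not the code used by “ptraj”: the function name, the 0-based residue indexing, the input format (pairs of acceptor and donor residue indices) and the rule that α-helix labels win over 3/10- and π-helix labels where they overlap are assumptions made for illustration. Strands, bends and chain breaks are ignored.

```python
from typing import Iterable, List, Set, Tuple

# One-letter codes keyed by the sequence separation n of the bonded residues.
HELIX_CODES = {4: "H", 3: "G", 5: "I"}  # alpha-helix, 3/10-helix, pi-helix


def assign_helices_and_turns(n_residues: int,
                             hbonds: Iterable[Tuple[int, int]]) -> List[str]:
    """Label residues 0..n_residues-1 from backbone hydrogen bonds.

    Each hydrogen bond is a pair (i, j): the backbone oxygen of residue i
    accepts the backbone N-H of residue j. Simplified sketch, see lead-in.
    """
    bonds: Set[Tuple[int, int]] = set(hbonds)
    labels = ["-"] * n_residues

    # Pass 1: a single i -> i+n hydrogen bond marks the enclosed residues as turn.
    for n in HELIX_CODES:
        for i in range(n_residues - n):
            if (i, i + n) in bonds:
                for k in range(i + 1, i + n):
                    labels[k] = "T"

    # Pass 2: two consecutive bonds of the same type (starting at i-1 and i)
    # define a minimal helix covering residues i .. i+n-1. The alpha-helix
    # (n = 4) is written last so that it wins overlaps (assumed priority).
    for n in (5, 3, 4):
        for i in range(1, n_residues - n):
            if (i - 1, i - 1 + n) in bonds and (i, i + n) in bonds:
                for k in range(i, i + n):
                    labels[k] = HELIX_CODES[n]
    return labels


if __name__ == "__main__":
    # Two consecutive i -> i+4 bonds give one minimal alpha-helix.
    print("".join(assign_helices_and_turns(10, [(0, 4), (1, 5)])))
    # A single i -> i+3 bond gives a turn only.
    print("".join(assign_helices_and_turns(10, [(0, 3)])))
```

In this sketch a helix label always overwrites a turn label, which mirrors the statement that a turn is what remains when only one hydrogen bond of a given type is present.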
Parallel and anti-parallel strands differ in the orientation of the residue segments that are connected by hydrogen bonds. In the first case the C- and N-terminal ends of both segments point in the same direction, whereas in the latter case they point in opposite directions. Note that according to the definitions that underlie DSSP a β-strand is composed of successive residues that are in a “β-bridge” conformation. In the following both terms are used interchangeably unless stated otherwise. A β-bridge is characterised by two hydrogen bonds formed between two non-overlapping stretches of three residues. For completeness, note that the STRIDE algorithm (Frishman & Argos [1995]) additionally uses backbone dihedral angle information for secondary structure classification.
To condense the secondary structure information obtained from a trajectory as much as possible, the following steps were performed. “ptraj” computes, for each residue, the percentage of frames of the whole trajectory in which the residue is classified as one of the above-mentioned secondary structure elements. Here the analysed trajectories comprised 5010 frames each, beginning with the input structure, continuing with the restart structures from the restrained heating phase and ending with every frame from the unconstrained simulation. The influence of the ten restart structures from the first 220 ps on the subsequent 5000 frames from the unconstrained simulation was considered negligible. To condense the information further, only the total strand (sum of the percentages of parallel and anti-parallel conformation; more strictly, the sum of parallel and anti-parallel β-bridges), total helical (sum of all helical conformations) and turn conformation per residue were considered. The time dependency of the secondary structure conformation per residue was not considered further. Since several MD simulations were performed with the same initial structure (see Table 3.2.2.), the results of all MD simulations with the same initial structure were averaged.
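The condensing and averaging steps described above can be sketched as follows. The sketch assumes that the per-residue percentages of one trajectory are already available as plain lists. The dictionary keys and function names are hypothetical and do not correspond to the “ptraj” output format.

```python
from statistics import fmean
from typing import Dict, List

# Hypothetical column names for the per-residue percentages of one trajectory.
STRAND = ("parallel", "antiparallel")
HELIX = ("alpha", "3-10", "pi")


def condense(run: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Collapse the six per-residue percentage columns into strand/helix/turn."""
    n = len(run["turn"])
    return {
        "strand": [sum(run[c][r] for c in STRAND) for r in range(n)],
        "helix": [sum(run[c][r] for c in HELIX) for r in range(n)],
        "turn": list(run["turn"]),
    }


def average_runs(runs: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Average the condensed percentages residue by residue over all runs."""
    condensed = [condense(run) for run in runs]
    return {
        key: [fmean(values) for values in zip(*(c[key] for c in condensed))]
        for key in ("strand", "helix", "turn")
    }


if __name__ == "__main__":
    # Two made-up runs with two residues each (percent of frames).
    run_a = {"parallel": [0, 10], "antiparallel": [80, 0],
             "alpha": [0, 50], "3-10": [0, 20], "pi": [0, 0], "turn": [5, 10]}
    run_b = {"parallel": [0, 0], "antiparallel": [60, 0],
             "alpha": [0, 70], "3-10": [0, 10], "pi": [0, 0], "turn": [15, 0]}
    print(average_runs([run_a, run_b]))
```

Averaging the percentages directly gives every run the same weight, which matches the situation here because all analysed trajectories contain the same number of frames.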
The following two figures, one for perlucin with four calcium ions (run09) and one for MBP-A with three calcium ions (run07), show for every residue the average percentage of frames in which the residue is in a strand, helical or turn conformation. The figures for the remaining MD simulations can be found in Appendix III.R.3. and are omitted here to maintain readability.
Fig. 3.2.3. Average secondary structure conformation from six 10.2 ns simulations of perlucin with four calcium ions (run09). For every residue, the percentage of frames in which the residue adopts one of the following conformations is given. The “general helical” (violet) conformation is the sum of the α-helix, 3/10-helix and π-helix conformations and the “general strand” (yellow) conformation is the sum of parallel and anti-parallel β-strands (strictly, the sum of parallel and anti-parallel β-bridges). The third conformation is the “turn” (cyan) conformation. Note that, because the data are drawn as columns (bars), the residue number marker on the bottom axis is positioned on the left side of the corresponding column. For better orientation the (presumed) identifiers of the characteristic SSEs of CTLDs according to Zelensky et al. (Zelensky & Gready [2003], Fig. 2a therein) are given at the top of the graph.
Fig. 3.2.4. Average secondary structure conformation from three 10.2 ns simulations of the CTLD of MBP-A (PDB code 1KWV, chain A, residues 104-221) with three calcium ions (run07). For every residue, the percentage of frames in which the residue adopts one of the following conformations is given. The “general helical” (violet) conformation is the sum of the α-helix, 3/10-helix and π-helix conformations and the “general strand” (yellow) conformation is the sum of parallel and anti-parallel β-strands (strictly, the sum of parallel and anti-parallel β-bridges). The third conformation is the “turn” (cyan) conformation. Note that, because the data are drawn as columns (bars), the residue number marker on the bottom axis is positioned on the left side of the corresponding column. The violet and yellow crosses positioned at the 100% value of some residues indicate the secondary structure obtained for the crystal structure 1KWV (chain A) from the PDB web site. Here again all helix types are subsumed in the violet crosses and all β-strand and β-bridge content is subsumed in the yellow crosses. Note that the crosses are attached to the left-hand side of the corresponding column. For better orientation the (presumed) identifiers of the characteristic SSEs of CTLDs according to Zelensky et al. (Zelensky & Gready [2003], Fig. 2a therein) are given at the top of the graph.
As can be seen in Figures 3.2.3. and 3.2.4., the secondary structure elements that are expected for CTLDs in the long form (perlucin with β0-strand) and short form (MBP-A without β0-strand) can be identified. During the simulations of the CTLD of MBP-A, deviations from the secondary structure conformations obtained from the crystal structure can be observed. This can be seen in Fig. 3.2.4. by comparing the crosses, which reflect the secondary structure of the crystal structure, with the height of the columns, which represent data from the simulation. First of all, it has to be stated that the total number of simulated protein models/structures (twelve for perlucin and nine for MBP-A in total) is low compared to the number of molecules present at typical concentrations in laboratory experiments. Additionally, the initial models/structures might have partially non-native conformations due to modelling/crystallization. Since only one simulation parameter set was used in this thesis, its influence on the simulated proteins could not be inferred from the data. Therefore it cannot be expected that the distribution of the secondary structure conformations simulated here reflects the situation in a protein crystal used for experimental structure determination.
It was desirable to assign one unique secondary structure to each residue of the simulated structures/models. Since the time dependency of the secondary structure was not evaluated in this thesis, an arbitrary threshold was chosen to assign a “general helical” (α-helix, 3/10-helix, π-helix) or “general strand” (parallel and anti-parallel β-strands including β-bridges) conformation to the residues of the simulated proteins.
Referring to the averaged results of an MD simulation series, e.g. the results presented in Fig. 3.2.3. and 3.2.4., a conformation was assigned to a residue if, averaged over the analysed trajectories, the residue adopted this conformation in at least 75% of the frames. The result of this assignment is shown in Figure 3.2.5. For every MD simulation series every residue of the simulated protein was assigned an “h” (general helical) or “e” (general strand) if appropriate. This can be compared to the expected secondary structure. For perlucin this secondary structure could only be inferred from the alignment with the templates that was used during the modelling process (see section 3.1. and Fig. 3.1.4.). In the case of the CTLD of MBP-A the secondary structure was obtained from the PDB web page for the structures 1KWT and 1KWV. The PDB provides sequences annotated according to the DSSP algorithm. Note that 1KWT and 1KWV have identical sequences as well as identical secondary structures.
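The threshold rule can be written down in a few lines. This is only a sketch of the assignment as described in the text. The function name and the representation of the averaged percentages as two lists are assumptions.

```python
from typing import Sequence


def assign_consensus(helix_pct: Sequence[float],
                     strand_pct: Sequence[float],
                     threshold: float = 75.0) -> str:
    """Return one character per residue: 'h', 'e' or ' ' (no assignment).

    helix_pct and strand_pct hold the averaged percentage of frames in the
    "general helical" and "general strand" conformation for each residue.
    """
    out = []
    for h, e in zip(helix_pct, strand_pct):
        if h >= threshold:
            out.append("h")
        elif e >= threshold:
            out.append("e")
        else:
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    # Made-up averages for four residues; 74.2% stays below the threshold.
    print(repr(assign_consensus([90, 10, 0, 75], [0, 80, 74.2, 0])))
```

Because the two percentages of one residue cannot both reach 75%, the order of the two tests does not affect the result. A residue at 74.2% receives no label under this rule, which is why such a borderline case is marked separately as “(e)” in Fig. 3.2.5.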
A) perlucin
------
number | 1 10 20 30 40 50 60 70 80 90 100 110 120 130
PERLUCIN | GCPLGFHQNRRSCYWFSTIKSSFAEAAGYCRYLESHLAIISNKDEDSFIRGYATRLGEAFNYWLGASDLNIEGRWLWEGQRRMNYTNWSPGQPDNAGGIEHCLELRRDLGNYLWNDYQCQKPSHFICEKER
w/ 4 calcium | eee eeeee e hhhhhhhhhh ee hhhhhhhhh eee ee ee e eee eee e eeee
w/ 2 calcium | ee eeeee e hhhhhhhhhh ee hhhhhhh ee ee ee (e) eee eee eeeeee
w/o calcium | ee eeeee e hhhhhhhhhh ee hhhhhhh e ee ee eeee eeee e eeeee
DSSP expct. | EEE EEEEE B HHHHHHHHHH EE HHHHHHHHHh h EEEEEE EE b B ggg eeEEEE ggg EEEE ‡EEEEEEE
SSE-Id | b0 b1 a1 b1' a2 b2 b2'' b3 b4 b5
PERLUCIN | GCPLGFHQNRRSCYWFSTIKSSFAEAAGYCRYLESHLAIISNKDEDSFIRGYATRLGEAFNYWLGASDLNIEGRWLWEGQRRMNYTNWSPGQPDNAGGIEHCLELRRDLGNYLWNDYQCQKPSHFICEKER
------
B) CTLD of MBP-A (1KWT, 1KWV)
------
number | 1 10 20 30 40 50 60 70 80 90 100 110 118
1KWV/T chn. A| GKKSGKKFFVTNHERMPFSKVKALCSELRGTVAIPRNAEENKAIQEVAKTSAFLGITDEVTEGQFMYVTGGRLTYSNWKKDEPNDHGSGEDCVTIVDNGLWNDISCQASHTAVCEFPA
w/ 3 calcium | ee eeehhhhhhhhhh ee hhhhhhhhh ee ee ee e eeee hhh eeee ee ee
w/ 1 calcium | eee eeehhhhhhhhhh ee hhhhhhhhh ee ee ee e eeee hhh eeee ee eee
w/o calcium | --- ee eeehhhhhhhhhh ee hhhhhhhhhh ee ee ee e eeee eeee ee ee
DSSP (1KWV/T)| EEEEEEEEEEHHHHHHHHHH EE HHHHHHHHHHH EEEEEE EE B B EEEE GGG EEEE EEEEEEEE
SSE-Id | b1 a1 b1' a2 b2 b2'' b3 b4 b5
1KWV/T chn. A| GKKSGKKFFVTNHERMPFSKVKALCSELRGTVAIPRNAEENKAIQEVAKTSAFLGITDEVTEGQFMYVTGGRLTYSNWKKDEPNDHGSGEDCVTIVDNGLWNDISCQASHTAVCEFPA
------
Fig. 3.2.5. Summary of the secondary structure conformations of the CTLD of perlucin (A) and MBP-A (B) obtained from the MD simulations. Each part is organised as follows. The first line labelled “number” contains the residue numbering. The first residue of the simulated protein is assigned the number “1”. The next line labelled “sequence” contains the sequence of perlucin (residue 1 to 131) and MBP-A (residue 104 to 221 in PDB numbering). The following lines contain the secondary structure of every residue as obtained from the simulations.
For the two proteins the number of associated calcium ions differed in each simulation series. A general helical (“h”) or general strand (“e”) conformation is assigned only if the residue is, on average, in the respective conformation in at least 75% of the frames of the analysed trajectories. The line containing “DSSP” in its label holds the expected secondary structure of each residue. In the case of perlucin these conformations were taken from the alignment of the perlucin sequence with the templates during the modelling process (cf. section 3.1. and Fig. 3.1.4.). In the case of MBP-A the conformations were taken from the crystal structures 1KWT and 1KWV (both structures have identical secondary structure conformations). The PDB web page offers the sequences of the structures with conformational annotations made by the DSSP algorithm. “E” represents a β-strand, “B” a β-bridge, “H” an α-helix and “G” a 3/10-helix. In the case of perlucin lower-case letters are used to indicate that only one template has the corresponding conformation instead of both. The last line, labelled “SSE-Id”, refers to the SSE notation scheme for CTLDs as described by Zelensky et al. (Zelensky & Gready [2003]). The special character “‡” signals that one template residue is in a β-strand and the other in a β-bridge conformation. “(e)” in A) means that the residue was in strand conformation in 74.2% of the frames on average.
To provide more information, the secondary structure elements were divided into β-strands (“E”), β-bridges (“B”), α-helices (“H”) and 3/10-helices (“G”). As already introduced in Fig. 3.1.4., a lower-case letter in the expected secondary structure of a perlucin residue implies that only one of the two templates has this conformation. The SSEs that are characteristic for the CTLD (e.g. Zelensky & Gready [2003]) are given in Fig. 3.2.5. as well.
Two conclusions can be drawn from Fig. 3.2.5. For both perlucin and MBP-A, deviations of the average secondary structure assigned to each residue from the expected secondary structure can be observed. First of all, it has to be pointed out that both simulated proteins lack a considerable structural segment: perlucin lacks the C-terminal region, for which no structural information was available, and MBP-A lacks the N-terminal helical region. For the latter protein this might influence at least the stability of the first strand or even other parts of the protein, depending on the native state of MBP-A. It is suggested that this protein can form oligomers (see e.g. Heise et al. [2000], Weis & Drickamer [1994]). In the case of perlucin the native structure is not known; therefore nothing can be said about the influence of the C-terminal region on the overall protein stability.
For MBP-A the secondary structure reference was obtained from a crystal structure.
Since a protein crystal is not a native environment for proteins, the observed deviations might reflect the influence of the simulated environment on the overall protein structure. However, the results of the MD simulations with MBP-A as a reference protein set the benchmark for the best results that can be expected from the simulation protocol used in this thesis.
With respect to perlucin the most obvious explanation for any deviation is that the generated model has some shortcomings and differs from a native solution structure or the energetically most favourable one.
Nonetheless every SSE characteristic for CTLDs can be identified in every MD simulation series of perlucin and MBP-A, and the number of deviations is of the same order of magnitude for both proteins (if one naively counts the number of residues in Fig. 3.2.5. that are not in the secondary structure conformation expected for CTLDs and omits the 3/10-helices). Therefore the obtained secondary structure assignment is considered to be reasonable. No obvious influence of the calcium ions on the secondary structure is visible.
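One possible reading of this naive count is sketched below. The text does not specify the exact counting procedure, so the function name and the mapping of the expected DSSP letters onto the “h”/“e” classes are assumptions: “E” and “B” count as strand, “H” as helix, case is ignored, and positions whose expected label is a 3/10-helix are skipped.

```python
def count_deviations(simulated: str, expected: str) -> int:
    """Count residues whose simulated class differs from the expected one.

    simulated uses 'h', 'e' or ' ' per residue; expected uses DSSP-style
    letters (E/B strand, H helix, G 3/10-helix). Both strings must be
    aligned residue by residue. Any other expected character counts as
    "no secondary structure expected".
    """
    to_class = {"E": "e", "B": "e", "H": "h"}
    deviations = 0
    for sim, exp in zip(simulated, expected.upper()):
        if exp == "G":  # 3/10-helices are omitted from the count
            continue
        if to_class.get(exp, " ") != sim:
            deviations += 1
    return deviations


if __name__ == "__main__":
    # Made-up eight-residue example: one strand residue is missing.
    print(count_deviations("ee hhh  ", "EEEHHHGG"))
```

The count depends on both strings being aligned residue by residue, so it has to be applied to the original, correctly spaced rows of the figure.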
A final remark concerns the subsuming of the α-helix, 3/10-helix and π-helix conformations into a “general helical” class. The π-helix conformation is not encountered to a relevant extent during the MD simulations. In contrast, the 3/10-helical conformation is observed more frequently. In the Appendix, Figures III.R.10. to III.R.12. show (non-representative) examples from the conducted MD simulations.
Especially in the α2 helix of the CTLD fold, residues adopt a 3/10-helical conformation or switch between the 3/10- and α-helical conformations. This feature might be linked to the overall stability of the α2 helix. In a short review of α- and 3/10-helices in polypeptides, Bolin and Millhauser (Bolin & Millhauser [1999]) conclude, among other things, that the 3/10-helix could be an intermediate state between the unfolded and the α-helical conformation of polypeptides.
Therefore future investigations should address whether the instability of the C-terminal end of the α2 helix is the result of a modelling shortcoming or an actual feature of the protein. Recall that in the perlucin model the loop region between α2 and β2 lacked a template during the modelling process (see also the end of section 3.1.3.). As will become clear in section 3.2.5., this region shows high positional fluctuations. It would be interesting to investigate the behaviour of the structure of OC-17 (PDB accession number 1GZ2), which has a 15-residue-long segment between its α2 helix and β2 strand.
3.2.3. Solvent accessible surface area estimation of the CTLD of perlucin and