3. Results and Discussion
3.3. Computational docking analysis of the CTLD of perlucin
3.3.2. Basic principles of protein-protein docking with ATTRACT
In the following, some general remarks about protein-protein docking are made and the program ATTRACT (Zacharias [2003], Fiorucci & Zacharias [2010]) is introduced. In the scope of this thesis the term “docking” is to be understood as “generating and evaluating protein-protein complexes (or more generally protein-ligand complexes) with computer algorithms”.
Protein-ligand interactions play major roles in the function of organisms. The ligands that can bind to proteins can be ions (e.g. the Ca2+-binding protein calmodulin), small molecules (e.g. the oxygen-carrying globin protein family), DNA (e.g. DNA polymerase), polysaccharides (e.g. lysozyme, which catalyses polysaccharide cleavage), proteins (e.g. the Tim-Per heterodimer of the circadian clock in Drosophila) and even solids (e.g. ice-binding antifreeze proteins) (the latter example taken from Jia & Davies [2002], the first examples arbitrarily taken from Alberts et al. [2002]).
There are proteins with a CTLD (the author is currently aware of at least three to five) that can form homodimers in solution. Poget and co-workers (Poget et al. [1999]) showed that the recombinant C-type lectin TC14 (UniProt accession number P16108) from Polyandrocarpa misakiensis forms dimers under “physiological conditions” (p. 869) and obtained a crystal structure of the dimer (PDB accession code 1TLG). Note that Suzuki et al. (Suzuki et al. [1990]) had concluded from earlier analytical gel filtration experiments that TC14 is a monomer in solution. It was speculated that the protein is part of the animal's defence system (Suzuki et al. [1990]). TC14 might also be involved “in bud morphogenesis” (Kawamura et al. [1991], p. 995) in Polyandrocarpa misakiensis.
Another example of a dimeric CTLD is the “human hematopoietic cell receptor CD69” or “early activation antigen CD69” (Llera et al. [2001], UniProt accession number Q07108). CD69 is a transmembrane receptor whose CTLD can be found, for example, on the surface of lymphocytes. There it forms homodimers connected through at least one disulphide bridge that is not part of the CTLD itself (Llera et al. [2001], Testi et al. [1994]). Llera et al. determined that the recombinant extracellular CTLD of CD69 can form non-covalently bound dimers and obtained a crystal structure of this dimer.
CD69 and TC14 are introduced here as examples since their structures will be used to test the systematic docking approach that is used in this thesis for perlucin.
The general aim of computational docking methods is to predict the structure of protein-ligand complexes. Reviews on some key issues and algorithms used in protein-protein docking are given by Moreira et al. (Moreira et al. [2010]), Halperin et al. (Halperin et al. [2002]) as well as Smith and Sternberg (Smith & Sternberg [2002]).
The central parts of docking are: the representation of the structures under investigation, the sampling of possible complex conformations and the assessment of the obtained complexes (see aforementioned reviews). In the following it is described how the ATTRACT program package performs systematic docking (Zacharias [2003], Zacharias [2008], Fiorucci & Zacharias [2010]).
Protein representation by ATTRACT
The protein structures are represented by a reduced model. While the backbone atoms are retained, the sidechain atoms are replaced by no more than two “pseudo atoms”. Note, however, that only the backbone nitrogen and oxygen atoms are involved in the energy calculations during the docking process explained in the next paragraph.
Fig. 3.3.4. gives three examples of the placement of pseudo atoms in residues. The position of the pseudo atom in the case of the “small” residues (Ala, Ser, Thr, Val, Leu, Ile, Asn, Asp, Pro, Cys) is the geometric centre of the sidechain heavy atoms (here this includes the Cα atom). The exceptional Gly residue is represented by the backbone C, N, O and Cα. The remaining residues are described by two pseudo atoms. In the case of Tyr, Met and Phe the first one is placed half-way between the sidechain Cβ and Cγ atoms. The second one is placed at the geometric centre of the remaining heavy atoms.
In the case of the residues Glu, Arg, Lys, Trp and Gln the first pseudo atom is placed at the position of the Cγ atom. The position of the second pseudo atom is different for each residue. Arg: geometric centre of Nε and Cζ. Glu: geometric centre of Cδ, Oε1 and Oε2. Gln: geometric centre of Cδ, Oε1 and Nε2. Lys: position is equivalent to Cε. Trp: geometric centre of Cδ2, Cε2, Cε3, Cη2, Cζ3, Cζ2. This information was inferred directly from the FORTRAN source code (reduce.f) of the “reduce” software module of the ATTRACT package, which produces the reduced protein structures.
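The placement rules for the “small” residues and for Tyr, Met and Phe can be illustrated with a minimal sketch. It is not the code of reduce.f; the function names, the dictionary layout and the coordinates are hypothetical, and it is assumed here that “remaining heavy atoms” means the sidechain atoms other than Cα, Cβ and Cγ.

```python
from statistics import fmean

BACKBONE = {"N", "C", "O"}  # CA is deliberately counted with the sidechain here


def geometric_centre(coords):
    """Unweighted mean position of a sequence of (x, y, z) tuples."""
    xs, ys, zs = zip(*coords)
    return (fmean(xs), fmean(ys), fmean(zs))


def small_residue_pseudo_atom(atoms):
    """One pseudo atom for the 'small' residues: centre of the sidechain
    heavy atoms including CA. `atoms` maps atom names to coordinates."""
    return geometric_centre(
        [xyz for name, xyz in atoms.items() if name not in BACKBONE]
    )


def phe_like_pseudo_atoms(atoms):
    """Two pseudo atoms for Tyr, Met and Phe: the first half-way between
    CB and CG, the second at the centre of the remaining sidechain atoms.
    Assumption: 'remaining' excludes the backbone, CA, CB and CG."""
    first = geometric_centre([atoms["CB"], atoms["CG"]])
    skip = BACKBONE | {"CA", "CB", "CG"}
    second = geometric_centre(
        [xyz for name, xyz in atoms.items() if name not in skip]
    )
    return first, second


# Made-up coordinates (in Å) for an Ala-like residue, for illustration only.
ala = {"N": (-1.0, 1.0, 0.0), "CA": (0.0, 0.0, 0.0), "C": (1.0, 1.0, 0.0),
       "O": (2.0, 2.0, 0.0), "CB": (0.0, -1.5, 0.0)}
print(small_residue_pseudo_atom(ala))  # centre of CA and CB
```

For the Ala-like example the pseudo atom lies half-way between Cα and Cβ, as expected when these are the only two contributing atoms.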
Fig. 3.3.4. Examples of protein residues with all non-hydrogen atoms and their representation in the reduced model used by ATTRACT. In every case the spheres, independent of their colours, represent the positions of atoms in the reduced model. In the reduced model the heavy backbone atoms C, N, O and Cα are retained. Note, however, that only the backbone nitrogen and oxygen atoms are involved in the energy calculations during the docking process. Cyan symbolizes carbon atoms, blue is the colour for nitrogen atoms and red for oxygen atoms. The pseudo atoms that represent the sidechains are shown as orange spheres. Asn is an example of a “small” residue whose sidechain is represented by one pseudo atom. It is positioned at the geometric centre of all heavy sidechain atoms including Cα. The sidechain of Trp is represented by two pseudo atoms. In this case the first one is placed at the position of Cγ and the second one in the ring formed by six carbon atoms. Phe exemplifies a residue with two pseudo atoms as well. The first one is placed half-way between Cβ and Cγ and the second one at the geometric centre of the remaining sidechain atoms. The atom labels follow the IUPAC recommendations (Markley et al. [1998], see Appendix III.A.) and the structures are rendered with VMD (Humphrey et al. [1996] version 1.9.1).
This kind of reduced protein representation not only saves computational time but also “reduce[s] the number of energy minima on the surface of the protein partners” (Zacharias [2003], p. 1279).
Effective interaction between pseudo atoms
In ATTRACT four parameters are assigned to each possible pair of pseudo atoms (see Fiorucci & Zacharias [2010] and especially the supplementary material). These parameters are necessary to calculate the pairwise interaction energy. Note that in the context of ATTRACT this interaction energy has to be understood as an “effective interaction” (Fiorucci & Zacharias [2010], p. 3132) energy. As long as the ATTRACT methodology and the docking results are discussed, the terms “interaction energy” and “effective interaction energy” are used interchangeably.
ATTRACT distinguishes a priori and explicitly between repulsive and attractive pseudo atom pairs. The interaction energy $V_{AB}$ between an attractive pair of atoms A and B at a distance $r_{AB}$ is given by

$$V_{AB}(r_{AB}) = \epsilon_{AB}\left[\left(\frac{R_{AB}}{r_{AB}}\right)^{8} - \left(\frac{R_{AB}}{r_{AB}}\right)^{6}\right] + \frac{q_A q_B}{\varepsilon(r_{AB})\, r_{AB}} \qquad (3.3.1.)$$

$R_{AB}$ and $\epsilon_{AB}$ are effective pairwise Lennard-Jones interaction parameters. Note that in the case of pure Lennard-Jones interactions the minimum position is $r_{AB}^{LJ,min} = \sqrt{4/3}\, R_{AB}$ and consequently the minimal Lennard-Jones interaction energy is $V_{AB}^{LJ}(r_{AB}^{LJ,min}) = -(27/256)\, \epsilon_{AB}$. Additionally the Coulomb energy between pseudo atoms A and B is considered if A and B originate from the charged residues Lys, Arg, Glu or Asp. The charge is the integer ±1. The Coulomb interaction is additionally reduced with a distance-dependent dielectric constant $\varepsilon(r_{AB}) = 15 \cdot r_{AB}$.
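The stated minimum position and minimum energy can be checked with a short calculation. It assumes that the Lennard-Jones-type term has the 8-6 form, which is the form consistent with the values $\sqrt{4/3}\,R_{AB}$ and $-(27/256)\,\epsilon_{AB}$ quoted above.

```latex
\begin{align*}
V^{LJ}_{AB}(r) &= \epsilon_{AB}\left[\left(\frac{R_{AB}}{r}\right)^{8}
                 - \left(\frac{R_{AB}}{r}\right)^{6}\right] \\
\frac{dV^{LJ}_{AB}}{dr} &= \epsilon_{AB}\left[-\frac{8 R_{AB}^{8}}{r^{9}}
                 + \frac{6 R_{AB}^{6}}{r^{7}}\right] = 0
  \;\Rightarrow\; r^{2} = \tfrac{4}{3} R_{AB}^{2}
  \;\Rightarrow\; r^{LJ,min}_{AB} = \sqrt{4/3}\, R_{AB} \\
V^{LJ}_{AB}\!\left(r^{LJ,min}_{AB}\right)
  &= \epsilon_{AB}\left[\left(\tfrac{3}{4}\right)^{4}
     - \left(\tfrac{3}{4}\right)^{3}\right]
   = \epsilon_{AB}\,\frac{81 - 108}{256}
   = -\frac{27}{256}\,\epsilon_{AB}
\end{align*}
```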
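The effective energy of repulsive pseudo atom pairs is calculated as

$$V_{AB}(r_{AB}) = \begin{cases}
-\epsilon_{AB}\left[\left(\dfrac{R_{AB}}{r_{AB}}\right)^{8} - \left(\dfrac{R_{AB}}{r_{AB}}\right)^{6}\right] + \dfrac{q_A q_B}{\varepsilon(r_{AB})\, r_{AB}} ; & r_{AB} > r_{AB}^{LJ,min} \\[2ex]
2 \cdot \left|V_{AB}^{LJ}(r_{AB}^{LJ,min})\right| + \epsilon_{AB}\left[\left(\dfrac{R_{AB}}{r_{AB}}\right)^{8} - \left(\dfrac{R_{AB}}{r_{AB}}\right)^{6}\right] + \dfrac{q_A q_B}{\varepsilon(r_{AB})\, r_{AB}} ; & r_{AB} \le r_{AB}^{LJ,min}
\end{cases} \qquad (3.3.2.)$$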
Note that neither ions nor water molecules are included in the reduced representation of the proteins. To illustrate the interaction energies given by the equations 3.3.1. and 3.3.2. two exemplary interaction energy graphs are shown in Fig. 3.3.5.
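The two effective potentials can be written down as a minimal sketch, assuming the 8-6 form of the Lennard-Jones-type term and the dielectric function ε(r) = 15·r given in the text. The function names are hypothetical, any unit conversion prefactor of the Coulomb term is omitted (as in the equations above), and the parameter values in the example call are placeholders, not entries of the ATTRACT parameter set.

```python
import math


def lj_86(r, eps, R):
    """Pure 8-6 Lennard-Jones-type term: eps * [(R/r)^8 - (R/r)^6]."""
    x = R / r
    return eps * (x ** 8 - x ** 6)


def coulomb(r, qa, qb):
    """Coulomb term q_a*q_b / (eps(r) * r) with eps(r) = 15 * r.
    A unit conversion prefactor is deliberately left out."""
    return qa * qb / (15.0 * r * r)


def v_attractive(r, eps, R, qa=0, qb=0):
    """Effective energy of an attractive pair (eq. 3.3.1.)."""
    return lj_86(r, eps, R) + coulomb(r, qa, qb)


def v_repulsive(r, eps, R, qa=0, qb=0):
    """Effective energy of a repulsive pair (eq. 3.3.2.)."""
    r_min = math.sqrt(4.0 / 3.0) * R
    if r > r_min:
        return -lj_86(r, eps, R) + coulomb(r, qa, qb)
    e_min = abs(lj_86(r_min, eps, R))
    return 2.0 * e_min + lj_86(r, eps, R) + coulomb(r, qa, qb)


# Placeholder parameters (eps in units of RT, R in Å), illustration only.
for r in (3.0, 4.0, 5.0, 6.0):
    print(r, v_attractive(r, 1.0, 4.0), v_repulsive(r, 1.0, 4.0))
```

For uncharged pairs both branches of the repulsive potential give the value +(27/256)·ε at the Lennard-Jones minimum position, so the piecewise definition is continuous there and the repulsive curve never becomes negative.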
Fig. 3.3.5. Exemplary interaction energy of pseudo atoms. The red graph shows the attractive interaction between the sidechain pseudo atoms of two Ala residues. The blue graph shows the repulsive interaction between the sidechain pseudo atoms of an Ala and an Asn residue. In both cases the pseudo atoms are not charged. The interaction energy is given in units of R ⋅ T, where R = 8.31 J/(mol K) and T is room temperature (Fiorucci & Zacharias [2010]).
So far the proteins have been described in a reduced representation, and parameters that are supposed to reflect their physico-chemical properties have been assigned to the pseudo atoms. This raises the question why the proteins are not treated in an all-atom fashion with the force field parameters used in MD simulations. To answer this question one has to consider how ATTRACT samples the possible protein-protein complexes.
Sampling of protein-protein complexes by ATTRACT
The position of one protein in the reduced representation is kept fixed. This protein is denoted here as the receptor. In a first step, several starting positions for the ligand centre are generated around the surface of the receptor. The distances of these starting points from the receptor surface are slightly larger than the largest distance between any of the ligand's pseudo atoms and its geometric centre. The methodology employed by ATTRACT seems to be similar to that already described for the determination of the SASA (see section 3.2.3.). The total number of these starting points ranges between 83 and 104 for the structures investigated in this thesis. Fig. 3.3.6. exemplifies the distribution of ligand starting positions around a receptor (here perlucin).
Fig. 3.3.6. Exemplary distribution of ligand starting positions (blue spheres) around a perlucin receptor molecule. The geometric centre of the ligand is placed at the positions of the blue spheres. The molecule is rendered with VMD (Humphrey et al. [1996] version 1.9.1). The “New Cartoon” representation of the protein involves the STRIDE algorithm (Frishman & Argos [1995]).
At each of the starting positions (blue spheres in Fig. 3.3.6.) the geometric centre of the ligand is placed and the ligand is subsequently rotated. This generates several different relative orientations between ligand and receptor at each of the starting points. As far as could be determined here, 228 different relative orientations per starting point are generated through ligand rotation. In total around 20000 initial ligand-receptor pairs with different relative orientations are generated. Note that these pairs are not docked yet. They are still spatially separated.
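The total number of initial pairs follows from multiplying the number of starting points by the number of orientations per starting point. The short sketch below only reproduces this arithmetic for the two limits of the range reported in the text; the function name is hypothetical.

```python
ORIENTATIONS_PER_POINT = 228  # value reported in the text


def initial_pairs(n_start_points, n_orientations=ORIENTATIONS_PER_POINT):
    """Number of initial ligand-receptor pairs before any minimization."""
    return n_start_points * n_orientations


for n in (83, 104):  # range of starting points reported in the text
    print(n, initial_pairs(n))  # 83 -> 18924, 104 -> 23712
```

Both limits are consistent with the figure of around 20000 initial pairs.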
In the systematic docking approach of ATTRACT the next step is to minimize the total effective potential energy (sum of the pairwise effective potential energy in equations 3.3.1. and 3.3.2.) for each of the relative orientations of the ligand and receptor as described in the preceding paragraph. During the minimization the ligand is allowed to rotate and translate.
This minimization can be performed in several stages (here four stages). In this thesis the minimization stages differ in the number of minimization steps, the cut-off distance that is used to determine the interacting partners at the beginning of each stage and whether positional restraints are used.
The first two minimization stages are performed with a harmonic positional restraint between the geometric centre of the receptor and the Cα atom of the ligand that is closest to the geometric centre of the receptor. This additional harmonic potential ensures that the ligand gets into close contact with the receptor surface during the first minimization stages.
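The restraint can be illustrated with a minimal sketch, assuming a plain harmonic form 0.5·k·d². The force constant `k`, the rest length `d0` and the function names are hypothetical, since the text does not state the values or the exact functional form used by ATTRACT.

```python
import math


def closest_ca_to_centre(ligand_ca, receptor_centre):
    """Ligand C-alpha position closest to the receptor's geometric centre."""
    return min(ligand_ca, key=lambda xyz: math.dist(xyz, receptor_centre))


def restraint_energy(ligand_ca, receptor_centre, k=1.0, d0=0.0):
    """Harmonic restraint 0.5 * k * (d - d0)^2 between the receptor centre
    and the closest ligand C-alpha. k and d0 are placeholder values."""
    anchor = closest_ca_to_centre(ligand_ca, receptor_centre)
    d = math.dist(anchor, receptor_centre)
    return 0.5 * k * (d - d0) ** 2


# Made-up coordinates in Å, illustration only.
ligand_ca = [(10.0, 0.0, 0.0), (4.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
print(restraint_energy(ligand_ca, (0.0, 0.0, 0.0), k=2.0))
```

Because the energy grows with the distance to the receptor centre, its gradient pulls the ligand towards the receptor until the repulsive part of the effective potential balances it.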
The next two minimization stages are performed without this additional harmonic potential. In these stages the ligand is supposed to adopt the energetically most favourable orientation with respect to the receptor.
The cut-off distance that is used to determine the interacting pseudo atoms is successively reduced to ≈ 7.1 Å. The set of interacting pseudo atoms is determined only at the beginning of each minimization stage.
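Determining the interacting pseudo atoms once per stage amounts to building a pair list with the current cut-off. The following minimal sketch illustrates this; the function name and the coordinates are hypothetical, and of the cut-off values only the final ≈ 7.1 Å is taken from the text.

```python
import math


def pair_list(receptor, ligand, cutoff):
    """Index pairs (i, j) of receptor and ligand pseudo atoms that are
    closer than `cutoff` (in Å). Built once at the start of a stage."""
    return [
        (i, j)
        for i, a in enumerate(receptor)
        for j, b in enumerate(ligand)
        if math.dist(a, b) < cutoff
    ]


# Made-up pseudo atom coordinates in Å, illustration only.
receptor = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
ligand = [(5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

# The larger cut-off is a placeholder; only 7.1 Å appears in the text.
for cutoff in (12.0, 7.1):
    print(cutoff, pair_list(receptor, ligand, cutoff))
```

Keeping the list fixed during a stage avoids recomputing all pairwise distances in every minimization step, at the price that pairs which approach each other during the stage are only picked up at the start of the next one.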
To perform the energy minimization of several thousand starting orientations in reasonable time it is necessary to reduce the number of interacting atoms. This is achieved with the reduced protein representation. In summary the ATTRACT program constructs the docked protein-protein complexes by 1) generating a large number of different initial orientations between receptor and ligand and 2) minimizing the effective energy between the ligand and the receptor.
Assessment of the generated complexes
The final step is the assessment of the docked complexes. Initially the complexes are ranked by ascending effective energy. It is assumed that complexes with lower effective energy are closer to the native conformation.
In a second step the generated complexes are filtered according to two conditions. After the minimization of every initial ligand-receptor orientation, several of the final ligand positions may be similar. ATTRACT considers two ligand positions/orientations as equivalent if they can be superposed with a rotation of less than 3.4° (for each angle) and a translation of less than 0.45 Å (for each centre coordinate).
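The ranking and the removal of equivalent poses can be sketched as follows. This is not the code of col_sym.f: the pose layout (a dictionary with an energy, three rotation angles in degrees and a centre in Å) and the function names are hypothetical, and the handling of the angle periodicity is a simplification.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def equivalent(p, q, max_angle=3.4, max_shift=0.45):
    """True if every angle differs by less than max_angle (degrees) and
    every centre coordinate by less than max_shift (Å)."""
    angles_close = all(
        angle_difference(a, b) < max_angle
        for a, b in zip(p["angles"], q["angles"])
    )
    centre_close = all(
        abs(a - b) < max_shift for a, b in zip(p["centre"], q["centre"])
    )
    return angles_close and centre_close


def rank_and_filter(poses):
    """Sort by ascending energy and keep only the lowest-energy
    representative of each group of equivalent poses."""
    kept = []
    for pose in sorted(poses, key=lambda p: p["energy"]):
        if not any(equivalent(pose, k) for k in kept):
            kept.append(pose)
    return kept


# Made-up poses, illustration only.
poses = [
    {"energy": -5.0, "angles": (10.0, 20.0, 30.0), "centre": (1.0, 1.0, 1.0)},
    {"energy": -9.0, "angles": (11.0, 21.0, 31.0), "centre": (1.2, 1.1, 0.9)},
    {"energy": -7.0, "angles": (50.0, 20.0, 30.0), "centre": (1.0, 1.0, 1.0)},
]
print([p["energy"] for p in rank_and_filter(poses)])
```

In the example the first pose is equivalent to the second one and has the higher energy, so it is discarded.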
An additional filter that can be used to reduce the number of generated complexes evaluates the symmetry of the complexes.
The current experimental information (see section 3.4.) suggests, if anything, a dimeric complex of perlucin under certain experimental conditions. If proteins form a dimer, it is reasonable to assume that receptor and ligand contribute nearly the same residues to the interface. If they did not, complexes containing more than two proteins could form. Note that this implies that experimental conditions such as the protein concentration do not influence the oligomerisation behaviour.
In general ATTRACT checks the symmetry of a complex via the pairwise distances of atoms. Consider a receptor and a ligand with the same number and same sequence of atoms. Let $r_{rec}^{i}$ and $r_{lig}^{i}$ denote the positions of atom $i$ in the receptor and ligand, respectively, and let $r_{rec}^{j}$ and $r_{lig}^{j}$ denote the positions of atom $j$ in the receptor and ligand, respectively. For symmetric complexes the relation $|r_{rec}^{i} - r_{lig}^{j}| = |r_{rec}^{j} - r_{lig}^{i}|$ must hold for every atom pair. This rigorous condition is softened in the actual calculations: the two distances are allowed to differ by at most 8.4 Å to account for moderate structural differences between the receptor and the ligand. Additionally, only atom pairs with a distance < 22.4 Å are evaluated.
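The softened symmetry criterion can be illustrated with a minimal sketch. It is not the code of col_sym.f; the function name is hypothetical, and it is assumed here that the 22.4 Å limit applies to the distance between atom i of the receptor and atom j of the ligand.

```python
import math


def is_symmetric(receptor, ligand, tol=8.4, max_dist=22.4):
    """Check |r_rec_i - r_lig_j| ~ |r_rec_j - r_lig_i| for all atom pairs.
    Only pairs closer than max_dist (Å) are evaluated, and the two
    distances may differ by at most tol (Å)."""
    n = len(receptor)
    for i in range(n):
        for j in range(n):
            d_ij = math.dist(receptor[i], ligand[j])
            d_ji = math.dist(receptor[j], ligand[i])
            if d_ij < max_dist and abs(d_ij - d_ji) > tol:
                return False
    return True


# Made-up coordinates in Å. The first ligand is the receptor rotated by
# 180 degrees about the z axis (a two-fold symmetric arrangement).
receptor = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 0.0, 2.0)]
c2_ligand = [(-x, -y, z) for x, y, z in receptor]
print(is_symmetric(receptor, c2_ligand))

# A pure translation of the receptor is not a two-fold symmetric dimer.
rec2 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
shifted = [(12.0, 0.0, 0.0), (22.0, 0.0, 0.0)]
print(is_symmetric(rec2, shifted))
```

For an exact two-fold rotation the two distances are identical for every pair, whereas for the translated copy the distances of the pair (0, 1) differ by 20 Å, which exceeds the tolerance.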
This information was inferred directly from the FORTRAN source code (col_sym.f) of the “col_sym” software module of the ATTRACT package, which performs the filtering of the unique and symmetric complexes.
For the proteins that were used in this thesis for computational docking studies, the filtering procedures described above resulted in a number of complexes on the order of 100 for each docking run.
Beyond rigid docking
So far only the docking of rigid structures has been described. “Rigid” means that the protein structures are treated as rigid bodies. Obviously this is a strong simplification. Halperin et al. (Halperin et al. [2002]) summarised three possible kinds of changes that can occur between the bound (in a complex) and unbound state of a protein. Particular residues can change their conformation upon complexation (see for example Fig. 5 in Betts & Sternberg [1999]), larger protein segments can adopt new positions during the protein-ligand interaction (see for example Ramakrishnan & Qasba [2001], where the conformational change of a galactosyltransferase upon binding of a small molecule is shown, esp. Fig. 8 therein) and even intrinsically disordered segments of proteins or peptides can adopt a fold upon binding to a protein (see e.g. Dyson & Wright [2005]).
The ATTRACT program package can provide approaches to tackle the first two of the aforementioned issues. Different conformations of the large sidechains can be included explicitly in the structures before their representation is reduced. During the minimization steps the different conformations are evaluated with respect to the effective potential. The sidechain conformation with the lowest effective potential energy is used during the final minimizations (Zacharias [2003]).
Concerning the second issue, ATTRACT is capable of accounting for larger conformational changes of the protein structure. Briefly, this is done by the calculation of low-frequency harmonic modes. They are obtained from a harmonic potential between the Cα atoms of the protein structure. The force constant of the harmonic potential is distance-dependent and decays with increasing distance. From this harmonic potential low-frequency oscillations of the backbone (sidechains included as rigid bodies) can be calculated and considered during the effective energy minimization of the docking procedure (see May & Zacharias [2008] for the implementation in ATTRACT, see Hinsen [1998] for principles of normal mode calculations).
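The construction of such a harmonic Cα network can be illustrated with a minimal sketch that builds the matrix of second derivatives (Hessian) of the harmonic potential. The exponential decay of the force constant and the values of `k0` and `r0` are hypothetical choices for illustration and are not taken from ATTRACT. The low-frequency modes would then be the eigenvectors of this matrix with the smallest non-zero eigenvalues, obtained with any linear algebra routine (not shown).

```python
import math


def force_constant(r, k0=1.0, r0=3.0):
    """Distance-dependent force constant that decays with distance.
    The Gaussian form and the values of k0 and r0 are placeholders."""
    return k0 * math.exp(-((r / r0) ** 2))


def hessian(ca):
    """3N x 3N Hessian of a harmonic network between all C-alpha pairs."""
    n = len(ca)
    H = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [ca[j][a] - ca[i][a] for a in range(3)]
            r2 = sum(c * c for c in d)
            k = force_constant(math.sqrt(r2))
            for a in range(3):
                for b in range(3):
                    h = k * d[a] * d[b] / r2
                    H[3 * i + a][3 * j + b] -= h  # off-diagonal blocks
                    H[3 * j + b][3 * i + a] -= h
                    H[3 * i + a][3 * i + b] += h  # diagonal blocks
                    H[3 * j + a][3 * j + b] += h
    return H


# Made-up C-alpha coordinates in Å, illustration only.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.0, 0.0), (2.0, 4.0, 1.0)]
H = hessian(ca)
print(len(H), len(H[0]))
```

By construction the matrix is symmetric and a rigid translation of all Cα atoms costs no energy, which is a useful sanity check before any modes are computed.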
Large conformational changes like transition from an unfolded to folded state of a protein are currently not supported by ATTRACT.
The options of different sidechain conformations and low-frequency modes were not exploited in this thesis for the systematic docking of perlucin or of the test proteins TC14 and CD69. In the case of perlucin the choice of six different structures (see Fig. 3.3.3.) was supposed to account for different conformations in the first instance.
In the following it is shown that the docking procedure used in this thesis without refinements (sidechain conformations and low-frequency modes) can predict some residues of the interfaces of the crystal structures of CTLD dimers if the monomeric protein structures from the crystallised dimers are used. This latter point is taken up again in the next section. Furthermore, the procedure by which the interface residues were determined is explained in the next section using the reference dimers as examples.