

3. Results and Discussion

3.3. Computational docking analysis of the CTLD of perlucin

3.3.2. Basic principles of protein-protein docking with ATTRACT

In the following, some general remarks about protein-protein docking are made and the program ATTRACT (Zacharias [2003], Fiorucci & Zacharias [2010]) is introduced. Within the scope of this thesis, the term "docking" is to be understood as "generating and evaluating protein-protein complexes (or, more generally, protein-ligand complexes) with computer algorithms".

Protein-ligand interactions play major roles in the function of organisms. The ligands that bind to proteins can be ions (e.g. the Ca2+-binding protein calmodulin), small molecules (e.g. the oxygen-carrying globin protein family), DNA (e.g. DNA polymerase), polysaccharides (e.g. lysozyme, which catalyses polysaccharide cleavage), proteins (e.g. the Tim-Per heterodimer of the circadian clock in Drosophila) and even solids (e.g. ice-binding antifreeze proteins). The last example is taken from Jia & Davies [2002]; the others were chosen arbitrarily from Alberts et al. [2002].

Several proteins with a CTLD (the author is currently aware of at least three to five) can form homodimers in solution. Poget and co-workers (Poget et al. [1999]) showed that the recombinant C-type lectin TC14 (UniProt accession number P16108) from Polyandrocarpa misakiensis forms dimers under "physiological conditions" (p. 869) and obtained a crystal structure of the dimer (PDB accession code 1TLG). Note that Suzuki et al. (Suzuki et al. [1990]) had concluded from earlier analytical gel filtration experiments that TC14 is a monomer in solution. It was speculated that the protein is part of the animal's defence system (Suzuki et al. [1990]). TC14 might also be involved "in bud morphogenesis" (Kawamura et al. [1991], p. 995) in Polyandrocarpa misakiensis.

Another example of a dimeric CTLD is the "human hematopoietic cell receptor CD69" or "early activation antigen CD69" (Llera et al. [2001], UniProt accession number Q07108). CD69 is a transmembrane receptor whose CTLD is found, for example, on the surface of lymphocytes. There it forms homodimers linked by at least one disulphide bridge that is not part of the CTLD itself (Llera et al. [2001], Testi et al. [1994]). Llera et al. found that the recombinant extracellular CTLD of CD69 can form non-covalently bound dimers and obtained a crystal structure of this dimer.

CD69 and TC14 are introduced here as examples because their structures are used to test the systematic docking approach that is applied to perlucin in this thesis.

The general aim of computational docking methods is to predict the structure of protein-ligand complexes. Reviews of key issues and algorithms in protein-protein docking are given by Moreira et al. (Moreira et al. [2010]), Halperin et al. (Halperin et al. [2002]) and Smith and Sternberg (Smith & Sternberg [2002]).

The central parts of docking are the representation of the structures under investigation, the sampling of possible complex conformations and the assessment of the obtained complexes (see the aforementioned reviews). The following describes how the ATTRACT program package performs systematic docking (Zacharias [2003], Zacharias [2008], Fiorucci & Zacharias [2010]).

 

Protein  representation  by  ATTRACT    

The protein structures are represented by a reduced model. While the backbone atoms are retained, the sidechain atoms are replaced by at most two "pseudo atoms". Note, however, that only the backbone nitrogen and oxygen atoms are involved in the energy calculations during the docking process described below.

Fig. 3.3.4. gives three examples of the placement of pseudo atoms in residues. For the "small" residues (Ala, Ser, Thr, Val, Leu, Ile, Asn, Asp, Pro, Cys), the pseudo atom is placed at the geometric centre of the sidechain heavy atoms (here including the Cα atom). Gly, which has no sidechain, is represented only by the backbone atoms C, N, O and Cα. The remaining residues are described by two pseudo atoms. For Tyr, Met and Phe, the first one is placed half-way between the sidechain Cβ and Cγ atoms, and the second one at the geometric centre of the remaining sidechain heavy atoms.

For the residues Glu, Arg, Lys, Trp and Gln, the first pseudo atom is placed at the position of the Cγ atom. The position of the second pseudo atom differs for each residue. Arg: geometric centre of Nε and Cζ. Glu: geometric centre of Cδ, Oε1 and Oε2. Gln: geometric centre of Cδ, Oε1 and Nε2. Lys: position of Cε. Trp: geometric centre of Cδ2, Cε2, Cε3, Cη2, Cζ3 and Cζ2. This information was inferred directly from the FORTRAN source code (reduce.f) of the "reduce" module of the ATTRACT package, which produces the reduced protein structures.
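The placement rules listed above can be summarised in a short Python sketch. It assumes PDB-style heavy-atom names (CA, CB, CG, NE, ...) and takes the heavy-atom coordinates of a single residue as a dictionary; the helper `sidechain_pseudo_atoms` is hypothetical and is not the `reduce.f` routine itself. Residues for which the text gives no rule (e.g. His) raise an error rather than being guessed.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]

BACKBONE = ("N", "CA", "C", "O")
# One pseudo atom at the centroid of the sidechain heavy atoms plus CA
SMALL = {"ALA", "SER", "THR", "VAL", "LEU", "ILE", "ASN", "ASP", "PRO", "CYS"}
# First pseudo atom half-way between CB and CG, second at the centroid of the rest
CB_CG_TYPE = {"TYR", "MET", "PHE"}
# First pseudo atom on CG, second at the centroid of the listed atoms
SECOND_ATOMS = {
    "ARG": ["NE", "CZ"],
    "GLU": ["CD", "OE1", "OE2"],
    "GLN": ["CD", "OE1", "NE2"],
    "LYS": ["CE"],
    "TRP": ["CD2", "CE2", "CE3", "CH2", "CZ3", "CZ2"],
}


def centroid(points: List[Vec]) -> Vec:
    """Geometric centre of a list of 3D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def sidechain_pseudo_atoms(resname: str, atoms: Dict[str, Vec]) -> List[Vec]:
    """Hypothetical helper: sidechain pseudo atoms of one residue (heavy atoms only)."""
    side = {name: xyz for name, xyz in atoms.items() if name not in BACKBONE}
    if resname == "GLY":
        return []  # represented by its backbone atoms only
    if resname in SMALL:
        return [centroid(list(side.values()) + [atoms["CA"]])]
    if resname in CB_CG_TYPE:
        rest = [xyz for name, xyz in side.items() if name not in ("CB", "CG")]
        return [centroid([atoms["CB"], atoms["CG"]]), centroid(rest)]
    if resname in SECOND_ATOMS:
        return [atoms["CG"], centroid([atoms[a] for a in SECOND_ATOMS[resname]])]
    raise ValueError(f"no placement rule given in the text for {resname}")
```

The backbone atoms are kept unchanged in the reduced model, so the sketch only returns the sidechain pseudo atoms.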

 

 

Fig. 3.3.4. Examples of protein residues with all non-hydrogen atoms and their representation in the reduced model used by ATTRACT. In every case the spheres, regardless of their colour, represent the positions of atoms in the reduced model. In the reduced model the heavy backbone atoms C, N, O and Cα are retained. Note, however, that only the backbone nitrogen and oxygen atoms are involved in the energy calculations during the docking process.

Cyan denotes carbon atoms, blue nitrogen atoms and red oxygen atoms.

The pseudo atoms that represent the sidechains are shown as orange spheres. Asn is an example of a "small" residue whose sidechain is represented by one pseudo atom, positioned at the geometric centre of all heavy sidechain atoms including Cα. The sidechain of Trp is represented by two pseudo atoms: the first one is placed at the position of Cγ and the second one inside the ring formed by six carbon atoms. Phe also exemplifies a residue with two pseudo atoms: the first one is placed half-way between Cβ and Cγ and the second one at the geometric centre of the remaining sidechain atoms. The atom labels follow the IUPAC recommendations (Markley et al. [1998], see Appendix III.A.) and the structures are rendered with VMD (Humphrey et al. [1996], version 1.9.1).

 

This kind of reduced protein representation not only saves computational time but also "reduce[s] the number of energy minima on the surface of the protein partners" (Zacharias [2003], p. 1279).

 

Effective  interaction  between  pseudo  atoms    

In ATTRACT, four parameters are assigned to each possible pseudo atom pair (see Fiorucci & Zacharias [2010], especially the supplementary material). These parameters are needed to calculate the pairwise interaction energy. Note that in the context of ATTRACT this interaction energy has to be understood as an "effective interaction" (Fiorucci & Zacharias [2010], p. 3132) energy. Throughout the discussion of the ATTRACT methodology and the docking results, the terms "interaction energy" and "effective interaction energy" are used interchangeably.

ATTRACT distinguishes a priori and explicitly between repulsive and attractive pseudo atom pairs. The interaction energy $V_{AB}$ between an attractive pair of pseudo atoms A and B at a distance $r_{AB}$ is given by

$$V_{AB}(r_{AB}) = \epsilon_{AB}\left[\left(\frac{R_{AB}}{r_{AB}}\right)^{8} - \left(\frac{R_{AB}}{r_{AB}}\right)^{6}\right] + \frac{q_A\, q_B}{\varepsilon(r_{AB})\, r_{AB}} \qquad (3.3.1.)$$

$R_{AB}$ and $\epsilon_{AB}$ are effective pairwise Lennard-Jones interaction parameters. Note that for pure Lennard-Jones interactions the minimum lies at $r_{AB}^{\mathrm{LJ,min}} = \sqrt{4/3}\,R_{AB}$, and consequently the minimal Lennard-Jones interaction energy is $V_{AB}^{\mathrm{LJ}}(r_{AB}^{\mathrm{LJ,min}}) = -(27/256)\,\epsilon_{AB}$. Additionally, the Coulomb energy between pseudo atoms A and B is considered if A and B originate from the charged residues Lys, Arg, Glu or Asp; the charge is the integer ±1. The Coulomb interaction is further damped by a distance-dependent dielectric constant $\varepsilon(r_{AB}) = 15\,r_{AB}$.
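Equation 3.3.1. can be checked numerically. The following Python sketch implements it literally, without any unit-conversion prefactor for the Coulomb term (an assumption of this sketch, mirroring the equation as written), and confirms that the Lennard-Jones-type part has its minimum of $-(27/256)\,\epsilon_{AB}$ at $\sqrt{4/3}\,R_{AB}$. The parameter values are hypothetical.

```python
import math


def attractive_energy(r, eps, R, q_a=0.0, q_b=0.0):
    """Effective energy of an attractive pseudo atom pair, written literally after eq. (3.3.1.)."""
    x = R / r
    lj = eps * (x**8 - x**6)                 # 8-6 Lennard-Jones-type term
    coulomb = q_a * q_b / ((15.0 * r) * r)   # distance-dependent dielectric eps(r) = 15 r
    return lj + coulomb


# Hypothetical parameters, chosen only to check the analytic minimum
R, eps = 4.0, 1.0
r_min = math.sqrt(4.0 / 3.0) * R
print(r_min, attractive_energy(r_min, eps, R), -27.0 / 256.0 * eps)
```

Evaluating the function slightly inside and outside `r_min` gives higher energies, as expected for a minimum.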

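The effective energy of repulsive pseudo atom pairs is calculated as

$$
V_{AB}(r_{AB}) =
\begin{cases}
-\epsilon_{AB}\left[\left(\frac{R_{AB}}{r_{AB}}\right)^{8} - \left(\frac{R_{AB}}{r_{AB}}\right)^{6}\right] + \frac{q_A\, q_B}{\varepsilon(r_{AB})\, r_{AB}}, & r_{AB} > r_{AB}^{\mathrm{LJ,min}} \\[1ex]
2\left|V_{AB}^{\mathrm{LJ}}\left(r_{AB}^{\mathrm{LJ,min}}\right)\right| + \epsilon_{AB}\left[\left(\frac{R_{AB}}{r_{AB}}\right)^{8} - \left(\frac{R_{AB}}{r_{AB}}\right)^{6}\right] + \frac{q_A\, q_B}{\varepsilon(r_{AB})\, r_{AB}}, & r_{AB} \le r_{AB}^{\mathrm{LJ,min}}
\end{cases}
\qquad (3.3.2.)
$$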
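A matching Python sketch of equation 3.3.2., under the same assumptions as for equation 3.3.1. (literal Coulomb term, hypothetical parameters), shows how the repulsive potential is assembled: an inverted well outside $r_{AB}^{\mathrm{LJ,min}}$ and a shifted inner wall inside it. The shift $2\,|V_{AB}^{\mathrm{LJ}}(r_{AB}^{\mathrm{LJ,min}})|$ makes both branches meet continuously at $r_{AB}^{\mathrm{LJ,min}}$, where the energy equals $+(27/256)\,\epsilon_{AB}$.

```python
import math


def lj_8_6(r, eps, R):
    """8-6 Lennard-Jones-type term shared by eqs. (3.3.1.) and (3.3.2.)."""
    x = R / r
    return eps * (x**8 - x**6)


def repulsive_energy(r, eps, R, q_a=0.0, q_b=0.0):
    """Effective energy of a repulsive pseudo atom pair after eq. (3.3.2.)."""
    r_min = math.sqrt(4.0 / 3.0) * R        # position of the Lennard-Jones minimum
    v_min = -27.0 / 256.0 * eps             # Lennard-Jones energy at r_min
    coulomb = q_a * q_b / ((15.0 * r) * r)  # eps(r) = 15 r
    if r > r_min:
        return -lj_8_6(r, eps, R) + coulomb                # inverted well: decaying tail
    return 2.0 * abs(v_min) + lj_8_6(r, eps, R) + coulomb  # steep inner wall
```

For uncharged pseudo atoms the resulting energy is positive everywhere and decreases monotonically with distance, which is the qualitative shape of the blue graph in Fig. 3.3.5.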

 

Note that neither ions nor water molecules are included in the reduced representation of the proteins. To illustrate the interaction energies given by equations 3.3.1. and 3.3.2., two exemplary interaction energy graphs are shown in Fig. 3.3.5.

 

Fig. 3.3.5. Exemplary interaction energies of pseudo atoms. The red graph shows the attractive interaction between the sidechain pseudo atoms of two Ala residues. The blue graph shows the repulsive interaction between the sidechain pseudo atoms of an Ala and an Asn residue. In both cases the pseudo atoms are uncharged. The interaction energy is given in units of $R \cdot T$, where $R = 8.31\ \mathrm{J\,mol^{-1}\,K^{-1}}$ and $T$ is room temperature (Fiorucci & Zacharias [2010]).

 

So far, the proteins are described in a reduced representation, and parameters that are supposed to reflect their physico-chemical properties are assigned to the pseudo atoms. This raises the question why the proteins are not treated in an all-atom fashion with the force field parameters used in MD simulations. To answer this question, one has to consider how ATTRACT samples the possible protein-protein complexes.

 

Sampling  of  protein-­‐protein  complexes  by  ATTRACT    

The position of one protein in the reduced representation is kept fixed; this protein is denoted here as the receptor. In a first step, several starting positions for the ligand centre are generated around the surface of the receptor. The distances of these starting points from the receptor surface are slightly larger than the largest distance between any of the ligand's pseudo atoms and its geometric centre. The methodology employed by ATTRACT seems to be similar to that already described for the determination of the SASA (see section 3.2.3.). For the structures investigated in this thesis, the total number of these starting points ranges between 83 and 104. Fig. 3.3.6. shows an exemplary distribution of ligand starting positions around a receptor (here perlucin).
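The actual placement algorithm of ATTRACT is not given here, so the following is only a hypothetical SASA-like sketch in Python. Points are placed on spheres of radius $d$ (the ligand radius plus a margin) around every receptor pseudo atom, points closer than $d$ to any receptor pseudo atom are discarded, and the remainder is thinned so that kept points are at least $d$ apart. The parameters `margin` and `n_sphere` and the thinning rule are assumptions.

```python
import math


def fibonacci_sphere(n):
    """n roughly uniform unit vectors on a golden-angle spiral."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        pts.append((rho * math.cos(golden * k), rho * math.sin(golden * k), z))
    return pts


def ligand_radius(ligand):
    """Largest distance of a ligand pseudo atom from the ligand's geometric centre."""
    c = tuple(sum(p[k] for p in ligand) / len(ligand) for k in range(3))
    return max(math.dist(p, c) for p in ligand)


def starting_positions(receptor, ligand, margin=1.0, n_sphere=60):
    """Hypothetical SASA-like placement of ligand-centre starting points."""
    d = ligand_radius(ligand) + margin  # slightly larger than the ligand radius
    kept = []
    for a in receptor:
        for u in fibonacci_sphere(n_sphere):
            p = (a[0] + d * u[0], a[1] + d * u[1], a[2] + d * u[2])
            # discard points that are buried closer than d to any receptor atom
            if any(math.dist(p, b) < d - 1e-9 for b in receptor):
                continue
            # thin out: keep points at least d apart from each other
            if all(math.dist(p, q) >= d for q in kept):
                kept.append(p)
    return kept
```

Tying the thinning distance to $d$ keeps the number of starting points roughly proportional to the receptor surface area, which is one plausible way to arrive at the order of 100 points quoted above.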

 

Fig. 3.3.6. Exemplary distribution of ligand starting positions (blue spheres) around a perlucin receptor molecule. The geometric centre of the ligand is placed at the positions of the blue spheres. The molecule is rendered with VMD (Humphrey et al. [1996], version 1.9.1). The "New Cartoon" representation of the protein involves the STRIDE algorithm (Frishman & Argos [1995]).

 

The geometric centre of the ligand is placed at each of the starting positions (blue spheres in Fig. 3.3.6.), and the ligand is subsequently rotated. This generates several different relative orientations between ligand and receptor at each starting point. As far as could be determined here, 228 different relative orientations per starting point are generated through ligand rotation. In total, around 20000 initial ligand-receptor pairs with different relative orientations are generated. Note that these pairs are not yet docked; they are still spatially separated.
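Multiplying the number of starting points (83 to 104) by the 228 rotations per starting point shows that these figures are consistent with the quoted total of around 20000 initial ligand-receptor pairs:

$$83 \times 228 = 18\,924 \qquad \text{and} \qquad 104 \times 228 = 23\,712$$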

In the systematic docking approach of ATTRACT, the next step is to minimize the total effective potential energy (the sum of the pairwise effective potential energies given by equations 3.3.1. and 3.3.2.) for each of the relative orientations of ligand and receptor described in the preceding paragraph. During the minimization the ligand is allowed to rotate and translate.

This minimization can be performed in several stages (here four stages). In this thesis the minimization stages differ in the number of minimization steps, in the cut-off distance used to determine the interacting partners at the beginning of each stage, and in whether positional restraints are used.

The first two minimization stages are performed with a harmonic positional restraint between the geometric centre of the receptor and the Cα atom of the ligand that is closest to the geometric centre of the receptor. This additional harmonic potential ensures that the ligand comes into close contact with the receptor surface during the first minimization stages.

The next two minimization stages are performed without this additional harmonic potential. In these stages the ligand is supposed to adopt the energetically most favourable orientation with respect to the receptor.

The cut-off distance used to determine the interacting pseudo atoms is successively reduced to ≈ 7.1 Å. The set of interacting pseudo atoms is determined only at the beginning of each minimization stage.
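The staged protocol can be illustrated with a minimal rigid-body minimizer. In this hypothetical Python sketch the ligand has six degrees of freedom (three translations and a rotation vector about its geometric centre), the pair list is built only at the start of each stage from a cut-off, and, in stages with a non-zero force constant, a harmonic restraint pulls the ligand pseudo atom closest to the receptor centre (standing in for the Cα atom) towards that centre. The uniform 8-6 pair parameters, the steepest-descent minimizer with finite-difference gradients and all stage settings are assumptions; ATTRACT uses its own parameters and minimizer.

```python
import math


def rotate(v, w):
    """Rotate vector v by the rotation vector w (Rodrigues' formula)."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return tuple(v)
    k = [c / theta for c in w]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    k_x_v = (k[1] * v[2] - k[2] * v[1], k[2] * v[0] - k[0] * v[2], k[0] * v[1] - k[1] * v[0])
    return tuple(v[i] * cos_t + k_x_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t) for i in range(3))


def place(ligand, x):
    """Ligand coordinates for the rigid-body DOF x = [tx, ty, tz, wx, wy, wz]."""
    c = [sum(p[i] for p in ligand) / len(ligand) for i in range(3)]
    moved = []
    for p in ligand:
        r = rotate([p[i] - c[i] for i in range(3)], x[3:])  # rotate about the ligand centre
        moved.append(tuple(r[i] + c[i] + x[i] for i in range(3)))
    return moved


def pair_energy(r, eps=1.0, R=4.0):
    """Hypothetical uniform attractive 8-6 term used for every pair in this sketch."""
    s = R / r
    return eps * (s**8 - s**6)


def total_energy(receptor, ligand, x, pairs, restraint):
    lig = place(ligand, x)
    e = sum(pair_energy(math.dist(receptor[i], lig[j])) for i, j in pairs)
    if restraint is not None:
        j, centre, k = restraint
        e += k * math.dist(lig[j], centre) ** 2  # harmonic positional restraint
    return e


def steepest_descent(f, x, steps, h=1e-5, step=0.1):
    """Steepest descent with a forward-difference gradient and step halving."""
    for _ in range(steps):
        fx = f(x)
        g = [(f(x[:i] + [x[i] + h] + x[i + 1:]) - fx) / h for i in range(len(x))]
        s = step
        while s > 1e-8:
            trial = [xi - s * gi for xi, gi in zip(x, g)]
            if f(trial) < fx:
                x = trial
                break
            s *= 0.5
    return x


def dock(receptor, ligand, x0, stages):
    """Run the minimization stages; each stage is (steps, cutoff, restraint force constant)."""
    centre = tuple(sum(p[i] for p in receptor) / len(receptor) for i in range(3))
    x = list(x0)
    start = place(ligand, x)
    anchor = min(range(len(ligand)), key=lambda j: math.dist(start[j], centre))
    for steps, cutoff, k in stages:
        lig = place(ligand, x)
        # the pair list is built only once, at the beginning of the stage
        pairs = [(i, j) for i in range(len(receptor)) for j in range(len(lig))
                 if math.dist(receptor[i], lig[j]) < cutoff]
        restraint = (anchor, centre, k) if k > 0 else None
        x = steepest_descent(
            lambda y: total_energy(receptor, ligand, y, pairs, restraint), x, steps)
    return x


if __name__ == "__main__":
    # Four hypothetical stages; only the final cut-off of 7.1 Å is taken from the text
    stages = [(200, 20.0, 0.05), (200, 20.0, 0.05), (300, 12.0, 0.0), (300, 7.1, 0.0)]
    x = dock([(0.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)], [0.0] * 6, stages)
    print(place([(10.0, 0.0, 0.0)], x))
```

In the toy example a single ligand pseudo atom is first pulled in by the restraint and then relaxes, without the restraint, to the minimum of the pair potential at $\sqrt{4/3}\,R \approx 4.62$ Å.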

To perform the energy minimization of about 20000 starting orientations in reasonable time, it is necessary to reduce the number of interacting atoms. This is achieved with the reduced protein representation. In summary, the ATTRACT program constructs the docked protein-protein complexes by 1) generating a large number of different initial orientations between receptor and ligand and 2) minimizing the effective energy between the ligand and the receptor.

 

Assessment  of  the  generated  complexes    

The final step is the assessment of the docked complexes. Initially, the complexes are ranked in order of ascending effective energy. It is assumed that complexes with lower effective energy are closer to the native conformation.

In a second step, the generated complexes are filtered according to two conditions. At the end of the minimization, several initial ligand-receptor orientations may have converged to similar final ligand positions. ATTRACT considers two ligand positions/orientations as equivalent if they can be superposed with a rotation of less than 3.4° (in each angle) and a translation of less than 0.45 Å (in each centre coordinate).
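Ranking and this first filter can be sketched together in Python. The sketch assumes that each complex is stored as a tuple (energy, three orientation angles in degrees, centre coordinates in Å) and that angle differences are taken modulo 360°. Because the list is sorted first, the lowest-energy member of each group of equivalent complexes is the one kept; this ordering is an assumption of the sketch, not a documented property of ATTRACT.

```python
def angle_diff(a, b):
    """Smallest absolute difference of two angles in degrees (wrap-around assumed)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def equivalent(c1, c2, max_angle=3.4, max_shift=0.45):
    """True if every angle and every centre coordinate differ by less than the thresholds."""
    _, ang1, pos1 = c1
    _, ang2, pos2 = c2
    return (all(angle_diff(a, b) < max_angle for a, b in zip(ang1, ang2))
            and all(abs(p - q) < max_shift for p, q in zip(pos1, pos2)))


def rank_and_filter(complexes):
    """Sort by ascending effective energy, then drop complexes equivalent to a better-ranked one."""
    kept = []
    for c in sorted(complexes, key=lambda c: c[0]):
        if not any(equivalent(c, k) for k in kept):
            kept.append(c)
    return kept
```

For example, two complexes at −12 and −10 (in units of RT) whose angles differ by about 1° and whose centres are 0.2 Å apart collapse into the −12 complex.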

An additional filter that can be used to reduce the number of generated complexes evaluates the symmetry of the complexes.

The current experimental information (see section 3.4.) suggests, if anything, a dimeric complex of perlucin under certain experimental conditions. If proteins form a dimer, it is reasonable to assume that receptor and ligand contribute nearly the same residues to the interface. If they did not, each protein would still expose an unoccupied binding patch, so complexes containing more than two proteins could form. Note that this assumes that the experimental conditions, e.g. the protein concentration, do not influence the oligomerisation behaviour.

In general, ATTRACT checks the symmetry of a complex via pairwise atom distances. Consider a receptor and a ligand with the same number and sequence of atoms. Let $\mathbf{r}_{\mathrm{rec}}^{i}$ and $\mathbf{r}_{\mathrm{lig}}^{i}$ denote the positions of atom $i$ in the receptor and the ligand, respectively, and $\mathbf{r}_{\mathrm{rec}}^{j}$ and $\mathbf{r}_{\mathrm{lig}}^{j}$ the positions of atom $j$ in the receptor and the ligand. For a symmetric complex the relation $|\mathbf{r}_{\mathrm{rec}}^{i} - \mathbf{r}_{\mathrm{lig}}^{j}| = |\mathbf{r}_{\mathrm{lig}}^{i} - \mathbf{r}_{\mathrm{rec}}^{j}|$ must hold for every atom pair. This rigorous condition is relaxed in the actual calculations: both distances may differ by at most 8.4 Å to account for moderate structural differences between receptor and ligand. Additionally, only atom pairs with a distance < 22.4 Å are evaluated.

This information was inferred directly from the FORTRAN source code (col_sym.f) of the "col_sym" module of the ATTRACT package, which filters for unique and symmetric complexes.
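The relaxed symmetry criterion can be written down directly. In this Python sketch a pair $(i, j)$ is evaluated if at least one of its two distances is below 22.4 Å; how `col_sym.f` treats pairs where only one of the distances is short is not stated, so this choice, like the function name `is_symmetric`, is an assumption.

```python
import math


def is_symmetric(receptor, ligand, tol=8.4, max_dist=22.4):
    """Approximate homodimer symmetry test; receptor and ligand list the same atoms in the same order."""
    if len(receptor) != len(ligand):
        raise ValueError("receptor and ligand must contain the same atoms")
    n = len(receptor)
    for i in range(n):
        for j in range(n):
            d1 = math.dist(receptor[i], ligand[j])  # |r_rec^i - r_lig^j|
            d2 = math.dist(ligand[i], receptor[j])  # |r_lig^i - r_rec^j|
            if d1 >= max_dist and d2 >= max_dist:
                continue  # pair too far apart to be evaluated
            if abs(d1 - d2) > tol:
                return False
    return True
```

A dimer generated by a 180° rotation of the receptor about an axis satisfies the relation exactly, whereas a ligand that is merely translated relative to the receptor fails it.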

For the proteins used in the computational docking studies of this thesis, the filtering procedures described above resulted in a number of complexes on the order of 100 per docking run.

 

Beyond  rigid  docking    

So far, only the docking of rigid structures has been described. "Rigid" means that the protein structures are treated as rigid bodies, which is obviously a strong simplification. Halperin et al. (Halperin et al. [2002]) summarised three possible kinds of changes that can occur between the bound (in a complex) and the unbound state of a protein. Particular residues can change their conformation upon complexation (see for example Fig. 5 in Betts & Sternberg [1999]), larger protein segments can adopt new positions during the protein-ligand interaction (see for example Ramakrishnan & Qasba [2001], where the conformational change of a galactosyltransferase upon binding of a small molecule is shown, esp. Fig. 8 therein), and even intrinsically disordered segments of proteins or peptides can adopt a fold upon binding to a protein (see e.g. Dyson & Wright [2005]).

The ATTRACT program package provides approaches to tackle the first two of the aforementioned issues. Different conformations of the large sidechains can be included explicitly in the structures before their representation is reduced. During the minimization steps the different conformations are evaluated with respect to the effective potential, and the sidechain conformation with the lowest effective potential energy is used during the final minimizations (Zacharias [2003]).

Concerning the second issue, ATTRACT is capable of accounting for larger conformational changes of the protein structure. Briefly, this is done by calculating low-frequency harmonic modes. They are obtained from a harmonic potential between the Cα atoms of the protein structure, whose force constant is distance-dependent and decays with increasing distance. From this harmonic potential, low-frequency oscillations of the backbone (with the sidechains included as rigid bodies) can be calculated and considered during the effective energy minimization of the docking procedure (see May & Zacharias [2008] for the implementation in ATTRACT and Hinsen [1998] for the principles of normal mode calculations).
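The following Python sketch illustrates the principle of low-frequency modes from a Cα network; it is not the implementation of May & Zacharias [2008]. It assumes a Gaussian decay of the force constant, $k_{ij} = \exp(-r_{ij}^2/r_0^2)$ with $r_0 = 3$ Å, in the spirit of Hinsen [1998], builds the Hessian of the harmonic network, and returns the lowest non-trivial modes after discarding the six zero modes of rigid-body translation and rotation. It requires NumPy.

```python
import numpy as np


def enm_modes(ca, r0=3.0, n_modes=5):
    """Low-frequency modes of a C-alpha network with Gaussian-decaying force constants."""
    ca = np.asarray(ca, dtype=float)
    n = len(ca)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = ca[j] - ca[i]
            r2 = d @ d
            k = np.exp(-r2 / r0**2)              # force constant decays with distance
            block = -k * np.outer(d, d) / r2     # off-diagonal 3x3 Hessian block
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block  # diagonal blocks balance the row sums
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    evals, evecs = np.linalg.eigh(hess)          # eigenvalues in ascending order
    # skip the six zero-frequency rigid-body modes (3 translations, 3 rotations)
    return evals[6:6 + n_modes], evecs[:, 6:6 + n_modes]
```

Each returned column is a displacement pattern of all Cα atoms; in a flexible docking scheme such patterns can be added to the structure with variable amplitudes that are optimised together with the rigid-body degrees of freedom.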

Large conformational changes, such as the transition of a protein from an unfolded to a folded state, are currently not supported by ATTRACT.

The options of different sidechain conformations and low-frequency modes were not exploited for the systematic docking of perlucin or of the test proteins TC14 and CD69 in this thesis. In the case of perlucin, the choice of six different structures (see Fig. 3.3.3.) was intended to account for different conformations to a first approximation.

 

In the following, it is shown that the docking procedure used in this thesis, without refinements (sidechain conformations and low-frequency modes), can predict some residues of the interfaces of the crystal structures of CTLD dimers if the monomeric protein structures from the crystallised dimers are used. This latter point is taken up again in the next section, which also explains, using the reference dimers as examples, how the interface residues were determined.