
Structural studies on the outer membrane heme receptor of Serratia marcescens using molecular dynamics simulation

Dissertation

zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.)

vorgelegt von

Dipl. Biol. Simon Becker

an der

Mathematisch-Naturwissenschaftliche Sektion, Fachbereich Biologie

Tag der mündlichen Prüfung: 05.10.2012
1. Referent: Prof. Dr. Kay Diederichs

2. Referent: PD Dr. Thomas Exner

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-208702

Contents

1 Zusammenfassung 1

2 Summary 3

3 Introduction I: Simulating dynamic molecules 5

3.1 Phase Space . . . 8

3.2 Force Field . . . 9

3.2.1 Bond Energy . . . 9

3.2.2 Angle Energy . . . 11

3.2.3 Torsional Energy . . . 13

3.2.4 van der Waals Potential . . . 14

3.2.5 Electrostatic Potential . . . 15

3.3 Molecular Dynamics . . . 18

3.3.1 Integration Schemes . . . 19

3.3.2 System setup and equilibration; from PDB to model . . . 20

3.3.3 Modifications . . . 24

3.4 Kinetics of protein association . . . 28

4 Introduction II: System of Interest 31

4.1 Iron . . . 31

4.2 The cell envelope of Gram negative bacteria . . . 32

4.3 TonB dependent transporters . . . 34

4.3.1 The TonB-ExbB-ExbD complex . . . 34

4.3.2 Mechanisms of energy transfer and substrate transport . . . 35

4.4 Heme acquisition in Serratia marcescens . . . 36

4.5 The heme acquisition system Has of Serratia marcescens . . . 37

4.5.1 HasA . . . 37

4.5.2 HasR . . . 40

4.5.3 Interaction between HasA and HasR . . . 42

4.5.4 HasB . . . 42

5 The transfer of heme from HasA to HasR 44

5.1 Preface . . . 44

5.2 Preparation . . . 44

5.3 Application of individual guiding potentials . . . 45


6.1 Preface . . . 49

6.2 Preparation . . . 49

6.3 Simulation details . . . 49

6.4 Applying the reaction coordinate . . . 52

6.5 Overview of simulation . . . 54

6.5.1 Events at the first contact area . . . 56

6.5.2 Events at the second contact area . . . 58

6.6 Outlook . . . 61

6.6.1 Loop 7 . . . 61

6.6.2 Explaining the behavior of the Mutants . . . 62

7 Force propagation in TonB dependent receptors 65

7.1 Preface . . . 65

7.2 Introduction . . . 65

7.3 Preparation . . . 66

7.4 Repeating a known experiment . . . 68

7.5 Completely unwinding the plug . . . 70

7.5.1 How to simulate a refolding event (in theory) . . . 71

8 Miscellaneous 74

8.1 Membranes . . . 74

8.1.1 Advanced setup . . . 74

8.1.2 Implicit Membrane . . . 75

8.2 Structural properties of the two proteins . . . 77

8.3 The structure of HasR alone . . . 78

8.3.1 Co-crystallization . . . 78

8.3.2 Locally enhanced sampling . . . 80

8.4 AmberPython . . . 80

9 Discussion 82

9.1 Heme transfer . . . 83

9.2 Complex formation . . . 84

9.3 Plug pulling . . . 85

10 Acknowledgments 87

List of Figures

3.1 Fundamental Aspects of Force Fields . . . 9

3.2 Bond Energy for CH4 . . . 10

3.3 Examples for Grid-based particle interactions . . . 17

3.4 multigrid . . . 18

3.5 Steepest Descent . . . 20

3.6 Modifications to classical MD . . . 26

3.7 Basic Protein Reaction . . . 28

4.1 Structure of heme b . . . 31

4.2 The cell envelope of Gram negative bacteria . . . 33

4.3 Structures of TonB . . . 35

4.4 The heme acquisition system of Serratia marcescens . . . 38

4.5 Structures of HasA . . . 39

4.6 Alignment of TonB and HasB . . . 43

5.1 A close view of the binding pockets . . . 45

5.2 Effects of the targeting potential . . . 47

6.1 System setup . . . 51

6.2 Waters in the interface area at 5 ns . . . 52

6.3 Waters in the interface area at 20 ns . . . 53

6.4 Waters in the interface area at 35 ns . . . 54

6.5 Waters in the interface area at 45 ns. HasR is shown in blue, the water molecules as red spheres. For orientational purposes HasA is shown as a cartoon representation with a transparent surface in picture a. . . 55

6.6 number of waters near the contact areas . . . 56

6.7 The rmsd of the simulation with known structures . . . 57

6.8 end point comparison . . . 59

6.9 Appearance of native contacts . . . 60

6.10 Superposition with the double mutant position at the start and at 10 ns . . 61

6.11 Superposition with the double mutant at 22 and 10 ns . . . 63

6.12 Superposition with the double mutant at 40 and 50 ns . . . 63

6.13 Superposition with the wild type at 35 and 40 ns. . . 64

6.14 Superposition with the wild type at 45 and 50 ns. . . 64

7.1 Force microscopy on BtuB . . . 66


7.5 The PMF experienced by the heme as a function of the simulation time. . 71

7.6 The predicted 241 n-terminal residues . . . 72

7.7 The predicted residues 113-241 . . . 72

8.1 Membrane setup . . . 76

8.2 electrostatic studies of HasR . . . 77

8.3 electrostatic studies of HasA . . . 78

1 Zusammenfassung

The present work deals with molecular dynamics studies on the outer membrane receptor HasR of the enterobacterium Serratia marcescens. S. marcescens is an opportunistic pathogen that is regarded as one of the main causes of hospital-acquired infections. One of the decisive factors for its virulence is its ability to take up the micronutrient iron from the otherwise very iron-poor blood plasma.

There, iron is mainly present in the form of heme bound to hemoglobin or hemoglobin-haptoglobin. The receptor HasR can take up free heme or heme bound to the hemophore HasA. Transport across the membrane requires energy, which is provided by the inner membrane protein HasB.

Molecular dynamics is an atomistically resolved simulation method that allows a molecule to be observed over time. The forces acting on each individual atom are determined with the help of a so-called force field. This force field treats the molecule as if the atoms were held together by springs. The force constant of each spring has been determined experimentally and implicitly contains the influence of the electrons involved. From this, the acceleration experienced by each individual atom can be computed. A small time step, usually on the order of 0.5-2 fs, allows the iterative computation of successive conformations along the reaction coordinate.

This method naturally requires a starting structure. In the case of the HasA-HasR-heme complex a number of crystal structures are available that represent different states along the reaction coordinate: a wild-type structure in which HasA is bound to HasR and the heme is coordinated by HasR, the structure of a point mutant in which HasA is bound to HasR but the heme is still coordinated by HasA, and a not yet published double mutant in which complex formation of HasA and HasR has been arrested at an early stage. In addition, there are an NMR structure of HasA alone and a crystal structure of HasA with bound heme.

A structure of the receptor alone has not yet been determined. Attempts to generate a relaxed receptor structure in silico can be found in chapter 8.3.

The proposed reaction mechanism is as follows: HasA with bound heme binds to HasR. As a result, loop A1 of HasA, which coordinates the heme, is displaced by steric constraints and the coordination of the heme is weakened. Through a shift in the position of isoleucine 671 the heme is pushed out of its binding pocket and slides into the pocket on HasR. This happens spontaneously. In this work an artificially generated initial state was used, with the receptor alone and a HasA molecule placed about 40 Å above the contact surface. An artificial additional potential was used to draw HasA closer to HasR without influencing the reaction pathway; its only purpose is to ensure that HasA does not diffuse away. During the approach the negatively charged HasA interacts with several arginines at the edge of the contact surface and rotates slightly relative to HasR. From here on, the trajectory passes through all conformations known from crystal structures, which provides excellent support for the results.

This thesis begins with an introduction to atomistic simulation methods in general and MD in particular. This chapter is followed by a description of the system. Three chapters with the actual experiments follow. The first (chapter 5), and also the oldest, represents the actual entry point into the experiments.

It deals with the question of how the heme is transferred from HasA to HasR.

In the course of the experiments it became clear that this handover is not an independent process but must be part of complex formation. I therefore went on to simulate the complex formation as such. All experiments belonging to this topic are described in chapter 6. Thematically somewhat isolated is chapter 7. It describes my unsuccessful attempts to reproduce already published experiments on the transport of the heme through the membrane, and the conclusions I have drawn from them.

2 Summary

The work presented here is focused on molecular dynamics studies on the outer membrane receptor HasR from Serratia marcescens. S. marcescens is an opportunistic pathogen which has become one of the prevalent organisms causing hospital-acquired infections.

One of its main virulence factors is its ability to acquire iron from its host, primarily in the form of heme. The receptor HasR can take up free heme or heme bound to the hemophore HasA, a soluble heme-binding protein secreted from S. marcescens. HasA can acquire heme from hemoglobin or hemoglobin-haptoglobin. The transport through the outer membrane requires energy which is provided by the inner membrane protein HasB.

Molecular dynamics simulation is an all-atom simulation method that allows computation of molecular behavior over time. The forces acting on each individual atom are computed using a so-called force field. This force field describes the molecule as atoms connected by springs, implicitly including the electronic effects of each bond and non-bonded interaction. From these forces the acceleration acting on each atom can be computed and, using a short time step, usually in the range of 0.5 to 2 fs, a series of consecutive conformations is obtained, following the trajectory through phase space.

This method of course needs a starting structure. For the HasA-HasR complex several structures are available showing different steps of the reaction path, as well as an NMR structure of HasA alone and a crystal structure of heme bound to HasA. For the complex of HasA-HasR and heme several structures are available: a wild type structure in which HasA and HasR form a complex and the heme is bound to HasR, an I671G point mutant in which the heme transfer from HasA to HasR is somehow prevented and the heme is still bound to HasA, and a double mutant in which HasA is trapped in an early stage of complex formation. A structure of HasR alone has not yet been solved. My endeavors to generate one in silico can be found in chapter 8.3.

The reaction mechanism is proposed as follows: HasA with a bound heme binds to HasR. In the process the A1 loop of HasA is pushed away from the heme due to a steric clash with HasR. The coordination of the heme is thereby weakened, and the heme is transferred to HasR spontaneously upon binding.

In this work an artificially constructed start configuration is used for artificially biased MD, consisting of a relaxed receptor and HasA hovering 40 Å above the binding site. An artificial potential draws HasA closer to HasR, basically preventing HasA from diffusing away and exploring uninteresting areas of phase space. During the approach HasA starts to interact electrostatically with a number of arginines near the edge of the interface area and rotates slightly with respect to HasR. From this point on the complex formation requires almost no additional force. The trajectory passes all configurations known from crystal structures, which excellently supports the result of the simulation.


3 Introduction I: Simulating dynamic molecules

Computer simulation methods have become a powerful tool to solve many-body problems in particle physics1, theoretical chemistry2 and structural biology3. Although both the theoretical description of complex systems and the experimental techniques for detailed microscopic information are rather well developed, it is often possible to study specific aspects of those systems in great detail only via simulation. It has been stated that the field of computer simulations has developed into a very important branch of science, which on the one hand helps theorists and experimentalists to go beyond their inherent limitations. On the other hand it has also become a separate and independent scientific field, carrying its own strengths and limitations. Therefore, simulation science has often been called the third pillar of science, complementing theory and experiment.

The traditional simulation methods for many-body systems can be divided into two classes, i.e. stochastic and deterministic simulations, which are largely represented by the Monte Carlo (MC) method1,4 and the molecular dynamics (MD) method5,6, respectively.

Monte Carlo simulations probe the configuration space by trial moves of particles. Within the so-called Metropolis algorithm, the energy change from step n to n+1 is used as a trigger to accept or reject a new configuration. Paths towards lower energy are always accepted; those to higher energy are accepted with a probability governed by Boltzmann statistics. This algorithm ensures the correct limiting distribution, and properties of a given system can be calculated by averaging over all Monte Carlo moves within a given statistical ensemble (where one move means that every degree of freedom is probed once on average). In contrast, MD methods are governed by the system Hamiltonian and consequently Newton's equations of motion, which can be expressed in Hamiltonian form7-9 as:

\dot{p}_i = -\frac{\partial H}{\partial q_i}\,, \qquad \dot{q}_i = \frac{\partial H}{\partial p_i} \qquad (3.1)

with the Hamiltonian H describing the energy of the system, the atomic positions q and the momenta p. These are integrated to move particles to new positions and to assign new velocities at these new positions. This is an advantage of MD simulations with respect to MC, since not only the configuration space is probed but the system has at least the possibility to probe the whole phase space, which gives additional information about the dynamics of the system. Both methods are complementary in nature but they lead to the same averages of static quantities, given that the system

under consideration is ergodic and the same statistical ensemble is used.
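
To make the Metropolis acceptance rule described above concrete, the following Python sketch shows a single Monte Carlo step. The energy function and the move generator are placeholders that a real simulation would supply; kT is the thermal energy in the same units as the energy function.

```python
import math
import random

def metropolis_step(state, energy, propose_move, kT):
    """One Metropolis Monte Carlo step.

    state        -- current configuration (any object accepted by `energy`)
    energy       -- callable returning the potential energy of a configuration
    propose_move -- callable returning a trial configuration (placeholder)
    kT           -- Boltzmann constant times temperature, in energy units
    """
    trial = propose_move(state)
    delta_e = energy(trial) - energy(state)
    # Downhill moves are always accepted; uphill moves are accepted with the
    # Boltzmann probability exp(-delta_e / kT), as described in the text.
    if delta_e <= 0.0 or random.random() < math.exp(-delta_e / kT):
        return trial, True
    return state, False
```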

Crucial for simulations are specific input parameters that characterize the system in question, and which come either from theoretical considerations or are provided by experiment. Having fully characterized a system of interest in terms of model parameters, simulations are often used both to investigate theoretical models beyond certain approximations and to provide a hint to experimentalists for further investigations.

In the case of big experimental facilities it is often even desired to predict, and also required to prove, the potential outcome of an experiment using simulations. It is simply more cost-effective to investigate e.g. a novel pharmacon in silico than in vitro. In order to characterize a given system and to simulate its complex behavior, a model for interactions between system constituents is required. This model has to be tested against experimental results, i.e. it should reproduce or approximate experimental findings like distribution functions or phase diagrams, and theoretical constraints, i.e. it should obey certain fundamental or limiting laws like energy or momentum conservation.

For an MD investigation these three main aspects are necessary: (i) A model for the interaction between system constituents (atoms, molecules, surfaces etc.) is needed. Often it is assumed that particles interact only pairwise, which is exact e.g. for particles with fixed partial charges. This assumption greatly reduces the computational effort and the work required for implementation of the model into the program. (ii) An integrator is needed, which propagates particle positions and velocities from time t to t+δt. It is a finite difference scheme which propagates trajectories discretely in time. The time step δt has to be chosen properly to guarantee stability of the integrator, i.e. there should be no drift in the system's energy. (iii) A statistical ensemble has to be chosen, where thermodynamic quantities like pressure, temperature or the number of particles are controlled. The natural choice of an ensemble in MD simulations is the micro-canonical ensemble, keeping the amount of substance, the box volume and the energy content constant (NVE). Nevertheless, there are extensions to the sampling procedure which also allow the simulation of different statistical ensembles.

These steps form the essential framework for an MD simulation. Having this information at hand, in the perfect case it is possible to obtain exact results within numerical precision. Results, however, are only correct with respect to the model which enters into the simulation. If the simulation results differ from real system properties or if they are incompatible with solid theoretical manifestations, the model has to be refined. This procedure of iterative refinement and improvement can be understood as an adaptive process which finally leads to an approximation of a model of the real world, at least for certain properties. The model itself may be constructed from plausible considerations, where parameters are chosen from neutron diffraction or NMR measurements. It may also result from first-principles ab-initio calculations. Although the electronic distribution of the particles is calculated very accurately, this type of model building also contains some approximations, since many-body interactions are mostly neglected (this would increase the parameter space in the model calculation enormously). However, it often provides a good starting point for a realistic model.

An important issue of simulation studies are the accessible time and length scales which can be covered by particle simulations. At this point it is important to consider different concepts of particles:

• Ab-initio models: particles are described consistently together with the electronic structure. Interactions are not restricted to a fixed force field description, but depend on the configuration of particles in the system. The calculation of interactions is therefore a many-body problem. Depending on the approximations used for the solution of electronic configurations, the computational complexity is large.

• All-atom model: molecules are resolved on an atomistic level and each atom is represented with a force-field description. Molecular degrees of freedom may contain flexible bonds or rigid body constraints.12

• Unified-atom model: molecules are again described on an atomistic level, but some intra-molecular atom groups are described as a single interaction center. This technique is often applied in the case of light atoms, e.g. hydrogens, in order to allow a larger timestep in the equations of motion.

• Unified-molecule model: whole molecules or even groups of molecules are considered as one particle. Interactions are e.g. described by multipole moments, introducing a non-isotropic interaction between molecules. This description is used in the simulation e.g. of liquid crystals13 or polar liquids14,15.

• Coarse-grained field representation: pseudo-particles represent large groups of actual particles, which exhibit dynamics on mesoscopic time and length scales. Fluctuations in particle motion thereby reflect thermal behavior. Interactions are modeled such as to represent macroscopic behavior, governed by the Navier-Stokes equations, as a limiting case.

Since this work is primarily focused on bio-molecular systems in atomic detail, this introduction focuses on all-atom simulation and force-field descriptors.

Classical molecular dynamics methods are nowadays applied to a huge class of problems, e.g. properties of liquids, defects in solids, fracture, surface properties, friction, molecular clusters, polyelectrolytes and biomolecules. Due to the large area of applicability, simulation codes for molecular dynamics were developed by many groups. On the Internet homepage of the Collaborative Computational Project No. 5 (CCP5)16 a number of computer codes are assembled for condensed phase dynamics. During the last years several programs were designed for parallel computers. Among them, and partly available free of charge, are, e.g., Amber/Sander17, CHARMM18, NAMD19, GROMACS and Desmond20.

Although with the development of massively parallel architectures and highly scalable molecular dynamics codes it has become feasible to extend the time and length scales

to relatively large scales21-24, the simulation of many processes is still beyond technical feasibility. In addition, the time and effort for running these simulations is enormous and it is certainly still far beyond standard. A way out of this dilemma is the invention of new methodological approaches. In recent years the introduction of artificially biasing potentials, acting as an additional degree of freedom driving the simulation towards a desired rare event, has been successfully embedded in the field.

3.1 Phase Space

A fully classical system, i.e. one that is void of every relativistic aspect, can be described completely by the positions and momenta of the particles it contains. Phase space is a 6N-dimensional space, with 3 positional coordinates and 3 coordinates specifying the momentum for each of the N particles. At any instant the system occupies one point in phase space

X = (p, q) \qquad (3.2)

with q being the coordinates and p the momenta:

p = (p_{x_1}, p_{y_1}, p_{z_1}, p_{x_2}, p_{y_2}, p_{z_2}, \cdots) \qquad (3.3)

q = (x_1, y_1, z_1, x_2, y_2, z_2, \cdots) \qquad (3.4)

The space the system could technically visit is vast, but large areas are naturally only accessible with very small probability, such as regions where the momenta would cause the system to fly apart or where the coordinates place two particles closer than their respective van der Waals radii. Addressing the remaining space analytically is still out of the question.

A system that is not stationary, unlike a pendulum in its resting position, always shows an evolution of some sort. This can be described as a trajectory through phase space. Assuming that no external force acts on it, this trajectory is energy conserving. Theoretically, knowledge of one coordinate in phase space allows one to determine the whole energy-conserving trajectory of the system. The argument behind this is that any coordinate determines its successor, which is of course a simplification, as a trajectory is not discrete but continuous.

Since the Hamiltonian is time independent, following this argumentation it is also possible to determine the trajectory leading to this point.

In molecular dynamics the propagation through phase space is calculated at discrete intervals, with a certain time step δt. The fastest motion in a typical system is the vibration between a heavy atom and a hydrogen, with a period on the order of 10^-14 s. Except when additional methods are applied during the simulation that allow a wider time step (e.g. SHAKE), a δt of 0.1-0.2 fs should not be exceeded. Still, the next iteration of the trajectory is unlikely to end up precisely on the real trajectory but a little off. The larger the time step, the larger this error will grow.

Figure 3.1: The fundamental aspects of force fields are displayed. Covalently bound atoms (a) and angles formed by three atoms (b) are controlled by harmonic potentials. Rotatable bonds (c) are determined by periodic functions.

3.2 Force Field

The force field allows the computation of the energy required for distorting a molecule in a specific fashion. It is usually written as a summation of terms, each describing a specific aspect of the total energy:

E_{FF} = E_{bond} + E_{angle} + E_{dihe} + E_{vdw} + E_{elec}\ (+ E_{cross}) \qquad (3.5)

E_bond is the energy required for stretching a bond between two atoms, E_angle represents the energy required for bending an angle, and E_dihe is the torsional energy for rotating around a bond. E_vdw and E_elec describe non-bonded interactions. The term E_cross is a correction term for coupling between the bonded interactions. The necessity of this term depends on the parameterization strategy of the force field and is therefore not always included. In fact the most commonly used force fields for protein and DNA simulation (CHARMM, AMBER and OPLS) do not take it into account12,25.

3.2.1 Bond Energy

E_bond is the energy required for stretching or shortening a covalent bond between two atoms A and B (see fig. 3.1a). This term only describes the effects of the orbitals forming the bond. Other electronic orbitals, such as the globular s-orbitals, do not influence this term. In its simplest form it can be described as a Taylor expansion around an equilibrium bond length R_0. Terminated at second order this looks like:

E_{bond}(R^{AB} - R_0^{AB}) = E(0) + \frac{dE}{dR}\left(R^{AB} - R_0^{AB}\right) + \frac{1}{2}\frac{d^2E}{dR^2}\left(R^{AB} - R_0^{AB}\right)^2 \qquad (3.6)

E(0) is set to zero since it represents the equilibrium bond length, contributing no energy. The derivatives are evaluated at R = R_0; therefore, the first-order term is also zero. Thus the energy term can be simplified to

Figure 3.2: A) The C-H bond energy is plotted versus the displacement for a polynomial of second and fourth order, the Morse correction and the correct values. B) Only the biologically relevant area up to 40 kJ/mol is displayed.

E_{bond}(R^{AB} - R_0^{AB}) = k^{AB}\left(R^{AB} - R_0^{AB}\right)^2 = k^{AB}\left(\Delta R^{AB}\right)^2 \qquad (3.7)

with k^{AB} as the "force constant" for the A-B bond. This is the form of a harmonic oscillator, with the potential being quadratic in the displacement from the minimum. This harmonic form (see fig. 3.1a) is the simplest possible; it is sufficient for determining most equilibrium geometries and it is sufficient for biomolecular simulation.

There are certainly strained and crowded systems where the results from a harmonic approximation are significantly different from experimental values, and if the force field should be able to reproduce features such as vibrational frequencies, the functional form of E_bond must be improved. The straightforward approach is to include more terms in the Taylor expansion:

E_{bond}(\Delta R^{AB}) = k_2^{AB}\left(\Delta R^{AB}\right)^2 + k_3^{AB}\left(\Delta R^{AB}\right)^3 + k_4^{AB}\left(\Delta R^{AB}\right)^4 + \cdots \qquad (3.8)

Of course this has a price: more parameters have to be assigned26,27.

Polynomial expansions of the stretch energy do not have the correct limiting behavior (see fig. 3.2a). Third-order polynomials go towards -∞ for negative values. Therefore, the cubic anharmonicity constant k_3 is normally negative, and if the Taylor expansion is terminated at third order, the energy will go toward -∞ for long bond lengths. Minimization of the energy with such an expression can cause a molecule to implode if a poor starting structure is chosen. The quartic constant k_4 is normally positive, and the energy will go towards +∞ for long bond lengths if the Taylor series is terminated at fourth order. The correct limiting behavior for a bond stretched to infinity, however, is that the

energy should converge towards the dissociation energy. A simple function satisfying this criterion is the Morse potential28,29

E(\Delta R^{AB}) = D\left(1 - e^{-\alpha\,\Delta R^{AB}}\right)^2\,, \qquad \alpha = \sqrt{\frac{k}{2D}} \qquad (3.9)

with D as the dissociation constant and α related to the force constant. The Morse function reproduces the actual behavior quite accurately over a wide range of distances28. There are, however, some difficulties with the Morse potential in actual applications. For long bond lengths the restoring potential is quite small. Distorted structures may either be a poor starting geometry or develop during a simulation. These will therefore display a slow convergence towards the equilibrium bond length. For minimization purposes and simulation at ambient temperatures it is sufficient that the potential is reasonably accurate up to 40 kJ/mol above the minimum (see fig. 3.2b). In this range there is little difference between a Morse potential and a Taylor expansion, and most force fields therefore employ a simple polynomial for the bond energy. The number of parameters is often reduced by taking the cubic, quartic, etc., constants as a predetermined fraction of the harmonic force constant. A popular method is to demand that the nth-order derivative at R_0 matches the corresponding derivative of the Morse potential. For a fourth-order expansion this leads to

E_{bond}(\Delta R^{AB}) = k_2^{AB}\left(\Delta R^{AB}\right)^2\left[1 - \alpha\,\Delta R^{AB} + \frac{7}{12}\,\alpha^2\left(\Delta R^{AB}\right)^2\right] \qquad (3.10)

This α is the same as in the Morse function but can be seen as a fitting parameter. An alternative method for introducing anharmonicity is to use the harmonic form in eq. (3.7) but allow the force constant to depend on the bond distance30.
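
As a small numerical illustration of the bond terms above, the following Python sketch evaluates the harmonic form of eq. (3.7) and the Morse potential of eq. (3.9) side by side. The parameter values are made up for illustration and are not taken from any force field.

```python
import math

def harmonic_bond(r, r0, k):
    """Harmonic bond energy k * (r - r0)**2, cf. eq. (3.7)."""
    return k * (r - r0) ** 2

def morse_bond(r, r0, D, k):
    """Morse potential D * (1 - exp(-alpha*(r - r0)))**2 with alpha = sqrt(k / (2*D)),
    cf. eq. (3.9)."""
    alpha = math.sqrt(k / (2.0 * D))
    return D * (1.0 - math.exp(-alpha * (r - r0))) ** 2

# Hypothetical C-H-like parameters (Angstrom, kJ/mol/A^2, kJ/mol), illustrative only.
r0, k, D = 1.09, 1500.0, 420.0
for r in (0.9, 1.09, 1.3, 2.0):
    print(f"r = {r:4.2f} A   harmonic = {harmonic_bond(r, r0, k):9.2f} kJ/mol   "
          f"Morse = {morse_bond(r, r0, D, k):9.2f} kJ/mol")
```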

3.2.2 Angle Energy

E_angle is the energy required for bending an angle formed by three atoms A-B-C, with a bond between A and B, and between B and C (see fig. 3.1b). The argumentation is similar to E_bond: E_angle is usually expanded as a Taylor series around a `natural' bond angle and terminated at second order, giving the harmonic approximation

E_{angle}(\Theta^{ABC} - \Theta_0^{ABC}) = k^{ABC}\left(\Theta^{ABC} - \Theta_0^{ABC}\right)^2 \qquad (3.11)

While the simple harmonic expansion is adequate for most applications, there may be cases where higher accuracy is required. The next logical improvement is to include a third-order term, analogous to E_bond. This can give a very good description over a large range of angles30. Higher-order terms are often included in order to reproduce higher-order frequencies. Analogous to E_bond, force constants are often taken as a fixed fraction of the harmonic constant. The constants beyond third order can rarely be assigned values

with high confidence due to insufficient experimental information. Fixing the higher-order constants in terms of the harmonic one, of course, reduces the quality of the fit. While a third-order polynomial is capable of reproducing the experimental findings very well if the cubic term is fitted independently, the assumption that it is a fixed fraction of the harmonic constant deteriorates the fit, but it still represents an improvement relative to a simple harmonic approximation.

In the chemically important region below 40 kJ/mol above the bottom of the energy curve, a second-order expansion is normally sufficient.

Angles where the central atom is di- or trivalent present a special problem. In these cases, an angle of 180° corresponds to a maximum, i.e. the derivative of the energy with respect to the angle should be zero and the second derivative should be negative. This may be enforced by suitable boundary conditions on Taylor expansions of at least third order. A third-order polynomial fixes the barrier towards linearity in terms of the harmonic force constant and the equilibrium angle (ΔE_{180°} = k(180° − Θ_0)²/6). A fourth-order polynomial enables an independent fit of the barrier to linearity, but such constrained polynomial fittings are rarely done. Instead, the bending function is taken to be identical for all atom types, for example a fourth-order polynomial with cubic and quartic constants as a fixed fraction of the harmonic constant31.

Another special case is the formation of a small ring, owing to the very different equilibrium angles for such rings. In cyclopropane, for example, the carbons are formally sp3-hybridized, but have equilibrium CCC angles of 60°, in contrast to 110° in an acyclic system. With a low-order polynomial for the angle energy, the energy cost for such a deformation is large. For cyclobutane, for example, E_angle will dominate the total energy and cause the calculated structure to be planar, in contrast to the puckered geometry found experimentally.

For each combination of three atoms A, B and C, there are at least two bending parameters to be determined, k^{ABC} and Θ_0^{ABC}.

Out-of-plane bending

If the central atom B in the angle ABC is sp2-hybridized, there is a significant energy penalty associated with making the center pyramidal, since the four atoms prefer to be located in a plane. If the four atoms are exactly in a plane, the sum of the three angles with B as the central atom should be exactly 360°; however, a quite large pyramidalisation may be achieved without seriously distorting any of these three angles. Increasing the bond distance to 1.5 Å and moving the central atom 0.2 Å out of the plane only reduces the angle sum to 354.8°, i.e. only 1.7° per angle. The corresponding out-of-plane angle, χ, is in this case 7.7°. Very large force constants must be used if the ABC, ABD and BCD angle distortions are to reflect the energy cost associated with pyramidalisation.

The consequence is that the in-plane angle deformations for a planar structure become unrealistically stiff. Therefore, a special out-of-plane energy bending term (E_oop) is usually added, while the in-plane angles (ABC, ABD and BCD) are treated normally. E_oop may

be written as a harmonic term in the angle χ, with the equilibrium angle for a planar structure being zero, or as a quadratic function in the distance d, as given in

E_{oop}(\chi) = k^B\,\chi^2 \qquad \text{or} \qquad E_{oop}(d) = k^B\,d^2 \qquad (3.12)

Such energy terms may also be used for increasing the inversion barrier in sp3-hybridized atoms, and E_oop is also sometimes called E_INV. Inversion barriers are in most cases adequately modeled without an explicit E_INV term, the barrier arising naturally from the increase in bond angles upon inversion. The energy cost for non-planarity of sp2-hybridized atoms may also be accounted for by `improper' torsional angles.

For each sp2-hybridized atom there is one additional out-of-plane force constant to be determined, kB.

3.2.3 Torsional Energy

With the previously mentioned potentials it is not possible to predict the energetic contributions of rotation around a single bond (see fig. 3.1c). In the widely used model system ethane, a C-C bond provides a rotational axis for two methyl groups.

The explanation for this `Bermuda triangle of electronic theory'32 has not been found yet33,34. Explanations include the relief of steric congestion and optimal achievement of resonance stability34. The latter quantum-mechanical approach represents a newer hypothesis. For a long time repulsive interactions due to overlapping bond orbitals have been considered as the primary reason for rotational energy barriers35.

In ethane the torsional energy is highest when the methyl groups are closest together, as in the eclipsed form, and smallest when the methyl groups are farthest apart, as in the staggered form. Without a torsional potential that simulates these quantum effects, the eclipsed form of ethane is found to be higher in energy than the staggered form, however not sufficiently higher to explain the observed staggered preference36.

Accounting accurately for rotational contributions in atomic sequences is important, since rotational effects affect biological processes. Such torsional parameters are obtained by analyzing the energetic profile of model systems as a function of the flexible torsional angle. The energy differences between rotational isomers are used to parameterize appropriate torsional potentials.

The general function used for each flexible torsion angle τ has the form

E(\tau) = \sum_n \frac{V_n}{2}\left[1 \pm \cos(n\tau)\right] \qquad (3.13)

where τ is the torsional angle, n is an integer denoting the periodicity of the rotational barrier and V_n is the associated barrier height. The value of n depends of course on the atoms involved, but also (and mainly) on the force field parameterization. Typical values for n are in the range 1 to 4 (e.g. OPLS and Amber); the CHARMM force field uses values up to 6 (ref. 37).
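
As an illustration of eq. (3.13), the following Python sketch evaluates such a cosine series for a hypothetical threefold term of the ethane type discussed below. The barrier height of 12 kJ/mol is an arbitrary illustrative value, and the phase angle is a common generalization that reduces to the ± sign convention of eq. (3.13) for phases of 0 or π.

```python
import math

def torsion_energy(tau, terms):
    """Torsional energy E(tau) = sum_n Vn/2 * (1 + cos(n*tau - gamma_n)),
    a phase-shifted form of eq. (3.13). `terms` is a list of (Vn, n, gamma_n) tuples."""
    return sum(0.5 * vn * (1.0 + math.cos(n * tau - gamma)) for vn, n, gamma in terms)

# Hypothetical ethane-like threefold potential: V3 = 12 kJ/mol, n = 3, phase 0.
ethane_like = [(12.0, 3, 0.0)]
for deg in (0, 60, 120, 180):       # 0 deg = eclipsed (maximum), 60 deg = staggered (minimum)
    tau = math.radians(deg)
    print(f"tau = {deg:3d} deg   E = {torsion_energy(tau, ethane_like):5.2f} kJ/mol")
```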

Values of barrier heights and periodicity are derived from experimental data of spectroscopic experiments (NMR, IR, Raman, ...) on low molecular weight compounds. As mentioned above, rotational barriers arise from exchange interactions of electrons. Therefore, the barriers are assumed to have similar characters in similar orbital compositions. This theory has allowed tabulations of barrier heights as class averages38-41, and solves the problem that rotational barriers in proteins and DNA are not accessible experimentally but can be estimated from low molecular weight compounds.

A threefold torsional potential has three minima 120° apart and three maxima with a 60° offset with respect to the minima. Ethane has a simple threefold torsional energy profile. All the maxima are energetically equivalent and correspond to the eclipsed form, and all the minima correspond to the staggered form. n-Butane has the same symmetric arrangement of maxima and minima, but the barrier heights are no longer equal.

In the CHARMM37,42 and AMBER43 force fields these values are highly environment dependent, and since several torsional potentials can be used for a single rotational bond, these potentials have to be weighted proportionally.

Except when stated otherwise, the argumentation and mathematical formulas are adapted from refs. 2 and 44.

3.2.4 van der Waals Potential

In contrast to the previously mentioned potentials, the van der Waals potential is the first of two potentials dealing with non-covalently bound atoms. It describes the repulsive or attractive forces acting between two atoms and may be interpreted as the non-polar part of the interaction that does not depend on static energy due to atomic charges. E_vdW is very repulsive for short inter-atomic distances. It mimics the quantum mechanical effect of two electron orbitals overlapping each other, as the negatively charged electrons repel each other. E_vdW is zero for large inter-atomic distances. At intermediate distances there is a weak attractive interaction due to induced dipole-dipole interactions. There are of course also contributions from induced dipole-quadrupole and quadrupole-quadrupole interactions, etc., which vary with R^-8 and R^-10, but describing the distance dependence of the potential with R^-6 captures the asymptotic behavior sufficiently well. It is usually referred to as the `Hill' or `Buckingham' potential:

E_{Hill}(R) = A\,e^{-BR} - \frac{C}{R^6} \qquad (3.14)

with A, B and C being suitable atom type dependent constants.

It is not possible to derive the exact formula for the repulsive behavior theoretically. It is only required that it goes towards zero as R goes to infinity, and faster than the R^-6 term. A popular formula that meets these requirements is the Lennard-Jones potential. The repulsive part is given by an R^-12 dependence.

E_{LJ}(R) = \varepsilon\left[\left(\frac{R_0}{R}\right)^{12} - 2\left(\frac{R_0}{R}\right)^{6}\right] \qquad (3.15)

Here R_0 is the distance of minimal energy and ε is the well depth of the minimum. It has also been shown that an exponent of 9 or 10 gives better results than 12. The latter is chosen simply for computational reasons, since the value of the repulsive term is simply the square of the attractive term.
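
The (R_0, ε) form of eq. (3.15) can be evaluated directly, as in the short Python sketch below; the parameters are invented placeholder values for a pair of heavy atoms, not taken from any force field.

```python
def lennard_jones(r, r0, eps):
    """12-6 Lennard-Jones energy in the (R0, epsilon) form of eq. (3.15):
    E(r) = eps * ((r0/r)**12 - 2*(r0/r)**6), with a minimum of -eps at r = r0."""
    x = (r0 / r) ** 6
    return eps * (x * x - 2.0 * x)

# Hypothetical parameters (Angstrom, kJ/mol), for illustration only.
r0, eps = 3.4, 1.0
for r in (3.0, 3.4, 4.0, 6.0, 10.0):
    print(f"r = {r:5.1f} A   E = {lennard_jones(r, r0, eps):8.4f} kJ/mol")
```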

3.2.5 Electrostatic Potential

Long range interactions essentially require taking all particle pairs into account for a proper treatment. This may become a problem if periodic boundary conditions are imposed on the system, i.e. if one formally simulates an infinite number of particles (no explicit boundaries imply an infinite extent of the system). Therefore one has to devise special techniques to treat this situation. On the other hand, one also has to apply fast techniques to overcome the inherent O(N²) complexity of the problem, since for large numbers of particles this would imply an intractable computational bottleneck.

The two most generally used methods for biological simulations are the Particle Mesh Ewald45,46 (PME) and the Particle-Particle Particle-Mesh47,48 (P3M) methods. Both are grid-based methods that make extended use of the so-called Ewald summation49-51.

The Ewald summation

The Ewald summation method originates from crystal physics, where the problem was to determine the Madelung constant52, describing a factor for an effective electrostatic energy in a perfect periodic crystal. Considering the electrostatic energy of a system of N particles in a cubic box and imposing periodic boundary conditions leads to an equivalent problem. At position r_i of particle i, the electrostatic potential, φ(r_i), can be written down as a lattice sum

\phi(r_i) = \sum_{n}\sum_{j} \frac{q_j}{\lVert r_{ij} + nL\rVert}\,; \qquad i \neq j \ \text{for} \ \lVert n\rVert = 0 \qquad (3.16)

where n = (n_x, n_y, n_z), n_α ∈ Z, is a vector along Cartesian coordinates and L is the length of the simulation box. Equation 3.16 is conditionally convergent, i.e. the result depends on the order of summation. Also, the sum extends over an infinite number of lattice vectors, which means that one has to modify the procedure in order to get an absolutely and fast converging sum. The original method of Ewald consisted of introducing a convergence factor e^{-ns}, which makes the sum absolutely convergent, then transforming it into different fast converging terms, and finally letting s in the convergence factor go to zero. The final result of the calculation can be more easily understood from a physical picture. If every charge in the system is screened by a counter charge (of opposite sign), which is smeared out, then the potential of this composite charge

distribution becomes short ranged. This is similar to electrolytic solutions, where ionic charges are screened by counter charges, the result being an exponentially decaying function, the Debye potential53. In order to compensate for the added charge distribution it has to be subtracted again. The far field of a localized charge distribution is, however, again a Coulomb potential. Therefore, this term will be long ranged. Nothing would be gained if one would simply sum up these different terms. The efficiency gain shows up when one calculates the short range interactions as direct particle-particle contributions in real space, while summing up the long range part of the smeared charge cloud in reciprocal Fourier space. Choosing a Gaussian charge cloud of half width 1/α as the smeared charge distribution, the corresponding expression for the energy becomes

\phi(r_i) = \sum_{n}\sum_{j} q_j\,\frac{\mathrm{erfc}(\alpha\lVert r_{ij}+nL\rVert)}{\lVert r_{ij}+nL\rVert} + \frac{4\pi}{L^3}\sum_{k\neq 0}\sum_{j=1}^{N} \frac{q_j}{\lVert k\rVert^2}\, e^{-\lVert k\rVert^2/4\alpha^2}\, e^{i\,k\cdot r_{ij}} - \frac{2\alpha}{\sqrt{\pi}}\,q_i \qquad (3.17)

with i ≠ j for ||n|| = 0. The last term corresponds to a self-energy contribution which has to be subtracted, as it is included in the Fourier part. Equation 3.17 is an exact equivalent of Equation 3.16, with the difference that it is an absolutely convergent expression.

Therefore, nothing would be gained without further approximation. Since the complementary error function can be approximated for large arguments by a Gaussian function, and the k-space part decreases like a Gaussian, both terms can be approximated by stopping the sums at a certain lattice vector n and a maximal k-value k_max. The choice of these parameters depends on the error that is acceptable for the experiment.

By setting the error tolerance and choosing the width of the counter charge distribution, the result can either be obtained iteratively or, if one is only interested in an approximate estimate for the error, i.e. neglecting logarithmic terms, one obtains analytical expressions for n and k_max.

Using this error estimate, and also introducing the execution times spent on the real- and reciprocal-space parts, it is possible to show that these parameters can be chosen in such a fashion as to obtain a complexity of O(N^{3/2}) for the Ewald sum52,54.

The present form of the Ewald sum gives an exact representation of the potential energy of point like charges in a system with periodic boundary conditions. Sometimes the charge distribution in a molecule is approximated by a point dipole or higher multipole moments.

Sometimes experimental setups require a more general form of the Ewald sum, taking into account arbitrary point multipoles55,56 or electronic polarizabilities57.
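
The real-space (erfc-screened) part of eq. (3.17) is straightforward to write down explicitly. The Python sketch below evaluates only this first term under the minimum-image convention (n = 0), leaving out the reciprocal-space and self-energy contributions as well as the Coulomb prefactor; the α and cutoff values in the example are arbitrary.

```python
import math

def ewald_real_space(positions, charges, box, alpha, r_cut):
    """Real-space part of the Ewald sum, cf. the erfc term of eq. (3.17).
    Only the n = 0 (minimum-image) term is included; reciprocal-space and
    self-energy contributions are omitted in this sketch."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            # minimum-image separation in a cubic box of edge length `box`
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            d = [x - box * round(x / box) for x in d]
            r = math.sqrt(sum(x * x for x in d))
            if r < r_cut:
                total += charges[i] * charges[j] * math.erfc(alpha * r) / r
    return total

# Two opposite unit charges, 3 A apart, in a 20 A box (illustrative values only).
print(ewald_real_space([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.0, -1.0],
                       box=20.0, alpha=0.3, r_cut=9.0))
```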

Particle Mesh Ewald

The PME45 method is the most commonly used method for computing Coulombic interactions. It uses the Ewald summation to split the interactions into `near field' and `far field' contributions58-60. In the near field the particles interact directly, see fig. 3.3A. In the far field each particle interacts with a grid point. The charges of its neighboring atoms

Figure 3.3: Examples for Grid-based particle interactions: A) Direct particle particle interaction B) particles interacting with a grid-point, the explicit charges got mapped onto it C) A grid-point interacting with particles D) Two grid-points interacting.

are mapped onto this point, see fig. 3.3C. A Gaussian distribution is used for the screening charges.

E_{lr} = \sum_{k} \tilde{\Phi}_{lr}(k)\,\lvert\tilde{\rho}(k)\rvert^2 \qquad (3.18)

with Φ̃ and ρ̃ representing the Fourier transforms of the potential and the charge density. Since both summations converge quickly in their respective spaces (real and Fourier), they may be truncated with little loss of accuracy and great improvement in the required computational time. This makes the PME method best suitable for large or periodic systems with a constant density throughout the system. For systems with large density changes the Fast Multipole Method by Greengard and Rokhlin is better suited61.

Further speedup can be achieved by using fast Fourier transforms. As this is a grid-based method, the charges somehow have to be mapped onto the grid points. The original PME method uses Lagrange interpolation for the charge interpolation45. A newer development, the smooth PME (sPME), uses B-splines, which are smoother and allow higher accuracy by simply increasing the order of interpolation46.

Particle-Particle Particle-Mesh Methods

The P3M method, developed by R. Hockney and J. Eastwood in 1981 and subsequently further developed62,63, is very similar to the PME method. The forces are also divided into short range parts, which are only nonzero within a cutoff, and long range forces, which are smooth and can be mapped onto a grid.

One significant difference is the shape of the standard charge distribution function.

While PME uses a Gaussian, P3M uses a sphere with uniformly decreasing density, the S2 distribution:

\lambda(r) = \begin{cases} \dfrac{48}{\pi a^4}\left(\dfrac{a}{2} - r\right) & r < \dfrac{a}{2} \\ 0 & \text{otherwise} \end{cases} \qquad (3.19)

a is a parameter that adjusts the S2-function. The short range potential between two particles is also evaluated using the S2-function, in contrast to the coulombic calculations


of the PME method.
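
Eq. (3.19) translates directly into code; the small Python sketch below evaluates the S2 density and numerically checks that it integrates to unit charge (the value of a is arbitrary).

```python
import math

def s2_density(r, a):
    """S2 charge-smearing density of eq. (3.19): a sphere of radius a/2 whose
    density decreases linearly towards the edge."""
    if r < a / 2.0:
        return 48.0 / (math.pi * a ** 4) * (a / 2.0 - r)
    return 0.0

# Numerical check of the normalization: the density integrates to ~1.
a, dr = 2.0, 1e-4
total = sum(4.0 * math.pi * r * r * s2_density(r, a) * dr
            for r in (i * dr for i in range(1, int(a / (2 * dr)))))
print(total)
```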

The long range interactions are evaluated using a Fourier transform of the density and charge distribution functions. To compute the long range potential, the charges are mapped onto an equispaced three-dimensional grid. This is done by applying an assignment function of choice to the charge distribution λ to derive a slightly different charge distribution ρ. The choice of the assignment function depends on whether speed or accuracy is desired. A Fourier transform is used to derive ρ̃:

\tilde{F}(k) = \tilde{\rho}(k)\,\tilde{\Phi}(k) \qquad (3.20)

and subsequently F(x) is evaluated at the grid points by an inverse Fourier transform.

The grid-defined electrostatic forces can be obtained by differentiating the potentials numerically. Several numerical differencing techniques can be used, such as 4-point central differencing64. Interpolating the potentials from the grid to the particle locations is done by using the same assignment function.

Multigrid Methods

Figure 3.4: Multigrid.

A novel and computationally very rewarding approach for computing long range interactions are the multigrid methods. The near field interactions are computed as described in the PME section; the far field interaction is again an interaction with charges mapped onto a grid. The main idea is to use several different grids with different mesh sizes (usually powers of 2). So at the stage of the program where the charges are mapped onto the grid points, several grids are filled, all with different mesh sizes.

When the far field interaction potential is to be computed, one uses finer grids for closer particles and coarser grids for particles further away. This of course induces a certain error in the calculations; the error is pretty small though, since far away particles have smaller effects65,66.

The next step is to use `adaptive 4D multigrids'. Here time-steps can be left out for areas of limited interest, e.g. because the particle density is pretty small there, or because the atomistic knowledge of this area is slim to none (bulk water far away from the solute, etc.)67

Except when stated otherwise, the argumentation and mathematical formulas are adapted from refs. 2 and 68.

3.3 Molecular Dynamics

Molecular dynamics (MD) is a simulation technique that allows one to compute the movements of a molecule at discrete time steps. The use of a force field enables one to calculate possible motions of a molecule from a given set of initial coordinates. For each atom the forces acting on it are calculated using formula 3.5, and the acceleration of this atom is then obtained from Newton's second law of motion F = m · a, which can be written as the differential equation

-\frac{dV}{d\vec{r}} = m\,\frac{d^2\vec{r}}{dt^2} \qquad (3.21)

with V being the potential energy at position r. With the predefined time step, usually 0.5-2 fs, the next position is calculated. This is done for all atoms in the system and repeated until a termination criterion is met, e.g. the maximal number of time steps is reached.

3.3.1 Integration Schemes

Widely used methods to integrate the equations of motion are the finite difference methods, which assume that all dynamic properties can be approximated by Taylor expansions:

\vec{r}(t+\delta t) = \vec{r}(t) + \delta t\,\vec{v}(t) + \tfrac{1}{2}\delta t^2\,\vec{a}(t) + \tfrac{1}{6}\delta t^3\,\vec{b}(t) + \tfrac{1}{24}\delta t^4\,\vec{c}(t) + \cdots

\vec{v}(t+\delta t) = \vec{v}(t) + \delta t\,\vec{a}(t) + \tfrac{1}{2}\delta t^2\,\vec{b}(t) + \tfrac{1}{6}\delta t^3\,\vec{c}(t) + \tfrac{1}{24}\delta t^4\,\vec{d}(t) + \cdots

\vec{a}(t+\delta t) = \vec{a}(t) + \delta t\,\vec{b}(t) + \tfrac{1}{2}\delta t^2\,\vec{c}(t) + \cdots

\vec{b}(t+\delta t) = \vec{b}(t) + \delta t\,\vec{c}(t) + \cdots \qquad (3.22)

where v is the velocity, a is the second derivative (the acceleration), and so on. A widely used integration scheme is the Verlet integration69. It uses the positions and accelerations of this time step (t) and the position from the previous one (t−δt) to calculate the new positions at t+δt:

\vec{x}(t+\delta t) = \vec{x}(t) + \vec{v}(t)\,\delta t + \vec{a}(t)\,\frac{\delta t^2}{2} + \vec{b}(t)\,\frac{\delta t^3}{6} + O(\delta t^4)\,,

\vec{x}(t-\delta t) = \vec{x}(t) - \vec{v}(t)\,\delta t + \vec{a}(t)\,\frac{\delta t^2}{2} - \vec{b}(t)\,\frac{\delta t^3}{6} + O(\delta t^4) \qquad (3.23)

adding these two equations gives:

\vec{x}(t+\delta t) = 2\vec{x}(t) - \vec{x}(t-\delta t) + \vec{a}(t)\,\delta t^2 + O(\delta t^4) \qquad (3.24)

The velocities do not appear natively in this approach. The most common solution is to divide the difference in position between the last and the next time step by 2δt. The calculation of positions and accelerations is always one step ahead of the velocities. This can sometimes be a hindrance, since the velocity is used for the kinetic energy calculation, which in turn is necessary for state functions like temperature and pressure. This means that

temperature or pressure regulation of the system can only be performed after calculation of the next time step.
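
The position-Verlet step of eq. (3.24), together with the central-difference velocity estimate described above, takes only a few lines of Python; the harmonic "force field" in the usage example is a toy placeholder.

```python
def verlet_step(x, x_prev, a, dt):
    """One position-Verlet step, cf. eq. (3.24):
    x(t+dt) = 2*x(t) - x(t-dt) + a(t)*dt**2 (element-wise on coordinate lists)."""
    return [2.0 * xi - xpi + ai * dt * dt for xi, xpi, ai in zip(x, x_prev, a)]

def central_velocity(x_next, x_prev, dt):
    """Velocity estimate v(t) = (x(t+dt) - x(t-dt)) / (2*dt), as described in the text."""
    return [(xn - xp) / (2.0 * dt) for xn, xp in zip(x_next, x_prev)]

# Toy usage: a single 1D harmonic oscillator with force F = -k*x and m = 1.
k, dt = 1.0, 0.01
x_prev, x = [1.0], [1.0]              # starting at rest at the turning point
for step in range(1000):
    a = [-k * xi for xi in x]         # acceleration from the placeholder force
    x_next = verlet_step(x, x_prev, a, dt)
    v = central_velocity(x_next, x_prev, dt)
    x_prev, x = x, x_next
print(x, v)                           # oscillates with amplitude ~1
```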

Countless other integration schemes have been developed and are used in simulation programs, such as the velocity Verlet70, the leap-frog integration scheme71 and the Beeman algorithm72. Although the predictor-corrector methods73 have been shown to be much more precise than the Verlet methods, they show a drift in the overall system energy and are, therefore, hardly ever used74.

This chapter contains mainly textbook knowledge. Except when stated otherwise it was quoted from refs. 2 and 68.

3.3.2 System setup and equilibration; from PDB to model

A very common way to store structural data of proteins or DNA is the Brookhaven PDB file. Since it only contains information concerning atom type and Cartesian coordinates, the corresponding masses, charges and bond parameters have to be assigned externally before force field energies can be computed. In addition, since most structural information is derived using X-ray crystallography, PDB files usually do not contain protons. These have to be added for completeness. Most simulation programs therefore contain a program for setting up the model. These programs can usually also add solvent like water or acetone, and ions for mimicking buffer conditions as well as for charge balancing, and determine the box size and the boundary conditions.

Minimization

Figure 3.5: Steepest Descent

Usually, biological structures contain a certain error concerning bond lengths and angles and sometimes side chain orientations. Since experimentally derived structures commonly contain small deviations from the optimal values, it is common practice to energetically optimize the structure prior to simulation.

The algorithms performing energy minimization work differently from the MD algorithms. The structure's coordinate in phase space is evaluated and a different coordinate with a lower overall energy value is determined. The atoms' individual positions are updated. This process is iterated until a convergence criterion is reached, usually the difference to the prior step or the change in the overall energy value being smaller than a preset threshold, or the maximal number of cycles being reached.

Commonly used minimization methods for proteins are the Derivative Methods.

The minimum of a function is defined as the point where the first derivative is zero and the second derivative is positive:

\frac{\partial f}{\partial x_i} = 0\,; \qquad \frac{\partial^2 f}{\partial x_i^2} > 0 \qquad (3.25)

with xi being the 3N Cartesian (or internal) coordinates of the structure.

Two popular first-order minimization methods are the Steepest Descent (SD) and the Conjugate Gradient (CG) methods. The Steepest Descent method moves in the direction parallel to the net force, the search direction being defined as d = −g (g = gradient vector). Using a geographical analogy to the phase space, the algorithm walks straight downhill. The algorithm has to stop, though, as soon as it starts walking uphill again. A simple line search can be used to identify this point; in the multidimensional phase space of a protein a simple comparison with the last step's energy value is computationally less demanding. At this point the algorithm changes course perpendicular to the old one and continues the walk downhill until the energy value starts increasing again. This leads to a characteristic 'zig-zagging' (see fig. 3.5). The advantage of steepest descent methods lies in their fast convergence towards the minimum; close to the minimum, where the valley flattens out, the performance becomes worse. The worst possible scenario for SD is a long narrow valley. The asymptotic rate of convergence makes it inferior to other methods.

For that reason it is standard procedure to have the steepest descent part followed by a conjugate gradient approach. The search direction is not perpendicular to the previous direction, but constructed to be a conjugate of the previous direction(s) and g. If the surface is purely quadratic, the conjugate direction criterion guarantees that each successive minimization will not generate gradient components along any of the previous directions, and the minimum is reached within N_Dim steps. The first step is always an SD step (the last step of the SD section works equally well, of course), and the next direction is constructed as

d_i = -g_i + \beta_i\, d_{i-1}\,, \qquad d: \text{direction} \qquad (3.26)

There are several ways of choosing the β value, such as the Fletcher-Reeves75, the Polak-Ribiere76 and the Hestenes-Stiefel77 formulas. Especially the Polak-Ribiere method is often found in software, though the process needs to be restarted often (setting β to zero) to be productive, for example when β_k becomes negative.
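
A bare-bones steepest-descent loop along the lines described above can be sketched in a few lines of Python. It simply shrinks the step size whenever the energy goes uphill instead of performing a proper line search; the energy and gradient callables are placeholders for a real force field, and the quadratic test function in the usage example is purely illustrative.

```python
def steepest_descent(x, energy, gradient, step=0.1, max_cycles=1000, tol=1e-6):
    """Minimal steepest-descent minimizer: move along d = -g, grow the step while
    the energy decreases and shrink it when a trial step goes uphill."""
    e = energy(x)
    for _ in range(max_cycles):
        g = gradient(x)
        if max(abs(gi) for gi in g) < tol:          # gradient-based convergence criterion
            break
        trial = [xi - step * gi for xi, gi in zip(x, g)]
        e_trial = energy(trial)
        if e_trial < e:                             # still downhill: accept and speed up
            x, e = trial, e_trial
            step *= 1.2
        else:                                       # uphill: reject and shorten the step
            step *= 0.5
    return x, e

# Illustrative usage on a 2D quadratic "valley" (not a molecular energy surface).
print(steepest_descent([3.0, -2.0],
                       energy=lambda p: p[0] ** 2 + 10.0 * p[1] ** 2,
                       gradient=lambda p: [2.0 * p[0], 20.0 * p[1]]))
```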

Heating the system

After building and optimizing the system, one now has to bring the system to the desired operation temperature. The temperature of the system defines the maximal displacement length of each atom. The normal procedure of the simulator when encountering an atom with zero velocity is to assign a random value and direction to the particle. So setting a static molecule to 300 K results in a molecule where every atom moves around randomly, resulting in huge force field energies. To prevent this, the system is heated up slowly so that coupled atoms start swinging in phase with each other.

The temperature also has to be maintained during the following simulation steps. One has to prevent an imbalance of energy from appearing, e.g. the solvent being hot and the solute being cool, or all the energy building up in one corner of the box, which leads to a blow-up.

A number of energy coupling algorithms have been developed to maintain a steady temperature. A very early idea is to scale the velocities78:

\Delta T = (\lambda^2 - 1)\,T(t)\,, \qquad \lambda = \sqrt{T_{new}/T(t)}\,, \qquad T: \text{temperature}, \ \lambda: \text{coupling factor} \qquad (3.27)

The Berendsen thermostat couples the system to an infinitely large heat bath, adding or removing thermal energy from the system as appropriate. The velocities are scaled such that the change of temperature is proportional to the difference between bath and system.

dT(t)/dt = (1/τ) · (T_bath − T(t))   (3.28)

τ is the coupling constant that determines how fast temperature differences between system and bath are balanced out. Unfortunately, the Berendsen thermostat (like all velocity scaling approaches) leads to an artificially prolonged temperature equilibration between solute and solvent; it can also be shown that, due to numerical instabilities, 'flying ice cubes' may appear. Disregarding these problems, the Berendsen thermostat is the recommended way to change the system's temperature during heating or simulated annealing. The heating step should be sufficiently long, though, and might even include a short equilibration step at the end, holding the system at the desired temperature.
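The corresponding velocity-scaling factor can be written down in a few lines; the following is a sketch only, with illustrative time constants rather than values taken from any specific input file:

import numpy as np

def berendsen_scale(T_now, T_bath, dt, tau):
    # Berendsen weak coupling: lambda^2 = 1 + (dt/tau) * (T_bath/T(t) - 1),
    # so the temperature change per step is proportional to T_bath - T(t).
    return np.sqrt(1.0 + (dt / tau) * (T_bath / T_now - 1.0))

# velocities *= berendsen_scale(T_now, 300.0, dt=0.002, tau=1.0)   # ps, illustrative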

The counterpart to Berendsen's heat bath is the microwave-like approach of Andersen.

Here, massless pseudo-particles are injected into the simulation box, colliding with the atoms and thereby introducing entropy, i.e. thermal noise, into the system [79]. Between the collisions the system is simulated at constant energy, so the overall effect resembles a series of microcanonical ensembles, each at a slightly different energy. This and related approaches are called stochastic collision methods; the algorithm mimics the collisions by randomly reassigning individual velocities according to a Maxwell-Boltzmann distribution.
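A minimal sketch of such a stochastic collision step, assuming a per-atom collision probability of ν·dt per time step (all numbers and names are illustrative):

import numpy as np

KB = 0.0019872041  # kcal/(mol*K)

def andersen_collisions(velocities, masses, T_bath, nu, dt):
    # Each atom is 'hit' with probability nu*dt; hit atoms receive a fresh
    # velocity drawn from the Maxwell-Boltzmann distribution at T_bath,
    # while all other atoms continue their constant-energy trajectory.
    hit = np.random.random(len(masses)) < nu * dt
    sigma = np.sqrt(KB * T_bath / masses[hit])[:, None]
    velocities[hit] = np.random.normal(0.0, 1.0, (hit.sum(), 3)) * sigma
    return velocities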

Another energy coupling method is the extended system method, originally introduced by Nosé for constant temperature dynamics and subsequently developed by Hoover [80, 81]. The idea is to consider the thermal reservoir as an internal part of the system. The reservoir is represented as an additional degree of freedom, and it is assigned a potential energy as a function of the desired temperature, a kinetic energy and a coupling constant. These are evolved in parallel to the parameters of the system. In contrast to the aforementioned methods, this approach does not artificially alter the energy content of the system and therefore produces correct ensembles [82].
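A crude Euler-discretized sketch of a single Nosé-Hoover thermostat variable, only to illustrate how the reservoir degree of freedom ξ is propagated alongside the particle velocities (the reservoir 'mass' Q and all names are illustrative assumptions, not the production-grade integrator of any package):

import numpy as np

KB = 0.0019872041  # kcal/(mol*K)

def nose_hoover_step(v, forces, masses, xi, Q, T_bath, dt, n_dof):
    # The reservoir variable xi accelerates whenever the instantaneous kinetic
    # energy deviates from its target value, and then acts like a friction
    # (or anti-friction) coefficient on the particle velocities.
    twice_kinetic = np.sum(masses[:, None] * v**2)
    xi = xi + (dt / Q) * (twice_kinetic - n_dof * KB * T_bath)
    v = v + dt * (forces / masses[:, None] - xi * v)
    return v, xi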

Langevin dynamics is widely used in biological simulations for constant temperature runs. It is an alternative to Newtonian dynamics, adding a random force as well as a friction term to the force calculation.


F − γ · m · v + F_random = m · a   (3.29)

The random force is typically taken to have a Gaussian distribution with a mean value of zero. In the literature, γ is sometimes called the collision parameter or the damping factor.

It has to be chosen carefully to avoid over-damping of fast vibrational movements [83, 84]. The Verlet integration scheme has to be modified to include these terms. A common discretization by Brooks, Brünger and Karplus [85-87] is well established for small values of γ.
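A sketch of the effective Langevin force, with the width of the random force chosen according to the fluctuation-dissipation theorem; this simple discretization is an illustration only, not the Brooks-Brünger-Karplus scheme itself, and all names are placeholders:

import numpy as np

KB = 0.0019872041  # kcal/(mol*K)

def langevin_force(forces, velocities, masses, gamma, T_bath, dt):
    # Deterministic force, minus a friction term gamma*m*v, plus a Gaussian
    # random force with zero mean whose variance follows from the
    # fluctuation-dissipation theorem.
    sigma = np.sqrt(2.0 * gamma * KB * T_bath * masses / dt)[:, None]
    random_force = np.random.normal(0.0, 1.0, velocities.shape) * sigma
    return forces - gamma * masses[:, None] * velocities + random_force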

Density correction

When the system builder fills the simulation box with the solvent, all it basically does is take a pre-equilibrated cube of solvent and stack copies of it into the box. Whenever a solvent molecule overlaps with the solute, the solvent molecule is deleted. Therefore, small cavities of the solute might not be filled correctly or not at all, which leads to an irregular layer of vacuum around the solute. The same happens at the box boundaries. After the box is filled, ions can be added by replacing solvent molecules. This also causes a distortion, since most ions are much smaller than the solvent molecules. For a system consisting of protein, water and Na+ or Cl− ions as counter charges, this leads to an overall density of 0.75 to 0.85, depending on the system size and water model.

To correct this, a short simulation at constant pressure, usually 1 bar, is performed after the heating step. To maintain the pressure, procedures similar to the temperature coupling algorithms are applied: the Amber modeling package comes with a Berendsen barostat coupling the system to a 'pressure bath', while one of the major selling points of the Desmond package is an efficient implementation of the Nosé-Hoover barostat [88].

One notable point is that this stage does not take place at constant particle number, box volume and temperature, also known as the canonical ensemble (NVT), but uses the isothermal-isobaric ensemble (NPT), keeping the pressure fixed instead of the volume. Since this is a short step in the equilibration process, hardly any interesting thermodynamic information is extracted from it; one has to keep in mind, though, that any method attempting to do so might not deliver correct data. This is to some extent also true for production runs done in the canonical ensemble: most methods for computing thermodynamic data from simulations were developed and tested for the microcanonical ensemble (NVE). The difference might be very small, but it may happen that the energy output of the reaction of interest is artificially dampened by the temperature coupling algorithm [89, 90]. This is the reason why Langevin dynamics is usually used for temperature and pressure coupling during production runs.
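As an illustration, the Berendsen barostat rescales the box lengths (and coordinates) every step with a factor derived from the pressure mismatch; the compressibility value below is roughly that of water and, like all names here, an assumption of this sketch:

def berendsen_box_scale(P_now, P_target, dt, tau_p, kappa=4.5e-5):
    # mu^3 = 1 - (dt/tau_p) * kappa * (P_target - P_now); kappa is the
    # isothermal compressibility of the solvent in 1/bar.
    mu_cubed = 1.0 - (dt / tau_p) * kappa * (P_target - P_now)
    return mu_cubed ** (1.0 / 3.0)

# box_lengths *= berendsen_box_scale(P_now, 1.0, dt=0.002, tau_p=1.0)   # bar / ps, illustrative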


Equilibration

The last stage of the system setup is usually an equilibration run with the same conditions as the subsequent production runs. This resolves small disturbances in the system: cavities in the solute containing too many water molecules are relaxed, hydrophobic or hydrophilic patches or residues on the surface surround themselves with the appropriate number of water molecules, etc. This stage should be of sufficient length to prevent disturbances from carrying over into the production runs.

3.3.3 Modifications

Restraints

It might be desired that some parts of the solute remain roughly where they are and do not make significant movements during the simulation. These parts can be restrained in motion against a reference model that usually has to contain exactly the same atoms as the simulation model. An additional energy term is added for each of the atoms involved, usually in the form of a quadratic harmonic potential, pulling the atoms back to where they came from. Since the solute can tumble around the simulation box during the simulation, the reference structure has to be superimposed onto the system; otherwise the restraining energies would become disproportionately large.
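A sketch of such a quadratic positional restraint term, applied after superposition of the reference structure (the force constant k and all names are illustrative):

import numpy as np

def positional_restraints(coords, reference, k):
    # Harmonic penalty k * |x - x_ref|^2 per restrained atom, plus the
    # corresponding restoring force pulling each atom back to its reference.
    diff = coords - reference
    energy = k * np.sum(diff**2)
    forces = -2.0 * k * diff
    return energy, forces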

In addition to restraints, some packages traditionally support the possibility to constrain parts of the system in space by keeping their coordinates absolutely fixed. However, this is hardly used nowadays.

Steered MD

One popular method to force the protein towards an interesting area of the phase space is the steered molecular dynamics (SMD) simulation. Here an artificial biasing potential is applied along the assumed reaction path, called the reaction coordinate.

Typical applications involve shortening the distance between non-covalently bound atoms, like dragging a ligand into a binding pocket by bringing atoms of ligand and protein to known distances, or changing backbone angles to enforce a conformational change [91-93]. The inflicted distortions are fairly small and easy to handle.

It can be visualized as a spring hooked up to the atom of interest. The anchor point of the spring is then moved along the reaction coordinate, and the spring constant determines the response of the atom. The force acting on the atom therefore has a time dependence that looks like a sigmoidal function: the spring expands until the force holding the atom back becomes smaller than the force of the spring, at which point the atom starts following the spring. The extension of the spring allows one to compute the energy necessary to distort the protein in the desired fashion and is in this case the easiest way to calculate the potential of mean force [93] (PMF), see below.
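A minimal constant-velocity pulling sketch, assuming a unit vector along the reaction coordinate (spring constant, pulling speed and names are illustrative placeholders):

import numpy as np

def smd_spring_force(atom_pos, anchor0, direction, pull_speed, k, t):
    # The anchor of the virtual spring moves along the (unit) reaction
    # coordinate with constant speed; the pulled atom feels the harmonic
    # restoring force, and the recorded extension can later be turned into
    # a work profile / potential of mean force estimate.
    anchor = np.asarray(anchor0) + pull_speed * t * np.asarray(direction)
    extension = np.asarray(atom_pos) - anchor
    force = -k * extension
    return force, extension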
