3.3 Molecular Dynamics

3.3.2 System Setup and Equilibration: From PDB to Model

A very common way to store structural data of proteins or DNA is the Brookhaven PDB file. Since it only contains information on atom types and Cartesian coordinates, the corresponding masses, charges and bond parameters have to be assigned externally before force field energies can be computed. In addition, since most structural information is derived using X-ray crystallography, PDB files usually do not contain protons; these have to be added for completeness. Most simulation programs therefore contain a tool for setting up the model. These tools can usually also add solvent such as water or acetone and ions, both for mimicking buffer conditions and for charge balancing, and determine the box size and the boundary conditions.

Minimization

Figure 3.5: Steepest Descent

Biological structures usually contain small errors in bond lengths and angles, and sometimes in side-chain orientations. Since experimentally derived structures thus commonly deviate slightly from the optimal values, it is common practice to energetically optimize the structure prior to simulation.

The algorithms performing energy minimization work differently from the MD algorithms. The structure's coordinate in phase space is evaluated, a different coordinate with a lower overall energy value is determined, and the atoms' individual positions are updated accordingly. This process is iterated until a convergence criterion is reached, usually when the difference from the prior step or the change in the overall energy value falls below a preset threshold, or when the maximal number of cycles is reached.
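The iterate-until-converged loop described above can be sketched in a few lines. The following Python fragment is purely illustrative: the toy harmonic "force field", the fixed step size and the thresholds are assumptions, not any particular package's implementation.

```python
# Generic minimization loop: update coordinates downhill, stop when the energy
# change falls below a threshold or the cycle limit is reached.

def minimize(energy, gradient, x0, step=0.1, tol=1e-8, max_cycles=1000):
    x = list(x0)
    e_prev = energy(x)
    for cycle in range(max_cycles):
        g = gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]  # move downhill
        e = energy(x)
        if abs(e_prev - e) < tol:   # convergence: energy change below threshold
            return x, e, cycle + 1
        e_prev = e
    return x, e_prev, max_cycles    # cycle limit reached

# Toy harmonic "force field": E = (x0 - 1)^2 + 2 (x1 + 0.5)^2
energy = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
gradient = lambda x: [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]

xmin, emin, cycles = minimize(energy, gradient, [5.0, 5.0])
```

Real minimizers differ mainly in how the next coordinate is chosen; the convergence bookkeeping is essentially the same.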

Commonly used minimization methods for proteins are derivative-based methods.

The minimum of a function is defined as the point where the first derivative is zero and the second derivative is positive:

∂f/∂x_i = 0,  ∂²f/∂x_i² > 0   (3.25)

with x_i being the 3N Cartesian (or internal) coordinates of the structure.

Two popular first-order minimization methods are the Steepest Descent (SD) and the Conjugate Gradient (CG) methods. The Steepest Descent method moves in the direction parallel to the net force, the search direction being defined as d = −g (g = gradient vector). Using a geographical analogy to the phase space, the algorithm walks straight downhill. It has to stop, though, as soon as it starts walking uphill again. A line search can be used to identify this point, but in the multidimensional phase space of a protein a simple comparison with the last step's energy value is computationally less demanding. At this point the algorithm changes course perpendicular to the old direction and continues the walk downhill until the energy value starts increasing again. This leads to a characteristic 'zig-zagging' (see fig. 3.5). The advantage of steepest descent lies in its fast initial convergence towards the minimum; close to the minimum, where the valley flattens out, the performance becomes worse. The worst possible scenario for SD is a long narrow valley. Its poor asymptotic rate of convergence makes it inferior to other methods.
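A minimal sketch of this behavior, using the cheap "compare with the last step's energy" test instead of an exact line search: when a trial step along −g goes uphill, the step is backtracked and shrunk. The long-narrow-valley energy surface and all numeric values are assumptions for illustration.

```python
# Steepest-descent sketch with energy-comparison backtracking.

def steepest_descent(energy, gradient, x, step=1.0, tol=1e-10, max_cycles=10000):
    e = energy(x)
    for _ in range(max_cycles):
        g = gradient(x)
        d = [-gi for gi in g]                        # search direction d = -g
        trial = [xi + step * di for xi, di in zip(x, d)]
        e_trial = energy(trial)
        if e_trial < e:                              # still going downhill
            if e - e_trial < tol:
                return trial, e_trial
            x, e = trial, e_trial
        else:                                        # started uphill: shrink step
            step *= 0.5
    return x, e

# A long narrow valley -- the worst case mentioned above
energy = lambda x: x[0] ** 2 + 100.0 * x[1] ** 2
gradient = lambda x: [2.0 * x[0], 200.0 * x[1]]

xmin, emin = steepest_descent(energy, gradient, [10.0, 1.0])
```

The run converges, but only after many small zig-zag steps along the shallow axis, which is exactly the weakness the text describes.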

For that reason it is standard procedure to have the steepest descent part followed by a Conjugate Gradient approach. Here the search direction is not perpendicular to the previous direction, but constructed to be a conjugate of the previous direction(s) and g.

If the surface is purely quadratic, the conjugate direction criterion guarantees that each successive minimization will not generate gradient components along any of the previous directions, and the minimum is reached within N_dim steps. The first step is always an SD step (the last step of the SD section works equally well, of course), and the next direction is constructed as

d_i = −g_i + β_i·d_{i−1},  d: direction   (3.26)

There are several ways of choosing the β value, such as the Fletcher-Reeves[75], the Polak-Ribière[76] and the Hestenes-Stiefel[77] formulas. Especially the Polak-Ribière method is often found in software, though the process needs to be restarted often (setting β to zero) to remain productive, for example when β_k becomes negative.
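The direction update of Eq. 3.26 with the Polak-Ribière β and the negative-β restart can be sketched as follows. The quadratic test surface E(x) = ½xᵀAx − bᵀx (which allows an exact line search) and all values are assumptions for illustration.

```python
# Conjugate gradient with Polak-Ribiere beta and restart on negative beta.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

def cg_minimize(A, b, x, tol=1e-10, max_cycles=50):
    # minimizes E(x) = 0.5 x^T A x - b^T x, so the gradient is g = A x - b
    g = [gi - bi for gi, bi in zip(matvec(A, x), b)]
    d = [-gi for gi in g]                 # first step is steepest descent
    for _ in range(max_cycles):
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)   # exact line search on a quadratic
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        if dot(g_new, g_new) < tol:
            return x
        # Polak-Ribiere beta, reset to zero (restart) when it turns negative
        beta = max(0.0, dot(g_new, [gn - go for gn, go in zip(g_new, g)]) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]   # Eq. 3.26
        g = g_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
xmin = cg_minimize(A, b, [0.0, 0.0])   # exact solution: x = (1/11, 7/11)
```

On this two-dimensional quadratic the minimum is reached within N_dim = 2 steps, as stated above.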

Heating the system

After building and optimizing the system, one now has to bring it to the desired operating temperature. The temperature of the system defines the maximal displacement length of each atom. The normal procedure of the simulator when encountering an atom with zero velocity is to assign a random velocity magnitude and direction to the particle. Setting a static molecule directly to 300 K thus results in a molecule where every atom moves around randomly, producing huge force field energies. To prevent this, the system is heated up slowly so that coupled atoms start swinging in phase with each other.
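A sketch of such a staged heating protocol: velocities are drawn from a Maxwell-Boltzmann distribution at a low temperature and then rescaled toward the target in small increments. The one-dimensional velocities, the unit choice, the masses and the stage schedule are all illustrative assumptions.

```python
import math, random

KB = 0.0019872  # Boltzmann constant in kcal/(mol*K), a common MD unit choice

def assign_velocities(masses, temperature, rng):
    # Maxwell-Boltzmann: 1D velocity with sigma = sqrt(kB*T/m) per atom
    return [rng.gauss(0.0, math.sqrt(KB * temperature / m)) for m in masses]

def kinetic_temperature(masses, velocities):
    # T = 2*Ekin / (Ndof * kB), with Ndof = number of 1D velocities here
    ekin = 0.5 * sum(m * v * v for m, v in zip(masses, velocities))
    return 2.0 * ekin / (len(velocities) * KB)

def heat(masses, velocities, target, stages=30):
    # raise the instantaneous temperature toward `target` in small increments
    for i in range(1, stages + 1):
        t_stage = target * i / stages
        scale = math.sqrt(t_stage / kinetic_temperature(masses, velocities))
        velocities = [v * scale for v in velocities]
        # ... an MD step would run here so coupled atoms can dephase naturally
    return velocities

rng = random.Random(42)
masses = [12.0] * 500                     # 500 carbon-like atoms
v = assign_velocities(masses, 10.0, rng)  # start cold, not at zero velocity
v = heat(masses, v, 300.0)
```

In a real run each stage would be followed by a block of MD steps; here only the rescaling skeleton is shown.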

The temperature also has to be maintained during the following simulation steps. An imbalance of energy has to be prevented, e.g. the solvent being hot while the solute is cool, or all the energy building up in one corner of the box, which leads to a blow-up.

A number of temperature coupling algorithms have been developed to maintain a steady temperature. A very early idea is to scale the velocities[78]:

ΔT = (λ² − 1)·T(t),  λ = √(T_new/T(t)),  T: temperature, λ: coupling factor   (3.27)

The Berendsen thermostat couples the system to an infinitely large heat bath, adding or removing thermal energy from the system as appropriate. The velocities are scaled such that the rate of temperature change is proportional to the difference between bath and system:

dT(t)/dt = (1/τ)·(T_bath − T(t))   (3.28)

τ is the coupling constant that determines how fast temperature changes of the system are balanced out. Unfortunately, the Berendsen thermostat (like all velocity scaling approaches) leads to an artificially prolonged temperature equilibration between solute and solvent; it can also be shown that, due to numerical instabilities, 'flying ice cubes' may appear. Disregarding these problems, the Berendsen thermostat is the recommended way to change the system's temperature during heating or simulated annealing. The heating step should be sufficiently long though, and might even end with a short equilibration step holding the system at the desired temperature.
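The weak-coupling scaling of Eqs. 3.27/3.28 reduces to multiplying each velocity by λ = √(1 + (Δt/τ)(T_bath/T − 1)) every step. The sketch below isolates just that rule, with no forces acting, so the temperature simply relaxes exponentially toward the bath; time step, τ and the temperatures are assumed values.

```python
import math

# Berendsen weak-coupling sketch: dT/dt is proportional to (T_bath - T).

def berendsen_lambda(t_current, t_bath, dt, tau):
    return math.sqrt(1.0 + (dt / tau) * (t_bath / t_current - 1.0))

def relax(t_current, t_bath, dt=0.002, tau=0.1, steps=500):
    # with no other forces, T is simply multiplied by lambda^2 each step
    for _ in range(steps):
        lam = berendsen_lambda(t_current, t_bath, dt, tau)
        t_current *= lam * lam
    return t_current

t_end = relax(100.0, 300.0)   # heat from 100 K toward a 300 K bath
```

A small τ couples tightly (fast relaxation, stronger perturbation of the dynamics); a large τ couples loosely.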

The counterpart to Berendsen's heat bath is the microwave-like approach of Andersen.

Here, massless pseudo-particles are injected into the simulation box, colliding with the atoms and thereby introducing thermal noise into the system[79]. Between the collisions, the system is simulated at constant energy, so the overall effect is like a series of microcanonical ensembles, each at a slightly different energy. This and related approaches are called stochastic collision methods, since the algorithm mimics this behavior by randomly reassigning individual velocities from a Maxwell-Boltzmann distribution.
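In practice the stochastic collision method amounts to the random velocity reassignment just described: each step, every particle "collides" with the bath with a small probability and receives a fresh Maxwell-Boltzmann velocity, while untouched particles evolve at constant energy. The reduced units, collision probability and system size below are assumptions.

```python
import math, random

KB = 1.0  # Boltzmann constant in reduced units

def andersen_step(velocities, masses, t_bath, collision_prob, rng):
    out = []
    for v, m in zip(velocities, masses):
        if rng.random() < collision_prob:
            v = rng.gauss(0.0, math.sqrt(KB * t_bath / m))  # bath collision
        out.append(v)  # between collisions the velocity is untouched (NVE)
    return out

rng = random.Random(7)
masses = [1.0] * 2000
v = [0.0] * 2000                      # start from a static system
for _ in range(200):
    v = andersen_step(v, masses, 2.0, 0.1, rng)

# kinetic temperature should now fluctuate around the bath value
t_obs = sum(m * vi * vi for m, vi in zip(masses, v)) / (len(v) * KB)
```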

Another temperature coupling method is the extended system method, originally introduced by Nosé for constant temperature dynamics and subsequently developed by Hoover[80,81]. The idea is to treat the thermal reservoir as an integral part of the system. The reservoir is represented as an additional degree of freedom that is assigned a potential energy as a function of the desired temperature, a kinetic energy and a coupling constant; these are evolved in parallel with the parameters of the system. In contrast to the aforementioned methods, this approach does not artificially alter the energy content of the system and therefore produces correct ensembles[82].

Langevin dynamics is widely used in biological simulations for constant-temperature simulations. It is an alternative to Newtonian dynamics, adding a random force as well as a friction term to the force calculation.

m·a = F − γ·v + F_random   (3.29)

The random force is typically taken to have a Gaussian distribution with a mean value of zero. In the literature, γ is sometimes called the collision parameter or the damping factor.

It has to be chosen carefully to avoid over-damping of fast vibrational movements[83,84]. The Verlet integration scheme has to be modified to include these terms. A common discretization by Brooks, Brünger and Karplus[85-87] is well established for small values of γ.
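A minimal Langevin integrator along the lines of Eq. 3.29. Instead of the modified Verlet (BBK) scheme cited above, a simple semi-implicit Euler step is used; the harmonic test potential, the reduced units and all parameter values are assumptions. The random-force variance follows the fluctuation-dissipation relation so the oscillator equilibrates at the bath temperature.

```python
import math, random

KB = 1.0  # Boltzmann constant in reduced units

def langevin_step(x, v, m, force, gamma, t_bath, dt, rng):
    # fluctuation-dissipation: random-force variance 2*gamma*kB*T/dt
    f_random = rng.gauss(0.0, math.sqrt(2.0 * gamma * KB * t_bath / dt))
    a = (force(x) - gamma * v + f_random) / m   # Eq. 3.29 rearranged for a
    v = v + a * dt
    x = x + v * dt
    return x, v

rng = random.Random(1)
force = lambda x: -1.0 * x           # harmonic spring, k = 1
x, v, temps = 2.0, 0.0, []
for step in range(200000):
    x, v = langevin_step(x, v, 1.0, force, gamma=1.0, t_bath=1.5, dt=0.01, rng=rng)
    if step >= 20000:                # discard the equilibration phase
        temps.append(1.0 * v * v / KB)   # instantaneous kinetic temperature

t_obs = sum(temps) / len(temps)      # fluctuates around t_bath = 1.5
```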

Density correction

When the system builder fills the simulation box with the solvent, all it basically does is take a pre-equilibrated cube of solvent and tile it into the box. Whenever a solvent molecule overlaps with the solute, the solvent molecule is deleted. Therefore, small cavities of the solute might be filled incorrectly or not at all, which leaves an irregular layer of vacuum around the solute. The same happens at the box boundaries. After the box is filled, ions can be added by replacing solvent molecules; this also causes a distortion, since most ions are much smaller than the solvent molecules they replace. For a system consisting of protein, water and Na+ or Cl− counter-ions, this leads to an overall density of 0.75 to 0.85, depending on the system size and water model.

To correct this, a short simulation at constant pressure, usually 1 bar, is performed after the heating step. To maintain pressure, procedures similar to the temperature coupling algorithms are applied to the simulation: the Amber modeling package comes with a Berendsen barostat coupling the system to a 'pressure bath'. One of the major selling points of the Desmond package is an efficient implementation of the Nosé-Hoover barostat[88].
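Pressure weak coupling mirrors the Berendsen thermostat: each step the box volume is scaled by μ³ with μ = [1 − (Δt/τ_p)·κ·(P_target − P)]^(1/3), so an under-pressurized (too dilute) box shrinks toward the target density. The sketch below uses an ideal-gas-like pressure model P = nkT/V so the loop is self-contained; it is an illustration of the scaling rule, not Amber's actual implementation, and all values are assumptions.

```python
# Berendsen-barostat sketch: scale the box volume toward the target pressure.

def berendsen_mu(p_current, p_target, dt, tau_p, kappa):
    # kappa stands in for the isothermal compressibility of the system
    return (1.0 - (dt / tau_p) * kappa * (p_target - p_current)) ** (1.0 / 3.0)

def equilibrate_density(volume, n_kt, p_target, dt=0.002, tau_p=0.5,
                        kappa=1.0, steps=5000):
    for _ in range(steps):
        p = n_kt / volume                 # toy pressure estimate, P = n*kB*T/V
        mu = berendsen_mu(p, p_target, dt, tau_p, kappa)
        volume *= mu ** 3                 # rescale box (and atom coordinates)
    return volume

# box starts too large (density ~0.8 of target), as after solvation
v_end = equilibrate_density(volume=1250.0, n_kt=1000.0, p_target=1.0)
```

At convergence P = P_target, i.e. V = nkT/P_target, closing the vacuum layer the text describes.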

One notable point is that this stage does not take place at constant particle number, box volume and temperature, also known as the canonical ensemble (NVT), but uses the isothermal-isobaric ensemble (NPT), keeping the pressure fixed instead of the volume. Since this is a short step in the equilibration process, hardly any interesting thermodynamic information is extracted from it; one has to keep in mind, though, that any method attempting to do so might not deliver correct data. This is to some extent also true for any production runs done in the canonical ensemble: most methods for computing thermodynamic data from simulations were developed and tested for the microcanonical ensemble (NVE). The difference may be very small, but the energy output of the desired reaction can be artificially dampened by the temperature coupling algorithm[89,90]. This is the reason why one usually uses Langevin dynamics for temperature and pressure coupling during production runs.

Equilibration

The last stage of the system setup is usually an equilibration run under the same conditions as the following production runs. This resolves small disturbances in the system: cavities in the solute with too many water molecules in them, hydrophobic or hydrophilic patches or residues on the surface surrounding themselves with the preferred number of water molecules, etc. This stage should be of sufficient length to prevent disturbances from carrying over into the production runs.

3.3.3 Modifications

Restraints

It might be desired that some parts of the solute remain roughly where they are and do not make significant movements during the simulation. These parts can be restrained in motion against a reference model, which usually has to contain exactly the same atoms as the simulation model. An additional degree of freedom is added to each of the atoms involved, usually in the form of quadratic harmonic potentials moving the atoms back toward their reference positions. Since the solute can tumble around the simulation box during the simulation, the reference system has to be superimposed onto the system first; otherwise the restraining energies would become disproportionately large.
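The quadratic harmonic restraint can be sketched directly: each restrained atom feels a penalty k·|r − r_ref|² pulling it back toward the reference coordinates (the rigid-body superposition onto the reference is assumed to have been done already). The force constant and the coordinates are illustrative assumptions.

```python
# Harmonic positional restraints: E = sum_i k * |r_i - r_ref,i|^2,
# with the force F_i = -2k (r_i - r_ref,i) pushing atoms back.

def restraint_energy_forces(coords, ref_coords, k):
    energy = 0.0
    forces = []
    for (x, y, z), (rx, ry, rz) in zip(coords, ref_coords):
        dx, dy, dz = x - rx, y - ry, z - rz
        energy += k * (dx * dx + dy * dy + dz * dz)
        forces.append((-2.0 * k * dx, -2.0 * k * dy, -2.0 * k * dz))
    return energy, forces

ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]      # reference positions
moved = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0)]    # current, slightly displaced
e, f = restraint_energy_forces(moved, ref, k=10.0)
```

In a simulation this energy term is simply added to the force field total, and its forces to the force accumulation.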

In addition to restraints, some packages traditionally offer the possibility to constrain parts of the system in space by keeping their coordinates absolutely fixed. However, this is rarely used nowadays.

Steered MD

One popular method to force the protein to move towards an interesting area of phase space is steered molecular dynamics (SMD). Here an artificial biasing potential is applied along the assumed reaction path, called the reaction coordinate.

Typical applications involve shortening the distance between non-covalently bound atoms, like dragging a ligand into a binding pocket by bringing atoms of ligand and protein to known distances, or changing backbone angles to enforce a conformational change[91-93]. The inflicted distortions are fairly small and easy to handle.

It can be visualized as a spring hooked up to the atom of interest. The anchor point of the spring is then moved along the reaction coordinate, and the spring constant determines the reaction of the atom. The force acting on the atom therefore has a roughly sigmoidal time dependence: the spring expands until the force detaining the atom becomes smaller than the force of the spring, and the atom starts following it. The extension of the spring allows one to compute the energy necessary to distort the protein in the desired fashion and is in this case the easiest way to calculate the potential of mean force (PMF)[93], see below.

The two major realizations found in the literature are 'constant force' and 'constant velocity' SMD. While the latter simply moves the anchor through space at a predefined constant speed, dragging the protein along, the anchor velocity of the former is computed such that the force exerted by the spring is not larger than a given value[94,95].
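A toy constant-velocity SMD run illustrates the spring-loading behavior described above: the anchor moves at fixed speed, and the spring force grows until the atom follows at the anchor velocity, after which the force plateaus. A single overdamped particle (velocity proportional to force, mobility absorbed into the time step) stands in for the protein; spring constant, speed and time step are assumptions.

```python
# Constant-velocity SMD sketch on an overdamped particle.

def smd_constant_velocity(k_spring, v_anchor, dt, steps):
    x, anchor, forces = 0.0, 0.0, []
    for _ in range(steps):
        anchor += v_anchor * dt                # anchor moves at constant speed
        f_spring = k_spring * (anchor - x)     # spring pulls atom toward anchor
        x += f_spring * dt                     # overdamped response, mobility = 1
        forces.append(f_spring)
    return forces

forces = smd_constant_velocity(k_spring=2.0, v_anchor=0.5, dt=0.001, steps=10000)
# early forces are tiny (spring still loading); late forces plateau near
# v_anchor / mobility once the atom follows the anchor
```

The integral of this force along the anchor path is the work input, which feeds into the PMF estimates discussed below.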

Targeted MD

Another method to enforce the desired reaction is targeted MD (TMD)[96], in which a constraint is employed to reduce the root mean square deviation (RMSD) from the target by a preset value at each MD step. The TMD method is computationally attractive, since the calculation of the constraint is analytical and scales linearly in time and memory with the number of atoms involved. In effect, the small overhead introduced by the constraint makes the cost of a step in the TMD method nearly identical to that of the unbiased MD method. Moreover, it is a finite-temperature method in which explicit solvent can be included, in contrast to some other methods that find minimum energy paths[97,98]. TMD also generates the transition in a single trajectory, whereas some other methods use more than one[99-102], and TMD permits application of the biasing force to only part of the system, while observing the unbiased response of other parts of interest. In applications of the TMD method some problems have been reported, however: it has been shown that the TMD method does not yield reversible pathways; the sequence of events in the forward and reverse simulation may differ[102]. The problem of reversibility was also briefly discussed in the original TMD paper[96]. It has been suggested that the method may introduce artificial concertedness along the pathway[103], although no evidence for this was given. Since in TMD the motion is determined not only by the constraint but also by the unperturbed forces and velocities, such concertedness does not necessarily occur.
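The RMSD-schedule idea can be illustrated schematically. In place of the analytical Lagrange-multiplier constraint of the published method, the sketch below simply rescales each deviation from the target so the RMSD matches the preset per-step value; this is a stand-in for illustration only, and the coordinates and schedule are assumptions.

```python
import math

# Targeted-MD sketch: force the RMSD to the target down a preset schedule.

def rmsd(a, b):
    n = len(a)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / n)

def tmd_constrain(coords, target, rmsd_goal):
    r = rmsd(coords, target)
    if r <= rmsd_goal or r == 0.0:
        return coords
    f = rmsd_goal / r            # shrink every deviation by the same factor
    return [t + f * (c - t) for c, t in zip(coords, target)]

target = [0.0, 0.0, 0.0, 0.0]
coords = [2.0, -2.0, 2.0, -2.0]          # initial rmsd = 2.0
schedule = [1.5, 1.0, 0.5, 0.0]          # preset per-step rmsd values
for goal in schedule:
    # an unbiased MD step would run here; then the constraint corrects it
    coords = tmd_constrain(coords, target, goal)
```

In real TMD the correction is applied after each unbiased MD step, so thermal motion and the constraint together shape the pathway.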

The potential of mean force and the Jarzynski equation

The goal of most biomolecular simulations is to identify a structure-function relationship.

Usually the reaction path across phase space is identified, e.g. as the path from one known configuration to the next, and it is described by the reaction coordinate. This can be used to design an artificial biasing potential applied to the simulation, usually in the form of SMD or TMD. The potential of mean force (PMF) plays an important role in this process. It is defined as the mean potential energy between two fixed sets of atoms, averaged over all possible configurations of the other atoms[9,104], and can be seen as the energy profile along the reaction coordinate. However, a biased MD simulation is a non-equilibrium process, since it is not a reversible reaction in which microstates are in equilibrium, whereas the PMF is an equilibrium property.

A theory that bridges this gap is Jarzynski's equality[105-110]. It provides a numerically exact relation between free energy differences between states of the system and the work done on the system through non-equilibrium processes. The Jarzynski equation is well suited for the calculation of the PMF from computer experiments, and has

Figure 3.6: (a) A schematic representation of umbrella sampling along a reaction coordinate: specific snapshots from the reaction path (circles) are rerun using harmonic potentials restraining the simulation. (b) Transition path sampling starts two simulations at the transient state or complex and runs them forwards to the product state with Δt > 0 and backwards to the educt state with Δt < 0. (c) Self-guided Langevin dynamics uses the already sampled areas of phase space and directs the simulation away from the known area.

been successfully applied[111-115]. Its use in SMD has been explained in great detail in[110], and most MD packages for biomolecules are capable of performing PMF calculations.
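The equality states ΔF = −kT·ln⟨exp(−W/kT)⟩, the exponential average over the work values of repeated non-equilibrium pulls. The estimator can be sketched and checked against the Gaussian-work limit, for which the exact answer ΔF = ⟨W⟩ − σ²/(2kT) is known; the Gaussian work distribution and all numbers are assumptions for this self-test.

```python
import math, random

# Jarzynski estimator: dF = -kT * ln< exp(-W/kT) >, in a shifted form that
# avoids overflow for large work values.

def jarzynski_free_energy(work_values, kt):
    w_min = min(work_values)
    avg = sum(math.exp(-(w - w_min) / kt) for w in work_values) / len(work_values)
    return w_min - kt * math.log(avg)

rng = random.Random(3)
kt, mean_w, sigma_w = 1.0, 5.0, 1.0
work = [rng.gauss(mean_w, sigma_w) for _ in range(200000)]   # simulated pulls

df = jarzynski_free_energy(work, kt)
df_exact = mean_w - sigma_w ** 2 / (2.0 * kt)   # Gaussian-work limit
```

Note that the exponential average is dominated by rare low-work trajectories, which is why many repeats (or small dissipation) are needed in practice.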

A significant flaw remains though: the PMF is defined as an average over all possible configurations of solute and solvent that are not part of the reaction coordinate, and this obviously cannot be computed. Necessarily, the number of sampled configurations is much smaller than 'all' and, therefore, the computed number is smaller than the actual PMF, approaching it asymptotically with an increasing number of unique configurations.

A direct and computationally intensive approach is to perform the same production run several times. MD is numerically unstable enough that the chaotic behavior of the water has a unique influence on the protein early on. Snapshots from an unbiased simulation can be used as starting points for the different simulations. In the end, equivalent configurations can be extracted from the individual runs and used for the PMF calculation.

Umbrella sampling

Umbrella sampling[116] is a different approach to sampling the reaction path across an energy barrier. MD simulations rarely sample regions of space with higher energy values, closer to the transition state. Umbrella sampling therefore biases the sampling towards a specific region of space. In Monte Carlo simulations the standard Boltzmann weighting is replaced by a potential that levels out the energy barrier. For protein MD, the simulation is restrained with respect to a known structure. By performing a series of simulations with biasing potentials located at different positions along the reaction path, the PMF profile can be derived.
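A one-dimensional toy version makes the idea concrete: Metropolis Monte Carlo on a double-well energy, with a harmonic umbrella potential centred at successive windows along the reaction coordinate, so that even the barrier top is sampled. Recombining the windows into a PMF (e.g. via WHAM) is omitted; the energy surface, window positions and all parameters are assumptions.

```python
import math, random

def double_well(x):
    return 5.0 * (x * x - 1.0) ** 2      # minima at x = +-1, barrier at x = 0

def sample_window(center, k_bias, kt, steps, rng):
    # Metropolis MC on the biased energy E(x) + 0.5*k*(x - center)^2
    biased = lambda x: double_well(x) + 0.5 * k_bias * (x - center) ** 2
    x, samples = center, []
    for _ in range(steps):
        trial = x + rng.uniform(-0.2, 0.2)
        if rng.random() < math.exp(-(biased(trial) - biased(x)) / kt):
            x = trial
        samples.append(x)
    return samples

rng = random.Random(11)
windows = [-1.0, -0.5, 0.0, 0.5, 1.0]    # positions along the reaction path
all_samples = []
for c in windows:
    all_samples += sample_window(c, k_bias=20.0, kt=0.6, steps=5000, rng=rng)

# thanks to the window at 0, the barrier-top region is actually visited
barrier_hits = sum(1 for x in all_samples if abs(x) < 0.1)
```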

Sufficient knowledge about the reaction path must be available prior to the simulations.

Umbrella sampling can also be used to further refine an already existing reaction path.

Transition Path Sampling

'Throwing Ropes Over Rough Mountain Passes, in the Dark' is the title of one of the reference papers for this method[117-119]. It is used when little is known about the reaction path, but the transition state can be specified. The main idea is to start the simulation at the peak and have it find the way towards the product state, and also to start at the peak and simulate back towards the educt state by using a negative Δt. The method has been shown to find the correct reaction path across the transition state. It is only implemented in the CHARMM MD package, since it is the only one capable of using negative time steps. TPS has been successfully applied to protein simulation; one notable application is the folding experiments on the Trp-cage protein. Its energy landscape is very rough, with many peaks and valleys: an unbiased simulation would run directly into the next local minimum and stay there, and it would need to be restarted from all the peaks along the reaction path, which is simply not feasible, especially since the peaks would have to be determined prior to the simulation.

Self-guided Langevin Dynamics

Self-guided molecular or Langevin dynamics uses a local average over neighboring or
