
2.5 Calculation of Protonation Probabilities

2.5.5 Metropolis Monte Carlo

Monte Carlo methods comprise any method that uses statistical sampling. Their probabilistic nature is reflected by naming the method after the casinos of Monte Carlo. The potential of computers for statistical sampling in mathematical physics was first recognized by Stanislaw Ulam. Together with John von Neumann and Nicholas Metropolis, he developed a formal methodology applicable to a wide variety of problems [132]. The Metropolis Monte Carlo method employed in this thesis provides an approximation of the Boltzmann distribution of the states [133]. A flowchart of the sampling procedure, which is summarized in the following, is presented in Figure 2.7.

THE METROPOLIS CRITERION

The input of the algorithm is a randomly chosen protonation state with the protonation state energy $G_\mathrm{old}$. Then a random site $\mu$ is chosen and its protonation form is changed. The state energy of the new state, $G_\mathrm{new}$, is determined and the change in energy between the old and the new state is calculated: $\Delta G = G_\mathrm{new} - G_\mathrm{old}$.

[Figure 2.7: flowchart. Initial state → choose random site $\mu$ → change protonation of site $\mu$; $\Delta G = G_\mathrm{new} - G_\mathrm{old}$ → $\Delta G \le 0$? YES: accept new state. NO: choose random $r$ with $0 \le r \le 1$ → $e^{-\Delta G / k_\mathrm{B} T} \ge r$? YES: accept new state. NO: keep old state.]

Figure 2.7. Metropolis Monte Carlo. From an initial random protonation state a random site is chosen and its protonation is changed. The new state is accepted if the energy difference between old and new state is smaller than zero. If this is not the case, the state is still accepted if the Boltzmann factor of the change in energy is larger than a number randomly chosen between 0 and 1. The Metropolis criterion is iteratively applied until either a certain number of steps is performed or convergence of the sampled properties is achieved.

The new state is accepted if $\Delta G \le 0$. If $\Delta G > 0$, a random number $r$ is drawn from a uniform distribution on the interval $0 \le r \le 1$. The new state is accepted if $\exp(-\Delta G / k_\mathrm{B} T) \ge r$. If $\exp(-\Delta G / k_\mathrm{B} T) < r$, the new state is rejected and the system remains in the old state. This Metropolis criterion ensures that the sampling of the phase space is biased towards the Boltzmann distribution, i.e., the important region of the phase space.
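The acceptance rule and the sampling loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in this work: the function names, the `energy` callable (which stands in for the protonation state energy defined earlier in the chapter), and the convention that energies are given in units of $k_\mathrm{B} T$ by default are assumptions made for the example.

```python
import math
import random


def metropolis_accept(delta_g, kbt=1.0, rng=random):
    """Metropolis criterion: return True if the move is accepted.

    delta_g is G_new - G_old; kbt is k_B*T in the same energy units
    (assumed here to default to 1.0, i.e. energies in units of k_B*T).
    """
    if delta_g <= 0.0:
        return True
    r = rng.random()  # uniformly distributed random number in [0, 1)
    return math.exp(-delta_g / kbt) >= r


def metropolis_sample(energy, n_sites, n_steps, kbt=1.0, seed=0):
    """Sample protonation states (tuples of 0/1) with single-site moves.

    `energy` is a hypothetical user-supplied function that maps a
    protonation state (list of 0/1) to its state energy.
    """
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n_sites)]  # random initial state
    g_old = energy(state)
    samples = []
    for _ in range(n_steps):
        mu = rng.randrange(n_sites)  # choose a random site
        state[mu] ^= 1               # change its protonation form
        g_new = energy(state)
        if metropolis_accept(g_new - g_old, kbt, rng):
            g_old = g_new            # accept the new state
        else:
            state[mu] ^= 1           # reject: restore the old state
        samples.append(tuple(state))
    return samples
```

Note that a rejected move still contributes a sample: the old state is counted again, which is what makes the sampled frequencies approach the Boltzmann weights.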

DOUBLE AND TRIPLE MONTE CARLO MOVES

Metropolis Monte Carlo can approximate the Boltzmann distribution of states only if the algorithm can, in principle, reach the complete search space. For some systems, regions of the search space may be separated by a high energy barrier that can only be overcome with a very small probability. The system may then never, or only too seldom, cross from one region to the other during the Monte Carlo sampling. In such cases, the Boltzmann distribution is not approximated. To prevent this problem in the calculation of protonation probabilities, convergence is ensured by introducing double or triple Monte Carlo moves. In such moves, the protonation form of two or three sites, respectively, is changed simultaneously in one Monte Carlo step. Double moves are introduced when a pair of sites has an interaction energy $W_{\mu\nu}$ above a certain threshold. Analogously, triple moves can be introduced when the interaction energies of site $\mu$ with two other sites $\nu$ and $\gamma$ both lie above a certain threshold.

Consider a pair of sites $\mu$ and $\nu$ for which the states with one proton, i.e., $\vec{x}_a = (0_\mu, 1_\nu)$ and $\vec{x}_b = (1_\mu, 0_\nu)$, have a low energy. Furthermore, assume that the intermediate states of the transition from $\vec{x}_a$ to $\vec{x}_b$ or vice versa, i.e., $\vec{x}_c = (1_\mu, 1_\nu)$ and $\vec{x}_d = (0_\mu, 0_\nu)$, have a very high energy. Then, starting the Metropolis Monte Carlo sampling from either $\vec{x}_a$ or $\vec{x}_b$ would render the sampling of the other state highly improbable. This sampling problem can be overcome by changing the protonation form of both sites $\mu$ and $\nu$ simultaneously in a so-called double move.
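The two-site example above can be reproduced with a small Python sketch. It is an illustration under stated assumptions rather than the procedure used in this work: the function names, the toy energy table (in units of $k_\mathrm{B} T$), the threshold value, and the choice to pick uniformly among all single and double moves are hypothetical. Because every move is its own inverse and is proposed with the same probability in both directions, the Metropolis criterion can be applied unchanged.

```python
import math
import random


def coupled_pairs(w, threshold):
    """Return site pairs whose interaction energy lies above the threshold."""
    n = len(w)
    return [(m, v) for m in range(n) for v in range(m + 1, n)
            if w[m][v] > threshold]


def sample_with_double_moves(energy, start, pairs, n_steps, kbt=1.0, seed=0):
    """Metropolis sampling with single moves plus double moves for `pairs`."""
    rng = random.Random(seed)
    n_sites = len(start)
    # Move set: every single site, plus every strongly coupled pair.
    moves = [(m,) for m in range(n_sites)] + [tuple(p) for p in pairs]
    state = list(start)
    g_old = energy(state)
    samples = []
    for _ in range(n_steps):
        move = rng.choice(moves)
        for site in move:          # flip one site (single) or two (double)
            state[site] ^= 1
        g_new = energy(state)
        delta_g = g_new - g_old
        if delta_g <= 0.0 or math.exp(-delta_g / kbt) >= rng.random():
            g_old = g_new          # accept
        else:
            for site in move:      # reject: undo the flips
                state[site] ^= 1
        samples.append(tuple(state))
    return samples


# Toy energies (assumed values, in k_B*T): singly protonated states are
# low in energy, the intermediate states (0,0) and (1,1) are very high.
TOY_ENERGY = {(0, 1): 0.0, (1, 0): 0.0, (0, 0): 50.0, (1, 1): 50.0}


def toy_energy(state):
    return TOY_ENERGY[tuple(state)]


# Hypothetical interaction matrix W for the two sites.
pairs = coupled_pairs([[0.0, 5.0], [5.0, 0.0]], threshold=2.0)
single_only = sample_with_double_moves(toy_energy, (0, 1), [], 3000)
with_double = sample_with_double_moves(toy_energy, (0, 1), pairs, 3000)
```

With single moves only, the run started in $(0_\mu, 1_\nu)$ is expected to stay there, since each single flip is accepted with a probability of about $e^{-50}$. Once the double move is included, both low-energy states should be visited with roughly equal frequency.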

DERIVATION OF PROBABILITIES FROM METROPOLIS MONTE CARLO

The Metropolis Monte Carlo method samples a set of states which approximates the Boltzmann distribution of all states. Thus, the protonation probability as well as the correlation coefficient can be approximated employing this method. Assuming $M$ states are sampled, the protonation probability $\langle x_\mu \rangle$ defined by Eq. (2.24) can approximately be computed by:

\[
\langle x_\mu \rangle = \frac{1}{M} \sum_{i=1}^{M} x_{\mu,i} . \tag{2.32}
\]

Analogously, the joint protonation probability $\langle x_\mu x_\nu \rangle$ of two sites given by Eq. (2.27) is approximated as:

\[
\langle x_\mu x_\nu \rangle = \frac{1}{M} \sum_{i=1}^{M} x_{\mu,i} \, x_{\nu,i} , \tag{2.33}
\]

and the probability of protonation substates given by Eq. (2.31) can be approximately computed by:

\[
\langle \vec{x}_\mathrm{sub} \rangle = \frac{1}{M} \sum_{i=1}^{M} \delta_i(\vec{x}_\mathrm{sub}) . \tag{2.34}
\]

Furthermore, Eq. (2.32) and Eq. (2.33) make it possible to compute the correlation coefficient $c_{\mu\nu}$ of the protonation of two sites $\mu$ and $\nu$ by the formula of Eq. (2.29):

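\[
c_{\mu\nu} = \frac{\mathrm{cov}(x_\mu, x_\nu)}{\sigma_\mu \sigma_\nu}
= \frac{\langle x_\mu x_\nu \rangle - \langle x_\mu \rangle \langle x_\nu \rangle}{\sqrt{\left( \langle x_\mu \rangle - \langle x_\mu \rangle^2 \right) \left( \langle x_\nu \rangle - \langle x_\nu \rangle^2 \right)}} . \tag{2.35}
\]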
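As a hedged illustration, the averages of Eqs. (2.32) to (2.35) can be evaluated over a list of sampled states as sketched below. The function names are hypothetical, each sample is assumed to be a tuple of 0/1 values indexed by site, and $\delta_i$ is assumed to equal 1 when sample $i$ matches the given substate on the selected sites and 0 otherwise.

```python
import math


def protonation_probability(samples, mu):
    """Eq. (2.32): mean protonation of site mu over all M samples."""
    return sum(s[mu] for s in samples) / len(samples)


def pair_probability(samples, mu, nu):
    """Eq. (2.33): mean of the product x_mu * x_nu over all M samples."""
    return sum(s[mu] * s[nu] for s in samples) / len(samples)


def substate_probability(samples, sites, sub):
    """Eq. (2.34): fraction of samples matching `sub` on the given sites."""
    hits = sum(1 for s in samples
               if all(s[k] == x for k, x in zip(sites, sub)))
    return hits / len(samples)


def correlation(samples, mu, nu):
    """Eq. (2.35): correlation coefficient of the protonation of two sites.

    Returns NaN if one of the sites never changes its protonation in the
    samples, because the variance in the denominator is then zero.
    """
    p_mu = protonation_probability(samples, mu)
    p_nu = protonation_probability(samples, nu)
    p_both = pair_probability(samples, mu, nu)
    var_product = (p_mu - p_mu ** 2) * (p_nu - p_nu ** 2)
    if var_product <= 0.0:
        return float("nan")
    return (p_both - p_mu * p_nu) / math.sqrt(var_product)


# Four hand-written samples of two perfectly anticorrelated sites.
demo = [(0, 1), (1, 0), (0, 1), (1, 0)]
print(protonation_probability(demo, 0))       # 0.5
print(pair_probability(demo, 0, 1))           # 0.0
print(substate_probability(demo, (0, 1), (0, 1)))  # 0.5
print(correlation(demo, 0, 1))                # -1.0
```

The variance term $\langle x_\mu \rangle - \langle x_\mu \rangle^2$ relies on $x_\mu$ taking only the values 0 and 1, so that $x_\mu^2 = x_\mu$.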

In this work, probabilities are computed with the Metropolis Monte Carlo method by evaluating the equations given above. The parameters used are detailed in the next chapter.

CHAPTER 3

HIGH-RESOLUTION RHODOPSIN PROTEIN STRUCTURES AND STRUCTURE PREPARATION

Could the search for ultimate truth really have revealed so hideous and visceral looking an object?

Max Perutz, The Hemoglobin Molecule

The dynamics of a protein and, therefore, its function, e.g., ligand binding, substrate catalysis or the transmission of signals, is determined by the protein structure. An important step towards understanding the functional mechanism of a protein is, thus, the determination of its three-dimensional structure in atomic detail. Due to the complexity and diversity of proteins, this remains one of the fundamental challenges in protein science, both in experimental and in theoretical approaches. The main goal is to find the structure of lowest energy, which may then provide insight into the functional mechanism of the protein. Furthermore, the protein folding mechanism, which is as yet only partly understood, is extensively researched. A theoretical understanding of protein folding may significantly advance further attempts to predict protein structures from amino acid sequences.

In the following sections, theoretical and experimental protein structure determination will be briefly discussed. Thereafter, details of the calculations performed on the experimental X-ray structures of bacteriorhodopsin (BR), halorhodopsin (HR) and sensory rhodopsin II (SRII) are presented. The necessary preparations of the structures used for the calculations will be described in Subsection 3.2.3. Section 3.5 covers the electrostatic calculation of the $\mathrm{p}K_\mathrm{intr}$ value and the interaction energy matrix $W_{\mu\nu}$ (cf. Chapter 2, Subsections 2.4.1 and 2.4.2).


3.1 PROTEIN STRUCTURE PREDICTION

Because experimental protein structure determination is difficult and time-consuming, major efforts are made to develop and improve theoretical techniques for protein structure prediction. Attempts at de novo protein structure prediction from the amino acid sequence have met with varying success. This task requires large computational resources such as those offered by powerful supercomputers or by distributed computing (i.e., grid computing) projects.

The world’s largest distributed computing project is Folding@home, devised by Vijay Pande at Stanford University. In 2009, Folding@home officially reached a performance level above 5 native petaFLOPS, being the first computing system to do so. Folding@home aims at understanding the dynamics of the protein folding process [134, 135]. Initially, the folding of small molecules was studied, for example the folding pathway of the C-terminal β-hairpin from protein G [136]. Since then, more complex proteins have been simulated: for example, the effect of mutations on the tumor suppressor protein p53 [137]. Another distributed computing project is Rosetta@home, which was developed by David Baker’s group at the University of Washington [138]. The main goal of Rosetta@home is the prediction of the lowest energy structure of proteins. Rosetta, a knowledge-based force field, is one of the top performing methods for de novo prediction in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment. One of the hallmarks was the design of a protein called Top7 with a novel sequence and topology [139]. Top7 was experimentally shown to be stable, and the X-ray structure closely resembles the design model.

Homology modeling, also termed comparative or knowledge-based modeling, develops a three-dimensional model from a protein sequence based on the structures of homologous proteins [140, 141]. This approach is based on the observation that the three-dimensional structures of homologous proteins are conserved to a greater extent than their primary structures. Furthermore, it appears that the number of tertiary structural motifs is limited. It has been suggested that there are only approximately 2000 distinct protein folds in Nature, though there are many millions of different proteins. However, despite the encouraging developments in protein structure prediction, to date high-quality, high-resolution structures suitable for theoretical investigations of the functional mechanism are still mainly determined by experimental methods.