• Keine Ergebnisse gefunden

3.5 Comparison to Previously Used Energy Functions

3.5.1 Calculations based on Experimental Data

In this work, a state vector ~xi,n (eq. 3.4) was introduced, which allows a varying number of instances k≡xji,n for each site j. Many binding events, however, can be described by two forms, a bound form and an unbound form only. Previous works used a binary state vector, so thatk= 0in the unbound form andk= 1in the bound form. For a protein withNsite,ibinding sites and two forms each, the total number of microstates isNmicro,i= 2Nsite,i. The contribution of a site to the microstate energy was described as an energy difference between bound and unbound form instead of energies associated with each individual instance. Instead of the general description by chemical potentials and energies, specific formulations were used as pHandpKaorEandE [121, 122]. In fact, the program used before (Multiflex), is only thought to be used forpKa calculations. Standard reduction potential calculations have to be done in pKa-units. The user has to keep track of the different ligand types and recalculate energies for subsequent titration calculations. Therefore, onlypKa calculations will be discussed here.

Multiflex allows only relative pKa calculations (relative to a model compound, section 3.4) in a two dielectric environment (protein and solvent). Instead of the intrinsic energy Gintr,i(jk) (eq. 3.37) an intrinsicpKa value is calculated:

pKa,intr,i(j) = pKa,model(j)− 1 RTln 10

∆∆GBorn,i(j) + ∆∆Gback,i(j)

(3.44) The pKa value describes the deprotonation equilibrium of an acid (section 2.1.4) at site j. The double differences in Born and background energy, describes first the difference between the model compound environment and the protein environment in the heterogeneous transfer (section 3.2.6) and second the difference between the protonated j and deprotonated j charge form.

∆∆GBorn,i(j) = ∆GBorn,heterotrans,i(j)−∆GBorn,heterotrans,i(j) (3.45)

∆∆Gback,i(j) = ∆Gback,heterotrans,i(j)−∆Gback,heterotrans,i(j) (3.46) The modelpKa value,pKa,model(j), is the experimentalpKa value of a model compound of site j in aqueous solution. The intrinsic pKa value, pKa,intr,i(j) is here directly the pKa value of sitej if all other sites are in their reference state.

Microstate energies are calculated relative to a reference state, which is chosen to be the neutral protonation form. The protein is not divided into a background charge set and the interaction energy of instances of sites, but the background energy includes all interactions of sitej with all other sites in their reference charge form.

O O

Figure 3.10. A carboxylic acid has different tautomers and rotamers in the protonated form.

The interaction energy∆∆Ginter,i(j, l)between sitesj andl can be written as double difference of interaction energiesGinter,i(jk, lm)(eq. 3.31):

The microstate energy can be written (using equations from section 2.1.4) as:

Gmicro(~xi,n,pH) =Gconf,i+

As in eq. 3.7,Gconf,iis the conformational energy. The first sum runs over all sites to calculate the difference of intrinsicpKavalue (intrinsic energy) andpH(chemical potential of protons in the bulk solvent) and the second sum adds the interaction energy with all other sites. Sites only contribute to the microstate energy, if they are in a different protonation form than in the reference state and interaction energy is only added if both sites are not in their reference form. The sign of the contribution is positive if the site in the reference form (xji,ref andxli,ref, respectively) is deprotonated and the site in the form of interest is deprotonated, which is the case for bases. Acids have a protonated reference form and the contribution is negative.

In the generalized formulation this complexity arising from the filter function ([xji,n−xji,ref] and [xli,n−xli,ref], respectively) is avoided, because the energies are not calculated relative to a physical reference state. Instead an artificial state is chosen as reference state, where all atoms belonging to instances of sites are uncharged.

Treatment of Carboxyl-, Amine-, Hydroxyl- and Thiol-Groups

In the last two decades, the approach described above was successfully used for many appli-cations [13, 35–44], even though the calculation ofpKavalues with a single protonated form is strictly only valid for sites with no tautomers or rotamers. Since no amino acid strictly fulfills this requirement for proton binding (Fig. 3.10 and Fig. 3.11), a single average charge distri-bution (set of partial charges) describes the bound form and another the unbound form of the site. The carboxyl-group of acidic amino acid sidechains (i.e.,aspartate and glutamate) and

N

N

N

N

N

N +

H H

H

H +

+

H

H

−tautomer

−tautomer δ

ε

Figure 3.11. Histidine has two tautomers in the singly protonated form (protonated onN δ1 orN 2withpKavalues of 7.0 and 6.6 in solution, respectively). The doubly protonated form is shown as well. The doubly deprotonated form has apKavalue of about 14.0 in solution and can therefore be excluded for most calculations.

the C-terminus can be described by a single protonated form, where the proton has a position between the two oxygens (as the first tautomer in Fig. 3.10). The average charge distribution avoids the problem of the two tautomers and the rotameric forms, where the protonated acid is a hydrogen bond donor. Also the basic amino acids, i.e., lysine and the N-terminus can be described by a amine-group with or without positive charge and arginine can be described by a charged or uncharged guanidinium group. For hydroxyl- and thiol-groups (of cystein, tyrosine, serine and threonine), however, this method is problematic, since the calculatedpKa values of neighboring titrateable groups depends strongly on the presence of hydrogen bonds.

The orientation of the hydrogen atom has to be assigned before thepKa calculations introduc-ing a bias if different hydrogen bond networks are possible [45]. Certainly, different hydrogen bonds would also affect, the pKa value of the hydroxyl- and thiol-groups, but usually these groups were not considered as titrateable due to their high pKa value. However, to study proton-transfer (section 5.6) this approximation is not sustainable.

Treatment of Histidine

The amino acid, for which the shortcomings of the approach using a binary state vector be-comes most obvious is histidine (Fig. 3.11). The imidazole ring of histidine has two tautomers in the singly protonated form, which have the proton located on one or the other side of the imidazole ring. Therefore, the hydrogen bonding pattern is clearly different and also the tautomers have differentpKa values. The doubly deprotonated form has a pKa value, which is too high to be populated under physiological conditions. The solution of Bashford and

coworkers [123] treats the tautomers as conformers, i.e., performing a continuum electro-static calculations of the complete protein for the δ- and -tautomer separately. Once the protonation of the δ-tautomer at N 2 is permitted to yield the doubly protonated form and once the protonation and of the-tautomer atN δ1is permitted to yield the doubly protonated form. Postprocessing of the calculated energies (by an additional program) is done to obtain a binary state vector with two bits for histidine. The fully deprotonated form, which was never calculated, but exists in the state vector, was assigned an arbitrary high energy to avoid pop-ulation in the titration calcpop-ulations. All interaction energies with histidine have to be modified to be correct for the modified physical model. For N histidines 2 +N Multiflex calculations (titrating all residues) are necessary. The results are correct in the physiological pH range, but the methodology is computational expensive for many histidines and non-intuitive. It is not transferable with acceptable effort to other sites like quinones (section 5.5).

Treatment of Rotamers

The insufficiency of the approach doing calculations with a binary state vector became even more evident, when You and Bashford included rotamer forms into the pKa value calcula-tions [51]. Effectively, the You and Bashford treatment of rotamers, calculates a thermody-namic average pKa,intr,i for a site, when all other sites are in their reference form. It would be correct to include the rotamer forms equivalently to the charge forms as independent mi-crostates. This would allow to adopt the rotamer population to the charge forms of surround-ing sites. However, such a treatment is not possible with a binary state vector and requires a more general physical model, as it is described in this work.

For inclusion of rotamers intopKacalculations, eq. 3.44 is extended by You and Bashford [51]

(β =RT1 ):

The protonatedjand unprotonatedjform of sitejin rotamerrare calculated in the protein environment jprotein and model environment jmodel. The free energy ∆Genv,i is given as the sum of Born∆GBorn,i, background∆Gback,iand a non-electrostatic, charge form independent energy∆Grotamer:

∆Genv,i= ∆GBorn,i+ ∆Gback,i+ ∆Grotamer (3.50)

The Born and background energies are calculated as homogeneous transfer energies (sec-tion 3.2.5):

∆GBorn,i(jr,protein) = ∆GBorn,homotrans,i(jr,protein ) =GBorn,protein,i(jr)−GBorn,homo,i(jr)

∆GBorn,i(jr,protein) = ∆GBorn,homotrans,i(jr,protein ) =GBorn,protein,i(jr)−GBorn,homo,i(jr)

∆GBorn,i(jr,model ) = ∆GBorn,homotrans,i(jr,model ) =GBorn,model,i(jr)−GBorn,homo,i(jr)

∆GBorn,i(jr,model ) = ∆GBorn,homotrans,i(jr,model ) =GBorn,model,i(jr)−GBorn,homo,i(jr)

∆Gback,i(jr,protein ) =Gback,homotrans,i(jr,protein ) =Gback,protein,i(jr)

∆Gback,i(jr,protein ) =Gback,homotrans,i(jr,protein ) =Gback,protein,i(jr)

∆Gback,i(jr,model ) =Gback,homotrans,i(jr,model ) =Gback,model,i(jr)

∆Gback,i(jr,model ) =Gback,homotrans,i(jr,model ) =Gback,model,i(jr)

The Born energy needs the additional subtraction of the potential of the site in a homogeneous dielectric ϕhomo(~ra,ii(jr)) with dielectric constant of the interior of the site. This term is needed to remove grid artefacts, which were canceled in the single-rotamer calculation by the difference of protein and model compound energy.

Additionally to the terms in eq. 3.50, one might need to consider an additional term∆Gselfback,i, which is the homogeneous transfer energy of the background charges of the protein

ρi,j,r,protein =

Ni,j,r,protein

X

a

Qa,i(jr,protein) (3.51)

or model compound

ρi,j,r,model =

Ni,j,r,model

X

a

Qa,i(jr,model), (3.52)

which changes due to the changing boundary with rotamerrof sitej. If the boundary changes significantly, one might even need to include the surface area dependent non-electrostatic part of the solvation energy.

∆Gselfback,iis computed as:

∆Gselfback,i(jr,protein ) = ∆Gselfback,i(jr,protein ) (3.53)

=1 2

Ni,j,r,protein

X

a

Qa,i(jr,protein)[ ϕprotein(~ra,ii,j,r,protein)

−ϕhomo(~ra,ii,j,r,protein)]

∆Gselfback,i(jr,model ) = ∆Gselfback,i(jr,model ) (3.54)

= 1 2

Ni,j,r,model

X

a

Qa,i(jr,model)[ ϕmodel(~ra,ii,j,r,model)

−ϕhomo(~ra,ii,j,r,model)]

The electrostatic potentials have to be calculated on a grid, which has to be larger than the protein, but should be as fine as the grid of the finest focussing step of the site. Practical calculations on such a grid are not feasible due to memory requirements and computational cost. For small changes of the dielectric boundary, this term is small, that I excluded it from my calculation scheme assuming a combined dielectric boundary. In fact, in many cases, the groups are buried and the dielectric boundary is unchanged. In cases, where the change is significant, conformers can be used.

Despite the problematic inclusion of the rotamer energy into the intrinsic pKa value, the calculation scheme including a molecular mechanics energy∆Grotamer and a thermodynamic cycle using homogeneous transfers is very similar to the approach I describe in this work.

However, my distinction between reference and non-reference rotamers avoids computation ofGBorn,model,i(jr)andGback,model,i(jr)for each non-reference rotamer in each charge form. For calculations with many rotamer forms and charge formspersite this optimization can lead to a significant saving in processor time.

Other Works to Include Protein Flexibility into Electrostatic Calculations

While methods for calculating protonation probabilities in proteins by continuum electrostatic methods exist in a number of groups (Honig, Gunner, McCammon, Bashford, Wade, Vriend, Knapp, Ullmann and other), there are relatively few attempts to include protein flexibility.

Nielsenet al.[46] developed a method to calculatepKa values combiningWhatif[107],DelPhi [28] and the package of Yang et al. [27]. Hydrogen atoms were placed and optimized by different protocols and His, Asn and Gln were allowed to “flip”, i.e., rotate by 180 around theχ2,χ2 andχ3torsion angle, respectively. A threshold-accepting algorithm combined with tree search methods to optimize the global hydrogen bond network [108] was found to be the preferable hydrogen placement method. The hydrogen placement problem was treated as precursory step independent of the outcome of the pKa calculations. In another work [47]

the authors used two structures (for strongly interacting groups four structures) per site with globally optimized hydrogen bond networks to compute Born and background energies.

Therefore, the hydrogen bond network was always optimal to stabilize a certain protonation state, however the energy to convert the system from a hydrogen bond network optimal for one protonation state to a hydrogen bond network optimal for another protonation state was not included into the microstate energy.

Maybe the most advanced method, multiconformational continuum electrostatics (MCCE), was developed in the group of Marilyn Gunner [32, 33]. Their energy function does not only include electrostatic terms, i.e., Born, background and interaction energy, but also a non-electrostatic contribution to the intrinsic energy and interaction energy and an entropy cor-rection. Their method allows to include hydrogen rotamers and tautomers. Singly-protonated histidine is described by two tautomers with the same modelpKa value [32]. The method was

A − + H +

A − + H +

∆ G

protein,deprot,i

G

vac,deprot,i

(AH)

∆ G

solv,i

∆ G

solv,i

(A ) − ∆ G

solv,i

(H ) +

AH

vacuum

AH

protein

Figure 3.12. Thermodynamic cycle to calculate absolutepKa values. The free energy of dissociation of a proton from an acid in the protein (∆Gprotein,deprot,i) is calculated from the free energy of dissociation in vacuum (∆Gvac,deprot) and the solvation energy of the reactants and products. (Adapted from [126])

extended to include sidechain rotamers [33] and reduction potential calculations [124]. Most likely, from the description in Zhu et al. [124] and the fact that histidine is modeled with a single pKa value for both tautomers, more than a single reduction potential orpKa valueper site is not possible. The physical basis of the method is, however, not described in sufficient detail. Also the source code of their program is not available to be analyzed. Therefore, fur-ther speculations are avoided which would be necessary for a comparison with the method presented here.

Very recently the group of Ernst-Walter Knapp the programsTAPBS[125] andKarlsberg2.0 [34]

were written, which seem to be similar toQMPBandGMCT, respectively.TAPBSbuilds onAPBS instead of MEAD. Multiple steps of reduction and protonation as well as rotamers and con-formers seem to be possible. However, the approach chosen here to allow arbitrary ligands, not only electrons and protons, is broader and more flexible. Since details about the under-lying physical model are not published at the time of writing, a detailed comparison is not possible as well.