• Keine Ergebnisse gefunden

Computational studies on the dynamical properties of

N/A
N/A
Protected

Academic year: 2022

Aktie "Computational studies on the dynamical properties of"

Copied!
111
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Computational studies on the dynamical properties of

phytochromes

vorgelegt von Msc.

Giovanni Battocchio

von der Fakultät II - Mathematik und Naturwissenschaften der Technischen Universität Berlin

zur Erlangung des akademischen Grades Doktor der Naturwissenschaften

-Dr. rer. nat.-

genehmigte Dissertation

Promotionsausschuss:

Vorsitzender: Prof. Prof. Dr. Roderich Süßmuth Gutachterin: Prof. Dr. Maria Andrea Mroginski Gutachterin: Prof. Dr. Cecilia Clementi

Tag der wissenschaftlichen Aussprache: 08.06.2021

Berlin 2021

(2)
(3)

Computational studies on the dynamical properties of phytochromes

Abstract

Phytochromes are superfamily of bistable photoreceptor molecules found in several organisms, such as plants, bacteria, algae and fungi. They serve as light sensors, mainly detecting red and far red light, governing a range of functions, such as seed germination, greening, stem extension, and flow- ering in plants. Due to their important role in several organism they have been studied for decades, tough their full mechanism of action remains elusive.

In this work we investigated the dynamical properties of phytochromes by means of computational methods. Furthermore we extended the studies to the vibrational properties of the light sensitive chromophore, an open tetrapyrrol bilin, contained inside the phytochromes.

In the first chapter of this work we start with a general description of phytochromes and their struc- ture, including peculiar features such as a knot like structure connecting two domains. We give a brief exposition of differences and varieties of phytochromes found in plants, bacteria and other organism based on their light sensitivity and photocycle behaviour. We discuss the known confor- mational changes that happen during the photocycle, both in the chromophore and in the protein scaffold.

In the second chapter we briefly summarise the theory behind the computational methods used, from molecular dynamic (MD) to quantum mechanic (QM). We outlined the foundations of MD, a computational method widely employed for the studify of large biomolecules such as complex proteins. We give a short exposition on how to include the effect of the environment, with whom the protein interacts during the simulations. Of the many algorithm developed since the introduc- tion of MD method, we list the ones employed in our simulations. We continue then with a section on two methods we employed developed to improve conformational sampling, an issue that affects all MD simulations and has been tackled by many researcher over the years. In the last part we give a description of the employed QM/MM method adopted to evaluate vibrational properties of the chromophore, whose results can then be compared with experimental results.

In the third chapter we include a summary of the computational details of our calculations. Be-

(4)

ginning with how models are prepared, a starting step sometimes overlooked, but that has deep repercussion on all the simulations that follow. We also give a brief exposition of the used softwares and their capabilities, we then conclude with a section on analysis tools. A proper analysis of the re- sults is as or even more important than the simulations itself, so the choice of the adequate analysis method that better correlates trajectories with actual physical properties, is of greatest importance for the final interpretation of the vast amount of gathered data.

The fourth chapter should be the most interesting for the reader, since it contains the highlights of the results obtained during this PhD period. Results are grouped by system and their observed dynamical behaviour, starting with arguably the most studied phytochrome,Deinocccus radiodu- rans(DrBph), followed by Agp2 fromAgrobacterium tumefaciens,Xantomonas campestris(XccBph) phytochromes and lastly GAF-3, a cyanobacteriochrome derived fromSynechocystissp. PCC6803.

Thanks to the the use of enhanced sampling methods, we were able to study the dynamics of these phyotchromes in a wider variety of conformations and relating them to some specific structural features. Example is the effect of simulating the monomer versus the naturally occurring dimer in solution, or the deletion of a linker knot from the protein scaffold, in DrBph. These enhanced meth- ods made possible to sample secondary structural transitions in Agp2, a phenomenon that would otherwise require order of magnitudes longer conventional MD. In addition was possible to assess the stability of different dimer construct on the longer time scale and determine if one conforma- tion is more or less stable than another one in XccBph, a question that has always driven curiosity, since most of naturally occurring phytochromes appears as parallel dimer, but few physiological functional ones adopt an antiparallel configuration. Other computational methods were adopted, including constant pH calculations, giving accurate pKa values predictions for key residues in the chromophore vicinity and their most likely protonation states in Agp2. This is turn influences the correspondent MD sampling, giving a better and more physiologically accurate representation of the dynamic proton interchange between residues happening in the protein. Finally the results of QM/MM calculations on GAF-3, an interesting red-green photo-switch, aimed to reproduce and help interpret experimental spectra and to link spectra to conformations.

(5)

Zusammenfassung

Phytochromen sind eine Superfamilie von bistabilen Photorezeptormolekülen, die in verschiede- nen Organismen wie Pflanzen, Bakterien, Algen und Pilzen vorkommen. Sie dienen als Lichtsen- soren, die hauptsächlich rotes und fernes rotes Licht erkennen und eine Reihe von Funktionen steuern, wie z. B. die Keimung von Samen, das Ergrünen, die Ausdehnung des Stängels und die Blüte bei Pflanzen. Aufgrund ihrer wichtigen Rolle in verschiedenen Organismen werden sie seit Jahrzehnten erforscht, obwohl ihr vollständiger Wirkmechanismus noch immer schwer zu verste- hen ist.

In dieser Arbeit haben wir die dynamischen Eigenschaften von Phytochromen mit Hilfe von Berech- nungsmethoden untersucht. Darüber hinaus haben wir die Untersuchungen auf die Schwingung- seigenschaften des lichtempfindlichen Chromophors, eines offenen Tetrapyrrol-Bilins, das im In- neren der Phytochrome enthalten ist, ausgeweitet.

Im ersten Kapitel dieser Arbeit beginnen wir mit einer allgemeinen Beschreibung der Phytochrome und ihrer Struktur, einschließlich besonderer Merkmale wie einer knotenartigen Struktur, die zwei Domänen verbindet. Wir geben eine kurze Darstellung der Unterschiede und Varietäten von Phy- tochromen, die in Pflanzen, Bakterien und anderen Organismen gefunden werden, basierend auf ihrer Lichtempfindlichkeit und ihrem Verhalten im Photozyklus. Des Weiteren diskutieren wir die bekannten Konformationsänderungen, die während des Photozyklus stattfinden, sowohl im Chro- mophor als auch im Proteingerüst.

Im zweiten Kapitel fassen wir kurz die Theorie hinter den verwendeten Berechnungsmethoden zusammen, von der Molekulardynamik (MD) bis zur Quantenmechanik (QM). Unter anderem skizzieren wir die Grundlagen der MD, einer Berechnungsmethode, die häufig für die Untersuchung großer Biomoleküle wie komplexer Proteine eingesetzt wird und geben eine kurze Darstellung, wie man den Effekt der Umgebung, mit der das Protein während der Simulationen interagiert, ein- bezieht. Darüber hinaus listen wir von den vielen Algorithmen, die seit der Einführung der MD- Methode entwickelt wurden, die in unseren Simulationen verwendeten auf. Im weiteren Verlauf dieses Kapitels fahren wir mit einem Abschnitt über zwei von uns verwendete Methoden fort, die entwickelt wurden, um das Konformations-Sampling zu verbessern, ein Problem, das alle MD-

(6)

Simulationen betrifft und im Laufe der Jahre von vielen Forschern angegangen wurde. Im let- zten Teil beschreiben wir die eingesetzte QM/MM-Methode, die zur Auswertung der Schwingung- seigenschaften des Chromophors verwendet wurde, deren Ergebnisse dann mit experimentellen Ergebnissen verglichen werden können.

Im dritten Kapitel fassen wir die rechnerischen Details unserer Berechnungen zusammen. Wir be- ginnen damit, wie die Modelle vorbereitet werden. Dieser Anfangsschritt wird manchmal überse- hen wird, hat aber tiefgreifende Auswirkungen auf alle folgenden Simulationen. Hierbei geben wir auch eine kurze Darstellung der verwendeten Software und ihrer Möglichkeiten und schließen dann mit einem Abschnitt über Analysewerkzeuge. Die Wahl der geeigneten Analysemethode, die die Trajektorien besser mit den tatsächlichen physikalischen Eigenschaften korreliert, ist daher von größter Bedeutung für die endgültige Interpretation der riesigen Menge an gesammelten Daten.

Das vierte Kapitel dürfte für den Leser das interessanteste sein, da es die Höhepunkte der während dieser Promotionszeit erzielten Ergebnisse enthält. Die Ergebnisse sind nach Systemen und ihrem beobachteten dynamischen Verhalten gruppiert, beginnend mit dem wohl am besten untersuchten Phytochrom,Deinocccus radiodurans(DrBph), gefolgt von Agp2 ausAgrobacterium tumefaciens,Xan- tomonas campestris(XccBph) Phytochrome und schließlich GAF-3, ein Cyanobakteriochrom, das ausSynechocystissp. PCC6803 stammt. Dank der Verwendung erweiterter Sampling-Methoden konnten wir die Dynamik dieser Phyotchrome in einer größeren Vielfalt von Konformationen un- tersuchen und sie mit einigen spezifischen strukturellen Merkmalen in Verbindung bringen. Ein Beispiel dafür ist der Effekt der Simulation des Monomers gegenüber dem natürlich vorkommenden Dimer in Lösung oder der Deletion eines Linkerknotens aus dem Proteingerüst, in DrBph. Diese verbesserten Methoden ermöglichten es, sekundärstrukturelle Übergänge in Agp2 zu untersuchen, ein Phänomen, für das sonst eine um Größenordnungen längere konventionelle MD erforderlich wäre. Außerdem war es möglich, die Stabilität verschiedener Dimerkonstrukte auf der längeren Zeitskala zu bewerten und zu bestimmen, ob eine Konformation mehr oder weniger stabil ist als eine andere in XccBph. Dies ist eine Frage, die schon immer das wissenschaftliche Interesse geweckt hat, da die meisten natürlich vorkommenden Phytochrome als parallele Dimere auftreten, aber nur wenige physiologisch funktionale eine antiparallele Konfiguration annehmen. Es wurden weitere Berechnungsmethoden angewandt, darunter Berechnungen mit konstantem pH-Wert, die genaue

(7)

Vorhersagen der pKa-Werte für Schlüsselreste in der Nähe des Chromophors und ihrer wahrschein- lichsten Protonierungszustände in Agp2 liefern. Dies wiederum beeinflusst das entsprechende MD- Sampling, was eine bessere und physiologisch genauere Darstellung des dynamischen Protonenaus- tauschs zwischen den Resten im Protein ergibt. Schließlich wurden die Ergebnisse von QM/MM- Rechnungen an GAF-3, einem interessanten Rot-Grün-Photoschalter, vorgestellt, die darauf abzie- len, experimentelle Spektren zu reproduzieren und zu interpretieren sowie Spektren mit Konfor- mationen zu verknüpfen.

(8)

Contents

1 Phytochromes 1

1.1 Phytochromes . . . 1

1.2 Canonical, Prototypical and Bathy phytochromes . . . 4

1.3 A knot connecting PAS and GAF domains . . . 5

1.4 Keto-enol tautomerism . . . 6

1.5 Crystallography: strengths and weaknesses . . . 6

1.6 Phytochromes studies in this work . . . 7

2 Theoretical Background 12 2.1 Molecular Dynamics . . . 12

2.2 Including the Environment . . . 17

2.3 Accelerated and Gaussian Accelerated Molecular Dynamics . . . 19

2.4 Constant pH Molecular Dynamics . . . 22

2.5 DFT . . . 24

2.6 QM/MM . . . 26

2.7 QM/MM vibrational frequency calculation . . . 28

3 Computational Details 30 3.1 System preparation . . . 30

3.2 MD software . . . 32

3.3 QM/MM software . . . 33

3.4 Analysis tools . . . 33

4 Results 36 4.1 DrBph . . . 36

4.2 Agp2 . . . 54

4.3 XccBph . . . 69

4.4 GAF-3 . . . 73

5 Conclusions 82

(9)

References 93

(10)

List of figures

1.1.2 Different chromophores present in phytochromes. PCB, PΦB and BV, linked to a Cysviaa sulfur (in yellow). . . 2 1.1.1 Phytochrome full lenght dimer structure example (PDB: 6FHT). PAS, in blue,

GAF, in green, PHY, in yellow, and OM, in red, secondary structure. Biliverdin chromophore represented as sticks . . . 3 1.1.3 Photocycle schematic example of a prototypical bacterial phytochrome contain-

ing BV. The ground state, Pr, after red light excitation interconvertsviaa series of intermediates, Lumi and Meta state, to Pfr and can similarly convert back to Pr after far red light excitation orviathermal relaxation (gray dashed line). The phy- tochrome in Meta state is deprotonated at the pyrrolic nitrogens, the re-uptake of the proton occurs before the formation of Pfr. Secondary structural transition of the tongue, resulting after the photoconversion, is highlighted in yellow (β-sheet) and orange (α-helix) colors. . . 4 1.3.1 Knot structure, in green, connecting PAS, in red, and GAF, in orange, domains in

DrBph. Chromophore in blue sticks. . . 5 1.4.1 Keto-enol equilibrium for BV chromophore in a bathy phytochrome in Pr state.

The ring D keto form, on the left, is in equilibrium with the enol form, on the right. This also causes a double bond relocation in ring D and ring C. . . 6 1.6.1 The first X-ray structure of DrBph, comprising of a PAS-GAF monomer (PDB

entry: 1ZTU). . . 8 1.6.2 Dimer structure of Agp2-PAiRFP2 in Meta-F state. The dimer interface is between

the two PAS domains in an unusual atisymmetic conformation (PDB entry: 6G20). 9 1.6.3 Dimer structure of XccBph in Pr state. The dimer is a head-to-head parallel dimer

with the OM, a PAS9 domain, interlacing at the C-terminus (PDB entry: 5AKP). 10 1.6.4 Gaf-3 structure in Pr state, as isolated from Slr1393, with PCB chromophore in

salmon sticks (PDB entry: 5DFX). . . 11 2.1.1 Illustration of the bonded interactions of a simple chain molecule. interatomic

distancer23, bend angleθ234, and torsion angle Φ1234. . . 14

(11)

2.2.1 Different water models and their sites. . . 18 2.2.2 Periodic boundary conditions. As a particle moves out of the simulation box, an

image particle moves in to replace it. . . 19 2.3.1 Schematic representation of a hypothetical potential energy function and several

bias potentials plotted at various values of α, and at a relative low value of the threshold boost energy,E. . . . 21 2.4.1 Amber constant pH in explicit solvent workflow. . . 23 2.4.2 Schematic of the method used in the Generalized Born implicit solvent step where

the energy difference between the two protonation states is used to evaluate the Metropolis Monte Carlo criteria for the proposed change in protonation state(s). 24 2.6.1 Simplified schematic of the QM/MM method division, with the QM part in or-

ange represented as sticks and the MM part in gray represented as secondary struc- ture. . . 27 4.1.1 Evolution of RMSD values for Cαof DrBphP with respect to initial crystal confor-

mation (PDB entry 4q0j), throughout the MD simulations computed with CHARMM27 (purple) and AMBER ffSB14 (green) force fields using a) monomer form simu- lated with cMD, b) monomer form simulated with aMD, c) dimer form simulated with cMD and d) dimer form simulated with aMD. Points collected every 0.1 ns (transparent lines) and every 5 ns (solid lines).The first 100 ns (in the gray box) are regarded as initial equilibration of the system and therefore not considered in the analysis. . . 39 4.1.2 Ramachandran Φ/Ψ plots for the DrBphP dimer: a) CHARMM 27 cMD, b)

CHARMM 27 aMD, c) AMBER ffSB14 cMD, d) AMBER ffSB14 GaMD. . . 40 4.1.3 Structure of DrBphP monomer after 500 ns cMD simulation with CHARMM27

force field (in red) compared with the crystal structure [PDB: 4q0j] (in gray), front and back view. The movement of the PHY towards the GAF domain, pivoting around L331, is highlighted by the green dashed line connecting the center of mass of the two domains (plotted in Fig. 4.1.4(a)). . . 41 4.1.4 Evolution of the Center Of Mass (COM) distance (a) and bend angle centered at

Leu331 (b) for the monomer, cMD, computed using CHARMM27 (purple) and AMBER ffSB14 (green) force fields. . . 41 4.1.5 Dynamic Cross Correlation (DCC) plots for DrBphP monomer throughout cMD

using CHARMM27 after frame alignment with respect to GAF domain. a) Spa- tial orientation of the protein, b) DCC along the x direction, c) DCC along the y direction, d) DCC along the z direction, e) PHY domain spanning residues 367 to 443, colored according to the DCC values shown in the zoom-in box 2. . . 42

(12)

4.1.6 Structure of the DrBphP dimer, molecule A in blue and molecule B in red, after 500 ns cMD simulation with CHARMM27 (in color) compared with the crys- tal structure (in gray). Zoom in shows the interaction between Arg495 (orange sticks) and Asp496 (bronze sticks) and the two Glu432 (silver sticks) attracting sodium ions (yellow spheres) that allow close the approach of the two residues. . 43 4.1.7 Evolution of Center Of Mass (COM) distance for DrBphP dimer (a) according to

cMD simulations using CHARMM27 (purple) and AMBER ffSB14 (green) force fields. Evolution of the bend angles (b) of chain A (purple) and chain B (green) of DrBphP dimer from cMD with CHARMM27. The bend angles where computed considering Cαof Leu331 as vertex. . . 44 4.1.8 Dynamic Cross Correlation (DCC) plots for DrBphP monomer throughout cMD

using CHARMM27 after frame alignment with respect to GAF domain. a) Ori- entation of the DrBphP dimer. b) DCC along the x direction. c) DCC along the y direction. d) DCC along the z direction. e) Zoom of the box 5 and the corre- spondent residues (885 to 932) colored according to the DCC values. . . 45 4.1.9 Kink of theα-helix spine of the final DrBphP dimer structure according to GaMD

(AMBER ffSB14), in red, compared to theα-helix spine of the crystal structure of the PaaCΔC construct, in gray (PDB entry: 6FHT). . . 46 4.1.10Native structure A) of DrBph PAS-GAF monomer, including the knot, highlighted

in red, and the knotless structure B) with the two joint strands after deletion of the knot in red. . . 48 4.1.11Comparison of the RMSF, during 500 ns aMD with CHARMM27, of the native

(black) and knot-less (red) structure of DrBph PAS-GAF monomer structure.

The gap is where the knot is cut. . . 50 4.1.12Distance as function of simulation time of the two residues Gln68 and Arg237 in

theβ-sheets of PAS and GAF of DrBph monomer knotless construct. CHARMM27, aMD. . . 50 4.1.13Distance between the two residues Gln68 and Arg237 in the highlightedβ-sheets

through the simulation time. Initially at 23 Åin a) and b), the distance decreases to 15 Å, c), 10 Å, d), 12 Å, e), and 11 Å at the end of the 500 ns simulation, f).

DrBph PAS-GAF Knot-less structure, CHARMM 27 aMD. . . 51 4.1.14Evolution of DrBph PAS-GAF monomer during the 500 ns aMD with CHARMM

27 ff. a) initial conformation, b) after 250 ns and c) after 500 ns. In red sticks are represented negatively charged residues Asp and Glu and in blue sticks the positively charged Arg residues. . . 52

(13)

4.1.15Comparison of dynamic of the native, left, and the knotless, right, DrBphP PAS- GAF monomer, compared to the reference structure, in grey. PAS domain in red, GAF domain in orange and chromophore in green. The chromophore in the knot- less construct is more solvent exposed than in the native form, as visible but the enlarged, unprotected binding pocket’s opening. 500 ns aMD simulation with CHARMM ff. . . 53 4.1.16Comparison of dynamic of the native, left, and the knotless, right, DrBphP PAS-

GAF-PHY (red-orange-yellow respectively) dimer, compared to the reference struc- ture, in grey. 500 ns GaMD simulation with AMBER ff. . . 54 4.2.1 Agp2 Meta-F, model MF1, run 1. Secondary structure change observed in tongue

of chain A, left, from α-helix toβ-sheet, highlighted in yellow. Secondary struc- ture plot for chain A, up right, and chain B, down right, show that only chain A undergoes and keeps the transition for about 200 ns. . . 57 4.2.2 Agp2 Meta-F, model MF1, run 5. Secondary structure change observed in tongue

of chain B, left, fromα-helix toβ-sheet, highlighted in yellow. Secondary structure plot for chain A, up right, and chain B, down right, show that this time only chain B undergoes the transition and keeps it on and off for about 200 ns in total. . . 58 4.2.3 Agp2 Meta-F, model MF4, run 5. Secondary structure change observed in tongue

of chain A, left, fromα-helix toβ-sheet, highlighted in yellow. Secondary structure plot for chain A, up right, and chain B, down right, show that chain A undergoes the transition as a doubleβ-sheet and keeps it for almost 800 ns. . . . 59 4.2.4 Meta-F with two different BV conformations, Pfr-like (a) andα-facial Pr-like (b),

adopted during the same simulation. . . 60 4.2.5 Dihedral angle for ring D during the simulation of Agp2 Meta-F model MF1, run

1. In solid color is every 10th point, faint line is every point. . . 61 4.2.6 Water molecules in the pocket interacting with Prop-C. . . 64 4.2.7 H-bond between the two propionic side chains and their interactions with two

surrounding Arg211 and Arg242 in the Agp2 Meta-F state. . . 65 4.2.8 Orientation of Phe165 in Agp2 WT Pfr state (left) and Agp2-AiRFP2 Meta-F state

(right). . . 68 4.3.1 XccBph G454E dimer (PDB entry: 7L59). The PAS9 output module (in red) is

bentcirca90, which impairs a head-to-head dimer formation. . . 70 4.3.2 XccBph G454E head-to-head dimer model, based on the quaternary structure of

the Pr crystal (PDB entry: 6PL0). In this model the backbone helical spine is preserved and the two OM interlock with each other in an anticlockwise fashion (top view). . . 71

(14)

4.3.3 XccBph head-to-head dimer model. Panel A is the initial structure, and panel B is the final structure after 500 ns GaMD. The dimer interface is conserved with some minor rearrangement of the PAS (in green) and the OM PAS9 (in yellow) domains. The surface is colored based on the electrostatic values obtained with APBS, ranging from -5, in red, to +5, in blue, units areKbT/ec. . . 72 4.4.1 Photocycle of Slr1393-g3. The red- and green-absorbing parent states can be in-

terconverted by photoisomerization of the PCB chromophore in the respective excited states (denoted by asterisks) and subsequent relaxation. The crystal struc- ture of Slr1393-g3 (Pr) is depicted in the middle of the cycle. Depicted the two PCB chromophore structures in Pr, ZZZssa configuration, PDB entry 5DFX (top right), and Pg states, distorted ZZEssa configuration, PDB entry 5M82 (bottom right). Included is Asp498 that replace the pyrol water. . . 74 4.4.2 PCB chromophore in Po state. A) with ring D in the originalβ-facial conforma-

tion, B) with the modiefiedα-facial conformation found in Pr state. . . . 74 4.4.3 Experimental (red lines, 283 K) and calculated (gray lines) RR spectra of GAF-3

of the Pr state in (A) H2O and (B) D2O. . . 76 4.4.4 Experimental (red lines, 283 K) and calculated (gray lines) RR spectra of GAF-3

of the Pg state in (A) H2O and (B) D2O. . . 77 4.4.5 Experimental and calculated (gray lines) RR spectra of GAF-3 of the Po state, ob-

tained by freezing a Pr sample (80 K) and subtraction of residual Pr contributions.

(A) RR spectrum in H2O compared with the calculated Raman spectrum of the α-facial model. (B) RR spectrum in D2O compared with the calculated Raman spectrum of theα-facial model. (C) RR spectrum in H2O compared with the cal- culated Raman spectrum of theβ-facial model derived from the crystal structure of the intermediate (PDB entry 5M85). (D) RR spectrum in D2O compared with the calculated Raman spectrum of theβ-facial model. . . . 79 4.4.6 Model for the photocycle of GAF-3. In the Pr state (top), the three pyrrole rings

A, B, and C of PCB are tightly associated with the conserved Asp (D498) and Trp (W496) residues. Subsequent to photoisomerization of Pr (red arrow), the interactions with the Trp residue are lost, and the coordination of ring A and the Asp residue is changed, leading to the shortened conjugation length in the Pg state.

After photoisomerization starting from Pg (green arrow), an intermediate state (Po) is populated, in which the Trp residue remains solvent-exposed and ring A is in an out-of-plane configuration. The Po state is not photoactive and is established in a thermal equilibrium with Pr. Pr* and Pg* refer to the respective excited states, while Int denotes additional yet unidentified thermal intermediate(s). . . 80

(15)

List of tables

3.2.1 Comparison of software speed on a single NVIDIA RTX 2080 Ti for calculation of a system, including solvent molecules, comprising of about 130000 atoms. . . . 32 4.1.1 Average RMSD values, in Å and standard deviation (in parentheses), of Cαposi-

tions for monomer and dimer structures of DrBphP computed over the last 100 ns MD simulations using CHARMM27 and Amber ffSB14 force fields with respect to the crystal structure. . . 38 4.1.2 Mean RMSF values for the Cα-backbone, after aligning with respect to the GAF

domain, during 500 ns simulation, cMD. Units are Å. . . 38 4.1.3 RMSD values, in Å, for DrBph PAS-GAF monomer, wild type (WT) and knotless

(KL). . . 49 4.1.4 RMSD values, in Å, for DrBph PAS-GAF-PHY dimer, wild type (WT) and knot-

less (KL). . . 53 4.2.1 Agp2 states with different protonation models tested. ”Prot” represents a singly

protonated and ”Deprot” a completely deprotonated Prop-C, while ”ε” and ”P”

indicate if His278 is protonated on the εposition or doubly protonated. Corre- spondingly we assigned a short name tag to each model. . . 55 4.2.2 Agp2 models, number of runs and their results. Numbers in last column refers to

the number of runs where theα βis observed, x means is not observed in any of the run. . . 59 4.2.3 Mean pKa and population values after three (two for Meta-Fβand Pr) separate

200 ns cMD cpH for the Agp2 dimer. . . 65 4.2.4 Mean pKa values and population distribution after three independent 100 ns cMD

cpH for the Agp2 monomer. . . 66 4.3.1 Average values of RMSD, in Å. Number of formed salt bridges and H-bonds, con-

figurational and binding energy, in Kcal/mol. . . 73

(16)

Acronyms

aMD Accelerated Molecular Dynamics. 20 BV biliverdin. 2

CBCR Cyanobacteriochrome. 11 DFT Density Functional Theory. 24 ff force field. 13

GAF cGMP phosphodiesterase/ Adenylate cyclase/FhlA. 2 GaMD Gaussian Accelerated Molecular Dynamics. 20 GB Generalized Born. 17

IR Infrared. 74

MD Molecular Dynamics. 12 MM Molecular Mechanics. 12 OM output module. 2

PΦB phycochromobilin. 2

PAS Period/ARNT/Singleminded. 2 PCB phycocyanobilin. 2

Pfr Far-red absorbing. 1 Pg Green absorbing. 11 PHY Phytochrome-specific. 2

(17)

PME Particle Mesh Ewald. 18 Po Orange absorbing. 11 Pr Red absorbing. 1

PSM Photo Sensory Module. 2 QM Quantum Mechanic. 14, 26

QM/MM Quantum Mechanic Molecular Mechanic. 26 RESP Restrained ElectroStatic Potential. 14

RR Resonance Raman. 74

SPEM Single Particle Electron Microscopy. 7

TIP3P Transferable Intermolecular Potential with 3 Points. 18 WT wild type. 55

(18)

Phytochromes 1

Organisms sense light in very different ways from the simple perception to the more com- plex long range signaling. The importance of light signals in regulating plant growth has been docu- mented for centuries. Indeed, Darwin himself provides detailed observations of the developmental processes occurring following emergence of a dark-grown seedling into the light, in a book written with his son, Francis, ‘The power of movement in plants’[1]. In this work, the authors record impor- tant roles for light throughout plant development, including leaf ”sleep” movements (epinasty/hyponasty¹) and the bending of plant stems towards bright lateral light (phototropism²).

1.1 Phytochromes

Bacteria, fungi, algae and plants need not only to sense the presence of light, but also its quantity and quality. One kind of such photoreceptors able to respond to red/far-red light, called Red ab- sorbing (Pr) and Far-red absorbing (Pfr), employed by all these organisms are phytochromes. The termphytochrome, meaning “plant color”, was originally coined in 1920 to describe the proteinous pigment that controls photoperiod detection and floral induction of certain plants[2], but experi- mentally detected in turnip seedlings in 1959[3].

Phytochromes are dimeric photoreceptors consisting of an N-terminal sensory module comprising

¹Epinasty is the downward bending of a leaf or other plant part, resulting from greater growth of the upper side than of the lower side. Hyponasty is the opposite effect.

²Phototropism is the widespread phenomenon whereby plants tend to grow toward (or away from) a light source.

(19)

Period/ARNT/Singleminded (PAS), cGMP phosphodiesterase/ Adenylate cyclase/FhlA (GAF)) and Phytochrome-specific (PHY) domains[4–6]. Those constitue the so called Photo Sensory Module (PSM). The signal is then transferred to an output module (OM) at the C-terminal end, often a histidine kinase.

In phytochromes, light is absorbed by a linear open-chain tetrapyrrole (bilin) cofactor, thus called chromophore, covalently linked to a cysteineviaa thioether bond to the photosensory pro- tein domain. There are three chromophores, phycocyanobilin (PCB), used by cyanobacteria, phy- cochromobilin (PΦB), used by plants, and biliverdin (BV), used by bacteria, and they differ on the way they are linked to the cysteine and the R group on the D ring. The chromophore can adopt two conformations depending on the orientations of the pyrrol rings with respect to the connecting methine bridges, ZZZssa or ZZEssa, where Z/E stand forZusammen/Entgegenand s/a forsyn/anti.

Figure 1.1.2: Different chromophores present in phytochromes. PCB, PΦB and BV, linked to a Cys viaa sulfur (in yellow).

The photoconversion between Pr (λmax 660 nm) and Pfr (λmax 720 nm) states, orvice versa, is driven by the photoisomerization of the tetrapyrrole, in which ring D rotates, corresponding to a ZZZssa and ZZEssa chromophore configuration. The back reaction is obtained by Pfr(Pr) irradia- tion or by thermal relaxation.

(20)

Figure 1.1.1: Phytochrome full lenght dimer structure example (PDB: 6FHT). PAS, in blue, GAF, in green, PHY, in yellow, and OM, in red, secondary structure. Biliverdin chromophore represented as sticks

(21)

Figure 1.1.3: Photocycle schematic example of a prototypical bacterial phytochrome containing BV.

The ground state, Pr, after red light excitation interconverts viaa series of intermediates, Lumi and Meta state, to Pfr and can similarly convert back to Pr after far red light excitation orviathermal re- laxation (gray dashed line). The phytochrome in Meta state is deprotonated at the pyrrolic nitrogens, the re-uptake of the proton occurs before the formation of Pfr. Secondary structural transition of the tongue, resulting after the photoconversion, is highlighted in yellow (β-sheet) and orange (α-helix) col- ors.

The Pr and Pfr parent states interconvert photochemicallyviaa series of intermediates termed the Lumi and Meta states (see fig. 1.1.3). The nomenclature derive from their first identifcation in 1985 by cryogenical (cryo-)UV/Vis experiments onAvena sativaplant phytochrome[7].

Following the chromophore isomerization some other structural changes happen, the most notable one being the secondary structural transition of the tongue fromα-helix toβ-sheet orviceversa. The tongue is a hairpin protrusion elongating from the PHY domain and facing the chromophore bind- ing pocket (see fig. 1.1.1).

1.2 Canonical, Prototypical and Bathy phytochromes

Canonical phytochromes, among the first to be discovered, have Pr as their dark adapted ground state, are typically plant phytochromes incorporating phycocyanobilin. Part of the canonical family areSynechocystis sp. PCC 6803 Cph1,Micromonas pusillaMpPHY andArabidopsis thalianaPhyA, PhyB/E and PhyC[8].

Prototypical phytochromes also have Pr as ground state and are usually bacteria phytochromes us- ing biliverdin as chromophore. The prototypical family includeAgrobacterium tumenfaciensAgp1, Deinococcus radioduransDrBphP,Rhodopseudomonas palustrisRpBphP2 and RpBphP3, andStig- matella aurantiacaSaBphP1[9].

(22)

Bathy phytochromes, on the other hand, have Pfr as their dark adapted state and are so called be- cause their ground states are bathochromically³ shifted compared to those of all other phytochromes.

To this group belong phytochromes such asBradyrhizobiumBrBphP1,Rhodopseudomonas palustris BphP1,Agrobacterium tumefaciensAgp2 andPseudomonas aeruginosaBphP1[10].

1.3 A knot connecting PAS and GAF domains

Since the very first crystal structure ofDeinococcus Radioduransbacteriophytochrome (DrBphP)[11], an unusual construct was noted, namely a knot structure. The DrBph contains a rare deep trefoil knot in which 48 amino acids protrude from the crossover, breaking the long-held dogma that pro- teins do not fold into knots. A deep trefoil knot is formed as the 35 N-terminal residues upstream of the PAS domain, pass through the loop, residues 225–257, within the structurally conserved GAF domain (see fing.1.3.1).

Figure 1.3.1: Knot structure, in green, connecting PAS, in red, and GAF, in orange, domains in DrBph. Chromophore in blue sticks.

Several functions of this knot have been proposed. One may be to stabilize contacts between the PAS and GAF domains. Another may be to limit the flexibility of a photoreceptor as a mechanism to prevent undesirable energy losses due to vibration or large domain movement upon photoiso- merization of biliverdin. Finally, by severely restricting movement of the N-terminal domain, the knot may orient Cys 24 for efficient conjugation to biliverdin[11].

With the resolution of many other structures, this knot structure appears to be conserved in most

³From greek,bathoslit. ”depth” andkhro¯malit. ”color”, to deeper colors, so to longer wavelenghts.

(23)

phytochromes, with few exceptions among cyanobacteriochromes. Despite the peculiarity of this knot structure, to this day, little research has been done to its main function.

1.4 Keto-enol tautomerism

The Pr state of bathy phytochromes is affected by a pH-dependent equilibrium of tautomeric keto- /enol-forms at the ring D carbonyl group. The enolic tautomer is formed by a proton exchange with a nearby conserved histidine residue and presumably initiates the thermal dark-relaxation reaction to the Pfr state[12].

Figure 1.4.1: Keto-enol equilibrium for BV chromophore in a bathy phytochrome in Pr state. The ring D keto form, on the left, is in equilibrium with the enol form, on the right. This also causes a dou- ble bond relocation in ring D and ring C.

1.5 Crystallography: strengths and weaknesses

Crystallography has surpassed NMR as method of choice for obtaining the 3D structure of proteins.

The first crystal structure of a macromolecule was solved in 1958, a three-dimensional model of the myoglobin molecule obtained by X-ray analysis[13]. As a science, crystallography has produced 28 Nobel Prizes, more than any other scientific field and 2014 was elected ”International Year of Crys- tallography” by the United Nations.

Since the solution of the first crystal structure ofDeinococcus Radioduransbacteriophytochrome in 2005 by Wagner et al. [11], many other structures followed, with increasing resolution and com- pleteness, paving the road for computational simulations. Having good starting models is paramount for any computational study on biomolecules and X-ray crystallography generally satisfy this need.

The development of new crystallization techniques and new, brighter and more powerful synchrotrons, where the X-ray beams are generated, improved the resolution of crystals during the years. But X-ray crystal structures are not without flaws. Not every biomolecule can be easily crystallized, e.g. crys-

(24)

tallization of complete phytochromes, including the OM, is still a challenge, and often proteins need to be chopped into smaller pieces or mutated to meet better crystallization conditions. Another is- sue is that inside crystals proteins are far from the physiological conditions, since concentration and packing are much higher than in the cell, giving some artifacts, e.g. in dimerization conformations.

Also crystals are stored in liquid nitrogen to maintain them integer and to reduce thermal move- ments in the protein, but this does not reflect the normal room temperature conditions. Another issue is that X-ray beams are very energetic and can easily damage proteins, turning them completely unusable after the irradiation. Due to physical constrains, the max achievable resolution is around 1 Å, so hydrogens are seldom resolved in the structures. Despite the believe that the structures ob- tained are unbiased experimental results, there is always a great deal of refinement and modelling during the evaluation of obtained data.

X-ray crystallographic structures are invaluable for theorethical studies of proteins, but they have to be taken critically and adequately refined.

1.6 Phytochromes studies in this work

1.6.1 DrBph

Deinococcus radiodurans bacteriophytochrome is possibly the more widely studied phytochrome and the first crystal to be successfully resolved (even though only PAS-GAF monomer)[11]. It’s present in the extremophilic bacteriumDeinococcus Radiodurans⁴, a rather large bacterium extremely resistant to ionizing radiation, ultraviolet light, desiccation, and oxidizing and electrophilic agents.

Resolution of more complete structures followed, including the PHY domain in a dimer construct[14].

Several characterization methods were used to study the photocycle and structure of DrBph. Among them are Single Particle Electron Microscopy (SPEM)[14] and Negative-Staining EM[6] experi- ments, which highlighted movement of the two PHY domains during photoconversion and corre- spondent induced rearrangement of the OM. A wealth of spectroscopical studies, both experimental including UV[15], IR[16] and Raman[17] and theoretical[18], which identified the intermediates leading to the photoproduct. NMR spectroscopy[19] studies, casting light on the structural het- erogeneity of the dark state. Several fluorescence studies[20,21], aimed to improve flourescent properties of DrBph through targeted mutations, for the use as tissue imaging[22].

⁴The name Deinococcus radiodurans derives from the Ancient Greek deinos and kokkos meaning ”terrible grain/berry” and the Latinradiusanddurare, meaning ”radiation surviving”.

(25)

Figure 1.6.1: The first X-ray structure of DrBph, comprising of a PAS-GAF monomer (PDB entry:

1ZTU).

1.6.2 Agp2

Agp2 bathy phytochrome is found in the rhizobial soil bacteriaAgrobactrium fabrum, also known as Agrobacterium tumefaciens. In 2018 a crystal structure of a mutant in the Meta-F intermediate was resolved[23], in which BV isomerized, but the tongue didn’t yet undergo secondary transition. The mutant, named Agp2-PAiRFP2, was originally designed to enhance flourescent properties for op- togenetic purposes, such as non invasive tissue imagingin vivoin a spatially selective manner [24].

The crystal packing of the Agp2-PAiRFP2 Meta-F dimer happens at the PAS domain in an antisym- metric fashion, which is very unusual and probably due to the extensive mutations and the crystal- lographic conditions. Despite the unusual dimer formation, the PSM displays the same structure as the wild-type protein.

(26)

Figure 1.6.2: Dimer structure of Agp2-PAiRFP2 in Meta-F state. The dimer interface is between the two PAS domains in an unusual atisymmetic conformation (PDB entry: 6G20).

1.6.3 XccBph

The bathy-like phytochrome is present in the plant pathogenXanthomonas campestris. XccBph has been classified as ”bathy-like” due to its Pfr prevalence in the dark state. The full structure in Pr state, bearing the OM, a C-terminal PAS9 domain, was resolved in 2016 by Oteroet al.[25], at 3.25 Åresolution. This rare full length structure shows how the two PAS9 in the OM interlock in a anti- clock-wise twist (top view) giving additional stability to the dimer.

(27)

Figure 1.6.3: Dimer structure of XccBph in Pr state. The dimer is a head-to-head parallel dimer with the OM, a PAS9 domain, interlacing at the C-terminus (PDB entry: 5AKP).

Recently, the same group has crystalized the full length WT Pr state at a higher resolution and, unprecedented, the full length Pfr state of a mutant. The selected mutant was chosen after a screen- ing process developed previously in the same group[26]. The reported Pfr construct is found in a different conformation than in the Pr, with the dimer interface at the tongue region[citation when available].

(28)

1.6.4 Gaf-3

Slr1393 is a Cyanobacteriochrome (CBCR), also isolated fromSynechocystis sp. PCC6803, that photoswitches between red- and Green absorbing (Pg) states. It comprises of three consecutive GAF domains, followed by PAS domain and an histidine kinase as OM. We studied the third GAF domain of the series, thus the name Gaf-3, which is the only one that incorporates PCB as chro- mophore linked to Cys-528. Three states were resolved by X-ray crystallography, Pr, Pg and an Orange absorbing (Po) intermediate[27].

Figure 1.6.4: Gaf-3 structure in Pr state, as isolated from Slr1393, with PCB chromophore in salmon sticks (PDB entry: 5DFX).

Previous studies focused on the regulatory function and spectral properties of some mutants[27–

29]. In this work we focused on computation of Raman spectra and the direct comparison with experimental results to correctly assign each spectrum with the corresponding structure.

(29)

Theoretical Background 2

2.1 Molecular Dynamics

Proteins, like all complex macromolecules, are moving objects and have important dy- namic properties. One well established method for studying such properties is Molecular Dynamics (MD), which is time extension of Molecular Mechanics (MM), that uses simple potential-energy functions to model molecular systems. Besides obtaining information on the time evolution of possible conformations, also gives kinetic and thermodynamic information. Originally conceived within the theoretical physics community during the 1950s, where a hard-sphere model, in which the atoms interacted only through perfect collisions[30], MD methods were perfected to mimic real atomic interactions. In 1976 the first simulation of a protein, a bovine pancreatic trypsin inhibitor, using an empirical energy function constructed using physics-based first-principles assumptions was finally performed[31]. This first simulation of a globular protein of about 500 atoms covered only 9.2 ps of simulation time. Nowadays simulations including 104-106 atoms spanning several hundreds of nanoseconds are routinely achievable, thanks to improvement in computing power and the adoption of GPU acceleration[32]. Simultaneously to hardware advancement major the- oretical and methodological developments also contribute significantly to improve conformation sampling.

(30)

2.1.1 Force Field

An energy function for describing the intermolecular and intramolecular interactions has to be used for MD simulations. In classical MD simulations, the energy function for nonbonded interactions is a pairwise additive function (for computational reasons) of nuclear coordinates only, justified in terms of the Born-Oppenheimer approximation[33]. For bonded groups of atoms, that is those that form covalent bonds, bond angles, or dihedral angles, simple two-body, three-body, and four-body terms are used. The energy functions usually consist of a large number of parametrized terms. These parameters are obtained from experimental and/or quantum mechanical studies of small molecules or fragments, and it is assumed that such parameters may be transferred to the larger molecule of interest. The set of functions along with the associated set of parameters is termed a force field (ff).

A variety of force fields have been developed and perfected specifically for simulation of proteins.

Force field terms are independent and purely additive, e.g. bond lengths are not considered to be dependent on the bond angles, and atomic partial charges are fixed in magnitude. Despite these approximations, these classical ff usually lead to reasonable results.

The two earliest, more robust, widely used and tested ff are the CHARMM[34] and AMBER[35]

ff.

The CHARMM ff has the following form

V(CHARMM) =∑

bonds

kr(r−r0)2+∑

angles

kθ(θ−θ0)2+

dihedrals

kφ[1+cos(nφ−δ)] + ∑

Urey−Bradley

ku(u−u0)2+

impropers

kω(ω−ω0)2+∑

φ,ψ

VCMAP+

nonbonded

{ ε

[Rminij

r12ij −Rminij

r6ij ]

+ qiqj

εrij

}

(2.1)

where thekvalues are the force constants and terms with a zero subscript are the equilibrium values. The intramolecular therms are the bond, that account for the 1-2 bond length, the 1-3 angle term and the 1-4 dihedral term (see fig.2.1.1). In addition there are the Urey-Bradley term, which is a two body term, that extends over all 1-3 bonds, the second is a four body quadratic improper term, although many Urey-Bradley terms and improper dihedral angles are not used, only those that are required for fitting computational results to observable vibrational spectra are utilized. In addition, the linear term in the original Urey-Bradley function is not incorporated at all in recent ff versions. The final additional term is a cross term, CMAP[36], which is a function of two sequential carbon back bone dihedrals. This term originates from differences observed between classically calculated two dimensionalφ/ψ peptide free energy surfaces using the CHARMM22 force field

(31)

Figure 2.1.1: Illustration of the bonded interactions of a simple chain molecule. interatomic distance r23 , bend angleθ234, and torsion angleΦ1234.

and that of experiment. CMAP is a numerical energy correction which essentially transforms the 2Dφ/ψclassical energy map to match that of a Quantum Mechanic (QM) calculated map.

The final term in the function represents the nonbonded interactions, incorporating Coulombic and Lennard-Jones interactions. εrelates to the Lennard-Jones well depth,Rminis the distance at which the Lennard-Jones potential is zero, q is the partial atomic charge of atomi,εis the effective dielectric constant, andrijis the distance between atomsiandj.

While the AMBER ff has the following form

V(AMBER) =∑

bonds

kr(r−r0)2+∑

angles

kθ(θ−θ0)2+

dihedrals

Vn

2 [1+cos(nω−γ)]+

i<j

[Aij

R12ij Bij

R6ij ]

+∑

i<j

[qiqj

εRij

] (2.2)

where the bond and angle terms are identical, while now all the 1-4 terms are contained in the dihedral term without further corrections. The non-bonded terms have a different expression, but account for the same interactions.

Another difference between CHARMM and AMBER is how charges are derived. AMBER employs HF/6-31G* RESP (Restrained ElectroStatic Potential)[37] partial atomic charges that serendip- itously overestimate gas-phase dipoles by a similar amount as obtained in water models such as TIP3P thus approximating the polarization expected in aqueous solution. CHARMM instead ac- counts for a more direct interaction with solvent molecules. Starts by taking the residues optimized

(32)

structures in vacuum (at the HF/6-31G(d) level) and perform a series of supermolecular calcula- tions involving the model compound and a single water molecule (TIP3P) at each of the various polar sites. The supermolecule structure is then optimized at the HF/6-31G(d) level by varying the interaction distance and angle, to find the local minimum for the water position with fixed geome- tries, and partial charges are finally obtained.

These differences illustrate why is not possible to mix different ff because are derived and incorpo- rate terms differently, even though some tools were created in order to adapt one to the other (e.g.

from CHARMM to AMBER, CHAMBER[38]). Another important consideration is that these ff are obtained and optimized for the TIP3P water model and the use of another one is not advisable and can lead to wrong results.

Despite the differences between these two ff, they both are very robust and reliable ff that perform comparably and predict very similar protein dynamics and the use of one or the other is mostly about personal preferences and parameters availability.

2.1.2 Minimization

Before any MD simulation the initial structure of the protein must be relaxed in order to eliminate possible atom clashes and to refine distorted bond angles and lengths. The task is to find the val- ues for which a particular function,V(r), has its global minimum, as function of all the positionsr.

Locating the exact global minimum of a general nonlinear function with more than a dozen of inde- pendent variables is basically unfeasible, since the system has 3Nvariables, whereNis the number of atom, usually up to few hundred. Nonetheless this step is fundamental to any following MD sim- ulation.

There are few methods, that include first-order steepest descent and conjugate gradient methods and the second-order Newton-Raphson method. The steepest descent method is one of the most used first-order iterative descent methods. This utilizes the gradient of the potential-energy surface, which directly relates to forces in the MM description of molecular systems, to guide a search path toward the nearest energy minimum. The force vector is defined asF(r) = −d/drV(r)whereris the vector of atomic coordinates. As an iterative descent methods, a succession of atomic configu- rations are generated by applying, for iteration k, the relationship

x(k) = x(k−1) +λ(k)F(k) (2.3)

where the vectorxrepresents the 3Ndimensional configuration,λ(k)is a step size, andF(k)is the force vector. The step size for the first iteration is usually selected arbitrarily. After every iteration this step size is adjusted according to whether the overall potential energy of the system was reduced or increased by that step. If the energy increased, it is assumed that the step size was sufficiently large to jump over the local minimum along the search direction, and accordingly the step size is reduced

(33)

by some multiplicative factor, typically 0.5. In the event that the energy was indeed reduced, the step size is increased by some factor, typically around 1.2. While the steepest descent method is inefficient for multidimensional problems with potential surfaces with multiple local minima, it is robust in locating the closest local minimum and is very effective in removing steric conflicts and relaxing bond lengths and bond angles to their canonical values.

2.1.3 Molecular Dynamics Method

Molecular dynamics simulations involve the iterative numerical calculation of instantaneous forces present in a MM system and its consequential movements. The MM system consists of a set of par- ticles that move in response to their interactions according to the equations of motion defined in classical mechanics. All QM effects are ignored and each particle, typically a single atom, is consid- ered to be a point mass. The MD trajectories are propagated according to the Newtonian equation of motion, for theith atom

dpi

dt =Fi (2.4)

wherepi =mviandFi =−dV/dviin whichVis defined by eq.2.1 or 2.2.

Numerous algorithms exist for integrating the equations of motion in which the integration is par- titioned into small steps, each separated in time by a specific period Δtbecause the continuous po- tentials describing atomic interaction preclude an analytical solution. Two of the more commonly used algorithms are theLeapfrog methodand theVelocity Verlet method. The Leapfrog algorithm uses the positions at timetand the velocities at timet−(Δt/2)for the update of both positions and velocities via the calculated forces,F(t), acting on the atoms at timet

x(t+Δt) = x(t) +v(t) (

t+ Δt 2

)

Δt (2.5)

v(t) (

t+Δt 2

)

=v(t) (

t− Δt 2

)

+a(t)Δt (2.6)

The velocity Verlet method instead synchronizes the calculation of positions, velocities and ac- celerations

x(t+Δt) = x(t) +v(t)Δt+ 1

2a(t)(Δt)2 (2.7)

v(t+Δt) =v(t) + 1 2

(

a(t) +a(t+Δt) )

Δt (2.8)

(34)

Often a mix of the two is used to optimize roundoff errors. These integrators are time reversible.

This means the direction of simulation in time is arbitrary. If the velocities of all atoms were swapped in sign, the simulation would run in exactly the reverse direction.

MD methods are, in principle, ergodic, so an infinitely long simulation would sample all possible conformations. Practically this is not feasible and often the sampled conformational space is not suf- ficient, especially if the goal is the observation of events beyond theμstimescale, such as secondary structure transitions or folding. This limitations gave rise to the development of several enhanced methods whose goal is to extend the conformational space sampling in shorter timescales[39,40].

Examples are simulated annealing[41], replica exchange[42], adaptively biased MD[43], umbrella

sampling[44], metadynamics[45], accelerated MD (aMD)[46] and Gaussian accelerated MD (GaMD)[47].

In this work we have used the last two methods. A detailed description can be found in section 2.3.

2.2 Including the Environment

The earliest MD simulation, for technical limitations, were done in vacuum. It’s although funda- mental to include the effects deriving from the environment, such as water, ions and other proteins, to get results close to experimental conditions. There are two ways to account for the presence of water and ions, implicit and explicit solvent models. Implicit solvent methods can speed up atom- istic simulations by approximating the discrete solvent as a dielectric continuum, thus drastically reducing the number of particles in the system. An additional effective speedup often comes from much faster sampling of the conformational space afforded by these methods. The Generalized Born (GB) solvation model is the most commonly used implicit solvent model for atomistic MD simulation[48]. It is based on modeling the solute as a set of spheres whose internal dielectric con- stant differs from the external solvent

Gs= 1 8πε0

( 1 1

ε )q2

2A (2.9)

wereε0it the dielectric constant in vacuum,εis the specific dielectric constant of the medium, Ais the ion radius andqis its charge. All direct interaction between water and protein are lost, mak- ing this model fast, but not always accurate. Thanks to the improvement of computational power the use of implicit water models is generally discouraged, but for some specific cases. The more costly, but also more accurate, explicit solvent models are now the standard. In explicit water mod- els water molecules surround the protein creating, together with ions, an environment close to the experimental one. Different water models exist, and they mainly differ on how many interaction points, calledsites, they have. They can range from 2 to 6, but the most used are with 3 and 4 sites, as illustrated in fig.2.2.1.

Three-site models have three interaction points corresponding to the three atoms of the water

(35)

Figure 2.2.1: Different water models and their sites.

molecule. Each site has a point charge, and the oxygen atom also has the Lennard-Jones param- eters. The most used is the Transferable Intermolecular Potential with 3 Points (TIP3P) model.

3-site models achieve a high computational efficiency with decent accuracy and are therefore the most used in MD simulations where there could be hundred of thousand water molecules present.

The four-site models have four interaction points by adding one dummy atom near of the oxygen along the bisector of the HOH angle of the three-site models (labeled M in the figure). The dummy atom only has a negative charge. This model improves the electrostatic distribution around the wa- ter molecule. As already discussed previously, it’s necessary to use the water model that has been used in the development of the ff of choice.

To simulate physiological conditions, where the solvent and solutes extend for hundreds of Å, it is common to duplicate the system periodically in all directions to represent an essentially infinite system. Typically, a cubic lattice is used for replication of the central cubic box, but other geome- tries, such as dodecahedron and octahedron, can be used. The atoms outside the central box are simply images of the atoms simulated in that box. So-calledperiodic boundary conditionsensure that all simulated atoms are surrounded by neighboring atoms, whether those neighbors are images or not. Every atom that leaves the box enters from the other side, ensuring the mass conservation of the system (see fig.2.2.2).

The computation of long range interactions arising from the periodic boundary conditions is made with the Ewald summation method[49], called Particle Mesh Ewald (PME). It was first de- veloped as the method for calculating electrostatic energies of ionic crystals but then vastly used in computational chemistry[50]. Ewald summation is a special case of the Poisson summation for- mula, replacing the summation of interaction energies in real space with an equivalent summation in Fourier space. In this method, the long-range interaction is divided into two parts: a short-range contribution, calculated in real space, and a long-range contribution, calculated using a Fourier transform. This method ensures rapid convergence of the energy compared with that of a direct summation, has high accuracy and reasonable speed when computing long-range interactions, and it is thus thede factostandard method for calculating long-range interactions in periodic systems.

(36)

Figure 2.2.2: Periodic boundary conditions. As a particle moves out of the simulation box, an image particle moves in to replace it.

2.2.1 SHAKE algorithm

The higher the number of variables that have to be computed during the simulation the longer it would take. Improvements in efficiency are often obtained through freezing the fastest modes of vibration by constraining the bonds to hydrogen atoms to fixed lengths using algorithms such as SHAKE[51] or RATTLE[52]. Using the SHAKE algorithm allow the use of larger time-step sizes without any significant degradation in the quality of the simulation. The SHAKE algorithm (also known as the constrained Verlet method) is a modification of the Verlet algorithm to impose con- straints on the internal coordinates such as bond lengths and bond angles. The length of the time step is restricted by the requirement that Δt is small compared to the period of the highest fre- quency motions being simulated. For the biomolecular systems, the highest frequency motions are the bond stretching vibrations, yet these vibrations are generally of minimal interest in the study of biomolecular structure and function. In essence, they may be considered as averaging out the highest frequency vibrations. In the SHAKE algorithm all constraints are imposed through fixed interatomic distances.

2.3 Accelerated and Gaussian Accelerated Molecular Dynamics

For most biological systems of interest, the simulation time is limited to the nanosecond microsec- ond time scale, so simple molecular dynamics cannot be used to adequately explore portions of the energy landscape separated by high barriers from the initial minimum. Furthermore, for most bi- ological molecules, the energy landscape has multiple minima or potential energy wells with high

(37)

free energy barriers, and during a molecular dynamics simulation the system is trapped in one or another local minimum for long periods of simulation time.

Accelerated Molecular Dynamics (aMD) is a bias potential introduced by the McCammon group at UCSD[46]. It is a modification to the potential that in practice reduces the height of local barri- ers, allowing the calculation to evolve much faster and doesn’t require information of where are the barriers, saddle points or even what type of configuration changes are expected or necessary to tra- verse through a particular barrier. Moreover, an interesting feature of aMD is that the shape of the added potential conserves the underlying shape of the real one, such that minima are maintained as minima and barriers are preserved as barriers. In result, adding the aMD potential in practice simply modifies the relation between energy differences, so the distribution of sampling of differ- ent structures is still related to the original potential distribution and can be recovered exactly by reweighing. The general idea behind the accelerated molecular dynamics scheme can be expressed as

V(r) =



V(r), ifV(r)≥E

V(r) +ΔV(r), ifV(r)<E (2.10) whereV(r)is the modified potential,V(r)is the original potential, ΔV(r)is a continuous non- negative bias boost potential function andEis the boost energy threshold. The biasing potential is defined as

ΔV(r) = (E−V(r))2

α+ (E−V(r)) (2.11)

αis a tuning parameter that determines how deep the modified potential energy basin is. When is zero the modified potential is flat, which means free walking below the threshold energyE. Effect of various values ofαcan be seen in graph 2.3.1.

A careful choice of the boost energyEandαvalues has to be done to accelerate the sampling without destroying the underlying potential shape.

After the simulation is possible to retrieve the original distribution via a reweighting procedure, which converges to the canonical distribution, and the corrected canonical ensemble average of the system is obtained by simply reweighting each point in the configuration phase space on the modified potential by the strength of the Boltzmann factor of the bias energy at that particular point.

This procedure is, though, not completely accurate.

To account for the reweighting inaccuracies an improved method as been developed by Miaoet al.

[47]. It works by adding a harmonic boost potential to smooth the system potential energy surface.

The boost potential follows Gaussian distribution, thus the name Gaussian Accelerated Molecular Dynamics (GaMD), which allows for accurate reweighting using cumulant expansion to the second

(38)

Figure 2.3.1: Schematic representation of a hypothetical potential energy function and several bias potentials plotted at various values of α, and at a relative low value of the threshold boost energy,E.

order. Now the boost potential has the form ΔV(r) = 1

2k(E−V(r))2 (2.12)

wherekis the harmonic force constant, which has to satisfy the relation

k≤ 1

Vmax−Vmin

(2.13) whereVminandVmaxare the system minimum and maximum potential energies. Definedk k0(1/(Vmax−Vmin)); then 0<k0 1, and the threshold energyEfalls in the following range

Vmax ≤E≤Vmin+ 1

k (2.14)

k0determines the magnitude of the applied boost potential. With greaterk0, higher boost po- tential is added to the potential energy surface, which provides enhanced sampling of biomolecules across decreased energy barriers.

For simulations of a biomolecular system, the probability distribution along a selected reaction co- ordinateA(r)is written asp(A), whererdenotes the atomic positionsr1, ...,rN. Given the boost potential ΔV(r)of each frame,p(A)can be reweighted to recover the canonical ensemble distri- bution,p(A), as

p(Aj) = p(Aj) ⟨eβΔV(r)j

M

j=1⟨eβΔV(r)j

, j=1, ...,M (2.15)

whereMis the number of bins, β = KBTand⟨eβΔV(r)jis the ensemble-averaged Boltzmann

(39)

factor of ΔV(r)for simulation frames found in thejthbin. In order to reduce the energetic noise, the ensemble-averaged reweighting factor can be approximated using cumulant expansion

⟨eβΔV=exp {∑

k=1

βk k!Ck

}

(2.16) where the first three cumulants are given by

C1 =⟨ΔV⟩

C2 =⟨ΔV2⟩ − ⟨ΔV⟩2

C3 =⟨ΔV2⟩ −3⟨ΔV2⟩⟨ΔV⟩+2⟨ΔV⟩2

(2.17)

when the boost potential follows Gaussian distribution, cumulant expansion stops to the sec- ond order and provides a more accurate reweighting compared with non-Gaussian distribution.

Both aMD and GaMD have the option to apply the boost to the potential energy only, dihedral energy only or to both, called dual-boost. The dual-boost simulation provides higher acceleration than the other two types of acceleration and is generally the most used.

An increase of kinetic rates and sampling was proven for both methods, aMD [53–55] and GaMD [56], leading to speedup from 10 to 1010fold, depending on the system size, process and strength of acceleration, compared to cMD.

2.4 Constant pH Molecular Dynamics

Crystal structures usually don’t provide information about hydrogen’s location, due to their small size. Though a correct protonation assignment of each residue, especially for those close to a sen- sitive are, such as the chromophore binding pocket, is crucial for a reliable MD simulation. Ideally the assignment should be periodically updated during the simulation, especially for residues that have pKa close to the pH in use, which is usually pH 7, such as His.

A number of methods incorporating such events have been proposed, mostly involving the Poisson- Boltzmann equation to gauge the correct protonation state but not for the propagation of the MD trajectory. More recently a method using integral changes in protonation and consistent potentials for both titration and propagation was presented. Using a generalized Born model for the aqueous solvent, Langevin dynamics is used to propagate the solute trajectories. The interdependence of titration states and solute conformation is recognized by use of periodic Monte Carlo sampling of the protonation states of the titratable residues[57].

We used the method implemented in Amber v.18, in which MD is propagated in explicit solvent, but protonation state changes are still attempted using a Generalized Born implicit solvent model[58].

The workflow, shown in fig. 2.4.1, involves running MD for a certain number of steps, stripping the solvent and ions, attempting protonation state changes for each titratable residue in random order,

Referenzen

ÄHNLICHE DOKUMENTE

For example, RT reported only that: “There have been protests across Europe against global free trade deals, including the Transatlantic Trade and Industrial Pact (TTIP) with

The embryo is mounted in anterior orientation (A1-A3), in ventral orientation, focus on the proximal An1 and An2 articles (B1-B3), in ventral, tilted orientation (C1-C3),

Later in embryonic development (beginning at stage 22), anterior expression of Photd2 is highly restricted in the anterior head and only appears in the developing PC in a small

‘fake’ to an exact copy of an already existing work, which is then passed off as the original, and ‘forgery’ to a work that is not an exact copy, but rather done ‘in the

Here strong voltage dips in the po- wer supply may occur – such as from short circuits, from shutdown of an entire grid section or from the failure of a large power plant unit..

In the context of the doped spin liquids noted earlier, we obtain a FL* state when the holon and spinon bind to form a fermionic state with spin S = 1=2 and charge +e (a possible

The main political parties, particularly the Popular Party (PP) and the Socialists, which between them have ruled Spain for the last 30 years, and politicians as individuals

M.. Proposed models, algorithms.. c ) Mapping of economic damage. d ) Calculation of fields of pollutant maximum concentrations under normal un- favorable