

8.3 The structure of HasR alone

To this day, no crystal structure of HasR alone exists. This is most probably due to the long, flexible loops protruding into the extracellular medium, which inhibit crystal formation.

Two strategies to obtain a reliable HasR structure were devised: identifying suitable peptides for co-crystallization, and generating a relaxed structure using molecular dynamics.

8.3.1 Co-crystallization

The workflow for identifying the peptides was as follows. First, it had to be determined whether a candidate peptide would bind to the protein; docking programs are used for this task. Second, it had to be determined whether the bound peptide has a stabilizing effect on the surrounding area. This can be assessed with MD simulations by comparing the average fluctuation of the area influenced by the peptide with that of a simulation run without the peptide. To verify the stabilizing effect, these simulations need to be extremely long.

Observing protein relaxation requires, on average, simulation times beyond 0.5 ms [271].

A natural starting point for the identification of peptides was the structure of those parts of HasA that make close contact with HasR. Peptides constructed from these parts were already known to bind to HasR in at least one configuration. To select peptides for initial experiments, the force of interaction between each HasA residue and HasR was computed on a per-residue basis. From this, the five seven-residue peptides that bound most tightly to HasR were chosen for further analysis.

First, it had to be tested whether these peptides would also bind to other parts of HasR.

The computational method known as docking is best suited to this task. Here a small molecule, called the ligand, is fitted into a known binding pocket of the receptor. For this pocket a so-called receptor grid is generated, containing, e.g., its electrostatic properties. This grid can only be of limited size to keep computational requirements within reasonable bounds. The ligand can be treated as flexible to achieve an even better fit. Quite a number of programs designed for this specific task are available, both free of charge and commercially. These programs, however, require prior knowledge of the location of the binding site. Since this was exactly what we were interested in, these programs were not suitable for us.
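To make the idea of a receptor grid concrete, the following minimal Python/NumPy sketch precomputes a plain Coulomb potential on a cubic grid around a pocket centre, and shows how the number of grid points, and hence memory, grows with the cube of the box edge. The helper names, the unscreened 1/r form and the constant used are illustrative assumptions; actual docking programs use their own, more elaborate scoring terms.

```python
import numpy as np

# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2) (assumed convention)
COULOMB_K = 332.0636

def coulomb_grid(coords, charges, center, edge, spacing):
    """Hypothetical helper: electrostatic potential on a cubic grid.

    coords:  (N, 3) receptor atom positions in Angstrom
    charges: (N,) partial charges in units of e
    center:  (3,) grid centre, e.g. the centre of the binding pocket
    edge:    box edge length in Angstrom; spacing: grid spacing in Angstrom
    """
    n = int(round(edge / spacing)) + 1          # grid points per dimension
    axis = np.linspace(-edge / 2, edge / 2, n)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3) + np.asarray(center)
    # Naive O(points x atoms) distance matrix; distances are clipped to
    # avoid division by zero when a grid point coincides with an atom
    diff = points[:, None, :] - np.asarray(coords, dtype=float)[None, :, :]
    dist = np.maximum(np.linalg.norm(diff, axis=-1), 0.5)
    potential = COULOMB_K * (np.asarray(charges, dtype=float)[None, :] / dist).sum(axis=1)
    return potential.reshape(n, n, n)

def grid_points(edge, spacing):
    """The number of grid points grows as (edge / spacing + 1) ** 3."""
    return (int(round(edge / spacing)) + 1) ** 3
```

Doubling the box edge at constant spacing multiplies the number of grid points by roughly eight, which is why a grid covering the whole receptor quickly becomes impractical.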

Nevertheless, a selection of workarounds was tested. Several programs are available that are designed to detect cavities in a receptor. Without fail, these programs identified the electrostatically rich area on top of HasR as well as the bottom close to the plug domain.

Since we were mainly interested in whether the peptides bind to the outside of the barrel, in the areas forming the crystal contacts, this approach was not pursued further. The other workaround involved generating a large number of receptor grids and fitting the peptide into each of them. Here another limitation of the docking tools became obvious: the tools would only accept ligands with at most 300 atoms or 50 rotatable bonds, again due to computational restrictions. Our peptides exceed these limits by far.

Protein-protein docking tools are designed to dock a protein of arbitrary size to another protein. In this case, however, both proteins are kept rigid, which drastically limits the usefulness of these tools.

At this point we assumed that the peptides do not bind to the protein in the same configuration as they do in their original contact area. Therefore, linear peptides were constructed using the LEaP program from the Amber package. A home-grown Python script was used to induce random changes in the φ and ψ angles of up to five residues in each peptide. In this way we generated an ensemble of 500 randomly generated, but still largely linear (as opposed to hydrophobically collapsed), peptide conformations. This was found to be the minimal non-repetitive set of conformations. Finally, the protein docking program 'Daughter of Turnip' [272–275] was used to dock the peptides to the protein.
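The original script is not reproduced here. The following Python sketch only illustrates the idea of generating a non-repetitive ensemble of (φ, ψ) sets in which at most five of the seven residues deviate from an extended backbone; building the actual coordinates (e.g. with LEaP) is left out. The fully extended start of 180°/180°, the ±60° maximum change and the 10° bin width used to detect duplicates are assumptions chosen for illustration.

```python
import random

N_RESIDUES = 7             # peptide length used in the text
MAX_CHANGED = 5            # at most five residues are perturbed
EXTENDED = (180.0, 180.0)  # assumed (phi, psi) of a fully extended residue
MAX_DELTA = 60.0           # assumed largest random change, in degrees
GRID = 10.0                # assumed bin width used to detect duplicates

def snap(angle):
    """Round an angle to the grid and wrap it into [0, 360) degrees."""
    return (round(angle / GRID) * GRID) % 360.0

def perturb(rng):
    """One conformation: a tuple of (phi, psi) pairs, mostly extended."""
    angles = [EXTENDED] * N_RESIDUES
    for i in rng.sample(range(N_RESIDUES), rng.randint(1, MAX_CHANGED)):
        angles[i] = (snap(EXTENDED[0] + rng.uniform(-MAX_DELTA, MAX_DELTA)),
                     snap(EXTENDED[1] + rng.uniform(-MAX_DELTA, MAX_DELTA)))
    return tuple(angles)

def build_ensemble(size=500, seed=1, max_tries=100_000):
    """Collect `size` distinct conformations (fewer if max_tries runs out)."""
    rng = random.Random(seed)
    unique = set()
    for _ in range(max_tries):
        if len(unique) == size:
            break
        unique.add(perturb(rng))
    return sorted(unique)
```

Snapping the angles to a grid is one simple way to make "non-repetitive" testable: two conformations count as identical when all their binned angles agree.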

At this point the project was abandoned for two reasons. First, the docking calculations for all conformations of a single peptide alone would have taken more than 300 CPU years, not even including the MD simulations needed to assess the quality of the binding. Second, we reasoned that crystallizing a protein together with peptides is unlikely to yield a structure of the protein alone, but rather a structure of the protein with peptides attached to it, possibly even in the same conformation as with the complete HasA bound.

8.3.2 Locally enhanced sampling

The locally enhanced sampling (LES) functionality of AMBER 10 [17] was used to generate a relaxed conformation of the extracellular loops. A topology file was created using the HasR structure from the holo complex. The LES functionality of the leap program generated 10 copies of each individual loop. One copy of each loop was assigned to one set of loops, so the final topology file contained 10 sets of loops. This ensured that neighboring loops of one set would not overlap each other during the simulation. After the simulation, each set of loops can be extracted and investigated individually. In addition, the atomic masses were reduced to 1/10 of their original values. This standard LES procedure drastically increases the mobility of the atoms.
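The bookkeeping described above (10 copies per loop, one copy of every loop per set, masses scaled by 1/10) can be sketched in Python as follows. This is not how AMBER generates LES copies; the `Atom` class and both functions are hypothetical and only illustrate how sets are formed and extracted, and that the total mass of all copies equals the original mass.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record for illustration."""
    name: str
    loop: int      # index of the extracellular loop the atom belongs to
    mass: float
    copy: int = 0  # LES copy index (0 = original, not yet copied)

def make_les_copies(atoms, n_copies=10):
    """Return n_copies of every loop atom, each with mass / n_copies.

    Copy k of every loop is labelled with copy=k, so copy k of all loops
    together forms set k.
    """
    return [replace(a, copy=k, mass=a.mass / n_copies)
            for k in range(1, n_copies + 1) for a in atoms]

def extract_set(les_atoms, k):
    """Set k contains exactly one copy of each loop and can be analysed alone."""
    return [a for a in les_atoms if a.copy == k]
```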

The simulation was run for 50 ns at 300 K, controlled by a Berendsen thermostat, in a box of TIP3P water. Subsequent investigation of the ten individual sets revealed only very small differences between the simulation product and the crystal structure. This is consistent with the observation mentioned above that protein relaxation is typically observed only at around 0.5 ms, four orders of magnitude beyond the simulated time. The set with the largest overall RMS difference deviated by 1.9 Å from the HasR crystal structure, mainly due to side-chain orientations.
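The text does not state exactly how the RMS difference was computed. A common approach, sketched here in Python/NumPy as an assumption rather than the procedure used, is to superimpose the two coordinate sets optimally (Kabsch algorithm) and then take the root mean square of the remaining atomic displacements.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Both sets are centred, the optimal rotation is obtained from the SVD of
    the 3x3 covariance matrix (Kabsch algorithm), and the RMSD is computed
    between the rotated P and Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                  # covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # avoid improper rotations
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                           # rotation mapping P onto Q
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Whether the comparison uses all atoms or only the backbone matters: since the deviation reported above is dominated by side-chain orientations, a backbone-only RMSD would be expected to be smaller.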

Due to the higher memory requirements of LES simulations, the simulation efficiency is drastically reduced. The 50 ns simulation took 7.8 CPU years, i.e. about three months on 32 CPUs.
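As a consistency check, assuming ideal parallel scaling over the 32 CPUs, the reported cost translates into wall-clock time and throughput as follows:

```latex
t_\text{wall} = \frac{7.8\ \text{CPU-years}}{32\ \text{CPUs}} \approx 0.24\ \text{years} \approx 89\ \text{days} \approx 2.9\ \text{months},
\qquad
\frac{50\ \text{ns}}{89\ \text{days}} \approx 0.56\ \text{ns/day}.
```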

8.4 AmberPython

Simulation packages generally contain, apart from the main integrator, a selection of tools to generate and review input data and to evaluate and analyze output data. The output data usually consist of Cartesian coordinates or velocities per atom, saved at defined time steps. State variables of the system derived on the fly can also be stored during the simulation. In the case of the Amber MD package, the evaluation program of choice is ptraj, developed and maintained by the group of Prof. Cheatham.

Ptraj can read and modify files containing Cartesian coordinates, and can either convert the whole file into different data formats (such as the CHARMM format) or export individual time steps, or parts thereof, to other formats such as PDB. It can also perform a series of analyses, e.g. calculating the root mean square deviation (RMSD) from a given reference structure, counting the number of waters near a specified set of atoms, or generating average structures over a time period and calculating the average fluctuation during that period.
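Ptraj's own commands are not reproduced here. As an illustration of what the analyses compute, the following Python/NumPy sketch derives an average structure, the per-atom root mean square fluctuation around it, and a simple water count. The function names and the 3.5 Å cutoff are assumptions, and the trajectory is assumed to be already superimposed onto a common reference.

```python
import numpy as np

def average_structure(frames):
    """Mean coordinates over a trajectory of shape (n_frames, n_atoms, 3)."""
    return np.asarray(frames, dtype=float).mean(axis=0)

def atomic_fluctuation(frames):
    """Root mean square fluctuation of each atom around its average position.

    Assumes the frames were already fitted to a common reference, so that
    overall rotation and translation do not inflate the values.
    """
    frames = np.asarray(frames, dtype=float)
    deviation = frames - frames.mean(axis=0)      # (n_frames, n_atoms, 3)
    return np.sqrt((deviation ** 2).sum(axis=2).mean(axis=0))

def count_waters_near(water_oxygens, selection, cutoff=3.5):
    """Number of water oxygens closer than `cutoff` to any selected atom."""
    w = np.asarray(water_oxygens, dtype=float)
    s = np.asarray(selection, dtype=float)
    d = np.linalg.norm(w[:, None, :] - s[None, :, :], axis=-1)
    return int((d.min(axis=1) < cutoff).sum())
```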

Ptraj has a few limitations, however. It can only handle individual coordinate files. It is not possible to evaluate velocities or energy data collected during the simulation, nor is it possible to compare individual coordinate files derived from simulations with different starting positions or simulation control parameters. To overcome these limitations I have developed a number of functions and scripts in both bash and Python.
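One example of such a function is reading the energy records that Amber writes to its output (mdout) file. The Python sketch below assumes the usual layout, in which each record starts with an `NSTEP =` line, continues with `KEY = value` pairs and ends with a line of dashes; it is a hypothetical illustration, not the module's actual code. Records from the averages and fluctuations blocks at the end of a run would be picked up as well.

```python
import re

# "KEY = value" pairs; keys may contain brackets, digits, hyphens and spaces
# (e.g. "TIME(PS)", "1-4 NB"), values are plain or exponent-style numbers
PAIR = re.compile(
    r"([A-Z0-9][A-Za-z0-9()\-. ]*?)\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")

def parse_mdout_energies(text):
    """Return a list of dicts, one per energy record in an mdout file."""
    records, current = [], None
    for line in text.splitlines():
        if "NSTEP" in line:
            current = {}                 # a new energy record starts here
            records.append(current)
        elif current is None or line.strip().startswith("---"):
            current = None               # outside a record: skip the line
            continue
        for key, value in PAIR.findall(line):
            current[key.strip()] = float(value)
    return records
```

Because each record becomes a dictionary, energy time series from several simulations can then be compared directly, which is exactly what ptraj does not offer.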

I am currently assembling these functions into a ready-to-use Python module, similar to the well-known BioPython package [276, 277], that can be used in any Python project. The main goal behind this approach is to give modelers and developers working with the Amber simulation programs a robust, well-tested and open framework for their own endeavors, without having to deal with nitty-gritty, error-prone elementary tasks like reading files and extracting information.

The model class is designed to store all data present in a parameter file: atom name, number and element, along with mass, charge and van der Waals radius.
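Amber parameter/topology (.prmtop) files are organised in sections, each introduced by a `%FLAG` line and a Fortran-style `%FORMAT` line such as `20a4`, `10I8` or `5E16.8`. A minimal Python sketch of reading such a file into a dictionary of value lists might look as follows; the function names are hypothetical, and only the format codes mentioned in the comments are handled.

```python
import re

# Matches e.g. "%FORMAT(20a4)", "%FORMAT(10I8)", "%FORMAT(5E16.8)"
FORMAT = re.compile(r"%FORMAT\((\d+)([aAiIeEfF])(\d+)")

# prmtop CHARGE entries are stored as charge in e multiplied by 18.2223
AMBER_CHARGE_SCALE = 18.2223

def parse_prmtop(text):
    """Split a prmtop file into {FLAG_NAME: [values]} using fixed-width fields."""
    sections, flag, fmt, lines = {}, None, None, []

    def flush():
        if flag is None or fmt is None:
            return
        kind, width = fmt
        values = []
        for line in lines:
            for i in range(0, len(line), width):
                field = line[i:i + width]
                if not field.strip():
                    continue
                if kind == "a":
                    values.append(field.strip())      # names, e.g. atom names
                elif kind == "i":
                    values.append(int(field))         # pointers, indices
                else:
                    values.append(float(field))       # charges, masses, ...
        sections[flag] = values

    for raw in text.splitlines():
        if raw.startswith("%FLAG"):
            flush()
            flag, fmt, lines = raw.split()[1], None, []
        elif raw.startswith("%FORMAT"):
            m = FORMAT.match(raw)
            fmt = (m.group(2).lower(), int(m.group(3)))
        elif raw.startswith("%"):
            continue                                  # %VERSION, %COMMENT
        elif flag is not None:
            lines.append(raw)
    flush()
    return sections

def charges_in_e(raw):
    """Convert prmtop CHARGE values to units of the elementary charge."""
    return [q / AMBER_CHARGE_SCALE for q in raw]
```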

It also stores bonding data, equilibrium bond lengths and angles, their respective force constants, and dihedral data such as phase and periodicity. The hierarchy is similar to that of BioPython's PDB module [278]: an atom belongs to, and inherits from, a residue, which in turn belongs to a chain, which in turn belongs to the model. Each of the structural objects also inherits from an Entity class. In general, a child Entity object (i.e. Atom, Residue, Chain) can be extracted from its parent (i.e. Residue, Chain, Model, respectively) by using its id as a key.
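The key-based access described above can be sketched as follows. The class layout is hypothetical and only loosely modeled on BioPython's Bio.PDB, where children are likewise retrieved with `parent[child_id]`; it shows the Entity base class and the Model/Chain/Residue/Atom tree, not the module's actual implementation.

```python
class Entity:
    """Hypothetical base class: one node in the Model/Chain/Residue/Atom tree."""

    def __init__(self, entity_id):
        self.id = entity_id
        self.parent = None
        self._children = {}

    def add(self, child):
        """Attach a child and return it, so calls can be chained."""
        child.parent = self
        self._children[child.id] = child
        return child

    def __getitem__(self, child_id):
        return self._children[child_id]      # child lookup by its id

    def __iter__(self):
        return iter(self._children.values())

    def __len__(self):
        return len(self._children)

class Model(Entity):
    pass

class Chain(Entity):
    pass

class Residue(Entity):
    def __init__(self, entity_id, name):
        super().__init__(entity_id)
        self.name = name

class Atom(Entity):
    def __init__(self, entity_id, name, mass, charge):
        super().__init__(entity_id)
        self.name, self.mass, self.charge = name, mass, charge
```

Usage: `model["A"][1][1]` returns atom 1 of residue 1 in chain A, and `atom.parent` leads back up the tree.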

The model class is best instantiated from an Amber parameter file generated with one of Amber's leap programs, since such a file contains all relevant data except chain names and title information. The model object can later be completed with the original PDB file used for generating the .prmtop, which provides alternative atom and residue numbers and chain names. The numbering in Amber parameter files starts at one for both atoms and residues and increases continuously, disregarding possible offsets of individual chains in the original PDB file. This is done to provide unique IDs for all atoms in the .prmtop. When the model object is instantiated from a parameter file, chain names are assigned automatically: connected residues are given the same chain name, starting with a capital A and advancing with each new fragment regardless of its nature (protein, DNA or ligand). Waters and solvent molecules that are marked as such are given a 'W'. If the model contains more than 20 fragments (waters not counted), each fragment is given an 'A' and the final labeling is left to the user.
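The chain-naming rule can be expressed as a connected-components problem over residues. The Python sketch below uses a union-find structure on inter-residue bonds; the input format (residue names plus pairs of bonded residue indices) and the set of solvent residue names are assumptions made for illustration.

```python
import string

SOLVENT = {"WAT", "HOH"}   # assumed names of residues marked as solvent

def assign_chains(res_names, bonds, max_fragments=20):
    """Label residues with chain ids derived from inter-residue bonds.

    res_names: residue names in prmtop order (index 0 = residue 1)
    bonds:     iterable of (i, j) residue index pairs sharing a covalent bond
    """
    parent = list(range(len(res_names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in bonds:
        parent[find(i)] = find(j)

    # Non-solvent fragments in order of their first residue
    order = {}
    for i, name in enumerate(res_names):
        if name not in SOLVENT:
            order.setdefault(find(i), len(order))

    if len(order) > max_fragments:
        chain_of = {root: "A" for root in order}      # user relabels later
    else:
        chain_of = {root: string.ascii_uppercase[k] for root, k in order.items()}
    return ["W" if name in SOLVENT else chain_of[find(i)]
            for i, name in enumerate(res_names)]
```

For example, a three-residue peptide, a heme, two waters and a two-residue DNA strand would be labeled A, A, A, B, W, W, C, C.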

Instantiating the model from a PDB file is also possible, though not recommended. For standard DNA and protein residues, PDB files do not contain bonding information, and the interpreter has to derive it from the residue and atom names. This might not succeed, since different protonation states (for example HIS, HIE, HID and HIP for histidine) or other modifications like selenomethionine drastically increase the number of standard cases. All other parameters, like charge and mass, have to be derived the same way, using only the atom name. This can also be error-prone: e.g. the difference between the alpha carbon and a calcium is whether the "a" is capitalized or not. For non-standard residues, the CONECT section at the end of the PDB file is examined.
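The following Python sketch illustrates why name-based interpretation is fragile. It maps the Amber histidine variants to one canonical name and guesses elements from atom and residue names; the ion table and all function names are assumptions, and the heuristic fails for any case it does not explicitly know.

```python
HIS_VARIANTS = {"HIS", "HID", "HIE", "HIP"}   # histidine protonation states
# Assumed table of single-atom ion residues whose atom name equals the residue name
ION_RESIDUES = {"CA": "Ca", "MG": "Mg", "ZN": "Zn", "NA": "Na", "CL": "Cl"}

def canonical_residue(res_name):
    """Map protonation-state variants back to one standard residue name."""
    return "HIS" if res_name in HIS_VARIANTS else res_name

def guess_element(atom_name, res_name):
    """Guess an element from atom and residue names (heuristic, may fail)."""
    name = atom_name.strip()
    res = res_name.strip().upper()
    if res in ION_RESIDUES and name.upper() == res:
        return ION_RESIDUES[res]           # calcium ion "CA" in residue "CA"
    if name.upper() == "SE":
        return "Se"                        # selenium of selenomethionine
    return name.lstrip("0123456789")[0].upper()   # "CA" -> C, "1HB" -> H
```

The same atom name "CA" yields carbon in an amino acid but calcium in an ion residue, which is exactly the ambiguity a parameter file avoids.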

After the model has been instantiated, coordinates and velocities can be assigned to the atoms. The model can handle PDB files, Amber restart files with and without velocities, and simulation (trajectory) files. The latter can be quite large. The package therefore does not attempt to hold everything in the computer's memory unless specifically asked to; instead, it indexes these files at a ten-frame interval for faster access.
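A minimal Python sketch of such a ten-frame index for an ASCII Amber trajectory follows. It assumes the common layout of one title line followed, per frame, by ceil(3N/10) lines of 10F8.3 coordinates plus an optional box line; the class name and interface are hypothetical. Only the byte offset of every tenth frame is kept, so a random frame is reached with one seek and at most nine skipped frames.

```python
import math

class LazyTrajectory:
    """Hypothetical reader for an ASCII Amber trajectory (mdcrd).

    Assumed layout: one title line, then per frame ceil(3 * n_atoms / 10)
    lines of 10F8.3 coordinates, plus one box line if the run was periodic.
    """

    def __init__(self, path, n_atoms, has_box=False, stride=10):
        self.path, self.n_atoms, self.stride = path, n_atoms, stride
        self.n_coord_lines = math.ceil(3 * n_atoms / 10)
        self.lines_per_frame = self.n_coord_lines + (1 if has_box else 0)
        self.offsets = []          # byte offsets of frames 0, stride, 2*stride, ...
        self.n_frames = 0
        with open(path, "rb") as f:
            f.readline()                              # skip the title line
            while True:
                pos = f.tell()
                block = [f.readline() for _ in range(self.lines_per_frame)]
                if not block[-1]:
                    break                             # end of file reached
                if self.n_frames % stride == 0:
                    self.offsets.append(pos)
                self.n_frames += 1

    def frame(self, index):
        """Return frame `index` as a list of (x, y, z) tuples."""
        if not 0 <= index < self.n_frames:
            raise IndexError(index)
        with open(self.path, "rb") as f:
            f.seek(self.offsets[index // self.stride])
            for _ in range((index % self.stride) * self.lines_per_frame):
                f.readline()                          # skip to the wanted frame
            values = []
            for _ in range(self.n_coord_lines):
                line = f.readline().rstrip(b"\r\n")
                for i in range(0, len(line), 8):      # fixed 8-character fields
                    field = line[i:i + 8]
                    if field.strip():
                        values.append(float(field))
        return [tuple(values[i:i + 3]) for i in range(0, 3 * self.n_atoms, 3)]
```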

In this work the heme acquisition system of the opportunistic enterobacterium Serratia marcescens is investigated using in silico simulation methods.

The system comprises a small soluble protein called HasA (chapter 4.5.1) and an outer membrane receptor called HasR (chapter 4.5.2). HasA is secreted to the extracellular medium by an ABC transporter and captures a heme molecule. The heme-loaded HasA then binds to HasR. During the binding process the heme molecule is removed from the HasA binding pocket and transferred to the HasR binding site deep within the receptor.

This process occurs spontaneously (chapter 4.5.2). The transport through the membrane, however, requires energy. This is supplied by the TonB homolog HasB, which spans the periplasmic cleft and binds to the receptor via the TonB-box. A conformational change in HasB supposedly pulls the plug domain of HasR out of the barrel, dragging the heme molecule along (chapter 4.3).

Here, the path of the heme molecule was investigated, starting from holoHasA and ending with the heme in the periplasmic cleft. Since it is still very complicated and time-consuming to obtain intermediate structures of ligand transfer or complex formation using X-ray crystallography or NMR experiments, all-atom molecular dynamics simulations were used. Here a force field (chapter 3.2) supplies the forces with which Newton's second law of motion is integrated (chapter 3.3). Starting from a known structure, the trajectory of the protein in phase space can be computed iteratively (chapter 3.1). The use of advanced simulation methods and artificial biasing potentials (chapter 3.3.3) reduces the computational time by preventing the system from exploring regions of phase space of lesser interest.
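The iterative integration of Newton's second law can be illustrated with a generic leapfrog Verlet step, shown here in Python for a one-dimensional harmonic oscillator. This is a toy example chosen for clarity and does not reproduce the integrator settings used in this work (compare tables 5.1, 6.1 and 7.1).

```python
def leapfrog(x0, v0, force, mass, dt, n_steps):
    """Integrate F = m * a with the leapfrog Verlet scheme.

    Velocities are kept at half steps:
        v(t + dt/2) = v(t - dt/2) + dt * F(x(t)) / m
        x(t + dt)   = x(t) + dt * v(t + dt/2)
    Returns the positions x(0), x(dt), ..., x(n_steps * dt).
    """
    x = x0
    v_half = v0 - 0.5 * dt * force(x0) / mass   # v(-dt/2) from the initial velocity
    positions = [x]
    for _ in range(n_steps):
        v_half += dt * force(x) / mass
        x += dt * v_half
        positions.append(x)
    return positions
```

Usage: with `force=lambda x: -x`, `mass=1.0`, `x0=1.0`, `v0=0.0`, `dt=0.01` and 1000 steps, the final position closely follows the analytical solution cos(10), and the amplitude stays bounded, reflecting the good long-term stability of Verlet-type integrators.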

For this protein system a number of NMR and crystal structures are available: an NMR structure of apoHasA and a crystal structure of holoHasA. In addition, the wild-type structure of the complex with the heme bound to HasR is available, as well as a heme-free structure and a mutant structure in which isoleucine 671 is replaced by glycine; in the latter, the heme is still bound to HasA. A double mutant structure, with residues 297 and 800 mutated to alanine, has also been made available; it shows a complex that might represent an early stage of complex formation.

From the start, the three aspects of this project (complex formation, heme transfer and heme transport) were intended to be parts of one large simulation. Therefore, despite being investigated separately, the main factors determining each simulation (integration scheme, temperature control, etc.; compare tables 5.1, 6.1 and 7.1) were kept strictly identical. The main idea was to be able to merge the resulting simulation files of the three aspects into one all-encompassing movie. Therefore, the whole protein was used in each aspect, although conventionally the simulation is restricted to the important parts, for example simulating only the extracellular part of HasR for complex formation and using restraints to mimic the barrel.

As a side effect, all three aspects had to be simulated and equilibrated in the same way. Whenever one aspect showed unexpected or unbiological behavior, or even crashed with one set of parameters, and an improved set had to be derived, the other two projects had to be redone with this new set as well. While this was certainly time-consuming, it resulted in a set-up that has proven extremely robust against a wide variety of treatments. The described behavior, or the lack thereof, is therefore very probably a property of the protein itself and not an effect of the method.