
A.2 Tools and Libraries

A.2.3 Libraries and Tools Connected to HYDE

Examples

Create a new torsion library based on a given multi mol file and a torsion library hierarchy:

TorsionPatternMiner --out DIR --initialtorlib TOR_LIB --molfile MULTIMOLFILE --database mols.db --selectivematching false --storeincsdhistograms true

The new torsion library among the resulting output files should then be used to check the quality of the peak determination by running the tool in single matching mode on the molecule set.

TorsionPatternMiner --out DIR --initialtorlib DIR/newtorlib.xml --database mols.db --donotuseterminalbonds true

The resulting bondanglesmatching.csv can then be analyzed with our Python script createpaperplots.py to generate the plot of torsion rules against red flags in per cent. It is advisable to compare the bondanglesmatching.csv file of the initial torsion library with the one generated with the new torsion library. Likewise, a different molecule set such as the Ligand Expo can be used to test its agreement with the presented torsion library.

The Python script sortandcomparetorsionpatterns.py takes two such files as input and compares each bond, angle data triplet in terms of the matching torsion rule and the determined angle quality. If a data triplet matches a different torsion rule and/or receives a different quality assessment, this is quantified in the output files and annotated with examples. This analysis was applied to the resorted TorLib to guard against unwanted sorting effects until only reasonable switches were found.
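The kind of comparison described here can be pictured with a minimal Python sketch. The CSV column names (molecule, bond, angle, rule, quality) and both function names are hypothetical and chosen only for illustration; the actual layout of bondanglesmatching.csv and the internals of the script are not specified in this text.

```python
import csv
import io


def read_matches(csv_text):
    """Map each (molecule, bond, angle) key to its (rule, quality) pair."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical column names, assumed for this sketch only.
        key = (row["molecule"], row["bond"], row["angle"])
        table[key] = (row["rule"], row["quality"])
    return table


def compare_matches(old, new):
    """List keys whose torsion rule or quality differs between two runs."""
    rule_switches, quality_switches = [], []
    for key in sorted(old.keys() & new.keys()):
        old_rule, old_quality = old[key]
        new_rule, new_quality = new[key]
        if old_rule != new_rule:
            rule_switches.append((key, old_rule, new_rule))
        if old_quality != new_quality:
            quality_switches.append((key, old_quality, new_quality))
    return rule_switches, quality_switches
```

Only entries present in both files are compared; entries missing from one file would need separate reporting.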

visualizeContTorScoreFromPatMiner.py takes as input the directory with the extracted data and visualizes the given patterns.

HydeDebugGUI

The HydeDebugGUI (HDG) is a graphical tool to analyze and optimize HYDE scores.

It was initially developed by Dr. Schneider and further expanded by Dr. Nittinger and me. Because HDG is currently incompatible with Qt on Windows, a full reimplementation was carried out by M. Grössler under my supervision and is being prepared for the merge into the NAOMI mainline. The merge is currently not possible since the ability to compute structures on the command line has not yet been reimplemented.

HDG visualizes the active site together with the hydrogen bonds relevant for HYDE and other information such as the positions of possible waters (Figure A.4). It also shows residual HYDE scores, e.g. for thermostability analysis, and protein-protein interface scores in a given structure. Great care has been taken to allow a full export of the result of a geometrical optimization, including proton positions.

To identify structural deficits, atomic B factor and EDIA coloring are integrated.

Ligands in the score table are marked yellow when strained torsions are present. They are marked orange when a close heavy atom contact is detected. If both are present, the ligand entry is colored red.
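The marking rule can be summarized in a few lines of Python. This is only a sketch of the rule as stated; the function name and the return value "none" for unmarked entries are assumptions and do not reflect the actual HDG code.

```python
def ligand_row_color(has_strained_torsion, has_close_contact):
    """Return the score-table color for a ligand entry."""
    if has_strained_torsion and has_close_contact:
        return "red"      # both problems present
    if has_close_contact:
        return "orange"   # close heavy atom contact only
    if has_strained_torsion:
        return "yellow"   # strained torsions only
    return "none"         # assumed label for unmarked entries
```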

The results of e.g. the small series data set given on the command line can then be analyzed with the analysis scripts of the CASF benchmarks and the Python framework called hyde evaluator, written by our cooperation partner BioSolveIT.

For working graphically with HYDE, the HDG is the central tool.

geohydeoptimizer

geohydeoptimizer was written by BioSolveIT to analyze the dispersion of ligand configurations sampled on a small scale when they are optimized with GeoHYDE. RMSD-based cluster analysis reveals the spread of HYDE scores per RMSD cluster as well as the number of such clusters and the existence of singletons.

A strong increase in the number of RMSD clusters or in the HYDE score difference per cluster indicates the introduction of an unwanted step function into GeoHYDE. The tool is a derivative and an extension of my first, now outdated, benchmarking tool called hydeoptimizer.
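The quantities observed in this analysis can be illustrated with a small Python sketch. It uses a simple greedy leader clustering, which is an assumption made for illustration; the clustering method and threshold actually used by geohydeoptimizer are not stated here, and all function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two poses given as equally ordered lists of (x, y, z)."""
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))


def leader_clusters(poses, threshold):
    """Assign each pose to the first cluster whose leader lies within threshold."""
    clusters = []
    for index, pose in enumerate(poses):
        for cluster in clusters:
            if rmsd(poses[cluster[0]], pose) <= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])  # pose starts a new cluster
    return clusters


def cluster_report(poses, scores, threshold=1.0):
    """Report cluster count, singleton count and score spread per cluster."""
    clusters = leader_clusters(poses, threshold)
    spreads = [max(scores[i] for i in c) - min(scores[i] for i in c)
               for c in clusters]
    singletons = sum(1 for c in clusters if len(c) == 1)
    return {"n_clusters": len(clusters),
            "n_singletons": singletons,
            "score_spreads": spreads}
```

Comparing such a report before and after a change to the scoring function shows whether the number of clusters or the score spread per cluster has grown.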

GeohydeEvaluator

GeohydeEvaluator is a newly developed, highly configurable tool for benchmarking GeoHYDE that integrates geohydeoptimizer's ligand sampling ability. In its normal configuration, it optimizes a specific protein-ligand complex with regard to HYDE and reports partial initial and final scores. The results of a pocket flexibility analysis with SIENA can be supplied to the tool. If such flexible residues are present in the pocket, their side chains will also be geometrically optimized in combination with the ligand. In the following, examples as well as necessary details of the implementation are outlined.

Figure A.4: HydeDebugGUI displays e.g. the pocket with HYDE or EDIAm colors.

Available tool options:

• --resultFolder Specify result output folder (required)

• --complex Complex PDB file (required)

• --config Configuration file (required)

• --ligand Ligand or conformers of ligand SDF file

• --density Density in ccp4 file format

• --molId Mol id of the ligand in the PDB file to be analyzed. Format should be HET ID CHAIN RESSEQID

• --waters Geometrically optimize waters in the binding pocket which initially have at least an EDIAm of a certain value

• --sienadb SIENA result DB to extract flexible residues from

• --printconfig Write out config file

Examples

Evaluate a specific ligand with GeohydeEvaluator:

GeohydeEvaluator --resultFolder YOURLOCATION --complex YOURCOMPLEX --molID ID_CHAIN_RESSEQID --density DENSITY_PDBID.ccp4

An initial tool configuration can be obtained by setting --printconfig to True. It can subsequently be fed back into the tool:

GeohydeEvaluator --resultFolder YOURLOCATION --complex YOURCOMPLEX --molID ID_CHAIN_RESSEQID --density DENSITY_PDBID.ccp4 --config YOURCONFIG

When RunSampling is switched to true in the configuration file, the ligand is sampled at the beginning. All conformations are then optimized and evaluated. Be aware that sampling around torsion bonds should be strongly restricted by setting MaxNumberTorsionWobblingPoses, the maximum number of conformers generated by sampling around torsion bonds, to a value of 30, for example. The sampling can further be configured for rotation, translation and torsion bond sampling.

GeohydeEvaluator --resultFolder YOURLOCATION --complex YOURCOMPLEX --molID ID_CHAIN_RESSEQID --density DENSITY_PDBID.ccp4 --config YOURMODIFIEDCONFIG

Optimizing with flexible side chains is possible by giving GeohydeEvaluator a SIENA result database with an entry for the specific PDB structure and switching FlexibleResidues to true. Please be aware that only flexible residues in the active site of the specified ligand can be considered. If none of them is close enough to the ligand, GeohydeEvaluator automatically switches to a normal optimization without protein flexibility. If flexible residues are detected, their initial EDIAm and their final EDIAm after the optimization are reported in an additional column in the output file.

GeohydeEvaluator --resultFolder YOURLOCATION --complex YOURCOMPLEX --molID ID_CHAIN_RESSEQID --density DENSITY_PDBID.ccp4 --sienadb SIENARESULTDB

The experiment to consecutively optimize all waters in the active site with an initial EDIAm above e.g. 0.8 is run as follows:

GeohydeEvaluator --resultFolder YOURLOCATION --complex YOURCOMPLEX --molID ID_CHAIN_RESSEQID --density DENSITY_PDBID.ccp4 --waters 0.8
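When many complexes are processed, the calls shown in these examples can be assembled programmatically. The following Python sketch only builds the argument list from the options used in the examples; the helper name build_command is hypothetical, and the placeholder values are the ones from the text.

```python
import shlex


def build_command(result_folder, complex_file, mol_id, density,
                  config=None, sienadb=None, waters=None):
    """Assemble a GeohydeEvaluator call from the options used in the examples."""
    args = [
        "GeohydeEvaluator",
        "--resultFolder", result_folder,
        "--complex", complex_file,
        "--molID", mol_id,
        "--density", density,
    ]
    # Optional parts are appended only when a value is supplied.
    if config is not None:
        args += ["--config", config]
    if sienadb is not None:
        args += ["--sienadb", sienadb]
    if waters is not None:
        args += ["--waters", str(waters)]
    return args


command = build_command("YOURLOCATION", "YOURCOMPLEX", "ID_CHAIN_RESSEQID",
                        "DENSITY_PDBID.ccp4", waters=0.8)
print(shlex.join(command))
```

The returned list can be passed to subprocess.run once the placeholders are replaced by real paths.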

The subsequent passages describe the internal workflow, from reading an input file through optimization to scoring a pocket with HYDE.

Preprocessing

In the following, the preprocessing of a complex with its ligand is explained. A complex can be provided in PDB format and is translated into a NAOMI complex with the help of the ComplexLib::ComplexFactory. The ligand to be optimized can be added either with an SDF file or by specifying its molecular id, giving the triplet HETcode Chain ResSeqId to the executable. In the first case, the SDF file is processed and all entries are treated as configurations of the same ligand. In the second case, the molecules, ions and waters of the complex are searched for a match of the in-file id and chain. The matching structure is then used as the ligand.
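The lookup by molecular id can be sketched in Python as follows. The separator handling (underscore or whitespace), the integer residue number, and the names MolId, parse_mol_id and find_ligand are assumptions for illustration and do not describe the NAOMI implementation.

```python
from typing import NamedTuple


class MolId(NamedTuple):
    het_code: str
    chain: str
    res_seq_id: int


def parse_mol_id(text):
    """Split 'HET_CHAIN_RESSEQID' (or the same with spaces) into its three parts."""
    parts = text.replace("_", " ").split()
    if len(parts) != 3:
        raise ValueError(f"expected HETcode Chain ResSeqId, got {text!r}")
    return MolId(parts[0], parts[1], int(parts[2]))


def find_ligand(mol_id, candidates):
    """Return the first candidate (molecule, ion or water) matching the id."""
    for candidate in candidates:
        if candidate == mol_id:
            return candidate
    return None  # no matching structure in the complex
```

For example, parse_mol_id("TLA_C_4001") yields the HET code TLA, chain C and residue number 4001.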

Subsequently, the active site needs to be prepared. First, the standard HYDE site with a radius of 8 Å around the ligand as well as the big site with a radius of 11.5 Å are created. In accordance with the workflow for treating waters used in warpp [41], all waters are then removed from both pockets, and Protoss [5] is run for both sites and the ligand. Protoss is not applied to the ligand only if the user supplies precomputed ligand configurations, to avoid changes in its proton configuration. If a SIENA result database is defined, flexible residues are identified in the pocket with functionality that had to be moved from SIENA to the SIENAToolLib as part of this thesis.

If waters are to be optimized, they are first evaluated with EDIAm, and those above the given cutoff are collected. Each water to be optimized then needs to be removed from the complex so that it is not duplicated, while the other waters are kept as part of the pocket.
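The water selection can be pictured with a short Python sketch. The dictionary input, the function name and the inclusive comparison (at least the cutoff, following the description of the --waters option) are assumptions made for illustration.

```python
def water_optimization_plan(ediam_by_water, cutoff):
    """Pair each water selected by its EDIAm with the waters that stay in the pocket."""
    selected = [water for water, ediam in ediam_by_water.items()
                if ediam >= cutoff]
    plan = []
    for water in selected:
        # The water being optimized is taken out; all others remain in the pocket.
        remaining = [other for other in ediam_by_water if other != water]
        plan.append((water, remaining))
    return plan
```

Each entry of the plan corresponds to one consecutive optimization run for a single water.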

File Output

After the optimization, poses can be written out by storing the ligand in its own SDF file. Cofactors of the complex are also written into an SDF file, while the complex with its annotated protons is written to a PDB file. Score data is written into a CSV output file. Pockets are available with and without explicitly placed waters, for which the recently published tool warpp had to be refactored.

Python Framework

The accompanying Python framework reads and stores all scores in SQLite databases. It automatically detects the affiliation of the pocket to one of the ProtFlex18 data sets. Connecting to both the score and the SIENA cluster analysis database, it generates all necessary analysis plots.
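How score data from a CSV file can end up in an SQLite database is sketched below with the Python standard library. The table layout, the column names and the function load_scores are hypothetical; the schema used by the actual framework is not described in this text.

```python
import csv
import io
import sqlite3


def load_scores(csv_text, connection):
    """Insert score rows from CSV text into a 'scores' table and return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS scores ("
        "pdb_id TEXT, ligand TEXT, initial_score REAL, final_score REAL)"
    )
    # Column names are assumed for this sketch only.
    rows = [
        (row["pdb_id"], row["ligand"],
         float(row["initial_score"]), float(row["final_score"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    connection.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


def score_changes(connection):
    """Return (pdb_id, final minus initial score) for every stored row."""
    query = "SELECT pdb_id, final_score - initial_score FROM scores ORDER BY pdb_id"
    return connection.execute(query).fetchall()
```

With sqlite3.connect(":memory:") the sketch can be tried without creating a file; the analysis plots would then be generated from queries of this kind.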

Structural Deviations

The classes GlobalRotationTranslationWobbling (GRTW) and TorsionWobbling (TW), as well as their combination GlobRotTransLocalTorWobbling (GRTLTW), in the NAOMI library Molecule allow the coordinates of a molecule to be perturbed in GeohydeEvaluator. GRTW and TW both need range intervals and step sizes to direct the modifications.

The first class allows the molecule to be rotated around its center of mass as well as to be translated along the unit vectors in $\mathbb{R}^3$. The torsional perturbation allows rotation around each rotatable bond while producing only a limited number of molecule configurations. To this end, a root atom with the minimum distance to any atom is determined in the initialization phase. Then, all rotatable bonds are grouped by their minimum distance to the root atom. Going from the most distant set of bonds towards the root atom, all rotatable bonds with at least the current distance to the root atom are allowed to be perturbed as long as the total number of generated configurations does not exceed the maximum number of allowed configurations. Thus, the total number of configurations is as follows:

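$$c_{\mathrm{GRTW}} = \#\mathrm{step}_{\mathrm{rot}}^{3} \cdot \#\mathrm{step}_{\mathrm{trans}}^{3} \tag{A.1}$$

$$c_{\mathrm{TW}} = \#\mathrm{step}_{\mathrm{tor}}^{\#\mathrm{rotbonds}} \tag{A.2}$$

$$c_{\mathrm{GRTLTW}} = c_{\mathrm{GRTW}} \cdot c_{\mathrm{TW}} \tag{A.3}$$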
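The three counts and the distance-based restriction of torsional sampling can be made concrete with a short Python sketch. The function names and the representation of bonds as a dictionary of distances to the root atom are hypothetical, and the stopping rule in select_rotatable_bonds is one reading of the description above, not the verified NAOMI behavior.

```python
def count_grtw(rot_steps, trans_steps):
    """Equation A.1: steps per rotation axis and per translation axis, each cubed."""
    return rot_steps ** 3 * trans_steps ** 3


def count_tw(tor_steps, n_rot_bonds):
    """Equation A.2: torsion steps to the power of the rotatable bond count."""
    return tor_steps ** n_rot_bonds


def count_grtltw(rot_steps, trans_steps, tor_steps, n_rot_bonds):
    """Equation A.3: product of the global and the torsional count."""
    return count_grtw(rot_steps, trans_steps) * count_tw(tor_steps, n_rot_bonds)


def select_rotatable_bonds(bond_distances, tor_steps, max_configs):
    """Add bond groups from the most distant one inwards while c_TW stays within the limit."""
    selected = []
    for distance in sorted(set(bond_distances.values()), reverse=True):
        group = [bond for bond, d in bond_distances.items() if d == distance]
        if count_tw(tor_steps, len(selected) + len(group)) > max_configs:
            break  # adding this group would exceed the allowed number
        selected += group
    return selected
```

With three steps per axis and four torsion steps on five rotatable bonds, the combined count is 729 · 1024 = 746496, which illustrates how quickly the number of configurations grows.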

A step can also have the value zero, thus being neutral. Since the number of possible configurations can escalate quickly, the GeohydeEvaluator randomly selects only twenty conformers from them. These are not allowed to exceed an RMSD of 2 Å. A ligand configuration is also removed if its intramolecular clash is higher than that of the original ligand pose. Only slight intermolecular clash is accepted: either at most three atom contacts or at most as much contact as the original ligand configuration had with the protein. Contact is identified by analyzing the intersections of the van der Waals spheres of the protein atoms with spheres of 0.4 times the van der Waals radius around the heavy atoms of the ligand.
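The filtering criteria can be expressed in a minimal Python sketch. Counting contacts per atom pair, the order of the checks, and all function and parameter names are assumptions for illustration; only the thresholds (twenty conformers, 2 Å, three contacts, factor 0.4) are taken from the text.

```python
import math
import random


def count_contacts(ligand_heavy_atoms, protein_atoms, scale=0.4):
    """Count pairs where a protein vdW sphere intersects a scaled ligand sphere.

    Both inputs are lists of ((x, y, z), vdw_radius) tuples.
    """
    contacts = 0
    for ligand_pos, ligand_radius in ligand_heavy_atoms:
        for protein_pos, protein_radius in protein_atoms:
            if math.dist(ligand_pos, protein_pos) < protein_radius + scale * ligand_radius:
                contacts += 1
    return contacts


def accept_pose(pose_rmsd, intra_clash, n_contacts,
                ref_intra_clash, ref_contacts,
                max_rmsd=2.0, max_contacts=3):
    """Apply the RMSD, intramolecular clash and contact criteria to one pose."""
    if pose_rmsd > max_rmsd:
        return False
    if intra_clash > ref_intra_clash:
        return False
    # Accepted with at most three contacts or at most the reference contact count.
    return n_contacts <= max(max_contacts, ref_contacts)


def select_poses(poses, n=20, seed=None):
    """Randomly pick at most n poses from the generated configurations."""
    rng = random.Random(seed)
    if len(poses) <= n:
        return list(poses)
    return rng.sample(poses, n)
```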

(a) Degrees of freedom in the optimization in the case of a flexible molecule.

(b) Optimization workflow with tryptophan as ligand, annotated with the participating code libraries in NAOMI.

Figure A.5: Optimization workflow with its degrees of freedom.

Optimization

NumOptimization was created by me as the library for gradient-free optimization for HYDE. In the middle of my thesis, Florian Flachsenberg extended the software to also allow gradient-based optimization. As a result, the libraries NumOptimization and NumOptimizationHelper now contain all abstract classes necessary for using NLopt to optimize a set of atoms, an active site or a ligand. Their implementation for GeoHYDE can be found in the Optimization directory of the Hyde library. Geometrically optimizing a ligand in a fixed protein pocket means allowing global rotation and translation of the ligand. In addition, rotatable bonds following the criteria of the TorsionLib (Table 4.1) as well as single bonds towards a hydrogen donor should be rotatable throughout the optimization (see Figure A.5(a)). These types of bonds can also be made flexible in a protein side chain.

Following the workflow depicted in Figure A.5(b), the initial positions of the ligand and other groups are the baseline against which the optimization strategy suggests changes. The changes are always applied to the original pocket configuration, and the result is scored with the active GeoHYDE terms and passed to BOBYQA in the external NLopt package.

The algorithm then integrates the score in its calculations and proposes the next pocket configuration to be tested. The cycle of suggesting, applying changes and scoring them is repeated until termination criteria are met.
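The cycle of suggesting, applying and scoring can be illustrated with a small gradient-free loop in Python. The sketch deliberately uses a simple coordinate pattern search as a stand-in, not BOBYQA, and a toy score instead of GeoHYDE; only a translation is optimized, and all names are hypothetical.

```python
def apply_shift(original_coords, shift):
    """Apply a suggested translation to the ORIGINAL coordinates."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in original_coords]


def pattern_search(score, n_params, step=1.0, min_step=1e-3, max_evals=1000):
    """Minimize score by trying +/- step on each parameter and halving the step on failure."""
    params = [0.0] * n_params
    best = score(params)
    evals = 1
    while step > min_step and evals < max_evals:
        improved = False
        for i in range(n_params):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta          # suggest a change
                value = score(trial)       # apply and score it
                evals += 1
                if value < best:           # keep it only if the score improves
                    params, best, improved = trial, value, True
                    break
        if not improved:
            step *= 0.5
    return params, best


# Toy setup: one atom that should end up at a target position.
original = [(0.0, 0.0, 0.0)]
target = (1.0, -0.5, 0.25)


def toy_score(shift):
    """Squared distance of the shifted atom to the target (lower is better)."""
    moved = apply_shift(original, shift)
    return sum((c - t) ** 2 for c, t in zip(moved[0], target))


best_shift, best_score = pattern_search(toy_score, n_params=3)
```

As in the described workflow, every trial is applied to the original coordinates rather than to the previous trial, so the parameter vector alone defines the current configuration. The loop ends once the step size falls below min_step or the evaluation budget is used up, which plays the role of the termination criteria.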

Difference between GeoHYDEdesolv and the intermolecular Lennard-Jones Score

In Chapter 5, three Lennard-Jones potentials are monitored. Both GeoHYDEdesolv as an intermolecular potential and the intramolecular Lennard-Jones potential, which identifies clashes in the ligand and, if necessary, in protein residues, are specially fine-tuned potentials developed by our cooperation partner that include protons if necessary. The third potential, which is not part of GeoHYDE but is monitored in the evaluations, is a standard 12-6 Lennard-Jones potential evaluated only between heavy atoms in differing components of the pocket.
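For reference, the standard 12-6 form of the third potential can be written down in a few lines of Python. Using a single epsilon and sigma for all atom pairs, the tuple layout of the atoms and the function names are simplifying assumptions for illustration; the parameters used in the evaluations are not given here.

```python
import math


def lennard_jones_12_6(r, epsilon, sigma):
    """Standard 12-6 Lennard-Jones potential for a pair at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def intermolecular_lj(atoms, epsilon, sigma):
    """Sum the potential over heavy-atom pairs from different components.

    atoms: list of (component, element, (x, y, z)) tuples.
    """
    heavy = [atom for atom in atoms if atom[1] != "H"]  # hydrogens are skipped
    total = 0.0
    for i, (comp_a, _, pos_a) in enumerate(heavy):
        for comp_b, _, pos_b in heavy[i + 1:]:
            if comp_a != comp_b:  # only pairs across components count
                total += lennard_jones_12_6(math.dist(pos_a, pos_b), epsilon, sigma)
    return total
```

The potential crosses zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) · sigma, so any pair closer than sigma contributes a positive, repulsive value.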

Integration of the repulsive Lennard-Jones potential

The purely repulsive LJ potential (C) from the NAOMI ScoringLib has two configuration parameters: the preferred value of the potential when the centers of the two atoms fully overlap, and the position at which the potential should reach zero, at around twice the sum s of the van der Waals radii of the atoms. It would be preferable for the potential to be highly similar to the repulsive part in GeoHYDEd for proper comparison. A rough parametrization with LJs = 100 at x = 0 and LJs = 0 at x = s performed best, in contrast to x = 2s or x = σ, the position of the original zero crossing of the LJP. Regardless, C assesses some atom pairs as clashing while GeoHYDEd disagrees (see 2zzd TLA C 4001 in Table B.14).