Summary and Future Research - Methods for the Efficient Comparison of Protein Binding Sites and

In this work, improved and accelerated methods for the comparison of protein binding sites are presented. Before such approaches were developed, however, the importance of automatically detecting relevant and significant pockets was demonstrated (Chap. 2). It was shown that extracting binding pockets in close proximity to the bound ligands makes comparisons trivial, because the inherent shape similarity of the ligands is transferred to the extracted pockets. This holds even for datasets that contain hardly any redundancy in the sequence information. Moreover, when this strategy is applied in the pocket-extraction step, unoccupied pockets, which are likely to contain important information about the putative accommodation of as yet unknown ligands, remain unconsidered.

Subsequently, an extended graph-based model for enhanced similarity search in Cavbase was presented in Chapter 3. We proposed a novel and efficient modeling formalism that does not increase the size of the graph model used in Cavbase, but leads to graphs with considerably more information assigned to the nodes. More specifically, additional descriptors of surface characteristics are extracted from the local surface and attributed to the pseudocenters. Combined with a heuristic for the efficient detection of maximum common subgraphs, these properties are evaluated as additional node labels in the program LC, which leads to a gain of information and enables much faster but still very accurate comparisons between different structures.

Moreover, the accelerated variant DivLC, which makes use of graph partitioning, was discussed in Chapter 4. To this end, graphs are divided into disjoint components prior to their comparison. The pseudocenter sets are split according to their assigned physicochemical type, which leads to seven graphs that are much smaller than the original one. Applying this approach to the same test scenarios results in another significant speed-up without sacrificing accuracy. The graph partitioning approach revealed weaknesses only when small subpockets were used for the mutual comparisons.
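The partitioning step described for DivLC can be illustrated with a minimal sketch. It is not the DivLC implementation: the `Pseudocenter` record, the type names and the function `partition_by_type` are hypothetical and serve only to show how a pseudocenter set splits into disjoint per-type subsets that can then be compared separately.

```python
from collections import defaultdict
from typing import NamedTuple


class Pseudocenter(NamedTuple):
    # Hypothetical record; the type labels below are illustrative only.
    ptype: str
    x: float
    y: float
    z: float


def partition_by_type(pseudocenters):
    """Split a pseudocenter set into disjoint subsets, one per type."""
    parts = defaultdict(list)
    for pc in pseudocenters:
        parts[pc.ptype].append(pc)
    return dict(parts)


pocket = [
    Pseudocenter("donor", 0.0, 0.0, 0.0),
    Pseudocenter("acceptor", 1.5, 0.0, 0.0),
    Pseudocenter("donor", 0.0, 2.0, 0.0),
    Pseudocenter("aromatic", 3.0, 1.0, 0.5),
]
parts = partition_by_type(pocket)
for ptype, members in sorted(parts.items()):
    print(ptype, len(members))
```

Because every pseudocenter carries exactly one type, the subsets are disjoint and together cover the original set, so each per-type graph is never larger than the original graph.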

In Chapter 5, a method for large-scale mining of similar protein binding pockets was introduced. A program called RAPMAD (RApid Pocket MAtching using Distances) was developed, which allows for ultra-fast similarity comparisons, as protein binding sites are represented by sets of distance histograms that are both generated and compared with linear complexity.

RAPMAD thus attains a speed of more than 20 000 comparisons per second, which makes screenings across large datasets and even entire databases easily feasible. The practical use of the programs RAPMAD and LC was demonstrated in a successful prospective virtual screening study that aimed at identifying novel inhibitors of the NMDA receptor.
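The idea of generating and comparing distance histograms in linear time can be sketched as follows. This is a simplified illustration under stated assumptions, not the RAPMAD algorithm: it uses a single histogram of distances to the centroid, a fixed bin width of 1.0 Å and a normalised sum of absolute bin differences, all of which are choices made here for the example.

```python
import math


def distance_histogram(points, bin_width=1.0, n_bins=20):
    """Histogram of point-to-centroid distances; linear in the number of points."""
    hist = [0] * n_bins
    if not points:
        return hist
    n = len(points)
    centroid = tuple(sum(p[k] for p in points) / n for k in range(3))
    for p in points:
        d = math.dist(p, centroid)
        # Distances beyond the last bin are collected in the last bin.
        hist[min(int(d / bin_width), n_bins - 1)] += 1
    return hist


def histogram_distance(h1, h2):
    """Sum of absolute bin differences, normalised to [0, 1]; linear in the bin count."""
    total = sum(h1) + sum(h2)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total


site_a = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]
site_b = [(x + 5, y - 3, z + 2) for x, y, z in site_a]   # translated copy of site_a
site_c = [(3 * x, 3 * y, 3 * z) for x, y, z in site_a]   # enlarged copy of site_a

hist_a = distance_histogram(site_a)
hist_b = distance_histogram(site_b)
hist_c = distance_histogram(site_c)
print(histogram_distance(hist_a, hist_b), histogram_distance(hist_a, hist_c))
```

Since the histogram depends only on distances, it is invariant to translation and rotation of the point set, so no structural superposition is needed before two sites are compared.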

Finally, an extension of the program DSX, a scoring method for protein-ligand complexes, was introduced (Chap. 6). By adding the assessment of hydrogen-bond geometries, the program was improved with only a small increase in runtime. The extended version was evaluated on well-studied test datasets, which enables an exhaustive comparison with the previous version as well as with a plethora of previously developed approaches.

6.3 Future Work

Despite the improvements presented in this work, there is still room for further enhancements. First, I want to point out the algorithmic workflow of the methods LC and DivLC. During the generation of the product graph, we chose a value of 2.0 Å for the distance threshold parameter. This parameter defines the maximum distance between two nodes to be inserted as a new product node in the product graph. It therefore largely determines the size and quality of the product graph. In our trials, we used 2.0 Å because this matches the parameter setting in Cavbase. However, it is very likely that smaller or greater values will lead to different binding site comparison results. To our knowledge, this value has been fixed in the Cavbase workflow without a rational derivation. It is therefore reasonable to calculate the classification results for a broad range of values, e.g. from 0.1 Å to 3.0 Å, in order to optimize the threshold.
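A scan over the threshold could be organised as in the following sketch. It assumes one common product-graph formulation, in which two product nodes are connected if the corresponding intra-pocket distances differ by at most the threshold; the exact role of the parameter in LC and Cavbase may differ. The names `product_graph` and `sweep_threshold` and the toy pockets are hypothetical, and the sketch only reports how the product graph grows, not any classification result.

```python
import math
from itertools import combinations


def product_graph(g1, g2, eps):
    """Build a product graph of two labelled point sets.

    g1, g2: lists of (type, (x, y, z)).
    Nodes pair up points of equal type; two nodes are connected if the
    corresponding distances in g1 and g2 differ by at most eps (assumption).
    """
    nodes = [(i, j)
             for i, (t1, _) in enumerate(g1)
             for j, (t2, _) in enumerate(g2)
             if t1 == t2]
    edges = []
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # a point cannot be matched twice
        d1 = math.dist(g1[i1][1], g1[i2][1])
        d2 = math.dist(g2[j1][1], g2[j2][1])
        if abs(d1 - d2) <= eps:
            edges.append(((i1, j1), (i2, j2)))
    return nodes, edges


def sweep_threshold(g1, g2):
    """Edge count of the product graph for thresholds 0.1, 0.2, ..., 3.0 (in Å)."""
    return {k / 10: len(product_graph(g1, g2, k / 10)[1]) for k in range(1, 31)}


pocket_1 = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("donor", (0, 4, 0))]
pocket_2 = [("donor", (0, 0, 0)), ("acceptor", (3.5, 0, 0)), ("donor", (0, 4, 0))]

counts = sweep_threshold(pocket_1, pocket_2)
for eps in (0.1, 1.0, 2.0, 3.0):
    print(eps, counts[eps])
```

In a real evaluation, each threshold would be followed by the subgraph search and the classification experiment, so that the value yielding the best classification could be selected instead of the fixed 2.0 Å.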

The scoring function DSX holds considerable potential for improvement as well. Although a moderate improvement of the scoring results was obtained after implementing the evaluation of H-bond geometries, there are many more geometrical features that could reasonably be assessed when scoring a receptor-ligand complex. For instance, halogen bonds and the mutual positions of aromatic rings (π-stacking or edge-to-face configurations) could also be taken into consideration. It was shown that specific halogen bonds can contribute as much to binding affinity as hydrogen bonds [68, 97]. Furthermore, Taylor has shown the high potential of iodine in particular to form strong halogen bonds to nitrogen and oxygen by performing an exhaustive study of crystal structures in the CSD (see Ref. 145, Tab. 5). Lu et al. moreover proposed an ordering of the strengths of halogen bonds, which is H···I > Br > Cl [97].

Furthermore, future studies could also consider water molecules in the scoring process. Owing to the newly introduced evaluation of H-bonds, this could lead to another performance gain of DSX. In addition, the implementation of the H-bond scoring could be enhanced further such that even more remote water molecules, which are part of the so-called second solvation shell, are included.

In the present study on DSX, we computed the positions of hydrogen atoms using the VSEPR model, because the electron density reveals hydrogen positions reliably only in X-ray structures with a resolution well below 1.0 Å. However, future studies could generate the statistical potentials exclusively from input structures that already contain H positions.

Aside from highly resolved X-ray structures, the data deposited in the CSD contain experimentally determined H atom positions in many cases. Moreover, structures that have been determined by neutron diffraction contain well-determined H positions.


From the document: Methods for the Efficient Comparison of Protein Binding Sites and for the Assessment of Protein-Ligand Complexes (pages 173-177)