
6. Results and Discussion of Pelikan

6.5. Comparison with Relibase

As a final experiment, the ability and performance of the Pelikan method in searching for 3D structures in protein-ligand interfaces are compared to those of Relibase. This experiment should help to contextualize the Pelikan approach within the landscape of existing tools. Among the currently published tools, only Relibase and Relibase+ allow for precise queries on an atomic level as Pelikan does. However, since Relibase+ is not publicly available, only Relibase could be used here. In contrast to Relibase+, Relibase does not allow for intra-molecular distances. Moreover, angle constraints are not possible in Relibase. Hence, three different 3D queries were designed which contain only distance constraints between the ligand and the protein. The queries are shown in Figure 5.3.

The resulting hits and the retrieval times for the searches of all three queries on both systems are displayed in Table 6.2. Since the runtimes for Relibase highly fluctuated, the runtimes for Relibase are given as mean values from three independent experiments. For Pelikan, only the runtime of one exemplary run is shown since the fluctuation of runtimes using the same query is very small.

| Query   | Relibase hits | Relibase PDB structures | Relibase runtime | Pelikan hits | Pelikan PDB structures | Pelikan runtime |
|---------|---------------|-------------------------|------------------|--------------|------------------------|-----------------|
| Query 1 | 29            | 24                      | 148 ± 60 s       | 30           | 24                     | 62 s            |
| Query 2 | 114           | 55                      | 108 ± 74 s       | 218          | 52                     | 66 s            |
| Query 3 | 8             | 7                       | 28 ± 6 min       | 13           | 7                      | 113 s           |

Table 6.2.: Resulting hits detected by Relibase and Pelikan using three different queries. For each query, the number of resulting hits, the number of detected PDB structures and the runtime is given for Relibase and Pelikan, respectively. For Pelikan, a database containing PDBcomplete was used. Runtimes for Pelikan were measured using the SSD settings. For Relibase, the web interface provided by the CCDC was used (http://relibase.ccdc.cam.ac.uk/index.php, accessed between March and June 2017). Runtimes were measured using a stopwatch. Mean values and standard deviations of three independent experiments are shown.

For query 1, Relibase and Pelikan detected almost the same results. They differ only in the number of detected hits: Relibase found 29 hits whereas Pelikan detected 30. The reason for this is that the query can match the same set of atoms twice. In the Pelikan query, search points 1 and 2 are interchangeable if both matching atoms fulfill the distance constraint to search point 3. This is the case in one pocket. Apparently, Relibase does not count these symmetric hits.
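The effect of interchangeable search points on the hit count can be illustrated with a minimal sketch. The following Python code is not the implementation of Pelikan or Relibase; the atom names, coordinates, element types, and the distance threshold are invented for illustration. It enumerates all assignments of atoms to three search points and then counts hits once per assignment and once per unique set of atoms.

```python
from itertools import permutations
from math import dist

# Hypothetical pocket atoms: name -> (element, xyz). All values are invented.
atoms = {
    "O1": ("O", (0.0, 0.0, 0.0)),
    "O2": ("O", (2.0, 0.0, 0.0)),
    "N1": ("N", (1.0, 2.5, 0.0)),
}

# Toy query: search points 1 and 2 are oxygens, search point 3 is a nitrogen.
# Points 1 and 2 must each lie within max_dist of point 3.
elements = ("O", "O", "N")
max_dist = 3.0

def matches(assignment):
    """Check element types and both distance constraints to search point 3."""
    if any(atoms[a][0] != e for a, e in zip(assignment, elements)):
        return False
    p1, p2, p3 = (atoms[a][1] for a in assignment)
    return dist(p1, p3) <= max_dist and dist(p2, p3) <= max_dist

# Every ordered assignment of atoms to search points that fulfills the query.
assignments = [a for a in permutations(atoms, 3) if matches(a)]
# The same hits, collapsed to unique sets of atoms.
unique_atom_sets = {frozenset(a) for a in assignments}

print(len(assignments))       # hits counted per assignment
print(len(unique_atom_sets))  # hits counted per unique atom set
```

In this toy pocket, both oxygens fulfill the constraint to the nitrogen, so the two oxygens can be swapped between search points 1 and 2. Counting per assignment therefore yields two hits, while counting per unique atom set yields one.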

Concerning query 2, Pelikan found many more hits than Relibase. The reason for this is again the symmetry of the query. Pelikan counts every unique hit whereas Relibase seems to count only unique sets of atoms as hits. Moreover, Relibase found three PDB structures which were not detected by Pelikan: 2hm9, 3kwh, and 2xad. The PDB entry 2hm9 contains a protein-ligand complex whose structure was determined using NMR, so several models exist for this structure. When a PDB file contains several models, the NAOMI library uses only the first one. In this first model, the distance between the atoms matching search points 1 and 4 is too large for the distance constraint used (4.7 Å). Relibase uses all models of the structure, and in other models the distance between the matching atoms agrees with the distance constraint. The structure with PDB code 3kwh is deprecated and has been replaced by the structure 3oc0 in the PDB. Both Relibase and Pelikan find a hit in 3oc0; 3kwh still seems to be part of the Relibase database but not of the Pelikan database. The PDB code 2xad links to a structure which contains a glycopeptide. Relibase considers this glycopeptide a ligand and thus detects a hit. In Pelikan, the glycopeptide is considered part of the protein, and thus no hit is detected in this structure.
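The difference between evaluating only the first model of an NMR entry and evaluating all models, as discussed for query 2, can be sketched as follows. This is an illustrative example only: the coordinates are invented, and the code does not use the NAOMI library or any Relibase interface. Only the threshold of 4.7 Å is taken from the text.

```python
from math import dist

MAX_DISTANCE = 4.7  # Å, the distance constraint quoted in the text

# Hypothetical coordinates of the two matched atoms in three NMR models.
models = [
    ((0.0, 0.0, 0.0), (5.2, 0.0, 0.0)),  # model 1: 5.2 Å apart
    ((0.0, 0.0, 0.0), (4.5, 0.0, 0.0)),  # model 2: 4.5 Å apart
    ((0.0, 0.0, 0.0), (4.9, 0.0, 0.0)),  # model 3: 4.9 Å apart
]

def satisfied(model):
    """Return True if the atom pair of this model fulfills the constraint."""
    atom_a, atom_b = model
    return dist(atom_a, atom_b) <= MAX_DISTANCE

# Strategy 1: look at the first model only.
first_model_only = satisfied(models[0])
# Strategy 2: accept the structure if any model fulfills the constraint.
any_model = any(satisfied(m) for m in models)

print(first_model_only, any_model)
```

With these invented coordinates, the first model violates the constraint while the second fulfills it, so the structure is a hit only under the second strategy.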

Even though Relibase and Pelikan find the same number of PDB structures for query 3, the detected structures differ between both tools. Only one hit is identical: PDB structure 1kwf with ligand GLC. The remaining seven hits detected by Relibase are not detected by Pelikan.

| PDB code | Ligand  | Reason why hit is not detected by Pelikan      |
|----------|---------|------------------------------------------------|
| 3noq     | EDO     | Ligand contains only four heavy atoms          |
| 4mty     | GOL     | Ligand oxygens are in two different molecules  |
| 2ayw     | ONO 501 | Ligand oxygens are in two different molecules  |
| 2ayw     | ONO 601 | Ligand oxygens are in two different molecules  |
| 1ylj     | SO4     | Ligand oxygens are in two different molecules  |
| 1lug     | SUA     | Ligand oxygens are in two different molecules  |
| 3k34     | SUA     | Ligand oxygens are in two different molecules  |

Table 6.3.: Hits which were exclusively detected by Relibase using query 3. For each hit, the PDB ID, the ligand name, and the reason why the hit was not detected by Pelikan are given.

Table 6.3 lists all these hits and gives the reason why the respective hit is not detected by Pelikan. In most cases, the two ligand oxygens of the query are part of two different molecules in the resulting hits. In the Relibase query, only the origin of an atom can be specified, e.g., ligand, protein, or water. If the drawn structure is not connected, it cannot be specified that two atoms should be part of the same molecule. In Pelikan, the reference ligand, which is used to define the pocket, is logically distinct from other small molecules within the pocket, which are called ligands. If the term 'reference ligand' is replaced by 'ligand' for one of the oxygens in query 3 for Pelikan, all hits listed in Table 6.3 can be detected except for the one in PDB structure 3noq. This hit can only be detected with Pelikan if the term 'reference ligand' is replaced by 'ligand' for both oxygens. This means that Pelikan is in principle able to find all hits that Relibase detects. However, the query used in Pelikan is more precise in the sense that the user has to define whether points are part of the same reference ligand or part of different ligands. In Relibase, these cases cannot be distinguished.
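The distinction between requiring two atoms to belong to the same molecule and only requiring them to be ligand atoms can be made concrete with a small sketch. The code below is a simplified illustration, not Pelikan's query model: the atom names and molecule identifiers are hypothetical, and the geometric constraints are omitted so that only the molecule-membership condition is shown.

```python
from itertools import combinations

# Hypothetical ligand oxygens in one pocket: (atom name, molecule id).
ligand_oxygens = [("O1", "GOL_A"), ("O2", "GOL_B"), ("O3", "GOL_B")]

def oxygen_pairs(oxygens, same_molecule):
    """Return oxygen pairs; optionally require both atoms in one molecule."""
    pairs = []
    for (a, mol_a), (b, mol_b) in combinations(oxygens, 2):
        if same_molecule and mol_a != mol_b:
            continue  # atoms come from different molecules, reject the pair
        pairs.append((a, b))
    return pairs

# Both oxygens must be part of the same molecule.
strict = oxygen_pairs(ligand_oxygens, same_molecule=True)
# Any two ligand oxygens are accepted, regardless of their molecule.
relaxed = oxygen_pairs(ligand_oxygens, same_molecule=False)

print(strict)
print(relaxed)
```

The strict variant keeps only the pair from molecule GOL_B, whereas the relaxed variant also accepts pairs that span two molecules. The latter corresponds to the kind of hit listed in Table 6.3.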

Moreover, there are six PDB structures which were found by Pelikan but not by Relibase. For each of these structures, one exemplary hit is shown in Figure 6.13. All six hits are valid since they agree with the search constraints used. The hits detected in PDB structures 3whi and 5jug are within covalently bound ligands. Relibase does not seem to interpret these structures as ligands, which is why it does not detect them. A reason why the PDB structures 1i1w and 1g66 are not among the hits of Relibase could be that GOL is not interpreted as a ligand. The molecule GOL is relatively small and could be interpreted as part of the solvent rather than as a ligand. However, this interpretation is very unlikely since Relibase also detected the ligand EDO in PDB structure 3noq (see Table 6.3), which is chemically very similar to GOL. For PDB structures 4x5p and 2bzz, no explanation could be found for why these hits are not part of Relibase's hit list.

Figure 6.13.: Hits which were exclusively detected by Pelikan using query 3 on the database PDBcomplete. The atoms matching the search points are highlighted with colored disks. The distance constraints are indicated by colored lines.

The comparison of the runtimes which Relibase and Pelikan needed to find all hits is difficult because Relibase can only be accessed via a web server, where the runtime depends strongly on the connection and the number of simultaneous users. Therefore, the queries were repeated several times on different days using the Relibase web interface. Overall, the runtimes for queries 1 and 2 are mainly between 1 and 3 minutes for both Relibase and Pelikan. For query 3, however, Relibase requires 28 minutes, which is about a factor of 15 longer than Pelikan. Notably, queries in Relibase seem to be much faster if a large ligand substructure is used instead of a large molecular structure from the protein.
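The runtime factor quoted for query 3 follows directly from the values in Table 6.2. The short Python sketch below merely redoes this arithmetic with the table values; it uses the Relibase mean runtimes and ignores their standard deviations, and the variable names are chosen for illustration.

```python
# Runtimes in seconds, taken from Table 6.2 (Relibase: mean of three runs).
runtimes_s = {
    "Query 1": {"Relibase": 148, "Pelikan": 62},
    "Query 2": {"Relibase": 108, "Pelikan": 66},
    "Query 3": {"Relibase": 28 * 60, "Pelikan": 113},
}

# A ratio above 1 means that Relibase took longer than Pelikan.
ratios = {q: t["Relibase"] / t["Pelikan"] for q, t in runtimes_s.items()}

for query, ratio in ratios.items():
    print(f"{query}: Relibase / Pelikan = {ratio:.1f}")
```

With these values, the ratio is about 2.4 for query 1, 1.6 for query 2, and 14.9 for query 3. Given the large standard deviations of the Relibase runtimes, the ratios for queries 1 and 2 should be read as rough indications only.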

Taken together, it can be concluded that Pelikan is able to find correct results in a runtime which is similar to or shorter than that of Relibase. For queries which contain more information about the searched structures in the protein than in the ligand, such as query 3, Pelikan is considerably faster than Relibase. Moreover, Relibase does not find all hits which are detected by Pelikan for this query. Even if the additional geometric constraints of Relibase+ are taken into consideration, Pelikan offers more query flexibility since a large number of textual and numerical properties can be added to a geometric query.