Calculation and Storage of relevant Data - Pelikan - Searching for Interaction Patterns


4.3. Pelikan - Searching for Interaction Patterns

4.3.2. Calculation and Storage of relevant Data

[Figure 4.4, panel a: search points by interaction type (donor, acceptor, aromatic), each shown with example atoms and the resulting vectors for angle constraints. Panel b: point-point constraint types (distance constraint, e.g., 3-5 Å between search point 1 and search point 2; interaction constraint, e.g., π-π between search point 1 and search point 2), each shown with the resulting vector for angle constraints.]

Figure 4.4: Vectors of query objects in the Pelikan query system. Every pair of vectors can be connected with an angle constraint. a) Possible vectors of search points. b) Possible vectors of point-point constraints.

Filters for Textual and Numerical Properties

In addition to the geometrical query, numerical and textual filters can be added to a query.

Filters can be defined for all properties stored in the PropertyDB database. A list of all stored properties is shown in Appendix B.5. Depending on the value type, these filters are either range filters, for which a minimal and a maximal allowed value are provided, or string filters. In the latter case, the given filter string is searched for as a substring of the respective database entry in a case-insensitive manner. Besides filters for specific properties, substructure filters for reference ligands can be defined using SMARTS patterns.
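The two filter kinds can be illustrated with a minimal Python sketch. The class names, the property names in the usage example, and the choice of inclusive range bounds are assumptions made for illustration only; they are not taken from the Pelikan implementation. SMARTS-based substructure filters are left out because they require a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeFilter:
    """Numerical filter: the value must lie within [minimum, maximum].

    Inclusive bounds are an assumption of this sketch.
    """
    minimum: float
    maximum: float

    def matches(self, value: Optional[float]) -> bool:
        return value is not None and self.minimum <= value <= self.maximum


@dataclass
class StringFilter:
    """Textual filter: case-insensitive substring search."""
    needle: str

    def matches(self, value: Optional[str]) -> bool:
        return value is not None and self.needle.lower() in value.lower()


def passes_all(properties: dict, filters: dict) -> bool:
    """Return True if every filtered property is present and matches."""
    return all(f.matches(properties.get(name)) for name, f in filters.items())
```

For example, with the hypothetical properties `{"resolution": 1.8, "protein_name": "Trypsin"}`, the filter set `{"resolution": RangeFilter(0.0, 2.0), "protein_name": StringFilter("TRYP")}` accepts the entry, because 1.8 lies in the range and "tryp" is a substring of "trypsin" after lowercasing.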

InteractionDB: stores 3D information about atoms in pockets and interactions.

Table 'potential result points (PRPs)':
  column        type          examples / reference
  prp_key       primary key
  pocket_key    foreign key   references ComplexDB.pockets
  type          integer       donor, acceptor, ...
  origin        integer       ref. ligand, protein, ...
  element       integer       carbon, oxygen, ...
  coordx        integer
  coordy        integer
  coordz        integer
  atom name     string

Table 'interactions':
  column            type          examples / reference
  prp_key1          foreign key   references prp_key
  prp_key2          foreign key   references prp_key
  interaction type  integer       h-bond, metal, ...

Figure 4.5: Overview of the tables of the InteractionDB used in Pelikan. Arrows indicate cross references between tables. Black arrows represent cross references within the InteractionDB, whereas green arrows show cross references to the ComplexDB.

The InteractionDB

An overview of the table schema of the InteractionDB is shown in Figure 4.5. The first table stores potential result points (PRPs), which are atoms with 3D coordinates. It is called the ’PRP table’. A PRP is uniquely identified by its primary key, named prp_key. Note that the smallest valid prp_key is 1. For each PRP, the table stores its 3D coordinates, its interaction type, its element type, the molecule type it originates from (origin), and the name of its atom. In order to reduce storage space, the 3D coordinates are stored as integers. In a structure file from the PDB, each of the x, y, and z coordinates of an atom is represented with at most six digits. Thus, each coordinate can be multiplied by 1 000 and cast to an integer, losing at most the fourth decimal place.

The largest integer that this method can generate is 999 999 000, which can be represented using 4 bytes. According to the SQLite documentation [92], storing a floating point number would need 8 bytes. Hence, 12 bytes (4 bytes for each dimension) can be saved per PRP by this type conversion. Additionally, a foreign key links back to the ComplexDB, indicating the pocket the PRP is part of.
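The conversion and the size argument can be sketched in Python as follows. The function names are hypothetical. The sketch rounds instead of truncating, which is a deliberate deviation from a plain cast: binary floating point can represent a value such as 1.001 slightly below its decimal form, so truncation after scaling could be off by one unit. The byte widths are those of a fixed-size 32-bit integer and a 64-bit double; the actual on-disk size is decided by SQLite.

```python
import struct

SCALE = 1000                       # keep three decimal places
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def encode_coordinate(value: float) -> int:
    """Scale a coordinate (in Å) to an integer that fits into 4 bytes."""
    stored = round(value * SCALE)  # rounding is an assumption, see lead-in
    if not INT32_MIN <= stored <= INT32_MAX:
        raise ValueError(f"coordinate {value} does not fit into 4 bytes")
    return stored


def decode_coordinate(stored: int) -> float:
    """Recover the coordinate in Å from its stored integer form."""
    return stored / SCALE


def bytes_saved_per_prp() -> int:
    """Three coordinates, each stored as a 4-byte int instead of an 8-byte double."""
    return 3 * (struct.calcsize("<d") - struct.calcsize("<i"))
```

The largest value named in the text, 999 999 000, is below 2^31 - 1 = 2 147 483 647, so `encode_coordinate(999999.0)` succeeds, and `bytes_saved_per_prp()` evaluates to 3 · (8 - 4) = 12.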

In the table ’interactions’, atomic interactions are stored. An atomic interaction is always formed between two PRPs, which are represented by two reference keys, prp_key1 and prp_key2. Both point to the primary key prp_key in the PRP table. Additionally, the type of the interaction, e.g., hydrogen bond or cation-π, is stored in this table.
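The two tables can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the actual Pelikan schema: the table name `prps`, the underscores in `atom_name` and `interaction_type`, the NOT NULL constraints, and the integer codes used for type, origin, and element are hypothetical. The link from pocket_key to the ComplexDB is only noted in a comment, since that database is not part of the sketch.

```python
import sqlite3

SCHEMA = """
CREATE TABLE prps (
    prp_key    INTEGER PRIMARY KEY,  -- assigned by SQLite, starting at 1
    pocket_key INTEGER NOT NULL,     -- refers to ComplexDB.pockets (not enforced here)
    "type"     INTEGER NOT NULL,     -- interaction type code (donor, acceptor, ...)
    origin     INTEGER NOT NULL,     -- molecule type code (ref. ligand, protein, ...)
    element    INTEGER NOT NULL,     -- element code (carbon, oxygen, ...)
    coordx     INTEGER NOT NULL,     -- coordinates scaled by 1 000
    coordy     INTEGER NOT NULL,
    coordz     INTEGER NOT NULL,
    atom_name  TEXT    NOT NULL
);
CREATE TABLE interactions (
    prp_key1         INTEGER NOT NULL REFERENCES prps(prp_key),
    prp_key2         INTEGER NOT NULL REFERENCES prps(prp_key),
    interaction_type INTEGER NOT NULL  -- code for h-bond, metal, ...
);
"""


def create_interaction_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_prp(conn, pocket_key, prp_type, origin, element, xyz, atom_name) -> int:
    """Insert one PRP and return its prp_key."""
    cur = conn.execute(
        'INSERT INTO prps (pocket_key, "type", origin, element, '
        "coordx, coordy, coordz, atom_name) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (pocket_key, prp_type, origin, element, *xyz, atom_name),
    )
    return cur.lastrowid


def add_interaction(conn, prp_key1, prp_key2, interaction_type) -> None:
    """Insert one atomic interaction between two stored PRPs."""
    conn.execute(
        "INSERT INTO interactions VALUES (?, ?, ?)",
        (prp_key1, prp_key2, interaction_type),
    )
```

Declaring prp_key as INTEGER PRIMARY KEY lets SQLite assign the keys itself; the first row inserted into the empty table receives the key 1, which matches the remark that 1 is the smallest valid prp_key.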

Database Construction

The database can be constructed from a collection of structure files from the PDB. Both file formats, PDB and mmCIF, are accepted. During the building process, the files are first read and interpreted using the functionality of the NAOMI library explained in Section 4.2.1.

As already mentioned, when reading files from the PDB using the NAOMI library, covalently bound ligands as well as sugar chains are categorized as part of the protein. In order to resolve this shortcoming, a procedure was developed which identifies covalently bound ligands as well as ligands consisting of more than five residues. The method is shown in Algorithm 1.

Firstly, all residues of a protein which might be a ligand are identified and classified as ’hetero residue’ (see Line 3 in Algorithm 1). These are all residues which fulfill all of the following criteria:

• The residue is not a standard amino acid.

• The residue is connected to a protein chain (by definition of NAOMI) via a non-peptide bond.

• All atoms of the residue have valid 3D coordinates.

Algorithm 1 Detect additional ligands
 1: procedure getLigandsFromProtein(protein)
 2:     newligands = empty molecule vector
 3:     heteroresidues = findHeteroResidues(protein)
 4:     components = groupConnectedResidues(heteroresidues)
 5:     for all c ∈ components do
 6:         if noChainBreakUponRemoval(c, protein) then
 7:             molecule = createOneMolecule(c)
 8:             addMoleculeToVector(molecule, newligands)
 9:         end if
10:     end for
11:     return newligands
12: end procedure

After all potential ligands have been detected, connected residues are grouped into connected components by a breadth-first search (see Line 4 in Algorithm 1). If no chain break is introduced by removing the complete component, all residues of that component are converted to one molecule and added to the vector of ligands. Note that the residues are not removed from the protein in order to maintain the chemical integrity of the structure. However, the respective residues in the protein are ignored in the subsequent procedures.
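The control flow of Algorithm 1 can be sketched in Python. The sketch makes several simplifying assumptions: residues are plain identifiers, the hetero residues and the bond table are passed in directly instead of being derived from a NAOMI protein object, the chain-break test is a caller-supplied function, and a "molecule" is simply the tuple of residue identifiers of one component. None of these names belong to the NAOMI library.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def group_connected_residues(
    hetero: Sequence[Hashable], bonds: Dict[Hashable, Sequence[Hashable]]
) -> List[List[Hashable]]:
    """Group hetero residues into connected components by breadth-first search."""
    hetero_set = set(hetero)
    seen = set()
    components = []
    for start in hetero:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            residue = queue.popleft()
            component.append(residue)
            # Only follow bonds that lead to other hetero residues.
            for neighbour in bonds.get(residue, ()):
                if neighbour in hetero_set and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def get_ligands_from_protein(
    hetero: Sequence[Hashable],
    bonds: Dict[Hashable, Sequence[Hashable]],
    no_chain_break_upon_removal: Callable[[List[Hashable]], bool],
) -> List[Tuple[Hashable, ...]]:
    """Return one 'molecule' per component whose removal breaks no chain."""
    new_ligands = []
    for component in group_connected_residues(hetero, bonds):
        if no_chain_break_upon_removal(component):
            new_ligands.append(tuple(component))
    return new_ligands
```

As a usage example with hypothetical residues, consider a two-residue sugar chain NAG1-NAG2 attached to ASN10 and a modified residue MSE5 sitting between ALA4 and GLY6 in the backbone. The sugar chain forms one component and becomes a ligand, whereas MSE5 is rejected if the supplied test reports that removing it would break the chain.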

Every ligand which consists of more than five and fewer than one hundred heavy atoms is considered a reference ligand here. These reference ligands are used to build pockets. A pocket is defined as all ligands, residues, water molecules, and metals which have an atom-atom distance of less than 6.5 Å to any of the reference ligand’s atoms. Pockets which contain no residue are discarded. Afterwards, all atomic interactions are calculated within each pocket using the NAOMI library (see Section 4.2.2).
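The reference ligand criterion and the pocket definition can be written down as a short Python sketch. The data layout (a dictionary mapping a component name to its kind and atom coordinates) and all function names are assumptions of this sketch. The distance test is a brute-force loop over all atom pairs, chosen for clarity rather than speed.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
POCKET_CUTOFF = 6.5  # Å, strict upper bound on the atom-atom distance


def is_reference_ligand(heavy_atom_count: int) -> bool:
    """More than five and fewer than one hundred heavy atoms."""
    return 5 < heavy_atom_count < 100


def within_cutoff(atoms_a: Sequence[Point], atoms_b: Sequence[Point],
                  cutoff: float = POCKET_CUTOFF) -> bool:
    """True if any atom pair is closer than the cutoff."""
    return any(math.dist(a, b) < cutoff for a in atoms_a for b in atoms_b)


def build_pocket(
    reference_atoms: Sequence[Point],
    candidates: Dict[str, Tuple[str, Sequence[Point]]],
) -> Optional[List[str]]:
    """Collect all components near the reference ligand.

    candidates maps a name to (kind, atoms), where kind is one of
    'ligand', 'residue', 'water', 'metal'. Returns None if the pocket
    contains no residue and is therefore discarded.
    """
    members = [
        name
        for name, (_, atoms) in candidates.items()
        if within_cutoff(reference_atoms, atoms)
    ]
    if not any(candidates[name][0] == "residue" for name in members):
        return None
    return members
```

Because the cutoff is a strict inequality, a water molecule whose only atom lies exactly 6.5 Å from the nearest reference ligand atom is not part of the pocket.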

In the last calculation step, different properties of the current data structures are determined. Properties of the reference ligand and the pocket are calculated using the NAOMI library (see Section 4.2.8). Moreover, properties of the protein and the complete protein-ligand complex are extracted from the header section of the PDB file. In total, 39 different properties are collected for each input file: 18 for the reference ligand, 13 for the pocket, four for the protein, and four for the protein-ligand complex. The exact properties which are calculated and their value ranges are shown in Appendix B.5.

In the next step, all calculated data are stored in the database. This includes storing all small molecules, protein chains, and pockets in the respective tables of the MoleculeDB, the ProteinDB, and the ComplexDB. All properties are stored in the respective tables of the PropertyDB using the reference key to the molecule, the molecule instance, the protein, or the pocket assigned in the previous step. Each heavy atom in a pocket can be a PRP in a search and is thus stored as a PRP in the PRP table of the InteractionDB.

The interaction type of each atom, which is needed for the storage of a PRP, is determined using the NAOMI library (see the first step for identifying atomic interactions described in Section 4.2.2). More than one interaction type may be determined for a single atom, e.g., the oxygen of a water molecule can be a donor as well as an acceptor. In these cases, a separate PRP is stored for every detected interaction type. These PRPs are identical except for their entry in the column ’interaction type’. The prp_key of each PRP serves as a unique ID and is later used to refer to a specific PRP.

For all detected atomic interactions in the pocket, both PRPs which take part in the interaction are stored in the interactions table of the InteractionDB. If more than one PRP has been entered into the database for one atom, only the PRP with the matching interaction type is used here.
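How one atom can yield several PRPs, and how an interaction then selects the PRP with the matching type, can be illustrated with a small Python sketch. The atom identifiers, the string labels for the types, and the in-memory lookup table are hypothetical; the actual implementation stores integer codes in the database tables.

```python
from itertools import count
from typing import Dict, List, Sequence, Tuple


def store_prps(
    atoms: Sequence[Tuple[str, Sequence[str]]]
) -> Tuple[List[Tuple[int, str, str]], Dict[Tuple[str, str], int]]:
    """Create one PRP row per (atom, interaction type) pair.

    Returns the rows as (prp_key, atom_id, type) and a lookup table
    from (atom_id, type) to prp_key.
    """
    keys = count(1)  # the smallest valid prp_key is 1
    rows = []
    lookup = {}
    for atom_id, interaction_types in atoms:
        for interaction_type in interaction_types:
            prp_key = next(keys)
            rows.append((prp_key, atom_id, interaction_type))
            lookup[(atom_id, interaction_type)] = prp_key
    return rows, lookup


def store_interaction(
    lookup: Dict[Tuple[str, str], int],
    atom1: str, type1: str,
    atom2: str, type2: str,
    interaction_type: str,
) -> Tuple[int, int, str]:
    """Build one interactions row from the PRPs with the matching types."""
    return (lookup[(atom1, type1)], lookup[(atom2, type2)], interaction_type)
```

With the atoms `("HOH301:O", ["donor", "acceptor"])` and `("LIG:N1", ["acceptor"])`, three PRPs with the keys 1, 2, and 3 are created. A hydrogen bond in which the water oxygen acts as donor and the ligand nitrogen as acceptor is then stored as `(1, 3, "h-bond")`; the acceptor PRP of the water (key 2) is not referenced.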

From the document: Mining of Interaction Geometries in Collections of Protein Structures (pages 54-57)