
Chemistry Central Journal

Open Access

Poster presentation

Virtual screening for PPAR-gamma ligands using the ISOAK molecular graph kernel and gaussian processes

T Schroeter*1, M Rupp2, K Hansen1, K-R Müller1 and G Schneider2

Address: 1Technische Universität Berlin, Machine Learning Dept., Franklinstr. 28/29, 10587 Berlin, Germany and 2Johann Wolfgang Goethe-University, Chair for Chem- and Bioinformatics, Siesmayerstr. 70, D-60323 Frankfurt am Main, Germany

* Corresponding author

For a virtual screening study, we introduce a combination of machine learning techniques, employing a graph kernel, Gaussian process regression and clustered cross-validation. The aim was to find ligands of peroxisome-proliferator activated receptor gamma (PPAR-γ). The receptors in the PPAR family belong to the steroid-thyroid-retinoid superfamily of nuclear receptors and act as transcription factors. They play a role in the regulation of lipid and glucose metabolism in vertebrates and are linked to various human processes and diseases [1]. For this study, we used a dataset of 176 PPAR-γ agonists published by Ruecker et al. [2].

Gaussian process (GP) models can provide a confidence estimate for each individual prediction, thereby allowing one to assess which compounds lie inside the model's domain of applicability. This feature is useful in virtual screening, where a large fraction of the tested compounds may be outside of the model's domain of applicability. In cheminformatics, GPs have been applied to different classification and regression tasks using either radial basis function or rational quadratic kernels based on vectorial descriptors [4,5]. We used a graph kernel based on iterative similarity and optimal assignments (ISOAK, [3]) for non-linear Bayesian regression with Gaussian process priors (GP regression, [4]). A number of kernel-based learning algorithms (including GPs) are capable of multiple kernel learning [6], which allows combining heterogeneous information by using multiple kernels at the same time. In this work, we combined rational quadratic kernels for vectorial molecular descriptors (MOE2D, CATS2D and Ghose-Crippen fragment descriptors) with the ISOAK graph kernel.
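The idea of GP regression with a combined kernel and per-prediction confidence can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the ISOAK graph kernel is not available in standard libraries, so two off-the-shelf kernels on toy descriptor vectors stand in for the actual kernel combination.

```python
# Sketch: GP regression with a sum of two kernels (a fixed-weight stand-in
# for multiple kernel learning) and predictive uncertainty per compound.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, RationalQuadratic

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                              # toy "descriptor" vectors
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=40)  # toy binding affinities

# Summing kernels combines heterogeneous similarity measures; in the paper a
# rational quadratic kernel on descriptors is combined with the ISOAK kernel.
kernel = RationalQuadratic() + RBF()
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2).fit(X, y)

mean, std = gp.predict(X[:5], return_std=True)
# `std` is the per-compound confidence estimate: large values flag compounds
# that lie outside the model's domain of applicability.
print(mean.shape, std.shape)
```

In a virtual screening setting, one would sort candidates by `mean` and discard or down-weight those whose `std` exceeds a chosen threshold.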

We evaluated our methodology in different ranking and regression settings. Ranking performance was assessed using the number of false positives within the top k predicted compounds, with compounds ranked by both predicted binding affinity and the confidence in each prediction. In the regression setting, we employed standard loss functions such as mean absolute error (MAE) and root mean squared error. The established linear ridge regression (LRR) and support vector regression (SVR) algorithms served as baseline methods. In addition to standard test/training splits and cross-validation, we used a clustered cross-validation strategy in which whole clusters of compounds are left out when constructing training sets. This yields less optimistic performance estimates than standard training/test splits and normal cross-validation, but favours algorithms that are more robust and potentially better at extrapolation. In the regression setting, both GP and SVR models performed well, yielding MAEs as low as 0.66 ± 0.08 log units (clustered CV) and 0.51 ± 0.3 log units (normal CV). In the ranking setting, GPs slightly outperformed SVR (0.21 ± 0.09 log units vs. 0.3 ± 0.08 log units).
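The clustered cross-validation strategy described above can be sketched as follows. The clustering method and baseline shown here (k-means on toy descriptors, ridge regression as the LRR baseline) are illustrative assumptions, not the paper's exact choices; the point is that entire clusters are withheld, so test compounds are structurally dissimilar to the training set.

```python
# Sketch: clustered cross-validation, leaving out whole clusters of compounds.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))                               # toy descriptors
y = X @ np.array([1.0, -0.5, 0.2, 0.0]) + 0.1 * rng.normal(size=60)

# Assign each compound to a cluster; real studies would cluster on
# chemical similarity rather than raw toy features.
clusters = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# GroupKFold keeps each cluster entirely in either the training or test fold.
maes = []
for train, test in GroupKFold(n_splits=3).split(X, y, groups=clusters):
    model = Ridge(alpha=1.0).fit(X[train], y[train])       # LRR-style baseline
    maes.append(mean_absolute_error(y[test], model.predict(X[test])))

print(f"clustered-CV MAE: {np.mean(maes):.2f}")
```

Because test clusters never overlap training clusters, the resulting MAE is typically higher than under random splits, which is exactly the "less optimistic" behaviour the text describes.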

In conclusion, Gaussian process regression that simultaneously uses, via multiple kernel learning, the ISOAK molecular graph kernel and a rational quadratic kernel on standard molecular descriptors performs excellently in retrospective evaluation. A prospective evaluation study is currently in progress.

from 4th German Conference on Chemoinformatics, Goslar, Germany, 9–11 November 2008

Published: 5 June 2009

Chemistry Central Journal 2009, 3(Suppl 1):P15 doi:10.1186/1752-153X-3-S1-P15

This abstract is available from: http://www.journal.chemistrycentral.com/content/3/S1/P15

© 2009 Schroeter et al; licensee BioMed Central Ltd.

References

1. Henke B: Progr Med Chem 2004:1-53.
2. Ruecker C, Scarsi M, Meringer M: Bioorg Med Chem 2006:5178-5195.
3. Rupp M, Proschak E, Schneider G: J Chem Inform Model 2007:2280-2286.
4. Schwaighofer A, Schroeter T, Mika S, Laub J, Laak A, Sülzle D, Ganzer U, Heinrich N, Müller K-R: J Chem Inform Model 2007:407-424.
5. Obrezanova O, Csanyi G, Gola J, Segall M: J Chem Inf Model 2007:1847-1857.
6. Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B: J Machine Learning Research 2006:1531-1565.
