
2.4 Approaches to reconstruct and analyze intercellular signaling networks

The objective of this thesis is to reconstruct and analyze intercellular signaling networks.

Most research on biological networks so far has concentrated on intracellular networks, such as genetic regulatory networks (Bower and Bolouri, 2000), metabolic networks (Ma and Zeng, 2003), protein-protein interaction networks (Schwikowski et al., 2000) and signal transduction networks (Steffen et al., 2002). In contrast, intercellular signaling networks are the focus of only a small number of research projects. The following sections introduce these approaches.

2.4.1 Bioinformatics and cellular signaling

The general requirements for reconstructing and analyzing cellular signaling networks are reviewed by Papin and Subramaniam (2004) and Papin et al. (2005). Here, cellular signaling is understood as the integral combination of intra- and extracellular signals, and thus of events that happen at diverse spatio-temporal scales. Models of cellular networks range from biochemical equations representing quick intracellular responses (< 10^1 seconds, e.g. protein modifications and changes in Ca2+ concentrations) to slow responses (from minutes to hours) over large distances, as in endocrine signals. The networks can be modeled in varying degrees of detail to understand their complexity and to make quantitative predictions. However, a whole-network reconstruction at the most detailed level of differential equations or stochastic simulations is currently out of reach. Thus, different kinds of modeling approaches have to be combined in order to gain a systemic view of signaling between cells.

In addition to the details on intracellular networks, on which the review by Papin et al. (2005) mainly concentrates, combinatorial calculations are presented that illustrate the complexity a complete intra-to-extracellular signaling network would exhibit. Even if all intracellular elements that can influence the signaling processes are left out, the variety of possible ligand-receptor interactions is large. For example, 367 G-protein-coupled receptor (GPCR) variants have been identified in the human genome, and the expression profiles of 100 GPCRs in the mouse genome indicate that most receptors are expressed in various tissues. Hence, many different receptors probably exist concurrently in the same cell or tissue. If one assumes that a mere 1% of the estimated 1 543 different receptors in the human genome (i.e. 15 receptors) can be independently expressed in any given cell type, then a cell could potentially respond to 2^15 = 32 768 different ligand combinations (for two independent ligand states per receptor: bound and unbound). This gives a good illustration of the general complexity of cellular signaling.
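The combinatorial estimate above can be reproduced with a few lines of arithmetic. The following is only a minimal sketch; the function name `ligand_combinations` is chosen here for illustration and does not come from the cited review.

```python
def ligand_combinations(n_receptors: int, states_per_receptor: int = 2) -> int:
    """Number of joint ligand states for independently expressed receptors."""
    return states_per_receptor ** n_receptors


# Figures taken from the text: 1 543 receptors, 1% of them expressed per cell type.
total_receptors = 1543
n_expressed = total_receptors // 100  # 1% of 1 543, rounded down to 15

# Two states per receptor (bound, unbound) give 2^15 combinations.
combinations_per_cell = ligand_combinations(n_expressed)
print(n_expressed, combinations_per_cell)
```

The estimate grows exponentially with the number of co-expressed receptors, so even small changes to the assumed 1% alter the result by orders of magnitude.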

To reach the goal of an integrated model of human cell signaling, efforts are needed that go beyond single research projects. For this purpose, several research initiatives have been formed to build the basis for a whole human signaling network as well as for the integration of data at several physiological levels. One example is the Alliance for Cellular Signaling (Gilman et al., 2002, available at www.signaling-gateway.org), a large-scale collaboration designed to answer global questions about signaling networks. Although this initiative addresses cell signaling in general, the effort is mainly restricted to intercellular signaling, and there to the pathways of two cell types, B lymphocytes and cardiac myocytes. Further projects in this context are the Database of Quantitative Cellular Signaling (DOQCS, a repository of models of signaling pathways at the level of chemical reactions, Sivakumaran et al., 2003, available at doqcs.ncbs.res.in) and the portal of the journal Science, the Signal Transduction Knowledge Environment (STKE, available at stke.sciencemag.org). The STKE includes the Connection Maps, its database of cell signaling. The integration of different, separately stored pathways at the intracellular level has been demonstrated by Hsing et al. (2004). For this purpose they used semantic networks, which are similar to ontologies (Section 5.1).

As systems biology emerged as a discipline with the goal of integrating existing knowledge from different levels of molecular biology (Kitano, 2002), an integrative modeling of all physiological levels in the human organism is being pursued in two ambitious projects: the Physiome Project located at the University of Washington, USA (Bassingthwaighte, 1995, available at www.physiome.org), and the IUPS Physiome Project at the University of Auckland, New Zealand (Hunter et al., 2005, available at www.bioeng.auckland.ac.nz/physiome/physiome_project.php). The physiome projects are worldwide efforts to define and describe the physiome quantitatively through the development of databases and models that facilitate the understanding of the integrative function of cells, organs, and organisms. The aim is to develop integrative models at all levels of biological organization, from genes to the whole organism, via gene regulatory networks, protein pathways, integrative cell function, and tissue as well as whole-organ structure-function relations. Thus, these projects are not focused on cellular signaling alone, but cell signaling is an important part of physiology, and in the near future the data collected and integrated by these projects might be usable to reconstruct intercellular signaling networks.

2.4.2 Reconstruction by spatial gene expression analysis

Besides the attempts at a complete integrated model of cellular signaling in humans, Diambra and da F. Costa (2005) present an example of how intercellular signaling networks can be reconstructed from Drosophila data. The main purpose of their study is to improve the analysis of spatial gene expression patterns by means of complex networks. Images of small volumes of the organism show the gene expression intensities in a number of neighboring cells. An image is then transformed into a network of cells, in which two cell nodes are connected by an undirected edge if they have a similar expression intensity and are no further apart than a maximum distance. The basic assumption here is that cell signaling drives and coordinates gene expression at least in a local area. The analysis of the node degrees and clustering coefficients of the resulting networks could be used to characterize different stages in developmental dynamics and to identify abnormalities. Although this has been done for Drosophila, the approach can in principle be applied to any organism for which images of gene expression intensities at the cellular level can be obtained. This shows how the analysis of cellular signaling networks can be used to understand the function of organisms at a systemic level.
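The network construction described above can be sketched as follows. This is a minimal illustration, not the implementation of Diambra and da F. Costa (2005): the input format, the function names, the toy data and the use of an absolute intensity difference as the similarity criterion are all assumptions made here.

```python
from itertools import combinations
from math import dist


def build_expression_network(cells, max_distance, max_intensity_diff):
    """Connect two cells by an undirected edge if they are close and similarly expressed.

    cells maps a cell id to ((x, y), intensity); this layout is an assumption.
    """
    adjacency = {cell: set() for cell in cells}
    for a, b in combinations(cells, 2):
        (pos_a, int_a), (pos_b, int_b) = cells[a], cells[b]
        close = dist(pos_a, pos_b) <= max_distance
        similar = abs(int_a - int_b) <= max_intensity_diff  # assumed similarity rule
        if close and similar:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clustering_coefficient(adjacency, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


# Toy data (invented): three neighboring cells with similar intensity,
# one distant cell, and one nearby cell with a very different intensity.
cells = {
    "c1": ((0.0, 0.0), 0.90),
    "c2": ((1.0, 0.0), 0.85),
    "c3": ((0.0, 1.0), 0.88),
    "c4": ((5.0, 5.0), 0.90),
    "c5": ((1.0, 1.0), 0.20),
}
network = build_expression_network(cells, max_distance=1.5, max_intensity_diff=0.1)
degrees = {cell: len(neighbors) for cell, neighbors in network.items()}
print(degrees, clustering_coefficient(network, "c1"))
```

Both thresholds are free parameters in this sketch; the resulting degree and clustering statistics depend strongly on how they are chosen.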

2.4.3 Reconstruction of nuclear receptor interactions

Returning to human intercellular signaling, the reconstruction of the signals from the available data remains a problem as a first step. For this reason, Albert et al. (2003) access the biomedical literature with an automated approach to generate a database of protein interactions with nuclear receptors. To this end, a subset of MEDLINE texts is selected that contains terms from a dictionary (protein and nuclear receptor names as well as keywords like “bind” or “associate”). The dictionary is hierarchically organized (comparable to an ontology) and was initially created manually, but is subsequently extended with the text mining results. The selected texts are decomposed into sentences, which are then searched for co-occurring triples of protein, receptor and keyword terms. Finally, stop lists containing rules that describe known false-positive results are applied, and the remaining extracted interactions are stored in a database.
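The sentence-level co-occurrence search can be illustrated with a small sketch. It is not the pipeline of Albert et al. (2003): the dictionary entries (`ProteinA`, `ReceptorX` and so on), the stop phrase, the sentence splitter and the example text are placeholders invented here, and a real stop list would contain curated rules rather than a single phrase.

```python
import re
from itertools import product

# Placeholder dictionary (invented names, flat instead of hierarchical).
PROTEINS = {"ProteinA", "ProteinB"}
RECEPTORS = {"ReceptorX", "ReceptorY"}
KEYWORDS = {"bind", "binds", "associate", "associates"}
STOP_PHRASES = ("does not bind",)  # hypothetical false-positive rule


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence, terms):
    """Return the dictionary terms that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return sorted(
        t for t in terms
        if re.search(r"\b" + re.escape(t.lower()) + r"\b", lowered)
    )


def extract_triples(text):
    """Collect (protein, receptor, keyword) triples that co-occur in one sentence."""
    triples = []
    for sentence in split_sentences(text):
        candidates = product(
            find_terms(sentence, PROTEINS),
            find_terms(sentence, RECEPTORS),
            find_terms(sentence, KEYWORDS),
        )
        # Apply the stop list last, discarding known false-positive patterns.
        if any(phrase in sentence.lower() for phrase in STOP_PHRASES):
            continue
        triples.extend(candidates)
    return triples


example = (
    "ProteinA binds ReceptorX in vitro. "
    "ProteinB does not bind ReceptorX. "
    "ReceptorY was measured alone."
)
triples = extract_triples(example)
print(triples)
```

Pure co-occurrence ignores sentence structure, which is one reason why such results need the manual curation step described in the text.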

With this process, about 15 thousand co-occurrence triples were retrieved automatically from about 4 thousand abstracts. After manual curation of all results, about 3 thousand co-occurrences were classified as positive, which corresponds to a precision (i.e. the ratio of true positives among all results) of about 20%. Interestingly, the number of detected interactions correlates with the number of published papers for a given receptor. Comparisons with yeast two-hybrid screen results suggest that such a correlation cannot be confirmed by experimental data. Thus, besides the uncertainty of results generated automatically from fuzzy natural language texts, it turns out that text mining also reflects the bias in the literature (see also the review on network extraction from text in Section 2.3.5).

This study shows how partial knowledge of intercellular signaling can be reconstructed from text. However, the locations (cell types or tissues) of the extracted proteins and their receptors are not considered. Thus, although the text mining approach is similar to the approach we will apply here (see Section 5 and the discussion in Section 5.5), the data gained by Albert et al. (2003) is not sufficient to reconstruct entire cell signaling networks.

2.4.4 Analysis of the human immune cell network

Once a network of intercellular signals has been reconstructed, the next challenge is its analysis, since such a network typically consists of a relatively small number of nodes compared to a much larger number of connections. In particular, the fact that any node pair may have an, in principle, unlimited number of parallel edges (modeling the different first-messenger relations between two cell types) is not considered in usual network or graph analysis.

Therefore, Tieri et al. (2005) show how such a network can be analyzed by using the number of different interactions as edge weight in shortest path calculations.
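One way to picture this idea is to collapse the parallel edges between two cell types into a single weighted edge and then run a standard shortest path algorithm. The sketch below is an illustration under stated assumptions, not the method of Tieri et al. (2005): treating edges as directed and using the reciprocal of the interaction count as edge length (so that many mediators make two cell types "closer") are choices made here, and the cell types `A`, `B`, `C` and mediators `m1` to `m4` are invented.

```python
import heapq
from collections import Counter


def collapse_multigraph(edges):
    """Count distinct mediators per (source, target) pair of cell types."""
    return Counter((source, target) for source, target, _mediator in set(edges))


def shortest_path_length(counts, start, goal):
    """Dijkstra on the collapsed graph with edge length 1 / count (assumption)."""
    graph = {}
    for (source, target), n in counts.items():
        graph.setdefault(source, []).append((target, 1.0 / n))
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, length in graph.get(node, []):
            nd = d + length
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return float("inf")


# Toy multigraph (invented): each tuple is (secreting cell, receiving cell, mediator).
edges = [
    ("A", "B", "m1"),
    ("A", "C", "m1"), ("A", "C", "m2"), ("A", "C", "m3"), ("A", "C", "m4"),
    ("C", "B", "m1"), ("C", "B", "m2"),
]
counts = collapse_multigraph(edges)
length_a_to_b = shortest_path_length(counts, "A", "B")
print(counts[("A", "C")], length_a_to_b)
```

In this toy example the indirect route via `C` (length 0.25 + 0.5) is shorter than the direct edge (length 1.0), which shows how the number of parallel interactions can change which path counts as the shortest.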

The network that Tieri et al. (2005) focus on is the human immune cell network, i.e. a subset of the whole intercellular communication network consisting of 19 cell types as nodes and a total of 316 connections, including autocrine self-loops. The data is taken from the