Class    Signal Molecule        Cell Signaling   ExtraCell Signaling   Gene Expression

Fields   Endogenous/Exogenous   From molecule    From tissue
         Other Name             To molecule      To tissue
         Is Synonym             Interaction      Signal Molecule
         Species
         Type
         Cell Signaling
         Tissue
         Synthesis
         Target

Table 4.1: Class definitions in CSNDB. Only classes and fields used for the cell-cell signaling reconstruction are shown.

reconstruction approaches are then performed: the first approach accesses only information that can be directly identified as relevant, whereas the second approach is designed to exploit as much information from the CSNDB as possible. Both approaches and the resulting networks are shown in Section 4.2.

In Section 4.3 the subnetwork of organ-organ interactions resulting from the second reconstruction approach serves as an example of how such networks can be used for further analysis. Section 4.4 briefly describes the implementation of the CSNDB extractions, and a discussion in Section 4.5 closes this chapter.

4.1 Content and organization of CSNDB 47

Signal Molecule : "GH"
    Endogenous
    Other Name      "growth hormone"
    Other Name      "hGH"
    Cell Signaling  "GH-RH -> GH"
    Cell Signaling  "somatostatin -> GH"
    Cell Signaling  "GH -> IGF-1"
    Cell Signaling  "GH -> GH receptor"
    Cell Signaling  "GHS -> GH"
    Type            Hormone
    Tissue          "brain"
    Tissue          "Aorta"
    Tissue          "Placenta"
    Synthesis       "hypophysis"

Signal Molecule : "GH receptor"
    Endogenous
    Other Name      "growth hormone receptor"
    Other Name      "hGHbp"
    Cell Signaling  "GH -> GH receptor"
    Type            Receptor
    Tissue          "brain"
    Tissue          "breast"
    Tissue          "heart"

Cell Signaling : "GH -> GH receptor"
    From molecule   "GH"
    To molecule     "GH receptor"
    Interaction     "ligand-receptor binding"

Table 4.2: Example of the definition of a signaling entity and its corresponding molecules in the CSNDB flat file. Only fields used for reconstructing cell-cell signals are shown here. Field values enclosed in double quotes are references to other objects. References to objects of other classes are realized by exact string matches of descriptors instead of identification numbers.

(i.e., autumn 2005) both the web-interface and the flat file are unavailable.

Several difficulties had to be solved in order to utilize the CSNDB. For example, the data is organized as mutually referencing objects, but the flat file generated from the database contains only user-defined descriptor strings as identifiers. Since these descriptors can contain typos or be otherwise ambiguous, referenced objects often cannot be identified. The problem is made worse by the fact that some referenced objects do not exist in the database at all. Such inconsistencies had to be resolved manually. Furthermore, the data structure definition is not given in XML or in another format suitable for automated data access. Automated processing of the flat file is therefore not easily possible, and many other errors and problems had to be resolved during parsing.
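The reference problem described above can be illustrated with a minimal Python sketch. It assumes that the object descriptors have already been collected into a set; the function name check_references, the data layout and the deliberately misspelled descriptor "GH recptor" are hypothetical and not taken from the CSNDB. The similar descriptors proposed by difflib are only candidates for manual review, not an automatic repair.

```python
import difflib

def check_references(objects, references, cutoff=0.85):
    """Split references into resolved and dangling ones.

    objects    -- set of known descriptor strings (hypothetical layout)
    references -- iterable of descriptor strings used as references
    Returns (resolved, dangling); dangling maps each unknown descriptor
    to a list of similar known descriptors (possibly empty).
    """
    resolved, dangling = [], {}
    candidates = sorted(objects)
    for ref in references:
        if ref in objects:
            # Exact string match: the only kind of link the flat file supports.
            resolved.append(ref)
        else:
            # Near matches hint at a typo; an empty list hints at a missing object.
            dangling[ref] = difflib.get_close_matches(ref, candidates, n=3, cutoff=cutoff)
    return resolved, dangling

# Hypothetical example data.
known = {"GH", "GH receptor", "IGF-1", "somatostatin"}
refs = ["GH", "GH recptor", "GHS"]
resolved, dangling = check_references(known, refs)
```

In this example "GH" resolves directly, "GH recptor" is reported together with the suggestion "GH receptor", and "GHS" is reported without any suggestion, which corresponds to the case of an object that does not exist.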

4.1.1 Relevant classes

The CSNDB data structure consists of classes, each of which contains objects that implement the class scheme. Each object therefore consists of a name and a number of fields. The object names are used in the flat file to establish references between objects. A field can contain a value or a reference to another object, or it can act as a boolean flag; in the latter case, the mere appearance of the field means that its value is set to “true”. References are enclosed in double quotes. A field may appear several times with different values or references (e.g., for a molecule that appears in several different tissues) and is left out entirely if no value is set (i.e., there are no empty fields). Table 4.1 shows the classes and fields mainly accessed in the present context to reconstruct intercellular signaling networks, and Table 4.2 presents the objects GH (growth hormone) and GH receptor as typical example objects of the CSNDB.
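The object layout described above can be read with a small parser. The following Python sketch is not the implementation used for the extractions; it assumes a layout like the one in Table 4.2 (a header line of the form Class : "name", followed by one field per line), which may differ from the real flat file. The names parse_flat_file, FIELDS and HEADER are hypothetical, and for simplicity quoted references and plain values are both stored as strings.

```python
import re
from collections import defaultdict

# Field names from Table 4.1 (Endogenous/Exogenous split into two flags).
# Longest names first, so multi-word names are tried before shorter ones.
FIELDS = sorted(
    ["Endogenous", "Exogenous", "Other Name", "Is Synonym", "Species", "Type",
     "Cell Signaling", "Tissue", "Synthesis", "Target",
     "From molecule", "To molecule", "Interaction",
     "From tissue", "To tissue", "Signal Molecule"],
    key=len, reverse=True)

# Assumed header layout: Class : "object name"
HEADER = re.compile(r'^(?P<cls>[^:"]+?)\s*:\s*"(?P<name>[^"]*)"$')

def parse_flat_file(text):
    """Return {(class, name): {field: [values]}} for the assumed layout."""
    objects, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = defaultdict(list)
            objects[(header["cls"], header["name"])] = current
            continue
        if current is None:
            continue  # field line before any header: skipped in this sketch
        for field in FIELDS:
            if line == field:
                current[field].append(True)  # boolean flag: presence means true
                break
            if line.startswith(field + " "):
                # Repeated fields accumulate in a list; quotes are dropped.
                current[field].append(line[len(field):].strip().strip('"'))
                break
        # Lines with unknown field names are silently ignored here.
    return objects

sample = '''
Signal Molecule : "GH"
Endogenous
Other Name "growth hormone"
Type Hormone
Tissue "brain"
Tissue "Aorta"
Synthesis "hypophysis"
Cell Signaling : "GH -> GH receptor"
From molecule "GH"
To molecule "GH receptor"
'''
objs = parse_flat_file(sample)
```

Keeping every field as a list reflects the fact that fields may repeat, and a missing key corresponds to a field that was left out.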

Among the relevant fields of a Signal Molecule, a molecule can be marked as Endogenous or Exogenous. This allows exogenous molecules such as pathogens, viruses or drugs to be excluded in the present context. Synonymous molecule names are linked by the fields Other Name and Is Synonym. In some cases the name of the Species containing the molecule is specified. Values of Type can be, e.g., Hormone, Neurotransmitter or Cytokine, and a molecule can be assigned to more than one type. The field Cell Signaling of a Signal Molecule references all signaling interactions in which this signal molecule takes part.
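As a rough illustration of how these fields could be used, the following Python sketch drops exogenous molecules and builds a lookup from synonyms to object names. It assumes molecules are held as dictionaries of field lists; the function names and the entry "example drug" are hypothetical. Treating a molecule without either flag as admissible is likewise an assumption of this sketch, not a statement about the CSNDB.

```python
def drop_exogenous(molecules):
    """Keep every molecule that is not explicitly flagged as Exogenous."""
    return {name: fields for name, fields in molecules.items()
            if not fields.get("Exogenous")}

def alias_index(molecules):
    """Map each object name and each Other Name entry to the object name."""
    index = {}
    for name, fields in molecules.items():
        for alias in [name, *fields.get("Other Name", [])]:
            # If two objects share an alias, the first one wins in this sketch.
            index.setdefault(alias, name)
    return index

molecules = {
    # Fields as listed for GH in Table 4.2.
    "GH": {"Endogenous": True, "Other Name": ["growth hormone", "hGH"]},
    # Hypothetical exogenous entry, for illustration only.
    "example drug": {"Exogenous": True},
}
kept = drop_exogenous(molecules)
aliases = alias_index(kept)
```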

The fields Tissue, Synthesis and Target are of special importance, since they contain the names of the locations where the Signal Molecule has been found, where it is produced, and where it is received, respectively. It turned out that, although the field name Tissue suggests a specific type of location (i.e., a tissue), this field can contain locations of very different kinds, e.g., cell types, organs or organ systems, which are not all regarded as tissues in a biomedical sense and which reside at various levels of the anatomical hierarchy.

Hence, in the following we prefer the term location (instead of tissue), which in the remainder of this chapter refers to entries in the fields Tissue, Synthesis and Target. In order to access the locations by their types, all locations finally used in the network are manually assigned to a location type (e.g., cell type, tissue or organ; see Section 4.3).

In a Cell Signaling object, the two interacting molecules are specified in the fields From molecule and To molecule, and the type of the interaction is defined in the field Interaction. The type can be, e.g., phosphorylation, protein-protein interaction or ligand-receptor binding.

Molecular interactions are also stored as Gene Expression, a class similar to Cell Signaling; i.e., a Gene Expression object possesses all features of a Cell Signaling object.

Gene Expression is additionally considered here in order to capture events from steroid signaling, where hormones bind to a receptor inside the cell and influence gene expression directly (Section 2.1). Further information about locations linked by intercellular signals is explicitly stored in ExtraCell Signaling objects, in which two locations (in From tissue and To tissue) are directly connected through a Signal Molecule. In some cases this information is also captured by Cell Signaling objects and their respective signaling molecules. Since the number of ExtraCell Signaling objects in the CSNDB is low (15 objects, see Table 4.3), most reconstructed signals are inferred from interacting molecules and their locations.

4.1.2 Assembly of intercellular signals

Finally, it remains to derive how intercellular signals can be extracted from the presented data scheme. From a Cell Signaling object, the nodes and the links of the network can be inferred by connecting the locations of the interacting molecules From molecule and To molecule. With this information, the templates that model a cell-cell signal (Section 3.1) are filled with the signaling molecules (ligand and receptor) and their locations. Hence, in the case of the CSNDB reconstructions, the locations that form the nodes of the network include not only cell types but also, for instance, tissues and organs.

As an example, consider the growth hormone signaling GH -> GH receptor in Table 4.2: here the four locations of the GH molecule (brain, Aorta, Placenta and hypophysis) can be connected to the three locations of the GH receptor (brain, breast and heart).

The direction of each link between locations is thus determined by the Cell Signaling object: the locations of the From molecule act as source nodes and the locations of the To molecule as target nodes.
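The assembly step for the GH example can be sketched in a few lines of Python. The sketch assumes that every location of the From molecule is linked to every location of the To molecule and that the three location fields are simply pooled; both are simplifications, and the names locations and location_links as well as the dictionary layout are hypothetical.

```python
from itertools import product

LOCATION_FIELDS = ("Tissue", "Synthesis", "Target")

def locations(molecule):
    """Pool the entries of all location fields, keeping order, dropping duplicates."""
    pooled = []
    for field in LOCATION_FIELDS:
        for loc in molecule.get(field, []):
            if loc not in pooled:
                pooled.append(loc)
    return pooled

def location_links(signal, molecules):
    """Directed links (source, target, from molecule, to molecule) for one signal."""
    source = signal["From molecule"]
    target = signal["To molecule"]
    # All-pairs assumption: each source location is linked to each target location.
    return [(a, b, source, target)
            for a, b in product(locations(molecules[source]),
                                locations(molecules[target]))]

# Locations as listed in Table 4.2.
molecules = {
    "GH": {"Tissue": ["brain", "Aorta", "Placenta"], "Synthesis": ["hypophysis"]},
    "GH receptor": {"Tissue": ["brain", "breast", "heart"]},
}
signal = {"From molecule": "GH", "To molecule": "GH receptor"}
links = location_links(signal, molecules)
```

Under the all-pairs assumption the four GH locations and the three GH receptor locations yield 4 x 3 = 12 directed links, including the link from brain to brain.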

The different semantics of the location fields (Tissue, Synthesis or Target) are considered more specifically in one of the reconstruction approaches. Some fields of the Cell Signaling and Signal Molecule classes (e.g., Endogenous/Exogenous and Species, see Table 4.1) are used for filtering purposes.

Although the CSNDB flat file contains even more information about the molecules and their signals, the cell signaling reconstruction tasks use only the fields described here.

Since most of the additional information appears only rarely in the selected molecules and signals, we would expect it to change the accuracy or topology of the extracted networks only slightly. The two extraction runs on the CSNDB, which are described in Sections 4.2.1 and 4.2.3, differ mainly in the selection of appropriate Cell Signaling classes and in the handling of the different location fields.

Selected database objects

                                  Reconstruction I       Reconstruction II
Entity                 CSNDB      Total    Locations     Total    Locations
Cell Signaling         1382       169      74            180      106
Signal Molecule        3512       264      120           262      160
Gene Expression        83         -        -             0        0
ExtraCell Signaling    15         -        -             8        8

Resulting graphs

                              Reconstruction I                  Reconstruction II
                              direct                            direct
Locations           CSNDB     mult    uniq    bip     trip      mult    uniq    bip     trip
All      Nodes      172           85          159     205           94          215     287
         Edges      -         3584    1614    1069    935       3214    1551    1222    1102
Organs   Nodes      88            29          74      107           57          155     213
         Edges      -         1243    430     481     451       2117    871     884     831

Table 4.3: Summary of CSNDB extraction results. The upper part of the table shows, for the relevant database entities, the number of occurrences in the CSNDB and the numbers selected by the two applied reconstruction approaches. For both approaches, the total number of selected entities (Total) and the number of entities for which locations are specified (Locations) are given. The entities with locations could be used in the subsequent graph constructions. Note that the classes Gene Expression and ExtraCell Signaling are not accessed in the first reconstruction approach. The dimensions of the graphs resulting from the two approaches are shown in the lower part of the table (for all locations as well as for the subset of organ locations). The CSNDB column shows the total number of available locations and organs, whereas the other columns in the lower part of the table contain the node and edge numbers resulting for each available graph representation: direct multiple (mult) and unique (uniq), bipartite (bip) and tripartite (trip).