
2.2 Protein-Protein Interactions

2.2.1 Identification of protein interactions

Detecting all possible physical interactions within an organism – the interactome (Cusick et al., 2005) – is an essential step toward deciphering the complex molecular relationships in living systems. Different experimental and computational methodologies have been developed to identify the specific mechanisms of protein recognition at the molecular level and to elucidate the global picture of protein interactions in the cell. We briefly introduce (1) two established experimental methods, (2) literature curation and (3) in silico techniques for discovering protein interactions and discuss their methodological capabilities and limitations.

4 In vivo methods refer to experiments performed in living cells, while in vitro methods are carried out in a controlled environment.


Table 2.1: Experimental methods for detecting protein interactions and their characteristics. The table summarizes for each technique whether it is suitable for large-scale analysis (+ vs. −), whether it is an in vivo or in vitro system (see footnote 4), the type of interaction it detects (binary vs. complex) and the type of interaction characterization. (Table adapted from Shoemaker and Panchenko (2007a))

Method                                         Large-scale approach  Cell assay  Type of interaction  Type of characterization
Yeast two-hybrid                               +                     in vivo     binary               Identification
Tandem affinity purification–MS                +                     in vitro    complex              Identification
Protein microarrays                            +                     in vitro    complex              Identification
Phage display                                  +                     in vitro    complex              Identification
Co-immunoprecipitation                         −                     in vivo     complex              Identification
Surface plasmon resonance                      −                     in vitro    complex              Kinetic, dynamic characterization
Electron microscopy                            −                     in vitro    complex              Structural and biological characterization
Fluorescence resonance energy transfer (FRET)  −                     in vivo     binary               Biological characterization
X-ray crystallography, NMR spectroscopy        −                     in vitro    complex              Structural and biological characterization

2.2.1.1 Experimental detection methods

Experimental elucidation of interactions between gene products is done either at small or large scale (Rivas and Fontanillo, 2010). Experiments detecting fewer than 100 protein interactions are commonly considered small-scale, while all others are denoted large-scale (Patil et al., 2011). Methods that identify direct physical interactions between pairs of proteins are called binary methods. Approaches that determine physical interactions within a group of proteins, without distinguishing between direct and indirect interactions, are called co-complex methods.
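As a minimal sketch of the small- versus large-scale distinction, the following Python snippet groups interaction records by the experiment that reported them and applies the threshold of 100 interactions mentioned above. The record layout (an experiment identifier plus two protein names) and the function name `classify_experiments` are assumptions made for illustration only; they do not correspond to any particular database schema.

```python
from collections import defaultdict

# Threshold from the text: fewer than 100 interactions counts as small-scale.
LARGE_SCALE_THRESHOLD = 100


def classify_experiments(records, threshold=LARGE_SCALE_THRESHOLD):
    """Label each experiment as 'small-scale' or 'large-scale'.

    `records` is an iterable of (experiment_id, protein_a, protein_b) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    pairs_per_experiment = defaultdict(set)
    for experiment_id, protein_a, protein_b in records:
        # Store unordered pairs so that (A, B) and (B, A) are counted once.
        pairs_per_experiment[experiment_id].add(frozenset((protein_a, protein_b)))

    return {
        experiment_id: "small-scale" if len(pairs) < threshold else "large-scale"
        for experiment_id, pairs in pairs_per_experiment.items()
    }


# Toy data: "exp1" reports 3 distinct interactions, "exp2" reports 150.
toy_records = [
    ("exp1", "P1", "P2"),
    ("exp1", "P2", "P1"),  # same pair as above, reversed order
    ("exp1", "P1", "P3"),
    ("exp1", "P3", "P4"),
]
toy_records += [("exp2", f"A{i}", f"B{i}") for i in range(150)]
labels = classify_experiments(toy_records)
```

Counting unordered pairs rather than raw records is a design choice of this sketch; it avoids inflating an experiment's size when the same interaction is listed in both orientations.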

Numerous experimental methods have been developed for protein interaction detection; see Table 2.1 and Phizicky and Fields (1995) for a review. Traditionally, protein interactions have been detected by genetic, biochemical or biophysical techniques, such as X-ray crystallography or fluorescence resonance energy transfer (FRET). Such small-scale studies focus on individual proteins for generating specific interaction maps (Finley and Brent, 1994; Mayes et al., 1999; Goehler et al., 2004). However, the increasing availability of fully sequenced genomes and the speed at which proteins are discovered increased the interest in techniques that screen large sets of candidates systematically.

Two widely established large-scale methodologies are the yeast two-hybrid (Y2H) system (Fields and Song, 1989) and tandem affinity purification coupled to mass spectrometry (TAP-MS) (Rigaut et al., 1999); the former is a binary and the latter a co-complex method. Both methodologies have been used for large-scale experiments in different model organisms, including yeast, fly, worm and human. The majority of interaction data currently available in the databases IntAct and MINT, for instance, is derived from Y2H and its variants. A general overview of the number of protein interactions per detection method is shown in Figure 2.6.

Figure 2.6: Overview of the number of protein interactions per detection method as provided in the public databases IntAct and MINT (March 2011).

In the following, we briefly introduce Y2H and TAP-MS, as the work presented in this thesis largely relies on protein interaction data derived from such experiments. We highlight the systematic and methodological limitations inherent to each method. These effects have to be kept in mind, as the number of experimental errors inevitably affects the outcomes of further analysis.

Yeast two-hybrid assay (Y2H). The Y2H assay determines whether two proteins physically interact with each other by using the principle of transcriptional activation. Genetically modified yeast strains are used to express two fusion proteins (two hybrids), which, if they interact, induce the expression of a reporter gene. Fusion proteins are created by linking proteins to separable protein domains of transcription factors. One protein, the bait, is fused to the DNA-binding domain, which is capable of binding the promoter of a reporter gene. A potential binding partner, the prey, is linked to the activator domain, which activates transcription by facilitating the binding of the RNA polymerase to the promoter. If both proteins interact, their complex forms an intact, functional transcriptional activator which mediates the transcription of the reporter gene (see Figure 2.7). Reporter genes encode proteins whose function provides a simple readout, such as LacZ from E. coli, which causes a colorimetric reaction within the cell (Brueckner et al., 2009).

Large-scale library screens can be performed by using a cDNA library instead of a single prey protein. Y2H has been extensively applied in several large-scale screens (Uetz et al., 2000; Ito et al., 2001; Rual et al., 2005; Stelzl et al., 2005) and for individual experiments (Finley and Brent, 1994; Mayes et al., 1999; Davy et al., 2001).

Figure 2.7: The yeast two-hybrid system for detecting binary protein-protein interactions (adapted from Alberts (1998)). A target protein, the bait, is fused to a DNA-binding domain that localizes it to the promoter region of a reporter gene. A potential binding partner, the prey, is linked to an activator domain. The interaction of both fusion proteins forms an intact, functional transcriptional activator which triggers the expression of the reporter gene.

Overall, Y2H is an established in vivo technique, well-suited for large-scale analysis. It allows the detection of both transient and stable interactions, independently of endogenous protein expression. Although yeast cells are utilized for expressing fusion proteins, Y2H is not restricted to interactions between yeast proteins; in principle, the genetic code of any fusion protein may be introduced into the yeast cell. The major drawback of the yeast two-hybrid assay is its poor reliability. Y2H is performed in the nucleus, hence many proteins are not analyzed in their native compartment. Thus, two proteins may interact in the experiment although they would not do so in their natural environment (Koegl and Uetz, 2007). Conversely, essential post-translational modifications of non-yeast proteins may not be carried out, or the fusion process might interfere with the true interactions between proteins. As a consequence, Y2H data are associated with a large number of false positive and false negative interactions. Early estimates on distinct data sets indicated that only 30–50% of the detected interactions are biologically meaningful. More recent quality assessments suggested that Y2H data contain fewer false positives than previously presumed. Nevertheless, Y2H screens are still far from being reliable, and the rate of interactions not detectable by Y2H remains substantial (Yu et al., 2008).

Tandem affinity purification mass spectrometry (TAP-MS). In this technique, individual proteins are first fused to a protein fragment (the ‘tag’) which is used as an anchor for biochemical purification of protein complexes. The modified proteins are expressed and purified from cell extracts using the tag. Other proteins bound to the tagged protein are co-purified and subsequently identified by mass spectrometry (see Figure 2.8).

In contrast to Y2H assays, data derived from co-complex approaches, such as TAP-MS or co-immunoprecipitation, cannot be directly translated into binary interactions. Co-complex methods only identify proteins involved in a given complex rather than the direct interactions between them. Different models are employed to translate the group-based observations into pairwise interactions. The matrix model assumes that all proteins of a purified complex interact, whereas the spokes model infers only interactions between the tagged protein and each co-purified protein. The latter is often used, as it yields a smaller number of false positives (Hakes et al., 2007). Bader and Hogue (2002) estimated, for instance, that the number of false positives is three times larger in the matrix model.
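The two translation models can be illustrated with a short Python sketch. The function names `spokes_model` and `matrix_model`, as well as the toy purification, are assumptions introduced here for illustration; the sketch only shows how each model expands one purified complex into pairwise interactions and does not reproduce any specific published implementation.

```python
from itertools import combinations


def spokes_model(bait, preys):
    """Pair the tagged protein (bait) with each co-purified protein (prey)."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_model(bait, preys):
    """Pair every protein of the purified complex with every other one."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: bait "A" co-purifies with "B", "C" and "D".
spoke_pairs = spokes_model("A", ["B", "C", "D"])
matrix_pairs = matrix_model("A", ["B", "C", "D"])

print(len(spoke_pairs))   # 3 pairs: A-B, A-C, A-D
print(len(matrix_pairs))  # 6 pairs: all combinations of A, B, C and D
```

For a complex of n proteins, the spokes model yields n − 1 pairs, whereas the matrix model yields n(n − 1)/2. The gap grows quickly with complex size, which is consistent with the larger number of false positives attributed to the matrix model.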

Genome-wide TAP-based studies have been successfully performed for yeast (Krogan et al., 2006; Gavin et al., 2006), and for a smaller number of proteins in human (Ewing et al., 2007) and E. coli (Butland et al., 2005). Contrary to Y2H, TAP-MS detects protein complexes and interactions within the native cellular environment and is able to capture several members of a complex. On the other hand, protein complexes that are not present under the given conditions might be missed, loosely associated proteins of a complex might be washed off during purification, and the tagging of a protein may interfere with complex formation. Accordingly, the coverage of TAP-MS is limited, as a large fraction of interactions, e.g., transient interactions, might be missed. Yet, false positive and false negative rates are much lower than for other experimental techniques (Kemmeren et al., 2002; von Mering et al., 2002), including Y2H, as interaction information is obtained under more natural physiological conditions than those induced by Y2H.

However, both methods detect rather complementary types of interaction, and only the combination of different approaches with bioinformatic tools will eventually yield a more complete characterization of physiologically relevant protein interactions in a given cell or organism (Brueckner et al., 2009).

Literature curation

Protein interaction data, retrieved from small- and large-scale experiments, are commonly published in the scientific literature. To make this knowledge available to the scientific community, interaction data have to be curated and archived in specialized databases.

Literature curation translates information on physical interactions between proteins from free-text publications into a structured format (Chatr-aryamontri et al., 2007). Curators read through the literature, identifying and extracting all significant information: the organism being studied, the gene product annotated, the proteins that interact, the type of experiment performed, and an identifier (typically the PubMed ID) as the source of information. This allows for quality control of the data.
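The structured format produced by curation can be pictured as one record per reported interaction. The following Python sketch is a hypothetical illustration: the class `CuratedInteraction`, its field names and the helper `missing_fields` are assumptions chosen to mirror the items listed above and do not reproduce the schema of any actual interaction database.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CuratedInteraction:
    """One literature-curated interaction; field names are illustrative only."""
    organism: str
    protein_a: str
    protein_b: str
    experiment_type: str
    pubmed_id: str  # source publication, kept so the entry can be traced back


def missing_fields(record):
    """Return the names of empty fields as a very basic quality check."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Placeholder entry with no source identifier (all values are made up).
incomplete = CuratedInteraction(
    organism="Saccharomyces cerevisiae",
    protein_a="ProteinA",
    protein_b="ProteinB",
    experiment_type="two hybrid",
    pubmed_id="",
)
print(missing_fields(incomplete))  # ['pubmed_id']
```

Keeping the source identifier as a mandatory part of each record is what makes the kind of quality control described above possible: an entry without it cannot be checked against the original publication.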

However, the volume and growth of biomedical literature make it hard to curate all newly published information (Hunter and Cohen, 2006; Chatr-aryamontri et al., 2007). In addition, relevant data may be missed by oversight, an intrinsic weakness of purely human curation, and literature curation is ‘hypothesis-driven’ with prior assumptions of what could be learned. Accordingly, literature-curated data are often biased toward better-characterized genes and proteins (Cusick et al., 2009). As a consequence, only a small fraction of all published interactions has been captured in the interaction databases so far.

Figure 2.8: TAP-MS procedure for characterizing protein complexes. A target protein, the bait, is fused to a protein fragment – the TAP tag – comprising a protein A-IgG binding domain (ProtA), a calmodulin binding peptide (CBP) and a TEV protease cleavage site. The modified protein is then expressed in cells, where it may carry out its natural function by participating in one or more protein complexes. Protein complexes are purified from cell extracts by two successive affinity chromatography steps using the TAP tag. Co-purified proteins bound to the bait are then identified by standard mass spectrometry.

Computational detection approaches

As discussed above, experimental detection methods and literature curation have several limitations and do not yet come close to elucidating full interactomes. Thus, several computational methods have been proposed for predicting protein-protein interactions in silico based on various kinds of evidence. A complete review of the available approaches, including their strengths and limitations, is beyond the scope of this section. As predicted protein interactions are not used in this work, we only briefly introduce established concepts for predicting protein interactions and refer interested readers to extensive reviews (Valencia and Pazos, 2002; Shoemaker and Panchenko, 2007b; Liu et al., 2008).

A common approach for inferring novel protein interactions is based on the analysis of protein domains to determine which domains participate in an interaction. Given a set of protein domains that interact frequently in known interactions, novel interactions can be predicted between proteins containing the same domain pairs (Deng et al., 2002; Chen and Liu, 2005; Jothi et al., 2006). Another established methodology relies on the concept of ‘interologs’, which refers to pairs of homologous proteins interacting in different organisms (Matthews et al., 2001). Novel protein interactions are thus inferred by identifying evolutionarily conserved protein interactions in related genomes (Sharan et al., 2005; Wiles et al., 2010). Additional methods employ:

• phylogenetic profiles (Pellegrini et al., 1999; Goh and Cohen, 2002),

• gene fusion events (Rosetta stone) (Marcotte et al., 1999b; Enright et al., 1999),

• co-localization information, such as gene neighborhood or gene cluster (Dandekar et al., 1998; Overbeek et al., 1999),

• patterns of co-occurrence or co-expression (Jansen et al., 2002; Ge et al., 2001),

• sequence and structural similarities between interacting proteins (Comeau et al., 2004; Sikić et al., 2009).
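The first two concepts described above, domain-pair-based inference and interolog transfer, can be sketched in a few lines of Python. Both functions below are deliberately simplified illustrations under stated assumptions: the names `predict_by_domain_pairs` and `transfer_interologs`, the input layouts and the toy identifiers are hypothetical, no scoring or statistical filtering is applied, and self-interactions are ignored. The cited methods are considerably more elaborate.

```python
from itertools import product


def predict_by_domain_pairs(protein_domains, interacting_domain_pairs):
    """Predict protein pairs that contain at least one known interacting domain pair.

    protein_domains: dict mapping a protein to the set of its domain names.
    interacting_domain_pairs: set of frozensets, each holding the domains of one pair.
    """
    predictions = set()
    proteins = sorted(protein_domains)
    for i, protein_a in enumerate(proteins):
        for protein_b in proteins[i + 1:]:
            domain_combos = product(protein_domains[protein_a], protein_domains[protein_b])
            if any(frozenset(combo) in interacting_domain_pairs for combo in domain_combos):
                predictions.add(frozenset((protein_a, protein_b)))
    return predictions


def transfer_interologs(source_interactions, orthologs):
    """Map interactions from a source organism onto orthologs in a target organism.

    source_interactions: iterable of (protein_a, protein_b) in the source organism.
    orthologs: dict mapping a source protein to a set of target-organism orthologs.
    """
    predictions = set()
    for protein_a, protein_b in source_interactions:
        for target_a in orthologs.get(protein_a, ()):
            for target_b in orthologs.get(protein_b, ()):
                if target_a != target_b:  # simplification: skip self-interactions
                    predictions.add(frozenset((target_a, target_b)))
    return predictions


# Toy data with made-up protein and domain identifiers.
domain_predictions = predict_by_domain_pairs(
    {"P1": {"D1"}, "P2": {"D2"}, "P3": {"D3"}},
    {frozenset(("D1", "D2"))},
)
interolog_predictions = transfer_interologs(
    [("yA", "yB"), ("yA", "yC")],
    {"yA": {"hA"}, "yB": {"hB"}},  # "yC" has no known ortholog
)
```

In this toy example, only P1 and P2 are predicted to interact by the domain-based sketch, and only the pair hA and hB is transferred by the interolog sketch, since the second source interaction involves a protein without an ortholog in the target organism.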