
Genome-wide analysis of mutually exclusive exons

1.5 Mutually exclusive exons

1.5.2 Genome-wide analysis of mutually exclusive exons

The prediction algorithm is parameterised. Different parameters result in different predictions.

With less restrictive parameters, more of the already known annotations can be reconstructed, which increases sensitivity, but more false positive predictions are also introduced, which lowers specificity. We determined reasonable parameters by applying the prediction algorithm to a set of example genes and to the X chromosome of Drosophila melanogaster. These parameters needed further evaluation. Very low parameter values were therefore used for the search in the whole fruit fly genome, to examine the limits of the parameters with respect to sensitivity. All predictions were stored in a database. A corresponding web application was developed to analyse the results under different parameters, which can be chosen after the prediction process. This application, called Kassiopeia, stores the genome-wide analyses of mutually exclusive splicing in different organisms and makes them accessible. The development of Kassiopeia was part of this work (section 3.1, p. 119).
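The trade-off between sensitivity and false positives can be illustrated with a minimal Python sketch. It is not the prediction algorithm of this work: the `Candidate` class, the single `similarity` score, the threshold values and the toy data are all assumptions made for illustration. Predictions absent from the annotation are counted here as false positives, although in practice some of them may be genuine but unannotated events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    exon_id: str       # hypothetical identifier of a predicted exon
    similarity: float  # hypothetical score in [0, 1]; higher is stricter


def evaluate(candidates, known_ids, min_similarity):
    """Return (sensitivity, number of unannotated predictions) for a threshold."""
    predicted = {c.exon_id for c in candidates if c.similarity >= min_similarity}
    true_positives = predicted & known_ids
    sensitivity = len(true_positives) / len(known_ids) if known_ids else 0.0
    unannotated = len(predicted - known_ids)
    return sensitivity, unannotated


# Toy data (assumed): three annotated exons and two unannotated candidates.
candidates = [
    Candidate("a", 0.9), Candidate("b", 0.6), Candidate("c", 0.3),
    Candidate("x", 0.5), Candidate("y", 0.2),
]
known = {"a", "b", "c"}

# Lowering the threshold recovers more known exons but admits more
# candidates that are not in the annotation.
for threshold in (0.8, 0.4, 0.1):
    sensitivity, unannotated = evaluate(candidates, known, threshold)
    print(f"threshold={threshold}: sensitivity={sensitivity:.2f}, "
          f"unannotated={unannotated}")
```

In this toy data set, only the most permissive threshold recovers all annotated exons, and it also admits both unannotated candidates.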

Kassiopeia

One could imagine using an already established tool such as the UCSC genome browser [71, 72] instead of developing a new application. This genome browser is a popular tool, used in many web applications to visualise genome annotations, as shown in Figure 1.5-2 for the myosin heavy chain gene of Drosophila melanogaster. It allows adding multiple annotation tracks that contain position-specific information related to the genome. Examples are the positions of exons, expression patterns and sequence conservation across species.

For the Kassiopeia database we could not follow this approach and decided to develop a new application based on the WebScipio source code. The main reason was the requirement to allow filtering of the MXE candidates after the prediction process (Figure 3.1-2, p. 124). To our knowledge, this is not possible in any tool published so far. Each gene entry in Kassiopeia is linked to other tools and databases to make their data easily accessible. The Drosophila genes are linked to the corresponding FlyBase⁵ [73] entry, the modENCODE data in the UCSC genome browser⁶ and to WebScipio⁷.
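Filtering after the prediction process amounts to storing every candidate once, together with its scores, and applying the thresholds only at query time. The following Python sketch shows this idea with an in-memory SQLite database. It does not reflect the actual Kassiopeia schema or implementation: the table layout, the column names `length_diff` and `similarity`, the gene and exon names and all values are hypothetical.

```python
import sqlite3

# In-memory database standing in for the store of permissive predictions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidates ("
    " gene TEXT, exon TEXT, length_diff INTEGER, similarity REAL)"
)
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?, ?, ?)",
    [
        ("geneA", "exon3a", 0, 0.85),   # hypothetical rows
        ("geneA", "exon3b", 3, 0.60),
        ("geneB", "exon7a", 21, 0.35),
    ],
)


def filter_candidates(conn, max_length_diff, min_similarity):
    """Select stored candidates that satisfy thresholds chosen at query time."""
    rows = conn.execute(
        "SELECT gene, exon FROM candidates"
        " WHERE length_diff <= ? AND similarity >= ?"
        " ORDER BY gene, exon",
        (max_length_diff, min_similarity),
    )
    return rows.fetchall()


# The same stored predictions viewed under two parameter sets.
strict = filter_candidates(conn, max_length_diff=5, min_similarity=0.8)
relaxed = filter_candidates(conn, max_length_diff=30, min_similarity=0.3)
print(strict)
print(relaxed)
```

Because the thresholds are query parameters rather than part of the prediction run, changing them does not require repeating the genome-wide search.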

Drosophila melanogaster

We chose the Drosophila melanogaster genome for the first genome-wide analysis and prediction of MXEs. Since the first classical genetic experiments with fruit flies by Thomas Hunt Morgan in 1908, Drosophila melanogaster has become one of the best-analysed model organisms for genetic studies. The annotation of its genes is in an advanced state, thanks to cDNA sequencing, whole-genome sequencing of closely related Drosophila species, transcriptome sequencing using RNA-Seq and additional computational methods. In addition, Drosophila melanogaster was the main subject of the modENCODE project⁸ [15, 74]. This makes it possible to validate our prediction approach against reliable annotations.

5 http://flybase.org

6 http://flybase.org/cgi-bin/gbrowse/dmelrnaseq

7 http://www.webscipio.org

Figure 1.5-2 | UCSC genome browser. The figure shows the genomic region of the myosin heavy chain (Mhc) gene of Drosophila melanogaster in the UCSC genome browser (http://genome.ucsc.edu). The annotation tracks FlyBase Genes, Spliced ESTs and Conservation are selected.

Compared to human and mouse, which also have high-quality annotations, the fruit fly genome contains shorter introns, which makes the analysis less complex and the prediction more robust. For evaluating the sensitivity of our prediction method, the most important advantage of Drosophila melanogaster over the model organisms Arabidopsis thaliana and Caenorhabditis elegans is that many mutually exclusive splicing events have already been reported:

- Drosophila melanogaster: 102-251 events [40, 50, 74]

- Arabidopsis thaliana: 3-4 events [40, 50]

- Caenorhabditis elegans: 30-55 events [40, 50, 75]

- Homo sapiens: 124-212 events [40, 50]

Twelve Drosophila species

Our prediction pipeline was applied to eleven additional Drosophila species besides Drosophila melanogaster (section 3.1, p. 119). This enables the analysis of the evolution of MXE clusters. The main result was that these clusters evolved very fast during the past 50 million years (section 3.2, p. 137). The mechanism appears to have been introduced frequently into a wide range of genes. The analyses of the other Drosophila species also showed how accurate the predictions are for species that lack a good gene annotation.

Arabidopsis thaliana

Intron retention is the most prevalent type of alternative splicing in plants, in contrast to exon skipping in Metazoa [44, 49]. Mutually exclusive splicing events seem to be very rare in plants [50] and have been overlooked by some studies up to now, for example in [76]. In the model organism Arabidopsis thaliana, three to four events of mutually exclusive splicing have been reported [40, 50], and 14 events are annotated in release 10 of The Arabidopsis Information Resource (TAIR) database [77]. Based on this release, our prediction pipeline found 99 internal MXE candidates (section 3.1, p. 119). We therefore expect that the number of mutually exclusive splicing events in plants has been underestimated.

8 http://www.modencode.org

Caenorhabditis elegans

Another model organism with an accurate gene annotation is the nematode Caenorhabditis elegans. So far, 30 to 55 events of mutually exclusive splicing have been reported [40, 50, 75], and 35 are annotated in WormBase release 230. Based on this release, our predictions suggest 283 internal MXE candidates (section 3.1, p. 119).

Homo sapiens

Humans are the organism of greatest scientific interest. This has resulted in an accurate annotation of human genes, which is the basis of our prediction pipeline. In the human genome, fewer MXEs are annotated than in the Drosophila melanogaster genome⁹, even though the total number of alternative splicing events is much higher [40]. In humans, 124 to 212 events of mutually exclusive splicing have been reported [40, 42, 50].