
Chapter 1 - General introduction

1.1 Phylogenomics

Phylogenetics, the reconstruction of evolutionary history, is a prerequisite for almost any evolutionary study. According to Darwin’s theory of evolution, under the right ecological conditions an ancestral species may occasionally split into two descendant species. Initially, these two descendant species are morphologically very similar; however, measurable differences accumulate over time after the initial divergence.

This process, described as “descent with modification”, occurs repeatedly in the descendant species during the course of evolution. The branching lineages can be depicted as a tree-like structure, illustrating Darwin’s idea that slow, successive modifications can give rise to the extreme diversity observed in contemporary species [17].

The earliest phylogenetic analyses were based mainly on similarities in morphological and ultrastructural characters. These morphology-based methods have proved powerful in some respects, as they allow the main groups of animals and plants to be identified easily [17, 18]. However, one main limitation of morphology-based methods is that reliable homologous characters are rare or even nonexistent in some taxa (e.g. microorganisms). The emergence of DNA sequencing in the 1970s revolutionized phylogenetic studies, as it dramatically increased the number of homologous characters available for phylogenetic reconstruction, thus improving the resolution of phylogenetic trees [18]. A few genes, particularly the small subunit ribosomal RNA (SSU rRNA), became the reference markers for microbial identification and the inference of deep phylogenetic relationships [19].

However, as more genes were included in analyses, topological conflicts between phylogenies based on different genes were revealed [18]. Furthermore, because a single gene carries limited information, numerous branches of the tree of life remain poorly resolved [18].

Advances in automated Sanger sequencing technologies have yielded an increasing number of publicly available genomes [18, 20]. This wealth of data has fostered a new research area, termed phylogenomics, which analyses genomes in a phylogenetic context [21]. One branch of phylogenomics uses genomic data to reconstruct the ‘tree of life’ [18, 21]. Whole-genome analyses dramatically increase the number of informative characters that can be used in phylogenetic analyses, thus reducing the stochastic or sampling errors of traditional single-gene phylogenetic reconstruction [18].

The multiple sequence alignment (MSA) method has long been a sine qua non of phylogenomic reconstruction [22]. It rests on orthologous genes across species: the orthologous genes are first aligned using software such as ClustalW [23] or MUSCLE [24], and the unambiguously aligned sites are then used to reconstruct the tree [18] (Figure 1.1). Two alternative approaches can be used in the tree construction step: supermatrix [25] and supertree [26]. The supermatrix approach, following the principle of using all available data, reconstructs the phylogeny from a concatenation of the individual genes, whereas the supertree approach combines the optimal trees obtained from the analysis of each individual gene, using methods such as matrix representation using parsimony (MRP) [27, 28]. The relative merits of these two approaches still need to be explored [18]. For example, empirical studies showed the supermatrix approach to be superior to the supertree approach in reconstructing the phylogeny of crocodylians [29], whereas the two approaches produced similar trees in phylogenomic analyses of Bacteria [30-32].
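
The difference between the two data representations can be illustrated with a minimal Python sketch. It is not taken from the cited studies: the function names `build_supermatrix` and `mrp_matrix`, the toy taxa and the toy sequences are hypothetical, and gene trees are assumed to be supplied simply as the set of taxa they contain plus a list of their clades. The first function concatenates per-gene alignments and pads missing genes with `?`; the second encodes each clade of each gene tree as one binary column (1 = in the clade, 0 = in the tree but outside the clade, ? = taxon absent from that tree), which is the usual coding for MRP. In both cases the resulting matrix would still have to be passed to a tree-building program.

```python
from typing import Dict, List, Set, Tuple

Alignment = Dict[str, str]                   # taxon -> aligned sequence of one gene
GeneTree = Tuple[Set[str], List[Set[str]]]   # (taxa in the tree, clades of the tree)


def build_supermatrix(gene_alignments: List[Alignment], missing: str = "?") -> Alignment:
    """Concatenate per-gene alignments; taxa lacking a gene are padded with `missing`."""
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, missing * width)
    return supermatrix


def mrp_matrix(gene_trees: List[GeneTree]) -> Dict[str, str]:
    """Encode every clade of every gene tree as one binary character (MRP coding)."""
    all_taxa = sorted(set().union(*(taxa for taxa, _ in gene_trees)))
    matrix = {taxon: "" for taxon in all_taxa}
    for taxa, clades in gene_trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon in clade:
                    matrix[taxon] += "1"   # member of this clade
                elif taxon in taxa:
                    matrix[taxon] += "0"   # sampled in this tree, outside the clade
                else:
                    matrix[taxon] += "?"   # not sampled in this gene tree
    return matrix


# Toy data (hypothetical): gene 2 was not sequenced for taxon B.
genes = [{"A": "ACGT", "B": "ACGA", "C": "TCGA"}, {"A": "GG", "C": "GA"}]
print(build_supermatrix(genes))

# Toy gene trees (hypothetical): the second tree lacks taxon B.
trees = [({"A", "B", "C", "D"}, [{"A", "B"}]), ({"A", "C", "D"}, [{"C", "D"}])]
print(mrp_matrix(trees))
```

The sketch makes one practical contrast visible: the supermatrix grows with the total alignment length, whereas the MRP matrix grows with the number of clades in the source trees and carries no sequence information.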


Figure 1.1 Phylogenomic inference using the multiple sequence alignment method (modified from [18]).

The well-developed MSA-based methods, however, are not without limitations. Firstly, phylogenomic reconstruction using hundreds of genes is only feasible for model systems for which a genome assembly and the corresponding gene annotation are available. Furthermore, even when information such as gene annotation is available, factors such as stochastic variation and bias in substitution across sites and taxa (also known as heterotachy), incomplete taxon sampling, lateral gene transfer, copy number variation, recombination, gene fusion and diverse chromosomal inheritance patterns can produce misleading phylogenetic signals and lead to phylogenetic reconstruction artifacts [18, 33]. More importantly, there is a strong demand for next-generation phylogenomics, still under development, which could transcend gene boundaries and capture genome-wide phylogenetic signals from the unprecedented volumes and types of data produced by second- and third-generation sequencing technologies [34].

Advances in phylogenomics are realizing Darwin’s dream of having “... fairly true genealogical trees for each great kingdom of Nature”. Phylogenomics also provides a general framework for addressing fundamental questions about speciation, such as: how and when did particular characters (such as key evolutionary innovations) arise, and how were they modified during speciation?

What causes speciation is one of the fundamental questions of evolutionary biology. Traditionally, answering this question has involved a close examination of character changes in recently diverged species [35]. Phylogenetic analyses using molecular data provide robust information on both phylogenetic relationships and divergence times [17, 35]. Phylogenetic methods can contribute powerfully to studies of speciation when data on phylogenetic relationships and divergence times are combined with further information on biogeography and ecology [35]. For example, based on species-level phylogenies and the geographic distributions of recently formed sister taxa in several insect, bird and fish groups, researchers found no range overlap between the young species, suggesting that the allopatric speciation model is more common than other speciation models [36].

These phylogenetic methods also provide a general framework in which to examine the impact of character evolution (e.g. an ecological shift or a key evolutionary innovation) on speciation. The basic idea is to map given traits onto a phylogeny, with the null hypothesis that if the traits arose randomly during evolution, they would have no effect on speciation [35]. This has been elegantly examined in haplochromine cichlids, a highly species-rich lineage, in the context of egg-spot evolution [6]. Egg-spots, small, discrete pigment patches that mimic haplochromine eggs, are usually found on the anal fins of male cichlids. Central to the courtship behaviour of haplochromines, egg-spots are considered to be a sexual advertisement that stimulates female cichlids to open their mouths in the proximity of the male’s genital opening, ensuring the fertilization of their eggs [37, 38]. In their phylogenetic analysis, the authors suggest that egg-spots are a key evolutionary innovation of the haplochromine cichlids, as the origin of this trait coincided with the origin of the modern haplochromine cichlids [6].
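
One common way of mapping a discrete trait onto a phylogeny is Fitch parsimony, which finds the minimum number of state changes needed to explain the trait distribution at the tips. The sketch below only illustrates that general idea and is not the method used in [6]: the function name `fitch`, the nested-tuple tree encoding, and the species labels `sp1` to `sp5` are hypothetical, and the tree is assumed to be rooted and strictly bifurcating.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a species name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0          # a tip has exactly its observed state
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Both subtrees can agree on a state: no extra change is needed here.
        return shared, left_changes + right_changes
    # No shared state: one change must have occurred along one of the two branches.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical example: the trait is present in one clade and absent in its sister clade.
tree = ((("sp1", "sp2"), "sp3"), ("sp4", "sp5"))
trait = {"sp1": "spot", "sp2": "spot", "sp3": "spot", "sp4": "none", "sp5": "none"}
root_states, changes = fitch(tree, trait)
print(root_states, changes)
```

In this toy case a single change explains the whole distribution, which is the pattern expected when a trait originates once at the base of a clade. Testing whether such an origin is associated with increased diversification requires additional statistical methods that are beyond this sketch.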
