An Odyssey on exploring the genomic evolution of vertebrates

Academic year: 2022

Dissertation submitted for the

academic degree of Doctor of Natural Sciences

presented by Tereza Manousaki

at the

Faculty of Sciences (Mathematisch-Naturwissenschaftliche Sektion), Department of Biology

Date of oral examination: 9 April 2013
First referee: Prof. Dr. Axel Meyer
Second referee: Prof. Dr. Michael Berthold

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-229383

«Να προσπαθείς πάντα να πηγαίνεις εκεί που είναι αδύνατον να πας»

Νίκος Καζαντζάκης

“Head always towards destinations that are impossible to reach”

Nikos Kazantzakis

Table of Contents

Acknowledgements
List of Tables
List of Figures
General Introduction
Chapter 1. The lamprey genome: illuminating vertebrate origins
  Abstract
  Main text
Chapter 2. Co-orthology of Pax4 and Pax6 to the fly eyeless gene: molecular phylogenetic, comparative genomic and embryological analyses
  Abstract
  Introduction
  Methods
  Results
  Discussion
Chapter 3. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)
  Abstract
  Introduction
  Methods
  Results
  Discussion
Chapter 4. Parsing parallel evolution: ecological divergence and differential gene expression in the adaptive radiations of thick-lipped Midas cichlid fishes from Nicaragua
  Abstract
  Introduction
  Methods
  Results
  Discussion
Chapter 5. Considerations on studying ecological adaptation and gene expression evolution in non-model organisms
  Abstract
  Main text
General Discussion
Summary
Zusammenfassung
Record of Achievements
Appendices
Cited Literature

Acknowledgements

This dissertation is the result of four years of effort inside and outside the laboratory. First of all, I would like to thank Prof. Axel Meyer for giving me the opportunity to conduct my Ph.D. in his laboratory and for providing a highly stimulating scientific environment. I wish to thank Dr. Shigehiro Kuraku, who recruited me to the lab and gave me guidance throughout my Ph.D. study. Further, I am greatly indebted to Frau Bader, the person who made our lives easier by managing every little or big thing. I am also very thankful to the Konstanz Research School Chemical Biology and the University of Konstanz, which greatly supported my study and my education by providing not only a stipend and numerous courses, but also further funding for attending workshops and conferences.

Luckily, in this odyssey I was not alone. Many people contributed to and shaped the content of this dissertation and my scientific thinking. I would like to start with Hyuk Je Lee, for the amazing scientific discussions that always ended with great ideas and big hangovers! Gonzalo Machado-Schiaffino constantly supported me with his excitement for research and his joy for science and beer! Paolo Franchini introduced me to next-generation sequencing, and I taught him a bit of tennis!

There are many more people inside the lab who contributed to my research directly or indirectly. I would like to thank in particular Helen (especially when I was struggling with English), Julia, Fred, Andi, Shaohua, Julian, Nathalie and Adina for sharing their scientific minds with me. Further, I would like to thank all the members of the laboratory for the friendly atmosphere, which peaked on Fridays after 18:00!

Outside the lab, Ελευθερία, Eva-Maria, Kelly and Mado each gave their own piece of support throughout those four years.

Maria-Luise Spreitzer and Ji Hyoun Kang: there are so many things to say, but no word is good enough to describe my gratitude. I thank you so much for being there in all the hard and beautiful moments.

Τερέζα, Μηνά and Αριάδνη, thank you from my heart for being a true family for me!

As for my family (and Oscar), I can only say that nothing could have ever happened if I did not have your love and support which can cross thousands of kilometers in milliseconds!

Finally, I want to thank someone whose contribution is impossible to evaluate..

George Nomikos has been my greatest support with his patience, understanding and persistence in believing in me. My beloved, you know very well how important you are to me. Your support and your love make me happy and give me the strength to continue. Thank you so much!

List of Tables

Chapter 2

Table 1. Result of maximum-likelihood analysis on Pax4/6 phylogeny

Chapter 4

Table 1. Specimen count summary.
Table 2. Morphological differentiation of body and jaw shape by morph and pairwise stable isotope differentiation.
Table 3. Differentially expressed genes between thick- and thin-lipped Midas cichlids and SNPs comparison across lakes.
Table 4. Shared differentially expressed genes between morphs across all four lakes.
Table 5. SNPs observed within morphs.

Chapter 5

Table 1. The studies selected for analyzing the correlation of differential expression and divergence time

List of Figures

Chapter 1

Figure 1. An abridged phylogeny of the vertebrates.
Figure 2. Genome-wide deviation of lamprey coding sequence properties from patterns observed in other vertebrate and invertebrate genomes.
Figure 3. Conserved synteny and duplication in the lamprey and gnathostome (chicken) genomes.
Figure 4. The effect of genome duplication and independent paralog loss on the evolution of lamprey/gnathostome conserved syntenic regions.
Figure 5. Enrichment of gene ontologies among vertebrate-specific gene families.
Figure 6. Absence of sequence conservation for a limb Shh enhancer in lamprey.

Chapter 2

Figure 1. Three possible scenarios of the timing of gene duplication between Pax4 and Pax6.
Figure 2. Expression patterns of pax4 in zebrafish embryos.
Figure 3. Molecular phylogeny focusing on the Pax4/6 class of genes based on a broad taxon sampling.
Figure 4. Conserved synteny containing Pax4 and Pax6 genes.
Figure 5. A hypothesized scenario for phylogenetic and regulatory properties of Pax4 and Pax6.

Chapter 3

Figure 1. Phylogenetic position of the Callorhinchus milii.
Figure 2. Method overview to identify split protein-coding regions.
Figure 3. Evaluation of ESPRIT based on human genome with artificial split introduced in 8% of all CDSs.
Figure 4. Evaluation of ESPRIT based on comparison between NCBI m34 and m37 mouse assemblies (CDS from Ensembl v35 and v48 respectively).
Figure 5. Confirmation of predicted split genes (in red) from full-length reference counterparts (in blue).
Figure 6. Callorhinchus milii genomic contigs containing fragments of ephrin B1 gene.

Chapter 4

Figure 1. Map of western Nicaragua and relevant lakes.
Figure 2. Eco-morphological differentiation of thick- and thin-lipped Midas cichlids.
Figure 3. Body and jaw shape differentiation by lip group and lake.
Figure 4. Differential expression (mean base count to the log2-fold change) by morph across all four lakes.
Figure 5. GO terms summary for the differentially expressed contigs by lake.
Figure 6. Direction and magnitude of parallelism across lakes and analyses.
Figure 7. Lake size and age versus number of DE genes.

Chapter 5

Figure 1. Comparative analysis of differentially expressed genes and divergence times.
Figure 2. Proposed workflow of an RNA-Seq experiment.

General Introduction

My Ph.D. dissertation is a series of pieces, each providing an autonomous perspective on fundamental questions of evolutionary biology, especially in the areas of evolutionary developmental biology (evo-devo), phylogenetics, and ecological and “pure” genomics. The chapters can be divided into those that trace evolutionary events that took place deep in evolutionary time (Chapters 1 and 2), those that track evolutionary changes linked to very recent divergence between populations or species (Chapters 4 and 5), and a technical perspective (Chapter 3). Most of the work presented is interdisciplinary, standing at the intersection of areas such as ecology, developmental biology and genome evolution.

One of my main objectives was to explore the genomic evolutionary history of vertebrates. Vertebrates have numerous morphological innovations that played a major role in their evolutionary “success”. To study the evolution of those vertebrate-specific innovations in a phylogenetic context, broad taxon sampling is required, especially from species that diverged right after the split of the group from invertebrates. However, the majority of sequenced genomes available to date belong to groups that diverged much later (e.g. mammals). An important exception is the genome of the sea lamprey (Petromyzon marinus), which has recently been sequenced. The sea lamprey belongs to Cyclostomata, the first group that diverged from the rest of the vertebrates (Gnathostomata) after the split from invertebrates. Thus, the sequencing of its genome filled a gap in the distribution of the available genomic information within the tree of vertebrates. Chapter 1 is dedicated to the analysis of the lamprey genome. It incorporates the effort of many research groups and elucidates not only the biology of the lamprey, but also that of the vertebrate ancestor (paper under review in Nature Genetics). Our contribution was two-fold. First, we identified the novel genes acquired during the evolution of the vertebrate ancestor, seeking to describe the genetic toolbox accompanying the respective morphological evolution. Second, we characterized two inherent properties of the protein-coding moiety of the genome, codon usage bias and amino acid composition. Both findings are highlighted in the main paper and are described in the respective supplementary material, with a detailed report of the analyses we conducted and their results.

The evolution of all extant vertebrates was highly influenced by an event that took place at the time when their last common ancestor existed: the tetraploidization of the genome, first proposed by Ohno (Ohno 1970) and later by a plethora of other studies. The evidence supporting this hypothesis was based (i) on the number of genes vertebrates have compared to invertebrates, and (ii) on the presence of paralogons in vertebrate genomes (genomic regions with significantly more paralogs than expected by chance) (Nakatani et al. 2007; Rokhsar et al. 2008). The tetraploidization happened through two rounds of whole genome duplication (WGD) (Kuraku et al. 2009) and offered a unique opportunity for the “birth” of new genes, creating a genetic “reservoir” from which innovations can arise. The WGDs were followed by a massive wave of gene loss. However, different genes were lost in different lineages, making the reconstruction of the evolutionary history of individual gene families a challenging task. Chapter 2 is dedicated to revealing the evolutionary history of two members of the Pax family, Pax4 and Pax6. Members of this family include transcription factors with significant developmental roles in animals (Wehr and Gruss 1996). The history of Pax4 and Pax6 seems to be tightly linked to the WGDs.

Chapter 3 focuses on a technical perspective. In an era where individual laboratories can sequence a whole genome in a relatively short time, the availability of massive data has revealed the need for sophisticated methods that can extract biological information even when a genome is only partially sequenced. We built a pipeline that incorporates a plethora of available gene prediction software to extract gene models, and applied it to the genome of Callorhinchus milii (chimaera), the only available chondrichthyan genome so far. The predicted gene models of chimaera were then used as anchors to link previously unassembled contigs into scaffolds, based on their gene content. This pipeline significantly improves the genome scaffolding and gene modeling of the studied species by exploiting the information available from other species, and provides the vertebrate biology community with valuable primary resources.

The chapters introduced so far incorporate genomic and phylogenetic techniques to understand deep evolutionary events, like gene duplications and losses, that took place hundreds of millions of years ago. The challenge in those questions is that after such events occurred, secondary changes followed. Those secondary events can often mask the event of interest, or create a “grey” zone beyond which phylogenetic or evolutionary signatures become hard to detect. On the opposite end of the evolutionary timescale, when the split between two populations happened very recently, leading to speciation, or is still ongoing, another grey zone appears. The changes taking place are not always large enough to be clearly identified, given the natural variation that characterizes almost every biological property of populations.

In Chapter 4 we explore the transcriptomic and ecological divergence in a remarkable paradigm of parallel evolution. Midas cichlids from Nicaragua provide an ideal example for studying ecological divergence that has occurred repeatedly. This chapter tackles the divergence between two ecomorphs of Midas cichlids, with and without hypertrophic (thick) lips. To capture the transcriptomic differences between the thick- and thin-lipped populations, we compared expression levels transcriptome-wide using RNA-Sequencing (Wang et al. 2009). However, many aspects of how to study the gene expression changes that are correlated with adaptive traits remain unknown, on both the biological and the technical side. This fact prompted us to summarize the available knowledge and suggest ways of conducting similar research. Chapter 5 is a review paper that presents the considerations that emerged while approaching such a complex question. Although a plethora of papers address this topic, our knowledge is still poor given the number of parameters affecting it. In this chapter we summarize the progress of the field, using either microarrays or the new methodology of RNA-Sequencing, in uncovering the forces shaping gene expression evolution and its impact upon speciation. Further, we tackle many of the technical aspects of experimental design, particularly when using next-generation sequencing technology, and propose alternative strategies.

The chapters of this dissertation provide a mosaic view of how evolution shapes genes and genomes over long and short evolutionary periods, shedding light on the evolutionary history leading to morphological changes. This area of research awaits great progress, especially in an era where massive sequencing is becoming common practice, allowing us to tackle questions that require integrative approaches such as the ones described in this dissertation.

Chapter 1. The lamprey genome: illuminating vertebrate origins

(in review in Nature Genetics)

The Lamprey genome: illuminating vertebrate origins

Jeramiah J Smith 1*†,2, Shigehiro Kuraku 3†,4, Carson Holt 5†,6, Tatjana Sauka-Spengler 7†,8, Ning Jiang 9, Michael S. Campbell 6, Mark D. Yandell 6, Tereza Manousaki 4, Axel Meyer 4, Ona E. Bloom 10,11, Jennifer R. Morgan 12†,13, Joseph D. Buxbaum 14-17, Ravi Sachidanandam 14, Carrie Sims 18, Alexander S. Garrett 18, Malcolm Cook 18, Robb Krumlauf 18,19, Leanne M. Wiedemann 18,20, Stacia A. Sower 21, Wayne A. Decatur 21, Jeffrey A. Hall 21, Chris T. Amemiya 2,22, Nil R. Saha 2, Katherine M. Buckley 23, Jonathan P. Rast 23, Sabyasachi Das 24, Masayuki Hirano 24, Nathanael McCurley 24, Peng Guo 24, Nicolas Rohner 25, Clifford J. Tabin 25, Paul Piccinelli 26, Greg Elgar 26, Magali Ruffier 27, Bronwen L. Aken 27, Stephen M. J. Searle 27, Matthieu Muffato 28, Miguel Pignatelli 28, Javier Herrero 28, Matthew Jones 8, C. Titus Brown 29,30, Yu-Wen Chung-Davidson 31, Kaben G. Nanlohy 31, Scot V. Libants 31, Chu-Yin Yeh 31, David W. McCauley 32, James A Langeland 33, Zeev Pancer 34, Bernd Fritzsch 35, Pieter J. de Jong 36, Lucinda L Fulton 37, Brenda Theising 37, Paul Flicek 28, Marianne Bronner 8, Wesley C. Warren 37, Sandra W. Clifton 37,38†, Richard K. Wilson 37, Weiming Li 31

Affiliations:

1 Department of Biology, University of Kentucky, Lexington, KY, USA.

2 Benaroya Research Institute at Virginia Mason, Seattle, WA, USA.

3 Genome Resource and Analysis Unit, Center for Developmental Biology, RIKEN, Japan.

4 Department of Zoology and Evolutionary Biology, University of Konstanz, Konstanz, Germany.

5 Ontario Institute for Cancer Research, Informatics and Bio-Computing, Toronto, ON, Canada.

6 Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT, USA.

7 The Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.

8 Division of Biology, California Institute of Technology, Pasadena, CA, USA.

9 Department of Horticulture, Michigan State University, East Lansing, MI, USA.

10 The Feinstein Institute for Medical Research, Manhasset, NY, USA.

11 The Hofstra North Shore-LIJ School of Medicine, Hempstead, NY, USA.

12 Marine Biological Lab, Woods Hole, MA, USA.

13 Department of Cell and Molecular Biology, University of Texas at Austin, Austin, TX, USA.

14 Department of Genetics and Genomics Sciences, Mount Sinai School of Medicine, New York, NY, USA.

15 Department of Psychiatry, Mount Sinai School of Medicine, New York, NY, USA.

16 Department of Neuroscience, Mount Sinai School of Medicine, New York, NY, USA.

17 Friedman Brain Institute, Mount Sinai School of Medicine, New York, NY, USA.

18 Stowers Institute for Medical Research, Kansas City, MO, USA.

19 Department of Anatomy & Cell Biology, The University of Kansas School of Medicine, Kansas City, KS, USA.

20 Department of Pathology and Laboratory Medicine, University of Kansas School of Medicine, Kansas City, KS, USA.

21 Center for Molecular and Comparative Endocrinology, University of New Hampshire, Durham, New Hampshire, USA.

22 Department of Biology, University of Washington, Seattle, WA, USA.

23 Department of Immunology and Department of Medical Biophysics, University of Toronto, Sunnybrook Research Institute, Toronto, ON, Canada.

24 Emory Vaccine Center and Department of Pathology and Laboratory Medicine, Emory University, Atlanta, Georgia, USA.

25 Department of Genetics, Harvard Medical School, Boston, MA, USA.

26 MRC National Institute for Medical Research, London, England.

27 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

28 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

29 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA.

30 Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA.

31 Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA.

32 Department of Zoology, University of Oklahoma, Norman, OK, USA.

33 Department of Biology, Kalamazoo College, Kalamazoo, MI, USA.

34 Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, Baltimore, MD, USA.

35 Department of Biology, University of Iowa, Iowa City, IA, USA.

36 Children's Hospital Oakland, Oakland, CA, USA.

37 The Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.

38 The Advanced Center for Genome Technology, Norman, OK, USA.

* Correspondence to: Jeramiah J Smith, jjsmit3@uky.edu

† Current Address

Abstract

Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (Petromyzon marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and fundamentals of vertebrate biology. Here, we present the first version of the lamprey genome assembly, generated by overcoming challenges presented by its high content of repetitive elements and GC bases as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole genome duplications likely occurred before divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin associated proteins and development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and evolutionary events that have shaped extant organisms and their genomes.

Main text

The fossil record reveals that, during the Cambrian period, there was a great elaboration in the diversity of animal body plans. This includes emergence of a species with several characteristics shared with modern vertebrates such as a cartilaginous skeleton that encases the central nervous system (cranium and vertebral column) and provides a support structure for the branchial arches and median fins. The cartilaginous cranium of this species housed a tripartite brain with a forebrain regulating neuroendocrine signaling via the pituitary gland, a midbrain (including an optic tectum) for processing sensory information from paired sensory organs and a segmented hindbrain for controlling unconscious functions such as respiration and heart rate. These adult features suggest that their embryos must already have possessed uniquely vertebrate cell types like the skeletogenic neural crest and ectodermal placodes, both defining characters of modern day vertebrates. Subsequent diversification of this lineage gave rise to the jawed vertebrates (gnathostomes), hagfish (for which genome-scale sequence data are currently limited), lamprey and several extinct lineages (Figure 1, Supplement 1).

Given the critical phylogenetic position of lamprey as an outgroup to the gnathostomes (Figure 1), comparing the lamprey genome to gnathostome genomes holds the promise of providing insights into the structure and gene content of the ancestral vertebrate genome. Important unresolved questions include the timing and subsequent elaboration of ancient genome duplication events and the elucidation of genetic innovations that may have contributed to the evolution and development of modern vertebrate features. These include the jaws, myelinated nerve sheaths, adaptive immune system and paired appendages/limbs.

Figure 1. An abridged phylogeny of the vertebrates. This figure shows the timing of major radiation events within the vertebrate lineage. Extinct lineages and some extant lineages (e.g. coelacanths, lungfish and hagfish) have been omitted for simplicity. CZ = Cenozoic. Here, “reptiles” is synonymous with “sauropsids”, “ray-finned fish” is synonymous with “actinopterygian” and “Euteleostome” is synonymous with “Osteichthyan”.

Sequencing, assembly and annotation

Approximately 19 million sequence reads were generated from genomic DNA derived from the liver of a single wild-captured adult female sea lamprey (Petromyzon marinus) (Supplement 2). The lamprey genome project was initiated well before the discovery that lamprey undergoes programmed genome rearrangements during early embryogenesis, which results in the deletion of ~20% of germline DNA from somatic tissues (Smith et al. 2009), though the effects of rearrangement on the genic component of the genome are not fully understood. We used raw sequence reads to examine large-scale sequence content and repetitive structure of the lamprey genome. These analyses indicated that the lamprey genome is highly repetitive, rich in GC bases, and highly heterozygous (Supplement 3). Although these features tend to encumber assembly of long contiguous sequences, analyses of broad-scale structure enabled optimization of parameters used in assembly algorithms (Supplement 3).

The current assembly was generated using Arachne (Jaffe et al. 2003) and consists of 0.816 Gb of sequence, distributed across 25,073 contigs. Half of the assembly is in 1,219 contigs of 174 kb or longer and the longest contig is 2.4 megabases. This assembly resolves multi-kilobase to megabase-scale structure over a majority of single-copy genomic regions (Supplement 3), permitting the annotation of repetitive elements (Supplement 4), genes (Supplements 5 and 6) and conserved intergenic features (Supplement 7). Detection of extensive conserved synteny with gnathostome genomes (below) indicates that lamprey scaffolds accurately reflect the chromosomal organization of the lamprey genome. This assembly therefore provides unparalleled resolution of gene content and structure of this evolutionarily important genome.
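The contiguity figures quoted above (half of the assembly contained in 1,219 contigs of 174 kb or longer) correspond to what is commonly called the N50 length and the number of contigs needed to reach it. As a minimal sketch, assuming only a list of contig lengths is available, the statistic could be computed as follows; the function name `n50_l50` and the toy lengths are hypothetical and are not taken from the lamprey assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly;
    L50 is the number of contigs in that set.
    """
    total = sum(lengths)
    running = 0
    # Walk through contigs from longest to shortest, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly (arbitrary units), total length 30.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50_l50(toy_contigs))  # (6, 2): contigs of length 10 and 6 cover half
```

Applied to the real contig lengths, the same calculation would be expected to return a value near 174 kb with a count of 1,219, if the statistic reported in the text was computed in this way.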

Ab initio searches for repetitive DNA sequences revealed that the lamprey genome contains abundant repetitive elements with high sequence identity. We identified 7,752 distinct families of repetitive elements, accounting for 34.7% of the assembly (Supplement 4). Notably, this proportion is expected to be a significant underestimate, due to collapsing of repetitive elements during genome assembly. The large diversity of lamprey repetitive elements and abundance of high-identity (presumably young) repeats represents a potentially rich resource for studies of the evolution and transposition of repetitive sequences.

The location of genes was determined by combining RNAseq mapping and exon linkage data (Supplement 5) with gene homologies, prediction of coding sequences, splicing signals and repetitive elements using the MAKER pipeline (Cantarel et al. 2008) (Supplement 6). The final set of annotated protein-coding genes contained a total of 26,046 genes encoding 26,204 transcripts. This number is similar to the numbers of predicted protein-coding genes in other vertebrate genomes reported to date. Conserved non-coding elements (CNEs) were identified by homology to published sequences (Woolfe et al. 2007; Venkatesh et al. 2006). Searches identified a limited number of homologous CNEs in lamprey, 337 (5.0% of 6,670; Woolfe et al. 2007) and 287 (6.0% of 4,782; Venkatesh et al. 2006), in close agreement with previous analyses (McEwen et al. 2009). For those lamprey CNEs that were linked to conserved homologous regions in lamprey and gnathostome genomes, sequence identity typically extends over approximately half the length (53%) of the gnathostome CNE (Supplement 7). Thus, either the lamprey lineage diverged from jawed vertebrates before most gnathostome CNE sequences became highly constrained or these CNEs have evolved much more rapidly in the lamprey genome than in jawed vertebrate genomes. Future work on additional lamprey and hagfish genomes should ultimately resolve this question.

Variation in nucleotide content and substitution can strongly influence intragenomic functionality and intergenomic comparative analyses. Analysis of the lamprey genome revealed that the GC-content of the lamprey genome assembly is higher than that of most other vertebrate genomes that have been reported. Overall, 46% of the assembly is composed of GC bases, similar to the GC content of raw WGS reads (Supplement 8). Genome-wide analyses also reveal patterns of intragenomic heterogeneity in GC content similar to those of amniote species that possess isochore structures, but lower in magnitude. Moreover, GC-content of protein-coding regions (61%) is markedly higher than that of non-coding and repetitive regions. As expected, this is highest in the third position of codons (GC3; 75%) (Supplement 8). Patterns of GC bias strongly affect codon usage and amino acid composition of lamprey genes, imparting an underlying structure to lamprey coding sequences that differs substantially from all other sequenced vertebrate and invertebrate genomes (Figure 2, Supplement 8). Notably, we did not detect a significant correlation between GC3 and GC-content of adjacent non-coding regions. Thus, it appears that processes that lead to patterns of intragenomic heterogeneity in lamprey GC-content differ fundamentally from those in species that possess isochore structures. This raises a question regarding the adaptive value or other biological importance of the observed variation of GC content within and among genomes.
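To make the two quantities used in this paragraph concrete, the following Python sketch computes overall GC content and GC content at third codon positions (GC3) for a single in-frame coding sequence. It is only an illustration of the definitions, not the procedure used in Supplement 8; the function names and the short example sequence are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T) in seq."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    # Positions 2, 5, 8, ... (0-based) are the third base of each codon.
    return gc_fraction(cds.upper()[2::3])


# Hypothetical four-codon sequence: ATG GCC GCG TAA
example_cds = "ATGGCCGCGTAA"
print(gc_fraction(example_cds))   # 7 of 12 bases are G or C
print(gc3_fraction(example_cds))  # third positions are G, C, G, A
```

Averaging these values over all predicted coding sequences would give genome-wide figures comparable in kind to the 61% and 75% reported above, under the assumption that every sequence is supplied in frame.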

To further explore the biological basis of high GC content and its intragenomic heterogeneity, we examined the relationship between protein-coding GC-content and codon usage bias, amino acid composition and gene expression level. The results show that genomic GC content strongly correlates with codon usage bias and amino acid composition, but not with gene expression level (Supplement 8). These observations are consistent with a scenario wherein high GC content results from broad-scale substitution bias, rather than selection for specific GC-rich codons. As lamprey is clearly an outlier amongst vertebrates, further dissection of coding GC content in the sea lamprey and other lamprey and hagfish species will help reveal the causes and consequences of intragenomic heterogeneity of GC content in vertebrate genomes.
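Codon usage bias of the kind discussed here is often summarized as relative synonymous codon usage (RSCU), the measure named in the legend of Figure 2: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below shows one way to compute RSCU values for a coding sequence, assuming the standard genetic code; the function name `rscu` is hypothetical, and the correspondence analysis applied to these values in the paper is not reproduced.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for amino acids observed in an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in codons:
            # Observed count relative to equal usage within the family.
            values[c] = counts[c] * len(codons) / total
    return values


# Hypothetical example: four alanine codons, GCT used twice.
print(rscu("GCTGCTGCCGCA"))
```

An RSCU of 1.0 indicates no bias for that codon; values above 1.0 indicate that the codon is used more often than expected under equal usage.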

Figure 2. Genome-wide deviation of lamprey coding sequence properties from patterns observed in other vertebrate and invertebrate genomes. (A) Codon usage bias. Correspondence analysis on relative synonymous codon usage (RSCU) values was performed using nucleotide sequences of all predicted genes concatenated for individual species. (B) Amino acid composition. Correspondence analysis was performed using deduced peptide sequences of all predicted genes for individual species. Red: lamprey. Grey: invertebrates. Green: jawed vertebrates.

Duplication structure of the genome

It is generally accepted that two rounds (2R) of whole genome duplication (WGD) occurred early in the history of vertebrate evolution (Ohno 1979). However, the timing of these defining duplication events has not been well supported by genome-wide sequence data thus far (Kuraku et al. 2009). As the proximate outgroup to jawed vertebrates, the lamprey genome is uniquely suited for addressing several questions regarding the occurrence, timing, and outcome of WGD events. To identify gene and genome duplication events in the ancestral vertebrate lineage, we analyzed patterns of duplication within conserved syntenic regions of lamprey and gnathostome genomes and compared these patterns to the entire genome.

We estimated duplication frequencies based on aligning all predicted lamprey proteins from the MAKER (Cantarel et al. 2008) dataset to whole genome assemblies for human (GRCh37, GCA_000001405.1) and chicken (Gallus_gallus-2.1, GCA_000002315.1). To account for the possibility that paralogs have been retained in one or both genomes, in a way that bypasses many confounding aspects of phylogenetic reconstruction (Supplement 9, 10), regions were considered putative orthologs if they yielded the highest-scoring alignment between the two genomes or an alignment score (bitscore) within 90% of the top-scoring alignment (see Supplement 10 for details). Strong patterns of conserved synteny are observed between lamprey and both human and chicken genomes (Supplement 10). For simplicity, we present comparisons to the chicken genome below, as this genome is known to have undergone substantially fewer interchromosomal rearrangements than have mammalian genomes (Hillier et al. 2004; Smith and Voss 2006).
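The selection rule described above can be sketched in a few lines of Python. This is an illustration only: it assumes that "within 90% of the top-scoring alignment" means a bitscore of at least 0.9 times the best bitscore for the same lamprey protein, and the function name, input layout and example hits are hypothetical (see Supplement 10 for the actual criteria).

```python
def putative_orthologs(hits, fraction=0.9):
    """Select target regions treated as putative orthologs of one query.

    hits: list of (region_id, bitscore) alignments for a single lamprey
          protein against one target genome.
    Keeps the top-scoring region plus any region whose bitscore is at
    least `fraction` of the top score (assumed reading of the 90% rule).
    """
    if not hits:
        return []
    top = max(score for _, score in hits)
    return [region for region, score in hits if score >= fraction * top]


# Hypothetical alignments of one lamprey protein to the chicken genome.
example_hits = [("chr1:100", 200.0), ("chr4:500", 185.0), ("chr7:900", 120.0)]
print(putative_orthologs(example_hits))  # ['chr1:100', 'chr4:500']
```

Keeping near-top hits rather than a single best hit is what allows retained paralogs in the target genome to be counted as duplicates without building a gene tree for every family.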


Figure 3. Conserved synteny and duplication in the lamprey and gnathostome (chicken) genomes. In panels A – D, the locations of presumptive lamprey/chicken orthologs (including duplicates) are plotted relative to their physical position on chromosomes and scaffolds, and connected by colored lines. Panels A and B show pairs of chicken chromosomes that correspond to a series of lamprey scaffolds. In panel A, 10 lamprey loci are present as duplicate copies in the chicken genome and 59 are present as single copies. In panel B, 12 lamprey loci are present as duplicate copies in the chicken genome and 54 are present as single copies. In Panels C and D, asterisks indicate duplicates.

Our analyses indicate that most lamprey and gnathostome genes currently do not possess 2R-duplicates in their respective genomes (Supplement 9, 10), presumably due to frequent loss of one paralog following duplication. Accordingly, we used the lamprey genome to search for a signature of large-scale duplication that does not rely on the retention of duplicated genes, but can be informed by their presence. Specifically, we searched for cases wherein a single lamprey scaffold contains interdigitated homologies from two distinct regions of a gnathostome genome (Figure 3). Such patterns are consistent with large-scale duplication followed by random loss of either paralogous copy. Nearly all lamprey scaffolds exhibited patterns of interdigitated conserved synteny of gnathostome orthologs (Tables S10.1 and S10.2). Moreover, homologs from individual pairs of gnathostome chromosomes were recurrently observed in interdigitated syntenic blocks on several lamprey scaffolds.
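One simple way to make the notion of interdigitated homologies concrete is sketched below. The criterion is an assumption introduced purely for illustration and is not the procedure used in the study: the orthologs along one scaffold are reduced to the chromosomes they hit, and the scaffold is called interdigitated for a chromosome pair if the hits switch between the two chromosomes at least twice. Function names and the threshold are hypothetical.

```python
def count_switches(chrom_order, a, b):
    """Count changes between chromosomes a and b along a scaffold.

    chrom_order: chromosome labels of the gnathostome orthologs, listed in
    the order of the lamprey genes along the scaffold (hypothetical input).
    """
    filtered = [c for c in chrom_order if c in (a, b)]
    return sum(1 for x, y in zip(filtered, filtered[1:]) if x != y)


def is_interdigitated(chrom_order, a, b, min_switches=2):
    """Illustrative criterion: at least `min_switches` alternations."""
    return count_switches(chrom_order, a, b) >= min_switches
```

A scaffold whose orthologs fall in two contiguous blocks (one switch) would not qualify under this criterion, whereas alternating hits to the two chromosomes would.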

Importantly, some of the individual homologous markers that contributed to these conserved syntenic blocks were mapped to duplicate positions within gnathostome genomes, being present on the two homologous gnathostome chromosomes. Although these duplicates constitute a relatively modest fraction of conserved syntenic homologs (14.5% in Figure 3A, and 18.2% in Figure 3B, not counting redundant copies), we interpret these as strong evidence that large-scale (i.e. whole genome) duplication has played a major role in shaping gnathostome genome architecture.
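The two percentages quoted above appear to follow directly from the counts given in the legend of Figure 3 (10 duplicated and 59 single-copy loci in panel A; 12 duplicated and 54 single-copy loci in panel B). The short sketch below reproduces that arithmetic; treating the fraction as duplicated loci over all loci is an assumption, and the function name is hypothetical.

```python
def duplicate_fraction(n_duplicated, n_single):
    """Fraction of conserved syntenic loci present as duplicates."""
    return n_duplicated / (n_duplicated + n_single)


panel_a = round(100 * duplicate_fraction(10, 59), 1)  # Figure 3A counts
panel_b = round(100 * duplicate_fraction(12, 54), 1)  # Figure 3B counts
```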

Similar duplication patterns on lamprey scaffolds also appear to support the notion that large-scale (i.e. whole genome) duplication has played a major role in shaping lamprey genome architecture. Although lamprey scaffolds do not yet provide chromosome-scale resolution, several cases were identified wherein two large lamprey scaffolds contain predicted paralogs and patterns of interdigitated conserved synteny (two defining signatures of large-scale duplication; Figure 3C and D, Supplement 10). To further assay for patterns indicative of ancient whole genome duplication events (i.e. 2R) within the lamprey genome, we manually examined all lamprey scaffolds that possessed 10 or more gnathostome homologs. These 83 scaffolds accounted for 10% of the comparative map and possessed a duplication frequency (0.463, including redundant copies of duplicates) that was similar to the genome at large (0.448). Among these scaffolds, we identified 29 gene pairs that were present as duplicates on two large scaffolds and one trio that was present on three large scaffolds. For a majority of duplicates, scaffolds contained at least one additional ortholog on the chicken chromosome that harbored an ortholog of the duplicate [specifically, both (59.3%), one (29.6%) and no (11.1%) scaffolds contained an additional syntenic ortholog]. On average, these scaffolds contained 2.98 additional conserved syntenic genes for each individual lamprey duplicate (including the 11.1% with no syntenic markers). These patterns are consistent with the existence of patterns of interdigitated synteny in the lamprey genome that are highly similar to those in gnathostome genomes, indicating that the most recent (2R) WGD event likely occurred in the common ancestral lineage of lampreys and gnathostomes.

Additional genome-wide analyses reveal that: 1) the number of ancestral loci with retained duplicates in gnathostome genomes is not significantly different from the number with retained duplicates in lamprey (lamprey = 0.271, chicken = 0.262, χ² = 2.94, P = 0.08, Supplement 10); 2) the frequency of shared duplications is higher than would be expected by chance (observed = 0.150, expected = 0.022, χ² = 6179, P(χ²) < 1e-100, P(Fisher’s Exact) < 1e-100, Supplement 10); 3) a model invoking recurrent selection against small-scale duplicates across a majority of the genome is not sufficient to explain genome-wide patterns of shared duplication (Supplement 10), and 4) inclusion of lamprey in phylogenetic analyses resolves gene families consistent with two rounds of whole genome duplication (Supplement 9). Moreover, targeted analyses of Hox clusters and gonadotropin releasing hormone (GnRH)-syntenic regions reveal that post-duplication loss of paralogs has occurred largely independently in lamprey and gnathostome genomes, consistent with divergence of the two lineages shortly after the last WGD event (Figure 4, Supplement 11 and 12). Although the less parsimonious scenario of one or two independent and ancient WGD events in gnathostome and lamprey lineages cannot be completely ruled out, neither a gnathostome-specific genome duplication nor persistent selection to retain a subset of independent duplicates is likely to explain the subtle differences in duplication structure of lamprey versus gnathostome genomes. It seems exceedingly unlikely that such genomic arrangements and distributions of synteny blocks would arise by chance or mechanisms other than by an ancient shared WGD. We therefore propose that genome-wide patterns of duplication are indicative of a shared history of two rounds of genome-wide duplication prior to the lamprey/gnathostome divergence.


Figure 4. The effect of genome duplication and independent paralog loss on the evolution of lamprey/gnathostome conserved syntenic regions. (A) Conserved synteny among the GnRH group 2, 3, and 4 genes in lamprey, chicken, and humans, including the medaka region for GnRH3, which is absent in tetrapods. The orientation of each chromosome (Chr) and scaffold (Sf) is indicated with line arrows. A pointed box represents the orientation of each gene. Open rectangles with red X’s indicate lost GnRH loci. The ancestral state of the gene region is shown at the bottom. (B) Assembled lamprey Hox scaffolds and patterns of conserved synteny, relative to human Hox clusters (human Hox clusters, rather than chicken, are used because all four human Hox-syntenic regions are integrated into the human genome assembly). A pointed box represents the orientation of each gene. Three additional conserved syntenic genes, located adjacent to the PM2Hox cluster, are omitted due to space limitations (retinoic acid receptor, heterogeneous nuclear ribonucleoprotein and thyroid hormone receptor).

Ancestral vertebrate biology

It has been suggested that many of the morphological and physiological features that characterize vertebrates evolved through the modification of preexisting regulatory regions and gene networks (Carroll 2008). However, we reasoned that the lamprey genome might enable us to identify genes that arose within the ancestral vertebrate lineage and infer how these new genes may have contributed to specific innovations in ancestral vertebrates that contributed to their arguably successful evolutionary trajectory. Toward this end, we searched for lamprey genes that: 1) have homologs in at least one sequenced gnathostome genome; and 2) have no identifiable invertebrate homolog in annotated sequence databases and genome project-based resources (including, but not limited to, invertebrate deuterostomes: sea urchin, sea snail, acorn worm, lancelet and sea squirt). In total, this search identified 224 gene families that presumably trace their evolutionary origin to the ancestral vertebrate lineage (Supplement 13). Notably, these included many gene families whose taxonomic distribution was previously thought to be more restricted (e.g., APOBEC4 was previously reported to be a tetrapod-specific gene) (Rogozin et al. 2005). Thus, roughly 1.2-1.5% of the protein-coding landscape in the human genome (263 genes from 224 families / ~20,000 genes) originated from novel genes that emerged at the base of vertebrate evolution. Phylogenetic analyses also revealed expansions and reductions of gene families within vertebrate lineages (Supplement 9). These reveal specific loss of clotting-related genes in the lamprey lineage and differential contraction and expansion of gene families related to neural function and inflammation in lamprey versus gnathostome lineages, which reflect broad parallels in the evolution of lamprey and gnathostome immunity (Supplement 14).
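The two search criteria above amount to a presence/absence filter across taxon groups, which can be sketched as below. The data layout (a mapping from gene family to the set of taxon groups with a detected homolog), the group labels and the function names are hypothetical; homolog detection itself is not modelled. The second helper simply reproduces the proportion quoted in the text, using the stated figures of 263 genes and roughly 20,000 human protein-coding genes.

```python
def vertebrate_specific(families):
    """Return family ids with lamprey and gnathostome homologs but no
    invertebrate homolog.

    families: dict mapping family id -> set of taxon-group labels
    (hypothetical layout; labels are 'lamprey', 'gnathostome', 'invertebrate').
    """
    return sorted(
        fam
        for fam, groups in families.items()
        if "lamprey" in groups
        and "gnathostome" in groups
        and "invertebrate" not in groups
    )


def percent_of_genome(n_genes, genome_size):
    """Percentage of a gene set relative to the total gene count."""
    return 100.0 * n_genes / genome_size
```

With the numbers given in the text, `percent_of_genome(263, 20000)` evaluates to about 1.3%, which lies inside the quoted range of 1.2-1.5%.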

To better understand how novel genes may have contributed to the evolution of the vertebrate ancestor, we collected gene ontology (functional) information for the 224 vertebrate-specific gene families (Supplement 15). Comparing these gene ontologies to the genome-wide distribution of lamprey ontologies revealed that these vertebrate-specific gene families are significantly enriched in functions related to myelination and neuropeptide/neurohormone signaling (Figure 5). These findings suggest that elaboration of signaling in the vertebrate central nervous system may have been facilitated by the advent of novel vertebrate genes. Ontology analyses are also consistent with the broadly held view that most genes involved in regulation of morphogenesis are of ancient origin and common throughout animals.

Figure 5. Enrichment of gene ontologies among vertebrate-specific gene families. Horizontal bars show the frequencies of ontology classes among vertebrate-specific gene families and in the entire set of lamprey gene models. Data are shown for all ontologies that are over-represented with P<0.005 (Fisher’s Exact test). Most over-represented ontologies are related to neural development and neurohormone signaling.
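For readers unfamiliar with this kind of over-representation test, a minimal sketch is given below. It computes the one-sided Fisher's exact P value as the upper tail of the hypergeometric distribution, using only the Python standard library. Whether the original analysis used a one-sided or two-sided test is not stated here, so the one-sided form is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact P value for over-representation.

    N: total number of annotated gene models (background)
    K: background gene models carrying the ontology term
    n: number of vertebrate-specific families tested
    k: tested families carrying the term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible table configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

A term would be reported as over-represented in the sense of Figure 5 when this probability falls below the chosen threshold (0.005 in the figure legend).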

In all extant gnathostomes, myelinating oligodendrocytes wrap axons in a layer of proteins and lipids, increasing the efficiency and speed of neuronal conduction. In humans, disorders of myelination have many manifestations that range from cognitive to movement disorders. Intriguingly, analysis of the lamprey genome identified specific enrichment of genes associated with myelin formation in the central and peripheral nervous system of jawed vertebrates (Figure 5, Supplement 13, 16), despite the fact that extant jawless vertebrates are thought to completely lack myelinating oligodendrocytes (Bullock et al. 1984). These genes include peripheral myelin protein 22 (Pmp22), myelin protein zero (Mpz), as well as myelin proteolipid protein (Plp), myelin and lymphocyte (Mal) protein, and myelin transcription factor 1-like (Myt1l). Homologs of Mal and Pmp22 were reported to be present in Ciona intestinalis, an invertebrate chordate (Gould et al. 2005) and putative Ciona homologs of Myt1l and Plp1 are identifiable in Ensembl (Flicek et al. 2011). To our surprise, analysis of the lamprey genome revealed three myelination genes that may have evolved specifically within the ancestral vertebrate lineage [myelin basic protein (Mbp), Mpz and 2', 3'-cyclic nucleotide 3'-phosphodiesterase (CNP); Supplement 13, 16]. This suggests that the molecular components of myelin already existed in the vertebrate ancestor and were later recruited in the evolution of myelinating oligodendrocytes within the gnathostome lineage, perhaps through the evolution of regulatory systems (Newbern and Birchmeier 2010). Alternatively, oligodendrocyte-like cells may have been present in the vertebrate ancestor but secondarily lost in the lamprey lineage while retaining genes encoding myelin proteins. Dissecting the function of “myelination” genes in lamprey and hagfish should continue to shed light on the origin of gnathostome myelin.

By virtue of its basal phylogenetic position, the lamprey also serves as a key comparative model for understanding the evolution of the vertebrate immune system. Lamprey possess two major immune cell types that are similar to T- and B- lymphocytes of gnathostomes but possess adaptive immune receptors that are unrelated to gnathostome immunoglobulins, perhaps reflecting that of the ancestral vertebrate (Saha et al. 2010; Guo et al. 2009). The lamprey genome harbors several genes that impart unique functionality to gnathostome B- and T- lymphocytes. Annotation of other components of the immune system reveals that reduced complexity in vertebrate innate immune receptors may have coincided with the evolution of adaptive immune receptors (Supplement 14). Analysis of the lamprey genome assembly and end-mapped BAC clones reveals that each rearranging lamprey immune receptor locus (variable lymphocyte receptors: VLRs) extends for several hundred contiguous kilobases. For example, the VLRB locus extends for at least 717 kb, with components of the receptor face being drawn from regions distributed across practically the entire length of the current scaffold (Supplement 14).

The lamprey genome sheds light on evolutionary events that occurred early in the evolution of the gnathostome lineage, after the lamprey/gnathostome split. Paired appendages (pelvic and pectoral fins in fish; hind- and fore- limbs in tetrapods) are a major evolutionary innovation of gnathostome vertebrates, as they permit additional forms of locomotion and behavior. Lampreys have well-developed dorsal and caudal fins, but lack paired fins. Despite different embryonic origins, signaling pathways involved in development and positioning of median fins are reused for paired fin development (Freitas et al. 2006), raising the question of whether these pathways were already present in the limbless ancestral vertebrate (Supplement 17). During fin and limb development Shh is required to pattern the anteroposterior axis of appendages. It has been shown that the limb-specific expression of Shh is coordinated by a long-range cis-acting enhancer. This “Shh appendage specific regulatory element” (ShARE) is found in homologous positions in tetrapods, teleosts, and chondrichthyans (Dahn et al. 2007; Lettice et al. 2003; Sagai et al. 2005). In all vertebrate species analyzed so far, this element is found in intron 5 of the limb region 1 gene (Lmbr1) that lies up to 1 Mb away from the transcription start site of Shh, in mouse. Intriguingly, the presence of ShARE is correlated with the presence of paired appendages at least within the tetrapod lineage, as snakes and caecilians seemingly have lost this element secondarily (Sagai et al. 2004). Because of the conserved genomic position of the element in other vertebrates we focused our analysis on lamprey orthologs of the Lmbr1 gene. Directed analysis of intron 5 in the Lmbr1 orthologs revealed that these introns are much shorter and had no similarity to ShAREs (Figure 6). Searches of the entire genome assembly or raw sequence reads also failed to detect any regions similar to ShARE, suggesting that this regulatory region evolved within the gnathostome lineage.


Figure 6. Absence of sequence conservation for a limb Shh enhancer in lamprey. Comparison of an intronic region in the Lmbr1 gene. Focusing on the intron containing the Shh cis-regulatory element (ShARE [or MFCS1]; see references Dahn et al. 2007; Sagai et al. 2005), genomic nucleotide sequences of jawed vertebrates and the lamprey were compared with mVISTA (Frazer et al. 2004) using the mouse as reference. Note that two genomic regions were identified in the lamprey as harboring potential Lmbr1 orthologs. Lengths of this intron for individual species are listed on the right.

Summary

The lamprey genome provides unique insight into the origin and evolution of the vertebrate lineage. Here, we present a few examples of its utility in dissecting the evolution of vertebrate genomes, and aspects of ancestral vertebrate biology. As examples, we: 1) provide genome-wide evidence for two WGD in the common ancestral lineage of lampreys and gnathostomes, 2) identify novel genes that evolved within this ancestral lineage, 3) link vertebrate neural signaling features to the advent of novel genes, 4) uncover parallels in immune receptor evolution, and 5) provide evidence that a key regulatory element in limb development evolved within the gnathostome lineage. This genomic resource holds the promise of providing insights into many other aspects of vertebrate biology, especially with continued refinements in the assembly and the capacity for direct functional analysis in lamprey (Nikitina et al. 2008; Nikitina et al. 2009).

Acknowledgments

The lamprey genome project was funded by the National Human Genome Research Institute [U54HG003079 (RKW)]. Additional support was provided by grants from the National Institutes of Health (R24GM83982) and Great Lakes Fisheries Commission (WL). Partial funding was provided by several additional sources, including grants from National Institutes of Health [F32GM087919, T32HG00035 (JJS); DE017911 (MB); R01HG004694 (MDY); GM079492, GM090049, and RR014085 (CTA) and R37HD032443 (CJT)], the National Science Foundation [MCB-0719558 (CTA); IOS-0849569 (SAS) and IOS-1126998 (MJY)], the New Hampshire Agricultural Experiment Station [Scientific Contribution Number 2471 (SAS)], the Charles Evans Research Award (OB, JB and JRM), the Canadian Institutes of Health Research [MOP74667 (JPR)] and the Natural Sciences and Engineering Research Council of Canada [312221 (JPR)]. We recognize all the important work that could not be cited due to space limitations. We thank the Genome Institute, Washington University School of Medicine, Production Sequencing group for all sample procurement and genome sequencing work, Michigan State University Genomic Core for transcriptome sequencing, and US Geological Survey Lake Huron Biological Station for providing lamprey samples for sequencing. We thank Francesca Antonacci and Evan E. Eichler from the University of Washington Department of Genome Sciences for performing fluorescent in situ hybridizations and providing access to computational facilities, respectively. We thank Mark Robinson for bioinformatic analysis of immune system genes and conversion of GFF files for BAC end mapping. A portion of this research was conducted at the Marine Biological Laboratory, Woods Hole, MA. We acknowledge the support of the Stowers Institute and technical support from the SIMR Molecular Biology Core, particularly Karen Staehling, Anoja Perera and Kym Delventhal for BAC screening and sequencing. We acknowledge the Center for High Performance Computing at the University of Utah for the allocation of computational resources toward gene annotation.


Chapter 2. Co-orthology of Pax4 and Pax6 to the fly eyeless gene: molecular phylogenetic, comparative genomic and embryological analyses

Evolution & Development 13: 448-459


Co-orthology of Pax4 and Pax6 to the fly eyeless gene: molecular phylogenetic, comparative genomic and embryological analyses

Tereza Manousaki1,2,†, Nathalie Feiner1,3,†, Gerrit Begemann1, Axel Meyer1,2,3 and Shigehiro Kuraku1,2,3*

1Laboratory for Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, Universitätsstrasse 10, 78464 Konstanz, Germany

2Konstanz Research School Chemical Biology (KoRS-CB), University of Konstanz, Universitätsstrasse 10, 78464 Konstanz, Germany

3International Max-Planck Research School (IMPRS) for Organismal Biology, University of Konstanz, Universitätsstrasse 10, 78464 Konstanz, Germany

*Author for correspondence (email: shigehiro.kuraku@uni-konstanz.de)

†Contributed equally to this work.


Abstract

The functional equivalence of Pax6/eyeless genes across distantly-related animal phyla has been one of the central findings on which Evo-Devo studies are based. In this study, we show that Pax4, in addition to Pax6, is a vertebrate ortholog of the fly eyeless gene [and its duplicate, twin of eyeless (toy) gene, unique to Insecta]. Molecular phylogenetic trees published to date placed the Pax4 gene outside the Pax6/eyeless subgroup as if the Pax4 gene originated from a gene duplication before the origin of bilaterians. However, Pax4 genes had only been reported for mammals. Our molecular phylogenetic analysis, including previously unidentified teleost fish pax4 genes, equally supported two scenarios: one with the Pax4-Pax6 duplication early in vertebrate evolution and the other with this duplication before the bilaterian radiation. We then investigated gene compositions in the genomic regions containing Pax4 and Pax6, and identified (1) conserved synteny between these two regions, suggesting that the Pax4-Pax6 split was caused by a large-scale duplication and (2) its timing within early vertebrate evolution based on the duplication timing of the members of neighboring gene families. Our results are consistent with the so-called two-round (2R) genome duplications in early vertebrates. Overall, the Pax6/eyeless orthology is merely part of a 2:2 orthology relationship between vertebrates (with Pax4 and Pax6) and the fly (with eyeless and toy). In this context, evolution of transcriptional regulation associated with the Pax4-Pax6 split is also discussed in light of the zebrafish pax4 expression pattern that is analyzed here for the first time.


Introduction

Members of the Pax (paired-box) gene family encode transcription factors that play crucial roles in development (Wehr and Gruss 1996). A milestone in the 1990s which promoted subsequent intensive studies on Pax genes was the ability of the Drosophila melanogaster eyeless gene as well as its mouse ortholog Pax6 to induce eye formation when expressed ectopically in flies (Halder et al. 1995). Pax6/eyeless genes have thus been recognized as the master control gene for eye development (Gehring and Ikeo 1999). A recent report on secondary changes in the insect lineage shed light on a divergent aspect of the Pax6/eyeless orthology (Lynch and Wagner 2010). The aim of this paper is to investigate possible changes in the gene repertoire and gene regulation in the chordate lineage.

Traditionally, non-phylogenetic classifications have grouped Pax4 with Pax6 because of the absence of a conserved octapeptide in both of them (Wehr and Gruss 1996). The other vertebrate Pax genes are divided into the classes Pax1/9, Pax3/7 and Pax2/5/8, depending on the completeness of the homeodomain (Chi and Epstein 2002). Recent studies suggested that the first wave of the diversification of the Pax gene family dates back to the early metazoan era (Matus et al. 2007). The second wave of the diversification of Pax genes later in the vertebrate lineage is marked by gene duplications between Pax2, -5 and -8 (Kozmik et al. 1999; Bassham et al. 2008; Goode and Elgar 2009), between Pax1 and -9 (Holland et al. 1995; Ogasawara et al. 1999; Mise et al. 2008) and between Pax3 and -7 (Holland et al. 1999). These gene duplications occurred after invertebrate chordates branched off, but most likely before the split between gnathostomes and cyclostomes (McCauley and Bronner-Fraser 2002; O'Neill et al. 2007). This timing matches that of so-called two-round whole genome duplications (2R-WGDs; Lundin 1993; Holland et al. 1994; Sidow 1996; Spring 1997) implicated in early vertebrate evolution (Kuraku et al. 2009; reviewed in Panopoulou and Poustka 2005). However, it has not been explored, in the modern framework of molecular phylogenetics and comparative genomics, whether the Pax4-Pax6 split also coincided with this second wave of diversification (Fig. 1A).


Figure 1. Three possible scenarios of the timing of gene duplication between Pax4 and Pax6. Arrows indicate the Pax4-Pax6 split. (A) The Pax4-Pax6 duplication took place in the vertebrate lineage, and both Pax4 and Pax6 are orthologous to invertebrate Pax6/eyeless genes. Inside other Pax classes, namely Pax1/9, Pax3/7 and Pax2/5/8, paralogs that share the same structural property were also duplicated at this timing (see Introduction). This scenario, however, has never been suggested by molecular phylogenetic analysis. (B) Pax4 originated in a relatively recent gene duplication from mammalian Pax6. This scenario has been previously supported by the presence of Pax4 genes only in mammals. (C) The Pax4-Pax6 duplication predates the deuterostome-protostome split. Family-wide phylogenetic analyses usually support this scenario (see Introduction). However, no non-mammalian and invertebrate orthologs of Pax4 have been reported.

The timing of the gene duplication has significant impacts on our understanding of evolutionary modification of gene repertoires and functions. In fact, Pax4 genes have been reported only for human (Pilz et al. 1993), mouse (Sosa-Pineda et al. 1997) and rat (Tokuyama et al. 1998), suggesting that Pax4 originated from a gene duplication unique to the mammalian lineage (Fig. 1B). However, family-wide phylogenetic analyses performed to date usually suggested an ancient origin of the Pax4 gene early in metazoan evolution (Fig. 1C; Hoshiyama et al. 1998; Wada et al. 1998; Breitling and Gerber 2000). In these studies, invertebrate genes identified as Pax6 orthologs, such as fly eyeless (Bopp et al. 1986) and Caenorhabditis elegans vab-3 (Chisholm and Horvitz 1995; Zhang and Emmons 1995), were shown to be more closely related to vertebrate Pax6 genes than to Pax4 genes (Fig. 1C).

Because critical phylogenetic signals may be obscured by divergent sequences from other Pax classes, the long-standing question regarding the timing of the Pax4-Pax6 split should be addressed using a focused dataset aiming to resolve the Pax4-Pax6 relationship.

Gene duplications are usually followed by interplay between duplicates in terms of their functional differentiation. Thus, a comparison of the regulation and functions of duplicates can also lead to better understanding of gene family evolution. In mammals, in addition to the aforementioned inductive role in eye development, Pax6 is involved in development of the central nervous system (CNS), including the fore- and hindbrain, the neural tube, the pituitary and the nasal epithelium (Walther and Gruss 1991). In mouse, Pax6 is also expressed in all the four cell types (α, β, δ and γ) in the islets of Langerhans, the endocrine part of the pancreas (St-Onge et al. 1997). In zebrafish, a composite expression pattern of pax6a and pax6b highly resembles that of its mouse ortholog (Kleinjan et al. 2008; also see Kinkel and Prince 2009 for a review on zebrafish pancreas development).

In contrast, Pax4, identified only in mammals, has not been implicated in eye development, but is rather expressed in the retinal photoreceptor cells (Rath, Bailey, Kim, Coon et al. 2009). Pax4 is also expressed mainly in the β-cells of the pancreas, and is necessary for the differentiation of both β- and δ-cell lineages (Sosa-Pineda et al. 1997). A recent study revealed plasticity for pancreatic α-cells to transdifferentiate into β-cells (Thorel et al. 2010). Importantly, Pax4 can trigger this transdifferentiation (Collombat et al. 2009; also see Liu and Habener 2009). This aspect of the Pax4 function attracts attention as a potential clinical target of diabetes therapy (Gonez and Knight 2010). It would be intriguing to reveal possible alterations or conservation in regulation of Pax4 expression during evolution in order to reveal the evolutionary history of partitioned or redundant roles between Pax4 and Pax6 genes. However, a thorough comparative picture has been obscured by the lack of our knowledge about non-mammalian Pax4 orthologs.

In this study, we characterized the previously unidentified non-mammalian Pax4 orthologs in teleost fish genomes and performed combinatorial analyses on molecular phylogeny, conserved synteny and gene expression patterns. Our analysis favours a scenario which postulates the duplication between Pax4 and Pax6 genes in the 2R-WGDs (Fig. 1A). In light of this evolutionary scheme, we conclude that Pax4 secondarily lost its expression in the central nervous system (CNS) after the 2R-WGD early in vertebrate evolution. This could have led to the highly asymmetric evolution between Pax4 and Pax6.

Methods

RT-PCR

Total RNA was extracted from a whole 52 hpf zebrafish embryo. The RNA was reverse transcribed into cDNA with SuperScript III (Invitrogen) using a 3' RACE System (Invitrogen). This cDNA was used as template in the following 3' RACE PCR. The first reaction was performed using the forward primer 5'-GACTGAGGGAATGAGACCAT-3', and the product of this PCR was used as template for the nested PCR with the forward primer 5'-CGCAGAGGAGACAAACCTTT-3'. These primers were designed based on zebrafish transcript sequences in Ensembl (ENSDART00000027919 and ENSDART00000078690). The middle fragment was amplified using the forward primer 5'-ATGATTGAGCTGGCGACTGA-3' and the reverse primer 5'-TCAAACTTTCGCTCCCTCCT-3' in the first PCR and the forward primer 5'-GACTGAGGGAATGAGACCAT-3' and the reverse primer 5'-CCTCATCCTCGCTCTTGATA-3' in a nested PCR. The upstream fragment (covering the start codon) was amplified using the forward primer 5'-TTTCTAGGATGTTCAGCC-3' and the reverse primer 5'-CTCTTGTGCTGAACTATG-3' in the first PCR and the forward primer 5'-CAGCCAATTCTGCATGTA-3' and the reverse primer 5'-TGATGGAGATGACTTCAG-3' in a nested PCR. We concatenated the sequences of these three fragments into one with the full-length open reading frame (ORF) and deposited it in EMBL under the accession number FR727738.

For in situ hybridization to detect zebrafish pax6b transcripts, a fragment covering its 3'-end was isolated with 3' RACE using the forward primer 5'-GTTTCACTGTTTTGCTCG-3' in the first PCR, and the forward primer 5'-ACAGGACAACGGTGGTGAAAA-3' in the nested PCR.

In situ hybridization

Two zebrafish pax4 riboprobes were prepared separately using the middle and 3' cDNA fragments described above. Whole-mount in situ hybridization using the pax4 riboprobes labeled with digoxigenin (DIG)-UTP and the pax6b riboprobes labeled with Fluorescein (Roche Applied Science) was performed as previously described (Begemann et al. 2001). Hybridization was detected with alkaline phosphatase (AP)-conjugated anti-DIG antibody (Roche Applied Science) followed by incubation with NBT/BCIP for pax4, and with AP-conjugated anti-Fluorescein antibody (Roche Applied Science) followed by INT/BCIP-based detection for pax6b. In double in situ staining, pax6b transcripts were detected first, and after a washing step in 0.1 M glycine (pH 2.2), pax4 transcripts were detected.

Fluorescent in situ hybridization was performed using the tyramide signal amplification (TSA) system (Invitrogen) as instructed by the manufacturer. DIG-labeled riboprobe was detected with horseradish peroxidase (HRP)-conjugated anti-DIG antibody. After incubating with biotinyl-tyramide, fluorescent signal was detected with streptavidin-488 (Invitrogen).
