2. General Introduction

2.1. Development, function and evolution of body structures are governed by tightly regulated gene expression

The information on how we and all other organisms develop, function and interact with the environment is encoded in DNA, which lies tightly packed as chromosomes in the nucleus of each cell (Figure 1A). During transcription, the genetic information encoded in genes is transcribed into messenger RNA (mRNA). The mRNA provides the template for the translational machinery, which translates it into amino acid sequences and eventually functional proteins (Figure 1C).
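The flow of information from DNA via mRNA to protein can be illustrated with a minimal Python sketch. It assumes the standard genetic code and an input given as the coding (sense) strand in 5' to 3' direction; the function names `transcribe` and `translate` are chosen here for illustration only, and splicing, untranslated regions and RNA processing are ignored.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a coding (sense) DNA strand.

    The mRNA has the same sequence as the coding strand, with U replacing T.
    """
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")
protein = translate(mrna)
```

For the toy input above, `mrna` is `AUGUUUGGCUAA` and `protein` is `MFG` (methionine, phenylalanine, glycine), because translation stops at the `UAA` codon.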

A typical eukaryotic gene locus is composed of several elements. The protein information is encoded in one or more exons, which together form the coding region (CDS) and are separated by introns. Transcription is initiated by the assembly of a basal transcription machinery at the promoter region, mostly located 5’ upstream, close to the transcription start site (TSS) of the respective gene. This protein complex recruits the RNA polymerase that synthesizes the pre-mRNA. Where, when and how strongly a gene is transcribed is, however, primarily controlled by regulatory intronic or intergenic DNA regions, so-called enhancers or cis-regulatory regions (Davidson, 2001; Wray, 2003; Figure 1C). To exert this control, the respective genomic regions must be depleted of nucleosomes, which otherwise confer tight DNA packing.

Hence, regulatory sequences must be accessible to transcription factors (TFs) that physically interact with the DNA by recognizing sequence-specific TF-binding motifs. This in turn leads to the recruitment of additional TFs and co-factors. Enhancer sequences are often modular, meaning that several spatially separated regulatory regions modulate the expression of a single gene (e.g. Adachi et al., 2003; Davidson, 2001; Stanojevic et al., 1991). Advances in high-throughput sequencing now allow the location of open chromatin regions in the genome to be defined reliably. Approaches like ChIP-seq (Johnson et al., 2007; Robertson et al., 2007), FAIRE-seq (Giresi et al., 2007) or ATAC-seq (Buenrostro et al., 2013) are frequently used to define putative regulatory regions and, if combined with other methods like RNA-seq (Wang et al., 2009), to link them to gene expression. However, how exactly enhancers carry out their regulatory function is not yet completely understood, and different models of enhancer function have been proposed (Buffry et al., 2016). Chromosome conformation capture methods combined with high-throughput sequencing, such as Hi-C (van Berkum et al., 2010), resolve the three-dimensional chromatin state and are used to study how distantly located regulatory sequences exert their regulatory function (Furlong and Levine, 2018).
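The recognition of sequence-specific TF-binding motifs can be sketched as a simple consensus scan over both DNA strands. The example below is only an illustration: the consensus `WGATAR` (a commonly cited motif of GATA-type factors), the toy sequence and the function names are assumptions that do not come from the analyses described here, and real motif searches usually rely on position weight matrices and dedicated tools rather than exact consensus matching.

```python
import re

# Subset of IUPAC nucleotide codes needed for simple consensus motifs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_sites(sequence: str, consensus: str):
    """Return sorted (position, strand, matched site) tuples.

    Positions are 0-based and always refer to the forward strand.
    A lookahead is used so that overlapping sites are also reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in consensus) + "))")
    seq = sequence.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            if strand == "+":
                pos = match.start()
            else:
                # Convert the reverse-strand coordinate to the forward strand.
                pos = len(seq) - match.start() - len(consensus)
            hits.append((pos, strand, match.group(1)))
    return sorted(hits)


sites = find_motif_sites("CCAGATAAGGCTTATCTGG", "WGATAR")
```

In this toy sequence the scan reports one site on each strand: `AGATAA` at position 2 on the forward strand and its reverse-complement occurrence at position 11.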

Each cell type of an organism is characterized by a certain combination of expressed genes and the defined interplay of their gene products. Since different cell types have to carry out distinct functions for a long period of time (depending on the life span of the organism), these functions are ensured by tissue- or even cell-specific gene expression (Lübbe and Schaffner, 1985).

Traditional methods to quantify the expression levels of single genes include quantitative real-time PCR (qRT-PCR; Bustin, 2000) and Northern blotting (Alwine et al., 1977). The spatial distribution of transcripts can be studied by in situ hybridization (Pardue and Gall, 1969).

Nevertheless, only the advent of next-generation sequencing (NGS) methods like RNA-seq facilitated the efficient genome-wide assessment of gene expression by quantifying the complete mRNA content expressed at a certain time point in a cell or tissue (Wang et al., 2009).
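How read counts from RNA-seq are turned into comparable expression values can be sketched with a transcripts-per-million (TPM) normalization. This is one widely used normalization and not necessarily the one applied in this work; the gene names, counts and lengths below are invented for illustration.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Convert raw read counts into transcripts per million (TPM).

    counts:     gene -> number of reads assigned to the gene
    lengths_bp: gene -> transcript length in base pairs
    """
    # Step 1: correct for transcript length (reads per kilobase).
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: scale so that the values of one sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical toy data for three genes.
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
toy_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
toy_tpm = tpm(toy_counts, toy_lengths)
```

In the toy data, geneB has three times as many reads as geneA but is also three times as long, so both receive the same TPM value. This is the length correction that makes expression levels comparable between genes within a sample.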

Disturbance of gene expression, and thus gene function, can eventually lead to disease or death of the respective organism (e.g. Dermitzakis, 2008; Emilsson et al., 2008). For instance, in humans, the formation and progression of cancer is tightly linked to aberrant gene expression and regulation (e.g. Liang and Pardee, 2003). Therefore, the expression of genes has to be under tight spatial and temporal regulation, which is ensured on several molecular and cellular levels (Figure 1C). The accessibility of regulatory regions, for instance, is highly dependent on the tissue and developmental stage (e.g. Bozek et al., 2019). Furthermore, biochemical modifications of DNA (methylation) and histone proteins (methylation, acetylation, phosphorylation and many others) influence gene expression (Kouzarides, 2007; Lawrence et al., 2016; Figure 1C). In Drosophila, for example, dosage compensation relies on the acetylation of lysine 16 of histone H4 on the X chromosome, which allows increased transcription in males through decondensation of the chromosome (e.g. Akhtar and Becker, 2000; Turner et al., 1992).

Additionally, methylation of cytosines has been linked to repression of transcription (reviewed in Bird and Wolffe, 1999). In vertebrates, for example, promoter or enhancer regions, which often contain so-called CpG islands, are usually depleted of methylated CpGs and associated with hyperacetylated histones, marking actively transcribed genes.
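What makes a region a CpG island can be made concrete with two simple sequence statistics: the GC content and the ratio of observed to expected CpG dinucleotides. The sketch below assumes the commonly used formula observed/expected = (number of CpG × sequence length) / (number of C × number of G). Thresholds often quoted in the literature (for example a length of at least 200 bp, GC content above 50% and a ratio above 0.6) are not part of the text above and are mentioned only for orientation.

```python
def cpg_stats(seq: str):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c_count + g_count) / n if n else 0.0
    if c_count == 0 or g_count == 0:
        return gc_content, 0.0
    obs_exp = (cpg_count * n) / (c_count * g_count)
    return gc_content, obs_exp


# Two toy sequences with identical base composition but different CpG usage.
cpg_rich = cpg_stats("CGCGCGCG")
cpg_poor = cpg_stats("CCCCGGGG")
```

Both toy sequences consist only of C and G, yet the first has an observed/expected ratio of 2.0 and the second of 0.5, which shows that the ratio captures the dinucleotide arrangement rather than the base composition alone.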

The spatially and temporally restricted availability of TFs and co-factors that bind to accessible regulatory regions represents a further level of context-specific gene regulation. One example of transcriptional co-regulation, which will be introduced in more detail in Chapter II, can be found in the developing Drosophila wing disc. Pannier (Pnr), a GATA transcription factor that usually activates expression of its target genes, interacts in a spatially defined manner with U-shaped (Ush) (Fromental-Ramain et al., 2010, 2008). The resulting heterodimer loses the activating role of Pnr and acquires a repressing function (Haenlin et al., 1997). Post-transcriptional processes can also modulate gene expression in a context-dependent manner. For instance, the context-dependent expression of small regulatory RNA molecules such as microRNAs (miRNAs) modifies the stability of mRNA or the efficiency with which an mRNA molecule is translated (reviewed in Bartel, 2018; Kittelmann and McGregor, 2019). Likewise, long non-coding RNAs (lncRNAs) are transcribed in a highly spatially and temporally controlled manner and are suggested to influence, for example, the expression of genes in their close genomic vicinity (Kopp and Mendell, 2018; Ponting et al., 2009; Sarropoulos et al., 2019). These are only a few of the many examples showing that tissue- and stage-specific gene expression is orchestrated on different levels of the gene regulation machinery.

Figure 1. Gene expression is tightly controlled. A. The DNA lies heavily packed as so-called chromatin in the nuclei of eukaryotic cells. B. Chromatin is formed by wrapping DNA around histone proteins, which together with the DNA form nucleosomes. Regions of loose packing, characterized by nucleosome depletion, are in general more accessible to transcription factors (TFs), and loci in these regions are mostly actively transcribed. In contrast, tightly packed DNA is inaccessible to regulatory proteins and is therefore not transcribed. Biochemical modifications of histones or cytosines provide another level of gene regulation. C. A eukaryotic gene locus is composed of one or more exons, which make up the CDS of the gene. Regulatory regions are located in introns, which separate the exons, or in intergenic regions. Transcription is initiated at the promoter region, 5’ upstream of the transcription start site (TSS), and TFs bound to enhancer regions further regulate gene expression. The figure is taken from Buchberger et al. (2019).

While gene expression has to be tightly controlled to ensure proper organ development and function, many evolutionary studies have revealed that divergence in gene expression is a key driver of phenotypic evolution (Alvarez et al., 2015; Carroll, 2005; King and Wilson, 1975; Todd et al., 2016). One of the classic examples in which differences in morphology were associated with differential gene expression is the work of Abzhanov and colleagues, who linked higher expression of bone morphogenetic protein 4 (BMP4) to wide beak morphology in ground finches (Abzhanov, 2004), whereas the development of long beaks in cactus finches is mainly driven by higher levels of calmodulin (CaM) (Abzhanov et al., 2006). In East African cichlid fish, it has recently been shown that changes in the expression of the agrp2 gene define the pigmentation patterns of different radiations (Kratochwil et al., 2018). Similarly, adaptive changes in abdominal pigmentation of African Drosophila populations are caused by expression variation of the ebony gene (Pool and Aquadro, 2007; Rebeiz et al., 2009). Changes in gene expression levels can be due to changes in a gene’s own regulatory regions (cis-regulatory divergence) or to divergence of upstream regulators, such as transcription factors or regulatory RNAs (trans-regulatory divergence) (Cowles et al., 2002; Wittkopp et al., 2004). For many simple traits, including pigmentation, trichome formation or loss of specific skeletal structures, it has been shown that the causative mutations are often located in the non-coding, regulatory regions of the locus (e.g. Chan et al., 2010; McGregor et al., 2007; Prud’homme et al., 2006; Rebeiz et al., 2009) and thus eventually affect the expression of the respective gene. Whether this also applies to quantitative, complex traits like the size and shape of organs and structures remains to be established.
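The distinction between cis- and trans-regulatory divergence can be made quantitative with a small sketch. It assumes an experimental logic that is common in the field but not described above: in F1 hybrids both parental alleles share the same trans-acting environment, so an expression difference between the two alleles in the hybrid is attributed to cis-regulatory divergence, and the remainder of the difference between the parents is attributed to trans-regulatory divergence. The function name and all expression values are hypothetical.

```python
import math


def cis_trans_effects(parent_a: float, parent_b: float,
                      hybrid_allele_a: float, hybrid_allele_b: float):
    """Split a log2 expression difference into cis and trans components.

    parent_a, parent_b:               expression of the gene in the two parents
    hybrid_allele_a, hybrid_allele_b: allele-specific expression in the F1 hybrid
    All values must be positive.
    """
    total = math.log2(parent_a / parent_b)              # overall divergence
    cis = math.log2(hybrid_allele_a / hybrid_allele_b)  # persists in shared trans environment
    trans = total - cis                                 # remainder
    return cis, trans


# Hypothetical gene: fourfold difference between the parents,
# twofold difference between the alleles in the hybrid.
cis_effect, trans_effect = cis_trans_effects(400.0, 100.0, 200.0, 100.0)
```

In this invented example the fourfold parental difference (log2 = 2) splits into a cis effect of 1 and a trans effect of 1, meaning that both types of divergence contribute equally. If the allelic ratio in the hybrid matched the parental ratio, the trans component would be zero.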

In summary, gene expression is a central biological process that transfers the information stored in the genome of an organism to its development, function and evolution.
