2.1. The molecular mechanisms of gene expression

2.1.4. Transcription and its regulation

Transcription process The transcription process is generally divided into three phases: initiation, elongation and termination. In the initiation phase, the RNA polymerase binds the DNA close to the transcription start site (TSS) together with general transcription factors (GTFs), which support the formation of the pre-initiation complex depicted in Figure 2.3. This complex opens the DNA double helix, and the RNA polymerase synthesizes short RNA transcripts at the TSS [16]. Once a transcript exceeds a length of about ten ribonucleotides, the elongation phase starts, in which the polymerase moves along the DNA strand and polymerises further ribonucleotides according to the DNA template [16]. The termination phase starts after the RNA polymerase has passed the poly(A) signal sequence: the RNA strand is released, the RNA polymerase dissociates from the DNA and the transcription bubble closes [16].
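The three phases can be pictured as a small state machine. The following is a deliberately simplified sketch, not a biochemical model: it assumes the template strand is given in the order in which the polymerase reads it, uses the canonical poly(A) signal hexamer AAUAAA (an assumption, the text does not name the sequence), and treats termination as happening immediately at the signal. The names `transcribe`, `Transcript`, `POLY_A_SIGNAL` and `INITIATION_LENGTH` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: canonical poly(A) signal hexamer on the RNA (not stated in the text).
POLY_A_SIGNAL = "AAUAAA"
# From the text: elongation starts once the transcript exceeds about ten ribonucleotides.
INITIATION_LENGTH = 10

# Base pairing between template DNA and newly synthesized RNA.
_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


@dataclass
class Transcript:
    rna: str
    phases: list  # (phase name, RNA length at which the phase began)


def transcribe(template: str) -> Transcript:
    """Toy walk along a template strand, read in the polymerase's direction."""
    rna = ""
    phases = [("initiation", 0)]
    for base in template.upper():
        rna += _COMPLEMENT[base]
        if phases[-1][0] == "initiation" and len(rna) > INITIATION_LENGTH:
            phases.append(("elongation", len(rna)))
        if phases[-1][0] == "elongation" and rna.endswith(POLY_A_SIGNAL):
            # Simplification: release the RNA as soon as the signal is transcribed.
            phases.append(("termination", len(rna)))
            break
    return Transcript(rna, phases)
```

Calling `transcribe("CCCGGGCCCGGGTTATTTCC")` yields the RNA `GGGCCCGGGCCCAAUAAA`, with elongation starting at length 11 and termination at length 18; the two trailing template bases are never transcribed.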

Regulatory DNA regions The transcription efficiency of a gene is influenced in cis by several DNA regions such as promoters, enhancers, upstream activator sequences (UASs), insulators and boundary elements. In cis means that the regulatory element is on the same DNA molecule as the gene. A promoter is located immediately upstream of the transcription start site and can even extend into the coding region of the gene [18]. In eukaryotes, the totality of functional promoter elements (cis elements) that are sufficient to activate transcription is referred to as the core promoter, which is 40-60 nucleotides long [16]. The composition of these cis elements is specific and varies from gene to gene [19]. Common elements of eukaryotic RNA polymerase II core promoters are the TFIIB recognition element (BRE), the TATA box, the initiator (Inr) as well as some downstream

Figure 2.3.: Preinitiation complex of RNA polymerase II. The binding of RNA polymerase II to the promoter is supported by general transcription factors denoted as TFII (transcription factors for RNA polymerase II) with the classifications TFIIA, TFIIB, TFIID, TFIIE, TFIIF and TFIIH. The TATA box is recognized by the TATA-binding protein (TBP), a subunit of TFIID. (Modified from [16, Fig. 12-15])

promoter elements like the downstream promoter element (DPE), the downstream core element (DCE) and the motif ten element (MTE) [16, Page 397] (see Figure 2.4). In general, a subset of these elements is sufficient to enable the binding of the polymerase, general transcription factors and co-activators and thus the formation of the preinitiation complex [16, 19]. Besides the promoter, another important regulatory element is the enhancer, a cluster of regulatory sequences that is located hundreds or even millions of base pairs upstream or downstream of its target gene [16, 19, 20]. Enhancers form looping structures to physically interact with the promoter of their target gene irrespective of orientation [21], which activates transcription or increases the transcriptional level. The target genes of an enhancer can be neighbouring genes, but an enhancer can also skip several genes to reach its target [21]. Rarely, enhancer and target gene are located on different chromosomes [22]. The activity of enhancers is cell type specific or depends on developmental or environmental constraints [19]. Alterations of enhancer activity therefore change gene expression patterns, and incorrect alterations of enhancer activity are linked to many human diseases [23]. Enhancer activity can be identified by eRNAs, short non-coding RNAs that are bidirectionally transcribed from enhancer sequences if the enhancer elements are in close proximity to RNA polymerase II [19]. In addition, active enhancers can be identified by the proteins bound to them; for example, they are often bound by the factor EP300 [21]. The mammalian genome contains around 23000 genes and about 1 million enhancers, indicating that several enhancers can act on the same target gene depending on the cell type or condition [19]. In turn, an enhancer can regulate several genes.
The underlying mechanism of how an enhancer finds its target promoter is not yet fully understood. Following van Arensbergen et al. [21], mechanisms that might be involved in this selection process are: i) biochemical compatibility, ii) spatial architecture, iii) insulation and iv) chromatin environment. These mechanisms are illustrated in Figure 2.5. In detail, two regulatory sequences are biochemically compatible if both of them can be occupied by protein combinations that are able to

interact with each other. Obviously, physical interactions between two sequences can only take place if the overall folding of the chromatin permits it. As mentioned above, another kind of cis regulatory DNA region is the insulator element, which can promote or block the interaction between an enhancer and a promoter by altering the 3D conformation of chromatin. These DNA regions are bound by specific DNA binding proteins, of which the best-known is the CCCTC-binding factor (CTCF) [21].
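The four candidate mechanisms i) to iv) can be read as conditions that must hold jointly before an enhancer and a promoter can pair. The sketch below is a toy Boolean formalisation under that assumption; real interactions are graded rather than all-or-nothing, and the names `PairingContext` and `pairing_enabled` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairingContext:
    compatible_factors: bool   # i) bound proteins are able to interact
    contact_possible: bool     # ii) chromatin folding permits physical contact
    insulator_blocks: bool     # iii) an insulator hinders the interaction
    enhancer_accessible: bool  # iv) chromatin environment of the enhancer is open


def pairing_enabled(ctx: PairingContext) -> bool:
    """Toy rule: all favourable conditions must hold and no insulator may block."""
    return (
        ctx.compatible_factors
        and ctx.contact_possible
        and not ctx.insulator_blocks
        and ctx.enhancer_accessible
    )
```

For example, `pairing_enabled(PairingContext(True, True, True, True))` is `False`, because a blocking insulator overrides otherwise favourable conditions.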

Figure 2.4.: Polymerase II core promoter with transcription start site (TSS) and common regulatory elements: BRE (TFIIB recognition element), TATA (TATA box), Inr (initiator element), DCE (downstream core element) and DPE (downstream promoter element). (Based on [16, Fig. 12-14])
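Core promoter elements such as those in Figure 2.4 are short sequence motifs, so candidate elements can be located by simple pattern matching. The following minimal sketch translates IUPAC consensus strings into regular expressions and reports all hits. The consensus strings used here (TATAWAAR for the TATA box, YYANWYY for the Inr) are commonly quoted approximations and are an assumption, not taken from the text; the names `CORE_ELEMENTS` and `find_elements` are hypothetical. A match only marks a candidate, since real elements tolerate mismatches and depend on their position relative to the TSS.

```python
import re

# Subset of IUPAC nucleotide codes needed for the consensus strings below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# Assumption: approximate consensus sequences, used for illustration only.
CORE_ELEMENTS = {
    "TATA": "TATAWAAR",
    "Inr": "YYANWYY",
}


def find_elements(sequence, elements=CORE_ELEMENTS):
    """Return (element name, 0-based start, matched text) tuples sorted by position."""
    sequence = sequence.upper()
    hits = []
    for name, consensus in elements.items():
        body = "".join(IUPAC[code] for code in consensus)
        # A lookahead with a capture group reports overlapping matches as well.
        pattern = re.compile("(?=(" + body + "))")
        for match in pattern.finditer(sequence):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])
```

For the made-up sequence `GCGCTATAAAAGGCGCGGGGTCAGTCTGG`, the function reports one TATA candidate at position 4 and one Inr candidate at position 20.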

Transcription factors For the cis regulatory elements to carry out their regulatory functions, the instructions encoded in their sequences must be recognized through the selective binding of proteins to these sequence elements. These proteins belong to the overall class of transcription factors (TFs), regulatory proteins that are directly involved in the regulation of a gene, usually by binding to specific regulatory DNA sequences termed transcription factor binding sites (TFBSs) [25]. In fulfilling their regulatory functions, TFs can completely activate or repress the transcription of a certain gene, or increase or decrease its level of transcription. To do so, TFs directly interact with the basal transcriptional machinery or alter chromatin structure through histone or DNA modifications. Regarding their molecular structure, TFs generally exhibit a modular composition (see Figure 2.7) and contain at least one of the following protein domains: i) a DNA binding domain, ii) an oligomerization domain, iii) a regulatory domain and iv) a trans-activation domain [26]. The DNA binding domain recognizes specific DNA sequence patterns and enables protein-DNA binding. The DNA binding domains of proteins can be computationally predicted from their amino acid sequences, for example using the Jensen-Shannon divergence as in our recent approach [27] (see Appendix A.4). The regulatory domain in turn controls the activity of a TF, e.g. by ligand binding or phosphorylation, and the trans-activation domain is usually characterized by a specific amino acid composition [26].
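For readers unfamiliar with the Jensen-Shannon divergence mentioned above, the sketch below shows only its general definition, applied to amino acid frequency distributions. It is not the prediction method of [27]; how that approach builds and compares its distributions is not described here, and the helper names `frequencies`, `kl_divergence` and `jensen_shannon_divergence` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def frequencies(residues):
    """Relative frequency of each standard amino acid in a string of residues."""
    counts = Counter(residues.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * D(p || m) + 0.5 * D(q || m) with m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

With base-2 logarithms the divergence is symmetric and lies between 0 (identical distributions) and 1 (distributions with no residue in common), which makes it convenient as a bounded score, for instance when comparing an observed residue distribution against a background distribution.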

The human genome contains around 20000 protein coding genes, of which roughly 1500 code for TFs. Considering isoforms that are generated by alternative splicing, the human body contains more than 2900 TFs [25]. However, the number of TFs is much smaller than the number of all genes, and consequently the composition of TFs bound to regulatory

Figure 2.5.: Mechanisms determining promoter-enhancer interactions. The pairing of an enhancer with a certain promoter is enabled if a) the bound transcription factors are compatible with each other, b) the spatial constraints allow contact between the two DNA regions, c) insulator elements do not hinder the pairing and d) the chromatin landscape of the enhancer is accessible. (Based on [21, 24])

elements as well as the interplay between TFs are important for proper gene regulation in eukaryotic cells.

Further, TFs generally have an oligomerization domain that allows direct physical interaction (synergistic or antagonistic) with other TFs. TFs thereby form homo- and heterodimers, depending on whether the interaction partner is of the same type or not. Extending this dimerization process, TFs tend to form higher-order complexes in combination with co-factor proteins. The binding sites of the underlying TFs in turn form clusters on the DNA that are known as cis regulatory modules. Direct physical cooperations between transcription factors are depicted in Figure 2.6. Within a regulatory region, TFs that bind to the cis regulatory modules inside that region interact with each other.

In addition, the TFs that are bound to different regulatory regions can directly physically

[Figure 2.6 labels: TSS, promoter, enhancer, TF, CoF. Panels: intra-regional cooperation; direct inter-regional cooperation; indirect inter-regional cooperation via cofactor.]

Figure 2.6.: Physical cooperation strategies of transcription factors. To ensure proper gene regulation, transcription factors (TFs) have to cooperate with other TFs or co-factors (CoF) in a synergistic or antagonistic manner. These cooperations can, for example, take place between TFs that bind next to each other on the DNA (intra-sequence cooperations) and between TFs that belong to different regulatory sequences (inter-sequence cooperations). The cooperations between TFs of different regulatory regions can be based on direct physical interactions or can be established by cofactors.

[Figure 2.7 schematic, domain order: DNA binding domain, oligomerization domain, regulatory domain, trans-activation domain.]

Figure 2.7.: Modular composition of transcription factors. In general, transcription factors consist of all or some of the following domains: DNA binding domain, oligomerization domain, regulatory domain and trans-activation domain.

interact with each other, or do so indirectly via a co-factor. These physical cooperations can be synergistic or antagonistic, so that the effect of activating transcription factors is strengthened or reduced. In the antagonistic case, transcription factors termed repressors hinder the activity of activating TFs, as depicted in Figure 2.8. Within one regulatory sequence, repressors can functionally cooperate with the activator by blocking its binding site, or physically cooperate by masking its activation domain. In contrast, repressors bound to a distal regulatory region (such as an enhancer region) can directly or indirectly interact with activating TFs on the promoter [16].

Figure 2.8.: Strategies of repressing transcription factors. A transcription factor can fulfil its repressing function by a) blocking the binding site of the transcriptional activator, b) interacting with the activator and thereby covering its activation domain and c) directly repressing transcription initiation by interacting with general transcription factors. (Modified from [16, Fig. 17-20])