
G-Quadruplex Forming Repeat Sequences In Bacterial Genomes


Academic year: 2022


Dissertation submitted for the degree of Doctor of Natural Sciences (Dr. rer. nat.)

Presented by Charlotte Rehm

at the

Faculty of Natural Sciences
Department of Chemistry

Date of the oral examination: March 24, 2015
First referee: Prof. Dr. Jörg S. Hartig
Second referee: PD Dr. Malte Drescher


Parts of this work were published in:

Rehm C, Holder IT, Groß A, Wojciechowski F, Urban M, Sinn M, Coelfen H, Drescher M, Hartig JS. 2014. A bacterial DNA quadruplex with exceptional K+ selectivity and unique structural polymorphism. Chem. Sci. 5: 2809-2818

Rehm C, Wurmthaler LA, Li Y, Frickey T, Hartig JS. Investigation of a quadruplex-forming repeat sequence highly enriched in Xanthomonas and Cyanobacteria (submitted)

Additional publications:

Klauser B, Rehm C, Summerer D, Hartig JS. 2015. Engineering ribozyme-based aminoglycoside switches of gene expression by in vivo genetic selection in Saccharomyces cerevisiae. Riboswitches as Targets and Tools, Methods Enzymol, 550:301-20

Rehm C, Klauser B, Hartig JS. 2015. Engineering aptazyme switches for conditional gene expression in Mammalian cells using an in vivo screening approach. Methods Mol Biol, 1316:127-40

Ausländer S, Stücheli P, Rehm C, Ausländer D, Hartig JS, Fussenegger M. 2014. A general design strategy for protein-responsive riboswitches in mammalian cells. Nature Methods, 11: 1154-1160

Rehm C, Hartig JS. 2014. In vivo screening for aptazyme-based bacterial riboswitches. Methods Mol Biol. 1111: 237-249

Saragliadis A, Krajewski SS, Rehm C, Narberhaus F, Hartig JS. 2013. Thermozymes: Synthetic RNA thermometers based on ribozyme activity. RNA Biol. 10(6)


Record of Contributions:

A Bacterial DNA Quadruplex with Exceptional K+ Selectivity and Unique Structural Polymorphism:

This project was carried out in collaboration with Isabelle T. Holder (PhD student, AG Hartig), Andreas Groß (PhD student, AG Drescher), Filip Wojciechowski (former Post-Doctoral Researcher, AG Hartig) and Maximilian Urban (former Master’s Student, AG Hartig).

Isabelle T. Holder carried out the electrophoretic mobility shift assay (Figure 21E) and prepared samples for 1H-NMR analysis. NMR measurements were carried out at the NMR core facility and data was analyzed by Isabelle T. Holder with the assistance of Žarko Kulić (AG Möller) (Figure 22D and Figure 47). Filip Wojciechowski synthesized the spin-labeled d[(GGGGCT)3GGGG] oligonucleotide. EPR measurements were carried out by Andreas Groß (Figure 21F) and analyzed by Andreas Groß and Malte Drescher (AG Drescher). Maximilian Urban carried out and analyzed AUC measurements (Figure 22A-C) with the assistance of Marius Schmid and Helmut Coelfen (AG Coelfen).

G-rich Bacterial Repeat Sequences with the Potential to Fold G-quadruplexes:

This project was carried out in collaboration with Lena A. Wurmthaler (former Master’s Student, now PhD student, AG Hartig) and Yuanhao Li (PhD student, AG Frickey).

GGGGA(C/T)T repeat sequence data from Nostoc sp. PCC 7120 and data of repeat associated genes was compiled and kindly provided by Lena A. Wurmthaler (Table 60, Table 61). Data shown in Figure 30 and Figure 37C was provided by Lena A. Wurmthaler. She also measured CD spectra of G-quadruplex forming oligonucleotides derived from Nostoc sp. (Figure 31C-F). Yuanhao Li carried out the mapping of whole transcriptome sequencing data of Xanthomonas axonopodis pv. citri str. 306 and kindly provided data of the coordinates of the assembled transcripts and their expression levels (Table 59).


Acknowledgement

We are about to begin. But first, I would like to take this opportunity to sincerely thank everyone who has accompanied me on the way here.

First of all Prof. Dr. Jörg S. Hartig, for accepting me into the group and for the opportunity to tinker with a great many different and interesting projects over the years. My lab notebook says that I first turned up in AG Hartig in December 2007. Thank you for the trust placed in me, for your patience and open ear when something did not work out, for the fact that one can knock on your door at any time, and overall for the completely uncomplicated atmosphere in the group. Not least, thank you for the group outings, which were always great fun.

PD Dr. Malte Drescher, for acting as second referee and for the good collaboration on the quadruplex polymorphism project.

Prof. Dr. Thomas U. Mayer, for chairing the examination committee, and for all the advice during my thesis committee meetings and earlier during my Master's thesis.

Everyone who helped me with my projects, in particular Isi, Lena, Andi, Filip, Max and Yuanhao.

And of course my hard-working and cheerful students Jasmin 1 & Jasmin 2, Carina, Kathleen and Jeremias, who made themselves available even for some rather far-out experiments and who, among other things, prepared cabbage extract and waited anxiously with me to see whether the Apollo 13 flame spectrometer would decide to set off for the moon after all.

All former and current members of AG Hartig, for wonderful gossip breaks and cooking sessions. As guardian of the sweets drawer I can only say that your oohs and aahs on opening the chocolate stash always gave me a great deal of joy. Thank you for so willingly volunteering as test subjects for assorted cake creations. I admit that the one with the beetroot was somewhat daring, but you held up bravely. I will sorely miss the morning coffee on the roof with you in summer. Britney not quite so much, but the espresso all the more, along with the truly unique crema and the discussions about the finer points of the German language and hipsters. I still argue for the penguin in the cold room and am prepared to come back to have a look at it.

Now a genitive for Michele: a very big thank-you is owed to Astrid, without whom absolutely nothing in the lab would work! Thank you for your company at countless lunches and for waiting patiently whenever things took longer once again. I promise the genetic engineering records will arrive in January.


Isi, for your support inside and outside the lab and for an open ear at all times, especially during the last weeks. Thank you for letting me blow off steam with you and for always giving my music mixes a home. We are just old school. I was incredibly happy about every little heart, chocolate and flower on my desk, and working out with you during the lunch break was always fun.

Waidmannsheil!

Bene, for putting up with me in the same lab for so long without getting a Quadruhead or Hammerplex, and for defending the Baden lab with me against attacks from the Swabian abroad. Gladly also with the 300 trailer. Unfortunately I did not add a section on the gene gun, as that would have been a bit much, but at least it has now been mentioned. For flying in mochis you get a star sticker in your lab notebook.

My diligent proofreading elves Isi (you again!) and Astrid, for taking the time and trouble to dig through 280 pages with the working title "all you ever wanted to know about quadruplexes in CD".

My friends from university and from home, including all partners and offspring, for the fact that despite the distance we always manage to do things together and that one always feels at home with you right away.

My family, scattered to the four winds, and especially my parents, for their support and trust in every situation, around the clock for 29 years. And for always sending me on my way well fed; "Mama Rehm's lunch packages" have already achieved a certain fame here.

And last but definitely not least my boyfriend Daniel, for always having my back and for your patience over the last months. I hereby keep my promise: since you did not fly in for the public viewing, you do not have to fly in for my defense either. :) I am very much looking forward to us soon being on the same continent again.

Now we really begin:


Contents

1 Introduction
1.1 Nucleic Acid Building Blocks
1.2 Non-Canonical Nucleic Acid Structures
1.3 G-Quadruplexes
1.3.1 Structural Features of G-Quadruplexes
1.3.2 G-Quadruplex Polymorphism
1.3.3 Frequently Used Methods for the Characterization of G-Quadruplexes
1.3.4 Putative Cellular Roles of G-Quadruplexes
1.3.5 Evidence of G-Quadruplex Formation in vivo
1.4 Repetitive DNA Sequences
1.4.1 Putative Structures Formed by Repetitive DNA Sequences
1.4.2 Overview of Repetitive DNA Types in Prokaryotes
1.4.2.1 Simple Sequence Repeats (SSRs)
1.4.2.2 Other Types of Repetitive Elements in Prokaryotic Genomes
1.4.3 Evolvability of SSRs and their Effect on Cellular Functions
1.5 Occurrence and Distribution of SSRs and G-Quadruplexes in Prokaryotes
1.6 Hyperosmotic Shock in Non-Halophilic Bacteria
2 Aims of this Thesis
3 Results and Discussion
3.1 A Bacterial DNA Quadruplex with Exceptional K+ Selectivity and Unique Structural Polymorphism
3.1.1 K+ Selectivity and Structural Transition
3.1.2 Investigation of the Individual G-Quadruplex Conformations
3.1.3 Influence of Loop Sequence Composition
3.1.4 Influence of Loop Length
3.1.5 Influence of G-tract Length
3.1.6 Occurrence of (G4CT)3G4 in Bacterial Genomes
3.1.7 Towards Switchable Nanomaterials
3.1.8 Conclusions
3.2 G-rich Bacterial Repeat Sequences with the Potential to Fold Quadruplexes
3.2.1 Characterization of G-rich Repeat Sequences in Xanthomonas sp.
3.2.2 Characterization of G-rich Repeat Sequences in Nostoc sp.
3.2.3 G-Quadruplex and i-motif Formation by Repeat Patterns
3.2.4 Repeat Associated Genes
3.2.5 Analysis of Sequence Homology in Repeat Containing Regions in Xanthomonads
3.2.6 Repeats in Non-Coding Regions
3.2.6.1 Distance Distribution of the Repeats to the Neighboring ORFs
3.2.6.2 Operon Analysis for Repeat Associated Genes

3.2.8 Influence of Hyperosmotic Shock on the Expression Levels of Repeat Associated Genes
3.2.9 Fishing for (GGGAATC)3GGG Binding Proteins
3.2.10 Conclusions
4 Summary and Outlook
5 Zusammenfassung und Ausblick
6 Materials
6.1 Chemicals and Reagents
6.2 Nucleotides and Radiochemicals
6.3 DNA Oligonucleotides and Primers
6.3.1 DNA Oligonucleotides Used for CD, PAGE, NMR and AUC
6.3.2 DNA Oligonucleotides Used as Primers in qPCR
6.3.3 DNA Oligonucleotides Used as Probes in Fishing Experiments
6.4 Bacterial Strains
6.5 Media for the Cultivation of Xcc
6.6 Enzymes, Kits and Size Standards
6.7 General Solutions and Buffers
6.8 Laboratory Consumables
6.9 Laboratory Equipment
6.10 Software and Online Tools and Databases
7 Methods
7.1 Circular Dichroism (CD) Measurements
7.2 CD Thermal Denaturation
7.3 UV Thermal Denaturation
7.4 Oligonucleotide Purification by Preparative PAGE
7.5 Radioactive Labeling of Oligonucleotides
7.6 Electrophoretic Mobility Shift Assay (EMSA)
7.7 Synthesis of Spin-Labeled d[(G4CT)3G4]
7.8 EPR Measurements
7.9 EPR Data Analysis
7.10 1H-NMR Measurements
7.11 Analytical Ultracentrifugation (AUC)
7.12 BLASTn Search
7.13 Identification of Repeat Patterns in Xanthomonads
7.14 Identification of Repeat Patterns in Nostoc sp. PCC 7120
7.15 Analysis of Repeat Associated Genes
7.16 Analysis of Sequence Homology Between Xcc and Xac in Repeat Containing Regions
7.17 Analysis of Repeat Positions Relative to Neighboring ORFs
7.18 Operon Analysis

7.19 Analysis of Whole Transcriptome Sequencing Data of Xac
7.20 Cultivation of Xcc for Hyperosmotic Shock Experiments
7.21 RNA Isolation
7.22 Removal of Genomic DNA
7.23 Phenol-Chloroform Extraction
7.24 Ethanol Precipitation
7.25 Nucleic Acid Quantitation
7.26 cDNA Synthesis
7.27 Semi-Quantitative Real-Time PCR (qPCR) Analysis
7.28 Agarose Gel Electrophoresis
7.29 Immobilization of Oligonucleotides on Streptavidin Coated Magnetic Beads
7.30 Preparation of Xcc Protein Extract for Fishing Experiments
7.31 Fishing for Protein Interaction Partners
7.32 SDS-Polyacrylamide Gel Electrophoresis (SDS-PAGE)
7.33 Silver Staining of SDS-Polyacrylamide Gels
8 Abbreviations and Units
9 Bibliography
10 List of Figures
11 List of Tables
12 Appendices
12.1 Additional Data from “A Bacterial DNA Quadruplex with Exceptional K+ Selectivity and Unique Structural Polymorphism”
12.1.1 Additional Figures
12.1.2 Additional Tables
12.2 Additional Data from “G-rich Bacterial Repeat Sequences With the Potential to Fold Quadruplexes”
12.2.1 Additional Figures
12.2.2 Additional Tables
12.2.3 Classification of Repeat Associated Genes According to KEGG Pathways
12.2.3.1 Classification of Repeat Associated Genes of Xcc According to KEGG Pathways
12.2.3.2 Classification of Repeat Associated Genes of Xac According to KEGG Pathways
12.2.3.3 Classification of Repeat Associated Genes of Ana According to KEGG Pathways


1 Introduction

Figure 1: The Flow of Genetic Information

Simplified scheme of the flow of genetic information in a prokaryotic cell: The genetic information is stored in the form of DNA on a single circular chromosome (gray). To use the genetic information, an mRNA transcript (violet) is synthesized during transcription using the DNA as a template. Translation of the message encoded in the mRNA sequence into the respective amino acid sequence occurs at the ribosome (orange; colored circles represent the nascent polypeptide chain). The amino acid chain folds into a functional protein (blue) that carries out a variety of tasks in the cell, e.g. as a biocatalyst or structural component. Figure adapted from Chapter 28 in (1).

Genetic information in all organisms is stored in the form of deoxyribonucleic acid (DNA); many viruses use ribonucleic acid (RNA) instead. Bacteria are the simplest living organisms and are classified as prokaryotes: they lack a nucleus, in contrast to eukaryotic cells, in which the genetic material is kept in this specialized compartment. In a bacterium, which is a single-celled organism, the DNA is therefore located directly in the cytoplasm (2). A gene can be regarded as a unit of information; the term “gene” was first used by Wilhelm Johannsen in 1909 (3). In most bacteria the genes are arranged on a single circular chromosome; in addition, bacteria may carry extrachromosomal genetic elements (plasmids) (2). In a process called gene expression, the information stored in the DNA that is needed to operate as a living organism is transferred to functional molecules, the so-called gene products. These are often proteins, but may also be non-coding RNAs. In a very simplified overview, the flow of genetic information is divided into two successive steps: During transcription, a copy of the genetic information encoded in the genome is synthesized in the form of messenger RNA (mRNA) by the RNA polymerase. In a second step called translation, the information encoded in the order of the nucleotides of the mRNA is decoded and translated into an amino acid sequence, which ultimately forms a functional protein (Figure 1) (1,2). In the genetic code one codon consisting of three nucleotides


Molecular Biology” in 1958 and later proven by Robert W. Holley, Har Gobind Khorana, Heinrich Matthaei, Marshall W. Nirenberg and colleagues, whose combined efforts deciphered the genetic code (4-9). Protein synthesis primarily occurs at the ribosomes. Because a prokaryotic cell lacks a nucleus, transcription and translation are not separated in time or space. In addition, bacterial translation initiation can occur at multiple sites on a polycistronic mRNA (1).

Many ribosomes can assemble on one mRNA simultaneously; this complex is referred to as a polysome (10). This seemingly simple picture has become increasingly complex in recent years as more non-protein regulators of cellular processes have been described in bacteria: A growing number of non-coding RNAs (11,12) and other RNA-based mechanisms that influence gene expression have been identified, e.g. riboswitches (13-16), small regulatory RNAs (17), the T box mechanism (18) and CRISPR interference (19,20). Furthermore, there is increasing evidence that non-canonical nucleic acid structures directly or indirectly influence replication, recombination, transcription and translation at the DNA or RNA level (21-28).
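The two-step flow of information described above (transcription, then translation codon by codon) can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical, and the codon table is deliberately partial; it is meant as an illustration of the principle, not as a complete implementation of the genetic code.

```python
# Minimal sketch of transcription followed by translation.
# Assumption: CODON_TABLE is intentionally incomplete (illustration only);
# codons missing from it are reported as "X".
CODON_TABLE = {
    "AUG": "M",                                       # start codon, methionine
    "UUU": "F", "UUC": "F",                           # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",   # glycine
    "AAA": "K", "AAG": "K",                           # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",               # stop codons
}


def transcribe(coding_strand):
    """Return the mRNA that carries the coding-strand sequence (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG, one codon (three nucleotides) at a time."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":  # a stop codon ends translation
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    mrna = transcribe("CCATGTTTGGGAAATAAGG")  # hypothetical example sequence
    print(mrna, translate(mrna))
```

The sketch ignores everything that makes bacterial gene expression interesting in practice (ribosome binding sites, polycistronic messages, regulation); it only shows how the nucleotide order maps onto an amino acid order.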

1.1 Nucleic Acid Building Blocks

DNA was isolated for the first time in 1869 by Friedrich Miescher (3). Three components make up the chemical structure of a nucleic acid: a nitrogenous base and a pentose sugar are connected by a glycosidic linkage between the 1’ carbon and a nitrogen atom of the base, forming a nucleoside; a phosphate group attached at the 5’ carbon of the pentose yields the nucleotide. In 1929 Phoebus Levene identified 2’ deoxyribose as a component of DNA (29). Two types of nitrogenous bases are found in DNA: the pyrimidines cytosine (C) and thymine (T) and the purines adenine (A) and guanine (G). As an example of a monomeric building block of DNA, the deoxynucleotide guanosine monophosphate (GMP) is shown in Figure 2A. DNA and RNA differ in the sugar moiety and in one of the bases. In DNA the pentose sugar is 2’ deoxy-D-ribose, while D-ribose is found in RNA. The pyrimidine uracil (U) is used as a base in RNA instead of T. DNA is a linear polymer assembled by linking the phosphate on the 5’ carbon to the 3’ position of the next deoxyribose, resulting in a phosphodiester bond. Such a polynucleotide exhibits an end-to-end directionality (5’ to 3’), and the order of the nucleotides encodes the genetic information. In 1953 Rosalind Franklin and Maurice Wilkins used X-ray analyses to study nucleic acids and proposed that DNA is assembled in a regularly repeating helix with the phosphate groups lying on the outside of the helical structure (30). Based on this analysis James D. Watson and Francis H. C. Crick proposed the right-handed double-helical structure, in which two polynucleotide strands associate in an antiparallel fashion.


Figure 2: Nucleic Acid Building Blocks and Base Pairing Conformations

A: Deoxynucleotide guanosine monophosphate (GMP) is shown as an example of a monomeric building block of DNA. Rotation of the base around the glycosidic linkage between the 1’ carbon of the deoxyribose and the nitrogen atom in the base switches the base between the anti and syn conformation. The 1’ carbon and C8 of the purine base are marked with red circles for better comparison. Pink lines mark hydrogen bond donors and acceptors of the Watson-Crick (left) and Hoogsteen (right) face of the nucleobase. B: Watson-Crick base pairs A:T and G:C. The locations of the major and minor grooves are indicated by gray lines. Hydrogen bond acceptors are marked in blue, hydrogen bond donors are marked in red. Both bases are in the anti conformation. Figure 2B is adapted from Chapter 27.1.2 in (1). C: Examples of additional interactions between nucleobases by Hoogsteen base pairing with one or more bases in the syn conformation. Top left: A:T Hoogsteen base pair (adapted from (31)). Top right: G and protonated C form a Hoogsteen base pair (adapted from (32)). Bottom left: Guanine tetrad formation between four guanines. The tetrad is stabilized by eight hydrogen bonds (adapted from (33)). Bottom right: Triplex formation by Watson-Crick base pairing between A and T and a reverse Hoogsteen base pair A:A (adapted from (34)).


The two macromolecules are held together by interactions between complementary hydrogen bond donors and acceptors of the bases located in the middle of the helix. In a regular B-DNA duplex two hydrogen bonds are formed between A and T and three hydrogen bonds between G and C (Figure 2B, see also Figure 3A for a 3D structure) (35). The ability of two bases to associate via hydrogen bonding is called base pairing. A base has two faces with which to engage in hydrogen bonding: the Watson-Crick face as described above and, additionally, the Hoogsteen face. Rotation of the base around the glycosidic linkage between the 1’ carbon of the deoxyribose and the nitrogen atom in the base switches the base between the anti and syn conformation, as shown for GMP in Figure 2A. In B-DNA all bases are found in the anti conformation (Figure 2B). Base flipping allows pairing via the Hoogsteen face, which creates a variety of other interaction possibilities. As an example, an A:T Hoogsteen base pair is shown in Figure 2C; here T engages in hydrogen bonding with its Watson-Crick face, while A presents its Hoogsteen face (31). Another possibility is Hoogsteen base pairing between G and the protonated form of C (32). Hoogsteen base pairs, for instance the two examples just mentioned, are also formed transiently in canonical duplex DNA. A special arrangement is the guanine tetrad, which is formed by four guanines interacting via Hoogsteen base pairing. The tetrad is stabilized by eight hydrogen bonds in total (33,36). This is the building block found in G-quadruplexes, which will be explained in more detail in Chapter 1.3. As a fourth example, triplex formation by Watson-Crick base pairing between A and T and a reverse Hoogsteen base pair between A and A is shown (34). Non-Watson-Crick base pairing is also important for the ready formation of complex structures by RNA molecules (37).
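The pairing rules described above (A pairs with T via two hydrogen bonds, G pairs with C via three, and the two strands run antiparallel) can be written down compactly. The following is a minimal Python sketch; the function names are hypothetical, and the hydrogen bond count considers Watson-Crick pairs in a fully paired duplex only.

```python
# Watson-Crick partners and hydrogen bonds per base pair, as stated in the text.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(seq):
    """Return the antiparallel partner strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def watson_crick_h_bonds(seq):
    """Count hydrogen bonds in a fully paired duplex formed by seq and its partner."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    decamer = "CGATTAATCG"  # B-DNA decamer shown in Figure 3A
    print(reverse_complement(decamer))    # identical to the input: self-complementary
    print(watson_crick_h_bonds(decamer))  # 4 G:C pairs * 3 + 6 A:T pairs * 2
```

Because the decamer d(CGATTAATCG) is its own reverse complement, two copies of the same strand can form the duplex.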

1.2 Non-Canonical Nucleic Acid Structures

Despite often being labeled as the inactive storage of genetic information, DNA is dynamic and highly polymorphic. A variety of alternative secondary structures can be adopted, e.g. A or H-DNA (38,39), left-handed Z-DNA (40,41), cruciforms (42) and the four-stranded G-quadruplex (25,43,44). Transcriptionally inactive DNA exists predominantly in the stable right-handed B-DNA conformation. Strand separation is a requirement for the initiation of transcription and can induce the formation of non-canonical nucleic acid structures in the single-stranded DNA (45). Such non-B DNA structures are stabilized by both Watson-Crick and non-canonical base pairs (21). Non-B DNA structures have been established as sites of genomic instability in eukaryotic as well as prokaryotic genomes (23,46). Genomic variability has especially been attributed to repetitive motifs and will be dealt with in more detail in Chapter 1.4.


Figure 3: Examples of Non-Canonical Nucleic Acid Structures

The secondary structure (rainbow), with hydrogen bonds shown as pink dashed lines, and the solvent accessible surface are shown for examples of non-canonical nucleic acid structures. Structures were accessed at the RCSB Protein Data Bank (www.rcsb.org) (47). Pictures were created with the on-site Jmol 3D viewer or the Protein Workshop freeware. A: crystal structure of a B-DNA decamer d(CGATTAATCG) duplex with a bound Mg2+ ion (green) (PDB ID 1D49 (48)), B: crystal structure of a Z-DNA hexamer duplex with spermine (PDB ID 3P4J (49)), C: NMR solution structure of a DNA hairpin formed by self-complementary DNA (PDB ID 2M8Y (50)), D: crystal structure of a DNA Holliday junction (PDB ID 467D (51)), E: solution NMR structure of an intramolecular triplex (PDB ID 1B4Y (52)), F: NMR solution structure of the d(AACCCC) Tetrahymena telomeric repeats forming an intermolecular i-motif (PDB ID 1YBL (53)), G: NMR structure of the parallel, intermolecular G-quadruplex formed by d(TTAGGGT) in solution (PDB ID 1NP9 (54)), H: NMR solution structure of an intermolecular (3+1) hybrid G-quadruplex with three strands in parallel orientation and one strand in antiparallel orientation (PDB ID 2AQY (55)), I: crystal structure of the propeller structure of an intramolecular, parallel G-quadruplex from human telomeric DNA with coordinated K+ ions (violet) (PDB ID 1KF1 (56)), J: NMR structure of the antiparallel (2+2) intramolecular G-quadruplex from human telomeric DNA in Na+ solution (PDB ID 2MBJ (57)).


This section gives a brief overview of the examples of non-canonical nucleic acid structures shown in Figure 3, with the intent of illustrating the variety of possible arrangements and three-dimensional structures; the list is by no means complete. For comparison, Figure 3A shows the structure of a regular B-DNA duplex.

While A or H-DNA is formed by oligo-purine or oligo-pyrimidine runs (38,39), left-handed Z-DNA is formed by alternating purine/pyrimidine patterns (58). d(CGCGCG) was crystallized in 1979 by Alexander Rich and co-workers and revealed a left-handed duplex with altered helical parameters in comparison to B-DNA (59). A refined structure co-crystallized with the polyamine spermine by Brzezinski et al. is shown in Figure 3B. Hairpins and cruciforms can be formed by palindromes and closely spaced inverted repeats, which are by definition self-complementary. Hairpins are formed within one strand. Watson-Crick base pairing holds the strand together while the nucleotides at the turning point bulge out. Figure 3C shows the 3D structure of d(CGCGAAGCATTCGCG) determined by NMR, the palindromic regions are underscored (50). A Holliday junction is shown as an example of a cruciform DNA or four-way junction. Holliday junctions are the key intermediate in recombination processes (60). The complex shown in Figure 3D, formed by four d(CCGGGACCGG) molecules interacting with each other, was determined by X-ray crystallography (51). DNA triplex structures can be formed intra- and intermolecularly. H-DNA is a type of triplex DNA formed intramolecularly by homopurine-homopyrimidine stretches that are mirror repeats (61). The structural characterization of d(TCTTCCTTTTCCTTCTCCCGAGAAGGTTTT), a triplex forming oligonucleotide, was carried out by NMR (Figure 3E), nucleotides participating in triplex helix formation are underscored (52).
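Two of the sequence features named above can be checked directly on the sequence level: an alternating purine/pyrimidine pattern (associated with Z-DNA) and self-complementary ends that could pair into a hairpin stem. The following Python sketch is purely sequence-based and makes no statement about which structure actually forms; the function names and the minimum loop length of three nucleotides are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_alternating_purine_pyrimidine(seq):
    """True if purines and pyrimidines strictly alternate along the strand."""
    is_purine = [base in PURINES for base in seq.upper()]
    return all(a != b for a, b in zip(is_purine, is_purine[1:]))


def terminal_stem(seq, min_loop=3):
    """Find the longest 5' end that can pair with the 3' end of the same strand.

    Returns (stem_length, loop_sequence). min_loop is an assumed minimum number
    of unpaired nucleotides at the turning point.
    """
    seq = seq.upper()
    for k in range((len(seq) - min_loop) // 2, 0, -1):
        if seq[:k] == reverse_complement(seq[-k:]):
            return k, seq[k:len(seq) - k]
    return 0, seq


if __name__ == "__main__":
    print(is_alternating_purine_pyrimidine("CGCGCG"))  # Z-DNA forming hexamer
    print(terminal_stem("CGCGAAGCATTCGCG"))            # hairpin oligonucleotide of Figure 3C
```

For d(CGCGAAGCATTCGCG) the sketch reports the maximal complementarity between the two ends; the experimentally determined structure may pair fewer of these nucleotides.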

The so-called i-motif is formed from C-rich DNA under mildly acidic conditions, which enable the formation of hemiprotonated cytosine-cytosine+ base pairs (a scheme is shown in Figure 9B, Chapter 1.4.1) (62). The structure may form intermolecularly by association of several strands or intramolecularly from consecutive runs of C’s. Figure 3F shows a tetramer formed by d(AACCCC), which is complementary to the Tetrahymena telomeric repeats (53).

Finally, G-quadruplexes are four-stranded helical complexes that are assembled from multiple stacked G-tetrads of the type shown in Figure 2C. A wide range of topologies can be adopted; the examples shown are a tetrameric parallel G-quadruplex (Figure 3G), an asymmetric dimer forming a hybrid structure (Figure 3H), an intramolecular parallel G-quadruplex (Figure 3I) and an intramolecular antiparallel G-quadruplex (Figure 3J). Evidence for G-quadruplexes as important components in cellular processes has been increasing in recent years (25,63-65). Among other roles, they have been implicated in recombination (66) and replication (67). In addition, G-quadruplexes are very stable DNA structures that have been exploited for nanotechnological purposes. For instance, they have been used as building blocks in DNA nanoarchitectures (68) and nanodevices (69). G-quadruplexes are the central component of this thesis and will be explained in detail in the following chapter.


1.3 G-Quadruplexes

Oligonucleotides containing guanine-rich regions can spontaneously assemble into tetrameric arrangements, so-called G-quadruplexes. Self-association of guanines at millimolar concentrations has been known for more than a century: in 1910 Bang reported the formation of polycrystalline gels by concentrated solutions of guanosines (70). Guanines can interact via Hoogsteen base pairing to form a tetrameric square arrangement, a so-called tetrad (Figure 4A). In G-rich oligonucleotides several tetrads can stack upon each other to assemble a G-quadruplex structure. Each tetrad is rotated with respect to the adjacent one, forming a helical structure. A quadruplex is stabilized by metal cations bound in the central cavity, primarily by the monovalent cations Na+ and K+ that coordinate to the O6 carbonyl oxygens of the guanines, yielding compact and stable structures (Figure 4B) (44,71). In those structures four Hoogsteen hydrogen-bonded guanine bases form a square co-planar array, as determined by crystallographic methods by Gellert et al. in 1962 (72). The overall fold is further stabilized by π-π stacking interactions between these tetrads, which assemble the tetrameric units into large helical structures (44). In 1988 Sen and Gilbert studied the G-rich regions of the immunoglobulin switch region and found them to form a complex of lower electrophoretic mobility than expected in native polyacrylamide gel electrophoresis (PAGE) (33). A year later Williamson et al. reported the unexpectedly high electrophoretic mobility of oligonucleotides corresponding to Oxytricha telomeric DNA d(T4G4) in native PAGE in the presence of monovalent cations (36). Both labs independently suggested G-quadruplex formation.

The general formula for a potential unimolecular G-quadruplex is $G_{n}N_{1\text{–}7}G_{n}N_{1\text{–}7}G_{n}N_{1\text{–}7}G_{n}$. $G_{n}$ refers to the G-tract (the number of consecutive G’s in a strand), which usually consists of two to four G’s, and $N$ refers to the loop nucleotides that form the connection between G-tracts. G-quadruplexes can be formed from DNA and RNA (44).
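The general formula above translates almost literally into a regular expression: four G-tracts of length n separated by three loops of one to seven arbitrary nucleotides. The following Python sketch shows one way to scan a DNA sequence for such putative quadruplex motifs. The function names are hypothetical, the search reports non-overlapping matches on the given strand only, and a match indicates sequence potential, not proof of folding.

```python
import re


def quadruplex_pattern(n=3, min_loop=1, max_loop=7):
    """Compile G{n} N{min..max} G{n} N{min..max} G{n} N{min..max} G{n} as a regex."""
    tract = "G{%d}" % n
    loop = "[ACGT]{%d,%d}" % (min_loop, max_loop)
    return re.compile(tract + (loop + tract) * 3)


def find_putative_quadruplexes(seq, n=3, min_loop=1, max_loop=7):
    """Return (start, end, matched_sequence) for non-overlapping matches in seq."""
    pattern = quadruplex_pattern(n, min_loop, max_loop)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # (G4CT)3G4 embedded in a short, made-up flanking sequence
    example = "AT" + "GGGGCT" * 3 + "GGGG" + "TA"
    print(find_putative_quadruplexes(example, n=4))
```

To cover the complementary strand as well, the same search can be run on the reverse complement of the sequence, or with C-tracts in place of G-tracts.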

Figure 4: G-Quadruplex Structure

A: G-tetrad formation by four guanines interacting via Hoogsteen hydrogen bonding (blue dashes). Monovalent metal cations (green) in the central cavity stabilize the structure by coordinating to the negatively charged O6’s. B: Several G-tetrads stack upon each other to form a compact, helical structure. Two examples of intramolecular G-quadruplexes are shown with different types of connecting loops (dashed lines) between the individual G-tracts (solid lines). Metal cations in the central cavity stabilize the overall fold. Depending on its size, the cation can be located within the G-tetrad plane (e.g. Na+) or between two G-tetrads (e.g. K+).


As described in the following, G-quadruplexes can show very different topologies depending on stoichiometry and strand orientation. As a result, different types of loop geometries are found connecting the individual G-tracts: lateral loops joining adjacent G-strands, diagonal loops crossing a tetrad and joining opposite G-strands, and external loops connecting a tetrad on one end of the stack with a tetrad on the other end (Figure 4B). In addition, the G-tract length, and thereby the number of tetrads, may vary and influence the overall topology (44,73-75).

1.3.1 Structural Features of G-Quadruplexes

Many different structures and topologies have been characterized by X-ray crystallography and by nuclear magnetic resonance (NMR) spectroscopy in solution in recent years. To date (December 2014) 136 entries of DNA G-quadruplex structures have been deposited in the Nucleic Acid Database (http://ndbserver.rutgers.edu/). G-quadruplexes can be formed by one G-rich oligonucleotide strand folding back on itself, yielding an intramolecular G-quadruplex (Figure 5A).

In addition, intermolecular G-quadruplexes can be formed by two or more separate DNA strands that interact with each other. The number of stacked G-tetrads as well as the orientation and length of the loops lead to a high heterogeneity of G-quadruplex topologies, as reviewed in (44,73,75).

In the case of intramolecular quadruplexes, one generally distinguishes between four different types of loop and strand arrangements, as illustrated by the example of a G-quadruplex with three tetrads in Figure 5A. In an antiparallel G-quadruplex the strands are arranged in pairwise antiparallel orientation (2+2). Two prominent conformations are the chair structure with only lateral loops (Figure 5A left) and the basket structure with two lateral loops and one diagonal loop (Figure 5A, 2nd from left). In a parallel G-quadruplex all G-tracts are arranged with the same strand polarity; as a result, only external loops are found (Figure 5A right). An example of the 3D structure of an all-parallel propeller structure determined by NMR is also shown in Figure 3I. In the so-called (3+1) hybrid structure three strands are found in parallel arrangement, with the fourth aligned in the antiparallel orientation. Lateral and external loops connect the different G-tracts in this case (Figure 5A, 3rd from left).

Examples of various intermolecular G-quadruplexes are shown in Figure 5B. Depending on loop arrangement, three different antiparallel G-quadruplexes with two lateral or two diagonal loops can be distinguished. Furthermore, the topology of a (3+1) hybrid formed from two nucleotide strands of unequal length is shown. The NMR solution structure of such an asymmetric complex in Na+ solution was reported by Zhang et al., and the 3D structure is shown in Figure 3H (55). Most of the first reported G-quadruplex structures were tetramers formed from short oligonucleotide strands (Figure 5B bottom) (33,36). Again, antiparallel as well as parallel topologies are possible.


Figure 5: G-Quadruplex Topologies

A: Examples of different intramolecular G-quadruplexes with three G-tetrads; guanines are depicted by gray squares. The following types of G-quadruplexes are distinguished according to strand orientation, from left to right: antiparallel chair structure, antiparallel basket structure, (3+1) hybrid with three strands in parallel orientation and one strand in antiparallel orientation, and the all-parallel propeller structure with only external loops. The general oligonucleotide sequence GGGLnGGGLnGGGLnGGG is shown underneath; G-tracts with bottom-to-top orientation are marked green, the reverse orientation in blue, and connecting loops are gray. B: Examples of different intermolecular G-quadruplexes with three G-tetrads. Dimeric G-quadruplexes are shown in the top row, left: three antiparallel G-quadruplexes formed by GGGLnGGG dimers, right: a dimeric (3+1) hybrid structure. Bottom row: tetrameric G-quadruplexes with antiparallel and all-parallel topologies.

To gain insight into the relationship between loop length and topology, Neidle and co-workers studied the loop length-dependent folding of G-quadruplexes and noted a preference for quadruplexes with short loops (1-2 nt) to fold in the parallel conformation with external loops.

Molecular dynamics simulations showed that a linker length of T1-2 was too short for diagonal crossing of a tetrad; this increased instability and hence favoured the formation of a parallel topology (74,76). Using a combination of circular dichroism (CD), UV melting, molecular modelling and simulation techniques, they determined an all-parallel intramolecular G-quadruplex as the only possible structure for nucleic acids with three loops consisting of single nucleotides, e.g. d(TGGGTGGGTGGGTGGGT) (74).

While all of the above examples depict G-quadruplexes with perfect G-tracts, broken-strand structures and G-tracts with looped-out nucleotides have also been reported. For instance, Chen et al. recently discovered a novel broken-strand structure for a G-quadruplex found in the G-rich


Figure 6: Syn- and Anti-Conformations of the Guanine Bases in G-Tetrads

A: Rotation of the nucleobase around the glycosidic bond results in anti or syn glycosidic arrangements. The gray highlighted region represents the sugar phosphate backbone. In a G-tetrad the base in the anti arrangement would be standing out from the paper plane, while in the syn conformation the base would be located behind the paper plane. Figure adapted from (75). B: Examples of syn/anti glycosidic arrangements in the different G-quadruplex topologies. In quadruplexes with antiparallel strands, nucleobases are rotated around the glycosidic bond in order to retain tetrad formation. Guanines in syn conformation are shown in blue, guanines in anti conformation in green. Figures in B are adapted from Chapter 1 in (78).

In G-quadruplexes containing antiparallel strands, nucleobases need to be rotated around the glycosidic bond in order to retain the Hoogsteen base pairing within the tetrad; such quadruplexes show syn as well as anti glycosidic conformations (Figure 6A). This results in four possible types of glycosidic steps between two guanines from the same strand: syn-syn, anti-anti, anti-syn and syn-anti (79). Structures with any combination of arrangements have been observed; selected examples are shown in Figure 6B (80). Alternating syn/anti or anti/syn glycosidic arrangements along the G-strand are found in antiparallel G-quadruplexes with lateral and diagonal loops. In the hybrid structure one base per tetrad is flipped in relation to the others, while in an all-parallel G-quadruplex with only external loops the same glycosidic arrangement can be adopted by all bases.


1.3.2 G-Quadruplex Polymorphism

The formation of G-quadruplexes is cation-dependent. The guanine O6 oxygens all point towards the G-quadruplex core, thereby creating a strong negative electrostatic potential in the central cavity that is counterbalanced by the coordination of cations within the central pore. Metal ions can be located within the G-tetrad plane or between two successive G-tetrads. K+ ions are always found at equal distance between two tetrads, coordinating to eight oxygen atoms in a symmetric tetragonal bipyramidal configuration, while Na+ can occupy both positions (44,80). Strand stoichiometry and the nature of the coordinated cation are the major contributors to G-quadruplex polymorphism. It cannot be said a priori which structure a certain sequence will adopt.

A well-studied example of the polymorphic nature of G-quadruplexes is the human telomeric sequence, hTel d[(TTAGGG)n]. A plethora of different structures has been described for this sequence under different experimental conditions. Crystal structures of d[AG3(T2AG3)3] reveal a parallel topology when folded in the presence of K+ (56), whereas an antiparallel conformation in solution containing Na+ was detected for the same sequence by NMR (81). Furthermore, several co-existing structures have been postulated for this sequence in solution when stabilized by K+ (82-86). Using electron paramagnetic resonance (EPR) spectroscopy, Singh et al. were able to elucidate the polymorphic nature of d[(G3T2A)3G3] in K+ solution, in which a 1:1 mixture of the parallel propeller and the antiparallel basket structure was detected (87). The same distribution was found when the quadruplex sequence was injected into Xenopus oocytes for in cellulo measurements (88). However, when individual G-quadruplex units were studied within the context of extended sequences composed of the human telomeric DNA repeat, formation of a (3+1) hybrid structure was detected (89).

The Oxytricha telomeric sequence d[(G4T4)3G4] as well as the Tetrahymena-related telomeric sequence d[(G4T2)3G4] were also shown to adopt multiple conformations in solution depending on the stabilizing ion in electrophoretic assays and CD measurements. In the case of d[(G4T2)3G4], Na+ promoted the antiparallel conformation, whereas the parallel and tetrameric conformation was formed in the presence of Sr2+ or K+ (90). Thomas and co-workers have demonstrated the structural polymorphism of Oxytricha telomeric DNA by Raman spectroscopy. At low concentrations of Na+ or K+ the sequence adopted an antiparallel foldback quadruplex; with increasing alkali ion concentrations interquadruplex conversion took place, yielding a parallel quadruplex (91,92). Part of the Oxytricha telomeric sequence, d[G4T4G4], has been shown by Sugimoto and co-workers to fold into a dimeric antiparallel quadruplex in solution with Na+ (93,94). Addition of divalent cations, particularly Ca2+, led to oligomerization of the sequence and switched the conformation to the parallel topology (94).


1.3.3 Frequently Used Methods for the Characterization of G-Quadruplexes

A variety of methods have been employed to monitor and determine thermodynamic and kinetic aspects of G-quadruplex formation and to elucidate G-quadruplex structures and stoichiometry.

Techniques that have been used to study quadruplexes include spectroscopic methods such as nuclear magnetic resonance (NMR) (95), circular dichroism (CD) (96), ultraviolet absorption (97,98), Raman spectroscopy (91), and electron paramagnetic resonance (EPR) (87,88,99,100). In addition, crystallography (101), DMS footprinting and electrophoretic mobility shift assays (EMSA) are commonly used. Less frequently employed are DNA polymerase stop assays (33,102), fluorescence resonance energy transfer (FRET) (85,103), nuclease sensitivity (104), photo-crosslinking (105) and analytical ultracentrifugation (106,107).

X-ray crystallography and 2D NMR give information at atomic resolution. However, in contrast to crystal structures, which only give a static picture, NMR also gives insight into G-quadruplex dynamics. One-dimensional 1H NMR has been used to study G-quadruplex structures early on (107,108). In different types of nucleic acids the imino protons of the nitrogen bases are involved in different types of hydrogen bonding. The characteristic proton chemical shifts can be used to distinguish between different arrangements of nucleic acids: 10–12 ppm for G-tetrads, 15–16 ppm for C:C+ base pairs found in i-motifs, and 12–14 ppm for Watson–Crick A:T and G:C base pairs (95,109). One disadvantage of NMR is the high sample concentration of 1 to 3 mM that is required for measurements. EMSAs are frequently used to distinguish monomeric from multimeric complexes. Mobility in the electric field is determined by the size, shape and charge of the molecule during its passage through the gel. Under denaturing conditions the strands are separated, with mobility determined by their molecular weight. Strictly native conditions need to be applied when studying multimeric complexes or intrastrand structures that fold back onto themselves.

Structure formation, strand stoichiometry, ion dependence and temperature dependence can be deduced from EMSA; however, one needs to be cautious with the assignment of possible structures, as buffer conditions, temperature and voltage may greatly impact mobility (73,80,102).
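The imino proton chemical shift windows quoted above for 1H NMR can be written as a small lookup. This is a minimal sketch for illustration; the function name `assign_imino_shift` and the handling of boundary values (a value on a shared boundary is assigned to the first matching window) are assumptions, and real assignments require full spectral analysis.

```python
# Approximate 1H imino chemical shift windows (ppm) as quoted in the text.
IMINO_SHIFT_RANGES = [
    (10.0, 12.0, "G-tetrad"),
    (12.0, 14.0, "Watson-Crick A:T or G:C base pair"),
    (15.0, 16.0, "C:C+ base pair (i-motif)"),
]


def assign_imino_shift(ppm):
    """Return the structural class suggested by an imino proton shift."""
    for low, high, label in IMINO_SHIFT_RANGES:
        # Assumption: boundary values go to the first window that contains them.
        if low <= ppm <= high:
            return label
    return "unassigned"


for shift in (11.2, 13.1, 15.5, 9.0):
    print(shift, "->", assign_imino_shift(shift))
```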

Circular dichroism (CD) spectroscopy is the method that was applied for the characterization of G-quadruplex folding throughout this thesis and will be explained in more detail in the following.

When circularly polarized light passes through an absorbing, optically active medium, left-handed and right-handed circularly polarized light will be differentially absorbed by a chiral molecule/entity in this medium. The difference in absorption of the left-handed and right-handed circularly polarized light is called circular dichroism (CD) and can be measured by CD spectroscopy. The output is an absorption difference spectrum (110,111). The quantity used to describe CD is the ellipticity θ, which is recorded versus wavelength and reported in millidegrees (mdeg) or, as molar ellipticity, in deg x cm² x dmol⁻¹. CD spectroscopy is a straightforward and sensitive method to quickly


comparison to known structures. Although it gives less specific and less resolved structural information than X-ray crystallography or NMR, it can nevertheless be used to detect the global features of a molecule and to distinguish between topologies of different G-quadruplex species (96,112,113). DNA as well as RNA G-quadruplexes can be investigated (114). CD requires only small amounts of sample; usually µM concentrations of nucleic acids are sufficient. As the measurement is carried out in solution, a wide range of solvents, temperatures, pH values and ionic strengths can be explored (44,96). CD also allows the measurement of kinetics and the tracking of conformational transitions (112). The different arrangement of anti and syn glycosidic angles, and thereby the overall geometry of G-quadruplexes with alternating and non-alternating G-tetrad polarity, allows discrimination between parallel and antiparallel topologies (115). Commonly, the typical CD spectrum of an antiparallel quadruplex shows a maximum at around 290 nm and a minimum at 265 nm, whereas a characteristic spectrum of a parallel quadruplex displays a maximum at about 260 nm and a minimum at 240 nm (112,113,116,117). Hybrid structures with attributes of both aforementioned species can be detected by a minimum at 240 nm, a shoulder at about 270 nm and a maximum at 290 nm (83) (Figure 7). In addition, mixtures of different topologies can be detected. CD spectra alone are not final proof of the exact molecular structure of a G-quadruplex and should be used in combination with other techniques; however, they do give general evidence of structure formation.

Figure 7: Examples of CD Spectra of G-Quadruplexes with Different Topologies and an i-Motif

Antiparallel G-quadruplexes typically show a maximum at around 290 nm and a minimum at 260 nm, whereas parallel G-quadruplexes display a maximum at 260 nm and a minimum at 240 nm. Quadruplexes with hybrid topology have attributes of both aforementioned species showing a minimum at 240 nm, shoulder at about 270 nm and maximum at 290 nm. i-motifs can also be detected in CD with a minimum at 260 nm and maximum around 280 nm.
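The rules of thumb for CD signatures given above can be sketched as a simple classifier. This is an illustrative Python sketch, not an analysis procedure used in this work: the function name `classify_cd_spectrum`, the 5 nm tolerance and the toy spectra are assumptions, and such a classification does not replace comparison with reference spectra or complementary methods.

```python
def classify_cd_spectrum(spectrum, tolerance=5.0):
    """Suggest a topology from the CD extrema.

    spectrum: dict mapping wavelength (nm) to ellipticity (mdeg).
    tolerance: allowed deviation (nm) from the nominal positions (assumption).
    """
    peak = max(spectrum, key=spectrum.get)    # wavelength of the maximum
    trough = min(spectrum, key=spectrum.get)  # wavelength of the minimum

    def near(value, target):
        return abs(value - target) <= tolerance

    if near(peak, 260) and near(trough, 240):
        return "parallel"
    if near(peak, 290) and (near(trough, 260) or near(trough, 265)):
        return "antiparallel"
    if near(peak, 290) and near(trough, 240):
        return "hybrid (3+1)"
    return "unclassified"


# Made-up toy spectra for demonstration only (wavelength in nm: mdeg).
parallel_like = {240: -3.0, 250: 1.0, 260: 8.0, 270: 5.0, 280: 2.0, 290: 0.5}
antiparallel_like = {240: 1.0, 250: 0.5, 260: -4.0, 270: -1.0, 280: 3.0, 290: 7.0}
hybrid_like = {240: -2.0, 250: 0.5, 260: 2.0, 270: 3.5, 280: 4.0, 290: 6.0}

for name, data in [("parallel_like", parallel_like),
                   ("antiparallel_like", antiparallel_like),
                   ("hybrid_like", hybrid_like)]:
    print(name, "->", classify_cd_spectrum(data))
```

Only the positions of the global maximum and minimum are used here; the 270 nm shoulder of hybrid structures and mixtures of topologies are not resolved by this simplification.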

Thermodynamic parameters of G-quadruplex formation can be obtained from CD and UV melting experiments. Upon raising the temperature, the hydrogen bonds stabilizing the G-quadruplex assembly will break and the associated nucleic acid strands will separate. This process of nucleic acid denaturation can be studied by observing the change in ellipticity at a given maximum in the spectrum or by monitoring the hypochromicity of UV absorbance at 295 nm over a temperature gradient.


From these data the melting temperature T1/2 can be determined, which in turn can be used to calculate thermodynamic (ΔH, ΔG, and ΔS) and kinetic parameters. Renaturation may also be studied. In addition, melting temperatures can be used to deduce the molecularity of a formed structure. For an intramolecular G-quadruplex, T1/2 should be concentration independent, whereas a hysteresis phenomenon is observed for intermolecular G-quadruplexes due to slow folding and unfolding kinetics (97,98,118).
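How T1/2 and a hysteresis can be read from melting data may be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation used in this work: a single two-state transition with flat baselines is assumed, the first and last data points are taken as the fully folded and fully unfolded signals, and the function name and the data values are made up.

```python
def half_transition_temperature(temps, signal):
    """Temperature at which the normalized folded fraction crosses 0.5.

    temps must be sorted in ascending order; signal is the CD or UV readout.
    Assumes a two-state transition with flat baselines (simplification).
    """
    folded, unfolded = signal[0], signal[-1]
    fraction = [(s - unfolded) / (folded - unfolded) for s in signal]
    for i in range(len(temps) - 1):
        f0, f1 = fraction[i], fraction[i + 1]
        if f0 >= 0.5 >= f1 and f0 != f1:
            # Linear interpolation between the two bracketing data points.
            return temps[i] + (f0 - 0.5) * (temps[i + 1] - temps[i]) / (f0 - f1)
    raise ValueError("folded fraction does not cross 0.5")


# Made-up example data (temperature in degrees Celsius, arbitrary signal units).
temps = [20, 30, 40, 50, 60, 70, 80]
heating = [10, 10, 9, 6, 2, 0, 0]
cooling = [10, 9, 6, 2, 0, 0, 0]  # cooling curve, listed by ascending temperature

t_heat = half_transition_temperature(temps, heating)
t_cool = half_transition_temperature(temps, cooling)
print("T1/2 heating:", t_heat)
print("T1/2 cooling:", t_cool)
print("hysteresis:", t_heat - t_cool)
```

A clear difference between the heating and cooling values, as in this toy example, corresponds to the hysteresis described above for slowly folding intermolecular structures.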

1.3.4 Putative Cellular Roles of G-Quadruplexes

Genome-wide analyses have detected potential G-quadruplex forming sequences in eukaryotic and prokaryotic genomes of highly divergent organisms (24,45,119,120). The fact that their formation is cation-dependent and that stabilization occurs especially by K+ and Na+ favors their formation under physiological conditions. Formation of intramolecular G-quadruplexes requires the nucleic acid to be single-stranded; thus, sites that are prone to G-quadruplex formation are promoter regions, the replication fork and recombination sites, where duplex DNA becomes unzipped, mRNA, which is naturally single-stranded, and in addition the single-stranded overhangs of telomeres in eukaryotes (121). Indeed, in eukaryotes, especially in the human genome, quadruplexes have been found in functional genomic domains, such as oncogene promoter regions or the telomeres, which has turned the spotlight on using them as druggable sites, e.g. for cancer therapeutics (43,79,122,123). Evidence for abundant quadruplex formation in vivo is increasing. For example, in 2011 Rodriguez et al. found that cells treated with the small-molecule G-quadruplex binder pyridostatin showed transcription- and replication-dependent DNA damage.

Chromatin immunoprecipitation of the DNA damage marker γH2AX followed by high-throughput sequencing analyses showed that the DNA associated with γH2AX was enriched for putative G-quadruplex binding sequences. Furthermore, cell imaging with labelled pyridostatin showed co-localization with the helicase hPif1 (124). Pif1 from yeast is known to resolve G-quadruplexes during replication (67). In 2013 Balasubramanian and co-workers were the first to show the distribution of G-quadruplexes on eukaryotic chromosomes and their regulation during cell cycle progression in different types of human cell lines using a G-quadruplex-specific antibody (64). In a follow-up study the group also demonstrated the visualization of RNA G-quadruplex structures within the cytoplasm of human cells (65). Putative cellular roles that have been assigned to quadruplex structures are the involvement in organization and protection of the telomeres, stalling of the replication fork machinery, promotion of homologous recombination and regulation of transcription (25). Figure 8 gives an overview of the proposed cellular functions of G-quadruplexes. Possible scenarios will be explained in the following; then examples of reported G-quadruplex functions from the literature will be presented. As the circular genomes of bacteria


Figure 8: Putative Functions of G-Quadruplexes

G-rich sequences in non-telomeric regions of the genome might transiently form G-quadruplex structures (blue) after DNA duplex (gray and black) separation during replication, recombination or transcription. In addition mRNA (violet) is prone to G-quadruplex formation during translation. A: During replication G-quadruplexes may form in the


(green) progression or lead to gapped replication, if the template strand is looped out and replication is reinitiated at a downstream position. B: Recombination can be initiated by G-quadruplexes either by the secondary structure itself serving as homologous region, which will induce strand exchanges between different genomic regions (gray and black), or by double strand breaks (red lightning) that occur in proximity to the G-quadruplex, which will recruit the DNA repair machinery and lead to illegitimate recombination. Recombination may also be facilitated by keeping the DNA duplex in the single-stranded conformation, allowing for strand exchange. Adapted from (25). C: In promoter regions G-quadruplexes may repress transcription directly by forming an obstacle for the RNA polymerase or indirectly by recruiting protein factors that will inhibit the polymerase. Stimulation of transcription can be achieved by keeping the DNA duplex in the single-stranded conformation or again indirectly by recruiting activating factors. Adapted from (25).

D: On the mRNA G-quadruplexes can inhibit translation either by blocking access of the ribosome (orange) to the RBS (orange line) or forming an obstacle for ribosome progression downstream of the RBS, which will lead to a truncated protein (gray circles) and possibly RNA degradation by RNases (gray pacman). The nascent polypeptide chain is represented by colored circles. Translation initiation could be achieved by freeing an RBS that is trapped in another secondary structure. Stalling of the ribosome during translation can induce frameshifts and lead to an altered gene product.

On the DNA level a G-quadruplex may interfere with replication, recombination or transcription.

Inhibition of replication can be achieved by a G-quadruplex structure formed in the strand that serves as template for the leading strand, by stalling of the DNA polymerase (Figure 8A left). In the lagging strand replication takes place discontinuously. Here, G-quadruplex formation in the template strand may lead to gapped replication. Part of the template strand may loop out into a non-canonical structure and be bypassed by the polymerase; replication would then be reinitiated at the next Okazaki fragment (Figure 8A right).

G-quadruplex formation has also been implicated in recombination. Due to their repetitive nature, G-quadruplexes may represent regions of sequence overlap with other genomic regions. Recombination could be initiated at the homologous secondary structures and lead to sequence exchange between different genomic regions associated with G-quadruplexes. A G-quadruplex may also facilitate recombination by keeping the duplex in the single-stranded conformation and thereby promoting strand exchange reactions. Furthermore, strand breaks or deletions have been associated with alternative secondary structures. The DNA repair machinery may be recruited to the non-canonical structure and illegitimate recombination can occur (Figure 8B).

When located within promoter regions G-quadruplexes can function as transcriptional regulators.

Similarly to the situation during replication, G-quadruplex formation in the template strand may directly inhibit RNA polymerase progression. However, if the G-quadruplex were located on the non-template strand, transcription would be facilitated, as the non-canonical structure would keep the DNA strands from reannealing. Transcription could also be stimulated or repressed indirectly through binding of the G-quadruplex by protein factors, which in turn either recruit or inhibit the RNA polymerase (Figure 8C).

Finally, G-quadruplexes may also be located on the mRNA and then influence translation. G-quadruplex forming sequences positioned adjacent to or within the ribosomal binding site (RBS) may block the ribosome from binding to the mRNA. In this case translation initiation would be blocked. Conversely, the RBS may also be trapped in a secondary structure, such as a stem-loop, and not be accessible for the ribosome. In this case G-quadruplex formation in proximity to the


formation of a non-canonical structure within the mRNA located downstream of the ribosome may cause stalling of the ribosome. Translation may be stopped completely, leading to production of a truncated protein and potential degradation of the blocked mRNA. Ribosomal stalling may also induce frameshifting resulting in an altered gene product upon continuation of translation (Figure 8D bottom).

1.3.5 Evidence of G-Quadruplex Formation in vivo

To date the majority of studies concerning the function of G-quadruplexes have been carried out in eukaryotes. In particular, quadruplex-forming sequences have been characterized within the promoter regions of proto-oncogenes, e.g. c-MYC, KRAS and c-KIT, where they serve as transcriptional regulators (125-127). When comparing quadruplex folding abilities of a variety of DNA sequences and their respective RNA counterparts in vitro Joachimi et al. found that all RNA sequences exclusively formed G-quadruplex structures with parallel strand orientation that were often more stable than the structures adopted by the homologous DNA sequences (114). In mammalian cells Hartig and co-workers were able to show that synthetic RNA G-quadruplexes inserted into the 5’ untranslated region (5’ UTR) in front of a luciferase reporter gene provided predictable repression of gene expression by acting as translational suppressors (128,129).

Furthermore, whole-transcriptome analyses conducted in HeLa S3 cells detected specific changes for quadruplex-containing genes upon treatment of the cells with the G-quadruplex-specific small molecules TMPyP4, 360A and PhenDC3 (130,131). All of the studies mentioned above hint towards a regulative role of G-quadruplexes in transcription. Fewer examples have been reported of G-quadruplexes acting as regulators at the translational level. Kumari et al. identified an RNA G-quadruplex forming sequence in the 5’ UTR of the human NRAS proto-oncogene transcript. Translational repression by this structure was shown in a cell-free translation system coupled to a reporter gene assay (59). Endoh et al. describe a putative G-quadruplex forming sequence in the ORF of the human estrogen receptor α mRNA. They observed a pause in translation at the G-quadruplex site in vitro and differences in the proteolysis of the protein when the G-quadruplex construct and related mutants were expressed in cells (132). Finally, Beaudoin and Perreault reported G-quadruplexes in the 3’ UTR of the LRP5 and FXR1 genes that likely regulate alternative polyadenylation and mRNA shortening (133).

Regulative functions of G-quadruplexes have also been implicated for recombination. In an in vitro assay Boan et al. found recombinant PCR products from templates containing four tandem repeats of TGGGGC as it is found in the G-rich human minisatellite MsH43 (60). Paeschke et al. carried out genome-wide chromatin immunoprecipitation to determine the in vivo binding sites of the multifunctional Saccharomyces cerevisiae Pif1 DNA helicase capable of unwinding G-quadruplex


addition Pif1-deficient cells showed slowed replication in the vicinity of these motifs and strand breaks increased. Replication was further slowed down by introduction of additional artificial G-quadruplex sequences (67).

Although these and many other studies hint at diverse roles for quadruplexes in eukaryotic cells, relatively little is known about potential functions of G-quadruplexes in bacteria, despite quadruplex-forming sequences being a widespread sequence motif in bacterial genomes. In an artificial setup Wieland et al. were able to show that quadruplexes masking the ribosomal binding site within an mRNA led to repression of gene expression in Escherichia coli (E. coli) and that the level of repression correlated with the thermodynamic stability of the quadruplex (134). Recently, Holder and Hartig studied the multifaceted effects of G-quadruplexes as potent transcriptional and translational regulators in E. coli. Regulatory effects depended strongly on strand orientation and the exact location within the promoter region, 5’ UTR or 3’ UTR. While inhibitory effects were observed upon insertion of G-quadruplex forming sequences anti-sense to the core promoter region, the same sequences inserted after the -10 region had activating effects. Up- and downregulation of gene expression was observed upon insertion of G-quadruplexes on the sense strand in or near the ribosomal binding site (78). Suppression of translation elongation by quadruplex-forming sequences found in protein-coding sequences (ORFs) in E. coli has recently been demonstrated by Endoh et al. (135). In a follow-up study it was shown that ribosomal stalling by such an RNA quadruplex can cause a -1 ribosomal frameshift in cellulo (136). In a computational search Chowdhury and co-workers identified potential quadruplex forming sequences in promoter regions of bacteria and found them to be enriched in certain gene classes.

Specifically, in Deinococcus radiodurans potential quadruplex sequences are located in regulatory regions of genes contributing to radioresistance. Upon treatment with a quadruplex-binding ligand, attenuation of the radioresistance was observed (137). In a landmark study Seifert and co-workers identified a cis-acting quadruplex sequence that is necessary for pilin antigenic variation in Neisseria gonorrhoeae (66). Antigenic variation takes place via a non-homologous recombination event between a single expressed pilE locus and many silent donor loci. Mutation of a G-rich sequence with quadruplex forming potential upstream of pilE inhibited recombinational switching at the variable locus. Quadruplex formation is required for nicking the DNA; the break site is then further processed by the recombination machinery, which results in antigenic variation (66). Furthermore, Seifert and co-workers identified a conserved promoter sequence adjacent to the pilE quadruplex motif; upon DNA strand separation during transcription, G-quadruplex formation would be possible. Indeed, transcription of a cis-acting, non-coding small RNA from this promoter was found to be essential for antigenic variation to commence (138).

Concerning duplex DNA destabilization in close proximity to G-quadruplex or i-motif forming sequences, König et al. report that a spacer of five bp next to the alternative DNA motif already is sufficient to maintain stability of the adjacent duplex regions (139).


1.4 Repetitive DNA Sequences

Simple sequence repeats (SSRs) are very abundant in the human genome and also ubiquitous in prokaryotes (140,141). Short tandem repetitive DNA patterns of generally 1-6 nt are referred to as microsatellites; they account for 3% of the genome in humans (124). Microsatellites were first identified in eukaryotes in the early 1980s, when sequencing of alleles at the human globin locus revealed a varying number of short sequence motifs 5’ to the β-globin gene (142,143). This was shortly followed by the identification of alternating pyrimidine-purine polymers, which have the potential to form Z-DNA (58). DNA fingerprinting then showed extensive length variation of tandem repeats, highlighting these sequences as hypervariable regions and sources of genetic variation (144,145). Since 1989 these differences in microsatellites have been exploited in genotyping (146).
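The definition of a microsatellite as a short tandem repeat with a unit of 1-6 nt can be illustrated with a small pattern search. The following Python sketch is an assumption-laden illustration and not a tool used in this work: the function name `find_microsatellites` and the threshold of at least three consecutive copies are chosen arbitrarily for demonstration.

```python
import re


def find_microsatellites(sequence, min_repeats=3):
    """Return (start, unit, copies) for perfect tandem repeats of 1-6 nt units.

    min_repeats is a hypothetical threshold for the number of consecutive copies.
    """
    # The lazy quantifier makes the regex prefer the shortest repeat unit;
    # the backreference \1 requires exact copies of that unit.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(sequence.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Example: a dinucleotide repeat embedded in flanking sequence.
print(find_microsatellites("TTGACACACACATGG"))
# Example: a CAG trinucleotide array.
print(find_microsatellites("CAGCAGCAGCAG"))
```

Only perfect repeats are detected; interrupted or degenerate repeats would require a more tolerant search.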

1.4.1 Putative Structures Formed by Repetitive DNA Sequences

So far research on SSRs has primarily been focused on short mono-, di-, tri- and tetranucleotide repeats of which every possible combination has been found to be vastly over-represented in the human genome (140). Especially trinucleotide expansions in ORFs, introns or UTRs have been the main area of interest. In these cases repeat instability gives rise to human neurodegenerative diseases, e.g. Huntington disease (147), spinobulbar muscular atrophy (148) and Fragile X syndrome (149). During replication hairpin structures (Figure 9A) can be formed from palindromic regions within these repeats, which leads to either expansion or deletion of the repetitive element depending on its location in the template or nascent strand during replication (see Chapter 1.4.3).

G-rich repeat sequences (Figure 9C) can give rise to G-quadruplex structures, which have been explained in detail before (Chapter 1.3). A G-rich element on one strand of the DNA duplex is inevitably tied to the existence of a C-rich pattern in the complementary strand. The so-called i-motif structure (Chapter 1.2, Figure 3F) is formed from C-rich oligonucleotides under mildly acidic conditions, which enable the formation of hemiprotonated cytosine-cytosine+ base pairs (Figure 9B) (62).

The structure may form intermolecularly by association of various strands or intramolecularly from consecutive runs of C’s (Figure 9C). Although formation of the i-motif is favored at lower pH, some sequences are able to stably fold into i-motif structures even at neutral pH (150). Formation of i-motifs in vivo remains to be shown, although proteins have been identified in human cells that bind to C-rich regions capable of forming i-motifs, e.g. the transcription factors hnRNP K and NM23-H2 interact with the i-motif located in the c-MYC promoter region (151,152). An interplay between the G-quadruplex and the i-motif in the nuclease hypersensitive element (NHE) III1 of the c-MYC promoter has been proposed (104). i-motifs may form under special environmental conditions, especially under negative supercoiling conditions (62,104). Mirror repeats (Figure 9D) in the DNA can give rise to triplex structures (H-DNA) (see Chapter 1.2, Figure 3E). Inverted repeats are able to form four-way junctions such as cruciforms (Figure 9E) and Holliday junctions (see Chapter 1.2, Figure 3E).

Figure 9: Examples of Structures Formed by Repetitive DNA

A: Triplet tandem repeats have hairpin-forming potential. The example shows the CAG array coding for glutamine in the huntingtin gene, adapted from Chapter 27.6.6 in (1). Base pairing is represented by pink dashed lines.

B: Hemiprotonated C:C+ base pair. Hydrogen bonding is represented by pink dashed lines; the sugar-phosphate backbone is implied by R highlighted in gray. Adapted from (62).

C: Repetitive G-rich sequences and the C-rich sequences in the complementary strand allow for G-quadruplex and i-motif formation. Shown are the sequence and suggested structure of the non-canonical structures found in the NHE III1 of the c-MYC promoter (104). Guanine tetrads are represented by gray squares, C:C+ base pairs by gray triangles.

D: Mirror repeats enable triplex formation. Shown is the sequence of the H-DNA motif found in the Drosophila melanogaster hsp26 gene promoter (153). Watson-Crick base pairs are shown in pink and Hoogsteen base pairs in orange.

E: Inverted repeats are able to form DNA four-way junctions, or so-called cruciform structures. The example shows the sequence motif and suggested structure of the p53 target sequence from the p21 promoter (42).
