
Nucleic acid repeats are DNA motifs containing sequence elements that are repeated in several units. The similar units can either occur consecutively (in the same strand direction, e.g. 5'→3') or be separated by varying numbers of nucleotides (interspersed repeats). Furthermore, they can be located in opposite directions to each other (“mirror” repeats). Nucleic acid repeat sequences appear to be related to the formation of non-canonical structures in genomic DNA (examples are shown in Figure 1.7).


Figure 1.7: Examples of DNA repeats.

Schematics of arbitrary repeat sequences on DNA level and their corresponding secondary structures are shown.

Repetitive units are framed in white. A Interspersed SSR (GGGT) able to form a G-quadruplex structure. B Mirror repeat (CTTCCCCTTTCT-NN-TCTTTCCCCTTC; N represents any nucleotide) which could form H-DNA. C Palindromic repeat (CTTCCCCTTTCT-NN-AGAAAGGGGAAG; N represents any nucleotide) which could form stem loop or cruciform structures.
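The relationship between the mirror and palindromic examples in panels B and C can be checked with a few lines of Python. This is only a minimal sketch: the helper names `reverse` and `reverse_complement` are illustrative and do not come from the thesis.

```python
# Illustrative helpers (hypothetical names, not part of the thesis).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq: str) -> str:
    """Return the sequence read backwards on the same strand."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (the opposite strand read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


left_arm = "CTTCCCCTTTCT"  # left arm shared by panels B and C of Figure 1.7

# Panel B: the right arm of a mirror repeat is the left arm reversed.
mirror_right = reverse(left_arm)

# Panel C: the right arm of a palindromic repeat is the reverse complement.
palindrome_right = reverse_complement(left_arm)

print(mirror_right)      # TCTTTCCCCTTC
print(palindrome_right)  # AGAAAGGGGAAG
```

Both results match the right-hand arms given in the caption. A mirror repeat is symmetric within one strand, whereas a palindromic repeat is self-complementary and can therefore base-pair into a stem loop or cruciform.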

Prokaryotic repeats have been classified according to different criteria, such as their total size, genomic distribution, coding capability and number of occurrences in the genome. Examples of different categories are simple sequence repeats (SSR), tandem repeats (TR), miniature inverted repeats (MITE), repetitive extragenic palindromic (REP) sequences and clustered regularly interspaced short palindromic repeats (CRISPRs). The 20-48 bp long CRISPR (242) repeats have been shown to play a role in the adaptive immune response of bacteria. REPs (243,244) are palindromic, 20-40 bp long DNA repeats which can occur as single units or in clusters, so-called bacterial interspersed mosaic elements (BIMEs). MITEs are generally less than 200 bp in length and require a transposase for transposition. They can fold into long stem-loop structures at the RNA level and frequently carry functional motifs, such as promoter sequences or protein binding sites (245,246).

TRs contain multiple units that are directly repeated in a head-to-tail manner, with units spanning 1-100 base pairs (247,248); units with a size of 1-9, 10-100 and >100 bp are termed micro-, mini- and macrosatellites, respectively. They are found in a variety of prokaryotic species (249,250) and can show considerable differences even among closely related species (251), suggesting that TRs are subject to evolutionary change (252). Kashi and co-workers investigated tandem iterations in E. coli and found them to be under-represented in open reading frames (ORFs) when exceeding a length of 3 bp (253). Microsatellites with a length of 1-6 bp, also termed SSRs (254), participate in bacterial adaptation (255,256): high mutation rates at repeat sites can lead to an expansion or contraction of the SSRs, which is related to bacterial phase variation. Phase variation describes a specific ON or OFF switch in the expression of a given factor involved in the interaction with the host, such as invasiveness or adherence to host cells (257-259). Most repeats occur in intergenic regions up to 200 bp upstream of the start codon, which contain proximal regulators of gene expression.
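The size classes and the head-to-tail definition of tandem repeats given above can be illustrated with a short Python sketch. It is not the method used in any of the cited studies. The function names, the regular expression and the `min_copies` threshold of three are assumptions chosen for illustration, and only perfect (uninterrupted) repeats are detected.

```python
import re


def classify_unit(unit_length: int) -> str:
    """Classify a tandem repeat by unit size (bp), using the ranges quoted in the text."""
    if unit_length < 1:
        raise ValueError("unit length must be at least 1 bp")
    if unit_length <= 9:
        return "microsatellite"
    if unit_length <= 100:
        return "minisatellite"
    return "macrosatellite"


def find_ssrs(seq: str, max_unit: int = 6, min_copies: int = 3):
    """Yield (start, unit, copies) for perfect head-to-tail repeats.

    max_unit=6 follows the 1-6 bp SSR definition in the text;
    min_copies=3 is an arbitrary illustrative threshold (assumption).
    """
    # Lazy group: prefer the shortest unit; backreference: further copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    for match in pattern.finditer(seq):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


example = "ATGCACACACACTTGGGGGGA"  # made-up sequence, not genomic data
hits = list(find_ssrs(example))
for start, unit, copies in hits:
    print(start, unit, copies, classify_unit(len(unit)))
# 3 CA 4 microsatellite
# 14 G 6 microsatellite
```

Because the unit group is lazy, a run such as GGGGGG is reported as six copies of G, not as three copies of GG. A change in the copy number at such a site is the expansion or contraction event that the text relates to phase variation.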

Nucleic acid repeats can have strong effects on the local DNA structure in the genome. They are prone to fold into hairpins or more complex structures. Sequences with the potential to adopt such non-canonical nucleic acid structures are abundant in eukaryotic and prokaryotic genomes. Recently, Huang and Mrázek presented a survey of local sequence patterns that promote non-canonical DNA conformations in 1,424 prokaryotic chromosomes (260). They found that SSRs are suppressed, whereas longer TRs showed at least a slight over-representation in whole-genome analyses across all phyla. In their analysis, repeat sequences with the potential to form G-quadruplexes and H-DNA structures were found to be normally represented in most prokaryotic genomes.

Both repeats and non-B-DNA structures have been associated with genomic instability. Inverted repeats were found to cause deletions in E. coli as early as the 1980s (261-263). Instability caused by TR sequences has been linked to various hereditary diseases (264). Chromosomal plasticity in Pseudomonas fluorescens species has been linked to MITE sequences (265). REP sequences have been linked to genetic instability in E. coli toxin-antitoxin systems (266), and other repetitive sequences have been described in relation to genomic plasticity in bacteria (267,268). Most repeat sequences have the potential to fold into secondary structures at the DNA and/or RNA level, as has been described for pneumococcal bacteria (269). Moreover, such non-canonical nucleic acid structures are prone to interfere with translation, transcription, replication or recombination (see Chapter 1.2). The exact mechanisms of these effects, however, have not been elucidated to date, and the function and role of many repetitive elements occurring in eukaryotes and prokaryotes are still unclear.


2 Aim of this Thesis

Non-canonical nucleic acids have been investigated in detail for decades (see Chapter 1). Quadruplex (see Chapter 1.1.1) and triplex structures (see Chapter 1.1.2) occur in G-rich sequence stretches, and the evidence for their existence in vivo is increasing. Several studies have suggested that they influence regulatory and life cycle states in cells, and many of these structures have been associated with human diseases (see Chapter 1.2). Although computational searches have provided ample evidence for the occurrence of potential alternative structure motifs across all kingdoms of life, the concrete mechanisms of their influences and functions remain unclear. Studies carried out in prokaryotic systems are particularly rare.

In this thesis, two topics were covered, both dealing with G-rich alternative structures in prokaryotes: (1) positional effects of G-quadruplexes on E. coli gene expression and (2) investigation of DNA triplex repeats naturally occurring in E. coli.

The aim of the first topic was to gain new insights into the secondary structure-mediated regulation of gene expression in E. coli. For this purpose, a series of reporter gene constructs containing G-quadruplex sequences at systematically varied positions was generated. These sequences were inserted at several positions within the promoter, 5´-UTR, and 3´-UTR regions. In an engineered system, G-rich sequences in the vicinity of the ribosome binding site were analyzed for gene-activating behavior. A possible activation mechanism is proposed, which makes these designs suitable for application in addressable systems. Furthermore, potential quadruplex-forming sequences occurring naturally in the E. coli genome were investigated for their influence on gene expression. In addition, first studies investigating G-quadruplex sequences occurring in the ORFs of the kdpD and kefC genes of E. coli and Salmonella subspecies were undertaken.

The aim of the second topic was to investigate a particular type of intrastrand triplex which has been described in earlier studies but whose function and exact structure could never be clarified. This motif was characterized by in silico and biochemical (CD, NMR, in vivo probing) studies. Furthermore, the genomic stability around this motif was investigated, and different mechanisms for its involvement in recombination or replication were proposed. We also investigated whether this motif is involved in the organization of the bacterial nucleoid.

This thesis also describes the collaborative design of a database allowing the search for intrastrand triplex motifs in 5,246 genomes of bacterial and archaeal species. In this way, intrastrand triplex motifs were found to be widely distributed in bacteria.


3 Results and Discussion