

3 Results and Discussion

3.2 G-rich Bacterial Repeat Sequences with the Potential to Fold Quadruplexes

3.2.9 Fishing for (GGGAATC)3GGG Binding Proteins

G-quadruplex structures can be stabilized and regulated by protein interactions. A number of proteins that bind or resolve G-quadruplexes have been reported, especially in human cells, where they bind to G-quadruplexes located in telomeres and promoter regions as well as to RNA G-quadruplexes (295). G-quadruplexes need to be resolved during processes such as replication to avoid replication errors; helicases with G-quadruplex resolving capability have been described in humans, chicken, C. elegans, yeast and bacteria (43). Recently, von Hacht et al. identified several proteins that interact with G-quadruplex-forming sequences in the MMP16 and ARPC2 mRNAs in pull-down assays with whole cell extracts from different eukaryotic cell lines (296). Few proteins that interact with G-quadruplexes have been identified in prokaryotes. RecQ is a helicase identified in E. coli that has been shown to unwind G-quadruplexes in vitro (297). RecQ is also involved in antigenic variation in N. gonorrhoeae (66,138,190). In addition, the E. coli homolog of MutS was reported to bind to G-quadruplex DNA in vitro (298). Pif1 was originally identified as a G-quadruplex helicase in yeast (67,239), and homologs are found in various prokaryotes (238); however, G-quadruplex recognition by these homologous helicases has not been demonstrated experimentally.

To identify possible protein interaction partners of the GGGAATC repeat sequences, pull-down assays with Xcc cell extract were performed. Experimental details are described in Chapters 7.29 to 7.33. Briefly, fishing was performed with oligonucleotides corresponding to the GGGAATC G-quadruplex, the complementary C-rich strand and the respective duplex DNA. Scrambled oligonucleotides with the same G+C content were used to exclude proteins with general DNA binding properties. The oligonucleotides carried a biotin label at the 5’ end that was separated from the G-quadruplex-forming sequence by a 5 nt linker sequence. The oligonucleotides were folded in the presence of 10 mM K+, which was sufficient to induce G-quadruplex formation as observed by CD spectroscopy. Biotin-labeled oligonucleotides were immobilized on streptavidin-coated magnetic beads. As a negative control, beads were incubated with biotin only. Whole cell extract of Xcc was prepared from an overnight culture grown in TY full medium. Fishing was performed in 1x PBS adjusted to 10 mM K+ to prevent G-quadruplex unfolding and in the presence of salmon testis DNA to reduce the number of unspecific binders. Proteins were analyzed by denaturing SDS-PAGE and gels were silver stained.
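The probe set described above can be illustrated with a short Python sketch. It builds the G-rich sequence (GGGAATC)3GGG, derives the complementary C-rich strand, and generates scrambled controls with identical base composition (and therefore identical G+C content). The helper names, the fixed random seed and the rule that a scrambled control must not retain a run of three G (or three C) are assumptions made for illustration; the sequences actually used are given in the experimental chapters, and the 5 nt linker and biotin label are omitted here.

```python
import random

REPEAT_UNIT = "GGGAATC"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def scramble(seq: str, forbidden_run: str, seed: int = 0, max_tries: int = 1000) -> str:
    """Shuffle the bases of seq until the forbidden run no longer occurs.

    Shuffling keeps the base composition, so G+C content is unchanged.
    Excluding the run (e.g. "GGG") is an illustrative assumption.
    """
    rng = random.Random(seed)
    bases = list(seq)
    for _ in range(max_tries):
        rng.shuffle(bases)
        candidate = "".join(bases)
        if forbidden_run not in candidate:
            return candidate
    raise ValueError("no scrambled sequence found within max_tries")


# G-quadruplex-forming probe: three repeat units plus a fourth G-tract.
g_rich = REPEAT_UNIT * 3 + "GGG"
# Complementary C-rich strand, written 5'->3'.
c_rich = reverse_complement(g_rich)
# Scrambled controls with the same base composition.
g_scrambled = scramble(g_rich, forbidden_run="GGG")
c_scrambled = scramble(c_rich, forbidden_run="CCC")

print(g_rich, c_rich, g_scrambled, c_scrambled, sep="\n")
```

Shuffling rather than redesigning the control keeps length and composition fixed, so any difference in protein binding can be attributed to the sequence arrangement rather than to base content.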

The biotin control showed many proteins binding unspecifically to the beads. No strong signals were detected in the washing fractions; however, these samples were diluted in comparison to the eluates. One band between 70 and 100 kDa was enriched in the pull-downs with single-stranded oligonucleotides in comparison to the duplex probes, which may indicate a protein binding to single-stranded DNA (Figure 43).

Figure 43: Silver Stained Gels of Pull-Down Assays

Protein fractions from the pull-down assay were analyzed by SDS-PAGE and gels were silver stained. A: 8% gel. B: 12% gel. Lanes show samples of the protein extract that was divided among the magnetic beads, taken before (protein extract) and after fishing (fishing), samples of the four washing steps (1st wash – 4th wash) and the eluted fractions of beads coupled with the G-quadruplex-forming oligonucleotide (G3AATC QTP), the C-rich complementary strand (C3GATT), the annealed duplex probe (duplex), the respective scrambled oligonucleotides (G3AATC scr., C3GATT scr., duplex scr.) and the negative control, which contained only biotin-coated beads but no oligonucleotide probes. Arrows indicate two proteins that were found enriched in comparison to other fractions.

Another band, between 55 and 70 kDa, was enriched in the pull-down with the G-quadruplex-forming oligonucleotide in the 8% gel (Figure 43A, arrow), but was less prominent in the 12% gel (Figure 43B). It was also present in other fractions and was therefore not analyzed further. A third band, just under 35 kDa, was enriched in the eluate of the pull-down with the C-rich complementary strand (Figure 43B, arrow). The band was clearly visible and not present in the other eluates. A G-quadruplex formed on the G-rich strand in the genome could keep the DNA in the single-stranded conformation and thereby facilitate protein binding to the C-rich strand, or vice versa.

The respective band was cut from the gel and given to the Proteomics Facility of the University of Konstanz for protein identification by peptide mass fingerprinting. The protein was identified as the bifunctional acetyltransferase/isomerase WxcM (NCBI GI: 21230090, locus tag: XCC0615) with a molecular mass of 33.2 kDa. One publication is available that describes WxcM as part of a cluster of 15 genes involved in the biosynthesis of the lipopolysaccharide (LPS) O-antigen and the LPS core. The N-terminal domain of the predicted bifunctional enzyme is similar to acetyltransferases, and the C-terminal domain is similar to postulated isomerases (299). Analysis with the KEGG Sequence Similarity DataBase (300) (http://www.kegg.jp/kegg/ssdb/) showed no nucleic acid binding motifs; thus WxcM is likely a false positive. In a second pull-down assay, neither WxcM nor any other specific binder was detected (data not shown).

Von Hacht et al. carried out five rounds of pull-downs and further analyzed only proteins that were detected recurrently. Detection of a protein that was enriched in all pull-downs with single-stranded oligonucleotides in comparison to duplex DNA hints towards a single-stranded DNA binding protein, which in turn suggests that the G-rich oligonucleotide may not have folded properly.

Additional flanking bases may affect G-quadruplex folding and stability (253,254), so attachment of the linker sequence and the biotin label may be detrimental to G-quadruplex folding. In future experiments the biotin label could also be attached at the 3’ end, the linker sequence could be varied and the pull-down assay could be carried out at higher K+ concentrations. Unspecific binding to the beads was generally very high; a blocking step could be added to the protocol in which the beads are first exposed to salmon testis DNA alone as blocking agent, with the cell extract added in a second step. Finally, the whole cell extract was obtained from an overnight culture; a further variation of the protocol could be to prepare cell extracts from Xcc during exponential growth phase, when protein factors other than those of stationary phase may be present.

3.2.10 Conclusions

GGGAATC / GGGGA(C/T)T repeat sequences are ubiquitous in the Xcc, Xac and Ana genomes. They represent a special type of SSR because, in addition to being repetitive sequences, they have the capacity to form G-quadruplex structures. We found these repetitive patterns throughout the respective genomes, albeit with a strong bias for non-coding regions. Remarkably, a clear preference for a unit size of four was detected, which corresponds exactly to the number of G-tracts needed for G-quadruplex formation.
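As a minimal illustration of how tandem arrays of the GGGAATC unit could be located and their G-tracts counted, the following Python sketch scans one strand of a sequence with a regular expression. It matches only perfect copies of the unit, whereas the genomic repeats show sequence variation, so the pattern, the function name and the toy sequence are illustrative assumptions and not the search procedure used in this work.

```python
import re

# One or more perfect GGGAATC units, optionally closed by a final G-tract.
REPEAT_PATTERN = re.compile(r"(?:GGGAATC)+(?:GGG)?")
G_TRACT = re.compile(r"GGG")


def find_repeat_arrays(seq: str) -> list[tuple[int, int, int]]:
    """Return (start, end, number of G-tracts) for each repeat array.

    Coordinates are 0-based, end-exclusive, on the given strand only.
    """
    hits = []
    for match in REPEAT_PATTERN.finditer(seq.upper()):
        n_tracts = len(G_TRACT.findall(match.group()))
        hits.append((match.start(), match.end(), n_tracts))
    return hits


# Toy sequence (hypothetical): one array with four G-tracts, one single unit.
toy = "ATCA" + "GGGAATC" * 3 + "GGG" + "TTTT" + "GGGAATC" + "ACGT"
arrays = find_repeat_arrays(toy)

# Arrays with at least four G-tracts could in principle fold intramolecularly.
candidates = [hit for hit in arrays if hit[2] >= 4]
print(arrays)
print(candidates)
```

To cover the opposite strand, the same function could be applied to the reverse complement of the sequence.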

Using CD spectroscopy we were able to show that repeat-comprising DNA oligonucleotides readily formed secondary structures with moderate to very high thermodynamic stability, primarily stabilized by K+. Characteristic spectral changes and enhanced thermodynamic stability were observed under conditions favoring G-quadruplex formation. In addition, no structural changes could be observed upon introduction of G-to-T mutations for the Xcc-derived oligonucleotides. Taken together, this suggests that the structures adopted in the presence of K+ are G-quadruplexes.

However, CD spectroscopy alone is no definite proof of a G-quadruplex structure, and further analyses, e.g. by NMR, need to be carried out to confirm this proposal. In addition, we observed characteristic spectral changes that suggest i-motif formation by the complementary C-rich oligonucleotide, even at an only mildly acidic pH of 6.5. Increasing ionic strength did not disturb i-motif formation. In the case of inverted repeats, stem-loop structures may form as well as G-quadruplexes, and the two secondary structures may compete with each other. It is unclear whether such non-canonical nucleic acid structures are formed at the DNA or RNA level in the bacteria in vivo. However, analysis of RNA sequencing data published by Jalan et al. (285) showed that many of the repeat sequences in Xac are in fact transcribed; both the G-rich and the C-rich strand were found to be part of transcripts. While DNA as well as RNA quadruplexes exist, formation of an i-motif at the RNA level is, in contrast to G-quadruplexes, regarded as less likely (114); RNA i-motifs have been shown to be less stable than their DNA counterparts (301,302). To consolidate the hypothesis that the G-rich patterns studied are prone to G-quadruplex formation, one could analyze whether the repeats coincide with sites of duplex destabilization. SIDDBASE (303) is a database containing the stress-induced DNA duplex destabilization (SIDD) profiles of complete microbial genomes that could be used for the analysis of the Xcc, Xac and Ana genomes.

A preference for these G-rich repeats to be located in close proximity to the ORF, either upstream on the anti-sense (non-coding) strand or downstream on the sense (coding) strand, was detected in all three organisms. Elements at these locations are well placed to exert gene regulatory effects. A variety of possible cellular functions have been attributed to G-quadruplexes, as reviewed by Bochman et al. (25). For instance putative regulative roles of G-quadruplex structures formed

transcription by keeping the DNA strands separated, or even promotion or repression of transcription by recruitment of G-quadruplex binding proteins that may in turn interact with the RNA polymerase. Recently, Holder and Hartig showed that in E. coli G-quadruplex sequences can have activating as well as inhibitory effects on gene expression that largely depend on the exact location of the non-B DNA element within the promoter region or at the ribosomal binding site (26). Similar gene regulatory effects have also been observed for SSRs located upstream of an ORF, e.g. by overlapping with binding sites of regulatory proteins or variation of spacing between promoter elements (193,194).

Comparison of assembled transcripts carrying repeats with G-quadruplex-forming potential versus a non-quadruplex-forming control group indicated several groups of repeats that are interesting targets for further studies. The first group comprises G-quadruplexes with a potential regulative role in translation. G-quadruplex-forming repeats were especially under-represented on polycistronic transcripts, which had also been observed in the analysis of predicted operons. Repeats that are nevertheless found at this position may point towards a role in differential translation of the encoded messages. The second group contains repeats that indicate a role in transcription termination. Although transcription termination has so far not been shown to be a cellular role of G-quadruplexes, we found transcripts that end within a C-rich repeat sequence lying in sense orientation to the last gene on the transcript. If the C-rich repeat strand is found on the transcript, then the G-rich strand was used as template during transcription. G-quadruplex formation in the single-stranded DNA template might directly hinder polymerase progression.

The third group contains repeats located in the 5’ UTR. Transcripts starting within a G-rich repeat were detected. G-quadruplex formation would then be possible in the 5’ UTR of the RNA, which may affect binding of the ribosome to the ribosomal binding site. Repeat variation at this position could also influence the spacing between the transcription start site and the ribosomal binding site. Furthermore, transcripts starting with a C-rich sequence were also enriched for G-quadruplex-forming repeats in comparison to the control group. Here, a G-rich sequence would be found in the DNA template surrounding the transcription start site. However, the data set used in this preliminary study was very small; further sets of RNA sequencing data should be analyzed to verify the observed effects before picking targets for biochemical studies. In addition, 5’ or 3’ RACE PCR could be carried out to validate the respective start and end points of the transcripts of selected candidates (289).

On the other hand, mutations in repeats located downstream of ORFs or within ORFs may have arisen to avoid transcription termination, as premature termination may ultimately lead to aberrant or truncated proteins. Indeed, the ability and likelihood of a tandem repeat sequence to mutate, e.g. by extension or contraction of repeat units, have been shown to be greatly decreased by the introduction of just a few mutations (304,305).

Generally, we found repeats located between divergent ORFs to be under-represented. In this case G-rich repeats may overlap with promoter regions of several genes. Possible secondary structure formation or repeat expansion in this region may interfere with the promoter function of both genes. Under-representation of GGGAATC / GGGGA(C/T)T motifs at such a position may indicate that formation of non-canonical nucleic acid structures by the repeats might well be possible in vivo and is therefore avoided in this particular region. This goes hand in hand with repeats being under-represented on the coding strand within ORFs in all three organisms, where G-quadruplex formation may cause polymerase stalling or induce frame-shifts (135,136). This is in accordance with observations made by Mrázek and Huang, who report heptameric repeats in intragenic regions to be normally represented in Xanthomonas species and cyanobacteria. In addition, intragenic G-quadruplex patterns were also normally represented in all xanthomonads (24). Generally, Lin and Kussell found SSRs to be suppressed in the middle of coding regions in prokaryotes, but enriched near the termini. SSRs were especially over-represented close to the N-terminus, indicating involvement in phase variation by frame-shifting (184).

Analysis of the repeat-associated genes in all three organisms showed them to be largely randomly distributed across the different functional classes and to belong to the general metabolism. The same observation has been made by Guo et al., who also found SSRs to be associated with housekeeping genes, e.g. rRNA and tRNA genes, ribosomal proteins, aminoacyl-tRNA synthetases, chaperones, and important metabolic enzymes (203). Repeats involved in phase variation have been shown to be associated with cell surface structures such as antigens (27,28,194,195). In addition, a G-quadruplex in Neisseria gonorrhoeae has been shown to promote antigenic variation (66,138,190). While genes encoding cell wall components and pili were among the repeat-associated genes, the great number of genes belonging to the general metabolism makes a role of GGGAATC and GGGGA(C/T)T repeats in phase variation unlikely. Furthermore, SSRs involved in phase variation tend to be A/T rich (141), which facilitates melting and increases the likelihood of evolving via slipped-strand mispairing during replication (46,276).

It is noteworthy that so far we have only identified GGGAATC repeat patterns within the genus Xanthomonas, but not in other γ-proteobacteria such as the enterobacteria E. coli, Salmonella enterica or the plant pathogen Erwinia amylovora. In these organisms G-quadruplex patterns have also been found not to be abnormally represented in non-coding regions (24). The genus Xanthomonas shows a high degree of host plant specificity and may even show tissue specificity. In addition to infecting different dicotyledonous hosts, Xcc invades the vascular system of the plant while Xac infects the mesophyll tissue (271). However, repeats were often found associated with similar genes in Xac and Xcc and were not exclusive to pathogenicity-related genes, which makes a function of the repeats in pathogenicity or pathogen-host interaction unlikely.

While the majority of repeats are found between the same genes in Xcc and Xac, we found extensive length and sequence variation of the intergenic patterns even between these closely related organisms. It has been hypothesized that the increased abundance of heptameric repeats in bacteria might be related to the size of the DNA segment that interacts with the active site of the DNA polymerase, which may lead to increased polymerase slippage for this pattern type (155). Joukhadar and Jighly formulated the hypothesis that microsatellites may even make flanking genes more stable: SSRs may be able to discard weak DNA polymerases, thereby increasing the opportunity of the flanking genes to be replicated by more stable DNA polymerases (306).

A number of the genes that were differentially regulated under hyperosmotic shock conditions in E. coli were found to be homologs of repeat-associated genes in Xcc. Three observations prompted us to investigate the expression levels of repeat-associated genes under osmotic shock conditions and to check for G-quadruplex-dependent regulation of transcription: GGGAATC patterns are found in proximity to genes involved in osmoadaptation, K+ levels increase and negative supercoiling conditions arise during hyperosmotic challenge, and 5’-(GGGAATC)3GGG-3’ forms a G-quadruplex in KCl solution. Because no literature was available on hyperosmotic shock in Xcc, conditions that induce it were established and validated by qPCR. An overall trend of GGGAATC repeats acting as gene regulators was not observed among the genes tested in this initial screen.

However, a general function as gene regulators should not be ruled out, as hyperosmotic shock is only one of many possible triggers. Nevertheless, two repeat sequences were found in an initial screen of selected repeat-associated genes that are promising targets for future studies: repeat #120 near osmC and the inverted repeat #092/#093 located between flgE/flgF. More than 4-fold upregulation for osmC and more than 6-fold downregulation for flgE/flgF were observed under hyperosmotic challenge with sucrose and sorbitol. In both cases the repeat sequence is located upstream of an ORF and could act as a regulator of transcription, or also of translation in the case of flgE. Further biochemical studies, e.g. chemical probing to test for G-quadruplex formation, and treatment with G-quadruplex binding small molecules during osmotic shock to enhance changes in gene expression, could be carried out to assess putative physiological roles of the respective GGGAATC repeats as transcriptional regulators.
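How a fold change of this kind can be derived from qPCR threshold cycles is sketched below in Python. The text does not state which quantification method was used, so the sketch assumes the widely used 2^(-ΔΔCt) calculation with a reference gene and an amplification efficiency of 100%. The function name and all Ct values are hypothetical and are not measurements from this work.

```python
def fold_change(ct_target_treated: float, ct_ref_treated: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^(-ddCt) method (assumes 100% efficiency)."""
    # Normalize the target gene to the reference gene in each condition.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the untreated control.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses the threshold two cycles earlier
# under osmotic shock, which corresponds to 4-fold upregulation.
up = fold_change(20.0, 15.0, 22.0, 15.0)

# Hypothetical Ct values: three cycles later corresponds to 8-fold downregulation.
down = fold_change(25.0, 15.0, 22.0, 15.0)

print(up, 1.0 / down)
```

Values above 1 indicate upregulation; for downregulated genes the reciprocal gives the fold reduction.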

Finally, G-quadruplexes and inverted repeats also have to be considered as possible targets of protein binding factors. However, pull-down assays from Xcc cell extracts have so far not revealed a protein interaction partner of the G-quadruplex-forming sequence from Xcc. Regarding inverted repeats in particular, we did not find the sequence between the repeats to be complementary to any other region in the respective chromosome, and we therefore exclude an anti-sense effect of the inter-repeat sequence. A potential effect may be structural, through formation of a stem-loop structure that might also be the target of proteins. Mrázek and Huang reported palindromes and inverted repeats to be over-represented, especially in intergenic regions (24). Both sequence types can give rise to stem-loop structures or cruciforms, which have been implicated in replication, regulation of gene expression and recombination (42).

In conclusion, in this project G-rich heptameric repeats of the type GGGAATC in the xanthomonads Xcc and Xac and GGGGA(T/C)T repeats in the cyanobacterium Ana were characterized with respect to their genomic distribution and location, length and sequence variability, as well as the functions of the associated genes and their orientation relative to the repeats. We found evidence for G-quadruplex-forming potential of the repeat sequences in biophysical studies, and hints that structure formation in the C-rich strand may also be possible. However, the overall picture regarding a biological role remains undefined. Both the G-rich and the C-rich repeat sequences were found to be transcribed in Xac. Preliminary results of the analysis of whole transcriptome sequencing data of Xac point towards a variety of putative regulative functions, including transcription termination as well as regulation of transcription and translation.

Furthermore, qPCR analysis of expression levels of repeat-associated genes under hyperosmotic shock in Xcc indicated two repeats that may act as regulators of transcription. Further biochemical
