
The SeqWord Genome Browser: an online tool for the identification and visualization of atypical regions of bacterial genomes through oligonucleotide usage

Comparative Genomics of Green Sulfur Bacteria

2. The SeqWord Genome Browser: an online tool for the identification and visualization of atypical regions of bacterial genomes through oligonucleotide usage

2.1 Background

This part summarises the development and implementation of a web-accessible genome browser and practical applications of oligonucleotide usage analysis, including examples for multiple genomes.

This application was used in the manuscripts listed in section 1.2. Oligonucleotide usage is characterised by statistical parameters derived from normalised counts of short oligomers in DNA sequences (Karlin et al. 1997).
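To make the idea of normalised oligomer counts concrete, the following minimal sketch counts overlapping words of length k and divides each observed frequency by the frequency expected from the mononucleotide composition. This observed/expected ratio is only one of several possible normalisations and is an assumption for illustration; it is not necessarily the scheme used in SeqWord. The function names `kmer_frequencies` and `observed_expected_ratios` are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k DNA words, counted in overlapping windows.

    Words containing symbols other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def observed_expected_ratios(seq, k=2):
    """Observed word frequency divided by the product of its base frequencies.

    Assumption: a zero-order (mononucleotide) expectation, chosen for simplicity.
    """
    mono = kmer_frequencies(seq, 1)
    ratios = {}
    for word, observed in kmer_frequencies(seq, k).items():
        expected = 1.0
        for base in word:
            expected *= mono[base]
        ratios[word] = observed / expected if expected > 0 else 0.0
    return ratios


if __name__ == "__main__":
    example = "ACGTACGTTTGACA"  # toy sequence, not real genomic data
    for word, ratio in sorted(observed_expected_ratios(example, k=2).items()):
        print(word, round(ratio, 3))
```

Ratios well above or below 1 indicate words that are over- or under-represented relative to base composition alone; on real genomes such values would be computed over much longer sequences than this toy example.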

Oligonucleotide usage has long been known to be useful for discriminating between genomes that maintain different patterns (Karlin et al. 1997, Abe et al. 2003, Pride et al. 2003, Reva and Tümmler 2004, Teeling et al. 2004). Furthermore, these patterns can be used within a genome to distinguish the core genome from the accessory genome (Karlin 2001, Reva and Tümmler 2005, Mrázek 2009). The core genome comprises regions where GC content and oligonucleotide patterns are close to the genomic average (Karlin et al. 1997, Reva and Tümmler 2005). In contrast, accessory genomic elements such as integrated plasmids, phages or genomic islands typically display different oligomer patterns. Finally, amelioration describes the tendency of monomer or oligomer patterns in blocks of atypical DNA to approach the genomic average over time (Vernikos et al. 2007). This process is difficult to observe or test directly, but by inference it has been linked to the replication or repair systems of the genome and to DNA conformational tendencies (Lawrence and Ochman 1997, van Passel et al. 2006). Interestingly, plasmids have been shown to maintain an oligomer signal different from that of the host chromosome (van Passel et al. 2006), perhaps suggesting that other mechanisms are also at work. These might include the frequent uptake of novel DNA by plasmids (van Passel et al. 2006, Cortez et al. 2009), different replication, repair or restriction systems, or the ability of plasmids to be transferred between genomes as part of the "mobilome" (Cortez et al. 2009).
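The distinction between core and atypical regions described above can be sketched as a sliding-window comparison: each window's word frequencies are compared with the whole-genome frequencies, and windows that deviate strongly are flagged. The distance measure (sum of absolute frequency differences), the window and step sizes, and the threshold of two standard deviations are all assumptions chosen for illustration; they are not the parameters defined by Reva and Tümmler (2005) or used in SeqWord. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def word_frequencies(seq, k=4):
    """Relative frequencies of all 4**k words as a list in fixed word order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def window_distances(genome, window=10_000, step=5_000, k=4):
    """Return (start, distance) pairs, one per sliding window.

    Distance is the sum of absolute differences between the window's word
    frequencies and those of the whole sequence (an illustrative choice).
    """
    genome = genome.upper()
    reference = word_frequencies(genome, k)
    results = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        local = word_frequencies(genome[start:start + window], k)
        distance = sum(abs(a - b) for a, b in zip(local, reference))
        results.append((start, distance))
    return results


def atypical_windows(distances, n_sd=2.0):
    """Start positions of windows whose distance exceeds mean + n_sd * SD.

    The threshold is a hypothetical cut-off, not a published criterion.
    """
    values = [d for _, d in distances]
    if not values:
        return []
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [start for start, d in distances if d > mean + n_sd * sd]


if __name__ == "__main__":
    # Synthetic toy "genome": an AT-repeat host with a GC-repeat insert.
    host = "AT" * 50_000
    island = "GC" * 5_000
    genome = host[:50_000] + island + host[50_000:]
    print(atypical_windows(window_distances(genome)))
```

On real genomes a flagged window is only a candidate: highly expressed genes and ribosomal RNA operons can also deviate from the genomic average, so a single distance measure does not by itself identify horizontally acquired DNA.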

2.2 About the manuscript

The SeqWord Genome Browser was implemented as a Java applet which runs in a web browser and links to a database of precalculated oligonucleotide patterns. The main achievement of this work was to give all researchers easy access to oligonucleotide pattern data for all publicly available, completely sequenced prokaryotic genomes. This tool facilitates analysis at multiple scales, from comparative genomics of complete genomes down to single genes across several species.

Part 2: The SeqWord Genome Browser

The underlying algorithms were derived and described in previous work (Reva and Tümmler 2005). The browser applet is flexible, allowing users to browse annotated genomes in a linear fashion or, alternatively, to view genomic fragments (e.g. 10 kbp in size). The fragment view facilitates a novel interpretation of genomes or genes, as the genomic regions may be plotted in three dimensions according to various oligonucleotide parameters. As oligonucleotide parameters contain structural and/or phylogenetic signal, conserved and accessory genomic regions can be differentiated. For example, the small chromosome of Vibrio cholerae El Tor N16961 has large regions with highly variable genomic content. This replicon harbours an integron island termed a "gene capture system" and contains multiple regions of putatively horizontally acquired DNA, even from outside the gammaproteobacteria (Heidelberg et al. 2000). It seems that the large-scale integration of novel DNA into this chromosome is sufficient to disrupt its global oligonucleotide usage. In other words, according to oligonucleotide usage this chromosome does not consist of well-defined core and accessory regions, but is a true mosaic of genetic elements from diverse origins (Heidelberg et al. 2000). Interestingly, the larger chromosome of V. cholerae displays considerably different oligonucleotide usage patterns, and the small chromosome has even been suggested to be a captured megaplasmid (Heidelberg et al. 2000). A use case at the level of single genes is the narG gene, which shows no hallmarks of divergent oligonucleotide usage (Palmer et al. 2009). Furthermore, classes of genes with distinct oligonucleotide properties, such as long modular genes, ribosomal RNA genes or ribosomal protein genes, can be located. This web service thus allows tentative assignment of novel outlier fragments to a few classes of genes. SeqWord is available online from sites at Pretoria, South Africa and Hanover, Germany.

The SeqWord Genome Browser was published in BMC Bioinformatics in 2008. I was mainly involved in writing the manuscript, assessing the scientific content of the system and maintaining the mirror site server in Hanover. Oleg Reva was the architect of the whole project, while Anna Rakitianskaia and Hamilton Ganesan handled the programming and technical aspects. Burkhard Tümmler had previously developed the statistical parameters used in this work.

2.3 References

Abe, T., Kanaya, S., Kinouchi, M., Ichiba, Y., Kozuki, T. & Ikemura, T. (2003) Informatics for unveiling hidden genome signatures. Genome Res 13(4) 693-702.

Cortez, D., Forterre, P. & Gribaldo, S. (2009) A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes. Genome Biol 10(6) R65.

Heidelberg, J. F., Eisen, J. A., Nelson, W. C., Clayton, R. A., Gwinn, M. L., Dodson, R. J., Haft, D. H., et al. (2000) DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406(6795) 477-483.

Karlin, S., Mrázek, J. & Campbell, A. M. (1997) Compositional biases of bacterial genomes and evolutionary implications. J Bacteriol 179(12) 3899-3913.

Karlin, S. (2001) Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes. Trends Microbiol 9(7) 335-343.

Lawrence, J. G. & Ochman, H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44(4) 383-397.

Mrázek, J. (2009) Phylogenetic signals in DNA composition: limitations and prospects. Mol Biol Evol 26(5) 1163-1169.

Palmer, K., Drake, H. L. & Horn, M. A. (2009) Genome-derived criteria for assigning environmental narG and nosZ sequences to operational taxonomic units of nitrate reducers. Appl Environ Microbiol 75(15) 5170-5174.

van Passel, M., Bart, A., Luyf, A., van Kampen, A. & van der Ende, A. (2006) Compositional discordance between prokaryotic plasmids and host chromosomes. BMC Genomics 7(1) 26.

Pride, D. T., Meinersmann, R. J., Wassenaar, T. M. & Blaser, M. J. (2003) Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res 13(2) 145-158.

Reva, O. N. & Tümmler, B. (2004) Global features of sequences of bacterial chromosomes, plasmids and phages revealed by analysis of oligonucleotide usage patterns. BMC Bioinformatics 5, 90.

Reva, O. N. & Tümmler, B. (2005) Differentiation of regions with atypical oligonucleotide composition in bacterial genomes. BMC Bioinformatics 6, 251.

Teeling, H., Waldmann, J., Lombardot, T., Bauer, M. & Glöckner, F. O. (2004) TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics 5, 163.

Vernikos, G. S., Thomson, N. R. & Parkhill, J. (2007) Genetic flux over time in the Salmonella lineage. Genome Biol 8(6) R100.
