

2. GenCoDB - A statistical tool for genetic context conservation analyses in bacterial genomes

2.4 Application of GenCoDB to the division cell wall cluster


hovered over, showing various identifiers for that gene and links to the respective databases. (C) Once users are happy with their filtered selection of neighbours, they can export it to the neighbourhood view to see it as a quantitative histogram.

Data availability. Through the user interface, every graph is available for download in both *.svg and *.png formats, allowing the effortless generation of publication-quality graphics. Both the neighbourhood and genome views allow the raw data to be downloaded in comma-separated values (csv) format. In particular, the *.csv files available from the neighbourhood view contain a row for each ortholog group in the displayed neighbourhood, with columns giving the frequency with which that ortholog group appears at each of the 25 up- and downstream positions surrounding the seed gene.

The genome view produces a *.csv file with a row for each selected genome and columns giving the ortholog group assigned to the gene at each of the 25 up- and downstream positions surrounding the seed gene. Both of these *.csv files reflect the settings selected in the user interface, including the database correction, genome selection and orientation options (a + or – is placed before ortholog group IDs to signify orientation relative to the seed gene). These *.csv files allow the graphs to be reproduced with other visualization strategies or used for further downstream analyses.
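As a minimal sketch of such a downstream analysis, the Python snippet below rebuilds per-position ortholog group frequencies (roughly the information shown in the neighbourhood view) from a genome-view style export. The column layout (a `genome` column followed by one column per signed position, with the seed at 0), the example ortholog IDs and the function name `position_frequencies` are hypothetical assumptions, and the database correction applied by GenCoDB is not reproduced.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical genome-view export (layout assumed, not taken from GenCoDB):
# one row per genome, one column per position relative to the seed (0).
EXAMPLE_CSV = """genome,-2,-1,0,1,2
g1,+OG7,+OG3,+OG1,+OG4,-OG9
g2,+OG7,+OG3,+OG1,+OG4,
g3,-OG5,+OG3,+OG1,+OG2,+OG4
"""

# '+', ASCII '-', and the en dash used as orientation marks in the text
ORIENTATION_MARKS = "+-\u2013"


def position_frequencies(csv_text, ignore_orientation=True):
    """Return {position: Counter(ortholog group -> fraction of genomes)}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    positions = [name for name in reader.fieldnames if name != "genome"]
    counts = defaultdict(Counter)
    n_genomes = 0
    for row in reader:
        n_genomes += 1
        for pos in positions:
            group = (row[pos] or "").strip()
            if not group:
                continue  # no gene at this position in this genome
            if ignore_orientation:
                group = group.lstrip(ORIENTATION_MARKS)
            counts[int(pos)][group] += 1
    # Normalise raw counts by the number of genomes in the export.
    return {
        pos: Counter({group: n / n_genomes for group, n in groups.items()})
        for pos, groups in counts.items()
    }


if __name__ == "__main__":
    freqs = position_frequencies(EXAMPLE_CSV)
    for pos in sorted(freqs):
        print(pos, freqs[pos].most_common(1))
```

Passing `ignore_orientation=False` keeps the + / – prefixes, so genes of the same ortholog group in opposite orientations are counted separately.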


Figure 2.8 – Analysis of the DCW cluster in GenCoDB

(A) The neighbourhood of mraY including all genomes in which mraY is found. (B-E) The neighbourhood of mraY with genomes from only Proteobacteria, Bacteroidetes, Actinobacteria and Firmicutes respectively.

The legend for the coloured bars is found in A. Black bars represent ortholog groups that were not among the 50 most conserved groups when considering all genomes; certain groups have been labelled for convenience. (F) A selection of Firmicutes genomes showing the distribution of genes around mraY.

Each arrow represents a gene, and its colour the assigned ortholog group. Purple represents mraY, and the other colours match the legend in A and B with slight opacity. Black arrows represent genes that are not displayed in the histogram view because they are not considered significantly conserved. (G, H) The neighbourhood of mraY with a custom selection of genomes of either rod-shaped bacteria (G) or cocci and spiral-shaped bacteria (H). The colour of the bars is consistent with the legend from A and B.

Curious about these rearrangements, we wanted to see how these changes were distributed across the bacterial kingdom and whether they were localized to particular taxa. Viewing mraY in the tree view, we see that the conservation score is strikingly lower in the Cyanobacteria and the delta/epsilon subdivisions of Proteobacteria (17.37 and 19.56, respectively) (Figure 2.5). A closer inspection in the neighbourhood view confirmed that in many genomes of these subtaxa the conserved neighbourhood around mraY was lost (Figure 2.9). We also noticed that the conserved synteny, whilst mostly similar across the different taxa, was much smaller in Firmicutes than in the other phyla, even though the conservation scores were relatively similar; in fact, the Firmicutes score was slightly higher than those of Proteobacteria and Bacteroidetes, which have large conserved syntenies (30.78 vs 28.41 and 28.27, respectively).

To investigate why this was the case, we restricted the genome selection to the four main phyla in our database: Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes.

Figure 2.9 – Loss of conserved genomic neighbourhoods of mraY in some taxa

The genomic context surrounding mraY from delta/epsilon Proteobacteria genomes (left) and Cyanobacteria genomes (right). Height of the bar represents the conservation of an ortholog group, with colours signifying the different ortholog groups. Legends are found in the top right-hand corner of each histogram.

In Proteobacteria and Bacteroidetes we see only slight disruption, and when we explored further down the taxonomic tree there was little to no disruption in the Gamma- and Betaproteobacteria (Figure 2.8B and C, Figure 2.10). In addition to the core DCW genes, many Proteobacteria genomes contain ddlB and ftsA (Figure 2.8B), which are co-conserved with mraY in 56% and 67.5% of neighbourhoods, respectively, whereas Bacteroidetes genomes contain a glutamyl-tRNA amidotransferase (yqeY) at 45.9% (Figure 2.8C). In Actinobacteria, however, we see the association of five additional ortholog groups, namely a pyridoxal phosphate homeostasis protein (ylmE) (56.6%), a polyphenol oxidase (ylmD) (47.6%, below the significance threshold), the cell division protein SepF (69.7%), an uncharacterized membrane protein (ylmG) (58.4%) and a DivIVA domain-containing protein (72.6%) (Figure 2.8D). Furthermore, these additions were accompanied by the loss of ftsA (Figure 2.8D).
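Percentages such as the 56% reported for ddlB can be read as the share of seed neighbourhoods in which an ortholog group occurs. The sketch below computes that share, assuming that presence anywhere within the window counts as co-occurrence; GenCoDB's own scoring, database correction and significance threshold are not reproduced, and the function name and toy data are hypothetical.

```python
from collections import Counter


def co_conservation_table(neighbourhoods):
    """Rank ortholog groups by the percentage of seed neighbourhoods containing them.

    neighbourhoods: one list of ortholog-group IDs per genome, holding the genes
    found within the window around the seed gene (seed itself excluded).
    """
    if not neighbourhoods:
        return []
    n = len(neighbourhoods)
    # set() so a group duplicated within one neighbourhood is counted once.
    presence = Counter(g for nb in neighbourhoods for g in set(nb))
    table = [(group, 100.0 * count / n) for group, count in presence.items()]
    # Highest percentage first; ties broken alphabetically for a stable order.
    return sorted(table, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy = [  # made-up neighbourhoods, not data from this chapter
        ["murD", "ftsW", "ddlB"],
        ["murD", "ftsA"],
        ["ddlB", "ftsA"],
        ["murD"],
    ]
    for group, pct in co_conservation_table(toy):
        print(f"{group}: {pct:.1f}%")
```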

Figure 2.10 – Strongly conserved genomic neighbourhoods of mraY in some Gamma/Betaproteobacteria

The genomic context surrounding mraY from Gammaproteobacteria genomes (left) and Betaproteobacteria genomes (right). Height of the bar represents the conservation of an ortholog group, with colours signifying the different ortholog groups. Legends are found in the top right-hand corner of each histogram.

Cell wall synthesis and division occur differently in Actinobacteria: unlike other rod-shaped bacteria, they elongate their cell wall from the poles rather than laterally along the cell length (reviewed in Pamela et al. n.d.). Moreover, unlike in the other phyla, FtsZ, a major player in the divisome, is not essential for growth in the actinobacterium Streptomyces coelicolor (McCormick et al. 1994), so it is unsurprising that the DCW cluster differs in this phylum. The biological functions of many of the genes newly associated with the cluster have yet to be determined; however, given their association with the DCW cluster in our tool, a role in cell division or septum formation seems likely. Indeed, SepF and DivIVA, which do not have close homologs in non-Terrabacteria genomes, have been shown to be crucial in Z-ring formation leading to division (Hamoen et al. 2006).

ylmE and ylmD have orthologs in the majority of bacterial species; however, according to our tool, they do not have significant conservation partners except in Terrabacteria (Figure 2.11). In Streptomyces venezuelae, deleting these two genes had no observable impact on growth rate, septum formation or sporulation (Santos-Beneit et al. 2017); however, this does not preclude a role in cell wall synthesis, for example under non-laboratory conditions. For instance, in E. coli (outside the Actinobacteria), yfiH (the ylmD homolog) was shown to be involved in preventing non-canonical amino acids from being incorporated into the peptide chain in place of L-alanine (Parveen and Reddy 2017). As Actinobacteria do not have D-alanine–D-alanine ligase (ddlB) in their DCW cluster, perhaps ylmD provides a complementary function. We also found evidence that the uncharacterized membrane protein (ylmG) may have a small role in cell division: a knockout mutant for this gene in Streptococcus pneumoniae was shown to have thinner septa and increased numbers of tetrads and diplococci, suggesting incomplete division (Fadda et al. 2003), and in chloroplasts, overexpression of the ylmG ortholog impaired chloroplast division and the distribution of chloroplast nucleoids (Kabeya et al. 2010). This illustrates the functional predictive power of conserved neighbourhoods: even without such confirmatory literature, they point to candidate roles, and they could therefore help to functionally characterise currently under-researched ortholog groups and identify the players in novel phenotypes unique to certain clades.

Figure 2.11 – Evolution of conserved neighbourhoods of ylmE and ylmD

The evolutionary history of the genomic neighbourhoods of ylmE (left) and ylmD (right). The circles represent the conservation score of the ortholog group at that taxonomic level. The arrows below the circles represent the most conserved synteny surrounding the seed gene at that taxonomic level. A gene is only considered part of the synteny if its conservation is above the significance threshold at that position. The seed gene is always shown in dark green.
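The rule in this caption, that a gene joins the synteny only if its conservation at that position exceeds the significance threshold, can be sketched as an outward walk from the seed gene. This is an assumed reading rather than GenCoDB's implementation: the walk stops at the first position that fails the threshold, and the threshold value, the `freqs` layout and the function name are hypothetical.

```python
def conserved_synteny(freqs, seed, threshold, max_offset=25):
    """Return the ordered block of ortholog groups conserved around the seed.

    freqs: {position: {ortholog_group: conservation}} with the seed at 0.
    A position contributes its most conserved group only if that value is
    strictly above `threshold`; the walk in each direction stops at the
    first position that fails (assumed contiguity rule).
    """
    block = [seed]
    for step in (-1, 1):  # -1 walks upstream, +1 walks downstream
        pos = step
        while abs(pos) <= max_offset:
            groups = freqs.get(pos)
            if not groups:
                break  # no data at this position ends the block
            group, value = max(groups.items(), key=lambda kv: kv[1])
            if value <= threshold:
                break
            if step < 0:
                block.insert(0, group)
            else:
                block.append(group)
            pos += step
    return block


if __name__ == "__main__":
    toy = {  # illustrative values only
        -2: {"ftsI": 0.8, "OG_x": 0.1},
        -1: {"murE": 0.9},
        1: {"murD": 0.85},
        2: {"ftsW": 0.4},  # below threshold, so the downstream walk stops here
        3: {"murG": 0.9},  # never reached
    }
    print(conserved_synteny(toy, "mraY", threshold=0.5))
```

Under this contiguity assumption, a strongly conserved gene beyond a poorly conserved position (murG above) is excluded; a variant that skips gaps would be an equally plausible reading of the caption.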

In Firmicutes there is significant disruption at both the 3’ and 5’ ends of the cluster (Figure 2.8E).

Upstream of mraY, the same genes are conserved as in the other phyla, but their order appears greatly intermingled. However, closer inspection of the genomes in the genome view shows that the relative order of these five genes remains the same; they have instead been interspersed, seemingly at random, with other genetic elements (Figure 2.8F). Downstream of mraY, murC and ddlB are lost and, in some cases, the same ortholog groups that appeared in the Actinobacteria neighbourhood are added, such as the DivIVA domain-containing protein and the uncharacterized membrane protein (ylmG) (Figure 2.8E). Given that Actinobacteria and Firmicutes share a more recent common ancestor with each other than with the other phyla, it is reasonable that they share rearrangements in this cluster, which suggests that these changes likely occurred before the Terrabacteria lineages diverged. Two new additions to this cluster were the RNA polymerase sigma factor RpoD and an RNA-binding protein (Figure 2.8E). RpoD is the housekeeping sigma factor active during exponential growth and up-regulates genes associated with fast growth, such as translation-associated proteins (Ozaki et al. 1991). As cell wall synthesis and cell division also proceed faster at high growth rates and are less required during other growth phases, perhaps associating this cluster with this sigma factor allows faster responses to changes in nutrient availability.
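The observation that the upstream genes keep their relative order while being interspersed with other elements amounts to a subsequence test on each genome's gene order. A minimal sketch with hypothetical function names and placeholder group IDs (`OG_A` to `OG_E` stand in for the five upstream genes); it ignores orientation and assumes each core group occurs once per genome.

```python
def preserves_relative_order(genes, core):
    """True if the groups in `core` appear in `genes` in the same order,
    allowing unrelated genes in between (i.e. `core` is a subsequence)."""
    remaining = iter(genes)
    # `c in remaining` consumes the iterator up to the first match, so each
    # core group must be found after the previous one.
    return all(c in remaining for c in core)


def insertions_between(genes, core):
    """Count non-core genes lying between the first and last core gene."""
    core_set = set(core)
    hits = [i for i, g in enumerate(genes) if g in core_set]
    if not hits:
        return 0
    return sum(1 for g in genes[hits[0]:hits[-1] + 1] if g not in core_set)


if __name__ == "__main__":
    core = ["OG_A", "OG_B", "OG_C", "OG_D", "OG_E"]  # placeholder IDs
    interspersed = ["OG_A", "x1", "OG_B", "OG_C", "x2", "x3", "OG_D", "OG_E"]
    shuffled = ["OG_B", "OG_A", "OG_C", "OG_D", "OG_E"]
    print(preserves_relative_order(interspersed, core), insertions_between(interspersed, core))
    print(preserves_relative_order(shuffled, core), insertions_between(shuffled, core))
```

A genome that passes the order test but has a non-zero insertion count matches the pattern described above: conserved order, interrupted by other genetic elements.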

Here, we would like to mention two important notes that highlight the benefits of the flexible customization of GenCoDB. Firstly, at the default parameters, the murE and murF genes belong to the same ortholog group. This would occur if the protein sequences of these two genes were very similar to one another; indeed, it has been shown that murE and murF, despite not having high overall sequence similarity, share highly conserved motif regions and most likely diverged from a recent common ancestor (Bouhss et al. 1997). Without prior knowledge of the cluster, it is not clear that these are two separate genes with two functions; however, by adjusting the ortholog grouping to a lower level, in this case phylum, murE and murF cluster into distinct groups. Secondly, ftsL is only found in the Proteobacteria, despite being highly conserved in this cluster in all phyla. This is because, although it is recognized as the same gene, the sequence differences between ftsL proteins cause them to be clustered separately even at the least sensitive ortholog grouping level.

Connecting these observations with the literature demonstrates the power of genetic context analysis for hypothesis generation; however, it can also provide confirmatory evidence for existing research questions. Tamames (2001) found that the conservation of this cluster was correlated with cell morphology, specifically rod-shaped cells. To test this observation, we looked at the neighbourhood of mraY in known rod-shaped Bacilli and other rod-shaped or filamentous bacteria (e.g. Actinomyces, Clostridium, Enterobacter) and compared it to the neighbourhood in non-rod bacteria (e.g. cocci such as Streptococcus, Enterococcus and Neisseria, and spiral-shaped bacteria such as Helicobacter, Campylobacter and Leptospira). In rod-shaped bacteria there is a strongly conserved neighbourhood around mraY, whereas in spiral and coccoid bacteria there is little to no conservation surrounding mraY, confirming what was reported by Tamames (2001) (Figure 2.8G and H). Given this striking difference, it is tempting to propose that if the DCW cluster is broken by random rearrangement events, the interplay between the different proteins of this cluster is disrupted and the coordination required to form a rod-shaped cell wall is lost. Alternatively, the selective pressures that maintain the DCW cluster may only be present in rod-shaped bacteria, so that if an organism loses this cell morphology through disruptions in other parts of the genome, reshuffling of the DCW cluster would then be permitted. Further laboratory investigation, particularly of organisms that do not follow the trend of a rod shape coinciding with a conserved DCW cluster, would be required to tease these two alternatives apart.
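The rod versus non-rod comparison can be approximated outside the tool by grouping genomes by a user-supplied morphology label and measuring how much of a chosen DCW gene set is retained around the seed. This is a simplified stand-in for GenCoDB's custom genome selection, not its conservation score: the morphology labels, the gene subset, the toy genomes and the function names are all hypothetical.

```python
from statistics import mean


def retained_fraction(neighbourhood, core_genes):
    """Fraction of `core_genes` present in one genome's seed neighbourhood."""
    present = set(neighbourhood)
    return sum(g in present for g in core_genes) / len(core_genes)


def compare_by_morphology(genomes, core_genes):
    """Mean retained fraction per morphology label.

    genomes: {genome_id: (morphology, [ortholog groups around the seed])}
    """
    by_shape = {}
    for shape, neighbourhood in genomes.values():
        by_shape.setdefault(shape, []).append(retained_fraction(neighbourhood, core_genes))
    return {shape: mean(values) for shape, values in by_shape.items()}


if __name__ == "__main__":
    core = ["murD", "ftsW", "murG", "murC", "ddlB"]  # illustrative DCW subset
    toy = {  # made-up genomes and labels, not data from this chapter
        "rod_1": ("rod", ["murD", "ftsW", "murG", "murC", "ddlB"]),
        "rod_2": ("rod", ["murD", "ftsW", "murG", "murC"]),
        "coccus_1": ("coccus", ["murD"]),
        "spiral_1": ("spiral", []),
    }
    print(compare_by_morphology(toy, core))
```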