
Discussion: conserved non-coding elements

The focus of CNE research so far has been on vertebrate genomes (Polychronopoulos et al., 2017). CNEs have been identified in many vertebrate species and lineages. This includes distantly related species such as human and pufferfish, whose last common ancestor lived about 430 million years ago (Aparicio et al., 1995). Even after such a long divergence time, vertebrate genomes tend to retain high sequence similarity between species. This is reflected, for example, in the high alignment rate between human and pufferfish: 12 % of the pufferfish genome can be aligned to the human genome. In insects, research has focused on UCEs, mostly in Drosophilids.

After the same time span, insect genomes are more divergent than vertebrate genomes.

Between different Hymenoptera genomes with a divergence time of 240 million years (Misof et al., 2014), we were able to align 2-10 %. This alignment rate was sufficient to identify CNEs, because CNEs are exactly the highly conserved regions we are interested in.

The biggest hurdle for the identification of CNEs is the limited availability of well-sequenced and well-annotated genomes from closely related species. Depending on the method used, at least one well-sequenced and annotated genome is necessary to identify CNEs. This species serves as the reference for identifying conserved regions in the other genomes, regardless of whether whole genome alignments or a sliding window approach is used.

Using whole genome alignments (WGAs) for CNE identification requires additional high-quality, well-annotated genomes. It also requires seeding schemes specialised for the species being aligned. In our work we used the MAM8 seeding scheme, which is based on substitution patterns in mammals (Frith and Noé, 2014). The WGAs we used might be improved by an insect- or arthropod-specific seeding scheme, which does not currently exist.

So far, research on the gene that a CNE or a whole CNE cluster interacts with has focused on protein-coding genes. We could show that lncRNAs also occur at distances and orientations relative to CNE clusters that could point towards an interaction between the two and an additional protein-coding gene. For each species, we calculated the lncRNA to protein-coding gene ratio, both for the whole annotation and for the identified cluster partners. For N. vitripennis, an lncRNA was twice as likely to neighbour a cluster in cis-direction as would be expected by chance. This number depends strongly on the genome annotation and the assembly quality, since some studies assume that lncRNAs vastly outnumber protein-coding genes (Quinn and Chang, 2016). The total number of genes (including N/A) might also be lower than the number of clusters would suggest. In some cases a cluster lay between a gene and another cluster, so both clusters had the same gene identified as their neighbour.
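The enrichment described above compares two lncRNA to protein-coding ratios. The following is a minimal Python sketch of that comparison. It assumes that "twice as likely" means the ratio among cis cluster partners divided by the ratio in the whole annotation, and that clusters without a partner (N/A) are simply not counted. The names `GeneCounts` and `enrichment` are hypothetical, and the counts are invented for illustration; they are not the N. vitripennis data.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Hypothetical container for gene counts by biotype."""
    lncrna: int
    protein_coding: int

    def lncrna_ratio(self) -> float:
        # lncRNA : protein-coding ratio; undefined without protein-coding genes
        if self.protein_coding == 0:
            raise ValueError("no protein-coding genes counted")
        return self.lncrna / self.protein_coding


def enrichment(whole_annotation: GeneCounts, cluster_partners: GeneCounts) -> float:
    """Ratio among cis cluster partners divided by the genome-wide ratio.

    1.0 means lncRNAs neighbour clusters as often as the annotation alone
    predicts; 2.0 means twice as often (assumed reading of the text).
    """
    return cluster_partners.lncrna_ratio() / whole_annotation.lncrna_ratio()


# Invented counts for illustration only.
genome = GeneCounts(lncrna=2000, protein_coding=20000)   # ratio 0.1
partners = GeneCounts(lncrna=40, protein_coding=200)      # ratio 0.2
print(enrichment(genome, partners))  # 2.0
```

Because both ratios come from the same annotation, a genome with under-annotated lncRNAs lowers both numbers. This is one reason the value remains sensitive to annotation quality.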

Still, the majority of our gene-CNE cluster neighbours were not real interactions. In these cases either no gene was found within 500 kb of the cluster, or the gene's orientation towards the cluster was not cis. Because very little is known about the interaction between CNEs and their genes, we assumed that their orientation relative to each other is important. If future research shows that orientation is not important, the number of genes identified as potential interaction partners for a CNE cluster could change considerably.
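The neighbour search described above can be sketched in Python under explicit assumptions. The text does not define "cis" precisely. Here it is assumed to mean that the gene's 5' end faces the cluster. For a gene to the left, that is a gene on the "-" strand; for a gene to the right, a gene on the "+" strand. Genes overlapping the cluster are ignored. `Gene`, `Cluster` and `cis_neighbours` are hypothetical names, and coordinates are treated as half-open intervals.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DISTANCE = 500_000  # 500 kb search window, as in the text


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Cluster:
    chrom: str
    start: int
    end: int


def cis_neighbours(cluster: Cluster, genes: list[Gene]) -> dict[str, Optional[Gene]]:
    """Nearest gene on each side of the cluster, kept only if it lies within
    MAX_DISTANCE and its 5' end faces the cluster (assumed meaning of 'cis')."""
    left = [g for g in genes if g.chrom == cluster.chrom and g.end <= cluster.start]
    right = [g for g in genes if g.chrom == cluster.chrom and g.start >= cluster.end]
    result: dict[str, Optional[Gene]] = {"left": None, "right": None}
    if left:
        nearest = max(left, key=lambda g: g.end)
        # A "-" strand gene left of the cluster has its 5' end facing it.
        if cluster.start - nearest.end <= MAX_DISTANCE and nearest.strand == "-":
            result["left"] = nearest
    if right:
        nearest = min(right, key=lambda g: g.start)
        # A "+" strand gene right of the cluster has its 5' end facing it.
        if nearest.start - cluster.end <= MAX_DISTANCE and nearest.strand == "+":
            result["right"] = nearest
    return result


# Toy annotation for illustration only.
genes = [
    Gene("geneA", "chr1", 100_000, 120_000, "-"),
    Gene("lncB", "chr1", 300_000, 305_000, "+"),
    Gene("geneC", "chr1", 900_000, 950_000, "+"),
]
hits = cis_neighbours(Cluster("chr1", 150_000, 180_000), genes)
print({side: g.name if g else None for side, g in hits.items()})
# {'left': 'geneA', 'right': 'lncB'}
```

If orientation turned out not to matter, dropping the two strand checks would be enough to count every gene within 500 kb. This makes it easy to test how sensitive the partner counts are to that assumption.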

It has been shown that the protein-coding genes associated with CNEs are mostly involved in developmental regulation. This is also an area where lncRNAs have one of their functions. The problem with lncRNAs is that their general functions are known, i.e. what the class does as a whole, but the function of a specific lncRNA is known only for a small number of them. One example is sphinx, which regulates male courtship behaviour (Legeai and Derrien, 2015). Their high abundance combined with their presumed function makes lncRNAs a point of interest regarding CNEs. So far, a possible interaction between CNEs, a developmental protein-coding gene, and an lncRNA has not been studied. Because we only performed computational analyses of CNEs, we cannot say whether an lncRNA neighbouring a CNE is really involved in a CNE-gene interaction. But the idea should be investigated further. One approach is to examine genomes with better-studied lncRNAs to see whether this relationship also exists there; another is to use experimental set-ups.

We looked at the synteny of those CNE clusters that have an lncRNA as their possible interaction partner. There does seem to be synteny of the clusters between different species: in all cases we examined, this synteny was at least partially conserved. However, a cluster might not be identified in one species because its single CNEs lie farther apart than our cluster definition allows. Recombination maps would be an interesting further study to see how much recombination actually happens inside a CNE cluster. We did not investigate what created the different distances between single CNEs. In addition, the definition of a CNE cluster, a maximum distance of 20 kb between two CNEs, is somewhat arbitrary. An expanded cluster definition could probably show that our syntenic CNE hits are arranged in clusters in more than one genome, but only if no genome rearrangements have occurred. As we did not look into the clusters that had only protein-coding genes next to them, we cannot conclude an association between the lncRNAs and the synteny.
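The 20 kb cluster definition can be written as a single-linkage grouping of sorted CNE intervals. The following is a minimal Python sketch. It assumes CNEs are given as `(chrom, start, end)` tuples and that a cluster needs at least two CNEs. The `min_cnes=2` threshold and the function name `cluster_cnes` are assumptions, not taken from the text. Changing `max_gap` shows how a looser definition would merge CNEs that the 20 kb cut-off keeps apart.

```python
from itertools import groupby

MAX_GAP = 20_000  # maximum distance between two CNEs in one cluster (20 kb)


def cluster_cnes(cnes, max_gap=MAX_GAP, min_cnes=2):
    """Group (chrom, start, end) CNE intervals into clusters.

    Sorted CNEs on the same chromosome join the current cluster while the gap
    to the furthest end seen so far is at most max_gap. Clusters with fewer
    than min_cnes members are dropped (min_cnes=2 is an assumption).
    """
    clusters = []
    for _chrom, group in groupby(sorted(cnes), key=lambda c: c[0]):
        current = []
        cur_end = 0
        for cne in group:
            _, start, end = cne
            if current and start - cur_end > max_gap:
                # Gap too large: close the current cluster and start a new one.
                if len(current) >= min_cnes:
                    clusters.append(current)
                current = []
            if not current:
                cur_end = end
            current.append(cne)
            cur_end = max(cur_end, end)
        if len(current) >= min_cnes:
            clusters.append(current)
    return clusters


# Toy CNE coordinates for illustration only.
cnes = [
    ("chr1", 0, 200), ("chr1", 10_000, 10_300), ("chr1", 25_000, 25_100),
    ("chr1", 60_000, 60_150), ("chr2", 5_000, 5_200), ("chr2", 20_000, 20_100),
]
result = cluster_cnes(cnes)
print([len(c) for c in result])  # [3, 2]
```

With `max_gap=40_000`, the fourth chr1 CNE would also join the first cluster. This is the kind of re-clustering that could reveal syntenic hits that the stricter definition splits.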

Some studies have shown that these conserved regions harbour transposable elements (TEs), although it is not yet clear whether TE insertion is enhanced in these regions (Manee et al., 2018). Inserted TEs could be responsible for the different distances. To show whether TEs are indeed found between the different CNEs, a comparison with a TE annotation of the genome is needed.
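Such a comparison amounts to an interval overlap between the gaps inside a CNE cluster and the annotated TEs. The following is a minimal Python sketch. It assumes a sorted cluster of `(chrom, start, end)` CNEs and TEs as `(chrom, start, end, name)` tuples with half-open coordinates. The function name `tes_between_cnes` is hypothetical. The linear scan is only meant for illustration; for genome-wide annotations, an interval tool such as `bedtools intersect` would be the more practical choice.

```python
def tes_between_cnes(cluster, tes):
    """For each gap between consecutive CNEs of one cluster, list the TEs
    overlapping that gap.

    cluster: list of (chrom, start, end) CNEs, sorted by start.
    tes: list of (chrom, start, end, name) TE annotations.
    """
    gaps = []
    for (chrom, _, left_end), (_, right_start, _) in zip(cluster, cluster[1:]):
        if right_start <= left_end:
            # Overlapping or adjacent CNEs leave no gap to inspect.
            gaps.append(((chrom, left_end, right_start), []))
            continue
        # Half-open overlap test between the TE and the gap [left_end, right_start).
        hits = [te for te in tes
                if te[0] == chrom and te[1] < right_start and te[2] > left_end]
        gaps.append(((chrom, left_end, right_start), hits))
    return gaps


# Toy cluster and TE annotation for illustration only.
cluster = [("chr1", 0, 200), ("chr1", 10_000, 10_300), ("chr1", 25_000, 25_100)]
tes = [
    ("chr1", 3_000, 9_000, "TE1"), ("chr1", 12_000, 13_000, "TE2"),
    ("chr1", 14_000, 15_000, "TE3"), ("chr2", 12_000, 13_000, "TE4"),
]
result = tes_between_cnes(cluster, tes)
print([[te[3] for te in hits] for _, hits in result])  # [['TE1'], ['TE2', 'TE3']]
```

Comparing the summed TE length per gap between species would indicate whether TE insertions can explain the differing CNE distances.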

Our study showed that CNEs are still identifiable over an evolutionary distance of 240 million years in insect lineages with low genome similarity. In vertebrates, the CNEs conserved between distantly related lineages have been shown to differ from those found in more closely related groups (i.e. mammals), so there is only a partial overlap between these CNE sets (Woolfe et al., 2004). It would be interesting to see whether this also holds true for Hymenoptera. It would also be interesting to see how large the divergence time between species has to be before CNEs can no longer be identified. This also raises the question of whether there are CNEs conserved across all Metazoa. The first step would be to determine how much of these highly divergent genomes can still be aligned.