
2. MATERIALS AND METHODS

2.5. Molecular methods

2.5.13. Sequence analyses (filtering and clustering of raw reads)

2.5.13.1. Analyses of sequences derived from pyrosequencing

The sequence analyses of bacterial genes (i.e., 16S rRNA and mxaF/xoxF gene sequences) were conducted at the department EMIC of the University of Bayreuth, and the sequence analyses of fungal genes (i.e., ITS gene sequences) were conducted at the Department of Soil Ecology, UFZ - Helmholtz Centre for Environmental Research, Halle, by Dr. Guillaume Lentendu. Consequently, the filtering of sequences and the clustering of amplicon pyrosequencing reads differed between the bacterial and the fungal data sets. In both cases, reads clustered according to sequence similarity were used for further analyses (see 2.5.14, 2.5.15, 2.5.17).

Recovered reads of bacterial genes were trimmed to 446 nt (for 16S rRNA amplicons) and 461 nt (for mxaF amplicons), so that the reverse primer sequence was mostly removed.

Amplicon pyrosequencing errors were corrected using ACACIA, i.e., homopolymer errors were corrected and low-quality reads were discarded [Bragg et al., 2012]. Potential chimeric 16S rRNA sequences were filtered out using the UCHIME algorithm implemented in USEARCH and the latest RDP Gold database of high-quality 16S rRNA gene reference sequences [Edgar et al., 2011]. Before sequence clustering, the initial barcode sequences were modified to re-assign amplicons (see Table A 1). Using JAguc v2.1 [Nebel et al., 2011], sequences were clustered into operational taxonomic units (OTUs) by pairwise sequence alignment, followed by the calculation of a distance matrix and clustering with the average similarity method. Only sequences with the correct forward primer sequence were further analysed. 16S rRNA OTUs were clustered at family level with a pairwise similarity cut-off value of 90.1 % [Yarza et al., 2010], and mxaF OTUs were clustered with a cut-off value of 90 %. The mxaF cut-off was higher than previously reported [Stacheter et al., 2013] in order to resolve a higher diversity while still yielding a relatively constant number of retrieved OTUs (Figure 32). An overview of the total number of sequences derived from pyrosequencing and after clustering is given in Table A 4. The phylogenetic affiliation of ribosomal sequences was determined by a local nucleotide BLAST against the latest NCBI GenBank release. The affiliation was verified by manual BLAST of the representative sequence of each OTU and by phylogenetic trees constructed with MEGA version 6.06 [Tamura et al., 2013]. The affiliation of mxaF OTUs was determined by manual BLAST and phylogenetic trees.

Figure 32 Correlation between the number of detected phylotypes and the nucleotide sequence similarities of mxaF gene sequences.

The correlation between the number of detected phylotypes and the nucleotide sequence similarity, based on all detected mxaF gene sequences of both SIP experiments (i.e., 113 689 sequences), was used to determine the similarity threshold value of 90 % chosen for clustering and further analyses. The inset focuses on the sequence similarity range between 80 % and 100 %. This figure has been published in Morawe et al. 2017.
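The relationship summarised in Figure 32 can be illustrated with a minimal sketch that clusters sequences by average linkage at several similarity cut-offs and counts the resulting OTUs. This is not the JAguc implementation. As a simplifying assumption, similarity is computed as the fraction of identical positions between equal-length (pre-aligned) sequences, and the function names `identity`, `average_linkage_otus` and `otu_counts` are hypothetical.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length (aligned) sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def average_linkage_otus(seqs: list[str], cutoff: float) -> list[list[int]]:
    """Merge clusters while the best average pairwise similarity is >= cutoff.

    Returns clusters as lists of indices into `seqs`.
    """
    sim = {
        (i, j): identity(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }
    clusters = [[i] for i in range(len(seqs))]

    def avg_sim(c1: list[int], c2: list[int]) -> float:
        total = sum(sim[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        best, (p, q) = max(
            (avg_sim(clusters[p], clusters[q]), (p, q))
            for p, q in combinations(range(len(clusters)), 2)
        )
        if best < cutoff:
            break
        # q > p always holds, so deleting q leaves index p valid.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


def otu_counts(seqs: list[str], cutoffs: list[float]) -> dict[float, int]:
    """Number of OTUs obtained at each similarity cut-off."""
    return {c: len(average_linkage_otus(seqs, c)) for c in cutoffs}
```

Plotting the values returned by `otu_counts` against the cut-offs gives a curve of the kind shown in Figure 32. The quadratic search over cluster pairs is only practical for small examples.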

Recovered reads of fungal genes were demultiplexed and quality-trimmed using MOTHUR [Schloss et al., 2009]. Reads that met the following criteria were further analysed: presence of one of the expected barcodes (1 mismatch allowed; for barcode sequences see Table A 2) and of the forward fusion primer sequence (includes ITS4; 4 mismatches allowed), a minimum length of 355 nt, a minimum average Phred quality score of 29 over the first 355 nucleotides, a maximum homopolymer length of 8 nt, and no ambiguous nucleotides. The reads were cut to their first 355 nucleotides to avoid low-quality ends and length sorting in the following clustering step. Normalised reads (1503 counts per sample) were checked for chimeric sequences using UCHIME [Edgar et al., 2011] as implemented in MOTHUR. Unique sequences were sorted by decreasing abundance and clustered into OTUs using CD-HIT-EST [Fu et al., 2012] at a 97 % pairwise similarity cut-off value. Low-abundance OTUs with three or fewer reads were removed as they potentially originated from artificial sequences [Kunin et al., 2010]. Representative OTU sequences were classified against the dynamic UNITE database (v7, release 01.08.2015 [Kõljalg et al., 2013]) using the MOTHUR implementation of the classifier of Wang et al. (2007). Sequences that could not be assigned further than to the kingdom Fungi were classified a second time against a previous database including non-fungal ITS sequences retrieved from GenBank (release 207, accessed on 06.05.2015 [Benson et al., 2008]) in order to detect and remove non-fungal sequences. Subsequently, the remaining sequences assigned only to the kingdom Fungi were classified against the full UNITE database to improve the taxonomic affiliation. In addition, the representative sequences of selected OTUs were manually identified by ‘massBLASTer analyses’ against the UNITE database to confirm their affiliation.
An overview of the total number of sequences derived from pyrosequencing and after clustering is given in Table A 4.
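The read-level criteria applied to the fungal reads can be sketched as a single filter function. This is an illustration only, not the MOTHUR implementation. It assumes that each read starts with the barcode directly followed by the forward primer, that mismatches are counted position by position without indels, and that the length, quality, homopolymer and ambiguity criteria are applied after barcode and primer removal. The names `filter_read`, `starts_with` and `longest_homopolymer` are hypothetical.

```python
import re


def mismatches(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def starts_with(seq: str, tag: str, max_mismatch: int) -> bool:
    """True if `seq` begins with `tag`, allowing up to `max_mismatch` substitutions."""
    return len(seq) >= len(tag) and mismatches(seq[: len(tag)], tag) <= max_mismatch


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated character."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def filter_read(seq, quals, barcodes, primer,
                min_len=355, min_mean_q=29, max_homopolymer=8):
    """Return the read cut to `min_len` nucleotides, or None if it fails a criterion.

    `quals` is a list of Phred scores, one per nucleotide of `seq`.
    """
    # Expected barcode, 1 mismatch allowed.
    barcode = next((b for b in barcodes if starts_with(seq, b, 1)), None)
    if barcode is None:
        return None
    seq, quals = seq[len(barcode):], quals[len(barcode):]
    # Forward primer, 4 mismatches allowed.
    if not starts_with(seq, primer, 4):
        return None
    seq, quals = seq[len(primer):], quals[len(primer):]
    if len(seq) < min_len:
        return None
    # Mean Phred score over the first `min_len` nucleotides.
    if sum(quals[:min_len]) / min_len < min_mean_q:
        return None
    if set(seq) - set("ACGT"):  # ambiguous nucleotides such as N
        return None
    if longest_homopolymer(seq) > max_homopolymer:
        return None
    return seq[:min_len]
```

The default parameter values are those stated in the text; smaller values can be passed for testing on short toy reads.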

For general community analyses and the identification of labelled phylotypes of bacterial and fungal genes (see 2.5.14, 2.5.17.4, 2.5.17.5), sequences occurring only once within the complete data set of all amplicon libraries were considered artefacts and were removed, whereas sequences occurring only once within an individual amplicon library but also present in other libraries were retained.
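The distinction between data-set-wide singletons and per-library singletons can be made explicit with a short sketch. It assumes that each amplicon library is represented as a mapping from sequence to read count; the function name `remove_global_singletons` and this data layout are illustrative assumptions, not part of the original workflow.

```python
from collections import Counter


def remove_global_singletons(libraries: dict[str, Counter]) -> dict[str, Counter]:
    """Drop sequences whose total count over all libraries is exactly 1.

    A sequence that occurs once in one library but also occurs in another
    library has a total count above 1 and is therefore kept.
    """
    totals = Counter()
    for counts in libraries.values():
        totals.update(counts)
    return {
        name: Counter({seq: n for seq, n in counts.items() if totals[seq] > 1})
        for name, counts in libraries.items()
    }
```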

2.5.13.2. Analyses of sequences derived from synthesis-sequencing

The sequence analysis of raw reads obtained by ‘ILLUMINA sequencing’ was conducted by our cooperation partner at the Institute de botanique, Laboratoire GMG, Equipe AIME, Strasbourg (analyses done by Dr. Ludovic Besaury).

For the analyses of 16S rRNA genes, the Illumina reads were analysed using the mothur software package v.1.33.2 [Kozich et al., 2013] with the default parameters of the MiSeq standard operating protocol (http://www.mothur.org/wiki/MiSeq_SOP). Read pairs were assembled into contigs, and contigs shorter than 420 bp or longer than 460 bp were discarded. Sequences were pre-clustered into groups of sequences with up to 2 nucleotide differences. Putative chimeric sequences were predicted by UCHIME [Edgar et al., 2011] and subsequently removed. The remaining sequences were assigned using naïve Bayesian taxonomic classification against the bacterial reference database SILVA (SSU_Ref database v.119) with a bootstrap cut-off of 80 %. Only sequences affiliated with the domains Bacteria and Archaea were retained; all other sequences were discarded. The clustering into OTUs was done at a similarity threshold of 98 % sequence identity using the automated protocol within mothur. This also yielded a representative consensus sequence for each OTU, chosen as the most abundant sequence in the given OTU, which was subsequently used for sequence alignments and further analyses.
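The contig length filter and the pre-clustering step can be sketched as follows. This is a simplified illustration and not mothur's own routine: it assumes that sequences of equal length are already aligned, counts differences position by position, and greedily merges each rarer sequence into the first more abundant sequence within the allowed number of differences. The names `keep_contig` and `pre_cluster` are hypothetical.

```python
from collections import Counter


def keep_contig(seq: str, min_len: int = 420, max_len: int = 460) -> bool:
    """Keep contigs whose length lies within [min_len, max_len] bp."""
    return min_len <= len(seq) <= max_len


def differences(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pre_cluster(counts: Counter, max_diff: int = 2) -> Counter:
    """Merge rarer sequences into more abundant ones within `max_diff` differences."""
    merged = Counter()
    # Most abundant first; ties are broken alphabetically for determinism.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in merged:
            if len(centroid) == len(seq) and differences(centroid, seq) <= max_diff:
                merged[centroid] += n
                break
        else:
            # No sufficiently similar abundant sequence: start a new group.
            merged[seq] = n
    return merged
```

Because the most abundant sequence of each group becomes its centroid, the same ordering also illustrates how a representative sequence can be chosen as the most abundant member of a group.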

For the analyses of functional genes (i.e., mxaF/xoxF and cmuA), the raw reads were processed using the mothur software package v.1.33.2 [Kozich et al., 2013], including length filtering and quality trimming, and allowing sequence lengths within 20 nucleotides of the expected amplicon length. The USEARCH software [Edgar, 2010] was used for clustering the filtered reads. Sequences occurring only once within the complete data set of gene amplicon libraries were considered artefacts and were removed, whereas singletons in individual amplicon libraries were retained. Reads were clustered iteratively at progressively lower cut-off values, and the maximum cut-off value at which the number of retrieved OTUs stabilised was selected, following Stacheter and colleagues (77 % for mxaF/xoxF-type MDH) and the similarity cut-off value of the other SIP experiments (90 % for cmuA) (see 2.5.13.1, Figure 32). For all MDH-related gene sequences, only forward reads (R1) were analysed in this manner because of the amplicon length (569 bp) and the poor results obtained when forward and reverse reads were assembled. Consensus sequences for each OTU were provided by mothur. These sequences were compared against a gene-specific database generated from GenBank using BLAST (http://blast.ncbi.nlm.nih.gov) for taxonomic identification. Thus, a cluster table for each gene and each sample was obtained and used for further analyses.
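One possible way to make the selection of a cut-off at which the OTU number stabilises operational is sketched below. The text does not define a numerical criterion, so the relative tolerance used here is purely an assumption, as are the function name `stable_cutoff` and the input format (a mapping from similarity cut-off to the number of OTUs obtained at that cut-off).

```python
def stable_cutoff(otu_counts: dict[float, int], tolerance: float = 0.05):
    """Return the highest cut-off after which the OTU number changes little.

    Cut-offs are visited from high to low. The first cut-off whose OTU count
    differs from the count at the next lower cut-off by at most `tolerance`
    (as a fraction of the higher-cut-off count) is returned. Returns None if
    no such cut-off exists. The tolerance is an assumed, illustrative value.
    """
    cutoffs = sorted(otu_counts, reverse=True)
    for high, low in zip(cutoffs, cutoffs[1:]):
        n_high, n_low = otu_counts[high], otu_counts[low]
        if n_high and abs(n_high - n_low) / n_high <= tolerance:
            return high
    return None
```

In practice, such a rule would be checked against a plot of OTU number versus cut-off (compare Figure 32) rather than applied blindly.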