
2.5.1 Read alignment and quantification

Read alignment to the reference genome – In total, there were six samples (tetrads CO, tetrads HS, post-meiotic CO, post-meiotic HS, mature CO and mature HS) that were run in biological triplicates, resulting in a total number of 18 MACE libraries. The reads of the 18 MACE libraries were aligned to the reference genome of tomato (SL3.0; cultivar Heinz), which was downloaded from the SGN (see 2.2.4). The alignment was performed with NextGenMap (see 2.3.12), which was executed with default parameters except for the following modifications: --kmer-skip 1, --silent-clip and --no-unal.

Subsequently, all alignments with an edit distance (sum of insertions, deletions and mismatches) greater than two were eliminated from the reported SAM file.
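The filtering step can be illustrated with a minimal Python sketch. It assumes that the aligner reports the edit distance of each alignment in the optional NM:i tag of the SAM record and that alignments lacking this tag are discarded; both points, as well as the function names, are assumptions for illustration and not a description of the script actually used.

```python
def edit_distance(sam_line):
    """Return the NM:i value of a SAM alignment line, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns of a SAM record are mandatory; optional tags follow.
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def filter_sam(lines, max_edit_distance=2):
    """Keep header lines and alignments with an edit distance <= max_edit_distance."""
    kept = []
    for line in lines:
        if line.startswith("@"):  # header lines are passed through unchanged
            kept.append(line)
            continue
        nm = edit_distance(line)
        # Assumption: alignments without an NM tag are dropped.
        if nm is not None and nm <= max_edit_distance:
            kept.append(line)
    return kept
```

Applied to the lines of a SAM file, `filter_sam` returns the header together with all alignments whose edit distance does not exceed two.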

Quantification and normalization – As only a single isoform is annotated for each protein-encoding gene in the generic feature format (GFF) file of tomato, no distinction was made between different mRNAs from the same gene. The quantification of the mRNA level of a gene was therefore based on the total number of reads aligned to the gene, which was determined with the htseq-count script of HTSeq (see 2.3.7). The inputs to the htseq-count script were the SAM files with the alignments and the GFF file of tomato (ITAG3.2; see 2.2.4). To account for differences in the number of aligned reads between the MACE libraries, the read counts were normalized library-wise: the read count of each mRNA was divided by the total number of aligned reads of the library and multiplied by one million, which yielded TPM values for all mRNAs.
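The library-wise normalization can be sketched as follows. The function name and the gene identifiers are hypothetical, and the total number of aligned reads is passed explicitly because the text does not state whether reads outside annotated genes are included in this total.

```python
def normalize_library(counts, total_aligned_reads):
    """Scale the raw read counts of one library to per-million values.

    counts: dict mapping a gene identifier to its read count.
    total_aligned_reads: total number of aligned reads in the library
        (how this total is obtained is an assumption left to the caller).
    """
    if total_aligned_reads <= 0:
        raise ValueError("total_aligned_reads must be positive")
    return {
        gene: count / total_aligned_reads * 1_000_000
        for gene, count in counts.items()
    }


# Hypothetical library with 200 aligned reads in total.
library = {"geneA": 50, "geneB": 150}
normalized = normalize_library(library, total_aligned_reads=200)
```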

2.5.2 Threshold estimation and data quality control via PCA

Threshold estimation – To determine the threshold above which an mRNA is assumed to be genuinely detected in a MACE library, all mRNAs not detected in every biological replicate of a sample were extracted. Next, these mRNAs were used to generate a distribution of replicate-averaged TPM values for each of the six samples, whereby only replicates in which the mRNA was detected were taken into account. Based on the 95th percentiles of the six distributions, a detection threshold of 1 TPM was determined. Finally, all mRNAs detected (≥1 TPM) in at least two out of the three replicates were considered as detected for a sample. The TPM values of mRNAs not detected in a sample were set to 0 for all three replicates.
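The final detection rule (≥1 TPM in at least two of three replicates, otherwise all replicate values set to 0) can be written as a short sketch. The function name and the data layout (one list of replicate TPM values per mRNA) are assumptions for illustration.

```python
def apply_detection_threshold(tpm_by_gene, threshold=1.0, min_replicates=2):
    """Zero out mRNAs that are not detected in enough replicates of a sample.

    tpm_by_gene: dict mapping a gene identifier to the list of its TPM
        values in the replicates of one sample.
    """
    result = {}
    for gene, replicates in tpm_by_gene.items():
        n_detected = sum(1 for value in replicates if value >= threshold)
        if n_detected >= min_replicates:
            result[gene] = list(replicates)  # detected: values are kept
        else:
            result[gene] = [0.0] * len(replicates)  # not detected: set to 0
    return result


# Hypothetical sample with two mRNAs and three replicates each.
sample = {"geneA": [1.5, 0.2, 3.0], "geneB": [0.5, 0.9, 4.0]}
filtered = apply_detection_threshold(sample)
```

In this example geneA passes the rule in two replicates and keeps its values, whereas geneB passes in only one replicate and is set to 0 throughout.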

PCA – To obtain a first overview of the behavior of the replicates and samples, a PCA was performed with FactoMineR (see 2.3.6). The log10-transformed TPM values of the 18 MACE libraries served as input.

2.5.3 Analysis of stage-accumulated mRNAs

Identification of stage-accumulated mRNAs – To identify mRNAs accumulated in a single or in two consecutive developmental stages (stage-accumulated mRNAs), a differential regulation analysis was performed with DESeq2 (see 2.3.4), taking the read counts of the replicates of all CO samples as input. In total, four statistical tests were performed independently. These comprised a likelihood ratio test for the identification of mRNAs with significantly different levels between any two stages (adjusted p-value <0.05) and three pairwise Wald tests (tetrads vs post-meiotic, tetrads vs mature and post-meiotic vs mature) to determine between which stages the difference occurs (adjusted p-value <0.05 and a log2 fold change <-1 or >1). The outcome of this procedure was used for the determination of stage-accumulated mRNAs. mRNAs were considered as accumulated in a single stage if they had significantly higher levels in this stage than in the other two stages. Further, mRNAs were considered as accumulated in two consecutive stages if they showed no significant difference between the two stages but had significantly higher levels in both stages than in the remaining third stage. This approach resulted in five groups of stage-accumulated mRNAs, namely mRNAs accumulated in tetrads, tetrads + post-meiotic, post-meiotic, post-meiotic + mature and mature.
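The assignment of an mRNA to one of the five groups can be sketched in Python. The sketch assumes that each pairwise Wald test has already been reduced to one of the calls "higher", "lower" or "ns" (first stage relative to the second) and that an mRNA must also pass the likelihood ratio test; how the four tests were combined in detail is not specified in the text, so this combination and all names are assumptions.

```python
STAGES = ("tetrads", "post-meiotic", "mature")


def higher(a, b, calls):
    """True if stage a has a significantly higher level than stage b."""
    if (a, b) in calls:
        return calls[(a, b)] == "higher"
    return calls[(b, a)] == "lower"


def differs(a, b, calls):
    """True if the two stages differ significantly in either direction."""
    return higher(a, b, calls) or higher(b, a, calls)


def classify(calls, lrt_significant=True):
    """Return the stage-accumulation group of an mRNA, or None."""
    if not lrt_significant:  # assumption: the LRT acts as a prerequisite
        return None
    # Single stage: significantly higher than both other stages.
    for stage in STAGES:
        others = [s for s in STAGES if s != stage]
        if all(higher(stage, other, calls) for other in others):
            return stage
    # Two consecutive stages: no difference between them, both higher
    # than the remaining third stage.
    for a, b, rest in (
        ("tetrads", "post-meiotic", "mature"),
        ("post-meiotic", "mature", "tetrads"),
    ):
        if (not differs(a, b, calls)
                and higher(a, rest, calls)
                and higher(b, rest, calls)):
            return f"{a} + {b}"
    return None


# Hypothetical mRNA: similar in tetrads and post-meiotic, both above mature.
example = {
    ("tetrads", "post-meiotic"): "ns",
    ("tetrads", "mature"): "higher",
    ("post-meiotic", "mature"): "higher",
}
group = classify(example)
```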

Functional enrichment analysis – For the functional characterization of the stage-accumulated mRNAs, all mRNAs of tomato were functionally annotated based on the MapMan ontology. For this purpose, the protein sequences of the mRNAs were submitted to the Mercator web server, which assigned MapMan terms to each protein sequence, if possible (see 2.3.11). In the next step, a functional enrichment analysis was performed to identify functional categories that are enriched among the stage-accumulated mRNAs. For this purpose, a Python script (see 2.3.13) was written, which performs a Fisher’s exact test and a subsequent correction for multiple hypothesis testing via the false discovery rate (FDR) method of Benjamini and Hochberg. The test was performed for all MapMan terms of the second hierarchy level (e.g. protein.synthesis), as this level is available for the majority of mRNAs and sufficient for a clear identification of important processes. Each MapMan term was tested for the null hypothesis that there is no dependency between the term and the stage-accumulated mRNAs, whereby all annotated mRNAs served as background. A rejection of the null hypothesis means that there is a dependency and thus an enrichment of the term among the stage-accumulated mRNAs. The reported p-value of each term was afterwards corrected for multiple hypothesis testing, and terms with an adjusted p-value <0.05 were considered as enriched.
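The two statistical steps can be illustrated with a self-contained sketch. It is not the script referred to above: the function names are hypothetical, and the test is implemented in its one-sided form (over-representation), which is an assumption, since the text does not state whether a one- or two-sided test was applied.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: stage-accumulated mRNAs annotated with the term
    n: all stage-accumulated mRNAs
    K: background mRNAs annotated with the term
    N: all background (annotated) mRNAs
    Returns P(X >= k) for X following a hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A term would then be reported as enriched if its adjusted p-value is below 0.05.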

2.5.4 Collection of Hsfs and identification of Hsps in tomato

The analysis of Hsfs and Hsps was based on known tomato Hsfs, described by Scharf et al. (2012), and predicted tomato orthologs of A. thaliana Hsps. The ortholog prediction was performed with InParanoid (see 2.3.8) by taking the proteomes of A. thaliana (TAIR10; see 2.2.5) and tomato (ITAG3.2; see 2.2.4) as input. The reported ortholog groups were afterwards screened for groups containing A. thaliana Hsps, which were described by Fragkostefanakis et al. (2015). Tomato genes assigned to the same group as A. thaliana Hsps were considered as tomato Hsps.

2.5.5 Analysis of mRNAs differentially regulated in response to HS

Differentially regulated mRNAs upon HS – For the identification of mRNAs with differentially regulated levels in response to HS, for each developmental stage a Wald test was performed with DESeq2, whereby the read counts of the replicates of the non- and heat-stressed samples were used as input.

mRNAs with an adjusted p-value <0.05 and a log2 fold change <-1 or >1 were considered as down- or upregulated, respectively.
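The classification rule can be expressed as a small function. The function name and the handling of a missing adjusted p-value (treated as not regulated) are assumptions for illustration.

```python
def classify_regulation(log2_fold_change, padj, lfc_cutoff=1.0, alpha=0.05):
    """Classify an mRNA as downregulated, upregulated or not regulated."""
    # Assumption: a missing adjusted p-value counts as not regulated.
    if padj is None or padj >= alpha:
        return "not regulated"
    if log2_fold_change > lfc_cutoff:
        return "upregulated"
    if log2_fold_change < -lfc_cutoff:
        return "downregulated"
    return "not regulated"
```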

Hsf and Hsp analysis – The HS response of the seven Hsp families (Hsp100, Hsp90, Hsp70, Hsp60, Hsp40, sHsp and Hsp10) and of the Hsf family was analyzed based on the differential expression results of each developmental stage. For this purpose, all members of the Hsp and Hsf families were categorized as downregulated, not regulated or upregulated. Afterwards, the percentage that each category constitutes within a family was determined.
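The percentage calculation per family can be sketched as follows, assuming that the category of every family member is available as a list of labels; the function name and the example family are hypothetical.

```python
from collections import Counter

CATEGORIES = ("downregulated", "not regulated", "upregulated")


def category_percentages(member_categories):
    """Return the percentage of family members falling into each category."""
    total = len(member_categories)
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    counts = Counter(member_categories)
    return {category: 100.0 * counts[category] / total for category in CATEGORIES}


# Hypothetical family with four members in one developmental stage.
family = ["upregulated", "upregulated", "upregulated", "not regulated"]
percentages = category_percentages(family)
```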