
4 Bivariate genomic prediction of phenotypes by selecting epistatic interactions across years

5.6 Outlooks and Conclusion

5.6.1 Outlook for epistasis models

In this work, we developed and evaluated epistasis models in univariate and bivariate statistical settings for prediction across environments and across years. The models were applied to doubled haploid lines derived from the European maize landraces Kemater and Petkuser, phenotyped for eight traits at six locations in 2017 and at four locations in both 2017 and 2018 in Germany and Spain.

This thesis developed two models: ERRBLUP, a full epistasis model incorporating all pairwise SNP interactions, and sERRBLUP, a selective epistasis model incorporating only a subset of pairwise SNP interactions. ERRBLUP was shown to perform almost identically to GBLUP. Incorporating all pairwise SNP interactions introduces a large number of unimportant variables into the model, and the resulting ‘noise’ prevents any gain in predictive ability over GBLUP. In contrast, sERRBLUP outperformed GBLUP in the majority of cases when only the top-ranked pairwise SNP interactions were retained in the model. This held in both univariate and bivariate statistical settings, across a series of phenotypic traits and both landraces in the maize dataset. Selecting interactions by their estimated effect variances was shown to be the preferable criterion compared to selecting them by absolute effect sizes, owing to its robustness.
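A minimal Python sketch of the two selection criteria follows. It assumes the interaction effects have already been estimated by a full ERRBLUP fit (that step is not shown), builds interaction covariates as products of SNP codes, and takes the effect variance of an interaction to be its squared estimated effect times the sample variance of its covariate. This definition, the omission of self-interactions, and the helper names `interaction_covariates`, `sample_variance` and `select_top` are illustrative assumptions, not the EpiGP implementation.

```python
from itertools import combinations


def interaction_covariates(genotypes):
    """Return {(k, l): [x_ik * x_il for each line i]} for all SNP pairs k < l.

    genotypes: list of lines, each a list of SNP codes (e.g. 0/2 for doubled
    haploid lines). Self-interactions (k == l) are omitted for brevity.
    """
    n_snps = len(genotypes[0])
    return {
        (k, l): [line[k] * line[l] for line in genotypes]
        for k, l in combinations(range(n_snps), 2)
    }


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def select_top(effects, covariates, fraction, criterion="variance"):
    """Keep the top `fraction` of interactions ranked by `criterion`.

    effects: {(k, l): estimated interaction effect}, assumed to come from a
    previous full-model fit.
    criterion: "variance" ranks by effect**2 * Var(covariate) (assumed
    definition of effect variance); "size" ranks by |effect|.
    """
    if criterion == "variance":
        score = {p: effects[p] ** 2 * sample_variance(covariates[p]) for p in effects}
    else:
        score = {p: abs(effects[p]) for p in effects}
    n_keep = max(1, round(fraction * len(effects)))
    ranked = sorted(effects, key=lambda p: score[p], reverse=True)
    return ranked[:n_keep]


# Toy usage: four DH lines coded 0/2, three SNPs, hypothetical effects.
geno = [[2, 2, 2], [2, 2, 2], [2, 2, 0], [2, 0, 2]]
effects = {(0, 1): 0.5, (0, 2): 0.3, (1, 2): -0.45}
cov = interaction_covariates(geno)
print(select_top(effects, cov, 0.34, "variance"))  # [(1, 2)]
print(select_top(effects, cov, 0.34, "size"))      # [(0, 1)]
```

In this toy case the variance criterion keeps the pair (1, 2), whose covariate varies more across lines, while the absolute-size criterion keeps (0, 1). Ranking by effect variance thus down-weights large effects on interactions that barely vary in the population.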

In the bivariate statistical setting, the increase from GBLUP predictive ability to maximum sERRBLUP predictive ability results from epistasis, whereas in the univariate statistical setting it is caused by both epistasis and borrowing information across environments.

5.6.2 Outlook for EpiGP R-package

In this thesis, we found that the computing time of the proposed sERRBLUP model grows quadratically with the number of SNPs. Our “EpiGP” R package efficiently processes more than 30,000 SNPs, and thus more than 450 million SNP × SNP interactions. Even so, the model can impose a considerable computational load when the number of SNPs increases to hundreds of thousands. Therefore, reducing the number of SNPs by LD pruning is highly recommended: besides making epistasis models feasible in reasonable computing time, it can potentially lead to higher predictive ability. Moreover, using haplotype blocks in the sERRBLUP model was shown to yield prediction accuracies very similar to those obtained with a pruned set of SNPs. It does so at a significantly lower computational load than sERRBLUP based on the pruned SNP set.
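The quadratic growth follows directly from counting pairs: n SNPs yield n(n − 1)/2 distinct SNP × SNP interactions, or n(n + 1)/2 if self-interactions are included. The short Python sketch below, with a hypothetical function name, reproduces the order of magnitude quoted above and shows what a tenfold increase in SNPs implies.

```python
def n_pairwise_interactions(n_snps: int, include_self: bool = False) -> int:
    """Count SNP x SNP interactions: n(n-1)/2 distinct pairs, plus n self-pairs if requested."""
    pairs = n_snps * (n_snps - 1) // 2
    return pairs + n_snps if include_self else pairs


for n in (30_000, 100_000, 300_000):
    print(f"{n:,} SNPs -> {n_pairwise_interactions(n):,} interactions")
# 30,000 SNPs -> 449,985,000 interactions
# 100,000 SNPs -> 4,999,950,000 interactions
# 300,000 SNPs -> 44,999,850,000 interactions
```

Going from 30,000 to 300,000 SNPs multiplies the number of interactions by roughly 100. This is why reducing the marker set by LD pruning or haplotype blocks before fitting sERRBLUP pays off.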

5.6.3 Outlook for influential factors on the model’s predictive ability

Bivariate models were shown to be superior to univariate models under a specific cross-validation scenario. In each run of a 5-fold cross-validation with 5 replicates, only the test set of the target environment (or year) is masked. Under this scenario, bivariate GBLUP performed slightly better than the maximum univariate sERRBLUP. In most cases, the maximum bivariate sERRBLUP achieved the highest predictive ability among all bivariate models.
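The cross-validation scenario can be sketched in Python as follows. Only the target-environment lines are split into folds and masked; the secondary environment's records are left untouched, so they enter every training run in full. The helper names, the round-robin fold assignment after shuffling, and the fixed seed are assumptions of this sketch, and model fitting is deliberately left out.

```python
import random


def bivariate_cv_splits(target_lines, n_folds=5, n_reps=5, seed=1):
    """Yield (rep, fold, train, test) for repeated k-fold CV on the target.

    Only target-environment lines are split; secondary-environment records
    are never masked and should be added to every training run in full.
    """
    rng = random.Random(seed)
    for rep in range(n_reps):
        lines = list(target_lines)
        rng.shuffle(lines)
        # Round-robin assignment of the shuffled lines to folds.
        folds = [lines[f::n_folds] for f in range(n_folds)]
        for fold, test in enumerate(folds):
            held_out = set(test)
            train = [line for line in lines if line not in held_out]
            yield rep, fold, train, test


def mask_target(target_pheno, test):
    """Return a copy of the target phenotypes with test lines set to None."""
    held_out = set(test)
    return {line: (None if line in held_out else y) for line, y in target_pheno.items()}


# Toy usage with hypothetical target phenotypes for 20 lines.
target = {f"DH{i}": float(i) for i in range(20)}
for rep, fold, train, test in bivariate_cv_splits(target):
    y_target = mask_target(target, test)
    n_masked = sum(y is None for y in y_target.values())
    assert n_masked == len(test)
    # A bivariate model would now be fit on y_target plus the full
    # secondary-environment data; its predictions for `test` would then
    # be compared with the true target phenotypes.
```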

Furthermore, for the bivariate models and the univariate sERRBLUP model, predictive ability was significantly correlated with the genotype overlap between the target environment (or year) and the secondary environment (or year). Given a high level of genotype overlap, genomic correlation was shown to be the most important factor affecting the predictive ability of the genomic prediction model. Phenotypic correlation and the trait’s heritability also affected the predictive ability of these models.
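To make these quantities concrete, the Python sketch below computes two of them. Genotype overlap is taken as the share of target-environment lines that are also phenotyped in the secondary environment. Predictive ability is taken as the Pearson correlation between predicted and observed values in a test set. Both definitions are common conventions assumed here rather than taken from the text above, and the function names are hypothetical.

```python
from math import sqrt


def genotype_overlap(target_lines, secondary_lines):
    """Share of target-environment lines that also appear in the secondary environment."""
    target = set(target_lines)
    return len(target & set(secondary_lines)) / len(target)


def pearson(x, y):
    """Pearson correlation, e.g. between predicted and observed test-set values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


print(genotype_overlap(["A", "B", "C", "D"], ["B", "C", "D", "E"]))  # 0.75
```

Correlating overlap values with predictive abilities over many target/secondary pairs, e.g. `pearson(overlaps, abilities)`, is one way to quantify the relationship described above.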

In this thesis, we further proposed joint genomic prediction across multiple environments. Its predictive ability is as good as or better than the maximum predictive ability obtained with any single environment. This was potentially caused by the 100 percent genotype overlap between the target environment and the secondary environment. Besides, using the average phenotypic values across all environments balances out the impact of any single environment, which helps to avoid choosing a ‘wrong’ environment for training the model.
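A minimal Python sketch of the averaging step follows. It assumes phenotypes are stored per environment as line-to-value mappings, with None for missing records; this layout is chosen purely for illustration, and missing records are skipped rather than imputed. Each line's mean is taken over the environments where it was observed, so the result contains every recorded line and can act as a secondary ‘environment’ with full genotype overlap.

```python
def mean_across_environments(pheno_by_env):
    """Average each line's phenotype over the environments where it was recorded.

    pheno_by_env: {environment: {line: value or None}}; None entries are
    skipped, not imputed.
    """
    sums, counts = {}, {}
    for records in pheno_by_env.values():
        for line, y in records.items():
            if y is None:
                continue
            sums[line] = sums.get(line, 0.0) + y
            counts[line] = counts.get(line, 0) + 1
    return {line: sums[line] / counts[line] for line in sums}


pheno = {
    "E1": {"A": 1.0, "B": 2.0},
    "E2": {"A": 3.0, "B": None, "C": 5.0},
}  # hypothetical records
print(mean_across_environments(pheno))  # {'A': 2.0, 'B': 2.0, 'C': 5.0}
```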

5.6.4 Concluding remarks

The sERRBLUP model can be a major step toward more accurate genomic prediction of phenotypes, especially in a bivariate statistical framework, which can potentially increase predictive ability the most. Its computational load can be reduced significantly by using haplotype blocks instead of a pruned set of SNPs. The potential gain in bivariate sERRBLUP predictive ability for a highly heritable trait depends on three factors between the target and the secondary environment (or year): the level of genotype overlap, the genomic correlation, and the phenotypic correlation. This holds when the full dataset of the secondary environment (or year) and the training set of the target environment (or year) are used to predict the test set of the target environment (or year).

In fact, this bivariate sERRBLUP prediction scenario mirrors real breeding programs, in which lines are recorded in multiple environments, either within the same growing season or in subsequent seasons. Moreover, joint prediction across multiple environments is a promising approach in this context. It can enhance predictive ability by training the model on all available information across environments or across multiple growing seasons.

Overall, accurate genomic prediction of phenotypes is of great importance in plant breeding, since genomic prediction is becoming a routine selection tool for plant breeders. Our proposed sERRBLUP model, a selective epistasis model that increases prediction accuracy, therefore has the potential to substantially advance genomic prediction of phenotypes. This applies especially when all available information across multiple environments or growing seasons is used jointly to train the model, which the EpiGP R package does in a computationally efficient manner. Moreover, sERRBLUP can efficiently utilize all phenotypic records collected at high cost across different environments and/or years, substantially enhancing prediction accuracy while reducing further costs in plant breeding. In general, the sERRBLUP model could also be applied to other species, for example in animal and human genomic prediction, where epistasis is a relevant gene action.
