1 Introduction

1.7 Univariate and multivariate genomic prediction models

In animal and plant breeding programs, multi-trait and multi-environment data are common. Powerful statistical models are therefore required to use these data and to exploit the correlation between traits to improve prediction accuracy in genomic selection (Montesinos-López et al., 2018).

Genomic prediction models can be classified into univariate and multivariate models according to the number of traits analyzed. Univariate models predict a single phenotypic trait, while multivariate models, such as multi-trait and multi-environment models, predict multiple phenotypic traits simultaneously.

Multi-trait models capture the relationships between traits more efficiently than univariate models and mostly yield more accurate predictions. They have recently become more popular in genomic selection because they predict multiple traits concurrently and can increase prediction accuracy over univariate models when the genetic correlation between the traits is high (Jia and Jannink, 2012; Jiang et al., 2015; Montesinos-López et al., 2018). For correlated traits, multi-trait models mostly provide higher prediction accuracy than univariate models (He et al., 2016; Schulthess et al., 2018), although some studies reported only a modest increase (Calus and Veerkamp, 2011; Montesinos-López et al., 2016).

Henderson and Quaas (1976) proposed the first application of mixed models for multi-trait evaluation. Multi-trait models were initially proposed in animal breeding to model genetic correlation among traits and to model genotype by environment interactions across multiple years or environments (Mrode, 2014; Lee and van der Werf, 2016). The initial multivariate models applied to plant and animal species used the available pedigree information to infer relationships among individuals and traits in a mixed model framework (Mrode, 2014).
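As a point of reference, a pedigree-based bivariate mixed model of this kind is commonly written as shown below. The notation is an assumption chosen for illustration and is not taken from the cited works: y_t, b_t, u_t and e_t denote the phenotypes, fixed effects, additive genetic values and residuals of trait t, X_t and Z_t the corresponding design matrices, A the pedigree relationship matrix, and Σ_g and Σ_e the 2 × 2 genetic and residual covariance matrices between the two traits. The residual structure assumes that every individual is recorded for both traits.

```latex
\begin{equation}
\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{pmatrix}
=
\begin{pmatrix} \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 \end{pmatrix}
\begin{pmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{pmatrix}
+
\begin{pmatrix} \mathbf{Z}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{Z}_2 \end{pmatrix}
\begin{pmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \end{pmatrix}
+
\begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{pmatrix},
\qquad
\operatorname{Var}\begin{pmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \end{pmatrix}
= \boldsymbol{\Sigma}_g \otimes \mathbf{A},
\quad
\operatorname{Var}\begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{pmatrix}
= \boldsymbol{\Sigma}_e \otimes \mathbf{I}
\end{equation}
```

The off-diagonal element of Σ_g is the genetic covariance between the traits. When it is zero, the model reduces to two separate univariate analyses as far as the genetic values are concerned.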

However, the wide availability of dense molecular markers has made it possible to replace limited pedigree information with genomic relationship matrices, opening new options for analyzing crops with restricted pedigree information (Endelman and Jannink, 2012). Velazco et al. (2019) reported an improvement in the predictive ability of multi-trait GBLUP compared to single-trait GBLUP in sorghum. Their study showed that including plant height information in a multi-trait GBLUP model increased the predictive ability for grain yield by up to 16 percent. This might be due to the strong genetic correlation between grain yield and plant height in sorghum hybrids (Velazco et al., 2019). Covarrubias-Pazaran et al. (2018) also showed that, under medium or high genetic correlation, multivariate GBLUP was more accurate than univariate GBLUP.
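To make the step from markers to a genomic relationship matrix concrete, the following minimal sketch builds such a matrix from 0/1/2 marker codes and uses it for a single-trait GBLUP prediction. It is an illustration under stated assumptions, not the procedure of any of the cited studies: the centering and scaling follow the widely used VanRaden-type construction, the function names are hypothetical, the heritability `h2` is treated as known instead of being estimated, and the overall mean is estimated by the simple training average.

```python
import numpy as np

def genomic_relationship_matrix(markers):
    """Build a genomic relationship matrix from an (n_lines x n_markers)
    matrix of allele counts coded 0/1/2 (VanRaden-type construction, assumed here)."""
    M = np.asarray(markers, dtype=float)
    p = M.mean(axis=0) / 2.0             # allele frequency of each marker
    W = M - 2.0 * p                      # centre every marker column
    scale = 2.0 * np.sum(p * (1.0 - p))  # expected marker variance summed over loci
    return W @ W.T / scale

def gblup_predict(G, y, train_idx, test_idx, h2=0.5):
    """Predict phenotypes of test lines from training lines with GBLUP.
    h2 is an assumed heritability; it sets the shrinkage lambda = (1 - h2) / h2."""
    y = np.asarray(y, dtype=float)
    train_idx = np.asarray(train_idx)
    test_idx = np.asarray(test_idx)
    lam = (1.0 - h2) / h2
    G_tt = G[np.ix_(train_idx, train_idx)]   # relationships among training lines
    G_st = G[np.ix_(test_idx, train_idx)]    # relationships test x training
    mu = y[train_idx].mean()                 # simplification: mean as the only fixed effect
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + G_st @ alpha
```

In practice the variance components would be estimated from the data, for example by REML, and not fixed in advance as done here.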

In plant breeding, one of the breeders’ major challenges is that genotype performance differs from one environment to another, which is known as 𝑮 × 𝑬 interaction (Kang and Gorman, 1989). Multi-environment models are usually employed to assess 𝑮 × 𝑬 interaction for a single trait when multiple genotypes are recorded in multiple environments (Montesinos-López et al., 2016; Hassen et al., 2018). Including 𝑮 × 𝑬 interaction in genomic prediction models helps to select lines with optimal overall performance across target environments in the genomic selection context (Roorkiwal et al., 2018).

Several statistical models have been used to estimate 𝑮 × 𝑬 interaction in plant breeding, such as linear regression, analysis of variance (ANOVA) models and linear mixed models (Elias et al., 2016). Incorporating 𝑮 × 𝑬 interaction into additive genomic prediction models in multi-environment analysis has been reported to potentially increase predictive ability (Hallauer et al., 2010). Burgueño et al. (2012) proposed the first statistical framework to model 𝑮 × 𝑬 using a linear mixed model for genomic prediction, extending the single-trait, single-environment GBLUP model to the multi-environment context. This approach borrows information across environments, which resulted in higher prediction accuracy (Burgueño et al., 2012). Crossa et al. (2016) evaluated days to heading and days to maturity in Iranian and Mexican wheat landraces in drought and heat environments and found that including 𝑮 × 𝑬 interaction in the genomic prediction model led to a substantial and consistent increase in prediction accuracy compared to models without the 𝑮 × 𝑬 term. Further examples of multi-environment models are the accurate prediction of maize yield obtained by including 𝑮 × 𝑬 interaction in a whole regression approach (Millet et al., 2019) and the highly significant effect of 𝑮 × 𝑬 interaction on grain yield of single-cross maize hybrids across environments with low and optimum availability of nitrogen in the soil (Mafouasson et al., 2018). Moreover, multi-environment analysis can also be used for multi-year analysis under changing environmental conditions (Elias et al., 2016). In fact, gathering phenotypic data over several years to predict lines in upcoming years is a potential approach to increase prediction accuracy; for example, including historical phenotypic data in the genomic prediction of grain maize hybrids has been shown to increase prediction accuracy (Schrag et al., 2019a).
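The idea of borrowing information across environments (or years) can be sketched with a two-environment GBLUP in which the genetic covariance is the Kronecker product of a between-environment covariance matrix and the genomic relationship matrix. The code below is a simplified illustration and not the implementation of any cited study: the function name and arguments are hypothetical, the covariance matrices `Sigma_g` and `Sigma_e` are assumed to be known, and environment means are estimated by simple averages of the observed records.

```python
import numpy as np

def two_env_gblup(G, y1, y2, obs2, Sigma_g, Sigma_e):
    """BLUP of genomic values in two environments (or two years treated as traits).

    G       : (n x n) genomic relationship matrix
    y1      : phenotypes of all n lines in environment 1
    y2      : phenotypes in environment 2 (only the entries listed in obs2 are used)
    obs2    : indices of the lines phenotyped in environment 2
    Sigma_g : assumed 2 x 2 genetic covariance between environments
    Sigma_e : assumed 2 x 2 residual covariance between environments
    Returns an (n x 2) array of predictions, one column per environment.
    """
    G = np.asarray(G, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    obs2 = np.asarray(obs2)
    n = G.shape[0]
    K = np.kron(Sigma_g, G)               # genetic covariance, environment-major order
    R = np.kron(Sigma_e, np.eye(n))       # residual covariance in the same order
    obs = np.concatenate([np.arange(n), n + obs2])   # positions that carry a record
    mu = np.array([y1.mean(), y2[obs2].mean()])      # simple environment means
    resid = np.concatenate([y1 - mu[0], y2[obs2] - mu[1]])
    V = (K + R)[np.ix_(obs, obs)]         # covariance of the observed records
    u_hat = K[:, obs] @ np.linalg.solve(V, resid)    # BLUP for all line-environment pairs
    return u_hat.reshape(2, n).T + mu
```

With a zero off-diagonal in `Sigma_g` and `Sigma_e`, the predictions for environment 2 depend only on the records from environment 2. A positive genetic covariance lets the records from environment 1 contribute as well, which is how the unphenotyped lines in environment 2 benefit.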

Additionally, Martini et al. (2016) showed the feasibility of borrowing information across environments in EG-BLUP without incorporating additional terms such as 𝑮 × 𝑬 interaction into the epistasis genomic prediction model. This method increased predictive ability in one environment through variable selection in the other environment, under the assumption that phenotypes in different environments are positively correlated, as demonstrated with the publicly available wheat data set (Pérez and de los Campos, 2014).

Overall, in crop and livestock breeding, efficient selection strategies and powerful statistical models with higher prediction accuracy deserve special attention, since they could reduce the costly and time-consuming phenotyping of numerous selection candidates in multiple environments.

In this thesis, the predictive abilities of GBLUP, ERRBLUP and sERRBLUP models were compared in the univariate statistical framework for phenotypes simulated from the genotypes of the publicly available wheat dataset (Pérez and de los Campos, 2014) (Chapter 2). We further compared GBLUP, ERRBLUP and sERRBLUP models in both univariate and bivariate statistical frameworks for prediction across environments in 910 doubled haploid lines from the European maize landraces Kemater Landmais Gelb and Petkuser Ferdinand Rot in six locations in Germany and Spain for a series of eight phenotypic traits recorded in 2017 (Chapter 3). Finally, bivariate GBLUP, ERRBLUP and sERRBLUP models were compared for prediction across years in the maize dataset in four locations in Germany and Spain by modeling the years 2017 and 2018 as two separate traits in a multi-trait model (Chapter 4).
