• Keine Ergebnisse gefunden

Identification of quantitative trait loci for dressage in Hanoverian warm

6.1 Summary

The aim of this study was a genome-wide association (GWAS) study for quantitative trait loci (QTL) of dressage in Hanoverian warmblood horses employing the Illumine equine SNP50 Beadchip. We genotyped 115 stallions of the National State stud of Lower Saxony for our analyses. The quality of walk, trot and canter, and the rideability of a horse are characterizing for in the dressage talent. We performed adaptive permutation testing with and without parameters to explain potential data stratification to determine associations. Besides, we employed two different mixed linear animal model approaches to control for marker-based population structure and marker identity-by-state (IBS) based kinship among all individuals. Population stratification was explained best in the mixed linear animal model considering Hanoverian, Thoroughbred, Trakehner and Holsteiner genes and the marker identity-by-state relationship matrix. We identified 12 QTL for dressage on horse chromosomes (ECA) 2, 3, 4, 6, 7, 8, 9, 10, 16, 18, 27 and 28 (-log10 P-value >4) and further putative QTL with -log10 P-values of 3-4 on ECA1, 2, 3, 5, 14, 18, 19, 20, 21 and 26. Within the QTL regions we identified functional candidate gene for dressage performance, including VWC2 on ECA4, HPX on ECA7, AF3BL2 on ECA8, TRAPPC9 on ECA9, MYL3 on ECA16 and MCPH1 on ECA27. Within the putative QTL we found MYO5A, GNB5, GABPB and HDC on ECA1, TRPC3 on ECA2, PPARGC1A on ECA3, MARKAPK on ECA5, SHOX2 on ECA19, and GABPA on ECA26 as functional candidate genes. Our results suggest that multiple genes involved in diverse processes are crucial for elite dressage performance. In particular genes involved in coordination, ataxia and learning aptitude might play a major roll for excellent dressage performance. To validate these QTL, further studies in larger data sets and other horse populations are necessary.

Keywords: horse, dressage, quantitative trait loci, GWAS, SNP

83

Identification of QTL for dressage

6.2 Introduction

Primarily bred to be a horse suitable for the military usage, since the foundation of the Hanoverian Stud Book in 1735, the Hanoverian warmblood (Hanoverian) was intensely selected for an athletic and competitive phenotype that is required for a favourable riding horse. Therefore, the Hanoverian represents one of the most important breed of sport horses in the world today and is numerously represented among the best dressage horses world wide. Excellent movements in walk, trot and canter combined with a good rideability are the major aim of dressage horse breeding. The walk is aimed to be a rhythmical and even four beat, ground covering, energetic and elastic, and well balanced. A clear two-beated cadence, a high level of impulsion, elasticity, ground cover and balance, an uphill-moving forehand with a freely moving shoulder is demanded for the trot. Thereby, the hindlegs are supposed to be active, well bending and moving with thrust under centre of gravity, with a clear activity of the musculature of the back and the thighs. The canter is aimed to be uphill with clear rhythm (three-beat), impulsion, elasticity, ground cover and balance. Every canter stride should be a powerful push with well bent hindlegs, striding under centre of gravity. Results of mare performance tests (MPT) and auction inspection (PAI) of the Hanoverian studbook society (HSS) of last decades indicate breeding progress particularly in thus dressage related traits. In this connection, possible sale benefits may cause that breeders tend to put in particular weight on dressage talent of their foals. At young age, high prices are primarily achieved for foals with good movements and dressage horses are often the most expensive among the horses offered at riding horse auctions in Europe. However, training a dressage horse to high levels takes several years and further more, genetic improvement in horses is greatly reduced by the long generation interval. Heritability estimates for walk, trot, canter and rideability range from 0.26 – 0.36 in Hanoverian warmblood horses (Stock

& Distl 2007). Hence, the application of genetic markers in selection schemes to improve dressage performance and early reveal best horses for professional training could be highly desirable. Although the physical attributes contributing to excellent dressage performance are well described, the genes contributing to dressage

84

Identification of QTL for dressage

performance have not yet been identified. Genomic regions potentially containing genes that influence exercise-related phenotypes in Thoroughbreds are localized for the first time by Gu et al. (2009). In contrast, numerous studies have been performed to identify QTL and candidate genes potentially affecting physical performance in human. Therefore, relating to physical performance the human is the best studied species so far (Bray et al. 2009). Ballet, competitive dancing, and gymnastic exercise are human sports requiring physical performance most comparable to dressage.

Next to endurance and strength, high levels of balance, coordination and sensitivity are required.

For human and few other species e.g. cattle and dogs, genotyping arrays containing SNP markers were successfully used for mapping QTL for quantitative traits (Karlsson et al. 2007; Myles et al. 2008; Kolbehdari et al. 2009). The completion of the equine genome assembly, and SNP assays covering the whole equine genome enable to scan for genetic variations in horses at a very high resolution. The aim of this study was to perform a genome-wide association (GWA) analyses for dressage, including the quality of walk, trot, canter and rideability as well as the composed trait dressage in Hanoverian horses using the equine SNP50 BeadChip (Illumina, San Diego, CA, USA) and to screen the potential QTL for possible candidate genes known from human studies.

6.3 Materials and Methods 6.3.1 Animals and phenotypic data

Blood samples were collected from 115 Hanoverian warmblood stallions of the National State Stud of Lower Saxony. These stallions were born between 1972 and 2000 and represent a random sample from all Hanoverian stallions born in the last 20-30 years. Pedigree data were made available by the HSS through the national unified animal ownership database (Vereinigte Informationssysteme Tierhaltung w.V., VIT). Pedigree records of these stallions allowed us to assign the 115 stallions into 16 families which included a total of 798 stallions (Table S1). We employed the latest breeding values (BVs) for dressage (Mai 2009) provided by the HSS. BVs for dressage were estimated based on results recorded at mare performance tests

85

Identification of QTL for dressage

(MPTs) since 1987 and inspections before auctions (PAIs) since 1999 including 35,512 animals. Dressage is a composed trait resulting from scores for the quality of walk, trot and canter, and the rideability. At MPTs mares are scored by a judging commission for dressage using a scale from 0 (not shown) to 10 (excellently shown) with 0.5 intervals. Quality of walk, trot, and canter and rideability are separately scored and then averaged to a total score for dressage. Horses pre-selected for sale at riding horse auctions of the HSS are scored for dressage by a judging commission. Between 1999 and 2008, 8,081 Hanoverians (5,567 males, 2,514 females) were judged at PAIs. Auction candidates are scored for the same traits and at the same scale like the mares at MPTs. The same commission judges the mares at MPTs and the auction candidates to ensure comparable results. If a mare took part in a PAI as well as in a MPT, then the result of the MPT is included in the BV estimation. For mares which repeated the MPT, the last result is included. BVs are estimated yearly through the VIT by applying a multivariate BLUP (best linear unbiased prediction) animal model (Christmann 1996).

Yijk= μ + TESTi+ aj + eijk

with yijk = score for quality of walk, trot and canter, and rideability, μ = model constant, TESTi = fixed effect of the individual test for MPT or PAI: interaction between the place, the year and season of performance evaluation, aj = random additive genetic effect of the individual horse and eijk = random residual. Season has been classified two seasons (January to June and July till December).

BVs were standardized to a mean value of 100 points and a standard deviation of 20 points, using the horses born in the years 1999 and 2000. Every year the reference birth years move by one year forward.

For the investigated stallions the BV for dressage ranged from 48-150 (mean 97±23) (Table S2). Reliabilities ranged between 0.25 and 0.99 (mean 0.87±0.13).

The distributions of BVs for dressage were analyzed using the procedure UNIVARIATE of SAS software (Statistical Analysis System, version 9.2, SAS Institute, Cary, NC, USA, 2010).

86

Identification of QTL for dressage

For each stallion the proportion of genes of Hanoverian (HAN), Thoroughbred (TB), Trakehner (TRAK) and Holsteiner (HOL) horses were calculated using all available pedigree information. Details are described elsewhere (Hamann & Distl 2008). Mean (median) proportions of genes in the stallions were 0.54 (0.63) for HAN, 0.28 (0.19) for TB, 0.05 (0.03) for TRAK, and 0.06 (0) for HOL (Table S1). Given the uneven representation of gene proportions, each four classes for the several breeds were defined as follows HAN: ≤0.34, >0.34 and < 0.78, ≥0.78; TB: ≤0.13, >0.13 and <

0.30, ≥0.30; TRAK: ≤0.20, >0.20 and < 0.80, ≥0.80; HOL: 0.00, >0.00 and < 0.30,

≥0.30.

6.3.2 Genotyping SNPs

Genomic DNA was extracted from EDTA blood samples of 115 Hanoverian warmblood stallions through a standard ethanol fraction with concentrated sodium chloride (6M NaCl) and sodium dodecyl sulphate (10% SDS). Concentration of extracted DNA was determined using the Nanodrop ND 1000 (Preqleb Biotechnology GmbH, Erlangen, Germany). DNA concentration of samples was adjusted between 30 and 80 ng/μl.

Genotyping was performed with the Illumina equine SNP50 BeadChip containing 54,602 SNP markers using standard procedures as recommended by the manufacturer. Raw data were analysed using the genotype module version 3.2.32 of the BeadStudio program (Illumina). In order to assign the genotypes we generated a cluster file with the help of the BeadStudio software and the genotyping module version 3.2.32.

6.3.3 Data analysis

For genome-wide mapping we performed association analyses for all SNPs with a minor allele frequency (MAF) >0.01 and a call rate >0.90. Due to missingness test, no SNP was excluded. There were 7875 SNPs that did not reach a sufficient MAF and 3951 SNP had a call rate ≤0.90, so 43,441 SNPs were left for association analyses. To control spurious associations, we tested possible stratification effects on their outcome of GWA and employed empirical genome-wide error probabilities

87

Identification of QTL for dressage

through adaptive permutations. The models employed were parameterized using PLINK, version 1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/ (Purcell et al. 2007)) and TASSEL version 2.1 (http://www.maizegenetics.net/tassel (Bradbury et al.

2007)). First, genome-wide associations were determined without any parameters to explain potential data stratification. Therefore, adaptive permutations for correction of multiple tests was performed using a maximum of 1,000,000 permutations (Plink1) the “--assoc” and the “--aperm” options of PLINK. An extended model included the gene proportions of the important founder breeds HAN, TB, TRAK and HOL to improve the results of the GWA analyses. The covariates were considered as class effects with each four levels. The adaptive permutations were done applying a linear regression model using the “--linear” and “--covar” options for PLINK (Plink2). In a third PLINK model, we tested the effect of family structure on the GWA analysis. We performed Cochran-Mantel-Haenszel (CMH) tests within the 16 families and simultaneously considered the gene proportion of HAN, TB, TRAK and HOL as covariates. Here, we utilized the “--mh”, “--within” and “--aperm” options for PLINK (Plink3).

A mixed linear animal model (MLM) was employed to control for marker-based population structure (Q-matrix) and marker identity-by-state (IBS) based kinship among all individuals (K-matrix) using TASSEL (http://www.maizegenetics.net/tassel (Bradbury et al. 2007)) (Tassel1). The data file for building these two matrices were from 7375 genome-wide and equidistantly distributed SNPs at pair-wise linkage disequilibrium <0.2 (Ritland 1996). The Q-matrix contained three covariates for the cryptic structure of the stallions as determined by STRUCTURE, version 2.3.3 (Pritchard et al. 2000), via optimization of the likelihood of the data. Using the KIN option of TASSEL, the K-matrix was created calculating the marker IBS coefficients.

This subset of SNPs was generated as a pruned subset of SNPs that are in approximate linkage equilibrium with each other. Therefore the “--indep-pairwise 2000 500 0.2” (sliding window size of 2000 SNPs, window shift steps of 500 SNPs and an r2 threshold of 0.2) option within PLINK was used. We implemented two MLM models. The first MLM model explained for effects of the cryptic structure as determined via structure and the IBS-kinship matrix (Tassel1). In the second MLM

88

Identification of QTL for dressage

model (Tassel2) the proportions of genes of HAN, TB, TRAK and HOL as covariates and the IBS-kinship matrix were taken into account. The MLM (Yu et al. 2006) was implemented in TASSEL as described in Henderson’s notation (Bradbury et al.

2007):

y = Qβ + Zu + Ga + e

where y is the BV for dressage, walk, trot, canter or rideability; β is an unknown vector containing fixed effects of population structure (Q-matrix) or the proportion of genes of HAN, TB, TRAK and HOL; u is an unknown vector of random additive genetic effects from multiple background QTL for individuals (K-matrix); a is an unknown vector containing the genotype effects of the SNPs in the GWA; X, Z, and G are the known design matrices; and e is the unobserved vector of random

residuals. Subsequently, we calculated the observed polymorphism information content (PIC) using the ALLELE procedure of SAS/Genetics (Statistical Analysis System, version 9.2 SAS Institute, Cary, NC, USA, 2010).

We built quantile-quantile (Q-Q) plots to visualise the observed versus expected P-value distribution for each of the models employed. The observed –log10 P-values were plotted against –log10 P-values expected under the null hypothesis of independence (Fig. 1-5). The observed divergence between the expected distribution of the regression line and the distribution of observed –log10 P-values represent the inflation of P-values mainly caused by data stratification. According to the Q-Q plots, smallest –log10 P-value inflation was observed using a MLM model with the gene proportion of HAN, TB, TRAK and HOL as covariates and the K-matrix for random additive genetic effects due to the IBS relationship among all animals (Tassel2).

Based on the Q-Q plots, we defined a SNP as significant with -log10 (P) >3 and as highly significant with -log10 (P) >4 using Tassel1 or Tassel2. We found 1000 SNPs as highly significant using Plink1, 694 SNPs using Plink2 and 151 SNPs using Plink3.

Using MLM, 270 SNPs were associated with the BV for dressage employing Tassel1 and 57 employing Tassel2. Subsequent, potential QTL were defined as genomic region with minimum one SNP marker estimated as highly significant using Tassel1 or Tassel2 (Table 1). Further putative QTL were defined as genomic regions

89

Identification of QTL for dressage

harbouring at least one SNP marker estimated as significant using Tassel1 and Tassel2 (Table S1).

Estimates of the additive and dominance effects for each of the most significant SNPs within each potential QTL were obtained using Best Linear Unbiased

Prediction (BLUP) with the software PEST (Groeneveld et al. 1990).

Yijklmno= μ + GTi+ HANj + TBk + TRAKl + HOLm+ an + eijklmno

with yi…o = BVs for dressage, walk, trot, canter and rideability μ = model constant, GTi = genotypes of the most significant SNP within each QTL, HANj = proportion of Hanoverian genes, TBk = proportion of Thoroughbred genes, TRAKl = proportion of Trakehner genes, HOLm = proportion of Holsteiner genes, an = random additive genetic effect of the individual horse (n = 1-3665) and ei…o = residual.

The additive genetic effects for each of the most significant SNP within each QTL were estimated as half of the difference of the least square means of the two homozygous genotypes. The dominance effect was calculated as the deviation of the least square means of the heterozygotes from the average of the two homozygous genotypes. If none of the investigated stallions was homozygote for the minor allele of the SNP, the genotype effect was calculated as the deviation of the least square mean of the homozygote genotype from the least square mean of heterozygote genotype. Significance was tested using F-tests. The genotype based BVs (gBV) for dressage was calculated based on the observed additive or dominance effect of these SNPs for each stallion.

dominance effect of each SNP or b = genotype effect of the SNP.

For better compression, we standardized gBVs to a mean value of 100 points and a standard deviation of 20 points. Subsequent, correlations between BV dressage and gBV dressage were calculated using the CORR procedure of SAS/Genetics. We performed multiple analyses of variance (ANOVA) to test the influence of gBV

90

Identification of QTL for dressage

dressage on the distribution of BV dressage (r2) using the procedures GLM of SAS/Genetics.

6.4 Results

We were able to define 12 QTL for dressage on horse chromosomes (ECA) 2, 3, 4, 6, 7, 8, 9, 10, 16, 18, 27 and 28 (Table 1). The QTL regions were on ECA2 at 115.8 Mb, on ECA4 at 19.1 Mb, on ECA6 at 45.8 Mb, on ECA7 at 76.9 Mb, on ECA10 at 73.0 Mb, on ECA27 at 33.3 Mb and on ECA28 at 45.2 Mb. On ECA3, highest –log10 P-values were at 89.4-89.6 Mb and on ECA8 at 39.3 Mb. Peak values were highest at 79.2–79.3 Mb on ECA9, at 40.8-41.3 Mb on ECA16 and at 0.8 Mb on ECA18. Further putative QTL with -log P-values >3 and <4 were on ECA1, 2, 3, 5, 14, 18, 19, 20, 21 and 26 (Table S3). The mean polymorphism information (PIC) was 0.26 for each SNP with -log P-values >4.

The largest additive effects with values of 16.1-16.3 points for BV dressage were detected for was detected for BIEC2-734776 and BIEC2-745466 on ECA28 (Table 2). Additive effects in a similar but smaller size were shown for QTL on ECA4, 7 and 9. With the exception of the QTL on ECA9 and ECA28 at 22 Mb, for all other QTL significant dominance effects could be demonstrated. The dominance effects were in a range from -9 to -20 and 21 to 32 points of BV dressage. For the QTL with a MAF

<0.09 each one homozygous genotype was missing and thus, additive and dominance effects were not estimable. The correlation coefficient estimated between the gBV for dressage and the BV for dressage was 0.54. The variance, explained by the most significant SNP within each QTL was 0.40. Table 3 shows the distribution of SNP genotypes per proportions of Hanoverian, Thoroughbred, Trakehner and Holsteiner genes.

Within the QTL regions the von Willebrand factor C domain containing 2 gene (VWC2) on ECA4, the hemopexin gene (HPX) on ECA7, the ATPase family gene 3-like 2 gene (AF3BL2) on ECA8, the trafficking protein particle complex 9 gene (TRAPPC9) on ECA9, the myosin, light chain 3, alkali; ventricular, skeletal, slow gene (MYL3) on ECA16 and the microcephalin 1 gene (MCPH1) ECA27 could be identified as a functional candidate gene for physical performance. Within the

91

Identification of QTL for dressage

putative QTL we found the myosin VA (heavy chain 12, myoxin) gene (MYO5A), the guanine nucleotide binding protein (G protein), beta 5 gene (GNB5), the GA binding protein transcription factor, beta gene (GABPB) and the histidine decarboxylase gene (HDC) on ECA1, the transient receptor potential cation channel, subfamily C, member 3 gene (TRPC3) on ECA2, the peroxisome proliferator-activated receptor gamma, coactivator 1 alpha gene (PPARGC1A) on ECA3, the mitogen-activated protein kinase-activated protein kinase 2 gene (MARKAPK) on ECA5, the short stature homeobox 2 gene (SHOX2) on ECA19, and the GA binding protein transcription factor, alpha subunit 60kDa gene (GABPA) on ECA26 as functional candidate genes. Table S2 shows all SNPs, estimated as significant using Tassel1 or Tassel2 and potential candidate genes from human studies for physical fitness.

6.5 Discussion

The aim of our study was to reveal SNP markers associated with dressage in Hanoverian warmblood horse. We were able to define 12 QTL and 15 further putative QTL. We suggest, based on the moderate number of associated SNPs that several genes, similar to physical performance in human (Macarthur & North 2005), are involved in an elite dressage performance.

The Q-Q-plots show expected distribution of association test statistics (X-axis) across the SNPs compared to the observed value (Y-axis). Any bias deviation from the X=Y line implies a consistent data stratification across the whole genome, as seen in Fig. 1-4. The Q-Q-plot for observed P-values calculated using Tassel2, shows a solid line matching X=Y until it curves at the end, representing the small number of truly associated SNPs. Hence, omitting the effects of kinship and the proportion of genes for GWA would result in an increased false discovery rate caused by stratification within the investigated population. To some extent estimated –log error probabilities substantially vary between the different models used, indicating a high sensitivity of GWA analyses. In particular, results estimated using Plink differ from thus estimated using Tassel. We suppose, the algorithm used in Tassel takes data stratification of the investigated stallion population better into

The Q-Q-plots show expected distribution of association test statistics (X-axis) across the SNPs compared to the observed value (Y-axis). Any bias deviation from the X=Y line implies a consistent data stratification across the whole genome, as seen in Fig. 1-4. The Q-Q-plot for observed P-values calculated using Tassel2, shows a solid line matching X=Y until it curves at the end, representing the small number of truly associated SNPs. Hence, omitting the effects of kinship and the proportion of genes for GWA would result in an increased false discovery rate caused by stratification within the investigated population. To some extent estimated –log error probabilities substantially vary between the different models used, indicating a high sensitivity of GWA analyses. In particular, results estimated using Plink differ from thus estimated using Tassel. We suppose, the algorithm used in Tassel takes data stratification of the investigated stallion population better into