Genetic Diversity in the Modern Horse Illustrated from Genome-Wide SNP Data

Jessica L. Petersen1*, James R. Mickelson1, E. Gus Cothran2, Lisa S. Andersson3, Jeanette Axelsson3, Ernie Bailey4, Danika Bannasch5, Matthew M. Binns6, Alexandre S. Borges7, Pieter Brama8, Artur da Câmara Machado9, Ottmar Distl10, Michela Felicetti11, Laura Fox-Clipsham12, Kathryn T. Graves4, Gérard Guérin13, Bianca Haase14, Telhisa Hasegawa15, Karin Hemmann16, Emmeline W. Hill17, Tosso Leeb18, Gabriella Lindgren3, Hannes Lohi16, Maria Susana Lopes9, Beatrice A. McGivney17, Sofia Mikko3, Nicholas Orr19, M. Cecilia T Penedo5, Richard J. Piercy20, Marja Raekallio16, Stefan Rieder21, Knut H. Røed22, Maurizio Silvestrelli11, June Swinburne12,23, Teruaki Tozaki24, Mark Vaudin12, Claire M. Wade14, Molly E. McCue1

1 University of Minnesota, College of Veterinary Medicine, St Paul, Minnesota, United States of America, 2 Texas A&M University, College of Veterinary Medicine and Biomedical Science, College Station, Texas, United States of America, 3 Swedish University of Agricultural Sciences, Department of Animal Breeding and Genetics, Uppsala, Sweden, 4 University of Kentucky, Department of Veterinary Science, Lexington, Kentucky, United States of America, 5 University of California Davis, School of Veterinary Medicine, Davis, California, United States of America, 6 Equine Analysis, Midway, Kentucky, United States of America, 7 University Estadual Paulista, Department of Veterinary Clinical Science, Botucatu-SP, Brazil, 8 University College Dublin, School of Veterinary Medicine, Dublin, Ireland, 9 University of Azores, Institute for Biotechnology and Bioengineering, Biotechnology Centre of Azores, Angra do Heroísmo, Portugal, 10 University of Veterinary Medicine Hannover, Institute for Animal Breeding and Genetics, Hannover, Germany, 11 University of Perugia, Faculty of Veterinary Medicine, Perugia, Italy, 12 Animal Health Trust, Lanwades Park, Newmarket, Suffolk, United Kingdom, 13 French National Institute for Agricultural Research-Animal Genetics and Integrative Biology Unit, Jouy en Josas, France, 14 University of Sydney, Veterinary Science, New South Wales, Australia, 15 Nihon Bioresource College, Koga, Ibaraki, Japan, 16 University of Helsinki, Faculty of Veterinary Medicine, Helsinki, Finland, 17 University College Dublin, College of Agriculture, Food Science and Veterinary Medicine, Belfield, Dublin, Ireland, 18 University of Bern, Institute of Genetics, Bern, Switzerland, 19 Institute of Cancer Research, Breakthrough Breast Cancer Research Centre, London, United Kingdom, 20 Royal Veterinary College, Comparative Neuromuscular Diseases Laboratory, London, United Kingdom, 21 Swiss National Stud Farm, Agroscope Liebefeld-Posieux Research Station, Avenches, Switzerland, 22 Norwegian School of Veterinary Science, Department of Basic Sciences and Aquatic Medicine, Oslo, Norway, 23 Animal DNA Diagnostics Ltd, Cambridge, United Kingdom, 24 Laboratory of Racing Chemistry, Department of Molecular Genetics, Utsunomiya, Tochigi, Japan

Abstract

Horses were domesticated from the Eurasian steppes 5,000–6,000 years ago. Since then, the use of horses for transportation, warfare, and agriculture, as well as selection for desired traits and fitness, has resulted in diverse populations distributed across the world, many of which have become or are in the process of becoming formally organized into closed, breeding populations (breeds). This report describes the use of a genome-wide set of autosomal SNPs and 814 horses from 36 breeds to provide the first detailed description of equine breed diversity. FST calculations, parsimony, and distance analysis demonstrated relationships among the breeds that largely reflect geographic origins and known breed histories.

Low levels of population divergence were observed between breeds that are relatively early on in the process of breed development, and between those with high levels of within-breed diversity, whether due to large population size, ongoing outcrossing, or large within-breed phenotypic diversity. Populations with low within-breed diversity included those which have experienced population bottlenecks, have been under intense selective pressure, or are closed populations with long breed histories. These results provide new insights into the relationships among and the diversity within breeds of horses. In addition, these results will facilitate future genome-wide association studies and investigations into genomic targets of selection.

Citation: Petersen JL, Mickelson JR, Cothran EG, Andersson LS, Axelsson J, et al. (2013) Genetic Diversity in the Modern Horse Illustrated from Genome-Wide SNP Data. PLoS ONE 8(1): e54997. doi:10.1371/journal.pone.0054997

Editor: Hans Ellegren, University of Uppsala, Sweden

Received October 3, 2012; Accepted December 20, 2012; Published January 30, 2013

Copyright: © 2013 Petersen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by National Research Initiative Competitive Grants 2008-35205-18766, 2009-55205-05254, and 2012-67015-19432 from the United States Department of Agriculture-National Institute of Food and Agriculture (USDA-NIFA); Foundation for the Advancement of the Tennessee Walking Show Horse and Tennessee Walking Horse Foundation; National Institutes of Health-National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIH-NIAMS) grant 1K08AR055713-01A2 (MEM salary support), and 2T32AR007612 (JLP salary support), American Quarter Horse Foundation grant “Selective Breeding Practices in the American Quarter Horse: Impact on Health and Disease 2011–2012”; Morris Animal Foundation Grant D07EQ-500; The Swedish Research Council FORMAS (Contract 221-2009-1631 and 2008-617); The Swedish-Norwegian Foundation for Equine Research (Contract H0847211 and H0947256); The Carl Tryggers Stiftelse (Contract CTS 08:29); IBB-CBA-UAç was supported by FCT and DRCT and MSL by FRCT/2011/317/005; Science Foundation Ireland Award (04/Y11/B539) to EWH; Volkswagen Stiftung und Niedersächsisches Ministerium für Wissenschaft und Kultur, Germany (VWZN2012). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: Matthew M. Binns is employed by Equine Analysis, June Swinburne by Animal DNA Diagnostics Ltd., and Emmeline W. Hill, Nicholas Orr, and Beatrice A. McGivney are associated with Equinome Ltd. There are no competing interests including patents, products in development, or marketed products to declare in relationship to this work. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing details and materials.

* E-mail: jlpeters@umn.edu

PLOS ONE | www.plosone.org 1 January 2013 | Volume 8 | Issue 1 | e54997

Introduction

With a world-wide population greater than 58 million [1], and as many as 500 different breeds, horses are economically important and popular animals for agriculture, transportation, and recreation. The diversity of the modern horse has its roots in the process of domestication which began 5,000–6,000 years ago in the Eurasian Steppe [2–4]. Unlike other agricultural species such as sheep [5] and pigs [6,7], archaeological and genetic evidence suggests that multiple horse domestication events occurred across Eurasia [2,8–12]. During the domestication process, it is believed that gene flow continued between domesticated and wild horses [13] as is likely to also have been the case during domestication of cattle [14,15]. Concurrent gene flow between domestic and wild horses would be expected to allow newly domestic stock to maintain a larger extent of genetic diversity than if domestication occurred in one or few events with limited individuals.

Prior genetic work aimed at understanding horse domestication has shown that a significant proportion of the diversity observed in modern maternal lineages was present at the time of domestication [2,8,16]. The question of mitochondrial DNA (mtDNA) diversity was further addressed by recent sequencing of the entire mtDNA genome. These studies estimate that, minimally, 17 to 46 maternal lineages were used in the founding of the modern horse [2,17]; however, those data were unable to support prior studies suggesting geographic structure among maternal lineages [9,18].

Recent nuclear DNA analyses have utilized “non-breed” horses sampled across Eurasia to attempt to understand the population history of the horse. These microsatellite-based studies suggest a weak pattern of isolation by distance with higher levels of diversity in, and population expansion originating from, Eastern Asia [13,19]. High diversity as observed by both mtDNA and microsatellites and the absence of strong geographical patterns are likely a result of continued gene flow during domestication, the high mobility of the horse, and its prevalent use for transportation during and after the time of domestication. Interestingly, while significant diversity is observed in maternal lineages, paternal input into modern horse breeds appears to have been extremely limited as shown by a lack of variation at the Y-chromosome [20,21].

Diversity in the founding population of the domestic horse has since been exploited to develop a wealth of specialized populations or breeds. While some breeds have been experiencing artificial selection for hundreds of years (e.g. Thoroughbred, Arabian), in general, most modern horse breeds have been developed recently (e.g. Quarter Horse, Paint, Tennessee Walking Horse) and continue to evolve based upon selective pressures for performance and phenotype (Table 1). Horse breeds resulting from these evolutionary processes are generally closed populations consisting of individual animals demonstrating specific phenotypes and/or bloodlines. Each breed is governed by an independent set of regulations dictated by the respective breed association. Not all breeds are closed populations. Some breed registries allow admixture from outside breeds (e.g. Swiss Warmblood, Quarter Horse), and others are defined by phenotype (e.g. Miniature).

Finally, some populations that are often referred to as breeds are classified simply by their geographic region of origin and may not be actively maintained by a formal registry (e.g. Mongolian, Tuva) (Table 1). Those breeds that may be free ranging and experience lesser degrees of management may more appropriately be termed “landrace populations.” Therefore, genetic characteristics within horse breeds are expected to differ based upon differences in the definition of the breed, the diversity of founding stock, the time since breed establishment, and the selective pressures invoked by breeders. The extent of gene flow varies not only within breeds but also among them: among horse breeds, the direction and level of gene flow are influenced by breed restrictions and requirements, and potentially by geographic distance.

Considering modern breeds, unlike mtDNA, nuclear markers can discern breed membership [12]. However, studies of nuclear genetic diversity of modern breeds to date have most commonly focused on a single population of interest, sets of historically related breeds, or breeds within a specific geographic region [22–36]. Additionally, these analyses of nuclear genetic diversity in horse breeds are largely based upon microsatellite loci, which do not often permit consolidation of data across studies. Thus, large, across-breed investigations of nuclear diversity in the modern, domestic horse are lacking.

The Equine Genetic Diversity Consortium (EGDC), an international collaboration of the equine scientific community, was established in an effort to quantify nuclear diversity and the relationships within and among horse populations on a genome-wide scale. The development of this consortium has facilitated the collection of samples from 36 breeds for genotyping on the Illumina 50K SNP Beadchip. The breeds included in this report represent many of the most popular breeds in the world as well as divergent phenotypic classes, different geographic regions of derivation, and varying histories of breed origin (Table 1). The standardized SNP genotyping platform permits the compilation of data across breeds at a level never before achieved. Results of this collaboration now allow for the detailed description of diversity and assessment of the effects of genetic isolation, inbreeding, and selection within breeds, and the description of relationships among breeds. These data will also facilitate future across-breed genome-wide association studies as well as investigations into genomic targets of selection.

Results

Samples

Of the 38 populations sampled, two breeds were represented by geographically distinct populations: the Thoroughbred was sampled in both the United States (US) and the United Kingdom and Ireland (UK/Ire), and the Standardbred was sampled in the US as well as in Norway. Eight Standardbred horses sampled from the US were noted to be pacing horses as opposed to the Norwegian and remaining US individuals that were classified as trotters. In addition, the International Andalusian and Lusitano Horse Association Registry (IALHA) in the US maintains one stud book but designates whether the individual was derived from Spanish (Pura Raza Española) or Portuguese (Lusitano) bloodlines, or a combination of both. Of the Andalusian horses collected in the US, five were noted to have Portuguese bloodlines.

Phenotypic classifications of the horse breeds include those characterized by small stature (Miniature Horse, pony breeds), breeds characterized by large stature and/or large muscle mass in proportion to size (draft breeds), light horse or riding breeds, gaited breeds, rare breeds, breeds founded in the past 80 years, and populations that are relatively unmanaged (“landrace”). The number of samples, sampling location, region of breed origin, and a list of primary breed characteristics are found in Table 1.

Table 1. Populations (breeds) included in the study, region of breed origin and sampling location, notes on population history relevant to diversity statistics, and breed classification based upon use and phenotype.

Breed | Geographic Origin | Region Sampled | Population size (approx) | Population Notes | Classification(s)
Akhal Teke | Turkmenistan | US & Russia | 3,500 | Pedigree records began-1885, Stud book-1941 | Riding horse, endurance
Andalusian | Spain | United States | 185,000 | US registry formed in 1995 including Pura Raza Española & Lusitano bloodlines | Riding horse, sport
Arabian | Middle East | United States | 1 million | Arabian type bred for over 3,500 years; US stud book-1908 | Riding horse, endurance
Belgian | Belgium | United States | common | US Association began-1887 | Draft
Caspian | Persia | United States | rare | Rediscovered in 1965 with N < 50, no breeding records prior; Stud book-1966 | Riding and driving pony
Clydesdale | Scotland | US & UK | 5,000 | Registry formed-1877 in Scotland; Stud book-1879 | Draft
Exmoor | Great Britain | United Kingdom | 2,000 | Exmoor Pony Society-1921 | Riding and driving pony
Fell Pony | England | United Kingdom | 6,000 | Fell Pony Society began in 1922; outcrossed with Dale’s pony until 1970s | Light draft pony
Finnhorse | Finland | Finland | 19,800 | Stud book-1907 | Light draft; riding horse; trotting
Florida Cracker | United States | United States | rare | Introduced to US in 1500s; association began-1989 with 31 horses | Riding horse, gaited
Franches-Montagnes | Switzerland | Switzerland | 21,000 | Official stud book-1921; Current breeding association established-1997 | Light draft, riding horse
French Trotter | France | France | common | Population closed-1937 although allows some Standardbred influence | Riding horse, trotting
Hanoverian | Germany | Germany | 20,000 (Germany) | Outcrossing allowed | Riding horse
Icelandic | Iceland | Sweden | 180,000 | Isolated > 1,000 years; Federation of Icelandic Horse Association began-1969 | Riding horse, gaited
Lusitano | Portugal | Portugal | 12,000 | Stud book-1967 after split from Spanish Andalusian breed | Riding horse, sport
Mangalarga Paulista | Brazil | Brazil | common | Registry began-1934 | Riding horse
Maremmano | Italy | Italy | 7,000 | Breed identification based upon conformation and inspection | Riding horse
Miniature | United States | United States | 185,000 | Two US registries founded in 1970s; Maximum height restrictions for registration | Driving pony, extreme small size
Mongolian | Mongolia | Mongolia | 2 million | Many types based upon purpose and geography | Riding horse, landrace
Morgan | United States | United States | 100,000 | Founding sire born in 1789; Registry-1894 | Riding and driving horse
New Forest Pony | England | United Kingdom | 15,000 | Stud book-1910 with a variety of sires; No outcrossing since 1930s | Light draft, riding pony, landrace
North Swedish Horse | Sweden | Sweden | 10,000 | Breed association-1894; Stud book-1915 | Draft
Norwegian Fjord | Norway | Norway | common | Stud book-1909 | Riding and light draft
Paint | United States | United States | 1 million | Registry-1965; One parent can be Quarter Horse or Thoroughbred | Riding horse, stock horse
Percheron | France | United States | 20,000 | Stud book-1893 | Draft
Peruvian Paso | Peru | United States | 25,000 | Breed type over 400 years old; Closed population | Riding horse, gaited
Puerto Rican Paso Fino | Puerto Rico | Puerto Rico | 250,000 | Breed type < 500 years old; Association founded-1972 | Riding horse, gaited
Quarter Horse | United States | United States | 4 million | Association formed-1940; One parent may be Paint or Thoroughbred | Riding horse, stock horse, racing
Saddlebred | United States | United States | 75,000 | Breed type founded in late 1700s; Association began-1891 | Riding and driving horse, some gaited
Shetland | Scotland | Sweden | common | Stud book-1891 | Riding pony
Shire | England | United States | 7,000 | 1st Shire organization-1877 (UK); stud book-1880; US assoc-1885 | Draft
Standardbred | United States | Norway | common | Stud book-1871; Some outside trotting bloodlines (French Trotter) allowed | Riding horse, harness racing (trot)

After pruning of individuals for genotyping quality and relationships (see methods), and keeping a similar number of individuals per breed, 814 of the 1,060 horses remained in the analysis. Of the horses removed, 12 had known pedigree relationships at or more recent than the grandsire/dam level, 44 individuals were removed at random from overrepresented breeds to equalize sample size across breeds, 4 failed to genotype at a rate greater than 0.90, and 186 were removed due to pi hat values (pairwise estimates of identity by descent) above the allowed threshold. Of those last 186 horses that were removed, 122 were from disease studies where relationships were common due to sampling bias.
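The sample accounting above can be checked with a few lines of Python, and the pi hat step can be sketched as a greedy filter. This is only an illustration: the dictionary labels, the function name `prune_related`, and the threshold used in the usage note are hypothetical, and the greedy strategy is an assumption rather than the procedure described in the methods.

```python
# Counts quoted in the text; the dictionary keys are illustrative labels only.
genotyped, retained = 1060, 814
removed = {
    "close_pedigree_relationship": 12,
    "random_downsampling": 44,
    "call_rate_not_above_0.90": 4,
    "pi_hat_above_threshold": 186,
}
total_removed = sum(removed.values())
assert total_removed == genotyped - retained  # 246 horses removed in total


def prune_related(pi_hat, threshold):
    """Greedily drop individuals until no remaining pair exceeds `threshold`.

    pi_hat maps (individual_a, individual_b) tuples to a pairwise
    identity-by-descent estimate. Assumed strategy, not the authors' own.
    """
    pairs = {frozenset(pair) for pair, value in pi_hat.items() if value > threshold}
    dropped = set()
    while pairs:
        counts = {}
        for pair in pairs:
            for individual in pair:
                counts[individual] = counts.get(individual, 0) + 1
        # Remove the individual involved in the most offending pairs
        # (ties are broken alphabetically so the result is deterministic).
        worst = max(sorted(counts), key=lambda individual: counts[individual])
        dropped.add(worst)
        pairs = {pair for pair in pairs if worst not in pair}
    return dropped
```

For example, `prune_related({("h1", "h2"): 0.5, ("h2", "h3"): 0.4, ("h3", "h4"): 0.05}, 0.3)` drops only `h2`, because removing that one horse resolves both over-threshold pairs.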

Within Breed Diversity

Diversity indices were calculated using 10,536 autosomal SNPs that remained after pruning for minor allele frequency (MAF), genotyping rate, and linkage disequilibrium (LD) across breeds (referred to as the primary SNP set). Diversity indices were also calculated using three other SNP sets, resulting from different levels of LD-based pruning (see methods). Individuals noted as outliers in parsimony and cluster analyses (see below) were excluded from within-breed diversity calculations.
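LD-based pruning of the kind mentioned above can be sketched as a greedy pass over SNPs in map order. The sketch assumes genotypes coded as 0/1/2 allele counts, uses the squared Pearson correlation of those counts as r2, and takes a hypothetical `window` measured in SNP positions; the actual parameters and software are those given in the methods, which are not reproduced here.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, r2_max, window):
    """Return indices of SNPs kept after greedy pairwise pruning.

    genotypes[i] is the list of allele counts at SNP i, in map order.
    A SNP is kept only if its r2 with every already-kept SNP within
    `window` positions is at most r2_max.
    """
    kept = []
    for i, snp in enumerate(genotypes):
        nearby = [j for j in kept if i - j <= window]
        if all(r_squared(snp, genotypes[j]) <= r2_max for j in nearby):
            kept.append(i)
    return kept
```

With `snps = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [0, 0, 0, 2, 2, 2]]`, calling `ld_prune(snps, 0.2, 50)` keeps the first and third SNPs: the second is a perfect copy of the first, while the third is uncorrelated with it.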

Using the primary SNP set, diversity, as measured by expected heterozygosity (He), ranged from 0.232 in the Clydesdale, to 0.311 in the Tuva (Table 2). Considering the SNP sets pruned less stringently for LD, the diversity within the Thoroughbred increased in relationship to the other breeds, as did that of nine other breeds. Mean and total heterozygosity increased with increased number of loci and less stringent LD pruning (Table 2).
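For a biallelic SNP with allele frequency p, expected heterozygosity at that locus is 2p(1 - p), and a breed-level He is the mean over loci. The following minimal sketch assumes genotypes coded as 0/1/2 allele counts with `None` for missing calls, and it omits any small-sample correction that the software used in the study may apply.

```python
def expected_heterozygosity(genotypes):
    """Mean of 2p(1 - p) over loci.

    genotypes[locus] is a list of allele counts (0, 1 or 2) per individual,
    with None marking a missing call. Loci with no calls are skipped.
    """
    per_locus = []
    for locus in genotypes:
        calls = [g for g in locus if g is not None]
        if not calls:
            continue
        p = sum(calls) / (2 * len(calls))  # frequency of the counted allele
        per_locus.append(2 * p * (1 - p))
    return sum(per_locus) / len(per_locus)
```

A locus at p = 0.5 contributes the maximum value of 0.5, and a fixed locus contributes 0, which is why breeds with many low-frequency or fixed SNPs show lower He.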

Inbreeding coefficients (FIS) calculated on the primary SNP set showed significant excess homozygosity in 17 populations, which was greatest in the Andalusian (0.065). Three of the four lowest FIS values were found in the Thoroughbred samples (UK/Ire, US, and when considered together) (Table 2).

Inbreeding coefficients (f) calculated for each individual based upon observed and expected heterozygosity showed several individuals with significant loss of heterozygosity. The highest individual value of f (0.56) was found in an Exmoor pony. Within breeds, average individual estimates of f were greatest in the Clydesdale, Mangalarga Paulista, and Exmoor while the lowest breed means were found in the landrace populations (Table 2).
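An individual inbreeding estimate based on observed and expected heterozygosity can be written as f = 1 - (observed heterozygous loci) / (expected heterozygous loci). The sketch below assumes 0/1/2 genotype coding and per-locus allele frequencies supplied by the caller; the function name is hypothetical and the estimator used in the study may include corrections not shown here.

```python
def individual_f(individual, freqs):
    """Inbreeding estimate for one individual.

    individual: allele counts (0, 1 or 2) per locus, None for missing calls.
    freqs: allele frequency p at each locus in the reference population.
    Returns 1 - observed / expected number of heterozygous loci.
    """
    observed_het = 0
    expected_het = 0.0
    for genotype, p in zip(individual, freqs):
        if genotype is None:
            continue
        observed_het += genotype == 1
        expected_het += 2 * p * (1 - p)
    return 1 - observed_het / expected_het
```

With all loci at p = 0.5, an individual heterozygous at half of its loci gives f = 0, a fully homozygous individual gives f = 1, and an individual heterozygous everywhere gives f = -1. Negative values, as seen for some minima in Table 2, therefore indicate more heterozygosity than expected.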

Effective population size (Ne), as estimated by LD [37] using an autosomal SNP set pruned within each breed for quality, was lowest (143) in the UK/Ire sample of the Thoroughbred but also low in the other racing breeds as well as the Clydesdale (Table 2). Highest values of Ne were observed in the Eurasian landrace populations, the Mongolian (743) and Tuva (533), and also in the Icelandic (555), Finnhorse (575), and Miniature (521). Breed-specific decay of LD essentially mirrors the results of the Ne calculation given the relationship between the statistics. A plot of LD across 2 Mb in a subset of the breeds that represent the range of Ne estimates is found in Figure S1.
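The link between LD decay and Ne can be illustrated with Sved's classical approximation, E[r2] = 1 / (1 + 4 Ne c) + 1/n, where c is the recombination distance in Morgans and n is the number of chromosomes sampled. Whether reference [37] uses exactly this form is an assumption; the sketch only shows why slower LD decay implies a smaller Ne.

```python
def expected_r2(ne, c_morgans):
    """Expected r2 between loci at distance c (Morgans), ignoring sampling error."""
    return 1.0 / (1.0 + 4.0 * ne * c_morgans)


def ne_from_ld(mean_r2, c_morgans, n_chromosomes=None):
    """Invert the approximation above to estimate Ne from a mean r2.

    If n_chromosomes is given, 1/n is subtracted from mean_r2 first as a
    simple correction for finite sample size (an assumed correction).
    """
    adjusted = mean_r2 - (1.0 / n_chromosomes if n_chromosomes else 0.0)
    if not 0.0 < adjusted < 1.0:
        raise ValueError("adjusted r2 must lie strictly between 0 and 1")
    return (1.0 / adjusted - 1.0) / (4.0 * c_morgans)
```

Under these assumptions a mean r2 of 0.2 at a distance of 0.01 Morgans corresponds to an Ne of about 100, and halving r2 at the same distance more than doubles the estimate.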

Parsimony and Principal Component Analyses

With a domestic ass designated as the outgroup, parsimony analysis of 10,066 loci pruned for LD of R2 = 0.2 (see methods) resulted in generally tight clustering and monophyly of samples within breeds, supported by high bootstrap values (Figure 1).

Major clades of the tree show grouping of the Iberian breeds (Lusitano and Andalusian), ponies (Icelandic, Shetland, Miniature), Scandinavian breeds (Finnhorse, North Swedish Horse, Norwegian Fjord), heavy draft horses (Clydesdale, Shire, Belgian, Percheron), breeds recently admixed with and/or partly derived from the Thoroughbred (Paint, Quarter Horse, Maremmano, Swiss Warmblood, Hanoverian), modern US breeds (American Saddlebred, hereafter “Saddlebred”, and Tennessee Walking Horse), trotting breeds (Standardbred and French Trotter), and Middle Eastern breeds (Akhal Teke and Arabian). Exceptions to monophyly include the Paint and Quarter Horse as well as the Hanoverian and Swiss Warmblood, which are mixed in clades surrounding the Thoroughbred and Maremmano. In addition, the Clydesdale was placed as a clade within the Shire breed and the Shetlands as a clade within the Miniatures. Strong bootstrap support for monophyly is present within a subset each of Lusitanos (83%) and Andalusians (87%); however, the remainder of individuals from these breeds were intermixed. No structure was found within the US sample regarding individual Andalusians noted to have Portuguese bloodlines as opposed to those with Spanish bloodlines (Figure S2). The Mongolian and most Tuva horses were grouped together while a subset of the Tuvas fell out as a sister clade to the Caspians. Several individuals were not positioned in the clades that represented the majority of the other individuals in the breed (Figure 1). These include three Shires, two Mongolians, a Caspian, and a Norwegian Fjord. In each instance, the outlier status of these individuals was also supported by cluster analysis (see below).

Principal component analysis (PCA) also serves to visualize individual relationships within and among breeds. The plot of PC1 vs. PC2 shown in Figure S3 illustrates relationships similar to those shown by parsimony, including the placement of outliers outside of their respective breeds. All Thoroughbred samples, regardless of origin, are separated from the others by PC1 and form a cluster at the top of the figure. Intermediate between the Thoroughbred and central cluster of breeds are the Hanoverian, Swiss Warmblood, Paint, and Quarter Horse. The Shetland, Icelandic, and Miniature split from the remainder of samples in PC2, falling out in the lower left corner, and the British drafts anchor the figure at the lower right. While most breeds cluster tightly, several are dispersed across one or both PCs. The Hanoverian, Swiss Warmblood, Paint, and Quarter Horse, as noted above, are extended along PC1, while the Arabian and Franches-Montagnes show similar spreading, also along PC1. The Tuva, Clydesdale, and Shire individuals also are not as tightly clustered as other populations despite the low within breed diversity of the latter two.

Table 1 (continued).

Breed | Geographic Origin | Region Sampled | Population size (approx) | Population Notes | Classification(s)
Standardbred | United States | United States | | Stud book-1871; Harness racing in early 1800s included pacing horses | Riding horse, harness racing (trot or pace)
Swiss Warmblood | Switzerland | Switzerland | 15,000 | Stud book-1921; Crossed with European Warmbloods, Thoroughbreds, Arabians | Riding horse, sport
Tenn Walking Horse | United States | United States | 500,000 | Registry-1935; Blood typing and parentage verification mandated in 1993 | Riding horse, gaited
Thoroughbred | England | UK & Ireland | common | Stud book-1791; Closed population | Race horse, riding horse, sport
Thoroughbred | England | United States | | | Race horse, riding horse, sport
Tuva | Siberia | Russia | 30,000 | Different types depending on region | Light draft, landrace

doi:10.1371/journal.pone.0054997.t001

Table 2. Number of samples (N), effective population size (Ne), individual inbreeding estimates (f), inbreeding coefficient (FIS), and expected heterozygosity (He) from four SNP sets pruned based upon varying levels of LD.

Breed | N | Ne | FIS | f Min | f Max | f Mean | He (r2 0.1; 10,536 SNPs) | He (R2 0.1; 6,028 SNPs) | He (r2 0.2; 18,539 SNPs) | He (r2 0.4; 26,171 SNPs)
Akhal Teke | 19 | 302 | 0.015* | 0.015 | 0.297 | 0.101 | 0.287 | 0.281 | 0.303 | 0.311
Andalusian | 18a | 329 | 0.065* | 0.028 | 0.274 | 0.114 | 0.296 | 0.293 | 0.308 | 0.312
Arabian | 24a | 346 | 0.033* | 0.060 | 0.060 | 0.060 | 0.287 | 0.280 | 0.302 | 0.310
Belgian | 30b | 431 | -0.002 | 0.039 | 0.166 | 0.111 | 0.278 | 0.276 | 0.284 | 0.284
Caspian | 18 | 351 | -0.022 | -0.033 | 0.136 | 0.041 | 0.294 | 0.292 | 0.305 | 0.308
Clydesdale | 24 | 194 | 0.004 | 0.128 | 0.323 | 0.261 | 0.232 | 0.225 | 0.238 | 0.236
Exmoor | 24 | 216 | 0.034* | 0.055 | 0.556 | 0.239 | 0.247 | 0.242 | 0.253 | 0.252
Fell Pony | 21 | 289 | 0.002 | 0.069 | 0.178 | 0.114 | 0.278 | 0.272 | 0.285 | 0.285
Finnhorse | 27 | 575 | -0.004 | 0.011 | 0.100 | 0.052 | 0.296 | 0.296 | 0.302 | 0.301
Florida Cracker | 7 | 171 | 0.026* | 0.004 | 0.359 | 0.159 | 0.270 | 0.263 | 0.284 | 0.291
Franches-Montagnes | 19a | 316 | 0.003 | 0.018 | 0.203 | 0.095 | 0.284 | 0.279 | 0.297 | 0.301
French Trotter | 17a | 233 | -0.018 | 0.064 | 0.173 | 0.105 | 0.275 | 0.262 | 0.295 | 0.307
Hanoverian | 15a | 269 | -0.010 | 0.002 | 0.087 | 0.052 | 0.294 | 0.280 | 0.320 | 0.335
Icelandic | 25c | 555 | 0.006* | 0.043 | 0.234 | 0.083 | 0.289 | 0.288 | 0.290 | 0.288
Lusitano | 24 | 391 | 0.039* | 0.008 | 0.220 | 0.090 | 0.296 | 0.292 | 0.309 | 0.315
Maremmano | 24 | 341 | -0.012 | -0.015 | 0.109 | 0.038 | 0.298 | 0.287 | 0.318 | 0.329
Miniature | 21 | 521 | 0.005 | 0.043 | 0.161 | 0.075 | 0.291 | 0.292 | 0.296 | 0.295
Mangalarga Paulista | 15 | 155 | -0.011 | 0.176 | 0.320 | 0.242 | 0.235 | 0.228 | 0.246 | 0.250
Mongolian | 19a | 751 | 0.001 | -0.034 | 0.055 | 0.015 | 0.309 | 0.308 | 0.314 | 0.314
Morgan | 40 | 448 | 0.040* | 0.003 | 0.307 | 0.090 | 0.296 | 0.287 | 0.310 | 0.317
New Forest Pony | 15 | 474 | 0.000 | -0.022 | 0.066 | 0.025 | 0.304 | 0.300 | 0.316 | 0.319
Norwegian Fjord | 21a | 335 | -0.003 | 0.053 | 0.168 | 0.122 | 0.274 | 0.274 | 0.278 | 0.277
North Swedish Horse | 19 | 369 | 0.011* | 0.069 | 0.210 | 0.133 | 0.275 | 0.276 | 0.279 | 0.278
Percheron | 23 | 451 | 0.003 | 0.043 | 0.143 | 0.086 | 0.287 | 0.284 | 0.292 | 0.293
Peruvian Paso | 21 | 433 | 0.002 | 0.008 | 0.134 | 0.055 | 0.298 | 0.293 | 0.306 | 0.310
Puerto Rican Paso Fino | 20 | 321 | -0.003 | 0.004 | 0.298 | 0.103 | 0.280 | 0.278 | 0.287 | 0.290
Paint | 25 | 399 | 0.006* | -0.013 | 0.101 | 0.040 | 0.302 | 0.289 | 0.324 | 0.337
Quarter Horse | 40a | 426 | 0.011* | -0.012 | 0.144 | 0.047 | 0.302 | 0.290 | 0.323 | 0.336
Saddlebred | 25d | 297 | -0.008 | 0.051 | 0.145 | 0.103 | 0.279 | 0.268 | 0.297 | 0.306
Shetland | 27 | 365 | 0.032* | 0.108 | 0.370 | 0.182 | 0.264 | 0.268 | 0.268 | 0.266
Shire | 23 | 357 | 0.024* | 0.130 | 0.258 | 0.187 | 0.261 | 0.252 | 0.268 | 0.267
Standardbred - Norway | 25e | 232 | -0.004 | 0.063 | 0.202 | 0.130 | 0.272 | 0.255 | 0.289 | 0.298
Standardbred - US | 15 | 179 | 0.039* | 0.097 | 0.222 | 0.153 | 0.276 | 0.262 | 0.293 | 0.303
Standardbred - all | 40 | 290 | 0.022* | -0.028 | 0.323 | 0.130 | 0.276 | 0.260 | 0.293 | 0.303
Swiss Warmblood | 15a | 271 | 0.005 | 0.023 | 0.117 | 0.059 | 0.296 | 0.281 | 0.322 | 0.337
Thoroughbred - UK/Ire | 19a | 143 | -0.028 | 0.089 | 0.171 | 0.133 | 0.264 | 0.245 | 0.292 | 0.309
Thoroughbred - US | 17a | 163 | -0.015 | 0.093 | 0.182 | 0.134 | 0.267 | 0.250 | 0.295 | 0.313
Thoroughbred - all | 36 | 190 | -0.019 | 0.089 | 0.182 | 0.134 | 0.266 | 0.248 | 0.294 | 0.312
Tuva | 15 | 533 | 0.016* | -0.028 | 0.116 | 0.022 | 0.311 | 0.309 | 0.320 | 0.322
Tennessee Walking Horse | 19 | 230 | 0.008* | 0.065 | 0.276 | 0.148 | 0.269 | 0.256 | 0.284 | 0.291
Mean | 22.3 | 341 | 0.007 | 0.039 | 0.204 | 0.107 | 0.282 | 0.275 | 0.295 | 0.300
Total | 814 | | | | | | 0.313 | 0.303 | 0.329 | 0.336
Min | | | -0.028 | -0.034 | 0.055 | 0.015 | 0.232 | 0.225 | 0.238 | 0.236
Max | | | 0.065 | 0.176 | 0.556 | 0.261 | 0.311 | 0.309 | 0.324 | 0.337

a Individuals from this breed also included in [41];
b 20 of these individuals were also reported in [41];
c 17 of these individuals were also reported in [41];
d 21 of these individuals were also reported in [41];
e 19 of these individuals were also reported in [41].

FIS and f were calculated based upon the primary SNP set (10,536 loci). Samples also used in [41] are indicated in the footnotes.

* indicates significance at α < 0.05 determined by 10,000 permutations.

doi:10.1371/journal.pone.0054997.t002

Distance Analysis

An unrooted neighbor joining (NJ) tree of Nei’s distance [38] was constructed using SNP frequencies within breeds from the 10,536 SNP data set (Figure 2). The relative placement of breeds reflects that seen in the parsimony tree with several exceptions. The Paint, Quarter Horse, Swiss Warmblood, Hanoverian, Maremmano, and Thoroughbred are found in one large branch of the tree, although the Maremmano is placed outside of the clade containing the aforementioned breeds. The position of the Morgan with the Saddlebred and Tennessee Walking Horse also deviates from parsimony analysis but reflects historic records of relationships among these breeds. The Scandinavian breeds remain in one branch of the clade, which also includes the Shetland and Miniature. Unlike the parsimony cladogram, the Caspian falls in a clade with the other Middle Eastern breeds, the Arabian and Akhal Teke. Finally, the Exmoor, a British breed, is placed with another British breed, the New Forest Pony, rather than with the Scandinavian breeds as in the parsimony analysis. Each branch shows support of over 50%, with many clades being supported by over 99% of the 1,000 bootstrap replicates.
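A distance of this kind can be computed directly from per-breed allele frequencies. The sketch below assumes that the distance in [38] is Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)), specialized to biallelic SNPs; both that identification and the function name are assumptions made for illustration.

```python
import math


def nei_standard_distance(freq_x, freq_y):
    """Nei's standard genetic distance between two populations.

    freq_x and freq_y hold the frequency of the same reference allele at
    each biallelic SNP in populations X and Y. The result is undefined
    (math domain error) if the populations share no alleles at any locus.
    """
    jx = jy = jxy = 0.0
    for p, q in zip(freq_x, freq_y):
        jx += p * p + (1 - p) ** 2      # homozygosity within X
        jy += q * q + (1 - q) ** 2      # homozygosity within Y
        jxy += p * q + (1 - p) * (1 - q)  # identity between X and Y
    return -math.log(jxy / math.sqrt(jx * jy))
```

Computing this for every pair of the 38 populations would give the symmetric matrix that a neighbor joining algorithm takes as input; identical frequency vectors give a distance of zero.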

Cluster analysis

Likelihood scores for runs of various K in Structure showed an increase in overall mean ln P(X|K) until K = 35 (Figure S4). A clear “true” value of K is not obvious examining the likelihood scores or using the Evanno method [39] (data not shown); however, variance among runs begins to increase with a diminishing increase in likelihood scores after K = 29, which is near the peak of the curve. The value of the highest proportion (breed average q-value) of assignment of each breed for each value of K, as well as the cluster to which it assigns, is shown in Table S1. Additionally, the proportion assignment at K = 29 for each of the breeds is found in Table S2.
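The Evanno statistic referred to above, Delta K, divides the absolute second-order change in ln P(X|K) by the standard deviation of ln P(X|K) across replicate runs at that K. The sketch below follows the common convention of taking the second difference of the run means; the input layout and function name are assumptions, and the numbers in the usage note are invented for illustration only.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for every K that has both neighbours K - 1 and K + 1.

    lnp_by_k maps K to the list of ln P(X|K) values from replicate runs
    (at least two runs per K are needed for the standard deviation).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

With made-up inputs such as `{1: [-1000, -1002], 2: [-800, -802], 3: [-790, -792], 4: [-785, -787]}`, Delta K peaks at K = 2, where the likelihood curve bends most sharply. A curve that rises gradually with growing run-to-run variance, as described in the text, produces no such clear peak.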

The first breeds to have all individuals assign strongly to one cluster are the Thoroughbred and Clydesdale (with Shire) at K = 2, followed by the Shetland at K = 3; these four breeds do not show signs of admixture at any K value analyzed. Evidence of weak geographic grouping is observed at K = 4, which consists of: (1) the Middle Eastern and Iberian breeds (pink); (2) the Thoroughbred and breeds to which it continues to be or was historically crossed (yellow); (3) breeds developed in Scandinavia and Northern Europe (orange); and (4) the British Isles draft breeds (blue) (Figure 3).

Figure 1. Individual and breed relationships among 814 horses illustrated by parsimony. Parsimony tree created from 10,066 SNPs and rooted by the domestic ass. Breeds are listed in the legend in order starting from the root and working counterclockwise. Individual outliers with respect to their breeds are noted with arrows. Bootstrap support calculated from 1,000 replicates is shown for major branches when greater than 50%.

doi:10.1371/journal.pone.0054997.g001


Figure 2. Distance-based, neighbor joining tree calculated from SNP frequencies in 38 horse populations. Majority rule, neighbor joining tree created from 10,536 SNP markers using Nei’s genetic distance and allele frequencies within each population. Percent bootstrap support for all branches calculated from 1,000 replicates is shown.

doi:10.1371/journal.pone.0054997.g002

Figure 3. Bayesian clustering output for five values of K in 814 horses of 38 populations. Structure output for five values of K investigated. Each individual is represented by one vertical line with the proportion of assignment to each cluster shown on the y axis and colored by cluster. Other values of K are shown in Figure S1 and a summary of assignment of each breed in Tables S1 and S2.

doi:10.1371/journal.pone.0054997.g003

PLOS ONE | www.plosone.org 7 January 2013 | Volume 8 | Issue 1 | e54997


Middle Eastern and Iberian Breeds

As also observed in the NJ tree, clustering of the Iberian and Middle Eastern breeds with the Mangalarga Paulista, Peruvian Paso, and Puerto Rican Paso Fino (q > 0.5) is observed until K = 8, after which point the Mangalarga Paulista assigns with q = 0.93 to another cluster. The remaining breeds cluster together until K = 12, at which point the Middle Eastern breeds (Arabian, Akhal Teke, and Caspian) are assigned to their own cluster, leaving the Iberian breeds clustered with the Peruvian Paso and Puerto Rican Paso Fino. At low values of K (i.e., K < 6) the Florida Cracker, Saddlebred, Standardbreds, Morgan, and Tennessee Walking Horse fall into the cluster with the Iberian and Middle Eastern breeds with breed mean q > 0.5. At K = 29, each of these breeds is assigned with q > 0.72 to an individual cluster with the exception of the Lusitano and Andalusian, which remain clustered together.

Thoroughbreds and Thoroughbred Crossed Breeds

Relationships described by the NJ tree among the Thoroughbred, Hanoverian, Swiss Warmblood, Paint, Quarter Horse, and Maremmano are also seen in cluster analysis. Clustering of those breeds with the Thoroughbred is observed throughout the values of K examined, although at moderate frequencies (Figure 3, Table S1, Figure S5). At K = 29, the Hanoverian and Swiss Warmblood remain assigned to the cluster defined by the Thoroughbred but with assignment probabilities of 0.51 each. The Quarter Horse and Paint also assign to this cluster with q-values of 0.30 and 0.34, respectively. None of the Quarter Horse, Paint, Hanoverian, or Swiss Warmblood populations assigns to any cluster with q > 0.62 at K = 29. No evidence of population substructure is observed between the US and UK/Ire Thoroughbreds, as also shown by PCA and parsimony analyses (Figure S6).

Scandinavian and Northern European Breeds

As in the NJ and parsimony trees, the Finnhorse, Icelandic, Miniature, North Swedish Horse, Norwegian Fjord, and Shetland are parsed into the same cluster (q-value > 0.5) through K = 5. However, unlike the NJ tree, at K = 4, the highest value of assignment places the Belgian and Percheron into this cluster although with q < 0.5 (0.42 and 0.38, respectively). The relationship remains until K = 6, at which point the Miniature, Icelandic, and Shetland fall into a different cluster. At K = 10, the Icelandic clusters again with the North Swedish Horse and Norwegian Fjord. The Norwegian and United States Standardbred populations, which at K = 4 assign with q > 0.5 to the cluster containing the Scandinavian breeds, separate from the Scandinavian breeds at K = 5. At K = 31, substructure appears in the Standardbred samples, which corresponds to those individuals identified as pacers and that fall into an individual clade in the parsimony tree (Figure S7). At K = 29, the Miniature and Shetland continue to be assigned to the same cluster (q-values = 0.55 and 0.95, respectively). The next highest proportions of assignment of the Miniature horse are to the clusters described by the New Forest Pony (q = 0.20) and Icelandic (q = 0.11). Signals of admixture were not eliminated from all populations in the dataset at K = 38 (the actual number of populations sampled) or at any value of K through 45 (data not shown).

British Isles Draft

The Clydesdale and Shire cluster together, and apart from the other breeds, beginning at K = 3. In addition, the Fell Pony, which is placed within the same clade in the NJ and parsimony trees, and proximal to the Clydesdale and Shire in PCA, shows moderate assignment to this cluster (0.29 < q < 0.41) for several values of K from 4 to 14. At K = 29, the Shire assigns to the same cluster as the Clydesdale with q = 0.69. The individual outliers from the Shire breed also noted in parsimony analysis are evident beginning at K = 3. Excluding these outliers, at K = 29, the proportion of assignment for the Shires to the cluster with the Clydesdale increases to 0.74.

FST

All pairwise FST values calculated between the 37 populations (excluding the Florida Cracker) were significant as tested by 20,000 permutations (Figure 4). The lowest level of differentiation was found between the Paint and Quarter Horse populations (FST = 0.002), while the greatest divergence was observed between the Clydesdale and Mangalarga Paulista (FST = 0.254). The two Thoroughbred populations had an FST value of 0.004, while the two Standardbred populations had 10-fold greater divergence (FST = 0.020) than the minimum observed value in this dataset; this value is similar to that observed between the Lusitano and Andalusian (0.021). An FST value of 0.006 was identified between the Tuva and Mongolian populations. The global FST value was 0.100. AMOVA computed on the set of 37 samples (excluding the outliers identified in Structure and the Florida Cracker) showed that 10.03% of the variance was accounted for among populations (p < 0.001), 0.53% of the variance was among individuals within populations (p = 0.19), and 89.44% of the variation was within individuals (p < 0.001).
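To make the scale of these FST values concrete, the following sketch computes a pairwise FST from per-SNP allele frequencies in two populations. It is a simplified heterozygosity-based form, FST = (HT - HS) / HT summed over loci, and is not the estimator applied by Arlequin in this study; the function name and example frequencies are hypothetical, and equal sample sizes are assumed.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Heterozygosity-based FST between two populations.

    freqs_a and freqs_b hold the frequency of the same allele at each SNP.
    Numerator and denominator are summed over loci before dividing.
    """
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        numerator += h_total - h_within
        denominator += h_total
    # No polymorphism in the pooled sample: differentiation is undefined,
    # reported here as 0.0 by convention.
    return numerator / denominator if denominator > 0 else 0.0

# Hypothetical allele frequencies (illustration only).
fst_identical = pairwise_fst([0.3, 0.5], [0.3, 0.5])
fst_modest = pairwise_fst([0.6], [0.4])
fst_fixed = pairwise_fst([1.0], [0.0])
```

In this toy example, frequencies of 0.6 versus 0.4 at a single SNP give FST = 0.04, so values near 0.002 imply almost identical allele frequencies between the two populations.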

Discussion

These data are gathered from populations that represent tremendous diversity in phenotype and breed specialization. With breeds sampled across four continents, the resulting relationships observed largely reflect similarities of geographic origin, documented breed histories, and shared phenotypes. In general, the highest within-breed diversity was observed in breeds that are recently derived, breeds that continue to allow introgression of other populations, breeds that have a large census population size, and landrace populations that experience a lesser degree of controlled breeding. Not surprisingly, low diversity is observed in breeds with small census size, relatively old breeds with closed populations, and those with documented founder effects, whether due to population bottlenecks or selective breeding.

A total of seven individuals were identified by parsimony and cluster analysis as outliers with respect to the breed to which they were assigned. The pedigrees of these individuals were unknown. Because it is possible these horses were unknowingly crossbred or subject to mishandling in the field or laboratory, they were excluded from the within-breed analyses to avoid potential bias in indices of diversity. In addition, the potential impact of SNP ascertainment bias on diversity calculations must be acknowledged. The reference genome is from a Thoroughbred mare [40] and SNP identification was based upon the reference genome and data from seven other horses representing six breeds. Therefore, SNPs are generally derived to identify modern variation within the Thoroughbred as well as between the Thoroughbred and these other breeds. Thus, the SNPs identified may reflect an upward bias in diversity indices in the Thoroughbred and closely related breeds [41]. It seems that ascertainment bias may have particularly influenced the results when considering the data sets that have an increased number of loci resulting from relaxed LD pruning. These results show an increase in the relative diversity of the Thoroughbred, breeds with which the Thoroughbred continues to actively interbreed, and the other SNP discovery breeds, with respect to other breeds in the study. This is the opposite of what may be expected given the high levels of genome-wide LD in the Thoroughbred. Without considering SNP ascertainment bias, it is expected that measured diversity would increase in breeds with low LD more quickly than in those with high LD, due to greater independence of markers in the former breeds. These SNPs, derived largely from the Thoroughbred, are apparently detecting a higher proportion of Thoroughbred-specific, rare variants and it appears that as more loci are included, more of these Thoroughbred-based variants are assayed, resulting in the observed increase in variation in the Thoroughbred, Thoroughbred-influenced breeds, and breeds used in SNP ascertainment.

The majority of the analyses were performed using 10,536 SNP markers pruned across breeds for LD (r² > 0.2) and filtered for MAF of 0.05 or above. Even though additional markers could have been used for analysis, many population-level statistics assume independence of loci. The stringent pruning for LD was therefore undertaken to eliminate bias in the test statistics that may result from substantial breed-specific differences in LD [40,41]. A truncated data set also helped to make calculations, especially cluster analysis, computationally feasible. On the other hand, diversity indices were calculated after pruning the full data set to r² = 0.2 and r² = 0.4 (using pairwise correlation), and one replicate setting the threshold to R² < 0.1 (using the variance inflation factor), to examine the effect of allowing for varying levels of LD and therefore varying numbers of loci (see methods).
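The marker filtering described above can be sketched as follows. This is a simplified greedy version that assumes genotypes are coded as 0, 1, or 2 copies of one allele and that r² is the squared correlation between genotype codes at two SNPs; it is not the exact algorithm of the software used in the study, and the function names, window size, and toy genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1, or 2 copies of one allele."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)

def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def prune_snps(snps, maf_min=0.05, r2_max=0.2, window=50):
    """Return indices of SNPs kept after MAF filtering and LD pruning.

    A SNP is kept if its MAF is at least maf_min and its r2 with each of
    the last `window` kept SNPs does not exceed r2_max.
    """
    kept = []
    for index, genotypes in enumerate(snps):
        if minor_allele_frequency(genotypes) < maf_min:
            continue
        if all(genotype_r2(genotypes, snps[j]) <= r2_max for j in kept[-window:]):
            kept.append(index)
    return kept

# Hypothetical genotypes: rows are SNPs, columns are six individuals.
toy_snps = [
    [0, 1, 2, 1, 0, 2],  # kept
    [0, 1, 2, 1, 0, 2],  # perfect LD with the first SNP, pruned
    [1, 0, 1, 2, 1, 1],  # uncorrelated with the first SNP, kept
    [0, 0, 0, 0, 0, 0],  # monomorphic, removed by the MAF filter
]
kept_indices = prune_snps(toy_snps)
```

Raising r2_max from 0.2 to 0.4 in such a procedure retains more, but less independent, markers, which is the trade-off examined in the diversity calculations.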

Within-breed Diversity

Even considering SNP ascertainment, low diversity as measured by He was observed in the Thoroughbred as well as the Standardbred, which both experience high selective pressures and are closed populations. Low diversity was also observed in breeds that have undergone a severe population bottleneck, such as the Exmoor and Clydesdale, and breeds that have small census population sizes, such as the Florida Cracker. Although the Thoroughbred is a large population that is widely distributed on a geographic scale, historic records suggest that one sire is responsible for 95% of the paternal lineages in the breed and as few as 30 females make up 94% of maternal lineages [28]. In addition, the population has been largely closed to outside gene flow since the formation of the first stud book in 1791 [42] and individuals within the breed are subject to selective pressure for racing success; therefore, low within-breed diversity is not at all surprising.

Using LD-based calculations, the estimated Ne for the Thoroughbred was similar to that found in a UK sample [43] and among the lowest of the study set despite the large census population size and geographic distribution of this breed. Individual inbreeding values based upon observed vs. expected homozygosity indicate that individual Thoroughbred horses show signs of inbreeding, with a mean loss of heterozygosity of 16.3%. This value is slightly larger than that found in [28] (13.9%). Using the same SNP array, [44] also showed inbreeding in the Thoroughbred, and specifically an increase in inbreeding over time. The only breeds with higher f values were the Exmoor, Clydesdale, Mangalarga Paulista, and Shire. Despite low individual diversity, FIS values do not show significant inbreeding in either of the Thoroughbred populations as a whole, or in the Norwegian Standardbred, although FIS is significant in the US Standardbred population (discussed below).
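The individual inbreeding estimate based on observed versus expected homozygosity can be illustrated with a minimal sketch. It assumes biallelic SNP genotypes coded as 0, 1, or 2 copies of one allele and breed-level allele frequencies, and it omits any small-sample correction that the software used in the study may apply; the function name and toy inputs are hypothetical.

```python
def inbreeding_coefficient(genotypes, allele_freqs):
    """f = (O - E) / (N - E) for one individual.

    O: observed number of homozygous loci (genotype code 0 or 2)
    E: expected number of homozygous loci under random mating,
       the sum over loci of 1 - 2p(1 - p)
    N: number of loci
    """
    n_loci = len(genotypes)
    observed_hom = sum(1 for g in genotypes if g != 1)
    expected_hom = sum(1 - 2 * p * (1 - p) for p in allele_freqs)
    if n_loci == expected_hom:
        raise ValueError("all loci are monomorphic; f is undefined")
    return (observed_hom - expected_hom) / (n_loci - expected_hom)

# Hypothetical example: four SNPs, each with allele frequency 0.5.
toy_freqs = [0.5, 0.5, 0.5, 0.5]
f_all_homozygous = inbreeding_coefficient([0, 2, 0, 2], toy_freqs)
f_as_expected = inbreeding_coefficient([0, 1, 2, 1], toy_freqs)
f_all_heterozygous = inbreeding_coefficient([1, 1, 1, 1], toy_freqs)
```

Under this definition, f = 0.163 means that an individual carries 16.3% fewer heterozygous loci than expected from the breed's allele frequencies, and negative values indicate an excess of heterozygosity.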

The Clydesdale and Exmoor, in addition to having high individual estimated coefficients of inbreeding, also show the lowest within-breed diversity observed in the dataset. A lack of diversity in the Clydesdale and another British draft breed, the Shire, is likely a result of a severe population bottleneck observed in most draft breeds with the onset of industrialization and after the conclusion of World War II (WWII) as well as selection for size and color [45,46]. The Exmoor pony, considered to be one of the purest native breeds of Britain, has been naturally selected for survival in harsh winter conditions on the moors in southwest England [45,47]. Similar to the draft breeds, the Exmoor population decreased significantly after WWII to approximately 50 individuals, undoubtedly influencing the diversity observed in this study. The effect of low population size and selection is also reflected in extremely high individual estimates of f within some individuals. Finally, the Mangalarga Paulista shows low levels of heterozygosity and, as discussed below, the greatest divergence as measured by pairwise FST of all breeds in the study. While these results could be due to geographic distance between this and other breeds, and/or genetic drift, these horses were all sampled from only two farms and likely do not represent the entirety of the diversity present in the breed; therefore, we cannot rule out sampling error, which would inflate the estimated level of divergence between these individuals and the other breeds and result in a decrease in He. However, a lack of diversity in sampling of the breed would not have an effect on estimates of individual inbreeding coefficients, which were among the highest of the entire data set.

Figure 4. Pairwise FST values based upon 10,536 SNPs in 37 horse populations. Pairwise FST values as calculated in Arlequin using 10,536 autosomal SNPs and significance tested using 20,000 permutations. All pairwise values are significantly different from zero (individual outliers were removed from this analysis).

doi:10.1371/journal.pone.0054997.g004

In contrast to the above examples, high levels of diversity as measured by both He and Ne, accompanied by low estimates of inbreeding (f and FIS), are observed in the Mongolian, Tuva, and New Forest Pony. The Mongolian and Tuva are unique in that they represent landrace populations that are less managed than the popular breeds of Western Europe and North America; they occupy a diverse range of habitat, have been selected for meat and milk in addition to use in transportation, and originate in the region where domestication was likely to have occurred. The population of Mongolian horses is large and individuals are phenotypically diverse [48]. In 1985, approximately two million Mongolian horses of four different types were estimated to live within the country [45]. The Tuva is not as numerous as the Mongolian but is similar in its purpose and also has high within-breed phenotypic diversity. In addition, it is suggested that the Tuva has experienced outcrossing in order to increase its size and stamina [45], as may also be the case in the Mongolian [49]. Similarly, the New Forest Pony was historically a free-ranging population in Great Britain, but was crossbred until the 1930s. These traits (old populations, large population size, outcrossing, high phenotypic diversity, and lesser artificial selection/management) result in the high levels of genetic diversity observed. This extent of diversity appears to diminish as populations are restricted by selective pressures into formal breeds.

Other population characteristics are likely the cause of the diversity observed in the Finnhorse, Icelandic, and Miniature. In the case of the Icelandic, the high level of diversity was possibly maintained by a large census population size despite isolation for almost a thousand years and several population bottlenecks due to natural disasters [45]. In the case of the Finnhorse, diversity may be due to within-breed substructure into four sections of the studbook established in 1970: the work horse (draft), trotters, riding horse, and pony [50]. Finally, high diversity in breeds such as the Miniature is likely a result of a diverse founding stock [45,51,52]; horses of small size from a variety of geographic regions and bloodlines were utilized in founding the breed, which is defined by phenotype.

All three of these factors (large population size, phenotypic diversity within the breed, and a diversity of founding stock) also lead to the relatively high levels of diversity observed in the Paint and Quarter Horse; in addition, these breeds both allow continued outcrossing between themselves and with the Thoroughbred and have experienced a tremendous population expansion since the formal foundation of the breeds within the past 45–75 years. Due to the relative infancy of these populations, it could be argued that the Paint, Quarter Horse, and other newly derived breeds have not yet had time to undergo the evolutionary processes necessary to be genetically distinct populations, as is observed in breeds with longer histories and closed studbooks. However, even with high within-breed diversity and large census population sizes (over 1 million and 4 million worldwide for the Paint and Quarter Horse, respectively), Ne for these breeds accounts for only a fraction of the census size, demonstrating non-random mating and selection.

Outcrossing is also continued in the Swiss Warmblood and Hanoverian breeds, which show similar trends in diversity measures as the Quarter Horse and Paint. The relatively low Ne in these breeds, accompanied by moderate He, may partially be due to significant crossing with the Thoroughbred, which would contribute long blocks of LD [40,41], resulting in decreased estimates of Ne.
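The link between long blocks of LD and lower Ne estimates can be illustrated with Sved's approximation, E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the recombination distance in Morgans. This is a sketch under that assumption only; the exact LD-based formulation and any sample-size corrections used in the study may differ, and the function name and input values are hypothetical.

```python
def ne_from_ld(mean_r2, distance_morgans):
    """Effective population size implied by Sved's approximation.

    Rearranges E[r2] = 1 / (1 + 4 * Ne * c) to Ne = (1 / r2 - 1) / (4 * c).
    """
    if not 0 < mean_r2 < 1:
        raise ValueError("mean r2 must lie strictly between 0 and 1")
    if distance_morgans <= 0:
        raise ValueError("recombination distance must be positive")
    return (1 / mean_r2 - 1) / (4 * distance_morgans)

# Hypothetical mean r2 values for marker pairs 0.01 Morgans apart.
ne_low_ld = ne_from_ld(0.2, 0.01)
ne_high_ld = ne_from_ld(0.5, 0.01)
```

At the same marker distance, a higher mean r² yields a smaller estimate, which is the direction of the effect described for breeds receiving long haplotype blocks from the Thoroughbred.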

Of note in breeds such as the Quarter Horse, Lusitano, and Andalusian is that, despite moderate to high relative levels of He and low to moderate estimates of f, FIS values in each breed are significantly positive. Significant FIS was also previously observed in the Iberian breeds using microsatellite markers [53]. While selection and inbreeding may be responsible for significant values of FIS in some of these breeds, another instance in which FIS may be significantly positive is in the presence of subpopulation structure within the sample. Evidence of this in the Lusitano and Andalusian is present in parsimony analysis, where individuals of the two breeds fall into one clade, but within that clade are two highly supported branches represented by a subset of each breed. In addition, when forcing high values of K in Structure, such as observed at K = 35, Andalusian and Lusitano individuals fall into one of two clusters with q-value > 0.5 in a nonbreed-specific manner (data not shown). These results support [54], which showed potential subpopulation structure in the Lusitano via microsatellite analysis. In the Quarter Horse, subpopulation structure is evident through the evaluation of bloodlines and the selection of popular sires for diverse performance classes. This population substructure in the Quarter Horse has also been demonstrated by marked differences in allele frequencies among performance types (cutting, western pleasure, halter, racing, etc.) [55]. A similar instance is found in the US population of the Standardbred, which also has significant excess homozygosity (FIS). Unlike Standardbreds in Europe, which are raced at a trot, those in the US are divergently selected for racing at either the pace or the trot, creating structure within the breed [56].

Finally, several rare populations are included in this dataset. The Caspian is one of the oldest breeds in the Middle East and was thought to be extinct until its rediscovery in 1965. The Florida Cracker, a now rare breed, was developed in the United States from feral stock of Iberian descent [57]. The sample size of the Florida Cracker limits the conclusions that can be drawn regarding within-breed diversity. However, the Caspian shows high Ne, He, and estimates of f, given its rarity. After rediscovery of the breed, which historically was believed to represent a type of landrace population, [58] describes a three-year survey, which found approximately 50 individuals remaining, noting that many could not be considered “pure.” In addition, [33] were unable to show evidence of a recent bottleneck in the Caspian breed. The diversity observed in what are now considered Caspian horses likely stems from high levels of diversity within those individuals that founded the modern population.

Among-breed Diversity

The expectation of homogeneity within breeds due to closed populations and selection is supported by the results of AMOVA, which show significant variation present among populations, but a non-significant proportion of variance within. However, the variation among samples lends information about current and
