Genome-wide single nucleotide polymorphism (SNP) identification and characterization in a non-model organism, the African buffalo (Syncerus caffer), using next generation sequencing
Nathalie Smitz a,b,∗, Pim Van Hooft c, Rasmus Heller d, Daniel Cornélis e,f, Philippe Chardonnet g, Robert Kraus h,i, Ben Greyling j, Richard Crooijmans k, Martien Groenen k, Johan Michaux a,e
a Conservation Genetics, University of Liège, Boulevard du Rectorat 26, 4000 Liège, Belgium
b Joint Experimental Molecular Unit, Royal Museum for Central Africa, Leuvensesteenweg 11-17, 3080 Tervuren, Belgium
c Resource Ecology Group, Wageningen University, P.O. Box 47, 6700 AA Wageningen, The Netherlands
d Bioinformatics, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
e Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), UPR AGIRS, Campus International de Baillarguet, F-34398 Montpellier, France
f Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD) - RP-PCP, University of Zimbabwe, Harare, Zimbabwe
g International Foundation for the Conservation of Wildlife (IGF), Rue de Téhéran 15, 75008 Paris, France
h Department of Biology, University of Konstanz, 78457 Konstanz, Germany
i Department of Migration and Immuno-Ecology, Max Planck Institute for Ornithology, Am Obstberg 1, 78315 Radolfzell, Germany
j Agricultural Research Council, Old Olifantsfontein Road, Irene Centurion 0062, South Africa
k Animal Breeding and Genomics Centre, Wageningen University, Droevendaalsesteeg 1, Wageningen, 6708 PB, The Netherlands
Keywords:
Population genomics; Conservation; Disease ecology; Molecular markers
Abstract
This study aimed to develop a set of SNP markers with high resolution and accuracy within the African buffalo. Such a set can be used, among other applications, to depict subtle population genetic structure for a better understanding of buffalo population dynamics. In total, 18.5 million DNA sequences of 76 bp were generated by next generation sequencing on an Illumina Genome Analyzer II from a reduced representation library using DNA from a panel of 13 African buffalo representative of the four subspecies. We identified 2534 SNPs with high confidence within the panel by aligning the short sequences to the cattle genome (Bos taurus). The average sequencing depth of the complete aligned set of reads was estimated at 5x, and at 13x when only considering the final set of putative SNPs that passed the filtering criterion. Our set of SNPs was validated by PCR amplification and Sanger sequencing of 15 SNPs. Of these 15 SNPs, 14 amplified successfully and 13 were shown to be polymorphic (success rate: 87%). The fidelity of the identified set of SNPs and potential future applications are finally discussed.
Introduction
The African buffalo (Syncerus caffer) has suffered major population losses during the last century, affecting all subspecies, although unevenly. Habitat loss, climatic changes, poaching and diseases are the main challenges currently threatening the species' survival, contributing to the decimation of local buffalo populations. Direct competition for space and resources gradually appeared with the expansion of livestock farming and agriculture. Currently around 75% of the global African buffalo population is located in protected areas (East, 1999). The resulting disruption of natural wildlife population admixture is likely responsible for genetic erosion (Young and Clarke, 2000; Frankham et al., 2002). Isolated populations are likely to have lower reproductive fitness and lose their adaptive genetic variation, while presenting a higher risk of extinction (Frankham et al., 2002). Conservation genetics helps in identifying and promoting appropriate management methods to reduce the risks of species extinction through the study of the spatial distribution of mutations between and among populations. Recent technological advances have revolutionized the generation of these genetic resources, allowing DNA-library construction, large-scale sequencing and identification of single nucleotide polymorphism (SNP) genetic markers (Seeb et al., 2011). SNPs were shown to constitute highly informative markers (Morin et al., 2009) and lead to a better inference of population structure than microsatellites (Liu et al., 2005; Santure et al., 2010). Attention has begun to shift toward SNPs as preferred genetic markers due to their increased power of resolution and accuracy for studying fine-scale population structure (Schlötterer, 2004). This is based on their high abundance throughout the genome, simple mutation characteristics, low mutation rates, usability on non-invasive samples and historical DNA, and standardization possibilities between laboratories (Kraus et al., 2014; Morin et al., 2007a,b, 2004; Luikart et al., 2003). SNPs have become an established marker in molecular ecology, evolutionary genetics, and animal breeding (Davey et al., 2011; Kraus et al., 2014, 2012; Morin et al., 2004; Santure et al., 2010).
∗ Corresponding author. Present address: MRAC, Leuvensesteenweg 11-17, 3080 Tervuren, Belgium.
E-mail address: nathalie.smitz@africamuseum.be (N. Smitz).
Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-379374
Published in: Mammalian Biology - Zeitschrift für Säugetierkunde; 81 (2016), 6, pp. 595-603 https://dx.doi.org/10.1016/j.mambio.2016.07.047
Despite their attractiveness, developing SNPs in non-model organisms has proven difficult because genomic resources are limited or absent, leading to complex laboratory screening of segments of the genome from multiple individuals to yield only a small number of independent SNPs. Next-generation sequencing (NGS) overcomes this issue by enabling large-scale genome variation studies based on deep sequencing of relatively large genome fractions (>1%) or even the complete genome (Seeb et al., 2011). However, until recently, the predominant technique within non-model organisms was the targeted gene approach, using regular Sanger sequencing (Sanger et al., 1977), since it does not require species-specific pre-existing DNA data and is applicable to many taxa. A few hundred SNPs were identified using this approach for numerous species (e.g., 158 SNPs, Sceloporus undulatus; 112 SNPs, Salmo salar; 768 SNPs, Pusa hispida hispida; 168 SNPs, Thryothorus pleurostictus) (Andreassen et al., 2010; Cramer et al., 2008; Olsen et al., 2011; Rosenblum et al., 2006). Only a few SNPs per species (<100) have been developed using the targeted gene approach for animals of conservation concern such as the marmoset (Saguinus oedipus), the dhole (Cuon alpinus) and the elephant (Loxodonta africana) (Aitken et al., 2004). The targeted gene approach, although still widely used, is laborious, time consuming and costly, and yields only a fairly limited number of candidate SNPs in contrast to NGS.
The Reduced Representation Library (RRL) approach is an NGS method that involves a digestion step of multiple genomic DNA samples with restriction enzyme(s), a selection of the resulting restriction fragments and a sequencing step. RRL approaches have been used to generate tens of thousands to millions of candidate SNPs with genome-wide coverage, for example in cattle (Tassell et al., 2008), turkey (Kerstens et al., 2009) and great tit (Van Bers et al., 2010). Alternatively, SNP resources from one species could be used in a closely related species. An Illumina BovineSNP50 BeadChip has been developed for cattle (Bos taurus), a close relative of the African buffalo (Matukumalli et al., 2009). This BeadChip scores 54,001 informative SNPs that are uniformly distributed along the entire cattle genome. It has a high cross-amplification success rate across cattle breeds (Matukumalli et al., 2009). However, when used on other bovid species, the number of polymorphic sites decreases substantially. Only a few percent of all SNPs on the chip were still polymorphic (Miller et al., 2010) when tested on other species such as the water buffalo (Bubalus bubalis, 1159 SNPs), the Yak (Bos grunniens, 949), the North American Bison (Bison bison, 1604), and the Banteng (Bos javanicus, 1429) (Michelizzi et al., 2011). Similar results were obtained when testing the OvineSNP50 BeadChip, developed for domestic sheep, on two related ovine species (Miller et al., 2010). Cross-species amplification of SNP assays usually does not work well compared to cross-species amplification of microsatellites (Kraus et al., 2012). Even if genotyping is successful, many polymorphisms in one species are fixed in the other. Moreover, cross-species SNPs may harbor extreme biases in allele frequencies, since they may predominantly be found in regions of the genome under natural selection favoring polymorphism (e.g., balancing selection).
Since cross-species genotyping of SNPs often seems problematic or biased, this study aims to characterize a genome-wide set of SNPs specifically for the African buffalo over its whole distribution area (sub-Saharan Africa). A previous study conducted by Le Roex et al. (2012) already aimed at identifying SNPs in the African buffalo; however, their sampling was limited to the Cape buffalo subspecies (Syncerus caffer caffer) and to the Hluhluwe-iMfolozi National Park (NP). The buffalo population within this National Park is known to be affected by strong non-equilibrium conditions linked to a founder event (Smitz et al., 2014; Du Toit, 1954; Kappmeier et al., 1998). In the present study, next generation sequencing of reduced representation libraries was used for SNP discovery. The genome of another bovid species, Bos taurus, which diverged from African buffalo approximately 12 million years ago, was used as a reference for mapping the reads (Hassanin and Ropiquet, 2004; Pitra et al., 2002; Robinson and Ropiquet, 2011; TimeTree software - Hedges et al., 2006; Kumar and Hedges, 2011). The present study allowed the identification of 2534 SNPs with high confidence by aligning short sequences of the African buffalo (Syncerus caffer) to the cattle genome (Bos taurus).
Material and methods
Sample collection and library preparation
A geographically large and diverse panel of African buffalo was sampled: 6 from East and Southern Africa [South Africa (2), Uganda (1), Kenya (1), Ethiopia (1), Namibia (1)] belonging to the Syncerus caffer caffer subspecies, and 7 from West and Central Africa [Central African Republic (1), Niger (3), Chad (2), Burkina Faso (1)] belonging to the S. c. nanus, S. c. brachyceros and S. c. aequinoctialis subspecies (Fig. 1). The latter three subspecies were grouped together because phylogenetic studies showed that they form one clade with only minor to moderate FST differentiation between subspecies, ranging between 0.02 and 0.12 (Smitz et al., 2013; Van Hooft et al., 2002). Sample extraction, selection and RRL preparation procedures are available as Supplementary information (Supplementary file 1).
Sequence filtering
Prior to the sequence alignment steps, several filters were applied to the raw Illumina sequence data. First, sequences were expected to start with a CT dinucleotide because of the AluI restriction site (AGCT, cut between AG and CT). All sequences not bearing this pattern were discarded as potential contamination. Secondly, an average quality score was calculated for each read by taking the mean of the individual scores at each of the 76 positions. Reads presenting low overall phred quality scores were removed (Ewing and Green, 1998). Moreover, read ends displaying two successive read positions with average phred quality scores lower than 20 were trimmed from the first read position with a phred score <20.
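The three read-level filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, the mean-quality cut-off of 20 is an assumed value (the text does not give one), and the tail trimming is applied here to each read individually.

```python
from typing import List, Optional, Tuple


def trim_low_quality_tail(quals: List[int], threshold: int = 20) -> int:
    """Return the number of bases to keep.

    The read is cut at the first position where that base and the next one
    both have a phred score below `threshold`.
    """
    for i in range(len(quals) - 1):
        if quals[i] < threshold and quals[i + 1] < threshold:
            return i
    return len(quals)


def filter_read(
    seq: str,
    quals: List[int],
    min_mean_quality: float = 20.0,  # assumed cut-off, not stated in the text
    prefix: str = "CT",              # expected start after AluI digestion
) -> Optional[Tuple[str, List[int]]]:
    """Apply the three filters; return the trimmed read or None if discarded."""
    if not quals or not seq.startswith(prefix):
        return None  # possible contamination
    if sum(quals) / len(quals) < min_mean_quality:
        return None  # low overall quality
    keep = trim_low_quality_tail(quals)
    return seq[:keep], quals[:keep]
```

A read such as `CTAGGA` with qualities `[30, 30, 30, 30, 10, 10]` would be kept and shortened to `CTAG`, whereas a read that does not start with `CT` would be dropped.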
Fig. 1. Sampling localities of African buffalo included in this study. The west-central clade (S. c. nanus, S. c. brachyceros, S. c. aequinoctialis) and the south-eastern lineage (S. c. caffer) were sampled; 1, Pama Reserve; 2, W NP; 3, Aouk NP and Zakouma NP; 4, N'gotto-Kindo; 5, Omo NP; 6, Nakuru NP; 7, Queen Elizabeth NP; 8, Waterberg NP; 9, Hluhluwe Umfolozi NP.
Sequence mapping and SNP discovery
Quality filtered and trimmed sequence reads were aligned to the bovine reference genome (Bos taurus; UCSC Genome Bioinformatics; http://genome.ucsc.edu/ (21/01/2015)) since an African buffalo genome sequence is not available. The Mosaik Assembler software (Mosaik 1.0.1388 - Stromberg, 2010) was used with default settings, specifying a median fragment length of 50 bp (i.e., inner mate distance) with a search radius of 50 bp to search for a missing mate and for an alignment that conforms to the paired-end orientation. The ALL alignment mode was used with a hash size of 15 (all hashing strategy), a maximum percentage of mismatches allowed of 15, and a minimum cluster size of 35. Consensus candidate SNPs were extracted from the dataset using the SAMTools software with the pileup function (SAMTools 0.1.7 – Sequence Alignment/Map - (Li et al., 2009a,b)). Candidate SNPs were then filtered for having a phred quality score above 20 (quality of base calling >99%), for having a mapping quality score above 30, and for a minor allele occurrence at the polymorphic position of at least three times. Positions that were monomorphic within the SNP discovery pool but showed a fixed difference with the reference genome (Bos taurus) were discarded. Finally, SNPs with a read depth four times higher than the average read depth of the RRL (average total number of reads aligned with the reference genome to a unique position) were also discarded (Kerstens et al., 2009), as these are likely to be false SNPs that result from the alignment of paralogous sequences.
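The candidate SNP filters listed above can be expressed as a single predicate. The sketch below is a hedged illustration: the `Candidate` record and `passes_filters` function are hypothetical names, the thresholds are treated as inclusive minima (the Methods say "above", the Results say "minimum"), and the bi-allelic requirement is taken from the Results section.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Candidate:
    """Hypothetical summary of one pileup position."""
    allele_counts: Dict[str, int]  # base -> number of supporting reads
    snp_quality: int               # phred-scaled quality
    mapping_quality: int           # phred-scaled mapping quality


def passes_filters(
    c: Candidate,
    mean_depth: float,
    min_qual: int = 20,
    min_mapq: int = 30,
    min_minor_count: int = 3,
    max_depth_factor: float = 4.0,
) -> bool:
    observed = {b: n for b, n in c.allele_counts.items() if n > 0}
    # One observed allele: monomorphic in the pool (including fixed
    # differences with the reference). More than two: tri-/tetra-allelic.
    if len(observed) != 2:
        return False
    if min(observed.values()) < min_minor_count:
        return False
    if c.snp_quality < min_qual or c.mapping_quality < min_mapq:
        return False
    # Excess depth suggests collapsed paralogous sequences.
    if sum(observed.values()) > max_depth_factor * mean_depth:
        return False
    return True
```

With an average depth of 5x, a position covered by seven reads of one allele and three of the other would pass, while a position covered by 25 reads would be rejected as a likely paralog.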
For the validation of the filtered SNP set, primers had to be designed in the direct flanking region around the SNPs. Therefore, a buffalo consensus sequence was generated from the RRL sequences and flanking sequences around SNP positions were extracted. Where possible, flanking regions were generated based on the specific African buffalo consensus sequence. If a SNP was situated close to the beginning or end of the reads, the flanking regions were generated using part of the Bos taurus genome, concatenated with African buffalo consensus information ('chimeric flanking sequences' (Jonker et al., 2012)).
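The idea of a chimeric flanking sequence can be illustrated with a small sketch. It assumes a hypothetical data layout (a reference string plus a dictionary of buffalo consensus bases keyed by 0-based reference position) and an arbitrary convention of upper case for buffalo-derived bases and lower case for cattle-derived bases; neither is taken from the study.

```python
from typing import Dict


def chimeric_flank(
    reference: str,
    consensus: Dict[int, str],
    snp_pos: int,
    flank: int,
) -> str:
    """Build the flanking sequence around `snp_pos`.

    Buffalo consensus bases are used where reads cover the position
    (upper case); the cattle reference fills the remaining positions
    (lower case). The SNP itself is written as "[N]".
    """
    start = max(0, snp_pos - flank)
    end = min(len(reference), snp_pos + flank + 1)
    parts = []
    for pos in range(start, end):
        if pos == snp_pos:
            parts.append("[N]")
        elif pos in consensus:
            parts.append(consensus[pos].upper())
        else:
            parts.append(reference[pos].lower())
    return "".join(parts)


# Toy example: reads cover positions 3-5 only, so the outer bases
# of the flank come from the reference.
print(chimeric_flank("ACGTACGTAC", {3: "T", 4: "G", 5: "C"}, snp_pos=4, flank=3))
# prints cgT[N]Cgt
```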
Validation
Our set of SNPs was validated by randomly selecting 15 SNPs scattered across the whole genome and amplifying them by standard PCR. Only SNPs that reached an Illumina design score of at least 0.6 in Illumina's Assay Design Tool, available at http://www.illumina.com/support/array/array software (21/01/2015), were selected. Primers were designed using Primer3 (Rozen and Skaletsky, 2000; http://simgene.com/Primer3 (21/01/2015)), entering our chimeric flanking sequences (Table 1). As the cattle genome was used to align our merged sequences, we wanted to specifically test the sequence conservation between Bos taurus and Syncerus caffer. If sequence conservation is high, amplification of SNPs will be successful when primers are designed within the adjacent region of the bovine genome surrounding the read containing the SNP candidate. Therefore, three situations (five SNPs per situation) were tested: (1) both forward and reverse primers designed on the bovine genome where no RRL reads aligned, (2) one forward primer designed on the reference genome and one reverse primer designed on the RRL reads (Syncerus caffer), and (3) both primers designed solely from the RRL reads. Validation thus required SNPs firstly to amplify successfully and secondly to be polymorphic. Each of these 15 SNPs was sequenced in all 13 individuals used in the original discovery pool. The PCR reaction took place in a total volume of 12 µl, consisting of 3 µl DNA (10 ng/µl), 0.5 µl of primers (0.03 µg/µl), 5.2 µl Mastermix (Thermo Scientific) and 0.3 µl AmpliTaq® DNA Polymerase. Cycling conditions consisted of 36 cycles of 30 s at 95 °C, 30 s at 50 °C and 30 s at 72 °C. An initial denaturation step preceded the process (5 min at 95 °C), and a final extension step followed the process (10 min at 72 °C). Sequencing was performed on an ABI 3730XL capillary sequencer. The resulting sequences were aligned using CLUSTAL X (Thompson et al., 1997) as implemented in BIOEDIT v. 7.09 (Hall, 1999) and SNPs were validated visually.
Table 1
Primer sets used for the SNP validation step (n/a: no amplification).
Chr nr.bp position  Primer set                                    SNP  Observed polymorphism
Chr1.75025101       TTTGGATCAGGAGGAACCAGCCCCTTTGGTGGAACATTTA      A/G  Yes
Chr5.14429154       AAAGGATTTCTGTTGGTGGAGAGATTTGCCTTCTCAAACTGGA   A/G  Yes
Chr6.119157451      TGAAATCTAACTGCCTGGGACTCAGGTGTGCTGGTTTACAGG    C/T  Yes
Chr9.106373371      AGTCTGCCTAAAAAGCCCATTCCTCCCACGCACAGACTC       A/C  Yes
Chr10.4010071       TCACCTGAATCCCACCCTTACTCGAGAAGGGCTTTGTGAC      A/C  Yes
Chr11.72584688      AACACCCCACCTTAATGCAGGTCAGGAGAGGGCTGTCAAG      C/T  Yes
Chr11.70625560      GCCATAAGGGTGGTGTCATCCCATGGACATCCTTTTCCTG      C/T  No
Chr12.20644129      TCCATGCCCATCTGAGATTTCCTGGCCTGACTCTGAGGTA      A/C  Yes
Chr14.80545216      GAGATCCCACTCGGCTGTTAAACCGTGAGCGAAGTGAGAG      A/T  Yes
Chr15.23524504      GATGGACTTGGTGGCAATTTGCCTCAGGACCATTTTCAGA      A/G  Yes
Chr15.81001941      GCTTGTTCAGATGGCACAGAGCCAGTACTCCCCCTAGCTC      A/C  n/a
Chr16.62257511      GCGTTCCTTCAACAACCAAGGCCATCTTGATTTCCTTCCA      C/T  Yes
Chr17.4151052       TCCCAGAGCAGACAGTCTCACGGTGATCATCTGCTAATGC      A/C  Yes
Chr17.74836302      CCCTCCACTAGCTTCTCAGCAGTGGAGCTGAGGTCTTGGA      C/T  Yes
Chr19.7345443       CATAATCCCAGCCAGTCTCCGAGAGCACCCCTGAGTTGAA      C/T  Yes
Results
Sequencing of the RRL and read filtering
The AluI restriction enzyme was chosen for the construction of the African buffalo RRL since it maximized the quantity of fragments situated in the targeted size range of 100–200 bp, as evaluated by an in silico digestion of the Bos taurus genome. Correspondence between in silico and in vitro observed restriction enzyme cleavage patterns was previously demonstrated in other mammal species (Abdurashitov et al., 2006, 2007). In total, 18.5 million paired-end sequences of 76 bp length were generated by the Illumina Genome Analyzer II on two lanes. The genome coverage (Mosaik 1.0.1388, Stromberg, 2010) was estimated at about 5% of coding and non-coding regions (Fig. 2). The average phred quality score per read position dropped below 20 after position 55 for about 5 million reads. To maintain sufficient quality for SNP detection, those reads were trimmed after position 55. The average sequencing depth of the whole aligned set of reads was estimated at 5x and at 13x after the filtering steps.
Read alignment to the reference genome
The cattle genome consists of 29 autosomes and the sex chromosomes, with a total estimated genome size of 2.87 Gb (Liu et al., 2009). In total, about 60% of the reads were not retained because they did not pass the filtering criteria of the alignment: they were too short, were not unique (i.e., aligned to more than one location) or contained too many nucleotide differences. Eventually, 6.9 million reads remained for the SNP discovery and could be successfully aligned to the Bos taurus reference genome, corresponding to 836.5 million bp. Of these reads, 22% were orphans (i.e., only one read of the pair aligned to the reference, while the other did not), while 14% had one paired read that was non-unique. The physical distribution of the identified SNPs across the buffalo genome was estimated using the cattle genome as the reference.
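As a rough consistency check on these figures, the mean depth over the covered part of the genome can be approximated by dividing the aligned bases by the covered reference length. The sketch below is a back-of-envelope calculation under the assumption that the roughly 5% genome coverage reported for the RRL applies to the aligned reads; it is not the method by which the depth was estimated in the study, and the function name is hypothetical.

```python
def approximate_mean_depth(
    aligned_bases: float,
    genome_size: float,
    covered_fraction: float,
) -> float:
    """Aligned bases divided by the length of reference they cover."""
    covered_length = genome_size * covered_fraction
    return aligned_bases / covered_length


# Figures from the text: 836.5 million aligned bp, 2.87 Gb genome,
# about 5% of the genome covered.
depth = approximate_mean_depth(836.5e6, 2.87e9, 0.05)
print(f"approximate mean depth: {depth:.1f}x")
```

The result is of the same order as the 5x average depth reported for the whole aligned set of reads.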
SNP detection
A total of 318,091 putative SNPs were detected. Fixed differences between the discovery panel of African buffalo and the cattle reference genome were discarded (i.e., 22,472). The few SNPs (1.7%) with more than two alleles were also discarded, as tri- or tetra-allelic SNPs are uncommon and are more likely to be the result of sequencing errors than real polymorphism (Brookes, 1999). Furthermore, most genotyping assay designs do not allow for more than two alleles. After filtering these SNPs for minor allele count (minor allele occurring at least three times), for a minimum phred quality score of 20 and for a minimum mapping quality score of 30, 2534 SNPs in which we place high confidence remained, distributed across the entire genome (Table 2). The sequencing depth averaged 13 reads after the filtering steps. A total of 1837 SNPs had an Illumina design score ≥0.6.
Fig. 2. Reference sequence coverage of the 30 bovine chromosomes (Bos taurus).
Table 2
Number of high confidence SNPs on each of the 30 bovine chromosomes (Bos taurus).
Chr nr  SNP nr
1       89
2       98
3       135
4       92
5       105
6       51
7       103
8       70
9       54
10      73
11      114
12      58
13      97
14      69
15      80
16      68
17      72
18      90
19      112
20      50
21      102
22      106
23      89
24      89
25      119
26      84
27      73
28      61
29      92
X       39
SNP quality assessment
The ratio of transitions (TS; i.e., C/T or A/G) versus transversions (TV; i.e., A/T, G/C, A/C or G/T) was estimated as a measure of the quality of the SNP discovery. The TS:TV ratio observed within our dataset was 2.38:1 (1784 transitions versus 750 transversions), with a nearly equal number of A/G and T/C mutations (889 A/G and 895 T/C), and the four TV changes occurring at similar frequencies. This is the expected empirical ratio, while ratios substantially lower than 2 can be indicative of random genotyping error (Kraus et al., 2012). The TS:TV ratio remained similar when plotted per read position (Fig. 3), which is a good indication that there was no read position bias, such as false SNPs due to low sequencing quality towards the ends of reads (Kraus et al., 2012). Moreover, SNP predictions were tested by determining whether particular regions of sequence reads presented more SNP candidates than others. Previous studies have shown that tails of reads present disproportionately more sequencing errors, leading to false SNP identification (Dohm et al., 2008; Van Bers et al., 2010). Within the SNP set passing all filtering criteria, no overrepresentation of SNP candidates in the ends of the reads was observed.
Fig. 3. Average TS:TV ratio at each read position.
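The TS:TV ratio reported in this section can be recomputed from a list of allele pairs. The following sketch uses hypothetical function names; the transition counts are those given in the text, whereas the 750 transversions are all entered as A/C purely as a placeholder, since only their total is reported.

```python
from typing import Iterable, Tuple

# A<->G (purines) and C<->T (pyrimidines) are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(a: str, b: str) -> bool:
    return frozenset((a.upper(), b.upper())) in TRANSITIONS


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Number of transitions divided by number of transversions."""
    pairs = list(snps)
    ts = sum(1 for a, b in pairs if is_transition(a, b))
    tv = len(pairs) - ts
    if tv == 0:
        raise ValueError("no transversions: ratio undefined")
    return ts / tv


# 889 A/G + 895 C/T transitions and 750 transversions (placeholder class).
snps = [("A", "G")] * 889 + [("C", "T")] * 895 + [("A", "C")] * 750
print(round(ts_tv_ratio(snps), 2))  # prints 2.38
```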
SNP validation
Only one primer pair, designed within the RRL reads, failed to yield an amplification product. Of the 14 remaining amplification products, one did not contain the expected polymorphism, which means that 87% of the expected SNPs (13 of 15) were confirmed by Sanger sequencing of the individuals in the discovery panel. This high percentage of successful PCR amplification is similar to that observed in geese (93% of 384 SNPs tested) using the same chimeric technique (Jonker et al., 2012). The ten primer sets entirely or partly designed within the cattle genome gave PCR products with the expected SNPs observed. This corroborates the high genome conservation between cattle and buffalo.
Discussion
Model species reference genome and SNP validation
The present study enabled the identification of 2534 SNPs with high confidence in a non-model organism. Of these, 1837 SNPs had an Illumina design score ≥0.6, reflecting a high likelihood that assay design will be successful on a modern high-throughput SNP genotyping platform. About 30% of the sequence reads could be aligned to the bovine genome, the genome of a closely related species. The study of Jonker et al. (2012) used the same technique to identify SNPs in the Barnacle Goose (Branta leucopsis) by aligning 1.77 million reads to the Mallard (Anas platyrhynchos) genome (divergence time 30 million years) (Huang et al., 2013). In that study, 16.1% of the reads successfully aligned, subsequently allowing the identification of 2188 high confidence SNPs. In the African buffalo, Le Roex et al. (2012) mapped 19–23% of their short reads (50 bp reads) to the domestic cow genome (Bos taurus). Our study confirms that using the genome of a closely related species as a reference provides a sufficient number of high confidence SNPs and offers a good alternative for characterizing SNPs in non-model species without carrying out tedious steps of deep assembly of redundant contigs (Kerstens et al., 2009; Van Bers et al., 2010).
The chimeric flanking sequences obtained from the cattle genome were also used to generate primers for the validation steps. High PCR amplification success with chimeric primers indicated sufficient conservation between the genomes of the two species to use them for genotyping assay design. The high PCR amplification success could likely be attributed to the correspondence of the aligned filtered reads to highly conserved sequences. Van Hooft et al. (1999) previously demonstrated high genome conservation when using microsatellite primers developed for cattle on African buffalo, with 83% successful amplification.
Ascertainment bias
Ascertainment bias can result from the selection of loci from an unrepresentative sample of individuals. To limit this kind of bias, a relatively large pool of samples covering the whole distribution area of the targeted species was selected, comprising all four currently taxonomically recognized subspecies of African buffalo.
However, ascertainment bias can also be introduced by limited read depth. Because our protocol stipulates a minor allele count of three, the sequencing depth must comprise at least six reads for a SNP to be retained. Our average sequencing depth of the whole putative SNP database was estimated at 5x, which increased to 13x when estimated on the SNP set that passed the filtering criterion. This sequencing depth remains low compared to other studies (e.g., 25x (Van Bers et al., 2010), 58x (Kerstens et al., 2009)). The studies most similar to ours, to our knowledge, are that of Jonker et al. (2012) (Branta leucopsis), which yielded an average sequence depth of 9.9x, and that of Le Roex et al. (2012) (S. c. caffer), which yielded an average sequence depth of 2.7x. Our low sequence depth may be explained by an over-representation of size-fractionated fragments ranging between 100 and 200 bp sliced from the polyacrylamide gel. Consequently, many true rare variants have probably been rejected. Moreover, the low depth of coverage also implies that only SNPs present in multiple samples of our DNA pool had a chance of being identified. Overrepresentation of common SNPs over rare SNPs is thus expected to introduce bias into our SNP set. This needs to be taken into account when interpreting genotypic data in future projects. However, by our geographically broad sampling design we avoid the substantial geographic ascertainment bias that is likely present in Le Roex et al. (2012), because their SNP discovery panel was limited in geographic extent.
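The effect of read depth on the detection of rare variants can be made concrete with a simple binomial model. The sketch below rests on simplifying assumptions that are not part of the study: each read is taken to sample the minor allele independently with a probability equal to its frequency in the pooled DNA, and sequencing errors are ignored. The function name is hypothetical.

```python
from math import comb


def prob_minor_allele_detected(
    depth: int,
    minor_freq: float,
    min_count: int = 3,
) -> float:
    """P(at least `min_count` of `depth` reads carry the minor allele)."""
    p_below = sum(
        comb(depth, k) * minor_freq**k * (1.0 - minor_freq) ** (depth - k)
        for k in range(min_count)
    )
    return 1.0 - p_below


# Compare a rare and a common allele at two read depths.
for depth in (5, 13):
    for freq in (0.1, 0.5):
        p = prob_minor_allele_detected(depth, freq)
        print(f"depth {depth:2d}, minor allele frequency {freq}: {p:.3f}")
```

Under this model a common allele is far more likely than a rare one to reach the minor allele count of three at a given depth, which is the overrepresentation of common SNPs described above.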
Fig. 4. Cumulative number of SNP occurrences at each read position.
Sequencing error
Different estimators were used to evaluate the risk of false positives in the SNP discovery analysis. A high TS:TV ratio is considered a good measure of SNP validity. A TS:TV ratio of 1:2 would be expected if mutations were random, and is therefore an indication of sequencing errors. Higher rates of C/T mutations due to the deamination of methylcytosines in CpG dinucleotides are responsible for a higher TS:TV ratio in real data (Cooper et al., 2010; Scarano et al., 1967; Vignal et al., 2002). Usually, a ratio of 2.1:1 is observed in mammals (DePristo et al., 2011). A ratio significantly lower than this can therefore be an indicator of poor quality sequencing data. Our TS:TV ratio of 2.4:1 is similar to the results obtained, for example, in the studies of Kraus et al. (2011) (2.3:1) and Jonker et al. (2012) (2.7:1). It thus indicates that most detected SNP calls were not random, which suggests that our SNPs likely represent true nucleotide polymorphism.
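The 1:2 expectation under random mutation follows from counting the possible substitutions: each base can change into one base of the same chemical class (a transition) or into two bases of the other class (transversions). A short enumeration, written as an illustrative sketch with hypothetical names, makes this explicit.

```python
from itertools import permutations

PURINES = {"A", "G"}


def substitution_class(a: str, b: str) -> str:
    """'TS' if both bases are purines or both pyrimidines, else 'TV'."""
    return "TS" if (a in PURINES) == (b in PURINES) else "TV"


counts = {"TS": 0, "TV": 0}
for a, b in permutations("ACGT", 2):  # the 12 possible base substitutions
    counts[substitution_class(a, b)] += 1

print(counts)  # prints {'TS': 4, 'TV': 8}
```

Four of the twelve substitutions are transitions and eight are transversions, hence the 1:2 ratio when all substitutions are equally likely.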
Misidentification of SNPs due to sequencing errors is avoided by excluding variation with a low phred score. Tails of reads often display increasingly more sequencing errors with Illumina's technology. Even though we trimmed our reads during quality checks, a decrease in the number of predicted SNPs in the tail of the reads was observed and may be explained by an associated decrease of the phred score (Fig. 4). This decrease in predicted SNPs per position in Illumina sequencing reads was also reported in earlier studies (Kerstens et al., 2009; Ramos et al., 2009; Van Bers et al., 2010), on which the current SNP detection pipeline is based.
The high validation success in our study also illustrates the quality of the predicted SNPs. Of the set of 15, only one, whose primers were designed based on the generated reads, did not amplify (Table 1). Therefore, the genome of a closely related species can be used not only for mapping and SNP discovery, but also to design the primers for the genotyping step. Another SNP appeared to be monomorphic, yielding a conversion rate of 87% polymorphic SNPs. This validation success rate is higher than that of Le Roex et al. (2012) working on the Cape buffalo (S. c. caffer). There, within the set of 173 SNPs used for the validation, 143 amplified successfully and only 75 were polymorphic. The false positives in SNP discovery in the study of Le Roex et al. (2012) seem to be linked to the low coverage (mean 2.7x), and to the fact that a SNP was inferred if the nucleotide variant was supported by a minimum of two reads (vs. 6 in our study). Using such a low cut-off may significantly increase the risk of identifying false positives. Applying a minor allele count of at least three minimizes false positives in the SNP discovery analysis. This approach, however, also dramatically reduces the identification of true nucleotide polymorphisms that could, in principle, be detected with a lower threshold, albeit at the cost of an increased chance of identifying a sequencing error as a SNP.
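The comparison of validation outcomes can be reduced to two simple proportions per study. The sketch below only restates the counts quoted in the text; the function name is hypothetical.

```python
def rate(successes: int, tested: int) -> float:
    """Proportion of tested SNPs that met a criterion."""
    return successes / tested


# This study: 15 tested, 14 amplified, 13 polymorphic.
# Le Roex et al. (2012): 173 tested, 143 amplified, 75 polymorphic.
this_study = {"amplified": rate(14, 15), "polymorphic": rate(13, 15)}
le_roex = {"amplified": rate(143, 173), "polymorphic": rate(75, 173)}

for name, result in (("this study", this_study), ("Le Roex et al.", le_roex)):
    print(
        f"{name}: {result['amplified']:.0%} amplified, "
        f"{result['polymorphic']:.0%} polymorphic"
    )
```

The conversion rate to polymorphic SNPs is about 87% here against about 43% in the earlier study, while the amplification rates differ much less (about 93% against about 83%).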
Utility of SNPs in African buffalo
SNP markers can provide major insights into animal dispersal patterns. This is especially relevant in the light of recent conservation initiatives aiming to restore the genetic diversity of wildlife stocks by re-establishing demographic connectivity between wildlife populations of different NPs (e.g., Great Limpopo Transfrontier Conservation Area). Dispersal beyond traditional conservancy boundaries, and also national borders, may pose a risk as far as the spread of pathogens is concerned (Cross et al., 2004, 2005).
Among wildlife species, buffalo are known to be one of the main wildlife reservoirs for diseases (Rodwell et al., 2001). Since buffalo are closely related to cattle, and may transmit disease directly or indirectly, buffalo also represent an important threat to the African livestock industry, from a conservation, sanitary and economic point of view (Garine-Wichatitsky et al., 2010; Jolles et al., 2005; Michel et al., 2006). For fine-scale inference, a larger number of SNPs may be required, as the information content of one SNP is less than that of one microsatellite (i.e., bi-allelic vs. multi-allelic markers). Previous studies revealed that four to twelve times more SNPs are needed for population structure inference to match the statistical power of one microsatellite (Liu et al., 2005). For highly dispersive organisms it has been shown that the detection of low levels of differentiation is possible with a minimum of 80 SNPs (Morin et al., 2009; Ryman et al., 2006). Our set of 2534 high confidence SNPs should thus allow the genetic marker system to be scaled to the needs of future studies of the interaction between landscape features and microevolutionary processes (Manel et al., 2003).
This SNP database may also be of benefit in the context of selective breeding. Indeed, selective breeding of specific phenotypes of the Cape buffalo subspecies (S. c. caffer - South-Eastern Africa) has become an intricate business within private game farming. Females are being selected for horn length, milk production and regular calving intervals, while males are being selected for horn size, body mass and shape, which are desirable to trophy hunters. However, such approaches may distort natural evolutionary processes and may reduce the species' genetic variability, thereby weakening the species' resilience in the wild. These practices are not believed to benefit the conservation of global biodiversity, and may even become problematic if genetic dilution occurs through escapes of selected individuals into neighboring wild populations of buffalo. Future development of guidelines in collaboration with game-farming breeders should make it possible to find compromises for the long-term conservation of the wildlife species.
Conclusions
Within a highly mobile species such as the African buffalo, the SNP set developed in this study should provide highly valuable and reliable tools for gaining insight into the migratory patterns of this species, known to be a disease reservoir. Our approach yielded higher quality SNPs (as judged by assay conversion rate) and less geographically biased SNPs than a previous study (Le Roex et al., 2012). Furthermore, the construction of chimeric flanking sequences was shown to increase the number of usable SNPs by providing sufficiently large regions for the genotyping assay.
Conflict of interest
The authors declare that they have no conflicts of interest. They developed all aspects of this study. The sponsors of the issue had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author contributions
The present research study was designed in collaboration with Pim Van Hooft, Rasmus Heller, Johan Michaux, Richard Crooijmans, Martien Groenen and Ben Greyling. Part of the samples were provided by Daniel Cornélis and Philippe Chardonnet. Statistical analysis and interpretation were performed by Nathalie Smitz, with the assistance of Robert Kraus, Richard Crooijmans and Martien Groenen. All co-authors participated in the writing of the paper.
Acknowledgements
We would like to acknowledge the support of the Research Platform “Production and Conservation in Partnership” (RP-PCP). We would also like to thank F. Jori, B. Van Vuuren, K.L. Kanapeckas and all collectors for providing us with the samples used for the discovery of the SNP set. Technical assistance in the laboratory was provided by Bert Dibbits, and Rudy Jonker helped with constructing chimeric flanking sequences. Proofreading assistance was provided by Virginie Winant. This project was supported by the network “Bibliothèque du Vivant” funded by the CNRS, the “Musée National d’Histoire Naturelle”, the INRA and the “Centre National de Séquençage”. This study is supported by research grants from the FRS-FNRS of Belgium provided to J.R. Michaux and N.M.R. Smitz.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.mambio.2016.07.047.
References
Abdurashitov, M.A., Tomilov, V.N., Chernukhin, V.A., Gonchar, D.A., Degtyarev, S.Kh., 2006. Mammalian chromosomal DNA digestion with restriction endonucleases in silico. Ovchinnikov Bull. Biotechnol. Phys. Chem. Biol. 2, 29–38.
Abdurashitov, M.A., Tomilov, V.N., Chernukhin, V.A., Gonchar, D.A., Degtyarev, S.Kh., 2007. Comparative analysis of human chromosomal DNA digestion with restriction endonucleases in vitro and in silico. Med. Genet. 6, 29–36.
Aitken, N., Smith, S., Schwarz, C., Morin, P.A., 2004. Single nucleotide polymorphism (SNP) discovery in mammals: a targeted-gene approach. Mol. Ecol. 13, 1423–1431, http://dx.doi.org/10.1111/j.1365-294X.2004.02159.x.
Andreassen, R., Lunner, S., Høyheim, B., 2010. Targeted SNP discovery in Atlantic salmon (Salmo salar) genes using a 3'UTR-primed SNP detection approach. BMC Genomics 11, 706, http://dx.doi.org/10.1186/1471-2164-11-706.
Brookes, A.J., 1999. The essence of SNPs. Gene 234, 177–186.
Cooper, D.N., Mort, M., Stenson, P.D., Ball, E.V., Chuzhanova, N.A., 2010. Methylation-mediated deamination of 5-methylcytosine appears to give rise to mutations causing human inherited disease in CpNpG trinucleotides, as well as in CpG dinucleotides. Hum. Genomics 4, 406–410.
Cramer, E.R.A., Stenzler, L., Talaba, A.L., Makarewich, C.A., Vehrencamp, S.L., Lovette, I.J., 2008. Isolation and characterization of SNP variation at 90 anonymous loci in the banded wren (Thryothorus pleurostictus). Conserv. Genet. 9, 1657–1660, http://dx.doi.org/10.1007/s10592-008-9511-7.
Cross, P., Lloyd-Smith, J., Bowers, J., Hay, C., Hofmeyr, M., Getz, W., 2004. Integrating association data and disease dynamics: an illustration using African buffalo in Kruger National Park. Ann. Zool. Fennici 41, 879–892.
Cross, P.C., Lloyd-Smith, J.O., Getz, W.M., 2005. Disentangling association patterns in fission-fusion societies using African buffalo as an example. Anim. Behav. 69, 499–506, http://dx.doi.org/10.1016/j.anbehav.2004.08.006.
Davey, J.W., Hohenlohe, P.A., Etter, P.D., Boone, J.Q., Catchen, J.M., Blaxter, M.L., 2011. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat. Rev. Genet. 12, 499–510, http://dx.doi.org/10.1038/nrg3012.
DePristo,M.A.,Banks,E.,Poplin,R.,Garimella,K.,Maguire,V.,Hartl,J.R., Philippakis,C.,Angel,A.A.,del,G.,Rivas,M.A.,Hanna,M.,McKenna,A.,Fennell, T.J.,Kernytsky,A.M.,Sivachenko,A.Y.,Cibulskis,K.,Gabriel,S.B.,Altshuler,D., Daly,M.J.,2011.Aframeworkforvariationdiscoveryandgenotypingusing next-generationDNAsequencingdata.Nat.Genet.43,491–498,http://dx.doi.
org/10.1038/ng.806.
Dohm,J.C.,Lottaz,C.,Borodina,T.,Himmelbauer,H.,2008.Substantialbiasesin ultra-shortreaddatasetsfromhigh-throughputDNAsequencing.Nucleic AcidsRes.36,e105,http://dx.doi.org/10.1093/nar/gkn425.
DuToit,R.,1954.TrypanosomiasisinZululandandthecontroloftsetsefliesby chemicalmeans.OnderstepoortJ.Vet.Res.26,317–387.
East,R.,1999.AfricanAntelopeDatabase1999.Gland:IUCN,Switzerlandand Cambridge.
Ewing,B.,Green,P.,1998.Base-callingofautomatedsequencertracesusingphred.
II.Errorprobabilities.GenomeRes.8,186–194.
Frankham,R.,Ballou,J.D.,Briscoe,D.A.,2002.IntroductiontoConservation Genetics.CambridgeUniversityPress,Cambridge.
Garine-Wichatitsky,M.,deCaron,A.,Gomo,C.,Foggin,C.,Dutlow,K.,Pfukenyi,D., Lane,E.,Bel,S.,LeHofmeyr,M.,Hlokwe,T.,Michel,A.,2010.Bovine tuberculosisinbuffaloes,SouthernAfrica.Emerg.Infect.Dis.16,884–885, http://dx.doi.org/10.1890/02-5266.
Hall,T.A.,1999.BioEdit:auser-friendlybiologicalsequencealignmenteditorand analysisprogramforWindows95/98/NT.NucleicAcidsSymp.Ser.41,95–98.
Hassanin,A.,Ropiquet,A.,2004.MolecularphylogenyofthetribeBovini(Bovidae, Bovinae)andthetaxonomicstatusoftheKouprey,BossauveliUrbain1937.
Mol.Phylogenet.Evol.33,896–907,http://dx.doi.org/10.1016/j.ympev.2004.
08.009.
Hedges,S.B.,Dudley,J.,Kumar,S.,2006.TimeTree:apublicknowledge-baseof divergencetimesamongorganisms.Bioinformatics22,2971–2972,http://dx.
doi.org/10.1093/bioinformatics/btl505.
Huang,Y.,Li,Y.,Burt,D.W.,Chen,H.,Zhang,Y.,etal.,2013.Theduckgenomeand transcriptomeprovideinsightintoanavianinfluenzavirusreservoirspecies.
Nat.Genet.45,776–784,http://dx.doi.org/10.1038/ng.2657.
Jolles,A.E.,Cooper,D.V.,Levin,S.A.,2005.Hiddeneffectsofchronictuberculosisin Africanbuffalo.Ecology86,2358–2364,http://dx.doi.org/10.1890/05-0038.
Jonker,R.M.,Zhang,Q.,VanHooft,P.,Loonen,M.J.J.E.,VanderJeugd,H.P., Crooijmans,R.P.M.A.,Groenen,M.A.M.,Prins,H.H.T.,Kraus,R.H.S.,2012.The developmentofagenomewideSNPsetfortheBarnaclegooseBranta leucopsis.PLoSOne7,e38412,http://dx.doi.org/10.1371/journal.pone.
0038412.
Kappmeier,K.,Nevill,E.M.,Bagnall,R.J.,1998.Reviewoftsetsefliesand trypanosomosisinSouthAfrica.OnderstepoortJ.Vet.Res.65,195–203.
Kerstens,H.H.D.,Crooijmans,R.P.M.,Veenendaal,A.,Dibbits,B.W.,Chin-A-Woeng, T.F.C.,denDunnen,J.T.,Groenen,M.A.M.,2009.Largescalesinglenucleotide polymorphismdiscoveryinunsequencedgenomesusingsecondgeneration highthroughputsequencingtechnology:appliedtoturkey.BMCGenomics10, 479,http://dx.doi.org/10.1186/1471-2164-10-479.
Kraus,R.H.S.,Kerstens,H.H.D.,VanHooft,P.,Crooijmans,R.P.M.A.,VanDerPoel,J.J., Elmberg,J.,Vignal,A.,Huang,Y.,Li,N.,Prins,H.H.T.,Groenen,M.A.M.,2011.
GenomewideSNPdiscovery,analysisandevaluationinmallard(Anas platyrhynchos).BMCGenomics12,150,http://dx.doi.org/10.1186/1471-2164- 12-150.
Kraus,R.H.S.,Kerstens,H.H.D.,vanHooft,P.,Megens,H.-J.,Elmberg,J.,Tsvey,A., Sartakov,D.,Soloviev,S.A.,Crooijmans,R.P.M.A.,Groenen,M.A.M.,Ydenberg, R.C.,Prins,H.H.T.,2012.Widespreadhorizontalgenomicexchangedoesnot erodespeciesbarriersamongsympatricducks.BMCEvol.Biol.12,45,http://
dx.doi.org/10.1186/1471-2148-12-45.
Kraus,R.H.S.,Vonholdt,B.,Cocchiararo,B.,Harms,V.,Bayerl,H.,Uhn,R.K.,Orster, D.W.F.,Roos,C.,2014.Asingle-nucleotidepolymorphism-basedapproachfor rapidandcost-effectivegeneticwolfmonitoringinEuropebasedon noninvasivelycollectedsamples.Mol.Ecol.Resour.,http://dx.doi.org/10.1111/
1755-0998.12307.
Kumar,S.,Hedges,S.B.,2011.TimeTree2:speciesdivergencetimesontheiPhone.
Bioinformatics27,2023–2024,http://dx.doi.org/10.1093/bioinformatics/
btr315.
LeRoex,N.,Noyes,H.,Brass,A.,Bradley,D.G.,Kemp,S.J.,Kay,S.,vanHelden,P.D., Hoal,E.G.,2012.NovelSNPdiscoveryinAfricanbuffalosynceruscaffer,using high-throughputsequencing.PLoSOne7,e48792,http://dx.doi.org/10.1371/
journal.pone.0048792.
Li,H.,Handsaker,B.,Wysoker,A.,Fennell,T.,Ruan,J.,Homer,N.,Marth,G., Abecasis,G.,Durbin,R.,2009a.Thesequencealignment/map(SAM)formatand SAMtools.Bioinformatics25,2078–2079.
Li,S.,Wan,H.,Ji,H.,Zhou,K.,Yang,G.,2009b.SNPdiscoverybasedonCATSand genotypinginthefinlessporpoise(Neophocaenaphocaenoides).Conserv.
Genet.10,2013–2019.
Liu,N.,Chen,L.,Wang,S.,Oh,C.,Zhao,H.,2005.Comparisonofsingle-nucleotide polymorphismsandmicrosatellitesininferenceofpopulationstructure.BMC Genet.6(Suppl.1),S26,http://dx.doi.org/10.1186/1471-2156-6-s1-s26.
Liu,Y.,Qin,X.,Song,X.-Z.H.,Jiang,H.,Shen,Y.,Durbin,K.J.,Lien,S.,Kent,M.P., Sodeland,M.,Ren,Y.,Zhang,L.,Sodergren,E.,Havlak,P.,Worley,K.C., Weinstock,G.M.,Gibbs,R.A.,2009.Bostaurusgenomeassembly.BMC Genomics10,180,http://dx.doi.org/10.1186/1471-2164-10-180.
Luikart,G.,England,P.R.,Tallmon,D.,Jordan,S.,Taberlet,P.,2003.Thepowerand promiseofpopulationgenomics:fromgenotypingtogenometyping.Nat.Rev.
Genet.4,981–994,http://dx.doi.org/10.1038/nrg1226.
Manel,S.,Schwartz,M.K.,Luikart,G.,Taberlet,P.,2003.Landscapegenetics:
combininglandscapeecologyandpopulationgenetics.TrendsEcol.Evol.18, 189–197,http://dx.doi.org/10.1016/S0169-5347(03)00008-9.
Matukumalli,L.K.,Lawley,C.T.,Schnabel,R.D.,Taylor,J.F.,Allan,M.F.,Heaton,M.P., O’Connell,J.,Moore,S.S.,Smith,T.P.L.,Sonstegard,T.S.,VanTassell,C.P.,2009.
DevelopmentandcharacterizationofahighdensitySNPgenotypingassayfor cattle.PLoSOne4,e5350,http://dx.doi.org/10.1371/journal.pone.0005350.
Michel,A.L.,Bengis,R.G.,Keet,D.F.,Hofmeyr,M.,DEKlerk,L.M.,Cross,P.C.,Jolles, A.E.,Cooper,D.,Whyte,I.J.,Buss,P.,Godfroid,J.,2006.Wildlifetuberculosisin SouthAfricanconservationareas:implicationsandchallenges.Vet.Microbiol.
112,91–100,http://dx.doi.org/10.1016/j.vetmic.2005.11.035.