


2. Materials and Methods

2.4 Microsatellites

2.4.2 Isolation, Amplification, Sequencing

Standard 10 µl reactions consisted of 1× PCR HotMaster Buffer, 0.2 mM dNTPs, 0.5 µM of each primer (forward labeled, reverse unlabeled), 0.02 U/µl HotMaster Taq (Eppendorf, 5-Prime), 0.5 M betaine (Sigma Aldrich) and 1–2 µl of DNA at a concentration of 50 ng/µl, as determined with a NanoDrop spectrophotometer. Cycling conditions on an epgradient thermocycler (Eppendorf) differed depending on the primers used (Tab. 1). A final extension step of 20 minutes at 65°C was performed to reduce in vitro artifacts caused by incomplete adenylation of products (Leese and Held 2008). PCR products were visualized on 2% TBE agarose gels and diluted 1–10 fold with molecular grade water (CARL ROTH); 1 µl of the diluted product was then denatured in a mixture of 14.7 µl Hi-Di formamide and 0.3 µl GeneScan ROX 500 size standard (both Applied Biosystems).
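The final concentrations given above can be converted into pipetting volumes with the dilution relation C1·V1 = C2·V2. The following is a minimal sketch only: the stock concentrations (10× buffer, 2 mM dNTPs, 10 µM primers, 5 U/µl polymerase, 5 M betaine) are assumptions chosen for illustration and are not stated in this study.

```python
# Final concentrations as given in the text for a 10 µl reaction.
FINAL = {
    "buffer_x": 1.0,
    "dntp_mM": 0.2,
    "primer_fwd_uM": 0.5,
    "primer_rev_uM": 0.5,
    "taq_U_per_ul": 0.02,
    "betaine_M": 0.5,
}

# ASSUMED stock concentrations (hypothetical, not from the study).
STOCK = {
    "buffer_x": 10.0,
    "dntp_mM": 2.0,
    "primer_fwd_uM": 10.0,
    "primer_rev_uM": 10.0,
    "taq_U_per_ul": 5.0,
    "betaine_M": 5.0,
}


def reaction_volumes(total_ul=10.0, dna_ul=1.0):
    """Return the volume (µl) of each component for one reaction."""
    # C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1
    vols = {name: FINAL[name] * total_ul / STOCK[name] for name in FINAL}
    vols["dna"] = dna_ul
    water = total_ul - sum(vols.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    vols["water"] = water
    return vols


if __name__ == "__main__":
    for name, vol in reaction_volumes().items():
        print(f"{name:15s} {vol:5.2f} µl")
```

With other stock concentrations only the `STOCK` table needs to change; the water volume is always the remainder.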

Tab. 1: Annealing temperatures for the labeled primers of the nuclear microsatellite loci isolated for Notocrangon antarcticus.

Locus                   Ncr1   Ncr3   Ncr6   Ncr12   Ncr14
Annealing temperature   50°C   50°C   54°C   50°C    45°C

In the case of Ncr20 and Ncr17, the primers amplified more than one product during the PCR. The resulting PCR products were therefore separated by cutting the fragments out of a 2% TBE agarose gel according to the manufacturer's protocol (“5Prime PCR Extract and GelExtract Mini Kits Manual”, © 2007) before sequencing. Cycle sequencing was run under the same conditions as mentioned above.

2.4.3 Fragment Analysis and Genotyping

The fragments were analyzed on an ABI 3130xl, and allele lengths were scored using the software GENEMAPPER 4.0 (Applied Biosystems). Each sample was genotyped 4–7 times independently and the results were compared to minimize genotyping errors. In addition, microsatellite fragments of randomly chosen samples were amplified under the same PCR conditions in separate PCRs; these fragments were analyzed 2–4 times and the results were compared in order to calibrate the scoring criteria and to confirm the scored genotypes. Samples with uncertain results were excluded from further data analysis.
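The comparison of repeated scoring rounds can be thought of as a consensus call per sample and locus. The sketch below is one possible formalization, not the procedure used in this study (where scoring was compared manually); the function name and the agreement threshold of 0.75 are assumptions.

```python
from collections import Counter


def consensus_genotype(calls, min_agreement=0.75):
    """Return the consensus genotype of repeated scoring rounds, or None.

    calls: list of (allele1, allele2) tuples, one per scoring round;
           a failed round is given as None.
    min_agreement: ASSUMED fraction of rounds that must agree.
    """
    # Sort each call so that (152, 156) and (156, 152) count as the same genotype.
    scored = [tuple(sorted(c)) for c in calls if c is not None]
    if not scored:
        return None
    genotype, count = Counter(scored).most_common(1)[0]
    if count / len(scored) >= min_agreement:
        return genotype
    return None  # uncertain: exclude the sample from further analysis


if __name__ == "__main__":
    rounds = [(152, 156), (156, 152), (152, 156), (152, 156)]
    print(consensus_genotype(rounds))
```

A sample for which the function returns `None` would correspond to a sample with uncertain results.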

2.4.4 Cloning

In order to improve and redesign some of the existing microsatellite primers (Ncr and Mys by Agrawal et al. in prep.) for Notocrangon antarcticus, the PCR products of these primers were inserted into a pCR2.1-TOPO® TA plasmid vector from Invitrogen® (lot no. 841084) and transformed into competent E. coli cells (Invitrogen®, Promega, Ch. 873292A) according to the heat-shock/one-shot protocol in the manual of the Invitrogen TOPO TA Cloning Kit. Cultures of positive colonies, identified by blue-white selection (IPTG/X-Gal), were grown overnight (ca. 17 h at 37°C) on LB agar medium containing 100 μg/mL ampicillin. Before cloning the insert, the competence of the cells was verified with a pUC19 vector (lot no. 837179) according to the recommendations of the manufacturer.

The PCR cycle profile for the cloning step was: initial denaturation at 94°C for 2 min; 38 cycles of denaturation at 94°C for 20 s, annealing for 20 s at a temperature depending on the primers used, and elongation at 65°C for 30 s; followed by a single final elongation step of 20 min at 65°C. PCR products were checked on a 2% TBE agarose gel, cleaned with the QIAquick PCR Purification Kit according to the manufacturer’s protocol for PCR products and frozen. Approximately four hours before cloning, the PCR products were thawed and 2 µl of each PCR product were used as template for a second PCR under the aforementioned conditions to ensure adenylation of the PCR products for cloning. The new PCR products were cleaned with the same QIAquick PCR Purification Kit and checked on a 2% TBE agarose gel.

Purified PCR products for the Mys primers (Agrawal in prep.) were pooled and 2 µl of the mixture were used for the one-shot cloning step with a single batch of cells. A second batch of cells was transformed in the same way, but with 2 µl of pooled PCR products for the Ncr primers (Agrawal et al. in prep.). For the transformation step, the provided salt solution (lot no. 804050) and water (lot no. 830136) were used. Each cell culture was divided equally among 6 agar plates to grow colonies overnight at 37°C. 96 positive colonies of each cell culture (192 colonies in total) were chosen, placed separately on agar and sequenced by QIAGEN. These 96 positive colonies per culture were also grown overnight at 37°C in liquid LB medium to provide an exact copy of the samples sent to QIAGEN if needed. Additionally, a further 672 positive colonies were picked, grown at 37°C, precipitated and stored at -20°C either in 10× HotMaster PCR buffer (Eppendorf, 5-Prime) or in molecular grade water (CARL ROTH).
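The cycling profile of the cloning PCR can be written down as a small data structure, which also makes it easy to estimate the programmed hold time. This is a sketch under stated assumptions: the function names are hypothetical, the annealing temperature of 50°C in the usage line is only an example value, and ramping times between steps are ignored.

```python
def cloning_profile(annealing_c, cycles=38):
    """Return the profile as (label, temperature in °C, seconds, repeats)."""
    return [
        ("initial denaturation", 94, 2 * 60, 1),
        ("denaturation", 94, 20, cycles),
        ("annealing", annealing_c, 20, cycles),  # primer-dependent
        ("elongation", 65, 30, cycles),
        ("final elongation", 65, 20 * 60, 1),
    ]


def hold_time_minutes(profile):
    """Sum of all programmed hold times in minutes (ramping not included)."""
    return sum(seconds * repeats for _, _, seconds, repeats in profile) / 60


if __name__ == "__main__":
    profile = cloning_profile(annealing_c=50)  # example annealing temperature
    print(f"{hold_time_minutes(profile):.1f} min")
```

The programmed holds add up to 120 s + 38 × 70 s + 1200 s = 3980 s, i.e. a little over 66 minutes before ramping.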

2.4.5 Data Analysis

The genotyping and allele scoring of the microsatellite fragments was performed using GENEMAPPER 4.0 (Applied Biosystems, 2004).

The GENEMAPPER software generates genotypes from the raw spectra of prepared samples run on an electrophoresis instrument. The instrument performs electrophoretic separation of the fragments, which are fluorescently labeled through the labeled primers used (“Hex”, hexachlorofluorescein phosphoramidite, or “Fam”, carboxyfluorescein; Metabion int. AG), and monitors fluctuations in emitted light as the fragments migrate past a laser. The Data Collection Software assembles the collected spectral signal for each fragment from each sample and stores the data for further analysis. The GENEMAPPER software separates the collective raw spectra for each sample into the component signals corresponding to the emission wavelengths of the fluorescent dyes used for the primers and the size standard. Subsequently, the software generates genotypes by processing the resulting dye “signals” (GENEMAPPER Software, User’s Guide, Copyright 2004, Applied Biosystems).

The resulting peaks were then genotyped manually 4–7 times independently and the results were compared to minimize genotyping errors.

After genotyping, the microsatellite allele size data were reformatted from an Excel sheet using MSAT TOOL KIT, version 3.1.1 (12/2008; Park, 2001). The output file was then converted into the file formats required for further analysis using CONVERT, version 1.3.1 (3/2005; Glaubitz, 2004).
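To illustrate what such a format conversion produces, the sketch below writes a genotype table in the GENEPOP input layout (title line, one locus name per line, a `Pop` line per population, then one line per individual with three-digit allele codes and `000000` for missing data). It is not a replacement for MSAT TOOL KIT or CONVERT; the function name, sample identifiers and allele sizes are hypothetical.

```python
def to_genepop(title, loci, populations):
    """Return a GENEPOP-formatted string.

    populations: dict mapping a population name to a list of individuals;
                 each individual is a dict locus -> (allele1, allele2) or None.
    """
    lines = [title] + list(loci)
    for pop_name, individuals in populations.items():
        lines.append("Pop")
        for i, genotypes in enumerate(individuals, start=1):
            codes = []
            for locus in loci:
                g = genotypes.get(locus)
                # Missing genotypes are coded as 000000 (three digits per allele).
                codes.append("000000" if g is None else f"{g[0]:03d}{g[1]:03d}")
            lines.append(f"{pop_name}_{i} , " + " ".join(codes))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical allele sizes, for illustration only.
    data = {"SGI": [{"Ncr1": (152, 156), "Ncr3": None}]}
    print(to_genepop("demo", ["Ncr1", "Ncr3"], data), end="")
```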

To study the population structure with information from different microsatellite loci several statistical programs were employed, which are described briefly in the following paragraphs.

During the polymerase chain reaction (PCR) for microsatellite amplification, several errors can occur, mostly during annealing and amplification: one or more alleles do not amplify (“null alleles”); allele sizes are biased by stuttering of the polymerase while it amplifies the repetitive motif, resulting in fragments with fewer base pairs (bp) (“stuttering”); or large alleles are not amplified as efficiently as small alleles (“large allele dropout”). MICROCHECKER 2.2.3 (Shipley 2003) helps to detect these types of error and thereby decreases bias during the interpretation and further analysis of the microsatellite allele data.

This application is based on a Monte Carlo simulation (bootstrapping) method that generates expected frequencies of allele size differences for homozygotes and heterozygotes and compares them with the genotypes in the input allele size data. To calculate expected allele frequencies and the frequency of any null alleles, the program relies on Hardy-Weinberg equilibrium (HWE) theory (Van Oosterhout et al. 2003, 2004). The program was therefore used to check the raw data for genotyping errors and for the presence of null alleles. The expected number of homozygotes for each class (allele size) is calculated from the heterozygote frequency for that class and then compared with the observed number of homozygotes. The probabilities of the observed homozygote frequencies are computed using two methods: from the homozygote and heterozygote frequencies of each size class (“binomial based”), and by comparing the observed value with the mean rank position of that value among the simulated values (“rank based”) (Van Oosterhout et al. 2003).
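The idea of comparing observed homozygotes with a simulated HWE expectation can be sketched in a few lines. Note that this is a deliberately simplified illustration and not MICROCHECKER's algorithm: it pools all allele size classes, draws alleles with replacement from the observed allele pool, and reports the fraction of simulations with at least as many homozygotes as observed. The function name and default parameters are assumptions.

```python
import random


def homozygote_excess_p(genotypes, n_sim=2000, seed=1):
    """Return (observed homozygotes, simulated one-sided p-value).

    genotypes: list of (allele1, allele2) tuples for one locus.
    Simplified sketch: genotypes are simulated under HWE by drawing two
    alleles at random (with replacement) from the observed allele pool.
    """
    rng = random.Random(seed)
    alleles = [a for g in genotypes for a in g]
    observed = sum(1 for a, b in genotypes if a == b)
    n = len(genotypes)
    hits = 0
    for _ in range(n_sim):
        simulated = sum(
            1 for _ in range(n) if rng.choice(alleles) == rng.choice(alleles)
        )
        if simulated >= observed:
            hits += 1
    return observed, hits / n_sim


if __name__ == "__main__":
    # Hypothetical data: only homozygotes at two equally frequent alleles.
    data = [(150, 150)] * 10 + [(154, 154)] * 10
    print(homozygote_excess_p(data))
```

A small p-value indicates more homozygotes than expected under HWE, which is the pattern that null alleles, stuttering or large allele dropout can produce.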

Null allele frequencies are estimated from the allele frequencies and can be compared with the null allele frequencies obtained with the Chakraborty (Chakraborty et al. 1992) and Brookfield (Brookfield 1996) methods. However, no evidence for null alleles was found in the input data, so this function was not needed.
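For orientation, the two estimators mentioned above are commonly written in terms of the expected and observed heterozygosity: r = (HE − HO)/(HE + HO) for Chakraborty et al. (1992) and r = (HE − HO)/(1 + HE) for the first estimator of Brookfield (1996). The sketch below implements these textbook forms; the input values are hypothetical and not results of this study.

```python
def null_freq_chakraborty(he, ho):
    """Null allele frequency after Chakraborty et al. (1992)."""
    return (he - ho) / (he + ho)


def null_freq_brookfield1(he, ho):
    """Null allele frequency after Brookfield (1996), estimator 1."""
    return (he - ho) / (1 + he)


if __name__ == "__main__":
    he, ho = 0.5, 0.3  # hypothetical heterozygosities for one locus
    print(round(null_freq_chakraborty(he, ho), 4))
    print(round(null_freq_brookfield1(he, ho), 4))
```

Both estimators return values at or below zero when the observed heterozygosity is not lower than expected, which is consistent with no evidence for null alleles.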

Genic and genotypic differentiation were tested for all population pairs with GENEPOP version 4.1 (Raymond and Rousset 1995). Both tests were run with the same Markov chain parameters (Guo and Thompson 1992) to assess p-values: a burn-in of 10,000 steps and 100 batches of 5,000 MCMC steps each (MCMC: Markov chain Monte Carlo, a class of algorithms that sample from probability distributions by constructing a Markov chain).

Genotypic differentiation is tested against the null hypothesis H0: “genotypes are drawn from the same distribution in all populations”, which relates to the distribution of diploid genotypes in the different populations, while genic differentiation is tested against H0: “alleles are drawn from the same distribution in all populations”, which concerns the distribution of alleles among the given samples. The resulting p-values were used to assess the significance of the differentiation quantified by the calculated FST values. The FST values were calculated for all population pairs with GENEPOP, which follows the standard ANOVA approach of Weir and Cockerham (1984). The FST max value was computed with FSTAT (Goudet, 1995 (modified 2001)) after recoding the input file with RECODEDATA, version 0.1 (Meirmans, 2006). The standardized F’ST value is then obtained by dividing the FST value provided by GENEPOP by FST max, as recommended by Leese et al. (2008) and presented in the RECODEDATA manual; it has become a common index for the magnitude of population structure. Moreover, several Hardy-Weinberg (HW) tests, as well as the computation of FIS (inbreeding coefficient), were performed with GENEPOP, all with the same Markov chain parameters (dememorization: 10,000; batches: 20; iterations per batch: 5,000).
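The standardization step described above is a simple division per population pair. A minimal sketch follows; the function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def standardized_fst(fst, fst_max):
    """Return F'ST = FST / FSTmax for one population pair."""
    if fst_max <= 0:
        raise ValueError("FST max must be positive")
    return fst / fst_max


if __name__ == "__main__":
    # Hypothetical (FST, FST max) values, NOT results of this study.
    pairs = {("SGI", "SOI"): (0.05, 0.25), ("LB", "LC"): (0.01, 0.20)}
    for pair, (fst, fst_max) in pairs.items():
        print(pair, round(standardized_fst(fst, fst_max), 3))
```

Because FST max depends on the within-population diversity of the markers, F'ST makes values comparable between loci with different levels of polymorphism.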

In addition, STRUCTURE 2.3.3 (2010; Pritchard et al. 2003), which provides a Bayesian multilocus clustering algorithm, was used to carry out individual assignment tests to populations. STRUCTURE was run through its Java front end, and CONVERT was used to transcribe the GENEPOP file with the genotype tables into a STRUCTURE-compatible file format. The clustering model of STRUCTURE assigns individuals probabilistically to one population, or jointly to two or more populations, out of K possible populations, depending on their admixture level. Each of the K populations is characterized by a set of allele frequencies at the given loci. The program assumes that the loci within populations are at HWE and in linkage equilibrium; in other words, individuals are grouped into populations in such a way that these assumptions are met (Structure 2.2 Manual). For the N. antarcticus data set, the most likely number of populations was inferred with prior information on the geographic origin of individuals, and the maximum number of populations was set to seven according to the number of sample sites (K from one to seven), in order to detect potential subpopulations. The number of MCMC steps was set to 100,000. The results were checked as described in the manual to test the chosen parameters, which were found to be suitable. Hence, these parameter sets were used to perform four independent iterations with a burn-in period of 1,000 and 100,000 MCMC steps, with and without the population admixture model and with and without the sample location as a prior. These tests were also performed with and without assuming correlated allele frequencies. The final number of populations was determined by comparing the differences between the calculated Bayes factors for different numbers of assumed subpopulations and taking the smallest K corresponding to the highest difference between Bayes factors as the expected number of subpopulations for N. antarcticus.
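The selection of K described above can be read in more than one way. The sketch below shows one possible reading, stated as an assumption: the ln P(X|K) values reported by STRUCTURE are averaged over the independent runs, the gain between successive K values is computed, and the smallest K with the largest gain is returned (K = 1 if no gain is positive). The function name and the numbers are hypothetical.

```python
from statistics import mean


def choose_k(lnp_runs):
    """Pick K from a dict K -> list of ln P(X|K) values of independent runs.

    ASSUMED reading of the procedure: largest gain in mean ln P(X|K)
    between successive K values; ties are resolved towards the smallest K.
    """
    ks = sorted(lnp_runs)
    means = {k: mean(lnp_runs[k]) for k in ks}
    gains = {k: means[k] - means[prev] for prev, k in zip(ks, ks[1:])}
    if not gains or max(gains.values()) <= 0:
        return ks[0]
    best = max(gains.values())
    return min(k for k, gain in gains.items() if gain == best)


if __name__ == "__main__":
    # Hypothetical ln P(X|K) values, NOT results of this study.
    runs = {
        1: [-1000.0, -1002.0],
        2: [-900.0, -902.0],
        3: [-895.0, -897.0],
        4: [-899.0, -900.0],
    }
    print(choose_k(runs))
```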

[...] Notocrangon antarcticus within the sample sites around the Antarctic. Two haplotypes were observed, differing from each other by 5 bp within a total of 507 analyzed bp (ca. 1% sequence difference). One haplotype was found in the sample region of SGI and the other in the remaining sample regions around the Antarctic, as shown in Fig. 6.

[Fig. 6, caption fragment] SOI (South [...] representing one sample site, and each number giving the number of sequences, and therefore the number of samples, aligned for each region. The smaller light blue circle represents the second haplotype, which belongs to the sample site SGI. The black line with dots connecting the two circles shows the base-pair (bp) differences between the two haplotypes, each dot representing one additional single-bp mutation.

Based on the 16S data, there is a clear difference between SGI and the remaining Antarctic localities of N. antarcticus, probably due to a lack of gene flow across the Polar Front.

The results from the 16S rDNA haplotype network confirm population differentiation within N. antarcticus, which can be investigated in more detail with faster-evolving markers such as microsatellites.

Considering that the sequences from the sample sites SOI, LA, LB, LC, EWS and TA all showed the same haplotype, there is no need to test more individuals in order to increase the reliability and significance of this clear 16S rDNA data set.

3.1.2 18S rDNA

Due to the length of the 18S region, amplification of the whole fragment failed, and mostly only smaller fragments from either the “beginning” or the “end” of the region were successfully sequenced. Complete fragments of the whole 18S region were scarce, and no mutations between different sample locations could be detected after aligning the fragment sequences. Considering also that the 18S gene evolves more slowly than 16S, and that the 16S results did not show much variability, further optimization of the 18S amplification for Notocrangon antarcticus did not seem necessary or of major importance for this study.

3.1.3 Cytochrome oxidase (CO1) mtDNA

The amplification PCR of the CO1 mitochondrial gene yielded products at annealing temperatures from 39.9 to 44.3°C, but resulted in two PCR products, as detected on the 2% TBE agarose gel: the larger approximately 800 bp long and the smaller approximately 200 bp long. Even though the 200 bp fragment is too small to be the sought fragment, it would interfere with the sequencing of the 800 bp CO1 fragment. Therefore, CO1 could not be sequenced within the framework of this study; either the two fragments must be cut out of the gel and purified before further results can be obtained, or different primers have to be used for this gene. Both PCR products were observed at all tested temperatures, so if the same primers are to be used and the protocol is to be modified to amplify only one fragment, changing the annealing temperature can be ruled out as a solution. A possible explanation for the appearance of the small fragment (ca. 200 bp) might be the presence of a pseudogene of a CO1 region (originally mtDNA) located in the nuclear DNA, or a totally different product unrelated to CO1. In this case, diluting the template DNA might help to eliminate the smaller fragment, since more mtDNA than nDNA is expected in the extracted DNA (due to many mitochondria but only one nucleus per cell).

3.2 Microsatellites

3.2.1 Marker Selection

Of the 20 microsatellite primer pairs designed for Notocrangon antarcticus in past research projects, five (Ncr1, Ncr3, Ncr6, Ncr12 and Ncr14) were chosen and fluorescently labeled for intraspecific population analysis (App. 3, Appendix). All 20 designed primers were tested. Primers Ncr2, Ncr4, Ncr7, Ncr8, Ncr9 and Ncr11 were found to amplify a fragment without any repeat or variation and were thus rejected for further analysis. However, the locus Ncr11 might be mutating too fast: its sequence showed many ambiguous peaks and therefore did not give a reliable signal, and no clear repeat was detected. Fragments for primers Ncr10, Ncr13 and Ncr16 have to be re-sequenced, as sequencing yielded a result of only 5 bp. Primers Ncr17 and Ncr20 showed 2–3 bands on the 2% TBE agarose gel, and the different amplified products were therefore cut out and purified separately. Since these primers were not specific enough to amplify only one fragment, new primers have to be designed, and the loci had to be excluded from the fragment analysis within this project. Yet the corresponding loci should be considered in further research projects, as they show repeats in their sequences and might be potential microsatellites. Locus Ncr5 showed a very complex repeat pattern over a length of 25 bp and was not used for the fragment analysis either, although its use as a genetic marker cannot be ruled out. Locus Ncr15 showed a repeat motif and could be a good candidate for future analyses that expand the data used for this thesis. Primers Ncr18 and Ncr19 did not amplify any fragment or give any other result (see App. 4).

In order to re-design primers that did not yield a clear sequence or amplified more than one region, the products of these primers were cloned into E. coli as described under 2.4.4. Judging by IPTG/X-Gal blue/white selection, the products appear to have been cloned successfully, and the chosen colonies were stored appropriately for analysis in future studies.

3.2.2 Fragment Analysis

Of the five labeled markers, three polymorphic and reliable microsatellite loci developed for N. antarcticus were used to assess intraspecific genetic polymorphism in all extracted specimens from all sample sites. Ncr6 was discarded because it appeared to show only monomorphic peaks across all sample sites, and Ncr12 because its genotyping was not reliable due to many stutter peaks. The genotyped alleles for each tested marker and individual are shown in App. 5. The allele data missing for some samples in App. 5 should be supplemented in future; they are missing because of lack of time and not because fragment amplification failed.

Alleles for Ncr1, Ncr3 and Ncr14 (App. 3) were polymorphic in all tested populations. The number of alleles per locus across all specimens ranged from 3 (Ncr1) to 19 (Ncr3), and the number of genotypes from 6 (Ncr1) to 47 (Ncr3) (Tab. 2). Locus Ncr1 thus appears to be less polymorphic than Ncr3 and Ncr14 despite having the highest number of scored individuals, so a small sample size is probably not the reason for its small range of alleles. According to MICROCHECKER, the probability of the observed homozygotes was significant only for Ncr14 (App. 6). However, the numbers of expected and observed homozygotes do not differ drastically for the other loci, Ncr1 and Ncr3, either. All in all, no evidence was found for scoring errors due to stuttering, for large allele dropout or for null alleles in any of the three loci. Hence, the data were considered reliable and suitable for further population genetic tests.
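The per-locus summary statistics discussed here can be illustrated with a short sketch that computes the observed heterozygosity, an unbiased expected heterozygosity and FIS = 1 − HO/HE from a list of genotypes. This is a simplified estimator for orientation only; GENEPOP computes FIS following Weir and Cockerham (1984), so its values will generally differ slightly. The function name and the example genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity_stats(genotypes):
    """Return (HO, HE, FIS) for one locus in one population.

    genotypes: non-empty list of (allele1, allele2) tuples.
    HE is Nei's unbiased gene diversity; FIS = 1 - HO / HE (simplified).
    """
    n = len(genotypes)
    ho = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(a for g in genotypes for a in g)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    he = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    fis = 1 - ho / he if he > 0 else float("nan")
    return ho, he, fis


if __name__ == "__main__":
    # Hypothetical genotypes: four heterozygotes at two alleles.
    print(heterozygosity_stats([(150, 154)] * 4))
```

A negative FIS, as in this toy example, indicates more heterozygotes than expected under random mating.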

Tab. 2: Microsatellite analysis of Notocrangon antarcticus: number of scored samples (NS), number of scored alleles (NA) and inbreeding coefficient (FIS) for each locus and each population; observed heterozygosity (HO) and expected heterozygosity (HE) for each population over all loci. Populations represent sample sites off South Georgia Island (SGI), South Orkney Island (SOI), the Antarctic Peninsula (Larsen A, B and C (LA, LB, LC)), the East Weddell Sea (EWS) and Terre Adélie (TA).

       NS                    NA                    FIS                            HO       HE
       Ncr1   Ncr3   Ncr14   Ncr1   Ncr3   Ncr14   Ncr1      Ncr3      Ncr14
SGI    21     12     9       3      11     6       0.3830*   -0.082    0.3496*    0.6138   0.7090
SOI    14     11     9       3      10     4       -0.5838   -0.1      -0.098     0.8783   0.6926
LA     13     10     5       3      9      5       -0.4667   -0.0062   -0.1765    0.9154   0.7317
LB     13     10     6       3      11     3       -0.5349   -0.0843   -0.5789    0.9487   0.6775
LC     11     10     5       3      9      5       -0.2329   0.2317    -0.2121    0.8394   0.7518
EWS    4      5      5       2      7      7       -0.5      -0.1111   -0.1111    0.9167   0.7029
TA     14     12     8       3      10     5       -0.4649   -0.0168   -0.1395    0.8829   0.7217

FIS values marked with * are significant (p-value < 0.05). HO and HE are both in Hardy-Weinberg equilibrium with a p-value < 0.05.

After the reliability of the data had been checked, the genic differentiation between each possible population pair was computed with GENEPOP. The analysis showed that the SGI population clearly differed from the other six populations: genic differentiation was significant between SGI and all other tested populations (SOI, LA, LC and TA: p-value < 0.01; LB and EWS: p-value < 0.05; see also Tab. 3). Unexpectedly, genic differentiation was also significant between LC and the populations of SOI and LB, even though the sample sites LC and LB are geographically very close to each other (Tab. 3; Fig. 5). However, the magnitude and significance of the differentiation between populations can only be judged from the FST or standardized F’ST values (see also the following paragraphs and Tab. 4).
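The across-locus significance values are obtained by combining the per-locus p-values with Fisher's method: the statistic X = −2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom for k independent tests. The sketch below implements this with the closed-form chi-square tail for even degrees of freedom; the function name and the example p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values (all > 0) with Fisher's method.

    Returns (chi-square statistic, combined p-value) with 2k degrees of freedom.
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    # Survival function of chi-square with 2k df:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical per-locus p-values for one population pair.
    stat, p = fisher_combined_p([0.04, 0.20, 0.01])
    print(round(stat, 3), round(p, 5))
```

With a single locus the combined p-value equals the input p-value, which is a quick sanity check of the formula.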

Tab. 3: Tests of genic and genotypic differentiation for Notocrangon antarcticus. Significance of the genic differentiation for each population pair across all loci (upper diagonal) and of the genotypic differentiation for each population pair across all loci (lower diagonal), both calculated following Fisher’s method. Populations represent sample sites off South Georgia Island (SGI), South Orkney Island (SOI), the Antarctic Peninsula (Larsen A, B and C (LA, LB, LC)), the East Weddell Sea (EWS) and Terre Adélie (TA).

SGI SOI LA LB LC EWS TA

H0: “Alleles are drawn from the same distribution in all populations” for the genic differentiation probability and H0: “Genotypes are drawn from the same distribution in all populations” for the genotypic differentiation probability.

