2 Methods

2.10 Variant calling

Two software tools were used to discover single nucleotide variants (SNVs): DNAstar and the Torrent Variant Caller plug-in of Torrent Suite. Both are described below.

2.10.1 DNAstar software

DNAstar (version 11.1) is a commercial software package, of which we used SeqMan NGen, SeqMan Pro and ArrayStar. SeqMan NGen aligns the sequences to the reference genome, which in this case was human genome version 19 (hg19). SeqMan Pro was used to visualize the sequences.

ArrayStar was used to compile the genotypes of all variants in all patients, together with the coverage at each position, into a single table.

Variants were called with SeqMan NGen using the following parameters. Mer size (the k-mer size, i.e. the number of nucleotides treated as a unit during alignment) and minimum match percentage were set to automatic, and genome ploidy was set to diploid. In the SNP tab of the advanced options, “Calculate SNPs” was checked, minimum SNP percentage was set to 0, the SNP confidence threshold was set to 0, minimum SNP count was set to 2, minimum base quality score was set to 5, and the “Check strands” box was checked.
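To illustrate how thresholds of this kind interact, the following is a minimal sketch of a per-position filter using the values listed above. It is not SeqMan NGen code: the `Observation` type and the `passes_snp_filters` function are hypothetical, and the reading of "Check strands" as "the variant base must be seen on both strands" is an assumption made only for this example.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    base: str      # base observed in one read at this position
    quality: int   # base quality score
    strand: str    # "+" or "-"

# Values taken from the text above; the names are illustrative only.
MIN_SNP_PERCENT = 0.0
MIN_SNP_COUNT = 2
MIN_BASE_QUALITY = 5
CHECK_STRANDS = True   # assumed meaning: require evidence on both strands

def passes_snp_filters(observations, alt_base):
    """Return True if alt_base passes the count, percentage and strand checks."""
    # Discard bases below the minimum base quality before counting.
    usable = [o for o in observations if o.quality >= MIN_BASE_QUALITY]
    alt = [o for o in usable if o.base == alt_base]
    if len(alt) < MIN_SNP_COUNT:
        return False
    percent = 100.0 * len(alt) / len(usable)
    if percent < MIN_SNP_PERCENT:
        return False
    if CHECK_STRANDS and len({o.strand for o in alt}) < 2:
        return False
    return True
```

With a minimum SNP percentage and confidence threshold of 0, the count, base quality and strand checks are the only effective filters in this sketch, which is consistent with a deliberately permissive first pass.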

2.10.2 Torrent Variant Caller

Torrent Variant Caller (TVC) identifies single-nucleotide polymorphisms (SNPs), multi-nucleotide polymorphisms (MNPs), insertions, and deletions in aligned sequencing data. It does so in two steps.

In step 1, the aligned reads are surveyed to find candidate regions that show evidence of a variant. These regions are marked as probably containing a variant. Candidate regions must pass a minimal set of requirements, such as read quality and minimum coverage (Garrison E 2012).

In step 2, each candidate variant identified in step 1 is assessed further to assign a confidence score and a genotype. Candidate positions are evaluated with a statistical model taken from GATK, a software package developed at the Broad Institute for analyzing next-generation resequencing data. Finally, variants are filtered by several criteria to detect cases in which the assumptions of the model do not agree with the data.
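The two-step structure can be sketched as follows. This is a deliberately simplified illustration, not the TVC implementation: the function names, the pileup layout and the default thresholds are assumptions, and the genotype is assigned from the alternative allele fraction alone rather than from the statistical model used by the software.

```python
def find_candidates(pileup, min_coverage=6, min_alt_reads=2):
    """Step 1: keep positions with enough coverage and some variant evidence.

    pileup maps (chrom, pos) -> (ref_count, alt_count).
    """
    candidates = []
    for (chrom, pos), (ref_count, alt_count) in sorted(pileup.items()):
        coverage = ref_count + alt_count
        if coverage >= min_coverage and alt_count >= min_alt_reads:
            candidates.append((chrom, pos, ref_count, alt_count))
    return candidates

def assign_genotype(ref_count, alt_count, min_allele_freq=0.15):
    """Step 2 (simplified): derive a diploid genotype from the allele fraction."""
    freq = alt_count / (ref_count + alt_count)
    if freq < min_allele_freq:
        return "0/0", freq          # homozygous reference
    if freq > 1 - min_allele_freq:
        return "1/1", freq          # homozygous variant
    return "0/1", freq              # heterozygous

# Hypothetical read counts for four positions.
pileup = {
    ("chr1", 100): (10, 9),
    ("chr1", 200): (3, 1),
    ("chr2", 50): (0, 25),
    ("chr2", 60): (30, 1),
}
calls = [
    (chrom, pos, assign_genotype(ref, alt)[0])
    for chrom, pos, ref, alt in find_candidates(pileup)
]
```

Separating candidate generation from evaluation keeps the expensive assessment restricted to positions that already show some evidence of a variant.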

Both steps are controlled by a set of parameters supplied to the software. Pre-configured parameter sets can be selected according to the expected variant frequency in the sample (Germline or Somatic) and the desired stringency of the calls. Alternatively, users can define their own parameters to suit their population and aims.

Torrent Variant Caller produces one output file per individual, in which only the variants are listed. The variants from all of these output files were collected into a hotspot file, which was uploaded to Torrent Suite. The hotspot file lists chromosomal positions together with the reference and alternative allele at each position. It forces TVC to analyze and report every listed position, even if the called allele is not a variant. This made it possible to determine whether a missing call was due to a lack of coverage or to the called sequence matching the reference allele.
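The logic of this step can be sketched as below. The sketch assumes that each per-sample variant list has already been parsed into `(chromosome, position, reference allele, alternative allele)` tuples; the function names, the status labels and the default thresholds are hypothetical and do not reflect the hotspot file format or the internal behavior of Torrent Suite.

```python
def build_hotspots(per_sample_variants):
    """Collect the union of all variants seen in any sample, sorted."""
    hotspots = set()
    for variants in per_sample_variants.values():
        hotspots.update(variants)
    # Note: chromosomes sort as plain strings here (e.g. "chr10" before "chr2").
    return sorted(hotspots)

def classify_call(coverage, alt_count, min_coverage=10, min_allele_freq=0.15):
    """Explain why a sample does or does not carry a variant at a hotspot."""
    if coverage == 0 or coverage < min_coverage:
        return "no_call_low_coverage"
    if alt_count / coverage >= min_allele_freq:
        return "variant"
    return "reference"

# Hypothetical per-sample variant lists.
per_sample_variants = {
    "patient_1": [("chr1", 100, "C", "T")],
    "patient_2": [("chr1", 100, "C", "T"), ("chr2", 50, "G", "A")],
}
hotspots = build_hotspots(per_sample_variants)
```

For `patient_1`, the position `chr2:50` would then be reported explicitly, and `classify_call` distinguishes a reference call (enough reads, no alternative allele) from a position that simply was not covered.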

Figure 2.10.1 Screenshot of a Torrent Variant Caller output. The top section shows the sequencing run name, application type, barcode name, and the parameters used for variant calling, along with a summary of the variants found. A bar chart of the variants found per chromosome is shown on the right. The variants are listed at the bottom, with their chromosomal position, alleles, and the sequencing quality at the position.

Figure 2.10.1 shows the Torrent Variant Caller output view for a representative sample. The plug-in output files can be loaded into the Broad Institute's Integrative Genomics Viewer (IGV) for further visualization or analysis (Thorvaldsdóttir et al. 2013). Clicking the link opens the viewer and displays all reads covering the chromosomal position. Figure 2.10.2 shows a representative heterozygous variant call in IGV.

Figure 2.10.2 Integrative Genomics Viewer (IGV) interface showing a C to T variant. Each horizontal gray bar represents one read. All reads are aligned by the software and the variants are marked. Only a selection of the reads at this position is shown.

Two pre-configured germline settings were applied in the analysis: High stringency and Low stringency. Table 2.10.1 lists the settings used with Torrent Variant Caller. After reviewing the results of these analyses, a custom setting was created to better match our desired coverage threshold and confidence in the called variants.

Table 2.10.1 Torrent Variant Caller parameter configurations. The steps that led to the decision to use custom parameters are described in section 3.4.3.

Parameter                       High stringency   Low stringency   Custom
data_quality_stringency         8.5               6.5              8.5
hp_max_length                   8                 8                8
filter_unusual_predictions      0.25              0.3              0.25
filter_insertion_predictions    0.2               0.2              0.2
filter_deletion_predictions     0.2               0.2              0.2
snp_min_cov_each_strand         3                 0                3
snp_min_variant_score           10                10               10
snp_min_allele_freq             0.15              0.1              0.15
snp_min_coverage                20                6                10
snp_strand_bias                 0.95              0.95             0.95
snp_beta_bias                   8                 30               8
indel_min_cov_each_strand       3                 5                3
indel_min_variant_score         10                10               10
indel_min_allele_freq           0.15              0.1              0.15
indel_min_coverage              20                15               20
indel_strand_bias               0.85              0.85             0.85
indel_beta_bias                 8                 8                8
hotspot_min_cov_each_strand     3                 3                3
hotspot_min_variant_score       10                10               10
hotspot_min_allele_freq         0.15              0.1              0.15
hotspot_min_coverage            20                6                10
hotspot_strand_bias             0.95              0.95             0.95
hotspot_beta_bias               8                 30               8
downsample_to_coverage          400               400              400
outlier_probability             0.01              0.01             0.01
prediction_precision            1                 1                1
relative_strand_bias            0.8               0.8              0.8

The parameters that control the minimum acceptable coverage (snp_min_coverage and hotspot_min_coverage, shown in blue in the original table) were modified from the high stringency values to produce the custom parameters.
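The relationship between the high stringency and custom configurations can be expressed as a base configuration plus a small set of overrides. The sketch below reads the three value columns of Table 2.10.1 as high stringency, low stringency and custom, and copies only a subset of the rows; the dictionary layout and the `changed_parameters` helper are illustrative and are not the parameter file format used by Torrent Suite.

```python
# Subset of the high stringency column of Table 2.10.1.
HIGH_STRINGENCY = {
    "snp_min_cov_each_strand": 3,
    "snp_min_allele_freq": 0.15,
    "snp_min_coverage": 20,
    "indel_min_coverage": 20,
    "hotspot_min_coverage": 20,
}

# Only the minimum-coverage thresholds differ in the custom column.
CUSTOM_OVERRIDES = {
    "snp_min_coverage": 10,
    "hotspot_min_coverage": 10,
}

# Later keys win, so the overrides replace the base values.
CUSTOM = {**HIGH_STRINGENCY, **CUSTOM_OVERRIDES}

def changed_parameters(base, derived):
    """Return {name: (base_value, derived_value)} for every differing entry."""
    return {k: (base[k], derived[k]) for k in base if base[k] != derived[k]}
```

Keeping the overrides separate from the base set makes it easy to check that nothing other than the coverage thresholds was altered, for example with `changed_parameters(HIGH_STRINGENCY, CUSTOM)`.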
