
1.4 Detection of somatic mutations

1.4.1 SNV detection

SNVs are the most common alterations in tumor genomes. Over the last decade, numerous algorithms have been developed to detect SNVs in cancer genomes, including SomaticSniper [64], JointSNVMix [65], MuTect [66], Strelka [67], LoFreq [68], VarScan 2 [69] and VarDict [70] (listed in Table 1.1). Most of these methods consider only a subset of the errors and biases described above. For example, VarScan 2 employs empirically derived filtering parameters, including read position, strandedness, and the average mapping quality of reference versus variant reads, to exclude candidate variants resulting from sequencing or alignment artifacts [69]. MuTect was specifically designed to detect variants with low allele fractions caused by either tumor heterogeneity or normal cell contamination [66]. It applies filters to remove false positives whose characteristics indicate strand bias or poor mapping quality. Although several comparative studies of SNV callers are available [71, 72], they do not agree on which tools optimally balance sensitivity and specificity. Because performance varies between datasets, multi-caller strategies appear favorable [57, 63]. Notably, several machine-learning algorithms, such as MutationSeq [61] and SomaticSeq [73], have also been developed. These algorithms train classifiers on a set of sequence features from a training dataset and then apply the classifiers to a target dataset to distinguish true somatic alterations from false positives. By incorporating the strengths of different somatic mutation detection algorithms, these methods report higher accuracy and robustness [73].
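The multi-caller strategy mentioned above can be illustrated with a simple majority vote. This is only a minimal sketch and not the procedure used in [57, 63] or in SomaticSeq: the function name `consensus_calls`, the caller labels and the `min_callers` threshold are hypothetical, and variants are assumed to be already normalized to identical (chromosome, position, reference allele, alternate allele) keys.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(
    calls_by_caller: Dict[str, Iterable[Variant]], min_callers: int = 2
) -> Set[Variant]:
    """Return the variants reported by at least `min_callers` distinct callers."""
    votes: Counter = Counter()
    for variants in calls_by_caller.values():
        # set() ensures a caller that lists a variant twice still casts one vote.
        votes.update(set(variants))
    return {variant for variant, n in votes.items() if n >= min_callers}


# Hypothetical call sets from three callers (illustrative data only).
calls = {
    "caller_a": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
    "caller_b": [("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")],
    "caller_c": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
}
kept = consensus_calls(calls, min_callers=2)
print(sorted(kept))
```

Raising `min_callers` trades sensitivity for specificity: requiring all three callers keeps only the chr1 variant, whereas a threshold of one keeps the union of all calls.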

Table 1.1: Computational tools for detecting somatic mutations

| Tools | Description | Mutation type | Reference |
|---|---|---|---|
| SomaticSniper | Bayesian probability with posterior filtering | SNVs | [64] |
| JointSNVMix | Probabilistic graphical model with pre-filtering | SNVs | [65] |
| MuTect | Bayesian classifier with pre- and post-filtering | SNVs | [66] |
| MuSE | Markov substitution model for molecular allelic evolution | SNVs | [74] |
| Pindel | Pattern growth learning approach | Indels | [75] |
| Dindel | Bayesian model accounting for sequencing, base-calling and mapping errors | Indels | [76] |
| Indelocator | Information not available | Indels | [77] |
| Strelka | Bayesian approach with posterior filtering | SNVs, Indels | [67] |
| LoFreq | Statistical model for sequencing error biases | SNVs, Indels | [68] |
| SomaticSeq | Ensemble approach with machine learning | SNVs, Indels | [73] |
| VarScan 2 | Fisher exact test, filtering and FDR correction | SNVs, Indels, SCNAs | [69] |
| VarDict | Fisher exact test with post-filtering | SNVs, Indels, SVs | [70] |
| GAP¹ | Pattern recognition of segmented and smoothed bi-dimensional profile | SCNAs | [78] |
| GenoCNA¹ | Continuous time HMM with discrete states | SCNAs | [79] |
| PICNIC¹ | HMM algorithm with preprocessing transformation | SCNAs | [80] |
| ASCAT¹ | Goodness-of-fit score of candidate solutions of tumor ploidy and tumor purity | SCNAs | [81] |
| OncoSNP¹ | Single unified Bayesian framework | SCNAs | [82] |
| GPHMM¹ | Global parameter HMM | SCNAs | [83] |
| ABSOLUTE¹ | Optimization of logarithmic scores | SCNAs | [84] |
| SegSeq² | Local change-point analysis with a subsequent merging procedure | SCNAs | [85] |
| CNAseg² | HMM segmentation with read depth variability correction | SCNAs | [86] |
| readDepth² | CBS algorithm with GC-content and mappability correction | SCNAs | [87] |
| BIC-seq² | Minimizing BIC approach with no read distribution assumption | SCNAs | [88] |
| Control-FREEC² | Sliding window approach with corrections of GC-content and mappability biases | SCNAs | [89] |
| ExomeCNV² | CBS algorithm with an assumption of read Gaussian distribution | SCNAs | [90] |
| CNAnorm² | CBS algorithm with correction of normal cell contamination and tumor aneuploidy | SCNAs | [91] |
| Patchwork² | CBS algorithm with tumor purity and ploidy estimation | SCNAs | [92] |
| HMMcopy² | HMM segmentation with GC-content and mappability correction | SCNAs | [93] |
| OncoSNP-SEQ² | HMM segmentation accounting for tumor purity, ploidy and heterogeneity | SCNAs | [94] |
| CLImAT² | Integrated HMM algorithm accounting for tumor purity and ploidy | SCNAs | [95] |
| PEMer | Read pair based approach with simulation based error models | SVs | [96] |
| BreakDancer | Read pair based approach | Indels, SVs | [97] |
| VariationHunter | Read pair based approach | SVs | [98] |
| SVDetect | Integrated method of read pair and read depth | SVs | [99] |
| DELLY | Integrated method of read pair and split reads | SVs | [100] |
| PRISM | Integrated method of read pair and split reads | SVs | [101] |
| HYDRA | Integrated method of read pair and local assembly | SVs | [102] |
| CREST | Integrated method of split reads and local assembly | SVs | [103] |
| cortex_var | De novo assembly method using colored de Bruijn graphs | SVs | [104] |
| Meerkat | Integrated method of read pair, split reads, and assembly | SVs | [105] |
| LUMPY | Integrated method of read pair, split read and read depth, as well as prior knowledge | SVs | [106] |
| MapSplice | Gene fusion detection from paired-end or single-end RNA-seq data | Gene fusions | [107] |
| FusionSeq | Gene fusion detection from paired-end RNA-seq data | Gene fusions | [108] |
| TopHat-Fusion | Gene fusion detection from paired-end or single-end RNA-seq data | Gene fusions | [109] |
| SnowShoes-FTD | Gene fusion detection from paired-end RNA-seq data | Gene fusions | [110] |
| ShortFuse | Gene fusion detection from paired-end RNA-seq data | Gene fusions | [111] |
| FusionMap | Gene fusion detection from WGS or RNA-seq data (both paired and single end) | Gene fusions | [112] |
| FusionHunter | Gene fusion detection from paired-end RNA-seq data | Gene fusions | [113] |
| deFuse | Gene fusion detection from paired-end RNA-seq data | Gene fusions | [114] |
| Comrad | Integrated gene fusion detection from paired-end RNA-seq and WGS data | Gene fusions | [115] |
| ChimeraScan | Gene fusion detection from paired-end RNA-seq data | Gene fusions | [116] |
| nFuse | Integrated gene fusion detection from paired-end RNA-seq and WGS data | Gene fusions | [117] |
| SOAPfuse | Gene fusion detection from paired-end RNA-seq data | Gene fusions | [118] |
| INTEGRATE | Integrated gene fusion detection from paired-end RNA-seq and WGS data | Gene fusions | [119] |

¹ For SNP array data; ² for NGS data.

HMM: Hidden Markov Model; CBS: Circular Binary Segmentation; BIC: Bayesian Information Criterion.
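Table 1.1 lists a Fisher exact test as the statistical core of VarScan 2 and VarDict. The sketch below shows how such a test can compare variant-supporting read counts between a tumor and its matched normal. It is an illustration under stated assumptions, not the implementation of either tool: the function name `fisher_one_sided`, the one-sided alternative (an excess of variant reads in the tumor) and the read counts are all hypothetical.

```python
from math import comb


def fisher_one_sided(tumor_ref: int, tumor_alt: int, normal_ref: int, normal_alt: int) -> float:
    """One-sided Fisher exact test on a 2x2 table of read counts.

    Returns the probability of observing at least `tumor_alt` variant reads
    in the tumor if variant reads were distributed between tumor and normal
    by chance, with all row and column totals held fixed (hypergeometric model).
    """
    n_tumor = tumor_ref + tumor_alt
    n_normal = normal_ref + normal_alt
    n_alt = tumor_alt + normal_alt
    total = n_tumor + n_normal

    # Number of ways to choose which reads carry the variant allele.
    denominator = comb(total, n_alt)

    # Sum the probabilities of all tables at least as extreme as the observed one.
    numerator = 0
    for k in range(tumor_alt, min(n_alt, n_tumor) + 1):
        numerator += comb(n_tumor, k) * comb(n_normal, n_alt - k)
    return numerator / denominator


# Hypothetical counts: 8 of 40 tumor reads and 0 of 40 normal reads support the variant.
p_value = fisher_one_sided(tumor_ref=32, tumor_alt=8, normal_ref=40, normal_alt=0)
print(f"p = {p_value:.4g}")
```

A small p-value indicates that the variant allele is enriched in the tumor relative to the normal sample. Production callers combine such a test with additional filtering and, in the case of VarScan 2, FDR correction, as noted in Table 1.1.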
