
5. Results

5.2 Multigene panel sequencing in patients with ID and Short stature

5.2.6 Run statistics I – Overview of the quality of runs

A total of 12 runs were performed in this study, each with 24 samples, comprising 286 patients altogether. Seven runs were performed for the ID cohort (166 patients) and five runs for the Short stature cohort (120 patients) (Table 5.5). An average of 1.7 ± 0.5 million total passed-filter reads per run was acquired, of which approximately 96.7% mapped to the reference genome. A read enrichment (target aligned reads/total aligned reads) of about 75 ± 2% and a padded read enrichment (padded target aligned reads/total aligned reads) of about 85 ± 4% were achieved in the current study. Summary statistics of all runs, showing clusters passing filter, Q-scores and cluster densities, are displayed in Figure 5.13.
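Both enrichment ratios are simple quotients of aligned read counts. The following is a minimal Python sketch. It assumes the three read counts per sample have already been extracted from the aligner output. The class name, the function names and the example counts are hypothetical, and the counts were chosen only to reproduce ratios of 75% and 85%:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentCounts:
    """Read counts for one sample (hypothetical field names)."""
    total_aligned: int          # all reads aligned to the reference genome
    target_aligned: int         # reads aligned within the targeted regions
    padded_target_aligned: int  # reads aligned within targets plus padding


def read_enrichment(c: AlignmentCounts) -> float:
    """Target aligned reads / total aligned reads, as a percentage."""
    return 100.0 * c.target_aligned / c.total_aligned


def padded_read_enrichment(c: AlignmentCounts) -> float:
    """Padded target aligned reads / total aligned reads, as a percentage."""
    return 100.0 * c.padded_target_aligned / c.total_aligned


# Hypothetical counts for a single sample.
counts = AlignmentCounts(total_aligned=1_000_000,
                         target_aligned=750_000,
                         padded_target_aligned=850_000)
print(f"{read_enrichment(counts):.1f}% {padded_read_enrichment(counts):.1f}%")  # 75.0% 85.0%
```

Padded enrichment can never be lower than plain enrichment, because the padded regions contain the targets.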

Table 5.5: Run statistics of all runs performed in the current study. Runs performed with the modified protocol are highlighted in red. (ID: Magdeburg ID cohort; GH: Short stature cohort; Conc: concentration)

| Run name | Final library conc. | Cluster density (K/mm²) | Clusters passing filter | ≥Q30 | Yield total (GB) | Error rate | % of gaps from total length of targeted reference |
|---|---|---|---|---|---|---|---|
| ID_Run1 | 12 pM | 1657 | 84.7% | 90.7% | 7.6 | 0.57% | 1.07% |
| ID_Run2 | 12 pM | 1642 | 84.3% | 90.8% | 7.4 | 0.54% | 1.16% |
| ID_Run3 | 12 pM | 1291 | 91.7% | 93.7% | 7.1 | 0.51% | 1.30% |
| ID_Run4 | 12 pM | 1329 | 91.2% | 93.1% | 7.3 | 0.50% | 1.14% |
| GH_Run1 | 12 pM | 820 | 96.5% | 96.4% | 4.8 | 0.47% | 1.73% |
| GH_Run2 | 12 pM | 776 | 96.24% | 96.2% | 4.5 | 0.47% | 1.69% |
| GH_Run3 | 12 pM | 1020 | 93.2% | 94.4% | 5.7 | 0.48% | 1.40% |
| GH_Run4 | 12 pM | 1149 | 93.1% | 93.0% | 6.4 | 0.51% | 1.30% |
| ID_Run5 | 12 pM | 1339 | 88.6% | 83.1% | 7.0 | 0.85% | 1.21% |
| ID_Run6 | 12 pM | 1463 | 86.3% | 72.2% | 7.4 | 2.59% | 1.52% |
| GH_Run5 | 12 pM | 1527 | 86.1% | 80.6% | 7.7 | 1.10% | 1.38% |
| ID_Run7 | 15 pM | 1251 | 90.7% | 81.2% | 6.8 | 1.05% | 1.75% |

Final library concentration - The amount of final library pool (24 samples) used for sequencing runs.

Cluster density (K/mm²) - The number of clusters (in thousands) per square millimetre for the run, i.e. the density of clusters detected by image analysis (± one standard deviation).

Clusters Passing Filter (%) - Shows the percentage of clusters passing filter based on the Illumina chastity filter, which measures quality.

%Q>=30 - The percentage of bases with a quality score of 30 or higher. Higher scores indicate higher confidence in the base call and a lower probability of error. For a quality score $Q$, the estimated probability of an error is $P = 10^{-Q/10}$. For example, Q30 calls have an error rate of 0.1%.

Yield total - The total number of bases sequenced in the run (in gigabases).

Error rate - The calculated error rate, as determined by alignment to PhiX. A PhiX spike-in serves as an external control. Aligning its reads measures the percentage of reads with 0–4 mismatches, which provides a direct measurement of the intrinsic error rate.

Total length of targeted reference - Total length of sequenced bases in the target reference.
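The %Q>=30 definition rests on the Phred relation $P = 10^{-Q/10}$. The following is a minimal Python sketch of that relation and of the %≥Q30 calculation. It assumes base qualities are read from FASTQ quality strings with the common Phred+33 (Sanger/Illumina 1.8+) offset. The function names are hypothetical, and this is not the implementation used by the Illumina run software:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated base-call error probability for Phred score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse relation: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_phred33(quality_string: str) -> list[int]:
    """Convert a FASTQ quality string (assumed Phred+33 offset) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def percent_q30(qualities: list[int]) -> float:
    """Percentage of bases with a quality score of 30 or higher."""
    if not qualities:
        raise ValueError("no base qualities given")
    return 100.0 * sum(q >= 30 for q in qualities) / len(qualities)


print(f"Q30 error probability: {phred_to_error_prob(30):.1%}")  # Q30 error probability: 0.1%
print(decode_phred33("I?5"))  # [40, 30, 20]
```

Because the scale is logarithmic, each additional 10 units of Q reduces the expected error rate tenfold.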

Figure 5.13: Summary statistics of all runs performed in the current study, showing clusters passing filter, Q-scores and cluster densities (CD). Colour coding distinguishes cluster densities between the standard and modified protocols and between the ID and Short stature (GH) cohorts.

In the current study, more than 1200 K/mm² clusters were generated on average (range 770–1660 K/mm²), and no major difference in clusters passing filter was observed between runs. In all standard-protocol runs, a mean of more than 90% of sequenced bases had a Phred quality score ≥30. This reflects the high quality of these runs and a high probability of correct base calling. Reduced Q-scores were observed for the modified-protocol runs.

The average mean depth for the targeted regions was 151.6 ± 51.6, with a Percent Q30 of approximately 90.2% (Figure 5.14) across all runs performed in this study. The uniformity of coverage (Pct > 0.2*mean) was 92.5 ± 1.1% (Figure 5.14). This metric is the percentage of targeted base positions in which the read depth is greater than 0.2 times the mean region target coverage depth. The statistics for each run are shown in full in Supplementary Table S7.
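Mean depth and uniformity of coverage can both be computed from a per-base depth profile of the targets. The following is a minimal Python sketch. It assumes a list of per-position read depths, for example exported by a depth-reporting tool. As a simplification, the mean is taken over all target positions rather than per region. The function names and the depth values are hypothetical:

```python
from statistics import fmean


def mean_depth(depths: list[int]) -> float:
    """Mean read depth over all target positions."""
    if not depths:
        raise ValueError("empty depth profile")
    return fmean(depths)


def uniformity_of_coverage(depths: list[int], fraction: float = 0.2) -> float:
    """Percentage of target positions whose depth exceeds fraction * mean depth."""
    threshold = fraction * mean_depth(depths)
    return 100.0 * sum(d > threshold for d in depths) / len(depths)


# Hypothetical depth profile of ten target positions.
depths = [150, 160, 140, 10, 0, 200, 180, 170, 155, 145]
print(mean_depth(depths), uniformity_of_coverage(depths))  # 131.0 80.0
```

The threshold scales with the sample's own mean depth. As a result, uniformity measures how evenly reads are spread across the target rather than how deeply the sample was sequenced.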

Figure 5.14: Coverage summary of all runs in the current study. The coverage depth (number of reads per region) is displayed together with the percentage uniformity of coverage and Q-score values. Runs performed with the modified protocol are highlighted in red. (ID: Magdeburg ID cohort; GH: Short stature cohort).

On average, the target coverage at 1x, 10x, 20x and 50x was 98.9 ± 0.3%, 96.2 ± 1.1%, 93.6 ± 2.1% and 83.8 ± 5.9%, respectively. Figures 5.15 and 5.16 show target coverage graphs with the percentage of targets covered at more than 1x and more than 20x, respectively, for all runs in the current study. Of the 286 samples, a few showed lower 1x and 20x coverage than the others, probably due to poor sample quality. In particular, 3 samples failed (2 from the ID cohort and 1 from the GH cohort). These samples had very low mean depths (72.3, 65.3 and 23.7) and very low 1x (<98%) and 20x (<85%) coverage. They also had a high number of gaps⁶ with less than 20x coverage and lower uniformity of coverage. The failure of these samples was attributed to poor sample quality, inaccurate sample quantification and low sample input. Because of these factors, the library peak size distribution was below 200 bp, which led to an increased number of gaps. The two failed samples from the ID cohort were repeated in another run and yielded satisfactory results.
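The 1x/10x/20x/50x values are the percentage of target positions whose depth reaches (≥) each threshold. The following is a minimal Python sketch that uses the same per-base depth assumption. The flagging helper is hypothetical, and so are its cut-offs: 98% at 1x and 85% at 20x are borrowed from the values reported for the three failed samples. They are not a QC rule stated in this study:

```python
def target_coverage(depths: list[int],
                    thresholds: tuple[int, ...] = (1, 10, 20, 50)) -> dict[int, float]:
    """Percentage of target positions with depth >= each threshold."""
    if not depths:
        raise ValueError("empty depth profile")
    n = len(depths)
    return {t: 100.0 * sum(d >= t for d in depths) / n for t in thresholds}


def is_low_coverage(coverage: dict[int, float],
                    min_1x: float = 98.0, min_20x: float = 85.0) -> bool:
    """Flag a sample whose 1x or 20x coverage falls below the assumed cut-offs."""
    return coverage[1] < min_1x or coverage[20] < min_20x


# Hypothetical depth profile of ten target positions.
depths = [0, 5, 12, 25, 60, 100, 30, 8, 55, 1]
cov = target_coverage(depths)
print(cov)                   # {1: 90.0, 10: 60.0, 20: 50.0, 50: 30.0}
print(is_low_coverage(cov))  # True
```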

6 Given a depth threshold, a gap is defined as a consecutive run of bases in which all bases have coverage less than the threshold. It is in these regions that variants are filtered due to low depth.
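Following the gap definition in footnote 6, gaps can be located as maximal runs of consecutive positions whose depth is below a threshold. The following is a minimal Python sketch. It assumes the depth list covers a single contiguous target region, so that gaps are never merged across separate regions. The function names and the depth values are hypothetical:

```python
def find_gaps(depths: list[int], threshold: int = 20) -> list[tuple[int, int]]:
    """Return maximal runs with depth < threshold as 0-based, half-open (start, end) intervals."""
    gaps: list[tuple[int, int]] = []
    start = None  # start index of the gap currently being extended, if any
    for i, depth in enumerate(depths):
        if depth < threshold:
            if start is None:
                start = i
        elif start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:  # a gap that runs to the end of the region
        gaps.append((start, len(depths)))
    return gaps


def percent_gap_length(gaps: list[tuple[int, int]], region_length: int) -> float:
    """Total gap length as a percentage of the targeted region length."""
    return 100.0 * sum(end - start for start, end in gaps) / region_length


depths = [25, 18, 5, 30, 40, 19, 19, 22, 10]
gaps = find_gaps(depths)
print(gaps)                                               # [(1, 3), (5, 7), (8, 9)]
print(round(percent_gap_length(gaps, len(depths)), 1))    # 55.6
```

The second helper mirrors the "% of gaps from total length of targeted reference" column of Table 5.5, with the sketch's assumptions applied.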

Figure 5.15: Target coverage graph showing the percentage of targets with coverage greater than 1x for each sample in each run of the current study. On average, 98.9% 1x coverage was observed per sample, except for a few outliers. Runs performed with the modified protocol are highlighted in red. (ID: Magdeburg ID cohort; GH: Short stature cohort).

Figure 5.16: Target coverage graph showing the percentage of targets with coverage greater than 20x for each sample in each run of the current study. On average, 93.8% 20x coverage was observed per sample, except for a few outliers. Runs performed with the modified protocol are highlighted in red. (ID: Magdeburg ID cohort; GH: Short stature cohort).

[Plots: panel "1x Coverage" (y-axis 96–100%) and panel "20x Coverage" (y-axis 65–100%). Both share an x-axis of samples 1–24 and have one series per run: ID_Run1–ID_Run7 and GH_Run1–GH_Run5.]

There were quite a number of gaps with less than 20x coverage. This low coverage arose because the gaps lay in regions of either high (>60%) or low (<30%) GC content. Figure 5.17 shows gap summary graphs with the number of gaps with less than 20x coverage for all runs in the current study.
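The GC-content criterion above (>60% or <30%) can be checked for each gap sequence. The following is a minimal Python sketch. It assumes the reference sequence of each gap has already been extracted. Ambiguous bases such as N are excluded from the denominator. The function names are hypothetical:

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among the unambiguous bases (A, C, G, T) of a sequence."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (s.count("G") + s.count("C")) / acgt


def classify_gc(seq: str, low: float = 0.30, high: float = 0.60) -> str:
    """Label a gap sequence as 'high', 'low' or 'normal' GC using the thresholds from the text."""
    gc = gc_content(seq)
    if gc > high:
        return "high"
    if gc < low:
        return "low"
    return "normal"


for gap_seq in ["GCGCGCGCAT", "ATATATGCAT", "ACGTACGTAC"]:
    print(gap_seq, classify_gc(gap_seq))  # high, low, normal respectively
```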

Figure 5.17: Gap summary graph showing the number of gaps with less than 20x coverage for each sample in each run of the current study. On average, 42 gaps per sample were present. Apart from a few outliers, there were no significant differences between the modified and standard runs. Runs performed with the modified protocol are highlighted in red. (ID: Magdeburg ID cohort; GH: Short stature cohort).

Supplementary Figure 6 shows an example summary statistics graph comparing the standard and modified protocols for two individual runs of the current study. For each sample in each run, it displays the Q-scores, the number of gaps and the gap length in bp.