
6.2 Algorithm Description

6.2.3 Peak Picking Algorithm

Based on Theorem 6.2.2, the peak picking algorithm can now be defined in a finite-dimensional setting as follows. Let $f \in \mathbb{R}^L$ be a single raw MALDI spectrum of length $L$. This signal is divided into overlapping slices $f_i$ of length $\ell$ with overlap $o \in (0,1)$. An overlap between slices is required; otherwise, peaks might be unintentionally split across two consecutive slices. In the next step, the transform coefficients of each pair of consecutive slices are computed with respect to the given frame $g_{k,l}$. Frames that might be interesting to consider are Gabor and wavelet frames, resulting in time-frequency or time-scale coefficients, respectively. From these coefficients, the mask $m$ can then be estimated using (6.2.9) for a given regularization parameter $\lambda$.
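The slicing step can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the overlap $o$ is treated as a fraction of the slice length, so consecutive slices start $(1-o)\ell$ samples apart (rounded to an integer), and a trailing remainder shorter than $\ell$ is simply dropped. The function name `slice_spectrum` is hypothetical.

```python
import numpy as np


def slice_spectrum(f, ell, o):
    """Split a 1-D spectrum f into overlapping slices f_i of length ell.

    Assumptions of this sketch: the overlap o is a fraction of ell, so slices
    start round((1 - o) * ell) samples apart, and a trailing remainder shorter
    than ell is dropped.
    """
    f = np.asarray(f, dtype=float)
    if not 0.0 < o < 1.0:
        raise ValueError("overlap o must lie in (0, 1)")
    if len(f) < ell:
        raise ValueError("spectrum is shorter than one slice")
    hop = max(1, int(round((1.0 - o) * ell)))
    starts = np.arange(0, len(f) - ell + 1, hop)
    slices = np.stack([f[s:s + ell] for s in starts])  # row i holds slice f_i
    return slices, starts


slices, starts = slice_spectrum(np.arange(10.0), ell=4, o=0.5)
# starts -> [0, 2, 4, 6]; consecutive slices share o * ell = 2 samples
```

With $o = 0.5$, a peak cut at the border of one slice appears well inside the next one, which is exactly why the text requires a non-zero overlap.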

Once estimated, the mask indicates the most prominent differences between the transform coefficients $c_i$ and $c_{i+1}$ of two consecutive, overlapping slices $f_i$ and $f_{i+1}$. If both slices are similar with respect to $\lambda$, the mask is constant at one and presumably no peak is present. Otherwise, if the mask takes values different from one, a closer inspection might reveal possible peak locations. In general, there are three cases to consider:

1. There is a peak in $f_i$ but no peak at the same position in $f_{i+1}$. This results in values smaller than one in the mask $m_i$, which lessen the influence of the large coefficients in $c_i$ compared to $c_{i+1}$.

2. There is a peak in $f_{i+1}$ but no peak at the same position in $f_i$. This yields a mask with coefficients larger than one.

3. There are peaks in $f_i$ and $f_{i+1}$ at the same relative location, meaning that these two peaks are located exactly $(1-o)\ell$ apart in the original spectrum $f$. In this case, the two sets of transform coefficients might not differ significantly enough for the mask $m_i$ to deviate from a constant one.
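To make the three cases concrete, the following sketch evaluates the mask update written in line 6 of Algorithm 10, $m = (y-1)\left(1 - \lambda / (|c_1|^2 |y-1|)\right)_+ + 1$ with $y = |c_2|/|c_1|$, on scalar toy coefficients. The helper name `estimate_mask` and the small constant `eps` that guards against division by zero are assumptions of this sketch, not part of the original formulation.

```python
import numpy as np


def estimate_mask(c1, c2, lam, eps=1e-12):
    """Mask update of line 6 in Algorithm 10 (eps is an added zero-division guard)."""
    a1 = np.abs(c1) + eps
    y = np.abs(c2) / a1                       # y = |c2| / |c1|
    # Soft-threshold factor (.)_+ : zero when |c1|^2 |y - 1| <= lam.
    shrink = np.maximum(1.0 - lam / (a1**2 * np.abs(y - 1.0) + eps), 0.0)
    return (y - 1.0) * shrink + 1.0


# Entry 0: peak only in f_i; entry 1: peak only in f_{i+1};
# entry 2: peaks in both slices at the same relative location.
c1 = np.array([10.0, 1.0, 5.0])
c2 = np.array([1.0, 10.0, 5.1])
m = estimate_mask(c1, c2, lam=1.0)
# m is approximately [0.11, 9.0, 1.0]
```

The first entry drops below one, the second rises above one, and the third is shrunk all the way back to one because $|c_1|^2 |y - 1| = 0.5 < \lambda$, which is the situation the third case describes.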

From the first two cases, peak locations can easily be estimated by considering the values of $m_i - 1$. Its negative values, denoted in the following by $(\cdot)_{\mathrm{neg}}$, are accumulated directly by summing the absolute mask coefficients over all frequency/scale indices $l$,

$$
z^i_k = \sum_l \left| \left( m^i_{k,l} - 1 \right)_{\mathrm{neg}} \right| . \tag{6.2.13}
$$
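Equation (6.2.13) amounts to clipping $m_i - 1$ at zero from above and summing magnitudes along the frequency/scale axis. The sketch below assumes the mask is stored as a NumPy array with the time/position index $k$ along the rows and the frequency/scale index $l$ along the columns; `slice_indicator` is a hypothetical name.

```python
import numpy as np


def slice_indicator(m):
    """Eq. (6.2.13): z_k = sum_l |(m_{k,l} - 1)_neg|.

    m has shape (K, L_l): rows are positions k, columns are
    frequency/scale indices l (a layout assumed for this sketch).
    """
    neg = np.minimum(m - 1.0, 0.0)   # (x)_neg keeps only the negative part
    return np.abs(neg).sum(axis=1)


m = np.array([[1.0, 1.0, 1.0],    # constant mask: no evidence
              [0.2, 0.5, 1.0],    # strong evidence for a peak in f_i
              [1.5, 0.9, 1.0]])   # values above one are ignored here
z = slice_indicator(m)
# z is approximately [0.0, 1.3, 0.1]
```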

Accumulated over all slices $f_i$, the signals $z^i$ form an indicator signal for peaks. Positive entries in $m_i - 1$, on the other hand, indicate a peak in $f_{i+1}$ and can serve as an additional source of information in the subsequent step, when the difference between $f_{i+1}$ and $f_{i+2}$ is estimated.

Accordingly, when analyzing $m_{i+1}$, negative coefficients are expected at the positions where positive coefficients occurred in the preceding step. If this is the case, one can proceed with (6.2.13). If, however, $m_{i+1} - 1$ is zero where negative coefficients are expected, this can only be due to the third case mentioned above. In such cases, the information of both the preceding and the subsequent masks can be used to prevent peaks from being ignored whenever they are present in two consecutive slices $f_i$ and $f_{i+1}$ at the same relative location: the mask for $f_{i-1}$ and $f_i$ should then contain positive values, and the mask for $f_{i+1}$ and $f_{i+2}$ negative values, provided the peaks in the spectrum are not periodically spaced. In MALDI data, such regular patterns can be neglected, though.
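One possible way to combine the neighboring masks for the third case is sketched below. It is a hypothetical heuristic, not a procedure specified in the text: a position $k$ is flagged when $m_{i-1}$ exceeds one (evidence for a peak in $f_i$), $m_{i+1}$ falls below one (evidence for a peak in $f_{i+1}$), and $m_i$ stays at one. The tolerance `tol` and the any/all reductions over the frequency/scale axis are assumptions.

```python
import numpy as np


def recover_case_three(m_prev, m_cur, m_next, tol=1e-6):
    """Hypothetical detector for peaks at the same relative position k in f_i and f_{i+1}.

    m_prev compares (f_{i-1}, f_i), m_cur compares (f_i, f_{i+1}) and
    m_next compares (f_{i+1}, f_{i+2}); all have shape (K, L_l).
    """
    pos_prev = np.any(m_prev - 1.0 > tol, axis=1)           # peak in f_i
    neg_next = np.any(m_next - 1.0 < -tol, axis=1)          # peak in f_{i+1}
    flat_cur = np.all(np.abs(m_cur - 1.0) <= tol, axis=1)   # m_i stays at one
    return pos_prev & neg_next & flat_cur


m_prev = np.array([[1.0, 1.0], [2.0, 1.0], [2.0, 1.0]])
m_cur = np.array([[1.0, 1.0], [1.0, 1.0], [0.5, 1.0]])
m_next = np.array([[1.0, 1.0], [0.3, 1.0], [0.3, 1.0]])
flags = recover_case_three(m_prev, m_cur, m_next)
# Only k = 1 is flagged; at k = 2 the current mask already deviates from one.
```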

In general, Equation (6.2.9) tends to be more sensitive to peaks in $f_i$ than to peaks in $f_{i+1}$, since the regularization parameter $\lambda$ is weighted with $|c_i|^2$. Consequently, the first of the three cases above is usually the most frequent one. A summary of the proposed peak picking approach can be found in Algorithm 10. Note that the inner products in lines 3 and 4 are finite-dimensional with respect to the frame length. Additionally, the computation for $N$ spectra can be done using an additional loop or, as implicitly indicated in Algorithm 10, a vectorized approach.
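The loop of Algorithm 10 can be sketched end to end for a single spectrum. Everything concrete here is an assumption for illustration: the Gabor-type atoms (Gaussian windows with shift step `hop`, `n_freq` modulation frequencies), the `eps` guard, and the way each slice indicator $z^i$ is credited back to sample positions of $z$. The sketch iterates over the consecutive pairs $(f_i, f_{i+1})$ and does not check whether the atoms actually form a frame.

```python
import numpy as np


def gabor_system(ell, hop=4, n_freq=8, sigma=3.0):
    """Hypothetical Gabor-type atoms g_{k,l} on a slice of length ell.

    Gaussian windows centred at shifts 0, hop, 2*hop, ... (index k) are
    modulated by n_freq frequencies in [0, 0.5) cycles/sample (index l).
    Returns atoms of shape (K, n_freq, ell) and the window shifts.
    """
    t = np.arange(ell)
    shifts = np.arange(0, ell, hop)
    nu = np.arange(n_freq) / (2 * n_freq)
    win = np.exp(-0.5 * ((t[None, :] - shifts[:, None]) / sigma) ** 2)  # (K, ell)
    mod = np.exp(2j * np.pi * nu[:, None] * t[None, :])                  # (n_freq, ell)
    return win[:, None, :] * mod[None, :, :], shifts


def estimate_mask(c1, c2, lam, eps=1e-12):
    """Lines 5-6 of Algorithm 10; eps is an added guard against division by zero."""
    a1 = np.abs(c1) + eps
    y = np.abs(c2) / a1
    shrink = np.maximum(1.0 - lam / (a1**2 * np.abs(y - 1.0) + eps), 0.0)
    return (y - 1.0) * shrink + 1.0


def peak_indicator(f, lam, ell, o, hop=4, n_freq=8):
    """Sketch of Algorithm 10 for one spectrum f; returns z of the same length."""
    f = np.asarray(f, dtype=float)
    step = max(1, int(round((1 - o) * ell)))
    starts = np.arange(0, len(f) - ell + 1, step)
    g, shifts = gabor_system(ell, hop, n_freq)
    z = np.zeros(len(f))
    for s0, s1 in zip(starts[:-1], starts[1:]):              # pairs (f_i, f_{i+1})
        c1 = g.conj() @ f[s0:s0 + ell]                       # line 3, shape (K, n_freq)
        c2 = g.conj() @ f[s1:s1 + ell]                       # line 4
        m = estimate_mask(c1, c2, lam)                       # lines 5-6
        zi = np.abs(np.minimum(m - 1.0, 0.0)).sum(axis=1)    # line 7, Eq. (6.2.13)
        # Line 9; crediting row k to sample s0 + shift_k is an assumption.
        np.add.at(z, s0 + shifts, zi)
    return z


t = np.arange(128)
f = 10.0 * np.exp(-0.5 * ((t - 60) / 2.0) ** 2)   # one synthetic peak at index 60
z = peak_indicator(f, lam=1.0, ell=32, o=0.5)
print(np.flatnonzero(z))                          # positions credited with peak evidence
```

Processing $N$ spectra amounts to calling `peak_indicator` per row, or to batching the matrix products along an extra axis in the spirit of the vectorized approach mentioned above.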

Slicing spectra into smaller parts has several advantages. First, a raw spectrum can be analyzed without prior baseline correction: if the slice length $\ell$ is chosen small enough, the influence of baseline effects on two consecutive slices becomes negligibly small. Second, the algorithm's sensitivity can be adjusted to the noise level. Based on the time-frequency or time-scale coefficients of both slices, the amount of noise present in these slices can be estimated, e.g., using the noise variance estimation proposed by Mallat (2008, Eq. (11.85) on p. 565). The regularization parameter $\lambda$ can then be weighted according to the noise level, so that the sensitivity increases wherever the noise variance decreases within a single spectrum.

Algorithm 10: MALDI Peak Picking Algorithm
Input: $f \in \mathbb{R}^{N \times L}$ - raw MALDI data,
$\lambda \in \mathbb{R}_{+}$ - regularization parameter,
$\ell$ - length of slices,
$o$ - overlap between slices,
$g_{k,l}$ - frame of length $\ell$
Output: $z \in \mathbb{R}^{N \times L}$
1 $M \leftarrow$ total number of slices with length $\ell$ and overlap $o$ dividing $L$
2 for $i = 0, 1, 2, \dots, M-1$ do
3   $c_1 \leftarrow \langle f_i, g_{k,l} \rangle$
4   $c_2 \leftarrow \langle f_{i+1}, g_{k,l} \rangle$
5   $y \leftarrow |c_2|\,|c_1|^{-1}$
6   $m_i \leftarrow (y - 1)\left(1 - \frac{\lambda}{|c_1|^2\,|y - 1|}\right)_{+} + 1$
7   $z^i_k \leftarrow \sum_l \left| \left( m^i_{k,l} - 1 \right)_{\mathrm{neg}} \right|$
8 end
9 $z \leftarrow \sum_i z^i$
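The noise-dependent weighting of $\lambda$ can be sketched as follows. The estimator used here is the median-based rule $\hat\sigma = \operatorname{median}|c| / 0.6745$ that is commonly applied to fine-scale coefficients under Gaussian noise; whether it coincides exactly with the cited equation in Mallat (2008) is not verified here. Scaling $\lambda$ by the estimated variance relative to a reference level `sigma_ref` is a hypothetical choice that merely realizes the stated behavior: lower noise gives a smaller $\lambda$ and hence a higher sensitivity.

```python
import numpy as np


def noise_sigma(c):
    """Median-based noise estimate sigma = median(|c|) / 0.6745.

    Which coefficients to pass in (e.g. the finest scale or the highest
    frequencies of a slice) is left to the caller.
    """
    return np.median(np.abs(c)) / 0.6745


def adapted_lambda(lam, c1, c2, sigma_ref):
    """Hypothetical weighting: scale lam by the noise variance of both slices."""
    sigma2 = 0.5 * (noise_sigma(c1) ** 2 + noise_sigma(c2) ** 2)
    return lam * sigma2 / sigma_ref ** 2


c = np.full(5, 0.6745)                      # toy coefficients with sigma = 1
lam_noisy = adapted_lambda(2.0, c, c, 1.0)  # unchanged at the reference level
lam_quiet = adapted_lambda(2.0, 0.5 * c, 0.5 * c, 1.0)  # halved sigma, quarter variance
```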

Modified Algorithm with Spatial Awareness

The algorithm described in the previous section can be modified to incorporate spatial awareness. In the context of MALDI, this means that the peak picking process depends on neighboring spectra.

A peak is more likely to be detected if the spectra of neighboring spots also contain peaks at approximately the same $m/z$ ratio, whereas a peak surrounded by noise in its spatial neighborhood is more likely to be ignored.

Spatial awareness can be included in the closed-form solution of the mask $m$, similar to the Windowed Group LASSO approach in (4.2.14), while the rest of the algorithm remains as described in Algorithm 10. To formulate spatial awareness mathematically, let $\mathcal{N}$ denote, for every spectrum, the set of its neighboring spectra, including the spectrum itself. Further, denote by $w_j$, $j \in \mathcal{N}$, a weight corresponding to each neighbor such that $\sum_{j \in \mathcal{N}} w_j = 1$. By defining

$$
\tilde{y} = \sum_{j \in \mathcal{N}} w_j \frac{|c_{2,j}|}{|c_{1,j}|} \tag{6.2.14}
$$

and

$$
\tilde{c}_1 = \sum_{j \in \mathcal{N}} w_j \, c_{1,j} \tag{6.2.15}
$$

based on the transform coefficients $c_1$ and $c_2$ of neighboring spectra, the estimation of the mask $m$ can be formulated as

$$
m = \left( \frac{|c_2|}{|c_1|} - 1 \right) \left( 1 - \frac{\lambda}{|\tilde{c}_1|^2 \, |\tilde{y} - 1|} \right)_{+} + 1 . \tag{6.2.16}
$$
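Equations (6.2.14) to (6.2.16) can be sketched for one spectrum and one pair of slices as follows. The sketch assumes that the coefficients of all neighbors $j \in \mathcal{N}$ are stacked along the first array axis and that `center` marks the spectrum itself; the function name `spatial_mask` and the `eps` guard are illustrative additions.

```python
import numpy as np


def spatial_mask(c1, c2, w, center, lam, eps=1e-12):
    """Spatially aware mask following Eqs. (6.2.14)-(6.2.16).

    c1, c2 : coefficients of slices f_i and f_{i+1} for every neighbour
             j in N, stacked along axis 0 (shape (J, ...)).
    w      : weights w_j of shape (J,), assumed to sum to one.
    center : position of the spectrum itself along axis 0.
    eps    : added guard against division by zero (not in the text).
    """
    w = np.asarray(w, dtype=float).reshape((-1,) + (1,) * (np.ndim(c1) - 1))
    y_tilde = np.sum(w * np.abs(c2) / (np.abs(c1) + eps), axis=0)   # (6.2.14)
    c1_tilde = np.sum(w * c1, axis=0)                               # (6.2.15)
    y = np.abs(c2[center]) / (np.abs(c1[center]) + eps)             # |c_2| / |c_1|
    denom = np.abs(c1_tilde) ** 2 * np.abs(y_tilde - 1.0) + eps
    shrink = np.maximum(1.0 - lam / denom, 0.0)
    return (y - 1.0) * shrink + 1.0                                 # (6.2.16)


# Three identical neighbours: reduces to the single-spectrum mask.
c1 = np.array([10.0, 1.0, 5.0]).reshape(3, 1)
c2 = np.array([1.0, 10.0, 5.1]).reshape(3, 1)
m_same = spatial_mask(np.stack([c1] * 3), np.stack([c2] * 3), [1 / 3] * 3, center=0, lam=1.0)

# Peak only in the centre spectrum: the neighbours pull the mask back to one.
m_iso = spatial_mask(np.ones((3, 1, 1)), np.array([10.0, 1.0, 1.0]).reshape(3, 1, 1),
                     [1 / 3] * 3, center=0, lam=4.0)
```

With identical neighbors and uniform weights, $\tilde{y} = y$ and $\tilde{c}_1 = c_1$, so the result matches line 6 of Algorithm 10. In the second call, the same center coefficients would give a mask value of 6 without spatial averaging, but the averaged $\tilde{y}$ drops to 4 and the threshold is no longer exceeded.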

This scales the regularization parameter $\lambda$ for each coefficient depending on the characteristics of neighboring spectra. The weights $w$ can, for example, form a simple average kernel, where each element is defined by $w_j = 1/|\mathcal{N}|$ for all $j \in \mathcal{N}$, and $|\mathcal{N}|$ denotes the cardinality of $\mathcal{N}$. Other choices of weights include Gaussian kernels with different variances or circular average filters. Non-linear approaches such as median filters (Lim, 1990, Ch. 8.2.2) or edge-preserving Kuwahara filters (Bartyzel, 2016; Kuwahara et al., 1976) are also possible.
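The two linear weight choices named above can be generated as follows for a square pixel neighborhood. The square shape, the `radius` parameter and the Gaussian width `sigma` are assumptions of this sketch. Non-linear alternatives such as median or Kuwahara filters do not reduce to fixed weights $w_j$ and are not covered.

```python
import numpy as np


def neighbor_weights(radius=1, kind="average", sigma=1.0):
    """Weights w_j over a (2r+1) x (2r+1) pixel neighbourhood N (incl. centre).

    kind="average":  w_j = 1/|N|
    kind="gaussian": w_j proportional to exp(-d_j^2 / (2 sigma^2)),
                     with d_j the pixel distance to the centre.
    The result is flattened row by row so that index j runs over N.
    """
    dy, dx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    if kind == "average":
        w = np.ones(dy.shape, dtype=float)
    elif kind == "gaussian":
        w = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    else:
        raise ValueError(f"unknown kind: {kind}")
    return (w / w.sum()).ravel()   # normalised so that sum_j w_j = 1


w_avg = neighbor_weights(1, "average")             # nine weights of 1/9
w_gauss = neighbor_weights(1, "gaussian", sigma=1.0)
```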