
5.4 Automatic correction of erroneous trials

[Figure: flowchart of the error correction scheme. An Intent Recognition Phase (multiple trials: flash stimulus/object, predict most likely object) is followed by an Error Correction Phase (single trial) and an Execution Phase (grasp object). Depending on whether the BCI predicts the intended object (objectA) or a wrong one (objectB), an ErrP is or is not elicited, and its recognition outcome yields the four paths (1)-(4): true positive, false negative, false positive, true negative.]

Figure 5.5. Schematic view of an error correction method using error-related potentials. Green paths correspond to increased or unchanged performance, whereas red paths denote performance loss.


Figure 5.5 gives a schematic view of the error correction procedure. Green paths correspond to cases with no loss of performance or even an improvement compared to the standard classification procedure without error correction, whereas red paths denote cases where a loss of performance is involved. Section 5.5 will investigate the performance costs incurred by the depicted paths (1)-(4) and propose an analytical method to compute the performance of such a BCI system.

The task of implementing such a system and automating the ErrP recognition can be split into three different sub-tasks:

Preprocessing A process that aims to convert the raw data into a suitable representation and improve the SNR of the data with methods that can also be applied in real-time in an online BCI.

Feature Extraction Based on the preprocessed data, specific features that discriminate well between classes are extracted.

Classification The classification problem involves assigning class labels to the input data consisting of the previously computed feature vectors.

Based on the results of the offline analysis, combinations of popular preprocessing, feature extraction and classification methods will be compared in this section.
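To make this comparison concrete, the following is a minimal, hypothetical sketch of how such combinations of feature extraction and classification methods could be composed and cross-validated with scikit-learn. The specific extractors, parameters, and function names are illustrative assumptions; the concrete methods evaluated in this thesis are described in the following subsections.

```python
# Hypothetical sketch (not the original implementation) of how combinations of
# feature extraction and classification methods can be composed and compared
# with cross-validation. X: (n_epochs, n_features) preprocessed epochs,
# y: labels (0 = correct, 1 = error). FastICA stands in for the ICA step here.
from itertools import product

from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

feature_extractors = {
    "pca": PCA(n_components=0.95),                    # keep 95% of the variance
    "ica": FastICA(n_components=20, random_state=0),  # stand-in for Infomax ICA
}
classifiers = {
    "shrinkage_lda": LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
}

def compare_combinations(X, y, cv=5):
    """Cross-validate every feature-extractor / classifier combination."""
    results = {}
    for (f_name, extractor), (c_name, clf) in product(
            feature_extractors.items(), classifiers.items()):
        pipe = Pipeline([("features", extractor), ("clf", clf)])
        scores = cross_val_score(pipe, X, y, cv=cv)
        results[(f_name, c_name)] = (scores.mean(), scores.std())
    return results
```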

5.4.1 Preprocessing

Common to all method combinations, the continuous EEG data were band-pass filtered to 1-10 Hz and segmented into epochs from 100 ms pre-stimulus to 800 ms after stimulus onset. For each epoch, a baseline correction was performed using the data of the 100 ms pre-stimulus interval in order to remove linear trends that might still be present after filtering. As a first step, the data were inspected for obvious artifacts such as eye blinks or muscle activity. Even though the subjects were instructed not to blink, especially during the feedback phase, frequent eye-blink artifacts were found, occurring primarily during the pre-stimulus time and near the end of the epoch. No muscle artifacts could be identified throughout the whole dataset. Every epoch with obvious artifacts exceeding 100 µV was rejected from the dataset.
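These preprocessing steps could, for example, be expressed as follows with the MNE-Python library; the use of MNE is an assumption, and `raw`, `events`, and `event_id` are placeholders for the recorded data and stimulus markers.

```python
# Minimal preprocessing sketch using MNE-Python (an assumed toolbox; the text
# does not prescribe a specific implementation).
import mne

def preprocess(raw, events, event_id):
    # Band-pass filter the continuous EEG to 1-10 Hz.
    raw = raw.copy().filter(l_freq=1.0, h_freq=10.0)

    # Segment into epochs from 100 ms before to 800 ms after stimulus onset,
    # baseline-correct with the pre-stimulus interval, and reject epochs
    # whose peak-to-peak amplitude exceeds 100 µV.
    epochs = mne.Epochs(
        raw, events, event_id,
        tmin=-0.1, tmax=0.8,
        baseline=(-0.1, 0.0),
        reject=dict(eeg=100e-6),
        preload=True,
    )
    return epochs
```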

Downsampling In this step, the downsampling method was subject to evaluation. The idea is that the original data dimensionality can be reduced without any significant loss of information, since the lowpass-filtered data can be equally well represented by a lower number of samples per time interval. This reasoning is a direct result of Nyquist’s theorem [Nyquist, 1928], which states that a signal with maximum frequency f has to be sampled at a rate of at least 2·f to capture its full frequency content. Due to the lowpass filter, which limited the maximum frequency to 10 Hz, the data can be downsampled from the original 256 Hz to 100 Hz, a dimensionality reduction factor of 100 Hz/256 Hz ≈ 0.391, which equals almost 61% smaller input data. A lower-dimensional dataset is highly preferable, since any of the classification methods will benefit from lower-dimensional input given the limited number of training samples per class.
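A minimal sketch of such a downsampling step, assuming the epochs are already available as a NumPy array and assuming a 100 Hz target rate:

```python
# Downsampling sketch: since the data are low-pass filtered at 10 Hz, a much
# lower sampling rate than the original 256 Hz preserves the signal content
# (Nyquist: >= 2 * 10 Hz). The 100 Hz target rate is an illustrative choice.
from scipy.signal import resample_poly

def downsample(epochs_data, sfreq_orig=256, sfreq_new=100):
    """epochs_data: array of shape (n_epochs, n_channels, n_samples)."""
    # Resample along the time axis by the rational factor sfreq_new/sfreq_orig.
    return resample_poly(epochs_data, up=sfreq_new, down=sfreq_orig, axis=-1)
```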

5.4.2 Feature extraction

[Figure: for each condition, an ERP image of the single-trial component activations (trials × time), the event-related spectral perturbation (ERSP, dB), and the inter-trial coherence (ITC) over time (ms); ERSP/ITC are shown at 3.375 Hz for the correct and 5.375 Hz for the erroneous condition.]

Figure 5.6. ERP image of subject 1 for the conditions Correct (left) and Error (right), visualizing ICA component number 5.

The feature extraction step aims at finding and extracting patterns in the data that discriminate better between classes than the raw data alone. As methods, independent component analysis (ICA), principal component analysis (PCA), and a t-test based selection are considered in the evaluation.

ICA features ICA was chosen since it decomposes the signal into a set of statistically independent sources. The resulting signals can be interpreted as a decomposed linear mixture of EEG components that are characteristic for the measured time interval. Experimental results (e.g. [Jung et al., 1998], [Callan et al., 2001], [Jung et al., 1999]) showed that event-related activity is usually assigned to one specific independent component, while the remaining signal activity such as artifacts and noise is assigned to different components. As a result, the SNR of the relevant signal is improved.

To obtain a transformation that decomposes the mixed signal, a projection matrix W is computed that minimizes the Gaussianity of the data and transforms the sensor-space data X into the independent component space S, or more formally

S = WX.    (5.1)


As ICA algorithm, extended Infomax ICA [Bell and Sejnowski, 1995] was used, which computes the unmixing matrix by minimizing the mutual information of the data projected onto all axes. The resulting component activations show that it should be possible to discriminate between both conditions. One example is shown in figure 5.6, which visualizes the properties of component 5 of subject 1, who participated in the high task difficulty experiment.

Especially the epochs of the error condition show a stable signal phase angle over consecutive trials, visible as straight vertical color gradients, whereas the trial-to-trial variability of the correct condition is much higher. Further, the middle plot shows no significant increase in spectral power for the correct condition, indicating that after a correct stimulus there is no significant change in the signal amplitude of any frequency compared to the baseline time interval. The highest phase coherence was measured for the 3 Hz frequency, which is highest during the baseline and extends to half the post-stimulus time interval. This finding corresponds to the analysis of the original sensor-space data in section 5.3, from which it was known that highly discriminable time intervals can be found 200-500 ms after stimulus onset. But since the former analysis was based on the combined dataset of all subjects, an automatic method to find the most discriminable intervals is likely to find better intervals, since slight subject-dependent latency variations can be expected.
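As an illustration, a decomposition of this kind could be computed with MNE-Python's ICA implementation, which also offers the extended Infomax variant; the number of components and the use of MNE are assumptions of this sketch, and the component worth inspecting is data-dependent.

```python
# Sketch of an extended Infomax ICA decomposition with MNE-Python
# (one possible implementation of S = W X).
from mne.preprocessing import ICA

def ica_sources(epochs, n_components=14):
    ica = ICA(
        n_components=n_components,
        method="infomax",
        fit_params=dict(extended=True),   # extended Infomax variant
        random_state=0,
    )
    ica.fit(epochs)
    # Component activations S = W X for every epoch,
    # shape (n_epochs, n_components, n_samples).
    return ica.get_sources(epochs).get_data()
```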

[Figure: per-time-point feature significance curves from −0.1 s to 0.7 s for independent components IC 1-IC 3.]

Figure 5.7. First 3 non-artifactual ICA components of subject 2. A time point is selected as feature when the curve exceeds 0.99, i.e. the corresponding p-value falls below the 0.01 significance level. The selected regions correspond to the significant regions found in the offline analysis.

t-test features The encouraging results of [Visconti et al., 2008] motivated a similar procedure to identify regions of interest that could be used to reduce the dimensionality of the data while retaining the discriminable dimensions. For each epoch, the channels were concatenated into one single feature vector. A data point was marked as a possible feature candidate if the null hypothesis of equal means was rejected by a two-tailed t-test at significance level α. The indices of the retained data points were then used to cut out the significant regions of the epochs in the dataset. The features found by this procedure partly matched what could be expected from visual inspection of the independent components, and the procedure thus lends itself to being combined with ICA.
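A minimal sketch of this selection procedure, assuming the concatenated epochs are available as a NumPy array (function and variable names are illustrative):

```python
# t-test based feature selection: keep the time points where the null
# hypothesis of equal class means is rejected at level alpha.
import numpy as np
from scipy.stats import ttest_ind

def ttest_feature_mask(X, y, alpha=0.01):
    """X: (n_epochs, n_features) concatenated channel data, y: 0/1 labels."""
    _, p_values = ttest_ind(X[y == 0], X[y == 1], axis=0)  # two-tailed t-test
    return p_values < alpha          # boolean mask of selected features

# Usage: compute the mask on training data, then cut out the significant
# regions of every epoch, e.g. X_reduced = X[:, ttest_feature_mask(X_tr, y_tr)].
```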

PCA features To compare the performance of the proposed t-test method, a second feature extractor was added based on principal component analysis (PCA). We have used this method in our previous P300 experiments on time-domain sensor-space data with good success [Kaper, 2006; Lenhardt, 2006; Lenhardt et al., 2008; Finke et al., 2009]. Every 14×115 epoch in the training set was concatenated to form a 1×1610 vector. Applied to the full dataset, an n×1610 matrix was obtained, with n being the number of epochs in the training set. PCA was computed on this set, resulting in an n×n matrix, since all principal components beyond the n-th are essentially zero. The number of PCs was chosen based on the amount of variance they account for; the cutoff was set to 1−α, and multiple values for α were tested.
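Such a variance-based cutoff can be expressed, for instance, with scikit-learn's PCA, which interprets a fractional `n_components` exactly as a cumulative explained-variance threshold; the helper function below is illustrative.

```python
# PCA feature sketch: flattened 1 x 1610 epochs are projected onto the
# principal components that together explain a fraction (1 - alpha) of the
# training-set variance.
from sklearn.decomposition import PCA

def pca_features(X_train, X_test, alpha=0.01):
    """X_*: arrays of shape (n_epochs, 1610) with flattened epochs."""
    pca = PCA(n_components=1 - alpha)        # keep (1 - alpha) of the variance
    X_train_pc = pca.fit_transform(X_train)  # fit only on training data
    X_test_pc = pca.transform(X_test)
    return X_train_pc, X_test_pc
```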

5.4.3 Classification

The primary classification method used was Linear Discriminant Analysis (LDA) (cf. section 4.2.3). This method and its variants have been used in many studies (e.g. [Kaper, 2006; Visconti et al., 2008; Lotte and Cuntai, 2009]), and in a comparison of classification methods [Krusienski et al., 2006] LDA was among the best performing methods. Usually, an LDA classifier is trained using many P300 and non-P300 epochs (e.g. 100+). In the special case of ErrP training it is not possible to collect such an amount of data for erroneous epochs, since they occur at a very low rate during the experiment. Additionally, even more erroneous epochs than P300 epochs would be required to obtain a good estimate of their statistical properties. The reason is that for P300-speller tasks it is usually acceptable to accumulate epochs over multiple trials and classify on their mean, i.e. they do not rely on single-trial classification. In the case of error detection using error potentials, the input data consist of only one epoch, since the classified result is shown to the user only once. Due to the small number of training samples (usually in the order of 20-50) compared to the dimensionality of more than 1500, standard LDA cannot be used, since the covariance estimate needed for the calculation of the weight vector becomes singular. To improve the covariance estimates, shrinkage of the covariance matrices was selected as regularization method. To regularize the sample covariance matrix defined by

S = XX^T / n    (5.2)

a linear combination of the type

Σ = p1 I + p2 S    (5.3)
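As an illustration, a shrinkage-regularized LDA of this kind can be trained with scikit-learn, whose 'lsqr' solver combines the sample covariance with a scaled identity matrix in the spirit of equation (5.3). The Ledoit-Wolf 'auto' shrinkage used below is an assumption, not necessarily the choice of p1 and p2 made here, and the data are synthetic placeholders.

```python
# Sketch: shrinkage-regularized LDA for single-trial ErrP detection.
# scikit-learn's 'lsqr' solver with shrinkage uses a covariance estimate that is
# a convex combination of the sample covariance and a scaled identity matrix,
# analogous to equation (5.3); shrinkage='auto' picks the mixing coefficient
# via the Ledoit-Wolf formula (an assumed, not necessarily identical, choice).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X_train = rng.standard_normal((40, 1610))   # ~40 epochs, 1610 features: far
y_train = np.repeat([0, 1], 20)             # fewer samples than dimensions, so
                                            # the plain sample covariance is
                                            # singular and standard LDA fails.

clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
clf.fit(X_train, y_train)

# Single-trial decision: classify one new epoch as ErrP (1) or no ErrP (0).
x_epoch = rng.standard_normal((1, 1610))
print(clf.predict(x_epoch))
```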