

[Figure 7.5: schematic showing epochs grouped into subtrials 1-5, growing averaging windows of classification scores, decision function evaluations D(x) == false / D(x) == true, and the final output classification]

Figure 7.5. Illustration of the asynchronous EDS algorithm with dynamic window sizes. An averaging window grows with each additional subtrial until a maximum length is reached. For each subtrial, the current window serves as input to the decision function D, which either starts another subtrial if the contained values do not correlate significantly with the P300 score distribution, or outputs the result if the contained values are significantly similar to the P300 score distribution.

The procedure is illustrated in figure 7.5, which shows how each consecutive subtrial adds one more classification score to the window. For each added score, the window grows by one until it reaches its maximum size of 4. After this point, the oldest score observation is replaced by the most recent one. The decision function evaluates the current window for every subtrial and returns true whenever the sequence of scores is likely to stem from the P300 score normal distribution with parameters µ+ and σ+.
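The window bookkeeping and the decision step can be sketched in a few lines of Python. This is only an illustrative sketch: the maximum window size of 4 is taken from the description above, but the concrete significance test (here a two-sided z-test of the window mean against µ+ and σ+), the significance level and the example parameters are assumptions rather than the exact EDS implementation.

```python
import math
from collections import deque
from statistics import NormalDist

# Sketch of the dynamic-window decision step. Assumption: the window mean is
# tested against the P300 score distribution N(mu_plus, sigma_plus) with a
# two-sided z-test; the actual EDS significance test may differ in detail.

def make_decision_function(mu_plus, sigma_plus, max_window=4, alpha=0.05):
    window = deque(maxlen=max_window)               # oldest score drops out automatically
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2)  # two-sided critical value

    def decide(score):
        """Add one subtrial's classification score; return True to emit a result."""
        window.append(score)
        n = len(window)
        mean = sum(window) / n
        # standardize the window mean under N(mu_plus, sigma_plus / sqrt(n))
        z = (mean - mu_plus) / (sigma_plus / math.sqrt(n))
        # emit the classification when the window mean is not significantly
        # different from mu_plus, i.e. the scores plausibly stem from the
        # P300 score distribution
        return abs(z) < z_crit

    return decide

# Example run with made-up scores and distribution parameters:
decide = make_decision_function(mu_plus=0.5, sigma_plus=0.2)
for i, score in enumerate([-0.1, -0.2, 0.6, 0.4, 0.6], start=1):
    if decide(score):
        print(f"D(x) == true after subtrial {i}: output classification")
        break
    print(f"D(x) == false after subtrial {i}: start another subtrial")
```

In this toy run the early negative scores keep D(x) == false, and the decision is only emitted once the sliding window is dominated by P300-like scores.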

7.4 Results

          Pickup Task         Placement Task        No-Control Task
Subject   Acc.   Sym./min     Acc.   Sym./min       FPR    FPR [Zhang]   TTA
S1        80%    3.3 (2:20)   70%    1.1 (6:30)     0.4    0.2           12 s
S2        90%    2.3 (2:50)   70%    1.4 (5:00)     0.6    0.71          18 s
S3        100%   5.9 (1:40)   90%    1.7 (5:50)     0.2    0.39           8 s
S4        80%    2.3 (3:40)   60%    0.7 (8:20)     0.8    1.54          10 s
Mean      87.5%  3.45         72.5%  1.23           0.5    0.71          12 s

Table 7.1. BCI performance achieved in the study. The measures shown are accuracy (Acc.), correct symbols per minute (Sym./min), false positive rate (FPR) as wrong actions per minute during the no-control period, and time to active (TTA). For comparison, FPR [Zhang] reflects the results from [Zhang et al., 2008], who also used 4 subjects in their study; these rates were assessed with a different set of subjects.

different locations which required 20 selection commands (i.e. 10 for the pickup and 10 for the placement task). The task performance was measured by calculating the communication rate in correct symbols per minute, as well as the overall accuracy.

To evaluate the feasibility of the asynchronous protocol, the subjects' focus had to be distracted from the BCI stimulus presentation. For this reason, they were asked to fill out a short questionnaire and answer questions from the experiment supervisor after the experiment ended. During that time the BCI was still running and ready to receive commands. The evaluation criterion here was the number of actions conducted during this period. A perfect recognition of user inactivity could not be achieved, but the rate of 0.5 conducted actions per minute is slightly better than that of the method proposed by [Zhang et al., 2008], which was also used for the P300 wheelchair control of [Rebsamen et al., 2007]. Further, to evaluate how long the system takes to recognize that the subject is actively communicating with it again, one more object relocation had to be carried out. Table 7.1 summarizes the number of wrongly conducted actions per minute during the questionnaire period as well as the time it took the system to recognize a voluntary selection command of the user (time to active, TTA).
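For reference, the reported measures follow straightforward definitions. The helper functions below are a hypothetical reading of them (names and bookkeeping are not from the thesis), using the S1 placement run from table 7.1 as a plausibility check.

```python
# Hypothetical helpers for the reported performance measures; the exact
# bookkeeping used in the study is not spelled out here.

def accuracy(correct_selections: int, total_selections: int) -> float:
    """Fraction of selection commands that were recognized correctly."""
    return correct_selections / total_selections

def symbols_per_minute(correct_selections: int, elapsed_seconds: float) -> float:
    """Communication rate in correct symbols per minute."""
    return correct_selections / (elapsed_seconds / 60.0)

def false_positive_rate(wrong_actions: int, no_control_seconds: float) -> float:
    """Wrong actions per minute during the no-control (questionnaire) period."""
    return wrong_actions / (no_control_seconds / 60.0)

def time_to_active(first_attempt_s: float, recognized_s: float) -> float:
    """Time to active (TTA): seconds until a voluntary command is recognized."""
    return recognized_s - first_attempt_s

# Plausibility check with S1's placement run (7 of 10 correct in 6:30):
print(accuracy(7, 10), symbols_per_minute(7, 6 * 60 + 30))  # 0.7 and ~1.08
```

The computed rate of roughly 1.08 symbols per minute is consistent with the 1.1 reported in table 7.1 after rounding.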

For the best performing subject 3, a close to perfect result was achieved in both tasks, with accuracies of 100% for the pickup task and 90% for the placement task. As a general trend, however, the accuracy in the placement task is inferior to the pickup task, with predictions about 10-20 percentage points less accurate. The practical communication speed during the pickup task ranged from 2.3 to 5.9 symbols per minute, which is comparable to other state-of-the-art work (e.g. [Townsend et al., 2010]). In contrast, the communication speed during the placement task dropped by a factor of almost 3 (3.45 vs. 1.23 symbols per minute on average). Considering the much larger set of stimuli, this is not surprising: for the placement task a set of 64 stimuli has to be presented, whereas the pickup task only required 5 stimuli to be displayed. For this reason, it would be inaccurate to directly compare these results to other works that use a different number of stimuli and different time intervals between consecutive stimulus presentations.

The results of the no-control state detection are comparable to the method proposed by Zhang.

Number  Question
1       The system is intuitive
2       The system is exhausting
3       The system reacts as I expect it
4       The system is slow
5       The system usage is complicated
6       The system reacts differently when I talk or move
7       Concentration declines after the training phase
8       I had to concentrate hard to make it work
9       My commands were recognized reliably

Table 7.2. Translated questions from the questionnaire. The subjects had to assign a score to each question from 1 (does not agree) to 10 (fully agree).

Even though a slightly lower FPR was achieved by the EDS method, the rates per subject suggest that there is no significant difference in detection performance between the two methods. This is not surprising considering the similarity of the methods: both exploit the scalar score distribution of a classifier and statistical significance testing to assign class labels. The difference consists mainly in the fact that the EDS requires only P300 and non-P300 class epochs for training, while Zhang's method requires epochs acquired during user inactivity as a third class.
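The contrast in training requirements can be made explicit with a small sketch. The function names and the choice of fitting one normal distribution per class are illustrative assumptions; neither publication is quoted here.

```python
from statistics import NormalDist, mean, stdev

# Illustrative contrast of training data requirements (names are hypothetical):
# the EDS fits classifier score distributions from two classes only, while
# Zhang's method additionally needs a third class recorded during inactivity.

def train_eds(p300_scores, non_p300_scores):
    """Fit N(mu+, sigma+) and N(mu-, sigma-) from labelled training epochs."""
    return (NormalDist(mean(p300_scores), stdev(p300_scores)),
            NormalDist(mean(non_p300_scores), stdev(non_p300_scores)))

def train_zhang(p300_scores, non_p300_scores, idle_scores):
    """Same two distributions plus a third one for the no-control state."""
    return train_eds(p300_scores, non_p300_scores) + \
           (NormalDist(mean(idle_scores), stdev(idle_scores)),)

# e.g. with hypothetical classifier scores from a calibration recording:
dists = train_zhang([0.6, 0.4, 0.5], [-0.3, -0.1, -0.2], [0.1, -0.1, 0.0])
```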

To evaluate the practical usability of the system, relevant information from the questionnaires was considered. The appendix lists the original questionnaires in German, while table 7.2 shows a translated version of the questions.

The mean scores assigned by the subjects are shown in figure 7.6. For convenience, the mean values have been converted into performance values. The performance value is defined as a value that equals 1 if the given score corresponds 100% to the desired behaviour of the system. For instance, question 2 states that the system is exhausting. When a subject fully agrees with this question, a 10 will be assigned. Since the intention of a BCI developer would be to design a system that is not exhausting at all, the performance value will be 0. On the other hand, a system that is intuitive and easy to use (questions 1 and 5) is highly desirable, hence a user score of 10 would correspond to a performance value of 1. With respect to the intended behaviour, questions 2, 4, 5, 6, 7 and 8 are negative formulations which require negation (i.e. 11 − score) of the assigned scores to obtain unnormalized performance values.

After negation, all scores were normalized to the [0..1] interval.
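As a worked example, the snippet below reproduces the performance values of figure 7.6 from the mean scores, assuming the normalization is simply a division by 10 (an assumption that matches the plotted values).

```python
# Convert mean questionnaire scores (1-10) into performance values in [0, 1].
# Questions 2, 4, 5, 6, 7 and 8 are negatively formulated and are negated as
# 11 - score before normalization; division by 10 is assumed here because it
# reproduces the values shown in figure 7.6.

NEGATED_QUESTIONS = {2, 4, 5, 6, 7, 8}

def performance_value(question: int, mean_score: float) -> float:
    if question in NEGATED_QUESTIONS:
        mean_score = 11.0 - mean_score
    return mean_score / 10.0

mean_scores = {1: 7.75, 2: 8.5, 3: 7.75, 4: 6.5, 5: 2.75,
               6: 4.5, 7: 7.75, 8: 6.25, 9: 8.25}
for q, s in mean_scores.items():
    print(q, performance_value(q, s))
# yields 0.775, 0.25, 0.775, 0.45, 0.825, 0.65, 0.325, 0.475, 0.825
```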

The results show that the BCI did not perform well in the areas concerning concentration (questions 2, 7 and 8); none of these values reached 50% performance. All subjects stated that using the BCI becomes very exhausting after a while. It is not surprising that questions 7 and 8 did not receive favorable scores either, since an exhausted user will have difficulties concentrating on a specific task. However, the standard deviations for questions 7 and 8 are much larger than for question 2, which indicates that the subjects' opinions differed to a larger extent.


[Figure 7.6: bar charts over questions 1-9. (a) Mean scores: 7.75, 8.5, 7.75, 6.5, 2.75, 4.5, 7.75, 6.25, 8.25. (b) Mean performance values: 0.775, 0.25, 0.775, 0.45, 0.825, 0.65, 0.325, 0.475, 0.825.]

Figure 7.6. (a) Mean scores of the usability study with standard deviation bars. (b) Normalized performance values, with 1 corresponding to best and 0 to worst outcome for the specific question.

Another important factor is the perceived communication speed (question 4).

On average, the speed was rated as mediocre. If the score assigned by subject 2 were not included, the result would be even worse. Subject 3, however, even though he achieved statistically the highest speed and the most accurate results, assigned a performance of only 0.3 (score of 7), whereas subject 2 rated the speed with 0.7 (score of 3). This shows that the perceived communication speed is highly subject dependent; it remains unclear which factors influence the speed perception in this setting. On the more positive side, the remaining areas covered by the questionnaire performed much better. Questions 1 and 5 were aimed at assessing the ease of use of the BCI, with the expectation that both questions would receive consistent ratings. The mean scores shown in figure 7.6 reveal that questions 1 and 5 were consistently given similar performance ratings of 0.77 and more, with low standard deviations. These results indicate that the subjects had no difficulties using the BCI. A similar outcome could be ascertained for questions 3 and 9, both aimed at assessing the subjectively perceived BCI accuracy. Both questions were given similar ratings of more than 77%, with comparable standard deviations. Interestingly, the perceived accuracy even matches the real mean accuracy of 80% (averaged over all subjects and both tasks).

Thus it can be asserted that the subjects of this study had a good feeling for the overall accuracy and were not biased by the long and exhausting experiment.

The last topic to be evaluated was the no-control state detection, which was covered by question 6. Overall, the scores showed that most of the subjects did not notice a large difference when talking or moving (performance values of 0.5-0.7), except subject 2, who assigned a performance value of 0.3. The mean value still exceeds 60%.