
1.5 Estimators

1.5.1 Classifiers - choosing either-or

In the first studies using pattern recognition for myoelectric prosthetic control (refer to Section 1.3), discriminant analysis was used. Later, artificial neural networks (ANN) were introduced [26] and extensively used (e.g. [43, 45, 46]). Further popular choices for non-parametric classifiers are, for example, k-nearest neighbor (kNN) [29, 47] and support vector machines (SVM) [31, 34, 48, 49], whereas linear and quadratic discriminant analysis (LDA, QDA), Gaussian mixture models and hidden Markov models are some investigated examples of parametric classifiers. For an extensive comparison of different features, in combination with different estimators applied to sEMG and iEMG signals, the interested reader is referred to [50]. Further comparisons of different classifiers are found in [28, 47, 51].
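As an illustration of such a comparison, the following sketch (not taken from the referenced studies) evaluates a few of the classifiers named above with scikit-learn; the synthetic data merely stands in for windowed sEMG features, and the classifier parameters are illustrative choices.

```python
# Minimal sketch: cross-validated comparison of some classifiers mentioned above
# on placeholder data standing in for windowed sEMG features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for sEMG feature vectors (e.g. 8 channels x 4 features), 5 movement classes.
X, y = make_classification(n_samples=600, n_features=32, n_informative=10, n_redundant=0,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=1.0),
}

for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()   # mean 5-fold accuracy
    print(f"{name}: {acc:.3f}")
```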


Figure 1.1: Typical signal processing chain of a modern myoelectric control pattern recognition system for upper limb prostheses. The signals originate in the muscle fibers, propagate through the arm tissue to the skin where they are picked up as sEMG signals. The signals are filtered, amplified and digitized. After windowing, discriminative signal features are calculated. In case of large resulting dimensionality (many sEMG channels, many features), dimensionality reduction is performed prior to calculating an estimate of the performed movement. The estimator needs to be trained with a series of training data. The raw estimation outputs are postprocessed (e.g. again windowed, filtered...) and ultimately the prosthesis control commands are sent to the prosthetic control unit driving the actuators.
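The following minimal sketch illustrates the chain of Figure 1.1 from windowing onwards, under simplifying assumptions: the array `emg` stands in for already filtered and digitized sEMG, the window length, the features (mean absolute value and waveform length) and the LDA estimator are illustrative choices, and dimensionality reduction and postprocessing are omitted.

```python
# Sketch of the processing chain: windowing -> features -> trained estimator -> movement estimate.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def window_signal(emg, win_len=200, step=50):
    """Split the multichannel signal into overlapping analysis windows."""
    starts = range(0, emg.shape[0] - win_len + 1, step)
    return np.stack([emg[s:s + win_len] for s in starts])        # (n_win, win_len, n_ch)

def extract_features(windows):
    """Per window and channel: mean absolute value (MAV) and waveform length (WL)."""
    mav = np.mean(np.abs(windows), axis=1)                       # (n_win, n_ch)
    wl = np.sum(np.abs(np.diff(windows, axis=1)), axis=1)        # (n_win, n_ch)
    return np.hstack([mav, wl])                                  # (n_win, 2*n_ch)

# Training: features from labelled windows -> fit estimator (labels are placeholders here).
rng = np.random.default_rng(0)
emg_train = rng.standard_normal((10000, 8))                      # placeholder sEMG, 8 channels
labels = rng.integers(0, 4, size=len(window_signal(emg_train)))  # placeholder movement labels
clf = LinearDiscriminantAnalysis().fit(extract_features(window_signal(emg_train)), labels)

# Online use: window -> features -> movement estimate -> (postprocessing, control commands).
emg_new = rng.standard_normal((400, 8))
print(clf.predict(extract_features(window_signal(emg_new))))
```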

As for the features, in general, simple, computationally cheap, (hyper-)parameter-independent, well-studied and robust classifiers yield results comparable to more complex and sensitive methods and are therefore the methods of choice in a generic setup.

These classifiers were also preferred in this work. However, as will be discussed in Chapter 4, significant performance improvements can be achieved by specific, targeted modifications of existing methods for desired objectives.

Linear Discriminant Analysis (LDA) classifier

In this section the LDA classifier is introduced in detail, since it, as well as related algorithms such as CSP, PCA and KNFST (see later chapters for details on these methods), will be used extensively throughout this thesis. LDA attempts to express the dependent variable (class) by a linear combination of independent variables (features). This section has been adapted and extended from [38, 39].

LDA is closely related to the Fisher discriminant ratio (FDR), given as

\[
\mathrm{FDR} = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2} \qquad (1.6)
\]

where $\mu_i$ and $\sigma_i^2$ are the class means and variances in the transformed space, respectively. The criterion thus optimizes the feature separability (minimal within-class dispersion, maximal between-class dispersion) in the transformed space, resulting in an optimized setting for classification. Realizing that the variable $y$ in the transformed space is obtained from an input vector $\mathbf{x}$ with the linear projection vector $\mathbf{w}$ as

\[
y = \mathbf{w}^T \mathbf{x} + w_0 \qquad (1.7)
\]

(1.6) can be obtained in the transformed space from the input space by

\[
\mathrm{FDR}(\mathbf{w}) = \frac{\mathbf{w}^T \Sigma_b \mathbf{w}}{\mathbf{w}^T \Sigma_w \mathbf{w}} \qquad (1.8)
\]

where Σb is the covariance matrix between the class means of the different classes and Σw is the average covariance matrix of data belonging to the same class.
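As a small illustration (not part of the original derivation), the following sketch estimates $\Sigma_b$ and $\Sigma_w$ from labeled data and evaluates the criterion (1.8) for an arbitrary projection vector $\mathbf{w}$; the data and helper names are placeholders.

```python
# Sketch of the quantities in (1.8): between-class covariance of the class means,
# averaged within-class covariance, and the Fisher criterion for a projection w.
import numpy as np

def scatter_matrices(X, y):
    classes = np.unique(y)
    mu = X.mean(axis=0)
    class_means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Between-class covariance: outer products of (class mean - overall mean), averaged over classes.
    Sigma_b = sum(np.outer(m - mu, m - mu) for m in class_means) / len(classes)
    # Within-class covariance: average covariance of the data belonging to each class.
    Sigma_w = sum(np.cov(X[y == c], rowvar=False) for c in classes) / len(classes)
    return Sigma_b, Sigma_w

def fdr(w, Sigma_b, Sigma_w):
    return (w @ Sigma_b @ w) / (w @ Sigma_w @ w)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])  # two toy classes
y = np.repeat([0, 1], 100)
Sb, Sw = scatter_matrices(X, y)
print(fdr(np.array([1.0, 1.0, 1.0]), Sb, Sw))   # separability along the direction (1,1,1)
```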

In order to maximize the separability criterion given in (1.8), $\mathbf{w}$ has to be chosen such that $\mathrm{FDR}(\mathbf{w})$ is maximized:

\[
\mathbf{w}_{\mathrm{opt}} = \arg\max_{\mathbf{w}} \mathrm{FDR}(\mathbf{w}) \qquad (1.9)
\]


To make the problem well defined, the scaling factor of $\mathbf{w}$ has to be fixed, which can be achieved by setting the norm of $\mathbf{w}$ to 1: $\|\mathbf{w}\|_2^2 = \mathbf{w}^T\mathbf{w} = 1$.

This results in the constrained optimization problem:

\[
\arg\max_{\mathbf{w}} \frac{\mathbf{w}^T \Sigma_b \mathbf{w}}{\mathbf{w}^T \Sigma_w \mathbf{w}} \quad \text{subject to:} \quad \mathbf{w}^T\mathbf{w} = 1 \qquad (1.10)
\]

Eq. (1.10) is a standard mathematical problem and is known as quadratic programming. The standard technique yielding a closed-form solution for such a problem is to transform it into a Lagrangian formulation $L(\mathbf{w})$:

\[
L(\mathbf{w}) = \frac{\mathbf{w}^T \Sigma_b \mathbf{w}}{\mathbf{w}^T \Sigma_w \mathbf{w}} - \lambda\,(\mathbf{w}^T\mathbf{w} - 1) \qquad (1.11)
\]

where $\lambda$ is the Lagrange multiplier. Differentiating (1.11) with respect to $\mathbf{w}$ and setting the derivative to 0:

\[
\frac{\partial L(\mathbf{w})}{\partial \mathbf{w}} = 2\,\Sigma_w^{-1}\Sigma_b\,\mathbf{w} - 2\lambda\,\mathbf{w} = 0 \qquad (1.12)
\]

\[
\Rightarrow \quad \Sigma_w^{-1}\Sigma_b\,\mathbf{w} = \lambda\,\mathbf{w} \qquad (1.13)
\]

Eq. (1.13) is satisfied for all tuples $(\mathbf{w}, \lambda)$ with $\mathbf{w} \in \mathcal{W}$ and $\lambda \in \mathbb{R}$, where $\mathcal{W}$ is the set of eigenvectors of $\Sigma_w^{-1}\Sigma_b$ and $\lambda$ the corresponding eigenvalues. The magnitude of $\lambda$ is a measure of the separation quality of its corresponding $\mathbf{w}$. Thus, by taking the eigenvectors sorted by their corresponding eigenvalues from largest to smallest, the projection directions of optimal class separability, as measured by the Fisher criterion in the projected space, are obtained. Note that in a $C$-class problem, $\Sigma_b$ is calculated from the sum of outer products of the $C$ class mean vectors and thus its rank is at most $C-1$.

Therefore, there exist only $C-1$ eigenvectors of $\Sigma_w^{-1}\Sigma_b$ with non-zero eigenvalues.
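A minimal sketch of this eigendecomposition step is given below; it reuses the `scatter_matrices` helper from the previous sketch and solves the generalized symmetric eigenvalue problem with SciPy instead of forming $\Sigma_w^{-1}$ explicitly, which is an implementation choice rather than part of the derivation.

```python
# Sketch: LDA projection directions as eigenvectors of Sigma_w^-1 Sigma_b,
# sorted by eigenvalue, keeping at most C-1 directions.
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y):
    Sigma_b, Sigma_w = scatter_matrices(X, y)
    # Solve Sigma_b v = lambda * Sigma_w v (equivalent to the eigenproblem of Sigma_w^-1 Sigma_b).
    eigvals, eigvecs = eigh(Sigma_b, Sigma_w)
    order = np.argsort(eigvals)[::-1]          # largest separability first
    n_keep = len(np.unique(y)) - 1             # rank of Sigma_b is at most C-1
    return eigvecs[:, order[:n_keep]]          # columns of W in (1.14)

W = lda_projection(X, y)                       # X, y from the previous sketch
print(W.shape)                                 # (n_features, C-1)
```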

Plugging the obtained result for $\mathbf{w}$ into the linear transformation (1.7), we obtain a discriminative function $g(\mathbf{x})$:

\[
g(\mathbf{x}) = W^T\mathbf{x} + \mathbf{w}_0 \qquad (1.14)
\]

where $W$ contains the calculated eigenvectors aggregated as columns in a matrix and $\mathbf{w}_0$ are the corresponding biases. A sample of an unknown class can now be classified using this discriminative function.

LDA becomes the optimal Bayesian classifier under two important assumptions, as will be shown in the following:

In a general formulation, given a certain measurement $\mathbf{x}$, we should classify $\mathbf{x}$ to any of the $C$ classes $i$ if

\[
P(i|\mathbf{x}) > P(j|\mathbf{x}) \quad \forall j \neq i,\; i, j \in \{1 \dots C\} \qquad (1.15)
\]

Read as: "Decide that $\mathbf{x}$ stems from class $i$ if the probability of class label $i$ is higher than that of any other class, i.e. class $i$ has the highest probability".

Applying Bayes' rule, which relates the posterior to the class-conditional probability and the prior,

\[
P(i|\mathbf{x}) = \frac{P(\mathbf{x}|i)\,P(i)}{\sum_k P(\mathbf{x}|k)\,P(k)} \qquad (1.16)
\]

and plugging back into (1.15) delivers:

\[
\frac{P(\mathbf{x}|i)\,P(i)}{\sum_k P(\mathbf{x}|k)\,P(k)} > \frac{P(\mathbf{x}|j)\,P(j)}{\sum_k P(\mathbf{x}|k)\,P(k)} \qquad (1.17)
\]

Since the denominator $\sum_k P(\mathbf{x}|k)\,P(k)$ is positive and equal on both sides of the inequality, it can be eliminated, leaving:

\[
P(\mathbf{x}|i)\,P(i) > P(\mathbf{x}|j)\,P(j) \qquad (1.18)
\]

There are two possible ways to obtain the class-conditional probability density function $P(\mathbf{x}|\cdot)$: one way would be to estimate the distribution, but this requires a large amount of measurements which is usually hard to obtain. Another way is to assume a probability distribution. Usually the following assumption is made:

Assumption 1: All measurements $\mathbf{x}$ of a class $k$ stem from a multivariate Gaussian distribution, which is given by:

\[
P(\mathbf{x}|k) = \frac{1}{(2\pi)^{N/2}\,|\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,d_m(\mathbf{x}, \mu_k)^2\right) \qquad (1.19)
\]

where $N$ is the dimensionality of $\mathbf{x}$ and $d_m(\mathbf{x}, \mu_k)^2 = (\mathbf{x}-\mu_k)^T \Sigma_k^{-1} (\mathbf{x}-\mu_k)$ is the squared Mahalanobis distance between $\mathbf{x}$ and the class mean $\mu_k$.

Therefore, (1.18) can be re-written as:

\[
\frac{P(i)}{(2\pi)^{N/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,d_m(\mathbf{x}, \mu_i)^2\right) > \frac{P(j)}{(2\pi)^{N/2}\,|\Sigma_j|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,d_m(\mathbf{x}, \mu_j)^2\right) \qquad (1.20)
\]

Cancelling the common factor $(2\pi)^{N/2}$ on both sides and taking the natural logarithm leads to:

\[
\log P(i) - \tfrac{1}{2}\log|\Sigma_i| - \tfrac{1}{2}\,d_m(\mathbf{x}, \mu_i)^2 > \log P(j) - \tfrac{1}{2}\log|\Sigma_j| - \tfrac{1}{2}\,d_m(\mathbf{x}, \mu_j)^2 \qquad (1.21)
\]

\[
\log P(i) - \tfrac{1}{2}\log|\Sigma_i| - \tfrac{1}{2}(\mathbf{x}-\mu_i)^T\Sigma_i^{-1}(\mathbf{x}-\mu_i) > \log P(j) - \tfrac{1}{2}\log|\Sigma_j| - \tfrac{1}{2}(\mathbf{x}-\mu_j)^T\Sigma_j^{-1}(\mathbf{x}-\mu_j) \qquad (1.22)
\]

Equation (1.22) is referred to as quadratic discriminant analysis (QDA) and the separation surfaces between the classes are (hyper-)quadrics ($d_m$ has been resubstituted in (1.22) to make the quadratic term apparent). It can readily be used as a classification rule, with the mean vectors and covariance matrices approximated empirically using a set of training data.
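A sketch of using (1.22) directly as a classification rule could look as follows; the empirical estimation of priors, means and covariances and all names are illustrative, and `X`, `y` refer to the placeholder data of the earlier sketch.

```python
# Sketch of the QDA rule (1.22): evaluate the quadratic discriminant per class
# with empirically estimated priors, means and covariances.
import numpy as np

def qda_fit(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (np.log(len(Xc) / len(X)),        # log P(i)
                     Xc.mean(axis=0),                  # mu_i
                     np.cov(Xc, rowvar=False))         # Sigma_i
    return params

def qda_predict(x, params):
    def score(c):
        log_prior, mu, Sigma = params[c]
        d = x - mu
        return (log_prior
                - 0.5 * np.linalg.slogdet(Sigma)[1]    # -1/2 log|Sigma_i|
                - 0.5 * d @ np.linalg.solve(Sigma, d)) # -1/2 (x-mu)^T Sigma_i^-1 (x-mu)
    return max(params, key=score)                      # class with the largest discriminant

params = qda_fit(X, y)                                 # X, y from the earlier sketch
print(qda_predict(X[0], params), y[0])
```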

This equation can only be simplified further under the following assumption:

Assumption 2: All classes $k$ share the same covariance matrix: $\Sigma_i = \Sigma_j = \Sigma$.

Under this assumption the term $-\tfrac{1}{2}\log|\Sigma|$ is equal on both sides of (1.22) and cancels. Expanding the quadratic term, $\mathbf{x}^T\Sigma^{-1}\mathbf{x}$ is also equal on both sides (assuming the same covariance matrix for all classes!), and thus:

\[
\log P(i) - \tfrac{1}{2}\,\mu_i^T\Sigma^{-1}\mu_i + \mathbf{x}^T\Sigma^{-1}\mu_i > \log P(j) - \tfrac{1}{2}\,\mu_j^T\Sigma^{-1}\mu_j + \mathbf{x}^T\Sigma^{-1}\mu_j \qquad (1.30)
\]

Thus, for classifying an input vector $\mathbf{x}$, the function $g(\mathbf{x}, i)$ has to be evaluated for each class $i$:

\[
g(\mathbf{x}, i) = \underbrace{\log P(i) - \tfrac{1}{2}\,\mu_i^T\Sigma^{-1}\mu_i}_{C_g} + \mathbf{x}^T \underbrace{\Sigma^{-1}\mu_i}_{W_g} \qquad (1.31)
\]

where $C_g$ and $W_g$ can be calculated readily during the training of the classifier. The classification rule is then simply to evaluate (1.31) for each of the classes and classify $\mathbf{x}$ to class $i$ if

\[
g(\mathbf{x}, i) > g(\mathbf{x}, j) \quad \forall j \neq i,\; i, j \in \{1 \dots C\} \qquad (1.32)
\]

The value of $g(\mathbf{x}, i)$ is an indicator of the likelihood that this classification is correct. When the sum of the likelihoods over all classes is normalized to 1, each of the obtained values can be interpreted as a probability. Since in (1.22) the logarithm was taken for mathematical convenience, re-linearization of the likelihood values is advisable by exponentiation of each $g(\mathbf{x}, i)$ value, as proposed in [52].
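To illustrate the resulting linear decision rule and the probability interpretation, the following sketch precomputes the class-wise terms $C_g$ and $W_g$ during training, evaluates $g(\mathbf{x}, i)$ for all classes and re-linearizes the values by exponentiation and normalization; the shared covariance estimate and all names are illustrative, with `X`, `y` again taken from the earlier placeholder data.

```python
# Sketch of the LDA decision rule (1.30)-(1.32) with precomputed C_g, W_g,
# plus exponentiation and normalization of g(x, i) into pseudo-probabilities.
import numpy as np

def lda_fit(X, y):
    classes = np.unique(y)
    # Shared covariance: average of the per-class covariance matrices.
    Sigma = sum(np.cov(X[y == c], rowvar=False) for c in classes) / len(classes)
    Wg, Cg = [], []
    for c in classes:
        mu = X[y == c].mean(axis=0)
        w = np.linalg.solve(Sigma, mu)                               # Sigma^-1 mu_i
        Wg.append(w)
        Cg.append(np.log(len(X[y == c]) / len(X)) - 0.5 * mu @ w)    # log P(i) - 1/2 mu_i^T Sigma^-1 mu_i
    return classes, np.array(Wg), np.array(Cg)

def lda_scores(x, Wg, Cg):
    return Wg @ x + Cg                                               # g(x, i) for all classes i

classes, Wg, Cg = lda_fit(X, y)                                      # X, y from the earlier sketch
g = lda_scores(X[0], Wg, Cg)
probs = np.exp(g - g.max())                                          # exponentiate (shifted for numerical stability)
probs /= probs.sum()                                                 # normalize the likelihoods to sum to 1
print(classes[np.argmax(g)], probs)
```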