

4.2. Classification of Household Objects

4.2.1. Boosted Classification

As already mentioned, one of the most effective and most widely used boosting approaches is AdaBoost by Freund and Schapire (1997). The main difference between AdaBoost and most other boosting algorithms is its strategy of applying an adjustable weight $\omega_i$ to each sample in the training set $(\vec{x}_i, y_i, \omega_i)$, $i = 1, \ldots, n$, where $\vec{x}_i$ belongs to some instance space $X$ and $y_i$ denotes some label from the label space $Y$ of size $K$. The algorithm adapts the weights in each boosting step according to the results of the currently trained base classifier. The original AdaBoost approach was developed as a classifier for binary problems. Zhu et al. (2009) present a multi-class boosting algorithm called Stagewise Additive Modeling using a Multi-Class Exponential Loss Function (SAMME), which extends classical AdaBoost to be used for multi-class problems.

Both boosting algorithms call a given base classifier $h: X \to Y$ repeatedly in a series of rounds $t = 1, \ldots, T$. Initially all weights are set equally, but on each round the weights of incorrectly classified examples are increased so that the base classifier has to focus on the previously misclassified examples in the training set. The goodness of a base classifier's instance $h_t$ in round $t$ is measured by its error:

$$\epsilon_t = \frac{\sum_{i:\, h_t(\vec{x}_i) \neq y_i} \omega_{i,t}}{\sum_{i=1}^{n} \omega_{i,t}} \qquad (4.8)$$

Notice that the error depends on the sample weights on which the base classifier was trained. This leads to a particularly low error, and thereby to a high rating, if the previously misclassified samples are now classified correctly. If in practice the base classifier does not support weighted samples (which is the case for the chosen implementations), a subset of the training examples can be sampled according to the weights and then used for training.
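As an aside, a minimal numpy sketch of the weighted error of Equation 4.8 and of such a weight-based resampling step could look as follows; the function and variable names are chosen for illustration, and predictions, labels and features are assumed to be numpy arrays:

import numpy as np

def weighted_error(predictions, labels, weights):
    # Equation 4.8: sum of the weights of the misclassified samples,
    # normalized by the total weight.
    weights = np.asarray(weights, dtype=float)
    return weights[predictions != labels].sum() / weights.sum()

def resample_by_weight(features, labels, weights, subset_size, rng=None):
    # Draw a training subset with probability proportional to the sample
    # weights; previously misclassified (heavily weighted) samples are
    # drawn more often. Used when the base classifier itself cannot
    # handle weighted samples.
    rng = rng or np.random.default_rng()
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(labels), size=subset_size, replace=True, p=p)
    return features[idx], labels[idx]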

The trained set of $T$ base classifiers will ultimately yield the additive stagewise model $C(\vec{x})$ for prediction. Each classifier in this model has a parameter $\alpha_t$ which measures the importance assigned to $h_t$:

$$\alpha_t = \log\frac{1-\epsilon_t}{\epsilon_t} + \underbrace{\log(K-1)}_{\text{new term}} \qquad (4.9)$$


Note that $K$ denotes the total number of classes and that $\alpha_t$ gets larger as $\epsilon_t$ gets smaller. A classifier is accepted only if $\alpha_t > 0$. If this assumption does not hold, the classifier performs worse than random guessing and is discarded, because the original theory of boosting assumes that each used classifier performs better than random guessing. In a two-class problem this corresponds to an error of less than 0.5, which is equivalent to $\alpha_t > 0$. The additional term in Equation 4.9 is the main difference between classical AdaBoost and SAMME. It enables SAMME to be used for multi-class problems, which requires accepting base classifiers with an error greater than 0.5. More precisely: a classifier is accepted if $\epsilon_t < (K-1)/K$ (or $(1-\epsilon_t) > 1/K$). This ensures that a classifier performs better than random guessing with respect to the number of used classes. In the case of $K = 2$ the algorithm reduces to classical AdaBoost with a probability of 0.5 for each class. For more details and theoretical justification see Zhu et al. (2009). Figure 4.5 illustrates the relation between the classifier weight of the additive model and the classification error for different numbers of classes. For $K = 2$ the $\alpha$-curve is identical to classical boosting, whereas for more than two classes $\alpha$ is still positive for $\epsilon_t > 0.5$ (as long as $\epsilon_t < (K-1)/K$).
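The classifier weight of Equation 4.9 and this acceptance condition translate directly into code; the following sketch (with illustrative names) makes the dependence on the number of classes $K$ explicit:

import numpy as np

def classifier_weight(error, n_classes):
    # Equation 4.9: alpha_t = log((1 - eps_t) / eps_t) + log(K - 1).
    return np.log((1.0 - error) / error) + np.log(n_classes - 1)

def is_accepted(error, n_classes):
    # A base classifier is kept only if it beats random guessing,
    # i.e. eps_t < (K - 1) / K, which is equivalent to alpha_t > 0.
    return error < (n_classes - 1) / n_classes

# Example: an error of 0.6 is rejected for K = 2 but accepted for K = 5,
# since 0.6 < 4/5 and classifier_weight(0.6, 5) is still positive.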

Subsequently, the weights $\omega_i$ of the samples are adjusted according to the classification error in the current round:

$$\omega_{i,t+1} = \omega_{i,t} \cdot \begin{cases} e^{-\alpha_t} & : h_t(\vec{x}_i) = y_i \\ e^{\alpha_t} & : h_t(\vec{x}_i) \neq y_i \end{cases}, \quad i = 1, 2, \ldots, n. \qquad (4.10)$$

This equation increases the weight of examples misclassified by $h_t$ and decreases the weight of correctly classified samples. Thus, the weight tends to concentrate on "hard" examples. Finally, after passing all rounds, the additive stagewise model can be created, which consists of $T$ pairs of base classifier instances $h_t$ and classifier weights $\alpha_t$.
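A sketch of this update in numpy (again with illustrative names; the final re-normalization simply keeps the weights a probability distribution, as done after each round):

import numpy as np

def update_weights(weights, alpha_t, predictions, labels):
    # Equation 4.10: multiply by exp(-alpha_t) for correctly classified
    # samples and by exp(alpha_t) for misclassified ones, then
    # re-normalize so the weights remain a distribution.
    weights = np.asarray(weights, dtype=float)
    factor = np.where(predictions == labels, np.exp(-alpha_t), np.exp(alpha_t))
    new_weights = weights * factor
    return new_weights / new_weights.sum()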

For prediction one simply has to find the class $k$ for which the additive weight of the stagewise model is the highest. In other words, the classification result of the resulting strong classifier $H$ is

$$H(\vec{x}) = \operatorname*{argmax}_k \sum_{t=1}^{T} \alpha_t \cdot \begin{cases} 1 & : h_t(\vec{x}) = k \\ 0 & : h_t(\vec{x}) \neq k \end{cases} \qquad (4.11)$$

The complete procedure is shown as pseudo code in Algorithm 3.
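Equation 4.11 translates almost literally into code; the sketch below assumes the trained model is kept as a list of $(h_t, \alpha_t)$ pairs and that each $h_t$ returns an integer class index, both of which are illustrative assumptions:

import numpy as np

def predict(ensemble, x, n_classes):
    # Equation 4.11: every base classifier h_t votes for exactly one
    # class with weight alpha_t; the class with the highest accumulated
    # weight wins.
    votes = np.zeros(n_classes)
    for h_t, alpha_t in ensemble:      # ensemble holds (h_t, alpha_t) pairs
        votes[h_t(x)] += alpha_t       # h_t(x) assumed to return a class index
    return int(np.argmax(votes))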


Algorithm 3 SAMME pseudo code
Require: training set $(\vec{x}_i, y_i, \omega_i)$, $i = 1, \ldots, n$
Require: base classifier $h$
1: Initialize weights for all samples: $\omega_i \leftarrow \frac{1}{n}$, $i = 1, \ldots, n$
2: for all $t = 1, \ldots, T$ do
3:   Fit a base classifier $h_t(\vec{x})$ to the training data using weights $\omega_i$
4:   Compute weighted error for $h_t$: $\epsilon_t = \frac{\sum_{i:\, h_t(\vec{x}_i) \neq y_i} \omega_{i,t}}{\sum_{i=1}^{n} \omega_{i,t}}$
5:   Compute classifier weight: $\alpha_t = \log\frac{1-\epsilon_t}{\epsilon_t} + \log(K-1)$
6:   Set new sample weights: $\omega_{i,t+1} = \omega_{i,t} \cdot \begin{cases} e^{-\alpha_t} & : h_t(\vec{x}_i) = y_i \\ e^{\alpha_t} & : h_t(\vec{x}_i) \neq y_i \end{cases}$, $i = 1, 2, \ldots, n$
7:   Re-normalize $\omega_i$
8: end for
9: return $H(\vec{x}) = \operatorname*{argmax}_k \sum_{t=1}^{T} \alpha_t \cdot \begin{cases} 1 & : h_t(\vec{x}) = k \\ 0 & : h_t(\vec{x}) \neq k \end{cases}$
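A compact Python transcription of Algorithm 3 might look like the following. This is a sketch under the assumptions that the base classifiers expose scikit-learn-style fit/predict methods with a sample_weight argument and that labels are integers $0, \ldots, K-1$; make_classifier is a hypothetical factory for fresh base classifiers:

import numpy as np

def train_samme(make_classifier, features, labels, n_rounds, n_classes):
    # Sketch of the training loop of Algorithm 3.
    n = len(labels)
    weights = np.full(n, 1.0 / n)                        # step 1
    ensemble = []
    for _ in range(n_rounds):                            # step 2
        h = make_classifier()
        h.fit(features, labels, sample_weight=weights)   # step 3
        pred = h.predict(features)
        err = weights[pred != labels].sum() / weights.sum()   # step 4
        if err >= (n_classes - 1) / n_classes:
            continue             # worse than random guessing: discard
        err = max(err, 1e-12)    # guard against a perfect classifier
        alpha = np.log((1 - err) / err) + np.log(n_classes - 1)   # step 5
        weights = weights * np.where(pred == labels,
                                     np.exp(-alpha), np.exp(alpha))  # step 6
        weights = weights / weights.sum()                # step 7
        ensemble.append((h, alpha))
    return ensemble              # prediction then follows Equation 4.11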


Figure 4.5.: The plot shows the relationship between the classification error $\epsilon_t$ (horizontal axis) and the classifier weight $\alpha$ (vertical axis) for 2, 3 and 5 classes. For two classes, the graph shows the same behavior as classical AdaBoost. For a higher number of classes, the base classifiers are allowed to have a higher error on the training samples.

Multiple Classifiers

The SAMME approach supports multi-class data but is not able to process more than one classifier type. In order to achieve this, the SAMME algorithm is modified to perform an exhaustive search during steps 3 and 4 (see Algorithm 3). It trains each available classifier-feature combination from a set of classifier settings and features. The combination that reaches the smallest error $\epsilon_t$ on the weighted training set is added to the additive model of Equation 4.11. This algorithm will be denoted as Exhaustive SAMME (E-SAMME) (see Algorithm 4).

All combinations of base classifier and feature are taken into account in each training step. Thus, the combination that is optimal with respect to the available set of classifier settings and features can be found for the weighted training set in each step.
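The combination search, later shown as steps 3 to 8 of Algorithm 4, could be sketched as follows. The representation of the classifier-feature combinations as (feature_extractor, make_classifier) pairs is an illustrative assumption, and for brevity the sketch passes the weights directly to fit; in the described system the weights would instead be used to resample a subset, as explained below:

import numpy as np

def best_combination(combinations, raw_samples, labels, weights):
    # Train every feature-classifier combination on the weighted data
    # and keep the one with the smallest weighted error eps_min,t.
    weights = np.asarray(weights, dtype=float)
    eps_min, best = 1.0, None
    for extract_features, make_classifier in combinations:
        X = extract_features(raw_samples)        # compute the feature
        h = make_classifier()
        h.fit(X, labels, sample_weight=weights)
        eps = weights[h.predict(X) != labels].sum() / weights.sum()
        if eps < eps_min:
            eps_min, best = eps, (extract_features, h)
    return best, eps_min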

However, not every type of base classifier supports training with weighted samples. As already mentioned above, the weights are then used to resample the training set. For the described system, two alternatives were implemented:


Algorithm 4 E-SAMME pseudo code
Require: training set $(\vec{x}_i, y_i, \omega_i)$, $i = 1, \ldots, n$
Require: set of feature-classifier combinations $h_c$, $c = 1, \ldots, C$
1: Initialize weights for all samples: $\omega_i \leftarrow \frac{1}{n}$, $i = 1, \ldots, n$
2: for all $t = 1, \ldots, T$ do
3:   Initialize $\epsilon_{\min,t} = 1$
4:   for all $c = 1, \ldots, C$ do
5:     Fit a base classifier $h_{c,t}(\vec{x})$ to the training data using weights $\omega_i$
6:     Compute weighted error for $h_{c,t}$: $\epsilon_{c,t} = \frac{\sum_{i:\, h_{c,t}(\vec{x}_i) \neq y_i} \omega_{i,t}}{\sum_{i=1}^{n} \omega_{i,t}}$
7:     Determine the classifier with minimum error: $\epsilon_{\min,t} \leftarrow \min(\epsilon_{c,t}, \epsilon_{\min,t})$, $h'_t \leftarrow \operatorname*{argmin}_{h_{c,t}}(\epsilon_{c,t}, \epsilon_{\min,t})$
8:   end for
9:   Compute classifier weight: $\alpha_t = \log\frac{1-\epsilon_{\min,t}}{\epsilon_{\min,t}} + \log(K-1)$
10:  Set new sample weights: $\omega_{i,t+1} = \omega_{i,t} \cdot \begin{cases} e^{-\alpha_t} & : h'_t(\vec{x}_i) = y_i \\ e^{\alpha_t} & : h'_t(\vec{x}_i) \neq y_i \end{cases}$, $i = 1, 2, \ldots, n$
11:  Re-normalize $\omega_i$
12: end for
13: return $H(\vec{x}) = \operatorname*{argmax}_k \sum_{t=1}^{T} \alpha_t \cdot \begin{cases} 1 & : h'_t(\vec{x}) = k \\ 0 & : h'_t(\vec{x}) \neq k \end{cases}$


Roulette Wheel Selection  This method randomly selects a subset from the set of weighted samples: a random number $n_r$ between 0 and 1 is drawn and the (normalized) sample weights are summed up until the sum exceeds $n_r$. The corresponding sample is added to the subset, and this is repeated until a maximal number of samples $Z$ is reached.

Samples with high weights are more likely to be chosen, which has the effect that the classifier is trained more often on samples that are badly classified.

Maximum Selection  This method chooses the $Z$ samples with the highest weights as the subset for training. This has the effect that no sample is chosen more than once, so the base classifier is trained on a larger variety of samples, which leads to better generalization. The number $Z$ is computed via a sample factor $z_f$: $Z = z_f \cdot n$.
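Both strategies reduce to a few lines of numpy. The sketch below uses illustrative names and returns index arrays from which the training subset would be built; the weighted draw via np.random.Generator.choice is equivalent to the cumulative-sum procedure described above:

import numpy as np

def roulette_wheel_selection(weights, sample_factor, rng=None):
    # Draw Z = z_f * n indices with probability proportional to the
    # sample weights; heavily weighted samples may be picked repeatedly.
    rng = rng or np.random.default_rng()
    w = np.asarray(weights, dtype=float)
    subset_size = int(sample_factor * len(w))
    return rng.choice(len(w), size=subset_size, replace=True, p=w / w.sum())

def maximum_selection(weights, sample_factor):
    # Take the Z samples with the highest weights; each sample is chosen
    # at most once, which favors generalization.
    w = np.asarray(weights, dtype=float)
    subset_size = int(sample_factor * len(w))
    return np.argsort(w)[::-1][:subset_size]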

Since the base classifiers are trained only on the weighted subsets (step 5) while the training error is calculated on the complete training set (step 6), this error also serves as a generalization measure for classifier $h_{c,t}$.

Prediction in E-SAMME works analogously to the common AdaBoost method. Each base classifier in the additive model of E-SAMME predicts the identity of the given candidate. For each class $k$, the classifier weights $\alpha_t$ of the base classifiers that predict class $k$ are summed up. Finally, the class with the highest accumulated classifier weight is chosen as the final prediction result (see step 13).

In the actual implementation used on the robot (e.g. in RoboCup@Home), the set of classes includes an additional unknown category, so there are $K+1$ classes instead of $K$. Depending on the base classifier and the specific task the classification is used for, the unknown class either emerges implicitly through thresholding the prediction results of the base classifiers or is obtained by explicit demonstration of negative samples.
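As one possible illustration of such a thresholding variant, the accumulated vote of Equation 4.11 could be turned into a confidence score and compared against a cut-off. The threshold value and the UNKNOWN index below are purely hypothetical and not the actual implementation described here:

import numpy as np

UNKNOWN = -1   # illustrative index for the additional unknown category

def predict_with_unknown(ensemble, x, n_classes, min_confidence=0.5):
    # Hypothetical thresholding variant of Equation 4.11: the winning
    # class is returned only if its share of the accumulated classifier
    # weight exceeds min_confidence, otherwise the candidate is reported
    # as unknown.
    votes = np.zeros(n_classes)
    for h_t, alpha_t in ensemble:
        votes[h_t(x)] += alpha_t
    k = int(np.argmax(votes))
    total = votes.sum()
    confidence = votes[k] / total if total > 0 else 0.0
    return k if confidence >= min_confidence else UNKNOWN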