

4.2. Classification of Household Objects

4.2.1. Boosted Classification

As already mentioned, one of the most effective and most widely used boosting approaches is AdaBoost by Freund and Schapire (1997). The main difference between AdaBoost and most other boosting algorithms is its strategy of applying an adjustable weight $\omega_i$ to each sample in the training set $(\vec{x}_i, y_i, \omega_i)$, $i = 1, \ldots, n$, where $\vec{x}_i$ belongs to some instance space $X$ and $y_i$ denotes some label from the label space $Y$ of size $K$. The algorithm adapts the weights in each boosting step according to the results of the currently trained base classifier. The original AdaBoost approach was developed as a classifier for binary problems. Zhu et al. (2009) present a multi-class boosting algorithm called Stagewise Additive Modeling using a Multi-Class Exponential Loss Function (SAMME), which extends classical AdaBoost to be used for multi-class problems.

Both boosting algorithms call a given base classifier $h: X \to Y$ repeatedly in a series of rounds $t = 1, \ldots, T$. Initially all weights are set equally, but on each round the weights of incorrectly classified examples are increased so that the base classifier has to focus on the previously misclassified examples in the training set. The goodness of a base classifier's instance $h_t$ in round $t$ is measured by its error:

$$\epsilon_t = \frac{\sum_{i:\, h_t(\vec{x}_i) \neq y_i} \omega_{i,t}}{\sum_{i=1}^{n} \omega_{i,t}} \qquad (4.8)$$

Notice that the error depends on the sample weights on which the base classifier was trained. This leads to a particularly low error, and thereby to a high rating, if the previously misclassified samples are now classified correctly. If in practice the base classifier does not support weighted samples (which is the case for the chosen implementations), a subset of the training examples can be sampled according to the weights and then used for training.
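As an aside, a minimal numpy sketch of the weighted error of Equation 4.8 and of such a weight-based resampling step could look as follows; the function and variable names are chosen for illustration, and predictions, labels and features are assumed to be numpy arrays:

import numpy as np

def weighted_error(predictions, labels, weights):
    # Equation 4.8: sum of the weights of the misclassified samples,
    # normalized by the total weight.
    weights = np.asarray(weights, dtype=float)
    return weights[predictions != labels].sum() / weights.sum()

def resample_by_weight(features, labels, weights, subset_size, rng=None):
    # Draw a training subset with probability proportional to the sample
    # weights; previously misclassified (heavily weighted) samples are
    # drawn more often. Used when the base classifier itself cannot
    # handle weighted samples.
    rng = rng or np.random.default_rng()
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(labels), size=subset_size, replace=True, p=p)
    return features[idx], labels[idx]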

The trained set of $T$ base classifiers will ultimately yield the additive stagewise model $C(\vec{x})$ for prediction. Each classifier in this model has a parameter $\alpha_t$ which measures the importance assigned to $h_t$:

$$\alpha_t = \log\frac{1-\epsilon_t}{\epsilon_t} + \underbrace{\log(K-1)}_{\text{new term}} \qquad (4.9)$$


Note that $K$ denotes the total number of classes and that $\alpha_t$ gets larger as $\epsilon_t$ gets smaller. A classifier is accepted only if $\alpha_t > 0$. If this assumption does not hold, the classifier performs worse than random guessing and is discarded, because the original theory of boosting assumes that each used classifier performs better than random guessing. In a two-class problem this corresponds to an error of less than 0.5, which is equivalent to $\alpha_t > 0$. The additional term in Equation 4.9 is the main difference between classical AdaBoost and SAMME. It enables SAMME to be used for multi-class problems, which requires accepting base classifiers with an error greater than 0.5. More precisely: a classifier is accepted if $\epsilon_t < (K-1)/K$ (or $(1-\epsilon_t) > 1/K$). This ensures that a classifier performs better than random guessing with respect to the number of used classes. In the case of $K = 2$ the algorithm reduces to classical AdaBoost with a probability of 0.5 for each class. For more details and theoretical justification see Zhu et al. (2009). Figure 4.5 illustrates the relation between the classifier weight of the additive model and the classification error for different numbers of classes. For $K = 2$ the $\alpha$-curve is identical to classical boosting, whereas for more than two classes $\alpha$ is still positive for $\epsilon_t > 0.5$ (as long as $\epsilon_t < (K-1)/K$).
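The classifier weight of Equation 4.9 and this acceptance condition translate directly into code; the following sketch (with illustrative names) makes the dependence on the number of classes $K$ explicit:

import numpy as np

def classifier_weight(error, n_classes):
    # Equation 4.9: alpha_t = log((1 - eps_t) / eps_t) + log(K - 1).
    return np.log((1.0 - error) / error) + np.log(n_classes - 1)

def is_accepted(error, n_classes):
    # A base classifier is kept only if it beats random guessing,
    # i.e. eps_t < (K - 1) / K, which is equivalent to alpha_t > 0.
    return error < (n_classes - 1) / n_classes

# Example: an error of 0.6 is rejected for K = 2 but accepted for K = 5,
# since 0.6 < 4/5 and classifier_weight(0.6, 5) is still positive.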

Subsequently, the weights $\omega_i$ of the samples are adjusted according to the classification error in the current round:

$$\omega_{i,t+1} = \omega_{i,t} \cdot \begin{cases} e^{-\alpha_t} & : h_t(\vec{x}_i) = y_i \\ e^{\alpha_t} & : h_t(\vec{x}_i) \neq y_i \end{cases}, \quad i = 1, 2, \ldots, n. \qquad (4.10)$$

This equation increases the weight of examples misclassified by $h_t$ and decreases the weight of correctly classified samples. Thus, the weight tends to concentrate on "hard" examples. Finally, after passing all rounds, the additive stagewise model can be created, which consists of $T$ pairs of base classifier instances $h_t$ and classifier weights $\alpha_t$.
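A sketch of this update in numpy (again with illustrative names; the final re-normalization simply keeps the weights a probability distribution, as done after each round):

import numpy as np

def update_weights(weights, alpha_t, predictions, labels):
    # Equation 4.10: multiply by exp(-alpha_t) for correctly classified
    # samples and by exp(alpha_t) for misclassified ones, then
    # re-normalize so the weights remain a distribution.
    weights = np.asarray(weights, dtype=float)
    factor = np.where(predictions == labels, np.exp(-alpha_t), np.exp(alpha_t))
    new_weights = weights * factor
    return new_weights / new_weights.sum()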

For prediction one simply has to find the class $k$ for which the additive weight of the stagewise model is the highest. In other words, the classification result of the resulting strong classifier $H$ is

$$H(\vec{x}) = \operatorname*{argmax}_k \sum_{t=1}^{T} \alpha_t \cdot \begin{cases} 1 & : h_t(\vec{x}) = k \\ 0 & : h_t(\vec{x}) \neq k \end{cases} \qquad (4.11)$$

The complete procedure is shown as pseudo code in Algorithm 3.
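Equation 4.11 translates almost literally into code; the sketch below assumes the trained model is kept as a list of $(h_t, \alpha_t)$ pairs and that each $h_t$ returns an integer class index, both of which are illustrative assumptions:

import numpy as np

def predict(ensemble, x, n_classes):
    # Equation 4.11: every base classifier h_t votes for exactly one
    # class with weight alpha_t; the class with the highest accumulated
    # weight wins.
    votes = np.zeros(n_classes)
    for h_t, alpha_t in ensemble:      # ensemble holds (h_t, alpha_t) pairs
        votes[h_t(x)] += alpha_t       # h_t(x) assumed to return a class index
    return int(np.argmax(votes))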


Algorithm 3 SAMME pseudo code
Require: training set $(\vec{x}_i, y_i, \omega_i)$, $i = 1, \ldots, n$
Require: base classifier $h$
1: Initialize weights for all samples: $\omega_i \leftarrow \frac{1}{n}$, $i = 1, \ldots, n$
2: for all $t = 1, \ldots, T$ do
3:   Fit a base classifier $h_t(\vec{x})$ to the training data using weights $\omega_i$
4:   Compute weighted error for $h_t$: $\epsilon_t = \frac{\sum_{i:\, h_t(\vec{x}_i) \neq y_i} \omega_{i,t}}{\sum_{i=1}^{n} \omega_{i,t}}$
5:   Compute classifier weight: $\alpha_t = \log\frac{1-\epsilon_t}{\epsilon_t} + \log(K-1)$
6:   Set new sample weights: $\omega_{i,t+1} = \omega_{i,t} \cdot \begin{cases} e^{-\alpha_t} & : h_t(\vec{x}_i) = y_i \\ e^{\alpha_t} & : h_t(\vec{x}_i) \neq y_i \end{cases}$, $i = 1, 2, \ldots, n$
7:   Re-normalize $\omega_i$
8: end for
9: return $H(\vec{x}) = \operatorname*{argmax}_k \sum_{t=1}^{T} \alpha_t \cdot \begin{cases} 1 & : h_t(\vec{x}) = k \\ 0 & : h_t(\vec{x}) \neq k \end{cases}$
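A compact Python transcription of Algorithm 3 might look like the following. This is a sketch under the assumptions that the base classifiers expose scikit-learn-style fit/predict methods with a sample_weight argument and that labels are integers $0, \ldots, K-1$; make_classifier is a hypothetical factory for fresh base classifiers:

import numpy as np

def train_samme(make_classifier, features, labels, n_rounds, n_classes):
    # Sketch of the training loop of Algorithm 3.
    n = len(labels)
    weights = np.full(n, 1.0 / n)                        # step 1
    ensemble = []
    for _ in range(n_rounds):                            # step 2
        h = make_classifier()
        h.fit(features, labels, sample_weight=weights)   # step 3
        pred = h.predict(features)
        err = weights[pred != labels].sum() / weights.sum()   # step 4
        if err >= (n_classes - 1) / n_classes:
            continue             # worse than random guessing: discard
        err = max(err, 1e-12)    # guard against a perfect classifier
        alpha = np.log((1 - err) / err) + np.log(n_classes - 1)   # step 5
        weights = weights * np.where(pred == labels,
                                     np.exp(-alpha), np.exp(alpha))  # step 6
        weights = weights / weights.sum()                # step 7
        ensemble.append((h, alpha))
    return ensemble              # prediction then follows Equation 4.11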


Figure 4.5.: The plot shows the relationship between the classification error $\epsilon_t$ (horizontal axis) and the classifier weight $\alpha$ (vertical axis) for 2, 3 and 5 classes. For two classes, the graph shows the same behavior as classical AdaBoost. For a higher number of classes, the base classifiers are allowed to have a higher error on the training samples.

Multiple Classifiers

The SAMME approach supports multi-class data but is not able to process more than one classifier type. In order to achieve this, the SAMME algorithm is modified to perform an exhaustive search during steps 3 and 4 (see Algorithm 3). It trains each available classifier-feature combination from a set of classifier settings and features. The combination that reaches the smallest error $\epsilon_t$ on the weighted training set is added to the additive model of Equation 4.11. This algorithm will be denoted as Exhaustive SAMME (E-SAMME) (see Algorithm 4).

All combinations of base classifier and feature are taken into account in each training step. Thus, the combination that is optimal with respect to the available set of classifier settings and features can be found for the weighted training set in each step.
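The combination search, later shown as steps 3 to 8 of Algorithm 4, could be sketched as follows. The representation of the classifier-feature combinations as (feature_extractor, make_classifier) pairs is an illustrative assumption, and for brevity the sketch passes the weights directly to fit; in the described system the weights would instead be used to resample a subset, as explained below:

import numpy as np

def best_combination(combinations, raw_samples, labels, weights):
    # Train every feature-classifier combination on the weighted data
    # and keep the one with the smallest weighted error eps_min,t.
    weights = np.asarray(weights, dtype=float)
    eps_min, best = 1.0, None
    for extract_features, make_classifier in combinations:
        X = extract_features(raw_samples)        # compute the feature
        h = make_classifier()
        h.fit(X, labels, sample_weight=weights)
        eps = weights[h.predict(X) != labels].sum() / weights.sum()
        if eps < eps_min:
            eps_min, best = eps, (extract_features, h)
    return best, eps_min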

However, not every type of base classifier supports training with weighted samples. As already mentioned above, the weights are then used to resample the training set. For the described system, two alternatives were implemented:


Algorithm 4 E-SAMME pseudo code
Require: training set $(\vec{x}_i, y_i, \omega_i)$, $i = 1, \ldots, n$
Require: set of feature-classifier combinations $h_c$, $c = 1, \ldots, C$
1: Initialize weights for all samples: $\omega_i \leftarrow \frac{1}{n}$, $i = 1, \ldots, n$
2: for all $t = 1, \ldots, T$ do
3:   Initialize $\epsilon_{\min,t} = 1$
4:   for all $c = 1, \ldots, C$ do
5:     Fit a base classifier $h_{c,t}(\vec{x})$ to the training data using weights $\omega_i$
6:     Compute weighted error for $h_{c,t}$: $\epsilon_{c,t} = \frac{\sum_{i:\, h_{c,t}(\vec{x}_i) \neq y_i} \omega_{i,t}}{\sum_{i=1}^{n} \omega_{i,t}}$
7:     Determine the classifier with minimum error: $\epsilon_{\min,t} \leftarrow \min(\epsilon_{c,t}, \epsilon_{\min,t})$, $h'_t \leftarrow \operatorname*{argmin}_{h_{c,t}}(\epsilon_{c,t}, \epsilon_{\min,t})$
8:   end for
9:   Compute classifier weight: $\alpha_t = \log\frac{1-\epsilon_{\min,t}}{\epsilon_{\min,t}} + \log(K-1)$
10:  Set new sample weights: $\omega_{i,t+1} = \omega_{i,t} \cdot \begin{cases} e^{-\alpha_t} & : h'_t(\vec{x}_i) = y_i \\ e^{\alpha_t} & : h'_t(\vec{x}_i) \neq y_i \end{cases}$, $i = 1, 2, \ldots, n$
11:  Re-normalize $\omega_i$
12: end for
13: return $H(\vec{x}) = \operatorname*{argmax}_k \sum_{t=1}^{T} \alpha_t \cdot \begin{cases} 1 & : h'_t(\vec{x}) = k \\ 0 & : h'_t(\vec{x}) \neq k \end{cases}$


Roulette Wheel Selection  This method randomly selects a subset from the set of weighted samples: a random number $n_r$ between 0 and 1 is drawn and the (normalized) sample weights are summed up until the sum exceeds $n_r$. The corresponding sample is added to the subset, and this is repeated until a maximal number of samples $Z$ is reached.

Samples with high weights are more likely to be chosen, which has the effect that the classifier is trained more often on samples that are badly classified.

Maximum Selection  This method chooses the $Z$ samples with the highest weights as the subset for training. This has the effect that no sample is chosen more than once, so the base classifier is trained on a larger variety of samples, which leads to better generalization. The number $Z$ is computed via a sample factor $z_f$: $Z = z_f \cdot n$.
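Both strategies reduce to a few lines of numpy. The sketch below uses illustrative names and returns index arrays from which the training subset would be built; the weighted draw via np.random.Generator.choice is equivalent to the cumulative-sum procedure described above:

import numpy as np

def roulette_wheel_selection(weights, sample_factor, rng=None):
    # Draw Z = z_f * n indices with probability proportional to the
    # sample weights; heavily weighted samples may be picked repeatedly.
    rng = rng or np.random.default_rng()
    w = np.asarray(weights, dtype=float)
    subset_size = int(sample_factor * len(w))
    return rng.choice(len(w), size=subset_size, replace=True, p=w / w.sum())

def maximum_selection(weights, sample_factor):
    # Take the Z samples with the highest weights; each sample is chosen
    # at most once, which favors generalization.
    w = np.asarray(weights, dtype=float)
    subset_size = int(sample_factor * len(w))
    return np.argsort(w)[::-1][:subset_size]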

Since the base classifiers are trained only on the weighted subsets (step 5) while the training error is calculated on the complete training set (step 6), this error also serves as a generalization measure for classifier $h_{c,t}$.

Prediction in E-SAMME works analogously to the common AdaBoost method. Each base classifier in the additive model of E-SAMME predicts the identity of the given candidate. For each class $k$, the classifier weights $\alpha_t$ of the base classifiers that predict class $k$ are summed up. Finally, the class with the highest accumulated classifier weight is chosen as the final prediction result (see step 13).

In the actual implementation used on the robot (e.g. in RoboCup@Home), the set of classes includes an additional unknown category, so there are $K+1$ classes instead of $K$. Depending on the base classifier and the specific task the classification is used for, the unknown class either emerges implicitly through thresholding the prediction results of the base classifiers or is obtained by explicit demonstration of negative samples.
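As one possible illustration of such a thresholding variant, the accumulated vote of Equation 4.11 could be turned into a confidence score and compared against a cut-off. The threshold value and the UNKNOWN index below are purely hypothetical and not the actual implementation described here:

import numpy as np

UNKNOWN = -1   # illustrative index for the additional unknown category

def predict_with_unknown(ensemble, x, n_classes, min_confidence=0.5):
    # Hypothetical thresholding variant of Equation 4.11: the winning
    # class is returned only if its share of the accumulated classifier
    # weight exceeds min_confidence, otherwise the candidate is reported
    # as unknown.
    votes = np.zeros(n_classes)
    for h_t, alpha_t in ensemble:
        votes[h_t(x)] += alpha_t
    k = int(np.argmax(votes))
    total = votes.sum()
    confidence = votes[k] / total if total > 0 else 0.0
    return k if confidence >= min_confidence else UNKNOWN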