7.2 Geometry-based Method
7.2.1 A Method of 49 Facial Points
In this section, I propose a geometry-based approach that uses the locations of 49 facial points relative to their locations in a person-specific neutral model. To minimize individual variation, I weight each displacement with respect to the person-specific face configuration to produce the final feature descriptor.

Figure 7.2: The structure of the proposed geometric-based algorithm for facial expression recognition (input image, face detection, facial point localization, geometric-based feature extraction, machine learning method, facial expression).

Figure 7.3 shows the 49 facial points used here; I developed an approach to locate them automatically, as detailed in Ch. 5. Before extracting the geometric features, I estimate the affine transformation between the facial points in the processed frame and their previously known person-specific locations in the neutral state, under the assumption that the processed face is in a near-frontal pose. To this end, I use three points that are not influenced by deformations of the facial muscles: the two eye centers and the nose bottom point, as shown in Figure 7.3b. The locations of the three points ($\mathbf{p}_{3p} = \{p_{ecr}, p_{ecl}, p_{nb}\} \in \mathbb{R}^{2\times 3}$) are inferred from the 49 points in Figure 7.3a as follows.
$$
p_{ecr} = \frac{p_{19} + p_{22}}{2}, \qquad p_{ecl} = \frac{p_{25} + p_{28}}{2}, \qquad p_{nb} = p_{16} \tag{7.1}
$$
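Eq. (7.1) can be sketched in a few lines of NumPy. This is only an illustration: the function name `reference_points` and the 2x49 array layout (row 0 holds x, row 1 holds y, columns follow the 0-based indices of Figure 7.3a) are assumptions and are not taken from the text.

```python
import numpy as np

def reference_points(p49):
    """Return the 2x3 matrix [p_ecr, p_ecl, p_nb] of Eq. (7.1).

    p49 is assumed to be a 2x49 array of (x, y) landmark coordinates
    whose columns follow the 0-based numbering of Figure 7.3a.
    """
    p49 = np.asarray(p49, dtype=float)
    if p49.shape != (2, 49):
        raise ValueError("expected a 2x49 array of landmark coordinates")
    p_ecr = (p49[:, 19] + p49[:, 22]) / 2.0  # first eye center (subscript ecr)
    p_ecl = (p49[:, 25] + p49[:, 28]) / 2.0  # second eye center (subscript ecl)
    p_nb = p49[:, 16]                        # nose bottom point
    return np.stack([p_ecr, p_ecl, p_nb], axis=1)
```

The three columns of the result are the points that feed the affine estimation in the next step.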
Figure 7.3: (a) The 49 facial points (numbered 0 to 48) used in the proposed geometric-based approach for facial expression recognition. (b) Person-specific normalization factors for the horizontal ($d_h$) and vertical ($d_v$) distances.
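Next, I derive the affine transformation, in terms of a multiplication matrix $A \in \mathbb{R}^{2\times 2}$ and a translation vector $t \in \mathbb{R}^{2\times 1}$, that maps the three points in the processed frame to their equivalents in the neutral frame, satisfying the following augmented-matrix equation.

$$
\begin{bmatrix} \mathbf{p}^{N}_{3p} \\ \mathbf{1}_{1\times 3} \end{bmatrix}
=
\begin{bmatrix} A & t \\ \mathbf{0}_{1\times 2} & \mathbf{1}_{1\times 1} \end{bmatrix}
\times
\begin{bmatrix} \mathbf{p}_{3p} \\ \mathbf{1}_{1\times 3} \end{bmatrix}
\tag{7.2}
$$

Here, $\mathbf{1}_{a\times b}$ is an $a \times b$ matrix whose elements are all one, and $\mathbf{0}_{a\times b}$ is an $a \times b$ matrix whose elements are all zero. Some straightforward calculations using Eq. (7.2) lead to the values of $A$ and $t$: provided the three points are not collinear, the $3\times 3$ matrix on the right-hand side is invertible, so $[A \;\; t] = \mathbf{p}^{N}_{3p} \begin{bmatrix} \mathbf{p}_{3p} \\ \mathbf{1}_{1\times 3} \end{bmatrix}^{-1}$. Then, I use the obtained pair $(A, t)$ to transfer the located 49 facial points ($\mathbf{p}_{49p}$) in the processed frame to the neutral space ($\mathbf{p}^{N}_{49pt}$) as follows.

$$
\begin{bmatrix} \mathbf{p}^{N}_{49pt} \\ \mathbf{1}_{1\times 49} \end{bmatrix}
=
\begin{bmatrix} A & t \\ \mathbf{0}_{1\times 2} & \mathbf{1}_{1\times 1} \end{bmatrix}
\times
\begin{bmatrix} \mathbf{p}_{49p} \\ \mathbf{1}_{1\times 49} \end{bmatrix}
\tag{7.3}
$$

To produce the geometric features, I first calculate the displacement between the transformed points and their counterparts in the neutral frame.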
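$$
\Delta \mathbf{p}_{49} = \mathbf{p}^{N}_{49pt} - \mathbf{p}^{N}_{49p}, \qquad \{\Delta x_i, \Delta y_i, \ldots, \Delta x_{48}, \Delta y_{48}\} \;\wedge\; i \neq 16 \tag{7.4}
$$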
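Eqs. (7.2) to (7.4) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in the evaluation: the function names are hypothetical, points are stored as 2xN arrays, and the displacement vector is laid out as interleaved (dx, dy) pairs. Point 16 is skipped as in Eq. (7.4); since the nose bottom point is one of the three reference points, it is mapped exactly onto its neutral position and its displacement is zero by construction.

```python
import numpy as np

def homogeneous(points):
    """Append a row of ones to a 2xN point array, giving a 3xN array."""
    points = np.asarray(points, dtype=float)
    return np.vstack([points, np.ones((1, points.shape[1]))])

def estimate_affine(p3p, p3p_neutral):
    """Solve Eq. (7.2) for A (2x2) and t (2x1) from three point pairs."""
    H = homogeneous(p3p)  # 3x3, invertible when the points are not collinear
    # [A t] H = p3p_neutral  is equivalent to  H^T [A t]^T = p3p_neutral^T
    At = np.linalg.solve(H.T, np.asarray(p3p_neutral, dtype=float).T).T
    return At[:, :2], At[:, 2:3]

def to_neutral_space(points, A, t):
    """Eq. (7.3): map 2xN points of the processed frame into the neutral space."""
    return A @ np.asarray(points, dtype=float) + t

def displacements(p49, p49_neutral, A, t, skip=16):
    """Eq. (7.4): displacements of all points except `skip`, as a length-96 vector."""
    delta = to_neutral_space(p49, A, t) - np.asarray(p49_neutral, dtype=float)
    delta = np.delete(delta, skip, axis=1)  # 2x48 after dropping point 16
    return delta.T.reshape(-1)              # assumed layout: [dx, dy, dx, dy, ...]
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a design choice of this sketch; both give the same `A` and `t` for non-collinear reference points.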
Figure 7.4: The feature extraction process of the proposed geometric-based approach that exploits 49 facial points. Steps shown: locate the 49 facial points; remove the neutral in-plane rotation; derive the affine transformation matrix; transfer the located points to the neutral space; extract the displacement features. Example faces shown: neutral, anger, disgust, fear, happiness, sadness, surprise.
To minimize individual variation, the displacements are evaluated with respect to the person-specific face configuration ($d_h$, $d_v$; see Figure 7.3b) as follows.

$$
\Delta \tilde{\mathbf{p}}_{49} = \left\{ \frac{\Delta x_1}{d_h}, \frac{\Delta y_1}{d_v}, \ldots, \frac{\Delta x_i}{d_h}, \frac{\Delta y_i}{d_v}, \ldots, \frac{\Delta x_{48}}{d_h}, \frac{\Delta y_{48}}{d_v} \right\} \;\wedge\; i \neq 16 \tag{7.5}
$$

Finally, to keep features with a large range from dominating, a standardized version of $\Delta \tilde{\mathbf{p}}_{49}$ is taken as the geometric feature vector, which has length 96.
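The normalization of Eq. (7.5) and the subsequent standardization might look like the sketch below. Several details are assumptions: `d_h` and `d_v` are passed in as given numbers because their exact definition is only shown in Figure 7.3b, the displacement vector is assumed to be interleaved as (dx, dy) pairs, and "standardized" is read here as a per-feature z-score computed from training data only. The function names are hypothetical.

```python
import numpy as np

def normalize_displacements(delta, d_h, d_v):
    """Eq. (7.5): divide x-displacements by d_h and y-displacements by d_v.

    `delta` is assumed to be laid out as [dx, dy, dx, dy, ...].
    """
    pairs = np.asarray(delta, dtype=float).reshape(-1, 2)
    return (pairs / np.array([d_h, d_v], dtype=float)).reshape(-1)

def standardize(train, test, eps=1e-12):
    """Z-score every feature with training-set statistics (assumed scheme)."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # leave constant features unscaled
    return (train - mean) / std, (test - mean) / std
```

Computing the mean and standard deviation on the training folds only keeps information from the held-out sample out of the features during cross-validation.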
The feature extraction process is summarized in Figure 7.4. I employ an SVM to assign a facial expression to the extracted feature vector. As in the appearance-based evaluation, I performed LOOCV because of the limited number of samples. The resulting confusion matrix is shown in Table 7.7. For each class, the first row summarizes the person-specific case, while the second row results from employing a general neutral model instead of the person-specific one. In this evaluation, the facial point localization within the neutral frame (the first frame of each sequence) is taken as the person-specific neutral model, while the average of the neutral point locations over all training subjects forms the general neutral model. The faces were normalized to a similar interocular distance (the distance between the eye centers) prior to averaging. In real scenarios, the person-specific neutral state could be obtained, with human intervention, during an initial registration step. It could also be derived automatically by averaging the facial point detections of the considered person over a long period, based on the assumption that emotional expressions span only a few frames.

Table 7.7: Confusion matrix of 6-class facial expression recognition using geometric features extracted from 49 facial points, based on an evaluation conducted on the CK+ database via SVM. For each expression, two rows are presented. The first row covers the person-specific scenario, in which the features are calculated with respect to a previously known person-specific neutral model. The second row covers the case where a general neutral model is used. Each column represents samples of the predicted class, while each row represents samples of the ground truth class.

| True class | Neutral model | Ha | Su | An | Di | Fe | Sa |
|---|---|---|---|---|---|---|---|
| Ha | person-specific | 1.0000 | 0 | 0 | 0 | 0 | 0 |
| Ha | general | 0.9710 | 0 | 0.0145 | 0 | 0.0145 | 0 |
| Su | person-specific | 0 | 0.9878 | 0 | 0 | 0 | 0.0122 |
| Su | general | 0 | 0.9756 | 0 | 0 | 0.0122 | 0.0122 |
| An | person-specific | 0 | 0 | 0.9111 | 0.0444 | 0 | 0.0444 |
| An | general | 0 | 0 | 0.7778 | 0.1333 | 0 | 0.0889 |
| Di | person-specific | 0 | 0 | 0.0169 | 0.9831 | 0 | 0 |
| Di | general | 0.0339 | 0 | 0.0847 | 0.8814 | 0 | 0 |
| Fe | person-specific | 0.1200 | 0.0800 | 0 | 0.0400 | 0.6800 | 0.0800 |
| Fe | general | 0.0800 | 0.0400 | 0 | 0 | 0.8000 | 0.0800 |
| Sa | person-specific | 0 | 0 | 0.1429 | 0 | 0.0357 | 0.8214 |
| Sa | general | 0 | 0 | 0.1429 | 0.0357 | 0.0357 | 0.7857 |
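The two ways of obtaining a neutral model described above (averaging over training subjects, and averaging one person's detections over time) can be sketched as below. This is an illustration under assumptions: the text only states that faces were brought to a similar interocular distance before averaging, so the additional centering on the midpoint between the eyes, the target distance of 1.0, and all function names are choices made for this sketch. Eye centers follow Eq. (7.1).

```python
import numpy as np

def eye_centers(p49):
    """Eye centers as defined in Eq. (7.1), for a 2x49 landmark array."""
    first = (p49[:, 19] + p49[:, 22]) / 2.0
    second = (p49[:, 25] + p49[:, 28]) / 2.0
    return first, second

def general_neutral_model(neutral_shapes, target_iod=1.0):
    """Average neutral shapes after rescaling to a common interocular distance."""
    aligned = []
    for shape in neutral_shapes:
        shape = np.asarray(shape, dtype=float)
        first, second = eye_centers(shape)
        iod = np.linalg.norm(second - first)
        if iod == 0:
            raise ValueError("degenerate shape: eye centers coincide")
        mid = (first + second) / 2.0
        # Assumption of this sketch: center on the eye midpoint, then scale.
        aligned.append((shape - mid[:, None]) * (target_iod / iod))
    return np.mean(aligned, axis=0)

def neutral_from_frames(frames):
    """Person-specific neutral estimate: mean landmark positions over many frames."""
    return np.mean(np.asarray(frames, dtype=float), axis=0)
```

`neutral_from_frames` relies on the assumption stated in the text that expressive frames are rare over a long recording, so the mean stays close to the neutral configuration.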
In Table 7.7, the cross-validation conducted on the CK+ database shows that I can achieve an average recognition rate of 89.72% when the features are extracted with respect to the person-specific neutral state, and a rate of 86.52% when a general neutral model is used. A drop of 3.2 percentage points cannot be avoided without the person-specific prior information. With the general neutral model, more confusions occur among the expressions with subtle facial deformations (sadness, anger). Meanwhile, the recognition rates of happiness and surprise, expressions with large facial deformations, are barely affected. Personalized geometric features lead to better performance, which highlights potential improvements when one personalizes the training and testing as well.

Table 7.8: Confusion matrix of 7-class facial expression recognition using geometric features extracted from 49 facial points, based on an evaluation conducted on the CK+ database via SVM. Here, we infer the neutral state, as the person-specific neutral state is not available. Each column represents samples of the predicted class, while each row represents samples of the ground truth class.

| True class | Ha | Su | An | Di | Fe | Sa | Ne |
|---|---|---|---|---|---|---|---|
| Ha | 0.9855 | 0 | 0 | 0 | 0 | 0 | 0.0145 |
| Su | 0 | 0.9634 | 0 | 0 | 0.0122 | 0.0122 | 0.0122 |
| An | 0 | 0 | 0.7111 | 0.0667 | 0 | 0.0222 | 0.2000 |
| Di | 0 | 0 | 0.0678 | 0.8814 | 0 | 0 | 0.0508 |
| Fe | 0.0800 | 0.0800 | 0 | 0 | 0.8000 | 0 | 0.0400 |
| Sa | 0 | 0 | 0.1071 | 0 | 0 | 0.5357 | 0.3571 |
| Ne | 0 | 0 | 0.0435 | 0.0290 | 0.0145 | 0.0580 | 0.8551 |
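The average rates quoted for Table 7.7 are consistent with an unweighted mean of the per-class recognition rates, that is, of the diagonal entries of the confusion matrix. The short check below uses the diagonal values copied from Table 7.7; the helper name `average_recognition_rate` is hypothetical.

```python
# Diagonal entries of Table 7.7 in the order Ha, Su, An, Di, Fe, Sa.
labels = ["Ha", "Su", "An", "Di", "Fe", "Sa"]
person_specific = [1.0000, 0.9878, 0.9111, 0.9831, 0.6800, 0.8214]
general_model = [0.9710, 0.9756, 0.7778, 0.8814, 0.8000, 0.7857]

def average_recognition_rate(per_class_rates):
    """Unweighted mean of the per-class recognition rates, in percent."""
    return 100.0 * sum(per_class_rates) / len(per_class_rates)

ps = average_recognition_rate(person_specific)
gn = average_recognition_rate(general_model)
print(f"person-specific neutral model: {ps:.2f}%")
print(f"general neutral model:         {gn:.2f}%")
print(f"difference: {ps - gn:.1f} percentage points")

# Per-class change when switching to the general neutral model.
for name, a, b in zip(labels, person_specific, general_model):
    print(f"{name}: {100.0 * (b - a):+.2f} points")
```

The per-class loop makes visible which expressions change most between the two neutral models.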
When person-specific information is lacking, it is sensible to infer the neutral state automatically as well. Table 7.8 summarizes the results of the cross-validation conducted in the 7-class case without prior person-specific information. The features were extracted with respect to the general neutral model derived from the training data. Normally, adding new categories to a classifier lowers its average recognition rate because of new confusions with the existing categories. Here, the major confusions arise between neutral and sadness and between neutral and anger, due to the subtle facial deformations in both the sadness and anger expressions. For the CK+ database, the achieved average recognition rate is 81.89%, where happiness and surprise are recognized with 98.55% and 96.34%, respectively. Disgust, fear, and neutral are recognized with high rates as well: 88.14%, 80.00%, and 85.51%, respectively. On the other hand, the recognition rate of sadness dropped dramatically to 53.57%, with 35.71% of the sadness samples identified as neutral. A similar drop in the recognition rates was observed in the cross-validation conducted on the BU-4DFE database; more details are provided in Appendix A.2.1.
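The figures discussed for the 7-class case can be read directly off Table 7.8. The sketch below copies the table, computes the unweighted average of its diagonal, and lists the most frequent wrong prediction for each true class; the helper `main_confusion` is hypothetical, and ties are resolved in favor of the first class in column order.

```python
labels = ["Ha", "Su", "An", "Di", "Fe", "Sa", "Ne"]
# Rows: ground truth class, columns: predicted class (values from Table 7.8).
table_7_8 = [
    [0.9855, 0, 0, 0, 0, 0, 0.0145],
    [0, 0.9634, 0, 0, 0.0122, 0.0122, 0.0122],
    [0, 0, 0.7111, 0.0667, 0, 0.0222, 0.2000],
    [0, 0, 0.0678, 0.8814, 0, 0, 0.0508],
    [0.0800, 0.0800, 0, 0, 0.8000, 0, 0.0400],
    [0, 0, 0.1071, 0, 0, 0.5357, 0.3571],
    [0, 0, 0.0435, 0.0290, 0.0145, 0.0580, 0.8551],
]

def main_confusion(matrix, names):
    """For each true class, return (true class, top wrong prediction, rate)."""
    result = []
    for i, row in enumerate(matrix):
        wrong = [k for k in range(len(row)) if k != i]
        j = max(wrong, key=lambda k: row[k])  # first maximum wins on ties
        result.append((names[i], names[j], row[j]))
    return result

average = 100.0 * sum(table_7_8[i][i] for i in range(len(labels))) / len(labels)
print(f"average recognition rate: {average:.2f}%")
for true_name, pred_name, rate in main_confusion(table_7_8, labels):
    print(f"{true_name} is most often mistaken for {pred_name} ({100.0 * rate:.2f}%)")
```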
Figure 7.5: The eight facial points exploited by our proposed geometric-based approach to recognize the facial expressions. Point labels in the figure: $p_{rm}$, $p_{lom}$, $p_{lm}$, $p_{upm}$, $p_{re}$, $p_{le}$, $p_{reb}$, $p_{leb}$.