features based on cross-database evaluation; the model was trained using BU-4DFE and evaluated on the CK+ database. SVM was employed as the machine learning method. Each column represents samples of the predicted class, while each row represents samples of the ground-truth class.
Ha Su An Di Fe Sa Ne
Ha 0.9571 0 0.0143 0.0143 0 0.0143 0
Su 0.0122 0.7927 0.0244 0.1220 0.0244 0 0.0244
An 0.1333 0 0.6667 0.1778 0 0.0222 0
Di 0.0678 0 0.5932 0.3051 0.0169 0 0.0169
Fe 0.5600 0 0.0400 0.0800 0.2800 0 0.0400
Sa 0.0345 0 0.2759 0.4483 0.0345 0.2069 0
Ne 0.1421 0 0.2741 0.2538 0.0660 0.0406 0.2234
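The caption's convention (rows are ground truth, columns are predictions) corresponds to row-normalizing a raw count matrix. As a minimal sketch, the hypothetical counts below are chosen to be consistent with the first two rows of the table above (70 happiness and 82 surprise samples); the actual per-class sample counts are not stated here:

```python
import numpy as np

# Hypothetical raw counts: rows = ground-truth class, columns = predicted class.
counts = np.array([
    [67, 0, 1, 1, 0, 1, 0],   # happiness (70 samples)
    [1, 65, 2, 10, 2, 0, 2],  # surprise (82 samples)
], dtype=float)

# Divide each row by its total so entries become recognition rates;
# the diagonal of the full matrix then holds the per-class accuracy.
rates = counts / counts.sum(axis=1, keepdims=True)
print(np.round(rates[0, 0], 4))  # 0.9571, matching the Ha row above
```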
B.2 Geometry-based Method
Martinez [107] states that robust computer vision algorithms for face analysis and recognition should be based on configural and shape features, which are defined as the distances between facial components (mouth, eyes, eyebrows, nose, and jaw line).
In other words, geometry-based methods are effective for facial analysis tasks. In this dissertation, I exploited geometry-based features to provide a frame-based decision about the expression; hence, no spatio-temporal features [130] were exploited. Two scenarios were investigated in Sec. 7.2.1: expression recognition with prior information about the person-specific neutral state, and without it. A geometry-based method ignores the information regarding skin appearance; consequently, it is less susceptible to changes in illumination. Moreover, it is easier to minimize the differences between individuals when a person-specific neutral state is available.
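The core idea can be sketched as follows. This is an illustrative reconstruction, not the dissertation's exact implementation: the function name `displacement_features` and the centroid-distance scale normalization are assumptions, and flattening 49 two-dimensional displacements yields 98 values, whereas the method in Sec. 7.2.1 derives 96 features:

```python
import numpy as np

def displacement_features(points, neutral):
    """Frame-based geometric features: displacement of each detected facial
    point from its location in a neutral model (person-specific or general).

    points, neutral: (49, 2) arrays of 2-D landmark coordinates.
    """
    points = np.asarray(points, dtype=float)
    neutral = np.asarray(neutral, dtype=float)
    # Illustrative scale normalization: mean landmark distance from the
    # neutral shape's centroid, so features are comparable across face sizes.
    scale = np.linalg.norm(neutral - neutral.mean(axis=0), axis=1).mean()
    return ((points - neutral) / scale).ravel()  # 49 points x 2 coords = 98 values

# A frame identical to its neutral model yields all-zero features,
# which is the property that removes individual (identity) variation.
neutral = np.random.default_rng(0).normal(size=(49, 2))
feats = displacement_features(neutral, neutral)
```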
B.2.1 A Method of 49 Facial Points
In Sec. 7.2.1, I proposed a geometry-based approach to recognize the six basic expressions (happiness, surprise, anger, disgust, fear, and sadness), along with the neutral state in some cases. 96 features were extracted, with the goal of maximizing the inter-class variation and minimizing the intra-class variation. These features represent the displacement of the 49 detected facial points to their corresponding locations in either the person-specific or the general neutral model. The displacements
Table B.13: Confusion matrix of 6-class facial expression recognition using geometric features extracted from 49 facial points, based on cross-database evaluation via SVM. The model was trained using CK+ and evaluated on the BU-4DFE database.
For each expression, two rows are presented. The first row is dedicated to the person-specific scenario, where the features were calculated with respect to a previously known person-specific neutral model. The second row is the case where a general neutral model is used. Each column represents samples of the predicted class, while each row represents samples of the ground-truth class.
Ha Su An Di Fe Sa
Ha 0.8537 0 0 0.0122 0.1341 0
0.7927 0.0244 0.0122 0.0122 0.1220 0.0366
Su 0 0.7209 0 0 0.2674 0.0116
0 0.9651 0 0.0116 0.0116 0.0116
An 0 0 0.7324 0.1690 0.0141 0.0845
0 0 0.4648 0.2254 0.0141 0.2958
Di 0.1071 0 0.1071 0.4643 0.3036 0.0179
0.0536 0.0536 0.0714 0.5893 0.0714 0.1607
Fe 0.0678 0.0339 0.0339 0 0.6441 0.2203
0.0678 0.1356 0.0169 0.0169 0.3559 0.4068
Sa 0 0 0.0794 0.0159 0.0476 0.8571
0 0.0159 0.0317 0.0317 0.0317 0.8889
are calculated relative to the face shape; the features were standardized before being forwarded to the SVM classifier.
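The standardize-then-classify step described above can be sketched with scikit-learn on synthetic stand-in data (the feature vectors, class centers, and SVM kernel here are illustrative assumptions, not the dissertation's actual data or hyperparameters):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(3, 96))  # 3 stand-in expression classes

def synthetic_db(n_per_class, jitter):
    """Stand-in for a database of 96-dimensional geometric feature vectors."""
    X = np.vstack([c + rng.normal(scale=jitter, size=(n_per_class, 96))
                   for c in centers])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y

X_train, y_train = synthetic_db(40, 1.0)  # plays the role of the training database
X_test, y_test = synthetic_db(30, 1.5)    # plays the role of the evaluation database

# Standardization is fit on the training database only, then applied
# unchanged to the other database, mirroring cross-database evaluation.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```

Fitting the scaler inside the pipeline matters: estimating standardization statistics from the evaluation database would leak information across databases.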
Table B.13 summarizes the expression recognition rates obtained by training a model on the CK+ database and evaluating it on the BU-4DFE database. The first row of each expression is dedicated to the person-dependent scenario, and the second to the person-independent scenario. The achieved average recognition rate is 71.21% in the person-specific neutral case, and 67.61% with the general neutral model.
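The average here is the mean of the per-class (diagonal) rates; for the person-specific rows of Table B.13 this reproduces the 71.21% figure:

```python
import numpy as np

# Diagonal of Table B.13, person-specific rows (Ha, Su, An, Di, Fe, Sa).
diag = np.array([0.8537, 0.7209, 0.7324, 0.4643, 0.6441, 0.8571])
avg = diag.mean()
print(f"{100 * avg:.2f}%")  # 71.21%
```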
Happiness, surprise, and sadness expressions are recognized with high rates in both cases. More confusions arise among disgust, fear, and sadness, which can be attributed to the inconsistent intensity of the depicted expressions in the BU-4DFE database; e.g., 20% of the fear samples in the first case and 40.68% in the second case are recognized as sadness. Contrary to expectations, the recognition rate of the surprise expression rises from 72.09% in the person-specific case to 96.51% with the general neutral model. In the former case the individual variation is removed and
Table B.14: Confusion matrix of 6-class facial expression recognition using geometric features extracted from 49 facial points, based on cross-database evaluation via SVM. The model was trained using BU-4DFE and evaluated on the CK+ database.
For each expression, two rows are presented. The first row is dedicated to the person-specific scenario, where the features are calculated with respect to a previously known person-specific neutral model. The second row is the case where a general neutral model is used. Each column represents samples of the predicted class, while each row represents samples of the ground-truth class.
Ha Su An Di Fe Sa
Ha 0.9565 0 0 0.0290 0.0145 0
1.0000 0 0 0 0 0
Su 0 0.9878 0 0 0 0.0122
0 0.9878 0.0122 0 0 0
An 0 0 0.9111 0.0222 0 0.0667
0 0 0.8889 0.0444 0.0444 0.0222
Di 0 0 0.8814 0.1017 0 0.0169
0 0 0.8644 0.1356 0 0
Fe 0.0400 0.0800 0 0.1200 0.7200 0.0400
0.2000 0 0 0.0400 0.7600 0
Sa 0 0 0.1071 0.0357 0.1071 0.7500
0 0 0.1071 0.1071 0.1786 0.6071
Table B.15: Confusion matrix of 7-class facial expression recognition using geometric features extracted from 49 facial points, based on cross-database evaluation via SVM, in person-independent mode. The model was trained using CK+ and evaluated on the BU-4DFE database. Each column represents samples of the predicted class, while each row represents samples of the ground-truth class.
Ha Su An Di Fe Sa Ne
Ha 0.7561 0.0122 0.0122 0.0122 0.0854 0 0.1220
Su 0 0.9535 0 0 0.0116 0 0.0349
An 0 0 0.4225 0.1549 0.0141 0.1690 0.2394
Di 0.0714 0.0536 0.0714 0.5357 0.0714 0.1607 0.0357
Fe 0.0508 0.1525 0.0169 0 0.2712 0.2034 0.3051
Sa 0 0.0159 0.0317 0 0.0317 0.6349 0.2857
Ne 0 0 0.0488 0 0.0122 0.1220 0.8171
Table B.16: Confusion matrix of 7-class facial expression recognition using geometric features extracted from 49 facial points, based on cross-database evaluation via SVM, in person-independent mode. The model was trained using BU-4DFE and evaluated on the CK+ database. Each column represents samples of the predicted class, while each row represents samples of the ground-truth class.
Ha Su An Di Fe Sa Ne
Ha 1.0000 0 0 0 0 0 0
Su 0 0.9756 0 0 0.0122 0 0.0122
An 0 0 0.8222 0.0444 0.0222 0 0.1111
Di 0 0 0.8475 0.1356 0 0 0.0169
Fe 0.0800 0 0 0.0400 0.8800 0 0
Sa 0 0 0.1071 0.1429 0.1071 0.5357 0.1071
Ne 0.0290 0 0.2754 0.0145 0.1739 0.0290 0.4783
as the expression intensity varies in the BU-4DFE database, 26% of the samples are identified as fear, which is expressed with similar eye and eyebrow movements but limited mouth opening.
In Table B.14, the results were obtained with the opposite configuration to the aforementioned one: the model was trained using the BU-4DFE database and evaluated on the CK+ database. Interestingly, happiness, surprise, and anger expressions are recognized with high rates in both cases: more than 95% for happiness, 98% for surprise, and 88% for anger. Sadness and fear expressions are identified with rates ranging between 60% and 76% in both cases. Unlike the other expressions, most of the disgust samples (more than 86%) are identified as anger, which can be attributed to the variations in depicting the expression between the two databases. The variation in expression intensity among individuals in the BU-4DFE database makes it a better choice for training, as most of the CK+ samples fall in the correct corresponding expression space, except for the disgust samples. In general, minimizing the individual variation by extracting geometric features with respect to a person-specific neutral state leads to better generalization capability across databases.