3.3. Face Detection
expression, illumination, or races, across the training faces. Throughout this thesis, I employ the VJ approach [159] for face detection, mainly the frontal and profile models available in the OpenCV library [16], in which diagonal Haar-like features are also exploited [94]. These models were trained across expressions, an admissible range of poses, illuminations, ethnicities, skin tones, and small occlusion cases.
In what follows, a brief overview of the VJ approach is given, including its advantages and shortcomings. The VJ face detector is built on top of a rejection cascade of nodes, where each node is an AdaBoost group of decision-tree classifiers (each one level deep), as shown in Figure 3.10. Let $C_i$ denote the binary decision of tree $i$; then the AdaBoost classifier of $n$ trees can be formulated as follows.
$$Ad = \operatorname{sign}(w_1 C_1 + w_2 C_2 + \cdots + w_n C_n), \tag{3.34}$$

The boosting technique is used to calculate the weight values $(w_1, \ldots, w_n)$, as summarized in Algorithm 2. The AdaBoost nodes are arranged to achieve maximum speed with reasonable performance. Earlier nodes have fewer features because they are evaluated most often: each node terminates the testing of a patch as soon as its output is false. Additionally, they have a higher detection rate, which of course comes at the cost of more false positives; however, by the end of the rejection cascade, a high detection rate (98%) is achieved with a low false-positive rate (0.0001%) [17].
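The early-rejection behaviour of the cascade can be illustrated with a minimal Python sketch. It is not the OpenCV implementation: the names `make_stump`, `node_decision`, `cascade_accepts`, and the toy weights and thresholds are hypothetical, and each patch is assumed to be already reduced to a short list of feature values. Each node applies the weighted vote of Equation 3.34, and a patch is dropped at the first node that votes negative.

```python
from typing import Callable, List, Sequence, Tuple

# A stump maps a patch (here: a list of feature values) to a vote of -1 or +1.
Stump = Callable[[Sequence[float]], int]
# A node is a list of (weight, stump) pairs, i.e. one boosted classifier.
Node = List[Tuple[float, Stump]]


def make_stump(feature_index: int, threshold: float) -> Stump:
    """Hypothetical one-level tree: +1 if one feature value exceeds a threshold."""
    return lambda patch: 1 if patch[feature_index] > threshold else -1


def node_decision(node: Node, patch: Sequence[float]) -> int:
    """Weighted vote of Equation 3.34: sign(w1*C1 + ... + wn*Cn)."""
    score = sum(weight * stump(patch) for weight, stump in node)
    return 1 if score >= 0 else -1  # a zero score counts as positive (assumption)


def cascade_accepts(cascade: List[Node], patch: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, nodes evaluated); testing stops at the first rejection."""
    for index, node in enumerate(cascade, start=1):
        if node_decision(node, patch) < 0:
            return False, index
    return True, len(cascade)


# Toy cascade: a cheap first node with one stump, then a larger second node.
toy_cascade: List[Node] = [
    [(1.0, make_stump(0, 0.5))],
    [(0.6, make_stump(1, 0.2)), (0.3, make_stump(2, 0.7)), (0.4, make_stump(0, 0.8))],
]

print(cascade_accepts(toy_cascade, [0.1, 0.9, 0.9]))  # rejected by the first node
print(cascade_accepts(toy_cascade, [0.9, 0.9, 0.1]))  # passes both nodes
```

In this toy setting, most negative patches cost only a single stump evaluation, which is the reason for placing the small nodes first.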
The VJ detector exploits Haar-like features, each defined as a threshold applied to sums and differences of the intensity values of adjacent image regions. Samples of Haar-like feature types are shown in Figure 3.11. The computation of the features is sped up by the use of an integral image.
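To make the role of the integral image concrete, the following Python sketch computes one and uses it to evaluate a two-rectangle edge feature with four table look-ups per rectangle. The function names and the toy 4×4 image are assumptions made for illustration; the sketch only shows the principle, not the layout used inside OpenCV.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return ii with an extra zero row and column: ii[r][c] = sum of img[:r][:c]."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels in a rectangle, using four look-ups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def edge_feature(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Two-rectangle edge feature: left (light) half minus right (dark) half."""
    half = width // 2
    light = rect_sum(ii, top, left, height, half)
    dark = rect_sum(ii, top, left + half, height, half)
    return light - dark


# Toy image with a bright left half and a dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
print(rect_sum(ii, 0, 0, 4, 4))      # 80, the sum of all pixels
print(edge_feature(ii, 0, 0, 4, 4))  # 64 = 72 (light) - 8 (dark)
```

A weak classifier would then compare such a feature value against a learned threshold; the cost of the feature does not depend on the rectangle size.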
To locate a face inside an image, a sliding window shifts pixel by pixel, scanning the whole image across various scales for potential faces. The window scales are parametrized by the minimum search size, the maximum search size, and the scale step factor; any face outside the selected scales is ignored. Then, the overlapping positive windows that pass the minimum-neighbors threshold are merged through averaging to produce the final detection results. The search is sped up by scaling the features instead of scaling the processed patch itself.
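The effect of the three scale parameters and of the neighbor-based merging can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not OpenCV's grouping procedure: the helper names, the intersection-over-union test, the 0.5 overlap threshold, and the greedy grouping are all hypothetical choices.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def window_sizes(min_size: int, max_size: int, scale_factor: float) -> List[int]:
    """Window side lengths visited between the minimum and maximum search size."""
    if min_size <= 0 or scale_factor <= 1.0:
        raise ValueError("min_size must be positive and scale_factor greater than 1")
    sizes, size = [], float(min_size)
    while size <= max_size:
        sizes.append(int(round(size)))
        size *= scale_factor
    return sizes


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def merge_detections(boxes: List[Box], min_neighbors: int, min_overlap: float = 0.5) -> List[Box]:
    """Greedily group overlapping boxes, drop small groups, average the rest."""
    groups: List[List[Box]] = []
    for box in boxes:
        for group in groups:
            if overlap_ratio(box, group[0]) >= min_overlap:
                group.append(box)
                break
        else:
            groups.append([box])
    merged = []
    for group in groups:
        if len(group) >= min_neighbors:
            n = len(group)
            merged.append(tuple(round(sum(b[k] for b in group) / n) for k in range(4)))
    return merged


print(window_sizes(24, 100, 1.5))
raw = [(10, 10, 50, 50), (12, 11, 50, 50), (11, 12, 52, 52), (200, 200, 40, 40)]
print(merge_detections(raw, min_neighbors=2))
```

With these toy values, a face smaller than 24 or larger than 81 pixels would never be visited, and the isolated box at (200, 200) is discarded because it has too few neighbors. Changing the scale factor changes which window sizes exist at all, which hints at why the returned crop varies with the parameters.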
One main drawback of the VJ face detector is its inconsistent face cropping, which can consequently ruin any further automatic analysis. This issue appears when scanning an image containing faces of different scales with fixed parameters, or when scanning the same image with different parameters, as shown in Figure 3.12. To cope with this issue, I propose several post-processing methods, explained in later chapters.

Algorithm 2: The boosting algorithm AdaBoost.
Data: $N$ training samples $(x_i, y_i)$ with binary labels $y_i \in \{-1, +1\}$; $C(x)$ is a binary decision tree, one level deep, based on the input vector $x$.
Result: $Ad(x)$
Training
  Initialization: $D_1(i) = 1/N$, $i = 1, \ldots, N$
  for $t \leftarrow 1$ to $T$ do
    $C_t(x) = \arg\min_{C_j \in H} E_j$, where $E_j = \sum_{i=1}^{N} [D_t(i) \times \mathrm{In}(y_i \neq C_j(x_i))]$ and $\mathrm{In}(\cdot)$ is the indicator function
    $E_t = \sum_{i=1}^{N} [D_t(i) \times \mathrm{In}(y_i \neq C_t(x_i))]$
    $w_t = \frac{1}{2} \log[(1 - E_t)/E_t]$
    $D_{t+1}(i) = [D_t(i) \times \exp(-w_t y_i C_t(x_i))]/Z_t$, where $Z_t$ is a normalization factor.
Testing
  $Ad(x) = \operatorname{sign}\left(\sum_{t=1}^{T} w_t C_t(x)\right)$
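The training loop of Algorithm 2 can be sketched in plain Python. The sketch below is an illustration under assumptions: the candidate pool $H$ is a hypothetical set of threshold stumps on a single scalar feature, the data are a toy one-dimensional set, and the error is clamped away from 0 and 1 to keep the logarithm finite, which Algorithm 2 does not specify.

```python
import math
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[float], int]


def make_stump(threshold: float, polarity: int) -> Classifier:
    """One-level tree on a scalar: vote `polarity` above the threshold, else the opposite."""
    return lambda x: polarity if x > threshold else -polarity


def train_adaboost(xs: Sequence[float], ys: Sequence[int],
                   candidates: List[Classifier], rounds: int) -> List[Tuple[float, Classifier]]:
    n = len(xs)
    d = [1.0 / n] * n  # D_1(i) = 1/N
    ensemble: List[Tuple[float, Classifier]] = []
    for _ in range(rounds):
        def weighted_error(c: Classifier) -> float:
            # E_j: total weight of the samples that c misclassifies
            return sum(di for di, x, y in zip(d, xs, ys) if c(x) != y)

        best = min(candidates, key=weighted_error)  # C_t
        err = weighted_error(best)                  # E_t
        err = min(max(err, 1e-10), 1 - 1e-10)       # clamp (assumption, avoids log(0))
        w = 0.5 * math.log((1 - err) / err)         # w_t
        # D_{t+1}(i) = D_t(i) * exp(-w_t * y_i * C_t(x_i)) / Z_t
        d = [di * math.exp(-w * y * best(x)) for di, x, y in zip(d, xs, ys)]
        z = sum(d)
        d = [di / z for di in d]
        ensemble.append((w, best))
    return ensemble


def predict(ensemble: List[Tuple[float, Classifier]], x: float) -> int:
    """Ad(x) = sign(sum_t w_t * C_t(x)); a zero score counts as +1 (assumption)."""
    score = sum(w * c(x) for w, c in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
pool = [make_stump(t + 0.5, p) for t in range(-1, 10) for p in (1, -1)]

model = train_adaboost(xs, ys, pool, rounds=3)
print([round(w, 4) for w, _ in model])
print([predict(model, x) for x in xs] == ys)
```

On this toy set, the best single stump misclassifies three of the ten samples, whereas the weighted vote of three stumps fits all of them, which illustrates how the re-weighting step steers later rounds toward the samples that earlier stumps got wrong.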
False-positive detections can be further rejected by checking for the existence of major facial components (eyes, nose, mouth) or of skin color within the detected patch. Tracking methods are a potential solution to mitigate misdetections while processing an image sequence. Thanks to the aforementioned efficient methods, namely the integral image and the rejection cascade, the VJ face detector can work in real time. Sharma et al. [146] showed the feasibility of building a VJ face detector that runs in real time. Moreover, in our lab it ran at 45 fps on an NVIDIA GeForce GTX 780 when scanning a 640×480-pixel image at all potential scales.
Figure 3.11: Samples of Haar-like features: edge features, line features, and center-surround features. For each feature, the intensity values of the light region are added and those of the dark region are subtracted.
Figure 3.12: Applying the VJ face detector to the same image with different search parameters (e.g. scale step factor) always leads to a different cropping. The size of the returned box in pixels is shown beneath each sub-image: (a) 78×78, (b) 111×111, (c) 122×122, (d) 135×135. The image was taken from the BIWI database.
CHAPTER 4
Databases
Throughout this dissertation, I exploited several databases either to train my proposed methods or to evaluate them in within-database and cross-database scenarios. This chapter provides a brief description of those databases. I grouped them into three categories according to how they are employed here, not according to what they could be used for. All of them are publicly available, which ensures the reproducibility of the achieved results.
4.1 Facial Point Databases
To conduct a sophisticated evaluation of our facial point detector, I exploited several databases that vary in illumination, pose, expression, etc. A brief overview of those databases follows.
4.1.1 CMU Multi-PIE
This database was produced by Gross et al. [58], aiming to advance research in face recognition across poses and illumination conditions; however, it has also proven to be of great benefit for evaluating methods of facial point detection (as used here) and of facial expression recognition. It comprises 337 subjects, 235 male and 102 female. 60% of them were European-American, 35% Asian, 3% African-American, and 2% others, with an average age of 27 years. Each was photographed from 15 views at once under 19 illumination conditions. The data were gathered in four recording sessions, each dedicated to different expressions. A uniform static background is used in all sessions. The recording was performed using 13 cameras located at head level with various yaw angles spaced by 15°, and two cameras capturing the face from surveillance views. During the recording, each subject was asked to display one of the following facial expressions: smile, surprise, squint, disgust, scream. In total, the database contains 755,370 images from the 337 different subjects. The image resolution is 480×640 pixels, where the mean inter-pupil distance is 78 pixels in the frontal view.
4.1.2 MUCT
This database was produced by Milborrow et al. [112], aiming to advance research in facial point detection across illumination, age, and ethnicity. It consists of 3755 images of human faces, each manually annotated with 76 facial points. 276 subjects, with equal numbers of males and females and a cross section of ages and races, were photographed from 5 views at once against a uniform static background.
Three cameras were located at head level, simulating three different yaw poses; the other two were placed above and below face level, simulating two pitch angles. Ten different illumination conditions were applied, where each subject was captured in at most 3 of them. Some subjects appear with makeup, glasses, and headdresses. They were not asked to display any facial expression; however, a natural smile is present in some frames. The image resolution is 480×640 pixels, where the mean inter-pupil distance is 88 pixels.
4.1.3 Helen
This database was produced by Le et al. [89], aiming to advance research in facial point detection on high-resolution images across illumination conditions, poses, and expressions. The images were gathered from Flickr, implying more diversity and, consequently, more challenge. They were collected by searching Flickr with different keywords such as family, outdoor, boy, and wedding.
The search was carried out in different languages to avoid a cultural bias. Only images of faces greater than 500 pixels in width were incorporated. In some samples, the face appears with a proportional amount of background; in others, it is so close that it touches the image edges. In total, the database contains 2330 samples, split into a training part of 2000 samples and a testing part of 330 samples.