
If the activity of the best-matching node for a given input falls below an insertion threshold, a new node is created.

Each node in the GWR is also equipped with a firing counter that records how often the node has fired, that is, how often it has been selected as the best-matching node for an input. This counter modulates the creation of new nodes: a node that has not fired often is updated with priority instead of a new node being created. This gives the network a forgetting mechanism, allowing it to discard useless information, i.e., representations that are no longer important for modeling the data.
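As a rough sketch of this mechanism, the code below shows the insertion test that such a firing counter enables, following the common GWR formulation in which a node is added only when the best-matching node both represents the input poorly and has already fired often. The function name, parameter names, and threshold values are illustrative assumptions, not the exact configuration used in this thesis.

import numpy as np

def should_insert_node(input_vector, best_weights, best_firing_counter,
                       activity_threshold=0.85, firing_threshold=0.1):
    """Sketch of the GWR node-insertion test (hypothetical parameters).

    The firing (habituation) counter starts near 1 and decays every time
    the node fires, i.e. every time it is selected as the best match.
    A new node is created only when the best-matching node represents the
    input poorly (low activity) AND has already fired often (low firing
    counter); otherwise the existing node is simply updated.
    """
    distance = np.linalg.norm(input_vector - best_weights)
    activity = np.exp(-distance)  # activity decays with distance to the input
    return activity < activity_threshold and best_firing_counter < firing_threshold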

In addition, each edge has an associated age that is used to remove old connections. If a cluster created during training is no longer related to the main neurons, its connections age and are eventually deleted. At the end of each training iteration, nodes without connections are removed, which makes the model robust against outliers.
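A minimal sketch of this edge-aging and pruning step is shown below, assuming the network is stored as a dictionary of node weights and a dictionary of edge ages; this representation and the max_edge_age parameter are assumptions made for illustration.

def age_and_prune(nodes, edges, best, second_best, max_edge_age=50):
    """Age the winner's edges, drop old edges, and remove isolated nodes.

    nodes: dict mapping node id -> weight vector
    edges: dict mapping frozenset({id_a, id_b}) -> age
    """
    # Connect (or refresh) the edge between the two best-matching nodes.
    edges[frozenset((best, second_best))] = 0

    # Age every other edge attached to the winner.
    for edge in edges:
        if best in edge and edge != frozenset((best, second_best)):
            edges[edge] += 1

    # Remove edges that have grown too old ...
    edges = {edge: age for edge, age in edges.items() if age <= max_edge_age}

    # ... and nodes left without any connection, which makes the model
    # robust against outliers.
    connected = {node_id for edge in edges for node_id in edge}
    nodes = {node_id: w for node_id, w in nodes.items() if node_id in connected}
    return nodes, edges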

The behavior of the GWR when iterating over a training set shows the emergence of concepts. In the first epochs the network exhibits an exponential growth in the number of neurons, but after achieving a topology that models the data, it mostly converges. This behavior changes when a new set of training samples is presented to the network. If the new set does not match a particular region of the network, the model adapts around the new data distribution, forgetting and removing old neurons when necessary and creating new ones. This gives the model a behavior similar to the formation and storage of memory in the brain.
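This growth-then-convergence behavior can be made visible with a training loop along the lines of the sketch below, which records the number of neurons after each epoch; gwr_step is a hypothetical function wrapping the matching, update, insertion, and pruning steps sketched above, not an interface defined in this thesis.

def train_gwr(gwr_step, network, samples, epochs=100):
    """Train a GWR over several epochs and track how the network grows.

    Node counts should rise quickly in the first epochs and flatten once
    the topology models the data; presenting samples from a new
    distribution afterwards makes the network grow and adapt again.
    """
    node_counts = []
    for _ in range(epochs):
        for x in samples:
            network = gwr_step(network, x)      # match, update or insert, prune
        node_counts.append(len(network.nodes))  # neurons after this epoch
    return network, node_counts

# Usage (hypothetical): train on the original data, then on new samples
# to observe the adaptation and forgetting behavior described above.
# network, counts = train_gwr(gwr_step, network, training_samples)
# network, counts_new = train_gwr(gwr_step, network, new_samples)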

Figure 4.10 illustrates a GWR at two different stages of training. The left side, Figure 4.10a, shows the GWR in the second training cycle. In this example, each dot represents a neuron, and the color of the neuron represents an emotional concept. The image on the right, Figure 4.10b, shows the model after 100 training cycles. The network has created a non-uniform topology to represent the data, and neurons with similar concepts stay together, forming clusters. It is also possible to identify that some emotional concepts, mostly the black ones, are merged with the others. The black concepts represent the neutral emotions, which are related to all the others in this example.

Figure 4.10: The neurons of a GWR during training on emotional data, shown after the first two training cycles (a) and after the first 100 training cycles (b). Each color represents a different emotional concept associated with that neuron.

4.6 Corpora

To evaluate the attention models proposed in this thesis, we introduce a new emotional attention corpus based on the FABO dataset. This corpus provides data for emotional attention tasks, which is not present in the previously mentioned datasets.

Most of the corpora used for training and evaluating affective computing models are related to single emotion expressions, and none contains data from Human-Robot Interaction (HRI). To evaluate the capability of the models proposed in this thesis to cope with HRI scenarios, a novel dataset of emotional behavior observations, named the KT Emotional Interaction Corpus, was designed and recorded. This corpus was designed to be challenging for different tasks: spontaneous emotion expression recognition, emotional attention identification, and emotional behavior analysis.

The following sections detail the FABO, SAVEE, and EmotiW corpora, the emotional attention corpus, and the new KT Emotional Interaction Corpus.

4.6.1 Emotion Expression Corpora

The Bi-modal face and body benchmark database

For our experiments we use four corpora. The first one is the bi-modal FAce and BOdy benchmark database (FABO), introduced by Gunes and Piccardi [117]. This corpus is composed of recordings of the upper torso of different subjects performing spontaneous emotion expressions. It contains a total of 11 expressions performed by 23 subjects of different nationalities. Each expression is performed in a spontaneous way, with no indication given of how the subject must perform the expression. A total of 281 videos were recorded, each one containing 2 to 4 of the following expressions: “Anger”, “Anxiety”, “Boredom”, “Disgust”, “Fear”, “Happiness”, “Surprise”, “Puzzlement”, “Sadness” and “Uncertainty”. Each expression starts with a neutral phase and continues until the apex phase, where the expression is at its peak. We use the neutral phase of each expression to create a 12th “Neutral” class in our experiments. Figure 4.11 illustrates images present in a sequence of an angry expression in the FABO corpus.

Figure 4.11: Examples of images with an angry expression in the FABO corpus.

Figure 4.12: Faces with an angry expression in the SAVEE corpus.
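As an illustration of how the 12th “Neutral” class could be derived from the FABO recordings, the sketch below splits an annotated video into neutral-phase and apex-phase frame samples; the annotation fields (neutral_end, apex_start, apex_end) are hypothetical and do not reflect the actual FABO annotation format.

def video_to_samples(frames, annotation):
    """Turn one annotated FABO video into (frame, label) pairs.

    annotation is a hypothetical dict such as
    {"label": "Anger", "neutral_end": 30, "apex_start": 55, "apex_end": 80}.
    Frames before neutral_end become the extra "Neutral" class, and frames
    inside the apex phase keep the annotated expression label.
    """
    samples = []
    for index, frame in enumerate(frames):
        if index < annotation["neutral_end"]:
            samples.append((frame, "Neutral"))
        elif annotation["apex_start"] <= index <= annotation["apex_end"]:
            samples.append((frame, annotation["label"]))
    return samples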

Surrey Audio-Visual Expressed Emotion

The second corpus is the Surrey Audio-Visual Expressed Emotion (SAVEE) Database, created by Haq and Jackson [123]. This corpus contains speech recordings from four male native English speakers. Each speaker reads sentences which are clustered into seven different classes: “Anger”, “Disgust”, “Fear”, “Happiness”, “Neutral”, “Sadness” and “Surprise”. These classes represent the six universal emotions with the addition of the “Neutral” class. Each speaker recorded 120 utterances, with 30 representing “Neutral” expressions and 15 for each of the other emotions. All the texts are extracted from the TIMIT dataset and are phonetically balanced. Each recording contains the audio and the face of the participant, with facial markers. The markers were placed for systems that require them and, unfortunately, are part of the recorded images. Figure 4.12 illustrates faces of a subject while performing an angry expression in the SAVEE corpus.

Emotion-Recognition-In-the-Wild-Challenge Dataset

The third corpus is the database for the Emotion-Recognition-In-the-Wild-Challenge (EmotiW) [69]. This corpus contains video clips extracted from different movies