

4.5 Self-Organizing Maps

of the SOM’s topology.

The training of the neurons in the SOM relies heavily on Hebbian learning, where neighboring neurons fire together and are therefore updated toward the same goal. The neurons in the SOM are fully connected to the input data, so each neuron is updated to resemble one input stimulus or a set of closely related input stimuli.

The SOM uses a competitive learning mechanism based on the Euclidean distance. After a random initialization of the neurons, one sample is presented to the network. The distance between the input sample and each neuron is calculated, and the unit with the smallest distance is selected. This unit is commonly known as the best matching unit (BMU). During the update cycle, only the weights of the BMU and of the neurons neighboring it are adjusted, using the input stimulus as the goal. The update rule for a neuron in the SOM can be expressed as:

wt = wt−1 + θ(b, n)α(x − wt−1),  (4.18)

where wt is the updated weight of the current unit, wt−1 is its current weight, θ(b, n) is the neighborhood function between the current unit n and the BMU b, α is the learning coefficient, a parameter which decreases during training, and x is the input. The neighborhood function measures the proximity between the current unit and the BMU, and can be implemented in several different ways. The most common choice is to define it as 1 if the neuron is adjacent to the BMU and 0 otherwise.
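As a minimal sketch of the competitive step and the update rule of Equation 4.18 (the grid size, learning rate, and function names below are illustrative, not taken from the text; θ is the common adjacent-or-not variant):

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def som_step(weights, x, alpha):
    """One SOM update on a rows x cols grid of weight vectors:
    find the BMU, then apply Equation 4.18 to the BMU and its
    adjacent units (theta = 1 for the BMU and its neighbors,
    0 for all other units)."""
    rows, cols = len(weights), len(weights[0])
    # Competitive step: the BMU is the unit closest to the input sample.
    b = min(((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: dist(weights[rc[0]][rc[1]], x))
    for r in range(rows):
        for c in range(cols):
            theta = 1.0 if max(abs(r - b[0]), abs(c - b[1])) <= 1 else 0.0
            w = weights[r][c]
            # w_t = w_{t-1} + theta(b, n) * alpha * (x - w_{t-1})
            weights[r][c] = [wi + theta * alpha * (xi - wi)
                             for wi, xi in zip(w, x)]
    return b

random.seed(0)
# A 5x5 grid of randomly initialized 3-dimensional weight vectors.
weights = [[[random.random() for _ in range(3)] for _ in range(5)]
           for _ in range(5)]
x = [0.9, 0.1, 0.5]
bmu = som_step(weights, x, alpha=0.5)
```

Repeating `som_step` with the same sample pulls the BMU and its neighborhood ever closer to it, which is the clustering effect described below.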

During training, it is possible to see an emerging effect: some neurons will be updated to reflect the distribution of the input data. This effect forms clusters, where certain regions in the SOM can be associated with certain concepts in the input data distribution. For example, if the SOM is being used to classify human faces, it is possible to cluster male and female faces in different regions of the grid. Figure 4.9 illustrates a SOM connected to the input layer, the BMU, and the clustering effect, which is represented by different colors in the SOM structure.

4.5.1 Growing When Required Networks

The Growing When Required (GWR) network [206] is an extension of the SOM which does not have the concept of a 2D grid in the neurons' distribution. In a SOM the neurons are arranged in a 2D structure, where each neuron has a known number of neighbors. The number of neurons and their arrangement in the grid is one of the decisions that must be made before building the SOM. That gives the model the capability to reduce the dimensionality of the input data, but restricts the network with respect to the amount of information it can learn.

The GWR was proposed to solve the fixed-topology problem of the SOM. In this model, neurons are added only when necessary, and without any pre-defined topology. This gives the model a growing mechanism, increasing and decreasing the number of neurons, and adjusting their positions, when necessary. This makes


Figure 4.9: Illustration of a SOM. The input is fully connected with all the SOM units, and the best matching unit (BMU) is calculated. The different colors in the grid indicate neurons which encode similar information, introducing the idea of different clusters in the SOM.

the model able to represent data with an arbitrary number of samples and introduces the capability of dealing with novelty detection.

Besides the weight vector which connects them to the input, the neurons in the GWR have the concept of edges. The edges connect the neurons, giving them the concept of neighbors. This allows the model to grow to match the topological distribution of the data, in contrast to the SOM, which transforms the data topology into a 2D topology.

Unlike a SOM, the GWR starts with only two neurons, which are created from two random samples of the input data. Then, as more data is presented to the network, the algorithm decides, based on some constraints and the activation behavior of each neuron, when and where to add a new neuron. That means that the network can create different clusters of neurons, which represent similar concepts, in different spatial regions. Imagine a network trained with data representing two colors, blue and red, when suddenly a new color, green, is presented. The network will create a new region of neurons to represent the new color. Theoretically, there is no limit to adding new neurons, meaning that the network can learn an arbitrary number of new concepts. The edges maintain the concept of similarity between the nodes, therefore clusters can have connections through edges, showing that these clusters have certain properties in common.

To train this network, the main concept of SOM training is kept: finding a BMU among the neurons of the model. The difference is that after finding the BMU, the activation of the BMU and its neighbors is calculated. In the GWR, the activation can be represented as a function applied to the distance between the neurons and the input. Based on this distance, the network can identify whether an input is too far from the knowledge stored in the neurons of the network; if that is the case, a new node is added and an edge between the closest node and the new node is created.
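The growth criterion can be sketched as follows. This is a simplified illustration, not the full algorithm from [206]: the `Node` class, the threshold value, and the exponential activation are illustrative choices, and the new node is placed halfway between the BMU and the input.

```python
import math

class Node:
    """A GWR node: a weight vector plus edges to neighboring nodes."""
    def __init__(self, weight):
        self.weight = list(weight)
        self.edges = set()        # indices of neighboring nodes

def activation(node, x):
    """Activation as a decaying function of the Euclidean distance
    between the node's weight and the input."""
    d = math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(node.weight, x)))
    return math.exp(-d)

def gwr_step(nodes, x, a_threshold=0.8):
    """Simplified GWR step: find the BMU; if its activation falls
    below the threshold, the input is too far from the stored
    knowledge, so a new node is inserted (halfway between the BMU
    and the input) and linked to the BMU by an edge."""
    b = max(range(len(nodes)), key=lambda i: activation(nodes[i], x))
    if activation(nodes[b], x) < a_threshold:
        new = Node([(wi + xi) / 2 for wi, xi in zip(nodes[b].weight, x)])
        nodes.append(new)
        n = len(nodes) - 1
        nodes[b].edges.add(n)
        new.edges.add(b)
        return n
    return b

# The GWR starts with only two nodes, taken from the data.
nodes = [Node([0.0, 0.0]), Node([0.1, 0.0])]
idx = gwr_step(nodes, [5.0, 5.0])   # far-away input: a node is added
```

Presenting a sample close to an existing node leaves the network unchanged, while a distant sample grows it; this is the novelty-detection behavior described above.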

Each node in the GWR is also equipped with a function that tracks how often it has fired, meaning how often the distance between the neuron and the input was smaller than a certain threshold. This mechanism modulates the creation of new nodes by prioritizing the update of neurons which have not fired in a long time over the creation of new neurons. That gives the network a forgetting mechanism, which allows it to discard useless information, that is, representations which are no longer important to represent the data.
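The firing counter is often implemented as a habituation value that decays each time the node fires. The sketch below is a simplified version of that idea; the decay rate `tau` and floor `h_min` are illustrative parameters, not values from the text.

```python
def habituate(h, tau=0.3, h_min=0.1):
    """One habituation step: each time a node fires, its counter h
    decays from 1.0 toward the floor h_min. A low h marks a node
    that has fired often; a node with h still close to 1.0 has not
    fired in a long time and is a candidate for updating instead
    of inserting a new node."""
    return h + tau * (h_min - h)

h = 1.0                 # a freshly created node starts fully "rested"
for _ in range(10):     # the node fires ten times
    h = habituate(h)
```

After repeated firing, `h` saturates near `h_min`, so the counter cleanly separates well-trained nodes from rarely used ones.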

Together with that, each edge has an associated age that is used to remove old connections. That means that if a cluster created during training is no longer related to the main neurons, it should be deleted. At the end of each training iteration, the nodes without connections are removed. That makes the model robust against outliers.
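The edge-aging and pruning step can be sketched as below. This is a hedged illustration with edges stored as a dictionary from node-index pairs to ages; the `max_age` value and the function name are assumptions, not from the text.

```python
def age_and_prune(ages, bmu, max_age=3):
    """ages maps an edge (i, j) to its age. Edges incident to the
    current BMU are refreshed (age reset to 0), all other edges grow
    older by one, and edges exceeding max_age are dropped. The set
    of node indices that still have at least one edge is returned,
    so isolated nodes can be removed afterwards."""
    kept = {}
    for (i, j), age in ages.items():
        new_age = 0 if bmu in (i, j) else age + 1
        if new_age <= max_age:
            kept[(i, j)] = new_age
    connected = {n for edge in kept for n in edge}
    return kept, connected

# Edge (1, 2) is already at max_age and will be pruned this step.
ages = {(0, 1): 0, (1, 2): 3, (2, 3): 1}
ages, connected = age_and_prune(ages, bmu=0)
```

A node whose index never appears in `connected` has lost all its edges and is deleted, which is how outlier nodes disappear from the model.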

The behavior of the GWR when iterating over a training set shows the emergence of concepts. In the first epochs, the network exhibits exponential growth in the number of neurons, but after achieving a topology that models the data, it mostly converges. This behavior changes when a new set of training samples is presented to the network. If the new set does not match any particular region of the network, the model adapts around the new data distribution, forgetting and removing old neurons when necessary and creating new ones. That gives the model a behavior similar to the formation and storage of memory in the brain.

Figure 4.10 illustrates a GWR at two different stages of training. On the left, Figure 4.10a shows the GWR in the second training cycle. In this example, each dot represents a neuron, and the color of the neuron represents an emotional concept. The image on the right, Figure 4.10b, shows the model after 100 training cycles. It is possible to see that the network created a non-uniform topology to represent the data, and that neurons with similar concepts stay together, creating the idea of clusters. It is also possible to identify that some emotional concepts, mostly the black ones, are merged with the others.

The black concepts represent the neutral emotions, which are related to all others in this example.