
4.4 Hierarchical GWR Model

4.4.1 GWR-based Learning Architecture


(Architecture figure: pose pathway GP1 → GP2 and motion pathway GM1 → GM2 processed over time t, with both streams integrated in the network GSTS.)


When the input is already well represented by an existing neuron, the GWR adapts that neuron instead of creating new ones. For this purpose, the network implements a firing counter to express how frequently a neuron has fired, based on a simplified model of how the efficacy of a habituating synapse decreases over time. As discussed in Section 3.2.2, the use of an activation threshold and firing counters to modulate the growth of the network leads to the creation of a larger number of neurons at early stages of the training, followed by a tuning of the weights of existing neurons through subsequent training iterations (epochs). This behavior is particularly convenient for incremental learning scenarios since new neurons are promptly created to cover the input space, after which the topological map converges through iterative fine-tuning. The GWR algorithm then iterates over the training set until a given stop criterion is met, e.g. a maximum network size or a maximum number of iterations.
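As a minimal sketch of this growth mechanism (the habituation constants, function names, and the omission of edge handling, neighbour updates, and node removal are our own simplifications, not the exact procedure used in this architecture), a new neuron is inserted only when the best-matching neuron responds weakly to the input and has already fired frequently:

```python
import numpy as np

def gwr_step(weights, firing, x, a_T=0.95, f_T=0.1,
             eps_b=0.1, tau_b=0.3, alpha_b=1.05, h0=1.0):
    """One simplified GWR decision for a single input sample x.

    weights: (N, D) neuron weight vectors
    firing:  (N,) firing counters in (0, 1]; values near 0 mean the
             neuron has been best-matching (fired) many times
    """
    # 1. Find the best-matching neuron and its activation.
    dists = np.linalg.norm(weights - x, axis=1)
    b = int(np.argmin(dists))
    activation = np.exp(-dists[b])

    if activation < a_T and firing[b] < f_T:
        # 2a. The input is poorly represented by an already well-trained
        #     neuron: insert a new neuron between the winner and the input.
        weights = np.vstack([weights, 0.5 * (weights[b] + x)])
        firing = np.append(firing, h0)
    else:
        # 2b. Otherwise adapt the winner towards the input; the update is
        #     scaled by the firing counter, so frequently-firing neurons
        #     are trained less.
        weights[b] += eps_b * firing[b] * (x - weights[b])
        # Simplified habituation: the counter decays towards a small floor.
        firing[b] += tau_b * (alpha_b * (h0 - firing[b]) - 1.0)
    return weights, firing
```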

In our architecture, hierarchical learning is carried out as described in Section 4.3.1. At the first stage of our hierarchy, each stream is composed of two GWR networks to process pose and motion features separately. We therefore compute two distinct datasets with sequentially-ordered pose and motion features, denoted as P and M respectively. Since P and M are processed by different network hierarchies, they can differ in dimensionality. Following the notation introduced in Fig. 1, we train the networks GP1 and GM1 with samples from P and M respectively.

After this step, we train GP2 and GM2 with the training sets of concatenated trajectories of best-matching neurons as defined by Eq. 4.3. The STS stage consists of the integration of prototype activation trajectories from both streams by training the network GSTS with two-cue trajectory samples. The network layer GSTS integrates pose-motion features by training the network with the concatenation of vectors Ψ = {Ω(P) ⌢ Ω(M)}, where P and M are the activation trajectories from GP2 and GM2 respectively. After the training of GSTS is completed, each neuron will encode a sequence-selective prototype action segment, thereby integrating changes in the configuration of a person's body pose over time.
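To make this data flow concrete, the sketch below assembles the second-stage training sets and the two-cue input Ψ from already-trained weight matrices; the window length q, all names, and the random matrices standing in for trained networks are illustrative assumptions used only to check the shapes of the concatenated trajectories:

```python
import numpy as np

def bmu_weights(weights, samples):
    """Weight vector of the best-matching neuron for each sample."""
    return np.array([weights[np.argmin(np.linalg.norm(weights - x, axis=1))]
                     for x in samples])

def trajectories(weights, samples, q=3):
    """Concatenate the best-matching weights of q consecutive samples
    into one trajectory vector (cf. Eq. 4.3)."""
    bmu = bmu_weights(weights, samples)
    return np.array([np.concatenate(bmu[i:i + q])
                     for i in range(len(samples) - q + 1)])

# Dummy data standing in for the pose/motion datasets and trained networks.
rng = np.random.default_rng(0)
P, M = rng.normal(size=(100, 10)), rng.normal(size=(100, 6))
G_P1, G_M1 = rng.normal(size=(30, 10)), rng.normal(size=(30, 6))     # first-stage weights
T_P, T_M = trajectories(G_P1, P), trajectories(G_M1, M)              # training sets for G_P2 / G_M2
G_P2 = rng.normal(size=(30, T_P.shape[1]))                           # second-stage weights
G_M2 = rng.normal(size=(30, T_M.shape[1]))
# STS stage: concatenate pose and motion activation trajectories sample-wise.
Psi = np.hstack([trajectories(G_P2, T_P), trajectories(G_M2, T_M)])  # training set for G_STS
```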

For the purpose of action classification, we extend the unsupervised GWR-based learning with two labelling functions: one for the training phase and one for returning the label of unseen samples, as described in Section 4.3.2. We train the GSTS network with the labelled training pairs so that symbolic labels are attached to neurons representing temporally-ordered visual representations.
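A minimal sketch of such a labelling scheme is shown below, assuming a simple majority vote per neuron during training and best-matching-neuron lookup at test time; the helper names are illustrative, and the precise labelling functions are those defined in Section 4.3.2:

```python
import numpy as np
from collections import Counter

def attach_labels(weights, samples, labels):
    """Training-phase labelling: each neuron stores the most frequent label
    among the training samples for which it is the best-matching neuron."""
    votes = [Counter() for _ in weights]
    for x, y in zip(samples, labels):
        b = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
        votes[b][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def predict_label(weights, neuron_labels, x):
    """Test-phase labelling: return the label of the best-matching neuron."""
    b = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    return neuron_labels[b]
```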

Noise Detection

The presence of noise, in the sense of outliers in the training set, has been shown to have a negative influence on the formation of faithful topological representations using SOMs (Parisi and Wermter, 2013), whereas this issue is partially addressed by incremental networks. For instance, incremental networks such as GNG and GWR are equipped with a mechanism to remove rarely activated nodes and connections that may represent noisy input. In contrast to GNG, however, the learning strategy of the GWR responds quickly to changes in the distribution of the input by creating new neurons to match it. The insertion threshold aT modulates the number of neurons that will be added, e.g. for high values of aT more nodes will be created.
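The removal of rarely activated nodes and connections mentioned above can be sketched with a generic GNG/GWR-style edge-ageing rule; the parameter names and the maximum edge age below are illustrative assumptions rather than the values used in this work:

```python
def age_and_prune(edges, num_nodes, b, s, max_age=50):
    """Generic GNG/GWR-style edge maintenance for one training step.

    edges:     dict mapping frozenset({i, j}) -> age of the connection
    num_nodes: current number of neurons in the network
    b, s:      indices of the best and second-best matching neurons
    Returns the updated edges and the indices of isolated neurons,
    which can then be removed from the network.
    """
    # Refresh (or create) the edge between the two winning neurons.
    edges[frozenset((b, s))] = 0
    # Age every other edge of the winner and drop edges that are too old.
    for e in list(edges):
        if b in e and e != frozenset((b, s)):
            edges[e] += 1
            if edges[e] > max_age:
                del edges[e]
    # Neurons left without any edge are rarely activated and can be removed.
    connected = {i for e in edges for i in e}
    isolated = set(range(num_nodes)) - connected
    return edges, isolated
```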



Figure 4.6: A GWR network trained with a normally distributed training set of 1000 samples resulting in 556 nodes and 1145 connections (Parisi et al., 2015b).


Figure 4.7: Activation values for the network trained in Fig. 4.6 with a test set of 200 samples containing noise. Noisy samples lie below the novelty threshold anew = 0.1969 (green line) (Parisi et al., 2015b).

The network is also equipped with a mechanism to prevent slight input fluctuations from perturbing the learning convergence and causing the creation of unnecessary nodes. The GWR takes into account the number of times that a neuron has been activated, so that neurons that have been activated more often are trained less. An additional threshold therefore modulates the firing counter of neurons so that during the learning process less trained neurons are updated, whereas new neurons are created only when existing neurons do not sufficiently represent the input. A number of experiments have shown that the GWR is well-suited for novelty detection (Marsland et al., 2005), which involves the identification of inputs that do not fit the learned model.

In line with this mechanism, we use the activation function to detect noisy input after the training phase. The activation function will be equal to 1 in response to input that perfectly matches the model, i.e. minimum distance between the weights of the neuron and the input, and will decrease exponentially for input with a higher distance. If the response of the network to a novel input is below a given novelty activation threshold anew, then the input can be considered noisy in the sense that it is not represented by well-trained prototype neurons, and thus discarded. The threshold value anew can be empirically selected by taking into account the response distribution of the trained network with respect to the training set. For each novel input xnew, we compute:

exp(−‖xnew − wb‖) < Ā − u · σ(A),    (4.7)

where wb is the best-matching neuron of xnew, Ā and σ(A) are respectively the mean and the standard deviation of the set of activations A from the training set, and u is a constant value that modulates the influence of fluctuations in the activation distribution.
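The following sketch shows how the threshold of Eq. 4.7 can be computed from the training-set activations and applied to novel input; the helper names are illustrative, and the trained GWR is reduced here to its weight matrix:

```python
import numpy as np

def activation(weights, x):
    """Network response: 1 for a perfect match, decaying exponentially
    with the distance to the best-matching neuron (cf. Eq. 4.7)."""
    return np.exp(-np.min(np.linalg.norm(weights - x, axis=1)))

def novelty_threshold(weights, train_samples, u=4.0):
    """a_new = mean(A) - u * std(A), computed over the training activations."""
    A = np.array([activation(weights, x) for x in train_samples])
    return A.mean() - u * A.std()

def is_noisy(weights, x, a_new):
    """A novel input is treated as noise if its activation falls below a_new."""
    return activation(weights, x) < a_new
```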

Fig. 4.6 shows a GWR network trained with 100 input vectors with two normally distributed clusters. Over its 500 iterations, the network created 556 neurons and 1145 connections (aT = 0.95, u = 4). The activation values for a test set of 200 samples (also normally distributed) containing artificially introduced noise are shown in Fig. 4.7. It can be observed that noisy samples lie below the computed activation threshold anew = 0.1969 (Eq. 4.7) and can, therefore, be discarded. We apply this noise detection procedure to all the networks in our architecture with the aim of attenuating noise in the training data and preventing the forced classification of inputs that are not represented by the trained model.