
3.4 Learning

3.4.2 Supervised Learning

Supervised learning is used to imprint a precise set of desired input-output relationships onto neuronal networks. If the output neuron is required to respond to certain classes of input patterns with a specific output activity, this process is called classification. During the learning process, a supervisor or teacher provides detailed feedback to the synaptic connections on the success of learning and the nature of the synaptic changes necessary to achieve the desired input-output relationship. The formulation of the input-output relations depends on the network structure and neuron type: In the simplest setting in a feed-forward network, an output neuron is required to respond to a given spatial input pattern either with positive activity or not. This is called the Perceptron problem and will be discussed in the next section. The Perceptron setting can be extended to the temporal domain with spiking neurons, the Tempotron: Here, the output neuron is required to classify two sets of spatio-temporal input patterns by either spiking or not spiking. The natural extension of the Tempotron is the Chronotron: Here, the output neuron is required to respond to a spatio-temporal input spike pattern with a spike at a precisely defined time. This problem and its solutions will be discussed in section 3.4.2.3.

Supervised learning is also applied to learning in recurrent networks. Here, the goal of learning is either a stationary activation pattern in response to a noisy or partial version of that pattern, or even elongated spatio-temporal activation patterns. Due to the high sensitivity of activations in recurrent networks to noise, it is difficult to imprint stable activation sequences.

3.4.2.1 The Perceptron

The Perceptron is a toy model of a simple feed-forward neural network that can learn to distinguish two different classes of inputs. To that end, it is required to respond to one class of input patterns with activation and to the other class with non-activation. This can also be viewed as a case of associative learning, where the input pattern is associated with a given output.

Consider a simple feed-forward network, which consists of a layer of N input neurons and a single output neuron that is trained to perform the desired classification. Figure 3.1(a) shows an example of such a network. In the original Perceptron setup as introduced in [HKP91], the state of each input neuron is called ξi and takes the values ξi ∈ {−1, 1}. The output neuron takes states O ∈ {−1, 1}. It is a simple threshold unit, i.e. it computes its output according to

O = g(h) = g( Σi wi ξi ) , (3.29)

where wi is the connection strength from input neuron i to the output neuron and g(h) is the activation function. In the simple case of deterministic threshold units it is just the sign function:

g(h) = sgn(h) (3.30)
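As a concrete illustration, the threshold unit of eqs. (3.29) and (3.30) can be sketched in a few lines. This is a minimal sketch; the function and variable names are our own:

```python
import numpy as np

def perceptron_output(w, xi):
    """Threshold unit of eqs. (3.29)/(3.30): O = sgn(sum_i w_i * xi_i).

    w  : weight vector (length N)
    xi : input pattern with entries in {-1, +1}
    """
    h = np.dot(w, xi)           # summed, weighted input
    return 1 if h >= 0 else -1  # the sign convention at h = 0 is arbitrary

# Example with N = 3 inputs
w = np.array([0.5, -0.2, 0.1])
xi = np.array([1, -1, 1])
print(perceptron_output(w, xi))  # h = 0.5 + 0.2 + 0.1 = 0.8 → 1
```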

3.4.2.1.1 The Original Perceptron Learning Rule

In simple classification problems, there are two classes of input patterns, labeled by the index µ, which the output neuron is supposed to distinguish. The input vector of pattern µ will be denoted ξ⃗µ. This implies that for each input pattern ξ⃗µ there is a desired output ζµ. The goal then is that

Oµ =! ζµ . (3.31)

Since Oµ is given by eq. (3.29), the weights wi have to be chosen such that the summed and weighted input into the output neuron is either positive or negative.

It is possible to interpret eq. (3.29) as a scalar product, which makes it possible to rewrite it as

ζµ =! Oµ = sgn( w⃗ · ξ⃗µ ) . (3.32)

This shows that the output is just the sign of the input projected onto the weight vector.

Therefore, the boundary between positive and negative output is given by the plane defined by w⃗ · ξ⃗µ = 0, which passes through the origin and is perpendicular to w⃗.

The goal here is to choose w⃗ such that this plane separates patterns with desired positive output (ζµ = 1) from patterns with desired negative output (ζµ = −1).

This is not always possible, since the patterns may not be linearly separable, that is, two or more patterns may impose incompatible requirements on the synaptic weights, so that no weight vector produces the correct output for all of them.

For robustness to noise in the input patterns, it is useful to define a margin κ, which enforces a minimum distance between the input hµ = w⃗ · ξ⃗µ into the output neuron and zero, such that

ζµ hµ > N κ . (3.33)

The original Perceptron learning rule is then given by

∆wi = η Θ(N κ − ζµ hµ) ζµ ξiµ . (3.34)

The Heaviside function Θ makes the weight change nonzero only if the condition in equation 3.33 is not fulfilled, that is, if the output differs from the required output. The weight from input neuron i to the output neuron is then increased if the activations of both have the same sign, and decreased if they have different signs. Hence, the weight change moves the output in the desired direction. Due to the constant size of the weight change in each step, this learning rule converges in a finite number of steps if learning is possible (see [HKP91] for a proof). Furthermore, learning stops when the actual output equals the desired output, so that overlearning due to repeated presentation of the input patterns is not possible.

These are highly desirable qualities in a learning rule.
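The learning rule of eq. (3.34) can be sketched as a simple training loop. This is an illustrative sketch, not code from the original work; names such as `train_perceptron` and the example patterns are our own:

```python
import numpy as np

def train_perceptron(patterns, targets, eta=0.1, kappa=0.0, max_epochs=1000):
    """Perceptron learning rule of eq. (3.34):
    Delta w_i = eta * Theta(N*kappa - zeta^mu * h^mu) * zeta^mu * xi_i^mu.

    patterns : array of shape (P, N) with entries in {-1, +1}
    targets  : array of shape (P,) with desired outputs zeta^mu in {-1, +1}
    """
    P, N = patterns.shape
    w = np.zeros(N)
    for _ in range(max_epochs):
        updated = False
        for xi, zeta in zip(patterns, targets):
            h = np.dot(w, xi)
            if zeta * h <= N * kappa:    # margin condition of eq. (3.33) violated
                w += eta * zeta * xi     # move output toward the desired sign
                updated = True
        if not updated:                  # all patterns correct: learning stops
            break
    return w

# Learn a linearly separable mapping: zeta is the sign of the first component
patterns = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
targets = np.array([1, 1, -1, -1])
w = train_perceptron(patterns, targets)
print(all(np.sign(np.dot(w, xi)) == z for xi, z in zip(patterns, targets)))  # → True
```

Note that the update uses `<=` at the boundary, so that with κ = 0 a pattern sitting exactly on the separating plane still triggers a weight change.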

3.4.2.2 The Tempotron

The natural extension of the Perceptron problem is the Tempotron. Here, the output neuron is taught to classify elongated spatio-temporal patterns and respond to them with either a spike or no spike. This concept was introduced in [GS06], where it was found that in principle, integrate-and-fire neurons are capable of learning such spike-timing-based decisions, since the state of the neuron depends on the order of inputs into the neuron.

3.4.2.3 The Chronotron

The Chronotron problem extends the Tempotron problem beyond the pure classification task on spatio-temporal patterns: The output neuron is required to provide one output spike at a precisely defined time during each input pattern.

There are several learning rules that attempt to solve these classification problems, some of which I will introduce here.

3.4.2.3.1 The δ-rule and ReSuMe.

The δ-rule, also called the Widrow-Hoff rule [HKP91], lies at the core of a whole class of learning rules used to teach a neuronal network some target activity pattern. Synaptic changes are driven by the difference between desired and actual output, weighted by the presynaptic activity:

∆w(t) ∝ fpre(t) ( f_post^target(t) − f_post^actual(t) ) . (3.35)

Pre- and postsynaptic firing rates are denoted fpre,post. The target activity f_post^target(t) is some arbitrary time-dependent firing rate. The actual self-generated activity f_post^actual(t) is given by the current input or voltage of the postsynaptic neuron (depending on the formulation), transformed by the input-output function g(h) of the neuron.
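A minimal rate-based sketch of one δ-rule update step (eq. 3.35), assuming a single postsynaptic neuron whose actual rate is g applied to the weighted input; all names and parameter values are our own:

```python
import numpy as np

def delta_rule_step(w, f_pre, f_target, g, eta=0.01):
    """One delta-rule update (eq. 3.35) for a single rate neuron.

    w        : weight vector
    f_pre    : presynaptic rates (length N)
    f_target : desired postsynaptic rate
    g        : input-output function, f_actual = g(w . f_pre)
    """
    f_actual = g(np.dot(w, f_pre))
    return w + eta * f_pre * (f_target - f_actual)

# With a linear neuron g(h) = h, repeated steps drive the output to the target
g = lambda h: h
w = np.zeros(3)
f_pre = np.array([1.0, 0.5, 0.2])
for _ in range(1000):
    w = delta_rule_step(w, f_pre, f_target=2.0, g=g)
print(round(g(np.dot(w, f_pre)), 3))  # → 2.0
```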

ReSuMe (short for Remote Supervised Method) is a supervised spike-based learning rule first proposed in 2005 [PK10]. It is derived from the Widrow-Hoff rule for rate-based neurons, applied to deterministic spiking neurons. Therefore, continuous time dependent firing rates are replaced by discrete spiking events in time, expressed as sums of delta-functions. Because these functions have zero width in time, it is necessary to temporally spread out presynaptic spikes by convolving the presynaptic spike train with a temporal kernel. Although the choice of the kernel is free, usually a causal exponential kernel works best. The weight change is given by

ẇ(t) ∝ [Sd(t) − So(t)] [ ad + ∫₀^∞ exp(−s/τplas) Si(t − s) ds ] , (3.36)

where Sd(t) is the desired, So(t) the self-generated, and Si(t) the input spike train at synapse i. τplas is the decay time constant of the exponential kernel. ad is a constant which ensures that the actual and target firing rates match; learning also works without it. ReSuMe converges when actual and desired spikes lie at the same times, because in this case the weight changes exactly cancel each other out.
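Integrated over a trial, eq. (3.36) reduces to a sum over desired and actual spikes for each synapse. The following sketch assumes a causal exponential kernel; the amplitude A, ad and τplas are free parameters chosen for illustration:

```python
import numpy as np

def resume_dw(t_desired, t_actual, t_input, a_d=0.05, A=1.0, tau_plas=10.0):
    """ReSuMe weight change for one synapse, eq. (3.36) integrated over a trial.

    Each desired spike contributes +[a_d + kernel sum over earlier input spikes],
    each actual (self-generated) spike the same with a minus sign; the kernel
    is the causal exponential A * exp(-s / tau_plas) for s >= 0.
    """
    def kernel_sum(t):
        # sum the kernel over input spikes preceding time t (causality)
        return sum(A * np.exp(-(t - ti) / tau_plas) for ti in t_input if ti <= t)

    dw = sum(a_d + kernel_sum(t) for t in t_desired)
    dw -= sum(a_d + kernel_sum(t) for t in t_actual)
    return dw

t_input = [5.0, 12.0]
print(resume_dw(t_desired=[20.0], t_actual=[], t_input=t_input))     # positive: potentiation
print(resume_dw(t_desired=[20.0], t_actual=[20.0], t_input=t_input)) # coincident spikes cancel → 0.0
```

The last call illustrates the convergence property from the text: when a desired and an actual spike coincide, their contributions cancel exactly.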

3.4.2.3.2 E-Learning.

E-Learning was conceived as an improved learning algorithm for spike time learning [Flo12a].

It is derived from the Victor-Purpura distance (VP distance) between spike trains [VP96].

The VP distance quantifies the similarity of two different spike trains (see section 3.5.2).

E-Learning is a gradient descent on the VP distance and has smoother convergence than ReSuMe. In this rule, the actual output spike train is first compared to the desired spike train. The VP algorithm determines whether output spikes must be shifted or erased, or whether some desired output spike has no close actual spike, so that a new spike has to be inserted. Based on this evaluation, actual and desired spikes are sorted into three categories:

• Actual output spikes are “paired” if they have a counterpart, i.e. a desired spike close in time with no other actual output spike closer (and vice versa). These spikes are put into the set S.

• Unpaired actual output spikes that need to be deleted are put into the set D.

• Unpaired desired output spike times are put into the set J, i.e. the set of spikes that have to be inserted.
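The sorting into the sets S, D and J can be illustrated with a simplified greedy nearest-neighbour matching. The actual VP algorithm solves this assignment by dynamic programming, so this is only an approximation for illustration; the cutoff tau_q (pair only if shifting is cheaper than delete-plus-insert) is an assumption:

```python
def pair_spikes(actual, desired, tau_q=10.0):
    """Sort spikes into S (paired), D (delete) and J (insert).

    Greedy stand-in for the VP matching: an actual and a desired spike are
    paired if each is the other's nearest free partner and they are closer
    than tau_q.
    """
    S, used_a, used_d = [], set(), set()
    # candidate pairs ordered by temporal distance
    pairs = sorted(
        (abs(ta - td), i, j)
        for i, ta in enumerate(actual)
        for j, td in enumerate(desired)
    )
    for dist, i, j in pairs:
        if dist < tau_q and i not in used_a and j not in used_d:
            S.append((actual[i], desired[j]))
            used_a.add(i)
            used_d.add(j)
    D = [ta for i, ta in enumerate(actual) if i not in used_a]
    J = [td for j, td in enumerate(desired) if j not in used_d]
    return S, D, J

S, D, J = pair_spikes(actual=[12.0, 40.0], desired=[15.0, 80.0])
print(S, D, J)  # [(12.0, 15.0)] [40.0] [80.0]
```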

To clarify, S contains pairs of “paired” actual and desired spike times, D contains the times of all unpaired actual spikes, and J the times of unpaired desired spikes. With the PSP sum λi(t) as above, the E-Learning rule is then

∆wi = γ [ Σ_{tins∈J} λi(tins) − Σ_{tdel∈D} λi(tdel) + (γr/τq²) Σ_{(tact,tdes)∈S} (tact − tdes) λi(tact) ] . (3.37)

γ is the learning rate, and γr is a factor that scales spike shifting relative to deletion and insertion.

The first two terms of the rule correspond to ReSuMe, except that the kernel is not a simple exponential decay. The advantage of E-Learning is that the weight changes for spikes close to their desired location are scaled with the temporal distance, which improves convergence and consequently memory capacity.
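Given the sets S, D and J and the PSP sum λi(t), eq. (3.37) translates directly into code. This is a sketch; the toy PSP trace and all parameter values are assumptions:

```python
import numpy as np

def e_learning_dw(lam_i, S, D, J, gamma=0.01, gamma_r=1.0, tau_q=10.0):
    """E-Learning weight change of eq. (3.37) for synapse i.

    lam_i : function returning the PSP sum lambda_i(t) at time t
    S     : list of (t_act, t_des) pairs of paired spikes
    D     : times of unpaired actual spikes (to delete)
    J     : times of unpaired desired spikes (to insert)
    """
    dw = sum(lam_i(t) for t in J)      # push toward missing spikes
    dw -= sum(lam_i(t) for t in D)     # suppress spurious spikes
    dw += (gamma_r / tau_q**2) * sum(  # shift paired spikes toward t_des
        (t_act - t_des) * lam_i(t_act) for t_act, t_des in S
    )
    return gamma * dw

# Toy PSP trace: a single input spike at t = 10 with an exponential PSP
lam_i = lambda t: np.exp(-(t - 10.0) / 5.0) if t >= 10.0 else 0.0
print(e_learning_dw(lam_i, S=[(12.0, 15.0)], D=[40.0], J=[80.0]))
```

The sign of the shift term is such that an actual spike later than its desired partner (tact > tdes) increases the weight, making the neuron fire earlier, and vice versa.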

3.4.2.3.3 FP-Learning.

FP-Learning [MRÖS14] was devised to remedy a central problem of learning rules like ReSuMe and others. Any erroneous or missing spike “distorts” the time course of the membrane potential after it compared to the desired final state. This creates a wrong environment for the learning rule, so subsequent weight changes can potentially be wrong. Therefore, the FP-Learning algorithm stops the learning trial as soon as it encounters any spike output error. Additionally, FP-Learning introduces a margin of tolerable error for the desired output spikes. An actual output spike should be generated in the window of tolerance [td − ε, td + ε] with the adjustable margin ε. Weights are changed on two occasions:

1. If a spike occurs at time terr outside the window of tolerance of every td, then weights are depressed by ∆wi ∝ −λi(terr). This also applies if the spike in question is the second one within a given tolerance window.

2. If t = td + ε and no spike has occurred in the window of tolerance, then terr = td + ε and ∆wi ∝ λi(terr).

In both cases, the learning trial ends immediately, to prevent the “distorted” membrane potential from leading to spurious weight changes. Because of this property, this rule is also referred to as “First Error Learning”.
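The two update occasions and the stop-at-first-error behaviour can be sketched as a scan over a trial. This is an illustrative sketch with our own names; a constant λi is used in the example only to keep it simple:

```python
def fp_learning_trial(actual_spikes, desired_spikes, lam_i, eps=1.0, eta=0.01):
    """Scan a trial in time and return the first FP-Learning weight change.

    Stops at the first error ("First Error Learning"): either a spike outside
    every tolerance window [t_d - eps, t_d + eps] (or a second one inside a
    window), which depresses the weight, or a window that closes without a
    spike, which potentiates it. Returns 0.0 for an error-free trial.
    """
    events = sorted(
        [(t, "spike") for t in actual_spikes]
        + [(td + eps, "window_close", td) for td in desired_spikes],
        key=lambda e: e[0],
    )
    spikes_in_window = {td: 0 for td in desired_spikes}
    for ev in events:
        if ev[1] == "spike":
            t = ev[0]
            hits = [td for td in desired_spikes if abs(t - td) <= eps]
            if not hits or spikes_in_window[hits[0]] >= 1:
                return -eta * lam_i(t)        # erroneous or duplicate spike
            spikes_in_window[hits[0]] += 1
        else:
            _, _, td = ev
            if spikes_in_window[td] == 0:
                return eta * lam_i(td + eps)  # window closed without a spike
    return 0.0

lam_flat = lambda t: 1.0  # constant PSP sum, only to keep the example simple
print(fp_learning_trial([5.0], [20.0], lam_flat))  # early erroneous spike → -0.01
print(fp_learning_trial([], [20.0], lam_flat))     # missed window → 0.01
```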