
2.2.3 Homeostatic Plasticity: Information Theoretic Intrinsic Plasticity Rule

⁴ The information storage measure was implemented using modified versions of the Java-based information dynamics toolkit (Lizier, 2014). The toolkit was used as a wrapper class with Matlab.

As discussed in the introductory chapter (section 1.3.2), homeostatic regulation by way of intrinsic plasticity is viewed as a mechanism for the biological neuron to autonomously modify its firing activity to match the input stimulus distribution (Turrigiano et al., 1994; Desai et al., 1999). From an information theoretic perspective, Stemmler and Koch (1999) demonstrated that IP can allow a neuron to exploit its complete dynamic range of firing rates while being driven by a given input, such that for Gaussian input distributions, IP could lead to an optimal exponential output distribution for maximizing information transfer (see Fig. 1.5 (c) and (d)). Furthermore, it is plausible that single neurons try to achieve this maximum information transmission while obeying constraints on their energy expenditure (Sharpee et al., 2014). Based on this idea, IP can be formalized based on the following three principles (Schrauwen et al., 2008):

Information maximization: the mutual information (see appendix A.1) between a neuron's input and its firing rate output is maximized, i.e. the output of the neuron carries as much information about the input as possible (a numerical illustration follows the list below).

Constrained output distribution: neurons have a limited range of operation (a bounded firing rate range set by the type of non-linearity), exhibit highly sparse firing patterns, and are limited in their energy expenditure.

Adaptation of the neuron's intrinsic parameters: biological neurons have been observed to adjust their intrinsic excitability and maintain firing rate homeostasis without the need to change individual synaptic connections (Zhang and Linden, 2003).
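As a rough numerical illustration of the information maximization principle (my own sketch, not part of the original formalization), the code below estimates the entropy of a tanh neuron's output r = tanh(ax) for Gaussian input and several gains a. Because the output range is bounded, the histogram-based entropy estimate peaks at an intermediate gain: a very small gain compresses all outputs near zero, while a very large gain saturates the neuron at ±1.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 0.5, size=100_000)        # Gaussian input stimulus

    def output_entropy(a, b=0.0, bins=50):
        # Histogram estimate of the differential entropy of r = tanh(a*x + b).
        r = np.tanh(a * x + b)
        counts, edges = np.histogram(r, bins=bins, range=(-1.0, 1.0))
        p = counts / counts.sum()
        p = p[p > 0]
        return -(p * np.log(p)).sum() + np.log(edges[1] - edges[0])

    for gain in [0.1, 0.5, 1.0, 2.0, 5.0, 20.0]:
        print(f"gain a = {gain:5.1f} -> output entropy ~ {output_entropy(gain):.3f} nats")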

In (Triesch, 2007) a model of intrinsic plasticity based on changes to the neuronal non-linear activation function was introduced. A gradient rule was derived for the direct minimization of the Kullback-Leibler divergence between the neuron's current firing-rate distribution and a maximum entropy (fixed mean) exponential output distribution. Subsequently, in (Schrauwen et al., 2008) an IP rule for the hyperbolic tangent transfer function with a Gaussian output distribution (the fixed variance maximum entropy distribution) was derived. When testing the adapted reservoir dynamics, it was observed that for temporal tasks requiring linear responses the Gaussian distribution performs well, whereas on non-linear tasks the exponential distribution gave better performance. In this thesis, with the aim of obtaining sparser output codes with an increased signal to noise ratio for stable temporal memory processing, we derive and implement a generic IP learning rule for the reservoir neurons, using the Weibull distribution as the target output distribution.

The Weibull distribution is a two-parameter continuous distribution whose shape and scale parameters can be adapted to account for various shapes of the neuron activation function (Eq. 2.12). It has a high kurtosis for small shape values, leading to sparser output codes, and it can generalize between a wide range of cumulative distribution functions. Unlike previous models based on Fermi transfer functions (Triesch, 2007; Steil, 2007), here we use the Weibull distribution as the target output distribution and derive a generic stochastic learning rule for the hyperbolic tangent (tanh) neuronal non-linearity. This is primarily aimed at firing rate homeostasis as well as optimal information flow between the input and output of each reservoir neuron.
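The link between the Weibull shape parameter and sparse output codes can be made concrete with a small numerical check. The snippet below (my own illustration, using SciPy rather than the toolchain of the thesis) prints the excess kurtosis and the probability mass at very low activity for a few shape values, with the scale β = 0.15 taken from Fig. 2.4; smaller shape values give heavier-tailed, more kurtotic distributions with a larger fraction of the mass near zero, i.e. sparser codes.

    import numpy as np
    from scipy.stats import weibull_min

    beta = 0.15                                   # scale, as used for the target in Fig. 2.4
    for alpha in [0.7, 1.0, 2.0, 3.5]:
        dist = weibull_min(c=alpha, scale=beta)
        kurt = float(dist.stats(moments='k'))     # excess (Fisher) kurtosis
        low = dist.cdf(0.05)                      # probability of very low activity r < 0.05
        print(f"alpha = {alpha:3.1f}: excess kurtosis = {kurt:7.2f}, P(r < 0.05) = {low:.2f}")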


Figure 2.4: Example of generalized Weibull intrinsic plasticity for a single reservoir neuron. (left) A hyperbolic tangent neuron firing rate function with initial shape and bias parameters a = 1.0 and b = 0.0. Randomly selected input stimuli from a Gaussian distribution (zero mean, standard deviation 0.5) result in a neuron firing rate output following a broad Gaussian distribution. (right) After intrinsic plasticity assuming an optimal Weibull output distribution (with parameters α = 1.0 and β = 0.15), the neuron firing rate curve shifts (learned mean values of a = 1.5087 and b = −1.1366). As a result, for the same input from a Gaussian distribution, the reservoir neuron output activity follows a maximum entropy exponential-like distribution. The Weibull distribution allows flexible adjustment of the optimal distribution shape by changing the parameters α and β accordingly.

Deriving the IP Rule for Neuron Activation Function Parameters:

The probability density function of the two-parameter Weibull random variable r is given as follows:

f_{weib}(r;\beta,\alpha) = \frac{\alpha}{\beta}\left(\frac{r}{\beta}\right)^{\alpha-1} e^{-(r/\beta)^{\alpha}}, \qquad r \geq 0

The parameters α > 0 and β > 0 control the shape and scale of the distribution respectively.

Between α = 1 and α = 2, the Weibull distribution interpolates between the exponential distribution and the Rayleigh distribution; specifically, for α between 3 and 5 we obtain an almost normal distribution. Due to this generalization capability it is well suited to model the actual firing rate distribution and also to account for different types of neuron non-linearity. The neuron firing rate parameters a and b of Eq. 2.12 can be calculated by minimizing the Kullback-Leibler (K-L) divergence between the actual output distribution of the reservoir neuron's activity f_r(r) and the desired distribution f_weib(r) with a fixed mean firing rate β (Fig. 2.4).
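These limiting cases can be checked directly; the short sketch below (an added illustration, not from the original text) confirms numerically that the Weibull density coincides with the exponential density for α = 1 and with the Rayleigh density for α = 2, once the scales are matched appropriately.

    import numpy as np
    from scipy.stats import weibull_min, expon, rayleigh

    r = np.linspace(0.01, 3.0, 200)
    scale = 1.0

    # alpha = 1: the Weibull density reduces to the exponential density
    err_exp = np.max(np.abs(weibull_min(c=1.0, scale=scale).pdf(r) - expon(scale=scale).pdf(r)))

    # alpha = 2: the Weibull density reduces to the Rayleigh density (scale divided by sqrt(2))
    err_ray = np.max(np.abs(weibull_min(c=2.0, scale=scale).pdf(r)
                            - rayleigh(scale=scale / np.sqrt(2)).pdf(r)))

    print(f"max |Weibull(alpha=1) - Exponential| = {err_exp:.2e}")
    print(f"max |Weibull(alpha=2) - Rayleigh|    = {err_ray:.2e}")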


The KL-divergence between f_r(r) and f_weib(r) is given by:

D = D_{KL}(f_r(r), f_{weib}(r)) = \int f_r(r) \log\left(\frac{f_r(r)}{f_{weib}(r)}\right) dr = -H(r) - \int f_r(r)\,\log f_{weib}(r)\, dr

Here, H(r) is the firing rate entropy (self-information) of a reservoir neuron.

We know, f_r(r) = \frac{f_x(x)}{\partial r / \partial x} (from Eq. 2.12)⁵ for a single neuron with input x and output r. Since -H(r) = E[\log f_r(r)] = E[\log f_x(x)] - E[\log(\partial r/\partial x)] and the term E[\log f_x(x)] does not depend on the neuron parameters, substituting the Weibull density and representing the integrals in terms of the expectation (E) quantities, the above relation can be simplified to (here C collects the constant terms):

D = -E\left[\log\frac{\partial r}{\partial x} + (\alpha - 1)\log\left(\frac{r}{\beta}\right) - \left(\frac{r}{\beta}\right)^{\alpha}\right] + C
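It may help to note explicitly the special case relevant to Fig. 2.4 (this specialization is an added remark): for the exponential target, α = 1, the logarithmic term drops out and the objective becomes

D\big|_{\alpha = 1} = -E\left[\log\frac{\partial r}{\partial x}\right] + \frac{E[r]}{\beta} + C,

so minimizing D trades off a penalty on saturation of the non-linearity (where ∂r/∂x → 0) against a drive toward low mean output activity, consistent with the sparse, exponential-like firing rate distribution shown in Fig. 2.4.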

Recall that the tanh non-linearity can be represented in the exponential form as follows:

r = \tanh(ax+b) = \frac{e^{2(ax+b)} - 1}{e^{2(ax+b)} + 1} \qquad (2.24)

Thus, differentiating this w.r.t. x, a and b, and representing the results in terms of r, we get the following set of base equations:

\frac{\partial r}{\partial x} = a\,(1 - r^2), \qquad \frac{\partial r}{\partial a} = x\,(1 - r^2), \qquad \frac{\partial r}{\partial b} = (1 - r^2) \qquad (2.25)

⁵ The activations are time dependent; however, here we neglect the time variable for mathematical convenience.
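Since the learning rule below hinges on these base derivatives, a quick finite-difference check (an added verification sketch, not part of the thesis) can confirm Eq. 2.25 numerically for arbitrary parameter values:

    import numpy as np

    a, b, x, eps = 1.3, -0.4, 0.7, 1e-6
    r = np.tanh(a * x + b)

    # analytic base derivatives (Eq. 2.25)
    dr_dx, dr_da, dr_db = a * (1 - r**2), x * (1 - r**2), (1 - r**2)

    # central finite-difference approximations
    fd_dx = (np.tanh(a * (x + eps) + b) - np.tanh(a * (x - eps) + b)) / (2 * eps)
    fd_da = (np.tanh((a + eps) * x + b) - np.tanh((a - eps) * x + b)) / (2 * eps)
    fd_db = (np.tanh(a * x + b + eps) - np.tanh(a * x + b - eps)) / (2 * eps)

    print(np.allclose([dr_dx, dr_da, dr_db], [fd_dx, fd_da, fd_db], atol=1e-8))  # True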


Using the partial derivatives from Eq. 2.25 and differentiating D w.r.t. the parameter b yields:

\frac{\partial D}{\partial b} = 2r - (\alpha - 1)\,\frac{1 - r^2}{r} + \frac{\alpha}{\beta}\left(\frac{r}{\beta}\right)^{\alpha - 1}(1 - r^2)

Similarly, differentiating D w.r.t. the parameter a results in:

\frac{\partial D}{\partial a} = -\frac{1}{a} + x\,\frac{\partial D}{\partial b}

From the above equations we get the following on-line learning rules, obtained by stochastic gradient descent with learning rate η:

\Delta b = -\eta\,\frac{\partial D}{\partial b}, \qquad \Delta a = -\eta\,\frac{\partial D}{\partial a} = \frac{\eta}{a} + x\,\Delta b

Note: This relationship between the neuron parameter update rules (Δa and Δb) is generic and valid irrespective of the neuron non-linearity or target probability distribution.
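To make the updates concrete, here is a minimal Python sketch of the on-line rule for a single tanh neuron, written directly from the equations above. The input statistics and target parameters (Gaussian input with standard deviation 0.5; α = 1.0, β = 0.15) follow Fig. 2.4, but the code itself is an illustrative assumption of mine, not the Matlab implementation used in the thesis; the general gradient is only well defined on the Weibull support r > 0, which is unproblematic for the exponential case α = 1 used here.

    import numpy as np

    # Illustrative single-neuron demo of the Weibull IP rule (cf. Fig. 2.4):
    # Gaussian input, exponential target (alpha = 1, beta = 0.15).
    alpha, beta = 1.0, 0.15
    eta = 0.001                      # learning rate
    a, b = 1.0, 0.0                  # initial gain and bias of r = tanh(a*x + b)
    rng = np.random.default_rng(1)

    def dD_db(r):
        # Stochastic gradient of D w.r.t. b (derived above). The middle term vanishes
        # for alpha = 1; a tiny constant guards the 1/r division in the general case.
        r_safe = r if abs(r) > 1e-12 else 1e-12
        return (2.0 * r
                - (alpha - 1.0) * (1.0 - r**2) / r_safe
                + (alpha / beta) * (r_safe / beta) ** (alpha - 1.0) * (1.0 - r**2))

    for _ in range(200_000):         # on-line adaptation, one input sample at a time
        x = rng.normal(0.0, 0.5)     # Gaussian input stimulus
        r = np.tanh(a * x + b)
        db = -eta * dD_db(r)         # Delta b = -eta * dD/db
        da = eta / a + x * db        # Delta a = eta/a + x * Delta b
        b += db
        a += da

    print(f"learned gain a = {a:.3f}, bias b = {b:.3f}")  # typically a positive gain and a negative bias

Running this drives the neuron into the same qualitative regime as the adapted example in Fig. 2.4: the bias shifts to negative values while the gain stays positive, so that for Gaussian input most outputs sit near the lower end of the tanh range with a sparse tail of larger responses.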

In general, this local IP rule tries to robustly adapt the internal dynamics of the reservoir in an input driven and completely unsupervised manner. In contrast, the neural timescale adaptation rule tries to modulate the neuronal time constants, effectively matching the timescales in the incoming time varying stimuli. This is based on a quantification of the extent of influence that the past activity of a neuron has on its activity in the immediate future. We therefore combine IP learning with the neuron timescale adaptation rule in series: the time constant adaptation is carried out after the intrinsic adaptation of the neuron non-linearity. This combination leads to a single self-adaptive framework that controls the local memory of each neuron based on the incoming input to the network, while preventing runaway dynamics (homeostasis). In the next section we present the supervised plasticity mechanism used to learn the reservoir-to-readout and internal reservoir weights in a task dependent manner.