
Coding of low-dimensional variables with spiking neural networks

submitted by

M.Sc.

Veronika Koren

ORCID: 0000-0003-2920-2717

at Faculty IV - Electrical Engineering and Computer Science

of the Technische Universität Berlin

in fulfillment of the requirements for the academic degree

Doktor der Naturwissenschaften

(Dr. rer. nat.)

approved dissertation

Doctoral committee:

Chair: Prof. Dr. Henning Sprekeler
Reviewer: Prof. Dr. Klaus Obermayer
Reviewer: Prof. Dr. Valentin Dragoi
Reviewer: Prof. Dr. Benjamin Lindner

Date of the scientific defense: 27 November 2019


Contents

0.1 General Abstract
0.2 General introduction
    0.2.1 The topic of the thesis and its structure
    0.2.2 Coding of visual stimuli and of behavioral choice: short literature review

1 Theoretical model of an efficient neural network
    1.1 Abstract
    1.2 Introduction: Objectives and requirements of the model
    1.3 Results
        1.3.1 Spiking neural network with coding and metabolic efficiency
        1.3.2 Computation of the membrane equation
        1.3.3 Minimal models of spiking activity
        1.3.4 Optimal trade-off between coding and metabolic efficiency
        1.3.5 Linear cost on spiking controls the working regime of the network
    1.4 Discussion
    1.5 Methods
        1.5.1 Derivation of the expression for the membrane potential
        1.5.2 Derivation of dynamics of the membrane potential
        1.5.3 Statistical methods

2 In visual area V4, coupling, synchrony and correlations depend on the structure of the population code
    2.1 Abstract
    2.2 Introduction
    2.3 Methods
        2.3.1 Data acquisition
        2.3.2 Spike train and population PSTH
        2.3.3 Linear Support Vector Machine for multivariate analysis
        2.3.4 Area under the ROC curve for univariate analysis
        2.3.5 Coupling of single neurons with the population
        2.3.6 Measures of pair-wise interactions
    2.4 Results
        2.4.1 Neurons in the brain read out the activity of projecting units
        2.4.2 High-dimensional read-out of the activity of neural populations predicts correct choices better than an average single neuron
        2.4.3 The population code has its own intrinsic structure
        2.4.4 In V4, but not in V1, correct assignment to the coding pool is sufficient to explain the classification performance of the population model
        2.4.5 In V4, but not in V1, informative neurons are more strongly coupled, synchronized and correlated than uninformative neurons
        2.4.6 In V4, but not in V1, correlations and spiking synchrony within the coding pool are stronger than across coding pools
        2.4.7 In V4, correlations within pools are harmful for the decoder while the correlations across pools are not
    2.5 Discussion
    2.6 Supporting information

3 Reading out task variables as a low-dimensional representation of spike trains in single trials
    3.1 Abstract
    3.2 Introduction
    3.3 Methods
        3.3.2 Training: Estimation of decoding weights
        3.3.3 Validation: Computation of the population signal
        3.3.4 Decoding with single neurons
        3.3.5 Correlation analysis
        3.3.6 Determining cortical layers from the current source density
    3.4 Results
        3.4.1 Population PSTH does not allow to discriminate correct choices on binary stimulus classes "match" and "non-match"
        3.4.2 Weighting spike trains with decoding weights allows to discriminate conditions "match" and "non-match"
        3.4.3 Correct sign of weights is necessary and sufficient for discrimination
        3.4.4 Neurons with positive and negative weights respond with anti-symmetry
        3.4.5 The superficial layer of the visual cortex discriminates best conditions "match" and "non-match"
    3.5 Discussion
    3.6 Supporting information

4 Intersection between the representation of stimuli and choice by neural ensembles in the primary visual cortex
    4.1 Abstract
    4.2 Introduction
    4.3 Methods
        4.3.1 Specifications on the dataset
        4.3.2 Learning on stimulus+choice and testing on choice
        4.3.3 Similarity of decoding models in the context of stimulus+choice and choice
    4.4 Results
        4.4.1 Learning of the representation on stimulus+choice transfers to the representation of choice
        4.4.2 Representation of choice differs across cortical layers and between the two coding pools
        4.4.3 Discrimination relies on the sign of decoding weights as well as on the timing of spikes
        4.4.4 Informative neurons are more sensitive to the informational context than uninformative neurons
    4.5 Discussion
    4.6 Supporting information
    4.7 General conclusion


0.1 General Abstract

Spikes, extremely precise temporal signals, are believed to be the main means of communication between neurons. However, it is at present unclear how the information contained in spike timing can be used to encode the low-dimensional variables that presumably guide an animal's behavior. Based on work by Boerlin, Machens and Denève (Boerlin et al. 2013), we derive a functional model of spiking neural activity that exploits information in spike timing. The model represents an arbitrary low-dimensional variable by tracking its inputs with its spiking activity, and a spike is produced whenever this improves the estimation of the input signal. Precise spike timing is a built-in feature of such a model, which offers an alternative to bottom-up descriptions of neural dynamics. The coding functionality rests on a geometric description, where each neuron is attributed a coding weight that determines the neuron's role in the representation computed at the network level. The coding weight determines how the neuron weights its inputs, as well as the effect of the neuron's spike on connected neurons and on the read-out of spiking activity. Even if many neurons share the same coding weight, and are therefore redundant in their coding function, the design of the network ensures that spiking activity is nevertheless efficient. We show that the maximally efficient coding regime coincides with asynchronous spiking, interspersed with occasional synchronized bursts, and we show how recurrent and lateral connections generate these bursts.

In the rest of the thesis, we study decoding models on parallel spike trains in behaving monkeys performing a visual discrimination task on binary stimulus classes. While decision-making is traditionally studied with respect to the neural activity in high-level, decision-making areas, we instead decode correct choice behavior from the spiking activity in sensory areas V1 and V4. We show that a linear classifier on parallel spike counts predicts the animal's behavior better than chance. From the classification model, we compute decoding weights, which indicate the role of each neuron within the population for the classification task. We show that, in particular in V4, decoding weights allow various insights into the structure of pair-wise interactions and the coupling of the activity of single neurons with the population. First, we show that in V4, neurons with strong weights are more strongly coupled, synchronized and correlated than uninformative neurons. Second, we show that coupling, synchrony and correlations are stronger between neurons with the same sign of the weight than between other pairs. Finally, we show that correlations between neurons with the same sign of the weight decrease the performance of the decoder.

We proceed by building a biologically interpretable model of the read-out of parallel spike trains in single trials. We compute the synaptic current of a read-out neuron that receives synaptic inputs from a population of projecting neurons. We assume that spikes are weighted by a vector of decoding weights, where decoding weights reflect the role of each neuron for the computation at the network level. The resulting signal allows us to predict the choice behavior of the animal, while simpler methods such as the population PSTH entirely fail to do so. Distinguishing the superficial, middle and deep layers of the cortex, we show that in both V1 and V4, the superficial layers are the most important for discrimination. We also show that the read-out signals of neurons with positive and negative weights are negatively correlated.

During the experiment, the animal is rewarded for correct behavior. The representation of the behavioral choice, however, must also take place when the choice is incorrect. We ask whether the decoding model, trained in the presence of the information on both stimulus and choice (i.e., the correct choice), generalizes to decoding in the context of the choice alone. We show that such generalization takes place in V1, but fails in V4. In V1, the choice signal can be discriminated during the second half of the trial. Similarly to decoding in the context of stimulus and choice, the choice signal is strongest in the superficial layer of the cortex, and the read-out of neurons with positive and negative weights is negatively correlated. In contrast to decoding in the context of stimulus and choice, decoding of choice requires information on spike timing. In general, these results show the similarity of the representations of stimulus classes and of the corresponding behavioral choices in the primary visual cortex of the macaque.

Keywords: coding; computation; representation; spiking neural network; population code; decoding; stimulus; choice; behavior; classification; information; visual cortex; macaque; V1; V4; transfer learning; prediction


Financial disclosure

The author of the thesis has been financially supported by a scholarship of the Doctoral Program of the Bernstein Center for Computational Neuroscience Berlin (BCCN Berlin). The Doctoral Program was financed by the Deutsche Forschungsgemeinschaft (grant number GRK 1589/2). The funding source had no influence on the study.

Acknowledgements

I would like to thank my supervisor, Prof. Klaus Obermayer, for his interest in my work, as well as Prof. Benjamin Lindner, Prof. Valentin Dragoi, Prof. Taro Toyoizumi, Prof. Tilo Schwalger and Prof. Henning Sprekeler for scientific and didactic interaction. I would also like to thank my colleagues, Dr. Caroline Matthis, Dr. Christian Donner, Dr. Tiziano D'Albis, Youssef Kashef, Robert Seidl, Ivo Trowitzsch and Dr. Fatma Deniz, for scientific discussions, comments on the previous draft and personal encouragement. In particular, Caroline repeatedly engaged in lengthy discussions on many methodological questions that I encountered during my PhD. I am grateful to Prof. Sophie Denève, who raised my interest in computational neuroscience. On the personal level, I owe a lot to my parents, who have encouraged me throughout my studies. Finally, I would like to thank my husband for his outstanding support.


0.2 General introduction

0.2.1 The topic of the thesis and its structure

“To understand life, one has to understand not just the flow of energy, but also the flow of information."

(William Bialek, 2012)

The brain is a complex system that can be studied in many ways. From the point of view of biology, the brain is part of the animal's body, performing a specific function. In general terms, the brain processes information that can be of use to the animal agent. The information comes from the outside world to the sensory periphery, where stimuli activate sensory receptors, which in turn activate sensory neurons that project into the central nervous system. At the other end of the functional chain of information processing, motor neurons project onto muscles and control their activation, informing them how to move the body. Between the sensory periphery and the motor neurons lies a vast and highly complex network, the central nervous system, where information is passed from one neuron to another.

According to Darwin's theory of evolution, every living creature is subject to evolutionary pressure that shapes its body and brain through the genes. Brain function is also subject to evolutionary pressure, and information processing should be performed in such a way as to favor the survival of the owner's species [25]. This implies that the way the brain performs its function should strive for optimality. The question is, in what ways should it be optimal? A straightforward guess is that information processing should be precise, reliable, and robust, such that small perturbations do not disrupt it. However, it has been understood that there is another important facet to this problem, which is energy efficiency. The brain is an energetically expensive organ for the body. In humans, the brain of a developing child burns more than half of the available energy, while the brain of an adult uses about one fifth of the body's energy [88, 53]. Of the energy budget devoted to the brain, approximately half is used for baseline maintenance and the other half for active signal processing [88]. Energy efficiency might therefore be a major constraint on brain function and might play an important role in the way neural networks are designed. The first chapter of the thesis introduces an efficient model of coding, a spiking neural network that estimates its inputs with its spiking activity. The representation emerges from the activity of the entire network. Spikes are fired when they are necessary for coding, but we show that the network is unstable unless activity is controlled with a metabolic cost on spiking. This chapter is based on published work [47], available on the website of PLoS Computational Biology (https://doi.org/10.1371/journal.pcbi.1005355, CC BY 4.0).

Biological neurons are believed to encode and generate signals that are meaningful for the animal's behavior. In chapters 2, 3 and 4, we study decoding models that exploit the structure of the activity of neural populations in the V1 and V4 visual cortices of the monkey. In chapter 2, we decode the choice behavior of the monkey from parallel spike trains. We use a decoder that gives insights into the structure of population responses and study this structure in relation to the pair-wise dynamics of the network. This work is under review and can be accessed on bioRxiv (https://doi.org/10.1101/645135). In chapter 3, we return to spike trains and study a low-dimensional read-out of parallel spike trains, the same as proposed in the efficient model. The efficient model deals with spikes that encode arbitrary inputs, while in the case of recorded data, spike trains are linked to a specific representation of the stimuli and of the behavioral choice that the animal subject is experiencing. We use the population vector from chapter 2 to determine the model of the read-out. This work has been published in PLoS ONE (https://doi.org/10.1371/journal.pone.0222649, CC BY 4.0). Finally, we ask how decoding in the presence of information about both the stimulus and the choice transfers to decoding in the presence of information about the choice alone. The animal sometimes makes a wrong choice, and we hypothesize that if visual cortices represent information about the choice also in the absence of information about the stimuli, the erroneous choice has to rely on a neural representation that is partially similar to the representation of the correct choice. The related preprint can be accessed on bioRxiv (https://doi.org/10.1101/2020.01.10.901504).


0.2.2 Coding of visual stimuli and of behavioral choice: short literature review

Reading-out the choice behavior from activity of single neurons

The importance of considering the behavioral output of animals when studying neural activity has been increasingly recognized over the last decades [50]. An important step towards understanding how such signals may arise was made when electrophysiology and psychophysics, independent disciplines in the past, were brought together to study the relation between stimuli, neural activity and behavior [15]. The pioneering work in this direction was done by studying decision-making, where the experimenter controls the visual stimuli and measures the choice behavior of the animal, as well as the neural activity in a particular part of the cortex. Traditionally, the neural activity has been measured in single neurons, one by one. This had an important influence on neuroscience, since it put the focus of research on single neurons.

According to the mainstream theory of decision-making, information about the stimulus is processed in sensory areas and is transmitted from there to decision-making areas, where the activity encodes the choice variable [39, 80]. A substantial number of studies on decision-making have been conducted in the middle temporal (MT) area, where neurons are sensitive to motion direction, using moving-dot stimuli, in which a multitude of dots move more or less coherently in a particular direction. While neurons in MT are sensitive to a particular feature of the stimuli (direction of motion), the activity in the lateral intraparietal area (LIP) contains decision-related signals [70]. An earlier study proposed that LIP accumulates choice-related information from MT [83], while a later study showed that spiking responses from LIP best predict the behavioral choice if they are themselves integrated (on two time scales) [70]. In any case, prediction does not necessarily imply causation, and in fact, the causal role of LIP for behavior is still under debate [71]. It has been shown that silencing LIP does not have a significant effect on choice behavior, while silencing MT in the same experiment compromised the behavior considerably [51].

The choice probability is the probability of predicting a (binary) behavioral choice from the spike counts of a single neuron. It is computed from the distributions of spike counts in the two behavioral conditions with receiver-operating characteristic analysis. If the distributions of spike counts in the two conditions have little overlap, the cell has a strong choice probability. Choice probability in sensory areas is typically weak (e.g., 2-6 % above chance) or at chance, and this number depends importantly on the area, the stimuli and the details of the experimental setup. In general, the spiking activity of single neurons is weakly predictive of behavioral choices in MT [63, 15, 98, 16, 21], V4 [84] and V2 [65]. In the primary visual cortex, spike counts of single neurons were not considered to be informative about choice behavior [65] until recently, when choice probability was reported in orientation-selective neurons in V1, arguing that the presence of cortical maps for the relevant task feature might be crucial for the occurrence of choice probability [68].

In addition, a study has shown that in the barrel cortex of the rat, by finding the most sensitive cells and biasing the read-out towards their activity, one can identify cells that have a causal effect on the animal's choice behavior [43]. In general, many neurons activate during a decision-making task, and a read-out from all cells does not capture the contribution of only a couple of single neurons, even if those are strongly sensitive to the choice behavior [9]. However, if the read-out is biased towards neurons that receive direct projections from a sensitive neuron, the perturbation of a single cell can be detected in the read-out [9]. This explains why stimulation of a single cell can have a significant effect on the behavioral choice of the animal, as reported in [43]. While this result is intriguing, since it shows that a small perturbation can, in principle, be detected at the macroscopic level, it requires rather strong assumptions about learning and the read-out, and it is unclear whether these assumptions are met in natural conditions, without a specific intervention by the experimenter.

Reading-out choice behavior from the activity of correlated neural circuits

Early studies on choice probability concluded that the sensitivity of single neurons to changes in stimuli is comparable to the sensitivity of the behavioral output, captured by the psychometric curve [15, 16, 80]. This led to the hypothesis that a small number of single neurons has a causal effect on the behavioral choice of the animal [15, 16]. A later study found that the sensitivity of single neurons had previously been overestimated and that an average single neuron is less sensitive than the behavioral response [21]. In any case, the question remained how the brain "combines" the responses of a vast number of sensory neurons to build a choice-related signal. Combining the responses of even a small network of independent neurons results in a sensitivity that is larger than the sensitivity of the psychometric curve [80]. However, neurons in the brain are, at least locally, correlated, and correlations have important implications for the read-out. While positive correlations between neurons with the same selectivity decrease the quantity of transferred information, the same correlations between neurons with different selectivity might increase the information in the read-out [6]. Therefore, not all correlations are harmful for the read-out [62], and which correlations are harmful depends on the sign of the correlation and the structure of the population responses [45].

The information transfer might not be the only criterion to consider when studying the impact of correlations on coding. It has been pointed out that the signal of a small number of neurons might be too weak to be read out by the downstream network, and that only the correlated activity of neuronal pools can be transmitted downstream [67]. In this setting, inter-neuron correlations play the role of an amplifier of neural signals. If neurons with the same (or similar) coding function are more strongly correlated than neurons with very different coding functions, the signal of neurons with the same coding function will be amplified by correlations [15, 21, 67].

In the presence of parallel activity of many neurons that are observed simultaneously, a natural choice for a decoder is a multivariate model that takes into account correlations across units. In the motor cortex, the activity of individual neurons can be represented as vectors that make weighted contributions along the axis of their preferred direction [35]. It has been shown that the sum of these vectors represents the direction of motion in single trials [35]. The application of multivariate decoding models has recently brought important insights about the mixed selectivity of single neurons within the population to features of the stimuli [75]. The multivariate approach also seems to be useful for decoding "internal" variables. The vector of population responses has been shown to predict fluctuations of attention on a trial-by-trial basis [22]. While the notion of the population code is not new and has been shown to describe neural activity well in different settings (see [77] for a review), the population code has sometimes been overinterpreted [30]. When one studies a certain population code, it is advisable to study its relation to the responses of single neurons, to show whether the activity of the network can be accounted for by summing the contributions of single neurons [30]. The debate on what is the main neural support for the representation of behaviorally relevant variables is still ongoing [1]. Namely, various models describe "static" population codes, based on the notion of the tuning curve and, therefore, on the firing rates of neurons within the network [1]. It is questionable whether such a code can explain neural function, since neurons responding to stimuli have rich temporal dynamics and are known to change their firing rate as fast as within one millisecond [82]. Recently, theoretical work has suggested a population code that exploits the information in spike timing [14]. In the following, we describe the theory of such a dynamic estimator, and how it is realized with a recurrent spiking neural network.


Chapter 1

Theoretical model of an efficient neural network

Highlights

• We describe a generic function approximator, implemented by a spiking neural network.

• The network represents an arbitrary input using spike timing.

• A low-dimensional input is represented in the high-dimensional space of spike trains.

• An optimal working regime is a trade-off between coding and metabolic efficiency.


1.1 Abstract

Neural spikes are known as extremely precise temporal signals of cortical neurons. We propose a network whose functionality relies on spike timing. By the design of the network, a specific neuron fires when this improves the estimation of the input. In addition, we take into account the fact that neural activity is energetically demanding for the organism, and we control firing rates with a cost on spiking. The network represents an arbitrary low-dimensional input with high-dimensional spike trains, and the read-out of the network activity is the projection of the spike trains back into the space of the low-dimensional signal. This setting allows identical signals to be represented with many different spiking patterns, reproducing the trial-to-trial variability that is observed in cortical neurons. The network has strong lateral connections, and its dynamics can result in fast volleys of spikes between neurons with opposite selectivity. This effect can be controlled by costs on spiking, which increase firing thresholds and control the strength of the reset after spiking in single neurons.

Keywords: population code; spike timing; efficiency; variability; bursts; spontaneous activity; optimal code; encoding; decoding; representation; noise


1.2 Introduction: Objectives and requirements of the model

One way to design an efficient neural network is to posit that the activity of the network should take into account two objectives: one is coding precision, and the other is the cost in energy that the network spends on its activity. At the same time, we require that coding is reliable and robust by design. A reliable network is one that reliably converts its inputs into the desired outputs, while robustness is the capacity to function properly in the presence of disturbances. For example, a reasonable degree of external noise or the failure of one specific neuron within the network should not excessively disrupt the coding process. This is important, since neural networks in the cortex are subject to various sources of noise [31]. Finally, it is desirable that the network complies with the most important characteristics of neural activity, as they have been observed in the cortex. An intriguing characteristic of neural activity that is omnipresent in the cortex (and widely debated in theoretical neuroscience) is the variability of neural responses. As identical stimuli are presented to the visual system, the activity of cortical networks is highly variable [24, 74]. What varies is the pattern of spike timing that the network emits. Considering an example neuron from the network, the statistics of its spiking activity are relatively well approximated by a Poisson process, where the mean rate of discharge is the same as its variance. The main source of this variability seems to come from the network, since single cells have been shown to spike reliably [10]. When single neurons in the neocortex were injected with a current that mimics the statistics of the current that neurons receive in the natural environment, neurons spiked reliably with 1 ms precision [59]. The mechanism of the variable discharge of cortical neurons can be accounted for by a network of many neurons with a balance of excitation and inhibition. If the total excitatory and inhibitory inputs to the neuron are tightly balanced, the means of the excitatory and inhibitory input currents cancel, and spiking is driven by transient fluctuations (i.e., the variance) of the inputs [96, 97]. In the "tight balance" regime, most of the input a single neuron receives is due to the surrounding network, the external input is smaller than the recurrent input, and the response to the input is linear. Recently, a model with "loose balance" has been proposed, where the external input can be proportional to the recurrent input and where the input-output function of the network can be sublinear or supralinear, depending on the strength of the input [4]. While the concept of the balanced network can explain the source of the variability of cortical responses in mechanistic terms, the question remains: how can variable spiking patterns carry reliable signals, which presumably underlie coherent perception and the animal's behavior?

As a toy example of this problem, consider the following scenario: we expose the visual system to two identical images and record neurons in the primary visual cortex (V1). The percept elicited by the two images is (nearly) the same, while the network responds with two fairly different spiking patterns. (For simplicity, we will talk about two identical percepts, while in reality, we would expect the two percepts to be only very similar, but of course not identical.) Presumably, identical percepts are due to identical signals in the brain. This implies that in our example, two different spiking patterns must be decoded as the same signal. Moreover, since we have exposed the visual system to the same stimulus twice, and provided that the signal is reliably transmitted between the retina and V1, the two instances of the signal entering the network must also be (nearly) identical. We therefore have two identical inputs and two identical outputs that, in between, give rise to two completely different spiking patterns of the neural network. It has been shown that such a coding scheme can be explained [14, 47]. We assume that neurons within the network have a redundant coding function and that the connections between neurons have a low-dimensional structure [47]. If the spiking activity of the network is of higher dimensionality than the output signal, an identical output can be the result of many spiking patterns. In other words, the projection between the high-dimensional space of spike trains and the low-dimensional space of the output signal allows many different realizations of the spiking pattern to be decoded as the same output signal.

1.3 Results

1.3.1 Spiking neural network with coding and metabolic efficiency

In the following, we give an overview of the derivation of the model. The model assumes a network of $N$ neurons, where each neuron has $M$ weights, one for each dimension of the input signal, $W = [w_{m,n}]$, $m = 1, \dots, M$, $n = 1, \dots, N$. The several dimensions of the input signal stand for the neuron's selectivity for multiple features of stimuli. In the primary visual cortex, for example, neurons are sensitive to the orientation, spatial frequency, direction, color, temporal frequency and disparity of visual stimuli [18].

The model is a generic function approximator, as it approximates a function of its inputs with the spiking activity of the network. If the external input to the network in the $m$-th dimension is $s_m(t)$, the signal in the $m$-th dimension is given by the convolution of the input with an exponential kernel $u(t) = \exp(-\lambda_v t)$:

$$x_m(t) = \int_0^\infty s_m(t - \tau)\, u(\tau)\, d\tau \qquad (1.1)$$

The spike train of a single neuron $n$ is defined with a Dirac delta distribution,

$$o_n(t) = \sum_k \delta(t - t_n^k) \qquad (1.2)$$

with $t_n^k$ the $k$-th spike of neuron $n$. The instantaneous firing rate of neuron $n$ is the convolution of its spike train with the same kernel:

$$r_n(t) = \int_0^\infty o_n(t - \tau)\, u(\tau)\, d\tau \qquad (1.3)$$

The approximated signal in the $m$-th dimension is the weighted sum of firing rates across neurons in the network:

$$\hat{x}_m(t) = \sum_{n=1}^{N} w_{m,n}\, r_n(t) \qquad (1.4)$$

The dynamics of the network is defined as an optimization problem, and spiking activity is conditioned on coding and metabolic efficiency. The spiking activity of the network is such as to minimize the distance between the signal and the estimated signal, while also taking into account the number of spikes that are used for the representation. This gives the following error function:

$$E(t) = \sum_{m=1}^{M} \left( x_m(t) - \hat{x}_m(t) \right)^2 + \mu \sum_{n=1}^{N} r_n(t)^2 + \nu \sum_{n=1}^{N} r_n(t) \qquad (1.5)$$


with $\mu, \nu > 0$. Notice that the error function is time-dependent. This is so because the network approximates its inputs in real time, and no sampling is required. In eq. 1.5, the first term on the right-hand side computes the distance between the signal and the estimated signal, while the second and third terms are L2 and L1 regularizers, respectively, which control the overall strength of the firing rates and, through the L2 term, their distribution across neurons.
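To make eqs. 1.1-1.5 concrete, the following Python sketch computes the target signal, the firing rates, the read-out and the time-resolved error from a given spike raster. All parameter values, the time step and the placeholder spike raster are our own illustrative assumptions, not quantities taken from the thesis.

```python
import numpy as np

# Minimal sketch of eqs. 1.1-1.5. The time step, decay rate, sizes and
# the placeholder spike raster are illustrative assumptions.
dt, lambda_v = 0.001, 20.0          # time step [s], kernel decay rate
T, M, N = 1000, 3, 50               # time steps, signal dimensions, neurons
rng = np.random.default_rng(0)

W = rng.standard_normal((M, N))     # coding weights w_{m,n}
s = rng.standard_normal((M, T))     # external input s_m(t)
o = rng.random((N, T)) < 0.02       # toy spike raster o_n(t) (placeholder)

x = np.zeros((M, T))                # target signal, eq. 1.1 (leaky filter of s)
r = np.zeros((N, T))                # instantaneous rates, eq. 1.3
for t in range(1, T):
    x[:, t] = x[:, t-1] + dt * (-lambda_v * x[:, t-1] + s[:, t])
    r[:, t] = r[:, t-1] + dt * (-lambda_v * r[:, t-1]) + o[:, t]

x_hat = W @ r                       # read-out, eq. 1.4

mu, nu = 0.1, 0.1                   # quadratic and linear cost constants
E = (np.sum((x - x_hat) ** 2, axis=0)
     + mu * np.sum(r ** 2, axis=0)
     + nu * np.sum(r, axis=0))      # time-resolved error, eq. 1.5
print("mean error:", E.mean())
```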

1.3.2 Computation of the membrane equation

We posit that there will be a spike in a particular neuron if this decreases the error function:

$$E(t \mid \text{spike of neuron } k) < E(t \mid \text{no spike}) \qquad (1.6)$$

Applying this condition on spiking to eq. 1.5 and developing the expression, we get the following inequality (see Methods, eqs. 1.31-1.46):

$$\sum_{m=1}^{M} w_{m,k} \left( x_m(t) - \hat{x}_m(t) \right) - \mu r_k(t) > \frac{1}{2} \left( \sum_{m=1}^{M} w_{m,k}^2 + \mu + \nu \right) \qquad (1.7)$$

On the left-hand side, we have the difference between the signal and the estimated signal, weighted by the weight of neuron $k$. Since the left-hand side of eq. 1.7 is time-dependent, we identify it with the membrane potential. The right-hand side of eq. 1.7 is identified with the firing threshold. Note that eq. 1.7 is not specific to the neuron that spiked, but is valid for any neuron $n$:

$$V_n(t) = \sum_{m=1}^{M} w_{m,n} \left( x_m(t) - \hat{x}_m(t) \right) - \mu r_n(t) \qquad (1.8)$$

$$\theta_n = \frac{1}{2} \left( \sum_{m=1}^{M} w_{m,n}^2 + \mu + \nu \right) \qquad (1.9)$$

Using the definitions of the signal and the estimated signal, we get an expression for the dynamics of the membrane potential of neuron $n$ (see Methods, eqs. 1.47-1.54):

$$\dot{V}_n(t) = -\lambda_v V_n(t) + \sum_{m=1}^{M} w_{m,n}\, s_m(t) - \sum_{k=1}^{N} \phi_{k,n}\, o_k(t) - \mu\, o_n(t) \qquad (1.10)$$

where $\Phi = [\phi_{i,j}]$, $i, j = 1, \dots, N$, is the matrix of recurrent and lateral connections, computed as $\Phi = W^T W$. In the case of a single spike of neuron $k$, the membrane potential of neuron $n$ can be written in the following way:

$$\dot{V}_n(t) = -\lambda_v V_n(t) + \sum_{m=1}^{M} \left( w_{m,n}\, s_m(t) - w_{m,n} w_{m,k}\, o_k(t) \right) - \mu\, o_n(t) \qquad (1.11)$$

The dynamics of the membrane potential (eq. 1.11) depends on a leak term, a feed-forward input, a recurrent/lateral input and the term $-\mu o_n(t)$, which hyperpolarizes the neuron that has just spiked. The feed-forward input weights the external inputs, $s_m(t)$, with the neuron's weights, $w_{m,n}$. Depending on the sign of the weight of neuron $n$, the same input can therefore have a depolarizing or a hyperpolarizing effect on the neuron's membrane potential. The input from lateral connections weights the spike of the activated neuron, $o_k(t)$, with the product of the weights of the activated neuron and of neuron $n$, $-\phi_{k,n} = -\mathbf{w}_n \mathbf{w}_k$. If the neuron that spiked was neuron $n$ itself, we get a strong negative current, $-\mathbf{w}_n^2$, that functions as the reset of neuron $n$. If the weights of the spiking and the receiving neuron have the same sign, the current received by neuron $n$ is negative, and if the weights have opposite signs, the current is positive. The last term on the right-hand side of eq. 1.11 has the effect of increasing the reset. It is nonzero only when neuron $n$ spikes, and in that case the term $-\mu o_n(t)$ hyperpolarizes the neuron's membrane potential, creating a relative refractory period for the neuron.

The firing threshold of neuron $n$ is therefore proportional to the sum of its squared weights across the $M$ dimensions of the stimulus, plus the constants $\mu$ and $\nu$. As can be understood from eq. 1.5, the constants $\nu$ and $\mu$ control the firing rate. They do so by increasing the reset ($\mu$) and by increasing the firing threshold (both $\nu$ and $\mu$), and the trade-off between coding and metabolic efficiency can be regulated by adjusting these two parameters. Note that none of the described effects on the membrane potential were imposed; they all result from the derivation from the objective function.
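A minimal numerical sketch of the resulting dynamics (eqs. 1.8-1.10), assuming a discrete Euler scheme with at most one spike per time step (the neuron furthest above threshold fires); the parameter values are illustrative assumptions, not those used in the thesis figures.

```python
import numpy as np

# Euler-scheme sketch of eqs. 1.8-1.10 with the spiking condition of
# eq. 1.6, assuming at most one spike per time step.
dt, lambda_v, mu, nu = 0.001, 20.0, 0.1, 0.1
T, M, N = 2000, 3, 100
rng = np.random.default_rng(1)

W = rng.standard_normal((M, N)) / np.sqrt(M)      # coding weights
Phi = W.T @ W                                     # recurrent/lateral connections
theta = 0.5 * (np.sum(W ** 2, axis=0) + mu + nu)  # thresholds, eq. 1.9

s = rng.standard_normal((M, T))                   # external input
V = np.zeros(N)                                   # membrane potentials
spikes = np.zeros((N, T), dtype=bool)

for t in range(T):
    V += dt * (-lambda_v * V + W.T @ s[:, t])     # leak + feed-forward input
    n = int(np.argmax(V - theta))                 # best candidate for a spike
    if V[n] > theta[n]:                           # spike only if it reduces E(t)
        spikes[n, t] = True
        V -= Phi[n]             # recurrent/lateral effect, includes self-reset
        V[n] -= mu              # additional reset term, -mu * o_n(t)

print(spikes.sum(), "spikes emitted")
```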

1.3.3 Minimal models of spiking activity

Error correction with one neuron. The coding function of the model can be illustrated with a minimal model of a single neuron. Let us assume that the weight of the neuron is unity, $w = 1$, and that both the linear and the quadratic cost constants are zero, $\mu = \nu = 0$. The membrane equation of such a neuron is a leaky integration of the external input $s(t)$, and the threshold corresponds to half of its weight:

$$\dot{V}(t) = -\lambda_v V(t) + s(t) - o(t) \qquad (1.12)$$

$$\theta = \frac{w^2}{2} = \frac{1}{2} \qquad (1.13)$$

Since $V(t) = x(t) - \hat{x}(t)$, the neuron spikes when the distance between the signal and the estimate reaches half of its weight (for a schema, see fig. 1.1A). As the neuron spikes, it is reset by an amount corresponding to the absolute value of its weight (through the third term on the right-hand side of eq. 1.12). The model neuron does not require any learning, since the design of the model ensures that spikes occur with the right timing. Coding with a single neuron does not require regularization.
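A numerical version of this single-neuron example might look as follows; the time step, decay rate and input level are illustrative assumptions.

```python
import numpy as np

# Single neuron with w = 1 and mu = nu = 0 (eqs. 1.12-1.13): a spike is
# fired whenever V = x - x_hat reaches theta = 1/2.
dt, lambda_v, theta = 0.001, 20.0, 0.5
T = 3000
s = 30.0 * np.ones(T)                  # constant positive input (assumption)

V, x_hat, spike_times = 0.0, 0.0, []
for t in range(T):
    V += dt * (-lambda_v * V + s[t])   # leaky integration of the input
    x_hat += dt * (-lambda_v * x_hat)  # the estimate decays between spikes
    if V > theta:
        V -= 1.0                       # reset by w^2 = 1
        x_hat += 1.0                   # each spike pushes the estimate up by w
        spike_times.append(t * dt)

print(len(spike_times), "spikes; final estimate:", round(x_hat, 2))
```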

Redundant units but efficient spikes. We now consider a small network of four neurons, two with positive weights (neurons 1 and 2, "plus" neurons) and two with negative weights (neurons 3 and 4, "minus" neurons). When the estimated signal is too low, one of the plus neurons activates to pull the estimate up, and when it is too high, one of the minus neurons spikes to pull it down (fig. 1.1B). If the weights of two neurons with the same sign are identical, it is irrelevant which of the two neurons spikes, since both correct the error equally well. As we add more neurons to the network, this simple principle results in irregular spiking patterns (fig. 1.1C). Now there are 25 plus neurons (pink) and the same number of minus neurons (blue), and the absolute value of the weight is the same for all neurons. As the signal increases, plus neurons fire, but the spatial pattern of their firing is irrelevant for the estimate. Notice that by having more neurons and scaling the weights with the number of neurons (i.e., with $1/N$), we can keep the mean firing rate of single neurons constant, but estimate the signal with better precision (fig. 1.1C).


Effect of recurrent and lateral interactions of two neurons with opposite weights. When one of the plus neurons from fig. 1.1B spiked, it sent a negative current to neuron 2 through the lateral connections (eq. 1.11, $-w_n w_k < 0$ for $\mathrm{sgn}(w_n) = \mathrm{sgn}(w_k)$), making it unlikely for the other neuron with the same weight to spike in the next step. This is desired, since the error has already been corrected by neuron 1, and inhibiting neuron 2 avoids a redundant error correction. The same spike sent a positive current to neurons 3 and 4 (since $-w_n w_k > 0$ for $\mathrm{sgn}(w_n) \neq \mathrm{sgn}(w_k)$), increasing their probability of firing in the next time step. To put the effect of the lateral connections in evidence more clearly, we study the same network in the absence of external input ($s(t) = 0\ \forall t$). In this case, the spike of a plus neuron sends enough positive current to neurons 3 and 4 to make one of them fire in the next time step. The spike of one of the minus neurons in turn excites the plus neurons, and one of those will spike again. This mechanism creates fast volleys of spikes between neurons with positive and negative weights (fig. 1.1D).


Figure 1.1. Coding with spike timing. (A) Encoding a signal (blue) with the read-out of the spike train of a single neuron (red). The yellow region indicates the coding error. (B) Encoding of an oscillatory signal (gray) with the read-out of four neurons. Two neurons have an identical positive weight (pink spikes) and two neurons have an identical negative weight (blue spikes). The lower plot shows the membrane potential of one plus and one minus neuron. (C) Encoding of the same signal with 25 plus neurons (pink) and 25 minus neurons (blue). (D) Fast volleys of spikes between plus and minus neurons in the absence of the external input, but in the presence of white noise in the membrane potential. We show the signal and the estimate (top), spike trains (middle) and the membrane potential (bottom).


To understand this effect better, we consider a small network of two neurons, one with a positive weight, $w_1 = 1$, and one with a negative weight, $w_2 = -1$. We keep the network one-dimensional, without external input, and keep the linear and quadratic costs at zero. The weight vector is $\mathbf{w} = [1, -1]$, and the effect of the recurrent and lateral connections is given by the projection of the weight vector on itself:

$$-\mathbf{w}^T \mathbf{w}\, \mathbf{o}(t) = -\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} o_1(t) \\ o_2(t) \end{bmatrix} \qquad (1.14)$$

A spike of each neuron has an inhibitory effect on itself, acting as a reset, and an excitatory effect on the neuron with the opposite weight:

$$\dot{V}_1(t) = -\lambda_v V_1(t) - o_1(t) + o_2(t)$$
$$\dot{V}_2(t) = -\lambda_v V_2(t) - o_2(t) + o_1(t) \qquad (1.15)$$

Since the thresholds of both neurons are $\theta_1 = \theta_2 = \frac{1}{2}$, the lateral connections alone can drive the spiking of the two neurons. We can calculate the membrane potential by integration:

$$V_1(t) = \int_0^\infty \left( o_2(t) - o_1(t) - \lambda_v V_1(t) \right) dt$$
$$V_2(t) = \int_0^\infty \left( o_1(t) - o_2(t) - \lambda_v V_2(t) \right) dt \qquad (1.16)$$

To understand the effect of a perturbation with a noisy spike, let us assume that the system has been silent for a while, dissipating the effect of the initial conditions, and that a noisy spike is provoked by the accumulation of noise in the membrane potential of neuron 1 at time $t = t' - dt$. At the time of the spike, the membrane potential of neuron 1 is at threshold, $V_1(t = t' - dt) = \theta_1 = \frac{1}{2}$, while the membrane potential of neuron 2 is at zero. The spike has reset neuron 1 and sent a positive current to neuron 2:

$$V_1(t) = \theta_1 - \int_{t'}^{\infty} \left( \lambda_v V_1(t) + \delta(t - t') \right) dt = -\frac{1}{2} \exp(-\lambda_v (t - t'))$$
$$V_2(t) = \int_{t'}^{\infty} \left( -\lambda_v V_2(t) + \delta(t - t') \right) dt = \exp(-\lambda_v (t - t')) \qquad (1.17)$$

At time $t = t'$, $V_1(t') = -\frac{1}{2}$ and $V_2(t') = 1$. The membrane potential of neuron 2 is now above threshold, and neuron 2 spikes. This activates the reset current in neuron 2 and sends a positive current back to neuron 1:

$$V_1(t) = -\frac{1}{2} \exp(-\lambda_v (t - t')) + \exp(-\lambda_v (t - t' - dt))$$
$$V_2(t) = \exp(-\lambda_v (t - t')) - \exp(-\lambda_v (t - t' - dt)) \qquad (1.18)$$

At time $t = t' + dt$, the membrane potential of neuron 1 is again above threshold, and the neuron will therefore spike again, while the membrane potential of neuron 2 is close to zero:

$$V_1(t' + dt) = 1 - \left( \tfrac{1}{2} - \epsilon \right) = \tfrac{1}{2} + \epsilon$$
$$V_2(t' + dt) = -1 + (1 - \epsilon) = -\epsilon \qquad (1.19)$$

In the particular case where the two neurons have weights of exactly the same magnitude and opposite sign, the recurrent connections keep driving the network. In such a case, noise can in fact help to disrupt the spurious spiking.
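This alternation can be checked numerically. The sketch below simulates the two neurons of eqs. 1.14-1.15 with no input and no costs, starting from a noise-induced spike in neuron 1; the time step and decay rate are assumed values.

```python
import numpy as np

# Two neurons with opposite unit weights (eqs. 1.14-1.15), no external
# input and no costs. A single noise-driven spike in neuron 1 triggers a
# self-sustained volley of alternating spikes.
dt, lambda_v, theta = 1e-4, 20.0, 0.5
decay = np.exp(-lambda_v * dt)
Phi = np.array([[1.0, -1.0],
                [-1.0, 1.0]])           # w^T w for w = [1, -1]

V = np.array([theta + 0.01, 0.0])       # noise pushed neuron 1 past threshold
for step in range(6):
    V *= decay                          # leak between time steps
    n = int(np.argmax(V))
    if V[n] >= theta:
        V -= Phi[n]                     # self-reset and excitation of the other
        print(f"step {step}: neuron {n + 1} spikes, V = {np.round(V, 3)}")
```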

Effect of spurious spiking on the read-out. The estimated signal, equivalent to the read-out of the network activity, accounts for every spike. Here, we examine the effect of recurrent and lateral interactions on the estimated signal. We consider the case of the two neurons described above, where the neurons have weights of opposite sign and the external input is at zero. If the external input is at zero, the signal is at zero at all times, $x(t) = 0\ \forall t$. Before the first noisy spike, the estimate of the signal is at zero as well, $\hat{x}(t) = 0\ \forall t < t'$. As neuron 1 fires its noisy spike, the estimate jumps to the value of neuron 1's weight at the next time step, $\hat{x}(t = t') = 1$, and decays thereafter:

$$\hat{x}(t) = \exp(-\lambda_v (t - t')) \qquad (1.20)$$

As neuron 2 activates at time $t = t' + dt$, this pulls the estimate in the direction of the weight of neuron 2:

$$\hat{x}(t) = \exp(-\lambda_v (t - t')) - \exp(-\lambda_v (t - t' - dt)) \qquad (1.21)$$

At time $t = t' + dt$, the estimate is close to zero. The error in the case of zero external input is equivalent to the square of the distance of the estimate from the origin:

$$E(t) = (0 - \hat{x}(t))^2 = \hat{x}^2(t) \qquad (1.22)$$

We therefore have that the first noisy spike has created a prediction error, while the second spike has corrected for most of this error:

$$E(t') = \hat{x}^2(t') = 1$$
$$E(t' + dt) = \hat{x}^2(t' + dt) = \left( 1 - \exp(-\lambda_v\, dt) \right)^2 \approx 0 \qquad (1.23)$$

If the noise in the membrane potential of neuron 1 leads neuron 1 to spike again, the error is re-created and then again corrected by a spike of neuron 2. In the presence of noise, we can therefore have an alternation of spikes that create and correct the coding error.

1.3.4 Optimal trade-off between coding and metabolic efficiency

Spurious spiking can be controlled by regularization with the linear and quadratic costs. We have shown that with the costs at zero, even a small network of two neurons may enter periods of high-frequency spiking, driven by the interaction between neurons with opposite signs of the weight. In a bigger network, we would expect more such pairs to form. This would mean that neurons with the same sign of the weight synchronize, while pairs of neurons with opposite signs of the weight fire in alternation. We test the effect of the linear and quadratic costs on network activity by simulating the activity of a middle-sized network of N = 400 neurons receiving 3 input features (M = 3). The input is smoothed white noise,

$$\dot{s}_m(t) = -\lambda_s s_m(t) + I_m(t) \qquad (1.24)$$

with $I_m(t)$ randomly drawn from the standard normal distribution, $I_m \sim \mathcal{N}(0, 1)$, and with $\lambda_s = 50^{-1}$. For simplicity, we assume that the features of the stimulus are independent. For each of the $M$ features of the stimulus, the vector of coding weights is randomly drawn from the standard normal distribution, $\mathbf{w}_m \sim \mathcal{N}(0, 1)$, with $\mathbf{w}_m = [w_{m,1}, \dots, w_{m,N}]$. Since neurons respond to $M$ features of the stimulus, we associate every single neuron with a vector of $M$ weights, one weight per feature, $\mathbf{w}_n = [w_{1,n}, \dots, w_{M,n}]^T$. The weight vector of neuron $n$ is normalized across input features with the following norm:

$$\tilde{\mathbf{w}}_n = \frac{\mathbf{w}_n}{\|\mathbf{w}_n\|}, \qquad \|\mathbf{w}_n\| = \sqrt{\sum_{m=1}^{M} w_{m,n}^2} \qquad (1.25)$$

We use a transmission delay of 1 ms, identical across neurons. This network is used for the simulation of all results from here on, unless stated otherwise.
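A sketch of this setup in code (eqs. 1.24-1.25); N, M and $\lambda_s$ follow the text, while the Euler discretization and the value of the time step are our assumptions.

```python
import numpy as np

# Construction of the input (eq. 1.24) and of the normalized weights
# (eq. 1.25). The time step dt is an assumed discretization.
N, M, T, dt = 400, 3, 10000, 1.0        # dt in ms (assumption)
lambda_s = 1.0 / 50.0
rng = np.random.default_rng(2)

s = np.zeros((M, T))                    # smoothed white noise, eq. 1.24
for t in range(1, T):
    I = rng.standard_normal(M)          # I_m ~ N(0, 1)
    s[:, t] = s[:, t-1] + dt * (-lambda_s * s[:, t-1] + I)

W = rng.standard_normal((M, N))         # random coding weights
W /= np.linalg.norm(W, axis=0, keepdims=True)   # per-neuron norm, eq. 1.25
```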

We keep the quadratic cost at zero, progressively increase the linear cost and measure the synchrony of spiking. The latter is measured as the percentage of neurons that fire in the same time step. At zero cost, we get 50 % of neurons firing synchronously, meaning that all neurons with the same sign of the weight spike in the same time step (fig. 1.2A). Similarly to the network of two neurons studied previously, recurrent connections drive fast volleys of spikes even without any noise in the system. However, this effect can be controlled with costs on spiking. When the linear cost is sufficiently high, synchronization drops from very strong synchrony to asynchronous spiking (fig. 1.2A). Such behavior is not surprising, since the linear cost has the network-wide effect of increasing the firing threshold of all neurons by the same amount (eq. 1.9). As we set the linear cost to zero and progressively increase the quadratic cost, the synchrony of the network falls exponentially, but in a continuous fashion (fig. 1.2B). We argue this might be so because


Figure 1.2. A trade-off between coding and metabolic efficiency. (A) Percent synchrony as a function of the linear cost; the quadratic cost is at zero. There is no noise. (B) Percent synchrony as a function of the quadratic cost. The linear cost is at zero and there is no noise. (C) The error, on the scale of the natural logarithm, as a function of the linear cost. The quadratic cost is fixed at $\mu = 5$. The standard deviation of the noise is $\sigma = 0.25$, used thereafter unless mentioned otherwise. (D) The error as a function of the quadratic cost. The linear cost is fixed at $\nu = 5$.

the quadratic cost acts on the strength of the reset (eq. 1.10), affecting only neurons that have recently fired. Notice that for low costs, the estimate strongly overfits the signal, while for high costs the estimation is not precise (fig. 1.2B, insets). This leads us to hypothesize that there is a set of costs that gives an optimal trade-off between coding and metabolic efficiency. The minimization of the error function depends on the cost terms by definition (see eq. 1.5); however, it is not trivial to find the metabolic efficiency that also ensures good coding efficiency. We measure the mean coding error and the average cost of spiking as proxies for coding and metabolic efficiency. The mean coding error is measured as the mean absolute difference between the signal and the estimated signal, normalized by the average norm of the weights:

$$\langle \text{error} \rangle = \frac{1}{\langle \|\mathbf{w}\| \rangle_n} \frac{1}{TM} \sum_{t=1}^{T} \sum_{m=1}^{M} \left| x_m(t) - \hat{x}_m(t) \right| \qquad (1.26)$$


The average norm of the weights is the norm defined in eq. 1.25, averaged across neurons, $\langle \|\mathbf{w}\| \rangle_n = \frac{1}{N} \sum_{n=1}^{N} \|\mathbf{w}_n\|$.

The effect of a spike on the estimated signal depends on the weight of the active neuron, and normalizing by the norm of the weights makes the measure of the coding error independent of the particular choice of the scaling of the weights and of the number of input dimensions. The average cost of spiking is computed simply as a time-averaged spike count:

$$\langle \text{cost} \rangle = \frac{1}{T} \sum_{t=1}^{T} \sum_{n=1}^{N} o_n(t) \qquad (1.27)$$

We inject white noise into the membrane potential of each single neuron, $\eta_n(t) \sim \mathcal{N}(0, \sigma)$. The noise is independent across neurons and in time, $\langle \eta_i(t)\, \eta_j(t') \rangle = \delta_{i,j}\, \delta(t - t')$. As we fix the quadratic cost parameter and measure the coding error as a function of the linear cost $\nu$, we find a minimum, indicating the optimal parameter $\nu$ for coding (fig. 1.2C, red). The $\langle \text{cost} \rangle$, on the contrary, keeps decreasing with an increasing linear cost parameter (fig. 1.2C, green). Such behavior is expected, since $\nu$ increases the firing threshold and makes it less likely for neurons to fire. Nevertheless, it is interesting that the behavior of the $\langle \text{cost} \rangle$ goes from steeply decreasing to slowly decreasing, and that this transition coincides with the minimum of the $\langle \text{error} \rangle$. We compute the total error as the sum of the two error functions, $\langle \text{Total error} \rangle = \langle \text{error} \rangle + \langle \text{cost} \rangle$ (fig. 1.2C, black). The minimum of the $\langle \text{Total error} \rangle$ is given by the optimal cost parameter, $\nu = \nu^*$, and this cost parameter is the one with the best trade-off between coding and metabolic efficiency. As we now fix the linear cost parameter at $\nu = 5$ and vary the quadratic cost parameter $\mu$, we get similar results (fig. 1.2D). The minimum of the $\langle \text{Total error} \rangle$ allows us to find the optimal quadratic cost, $\mu = \mu^*$, which gives the best trade-off between precise coding and few spikes. Finally, we estimate the $\langle \text{Total error} \rangle$ as a function of both the linear and the quadratic cost constants simultaneously. The landscape of the $\langle \text{Total error} \rangle$ has multiple minima (fig. 1.3A, red dots), including a minimum at $[\nu, \mu] = [5, 5]$.

The behavior of the $\langle \text{error} \rangle$ depends on the standard deviation of the noise $\sigma$. As we vary the noise, the optimal linear cost $\nu^*$ changes in a non-linear fashion, and the smallest $\nu^*$ occurs with weak (and not with zero) noise (fig. 1.3B). Since weak noise can help the membrane potential reach the firing threshold, it makes the network more responsive to inputs of small magnitude. Interestingly, the optimal cost constant $\nu^*$ is independent of the strength of the external input (fig. 1.3C). Also, as the strength of the noise varies, the optimal linear cost parameter seems to keep the synchronization at the same level (fig. 1.3D).


Figure 1.3. Optimal linear cost adapts to the level of noise. (A) The $\langle \text{Total error} \rangle$, estimated jointly for the linear and quadratic cost parameters. The function has multiple minima (red dots). Black dots mark the diagonal ($\nu = \mu$). (B) Optimal linear cost as a function of the standard deviation of the noise. We show the optimum estimated with the $\langle \text{error} \rangle$ (red) and with the $\langle \text{Total error} \rangle$ (black). The quadratic cost is $\mu = 5$. (C) The optimal linear cost as a function of the input strength. (D) Frequency of bursts with optimal costs as a function of the standard deviation of the noise. The criterion for a burst is that at least 20 % of neurons are active simultaneously.

1.3.5 Linear cost on spiking controls the working regime of the network

We select two values of $\nu$, one smaller and one larger than the optimal one, and test the spiking activity of the network in the absence of an external drive with these values of the linear cost constant: $[\nu < \nu^*, \nu^*, \nu > \nu^*]$ (corresponding to points [A, B, C] in fig. 1.4A). With increasing $\nu$, the frequency of synchronous events decreases (fig. 1.4B). With the optimal linear cost parameter, $\nu^*$, synchronous bursts are still present but rare (point B), and they vanish for costs higher than the optimum (point C). The estimated signal shows oscillations for a cost lower than optimal (fig. 1.4C, point A), mostly weak fluctuations around the zero signal at the optimum (point B), and a silent signal for a cost higher than optimal (point C). In the network with an external drive, we observe similar behavior. The estimate overfits the signal with $\nu < \nu^*$, estimates precisely with $\nu = \nu^*$, and is sluggish and imprecise with $\nu > \nu^*$ (fig. 1.4D).


Figure 1.4. Working regime for optimal and suboptimal linear cost. (A) Total error as a function of the linear cost. We mark the minimum of the function, given by the optimal linear cost (point B), a linear cost that is smaller than optimal (point A), and one that is bigger than optimal (point C). The quadratic cost is $\mu = 5$. (B) Spiking activity of the network without the external input, for linear costs corresponding to points A, B and C. (C) Estimated signal without the external input, with linear costs corresponding to A, B and C. (D) Estimated signal with the external input, using linear costs that correspond to A, B and C.

With optimal cost parameters ($\nu = \mu = 5$), the network spikes asynchronously with low firing rates most of the time, and the activity is interspersed with occasional short moments of synchrony (fig. 1.5A). Synchronous bursts occur in the presence of the external drive (active state, first half of the trial) as well as in its absence (quiescent state, second half of the trial). The bursts clearly resemble the fast volleys of spikes that we studied with the network of two neurons. Neurons with the same sign of the weight synchronize, while plus and minus neurons spike in alternation (fig. 1.5B). The frequency of bursts in the active and quiescent states depends on the parameter $\nu$ (fig. 1.5C). The frequency of bursts is high for a low linear cost. For the optimal cost, bursts are still present, but extremely rare, and they vanish altogether for higher costs (fig. 1.5C). The linear cost parameter also determines the coefficient of variation of the spike trains (fig. 1.5D). The coefficient of variation increases with increasing $\nu$ and is similar in the active and quiescent network up to the optimum, $\nu^*$. For costs higher than the optimal one, the network in the quiescent regime fires very few spikes, which might explain the divergence of the $CV_2$ between the active and quiescent regimes for costs higher than optimal. In the optimal regime, the coefficient of variation is below that of the Poisson process ($CV_2 \approx 0.7$ for the efficient network with optimal costs, while the Poisson process has $CV_2 = 1$).


Figure 1.5. Bursts in the optimal regime are close to vanishing. (A) Network activity for the optimal set of costs, $\nu = \mu = 5$. We show the signal with three features, overlaid with the estimate (top), spike trains (middle) and the population firing rate (bottom). Besides the noise in the membrane potential, the network has another source of noise, the probability of spiking ($p = 0.3$). (B) A zoom into a burst of the network in (A). The estimate (top, magenta) oscillates around zero, since plus and minus neurons fire in alternation. During a burst, neurons of the same sign fire synchronously. (C) Frequency of bursts as a function of the linear cost, with $\mu = 5$, $p = 1$. We show results in the network with external input (active) and without it (quiescent). (D) $CV_2$ as a function of the linear cost, in the active and quiescent working regimes. $\mu = 5$, $p = 1$.

We proceed by measuring statistics of the coupling of single-neuron activity to the population in the active and quiescent regime. We measure the spike-triggered population activity (𝑆𝑇𝑃𝐴), which evaluates the temporal correlation between the spike train of a single neuron and the population activity (see methods). The 𝑆𝑇𝑃𝐴 is measured as a function of the time lag between the spike train and the population activity. We find that the peak of the 𝑆𝑇𝑃𝐴 occurs at zero time lag (fig. 1.6A), showing that single neurons tend to spike when the population activity is the strongest. The peak in the quiescent regime is higher than in the active regime, indicating stronger synchrony in the quiescent compared to the active network. This is expected, since in the active network the input is weighted by heterogeneous weights, resulting in different net input across neurons. The amplitude of the peak depends strongly on the linear cost parameter, decreasing as we increase the linear cost (fig. 1.6B). The time intervals between bursts (inter-burst intervals, or IBI) increase with increasing linear cost (fig. 1.6C), while the quadratic cost parameter determines the duration of bursts (fig. 1.6D). The duration of bursts decreases with increasing quadratic cost, as expected. In general, we find that the spiking statistics are qualitatively similar in the active and quiescent working regime (fig. 1.6).
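The precise normalization of the 𝑆𝑇𝑃𝐴 is defined in the methods; the sketch below computes a simple variant of it, the average population activity around the spikes of a reference neuron, with the binning and names chosen for illustration:

```python
import numpy as np

def stpa(spike_train, pop_activity, max_lag):
    """Average population activity around the spikes of a reference
    neuron, as a function of the time lag (in bins). The reference
    neuron's own spikes should be excluded from pop_activity."""
    lags = np.arange(-max_lag, max_lag + 1)
    spikes = np.flatnonzero(spike_train)
    # keep only spikes far enough from the edges of the recording
    spikes = spikes[(spikes >= max_lag) & (spikes < len(pop_activity) - max_lag)]
    if spikes.size == 0:
        return lags, np.zeros(lags.size)
    windows = np.stack([pop_activity[t - max_lag: t + max_lag + 1] for t in spikes])
    return lags, windows.mean(axis=0)
```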

Spatially organized network with local connectivity

So far, we investigated a fully connected network with a low-rank connectivity matrix. In fact, if the weights account for 𝑀 features of the stimulus, the rank of the connectivity matrix wᵀw is also 𝑀. Besides the low rank of the connectivity matrix, the fully connected network does not have any spatial structure. In biology, however, cortical networks are to some degree spatially structured and neurons connect mainly locally. In the last section, we construct an efficient network with spatial structure and local connectivity.
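This rank statement is easy to verify numerically; below is a minimal sketch with arbitrary dimensions (the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 400                        # M features, N neurons, N >> M
w = rng.standard_normal((M, N))      # decoding weights, one row per feature
C = w.T @ w                          # N x N recurrent connectivity
print(np.linalg.matrix_rank(C))      # prints 3, i.e., the rank equals M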

The input is now a circular variable with Gaussian statistics:

$$s_m(t) = A \exp\!\big(\cos(\theta_m - I(t)) - 1\big) \qquad (1.28)$$

with 𝜃𝑚 ∈ [0, 2𝜋] and elements equally spaced, 𝜃𝑚+1 − 𝜃𝑚 = 𝐶, where 𝐶 is a constant. The variable 𝐼(𝑡) is smoothed white noise (eq. 1.24), and the parameter 𝐴 controls the strength of the input. Neurons share the same spatial organization as the input and respond to the input only if the latter falls inside their receptive field.
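A sketch of how such an input can be generated, assuming a simple Euler discretization for the smoothed white noise of eq. 1.24 (which is not reproduced here); the step size and noise scale are illustrative assumptions:

```python
import numpy as np

N_in, T, dt = 400, 2000, 1.0                 # input channels, duration (ms), step
A, lam_s, sigma_I = 0.3, 1.0 / 500.0, 0.05   # input strength, smoothing rate, noise scale
theta = np.linspace(0.0, 2.0 * np.pi, N_in, endpoint=False)

rng = np.random.default_rng(1)
I = np.zeros(T)
for t in range(1, T):                        # smoothed white noise I(t)
    I[t] = (1.0 - lam_s * dt) * I[t - 1] + sigma_I * np.sqrt(dt) * rng.standard_normal()

# s_m(t) = A * exp(cos(theta_m - I(t)) - 1): a bump centered at the angle I(t)
s = A * np.exp(np.cos(theta[:, None] - I[None, :]) - 1.0)   # shape (N_in, T)
```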



Figure 1.6. The linear cost regulates the intervals between bursts, the quadratic cost their duration. (A) Spike-triggered population activity (𝑆𝑇𝑃𝐴) in the active and quiescent network. 𝜎 = 0.25, 𝜇 = 5, 𝜈 = 5.75. (B) Peak amplitude of the 𝑆𝑇𝑃𝐴 as a function of the linear cost, for two levels of noise. We show results in the active (full lines) and quiescent (dashed lines) network. (C) Distribution of inter-burst intervals for three values of the linear cost. Results in the active network are shown with full lines, dashed lines are for the quiescent network. Empirical distributions have been fitted with a gamma distribution. (D) Distribution of the duration of bursts for three values of the quadratic cost in the active and quiescent network. Distributions have been fitted with a non-parametric kernel-smoothing distribution.

Neurons weight the input variables with a local weight, representing a blob-shaped increase (ON neurons) or decrease (OFF neurons) in, e.g., the luminance of the stimulus. We have 200 ON and 200 OFF neurons:

$$w_{ij} = \exp\!\Big(B \cos\Big(\frac{2\pi(\theta_j - \theta_i)}{N}\Big) - 1\Big), \quad \text{ON} \qquad (1.29)$$

$$w_{ij} = -\exp\!\Big(B \cos\Big(\frac{2\pi(\theta_j - \theta_i)}{N}\Big) - 1\Big), \quad \text{OFF} \qquad (1.30)$$

The parameter 𝐵 controls the width of the tuning curves. The concept of ON and OFF neurons is similar to the plus and minus neurons we had in the random network. ON and OFF neurons with the same position of the peak are considered to have the same physical position on the layer. Because only nearby neurons have overlapping weights, connections in the network are local. Similarly to plus and minus neurons, connections between neurons of the same polarity are inhibitory, while connections between neurons of opposite polarity are excitatory. As with the random network, we add white noise to the membrane equation of each neuron.
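A minimal sketch of this weight matrix and of the resulting overlap between neurons, under the convention that neurons index the rows (names and the overlap computation are ours):

```python
import numpy as np

def local_weights(N=200, B=50.0):
    """Feedforward weights of eqs. (1.29)-(1.30): one ON and one OFF
    neuron per location. Rows index neurons, columns input locations;
    with large B, only nearby neurons have overlapping weights."""
    j = np.arange(N)
    dist = 2.0 * np.pi * (j[None, :] - j[:, None]) / N
    bump = np.exp(B * (np.cos(dist) - 1.0))
    return np.vstack([bump, -bump])          # shape (2N, N): ON on top, OFF below

w = local_weights()
# Lateral connectivity is proportional to the overlap of the weights
# (w w^T in this row convention). Same-polarity overlaps are positive
# and enter the efficient network with a negative sign, so same-polarity
# pairs inhibit each other and opposite-polarity pairs excite each other.
overlap = w @ w.T
```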

We search for the optimal working regime of the network by computing the ⟨Total error⟩ as a function of the cost parameters 𝜈 and 𝜇. Similarly to the behavior of the all-to-all connected network, we get several minima (fig. 1.7A, red dots). We test the activity of the network for the set of optimal parameters that coincides with the diagonal, 𝜈* = 𝜇* (red square in fig. 1.7A). The network represents the input precisely and with moderate firing rates (fig. 1.7B). When we use a set of costs that is lower than optimal (yellow square in fig. 1.7A), the network shows occasional bursts (fig. 1.7C). In contrast to the all-to-all connected network, bursts do not engage the entire network simultaneously, but propagate through local connections according to the network's topology. Recurrent connections only engage neurons with overlapping weights, first neurons with the same sign of the weight, and then neurons with the opposite sign (fig. 1.7D). In the random network, plus and minus neurons fire in alternation in every time step (fig. 1.5B), while in the network with spatial structure, the activity of smaller ensembles of ON and OFF neurons alternates on a slower time scale (fig. 1.7D). In spite of the fact that lateral connections are activated only locally, the wave of activity often travels through a large portion of the neural layer (fig. 1.7C), in particular when there is no external input.
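The parameter scan behind fig. 1.7A can be sketched as follows, assuming a hypothetical helper run_network(nu, mu) that simulates the network for one pair of cost constants and returns the time-averaged total error:

```python
import numpy as np

def scan_costs(run_network, nus, mus):
    """Evaluate the total error over a grid of cost constants and
    return the error surface together with its global minimum."""
    errors = np.array([[run_network(nu, mu) for mu in mus] for nu in nus])
    i, j = np.unravel_index(errors.argmin(), errors.shape)
    return errors, (nus[i], mus[j])

# Example with a toy error surface (a paraboloid with its minimum at (19, 19)):
errors, best = scan_costs(lambda nu, mu: (nu - 19) ** 2 + (mu - 19) ** 2,
                          np.arange(5, 30), np.arange(5, 30))
print(best)   # (19, 19)
```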

The alternation of firing of ON and OFF ensembles can be demonstrated with the cross-correlogram (fig. 1.8A). Locally, pairs of neurons with the same sign of the weight fire in phase (ON/ON and OFF/OFF pairs), while pairs of neurons with the opposite sign of the weight fire in anti-phase (ON/OFF pairs). Because of the local connectivity, distant pairs show close to zero interaction. As the wave of activity propagates, the membrane potential of all neurons is perturbed, even though not all neurons necessarily fire exactly one spike per burst (fig. 1.8B).
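A raw cross-correlogram of two binary spike trains can be computed as in the sketch below; the normalization used for fig. 1.8A may differ, and the example trains are ours:

```python
import numpy as np

def cross_correlogram(x, y, max_lag):
    """Count of spike coincidences of train y at lag l relative to each
    spike of train x, for l = -max_lag..max_lag (binary spike trains)."""
    lags = np.arange(-max_lag, max_lag + 1)
    counts = [np.sum(x[max(0, -l):len(x) - max(0, l)] *
                     y[max(0, l):len(y) - max(0, -l)])
              for l in lags]
    return lags, np.array(counts)

# Two trains firing in anti-phase show a dip at zero lag and peaks at +-2.
x = np.tile([1, 0, 0, 0], 100)
y = np.roll(x, 2)
print(cross_correlogram(x, y, 3)[1])
```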

We measure the frequency of bursts as a function of the linear and quadratic cost constants (fig. 1.8C). As expected, bursts are frequent for low costs and disappear for high costs. When we overlay the points corresponding to optimal cost parameters (red dots, taken from fig. 1.7A), we can appreciate that the optimal sets of cost parameters



Figure 1.7. With local connectivity and spatial structure, bursts travel through the network. (A) Total error as a function of the linear and quadratic cost constants. The error is measured on the scale of the natural logarithm. Red points are sets of optimal parameters. The error is measured in the active state. (B) Activity with an optimal set of cost constants, marked with the red square in (A) (𝜈 = 𝜇 = 19). We show the signal (top), the estimate (middle) and the spiking activity (bottom). The last third of the trial is without external input. (C) Same as in (B), but for a suboptimal set of cost parameters, 𝜈 = 𝜇 = 12 (yellow square in (A)). Parameters: 𝐴 = 0.3, 𝐵 = 50, 𝜆𝑠 = 500⁻¹, 𝜎 = 0.1; the synaptic delay is 2 ms, identical across neurons. (D) Zoom-in on a burst from (C), showing the spiking activity of ON and OFF neurons.

coincide with the cost parameters at which the activity transitions from the bursty to the asynchronous working regime.
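Using the burst criterion of fig. 1.8C (at least 20 % of neurons active within a short time window), the fraction of time spent in bursts can be estimated as in the following sketch; the window length is an illustrative assumption:

```python
import numpy as np

def burst_fraction(spikes, window=10, threshold=0.2):
    """Fraction of time bins in which the network is bursting. spikes is
    a binary array of shape (N_neurons, T); a bin counts as bursting if
    at least `threshold` of the neurons fire within `window` bins of it."""
    kernel = np.ones(window)
    # per neuron: did it fire within the sliding window around each bin?
    recent = np.array([np.convolve(row, kernel, mode="same") > 0
                       for row in spikes.astype(float)])
    frac_active = recent.mean(axis=0)        # fraction of neurons active
    return float(np.mean(frac_active >= threshold))

# A silent or sparsely firing network gives ~0; a fully synchronous one gives 1.
rng = np.random.default_rng(3)
print(burst_fraction(rng.random((400, 1000)) < 0.001))
```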

Finally, we test the robustness of the model by adding an additional source of noise, a failure of spike generation. When a neuron reaches the threshold, a spike is generated with probability 𝑝 = 0.3. We estimate the ⟨Total error⟩ jointly over the range of linear and quadratic cost constants. In comparison with the ⟨Total error⟩ obtained with a single source of noise (fig. 1.7A), the optimal sets of cost parameters are now shifted towards lower quadratic costs and higher linear costs (fig. 1.8D). We argue that suppressing spikes has a similar effect on the network dynamics as the reset current, which is controlled by the quadratic cost. As many



Figure 1.8. Bursts vanish for optimal cost constants. (A) Cross-correlogram for an average pair of neighboring ON/ON neurons (blue), neighboring ON/OFF neurons and distant ON/ON neurons (red). (B) Spike trains (top) and membrane potentials (bottom) of 5 neighboring neurons during a burst. Activity is measured in the quiescent network. (C) Percentage of time spent in bursts, as a function of the linear and quadratic cost constants. The criterion for a burst is that at least 20 % of neurons are active in a short time window. Red points mark the optimal costs from fig. 1.7A. (D) Total error as a function of the linear and quadratic cost constants for the network with two sources of noise. To the noise in the membrane potentials, we add a probability of spiking of 𝑝 = 0.3. Optimal costs are shown in red.

spikes fail, the quadratic cost should therefore be lower. Importantly, it remains true that an optimal set of costs (red square in fig. 1.8D) gives an asynchronous working regime with occasional bursts, while costs smaller than optimal give a bursty network.
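This noise source amounts to a Bernoulli gate on threshold crossings, as in the sketch below (the names and the example voltage are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def fire(V, threshold=1.0, p=0.3):
    """Stochastic spike generation: a neuron whose voltage crosses the
    threshold emits a spike only with probability p (spike failure)."""
    return (V >= threshold) & (rng.random(V.shape) < p)

# With all voltages just above threshold, about 30 % of neurons fire.
V = np.full(10000, 1.01)
print(fire(V).mean())
```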

1.4 Discussion

We have built a network with coding functionality that also takes into account the energy expenditure due to spiking. The network is robust to noise and does not require learning of specific stimuli, but instead encodes arbitrary input signals. By controlling the importance of metabolic costs relative to coding precision, we get a continuum of working regimes, from asynchronous spiking to a highly synchronized network. Importantly, we can find a set of costs that gives an optimal working regime of the network. Optimal costs on spiking coincide with the working regime in which spontaneous bursts vanish. This is true for random networks with all-to-all connectivity as well as for locally connected networks with spatial structure, and therefore seems to be a generic property of the model. The working regime in which bursts vanish is a signature of maximally precise coding, where the network is maximally sensitive to its inputs, but not so sensitive as to drive its own activity through lateral connections. The fact that the optimal trade-off between metabolic and coding efficiency gives this very specific working regime in both the all-to-all and the locally connected network is not trivial.

The coding mechanism of the efficient network allows for reliable encoding of input variables with spike trains that are highly variable from one trial to another. Two properties of the model make this possible. First, the number of encoded variables must be much smaller than the number of neurons. Since the dimensionality of the spike trains is larger than the dimensionality of the stimulus, multiple spiking patterns can encode (and be decoded as) the same signal. Second, groups of neurons that share a redundant coding function are forced to spike efficiently. This ensures that a small number of spikes (ideally only one spike per variable) is fired in each time step. Different initial conditions and different realizations of the noise then cause a different spiking pattern in every trial.

As a downside, the efficient coding mechanism requires neurons that can have both excitatory and inhibitory effects on other neurons, and neurons therefore do not obey Dale's law. One could imagine neurons in the efficient network as functional rather than biological units, where the inhibitory effect between neurons with the same coding function comes from an inhibitory neuron in between (excitatory neuron 1 excites the inhibitory neuron, which in turn inhibits excitatory neuron 2). However, the importance of precise spike timing for the coding function as well as for the stability of the network does not allow such approximations. In fact, the present network computes its representation with millisecond precision. Also, as demonstrated with analytical examples in the present work, a synaptic delay of only one millisecond can have a great impact on the stability of the network.
