Brain-inspired methods for achieving robust computation in heterogeneous mixed-signal neuromorphic processing systems

Neuromorphic processing systems implementing spiking neural networks with mixed-signal analog/digital electronic circuits and/or memristive devices represent a promising technology for edge computing applications that require low power and low latency, and that cannot connect to the cloud for off-line processing, either due to lack of connectivity or to privacy concerns. However, these circuits are typically noisy and imprecise, because they are affected by device-to-device variability and operate with extremely small currents. Achieving reliable computation and high accuracy with this approach therefore remains an open challenge, one that has both hampered progress and limited the widespread adoption of this technology. By construction, these hardware processing systems share many biologically plausible constraints, such as heterogeneity and non-negativity of parameters. Mounting evidence shows that applying such constraints to artificial neural networks, including those used in artificial intelligence, promotes robustness in learning and improves their reliability. Here we draw further on neuroscience and present network-level brain-inspired strategies that further improve reliability and robustness in these neuromorphic systems: we quantify, with chip measurements, to what extent population averaging is effective in reducing variability in neural responses; we demonstrate experimentally how the neural coding strategies of cortical models allow silicon neurons to produce reliable signal representations; and we show how essential computational primitives, such as selective amplification, signal restoration, working memory, and relational networks, can be implemented robustly by exploiting such strategies.
We argue that these strategies can be instrumental for guiding the design of robust and reliable ultra-low power electronic neural processing systems implemented using noisy and imprecise computing substrates such as subthreshold neuromorphic circuits and emerging memory technologies.


Introduction
With the advent of deep networks for artificial intelligence [1], and the increasing need for special-purpose low-power devices that can complement general-purpose power-hungry computers in 'edge computing' applications [2,3], several types of event-based approaches for implementing spiking neural networks (SNNs) in dedicated hardware have been proposed [4][5][6][7][8][9]. While many of these approaches focus on supporting the simulation of large-scale SNNs [4,6,9], on converting rate-based artificial neural networks (ANNs) into their spike-based equivalent networks [10][11][12], or on processing digitally stored data with digital hardware implementations [13][14][15][16][17], the original neuromorphic engineering approach, first introduced in the early 1990s, proposed to implement biologically plausible SNNs by exploiting the physics of subthreshold analog complementary metal-oxide-semiconductor (CMOS) circuits to directly emulate the bio-physics of biological neurons and synapses [18,19].
Today this approach has been extended to include emerging nano-scale memory technologies and a wide range of different types of memristive devices [20][21][22][23][24][25]. While more difficult to control due to the analog and noisy nature of the subthreshold analog circuits and the variability of the memristive devices, this approach has the potential to lead to the construction of extremely compact and low-power brain-inspired neural processing systems [26][27][28][29]. Neuromorphic processors built following this approach typically comprise many spiking neuron and dynamic synapse circuits that can be configured to carry out complex spike-based signal processing and learning tasks in real-time. Similar to the biological neural systems they model, these types of neuromorphic systems are extremely low power, operate in a massively parallel fashion, and process information using both analog and asynchronous digital processing methods [28,30]; they are adaptive, fault-tolerant, and can be configured to display complex behaviors by combining multiple instances of neural computational primitives. On the other hand, as these types of systems are radically different from standard computing platforms based on the Turing machine concept and the von Neumann architecture, there is no well established formalism for 'programming' them to carry out pre-defined procedures. Furthermore, due to their analog, continuous-time, and in-memory computing nature, they do not use bit-precise Boolean logic operations, they cannot represent signals with arbitrary precision, and they do not support the storage of large amounts of state variables in dense and compact dynamic random access memory (DRAM) blocks. In particular, like their biological counterparts, their processing elements, such as the neuron and synapse analog circuits, are strongly affected by variability and device mismatch [31][32][33].
All these properties pose significant challenges to understanding how to use this technology to carry out robust computation, in the face of device variability, and without being able to use the classical formalism of computation based on Turing machines.
In this paper, we address the problem of achieving robust and reliable computation using underlying hardware that is noisy and highly variable. This work is in line with the recent investigations that analyze the role of variability and heterogeneity in neural computation and that attempt to exploit it for improving learning performance [34][35][36]. Rather than attempting to minimize the effects of device variability with brute force approaches, we propose brain-inspired computational strategies to counteract, and even exploit, the effects of heterogeneity on spike-based computation. We present neural computing primitives that use these strategies for representing and processing signals in a robust manner, and validate them using CMOS neuromorphic chips designed following the original neuromorphic engineering approach [18]. In particular, we demonstrate how the use of population coding [37], Excitatory-Inhibitory (E-I) balanced networks [38][39][40], and Winner-Take-All (WTA) architectures [41,42] can be exploited for controlling the precision of the signals in these neuromorphic circuits, and we demonstrate how the choice of using signal representations based on population codes, and brain-inspired computational primitives leads to important additional advantages in terms of speed of computation, coding efficiency, and power consumption.
In the next section, we describe the types of neuromorphic systems that we use in this study and quantify the amount of variability present in the neurons, due to device mismatch. In section 3 we demonstrate how brain-inspired strategies can effectively reduce variability effects, with quantitative measurements, as a function of population size and integration time. Furthermore, we show how such strategies offer additional computational advantages, for example in increasing the precision of variable representations, in restoring signals, or in implementing working memory and state-dependent computation. In section 4, we discuss the benefits of adopting the principles of neural design [43] that we demonstrated with experimental data, and in section 5 we present the concluding remarks.
In these systems input signals are typically represented as trains of digital pulses. These spikes are integrated by the synapses and converted into currents. The outputs of multiple synapses are then summed together and, in many implementations, integrated by current-mode log-domain filters. The resulting current, which represents a weighted sum of the inputs, is then fed into the neuron circuit, which integrates it and produces a spike if the integrated signal exceeds the neuron's spiking threshold. In most cases all these circuits are passive (i.e. there is no active, always-on component), and if there is no input data, there is no dynamic power consumption. For this reason, this approach is particularly attractive for applications in which the signals have sparse activity in space and time.

Figure 1. Die photo of the DYNAP-SE multi-core neuromorphic processor, with a single neuron element highlighted. The chip was fabricated using a standard 0.18 µm 1P6M CMOS technology. Neurons between cores and between different chips can be interconnected by programming the on-chip CAM memory cells and the asynchronous digital routers labeled R1, R2, and R3. Analog parameters can be set independently for each core using on-chip 12 bit digital-to-analog bias generators.
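The signal chain described above can be summarized in a minimal behavioral sketch: input spikes are low-pass filtered by a synapse into a current, which a leaky integrate-and-fire neuron integrates and converts back into output spikes. This is an illustrative model, not the actual DYNAP-SE circuit equations; all constants and variable names are ours.

```python
import numpy as np

# Behavioral sketch of the spike -> synaptic current -> neuron -> spike chain.
# Illustrative parameters only (not chip values).
dt, n_steps = 1e-4, 5000                  # 0.1 ms steps, 0.5 s of input
tau_syn, tau_mem = 10e-3, 20e-3           # synaptic and membrane time constants
w, gain, v_thr = 0.8, 2.0, 1.0            # synaptic weight, membrane gain, threshold

rng = np.random.default_rng(0)
spikes_in = rng.random(n_steps) < 100 * dt    # ~100 Hz Poisson input train

i_syn = v = 0.0
n_out = 0
for s in spikes_in:
    i_syn += w * s - dt / tau_syn * i_syn     # spike "kick" plus exponential decay
    v += dt / tau_mem * (-v + gain * i_syn)   # leaky integration of the current
    if v > v_thr:                             # threshold crossing: emit a spike
        v = 0.0                               # reset (refractory period omitted)
        n_out += 1
```

Note that when the input is silent, both `i_syn` and `v` simply decay, mirroring the passive, input-driven nature of the circuits.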

Mixed-signal analog/digital processors
The brain-inspired computational strategies proposed in this paper apply to all types of heterogeneous neuromorphic processing systems. However, here we demonstrate their benefits using the Scalable Dynamic Neuromorphic Asynchronous Processor (DYNAP-SE) neuromorphic SNN chip. This neuromorphic processor, originally proposed and fully characterized in [7], comprises analog subthreshold circuits that emulate the neural and synaptic dynamics, asynchronous digital circuits that route spikes among the neurons and program different network connectivity schemes, and local CMOS memory cells distributed within and between neuron elements that store network parameters and connectivity routing tables (see figure 1). The DYNAP-SE is a multi-core architecture, with 4 cores of 256 silicon neurons each, and an asynchronous inter-core and inter-chip hierarchical routing scheme. The input and output spikes are encoded as address events and are transmitted across cores using the Address-Event Representation (AER) communication protocol [55,56]. The silicon neuron circuits reproduce the dynamics of the Adaptive Exponential Integrate-and-Fire neuron model [29,57], and the synapses use Differential Pair Integrator (DPI) current-mode circuits [29] to reproduce synaptic dynamics with tunable time constants that range from micro-seconds to hundreds of milliseconds. Each neuron has a local Content Addressable Memory (CAM) block containing 64 addresses of 12 bits each, representing the identities of the pre-synaptic neurons it subscribes to. Input address-events are extended by local pulse extender circuits and converted into weighted currents that are then summed into the DPI synaptic dynamics blocks. Four different synapse types can be chosen for each neuron: AMPA (fast excitatory), NMDA (slow, voltage-gated excitatory), GABA-b (inhibitory, subtractive), and GABA-a (inhibitory, shunting). Each synapse type is implemented with a dedicated DPI circuit and independent parameters.
The synapse and neuron parameters are programmable via a 12 bit temperature-compensated on-chip bias generator [58]. All parameters are globally shared within a core, and there are four independent bias generators per chip (one per core). Due to the analog nature of the silicon neurons and their device mismatch, the shared parameters used to set temporal and functional characteristics of the circuits, such as refractory period, synaptic efficacy, and time constants, vary between individual neurons and synapses. For this work we used a board with four DYNAP-SE chips, a custom Field Programmable Gate Array (FPGA) device, and a USB interface to a standard PC for configuring circuit parameters, setting up synaptic connections, sending input events, and reading out neural firing activity. The board allows real-time measurement of the spiking activity of all 4096 neurons on the board, measured as address-events via the AER protocol, and of the analog membrane potential of one (user addressable) neuron per core, for a total of 16 parallel voltage traces.

Device mismatch effects in neuromorphic circuits
Variations in silicon doping and device geometry are intrinsic to the fabrication process of CMOS devices [31]. Device mismatch yields heterogeneous electrical properties that affect the behavior of the analog circuits, even if they have identical geometries at design time. Figure 2 shows measurements characterizing the device mismatch effects on the response properties of the analog neuron and synapse circuits implemented on the DYNAP-SE. For example, when injecting the same input current into multiple instances of integrate-and-fire neurons of one core, the integration and spike generation circuits produce different delays in the time-to-first-spike (see figure 2(a)). Similarly, when multiple synapses that share the same weight parameters are stimulated, the device mismatch in these circuits affects the amplitude of their response (see figure 2(b)). When measured across multiple instances of synapse and neuron circuits belonging to the same chip, the response properties of the neuromorphic circuits that we design produce distributions with typical coefficients of variation (CVs) ranging from 10% to 20% (e.g. see figures 2(c) and (d)).
To quantify the device mismatch effects in the DYNAP-SE circuits accurately, we first defined a set of shared parameters that produce the desired average neuron and synapse behaviors and then systematically measured the circuit responses across all synapse and neuron circuits integrated on the chip. We automated the data acquisition process using a computer-controlled oscilloscope and measured the analog subthreshold membrane response of the neuron, in response to different types of inputs. We recorded all the measurements on a work-station and carried out standard signal analysis routines to derive the neuron refractory period, the neuron time constant, the synaptic weight and the synaptic time constant from the recordings (see figure 3).
The refractory period of individual neurons was measured by driving the neuron to produce regular spike trains with constant input currents, as depicted in figure 3(a). The refractory period was defined as the time interval between the action potential reset and the voltage of the neuron rising above 20% of its resting value.
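The measurement definition above can be illustrated on a synthetic trace: after a reset the membrane stays clamped for the refractory period, then recharges toward rest, and we measure the time until it crosses 20% of the resting value. This sketch uses a simulated trace with assumed parameters; real measurements come from the oscilloscope recordings.

```python
import numpy as np

# Synthetic membrane trace: clamped at 0 V during the refractory period,
# then exponential recharge toward rest. Illustrative values, not chip data.
dt, v_rest = 1e-5, 0.5                # 10 us sampling, assumed resting voltage
t_ref, tau_rise = 2e-3, 1e-3          # "true" clamp time and recharge constant

t = np.arange(0, 0.02, dt)
v = np.where(t < t_ref, 0.0,
             v_rest * (1.0 - np.exp(-(t - t_ref) / tau_rise)))

# Refractory period as defined in the text: from the reset (t = 0) to the
# first crossing of 20% of the resting value.
t_measured = t[np.argmax(v > 0.2 * v_rest)]
```

Note that the measured value slightly overestimates the clamp time, since it also includes the initial portion of the recharge; with the definition used here this overhead is small and consistent across neurons.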
The neuron's time constant was estimated by fitting its response to a step input current small enough to keep the neuron's membrane potential below its firing threshold. The fit was performed for the decaying part of the circuit response, corresponding to the removal of the input current (see figure 3(b)).
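The time-constant extraction can be sketched as a log-linear least-squares fit to the decaying part of the trace. Here the trace is synthetic with a known decay constant and additive noise; the fitting step is the same one would apply to the recorded data.

```python
import numpy as np

# Fit an exponential decay to (synthetic) membrane data to recover the time
# constant. tau_true, the amplitude, and the noise level are illustrative.
tau_true = 20e-3                       # "ground truth" decay constant (20 ms)
t = np.arange(0, 0.1, 1e-4)            # 100 ms of samples at 10 kHz
rng = np.random.default_rng(1)
trace = 0.3 * np.exp(-t / tau_true) + rng.normal(0, 1e-3, t.size)

# Log-linear least squares on the part of the decay clearly above the noise
mask = trace > 0.02
slope, _ = np.polyfit(t[mask], np.log(trace[mask]), 1)
tau_est = -1.0 / slope
```

Restricting the fit to samples well above the noise floor keeps the logarithm well behaved and the slope estimate unbiased to a good approximation.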
The synaptic weight and time constant were estimated indirectly by analyzing the neuron membrane potential using a more elaborate protocol: we first set the neuron time constant to very short values to allow the neuron to follow its input currents faithfully; then, to generate excitatory or inhibitory post-synaptic potentials (EPSPs and IPSPs, respectively), we stimulated the neurons with 20 Hz spike trains and analyzed the impulse response decay in-between input spikes (see figure 3(c)). The weight values were estimated by measuring the PSP amplitude at the onset of the input spike, and the time constant was estimated by fitting the curve with a decaying exponential. According to the measurements performed, the neuron time constant parameter has a CV of 18%, and its refractory period a CV of 8%. Similarly, the CV of the synapse time constant ranges between 7% and 10% depending on its type (NMDA or AMPA), and that of the weight parameter from 14% to 30%. The detailed distributions of these parameters measured across 256 neurons of one core are shown in figure 4. The figure illustrates how the neuron refractory period, neuron membrane time constant, and excitatory synapse weight distributions change for four different parameter settings, controlled by changing the values of their respective bias currents. As shown, the CV of such distributions tends to stay constant across different parameter settings. This allows us to determine exact mismatch measures for individual properties across every core.
Even though the variability of individual parameters can exhibit spatial patterns across the chip area (see figure 5), these spatial distributions differ for each parameter and are generally uncorrelated between parameters. As a consequence, the superposition of these effects results in heterogeneous neuron firing behavior that has no particular correlation with the location of the neuron on the chip layout, or with the specific chip used. Figure 6 shows the combined effect of both synapse and neuron variability: the plot shows the average firing rate of 16 neurons belonging to the same core and sharing the same parameter settings. Each neuron is stimulated via its own excitatory synapse, driven by the same set of input spike sequences of increasing frequency. Although the variability of both synapse and neuron parameters produces response profiles with large differences, the ensemble average response follows the desired profile with good linearity.
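The effect shown in figure 6 can be reproduced in a toy model: each "neuron" is given a mismatched gain drawn with a CV of about 20%, so individual frequency-response curves deviate substantially from the nominal one, while the ensemble average tracks it closely. The linear-gain model and all numbers are our assumptions, chosen only to mimic the qualitative behavior.

```python
import numpy as np

# Toy model of figure 6: mismatched per-neuron gains, ensemble averaging.
rng = np.random.default_rng(2)
n_neurons, cv = 16, 0.2
gains = rng.normal(1.0, cv, n_neurons)          # mismatched per-neuron gains
f_in = np.linspace(10, 100, 10)                 # input frequencies (Hz)

rates = np.outer(gains, f_in)                   # each row: one neuron's f-I curve
ensemble = rates.mean(axis=0)                   # population-averaged response

# worst single-neuron deviation vs deviation of the ensemble average,
# both normalized to the maximum input frequency
worst_err = np.abs(rates - f_in).max() / f_in.max()
mean_err = np.abs(ensemble - f_in).max() / f_in.max()
```

By construction the ensemble deviation can never exceed the worst single-neuron deviation, and with uncorrelated mismatch it is typically several times smaller.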

Neural processing strategies for robust computation
All physical systems that represent variables with analog signals (including electronic and biological ones) are susceptible to the effects of noise and variability. An effective strategy to improve their accuracy is to increase the signal-to-noise ratio by applying signal-restoration processes at every signal processing stage. In digital systems, this is achieved by discretizing the signal levels to binary values and restoring them to these discrete levels using logic gates. Many classical analog electronic signal processing systems also perform signal restoration, by using high gain amplifiers and analog filters at multiple stages of the communication pipeline. This strategy, however, comes at the cost of high power consumption and large area overhead.
Much like electronic chip designs, animal brains have evolved to minimize both metabolic and wiring costs (i.e. power consumption and area) [26,59]. The strategies used to achieve this include the miniaturization of components and the representation of information using ensemble methods and energy-efficient codes [26,60]. The most basic ensemble method is averaging, but cortical circuits also use more elaborate neural ensemble methods which make use of population codes and canonical microcircuit neural primitives [61][62][63]. Importantly, the brain also exploits plasticity and adaptation at multiple spatial and temporal scales to compensate for noise and variability. Here we show how some of these strategies can be adopted in neuromorphic systems design to mitigate the effects of variability, or even to exploit it, for achieving robust and efficient computation.

Averaging across space and time
Spiking neural electronic systems made of high-precision components, or ideal neurons in computer simulations, represent signals with high fidelity. In analog circuits, very high levels of precision for representing signals can be achieved only at great cost. A practical alternative, extensively exploited also in the brain, is to relax the high-precision constraint on the single computational unit, the neuron, and to distribute computation in space and time. Feeding signals to populations composed of heterogeneous units with uncorrelated mismatch in their properties, as in figure 6, naturally reduces the effect of noise and variability of the single units. Averaging signals over time filters out irregularities in the spike sequences generated both by the input sensors and by the neural processing parts of the system.
The first strategy, space-averaging encoding, can be implemented, for example, by feeding the output spike train of one unit to multiple neurons in a cluster. We tested this strategy in an experiment in which a node producing a regular 200 Hz spike train drives a population of 256 neurons in one DYNAP-SE core, so as to produce an average output rate of 50 Hz. We computed the individual output firing rates during one second of recording and used their distribution across the cluster to compute the rate CV for clusters of different sizes: from 2 clusters of 128 neurons each, to 4 clusters of 64 neurons, to 8 clusters of 32 neurons, and so on, up to 256 clusters of one neuron each. For each combination, the CV of the rate is computed from the mean and variance of the firing rate distribution across clusters, grouping neurons selected within the same single core. To measure average values of the CV, we computed the mean CV over ten reconstructions of the same configuration with shuffled (regrouped) neurons across clusters. Figure 7(a) shows the firing rate CV computed in this way, as a function of cluster size. Our data are consistent with the averaging theory, which states that the standard deviation, and hence the coefficient of variation, of the cluster-averaged rate is inversely proportional to √N_cl, where N_cl is the cluster size (see figure 7(a)). Thus, the balance between employed resources and acceptable cancellation of mismatch, i.e. firing rate CV reduction, results from a trade-off that scales as 1/√N_cl.

In the second strategy, time-averaging encoding, the integration time window is a relevant parameter. Unlike neurons simulated with digital hardware, but very much like biological ones, mixed-signal analog/digital silicon neurons produce irregular spike trains even if stimulated with constant inputs, and exhibit heterogeneity in their firing rates across multiple trials even if the input stimulus is always the same.
If we fix an integration bin size, within the same bin, fast firing neurons provide more information than slow ones. In figure 7(b), we estimate the CV for increasing bin sizes, ranging from 20 ms up to 5 s, calculated for each choice of cluster size. The trade-off between readout time and amount of resources is now evident: precise rate encoding requires large clusters. For example, in this experiment an equivalent bit precision of 14 bits is achieved only by using clusters that comprise 128 neurons. However, by combining longer integration times (e.g. 100 ms-200 ms) with cluster sizes of 8 or 16 neurons it is possible to achieve equivalent bit precision of 8 bits, which appears to be adequate for many artificial intelligence and neural processing tasks [64][65][66].
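The clustering procedure and the 1/√N_cl scaling can be sketched numerically: per-neuron rates with a CV of 15% are grouped into clusters of increasing size, and the CV of the cluster means shrinks accordingly. The mapping to "equivalent bits" via log2(1/CV) is one illustrative definition, not necessarily the one used for the figures; all numbers here are assumptions.

```python
import numpy as np

# Sketch of the space-averaging trade-off: CV of cluster-averaged rates
# vs cluster size, for synthetic per-neuron rates with CV = 15%.
rng = np.random.default_rng(3)
rates = rng.normal(50.0, 7.5, 256)              # 256 neurons, CV = 15%

def cluster_cv(n_cl):
    """CV of the firing rate distribution across clusters of size n_cl."""
    means = rng.permutation(rates).reshape(-1, n_cl).mean(axis=1)
    return means.std() / means.mean()

cvs = {n: cluster_cv(n) for n in (1, 4, 16, 64)}
bits = {n: np.log2(1.0 / cv) for n, cv in cvs.items()}   # illustrative metric
```

Doubling the cluster size buys roughly half a bit of equivalent precision under this metric, which makes the resource/precision trade-off explicit.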
An advantage of this approach is that the chip designer does not have to make critical decisions (such as choosing the number of bits to use in a digital bus) at chip design time. The size of the cluster and the integration time can be flexibly changed and even adapted dynamically at run time.

Using population codes
Encoding signals in a robust and reliable manner is a fundamental step for processing and computing. In the nervous system this is done seamlessly at the site of sensory input and further elaborated at relay stations along the path to the central nervous system via populations of neurons. However, to carry out robust neural computation, it is also important to choose the proper way to represent signals. A common strategy adopted by the nervous system is to use distributed representations [67], such as population codes [37]. Population codes are used across the nervous system to represent many types of signals, such as visual [68] or auditory cues and their spatial localization [69]. Population codes, and more generally distributed representations, are tolerant to damage, noise, and in the case of neuromorphic implementations, to device mismatch. Indeed, heterogeneity and neuronal diversity in population coding can greatly enhance the network's information capacity [70].
A basic distributed representation can be implemented by using populations of neurons subdivided into independent clusters that represent the value of their input signal. To assess the benefits of this representation in mixed-signal neuromorphic systems, we performed an experiment in which we encoded the activity of input nodes with populations of silicon neurons arranged as uncoupled clusters (see figure 8(a)). Each cluster comprises a set of neurons that are not interconnected, but that receive the same spike train from the cluster's input node. Figure 8(b) shows the response of the cluster neurons to a bump signal spatially distributed across the input nodes. Note that, although the input nodes have a one-to-one non-overlapping connectivity to the neuron clusters, we activate all of them in parallel with a spatially distributed profile to  emulate a distributed representation also at the input level. Given their shared input and the strong synaptic weight values used, the neurons in each cluster would tend to produce strongly correlated spike trains if they were homogeneous. In our case, however, the heterogeneity of the silicon neuron circuits is beneficial in reducing the effects of strong input with temporal correlations. This has been shown to be effective in improving encoding accuracy [71]. Forcing the neurons to saturate at relatively low firing rates can be beneficial for reducing the overall system power consumption. However, it limits the output dynamic range of the neurons, and hinders their ability to faithfully encode inputs that have wide dynamic range. This limited dynamic range problem is faced also by biological neurons: real neurons typically have low firing rates and a small dynamic range, compared to the signals they must represent, especially in the early sensory stages. 
Using populations of heterogeneous neurons to average out the effects of variability can solve this limited dynamic range problem: as postulated in [72], populations of N integrate-and-fire neurons can faithfully encode band-limited signals that have N times the bandwidth of individual neurons. Representing signals with populations of neurons thus not only reduces the effect of variability, but also allows the system to represent signals with high dynamic ranges that exceed the range of individual neurons. Furthermore, an important requirement of this theory is that neurons do not start from identical initial conditions [72]; populations of heterogeneous neurons, such as those implemented with analog circuits and/or with memristive devices, naturally satisfy this requirement. To validate this theory with our neuromorphic circuits, we carried out an experiment in which we stimulated a cluster of 16 silicon neurons with a single Poisson input spike train. The input node was configured to spike with an average firing rate of 100 Hz, while all neurons in the cluster produced firing rates with a maximum value of about 40 Hz. Figure 9 presents the experimental measurements. As shown in panels 9(a)-(e), although individual neurons in the cluster cannot reproduce the fine details of the high dynamic range input signal, the population average (represented by the red line in the raster plots) can follow the input reliably. There is strong evidence that real neural systems also use this strategy, for example to encode signals in the vestibular system [73].
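The bandwidth argument can be illustrated with a simple rate model: 16 units, each saturating at about 40 Hz and carrying mismatched gains, collectively track an input whose dynamic range exceeds that of any single unit. The saturating-gain model, bin size, and all parameters are our assumptions, chosen to mimic the experiment qualitatively.

```python
import numpy as np

# Population readout of a high-dynamic-range signal by saturating units.
rng = np.random.default_rng(4)
t = np.linspace(0, 1, 200)
x = 100 * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))      # input "rate", 0-100 Hz

n, dt = 16, 0.05                                        # 16 units, 50 ms bins
gains = rng.normal(0.4, 0.08, n)                        # mismatched gains
rates = np.clip(gains[:, None] * x[None, :], 0, 40)     # each unit caps at 40 Hz
counts = rng.poisson(rates * dt)                        # Poisson spike counts
pop = counts.mean(axis=0) / dt                          # population-rate readout

corr_pop = np.corrcoef(pop, x)[0, 1]                    # how well it tracks x
```

Even though every unit clips well below the input's peak, the mismatched saturation points spread the coverage of the input range across the population, so the averaged readout remains strongly correlated with the signal.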
Also in this case, in addition to averaging out the effect of variability, the use of populations of neurons provides important additional advantages for encoding high-bandwidth signals with low-bandwidth (and low-power) silicon neurons. Moreover, although we restricted the analysis of the networks to mean firing rates, without studying the effects of precise spike-timing, the same networks could encode signals exploiting the dynamics and the timing of input/output signals, for example using rank-order neural codes [74].

Using recurrence and self-excitation
In addition to choosing the right representation, computation relies heavily on gain in the processing pathway. One way to implement and modulate gain in networks of neurons is to change the strength of the synaptic weights from layer to layer. However, this process, typically achieved via synaptic plasticity, can be very slow and does not support modes of operation that require fast gain changes to carry out the desired computations. Alternatively, a strategy commonly found in the nervous system that overcomes this problem is the use of recurrence, with both positive and negative feedback. By adding recurrent excitation to a population of neurons, it is possible to control and modulate the gain of the network quickly, following the fast changes in neural activity rather than the slow changes in synaptic efficacy. Indeed, as gain modulation is driven directly by the network activity, it can happen at much higher speeds than those dictated by plasticity mechanisms.
To demonstrate how this strategy is effective also for recurrent networks of silicon neurons, we added self-excitation to the cluster of 16 neurons of figure 9 (see figure 9(f)). This has the effect of increasing the gain of the network, driving the response of the network to much higher firing rates compared to the 40 Hz baseline of panel 9(d) (see red data in figure 9(g)), while still being sensitive to the changes in the input signal, as evidenced by the linear input-output relationship measured in figure 9(h). However, as this gain modulation is obtained via positive feedback, it can be difficult to control. For example, in this case and with the parameter settings chosen, the network maintains its activity in a high persistent state even after the input has been removed. This can be a desirable effect for developing attractor networks [75,76], but it can also be an undesired effect in other cases. The brain-inspired strategy that can be used to keep this effect under control is to add a negative feedback loop in parallel with the positive feedback one. This is achieved by projecting the activity of the excitatory neurons to a population of inhibitory neurons that, in turn, inhibits back the excitatory population (see figure 9(i) and (j)).
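The two regimes described above can be reproduced in a two-variable rate model: with self-excitation alone the network latches into a persistent high state after the input is removed, while adding the inhibitory feedback loop restores input-driven operation. The equations, weights, and the firing-rate ceiling are illustrative assumptions, not fits to the chip.

```python
# Rate-model sketch of gain control via positive and negative feedback.
# All parameters are illustrative, not chip values.
def run(w_ei, steps=3000, dt=1e-3, tau=20e-3, w_ee=1.5, w_ie=1.5):
    e = i = 0.0
    for k in range(steps):
        inp = 20.0 if k < steps // 2 else 0.0          # input on, then removed
        e += dt / tau * (-e + max(w_ee * e - w_ei * i + inp, 0.0))
        e = min(e, 100.0)                              # firing-rate ceiling
        i += dt / tau * (-i + max(w_ie * e, 0.0))
    return e                                           # activity after input removal

e_exc_only = run(w_ei=0.0)   # positive feedback only: persistent high state
e_balanced = run(w_ei=2.0)   # with recurrent inhibition: activity shuts off
```

With `w_ei = 0` the recurrent gain exceeds one and the excitatory rate sticks at its ceiling even without input (the attractor-like behavior noted above); with the inhibitory loop engaged, the same recurrent excitation still amplifies the input but the network returns to rest once the input is removed.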

Balancing excitation with inhibition
In the configuration of figure 9, a single common Poisson source of spikes (with ISI CV = 1) was used to stimulate all neurons in the cluster. This led to correlated firing in the cluster, especially in the presence of recurrent excitation: the average ISI CV of the population spike trains in figure 9(g) decreased to 0.1. This impairs energy efficiency and is detrimental for signal encoding [70,77]. Recurrent inhibition in architectures with excitatory and inhibitory synapses can help to decorrelate firing activity and significantly enhance coding efficiency [78][79][80]. This is also true for our silicon neuron network of figure 9(i). The recurrent inhibitory feedback leads to an excitatory/inhibitory balance that has the effect of producing sparse and decorrelated activity: the average ISI CV of the data in figure 9(j) increased back from 0.1 to 0.37.
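The ISI CV statistic quoted above is simply the standard deviation of the inter-spike intervals divided by their mean: a Poisson train has CV close to 1, while a regular, clock-like train has CV close to 0. A minimal sketch on synthetic spike trains:

```python
import numpy as np

# ISI coefficient of variation for two reference spike trains.
rng = np.random.default_rng(5)

def isi_cv(spike_times):
    """CV = std(ISI) / mean(ISI) of a sorted spike-time array."""
    isi = np.diff(np.sort(spike_times))
    return isi.std() / isi.mean()

poisson_train = np.cumsum(rng.exponential(1 / 50, 2000))   # ~50 Hz Poisson
regular_train = np.arange(2000) / 50.0                     # exact 50 Hz clock

cv_poisson = isi_cv(poisson_train)   # close to 1
cv_regular = isi_cv(regular_train)   # close to 0
```

The measured population values (1 at the input, 0.1 with recurrent excitation, 0.37 with the inhibitory loop) fall between these two extremes, quantifying how much the E-I balance decorrelates the firing.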
We initially introduced recurrent inhibition as a negative feedback loop to better control the gain of the network and reduce the average firing rates in the cluster. As this mechanism is effective in reducing correlations among the neurons, it produces additional beneficial effects for efficient signal encoding and for memory compression [81,82], which could be exploited for storing memories in mixed signal or hybrid neuromorphic/memristive architectures [83,84]. In addition, the asynchronous firing state produced in this way can generate an optimal noise structure, enabling the network to track input changes rapidly [85,86].

Using soft Winner-Take-All networks
By combining the recurrent inhibition mechanisms used to sparsify the neural activity of figure 9(i) with the distributed-representation population coding scheme of figure 8, we can implement networks that exhibit both competitive and cooperative behavior, and that can support a wide range of useful computational functions. Figure 10(a) shows an example of such a network: cooperation is mediated by excitatory connections with local connectivity (nearby clusters are connected via excitatory connections), and competition is achieved by means of global inhibitory connections: all clusters are inhibited by the population of inhibitory neurons, which is stimulated by the excitatory neurons of all the clusters in the network. These types of networks are often referred to as soft Winner-Take-All (sWTA) networks. In these networks, nearby neurons are biased to have similar response properties (e.g. similar stimulus preferences, or receptive fields) and thus create a map in which close-by units represent similar features.
Depending on their parameters, the same sWTA network can be used to process continuous signals and represent different features that change smoothly in feature space (e.g. the orientation of a visual stimulus), or to manipulate discrete symbols such as numbers and enumerable variables (e.g. one of n possible keywords). The recurrent connections in the network make the outputs of individual clusters depend on the activity of the whole network, and not just on the neurons driven by the local input [87]. As a result, sWTAs can perform both linear operations, such as amplification by a linear gain or locus invariance, and complex non-linear operations, such as normalization, selective amplification and non-linear selection, multi-stability, or signal restoration [42]. Interestingly, it has been observed that, despite significant variation across cortical areas, sWTA types of connectivity patterns are found throughout the neocortex [88,89]. Indeed, this architecture is a 'canonical microcircuit' that can be used as a fundamental computational primitive for multiple types of both signal processing and computing tasks [42]. It has been shown that the computational abilities of sWTAs are of great importance in tasks involving feature extraction, signal restoration, and pattern classification [41]. Artificial neural networks with this architecture, and their neuromorphic hardware implementations, have been used to detect elementary image features (e.g. oriented bars) and to reproduce the orientation tuning curves of visual cortical cells [90][91][92]. Figure 10(b) shows the experimental measurements from the sWTA network formed by the silicon neuron circuits of the DYNAP-SE chip, with (blue solid line) and without (gray dashed line) the recurrent connections. The black dashed lines in figure 10(b) represent the input to the network.
As in figure 8(b), the output firing rate of the silicon neurons is kept low via the refractory period setting to reduce power consumption. So while low-frequency inputs are followed faithfully, high-frequency ones are scaled to lower values. What is important, however, is the effect of the recurrent inhibition: this network of silicon neurons can reproduce the selective amplification features expected from sWTA models, amplifying the strongest inputs with a gain greater than one while at the same time suppressing the weaker ones. This gives rise to a 'sharpening' of the tuning curve similar to what has been measured in real cortical cells [87,90,93].

Selective amplification
Due to their cooperative/competitive nature, clusters with the highest response are amplified while weaker ones are suppressed. To highlight the selective amplification properties of the sWTA presented in figure 10 we stimulated the network with two input bumps of slightly different amplitudes. Figure 11 shows the response of the network to these inputs. The reference response of the feed-forward network without the recurrent feedback is shown in figure 11(a). As expected, the network follows the input with two bumps of the same width and proportional amplitudes. As soon as feedback is enabled, the non-linear processing features of the network become evident: higher inputs are preserved and 'pass through' to further processing stages, while (even slightly) weaker inputs are almost fully suppressed (see panels (b) and (c) of figure 11 for the cases in which the stronger input is on the left or right side, respectively).
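This selective-amplification behavior can be illustrated with a minimal rate-based sWTA model. The sketch below is our own illustration with hypothetical parameters, not the DYNAP-SE configuration: rectified-linear rate units on a ring with local excitatory connections and a single global inhibitory term, where a hard saturation stands in for the refractory-period rate limit. Driven with two bumps of slightly different amplitude, the stronger one survives while the weaker one is suppressed:

```python
import numpy as np

N = 32                     # excitatory units arranged on a ring
idx = np.arange(N)
dist = np.abs(idx[:, None] - idx[None, :])
dist = np.minimum(dist, N - dist)                 # ring (wrap-around) distance
W_exc = 0.45 * np.exp(-dist**2 / (2 * 1.5**2))    # local excitatory kernel
w_inh = 0.25                                      # global inhibition gain
R_MAX = 2.0                                       # saturation (refractory limit)

def swta(inp, steps=500, dt=0.1):
    """Euler-integrate the rate dynamics: dr/dt = -r + f(inp + exc - inh)."""
    r = np.zeros(N)
    for _ in range(steps):
        drive = inp + W_exc @ r - w_inh * r.sum()
        r += dt * (-r + np.clip(drive, 0.0, R_MAX))
    return r

def bump(center, amp, sigma=1.5):
    d = np.minimum(np.abs(idx - center), N - np.abs(idx - center))
    return amp * np.exp(-d**2 / (2 * sigma**2))

inp = bump(8, 1.0) + bump(24, 0.8)   # two inputs of slightly different strength
out = swta(inp)
print(out.argmax())                  # 8: the winner sits at the stronger input
print(out[24] < 0.2 * out[8])        # True: the weaker bump is suppressed
```

With these (hand-picked) parameters the global inhibition recruited by the winning bump pushes the net drive at the weaker input's location below threshold, reproducing qualitatively the behavior measured in figure 11.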

Signal restoration
These very same features enable the network to support another important computational primitive: 'signal restoration'. This is the very process that enabled the success of logic gates in digital computing systems, which always restore their output to a nominal '1' or '0' level. If signals are encoded with distributed population codes, as provided by the output of sWTA networks, then multiple layers of such networks automatically carry out signal restoration. To demonstrate this experimentally, we provided as input to the hardware sWTA network a 'bump' signal, as produced by neurons belonging to another sWTA network, and corrupted it in two different ways: with 'dead' neurons (i.e. by completely silencing 30% of the input units) and with corrupted inputs (i.e. by adding 20% Gaussian distortion to the nominal value of each input unit). As shown in figure 12, the network is able to recover the original population-encoded signal and produce an output that is very close to the un-corrupted input.
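The restoration property can be sketched with the same kind of rate-based sWTA model (again with hypothetical parameters of our own choosing, not the chip settings): a population-coded bump, corrupted with dead units and additive noise, is mapped back to a clean bump at approximately the correct location:

```python
import numpy as np

N = 32
idx = np.arange(N)
dist = np.minimum(np.abs(idx[:, None] - idx[None, :]),
                  N - np.abs(idx[:, None] - idx[None, :]))
W_exc = 0.45 * np.exp(-dist**2 / (2 * 1.5**2))   # local excitation on a ring
w_inh, R_MAX = 0.25, 2.0                         # global inhibition, saturation

def swta(inp, steps=500, dt=0.1):
    r = np.zeros(N)
    for _ in range(steps):
        r += dt * (-r + np.clip(inp + W_exc @ r - w_inh * r.sum(), 0, R_MAX))
    return r

rng = np.random.default_rng(1)
d = np.minimum(np.abs(idx - 16), N - np.abs(idx - 16))
clean = np.exp(-d**2 / (2 * 1.5**2))             # ideal bump centred on unit 16

corrupted = clean * (rng.random(N) > 0.3)        # ~30% 'dead' input units
corrupted = corrupted + 0.2 * rng.normal(size=N) # 20% Gaussian distortion
corrupted = np.clip(corrupted, 0, None)          # firing rates are non-negative

out = swta(corrupted)
print(out.argmax())   # peak recovered at, or next to, the original centre
```

The cooperative excitation pools the surviving inputs around the true bump location, while the competitive inhibition suppresses isolated noise-driven units, so the output peak lands near unit 16 despite the corruption.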

Attractor dynamics
sWTA networks can exhibit the attractor dynamics underlying many cognitive functions, including decision making and working memory. Depending on their connectivity patterns, sWTA networks can have open boundary conditions (e.g. to represent a variable ranging from zero to a maximum value) or closed boundary conditions (e.g. to represent an angle that can take values between 0° and 360°). In the latter case, the sWTA network forms a ring attractor, rather than a line, in the network state space. Both types of configurations have been found in different brain areas (e.g. in the brain stem for oculomotor control [94], or in the fly's central complex for navigation [95,96]). In both types of networks, the position of the neuron, or cluster of neurons, with the highest activity represents the value of the variable being encoded. When driven by external signals, this activity bump stabilizes around the strongest input. Ideally, when the input is removed, and when the sWTA is configured to produce working-memory behavior, the activity bump should persist and remain in the same position. However, in our experiments, after the input is removed, the bump of activity starts to drift (see figure 13). In these experimental results, the sWTA activity bump drifts randomly, continuously shifting between semi-stable positions across the whole population for the first two seconds of the recording. At t = 2 s, an input bump with a Gaussian profile is presented. The peak of the Gaussian is indicated by the red line in figure 13. As evidenced, the network activity quickly shifts to the location of the strongest input and stays there until the next input is presented. After the input is removed again (at t = 4 s), the bump starts to drift once more. Thus, this sWTA does not have a single, strong attractor able to immediately cancel the memory of an input.
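The bump persistence, and its sensitivity to heterogeneity, can also be reproduced in simulation. In the following sketch (illustrative parameters of our own choosing, with a frozen random gain per unit playing the role of device mismatch), the recurrent excitation is made strong enough that the bump sustains itself after the input is removed; the heterogeneity then biases the bump toward its own preferred, semi-stable positions, mimicking the drift of figure 13:

```python
import numpy as np

N = 32
idx = np.arange(N)
dist = np.minimum(np.abs(idx[:, None] - idx[None, :]),
                  N - np.abs(idx[:, None] - idx[None, :]))
W_exc = 0.8 * np.exp(-dist**2 / (2 * 1.5**2))   # stronger local excitation
w_inh, R_MAX = 0.25, 2.0

rng = np.random.default_rng(2)
gain = 1 + 0.1 * rng.normal(size=N)   # frozen per-unit mismatch (hypothetical 10%)

def step(r, inp, dt=0.1):
    drive = gain * np.clip(inp + W_exc @ r - w_inh * r.sum(), 0, R_MAX)
    return r + dt * (-r + drive)

d = np.minimum(np.abs(idx - 10), N - np.abs(idx - 10))
inp = np.exp(-d**2 / (2 * 1.5**2))    # input bump at unit 10

r = np.zeros(N)
for _ in range(500):                  # stimulus on: the bump forms at the input
    r = step(r, inp)
for _ in range(2000):                 # stimulus off: the bump self-sustains
    r = step(r, np.zeros(N))

print(r.max() > 1.0)          # True: activity persists without input
print((r > 0.5).sum() < N // 2)  # True: it remains a localized bump
```

Because the mismatch makes some positions slightly more excitable than others, the persistent bump is not pinned to the stimulated location; in a perfectly homogeneous network, every position on the ring would be equally stable.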
This phenomenon has also been modeled in theoretical works, and has been shown to be controllable by endowing the network with homeostatic plasticity features [97], which can be readily implemented in neuromorphic electronic circuits [98,99].

Linking multiple soft Winner-Take-All networks together
sWTA networks have been associated with canonical microcircuits that sub-serve many computational properties in many cortical areas [100][101][102]. As these microcircuits have often been found to be reciprocally connected in the cortex [103], it is natural to hypothesize that multiple sWTA networks coupled with each other have the potential of performing even more complex computations.
Indeed, it has been shown how such coupled networks can form arbitrary relations between the variables encoded in the individual sWTA populations [104], or can self-organize to learn relations when coupled and endowed with spike-based learning mechanisms [105]. One example of a relational network formed by coupling multiple population-coded variables is the model of sensory-motor mapping between head and eye positions [106]. Another example of a relational network is shown in figure 14, where three sWTA networks are coupled to form the relationship A + B = C [104,107]. The neuromorphic implementation of this relational network consists of four distinct populations of silicon neurons: three input 1D sWTA networks, representing the variables A, B, and C, and one hidden 2D population encoding the relationship between the three variables (see figure 14(a)). The links from the 1D networks to the 2D one are bi-directional, therefore creating a recurrent network that allows the activity of each 1D network to influence the others. We configured the hardware such that the variables encoded by the networks range from 0 to 1, and we implemented closed boundary conditions such that the variables wrap around (i.e. if an operation pushes a variable beyond 1, only the fractional part of the result is kept). The plots of figure 14(b) show experimental data measured from the silicon neurons of the different networks, as populations A and B were being driven by external inputs. The choice of the relationship used to demonstrate this principle was arbitrary. Furthermore, in this example we set the parameters of the networks (and in particular the weights of the hidden population) manually, to demonstrate the proper operation of the chosen relationship. It is interesting to note, though, that arbitrary relationships, including non-monotonic ones such as A² + B² = C², can also be trained through local spike-based learning rules [105,108].
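A stripped-down functional sketch of such a relational network (our own illustration with made-up population sizes, not the DYNAP-SE configuration) makes the wiring explicit: two input variables are population-coded as bumps on a ring, a hidden 2D population responds to their conjunctions, and its projection onto the third population hard-wires the relation A + B = C with wrap-around:

```python
import numpy as np

N = 20  # units per 1D population; encoded variables live in [0, 1)

def encode(x, sigma=1.5):
    """Population code: Gaussian bump centred on x, wrapped on the ring."""
    d = np.abs(np.arange(N) - x * N)
    d = np.minimum(d, N - d)
    return np.exp(-d**2 / (2 * sigma**2))

def relational_sum(a, b):
    A, B = encode(a), encode(b)
    # hidden 2D population: unit (i, j) responds to the conjunction A_i * B_j
    H = np.outer(A, B)
    # each hidden unit projects to C at index (i + j) mod N, wiring in the
    # relation A + B = C with closed (wrap-around) boundary conditions
    C = np.zeros(N)
    for i in range(N):
        for j in range(N):
            C[(i + j) % N] += H[i, j]
    return C.argmax() / N  # decode the winning cluster back to [0, 1)

print(relational_sum(0.25, 0.50))  # → 0.75
print(relational_sum(0.70, 0.60))  # → 0.3 (wraps around beyond 1)
```

In the hardware network the same wiring is recurrent (bi-directional), so clamping any two of the three populations drives the third to the consistent value; this feedforward sketch shows only one direction of that relation.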

Using spike-based learning and plasticity
Perhaps the most powerful and widespread strategy used by the brain to maximize robustness in neural computation is plasticity and learning. Indeed, a large number of neuromorphic spike-based learning circuits have been proposed to improve robustness to device mismatch and mitigate the effects of noise in both the input signals and the system parameters [49,54,[109][110][111][112][113][114][115]. Many studies on spike-based learning circuits interfaced to memristive devices have also shown how plasticity and on-line learning can overcome the variability and low resolution of nanoscale memristive devices [83,[116][117][118]. Similarly, another powerful plasticity mechanism inspired by nature that is fundamental for improving robustness and stability is that of homeostatic plasticity [119,120]. Electronic implementations of homeostatic plasticity mechanisms have been proposed using both pure CMOS [98,99,121] and hybrid memristive/CMOS [122][123][124] technologies.
An increasing number of spike-based learning models compatible with neuromorphic computing technologies are being proposed that can lead to novel, robust, and fault-tolerant memristive/CMOS neuromorphic computing systems [125][126][127][128][129]. In these systems too, variability and heterogeneity in neurons can bring extremely powerful benefits. For example, boosting studies have shown that if one considers single neurons as 'weak' linear classifiers, i.e. perceptrons [130] that respond with low accuracy to patterns they have been trained to recognize, then using populations of heterogeneous neurons trained to recognize the same patterns increases the classification accuracy and decreases the recognition error exponentially with the population size [131].
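The benefit of pooling weak classifiers is easy to demonstrate with a toy sketch (ours, with a made-up task and noise level): each 'neuron' is a perceptron whose weight vector is a heavily corrupted copy of the ideal one, so individually they classify barely above chance, while a majority vote over a heterogeneous population is far more accurate:

```python
import numpy as np

rng = np.random.default_rng(42)

# toy task: the label is the sign of the first input coordinate
X = rng.normal(size=(2000, 5))
y = np.sign(X[:, 0])

ideal = np.eye(5)[0]
# heterogeneous weak perceptrons: heavily corrupted copies of the ideal weights
weights = np.array([ideal + rng.normal(scale=2.0, size=5) for _ in range(501)])

def accuracy(pred):
    return float(np.mean(pred == y))

# mean accuracy of the individual weak classifiers
individual = np.mean([accuracy(np.sign(X @ w)) for w in weights])

def vote(n):
    """Majority vote of the first n (odd, to avoid ties) weak classifiers."""
    return np.sign(np.sign(X @ weights[:n].T).sum(axis=1))

print(round(float(individual), 2))    # barely above chance
print(round(accuracy(vote(501)), 2))  # population vote: much higher
```

The weight noise here plays the role of device mismatch: because each unit's errors are (largely) independent, pooling their decisions averages the errors out, which is the population-level mechanism the boosting results formalize.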
Therefore, plasticity and learning have an extremely high potential for enabling robust computation on mixed-signal neuromorphic hardware computing substrates. However, as this is still an active area of investigation and exploring all these mechanisms would go beyond the scope of this work, we refer the reader to the literature cited in this section and to a recent survey of local learning rules compatible with the constraints of neuromorphic circuits [132].

Discussion
More and more evidence is being accumulated showing that biologically plausible constraints in neural networks, such as heterogeneity [34][35][36], non-negativity of parameters, and the partitioning of neurons into distinct excitatory and inhibitory cell types [133], improve robustness, reliability, and overall computing performance. Neuromorphic processing systems comprising mixed-signal electronic circuits, memristive devices, or both, naturally support and even enforce such constraints at no additional cost (e.g. without requiring extra random number generator circuits). In this work, we presented biologically inspired strategies compatible with such constraints and showed how they can support the construction of reliable artificial neural processing systems, using neuromorphic physical computing substrates built from massively parallel arrays of electronic components affected by device variability (e.g. subthreshold transistors, or memristive devices). Inspired by biology, we resorted to using populations of spiking neurons to reduce the effects of device-to-device and cycle-to-cycle variability. However, rather than using one single population with a large number of neurons, the strategy of representing signals with distributed population codes led to the use of multiple clusters (populations) with relatively small numbers of neurons. We validated this strategy using a mixed-signal neuromorphic processor: we measured the amount of variability present in the silicon neuron circuits and characterized the trade-off between population size and accuracy in producing reliable mean firing rate figures. Although the figures reported are specific to the particular chip used, the overall approach holds for any technology and any design that uses multiple physical instantiations of analog electronic circuits to directly implement stateful elements that emulate synapse and neural function (e.g.
as in neuromorphic processing systems that use memristive cross-bar arrays) [23,30]. Once we accepted the fact that robustness, precision, and fault-tolerance could be improved by trading off accuracy with area (as for this approach, the area of the physical substrate increases with the number of neurons used in the population), we found that this strategy offered many more computational advantages than just accuracy: by using populations of neurons to represent signals, and by adopting the same strategies adopted by biological neural processing systems (namely, to use distinct classes of neurons and synapses for the excitatory and inhibitory signal pathways, to balance excitation with inhibition, and to represent signals with population codes) we demonstrated how such neuromorphic processing systems can support signal reconstruction, efficient coding, and coherency of representations. We provided chip measurements demonstrating how such population coding strategies could then be exploited to create robust and reliable computational primitives such as selective amplification and signal restoration. Additional advantages of this approach have also been demonstrated in experimental and computational neuroscience, such as ultra-fast response times of populations of real neurons [134], robust and efficient representation of time-varying signals [135], and reliable propagation of information across multi-layer networks of inhomogeneous spiking neurons [136].
Unless explicitly simulated, networks of spiking neurons implemented with bit-precise digital circuits or simulated in software do not exhibit the effects of variability and inhomogeneity, and therefore cannot exploit the advantages of the approaches presented here for coping with them. This holds for standard computers, custom ANN accelerators, and fully digital time-multiplexed neuromorphic computing systems which numerically integrate the dynamic equations to simulate the function of multiple neurons [4,6,137]. On the other hand, since these digital systems support fetching data from external memory banks, they can take advantage of the high density of DRAM and implement complex large-scale systems capable of extremely remarkable achievements [138], even without adopting the brain-inspired strategies and methods presented here. This, however, comes at the cost of exceptionally large computational resources and power consumption figures [139,140]. The neuromorphic systems we have been discussing in this work can operate with much lower power budgets, because of their in-memory computing nature [23-25, 141, 142]. However, since they implement 'stateful neural networks', they require memory and area resources at every neuron and synapse. Therefore, scaling such systems to very large sizes is still an open challenge. This challenge might be solved in the future with the advent of memristive devices and other emerging memory technologies, but it cannot be easily addressed today. Nonetheless, rather than replacing digital ANN hardware systems, these neuromorphic systems can complement them for solving problems that do not require large-scale deep neural network (DNN) models. Indeed, there are many practical problems that can be solved with shallow feed-forward networks and populations of perceptrons [130], or with small recurrent networks with temporal dynamics and a linear read-out layer [143][144][145].
For example, both the chip and the principles presented here have already been applied to clinically relevant problems, to classify ECG and EEG signals and detect anomalies in heartbeats or indicate putative epileptogenic zones in epileptic seizure patients [146][147][148]. In general, these mixed-signal in-memory computing systems can provide advantages over classical ANN accelerators for those types of problems that require a very limited amount of resources, in terms of volume, memory, and power, and that cannot resort to off-line cloud computing (i.e. 'edge-computing' or 'extreme edge-computing' applications). Indeed, the best solution will likely lie in the combination of both approaches, using the ultra-low-power neuromorphic systems to carry out 'always-on' sensory processing and basic classification tasks, acting as 'watch-dogs' that can alert and activate the more powerful and power-hungry ANN accelerators for carrying out more sophisticated data-processing tasks with higher accuracy. The design of the optimal intelligent neural network processing system will therefore lie in the judicious combination of bottom-up, brain-inspired mixed-signal circuits and top-down, application-driven digital computing systems [149].

Conclusions
On our journey to achieving robust computation using inhomogeneous and noisy neural computing substrates, we found that the basic principle of averaging the activity of multiple inhomogeneous neurons led to many computational advantages that go well beyond the expected one of reducing the variance with the square root of the number of neurons. In doing so, we explored the relation between the neuronal cluster size and the mean spike rate integration time required to reliably encode scalar values. We validated and quantified our studies using a mixed-signal analog/digital neuromorphic chip. Using this device as a representative example, we could link the size of the cluster and the integration time to the equivalent number of bits with which the mean spiking rate of the population is represented. Once the number of neurons per cluster required to achieve the desired precision had been determined, we added recurrent excitation and inhibition within the cluster, and configured the parameters so as to keep the neurons active even after the input stimulus is removed, implementing a working memory. This strategy also led to a reduction of the correlations in spike timings and cluster firing rates, which is a desirable feature in many models and applications. By adding local excitation and global inhibition among the clusters, we produced sWTA dynamics which supported even more interesting and robust computations, for example by reliably encoding scalar values with population coding and by performing signal restoration. Finally, we showed how recurrently coupled sWTA networks can perform non-trivial operations on these scalar values, demonstrating network-level examples of neuromorphic processing systems that can be used to map sensory inputs to motor outputs, and to produce state-dependent computation.
Using distributed population codes based on multiple clusters of heterogeneous neurons coupled with sWTA connectivity schemes raises questions about the optimal number of clusters and the number of neurons to use in each cluster. Recent theoretical studies can shed light on these questions, as they have shown how the robustness of these population codes depends on the activity correlations between neurons and on how their variability is shared in each cluster [80,150,151].
This work therefore demonstrates that reliable and robust computation can indeed be achieved using resource-constrained and inhomogeneous neuromorphic processing systems, by adopting the same strategies and principles used by the nervous system.

Data availability statement
The data cannot be made publicly available upon publication because they are not available in a format that is sufficiently accessible or reusable by other researchers. The data are generated on a custom chip, thus the experiments can only be re-run with the same chip, which could be done upon request. The recordings performed for the figures, and the rest of the data that support the findings of this study, are available upon reasonable request from the authors.