
Gradient-descent hardware-aware training and deployment for mixed-signal neuromorphic processors


Published 15 March 2024. © 2024 The Author(s). Published by IOP Publishing Ltd.
Focus Issue on Open Neuromorphic Simulations. Citation: Ugurcan Cakal et al 2024 Neuromorph. Comput. Eng. 4 014011. DOI: 10.1088/2634-4386/ad2ec3


Abstract

Mixed-signal neuromorphic processors provide extremely low-power operation for edge inference workloads, taking advantage of sparse asynchronous computation within spiking neural networks (SNNs). However, deploying robust applications to these devices is complicated by limited controllability over analog hardware parameters, as well as unintended parameter and dynamical variations of analog circuits due to fabrication non-idealities. Here we demonstrate a novel methodology for offline training and deployment of SNNs to the mixed-signal neuromorphic processor DYNAP-SE2. Our methodology applies gradient-based training to a differentiable simulation of the mixed-signal device, coupled with an unsupervised weight quantization method to optimize the network's parameters. Parameter noise injection during training provides robustness to the effects of quantization and device mismatch, making the method a promising candidate for real-world applications under hardware constraints and non-idealities. This work extends Rockpool, an open-source deep-learning library for SNNs, with support for accurate simulation of mixed-signal SNN dynamics. Our approach simplifies the development and deployment process for the neuromorphic community, making mixed-signal neuromorphic processors more accessible to researchers and developers.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Neuromorphic processors use analog and mixed-signal circuits to emulate the dynamics and computational abilities of biological neurons and synapses. One of the most advanced architectures is DYNAP-SE2 [1], which has an asynchronous mixed-signal structure whose analog components operate in the subthreshold range, making it a candidate for ultra-low power and ultra-low latency applications.

Devices such as DYNAP-SE2 offer a high degree of biological realism and configurability, but have historically been difficult to configure for several reasons: their complex architecture; the vast number of parameters arising from their high configurability; and the lack of standardized configuration protocols. Despite this difficulty, DYNAP-SE family members have already been used in several low-dimensional signal processing applications. In [2], real-time classification of heartbeat pathologies from multi-channel electrocardiogram recordings was performed, distinguishing between nominal beats and pathological rhythms. In [3] and [4], electromyography signals were analysed to distinguish the movement of hand muscles to classify gestures. These applications exploited the reservoir computing paradigm [5]: a semi-randomly initialized recurrent spiking neural network (RSNN) was deployed to the DYNAP-SE device to integrate the temporal patterns hidden in sensory signals, and classification was performed by a linear readout implemented on a conventional CPU, which monitored the spiking activities of hardware neurons on DYNAP-SE. These applications demonstrated that RSNN inference on the DYNAP-SE chip can operate in the sub-mW power range.

DYNAP-SE2 and other similar devices individually instantiate arrays of synapses and neurons in analog circuits. The analog nature of these circuits exposes them to variability between individual device components, introduced during fabrication. This variability, known as 'mismatch', introduces diversity in the behavior of ostensibly identical neurons and synapses within and between chips [6, 7]. Mixed-signal devices such as DYNAP-SE2 do not usually permit individual control over each parameter on the chip, instead sharing parameters such as time constants, thresholds, and even weight values across a number of neurons and synapses. This grouping reduces the parameter complexity of the device, but it inherently complicates calibration, and mismatch intensifies that challenge further. In addition, grouping parameters means that the parameter configuration space of an SNN deployed to the chip is itself heavily constrained.

As a result, training and deploying applications to these devices usually requires several months of dedicated effort from skilled researchers. To address this challenge, our work introduces an efficient and effective methodology that opens the potential for commercial application development for DYNAP-SE2. Extending Rockpool, an open-source deep-learning library for SNNs [8], our toolchain performs offline gradient-based hardware-aware optimization of SNNs, which can be robustly deployed at scale to mixed-signal devices while preserving behavior. The Rockpool tutorial available at https://rockpool.ai/devices/DynapSE/jax-training.html offers an in-depth Jupyter Notebook on how to train a spiking network for use with the DYNAP-SE2 processor. It reproduces the experiment introduced in this paper, and comprehensively covers creating synthetic datasets, constructing and fine-tuning a spiking neural network with Rockpool and JAX, and evaluating the network's performance. It also addresses gradient-based optimization techniques and the complexities of device mismatch in mixed-signal chips.

This work provides a DYNAP-SE2 simulator, 'DynapSim', which operates in the same parameter space as DYNAP family processors [1, 9]. DynapSim executes an efficient and accurate differentiable simulation of the DYNAP-SE2 design dynamics to solve the characteristic circuit transfer functions over time. This differentiable computing approach has in recent years been applied to SNNs to train deep spiking networks with gradient-based methods borrowed from machine learning [10, 11].

In order to perform gradient-based optimization, a spiking neuron model requires an additional surrogate gradient function to ensure loss gradients can propagate through the neuron [10, 12–14]. Broadly speaking, this addresses the issue that the derivatives of the spike-generation functions used in spiking neurons are ill-formed, resulting in zero or undefined gradients when propagating through the neuron. By providing a surrogate gradient for the spike-generation function, loss gradients can be preserved. In our DYNAP-SE2 neuron implementation, the derivative of the output spike train $S_\textrm{out}(t)$ with respect to a parameter P that affects the membrane current dynamics takes the form

$\frac{\partial S_\textrm{out}(t)}{\partial P} = \frac{\partial \Theta(I_\textrm{mem})}{\partial I_\textrm{mem}} \, \frac{\partial I_\textrm{mem}}{\partial P}.$

Here, $\Theta(\cdot)$ denotes the Heaviside step function, and P is any parameter that changes the membrane dynamics, such as a leakage current or gain current. In order to determine the effect of changing the parameter P on the output spike train, ${\partial \Theta(\cdot)}/{\partial I_\textrm{mem}}$ must be well defined. However, the derivative of $\Theta(\cdot)$ is zero everywhere except at the spiking threshold, where it is infinite. As a solution to this problem, a continuous surrogate function is defined that substitutes for the derivative of $\Theta(\cdot)$ in the backward pass of the backpropagation algorithm. In particular, we adopt a rectified linear (ReLU) surrogate, with constant derivative when $I_\textrm{mem} \gt I_\textrm{reset}$. For further implementation details of the neuron model and the surrogate function see the Rockpool tutorial at https://rockpool.ai/devices/DynapSE/neuron-model.html.
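
As an illustration only, and not the Rockpool implementation itself, a spike-generation function with a ReLU-style surrogate derivative can be sketched in JAX as follows; the reset level i_reset and the choice to treat the spiking threshold as a constant are assumptions made for this example.

import jax
import jax.numpy as jnp

@jax.custom_jvp
def spike(i_mem, i_spkthr):
    # Forward pass: Heaviside spike generation on the membrane current
    return (i_mem > i_spkthr).astype(jnp.float32)

@spike.defjvp
def spike_jvp(primals, tangents):
    i_mem, i_spkthr = primals
    d_i_mem, _ = tangents              # the threshold is treated as a constant in this sketch
    i_reset = 0.0                      # assumed reset current, for illustration only
    out = spike(i_mem, i_spkthr)
    # Surrogate: constant (ReLU-style) derivative wherever i_mem exceeds the reset level
    surrogate = (i_mem > i_reset).astype(jnp.float32)
    return out, surrogate * d_i_mem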

DynapSim is used as a computational neuron model in offline SNN simulations and during training. Rockpool translates the optimized networks to equivalent hardware configurations, and provides straightforward deployment of these networks to DYNAP-SE2 chips. For an overview of the DYNAP-SE2 hardware, we refer the reader to [1]. Below, we describe the implementation details of DynapSim, and demonstrate training, deployment and quantitative evaluation of a toy model on DYNAP-SE2 hardware, presenting the steps to achieve training and deployment using code examples for Rockpool.

Our implementation is available as part of the open-source Python package Rockpool (https://rockpool.ai/devices/DynapSE/dynapse-overview.html), with code available at https://github.com/synsense/rockpool.

1.1. Overview of the DYNAP-SE2 hardware

The DYnamic Neuromorphic Asynchronous Processor—ScalablE 2 (DYNAP-SE2) is a mixed-signal chip that inherits the event-driven nature of the DYNAP family [1, 9]. It directly emulates biological behavior using analog spiking neurons and analog synapses as the computational units. The transistors of the neural cores operate in the subthreshold regime, resulting in power consumption below 1 mW. Each DYNAP-SE2 chip is equipped with 1024 adaptive exponential integrate-and-fire (AdExpIF) analog ultra-low-power spiking neurons and 64 synapses per neuron. Figure 1 shows an overview of the architecture of the chip.


Figure 1. DYNAP-SE2 Architecture. See text for details. NC: Neural Core; R: Router; BG: Bias Generator. Other acronyms: see text.


The DYNAP-SE2 digital spike routing architecture involves pre-synaptic neurons broadcasting their spiking activity on an internal bus using specific neuron 'tags'. Post-synaptic neurons each monitor the bus for up to 64 tags, such that they can connect to up to 64 pre-synaptic tag IDs. This approach enables both sparse and dense connection patterns and, since multiple neurons may broadcast the same tag, permits a fan-in of more than 64 pre-synaptic neurons.

In this routing system, the post-synaptic neurons themselves effectively store the weight matrices through their broadcasting and listening connections. The strength of each synaptic weight is determined by the weight current configuration stored at each synapse. This means that the synaptic efficacy, i.e. the impact of a pre-synaptic neuron's signal on a post-synaptic neuron, is determined by the amount of current assigned to that particular synaptic connection. For more comprehensive information on the specific mechanisms of weight storage and transfer in this architecture, refer to [1] and [15].

The neural computation unit serves as the primary building block for creating the dynamics in DYNAP-SE2. Each neural core (NC) consists of 256 analog neurons that share the same parameter set. The digital memory blocks, content-addressable memories (CAMs) and static RAMs (SRAMs), store the receiving and transmitting event configurations, respectively. The synapses and neuron soma carry out analog computations, with four different types of synapses (AMPA, GABA, NMDA, and SHUNT) integrating the incoming events and injecting current into the membrane. AMPA and NMDA activation increases the firing probability, while GABA and SHUNT activation decreases it.

The CAM stores the listening event setting for each of the 64 connections of a neuron, which specifies its synaptic processing unit. The neuron soma integrates the injection currents and holds a temporal state, with configurable charging and discharging paths of capacitors determining the temporal behavior. The membrane current, which is a secondary reading on the membrane capacitance, functions as the temporal state variable. When the membrane current in a neuron reaches the firing threshold, a reset mechanism is activated, which returns the neuron membrane potential to a reset state. This also triggers the event sensing units, which then package the event in address event representation (AER) format and broadcast it on the internal bus. In this way, the neuron uses analog subthreshold circuits to compute the dynamics but conveys the resulting outputs using a digital routing mechanism.

Each neural core holds a parameter group that sets the neuronal and synaptic parameters for its 256 neurons and their pre-synaptic synapses. The neurons in the same core share many of the same parameter values, including time constants, refractory periods, synaptic connection strengths, and other attributes. Special digital-to-analog converters, the 'bias generators' (BG), set these parameter current values. In total, 70 parameters can be set to adjust the behavior of the neurons and synapses, including time constants, pulse widths, amplifier gain ratios, and synaptic weight strengths, among others. A comprehensive table detailing these parameters and their impact on the SNN simulation is provided in the appendix (table 3).

For simulation purposes, a custom computational spiking neural model relates the behavioral dynamics of a computational neural setting to the VLSI parameters of the respective circuits. It uses forward Euler updates to predict the time-dependent dynamics and solves the characteristic circuit transfer functions in time. Specifically, a 'DynapSim' neuron solves the silicon neuron [16] and silicon synapse [17] circuit equations, making use of assumptions and simplifications from [18]. Further details of the application and implementation can be found in [15].
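
As a heavily simplified sketch, and not the actual DynapSim equations, which solve the full silicon neuron and synapse transfer functions [16–18], a forward-Euler update of a single leaky synaptic current driven by input spikes might look as follows; the weight current and time constant values are placeholders.

import jax.numpy as jnp

def euler_step(i_syn, spikes_in, i_w=1e-9, dt=1e-3, tau=10e-3):
    # One forward-Euler step: exponential leak towards zero, plus a
    # weight-current contribution for each incoming spike in this time step
    d_i_syn = -i_syn / tau + (i_w / dt) * spikes_in
    return i_syn + dt * d_i_syn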

2. Methods

2.1. Task and training approach

DynapSim is an extension of the contemporary spiking neural network library Rockpool [8], and serves as a simulation solution for the DYNAP-SE2 device. Rather than providing a circuit-level-accurate simulation, DynapSim solves the characteristic equations of the analog circuits, giving an approximate simulation that can be fine-tuned and translated into a device configuration. The simulator is powered by the state-of-the-art high-performance machine learning library JAX [19], which facilitates fast execution and just-in-time (JIT) compilation on CPUs, GPUs and TPUs. The toolchain we provide performs off-chip gradient-based optimization of a spiking neural network (SNN) and deploys the trained network to the chip while preserving the optimized behavior. The following sections elaborate on the gradient-based optimization technique employed to train an SNN before its deployment to a DYNAP-SE2 chip.

2.1.1. Toy task: frozen noise classification

The purpose of the frozen noise classification experiment is to evaluate the learning abilities of the implemented simulator. The experiment focuses on training a DynapSim network to accurately classify two distinct random frozen noise patterns. The network comprises two analog neurons with recurrent connections, along with 60 external input connections. The desired outcome is for the first neuron to exhibit a significantly higher firing rate when presented with the first frozen noise, and for the second neuron to exhibit a significantly higher firing rate when presented with the second frozen noise. Figure 2 illustrates the task at hand.


Figure 2. Frozen noise classification task. 60 input channels provide input spiking patterns to the network (left; 4 shown here). The network provides two output channels (right), each of which should emit high spiking activity when presented with one of two target frozen noise inputs. 'LinearJax' and 'DynapSim' modules, provided by Rockpool, are used to simulate the weights, synapse and neuron dynamics of the network (see text for further details).


2.1.2. Data

To run the experiment with the spiking neuron model, a spiking input pattern is necessary. For this specific task, randomly generated discrete Poisson time series with a mean frequency of 50 Hz in a 500 ms duration are used as frozen noise recordings (see figure 3). Each sample comprises 60 channels, and the time-step duration is 1 ms.
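
A minimal sketch of generating such a frozen noise pattern is shown below; it is not the tutorial's exact data pipeline, and the fixed random seed is simply what makes the noise 'frozen'.

import numpy as np

rng = np.random.default_rng(seed=0)        # fixed seed -> reproducible 'frozen' noise
n_steps, n_channels = 500, 60              # 500 ms duration at 1 ms time steps, 60 channels
rate_hz, dt = 50.0, 1e-3                   # 50 Hz mean event rate per channel
frozen_noise = rng.poisson(lam=rate_hz * dt, size=(n_steps, n_channels)).astype(float)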


Figure 3. Frozen noise recordings used in training.


For training purposes, two samples are utilized to enable the network to overfit, while 1000 different random samples are reserved for testing the trained network. The optimized network should recognize the two target training samples by generating high activity on the corresponding output neuron and low activity on the other neuron. If any other sample is provided, the network should respond with similar activity on both output neurons.

2.1.3. Network

The network architecture used in this study is a simple recurrent spiking network with weighted inputs. In Rockpool, this network is constructed from two modules, LinearJax and DynapSim. The first module, LinearJax, applies a linear transformation to the input spikes to simulate spike weighting. The second module, DynapSim, simulates the time-dependent analog silicon neuron and synapse dynamics. Listing 1 shows how to instantiate this network in Rockpool.

Listing 1. Constructing the SNN in Rockpool.
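
A minimal sketch of listing 1 is reproduced below, assuming the Rockpool module paths rockpool.nn.modules.LinearJax, rockpool.nn.combinators.Sequential and rockpool.devices.dynapse.DynapSim, and the argument name has_rec; the exact constructor arguments may differ between Rockpool versions, and the linked tutorial is authoritative.

from rockpool.nn.combinators import Sequential
from rockpool.nn.modules import LinearJax
from rockpool.devices.dynapse import DynapSim

Nin, Nrec = 60, 2                       # 60 input channels, 2 analog neurons

net = Sequential(
    LinearJax((Nin, Nrec)),             # input spike weighting (60 x 2 weight matrix)
    DynapSim(Nrec, has_rec=True),       # analog neuron/synapse dynamics with 2 x 2 recurrent weights
)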

This network architecture can be compared to a fully connected layer followed by a ReLU activation layer in classical NNs. The difference, however, lies in the fact that the DynapSim layer computes and maintains a time-dependent state instead of a stateless activation. The output of the DynapSim neurons depends not only on the instantaneous inputs but also on past inputs, via internal state variables. The state evolves continuously over time, regardless of when the neuron receives spikes on its input. Additionally, the DynapSim layer encapsulates a recurrent connection matrix that is one of the targets of the optimizer. Lastly, the layer corresponds to a custom analog hardware configuration, and solving the characteristic equations of the analog circuits is a key part of its function.

To limit the complexity of the task, in this case only the weight parameters are trained, while the rest of the neuron and synapse parameters are fixed to their default simulation values. This means that mathematically, only two 2D weight matrices are subject to optimization: the $60\times 2$ input weight matrix stored inside LinearJax and the $2\times 2$ recurrent weight matrix stored inside DynapSim.

2.1.4. Response analysis

For the frozen noise discrimination task, the network's classification decision is indicated by the output neuron with the higher mean firing rate, where $r_0$ denotes the mean firing rate of the class-0 output neuron and $r_1$ that of the class-1 output neuron. As a performance metric, the ratio between the output neurons' mean firing rates quantifies the network's ability to distinguish the two target frozen noise patterns. The firing rate ratio (FRR) is calculated by dividing the higher mean firing rate by the lower mean firing rate read from the decision neurons, as shown in equation (1).

$\mathrm{FRR} = \frac{\max(r_0, r_1)}{\min(r_0, r_1)}$ (1)

2.1.5. Mismatch simulation

Each optimization step during training includes a forward and a backward pass. The forward pass simulates the mixed-signal circuit behavior under ideal conditions, but the circuits in real sub-threshold computing devices are subject to parameter mismatch. In order to make trained networks robust against parameter deviations, Büchel et al proposed modifying parameter values during training, both by injecting parameter noise and by adversarial attacks on parameter values [6].

DynapSim includes an empirically-verified model of parameter mismatch, which we apply in the forward pass, slightly perturbing the parameter values in the network that are subject to change. This mismatch simulation model addresses parameter variability without focusing on specific known causes such as temperature fluctuations or process impurities; instead, it offers a general approach to managing parameter mismatch. It varies the parameters under a Gaussian distribution, with the mean taken as the nominal parameter values and the variance determined by empirical measurement [6, 7]. New mismatched parameters are drawn every n epochs. During optimization, the network reaches parameter values that obtain a low loss value in spite of the parameter variation. As a result, SNNs trained in this way are less sensitive to mismatch-induced loss of performance when deployed to mixed-signal neuromorphic hardware. Figure 4 illustrates the effect of mismatch on parameter values.
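
The following is a generic sketch of Gaussian parameter-noise injection, not DynapSim's calibrated mismatch model; the 20% relative standard deviation is an illustrative assumption rather than the empirically measured value.

import jax
import jax.numpy as jnp

def apply_mismatch(params, key, rel_std=0.2):
    # Draw new parameter values around the nominal ones, as done every n epochs
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    noisy = [p * (1.0 + rel_std * jax.random.normal(k, jnp.shape(p)))
             for p, k in zip(leaves, keys)]
    return jax.tree_util.tree_unflatten(treedef, noisy)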


Figure 4. Mismatch Simulation. Nominal values for two parameters (orange) are applied to the network, following which DynapSim is used to simulate the parameter mismatch that would be experienced when deployed to a DYNAP-SE2 device (blue distributions).


2.1.6. Optimization

The optimization objective is highly dependent on the task and can be customized according to the requirements. In this particular task, the goal is to increase the firing rate of a specific neuron upon receiving a known frozen noise record. To achieve this, the mean square error (MSE) loss function is utilized.

The target signal is a uniform spike train that emits an event at every time step on the target output channel and no events on the other channel. The mean of the squared differences in rate between the target and the network output gives a scalar loss value to be used in error backpropagation.
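
A sketch of this rate-based MSE loss, under the assumption that the network output is a (T × 2) spike raster, is:

import jax.numpy as jnp

def mse_rate_loss(output_spikes, target_class):
    # output_spikes: (T, 2) spike raster; the target emits one event per time step
    # on the target channel and none on the other channel
    T = output_spikes.shape[0]
    target = jnp.zeros((T, 2)).at[:, target_class].set(1.0)
    rate_out = jnp.mean(output_spikes, axis=0)
    rate_target = jnp.mean(target, axis=0)
    return jnp.mean((rate_out - rate_target) ** 2)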

In this experiment, the Adam algorithm is used for optimization [20], and the training pipeline is similar to a conventional machine learning pipeline. Since the forward computation involves non-differentiable spike production functions, a surrogate gradient approximation replaces these in the backward pass [10, 12–14].
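
A sketch of one training step with Adam follows, assuming the network net from listing 1, the rate loss sketched above, Rockpool's .parameters() / .set_attributes() accessors and the optax implementation of Adam; the learning rate is a placeholder, and the exact Rockpool calls may differ from those shown here (the linked tutorial is authoritative).

import jax
import optax

optimizer = optax.adam(learning_rate=1e-6)     # small learning rate; the value is a placeholder
params = net.parameters()                      # trainable parameters (input + recurrent weights)
opt_state = optimizer.init(params)

def loss_fn(params, frozen_noise, target_class):
    model = net.set_attributes(params)         # assumed Rockpool call to apply candidate parameters
    output, _, _ = model(frozen_noise)         # (T, 2) output spike raster
    return mse_rate_loss(output, target_class)

@jax.jit
def train_step(params, opt_state, frozen_noise, target_class):
    loss, grads = jax.value_and_grad(loss_fn)(params, frozen_noise, target_class)
    updates, opt_state = optimizer.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss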

Figure 5 illustrates the decrease in MSE loss over the training process. During training, the MSE loss decreased from 0.5 to 0.46 over one million epochs. Despite the straightforward task, a small learning rate was necessary to provide stable learning in the face of the complex non-linear neuron model. Even this seemingly small reduction in loss value results in a significant improvement in behavior, allowing the network to classify two similar frozen noise samples. Figure 6(a) shows the response of the network to trained input samples.


Figure 5. Mean square error (MSE) loss over the course of training.


Figure 6. Response of the simulated trained network to trained and untrained input samples. (a) Response of the trained network to input classes 0 and 1; FRRs 4.1 and 8.4, respectively, for these samples. (b) Response of the trained network to an untrained Poisson noise sample. (c) Distribution of FRRs of the trained network over 1000 untrained noise samples. FRRs are close to 1.0, indicating the network has learned to reject these unknown inputs.


When the first noise pattern is presented to the network, neuron 0 (channel 0) exhibits high firing activity (164 Hz), while neuron 1 (channel 1) shows significantly lower activity (24 Hz), resulting in an FRR of 6.83. On the other hand, when the second noise pattern is presented, neuron 1 (channel 1) fires almost constantly (122 Hz), while neuron 0 (channel 0) remains quiet as intended (10 Hz), resulting in an FRR of 12.2. The clear distinction between the higher and lower firing rates demonstrates that the network is capable of distinguishing between the two input patterns.

To evaluate the network's recognition capabilities on unseen data, we used a test set of random noise samples to demonstrate that the network only recognizes the training patterns. We generated 1000 frozen noise input patterns with the same mean frequency and length as the target patterns used in training. The FRRs between the decision neurons were recorded to quantify the ability of the network to reject untrained input patterns. We expect the network to respond with FRRs close to one, indicating the input patterns are not similar to either class 0 or class 1. Figure 6(b) shows the network's response to a random sample, with an FRR close to 1. The distribution of FRR values over 1000 random noise samples is shown in figure 6(c).

During the 500 ms test runs, both the first and second neurons remain active, firing at similar rates. Based on these observations, it can be concluded that the network responds strongly only to the trained target inputs, and rejects the non-trained noise inputs.

2.2. Deployment to DYNAP-SE2

To deploy an SNN using Rockpool, the network is first defined in simulation and optimized using gradient-based or non-gradient-based methods. Rockpool then extracts a computational graph from the optimized network, containing all the parameters necessary for specifying the chip configuration. However, the computational graph does not include information about hardware resource allocation, so a mapping procedure is required to cluster the parameters and find a suitable hardware allocation. The parameters also need to be quantized, since the DYNAP-SE2 hardware does not support floating-point precision, and converted to bias values that configure the neuron and synapse parameters. Finally, the user connects to and interfaces with the chip to deploy the SNN.

Rockpool accomplishes this process in only a few lines of code, as demonstrated in listing 2.

Listing 2. Deploying an SNN to DYNAP-SE2.
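
A sketch of listing 2 is shown below. The helper names mapper, autoencoder_quantization and config_from_specification reflect our recollection of the Rockpool DYNAP-SE2 toolchain and are assumptions as written here; consult the Rockpool documentation for the exact signatures.

from rockpool.devices.dynapse import mapper, autoencoder_quantization, config_from_specification

graph = net.as_graph()                          # extract the computational graph
spec = mapper(graph)                            # cluster parameters, allocate hardware neuron IDs
spec.update(autoencoder_quantization(**spec))   # 4-bit weight quantization (base weights + bit-masks)
config = config_from_specification(**spec)      # translate to a DYNAP-SE2 bias-generator configuration
# `config` can then be applied to a connected DYNAP-SE2 board via the device interfacing utilities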

Our pipeline also supports 'reverse mapping', whereby a simulation SNN can be extracted from an existing hardware configuration, as shown in listing 3.

Listing 3. Extracting an SNN from a hardware configuration.
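
A sketch of listing 3 follows; the function name dynapsim_net_from_config is our assumption for the reverse-mapping helper and may differ in the released package.

from rockpool.devices.dynapse import dynapsim_net_from_config

# Rebuild a simulated DynapSim network from an existing hardware configuration
net_from_hw = dynapsim_net_from_config(config)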

The sections below explain the details of these steps.

2.2.1. Computational graph

A computational graph in Rockpool represents the flow of data through a neural network. In Rockpool, the as_graph() method extracts a computational graph from a full SNN model. This graph captures all the computationally significant parameters of the network, as well as the network structure. The graph representation enables manipulation of the SNN architecture, facilitating mapping the network parameters to various hardware architectures, and permitting conversion between neuron models. For instance, a trained LIF network can be transformed into an equivalent DynapSim network, with similar behaviour.

2.2.2. Mapping

The mapper() functionality converts a computational graph from an arbitrary SNN into a DYNAP-SE2 HDK hardware specification, regardless of whether the network was originally a DynapSim network. mapper() clusters the parameters into groups and determines the hardware IDs of neurons. The mapping process is described in detail below. For an alternative approach to mapping SNNs to DYNAP-SE hardware, see [21].

Figure 7 shows the weight parameters and information flow in an example SNN with both feed-forward and recurrent components.


Figure 7. Weight matrix parameters and information flow through an example SNN. Input weights $W_\textrm{in}$ are shown in blue. Blocks of recurrent weights $W_{n(\textrm{rec})}$ are hatched. Feed-forward hidden weights Wn are indicated in orange. Blocks of LIF spiking neurons are indicated in black.


The input weight matrix, $W_\textrm{in}$, applies a linear transformation to the external input and delivers it to the hardware neurons. Recurrent weight matrices, $W_\mathrm{1(rec)}$, $W_\mathrm{3(rec)}$, and $W_\mathrm{5(rec)}$, establish the connection weights between hardware neurons, while feed-forward weight matrices, W2 and W4, connect different groups of neurons to each other. The mapper produces a single equivalent recurrent weight matrix that collects all feed-forward neurons in a single pseudo-recurrent representation.

Figure 8 shows the merged input and recurrent weight matrices corresponding to the network in figure 7.


Figure 8. Merged weight matrices derived from the network in figure 7. The blocks of weights in figure 7 are indicated here with their corresponding colours and labels.


The input weight matrix $W_\textrm{in}$ connects virtual input neurons to hardware neurons on DYNAP-SE2. All other weight matrices in figure 7 are merged into one large recurrent weight matrix. The input neurons are assigned tags (virtual IDs) from the set of virtual input tags ('virtual tags'), while the hardware neurons are assigned tags (hardware IDs) from the list of available hardware neurons ('actual tags').
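
To make the merging concrete, the following sketch assembles feed-forward and recurrent blocks into a single pseudo-recurrent matrix in the spirit of figures 7 and 8; the group sizes and random weights are illustrative assumptions, not the mapper's actual implementation.

import numpy as np

n1, n2, n3 = 4, 3, 2                      # three groups of hardware neurons (sizes assumed)
w1_rec = np.random.randn(n1, n1)          # W1(rec): recurrent weights within group 1
w2     = np.random.randn(n1, n2)          # W2: feed-forward weights, group 1 -> group 2
w3_rec = np.random.randn(n2, n2)          # W3(rec)
w4     = np.random.randn(n2, n3)          # W4: feed-forward weights, group 2 -> group 3
w5_rec = np.random.randn(n3, n3)          # W5(rec)

n = n1 + n2 + n3
w_rec = np.zeros((n, n))                  # single merged pseudo-recurrent matrix
w_rec[:n1, :n1]               = w1_rec
w_rec[:n1, n1:n1 + n2]        = w2        # feed-forward blocks sit off the diagonal
w_rec[n1:n1 + n2, n1:n1 + n2] = w3_rec
w_rec[n1:n1 + n2, n1 + n2:]   = w4
w_rec[n1 + n2:, n1 + n2:]     = w5_rec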

Once this is complete, the mapper() reduces an SNN down to three connected graph modules: one DynapseNeurons object holding the current parameter values of the hardware neurons, one LinearWeights object holding the input weights from the external connections to the hardware neurons, and one LinearWeights object holding the recurrent weights between the hardware neurons.

2.2.3. Quantization

During simulation, the weight matrices in the layers can take any floating-point value, but when deploying to DYNAP-SE2 hardware, each connection is restricted to a 4-bit connection-specific weight setting. To convert weight matrices to a device configuration, parameter quantization is therefore necessary. The quantization process for DYNAP-SE2 involves two steps: obtaining four base weight parameters that span the weight representation space; and storing connection-specific 4-bit binary weight masks in digital memory cells. The goal of quantization is to find a set of 'base weight' parameter values and a binary bit-mask matrix that together reconstruct a floating-point weight matrix with minimal deviation. To accomplish this, an auto-encoder structure, a popular unsupervised machine learning method, is used: the intermediate code representation encodes the base weight currents, and the decoder weight matrix provides the binary bit-masks.

In the simulated network, weight values can be positive or negative, representing the synapse's excitatory or inhibitory behavior. The sign of each weight determines the synapse type for that connection: inhibitory GABA synapses for negative values and excitatory AMPA synapses for positive values. Figure 9 shows the weight quantization procedure.
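
The reconstruction step depicted in figure 9 can be illustrated as follows; the base weight currents, matrix size and random masks are placeholders, not values produced by the Rockpool quantizer.

import numpy as np

base_weights = np.array([1e-12, 2e-12, 4e-12, 8e-12])   # four base weight currents [A] (placeholders)
bit_mask = np.random.randint(0, 16, size=(2, 2))        # 4-bit code stored per connection
sign = np.sign(np.random.randn(2, 2))                   # +1 -> excitatory AMPA, -1 -> inhibitory GABA

bits = (bit_mask[..., None] >> np.arange(4)) & 1        # expand each 4-bit code into its bits
w_reconstructed = sign * (bits * base_weights).sum(axis=-1)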


Figure 9. Process of weight quantization. The floating-point weight matrix (top) is converted into a quantized representation, consisting of a bit-mask containing the number of quantal connections made between two neurons as well as connection type (bottom; left) and a set of 'base weight' parameters (bottom; middle) indicating the strength of each quantal connection. The reconstructed weight matrix (bottom; right) will be similar to, but not identical to the original matrix. For example, weak connections may be pruned (blue highlights).


Training of the unsupervised auto-encoder learns a hardware-compatible configuration that replicates the target weight matrix with minimal deviation. The MSE loss metric between the target and reconstructed weight matrices is optimised during the quantization process.

2.2.4. Deployment

The behavior of a neuron and synapse is characterized by several parameters, including time constants that determine leakage rates and gain ratios that control the amplitude of spike-dependent jumps. While in simulation these parameters can be adjusted mathematically to modify the behaviour of neurons, implementing this parameterisation in VLSI circuits is more complicated. Silicon-based implementations of neurons and synapses rely on adjusting bias voltages and currents. Deploying an SNN application to a mixed-signal device therefore entails translating the simulation parameters and behavioral dynamics of a computational neuron model into the parameters and bias voltages of the corresponding neuron circuits. This involves finding a digital bias generator setting that accurately realizes the required bias current values in amperes, using empirical lookup tables.

Figure 10 illustrates what parameter translation entails within this reference frame. Mapping from high-level parameters to a hardware configuration involves two major steps. The first step is to identify a supporting current value in amperes, given the parameter value. For instance, to ensure τ = 1 ms for the AMPA gate, $I_\mathrm{leak}$ needs to be 800 pA according to our theoretical expectations, using equation (2):


Figure 10. Translation between high-level behavioural parameters and circuit bias values. The ideal behaviour of a neuron (e.g. a time constant; left) corresponds to setting an accurate analog bias value to control the behaviour of a circuit (right).


$\tau = \frac{C_\mathrm{syn}\, U_T}{\kappa\, I_\mathrm{leak}}$ (2)

Here, $C_\mathrm{syn}$ represents the synaptic capacitance, $\kappa$ denotes the mean subthreshold factor (n-type, p-type), and UT represents the thermal voltage, which is approximately 25 mV at room temperature. It should be noted, however, that to maintain the simplicity and efficiency of the simulation, the effect of temperature variation due to circuit power dissipation has not been considered.
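
As a worked sketch of this first step using equation (2), with capacitance and κ values assumed purely for illustration (chosen so that τ = 1 ms reproduces the 800 pA figure quoted above):

C_SYN = 22.4e-12    # assumed AMPA synaptic capacitance [F]
KAPPA = 0.7         # assumed mean subthreshold slope factor
U_T   = 25e-3       # thermal voltage at room temperature [V]

def tau_to_leak_current(tau):
    # Equation (2) rearranged for the leakage current
    return (C_SYN * U_T) / (KAPPA * tau)

print(tau_to_leak_current(1e-3))   # -> 8.0e-10 A = 800 pA with these assumed constants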

The second step involves identifying a suitable bias generator configuration which yields the exact current value needed to set the high-level parameter. To do this, we used empirical recordings of bias current responses, measured as a function of the digital bias generator configuration, as a guide in the digital configuration search.

The process explained here applies to all configuration parameters, although the first step differs between parameter types. For weight matrices, the weight quantization process returns the base weight currents directly, and amplifier gains are subsequently computed relative to the leakage currents according to predefined ratios. For a more detailed analysis of the conversion methodology, please refer to [15].

3. Results

We used Rockpool and DynapSim to train an SNN as described above, then used the mapping, quantization and deployment facilities of Rockpool to deploy the trained SNN to a DYNAP-SE2 device, configuring the hardware parameters on the chip. We converted the frozen noise patterns to real-time AER sequences for injecting into a DYNAP-SE2 device. Each event in the noise pattern is encapsulated with its time and address and sent to the development kit, where an on-board FPGA circuit converts the AER events to digital pulses that stimulate the synaptic input gates of the neurons [1]. The analog neurons on the DYNAP-SE2 then process the inputs and produce output events. Whenever a neuron fires, digital circuits on the FPGA capture the event's timestamp and source address and encapsulate it as an AER event, which is temporarily stored in buffers implemented inside the FPGA. The output of the hardware evaluation of an input is recorded as AER event sequences.

Figure 11 demonstrates that the combined effects of quantization and device mismatch do not result in loss of performance. The hidden temporal information that the network has learned is still present, and the network is able to differentiate between the training samples.


Figure 11. Response of the trained network deployed to DYNAP-SE2. (a) Frozen noise for input class 0 (blue) and response of HW neurons (red). Arrow: target class. (b) Response to class 1 input.


The FRR described in the Response Analysis section under Methods is utilized to assess the neurons' ability to distinguish frozen noise patterns. A high FRR (>1) indicates good discrimination between the two trained frozen noise input samples. Table 1 displays the firing responses of the simulated, quantized, and hardware networks when presented with trained frozen noise 0 (FN0), trained frozen noise 1 (FN1), and the un-trained random noise TEST samples.

Table 1. Output firing rates and FRRs for target and test input samples. 'Simulated' results are from PC-based simulation of the trained DynapSim network in Rockpool. 'Quantized' results are from PC-based simulation of the quantized model in Rockpool. 'Hardware' results are obtained from running inference of the model deployed to DYNAP-SE2 hardware. 1000 samples were used for the Simulated and Quantized results. 10 samples were used for inference on hardware. FN: Frozen Noise (trained target input sample); TEST: random Poisson untrained test samples; N0: output neuron for class 0; N1: output neuron for class 1.

                Simulated                   Quantized                   Hardware
Frozen Noise    N0 (Hz)  N1 (Hz)  FRR       N0 (Hz)  N1 (Hz)  FRR       N0 (Hz)  N1 (Hz)  FRR
FN class 0      164      14       11.7      136      58       2.3       18       2        9.0
FN class 1      24       122      5.1       58       144      2.5       0        36       >100
TEST (mean)     138      123      1.1       143      123      1.2       17       14       1.9

We presented 1000 randomly generated independent test samples for inference in the simulation, taking advantage of the flexibility of the simulation environment. In contrast, for hardware testing, we presented 10 random independent Poisson noise samples to DYNAP-SE2. The limited number of hardware tests was a deliberate decision, influenced by the intricate nature of the hardware setup: chip configuration requires manual intervention, and in practice took at least 5 min for each iteration.

In all cases, mismatch was either simulated (for 'Simulated' and 'Quantized' results) or physically present on the DYNAP-SE2 hardware device. Untrained test samples produced high output firing rates in general (>100 Hz in simulation; TEST in table 1), but with low FRRs close to 1. The worst-case (highest) FRR for an untrained test sample, indicating false-positive discrimination, was 1.5 in simulation, 1.6 for the quantized network, and 3.2 for inference on the DYNAP-SE2 hardware. Trained target frozen noise input samples produced similar maximum firing rates to the untrained test samples, but with much higher FRRs.

The experimental results indicate that quantization and mismatch cause information loss when converting an optimized network to a hardware configuration. However, despite this loss of information and device mismatch, the decision mechanism remains functional: the hardware-deployed model successfully distinguishes the trained input samples, with high FRRs.

3.1. Training speed

Gradient-based training of spiking neural networks is known to take a long time, due to the complexity of the dynamical equations that spiking neurons solve over time, which involve many floating-point operations. Additionally, backpropagation through time implies additional memory overhead compared with backpropagation in classical ANN optimization problems. However, recent advancements in machine learning tools present opportunities for performance improvements.

DynapSim utilizes JAX, a high-performance mathematical library which includes automatic differentiation. JAX's JIT compilation support for functional programming significantly reduces execution time in the optimization loop. To illustrate the benefits of JIT, we executed the training script on two different machines, and the performance results are presented in table 2.

Table 2. Training time comparison. The training process was executed identically using the same training code on two machines. We compared the training speed when using non-accelerated JAX, and when using JAX-JIT compilation to the CPU on each machine.

Attribute            Machine 1              Machine 2
CPU                  8-core Apple M1 Pro    Intel Core i7-7500
RAM                  32 GB                  16 GB
OS                   macOS 12.4             Ubuntu 20.04
Epoch/s (JAX)        0.7                    0.4
Epoch/s (JAX-JIT)    2600                   1350
Duration (JAX)       15 days                28 days
Duration (JAX-JIT)   6.5 min                12.5 min
Speedup              3714×                  3375×

Using JAX-JIT reduced the computation time by more than 3000×, making it possible to optimize SNN structures employing complex neuron models without the need for giant computer clusters or waiting for weeks to see the results. Rockpool / DynapSim can also use the JIT facilities of JAX to target GPUs and TPUs, enabling scalable use of large computational resources, when available, for efficient training of SNNs.

4. Discussion

We demonstrated a new approach and toolchain for gradient-based training and automated deployment of SNN applications to mixed-signal SNN devices such as DYNAP-SE2. The training pipeline is shown to run one million epochs in minutes instead of weeks, by exploiting the JIT compilation features of JAX. DynapSim is a significant step towards building commercial SNN applications for mixed-signal neuromorphic processors.

The deployment strategy offers an unsupervised method for weight quantization, removing the need for manual calibration and tuning of hardware bias parameters. The resulting quantized network parameters are automatically translated to a hardware configuration. Although the task introduced here is relatively simple, the work demonstrates a novel methodology for application development targeting mixed-signal SNN processors. The approach, metrics, and evaluation strategies can readily be applied to more complex tasks.

Results indicate that the optimized network is robust to the effects of quantization and device mismatch, which are common challenges in hardware implementation. This suggests that spiking neural networks can be used robustly in real-world applications where hardware constraints and variability are significant factors. Still, further studies are needed to investigate the network's performance under different quantization and device mismatch scenarios and to generalize the findings to different spiking neural network architectures and applications.

Rockpool simplifies the modeling and deployment process for the neuromorphic community, addressing a key obstacle that has limited the accessibility of mixed-signal neuromorphic processors to only high-end academic and industrial research. Our approach paves the way for building commercial applications using mixed-signal neuromorphic technologies.

Acknowledgments

The authors thank Dmitrii Zendrikov and Adrian Whatley for their comments and feedback while developing the simulation tools. This work was funded in part by the ECSEL Joint Undertaking under grant agreements 826655 'TEMPO' and 876925 'ANDANTE'; by the KDT Joint Undertaking under grant agreement 101097300 'EdgeAI'; by Innosuisse and by the Swiss State Secretariat for Education, Research and Innovation (SERI).

Data availability statement

No new data were created or analysed in this study.

Appendix

Table 3. Bias parameters: a supplementary table listing the most significant parameters and their impact on the SNN simulation.

Parameter        Corresponding current        Description
SOAD_TAU_P $I_{\tau_\mathrm{ahp}}$ AHP block time constant $\tau_\mathrm{ahp}$
DEAM_ETAU_P $I_{\tau_\mathrm{ampa}}$ Excitatory AMPA synapse time constant $\tau_\mathrm{ampa}$
DEGA_ITAU_P $I_{\tau_\mathrm{gaba}}$ Inhibitory GABA synapse time constant $\tau_\mathrm{gaba}$
DENM_ETAU_P $I_{\tau_\mathrm{nmda}}$ Excitatory NMDA synapse time constant $\tau_\mathrm{nmda}$
DESC_ITAU_P $I_{\tau_\mathrm{shunt}}$ Inhibitory SHUNT synapse time constant $\tau_\mathrm{shunt}$
SOIF_LEAK_N $I_{\tau_\mathrm{mem}}$ Neuron membrane time constant $\tau_\mathrm{mem}$
SOAD_PWTAU_N $I_{pulse\_\mathrm{ahp}}$ AHP block pulse width $t_{pulse\_\mathrm{ahp}}$
SYPD_EXT_N $I_\mathrm{pulse}$ Any synaptic input pulse width $t_\mathrm{pulse}$
SOIF_REFR_N $I_\mathrm{ref}$ Neuron membrane refractory period $t_\mathrm{ref}$
SOAD_GAIN_P $I_\mathrm{gain_{ahp}}$ AHP block gain
DEAM_EGAIN_P $I_\mathrm{gain_{ampa}}$ Excitatory AMPA synapse gain
DEGA_IGAIN_P $I_\mathrm{gain_{gaba}}$ Inhibitory GABA synapse gain
DENM_EGAIN_P $I_\mathrm{gain_{nmda}}$ Excitatory NMDA synapse gain
DESC_IGAIN_P $I_\mathrm{gain_{shunt}}$ Inhibitory SHUNT synapse gain
SOIF_GAIN_N $I_\mathrm{gain_{mem}}$ Neuron membrane gain
SYAM_W0_P $I_{w_{0}}$ Weight bit 0 strength
SYAM_W1_P $I_{w_{1}}$ Weight bit 1 strength
SYAM_W2_P $I_{w_{2}}$ Weight bit 2 strength
SYAM_W3_P $I_{w_{3}}$ Weight bit 3 strength
SOAD_W_N $I_{w_\mathrm{ahp}}$ AHP block weight current
SOIF_DC_P $I_\mathrm{dc}$ Constant DC current injected as input
DENM_NMREV_N $I_\mathrm{if_{nmda}}$ NMDA gate soft cut-off current
SOIF_SPKTHR_P $I_\mathrm{spkthr}$ Spiking threshold current