DYNAP-SE2: a scalable multi-core dynamic neuromorphic asynchronous spiking neural network processor

With the remarkable progress that technology has made, the need for processing data near the sensors at the edge has increased dramatically. The electronic systems used in these applications must process data continuously, in real-time, and extract relevant information using the smallest possible energy budgets. A promising approach for implementing always-on processing of sensory signals that supports on-demand, sparse, and edge-computing is to take inspiration from biological nervous systems. Following this approach, we present a brain-inspired platform for prototyping real-time event-based spiking neural networks. The proposed system supports the direct emulation of dynamic and realistic neural processing phenomena such as short-term plasticity, NMDA gating, AMPA diffusion, homeostasis, spike frequency adaptation, conductance-based dendritic compartments and spike transmission delays. The analog circuits that implement such primitives are paired with low-latency asynchronous digital circuits for routing and mapping events. This asynchronous infrastructure enables the definition of different network architectures, and provides direct event-based interfaces to convert and encode data from event-based and continuous-signal sensors. Here we describe the overall system architecture, characterize the mixed-signal analog-digital circuits that emulate neural dynamics, demonstrate their features with experimental measurements, and present a low- and high-level software ecosystem that can be used for configuring the system. The flexibility to emulate different biologically plausible neural networks, and the chip's ability to monitor both population and single-neuron signals in real-time, make it possible to develop and validate complex models of neural processing for both basic research and edge-computing applications.


Introduction
As technology has progressed, the need for processing more sensory data at the edge has increased dramatically. In particular, an increasing number of applications are expected to process data near the sensors, without resorting to remote computing servers. For these types of applications it is of prime importance to minimize power consumption and latency, while maintaining robustness and adaptability to changing conditions. The processors used in these applications therefore need to process the data being measured by the sensors continuously, in real-time, and to extract relevant information using the smallest possible energy budgets. A promising approach for implementing always-on processing of sensory signals that supports on-demand, sparse, and edge-intelligence computation is that of using event-based Spiking Neural Networks (SNNs) [1][2][3][4][5][6][7]. The event-based representation has been shown to be particularly well suited to transmitting analog signals across noisy channels, while maximizing robustness to noise and minimizing bandwidth requirements and power consumption [1,8,9]. Furthermore, by encoding only the changes in the signals, this representation is optimally suited for sensory signals that change sparsely in time, producing data only when necessary [2,4]. The computational paradigm that best exploits the event-based representation is that of SNNs.
As one of the largest sources of energy consumption in electronic processing systems is data movement [10,11], the best way to minimize power consumption in event-based SNNs is to implement them as massively-parallel in-memory computing architectures that process the data on the fly, as it is being sensed, without having to store it and retrieve it. It is therefore important to match the rate of the data arriving in input to the processing rate and the time constants of the synapses and neurons in the SNN. Neuron and synapse circuits can be configured to process natural signals such as human voice, gestures, or bio-signals, by setting their time constants to tens or hundreds of milliseconds (and significantly reducing their processing speed). This can improve the information retention and processing ability of feed-forward SNNs. However, processing of signals that contain very long and multiple timescales using this approach requires resorting to recurrent SNNs (RNNs) [12][13][14]. These types of networks provide a valuable algorithmic foundation for adaptive and efficient processing of continuous sensory signals, as they can be configured to exhibit a wide range of dynamics that are fundamental in lowering the amount of storage resources required to process, recognize, and generate long temporal sequences and patterns.

Figure 1: Photo of the DYNAP-SE2 chip, which has an area of 98 mm2, manufactured in 180 nm CMOS technology as a cost-effective prototyping platform.
Conventional neural network accelerators and digital implementations of SNNs [15,16] can in principle be used to design and train both feed-forward and recurrent neural networks. However, their memory storage and data movement requirements increase their power budget significantly and negate their advantages compared to using standard computing architectures [17]. The original neuromorphic engineering approach proposed in [18,19] aims to solve the above challenges by using analog circuits that operate in weak-inversion (subthreshold) and in physical time to implement neural dynamics for solving sensory processing tasks, in a data-driven manner. In this approach each neuron and synapse computational element is implemented using a dedicated physical circuit, without resorting to time-multiplexing of shared computing resources. Computation is therefore massively parallel and distributed, and takes place only if the synapse/neuron is driven by input events. For interactive real-world data processing, the event-based mixed-signal approach is an optimal match: it allows carrying out physical-time sensory processing with low-power circuits, and the implementation of artificial intelligence computing primitives for solving extreme edge-computing applications [12,19].
In this paper we present a mixed-signal neuromorphic processor that follows this approach. It directly emulates the dynamics of biological neurons and synapses using analog integrated circuits for computation, and asynchronous digital circuits for transmitting the events (spikes) produced by the neurons to destination synapses or to the output pads. The processor features a clock-free asynchronous digital hierarchical routing scheme which runs in native real-time and ensures low latency [20]. The processor we present is denoted as the DYnamic Neuromorphic Asynchronous Processor-ScalablE 2 (DYNAP-SE2). This chip significantly extends the features of the previous-generation DYNAP-SE [20] at the synapse and neuron circuit level, at the network level, and at the asynchronous routing fabric level. We show here how the DYNAP-SE2 offers rich neuronal dynamics across different timescales to support a wide spectrum of biologically plausible recurrent networks. We present the overall architecture and describe in detail the individual circuits, providing experimental results measured from the chip to validate the theory. To enable near-sensor processing, the DYNAP-SE2 also integrates an on-chip analog front-end (AFE) with low-noise amplifiers, band-pass filters and asynchronous delta modulators for converting input waveforms into streams of address-events [21]. Similarly, the DYNAP-SE2 includes a direct 2D sensor event pre-processor [22] that can cut, scale and arbitrarily map 2D event stimuli from a Dynamic Vision Sensor (DVS) [23].
The structure of this paper is as follows. Section 2 presents an overview of the general architecture and available resources of the chip. Section 3 reviews the common building blocks that are crucial to understanding and using the chip. Section 4 describes the core analog neural circuits with application examples and real measurements. Section 5 elaborates the routing scheme and methods for building large-scale neural networks. Section 6 briefly describes the interfaces the chip presents to the outside world, and Section 7 describes the software system that supports the usability of the chip. As the analog front-end is independent of the neuron cores and event processing, for more information regarding its circuit design and applications see [21].

System architecture
Computation is centered on the 1024 analog integrate-and-fire neurons, arranged in a 2 × 2 array of cores, each containing a 16 × 16 grid of neurons. Each neuron has 64 synapses and four dendritic branches. The only way to send information to the neurons, and for the neurons to send information out, is through digital spikes. The routing scheme is elaborated in Section 5. As opposed to many computational models, the neurons do not receive analog current injection directly, and the membrane potential is also not accessible. These design choices were made for scalability reasons, because there is no easy way to access thousands of analog values at the same time, while digitized spikes can easily be routed using time-multiplexing [24]. Thus, in order to provide analog input to the network, a neuromorphic sensor [25] (such as a DVS [26] or the AFE [21]) that encodes a signal into spikes is needed, and the computation and learning algorithms should be completely spike-based.
As summarized in Fig. 2, each neuron circuit is composed of synaptic, dendritic and somatic compartments with many conditional blocks for dynamic features, which are constructed in a highly modular way, meaning that all of them can be bypassed with digital latches when not needed. The default state of these latches after reset is always disabled, so the users do not have to disable them by setting parameters to extreme values as in the previous generation.
In order to better monitor and debug the network, the user can select one neuron per core to monitor; its membrane potential is directly buffered to an external pin, and multiple other intermediate analog current signals are converted into pulse-frequency-modulated signals using spiking analog-to-digital converters (sADCs). In addition, a delay pulse internal to a couple of specific synapses and the homeostasis direction of the monitored neuron are also buffered to external pins. Sections 6.3 and 6.4 include more details about the monitoring.

Specifications
The specifications of DYNAP-SE2 are summarized in Table 1.

Differential pair integrator (DPI)
The DPI is a current-mode low-pass filter that enables a wide range of dynamic features in neuromorphic analog IC design [19]. It has many advantages, such as small area, high power efficiency and good controllability, and is thus used in silicon synapse and neuron designs, as well as in longer-time-constant adaptation [32] and homeostasis circuits [33]. When used as a linear integrator, it can exploit the superposition principle and receive high-frequency spike trains to produce an output that represents the sum of many synapses receiving low-frequency inputs.

Figure 2 (caption fragment): ... [27] and exponential integrate-and-fire model [28], with conditional adaptation and 'calcium'-based homeostasis. When the neuron fires, the AER spike is sent to up to four chips.

Equations and typical operating regimes
The most general equation in current mode for the output I_out is

τ (dI_out/dt)(1 + I_gain/I_out) + I_out = (I_gain/I_tau) I_in,

where the time constant τ = C U_T/(κ I_tau). The circuit is designed in current mode, where I_x (x ∈ {tau, gain, out}) is the current flowing in the diode-connected transistor with voltage V_x of the corresponding type (for example I_out and V_out in the schematics). The non-linear equation can be simplified in three typical operating regimes:

(i) when I_out ≫ I_gain, it reduces to τ dI_out/dt + I_out = (I_gain/I_tau) I_in, which is a first-order linear system with input I_in and state variable I_m;
(ii) when I_out ≪ I_gain, the output current grows exponentially, which is a linear integration of inputs on the membrane capacitor;
(iii) when I_in = 0, it reduces to τ dI_out/dt = -I_out, which is an exponential decay for I_out and a linear ramp-down for V_out.
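To make regime (i) concrete, the following is a minimal numerical sketch (not chip code; all parameter values here are assumptions chosen only for illustration) of the DPI in its first-order linear regime, simulated with forward Euler:

```python
import numpy as np

# First-order linear DPI regime: tau * dI_out/dt + I_out = (I_gain/I_tau) * I_in,
# with tau = C * U_T / (kappa * I_tau). Parameter values are illustrative only.
def simulate_dpi(i_in, dt, C=1e-12, U_T=0.025, kappa=0.7,
                 I_tau=10e-12, I_gain=10e-12):
    tau = C * U_T / (kappa * I_tau)      # ~3.6 ms with these values
    gain = I_gain / I_tau                # DC gain of the filter
    i_out = np.zeros_like(i_in)
    for k in range(1, len(i_in)):
        di = (gain * i_in[k - 1] - i_out[k - 1]) / tau
        i_out[k] = i_out[k - 1] + dt * di
    return i_out

dt = 1e-6
t = np.arange(0.0, 0.05, dt)
i_in = np.full_like(t, 50e-12)           # 50 pA step input
i_out = simulate_dpi(i_in, dt)
# i_out settles toward (I_gain / I_tau) * I_in = 50 pA
```

With I_gain = I_tau the DC gain is one, so the output current settles to the input current with the exponential time course set by τ.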

Mirrored output
The output current can be flipped using a current mirror so that it flows in the same direction as the input, as shown in Fig. 5.
Figure 5: N- and P-type DPI with mirrored output. The new output I_out flows in the opposite direction to the original one in Fig. 4 but has the same magnitude.

Pulse extender
As the information in the network is carried exclusively by spikes, which are extremely short (sub-nanosecond) digital pulses, they would be largely inconsequential for the analog circuits; thus there must be a way to convert the spikes into analog pulses of longer duration. For instance, the input pre-synaptic spikes have to be converted into analog post-synaptic currents, and the neuron spikes have to trigger refractory periods and negative feedback mechanisms such as spike-frequency adaptation and homeostasis. This conversion is achieved with a class of pulse extender circuits.

Basic pulse extender
The simplest and lowest-power pulse extender circuit is shown in Fig. 6. The pulse width T_pulse is controlled by the discharging current I_pw in an inversely proportional manner: T_pulse ≈ C ΔV / I_pw, where ΔV is the voltage swing from V_dd down to the switching threshold.
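The inverse relation above can be sketched numerically (a toy model under the stated discharge approximation; the capacitance, supply and threshold values are assumptions, not chip values):

```python
# Pulse width as the time for I_pw to discharge the capacitor from V_dd
# down to the switching threshold: T_pulse ~= C * (V_dd - V_th) / I_pw.
# All numerical values below are invented for illustration.
def pulse_width(I_pw, C=2e-12, V_dd=1.8, V_th=0.45):
    return C * (V_dd - V_th) / I_pw

w1 = pulse_width(10e-12)   # 10 pA discharge current
w2 = pulse_width(5e-12)    # halving the current doubles the pulse width
```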

Delayed pulse extender
The pulse extender circuit in Fig. 6 charges the capacitor immediately to V_dd when the input event arrives, which makes the output pulse also immediate. If the charging current is also restricted with an analog parameter, the output pulse will be delayed with reference to the input [34]. The circuit is shown in Fig. 7. The delay time T_delay is controlled by the charging current I_delay, and the pulse width T_pulse by the discharging current I_pw, both in an inversely proportional manner (T_delay ∝ 1/I_delay, T_pulse ∝ 1/I_pw).

Figure 6: For the minimal pulse extender PX_MIN, with only one transistor and one capacitor, the voltage V_C on the capacitor is the output. This circuit is simple, but the output is not clean (dashed waveform) and consumes more power as it stays around V_dd/2 longer. For the low-power pulse extender PX_EFF, once V_C reaches the switching threshold around V_dd/4, positive feedback discharges the capacitor rapidly (solid line), so the output pulse is cleaner and consumes less power. The switching threshold is shifted down to ∼V_dd/4 by the unsymmetrical starved inverter, with the P-FET sized physically the same as the N-FET, resulting in a beneficial pull-up/pull-down drive-strength imbalance. With this, the capacitance can be significantly smaller while still achieving the same time constant.

Loss of information
Both pulse extension and delay mechanisms make each spike take longer. The important edge case is when another event arrives before the pulse of the previous event finishes. From the circuit and information-theoretic perspective, since the 'time left' information (for either delay or pulse width) is stored as the voltage on the capacitor, it is impossible to keep track of more than one of them with only one state variable. If two pulses overlap, one of them must be dropped. Because physical systems are causal, the second pulse cannot remove the already started one, but only overwrite the remaining part of it.

Figure 7: The C-element [37] (shown as ©) is an asynchronous digital circuit that changes its output to X when both inputs are equal to X. When the active-low event arrives, if there is no output pulse (1), the output of the C-element goes from 1 to 0, which starts the charging of the capacitor with current I_delay. When the voltage on the capacitor exceeds the threshold of the inverter, the output pulse becomes active (0) and positive feedback charges the capacitor to V_dd immediately. The output of the C-element then goes to 1, which starts the discharging of the capacitor with current I_pw. When the voltage on the capacitor drops below the threshold of the inverter, the output pulse finishes (1).
For the low-power pulse extender circuit, the capacitor will be recharged to V_dd immediately when the second event arrives, thus the pulse restarts. Mathematically, the output pulse is the union (logical OR) of the incoming pulses.
For the delayed pulse extender circuit, the charging can only start when the output pulse is inactive; otherwise the output of the C-element will remain at 1. If another input event arrives during the delay phase, or in the extreme case during the transition between delay and pulse phase, the output of the C-element will still be 0. In both cases, the output of the C-element does not change, meaning that the information is dropped. In other words, if the inter-spike interval is shorter than the delay, the second spike will be ignored.
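The event-dropping behaviour described above can be summarised in a small behavioural model (an abstraction, not the chip's circuit: a spike arriving before the previous delay-plus-pulse window has finished is simply ignored):

```python
# Behavioural sketch of the delayed pulse extender's single state variable:
# an event is accepted only if the previous delay + pulse window has ended.
def delayed_pulse_extender(spike_times, t_delay, t_pulse):
    accepted, dropped = [], []
    busy_until = float("-inf")
    for t in sorted(spike_times):
        if t >= busy_until:
            accepted.append(t)
            busy_until = t + t_delay + t_pulse   # circuit is busy until here
        else:
            dropped.append(t)                    # information is lost
    return accepted, dropped

acc, drop = delayed_pulse_extender([0.0, 0.5, 3.0, 3.1],
                                   t_delay=1.0, t_pulse=1.0)
# 0.5 falls in the first event's delay phase, 3.1 in the third event's window
```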

Event low pass filter
When a pulse extender is combined with a DPI as shown in Fig. 8, it serves as an event low-pass filter (LPF). The input x is a set of discrete events (treated as a sum of Dirac functions) and the output y is an analog current waveform.
Let the (active-low) input be x(t) = Σ_{i=1}^{N} δ(t − t_i), where the t_i are the spike times; let the pulse width of the pulse extender be T_pulse, the time constant and threshold parameters of the DPI be I_tau and I_gain with τ = C U_T/(κ I_tau), and the weight parameter be I_w. If t_{i+1} − t_i > T_pulse for all i = 1, ..., N and (I_tau/I_w) τ ≪ T_pulse ≪ τ, the combined circuit is a first-order low-pass filter with transfer function

H(s) = W / (1 + τ s), with effective weight W = (I_gain/I_tau) I_w T_pulse.

Similarly, for the delayed pulse extender with the extra delay parameter T_delay, if t_{i+1} − t_i > T_delay + T_pulse for all i and (I_tau/I_w) τ ≪ T_pulse ≪ τ, the combined circuit is a delayed first-order low-pass filter with transfer function

H(s) = W e^{−s T_delay} / (1 + τ s).

Under these conditions the system is linear, and the total output charge per input event is Q = ∫ h(t) dt = W. The output therefore depends only on the hyperparameters τ, W and T_delay (or, equivalently, Q and τ).
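A quick numerical check of the charge-per-event relation, using an idealised linear DPI model with forward Euler (a sketch; all parameter values are invented for illustration and do not represent chip biases):

```python
import numpy as np

# Each spike is extended to a current pulse of height I_w and width T_pulse,
# then low-pass filtered by a linear DPI. The total output charge should
# approach N * W with W = (I_gain / I_tau) * I_w * T_pulse.
dt = 1e-6
C, U_T, kappa = 1e-12, 0.025, 0.7
I_tau, I_gain, I_w = 10e-12, 10e-12, 100e-12
T_pulse = 1e-4
tau = C * U_T / (kappa * I_tau)               # ~3.6 ms

t = np.arange(0.0, 0.2, dt)
spike_times = [0.01, 0.05, 0.09]
i_in = np.zeros_like(t)
for ts in spike_times:                        # pulse-extend each spike
    i_in[(t >= ts) & (t < ts + T_pulse)] = I_w

i_out = np.zeros_like(t)
for k in range(1, len(t)):                    # forward-Euler DPI
    i_out[k] = i_out[k - 1] + dt * ((I_gain / I_tau) * i_in[k - 1]
                                    - i_out[k - 1]) / tau

W = (I_gain / I_tau) * I_w * T_pulse
total_charge = i_out.sum() * dt               # should be close to 3 * W
```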

Digital-to-analog converters
The parameters required to properly operate the analog circuits are generated by on-chip programmable digital-to-analog converters (DACs).
Because of the large scale of the neural network, i.e. 1024 neurons × (∼20 somatic parameters + ∼20 parameters for the four dendrites + 64 synapses per neuron × 14 synaptic parameters), if every neuron and every synapse had individually configurable parameters, there would be around one million parameters to set. As a trade-off, the neurons are divided into four cores of 256 neurons each, and most of the parameters are shared across all neurons and synapses within a core, implemented with a separate parameter-generator DAC for each core. A few important cases, such as the synaptic weights and delays, are instead configured individually (see Section 3.7).

Parameter generator
The current-based parameter generator used in this chip generates accurate analog currents over a very large dynamic range [38]. This parameter generator is enhanced with proportional-to-absolute-temperature (PTAT) and complementary-to-absolute-temperature (CTAT) current references for temperature stabilization of the currents. The general formula for any current parameter I_parameter is

I_parameter = k_parameter · I_coarse(n_coarse) · n_fine/256,    (12)

where n_coarse ∈ [0, 5] and n_fine ∈ [0, 255] are integers. k_parameter is a scaling factor which is roughly constant for all n_coarse and n_fine values, but a more precise non-ideality correction from simulations based on the transistor type and size is also available.
Estimates of the values of the 'base' currents I_coarse(n_coarse) are shown in Table 2. It is important to note that the error in these estimates increases with the coarse value, and, because of mismatch, different (n_coarse, n_fine) combinations that produce the same result according to Eq. (12) may give different results on actual hardware. In the case of very low currents, n_coarse = 0 always gives the highest accuracy. Therefore, it is recommended to always use lower n_coarse values when possible. Especially when n_fine = 0, the parameter generator outputs the dark current of the corresponding transistor, which can be very different for different n_coarse values. As in other implementations [38], a small-scale non-monotonicity, caused by a large transistor stack moving out of saturation in the current branch, also exists in this implementation and can be corrected via calibration with a pre-recorded look-up table.
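As a sketch of how Eq. (12) maps the two digital codes to a current (the base-current values below are placeholders, not the actual Table 2 entries, and k is taken as 1):

```python
# Illustrative model of Eq. (12): a base current selected by n_coarse,
# scaled linearly by the 8-bit fine value. Base currents are invented.
def parameter_current(n_coarse, n_fine, k=1.0,
                      base_currents=(60e-12, 460e-12, 3.8e-9,
                                     30e-9, 240e-9, 1.9e-6)):
    assert 0 <= n_coarse <= 5 and 0 <= n_fine <= 255
    return k * base_currents[n_coarse] * n_fine / 256

i = parameter_current(0, 128)   # half of the lowest base current
```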
For very small currents the DAC requires a settling time for the parameters to reach their steady-state programmed values, which can take up to several seconds.
For circuit parameters that are in the voltage domain instead of the current one, the voltage V_parameter at the gate of the diode-connected transistor of the appropriate type that conducts the parameter current in sub-threshold is given by

V_parameter ≈ (U_T/κ) ln(I_parameter/I_0),

where I_0 is the transistor's dark current.

Flexible DAC
For the 4-bit synaptic weight and the 2-bit delay, a customized DAC is used in order to achieve maximum flexibility. The circuit (in P-type) is shown in Fig. 10. The base currents I_b0 through I_bn come from the parameter generator, and the digital configurations x_0 through x_n are stored in latches. The output current follows

I_out = Σ_{i=0}^{n} x_i I_bi.    (14)

If an always-on minimal current is wanted, the P-FET connected to x_0 can be bypassed, which implies x_0 ≡ 1 in Eq. (14).
If we set I_bi = 2^i · I_b0, the flexible DAC can be used as a normal (n + 1)-bit binary DAC. If a higher dynamic range is needed, the different I_bi's can also be chosen very differently, but the effective bit resolution will be lower as a trade-off.
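The sum in Eq. (14) and the binary-weighted special case can be sketched as follows (current values are invented for illustration):

```python
# Flexible DAC per Eq. (14): I_out = sum(x_i * I_bi) over the enabled bits.
def flexible_dac(bits, base_currents):
    return sum(x * ib for x, ib in zip(bits, base_currents))

I_b0 = 10e-12
binary = [I_b0 * 2**i for i in range(4)]     # binary weighting: normal 4-bit DAC
i_out = flexible_dac([1, 0, 1, 1], binary)   # bits LSB-first: code 13 -> 130 pA
```

Choosing non-binary base currents instead trades bit resolution for a wider spread of programmable values, as noted above.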

Somatic compartment
The center of the silicon neuron is the integrate-and-fire soma circuit. Based on the desired 'firing' mechanism, there are two switchable somatic models on the chip:
• Thresholded: the neuron fires when the membrane potential reaches a threshold;
• Exponential: the neuron receives positive feedback that drives spike output [28].
In addition, there are a conditional spike-frequency adaptation circuit [32] and homeostasis circuits [33] that can be activated in either model. The overall architecture of the somatic circuit is shown in Fig. 11.

Somatic DPI -information integration
The integration of information in the soma is achieved with the N-type DPI circuit introduced in Section 3.1. There are two basic parameters to control the somatic DPI: the leak (SOIF_LEAK parameter, or the I_tau of the DPI) and the gain (SOIF_GAIN parameter, or the I_gain of the DPI). The neuron receives the post-synaptic current I_dendritic from the three dendritic branches AMPA, NMDA and GABA_B, and the somatic current I_somatic from the shunting inhibitory dendrite GABA_A. The output is the membrane potential, I_mem in current mode or V_mem in voltage mode.
The most commonly used conditional function is the constant DC current injection (enabled using the latch SO_DC and configured with the SOIF_DC parameter), which goes into the input branch of the DPI, together with the dendritic input I_dendritic. The DC input can be used to set a proper resting potential and even drive a constant firing rate. One can also turn off any specific neuron using the latch SOIF_KILL.
Mathematically, in terms of the DPI of Section 3.1, the input is I_in = I_dendritic + I_DC, with I_tau set by SOIF_LEAK and I_gain by SOIF_GAIN. Figure 12 shows the membrane voltage V_mem waveform recorded from a neuron on the chip for the two somatic models with the same DC input but different gains.

Biologically plausible time constant
The somatic DPI employs a 7.72 pF capacitance to achieve a biologically plausible time constant. When the leak of the neuron is set to its minimum, which is the leakage current of the transistor, the slew rate of V_mem can achieve 108 ± 12 mV/s (measured across one core). Thus a single neuron can hold a 'memory' for up to about five seconds, enabling processing of signals on a biologically plausible timescale.
The biologically plausible time constant is a trade-off, and the reason for a comparatively higher power consumption than in other systems. Measurement results and a discussion of the power consumption are presented in more detail in subsection 4.1.4.

Figure 12: With a higher gain value (right), the integration phase is linear (exponential I_mem in Eq. (2)). The top two plots show the thresholded model with the firing threshold set to around 0.5 V. The bottom two plots show the exponential model, where V_mem has an exponentially increasing shape that leads to the neuron firing. While we show data for the voltage across the output capacitor of the circuits, the neuron uses the current resulting from the voltage across the capacitor. This is given by the exponential of the plotted voltage and it is affected by the relevant transistor variables (e.g., U_T, κ) [39].
4.1.3. Refractory period - maximum firing rate

After the spike is generated, the neuron enters a state in which integration is blocked: this is the (absolute) refractory period in biology. It is an important computational feature, as it sets an upper limit on the firing rate and introduces a non-linearity. The refractory period circuit is shown in Fig. 13. The length of the refractory period is controlled by the discharging current I_refractory (SOIF_REFR parameter). Based on Eq. (5), the maximum firing rate r_max, or equivalently the inverse of the refractory period T_refractory, is proportional to this current: r_max = 1/T_refractory ∝ I_refractory. The capacitance of the refractory-period pulse extender is about 2 pF. The longest refractory period is achieved when the parameter is set to its minimum value. Measurements across one core show that the maximum refractory period is 1.58 ± 0.10 s for the thresholded model and 0.748 ± 0.045 s for the exponential model. The difference arises because the pulse extender circuits differ between the two models: the thresholded model uses the low-power pulse extender, whereas the exponential model uses the simplified minimal pulse extender without positive feedback. The latter also has the problem that multiple events may be generated for one neuron spike, which makes the exponential model unsuitable for building a complex network.

Figure 13: When the neuron emits a spike (spike = 0), req = 0 is sent to the encoder, which returns ack = 0. When both req = ack = 0, the pulse extender is triggered, which discharges the neuron until the spike disappears (spike = req = 1). The encoder then releases ack (ack = 1) and the refractory period starts by discharging at a rate determined by I_refractory, during which the neuron's membrane potential is clamped to ground.
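The proportionality between the refractory current and the maximum rate can be sketched as a capacitor-discharge calculation (the ~2 pF capacitance is from the text; the voltage swing is an assumed value):

```python
# r_max = 1 / T_refractory, with T_refractory = C * delta_V / I_refractory.
# delta_V (the capacitor voltage swing) is an illustrative assumption.
def max_firing_rate(I_refractory, C=2e-12, delta_V=1.3):
    return I_refractory / (C * delta_V)

r_slow = max_firing_rate(1e-12)   # ~1 pA: sub-hertz maximum rate
r_fast = max_firing_rate(1e-9)    # 1000x the current -> 1000x the rate
```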
development step towards the latter [41] and the adapted exponential I&F model shown in [42]. They share the integration circuit described in subsection 4.1.1, but have different ways of generating spikes. The two models are selected using the SOIF_TYPE latch (default 0 = thresholded model, 1 = exponential model).
(i) Thresholded I&F model. The thresholded I&F model generates a spike whenever the membrane potential (I_mem = I_out of the somatic DPI) exceeds a certain threshold I_spkthr (controlled by the parameter SOIF_SPKTHR). The circuit is shown in Fig. 14. The generated spike gives a positive feedback to the DPI by charging I_mem to its maximum immediately; thus the spike pulse width is just the time for the following asynchronous digital encoder to respond and can be as low as a few nanoseconds (the ramp-up of I_mem is therefore too sharp to be buffered to the monitoring pin and cannot be seen), and the shorter pulse makes it more power-efficient. The top two plots in Fig. 12 show the firing pattern of the thresholded model. Measurement results show that it consumes 150 pJ per somatic spike when spiking at 80 Hz for the full soma operation, including the integration of the DC input.
(ii) Exponential I&F model. The exponential integrate-and-fire circuit is shown in Fig. 15. As the membrane voltage V_mem increases and exceeds a certain threshold, a positive feedback current proportional to I_mem is injected onto the membrane capacitor. This makes the neuron fire with an exponential curve, as shown in the bottom plots of Fig. 12. The threshold is the point at which the exponential feedback overpowers the leak, and is not controlled by any additional parameter. Measurement results show that the full soma consumes 300 pJ per somatic spike at 80 Hz spiking, double the power consumption of the thresholded model (also including the integration power consumption). The main reason is that the spike pulses are longer.

4.1.5. Neuronal dynamics on a longer timescale

Besides the relatively fast integrate-and-fire dynamics, biological neurons also have dynamics on longer timescales, such as adaptation and homeostasis, which benefit computation. What these two mechanisms have in common is that they both use the spikes as negative feedback to regulate the excitability of the neuron itself. Both are implemented with the LPF from Section 3.4, sharing a pulse extender whose input is the neuron spikes and whose I_pw is configured using the parameter SOAD_PWTAU.

(i) Spike-frequency adaptation
The spike-frequency adaptation circuit prevents the neuron from generating too many spikes in a short time. The adaptation current is the output of the LPF consisting of the shared pulse extender from Section 3.3.1 and a non-shared DPI. This current is subtracted from the dendritic current I_dendritic of the soma. The adaptation function is enabled using the latch SO_ADAPTATION. The individually controllable parameters are the LPF biases: I_w (SOAD_W), I_gain (SOAD_GAIN) and I_tau (SOAD_TAU). Figure 16a shows measurements of the adaptation of the neuron with constant input. Figure 16b shows the spike-frequency adaptation measurement with alternating DC input. Notice that the parameters were chosen to give long time constants; in real applications, shorter time constants can reduce the effects of device mismatch. When the DC input is first presented at time t = 0, the neuron starts to spike at a high rate, causing the adaptation current to increase until it reaches a value high enough to shunt the input, producing the firing pattern shown in Fig. 16b, and the firing rate drops. When the DC input is removed at t = 0.6 s, the adaptation current decays exponentially to 0, until the neuron starts firing again (at a high rate) when the DC input is presented again at t = 1.4 s.
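The negative-feedback loop just described can be illustrated with a simplified leaky integrate-and-fire model in normalised units (a behavioural sketch, not the chip's circuit equations; all values are assumptions): every output spike increments an adaptation variable that decays slowly and is subtracted from the input, so the firing rate relaxes from a high initial value to a lower adapted one.

```python
def lif_with_adaptation(i_dc=100.0, T=1.0, dt=1e-4, tau_m=0.02,
                        tau_adapt=0.2, w_adapt=5.0, v_th=1.0):
    v, a, spikes = 0.0, 0.0, []
    for k in range(int(T / dt)):
        v += dt * (-v / tau_m + i_dc - a)   # membrane integration
        a += dt * (-a / tau_adapt)          # slow adaptation decay
        if v >= v_th:                       # spike and reset
            v = 0.0
            a += w_adapt                    # negative feedback per spike
            spikes.append(k * dt)
    return spikes

spikes = lif_with_adaptation()
early = sum(1 for s in spikes if s < 0.1)    # spike count, first 100 ms
late = sum(1 for s in spikes if s >= 0.9)    # spike count, last 100 ms
```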

(ii) Homeostasis
The homeostasis mechanism is also known as synaptic scaling. It regulates the excitability of the neuron so that the firing rate stays in a medium range (or around a target). On DYNAP-SE2, this is achieved with an automatic gain control (AGC) mechanism which can achieve a very long timescale of up to hours. First, the firing rate of the neuron is estimated using a 'calcium current' I_Ca, which is implemented using an LPF consisting of the pulse extender shared with the spike-frequency adaptation mechanism described above and a non-shared DPI, and should have a relatively long time constant in order to act as an indicator of the overall neural activity. The calcium current monitored with the sADC is shown in Fig. 16a. The homeostasis function is enabled using the latch HO_ENABLE. The DPI biases are the weight I_Ca,w (SOCA_W), threshold I_Ca,thr (SOCA_GAIN) and time constant I_Ca,tau (SOCA_TAU). The output (in current mode) is used as an input to the AGC circuit, and can also be chosen as the reversal potential for the conditional conductance dendrites (see Section 4.3.1). The basic control logic of the AGC is a negative feedback on the somatic gain (or on the NMDA gain, controlled by the latch HO_SO_DE, where default 0 = somatic, 1 = NMDA) to keep the calcium current around a reference level I_Ca,ref (SOHO_VREF parameter). Usually the ramp-up and ramp-down rates are set equal, but the ratios could also be set differently to get different ramp-up and ramp-down rates. The output gain voltage can also be reset directly to SOHO_VREF_M, which is controlled by the latch HO_ACTIVE (default 0 = reset, 1 = enable homeostasis).
Figure 17 shows the working mechanism and measurement results of the homeostasis circuit.
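The AGC loop can be captured in a toy model (all dynamics and numerical values here are illustrative assumptions, not the chip's circuits): a slow 'calcium' estimate of the firing rate is compared with a reference, and the gain integrates the sign of the error, pulling the rate toward the target.

```python
def homeostasis(target_rate=20.0, T=200.0, dt=0.01,
                tau_ca=5.0, gain0=5.0, step=0.05):
    gain, ca = gain0, 0.0
    history = []
    for _ in range(int(T / dt)):
        rate = 10.0 * gain                     # firing rate grows with gain
        ca += dt * (rate - ca) / tau_ca        # low-pass 'calcium' estimate
        gain += dt * step * (1 if ca < target_rate else -1)
        history.append(rate)
    return history

rates = homeostasis()
# The rate starts high (50) and settles into a slow oscillation around 20
```

The bang-bang sign-of-error update mimics the fixed ramp-up/ramp-down rates described above; the lag of the calcium estimate produces the small fluctuation around the reference seen in the measurements.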

Synaptic compartment
Each neuron contains 64 synapses and four dendritic branches. Each synapse can be attached to any one of the four dendritic compartments. More details of the dendrite circuits will be discussed in Section 4.3.
The synaptic and dendritic compartments generate post-synaptic currents from pre-synaptic events. The synapse is a delayed, weighted low-pass filter, as shown in Fig. 8, that takes the pre-synaptic events as its input and outputs analog pulses with programmable width and height, which are used as the inputs to the dendritic DPI blocks. A block diagram of the synapse is shown in Fig. 18.

Synaptic delay
The delay current DAC of the type described in Section 3.7 contains two digital latches, named precise_delay for x 1 and mismatched_delay for x 2, and three analog parameters: SYPD_DLY0 for I 0, SYPD_DLY1 for I 1 and SYPD_DLY2 for I 2. The names 'precise_delay' and 'mismatched_delay' come from the design feature that SYPD_DLY2 has higher mismatch than the other two, in order to give a distribution of delays across a core. x 0 is fixed to 1, which means SYPD_DLY0 sets the minimum output current and thus the maximum delay time. Different combinations of the settings of the two latches can also be interpreted as providing four groups of delays, as shown in Table 3.
An illustration of the four groups of delay distributions is shown in Fig. 19. Note that this is just one example of the analog parameter configurations; shorter (down to a few microseconds) and longer (up to one second) delays are also possible. The combined use of the two precise and one mismatched delay parameters gives control over shaping the delay distribution for the desired application.

Figure 17: Homeostasis measurements. (a) The neuron receives Poisson-distributed input events at an average of 100 Hz starting at t = 0. To begin with, the neuron has a very high gain and thus a very high firing rate. This makes the calcium current I Ca much higher than the reference (target value, dashed line) and a down-regulation of the gain takes place. At around t = 2.5 s, the gain is low enough that the firing rate decreases and the calcium current drops below the reference value, and the gain regulation changes sign. The feedback regulation then keeps the firing activity (calcium current) fluctuating around the reference level. (b) Homeostasis dynamics on a longer timescale. The automatic gain control regulates the gain of the soma very slowly until the firing rate reaches the target in about 15 minutes. Both shorter (milliseconds to seconds) and longer time constants (hours to days) can also be achieved.
Table 3: Latch configuration for the four groups of delays.

The delay current parameter comes from another 2-bit DAC of the type described in Section 3.7 (output I delay, n = 2, but with an always-on I 0). The pulse width control I pw is set by the SYPD_EXT parameter. The output demultiplexer uses one-hot encoding, where four latches control whether the current goes to each of the four dendritic-branch DPIs. For the two excitatory dendrites, AMPA and NMDA, there is also a copy of the current provided to the double DPI (DDPI) responsible for producing alpha-function-shaped EPSCs (see Section 4.3.2).
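The four latch-selected delay groups can be sketched numerically. The DAC relation I_delay = I0 + x1·I1 + x2·I2 (with x0 fixed to 1) follows the text; the assumption that the delay time is inversely proportional to I_delay (constant-current charging of a fixed capacitor) and the bias values below are illustrative, not chip calibration data:

```python
# Hedged sketch of the 2-bit delay DAC: I_delay = I0 + x1*I1 + x2*I2.
I0, I1, I2 = 1.0, 4.0, 16.0   # SYPD_DLY0/1/2 biases (arbitrary units)

def delay_current(precise_delay, mismatched_delay):
    """Output current for one of the four latch-selected delay groups."""
    return I0 + precise_delay * I1 + mismatched_delay * I2

# The four groups of Table 3: (0, 0) gives the smallest current and thus
# the longest delay; any group with mismatched_delay = 1 inherits the
# larger device mismatch of the I2 branch, spreading delays across a core.
groups = {(p, m): delay_current(p, m) for p in (0, 1) for m in (0, 1)}
```

With these example biases the groups are strictly ordered in current, and hence inversely ordered in delay.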
When there is no pre-spike, assuming V stp is not very far from V stpw, the P-FET acts approximately as a pseudo-resistor of resistance R, so that C dV stp/dt = (V stpw − V stp)/R, which means that V stp will converge to V stpw exponentially with time constant τ = RC.
For small signals (V stp ≈ V stpw), the corresponding current I stp, following I stp = I 0 e^(κ V stp / U T), also converges with the same time constant τ.
During a pre-spike pulse, assuming the discharge transistor sinks an approximately constant current I stpstr, V stp will drop linearly at rate I stpstr / C, and the output current I stp decays exponentially with time constant τ = C U T / (κ I stpstr).
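The two regimes above can be checked with a toy forward-Euler simulation: linear droop during the extended pulse, exponential recovery between pulses. All component values are illustrative, not extracted from the chip:

```python
# Toy numerical check of the STP dynamics: between pulses V_stp recovers
# exponentially to V_stpw with tau = R*C; during the extended pre-spike
# pulse it drops linearly at rate I_stpstr/C.
R, C = 1e6, 1e-12          # pseudo-resistance and capacitance (tau = 1 us)
V_stpw = 0.5               # resting value of V_stp (volts)
I_stpstr = 1e-6            # discharge current during the pulse (amps)

def step(v, dt, pulse):
    if pulse:
        return v - (I_stpstr / C) * dt         # linear droop
    return v + (V_stpw - v) * dt / (R * C)     # exponential recovery

v = V_stpw
for _ in range(100):       # 100 ns pulse: droop of 0.1 V
    v = step(v, 1e-9, True)
drooped = v
for _ in range(5000):      # ~5 tau of recovery
    v = step(v, 1e-9, False)
recovered = v              # back within 1% of V_stpw
```

The droop is exactly linear in the pulse duration, while the recovery closes about 63% of the remaining gap per time constant, matching the two regimes derived above.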

Dendritic compartment
The dendritic block contains two excitatory (AMPA and NMDA) and two inhibitory (GABA B and GABA A) DPI compartments, which turn pre-synaptic events into excitatory and inhibitory PSCs. The block diagram is shown in Fig. 21.

Conductance dendrites
The AMPA, NMDA and GABA B dendrites can be individually switched to conductance mode to emulate a large class of biologically inspired synaptic models. The circuit is shown in Fig. 22 and is adapted from [35].
The output from the conductance block to the soma, I conductance, is a tanh function of the difference between the reversal potential V reversal, set by the parameter REV, and V neuron, which is either the somatic membrane potential V mem or the calcium signal V Ca (selected using the latch COHO_CA_MEM, default 0 = V mem, 1 = V Ca). The measurement result shown in Fig. 23a illustrates a simple example of using the conductance function.

Double-DPI -alpha function EPSC
Both AMPA and NMDA EPSCs can accurately emulate alpha-function synaptic potentials with an additional inhibitory DPI (P-type, but with mirrored output as described in Section 3.2) [35]. The EPSC is the difference of the excitatory and inhibitory DPI responses, I EPSC(t) = W E e^(−t/τ E) − W I e^(−t/τ I), where the coefficients W E and W I are controlled by the parameters EGAIN and IGAIN respectively, and the time constants τ E and τ I are controlled by the parameters ETAU and ITAU respectively, as described in Section 3.4. The measurement result shown in Fig. 23b illustrates a simple example of using the alpha-function dendrite.
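The difference-of-exponentials shape can be verified numerically: with equal weights and distinct time constants it rises from zero, peaks, and decays like the classical alpha function. The time constants below are illustrative:

```python
# Sketch of the double-DPI alpha-function EPSC: the difference of two
# exponentials (excitatory minus inhibitory DPI responses).
import math

def epsc(t, w_e, tau_e, w_i, tau_i):
    return w_e * math.exp(-t / tau_e) - w_i * math.exp(-t / tau_i)

tau_e, tau_i = 10e-3, 5e-3          # illustrative time constants (seconds)
ts = [i * 1e-4 for i in range(1000)]
ys = [epsc(t, 1.0, tau_e, 1.0, tau_i) for t in ts]
t_peak = ts[ys.index(max(ys))]

# The analytical peak of the difference of two exponentials sits at
# t* = ln(tau_e/tau_i) * tau_e*tau_i / (tau_e - tau_i).
t_star = math.log(tau_e / tau_i) * tau_e * tau_i / (tau_e - tau_i)
```

Note that the response starts at exactly zero (both exponentials cancel at t = 0), which is what distinguishes the alpha-function EPSC from the instantaneous-onset response of a single DPI.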

Diffusion over a 2D grid
The AMPA dendritic compartment offers a conditional 1D or 2D resistive grid, similar to that described in [36], to diffuse incoming EPSCs between nearby neurons. The circuit is shown in Fig. 24a. An example of one-dimensional (horizontal) diffusion is shown in Fig. 25.
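The bump profile of Fig. 25 can be reproduced with a toy relaxation of a 1D resistive line: one node is driven with an input current, horizontal conductances spread it to the neighbors, and each node leaks through its own conductance. The conductance values are illustrative:

```python
# Toy steady-state solution of a 1D resistive diffusion grid, solved by
# Jacobi relaxation: node balance I_in[i] = g_n*v[i] + g_h*(v[i]-v[i-1])
# + g_h*(v[i]-v[i+1]), rearranged into a fixed-point update.
N = 17
g_h, g_n = 1.0, 0.5              # 1/R_h and 1/R_n (arbitrary units)
I_in = [0.0] * N
I_in[N // 2] = 1.0               # drive only the middle neuron

v = [0.0] * N
for _ in range(5000):            # relax to steady state
    v = [(I_in[i]
          + g_h * (v[i - 1] if i > 0 else 0.0)
          + g_h * (v[i + 1] if i < N - 1 else 0.0))
         / (g_n + g_h * ((i > 0) + (i < N - 1)))
         for i in range(N)]
```

The resulting profile is a symmetric bump centered on the driven neuron, decaying roughly geometrically with distance, qualitatively matching the measured diffusion.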

NMDA -gating with the membrane potential
The NMDA dendritic compartment can gate the incoming current depending on the membrane potential, as shown in Fig. 24b. The measurement result shown in Fig. 23c illustrates a simple example of using the NMDA threshold circuit.
It is important to note that disabling the gating using the latch and enabling it with V NMDA set to 0 are not equivalent, contrary to what an ideal computational model would predict, because of the different leakage currents with and without the NMDA gating circuit. Measurements show that the latter condition may add several picoamps of leakage, thus decreasing the excitability of the neuron.

Inter-neuron routing and connection mapping scheme
The routing scheme used within the core was inspired by [20]. Its details should not concern the user unless special edge cases are encountered (e.g., applications requiring very low latency, very high firing rates, or many neurons firing simultaneously). The user must, however, understand the addressing scheme in order to make connections between neurons. The principal idea is to use AER to encode the spikes into a stream of bit patterns, so that they can be easily transmitted and routed within and outside of the chips. More specifically, on DYNAP-SE2, each normal inter-neuron event is encoded as a 24-bit word comprising a format indicator bit (bit 23 = 0) and four variable fields as shown in Table 4: the event tag, the target chip displacements in the x and y directions (dx and dy respectively), and the cores mask that determines which cores the event is delivered to on the target chip. Each neuron has four 23-bit SRAMs to store four combinations of tag, dy, dx and cores. When the pre-neuron fires, the router reads and transmits the content of these four SRAMs. This is known as source mapping. This is in contrast to DYNAP-SE [20], which does not include arbitrary source mapping and is therefore limited in the network connectivity it can implement. There is no dedicated 'enable' bit for an outgoing event, but if none of the four cores on the target chip is selected (cores = 0000b), the event will be dropped by the router.

Figure 23: Measurement examples of the conditional dendritic functions. (a) The neuron has both excitatory dendrites in conductance mode, with one of the reversal potentials set to 0.5 V and its synaptic weight very high, while the other has its reversal potential at around 0.7 V but a low weight. The spiking threshold is set to around 0.6 V. Starting from the resting potential at around 0.35 V, when the first dendrite receives an input spike at time t = 0, it charges the soma up to its reversal potential, and when the second dendrite receives a spike shortly afterwards at time t = 5 ms, it further charges the soma until it crosses the firing threshold and emits a spike (the neuron then goes into its refractory period). However, if the second dendrite receives its input (at time t = 100 ms) before the first dendrite (at time t = 105 ms), the neuron does not emit a spike and slowly leaks back to its resting potential, since the second dendrite by itself cannot drive the neuron to fire (due to the low weight) and the first dendrite cannot charge the soma beyond its reversal potential (0.5 V), which is lower than the firing threshold. Thus this neuron can be used to detect the temporal order of the two inputs, since it fires if and only if one input comes shortly before the other. (b) The neuron uses both excitatory dendrites, one using the alpha function and the other using only the normal DPI. If the first dendrite receives an input spike at time t = 0, it starts to charge the soma slowly (according to the alpha function), and if the second dendrite receives a spike shortly afterwards at time t = 20 ms, it charges the soma further to cross the firing threshold and emit a spike. However, if the second dendrite receives its input (at time t = 300 ms) before the first dendrite (at time t = 320 ms), the neuron does not emit a spike, since the effect of the second dendrite decays very quickly and the first dendrite by itself cannot charge the soma across the firing threshold either. This mechanism introduces a delayed dynamic, so it can also be used to detect the order of the two inputs. (c) The neuron uses the AMPA and NMDA dendrites. If the AMPA dendrite receives an input spike at time t = 0, it charges the membrane potential to a value higher than the NMDA threshold (which is set to around 0.1 V), and if the NMDA dendrite receives a spike shortly afterwards at time t = 5 ms, it charges the soma across the firing threshold and the neuron emits a spike. However, if the NMDA dendrite receives its input (at time t = 100 ms) before the AMPA dendrite (at time t = 105 ms), the NMDA input has no effect on the soma, since the membrane potential at that moment is still lower than the NMDA threshold, and the AMPA dendrite by itself cannot charge the soma across the firing threshold, so the neuron does not emit a spike. This mechanism imposes an asymmetric condition on when the soma receives the input, so it can also be used to detect the order of the two inputs.

Figure 24: Conditional dendritic blocks. (a) 2D diffusive grid connected to the AMPA dendrite. This can be enabled neuron-wise using the latch DEAM_AMPA, and includes the corresponding neuron pseudo-resistor NRES (R n in the figure), the horizontal pseudo-resistor HRES (R h in the figure, between neurons n and n + 1), and the vertical pseudo-resistor VRES (R v in the figure, between neurons n and n + 16). The pseudo-resistors are implemented with single P-FETs, and the controllable parameters are the gate voltages DEAM_NRES, DEAM_HRES and DEAM_VRES. (b) NMDA gating. When enabled using the DENM_NMDA latch, the output current of the NMDA DDPI (here I in) flows out into the neuron's I dendritic if and only if the membrane potential V m is higher than the NMDA threshold V NMDA (set by the DENM_NMREV parameter).

Figure 25: AMPA diffusion in one dimension. The input spike is only sent to the neuron in the middle (neuron n), but the diffusion creates a bump in the membrane potentials of the neurons in its (here, 1D) neighborhood.
The events are transmitted inside a 2D grid of trees, where every chip has a tree routing structure and four grid connections to the neighboring chips.
When an event arrives at a chip (this can be the sender chip itself), the top-level router decides, based on the target chip displacement bits, whether the event should be kept for this chip (dx = 0 and dy = 0) or forwarded further on one of the grid buses (west if dx < 0, east if dx > 0, south if dx = 0 and dy < 0, north if dx = 0 and dy > 0; see Section 6.2 for more details). If the top-level router decides to keep the event, it is sent to all cores that are selected in the cores bits.
Once it has arrived in a core, an event is identified only by its 11-bit tag. This means that when two events with the same tag arrive in the same core, there is no way for a neuron in that core to tell them apart, even if they come from different pre-neurons. This can be used to share synapses, as the tags can be assigned arbitrarily in the source mapping. The 11-bit tag is broadcast to all 256 neurons × 64 synapses in the core. Each synapse is provided with an 11-bit CAM. If all eleven bits of the broadcast tag match those in a synapse's CAM, an active-low 'match' signal is sent to the synapse circuitry as described in the caption of Fig. 18. This matching process is known as destination mapping.
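The event word and the top-level routing decision can be sketched in code. The field widths follow the text (11-bit tag, 4-bit cores mask, format bit 23 = 0), but the 4-bit sign-magnitude dx/dy encoding (consistent with the seven-chip reach in each direction) and the field ordering are assumptions made for this sketch, not the documented chip layout:

```python
# Illustrative packing of the 24-bit inter-neuron event word and the
# top-level router decision of Section 5.1. Field order and the dx/dy
# encoding are assumptions for illustration only.

def sm4(d):
    """Assumed 4-bit sign-magnitude encoding for displacements -7..+7."""
    assert -7 <= d <= 7
    return (abs(d) & 0x7) | (0x8 if d < 0 else 0)

def pack_event(tag, dx, dy, cores):
    assert 0 <= tag < 2**11 and 0 <= cores < 2**4
    return (tag << 12) | (sm4(dy) << 8) | (sm4(dx) << 4) | cores

def route(word):
    """Forward on a grid bus, keep locally, or drop (cores mask empty)."""
    dx_f, dy_f = (word >> 4) & 0xF, (word >> 8) & 0xF
    dx = -(dx_f & 0x7) if dx_f & 0x8 else dx_f & 0x7
    dy = -(dy_f & 0x7) if dy_f & 0x8 else dy_f & 0x7
    if dx < 0: return 'west'
    if dx > 0: return 'east'
    if dy < 0: return 'south'
    if dy > 0: return 'north'
    return 'drop' if (word & 0xF) == 0 else 'local'
```

Note that the 23 bits of tag, dy, dx and cores exactly fill one neuron SRAM word, with the format indicator occupying bit 23.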

Example configurations
To better illustrate the tag scheme, two concrete examples of how the tags in the SRAMs of the pre-neurons and in the CAMs of the post-neurons can be used are given in Python-like pseudo-code: (i) for all-to-all connections from n neurons (in list pre) to r synapses on each of n neurons (in list post), a single tag x is used; (ii) to connect each of the n pre neurons (in list pre) to the (2r + 1) neighbors (mod n) in the n post neurons (in list post), tags in the interval [x, x + n] are used.
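The two examples can be sketched as follows. Note that set_sram and set_cam are hypothetical helpers standing in for the actual SRAM/CAM configuration calls; here they simply record the mapping:

```python
# Hedged reconstruction of the two tag-scheme examples (i) and (ii).
sram, cam = {}, {}

def set_sram(neuron, tag):
    sram[neuron] = tag            # tag sent when this pre-neuron fires

def set_cam(neuron, synapse, tag):
    cam[(neuron, synapse)] = tag  # tag this synapse listens for

def connect_all_to_all(pre, post, r, x):
    """(i) All-to-all from pre to r synapses on each post, one shared tag x."""
    for p in pre:
        set_sram(p, x)
    for q in post:
        for s in range(r):
            set_cam(q, s, x)

def connect_neighbors(pre, post, r, x):
    """(ii) pre[i] -> the (2r + 1) neighbors of post[i] (mod n),
    using tags in the interval [x, x + n]."""
    n = len(pre)
    for i, p in enumerate(pre):
        set_sram(p, x + i)        # one private tag per pre-neuron
    for j, q in enumerate(post):
        for s, k in enumerate(range(j - r, j + r + 1)):
            set_cam(q, s, x + (k % n))
```

Example (i) shows how a single tag lets many post-synapses be shared among all pre-neurons, while example (ii) uses one tag per pre-neuron to express a local (ring) connectivity kernel.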

Multiplexing of four neurons
For networks that require higher synaptic counts, there is an option to merge the dendrites of four neurons into one (enabled using the latch DE_MUX, set for each core individually). This increases the number of synapses per neuron to 256 and reduces the number of neurons by a factor of four. More specifically, the PSCs I dendritic and I somatic of neurons 0, 1, 16 and 17 all go to the soma of neuron 0; those of neurons 2, 3, 18 and 19 go to the soma of neuron 2, and so on.
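The merge pattern can be expressed as a small index function. The 16 × 16 arrangement of the 256 neurons is an assumption consistent with the groupings listed above ({0, 1, 16, 17} → 0, {2, 3, 18, 19} → 2, …):

```python
# Sketch of the four-way dendrite merge (latch DE_MUX): in an assumed
# 16x16 neuron array, each 2x2 block (even row, even column) feeds the
# soma of its top-left neuron.

def merged_soma(n):
    """Index of the soma receiving neuron n's PSCs when DE_MUX is set."""
    row, col = divmod(n, 16)
    return (row & ~1) * 16 + (col & ~1)
```

This maps the 256 neuron indices onto 64 distinct soma indices, matching the four-fold reduction in neuron count.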

2D event sensor routing and mapping scheme
A separate pipeline for mapping and routing is available for 1D and 2D event streams originating from sensors. It is an earlier version of the event pre-processor block described in [22]. These sensor events can be routed in an alternative event word format on the 2D routing grid buses described in Section 5.1. The events are encoded with a format indicator bit (bit 23 = 1) and five variable fields as shown in Table 4: the event polarity pol, the x and y coordinates of the event (pixel_x and pixel_y respectively), and the target chip displacements in the x and y directions (dx and dy respectively).
The mapping pipeline consists of multiple stages, as shown in Fig. 26. The pipeline has the following blocks:
• Sensor Interface: The chip can interpret event formats from the following sensors directly via parallel AER: DAVIS346 [43], DAVIS240 [44] and DVS128_PAER [23]. Other sensors such as AEREAR2 [45] or ATIS [46] can be interfaced to the event routing grid by following the sensor event word format described above.
• Pixel Filtering: Up to 64 arbitrary addresses can be discarded from the sensor event stream. This is done in one step using a content-addressable memory.
• Event Duplication: The pipeline can optionally duplicate and forward unprocessed events to one of the four surrounding chips.
• Sum Pooling: This can be used to scale the 2D input space by 1:1, 1:2, 1:4 or 1:8 in the x and y directions individually.
• Cutting: Cutting can be used to cut a 1×1 up to 64×64 pixel patch out of the 2D input space that is forwarded for source mapping.
• Polarity Filtering: Polarity selection provides the ability to use a specific polarity or both polarities.
• Source Mapping: A patch of 64×64 pixels can be arbitrarily mapped one-to-one (specifying tag, dx, dy and cores) to the standard event word format. Such mapped events are introduced to the top-level router for further routing and mapping inside the normal event system, as described in Section 5.1.
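The per-event effect of the filtering, pooling, cutting and polarity stages can be sketched functionally. The function below is a behavioral illustration only (it processes events one at a time in software, with illustrative parameters), not the hardware implementation:

```python
# Minimal functional sketch of the sensor mapping pipeline stages:
# pixel filtering -> sum pooling -> cutting -> polarity filtering ->
# patch-local address ready for source mapping.

def pipeline(events, kill_list, pool_x, pool_y, patch, polarity):
    """events: iterable of (x, y, pol); patch: (x0, y0, w, h), w, h <= 64."""
    out = []
    x0, y0, w, h = patch
    for x, y, pol in events:
        if (x, y) in kill_list:           # pixel filtering (CAM lookup)
            continue
        x, y = x >> pool_x, y >> pool_y   # sum pooling: 1:2**k per axis
        if not (x0 <= x < x0 + w and y0 <= y < y0 + h):
            continue                      # cutting
        if polarity is not None and pol != polarity:
            continue                      # polarity filtering
        out.append((x - x0, y - y0, pol)) # patch-local address for mapping
    return out

evs = [(10, 12, 1), (100, 7, 0), (11, 13, 1)]
patched = pipeline(evs, kill_list={(100, 7)}, pool_x=1, pool_y=1,
                   patch=(0, 0, 64, 64), polarity=1)
```

After these stages, each surviving patch-local address is looked up in the source-mapping table to produce a standard (tag, dx, dy, cores) event word.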
Figure 27: The corner of the chip, showing the eight channels of the Analog Front-End (AFE), including everything presented in [21].

Inter-chip event communication
As alluded to in Sec. 5, each chip has four high-speed asynchronous AER buses on its four sides to transfer events directly in and out of the chip. The pins are assigned in such a way that adjacent chips can be conveniently connected together, which facilitates network scalability across a 2D grid. Each chip can directly address a neighborhood comprising up to seven chips in each direction, which allows a maximal 8 × 8 fully connected chip array without any external mapping. Using an alternative packet format, this grid also transmits and receives sensor events to and from its direct neighbors; see Sec. 5.4.

Direct monitoring
Some important analog signals are copied to six external pins through rail-to-rail buffers so that they can be monitored directly off-chip using an oscilloscope for debugging purposes. These are a neuron membrane potential from each of the four cores, and analog voltage or current reference parameters from Parameter Generators 0 and 1. Also externally available are the digital homeostasis charging-direction signals from all four cores and a digital delayed pulse-extender pulse from particular synapses on each core.

On-chip monitoring
Sixty-four on-chip, current-based spiking analog-to-digital converters (sADCs) enable easy monitoring of all relevant neural signals. This greatly improves the configuration experience and usability.
The signals are divided into three separately configurable groups, in order to adapt to the wide range of signal magnitudes.

All of the chips supported by Samna should be supported in a similar way, such that once a user is familiar with the GUI and the API for one chip, the experience is reasonably portable to other chips, saving the user familiarization time.
Samna aims to support the remote use of DYNAP-SE2 (and other chips), such that the user interface and user-supplied code can (but need not) run on a different computer (e.g. the user's laptop) from the computer to which the chips are attached, be it at the same desk, in a server room in the same building, or half-way around the world. This facility has already proved invaluable in teaching: students working at home have been able to perform experiments on DYNAP-SE2 chips without having to be physically provided with the hardware.
Experience with earlier generations of mixed-signal neuromorphic chips has shown that it is highly advantageous to provide a graphical user interface (GUI) that gives visual feedback of, for instance, neural spiking activity, and provides on-screen virtual potentiometers to control on-chip analog parameters. This is particularly important while initially tuning those parameters. At the same time, for anything beyond these most trivial interactions, an application programming interface (API) of some kind is essential. In earlier software, the existence of these two interfaces, through which the state of the neuromorphic chips in use could be altered, caused problems, as the state could be changed in the GUI without this being apparent to code using the API, and vice versa. Avoiding these kinds of discrepancies between different components' views of the state information has been key to the architecture of Samna.
The API presented to the user is in Python 3, as Python has become the de facto standard in neuroscience, in particular in the field of modeling and simulation [47,48]. The underlying code, however, is written in C++ (C++17) for performance.
Finally, for broad acceptance and ease of use, it is important that Samna is supported on multiple platforms. Currently Linux and macOS are supported. The following description concentrates on the DYNAP-SE2 and FPGA firmware communication modules of Samna, as these were the modules written in the course of the DYNAP-SE2 project.

7.2.1. User code, GUI and object store
Although the GUI is part of Samna, it is on an equal footing with the user's code when accessing the rest of the system. Both talk to the DYNAP-SE2 module of Samna via a local remoting layer and a remote object store, where the remote object store and everything below it may be on a remote computer. Objects from the DYNAP-SE2 module (and other similar modules supporting other hardware, not shown in Fig. 28) can be placed in the object store and transparently retrieved from there by the user's code and/or the GUI. They can then be manipulated and returned to the store, and thus to the lower modules.

The Software and Hardware Stack
The user's Python code only sees a Python extension library, which can be imported in the usual fashion:

import samna
from samna.dynapse2 import *

From this point on, barring a little setup to connect to a remote Samna node, the user need not be aware of the presence of the object store, or that the hardware might be attached to a remote machine. The classes in samna.dynapse2 can all be used transparently, as if everything were local. Within Samna, the actual Python interface to the underlying C++ code is implemented with the aid of pybind11 [49].

7.2.2. DYNAP-SE2 module
Within Samna's DYNAP-SE2 module, there are DYNAP-SE2Interface classes which provide an interface to the facilities provided by the PCB(s) on which the DYNAP-SE2 chips are mounted. At the time of writing, two DYNAP-SE2Interface classes exist: DYNAP-SE2DevBoard and DYNAP-SE2Stack, for the two present PCB types, dev board and stack respectively. Alongside the DYNAP-SE2Interface class is the DYNAP-SE2Model class, which provides an interface to an abstraction, held in a DYNAP-SE2Configuration class, of the hardware state in the physical DYNAP-SE2 chip(s). The DYNAP-SE2 chips do not support the read-out of internal state, so the entire state information is held in software in the DYNAP-SE2Configuration class and other aggregated classes which are not shown in the figure. See Sec. 7.2.5 below for details.
In operation, the user's code and/or the GUI obtains a reference to a DYNAP-SE2Model object via the object store, then gets the current configuration of the hardware from the DYNAP-SE2Model object as a DYNAP-SE2Configuration object, modifies that object and the tree of objects within it representing the neuron and synapse configuration information, and applies the DYNAP-SE2Configuration object back into the DYNAP-SE2Model object and hence to the hardware. This process can then be performed repeatedly; see Fig. 29. In this way, changes made by the user's code are visible to the GUI, and vice versa.
When the DYNAP-SE2Configuration object is set back into the DYNAP-SE2Model object, the DYNAP-SE2Model determines the changes from the current configuration and uses event generator functions to produce a list of configuration events sufficient to bring about those changes on the DYNAP-SE2 chip(s). This list of events is then passed to the DYNAP-SE2Interface object for transmission to the hardware. Meanwhile, address-event (AE) streams to and from the hardware pass directly to and from the user code and the GUI via the same DYNAP-SE2Interface object.

7.2.3. FPGA firmware communication module and below
The FPGA firmware communication module manages the packet-based communication with the firmware instantiated in the FPGA on the hardware. To avoid the overhead associated with constantly allocating and freeing packet buffers, the firmware communication module manages a pool of constant-length packet buffers. Empty packet buffers are obtained by the overlying hardware-specific module(s), in this case by a DYNAP-SE2Interface object in the DYNAP-SE2 module, when there are events to send to the hardware. The hardware-specific module is responsible for filling in the payload of the packet before calling back into the firmware communication module to let the latter complete the header of the packet with the appropriate payload size information and put the packet on a transmit queue.
The firmware communication module is also home to a thread which continually attempts to read from the underlying hardware platform support module. At the time of writing, for DYNAP-SE2 this is always the OpalKelly module, since both the supported dev board and stack boards interface via Opal Kelly [50] FPGA Integration Modules. After each read, the firmware communication module determines whether the firmware is ready to accept more data, and if so, how much. It then takes as many packets as possible from the transmit queue and writes them out via the OpalKelly module, packing them into the blocks that the OpalKelly layer understands. Once the packet buffer contents have been copied into the OpalKelly blocks, the packet buffers are returned to the packet buffer pool.
The OpalKelly module abstracts the 'Pipe' and 'Wire' interfaces provided by the Opal Kelly hardware and communicates with the hardware via libusb [51]. Finally, when the Opal Kelly board receives the blocks assembled by the software, the FPGA firmware unpacks the individual packets from the Opal Kelly blocks and passes the event data contained in the packets to the DYNAP-SE2 chips via the appropriate bus.

7.2.4. Events from the chip(s)
Events coming from the inter-chip communication buses and from the sADC output of the DYNAP-SE2 chip(s) are read by the firmware in the FPGA on the Opal Kelly board. In the case of inter-chip communication events, these events are timestamped and placed in packets. In the case of sADC events, the number of events received for each possible sADC address in a fixed time interval is counted, and all of these counts are placed into a different packet type at the end of the interval. In both cases, the packets are placed in blocks and transmitted over USB to the host. When these blocks are read by the thread referred to above in Samna's FPGA firmware communication module, the packets are unpacked from the blocks into buffers taken from the packet buffer pool and dispatched according to packet type. In the case of the normal timestamped events, the packets are placed into a queue from which they can be read by the top-level code via the DYNAP-SE2Model object. In the case of sADC count packets, the packet contents are written into a buffer which always holds the latest sADC count values, and which is also available to be read by top-level code via the DYNAP-SE2Model object.

7.2.5. DYNAP-SE2Configuration aggregation hierarchy
As mentioned above in Sec. 7.2.2, the entire DYNAP-SE2 hardware state information is held in software in DYNAP-SE2Configuration objects and a hierarchical aggregation of Plain Old Data (POD) types and objects of further classes: DYNAP-SE2Chip, DYNAP-SE2Core, DYNAP-SE2Neuron, DYNAP-SE2Synapse etc., which themselves are (almost all) POD types, i.e. they are aggregates with only public data. It is this hierarchically organised data structure which the user manipulates in their Python code to control the operation of the DYNAP-SE2 chips.

Discussion
The large range of dynamics and computing features supported by the DYNAP-SE2 supports the definition of networks that can solve a wide range of applications. Similarly, the DYNAP-SE2's fully configurable tag-based routing system enables the definition of arbitrary network topologies, ranging from simple feed-forward neural networks to fully recurrent ones.
Feed-forward networks are the simplest form of network architecture, in which the neurons process events as they move through the layers of the network. Sparse feed-forward networks can be built by dividing the available neurons into layers and forming unidirectional synaptic connections between layers [52]. Unlike in standard crossbar and addressable-column approaches [53,54], the CAM-based synaptic addressing allows all the available physical synapses to be used [20]. To support dense feed-forward networks and allow users to define heterogeneous networks with different fan-in and fan-out figures, each core allows the number of programmable synapses to be increased to 256 per neuron, at the cost of a reduced number of neurons (64 instead of 256).
The asynchronous and mixed-signal design of the DYNAP-SE2 is particularly well suited to emulating the dynamics of recurrent spiking neural network architectures. The native support for recurrent mapping and continuous physical-time emulation overcomes the limits of digital time-multiplexed simulation systems, avoiding the need for complex clock tree designs and reducing signal synchronization issues. Reservoir networks use recurrent connections to build complex network dynamics supporting a 'memory trace' of their activity over time. Attractor networks can exploit recurrent connectivity patterns to memorize patterns, recover partial or corrupted input patterns, and perform stateful computation [55,56].
Both feed-forward and recurrent networks can be configured to implement time-to-first-spike (TTFS) computation. This paradigm relies on the latency of spike waves traveling through a network, as in wavefront algorithms [57] or as seen in the nervous systems of weakly electric fish [58]. The low-latency nature of DYNAP-SE2 and its ability to support delay-based synapses make TTFS applications first-class citizens. In particular, the fact that synapses can be configured to belong to one of four delay classes (with two well-matched, precise, classes and two purposely mismatched, inhomogeneous, classes) provides a controlled distribution of delays, which enables both precise time-to-first-spike configurations and randomly timed networks [59,60].
The ability to configure synapses as diffusive gap junctions [61] with 2D nearest-neighbor connections supports the configuration of networks with local, spatially distributed connectivity kernels, as originally proposed in [36,62]. In addition, excitatory synapse circuits can be configured to emulate both slow voltage-gated NMDA receptor dynamics [31] and fast AMPA dynamics [35]. For both AMPA and NMDA synapse types (as well as for both inhibitory types, GABA-A and GABA-B), the 4-bit weight resolution, combined with the configurable weight-range scale, enables users to explore and implement more advanced hardware-in-the-loop learning systems.
The improved spike-frequency adaptation circuits in the neuron [41], the neuron's homeostatic synaptic scaling circuit [33], and the synapse's short-term depression plasticity control [31] provide the user with a large range of computational primitives for exploring dynamics at multiple time scales and producing complex dynamic behaviors [63].
Finally, the ability to monitor all dendritic, somatic and synaptic current traces via asynchronous current-to-frequency ADCs [38] greatly simplifies prototyping and debugging in experiments that explore the dynamics and computing abilities of the DYNAP-SE2.

Conclusion
We presented a full custom implementation of the DYNAP-SE2, built for prototyping small networks of spiking neurons that emulate the dynamics of real neurons and synapses with biologically plausible time constants, and for interacting with natural signals in real time. We argued that the real-time nature of the system and its direct sensory-input interfaces for receiving 1D and 2D event streams make this an ideal platform for processing natural signals in closed-loop applications. We characterized in detail all the circuits present on the chip and presented chip measurements that demonstrate their proper operation. This platform will enable the prototyping of biologically plausible sensory-processing systems and the construction of physical neural processing systems that can be used to validate (or invalidate) hypotheses about neural computing models.

Figure 2 :
Figure 2: Neuronal compartments. 64 synapses with 4-bit weights and conditional delay and short-term plasticity (STP) convert pre-synaptic spikes to pulses. The pulses are low-pass-filtered by one of the four dendrites to generate post-synaptic currents (PSCs). The dendrites have conditional alpha-function excitatory PSCs, a diffusive grid, membrane-voltage gating and ion-channel conductances. The PSCs are injected into the soma, which can switch between a thresholded [27] and an exponential integrate-and-fire model [28], with conditional adaptation and 'calcium'-based homeostasis. When the neuron fires, the AER spike is sent to up to four chips.

Figure 4 :
Figure 4: N- and P-type DPI circuits and corresponding block diagrams. The output current I out can be thought of as a low-pass-filtered version of the input current I in. The circuit is designed in current mode, where I x (x ∈ {tau, gain, out}) is the current flowing in the diode-connected transistor with voltage V x of the corresponding type (for example, I out and V out in the schematics).

Figure 6 :
Figure 6: Minimal and low-power pulse extender. When the active-low input event arrives, the capacitor C immediately charges to V dd, then discharges with current I pw. For the minimal pulse extender PX MIN, with only one transistor and one capacitor, the voltage V C on the capacitor is the output. This circuit is simple, but the output is not clean (dashed waveform) and consumes more power, as it stays around V dd /2 for longer. For the low-power pulse extender PX EFF, once V C reaches the switching threshold around V dd /4, positive feedback discharges the capacitor rapidly (solid line), so the output pulse is cleaner and consumes less power. The switching threshold is shifted down to ∼ V dd /4 by the asymmetric starved inverter, as well as by sizing the P-FET physically the same as the N-FET, resulting in a beneficial pull-up/pull-down drive-strength imbalance. This allows the capacitance to be significantly smaller while still achieving the same time constant.
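The resulting pulse width can be estimated from the capacitor discharge alone. A back-of-the-envelope sketch, assuming a linear discharge from V dd to the switching threshold at constant current I pw (all component values are illustrative, not taken from the chip):

```python
def pulse_width(c, v_dd, v_th, i_pw):
    """Pulse width of the extender: the capacitor is charged to V_dd on an
    input event, then discharges linearly at I_pw until it crosses the
    switching threshold V_th, giving t = C * (V_dd - V_th) / I_pw."""
    return c * (v_dd - v_th) / i_pw

# Hypothetical values: 100 fF capacitor, 1.8 V supply, threshold at V_dd/4.
c, v_dd = 100e-15, 1.8
t_pw = pulse_width(c, v_dd, v_dd / 4, 10e-12)  # 10 pA discharge current
print(t_pw)  # 13.5 ms from a tiny on-chip capacitor
```

This also illustrates why lowering the threshold toward V dd /4 helps: a larger discharge excursion yields the same pulse width with a smaller capacitor.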

Figure 7 :
Figure 7: Delayed pulse extender. The C-element [37] (shown as ©) is an asynchronous digital circuit that changes its output to X when both inputs are equal to X. When the active-low event arrives, if there is no output pulse (1), the output of the C-element goes from 1 to 0, which starts the charging of the capacitor with current I delay. When the voltage on the capacitor exceeds the threshold of the inverter, the output pulse becomes active (0) and positive feedback charges the capacitor to V dd immediately. The output of the C-element then goes to 1, which starts the discharging of the capacitor with current I pw. When the voltage on the capacitor drops below the threshold of the inverter, the output pulse finishes (1).
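The state-holding rule of the C-element is easy to verify in software; a minimal sketch of the Muller C-element update (the function and signal names are our own):

```python
def c_element(a, b, prev):
    """Muller C-element: the output follows the inputs when they agree,
    otherwise it holds its previous state."""
    return a if a == b else prev

# Walk one assert/release cycle on the two inputs.
out, trace = 1, []
for a, b in [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]:
    out = c_element(a, b, out)
    trace.append(out)
print(trace)  # [1, 1, 0, 0, 1]: the output only moves when both inputs agree
```

The hold behavior is what lets the circuit wait for both the event arrival and the pulse state before switching phases of the charge/discharge cycle.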

Figure 8 :
Figure 8: Event low-pass filter consisting of a pulse extender PX and a DPI. The input x is a set of discrete events (treated as a sum of Dirac delta functions) and the output y is an analog current waveform.
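Treating each event as a Dirac impulse, the filter can be approximated in discrete time: every event bumps the output, which then decays with the DPI time constant. A behavioral sketch with illustrative parameters (the unit jump per event stands in for the charge delivered by the pulse extender):

```python
def event_lpf(spike_times, tau, t_end, dt, jump=1.0):
    """Discrete-time event low-pass filter: the output decays exponentially
    with time constant tau and jumps by `jump` at each input event."""
    n = int(t_end / dt)
    y = [0.0] * n
    events = {int(t / dt) for t in spike_times}
    for k in range(1, n):
        y[k] = y[k - 1] * (1.0 - dt / tau)  # exponential decay
        if k in events:
            y[k] += jump                    # impulse from an input event
    return y

# Three events 10 ms apart with a 50 ms time constant: responses accumulate,
# so the output current encodes the recent event rate.
y = event_lpf([0.01, 0.02, 0.03], tau=50e-3, t_end=0.1, dt=1e-4)
```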

Figure 9 :
Figure 9: One DAC (two horizontal structures) with the adjacent sADC block (bright rectangle) between two adjacent neural cores.

Table 2 :
Nominal I coarse value for each n coarse value (in increasing order of n coarse): 70 pA, 550 pA, 4.45 nA, 35 nA, 0.28 µA, 2.25 µA.
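The listed coarse currents step by roughly a factor of eight per setting. A small lookup helper for these nominal values (the index convention n coarse = 0..5 and the ascending ordering are our assumptions):

```python
# Nominal coarse base currents from Table 2, assumed ordered by n_coarse.
I_COARSE = [70e-12, 550e-12, 4.45e-9, 35e-9, 0.28e-6, 2.25e-6]

def coarse_current(n_coarse):
    """Return the nominal DAC base current for a coarse setting 0..5."""
    return I_COARSE[n_coarse]

# Successive settings scale by roughly 8x, covering about five decades.
ratios = [I_COARSE[i + 1] / I_COARSE[i] for i in range(5)]
print(coarse_current(2))  # 4.45e-9 A
```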

Figure 10 :
Figure 10: Flexible DAC of n + 1 bits with minimal current (including the x 0 transistor, dashed line) and n bits without it (the dashed transistor connected to x 0 is bypassed).

Figure 11 :
Figure 11: Somatic circuit block diagram.All the conditional functions within the dashed outline can be disabled or bypassed using digital latches.

Figure 12 :
Figure 12: Comparison of the two somatic models in different operating regimes. With a lower gain value (left), the integration phase is logarithmic (linear I mem in Eq. (2)). With a higher gain value (right), the integration phase is linear (exponential I mem in Eq. (2)). The top two plots show the thresholded model with the firing threshold set to around 0.5 V. The bottom two plots show the exponential model, where V mem has an exponentially increasing shape that leads to the neuron firing. While we show data for the voltage across the output capacitor of the circuits, the neuron uses the current resulting from the voltage across the capacitor. This current is given by the exponential of the plotted voltage and is affected by the relevant transistor variables (e.g., U T , κ) [39].
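The runaway integration phase of the exponential model can be reproduced with a behavioral exponential integrate-and-fire simulation. This is a generic EIF sketch with made-up parameters, not a transistor-level model of the soma circuit:

```python
import math

def exp_if(i_dc, tau=20e-3, dt=1e-4, t_end=0.5,
           delta=0.02, v_t=0.5, v_spike=1.0):
    """Forward-Euler exponential integrate-and-fire:
    tau dV/dt = -V + delta * exp((V - v_t) / delta) + i_dc.
    The exponential term causes a runaway once V approaches v_t."""
    v, spikes = 0.0, []
    for k in range(int(t_end / dt)):
        dv = (-v + delta * math.exp((v - v_t) / delta) + i_dc) * dt / tau
        v += dv
        if v >= v_spike:        # runaway detected: emit spike and reset
            spikes.append(k * dt)
            v = 0.0
    return spikes

n_low = len(exp_if(0.1))   # sub-threshold drive: V settles, no spikes
n_high = len(exp_if(0.8))  # supra-threshold drive: regular firing
print(n_low, n_high)
```

Below rheobase the leak balances the input and the exponential term stays negligible; above it, V crosses v t and the exponential takes over, which is the sharp spike onset visible in the bottom plots.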

Figure 13 :
Figure 13: Refractory circuit and its block diagram. It combines the pulse extender from Section 3.3.1 with event-routing handshaking. In the idle state, both the request (req) and acknowledge (ack) signals are inactive (1). When the neuron emits a spike (spike = 0), req = 0 is sent to the encoder, which returns ack = 0. When both req = ack = 0, the pulse extender is triggered, which discharges the neuron until the spike disappears (spike = req = 1). The encoder then releases ack (ack = 1) and the refractory period starts, during which the neuron's membrane potential is clamped to ground; its duration is set by the discharge rate determined by I refractory.

Figure 16 :
Figure 16: Spike-frequency adaptation and calcium current. (a) Adaptation and calcium currents. The vertical bars are standard deviations over 200 trials. The neuron receives constant DC input. When it fires at time t = 0, the outputs of the adaptation and calcium DPIs increase by a certain amount and then decay exponentially. The adaptation current is subtracted from the DC and distal dendritic input. The calcium current has an independent weight and a longer time constant, and fluctuates around a level proportional to the average firing rate of the neuron. (b) Spike-frequency adaptation application example. When DC input is first presented at time t = 0, the neuron starts to spike at a high rate, causing the adaptation current to increase until it reaches a value high enough to shunt the input, and the firing rate drops. When the DC input is removed at t = 0.6 s, the adaptation current decays exponentially to 0, until the neuron starts firing again (at a high rate) when DC input is presented again at t = 1.4 s.
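The behavior in (b) can be reproduced with a behavioral leaky integrate-and-fire model in which every spike increments an adaptation current that is subtracted from the input and decays exponentially; all parameters are illustrative, not chip values:

```python
def lif_adapt(i_dc, w_adapt=0.1, tau_adapt=0.1,
              tau=20e-3, v_t=0.5, dt=1e-4, t_end=1.0):
    """LIF neuron with spike-frequency adaptation: each spike adds w_adapt
    to the adaptation current i_ad, which decays with tau_adapt and is
    subtracted from the drive, lengthening later inter-spike intervals."""
    v, i_ad, spikes = 0.0, 0.0, []
    for k in range(int(t_end / dt)):
        v += (-v + i_dc - i_ad) * dt / tau   # membrane integration
        i_ad -= i_ad * dt / tau_adapt        # adaptation decay
        if v >= v_t:
            spikes.append(k * dt)
            v = 0.0
            i_ad += w_adapt                  # spike-triggered increment
    return spikes

s = lif_adapt(1.0)
first_isi = s[1] - s[0]   # short interval before adaptation builds up
last_isi = s[-1] - s[-2]  # longer interval once adaptation has settled
print(first_isi, last_isi)
```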

Figure 17 :
Figure 17: Homeostasis. (a) The neuron receives Poisson-distributed input events at an average of 100 Hz starting at t = 0. To begin with, the neuron has a very high gain and thus a very high firing rate. This makes the calcium current I Ca much higher than the reference (target value, dashed line), and a down-regulation of the gain takes place. At around t = 2.5 s, the gain is low enough that the firing rate decreases and the calcium current drops below the reference value, and the gain regulation changes sign. The feedback regulation then keeps the firing activity (calcium current) fluctuating around the reference level. (b) Homeostasis dynamics on a longer timescale. The automatic gain control regulates the gain of the soma very slowly until the firing rate reaches the target in about 15 minutes. Both shorter (milliseconds to seconds) and longer (hours to days) time constants can also be achieved.
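The feedback loop in (a) can be captured with an abstract model: a calcium trace low-pass filters the firing rate, and the gain is slowly nudged toward the target rate. The linear gain-to-rate mapping below is a stand-in assumption for the real soma, and all parameters are illustrative:

```python
def homeostasis(rate_of_gain, target, gain0=8.0,
                eta=0.5, tau_ca=0.2, dt=1e-3, t_end=20.0):
    """Calcium-based homeostatic gain control: the calcium trace `ca`
    low-pass filters the firing rate, and the gain is regulated in the
    direction that pushes `ca` toward the target value."""
    gain, ca = gain0, 0.0
    for _ in range(int(t_end / dt)):
        rate = rate_of_gain(gain)          # neuron output at current gain
        ca += (rate - ca) * dt / tau_ca    # calcium tracks the rate
        gain -= eta * (ca - target) * dt   # negative-feedback regulation
    return gain, ca

# With a hypothetical linear mapping rate = 12.5 * gain, the loop settles
# where the rate equals the target: ca -> 25 Hz, gain -> 2.
gain, ca = homeostasis(lambda g: 12.5 * g, target=25.0)
print(gain, ca)
```

Scaling eta and tau ca scales the regulation speed, which mirrors the configurable millisecond-to-minutes time constants described for the chip.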

Figure 18 :
Figure 18: Synapse block diagram. The input pulse is the active-low match signal coming from the content-addressable memory (CAM) (see Section 5.1). The output current of the delayed weighted pulse extender (see Section 3.3.2) is copied and directed to one of the dendritic branches. The weight can come either from a 4-bit DAC of the type described in Section 3.7 (outputs I w , n = 3) or from the short-term plasticity (STP) output (V stp ), chosen by the latch STP (default 0 = DAC, 1 = STP). The delay current parameter comes from another 2-bit DAC of the type described in Section 3.7 (output I delay , n = 2, but with always-on I 0 ). The pulse-width control I pw is set by the SYPD_EXT parameter. The output demultiplexer uses one-hot encoding, where four latches control whether the current goes to each of the four dendritic-branch DPIs. For the two excitatory dendrites, AMPA and NMDA, a copy of the current is also provided to the double DPI (DDPI) responsible for producing alpha-function-shaped EPSCs (see Section 4.3.2).

Figure 19 :
Figure 19: Four groups of synaptic delay distributions. The configurations of the latches are given in Table 3. The measurement results show the standard deviations in I dly0 , I dly1 and I dly2 to be 5.4%, 6.7% and 37.1%, respectively. With these different standard deviations, the spread and position of the delay distribution can be freely configured via parameters, as I dly0 and I dly1 and/or I dly2 are summed depending on the individual synaptic configuration. The summed current then controls the effective delay applied.

Figure 23 :
Figure 23: Application examples for the conditional dendritic functions: conductance, alpha-function and NMDA gating. (a) The neuron has both excitatory dendrites in conductance mode; one has its reversal potential set to 0.5 V and a very high synaptic weight, while the other has its reversal potential at around 0.7 V but a low weight. The spiking threshold is set to around 0.6 V. Starting from the resting potential at around 0.35 V, when the first dendrite receives an input spike at time t = 0, it charges the soma up to its reversal potential, and when the second dendrite receives a spike shortly afterwards at time t = 5 ms, it further charges the soma until it crosses the firing threshold and emits a spike (the neuron then goes into its refractory period). However, if the second dendrite receives its input (at time t = 100 ms) before the first dendrite (at time t = 105 ms), the neuron does not emit a spike and slowly leaks back to its resting potential: the second dendrite by itself cannot drive the neuron to fire (due to its low weight), and the first dendrite cannot charge the soma beyond its reversal potential (0.5 V), which is lower than the firing threshold. Thus this neuron can be used to detect the order in time of the two inputs, since it fires if and only if one input comes shortly before the other. (b) The neuron uses both excitatory dendrites, one using the alpha function and the other only the normal DPI. If the first dendrite receives an input spike at time t = 0, it starts to charge the soma slowly (according to the alpha function), and if the second dendrite receives a spike shortly afterwards at time t = 20 ms, it charges the soma further to cross the firing threshold and emit a spike. However, if the second dendrite receives its input (at time t = 300 ms) before the first dendrite (at time t = 320 ms), the neuron does not emit a spike, since the effect of the second dendrite vanishes very quickly and the first dendrite by itself cannot charge the soma across the firing threshold. This mechanism introduces a delayed dynamic, so it can also be used to detect the order of the two inputs. (c) The neuron uses the AMPA and NMDA dendrites. If the AMPA dendrite receives an input spike at time t = 0, it charges the membrane potential to a value higher than the NMDA threshold (which is set to around 0.1 V), and if the NMDA dendrite receives a spike shortly afterwards at time t = 5 ms, it charges the soma to cross the firing threshold and emit a spike. However, if the NMDA dendrite receives its input (at time t = 100 ms) before the AMPA dendrite (at time t = 105 ms), then since the membrane potential at the moment the NMDA dendrite received the spike was still lower than the NMDA threshold, the NMDA input has no effect on the soma, and the AMPA dendrite by itself cannot charge the soma across the firing threshold, so the neuron does not emit a spike. This mechanism imposes an asymmetric condition on when the soma receives the input, so it can also be used to detect the order of the two inputs.

Figure 28 :
Figure 28 shows the full stack of DYNAP-SE2 software and hardware, from the user's Python code and the GUI at the top to the DYNAP-SE2 chips at the bottom.

Table 1 :
Summary of enhanced and new features of DYNAP-SE2 compared to a current multi-purpose mixed-signal prototyping platform, DYNAP-SE [20], offered by the neuromorphic engineering community.