From clean room to machine room: commissioning of the first-generation BrainScaleS wafer-scale neuromorphic system

The first generation of BrainScaleS, also referred to as BrainScaleS-1, is a neuromorphic system for emulating large-scale networks of spiking neurons. Following a ‘physical modeling’ principle, its VLSI circuits are designed to emulate the dynamics of biological examples: analog circuits implement neurons and synapses with time constants that arise from their electronic components’ intrinsic properties. It operates in continuous time, with dynamics typically matching an acceleration factor of 10 000 compared to the biological regime. A fault-tolerant design allows it to achieve wafer-scale integration despite unavoidable analog variability and component failures. In this paper, we present the commissioning process of a BrainScaleS-1 wafer module, providing a short description of the system’s physical components, illustrating the steps taken during its assembly and the measures taken to operate it. Furthermore, we reflect on the system’s development process and the lessons learned, and conclude with a demonstration of its functionality by emulating a wafer-scale synchronous firing chain, the largest spiking network emulation run with analog components and individual synapses to date.


I. INTRODUCTION
Simulating the dynamic properties of large-scale spiking neural networks is challenging due to the massively parallel interactions of their neurons and synapses. The BrainScaleS neuromorphic architecture proposes a solution to this dilemma by providing inherently parallel computation at nodes operating as neurons and synapses and communicating through asynchronous spikes. It thereby achieves a constant emulation speed with increasing network sizes [1].
BrainScaleS implements physical models of neurons and synapses on a CMOS substrate with analog circuits, while the spike communication is digital. On the one hand, the physical models inherently provide solutions to neuron and synapse dynamics in continuous time, in contrast to the time-discretized and numerically integrated solutions of digital systems and software simulations. On the other hand, the programmable digital communication of action potentials allows for flexible network topologies and the possibility of using digital logic to feed and read spike events from outside the system. Furthermore, circuits are operated in strong inversion, targeting dynamics with a typical speedup factor of 10 000 compared to biological real-time.
The BrainScaleS-1 system utilizes wafer-scale integration to achieve large ASIC counts with energy efficiency and high communication bandwidth. The structure of its underlying neuromorphic chip and the technology to achieve its wafer-scale integration are introduced in [2], [3], [4], [5]. Turning the silicon wafer into a ready-to-use system, though, requires bringing several additional components, shown in fig. 1, to work hand in hand. To this end, a commissioning chain is established, which is this paper's focus.
We first illustrate the different components that constitute the system and how they are tested. Then, we show the steps to assemble the module before it is finally placed in the machine room, as shown in fig. 2. In the second part of the paper, we describe the methods devised to turn such a system into a reliable substrate for neuromorphic experiments: a large number of VLSI analog components inevitably leads to malfunctioning parts and analog variability, for which an underlying fault-tolerant design and suitable handling have to be put in place. To demonstrate its operation and the successful implementation of these measures, a biologically motivated network of spiking neurons, a synchronous firing chain, is emulated on a fully commissioned BrainScaleS-1 wafer module.
The system belongs to the still-nascent field of neuromorphic computing and remains under continuous development. Having pioneered a neuromorphic wafer-scale integration of VLSI analog and digital circuits, we also discuss the lessons learned while solving or circumventing the challenges faced along the way.
Fig. 2. The BrainScaleS-1 machine room comprising 20 wafer modules organized in 5 racks. A slot in the middle of each rack hosts the Analog Readout Module and the Main Control Units of its neighboring wafer modules. Gigabit-Ethernet cables connect each wafer module via aggregation switches to the control cluster positioned in the middle rack. Taken from [6].

II. SYSTEM COMPONENTS AND INDIVIDUAL TESTS
A BrainScaleS-1 wafer module is depicted in fig. 1. Each of its constituent boards is individually tested before its integration into the system, which permits differentiating errors in the parts from those arising from the assembly. A short description of each component and the tests it undergoes is given in the following.

A. The BrainScaleS-1 Wafer
Fig. 3. (a) The BrainScaleS-1 wafer with applied post-processing to achieve wafer-scale integration and to establish its connection to the (b) bottom side of the Main PCB. There, the wafer connects through elastomeric connectors to the center, marked with 1. In the borders, 48 connectors, marked with 2, accommodate the communication boards.
The heart of each module is an uncut 20 cm wafer, displayed in fig. 3a, fabricated in UMC 180 nm technology comprising 384 High Input Count Analog Neural Network (HICANN) ASICs. Each HICANN contains 512 analog neuron circuits implementing the adaptive exponential integrate-and-fire model [5]. Single neuron circuits receive input from up to 220 analog synapses. Since neuron membranes can interconnect in groups of up to 64, a maximum of 14 080 synapses can provide input to each of these composite neurons. Synapse weights are stored with 4-bit resolution in local SRAM at each synapse. Each HICANN stores 12 384 analog quantities for parameterization of its analog circuits in Single-Poly Floating Gate (FG) CMOS cells that retain their operation levels according to their isolated gate's accumulated charge [7], [8]. These FGs are written via an onboard 10-bit-resolution DAC, enabling reprogramming via incremental loops with feedback. Then, the stored values get translated to either a voltage or a current using a source follower or a current mirror, respectively, to set neuron parameters and other onboard circuit operation levels. While these FGs present a low-power, small-space solution to store analog operation settings, they introduce write-cycle to write-cycle variability, as will be further discussed. Wafer-wide communication is achieved with a custom-developed redistribution layer applied post-wafer-production, creating around 160 000 lateral connections across chip borders [9]. These connections provide the modules with on-wafer spike event communication through low-voltage differential signaling (LVDS) buses utilizing an asynchronous serial event transmission protocol. Furthermore, connections through top-layer pads on the wafer provide the modules with parallel per-HICANN off-wafer communication, which, in conjunction with programmable and redundant components, constitutes the system's fault tolerance [2].
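As a quick check of the synapse counts quoted for composite neurons, the following minimal sketch reproduces the arithmetic. The function names are illustrative, and the per-HICANN neuron count is an assumed upper bound that ignores any placement constraints of the real chip.

```python
# Figures quoted in the text above.
SYNAPSES_PER_NEURON_CIRCUIT = 220
MAX_CIRCUITS_PER_COMPOSITE_NEURON = 64
NEURON_CIRCUITS_PER_HICANN = 512


def max_synaptic_inputs(n_circuits: int) -> int:
    """Upper bound on synaptic inputs of a neuron built from n_circuits circuits."""
    if not 1 <= n_circuits <= MAX_CIRCUITS_PER_COMPOSITE_NEURON:
        raise ValueError("a composite neuron spans 1 to 64 neuron circuits")
    return n_circuits * SYNAPSES_PER_NEURON_CIRCUIT


def max_neurons_per_hicann(n_circuits: int) -> int:
    """Assumed upper bound; real placement constraints are not modeled."""
    max_synaptic_inputs(n_circuits)  # reuse the range check
    return NEURON_CIRCUITS_PER_HICANN // n_circuits


if __name__ == "__main__":
    for size in (1, 4, 64):
        print(size, max_synaptic_inputs(size), max_neurons_per_hicann(size))
```

The sketch makes the trade-off explicit: larger composite neurons gain synaptic inputs at the cost of fewer neurons per chip.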
Testing: In order to assess the effect of wafer post-processing on the digital yield of an entire wafer, initial needle card tests were carried out on two unprocessed wafers to determine their yield immediately after production. Since the wafers undergoing these tests cannot be further processed, comparing results on the same wafers before and after the post-processing is, however, not possible.
The setup for these tests in the institute's clean room is shown in fig. 4, and the procedure is as follows. The needle card is used to contact each individual ASIC. Immediately after contacting and powering up, the total current on the used lab supply is measured to detect potential power shorts. Subsequently, all digital memory cells on the HICANN circuits are tested using a built-in Joint Test Action Group (JTAG) access mode. During these tests, 448 HICANNs on each of the two wafers were tested, and 93 % of them showed no single digital error. For comparison, UMC's calculator estimates a yield of approximately 85 % by taking into account the process parameters and circuit size. However, our results are only an estimate: on the one hand, the tested digital memory cells only cover a fraction of the whole silicon area, which is dominated by analog circuitry. Therefore, the digital test yield could be assumed to be too optimistic. On the other hand, perfect power and signal integrity could not be ensured while connecting the circuits through the needles, leading to a possible detection of false negatives, caused for example by slightly underpowered memory cells. In addition, only wafers from the initial engineering sample production have been available for testing. No documentation has been available to relate the production yield data from UMC to small batch-size engineering runs. Nonetheless, the results match the expectations taking the high level of uncertainty into account. Also, a yield in the order of, e.g., 85 % would not indicate that 15 % of the dies cannot be used. Instead, benefiting from the fault-tolerant design, and depending on the defect type, it could suffice to disable single neuron or synapse circuits, for example, on affected HICANNs that are otherwise fully functional and can remain available for experiments.

B. Main PCB
The Main PCB, displayed in fig. 3b, is a 43 cm × 43 cm passive interconnector board for most parts of the wafer-scale integration system. Seven of its 14 layers are used to distribute 23 power rails carrying up to 200 A of current. The rest of the layers are used to route 1152 power monitoring, 1472 high-speed differential communication, and different sideband signals. Auxiliary boards, communication infrastructure, and the silicon wafer are connected via various kinds of detachable connectors. These enable system modularity for development and upgrades, desirable for research and development in dynamic environments over longer timespans.
Testing: The manufacturer performs complete optical inspection and electrical tests of the Main PCB. The BrainScaleS-1 wafer modules are assembled using exclusively fully validated, error-free Main PCBs.

C. Auxiliary Boards
The wafer module is completed by populating it with 48 communication boards and auxiliary boards for power delivery, control, monitoring, and inter-module communication.
1) Communication Boards: Each communication board contains a field-programmable gate array (FPGA) and connects to one HICANN group consisting of 8 HICANNs. These boards communicate through separate high-speed LVDS interfaces with each of the connected HICANNs to configure, monitor, and coordinate the experiment runs; they feed and collect generated spikes into/from the experiments. Furthermore, they synchronize the start of experiments to allow for wafer-wide execution. Trigger signals generated on these boards also align experiments with analog recordings using the Analog Readout Module (AnaRM).
Testing: The communication boards are tested on a standalone setup that implements loopback connections for the high-speed interfaces. For this purpose, a test board accommodates and tests four PCBs in parallel, as shown in fig. 5a. Primarily automated and controlled via software, the tests switch the power supply via General Purpose Interface Bus. Programming is performed via JTAG and Power Management Bus. Tests comprising current consumption measurements, loading and communicating with the FPGA design, as well as memory tests are conducted. In addition, communication with the host computer as well as the links to the wafer and neighboring communication boards are tested. As per data logs, only 18 out of 1404 produced PCBs had to be discarded after failed tests.
2) Wafer I/O PCB: Each one of the module's four Wafer I/O PCBs (WIOs) attaches to twelve communication boards, aggregating Gbit-Ethernet and connections to other communication boards.
Testing: A manual approach is followed as the number of boards is smaller than that of the communication boards. The board, shown in fig. 5b, is supplied with power, and the proper functioning of the DC/DC converters is checked with a multimeter. Individual communication ports are tested. In addition, the proper transmission of signals using a signal generator and differential probes is measured. A partial test of the JTAG pins is also carried out. As per data logs, only 2 out of 120 produced WIOs were discarded after failed tests.
3) Main Power Supply: The Main Power Supply (PowerIt) has three output channels: two 1.8 V outputs as main analog and digital supplies of the wafer with a current limit of 200 A each, as well as a 9.6 V output capable of up to 110 A to supply
Testing: Commissioning of the PowerIt involves basic functionality tests and calibration of the current and voltage measurement circuits using an external electronic load capable of sinking 4.8 kW and precision multimeters, see fig. 5c.
4) Auxiliary Power Supply: The Auxiliary Power Supply PCB (AuxPwr), designed in [10], receives 9.6 V from the PowerIt and provides ten different voltage outputs for the wafer module. The currents drawn at the derived voltages vary from 50 mA, for the common-mode voltage of the LVDS on-wafer communication, to 60 A for the synapse driver output. The board has an L-shape with linear and switching regulators placed on different axes to reduce the coils' electromagnetic-noise induction. In addition, the usage of intermediate voltages reduces the power dissipation for the voltage scaling. An onboard microcontroller monitors all the voltages and currents. Four voltages can be controlled digitally through the Inter-Integrated Circuit (I²C) protocol.
Testing: The AuxPwr components' functionality is tested during the calibration process of the board, during which an external voltmeter permits adjusting voltage offsets. A two-point linear calibration under load is performed for the currents. The test stand can be seen in fig. 5d.
5) Control Unit for Reticles: Since the BrainScaleS-1 wafer is not cut into individual chips, the wafer module must be fault-tolerant to individual HICANN problems. For this purpose, the Main PCB features power-FETs for the supply rails of each HICANN group of the wafer; overcurrents manifest as a large voltage drop across these power transistors. The Control Unit for Reticles (CURe) controls the gates of these transistors and monitors the supply voltages of the wafer. Three microcontrollers manage the measured data and react to fault conditions by shutting off the power of the affected HICANN groups. Thus, the CURe makes it possible to identify individual fatal faults and to exclude the respective HICANN groups from the usable components. The term reticle stems from the semiconductor manufacturing process; here, one reticle corresponds to one HICANN group.
Testing: The CURe is tested using a custom setup producing the voltages expected inside the actual BrainScaleS-1 wafer module, simulating all possible fault conditions while the response time is measured. Likewise, the drive strength of the control signals for the power transistors on the Main PCB is quantified. The test setup is displayed in fig. 5e.
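The protective behavior of the CURe described above can be illustrated with a minimal sketch: a per-group voltage drop across the power transistor is compared with a limit, and groups exceeding it are switched off and removed from the usable set. The class name, the threshold value, and the data layout are hypothetical and do not reflect the actual microcontroller firmware.

```python
from dataclasses import dataclass, field

N_HICANN_GROUPS = 48  # one HICANN group (reticle) per communication board


@dataclass
class ReticleSupervisor:
    """Hypothetical model of the overcurrent reaction, not the real firmware."""

    max_drop_v: float  # assumed limit for the drop across a power FET
    enabled: dict = field(
        default_factory=lambda: {g: True for g in range(N_HICANN_GROUPS)}
    )

    def update(self, drops_v):
        """Process measured drops (group -> volts); return groups shut off now."""
        tripped = []
        for group, drop in drops_v.items():
            if self.enabled[group] and drop > self.max_drop_v:
                self.enabled[group] = False  # open the power FET of this group
                tripped.append(group)
        return tripped

    def usable_groups(self):
        """Groups that remain powered and available for experiments."""
        return [g for g, on in self.enabled.items() if on]


if __name__ == "__main__":
    supervisor = ReticleSupervisor(max_drop_v=0.1)
    print(supervisor.update({3: 0.02, 7: 0.25}))
    print(len(supervisor.usable_groups()))
```

Keeping the shut-off state per group mirrors the idea that a single fatal fault removes only the affected reticle while the rest of the wafer stays operational.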
6) Analog Readout Module: Further insight into the neuron dynamics can be obtained via measurements of the membrane potential, allowing for a better understanding of experiment results and the implementation of calibration routines. To this end, each neuron contains a switchable analog output amplifier that connects to one of two 50 Ω output buffers per die. These two outputs are each short-circuited across dies in the same HICANN group. Therefore, each of these groups has two analog outputs, totaling 96 independent analog channels available on each wafer module.
The AnaRM system consists of twelve FPGA-controlled 12-bit ADC modules that allow for the digitization of the membrane voltages on one wafer module per BrainScaleS-1 system rack. Each of the modules in the AnaRM system connects through a ribbon cable to one of two Analog Breakout PCBs mounted on the Main PCB, receiving eight analog signals that are multiplexed into the ADC. An additional digital signal acts as a trigger; four HICANN groups share one, allowing synchronization during an experiment between the involved communication boards, HICANNs and the AnaRM system. Overall, the AnaRM system can simultaneously sample 12 membrane traces per wafer module.
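The channel counts above are mutually consistent, as the following small sketch shows. The constants are taken from the text; the mapping from a channel index to an ADC module and multiplexer input is a hypothetical numbering chosen only for illustration.

```python
HICANN_GROUPS_PER_MODULE = 48  # one group per communication board
ANALOG_OUTPUTS_PER_GROUP = 2
ADC_MODULES = 12
ANALOG_INPUTS_PER_ADC = 8      # multiplexed into a single ADC


def analog_channels() -> int:
    """Independent analog channels offered by one wafer module."""
    return HICANN_GROUPS_PER_MODULE * ANALOG_OUTPUTS_PER_GROUP


def adc_inputs() -> int:
    """Analog inputs offered by the AnaRM system."""
    return ADC_MODULES * ANALOG_INPUTS_PER_ADC


def simultaneous_traces() -> int:
    """One trace per ADC at a time, since its inputs are multiplexed."""
    return ADC_MODULES


def adc_for_channel(channel: int):
    """Hypothetical mapping: channel index -> (ADC module, multiplexer input)."""
    if not 0 <= channel < analog_channels():
        raise ValueError("channel index out of range")
    return divmod(channel, ANALOG_INPUTS_PER_ADC)


if __name__ == "__main__":
    print(analog_channels(), adc_inputs(), simultaneous_traces())
```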
Testing: The FPGA board in the AnaRM, displayed in fig. 5f, undergoes DRAM memory tests and basic functional testing of all its peripheral components. The analog front end is tested during the calibration of the modules. This calibration is performed using a source meter to generate a series of ground-truth voltages, which are subsequently measured using each input channel. A 50 Ω series impedance is used at the output of the source meter to match the impedance of the output buffers on the HICANN. This voltage divider formed by the output and input impedances halves the 1.8 V span of the HICANN output to the 0.9 V maximum input of the AnaRM. A linear function fits the recorded signal to the source meter voltages, and the per-board offset and gains are stored in a database.
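A minimal sketch of such a per-channel linear calibration is given below. The ADC is simulated with an assumed gain and offset error, the sweep values are arbitrary, and all names are illustrative; only the 12-bit resolution, the halving voltage divider, and the 0.9 V input range are taken from the text.

```python
ADC_BITS = 12
ADC_MAX_CODE = 2**ADC_BITS - 1
ADC_FULL_SCALE_V = 0.9  # maximum AnaRM input voltage quoted in the text


def simulated_adc_code(source_v, gain_error=0.98, offset_codes=15):
    """Hypothetical ADC channel with an assumed gain and offset error."""
    input_v = source_v / 2.0  # 50 Ohm series / 50 Ohm input voltage divider
    ideal = input_v / ADC_FULL_SCALE_V * ADC_MAX_CODE
    code = round(gain_error * ideal + offset_codes)
    return max(0, min(ADC_MAX_CODE, code))


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = gain * xs + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = sxy / sxx
    return gain, mean_y - gain * mean_x


def calibrate_channel(source_voltages):
    """Return (gain, offset) translating ADC codes to source-meter volts."""
    codes = [simulated_adc_code(v) for v in source_voltages]
    return fit_line(codes, source_voltages)


sweep = [0.2 * i for i in range(10)]  # 0.0 V to 1.8 V ground-truth voltages
gain, offset = calibrate_channel(sweep)


def code_to_volts(code):
    """Apply the stored calibration to a raw ADC code."""
    return gain * code + offset
```

Because the fit maps raw codes directly to source-meter voltages, the factor of two introduced by the divider is absorbed into the stored gain.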

D. Main Control Unit
The Main Control Unit (MaCU) consists of a Raspberry Pi powered by the standby voltage of the PowerIt. Using the I²C protocol to communicate with all other wafer module components, it controls the start-up sequence of the system. Additionally, it monitors the multitude of components of a wafer module, which is crucial to ensure robust operation. With this in mind, the MaCU aggregates over 1800 metrics per wafer, e.g., supply voltages, temperatures, or the active/inactive status of components. Most data is of a time-series nature and stored via Graphite [11], with visualization through Grafana dashboards [12]. These dashboards are hierarchically structured, allowing an intuitive drill-down navigation of the data. As it is not practical to manually oversee such a large number of metrics, alerts are set up to check for unexpected events. For example, supply voltages are checked to be in a valid range and to remain constant over time. Furthermore, event data, e.g., powering up components, is handled via the ELK stack [13] but also integrated into Grafana and displayed as marks. These allow easily matching the events with changes in the time-series data.
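The two supply-voltage checks mentioned above, a valid range and constancy over time, can be sketched in a few lines. This is a stand-alone illustration with hypothetical thresholds and alert names; it does not use or reproduce the Graphite or Grafana alerting configuration of the actual system.

```python
def check_supply(samples_v, low_v, high_v, max_spread_v):
    """Return a list of alert names for one supply-voltage time series.

    samples_v:     recent voltage samples of one metric
    low_v, high_v: assumed valid range
    max_spread_v:  assumed tolerated peak-to-peak variation
    """
    alerts = []
    if any(not low_v <= v <= high_v for v in samples_v):
        alerts.append("out_of_range")
    if max(samples_v) - min(samples_v) > max_spread_v:
        alerts.append("not_constant")
    return alerts


if __name__ == "__main__":
    print(check_supply([1.80, 1.81, 1.79], 1.7, 1.9, 0.05))
    print(check_supply([1.80, 1.95], 1.7, 1.9, 0.05))
```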
Testing: The Raspberry Pi computers used for the MaCUs are purchased and commissioned without further tests. However, the maintenance and deployment of the control and monitoring software they run is part of the system's continuous integration development methodology [14].

III. SYSTEM ASSEMBLY AND INTEGRATION TESTS
In addition to the tests devised for the individual components, the BrainScaleS-1 wafer module assembly process is carried out along with additional tests that allow pinpointing problems to the individual steps. In the following, we discuss the module assembly method and the different tests it undergoes during this phase.

A. Wafer to Main PCB Marriage and Module Integration
The wafer is connected to a total of 11 904 pads on the Main PCB via 384 elastomeric connectors, shown in fig. 6a. Mounting the Main PCB and the silicon wafer in custom-milled aluminum brackets allows reaching the compression forces required by the connectors. The station used to align the two components is shown in fig. 6b. Electrical resistance tests, described in section III-B2, are performed while compressing the elastomeric connectors to ensure correct positioning and even pressure distribution. Then, the wafer module is populated with the auxiliary boards and, when fully assembled, connected to the MaCU. Afterward, it is put on a test stand for initial full-system tests using the same communication chain later used for experiments. Following this step, the wafer module is placed in a rack in the machine room and attached to the AnaRM system.

B. Tests at Different Assembly Stages
Stage-specific tests allow mapping arising errors to individual assembly steps of the BrainScaleS-1 wafer module, which enables evaluating and improving the procedure. This section shows the test results obtained for one wafer as an example.
1) Pre-Assembly Tests of All HICANNs on the Wafer: Before placing a wafer in a module, digital and analog tests are performed on a wafer prober in the institute's clean room, see fig. 4. These tests distinguish production problems from those arising in the wafer module assembly procedure. Similar to the initial needle card tests on the unprocessed wafers, described in section II-A, a test system was built using a different needle card connecting to the redistribution layer of a pair of HICANNs on a wafer with post-processing. Extended analog and digital tests are run on the connected dies, a process that is repeated until the entire wafer is analyzed. These tests serve two purposes: first, to sort out wafers with a high error count that might arise from disrupted connections in the post-processing, and second, to establish a base level for the following assembly tests. Figure 7a shows the results of a high-level test for all HICANNs of one wafer. The image shows more test results than the number of dies on the picture of the assembled wafer module. The reason lies in design constraints and limited routing resources on the Main PCB, due to which not all HICANN groups could be electrically connected and thus used within the module context; those at the edge of the wafer were left out. For the same reason, the two HICANN groups at the center are without high-speed connection.
2) Tests During the Assembly Phase: For these additional tests, the Main PCB is equipped with test PCBs¹, shown in fig. 6c, which measure ESD diode currents and termination resistances between the LVDS lines on the wafer. The tests determine whether a good connection of the wafer to the Main PCB exists. Figure 7b shows the result of one of these tests, where only the same faulty device on HICANN group 29, also detected in the needle card test, can be seen. The absence of additional faulty devices confirms that the wafer to Main PCB marriage was successful.
3) Post-Assembly Tests of All HICANNs on the Wafer: After the assembly of the wafer module is completed, the same tests run in the pre-assembly phase are conducted, and the results are compared. The results for one test are shown in fig. 7c. The errors in HICANN groups 15 and 29 are still present, while the errors in groups 36 and 42 are not. Further investigations could trace these last errors to connection problems of the needle card used in the wafer prober.
¹ Developed by the group of Yasar Gürbüz at Sabanci University, Istanbul

IV. COMMISSIONING SOFTWARE
After assembly, additional steps are necessary to bring the BrainScaleS-1 wafer module into readiness for experiments. These include digital tests to find and exclude malfunctioning components and calibrating the individual neurons to address manufacturing-process-induced circuit mismatches. Databases store the results from these two steps, allowing serialized data storage to disk. See [14] for details. Furthermore, all steps are fully automated and periodically executed after installation of the module in the machine room to track the system's current state.

A. Communication Tests
The first test that is executed on a newly assembled wafer module is the communication test, which is used to find unresponsive HICANNs. Communication problems most likely arise from insufficient connection quality between the Main PCB and the wafer, cf.

B. Memory Tests
Using a whole uncut wafer, each BrainScaleS-1 wafer module profits from better energy efficiency and higher bandwidth for communication between its ASICs than if these were produced separately and then integrated. This approach presents a challenge, though: producing an error-free wafer-scale system in such a way is not possible, as ASICs with manufacturing-induced problems cannot be removed. The BrainScaleS-1 system addresses this through a digital memory test, which, in conjunction with the fault-tolerant system design, enables dynamic handling of malfunctioning components. Executed after assembly as well as periodically, the test also tracks the state of the systems over time. Therefore, it allows operating wafer modules despite a subset of malfunctioning components or connections, consequently increasing the yield of functional systems. The test builds upon the communication test and establishes a connection to a HICANN group. First, it initializes the connected communication board and the HICANN under test. Subsequently, each digital memory is repeatedly write/read-tested using random values. If a mismatch is found, the largest functional unit that depends on the malfunctioning component is excluded so that it is not utilized in experiments. HICANNs that can communicate only via JTAG are exclusively used for spike route-through to and from neighboring HICANNs on the same wafer. For these, a routing-specific reduced memory test is applied (see table I). Tested components and their position on the HICANN are visualized in fig. 8.
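The write/read procedure with random values can be sketched as follows. The memory model, the component names, and the mapping from a memory to the functional unit that is excluded are hypothetical stand-ins; the real test talks to the hardware through the communication boards.

```python
import random


class FakeMemory:
    """Stand-in for an on-chip digital memory; a mask emulates stuck-at-0 bits."""

    def __init__(self, n_words, width, stuck_zero_mask=0):
        self.width = width
        self.words = [0] * n_words
        self.stuck_zero_mask = stuck_zero_mask

    def write(self, address, value):
        self.words[address] = value & ~self.stuck_zero_mask

    def read(self, address):
        return self.words[address]


def memory_ok(memory, repetitions=10, rng=None):
    """Write random patterns and read them back; False on the first mismatch."""
    rng = rng or random.Random(0)
    n_words = len(memory.words)
    for _ in range(repetitions):
        pattern = [rng.getrandbits(memory.width) for _ in range(n_words)]
        for address, value in enumerate(pattern):
            memory.write(address, value)
        if any(memory.read(a) != v for a, v in enumerate(pattern)):
            return False
    return True


def run_memory_test(memories, depends_on):
    """Return the functional units to exclude (hypothetical dependency map)."""
    excluded = set()
    for name, memory in memories.items():
        if not memory_ok(memory):
            excluded.add(depends_on[name])
    return excluded
```

A usage example would pass a dictionary of memories per HICANN together with a dependency map and store the returned set in the availability database.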
With 110 KiB per HICANN, the configuration registers of the synapses make up the largest part of the tested memory. They are split into two synapse arrays per HICANN, each of which is programmed by a custom on-chip SRAM controller described in [15]. In the tests, unstable behavior is observed on 1.97 % of the synapse arrays. That is, consecutive write/read operations with fixed values on a single synapse register show varying results. Since problems in individual synapse registers are very unlikely and could also derive, e.g., from the control chain, a special stability test is introduced. There, each register is tested several times with the same value. If a single register shows unstable behavior, the whole synapse array is excluded. Thereby, at the expense of functional components, only stable programmable synapses are used during experiments.
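The idea of the stability test, repeating the same write/read on every register and rejecting the whole array on a single deviation, is sketched below. The register model with a deterministic intermittent fault is purely illustrative and does not describe the actual failure mechanism.

```python
class RegisterArray:
    """Stand-in for a synapse array; flaky registers misread intermittently."""

    def __init__(self, n_registers, flaky=(), period=3):
        self.values = [0] * n_registers
        self.flaky = set(flaky)
        self.period = period  # every period-th read of the array is disturbed
        self.reads = 0

    def write(self, index, value):
        self.values[index] = value

    def read(self, index):
        self.reads += 1
        value = self.values[index]
        if index in self.flaky and self.reads % self.period == 0:
            return value ^ 1  # emulate an unstable least-significant bit
        return value


def array_is_stable(array, value=0b1010, repetitions=10):
    """Test each register several times with the same value."""
    for index in range(len(array.values)):
        for _ in range(repetitions):
            array.write(index, value)
            if array.read(index) != value:
                return False  # one unstable register excludes the whole array
    return True
```

In this toy model, a single write/read per register can miss the intermittent fault, whereas ten repetitions expose it, which is the motivation for repeating the operation.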
A test with ten write/reads of random data per component and a stability test with ten repetitions takes approximately 70 s per HICANN. Since the tests can be executed in parallel for each HICANN group, a full wafer test takes approximately 10 min and can be executed periodically to track the state of the systems.
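The quoted runtime is consistent with a simple estimate, assuming that the eight HICANNs of a group are tested one after another, that all groups run fully in parallel, and that no further overhead occurs. These assumptions are ours and are only meant to show where a figure of roughly 10 min can come from.

```python
SECONDS_PER_HICANN = 70   # from the text
HICANNS_PER_GROUP = 8     # from the text


def full_wafer_test_seconds() -> int:
    """Estimated duration with sequential HICANNs and parallel groups."""
    return SECONDS_PER_HICANN * HICANNS_PER_GROUP


if __name__ == "__main__":
    seconds = full_wafer_test_seconds()
    print(f"{seconds} s, i.e. about {seconds / 60:.1f} min")
```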

C. Effective Exclusion of Components
In special cases, it is not enough to skip malfunctioning components during an experiment, but it is also important to be aware of hardware-specific dependencies that can be linked with these components. This is achieved through an additional step, the effective exclusion of components, where functional but dependent components are excluded. Several dependencies lead to an effective exclusion. Some of them are visualized in fig. 8.
• Unstable repeater controller: To enhance the signal integrity of spike events that have to be routed across several HICANNs, the signal is regenerated between dies by repeaters. These repeaters are organized in blocks where each block has a custom on-chip controller used to program its repeaters. Since failures in the digital memory of the repeaters are very unlikely, more than one failing repeater per block indicates that there could also be a problem in the control chain. To ensure no unstable components are used, all repeaters connected to the corresponding repeater block are removed from the availability database in such cases.
• Buses connected to malfunctioning repeaters: Buses are used to route spike events between neuron circuits. On boundaries between two HICANNs, the buses are connected to repeaters that regenerate the signal. Each repeater is connected to a bus on its own HICANN as well as on a neighboring one. If a repeater fails the memory test, there is no possibility to test whether it sends wrong signals to its connected buses. To circumvent this, all buses connected to such a repeater are excluded and thus not used during an experiment. The same holds for repeaters on HICANNs without JTAG connection. As the repeaters cannot be initialized correctly, all neighboring buses connected to repeaters on the problematic HICANN are excluded.
• Malfunctioning FG controller: The FGs are not only used to configure the neurons but also to supply bias voltages to the spike event routing. If an error in the controller programming the FGs is found, the whole HICANN is excluded from the availability database and, in the following, treated as if there were no JTAG connection. Such a HICANN is not used at all in experiments.
• Without high-speed: HICANNs that have no high-speed connection are, due to the higher bandwidth requirements, not used to emulate neurons or external inputs but only used to route spike events. This is achieved by removing all neurons and external input mergers from the availability database.
• No routing options: To improve the placement and prevent lost connections, the algorithm checks that all the components required to establish a route from each neuron and external input merger are available. If not, the neuron or the external input merger is excluded and therefore skipped in the process of building a network.
• Handling hardware versioning: In an earlier version of the post-processing, connections were established to HICANNs on the edges of the wafer that must not be connected. To prevent leakage currents from these dies, the connected buses are excluded. Therefore, it is unnecessary to distinguish wafer versions in all the following steps.
An overview of removed components before and after the effective exclusion of components can be seen in table I. The availability database, used to handle the excluded components, allows for storing different states on disk, so malfunctioning components and effectively excluded components can be differentiated afterward. This is, for example, important during the initialization of the HICANNs, where only malfunctioning components have to be handled specifically.
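Two of the dependency rules listed above, the unstable repeater controller and the buses connected to malfunctioning repeaters, can be expressed as a small set computation. The identifiers and the dictionary-based data layout are hypothetical and serve only to illustrate how effective exclusions follow from the memory test results; the actual availability database is described in [14].

```python
from collections import Counter


def effective_exclusions(failed_repeaters, repeater_block, repeater_buses):
    """Derive effectively excluded repeaters and buses.

    failed_repeaters: repeaters that failed the memory test
    repeater_block:   repeater -> block whose controller programs it
    repeater_buses:   repeater -> buses it drives (own and neighboring HICANN)
    """
    failures_per_block = Counter(repeater_block[r] for r in failed_repeaters)
    # More than one failing repeater hints at a problem in the control chain.
    unstable_blocks = {b for b, n in failures_per_block.items() if n > 1}

    excluded_repeaters = set(failed_repeaters) | {
        r for r, block in repeater_block.items() if block in unstable_blocks
    }
    # Buses of repeaters that failed the memory test cannot be trusted.
    excluded_buses = {
        bus for r in failed_repeaters for bus in repeater_buses.get(r, ())
    }
    return excluded_repeaters, excluded_buses


if __name__ == "__main__":
    blocks = {"r0": "A", "r1": "A", "r2": "A", "r3": "B", "r4": "B"}
    buses = {"r0": ["h0", "h1"], "r1": ["h2", "h3"], "r2": ["h4", "h5"],
             "r3": ["v0", "v1"], "r4": ["v2", "v3"]}
    print(effective_exclusions({"r0", "r1", "r3"}, blocks, buses))
```

In the example, block A loses all its repeaters because two of them failed, while block B only loses the single failing repeater.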

D. Analog Readout Tests
Before usage, the analog recording system is verified for correct connectivity and configuration by running a series of tests. Each HICANN is set in sequence to generate two different voltage levels, which the AnaRM measures. The voltage levels originate from the configuration of one of the FGs. A recording that agrees with the settings and whose noise levels are within a tolerance threshold indicates that the system is ready for experiments or calibration runs.
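The acceptance criterion described above, agreement with the configured level and a bounded noise level, can be sketched as follows. The tolerance values, the function names, and the use of the standard deviation as noise measure are assumptions made for illustration.

```python
from statistics import mean, pstdev


def recording_ok(trace_v, expected_v, tolerance_v, max_noise_v):
    """Check one recorded trace against the configured voltage level."""
    level_matches = abs(mean(trace_v) - expected_v) <= tolerance_v
    noise_acceptable = pstdev(trace_v) <= max_noise_v
    return level_matches and noise_acceptable


def hicann_readout_ok(traces_v, levels_v, tolerance_v=0.01, max_noise_v=0.005):
    """A HICANN passes if the recordings of both voltage levels pass."""
    return all(
        recording_ok(trace, level, tolerance_v, max_noise_v)
        for trace, level in zip(traces_v, levels_v)
    )


if __name__ == "__main__":
    low = [0.500, 0.502, 0.498, 0.501, 0.499]
    high = [0.900, 0.901, 0.899, 0.900, 0.900]
    print(hicann_readout_ok([low, high], [0.5, 0.9]))
```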

E. Calibration
VLSI transistors are subject to manufacturing variations translating into differences in signal response. This problem and the potential impacts have been noted since the first approaches to neuromorphic computing using VLSI [16]. Consequently, the HICANN's microelectronic analog circuits require correction mechanisms to deliver homogeneous responses.
As the manufacturing variability is stationary within the components' operating ranges, thus termed fixed-pattern noise, it can be reduced by suitable calibration. To this end, a framework has been developed for the BrainScaleS-1 wafer module that performs a one-time circuit characterization through running sequences of experiments that sweep neuron parameters, measure the effect in the observable, and perform appropriate fits on suitable models. The process creates a database that holds the calibration results and is loaded on routine hardware usage, allowing for automatic translation between biological-space parameters and FG-stored parameters. Such a conversion is automated and transparent for the users when running an experiment. See [14] for details.
The calibration procedure configures all the neuron circuits at once and then processes the individual measurements to allow for programming the FGs in parallel. In addition, parallelizing the analysis algorithms on the already measured steps further optimizes the time required for calibration. Nevertheless, increasing the number of calibration steps could improve the quality of the fits, and parameters that are more sensitive to FG parameter variability benefit from a larger number of measurement repetitions. Consequently, calibration time and precision of the results require balancing.
1) Calibration Methodology: In the BrainScaleS-1 system, the only analog neuron property that can be directly recorded is the membrane voltage.Accordingly, all parameter calibrations are based on membrane recordings under different parameter configurations.In general, the calibration of one parameter sweeps over its operating range while maintaining the rest of the parameters constant.The execution order is relevant, as some calibration routines require an already calibrated subset of parameters.Furthermore, the calibration accounts for analog readout noise, and measurements can be repeated to factor in FG parameter variability.
The main neuron calibration parameters are summarized in fig.9.In the following, the calibration procedure is exemplarily shown for the parameter I pulse , which controls the refractory period τ ref , i.e., the time after the emission of a neuron's action potential during which its membrane is clamped to the reset potential and the neuron can elicit no further spike.The higher I pulse is, the shorter the achieved τ ref .Each I pulse calibration step sets the resting potential E leak above the level at which a spike event is elicited, i.e., V threshold , which causes the neurons to spike continuously.The inter spike interval (ISI) is the measurable result.
In the first step, I pulse is set to maximum, and the corresponding ISI is regarded as ISI 0 , the minimum attainable interval under the current settings. Larger refractory periods are referenced to ISI 0 by using τ ref = ISI − ISI 0 , making the minimum τ ref zero seconds by definition.
The model fitted to the measured pairs of I pulse and τ ref , eq. (2), derives from transistor-level simulations described in [17]. The resulting fits for five neurons are shown in fig. 10.
The pair of constants c 0 and c 1 corresponding to model eq.( 2) is stored in the calibration database for each neuron, which is then used for translation from τ ref in seconds to I pulse in digital value.Further details for each parameter calibration are provided in the supplementary material.
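The referencing step of this procedure, in which each measured ISI is reduced by the ISI obtained at maximum I pulse, can be sketched as below. The data layout (a dictionary from I pulse setting to spike times) and the function names are assumptions for illustration; the subsequent model fit is not reproduced here.

```python
from statistics import mean


def mean_isi(spike_times):
    """Mean inter-spike interval of a continuously spiking neuron."""
    intervals = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return mean(intervals)


def refractory_periods(spikes_by_ipulse):
    """Map each I_pulse setting to tau_ref = ISI - ISI_0.

    spikes_by_ipulse: dict {I_pulse setting: list of spike times}
    (hypothetical layout). ISI_0 is the ISI at the largest I_pulse
    of the sweep, so the smallest tau_ref is zero by definition.
    """
    isi = {ipulse: mean_isi(times) for ipulse, times in spikes_by_ipulse.items()}
    isi_0 = isi[max(isi)]
    return {ipulse: value - isi_0 for ipulse, value in isi.items()}
```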
Depending on each parameter's sensitivity to the programmed FG values, some calibrations enable a more precise setting of parameters than others. An increased sensitivity due to non-linear hardware dependencies is found where small changes in FG values cause large changes in the observables. Furthermore, for some FGs only a limited range of their available parameter space is used, reducing the ability to set their corresponding parameters precisely.
As can be seen from the measured values in fig. 10, such is the case for I pulse .For comparison, fig.11 shows how the leak potential E leak , which is easier to control, obtains a more precise calibration than I pulse .For this reason, the control precision of several parameters was improved in the second-generation BrainScaleS-2 chip [18] partly by enabling digital value storage.
2) Synapse Weight Calibration: The calibration of the synaptic input differs from the other calibrations due to its additional dependency on the synapse drivers. The strength of a synapse is configured by three hardware parameters: the 4-bit digital weight w stored per synapse, a scaling factor gmax div stored per synapse row, and the FG-stored reference parameter V gmax . This last parameter is set per synapse row and selects one of four possible values shared by blocks of 128 neurons. Calibrating this large parameter space for each of the 512 neurons with 110 connected synapse drivers is not possible in a reasonable time frame with the analog readout system, which measures only 12 membrane traces in parallel. Therefore, a per-wafer translation is performed, where only some of the components are taken into account to find the average circuit behavior. The measurement requires the results of all previous calibrations. Neurons on different HICANNs are stimulated by a single spike for different combinations of the three hardware parameters to cover the whole parameter range. Subsequently, a fit of the conductance-based neuron model is applied to the recorded membrane traces to extract the ratio w bio /C HW between biological weight and membrane capacitance. Since the membrane capacitance is fixed during experiments, it is unnecessary to determine both values separately. During the fit, the model parameter of the already calibrated reversal potential is fixed. The reduced χ 2 value of the fit is used to identify and exclude saturation effects of the involved operational transconductance amplifier OTA 1 , cf. fig. 9, which might occur for large weight values. Finally, the weight translation is found by fitting the expected hardware behavior, adapted from [19], to the results of the first fits. The fit parameters i 0-8 characterize the effect of parasitic capacitances found in the synaptic circuit for each enabled bit of the 4-bit weight value w.
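The use of the reduced χ 2 value as a quality cut can be illustrated with a short sketch. The helper names, the assumption of a single noise level `sigma` for all samples, and the cut value `max_chi2` are hypothetical; the actual criterion used to flag saturated traces is not specified in the text.

```python
def reduced_chi2(measured, model, sigma, n_params):
    """Reduced chi-square: squared, noise-normalized residuals
    summed and divided by the degrees of freedom."""
    dof = len(measured) - n_params
    if dof <= 0:
        raise ValueError("need more samples than fit parameters")
    chi2 = sum(((m - f) / sigma) ** 2 for m, f in zip(measured, model))
    return chi2 / dof


def keep_fit(measured, model, sigma, n_params, max_chi2=2.0):
    """Accept a fit only if its reduced chi-square is below max_chi2.

    max_chi2 is an illustrative cut, not a value from the paper.
    """
    return reduced_chi2(measured, model, sigma, n_params) <= max_chi2
```

A trace distorted by amplifier saturation no longer follows the model shape, so its residuals, and with them the reduced χ 2, grow well beyond the value expected from readout noise alone.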
Figure 12a demonstrates the large parameter space of the synapse weight calibration. It shows the measurement of a single neuron, stimulated by a single synapse driver for a single V gmax value without rewriting the FGs. The performance of the fit applied on the whole measured parameter space is shown for fixed values of gmax div in fig. 12b and for fixed digital weight values w in fig. 12c. Although the whole neuron circuit, and consequently the expected noise of each individual component, is involved, the error of each measurement does not exceed the variations observed in other calibrations. However, additional deviations arise from rewriting the FGs, as demonstrated in fig. 12d; this makes the search for a more precise fit function of little benefit. In addition, the per-wafer calibration, chosen over a per-neuron-circuit calibration, introduces a dominant error due to the deviations between neuron circuits, shown in fig. 12e. A precise weight calibration within a reasonable runtime would be achievable via a parallel measurement of each neuron circuit. This would also allow excluding neurons that show unintended behavior. However, this is not possible with the currently used analog readout system. Nonetheless, the lack of a perfect weight calibration can be circumvented via in-the-loop training on the BrainScaleS-1 system, as shown for inference tasks in previous results [6].

3) Calibration Based Exclusion of Components:
The operation of the HICANNs during the calibration is similar to the operation during experiments.All components have to work correctly for the calibration to succeed.Failing calibrations indicate unintended behavior.This allows for testing the whole die, especially the analog circuits that cannot be tested directly.Additionally, thresholds can be defined to exclude outliers.Consequently, neurons that do not pass all calibration steps are excluded from the availability database.Numbers of calibration based excluded neurons on a typical wafer are given in table II.
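The exclusion rule, under which a neuron stays available only if it passes every calibration step, amounts to a set intersection. The sketch below uses a hypothetical data layout and does not reflect the format of the actual availability database.

```python
def available_neurons(all_neurons, passed_by_step):
    """Return the neurons that passed every calibration step.

    passed_by_step: dict {calibration step name: set of neuron ids
    that passed that step} (hypothetical layout).
    """
    available = set(all_neurons)
    for passed in passed_by_step.values():
        available &= set(passed)
    return available
```

For example, with four neurons and two steps, `available_neurons(range(4), {"V_reset": {0, 1, 2}, "E_leak": {1, 2, 3}})` keeps only neurons 1 and 2.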

V. EXPERIMENT SHOWCASE -SYNCHRONOUS FIRING CHAIN
Previous experiments on the BrainScaleS-1 system relied on a small subset of the available neurons [6], [20], [21]. In this section, we use a synchronous firing chain (synfire chain) to utilize a large number of the available wafer module resources. We start with a relatively short chain to illustrate the behavior of the network and finally present a longer one that utilizes a large part of a single wafer module.

Fig. 13. Structure of the synfire chains presented in this section. The synfire chain is made up of several groups of excitatory (blue) and inhibitory (red) populations. The inhibitory population connects to the excitatory population within the same group and aims to improve the chain's filtering for synchronous input [26], [29]. Each excitatory population is connected to the excitatory and inhibitory population of the next group. By repeating this construction schema (grey), chains of arbitrary length can be realized. The network is excited by a stimulus population (orange) which projects to the excitatory and inhibitory population of the first group.
Synfire chains can filter for synchronous activity and propagate the activity along a chain of neuron groups [22], [23]. We choose synfire chains since they can easily be scaled up to arbitrary sizes by increasing the chain length as well as the number of neurons in a single group, and since they have been studied extensively in previous publications [24], [25], [26]. Furthermore, synfire chains were used to showcase the functionality of the predecessor of BrainScaleS-1 [27] and to characterize the behavior of the current system in software simulations [28].
Figure 13 displays a synfire chain with feed-forward inhibition.Each chain link consists of an excitatory and inhibitory population.The inhibitory populations are connected to the excitatory population within the same group.This feed-forward inhibition can enhance the filtering properties of the chain [29], [26].The excitatory population forwards its outputs to both populations within the next group.External stimulus is injected in the form of Gaussian pulse packages [24].The strength a denotes the number of input spikes per stimulus neuron and σ the standard deviation of the Gaussian from which the spike times are drawn.We will use (a, σ) to refer to specific packages.
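A Gaussian pulse package (a, σ) as defined above can be generated with a few lines. This is a minimal sketch: the function name, the number of stimulus neurons, the package center `t_center`, and the seeding are assumptions made for the example and are not taken from the experiment code.

```python
import random


def pulse_packet(n_neurons, a, sigma, t_center=0.0, seed=None):
    """Spike times of one Gaussian pulse package (a, sigma).

    Each of the n_neurons stimulus neurons emits `a` spikes whose
    times are drawn from a Gaussian with mean t_center and standard
    deviation sigma (same time unit as t_center).
    Returns one sorted list of spike times per stimulus neuron.
    """
    rng = random.Random(seed)
    return [
        sorted(rng.gauss(t_center, sigma) for _ in range(a))
        for _ in range(n_neurons)
    ]
```

A small σ yields a tightly synchronized package, while a larger σ spreads the same number of spikes over a wider time window.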

A. Network Behavior
In a first step we will look at a relatively short chain with six chain links, shown in fig.14, to illustrate how the filtering properties of the chain can be tuned.Table III summarizes some of the key properties of the network.We used the manual placement described in [14] to place the different populations on the wafer.Specifically, we distribute the external stimulus over several HICANNs in order to minimize spike loss due to limited bandwidth.As mentioned previously, synfire chains are able to filter for synchronous input and to synchronize less-synchronous input as it travels along the chain [24], [23].Figure 14a shows the propagation of three different input stimuli along the chain.In case of a relatively weak and synchronous input (1, 1) a single, narrow package travels along the chain.If the input is stronger and more asynchronous, we observe a broader response in the first groups of the chain which is synchronized as the signal propagates along the chain such that the responses in the final group are comparable.Too weak and asynchronous input, here (1, 4) as an example, dies out and does not cause a response in the final group.This is in agreement with previous results [24], [29], [27], [28].
Figure 14b shows in more detail for which input stimuli the propagation along the chain is successful. In agreement with the previous observations, weak and asynchronous input is not transmitted to the final group. The response in the final group is almost uniform. This indicates that the packages are synchronized as they travel along the chain. Setting appropriate parameters which reproduce the expected results from simulations relies on the calibration routines introduced in section IV-E. The calibration allows setting model parameters in the biological domain and reduces the inherent mismatch between the physical components.

B. Wafer-Scale Network
The previous section demonstrates the implementation and control of a short synfire chain on the BrainScaleS-1 system.This section shows that the commissioning efforts described in section IV also facilitate the implementation of wafer-scale networks.The properties of this synfire chain are summarized in table III.
The complexity of the emulation increases with the size of the model. While for a relatively short chain it is possible to investigate the behavior of individual neurons and manually detect malfunctioning and badly calibrated entities, this is not feasible for larger experiments. Therefore, the digital tests described in section IV-B are essential to automatically avoid these components during the experiment.
To simplify the automatic routing of the abstract network description to physical entities on the wafer, we once again employ manual mapping, see fig. 15a. We place the different groups in a zig-zag pattern starting from the top-left side towards the bottom of the wafer and then back up towards the top-right side. This placement schema allows the BrainScaleS-1 operating system [14] to find appropriate connections between the different populations and minimizes synapse loss, i.e., synaptic connections that could not be mapped to the hardware. We were able to successfully emulate a synfire chain with 190 chain links on the BrainScaleS-1 system. Figure 15b shows an example of a pulse package that travels along the full length of the chain. The activity of the individual groups still depends on the exact neuron and synapse properties, but the calibration ensures that the pulse package remains compact. A synchronous pulse reaches the final group after a signal propagation time of about 600 ms in the biological regime, which corresponds to 60 µs wall-clock time.
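The conversion between the biological and the wall-clock time domain follows directly from the acceleration factor of 10 000. The helper below is only a worked example of this arithmetic; its name is hypothetical.

```python
ACCELERATION = 10_000  # typical BrainScaleS-1 speedup over biological real time


def bio_to_wallclock(t_bio_s, acceleration=ACCELERATION):
    """Convert a duration in biological time (seconds) to wall-clock seconds."""
    return t_bio_s / acceleration


# 600 ms of biological time for the pulse to traverse the chain
propagation_wallclock_s = bio_to_wallclock(0.6)
```

With 0.6 s of biological time this evaluates to 6 × 10⁻⁵ s, i.e. the 60 µs quoted above.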

VI. DISCUSSION
Starting its development more than ten years ago, the first-generation BrainScaleS wafer-scale neuromorphic system represents a milestone toward a large-scale analog neural network emulation platform. Over years during which several modules have been commissioned and experiments run, we have learned important lessons on building and handling such a complex system. We discovered drawbacks in our first implementation; some of them could successfully be circumvented via our commissioning software. Our second-generation neuromorphic BrainScaleS-2 chip [18] addresses BrainScaleS-1's design weaknesses. Moreover, it enables the application of advanced learning mechanisms by introducing a digital plasticity processor, neuron multi-compartment capabilities, as well as extended analog-to-digital conversion capacities.
In this paper, we described the individual components of a BrainScaleS-1 wafer module and showed the necessary steps to assemble it.A wafer-scale analog system is complex and requires many hardware components working concurrently.
Once a wafer module is assembled, it is often not possible to pinpoint defects in individual components.To alleviate this, each component must get tested on its own; malfunctioning ones must be repaired or replaced before they are added to the system.Additional tests during the assembly are also crucial to allow for finding and solving errors that arise during that process.The remaining problems are handled by the exclusion of affected components or circuits from the availability database to ensure the correct operation of the system.
The importance of the tests and monitoring remains after the wafer module gets placed in the rack.For example, tight monitoring during system operation is necessary to uncover the wear out of system components.Automated alerts are fundamental for warning in case of values deviating over time.Furthermore, the tests executed nightly help keep track of the wafer modules' state.
Concerning the wafer in the core of the BrainScaleS-1 system, the probability of fabrication defects in microelectronics is proportional to the circuit area [30]. Thus, it is unfeasible to build such a large analog system without malfunctioning components. This will most likely intensify further in the future with the use of novel materials. With this in mind, the digital tests introduced are executed nightly to identify such malfunctioning components and exclude them from our availability database. These tests enable storing different states of the database on disk and allow differentiating actual malfunctioning components from those not usable due to a dependency. The users can then utilize reliable components, possibly even using a custom availability database.
An additional challenge using analog hardware is the fixed-pattern noise introduced by unavoidable manufacturing process variations. In the BrainScaleS-1 system, this is worsened by the design decision to use FGs to store the neuron configuration. These cells allow for long-term storage of analog parameters without storing digital values onboard. However, the current implementation introduces write-cycle to write-cycle variability. Though small, these variations lead to noticeable errors if they are further enlarged by non-linear dependencies between control signal and observable. To minimize these effects, we presented our calibration framework, which also allows non-expert users to configure experiments in the biological domain without specific knowledge of the hardware. We demonstrated the narrowing and centering of the achieved value distribution for exemplary parameters after the calibration was applied, although the result remains limited by thermal noise and the variations caused by the FGs. Since single-poly floating-gate cells are non-standard devices and not supported by the manufacturer, the second-generation BrainScaleS-2 chip reverts to a digital parameter storage scheme employed in a previous neuromorphic architecture [31], thereby vastly improving analog parameter accuracy. Since the second generation uses a manufacturing process with much smaller geometry, namely 65 nm vs. 180 nm, the area penalty for the digital parameter storage is manageable. A further advantage of the novel parameter storage is the reduced programming time [32].
In the presented wafer-scale implementation, the single-poly floating-gate parameter storage was the only feasible solution to achieve the required number of analog parameters for the neuron circuits.
On top of explaining the calibration methodology, we demonstrated the necessity for parallel execution of the calibrations.The large parameter space of the synapse weight calibration exceeds reasonable runtimes using the current readout system.In order to circumvent this, we introduced a per wafer calibration which, compared to a per circuit calibration, shows larger errors but can be generated in a reasonable time frame.To improve this, we developed a new readout system, which will replace the external set of ADCs with on-wafer-module boards, increasing the parallel readout capabilities from 12 to 96 channels [33].Moreover, in the BrainScaleS-2 chip, we introduce a per neuron-circuit ADC system, which allows for a massive parallel calibration [18].A per-circuit calibration before each experiment becomes feasible with such a solution.
Finally, we demonstrated the operation of a fully commissioned BrainScaleS-1 wafer module implementing synfire chains. While small chains portray the capability to fine-tune the network parameters, extending to a long chain of 190 links illustrates the possibility to scale up networks. Successfully mapped to an inherently imperfect substrate, it constitutes the largest spiking network emulation run with analog components and individual synapses to date.
Our endeavor in developing and maintaining the BrainScaleS-1 system has demonstrated, while illustrating the field's challenges, that building wafer-scale analog neuromorphic hardware is feasible. Furthermore, the BrainScaleS-1 wafer module with its operating system laid the foundation for the next-generation systems; all lessons learned from the first generation contribute to the success of future large-scale neuromorphic systems.

A calibration procedure is in place for the BrainScaleS-1 system, which compensates for manufacture-induced analog circuit variability. It accounts for analog readout noise by averaging the features extracted from the membrane traces over time. In addition, measurements are repeated and then averaged after rewriting the Single-Poly Floating Gates (FGs), where stated, to consider FG write-cycle to write-cycle parameter storage variability.
A detailed explanation of each parameter calibration conducted on the wafer module is provided in the following. We first describe the parameter, explain the calibration approach and the settings used, and show plots illustrating the results. In addition to the synaptic-weight calibration presented in the main text, these constitute the complete neuron- and synapse-circuit calibrations performed in the system. Details of the sample points measured, the models utilized, and the average runtimes per parameter are summarized in table I.
Readout shift: On each High Input Count Analog Neural Network (HICANN), every neuron's membrane trace can be recorded by connecting its switchable analog output amplifier to one of two output buffers. Due to circuit variability, each amplifier adds a constant offset to the recorded traces, the so-called readout shift. It has to be determined first, since all further calibrations are influenced by it.
• How: Neuron membranes are interconnected in groups of 64 (the maximum possible).Their individual resting membranes are recorded and every neuron's deviation from the group's mean is stored.
• Settings: E leak = 0.9 V, the middle of the range, V threshold above resting potential, I conv set to 0 A to switch off both operational transconductance amplifiers (OTAs) for the excitatory and inhibitory synaptic input conductances.
• Effects: The offset is automatically corrected for all subsequent calibrations by loading the calibration backend. The distribution of the analog output amplifier offsets of all neurons on one HICANN is shown in fig. 1a.
V reset : The potential to which a neuron's membrane is set after a spike is generated. It is shared among a group of 128 neurons. Each HICANN contains four of these groups.
• How: Neurons are set to spike continuously by setting their leak potential E leak above their threshold potential V threshold .A recording time of 80 µs per target value collects an average of 39 inter spike intervals on each membrane.The refractory time τ ref is set to maximum in order to allow for long baseline traces between the spikes.The reset voltage is calculated as the average over all the interspike baseline samples to account for readout noise.
• Settings: I conv = 0 A for both excitatory and inhibitory synaptic inputs, shutting off the OTA of their synaptic conductance.I gl = 1.1 µA, I pulse = 20 nA to set the refractory time to a high value.
• Sweep: V reset , with  the individual neurons.
V threshold : The threshold potential of the leaky integrate-andfire model, at which an action potential is elicited and the membrane's voltage is forced into the reset potential for the refractory period.
• How: Synaptic inputs are minimized in order to isolate the membrane and the threshold detect circuits.The threshold potential V threshold is set below the leak potential E leak to elicit constant spiking.The maximum membrane voltage at several spike peaks is averaged and considered the true threshold voltage.
• Effects: The corrected hardware voltage distribution is centered around the correct target value. The standard deviation decreases, as can be seen in fig. 1c.
E syni : The inhibitory reversal potential towards which the OTA in the inhibitory synaptic input drives the membrane when processing synaptic input.
• How: V convoffi of the inhibitory synaptic input is set to a small value so the bias generator forces the membrane potential to the inhibitory reversal potential. No spikes are elicited since the threshold voltage is never reached. Once the neuron is at rest, the averaged membrane voltage characterizes the reversal potential.
• Effects: The achieved inhibitory reversal potential voltages before and after calibration are shown in fig. 1d.
I pulse : Bias current that controls how fast the neuron's timing mechanism recovers from the reset state after a spike is generated.
• How: Neurons are set to spike continuously by setting E leak above V threshold . For the refractory time constant measurements, the baseline traces corresponding to the reset state of the membranes are extracted. I pulse is first set to its maximum and the effective refractory period is measured and recorded; this constitutes the minimum achievable period and is denoted τ 0 . The subsequently measured refractory periods are referenced to τ 0 by subtracting τ 0 from them, and eq. (Main-2) from the main text is fitted to the result.
• Effects: The achieved refractory time constants' mean is closer to the target value after the calibration is obtained and applied, as can be observed in fig. 11a in the main text. The standard deviations are reduced. Fig. 10 in the main text demonstrates the limited precision with which the refractory time constant can be configured, as only a fraction of the possible parameter range of I pulse results in reasonable configurations.
E leak : The reference voltage towards which the membrane potential is constantly driven through the leak conductance.
• How: Synaptic inputs are minimized and the membranes are read on a resting state.
• Effects: The corrected hardware voltage distribution is centered around the correct target value. The standard deviation decreases, as can be seen in fig. 11b in the main text.
V convoffx : Offset voltage for the integrator on the excitatory synaptic input.The voltage parameter is used by a bias generator that controls the reference of OTA 1 , compensating for mismatches.The offset should balance two effects: minimize an undesired permanent current flowing to the membrane, which shifts the neuron's resting potential, against the weakening of the synaptic input caused by a too substantial compensation.Consequently, the goal of the calibration is to find the sweet spot in between, where the bias generator compensates precisely for the mismatch of OTA 1 .
• How: The point of interest is the transition from a zero to a non-zero conductance on OTA 1 .It is measured by the shift of the resting potential arising for different values of V convoffx .The calibrated value of V convoffx corresponds to the first value where the resting potential is no longer shifted.In addition, the linear range of the relation between the membrane rest-voltage shift and V convoffx is characterized.Effects from the inhibitory synaptic input are minimized by using low values for E syni , I convi and a high V convoffi .Furthermore, the effect is more pronounced for lower values of E leak .
• Settings: 2 µA a low value that limits the leakage current from the synapse onto the membrane, deviations in the effective resting potential arising from leaks through the excitatory synaptic input, as shown in fig. 2. Nevertheless, a minimal I gl is required to allow the neuron membranes to exhibit uniform effective resting potentials.
V convoffi : Offset voltage for the integrator on the inhibitory synaptic input conductance. The calibration principle is the same as for V convoffx , but it should be performed independently as both inputs introduce leak currents into the membrane.
• How: A low E synx , I convx and high V convoffx minimize effects from the excitatory synaptic input.
• Settings: E leak = 0.8 V, E syni = 0.4 V, E synx = 1.2 V, I convx = 0 A, I gl = 0.2 µA, a low value that limits the leakage current from the synapse onto the membrane.

The following parameter calibrations use input spikes to generate post synaptic potentials (PSPs) on the membrane. From the shape of the voltage traces, it is possible to approximate parameters related to the time constants of synaptic inputs (τ syn ) and the membrane (τ mem ). For a single input spike arriving while the membrane of a LIF neuron is in a steady state, the PSP shape can be described either by an α-function, if both time constants are the same, or by a difference of exponentials if one of the time constants is smaller [1]. This behavior is described by eq. (1), with τ = τ 2 /τ 1 a ratio between τ mem and τ syn , derived in [2] and further developed in [3]. It relates the membrane's voltage course with both relevant time constants and the height h of the PSP. The fitting algorithm fixes one of the time constants and varies the other. Although the PSPs are symmetric in τ mem and τ syn , the fact that typically τ mem > τ syn is considered. Once the parameters are determined from the measurements through fitting the model, a linear fit is used to obtain a calibration relating parameters with FG values, as with the previously treated calibrations.
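To make the PSP shape concrete, the sketch below implements one standard peak-normalized form of the difference of exponentials, together with its α-function limit. The exact expression used for the fits is not reproduced in this text, so the normalization, the argument names, and the absence of a baseline offset are assumptions; the form is chosen so that the maximum deflection equals the height h and the expression is symmetric in the two time constants.

```python
import math


def psp(t, h, tau_1, tau_2, t_0=0.0):
    """PSP deflection at time t for one input spike at t_0.

    Assumed form, normalized so that the peak deflection equals h:
      tau_1 != tau_2: h * (exp(-dt/tau_1) - exp(-dt/tau_2)) / norm,
                      norm = tau**(tau/(1-tau)) - tau**(1/(1-tau)),
                      tau  = tau_2 / tau_1
      tau_1 == tau_2: alpha function h * (dt/tau_1) * exp(1 - dt/tau_1)
    """
    if t < t_0:
        return 0.0
    dt = t - t_0
    if math.isclose(tau_1, tau_2):
        return h * (dt / tau_1) * math.exp(1.0 - dt / tau_1)
    tau = tau_2 / tau_1
    norm = tau ** (tau / (1.0 - tau)) - tau ** (1.0 / (1.0 - tau))
    return h * (math.exp(-dt / tau_1) - math.exp(-dt / tau_2)) / norm


def peak_time(tau_1, tau_2):
    """Time after the spike at which the PSP reaches its maximum."""
    if math.isclose(tau_1, tau_2):
        return tau_1
    return tau_1 * tau_2 / (tau_1 - tau_2) * math.log(tau_1 / tau_2)
```

Swapping `tau_1` and `tau_2` leaves the curve unchanged, which illustrates why a fit of the shape alone cannot tell which time constant belongs to the membrane and which to the synapse.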
I gl : Bias current that controls the membrane's leakage conductance. This parameter and the chosen membrane capacitance, which can be set to two different values, determine the membrane time constant.
• How: The input spikes should arrive with enough space to allow the membrane to return to a steady-state after each perturbation.A strong excitatory synaptic input is set to achieve a better signal-to-noise ratio.Fitting eq. ( 1) returns both the membrane and the synaptic input time constant from the PSP shape.A fit of the softplus function is subsequently used to translate between biological-space parameters and FG-stored parameters.
• Sweep: I gl
• Effects: The achieved membrane time constants' mean is closer to the target value after the calibration is obtained and applied, as can be observed in fig. 3b. The standard deviations are reduced. However, as seen in fig. 3a, the precision to configure τ mem is limited as only a fraction of the possible parameter range of I gl results in reasonable configurations.
V syntcx : Voltage controlling the excitatory synapse time constant, τ syn,x , by varying the voltage integrator's resistive element. Large values of V syntcx shift E leak towards the reversal potential, since leak currents in the synaptic input integrator increase for higher voltages.
• How: Similar to the I gl calibration, input spikes that arrive with enough separation are used. Equation (1) is fitted to the recorded PSPs, and a fit to the extracted values is used to translate between biological-space parameters and FG-stored parameters.
V syntci : Voltage controlling the inhibitory synapse time constant, τ syn,i , by varying the voltage integrator's resistive element.
• How: Similar to the V syntcx calibration.
E synx : In biologically plausible networks, the excitatory reversal potential is above the threshold and thus never reached by the membrane potential.Its calibration is a good showcase for pitfalls during the operation of analog circuits.Intuitively, a direct measurement using the membrane potential would be used for both reversal potentials.However, similar to their biological counterparts, the circuits of the HICANN chip are not designed for the membrane potential to get close to the excitatory reversal potential.Thus, the circuits show a nonlinear behavior when approaching the reversal potential, as they deviate from the center of their design ranges.This can be observed in fig.5a.Therefore, the excitatory reversal potential is measured indirectly.
• How: The height of the PSP of a stimulated neuron is measured for different resting potentials in the linear regime of the circuits. A linear extrapolation is used to extract the resting potential where the height reaches zero, shown in fig. 5a. In the conductance-based synapse model this resting potential is equal to the reversal potential. The measurements are repeated for different reversal potentials to extract the linear dependency between hardware value and applied voltage.
• Settings: I convi = 0 A, I gl = 10⁻⁷ s, V convoffx,i = 0.9 V, V syntcx,i = 2 × 10⁻⁷ s, V threshold = 1.8 V, V gmax = 0.9 V, gmax div = 2 LSB, w = 15 LSB
• Sweep: E leak , E synx
• Results: Results of the calibration compared to a direct measurement can be seen in fig. 5b. The disadvantage of the indirect measurement is the increased runtime and the dependency on the shape of the PSP. Small variations of hardware parameters, most likely due to the necessity to rewrite the FG value of the resting potential, are enlarged by the linear extrapolation performed to find the reversal potential. As a result, fig. 5b shows larger variations for the indirect calibration than the direct measurement. Nevertheless, the technique allows for correctly calibrating the excitatory reversal potential without directly measuring it.
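The indirect measurement of the excitatory reversal potential can be sketched as an ordinary least-squares line through the measured (resting potential, PSP height) pairs, followed by solving for the zero crossing. The function names are hypothetical and the sketch ignores measurement uncertainties, which the actual calibration may weight differently.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def reversal_potential(resting_potentials, psp_heights):
    """Extrapolate to the resting potential at which the PSP height is zero.

    In a conductance-based synapse model the PSP height shrinks
    linearly as the resting potential approaches the reversal
    potential, so the zero crossing of the fitted line estimates it.
    """
    slope, intercept = linear_fit(resting_potentials, psp_heights)
    return -intercept / slope
```

Because the zero crossing lies outside the measured range, small errors in the fitted slope translate into comparatively large errors in the extrapolated potential, in line with the larger variations reported for the indirect method.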

Fig. 1.(a) 3D-schematic of a BrainScaleS-1 wafer module (dimensions: 50 cm × 50 cm × 15 cm) hosting the wafer (A) and 48 communication boards (B).The positioning mask (C) aligns elastomeric connectors that link the wafer to the large Main PCB (D).Support PCBs provide power supply (E & F) for the on-wafer circuits as well as access (G) to analog dynamic variables such as neuron membrane voltages.The connectors for inter-wafer and off-wafer/host connectivity (48 × Gigabit-Ethernet) are distributed over all four edges (H) of the Main PCB.Mechanical stability is provided by an aluminum frame (I).(b) Photograph of a fully assembled wafer module.Taken from [6].

Fig. 3. (a) The BrainScaleS-1 wafer with applied post-processing to achieve wafer-scale integration and to establish its connection to the (b) bottom side of the Main PCB. There, the wafer connects through elastomeric connectors to the center, marked with 1. Along the borders, 48 connectors, marked with 2, accommodate the communication boards.

Fig. 4. (a) Photograph of the wafer prober and (b) a close-up of a wafer under test.Different needle cards have been developed and used for tests carried out before wafer post-processing (section II-A, visible in this setup) and before wafer module assembly (section III-B), respectively.

Fig. 5. Auxiliary boards under test. (a) Communication boards test setup and (b) Wafer I/O PCB board. (c) Main Power Supply connected to programmable power supply and electronic load. (d) Auxiliary Power Supply PCB test stand. (e) Control Unit for Reticles test stand. Each Power Emulation Systems for Testing (PEST) board emulates the supply voltages of one HICANN group. (f) FPGA board of the Analog Readout Module. During the calibration, the pins on the top left are connected via a 50 Ω impedance to an external source meter, while the module is connected via USB to the host computer. Figures (a) and (b) made available by S. Schiefer, TU Dresden.

Fig. 6. (a) Detail view of the elastomeric connectors that connect the pads on the BrainScaleS-1 wafer with the Main PCB. (b) Station used to align the Main PCB to the silicon wafer. The Main PCB is fixed by springs that apply a constant force (blue arrows). Its position is controlled with a micrometer linear stage (red arrows). Angular errors can be corrected by rotating the wafer (purple arrows). (c) Test PCBs mounted on the Main PCB to measure the connectivity to the wafer during assembly.

Fig. 7. Test results of one BrainScaleS-1 wafer for the different assembly steps: (a) before assembly, (b) during assembly, (c) after assembly. In (a) and (c), the number in the smallest rectangles shows the number of errors found on the corresponding HICANN. Purple or red indicate that all tests were successful or failed, respectively. For grey HICANNs the test was skipped since no connection could be established using the wafer prober. In (b), test results are shown per elastomeric connector and a yellow rectangle indicates a problem in the high-speed communication of one HICANN. (d) Communication test result. HICANNs without high-speed communication are marked yellow, those without JTAG communication red. The center two HICANN groups have no high-speed interface by design. Consequently, they are marked faulty in all tests requiring high-speed communication to the Main PCB.
[9], or from scratches or similar defects on the post-processing layers. During the test, an individual connection is established to each of the 384 HICANNs of one wafer. The test is split into a high-speed test and a JTAG test, which reflects the two possibilities to communicate with the HICANN. Failures are stored separately in the availability database. The result of a communication test is shown in fig. 7d. In this example, the result comparison between the test stand and the rack-mounted fully assembled wafer module shows one additional HICANN group and three individual HICANNs that cannot communicate via JTAG.

Fig. 8. Left: Picture of the HICANN with labeled components and marked areas shown on the right side. Top right: Detail of the synapse array. Two synapse rows are connected to one synapse driver. All synapses of the same column are connected to one neuron circuit. Middle right: Left half of the merger tree. Neuron input from the top gets routed to the buses on the bottom. Several inputs can be merged on the same bus. Background generators are used to inject additional signals generated on-chip. Bottom right: Sketch of the bus system. Buses are connected by a sparse switch matrix. Repeaters, used to regenerate the signals, connect buses of neighboring HICANNs.

Fig. 9. Simplified neuron circuit schematic, displayed on the bottom, with the most relevant calibrated parameters in the BrainScaleS-1 system. The leak conductance controlled by I gl is constantly driving the membrane potential Vmem towards the rest voltage E leak. A spike is elicited when the membrane potential reaches the threshold voltage V threshold. After a neuron spikes, its membrane potential is connected to the reset voltage Vreset for a period controlled by the parameter I pulse. For simplicity, only one of the two synaptic inputs is displayed, through which a neuron integrates excitatory and inhibitory input currents Isyn; this current controls a conductance between the reversal potential Esyn and the membrane, with a synaptic time constant controlled by Vsyntc. Each input receives currents from all the synapses connected to one column in the synaptic array, displayed on top. Additional parameters Vsyn, V convoff, and Iconv provide further control over the synaptic input, as further discussed in the supplementary material.

Fig. 10. Example calibration procedure for the refractory period. Sample fits obtained for a set of neurons, relating the I pulse parameter configured with the Floating Gates to the measured τ ref. Seven values within the dynamic range of I pulse were used for the fits.

Fig. 11. Histograms of achieved parameter settings on all neurons of one HICANN for (a) the refractory time constant controlled by the parameter I pulse and (b) the leak potential controlled by the parameter E leak. Pale and intense colors correspond to the hardware-achieved time constants and voltages for different target values (shown in black-dashed lines), before and after the calibration is applied, respectively. The narrowing and centering of the achieved value distributions is better for E leak than for τ ref.

Fig. 12. Results of the synapse weight calibration. (a) Weight measurement for a fixed neuron circuit for different settings of the digital weight w and hardware parameter gmax div with Vgmax = 700 LSB. Horizontal dashed lines indicate cuts with fixed values of the hardware parameter gmax div shown in (b), vertical dashed lines indicate cuts with fixed digital weight values w shown in (c). In (b) and (c), solid lines represent measured values, dashed lines the results of the fit of eq. (3) applied on the whole measured parameter space. (d) Variations of weight measurements with and without rewriting of the Floating Gates. Values are extracted for three digital weight parameters w from a fixed neuron with fixed hardware parameters (Vgmax = 700 LSB, gmax div = 2 LSB). (e) Comparison of a per-wafer and a per-neuron weight calibration. Measurements for the entire parameter space are performed on a subset of neurons. The calibration is then performed for the whole subset or per individual neuron. The histogram shows the difference between the measured and expected values using the obtained calibrations.

Fig. 14. Hardware emulation of a chain with six chain links. (a) Propagation of pulse packages along the chain. Successful propagation depends on both the strength a and the synchronicity σ of the initial stimulus, represented by (a, σ). Broad input stimuli synchronize along the chain or do not reach the end of the chain. (b) Average number of spikes per neuron in the final group āout of the chain as a function of the initial strength a and synchronicity σ. Each input package was presented 40 times and the results are averaged over all presentations. The pulse packages propagate if the initial input is strong and synchronous enough. In the region of stable propagation the output strength is almost constant; near the separation of the two regimes the average strength of the final pulse package decreases. This separation line between successful propagation and failure of transmission can be controlled by several parameters such as the synaptic weights.

Fig. 15. Hardware emulation of a chain with 19 000 neurons. Further parameters of the network can be found in table III. (a) Mapping of the network to a BrainScaleS-1 wafer. HICANNs excluded from the availability database are marked in red, cf. section IV-B. HICANNs which cannot host an entire group are marked in orange and are not used in the experiment. On each HICANN colored in blue an entire group of neurons is placed. Colored lines indicate synaptic connections. (b) Response of the chain to an input packet of strength a = 1 and spread σ = 1.
2 V
• Effects: The achieved hardware voltage distribution is shifted towards the correct target value from its original mean, as can be seen in fig. 1b. The standard deviation does not improve for all targets since the shared nature of the parameter limits the action of the correction over
arXiv:2303.12359v1 [cs.ET] 22 Mar 2023

Fig. 1. (a) Analog readout offset distribution for the 512 neurons of one HICANN. Calibration results for the parameters (b) Vreset, (c) V threshold and (d) E syni. Pale and intense colors correspond to the hardware-achieved voltages for different target values (shown in black-dashed lines) before and after the calibration is applied, respectively. For Vreset the correction effect is limited by the parameter being shared by 128 neurons.

Fig. 2. Effective resting potential of the neurons on one HICANN (a) before and (b) after calibration of parameter V convoffx .

Fig. 3. (a) Fits for the parameter I gl against the achieved membrane time constant on five neurons, using a softplus function model and eight measurement steps.(b) Distribution of membrane time constants before and after the I gl calibration is applied for all neurons, with pale and intense colors, respectively.

Fig. 4. Distribution of the achieved (a) excitatory and (b) inhibitory synaptic time constants before and after calibration in pale and intense colors, respectively, for four different target values (black dashed lines).

Fig. 5. Excitatory reversal potential calibration. (a) Indirect measurement of the excitatory reversal potential. The height of the PSP of a stimulated neuron is extracted for different resting potentials. Different colors indicate different hardware settings of the reversal potential in LSB. Since the circuits are not designed to reach the excitatory reversal potential, nonlinear behavior is observed for small distances between membrane potential and reversal potential. A linear extrapolation of the linear region (dotted line) is used to extract the correct reversal potential. During experiments the neuron is exclusively operated in the linear regime. (b) Comparison of direct and indirect measurement of the excitatory reversal potential. Because of the nonlinear behavior of the circuits close to the reversal potential, the direct measurement yields values that are too small. The indirect measurement has larger errors due to its dependency on the whole neuron circuit and their amplification by the linear extrapolation.

TABLE I
OVERVIEW OF EXCLUDED COMPONENTS EXTRACTED FROM A FULLY ASSEMBLED BRAINSCALES-1 WAFER MODULE. "COMPONENTS" SHOWS THE NUMBER OF COMPONENTS TAKEN INTO ACCOUNT FOR THE TESTS AND THE EFFECTIVE EXCLUSION. IF TWO NUMBERS ARE GIVEN, THE FIRST ONE IS THE NUMBER OF TESTED COMPONENTS AND THE SECOND ONE IS THE NUMBER OF COMPONENTS EVALUATED FOR THE EFFECTIVE EXCLUSION. "INDIVIDUAL" LISTS THE COMMUNICATION AND MEMORY TEST RESULTS. BUSES ARE MARKED WITH "-" BECAUSE THEY HAVE NO DIGITAL MEMORY THAT COULD BE TESTED. "EFFECTIVE" SHOWS THE RESULTS OF THE EFFECTIVE EXCLUSION OF COMPONENTS. HERE, ALL COMPONENTS THAT SHOULD NOT BE USED DURING AN EXPERIMENT ARE INCLUDED. THEY DID NOT NECESSARILY FAIL A TEST.
minimizes the runtime using the slower connection. In total, more than 42 MiB of digital memory get tested per wafer. Results for a fully assembled wafer module are shown in table

TABLE II
OVERVIEW OF NEURONS EXCLUDED BASED ON CALIBRATION FOR A FULLY ASSEMBLED WAFER MODULE. IN THE COLUMN LABELED "NEURONS", THE FIRST ENTRY SHOWS THE NUMBER OF NEURONS TAKEN INTO ACCOUNT FOR THE CALIBRATION, THE SECOND ENTRY THE NUMBER OF NEURONS TAKEN INTO ACCOUNT FOR THE EFFECTIVE EXCLUSION.

TABLE I
PARAMETER-WISE DETAILS FOR THE CALIBRATION OF NEURONS IN ONE HICANN, INCLUDING THE NUMBER OF STEPS AND REPETITIONS RUN. EXTRACTION REFERS TO THE MODEL USED TO RELATE AND DETERMINE VARIABLE VALUES FROM THE OBSERVABLES. HW-FG REFERS TO THE MODEL USED TO FIT PARAMETERS BETWEEN THE HARDWARE DOMAIN AND THE PROGRAMMABLE FG VALUES.