Paper

Analysis and mitigation of parasitic resistance effects for analog in-memory neural network acceleration


Published 14 October 2021 © 2021 IOP Publishing Ltd
Special Issue on Neuromorphic Devices and Applications
Citation: T Patrick Xiao et al 2021 Semicond. Sci. Technol. 36 114004. DOI: 10.1088/1361-6641/ac271a

Abstract

To support the increasing demands for efficient deep neural network processing, accelerators based on analog in-memory computation of matrix multiplication have recently gained significant attention for reducing the energy of neural network inference. However, analog processing within memory arrays must contend with the issue of parasitic voltage drops across the metal interconnects, which distort the results of the computation and limit the array size. This work analyzes how parasitic resistance affects the end-to-end inference accuracy of state-of-the-art convolutional neural networks, and comprehensively studies how various design decisions at the device, circuit, architecture, and algorithm levels affect the system's sensitivity to parasitic resistance effects. A set of guidelines is provided for how to design analog accelerator hardware that is intrinsically robust to parasitic resistance, without any explicit compensation or re-training of the network parameters.


1. Introduction

Matrix operations lie at the heart of processing deep neural networks [1, 2], which are increasingly being adopted for image recognition, natural language processing, and various other tasks across many application domains. There has been significant recent interest in accelerating matrix operations by executing them in the analog domain inside non-volatile memory arrays. This approach potentially yields orders-of-magnitude improvements in the energy efficiency of neural network inference, which relies heavily on matrix multiplications [3–5]. The improvement comes from the intrinsic high efficiency of analog computation and a significant reduction in data movement energy, since the same arrays used for computation also store the network's weights.

However, analog computation is more easily corrupted by noise and process variations than digital computation. For the analog computation of matrix-vector multiplications (MVMs) within memory arrays, another intrinsic source of error is the parasitic voltage drops across the array's metal interconnects. These voltage drops distort the network's weight values, but unlike noise, the errors are not random, are not uniform across the array, and scale more quickly with array size. Thus, parasitic resistance limits the size—and therefore efficiency—of memory arrays that can be used for analog computation without compromising accuracy, much as it also limits the fidelity of write operations in a large non-volatile memory array [6].

Errors induced by parasitic resistance in analog neural network accelerators have previously been studied [7–12]. However, prior work has been limited to small networks and simple classification tasks, or if applied to larger networks, assumed only a specific value of parasitic resistance and did not quantify the sensitivity of the system to parasitic resistance. Partly, this is because end-to-end inference simulations of deep networks (especially convolutional neural networks, or CNNs) that correctly account for parasitic resistance effects are computationally expensive. In this work, we use linearized circuit models of large MVM arrays that capture the essential physics of parasitic resistance, and which enable rapid simulation of large networks, including state-of-the-art ImageNet CNNs (ResNet-50). This simulation framework is then used to comprehensively study the sensitivity of inference accuracy to parasitic resistance, and how design decisions at the technology, circuit, architecture, and algorithm level affect this sensitivity.

Prior work has also proposed methods to actively compensate for parasitic resistance by modifying the weights that are programmed into the array, or by accounting for the effects of parasitic resistance during training [8, 9, 11–13]. This work takes neither approach, but instead assumes that weights from off-the-shelf neural network models are transferred directly to array conductances. Rather than actively compensating for parasitic resistance, we describe how to design the analog hardware to be intrinsically robust to parasitic resistance. This robust design entails a combination of device requirements for the memory cell (resistance and On/Off ratio), an optimal mapping of neural network weights onto the cells, and an electrical topology for the memory array that reduces parasitic voltage drops. This paper shows that by following these principles and by using realistic memory cell properties that have been experimentally demonstrated, accuracy losses due to parasitic resistance can be kept minimal without requiring compensation or re-training, and without sacrificing energy efficiency by reducing the array size. For systems that use cells with lower resistance or On/Off ratio, these techniques can complement prior methods of parasitic resistance compensation to enable high accuracies.

This paper is organized as follows. Section 2 describes how neural network accuracy is degraded by array parasitic resistance and briefly reviews related work. Section 3 outlines the important accelerator design decisions that govern its sensitivity to parasitic resistance, and describes the methodology used by this paper to study this sensitivity. Sections 4–6 contain the main results of this sensitivity analysis, detailing its dependence on the system's data mapping scheme, the array circuit topology, and the neural network and dataset, respectively.

2. Parasitic resistance effects in memory arrays

The effect of metal wire resistance is a critical consideration when designing memory arrays, whether for storage or for in-memory analog computation. When applying a voltage or current to one end of a row or column, IR drops along the metal wires distort the voltage and current that are seen at a given memory cell. The longer the wire, the larger the voltage drop, which means that the amount of distortion depends on the position of a cell in the array. This further implies that parasitic metal resistance is one of the main factors that limit the maximum feasible size of a memory array. The problem is made worse as the metal lines are made thinner, which increases their resistance.

2.1. Parasitic resistance effects in non-volatile memory arrays for storage

A typical non-volatile memory cell consists of a memory device, which holds one or more bits of data, and an access device. The access devices in the array enable a selected subset of cells within the array to be read or written at a time. Ideally, the unselected cells draw zero leakage current and thus remain unperturbed. To accomplish this, the access device is implemented with a highly nonlinear circuit element such as a transistor, a diode, or another back-end-of-line-compatible switch [6]. These switches still draw a small leakage current that cannot be ignored in large arrays.

Metal lines connect each memory cell to read and write circuitry at the periphery of the array. Parasitic voltage drops due to the resistance of the metal wires change the voltages across the selected and unselected memory cells. During write, the voltage across the selected cell may fall below the threshold of the access device; to avoid this, a larger voltage may be needed to write to cells at the far corner of the array, increasing power. At the same time, parasitic wire resistance can change the voltages across a large number of unselected cells, increasing their combined leakage currents. These effects make write failures to selected cells and data disturbs to unselected cells more difficult to avoid as the array size increases. Parasitic resistance usually has a weaker effect during standard memory reads, where the current through a selected cell is smaller compared to a write operation. Detailed reviews and quantitative analyses of the effects of wire resistance and its interaction with access devices in non-volatile memory arrays can be found in Burr et al [6] and Narayanan et al [14].

2.2. Parasitic resistance effects in analog in-memory matrix-vector multiplication

In addition to data storage, non-volatile memory arrays show significant promise in accelerating matrix computations, particularly matrix-vector multiplication. A prominent application of such an accelerator is deep neural network inference, which relies heavily on MVMs. By storing matrix weights in a memory array and driving the rows with a vector input, a fully parallel analog MVM can be executed by exploiting circuit laws within the array. In-memory MVM offers a large potential reduction in energy compared to digital processors, including dedicated digital inference accelerators [15, 16], due to the low energy of analog multi-bit arithmetic operations and the reduction of data movement energy associated with shuttling matrix weights into and out of memory [3–5].

Figure 1(a) shows how an MVM can be mapped to a non-volatile memory array. For simplicity, the access devices are not shown and the memory elements are depicted as programmable resistors. In this example, the conductance Gij of a memory cell is proportional to the corresponding matrix element, and the applied voltage Vi to a row is proportional to the ith element of the input vector. Practical considerations in the mapping of weights and inputs to this circuit, which may cause deviations from this conceptual example, will be discussed in more detail in section 3. Multiplication of the conductance and voltage occurs by Ohm's law, and these products are summed along a bit line using Kirchhoff's current law. The dot products are encoded in the bit line currents, which are typically read out using a transimpedance amplifier [24] or current integrator [3] to produce a proportional voltage. The peripheral circuitry also keeps the bit line voltages at virtual ground, which ensures that the voltage across the conductance Gij is equal to Vi . The final analog result is quantized using an analog-to-digital converter (ADC). To make the most efficient use of the peripheral circuits such as the ADC, which generally dominate the energy, every row of the array is activated simultaneously.
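As a minimal numerical sketch of this mapping (an illustration only, with arbitrary units and hypothetical variable names, not the accelerator's actual signal levels), the bit line currents of an ideal, parasitic-free array reproduce the matrix-vector product up to a scale factor:

```python
import numpy as np

# Illustrative only: an ideal (parasitic-free) in-memory MVM with arbitrary units.
# Each weight W[i, j] is represented by a conductance G[i, j], each input x[i]
# by a row voltage V[i]; each bit line current is then the corresponding dot product.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))               # signed weight matrix (rows x columns)
x = rng.standard_normal(4)                    # input vector

alpha = 1.0 / np.max(np.abs(W))               # hypothetical weight-to-conductance scale
G = alpha * W                                 # conceptual: conductance proportional to weight
V = x                                         # conceptual: row voltage proportional to input

I_cell = V[:, None] * G                       # Ohm's law per cell: I_ij = V_i * G_ij
I_BL = I_cell.sum(axis=0)                     # Kirchhoff's current law: column (bit line) sum

assert np.allclose(I_BL, alpha * (W.T @ x))   # the matrix-vector product, up to the scale factor
```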


Figure 1. (a) One way to execute an MVM $\vec{y} = \mathbf{W}\vec{x}$ within an ideal resistive memory array: $\vec{x}$ maps to the input voltages V, W to the conductances G, and $\vec{y}$ to the bit line currents $I_\mathrm{BL}$. (b) Perturbation of an MVM due to parasitic wire resistance.


In the presence of non-idealities in the memory array, such as programming errors and drift in the cell conductances, the bit line currents do not perfectly represent the correct values of the dot products. This work will focus on errors induced by parasitic metal resistance, shown as red resistors in figure 1(b). Due to parasitic resistance on both the rows and columns, the voltage across the conductance Gij will deviate from Vi by an error $\delta V_{ij}$. This in turn leads to an error in the cell current $\delta I_{ij}$, which contributes to an error on the accumulated bit line current $\delta I_{\mathrm{BL},j}$. The error in the bit line current is propagated as an error in the dot product to the next layer of the neural network.

Compared to a standard memory read, parasitic resistance has a larger effect in an in-memory MVM where all rows are read simultaneously. Since the voltage error $\delta V_{ij}$ depends strongly on the position of a memory cell, the weight matrix is distorted in a non-random and spatially non-uniform manner, unlike noise and process variations. Additionally, because voltage drops increase with current, the errors also depend on the value distributions and sparsity of the weight matrices and input activations. The problem rapidly becomes worse with larger matrices. Each new row or column added to the matrix increases both the total line resistance and the line current, together increasing the largest IR drops in the array.

The size of the parasitic IR drop relative to the voltage across the memory cell depends on the ratio of the cell resistance $1/G_{ij}$ to the parasitic wire resistance between two cells $R_\mathrm p$. The effect of wire resistance can be reduced by a lower-resistance interconnect process, or by selecting a more resistive memory cell technology that lowers the array currents in an MVM. Table 1 shows the measured resistance of several published multi-level memory cells that were designed for use in in-memory MVM arrays, including resistive random access memory (ReRAM), phase change memory (PCM), floating-gate flash, and SONOS (Si-oxide-nitride-oxide-Si) charge trap memory. In addition to their resistance, these technologies differ in their programmable resolution and their susceptibility to process variations and conductance drift [4, 25, 26]. The analysis in this paper remains agnostic to these technology-specific details and considers only the device properties that are pertinent to the parasitic resistance effect: the device resistance and On/Off ratio. IV nonlinearity is also an important property, as we describe qualitatively in section 3.4.

Table 1. Selected memory cell resistances for analog in-memory MVM.

| Cell technology | Cell resistance | Normalized parasitic resistance with $R_\mathrm p = 2.215~\Omega$ [17] |
| --- | --- | --- |
| Al2O3/TiOx ReRAM [18] | 10–100 kΩ | $2.2 \times 10^{-4}$ |
| TaOx ReRAM [8] | 2 kΩ–2 MΩ | $1.1 \times 10^{-3}$ |
| TaOx/HfOx ReRAM [19] | 50–400 kΩ | $4.4 \times 10^{-5}$ |
| PCM [20] | 45–500 kΩ | $4.9 \times 10^{-5}$ |
| Floating-gate flash [21] | 1–33 MΩ | $2.2 \times 10^{-6}$ |
| SONOS flash [22] | 62 kΩ–2 MΩ | $3.6 \times 10^{-5}$ |
| Ionic floating-gate [23] | 10–20 MΩ | $2.2 \times 10^{-7}$ |

To generalize the results of this work to any non-volatile memory or metal interconnect technology, the parasitic resistance will always be expressed as a normalized parasitic resistance $R_{\mathrm {p,norm}}$. This is defined as the ratio of the parasitic wire resistance between two memory cells $R_\mathrm p$ to the minimum resistance of a cell, $1/G_\mathrm{max}$. This ratio is sufficient to quantify the effect on accuracy of parasitic voltage drops with any memory technology. Table 1 shows the normalized parasitic resistance assuming a fixed interconnect resistance of $R_\mathrm p = 2.215~\Omega$ between cells, for arrays with different memory cell technologies. This metal resistance corresponds to the 32 nm process node in the 2011 ITRS roadmap [17].
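For concreteness, the normalization can be checked against a few rows of table 1 (a small illustrative computation, using the minimum cell resistances quoted there):

```python
# Illustrative check of the normalization: R_p,norm = R_p * G_max = R_p / R_cell,min,
# using the 32 nm wire resistance quoted in the text and minimum resistances from table 1.
R_p = 2.215                                   # ohms between adjacent cells [17]

min_cell_resistance = {                       # ohms
    "Al2O3/TiOx ReRAM": 10e3,
    "TaOx ReRAM": 2e3,
    "Floating-gate flash": 1e6,
}

for name, R_min in min_cell_resistance.items():
    print(f"{name}: R_p,norm = {R_p / R_min:.1e}")
# Al2O3/TiOx ReRAM: 2.2e-04, TaOx ReRAM: 1.1e-03, floating-gate flash: 2.2e-06
```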

Prior modeling work has developed a general understanding of MVM errors induced by parasitic resistance [7, 9–12, 27]. Zhang et al, in particular, performed end-to-end inference simulations of CIFAR-10 CNNs with a specific value of parasitic resistance and characterized the distribution of parasitic-induced errors in different layers [9]. This paper extends these earlier works by comprehensively studying how various architecture-, circuit-, and algorithm-level design choices influence the sensitivity of an analog inference accelerator's accuracy to parasitic resistance. We also show how parasitic resistance effects scale to state-of-the-art neural networks for the ImageNet classification dataset [28].

Some prior works have also proposed methods to mitigate the effects of parasitic resistance on the accuracy of in-memory analog matrix operations [8, 9, 12, 13, 29]. A strategy that is shared by several of these papers is to compensate for expected parasitic voltage drops by modifying the programmed conductance that corresponds to each element of the weight matrix. While these techniques can improve accuracy, this paper focuses on the direct transfer of neural network weights to memory cell conductances without any compensation, nor with any regularization for parasitic resistance effects during network training [10, 11]. This paper shows that parasitic resistance effects can be mitigated to a large extent by the appropriate linear mapping of data to the memory cells, and by an optimal design of the array electrical topology. These techniques are complementary to compensation methods developed in prior work, which can still be beneficial in systems with high normalized parasitic resistance.

3. Methodology of parasitic resistance sensitivity analysis

This work studies how design choices in an analog inference accelerator influence the sensitivity of the accuracy to parasitic resistance effects. Among these design choices is the scheme for mapping neural network weights to analog hardware, and the electrical topology of the MVM array. The effect of neural network topology and dataset complexity will also be evaluated. This section describes the neural networks and hardware design choices considered, as well as the simulation methodology.

3.1. Evaluated neural network

To provide clear guidelines for hardware design, the sensitivity analysis of design choices related to data mapping and circuit topology (sections 4 and 5) will be conducted using a single benchmark neural network and dataset: ResNet-14 on CIFAR-10, which has ten image classes [30]. This network has the same residual block structure as the residual networks (ResNets) proposed by He et al [31] for CIFAR-10, but has fewer total blocks. The network was trained using Keras [32] and attains an accuracy of 90.73% on the CIFAR-10 test set before accounting for hardware non-idealities. Section 6 will additionally evaluate the sensitivity of deeper ResNets for both CIFAR-10 and ImageNet. The evaluated networks all use rectified linear (ReLU) activations. It is assumed that network weights are quantized to 8 bits before mapping to hardware, and activations are digitally quantized to 8 bits during inference.

Table 2 enumerates the convolution and fully-connected layers in ResNet-14 that can be accelerated by in-memory MVM. Layers 7 and 12 are projection shortcuts. A convolution is executed in analog by unrolling the operation as a sequence of sliding window MVMs as described by Shafiee et al [33]. The resulting array has $K_xK_yN_\mathrm{ic}$ rows and $N_\mathrm{oc}$ columns, where $K_x \times K_y$ is the 2D kernel size, $N_\mathrm{ic}$ is the number of input channels, and $N_\mathrm{oc}$ is the number of output channels. Layers that use $3 \times 3$ convolutions typically map to arrays that have many more rows than columns. The aspect ratios of the largest arrays ($576 \times 64$) are such that parasitic resistance effects along the columns, rather than the rows, are dominant.
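The row and column counts in table 2 follow directly from the kernel dimensions; a small helper (hypothetical, not from the paper's tooling) makes the unrolling explicit:

```python
# Hypothetical helper: array dimensions obtained by unrolling a convolution into
# sliding-window MVMs, as in Shafiee et al [33].
def conv_to_array_size(Kx, Ky, N_ic, N_oc):
    rows = Kx * Ky * N_ic        # one row per element of the flattened kernel window
    cols = N_oc                  # one column per output channel
    return rows, cols

print(conv_to_array_size(3, 3, 64, 64))   # (576, 64): layers 11, 13, 14 of ResNet-14
print(conv_to_array_size(1, 1, 32, 64))   # (32, 64): the layer-12 projection shortcut
```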

Table 2. Convolution and fully-connected layers in ResNet-14.

| Layer | Kernel type | Array size |
| --- | --- | --- |
| 1 | $3 \times 3$ conv | $27 \times 16$ |
| 2, 3, 4 | $3 \times 3$ conv | $144 \times 16$ |
| 5 | $3 \times 3$ conv | $144 \times 32$ |
| 6, 8, 9 | $3 \times 3$ conv | $288 \times 32$ |
| 7 | $1 \times 1$ conv | $16 \times 32$ |
| 10 | $3 \times 3$ conv | $288 \times 64$ |
| 11, 13, 14 | $3 \times 3$ conv | $576 \times 64$ |
| 12 | $1 \times 1$ conv | $32 \times 64$ |
| 15 | Fully-connected | $64 \times 10$ |

3.2. Mapping weights to conductances

One of the most critical system-level design choices in an analog accelerator is the scheme for mapping neural network weights to memory cell conductances. Several techniques have been proposed in the literature to handle negative-valued and/or high-precision weights. For negative values, the two most common approaches are offset subtraction and differential cells.

In offset subtraction [33], signed weights in a matrix W are programmed onto cell conductances with an offset: $G_{ij} = \alpha W_{ij} + G_\mathrm{offset}$, where α is a fixed conversion factor. This allows negative weight values to be represented by positive conductances, and a zero weight is mapped to $G_\mathrm{offset}$. In this work, we set $G_\mathrm{offset} = 0.5(G_\mathrm{min} + G_\mathrm{max})$. An offset term is digitally subtracted from the MVM result after the ADC to obtain the dot product. By contrast, systems with differential cells map signed weights to the difference in current of two memory cells: $\alpha W_{ij} = G_{ij}^+ - G_{ij}^-$ [18–20]. Typically, for a positive weight $G_{ij}^-$ is mapped to the minimum conductance state $G_\mathrm{min}$, and for a negative weight $G_{ij}^+$ is mapped to $G_\mathrm{min}$. In this work, it is assumed that the subtraction of currents occurs in the analog domain, using the array topologies described in section 3.3. In both mapping schemes, α is chosen to utilize as much of the conductance dynamic range as possible. For differential cells, this means that the weight with the largest absolute value in each layer is mapped to $G_\mathrm{max}$.
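A sketch of the two mappings in code (illustrative function names; conductances normalized to $G_\mathrm{max} = 1$ with $G_\mathrm{min} = 0$, i.e. an infinite On/Off ratio):

```python
import numpy as np

# Sketch of the two signed-weight mapping schemes described above.
G_max, G_min = 1.0, 0.0

def map_offset(W):
    """Offset subtraction: G = alpha*W + G_offset; the offset is removed digitally after the ADC."""
    G_offset = 0.5 * (G_min + G_max)
    alpha = (G_max - G_offset) / np.max(np.abs(W))     # fill the available conductance range
    return alpha * W + G_offset

def map_differential(W):
    """Differential cells: alpha*W = G_plus - G_minus, with the unused cell held at G_min."""
    alpha = (G_max - G_min) / np.max(np.abs(W))        # largest |weight| maps to G_max
    G_plus = np.where(W >= 0, G_min + alpha * W, G_min)
    G_minus = np.where(W < 0, G_min - alpha * W, G_min)
    return G_plus, G_minus

W = np.random.default_rng(1).normal(scale=0.05, size=(576, 64))   # weights clustered near zero
G_off = map_offset(W)
G_p, G_m = map_differential(W)
print(G_off.mean(), (G_p + G_m).mean())   # offset cells average ~0.5*G_max; differential pairs are far lower
```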

High-precision weights, beyond the programmable resolution of a cell, can be handled in in-memory MVM by bit slicing [33–35]—the bits of precision in the weight value are partitioned onto multiple devices. In this work, it is assumed that bit slicing is not used for the 8-bit weights. This means that offset subtraction uses a single 8-bit memory cell per weight, while differential cells use two 7-bit cells per weight. Note that a 7-bit (or 8-bit) cell in this case does not imply 128 (or 256) well-separated levels that can reliably be read out as in a digital memory. For analog inference, it is typically sufficient to use approximate memories where the levels overlap substantially in their distributions due to process variations or write noise. To isolate the effects of parasitic resistance, however, this work assumes that weight values are mapped without error to the cell conductances.

A key difference between offset subtraction and differential cells is proportionality. When using differential cells, weights are mapped to conductances in proportion to their absolute values, while with offset subtraction they are not. This difference is visualized in figures 2(a) and (b) for layer 11 of ResNet-14. As is commonly the case in neural networks, the values in this layer's weight matrix are largely clustered around zero. This means that when using offset subtraction, most of the cells have conductance close to $G_\mathrm{offset}$ (${=}\ 0.5G_\mathrm{max}$ in this case) while for differential cells most of the cells have conductance close to $G_\mathrm{min}$. Thus, with differential cells the currents internal to the array during an MVM will be smaller, and the effect of parasitic resistance will be weaker. Differential cells possess maximal proportionality when the memory cell conductance On/Off ratio is infinite, such that a zero weight maps to exactly zero conductance: $G_\mathrm{min} = 0$. In practice, the cell has a finite On/Off ratio, which means that the weights and conductances will not be exactly proportional. In this case, the array will have larger currents, as shown in figure 2(c).


Figure 2. Conductance spatial distribution within the MVM array(s) for layer 11 in ResNet-14, for (a) offset subtraction with infinite conductance On/Off ratio, (b) differential cells with infinite On/Off ratio, and (c) differential cells with an On/Off ratio of 10.


3.3. Evaluated array topologies

Figures 3(a)–(c) show three array topologies whose sensitivity to interconnect parasitics is evaluated, each of which uses 1-transistor 1-resistor (1T1R) memory cells. In all cases, it is assumed that the 8-bit input elements xi are applied to the array one bit at a time, and the results for different input bits are combined via shift-and-add operations after the ADC [33], or before the ADC using specialized analog circuitry [36]. To first order, this choice means that the memory device does not need a highly linear IV curve across a large voltage range, and simplifies the row driver circuitry as only two (or three) input voltage levels are needed. When parasitic resistance is considered, however, IV nonlinearity can affect the accuracy even with binary voltages, as we discuss in section 3.4.
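The bit-serial scheme can be summarized with a short sketch (illustrative only; it idealizes the per-bit analog MVM as an exact binary-input dot product and performs the shift-and-add digitally after the ADC):

```python
import numpy as np

# Illustrative bit-serial input scheme: each 8-bit unsigned activation is applied one
# bit at a time, and the per-bit results are combined by shift-and-add after the ADC.
rng = np.random.default_rng(2)
G = rng.random((16, 4))                       # conductance matrix (arbitrary units)
x = rng.integers(0, 256, size=16)             # 8-bit unsigned activations

y = np.zeros(G.shape[1])
for b in range(8):
    x_bit = (x >> b) & 1                      # binary input vector for bit position b
    I_BL = x_bit @ G                          # analog MVM with binary row voltages
    y += I_BL * (1 << b)                      # digital shift-and-add after the ADC

assert np.allclose(y, x @ G)                  # recovers the full-precision dot products
```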


Figure 3. (a)–(c) Three array topologies considered in this work, shown for differential cells. The input voltages Vi are binary. Black resistors denote memory devices. For offset subtraction, only the $G^+$ cells are used. Topology C is only relevant for differential cells. (d)–(f) Approximate circuit models for the three topologies using linear resistors and ideal select devices.


In Topology A, the select transistors are all gated on during an MVM, connecting one terminal of the memory device to the row, which carries the input signal [3, 19, 20]. Each row voltage Vi is equal to $V_\mathrm D$ if the bit is high, 0 V if low. Since cell currents are supplied from the same line that carries the input signal, parasitic voltage drops are present across both the rows and columns of the array. A square memory cell is assumed such that $R_\mathrm p$ has the same value for the rows and the columns. If using differential cells, negative weights are implemented on separate bitlines, and the positive and negative bit line currents are subtracted using peripheral circuitry, as shown in figure 3(a) [20]. Alternatively, the negative bit line can be driven with the opposite polarity input voltage, and the currents subtracted by Kirchhoff's law [3, 19]. Assuming ideal peripheral circuitry, the effect of array interconnect parasitics in the two cases is the same.

In Topology B, the input is applied to the gates of the select transistors [21, 22, 37]. When the input is high, the transistor connects the memory device to a shared voltage supply ($+V_\mathrm D$). Since the transistor gates draw negligible current, the input signals are not distorted as they travel along the rows to a given cell. Additionally, since the voltage supply is shared by all cells, current can be sourced from a distribution network with relatively small resistance. Therefore, accuracy will be degraded primarily by the parasitic resistance of the bit lines where the cell currents are summed. A further benefit is that when an input is low, all cells connected to the corresponding row draw nearly zero current because they are disconnected from the voltage supply. By contrast, in Topology A, a cell draws zero current only if the voltage across the element is zero; when the input is low, a voltage fluctuation induced by parasitic resistance on the lines can cause current to flow through a cell.

Topology C is a modification of Topology B where the positive and negative cells are interleaved along the array columns and each pair of devices is connected to the same bit line. This topology is only compatible with differential cells. Current subtraction occurs within each pair of cells rather than at the periphery of the array. Since these pairs can either add current or subtract current, cancellation of currents can occur along the bit line to reduce parasitic voltage drops. Unlike Topology A or B, supply voltages of both polarities are necessary in addition to a virtual ground.

In all the neural networks evaluated in this work, the inputs are strictly positive due to ReLU activations except in the first layer. To handle negative inputs with any of the three topologies, the number of cells per weight is assumed to be doubled in the first layer; one cell (or cell pair) implements the weight $+W_{ij}$, and the other implements $-W_{ij}$ on a separate row. If the input is positive, the first row is driven, and if the input is negative, the second row is driven.
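A brief sketch of this first-layer input-splitting scheme (illustrative, with hypothetical dimensions matching layer 1 of ResNet-14):

```python
import numpy as np

# Illustrative first-layer scheme for signed inputs: each weight occupies two rows,
# one holding +W and one holding -W; positive inputs drive the first set of rows and
# the magnitudes of negative inputs drive the second set.
rng = np.random.default_rng(3)
W = rng.standard_normal((27, 16))             # layer-1 weights (rows x columns)
x = rng.standard_normal(27)                   # signed inputs (no preceding ReLU)

W_doubled = np.concatenate([W, -W], axis=0)   # 2x rows: +W stacked above -W
drive = np.concatenate([np.where(x > 0, x, 0.0),    # positive part drives the +W rows
                        np.where(x < 0, -x, 0.0)])  # |negative| part drives the -W rows

assert np.allclose(drive @ W_doubled, x @ W)  # same result as the signed MVM
```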

3.4. Simulation framework

The results in this work are generated using CrossSim, a highly parameterizable simulation framework for analog in-memory MVMs [38]. Keras pre-trained models are imported into CrossSim, which internally represents each layer as a collection of analog MVM arrays. To isolate the effects of parasitic resistance, models for random cell programming errors, conductance drift, and read noise are disabled. Array circuit simulations are DC simulations; the integration time is assumed to be long enough to remove the effect of array transients.

CNN inference requires a large number of MVMs; for instance, ResNet-14 for CIFAR-10 requires over 50 000 bit-wise MVMs per image while ResNet-50 for ImageNet requires nearly 106 bit-wise MVMs per image. To feasibly simulate full CNNs while accounting for parasitic resistance effects, the three topologies in figures 3(a)–(c) are simulated using the approximate circuits in figures 3(d)–(f). In these approximations, a small-signal assumption is made to model the memory device as a linear resistor. The transistors are also assumed to be ideal switches that act as short connections when on and open connections when off. The short circuit approximation is valid when the transistor is substantially more conductive than the memory device in its On state, such that the voltage drop $V_\mathrm{ds}$ across the channel is small.

It is worth noting that if the memory device is sufficiently nonlinear in its IV characteristic, the small-signal approximation above may break down. Some of the ReRAM devices reported in the literature are strongly nonlinear [18], and transistor-based devices have linear and non-linear regimes of operation with respect to the drain-source voltage. For a given parasitic voltage drop, the error induced in a weight value in a nonlinear device may be larger or smaller than that in a linear device, depending on the curvature of the nonlinearity ($d^2I/dV^2$); at the same time, the nonlinearity will also change the size of the voltage drops. Modeling and understanding the complex interaction of parasitic voltage drops with IV nonlinearity in the memory device and/or access device is an important subject of future work.

To simulate the circuits in figures 3(d)–(f), the currents along the rows and columns are first computed assuming zero wire resistance. The voltage drop across each parasitic resistance is then computed, and the resulting node voltages along the rows and columns are used to modify the values of the cell currents. These cell currents are then used to re-compute the line parasitic voltage drops iteratively. The simulation is considered to have converged when the worst-case change in node voltage $\Delta V$ between iterations falls below 0.0002$V_\mathrm D$; this threshold was determined empirically based on the stability of the end-to-end inference accuracy. The linearizing simplifications in the circuit enable each step to be computed using fast matrix operations, accelerated in this work by NVidia V100 GPUs.
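The following is a minimal sketch of this relaxation loop for Topology B (our own simplification for illustration, not the CrossSim source): conductances are normalized to $G_\mathrm{max} = 1$, the memory devices are linear resistors, the select transistors are ideal switches, and parasitic resistance is present only on the bit lines.

```python
import numpy as np

# Rows are ordered from the far end of each bit line (index 0) to the
# readout / virtual-ground end (index -1).
def mvm_with_parasitics(G, x_bit, V_D=1.0, Rp_norm=1e-4, tol=2e-4, max_iter=100):
    V_node = np.zeros_like(G)                             # bit line node voltages, initially 0
    for _ in range(max_iter):
        I_cell = G * x_bit[:, None] * (V_D - V_node)      # cells gated by the input bits
        I_seg = np.cumsum(I_cell, axis=0)                 # segment below node i carries rows 0..i
        V_new = Rp_norm * np.cumsum(I_seg[::-1], axis=0)[::-1]   # IR drops summed up from ground
        if np.max(np.abs(V_new - V_node)) < tol * V_D:    # convergence criterion from the text
            V_node = V_new
            break
        V_node = V_new
    return (G * x_bit[:, None] * (V_D - V_node)).sum(axis=0)

rng = np.random.default_rng(4)
G = np.abs(rng.normal(scale=0.05, size=(576, 64)))  # conductances clustered near G_min = 0
x_bit = (rng.random(576) < 0.3).astype(float)       # ~30% of rows activated for this input bit
ideal = (G * x_bit[:, None]).sum(axis=0)            # error-free bit line currents (V_D = 1)
print(np.mean(ideal - mvm_with_parasitics(G, x_bit)))   # positive: IR drops reduce the currents
```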

For every layer, the numerical range of the 8-bit input activations is chosen to minimize the errors induced by quantization and clipping. This optimization is performed using statistics gathered from 500 randomly chosen calibration images from the CIFAR-10 training set. ADC quantization is also included and the ADC ranges are similarly optimized. An 8-bit ADC is used for all systems. At this resolution, the effects of activation quantization and ADC quantization on CIFAR-10 accuracy are both found to be negligible.
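One simple way to perform such a calibration (an illustrative sketch; the exact criterion used in this work may differ) is to sweep a few candidate clipping ranges over the calibration activations and keep the one with the lowest combined quantization and clipping error:

```python
import numpy as np

# Illustrative 8-bit range calibration over calibration activations.
def quantize(a, a_max, n_bits=8):
    a_clip = np.clip(a, 0.0, a_max)                 # ReLU activations are non-negative
    step = a_max / (2 ** n_bits - 1)
    return np.round(a_clip / step) * step

def calibrate_range(calib_acts, n_bits=8):
    candidates = np.percentile(calib_acts, [90, 95, 99, 99.9, 100])
    errors = [np.mean((quantize(calib_acts, c, n_bits) - calib_acts) ** 2)
              for c in candidates]
    return candidates[int(np.argmin(errors))]

acts = np.abs(np.random.default_rng(5).normal(size=100_000))   # stand-in activation statistics
print(calibrate_range(acts))
```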

To reduce digital processing overheads in the accelerator, batch normalization parameters are folded into the weight matrix of the preceding convolution layer [39]. Bias weights are stored digitally outside of the array. For computational tractability, each accuracy result in the sensitivity analyses below is based on a fixed subset of 1000 images from the CIFAR-10 test set.
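Batch-norm folding follows the standard algebraic identity; a generic sketch (not the paper's code) for a layer with per-output-channel statistics:

```python
import numpy as np

# Standard batch-norm folding: the BN scale and shift are absorbed into the preceding
# layer's weights and bias so that only the folded weights are programmed into the array.
def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-3):
    """W: (n_out, n_in) weights, b: (n_out,) bias; BN statistics are per output channel."""
    scale = gamma / np.sqrt(var + eps)
    return W * scale[:, None], (b - mean) * scale + beta

rng = np.random.default_rng(6)
W, b = rng.standard_normal((64, 576)), rng.standard_normal(64)
gamma, beta = rng.random(64) + 0.5, rng.standard_normal(64)
mean, var = rng.standard_normal(64), rng.random(64) + 0.1

x = rng.standard_normal(576)
y_bn = gamma * ((W @ x + b) - mean) / np.sqrt(var + 1e-3) + beta   # layer followed by batch norm
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
assert np.allclose(W_f @ x + b_f, y_bn)                            # folded layer is equivalent
```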

4. Data mapping dependence of parasitic resistance effects

This section investigates how the way that weight values are mapped to cell conductances influences an analog accelerator's sensitivity to parasitic resistance. For any system, the specific mapping that is used can be both an architectural design choice and a consequence of technology limitations. In this section, all results are based on ResNet-14 with Topology B.

4.1. Negative number handling and proportionality

As discussed in section 3.2, negative number handling via differential cells maps weights proportionally to conductances, while offset subtraction does not. This proportionality property is very important, because the parasitic voltage drops $\delta V_{ij}$ are proportional to the accumulated bit line currents, and these currents increase proportionally with the cell conductance. Thus, the higher the average cell conductance, the greater the accuracy degradation caused by parasitic interconnect resistance.

Figure 4 shows the stark difference between the two weight mapping schemes in terms of the average parasitic-resistance-induced bit line current error, $\delta I_\mathrm{BL}$. The error is averaged over all bit lines of all layers that use the same number of rows, across all MVMs over a set of 100 CIFAR-10 images. Because differential cells use much lower conductances, as illustrated in figure 2, their average bit line error is more than an order of magnitude lower for the same parasitic resistance.


Figure 4. Bit line current errors $\delta I_\mathrm{BL}$ vs input bit position for differential cells and offset subtraction, for (a) arrays with 576 rows and (b) arrays with 144 rows. $\delta I_\mathrm{BL}$ is expressed in units of the maximum cell current, $I_\mathrm{max} = G_\mathrm{max} V_\mathrm D$, assuming a normalized parasitic resistance of $R_{\mathrm {p,norm}} = 10^{-5}$. LSB = least significant bit, MSB = most significant bit.


Figure 4 also shows how the parasitic resistance induced error changes with the input bit position and the array size. The error decreases sharply for the most significant bits of the 8-bit input. This is because the activation values at the input of the layer tend to cluster around zero, and thus the most significant bits are high only for a few large outlier inputs. As a result of this sparsity, only a very small fraction of the rows are activated for these bit positions, greatly reducing the parasitic voltage drops. This is a favorable trend, since the most significant input bits are weighted most heavily in the MVM output. Comparing figures 4(a) and (b) also shows, as expected, that the bit line error is smaller for arrays with fewer rows. For the evaluated array topology, the number of columns in the array does not affect the average $\delta I_\mathrm{BL}$.

In addition to lower currents arising from proportionality, differential cells possess another kind of parasitic resistance resilience that is absent with offset subtraction. In both the positive and negative bit lines of the array, the effect of parasitic resistance is always to reduce the voltage across a memory cell: $\delta V_{ij} \lt 0$ for Topologies A and B. Thus, the error in bit line current in these topologies is also always negative. Since both the positive and negative bit line currents are always perturbed in the same direction, the error induced by parasitic resistance partially cancels when the two bit line currents are subtracted. In offset subtraction, the subtracted offset is digitally computed and is unaffected by parasitic resistance; therefore, there is no beneficial cancellation of errors.

Figure 5 shows the CIFAR-10 inference accuracy with ResNet-14 as a function of parasitic resistance. Consistent with the errors in figure 4, differential cells are more resilient to parasitic resistance effects by about two orders of magnitude: differential cells maintain high accuracy up to about $R_{\mathrm {p,norm}} = 10^{-4}$, while offset subtraction begins to lose accuracy at about $R_{\mathrm {p,norm}} = 10^{-6}$. Thus, to minimize parasitic resistance effects, the system should map weight values proportionally to conductances.


Figure 5. CIFAR-10 accuracy with ResNet-14 vs normalized parasitic resistance for differential cells vs offset subtraction. An infinite On/Off ratio and Topology B are assumed. The dashed line shows the accuracy with no parasitic resistance on this 1000-image subset (90.9%).


4.2. Memory cell On/Off ratio

As discussed in section 3.2, a finite memory cell conductance On/Off ratio leads to imperfect proportionality between weights and conductances even with differential cells. This is important since most neural network layers are dominated by small weights, which should be mapped to as small a conductance as possible to minimize array currents and parasitic voltage drops. Figures 6(a) and (b) show how CIFAR-10 accuracy degrades for cells with low On/Off ratio. For this network, a cell with an On/Off ratio of 100 has effectively the same parasitic resistance sensitivity as an ideal cell with infinite On/Off ratio. Meanwhile, a cell with an On/Off ratio of 4 is about 5 × more sensitive to parasitic resistance, due to increased average cell currents. Therefore, a sufficiently high On/Off ratio is needed in the memory cell to fully exploit the abundance of small weights in the neural network to mitigate parasitic resistance effects. Nonetheless, using differential cells with a small On/Off ratio of 4 still provides an order-of-magnitude improvement in sensitivity over offset subtraction with an infinite On/Off ratio.
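A rough back-of-the-envelope illustration of this effect (assumed synthetic weights clustered near zero, mapped with differential cells as in section 3.2): the total conductance of each differential pair, and hence the array current, grows as the On/Off ratio shrinks.

```python
import numpy as np

# With differential cells, a zero weight ideally maps to G_min = 0, but with a finite
# On/Off ratio r both cells of the pair sit at G_min = G_max / r, raising the currents.
rng = np.random.default_rng(7)
W = rng.normal(scale=0.05, size=100_000)            # weights clustered around zero
G_max = 1.0

for ratio in (np.inf, 100, 10, 4):
    G_min = 0.0 if np.isinf(ratio) else G_max / ratio
    alpha = (G_max - G_min) / np.abs(W).max()       # largest |weight| maps to G_max
    G_pair = 2 * G_min + alpha * np.abs(W)          # total conductance of each differential pair
    print(f"On/Off = {ratio}: mean pair conductance = {G_pair.mean():.3f} x G_max")
```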


Figure 6. (a) CIFAR-10 accuracy with ResNet-14 vs normalized parasitic resistance for differential cells with several values of the memory cell On/Off ratio. Array topology B is assumed. (b) Accuracy vs On/Off ratio at a normalized parasitic resistance of 10−4.


The On/Off ratio varies greatly among the non-volatile memory devices that have been used for analog neural network accelerators, some of which are listed in table 1. Floating-gate flash and charge trap memory, for example, can exploit the intrinsically large current On/Off ratio of field-effect transistors in response to a gate bias [21, 40].

5. Array topology dependence of parasitic resistance effects

This section compares the parasitic resistance sensitivity of the array topologies presented in section 3.3 and explores how limiting the size of the array reduces parasitic resistance effects. All results are based on ResNet-14 with differential cells and infinite On/Off ratio.

5.1. Array topology

The spatial distribution of cell current errors $\delta I_{ij}$ is shown in figure 7(a) for layer 11 of ResNet-14 using the three array topologies in figure 3. Since parasitic resistance effects depend not only on the weight matrix but also on the input vector, the errors are shown for arrays that are driven by the specific input bit vector shown on the left. This vector corresponds to the input LSB for an MVM that was chosen at random during CIFAR-10 inference. In all three cases, the error is smallest in the lowermost rows of the array, whose bit line voltages are closest to virtual ground.


Figure 7. (a) Distribution of cell current errors for the array corresponding to layer 11 in ResNet-14, using the three electrical topologies in figure 3. The error is computed for the input bit vector shown on the left and is expressed as a fraction of the maximum cell current $I_\mathrm{max}$. The colorbar limits were chosen for visibility and do not contain the full range of observed cell current errors. (b) Distribution of the dot product error among the columns for the MVM shown in (a).


Topologies A and B contain exclusively negative errors at the cell current level: $\delta I_{ij} \leqslant 0$. This is because all cells connected to a bit line conduct current in the same direction, and thus the bit line voltage increases monotonically from zero starting at the bottom of the line, reducing the voltage across the memory cells. For Topology A, the input voltage is also attenuated by parasitic resistance along the rows, but this effect is much smaller due to the large aspect ratio of the $576 \times 64$ matrix. A crucial difference is that in Topology B, cells that are connected to a low input bit are gated off, and are guaranteed to conduct negligible current as desired ($\delta I_{ij} = 0$); this can be seen in figure 7(a), where the error spatial distribution for Topology B contains stripes that follow the input bit vector. By contrast, in Topology A, a cell that should ideally conduct zero current due to a low input would conduct a slightly negative current since the voltage on the bit line exceeds the voltage on the row due to IR drops. These cells do slightly reduce the amount of current on the bit line, which can reduce the errors in the cells lying above them.

Figure 7(b) shows the dot product errors for the same MVM, obtained for Topologies A and B by subtracting the accumulated currents on the positive and negative bit lines. Topology B has significantly lower dot product errors than Topology A due to the large number of disconnected cells on each column that are unaffected by parasitic voltage drops. It is also worth noting that for both Topologies A and B, the parasitics-induced error in the dot product is comparable to the largest single-cell errors, because the majority of the error accumulated on the two bit lines is canceled when their currents are subtracted. This is a consequence of using differential cells, as discussed in section 4.1.

Unlike the other topologies, Topology C contains both polarities of cell current error since cells can add or subtract current from the bit line. The cell errors are much smaller on average, due to the cancellation of cell currents along the bit line. This array also inherits from Topology B the benefit of zero error in cells that are connected to a low input. Comparing the dot product errors in figure 7(b), however, Topology B and Topology C have very close to the same error. To understand this, we first note that both topologies offer cancellation of currents in the positive and negative cells; this occurs in Topology B when two bit line currents are subtracted at the periphery, and in Topology C when cell currents are subtracted inside the array. An important observation about Topology C is that the voltage still changes monotonically along the bit line, though this time it can increase or decrease from the virtual ground. This can be seen by closely inspecting figure 7(a), where the cell errors in any given bit line all have the same sign. The implication is that although both positive and negative weights are present on a bit line, allowing currents to cancel, the parasitic voltage drops do not cancel but rather reinforce each other along a bit line. Therefore, the two topologies can be summarized as follows: Topology B offers greater cancellation of errors caused by its IR drops, while Topology C enables smaller IR drops but their effects do not cancel. The net effect of parasitic resistance appears to be roughly equivalent in the two cases, even though Topology C has lower maximum currents on its bit lines.

Figure 8 shows the CIFAR-10 inference accuracy with ResNet-14 using the three topologies. Topology A is significantly more sensitive to parasitic resistance. As mentioned above, the fact that parasitic resistance exists along the rows is a relatively minor factor, due to the large aspect ratio of all the largest weight matrices in table 2. Instead, the large accuracy loss with Topology A is due to the nonzero error in the unactivated rows of the array. In general, the unactivated rows outnumber the activated rows at any input bit position. In the higher bits, this is due to the clustering of activations around small values, as also observed in figure 4. In the lowest bits, the unactivated rows are still the majority due to the sparsity induced by ReLU activations.


Figure 8. CIFAR-10 accuracy with ResNet-14 vs normalized parasitic resistance for the three array topologies in figure 3. Differential cells with an infinite On/Off ratio are assumed.


Topologies B and C have almost the same sensitivity to parasitic resistance when evaluated based on end-to-end inference accuracy. This is consistent with the results in figure 7(b) and the observation above that the two topologies have differing advantages that ultimately make their dot products equally robust to parasitic resistance.

5.2. Array size

The size of an array is clearly an important determinant of the magnitude of accuracy losses incurred by parasitic resistance; the larger the array, the greater the parasitic voltage drops accumulated on the bit lines in an MVM. The utilized array size depends on the dimensions of the weight matrix. For ResNet-14, the largest weight matrices have 576 rows. For more complex networks, this can be much larger; for instance, the largest matrices in ResNet-50 for ImageNet have 4608 rows. Clearly, an array of this size would be infeasible, both because of MVM accuracy degradation and because of the difficulty of programming such a large array, as described in section 2.1.

In practice, a large weight matrix can be partitioned spatially across multiple arrays, each of which contains an ADC and produces digital outputs. If more rows are needed than available in an array, the partial results from multiple arrays are added digitally. To maximize energy efficiency, the arrays should be made as large as feasible, since having many arrays increases the energy cost of peripheral circuits and ADCs [5, 41].
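Row-wise partitioning with digital summation of the partial products can be sketched as follows (illustrative only; the matrix sizes and partition bounds here are hypothetical):

```python
import numpy as np

# Each sub-array computes a partial MVM over its slice of rows, and the digitized
# partial results are summed.
def partitioned_mvm(W, x, max_rows):
    n_rows = W.shape[0]
    n_parts = int(np.ceil(n_rows / max_rows))
    bounds = np.linspace(0, n_rows, n_parts + 1, dtype=int)     # even partition of the rows
    partials = [W[a:b].T @ x[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    return np.sum(partials, axis=0)                             # digital summation of partial results

rng = np.random.default_rng(8)
W, x = rng.standard_normal((576, 64)), rng.standard_normal(576)
assert np.allclose(partitioned_mvm(W, x, max_rows=144), W.T @ x)
```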

Figure 9 shows how the inference accuracy of ResNet-14 changes when the matrices are partitioned into arrays with a maximum limit on the number of rows. If the matrix exceeds the maximum array size, it is partitioned evenly across multiple arrays. When the maximum is reduced from 576 rows (no partitioning) to 288 rows, the sensitivity to parasitic resistance improves slightly. This operation partitions matrices with 576 rows (layers 11, 13, 14). A more substantial improvement in sensitivity comes from reducing the maximum array size to 144 rows; this operation further partitions the three layers above, and also partitions layers 6, 8, 9, and 10. In general, the end-to-end inference accuracy will not change smoothly as a function of the maximum array size. This is because different layers in the network can have different parasitic resistance sensitivities for the same array size, due to differences in the value distributions of their weights and activations. Furthermore, the network's inference accuracy may be more sensitive to errors in some layers than in others. For ResNet-14, it is possible that reducing the array size from 288 to 144 rows is more beneficial largely due to its effect on layers 6, 8, 9, and 10.


Figure 9. CIFAR-10 accuracy with ResNet-14 vs normalized parasitic resistance using arrays with different upper limits on the number of rows. Differential cells with an infinite On/Off ratio and array Topology B are assumed.


6. Parasitic resistance sensitivity vs neural network architecture

The neural network model and the complexity of the dataset both play important roles in the sensitivity of the analog inference accelerator to parasitic resistance errors. In this section, we explore the sensitivity of networks other than ResNet-14 and datasets other than CIFAR-10. For all simulations, differential cells are assumed with infinite conductance On/Off ratio. Array Topology C is assumed in this section for computational tractability when simulating large networks; based on figure 8, a similar sensitivity is expected when using Topology B.

6.1. Network size and depth

Figure 10 shows the parasitic resistance sensitivity of three residual networks trained on CIFAR-10: ResNet-14, ResNet-32, and ResNet-56 [31]. The three networks use the same residual block structure and differ only in their total depth. Each network has a maximum weight matrix size of $576 \times 64$. The number of parameters and the ideal (error-free), floating-point accuracies of the three networks on the full CIFAR-10 dataset are shown in table 3.


Figure 10. CIFAR-10 accuracy vs normalized parasitic resistance for three ResNet models. Differential cells, an infinite conductance On/off ratio, and Topology C are assumed.


Table 3. Evaluated neural networks.

| Network | Dataset | # Weights | Ideal floating-point accuracy | Max # rows |
| --- | --- | --- | --- | --- |
| ResNet-14 | CIFAR-10 | 175.6 k | 90.73% | 576 |
| ResNet-32 | CIFAR-10 | 467.9 k | 91.88% | 576 |
| ResNet-56 | CIFAR-10 | 857.7 k | 92.34% | 576 |
| CNN-6 | MNIST | 61.6 k | 98.58% | 294 |
| ResNet-50v1.5 | ImageNet | 25.6 M | 76.46% | 4608 |

There are two competing effects on parasitic resistance sensitivity as the network is made deeper. On one hand, in a deeper network, errors induced by parasitic resistance have the opportunity to cascade through more layers before they reach the network output; the errors can accumulate with depth, and this potentially increases the sensitivity. On the other hand, networks with a larger number of parameters are generally more tolerant of errors in weights and activations since their larger information capacity provides a greater degree of redundancy [42]. For the three ResNets studied here, figure 10 shows that the second effect is stronger. The deeper networks are marginally more resilient to parasitic resistance errors.

One factor that contributes to the trend in figure 10 is that, as shown in figure 9, the fall-off in accuracy is largely due to weight matrices that use 288 or more rows, where the effect of parasitic voltage drops is greater. Besides the greater sensitivity to parasitic resistance, another consequence of the large array size is redundancy, due to having more channels and more weights. In all three ResNets, the layers are ordered such that the later layers have the most redundancy. Therefore, the early layers—which have the least redundancy and which can potentially source errors that ripple through the entire network—are least impacted by parasitic resistance. This fortuitous property helps to explain why the effect of redundancy outweighs that of error accumulation in figure 10.

6.2. Sensitivity comparison of datasets

The complexity of the dataset plays an important role in the sensitivity of a neural network to errors. Here, we compare the accuracy loss caused by parasitic resistance in neural networks trained on three different image recognition tasks: MNIST handwritten digits ($28 \times 28$ resolution, 10 classes) [43], CIFAR-10 ($32 \times 32$ resolution, 10 classes) [30], and ImageNet ($224 \times 224$ resolution, 1000 classes) [28]. The networks used for MNIST and ImageNet are shown in table 3. CNN-6 is a custom CNN with four convolution layers and two fully-connected layers; the four convolutions have 3, 3, 6, and 6 output channels, respectively. ResNet-50v1.5 is a modification of the ResNet-50 network in He et al [31]. We use the reference model that is part of the MLPerf Inference Benchmark [44], which is available for download online [45]. Due to the very large number of rows needed for the largest weight matrices in ResNet-50v1.5, these matrices are partitioned into arrays with at most 1152 rows. No partitioning is done for the other networks. The simulated accelerators for all networks use 8-bit quantized weights, 8-bit activations and 8-bit ADCs with calibrated ranges.

Figure 11 shows the parasitic resistance sensitivity of CNN-6 on MNIST, ResNet-56 on CIFAR-10, and ResNet-50v1.5 on ImageNet. For computational tractability, these results are obtained over fixed subsets of 1000, 1000, and 500 images from the three test sets, respectively. There is a substantial difference in error sensitivity across the three tasks, from the easiest (MNIST) to the most difficult (ImageNet), highlighting the importance of choosing a dataset of appropriate complexity when evaluating the accuracy of an inference accelerator. Nonetheless, assuming a realistically achievable normalized parasitic resistance of $R_{\mathrm {p,norm}} = 10^{-5}$, it is possible to fully suppress the effects of parasitic resistance on accuracy even in a state-of-the-art computer vision neural network like ResNet-50v1.5.


Figure 11. Dependence of inference accuracy on normalized parasitic resistance for neural networks trained on three different datasets. Differential cells and Topology C are assumed.


7. Conclusion

The parasitic wire resistance in memory arrays is an intrinsic source of computational errors in neural network accelerators based on analog in-memory MVMs. This work evaluates how various design decisions in an analog accelerator—at the technology, circuit, architecture, and algorithm levels—influence the sensitivity of its accuracy to parasitic resistance. While techniques exist to compensate or train the network around these errors, this work demonstrates that it is possible to mitigate these errors to a large extent by optimizing the memory cell technology and appropriately designing the hardware. An accelerator that is intrinsically robust to parasitic resistance has the following properties:

  • Weight values should be mapped proportionally to the conductance values that represent them. When handling negative weights, differential cells should be used, rather than offset subtraction.
  • The memory cell should have a high conductance On/Off ratio (at least 10, preferably $\gt$100) in order to maximize the proportionality of the mapping.
  • Inputs should be passed one bit at a time to the array, and should be used as signals that gate the current on or off through a cell. Cell current should not be supplied through the input signal line. Topologies B and C satisfy both properties.

After taking the above measures, a higher inference accuracy can be obtained for the same ratio of minimum memory cell resistance to the metal interconnect resistance between cells. This paper shows that accuracy can remain resilient to parasitic resistance effects so long as the cell is $10^4$ times (for CIFAR-10) or $10^5$ times (for ImageNet) more resistive than the interconnect. These resistance ratios are readily accessible with scaled metal interconnects and memory cells that have been demonstrated to date for in-memory MVM—see table 1. If the memory cell technology does not have sufficient resistance or On/Off ratio, the effects of parasitic resistances can be further mitigated using previously proposed compensation or re-training techniques. These strategies can be used to complement the circuit- and system-level mitigation approaches above to further improve robustness. For example, the techniques described here can sufficiently reduce the IR drops to keep the memory devices in a more linear regime, making the compensation more effective. Ultimately, the reduction of parasitic resistance effects on accuracy is a significant step in enabling large arrays (∼1000 rows), which are inherently more energy-efficient because they amortize the peripheral circuit energy across more analog operations.

An important future step is to quantify the degree to which the effects of parasitic voltage drops are exacerbated by the nonlinear IV properties of the memory cell—whether in the access device or the memory device itself. Compared to the study of linear effects described here, these nonlinear effects will depend more strongly on the specific cell technology. Nonetheless, we expect that the approaches described here will still mitigate parasitic resistance effects to some degree, as they will still reduce the average cell currents used for MVM.

Acknowledgments

This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy's National Nuclear Security Administration under Contract DE-NA0003525.

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).
