Using the multi-bit feature of memristors for register files in signed-digit arithmetic units

Dietmar Fey

doi:10.1088/0268-1242/29/10/104008

1. Introduction

As early as 1971, Chua [1] postulated that a further fundamental two-terminal passive circuit element should exist besides the resistor, the capacitor, and the inductor, which he called the memristor. This element should be able to change its resistance persistently in dependence on an externally applied electrical flux that is related to the device's internal charge. Since the discovery of such memristive features in nanoscale devices by a research group around S Williams at HP labs in 2008 [2], [3], much research work has been triggered on the technological side concerning the physical realization of such devices, aiming at a better understanding of the physical principles and the tuning of such devices. In addition, circuit and computer architects [4–6] have become interested in how the features of this innovative nanotechnology can improve future computing units in a substantial way that is not possible with pure CMOS technology.

Memristive devices offer quite a lot of features that are beneficial for computer architectures, e.g. the non-volatility of their internal state after electrical power is switched off, comparatively fast access times, low energy consumption, compactness, and, in the author's view one of the most important features, compatibility with CMOS technology. It is to be expected that the first commercial application of memristive devices will be storage. HP intends, in partnership with Hynix, to use memristor technology for new high-density binary memory devices, which will succeed current flash memory devices whose scaling properties will come to an end within the next four years [7].

Furthermore, there is current research activity on using memristors for the next step after storing data, namely for realising memristor-based data processing. Generally, processing with memristors can be divided into two branches: (i) digital processing and (ii) analogue processing. Using memristors for digital processing has the advantage of combining storage and logic functionality with the same technology in one single device. The review article in [8] summarizes the earliest proposals on how to use memristor technology in a beneficial way for digital and analogue processing. Its authors propose crossbar architectures in which memristive devices are integrated in an add-on stack on top of a conventional CMOS chip to form hybrid solutions combining memristor technology with CMOS logic. This can be used, e.g. for the realisation of new FPGAs (field programmable gate arrays), in which the configuration bits necessary for programming a CMOS FPGA are stored in memristors directly above the FPGA circuit. This avoids the chip area, larger by a factor of 10 to 100, that has to be spent for conventional flash devices and SRAM (static random access memory) cells, which are usually used to realise the FPGA configuration memory. Furthermore, it was proposed to use these memristive/CMOS integrating crossbar architectures for further computing applications such as processing with neurons and resistor-diode based logic operations in which the CMOS technique is primarily used for signal restoration and inversion.

Currently, different possibilities are discussed for using memristive features to realise Boolean logic. One of them is the so-called stateful IMPLY logic [9], in which a pair of memristors is used in a three-phase mode. First, an initial state is written into the memristor pair. Second, the states possibly change depending on the input data, which is applied as electrical flux to the memristors. Finally, either the changed or the unchanged state is read out, which corresponds to the result. In particular, this realises the Boolean implication operator.

Another possibility for realising logic is the so-called ratio logic [10]. Here an input current is applied either to the positive or to the negative port of two-terminal memristors connected in a parallel circuit. Evaluating the ratio of the resistance values stored in the devices allows a Boolean logic to be established. A third proposal for building logic functions with memristors is a 1-to-1 mapping of CMOS $n$- and $p$-transistor networks onto an equivalent network of memristors [11].

The widest field of proposals on how to use memristors for processing definitely concerns analogue computing [12]. Many researchers are driven by the fascinating possibility of using memristors to mimic the excitatory and inhibitory effects of a neuron in spike-timing-dependent plasticity (STDP) neural networks. An STDP neuron is directly mapped onto the nature of a memristor in such a way that the direction of the electric flux applied to the memristor ports causes either an increase or a decrease of the memristor's continuous state variable [13].

In addition to these activities, the work presented in this paper draws attention to a feature of memristors that allows a computing circuit to be improved fundamentally and that has not been strongly considered in the literature so far, namely the possibility of storing multi-stable states in a memristor. In particular, we propose exploiting a memristor showing three-state storage behaviour to support the realization of a ternary logic. In this sense our proposal belongs to the digital processing branch of memristors. However, in contrast to the proposals mentioned above, such as IMPLY logic, ratio logic and memristor logic modelled on CMOS transistor networks, we prefer, as a first step, to continue processing data in conventional CMOS logic. This logic can nevertheless be sped up and realised with less area if we use memristors for storing ternary values. To the best of our knowledge this is the first published work that proposes using memristors as the base element for a future ternary nanocomputer. In future steps it is also worthwhile to think about realising the ternary logic directly with memristors. Currently, we are convinced, and we will show in this paper, that just using the possibility of storing a ternary value in one physical storage cell allows building a better arithmetic unit than is fundamentally possible, and actually done, with conventional binary logic.

The rest of this paper is organised as follows. Section 2 explains the basic physical principle of a memristor and in particular its ability to realise multi-stable and ternary states. This is done by means of a SPICE model from Biolek [14], which is well known in the literature. Section 3 shows the fundamentals and benefits of ternary logic and presents a digital transistor netlist for realising an adder which is based on three-state valued operands. This adder can process a ternary and a binary input operand in $O(1)$ steps independent of the word length $n$. Section 4 combines the transistor netlists of the memristor and the adder by a proposed interface logic that converts the ternary state stored in a memristor cell to binary Boolean signals. All presented circuits are modelled on the analogue level using SPICE descriptions. Section 5 compares the solution to a conventional binary ripple-carry and carry-look-ahead adder concerning run time and area effort. Finally we finish the paper with a summary of the most important results and an outlook on the next intended steps.

2. Realising multi-stable states in memristors

Generally the base material of a memristor, or more exactly a memristive switch, consists of so-called transition metal oxides, e.g. titanium dioxide ($Ti{{O}_{2}}$) or strontium titanate ($SrTi{{O}_{3}}$). Transition metal oxides combine both insulating and metallic conductive properties in one device. This behaviour can be traced back to two distinct zones located inside the material. One zone, the undoped zone, behaves more like an insulator due to occupied outer electron orbitals, whereas the neighbouring doped zone behaves as a conductor due to a high oxygen reduction resulting in a low electrical resistance. By applying a voltage over time across both ends of the zones one generates a flux, which can shift the zones laterally within the device dimension length $D$. This changes the resistance of the memristor, ${{R}_{MEM}}$, which is the sum of the resistances of both zones. If $w$ is the width of the doped zone, and if ${{R}_{ON}}$ and ${{R}_{OFF}}$ are the two extreme values of resistance in the case of a completely doped zone, i.e. $w=D$, and a completely undoped zone, i.e. $w=0$, then (1) holds for the memristor resistance ${{R}_{MEM}}$ of a memristive device.

$\begin{eqnarray}\begin{array}{rcl} {{R}_{MEM}}\left( x \right) & = & {{R}_{ON}}\cdot x+{{R}_{OFF}}\cdot \left( 1-x \right), \\ {\rm where}\ \ x & = & \frac{w}{D}\in \left( 0,1 \right) \\ \end{array}\end{eqnarray} \tag{ 1 }$

The mathematical model of (1) expresses a linear ion drift. Actually, the change of the zone width $w$, denoted as the state variable, and consequently also the change of the normalized state variable $x$, is highly non-linear in real devices. In order to express that non-linearity one defines a so-called window function, $f\left( x \right):x\to (0,1)$, (2). Different window functions have been published in the literature, which model the behaviour of a real device in differing degrees of detail. For example, they model differently the speed of the ion drift between the boundaries $w=0$ and $w=D$ of the device. The common property that all window functions have to fulfil is that they must produce zero at the boundaries, $f(0)=f(1)=0$, since the drift of the ions has to stop there. In this paper the window function (2), published by Joglekar [15], is used. Its exponent $p$ is a measure of the strength of the non-linearity.

$\begin{eqnarray}&&f\left( x \right)=1-{{\left( 2x-1 \right)}^{2p}}\end{eqnarray} \tag{ 2 }$

This window function is used as weighting factor in (3), which expresses according to Strukov et al [2] the time derivative of the normalized state variable, $x$ , where ${{\mu }_{v}}\approx {{10}^{-14}}\frac{{{m}^{2}}}{Vs}$ corresponds to the dopant mobility, and $i(t)$ to the induced current flowing through the memristive device by an applied voltage $v(t)$ (4).

$\begin{eqnarray}&&\frac{dx}{dt}=k\cdot i\left( t \right)\cdot f\left( x \right),\quad k=\frac{{{\mu }_{v}}{{R}_{ON}}}{{{D}^{2}}}\end{eqnarray} \tag{ 3 }$

$\begin{eqnarray}&&i\left( t \right)=v\left( t \right)/{{R}_{MEM}}\left( w \right)\end{eqnarray} \tag{ 4 }$

Inserting (1) in (4), and inserting (4) and (2) in (3) yields a differential equation for the normalized state variable $x$. Biolek et al published in [14] a SPICE equivalent circuit for solving this differential equation. This SPICE model was used in this work in order to find appropriate parameters that generate a suitable multi-stable state behaviour serving as data input for the intended ternary arithmetic unit. Figure 1 shows a schematic and the simulation results, produced in LTSpice, for a simple memristor triggering circuit. The memristor parameters, ${{R}_{ON}}=100\;\Omega$, ${{R}_{OFF}}=38\;k\Omega$, ${{R}_{INIT}}=28\;k\Omega$, and $p=1$ for the window function were taken from [14], where an example of a multi-stable memristive device was shown for that parameter set. In the original simulation four stable states were produced and traversed in succession, as shown on the left side in figure 1.
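Written out, the substitution described at the beginning of the preceding paragraph gives the following form, obtained directly from (1) to (4) with ${{R}_{MEM}}$ taken as a function of $x=w/D$:

$\begin{eqnarray}&&\frac{dx}{dt}=\frac{{{\mu }_{v}}{{R}_{ON}}}{{{D}^{2}}}\cdot \frac{v\left( t \right)}{{{R}_{ON}}\cdot x+{{R}_{OFF}}\cdot \left( 1-x \right)}\cdot \left( 1-{{\left( 2x-1 \right)}^{2p}} \right)\end{eqnarray}$

The right-hand side vanishes for $x=0$ and $x=1$ because of the window function, so the state variable cannot leave the interval between the two boundaries.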

**Figure 1.** Simulation of a multi-stable memristor with three states.

The parameters of the voltage source used for generating the ion drift were changed compared to [14] in order to produce the tri-stable behaviour needed for the three-state arithmetic unit (see figure 1, right). The amplitude of the sinusoidal voltage was reduced from 1.5 V to 1.3 V, and the frequency of the sine wave was changed from $2\pi$ to $\pi$. The curve shown for the voltage V(memristor:x) in the wave diagram corresponds to the inner node $x$ in the SPICE model of the memristor, and is equivalent to the state variable $x$. With each positive voltage wave a new stable state is generated. After applying two sine waves the level was raised twice. Together with the base level, three states are thus available. The initial state, ${{x}_{0}}$, with which the memristor operation starts is also decisive for achieving such a multi-stable behaviour. This initial value is determined by a start memristor value, denoted as ${{R}_{INIT}}$, according to (5).

$\begin{eqnarray}&&{{x}_{0}}=\frac{{{R}_{OFF}}-{{R}_{INIT}}}{{{R}_{OFF}}-{{R}_{ON}}}\end{eqnarray} \tag{ 5 }$
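As a rough illustration of (1) to (5), the following Python sketch integrates the state equation with a simple forward Euler step. It is not the SPICE model of [14] and is not claimed to reproduce the waveforms of figure 1. The device length `D` is an assumed value that is not given in the text, and the function names and the clamping of $x$ to the interval from 0 to 1 are choices made only for this sketch.

```python
import math

R_ON = 100.0    # ohms, parameter set quoted in the text
R_OFF = 38e3    # ohms
R_INIT = 28e3   # ohms
P = 1           # exponent p of the window function (2)
MU_V = 1e-14    # m^2/(V s), dopant mobility
D = 10e-9       # m, ASSUMED device length (not stated in the text)


def window(x, p=P):
    """Window function (2); zero at both boundaries x = 0 and x = 1."""
    return 1.0 - (2.0 * x - 1.0) ** (2 * p)


def r_mem(x):
    """Memristor resistance (1) for the normalized state x."""
    return R_ON * x + R_OFF * (1.0 - x)


def initial_state(r_init=R_INIT):
    """Initial normalized state x0 according to (5)."""
    return (R_OFF - r_init) / (R_OFF - R_ON)


def simulate(v_of_t, t_end, dt=1e-4):
    """Forward Euler integration of (3) with (4); returns a list of (t, x)."""
    k = MU_V * R_ON / D ** 2
    x = initial_state()
    trace = [(0.0, x)]
    for n in range(int(round(t_end / dt))):
        t = n * dt
        i = v_of_t(t) / r_mem(x)            # current (4)
        x += dt * k * i * window(x)         # state update (3)
        x = min(1.0, max(0.0, x))           # keep x inside the device
        trace.append(((n + 1) * dt, x))
    return trace


if __name__ == "__main__":
    # 1.3 V amplitude as in the text; the angular frequency pi is an assumption.
    trace = simulate(lambda t: 1.3 * math.sin(math.pi * t), 2.0, dt=1e-3)
    print("x0 =", trace[0][1], " x(end) =", trace[-1][1])
```

For the quoted parameters, `initial_state()` evaluates (5) as $10000/37900\approx 0.264$. How many stable levels appear depends on the excitation and on the details of the device model, so the sketch is meant only for exploring parameter choices.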

By means of simulation, a first appropriate SPICE model was thus found for a memristor that allows three states to be stored. Next, this model is used for ternary register cells, which will serve as input for a binary arithmetic unit operating on three-state operands. This unit is explained in detail in the next section.

3. Three-state binary SD arithmetic unit

For nearly 50 years since the invention of the IAS (Institute for Advanced Study) computer by John von Neumann et al we have been using a binary system in nearly every computer. In contrast to that classical binary system, a ternary number system differentiates not between two but between three states. As early as the 17th century, the Spanish bishop and scholar Caramuel y Lobkowitz (1606-1682) investigated in his treatise Mathesis biceps, vetus et nova number systems with different bases, e.g. a number system with the digits $0,1$, and $2$. In the 18th century Abraham Gotthelf Kästner proved that each number can be composed in a weighted ternary system as a sum of powers of 3, each of which is weighted by a plus one, a minus one, or a zero. Such a system with the weights $\bar{1}=-1,0,1$ was later called the balanced ternary system by Donald E Knuth [18]. In 1961 Avizienis [19] proposed using such balanced number systems with plus and minus weighted digits to build up a fast carry-free parallel arithmetic. He denoted such number systems as signed-digit (SD) number representations. This SD number system offered, like the ternary system, a non-redundant number representation, i.e. for each number there exists a unique representation. However, such systems have the disadvantage that they are difficult to implement in digital electronics, which is based on binary logic operations. Therefore, in 1988 Parhami [20] proposed using binary SD numbers with balanced ternary weights but with a significance of ${{2}^{i}}$ in each digit position $i$ rather than ${{3}^{i}}$ as in a pure ternary system. Then carry-free adders can also be implemented in digital electronics. However, the price for that carry-free addition is the loss of non-redundancy, i.e. one number can have different representations. Worse, all data has to be stored with double the effort compared to a pure binary representation.

If a new device, such as a memristor, allows three states to be stored reliably in one single physical storage cell, and if this device is also compatible with high-density CMOS logic, then a carry-free adder could be realised in current semiconductor technology without the drawback of doubled storage space. None of the number systems described above that differ from the binary system has been implemented in a real computer system, with the exception of the pure ternary system based on the factors ${{3}^{0}}$, ${{3}^{1}}$, ${{3}^{2}}$, etc. This number system was used in the computer SETUN by Brousentsov, built in the former Soviet Union [16], [17]. Thanks to the features of memristive devices, carry-free adders can become possible. They can be built in modern CMOS technologies in the near future without the storage overhead incurred by a binary SD representation.

According to Knuth [18] the outstanding properties of a balanced binary SD representation are:

(i) The negative of a number can be obtained simply by exchanging the positive and the negative part in each digit of the number.
(ii) The sign of a number is the sign of its first leading digit unequal to 0.
(iii) Two binary SD numbers can be compared in size by a digit-wise comparison from left to right, observing the order $1\gt 0\gt \bar{1}$.

In binary SD representation based on the three balanced weights per digit, $\bar{1},0,1$, the value of a number $a=({{a}_{n-1}},\ldots ,{{a}_{0}})$ is determined according to (6), where ${{a}_{i}}\in \{\bar{1},0,1\}$ and $n$ is the word length. For the processing of signed digits with Boolean logic it is necessary to define for a binary SD number, $a$, a positive part, ${{a}^{+}}$ (7), and a negative part, ${{a}^{-}}$ (8). The positive and the negative part in each digit, $a_{i}^{+}$ and $a_{i}^{-}$, are each either 1 or 0.

$\begin{eqnarray}&&w\left( a \right)=\mathop{\sum }\limits_{i=0}^{n-1}{{a}_{i}}\cdot {{2}^{i}}\end{eqnarray} \tag{ 6 }$

$\begin{eqnarray}\begin{array}{rcl} {{a}^{+}} & = & \left( a_{n-1}^{+},\ldots ,a_{0}^{+} \right), \\ a_{i}^{+}=1 & \Leftrightarrow & {{a}_{i}}=1,\qquad a_{i}^{+}=0\ \Leftrightarrow \ {{a}_{i}}\ne 1 \\ \end{array}\end{eqnarray} \tag{ 7 }$

$\begin{eqnarray}\begin{array}{rcl} {{a}^{-}} & = & \left( a_{n-1}^{-},\ldots ,a_{0}^{-} \right), \\ a_{i}^{-}=1 & \Leftrightarrow & {{a}_{i}}=\bar{1},\qquad a_{i}^{-}=0\ \Leftrightarrow \ {{a}_{i}}\ne \bar{1} \\ \end{array}\end{eqnarray} \tag{ 8 }$
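The definitions (6) to (8) can be illustrated with a short Python sketch. The function names and the convention of writing digits with the most significant digit first are choices made for this illustration only.

```python
def sd_value(digits):
    """Value of a binary SD number according to (6).

    digits: sequence of -1, 0 or 1, most significant digit first.
    """
    n = len(digits)
    return sum(d * 2 ** (n - 1 - k) for k, d in enumerate(digits))


def split_parts(digits):
    """Positive part (7) and negative part (8) as lists of 0/1."""
    a_plus = [1 if d == 1 else 0 for d in digits]
    a_minus = [1 if d == -1 else 0 for d in digits]
    return a_plus, a_minus


if __name__ == "__main__":
    a = [-1, 0, 1, -1]                  # digits -1, 0, 1, -1: -8 + 0 + 2 - 1
    a_plus, a_minus = split_parts(a)
    print(sd_value(a), a_plus, a_minus)
    # The value equals the difference of both parts read as unsigned binary.
    print(sd_value(a_plus) - sd_value(a_minus))
    # Redundancy: two different digit strings with the same value.
    print(sd_value([1, -1]), sd_value([0, 1]))
```

The last line shows the redundancy of the binary SD representation mentioned above: the digit strings $1\bar{1}$ and $01$ both have the value 1.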

For the transfer of the three-state value stored in a memristor cell to a pair of binary signals we need an appropriate coding scheme. There are different possible ways to code three states with two binary signals. In this paper a proposal by Duprat and Muller [21] is selected, which proved to be minimal concerning the realisation of the required transistor logic due to the fact that $a_{i}^{+}=a_{i}^{-}=1$ is not allowed (see table 1).

Table 1. Digit coding of a binary SD number.

$a_{i}^{+}$	$a_{i}^{-}$	${{a}_{i}}$
0	0	0
0	1	$\bar{1}$
1	0	1
1	1	not defined

This coding scheme prevents a carry digit generated during an addition from affecting more than one digit to the left. It guarantees that the carry vector $c$ and the intermediate sum vector $z$ will never show two $\bar{1}$s or two $1$s at the same digit position. Table 2 demonstrates this carry-free addition for an example of an addition of two positive integer numbers. In each digit position $i$ one of the four following operation rules is applied: $({\rm i})\;0+0=0\cdot 2+0={{c}_{i+1}}+{{z}_{i}}$, $({\rm ii})\;0+1=1\cdot 2+\bar{1}={{c}_{i+1}}+{{z}_{i}}$, $({\rm iii})\;1+0=1\cdot 2+\bar{1}={{c}_{i+1}}+{{z}_{i}}$, $({\rm iv})\;1+1=1\cdot 2+0={{c}_{i+1}}+{{z}_{i}}$.

Table 2. Example of a carry-free addition for two positive integers, A and B, using binary coded SD numbers.

A	.	0	1	0	1	$={{(5)}_{10}}$
B	.	1	0	0	1	$={{(9)}_{10}}$
c	1	1	0	1
z	0	$\bar{1}$	$\bar{1}$	0	0
s	1	0	$\bar{1}$	1	0	$={{(16-4+2)}_{10}}={{(14)}_{10}}$
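The example of table 2 can be retraced with the following Python sketch of rules (i) to (iv). The function names and the list layout (most significant digit first, one extra leading position for the outgoing carry) are assumptions made for this illustration.

```python
def carry_free_add(a_bits, b_bits):
    """Add two unsigned binary numbers of equal length, MSB first.

    Returns the carry vector c, the intermediate sum z and the SD result s,
    each with one extra leading position.
    """
    n = len(a_bits)
    c = [0] * (n + 1)      # c[i]: carry arriving at position i (0 = LSB)
    z = [0] * (n + 1)      # z[i]: intermediate sum digit, 0 or -1
    for i in range(n):
        total = a_bits[n - 1 - i] + b_bits[n - 1 - i]
        if total == 1:     # rules (ii) and (iii): 1 = 1*2 + (-1)
            c[i + 1], z[i] = 1, -1
        elif total == 2:   # rule (iv): 2 = 1*2 + 0
            c[i + 1], z[i] = 1, 0
        # rule (i): 0 + 0 leaves c and z at 0
    s = [c[i] + z[i] for i in range(n + 1)]   # never leaves {-1, 0, 1}
    return c[::-1], z[::-1], s[::-1]


def sd_value(digits):
    """Value of an SD digit list (MSB first) according to (6)."""
    n = len(digits)
    return sum(d * 2 ** (n - 1 - k) for k, d in enumerate(digits))


if __name__ == "__main__":
    c, z, s = carry_free_add([0, 1, 0, 1], [1, 0, 0, 1])   # 5 + 9
    print("c =", c)
    print("z =", z)
    print("s =", s, "value =", sd_value(s))
```

Since every carry digit is 0 or 1 and every intermediate sum digit is 0 or $\bar{1}$, their digit-wise sum needs no further carry, which is why the second step does not ripple.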

A further simplification of the hardware realisation is possible if we assume that one of the operands is an SD number, $a$, and the other one is a binary operand, $B$. It is assumed in this first investigation of memristor based arithmetic that the first operand comes from a three-state register cell, and the second operand comes e.g. from the normal DRAM. Table 3 shows the truth table for the operation that has to be performed in each digit using the coding scheme in table 1.

Table 3. 2-stage addition of a binary SD number $a$ and binary number $B$ (left); Truth table for first stage (right).

a	$a_{n-1}^{+}$	...	$a_{1}^{+}$	$a_{0}^{+}$
	$a_{n-1}^{-}$	...	$a_{1}^{-}$	$a_{0}^{-}$
B	${{B}_{n-1}}$	...	${{B}_{1}}$	${{B}_{0}}$
c	$c_{n-2}^{+}$	...	$c_{0}^{+}$
z	$z_{n-1}^{-}$	...	$z_{1}^{-}$	$z_{0}^{-}$
s	$s_{n-1}^{+}$	...	$s_{1}^{+}$	$s_{0}^{+}$
	$s_{n-1}^{-}$	...	$s_{1}^{-}$	$s_{0}^{-}$

$a_{i}^{+}$	$a_{i}^{-}$	${{B}_{i}}$	$c_{i}^{+}$	$c_{i}^{-}$	$z_{i}^{+}$	$z_{i}^{-}$
1	0	1	1	0	0	0
1	0	0	1	0	0	1
0	0	1	1	0	0	1
0	0	0	0	0	0	0
0	1	1	0	0	0	0
0	1	0	0	0	0	1

As the truth table shows, the entries for $c_{i}^{-}$ and $z_{i}^{+}$ are always zero. Consequently, they need not be calculated or stored. The Boolean equations (9) and (10) are sufficient for the calculation of the positive carry digit, $c_{i}^{+}$, and the negative intermediate sum, $z_{i}^{-}$.

$\begin{eqnarray}&&c_{i}^{+}=a_{i}^{+}\vee \left( {{B}_{i}}\wedge \overline{a_{i}^{-}} \right)\end{eqnarray} \tag{ 9 }$

$\begin{eqnarray}&&z_{i}^{-}=\left( a_{i}^{+}\vee a_{i}^{-} \right)\oplus {{B}_{i}}\end{eqnarray} \tag{ 10 }$
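Equations (9) and (10) can be checked against the six rows of the truth table in table 3 with a few lines of Python. The function name `add_step1` mirrors the module name used later in the text; the tuple layout of the table rows is chosen only for this sketch.

```python
def add_step1(a_pos, a_neg, b):
    """First adder stage for one digit: returns (c_pos, z_neg)."""
    c_pos = a_pos | (b & (1 - a_neg))    # equation (9)
    z_neg = (a_pos | a_neg) ^ b          # equation (10)
    return c_pos, z_neg


# Rows of table 3 as (a_pos, a_neg, B, c_pos, z_neg); the always-zero
# columns c_neg and z_pos are omitted.
TABLE_3 = [
    (1, 0, 1, 1, 0),
    (1, 0, 0, 1, 1),
    (0, 0, 1, 1, 1),
    (0, 0, 0, 0, 0),
    (0, 1, 1, 0, 0),
    (0, 1, 0, 0, 1),
]

if __name__ == "__main__":
    for a_pos, a_neg, b, c_pos, z_neg in TABLE_3:
        assert add_step1(a_pos, a_neg, b) == (c_pos, z_neg)
        # Arithmetic meaning of the row: a_i + B_i = 2 * c - z_neg
        assert (a_pos - a_neg) + b == 2 * c_pos - z_neg
    print("equations (9) and (10) reproduce table 3")
```

The second assertion states the arithmetic content of each row: the digit sum $a_i+B_i$ is split into a carry of weight 2 and a non-positive intermediate sum digit.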

The second step follows, the final calculation of the sum digits, $s_{i}^{+}$ and $s_{i}^{-}$. The corresponding truth table is given in table 4 and the Boolean equations to determine $s_{i}^{+}$ and $s_{i}^{-}$ are shown in (11) and (12).

Table 4. Truth table for second stage of the calculation of sum digits.

$z_{i}^{+}$	$z_{i}^{-}$	$c_{i-1}^{+}$	$c_{i-1}^{-}$	$s_{i}^{+}$	$s_{i}^{-}$
0	0	0	0	0	0
0	0	1	0	1	0
0	0	0	1	x	x
1	0	0	0	x	x
1	0	1	0	x	x
1	0	0	1	x	x
0	1	0	0	0	1
0	1	1	0	0	0
0	1	0	1	x	x

x: Donʼt care

If we want to execute a subtraction, the positive and negative parts in each digit of the SD operand simply have to be exchanged at the beginning. Afterwards a normal addition can start, and at the end a further exchange has to be applied to the positive and negative digits of the result (13).

$\begin{eqnarray}&&s_{i}^{+}=\overline{z_{i}^{-}}\wedge c_{i-1}^{+}\end{eqnarray} \tag{ 11 }$

$\begin{eqnarray}&&s_{i}^{-}=\overline{c_{i-1}^{+}}\wedge z_{i}^{-}\end{eqnarray} \tag{ 12 }$

$\begin{eqnarray}&&a-B=\left( -1 \right)\cdot \left( \left( -1 \right)\cdot a+B \right)\end{eqnarray} \tag{ 13 }$
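The two stages (9) to (12) and the exchange rule (13) can be combined into a small word-level Python model. This is a behavioural sketch of the equations only, not of the transistor netlists; the function names and the least-significant-digit-first list layout are assumptions made for the illustration.

```python
def sd_add(a_pos, a_neg, b):
    """Add an SD number (a_pos, a_neg) and a binary number b.

    All lists are least significant digit first and of equal length n.
    Returns (s_pos, s_neg) with n + 1 digits.
    """
    n = len(b)
    c = [0] * (n + 1)   # c[i] is the carry arriving at position i, i.e. c_{i-1}^+
    z = [0] * (n + 1)   # z[i] is z_i^-; position n has no intermediate sum
    for i in range(n):
        c[i + 1] = a_pos[i] | (b[i] & (1 - a_neg[i]))   # equation (9)
        z[i] = (a_pos[i] | a_neg[i]) ^ b[i]             # equation (10)
    s_pos = [(1 - z[i]) & c[i] for i in range(n + 1)]   # equation (11)
    s_neg = [(1 - c[i]) & z[i] for i in range(n + 1)]   # equation (12)
    return s_pos, s_neg


def sd_sub(a_pos, a_neg, b):
    """a - B via (13): exchange parts, add, exchange parts of the result."""
    s_pos, s_neg = sd_add(a_neg, a_pos, b)
    return s_neg, s_pos


def sd_value(pos, neg):
    """Value of an SD number given by its parts, LSB first."""
    return sum((p - q) * 2 ** i for i, (p, q) in enumerate(zip(pos, neg)))


if __name__ == "__main__":
    # SD operand with digits (MSB first) -1, 0, 1, -1, i.e. the value -7
    a_pos, a_neg = [0, 1, 0, 0], [1, 0, 0, 1]
    b = [0, 1, 1, 0]                      # binary 0110 = 6, LSB first
    s_pos, s_neg = sd_add(a_pos, a_neg, b)
    print(s_pos, s_neg, sd_value(s_pos, s_neg))
```

Each result digit depends only on the operand digits at its own position and at the position to its right, so the loop body could be evaluated for all digits in parallel.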

Figure 2 shows as a schematic the transistor netlist in CMOS logic for step 1, implementing (9) and (10). Figure 3 displays the corresponding transistor netlist for step 2, implementing (11) and (12). A similar netlist exists for the realisation of the exchange/bypass function, which is necessary for the realization of the SD subtraction. The exchange/bypass function has two inputs, which in the case of a subtraction are exchanged when they are passed to the two outputs. Otherwise, in the case of an addition, the two inputs are passed 1-to-1 to the outputs.

Figure 4 displays a block schematic of a whole SD cell and a simulation result as a waveform. The correct result of the SD addition and subtraction, controlled by the signal addsub, is shown for an SD number, given by $a_{i}^{+}$ and $a_{i}^{-}$, and a binary number, $B$. The control signal addsub is high if an addition has to be executed; otherwise a subtraction is carried out. The wave diagram shows the addition/subtraction results for different combinations of an SD number $a$, with its positive and negative parts, ${{a}^{+}}$ and ${{a}^{-}}$, and the binary operand $B$ in the signals ${{s}^{+}}$ and ${{c}^{+}}$, the latter of which has to be weighted with a factor of 2.

Figure 5 displays a scheme for four side-by-side arranged SD cells which form an SD adder with four digits. The correctness of the transistor layout for the whole SD adder with word length $4$ was verified by simulation. Thus, the next step is to connect a memristor based register set with this four digit SD adder netlist. The correctness of the four digit SD adder cell is shown later in a simulation example with a memristor register connected to the SD adder.

4. Connecting memristor register model with SD adder transistor netlist

The transformation of one of the three stable memristor states into two binary signals requires an appropriate decoder circuit. For such a decoder a voltage divider is preferable, e.g. realised by an additional resistor connected in series to the memristor cell (see figure 6). The voltage accessible at the intermediate node then changes according to the resistance level stored in the memristor cell. With a further series resistor network connected in parallel to the intermediate node it is possible to measure the stored value uniquely. A comparator with a constant reference voltage then converts this value into digital CMOS-level signals. Finally these signals have to be converted according to the selected coding scheme for an SD (see table 1).

Yilmaz and Prazunder presented in [22] a solution for the decoding of multi-bit memristive states in which the intermediate node is connected to a serial chain of active diode elements. Each diode in this chain contributes a certain bias level. These serially connected bias levels are exploited to detect the different voltage levels, which are the consequence of the different multi-bit memristive states. The bias level at each diode has to be controlled in such a way that exactly one voltage level corresponds to the difference between two neighbouring stored stable memristive states. The argument for using diodes rather than e.g. a further resistor based voltage divider circuit is that the comparators would need different voltage sources as input to produce the CMOS level outputs and that different voltage sources are difficult to realise in CMOS. However, diodes are also difficult to realise in VLSI. Furthermore, the author found a solution that allows the comparators to be fed from a single voltage source while still distinguishing between the three voltage levels needed for the coding of a ternary digit. Therefore, we prefer the resistor based decoder circuit shown in figure 6 as a solution for the analogue-to-digital conversion of the three-level stable memristive state.

The schematic in figure 6 shows a memristor cell, which is excited with two subsequent voltage sine waves by E1 (corresponds to $V(n003)$ in the wave diagram); then the excitation of the memristor is switched off. As a result, the curve for the memristor state V(memristor:x), shown in the mid wave diagram, is permanently raised in two steps. Together with the base level of $270\,mV$ at the beginning we get the three states $0,1,\bar{1}$. Furthermore, two waveforms are shown for the voltages at the nodes of the concluding voltage divider circuit consisting of resistors R2 and R3. The mid wave diagram clearly shows a swing for $V(n004)$, the node which corresponds to the top node of R2. The swing runs between the first and the second maximum of 0.450 mV at 0.5 s and 0.823 mV at 1.5 s¹. This swing follows exactly the excitation wave for the memristor states. A similar course is shown for the voltage waveform of $V(n005)$, which corresponds to the intermediate node between R2 and R3. The resistance values of R1 and R2, 3 K and 10 K, are dimensioned in such a way that a single threshold level of 0.35 V is sufficient.

$\begin{eqnarray}&&s_{i}^{+}=\overline{out1}\wedge out0\end{eqnarray} \tag{ 14 }$

$\begin{eqnarray}&&s_{i}^{-}=out1\wedge out0\end{eqnarray} \tag{ 15 }$

That means that at the first excitation maximum only the voltage at $V(n004)$ is above the threshold. After the next excitation swing both node voltages are above the level. As a consequence of these threshold crossings, the two comparator instances U1 and U2 produce one or two 5 V signals at their outputs $V(n002)$, denoted as $out0$, and $V(n006)$, denoted as $out1$, at 0.5 s or 1.5 s, respectively (see top wave diagram in figure 6). For the comparator circuit a publicly available SPICE netlist from Wayne State University [23] was used. If both comparator outputs are high, this corresponds to $\bar{1}$; if only one output is high, this corresponds to $1$; two low outputs correspond to $0$ (see table 5). The converter output signals, $out0$ and $out1$, have to be converted by further Boolean functions, (14) and (15), to the coding scheme shown in table 1. These converted signals can then be attached to the input pins of the SD adder.

Table 5. Conversion of the decoder outputs into selected coding scheme.

$out0$	$out1$	signed digit	${{s}^{+}}$	${{s}^{-}}$
1	1	$\bar{1}$	0	1
1	0	+1	1	0
0	0	0	0	0

Figure 7 shows the schematic for the complete setup of the SD adder operating on four digits. The circuit comprises an analogue part, i.e. the memristor register cells and the decoder circuits (instances A_D_interface), and a digital part, consisting of the four code converters (invi and andi realising (14) and (15)) and the SD adder cell $X1$. The correct operation of the mixed-signal circuit was verified by SPICE simulations, as exemplified by the waveforms in figure 8. The two memristors memristor1 and memristor2 in figure 7 have been excited with two and one positive sine waves, respectively, so that a $\bar{1}$ is stored in memristor1 and a 1 is stored in memristor2. In the other two memristors, memristor3 and memristor4, a 0 and another $\bar{1}$ were stored. Therefore, $-7=\bar{1}01\bar{1}$ was held in the memristor register. Since signal addsub is assigned to 0, this value was added to the integer +6 (see the corresponding connections of GND to inputs B4 and B1, and 5 V to the inputs B3 and B2 in the schematic in figure 7). Therefore, the result of the addition must be $-1$. This is shown for the signals $s{{1}^{+}}=0$ and $s{{1}^{-}}=1$ in figure 8, which are filtered out between the time of 1.4 s and 1.7 s by an AND gate. The other signals, $s2$ to $s4$, are not shown for clarity, but they are all equal to 0.

**Figure 8.** Waveform of simulation for selected inputs and outputs of the complete circuit.

5. Comparative evaluation of a memristor based SD adder

Now that the principal functioning of an SD adder and the interfacing to three-state storing memristor cells have been demonstrated, the question arises what the real benefits of such a solution are. It was already mentioned in the introduction that using an SD number system brings qualitative improvements such as a fast carry-free addition. In this section the benefits that come with the intended use of memristor technology are quantified to first order. The quantitative comparison concerns time and area, which can now be estimated by investigating the existing transistor netlist for the digital logic and the interface circuitry for attaching the memristors to the digital SD adder. The memristor based adder is compared with other frequently used adder structures, e.g. a ripple-carry adder and a carry-look-ahead adder, concerning the required area, measured in the number of necessary transistors, and the run time, measured in multiples of the gate delay time $\Delta$.

5.1. Estimation of time and area effort for memristor-based SD adder

First, the area effort for the memristor based adder including the possibility to realise a subtraction is determined. For the realisation we assume as a first simple conceptual proposal a stacked circuit setup. In an ideal case such a circuit would consist mainly of three stacked chip layers. Alternatively, which is of course easier to realise, the second and the third layer of the chip stack, described in (ii) and (iii), are integrated in one mixed-signal layer:

(i) In the first layer on top of the chip stack a complete 2D memristor array is realized. In this layer a memristor based register file for the SD arithmetic unit or even a complete hierarchical memristor-based cache memory system, storing the signed digits, is hosted.
(ii) In the second layer the interface circuitry could be integrated, which converts the memristor states to digital outputs. This can be done similarly to the decoder circuit shown in figure 6, consisting of a voltage divider and two comparators. However, integrating resistors in VLSI circuits is area consuming. Therefore MOSFET based voltage controlled resistors (VCR) are discussed in the literature for realizing resistors in mixed-signal circuits. Figure 9 shows a solution for a VCR from [24]. The MOSFETs M3–M5 on the right side guarantee by means of a current mirror that transistor M2 always operates in the saturation region. This ensures that transistor M1 always operates in the linear region, realizing an ohmic resistance that can be controlled by the ratio of transistor channel width to channel length and the control voltage $Vc$. The transistors M1, M3, and M4 operate as floating gates with bias voltages $Vb$ and $Vb1$. Using floating gate transistors offers benefits such as simplicity and low power consumption; for details see [24]. Consequently the voltage divider circuit of figure 6, consisting of three resistors, requires 15 transistors if the resistors are realised as VCRs. Of course, these transistors and the effort for the two comparators result in a comparatively large area in contrast to a single memristor cell, which requires only a few square nm. However, it is emphasized that the area overhead for the A/D (analogue-to-digital) conversion is not necessary for each memristor cell located in the first layer. It is required only for each digit in one row of the memristor based register file. The length of the row corresponds to the word length used in the arithmetic unit. Addressing a single row requires a special selector circuit as well as a circuit for writing multi-bit states into a single memristor cell. Both topics are beyond the scope of this paper and are of course not easy to solve. However, the above simulations (see figure 8) show in principle that either one or two sine oscillations of a writing signal are sufficient, at least for the simplified memristor model used.
(iii) In the third layer the digital part is integrated, i.e. the SD adder including the code conversion circuit. The work presented in this paper is to be understood as a first step towards the long-term goal of building a memristor based SD arithmetic unit. Therefore, it was first investigated whether a memristor based register file for an SD arithmetic unit offers any benefits concerning area and time on the digital layer at all. If there were no advantages at that level, any further investigation would be pointless. However, it is clear that an intensive investigation of the interface problem has to follow.

**Figure 9.** Voltage controlled grounded resistor, according to [24].

Since the digital part is the focus of this paper, the area and time estimation is restricted to that part. A simple count of the transistors of an SD adder cell, composed of the modules add_step1 (see figure 2) and add_step2 (see figure 3) and two exchange/bypass circuits to handle subtraction (see figure 4), yields 62 transistors. The realisation of (14) and (15) requires 14 additional transistors for the two-input AND gates and the inverter needed to determine the positive and negative part, ${{s}^{+}}$ and ${{s}^{-}}$ , of an SD. Thus, in total 76 transistors are required per digit.

For the estimation of the gate delay time, a realisation effort of $2\cdot$ (number of inputs) transistors is assumed in all adders for a standard CMOS NAND or NOR gate. For the corresponding AND/OR gates two further transistors accrue for the concluding CMOS inverter. For an EXOR gate eight transistors are assumed. According to the number of NAND/NOR and inverter stages that have to be passed, the assumed gate delay time is $1\Delta$ for a NAND/NOR gate, $2\Delta$ for an AND/OR gate, and $3\Delta$ for an EXOR gate. The longest path delay for the memristor based SD adder can be determined as follows. In one SD cell the input signals first run through the gates of the exchange/bypass circuit and next through the module add_step1. Then the output ${{c}_{i}}$ is led to the left neighbour cell, where it becomes the input of the module add_step2. After running through that module a final exchange/bypass circuit follows. In each module a Boolean sum-of-products with at least one inverted input is calculated. This results in a total gate delay time of $12\Delta$ for the longest path, independent of the operands' word length $n$ .
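
The gate cost model stated above can be written down as a small sketch. The function names `transistors` and `delay` are illustrative and not part of the paper; the delay of a single inverter is not stated explicitly and is taken here as $1\Delta$, which is implied by the $2\Delta$ of an AND/OR gate. The check at the end assumes that the 14 additional transistors for (14) and (15) stem from two two-input AND gates and one inverter.

```python
# Gate cost model as assumed in the text (names are illustrative only).
INVERTING = {"NAND", "NOR"}
NON_INVERTING = {"AND", "OR"}


def transistors(gate: str, inputs: int = 2) -> int:
    """Transistor count of a standard CMOS gate under the stated assumptions."""
    if gate == "INV":
        return 2
    if gate == "EXOR":
        return 8
    if gate in INVERTING:
        return 2 * inputs
    if gate in NON_INVERTING:
        return 2 * inputs + 2  # NAND/NOR plus the concluding inverter
    raise ValueError(f"unknown gate type: {gate}")


def delay(gate: str) -> int:
    """Gate delay in units of Delta."""
    if gate == "INV" or gate in INVERTING:
        return 1  # inverter delay is an assumption, see lead-in
    if gate in NON_INVERTING:
        return 2
    if gate == "EXOR":
        return 3
    raise ValueError(f"unknown gate type: {gate}")


# Assumed composition of the 14 extra transistors: two AND2 gates, one inverter.
extra = 2 * transistors("AND", 2) + transistors("INV")
sd_transistors_per_digit = 62 + extra
print(extra, sd_transistors_per_digit)
```

Under these assumptions the sketch reproduces the 14 additional and 76 total transistors per digit quoted above.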

5.2. Estimation of time and area effort for ripple-carry-adder

In contrast, the run time of a ripple-carry-adder (RCA) depends on the operands' word length. An RCA consists of $n$ cascaded 1-bit full adders. Each full adder processes one bit position according to (16). It has three inputs, two for the operand bits, ${{a}_{i}}$ and ${{b}_{i}}$ , and a further input signal, ${{c}_{i-1}}$ , which corresponds to the carry output of the right neighbouring full adder. A full adder generates two outputs, ${{s}_{i}}$ and ${{c}_{i}}$ . In the worst case a carry propagates, or ripples, from the least significant bit on the right side to the most significant bit on the left side. Therefore, the run time of an RCA amounts to a total gate delay, including an additional EXOR gate for the twoʼs complement, of $(2n+3)\Delta$ .

$\begin{eqnarray}\begin{array}{rcl} {{s}_{i}} & = & {{a}_{i}}{{b}_{i}}{{c}_{i-1}}\vee \overline{{{a}_{i}}}\overline{{{b}_{i}}}{{c}_{i-1}}\vee {{a}_{i}}\overline{{{b}_{i}}}\overline{{{c}_{i-1}}}\vee \overline{{{a}_{i}}}{{b}_{i}}\overline{{{c}_{i-1}}} \\ {{c}_{i}} & = & {{a}_{i}}{{b}_{i}}\vee {{a}_{i}}{{c}_{i-1}}\vee {{b}_{i}}{{c}_{i-1}} \\ \end{array}\end{eqnarray} \tag{ 16 }$
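
The behaviour of the full adder and of the carry chain can be illustrated with a short simulation. This is a minimal sketch, not the netlist used for the estimation; the function names are illustrative. The sum bit is written as a sum of products that is 1 exactly for the four input patterns with an odd number of ones, and the carry bit is 1 when at least two inputs are 1.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit full adder in sum-of-products form; returns (sum, carry out)."""
    na, nb, nc = 1 - a, 1 - b, 1 - c_in
    # Sum is 1 for the four input patterns with an odd number of ones.
    s = (a & b & c_in) | (na & nb & c_in) | (a & nb & nc) | (na & b & nc)
    # Carry is 1 when at least two of the three inputs are 1.
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def ripple_carry_add(x: int, y: int, n: int) -> tuple[int, int]:
    """Add two n-bit unsigned operands; returns (n-bit sum, carry out)."""
    carry, total = 0, 0
    for i in range(n):  # least significant bit first, carry ripples leftwards
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


# Worst case for n = 4: the carry ripples through all four positions.
print(ripple_carry_add(0b1111, 0b0001, 4))
```

The loop makes the word-length dependence visible: each iteration has to wait for the carry of the previous one, which is the origin of the linear term in the run time.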

The number of required transistors per bit can be calculated as follows. The OR gate for calculating ${{c}_{i}}$ needs three inputs and the OR gate for calculating ${{s}_{i}}$ needs four inputs. In total this results in $\left( 3+1 \right)\cdot 2+\left( 4+1 \right)\cdot 2=18$ transistors. The three AND gates for ${{c}_{i}}$ need two inputs and the four AND gates for ${{s}_{i}}$ have three inputs, requiring in total $3\cdot \left( 2+1 \right)\cdot 2+4\cdot \left( 3+1 \right)\cdot 2=50$ transistors. Finally, an additional 6 transistors are needed for the inverters forming the complements $\overline{{{a}_{i}}},\overline{{{b}_{i}}}$ , and $\overline{{{c}_{i-1}}}$ , and 8 transistors for the twoʼs complement with an EXOR gate, resulting in a total of 82 transistors per bit in an RCA. This refers to an RCA that is optimised for run time. There are also published solutions with far fewer transistors, but they require additional run time or specially designed transistors. For a fair comparison we assume the same standard transistor types in all adders and, because the memristor based SD adder has a time complexity of $O(1)$ , we aim for a run time as low as possible in all adders.
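
The count above can be retraced step by step. The sketch below only restates the arithmetic of the text; the helper name `and_or_transistors` is illustrative.

```python
def and_or_transistors(inputs: int) -> int:
    """AND/OR gate: two transistors per input plus a two-transistor inverter."""
    return (inputs + 1) * 2


or_gates = and_or_transistors(3) + and_or_transistors(4)            # 8 + 10
and_gates = 3 * and_or_transistors(2) + 4 * and_or_transistors(3)   # 18 + 32
inverters = 3 * 2          # complements of a_i, b_i and c_(i-1)
exor_gate = 8              # two's complement formation

rca_transistors_per_bit = or_gates + and_gates + inverters + exor_gate
print(or_gates, and_gates, rca_transistors_per_bit)
```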

5.3. Estimation of time and area effort for carry-look-ahead adder

A faster and frequently used adder is the carry-look-ahead adder (CLA), which computes in parallel the generation or propagation of a possible carry over a block of several bits. Afterwards, a divide-and-conquer scheme determines for every bit position whether a carry occurs. A carry into a certain bit position $i+1$ can have two causes. First, a carry coming from the right, out of position $i-1$ , is propagated through position $i$ . This is the case if exactly one of the operand bits is 1 at position $i$ , i.e. $p_{j}^{i}=1$ , see (17). Second, a carry bit is generated at this position, i.e. $g_{j}^{i}=1$ . In (17) and the following equations the bit position is denoted by $4i+j$ , where $i$ denotes the number of the block with a length of 4 bits, $0\leqslant i\leqslant \lceil n/4\rceil -1$ , and $j\in \{0,\ldots ,3\}$ is the bit number within that block.

$\begin{eqnarray}&&p_{j}^{i}={{a}_{4i+j}}\oplus {{b}_{4i+j}}\quad g_{j}^{i}={{a}_{4i+j}}\wedge {{b}_{4i+j}}\end{eqnarray} \tag{ 17 }$

With those propagate and generate bits, $p_{j}^{i}$ and $g_{j}^{i}$ , it is possible to calculate if a carry bit, shifted into a block, can propagate completely from right up to the left end through the block (18), or if a carry bit is newly generated in this block (19).

$\begin{eqnarray}&&P\left( 4(i+1)-1,4(i+1)-4 \right)=p_{3}^{i}\;p_{2}^{i}\;p_{1}^{i}\;p_{0}^{i}\end{eqnarray} \tag{ 18 }$

$\begin{eqnarray}&&\begin{array}{l} G\left( 4(i+1)-1,4(i+1)-4 \right) \\ \quad =g_{3}^{i}\vee g_{2}^{i}\;p_{3}^{i}\vee g_{1}^{i}\;p_{3}^{i}\;p_{2}^{i}\vee g_{0}^{i}\;p_{3}^{i}\;p_{2}^{i}\;p_{1}^{i} \\ \end{array}\end{eqnarray} \tag{ 19 }$
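
The meaning of the block bits in (18) and (19) can be checked exhaustively for one 4-bit block: the carry leaving the block should equal $G\vee P\cdot c_{in}$ for every operand combination. The following sketch does this by comparing against ordinary integer addition; the function names and the list layout (index 0 is the least significant bit of the block) are illustrative assumptions.

```python
from itertools import product


def block_pg(a_bits, b_bits):
    """Block propagate and generate bits of one 4-bit block."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate bits, (17)
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate bits, (17)
    P = p[3] & p[2] & p[1] & p[0]                # (18)
    G = (g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2])
         | (g[0] & p[3] & p[2] & p[1]))          # (19)
    return P, G


def carry_out(a_bits, b_bits, c_in):
    """Reference value: carry out of the block from integer addition."""
    a = sum(bit << k for k, bit in enumerate(a_bits))
    b = sum(bit << k for k, bit in enumerate(b_bits))
    return (a + b + c_in) >> 4


def check_all_blocks() -> bool:
    """Compare G or (P and c_in) with the reference for all 2 * 256 cases."""
    for bits in product((0, 1), repeat=8):
        a_bits, b_bits = bits[:4], bits[4:]
        P, G = block_pg(a_bits, b_bits)
        for c_in in (0, 1):
            if carry_out(a_bits, b_bits, c_in) != (G | (P & c_in)):
                return False
    return True


print(check_all_blocks())
```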

With those block-wise propagate and generate bits, $P(4i-1,4i-4)$ and $G(4i-1,4i-4)$ , the propagation and generation of carry bits over larger blocks can be combined recursively. For example, for a larger block running from bits 0 to 7, the propagate block bit $P(7,0)$ and the generate block bit $G(7,0)$ can be determined according to (20) from the propagate and generate bits of the block combining bits 0 to 3, $P(3,0)$ and $G(3,0)$ , and the propagate and generate bits of the block combining bits 4 to 7, $P(7,4)$ and $G(7,4)$ .

$\begin{eqnarray}\begin{array}{rcl} P\left( 7,0 \right) & = & P\left( 7,4 \right)\wedge P\left( 3,0 \right) \\ G\left( 7,0 \right) & = & G\left( 7,4 \right)\vee P\left( 7,4 \right)\wedge G\left( 3,0 \right) \\ \end{array}\end{eqnarray} \tag{ 20 }$
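
The combination rule (20) can be sketched as a function on $(P,G)$ pairs. The names `combine` and `span_pg` are illustrative. For brevity `span_pg` builds the block bits one bit at a time with the same rule, which is possible because the rule is associative; the hardware uses the closed forms (18) and (19) instead.

```python
def combine(high, low):
    """Merge (P, G) of the more significant block with (P, G) of the
    adjacent less significant block, following (20)."""
    p_high, g_high = high
    p_low, g_low = low
    return p_high & p_low, g_high | (p_high & g_low)


def bit_pg(a: int, b: int, k: int):
    """Propagate and generate bit of position k, as in (17)."""
    return ((a >> k) ^ (b >> k)) & 1, ((a >> k) & (b >> k)) & 1


def span_pg(a: int, b: int, lo: int, hi: int):
    """(P, G) over bit positions lo..hi, built up with combine()."""
    pg = bit_pg(a, b, lo)
    for k in range(lo + 1, hi + 1):
        pg = combine(bit_pg(a, b, k), pg)
    return pg


a, b = 0b10110101, 0b01001011
P70, G70 = combine(span_pg(a, b, 4, 7), span_pg(a, b, 0, 3))
# G(7,0) is the carry out of the 8-bit addition without incoming carry.
print(P70, G70, (a + b) >> 8)
```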

By traversing a tree structure upwards and downwards of these partial propagate and generate block bits the final carry bits $c_{j}^{i}$ can be produced for each bit position (21).

$\begin{eqnarray}\begin{array}{rcl} c_{3}^{i} & = & g_{2}^{i}\vee g_{1}^{i}\;p_{2}^{i}\vee g_{0}^{i}\;p_{2}^{i}\;p_{1}^{i}\vee c_{0}^{i}\;p_{2}^{i}\;p_{1}^{i}\;p_{0}^{i} \\ c_{2}^{i} & = & g_{1}^{i}\vee g_{0}^{i}\;p_{1}^{i}\vee c_{0}^{i}\;p_{1}^{i}\;p_{0}^{i} \\ c_{1}^{i} & = & g_{0}^{i}\vee c_{0}^{i}\;p_{0}^{i} \\ c_{0}^{i} & = & G\left( 4i-1,4i-4 \right) \\ {} & {} & \vee P\left( 4i-1,4i-4 \right)\cdot c_{0}^{i-1} \\ \end{array}\end{eqnarray} \tag{ 21 }$

Finally, in all bit positions the sum bits can be determined in one step (22).

$\begin{eqnarray}&&{{S}_{4i+j}}=p_{j}^{i}\oplus c_{j}^{i}\end{eqnarray} \tag{ 22 }$
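
The interplay of (17), (18), (19), (21), and (22) can be tried out with a small functional model. This is a sketch under two assumptions that are not taken from the paper: the carry entering bit 0 of a block is used as the carry input of all four carry expressions of that block, and the block carries are chained sequentially for brevity, whereas the CLA evaluates them in a tree. The function name `cla_add` is illustrative.

```python
def cla_add(a: int, b: int, n: int) -> int:
    """Add two n-bit operands (n a multiple of 4) with 4-bit look-ahead
    blocks. Returns the n-bit sum; the final carry out is dropped."""
    result = 0
    c_block = 0  # carry into bit 0 of the current block
    for i in range(n // 4):
        p = [((a >> (4 * i + j)) ^ (b >> (4 * i + j))) & 1 for j in range(4)]
        g = [((a >> (4 * i + j)) & (b >> (4 * i + j))) & 1 for j in range(4)]
        # Carries into bits 0..3 of block i, cf. (21).
        c = [
            c_block,
            g[0] | (c_block & p[0]),
            g[1] | (g[0] & p[1]) | (c_block & p[1] & p[0]),
            g[2] | (g[1] & p[2]) | (g[0] & p[2] & p[1])
            | (c_block & p[2] & p[1] & p[0]),
        ]
        for j in range(4):
            result |= (p[j] ^ c[j]) << (4 * i + j)  # sum bits, (22)
        # Block bits (18) and (19) yield the carry into the next block.
        P = p[3] & p[2] & p[1] & p[0]
        G = (g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2])
             | (g[0] & p[3] & p[2] & p[1]))
        c_block = G | (P & c_block)
    return result


print(cla_add(0b01111111, 0b00000001, 8) == 0b10000000)
```

Within a block no carry expression depends on another carry of the same block, so all four can be evaluated in parallel once the block carry-in is known.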

The run time of the CLA is composed of $1\Delta$ for (17) and $2\Delta$ for (19). Equation (19) has to be repeated in ${\rm log} (n)$ modules according to (20) while traversing the tree of block propagate and generate bits upwards. While traversing downwards, equation (21), which has a delay time of $2\Delta$ , is repeated $({\rm log} (n)-1)$ times (downward traversal requires one gate delay less than upward traversal). A further $3\Delta$ is necessary for (22) and another $3\Delta$ for the twoʼs complement formation. This results in a total run time of $(4{\rm log} (n)+5)\Delta$ .
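
Written out as a sum, with the terms in the order in which they are named above ((17), upward traversal, downward traversal, (22), twoʼs complement), the contributions add up as follows:

$1\Delta +2\,{\rm log} (n)\,\Delta +2\,({\rm log} (n)-1)\,\Delta +3\Delta +3\Delta =(4\,{\rm log} (n)+5)\Delta$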

Counting the transistors of the CLA with a block size of 4 bits and word length $n$ yields 12 transistors to compute (17), 42 transistors per 4-bit block for (18) and (19), a further $18\cdot \mathop{\sum }\limits_{i=1}^{{\rm log} (n)-2}{{2}^{i-1}}=18\cdot (n/4-1)$ transistors for (20), 78 transistors per 4-bit block for (21), and finally 16 transistors for the EXOR in (22) and the twoʼs complement. This gives a sum of $12+(42+78)/4+18\cdot (n/4-1)+16=58+18\cdot (n/4-1)$ transistors per bit for a CLA.
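
The term $(n/4-1)$ can be read as the number of combining nodes of type (20) in a binary tree whose leaves are the $n/4$ four-bit blocks. The sketch below checks this reading; it assumes a base-2 logarithm and a word length that is a power of two, neither of which is stated explicitly in the text, and the function name is illustrative.

```python
def combine_nodes(n: int) -> int:
    """Number of (20) nodes in a binary tree over n/4 four-bit blocks.
    Assumes n is a power of two with n >= 4."""
    levels = (n.bit_length() - 1) - 2          # log2(n) - 2 tree levels
    return sum(2 ** (i - 1) for i in range(1, levels + 1))


for n in (4, 8, 16, 32, 64):
    print(n, combine_nodes(n), n // 4 - 1)
```

The 18 transistors per node are consistent with the gate cost model used throughout, i.e. one two-input AND gate for the propagate bit and one two-input AND plus one two-input OR gate for the generate bit at 6 transistors each.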

Table 6 summarizes the calculated time and area effort for all three adders. A comparison of the numbers clearly shows the advantage of a memristor based SD adder. It always requires fewer transistors than the RCA, and fewer transistors than the CLA from a word length of $n=16$ onwards. Concerning the run time, the memristor based SD adder cannot be beaten for growing word lengths owing to its constant effort of $O(1)$ .

Table 6. Summarizing area and gate delay effort for the three compared adders.

Adder	Time	Area (number of transistors per bit/digit)
Memristor based SD adder	$12\Delta$	76
Ripple-carry-adder	$(2n+3)\Delta$	82
Carry-look-ahead adder	$(4{\rm log} (n)+5)\Delta$	$58+18\cdot (n/4-1)$
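
The entries of table 6 can be evaluated for concrete word lengths with a few lines of code. This is a sketch only: the function names are illustrative, and the logarithm is assumed to be base 2 with $n$ a power of two.

```python
import math


def sd_adder(n: int) -> tuple[int, int]:
    """(delay in Delta, transistors per digit) of the memristor based SD adder."""
    return 12, 76


def rca(n: int) -> tuple[int, int]:
    """(delay in Delta, transistors per bit) of the ripple-carry-adder."""
    return 2 * n + 3, 82


def cla(n: int) -> tuple[int, int]:
    """(delay in Delta, transistors per bit) of the carry-look-ahead adder."""
    log_n = int(math.log2(n))  # assumption: base-2 logarithm, n a power of two
    return 4 * log_n + 5, 58 + 18 * (n // 4 - 1)


for n in (4, 8, 16, 32, 64):
    print(n, sd_adder(n), rca(n), cla(n))
```

Under these assumptions the CLA needs $21\Delta$ and 112 transistors per bit for $n=16$ , compared with $12\Delta$ and 76 transistors per digit for the SD adder. For $n=8$ both need 76 transistors, and for $n=4$ the RCA, with $11\Delta$ , is still slightly faster than the SD adder in this model.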

6. Conclusion

The paper focuses on how to exploit the multi-bit storage features of memristors for digital arithmetic circuits. In particular, it is investigated how a memristor cell storing three states can be used for an SD adder, which exploits the benefits of a balanced ternary weighted number system. Compared to all other adders, such adders show the qualitative advantage of a constant run time in $O(1)$ that is independent of the word length. Nevertheless, they have not been realised so far since they need twice the memory space for their operands in binary logic systems.

If memristors can store three states reliably in future and if they are compatible with CMOS VLSI technology, then such adders could be integrated in silicon for the first time without an unacceptable memory overhead. Therefore, in this paper a transistor netlist for such a memristor based SD adder was designed in detail for the first time. Appropriate control parameters for the memristor, i.e. the ON/OFF resistance, the initial resistance, the operation voltage, and a parameter for the window function, were found by SPICE simulations in order to generate a stable three-state behaviour of a memristor cell. For that purpose a simulation model from Biolek et al was modified accordingly. Furthermore, an analogue-to-digital converter circuit was designed and verified by SPICE simulation; it converts the storage state of the memristor to a binary signed-digit signal, which is the input for the SD adder. Finally, the qualitative benefits of the memristor based SD adder were confirmed quantitatively by a detailed comparison of the required time and area against a ripple-carry-adder and a carry-look-ahead adder.

Future work intends to expand the adder to a complete pipelined arithmetic unit, in which the pipeline stages as well as the caches could profit from the memristors' multi-bit storage features. Furthermore, the interface circuitry to the memory system itself is to be extended towards reading and writing complete register arrays consisting of memristors.
