Characterization of spin–orbit torque-controlled synapse device for artificial neural network applications

William A. Borders; Shunsuke Fukami; Hideo Ohno

doi:10.7567/JJAP.57.1002B2

1. Introduction

Research on artificial intelligence (AI) technology is a rapidly growing field aimed at fabricating machines, devices, and large computing systems that can perform tasks at which the human brain excels, such as high-performance pattern recognition and the interpretation of ambiguous information. A recognizable trend in recent years has been to implement AI algorithms such as deep learning on conventional CMOS-based von Neumann computing architectures.¹⁾ An architectural mismatch, however, arises when algorithms suited to the parallel, asynchronous architecture of the brain are implemented on the sequential, clock-driven von Neumann architecture. This mismatch leads to a large overhead in time and power consumption compared with the human brain. One solution is to implement CMOS technology on a non-von Neumann architecture known as an artificial neural network (ANN).²–⁴⁾ This architecture models the processing centers of the brain (i.e., neurons) and the junctions connecting pairs of neurons (i.e., synapses) to improve operation efficiency and brain likeness. However, CMOS technology on its own does not provide dedicated hardware to represent these elements of the brain. In large-scale ANNs, the number of required synapses grows rapidly with the number of neurons (quadratically for a fully connected network), which in turn creates a need for systems with high density and high speed. The paradigm known as "edge computing" has recently emerged, which aims to deliver dedicated devices for representing synapses and neurons.⁵–¹¹⁾ Spintronics technology is a contender in this paradigm owing to its capability to produce devices with nonvolatility, analogue-like behavior, and virtually infinite read/write endurance.¹²–¹⁸⁾

In this study, we give a systematic explanation of the procedure of an ANN-based associative memory operation¹⁹⁾ and of the role of spintronics devices²⁰,²¹⁾ as artificial synapses. The work utilizes the physical interaction at the interface between antiferromagnet (AFM) and ferromagnet (FM) layers to produce an analogue-like change in output resistance. This change is proportional to the perpendicular component of the magnetization in the FM layer, which is switched by the combination of an exchange-bias field from the AFM layer²²⁾ and a current-induced torque known as the spin–orbit torque (SOT).²³–²⁵⁾ In the human brain, synaptic weights evolve in an analogue-like manner, similarly to the resistance of an SOT device, which suggests that SOT devices can serve as dedicated synaptic devices in an ANN. We first briefly describe previously reported results obtained using SOT devices as artificial synapses in a proof-of-concept demonstration of a brain-like associative memory operation. We then explain in further detail how the devices are taught to recognize patterns through an iterative learning process and how to determine whether they can reproduce the learned patterns when the input to the system is noisy. Beyond this explanation of the previous report, we present further device characterization, describe the challenges uncovered when operating the devices under the current operation scheme, and provide insights into improving the devices for viable spintronics-based edge-computing technologies.

2. Experimental methods

2.1. SOT device preparation

To operate SOT devices effectively, a proper device structure and read-out scheme are necessary. To fabricate the SOT devices, an AFM–FM stack is deposited on Si wafers by sputtering under zero magnetic field. The film structure is, from the substrate side, Ta(3)/Pt(2.2)/PtMn(9.5)/Pt(0.6)/[Co(0.3)/Ni(0.6)]₂/Co(0.3)/MgO(1)/Ru(1) (thicknesses in nm).²⁶⁾ The stacks are patterned into Hall bar devices with a 5-µm-wide channel by photolithography and Ar ion milling [Fig. 1(a)], and Cr/Au electrodes are formed as electrical contacts. The Hall bar devices are then annealed at 300 °C for 2 h in a 1.2 T magnetic field along the −X-direction to set the exchange bias necessary for SOT switching. The anomalous Hall voltage V_Hall, read across the electrodes patterned along the Y-direction, is converted into the anomalous Hall resistance R_Hall, which is plotted against the DC channel current I_CH in Fig. 1(b). The analogue-like switching is observed by initializing the device with a negative I_CH and then sweeping I_CH up to various maximum values I_MAX. Regardless of I_MAX, I_CH is reset to the same negative value after each sweep. Each sweep step consists of a 0.3 s I_CH pulse along the X-direction followed by a 0.1 s, 1 mA read current used to measure V_Hall.
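The pulse sequence described above can be written down as a schedule that drives a source-measure setup. The following is a minimal Python sketch under stated assumptions: it uses an abstract `Pulse` record rather than any real instrument API, and the names `minor_loop`, `analogue_switching_schedule`, and `hall_resistance_ohm`, the reset current, the sweep start, and the sweep step are all hypothetical. Only the 0.3 s write pulse, the 0.1 s read pulse, and the 1 mA read current come from the text.

```python
from dataclasses import dataclass

# Timing and read current taken from the text; the reset current and the
# sweep step passed to the functions below are hypothetical user choices.
WRITE_PULSE_S = 0.3
READ_PULSE_S = 0.1
READ_CURRENT_MA = 1.0


@dataclass(frozen=True)
class Pulse:
    kind: str          # "write" or "read"
    current_ma: float  # current along the X-direction channel, in mA
    duration_s: float


def minor_loop(i_max_ma, i_reset_ma, step_ma):
    """One minor loop: reset at -i_reset_ma, then sweep up to +i_max_ma.

    Every 0.3 s write pulse is followed by a 0.1 s, 1 mA read pulse.
    """
    n_steps = int(round((i_max_ma + i_reset_ma) / step_ma))
    pulses = []
    for k in range(n_steps + 1):
        i_write = -i_reset_ma + k * step_ma  # k = 0 is the reset pulse
        pulses.append(Pulse("write", i_write, WRITE_PULSE_S))
        pulses.append(Pulse("read", READ_CURRENT_MA, READ_PULSE_S))
    return pulses


def analogue_switching_schedule(i_max_list_ma, i_reset_ma, step_ma):
    """Concatenate minor loops; each loop restarts from the same reset current."""
    schedule = []
    for i_max in i_max_list_ma:
        schedule.extend(minor_loop(i_max, i_reset_ma, step_ma))
    return schedule


def hall_resistance_ohm(v_hall_mv, i_read_ma=READ_CURRENT_MA):
    """R_Hall = V_Hall / I_read; millivolts divided by milliamperes gives ohms."""
    return v_hall_mv / i_read_ma
```

For instance, `analogue_switching_schedule([10.0, 30.0], 30.0, 10.0)` yields two minor loops that both begin with the same −30 mA reset pulse; these current values are placeholders, not those used in the experiment.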

**Fig. 1.**

2.2. Construction of ANN

In our previous work,¹⁹⁾ we implemented the SOT devices as artificial synapses in a Hopfield model²⁷⁾-based ANN for pattern recognition tests. This model is often considered comparable to a content-addressable memory: an input state containing complete or incomplete portions of the desired information is applied to a state space, which then converges to and outputs an energetically minimized stable state. Conventional Hopfield models are used to form a state space (hereafter, the synaptic weight matrix) that produces the desired stable states (hereafter, the block patterns). The discrete Hopfield model, like many ANNs, describes the input and output of the system with vectors, henceforth called state vectors, in which each element represents a neuron in either the "ON" (1) or "OFF" (−1) state. We represent three 3 × 3 block patterns with these state vectors [Figs. 2(a) and 2(b)], with the values 1 and −1 corresponding to a black block and a white block, respectively. A 3 × 3 block pattern consists of nine blocks; therefore, each state vector contains nine elements.
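The encoding of block patterns into state vectors can be sketched as follows. The row-by-row flattening order and the pixel layouts chosen for "I", "C", and "T" are assumptions made for illustration; the actual patterns are those shown in Fig. 2, and the helper names are hypothetical.

```python
# Hypothetical 3 x 3 layouts: '#' = black block (+1), '.' = white block (-1).
# Fig. 2 defines the actual "I", "C", and "T" patterns; these are stand-ins.
PATTERNS = {
    "I": (".#.", ".#.", ".#."),
    "C": ("###", "#..", "###"),
    "T": ("###", ".#.", ".#."),
}


def to_state_vector(rows):
    """Flatten a 3 x 3 block pattern row by row into a nine-element +/-1 vector."""
    return [1 if cell == "#" else -1 for row in rows for cell in row]


def to_rows(vector, width=3):
    """Inverse mapping, useful for displaying a recalled state vector as blocks."""
    cells = ["#" if v == 1 else "." for v in vector]
    return tuple("".join(cells[i:i + width]) for i in range(0, len(cells), width))
```

Any fixed ordering works, as long as the same ordering is used when indexing the synaptic weight matrix.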

**Fig. 2.**

The Hopfield model is designed to determine the stable states of the system in response to a certain input using the following equation:

$\begin{align} \mathbf{y} & = \text{sgn}(\mathbf{Wy} + \mathbf{b}),\\ \text{sgn}(x) & = \begin{cases} +1 & \text{if $x > 0$}\\ -1 & \text{if $x < 0$} \end{cases} , \end{align} \tag{ 1 }$

where y represents the state vector, W the synaptic weight matrix, b a bias vector (= 0 in this study), and sgn(⋯) the signum function. When an input is applied, Eq. (1) is iterated until the right-hand side produces a state vector identical to that on the left-hand side; this fixed point is taken as the stable state vector. The synaptic weight matrix W, constructed so that the three desired 3 × 3 block patterns become stable states [Fig. 2(c)], is determined by

$\begin{equation} \mathbf{W} = \sum_{\mu = 1}^{M}\boldsymbol{{\xi}}_{\mu}\boldsymbol{{\xi}}_{\mu}^{\text{T}} - M\cdot\mathbf{I}, \end{equation} \tag{ 2 }$

where M denotes the number of patterns (= 3), ξ_μ represents the state vector of pattern μ ( $\mu = 1,2,3$ ), T denotes the transpose operator, and I is the identity matrix. As can be seen in Fig. 2(c), W is symmetric across the diagonal, and its diagonal elements are zero because each ξ_μξ_μ^T contributes 1 to every diagonal element, which the term M·I cancels. The 9 × 9 matrix therefore contains 9 × 8/2 = 36 unique synaptic weights, that is, 36 unique SOT devices. Furthermore, W takes four distinct values: −3, −1, 1, and 3. To protect against misinterpretation of the state, only four R_Hall levels are designated, one for each of these synaptic weights.
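Equations (1) and (2) can be checked numerically. The sketch below assumes NumPy and uses hypothetical row-major stand-ins for the Fig. 2 patterns; it builds W, exposes the properties stated above (symmetry, zero diagonal, 36 independent weights taking values in {−3, −1, 1, 3}), and iterates Eq. (1) with synchronous updates. Eq. (1) leaves sgn(0) undefined, so keeping the previous element value when the local field is exactly zero is an assumption.

```python
import numpy as np

# Hypothetical stand-ins for the Fig. 2 patterns, flattened row by row
# (+1 = black block, -1 = white block).
XI = np.array([
    [-1, 1, -1, -1, 1, -1, -1, 1, -1],  # "I": vertical bar
    [1, 1, 1, 1, -1, -1, 1, 1, 1],      # "C"
    [1, 1, 1, -1, 1, -1, -1, 1, -1],    # "T"
])


def hebbian_weights(patterns):
    """Eq. (2): W = sum over mu of xi_mu xi_mu^T, minus M times the identity."""
    m, n = patterns.shape
    return patterns.T @ patterns - m * np.eye(n, dtype=int)


def recall(w, key, max_iter=50):
    """Iterate Eq. (1) with b = 0 until y = sgn(W y).

    Assumption: an element whose local field is exactly zero keeps its value.
    """
    y = np.array(key, dtype=int)
    for _ in range(max_iter):
        field = w @ y
        y_next = np.where(field > 0, 1, np.where(field < 0, -1, y))
        if np.array_equal(y_next, y):
            return y  # fixed point: a stable state vector
        y = y_next
    return y  # no fixed point within max_iter (synchronous updates can cycle)


W = hebbian_weights(XI)
upper = W[np.triu_indices(9, k=1)]  # the 36 independent off-diagonal weights
```

With these stand-in patterns, each of the three state vectors is a fixed point of Eq. (1), so `recall(W, XI[mu])` returns `XI[mu]` unchanged.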

2.3. Associative memory demonstration

To demonstrate the associative memory capabilities of analogue SOT devices, the system is first taught the three 3 × 3 patterns through an iterative learning process. Three types of vectors are required: the memorized vectors, which are stored in a personal computer (PC) and hold the ideal state vector values for "I", "C", and "T"; the key vectors, which are the input vectors to the synaptic weight matrix; and the recalled vectors, which are the stable state vectors to which Eq. (1) converges. The learning process begins by measuring the R_Hall–I_CH curve of each of the 36 devices and mapping four levels within the linear region of the curve (defined as the dynamic range) to the four synaptic weight levels. The R_Hall of each device is then set by applying the corresponding I_CH. For example, SOT device #1, which stores W_2,1 of W, is written with the I_CH corresponding to a synaptic weight of 1. After this writing phase, the R_Hall of each SOT device is read and mapped back to synaptic weights to create a recalled W. The key vectors are then applied one at a time to the recalled W as in Eq. (1) to produce one recalled vector ζ_μ ( $\mu = 1,2,3$ ) for each pattern.
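One way to express the mapping between the four weights and four R_Hall levels is sketched below with NumPy. Evenly spaced targets across the linear region, linear interpolation of the measured R_Hall–I_CH curve to choose the write current, and nearest-level quantization on read-back are all assumptions, and the function names are hypothetical.

```python
import numpy as np

WEIGHT_LEVELS = (-3, -1, 1, 3)


def level_targets(r_low, r_high, levels=WEIGHT_LEVELS):
    """Assign evenly spaced R_Hall targets across the linear region [r_low, r_high].

    Even spacing is an assumption; the text only states that four levels are
    mapped onto the dynamic range of each device.
    """
    step = (r_high - r_low) / (len(levels) - 1)
    return {w: r_low + k * step for k, w in enumerate(levels)}


def write_current_for(weight, targets, i_curve_ma, r_curve_ohm):
    """Write current that lands on the target R_Hall, by linear interpolation
    of the device's measured R_Hall-I_CH curve (r_curve_ohm must be increasing,
    as np.interp requires)."""
    return float(np.interp(targets[weight], r_curve_ohm, i_curve_ma))


def weight_from_resistance(r_ohm, targets):
    """Read-back: quantize a measured R_Hall to the nearest weight level."""
    return min(targets, key=lambda w: abs(targets[w] - r_ohm))
```

Because the targets are derived per device from its own curve, a device with a smaller dynamic range ends up with more closely spaced levels, which is the misinterpretation risk discussed in Sec. 3.1.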

If any of the recalled vectors does not exactly match the corresponding memorized state vector, the recalled W is not a representative synaptic weight matrix of ξ_μ; in other words, the SOT devices have not yet learned "I", "C", and "T". Each weight in W is then adjusted according to²⁸⁾

$\begin{equation} \Delta W = \sum_{\mu = 1}^{M}\eta(\boldsymbol{{\xi}}_{\mu}\boldsymbol{{\xi}}_{\mu}^{\text{T}} - \boldsymbol{{\zeta}}_{\mu}\boldsymbol{{\zeta}}_{\mu}^{\text{T}}), \end{equation} \tag{ 3 }$

where η is a learning coefficient that determines the rate of learning. If η is too large, each update changes W too coarsely for it to settle on an accurate weight matrix; if η is too small, the system requires too many iterations of synaptic weight adjustment. In the experiments, η is set to 0.005. The change given by Eq. (3) is applied to the current W by sending I_CH to the SOT devices, and the process is repeated until the recalled W produces stable state vectors identical to the memorized vectors.
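The update of Eq. (3) and the surrounding learn-until-recalled loop can be sketched as follows, assuming NumPy. `recall` and `program_and_read` are hypothetical hooks: the first iterates Eq. (1), and the second stands in for writing the updated W to the SOT devices with I_CH and reading the resulting R_Hall values back as weights. The default cap of 20 cycles mirrors the maximum number of learning cycles reported in Sec. 3.1 and is otherwise arbitrary.

```python
import numpy as np


def delta_w(memorized, recalled, eta=0.005):
    """Eq. (3): Delta W = sum over mu of eta * (xi_mu xi_mu^T - zeta_mu zeta_mu^T).

    `memorized` and `recalled` are (M, N) arrays of +/-1 state vectors, one row
    per pattern; X.T @ X equals the sum of the row outer products.
    """
    xi = np.asarray(memorized, dtype=float)
    zeta = np.asarray(recalled, dtype=float)
    return eta * (xi.T @ xi - zeta.T @ zeta)


def learn(w, memorized, recall, program_and_read, eta=0.005, max_cycles=20):
    """Repeat recall -> compare -> update until every pattern is recalled.

    Returns the final W and the number of updates that were applied.
    """
    memorized = np.asarray(memorized)
    for cycle in range(max_cycles):
        recalled = np.array([recall(w, xi) for xi in memorized])
        if np.array_equal(recalled, memorized):
            return w, cycle
        # Apply Eq. (3); in hardware this is a write with I_CH followed by a read.
        w = program_and_read(w + delta_w(memorized, recalled, eta))
    return w, max_cycles
```

Since both ξ_μξ_μ^T and ζ_μζ_μ^T have ones on the diagonal, ΔW always has a zero diagonal, so the update never disturbs the zero diagonal of W.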

After the SOT devices have learned the patterns, the final recalled W is stored in memory for use in an associative memory operation, in which noisy key vectors are applied to it. Whereas the key vectors during the learning process were identical to the memorized vectors, each key vector now has a single element flipped (−1 to 1 or 1 to −1). The objective is to determine whether the W to which the SOT devices converge under the Hopfield model can associate noisy inputs with the memorized patterns, an important capability of the human brain and of pattern recognition. The associative memory operation is repeated for 100 trials, and the results are then summarized.
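Generating the noisy key vectors can be sketched as below. The text fixes only that each key has one flipped element and that 100 trials are run; drawing the pattern and the flipped position uniformly at random with NumPy's `default_rng` is an assumption, and `single_flip` and `noisy_trials` are hypothetical helpers.

```python
import numpy as np


def single_flip(key, index):
    """Return a copy of a +/-1 key vector with one element flipped."""
    noisy = np.array(key, dtype=int)
    noisy[index] = -noisy[index]
    return noisy


def noisy_trials(memorized, n_trials=100, seed=0):
    """Yield (pattern index, noisy key) pairs for the associative memory test.

    Choosing the pattern and the flipped element uniformly at random is an
    assumption; the text specifies one flipped element per key and 100 trials.
    """
    rng = np.random.default_rng(seed)
    memorized = np.asarray(memorized)
    for _ in range(n_trials):
        mu = int(rng.integers(len(memorized)))         # which pattern
        k = int(rng.integers(memorized.shape[1]))      # which element to flip
        yield mu, single_flip(memorized[mu], k)
```

Each noisy key would then be passed through Eq. (1) with the final recalled W, and the recalled vector compared with the memorized pattern `mu`.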

3. Results and discussion

3.1. Associative memory operation

The SOT device array requires at most 20 learning cycles to learn the three patterns; that is, the R_Hall of each device is read and rewritten several times during learning. When the associative memory operation is then applied, the synaptic weight matrix learned by the SOT devices associates the memorized patterns with noisy inputs. This is confirmed by the mean direction cosine (a measure of the agreement between the recalled and memorized patterns), which recovers to nearly its ideal value after the learning process. These results constitute one of the few experimental demonstrations that spintronics devices can be implemented in ANNs.
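Assuming the direction cosine takes its usual definition as the cosine of the angle between the memorized and recalled vectors, it can be computed as below with NumPy; the helper names are hypothetical. For nine-element ±1 vectors, one mismatched element gives (9 − 2)/9 = 7/9 ≈ 0.78, while a perfect recall gives 1.

```python
import numpy as np


def direction_cosine(memorized, recalled):
    """cos(theta) = xi . zeta / (|xi| |zeta|); equals 1 for a perfect recall."""
    xi = np.asarray(memorized, dtype=float)
    zeta = np.asarray(recalled, dtype=float)
    return float(xi @ zeta / (np.linalg.norm(xi) * np.linalg.norm(zeta)))


def mean_direction_cosine(pairs):
    """Average over (memorized, recalled) pairs, e.g., the noisy-input trials."""
    return float(np.mean([direction_cosine(m, r) for m, r in pairs]))
```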

While the results of this proof-of-concept demonstration are very promising, the variation between devices must be reduced to improve the efficiency and reliability of the system. A previous work¹⁹⁾ shows in detail histograms of the R_Hall of the 36 SOT devices before and after the learning process. A distinct change in the distribution of levels can be seen, which demonstrates the capability of the devices to learn patterns. However, the after-learning histogram deviates from the ideal histogram, which can be attributed to the variation in the dynamic range of the R_Hall–I_CH characteristics. The same variation also explains why the direction cosine does not recover completely to its ideal value. Figure 3 shows an example of the R_Hall–I_CH characteristics of two representative devices with different dynamic ranges. A small dynamic range reduces the separation between levels, which may cause misinterpretation of the synaptic state.

**Fig. 3.** Illustration of the variation of linear region and dynamic range between devices. The red line represents the linear region and the ΔR_Hall area represents the region of R_Hall values used for synaptic weights −3, −1, 1, and 3.

During experimental measurements of the device R_Hall–I_CH characteristics, the maximum switching amplitude ΔR_Hall of the isolated devices was larger than the switching amplitude observed after their insertion into the demonstration circuit. Such a change in switching amplitude is not normally expected, suggesting that some effect during operation degrades the devices.

3.2. Study of device variation and reduced performance

To understand the reason behind the large variation and to find a way to improve the SOT devices for ANN applications, additional experiments were carried out. During initial testing of the demonstration circuit, batch testing of the SOT devices was performed to determine the average output characteristics of each device, and the ΔR_Hall of each device was observed to decrease after 400 read/write cycles of batch testing. Replicas of the same devices are then fabricated and tested by applying a −I_CH current pulse for 0.3 s and measuring R_Hall with a 0.1 s, 1 mA pulse. A +I_CH current pulse is then applied for 0.3 s, and R_Hall is measured again. This process is repeated more than 400 times to represent the number of read/write cycles applied to a device during the associative memory operation [the measurement sequence is shown in Fig. 4(a)]. Figure 4(b) shows the resulting $\Delta R_{\text{Hall}} = R_{\text{Hall}}^{ + I_{\text{CH}}} - R_{\text{Hall}}^{ - I_{\text{CH}}}$ .
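The ΔR_Hall trace in Fig. 4(b) follows from pairing consecutive readings. The sketch below assumes the readings are logged in the order of the sequence in Fig. 4(a), alternating between the −I_CH and +I_CH states; `delta_r_per_cycle` is a hypothetical helper.

```python
import numpy as np


def delta_r_per_cycle(readings_ohm):
    """Delta R_Hall = R_Hall(+I_CH) - R_Hall(-I_CH) for each read/write cycle.

    `readings_ohm` is assumed to alternate: R after the -I_CH pulse, then R
    after the +I_CH pulse, and so on (one pair per cycle, N = 1, 2, ...).
    """
    r = np.asarray(readings_ohm, dtype=float)
    if r.size % 2:
        raise ValueError("expected an even number of readings (-I, +I pairs)")
    r_minus, r_plus = r[0::2], r[1::2]
    return r_plus - r_minus
```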

**Fig. 4.** Measurement setup for resistance decay and results. (a) Measurement setup to measure the change in R_Hall. To switch the magnetization of the FM layer, 0.3 s duration current pulses with the same magnitude but with different signs are applied. Between each switching pulse, a 0.1 s duration measurement current pulse (I_Meas) of 1 mA is applied to measure R_Hall. (b) ΔR_Hall vs number of repetitions of read/write cycles N when I_CH = 36 mA (red), 34 mA (yellow), and 31 mA (green).

Fitting each curve with an exponential decay allows ΔR_Hall to be extrapolated to an infinite number of read/write cycles (N = ∞). Figure 5 shows the extrapolated ΔR_Hall at N = ∞ for different channel currents; the highest value is obtained at 28 mA. As defined in a previous paper, the total system variation α is given by

$\begin{equation*} \alpha\equiv\frac{\text{standard deviation}}{\text{dynamic range}}\times 100\,\%. \end{equation*}$

In the previous demonstration, α was 43%.

Measuring a new set of 36 SOT devices at the maximum write current of 28 mA yields a dynamic range of 0.23 Ω with a standard deviation of 0.0048 Ω, corresponding to an α of 2.1%. This value is well within the requirement predicted by the simulation in the previous work, which showed that an α of 10% or less is necessary for an ANN storing ten unique 10 × 10 patterns.
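The quoted α follows directly from the two measured quantities. The sketch below reads α as the standard deviation divided by the dynamic range, which is the reading consistent with the 2.1% value and with the requirement that α be 10% or less; `alpha_percent` is a hypothetical helper.

```python
def alpha_percent(dynamic_range_ohm, std_ohm):
    """Total system variation: standard deviation / dynamic range x 100%."""
    return std_ohm / dynamic_range_ohm * 100.0


alpha_28ma = alpha_percent(0.23, 0.0048)  # values quoted for the 28 mA device set
meets_requirement = alpha_28ma <= 10.0    # simulated limit for ten 10 x 10 patterns
```

Here 0.0048/0.23 × 100 ≈ 2.09, which rounds to the 2.1% quoted above.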

**Fig. 5.** Final ΔR_Hall obtained from extending exponential decay fit to N = ∞.
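The extrapolation to N = ∞ can be sketched as a fit of ΔR_Hall(N) = ΔR_∞ + A exp(−N/τ) using NumPy. The single-exponential-with-offset form, the grid search over τ, and the function name are assumptions, since the text states only that an exponential decay was fitted; a nonlinear least-squares routine could be used instead of the grid.

```python
import numpy as np


def fit_decay_to_infinity(n_cycles, delta_r_ohm, taus=None):
    """Fit Delta R(N) = R_inf + A * exp(-N / tau) and return (R_inf, A, tau).

    For each trial tau on a grid, R_inf and A follow from linear least squares;
    the tau with the smallest squared error is kept. R_inf is the value
    extrapolated to N = infinity.
    """
    n = np.asarray(n_cycles, dtype=float)
    y = np.asarray(delta_r_ohm, dtype=float)
    if taus is None:
        taus = np.arange(1.0, 10.0 * n.max() + 1.0)  # 1, 2, ..., 10 * N_max cycles
    best = None
    for tau in taus:
        design = np.column_stack([np.ones_like(n), np.exp(-n / tau)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(coef[0]), float(coef[1]), float(tau))
    return best[1:]
```

For example, synthetic, noise-free data generated with ΔR_∞ = 0.2 Ω, A = 0.1 Ω, and τ = 50 cycles (illustrative values, not measured data) are recovered, since τ = 50 lies on the default grid.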

3.3. Investigation of the reason behind the decay of ΔR_Hall

To further improve the efficiency of the ANN operation, understanding the cause of the ΔR_Hall decay should enable a wider dynamic range at current levels that do not induce decay. Because the SOT devices do not break down completely after high currents are applied, it is possible that some aspect essential to analogue-like SOT switching is being weakened. As stated in the Introduction, SOT switching depends on two major factors: the SOT, which arises from the spin–orbit interaction inside the AFM layer or at the AFM/FM interface when a charge current is applied, and the exchange bias field, which arises from the exchange coupling at the interface between the AFM and FM layers. Furthermore, the analogue-like switching is mainly due to the unique crystalline structure of the AFM.²⁹⁾ Because so many of the elements underlying device operation arise from the AFM layer, we expect the decay to be caused by a current-induced effect on the AFM structure. Although an extensive investigation of the effects of SOT or of changes in the crystalline structure is required, a relatively simple experiment can show whether high currents applied over many cycles affect the exchange bias field strength. To this end, an experiment similar to that in Ref. 15 was conducted, in which R_Hall–I_CH measurements were carried out while applying external magnetic fields of different magnitudes along the ±X-direction.

Plotting the resulting exchange bias field H_bias as a function of I_CH shows that, after 150 read/write cycles, devices to which an I_CH of more than 32 mA was applied exhibited a decrease in the effective H_bias (Fig. 6). This indicates that high currents reduce the effective H_bias. Furthermore, because the difference between H_bias after 150 and after 400 read/write cycles is negligible, the decay of the devices depends less on the length of operation than on the magnitude of the operating current. Therefore, exploring material systems that mitigate these issues and/or operating the devices with the current-induced decay of their properties taken into account are expected to lead to highly reliable ANNs.

**Fig. 6.** Experimental values of the exchange bias field after applying I_CH magnitudes in the range of 26 to 36 mA for 0, 150, and 400 cycles.

4. Conclusions

We have explained in detail the procedure of the previously reported work that demonstrated the ability of an ANN with 36 SOT devices to execute an associative memory operation. After a learning process to memorize three 3 × 3 block patterns, the R_Hall values of the devices, mapped to synaptic weights, showed a distinct change in distribution. However, owing to the high total system variation α, the direction cosine of the system does not recover to its ideal value. By applying several hundred read/write cycles and measuring the resulting ΔR_Hall, we find that high currents degrade the device performance. Because the learning process does not rewrite every device in each iteration, some devices decay more than others, which leads to the large system variation. We further find that operating the devices at a safe current of 28 mA results in negligible decay and reduces α to 2.1%, which is small enough for reliable associative memory operation.

While these results are promising for improving the current demonstration, one aspect to focus on in order to compete with other hardware-based ANN paradigms and with CMOS technology is the read-out mechanism. Using a magnetic tunnel junction, which exhibits tunneling magnetoresistance, for the read-out operation would provide a larger resistance change and thus a higher level of system-level integration. Furthermore, if the device is patterned into a three-terminal structure, the read and write current paths can be separated, providing design flexibility. These developments will require in-depth studies of the material stack structure, device design, device physics, and system-level integration, and they are expected to open a discussion on advancing spintronics technology toward a wide range of machine learning paradigms for non-von Neumann architecture applications.

Acknowledgments

The authors would like to thank H. Akima, S. Sato, Y. Horio, S. Moriya, S. Kurihara, C. Igarashi, T. Hirata, H. Iwanuma, Y. Kawato, and K. Goto for discussion and technical support. This work was supported in part by the R&D Project for ICT Key Technology to Realize Future Society of MEXT, JST-OPERA, JSPS KAKENHI Grant Number 17H06093, JSPS Core-to-Core Program, and Cooperative Research Projects of RIEC (H28/B01, H28/A16, H29/B15).
