Two-dimensional materials for artificial synapses: toward a practical application

Combining emerging two-dimensional materials (2DMs) with neuromorphic computing, 2DM-based synaptic devices (2DM synapses) are a highly anticipated research topic that promises to revolutionize the present Si-based computing paradigm. Although the development is still at an early stage, the number of reported 2DM synapses has increased exponentially in the past few years. Nevertheless, most reports focus mainly on device-level synaptic emulations, and a practical perspective toward system-level applications is still lacking. In this review article, we discuss several important types of 2DM synapses for neuromorphic computing. Based on a cross-layer device-circuit-algorithm co-optimization strategy, the non-ideal properties of 2DM synapses are considered for accelerating deep neural networks, and their impacts on system-level accuracy, power, and area are discussed. Finally, a development guide for 2DM synapses is provided, targeting accurate online training and inference in the future.


Introduction
With ever-increasing data generation and the rising demand for data processing, conventional computing systems based on the von Neumann architecture can no longer satisfy the requirements of fast processing speed and high energy efficiency [1]. As a result, novel non-von Neumann architectures are now being actively investigated. Neuromorphic computing [2] is believed to be the most promising novel computing architecture to provide high energy efficiency and massive parallelism inspired by the human brain. Many synaptic devices, which act as crucial components in neuromorphic computing, have been realized using several types of nonvolatile memories (NVMs), including resistive switching memory (RRAM) [3,4], phase-change memory (PCRAM) [5,6], ferroelectric memory [7], etc, for accelerating deep neural networks (DNNs) with in-memory computing capability [8]. While three-dimensional bulk materials (3DMs) are commonly adopted in the aforementioned synaptic devices, 2D materials (2DMs) have drawn increasing attention in recent years, and a large number of 2DM-based synaptic devices (2DM synapses) have been reported. The use of 2DMs as the semiconducting channel, with atomically thin thickness and a larger bandgap than the traditional Si counterpart, not only offers better immunity to short-channel effects by providing superior gate control but also lowers the leakage current, making it suitable for low-power applications [9]. Moreover, the dangling-bond-free surface promises excellent charge transport. Controlling the number of layers in 2DMs allows tunable bandgaps that provide flexibility in diverse applications [10]. Newly available synthesis and advanced device fabrication methods enrich the structural design of 2DM synapses, including van der Waals heterostructures [11] and 3D monolithic integration [12]. The wafer-scale integration of 2DMs enables high-density electronic circuits for more complex computing tasks [13].
These highly scalable, tunable, and stackable properties make 2DM synapses promising candidates for neuromorphic computing applications.

Figure 1. (a) The annual number of published papers on 2DM synapses (statistical dataset from 2014 to July 2021, modified from [34]). The statistics consider two main structural categories of 2DM synapses, two-terminal MIM-based (black bars) and transistor-based (grey bars) synaptic devices, indicating steadily growing trends. (b) The percentage of the various types of 2DMs adopted in synaptic devices (statistical dataset from 2014 to July 2021, modified from [34]). MoS₂ and graphene are still the two most popular 2DMs due to their maturity. The 'others' category includes WS₂, MoTe₂, ReS₂, Ti₃C₂, etc, showing that increasingly diverse 2DMs are adopted in 2DM synapses. Note that the dataset only includes 2DM synapses reporting synaptic plasticity; devices that only demonstrate memory characteristics are not counted.
Although numerous 2DM synapses have been proposed and demonstrated, their development for neuromorphic computing is still at an early stage. Whether the reported synaptic behaviors of these 2DM synapses meet the requirements for practical applications remains in doubt. Many excellent review papers have systematically summarized 2DM synapses from the perspectives of materials and devices [14][15][16][17][18][19][20][21][22][23][24][25]. However, a comprehensive and quantitative study that connects to system-level requirements is still lacking. In this article, we first introduce the recent advances in 2DM synapses for neuromorphic computing applications. We categorize 2DM synapses by their device structures and operating mechanisms and discuss their synaptic characteristics. To evaluate the readiness of 2DM synapses for practical applications, the main focus of this work, we adopt a cross-layer device-circuit-algorithm co-optimization strategy, in which the influence of non-ideal device properties such as limited precision, nonlinear weight update, intrinsic device variation, etc, can be connected to the system-level specifications, namely performance (accuracy), power, and area, for image classification tasks [26]. Based on the co-optimization results, with special emphasis on accurate online training and inference in DNNs, not only the current status but also the future research directions of 2DM synapses will be provided in this work.

2DM synapse and categorization
In neuromorphic computing systems, an artificial synapse performs a multiplication of the voltage input signal given by the pre-neuron and its present synaptic state, and the product output signal is then transmitted to the post-neuron. Such multiplication can be easily realized in crossbar memory arrays using Ohm's law [27], which at the same time promises high-density integration and massively parallel computation [28]. Therefore, implementing learning functions in synaptic devices becomes a prerequisite for neuromorphic applications [29], which ultimately aim at constructing a replica of the human brain. Synaptic plasticity is one of the key features to be obtained, and it includes potentiation and depression of the synaptic weight, which describe the strengthening and weakening of the connection between neurons, respectively [30]. Figure 1(a) shows the annual number of published papers focusing on 2DM synapses. The total number of reported 2DM synapses has increased exponentially within a few years, suggesting an ever-growing interest in this field. Similar to the 3DM-based synapses that have been widely realized in several types of devices [31][32][33], the 2DM synapses can be categorized by device structure into two-terminal metal-insulator-metal (MIM)-based synapses and three-terminal transistor-based synapses. The former is generally a memristive 2DM synapse showing resistive switching (RS) under applied bias. The latter usually relies on gate-modulated conductance change in a transistor structure to represent synaptic plasticity. Various 2DMs are adopted in these synapses. Figure 1(b) shows the percentage of the various 2DMs in the reported 2DM synapses, where MoS₂ and graphene are the most popular due to their maturity in synthesis.
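The in-memory multiply-accumulate described above can be sketched in a few lines of numpy, where Ohm's law gives each cell current and Kirchhoff's current law sums them per bit-line; all conductance and voltage values here are illustrative assumptions, not taken from any cited device.

```python
import numpy as np

# Hypothetical 4x3 crossbar: G[i, j] is the conductance (S) of the cell at
# word-line i and bit-line j, encoding one synaptic weight.
rng = np.random.default_rng(0)
G = rng.uniform(1e-9, 1e-7, size=(4, 3))  # states between G_min and G_max
V = np.array([0.1, 0.0, 0.1, 0.1])        # read voltages from pre-neurons (V)

# Ohm's law per cell (I = G * V) plus Kirchhoff's current law per bit-line
# perform the whole vector-matrix multiplication in a single read step.
I_bl = V @ G  # bit-line currents (A), one per post-neuron
```

The same G array is reused for every input vector, which is why keeping the weights stationary in the array avoids the data movement of a von Neumann machine.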
The 'others' category includes a wide variety of 2DMs, such as WS₂, MoTe₂, ReS₂, Ti₃C₂, etc. Interestingly, the proportion of this 'others' category has increased from 19.72% in 2019 [34] to 35.2% in 2021, suggesting a growing interest in exploring the rich material database of 2DMs for this emerging application.

Figure 2. General structures of (a) the MIM-based and (b), (c) the transistor-based 2DM synapses. Each layer may be substituted by 2DMs. The structure in (b) is specific to the charge-storage transistor-based 2DM synapse, where the charge-storage layer (also called the floating gate) is indicated. The structure in (c) is for other transistor-based 2DM synapses that do not rely on an additional charge-storage layer.

Two-terminal MIM-based 2DM synapse
Although optical synaptic responses [14,16] have also been presented in 2DM synapses, in the following sections we will focus only on electronic synapses because of their relatively mature architecture and peripheral circuit design for accelerating most DNN problems in practical applications. Figure 2(a) illustrates the general structure of the MIM-based 2DM synapse, which is similar to several two-terminal memristors such as RRAM and PCRAM based on typical 3DMs. The compact two-terminal structure offers the ultimate scaling capability to construct synaptic crossbar arrays with the highest possible density [58]. Each layer in the MIM structure may be replaced by 2DMs. For instance, table 1 lists several representative MIM-based 2DM synapses. The semimetallic graphene layer can serve as a conducting electrode [59][60][61][62][63][64][65], while the insulating h-BN [66,67] and the semiconducting MoS₂ [68][69][70][71][72][73][74][75][76], WSe₂ [65,77], WS₂ [78,79], MoTe₂ [80], etc, are commonly adopted as RS layers sandwiched between two electrodes. The conductance change of MIM-based 2DM synapses involves a wide range of switching mechanisms, including filament formation and rupture [59-64, 66-70, 77, 78, 80-83], barrier modulation [65,[71][72][73][74],79], phase transition [75,76], and ferroelectric polarization [84][85][86]. Similar to conventional filamentary RRAM, filament formation and rupture in 2DM synapses rely on mobile ions or vacancies from the metal electrodes or RS thin films that are driven by the applied electric field. On the other hand, unlike the locally formed filament, the current conduction in the barrier modulation mechanism is rather uniform, where the modulated Schottky barrier at the 2DM/electrode interface is affected by the migration of charged defects. As for the phase transition in 2DM synapses, it requires an intercalation process by metal ions such as Li⁺ to initiate the phase transition of the 2DMs.

Three-terminal transistor-based 2DM synapse
The transistor-based synapse has been adopted since the early stages of neuromorphic implementation, initially using traditional Si-based charge-storage transistors [29]. Compared to the two-terminal MIM-based synapse, the three-terminal transistor-based synapse is less compact. However, its multiple terminals offer another degree of freedom, not only in structural design but also in the read and write operating methods for tuning synaptic properties [87,88]. Figures 2(b) and (c) illustrate the general structures of transistor-based synapses, where 2DMs may play roles in certain layers. For instance, table 2 summarizes several reported transistor-based 2DM synapses, which are further categorized into the charge-storage transistor [89][90][91][92][93][94][95], electrolyte-gated transistor [96][97][98][99], ferroelectric-gated transistor [49,[100][101][102][103], and memtransistor [104,105] according to their operating mechanisms. The 2DMs with superior carrier mobility are most often adopted as the channel layer. The insulating h-BN is reported to have fewer charge impurities and better uniformity [106], making it suitable as the tunneling barrier [89,[91][92][93][94]. Graphene commonly acts as the floating gate [89][90][91][92] that stores different amounts of charge to represent analog synaptic weights. Moreover, it can be adopted as a thin ionic tunneling layer between the channel and the electrolyte for creating a stable interface in the WO₃-based electrolyte-gated 2DM synapse [98]. The electrolyte at the gate terminal provides gate-controlled ions that modulate the channel conductance through an intercalation process. As for the ferroelectric-gated 2DM synapse, the spontaneous polarization switching may arise from conventional ferroelectric materials [49], the newly emerged HfO₂-based ferroelectric materials [100], or the 2DM itself [101,102].
On the other hand, the memtransistor-based 2DM synapse is a hybrid structure combining both the memristor and transistor [104], and the defects migrating along the grain boundaries of polycrystalline MoS 2 result in the modulation of conductance [104,105]. In the next section, we will focus on the synaptic plasticity realized in various 2DM synapses.

Synaptic plasticity in 2DM synapse
Several forms of synaptic plasticity can be found in 2DM synapses. Short-term plasticity, such as paired-pulse facilitation/depression, describes the decay of weight modulation immediately after removing the stimuli. Long-term plasticity, including spike-timing-dependent plasticity (STDP) and long-term potentiation/depression (LTP/LTD), shows longer sustainability in the weight modulation. Although the emulation of short-term and long-term plasticity are both important for paving the way toward neuromorphic computing, long-term plasticity is more commonly adopted in DNNs. This is because the time-independent weight modulation is more compatible with the well-developed learning rules in DNNs [107]. Besides, STDP is reported as an important learning rule that encodes spike-based information in the time domain, which is favorable for spiking neural networks (SNNs). The SNN is promising for constructing a computing system resembling the biological neural network with low power consumption and high energy efficiency [1]. However, the lack of a global learning architecture supporting the STDP learning algorithm still hinders its development [32]. On the contrary, the DNN has long been developed with optimized backpropagation algorithms [108], and it is now mature for most artificial intelligence applications. Therefore, in this section, we will mainly discuss long-term synaptic plasticity, especially LTP and LTD for weight update, where adjustable multilevel memory states are preferable for practical DNN hardware implementation. Representative experimental results of the 2DM synapses are reviewed.

Figure 3. (a)-(d) LTP and LTD characteristics of representative 2DM synapses [75,84,86,92]. By applying consecutive pulses for LTP and LTD, multilevel weight modulation can be achieved bidirectionally. The tunability of the LTP and LTD characteristics depends on the process and operating conditions.
For example, various weight update characteristics with different conductance ranges and nonlinearities can be obtained in a MoS₂ synapse by using different types of ions for intercalation [75]. Moreover, the weight update can be tuned by modulating the amplitude [86,92], width, and number [84,92] of the input pulses. Reducing the pulse amplitude and width improves the nonlinearity, and further decreasing the pulse number can even achieve a perfectly linear weight update [92]. However, the consequent degradation in dynamic range, and thus in weight precision, needs to be taken into consideration.

Table 2 notes: a The device is based on a two-terminal floating-gate memory structure without a gate terminal. b The graphene layer serves as an insertion layer between the WO₃ channel and the gate electrolyte.

Figure 3 (continued). (e), (f) Stable cycling of weight updates in 2DM synapses [76,90]. (g), (h) Optimization of the cycling behaviors in the WO₃-based electrolyte-gated 2DM synapse: by inserting a graphene layer between the WO₃ channel and the electrolyte, the weight update performance is greatly improved from (g) to (h) [98].

Reliability and variability properties
Reliability and variability are crucial issues for any practical application; however, they are less discussed in most 2DM synapse reports. Nevertheless, there is still no clear consensus on the reliability requirements of synaptic devices because they are strongly application-specific. Since the synaptic device exhibits both memory and computing functions in DNNs, its reliability evaluation could be similar to that of NVMs [109], where endurance and retention are the two key metrics [110]. Figures 3(e) and (f) show the stable cycling measurement of weight updates in several 2DM synapses [76,90]. Although non-identical rather than identical input pulses were applied, this cycling measurement still provides useful information for evaluating the endurance of synaptic devices. Besides, the cycling behavior of the WO₃-based electrolyte-gated 2DM synapse is optimized by inserting a graphene layer between the electrolyte and the WO₃ channel, as shown in figures 3(g) and (h) [98]. The inserted graphene successfully suppresses the self-diffusion of Li⁺ ions into the channel and prevents unstable states; therefore, an obvious improvement in the weight update can be observed. Moreover, temporal variation in the weight update is commonly found in the reported 2DM synapses, but this cycle-to-cycle (CtC) variation is rarely the main focus because higher priority is usually given to demonstrating a large number of analog states and stable cycling of LTP/LTD. Reports examining the device-to-device (DtD) variation in 2DM synapses are even scarcer. Although CtC and DtD variations are seldom reported in current 2DM synapses, their impact on DNN training and inference may not be negligible and will be discussed in the next section.
Figures 4(a) and (b) demonstrate the outstanding cycling endurance used for evaluating variation in the α-In₂Se₃-based and SnS-based ferroelectric 2DM synapses, with CtC variations of 1.91% [102] and 6.27% [86], respectively. Moreover, a 5.95% DtD variation among 10 devices is further reported in [86], supporting its relatively stable and uniform characteristics. Although both devices present superior cycling endurance, the stable and symmetric weight updates were all obtained using non-identical input pulses [86,102], which may require a write-and-verify programming scheme. Such a complex programming scheme not only adds operational overhead but also compromises the advantages of high parallelism and energy efficiency in the hardware implementation of DNNs. The retention of 2DM synapses is more frequently demonstrated than their variability. However, the reported retention properties are usually obtained from DC-cycled high-resistance and low-resistance states that possess a much larger conductance change than those used in synaptic operations [76,80,111]. Analyzing the retention properties obtained by AC pulses is preferable. Figure 5(a) demonstrates a retention of 2000 s obtained using AC pulses in the MoS₂-based charge-storage 2DM synapse [90]. However, only two synaptic states were characterized, using stronger pulses (35 pulses of −10 V/10 μs and 8 V/10 μs for potentiation and depression, respectively), which differ from its normal programming conditions in LTP (−10 V/1 μs) and LTD (8 V/1 μs). A similar case can be found in the retention behavior of the WO₃-based electrolyte-gated 2DM synapse illustrated in figure 5(b) [98]. Although each of the five synaptic states maintains a retention time of 10 s, these synaptic states were programmed using incremental pulses, which differ from the favorable identical pulses used for weight update.
Therefore, evaluating the retention of each synaptic weight state under operating conditions similar to those used in the weight update is highly recommended. In figure 5(c), 131 distinguishable synaptic weight states between 0.1 nS and 13 nS are successfully demonstrated in the MoS₂-based charge-storage 2DM synapse, and the reported retention of each state is 120 s [92]. Although some detailed information about the measurement is not disclosed, this retention measurement could be by far one of the most practical demonstrations toward real applications. However, a longer retention time for each synaptic weight state is required, especially for inference-based applications. Table 3 summarizes several key device parameters obtained in representative 2DM synapses [72, 75, 76, 80, 84, 86, 90, 92, 96-99, 102, 111] from different categories. There is still plenty of room for exploration and improvement in 2DM synapses, especially in statistical measurements and reliability analyses.

Discussion and prospects
While the development is still at an early stage, a quantitative analysis of the requirements of 2DM synapses from the system-level perspective for practical hardware implementation is still lacking. Whether the current 2DM synapses are adequate, or which characteristics need further improvement for real applications, is still under scrutiny. Therefore, in the following sections, we will consider the impact of the key device parameters (listed in table 3) on DNN applications based on our previously proposed methodology [26]. We intend to analyze the device requirements of 2DM synapses by connecting them with the system-level specifications, such as accuracy, power, and area. In particular, we will focus on how to realize accurate online training and inference in DNNs, and we will highlight the importance of low state variation in synaptic devices. Finally, we will quantitatively discuss the system-level accuracy during DNN online training and inference. By taking state variation into account, we aim to provide a design guideline for the future development of 2DM synapses toward practical applications.

Table 3 notes: c A non-identical pulse train is adopted. d ANL = 0 is assumed for the symmetric weight update.

Power prospects
The pulse amplitude and pulse width required for LTP and LTD weight updates in synaptic devices are directly related to the power and speed of DNN online training. Although improving training accuracy is the main concern at the current development stage, operating the synaptic device with low voltages (<1 V) and short pulses (<1 μs) is highly favorable for improving energy efficiency and latency. Moreover, the operating voltage needs to be compatible with the power supply voltage of the peripheral driving circuits; in advanced CMOS, the power supply voltage has been scaled to less than 1 V. A pulse width of less than 1 μs should be a reasonable target because most 3DM NVMs claim a fast programming speed. However, most of the reported 2DM synapses require high operating voltages (>5 V) and operate at slow speeds (∼ms). Therefore, there is plenty of room for further reducing the pulse amplitude and pulse width of 2DM synapses.
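As a rough illustration of why this matters, the write energy dissipated in a single cell during one update pulse can be estimated as E ≈ V²·G·t; the device values below are illustrative assumptions, not figures from any cited report.

```python
def pulse_energy(v_pulse, conductance, t_pulse):
    """Energy (J) dissipated in one synaptic cell during a single write pulse,
    approximating the cell current as I = V * G at the programmed state."""
    return v_pulse ** 2 * conductance * t_pulse

# Assumed "typical reported 2DM synapse" regime: ~5 V amplitude, ~1 ms pulse.
e_slow = pulse_energy(5.0, 100e-9, 1e-3)  # 2.5e-9 J = 2.5 nJ per update
# Assumed target regime discussed above: <1 V amplitude and <1 us pulse width.
e_fast = pulse_energy(0.8, 100e-9, 1e-6)  # 6.4e-14 J = 64 fJ per update
```

Under these assumptions, moving from the first regime to the second reduces the per-update energy by more than four orders of magnitude.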

Area prospects
The dimensions of synaptic devices directly affect the device density and area overhead. Realizing synaptic devices at nanometer scales is still highly recommended [31]; however, most reported 2DM synapses remain at micrometer scales. Although scaling the device area is a practical way to improve operating voltage, current, and speed, whether similar performance can be obtained in the scaled device needs to be carefully investigated [109].

Accuracy prospects in DNN online training

Weight precision
Classical LTP and LTD characteristics utilize the same total number of electrical pulses for both LTP and LTD, and applying one LTP/LTD pulse increases/decreases the synaptic weight state by one. Therefore, the maximum number of synaptic weight states determines the weight precision of a synaptic device. Weight precision is one of the key parameters that define the accuracy of DNN online training: higher weight precision enables a more precise calculation and results in higher accuracy. Although the weight precision of synaptic devices also influences DNN inference, its requirement there is much relaxed compared to that in online training.

Asymmetric nonlinearity (ANL)
The ANL of a synaptic device is an important indicator describing the asymmetry and nonlinearity of the weight update during LTP/LTD; it is therefore crucial for DNN online training. In our analysis, ANL is defined following [112] through two fitted functions, G_P and G_D, which describe the synaptic weight modulation in LTP and LTD, respectively; n is the number of applied pulses, N is the maximum number of pulses allowed for both LTP and LTD (which defines the weight precision), and k is a fitting parameter. ANL is normalized to between zero and one using the dynamic range (i.e. G_max − G_min). For a perfectly symmetric and linear LTP and LTD, ANL is zero; by contrast, ANL is one for an extremely asymmetric and nonlinear LTP and LTD. ANL is particularly important for DNN online training because of the frequent weight updates required in synaptic devices. By contrast, ANL is less of a concern in DNN inference, where the write-and-verify programming scheme can be applied with a negligible penalty because of the infrequent weight updates. The existence of ANL results in uneven modulation of each synaptic state depending on the direction of LTP/LTD and the present synaptic state, and the cumulative effect on the weight value no longer follows a simple summation and subtraction rule [112]. As a result, the training accuracy degrades as ANL increases. Note that a synaptic device showing a symmetric weight update is less influenced by the nonlinearity: even though the nonlinearity contributes to an uneven increase/decrease of the weight depending on the present state, the symmetric LTP/LTD provides comparable modulation in both directions and mitigates the training accuracy loss [112,113].
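As a sketch, a commonly used exponential fit for pulse-driven LTP/LTD curves (the exact functional form and constants in [112] may differ in detail) can be implemented to show how a range-normalized ANL behaves at the two extremes; all parameter values here are illustrative assumptions.

```python
import numpy as np

def ltp_ltd_curves(g_min, g_max, n_states, k):
    """Exponential LTP/LTD model: k sets the curvature (large k -> nearly
    linear update, small k -> strongly nonlinear update)."""
    n = np.arange(n_states + 1)
    b = (g_max - g_min) / (1.0 - np.exp(-n_states / k))
    g_p = g_min + b * (1.0 - np.exp(-n / k))  # potentiation after n pulses
    g_d = g_max - b * (1.0 - np.exp(-n / k))  # depression after n pulses
    return g_p, g_d

def anl(g_p, g_d, g_min, g_max):
    """Gap between the LTP and LTD curves at the half-pulse point,
    normalized to the dynamic range so that 0 <= ANL <= 1."""
    mid = len(g_p) // 2
    return (g_p[mid] - g_d[mid]) / (g_max - g_min)

g_p, g_d = ltp_ltd_curves(1e-9, 1e-7, 64, k=1e6)  # near-linear limit
print(round(anl(g_p, g_d, 1e-9, 1e-7), 3))        # ~0: symmetric and linear
g_p, g_d = ltp_ltd_curves(1e-9, 1e-7, 64, k=2.0)  # strongly nonlinear
print(round(anl(g_p, g_d, 1e-9, 1e-7), 3))        # ~1: extreme asymmetry
```

With this model, the LTP and LTD curves saturate toward opposite ends of the conductance window, which reproduces the tendency of nonlinear devices to pile weights up near G_max or G_min during training.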

Cycle-to-cycle (CtC) variation
The CtC variation defines the temporal random variation of the conductance state change in a synaptic device when applying the same input stimuli. The conductance values during the weight update process are randomly affected by the CtC variations. The CtC variation is the main factor affecting online training accuracy. By contrast, the DtD variation is tolerable due to the self-adaptive nature of online training [114], where the weight of each synaptic device can be adjusted appropriately through a supervised learning process even with different weight update characteristics. In our analysis, the CtC variation is normalized to the total dynamic range [26], and we use the standard deviation (σ) to describe the CtC variations of synaptic devices.

Figure 6. Accuracy analysis for DNN online training. (b) The training accuracy vs CtC variation for different weight precisions: the synaptic device with higher weight precision shows more severe accuracy degradation with increased CtC variation, and the one with lower weight precision eventually outperforms the others at σ = 5%. The predicted accuracy using the 2DM synapses of [86,102] is identified in the figure, and the detailed device parameters are listed in table 3. (c) The training accuracy vs CtC variation when using different weight precisions and ANL = 0.6. The accuracy of the synaptic device with lower weight precision is degraded due to the high ANL.
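A minimal sketch of how such range-normalized CtC write noise can be injected into an LTP simulation (the conductance window and σ value are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_potentiation(g, step, sigma, g_min, g_max):
    """One LTP pulse with Gaussian CtC noise whose standard deviation is a
    fraction sigma of the dynamic range; the result is clipped to the window."""
    noise = rng.normal(0.0, sigma * (g_max - g_min))
    return float(np.clip(g + step + noise, g_min, g_max))

g_min, g_max, n_states = 0.0, 1.0, 64
step = (g_max - g_min) / n_states  # ideal spacing between adjacent states
g, trace = g_min, []
for _ in range(n_states):
    g = noisy_potentiation(g, step, sigma=0.02, g_min=g_min, g_max=g_max)
    trace.append(g)
# With sigma = 2%, the noise is already comparable to the 1/64 ~ 1.6% state
# spacing, so adjacent states of a 6 bit device start to overlap -- which is
# why high-precision devices are the first to lose accuracy as sigma grows.
```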

Endurance property
For endurance analysis, the total number of input pulses applied during the cyclic LTP and LTD measurements could be roughly seen as the maximum number of input pulses allowed during weight update. The endurance property is particularly important for DNN online training with frequent weight updates, but it has less impact on DNN inference. The maximum allowed number of input pulses should be high enough to guarantee successful training without early device failure.

Simulation results
To analyze the performance during DNN online training, we first consider ideal synaptic devices without state variation. The accuracy of VGG-9 DNN training for CIFAR-10 classification under different ANL values of the synaptic devices is evaluated, as shown in figure 6(a). When a perfectly linear weight update (ANL = 0) is assumed, accuracies of 88.3%, 90.8%, and 91.84% are obtained with weight precisions of 64 (6 bit), 128 (7 bit), and 256 (8 bit), respectively; higher weight precision results in higher accuracy. However, when the ANL increases, clear accuracy degradation is found. The synaptic device with a lower weight precision (N = 64) shows even worse immunity against the increase of ANL than those with higher weight precision (N = 128 and 256). This ANL-induced accuracy degradation is a well-known issue in synaptic devices [115,116]. Without considering state variation, pursuing linear weight updates and high weight precision in synaptic devices is an effective way to increase the training performance of DNNs.
However, when CtC state variation is present in a realistic case, drastically different conclusions are obtained. Figure 6(b) shows the training accuracy when considering CtC variation in the synaptic device with linear weight update (ANL = 0). The result suggests that the synaptic devices with a higher weight precision (N = 128 and 256) sustain the advantage of high accuracy when the CtC variation is low (σ < 2%); however, when the CtC variation is increased (σ > 2%), those with higher precision show severe accuracy degradation. On the other hand, the synaptic device with a lower weight precision (N = 64) is less sensitive to CtC variation because of the wider margin between each synaptic state. The synaptic device with 64 states (accuracy = 85.02%) even outperforms that with 256 states (accuracy = 76.92%) when the CtC variation is 5%.
When considering the existence of both CtC variation and ANL in a more realistic situation, the accuracy is calculated as shown in figure 6(c). Even though the synaptic device with lower weight precision (N = 64) has almost constant accuracy with increased CtC variation, it can no longer prevail in a large ANL case (0.6) because the baseline accuracy without CtC variation is seriously compromised. Therefore, pursuing a higher weight precision is beneficial only when the CtC variation is small or when ANL is high. One should carefully consider the non-ideal properties of the synaptic device to find the best trade-offs.
Despite the importance of ANL, the CtC variation, and their interaction, very few 2DM synapse studies report them simultaneously, which makes evaluating their online training potential difficult. The two ferroelectric 2DM synapses [86,102] are among the few with reported CtC variation, and their cycling behaviors are shown in figure 4. Accuracies of 80.54% and 87.78% based on the SnS-based [86] and α-In₂Se₃-based [102] ferroelectric 2DM synapses, respectively, are identified in figure 6(b). Both devices have a weight precision of 100, and their symmetric nonlinear weight update is assumed to have ANL = 0. Table 4 summarizes the specifications of synaptic devices for accurate online training under the situations mentioned above. The total number of pulses applied to each synaptic device after completing online training on CIFAR-10 is also provided accordingly. For the ANL = 0 cases, higher/lower weight precision results in finer/coarser calculation, thus requiring more/fewer weight updates. However, when ANL is considered (0.6 in this case), the synaptic weights are often modulated close to G_max or G_min due to the nonlinear LTP/LTD properties. As a result, the online training requires fewer updates due to early convergence to a less accurate model. Note that the required endurance is a minimum criterion: any additional retraining increases the total number of pulses required, so in general the endurance of synaptic devices should be as high as possible to perform repeated training tasks.

Dynamic range
The dynamic range of synaptic devices describes the total modulated range of the synaptic weight in conductance, from the lowest G_min to the highest G_max, and it can be defined by the conductance range G_max − G_min or by the ratio G_max/G_min. The latter is similar to the definition of a memory window. A higher dynamic range allows a larger margin in conductance between the multilevel states of synaptic devices, which is important when DtD variations are present. Besides, the dynamic range defines the magnitude of the individual synaptic current and the summing current of multiple synapses on the same bit-line (BL) [26], which directly correlates with the power consumed in DNN systems. Synaptic devices with higher conductance values result in higher power consumption. The dynamic range and conductance value of a synaptic device also limit the synaptic array size because the summing current of the BLs must not exceed the maximum allowed current of the output sensing amplifier. Although a large array could be partitioned into multiple smaller ones to lower the summing current of each BL, additional overheads in area and power are unavoidable for maintaining the same accuracy [117]. A more detailed study on the dynamic range of synaptic devices for area and energy co-optimization in a feasible in-memory computing design can be found in [26].
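The BL current constraint can be turned into a quick array-sizing estimate: in the worst case, every cell on a BL sits at G_max with all rows active. The sense-amplifier budget and device values below are illustrative assumptions.

```python
def max_rows_per_bitline(g_max, v_read, i_sense_limit):
    """Largest number of simultaneously active rows whose worst-case summed
    current (all cells at G_max) stays within the sense-amplifier budget."""
    # The tiny epsilon guards the floor operation against floating-point dust.
    return int(i_sense_limit / (g_max * v_read) + 1e-9)

g_max = 1e-6      # 1 uS upper conductance state (assumed)
v_read = 0.1      # 0.1 V read voltage (assumed)
i_sense = 10e-6   # 10 uA sense-amplifier current budget (assumed)

rows = max_rows_per_bitline(g_max, v_read, i_sense)
print(rows)  # 100 -> a 1024-row layer would need ~11 partitions
```

Lowering G_max (at a fixed G_max/G_min ratio) relaxes this limit linearly, which is one reason low-conductance states are attractive despite the smaller sensing margin.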

Device-to-device (DtD) variation
The DtD variation defines the spatial random variation of the conductance values in different synaptic devices when applying the same input stimuli, and it needs to be considered in DNN inference. By contrast, the CtC variation can be ignored here because there is no weight update during inference. In our analysis, the DtD variation is normalized to the mean conductance value [26], and we use the standard deviation (σ) to describe DtD variations of synaptic devices.
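As a minimal sketch of how such a normalized DtD variation can be modeled (the array size, nominal conductance, and σ value are assumptions for illustration):

```python
import numpy as np

def apply_dtd_variation(g_target, sigma, rng):
    """Model spatial DtD variation: each programmed conductance deviates
    from its target by Gaussian noise with std = sigma * target value."""
    return g_target * (1.0 + sigma * rng.standard_normal(g_target.shape))

rng = np.random.default_rng(0)
g_target = np.full((64, 64), 5e-7)    # one nominal weight state, 0.5 uS
g_actual = apply_dtd_variation(g_target, sigma=0.05, rng=rng)  # 5% DtD
spread = g_actual.std() / g_actual.mean()   # recovers sigma of about 0.05
```

Because the noise is fixed per device rather than per read, the same deviation distorts every inference pass, unlike CtC variation, which averages out over repeated updates during training.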

Retention property
For conventional 3DM synapses, several retention failure modes with a wide range of time constants have been discussed and their impacts evaluated in [110]. Here, for simplicity, we mainly consider a global and permanent conductance drift that typically occurs after a long retention time. Because the time constant of the retention degradation considered here is long, it mainly affects inference rather than training accuracy: frequent weight updates are performed during training, so maintaining weight states accurately for a long time is not critical. By contrast, the retention time for inference should be as long as possible. For 3DM NVMs, a 10 year retention time is usually targeted.

Figure 7. Accuracy analysis for DNN inference using different synaptic device parameters. (a) Inference accuracy vs DtD variation for different weight precisions (N) and a dynamic range (Gmax/Gmin) of 2. Severe accuracy degradation occurs due to the small dynamic range; the device with N = 2 has a relatively larger margin between its two states, resulting in better immunity against DtD variation. (b) Inference accuracy vs DtD variation for different weight precisions and a dynamic range of 100, showing much higher resistance to DtD variation than (a).

Simulation results
In inference applications, repetitive reading without modifying the conductance of synaptic devices is required. Therefore, the stability of each weight state is crucial because any deviation of the BL summing current may lead to accuracy degradation. To analyze performance during DNN inference, dynamic ranges of 2 and 100 are assumed with a fixed maximum resistance (10^7 Ω) of the synaptic device. The inference accuracy is shown in figures 7(a) and (b), respectively. When the DtD variation is 0%, both dynamic ranges of 2 and 100 have the same baseline accuracy of 91.03%, 92.09%, and 92.18% for weight precisions of 2 (1 bit), 4 (2 bit), and 8 (3 bit), respectively, because each state can be precisely defined regardless of the magnitude of the dynamic range. This result also indicates that the weight precision requirement for inference is much relaxed compared to that for training. However, in the presence of DtD variation, the dynamic range becomes relevant because the margin between synaptic states depends on both the DtD variation and the dynamic range. Synaptic devices with lower weight precision have better immunity against variation when the dynamic range is small and the margin between states is insufficient for high weight precision. On the other hand, synaptic devices with higher weight precision show better accuracy when the dynamic range is large, because higher weight precision improves the baseline accuracy while the margin between states is less of a concern. However, increasing the dynamic range of synaptic devices without bound is not recommended from the perspective of power and area.
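The interplay of weight precision and dynamic range can be sketched numerically; this assumes evenly spaced conductance states (a simplification) and uses the fixed maximum resistance of 10^7 Ω from the simulation setup.

```python
import numpy as np

def state_conductances(n_states, g_min, dynamic_range):
    """Evenly spaced conductance states from G_min up to
    G_max = dynamic_range * G_min (an illustrative simplification)."""
    return np.linspace(g_min, g_min * dynamic_range, n_states)

def relative_margin(n_states, g_min, dynamic_range):
    """Smallest gap between adjacent states, normalized to the mean
    state conductance, as a rough proxy for immunity to DtD variation."""
    g = state_conductances(n_states, g_min, dynamic_range)
    return float(np.min(np.diff(g)) / g.mean())

g_min = 1e-7  # fixed maximum resistance of 10^7 Ohm, i.e. G_min = 0.1 uS
for dr in (2, 100):
    for n in (2, 4, 8):
        print(f"DR={dr:3d}, N={n}: relative margin = "
              f"{relative_margin(n, g_min, dr):.3f}")
# With DR = 2 and N = 8, adjacent states sit under ~10% of the mean
# conductance apart, so even modest DtD variation makes states overlap;
# DR = 100 restores the margin for the same N.
```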
The specifications of synaptic devices for accurate inference under the situations mentioned above are summarized in table 4. Here we further consider a state variation induced by a retention degradation of 5%. Retention degradation is defined as the ratio of the conductance drift after a certain retention time to the total dynamic range, with the sign representing the drift direction (positive/negative: toward higher/lower conductance). Therefore, one should carefully consider the non-ideal properties of synaptic devices to achieve the best possible inference accuracy.
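Following that definition, a global and permanent drift can be sketched as a fixed shift of every conductance by a fraction of the dynamic range; the conductance bounds and drift ratio here are illustrative assumptions.

```python
import numpy as np

def apply_retention_drift(g, drift_ratio, g_min, g_max):
    """Global, permanent retention drift: every conductance shifts by a
    fixed fraction of the dynamic range (positive = toward G_max),
    clipped to the physically allowed conductance window."""
    g_drifted = g + drift_ratio * (g_max - g_min)
    return np.clip(g_drifted, g_min, g_max)

g_min, g_max = 1e-7, 1e-5
rng = np.random.default_rng(0)
g = rng.uniform(g_min, g_max, 1000)          # trained weight conductances
g_after = apply_retention_drift(g, drift_ratio=0.05, g_min=g_min, g_max=g_max)
# A +5% drift shifts the whole conductance distribution upward, skewing
# the BL summing currents read out during inference.
```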

Conclusions and outlook
The 2DM synapse is the union of abundant 2DM research and a novel computing paradigm. Despite the increasing interest, current research on 2DM synapses focuses mostly on emulating synaptic functions in different 2DM systems and significantly less on the properties required for practical DNN applications. Based on our analysis, we make the following recommendations for future research on 2DM synapses. First, state variability is arguably the most critical factor for accurate DNNs. Statistical data such as CtC/DtD variation should be provided for any 2DM synapse of interest. Even though the process maturity of 2DM synapses is below that of the conventional 3DM counterparts, leveraging the unique properties of 2DMs could be the key to success. For example, ferroelectric 2DMs were discovered only recently [118] and have seldom been implemented as synapses. They often show superior (low) CtC and DtD variation [84,86,102], probably due to the stable spontaneous polarization in large-domain or epitaxial ferroelectric 2DMs, which cannot easily be achieved using thin ferroelectric 3DMs. Retention-induced state drifting, which is critical for inference applications, is also rarely discussed for 2DM synapses. Second, although many studies focus on increasing the number of analog states and improving ANL, these properties matter only for online training, whose device requirements are much stricter. Even state-of-the-art in-memory computing studies using mature memory technologies focus mostly on inference applications, where a significantly smaller number of states is sufficient and there are no ANL or endurance constraints. Third, an excessively large dynamic range is usually unnecessary because it usually comes with higher conductance and penalties in power and area.
Therefore, combining the first and third points, improving state variability should be the first priority for achieving not only accurate but also efficient DNN hardware. Fourth, scaling the device to the nanometer scale is not only beneficial for achieving a high-density crossbar synaptic array but also helps improve the operating voltage, current, and speed. Nevertheless, most reported 2DM synapses have dimensions beyond the micrometer scale, and array-level demonstrations are seldom seen [75,92,111]. The device scalability and array integration of 2DM synapses are prerequisites for large-scale DNN hardware.
Overall, discovering the best synaptic device for neuromorphic computing is a holistic optimization that has never been simple. The specifications of synaptic devices are strongly application-specific, and trade-offs must be made with care. There is still plenty of room for improvement and innovation in the field of 2DM synapses, but given its current strong momentum, we anticipate continued progress and flourishing of 2DM synapse research for neuromorphic computing in the future.