The LSTM Neural Network Based on Memristor

Recurrent neural networks add the concept of time series to traditional multi-layer feedforward neural networks, which provides a memory function and gives the network good modeling ability on time-series data. This paper therefore proposes an LSTM (Long Short-Term Memory) neural network based on memristors. We establish an LSTM network model with discrete weights by simplifying the traditional recurrent neural network while preserving recognition performance, and we use memristor arrays to realize the function of the weight matrix, which improves the structure of the LSTM neural network. We then carry out a simulation study of the proposed network. Because memristors exhibit volatility and limited yield, this paper also demonstrates and analyzes the impact of these two characteristics on network performance, and verifies the performance that the memristor-based LSTM neural network reaches at the existing fabrication level. Experiments on the TIMIT speech database show that the proposed neural network achieves good speech recognition accuracy.


Introduction
In the field of deep learning, the recurrent neural network (RNN) is one of the most widely studied deep network models with information feedback. Compared with feedforward neural networks, recurrent neural networks incorporate the concept of time series, which provides learning and memory functions, maintains long-term dependence on the time series, and gives good modeling capability for time-series scenarios. However, because recurrent neural networks use the back propagation algorithm to adjust the model parameters, gradient explosion or gradient vanishing may occur when there are many hidden layers, which affects the prediction accuracy of the model. To this end, Hochreiter and Schmidhuber proposed the Long Short-Term Memory (LSTM) recurrent neural network [1]. Compared with the RNN, the LSTM adds to the neuron a forget gate and an input gate to control the state of the cell, and an output gate to control the output of the cell, which suppresses gradient vanishing to a certain extent and maintains information dependencies over a longer period. The LSTM model can be used to process and predict events with long intervals and delays in a time series. At present, the LSTM neural network is widely applied in speech recognition, machine translation, text classification, computer vision and other fields [2], [3], [4].
The implementation of LSTM neural networks is mainly based on CMOS devices, which wastes space and consumes excessive power. The memristor provides a good solution. Compared with traditional neural networks based on CMOS devices, a memristor-based neural network is small, consumes little power and is highly integrated [5]. The memristor also has multiple resistance states, which makes it well suited as the weight matrix circuit device in the LSTM neural network.
Memristors have produced extensive research results in CNNs, SNNs and other network types. For example, a convolutional neural network chip based on memristors improves performance by 7.7 times with essentially the same area [6]. However, current research on recurrent networks is based on continuous weights, which are difficult to realize with memristors, and recurrent neural network models based on discrete weights are lacking. It is therefore important to consider how to simplify the neural network structure without degrading recognition performance, so that the network can be implemented on a memristor array. At the same time, the memristor array has non-ideal characteristics such as volatility and limited yield, and these must be considered when analyzing the performance of artificial neural networks based on memristors. In this paper, the LSTM neural network model based on memristors is constructed first, and then the effect of the yield and volatility of the memristors on the performance of the proposed neural network is analyzed.

LSTM neural network based on memristor
The key to the LSTM neural network is the cell state, which carries the time-series information. LSTM neurons use three kinds of gated units, the forget gate, the input gate and the output gate, to remove elements from or add elements to the cell state. These gated units learn and memorize the sequence data, which maintains long-distance dependencies in the time series and enables high-precision prediction. As shown in Figure 1, the input gate mainly processes the input data, the forget gate determines how much historical information the current neuron retains, and the output gate determines the output of the neuron. Suppose the input sequence is (x_1, x_2, …, x_t); then at time t, each parameter of the LSTM neuron is calculated from the current input and the output of the previous time step.
In the LSTM network, the concatenated vector of the current input and the previous output is multiplied by the weight matrix, and the results pass through activation functions to form the gate values. Memristors, which have variable resistance and can remember their resistance state, are ideal devices for implementing the weight matrix in hardware. The memristor-based LSTM network therefore realizes the function of the weight matrix through a memristor crossbar array [7], [8]. A memristor has two states, R_on and R_off, which can be controlled by the amplitude and pulse duration of the applied voltage, and we use this property to represent positive and negative numbers in the weight matrix. Specifically, each weight is represented as the difference of two memristor conductances [9]. Figure 2 shows the structure of an LSTM neural network with 3 hidden layers.
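For reference, the gate computations described above can be written in the standard LSTM formulation. This is the commonly used form and is given here only as an assumption about what the section refers to. The symbols W, b, h_t, c_t, the sigmoid \sigma and the element-wise product \odot follow the usual convention and may differ from the notation of the original equations.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(neuron output)}
\end{aligned}
```

In this form, each of the four products W [h_{t-1}, x_t] is a matrix-vector multiplication, which is the operation a memristor crossbar array is used to carry out.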
The deep LSTM neural network is built from LSTM neural units into a network model with multiple hidden layers. It continuously removes redundant information in the data set through the forget gates to maintain long-distance dependencies, so it has stronger predictive performance and better generalization ability.
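The representation of a signed weight as the difference of two memristor conductances, described in this section, can be illustrated with a short NumPy sketch. It is a sketch under stated assumptions, not the circuit used in the paper: the function names, the linear weight-to-conductance scaling, and the example values of `g_on` and `g_off` are all hypothetical.

```python
import numpy as np

def weights_to_conductance_pairs(w, g_on, g_off):
    """Map signed weights to two non-negative conductance arrays.

    Assumption (not from the paper): weights scale linearly so that the
    largest |w| uses the full conductance range [g_off, g_on].
    """
    w = np.asarray(w, dtype=float)
    w_abs_max = np.abs(w).max()
    if w_abs_max == 0:
        raise ValueError("weight matrix must contain a non-zero entry")
    scale = (g_on - g_off) / w_abs_max
    # Positive weights raise the "plus" device, negative weights the "minus" device.
    g_plus = g_off + scale * np.clip(w, 0.0, None)
    g_minus = g_off + scale * np.clip(-w, 0.0, None)
    return g_plus, g_minus, scale

def crossbar_matvec(v, g_plus, g_minus, scale):
    """Ideal crossbar read-out: subtract the two column currents and rescale."""
    i_diff = v @ g_plus - v @ g_minus
    return i_diff / scale

# Hypothetical example values, chosen only for illustration.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))      # weights: 4 inputs, 3 outputs
v = rng.normal(size=4)           # input voltages
g_plus, g_minus, scale = weights_to_conductance_pairs(w, g_on=1e-3, g_off=1e-6)
y = crossbar_matvec(v, g_plus, g_minus, scale)
```

Under these idealized assumptions, `y` matches the ordinary product `v @ w`, because the common offset `g_off` cancels in the subtraction.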

Speech recognition processing flow of the memristor-based LSTM network
In this paper, we first use MFCC (Mel-frequency cepstral coefficients) [10] to extract the features of the speech samples. The input speech is preprocessed into a matrix that serves as the input of the LSTM network. The memristor-based LSTM neural network is then modeled as follows. First, the mathematical LSTM neural network with continuous weights is obtained through the training process of the LSTM neurons. Next, the weight matrix that the memristors are to realize is trained on the data set to determine its values. Finally, the values of the weight matrix are discretized, which completes the modeling of the performance of the memristor-based LSTM neural network.

Experimental results and analysis
First, this paper designs an LSTM neural network that recognizes and classifies speech samples of the digits 0-9. The samples are selected from the TIMIT speech database. Each speech sample is cut to a length of 1 s, with a frame length of 25 ms and a frame shift of 10 ms. MFCC is used to extract the features of each sample: the feature values of each frame are extracted, which gives a sequence of 100 vectors of size 1 × 13 (input_shape = (100, 13)).
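The framing step above can be sketched in NumPy as follows. The sketch covers only the split into frames, not the MFCC computation itself. It assumes a 16 kHz sampling rate and assumes that the end of the signal is zero-padded so that every 10 ms shift starts a frame, which is one way to arrive at 100 frames for a 1 s sample. The function name `frame_signal` and the padding choice are hypothetical and are not taken from the paper.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Split a 1-D signal into overlapping frames (zero-padded at the end)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = sample_rate * frame_ms // 1000   # samples per frame
    hop = sample_rate * shift_ms // 1000         # samples per frame shift
    n_frames = int(np.ceil(len(signal) / hop))   # one frame per shift
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(signal)] = signal
    # Row k holds samples k*hop ... k*hop + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return padded[idx]

# A 1 s placeholder signal; real speech samples would be used instead.
one_second = np.arange(16000, dtype=float)
frames = frame_signal(one_second)
```

Each row of `frames` would then be reduced to 13 MFCC values by a feature extractor, which gives the (100, 13) input described above.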

Effect of the resistance states
To simulate the LSTM neural network based on memristors, we first select an appropriate number of memristor resistance states. Let the number of resistance states be a, the maximum weight be w_max and the minimum weight be w_min. The weight range is divided into a intervals; when a weight of the network falls in the nth interval, its value is given by (6). Table 1 shows the recognition rates of the LSTM network for different numbers of resistance states. When the number of resistance states of the memristor is 2 or 4, the recognition accuracy of the network is poor. When the number of resistance states reaches 8, the recognition performance of the network improves significantly, and with 16 resistance states it is not significantly better than with 8. Therefore, the non-ideal characteristics of the memristor are simulated with 8 resistance states.
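The discretization described above can be sketched as a uniform quantizer. This is a sketch under an explicit assumption, not the authors' formula (6): each weight is replaced by the midpoint of the interval it falls in, so that a intervals yield exactly a distinct weight values. The function name `quantize_weights` is hypothetical.

```python
import numpy as np

def quantize_weights(w, num_states):
    """Quantize weights to `num_states` levels between w.min() and w.max().

    Assumption (not from the paper): the representative value of each
    interval is its midpoint.
    """
    w = np.asarray(w, dtype=float)
    w_min, w_max = w.min(), w.max()
    if w_max == w_min:
        return w.copy()                       # nothing to quantize
    step = (w_max - w_min) / num_states       # width of one interval
    n = np.floor((w - w_min) / step).astype(int)
    n = np.clip(n, 0, num_states - 1)         # w_max belongs to the last interval
    return w_min + (n + 0.5) * step

# Example with 8 resistance states, the setting chosen in the text.
weights = np.linspace(-1.0, 1.0, 1000)
quantized = quantize_weights(weights, num_states=8)
```

With this choice, the quantization error of any weight is at most half an interval width, which shrinks as the number of resistance states grows.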

Effect of the variation
Due to the limitations of fabrication technology, the resistance of memristor synapses fluctuates to some extent. In the LSTM network, when the resistance of the memristors used to simulate artificial synapses fluctuates to different degrees, the resistance value is modeled in terms of two quantities, A and B. Here A is the resistance value of the memristor synaptic device in the ideal state and B is the variation, and the resistance value of the memristor synaptic device obeys a normal distribution within the range set by A and B. When the memristor synaptic device is at its maximum resistance, the fluctuation of the resistance causes little change in the conductance; that is, it has little influence on the current flowing through the synapse and can be ignored. Therefore, this paper only considers the influence of the fluctuation of the minimum resistance on the recognition rate of the LSTM network. Table 2 shows that the recognition rate of the memristor-based LSTM network decreases gradually as the volatility of the memristor resistance increases. When the volatility of the resistance value reaches 15%, the recognition rate of the network drops below 90%.
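The resistance fluctuation described above can be simulated along the following lines. This is a sketch under assumptions that are not taken from the paper: the resistance is drawn from a normal distribution centered on the ideal value A, the standard deviation is set to A·B/3, and the samples are clipped to the range [A(1 − B), A(1 + B)]. The function name `perturb_resistance` and the example resistance are hypothetical.

```python
import numpy as np

def perturb_resistance(r_ideal, variation, rng):
    """Draw fluctuating resistances around the ideal values.

    Assumptions (not from the paper): sigma = r_ideal * variation / 3,
    and samples are clipped to +/- variation around the ideal value.
    """
    r_ideal = np.asarray(r_ideal, dtype=float)
    sigma = r_ideal * variation / 3.0
    r = rng.normal(loc=r_ideal, scale=sigma)
    return np.clip(r, r_ideal * (1.0 - variation), r_ideal * (1.0 + variation))

# Example: 1000 devices in the minimum-resistance state with 15% variation.
rng = np.random.default_rng(0)
r_min_ideal = np.full(1000, 1.0e3)            # hypothetical ideal value in ohms
r_min_actual = perturb_resistance(r_min_ideal, variation=0.15, rng=rng)
```

The perturbed resistances would then be converted to conductances and used in place of the ideal weights when the recognition rate is evaluated.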

Effect of the yield
In practice, some of the memristor synaptic devices in the LSTM network may be damaged. A damaged memristor can be in one of three situations: 1) stuck in the high resistance state; 2) stuck in the low resistance state; 3) in the state opposite to the intended one. Assuming that the resistance value of the memristor does not fluctuate, this paper defines a damaged memristor as one whose resistance is always in the high resistance state, and sets different yields to test the recognition rate of the LSTM network. With the fluctuation rate set to 0, Table 3 shows that the recognition rate of the network decreases as the yield decreases. When the yield of the memristor array is 96%, the recognition rate of the network decreases to about 90%.
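The yield experiment described above can be sketched as random fault injection. This is a sketch under assumptions rather than the authors' simulation code: each device is damaged independently with probability 1 − yield, and a damaged device is fixed at the high resistance state, represented here by a low conductance `g_off`. The function name `apply_yield` and the example values are hypothetical.

```python
import numpy as np

def apply_yield(conductance, yield_rate, g_off, rng):
    """Force randomly chosen damaged devices into the high resistance state.

    Assumption (not from the paper): devices fail independently with
    probability 1 - yield_rate.
    """
    conductance = np.asarray(conductance, dtype=float)
    damaged = rng.random(conductance.shape) >= yield_rate
    faulty = np.where(damaged, g_off, conductance)
    return faulty, damaged

# Example: a 100 x 100 array at 96% yield, with hypothetical conductances.
rng = np.random.default_rng(0)
g_ideal = np.full((100, 100), 1.0e-3)
g_faulty, damaged = apply_yield(g_ideal, yield_rate=0.96, g_off=1.0e-6, rng=rng)
```

Repeating the evaluation over several random fault patterns per yield value would average out the dependence on which particular devices are damaged.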

Conclusion
In this paper, an LSTM neural network model based on memristors with discrete weights is established by simplifying the mathematical LSTM network model while preserving recognition performance. The weight matrix of the simplified LSTM neural network is mapped to the memristor array, so that the function of the weight matrix is realized through the memristor array. In the experiments on the TIMIT speech database, we analyze the influence of the number of memristor resistance states on the recognition accuracy of the neural network, and discuss the influence of the volatility and yield of the memristors on the recognition rate. The experimental results verify the accuracy and performance of the proposed memristor-based LSTM neural network.