Dynamic modeling and time series prediction of air quality with an LSTM recurrent neural network

Air quality is closely tied to people's daily lives. To predict air quality with high accuracy, this paper takes the air pollution monitoring data of Lanzhou City from May 13, 2015 to April 18, 2020 as its basis and builds an LSTM model on the deep learning framework TensorFlow to predict the city's air quality, comparing it with an RNN model. The experimental results show that the model achieves a mean square error of 39.579212 and is more accurate than the RNN model, although it takes longer to train. This provides a new prediction method with a scientific and reasonable theoretical basis for air pollution prevention and control work.


Introduction
With the continuous improvement of people's production and living standards in today's society, environmental issues have gradually come into public view [1]. Lanzhou City, Gansu Province, is an ancient cultural city with a history of 2200 years. For a long time, as an essential industrial base and comprehensive transportation hub in the northwest, one of the critical central cities in the western region, and a vital node city of the Silk Road Economic Belt, Lanzhou has played an indispensable role in the political, economic and cultural development of the northwest [2]. At the same time, as a city situated in the northwest, it inevitably suffers a certain environmental impact [3]. Nowadays, environmental problems are becoming more serious, especially in spring, when Lanzhou is affected by dust storms and other climatic factors [4] [5]. In order to further reveal and manage the air quality pollution situation in Lanzhou, it is necessary to understand the trend of air quality change, obtain timely, accurate and comprehensive air quality information, and make precise predictions of air quality.
There are many air pollution prediction methods, including numerical prediction models, statistical prediction models, and machine-learning-based prediction models. Numerical prediction models, such as the Nested Air Quality Prediction Modeling System [6] and the Model-3/CMAQ model [7], are based on atmospheric dynamics and build a numerical model of pollutant transport in the atmosphere for prediction. This approach has been widely used, but its application range is narrow, and it is difficult to achieve real-time online prediction of air quality [8]. Statistical forecasting models, such as statistical regression models, clustering models and backpropagation models, determine intrinsic development patterns from historical data to complete the prediction [9], and have become one of the mainstream forecasting methods. Traditional machine learning algorithms such as Boost [10], MFO [11], and SVM [12] can also achieve short-term air quality forecasting, but they suffer from difficult feature extraction and high uncertainty in long-term forecasting [13]. In recent years, with the rapid development of deep learning technology, its great potential in the field of air quality prediction has become apparent [14]. In this paper, a TensorFlow-based LSTM (long short-term memory) time series model is proposed to predict air pollution in Lanzhou City, and its performance is compared with an RNN model.

RNN model theory
RNN, namely the recurrent neural network, is applied to scenarios where the input data is a sequence and successive elements depend on one another. Compared with a fully connected neural network, an RNN has a cyclic hidden layer, so the value of this layer depends both on the current input and on the value of the hidden layer at the previous step. Its structure is shown in Figure 1 [15]. In Figure 1, each circle is a unit, and every unit performs the same computation; therefore, the unrolled network can be folded into the structure in the left half of the figure, with a single unit computing repeatedly. According to the unrolled structure of the RNN and its sequential data, we make the following definitions based on Figure 1 [16]:
• x_t: the input at time t
• o_t: the output at time t
• s_t: the memory (hidden state) at time t
Like other neural networks, an RNN applies weight parameters to its input and obtains those parameters by continuous learning from the existing data. The basic formula is as follows (1):

s_t = f(U x_t + W s_{t-1})

where f is the activation function (standard choices include tanh, sigmoid, and ReLU), U is the weight matrix of the input x_t, and W is the weight matrix applied to the hidden-layer memory value of the previous moment, s_{t-1}.
According to the basic formula of the RNN, during learning the network receives not only the input at time t but also the memory of the previous moment (i.e., the data stored in the hidden layer). After the basic formula is computed, the resulting state s_t is fed into the output formula (2):

o_t = g(V s_t)

where g is the activation function and V is the weight matrix of the output layer. Through this repeated calculation, the output of the recurrent network is affected by the memory values of the preceding t-1 steps, which solves the problem that a fully connected neural network cannot process sequence inputs whose elements are related to each other.
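Formulas (1) and (2) can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: the tanh activation, the toy dimensions, and the function name rnn_forward are all assumptions chosen for clarity.

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Sketch of the RNN recurrence: s_t = tanh(U x_t + W s_{t-1}), o_t = V s_t."""
    s = s0
    outputs = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)   # hidden state carries memory of earlier inputs
        outputs.append(V @ s)        # output at time t depends on the whole prefix
    return outputs, s

# Toy dimensions: 3 input features, 4 hidden units, 1 output
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
W = rng.normal(size=(4, 4))
V = rng.normal(size=(1, 4))
xs = [rng.normal(size=3) for _ in range(5)]
outputs, s_last = rnn_forward(xs, U, W, V, np.zeros(4))
```

Note that the same U, W, and V are reused at every time step, which is the weight sharing discussed below.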
However, in the RNN model, a sufficiently long sequence can easily lead to extreme nonlinear behavior, namely vanishing or exploding gradients. The recursive calculation of an RNN resembles repeated matrix multiplication: because the weights of an RNN are shared, the same weight matrix is used at every time step. Therefore, as learning proceeds over long sequences, the backpropagated coefficients grow or decay exponentially, making the gradient change dramatically [17].
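The exponential effect of reusing one weight matrix can be seen directly with a toy 1x1 "weight matrix"; the specific values 0.5 and 1.5 are illustrative only.

```python
import numpy as np

# The same weight is applied at every time step, so backpropagation through time
# multiplies by that weight repeatedly; over 50 steps the product either
# collapses toward zero (vanishing) or blows up (exploding).
W_decay = np.array([[0.5]])
W_growth = np.array([[1.5]])
g_vanish = np.linalg.matrix_power(W_decay, 50)[0, 0]    # 0.5**50, vanishing
g_explode = np.linalg.matrix_power(W_growth, 50)[0, 0]  # 1.5**50, exploding
```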

LSTM model theory
LSTM is an improved model based on the RNN that addresses the vanishing gradient problem. Its recurrent unit module has a different structure, in which four network layers interact in a particular way. As shown in Figure 2, an LSTM uses "gates" to control discarding or adding information, realizing a forget-or-remember function that avoids the long-term dependency problem of the RNN [18].
A gate is a structure that controls information selection, composed of a sigmoid activation function and a pointwise multiplication [19]. As shown in Figure 2, an LSTM unit has three such gate structures: the forget gate, the input gate, and the output gate. The data update process of the LSTM is as follows. The forget gate is calculated as (3):

f_t = σ(W_{fx} x_t + W_{fh} h_{t-1} + b_f)

where W_{fx} is the weight matrix for the current input of the forget gate, W_{fh} is the weight matrix for the output h_{t-1} of the previous moment, b_f is the bias, and σ is the sigmoid activation function.
According to the output h_{t-1} of the hidden layer at the previous time and the current input x_t, the forget gate produces, through the sigmoid activation function, a value in [0,1]: when the value is 0, all information from the earlier time is discarded, and when it is 1, all information is retained. The input gate i_t, which controls which new information will be added, is calculated as (4):

i_t = σ(W_{ix} x_t + W_{ih} h_{t-1} + b_i)

where W_{ix} is the weight matrix for the current input of the input gate, W_{ih} is the weight matrix for the previous output h_{t-1}, b_i is the bias, and σ is the sigmoid activation function. The candidate memory value at the current moment, C̃_t, produced by a tanh layer, is calculated as (5):

C̃_t = tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c)

where W_{cx} is the weight matrix for the current input of the candidate memory value, W_{ch} is the weight matrix for the previous output h_{t-1}, b_c is the bias, and tanh is the activation function.
The cell state value at the current moment is obtained by using the input gate and the forget gate to mediate between the candidate value C̃_t and the previous cell state C_{t-1}. The calculation is shown in formula (6):

C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t

where f_t is the forget gate value at the current moment and i_t is the input gate value at the current moment.
The output gate controls the output of the cell state value: the cell state, passed through an activation function, is multiplied by the gate value to determine which information will be output. The gate is calculated as shown in equation (7):

o_t = σ(W_{ox} x_t + W_{oh} h_{t-1} + b_o)

where W_{ox} is the weight matrix for the current input of the output gate, W_{oh} is the weight matrix for the previous output h_{t-1}, b_o is the bias, and σ is the sigmoid activation function.
The final output of the LSTM unit is h_t, calculated as shown in equation (8):

h_t = o_t ⊙ tanh(C_t)

where o_t is the value of the output gate and C_t is the cell state value at the current moment.
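The gate equations above can be sketched as a single update step. This is a didactic sketch, not the paper's TensorFlow implementation: the dictionary of parameters, the toy dimensions, and the function name lstm_step are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, p):
    """One LSTM update; p holds weight matrices (W*, U*) and biases (b*)."""
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + p["bf"])        # forget gate
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + p["bi"])        # input gate
    C_tilde = np.tanh(p["Wc"] @ x + p["Uc"] @ h_prev + p["bc"])  # candidate memory
    C = f * C_prev + i * C_tilde                                 # new cell state
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + p["bo"])        # output gate
    h = o * np.tanh(C)                                           # unit output
    return h, C

# Toy dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
p = {}
for g in "fico":
    p["W" + g] = rng.normal(size=(4, 3))
    p["U" + g] = rng.normal(size=(4, 4))
    p["b" + g] = np.zeros(4)
h, C = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), p)
```

Because the sigmoid gates lie in (0,1) and tanh lies in (-1,1), the output h is always bounded, which is part of what stabilizes gradient flow.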

Data preparation
Through the national urban air quality real-time publishing platform of the China National Environmental Monitoring Centre, we obtained the air quality data used in the experiment. The air quality data of Lanzhou from May 13, 2014 to April 18, 2020, composed of the hourly records of these days, were selected as the experimental data set. The 24-hour PM2.5 content, PM10 content, PM100 content, AQI index, and other weather indicators are recorded, respectively, giving a total of 16 fields and 50,043 data points (as shown in Figure 3).

Data preprocessing
In the process of data collection, data may be missing due to human operation or equipment failure. Incomplete data will affect the prediction results of the model, so it is necessary to complete the missing data. Air quality data follow a specific periodic pattern, so in the experiment each missing value is completed with the average of the data recorded at the same hour on the day before and the day after.
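This same-hour averaging can be expressed with a 24-hour shift in both directions. A minimal sketch with pandas; the column name "PM2.5" and the dates are illustrative, not the paper's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-day hourly series with one missing reading.
idx = pd.date_range("2019-01-01", periods=72, freq="h")
df = pd.DataFrame({"PM2.5": np.arange(72, dtype=float)}, index=idx)
df.loc["2019-01-02 05:00", "PM2.5"] = np.nan

# Fill each missing hour with the mean of the same hour on the previous
# and the following day (shift by 24 hours in each direction).
same_hour_mean = (df["PM2.5"].shift(24) + df["PM2.5"].shift(-24)) / 2
df["PM2.5"] = df["PM2.5"].fillna(same_hour_mean)
```

For gaps on the first or last day one of the shifts is undefined, so a real pipeline would need a fallback rule there.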
At the same time, because the LSTM neural network model is sensitive to the scale of the input data, in order to reduce the impact of data scale on the model training effect, we normalize the data with min-max normalization, as shown in equation (9):

x' = (x - x_min) / (x_max - x_min)
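A minimal sketch of this normalization and its inverse (the inverse is needed later, when predictions are mapped back to the original scale before scoring); the function names are illustrative.

```python
import numpy as np

def min_max_normalize(x):
    """Equation (9): x' = (x - x_min) / (x_max - x_min), mapping the series into [0, 1]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def inverse_normalize(x_scaled, x_min, x_max):
    """Map normalized values back to the original scale."""
    return np.asarray(x_scaled) * (x_max - x_min) + x_min

scaled, lo, hi = min_max_normalize([12.0, 35.0, 88.0, 47.0])
```

In practice x_min and x_max should be computed on the training set only and reused for the test set, so that no test information leaks into training.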

Model structure design
As one of the most popular machine learning frameworks, TensorFlow is widely used in machine learning and deep learning. The RNN neural network model (see Figure 4) and the LSTM neural network model (see Figure 4) for air quality data prediction are designed on the TensorFlow framework. The experimental data are divided into two parts: the hourly data of the first 1900 days are used as the training set, and the hourly data of the last 185 days are used as the test set. The model's prediction performance is evaluated by comparing its output on the test set with the actual values. Through continual parameter tuning and optimization, the RNN model parameters are finally determined as follows: 128 hidden-layer neurons, a batch size of 16, a window size of 2850, and an Adam optimizer learning rate of 0.001; the model uses the first 24 data points to predict the next one. The parameters of the LSTM model are the same as those of the RNN model [20].
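The "use the first 24 data points to predict the next one" scheme amounts to building sliding supervised windows over the hourly series. A minimal sketch of that construction; the function name make_windows is illustrative, and the resulting (X, y) pairs would feed the TensorFlow model.

```python
import numpy as np

def make_windows(series, window=24):
    """Turn a 1-D series into (X, y) pairs: each target y is preceded by `window` points."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# Toy series of 100 hourly values
X, y = make_windows(np.arange(100.0), window=24)
```

Each row of X is one input window; for recurrent layers it would typically be reshaped to (samples, 24, 1) before training.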

Result Analysis
The trained LSTM and RNN models were used to predict the hourly data of the next 185 days, each value predicted from the preceding 24 data points, for a total of 4440 data points (185×24). After prediction, the predicted and actual data are inversely normalized, and the mean square error of each model is calculated to obtain the fitting curve between predicted and actual values; blue represents the actual data and orange the predicted data, as shown in Fig. 5 and Fig. 6. Figure 7 compares the original data, the LSTM prediction results, and the RNN prediction results: in that figure, blue represents the actual data, orange the LSTM predictions, and green the RNN predictions. We can see that LSTM is better than RNN at predicting normal data fluctuations. Although both models reasonably forecast the trend of the data, LSTM performs better in predicting the order of magnitude. At the same time, many extreme mutations are not captured by LSTM, while RNN reasonably predicts their occurrence. Comparing the results in Table 1, LSTM achieves a lower MSE and higher accuracy in predicting the air quality of Lanzhou; however, the training time of the LSTM model is longer than that of the RNN model.
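The evaluation metric used above is the plain mean square error computed on the inversely normalized values. A minimal sketch (the numbers are illustrative, not the paper's data):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error: average of squared differences between actual and predicted."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy actual vs. predicted values after inverse normalization
err = mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
```

Because the error is computed on the original scale, the reported MSE of 39.579212 is directly interpretable in the units of the predicted indicator.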