AGA-LSTM: An Optimized LSTM Neural Network Model Based on Adaptive Genetic Algorithm

As the number of hidden layers increases, the weight updates of an LSTM neural network depend heavily on the gradient descent algorithm; convergence is slow and the weight adjustment is prone to local extrema, which degrades the prediction performance of the model. To address this, this paper proposes an LSTM neural network model optimized by an adaptive genetic algorithm (AGA-LSTM). In this model, the mean squared error is used as the fitness function, and the adaptive genetic algorithm (AGA) globally optimizes the weights between the neuron nodes of the LSTM model to improve its generalization ability. Experimental results on UCI data sets show that the prediction accuracy of the AGA-LSTM model is substantially higher than that of the standard LSTM model, which verifies the rationality of the model.

Training deep LSTM networks with gradient descent alone can only obtain locally optimal weights. To address this, this paper proposes an LSTM neural network model optimized by an adaptive genetic algorithm (AGA-LSTM). The AGA-LSTM model takes the mean squared error of the LSTM network's output as the fitness function, uses the adaptive genetic algorithm (AGA) to construct the optimization space, and globally optimizes the weights between the nodes of the LSTM model to improve prediction performance. Finally, experimental results on classic UCI data sets verify the rationality and effectiveness of the proposed AGA-LSTM neural network model.

2.1 Adaptive genetic algorithm
The genetic algorithm is an evolutionary algorithm proposed by Holland. It builds an optimization space by simulating the genetic mechanisms of the biological world and is widely used to solve various combinatorial optimization problems. Each chromosome in a genetic algorithm corresponds to an individual in the population and represents a candidate solution to the optimization problem. However, the crossover and mutation probabilities of the standard genetic algorithm are fixed, so convergence is slow, premature convergence occurs easily, and the global optimal solution cannot be obtained. The adaptive genetic algorithm (AGA) is an improved genetic algorithm proposed by Srinivas et al. [13] to overcome the randomness and blindness of the traditional genetic algorithm in selection, crossover and mutation, making it easier to obtain an approximate global optimum. In the AGA, the crossover and mutation probabilities are adjusted dynamically as the fitness values and the number of iterations change, so that excellent individuals are retained in the population and premature convergence is avoided.
The adaptive genetic algorithm proceeds through the steps of initializing the population, calculating the fitness of each individual, selection and replication, adaptive crossover, adaptive mutation, and checking the stopping condition. The specific steps are as follows:
1) Initialize the population space and set the relevant parameters of the algorithm.
2) Calculate the fitness value of each individual (chromosome) in the population.
3) Select individuals into the crossover pool. Individuals (chromosomes) with higher fitness values are selected and copied into the offspring to form new individuals; individuals with low fitness values are eliminated.
4) Adaptive crossover. Individuals in the population are randomly paired, and parts of the chromosomes of each pair are exchanged and recombined with the adaptive crossover probability.
5) Adaptive mutation. Each individual (chromosome) in the parent population changes some of its genes to other alleles with the adaptive mutation probability.
6) Check the stopping condition: terminate the iteration if it is met, otherwise return to step 2).
The crossover and mutation probabilities in the adaptive genetic algorithm are adjusted dynamically across iterations to maintain the diversity of the population. The fitness function is the main basis for evaluating the merit of individuals, so the crossover and mutation probabilities change as the fitness values change.
Formulas (1) and (2) give the adaptive crossover probability and mutation probability in the AGA:

$$P_c = \begin{cases} k_1 \dfrac{f_{\max} - f'}{f_{\max} - f_{avg}}, & f' \ge f_{avg} \\ k_2, & f' < f_{avg} \end{cases} \quad (1)$$

$$P_m = \begin{cases} k_3 \dfrac{f_{\max} - f}{f_{\max} - f_{avg}}, & f \ge f_{avg} \\ k_4, & f < f_{avg} \end{cases} \quad (2)$$

Here $f_{avg}$ is the average fitness value of all individuals in the population, $f_{\max}$ is the largest fitness value in the population, $f'$ is the larger fitness value of the two individuals to be crossed, and $f$ is the fitness value of the individual to be mutated; $k_1$, $k_2$, $k_3$ and $k_4$ are parameters between 0 and 1 that adjust the crossover and mutation probabilities.
By setting the values of $k_1$, $k_2$, $k_3$ and $k_4$, the adaptive genetic algorithm reduces the crossover and mutation probabilities in the early stage of the iteration to ensure that better individuals survive; in the late stage, when the individuals in the population tend to stabilize or fall into a local optimum, the crossover and mutation probabilities are increased to escape the local optimum and find an approximate global optimum.
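The adaptive probability rules of formulas (1) and (2) can be sketched in a few lines. This is a minimal illustration, assuming fitness is maximized; the default $k_1$–$k_4$ values are illustrative assumptions, not taken from the paper.

```python
def adaptive_probabilities(f_max, f_avg, f_prime, f, k1=0.9, k2=0.6, k3=0.1, k4=0.05):
    """Return (Pc, Pm) per formulas (1) and (2) of the AGA.

    f_max: largest fitness in the population; f_avg: average fitness;
    f_prime: larger fitness of the two parents to be crossed;
    f: fitness of the individual to be mutated.
    k1..k4 are illustrative values in (0, 1).
    """
    spread = f_max - f_avg
    # Above-average individuals get smaller Pc/Pm, protecting good solutions;
    # below-average ones (or a fully converged population) get the fixed k2/k4.
    pc = k1 * (f_max - f_prime) / spread if f_prime >= f_avg and spread > 0 else k2
    pm = k3 * (f_max - f) / spread if f >= f_avg and spread > 0 else k4
    return pc, pm
```

Note that for the best individual ($f' = f_{\max}$) both probabilities drop to zero, so the current best solution passes through unchanged, while below-average individuals are perturbed more aggressively.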

2.2 The LSTM neural network
The LSTM neural network is mainly composed of three gated units: the forget gate, the input gate and the output gate. These gated units learn and memorize sequence data, maintain long-range dependencies in time-series information, and enable high-precision prediction. The standard LSTM neuron structure is shown in Figure 1.
Figure 1. The structure of the LSTM neuron.
As shown in Figure 1, $i_t$, $o_t$ and $f_t$ respectively denote the input gate, the output gate and the forget gate. The input gate mainly processes the input data, the forget gate determines how much historical information the current neuron retains, and the output gate produces the output of the neuron. Suppose the input sequence is $(x_1, x_2, \ldots, x_t)$; then at time $t$ each quantity of the LSTM neuron is computed as follows:

$$f_t = S(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = S(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
$$o_t = S(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$

Here $W$ denotes the weights between the inputs and the cell unit, $b$ the corresponding biases, $h_t$ the output of the hidden layer at time $t$, and $S$ the Sigmoid function. The deep LSTM neural network stacks multiple hidden layers of LSTM units and continuously removes redundant information from the data set through the forget gates to maintain long-range dependencies, so it has stronger predictive performance and better generalization ability.
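The gate computations described above can be sketched as a single NumPy time step. This is a minimal sketch: the layout stacking the forget/input/candidate/output weights into one matrix $W$ acting on $[h_{t-1}, x_t]$, and the sizes used in the usage example, are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # S in the text: the Sigmoid activation.
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step for hidden size H and input size D.

    W has shape (4H, H + D): rows 0..H are the forget gate, then input
    gate, candidate cell state, and output gate pre-activations.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b     # all four pre-activations
    f_t = sigmoid(z[0:H])                # forget gate: how much history to keep
    i_t = sigmoid(z[H:2 * H])            # input gate: how much new input to admit
    c_tilde = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o_t = sigmoid(z[3 * H:4 * H])        # output gate
    c_t = f_t * c_prev + i_t * c_tilde   # updated cell state
    h_t = o_t * np.tanh(c_t)             # hidden-layer output at time t
    return h_t, c_t

# Tiny usage example with random weights (sizes are assumptions).
rng = np.random.default_rng(0)
D, H = 3, 4
W = 0.1 * rng.standard_normal((4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):    # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, b)
```

Since $h_t = o_t * \tanh(C_t)$ with $o_t \in (0, 1)$ and $|\tanh(\cdot)| < 1$, every component of the hidden output stays inside $(-1, 1)$.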


3 The AGA-LSTM neural network model
In this paper, the adaptive genetic algorithm is used to optimize the weights between the nodes of the LSTM neural network, making the weights between neurons more reasonable and improving the generalization ability and prediction performance of the model. This section describes the AGA-LSTM model in detail.

3.1 Chromosome coding
Figure 2 shows the structure of an LSTM model with 3 hidden layers, and Figure 3 shows the coding method of a chromosome. Here $w_{IH}^{1}$ represents the weight between the first neuron of the input layer and the first neuron of the first hidden layer, and so on. The chromosome coding used in this paper covers the weights between all neuron nodes, all of which are real numbers.
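The coding scheme, where every real-valued weight occupies one dimension of the chromosome, can be sketched as a flatten/restore pair. The layer shapes below are illustrative assumptions, not the actual sizes used in the paper.

```python
import numpy as np

# Illustrative per-layer weight-matrix shapes (assumed, not from the paper).
shapes = [(16, 5), (16, 8), (1, 4)]

def encode(weights):
    """Concatenate all weight matrices into one 1-D real-valued chromosome."""
    return np.concatenate([w.ravel() for w in weights])

def decode(chromosome, shapes):
    """Split a chromosome back into weight matrices of the given shapes."""
    weights, i = [], 0
    for rows, cols in shapes:
        n = rows * cols
        weights.append(chromosome[i:i + n].reshape(rows, cols))
        i += n
    return weights

rng = np.random.default_rng(1)
original = [rng.standard_normal(s) for s in shapes]
chromosome = encode(original)
restored = decode(chromosome, shapes)
```

Because encode/decode are exact inverses, the genetic operators can work purely on flat real vectors while the LSTM evaluates each candidate with its usual layered weights.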

3.2 Fitness value function
The fitness value of a chromosome in the AGA-LSTM model is the mean squared error (MSE) on the test set of the network corresponding to that chromosome during the iteration process. The calculation is shown in formula (8):

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \quad (8)$$

where $n$ is the number of samples, $y_i$ is the true value and $\hat{y}_i$ is the predicted value.
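The fitness computation of formula (8) is a one-liner over the test-set predictions; a minimal sketch:

```python
import numpy as np

def mse_fitness(y_true, y_pred):
    """Mean squared error over the test set, used as the fitness value (formula (8))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

Since a smaller MSE means a better chromosome, the selection step should favor individuals with lower `mse_fitness` values (or, equivalently, use a decreasing transform of the MSE as the raw fitness).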

3.3 The flow of the AGA-LSTM model
The AGA-LSTM model maps each weight between the nodes of the LSTM neural network to one dimension of a chromosome, so that each chromosome is a candidate weight solution for the LSTM neural network. The adaptive genetic algorithm then builds the optimization space and iterates continuously to make the weights between neurons more reasonable, improving the prediction performance and accuracy of the LSTM model. The specific flow is as follows:
(1) Initialize the population: encode the weights of the LSTM network as chromosomes according to Section 3.1, and set the parameters of the algorithm.
(2) Train the LSTM neural network to get the default optimal weights.
(3) Use formula (8) to calculate the fitness value of each chromosome in the population, and use the adaptive genetic algorithm to construct the optimization space.
(4) Select chromosomes into the crossover pool. The chromosomes whose fitness values rank in the top 0.1*N are copied unchanged into the offspring to form new individuals, and the chromosomes with lower fitness values are eliminated.
(5) Perform the adaptive crossover operation according to formula (1).
(6) Perform the adaptive mutation operation according to formula (2).
(7) Add 1 to the iteration counter n and check whether n exceeds the maximum number of iterations M. If so, terminate the iteration; otherwise return to step (3).
(8) Output the globally optimal chromosome, which corresponds to the optimal weight distribution of the LSTM network model.
Note that when the adaptive genetic algorithm is used to optimize the weights of the LSTM model, the selection operation adopts elite selection: roulette-wheel selection is used, while the chromosomes ranked in the top 0.1*N (where N is the population size) are copied unchanged into the next generation. In the crossover operation, two paired chromosomes are linearly combined to produce two new chromosomes; the crossover probability is given by formula (1). The mutation operation uses non-uniform mutation; the mutation probability is given by formula (2).
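The three operators described above can be sketched for chromosomes represented as lists of real-valued weights. The 0.1*N elite ratio follows the text; the mutation bounds, the shape parameter `b`, and the assumption of non-negative fitness values for the roulette wheel are illustrative assumptions.

```python
import random

def elite_roulette_select(pop, fits, n):
    """Copy the top 10% unchanged; fill the rest by roulette-wheel selection."""
    order = sorted(range(len(pop)), key=lambda i: fits[i], reverse=True)
    elite = [pop[i][:] for i in order[:max(1, int(0.1 * n))]]
    total = sum(fits)  # roulette wheel assumes non-negative fitness values

    def spin():
        r, acc = random.uniform(0, total), 0.0
        for ind, f in zip(pop, fits):
            acc += f
            if acc >= r:
                return ind[:]
        return pop[-1][:]

    return elite + [spin() for _ in range(n - len(elite))]

def arithmetic_crossover(a, b, alpha=None):
    """Linear combination of two parents producing two children."""
    if alpha is None:
        alpha = random.random()
    c1 = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    c2 = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return c1, c2

def nonuniform_mutate(ind, t, t_max, low=-1.0, high=1.0, b=2.0):
    """Non-uniform mutation: the perturbation shrinks as iteration t grows."""
    child = ind[:]
    j = random.randrange(len(child))
    delta = 1.0 - random.random() ** ((1.0 - t / t_max) ** b)
    if random.random() < 0.5:
        child[j] += (high - child[j]) * delta
    else:
        child[j] -= (child[j] - low) * delta
    return child
```

With `alpha = 0.5` the crossover averages the parents, and at `t = t_max` the non-uniform mutation step shrinks to zero, so late-iteration search becomes a fine local refinement.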

4 Experimental results and analysis
In this paper, models based on the deep GRU, deep LSTM and deep AGA-LSTM neural networks are established, and the performance of the AGA-LSTM model is verified by comparing the prediction accuracy of each model. The experimental environment is configured with the deep learning framework TensorFlow 1.10 and the Python 3 language.

4.1 Experimental data set
This paper selects the Air Quality (AQ), EEG, Dow Jones Index (DJI) and Ozone Level Detection (OLD) data sets from the UCI database. The AQ data set records air quality data for a region in Italy from 2014 to 2015 and is used to predict the region's air quality; the DJI data set records stock price data for the Dow Jones Index and can be used to predict the stock price trend over a future period; the OLD data set records ground-level ozone concentrations and can be used to predict the ozone concentration of the day. All of these are time-series prediction problems.

4.2 Establishing prediction models
The modeling steps of a neural network model include data preprocessing, training on the data set, and validating the model. Following these steps, this paper establishes prediction models based on the GRU, LSTM and AGA-LSTM neural networks on the three UCI data sets AQ, DJI and OLD to verify the effectiveness of the AGA-LSTM model. For each preprocessed data set, 70% is taken as the training set, 20% as the validation set, and 10% as the test set. The three network models share the same structure: one input layer, one output layer, and 6 hidden layers. The sum of squared errors (SSE) is used as the index for evaluating prediction performance, as shown in formula (9):

$$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \quad (9)$$

After the neural network models are trained, the test sets of the three data sets are input into each model in turn to obtain the comparison results. Table 1 shows the SSE values of the GRU, LSTM and AGA-LSTM models on the AQ, DJI and OLD data sets; Figure 5 is a histogram drawn from Table 1.

According to Table 1 and Figure 5, the SSE values of the LSTM model are the largest on all three data sets, followed by the GRU model, and the SSE values of the AGA-LSTM model are the lowest on each data set, indicating that the AGA-LSTM model has the highest prediction accuracy and the best generalization ability and prediction performance. Figure 6 shows the trend of the average SSE value of the three models on each data set. The average SSE value of the GRU model is 12.3% lower than that of the LSTM model, while the average SSE value of the AGA-LSTM model is 11.9% lower than that of the GRU model and 22.8% lower than that of the LSTM model. Clearly, the AGA-LSTM model has the smallest error and the best prediction accuracy, which proves its effectiveness.
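The 70/20/10 split and the SSE index of formula (9) can be sketched as follows; the sequential (non-shuffled) split is an assumption appropriate for time-series data, not a detail stated in the paper.

```python
import numpy as np

def split_70_20_10(data):
    """Split a preprocessed sequence into 70% train / 20% validation / 10% test,
    preserving temporal order (an assumption for time-series data)."""
    n = len(data)
    i, j = int(0.7 * n), int(0.9 * n)
    return data[:i], data[i:j], data[j:]

def sse(y_true, y_pred):
    """Sum of squared errors (formula (9)) used to compare model accuracy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y_true - y_pred) ** 2))
```

Unlike the MSE of formula (8), the SSE is not normalized by the number of samples, so it should only be compared between models evaluated on the same test set.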

5 Conclusion
In this paper, an LSTM neural network model optimized by an adaptive genetic algorithm is proposed to address the slow convergence and local extrema that arise in the weight adjustment of the LSTM neural network. The experimental results show that, compared with the traditional LSTM and GRU models, the AGA-LSTM model has the lowest average SSE value and the smallest prediction errors on the three UCI data sets. Therefore, the prediction performance and generalization ability of the AGA-LSTM model are verified.