RUL Prediction by LSTM Model with Bayesian Parameter Optimization for Turbine Engines

Effectively predicting the remaining useful life (RUL) of a product is important for sound reliability planning and maintenance activities. The long short-term memory (LSTM) model, a deep learning method, was applied to RUL prediction of turbine engines, and a parameter optimization method based on Bayesian theory was studied.

ICNISC 2020, Journal of Physics: Conference Series 1646 (2020) 012122, IOP Publishing, doi:10.1088/1742-6596/1646/1/012122

The principal component analysis (PCA) steps are as follows:
1) Form a matrix $X$ composed of $n$ rows and $m$ columns;
2) Subtract the average of each row of $X$ from that row, so that every row is zero-mean;
3) Obtain the covariance matrix $C = \frac{1}{m} X X^{T}$;
4) Find the eigenvalues and corresponding eigenvectors of the covariance matrix $C$;
5) Arrange the eigenvectors as rows, from top to bottom in order of decreasing eigenvalue, and take the first $k$ rows to form a matrix $P$;
6) $Y = PX$ is the data after reduction to $k$ dimensions.

The decentralization step subtracts the average value of each column:

$$\tilde{x}_i = x_i - \mu, \qquad \mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where $x_i$ is the original data value, $\mu$ is the average value of each column, and $n$ is the number of data points.
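The six PCA steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the function name `pca_reduce` is hypothetical, and `numpy.linalg.eigh` is chosen on the assumption that a symmetric eigensolver is acceptable, since $C$ is symmetric by construction. The matrix layout follows steps 1 to 3, with one variable per row and one sample per column.

```python
import numpy as np


def pca_reduce(X, k):
    """Reduce an (n, m) matrix X (n variables, m samples) to k dimensions.

    Hypothetical helper that follows the six listed steps literally.
    Returns the reduced data Y (k, m), the projection matrix P (k, n),
    and all eigenvalues sorted in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[1]

    # Step 2: subtract the mean of each row so every row is zero-mean.
    Xc = X - X.mean(axis=1, keepdims=True)

    # Step 3: covariance matrix C = (1/m) X X^T, shape (n, n).
    C = (Xc @ Xc.T) / m

    # Step 4: eigenvalues and eigenvectors (eigh returns ascending order,
    # with eigenvectors stored as columns).
    eigvals, eigvecs = np.linalg.eigh(C)

    # Step 5: sort by decreasing eigenvalue, keep the top k as rows of P.
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order[:k]].T

    # Step 6: project the centered data onto the k principal directions.
    Y = P @ Xc
    return Y, P, eigvals[order]
```

Calling `pca_reduce(X, k)` on sensor data arranged with one sensor per row would return the $k$-dimensional representation `Y`; how many components to keep is not stated in the text and is left to the caller.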
The normalization is linear min-max scaling:

$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ is the minimum value of each column, $x_{\max}$ is the maximum value of each column, $x_i$ is the original data value, and $x_i'$ is the normalized value.

Figure 1 shows the structure of an LSTM model. The forward calculation method can be expressed as follows:
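For reference, the standard LSTM forward pass is commonly written as below. This is the textbook formulation, given under the assumption that it matches the symbols used in this paper (output gate, weights $W$ and $U$, bias $b$, sigmoid $\sigma$, Hadamard product $\odot$). The names $f(t)$, $i(t)$, $\tilde{c}(t)$, $c(t)$, $h(t)$, and $x(t)$ for the forget gate, input gate, candidate state, cell state, hidden state, and input are conventional choices rather than notation confirmed by the source.

```latex
\begin{aligned}
f(t) &= \sigma\left(W_f\, x(t) + U_f\, h(t-1) + b_f\right) \\
i(t) &= \sigma\left(W_i\, x(t) + U_i\, h(t-1) + b_i\right) \\
\tilde{c}(t) &= \tanh\left(W_c\, x(t) + U_c\, h(t-1) + b_c\right) \\
c(t) &= f(t) \odot c(t-1) + i(t) \odot \tilde{c}(t) \\
o(t) &= \sigma\left(W_o\, x(t) + U_o\, h(t-1) + b_o\right) \\
h(t) &= o(t) \odot \tanh\left(c(t)\right)
\end{aligned}
```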

Model Architecture
Here $o(t)$ is the output gate; $W$, $U$, and $b$ are the corresponding weights and biases; $\sigma$ is the sigmoid activation function; and $\odot$ is the Hadamard product.

Predicting RUL
With the help of Keras, a deep learning framework, the network was configured in terms of its activation function, loss function, regularization terms, bias terms, constraints, and dropout [4]. A 4-layer sequential model was built, consisting of three LSTM layers for training and one Dense layer for output. The three LSTM layers contain 48, 36, and 24 nodes, respectively. The network structure used to train on the data set is shown in Figure 2.

Figure 2. The structure of the network.

The batch size is set to the total number of operating records of a single engine, and the Adam optimizer is used with a learning rate of 0.001. The network is then trained for 100 epochs on the training set.
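A network of this shape could be declared in Keras roughly as follows. This is a sketch under stated assumptions, not the authors' script: the function name `build_rul_model`, the window length of 30 time steps, and the 5 input features are hypothetical, and using MSE as the loss with MAE as a tracked metric is an assumption, since the text reports both errors without saying which one is minimized.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_rul_model(n_timesteps, n_features, learning_rate=0.001):
    """Three stacked LSTM layers (48, 36, 24 units) plus one Dense output."""
    model = keras.Sequential([
        keras.Input(shape=(n_timesteps, n_features)),
        # The first two LSTM layers return full sequences so that the
        # next LSTM layer receives one vector per time step.
        layers.LSTM(48, return_sequences=True),
        layers.LSTM(36, return_sequences=True),
        layers.LSTM(24),
        # A single linear unit outputs the predicted RUL.
        layers.Dense(1),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",       # assumption: MSE is the training loss
        metrics=["mae"],  # MAE tracked alongside it
    )
    return model


# Hypothetical input shape: 30 time steps of 5 PCA-reduced features.
model = build_rul_model(n_timesteps=30, n_features=5)
```

Training would then be a call such as `model.fit(X_train, y_train, epochs=100, batch_size=...)`, where `X_train` and `y_train` are placeholder names and the batch size is the record count of one engine, as described above.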
The change in error during training is shown in Figure 3. The horizontal axis represents the epoch, and the vertical axis represents the MSE and MAE errors, respectively.

Parameter Optimization Based on Bayesian Theory
Many parameters directly affect the performance of the model, such as the hidden parameters, learning rate, batch size, number of layers, dropout, and regularization coefficients. Setting appropriate hyperparameters is critical to the model's predictive ability.
As the name suggests, the Bayesian optimization method is based on Bayesian theory. The next set of hyperparameters is selected based on the existing results. The Bayesian method tracks past evaluation results to form a probabilistic model that maps hyperparameters to the probability of a score on the objective function:

$$P(\text{score} \mid \text{hyperparameters}) \tag{8}$$
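To make the idea of "tracking past results to choose the next hyperparameter" concrete, the sketch below implements a heavily simplified, one-dimensional version of the Tree of Parzen Estimators idea in NumPy. It is an illustration under assumptions, not a full TPE implementation and not the authors' code. Past trials are split into a "good" and a "bad" group by score, each group is modelled with a Gaussian kernel density estimate, and the next value is the candidate with the highest ratio of good to bad density. All function names, the fixed bandwidth, and the split fraction `gamma` are hypothetical choices.

```python
import numpy as np


def kde(points, samples, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at `points`."""
    z = (points[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))


def suggest_next(xs, ys, low, high, rng, gamma=0.25, n_candidates=100):
    """Propose the next value from past (value, score) pairs; lower is better."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    if len(xs) < 2:
        return float(rng.uniform(low, high))  # not enough history yet
    order = np.argsort(ys)
    n_good = max(1, int(np.ceil(gamma * len(xs))))
    good, bad = xs[order[:n_good]], xs[order[n_good:]]
    bandwidth = (high - low) / 10.0  # fixed bandwidth, a simplification
    # Draw candidates from the density of the good group ...
    centers = rng.choice(good, size=n_candidates)
    candidates = np.clip(centers + rng.normal(0.0, bandwidth, n_candidates), low, high)
    # ... and keep the one that is most likely good relative to bad.
    ratio = kde(candidates, good, bandwidth) / (kde(candidates, bad, bandwidth) + 1e-12)
    return float(candidates[np.argmax(ratio)])


def minimize(objective, low, high, n_init=5, n_iter=25, seed=0):
    """Sequential model-based search: random warm-up, then guided proposals."""
    rng = np.random.default_rng(seed)
    xs = [float(x) for x in rng.uniform(low, high, n_init)]
    ys = [float(objective(x)) for x in xs]
    for _ in range(n_iter):
        x = suggest_next(xs, ys, low, high, rng)
        xs.append(x)
        ys.append(float(objective(x)))
    best = int(np.argmin(ys))
    return xs[best], ys[best], xs, ys
```

In practice `objective` would train the model with the proposed hyperparameter and return its error; any cheap function of one variable can stand in for it when experimenting with the loop.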

Algorithm Implementation of Bayesian Optimization Method
The Bayesian optimization method includes four parts: objective function, search space, optimization algorithm, and visualization.
Objective function: The objective function is the LSTM model from the previous section, and its return value is the MAE on the training set.
Search space: The search space is the learning rate, drawn from a continuous uniform distribution between 0.0005 and 0.0015.
Optimization algorithm: The optimization (search) algorithm is the core of the method. There are two options, random search and the Tree of Parzen Estimators (TPE), and the latter is selected.

Optimization Result
The Bayesian optimization method is used to automatically adjust the learning rate of the neural network. The epoch count is set to 50, and the objective is to minimize MAE. The learning rate of each epoch is shown in Figure 5. Figure 6 is a scatter plot of learning rate against MAE. The result shows that the MAE is smallest when the learning rate is 0.0012260442375009789, with a minimum value of 10.55162912133929.

Comparison
The learning rate is set to 0.0012260442375009789 and the model is retrained. The change in error during training is shown in Figure 7.

Conclusion and Future Work
In this paper, the PCA algorithm was used to extract data features, and an LSTM model was trained to predict the RUL of turbine engines. A Bayesian parameter optimization method was then studied and better hyperparameters were obtained, which means that a more accurate RUL prediction model can be obtained. In the future, adding an algorithm that updates the model automatically may be studied, that is, optimizing the hyperparameters online and adjusting the model dynamically.