Chemical Process Fault Diagnosis Method Based on Deep Learning

To address the problems that traditional chemical process fault diagnosis methods under big data rely too heavily on expert experience and that fault features are difficult to distinguish, a deep learning-based fault diagnosis method is proposed that combines a convolutional neural network (CNN), long short-term memory (LSTM) and an attention mechanism (AM). In this method, the CNN adaptively extracts the spatial features of the input signal, while the LSTM extracts its time-series features. The attention mechanism is then introduced to enhance model performance, allowing the model to focus on the important fault features in the presence of noise, and a SoftMax layer serves as the classifier for fault diagnosis. The method is validated by simulation on the Tennessee Eastman (TE) chemical process dataset, demonstrating that it is suitable for chemical process fault diagnosis. Compared with other fault diagnosis methods, the proposed method achieves higher accuracy and shows a degree of superiority.


Introduction
In the era of big data, artificial intelligence algorithms have become increasingly popular in research, and for many scholars deep learning algorithms have become the key to solving problems in fault diagnosis. Deep learning is an important direction of machine learning research. Machine learning relies on algorithms to train on data and make decisions on tasks [1]; deep learning uses deep networks to represent the features of data. However, traditional machine learning methods for chemical process fault diagnosis rely on expert knowledge and experience. In contrast, deep learning excels at feature extraction, classification and feature-function mapping: it automatically learns and extracts features layer by layer, which makes it better suited to situations where the structural properties of the equipment are unclear, reduces the demand on human resources, and allows equipment data to be used directly without an empirical background. Furthermore, deep learning can uncover the equipment state changes embedded in data changes; such complex correlations are difficult for conventional algorithms to learn, especially for massive data with high dimensionality and many types. Deep learning therefore has broad application prospects in fault diagnosis: it not only provides a variety of more effective diagnostic methods but can also greatly improve diagnostic accuracy.
At present, there are many advanced research results on deep learning methods in the field of fault diagnosis. Drawing on the existing literature and comparing multiple deep learning models, the research object of this paper is best served by using a CNN as the basic fault diagnosis model and introducing a long short-term memory network and an attention module on this basis, so that the model can complete the fault diagnosis task more accurately [6].

Convolutional Neural Network (CNN)
A CNN is not fundamentally different from a traditional neural network, since both are layered networks. The difference is that the function and form of the layers have changed, so a CNN can be regarded as an improvement on the traditional neural network. By sharing weights, establishing local receptive fields and adding pooling layers, a CNN reduces the number of parameters of a deep learning model, thereby reducing the memory the model occupies and alleviating overfitting. Equation (1) gives the computation of the convolutional layer:

$$x_j^l = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big) \tag{1}$$

where $x_j^l$ denotes the $j$-th channel in layer $l$, $f$ is the activation function, $M_j$ is the set of input channels used to compute the $j$-th channel in layer $l$, $x_i^{l-1}$ is the $i$-th channel in layer $l-1$, $*$ is the convolution operator, $k_{ij}^l$ is the convolution kernel, and $b_j^l$ is the bias of the $j$-th channel in layer $l$ [10].
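As a concrete illustration of Equation (1), the following minimal sketch sums the convolutions of all input channels for one output channel, adds the bias, and applies the activation. All names (`conv1d`, `conv_layer`, `relu`) and the tiny example values are illustrative, not from the paper.

```python
# Sketch of Eq. (1): output channel j = f( sum_i (x_i * k_ij) + b_j ).

def relu(v):
    # Rectified linear activation applied element-wise; stands in for f.
    return [max(0.0, x) for x in v]

def conv1d(signal, kernel):
    """Valid 1-D convolution (implemented as correlation, as in most DL libraries)."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[t + u] * kernel[u] for u in range(len(kernel)))
            for t in range(n)]

def conv_layer(inputs, kernels, biases, f=relu):
    """inputs: list of input channels; kernels[j][i]: kernel linking input
    channel i to output channel j; biases[j]: bias of output channel j."""
    outputs = []
    for j, b in enumerate(biases):
        acc = None
        for i, x in enumerate(inputs):
            y = conv1d(x, kernels[j][i])
            acc = y if acc is None else [a + c for a, c in zip(acc, y)]
        outputs.append(f([a + b for a in acc]))
    return outputs

# One input channel, one output channel, kernel of size 2:
out = conv_layer([[1.0, 2.0, 3.0, 4.0]], [[[1.0, 1.0]]], [0.0])
print(out)  # [[3.0, 5.0, 7.0]]
```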
The pooling layer, also called the subsampling layer by some scholars, is essentially a downsampling process. Pooling layers simplify deep learning models, speed up computation and enhance the robustness of the target features. The main role of the pooling layer is therefore to extract target features while avoiding overfitting by reducing the number of parameters and the amount of computation. Equation (2) gives max pooling and Equation (3) gives average pooling:

$$p_i^l(j) = \max_{(j-1)S < t \le jS} a_i^l(t) \tag{2}$$

$$p_i^l(j) = \frac{1}{S} \sum_{t=(j-1)S+1}^{jS} a_i^l(t) \tag{3}$$

where $a_i^l(t)$ denotes the value of the $t$-th neuron in the $i$-th channel of layer $l$, $S$ is the size of the pooling kernel, and $p_i^l(j)$ denotes the value of the $j$-th neuron in the $i$-th channel after pooling.
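The two pooling operations can be sketched as follows for a single channel, using non-overlapping windows of size S. The function names are illustrative; the paper does not specify an implementation.

```python
# Sketch of Eqs. (2)-(3): non-overlapping max and average pooling, window size S.

def max_pool(channel, S):
    # Keep the largest value in each window of S consecutive neurons.
    return [max(channel[t:t + S]) for t in range(0, len(channel), S)]

def avg_pool(channel, S):
    # Keep the mean value of each window of S consecutive neurons.
    return [sum(channel[t:t + S]) / S for t in range(0, len(channel), S)]

x = [1.0, 3.0, 2.0, 8.0, 5.0, 5.0]
print(max_pool(x, 2))  # [3.0, 8.0, 5.0]
print(avg_pool(x, 2))  # [2.0, 5.0, 5.0]
```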
The fully connected layer follows the convolutional and pooling layers; its neurons are typically connected to all neurons in the pooling layer, flattening all feature maps from the pooling layer into a single one-dimensional feature vector.

Long short-term memory(LSTM)
LSTM can be regarded as an improved RNN because it overcomes the inability of RNNs to cope with long-range dependencies. LSTM controls the transmission of state through special gating units, which decide which information in the input signal is retained and which is forgotten. LSTM is therefore a good choice for tasks that require "long-term memory", such as fault diagnosis [8].
There are three main gates within the LSTM. The forget gate selectively discards information passed in from the previous node; the input gate controls the input at the current step and selectively "remembers" it; the output gate determines the value of the current state output. The forget gate is computed by Equation (4), the input gate by Equation (5), and the output gate by Equation (6) [7]:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{4}$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{5}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{6}$$

where $\sigma$ is the sigmoid activation function, $h_{t-1}$ is the hidden-layer state vector at time $t-1$, $x_t$ is the input vector at time $t$, $b$ is the bias of the corresponding gate, and $W$ is the weight of the corresponding gate [5].
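One time step of the gate computations in Equations (4)–(6), together with the standard cell-state and hidden-state updates that follow them, can be sketched for a single scalar-state LSTM unit. The weight and bias values below are illustrative, not trained parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """W and b hold one (w_h, w_x) weight pair / one bias per gate:
    'f' (forget), 'i' (input), 'o' (output), plus the candidate state 'g'."""
    f_t = sigmoid(W["f"][0] * h_prev + W["f"][1] * x_t + b["f"])    # forget gate, Eq. (4)
    i_t = sigmoid(W["i"][0] * h_prev + W["i"][1] * x_t + b["i"])    # input gate, Eq. (5)
    o_t = sigmoid(W["o"][0] * h_prev + W["o"][1] * x_t + b["o"])    # output gate, Eq. (6)
    g_t = math.tanh(W["g"][0] * h_prev + W["g"][1] * x_t + b["g"])  # candidate cell state
    c_t = f_t * c_prev + i_t * g_t   # forget part of the old state, admit new input
    h_t = o_t * math.tanh(c_t)       # gated hidden-state output
    return h_t, c_t

W = {k: (0.5, 0.5) for k in "fiog"}
b = {k: 0.0 for k in "fiog"}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, W=W, b=b)
print(round(h, 4), round(c, 4))
```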

Attention Mechanism
The purpose of the attention mechanism is to highlight the features to be analyzed by examining the raw data and discovering the correlations within it. When used in fault diagnosis tasks, the attention mechanism allows the model to focus on specific important information and ignore irrelevant noise, improving the model's performance so that it can better accomplish the task [2].
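The core of an attention mechanism is to score each time step of a sequence of hidden states, normalize the scores with softmax, and form a weighted context vector that emphasizes the most relevant steps. Scoring by a dot product with a query vector is an assumption made here for illustration; the paper does not specify the exact scoring function.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the maximum before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(hidden_states, query):
    # Score each hidden state against the query (dot product, assumed form).
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(len(hidden_states[0]))]
    return weights, context

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three time steps, 2-dim states
weights, context = attention(H, query=[1.0, 1.0])
print(weights)  # the third step scores highest, so it receives the largest weight
```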

Fault diagnosis model
Traditional fault diagnosis methods have certain limitations, and fault diagnosis using a CNN alone still leaves a certain error. Considering the superior performance of LSTM in handling fault diagnosis tasks and the good effect of the attention mechanism in emphasizing data features [3], the fault diagnosis model adopted in this paper is shown in Figure 1. The signal is first fed into the convolutional neural network, which mines the spatial features of the signal through max pooling and its own adaptive feature extraction; these features are then fed into the LSTM network, which extracts the time-series features of the input data [4]. The attention mechanism then captures the degree of correlation between different variables and is used to distinguish noise from the features to be retained. Finally, fault diagnosis is performed by the softmax function. The fault diagnosis flow chart is shown in Figure 2.
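The final classification step maps the model's output to class probabilities with softmax, and the predicted fault is the class with the highest probability. The logit values below are illustrative; the seven-class layout (normal operation plus six faults) follows the experimental setup described later in the paper.

```python
import math

def softmax(logits):
    # Convert raw scores into a probability distribution over fault classes.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Seven classes: normal operation plus the six selected faults (illustrative logits).
logits = [0.1, 2.3, 0.4, 0.2, 0.0, 0.5, 0.3]
probs = softmax(logits)
predicted = probs.index(max(probs))
print(predicted)  # class 1 has the largest logit, hence the prediction
```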

Experimental verification
Deep learning methods can be divided according to whether labels are required. When sufficient data and labels are available, supervised learning-based methods are usually used; a sufficiently capable model can be trained to separate normal data from each class of fault data, achieving fault detection and diagnosis. When data labels are insufficient or difficult to obtain, unsupervised learning-based methods are usually used, which can build high-dimensional feature representations of data with unknown labels. For fault detection tasks, data collected under normal operating conditions are usually modeled so that deviations from the normal state can be isolated. For fault diagnosis tasks, unsupervised learning is used to represent the data information, and the classification of faults is then completed by supervised training. In this paper, the TE process dataset is used for fault diagnosis with a supervised learning method.

Data set
Data is the foundation of deep learning methods, and a realistic fault diagnosis model first needs representative chemical process data. In practice, real chemical process data are difficult to obtain: collecting enough industrial data for fault diagnosis often takes a long time, and the problem of data sharing has not yet been solved. Scholars have therefore used simulation to obtain chemical process data, adding different disturbance factors to simulate the types of faults that may occur in real industry. Considering the various industrial process datasets used in existing studies, the Tennessee Eastman (TE) chemical process dataset is chosen in this paper according to the needs of the fault diagnosis task [9].

Simulation Verification
In this paper, the TE process simulation model is run under different fault states, and the data required for the experiments are collected. Without loss of generality, six fault types are selected: Faults 1, 2, 5, 9, 11 and 18, as given in Table 1. Faults 1, 2 and 5 are step faults, Faults 9 and 11 are random faults, and Fault 18 is of unknown type. Together with the normal operating condition, seven classes are used for the TE process fault diagnosis study and verification.
Table 1. Fault types.

During the TE process simulation runs, each category of the training set was run for 45 hours and 800 samples were collected per category; each category of the test set was run for 24 hours and 480 samples were collected per category. The time step was set to 20, the 3 variables with constant values were removed, and the remaining 50 variables were selected to build the dataset. The training set samples thus take the form 280×20×50 and the test set samples the form 168×20×50. The recognition accuracy of the CNN-LSTM-AM neural network on the TE dataset is shown in Figure 3, and the specific parameters of the network used for this deep learning model are given in Table 2. According to the experimental analysis, the recognition accuracy on the test set stabilized at 97.67% under the optimal parameter settings in Table 2, and training took 36.3 minutes. To show that this deep learning algorithm is superior to other fault diagnosis methods, its diagnosis results are compared with those of other methods in Figure 4, from which it can be seen that the fault diagnosis method used in this paper has the highest accuracy.

Fault number | Fault description | Type
1 | The content of component B remains constant while the feed flow ratio of component A to component C changes (