Monitoring and Early Warning for Hydroelectric Generating Sets based on Hotelling’s T2 and LSTM Model

A method of monitoring and early warning for hydroelectric generating sets based on Hotelling's T-squared statistic ($T^2$) and a Long Short-Term Memory (LSTM) network is proposed. With this model, multi-channel vibration and swing signals can be fused and predicted, and monitoring and alerting are implemented against a threshold value of $T^2$. First, the multi-channel vibration and swing signals of a hydroelectric generating set are acquired and fused by Principal Component Analysis (PCA) to reduce the amount of data. Second, Hotelling's $T^2$ statistic under the normal running state is calculated and taken as the warning threshold. Third, an LSTM model is established to predict future values of $T^2$, and early warning for the set is realized by comparing the predictions with the threshold. Vibration and swing signals from 16 channels of one set are used to validate the method. The results show that the amount of data is reduced by more than 90%, which significantly improves efficiency, and that the LSTM predicts $T^2$ with high accuracy and realizes early warning of abnormal states.


Introduction
After years of technical accumulation and information construction, most domestic hydropower stations already have advanced monitoring systems. In the daily operation and maintenance of hydroelectric generating sets, vibration and swing signals are the most commonly used, since they are obtained directly from the mechanical equipment, are not disturbed by the environment and are highly reliable [1]. To extract the state information hidden in the original signals, vibration and swing signals are analyzed in the time or frequency domain with methods such as the Wavelet Transform (WT) [2,3], Variational Mode Decomposition and Empirical Mode Decomposition (EMD) [4], among others. A recognition model can then be established to realize fault diagnosis or abnormal state monitoring. These methods have been developed over a long period and are widely used in practice. However, they still belong to post-maintenance: they can diagnose certain faults that have already happened, but they cannot prevent failures from occurring. To upgrade current fault prediction technology and improve pre-maintenance for hydroelectric generating sets, it is important to introduce data fusion and intelligent prediction into the monitoring and fault diagnosis of sets.
Recently, researchers have paid more attention to the health status of sets in pre-maintenance [5]. With the rapid development of deep learning, various deep-learning prediction models for hydropower generating sets have been established, and Long Short-Term Memory (LSTM) is one of the most widely used. For instance, Zolfaghari [6] proposed a hybrid model combining LSTM with random forest and predicted the power generation of hydropower sets with high accuracy. Fu [7] also predicted the running state of a water turbine based on LSTM. Luo [8] proposed an LSTM-DBN model to predict vibration signals from generating sets, using time-frequency features and wavelet packets. Because of the strong regression ability of LSTM, research on LSTM-based prediction is significant for the proactive maintenance of hydroelectric generating sets. In practice, however, the monitoring signals of hydroelectric generating sets usually come from multiple channels, so the amount of data is huge, which makes it hard to train an LSTM model efficiently and accurately. In addition, extracting time-frequency features with WT or EMD consumes considerably more computing resources. LSTM models based on raw-signal prediction or time-frequency features are therefore difficult to use in practice. To solve this problem, various data fusion methods have received increasing attention. Wang [9] introduced Principal Component Analysis (PCA) to reduce the amount of data and divided the original mechanical signals into a principal component subspace (PCS) and a residual subspace (RS). With Hotelling's $T^2$ statistic, monitoring of diesel engine faults was realized. Li [10] then improved the warning threshold by combining the $T^2$ and SPE statistics in different proportions.
To improve fault prediction for hydroelectric generating sets and to enhance the efficiency and accuracy of predicting their vibration and swing signals, a data-driven early warning model based on PCA and LSTM is proposed. The procedure is shown in Figure 1. First, multi-channel vibration and swing signals of the hydroelectric generating set are acquired and fused based on PCA to reduce the amount of data. Second, Hotelling's $T^2$ statistic under the normal running state is calculated and taken as the warning threshold. Third, an LSTM model is established to predict the value of $T^2$. Early warning for the hydroelectric generating set is then realized by comparing the predicted value with the threshold.

Hotelling's $T^2$ statistic
In this paper, vibration and swing signals are used as the information sources. By calculating their Hotelling's $T^2$ statistic, the multi-channel signals are converted into a single sequence that replaces them in the subsequent analysis. $T^2$ represents the distance between an input sample and the mean of all samples, so it describes the degree of deviation from the normal running state of a hydroelectric generating set. The $T^2$ statistic is calculated on the basis of PCA as in Eqs. (1) and (2), where $S$ denotes the covariance matrix, calculated with Eq. (3).
Here $n$ represents the number of samples and $m$ the length of each sample; $\bar{X}$ is the sample mean vector; and $X_{ijk}$ represents the $i$th observed value of the $j$th feature in the $k$th sample, from which the sample variance of the $j$th feature in the $k$th sample is computed.
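As a rough illustration of how multi-channel signals can be fused into a single $T^2$ sequence, the NumPy sketch below standardizes the channels, keeps the leading principal components, and sums the squared scores scaled by their eigenvalues, $T^2 = \sum_j t_j^2 / \lambda_j$. This is the common PCA form of the statistic; the function names, the 90% retained-variance rule and the dictionary layout are assumptions for illustration, not details taken from the study.

```python
import numpy as np

def fit_pca(X_normal, var_ratio=0.9):
    """Fit PCA on normal-state data of shape (samples, channels).

    var_ratio (assumed rule): keep the fewest components whose
    cumulative explained variance reaches this fraction.
    """
    X_normal = np.asarray(X_normal, dtype=float)
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std                # standardize each channel
    cov = np.cov(Z, rowvar=False)              # channel covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, var_ratio)) + 1, len(eigvals))
    return {"mean": mean, "std": std, "P": eigvecs[:, :k], "lam": eigvals[:k]}

def t2_statistic(X, model):
    """Return one T^2 value per row of X (samples, channels)."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    scores = Z @ model["P"]                    # project onto retained components
    return np.sum(scores ** 2 / model["lam"], axis=1)
```

With 16 channels, `t2_statistic` turns each 16-value observation into one scalar, which is the sense in which the fused sequence is much smaller than the original signals.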

Long short-term memory network
The LSTM network is an improved model based on the Recurrent Neural Network (RNN). An LSTM introduces a gate mechanism, so long sequences can be processed much better, and the gates also mitigate gradient problems. LSTM is therefore well suited to long sequences such as those arising in the monitoring of hydroelectric generating sets. Its structure is more complex than that of a typical RNN, which has only one internal state $h_t$ and is sensitive mainly to short-term sequences. A basic LSTM unit is shown in Figure 2. A new state unit $C_t$ is added to each recurrent unit to retain information from inputs received long before, which is why LSTM is better suited to time-series prediction. At time $t$, the input of the unit consists of the input value $X_t$ together with the output value $h_{t-1}$ and the state value $C_{t-1}$ from time $t-1$, while the output consists of the state value $C_t$ and the output value $h_t$ at time $t$. The key function of LSTM is realized by controlling the state value $C_t$ through structures called gates. An LSTM uses the following three gates.
Forget gate: The forget gate reads the values $X_t$ and $h_{t-1}$ and assigns a value to the cell state $C_{t-1}$, where the magnitude of the output denotes the degree of forgetting. The range of $f_t$ is [0, 1], and the smaller the value, the more is forgotten. The forget gate is calculated with Eq. (4).

$$f_t = \sigma(W_f \cdot [h_{t-1}, X_t] + b_f) \tag{4}$$

where $f_t$ is the output of the forget gate, $\sigma$ is the sigmoid function, $W_f$ is the weight matrix of the forget gate, $[h_{t-1}, X_t]$ represents the concatenation of the two vectors into a longer vector, and $b_f$ is the offset term.

Input gate: The input gate value $i_t$ is obtained from $X_t$ and $h_{t-1}$ and controls how the neuron state information $C_t$ is updated. The candidate state information $\tilde{C}_t$ is also calculated from the inputs $X_t$ and $h_{t-1}$. The input gate is calculated as follows.

$$i_t = \sigma(W_i \cdot [h_{t-1}, X_t] + b_i) \tag{5}$$

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, X_t] + b_c) \tag{6}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{7}$$

where $i_t$ denotes the output of the input gate, $W_i$ is the weight matrix of the input gate, $b_i$ is the offset term, $W_c$ and $b_c$ are the weight matrix and bias term of the unit state $C_t$, $C_t$ is the unit state at the current moment, $C_{t-1}$ is the unit state at the previous moment, and $\odot$ denotes element-wise multiplication.

Output gate: The output gate $O_t$ determines which information in $C_t$ is used to calculate $h_t$ at the current time. A tanh layer processes $C_t$ to give a value in (-1, 1). $O_t$ and $h_t$ are calculated as follows.

$$O_t = \sigma(W_o \cdot [h_{t-1}, X_t] + b_o) \tag{8}$$

$$h_t = O_t \odot \tanh(C_t) \tag{9}$$

where $O_t$ is the output of the output gate, $W_o$ is the weight matrix of the output gate, $b_o$ is the offset term, and $h_t$ is the output value at the current time.
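The forget, input and output gates described above can be sketched as a single time step of a standard LSTM unit in NumPy. The dictionary layout of the weights (`W["f"]`, `W["i"]`, `W["c"]`, `W["o"]`) and the function names are assumptions made for this illustration; a practical model would normally rely on a deep-learning framework rather than hand-written gates.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    x_t:    input vector at time t
    h_prev: output value h_{t-1}
    c_prev: unit state C_{t-1}
    W, b:   dicts of weight matrices and bias vectors keyed by
            "f", "i", "c", "o" (hypothetical layout)
    """
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ concat + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ concat + b["i"])       # input gate
    c_hat = np.tanh(W["c"] @ concat + b["c"])     # candidate state
    c_t = f_t * c_prev + i_t * c_hat              # updated unit state C_t
    o_t = sigmoid(W["o"] @ concat + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                      # output value h_t
    return h_t, c_t
```

Because $f_t$ multiplies $C_{t-1}$ directly, a value near 1 carries the old state forward almost unchanged, which is how information from far earlier inputs is retained.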
In this study, multi-step prediction is used. Compared with single-step prediction, multi-step prediction forecasts over a longer horizon. It handles multiple related output variables at once, accounts for their correlation during modeling, captures the interaction and dependence between data points, and keeps the generated output variables consistent with one another.

Early warning based on the $T^2$-LSTM model
After the $T^2$-LSTM model is established, the future $T^2$ sequence can be predicted by the LSTM and compared with the warning threshold. Since the threshold is derived from the normal operating state of the hydroelectric generating set, a predicted sequence that exceeds this limit indicates that the set is in a fault or failure state, which will cause abnormal vibration of the set. In that case it is strongly suggested to stop the set and investigate the cause.
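The comparison between a predicted $T^2$ sequence and the threshold can be sketched as below. The function name and the choice to report the first exceeding step are assumptions for illustration; the rule actually used to raise an alarm (for example, requiring several consecutive exceedances) is not specified in the text.

```python
import numpy as np

def early_warning(t2_pred, limit):
    """Check a predicted T^2 sequence against the warning threshold.

    Returns (warn, first_step): warn is True if any predicted value is
    above the limit, and first_step is the index of the first such value
    (None when no value exceeds the limit).
    """
    t2_pred = np.asarray(t2_pred, dtype=float)
    exceed = t2_pred > limit
    if not exceed.any():
        return False, None
    return True, int(np.argmax(exceed))   # argmax returns the first True

# Example: a 4-step forecast against a limit of 4.0
warn, step = early_warning([1.0, 2.0, 5.0, 3.0], limit=4.0)
```

Reporting the first exceeding step indicates how far ahead of the predicted exceedance the warning is issued.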

Data description
To demonstrate the effectiveness and correctness of the proposed $T^2$-LSTM model and to realize the early warning function, vibration and swing signals collected from a hydroelectric power station in southwest China are analyzed in this work. The equipment is a Kaplan turbine with a rated power of 175 MW. The signals come from 16 channels, comprising 6 groups of swing signals and 10 groups of vibration signals, with a sampling frequency of 1024 Hz.
Data from both the normal running state and the abnormal operating state are included, and the samples are divided into two groups. One group is used as the training set to obtain the $T^2$ threshold and build the LSTM model; the other is used as the test set to evaluate the effect of the proposed method. In addition, part of the training set is used as a validation set to evaluate the model during training.

Monitoring and warning
To determine the warning threshold, running data under the normal state are used. Signals from all 16 channels are fused based on PCA, and the threshold value is then obtained from Hotelling's $T^2$ statistic. In statistics this threshold is also called the control limit. The standard for monitoring and warning is the threshold line shown in Figure 3.
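The text does not state how the control limit is derived from the normal-state $T^2$ values. One simple option, assumed here purely for illustration, is to take a high empirical quantile of those values; limits based on the F-distribution are another common choice. The function name and the 0.99 default are hypothetical.

```python
import numpy as np

def empirical_control_limit(t2_normal, quantile=0.99):
    """Control limit as an empirical quantile of normal-state T^2 values.

    Assumption: the 99th percentile is used only as an example; the
    study may derive its limit differently.
    """
    t2_normal = np.asarray(t2_normal, dtype=float)
    if t2_normal.ndim != 1 or t2_normal.size == 0:
        raise ValueError("t2_normal must be a non-empty 1-D sequence")
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    return float(np.quantile(t2_normal, quantile))
```

A higher quantile gives fewer false alarms on normal data but makes the limit less sensitive to small deviations.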

Figure 3. $T^2$ control limit based on normal running data.

A group of abnormal running data was collected when the turbine showed obvious vibration caused by an inlet-blade vortex. This state should be avoided immediately, since it physically damages the turbine. With the limit obtained above, this fault state has been successfully distinguished and warned, as shown in Figure 4.

Figure 4. Abnormal data identified by the control limit.

Multi-step-output prediction model based on LSTM
To evaluate the performance of the established LSTM model, the mean square error (MSE), root mean square error (RMSE) and coefficient of determination ($R^2$) are chosen as the evaluation statistics. They are calculated as in Eqs. (10), (11) and (12).

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{10}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{11}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{12}$$

where $n$ is the number of samples in the test set, $y_i$ is the actual value, $\hat{y}_i$ is the predicted value acquired with the LSTM, and $\bar{y}$ is the mean of the actual values. MSE is a performance measure often used in regression problems: it is the average of the squared differences between the predicted and true values, and a lower value means better model performance. RMSE, the square root of the MSE, measures the difference between the true and predicted values in the units of the data. The coefficient of determination $R^2$ measures how well the regression model fits the data. The resulting values, listed in Table 3, reflect the degree of similarity between the actual observations and the model predictions.

For the application, historical running data of the set specified in Section 3.1 are chosen. The $T^2$-LSTM model developed in Section 3.3 is used with the $T^2$ early warning control limit determined in Section 3.2. To compare the prediction results with the control limit and determine whether a warning is needed for the future condition, the vibration and swing signals are fused into the corresponding $T^2$ statistic, which is fed into the LSTM model. The output is a sequence of $T^2$ values that can be compared with the control limit line.
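The three evaluation statistics named above (MSE, RMSE and $R^2$) follow directly from their usual definitions and can be computed as in this short sketch. The function name and the example numbers are illustrative and are not results from the study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) for actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))                 # mean squared error
    rmse = float(np.sqrt(mse))                     # square root of the MSE
    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2) # total sum of squares
    r2 = float(1.0 - ss_res / ss_tot)
    return mse, rmse, r2

# Toy example: one of four predictions is off by 1
mse, rmse, r2 = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
```

In this toy example the squared error totals 1 over four points, giving MSE 0.25, RMSE 0.5 and $R^2$ of 0.8.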

Example
A data set from July 2021 was identified as abnormal and triggered a warning from the trained model. The predicted output of the $T^2$-LSTM model is shown in Figure 5, and the real data are shown in Figure 6. The predicted data are very similar to the actual data. The red line represents the warning threshold. All of the data lie above the limit line, which means that the state of the hydroelectric generating set corresponding to this sequence is abnormal and needs to be checked. Subsequent maintenance work found that the set had been running at an obvious deviation from rated power; the low load caused heavy vibration of the overflow components, and this running status should be avoided in daily operation.

Conclusion
A $T^2$-LSTM monitoring and early warning model for hydroelectric generating sets is proposed in this paper. Through information fusion of multi-channel vibration and swing signals, dimension-reduced data and a warning threshold are obtained. LSTM-based prediction is then used, and monitoring and warning are realized from the predicted $T^2$ values. The main conclusions are as follows.
(1) The multi-channel signals are fused into a sequence of $T^2$ values by multivariate statistical processing based on PCA, which also yields the threshold value for early warning. The amount of data to be processed is reduced by more than 90% compared with the original signals.
(2) A multi-step regression of the $T^2$ sequence is realized with the LSTM model, so the fused $T^2$ sequence is predicted instead of the multi-channel signals. This greatly reduces the difficulty of predicting the original data: an LSTM regression on a one-dimensional sequence is much easier and achieves very high accuracy.
(3) From the LSTM results, the running state of hydroelectric generating sets can be predicted to reflect future state changes. Combined with the threshold value, early warning can be realized. In an example, a sequence from an abnormal running state was successfully predicted and identified by the proposed model.

Figure 1. The procedure flowchart of monitoring and early warning for hydroelectric sets.

Figure 2. The structure of a basic unit in LSTM.

$W_c$ is the weight matrix of the unit state $C_t$.

A multi-step-output model is established to predict the sequence of Hotelling's $T^2$ statistic fused from the vibration and swing signals; the statistic reflects the state of the hydroelectric generating set. All data are divided into training, validation and test sets in a 6:2:2 ratio, giving 12888 groups for training, 4096 groups for validation and 4096 groups for testing. Each group contains 10 inputs and 3 outputs. The sizes of the model parameters are shown in Table 2.
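A sliding-window construction is one common way to obtain groups with 10 inputs and 3 outputs from a one-dimensional $T^2$ sequence. The sketch below assumes a stride of one step and a chronological 6:2:2 split; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def make_windows(series, n_in=10, n_out=3):
    """Cut a 1-D sequence into (inputs, outputs) groups.

    Each group uses n_in consecutive values as inputs and the next
    n_out values as multi-step targets (stride of 1 is an assumption).
    """
    series = np.asarray(series, dtype=float)
    n_groups = len(series) - n_in - n_out + 1
    if n_groups <= 0:
        raise ValueError("series is too short for the requested window")
    X = np.stack([series[i:i + n_in] for i in range(n_groups)])
    Y = np.stack([series[i + n_in:i + n_in + n_out] for i in range(n_groups)])
    return X, Y

def split_622(X, Y):
    """Chronological 6:2:2 split into training, validation and test sets."""
    n = len(X)
    a, b = n * 6 // 10, n * 8 // 10
    return (X[:a], Y[:a]), (X[a:b], Y[a:b]), (X[b:], Y[b:])
```

Keeping the split chronological avoids evaluating the model on windows that precede its training data in time.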

Figure 5. Predicted results of abnormal signals.

Figure 6. Actual results of abnormal signals.

Table 1. Vibration and swing signals selected.

Table 2. Size of parameters in LSTM model.

Table 3. Evaluation results of model.