Application of ensemble learning approach on anomaly detection of a dual induced draft fan system

Intelligent monitoring of thermal power plants has been receiving growing interest. As one of the promising technologies in intelligent monitoring, machine learning-based fault detection for auxiliary equipment in thermal power plants has been widely investigated. In this work, we constructed normal behavior models that predict the current of each fan in a dual induced draft fan system using gradient boosting tree regression. For anomaly detection, we evaluated the residual by calculating its mean over a moving window and added control boundaries based on statistical analysis. As a result, we successfully identified the abnormal fan in the dual induced draft fan system and assisted in scheduling maintenance work.


Introduction
Despite the rapid growth of renewable energy, thermal power plants still accounted for the majority of installed power generating capacity in China by the end of 2022, with about 1300 gigawatts, or around 52% of the total. The operation and reliability of thermal power plants depend not only on coal quality but also on the performance of each piece of auxiliary equipment [1]. The induced draft (ID) fan is one of the important auxiliary devices in a thermal power plant: it exhausts the flue gas generated in the boiler and keeps the boiler operating under negative pressure [2]. Because of harsh working conditions, ID fans are prone to faults in daily operation, which cause losses of power generation and sometimes unscheduled boiler shutdowns [3]. Early fault detection and health evaluation of ID fans are therefore desirable.
Thermal power plants are normally equipped with a distributed control system (DCS) and a supervisory information system (SIS), which record thousands of parameters during operation. Data-driven methods that analyze data from the DCS or SIS have drawn rising interest for fault detection and health evaluation of auxiliary equipment in thermal power plants [4][5][6][7][8]. Fault detection of ID fans based on vibration analysis or temperature modeling has been widely studied. Panda et al. [6] analyzed the vibration data of the ID fan of a superthermal power plant and successfully diagnosed insufficient lubrication. Hu et al. [7] combined a nonlinear autoregressive exogenous (NARX) approach with principal component analysis to model the normal behavior of the drive-end bearing temperature of an ID fan and to detect overheating of the bearing system. Lv et al. [9] constructed an informative memory matrix through discrete particle swarm optimization and then trained a multivariate state estimation technique (MSET) model; overheating of the bearing system of ID fans was identified from the deviations of the MSET model.
In addition to vibration signals and temperature, the current of an ID fan is another important parameter that is closely related to its health state. In this work, we implemented normal behavior modeling and residual analysis to detect anomalies in the current of a dual induced draft fan system installed in a 600 MW coal-fired power plant. Section 2 introduces the data set and the anomaly detection method, Section 3 reports and discusses the results, and Section 4 presents the conclusions.

Data set
In this study, two ID fans (A and B) operate in parallel on the boiler. Raw SIS data of the two ID fans, sampled at a 30 s time step, were collected between 2020-08-01 and 2021-02-28. The measured load curve is shown in Figure 1. Only the operating state after boiler start-up was considered, and data points with a load below 200 MW were removed. The data were then split into a training data set, from 2020-08-01 to 2020-12-01 (before shutdown), and a testing data set, from 2020-12-13 (after start-up) to 2021-02-28. Based on domain knowledge, three feature variables were selected to construct the current prediction models: the input vane position of ID fan A/B, the feedback vane position of ID fan A/B, and the load. Records containing NaN values were cleaned.
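The preprocessing steps above can be sketched with pandas. This is a minimal illustration, assuming the SIS export for one fan is a DataFrame with a timestamp index; the column names (`vane_input_pos`, `vane_feedback_pos`, `load_mw`, `current_a`) are hypothetical placeholders, and treating "NaN cleaning" as row removal is an assumption.

```python
from typing import Tuple

import pandas as pd

# Hypothetical column names; the SIS tag names are not given in the study.
FEATURES = ["vane_input_pos", "vane_feedback_pos", "load_mw"]
TARGET = "current_a"


def prepare_fan_data(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Filter, clean and split the 30 s SIS records of one ID fan."""
    df = df.sort_index()  # partial-string date slicing needs a sorted index
    df = df[df["load_mw"] >= 200]  # drop samples with load below 200 MW
    df = df.dropna(subset=FEATURES + [TARGET])  # assumed: NaN rows are removed
    train = df.loc["2020-08-01":"2020-12-01"]  # before the shutdown
    test = df.loc["2020-12-13":"2021-02-28"]  # after the start-up
    return train, test
```

The date strings use pandas partial-string indexing, so the end date `"2020-12-01"` includes the whole of that day; samples from the shutdown gap between the two periods are excluded automatically.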

ID fan current modeling
The current regression model of each ID fan was trained separately by gradient boosting tree regression (GBTR), a tree-based boosting ensemble method that handles large data sets well. To avoid overfitting, the hyper-parameters, namely the number of trees, the maximum depth of the individual trees, and the learning rate, were selected through cross-validation (CV); to keep tuning efficient, 5-fold CV was used. The mean squared error was calculated to evaluate the trained models. The current of each ID fan in the testing data was then predicted by the corresponding trained model.
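A hedged scikit-learn sketch of the model-tuning step follows. `GradientBoostingRegressor` stands in for the GBTR model and `GridSearchCV` with `cv=5` performs the 5-fold CV over the three hyper-parameters named above; the grid values and the helper name `fit_current_model` are illustrative assumptions, since the study does not report the library or search grid it used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the values searched in the study are not reported.
PARAM_GRID = {
    "n_estimators": [100, 300],   # number of trees
    "max_depth": [3, 5],          # maximum depth of each tree
    "learning_rate": [0.05, 0.1],
}


def fit_current_model(X_train: np.ndarray, y_train: np.ndarray, param_grid=PARAM_GRID):
    """Tune a GBTR current model with 5-fold CV; return model, params, train MSE."""
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid,
        cv=5,
        scoring="neg_mean_squared_error",
    )
    search.fit(X_train, y_train)
    model = search.best_estimator_  # refit on the full training set
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    return model, search.best_params_, train_mse
```

With an integer `cv` on a regressor, scikit-learn uses unshuffled K-fold splits, so each fold is a contiguous block of time; whether the study shuffled its folds is not stated.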

Anomaly detection
In this work, anomalies in the current were detected through normal behavior prediction of the current followed by residual analysis. The residual is calculated according to Eq. 1.

$$d_i = I_{\mathrm{pred},i} - I_{\mathrm{meas},i} \qquad (1)$$

where $d_i$ is the residual of the $i$-th sample, and $I_{\mathrm{pred},i}$ and $I_{\mathrm{meas},i}$ are the predicted and measured currents. To reduce noise, the residual series of the testing data was processed as follows (Figure 2): first, the mean residual $\bar{d}_{L_1}$ within a window of length $L_1$ is calculated according to Eq. 2; the window is then moved forward by an increment $L_2$ and $\bar{d}_{L_1}$ is calculated again. The procedure is repeated over the complete testing data.

$$\bar{d}_{L_1} = \frac{1}{N_1}\sum_{i=1}^{N_1} d_i \qquad (2)$$

where $N_1$ is the number of samples in $L_1$.

Figure 2. Processing of residual.
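Eqs. 1 and 2 and the moving-window procedure can be sketched in plain Python. The sketch assumes a fixed 30 s sampling step, so a window length in minutes converts to a sample count (60 min gives 120 samples), and it assumes a trailing partial window is discarded; the function names are hypothetical.

```python
from typing import List, Sequence

SAMPLE_STEP_S = 30  # SIS sampling step used in the study


def to_samples(minutes: float, step_s: int = SAMPLE_STEP_S) -> int:
    """Convert a window or increment length in minutes to a sample count."""
    return int(minutes * 60 // step_s)


def residuals(predicted: Sequence[float], measured: Sequence[float]) -> List[float]:
    """Eq. 1: d_i = predicted current - measured current."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured series must have equal length")
    return [p - m for p, m in zip(predicted, measured)]


def window_means(d: Sequence[float], n1: int, n2: int) -> List[float]:
    """Eq. 2: mean of d over windows of n1 samples, shifted by n2 samples.

    A trailing window with fewer than n1 samples is discarded (assumption).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("window and increment lengths must be positive")
    means = []
    start = 0
    while start + n1 <= len(d):
        means.append(sum(d[start:start + n1]) / n1)
        start += n2
    return means
```

With $L_1 = L_2 = 60$ min, `window_means(d, to_samples(60), to_samples(60))` yields non-overlapping hourly means; choosing $L_2 < L_1$ gives overlapping windows and a smoother but more strongly correlated trend.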
For anomaly detection, an upper control boundary (UCB) and a lower control boundary (LCB) were added to the residual evolution diagram based on Eq. 3, where $k$ is a constant that defines the sensitivity of anomaly detection (in onsite applications, $k$ is usually set to 3), $\sigma_0$ is the standard deviation of the residual of the training data set, and $\bar{d}_{L_1}$ is calculated in real time during onsite monitoring. An anomaly in the current is detected when $\bar{d}_{L_1}$ falls outside the control boundary a certain number of times.
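The detection rule can be expressed as a small check over the windowed residual means. This is a sketch under assumptions: LCB and UCB are taken as given inputs, and the hypothetical parameter `min_count` interprets "a certain number of times" as a total count of out-of-boundary windows, which the text does not specify.

```python
from typing import List, Sequence, Tuple


def out_of_bounds(means: Sequence[float], lcb: float, ucb: float) -> List[int]:
    """Indices of windows whose residual mean lies outside [LCB, UCB]."""
    return [i for i, m in enumerate(means) if m < lcb or m > ucb]


def detect_anomaly(
    means: Sequence[float], lcb: float, ucb: float, min_count: int = 3
) -> Tuple[bool, List[int]]:
    """Flag an anomaly once at least min_count windows leave the boundary.

    min_count is a hypothetical setting, not a value from the study.
    """
    hits = out_of_bounds(means, lcb, ucb)
    return len(hits) >= min_count, hits
```

Returning the offending window indices alongside the flag makes it easy to map an alarm back to a time interval for the operating personnel.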

ID fan current prediction models
The selected hyper-parameters of the two GBTR models are listed in Table 1. Training was faster than with other machine learning algorithms such as support vector machines, and acceptable mean squared errors were reached for both models. The mean ($\mu_0$) and standard deviation ($\sigma_0$) of the residual of the training data were calculated for each model: $\mu_0$ = 0.014 A and $\sigma_0$ = 5.9 A for ID fan A, and $\mu_0$ = 0.003 A and $\sigma_0$ = 6.2 A for ID fan B. The UCB and LCB of each ID fan were then calculated according to Eq. 3.
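Eq. 3 itself is not reproduced in this text. One reading consistent with the reported numbers, offered here only as an assumption, is a Shewhart-style band, UCB = μ0 + kσ0 and LCB = μ0 - kσ0; with k = 3 it reproduces the boundaries reported in the results (17.7 A for ID fan A and ±18.6 A for ID fan B). The helper name is hypothetical.

```python
from typing import Tuple


def control_boundaries(mu0: float, sigma0: float, k: float = 3.0) -> Tuple[float, float]:
    """Assumed form of Eq. 3: LCB = mu0 - k*sigma0, UCB = mu0 + k*sigma0."""
    if sigma0 < 0:
        raise ValueError("sigma0 must be non-negative")
    return mu0 - k * sigma0, mu0 + k * sigma0


# Training-residual statistics reported for the two fans.
for fan, mu0, sigma0 in [("A", 0.014, 5.9), ("B", 0.003, 6.2)]:
    lcb, ucb = control_boundaries(mu0, sigma0)
    print(f"ID fan {fan}: LCB = {lcb:.1f} A, UCB = {ucb:.1f} A")
# prints:
# ID fan A: LCB = -17.7 A, UCB = 17.7 A
# ID fan B: LCB = -18.6 A, UCB = 18.6 A
```

Because both μ0 values are close to zero, the band is effectively ±kσ0, which is why fan B's boundary can be quoted as a symmetric ±18.6 A.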

Anomaly detection
According to domain knowledge, the two ID fans are expected to draw comparable currents. In the onsite application, the measured currents of the two ID fans in the testing data were compared. The difference increased continuously from about 10 A at 2021-02-23 01:07:30 to a peak of around 89 A at 2021-02-23 14:03:30 (Figure 3). Such a significant difference between the currents of the two ID fans indicated abnormal behavior of the dual ID fan system. However, the existing monitoring regime cannot locate the abnormal device, so further investigation was needed to diagnose each ID fan individually. The current of each ID fan in the testing data was predicted by the corresponding trained model, and the predicted and measured values are plotted in Figure 4. For ID fan A, a significant difference between the predicted and measured values was observed on 2021-02-23 (red dashed circle). To reduce noise and smooth the trend, the residual series of the two current models were each processed according to Eq. 2, with $L_1$ and $L_2$ both set to 60 minutes. The processed residual evolution of each ID fan is also plotted in Figure 4, with the UCB and LCB (red dashed lines) added according to Eq. 3. An anomaly in the ID fan current is identified when the calculated residual mean lies outside the UCB/LCB. As shown in Figure 4, for ID fan A the residual mean deviated markedly above the UCB (17.7 A) between 2021-02-23 06:00:00 and 2021-02-23 19:00:00, whereas for ID fan B it remained within the control boundary (±18.6 A). Therefore, an anomaly had likely occurred in ID fan A.
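The localization logic described above (a current gap between the two fans that reveals an imbalance, followed by a per-fan residual check against each fan's own boundary) can be sketched as follows. The numbers in the usage lines are synthetic, chosen only to exercise the code; they are not data from the study, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def current_gap(i_a: Sequence[float], i_b: Sequence[float]) -> List[float]:
    """Absolute difference between the measured currents of fans A and B.

    A large gap shows that the fan pair is unbalanced, not which fan is at fault.
    """
    return [abs(a - b) for a, b in zip(i_a, i_b)]


def locate_abnormal_fans(
    window_means: Dict[str, Sequence[float]],
    bounds: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Return the fans whose residual mean leaves their own [LCB, UCB] band."""
    abnormal = []
    for fan, means in window_means.items():
        lcb, ucb = bounds[fan]
        if any(m < lcb or m > ucb for m in means):
            abnormal.append(fan)
    return abnormal


# Synthetic example values, for illustration only.
gap = current_gap([250.0, 310.0], [240.0, 221.0])
fans = locate_abnormal_fans(
    {"A": [1.2, 5.0, 25.3, 30.1], "B": [-0.4, 2.1, -3.0, 1.5]},
    {"A": (-17.7, 17.7), "B": (-18.6, 18.6)},
)
print(gap, fans)  # [10.0, 89.0] ['A']
```

The point of the second step is that each fan is judged against its own normal behavior model, so the abnormal device can be named even though the gap alone is symmetric in A and B.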
According to the operation log, operating personnel confirmed abnormal control of ID fan A. The proposed approach thus detected the anomaly in the specific ID fan and helped to guide the maintenance work. In practice, warning criteria could additionally be defined, for example, issuing a warning when the calculated $\bar{d}_{L_1}$ exceeds the UCB for a certain period of time. However, the sensitivity of such warning criteria depends on the chosen window lengths. The proposed approach contributes to the intelligent monitoring of thermal power plants.
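A duration-based warning criterion of the kind suggested above could look like the following sketch. It assumes the window advances by $L_2$, so a run of r consecutive windows above the UCB is treated as lasting about r × $L_2$ minutes (exact when $L_1 = L_2$); the function names and the minimum duration are hypothetical choices, not settings from the study.

```python
import math
from typing import Sequence


def longest_run_above(means: Sequence[float], ucb: float) -> int:
    """Length of the longest run of consecutive windows with mean above the UCB."""
    run = best = 0
    for m in means:
        run = run + 1 if m > ucb else 0
        best = max(best, run)
    return best


def duration_warning(
    means: Sequence[float], ucb: float, l2_minutes: float, min_duration_minutes: float
) -> bool:
    """Warn when the residual mean stays above the UCB for min_duration_minutes.

    Assumes each window advances by l2_minutes (r windows ~ r * l2_minutes).
    """
    needed = max(1, math.ceil(min_duration_minutes / l2_minutes))
    return longest_run_above(means, ucb) >= needed
```

With $L_2$ = 60 min, a 3 h criterion requires three consecutive hourly means above the UCB. Shorter windows typically react faster but average fewer samples, so the residual mean is noisier and false warnings become more likely, which is the window-length trade-off noted above.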

Conclusions
In this study, we implemented normal behavior modeling and residual analysis to detect anomalies in a dual induced draft fan system. We trained gradient boosting tree models to predict the normal behavior current of the two ID fans. The current of each ID fan from 2020-12-13 (after start-up) to 2021-02-28 was then predicted by the corresponding model. The residual was evaluated by calculating its mean within a moving time window. The calculated residual mean of the current of ID fan A deviated significantly above the UCB between 2021-02-23 06:00:00 and 2021-02-23 19:00:00, indicating abnormal behavior of ID fan A. The anomaly was confirmed by operating personnel, and maintenance was carried out.

Figure 1. The load curve of the thermal power plant.

Figure 4. The measured and predicted current from 2020-12-13 (after start-up) to 2021-02-28 and the corresponding evolution of the residual mean: (a) induced draft fan A and (b) induced draft fan B.

Table 1. Two current regression models.