Research on an online diagnosis method for insulator contamination discharge based on the XGBoost algorithm

Insulators are among the most commonly used components of overhead transmission lines. During long-term operation, discharge caused by surface contamination leads to problems such as end deterioration and pollution flashover, which can result in equipment failure and heavy losses to the power system. To monitor the discharge status of insulators accurately, this paper proposes an online diagnosis method for insulator contamination discharge based on the XGBoost algorithm. The discharge status is judged by analyzing the discharge acoustic signals of insulators operating under different pollution levels, so that insulators can be monitored accurately. The results show that the method can effectively judge the discharge state of the insulator, so that early warnings can be given and the safe and stable operation of the power system can be ensured.


Introduction
Overhead transmission line insulators are exposed to strong electromagnetic fields for long periods. They suffer from defects such as surface deterioration due to loss and aging, and long-term contamination can cause pollution flashover, which leads to large-area line outages and seriously threatens the reliability and safety of the power system [1][2]. Contamination accumulates on the surface of the insulator and forms a wet polluted layer in a humid environment, which reduces the insulation performance of the insulator [3]. Under these conditions, pollution flashover occurs very easily. As the voltage levels of modern power systems rise, current research results on insulators struggle to meet the operational requirements of actual lines. It is therefore necessary to further study pollution flashover monitoring methods for insulators, so that pollution flashover can be detected and warned of in time [4][5][6]. Existing studies have proposed methods based on mathematical morphology, principal component analysis and random forest that monitor the discharge sound of insulators, but these methods rely mainly on the frequency-domain characteristics of the data, which are easily affected by environmental noise [7][8][9]. This paper proposes an online diagnosis method for the insulator discharge state. By studying the acoustic information of operating insulators in different discharge states, the discharge state is judged accurately. The results show that the method can effectively monitor the discharge status of insulators under different pollution levels and find hidden equipment hazards in time.

Test system
This test collects the discharge acoustic signals of wet polluted ceramic insulators at 10 different distances under the same pollution level, and the discharge acoustic signals of dry polluted glass insulators at the same distance under 5 different pollution levels, in different discharge states (corona discharge, surface discharge and flashover). The designed artificial pollution discharge test system is shown in Fig. 1. It consists of a sound collection device, a capacitor voltage divider, a test transformer, a voltage regulator and the insulators. In this experiment, the insulator string consists of two insulators. Fig. 2(a) shows the test device at the testing site and Fig. 2(b) shows the flashover discharge of wet polluted ceramic insulators.

Production of polluted insulators
The pollution degree of a sample is expressed by the salt deposit density (SDD) and the non-soluble deposit density (NSDD). In line with actual pollution conditions, the NSDD-to-SDD ratio in this test is 5:1. The insulator strings are numbered in sequence and coated with different SDD levels. The weights of sodium chloride (chemically pure) and diatomaceous earth required for each sample are calculated from the SDD and the surface area of the insulator. After drying, the sodium chloride is placed in a small ceramic bowl, stirred evenly and applied to the surface of the sample insulators [10]. This quantitative coating method is simple to carry out and does not require random inspection. Two of the coated insulators are selected as an insulator string, to which power frequency voltage is applied. The sound is recorded under different discharge states using the sound collection device. The salt deposit and non-soluble deposit required on the upper and lower surfaces of the insulators for different SDD values are shown in Fig. 3. A typical acoustic signal of corona discharge of the polluted insulator at 3 meters is shown in Fig. 4.
Fig. 4 The corona discharge sound at 3 m
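The weight calculation described above can be sketched as follows. This is a minimal illustration, assuming that SDD and NSDD are expressed in mg/cm² and that each deposit mass is simply density times coated surface area. The function name `contaminant_masses_mg` and the numeric values are hypothetical and are not taken from the test.

```python
def contaminant_masses_mg(sdd_mg_per_cm2, surface_area_cm2, nsdd_to_sdd_ratio=5.0):
    """Return (salt_mg, diatomite_mg) for one insulator surface.

    Assumes mass = density * area, with NSDD fixed at a multiple of SDD
    (5:1 in this test).
    """
    if sdd_mg_per_cm2 < 0 or surface_area_cm2 <= 0:
        raise ValueError("SDD must be non-negative and area must be positive")
    salt_mg = sdd_mg_per_cm2 * surface_area_cm2
    diatomite_mg = nsdd_to_sdd_ratio * salt_mg
    return salt_mg, diatomite_mg


# Hypothetical example: SDD = 0.1 mg/cm^2 on a 1500 cm^2 surface.
salt_mg, diatomite_mg = contaminant_masses_mg(0.1, 1500.0)
print(f"NaCl: {salt_mg:.1f} mg, diatomaceous earth: {diatomite_mg:.1f} mg")
```

Because the NSDD-to-SDD ratio is fixed, the diatomaceous earth mass is always five times the sodium chloride mass for the same surface.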

Introduction to XGBoost algorithm
The XGBoost (Extreme Gradient Boosting) algorithm is a highly effective machine learning algorithm. Its basic idea is to perform a second-order Taylor expansion of the objective function and then use the second-order derivative information to train the tree model. During training, the complexity of the tree model is added to the optimization goal as a regularization term, which improves the generalization ability of the resulting model. When the XGBoost model has K trees, its objective function O is expressed as follows [11][12]:

$$O = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{1}$$

where $y_i$ is the true value of the i-th target; $\hat{y}_i$ is the predicted value of the i-th target; $l(y_i, \hat{y}_i)$ describes the difference between $y_i$ and $\hat{y}_i$; n is the number of samples; $\Omega(f_k)$ is the complexity of the k-th tree model $f_k$; and K is the total number of trees. By optimizing in the gradient direction, the model residual of the learner is continuously reduced and a new tree model is obtained. When the objective function is approximated by the second-order Taylor expansion, the training objective function in the t-th iteration becomes:

$$O^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t) \tag{2}$$

The first and second derivatives of the prediction error with respect to the current model are:

$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^{2} l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}} \tag{3}$$

Since the residual of the (t-1)-th model is known at the t-th iteration, the constant term is removed and, using equation (3) with $f_t(x) = w_{q(x)}$ and $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}$, the objective function is expanded into a sum over leaf nodes:

$$O^{(t)} = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} \left(H_j + \lambda\right) w_j^{2} \right] + \gamma T \tag{4}$$

where $q$ is the tree structure function; $w_j$ is the output score of leaf node j; $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ are the sums of the derivatives over the set $I_j = \{ i \mid q(x_i) = j \}$ of samples assigned to leaf j; T is the number of leaf nodes in the split tree; and λ and γ are the weighting factors that control the proportion of the corresponding parts. The error function is constructed according to the specific application scenario and used to train the tree model. It is therefore feasible to introduce the XGBoost algorithm into fault diagnosis and build an adaptive fault split tree.

Similar to the information gain and Gini index in a decision tree, the XGBoost algorithm calculates the gain Θ of the selected parameter each time a split is added to a leaf:

$$\Theta = \frac{1}{2} \left[ \frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{\left(G_L + G_R\right)^{2}}{H_L + H_R + \lambda} \right] - \gamma \tag{5}$$

where L and R represent the left subtree and the right subtree respectively; $G_L^{2} / (H_L + \lambda)$ is the information score of the left subtree; $G_R^{2} / (H_R + \lambda)$ is the information score of the right subtree; and $(G_L + G_R)^{2} / (H_L + H_R + \lambda)$ is the information score of the current undivided node. The gain Θ in essence statistically refines the important information contained in the data, reducing the uncertainty of the information across the split of a leaf. When the whole tree has been split, the amount of information in the final leaf nodes is the largest, and the importance of that information is also the highest.
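To make the split gain concrete, the following sketch evaluates Θ for a candidate split from per-sample first and second derivatives. It is an illustration only: the function names, the toy data and the choice of squared-error loss (for which g is the prediction minus the target and h is 1) are assumptions, and the optimal leaf weight -G/(H + λ) is the standard minimizer of the leaf-wise objective rather than a formula stated in the text above.

```python
def leaf_score(G, H, lam):
    """Information score G^2 / (H + lambda) of one node."""
    return G * G / (H + lam)


def optimal_leaf_weight(G, H, lam):
    """Leaf output that minimizes G*w + 0.5*(H + lambda)*w^2."""
    return -G / (H + lam)


def split_gain(g, h, left_indices, lam=1.0, gamma=0.0):
    """Gain of sending the samples in left_indices left and the rest right."""
    left = set(left_indices)
    G_L = sum(g[i] for i in left)
    H_L = sum(h[i] for i in left)
    G_R = sum(g) - G_L
    H_R = sum(h) - H_L
    parent = leaf_score(G_L + G_R, H_L + H_R, lam)
    return 0.5 * (leaf_score(G_L, H_L, lam) + leaf_score(G_R, H_R, lam) - parent) - gamma


# Hypothetical toy data with squared-error loss l = 0.5 * (y - y_hat)^2,
# so that g_i = y_hat_i - y_i and h_i = 1.
y = [1.0, 1.0, -1.0, -1.0]
y_hat = [0.0, 0.0, 0.0, 0.0]
g = [p - t for t, p in zip(y, y_hat)]
h = [1.0] * len(y)

good_gain = split_gain(g, h, left_indices=[0, 1])  # separates the two groups
poor_gain = split_gain(g, h, left_indices=[0, 2])  # mixes the two groups
left_weight = optimal_leaf_weight(sum(g[:2]), sum(h[:2]), 1.0)
print(good_gain, poor_gain, left_weight)
```

A split that separates samples with opposite gradients yields a larger gain than one that mixes them, which is why the tree prefers the first candidate.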

Test Results and Discussions
The acoustic signals of wet polluted ceramic insulators at different positions are collected. From the acoustic signal data, the minimum, maximum, standard deviation and distance are taken as the characteristic parameters of the signal, and the discharge stage (corona discharge, surface pollution discharge or flashover) is taken as the target. Some of the sample parameters used to train the model are shown in Table 2. The acoustic signals of wet polluted ceramic insulators at different distances comprise 30 sets of data, and the distance between the sound collector and the insulator ranges from 2 meters to 11 meters. The samples collected at each distance include the acoustic signals of the three discharge states. Across the three discharge states, the minimum, maximum and standard deviation of the acoustic signal show a clear gradual increase, and the maximum and standard deviation change abruptly in the flashover stage. The first 22 sets of data are used as the training set to train the model, and the last 8 sets are used as the test set.
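The feature extraction described above can be sketched as below. This is a minimal illustration using only the Python standard library. The function name `acoustic_features`, the sample values and the use of the population standard deviation are assumptions, since the text does not state which estimator was used.

```python
import statistics


def acoustic_features(samples, distance_m):
    """Return [minimum, maximum, standard deviation, distance] for one recording."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a standard deviation")
    return [min(samples), max(samples), statistics.pstdev(samples), distance_m]


# Hypothetical amplitude samples from a recording taken 3 m from the insulator.
signal = [0.02, -0.05, 0.11, -0.08, 0.04, -0.01]
features = acoustic_features(signal, distance_m=3.0)
print(features)
```

Each recording is reduced to a four-element vector, and the discharge stage (0, 1 or 2) would be attached separately as the label for training.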
The acoustic signals of dry contaminated glass insulators at different levels of contamination are also collected. From the acoustic signal data, the minimum, maximum, standard deviation and contamination density are taken as the characteristic parameters of the signal, and the discharge stage is taken as the target. Some of the sample parameters used to train the model are shown in Table 3. Sample No. 5 corresponds to the clean insulator. The acoustic signals of insulator discharge at each SDD include the three discharge states. The minimum, maximum and standard deviation of the acoustic signal in the different discharge states follow the same trend as for the wet polluted ceramic insulators, and the minimum and standard deviation of the signal change abruptly in the flashover stage. The first 9 sets of data are used as the training set to train the model, and the last 6 sets are used as the test set.
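The ordered split used for both data sets can be sketched as follows. The helper `ordered_split` and the placeholder sample identifiers are hypothetical. The total of 15 dry samples is inferred from the 9 training and 6 test sets, and the ordering of the samples within each data set is not specified in the text.

```python
def ordered_split(samples, n_train):
    """Use the first n_train samples for training and the remainder for testing."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must leave at least one sample on each side")
    return samples[:n_train], samples[n_train:]


# Placeholder sample identifiers standing in for the feature vectors.
wet_samples = list(range(1, 31))  # 30 sets: wet polluted ceramic insulators
dry_samples = list(range(1, 16))  # 15 sets: dry polluted glass insulators

wet_train, wet_test = ordered_split(wet_samples, 22)
dry_train, dry_test = ordered_split(dry_samples, 9)
print(len(wet_train), len(wet_test), len(dry_train), len(dry_test))
```

With an ordered split, the samples in the test set depend on how the data are sorted, so the sample order should be recorded alongside the results.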
The XGBoost model is built and trained using Python 3.7. To illustrate the accuracy of XGBoost in online diagnosis of the discharge stage, it is compared with Random Forest (RF) and Gradient Boosting Decision Tree (GBDT) models. Table 4 shows the diagnosis of the discharge status of wet polluted ceramic insulators using the different models, where 0 stands for corona discharge, 1 for surface discharge and 2 for flashover. The accuracy of the XGBoost-based online diagnosis method for wet polluted ceramic insulators at different positions reaches 87.5%, which is higher than that of random forest (62.5%) and gradient boosting decision tree (75.0%). The accuracy of the XGBoost-based online diagnosis method for dry polluted glass insulators at different pollution levels reaches 83.3%, while the accuracy of random forest is 83.5% and that of gradient boosting decision tree is 66.67%. In summary, XGBoost achieves high accuracy in the online diagnosis of insulator discharge, and hidden perils in insulator operation can be found efficiently using this method.
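The accuracy figures can be related to counts of correctly diagnosed test samples with a short sketch. The label vectors below are hypothetical and are not the contents of Table 4. Only the label coding (0 corona discharge, 1 surface discharge, 2 flashover) and the test set size of 8 come from the text.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted discharge state matches the true one."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for an 8-sample test set (0 = corona, 1 = surface, 2 = flashover).
y_true = [0, 1, 2, 0, 1, 2, 0, 1]
y_pred = [0, 1, 2, 0, 1, 2, 0, 2]  # one surface discharge mistaken for flashover

acc = accuracy(y_true, y_pred)
print(f"{acc:.1%}")
```

With 8 test samples, accuracies of 87.5%, 75.0% and 62.5% correspond to 7, 6 and 5 correct diagnoses. With 6 test samples, 83.3% and 66.67% correspond to 5 and 4 correct diagnoses, so a single sample changes the accuracy by more than 12 percentage points.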

Conclusion
Based on the results and discussion presented above, the following conclusions are drawn: (1) An online diagnosis method for the insulator discharge state based on the XGBoost algorithm is proposed in this paper. The minimum, maximum and standard deviation of the discharge acoustic signal of the insulator in operation, together with the distance and contamination density, are taken as the characteristic parameters and used as the input of the XGBoost algorithm to predict the discharge state of the insulator.
(2) The accuracy of the online diagnosis method reaches 87.5% for wet polluted ceramic insulator discharge, which is higher than that of RF and GBDT, and 83.3% for dry polluted glass insulator discharge, which is higher than that of GBDT.
(3) The contamination status of insulators can be effectively monitored by the method, so that hidden dangers in the operation of insulators can be discovered in time and the safe, reliable and stable operation of the power system can be ensured.