Data restoration of dissolved gas content in transformer oil based on the CS-SVR model

Accurate monitoring of the dissolved gas content in transformer oil is crucial for the safe and stable operation of transformers. Early identification of potential power transformer failures is necessary for the stability of the electrical grid. Dissolved gas analysis is an essential technique for diagnosing insulation faults in transformers. Missing dissolved gas data can directly impair the reliability of transformer monitoring results. This study presents a data plug-in model based on support vector regression (SVR) to restore missing dissolved gas data. To further improve the accuracy of data restoration, the cuckoo search (CS) algorithm is used to optimize the SVR parameters. Verification on H2 and C2H4 data shows that the CS-SVR model outperforms other plug-in procedures in repairing dissolved gas data.


Introduction
The oil-immersed transformer is a critical element of power equipment. The stable operation of the transformer is essential to the reliable provision of electricity [1]. Factors such as extreme weather, operating errors, and cascading outages may lead to transformer failure. During regular operation, a small amount of gas is dissolved in transformer oil [2]. In the event of a fault, the gas composition in the oil changes more significantly. The concentrations and relative proportions of by-product gases strongly correlate with the insulation condition of the transformer [3]. An online monitoring system continuously collects data on dissolved gases in oil, monitors the characteristic gas content, and builds a time series of that content. When dissolved gas data is missing, potential faults and abnormal states of the transformer are difficult to identify. Missing data is caused by conditions such as sensor breakage, communication transmission issues, and storage system failure. Therefore, filling in and restoring the lost data is necessary to ensure the integrity of dissolved gas data [4].
The removal and plug-in methods are commonly employed for handling data loss. The removal method discards the missing records, which can distort part of the information and cause deviations in assessment outcomes. The plug-in method, on the other hand, includes several approaches, such as linear interpolation, the average value method, and neural networks. However, these data restoration methods have shortcomings. For example, the neural network method has a slow learning rate and is susceptible to local extremes [5][6][7]. Therefore, a suitable solution for restoring dissolved gas content data is needed.
SVR is based on statistical learning theory and can accurately restore the time series of dissolved gas data while improving generalization ability through structural risk minimization. However, the data restoration accuracy relies on the SVR parameters [8]. Test-based methods and cross-validation methods are frequently applied to optimize these parameters. Both approaches have drawbacks, such as unknown test set labels and a lengthy optimization process. To address these issues, a CS-SVR model is proposed for data restoration of dissolved gas content. The CS algorithm optimizes the parameters of SVR to recover complete data. The efficacy of the CS-SVR model is validated through simulated assessments.

2.1. Support vector regression
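SVR has high generalization ability and predicts accurately, which makes it particularly useful for small sample sizes and nonlinear data. The fundamental principle of SVR is to find a nonlinear regression function that fits the training sample points:

$$f(x_i) = w^{T}\varphi(x_i) + b$$

where $x_i$ is the input variable, $\varphi(x_i)$ is a nonlinear transformation that maps the input to a higher-dimensional space, and $f(x_i)$ is the corresponding output. The equation also includes the weight vector $w$ and the deviation $b$, which are calculated by solving an optimization problem. To balance model complexity against fitting error, the solution procedure involves the relaxation variables $\delta_i \ge 0$ and $\delta_i^{*} \ge 0$, as well as the error penalty coefficient $C$. In the standard $\varepsilon$-insensitive formulation, with $n$ training samples, target values $y_i$, and tolerance $\varepsilon$, the optimal solution is obtained from:

$$\min_{w,\,b,\,\delta,\,\delta^{*}} \; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\delta_i + \delta_i^{*}\right)$$

$$\text{s.t.}\quad y_i - f(x_i) \le \varepsilon + \delta_i,\qquad f(x_i) - y_i \le \varepsilon + \delta_i^{*},\qquad \delta_i \ge 0,\ \delta_i^{*} \ge 0$$

The Lagrange multipliers $\lambda$, $\lambda^{*}$, $\eta$, and $\eta^{*}$ are used to get the performance function:

$$L = \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\delta_i + \delta_i^{*}\right) - \sum_{i=1}^{n}\lambda_i\left(\varepsilon + \delta_i - y_i + f(x_i)\right) - \sum_{i=1}^{n}\lambda_i^{*}\left(\varepsilon + \delta_i^{*} + y_i - f(x_i)\right) - \sum_{i=1}^{n}\left(\eta_i\delta_i + \eta_i^{*}\delta_i^{*}\right)$$

The nonlinear regression function can be calculated as follows:

$$f(x) = \sum_{i=1}^{n}\left(\lambda_i - \lambda_i^{*}\right)K(x_i, x) + b,\qquad K(x_i, x) = \exp\!\left(-\frac{\|x_i - x\|^{2}}{2\gamma^{2}}\right)$$

where $K$ represents the radial basis function (RBF), and $\gamma$ denotes the kernel radius of the RBF.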
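The regression function described above can be illustrated with a minimal Python sketch. It assumes the RBF kernel takes the form exp(-||x_i - x||^2 / (2γ^2)) and that the multiplier differences (λ_i - λ_i*) and the deviation b have already been obtained by some solver; the function names `rbf_kernel` and `svr_predict` are hypothetical and are not taken from the paper.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel with kernel radius gamma.

    Assumed form: exp(-||x - z||^2 / (2 * gamma^2)).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * gamma ** 2))


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate f(x) = sum_i (lambda_i - lambda_i*) * K(x_i, x) + b.

    dual_coefs[i] holds the difference (lambda_i - lambda_i*), assumed to
    come from an already-solved SVR dual problem.
    """
    kernel_sum = sum(coef * rbf_kernel(sv, x, gamma)
                     for sv, coef in zip(support_vectors, dual_coefs))
    return kernel_sum + b
```

A smaller γ makes the kernel decay faster with distance, so each training point influences only nearby inputs; this is one reason the restoration accuracy is sensitive to the choice of γ.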
The accuracy of SVR largely depends on its model parameters, namely γ and C. These parameters are typically determined by random selection or by manual trial, both of which can result in suboptimal model accuracy. Therefore, the CS algorithm is used to improve the accuracy of SVR by optimizing γ and C.

2.2. Cuckoo search algorithm
The CS algorithm, inspired by the brood parasitism of cuckoos, was first proposed by Yang. It combines global and local random walks to find the best solution to the current optimization problem efficiently. The procedure involves three key elements: a local random movement, the determination of the best solution, and a global Lévy flight-based random selection [9]. The CS algorithm uses a global random walk based on three idealized rules to update the population and determine nest locations:
• Cuckoos lay one egg at a time and choose a random location to build their parasitic nests.
• A high-quality nest can survive until the next generation.
• The number of nests remains constant in each generation. The probability of discovering the eggs is denoted by Pa, which ranges from 0 to 1.
Under these three idealized rules, the position update is expressed as follows:

$$x_k^{t+1} = x_k^{t} + \alpha \oplus \text{Lévy}(\lambda)$$

where $t$ denotes the current iteration count, $x_k^{t}$ represents the position of the k-th cuckoo in generation $t$, and $x_k^{t+1}$ represents the updated solution. The Lévy flight step is represented by Lévy(λ), and α represents the step factor.
Lévy(λ) is given as follows: ( ) where μ and ν are both uniformly distributed random numbers.
The steps above are repeated iteratively, and the algorithm terminates once the maximum number of iterations is reached.
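The search procedure described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Lévy step is drawn with Mantegna's algorithm, which samples μ and ν from normal distributions (a departure from the description in the text); the step is scaled by the width of each search interval; and abandoned nests are rebuilt at uniformly random positions. The function names and the default values of `n_nests`, `pa`, `alpha`, `lam`, and `max_iter` are hypothetical.

```python
import math
import random


def levy_step(lam, rng):
    """Draw one Levy-flight step length with Mantegna's algorithm.

    Assumption: mu and nu are normally distributed, as in Mantegna's
    scheme; lam is the Levy exponent (0 < lam <= 2).
    """
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    mu = rng.gauss(0.0, sigma)
    nu = rng.gauss(0.0, 1.0)
    return mu / abs(nu) ** (1 / lam)


def cuckoo_search(objective, bounds, n_nests=15, pa=0.25, alpha=0.01,
                  lam=1.5, max_iter=200, seed=0):
    """Minimise `objective` over the box `bounds` with a basic cuckoo search."""
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # The number of nests stays constant over all generations.
    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(nest) for nest in nests]

    for _ in range(max_iter):
        # Global walk: x_k^{t+1} = x_k^t + alpha * Levy(lam), clipped to bounds.
        for k in range(n_nests):
            cand = [min(max(x + alpha * (hi - lo) * levy_step(lam, rng), lo), hi)
                    for x, (lo, hi) in zip(nests[k], bounds)]
            cand_fit = objective(cand)
            if cand_fit < fitness[k]:  # keep the better egg
                nests[k], fitness[k] = cand, cand_fit

        # Each nest except the current best is discovered with probability pa
        # and rebuilt at a random location, so the best nest always survives.
        best_k = fitness.index(min(fitness))
        for k in range(n_nests):
            if k != best_k and rng.random() < pa:
                nests[k] = random_nest()
                fitness[k] = objective(nests[k])

    best_k = fitness.index(min(fitness))
    return nests[best_k], fitness[best_k]
```

For example, `cuckoo_search(lambda p: sum(v * v for v in p), [(-5.0, 5.0)] * 2)` searches for the minimum of a two-dimensional sphere function. Because the best nest is never abandoned, the best fitness cannot get worse from one generation to the next.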

2.3. CS-SVR model structure
To evaluate the adequacy of the SVR model, the root mean square error yRMSE is applied in this paper. yRMSE is calculated as follows:

$$y_{\mathrm{RMSE}} = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(x_{\mathrm{act},i} - x_{\mathrm{pred},i}\right)^{2}}$$

where $x_{\mathrm{act}}$ and $x_{\mathrm{pred}}$ represent the actual value and reconstructed value of dissolved gas content in oil, and $k$ is the number of gas concentration samples in the reconstruction set. The closer yRMSE is to zero, the smaller the difference between $x_{\mathrm{act}}$ and $x_{\mathrm{pred}}$ and the more effective the model.
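The error measure described above can be computed directly. The following minimal sketch assumes the actual and reconstructed values are given as two equally long sequences; the function name `rmse` is hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and reconstructed values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    k = len(actual)  # number of samples in the reconstruction set
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / k)
```

A perfect reconstruction gives a value of zero, and larger residuals are penalized quadratically before the square root is taken.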
The CS-SVR algorithm flow chart is shown in Figure 1, and its steps can be summarized as follows:
• SVR is initialized, and the model is fitted on the training set to obtain the corresponding fitness yRMSE;
• The CS algorithm updates the search path to complete the population iteration and retains the dominant individuals;
• After iterating for a set number of generations, the optimal values of γ and C are used as the parameters of the SVR model to obtain the final model [10].
The termination condition in this paper is that either yRMSE < 0.1 or the iteration count exceeds its limit value.
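The stopping rule of this flow can be sketched as a small driver loop. In this hedged illustration, `fitness` is a hypothetical stand-in for "train SVR with the candidate (C, γ) and return yRMSE", and `propose` is a hypothetical stand-in for the CS update step; neither is taken from the paper, and the "limit value" is read here as a maximum iteration count, which is an assumption.

```python
def tune_until_converged(fitness, propose, start, tol=0.1, max_iter=100):
    """Search for parameters until fitness < tol or max_iter is reached.

    fitness(params) -> float : assumed to return yRMSE for a candidate (C, gamma)
    propose(params) -> params: assumed to generate a new candidate, e.g. a CS update
    """
    best, best_fit = start, fitness(start)
    n_iter = 0
    while best_fit >= tol and n_iter < max_iter:
        cand = propose(best)
        cand_fit = fitness(cand)
        if cand_fit < best_fit:  # retain the dominant individual
            best, best_fit = cand, cand_fit
        n_iter += 1
    return best, best_fit, n_iter
```

The loop returns as soon as the error threshold is met, so the iteration limit only matters when the search fails to reach the target accuracy.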

Figure 1. The CS-SVR algorithm flow chart.

Analysis of experimental results
In this study, a dataset comprising 300 sets of dissolved gas data is used. The data, evenly sampled once per day, was collected from a 500-kV transformer in operation. The first 240 records were used as training data to construct the input samples, and the last 60 records were used for comparison with the predicted values of the missing data. Taking H2 and C2H4 as examples, the CS-SVR model was applied to predict the lost data as output samples, and the findings are presented in Figures 2 to 5.
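The 240/60 split described above can be illustrated as follows. The paper does not state how the input samples are constructed, so this minimal sketch assumes lag-window inputs, where each input is a run of consecutive readings and the target is the next reading; the function names and the `window` parameter are hypothetical.

```python
def split_series(series, n_train):
    """Split a time series into a leading training part and a trailing test part."""
    return series[:n_train], series[n_train:]


def make_lagged_samples(series, window):
    """Build (inputs, targets) pairs from a 1-D series.

    Each input holds `window` consecutive values and the target is the value
    that follows them (an assumed construction, not taken from the paper).
    """
    n_samples = len(series) - window
    inputs = [series[i:i + window] for i in range(n_samples)]
    targets = [series[i + window] for i in range(n_samples)]
    return inputs, targets
```

With 300 daily readings, `split_series(readings, 240)` yields the 240 training records and the 60 held-out records, and `make_lagged_samples` can then be applied to the training part.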
The results presented in Figures 2 to 5 demonstrate that the missing data curve reconstructed by the CS-SVR method is highly consistent with the actual data curve. The residuals, which represent the difference between the true and reconstructed values, mostly fall within the range of -0.5 to 0.5 μL/L. The calculated yRMSE values are all less than 0.1, indicating that the CS-SVR model is capable of effectively reconstructing missing data. Therefore, the CS-SVR model is a reliable data repair tool with an excellent interpolation effect.
The precision of the reconstructed data is affected by both the CS-SVR model's performance and the quantity of training samples. To investigate how the quantity of training samples affects the interpolation accuracy of the CS-SVR model, experiments were conducted on C2H4 content data with missing data ratios ranging from 10% to 40%, as shown in Figure 6. The interpolation accuracy of the CS-SVR model decreases as the proportion of missing data increases. The model exhibits a good interpolation effect when the proportion of missing data is less than 30%. When the proportion is less than 20%, the interpolation effect is ideal, and the reconstructed concentration curve fits well with the measured C2H4 content. Therefore, the CS-SVR model has high accuracy in repairing small data samples with a missing proportion of less than 30%.
To validate the accuracy of the data repair outcomes of the CS-SVR model, comparative data restoration experiments are conducted using various methods, such as linear interpolation, neural networks, Bayesian methods, and SVR. The comparison results for 30% missing data are presented in Table 1.
Table 1. Error results for different methods.
Upon comparing the results of data interpolation using different methods, it is evident that the yRMSE of SVR is smaller than that of the other methods. Furthermore, the interpolation outcomes of CS-SVR are superior to those of SVR, suggesting that using CS to optimize SVR enhances the precision of data reconstruction. Therefore, CS-SVR is an excellent data reconstruction model, effectively overcoming the limitations of other models.
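To illustrate how such a comparison can be set up, the following minimal sketch removes a given proportion of points at random and restores them with linear interpolation, the simplest of the baseline methods mentioned above. The masking scheme (random interior points, endpoints kept) and the function names are assumptions for illustration and do not reproduce the experimental procedure of this paper.

```python
import random


def mask_random(series, ratio, seed=0):
    """Return a copy of `series` with a fraction `ratio` of interior points set to None."""
    rng = random.Random(seed)
    n_missing = int(round(ratio * len(series)))
    # Keep both endpoints so that every gap has a known neighbour on each side.
    missing_idx = rng.sample(range(1, len(series) - 1), n_missing)
    masked = list(series)
    for i in missing_idx:
        masked[i] = None
    return masked


def linear_interpolate(series):
    """Fill None gaps by linear interpolation between the nearest known values."""
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled
```

The restored series can then be scored against the original values with the same error measure used for the other methods, which keeps the comparison across missing ratios consistent.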

Conclusions
In this paper, a data reconstruction model based on CS-SVR is introduced to restore dissolved gas data in transformer oil, achieving precise and effective interpolation of missing data. The reconstructed data is highly consistent with the original monitoring data, and the yRMSE is less than 0.1. Furthermore, compared with other methods, CS-SVR has higher accuracy and better reliability in data restoration.

Figure 6. The reconstruction results under different missing ratios.