Research on big data risk assessment of major transformer defects and faults fusing power grid, equipment and environment based on SVM

With the development of power big data, a wider range of power system data can be considered, and suitable big data analysis methods can be used to mine its underlying patterns and value. Drawing on the monitoring data and the defect and fault records of main transformers, this paper fuses power grid, equipment, and environment data and uses a support vector machine (SVM) as the main algorithm to evaluate main transformer risk. The evaluation results obtained under different modes are compared, and they indicate that the risk assessment algorithm and scheme are effective to a certain degree. The paper offers a new approach to data fusion in the smart grid and a reference for further big data evaluation of power grid equipment.


Introduction
Equipment safety warning plays a very important role in the development of urban power grids. Assessing equipment defects and risk requires many factors to be integrated, and comprehensive assessment models remain in short supply. With the development of the smart grid, large and complex equipment monitoring systems have gone online. At the same time, the variety and volume of data produced by the growing number of equipment test and inspection plans are increasing rapidly. As equipment-related data grow, the disadvantages of traditional data analysis methods become more apparent: data utilization is low, and power system staff are increasingly skeptical of existing equipment analysis and evaluation models. Against the background of big data development, it is very important to integrate data from different dimensions in order to mine latent information. Determining equipment failure risk scientifically requires resolving the data problems that arise when different departments process information, such as missing, overlapping, and heterogeneous data. Traditional data modeling and analysis struggle to keep pace with changing data types and with the unreliability of part of the data. A better solution is to use machine learning algorithms to uncover latent patterns in the data quickly and to form a flexible evaluation scheme. In this paper, the monitoring, defect, test, and other data of the main transformer equipment across the whole network in Guangxi are first preprocessed and fused. A defect and fault risk evaluation scheme for main transformers is then designed based on SVM. Finally, the algorithm is implemented and the experimental results are analyzed.

Algorithm principle and data fusion
With the rapid development of power big data, the data types associated with power transmission equipment have become very diverse [1]. In a typical data center, even without real-time integration of grid, equipment, and environment data, there are more than 1,000 database tables with more than 8,000 columns in total and more than 1 billion records. These cover acquisition planning and installation data such as construction planning, ordering information [2], technical requirements, accounting supervision, information infrastructure, information engineering, and factory test information for all kinds of equipment, as well as a large amount of unstructured data such as contract information, blueprints, and structural designs. The total data volume is close to the 100 TB level. The log data generated during the operation and maintenance of power transmission and transformation equipment include test, repair, defect, overhaul, technical transformation, and movement and return records, and amount to more than 100,000 rows per year [3]. Online monitoring data, such as the running state of primary and secondary equipment, are larger still. The province's quasi-real-time monitoring data for transmission and transformation equipment (refreshed every 15 minutes on average) cover more than 20 million measuring points. More than 2 billion records are generated daily, which amounts to more than 10 GB per day after compressed storage. High-frequency real-time monitoring data can reach hundreds of TB per year [4]-[15]. Some unstructured monitoring data, such as data from smart substations and online video surveillance, can approach a petabyte (PB) per year. The operating environment of power transmission and transformation equipment also affects its performance and life curve.
The environment data include meteorological data, regional weather forecasts, the six meteorological elements, weather warnings, typhoon and lightning conditions, ice cover, GIS terrain and topographic data, pollution monitoring, remote sensing data, and so on [16]-[20]. More than 50 TB of such data have accumulated, with increments on the order of 100 GB. SVM is one of the classifiers used in big data machine learning and is well suited to nonlinear classification problems. This paper therefore uses SVM as the machine learning algorithm to evaluate and analyze equipment risk status on the basis of the fused data.

Proposal and analysis of the evaluation scheme (data cleaning, SVM)
The information on power transmission and transformation equipment is huge in volume and complex in type. In the data preprocessing and fusion stage, the paper first builds the primary data set from the equipment data and the account (ledger) information, with the unique device ID as the primary key, so that it can be linked to the monitoring, inspection, and defect data. On this basis, the power grid data and environment data are extended in an equipment-oriented way. In each class of grid data, the unique device ID is inserted as an association column, which extends the grid data to the device level before preprocessing. The environment data are then joined to the corresponding device data sets by region and time, forming an associated extension between device and environment. The preprocessed data are sliced by day, and each fused record, "time + unique device ID + inherent attributes + equipment monitoring data + environment data", is denoted X. The corresponding label, "whether there is a defect or fault", is denoted y, and together they form the associated data set. The data are then divided into a training set and a test set made up of the last 100 records. High-frequency monitoring data are reduced to one value per day: some as daily counts, such as the number of out-of-limit records and the number of low-voltage alarms associated with the main transformer, and others, such as the dissolved gases in oil (H2, C2H2, CO, CO2), as the latest reading. Finally, the inherent data, such as device manufacturer, voltage level, and running time, are combined in, which completes the data fusion for SVM. At this point, the problem reduces to the standard SVM setting: a separating straight line is defined, the distance from any point to that line is computed, and each training sample, that is, the main transformer state together with its defect or fault state, is denoted (x, y).
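For reference, the textbook SVM formulation that matches the description above is written out below. It is the standard statement, not a transcription of the equations in the original paper; the symbols w, b, ξ_i, C, φ, and the label convention y_i ∈ {−1, +1} are assumptions introduced here.

```latex
\begin{aligned}
&\text{Separating line (hyperplane):} && w^{\top}x + b = 0 \\
&\text{Distance from a point } x \text{ to it:} && d(x) = \frac{\lvert w^{\top}x + b \rvert}{\lVert w \rVert} \\
&\text{Training set:} && \{(x_i, y_i)\}_{i=1}^{n}, \quad y_i \in \{-1, +1\} \\
&\text{C-SVC primal problem:} && \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
& && \text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0
\end{aligned}
```

Here φ is the feature map induced by the kernel (the identity in the linear case), and C trades margin width against training errors. Maximizing the margin 2/‖w‖ is equivalent to minimizing ‖w‖²/2, which is why the objective takes this form.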
The solution process is, in brief, as follows. First, a big data computing and analysis platform is built on Spark. Next, the machine learning is carried out on the fused data. Finally, the trained model is tested and analyzed.
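The fusion steps described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Spark implementation used in the paper: every record, field name (`device_id`, `h2_ppm`, `out_of_limit`, `region`, and so on), and value below is hypothetical.

```python
from datetime import date, datetime

# All records, field names, and values below are hypothetical.
# Ledger: inherent attributes keyed by the unique device ID.
ledger = {
    "T001": {"manufacturer": "A", "voltage_kv": 220, "years_in_service": 12, "region": "north"},
    "T002": {"manufacturer": "B", "voltage_kv": 110, "years_in_service": 3, "region": "south"},
}
# High-frequency monitoring readings, several per device per day.
monitoring = [
    {"device_id": "T001", "time": datetime(2017, 5, 1, 8, 0), "h2_ppm": 30.0, "out_of_limit": False},
    {"device_id": "T001", "time": datetime(2017, 5, 1, 20, 0), "h2_ppm": 42.0, "out_of_limit": True},
    {"device_id": "T002", "time": datetime(2017, 5, 1, 9, 0), "h2_ppm": 11.0, "out_of_limit": False},
]
# Environment data keyed by (region, day).
environment = {
    ("north", date(2017, 5, 1)): {"max_temp_c": 31.0, "lightning_strikes": 4},
    ("south", date(2017, 5, 1)): {"max_temp_c": 33.5, "lightning_strikes": 0},
}
# Defect or fault records as (device ID, day) pairs.
defects = {("T001", date(2017, 5, 1))}


def fuse(ledger, monitoring, environment, defects):
    """Build one fused row per device per day (X) and its defect label (y)."""
    daily = {}
    # Process readings in time order so the last write is the latest reading.
    for rec in sorted(monitoring, key=lambda r: r["time"]):
        key = (rec["device_id"], rec["time"].date())
        row = daily.setdefault(key, {"out_of_limit_count": 0})
        row["h2_ppm"] = rec["h2_ppm"]                            # latest value wins
        row["out_of_limit_count"] += int(rec["out_of_limit"])   # daily count

    X, y = [], []
    # Order rows by day, then device ID, so the tail of X is the most recent data.
    for (device_id, day), mon in sorted(daily.items(), key=lambda kv: (kv[0][1], kv[0][0])):
        attrs = ledger[device_id]                                # join on device ID
        env = environment.get((attrs["region"], day), {})       # join on region and day
        X.append({"day": day, "device_id": device_id, **attrs, **mon, **env})
        y.append(1 if (device_id, day) in defects else 0)
    return X, y


X, y = fuse(ledger, monitoring, environment, defects)
```

Because the rows come out in time order, holding out the last 100 of them as a test set, as the text describes, is a plain slice such as `X[:-100]` and `X[-100:]`. Categorical fields such as `manufacturer` would still need numeric encoding before being passed to an SVM.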

Analysis of machine learning results
As described above, the SVM is trained on the training set and then evaluated on the test set, giving the following experimental results. Based on the actual data, C-SVC is selected as the SVM type. R denotes the radial basis kernel and T the hyperbolic tangent kernel; O denotes time-ordered data and S randomly ordered data. For example, SVM R-O denotes C-SVC with the radial basis kernel on the time-ordered data set. In the test results, iter is the number of iterations, which indicates how much the model was corrected during training, and nu is the parameter of the nu-SVC, one-class SVM, and nu-SVR formulations rather than a setting of the C-SVC used here. obj is the optimal objective value of the quadratic programming problem into which the SVM is converted, and rho is the constant term of the decision function. nSV is the number of support vectors, nBSV is the number of support vectors on the boundary, and Total nSV is the total number of support vectors. Finally, Accuracy is the proportion of test-set samples that the trained model classifies correctly. The higher the accuracy, the closer the model is to the actual situation and the more useful it is as a reference for assessing equipment state.
On the sorted data sets, the radial basis kernel and the hyperbolic tangent kernel reach accuracies of 72% and 71%, respectively. The fused grid, equipment, and environment data therefore reflect the health status of the equipment to a certain extent and are strongly correlated with it.
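The naming scheme used above (kernel R or T, ordering O or S, last 100 rows held out for testing) can be reproduced in a small experiment. The sketch below uses scikit-learn's `SVC`, which is a C-SVC implementation built on LIBSVM, with the `rbf` and `sigmoid` (hyperbolic tangent) kernels. The data are synthetic stand-ins, so the accuracies it produces say nothing about the 72% and 71% figures reported here; the sample count, feature count, `C`, and `gamma` settings are all assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the fused data set (assumption: 600 rows, 6 numeric
# features, rows already in time order). Labels follow a nonlinear rule.
rng = np.random.default_rng(0)
n_samples, n_features, n_test = 600, 6, 100
X = rng.normal(size=(n_samples, n_features))
noise = 0.3 * rng.normal(size=n_samples)
y = (X[:, 0] ** 2 + X[:, 1] + noise > 1.0).astype(int)


def evaluate(kernel, order):
    """Train C-SVC and score it on the last n_test rows of the chosen ordering."""
    idx = np.arange(n_samples)                       # "O": keep the time order
    if order == "S":                                 # "S": random order
        idx = np.random.default_rng(1).permutation(n_samples)
    train, test = idx[:-n_test], idx[-n_test:]
    # Scaling matters for both kernels; C and gamma are untuned defaults.
    model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel=kernel, gamma="scale"))
    model.fit(X[train], y[train])
    accuracy = model.score(X[test], y[test])
    total_sv = int(model[-1].n_support_.sum())       # counterpart of "Total nSV"
    return accuracy, total_sv


results = {}
for label, kernel in (("R", "rbf"), ("T", "sigmoid")):
    for order in ("O", "S"):
        results[f"SVM {label}-{order}"] = evaluate(kernel, order)

for name, (accuracy, total_sv) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, total support vectors={total_sv}")
```

Holding out the tail of the time-ordered data tests the model on the most recent period, which is closer to how a risk model would be used in operation, whereas the random ordering mixes periods and usually gives a more optimistic estimate.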

Summary
This research on SVM-based big data risk assessment of power transmission equipment defects and faults shows that equipment attributes such as transformer manufacturer, running time, and oil immersion data are strongly correlated with failure rate. Moreover, power grid operation data and environment data can directly serve as a reference for predicting the running state of equipment. The research offers a new approach to fusing grid, equipment, and environment data in the smart grid and a new exploration of equipment status assessment. It provides a reference for more efficient models and more accurate assessment of device status in the future.