Error Evaluation Method of Capacitive Voltage Transformer Based on Improved Principal Component Analysis

The capacitive voltage transformer is an important data source for electricity trading, but it is easily influenced by environmental factors during operation, which degrades its metering accuracy. Because the voltage data do not follow a Gaussian distribution, the traditional principal component analysis method is inaccurate when evaluating the error of capacitive voltage transformers. This paper proposes an improved principal component analysis method based on the local outlier factor. First, principal component analysis is used to separate the primary voltage fluctuation from the error change. Then, a new statistic based on the local outlier factor replaces the traditional statistic as the evaluation standard, which reduces the missed diagnoses and misdiagnoses caused by the data distribution characteristics. The experimental results show that the improved method achieves a higher anomaly detection rate than the traditional method, can effectively detect abnormal changes of transformer error, and is more suitable for capacitive voltage transformers in operation.


Introduction
The capacitive voltage transformer (CVT) is an important data source in electricity trade settlement, and its metering accuracy is crucial to the fairness of large-volume electric energy trading [1,2]. However, the stability of the CVT is poor, and its metering performance degrades easily [3]. To ensure the fairness of the electric energy trade, it is very important to evaluate the operation status of the CVT accurately.
At present, methods for evaluating the CVT error status fall into two categories: periodic detection and online real-time monitoring. Periodic detection checks the error of the transformer against a standard transformer [4]. However, this method requires periodic power outages, and the standard transformer is inconvenient to transport. In [5], the author proposed a calibration method that needs no power outage, but the safety of the operation is difficult to guarantee and calibration against a standard transformer is still required, so the method is difficult to apply widely.
The most widely recognized online monitoring method is measurement error evaluation based on principal component analysis (PCA). In [6], the author exploited the stability of the three-phase voltage unbalance (VUF) among CVTs, separated the primary-side grid fluctuation from the measurement error signal by PCA, and evaluated the operation status of the transformer by establishing the Q statistic for the error signal. However, because the process data are non-Gaussian, the evaluation effect is not ideal [7,8]. In [9], the effect of VUF was reduced by performing wavelet packet decomposition (WPD) on the Q statistic. In [10], a more accurate Q statistic threshold was established by kernel density estimation (KDE). However, these methods are still based on the Q statistic, whose calculation is not suited to non-Gaussian data, so the improvement in error evaluation is not significant.
This paper proposes a CVT error evaluation method based on principal component analysis and the local outlier factor (PCA-LOF). Borrowing the idea of LOF, an LOF statistic is established to replace the traditional Q statistic in PCA methods, which improves the accuracy of error evaluation and reduces maintenance costs.

Basic principle of error evaluation
The basic principle of error evaluation is to use PCA to separate the primary-side grid fluctuation from the measurement error signal: the primary-side fluctuation signal is projected onto the principal component space, and the error of the transformer is mapped onto the residual space. The modeling process of PCA is described below.
Suppose the sample $X$ is an $n \times m$ data matrix, where $n$ is the number of samples used for training and $m$ is the number of CVTs; in this paper, $m = 3$. The sample is first standardized, and the standardized matrix $X$ can be represented in the following form:

$$X = \hat{X} + E = T P^{T} + \tilde{T} \tilde{P}^{T}$$

where $T P^{T}$ is the principal component space and $\tilde{T} \tilde{P}^{T}$ is the residual space, denoted by $\hat{X}$ and $E$ respectively. $T$ is the principal score matrix, $\tilde{T}$ is the residual score matrix, $P$ is the principal loading matrix, and $\tilde{P}$ is the residual loading matrix. The loading matrices are obtained by decomposing the covariance matrix of the data:

$$R = \frac{X^{T} X}{n - 1} = [P \;\; \tilde{P}] \, \Lambda \, [P \;\; \tilde{P}]^{T}$$

where $\Lambda$ is the eigenvalue matrix, which represents the principal component information of the data.
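The decomposition above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `fit_pca`, the dictionary keys, and the synthetic three-phase data (a common grid fluctuation plus small independent noise) are assumptions made for the example, and the number of principal components `k` is passed in explicitly.

```python
import numpy as np

def fit_pca(X_raw, k):
    """Standardize X_raw (n x m) and split it into principal and residual parts."""
    mean = X_raw.mean(axis=0)
    std = X_raw.std(axis=0, ddof=1)
    X = (X_raw - mean) / std                    # standardized matrix X
    R = X.T @ X / (X.shape[0] - 1)              # covariance matrix of standardized data
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P, P_res = eigvecs[:, :k], eigvecs[:, k:]   # principal / residual loading matrices
    T, T_res = X @ P, X @ P_res                 # principal / residual score matrices
    return {"mean": mean, "std": std, "X": X, "eigvals": eigvals,
            "P": P, "P_res": P_res, "T": T, "T_res": T_res}

# Synthetic stand-in for three-phase CVT output (assumed data, not from the paper).
rng = np.random.default_rng(0)
grid = rng.normal(size=(500, 1))                # common primary-side fluctuation
X_raw = 57.7 * (1 + 0.01 * grid) + 0.005 * rng.normal(size=(500, 3))

model = fit_pca(X_raw, k=1)
X_hat = model["T"] @ model["P"].T               # part in the principal component space
E = model["T_res"] @ model["P_res"].T           # part in the residual space
```

Because the loading matrices together form an orthonormal basis, `X_hat + E` reproduces the standardized matrix exactly, which is a quick sanity check on any implementation of this step.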
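Because the CVT output has only a small number of characteristic variables, the number of principal components $k$ is selected by the cumulative percentage of variance (CPV) method: $k$ is the smallest value for which $\sum_{i=1}^{k} \lambda_i / \sum_{i=1}^{m} \lambda_i$ reaches the threshold, where $\lambda_i$ are the eigenvalues in descending order. The threshold is set to 90% based on experience.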
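As a small illustration of the CPV rule, the sketch below picks the smallest number of components whose cumulative variance share reaches the threshold. The function name `select_components_cpv` and the example eigenvalues are hypothetical and chosen only to show the behaviour.

```python
import numpy as np

def select_components_cpv(eigvals, threshold=0.90):
    """Return the smallest k whose cumulative percentage of variance reaches the threshold."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    cpv = np.cumsum(eigvals) / eigvals.sum()                     # CPV for k = 1..m
    k = int(np.searchsorted(cpv, threshold)) + 1                 # first index with cpv >= threshold
    return min(k, eigvals.size), cpv

# Hypothetical eigenvalues of a 3 x 3 covariance matrix dominated by one component.
k, cpv = select_components_cpv([2.85, 0.10, 0.05])
```

With strongly correlated three-phase voltages, the first eigenvalue dominates, so a 90% threshold typically keeps one principal component and leaves two residual directions for error monitoring.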
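This paper focuses on the fluctuation of the error signal, so the following analysis concentrates on the residual space. The Q statistic indicates the degree to which the CVT output deviates from the model. For a standardized sample $x$ (a row vector), the Q statistic in the residual space and its threshold $Q_C$ are calculated as follows:

$$Q = \left( x (I - P P^{T}) \right) \left( x (I - P P^{T}) \right)^{T}$$

$$Q_C = \theta_1 \left[ \frac{C_\alpha \sqrt{2 \theta_2 h_0^{2}}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^{2}} \right]^{1/h_0}$$

where $\theta_i = \sum_{j=k+1}^{m} \lambda_j^{i}$ $(i = 1, 2, 3)$, $h_0 = 1 - 2 \theta_1 \theta_3 / (3 \theta_2^{2})$, $k$ is the number of retained principal components, $\lambda_j$ is the $j$-th eigenvalue of the covariance matrix, and $C_\alpha$ is the standard normal quantile at confidence level $\alpha$. When the Q statistic is less than $Q_C$, the state of the transformer is normal; otherwise, the transformer is abnormal.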
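For reference, the traditional Q statistic and its control limit can be sketched as below. The code assumes the commonly used Jackson-Mudholkar form of the limit, which presumes Gaussian residuals; the function names and the example eigenvalues are illustrative and not taken from the paper.

```python
import numpy as np
from statistics import NormalDist

def q_statistic(X, P):
    """Squared prediction error of each standardized row of X; P is the m x k principal loading matrix."""
    residual = X - X @ P @ P.T                  # projection onto the residual space
    return np.sum(residual ** 2, axis=1)

def q_threshold(residual_eigvals, alpha=0.99):
    """Control limit Q_C computed from the eigenvalues left out of the PCA model."""
    lam = np.asarray(residual_eigvals, dtype=float)
    theta1, theta2, theta3 = (np.sum(lam ** i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(alpha)       # standard normal quantile
    term = (c_alpha * np.sqrt(2.0 * theta2 * h0 ** 2) / theta1
            + 1.0 + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * term ** (1.0 / h0)

# Hypothetical residual eigenvalues for m = 3 with one retained component.
Q_C = q_threshold([0.10, 0.05], alpha=0.99)
P_demo = np.array([[1.0], [0.0], [0.0]])        # toy loading matrix for illustration
Q_demo = q_statistic(np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]), P_demo)
```

The limit depends only on the residual eigenvalues and the normal quantile, so it cannot follow the actual shape of the residual data; this is the limitation that motivates replacing it with a density-based statistic.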

Improved method for error evaluation
The Q statistic in the traditional PCA error evaluation method sets its control limit according to the distance between the projection of the sample in the residual space and the center of the residual space. To make the confidence region more reasonable, the LOF method is applied to the projection of the data in the residual space, and the LOF statistic is established to replace the Q statistic of the traditional detection method.

Local outlier factor
The LOF method is an anomaly detection method suited to situations where the data distribution is unknown. After dataset X is decomposed by PCA, the distribution of the projection of the error signal is unknown, so the LOF algorithm can be used to establish the LOF statistic on the projection and evaluate the error state. The LOF statistic is denoted by LOF.
For the training set $T$ and the test sample $t$, the local outlier factor of $t$ is calculated as follows.
First, the $K$ points in $T$ closest to $t$ are found and sorted by distance as $\{t_1, \ldots, t_i, \ldots, t_K\}$ $(1 \le i \le K)$, and the k-distance of $t$ is

$$k\text{-}dist(t) = d(t, t_K)$$

Then the local reachability density of $t$ is computed as

$$lrd(t) = \left( \frac{1}{K} \sum_{i=1}^{K} reach\text{-}dist(t, t_i) \right)^{-1}$$

where $reach\text{-}dist(t, t_i) = \max\{ d(t, t_i), \; k\text{-}dist(t_i) \}$ and $d(\cdot,\cdot)$ is the distance between two points. The local outlier factor of $t$ is then

$$lof(t) = \frac{1}{K} \sum_{i=1}^{K} \frac{lrd(t_i)}{lrd(t)}$$

where $lof(t)$ is the LOF statistic of $t$.
$lof(t)$ reflects the outlier degree of $t$ relative to $T$. A value close to 1 means that $t$ lies in a region about as dense as its neighbors; the larger $lof(t)$ is, the greater the deviation of $t$ from $T$ and the more likely $t$ is to be an outlier. The value of $K$ should not be too small; generally, $K$ is required to be no less than 10.
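The calculation of the local outlier factor of test samples relative to a training set can be sketched as follows. This is a brute-force NumPy illustration using Euclidean distance (an assumption; the source does not name the metric); the function names and the synthetic two-dimensional data are hypothetical.

```python
import numpy as np

def _knn(D, K):
    """Indices and distances of the K nearest columns for each row of distance matrix D."""
    idx = np.argsort(D, axis=1)[:, :K]
    return idx, np.take_along_axis(D, idx, axis=1)

def lof_scores(T, Q, K=10):
    """LOF of each row of Q relative to the training set T (rows are samples)."""
    T, Q = np.asarray(T, dtype=float), np.asarray(Q, dtype=float)
    # Distances inside the training set; a point is not counted as its own neighbour.
    D_tt = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    np.fill_diagonal(D_tt, np.inf)
    idx_tt, d_tt = _knn(D_tt, K)
    k_dist = d_tt[:, -1]                             # k-distance of every training point
    reach_tt = np.maximum(d_tt, k_dist[idx_tt])      # reach-dist to each neighbour
    lrd_t = 1.0 / reach_tt.mean(axis=1)              # local reachability density in T
    # Same quantities for the test samples, with neighbours drawn from T.
    D_qt = np.linalg.norm(Q[:, None, :] - T[None, :, :], axis=2)
    idx_qt, d_qt = _knn(D_qt, K)
    reach_qt = np.maximum(d_qt, k_dist[idx_qt])
    lrd_q = 1.0 / reach_qt.mean(axis=1)
    return lrd_t[idx_qt].mean(axis=1) / lrd_q        # mean neighbour density / own density

# Synthetic illustration: one sample inside the training cloud, one far away.
rng = np.random.default_rng(0)
T_train = rng.normal(size=(200, 2))
queries = np.array([[0.0, 0.0], [8.0, 8.0]])
scores = lof_scores(T_train, queries, K=10)
```

The pairwise distance matrix makes this quadratic in the number of training samples, which is acceptable for a one-day training window but would call for a neighbour index on larger data sets.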
Since the distribution of the LOF statistic is unknown, KDE can be used to calculate the threshold of the LOF statistic. KDE reflects the distribution of the data more faithfully because it is estimated from the actual data rather than from an unreliable distributional assumption. Given a set of random samples $X = [x_1, x_2, \ldots, x_n]$, the probability density function is estimated in the following form:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right)$$

In this paper, $K(\cdot)$ is selected as the Gaussian kernel function, and the bandwidth $h$ is the value at which the mean integrated squared error reaches its minimum.
After $\hat{f}(x)$ is accurately estimated, the corresponding probability distribution function $F(x)$ follows by integration, and the LOF statistic threshold $LOF_C$ at confidence level $\alpha$ is obtained from

$$F(LOF_C) = \int_{-\infty}^{LOF_C} \hat{f}(x) \, dx = \alpha$$
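A threshold of this kind can be sketched as below. The bandwidth here uses the Gaussian rule of thumb $h = 1.06\,\sigma\,n^{-1/5}$ as a stand-in for the minimum mean integrated squared error choice, which is an assumption; the function name, the confidence level of 0.99, and the synthetic stand-in for training LOF values are likewise illustrative.

```python
import math
import numpy as np

def kde_threshold(samples, alpha=0.99, h=None):
    """alpha-quantile of a Gaussian kernel density estimate, found by bisection on its CDF."""
    x = np.asarray(samples, dtype=float)
    if h is None:
        h = 1.06 * x.std(ddof=1) * x.size ** (-0.2)   # rule-of-thumb bandwidth (assumption)
    erf = np.vectorize(math.erf)

    def cdf(v):
        # CDF of a Gaussian KDE: average of the normal CDFs centred on each sample.
        return float(np.mean(0.5 * (1.0 + erf((v - x) / (h * math.sqrt(2.0))))))

    lo, hi = x.min() - 10.0 * h, x.max() + 10.0 * h    # bracket that contains the quantile
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), h

# Synthetic stand-in for LOF values of normal training data (clustered near 1, right-skewed).
rng = np.random.default_rng(0)
lof_train = rng.lognormal(mean=0.0, sigma=0.2, size=1000)
threshold, h = kde_threshold(lof_train, alpha=0.99)
```

Because the CDF of a Gaussian KDE has a closed form, no numerical integration of the density is needed; only the quantile search is iterative.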

The evaluation process for CVT metering error
The flow chart of error evaluation is shown in Figure 1. The CVT error evaluation based on the improved method is divided into two stages. In the first stage, sample data from normal operation are selected as historical data to train the PCA model, and the LOF statistic threshold is calculated by kernel density estimation. In the second stage, each new sample is standardized, its LOF statistic is calculated, and the result is compared with the threshold obtained in the first stage. If the LOF statistic is below the threshold, the sample is regarded as normal data. If the LOF statistic exceeds the threshold, the sample is checked to see whether it is a wild point: if it is, the wild point algorithm is called to process it; otherwise, the transformer is determined to be abnormal.
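The two stages can be tied together in a compact sketch such as the one below. It is an illustration under several assumptions rather than the authors' implementation: one retained principal component, Euclidean LOF with K = 10, a rule-of-thumb KDE bandwidth, synthetic three-phase data, and a hypothetical 0.15% ratio error injected on phase C. Wild point handling is omitted because the source does not specify the algorithm.

```python
import math
import numpy as np

def _lof_parts(D, K, k_dist=None):
    """Neighbour indices, k-distances and local reachability density from distance matrix D."""
    idx = np.argsort(D, axis=1)[:, :K]
    d = np.take_along_axis(D, idx, axis=1)
    if k_dist is None:                       # training pass: k-distance comes from D itself
        k_dist = d[:, -1]
    lrd = 1.0 / np.maximum(d, k_dist[idx]).mean(axis=1)
    return idx, k_dist, lrd

def _kde_quantile(s, alpha):
    """alpha-quantile of a Gaussian KDE with rule-of-thumb bandwidth (assumption)."""
    h = 1.06 * s.std(ddof=1) * s.size ** (-0.2)
    erf = np.vectorize(math.erf)
    cdf = lambda v: np.mean(0.5 * (1 + erf((v - s) / (h * math.sqrt(2)))))
    lo, hi = s.min() - 10 * h, s.max() + 10 * h
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)

def train(X_raw, k=1, K=10, alpha=0.99):
    """Stage 1: fit PCA on normal data and set the LOF threshold by KDE."""
    mean, std = X_raw.mean(axis=0), X_raw.std(axis=0, ddof=1)
    X = (X_raw - mean) / std
    w, V = np.linalg.eigh(X.T @ X / (len(X) - 1))
    P_res = V[:, np.argsort(w)[::-1][k:]]    # residual loading matrix
    R = X @ P_res                            # training projections in the residual space
    D = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    idx, k_dist, lrd = _lof_parts(D, K)
    lof_train = lrd[idx].mean(axis=1) / lrd
    return dict(mean=mean, std=std, P_res=P_res, R=R, K=K, k_dist=k_dist,
                lrd=lrd, threshold=_kde_quantile(lof_train, alpha))

def monitor(model, X_new):
    """Stage 2: standardize new samples, compute their LOF statistic, compare with the threshold."""
    Rn = ((X_new - model["mean"]) / model["std"]) @ model["P_res"]
    D = np.linalg.norm(Rn[:, None, :] - model["R"][None, :, :], axis=2)
    idx, _, lrd_new = _lof_parts(D, model["K"], model["k_dist"])
    lof = model["lrd"][idx].mean(axis=1) / lrd_new
    return lof, lof > model["threshold"]

# Synthetic three-phase data (assumed): common grid fluctuation plus small per-phase noise.
rng = np.random.default_rng(1)
def simulate(n):
    grid = rng.normal(size=(n, 1))
    return 57.7 * (1 + 0.01 * grid) + 0.005 * rng.normal(size=(n, 3))

model = train(simulate(600))
normal_new, faulty_new = simulate(200), simulate(200)
faulty_new[:, 2] *= 1.0015                   # hypothetical 0.15% ratio error on phase C
_, alarm_normal = monitor(model, normal_new)
_, alarm_faulty = monitor(model, faulty_new)
```

In this sketch the alarm rates on `normal_new` and `faulty_new` play the roles of the false alarm rate and the detection rate; the noise level of the synthetic data was chosen so that the injected error is clearly separable, which need not hold for field data.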

Data acquisition and analysis
The output of a potential transformer (PT) is used as the signal source and sampled over a period of 4 days, giving 5760 data points. For the convenience of testing, the output is reproduced to simulate the real voltage signal. A simulation experiment platform, which can simulate various situations of a CVT in operation, is built from a standard PT and a simulated CVT. The three-phase voltage amplitude is shown in the figure. The ratio error obtained by comparison is listed in Table 1.
Table 1. The ratio error of three-phase CVT.

CVT              Phase A    Phase B    Phase C
Ratio error/%    0.0154     -0.092     0.107

After the chi-square test at the 0.05 significance level, the voltage of all three phases does not fit the Gaussian distribution, which reduces the accuracy of the traditional PCA method in the actual monitoring process.
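A chi-square goodness-of-fit check of this kind can be sketched as follows. The binning scheme (10 equiprobable bins under the fitted Gaussian) and the uniform test data are assumptions for illustration, since the source does not describe how the test was set up; the critical value 14.067 is the upper 5% point of the chi-square distribution with 7 degrees of freedom.

```python
import numpy as np
from statistics import NormalDist

def chi_square_normality(x, bins=10):
    """Chi-square goodness-of-fit statistic against a fitted Gaussian, using equiprobable bins."""
    x = np.asarray(x, dtype=float)
    fitted = NormalDist(mu=x.mean(), sigma=x.std(ddof=1))
    # Interior bin edges chosen so every bin has the same expected count under the fit.
    edges = np.array([fitted.inv_cdf(i / bins) for i in range(1, bins)])
    observed = np.bincount(np.searchsorted(edges, x), minlength=bins)
    expected = x.size / bins
    statistic = float(np.sum((observed - expected) ** 2 / expected))
    dof = bins - 1 - 2           # two parameters (mean, std) were estimated from the data
    return statistic, dof

CHI2_CRIT_DF7_005 = 14.067       # critical value at the 0.05 significance level, 7 degrees of freedom

# Clearly non-Gaussian synthetic data (assumed): a uniform sample should be rejected.
rng = np.random.default_rng(0)
statistic, dof = chi_square_normality(rng.uniform(size=2000))
rejects_gaussian = statistic > CHI2_CRIT_DF7_005
```

Applied per phase, a statistic above the critical value means the Gaussian hypothesis is rejected at the 0.05 level, which is the situation reported for all three phases.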

Error experiment
The effectiveness of the improved method is verified by experiments on the simulation experiment platform. The data of the first day are kept unchanged as normal samples, and the data of the following three days are adjusted for testing; the specific parameters are as follows.
The control limit obtained by the method based on the Q statistic cannot adapt to changes in the data distribution characteristics, so it is set unreasonably and its confidence region includes some abnormal samples. The control limit based on the LOF statistic is derived from the distribution density, so its confidence region is more reasonable and can effectively distinguish normal samples from abnormal samples.
To verify the improvement, the results of this method are compared with those of the PCA-KDE and PCA-WPD methods on the same experimental data, and the results are as follows. The improved methods that are still based on the Q statistic bring only a small improvement to the model, whereas the method based on the LOF statistic significantly improves the fault identification rate compared with the traditional PCA method and can accurately identify the transformer error.

Conclusion
To improve the sensitivity and accuracy of the PCA method in CVT error evaluation, a PCA-LOF-based CVT error evaluation method is proposed in this paper. An LOF-based statistic is established to replace the Q statistic of the traditional PCA method. Experiments show that the improved method achieves a higher anomaly detection rate than the traditional PCA method and has a good recognition rate for the abrupt errors that occur in practical applications. The method can meet the accuracy requirements of the transformer.

Figure 1. The flow chart of error evaluation.
By adding a 0.15% error variation to the C-phase CVT test sample, a capacitive breakdown fault of the CVT is simulated, in which the ratio error changes suddenly. According to the evaluation steps described above, a statistical monitoring chart is established, and the monitoring chart of the traditional Q statistic method is used for comparison, as shown in Figure 3.

Figure 3. The statistics monitoring chart of three-phase CVT.

After the CVT fault occurs, a large number of statistics exceed the threshold. It can be considered that the transformer is abnormal, which is consistent with the actual situation. Methods based on the LOF statistic are more sensitive to abnormal changes. The confidence regions set in the residual space by the two methods are as follows:

Figure 4. The confidence region in residual space.

Table 3. Comparison of error recognition rate.