CVT error prediction method based on an improved attention mechanism transformer model

To enhance the performance of capacitive voltage transformer (CVT) error prediction, we propose an error prediction method based on a transformer model with an improved attention mechanism (AM-transformer). The method improves both the input data and the model: Pearson's correlation coefficient is used to select the environmental factors that most strongly affect the CVT measurement error, and the local learning ability of the attention mechanism is enhanced by adding causal convolution and data segmentation operations. The experimental results show that the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) of the method reach 0.016, 4.21%, and 0.021, respectively, achieving high prediction accuracy.


Introduction
Capacitive voltage transformers (CVTs) are widely used in power systems because of a series of advantages such as high insulation reliability, low price, small size, and low likelihood of inducing system resonance [1][2][3]. However, during actual CVT operation, the measurement error is susceptible to the influence of the acquisition principle and the operating environment and is prone to exceeding its limit. How to predict the trend of CVT error so as to forewarn of possible risks is therefore an important research topic. Li [4] analyzed the physical structure and working principle of the CVT in depth and studied how its own electrical parameters and the external working environment affect its accuracy. Zhang [5] established an error prediction model based on the Q-ARMA model on the basis of CVT measurement error state assessment. Lei [6] built an error perturbation model from a priori knowledge of the power grid and, on that basis, proposed a combined error prediction method based on LightGBM and LSTM.

Overall framework
The overall framework of the method based on the AM-transformer model proposed in our paper is shown in Figure 1. First, Pearson correlation analysis is performed on the historical CVT measurement error data and the environmental influencing factors to obtain the dominant factors affecting the CVT measurement error. Then, the transformer model based on the improved attention mechanism encodes and decodes the CVT measurement error and the dominant environmental factors to produce the final prediction results.

Pearson correlation coefficient
The Pearson correlation coefficient (PCC) is widely used to measure the degree of linear correlation between two variables [7]. We adopt the PCC to select the environmental factors most highly correlated with the CVT measurement error as inputs to the subsequent model:

$$r = \frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}\sqrt{\sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2}}$$

where $X$ represents the CVT measurement error data, $Y$ represents the environmental factor data, $\bar{X}$ and $\bar{Y}$ are their respective means, and $r$ is the correlation coefficient with range [-1, 1]. Positive and negative values indicate a positive or negative correlation between the two variables, and a larger absolute value indicates a stronger correlation. Table 1 lists the correlation coefficients of six common influencing factors with the CVT error. The absolute values of the correlation coefficients of temperature, three-phase amplitude imbalance, and zero-sequence voltage are the largest, so these factors are used, together with the error data, as inputs to the subsequent network.
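As a minimal sketch of this screening step, the snippet below computes $r$ between the error series and each candidate factor and keeps the strongly correlated ones. The column names, random data, and the 0.5 threshold are illustrative assumptions, not values from the paper:

```python
# Hedged sketch of the Pearson-based factor screening; data and threshold are placeholders.
import numpy as np
import pandas as pd

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient r between two 1-D series."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical monitoring frame: CVT ratio error plus candidate environmental factors.
df = pd.DataFrame({
    "error": np.random.randn(500),
    "temperature": np.random.randn(500),
    "humidity": np.random.randn(500),
    "three_phase_imbalance": np.random.randn(500),
    "zero_sequence_voltage": np.random.randn(500),
})

# Keep factors whose |r| with the error exceeds the (assumed) threshold.
scores = {c: pearson_r(df[c].values, df["error"].values)
          for c in df.columns if c != "error"}
dominant = [c for c, r in scores.items() if abs(r) > 0.5]
print(scores, dominant)
```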

Transformer network
In our paper, we use a transformer network based on an improved attention mechanism, which consists mainly of an encoder and a decoder, as shown in Figure 2. Both the encoder and the decoder contain an improved multi-head attention module, a normalization layer, and a feed-forward neural network. The transformer model uses a positional coding strategy to obtain the relative position information within the input sequence, while the attention mechanism attends to detailed information in diverse subspaces. The CVT error data and the influencing factor data selected by the Pearson correlation analysis are denoted by $X_i$. In the encoder, $X_i$ first undergoes input embedding plus positional coding; the improved attention mechanism then calculates the attention scores among the sequence elements, introducing different weights into the model, and the nonlinear transformation mapping yields the coding vectors. The decoder takes the coding vectors and finally generates the output prediction sequence.
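For orientation, the following is a minimal sketch of this encoder-decoder backbone using stock PyTorch components with standard sinusoidal positional coding. All dimensions (d_model=64, 4 input features, window lengths) are illustrative assumptions; the paper's improved attention module is sketched in the next section:

```python
# Hedged sketch of the encoder-decoder backbone; dimensions are assumed, not from the paper.
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Standard sinusoidal position coding added to the input embedding."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(pos * div)
        pe[:, 0, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, d_model)
        return x + self.pe[: x.size(0)]

d_model = 64
embed = nn.Linear(4, d_model)              # error + 3 dominant factors -> d_model
pos = PositionalEncoding(d_model)
model = nn.Transformer(d_model=d_model, nhead=4)   # stock encoder-decoder

src = pos(embed(torch.randn(96, 16, 4)))   # (seq_len, batch, features) encoder window
tgt = pos(embed(torch.randn(24, 16, 4)))   # decoder input window
out = model(src, tgt)                      # (24, 16, d_model)
```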

Improved multi-head self-attention
The attention mechanism of the standard transformer model [8] learns the dependencies between time points in a time series by point-wise multiplication but ignores local features in the sequence. We place a causal convolutional embedding of the temporal information in front of the multi-head self-attention layer to enhance the perception of local data, and we replace the point-by-point computation with a segmented computation [9][10] to capture comprehensive semantic representations not available from single-point computation, as shown in Figure 2. Specifically, a causal convolution with a filter $f$ of size $l$ is applied to the input data $X_i$. The causal convolution at $x_i^t$ is

$$z_i^t = \sum_{j=0}^{l-1} f_j \, x_i^{t-j},$$

where $l$ is the filter size, so each output depends only on current and past inputs. Subsequently, a segmentation operation is performed on $z_i$, dividing it into segments $z_i^p$. We apply the multi-head self-attention to $z_i^p$ as follows:

$$\mathrm{head}_h = \mathrm{softmax}\!\left(\frac{Q_h K_h^{\top}}{\sqrt{d_k}}\right) V_h, \qquad H_i = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_m)\, W^O,$$

where $H_i$ is the output of the multi-head self-attention and $m$ is the number of attention heads.
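The sketch below illustrates the two ingredients of this module: a left-padded causal 1-D convolution that embeds local context, followed by attention computed over pooled segments rather than single points. The segment length, channel counts, and mean-pooling of segments are illustrative assumptions about details the paper does not spell out:

```python
# Hedged sketch of the causal-convolution + segmented attention front-end;
# seg_len, kernel_size, and segment pooling are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvEmbedding(nn.Module):
    """z_t = sum_{j=0}^{l-1} f_j * x_{t-j}; left padding keeps the convolution causal."""
    def __init__(self, in_ch: int, d_model: int, kernel_size: int = 3):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(in_ch, d_model, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_ch) -> (batch, seq_len, d_model)
        x = F.pad(x.transpose(1, 2), (self.pad, 0))   # pad the past side only
        return self.conv(x).transpose(1, 2)

class SegmentedSelfAttention(nn.Module):
    """Pool each length-p segment to one token, then attend over segments."""
    def __init__(self, d_model: int, n_heads: int = 4, seg_len: int = 4):
        super().__init__()
        self.seg_len = seg_len
        self.attn = nn.MultiheadAttention(d_model, n_heads)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        b, t, d = z.shape
        p = self.seg_len
        segs = z[:, : t - t % p].reshape(b, t // p, p, d).mean(dim=2)
        segs = segs.transpose(0, 1)                   # (n_segs, batch, d)
        out, _ = self.attn(segs, segs, segs)          # attention over segments
        return out.transpose(0, 1)                    # (batch, n_segs, d)

x = torch.randn(16, 96, 4)            # (batch, window, error + 3 factors)
z = CausalConvEmbedding(4, 64)(x)     # local causal embedding, (16, 96, 64)
h = SegmentedSelfAttention(64)(z)     # segment-level attention, (16, 24, 64)
```

Segment-level attention also shortens the sequence the attention operates on, which is consistent with the reduced time complexity claimed in the conclusion.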

Experimental settings
Our experimental data were collected from the CVT monitoring data of a substation for the period from September 2022 to January 2023, including the CVT ratio difference data and each environmental parameter. The experimental data are divided into training, validation, and test sets in the proportion 6:2:2. The experimental environment is Windows 10, the deep learning framework is PyTorch 1.8.1, and the programming language is Python. The initial learning rate of the model was set to 0.01 and the batch size to 16.
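A minimal sketch of this setup is shown below: a chronological 6:2:2 split and the stated learning rate and batch size. The placeholder tensor, the AMTransformer class name, and the choice of Adam are assumptions:

```python
# Hedged sketch of the experimental setup; data, model class, and optimizer are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset

series = torch.randn(3000, 4)                  # placeholder for the Sep 2022 - Jan 2023 data
n = len(series)
train = series[: int(0.6 * n)]                 # 6 : 2 : 2 split, kept in time order
val = series[int(0.6 * n): int(0.8 * n)]
test = series[int(0.8 * n):]

batch_size, lr = 16, 0.01                      # hyperparameters stated in the paper
loader = DataLoader(TensorDataset(train), batch_size=batch_size, shuffle=False)
# model = AMTransformer(...)                                   # hypothetical model class
# optimizer = torch.optim.Adam(model.parameters(), lr=lr)      # Adam is an assumption
```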

Evaluation indicators
We selected the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) as the evaluation metrics, with the following formulas:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},$$

where $y_i$ denotes the true value and $\hat{y}_i$ denotes the predicted value; smaller values of the three metrics indicate better predictions of the model.
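These three metrics translate directly into NumPy; the arrays below are illustrative values, not results from the paper:

```python
# Straightforward implementations of the three evaluation metrics above.
import numpy as np

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.mean(np.abs(y - y_hat)))

def mape(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

y = np.array([0.10, 0.12, 0.11])        # illustrative true ratio errors
y_hat = np.array([0.11, 0.12, 0.10])    # illustrative predictions
print(mae(y, y_hat), mape(y, y_hat), rmse(y, y_hat))
```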

Analysis of experimental results
The training and validation loss values of the AM-transformer model are shown in Figure 3, from which it can be seen that the model has essentially converged after many rounds of iterations. Figure 4 shows the prediction results of the model on the test set. The AM-transformer model fits the real data well, indicating that it can effectively mine the implicit relationship between the environmental covariates and the measurement error and learn the temporal representations in the data.

Comparative experiments
To verify the prediction performance of the model proposed in our paper, we selected several mainstream deep learning prediction algorithms for comparative experiments, including the transformer, GRU, and LSTM models. The experimental results are shown in Table 2. Compared with the transformer, GRU, and LSTM models, our model achieves lower MAE, MAPE, and RMSE values and a shorter running time, indicating better overall performance on the time-series prediction task and practical applicability.

Conclusion
In our paper, a new CVT error prediction method is proposed. Based on the characteristics of CVT operating data, Pearson correlation analysis is used to screen out the factors with high correlation to the error. At the same time, a causal convolution and data segmentation multi-head attention module is introduced into the traditional transformer model to better learn the local characteristics and comprehensive semantic information of the time series, which reduces the computational time complexity and improves the accuracy and efficiency of CVT error prediction. The experimental results show that every evaluation metric of the method has advantages over the other models. The method can therefore serve as a reference for the operation and maintenance of electric energy measurement systems.

Figure 1. The overall framework of CVT error prediction.

Figure 3. AM-transformer model training set and validation set loss values.

Table 1. Correlation coefficients of each influencing factor with the error.

Table 2. Evaluation metrics for the different comparison models.