Research on new energy power prediction technology based on privacy protection

New energy power prediction is an important part of the transition from the traditional power system to the new power system. How to improve power prediction accuracy while ensuring that private data is not leaked is an issue that needs attention. To this end, this paper constructs a new energy power prediction model integrating NGBoost and LSTM: the optimal feature sequences are screened as model inputs, the transmission and aggregation of model parameters are encrypted, and the scheme is tested and evaluated on a real dataset. Experiments show that, compared with a single prediction model, the proposed scheme not only improves data confidentiality to a certain extent but is also robust and achieves high prediction accuracy.


Introduction
Vigorously developing new energy is a necessary way to build a new type of power system and promote the green and low-carbon transformation of energy [1]. Due to its clean and low-carbon advantages, new energy is widely used in electric power, transportation and heating, which is of great significance for the sustainable development of the environment. As the proportion of new energy increases, the traditional energy management technology in the field station struggles to support the flexible, large-scale control of controllable new energy resources [2]. The controllable resources of a new energy field station are of many kinds, in huge quantities and with different interfaces [3]. The disorderly access of new energy seriously affects the data privacy security and stable operation of the new power system, and makes accurate prediction of new energy power difficult.
To address the above problems, scholars have carried out research from the perspectives of intelligent regulation and power prediction.
In [4], the impact of large-scale new energy connections on power stability is analyzed and some suggestions are put forward. In [5], the key technologies in new energy development are described from the aspects of new energy load assessment and power system security. In [6], artificial intelligence technology is integrated with physical methods to construct a power prediction model for the accurate prediction of new energy generation. In [7], new energy regulation technology is optimized on the basis of source-network-load-storage to improve the intelligent regulation capability of new energy. The above research can improve the accuracy of power prediction to a certain extent, but problems remain, such as reliance on a single prediction model, weak robustness and low prediction accuracy [8]. Based on this, this paper constructs a new energy power prediction model under data privacy protection to improve power prediction accuracy while ensuring data privacy.

Scheme design
In this scheme, we first derive the optimal feature sequence for local model training by calculating the similarity. Then, we construct a power prediction model integrating NGBoost and LSTM to overcome the low prediction accuracy of a single model. At the same time, we encrypt the parameters of the local model, such as weight matrices, when they are transmitted back to the server, to prevent the model parameter information from being stolen or tampered with and to improve the robustness of the prediction model.

Feature selection
For a given dataset of size $m \times n$, columns $1$ to $n-1$ are the feature sequences and column $n$ is the power sequence. The similarity between each feature sequence and the power sequence is calculated and the best features are selected.
In order to avoid large differences in data distribution due to environmental factors, the data need to be normalized as $x'_{ij} = (x_{ij} - x_{\min,j}) / (x_{\max,j} - x_{\min,j})$, where $x_{ij}$ is the original data and $x_{\max,j}$ and $x_{\min,j}$ are the maximum and minimum values of the features in the $j$th column respectively.
Next, the difference $\Delta$ between the elements of each feature sequence and the power sequence is calculated.
Then the similarity coefficient is calculated from these differences.
The similarity of each feature sequence is calculated from its similarity coefficients.
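The normalization, difference and similarity steps above resemble grey relational analysis. The following is a minimal sketch under that assumption; the function names, the resolution coefficient `rho = 0.5` and the toy data are illustrative and are not taken from the paper.

```python
import numpy as np

def min_max_normalize(data):
    """Scale every column to [0, 1] using that column's own min and max."""
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # Constant columns get a span of 1 to avoid division by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (data - col_min) / span

def grey_relational_grades(data, rho=0.5):
    """Return one similarity score per feature column (last column = power).

    Assumption: the similarity is the grey relational grade; the paper's
    exact formula is not reproduced here.
    """
    norm = min_max_normalize(np.asarray(data, dtype=float))
    features, power = norm[:, :-1], norm[:, -1:]
    delta = np.abs(features - power)          # element-wise difference
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0.0:                          # every feature matches exactly
        return np.ones(features.shape[1])
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # similarity coefficient
    return coeff.mean(axis=0)                 # similarity per feature

# Toy data: one feature proportional to power, one pure-noise feature.
rng = np.random.default_rng(0)
power = np.linspace(0.0, 10.0, 50)
irradiance = 3.0 * power + 1.0
noise = rng.uniform(size=50)
grades = grey_relational_grades(np.column_stack([irradiance, noise, power]))
ranking = np.argsort(grades)[::-1]            # feature indices, most similar first
```

In this toy example the feature that tracks the power sequence receives a grade close to 1, while the noise feature scores lower.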
Finally, based on the size of the similarity, the best features are selected as input features to the model.

LSTM model.
Compared to regular recurrent neural networks, LSTM adds input gates, forget gates and output gates [9]. The LSTM architecture is shown in Figure 1.
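To make the role of the three gates concrete, the following is a minimal NumPy sketch of a single LSTM time step in its standard textbook form. The stacked weight layout `W` of shape `(4*H, D+H)`, the bias `b` and the function names are assumptions for illustration; the experiments in this paper use TensorFlow rather than this code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    x_t: input at time t, shape (D,)
    h_prev, c_prev: hidden and cell state at time t-1, shape (H,)
    W: stacked gate weights, shape (4*H, D+H) (illustrative layout)
    b: stacked gate biases, shape (4*H,)
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f_t = sigmoid(z[0:H])            # forget gate
    i_t = sigmoid(z[H:2 * H])        # input gate
    o_t = sigmoid(z[2 * H:3 * H])    # output gate
    g_t = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_t = f_t * c_prev + i_t * g_t   # new cell state
    h_t = o_t * np.tanh(c_t)         # new hidden state
    return h_t, c_t

# With all-zero weights every gate equals 0.5 and the candidate state is 0.
D, H = 3, 2
h_t, c_t = lstm_step(np.ones(D), np.zeros(H), np.ones(H),
                     np.zeros((4 * H, D + H)), np.zeros(4 * H))
```

The forget gate scales the previous cell state, the input gate scales the new candidate, and the output gate controls how much of the cell state reaches the hidden state.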

NGBoost model.
NGBoost is a gradient boosting algorithm built on a probabilistic model: it takes the negative logarithm of the likelihood function as the loss function and uses a probability distribution to calculate the average prediction value [10]. It mainly consists of the base learner and the probability distribution. The prediction principle is:
• Using a probabilistic model, the log-likelihood loss function, denoted as $L(y, \hat{y})$, is calculated to measure the model prediction performance, where $y$ and $\hat{y}$ denote the true and predicted values of the model respectively.
• The gradient of the loss function is computed: $g = \nabla_{\hat{y}} L(y, \hat{y})$.
• Using the natural gradient method, the gradient size and direction are computed as the natural gradient $h = F^{-1} g$, where $F$ is the Fisher information matrix.
• The base learner $h(x)$ is trained by using the negative gradient of the current samples along the direction of the natural gradient.
• The model predictions are updated by accumulating the predictions of the current model with the predictions of the newly trained base learner: $\hat{y}_{t+1} = \hat{y}_t + h(x)$, where $\hat{y}_{t+1}$ is the updated prediction and $t$ is the current number of iterations.
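The natural-gradient step can be illustrated with a small sketch. It assumes a Normal predictive distribution parameterized by `(mu, log_sigma)`, for which the Fisher information matrix is `diag(1/sigma^2, 2)`, and it replaces the fitted base learner with a constant (the mean of the per-sample natural gradients). The function names, learning rate and data are illustrative and do not come from the paper.

```python
import numpy as np

def normal_nll_natural_gradient(y, mu, log_sigma):
    """Natural gradient of the Normal negative log-likelihood per sample.

    Assumption: parameters are (mu, log_sigma), so the Fisher information
    matrix is diag(1/sigma^2, 2) and F^{-1} g rescales each component.
    """
    sigma2 = np.exp(2.0 * log_sigma)
    g_mu = -(y - mu) / sigma2                # ordinary gradient w.r.t. mu
    g_ls = 1.0 - (y - mu) ** 2 / sigma2      # ordinary gradient w.r.t. log_sigma
    return g_mu * sigma2, g_ls / 2.0         # multiply by F^{-1}

def fit_constant_distribution(y, lr=0.1, n_iter=2000):
    """Boosting-style updates with a constant stand-in for the base learner."""
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_iter):
        nat_mu, nat_ls = normal_nll_natural_gradient(y, mu, log_sigma)
        # A real base learner would be fitted to the per-sample natural
        # gradients; here their mean plays that role.
        mu -= lr * nat_mu.mean()
        log_sigma -= lr * nat_ls.mean()
    return mu, float(np.exp(log_sigma))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mu_hat, sigma_hat = fit_constant_distribution(y)
```

With a constant learner the updates converge to the sample mean and the (population) standard deviation, which is the maximum-likelihood Normal fit; a tree-based learner would instead let both parameters vary with the input features.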

NGBoost-LSTM model.
Based on the given dataset, the prediction results of the NGBoost model and the LSTM model are computed separately, as shown in Figure 2. The final prediction is then obtained as a weighted combination of the two, computed as follows:
• The errors between the true and predicted values of the LSTM and NGBoost models are calculated to obtain the error matrix $E$, where $e_L$ and $e_N$ denote the differences between the predicted and true values of the LSTM and NGBoost models respectively, and $m$ denotes the total number of samples.
• The weight matrix $W$ is calculated by the Lagrange multiplier method.
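One common way to obtain such weights is to minimize the combined squared error $W^T E W$ subject to the weights summing to one; the Lagrange multiplier method then gives $W = E^{-1}\mathbf{1} / (\mathbf{1}^T E^{-1} \mathbf{1})$. The sketch below assumes this standard formulation, which may differ in detail from the paper's own formula; the function name and the toy predictions are illustrative.

```python
import numpy as np

def combination_weights(y_true, pred_lstm, pred_ngb):
    """Weights minimizing the combined squared error, constrained to sum to 1."""
    errors = np.stack([pred_lstm - y_true, pred_ngb - y_true])  # shape (2, m)
    E = errors @ errors.T                    # 2x2 error matrix
    ones = np.ones(2)
    e_inv_ones = np.linalg.solve(E, ones)    # E^{-1} 1 without explicit inverse
    return e_inv_ones / (ones @ e_inv_ones)

# Toy predictions (illustrative values only).
y_true = np.array([10.0, 12.0, 15.0, 11.0, 9.0])
pred_lstm = y_true + np.array([0.5, -0.4, 0.6, -0.3, 0.2])
pred_ngb = y_true + np.array([-0.6, 0.5, -0.2, 0.4, -0.5])

w = combination_weights(y_true, pred_lstm, pred_ngb)
combined = w[0] * pred_lstm + w[1] * pred_ngb

mse_lstm = float(np.mean((pred_lstm - y_true) ** 2))
mse_ngb = float(np.mean((pred_ngb - y_true) ** 2))
mse_combined = float(np.mean((combined - y_true) ** 2))
```

On the data used to fit the weights, the combined error can never exceed that of either single model, because using one model alone corresponds to the feasible weights (1, 0) or (0, 1). Note that this formulation does not force the weights to be non-negative.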

Experimental environment
This experiment was conducted in Python 3.9 with TensorFlow 2.13. The dataset used is that of the National Energy Rising PV Power Prediction Competition, recorded from January 1, 2017 to December 31, 2018 with one sampling point every 15 minutes. In order to make the final prediction effect more obvious, the data are read once every 1000 sampling points during training. All schemes in this paper are tested and evaluated on this dataset.

Evaluation of indicators
• MSE
• MAPE
• $R^2$
where $m$ is the number of sampling points in the dataset; $\bar{y}$, $\hat{y}(i)$ and $y(i)$ denote the average of the true values, the model predicted value and the true value respectively.
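For reference, the sketch below computes these indicators using their standard textbook definitions, which are assumed here because the paper's own formulas are not reproduced. The function names and sample values are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined where y_true is 0)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only, not taken from the competition dataset.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])
scores = {"MSE": mse(y_true, y_pred),
          "MAPE": mape(y_true, y_pred),
          "R2": r2(y_true, y_pred)}
```

Lower MSE and MAPE indicate a better fit, while $R^2$ closer to 1 indicates a better fit.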

Analysis of results
Based on the previously described dataset, the scheme of this paper is tested and the results are obtained as shown in Table 1.
Table 1 0.9580 0.9411 0.9795
The above table shows that NGBoost-LSTM performs best among the three models on all three evaluation metrics. From the MSE metric, it can be seen that both the NGBoost model and the NGBoost-LSTM model perform well, and the difference between NGBoost and NGBoost-LSTM is only 0.58. From the MAPE metric, it can be seen that there is a certain gap between the NGBoost model and the other two. From the $R^2$ metric, it can be seen that the NGBoost-LSTM model performs best, the NGBoost model is next and the LSTM model is the worst. Overall, the NGBoost-LSTM model combines the advantages of the single models and improves considerably on them.
The comparison of the predicted and true values of power for the three models is shown in Figure 3. As can be seen from the figure, both the single LSTM and NGBoost models deviate to a certain degree from the real value. The predictions of the NGBoost-LSTM model, marked with red squares, almost completely cover the real values, marked with blue dots. The prediction model introduced in this paper therefore performs better than the single prediction models.

Conclusion
In order to solve the problems of low prediction accuracy and data privacy leakage in traditional power prediction models, this paper, from the perspective of privacy protection, first selects the optimal feature sequence by calculating the similarity, and then constructs a power prediction model integrating NGBoost and LSTM to improve prediction accuracy. At the same time, it encrypts the transmission of the model parameters to reduce the risk of data privacy leakage. Finally, the scheme is tested on a real dataset, where it performs well.

Figure 1. Basic unit of LSTM network, where $x_t$ denotes the input data value at time $t$; $c_t$ and $c_{t-1}$ denote the state of the basic unit of the network at time $t$ and $t-1$, respectively; $h_t$ and $h_{t-1}$ denote the state of the hidden layer at time $t$ and $t-1$, respectively. The results obtained from the computation of the forget gate, the input gate and the output gate are denoted as $f_t$, $i_t$ and $o_t$ respectively.

Model parameters return and update
• Each local model encrypts the weight matrix $W$ obtained after training and sends it back to the server: $\mathrm{Enc}_{pk}(W, Y_L, Y_N)$, where $pk$ is the public key of the local training participant, $Y_L$ is the prediction value of the LSTM model and $Y_N$ is the prediction value of the NGBoost model.
• The server side decrypts the weight matrix $W$, $Y_L$ and $Y_N$ using $sk$, where $sk$ is the private key of the local training participant.
• The server side calculates the final weight matrix $W = [w_L \; w_N]^T$ to update the global power prediction model.
• Global power prediction model prediction: $Y = w_L Y_L + w_N Y_N$.

Figure 3. Power Prediction Comparison Chart.
Table 1. Evaluation of power prediction results.