Daily load prediction study on SCASL and BiGRU models

In short-term power load prediction, environmental uncertainty affects prediction accuracy and increases the complexity of the prediction. To deal with this problem, this paper proposes a forecasting model that combines a bidirectional gated recurrent unit (BiGRU) network with a sine cosine optimization algorithm with self-learning strategy and Levy flight (SCASL). Using real power load data as the data set, highly correlated parameters are selected as inputs through Pearson correlation coefficient analysis, and the SCASL algorithm with a nonlinear weight factor is applied. Finally, the key time points that capture the historical information of load changes are established according to the optimized parameters, and the BiGRU model is trained to produce the forecast. The experimental results show that the MAPE (Mean Absolute Percentage Error) and RMSE (Root Mean Square Error) of the BiGRU model are lower than those of the comparison gated recurrent unit (GRU) network model. The final results show that the prediction accuracy of the power load system is improved.


Introduction
With the increasing share of clean energy and China's goals of reaching peak carbon by 2030 and carbon neutrality by 2060 (the "double carbon" goal), the rational use of clean energy has a profound impact on energy management in companies with high energy consumption. Electricity load forecasting is an important part of energy management. Electricity loads are influenced by the external environment and are highly random [1]. Improving the accuracy of short-term electricity load forecasting requires a comprehensive analysis of load characteristics and research on how, and how strongly, external factors such as weather and date type (holiday or working day) influence the load [2]. However, the non-linear relationship between electricity load and its influencing factors is complex, and how to store reliably and analyze efficiently and accurately the big data on the electricity user side is an important issue facing the electricity industry. Therefore, it is of great significance to study big data analysis and parallel load forecasting on the power user side [3][4].
Power load prediction models are mainly divided into two types: traditional statistical methods and machine learning-based prediction methods. Traditional methods have the advantage of simplicity and convenience, but are not suitable for complex systems, such as nonlinear systems. Data-driven methods that characterize the external characteristics of power loads are highly practical for mining the patterns of change in historical data. With the development of deep learning algorithms, a series of variant neural networks were derived from the conventional recurrent neural network (RNN) as suitable models for processing time series data, such as the long short-term memory (LSTM) neural network and the gated recurrent unit (GRU) neural network, which are suitable for forecasting power load data with irregular and complex features [5]. Literature [6] used an optimization algorithm to improve the BiLSTM model and finally obtained the optimal computational parameters.

BiGRU model and sine cosine algorithm
Starting from the structure of the GRU network unit [5], the calculation is given in Equations (1) to (4), where $x_t$ is the input vector at moment $t$; $h_{t-1}$ is the hidden-layer state at moment $t-1$; $W$ and $U$ are the weight matrices of the corresponding gates; $\sigma(\cdot)$ is the sigmoid activation function; $b$ is the bias vector; and the gates $r_t$ and $z_t$ are calculated with $\sigma(\cdot)$.
An ordinary GRU neural network considers the association of input sequences in one direction only, which limits it on data with strong forward and backward correlation. In this paper, a bidirectional gated recurrent unit (BiGRU) neural network is used. Figure 1 shows the BiGRU network structure. This structure integrates information from both the history and the future of the grid load to improve the efficiency of the model and make the forecasting process more comprehensive. The BiGRU network computes the hidden-layer states independently for forward and backward propagation by Equations (5) and (6), and finally splices the outputs of the two directions by Equation (7),
where $\mathrm{GRU}(\cdot)$ denotes the operation of the GRU network; $x_t$ is the input; $\overrightarrow{h}_t$ denotes the hidden-layer state of the forward output at moment $t$; $\overleftarrow{h}_t$ denotes the hidden-layer state of the reverse output at moment $t$; $h_t$ is the hidden-layer state at moment $t$; $W_t$ and $U_t$ are the forward and reverse hidden-layer output weights at moment $t$; and $b_t$ is the hidden-layer bias at moment $t$.
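For reference, the GRU and BiGRU computations described above are commonly written as follows. This is the standard textbook formulation expressed with the symbols of this section, not a verbatim copy of Equations (1) to (7). The gate subscripts on $W$, $U$ and $b$, the candidate state $\tilde{h}_t$, and the update-gate convention (some authors swap $z_t$ and $1 - z_t$) are assumptions. $\odot$ is the element-wise (Hadamard) product, and $h_t$ in the last line is the combined BiGRU output.

```latex
\begin{align}
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \\
\overrightarrow{h}_t &= \mathrm{GRU}\bigl(x_t, \overrightarrow{h}_{t-1}\bigr) \\
\overleftarrow{h}_t &= \mathrm{GRU}\bigl(x_t, \overleftarrow{h}_{t+1}\bigr) \\
h_t &= W_t \overrightarrow{h}_t + U_t \overleftarrow{h}_t + b_t
\end{align}
```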
BiGRU uses Equations (1) to (7) to derive the model output, but the search for the optimal parameters is computationally complex and the optimal parameters are not easy to find. Neural networks can converge to good predictions with the aid of a suitable optimization method [7]. The parameter optimization part of this paper uses the Levy flight sine cosine algorithm to optimize the parameters of the network so that the model's predictions are as close to the true values as possible.
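The text names Levy flight as part of the optimizer but does not state how the Levy-distributed steps are drawn. One common choice, assumed here and not taken from the paper, is Mantegna's algorithm. The function names and the stability index `beta = 1.5` are illustrative.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=None):
    """Draw one heavy-tailed step: u / |v|^(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    rng = rng or random.Random()
    u = rng.gauss(0.0, mantegna_sigma(beta))
    # Guard against the (practically impossible) exact zero draw.
    v = rng.gauss(0.0, 1.0) or 1e-12
    return u / abs(v) ** (1 / beta)


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(levy_step(1.5, rng), 3) for _ in range(5)])
```

Most steps drawn this way are small, with occasional large jumps, which is the property usually exploited to help a population escape local optima.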
The sine cosine algorithm (SCA) is a population-based intelligent optimization algorithm proposed by the Australian scholar Mirjalili in 2016 [8]. Compared with other intelligent optimization algorithms, SCA has the advantages of a simple architecture, few control parameters and high computational efficiency [9].
The sine cosine algorithm uses the oscillatory properties of the sine and cosine functions to search for the optimum, and converges to the optimal solution or its neighbourhood as the number of iterations increases. Because the population update depends strongly on the current optimal solution, population diversity decreases rapidly late in the iteration, so the algorithm easily falls into a local optimum. Consider an $n$-dimensional minimization problem:

$$\min f(x_1, x_2, \ldots, x_n), \quad L_i \le x_i \le U_i, \quad i = 1, 2, 3, \ldots, n$$

where $x_i$ is the $i$th variable to be optimized, $L_i$ is the lower boundary of $x_i$, and $U_i$ is the upper boundary of $x_i$.
The basic principle of SCA for solving this optimization problem is as follows. A set of search individuals $X_1, X_2, X_3, \ldots, X_n$ is randomly generated in the $n$-dimensional search space, and each search individual is a candidate solution of the problem to be optimized. The position of the $i$th individual is $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$. The fitness value $f(X_i)$ of each individual is calculated with the fitness function, and the search individual with the best fitness value is recorded as the current optimal individual $X^*$.
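The search procedure described above can be sketched in a few lines. The sketch below implements the basic SCA position update as published by Mirjalili (each coordinate moves by `r1 * sin(r2) * |r3 * best - x|` or the cosine counterpart, with `r1` decreasing linearly), not the SCASL variant, whose self-learning and Levy flight details are not given in this section. The function name, default parameters and the sphere test function are illustrative assumptions.

```python
import math
import random


def sca_minimize(f, lower, upper, n_agents=20, n_iter=200, a=2.0, seed=0):
    """Basic sine cosine algorithm for a bounded minimization problem."""
    rng = random.Random(seed)
    dim = len(lower)
    # Random initial population inside the bounds [L_i, U_i].
    pop = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_agents)]
    best = min(pop, key=f)[:]          # current optimal individual X*
    best_f = f(best)
    history = [best_f]
    for t in range(n_iter):
        r1 = a - t * a / n_iter        # shrinks from a to 0: exploration to exploitation
        for x in pop:
            for d in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                osc = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                x[d] += r1 * osc * abs(r3 * best[d] - x[d])
                x[d] = min(max(x[d], lower[d]), upper[d])   # keep inside bounds
        # Re-evaluate the population and keep the best solution found so far.
        for x in pop:
            fx = f(x)
            if fx < best_f:
                best, best_f = x[:], fx
        history.append(best_f)
    return best, best_f, history


def sphere(x):
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, best_f, _ = sca_minimize(sphere, [-10.0] * 3, [10.0] * 3)
    print(best, best_f)
```

Because every update is pulled toward the single best individual, the population can collapse around it in late iterations, which is the loss of diversity noted above.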

Structure of the model
The steps of SCASL-BiGRU forecasting model construction are as follows.
(1) Using the dataset, the correlation between electricity load and each external factor is analyzed with Pearson's correlation coefficient (PCC), and the input parameters of the BiGRU model are selected according to the magnitude of the coefficients.
(2) The SCASL-BiGRU model is selected and its main parameters are set; the parameters are then fed into the network, and the main output is fed back as the input for the prediction of the next time period.
(3) In the SCASL algorithm, the sample weights are adjusted and the parameters in BiGRU are updated in real time. There are two termination conditions: finding the optimal solution and reaching the maximum number of iterations.
(4) After the SCASL-BiGRU results are obtained, the model output values are inverse-normalized, and the data are analyzed to obtain the desired results, which completes the prediction.
To assess the accuracy of the model, two evaluation indicators are chosen in this paper: mean absolute percentage error (MAPE) and root mean square error (RMSE). MAPE is used to determine the accuracy of the forecasts, and RMSE reflects the actual magnitude of the forecast errors:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{Q_i - \hat{Q}_i}{Q_i}\right| \times 100\%$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_i - \hat{Q}_i\right)^2}$$

where $Q_i$ is the actual value of the load on the grid, $\hat{Q}_i$ is the predicted value of the load on the grid, and $n$ is the number of samples. Both indicators quantify the error: the smaller the value, the better the performance of the model.
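As a minimal sketch, the two indicators can be computed as below. The function names and the sample values are illustrative assumptions; MAPE is undefined when an actual value is zero, which this sketch does not handle.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs(q - p) / abs(q) for q, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the load."""
    n = len(actual)
    return math.sqrt(sum((q - p) ** 2 for q, p in zip(actual, predicted)) / n)


if __name__ == "__main__":
    actual = [100.0, 200.0]       # hypothetical load values
    predicted = [110.0, 190.0]    # hypothetical forecasts
    print(mape(actual, predicted), rmse(actual, predicted))
```

For the two hypothetical points above, the relative errors are 10% and 5%, so MAPE is 7.5%, and both absolute errors are 10, so RMSE is 10.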
Environmental uncertainty plays an important role in short-term load forecasting and influences its accuracy [10]. Pre-processing the collected data and correcting anomalies in the raw data lays the data foundation for the subsequent construction of a data-driven forecasting model.

Data processing
Errors are usually introduced during measurement and data transmission, so the original data are usually flawed, and these errors greatly reduce the accuracy of the final prediction. Therefore, to improve data quality, measurement data cannot be applied directly for prediction in engineering use, and some pre-processing of the raw data is usually required [11].
(1) Data normalization: To avoid the adverse effects of different magnitudes, Equation (10) is used to normalize the variables to [0, 1]:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{10}$$

where $x$ is the original load data, $x^{*}$ is the load data value after normalization, $x_{\min}$ is the minimum value in the original load data and $x_{\max}$ is the maximum value in the original load data.
(2) Inverse normalization: The prediction results must be inverse-normalized to restore the data to the original scale. Rearranging Equation (10) gives Equation (11):

$$x = x^{*}\,(x_{\max} - x_{\min}) + x_{\min} \tag{11}$$

(3) Feature selection and outlier correction: Feature selection can effectively select input variables that are closely related to the load, avoiding interference from irrelevant variables in the model's prediction and reducing the complexity of calculation. In this paper, the Pearson correlation coefficient is used to measure the correlation between each variable and the load.

 
In Equation (14), $P(t)$ is the actual value of the load at moment $t$; $P_a$ is the average value at moment $t$ on similar days (days with data collected in approximately the same time period and under similar external conditions); and $b$ is the set normal range, determined according to the required accuracy. When $P(t)$ and $P_a$ satisfy the relationship of Equation (12), the point at moment $t$ is an anomaly and its value needs to be corrected; Equation (13) is the common correction formula [12].
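The normalization and inverse normalization of Equations (10) and (11) can be sketched as follows. The function names and sample values are illustrative assumptions; the minimum and maximum are returned so that model outputs can later be mapped back to the original scale.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the min and max used."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(x - x_min) / span for x in values], x_min, x_max


def inverse_normalize(scaled, x_min, x_max):
    """Map normalized values back to the original scale."""
    return [s * (x_max - x_min) + x_min for s in scaled]


if __name__ == "__main__":
    load = [320.0, 410.0, 500.0]              # hypothetical load values
    scaled, lo, hi = min_max_normalize(load)
    print(scaled)                              # values in [0, 1]
    print(inverse_normalize(scaled, lo, hi))   # back on the original scale
```

In practice the minimum and maximum would be taken from the training data only and reused for the test data, so that no information from the test period leaks into the scaling.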

Correlation analysis and input and output selection
To analyse the degree of interaction among the multiple loads and between the loads and the environment, and to select the input features of the forecasting model effectively, a correlation analysis of the loads and environmental factors is required. The electricity load data set for October to December 2020 from a region of Gansu Province was used, with October and November 2020 as the training data and December 2020 as the test data, to forecast future loads in 1 h steps. The data were collected at 15 min intervals. The correlation analysis was carried out using the maximal information coefficient (MIC), which performs better for nonlinear data. The heat map of the influencing factors is shown in Figure 2. Based on MIC theory, correlation analysis was carried out on the load and the weather factors in the area, which need to be included as influencing factors in the input feature set for load forecasting. The results of the correlation analysis are shown in Table 1. Based on these results, temperature, light intensity and electricity price were selected as input data. These features form the input matrix of the model, and the corresponding load values form the output matrix $Y$.

Comparison of multiple forecasting models
To validate the performance of the SCASL-BiGRU prediction model, it was compared on the same experimental dataset with comparison models including the GRU model and the BiGRU model. To obtain reliable experimental data, each of the four selected models was tested 20 times and the results were averaged. The model prediction error plot shows the relative performance of the prediction methods more intuitively. As can be seen from Figure 4, the error of the final result of the complete SCASL-BiGRU model is the smallest at most time nodes, and its fluctuation is relatively small.

Conclusions
This paper constructs a SCASL-BiGRU model based on PCC. First, the input parameters are selected according to the coefficients from the PCC analysis; second, the SCASL algorithm is introduced into the BiGRU network model to optimize the relevant parameters; and finally, the SCASL-BiGRU model is trained. The main conclusions are as follows:
(1) Based on the real data of a region in Gansu Province, a multi-dimensional feature set including temperature, wet-bulb temperature, light intensity and electricity price was created. The PCC was then used to analyse the magnitude of the relationship between the load and the environmental factors, selecting the external factors that have a large impact on the predicted load and, finally, the input parameters for training the model.
(2) A BiGRU model with bidirectional recurrent features is selected for training, which improves the utilization of historical parameters and can deeply explore the potential features in the data. The SCASL algorithm was then chosen to optimize the hyper-parameters of the BiGRU network model, and the optimized BiGRU model was verified to have improved predictive ability.
The final result is that the SCASL-BiGRU model proposed in this paper improves short-term load forecasting accuracy compared with load forecasting methods based on the GRU model and on GA, PSO and other algorithms.

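Pearson correlation coefficient: The Pearson correlation coefficient measures the degree of correlation between each variable and the load. For a variable $x$ and the load $y$ with $n$ samples and means $\bar{x}$ and $\bar{y}$, it is calculated as

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$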
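A minimal sketch of computing the Pearson correlation coefficient between one candidate input variable and the load is given below. The function name and the sample temperature and load values are illustrative assumptions, not data from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    temperature = [2.0, 4.0, 6.0, 8.0]      # hypothetical feature values
    load = [410.0, 395.0, 380.0, 372.0]     # hypothetical load values
    print(pearson(temperature, load))       # close to -1: strong negative correlation
```

Variables whose coefficient has a large absolute value would be kept as model inputs, and those near zero would be dropped.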

Figure 3
Figure 3 shows that the models differ in prediction accuracy: the SCASL-BiGRU model predicts closest to the real values, and the other three comparison models predict less accurately.
Here, $\odot$ is the Hadamard product, i.e. the product of corresponding elements.

Table 1. Correlation analysis of influencing factors and power.

Table 2. Average error comparison of the models over 20 random runs.
From the results presented in Table 2, it can be concluded that the SCASL-BiGRU predictions have the lowest error of all the models. Compared with the other three models, the MAPE of SCASL-BiGRU is lower by 0.71%, 0.37% and 0.25%, its MAE is lower by 26.91, 23.54 and 11.61, and its RMSE is lower by 25.52, 19.09 and 16.84, respectively. The three indicators are highly consistent, and the lower values indicate that the model is accurate in its prediction results. SCASL-BiGRU has the smallest error and is closest to the actual data, which shows that the SCASL-BiGRU network model works better than the comparison models.