Method for analysing wind power fluctuation characteristics based on kernel density estimation

Analysing and mastering the inherent laws of wind power output fluctuation helps to improve the accuracy of wind power output prediction, guides the power grid dispatching department in arranging generation plans reasonably, and improves the economic efficiency of system operation. To characterize the probability density distribution of wind power output fluctuation, two adaptive bandwidth kernel density estimation models are established by correcting the fixed bandwidths obtained from the empirical method (rule of thumb) and the unbiased cross-validation method, respectively. The two models are then combined and optimized to establish a probability density distribution model of wind power output fluctuation based on Hybrid Adaptive Kernel Density Estimation (HAKDE). Several probability density distribution models are used to fit the wind power output fluctuations at different spatial and temporal scales in a province in North China. The results show that the HAKDE model gives the best fit, which verifies its effectiveness.


Introduction
With the depletion of global fossil energy and the proposal of the dual-carbon strategic goal [1][2], the utilization of renewable energy has received widespread attention, and wind energy, with its advantages of low cost and cleanliness, has been widely applied and developed around the world. However, wind power is volatile and intermittent, and the continuous growth of grid-connected wind power capacity makes grid dispatching increasingly difficult, so an accurate description of the volatility of wind power output is particularly important.
In recent years, statistical studies of the probability density distribution of wind power output fluctuations have been widely carried out. Depending on whether they rely on a chosen parametric model, the methods for fitting the probability distribution of wind power output fluctuations fall into two main categories: parametric and non-parametric methods.
The parametric method assumes that the wind power output fluctuation follows a specific probability distribution and determines the parameters of that distribution by maximum likelihood estimation, the least squares method or other estimation methods, which yields the final probability density distribution model of wind power output fluctuation. Literature [3] first analyses the distribution law of wind power output plunges and then uses the generalized Pareto distribution model to fit their probability distribution characteristics. Literature [4][5] uses a beta distribution to fit wind power prediction errors.
To address the insufficient precision of fitting with a single distribution, literature [6] uses the mixed t-distribution and the mixed Gaussian distribution, respectively, to describe the fluctuation characteristics of wind power output.
The parametric method usually assumes that the wind power output fluctuation obeys a known probability distribution model and obtains the model parameters through an estimation algorithm. When the assumed model does not match the actual distribution well, however, the error may be very large. Data-driven non-parametric methods provide a new way to solve this problem. In characterizing the data distribution, non-parametric Kernel Density Estimation (KDE) does not depend on the choice of a parametric model [7], requires no assumption about the form of the distribution of wind power output fluctuations, and is able to mine the statistical information in the historical data. Literature [8] proposes a prediction method for the long-term fluctuation characteristics of wind power output using the KDE method and GA-SVM. Literature [9] first extracts the wind power output fluctuation and then models it with the KDE method, with an adaptive improvement.
Building on the above research, this paper uses the empirical method and the unbiased cross-validation method to determine the fixed bandwidth of kernel density estimation. To remedy the low local goodness of fit, each fixed bandwidth is corrected in a targeted way, which gives two adaptive bandwidth kernel density estimation models. The two models are then combined into a hybrid adaptive kernel density estimation model, which is used to fit the probability density distribution of wind power output fluctuation for different cluster sizes and different sampling time scales. The accuracy of the proposed model is verified by comparing the fitting indexes.

Probability distribution of the amount of wind power output fluctuations
Wind power stations in China are widely distributed, and wind power output fluctuates unpredictably. The fluctuation amount intuitively characterizes how smooth the generation of a wind power station is: the smaller its value, the more stable the wind power output and the smaller the impact of grid-connected operation on the system; conversely, the larger its value, the greater the impact.
Figure 1 shows the time series of wind power output in a province in North China. Applying a first-order difference to this series gives the time series of the first-order differential fluctuation of wind power output, shown in Figure 2. The expression for the first-order differential fluctuation is given in Equation (1).
where $\Delta P_t$ is the first-order differential fluctuation of wind power output at time t, and $P_t$ is the wind power output at time t.
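As an illustration of the first-order difference described above, a minimal sketch is given below. The forward-difference convention (the value at t is the output at t+1 minus the output at t), the function name and the sample values are assumptions made for illustration; the text does not state which index convention Equation (1) uses.

```python
def first_order_fluctuation(power):
    """Return the first-order differences of a power series.

    Assumption: forward difference, dP[t] = P[t+1] - P[t].
    """
    return [later - earlier for earlier, later in zip(power, power[1:])]


# Hypothetical normalized output samples (p.u.), for illustration only.
p = [0.40, 0.42, 0.39, 0.45]
dp = first_order_fluctuation(p)
print(dp)
```

A series of n output samples gives n-1 fluctuation samples, which are the inputs to the density models discussed later.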

Adaptive bandwidth kernel density estimation model
Let $x_1, x_2, \ldots, x_n$ be n samples of wind power output fluctuation. Using the Gaussian function as the kernel function gives the kernel density estimation function of wind power output fluctuation, see Equation (2).
where $\hat{f}_h(x)$ is the kernel density estimation function of wind power output fluctuation, $K(\cdot)$ is the kernel function, h is the bandwidth, $x_m$ is the mth sample value of wind power output fluctuation, and n is the total number of samples.
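A fixed-bandwidth Gaussian kernel density estimate can be sketched as follows, using the textbook form in which the estimate at x is the average of the kernels centred on the samples, each scaled by 1/h. The function names and sample values are hypothetical, and plain Python is used only to keep the sketch free of dependencies.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(.)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Fixed-bandwidth estimate: (1 / (n h)) * sum of K((x - x_m) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xm) / h) for xm in samples) / (n * h)


# Hypothetical fluctuation samples (p.u.) and bandwidth.
samples = [-0.04, -0.01, 0.0, 0.005, 0.02, 0.05]
print(kde(0.0, samples, h=0.02))
```

Because each kernel integrates to one, the estimate is itself a valid density for any positive h.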
In KDE, the choice of bandwidth determines the smoothness of the fitted curve: a larger bandwidth gives a smoother curve but a poorer fit to local detail. At present, the main methods for determining a fixed bandwidth are the rule of thumb and the unbiased cross-validation method [10], see Equations (3) and (4).
(1) Rule of thumb: where $h_1$ is the fixed bandwidth obtained using the rule of thumb; σ is the standard deviation of the wind power output samples under a normal-distribution assumption; and n is the total number of samples.
(2) Unbiased cross-validation method: where argmin denotes the value of the argument that minimizes the function; $h_2$ is the fixed bandwidth obtained using the unbiased cross-validation method; K(v)=∫K(u)K(v-u)du is the convolution kernel function derived from K(.).
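A rule-of-thumb bandwidth of this kind can be sketched as below. The constant 1.06 and the exponent -1/5 are those of Silverman's normal-reference rule and are an assumption here, since Equation (3) is not reproduced in this text; the function name and data are hypothetical.

```python
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth h1 = 1.06 * sigma * n ** (-1/5).

    Assumption: Silverman's constant 1.06, with sigma taken as the
    sample standard deviation.
    """
    n = len(samples)
    sigma = statistics.stdev(samples)
    return 1.06 * sigma * n ** (-0.2)


# Hypothetical fluctuation samples (p.u.).
data = [-0.04, -0.02, -0.01, 0.0, 0.005, 0.01, 0.03, 0.05]
h1 = rule_of_thumb_bandwidth(data)
print(h1)
```

The rule is cheap to evaluate but tends to oversmooth heavy-tailed or multimodal data, which is one motivation for a local correction of the bandwidth.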
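The unbiased cross-validation criterion can be sketched as a search over candidate bandwidths. The sketch below uses the standard least-squares cross-validation score for a Gaussian kernel, whose self-convolution is the normal density with variance 2. The score formula, the candidate grid and all names are assumptions, since Equation (4) is not reproduced in this text.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def convolved_kernel(v):
    """Integral of K(u) K(v - u) du for Gaussian K: the N(0, 2) density."""
    return math.exp(-0.25 * v * v) / math.sqrt(4.0 * math.pi)


def ucv_score(h, samples):
    """Cross-validation score: integrated squared estimate minus twice
    the mean leave-one-out density at the samples."""
    n = len(samples)
    conv_sum = 0.0
    loo_sum = 0.0
    for i, xi in enumerate(samples):
        for j, xj in enumerate(samples):
            v = (xi - xj) / h
            conv_sum += convolved_kernel(v)
            if i != j:
                loo_sum += gaussian_kernel(v)
    return conv_sum / (n * n * h) - 2.0 * loo_sum / (n * (n - 1) * h)


def ucv_bandwidth(samples, candidates):
    """Return the candidate bandwidth with the smallest score."""
    return min(candidates, key=lambda h: ucv_score(h, samples))


# Hypothetical samples (p.u.) and candidate grid.
data = [-0.04, -0.02, -0.01, 0.0, 0.005, 0.01, 0.03, 0.05]
candidates = [0.005, 0.01, 0.02, 0.04, 0.08]
h2 = ucv_bandwidth(data, candidates)
print(h2)
```

A grid search is used for simplicity; the double loop costs O(n^2) per candidate, so a large data set would call for a vectorized or binned evaluation.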
The KDE models obtained with the two fixed bandwidths $h_1$ and $h_2$ have a low local goodness of fit. Therefore, after the two fixed bandwidths are derived, each is corrected in a targeted way and the adaptive bandwidth of each method is solved. The improvement strategy is as follows:
1) After a fixed bandwidth $h_0$ is obtained, the whole sample interval is divided into k non-overlapping subsample intervals $l_1, l_2, \ldots, l_k$, and the model error in each subsample interval is checked. Any subsample interval that does not satisfy the following condition is said to have a low goodness of fit.
where $\chi^2_{l,i}$ is the $\chi^2$ test statistic for the $l_i$ subinterval; $\chi^2_{m-1}$ is the $\chi^2$ distribution with m-1 degrees of freedom at significance level α, and α is taken to be 0.05 in this paper. Dividing the $l_i$ subinterval into groups, the mathematical expression for $\chi^2_{l,i}$ is:
where $t_{ij}$ is the actual frequency of samples in the jth group of the $l_i$ subinterval; $d_{ij}$ is the number of samples in the group; and $F_{ij}$ is the theoretical probability value of the group.
2) To improve the goodness of fit, a local bandwidth factor $\lambda_{l,i}$ is introduced in each interval with a low goodness of fit. $\lambda_{l,i}$ can be obtained from the probability densities at each sample point computed with the fixed bandwidth [11]:
where $d_i$ is the number of samples in the $l_i$ subinterval. The adaptive bandwidth $h_{l,i}$ is obtained by multiplying the local bandwidth factor $\lambda_{l,i}$ by the fixed bandwidth $h_0$:
3) Modify Eq. (2) to Eq. (9) to obtain the adaptive kernel density estimation model:
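The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact formulas, which are not reproduced in this text: the goodness-of-fit check uses the ordinary Pearson statistic, the local factor follows the widely used Abramson-type form (pilot density divided by its geometric mean over the subinterval, raised to the power -alpha with alpha = 0.5), and samples outside flagged subintervals keep the fixed bandwidth. All names and data are hypothetical.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(x, samples, h0):
    return sum(gaussian_kernel((x - xm) / h0) for xm in samples) / (len(samples) * h0)


def chi2_statistic(observed_counts, model_probs):
    """Pearson statistic: sum over groups of (observed - expected)^2 / expected."""
    total = sum(observed_counts)
    return sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed_counts, model_probs))


def local_factors(samples, h0, flagged, alpha=0.5):
    """Bandwidth factor per sample; 1.0 outside the flagged subintervals.

    flagged: list of (low, high) subintervals that failed the check.
    Assumption: Abramson-type factor (pilot / geometric mean) ** (-alpha).
    """
    factors = [1.0] * len(samples)
    for low, high in flagged:
        idx = [m for m, xm in enumerate(samples) if low <= xm < high]
        if not idx:
            continue
        pilot = [fixed_kde(samples[m], samples, h0) for m in idx]
        geo_mean = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
        for m, p in zip(idx, pilot):
            factors[m] = (p / geo_mean) ** (-alpha)
    return factors


def adaptive_kde(x, samples, h0, factors):
    """Each sample m contributes a kernel with its own bandwidth factor * h0."""
    n = len(samples)
    return sum(gaussian_kernel((x - xm) / (lam * h0)) / (lam * h0)
               for xm, lam in zip(samples, factors)) / n


# Hypothetical data: one subinterval is flagged because its statistic
# exceeds 9.488, the 0.05-level critical value for 4 degrees of freedom.
samples = [-0.30, -0.05, -0.02, 0.0, 0.01, 0.03, 0.06, 0.28]
h0 = 0.05
stat = chi2_statistic([14, 3, 2, 3, 8], [0.2, 0.2, 0.2, 0.2, 0.2])
flagged = [(-0.1, 0.1)] if stat >= 9.488 else []
factors = local_factors(samples, h0, flagged)
print(adaptive_kde(0.0, samples, h0, factors))
```

With this form, samples in dense regions receive a factor below one (narrower kernels) and samples in sparse regions a factor above one, which sharpens the peak while smoothing the tails.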

Hybrid adaptive kernel density estimation models
To exploit the advantages of both bandwidth selection methods, the two adaptive kernel density estimation functions obtained above are combined to establish a hybrid adaptive kernel density estimation optimization model. The probability density functions estimated with different bandwidths have different errors, and these errors can partly cancel each other in a weighted combination, so the hybrid adaptive kernel density estimate $f_{HAKD}(x)$ fits the probability density distribution of wind power output fluctuations more accurately. The specific expression for $f_{HAKD}(x)$ is given in Equation (10).
where $\hat{f}_1(x)$ is the adaptive bandwidth kernel density estimation model obtained by correcting the fixed bandwidth $h_1$ from the empirical method; $\hat{f}_2(x)$ is the adaptive bandwidth kernel density estimation model obtained by correcting the fixed bandwidth $h_2$ from the unbiased cross-validation method; $\beta_1$, $\beta_2$ are the weight coefficients and satisfy:
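A weighted combination of two density estimates can be sketched as below. The constraint that the two weights sum to one, the non-negative weights, and the choice of weight by a grid search that minimizes the squared error against the histogram heights are all assumptions; the text does not reproduce the constraint or state how the weights are optimized. Names and numbers are hypothetical.

```python
def hybrid_density(f1_vals, f2_vals, beta1):
    """Pointwise combination beta1 * f1 + beta2 * f2, with beta2 = 1 - beta1."""
    beta2 = 1.0 - beta1
    return [beta1 * a + beta2 * b for a, b in zip(f1_vals, f2_vals)]


def best_weight(f1_vals, f2_vals, target, steps=100):
    """Grid search over beta1 in [0, 1] minimizing the sum of squared errors."""
    def sse(beta1):
        combined = hybrid_density(f1_vals, f2_vals, beta1)
        return sum((y - f) ** 2 for y, f in zip(target, combined))
    return min((k / steps for k in range(steps + 1)), key=sse)


# Hypothetical density values of the two adaptive models at three bin
# centres, and hypothetical histogram heights at the same centres.
f1 = [0.8, 2.2, 0.9]
f2 = [1.2, 1.8, 1.1]
histogram = [1.1, 1.9, 1.05]
beta1 = best_weight(f1, f2, histogram)
print(beta1, 1.0 - beta1)
```

Keeping the weights non-negative and summing to one ensures that the combination of two densities is again a density.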

Modelling process of probability density distribution of wind power output fluctuations based on HAKDE
The flow chart of the proposed method for modelling the probability density distribution of wind power output fluctuation is shown in Figure 4. First, the empirical method and the unbiased cross-validation method are used to determine the fixed bandwidths of the kernel density estimation model, $h_1$ and $h_2$, respectively. Because the kernel density estimation model obtained with either fixed bandwidth fits poorly in some local regions, each fixed bandwidth is then corrected within the subintervals to obtain the adaptive bandwidth of each method. Finally, the hybrid adaptive kernel density estimation model is obtained by weighted combination.

Data description
In order to verify the validity and accuracy of the probability density distribution model of wind power output fluctuations based on HAKDE proposed in this paper, the wind power output data of wind farms in a province in North China, with a total installed capacity of 4764 MW, are analyzed as samples and normalized.
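The normalization step can be sketched as a per-unit conversion. Dividing by the total installed capacity is an assumption, since the text only states that the data are normalized; the function name and the sample values are hypothetical, while the capacity of 4764 MW is taken from the text.

```python
INSTALLED_CAPACITY_MW = 4764.0  # total installed capacity stated in the text


def to_per_unit(power_mw, capacity_mw=INSTALLED_CAPACITY_MW):
    """Convert output in MW to per-unit values.

    Assumption: normalization by installed capacity.
    """
    return [p / capacity_mw for p in power_mw]


# Hypothetical output samples in MW.
print(to_per_unit([4764.0, 2382.0, 0.0]))
```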

Fitting evaluation indicators
To evaluate the goodness of fit of the probability density distribution function, this paper adopts three evaluation indexes to verify the validity of the model: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R-square). The smaller the RMSE and MAE, the more accurate the model; the closer R-square is to 1, the better the model fit.
The formulas for the three evaluation indexes are as follows: where i = 1, 2, ..., n; n is the number of sample points; $y(x_i)$ is the ordinate (height) of the ith histogram bar; $\hat{f}(x_i)$ is the corresponding value of the fitted probability density function; and $\bar{y}$ is the mean of the ordinates of the histogram.
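The three indexes can be sketched with their usual definitions: MAE as the mean absolute deviation between histogram heights and fitted values, RMSE as the square root of the mean squared deviation, and R-square as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the numbers are hypothetical.

```python
import math


def fit_metrics(heights, fitted):
    """Return (MAE, RMSE, R-square) of fitted values against histogram heights."""
    n = len(heights)
    errors = [y - f for y, f in zip(heights, fitted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_height = sum(heights) / n
    total = sum((y - mean_height) ** 2 for y in heights)
    r_square = 1.0 - sum(e * e for e in errors) / total
    return mae, rmse, r_square


# Hypothetical histogram heights and fitted density values.
mae, rmse, r2 = fit_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(mae, rmse, r2)
```

In this small example the only error is 1 in the last bin, so MAE is 1/3, RMSE is the square root of 1/3, and R-square is 1 - 1/2 = 0.5.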

Characterization of the probability density distribution of wind power output fluctuations on different time scales
To analyse the probability density distribution characteristics of wind power output fluctuations at different time scales, the total output of the wind farms in the province is taken as the object of study. Histograms of the probability density distribution of wind power output fluctuations are compiled for 5-minute and 15-minute sampling intervals. The fluctuations at each sampling interval are then modelled with the method proposed in this paper and, for comparison, with the normal distribution, the Logistic distribution, the mixed Gaussian distribution, and traditional kernel density estimation.
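The histogram against which the candidate models are compared can be sketched as a density histogram, in which each bar height is the bin count divided by the product of the sample count and the bin width, so that the bars enclose unit area. The bin range, bin count, function name and data are hypothetical.

```python
def density_histogram(values, low, high, bins):
    """Return (bin centres, density heights) on equal-width bins in [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            k = min(int((v - low) / width), bins - 1)  # put v == high in the last bin
            counts[k] += 1
    n = sum(counts)
    centres = [low + (k + 0.5) * width for k in range(bins)]
    heights = [c / (n * width) for c in counts]
    return centres, heights


# Hypothetical fluctuation samples (p.u.).
fluct = [-0.034, -0.012, 0.001, 0.004, 0.018, 0.041]
centres, heights = density_histogram(fluct, -0.05, 0.05, 5)
print(list(zip(centres, heights)))
```

The heights at the bin centres are the values that the fitted density functions are compared with when the evaluation indexes are computed.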
The comparison of the probability density curves of the wind power output fluctuation in the province at the 5-min sampling interval is shown in Figure 5. The error results are shown in Table 1. As can be seen from Figure 5, the 5-min wind power output fluctuations are concentrated in the range of smaller values: most of them lie within -0.25~0.25 p.u., and their probability distribution is symmetric overall with a "thick tail".
From Table 1, it can be seen that among the above distributions, the model in this paper has the best fitting effect, and all three of its indexes are optimal. The mean absolute error and root mean square error of the normal distribution, Logistic distribution and mixed Gaussian distribution are higher than those of the other two models, and their coefficients of determination are lower. This shows that if the prior distribution is chosen incorrectly, it is difficult for the parametric estimation method to achieve good modelling accuracy.
The normal distribution, Logistic distribution, mixed Gaussian distribution, traditional kernel density estimation, and the model in this paper were each used to fit the probability density histogram of the wind power output fluctuation in the province at the 15-min sampling interval, as shown in Figure 6.
As can be seen from Figure 6, the wind power output fluctuations are distributed symmetrically about 0, and the maximum value of fluctuation is about 0.05. The results of the fitting accuracy evaluation are shown in Table 2.
From Table 2, it can be concluded that the hybrid model in this paper fits the histogram of the probability density of wind power output fluctuation to a higher degree than the other three distribution functions.
Combined with Figure 7 and Table 3, it can be seen that the RMSE and MAE of the proposed model are the smallest and close to 0, which indicates that the proposed model fits the probability density of a single wind farm most accurately. Its R-square is 1.9% higher than that of traditional kernel density estimation and is the closest to 1, which indicates the best goodness of fit. As can be seen from Figure 8 and Table 4, when the proposed model is applied to the wind farm group, its fitting effect is still the best: RMSE and MAE are substantially lower than those of traditional kernel density estimation, and R-square differs from 1 by only 0.0018. The proposed model therefore still fits the probability density of the wind farm group most accurately. As the size of the cluster increases, the fluctuation of wind power output decreases. The main reason is that the same wind passes through different wind farms at different times, so the outputs of the wind farms complement each other in time.

Conclusion
To address the existing problems in the accuracy of wind power output prediction in China, this paper proposes a wind power fluctuation characterization model for multiple temporal and spatial scales based on hybrid adaptive kernel density estimation. The following conclusions are obtained:
(1) For the distribution characteristics of wind power output fluctuation at multiple temporal and spatial scales, this paper proposes a method to fit the probability density distribution of wind power output fluctuation based on hybrid adaptive kernel density estimation, which effectively improves the goodness of fit and has higher applicability and accuracy.
(2) As the sampling time interval grows, the fluctuation of wind power output shows an increasing trend, and the hybrid adaptive kernel density estimation model fits the histograms of the probability density distribution of wind power output fluctuation at different sampling time intervals well.
(3) As the cluster size expands, the fluctuation of wind power output shows a decreasing trend, and the hybrid adaptive kernel density estimation model fits the wind power output fluctuation of wind farms with different cluster sizes accurately.

Figure 1. Time series of wind power output of a province in North China.

Figure 2. Time series of first-order differential fluctuations of wind power output in a province in North China.

Statistics of the probability distribution of the wind power output fluctuation are compiled and a probability distribution histogram is drawn. A probability density distribution function can then be used to describe the variation law of the wind power output fluctuation, which provides a theoretical basis for the full utilization of wind power. Figure 3 shows the histogram of the probability distribution of the first-order differential fluctuation of wind power output in a province in North China.

Figure 3. Histogram of the probability distribution of the first-order differential fluctuations of wind power output in a province in North China.

Figure 4. Modelling process for probability density distribution of wind power output fluctuation.

Figure 5. Comparison of different probability density curve fits at 5 min sampling interval.

Figure 7. Comparison of different probability density curve fits for a single wind farm.

Figure 8. Comparison of different probability density curve fits for wind farm clusters.

Table 1. Comparison of evaluation indexes of probability density distribution models under 5 min sampling interval.

Table 2. Comparison of evaluation indexes of probability density distribution models under 15 min sampling interval.

Table 3. Comparison of evaluation indexes of probability density distribution models for a single wind farm.

Table 4. Comparison of evaluation indexes of probability density distribution models for the wind farm group.