Short-term wind power scenario forecasting based on the multivariate normal distribution

As wind power penetration increases, the randomness and fluctuation of wind power have a growing influence on the power grid, and a single short-term wind power point prediction often cannot meet the needs of power grid risk assessment and decision-making. In this paper, we first estimate the probability model of each wind power forecast box using the ecdf function in MATLAB. We then use an exponential covariance function to determine the best covariance matrix for the dynamic scenarios, which fixes the multivariate normal distribution obeyed by the wind farm output at multiple consecutive moments. For each predicted moment, we identify the forecast box to which the wind power point prediction value belongs and directly sample random vectors that obey the multivariate normal distribution to form the wind power dynamic scenarios. A simulation experiment on a real wind farm shows that the scenario set, which accounts for wind power fluctuation at different time scales, covers the measured wind power curve, which supports the reliability of the method.


Introduction
As global environmental damage and energy shortages become more prominent, the development and utilization of renewable energy sources have attracted widespread attention. Because of the rapid rise of wind power generation and its high degree of randomness and intermittency [1], the impact of large-scale wind power integration on the power grid has become increasingly obvious, and wind curtailment in China has become a serious problem. Wind power forecasting is widely recognized as one of the most essential technologies for improving the operation of systems with large-scale wind power access. Accurately predicting the output of wind farms is therefore of great value for the stable and economical operation of power grids with large wind farms.
Research on and application of wind power forecasting in China have achieved a breakthrough from scratch over the past 10 years [2]. Currently, there are two major types of prediction methods for wind farm output: physical methods and statistical methods. The forecast forms mainly include point prediction, probability prediction, and interval prediction. The literature [3] uses improved particle swarm optimization to tune the parameters that affect the regression function of the least squares support vector machine, forming a novel wind farm output point prediction model. The literature [4] proposes a method to predict the short-term output probability of wind farms based on sparse Bayesian learning, which actively prevents over-learning and does not require setting the penalty factor that balances empirical risk and confidence interval in SVM. The literature [5] carries out statistics on the probability density function of the prediction deviation in each power interval and proposes a confidence interval prediction strategy at a given significance level based on the distribution of the prediction deviation. Although forecasting methods for wind farm output have been discussed extensively, each single forecasting model has its own problems [6]. Point prediction gives only a single value of the wind farm output that meets certain accuracy requirements at a certain moment; it cannot give the probability that this value occurs or the bounds within which it may fluctuate. Probabilistic prediction gives a probability distribution that approximates a certain probability model, but it cannot reveal the range over which the prediction result may vary. Interval prediction gives the upper and lower limits of wind power at a certain confidence level, but it cannot describe how wind power fluctuates from one moment to the next. Moreover, these forms share a common problem: they do not consider the correlation between wind power values in successive periods. This correlation means that wind power does not fluctuate arbitrarily, and that the amplitude and range of its fluctuations follow a certain probability law.
In view of this research status, this paper exploits this correlation and adopts a short-term wind power scenario prediction method based on the multivariate normal distribution. Taking the wind power point prediction as known, a multivariate normal distribution forecasting model is established. Daily power scenario prediction is then performed using data from Ireland wind farms in 2016, which demonstrates the effectiveness of the method.

Wind Power ECDF Model
Studies of the probability distribution of wind power are mostly directed at specific wind farms. Because wind farms differ in geographical environment, station conditions and meteorological conditions, their probability distribution models are quite different. Therefore, the probability distribution model of wind power does not have a universal analytical distribution function. For the power system, determining the probability distribution of wind power is a process of statistical inference from its observed variations. The wind power can be regarded as a random variable p. Let F(p) be the theoretical distribution of this random variable and $F_l(p)$ the empirical distribution function built from l observations. Glivenko proved the following theorem in 1933 [6]:

$$P\left\{ \lim_{l \to \infty} \sup_{-\infty < p < +\infty} \left| F_l(p) - F(p) \right| = 0 \right\} = 1 \qquad (1)$$

The theorem states that when the number of observations l is large enough, $F_l(p)$ is very close to F(p). On the one hand, this shows that the population can be inferred from the sample; on the other hand, it shows that when the sample size is large enough, the empirical distribution function is very close to the actual theoretical distribution. This is the theoretical basis for estimating the probability distribution of wind power using a non-parametric empirical distribution.
When the theoretical distribution of wind power is not known, let $x_1, x_2, \ldots, x_l$ be a sample of the wind power random variable p, with order statistics $x_{(1)} \le x_{(2)} \le \cdots \le x_{(l)}$. The cumulative empirical distribution function of wind power is then:

$$F_l(p) = \begin{cases} 0, & p < x_{(1)} \\ i/l, & x_{(i)} \le p < x_{(i+1)}, \quad i = 1, \ldots, l-1 \\ 1, & p \ge x_{(l)} \end{cases} \qquad (2)$$

To verify the effectiveness of the empirical distribution, we randomly generate one thousand and ten thousand standard normal random numbers in the MATLAB environment and obtain the cumulative empirical distribution of each set through formula (2). The comparison with the normal distribution function is shown in Figures 1 and 2. In both cases the empirical distribution closely follows the normal distribution function.
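The verification experiment above can be reproduced in outline with a short script. The following is a minimal sketch in Python rather than MATLAB, using only the standard library; the function names, the fixed seed and the use of the largest ECDF-to-CDF gap as the comparison measure are illustrative assumptions, not part of the original experiment.

```python
import bisect
import math
import random


def make_ecdf(samples):
    """Return the empirical CDF F_l of the given samples as a callable."""
    ordered = sorted(samples)
    n = len(ordered)

    def ecdf(x):
        # Fraction of observations less than or equal to x.
        return bisect.bisect_right(ordered, x) / n

    return ecdf


def normal_cdf(x):
    """Standard normal CDF, used here as the known theoretical F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_distance(samples):
    """Largest gap between the ECDF of the samples and the standard normal CDF."""
    ordered = sorted(samples)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        f = normal_cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x, so check both sides.
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap


rng = random.Random(0)  # fixed seed, chosen only for repeatability
small = [rng.gauss(0.0, 1.0) for _ in range(1000)]
large = [rng.gauss(0.0, 1.0) for _ in range(10000)]
print("gap with 1000 samples: ", sup_distance(small))
print("gap with 10000 samples:", sup_distance(large))
```

By the Glivenko theorem the gap tends to zero as the sample grows, so both printed values should be small, although for any single pair of random draws the larger sample is not guaranteed to give the smaller gap.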

Wind Power Forecast Box
The data used in this paper were provided by the Ireland Power Company and were standardized before use. This article draws on the modeling idea in the literature [7]. The point prediction values of wind power are first sorted from largest to smallest and divided into several "value intervals"; according to the size of its predicted value, each data pair [measured value, predicted value] is then placed in the corresponding interval. For the data used in this paper, the interval length is 0.05 p.u. and there are 19 intervals. All data pairs in one interval are called a "forecast box". Within a forecast box, the prediction values of all data pairs are very close, but the corresponding measured values are quite different. We examine the measured wind power in each forecast box to see whether it obeys a known analytical distribution function. Figure 3 shows that the distribution of measured wind power in the 7th forecast box is close to the normal distribution, whereas Figure 4 shows that the distribution in the 18th forecast box is closer to the Beta distribution, so no single theoretical distribution can be assumed for all boxes. In this paper, the non-parametric empirical distribution is therefore used to approximate the theoretical distribution of the measured wind power values.
Direct sampling is the simplest and most common method of generating random variables. Its theoretical basis is the principle of probability integral transformation, which generates random variables by inverting the cumulative distribution function.
Let the cumulative distribution function of the random variable x be $F(x) = P(X \le x)$. Then the range of F(x) is [0, 1]. To generate sample values of x, random numbers that follow the uniform distribution U[0, 1] are first drawn and taken as values of the distribution function F(x); the sample values of x are then obtained by evaluating the inverse function of F(x) at these numbers, as shown in Figure 5. This paper works with a large number of random variables $X_t$ that obey the multivariate normal distribution. Since the cumulative distribution function values $F(X_t)$ of these random numbers obey the uniform distribution U[0, 1], formulas (4) and (5) can be used to obtain the sampled value of the random variable $X_t$.
These formulas rely on the multivariate normal density $f(X) = (2\pi)^{-d/2} |\Sigma|^{-1/2} \exp\left( -\tfrac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right)$, where d is the dimension of the random variable; ∑ represents the covariance matrix of the multivariate normal distribution; |∑| represents the determinant of the covariance matrix; and ∑^{-1} represents the inverse of the covariance matrix.
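The forecast-box grouping and the direct sampling step described in this section can be sketched together as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`build_forecast_boxes`, `empirical_quantile`, `sample_power`) are hypothetical, the normal variate is assumed to have zero mean and unit variance so that its CDF value is uniform on [0, 1], and the inverse of the empirical distribution is taken as the smallest observation whose ECDF value reaches the uniform number.

```python
import math


def build_forecast_boxes(pairs, width=0.05):
    """Group (forecast, measured) pairs into boxes by forecast value.

    Box b collects the measured values whose forecast lies in
    [b * width, (b + 1) * width). Values are sorted so that each box
    can be used directly as an empirical quantile table.
    """
    boxes = {}
    for forecast, measured in pairs:
        index = int(forecast / width + 1e-9)  # small offset guards float rounding
        boxes.setdefault(index, []).append(measured)
    return {index: sorted(values) for index, values in boxes.items()}


def normal_cdf(x):
    """Standard normal CDF (assumes a zero-mean, unit-variance variate)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def empirical_quantile(sorted_values, u):
    """Inverse ECDF: smallest observation x whose ECDF value is at least u."""
    n = len(sorted_values)
    rank = max(1, math.ceil(u * n))  # 1-based rank of the quantile
    return sorted_values[min(rank, n) - 1]


def sample_power(forecast, normal_value, boxes, width=0.05):
    """Map one normal variate to a power value through the forecast's box."""
    box = boxes[int(forecast / width + 1e-9)]
    u = normal_cdf(normal_value)  # probability integral transform
    return empirical_quantile(box, u)


# Toy data in per-unit values, invented purely for illustration.
pairs = [(0.31, 0.25), (0.32, 0.40), (0.33, 0.30), (0.34, 0.35), (0.12, 0.10)]
boxes = build_forecast_boxes(pairs)
print(sample_power(0.32, 0.0, boxes))
```

A normal value of zero maps to the median of the box, while large positive or negative values map to the largest or smallest measured value in the box, which is how the spread of the measured data inside a box carries over to the sampled power.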

Multivariate Normal Distribution of Wind Power
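The output power of the wind farm can be regarded as a random process. Let the random vector $X = (X_1, X_2, \ldots, X_k)^T$, where k is the number of time steps, which is 96 in this paper. Based on the Central Limit Theorem [5], we assume that the random vector X obeys the multivariate normal distribution X~N(µ, ∑), where the mean vector µ can be obtained from the historical data of wind power. The covariance matrix ∑ is:

$$\Sigma = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,k} \\ \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{k,1} & \sigma_{k,2} & \cdots & \sigma_{k,k} \end{bmatrix} \qquad (7)$$

where $\sigma_{m,n}$ represents the covariance of the random variables $X_m$ and $X_n$, with m, n = 1, 2, ..., k.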

Determination of Covariance Matrix
This article defines the wind power fluctuation per unit time interval as the difference between consecutive power values, $\Delta p_t = p_{t+1} - p_t$. The wind power fluctuation Δp can also be regarded as a random variable. The literature [8] shows that the Student's t distribution can properly describe the change of wind power. This paper uses the distribution fitting tool in the MATLAB statistical toolbox to fit a variety of probability densities, with their corresponding probability plots, to the measured wind power fluctuation values, as shown in Figure 6 and Figure 7. The Kolmogorov-Smirnov distance [10] is used to diagnose the quality of each fit. The results show that the distribution of measured wind power fluctuations has a pronounced peak and heavy tails. The probability density plot, the probability plot and Table 1 all indicate that the t Location-Scale distribution fits better than the other distributions and describes the measured wind power fluctuations more accurately.
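The peaked, heavy-tailed shape reported above can be checked numerically before any distribution is fitted. The sketch below is an illustrative aid rather than the paper's procedure: it assumes the fluctuation is the first difference of the power series and uses the sample excess kurtosis (zero for a normal distribution, positive for heavier tails) as a simple indicator. The function names and the toy series are hypothetical.

```python
def fluctuations(power):
    """Change in power between consecutive time steps (first differences)."""
    return [later - earlier for earlier, later in zip(power, power[1:])]


def excess_kurtosis(values):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


# Toy per-unit power series, invented for illustration only.
power = [0.30, 0.31, 0.31, 0.30, 0.45, 0.44, 0.44, 0.45, 0.30, 0.31]
delta_p = fluctuations(power)
print("excess kurtosis of fluctuations:", excess_kurtosis(delta_p))
```

A clearly positive value is consistent with a distribution such as the t Location-Scale family, whereas a value near zero would not by itself rule out a normal fit.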
This article follows the approach of the literature [11] and models the covariance between the random variables $X_t$ as an exponential function of their time lag. In this model, ε is the time scale parameter that controls the correlation between the random variables $X_t$. To make the fluctuations of the wind power dynamic scenarios follow the statistics of the measured wind power fluctuations as closely as possible, the best time scale parameter ε must be estimated. Once ε is fixed, the covariance matrix ∑ is uniquely determined. A random vector that obeys the multivariate normal distribution X~N(µ,∑) can then be generated with the mvnrnd function in MATLAB to form the wind power dynamic scenario vector. Next, the fluctuation value vector Δp_d of the wind power dynamic scenarios is compared with the historical fluctuation value vector Δp of the wind power. Taking Δp_d as the simulated distribution and Δp as the actual distribution, the KL divergence is calculated using the following formula:

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \ln \frac{P(i)}{Q(i)}$$

where P is the actual distribution of the data and Q is the approximate distribution of the data. The KL divergence measures how closely the two probability models agree. When the two distributions are identical, their KL divergence is zero; as the difference between them increases, their KL divergence also increases. Therefore, the value of ε that minimizes the KL divergence is taken as the best estimate of the time scale parameter, and it determines the covariance matrix ∑ of the exponential model.
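The selection of the time scale parameter can be sketched as a search over candidate values. The code below is a minimal illustration under several assumptions that are not confirmed by the text: the exponential model is taken in the commonly used form σ(i, j) = exp(−|i − j| / ε), the two fluctuation distributions are compared through histograms on shared bin edges, and `simulate` is a hypothetical callable that returns the scenario fluctuations Δp_d for a given ε.

```python
import bisect
import math


def exponential_covariance(k, epsilon):
    """k-by-k matrix with entries exp(-|i - j| / epsilon) (assumed model form)."""
    return [[math.exp(-abs(i - j) / epsilon) for j in range(k)] for i in range(k)]


def histogram(values, edges):
    """Relative frequencies of values over the bins defined by sorted edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # ignore values outside the binned range
        b = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[b] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def kl_divergence(p, q, floor=1e-12):
    """D_KL(P || Q) = sum_i P(i) * ln(P(i) / Q(i)) for discrete P and Q.

    Empty bins of Q are floored to avoid division by zero.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / max(qi, floor))
    return total


def select_epsilon(candidates, historical, simulate, edges):
    """Return the candidate epsilon whose simulated fluctuations minimize KL."""
    p = histogram(historical, edges)
    return min(
        candidates,
        key=lambda eps: kl_divergence(p, histogram(simulate(eps), edges)),
    )
```

The floor on Q is a practical choice for sparse histograms; with enough scenario samples and reasonably wide bins it should rarely be triggered.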

Dynamic Scenarios Generation Steps
Given the set of short-term wind power point prediction values and the empirical distribution function of the measured wind power in each forecast box, the wind power dynamic scenarios are generated as follows:
Step 1: Identify the best covariance time scale parameter ε according to the objective function in equation 9, determine the corresponding optimal covariance matrix of the exponential model, and thereby determine the multivariate normal distribution of wind power X~N(µ,∑).
Step 2: Use the mvnrnd function in the MATLAB Statistics Toolbox to generate d samples of a random vector X that obeys the multivariate normal distribution of wind power.
Step 3: Determine the forecast box to which the wind power point prediction value for each lead time belongs. In this forecast box, use formulas (4) and (5) to obtain the sampled wind power value; the sampled values over all lead times form one wind power dynamic scenario.
First, as shown in Fig. 8, the time scale parameter ε affects the correlation of the multivariate normal random numbers at different time steps, so the choice of ε directly affects the fluctuation of the randomly generated wind power dynamic scenarios. Experiments show that when ε = 234, the KL divergence between the probability distribution of the historical wind power data and that of the dynamic scenarios reaches its minimum of 8.0564, which means that the two probability distributions are as close as possible.
Second, once the optimal time scale parameter is determined, the best covariance matrix is uniquely determined. We then compute the corresponding correlation coefficient matrix in MATLAB and plot it in Figure 9. The figure shows that the correlation strength differs between time steps and that the correlation weakens as the time lag increases. This is consistent with the correlation of wind power at different time intervals.
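The sampling of correlated random vectors in the generation steps can be sketched without MATLAB. The example below is a minimal illustration under assumptions: it uses a hand-written Cholesky factorization in place of mvnrnd, assumes the exponential covariance form exp(−|i − j| / ε) with zero mean and unit variance, and stops at the correlated uniform values, which would then be passed through the inverse empirical distribution of each forecast box. All function names are hypothetical.

```python
import math
import random


def exponential_covariance(k, epsilon):
    """k-by-k matrix with entries exp(-|i - j| / epsilon) (assumed model form)."""
    return [[math.exp(-abs(i - j) / epsilon) for j in range(k)] for i in range(k)]


def cholesky(matrix):
    """Lower-triangular L with L * L^T equal to a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_mvn(lower, rng):
    """One draw from N(0, Sigma), where Sigma = lower * lower^T."""
    n = len(lower)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent standard normals
    return [sum(lower[i][j] * z[j] for j in range(i + 1)) for i in range(n)]


def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def correlated_uniforms(k, epsilon, count, seed=0):
    """Generate `count` vectors of length k with U[0, 1] marginals.

    Neighbouring time steps are correlated through the exponential
    covariance; each uniform value is the normal CDF of one component.
    """
    rng = random.Random(seed)
    lower = cholesky(exponential_covariance(k, epsilon))
    return [[normal_cdf(x) for x in sample_mvn(lower, rng)] for _ in range(count)]


# Small example: 3 scenarios over 6 time steps with an arbitrary epsilon.
for row in correlated_uniforms(k=6, epsilon=4.0, count=3):
    print([round(u, 3) for u in row])
```

With a large ε the uniform values in each row change slowly from one time step to the next, while a small ε makes them nearly independent, which mirrors the role of ε described in the text.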
Based on the best time scale parameter ε and the best covariance matrix ∑, the multivariate normal distribution model obeyed by wind power was determined, and a set of 50 wind power dynamic scenarios was generated from the point prediction values of wind power on a certain day, as shown in Figure 10. Each thin solid line represents one possible wind power scenario generated by the method described in this paper; the thick blue solid line represents the measured wind power values on that day, and the thick red solid line represents the corresponding point prediction values. The figure shows that the set of 50 wind power dynamic scenarios completely contains the measured wind power values. More importantly, the overall trend of the scenario set is very similar to the trend of the measured wind power curve. This demonstrates the effectiveness of the proposed approach.
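The statement that the scenario set contains the measured curve can be turned into a simple numerical check. The sketch below is an illustrative aid, not part of the original experiment: it computes the pointwise envelope of a scenario set and the fraction of time steps at which the measured value lies inside it. The function names and toy numbers are hypothetical.

```python
def envelope(scenarios):
    """Pointwise minimum and maximum over all scenarios at each time step."""
    lower = [min(column) for column in zip(*scenarios)]
    upper = [max(column) for column in zip(*scenarios)]
    return lower, upper


def coverage(scenarios, measured):
    """Fraction of time steps where the measured value lies within the envelope."""
    lower, upper = envelope(scenarios)
    inside = sum(1 for lo, hi, m in zip(lower, upper, measured) if lo <= m <= hi)
    return inside / len(measured)


# Toy example with two scenarios over three time steps (per-unit values).
scenarios = [[0.1, 0.5, 0.3], [0.2, 0.4, 0.6]]
measured = [0.15, 0.45, 0.70]
print(coverage(scenarios, measured))
```

A coverage of 1.0 corresponds to the situation described for Figure 10, where the measured curve stays inside the scenario set over the whole day.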

Conclusion
In this paper, taking the wind power point prediction as given, we explore the correlation of wind power as a random variable at different time intervals together with its inherent statistical properties, and establish an empirical distribution of wind power and a multivariate normal distribution model. On this basis, a forecasting strategy for wind power dynamic scenarios is presented and its effect is illustrated through a calculation example. Wind power scenario prediction can help reduce wind curtailment, increase the utilization of wind energy and reduce system operation risk. It also allows the output of the hydro-thermal power units in the system to be planned flexibly and the spinning reserve capacity to be optimized, thus improving the economic efficiency of system operation.