Evaluation and correction of wind energy forecast in Yangjiang

Wind power generation is one of the most promising new energy sources, and effective short-term wind speed forecasting is important for its use. In this paper, the CMA-GD model is applied to forecast the wind speed over South China in four typical months (January, April, August, and October) of 2022. Taking Longyue Wind Farm in Yangjiang as an example, the simulated wind speed and wind direction are evaluated against the observations of this station at four heights (10 m, 30 m, 50 m, and 70 m). The results show that the CMA-GD model performs well in forecasting the next day’s 24 h wind speed at Longyue Wind Farm. Overall, the correlation coefficients (R) between forecast and observation are 0.77-0.81, and the root-mean-square errors (RMSE) are 1.80-2.08 m·s⁻¹. With the increase in altitude, the simulation effect is a little better. Across months, R ranks as October>August>April>January, and the RMSE and mean absolute error (MAE) rank as October>January>August>April. For diurnal variation, the wind speed simulation at night is better than that in the day. Due to the influence of the subtropical monsoon climate and local mountain microclimate in April and August, there are some deviations in the wind speed simulation during 4-8 p.m. The differences in wind direction between forecast and observation rank as August<January<October<April; the average wind direction differences are 2-17°, with little difference in the vertical direction. After the forecast wind speed is corrected by the random forest (RF) algorithm, the R of wind speed at each height is 0.92, and the RMSEs are 1.07-1.27 m·s⁻¹. The revised wind speed forecast has the same diurnal variation characteristics as the observation, indicating that the random forest algorithm can effectively reduce the model forecast error.


Introduction
As a clean renewable energy, wind energy is one of the new energy sources with the greatest development potential and prospects under current economic and technological conditions. The rational development and utilization of wind energy resources are conducive to solving the global energy crisis and alleviating climate anomalies, environmental pollution, and ecological deterioration. However, wind is a natural phenomenon characterized by randomness, intermittency, and volatility, which brings great challenges to wind power grid connection, seriously affects the safe operation of the power grid, and hinders the development of wind power. Reducing the fluctuation of wind power output and the impact of wind randomness on power grid security, while ensuring the smooth operation of the power grid system, is an important challenge restricting the development of wind power. Accurate wind speed prediction is an important basis for meeting this challenge [1].
Wind speed prediction mainly relies on statistical methods, which collect a large amount of historical data, establish probabilistic statistical models and machine learning models based on time series, and then use these models to predict future data [2]. With the decrease in data storage difficulty and the development of large computers, machine learning has become easier and easier to apply, so this approach is widely used. However, the statistical approach relies only on historical data and does not consider factors such as terrain and climate or the atmospheric circulation that drives wind speed changes. It is therefore not sensitive to weather changes, which leads to a certain lag in predicting wind speed changes caused by strong convective weather. Another approach is to use numerical weather models, such as WRF [3][4] and MM5 [5]. A numerical model can combine physical information (altitude, slope, surface roughness, etc.) to obtain the meteorological elements around the wind farm, which is crucial for obtaining accurate wind power. At present, the wind speed forecast of most numerical models is not accurate enough, particularly for wind in the near-surface layer, and cannot meet the requirements of wind power forecasting systems. The Southern Supervision Bureau of the National Energy Administration has officially implemented a new version of its assessment targets since January 1, 2023. This requires that the error between the actual power generation and the amount of electricity that can be generated be as small as possible. Accurate wind speed forecasting is the foundation of wind power generation prediction. The accuracy of numerical models restricts the development and utilization of wind energy resource forecasting systems, so how to use available resources to further improve numerical models is a difficult problem.
By correcting the model products, the wind speed prediction accuracy of numerical models can be effectively improved. At present, the correction methods commonly used at home and abroad include nonlinear regression, partial least squares estimation, the Kalman filter, machine learning [6], and the model output statistics (MOS) method. The MOS method combines the accuracy of numerical models in forecasting weather situations with the advantages of statistical models in forecasting local weather elements. With the development of computer hardware and machine learning theory, machine learning has gradually replaced the traditional MOS and become the mainstream method of meteorological data error correction. Random forest (RF) is a typical machine learning method [7], which can effectively reduce large model prediction errors. Compared with other machine learning methods, RF is simpler and performs better because it depends on only two parameters: the number of variables in the random subset at each node and the total number of trees in the model.
Southern China is one of the regions with abundant wind energy resources. Improving wind energy forecasting accuracy is conducive to promoting the development of China's wind power industry. Based on the South China Regional Mesoscale model (CMA-GD model), the wind in Yangjiang, Guangdong Province, was simulated for four typical months in 2022. Based on the 15-minute observation data at four heights of Longyue Wind Farm, the prediction ability of this model is evaluated by statistical methods. In addition, RF is used to correct the prediction results. By analyzing the quality characteristics, error causes, and correction effects of the forecast data, this study aims to explore ways to improve the prediction accuracy of wind speed based on the CMA-GD model, provide a reference for improving wind power prediction accuracy, and promote the development of wind power in Southern China.

Data
The geographical locations of the four wind farms in Yangjiang (Longyue (LY), Longgao Mountain (LGS), Lingnan (LN), and Xinye (XY)) are shown in Figure 1. Due to certain deficiencies in the observed data of the LN, XY, and LGS wind farms, this paper uses only the 15-minute wind speed and wind direction data at the four heights (10 m, 30 m, 50 m, and 70 m) of LY in 2022 as the observed data.

Model and experimental design.
The CMA-GD 3 km model, with a horizontal grid size of 0.03° × 0.03° and 65 vertical layers, developed by the Guangzhou Institute of Tropical Marine Meteorology, China Meteorological Administration, is used in this study. This model evolved from the nonhydrostatic mesoscale Global/Regional Assimilation and Prediction System (GRAPES) [8][9]. The model uses a semi-implicit semi-Lagrangian time-differencing scheme and the Charney-Phillips vertical layering scheme, which clearly differs from the Weather Research and Forecasting (WRF) model. In addition, the longitude-latitude grid, the Arakawa-C grid leapfrog scheme, and the height-based terrain-following coordinate are also features of this model. In terms of model physical processes, an improved WSM6 scheme and a modified SAS cumulus convective parameterization scheme with a scale-adaptive process are applied. Additionally, the RRTMG longwave and shortwave radiation scheme and the NMRF boundary layer scheme are included in the model [10]. The model domain covers 96-123.36°E and 16-31.36°N, and the time integration step is 30 s.
In this paper, four typical months in 2022 (January, April, August, and October) are selected for the test. The simulations start at 20:00 (Beijing time, the same below) every day and are integrated for 48 h. The forecast data are output every 15 minutes. The model output heights of the zonal wind (u) and meridional wind (v) are 10 m, 60 m, 107 m, and 182 m. Due to the vertical stratification of the model, the heights of the model wind field output differ from those of the observed data. To achieve an effective comparison with the observed data, the bilinear interpolation method is used to interpolate the model output to the heights of the wind tower of the wind farm (30 m, 50 m, and 70 m).
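The height-matching step can be illustrated with a short sketch. It assumes simple piecewise-linear interpolation in height, applied to u and v separately, which may differ in detail from the bilinear scheme used in the study; the function names and the sample wind profile are hypothetical.

```python
import math

MODEL_HEIGHTS = [10.0, 60.0, 107.0, 182.0]  # model output levels (m), from the text
TOWER_HEIGHTS = [30.0, 50.0, 70.0]          # tower levels that need interpolation (m)

def interp_linear(z, heights, values):
    """Piecewise-linear interpolation of a profile at height z (assumed scheme)."""
    if not heights[0] <= z <= heights[-1]:
        raise ValueError("target height lies outside the model levels")
    for k in range(len(heights) - 1):
        z0, z1 = heights[k], heights[k + 1]
        if z0 <= z <= z1:
            w = (z - z0) / (z1 - z0)
            return (1.0 - w) * values[k] + w * values[k + 1]

def wind_speed(u, v):
    """Horizontal wind speed from zonal (u) and meridional (v) components."""
    return math.hypot(u, v)

def wind_direction(u, v):
    """Meteorological direction in degrees: where the wind blows FROM, 0 = north."""
    return math.degrees(math.atan2(-u, -v)) % 360.0

# Hypothetical model profile (m/s) at the four model levels, for illustration only.
u_model = [-3.0, -4.5, -5.2, -6.0]
v_model = [-4.0, -6.0, -7.1, -8.0]

for z in TOWER_HEIGHTS:
    u = interp_linear(z, MODEL_HEIGHTS, u_model)
    v = interp_linear(z, MODEL_HEIGHTS, v_model)
    print(z, round(wind_speed(u, v), 2), round(wind_direction(u, v), 1))
```

Interpolating the components rather than speed and direction avoids the wrap-around problem at 0°/360°; this is a design choice of the sketch, not something stated in the paper.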

Evaluation test.
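In this study, five evaluation indexes are used to evaluate the wind speed simulation effect: the correlation coefficient (R), root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and relative error (RE). Specific calculation formulas are as follows:

$$R = \frac{\sum_{i=1}^{N}(M_i-\bar{M})(O_i-\bar{O})}{\sqrt{\sum_{i=1}^{N}(M_i-\bar{M})^2}\,\sqrt{\sum_{i=1}^{N}(O_i-\bar{O})^2}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(M_i-O_i)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|M_i-O_i\right|$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{M_i-O_i}{O_i}\right|$$

$$\mathrm{RE}_i = \frac{M_i-O_i}{O_i}\times 100\%$$

where $M_i$ is the simulated value, $O_i$ is the observed value, $\bar{M}$ and $\bar{O}$ are their means, and $N$ is the number of samples. RMSE is sensitive to outliers in the data. MAE represents the mean of the absolute errors between the simulated and observed values. MAPE expresses the absolute error as a percentage of the observed value, which can better reflect the reliability of the measurement. The closer R is to 1 and the smaller the RMSE, MAE, MAPE, and RE are, the better the simulation effect is.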
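The five indexes can be computed as in the following sketch. It assumes the standard definitions of R, RMSE, MAE, and MAPE and a signed per-sample RE relative to the observation; the function name `evaluate` and the sample values are hypothetical.

```python
import math

def evaluate(model, obs):
    """Return R, RMSE, MAE, MAPE (%), and the per-sample RE (%) list.

    Assumes equal-length sequences and non-zero observations
    (RE and MAPE are undefined when an observed value is zero).
    """
    n = len(obs)
    m_bar = sum(model) / n
    o_bar = sum(obs) / n
    cov = sum((m - m_bar) * (o - o_bar) for m, o in zip(model, obs))
    var_m = sum((m - m_bar) ** 2 for m in model)
    var_o = sum((o - o_bar) ** 2 for o in obs)
    re = [100.0 * (m - o) / o for m, o in zip(model, obs)]
    return {
        "R": cov / math.sqrt(var_m * var_o),
        "RMSE": math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n),
        "MAE": sum(abs(m - o) for m, o in zip(model, obs)) / n,
        "MAPE": sum(abs(x) for x in re) / n,
        "RE": re,
    }

# Illustrative values only (not data from the study).
scores = evaluate([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print({k: v for k, v in scores.items() if k != "RE"})
```

Because MAPE and RE divide by the observed value, they grow quickly when the observed wind speed is small, which is worth keeping in mind when comparing heights with different mean wind speeds.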

Correction method.
The accuracy of numerical prediction can be improved effectively by correcting the errors. Therefore, RF is used to correct the prediction results of the CMA-GD model. RF is an efficient, accurate, and easy-to-use machine learning algorithm. It combines many weakly correlated decision trees and aggregates their predictions by voting (classification) or averaging (regression), thus reducing the instability of each individual tree. It is widely used for classification and regression problems. As an ensemble learning method, RF performs well on high-dimensional data and large amounts of training data. In addition, it can handle missing values and unbalanced data well.
In this paper, the RF uses 200 trees, a maximum tree depth of 5, and a minimum of 1 sample per leaf node as test parameters, which yields good results.
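To make the role of these parameters concrete, the sketch below implements a very small random forest regressor in plain Python: bootstrap sampling, a random feature subset at each split, depth-limited regression trees, and averaging. It is an illustration under stated assumptions, not the implementation used in the study. The predictors (forecast wind speed and hour of day), the synthetic data, and all function names are hypothetical; in practice a library implementation such as scikit-learn's `RandomForestRegressor` (with `n_estimators`, `max_depth`, and `min_samples_leaf`) would normally be used.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (split quality measure)."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)

def grow_tree(X, y, depth, max_depth, min_leaf, n_try, rng):
    """Grow one regression tree. A leaf is a float; a split is a tuple."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_try):      # random feature subset
        for t in sorted({row[f] for row in X})[:-1]:   # candidate thresholds
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                   # no valid split found
        return mean(y)
    _, f, t, left, right = best
    children = [grow_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_try, rng)
                for idx in (left, right)]
    return (f, t, children[0], children[1])

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=200, max_depth=5, min_leaf=1, n_try=1, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, min_leaf, n_try, rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees (regression)."""
    return mean([predict_tree(tree, row) for tree in forest])

def rmse(a, b):
    return math.sqrt(mean([(x - y) ** 2 for x, y in zip(a, b)]))

# Synthetic stand-in data (NOT from the paper): forecast speed and hour of day.
X = [[3.0 + 0.2 * i, float(i % 24)] for i in range(48)]
obs = [0.6 * row[0] + 0.5 for row in X]   # "observed" speed with a known bias
forest = fit_forest(X, obs)               # 200 trees, depth 5, 1 sample per leaf
corrected = [predict_forest(forest, row) for row in X]

rmse_raw = rmse([row[0] for row in X], obs)
rmse_corrected = rmse(corrected, obs)
print(round(rmse_raw, 2), round(rmse_corrected, 2))
```

The example scores the correction on its own training data to keep it short; a real evaluation would hold out part of the record so that the corrected errors are not optimistic.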

Results and analysis
In this section, the wind speed forecast of the CMA-GD model is first compared with the observations. Then the forecast effect in each month is analyzed. Thirdly, the accuracy of the wind speed forecast of the CMA-GD model is verified from the perspective of diurnal variation and wind direction difference. Finally, the results revised by RF are compared. Generally, the forecast effect decreases as the forecast lead time increases, and the "two rules" require the accuracy of the day-ahead wind power forecast to be no less than 60%. Therefore, the first 4 h of model integration are taken as the spin-up time, and the 24 h model forecast data of the next day are extracted for analysis.
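The extraction of the next-day window can be sketched as follows, assuming that index 0 of each forecast series is the 20:00 initial time, that output is every 15 minutes, and that the analysis window runs from 00:00 to 23:45 of the following day. These conventions and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta

STEP_MIN = 15   # model output interval (minutes)
SPINUP_H = 4    # hours discarded after the 20:00 start
WINDOW_H = 24   # length of the analysis window (hours)

def day_ahead_window(init_time, series):
    """Return (valid_time, value) pairs for the next-day 24 h window."""
    per_hour = 60 // STEP_MIN
    start = SPINUP_H * per_hour
    stop = start + WINDOW_H * per_hour
    times = [init_time + timedelta(minutes=STEP_MIN * k) for k in range(start, stop)]
    return list(zip(times, series[start:stop]))

# A 48 h run at 15-minute output has 193 values including the initial time.
init = datetime(2022, 1, 1, 20, 0)
dummy_series = list(range(193))   # placeholder values, not model output
window = day_ahead_window(init, dummy_series)
print(len(window), window[0][0], window[-1][0])
```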

Overall forecast effect evaluation
Table 1 presents the overall evaluation indexes of the CMA-GD wind speed forecast, calculated against the wind speed observations of Longyue Wind Farm in 2022. There is a good linear correlation between the 24 h wind speed forecast and the observation at each height: R is 0.77-0.81, and all values pass the significance test at the 0.01 level. The RMSE of wind speed at each level is between 1.80 and 2.08 m·s⁻¹, MAE is between 1.41 and 1.60 m·s⁻¹, and MAPE is between 47.79% and 75.41%, indicating that the model can simulate the wind speed change of Longyue Wind Farm well. With the increase in altitude, the R, RMSE, and MAE of the wind speed forecast gradually increase, but MAPE fluctuates. This indicates that with the increase in altitude, although the correlation between simulated and observed wind speed becomes stronger and the degree of data dispersion increases, the overall change fluctuates within a reasonable range. It is worth noting that the MAPE of wind speed at 30 m, 50 m, and 70 m is less than that at 10 m, which may be related to the error between the model terrain height and the actual terrain height. It is necessary to correct the terrain error of the model products in the future.
Combined with the statistical distribution of RE (Figure 2), it can be found that in the four typical months in 2022, the wind speed forecasts at heights of 10 m, 30 m, 50 m, and 70 m with RE within ±50% account for 54.85%, 71.03%, 71.31%, and 66.30%, respectively, while those with RE beyond ±100% account for 18.83%, 7.84%, 9.32%, and 12.41%, respectively. Proportionally, the wind speed forecast deviation at 10 m is larger and more scattered than at other heights, whereas the deviation at 30 m and 50 m is relatively small and mostly concentrated within ±50%.
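The two proportions quoted here can be obtained from the per-sample relative errors as in this sketch. It assumes RE is defined relative to the observed value and that the bounds are inclusive at ±50% and exclusive at ±100%; the function name and sample values are hypothetical.

```python
def re_shares(model, obs):
    """Percentage of samples with |RE| <= 50% and with |RE| > 100%.

    RE is taken as 100 * (model - obs) / obs, so observations must be non-zero.
    """
    re = [100.0 * (m - o) / o for m, o in zip(model, obs)]
    n = len(re)
    within_50 = 100.0 * sum(abs(x) <= 50.0 for x in re) / n
    over_100 = 100.0 * sum(abs(x) > 100.0 for x in re) / n
    return within_50, over_100

# Illustrative values only (not data from the study).
print(re_shares([1.2, 3.0, 5.0, 2.0], [1.0, 2.5, 2.0, 4.0]))
```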

Monthly forecast variance
Figure 3 shows the 15-minute time series of wind speed forecasts and observations at four heights of Longyue Wind Farm in 2022. The CMA-GD model reproduces not only the increase of wind speed with height but also the change in wind speed with time. In particular, on January 11 and August 25, when the wind speed increased sharply, the model not only predicted the trend well but also successfully captured the wind speed. However, the CMA-GD model failed to predict the actual wind speed in some periods, when the forecast differs considerably from the observations. Combined with Table 2, it can also be found that the observed wind speed at 10 m is the smallest, while there is little difference among the observed wind speeds at 30 m, 50 m, and 70 m, indicating that the wind speed increases nonlinearly with height. In addition, the deviation between predicted and observed wind speed is small at 30 m and 50 m but large at 10 m and 70 m. In terms of intermonth variation, the simulated wind speed in January, April, and October is slightly larger than that observed, while the simulated and observed average wind speeds in August differ little. The wind speed in October is significantly higher than in other months, indicating that the wind energy resources of Longyue Wind Farm are better in autumn.
The variations of the RMSE and MAE between wind speed forecast and observation across months are the same at different heights, with October>January>August>April. With the increase in altitude, the error fluctuation increases, which is mainly related to the large wind speed in October and the nonlinear increase of wind speed with altitude. The MAPE at each altitude in each month does not follow the other errors. The MAPE in January is the largest, followed by August, and the MAPE in April and October is the smallest with little difference between them. Across heights, the MAPE of the 10 m wind speed is the largest, followed by the 70 m wind speed, and the MAPE at 30 m and 50 m is the smallest with little difference. The forecast effect in October is the best: the numerical dispersion is large, but the MAPE is small. The correlation between forecast and observation in January is the weakest: the numerical dispersion is not large, but the MAPE is large. The simulation effect in April and August is similar. Across heights, the simulation effect for the 10 m wind speed is slightly worse than for the other heights.

Forecast differential diurnal variation
Due to the diurnal variation of the atmospheric boundary layer, the near-surface wind speed usually shows obvious diurnal variation, so it is necessary to evaluate the diurnal variation of wind speed. Figure 5 shows the diurnal variations of the average wind speed forecast and observation at different heights in Yangjiang in 2022. Generally, the CMA-GD model simulates the night wind speed well, but there are some deviations in the daytime, which need to be further revised and improved before application. Specifically, the CMA-GD model reproduces the variation trend and magnitude of wind speed at night well, especially at 10 m and 70 m, while the forecasts at 30 m and 50 m may be affected by vertical interpolation and still show some deviations. In the subsequent correction and improvement work, the nonlinear change of wind speed with altitude should be considered. In the daytime, the forecast wind speed fails to reflect the decrease in the afternoon and the increase at night, and instead maintains a high wind speed. Therefore, how to improve the prediction of daytime wind speed is an urgent problem in applying the CMA-GD model to wind energy forecasting.
To analyze the reasons for the inconsistency between simulated and observed diurnal variations of wind speed in Yangjiang, the diurnal variations of wind speed at different heights in each typical month are further analyzed (Figure 6). The model presents the diurnal characteristics of wind speed at different heights of Longyue Wind Farm well in January and October, that is, the wind speed decreases in the afternoon and continues to rise at night, which is consistent with the diurnal variation of the atmospheric boundary layer. The simulated diurnal variation in January is in good agreement with the observed wind speed, while the simulated wind speed in October lags by 2-3 hours. In April and August, however, the model fails to show the daytime wind speed decline and instead peaks at around 5 p.m., which should be the reason for the inconsistency between the simulated and observed diurnal variations of wind speed in Figure 5.
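A diurnal composite of this kind can be built by averaging all 15-minute samples that fall in the same hour of day, separately for forecast and observation. The sketch below assumes timestamps are already in Beijing time and uses hourly bins; both choices, the function names, and the synthetic values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def diurnal_mean(times, values):
    """Average all samples that fall in the same hour of day (0-23)."""
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        buckets[t.hour].append(v)
    return {h: sum(vs) / len(vs) for h, vs in sorted(buckets.items())}

def diurnal_bias(times, model, obs):
    """Hour-by-hour difference between the forecast and observed composites."""
    mod_cycle = diurnal_mean(times, model)
    obs_cycle = diurnal_mean(times, obs)
    return {h: mod_cycle[h] - obs_cycle[h] for h in mod_cycle}

# Synthetic two-day example at 15-minute resolution (illustrative values only):
# the "observation" drops in the afternoon while the "forecast" stays high.
start = datetime(2022, 8, 1, 0, 0)
times = [start + timedelta(minutes=15 * k) for k in range(2 * 96)]
obs = [6.0 if t.hour < 12 else 4.0 for t in times]
model = [6.0 for _ in times]

bias = diurnal_bias(times, model, obs)
print(bias[3], bias[15])
```

Computing the composite per month, as done for Figure 6, only requires filtering the samples by month before calling the same functions.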

Wind direction forecast variance
Wind power generation is greatly affected by wind speed. Besides wind speed, wind direction also affects the efficiency of wind power generation, and the prevailing wind direction determines the orientation of the wind turbines, so it is necessary to test the wind direction prediction. Figure 7 shows the wind rose diagrams of wind direction forecast and observation at different heights of Longyue Wind Farm in 2022. As can be seen from Figure 7a, the wind direction of Longyue Wind Farm in January presents a typical subtropical winter monsoon pattern, with the northeast wind as the dominant wind direction, accompanied by a small amount of southerly wind, and no significant difference in wind direction among heights. The difference between the predicted and observed average wind direction is 5-10° (Table 3). The predicted wind direction is farther north than the observed wind direction, and the difference increases with height. The wind direction of Longyue Wind Farm in April (Figure 7b) and August (Figure 7c) shows obvious subtropical summer monsoon characteristics, dominated by the southerly wind. The difference between the predicted and observed average wind direction in April is 14-17°; the predicted wind direction is more easterly than the observed wind direction, and this difference increases with height. The difference between the predicted and observed average wind direction in August is 2-4°, and it gradually decreases with the increase in altitude. In October (Figure 7d), the wind direction of Longyue Wind Farm is dominated by the northeast wind, and the difference between the forecast and observed average wind direction at each height is 6-13°. The predicted wind direction is farther north than the observed wind direction, and the fluctuation of the wind direction difference decreases with the increase in altitude. In general, the CMA-GD model simulates the change of wind direction well, which helps wind farms orient their turbines according to the prevailing wind direction and improve the conversion rate of wind power generation.
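Because wind direction is a circular quantity, averages and differences need care near the 0°/360° boundary. The sketch below uses a unit-vector (circular) mean and a wrapped signed difference; the text does not state how the average directions in Table 3 were computed, so this is one common approach shown under that assumption, with hypothetical function names.

```python
import math

def circular_mean_deg(directions):
    """Mean direction (degrees, 0-360) from unit vectors, so 350 and 10 average to north."""
    s = sum(math.sin(math.radians(d)) for d in directions)
    c = sum(math.cos(math.radians(d)) for d in directions)
    return math.degrees(math.atan2(s, c)) % 360.0

def direction_difference(mod, obs):
    """Signed smallest angle mod - obs, in the range (-180, 180] degrees."""
    d = (mod - obs) % 360.0
    return d - 360.0 if d > 180.0 else d

# Illustrative values only: a forecast 15 degrees clockwise of the observation,
# and a pair that straddles north.
print(direction_difference(45.0, 30.0))
print(direction_difference(10.0, 350.0))
print(round(circular_mean_deg([40.0, 50.0, 60.0]), 1))
```

A plain arithmetic mean of 350° and 10° would give 180°, which is the opposite of the true mean direction; the unit-vector form avoids this.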

Prediction correction based on random forest algorithm
Correcting the wind speed predicted by the numerical model can effectively improve the accuracy of the numerical forecast, and RF greatly improves the accuracy of this forecast. The correlation coefficient between wind speed forecast and observation at the four heights improves from 0.77-0.81 to 0.91-0.92, and the RMSE is reduced from 1.80-2.08 m·s⁻¹ to 1.07-1.27 m·s⁻¹. After correction, RMSE and MAE still increase with height. The corrected MAPE is 13%-30% lower than before and decreases with height. The revised wind speed forecasts at the four heights with RE within ±50% account for 79.73%, 87.77%, 88.12%, and 87.92%, respectively. Those with RE beyond ±100% account for 7.34%, 4.99%, 4.72%, and 4.55%, respectively, which are 11.49, 2.84, 4.60, and 7.85 percentage points lower than before the revision. The wind speed forecast at 10 m is revised the most, and its error is reduced.
As far as the diurnal variation is concerned, the revised diurnal variation of wind speed is in good agreement with the observation (Figure 8a), with the maximum wind speed at 6-7 a.m. and the minimum wind speed at 2-4 p.m. The wind speed magnitude has also been well corrected, and the difference in wind speed at each height is reduced after the correction. For example, in the daily variation of the wind speed deviation at 10 m and 70 m in August (Figure 8b), the deviation range is reduced from [-11.04, 6.73] m·s⁻¹ and [-14.84, 7.99] m·s⁻¹ before the correction to [-3.95, 4.88] m·s⁻¹ and [-6.77, 4.25] m·s⁻¹ after the correction, respectively. In particular, on August 24, when the wind speed deviation was large, the deviation at 10 m decreased from -11.04 m·s⁻¹ to -3.95 m·s⁻¹, and that at 70 m decreased from -14.84 m·s⁻¹ to -6.77 m·s⁻¹. In addition, the proportions of deviations between wind speed forecast and observation at 10 m and 70 m within ±0.5 m·s⁻¹ are 39.13% and 33.51%, respectively, before the revision and increase to 79.73% and 87.92%, respectively, after the revision. The deviation of wind speed over ±1 m·s⁻¹ at two heights decreased

Figure 1. The geographical location of the Yangjiang wind farms.

Figure 2. Relative error distribution of wind speed of LY.

Figure 3. Time series of wind speed forecast (MOD) and observation (OBS) at different heights of LY in (a) January, (b) April, (c) August, and (d) October 2022.

Yangjiang has a subtropical monsoon climate. Its coastal location brings more typhoons and rainstorms in summer and autumn, accompanied by an obvious maritime climate, and the daily wind speed fluctuates greatly. To evaluate the forecast effect of the CMA-GD model on the wind speed of Longyue Wind Farm more intuitively, the monthly evaluation indexes of forecast and observation in the four typical months in 2022 are calculated (Figure 4). The correlation coefficients between forecast and observation of wind speed at 10 m, 30 m, 50 m, and 70 m in October are 0.81, 0.83, 0.83, and 0.84, respectively, showing the strongest correlation. In August, the correlation coefficient ranges from 0.73 to 0.78. In April, the correlation coefficients between prediction and observation of the 50 m and 70 m wind speeds are not much different from those in August, while those of the 10 m and 70 m wind speeds are slightly smaller than those in August. The wind speed forecast for January has the weakest correlation with observations, with a correlation coefficient between 0.65 and 0.68. The correlation between the wind speed forecast and the observation at each altitude in each month passed the confidence test at a significance level of 0.01.

Figure 4. Wind speed forecast evaluation indexes at LY in 2022 (RMSE and MAE are bar charts, and R and MAPE are line charts).

Figure 5. Diurnal variations of wind speed forecast (MOD) and observation (OBS) of LY in 2022.

Figure 7. Wind frequency rose graphs of wind direction forecast (MOD) and observation (OBS) of LY in (a) January, (b) April, (c) August, and (d) October in 2022.

Table 1. Evaluation of 24 h wind speed forecast of LY in 2022.
Note: ** indicates the correlation passed the confidence test with a significance of 0.01.

Table 2. Average wind speed (m·s⁻¹) at four heights of LY in 2022.

Table 3. Forecast (MOD) and observed (OBS) average wind direction for LY in 2022.