Wind Speed Ramp Rate Predictions Using Wind Farm SCADA Data Assimilation and a WRF Ensemble

This study presents a novel method for improving forecasts of wind power ramp events up to six hours ahead by assimilating SCADA measurements into an ensemble of Weather Research and Forecasting (WRF) model estimates. Leveraging data from nine wind farms in France and Belgium, the approach aims to improve WRF model predictions of wind speed and ramp event timing. The methodology employs grid and observational nudging techniques, which enhance model accuracy by incorporating real-time observational data. Key findings demonstrate that nudging significantly reduces the Mean Absolute Error (MAE), decreases the Time Distortion Index (TDI), and increases the Probability of Detection (POD) of ramp events. Nudged ensemble members outperform their non-nudged counterparts, identifying true ramp events more accurately and producing fewer false alarms. MAE, TDI and POD improvements are as high as 3.7%, 8.5% and 37%, respectively. The study also explores the benefits of an ensemble approach, highlighting improved accuracy in predicting ramp rate magnitudes and providing valuable insights for grid stability management. This research contributes to wind power forecasting by showing the importance of integrating SCADA data into predictive models.


Introduction
In recent times, there has been a significant rise in global wind and solar energy production, and it is anticipated that a considerable share of the overall energy output will be derived from these sources [1]. Nonetheless, both wind and solar energy exhibit an inherent variability, rendering them less dependable as sources for electrical power generation. This variability adds a layer of complexity to the task of managing power systems, as grid operators must rapidly and effectively balance supply and demand in real time. Power system operators have implemented strategies to manage demand variability and uncertainty, such as regulation reserves, load-following reserves, and sub-hourly economic dispatch. In spite of these measures, the February 2021 winter storms resulted in significant power losses affecting over five million residents in the US. The challenge of maintaining a proper supply-demand equilibrium was amplified by the abrupt reduction in electricity generation, triggering the activation of grid protection mechanisms that ultimately culminated in a widespread blackout. These incidents highlight the utmost importance of accurate forecasting for such exceptional events [2] [3]. Therefore, accurate predictions of sudden surges or drops in wind and/or solar electricity generation, referred to as ramp events, are essential for integrating grids, planning systems, and trading electricity in specific electricity markets [4].
Despite being crucial for the industry, ramp events are notoriously challenging to forecast [5]. This challenge stems from the need to assess both the timing and the magnitude of the ramp event accurately. Errors in ramp magnitude occur when a ramp is predicted, but the actual value changes significantly more or less than anticipated. Similarly, errors in ramp timing occur when the actual change in power precedes or follows the predicted time by a significant margin [6].
Early attempts at forecasting ramp rates date back to the early 2000s [7]. Nowadays, studies attempting to forecast wind speed ramp rates can be classified in two categories: deterministic and probabilistic forecasts. Deterministic forecasting provides a single predicted value for wind speed and direction at a specific location and time in the future, typically derived from physical models of the atmosphere. However, deterministic forecasts do not account for the inherent uncertainties in atmospheric conditions, which can lead to decreased reliability for longer time horizons. Consequently, they are not suitable for risk assessment, as they do not provide a range of possible outcomes, which is crucial for applications that require evaluating different scenarios and their associated risks. Studies such as [8], [9] and [10] employed the mesoscale weather prediction model WRF (Weather Research and Forecasting) [11] to forecast wind ramp rates. The results from these studies suggest that the WRF model can generally predict the timing of a ramping event reasonably well, but it struggles to assess the magnitude of the event accurately. This limitation highlights the challenges associated with deterministic forecasting, particularly in applications that require precise quantification of event magnitudes for effective planning and decision-making. Probabilistic forecasts, on the other hand, provide a range of possible wind speeds and directions along with the associated probabilities. These forecasts are usually obtained using ensemble models that run multiple simulations with different initial conditions or model parameters. The output is a probability distribution of wind speeds and directions. Probabilistic forecasts are suitable for risk assessment, yet their output can be more complex to interpret, as it provides a range of possible values with associated probabilities. Additionally, they are computationally intensive: running multiple simulations for ensemble modelling requires more computational resources than deterministic models. Examples of such an approach are [12], [13], and [14], where an ensemble of WRF forecasts was used to forecast ramp rates. The findings from these investigations indicate that employing an ensemble approach enhances the accuracy of the forecast ramp rate magnitude. However, they also underscore the challenge associated with weighting the ensemble members, a critical step in ensemble forecasting. Often, a simple average of the ensemble members is used, but this approach may not always yield the most accurate results.
A recent trend involves the use of data assimilation (DA) to address the shortcomings of weather forecast models. This technique combines the physical information provided by numerical models with onsite observations to enhance the forecast accuracy [15]. This combination can be categorized as empirical, statistical or hybrid, depending on how the weighting function is computed to produce an accurate estimate of the atmosphere's true state: the "analysis". Statistical DA [16] [15], while robust, requires extensive computational power to identify the most accurate analysis, making it less feasible for operational and time-sensitive applications. On the other hand, empirical DA [17] [18] [19] [20], also known as four-dimensional DA (FDDA), offers a faster, more straightforward and user-friendly approach. However, empirical DA faces a notable challenge: it may adjust the analysis toward an arbitrary state, especially when model predictions greatly diverge from sparse and low-confidence observations in data-deficient areas. The studies in references [17] [18] [19] and [20] employed observation and analysis nudging techniques. Across these studies, it was consistently found that data assimilation improved the accuracy of wind speed and direction forecasts. A critical factor identified was the spatial extent over which an observation was assimilated and the degree of influence each observation exerted on the model. Moreover, the study in reference (17) highlighted that denser observational networks enhanced the benefits of nudging, particularly in accurately forecasting wind speed and direction. However, while these studies concentrated on evaluating the general accuracy of wind speed metrics, they did not quantify the magnitude and timing errors associated with ramp events. Additionally, no effort was made to measure how nudging techniques affected estimates of wind power production.
The purpose of this research is to introduce a robust approach for accurately forecasting real-time wind power ramp events six hours ahead. Our method's novelty lies in combining an ensemble approach with data assimilation of Supervisory Control and Data Acquisition (SCADA) measurements from wind farms surrounding the target wind farm. To the best of our knowledge, an approach similar to the one proposed in this study has not been attempted yet. The structure of this article is as follows: Section 2 introduces the input data and models, which include SCADA data, the WRF model, and the implementation of the nudging technique. Section 3 outlines the workflow of the nudging method and the validation metrics employed in the study. Finally, Sections 4 and 5 present the results and conclusions drawn from this research.

Data and models
This section presents the data and model used for conducting the nudging and forecasting ramp rate events.

SCADA data
This research harnessed data from 9 wind farms across France and Belgium. Data were sourced from Darwin, the Engie group's specialized digital platform for data storage and power asset optimization. For confidentiality reasons, specific details such as the exact locations, turbine types, and power curves of these wind farms are not disclosed in this study. The dataset from each wind farm encompassed measurements of wind speed, direction, and air temperature, recorded at 15-minute intervals from 2023-05-14 to 2023-05-16. These data underwent comprehensive automated quality checks within Darwin to rectify common issues, including:
1. Correction of frozen values resulting from communication losses;
2. Identification and removal of outliers using the Local Outlier Factor statistical algorithm;
3. Assessment and correction of abnormal values by comparison with data from neighbouring assets;
4. Elimination of duplicate entries;
5. Addressing missing values through techniques like linear and spline interpolation, forward/backward filling, or referencing data from proximate wind turbines within the same farm.
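Two of the checks listed above (frozen values and gap filling by linear interpolation) can be illustrated with a minimal sketch. It is not the Darwin implementation, which is not described in this study: the function names, the `min_run` threshold of four identical samples, and the sample wind speeds are all hypothetical.

```python
from typing import List, Optional


def flag_frozen(values: List[Optional[float]], min_run: int = 4) -> List[Optional[float]]:
    """Replace runs of at least `min_run` identical consecutive readings with None.

    `min_run` is a hypothetical threshold chosen for illustration only.
    """
    cleaned = list(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if values[start] is not None and i - start >= min_run:
                for j in range(start, i):
                    cleaned[j] = None
            start = i
    return cleaned


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior gaps; leading and trailing gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for k in range(left + 1, right):
            filled[k] = values[left] + step * (k - left)
    return filled


# Hypothetical 15-minute wind speed samples (m/s) with a frozen sensor reading.
wind_speed = [5.0, 5.4, 6.1, 6.1, 6.1, 6.1, 7.0, 7.5]
cleaned = interpolate_gaps(flag_frozen(wind_speed))
```

In this sketch the four repeated readings are treated as frozen and replaced by values interpolated between the neighbouring valid samples.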
The compiled database was divided into two segments: one for data assimilation (8 wind farms) and the other for validation (1 wind farm), as shown in figure 1. The validation data were exclusively derived from a wind farm composed of a single wind turbine, which is a specificity of the Belgian market. This wind farm has no other turbines within a radius of five rotor diameters, which ensures that it is unaffected by intra-farm wake effects and experiences minimal or no wake interference from neighbouring wind farms, as concluded in [21]. Furthermore, validation was carried out from 08:00 to 18:00 due to local noise restriction curtailments [22].

Figure 1.
Relative distance between the validation site (green) and nudging sites (yellow)

Synthetic SCADA data
Increased observations significantly enhance the advantages of data assimilation, as evidenced in Reference (17). However, wind farm operators typically do not share data from their assets for competitive reasons. The Global Wind Power Tracker database [23] shows that numerous wind farms are currently in operation globally. These farms collectively offer an abundant source of measurements, including wind speed, wind direction, and temperature. To replicate the accessibility of these real-world data, we used the NCEP global surface [24] and upper air [25] observational weather data, manually removing measurements of wind speed, wind direction, and temperature taken at heights greater than 200 meters above ground level. The resulting database contains 2202 stations, in line with the estimated number of wind farms in the study area, which is around 2500 according to the Global Wind Power Tracker data. This database is available with a one-day delay and was quality checked using the Obsgrid module of WRF.

WRF model
WRF [11] is a numerical weather prediction model that numerically solves a set of non-linear equations over a Cartesian grid. These equations model the physical processes that occur in the atmosphere and its interaction with the ocean and land surface. WRF takes global model forecasts as initial and boundary conditions to account for large-scale effects. The effects of physical phenomena with length scales smaller than the grid size are approximated using parameterization schemes (PS). Table 1 gives the WRF model setup.
Parameterization | Scheme
Radiation PS | RRTMG [28] and Dudhia [29]
Microphysics PS | WSM6 [30]
Planetary boundary layer (PBL) PS | MYNN 2.5 [31], YSU [32], MYJ [33], Shin Hong [34]
Land surface PS | Noah [35]
To combine WRF with SCADA observations, an empirical DA method is used: nudging. A short description of nudging and its implementation in WRF is given in the subsections below.

Observational nudging
Observational nudging involves continuously relaxing the model state towards the observed state, typically at hourly to sub-hourly intervals, as detailed in reference [36]. This approach is particularly advantageous when observational data are scarce, as it facilitates dynamic and frequent adjustments to the model. Observational nudging is executed by adding an artificial tendency term to the prognostic equation of the variable to be nudged; this term is based on the difference between the two states, as illustrated in equation (1).
where p is the hydrostatic pressure; α is the nudged variable, either a component of the wind speed vector or the temperature; Fα is the model's physical forcing term for the variable α; Gα is the nudging strength for the variable α; Wα,i is the weighting function of observation number i at location x and time t of the WRF computation grid; α0 is the value of observation number i, located at xi and measured at time t'; αm is the WRF value of variable α, either at location x (native domain grid point) or spatially interpolated to the observation location. In this study, observational nudging makes use of SCADA data of wind speed, wind direction, and temperature, with a frequency of 15 minutes, as detailed in section 2.1.
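For reference, the definitions above match the form in which the observation nudging tendency is commonly written for WRF. The expression below is a sketch under that assumption, not a verified copy of equation (1): the subscript m for the model value, the number of observations N, and the placement of p on the right-hand side are notational assumptions.

```latex
\frac{\partial (p\,\alpha)}{\partial t}(\vec{x},t)
  = F_\alpha(\vec{x},t)
  + p\,G_\alpha\,
    \frac{\sum_{i=1}^{N} W_{\alpha,i}^{2}(\vec{x},t)\,
          \left[\alpha_{0,i}(\vec{x}_i,t') - \alpha_m(\vec{x}_i,t)\right]}
         {\sum_{i=1}^{N} W_{\alpha,i}(\vec{x},t)}
```

The second term vanishes where the weights are zero, so the model evolves freely outside the radius of influence of the observations.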

Analysis nudging
Analysis nudging, as referenced in [36], is an intermittent technique for initializing the model. This approach employs objective analysis techniques, where observed values are interpolated onto the model's grid points to refine the atmospheric state estimate at regular intervals, typically every 6 hours. In this process, Newtonian relaxation terms are integrated into the prognostic equations for wind and temperature. These relaxation terms dictate the rate at which the initial conditions, derived from the first guess, converge towards the gridded analysis. This approach is based on the expectation that enhanced initial conditions will lead to a reduction in forecast errors. However, it is crucial to acknowledge that the production of this gridded analysis demands a greater volume of data compared to observational nudging. Therefore, analysis nudging is more commonly applied on larger scales with coarser grid resolutions.
Equation (2) details the objective analysis employed in this study, while Equation (3) illustrates how the gridded analysis is integrated into the prognostic equation.
$$\frac{\partial (p\,\alpha)}{\partial t}(\vec{x},t) = F_\alpha(\vec{x},t) + G_\alpha\, W_\alpha(\vec{x},t)\, \epsilon_\alpha(\vec{x}) \times p\,\left(\alpha_a(\vec{x},t) - \alpha(\vec{x},t)\right) \quad (3)$$
where Wα is the weighting function at location x and time t of the WRF computation grid; αa is the value of the objective analysis of variable α at time t; εα is the analysis quality factor, which is based on the quality and distribution of the data used to produce the gridded analysis. Other variables are identical to those presented in Equation (1).
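The effect of the Newtonian relaxation term can be illustrated with a minimal sketch that integrates only the nudging part of the tendency, dα/dt = Gα · Wα · εα · (αa − α), with a forward Euler step. The physical forcing Fα is deliberately omitted, and all numerical values (wind speeds, nudging strength, time step) are hypothetical rather than taken from Table 3.

```python
def relax_toward_analysis(alpha0, alpha_analysis, g_alpha,
                          weight=1.0, quality=1.0, dt=60.0, n_steps=360):
    """Forward Euler integration of d(alpha)/dt = G * W * eps * (alpha_a - alpha).

    The model's physical forcing is left out, so this shows the nudging
    term in isolation. Time is in seconds and g_alpha in s^-1.
    """
    alpha = alpha0
    history = [alpha]
    for _ in range(n_steps):
        tendency = g_alpha * weight * quality * (alpha_analysis - alpha)
        alpha += dt * tendency
        history.append(alpha)
    return history


# Hypothetical values: model wind speed 6 m/s, analysis 8 m/s,
# nudging strength 3e-4 s^-1 (e-folding time of roughly 55 minutes),
# integrated over 6 hours with a 60 s step.
trajectory = relax_toward_analysis(alpha0=6.0, alpha_analysis=8.0, g_alpha=3e-4)
```

A larger nudging strength pulls the model towards the analysis faster, while a weight or quality factor below one slows the convergence, which is why these parameters are chosen empirically.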
Table 3 presents the WRF input parameters chosen empirically to compute the weight Wα and define the nudging strength Gα. In this study, analysis nudging makes use of synthetic SCADA data of wind speed, wind direction, and temperature, with the same frequency as the GFS initial and boundary conditions: 6 hours. Synthetic SCADA data are further detailed in section 2.2.

Method and validation metrics
This section presents the methodology used for accurately forecasting wind power ramp events, as well as the validation carried out.

Methodology workflow
Our methodology incorporates two types of nudging techniques: grid nudging (section 2.3.2) and observational nudging (section 2.3.1). In the lead-up to the day of interest (2023-05-14 00:00:00 to 2023-05-15 00:00:00), grid nudging is employed to refine our estimate of the atmosphere's true state. This enhancement is achieved by adjusting the large-scale initial and boundary conditions from GFS to align with the gridded analysis. Consequently, at the start of the focus period (2023-05-15 00:00:00), we possess an improved estimate derived from the data of the preceding day. Subsequently, we employ observational nudging to integrate new, incoming SCADA observations. Every 6 hours, new SCADA observations are assimilated by initiating a fresh forecast from a designated restart point. These restart points are automatically generated at 6-hour intervals and define the span of our forecast range. The workflow of the methodology is shown in figure 2.
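The 6-hour cycling described above can be sketched as a simple schedule of restart points. This is an illustration only: the function name is hypothetical, the end of the focus period (2023-05-16 00:00:00) is an assumption, and the calls that would actually launch WRF are not shown.

```python
from datetime import datetime, timedelta


def restart_cycles(start, end, interval_hours=6):
    """Return (cycle_start, cycle_end) pairs covering the period [start, end)."""
    cycles = []
    step = timedelta(hours=interval_hours)
    current = start
    while current < end:
        cycles.append((current, min(current + step, end)))
        current += step
    return cycles


focus_start = datetime(2023, 5, 15, 0, 0)
focus_end = datetime(2023, 5, 16, 0, 0)  # assumed end of the focus period
cycles = restart_cycles(focus_start, focus_end)
for begin, finish in cycles:
    # Each cycle restarts the model with the SCADA data available at `begin`
    # and forecasts until the next restart point.
    print(f"restart at {begin:%Y-%m-%d %H:%M}, forecast to {finish:%Y-%m-%d %H:%M}")
```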

Ensemble approach
Various sensitivity studies were carried out with WRF by Carvalho et al. [37], Draxl et al. [38], Krogsaeter and Reuder [39] and Muñoz Esparza et al. [40], covering a total of 4 different planetary boundary layer (PBL) schemes. An intercomparison of these studies showed that the accuracy of PBL parameterization schemes depends on the site and on the stability conditions. Because the best model setup for a forecasting system in a particular region depends on the typical distribution of atmospheric stability conditions at the site, we decided to use a multi-ensemble approach to benefit from the strengths of each individual ensemble member. Carvalho et al. (35), Krogsaeter and Reuder (37) and Shin and Hong (39) indicate that local closure schemes, such as MYNN2.5 and MYJ, are better suited for stable atmospheric conditions. These schemes assume that fluxes depend only on the values and local gradients at nearby grid points. Conversely, for unstable atmospheric conditions, non-local closure schemes like YSU and Shin Hong are often preferred. These non-local schemes account for turbulent fluxes influenced by large eddies, which transport variables over larger distances. However, non-local closure schemes can struggle to replicate accurate vertical wind speed profiles in stable and stratified atmospheric conditions, as they tend to cause excessive turbulent mixing near the surface (35). This overestimation of mixing can lead to inaccuracies in modelling wind profiles under such conditions. The choice between local and non-local closure schemes therefore depends on the specific atmospheric stability being modelled, with each having its strengths and limitations in different meteorological scenarios. The schemes listed in table 4 were selected to encompass turbulence modelling across various terrain types, complexities, and meteorological conditions.

Ramp rate detection
To identify ramp events, we employ the swinging door algorithm [6]. This algorithm facilitates the extraction of ramps from signals, employing a piecewise linear method, while accommodating a threshold parameter that influences its sensitivity to variations in ramps. The implementation of Aleksandr F. Mikhaylov was used for this study [42]. The threshold parameter selected is 0.01, which highlighted thirty-six power ramp events during the validation period.
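The principle of the swinging door algorithm can be sketched as follows. This is a minimal illustration and not the implementation of [42] used in the study: the function names are hypothetical, `epsilon` plays the role of the threshold parameter, and the `min_change` filter used to keep only large segments as ramps is an assumption added for the example.

```python
def swinging_door(times, values, epsilon):
    """Return the indices of the break points of a piecewise-linear compression."""
    n = len(values)
    if n < 3:
        return list(range(n))
    breakpoints = [0]
    anchor = 0
    max_upper = float("-inf")  # steepest slope seen from the upper pivot
    min_lower = float("inf")   # shallowest slope seen from the lower pivot
    i = 1
    while i < n:
        dt = times[i] - times[anchor]
        max_upper = max(max_upper, (values[i] - (values[anchor] + epsilon)) / dt)
        min_lower = min(min_lower, (values[i] - (values[anchor] - epsilon)) / dt)
        if max_upper > min_lower and i - 1 > anchor:
            # The doors have opened past parallel: close the segment at the
            # previous sample and restart from there.
            anchor = i - 1
            breakpoints.append(anchor)
            max_upper, min_lower = float("-inf"), float("inf")
            continue  # re-evaluate sample i against the new anchor
        i += 1
    breakpoints.append(n - 1)
    return breakpoints


def extract_ramps(times, values, epsilon, min_change):
    """Keep the segments whose total change is at least `min_change` (assumed rule)."""
    points = swinging_door(times, values, epsilon)
    ramps = []
    for a, b in zip(points, points[1:]):
        change = values[b] - values[a]
        if abs(change) >= min_change:
            ramps.append((times[a], times[b], change))
    return ramps


# Hypothetical normalised power signal: flat, then a ramp, then flat again.
times = list(range(9))
power = [0, 0, 0, 0, 1, 2, 3, 3, 3]
ramps = extract_ramps(times, power, epsilon=0.1, min_change=1.0)
```

A smaller `epsilon` produces more, shorter segments and therefore more candidate ramps, which is how the threshold controls the sensitivity of the detection.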

Validation procedure
The validation of our approach hinges on several metrics, including the Probability of Detection (POD), False Alarm Rate (FAR), Mean Absolute Error (MAE), and Time Distortion Index (TDI) [6] [12]. While POD and FAR gauge the ensemble's capacity to detect ramp occurrences, MAE and TDI quantify the accuracy of ramp magnitude and temporal alignment, the latter using a dynamic time warping algorithm [43].
These metrics are described as follows:
TDI = (area under optimal warping path) / (...) (7)
where PSCADA and PWRF are the power measured by SCADA and the power computed from the WRF-estimated wind speed, respectively; the warping and identity paths are obtained from the cost matrix, a measure of the amplitude difference between each sample from SCADA and each sample from WRF. Further information on the TDI can be found in [6] and its implementation in [44].
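To make the role of the warping and identity paths concrete, the sketch below computes a temporal distortion score from a dynamic time warping path. It assumes one common reading of the TDI, namely the area between the optimal warping path and the identity path, normalised by the largest possible area; the normalisation used in this study and in [44] may differ, and the function names and sample series are hypothetical.

```python
def dtw_path(a, b):
    """Optimal dynamic time warping path between two series (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell to the first along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def temporal_distortion_index(observed, forecast):
    """Area between the warping path and the identity path, scaled to [0, 1].

    Assumes both series have the same length (at least two samples).
    """
    path = dtw_path(observed, forecast)
    n = len(observed)
    area = 0.0
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        along = ((i1 + j1) - (i0 + j0)) / 2.0  # progress along the diagonal
        area += 0.5 * (abs(j0 - i0) + abs(j1 - i1)) * along
    return area / ((n - 1) ** 2 / 2.0)


# Hypothetical example: the forecast peak arrives one time step late.
p_scada = [0, 0, 1, 0, 0, 0]
p_wrf = [0, 0, 0, 1, 0, 0]
tdi_shifted = temporal_distortion_index(p_scada, p_wrf)
tdi_perfect = temporal_distortion_index(p_scada, p_scada)
```

With this convention a forecast that is perfectly aligned in time follows the identity path and scores 0, while any timing shift moves the path away from the diagonal and raises the score.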
The MAE, POD, and FAR are metrics that range from 0% to 100%. A 0% MAE or FAR indicates a perfect forecast, where the former denotes no deviation from observed values and the latter reflects no false alarms. Conversely, a 100% POD signifies a perfect model in terms of correctly identifying ramp events. The TDI is a dimensionless number in the interval [0,1], where 0 corresponds to no temporal distortion and 1 to the maximum temporal distortion.
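The detection and magnitude metrics can be sketched with the standard contingency-table definitions. These are assumptions rather than the study's own equations: POD is taken as hits / (hits + misses), FAR as false alarms / (hits + false alarms), and the MAE is normalised by a rated power so that it can be expressed in percent. Function names and sample values are hypothetical.

```python
def contingency_scores(observed_events, forecast_events):
    """Return (POD, FAR) in percent from boolean ramp flags on a common time grid."""
    pairs = list(zip(observed_events, forecast_events))
    hits = sum(1 for o, f in pairs if o and f)
    misses = sum(1 for o, f in pairs if o and not f)
    false_alarms = sum(1 for o, f in pairs if f and not o)
    pod = 100.0 * hits / (hits + misses) if hits + misses else float("nan")
    far = (100.0 * false_alarms / (hits + false_alarms)
           if hits + false_alarms else float("nan"))
    return pod, far


def normalised_mae(p_scada, p_wrf, rated_power):
    """Mean absolute power error as a percentage of the rated power (assumed scaling)."""
    errors = [abs(o - f) for o, f in zip(p_scada, p_wrf)]
    return 100.0 * sum(errors) / (len(errors) * rated_power)


# Hypothetical ramp flags and power values (kW).
observed = [True, True, False, True, False]
forecast = [True, False, True, True, False]
pod, far = contingency_scores(observed, forecast)
mae = normalised_mae([1000, 1500, 2000], [900, 1700, 2000], rated_power=2000)
```

In this example two of the three observed ramps are detected and one of the three forecast ramps is a false alarm.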

Results
This section presents the power production data measured at the validation wind farm via SCADA, alongside various estimates derived from WRF ensemble members.
It is observed from the data analysis that the impact of nudging dissipates after 2023-05-15 13:30:00 and that the forecast accuracy decreases (figure 3). This phenomenon can be explained as follows: from 2023-05-14 00:00:00 to 2023-05-15 00:00:00, the abundance of synthetic SCADA data contributes to improved initial conditions for the model. However, from 2023-05-15 00:00:00 to 2023-05-15 13:00:00, only observational nudging from 8 stations is applied, and this is updated every 6 hours. After the second update cycle, the domain's inherent momentum likely becomes dominant, diminishing the influence of the limited observational data points. Consequently, we infer that our methodology is capable of forecasting ramp rates up to 6 hours ahead but loses efficacy beyond 13 hours from the initial observation nudging cycle when limited SCADA data are available.
To evaluate the effectiveness of the proposed nudging methodology, key performance metrics such as the Probability of Detection (POD), False Alarm Rate (FAR), Mean Absolute Error (MAE), and Time Distortion Index (TDI) were calculated for each ensemble member (table 5). These metrics, detailed in section 3.4, cover the time period from 2023-05-15 08:00:00 to 2023-05-15 13:00:00. The results indicate that ramp events, identified using the swinging door algorithm, are consistently captured by the nudged members, with a POD of 100%. With the exception of member 1n, the nudged members exhibit equal or lower FAR compared to their non-nudged counterparts, suggesting enhanced accuracy in detecting true events and reducing false alarms.
In terms of the magnitude of ramp events, the MAE values demonstrate notable improvements. For instance, member 1n exhibited a 37.4% improvement in MAE over member 1 (table 5). The smallest improvement was observed for member 3n, with a 3.1% enhancement in MAE. These findings highlight the nudging methodology's efficacy in refining the accuracy of ramp event magnitude predictions.
Regarding the time alignment of ramp events, the TDI scores of the nudged members, with the exception of member 1n, are lower, indicating superior accuracy in predicting the timing of ramping events.
It should be noted that the accuracy of the forecasting could be further enhanced. Both observation and analysis nudging techniques are significantly influenced by factors such as the radius of influence and the specific parameters of the nudging process. While a sensitivity study to explore these aspects could provide valuable insights and improved metrics, it was beyond the scope of this research.
Figure 3 provides a probabilistic perspective on the ramp rate forecast, illustrating the median, maximum, and minimum power production values for both nudged and non-nudged members. The median value offers an estimate of the most probable ramp magnitude, while the ensemble spread reflects the associated uncertainty. For example, during the interval between 2023-05-15 12:00:00 and 2023-05-15 13:00:00, when multiple ramp events are observed, the nudged members predict an increase of 800 kW, albeit with considerable uncertainty as indicated by the ensemble spread. However, this range of uncertainty still encompasses the actual value recorded by the SCADA system: a 910 kW power increase. Grid operators can leverage this information to establish threshold values for activating grid protection mechanisms, considering various cost factors. These include the cost incurred for responding to a forecasted event, the cost arising from unnecessary purchases of ancillary services, and the cost associated with unanticipated events that were not forecasted and thus not prepared for. By balancing these costs, operators can more effectively manage grid stability and efficiency.

Figure 3.
Power production probabilistic forecast. The band thickness is given by the ensemble members with the lowest and highest values at each time step.
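The band shown in figure 3 can be reproduced in principle with a few lines. The sketch below uses hypothetical member values and function names; it computes the per-time-step minimum, median and maximum across ensemble members and the fraction of observations that fall inside the band.

```python
from statistics import median


def ensemble_band(members):
    """Per-time-step (minimum, median, maximum) across ensemble members.

    `members` is a list of equally long power series, one per member.
    """
    steps = list(zip(*members))
    lower = [min(s) for s in steps]
    middle = [median(s) for s in steps]
    upper = [max(s) for s in steps]
    return lower, middle, upper


def coverage(observed, lower, upper):
    """Fraction of observations that fall within the ensemble spread."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


# Hypothetical power forecasts (kW) from four ensemble members at three time steps.
members = [
    [100, 400, 900],
    [150, 500, 1000],
    [120, 300, 700],
    [90, 450, 850],
]
lower, middle, upper = ensemble_band(members)
share_inside = coverage([110, 520, 910], lower, upper)
```

An operator could compare such a coverage figure against the cost of false alarms and missed events when choosing activation thresholds.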

Conclusions
This study presents a methodological advancement in the forecasting of wind power ramp events, utilizing data assimilation of SCADA measurements within an ensemble of Weather Research and Forecasting (WRF) models. The approach leverages observational data from multiple wind farms in France and Belgium, aiming to refine the predictive capabilities of the WRF models, particularly for wind speed and ramp event timing.
The findings of this research indicate that incorporating grid and observational nudging techniques within the data assimilation process results in a reduction of the Mean Absolute Error (MAE) and an improvement in the Probability of Detection (POD) of ramp events. The analysis demonstrates that the ensemble members enhanced with nudging techniques generally exhibit superior performance compared to their non-nudged counterparts, particularly in terms of accurately detecting true ramp events and minimizing false alarms. The improvement is highly dependent on the availability of measurements, which dictates the forecast horizon over which improvements are observed. Moreover, the study highlights the utility of an ensemble-based approach in forecasting. This approach provides essential information for grid operators, facilitating more effective grid stability management.
Overall, this research contributes to the field of wind power forecasting by illustrating the potential benefits of integrating SCADA measurements with ensemble-based WRF models. This advancement is particularly relevant for the effective integration of renewable energy sources into power grids, optimizing system planning, and enhancing the operation of electricity markets.

Table 1.
WRF model configuration.

Table 2 presents the WRF input parameters chosen empirically to compute the weight Wα,i and define the nudging strength Gα.

Table 2.
Observational nudging settings. Nudging is done only on the parent domain.
Parameter | Value | Description
auxinput11_interval | 1 | How frequently the model should check to see if additional observations are available (in units of coarse grid time steps).
auxinput11_end_h | 99999 | End time for reading observations (in hours).

Table 3.
Analysis nudging settings. Nudging is done only on the parent domain.
Parameter | Value | Description
gq | 0.0000 | Nudging strength for upper air mixing ratio analysis (deactivated).

Table 4.
Ensemble member description. Other physical and numerical parameters are identical.