Performing Experimentation with Physics Model to Predict Statistical Weather Condition

Icing on wind turbines can cause a substantial reduction in energy production. Because ice on turbines is not part of traditional weather prediction data, power predictions can have large errors. This work proposes a statistical approach based on naive Bayes regression to predict icing-related production loss. It takes regional weather conditions and various other conditions as input and predicts power production loss for the next 48 hours, in order to improve prediction of upcoming energy loss. The model can be trained with various prediction measurements and substantially improves on conventional approaches over longer periods. It can reduce the absolute production error by ∼100 kW, and its skill is compared with that of other models. Weather prediction data is one of the most effective inputs for such statistical prediction, although some of the calculations are not exact. The method is computationally inexpensive and can be retrained for each new prediction.


Introduction
Accurate wind power prediction is essential for balancing power demand and energy production. In cold climates, ice on wind turbines can lead to various problems [1]. Through changes in aerodynamic balance, induced vibration and increased loads, ice can result in substantial production losses. At the end of 2016, 25% of the total installed wind energy capacity was located in cold environments, which makes ice an important factor in energy production [2]. Ice on structures such as wind turbines or power lines has been modelled with physics-based ice models [3]. Complex terrain, the small scale of the structures and the difficulty of measuring ice make modelling ice and ice growth challenging. The author in [4] demonstrated that using various NWP (numerical weather prediction) models or NWP ensemble predictions as input to an ice model can make load prediction extremely complex [5]. Introducing even small variations in some of the input parameters of the ice model also changed the ice accumulation computed by the applied ice model. Icing predictions from physics-based ice models are therefore generally uncertain [6]. Physics-based weather prediction has been used to model icing-related production losses, and it shows some skill in estimating production loss [7]. However, because the underlying weather predictions are uncertain, these estimates are not fully reliable.
Machine learning has previously been used to build icing-related production loss models; however, to the authors' knowledge, all of these were derived from physics-based ice model output combined with power curves [8]. Given records of historical weather predictions and on-site measurements, icing-related prediction can instead be done in a statistical manner [9], by combining weather prediction data with a machine learning or statistical approach that predicts production losses. Machine learning has been applied widely in geoscientific modelling, as described by the author in [10], and also specifically in connection with weather prediction.
One general direction is progress towards entirely replacing conventional numerical weather prediction approaches, although so far only for simplified realities [11]. In addition, there has been considerable progress in using neural networks to modify parameterizations in climate models [12].
In this work, a machine learning approach, naive Bayes regression, is trained on previous weather predictions [13]. An upper limit of 48 hours was chosen because it is the maximum lead time of the regional NWP model that was used. Predictions for the 48 hours following initialization (given in UTC) are the most crucial; this range is referred to here as the "next few days".
The following technical sections describe the input data, explain the selection of the machine learning approach and describe the method used. The results section discusses the skill of the power loss prediction and how it translates to actual power predictions [14]. It also discusses the importance of weather prediction data versus on-site measurements and of the individual input variables [15].

Data
For this analysis, weather predictions were taken from weather prediction data available online and used as input for further processing. The data are described in the following sections.

Observations
This work uses anonymized production data together with meteorological wind observations from sites located between 58° N and 60° N, on hills or in varied terrain. Each site has numerous wind turbines. Turbines at several stations have de-icing systems to remove ice growth. However, these systems may not work perfectly, and such turbines may still experience periods of production loss; turbines with de-icing systems are therefore included in this investigation. Only one 15-minute value per hour is used for verification. To derive production loss from measured production and wind speed, the production that would be expected at a given wind speed (the potential production) has to be known. For this, a simple production model is constructed that relates wind speed at the site to production capacity, that is, a power curve. Because this curve carries no information about ice occurrence, only threshold values are set for all these days.
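The relation between power curve, potential production and production loss can be sketched as follows. This is a minimal illustration, not the production model used in this work: the power curve values, the function names and the choice to floor the loss at zero are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical power curve: (wind speed in m/s, power in kW), sorted by wind speed.
# These numbers are illustrative only and do not come from the sites in this work.
POWER_CURVE = [(0.0, 0.0), (3.0, 0.0), (12.0, 2000.0), (25.0, 2000.0)]


def potential_production(wind_speed, curve=POWER_CURVE):
    """Linearly interpolate the power curve at the given wind speed.

    Wind speeds outside the tabulated range are clamped to the end values.
    """
    speeds = [s for s, _ in curve]
    if wind_speed <= speeds[0]:
        return curve[0][1]
    if wind_speed >= speeds[-1]:
        return curve[-1][1]
    i = bisect_right(speeds, wind_speed)
    (s0, p0), (s1, p1) = curve[i - 1], curve[i]
    return p0 + (p1 - p0) * (wind_speed - s0) / (s1 - s0)


def production_loss(wind_speed, observed_power, curve=POWER_CURVE):
    """Potential production minus observed production.

    Flooring at zero is an assumption of this sketch, so that turbines
    producing above the curve are not counted as negative loss.
    """
    return max(0.0, potential_production(wind_speed, curve) - observed_power)


if __name__ == "__main__":
    # One observation: wind speed of 7.5 m/s with 600 kW measured.
    print(potential_production(7.5), production_loss(7.5, 600.0))
```

A tabulated curve with linear interpolation is used here only because it is simple; a fitted curve per site would serve the same purpose.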

Data splitting
In many machine learning tasks, algorithms are prone to "over-fitting": the algorithm works well on the data seen during training but not on new data (it does not generalize). The model should therefore be evaluated on a separate test set. Because the data have non-zero auto-correlation, a simple random split of individual predictions into testing and training sets is not suitable, as it would lead to over-confident results (the algorithm would learn from samples correlated with the test samples). Instead, the data are split into chunks, and these chunks are randomly distributed between training and testing (20% and 80%). The training set generated in this way is used for selecting and tuning the machine learning approaches.
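The chunk-wise split described above can be sketched as follows. The function name, the chunk size and the random seed are assumptions of this example, and the share of chunks assigned to each set is left as a parameter.

```python
import random


def chunked_split(n_samples, chunk_size, test_fraction, seed=0):
    """Split sample indices into train and test sets chunk by chunk.

    Consecutive samples stay together in one chunk, so strongly
    auto-correlated neighbours do not end up on both sides of the split.
    """
    chunks = [
        list(range(start, min(start + chunk_size, n_samples)))
        for start in range(0, n_samples, chunk_size)
    ]
    rng = random.Random(seed)
    rng.shuffle(chunks)  # randomize which chunks go to which set
    n_test = round(test_fraction * len(chunks))
    test = sorted(i for chunk in chunks[:n_test] for i in chunk)
    train = sorted(i for chunk in chunks[n_test:] for i in chunk)
    return train, test


if __name__ == "__main__":
    train_idx, test_idx = chunked_split(n_samples=100, chunk_size=10, test_fraction=0.2)
    print(len(train_idx), len(test_idx))
```

Samples near chunk boundaries can still be correlated with a neighbouring chunk in the other set; longer chunks reduce this effect at the cost of fewer independent chunks.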
In addition to the randomized split, this work evaluates the method in a way that resembles operational practice: for each prediction initialization, the model is trained on all prediction data available up to 5 days earlier, and that prediction is then used as the validation sample, so every prediction is evaluated with a model trained only on an earlier time period. Because the method is computationally inexpensive, in an operational context it can easily be retrained for every new prediction. This scheme is referred to as operational validation throughout this work.
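One possible reading of the operational validation is a walk-forward loop in which the model is retrained before every new prediction. The sketch below assumes that reading. The 5-day gap between the last training sample and the initialization day, the sample layout `(init_day, features, target)`, and the trivial mean model standing in for the real learner are all assumptions made for illustration.

```python
def fit_mean(train):
    """Placeholder learner: predicts the mean target of the training samples."""
    targets = [y for _, _, y in train]
    return sum(targets) / len(targets)


def predict_mean(model, features):
    """The placeholder model ignores the features."""
    return model


def operational_validation(samples, fit, predict, gap_days=5):
    """Walk forward in time, retraining before every initialization day.

    samples: list of (init_day, features, target) tuples.
    For each initialization day, only samples at least `gap_days`
    older than that day are used for training (assumed interpretation).
    Returns a list of (init_day, prediction, target).
    """
    results = []
    for day in sorted({d for d, _, _ in samples}):
        train = [s for s in samples if s[0] <= day - gap_days]
        if not train:
            continue  # nothing old enough to train on yet
        model = fit(train)
        for d, features, target in samples:
            if d == day:
                results.append((d, predict(model, features), target))
    return results


if __name__ == "__main__":
    demo = [(day, float(day), float(day)) for day in range(10)]
    for row in operational_validation(demo, fit_mean, predict_mean):
        print(row)
```

Retraining inside the loop is only practical because the learner is cheap to fit; any learner with the same `fit` and `predict` shape can be substituted for the placeholder.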

Naive Bayesian
It is very hard to know in advance which machine learning approach will work well for a given problem, even when some approaches are known to work on related problems. In addition, most techniques have a number of settings known as hyper-parameters, which have to be specified before training. Recently, neural networks with many layers of neurons (termed deep learning) have been used for a wide range of applications; however, deep learning generally requires a very large amount of training data to reach its full potential.
Deep learning approaches are therefore not considered in this work. Instead, several approaches were tested: neural networks with hidden layers, random forest, support vector machine, multi-linear regression and naive Bayes. Each was tested on the online available dataset with various hyper-parameters over the training set (except multi-linear regression, which has no hyper-parameters). For this, the training data were partitioned into 3 equal parts. Every technique was trained on 2/3 of the training data and evaluated on the remaining 1/3. This procedure was repeated 3 times, and the average value was computed. The approach with the best average skill, naive Bayes, was chosen. The configurations of all these techniques, including their hyper-parameters, are given with the corresponding code.
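The three-part selection procedure can be sketched as a 3-fold cross-validation. The example below is illustrative only: RMSE as the score, contiguous folds, and the two stand-in candidates (a mean predictor and a one-variable least-squares line) are assumptions, and none of them is one of the techniques actually compared in this work.

```python
import math


def rmse(pred, obs):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def k_fold_rmse(xs, ys, fit, predict, k=3):
    """Average RMSE over k contiguous folds: train on k-1 parts, score on the rest."""
    n = len(xs)
    bounds = [round(i * n / k) for i in range(k + 1)]
    scores = []
    for i in range(k):
        lo, hi = bounds[i], bounds[i + 1]
        model = fit(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:])
        preds = [predict(model, x) for x in xs[lo:hi]]
        scores.append(rmse(preds, ys[lo:hi]))
    return sum(scores) / k


# Stand-in candidate 1: always predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


# Stand-in candidate 2: one-variable least-squares line.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    xs = [float(i) for i in range(9)]
    ys = [2.0 * x + 1.0 for x in xs]
    candidates = {"mean": (fit_mean, predict_mean), "line": (fit_line, predict_line)}
    scores = {name: k_fold_rmse(xs, ys, f, p) for name, (f, p) in candidates.items()}
    print(min(scores, key=scores.get), scores)
```

With RMSE as the score, the candidate with the lowest average is the one with the best average skill. The same loop can be run once per hyper-parameter setting of each technique.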

Numerical results
The simulation is carried out in the MATLAB environment. The power prediction of the naive Bayes method is compared with that of a physics-based ice model, using RMSE as the metric. The time range considered is 18-20 h. The unbiased RMSE is given in Eq. (1). Two kinds of prediction are used: deterministic and ensemble prediction. The mean over the entire period is studied. Because naive Bayes is used as a classifier, the data must be partitioned into testing and training sets. The evaluation is performed on a subset of the data; it nevertheless permits a reasonable evaluation.
The NB-based prediction is slightly higher than that of the deterministic physics model, as shown in Fig 1. Compared with the mean prediction, it is slightly better at some stations and slightly worse at others, as shown in Table 1.
Eliminating any single input variable has no substantial impact. This suggests that a removed variable can be replaced by the remaining significant features. When snow is eliminated, cloud ice becomes the most important variable after wind speed, as shown in Fig 3. When both snow and ice are eliminated, there is no significant loss in the skill of the naive Bayes predictions. The model can be trained on the variables from the NWP forecast. When no NWP information is available, the naive Bayes prediction reduces to simple persistence. The NWP forecast is thus the most useful input; adding on-site measurements gives an increase that is not statistically significant.
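Eq. (1) is not reproduced in this text. As a hedged illustration, the sketch below assumes the commonly used definition of unbiased RMSE, in which the mean of the predictions and the mean of the observations are removed before the RMSE is computed. The function names are chosen for this example only.

```python
import math


def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def unbiased_rmse(pred, obs):
    """RMSE after removing the mean of each series (assumed definition).

    Equivalent to sqrt(RMSE^2 - bias^2), where bias is the mean error.
    """
    mean_pred = sum(pred) / len(pred)
    mean_obs = sum(obs) / len(obs)
    return math.sqrt(
        sum(((p - mean_pred) - (o - mean_obs)) ** 2 for p, o in zip(pred, obs))
        / len(obs)
    )


if __name__ == "__main__":
    predicted = [1.0, 2.0, 6.0]
    observed = [1.0, 2.0, 3.0]
    print(rmse(predicted, observed), unbiased_rmse(predicted, observed))
```

Under this definition, a prediction that is off by a constant offset has a non-zero RMSE but an unbiased RMSE of zero, so the metric isolates errors in the variations from errors in the mean level.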