On probability distributions of wind speed data in Malaysia

The rapid depletion of fossil fuel resources and the environmental impacts of their use have made alternative resources such as wind energy one of the important renewable sources. The statistical characteristics of wind speed and the selection of suitable wind turbines are essential to evaluate wind energy potential and design wind farms effectively. Hence, an accurate analysis of wind speed data is crucial before a detailed assessment of energy potential is conducted. To this end, probability distributions of wind speed data are considered and their parameters are estimated. This study considers three commonly selected distributions, namely the Weibull, Gamma and Logistic distributions. These distributions are fitted to the wind speed data for sixteen stations in Malaysia. The parameters are estimated by the maximum likelihood method. The efficiency of the fitted distributions is analysed, and the goodness of fit is assessed using the Kolmogorov-Smirnov test. The results show that the Gamma distribution is the most suitable distribution for the wind speed data in Malaysia, as it gives the best fit for thirteen stations. The Logistic distribution is found to be the best distribution for the other three stations. The graphical method also agrees with the analytical result.


Introduction
Among the renewable energy sources, wind energy has become an encouraging option because it does not produce carbon dioxide or release harmful products that cause environmental degradation or negatively affect human health, such as smog, acid rain or heat-trapping gases. A number of researchers across the world have studied the probability distributions of wind speed, because wind speed is the most crucial parameter in designing a wind energy conversion system. An important factor in implementing wind turbines is the availability of strong wind. Even though Malaysia is known to experience low wind speeds, so that wind turbines are generally not regarded as a feasible renewable energy option, there are some areas in Malaysia where strong wind exists during certain periods of the year.
Malaysia experiences two main weather seasons: the southwest monsoon, which occurs from May or June until September, and the northeast monsoon, from November to March. Wind speeds during the northeast monsoon are higher than those during the southwest monsoon, especially on the east coast of Peninsular Malaysia [1]. Moreover, from April to September, typhoons striking neighbouring countries such as the Philippines may cause strong winds, even exceeding 10 m/s, in Sabah and Sarawak.
At present, Malaysia is still in the process of developing wind energy conversion systems. According to the Asia Wind Energy Association, towns on the east coast of Peninsular Malaysia such as Mersing, Kota Bharu and Kuala Terengganu experience stronger wind, with monthly mean wind speeds that can exceed 3 m/s. East Malaysian towns such as Kota Kinabalu and Labuan also have higher wind speeds than the national average. In fact, a 150 kW wind turbine generator hybrid system was constructed and installed at Terumbu Layang-layang, Sabah in 1995 by Tenaga Nasional Berhad (TNB) Research Sdn. Bhd., as this place was found to possess the greatest wind energy potential compared to other places in Malaysia [2].
Malaysia has also built a 100 kW wind turbine system hybridized with 100 kW solar photovoltaics (PV) and 100 kW diesel generation at Perhentian Island, Terengganu, but this project was stopped due to some issues and to findings that were not convincing enough that wind energy could be generated successfully. Thus, an accurate determination of the probability distribution of wind speed is a very important step in evaluating the wind energy potential of a particular region and estimating the average wind turbine power output.
Several studies on wind energy assessment and mapping have been carried out with respect to its application, potential, performance, optimization and integration with other kinds of power generation systems, which can be very helpful at the early stage of the development process. Ozay and Celiktas [3] stated that the Weibull distribution function can be used to forecast wind speed, wind density and wind energy potential, and found it to be the best fit for the wind characteristics of the region they considered.
Sopian et al. [4] analyzed the wind energy potential over 10 years, from 1982 to 1991, for 10 different sites by using the Weibull distribution. The studies in [5][6][7] also considered the Weibull distribution and reported that it gave a good fit and the best result compared to other distributions. Although the Weibull distribution is well accepted and provides a number of advantages, it cannot represent all wind regimes encountered in nature [8].
Tosunoğlu [9] evaluated the performance of six selected distributions and assessed their fit with five goodness of fit criteria, namely the Akaike Information Criterion, the Anderson-Darling test, the Bayesian Information Criterion, the Cramer-von-Mises test and the Kolmogorov-Smirnov test. The result from that study showed that the Generalized Extreme Value and Logistic distributions were the most suitable models for characterizing the annual wind speed data. Wu et al. [10]
Various studies have also been done concerning suitable PDFs for mathematically describing the wind speed frequency distributions in Malaysia, as Malaysia is also contributing to the development of wind energy systems. Zaharim et al. [11] fitted three distributions, namely Burr, Lognormal and Frechet, to the data collected in Cameron Highlands from 2000 to 2009. The result showed that the Burr distribution fitted the data well. Najid et al. [12] considered the Weibull, Gamma and Burr distributions as the models in their study. Comparing the goodness of fit values from the Kolmogorov-Smirnov, Anderson-Darling and Chi-Square tests, the Burr distribution satisfied the statistical decision criteria and fitted the data collected in Terengganu well.
The Weibull distribution was chosen to fit the data at Mersing, and the results showed that this town has high potential for small-scale wind turbine systems for power generation [13][14][15]. In addition, Daut et al. [16] applied the Weibull function to analyse the wind speed characteristics based on data collected in 2006 and calculated the wind power generation potential at Kangar and Chuping. The monthly mean wind power and energy density showed that the beginning (January to March) and the end (December) of the year have high wind power and energy potential, while the middle of the year has low potential. It is therefore necessary to develop wind power generation capacity capable of harnessing the limited wind resource available in Perlis.
Masseran et al. [17] fitted nine types of distribution in their study, and the result showed that the Gamma distribution provided the best fit to the wind speed data observed at 22 stations and was the most frequently selected distribution. In another study [18] in 2018, the Gamma distribution was found to be the most accurate model for the wind speed data collected at Kuantan and Balok Baru stations. Various types of distribution function can therefore be considered for the wind speed characteristics, which can help to determine the potential stations for wind power generation in Malaysia. Hence, this study evaluates some of the distributions most commonly used in past research on wind speed data, namely the Weibull, Gamma and Logistic distributions.

Wind speed data
The data in this study are obtained from the Malaysian Meteorological Department. Sixteen stations are selected for this study: Senai, Alor Setar, Kuantan, Labuan, KLIA, Subang, Kuala Terengganu, Kuching, Mersing, Kota Bharu, Kota Kinabalu, Muadzam Shah, Sitiawan, Bayan Lepas, Chuping and Melaka. The data consist of daily wind speeds measured in metres per second. The period of data is 24 years for fifteen stations; the exception is KLIA, which has eighteen years. To start with, the autocorrelation function (ACF) procedure is performed to determine whether the data values are correlated with each other as a function of the number of time steps by which they are separated.
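The ACF check described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the procedure actually used in the study; the function names, the maximum lag of 20, the large-sample bounds of ±1.96/√n and the synthetic data are all assumptions made for the example.

```python
import math
import random

def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for lag in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t + lag] - mean)
                  for t in range(n - lag))
        acf.append(num / denom)
    return acf

def fraction_within_bounds(series, max_lag=20):
    """Share of lags whose autocorrelation lies inside the approximate
    95% limits +/- 1.96 / sqrt(n), together with the bound itself."""
    bound = 1.96 / math.sqrt(len(series))
    acf = autocorrelation(series, max_lag)
    inside = sum(1 for r in acf if abs(r) <= bound)
    return inside / max_lag, bound

# Synthetic daily wind speeds (hypothetical, NOT the station data):
# independent Weibull draws with scale 2.0 and shape 2.0.
rng = random.Random(0)
speeds = [rng.weibullvariate(2.0, 2.0) for _ in range(2000)]
share, bound = fraction_within_bounds(speeds)
print(f"share of lags inside +/-{bound:.4f}: {share:.2f}")
```

A share close to one is consistent with treating the observations as approximately uncorrelated, which is the condition checked before the distributions are fitted.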

Wind speed distributions and parameters estimation
The wind speed distribution is one of the important wind characteristics, not only for structural and environmental design and analysis but also for assessing the wind energy potential and the performance of the wind conversion system. Hence, an accurate distribution must be selected before further analysis is carried out. The summary of probability density functions (PDFs) of the selected statistical distributions is given in table 1.

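Table 1. Probability density functions and distribution functions of the selected distributions.
Distribution | Probability density function | Distribution function
Weibull | $f(v) = \frac{k}{c} \left( \frac{v}{c} \right)^{k-1} \exp\left[ -\left( \frac{v}{c} \right)^{k} \right]$ | $F(v) = 1 - \exp\left[ -\left( \frac{v}{c} \right)^{k} \right]$
Gamma | $f(v) = \frac{v^{\alpha-1}}{\beta^{\alpha} \Gamma(\alpha)} \exp\left( -\frac{v}{\beta} \right)$ | $F(v) = \frac{\gamma(\alpha, v/\beta)}{\Gamma(\alpha)}$
Logistic | $f(v) = \frac{\exp\left[ -(v-\mu)/s \right]}{s \left\{ 1 + \exp\left[ -(v-\mu)/s \right] \right\}^{2}}$ | $F(v) = \frac{1}{1 + \exp\left[ -(v-\mu)/s \right]}$
To calculate the Weibull shape parameter $k$ and scale parameter $c$, the following approximations are used:
$k = \left( \frac{\sigma}{\bar{v}} \right)^{-1.086}, \qquad c = \frac{\bar{v}}{\Gamma(1 + 1/k)}$,
where $\bar{v}$ is the average wind speed, $\sigma$ is the standard deviation and $\Gamma(\cdot)$ is the Gamma function.
The Gamma probability density function $f(v)$ also has two parameters, namely the scale parameter $\beta$ and the shape parameter $\alpha$ [19]. Both parameters take positive values. The scale parameter $\beta$ has the effect of stretching or compressing the range of the Gamma distribution. The $\gamma(\alpha, v/\beta)$ in the cumulative distribution function is the lower incomplete Gamma function. For the Logistic distribution, $\mu$ is the location parameter and $s$ is the scale parameter.
The parameters of the selected probability distributions are estimated by using the maximum likelihood method (MLM). Estimation by the MLM selects the parameter estimates that give the maximum probability of occurrence of the observations. Statistical software (Excel XLSTAT and Minitab) is used to analyse the data and estimate the parameters. Since wind is moving air and has kinetic energy, the power available from wind per unit area can be obtained and, based on the fitted parameters, the average wind power density for each distribution can be obtained. However, this study focuses only on which distribution suits the wind speed data in Malaysia well.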
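As an illustration of the two estimation routes mentioned for the Weibull distribution, the sketch below computes the moment-based approximation and a maximum likelihood estimate in plain Python. It is not the XLSTAT or Minitab procedure used in the study. The exponent -1.086 is the commonly quoted empirical approximation and is assumed here, as are the function names, the bisection bracket and the synthetic sample.

```python
import math
import random
import statistics

def weibull_moment_approx(speeds):
    """Approximate shape k and scale c from the mean and standard deviation:
    k = (sigma / mean)^(-1.086), c = mean / Gamma(1 + 1/k)."""
    mean = statistics.fmean(speeds)
    sigma = statistics.stdev(speeds)
    k = (sigma / mean) ** -1.086
    c = mean / math.gamma(1.0 + 1.0 / k)
    return k, c

def weibull_mle(speeds, lo=0.05, hi=50.0, tol=1e-10):
    """Maximum likelihood estimates of (k, c), found by bisection on the
    likelihood equation for k. Requires strictly positive wind speeds."""
    logs = [math.log(v) for v in speeds]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # sum(v^k ln v) / sum(v^k) - 1/k - mean(ln v), increasing in k
        powers = [v ** k for v in speeds]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    # Given k, the scale estimate has a closed form
    c = (sum(v ** k for v in speeds) / len(speeds)) ** (1.0 / k)
    return k, c

# Synthetic sample (hypothetical, NOT the station data): shape 2.0, scale 3.0
rng = random.Random(1)
sample = [rng.weibullvariate(3.0, 2.0) for _ in range(5000)]
print("moment approximation:", weibull_moment_approx(sample))
print("maximum likelihood:  ", weibull_mle(sample))
```

On a large sample the two estimates are expected to be close to each other. Calm days recorded as 0 m/s would need separate handling, because the likelihood equation uses the logarithm of each observation.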

Goodness of fit test
The Kolmogorov-Smirnov (K-S) test is used to evaluate the goodness of fit of the selected distributions for the wind speed data, as it is commonly used to find the best fitted distribution. The Kolmogorov-Smirnov statistic is the maximum difference between the cumulative probability of the specified distribution and the empirical distribution function, and is calculated based on the index of the data point:
$D = \max_{1 \le i \le n} \left( F(v_i) - \frac{i-1}{n}, \; \frac{i}{n} - F(v_i) \right)$,
where $v_i$ denotes the $i$th entry of the observed wind speed data in ascending order, $F(v_i)$ is the cumulative distribution function of the specified distribution and $n$ is the total number of observations. An attractive feature of this test is that the distribution of its test statistic does not depend on the underlying cumulative distribution function being tested. The conclusion is made based on the null and alternative hypotheses:
$H_0$: The data follow the specified distribution.
$H_1$: The data do not follow the specified distribution.
The test statistic $D$ is compared with the critical value in the K-S table. If $D$ is greater than the critical value, the null hypothesis is rejected; otherwise, there is not enough evidence to reject it. Among the fitted distributions, the one with the smallest value of $D$ is taken as the best fit.
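The K-S statistic described above can be computed directly from the sorted observations. The sketch below is a minimal Python illustration under stated assumptions: the function names are hypothetical, the Weibull CDF is only an example of a specified distribution, the critical value uses the large-sample approximation 1.36/√n at the 5% level rather than a lookup in the K-S table, and the data are synthetic.

```python
import math
import random

def weibull_cdf(v, k, c):
    """Weibull distribution function with shape k and scale c."""
    return 1.0 - math.exp(-((v / c) ** k))

def ks_statistic(speeds, cdf):
    """Largest gap between the specified CDF and the empirical
    distribution function, checked on both sides of each step."""
    xs = sorted(speeds)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, f - (i - 1) / n, i / n - f)
    return d

def ks_critical_value_5pct(n):
    """Large-sample approximation to the 5% critical value."""
    return 1.36 / math.sqrt(n)

# Synthetic sample (hypothetical, NOT the station data): shape 2.0, scale 2.5
rng = random.Random(0)
sample = [rng.weibullvariate(2.5, 2.0) for _ in range(1000)]
d = ks_statistic(sample, lambda v: weibull_cdf(v, 2.0, 2.5))
crit = ks_critical_value_5pct(len(sample))
print("D =", d, "critical value =", crit, "reject H0:", d > crit)
```

Repeating the calculation with the Gamma and Logistic distribution functions and comparing the resulting values of D gives the ranking used to choose the best fitted distribution.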

Results and discussion
The data for this study can be treated as random, as almost all of the autocorrelations fall within the 95% confidence limits. Thus, the estimation procedure is carried out. The descriptive analysis is shown in table 2. Among the sixteen stations, Mersing has the highest maximum wind speed, with a value of 9.3 m/s, followed by Kota Bharu and Kuala Terengganu stations with values of 8.6 m/s and 8.0 m/s, respectively. Mersing also has the highest mean wind speed, 2.7750 m/s over the 24-year period.
A minimum wind speed of 2 m/s is required to start rotating most small wind turbines [15]. The standard deviations for the stations are considered moderate, ranging from 0.41 m/s to 1.04 m/s, which indicates that the recorded data do not deviate much from their means. Fifteen stations are skewed to the right, as their skewness values are greater than zero, while Kuching station is skewed to the left, with a negative skewness value of -0.0351.
As mentioned, the parameters of each distribution are estimated using the maximum likelihood method. Table 3 shows the parameter estimates for the Weibull, Gamma and Logistic distributions. These estimates are then used in the respective probability density functions. The parameters of each distribution are very important for evaluating the wind power density function, because suitable parameter values are needed for selecting the locations to install wind turbine generators and for determining whether a wind farm is suitable.
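To illustrate how fitted parameters feed into the wind power density, the sketch below evaluates the mean power per unit area, 0.5 ρ E[v³], for the Weibull and Gamma distributions. The source does not give these formulas, so this is an assumption-laden example: it uses the standard third moments of the two distributions, an assumed air density of 1.225 kg/m³, and hypothetical function names and parameter values.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, assumed standard value (not from the study)

def weibull_power_density(k, c, rho=AIR_DENSITY):
    """Mean wind power density in W/m^2 for a Weibull fit:
    0.5 * rho * E[v^3], with E[v^3] = c^3 * Gamma(1 + 3/k)."""
    return 0.5 * rho * c ** 3 * math.gamma(1.0 + 3.0 / k)

def gamma_power_density(shape, scale, rho=AIR_DENSITY):
    """Mean wind power density in W/m^2 for a Gamma fit:
    E[v^3] = scale^3 * shape * (shape + 1) * (shape + 2)."""
    third_moment = scale ** 3 * shape * (shape + 1.0) * (shape + 2.0)
    return 0.5 * rho * third_moment

# Hypothetical parameter values, chosen only to show the call pattern
print(weibull_power_density(k=2.0, c=2.5))
print(gamma_power_density(shape=4.0, scale=0.5))
```

The Logistic distribution is left out of this sketch because its support includes negative values, so its third moment is not a wind power density without further treatment.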
The estimated shape parameter for the Gamma distribution varies between 2.86 and 15.14, with the highest values indicating a more peaked wind speed distribution, while for the Weibull distribution it varies between 1.69 and 3.65. Similarly, the estimated location parameter for the Logistic distribution varies between 0.85 m/s and 2.77 m/s.
Although the Weibull distribution has been widely used in past studies to analyse wind speed data, it does not fit the data in this study well. This can be seen from the values of the test statistics, most of which are not the smallest when compared to the other two distributions. For three stations (Kota Kinabalu, Subang and Bayan Lepas), Weibull is the least favourable distribution, with the largest test statistic values of 0.1255, 0.0939 and 0.1107, respectively. Hence, based on the goodness of fit assessment, different stations have different suitable probability distributions for the wind speed data.
The performances of the selected distributions are confirmed by the graphical method shown in figures 1 to 16. The graphs of the probability density function and cumulative distribution function show that the Gamma distribution fits the data well for 13 stations. The other three stations, which are not well fitted by the Gamma distribution, are Kuching, Sitiawan and Alor Setar. The Kuantan histogram shows that the Gamma distribution captures the middle of the data better than the other distributions. It is also observed that the wind speeds are concentrated between 1.5 m/s and 2.5 m/s. Thus, the goodness of fit results and the graphs are evidence that the Gamma distribution is the most favourable distribution for Kuantan station.
Ten stations give the same result as Kuantan: KLIA, Kota Bharu, Kota Kinabalu, Kuala Terengganu, Labuan, Melaka, Mersing, Senai, Subang and Bayan Lepas. Based on their histograms, most of the distributions follow the data, and the Gamma distribution captures the middle of the data. Hence, we can conclude that the Gamma distribution fits these stations well.
The histograms of Chuping and Muadzam Shah show that the Gamma distribution captures the tail of the data. Chuping is categorized as one of the driest areas in Malaysia, and most of the data for this station are within the range of 0.5 m/s to 1.5 m/s. The CDF plots for these two stations also show that the Gamma distribution is closer to the empirical distribution than the Weibull and Logistic distributions.
The histogram for Kuching station shows that the Logistic distribution captures the middle of the data better than the other distributions. Most of the wind speed data for this station are within the range of 1.0 m/s to 2.2 m/s. Similarly, for Sitiawan and Alor Setar stations, the most favourable distribution is the Logistic distribution. The graphs show that it captures the middle and tail of the data. Their CDF graphs also show that the Logistic distribution is closer to the empirical distribution.

Conclusion
The Gamma distribution is found to be the best distribution for 13 stations, while the Logistic distribution performs well for the other three stations, which are Kuching, Sitiawan and Alor Setar. From this study, we conclude that the best fitted distribution differs between stations depending on their locations. Although wind energy is one of the most economical renewable energy sources, it requires a very detailed analysis of the selected region's wind speed characteristics. Hence, choosing an appropriate distribution to describe the variation of wind speeds is very important for optimizing the wind conversion system and thereby lowering energy generation costs. Investors in the wind industry who are interested in studying the wind speed in Malaysia are recommended to use the Gamma distribution, since it gave the best fit to the wind speed data at most of the stations considered.