Modelling wind direction data of Langkawi Island during the southwest monsoon in 2019 and 2020 using a bivariate linear functional relationship model with von Mises distribution

The weather in Malaysia is characterised by two monsoons, namely the southwest monsoon from May to September and the northeast monsoon from November to March. Wind direction is essential for observing weather patterns and the global climate. In this study, we investigate the relationship between the wind direction data of Langkawi Island, Malaysia, during the southwest monsoon in 2019 and 2020. It is important to highlight that wind direction data are circular, which requires statistical techniques different from those used to analyse linear data. In this paper, we model the relationship of the wind direction data using the bivariate functional relationship model with von Mises distribution. The advantage of this model is that it accounts for error terms in all variables. When modelling the data, outliers are identified using the COVRATIO method, which is based on row deletion. The covariance matrix of the parameter estimates is obtained from the Fisher information matrix. Q-Q plots for the von Mises distribution support the goodness of fit of the wind direction data to this distribution. Maximum likelihood estimation is then used to obtain the parameter estimates, and hence the model of the wind direction data. This study provides an improved understanding of the behaviour of wind direction and may be used for the prediction of wind energy in future.


Introduction
Dealing with circular data is inherently different from dealing with linear data because of the periodic nature of circular data, which are measured in degrees or radians. On the circle, 0° and 360° (or 0 and 2π radians) represent the same direction, whereas on a linear scale they would lie at opposite ends. For this reason, circular data require specialised methods of analysis [1].
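As a small illustration of why linear methods fail here, the sketch below (plain Python; the function names are our own, not from any library) compares the ordinary arithmetic mean with the circular mean direction, computed as the angle of the resultant of unit vectors, for two directions on either side of north.

```python
import math

def arithmetic_mean_deg(angles_deg):
    """Ordinary (linear) mean, which ignores the wrap-around at 360 degrees."""
    return sum(angles_deg) / len(angles_deg)

def circular_mean_deg(angles_deg):
    """Mean direction: angle of the resultant of the unit vectors, mapped to [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

# Two winds from just either side of north.
winds = [350.0, 10.0]
print(arithmetic_mean_deg(winds))  # 180.0, i.e. due south, which is misleading
print(circular_mean_deg(winds))    # close to 0 (north); rounding may show a value just below 360
```

The linear mean points in exactly the opposite direction to both observations, while the circular mean recovers north.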
Circular data occur in many areas such as astronomy, geology, meteorology, physics, biology, image analysis and medicine. In physics, for example, angular data are collected in bubble chamber experiments, where points representing events are observed through a circular window [2]. In geology, researchers studying pole reversal may be interested in the path of the Earth's magnetic pole, and others study the directions of river flow. In medical applications, knee flexion angles are measured to evaluate the rehabilitation of orthopaedic patients [3]. In political science, the cyclical timing of domestic terrorist attacks has been studied [4].
In any data set, outliers appear as anomalous observations. Because outliers can distort the fitted model, and hence the scientific conclusions drawn from it, they require careful attention [5]. This study therefore checks the data for the presence of outliers.
In this study, our aim is to propose a statistical model of wind direction. The wind direction data of Langkawi Island were obtained from the Malaysia Meteorological Department. Langkawi Island is a UNESCO Global Geopark and plays a crucial role in the development of heritage tourism [6]. The maximum daily wind direction of Langkawi Island was recorded at latitude 6°20'N and longitude 99°E, at an elevation of 6.4 m. Figure 1 shows the location of Langkawi Island in Malaysia.

Figure 1. Location of Langkawi Island in Malaysia
The rest of this paper is organised as follows. Section 2 describes the von Mises distribution used to analyse the data. Section 3 discusses the bivariate linear functional relationship model fitted to the data. Section 4 describes the parameter estimation, while Section 5 describes the outlier detection procedure. The results are presented in Section 6, and conclusions are drawn in Section 7.

Von Mises Distribution
The von Mises distribution is widely used in the statistical modelling of circular data. It has two parameters, namely the mean direction and the concentration parameter. When the concentration parameter takes its minimum value of zero, the distribution reduces to the uniform distribution on the unit circle [7]. The distribution is used widely in the literature. In 2006, the von Mises distribution was used to determine the strength of minerals, namely biotite and feldspar, in thin sections prepared parallel to the magnetic foliation plane [8]. In 2010, it was applied to study wind direction for weather prediction over the Pacific Northwest [9]. The von Mises distribution has also been used to describe the orientations of fault plane solutions before, during and after seismic swarm earthquakes from 2004 to 2008 [10]. In imaging, the distribution has been applied in image analysis [11].
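The probability density function of the von Mises distribution $VM(\mu, \kappa)$ is given by

$$f(\theta; \mu, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa \cos(\theta - \mu)\}, \qquad 0 \le \theta < 2\pi,$$

where $I_0(\kappa)$ is the modified Bessel function of the first kind and order zero, which can be defined by

$$I_0(\kappa) = \frac{1}{2\pi}\int_0^{2\pi} \exp(\kappa \cos\theta)\,d\theta = \sum_{r=0}^{\infty} \frac{1}{(r!)^2}\left(\frac{\kappa}{2}\right)^{2r},$$

$\mu$ is the mean direction and $\kappa$ is the concentration parameter, with $0 \le \mu < 2\pi$ and $\kappa \ge 0$.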
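The density above can be checked numerically. The sketch below is an illustration, not the authors' code: it evaluates $I_0(\kappa)$ from the power series and confirms that the density integrates to one over a full turn. The helper names `bessel_i0`, `von_mises_pdf` and `integrate_over_circle` are hypothetical.

```python
import math

def bessel_i0(kappa, tol=1e-16):
    """Modified Bessel function I_0 via its power series sum_r (kappa/2)^(2r) / (r!)^2."""
    term, total, r = 1.0, 1.0, 0
    while True:
        r += 1
        term *= (kappa / 2.0) ** 2 / (r * r)   # ratio of consecutive series terms
        total += term
        if term <= tol * total and r > kappa:  # stop once past the peak and negligible
            return total

def von_mises_pdf(theta, mu, kappa):
    """Density exp(kappa * cos(theta - mu)) / (2 * pi * I_0(kappa))."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def integrate_over_circle(f, m=2000):
    """Midpoint rule over [0, 2*pi); very accurate for smooth periodic integrands."""
    h = 2.0 * math.pi / m
    return h * sum(f((j + 0.5) * h) for j in range(m))

# Example: the density integrates to 1 for any mean direction and concentration.
total = integrate_over_circle(lambda t: von_mises_pdf(t, mu=0.5, kappa=1.29619))
print(total)  # close to 1
```

With $\kappa = 0$ the series gives $I_0(0) = 1$, so the density is the constant $1/(2\pi)$, the uniform case mentioned above.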

Bivariate Linear Functional Relationship Model
The functional relationship model is a type of error-in-variables model (EIVM). An EIVM differs from the ordinary, or classical, linear regression model in that the variables are masked by measurement error and are not observed directly [12]. In practice, observations are subject to measurement error, and ignoring it can compromise desirable properties of the estimators [13]. EIVM has been described as the most statistically correct technique for reactivity ratio estimation because it accounts for error in all variables [14]. Unlike ordinary regression, it allows for measurement error in both the x and y variables. In contrast, ordinary linear regression assumes that the explanatory variable x is fixed and measured without error, while only the response variable y carries an error term [15]. In EIVM, however, there is no distinction between 'explanatory' and 'response' variables [16].
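The effect of ignoring measurement error in the explanatory variable can be seen in a short simulation on linear data. In the sketch below (all values are arbitrary illustrative choices, and `ols_slope` is a hypothetical helper), ordinary least squares recovers the slope when x is error-free, but is pulled towards zero when x is observed with error, a phenomenon usually called attenuation or regression dilution. This only motivates EIVM; it is not an implementation of it.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x, treating x as error-free."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2019)
n, beta = 5000, 2.0
true_x = [rng.uniform(0.0, 10.0) for _ in range(n)]          # unobserved true values
y = [1.0 + beta * t + rng.gauss(0.0, 0.5) for t in true_x]   # response with its own error
x_obs = [t + rng.gauss(0.0, 2.0) for t in true_x]            # explanatory variable measured with error

print(ols_slope(true_x, y))  # close to the true slope 2
print(ols_slope(x_obs, y))   # attenuated; expected near 2 * 8.33 / (8.33 + 4), about 1.35
```

The attenuation factor is the ratio of the variance of the true x (8.33 for a uniform distribution on [0, 10]) to the variance of the observed x (8.33 + 4), which is why accounting for error in both variables matters.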
There are three types of EIVM: functional, structural and ultrastructural. In the functional relationship model, the underlying true variables are fixed [15]. In the structural relationship model, they are random, whereas the ultrastructural relationship model is a synthesis of the linear functional and structural relationship models. In EIVM, a nested iterative loop is recommended for obtaining the estimates of the variables and parameters [14].
In a linear functional relationship model, two variables are assumed to be linearly related, and the aim is to estimate the relationship between them; however, observations on both quantities are subject to error [17]. Fitting a linear relationship when the continuous variables are measured with error, that is, an error-in-variables model (EIVM), has been explored since the 19th century, when Adcock investigated estimation properties under restrictive assumptions in ordinary linear regression models [18]. In 2015, the bivariate linear functional relationship model was used to study the relationship of wind direction data collected from the Holderness coastline on the Humberside coast of the North Sea, United Kingdom [19].

Parameter Estimation
In this paper, the wind direction data are modelled using a linear functional relationship model for circular data. We use this model because our aim is to describe the relationship between the wind direction data of 2019 and 2020. The linear functional relationship model for circular data is given by

$$Y_i = (\alpha + X_i) \bmod 2\pi, \qquad i = 1, \dots, n,$$

where $\alpha$ is the rotation parameter and the true directions $X_i$ and $Y_i$ are observed with error as $x_i = X_i + \delta_i$ and $y_i = Y_i + \varepsilon_i$. Under the assumption of equal error concentration discussed by Mokhtar [20], the errors follow von Mises distributions, $\delta_i \sim VM(0, \kappa)$ and $\varepsilon_i \sim VM(0, \kappa)$. The parameters are estimated by the method of maximum likelihood. The log-likelihood function is

$$\log L(\alpha, \kappa, X_1, \dots, X_n; x, y) = -2n \log(2\pi) - 2n \log I_0(\kappa) + \kappa \sum_{i=1}^{n} \cos(x_i - X_i) + \kappa \sum_{i=1}^{n} \cos(y_i - \alpha - X_i)$$

for $0 \le \alpha < 2\pi$, $0 \le X_i < 2\pi$ and $\kappa > 0$, where $I_0(\kappa) = \frac{1}{2\pi}\int_0^{2\pi} \exp(\kappa\cos\theta)\,d\theta$ is the modified Bessel function of the first kind and order zero, and $\kappa$ is the concentration parameter of the error terms.
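To make the model concrete, the sketch below simulates data from it and evaluates the log-likelihood above. It assumes equal error concentrations and relies on Python's standard `random.vonmisesvariate`; `simulate` and `log_likelihood` are hypothetical helper names, and the parameter values are arbitrary.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def bessel_i0(kappa, tol=1e-16):
    """I_0(kappa) from the power series sum_r (kappa/2)^(2r) / (r!)^2."""
    term, total, r = 1.0, 1.0, 0
    while True:
        r += 1
        term *= (kappa / 2.0) ** 2 / (r * r)
        total += term
        if term <= tol * total and r > kappa:
            return total

def simulate(n, alpha, kappa, seed=0):
    """Hypothetical helper: draw true X_i and observed (x_i, y_i) from the model."""
    rng = random.Random(seed)
    X = [rng.uniform(0.0, TWO_PI) for _ in range(n)]
    x = [(t + rng.vonmisesvariate(0.0, kappa)) % TWO_PI for t in X]          # x_i = X_i + delta_i
    y = [(alpha + t + rng.vonmisesvariate(0.0, kappa)) % TWO_PI for t in X]  # y_i = alpha + X_i + eps_i
    return X, x, y

def log_likelihood(alpha, kappa, X, x, y):
    """log L(alpha, kappa, X_1..X_n; x, y) with equal error concentrations."""
    n = len(x)
    return (-2 * n * math.log(TWO_PI) - 2 * n * math.log(bessel_i0(kappa))
            + kappa * sum(math.cos(xi - Xi) for xi, Xi in zip(x, X))
            + kappa * sum(math.cos(yi - alpha - Xi) for yi, Xi in zip(y, X)))

X, x, y = simulate(200, alpha=1.0, kappa=5.0, seed=1)
print(log_likelihood(1.0, 5.0, X, x, y) > log_likelihood(2.0, 5.0, X, x, y))  # True: the true alpha fits better
```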
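The estimate of each true direction $X_i$ is obtained iteratively from

$$\hat{X}_i = \tan^{-1}\left(\frac{\sin x_i + \sin(y_i - \hat\alpha)}{\cos x_i + \cos(y_i - \hat\alpha)}\right),$$

and the estimate of the rotation parameter is given by

$$\hat\alpha = \tan^{-1}\left(\frac{\sum_{i=1}^{n} \sin(y_i - \hat{X}_i)}{\sum_{i=1}^{n} \cos(y_i - \hat{X}_i)}\right),$$

where $\tan^{-1}$ denotes the quadrant-specific inverse tangent, so that the estimates lie in $[0, 2\pi)$. Setting the derivative of the log-likelihood with respect to $\kappa$ to zero gives

$$A(\hat\kappa) = \frac{I_1(\hat\kappa)}{I_0(\hat\kappa)} = \frac{1}{2n}\left\{\sum_{i=1}^{n} \cos(x_i - \hat{X}_i) + \sum_{i=1}^{n} \cos(y_i - \hat\alpha - \hat{X}_i)\right\},$$

where $I_1$ is the modified Bessel function of the first kind and order one. For the case of equal error concentration, $\hat\kappa$ is obtained from this equation using the approximation to $A^{-1}$ given by [21]. It is worth noting that, for circular data, the estimate of the concentration parameter (whose inverse plays the role of the variance for linear data) should be corrected by dividing it by 2 [22]. Hence, the corrected estimate is $\tilde\kappa = \hat\kappa / 2$.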
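The estimation steps can be sketched as a nested iterative loop: alternate the $\hat{X}_i$ and $\hat\alpha$ updates until $\hat\alpha$ stabilises, then solve $A(\hat\kappa) = \bar{R}$ for $\hat\kappa$ and apply the correction $\tilde\kappa = \hat\kappa/2$. This is an illustrative sketch on simulated data, not the authors' code. In particular, it solves for $\hat\kappa$ by bisection on the exact Bessel ratio in place of the approximation of [21], and the names `fit_lfrm`, `bessel_ratio`, `a_inverse` and `mean_dir` are hypothetical.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def bessel_ratio(kappa, tol=1e-16):
    """A(kappa) = I_1(kappa) / I_0(kappa), summing both power series until converged."""
    q, t0, t1 = (kappa / 2.0) ** 2, 1.0, kappa / 2.0   # r = 0 terms of I_0 and I_1
    s0, s1, r = t0, t1, 0
    while True:
        r += 1
        t0 *= q / (r * r)              # (kappa/2)^(2r) / (r!)^2
        t1 *= q / (r * (r + 1))        # (kappa/2)^(2r+1) / (r! (r+1)!)
        s0, s1 = s0 + t0, s1 + t1
        if t0 <= tol * s0 and r > kappa:
            return s1 / s0

def a_inverse(rbar, lo=0.0, hi=500.0, iters=100):
    """Solve A(kappa) = rbar by bisection; A increases from 0 towards 1 on [0, inf)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bessel_ratio(mid) < rbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mean_dir(sines, cosines):
    """Quadrant-specific inverse tangent of summed sines/cosines, mapped to [0, 2*pi)."""
    return math.atan2(sum(sines), sum(cosines)) % TWO_PI

def fit_lfrm(x, y, tol=1e-10, max_iter=200):
    """Alternate the X_hat and alpha_hat updates, then estimate kappa."""
    n = len(x)
    # Start from the mean direction of the differences y_i - x_i.
    alpha = mean_dir([math.sin(b - a) for a, b in zip(x, y)],
                     [math.cos(b - a) for a, b in zip(x, y)])
    for _ in range(max_iter):
        X = [mean_dir([math.sin(a), math.sin(b - alpha)], [math.cos(a), math.cos(b - alpha)])
             for a, b in zip(x, y)]
        new_alpha = mean_dir([math.sin(b - t) for b, t in zip(y, X)],
                             [math.cos(b - t) for b, t in zip(y, X)])
        step = abs(math.atan2(math.sin(new_alpha - alpha), math.cos(new_alpha - alpha)))
        alpha = new_alpha
        if step < tol:
            break
    X = [mean_dir([math.sin(a), math.sin(b - alpha)], [math.cos(a), math.cos(b - alpha)])
         for a, b in zip(x, y)]
    rbar = sum(math.cos(a - t) + math.cos(b - alpha - t) for a, b, t in zip(x, y, X)) / (2 * n)
    kappa_hat = a_inverse(rbar)
    return alpha, kappa_hat, kappa_hat / 2.0   # last value is the corrected kappa_tilde

# Usage on simulated data with alpha = 1.0 and error concentration 10.
rng = random.Random(2020)
X_true = [rng.uniform(0.0, TWO_PI) for _ in range(500)]
x = [(t + rng.vonmisesvariate(0.0, 10.0)) % TWO_PI for t in X_true]
y = [(1.0 + t + rng.vonmisesvariate(0.0, 10.0)) % TWO_PI for t in X_true]
alpha_hat, kappa_hat, kappa_tilde = fit_lfrm(x, y)
print(alpha_hat, kappa_hat, kappa_tilde)  # alpha_hat close to 1.0, kappa_tilde roughly 10
```

In this simulation $\hat\kappa$ comes out at roughly twice the true concentration, and halving it brings $\tilde\kappa$ back near the true value, which is consistent with the correction described above.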

Identifying Outliers
Outliers are observations that make it more difficult to fit the desired model [23]. In this section, outliers are identified using the COVRATIO method, which has been studied for detecting outliers in circular data [23].
The COVRATIO statistic is developed from the Fisher information matrix of the parameter estimates and is defined by

$$\mathrm{COVRATIO}_{(-i)} = \frac{\left|\mathrm{COV}_{(-i)}\right|}{\left|\mathrm{COV}\right|},$$

where $|\mathrm{COV}|$ is the determinant of the covariance matrix of the parameter estimates for the full data set, and $|\mathrm{COV}_{(-i)}|$ is the determinant of the covariance matrix for the reduced data set obtained by deleting the $i$-th row.
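A sketch of the row-deletion procedure is shown below, under loudly stated assumptions. The covariance determinant uses a plug-in observed-information approximation that treats the fitted $\hat{X}_i$ as known and drops the (near-zero) $\alpha$-$\kappa$ cross term; this is not necessarily the Fisher-information expression used by the authors. Each row is deleted in turn, the model is refitted, and $|\mathrm{COVRATIO}_{(-i)} - 1|$ is recorded. All helper names are hypothetical, and the data are simulated with one planted outlier.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def bessel_ratio(kappa, tol=1e-16):
    """A(kappa) = I_1(kappa) / I_0(kappa) from the power series of I_0 and I_1."""
    q, t0, t1 = (kappa / 2.0) ** 2, 1.0, kappa / 2.0
    s0, s1, r = t0, t1, 0
    while True:
        r += 1
        t0 *= q / (r * r)
        t1 *= q / (r * (r + 1))
        s0, s1 = s0 + t0, s1 + t1
        if t0 <= tol * s0 and r > kappa:
            return s1 / s0

def a_inverse(rbar, lo=0.0, hi=500.0, iters=100):
    """Bisection for A(kappa) = rbar."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bessel_ratio(mid) < rbar else (lo, mid)
    return 0.5 * (lo + hi)

def mean_dir(angles):
    """Mean direction of a list of angles, in [0, 2*pi)."""
    return math.atan2(sum(map(math.sin, angles)), sum(map(math.cos, angles))) % TWO_PI

def fit(x, y, iters=60):
    """Alternating X_hat / alpha_hat updates (fixed iteration count), then kappa_hat."""
    alpha = mean_dir([b - a for a, b in zip(x, y)])
    for _ in range(iters):
        X = [mean_dir([a, b - alpha]) for a, b in zip(x, y)]
        alpha = mean_dir([b - t for b, t in zip(y, X)])
    X = [mean_dir([a, b - alpha]) for a, b in zip(x, y)]
    rbar = sum(math.cos(a - t) + math.cos(b - alpha - t) for a, b, t in zip(x, y, X)) / (2 * len(x))
    return alpha, a_inverse(rbar), X

def cov_det(x, y):
    """ASSUMED plug-in |COV| of (alpha_hat, kappa_hat): inverse observed information,
    treating X_hat_i as known and dropping the near-zero alpha-kappa cross term.
    Assumes kappa_hat > 0."""
    alpha, kappa, X = fit(x, y)
    A = bessel_ratio(kappa)
    info_alpha = kappa * sum(math.cos(b - alpha - t) for b, t in zip(y, X))
    info_kappa = 2 * len(x) * (1.0 - A / kappa - A * A)  # 2n A'(kappa), A' = 1 - A/kappa - A^2
    return 1.0 / (info_alpha * info_kappa)

def covratio_stats(x, y):
    """|COVRATIO_(-i) - 1| for every row i, refitting the model with row i deleted."""
    full = cov_det(x, y)
    return [abs(cov_det(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) / full - 1.0)
            for i in range(len(x))]

rng = random.Random(7)
X_true = [rng.uniform(0.0, TWO_PI) for _ in range(60)]
x = [(t + rng.vonmisesvariate(0.0, 10.0)) % TWO_PI for t in X_true]
y = [(1.0 + t + rng.vonmisesvariate(0.0, 10.0)) % TWO_PI for t in X_true]
y[7] = (y[7] + math.pi) % TWO_PI   # plant one gross outlier in row 7
stats = covratio_stats(x, y)
print(max(range(len(stats)), key=stats.__getitem__))  # row with the largest statistic
```

Deleting the planted outlier raises $\hat\kappa$ noticeably, so its statistic stands well above those of the other rows; in practice the statistics would be compared with the cut-off point described next.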
The cut-off equation for outlier detection in the bivariate functional relationship model for circular data is $c = 3.7586\,n^{-0.71}$ [24], and an observation is identified as an outlier when $|\mathrm{COVRATIO}_{(-i)} - 1| > c$. Since the sample size used in this study is n = 151, the cut-off point used is c = 0.170382. This cut-off is used to detect outliers in the wind direction data in this study.

Results
As a preliminary step, we carried out a univariate analysis of the wind direction data during the southwest monsoon and displayed the data graphically using rose diagrams for 2019 and 2020. Figures 2 and 3 show the rose diagrams of the wind direction data of Langkawi Island in 2019 and 2020, respectively.
The rose diagrams show that the patterns of the data differ between the two years. Thus, in this paper, we investigate the relationship between the data of both years and describe it with a bivariate functional relationship model for circular data.

The next step is to check whether any outliers are present in the data. Figure 6 shows the values of $|\mathrm{COVRATIO}_{(-i)} - 1|$ for the wind direction data in this study; none of the values exceeds the cut-off value of 0.170382. Thus, no observation is classified as an outlier, and the statistical model for the wind direction data of Langkawi Island is fitted without eliminating any data.

The wind direction data are then fitted to the bivariate functional relationship model described in Sections 3 and 4, and the parameter estimates are shown in Table 1. The error terms have an estimated concentration parameter of 1.29619. The estimated rotation parameter is 6.00350, which is very close to 2π radians. The variances of $\hat\alpha$ and $\tilde\kappa$ are small, 0.01887 and 0.01146 respectively, indicating that the estimates are precise, with little dispersion.
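The closeness of the rotation estimate to 2π can be expressed as an angle. The short calculation below uses only the reported values $\hat\alpha = 6.00350$ and $\tilde\kappa = 1.29619$. It also evaluates the mean resultant length $A(\tilde\kappa) = I_1(\tilde\kappa)/I_0(\tilde\kappa)$, a standard measure of concentration for the von Mises distribution (1 means all mass at a single direction, 0 means uniform); `bessel_ratio` is a hypothetical helper.

```python
import math

alpha_hat = 6.00350      # reported rotation parameter estimate (radians)
kappa_tilde = 1.29619    # reported error concentration estimate

# How far alpha_hat falls short of a full turn, and the same rotation as a signed angle in (-pi, pi].
gap = 2.0 * math.pi - alpha_hat
signed = math.atan2(math.sin(alpha_hat), math.cos(alpha_hat))
print(round(gap, 5), round(math.degrees(gap), 2))  # 0.27969 16.02

def bessel_ratio(kappa, terms=60):
    """A(kappa) = I_1(kappa) / I_0(kappa) from truncated power series (adequate for small kappa)."""
    q = (kappa / 2.0) ** 2
    t0, t1 = 1.0, kappa / 2.0
    s0, s1 = t0, t1
    for r in range(1, terms):
        t0 *= q / (r * r)
        t1 *= q / (r * (r + 1))
        s0 += t0
        s1 += t1
    return s1 / s0

print(round(bessel_ratio(kappa_tilde), 3))  # mean resultant length of each error term
```

A rotation of about 16° short of a full turn means the fitted relationship is close to the identity between the two years, while a mean resultant length of roughly 0.54 reflects weakly concentrated errors.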

Conclusion
To conclude, this paper models the wind direction data of Langkawi Island during the southwest monsoon seasons of 2019 and 2020 using an error-in-variables model, in which both variables of the bivariate data are subject to error. The presence of outliers was checked using the COVRATIO statistic, and no outliers were found in the data. Assuming von Mises distributed errors, the parameters were estimated by maximum likelihood. The fitted model shows that the rotation parameter is very close to 2π radians and that the concentration parameter of the error terms is small, meaning the errors are weakly concentrated. This model may be used to predict the wind direction of Langkawi Island during the southwest monsoon and thus help in planning outdoor activities with regard to safety and weather. Further studies may apply this model to the wind direction of other locations.