Smoothing parameter selection in kernel nonparametric regression using bat optimization algorithm

In kernel nonparametric regression, the quality of the curve estimate depends entirely on the smoothing parameter. Nature-inspired algorithms can serve as an alternative tool for finding its optimal value. In this paper, a bat optimization algorithm is proposed to choose the smoothing parameter in Nadaraya-Watson kernel nonparametric regression. The proposed method efficiently finds the best smoothing parameter with high prediction accuracy, and it is compared with four well-known selection methods in terms of prediction capability.


Introduction
The nonparametric regression model (NRM) and its estimation methods have been developed extensively in recent years [1,2]. Kernel regression estimates are among the most popular nonparametric estimates. In the univariate case, these estimates depend on a bandwidth, a smoothing parameter controlling the smoothness of the estimated curve, and on a kernel, which acts as a weight function [3-5]. The choice of the smoothing parameter is a crucial problem in kernel regression, and the literature on bandwidth selection is quite extensive; see, e.g., [1,3,6-22]. In this paper, a bat optimization algorithm, a nature-inspired continuous optimization algorithm, is proposed to choose the smoothing parameter in Nadaraya-Watson kernel nonparametric regression. The proposed method efficiently finds the best smoothing parameter with high prediction performance, and its superiority is demonstrated on several simulated examples and a real data application. The paper is organized as follows. Section 2 describes the Nadaraya-Watson kernel nonparametric regression and smoothing parameter selection. Section 3 details the bat optimization algorithm. Section 4 presents the proposed method. Sections 5 and 6 illustrate the proposed method through simulation studies and a real data application, and Section 7 concludes.

Smoothing parameter selection
The nonparametric regression model, often estimated by estimators of the Nadaraya-Watson type, forms an attractive framework for diverse areas such as engineering, econometrics, environmetrics, social sciences, and biometrics [5].
In the NRM, we have a set of univariate observations (x_i, y_i), i = 1, ..., n, and the model can be defined as y_i = f(x_i) + ε_i, where the errors ε_i have mean zero and variance σ². The nonparametric regression estimate is a weighted mean of the dependent variable, where the weights depend on the distance between the observations of the independent variable, measured through a smoothing parameter. One of the nonparametric regression estimation techniques is the Nadaraya-Watson (NW) kernel estimator, which is more flexible than other nonparametric techniques and provides an accurate predictor of the observations [1,23,24].
The kernel estimator of f(x) at a point x is, in general, a weighted average of the responses, where the weights are built from a kernel probability density function K centered at each point x_i, and the smoothing parameter h > 0 is known as the fixed bandwidth. The NW kernel estimator with a fixed h is defined as

\hat{f}(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) y_i}{\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)}.

The NW kernel estimator depends on the smoothing parameter (bandwidth) h, which controls the amount of smoothing: a large value of h leads to a smoother estimate [3,19,25,26].
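As a concrete illustration (a minimal sketch, not the authors' code), the NW estimator above can be written with the Epanechnikov kernel, which is the kernel used later in the simulations; the function names are our own:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def nw_estimate(x0, x, y, h):
    """Nadaraya-Watson estimate of f at the point x0 with fixed bandwidth h."""
    w = epanechnikov((x0 - x) / h)   # kernel weights for each observation
    s = w.sum()
    # If no observation falls within the bandwidth, the estimate is undefined
    return np.dot(w, y) / s if s > 0 else np.nan
```

Note that the estimate is a weighted mean: if all y_i are equal, the estimate reproduces that constant exactly, regardless of h.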
The optimal bandwidth of the NW kernel estimator is the value that minimizes the integrated mean squared error (IMSE), obtained by integrating the mean squared error of the estimate. There are many different methods to select the value of h. Among them, in 1981, Friedman and Stuetzle used strong separability to identify the components of the nonparametric regression model when h is unknown, and proposed a kernel-based consistent and asymptotically normal estimator [27]. In 1982, Abramson suggested the inverse square root law to estimate h in the variable kernel density function, which reduces the bias more than the fixed-h estimator [28]. In 1986, Silverman suggested an adaptation of the kernel estimator that varies h, with the nonparametric estimate depending on a geometric mean [29]. In 1987, Scott and Terrell discussed the relationship between biased and unbiased cross-validation and the use of a variable h instead of a fixed h in the case of long-tailed distributions [30]. Several other authors have handled the problem of selecting the smoothing parameter, such as [1,3,6-22].

Bat optimization algorithm
Nature has been an inspiration for the introduction of many meta-heuristic algorithms [31-34]. Swarm intelligence is an important tool for solving complex problems in scientific research [31,35], and swarm intelligence algorithms have been widely studied and successfully applied to a variety of complex optimization problems. The bat algorithm (BA), proposed by Yang [36], is based on the echolocation ability that guides microbats in their foraging behavior. The algorithm starts with a random initial population of bats in an n-dimensional search space, where the pulse frequency, velocity, and position of bat i are updated as

δ_i = δ_min + (δ_max − δ_min) θ,
v_i^t = v_i^{t−1} + (x_i^{t−1} − G_best) δ_i,
x_i^t = x_i^{t−1} + v_i^t,

where θ is a random number in [0, 1], G_best represents the current global optimal solution, and δ_i represents the pulse frequency emitted by bat i at the current moment, with δ_min and δ_max the minimum and maximum values of the pulse frequency, respectively. Initially, δ_i is assigned randomly to each bat, drawn uniformly from [δ_min, δ_max] [36-41]. In the local search step, the position of a randomly selected bat is updated as

x_new = x_old + ε L^t,

where x_old is a random solution chosen from the current best solutions, L^t is the loudness, and ε is a random vector drawn from [−1, 1]. The pulse emission rate r and the loudness L are updated to control the balance between these techniques as follows:

L_i^{t+1} = b_1 L_i^t,   r_i^{t+1} = r_i^0 [1 − exp(−b_2 t)],

where b_1 and b_2 are constants.
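The update rules above can be sketched as follows for a one-dimensional objective (a minimal, illustrative implementation; the parameter values d_min, d_max, b1, b2, the local-search step size, and the function name are our assumptions, not the authors' settings):

```python
import numpy as np

def bat_minimize(obj, lo, hi, n_bats=20, max_iter=150,
                 d_min=0.0, d_max=2.0, b1=0.9, b2=0.9, seed=0):
    """Minimal bat algorithm minimizing obj over the interval [lo, hi]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, n_bats)      # bat positions
    v = np.zeros(n_bats)                 # bat velocities
    L = np.ones(n_bats)                  # loudness per bat
    r0 = rng.uniform(0.0, 1.0, n_bats)   # initial pulse emission rates
    r = r0.copy()
    fit = np.array([obj(xi) for xi in x])
    best = x[fit.argmin()]

    for t in range(1, max_iter + 1):
        for i in range(n_bats):
            theta = rng.random()
            freq = d_min + (d_max - d_min) * theta      # pulse frequency
            v[i] += (x[i] - best) * freq                # velocity update
            xi = np.clip(x[i] + v[i], lo, hi)           # position update
            if rng.random() > r[i]:                     # local search near best
                xi = np.clip(best + 0.01 * rng.uniform(-1, 1) * L.mean(), lo, hi)
            fi = obj(xi)
            if fi <= fit[i] and rng.random() < L[i]:    # accept improving move
                x[i], fit[i] = xi, fi
                L[i] *= b1                              # loudness decreases
                r[i] = r0[i] * (1.0 - np.exp(-b2 * t))  # pulse rate increases
        best = x[fit.argmin()]
    return best, fit.min()
```

Because moves are accepted only when they improve a bat's fitness, the best objective value found is non-increasing over iterations.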

The proposed method
The efficiency of the NW kernel estimator largely depends on an appropriately chosen smoothing parameter h; as a result, selecting a suitable value of h is of crucial importance. In the literature, the most widely used method for selecting h is cross-validation (CV), a data-driven approach [6,30]. In this paper, the BA is proposed to determine the smoothing parameter of the NW kernel estimator. The proposed method efficiently finds the best value with high prediction performance. The parameter configuration of the proposed method is as follows.
(1) The number of bats is set to 20 and the maximum number of iterations to t_max = 150.
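To make the objective that the BA minimizes concrete, the sketch below computes the leave-one-out cross-validation score CV(h) for the NW estimator; this is an illustration under assumed simulated data, and a simple grid search stands in here for the BA's continuous search:

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def loocv_score(h, x, y):
    """Leave-one-out CV score of the NW estimator for bandwidth h."""
    errs = []
    for i in range(len(x)):
        xi, yi = np.delete(x, i), np.delete(y, i)   # hold out observation i
        w = epanechnikov((x[i] - xi) / h)
        s = w.sum()
        if s == 0:
            return np.inf       # h too small: no neighbours within bandwidth
        errs.append((y[i] - np.dot(w, yi) / s) ** 2)
    return float(np.mean(errs))

# Assumed toy data; the BA would search h continuously, a grid stands in here.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 50)
grid = np.linspace(0.02, 0.5, 60)
h_best = grid[np.argmin([loocv_score(h, x, y) for h in grid])]
```

The BA replaces the grid with its frequency/velocity updates over h, which avoids discretizing the bandwidth range.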

Simulation results
To test how well the proposed method performs for different possible mean functions, the following study design was followed. Comparisons with the commonly used methods CV, GCV, AIC, and the plug-in method (PM) are also conducted. Three sample sizes are considered: n = 50, 100, 150. In addition, the kernel is set to the Epanechnikov kernel.

Case 1:
In this case, the first regression function is used. The explanatory variable x is generated from a uniform distribution on [0, 1].

Case 2:
In this case, a second regression function is used. The explanatory variable x is generated from a uniform distribution on [0, 1].

Case 3:
In this case, a third regression function is used. The explanatory variable x is generated from a uniform distribution on [0, 1].

Case 4:
In this case, we use the regression function f(x) = sin(2πx) with errors drawn from N(0, 0.5). The explanatory variable x is generated from a uniform distribution on [0, 1].

Case 5:
In this case, we use the regression function f(x) = sin(4πx) with errors drawn from N(0, 0.11). The explanatory variable x is generated from a uniform distribution on [0, 1].
The data generation is repeated 500 times and the averaged integrated mean squared error (IMSE) is calculated. The results of the methods are summarized in Tables 1-5. They show that the BA algorithm yielded higher prediction accuracy than the CV, GCV, AIC, and PM methods in all five cases. The proposed method, BA, gives a superior reduction in terms of IMSE. For example, for case 2, the reduction in IMSE using BA was 5.52%, 5.02%, 5.97%, and 7.84% compared with the CV, GCV, AIC, and PM methods, respectively. With respect to the sample size, the IMSE values decrease as the sample size increases; however, the proposed method, BA, still performs best among all methods in all cases.

Real application results
To investigate the performance of the proposed method in a real application, 255 daily observations taken from 2 January 2019 to 13 March 2020 are used. The data are related to the ASIAcell Communication Company in Iraq. The response variable is the number of deals and the explanatory variable is the closing price of the share in Iraqi dinar. The estimated IMSE values and the values of the smoothing parameter h are reported in Table 6. It is clearly seen from Table 6 that the proposed method, BA, achieved the lowest prediction error: it reduced the IMSE by 5.08%, 4.87%, 3.64%, and 3.67% compared with CV, GCV, AIC, and PM, respectively. Figure 1 shows the smoothing curves of the methods used, in addition to the linear model estimated by ordinary least squares (OLS). It is clearly seen that the proposed method gives a smoother curve than the others.

Conclusion
In this paper, the problem of selecting the smoothing parameter in Nadaraya-Watson kernel nonparametric regression is considered. A bat optimization algorithm was proposed to choose the smoothing parameter. The results obtained from simulations and a real data application demonstrated the superiority of the proposed method, BA, in terms of IMSE compared with the competitor methods.