Selection of Optimal Smoothing Parameters in Mixed Estimator of Kernel and Fourier Series in Semiparametric Regression

In this article, we propose a new method for selecting smoothing parameters in semiparametric regression. The method is used in semiparametric regression estimation where the nonparametric component is partly approximated by a multivariable Fourier series and partly by a multivariable Kernel. The smoothing parameters are selected using Generalized Cross-Validation (GCV). To assess its performance, the method is applied to drinking water quality data from the Regional Drinking Water Company (PDAM) Surabaya, using a Fourier series with trend and a Gaussian Kernel. The results show that the method performs well in selecting the optimal smoothing parameters.


Introduction
Regression models can be used to model the pattern of relationships between predictor variables $x_j$ and a response variable $y$; when the shape of the regression curve is known, a parametric regression approach is used [1][2][3]. There are several approaches to regression analysis, namely parametric, nonparametric, and semiparametric regression. If the relationship between the response and predictor variables is unknown and no information about the data pattern is available, a nonparametric regression approach is recommended [4][5][6]. Nonparametric regression is suited to data for which the shape of the regression curve is unknown or for which there is no complete past information about the data pattern [7,8]. Nonparametric regression models that often receive attention from researchers include the Kernel [9][10][11], smoothing [4,7], and wavelets [7].
The Kernel estimator is used more often in nonparametric and semiparametric regression because it is simpler [11]. In addition, it converges relatively faster than the local polynomial, Fourier series, or spline estimators [9]. Another nonparametric regression model that receives considerable attention is the Fourier series, which is well suited to data in which the relationship between the response variable and the predictor variables follows a repeating pattern [8][12][13][14].
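To illustrate why a Fourier series suits repeating patterns, the Python sketch below fits a Fourier series with trend by ordinary least squares. It assumes the commonly used form $g(t) = b\,t + \tfrac{1}{2}a_0 + \sum_{k=1}^{K} a_k\cos(kt)$ on $t \in (0,\pi)$ with no penalty; this form, the function names, and the noise-free synthetic data are illustrative assumptions, not the estimator of this paper.

```python
import math


def fourier_trend_design(t, K):
    """Design rows [t, 1, cos(t), ..., cos(K t)].

    Assumed model: g(t) = b*t + a0/2 + sum_{k=1}^{K} a_k*cos(k*t); the column of
    ones therefore estimates a0/2 directly.
    """
    return [[ti, 1.0] + [math.cos(k * ti) for k in range(1, K + 1)] for ti in t]


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_fourier_trend(t, y, K):
    """Ordinary least squares: solve the normal equations (D'D) theta = D'y."""
    D = fourier_trend_design(t, K)
    p = len(D[0])
    DtD = [[sum(row[i] * row[j] for row in D) for j in range(p)] for i in range(p)]
    Dty = [sum(row[i] * yi for row, yi in zip(D, y)) for i in range(p)]
    theta = solve(DtD, Dty)
    fitted = [sum(c * d for c, d in zip(theta, row)) for row in D]
    return theta, fitted


# Noise-free synthetic data: linear trend plus a repeating cos(2t) pattern.
n = 50
t = [math.pi * (i + 0.5) / n for i in range(n)]
y = [0.5 * ti + 2.0 + 1.5 * math.cos(2 * ti) for ti in t]

theta, fitted = fit_fourier_trend(t, y, K=3)
# theta is ordered as [b, a0/2, a1, a2, a3]; here it is close to [0.5, 2.0, 0, 1.5, 0].
print([round(c, 4) for c in theta])
```

Because the synthetic curve lies exactly in the span of the basis, least squares recovers the trend and the single oscillation term; with real data the choice of $K$ matters, which is why it is treated as a parameter to be selected.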
In some cases the response variable has a linear relationship with one of the predictor variables, while the form of its relationship with the other predictor variables is unknown. In such circumstances a semiparametric regression model is used [15]: $y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + f(t_i) + \varepsilon_i$, where $\mathbf{x}_i^{\top}\boldsymbol{\beta}$ is the parametric component, $f$ is the nonparametric component, and $\varepsilon_i$ is random error assumed to be independent and normally distributed with zero mean and variance $\sigma^2$. Several researchers [9,10,15,16] have used the Kernel approach to estimate semiparametric regression curves, and [17] developed a Fourier series approach to semiparametric regression. However, these studies assume that all nonparametric components of the semiparametric regression share the same pattern. When the nonparametric components have patterns that differ from one another, a mixed estimator is required. The study in [16] developed such a mixed estimator for semiparametric regression, in which nonparametric components with no particular pattern are approximated by a multivariable Kernel and components with a repeating pattern are approximated by a univariable Fourier series.
Although the Kernel and Fourier series approaches have advantages in estimating data, their performance depends heavily on the choice of smoothing parameters. The Kernel estimator relies on selecting the optimal bandwidth, a smoothing parameter that controls the smoothness of the estimated curve. A bandwidth that is too small results in an undersmoothed curve that is very rough and fluctuating, whereas a bandwidth that is too large results in an oversmoothed curve that is very smooth but does not match the data pattern [18]. In the Fourier series, a good estimator depends on the optimal oscillation parameter, which likewise must be neither too small nor too large, and on the optimal smoothing parameter, which controls the trade-off between goodness of fit and the smoothness constraint on the function. Consequently, the goodness of the mixed Kernel and Fourier series estimator depends jointly on the optimal bandwidth, the optimal oscillation parameter, and the optimal smoothing parameter.
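The bandwidth effect described above can be seen with a Nadaraya-Watson estimator using the Gaussian Kernel. The Python sketch below is a univariable illustration on synthetic data (an assumption for demonstration, not the PDAM data); it measures roughness as the sum of squared second differences of the fitted curve and compares three hypothetical bandwidths.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(z_train, y_train, z0, h):
    """Kernel-weighted average of y_train at the point z0 with bandwidth h."""
    w = [gaussian_kernel((z0 - zi) / h) for zi in z_train]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)


def roughness(values):
    """Sum of squared second differences; large for rough, fluctuating curves."""
    return sum((values[i - 1] - 2.0 * values[i] + values[i + 1]) ** 2
               for i in range(1, len(values) - 1))


# Synthetic data: a smooth sine signal plus an alternating +/-0.3 disturbance.
n = 60
z = [i / (n - 1) for i in range(n)]
y = [math.sin(2.0 * math.pi * zi) + 0.3 * (-1) ** i for i, zi in enumerate(z)]

results = {}
for h in (0.005, 0.05, 1.0):
    fit = [nadaraya_watson(z, y, zi, h) for zi in z]
    results[h] = (roughness(fit), max(fit) - min(fit))
    print(f"h={h:<6} roughness={results[h][0]:9.4f} range={results[h][1]:.3f}")
```

With h = 0.005 the fit nearly interpolates the data and keeps the alternating disturbance (undersmoothing); with h = 1.0 the fit is almost flat and loses the sine shape (oversmoothing); h = 0.05 lies between the two.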
One method often used to select the optimal oscillation parameters, bandwidth, and smoothing parameters is Generalized Cross-Validation (GCV). Compared with other methods, such as Cross-Validation (CV), Unbiased Risk (UBR), or Generalized Maximum Likelihood (GML), GCV is theoretically asymptotically optimal. GCV also has the advantages of not requiring knowledge of the population variance and of being invariant under transformation. The GCV method is a development of CV [19]. The study in [20] developed smoothing parameter selection for a mixed Kernel and Fourier series estimator in which the parametric and Fourier series components are still univariable. In some cases, however, the parametric, Kernel, and Fourier series components of a mixed semiparametric regression all have multivariable predictors, which requires an appropriate method for selecting the optimal smoothing parameters. In this paper, we therefore propose a GCV-based method for selecting the optimal smoothing parameters of the mixed Kernel and multivariable Fourier series estimator in semiparametric regression.

Method and material
Semiparametric regression is a regression that has both parametric and nonparametric components. It combines the strengths of parametric and nonparametric regression.
Given paired data    is a regression curve that is assumed to be smooth,  ais random error which is assumed to be independent, identical, and normally distributed with zero mean and variance 2 . In addition, the regression curve  assumed to be additive, so it can be written as: where:  The estimation of the curve in equation (1) is obtained from the optimization: with the provision of:   The solution to the optimization of equation (5) with the condition (7), is equivalent to solving the optimization:   Parameter  is a smoothing parameter that controls between goodness of fit and smoothness of function.
One of the smoothing parameter selection methods  The optimum using GCV is defined as follows:

Selection of smoothing parameters
To obtain the optimal smoothing parameter selection method, the mixed Kernel and multivariable Fourier series estimator of the semiparametric regression in equation (1) is written in matrix form. To prove Lemma 1, we first give Lemma 2, as follows:

Lemma 2
Given the mixed estimator of Kernel and multivariable Fourier series in the semiparametric regression in equation (1), obtained by the optimization in equation (7), the estimator is: I X X X DBX X X X I DB I Ω

Application data
This section applies the proposed optimal smoothing parameter selection method for the mixed Kernel and multivariable Fourier series estimator to real data in order to assess its performance. The method is applied to the drinking water quality data of PDAM Surabaya. An initial study of drinking water quality and the factors that influence it showed that the relationships between the response and the predictor variables partly follow a parametric pattern, partly a pattern suited to the Kernel, and partly a pattern suited to the Fourier series. The results reported here are the minimum GCV values for several combinations of model components, together with their R-squared values.

Table 1. Minimum GCV values for several types of models

Based on Table 1, the smallest GCV value, 0.0029, corresponds to an R-squared of 82.7553%.
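The selection procedure used in this section (evaluate GCV over candidate parameter values, keep the minimizer, report R-squared) can be sketched in Python. To stay short, the sketch uses a single Gaussian-Kernel component on synthetic data instead of the full mixed Kernel and Fourier series estimator, and the bandwidth grid is an arbitrary assumption; it does not reproduce Table 1.

```python
import math
import random


def gaussian_kernel(u):
    """Gaussian kernel without its constant, which cancels in the weights."""
    return math.exp(-0.5 * u * u)


def nw_hat_matrix(z, h):
    """Row i holds the Nadaraya-Watson weights used to estimate at z[i]."""
    A = []
    for z0 in z:
        w = [gaussian_kernel((z0 - zj) / h) for zj in z]
        total = sum(w)
        A.append([wj / total for wj in w])
    return A


def gcv(A, y):
    """Return ((1/n)||(I - A) y||^2 / [(1/n) tr(I - A)]^2, fitted values)."""
    n = len(y)
    fitted = [sum(a * yj for a, yj in zip(row, y)) for row in A]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
    return mse / (sum(1.0 - A[i][i] for i in range(n)) / n) ** 2, fitted


def r_squared(y, fitted):
    """Coefficient of determination 1 - SSE/SST."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Synthetic data (not the PDAM data): sine signal plus Gaussian noise.
rng = random.Random(42)
n = 100
z = [i / (n - 1) for i in range(n)]
y = [math.sin(2.0 * math.pi * zi) + rng.gauss(0.0, 0.3) for zi in z]

grid = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12, 0.2, 0.4]  # hypothetical candidate bandwidths
scores = []
for h in grid:
    score, fitted = gcv(nw_hat_matrix(z, h), y)
    scores.append((score, h, r_squared(y, fitted)))

best_gcv, best_h, best_r2 = min(scores)  # tuples compare by GCV first
print(f"optimal h = {best_h}, GCV = {best_gcv:.4f}, R-squared = {100 * best_r2:.2f}%")
```

In the mixed estimator the grid would range jointly over the bandwidths, the oscillation parameter, and the smoothing parameter, with the same minimum-GCV rule applied to each combination.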

Conclusions
Based on the theoretical and applied results of this study, it can be concluded that the proposed smoothing parameter selection method performs well, since it yields a fairly large R-squared of 82.755%.