Village classification index prediction using geographically weighted panel regression

The village classification index is a status measure of the achievements of village development activities. Measuring this index requires observations over several time periods and must account for spatial effects, because the geographical conditions of the villages are diverse. It is therefore necessary to study the variables that affect the village classification index over several time periods. A statistical method that handles spatial effects in panel data is Geographically Weighted Panel Regression (GWPR), which combines the Geographically Weighted Regression (GWR) model with panel data regression. This study focuses on building a fixed effect GWPR model with a fixed bisquare kernel for the village classification index in Batang Regency, 2015-2018. The results indicate that the fixed effect GWPR model differs significantly from the panel data regression model, and that the model generated for each location differs from one location to another. In addition, all independent variables, namely the community economy, security and order, and community participation in development, have a significant effect on the village classification index for all villages, with an R-square value of 0.3952.


Introduction
The village profile is an information system into which each village enters its own data directly. It gives a comprehensive picture of the character of a village, including basic family data, natural resource potential, human resources, institutions, infrastructure and facilities, as well as the progress made and the problems faced by the village. The level of village development is a status measure of the results of development activities, reflecting the progress and/or success of the community, the village government and the regional government in carrying out development in the village [1].
The purpose of preparing a village profile is to build a database and a source of information for development needs. The level of village development, measured with the village classification index, reflects the success of village development every year and every five years. It is measured from the speed of the community's economic development, security and order, and the community's participation in development.
Batang Regency combines coastal, lowland and mountainous regions, so its villages have diverse geographical characters. Differences in geographical conditions between locations can give rise to spatial effects that significantly influence the speed of economic development, security and order, and community participation in development in each village. Therefore, it is necessary to study the variables that affect the level of village development over several time periods. In addition, a statistical modeling method is needed that takes the geographical location of each observation into account. The statistical method used to handle spatial effects in panel data, in particular spatial heterogeneity, is Geographically Weighted Panel Regression [2], [3]. This study focuses on building a fixed effect GWPR model with fixed bisquare kernel weighting for the village classification index data in the Village Profiles of Batang Regency, 2015-2018.

Panel data regression analysis
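Panel data regression is regression applied to data that combine cross-section and time series observations. The general panel data regression model is [4]:

$$y_{it} = \alpha + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + u_{it}, \quad i = 1,2,\dots,n;\; t = 1,2,\dots,T \qquad (1)$$

where $y_{it}$ is the dependent variable for individual $i$ at time $t$, $\mathbf{x}_{it}$ is the vector of $p$ independent variables, and $u_{it}$ is the residual. The one-way residual component model is defined as:

$$u_{it} = \mu_i + v_{it} \qquad (2)$$

where $\mu_i$ is the unobserved individual-specific effect and $v_{it}$ is the remaining disturbance. Based on Eq. (1) and (2), the general panel data model becomes:

$$y_{it} = \alpha + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + \mu_i + v_{it} \qquad (3)$$

The fixed effect model is also known as the Least Squares Dummy Variable (LSDV) model, because the individual effect $\mu_i$ is represented by a dummy variable whose coefficient takes a different value for each individual $i$. One way to estimate the parameters of the fixed effect model is the within transformation [5]. The fixed effect model is then:

$$y_{it} - \bar{y}_{i\cdot} = (\mathbf{x}_{it} - \bar{\mathbf{x}}_{i\cdot})^{\top}\boldsymbol{\beta} + (v_{it} - \bar{v}_{i\cdot})$$

The Chow test selects between the common effect model and the fixed effect model:
H0: common effect model
H1: fixed effect model

$$F_0 = \frac{(RRSS - URSS)/(n-1)}{URSS/(nT-n-p)}$$

where RRSS is the restricted residual sum of squares (common effect model) and URSS is the unrestricted residual sum of squares (fixed effect model). H0 is rejected if $F_0 > F_{\text{table}}$ with $F_{\text{table}} = F_{(n-1,\,nT-n-p,\,\alpha)}$, or if the p-value < α, which means the model used is the fixed effect model.

The Hausman test selects between the random effect model and the fixed effect model [5]:
H0: random effect model
H1: fixed effect model

$$W = (\hat{\boldsymbol{\beta}}_{FE} - \hat{\boldsymbol{\beta}}_{RE})^{\top}\left[\mathrm{Var}(\hat{\boldsymbol{\beta}}_{FE}) - \mathrm{Var}(\hat{\boldsymbol{\beta}}_{RE})\right]^{-1}(\hat{\boldsymbol{\beta}}_{FE} - \hat{\boldsymbol{\beta}}_{RE})$$

H0 is rejected if $W > \chi^2_{\alpha,p}$ or if the p-value < α, which means the model used is the fixed effect model.

The Lagrange multiplier test selects between the common effect model and the random effect model [5]:
H0: common effect model
H1: random effect model

$$LM = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}\left(\sum_{t=1}^{T} e_{it}\right)^2}{\sum_{i=1}^{n}\sum_{t=1}^{T} e_{it}^2} - 1\right]^2$$

where $e_{it}$ are the residuals of the common effect model. H0 is rejected if $LM > \chi^2_{\alpha,1}$ or if the p-value < α, which means the model used is the random effect model.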
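The within transformation and the Chow statistic can be illustrated with a minimal sketch in plain Python. It assumes a single regressor (p = 1) and a small made-up panel; the function names and the toy data are illustrative assumptions, not the study's data or code (the study itself used R).

```python
def within_transform(values, n, T):
    """Subtract each individual's time mean (values ordered by individual, then time)."""
    out = []
    for i in range(n):
        block = values[i * T:(i + 1) * T]
        mean_i = sum(block) / T
        out.extend(v - mean_i for v in block)
    return out


def fixed_effect_fit(x, y, n, T):
    """Within estimator for one regressor; returns (slope, URSS)."""
    xw = within_transform(x, n, T)
    yw = within_transform(y, n, T)
    beta = sum(a * b for a, b in zip(xw, yw)) / sum(a * a for a in xw)
    urss = sum((b - beta * a) ** 2 for a, b in zip(xw, yw))
    return beta, urss


def common_effect_rss(x, y):
    """Pooled OLS with one common intercept; returns RRSS."""
    m = len(x)
    xm, ym = sum(x) / m, sum(y) / m
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    intercept = ym - slope * xm
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def chow_f(rrss, urss, n, T, p):
    """F0 = [(RRSS - URSS)/(n - 1)] / [URSS/(nT - n - p)]."""
    return ((rrss - urss) / (n - 1)) / (urss / (n * T - n - p))


# Hypothetical toy panel: n = 3 individuals, T = 4 periods, true slope 2.
n, T, p = 3, 4, 1
x = [1, 2, 3, 4, 2, 3, 4, 5, 1, 3, 5, 7]
mu = [0.0, 5.0, -3.0]            # individual effects
noise = [0.1, -0.1, 0.1, -0.1]   # small deterministic disturbance
y = [mu[i] + 2.0 * x[i * T + t] + noise[t] for i in range(n) for t in range(T)]

beta, urss = fixed_effect_fit(x, y, n, T)
rrss = common_effect_rss(x, y)
f0 = chow_f(rrss, urss, n, T, p)
print(beta, urss, rrss, f0)
```

Because the individual effects in the toy data differ strongly, RRSS is much larger than URSS and the statistic is large, which is the situation in which the Chow test favours the fixed effect model.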

The assumptions of panel data regression
The panel data regression model rests on several assumptions: normality of the residuals, homoscedasticity of the residuals, non-autocorrelation of the residuals, and non-multicollinearity of the predictors. Normality can be tested with the Jarque-Bera (JB) test, with hypotheses:
H0: residuals are normally distributed
H1: residuals are not normally distributed
H0 is rejected if $JB > \chi^2_{\alpha,2}$ or if the p-value < α, which means the residuals are not normally distributed. One test for the homoscedasticity assumption is the Glejser test, which regresses the absolute value of the residuals on the independent variables. The autocorrelation check determines whether the residuals in period t are correlated with the residuals in period t-1; autocorrelation can be detected with the Durbin-Watson method [6]. Multicollinearity is a linear relationship between independent variables in the model. One way to detect it is to calculate the Variance Inflation Factor (VIF), with multicollinearity indicated when VIF > 10 [6].
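As an illustration of two of these diagnostics, the sketch below computes the Jarque-Bera and Durbin-Watson statistics in plain Python from a list of residuals. The residual values are made up for the example, and the function names are illustrative assumptions rather than anything used in the study.

```python
def jarque_bera(residuals):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with sample skewness S and kurtosis K."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Chi-square critical value with 2 degrees of freedom at alpha = 0.05.
CHI2_CRIT_DF2_ALPHA05 = 5.991

# Hypothetical residuals, for illustration only.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
jb = jarque_bera(residuals)
dw = durbin_watson(residuals)
reject_normality = jb > CHI2_CRIT_DF2_ALPHA05
print(jb, dw, reject_normality)
```

A Durbin-Watson value near 2 suggests no first-order autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation respectively.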

Fixed effect geographically weighted panel regression model
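The Geographically Weighted Panel Regression model combines the GWR model with the panel data regression model by introducing the time element into the GWR model. The fixed effect Geographically Weighted Panel Regression model is [7]:

$$y_{it} = \beta_0(u_i,v_i) + \sum_{k=1}^{p}\beta_k(u_i,v_i)\,x_{itk} + \varepsilon_{it}, \quad i = 1,2,\dots,n \text{ and } t = 1,2,\dots,T$$

where $(u_i,v_i)$ are the coordinates of location $i$ and $\beta_k(u_i,v_i)$ is the coefficient of the k-th independent variable at that location.

In spatial modelling, the spatial effect in the data can be tested through spatial heterogeneity, which arises when conditions differ across spatial units in the observation area. The Breusch-Pagan test can be used to test for spatial heterogeneity [1]:
H0: there is no spatial heterogeneity
H1: there is spatial heterogeneity
Test statistic:

$$BP = \tfrac{1}{2}\,\mathbf{f}^{\top}\mathbf{Z}(\mathbf{Z}^{\top}\mathbf{Z})^{-1}\mathbf{Z}^{\top}\mathbf{f}, \quad f_i = \frac{e_i^2}{\hat{\sigma}^2} - 1$$

where $e_i$ are the regression residuals, $\hat{\sigma}^2$ is their estimated variance, and $\mathbf{Z}$ is the matrix of standardized independent variables. H0 is rejected if $BP > \chi^2_{\alpha,p}$ or if the p-value < α, which means there is spatial heterogeneity.

The parameters of the fixed effect GWPR model are estimated with the weighted least squares approach, as in GWR.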
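The weighted least squares step can be sketched for a single location as solving (XᵀWX)β = XᵀWy, where W is the diagonal matrix of weights for that location. The plain-Python version below is a minimal sketch under that assumption; the small solver, the function names and the toy data are illustrative and are not taken from the study.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def local_wls(X, y, w):
    """Local coefficients at one location: solve (X'WX) beta = X'Wy."""
    rows, p = len(X), len(X[0])
    xtwx = [[sum(w[r] * X[r][a] * X[r][b] for r in range(rows)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(w[r] * X[r][a] * y[r] for r in range(rows)) for a in range(p)]
    return solve(xtwx, xtwy)


# Hypothetical data: intercept column plus one regressor, y = 1 + 2x exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = [1.0, 0.5625, 0.25, 0.0]   # example weights for one location
beta_local = local_wls(X, y, w)
print(beta_local)
```

Repeating the call with the weight vector of each location in turn gives one coefficient vector per location, which is why the fitted model differs between locations.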
The weighting in the GWPR model is the same as in the GWR model and depends on the distance between observation locations. Observations around each location are weighted by a kernel function, in GWPR as in GWR [7]. The weighting matrix in this study is built from the fixed bisquare kernel function [7]:

$$w_{ij} = \begin{cases} \left(1 - (d_{ij}/b)^2\right)^2, & d_{ij} \le b \\ 0, & d_{ij} > b \end{cases}$$

where $d_{ij}$ is the Euclidean distance between location $i$ and location $j$ and $b$ is the bandwidth. The bandwidth is analogous to the radius of a circle: an observation location within that radius is still considered influential in forming the parameters at the i-th observation location. Several methods can be used to choose the optimum bandwidth, one of which is Cross Validation (CV). The CV calculation in GWPR is the same as in GWR, computed from the time averages of the dependent and independent variables, and is defined as [8]:

$$CV(b) = \sum_{i=1}^{n}\left(\bar{y}_i - \hat{\bar{y}}_{\neq i}(b)\right)^2$$

where $\bar{y}_i$ is the time average of the dependent variable at observation location $i$ and $\hat{\bar{y}}_{\neq i}(b)$ is the estimate of $\bar{y}_i$ with bandwidth $b$ when the observation at location $(u_i,v_i)$ is removed from the estimation process.

The fixed effect GWPR model is tested with a simultaneous test and a partial test. The simultaneous test is as follows [8]:
H0: $\beta_k(u_i,v_i) = \beta_k$ for each k = 1,2,…,p and i = 1,2,…,n (there is no significant difference between the panel data regression model and GWPR)
H1: there is at least one $\beta_k(u_i,v_i) \neq \beta_k$ for k = 1,2,…,p and i = 1,2,…,n (there is a significant difference between the panel data regression model and GWPR)
The test statistic F is built from the residual sums of squares of the two models. H0 is rejected if $F > F_{1-\alpha,\,df_1,\,df_2}$ or if the p-value < α, which means there is a significant difference between the panel data regression model and GWPR. The partial test assesses the significance of each parameter $\beta_k(u_i,v_i)$ at each location.
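The fixed bisquare weights and the leave-one-out CV score can be sketched in plain Python. To keep the example short it assumes a single time-averaged regressor and picks the bandwidth from a small list of candidates; the coordinates, the data and the function names are hypothetical.

```python
import math


def bisquare(d, b):
    """Fixed bisquare kernel: (1 - (d/b)^2)^2 within the bandwidth, 0 outside."""
    return (1.0 - (d / b) ** 2) ** 2 if d <= b else 0.0


def euclid(p, q):
    """Euclidean distance between two (u, v) coordinate pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def loo_prediction(i, coords, x, y, b):
    """Predict y[i] from a weighted simple regression fitted without location i."""
    pts = [(bisquare(euclid(coords[i], coords[j]), b), x[j], y[j])
           for j in range(len(y)) if j != i]
    sw = sum(w for w, _, _ in pts)
    if sw == 0:
        return None  # no neighbour inside the bandwidth
    xm = sum(w * a for w, a, _ in pts) / sw
    ym = sum(w * c for w, _, c in pts) / sw
    sxx = sum(w * (a - xm) ** 2 for w, a, _ in pts)
    if sxx == 0:
        return None  # slope not identifiable
    slope = sum(w * (a - xm) * (c - ym) for w, a, c in pts) / sxx
    return ym + slope * (x[i] - xm)


def cv_score(coords, x, y, b):
    """Sum of squared leave-one-out errors for bandwidth b (inf if not computable)."""
    total = 0.0
    for i in range(len(y)):
        pred = loo_prediction(i, coords, x, y, b)
        if pred is None:
            return math.inf
        total += (y[i] - pred) ** 2
    return total


def best_bandwidth(coords, x, y, candidates):
    """Candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda b: cv_score(coords, x, y, b))


# Hypothetical 3 x 3 grid of locations with time-averaged x and y.
coords = [(u, v) for u in range(3) for v in range(3)]
x_bar = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y_bar = [1.0 + 2.0 * a for a in x_bar]
b_opt = best_bandwidth(coords, x_bar, y_bar, [0.5, 1.5, 10.0])
print(b_opt, cv_score(coords, x_bar, y_bar, 10.0))
```

A candidate bandwidth that leaves some location without any neighbour gets an infinite score here and is never selected; a real analysis would normally search a continuous range of bandwidths instead of a short list.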

Research methods
The data used in this study are secondary data from the Office of Community and Village Empowerment of Batang Regency. The dependent variable is the village classification index, and the independent variables are the community economy (X1), security and order (X2), and community participation in development (X3). The observation units are 31 villages in Batang Regency from 2015 to 2018, together with the geographical location of each village. The analytical method is Fixed Effect Geographically Weighted Panel Regression, carried out in R. The steps taken to analyze the data are:
1. Obtain the Village Profile data.
2. Estimate the panel data regression parameters using the common effect, fixed effect, and random effect models.
3. Conduct the Chow test to choose between the common effect model and the fixed effect model, and the Hausman test to choose between the random effect model and the fixed effect model.
4. Test the assumptions of panel data regression, namely normality, non-autocorrelation, homoscedasticity and non-multicollinearity.
5. Test for spatial heterogeneity.
6. Test the local non-multicollinearity assumption.
7. Calculate the Euclidean distance between the i-th location and the j-th location from their coordinates (ui,vi).
8. Calculate the optimum bandwidth with the minimum CV method.
9. Calculate the fixed bisquare weighting matrix using the optimum bandwidth.
10. Estimate the parameters of the fixed effect GWPR model using the fixed bisquare weighting matrix.
11. Test the fixed effect GWPR model.
12. Obtain the final model and the coefficient of determination.
13. Interpret the model obtained.
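The model selection in step 3 can be written as a small decision rule on the test p-values. The sketch below follows one common convention, which is an assumption here and not a procedure stated in this paper: the Chow test is read first, then the Hausman test, and the Lagrange multiplier test is used only when the Chow test does not reject. The function name and the p-values are hypothetical.

```python
def select_panel_model(chow_p, hausman_p, lm_p, alpha=0.05):
    """Pick a panel model from the Chow, Hausman and LM test p-values.

    Assumed convention:
    - Chow rejects H0 (common effect): compare fixed and random effects
      with the Hausman test.
    - Chow does not reject: compare common and random effects with the
      Lagrange multiplier test.
    """
    if chow_p < alpha:
        # Hausman H0 is the random effect model.
        return "fixed effect" if hausman_p < alpha else "random effect"
    # LM H0 is the common effect model.
    return "random effect" if lm_p < alpha else "common effect"


# Hypothetical p-values, for illustration only.
choice = select_panel_model(chow_p=0.001, hausman_p=0.02, lm_p=0.30)
print(choice)
```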

Panel data regression model
Three panel data regression models were estimated for the village classification index: the common effect, fixed effect, and random effect models. The estimated random effect model is:

$$\hat{y}_{it} = 0.302 + 0.134\,X_{1it} + 0.317\,X_{2it} + 0.098\,X_{3it}$$

In all three models the independent variables have a positive effect on the dependent variable. In the fixed effect model, the variable $X_{3it}$ is not significant. The variable $X_{2it}$ has the largest effect of the three. To select the best model, the Chow test and the Hausman test were carried out, with the results given in Table 1. Based on Table 1, the best model for the village classification index is the fixed effect model. The assumptions of the fixed effect panel data regression model were then checked. The tests showed that the residuals are not normally distributed, that the residual variance is constant (homoscedastic), that there is no autocorrelation in the residuals, and that there is no linear relationship between the independent variables.

Fixed effect geographically weighted panel regression model
The spatial heterogeneity test gives a BP value of 16.335 with a p-value of 0.0009, which indicates that there is spatial heterogeneity. Meanwhile, the assumption of local non-multicollinearity is fulfilled, because the VIF value of each independent variable at each location is below 10, which indicates that there is no local multicollinearity.
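The reported p-value can be checked against the chi-square distribution. The sketch below assumes 3 degrees of freedom, one per independent variable, which is an assumption made for this check and is not stated in the text. For 3 degrees of freedom the upper-tail probability has a closed form, so only the standard library is needed.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


bp = 16.335                      # reported Breusch-Pagan statistic
p_value = chi2_sf_df3(bp)        # assumed df = 3
reject_h0 = p_value < 0.05       # alpha = 0.05 chosen for illustration
print(p_value, reject_h0)
```

Under this assumption the computed tail probability is slightly below 0.001, close to the reported value, and H0 of no spatial heterogeneity is rejected at the 5% level.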

Conclusion
Based on the model selection for the panel regression, the fixed effect model was chosen for use in the Geographically Weighted Panel Regression model. In addition, the model suitability test shows that the fixed effect GWPR model is significantly different from the panel data regression model, and that the resulting model differs from one location to another. The fixed effect GWPR model produces an R² value of 39.52%.