Comparison of Four Kernels of SVR to Predict the Consumer Price Index

The economy of a region is affected by the stability of food supplies. If the market price of food is stable, purchasing power increases. Food price stability can be anticipated by using the Support Vector Regression (SVR) method to predict the Consumer Price Index (CPI). The CPI is assessed from data obtained by recording, measuring, and calculating the average price of goods and services consumed by households in a certain period of time. Goods and services that are deemed to represent household expenses are then averaged. The CPI in this study is the food supply category issued by Statistics Indonesia, and the input variables are taken from the prices of staple commodities in the cities of Surabaya, Malang, and Kediri, based on data from the Siskaperbapo website. To obtain the support vectors, SVR maximizes the margin of the hyperplane. This concept is able to overcome overfitting and thus yields more accurate prediction results. In predicting the Consumer Price Index, the reference data is divided into training data for 2016-2019 and testing data for 2017-2020. Four kernels were used in the test, namely the Spline, Gaussian-RBF, Linear, and Polynomial kernels. The four kernels are compared by their Mean Absolute Percentage Error (MAPE); the Gaussian-RBF kernel gives a MAPE of less than three. The smallest MAPE value is obtained for the Malang CPI, at 1.8242 with C = 50, followed by Kediri with a MAPE value of 2.251 with C = 50 and Surabaya with a MAPE value of 2.5279 with C = 50.


Introduction
Data mining is a series of processes for extracting added value from a data set, in the form of knowledge that could not be discovered manually, or the automatic analysis of complex big data with the aim of discovering patterns or trends that usually go unnoticed. The data mining process often involves statistical and mathematical methods and makes use of artificial intelligence technology. Data mining is part of data science and uses algorithms known as machine learning to explore the knowledge contained in datasets. It usually takes a long time to evaluate accuracy and select the best algorithm as the final model. The final result is then communicated through visualizations that can be interpreted by related parties such as the user [1]. Data science is growing rapidly for many reasons, including: fast-growing data sets; data being stored in data warehouses; increased data access through the web and intranets; pressure from business competition to increase market share in the global economy; continually increasing data volumes; and rapidly growing storage media [2].

SVR is also known as Support Vector Machine-Regression. The difference between SVM and SVR lies in the output and application of the system [3], [4]. In addition to extracting important information from large data, data mining is also used for prediction, such as predicting the CPI value with the expectation of keeping food commodity prices stable. The purpose of using a kernel in this research is to implement the model in a higher-dimensional space (the feature space) without having to define the mapping function from the input space to the feature space, so that cases that are not linearly separable in the input space are expected to become linearly separable in the feature space.

Research Methodology
There are 34 types of input data (t1, t2, t3, ..., t34), taken from food prices on the Siskaperbapo website for three different cities over four years, from 2016 to 2019. The input variables comprise 1632 data points for each city. The target variable consists of one attribute (CPI) with 1680 data points.

SVR method
The purpose of the SVR method is to improve generalization performance through an appropriate choice of kernel function. Kernel selection is therefore very important for a particular application [7]. SVR was first introduced and developed from the concept of SVM theory [8]. SVM is a technique for making predictions in both classification and regression cases [9]. SVM can approximate a regression function by using the ε-insensitive loss function, which is used to evaluate how well the regression function fits the data. The application of SVM to regression cases is called SVR. SVR can produce good performance because it can overcome the problem of overfitting, a condition in which a model does not describe the relationship between the input and output variables well but instead describes random error or noise, which results in poor predictions [9], [10]. The basic objective of SVR is to find a function f(x) that deviates from the actual target by at most ε for all training data and that is, at the same time, as flat as possible. In other words, the error does not matter as long as it is less than ε. The training data points that determine the regression function are known as support vectors, and they are the points used when predicting on the test data.
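The ε-insensitive idea described above can be illustrated with a minimal Python sketch. The function name `epsilon_insensitive_loss` and the sample values are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y - f| - epsilon) for each target/prediction pair.

    Errors smaller than epsilon cost nothing; larger errors are
    penalized only by the amount that exceeds epsilon.
    """
    return [max(0.0, abs(y - f) - epsilon) for y, f in zip(y_true, y_pred)]


# Hypothetical CPI-like targets and predictions (illustration only).
targets = [100.0, 101.0, 102.0]
predictions = [100.3, 102.0, 101.9]

losses = epsilon_insensitive_loss(targets, predictions, epsilon=0.5)
print(losses)  # only the second point lies outside the epsilon tube
```

With ε = 0.5, the first and third predictions fall inside the tube and contribute no loss, which is the sense in which the error "does not matter" below ε.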
In the standard SVR regression function f(x) = wᵀφ(x) + b, φ(x) is the result of mapping the input x from the input space into the feature space, w is the l-dimensional weight vector, and b is the bias. The coefficients w and b are estimated by minimizing the risk function defined in equation (1). SVR seeks a function f(xi) that deviates by at most ε from the actual target yi for all training data. When ε is equal to 0, no deviation is tolerated, which corresponds to a perfect regression. Conversely, a high ε value is associated with small slack variable values and low accuracy. The slack variables are added to handle constraints that would otherwise make the optimization problem infeasible [9].
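For orientation, the risk minimization with slack variables is commonly written in the literature as the ε-SVR primal problem below. This is the textbook formulation, stated here as an assumption; the paper's own equation may use different notation. Here C is the regularization constant and ξ_i, ξ_i^* are the slack variables for deviations above and below the ε tube.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to} \quad & y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \\
& w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

The first term keeps the function flat, while the second term penalizes only the deviations that exceed ε.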

Kernel Function
According to [3], a kernel function is a function k that satisfies the following condition for all input vectors x and z:

$$k(x, z) = \langle \varphi(x), \varphi(z) \rangle$$

where φ is the mapping from the input space to the feature space. Regression problems that have a nonlinear pattern can be solved by using kernel functions [11]. The four kernel functions used in the Consumer Price Index forecasting test are the Spline, Gaussian-RBF, Linear, and Polynomial kernels.
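As a rough illustration, the four kernels can be sketched in plain Python using their commonly cited textbook forms. The parameter names and defaults (`degree`, `coef0`, `sigma`) are assumptions, and the spline kernel follows the usual first-order formulation for non-negative inputs; the exact variants and parameter values used in the study are not stated here.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    # degree and coef0 are assumed defaults, not the study's settings.
    return (dot(x, z) + coef0) ** degree


def gaussian_rbf_kernel(x, z, sigma=1.0):
    # sigma is an assumed default width.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def spline_kernel(x, z):
    """Product over dimensions of the one-dimensional spline kernel
    1 + ab + ab*m - (a + b)/2 * m^2 + m^3/3, where m = min(a, b)."""
    result = 1.0
    for a, b in zip(x, z):
        m = min(a, b)
        result *= 1.0 + a * b + a * b * m - 0.5 * (a + b) * m ** 2 + m ** 3 / 3.0
    return result


# Hypothetical input vectors (illustration only).
x, z = [1.0, 2.0], [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, gaussian_rbf_kernel, spline_kernel):
    print(kernel.__name__, kernel(x, z))
```

Each function returns a single kernel value k(x, z); an SVR implementation would evaluate it for every pair of training points to build the kernel matrix.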

CPI Calculation
In Indonesia, the CPI is calculated by Statistics Indonesia. Since January 2014, CPI data have been obtained from surveys conducted in 82 cities throughout Indonesia. In East Java, the CPI is calculated for eight representative cities/districts. This study uses three of them, namely Surabaya, Malang, and Kediri. The CPI calculation is based on the 2012 Cost of Living Survey (SBH). This data is used as the basis for determining the commodity package, weights, city coverage, and base year for processing the CPI. The weighting list is revised every five years so that the Consumer Price Index value stays on the same scale, which is 100. Statistics Indonesia carries out this CPI calculation every month using the Modified Laspeyres method [12].
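A minimal sketch of a Modified Laspeyres calculation is given below, assuming the commonly published form in which each commodity's previous-month consumption value is updated by its month-on-month price relative and the total is divided by the base-year consumption value. The function name, argument names, and the two-commodity basket are hypothetical and do not come from the study or from official data.

```python
def modified_laspeyres_index(prices_now, prices_prev, prev_values, base_values):
    """Assumed Modified Laspeyres form:

    index = 100 * sum_i (P_n,i / P_(n-1),i) * (P_(n-1),i * Q_0,i)
                / sum_i (P_0,i * Q_0,i)

    prices_now[i], prices_prev[i]: price of commodity i in month n and n-1
    prev_values[i]: consumption value P_(n-1),i * Q_0,i
    base_values[i]: base-year consumption value P_0,i * Q_0,i
    """
    numerator = sum(
        (p_now / p_prev) * value
        for p_now, p_prev, value in zip(prices_now, prices_prev, prev_values)
    )
    return 100.0 * numerator / sum(base_values)


# Hypothetical two-commodity basket (illustration only).
prices_now = [12.1, 20.0]
prices_prev = [11.0, 20.0]
prev_values = [550.0, 500.0]
base_values = [500.0, 500.0]

cpi = modified_laspeyres_index(prices_now, prices_prev, prev_values, base_values)
print(round(cpi, 2))
```

In this toy basket the first commodity rises by 10% in the current month while the second is unchanged, so the index moves above the base value of 100.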

MAPE
Mean Absolute Percentage Error (MAPE) is a measure of relative error. MAPE is informative because it states the error of an estimate or forecast as a percentage of the actual result during a certain period. MAPE is the average absolute error over that period, taken relative to the actual value and multiplied by 100%, and it is also a measure of relative precision used to determine the percentage deviation of the estimation results. This approach is useful when the size of the forecast variable is important in evaluating the accuracy of the forecast. In addition, MAPE indicates how large the estimation error is compared to the real value [14]. In this paper, MAPE is used to measure the performance of the four SVR kernels in predicting the Consumer Price Index. The MAPE equation is shown below.

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{X_i - F_i}{X_i} \right| \times 100\%$$

Here X_i is the actual value in period i, F_i is the predicted value in period i, and n is the number of predicted periods. If the MAPE value is below 10%, the prediction can be regarded as very good [5].
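The MAPE definition can be expressed as a short Python sketch. The function name `mape` and the sample values are hypothetical and serve only to show the arithmetic; actual values are assumed to be non-zero.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    actual[i] is the true value X_i and predicted[i] is the forecast F_i.
    Every actual value must be non-zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((x - f) / x) for x, f in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical reference and predicted CPI values (illustration only).
reference_cpi = [100.0, 200.0]
predicted_cpi = [98.0, 204.0]

score = mape(reference_cpi, predicted_cpi)
print(score)  # each forecast is off by 2% of its actual value
```

Because each error is divided by the actual value, forecasts for periods with different CPI levels contribute on the same percentage scale.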

Testing Result
In the first experiment, the four kernels (Spline, Gaussian-RBF, Linear, and Polynomial) were tried in turn. The smallest MAPE value, which was less than 3, was obtained with the Gaussian-RBF kernel, followed by the Polynomial kernel and then the Spline kernel.

For the city of Surabaya with C = 1000, the predicted CPI value (in blue) is close to the reference CPI, and in months 8-10 the predicted value is almost the same as the reference value. For the city of Kediri with C = 1000, the predicted CPI value (in blue) is close to the reference CPI (in green).

The table covers three cities (Surabaya, Malang, and Kediri) with a trial of four kernels (Spline, Gaussian-RBF, Linear, and Polynomial). It can be concluded that the Gaussian-RBF kernel is the most appropriate for determining the predicted value: across all C values tried, its MAPE tends to be more stable, with values less than three [5]. With the same value C = 50, the MAPE value for Malang is 1.8242, smaller than the MAPE values of Kediri (2.2511) and Surabaya (2.5279). Malang has a stable MAPE value close to 1.8 when tested using the four kernels at C = 50, so it can be concluded that Malang is relatively stable in the prediction of the Consumer Price Index.

Conclusion
The SVR method with four kernels has succeeded in predicting the Consumer Price Index in three cities, namely Malang, Surabaya, and Kediri. Among the three cities, Malang has the smallest MAPE value, 1.8242 with C = 50, followed by Kediri with a MAPE value of 2.251 with C = 50 and Surabaya with the highest MAPE value of 2.5279 with C = 50. Of the four kernels, the Gaussian-RBF kernel shows the most stable performance compared with the Spline, Linear, and Polynomial kernels. The Gaussian-RBF kernel is the best method because it produces a stable MAPE value.