Forecasting the amount of rainfall in West Kalimantan using Generalized Space-time Autoregressive model

The generalized space-time autoregressive (GSTAR) model is a space-time model for analyzing time-series data observed at several locations that are assumed to be correlated. In this research, the GSTAR model is applied to forecast the amount of rainfall in West Kalimantan at three stations: Sintang Station, Melawi Station, and Ketapang Station. The model is built on rainfall data for the period January 2013 to December 2017 and validated on data for the period January 2018 to December 2018. Three spatial weights are used (uniform weights, inverse distance weights, and normalized cross-correlation weights), and the parameters are estimated by ordinary least squares (OLS). The best model is the one with the smallest root mean square error (RMSE). The results show that all three spatial weights give GSTAR(1:1) models of nearly equal quality, because their RMSE values are almost the same. The model is therefore used to forecast the amount of rainfall for the period January 2019 to December 2019. The forecast results show that for Sintang Station and Melawi Station, the highest amount of rainfall is estimated to occur in March 2019 and the lowest in August 2019. For Ketapang Station, the highest amount of rainfall is estimated to occur in January 2019 and the lowest in September 2019.


Introduction
Rain is part of the water cycle and is the main source of water for the earth's surface. The volume of rain that falls in a given period is called the amount of rainfall. The amount of rainfall in an area is usually influenced by the amount of rainfall in other areas, so nearby locations tend to receive almost the same amount of rainfall. Apart from being influenced by other regions, rainfall is also influenced by its values at previous times. Therefore, rainfall data are considered spatio-temporal data.
The amount of rainfall at a location can be predicted using a space-time model, one of which is the Generalized Space-time Autoregressive (GSTAR) model. The GSTAR model was chosen because it can be used to predict a phenomenon in one area that is influenced by events in other areas. The GSTAR model is a generalization of the Space-Time Autoregressive (STAR) model. In the STAR model, the parameters are assumed to be the same for all locations, while in the GSTAR model they are allowed to differ between locations. Therefore, the GSTAR model is considered more realistic than the STAR model [1]. In the GSTAR model, the spatial relationship is represented by a spatial weight matrix whose element $w_{ij}$ represents the spatial relationship between location $i$ and location $j$ [1].
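The GSTAR model with time order $p$ and spatial orders $\lambda_1, \lambda_2, \ldots, \lambda_p$, denoted GSTAR($p$: $\lambda_1, \lambda_2, \ldots, \lambda_p$), can be written as follows.
$$Z(t) = \sum_{k=1}^{p} \left[ \Phi_{k0} Z(t-k) + \sum_{\ell=1}^{\lambda_k} \Phi_{k\ell} W^{(\ell)} Z(t-k) \right] + e(t) \qquad (1)$$
Here $Z(t) = (Z_1(t), Z_2(t), \ldots, Z_N(t))'$ is the vector of observations at $N$ locations at time $t$, and $\Phi_{k\ell} = \mathrm{diag}(\phi_{k\ell}^{(1)}, \ldots, \phi_{k\ell}^{(N)})$ is a diagonal matrix in which $\phi_{k\ell}^{(i)}$ denotes the space-time parameter at time lag $k$ and spatial order $\ell$ in location $i$. Meanwhile, $W^{(\ell)}$ is a spatial weight matrix of order $\ell$ with elements $0 \le w_{ij}^{(\ell)} \le 1$ and $\sum_{j \ne i} w_{ij}^{(\ell)} = 1$, and $e(t) = (e_1(t), e_2(t), \ldots, e_N(t))'$ is the error vector of dimension $N$ at time $t$, where $e_i(t)$ represents the model error at location $i$ at time $t$. The errors are assumed to be white noise (constant mean and variance, and uncorrelated) and normally distributed.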
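As a worked illustration, the first-order special case GSTAR(1:1) for three locations can be written out explicitly. This is only a sketch under assumed notation: $Z_i(t)$ is the observation at location $i$, $\phi_{10}^{(i)}$ and $\phi_{11}^{(i)}$ are the time-lag and space-time parameters of location $i$, $w_{ij}$ are the first-order spatial weights, and $e_i(t)$ are the errors. No numerical values from this study are implied.

```latex
\[
\begin{aligned}
\begin{pmatrix} Z_1(t) \\ Z_2(t) \\ Z_3(t) \end{pmatrix}
&=
\begin{pmatrix}
\phi_{10}^{(1)} & 0 & 0 \\
0 & \phi_{10}^{(2)} & 0 \\
0 & 0 & \phi_{10}^{(3)}
\end{pmatrix}
\begin{pmatrix} Z_1(t-1) \\ Z_2(t-1) \\ Z_3(t-1) \end{pmatrix} \\
&\quad +
\begin{pmatrix}
\phi_{11}^{(1)} & 0 & 0 \\
0 & \phi_{11}^{(2)} & 0 \\
0 & 0 & \phi_{11}^{(3)}
\end{pmatrix}
\begin{pmatrix}
0 & w_{12} & w_{13} \\
w_{21} & 0 & w_{23} \\
w_{31} & w_{32} & 0
\end{pmatrix}
\begin{pmatrix} Z_1(t-1) \\ Z_2(t-1) \\ Z_3(t-1) \end{pmatrix}
+
\begin{pmatrix} e_1(t) \\ e_2(t) \\ e_3(t) \end{pmatrix}
\end{aligned}
\]
```

The first row reads $Z_1(t) = \phi_{10}^{(1)} Z_1(t-1) + \phi_{11}^{(1)} [w_{12} Z_2(t-1) + w_{13} Z_3(t-1)] + e_1(t)$, so each station depends on its own previous value and on a weighted combination of the previous values at the other two stations.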
The GSTAR model has been applied in various fields, such as modeling GDP data in West European countries [2] and forecasting oil production at the Volcanic Layer Jatibarang, West Java, Indonesia [3]. Motivated by these results, this research focuses on forecasting the amount of rainfall in West Kalimantan using the GSTAR model with three spatial weights: uniform weight, inverse distance weight, and normalized cross-correlation weight. In 2017, [4] also predicted the amount of rainfall in West Kalimantan, but with different locations and different types of spatial weights.

Method
The data used in this research are rainfall data at three stations in West Kalimantan: Sintang Station, Melawi Station, and Ketapang Station. The data were obtained from Badan Pusat Statistik (BPS), West Kalimantan Province, and cover January 2013 to December 2018. The variables used in this research are presented in Table 1.

Result and Discussion
This section discusses the research results, including the data description, stationarity testing, spatial weight matrices, GSTAR model identification, parameter estimation, checking of the residual assumptions, calculation of the RMSE value, and forecasting of the amount of rainfall in the next period.

Data Description
The statistics of the rainfall data at the three stations in West Kalimantan are presented in Table 2. The table shows that Melawi Station has the highest average rainfall, while Sintang Station has the lowest. The range between the minimum and maximum values is quite large. Compared with the other stations, the standard deviation at Melawi Station is also quite large, which indicates that the rainfall data at Melawi Station are widely spread.

Stationarity Test
Data are stationary if the mean and variance are constant over time [5]. Data are "stationary in the mean" if they fluctuate stably around a constant average. Data are "stationary in the variance" if the size of the fluctuations stays constant over time. Stationarity can be checked by examining the time series plot and by the ADF test. The data plot can be seen in Figure 1. Figure 1 shows that the series fluctuate around their average values and that the fluctuations are stable, so it can be concluded that the rainfall data at the three locations are stationary in the mean and variance.
Identifying stationarity only from a plot of the data is often subjective. Alternatively, stationarity can be identified using the ADF test. The ADF test was carried out using the tseries package in R 4.0.3. The ADF test results are shown in Table 3. Table 3 shows that each variable has a p-value < 0.05, so the null hypothesis of a unit root is rejected at the 5% level, which supports the conclusion from the plot that the data are stationary.

Spatial Weight Matrix
The weight matrix of the GSTAR model used in this research is the uniform weight matrix, the inverse distance weight matrix, and the normalized cross-correlation weight matrix.
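Uniform weight matrix.
The elements of a uniform weight matrix in the GSTAR model are calculated from the number of nearby locations. According to [6], the uniform weight is calculated by the formula
$$w_{ij} = \frac{1}{n_i} \qquad (2)$$
where $n_i$ is the number of locations near location $i$ in the first-order spatial lag. The uniform weight matrix is obtained from Table 4 and equation (2).

Inverse distance weight matrix.
The elements of an inverse distance weight matrix are calculated from the distances between locations, so that closer locations receive larger weights:
$$w_{ij} = \frac{1/d_{ij}}{\sum_{k \ne i} 1/d_{ik}} \quad \text{for } i \ne j \qquad (3)$$
and $w_{ij} = 0$ for $i = j$. In equation (3), $d_{ij}$ is the distance from location $i$ to location $j$ [7].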
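The uniform and inverse distance weights can be sketched in a few lines of Python. The neighbour lists and distances below are hypothetical (three stations that are all neighbours of one another, with made-up distances in km); they are not the values used in this study, and the function names are illustrative only.

```python
def uniform_weights(neighbors):
    """Row i gets 1/n_i for each of its n_i neighbours and 0 elsewhere."""
    n = len(neighbors)
    w = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w


def inverse_distance_weights(dist):
    """Row-normalised inverse distances with a zero diagonal."""
    n = len(dist)
    w = []
    for i in range(n):
        inv = [0.0 if j == i else 1.0 / dist[i][j] for j in range(n)]
        total = sum(inv)
        w.append([v / total for v in inv])
    return w


# Hypothetical inputs (assumptions, not data from the paper).
neighbors = [[1, 2], [0, 2], [0, 1]]          # every station neighbours the other two
dist = [[0, 50, 300], [50, 0, 250], [300, 250, 0]]  # symmetric distances in km

W_uniform = uniform_weights(neighbors)
W_inverse = inverse_distance_weights(dist)

for row in W_uniform:
    print(row)
for row in W_inverse:
    print([round(v, 4) for v in row])
```

With three mutually neighbouring stations, every off-diagonal uniform weight is 0.5, whereas the inverse distance weights shift most of each row's weight to the closer station.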

Normalized cross-correlation weight matrix.
The elements of a cross-correlation weight matrix between two locations are calculated for each time lag $k$. According to [8], the normalized cross-correlation weight is calculated by formula (4),
$$w_{ij} = \frac{r_{ij}(k)}{\sum_{m \ne i} |r_{im}(k)|} \quad \text{for } i \ne j \qquad (4)$$
where $r_{ij}(k)$ is the cross-correlation between location $i$ and location $j$ at time lag $k$. The cross-correlation values for the first time lag are presented in Table 6. The normalized cross-correlation weight matrix is obtained from equation (4) and Table 6.
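A minimal sketch of the normalized cross-correlation weights follows, assuming the usual sample cross-correlation between one series and the lagged values of another. The three short series are made-up numbers for illustration, not the rainfall data of this study, and the function names are hypothetical.

```python
import math


def cross_correlation(x, y, lag):
    """Sample cross-correlation between x(t) and y(t - lag)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((x[t] - mx) * (y[t - lag] - my) for t in range(lag, n))
    den = math.sqrt(sum((v - mx) ** 2 for v in x) * sum((v - my) ** 2 for v in y))
    return num / den


def normalized_cc_weights(series, lag=1):
    """Divide each cross-correlation by the row sum of absolute cross-correlations."""
    n = len(series)
    weights = []
    for i in range(n):
        r = [0.0 if j == i else cross_correlation(series[i], series[j], lag)
             for j in range(n)]
        total = sum(abs(v) for v in r)
        weights.append([v / total for v in r])
    return weights


# Made-up monthly values for three stations (assumption for illustration only).
series = [
    [310, 250, 400, 280, 190, 120, 90, 150, 220, 330, 380, 360],
    [330, 270, 420, 300, 210, 150, 100, 140, 240, 350, 400, 390],
    [380, 300, 290, 260, 200, 130, 80, 60, 110, 250, 340, 370],
]

W_cc = normalized_cc_weights(series, lag=1)
for row in W_cc:
    print([round(v, 4) for v in row])
```

Unlike the other two weights, these weights can be negative when a cross-correlation is negative; the normalization only guarantees that the absolute values in each row sum to one.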

Model Identification
The order of the GSTAR model can be identified from the ACF plot and the PACF plot for each location, which can be seen in Figure 2 and Figure 3. Figure 2 and Figure 3 show that the ACF and PACF plots cut off at the first lag, so the only candidate order is the GSTAR(1:1) model.

Model Estimation
Under the classical linear model assumptions, the OLS estimator is the best linear unbiased estimator (BLUE). According to [1], the GSTAR(1:1) model for each location $i$ can be written as
$$Z_i(t) = \phi_{10}^{(i)} Z_i(t-1) + \phi_{11}^{(i)} \sum_{j \ne i} w_{ij} Z_j(t-1) + e_i(t)$$
Therefore, the GSTAR model for all locations can be presented as the linear model
$$Y = X\beta + e$$
where $Y = (Y_1', \ldots, Y_N')'$, $X = \mathrm{diag}(X_1, \ldots, X_N)$, $\beta = (\phi_{10}^{(1)}, \phi_{11}^{(1)}, \ldots, \phi_{10}^{(N)}, \phi_{11}^{(N)})'$, and $e = (e_1', \ldots, e_N')'$. Here $Y_i$ stacks the observations $Z_i(t)$ of location $i$, and $X_i$ holds the corresponding lagged regressors. The OLS estimates are obtained by minimizing the sum of squared errors, defined as
$$S = e'e = (Y - X\beta)'(Y - X\beta)$$
The estimation results of the GSTAR model parameters using uniform weights, inverse distance weights, and normalized cross-correlation weights can be seen in Table 7. Substituting the estimates in Table 7 into the matrices $\Phi_{k\ell}$ and $W^{(\ell)}$ gives the GSTAR(1:1) model equation for each location under each spatial weight.

A negative coefficient in the GSTAR model equation means that the amount of rainfall in the previous period negatively affects the amount of rainfall in the current period. Conversely, a positive coefficient means that the amount of rainfall in the previous period has a positive effect on the amount of rainfall in the current period. For example, the GSTAR(1:1) model with uniform weight at Ketapang Station can be interpreted as follows: if the amount of rainfall at Ketapang Station last month increases by 1 mm while the rainfall at the other stations is held constant, the rainfall at Ketapang Station in the next period decreases by 0.522 mm.
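The OLS step can be sketched as follows. Because the design matrix is block diagonal, the stacked regression separates into one two-parameter regression per location, which is solved here through the 2 x 2 normal equations. The sketch assumes the series are already centred (no intercept), and it checks itself on noise-free data simulated from known parameters; the parameter values, weight matrix, and function name are hypothetical and are not the estimates in Table 7.

```python
def fit_gstar11(z, w):
    """OLS estimates (phi10_i, phi11_i) of a GSTAR(1:1) model for each location.

    z: list of N series, each a list of T centred observations
    w: N x N spatial weight matrix with zero diagonal
    """
    n_loc, n_time = len(z), len(z[0])
    params = []
    for i in range(n_loc):
        y = [z[i][t] for t in range(1, n_time)]
        x1 = [z[i][t - 1] for t in range(1, n_time)]  # own lag
        x2 = [sum(w[i][j] * z[j][t - 1] for j in range(n_loc))
              for t in range(1, n_time)]              # spatial lag
        s11 = sum(a * a for a in x1)
        s12 = sum(a * b for a, b in zip(x1, x2))
        s22 = sum(b * b for b in x2)
        s1y = sum(a * c for a, c in zip(x1, y))
        s2y = sum(b * c for b, c in zip(x2, y))
        det = s11 * s22 - s12 * s12                   # zero if regressors are collinear
        phi10 = (s22 * s1y - s12 * s2y) / det
        phi11 = (s11 * s2y - s12 * s1y) / det
        params.append((phi10, phi11))
    return params


# Hypothetical check: simulate noise-free data from known parameters.
W = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
true_phi10 = [0.5, 0.3, -0.4]
true_phi11 = [0.2, 0.4, 0.3]
z = [[10.0], [-5.0], [3.0]]
for t in range(1, 8):
    prev = [z[i][t - 1] for i in range(3)]
    for i in range(3):
        spatial = sum(W[i][j] * prev[j] for j in range(3))
        z[i].append(true_phi10[i] * prev[i] + true_phi11[i] * spatial)

estimates = fit_gstar11(z, W)
for i, (p10, p11) in enumerate(estimates):
    print(i, round(p10, 6), round(p11, 6))
```

On noise-free data the estimates reproduce the generating parameters up to rounding; with real rainfall data the residuals would be nonzero and should be checked against the model assumptions.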

Checking Model Assumption
The model assumptions in this research are the white noise assumption and the normality assumption. The white noise assumption is checked from a residual plot that is stationary around zero and from residual ACF and PACF plots that lie within their confidence bounds. The normality assumption is checked from a symmetric histogram and a linear Q-Q plot of the residuals. The results of checking the GSTAR(1:1) model assumptions for each spatial weight are presented in Table 8. Table 8 shows that the GSTAR(1:1) model assumptions are fulfilled for each spatial weight matrix, which means that the model is suitable for forecasting the amount of rainfall in West Kalimantan.

Calculation of RMSE Value
The criterion for selecting the best model is the RMSE value: the best model has the smallest RMSE. The RMSE value can be calculated by the following formula
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( Z_t - \hat{Z}_t \right)^2}$$
where $Z_t$ is the observed value at time $t$, $\hat{Z}_t$ is the predicted value at time $t$, and $n$ is the number of observations. The best model is selected using the RMSE of the in-sample residuals, while forecasting accuracy is assessed using the RMSE of the out-sample residuals. Table 9 shows that the in-sample RMSE and the out-sample RMSE are almost the same for each spatial weight. This indicates that all the models produced are equally good, so the amount of rainfall in the next period is forecast using the GSTAR(1:1) model with all three spatial weights, namely uniform weight, inverse distance weight, and normalized cross-correlation weight.
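The RMSE calculation is simple enough to sketch directly. The observed and predicted values below are made-up numbers in mm, chosen only so that the arithmetic is easy to follow; they are not taken from Table 9.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values: errors are -10, 10, -20, so the mean squared error is 200.
observed = [200.0, 250.0, 300.0]
predicted = [210.0, 240.0, 320.0]
value = rmse(observed, predicted)
print(round(value, 4))  # 14.1421
```

The same function can be applied once to the in-sample fitted values and once to the out-sample forecasts to obtain the two RMSE values compared in the text.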

Forecasting The Next Period
A good forecast is one whose value is close to the true value. The amount of rainfall in the next period is forecast with the GSTAR(1:1) model under the three spatial weights: uniform weight, inverse distance weight, and normalized cross-correlation weight. The forecasts for the next periods are presented in Table 10.
Table 10 shows that the forecast amounts of rainfall from January 2019 to December 2019 fluctuate at Sintang Station, Melawi Station, and Ketapang Station. For Sintang Station and Melawi Station, the highest amount of rainfall is estimated to occur in March 2019 and the lowest in August 2019. For Ketapang Station, the highest amount of rainfall is estimated to occur in January 2019 and the lowest in September 2019.
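Multi-step forecasts from a GSTAR(1:1) model can be produced by iterating the model equation with future errors set to zero, feeding each forecast back in as the next input. The sketch below assumes centred data, so station means would need to be added back to obtain rainfall amounts; the parameter values, weight matrix, and starting values are hypothetical and are not the fitted model behind Table 10.

```python
def forecast_gstar11(last_obs, phi10, phi11, w, steps):
    """Iterate the GSTAR(1:1) equation forward with future errors set to zero."""
    n = len(last_obs)
    current = list(last_obs)
    path = []
    for _ in range(steps):
        spatial = [sum(w[i][j] * current[j] for j in range(n)) for i in range(n)]
        current = [phi10[i] * current[i] + phi11[i] * spatial[i] for i in range(n)]
        path.append(current)
    return path


# Hypothetical inputs (assumptions for illustration only).
W = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
phi10 = [0.5, 0.3, -0.4]
phi11 = [0.2, 0.4, 0.3]
last_obs = [10.0, -5.0, 3.0]  # centred values for the final observed month

path = forecast_gstar11(last_obs, phi10, phi11, W, steps=12)
for month, values in enumerate(path, start=1):
    print(month, [round(v, 3) for v in values])
```

Because each step reuses the previous forecast, forecast uncertainty accumulates over the horizon, which is worth keeping in mind when reading the later months of a twelve-month forecast.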

Conclusion
The GSTAR model suitable for rainfall forecasting in West Kalimantan is the GSTAR(1:1) model with uniform weight, inverse distance weight, or normalized cross-correlation weight. The forecast results show that for Sintang Station and Melawi Station, the highest amount of rainfall is estimated to occur in March 2019 and the lowest will occur in August 2019. As for Ketapang Station, the highest amount of rainfall is estimated to occur in January 2019 and the lowest will occur in September 2019.