Hybrid model for forecasting time series with trend, seasonal and calendar variation patterns

Most monthly economic and business time series in Indonesia and other Muslim countries contain not only trend and seasonal patterns but are also affected by two types of calendar variation effects: trading-day effects (the number of working or trading days) and holiday effects. The purpose of this research is to develop a hybrid model, that is, a combination of several forecasting models, to predict time series that contain trend, seasonal and calendar variation patterns. This hybrid model combines classical models (namely time series regression and ARIMA models) and/or modern methods (an artificial intelligence method, namely artificial neural networks). A simulation study was used to show that the proposed procedure for building the hybrid model works well for forecasting time series with trend, seasonal and calendar variation patterns. The proposed hybrid model is then applied to real data, namely monthly currency inflow and outflow at Bank Indonesia. The results show that the hybrid model tends to provide more accurate forecasts than the individual forecasting models. This result is also in line with the third result of the M3 competition, namely that hybrid models on average provide more accurate forecasts than individual models.


Introduction
Since Makridakis and Hibon [1] published "M3 competition: results, conclusions, and implications", and particularly its third result, which states that "the accuracy when various methods are being combined outperforms, on average, the individual methods being combined and does very well in comparison to other methods", research on hybrid forecasting methods has grown rapidly. Zhang [2] was among the first researchers to propose a hybrid model for time series forecasting, combining ARIMA as a linear model with neural networks as a nonlinear model. Zhang's work has influenced many forecasting researchers to develop hybrid methods for solving forecasting problems.
Most monthly economic and business time series in Indonesia and other Muslim countries contain not only trend and seasonal patterns but are also affected by calendar variation. There are two types of calendar variation effects: trading-day effects (the number of working or trading days) and holiday effects. Monthly fashion sales in a retail company [3], national and city-level inflation [4], and currency inflow and outflow at the central bank [5] are examples of economic and business data with trend, seasonal, and calendar variation patterns.
Several methods are commonly used for forecasting time series with trend, seasonal, and calendar variation patterns. Liu [6] and Cleveland and Devlin [7] were among the first researchers to study calendar variation effects in time series. More recently, Suhartono et al. [3] proposed two-level ARIMAX and regression models for time series with calendar variation effects due to Eid ul-Fitr together with trend and seasonal patterns. In general, the models developed for forecasting trend, seasonal, and calendar variation patterns have focused on individual models, such as time series regression models, ARIMAX models, and models based on artificial intelligence methods such as neural networks. This paper focuses on developing a hybrid model that combines classical linear time series models (such as time series regression and ARIMAX models) with modern nonlinear models based on artificial intelligence methods, particularly neural networks, for forecasting time series with trend, seasonal, and calendar variation patterns. The proposed models are two-level models, with a classical linear time series model at the first level and a neural network at the second level. The models are applied to two kinds of data: simulated data and real data on currency inflow and outflow at Bank Indonesia.

Method
In this section, three methods for handling trend, seasonal, and calendar variation patterns are presented: ARIMAX as a classical linear model, neural networks as a modern nonlinear model, and a hybrid model that combines the classical linear and modern nonlinear models.

ARIMAX
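Cryer and Chan [8] describe the ARIMAX model as an ARIMA model with additional input variables, also known as exogenous variables. The general ARIMAX model for forecasting data with trend, seasonal, and calendar variation patterns is as follows [3]:

$$Y_t = \beta t + \sum_{s=1}^{S} \gamma_s M_{s,t} + \sum_{j=1}^{J} \delta_j V_{j,t} + \frac{\theta_q(B)\,\Theta_Q(B^S)}{\phi_p(B)\,\Phi_P(B^S)\,(1-B)^d\,(1-B^S)^D}\, a_t \qquad (1)$$

where $t = 1, 2, \ldots, n$; $M_{s,t}$ are dummy variables for the seasonal pattern; $V_{j,t}$ is a dummy variable for the $j$-th calendar variation effect; $S$ is the seasonal period; $B$ is the backshift operator; $\phi_p(B)$, $\Phi_P(B^S)$, $\theta_q(B)$ and $\Theta_Q(B^S)$ are the nonseasonal and seasonal autoregressive and moving average polynomials; and $a_t$ is a white noise process with zero mean and constant variance.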
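The deterministic regressors in the ARIMAX model above (a linear trend, monthly seasonal dummies, and calendar variation dummies) can be assembled as a design matrix before the ARIMA noise term is modeled. The Python sketch below is illustrative only: the Eid dates in `EID_DATES`, the rule that splits a month into four weeks (days 22 onward count as week 4), and all helper names are assumptions rather than the paper's exact specification, and the ARIMA part of the model is not fitted here.

```python
from datetime import date

# ASSUMPTION: illustrative Eid ul-Fitr dates; replace with the official
# calendar for every year in the sample.
EID_DATES = {
    2015: date(2015, 7, 17),
    2016: date(2016, 7, 6),
}


def eid_week(d):
    """Week of the month (1-4) containing date d; days 22-31 count as week 4."""
    return min((d.day - 1) // 7 + 1, 4)


def design_row(t, year, month, n_weeks=4):
    """Deterministic ARIMAX regressors for period t.

    Layout: [t, M_1, ..., M_12, V_1, ..., V_4], where M_s = 1 in calendar
    month s and V_j = 1 if Eid falls in week j of this month
    (hypothetical dummy scheme for illustration).
    """
    seasonal = [1 if month == s else 0 for s in range(1, 13)]
    calendar = [0] * n_weeks
    eid = EID_DATES.get(year)
    if eid is not None and eid.month == month:
        calendar[eid_week(eid) - 1] = 1
    return [t] + seasonal + calendar


def design_matrix(start_year, start_month, n):
    """Rows for n consecutive months starting at (start_year, start_month)."""
    rows = []
    year, month = start_year, start_month
    for t in range(1, n + 1):
        rows.append(design_row(t, year, month))
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return rows


X = design_matrix(2015, 1, 24)
print(X[6])  # July 2015: trend 7, M_7 = 1, Eid in week 3 so V_3 = 1
```

Using one dummy per Eid week lets the estimated holiday effect differ with the week in which Eid falls; the rows can then be supplied as exogenous regressors to a regression or ARIMAX routine.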

Neural Networks
Neural networks (NN) are a machine learning technique developed as a generalization of mathematical models of the biological nervous system. The neural network model most commonly used in time series forecasting is the feedforward neural network (FFNN), also called the multilayer perceptron (MLP) [9]. The accuracy of a neural network model is determined by three components: the network architecture, the training method or algorithm, and the activation functions. An FFNN with p inputs and one hidden layer of m neurons is illustrated in Figure 1.
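The FFNN model in Figure 1 can be written as follows:

$$\hat{Y}_t = g_2\left( \sum_{j=1}^{m} v_j \, g_1\left( \sum_{i=1}^{p} w_{ji} X_{i,t} \right) \right) \qquad (2)$$

where $X_{i,t}$ is the $i$-th input (for example, a lagged value of the series), $w_{ji}$ are the weights connecting the input layer to the hidden layer, $v_j$ are the weights connecting the hidden layer to the output layer, and $g_1(\cdot)$ and $g_2(\cdot)$ are the activation functions of the hidden and output layers. The most widely used activation functions are the logistic sigmoid and the hyperbolic tangent.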
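A forward pass through the FFNN of Figure 1 makes the roles of the weights and activation functions concrete: each hidden neuron applies $g_1$ to a weighted sum of the inputs, and the output applies $g_2$ to a weighted sum of the hidden activations. This is a minimal sketch assuming a logistic sigmoid hidden layer, an identity (linear) output activation, and no bias terms; practical implementations usually add biases.

```python
import math


def logistic(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    """Linear (identity) output activation."""
    return z


def ffnn_forward(x, w, v, g1=logistic, g2=identity):
    """Output of a one-hidden-layer FFNN without bias terms.

    x : p inputs, e.g. selected lags of the series
    w : m rows of p input-to-hidden weights (w[j][i] plays the role of w_ji)
    v : m hidden-to-output weights (v[j] plays the role of v_j)
    """
    hidden = [g1(sum(w_ji * x_i for w_ji, x_i in zip(w_j, x))) for w_j in w]
    return g2(sum(v_j * h_j for v_j, h_j in zip(v, hidden)))


# Two inputs, two hidden neurons: zero inputs give sigmoid(0) = 0.5 per neuron.
print(ffnn_forward([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))  # 1.0
```

Passing `g1=math.tanh` swaps in the hyperbolic tangent, the other common hidden-layer choice.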

Hybrid Model
A hybrid model combines linear and nonlinear models, usually to improve forecast accuracy. In general, a combination of linear and nonlinear models has the following form [2]:

$$Y_t = L_t + N_t \qquad (3)$$

where $L_t$ is the linear component and $N_t$ is the nonlinear component of the model. In this paper, an NN is used to model the nonlinear component, as proposed by Zhang [2].
The hybrid model is estimated in two steps. First, the linear component is modeled and its residuals are obtained; then a nonlinear model is applied to these residuals to capture the nonlinear component. In this paper, the ARIMAX model is used for the linear component. Let $e_t$ be the residual at period $t$ from the first-stage linear (ARIMAX) model, i.e.

$$e_t = Y_t - \hat{L}_t \qquad (4)$$

where $\hat{L}_t$ is the forecast of the linear model at period $t$. An NN is then applied to model $e_t$ as follows:

$$e_t = f(e_{t-1}, e_{t-2}, \ldots, e_{t-k}) + \varepsilon_t \qquad (5)$$

where $f(\cdot)$ is the nonlinear function estimated by the NN model, $k$ is the largest residual lag used as input, and $\varepsilon_t$ is the residual of the NN model. Writing $\hat{N}_t$ for the NN forecast of $e_t$, the forecast of the hybrid ARIMAX-NN model is

$$\hat{Y}_t = \hat{L}_t + \hat{N}_t.$$
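The two-step estimation can be sketched as plain data handling: compute residuals from the linear fit, arrange lagged residuals as NN inputs, and add the two forecasts. The helper names below are hypothetical, and the ARIMAX and NN fitting steps themselves are left out.

```python
def residuals(y, linear_fit):
    """First step: e_t = Y_t - L_hat_t for each training period."""
    return [y_t - l_t for y_t, l_t in zip(y, linear_fit)]


def lagged_inputs(e, lags):
    """Second-step training pairs for the NN.

    Each input row for period t is [e_{t-k} for k in lags] and the target is
    e_t; periods without enough history (t < max(lags)) are skipped.
    """
    start = max(lags)
    inputs = [[e[t - k] for k in lags] for t in range(start, len(e))]
    targets = e[start:]
    return inputs, targets


def hybrid_forecast(linear_forecast, nonlinear_forecast):
    """Final step: Y_hat_t = L_hat_t + N_hat_t, period by period."""
    return [l_t + n_t for l_t, n_t in zip(linear_forecast, nonlinear_forecast)]


e = residuals([10.0, 12.0, 11.0, 13.0], [9.0, 12.0, 12.0, 12.0])  # [1.0, 0.0, -1.0, 1.0]
inputs, targets = lagged_inputs(e, [1, 2])  # [[0.0, 1.0], [-1.0, 0.0]], [-1.0, 1.0]
print(hybrid_forecast([100.0, 200.0], [1.5, -2.0]))  # [101.5, 198.0]
```

With a sparse lag set such as lags 1, 2, 12, 13, 14, 25 and 26, `lagged_inputs` produces 7-column rows and drops the first 26 periods, which lack enough history.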

Results
In this section, the results of both the simulation study and the real-data case studies are presented. The simulation study is conducted to evaluate the performance of each method for forecasting data with trend, seasonal, and calendar variation patterns. The proposed hybrid method is then applied to real monthly data on currency outflow and inflow at Bank Indonesia.

The results of simulation study
The simulated data containing trend, seasonal, and calendar variation patterns are generated using the model as follows: Three forecasting methods are applied to these four simulation scenarios and their forecast accuracy is compared. Table 1 compares the RMSE of the ARIMAX, NN, and hybrid models on the testing data. The results show that the hybrid model yields more accurate forecasts than ARIMAX and NN in every scenario. Hence, the hybrid model is the best of the three for all four scenarios, which cover data with trend, both homogeneous and heterogeneous seasonal patterns, calendar variation, and both linear and nonlinear noise. This result is also in line with the third result of the M3 competition, namely that hybrid models on average provide more accurate forecasts than individual models.

The results of real data
In this paper, two real datasets are used as case studies, namely the monthly currency inflow and outflow at Bank Indonesia in West Java Province for the period 2004:M1 to 2016:M12. The last 12 observations, covering 2016, are used as testing data. Figure 2 shows the training data as time series plots. These plots show the presence of calendar variation due to the annual Eid celebration. Moreover, Figure 2a shows that the largest inflow each year occurs one month before or during the Eid holiday month, depending on the week in which Eid falls. Similarly, Figure 2b shows  , where * is transformation data by Box-Cox transformation with = 0.2, = 1,2, … , ; 1, and 2, are the second and third trend respectively, , dummy variables for seasonal pattern, , is dummy variable for -th calendar variation effect, and is a white noise process.
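The Box-Cox transformation mentioned above (with parameter 0.2) and its inverse, which is needed to return forecasts to the original scale, can be sketched as follows. This uses the standard definition $(y^\lambda - 1)/\lambda$, with $\log y$ when $\lambda = 0$; some software reports the plain power $y^\lambda$ instead, so which form the paper used is an assumption.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive data")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Inverse transform, used to return forecasts to the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


lam = 0.2
z = box_cox(250.0, lam)
print(round(inv_box_cox(z, lam), 6))  # 250.0
```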

The results of Neural Networks model
The inputs are selected using the partial autocorrelation function (PACF) of the data, as proposed by Crone and Kourentzes [10]. For illustration, the architecture of the best NN model for the outflow data is shown in Figure 3. Its inputs are lags 1, 2, 3, 5, 12, 13, and 14 of the outflow series, and it has 2 neurons in the hidden layer.
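PACF-based input selection can be sketched as follows: compute the sample PACF with the Durbin-Levinson recursion and keep the lags whose PACF exceeds the approximate 95% bound of ±1.96/√n. The significance rule and function names are assumptions for illustration; this is one common variant, not necessarily the exact procedure of Crone and Kourentzes [10].

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    c0 = sum(di * di for di in d) / n
    return [sum(d[t] * d[t - k] for t in range(k, n)) / n / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample PACF at lags 1..max_lag via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi_prev = []  # holds phi_{k-1,1}, ..., phi_{k-1,k-1}
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def select_lags(x, max_lag):
    """Lags whose |PACF| exceeds the approximate 95% bound 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    return [k for k, p in enumerate(pacf(x, max_lag), start=1) if abs(p) > bound]


series = [1.0, -1.0] * 50  # strongly alternating toy series
print(select_lags(series, 2))  # [1]
```

For monthly data, `max_lag` would typically be set to cover at least one or two seasonal cycles so that lags such as 12 and 13 can be detected.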

The results of Hybrid model
The hybrid ARIMAX-NN model combines ARIMAX as the linear model with an NN as the nonlinear model. First, the ARIMAX model is fitted to the original data, and its residuals are then modeled by an NN. Finally, the forecasts are calculated by summing the ARIMAX and NN forecasts.
As an illustration, the NN inputs of this hybrid model for the outflow data are lags 1, 2, 12, 13, 14, 25, and 26 of the ARIMAX residuals. Trying from 1 to 15 neurons in the hidden layer shows that 3 neurons give the best model for forecasting the outflow data. Thus, an NN with 7 lagged inputs and 3 hidden neurons is the best NN in this hybrid model, with the following nonlinear equation: Finally, the hybrid forecasts are calculated by summing the forecast of the linear model in equation (8), $\hat{L}_t$, and that of the nonlinear model in equation (9), $\hat{N}_t$, i.e.

$$\hat{Y}_t = \hat{L}_t + \hat{N}_t.$$
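Choosing the number of hidden neurons by trying 1 to 15 can be sketched as a small grid search. The code trains a one-hidden-layer FFNN (logistic sigmoid hidden units and a linear output, both with bias terms) by full-batch gradient descent and keeps the size with the lowest RMSE on a held-out validation set. The training algorithm, learning rate, epoch count, and validation-based criterion are all assumptions; the text does not specify them.

```python
import math
import random


def _sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def train_ffnn(X, y, m, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer FFNN by full-batch gradient descent on squared error.

    Hidden units: logistic sigmoid with bias; output: linear with bias.
    Returns a predict(x) function.
    """
    rng = random.Random(seed)
    p, n = len(X[0]), len(X)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(p + 1)] for _ in range(m)]  # w[j][p] is a bias
    v = [rng.uniform(-0.5, 0.5) for _ in range(m + 1)]  # v[m] is the output bias

    def hidden(x):
        return [_sigmoid(sum(wj[i] * x[i] for i in range(p)) + wj[p]) for wj in w]

    def predict(x):
        h = hidden(x)
        return sum(v[j] * h[j] for j in range(m)) + v[m]

    for _ in range(epochs):
        gw = [[0.0] * (p + 1) for _ in range(m)]
        gv = [0.0] * (m + 1)
        for x, target in zip(X, y):
            h = hidden(x)
            err = sum(v[j] * h[j] for j in range(m)) + v[m] - target
            for j in range(m):
                gv[j] += err * h[j]
                delta = err * v[j] * h[j] * (1.0 - h[j])  # backprop through sigmoid
                for i in range(p):
                    gw[j][i] += delta * x[i]
                gw[j][p] += delta
            gv[m] += err
        for j in range(m):
            for i in range(p + 1):
                w[j][i] -= lr * gw[j][i] / n
        for j in range(m + 1):
            v[j] -= lr * gv[j] / n
    return predict


def select_hidden_size(X_tr, y_tr, X_val, y_val, max_m=15, **train_kw):
    """Try m = 1..max_m hidden neurons; return the m with the lowest validation RMSE."""
    scores = {}
    for m in range(1, max_m + 1):
        predict = train_ffnn(X_tr, y_tr, m, **train_kw)
        sq = [(predict(x) - t) ** 2 for x, t in zip(X_val, y_val)]
        scores[m] = math.sqrt(sum(sq) / len(sq))
    return min(scores, key=scores.get), scores


# Toy example: learn y = 0.5 * x from 20 points, validating on 5 more.
xs = [[i / 20] for i in range(25)]
ys = [0.5 * x[0] for x in xs]
best_m, scores = select_hidden_size(xs[:20], ys[:20], xs[20:], ys[20:], max_m=3, epochs=300)
print(best_m, scores)
```

A library implementation with a stronger optimizer would normally be used in practice; the pure-Python loop only illustrates the selection logic.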

The evaluation of forecast accuracy
The root mean squared error (RMSE) on the testing data is used to evaluate the performance of the three forecasting models applied to the inflow and outflow data. The RMSE values on the testing data are listed in Table 2, and Figure 4 plots the forecasts from each method against the actual values in the testing period. Table 2 and Figure 4 show that the hybrid model generates more accurate forecasts than ARIMAX and NN for the outflow data. This supports the third result of the M3 competition, namely that hybrid models on average provide more accurate forecasts than individual models. By contrast, ARIMAX yields more accurate forecasts than NN and the hybrid model for the inflow data. This result is in line with the first result of the M3 competition, namely that complex methods do not necessarily yield better forecasts than simpler ones.
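Computing RMSE on the testing data and picking the model with the lowest value amounts to the minimal sketch below. The `best_model` helper, the dictionary layout, and the toy numbers are hypothetical and are not the paper's results.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error over the testing period."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def best_model(actual, forecasts):
    """Name of the model with the lowest test RMSE; forecasts maps name -> list."""
    return min(forecasts, key=lambda name: rmse(actual, forecasts[name]))


# Toy numbers for illustration only.
actual = [3.0, 5.0, 4.0]
forecasts = {"model_a": [3.0, 4.0, 4.0], "model_b": [2.0, 6.0, 5.0], "model_c": [3.0, 5.0, 4.5]}
print(best_model(actual, forecasts))  # model_c
```

When the models are fitted on Box-Cox transformed data, forecasts would be back-transformed to the original scale before computing RMSE so that the comparison is in the data's own units.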

Conclusion and future work
In general, forecasting time series that have trend, seasonal, and calendar variation patterns requires special treatment. This paper showed that ARIMAX, NN, and hybrid models can be used to forecast this kind of time series. The most important part of applying these models is determining the appropriate inputs for each model, particularly the inputs that capture the calendar variation effects. The simulation study showed that the hybrid model yields better forecasts than the ARIMAX and NN models. However, on the real data the hybrid model yielded better forecasts in only one of the two case studies, namely the outflow data. The results for the simulated and outflow data are in line with the conclusions of many previous studies, particularly the third result of the M3 competition [1] and Prayoga et al. [5], which state that hybrid models on average give better forecasts than individual models. In addition, the results for the second case study, the inflow data, showed that the simpler model yielded better forecasts than the complex models. This result is also in line with the first result of the M3 competition [1] and with Suhartono et al. [3], who concluded that complex models do not necessarily give better forecasts than simpler ones. Further research is needed to validate these results and to compare their forecast accuracy with other, more advanced intelligent methods.