Crude Oil Price Forecasting Using Hybrid Support Vector Machine

Crude oil prices strongly affect the world economy. However, they fluctuate heavily, which makes it difficult for investors to make decisions. Forecasting is therefore one way to minimize the risks that arise from uncertainty about the future. This paper applies Support Vector Machine (SVM), Artificial Neural Network (ANN) and a proposed hybrid model named Empirical Mode Decomposition-Support Vector Machine (EMD-SVM) to forecast the crude oil price. After the forecasts are obtained, a performance evaluation is carried out to show which method forecasts the crude oil price better. The results show that the performance of crude oil price forecasting can be significantly improved by using the proposed hybrid EMD-SVM model, indicating that the hybrid model outperforms the individual forecasting models.


Introduction
West Texas Intermediate (WTI) crude oil is one of the main oil markets in the world. Oil is the most actively traded commodity in the world, contributing more than 10% of total world trade [1]. Moreover, oil has always played a prominent role and fuels almost two-thirds of world energy consumption. This makes oil price movements considerably impactful on the world economy, especially in January 2004. However, owing to the volatility of crude oil price time series, current forecasting methods cannot capture all the factors that influence future crude oil prices. Thus, a method that improves the forecast accuracy of the crude oil price should be considered. Such a forecasting approach can assist business practitioners and researchers in dealing with the uncertainty of future crude oil prices. In short, the aims of this study are, first, to forecast the monthly West Texas Intermediate crude oil price using Artificial Neural Network (ANN) and Support Vector Machine (SVM); second, to build a hybrid model of Empirical Mode Decomposition (EMD) and SVM in order to improve the forecasting accuracy for the crude oil price; and third, to evaluate the performance of the individual and the proposed hybrid approaches.

Materials and Methods
The crude oil price is a standard price that is officially controlled and agreed upon globally. For this study, West Texas Intermediate (WTI) crude oil spot price data from the Energy Information Administration (EIA) website were chosen for experimental purposes. The data are monthly and cover the period from January 1986 to December 2015, giving a total of 360 observations. The crude oil price is measured in US dollars per barrel.

Data Preprocessing
Differences in units produce differences in data magnitude, which tend to affect prediction accuracy in the long run. Normalization is therefore an important step. In this study, the crude oil price data are transformed into the range [0, 1] using the min-max normalization of Equation (2.1) [2].
$$X_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (2.1)$$

where $X_i$ is the normalized value, $x_i$ is the actual value, $x_{\min}$ is the minimum actual value, $x_{\max}$ is the maximum actual value, and $i = 1, \ldots, n$.

Support Vector Machine (SVM)
Support Vector Machines (SVM) were proposed by Vapnik [3]. SVM was initially intended for classification (SVC), but it can also perform regression and time series forecasting; this technique is called support vector regression (SVR). The fundamental principle of SVM is structural risk minimization, which reduces an upper bound on the generalization error instead of the empirical error [4]. The generalization error measures how accurately an algorithm predicts outcome values for previously unseen data, and it can be reduced by avoiding overfitting. It is commonly characterized by the difference between the error on the underlying joint probability distribution (the expected error) and the error on the training set (the empirical error).

SVM uses a linear function to solve the regression problem. When the regression is nonlinear, the data are mapped into a high-dimensional feature space through a nonlinear mapping $\phi$ so that a linear regression can be performed in that space:

$$f(x) = w \cdot \phi(x) + b$$

The coefficients $w$ and $b$ are estimated by minimizing the regularized risk

$$R(C) = C \sum_{i=1}^{N} L_\varepsilon(d_i, y_i) + \frac{1}{2}\|w\|^2 \qquad (2.5)$$

$$L_\varepsilon(d_i, y_i) = \begin{cases} |d_i - y_i| - \varepsilon, & |d_i - y_i| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases}$$

where $C$ and $\varepsilon$ are user-set parameters, $L_\varepsilon(d_i, y_i)$ is the $\varepsilon$-insensitive loss function, $d_i$ is the actual value (here, the crude oil price) in the $i$-th period, $y_i = f(x_i)$ is the corresponding predicted value, $b$ is the bias term, and $\frac{1}{2}\|w\|^2$ measures the flatness of the function. The parameter $C$ controls the trade-off between the empirical risk and the flatness of the model.

Equation (2.5) is transformed into the following constrained form by introducing positive slack variables $\xi_i$ and $\xi_i^*$, which represent the distance from the actual values to the corresponding boundary values of the $\varepsilon$-tube:

$$\text{Minimize} \quad R(w, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*)$$

$$\text{subject to} \quad d_i - w \cdot \phi(x_i) - b \le \varepsilon + \xi_i, \quad w \cdot \phi(x_i) + b - d_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, N \qquad (2.7)$$

Finally, Lagrange multipliers are introduced and the dual function is maximized:

$$\text{Maximize} \quad R(a_i, a_i^*) = \sum_{i=1}^{N} d_i (a_i - a_i^*) - \varepsilon \sum_{i=1}^{N} (a_i + a_i^*) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (a_i - a_i^*)(a_j - a_j^*) K(x_i, x_j)$$

$$\text{subject to} \quad \sum_{i=1}^{N} (a_i - a_i^*) = 0, \quad 0 \le a_i \le C, \quad 0 \le a_i^* \le C, \quad i = 1, \ldots, N$$

where $a_i$ and $a_i^*$ are the Lagrange multipliers. The Lagrange multipliers satisfy $a_i \cdot a_i^* = 0$, and the regression function becomes

$$f(x) = \sum_{i=1}^{N} (a_i - a_i^*) K(x, x_i) + b$$

where $K(x, x_i)$ is the kernel function. The value of the kernel equals the inner product of the two vectors $x_i$ and $x_j$ in the feature space, that is, $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. Any function that fulfils Mercer's condition can be used as the kernel function. The Gaussian kernel function is applied in this study. It was used to capture the nonlinear behavior of the data set because Gaussian kernels tend to perform well under general smoothness assumptions. The Gaussian kernel function is

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$$

where $\|x_i - x_j\|^2$ is the squared Euclidean distance and $\sigma$ is the Gaussian parameter.
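As a minimal sketch of the building blocks of support vector regression described in this section, the Python functions below compute a Gaussian kernel of the usual form exp(-||x_i - x_j||^2 / (2σ^2)), the ε-insensitive loss, and a regression function built from given multiplier differences. The function names and all numeric values are illustrative assumptions, and the sketch does not solve the dual problem itself; in practice the multipliers would come from an SVM solver.

```python
import math

def gaussian_kernel(x_i, x_j, sigma):
    """Gaussian kernel: exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def eps_insensitive_loss(d, y, eps):
    """Zero inside the epsilon-tube, |d - y| - eps outside it."""
    return max(0.0, abs(d - y) - eps)

def svr_predict(x, support_vectors, coeffs, b, sigma):
    """f(x) = sum_i (a_i - a_i*) K(x, x_i) + b.

    coeffs[i] holds the difference a_i - a_i* for support vector i.
    """
    return b + sum(c * gaussian_kernel(x, sv, sigma)
                   for c, sv in zip(coeffs, support_vectors))

# Illustrative values only (not taken from the study).
support_vectors = [[0.2, 0.3], [0.6, 0.5]]
coeffs = [0.8, -0.4]
print(svr_predict([0.4, 0.4], support_vectors, coeffs, b=0.1, sigma=0.5))
```

Errors smaller than ε contribute nothing to the loss, which is why only the observations on or outside the ε-tube end up with non-zero multipliers.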

Artificial Neural Network (ANN)
An Artificial Neural Network is a computational method inspired by studies of the brain and nervous systems in living organisms. It is an input-output mathematical model that imitates the operation of the human brain by acquiring knowledge through a learning procedure. A multilayer perceptron (MLP) is a feedforward network with one or more layers of nodes between the input and output nodes [5]. Basically, it is made up of three layers. First, the data are presented to the network via the input layer. Next, the data are processed in the hidden layer, and finally they reach the output layer, where the results are produced. In a feedforward network, the input nodes perform no computation; they serve only to distribute the inputs into the network. Information passes one way, from the input layer through the hidden layer to the output layer.

This study focuses on a three-layer feedforward backpropagation ANN. A three-layer MLP with n input nodes, q hidden nodes and one output node can be expressed as

$$y_t = f\left( \sum_{j=1}^{q} w_j \, g\left( \sum_{i=1}^{n} w_{ij} \, x_{t-i} \right) \right)$$

where $y_t$ is the output of the network, $x_{t-i}$ is the input of the network, $w_{ij}$ are the connection weights between the input and hidden layer nodes, $w_j$ are the connection weights between the hidden and output layer nodes, and $g(\cdot)$ and $f(\cdot)$ are activation functions, with $g(\cdot)$ the sigmoid function and $f(\cdot)$ the linear function:

$$g(x) = \mathrm{logsig}(x) = [1 + \exp(-x)]^{-1} \qquad (2.14)$$

Sigmoid functions are widely used as transfer functions in financial applications because their threshold behaviour describes most financial and economic series [6]. The backpropagation algorithm was chosen because it is the most commonly applied learning algorithm among neural network paradigms [7]. Backpropagation transmits the errors backwards across the network and permits adaptation of the hidden units. The algorithm minimizes the global error by steepest (gradient) descent: at each iteration, the network weights and biases are adjusted in the direction of the negative gradient of the error function.
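The forward pass and one steepest-descent update of such a three-layer network can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: bias terms are omitted for brevity, the error function is taken to be half the squared error for a single observation, and the weight values, learning rate and function names are hypothetical rather than those used in the study.

```python
import math

def logsig(x):
    """Sigmoid transfer function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(lags, w_in, w_out):
    """Three-layer MLP: sigmoid hidden layer, linear output.

    lags  : inputs [x_{t-1}, ..., x_{t-n}]
    w_in  : w_in[j][i] is the weight from input i to hidden node j
    w_out : w_out[j] is the weight from hidden node j to the output
    """
    hidden = [logsig(sum(w * x for w, x in zip(row, lags))) for row in w_in]
    return sum(w * h for w, h in zip(w_out, hidden))

def backprop_step(lags, target, w_in, w_out, lr=0.1):
    """One gradient-descent update for E = 0.5 * (y - target)^2."""
    hidden = [logsig(sum(w * x for w, x in zip(row, lags))) for row in w_in]
    y = sum(w * h for w, h in zip(w_out, hidden))
    err = y - target  # dE/dy
    # Output weights: dE/dw_j = err * h_j
    new_w_out = [w - lr * err * h for w, h in zip(w_out, hidden)]
    # Hidden weights: dE/dw_ij = err * w_j * h_j * (1 - h_j) * x_i
    new_w_in = [[w - lr * err * w_o * h * (1.0 - h) * x
                 for w, x in zip(row, lags)]
                for row, w_o, h in zip(w_in, w_out, hidden)]
    return new_w_in, new_w_out

# Illustrative values only: two lagged inputs, two hidden nodes.
lags, target = [0.5, 0.2], 1.0
w_in, w_out = [[0.1, -0.2], [0.3, 0.4]], [0.5, -0.5]
print(mlp_forward(lags, w_in, w_out))
w_in, w_out = backprop_step(lags, target, w_in, w_out)
print(mlp_forward(lags, w_in, w_out))
```

Repeating the update over all training observations for many iterations is what drives the global error down.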

Model input determination
Model input selection is an important part of constructing data-driven models such as ANN and SVM, because the inputs provide the elementary information about the system being modelled. In much time series forecasting research it is unclear how the model inputs were selected, since many papers do not describe the input determination methodology well. There are several ways to select model inputs. The easiest to apply is to use the autocorrelation function (ACF) and partial autocorrelation function (PACF) to identify the appropriate input variables [8]. The second method, proposed by Tang and Fishwick, is based on the ARIMA model: the number of model inputs is the number of autoregressive (AR) components in the Box-Jenkins autoregressive moving average model [9]. The third method is a stepwise method for determining the inputs of ANN models [10]. The method applied in this paper is trial and error, choosing the inputs that give the minimum test errors in the ANN and SVM modelling [11]. As a reference, Sharda and Patil heuristically proposed 12 inputs for monthly data and four for quarterly data [12]. Thus, the appropriate lags are chosen using a trial-and-error approach over the inputs x t −1, x t −2, ..., x t −p, where p is 2, 4, ..., 12. This gives the number of inputs (I) as 2, 4, 6, 8, 10 and 12.
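To make the input construction concrete, the sketch below scales a series to [0, 1] by min-max normalization, as described in the Data Preprocessing section, and then builds the lagged input vectors (x t −1, ..., x t −p) for each candidate p. The price values are made up for illustration and the function names are hypothetical; the sketch assumes the series is not constant, so the minimum and maximum differ.

```python
def min_max_normalize(series):
    """Scale a series to [0, 1]; assumes max(series) > min(series)."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

def make_lagged_inputs(series, p):
    """Build (inputs, targets) for a model with p lagged inputs.

    inputs[k]  = [x_{t-1}, x_{t-2}, ..., x_{t-p}]
    targets[k] = x_t, for t = p, ..., len(series) - 1
    """
    inputs, targets = [], []
    for t in range(p, len(series)):
        inputs.append([series[t - i] for i in range(1, p + 1)])
        targets.append(series[t])
    return inputs, targets

# Made-up monthly prices (US dollars per barrel), for illustration only.
prices = [22.9, 15.5, 12.6, 12.8, 15.4, 13.4, 11.6, 15.1,
          14.9, 14.9, 15.2, 16.1, 18.7, 17.7, 18.3, 18.7]
scaled = min_max_normalize(prices)

# Trial-and-error candidates for the number of inputs, as in the text.
for p in (2, 4, 6, 8, 10, 12):
    X, y = make_lagged_inputs(scaled, p)
    print(p, len(X))
```

Each candidate p yields len(series) - p training pairs, so larger p leaves fewer observations for fitting; the trial-and-error search then keeps the p with the smallest test error.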

Empirical mode decomposition
The empirical mode decomposition (EMD) method was first proposed by Huang et al. [13]. EMD is a signal analysis technique that can deal with nonlinear and nonstationary data. Its key principle, which forms part of the Hilbert-Huang transform (HHT), is to decompose the original time series into a sum of oscillatory functions called intrinsic mode functions (IMFs). The IMFs preserve the behaviour of the original signal at different time scales. The EMD algorithm is implemented using an R software package. An IMF has to fulfil the following two criteria:
(a) At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
(b) Over the whole data set, the number of extrema and the number of zero-crossings are equal or differ by at most one.
The decomposition of a time series y(t) proceeds as follows:
1) Identify all the local extrema of y(t), comprising the local maxima and local minima.
2) Construct the lower envelope y l (t) and the upper envelope y u (t) of y(t).
3) Calculate the first mean value µ1(t), that is, µ1(t) = [y l (t) + y u (t)] / 2.
4) Compute the difference between the original time series y(t) and the mean time series µ1(t), that is, q 1 (t) = y(t) - µ1(t), which is the candidate for the first IMF.
5) Assess whether q 1 (t) satisfies the two criteria of an IMF. If not, repeat steps 1 to 4 with q 1 (t) in place of y(t) until the first IMF is found.
6) After obtaining the first IMF, subtract it from the series and repeat the steps on the remainder to determine the second IMF, and so on, until the final series e y (t), the residual component, satisfies the termination criterion that stops the decomposition.
Summing all the IMF components and the residual component recovers the original time series y(t).

For a time series x t , t = 1, 2, …, N, one can predict h steps ahead; for example, forecasting x t+h with h = 1 means making a prediction one step ahead. In the hybrid model, the historical time series x t , t = 1, 2, …, N, is first decomposed into IMF components q i (t), i = 1, 2, …, n, and a residual component e y (t) using the EMD extraction process. Then, an SVM model is used to model each of the IMFs and the residual component, and a prediction is made for each of them.
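The sifting procedure and the decompose-then-forecast idea can be sketched as follows. This is a deliberately simplified illustration, not the R implementation used in the study: the envelopes are piecewise linear rather than spline-based, a fixed number of sifting passes stands in for the IMF check of step 5, and the component forecasts are combined by simple summation, which is one common choice and an assumption here. The `forecaster` argument is a hypothetical placeholder for a fitted SVM; a naive last-value rule is used in the example.

```python
import math

def local_extrema(y):
    """Indices of strict interior local maxima and minima."""
    inner = range(1, len(y) - 1)
    maxima = [i for i in inner if y[i - 1] < y[i] > y[i + 1]]
    minima = [i for i in inner if y[i - 1] > y[i] < y[i + 1]]
    return maxima, minima

def envelope(y, idx):
    """Piecewise-linear envelope through (i, y[i]) for i in idx,
    held constant before the first and after the last extremum."""
    env = []
    for t in range(len(y)):
        if t <= idx[0]:
            env.append(y[idx[0]])
        elif t >= idx[-1]:
            env.append(y[idx[-1]])
        else:
            k = max(j for j in range(len(idx)) if idx[j] <= t)
            a, b = idx[k], idx[k + 1]
            env.append(y[a] + (y[b] - y[a]) * (t - a) / (b - a))
    return env

def sift(y, n_sift=10):
    """Steps 1 to 4, repeated a fixed number of times (simplification)."""
    h = list(y)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - (u + l) / 2.0 for v, u, l in zip(h, upper, lower)]
    return h

def emd(y, max_imfs=5):
    """Return (imfs, residual) with sum(imfs) + residual equal to y."""
    imfs, residual = [], list(y)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if not maxima or not minima:
            break  # residual has too few extrema to sift further
        imf = sift(residual)
        imfs.append(imf)
        residual = [r - q for r, q in zip(residual, imf)]
    return imfs, residual

def hybrid_forecast(y, forecaster):
    """Forecast each component separately, then sum the forecasts."""
    imfs, residual = emd(y)
    return sum(forecaster(c) for c in imfs + [residual])

# Synthetic series for illustration only.
series = [math.sin(0.3 * t) + 0.5 * math.sin(1.7 * t) + 0.01 * t
          for t in range(60)]
imfs, residual = emd(series)
print(len(imfs), hybrid_forecast(series, lambda c: c[-1]))
```

Because the residual is defined as whatever remains after the IMFs are subtracted, the components always add back up to the original series, whatever envelope method is used.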

Evaluation of Performance Forecasts
Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) were used to check the accuracy of prediction, as these criteria are good indicators and are among the most commonly used tools for judging the quality of a model's predictions. According to Legates & McCabe [14], at least one measure of absolute error should be included for each model in a performance evaluation. In addition, RMSE is sensitive to large errors because the errors are squared, which makes it a useful indicator in performance evaluation.

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}, \qquad \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$$

where $y_t$ is the actual value for period t, $\hat{y}_t$ is the forecast value for period t, and n is the number of observations.

Results from the Augmented Dickey-Fuller test and the Anderson-Darling test indicate that the time series is nonstationary and nonlinear. Comparing overall forecasting performance, SVM performs better than ANN, although ANN is quite competitive with SVM. Hence, SVM is the better of the two methods for forecasting the crude oil price. The hybrid EMD-SVM model performs as expected: its results are better than those of the individual ANN and SVM models. The hybrid model therefore gives more accurate predictions of the crude oil price and can be used in place of the older forecasting methods. More accurate forecasts may in turn help efforts to stabilize the economy and the inflation rate.
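The three error measures used in this section can be computed with a few lines of Python. This is a minimal sketch with hypothetical function names and made-up values; MAPE is expressed in percent and assumes that no actual value is zero.

```python
import math

def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Square Error."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values non-zero)."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)

# Made-up actual and forecast prices, for illustration only.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

Here both absolute errors are 10, but the percentage errors differ (10% and 5%), which shows why MAPE and the absolute measures can rank forecasts differently.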

Conclusion and discussion
The crude oil price time series is often highly non-stationary and non-linear, so traditional statistical models may give poor prediction performance. To improve the performance of crude oil price forecasting, a hybrid model based on a combination of EMD and SVM was proposed to predict the monthly crude oil price. The data are transformed into the range [0, 1] before being introduced into the model. In the proposed model, the EMD extraction process is first used to decompose the series into more stable IMFs, each of which is predicted by a separate SVM model. The prediction results for these IMFs are then combined to obtain the final predicted result. The results show that the performance of crude oil price forecasting can be significantly improved by using the proposed hybrid EMD-SVM model. To assess the performance of the EMD-SVM model, the performance measures MAPE, MAE and RMSE were applied. The EMD-SVM model outperforms the individual ANN and SVM methods. Overall, the results indicate that the hybrid method can increase forecast accuracy and that it is a promising methodology for complex series that are highly non-stationary and non-linear.