Multivariate generalized autoregressive score model (case study: vegetable oils and crude oil price data)

A multivariate econometric model can be used to forecast volatilities and dynamic correlations between various assets. Volatilities and dynamic correlations forecasting are important applied in risk management, hedging, asset allocation, and asset pricing options. This study uses a multivariate Generalized Autoregressive Score (GAS) model to analyze and forecast volatilities and dynamic correlations of the weekly prices of Crude Palm Oil, coconut oil, soybean oil, and crude oil. The GAS model is a new framework based on a score-driven time series model for updating time-varying parameters. Another multivariate econometric model is the Dynamic Conditional Correlation (DCC) GARCH model as a comparison. The empirical experiments regarding the distribution of return data show that the data can be approximated by the multivariate t-student distribution. The likelihood ratio test in the multivariate GAS model shows that the time-varying parameters in the GAS model are volatility, correlation, and location parameters. The evaluation performance model based on RMSE and MAE show that the multivariate GAS model has better performance in estimating and forecasting volatilities than the DCC GARCH model. Multivariate GAS models have better performance in estimating and forecasting the dynamic correlation on returns of CPO and soybean oil.


Introduction
Vegetable oil is an oil extracted from plants. Vegetable oil is useful as edible oil for food (edible oil) and other than food (non-edible oil). Indonesia is a major producer of world vegetable oil, namely Crude Palm Oil (CPO) and coconut oil. The price of Indonesia's vegetable oil correlates with the other vegetable oil price, such as soybean oil. The interaction between vegetable oil prices is due to the use of substitutions between one type of vegetable oil and another [1]. Research by Arianto et al. in 2010, Alam et al. in 2019, and Priyati and Tyers in 2016 show that vegetable oil prices also correlate with crude oil prices [1] - [3]. The use of vegetable oil as raw material for biodiesel is one of the factors driving the link between vegetable oil prices and crude oil prices.
As the largest CPO supplier country in the world, Indonesia has a great opportunity as a price maker. However, the Indonesian government does not have the power to determine prices both in international and domestic markets [4]. Bursa Malaysia and Rotterdam in the Netherlands still control the reference price of CPO. The Indonesian price for coconut oil is also a reference to the Rotterdam price. Indonesia plays a role as a price taker in the international trade of CPO and coconut oil, thus causing the uncertainty of CPO and coconut oil prices in Indonesia and causes price volatility.
Supply and demand for vegetable oil and crude oil in the global market strongly influence the price. The existence of variations between supply and demand of vegetable oil and crude oil in the market over 3 P i,t-1 = weekly closing price commodity-i at time-t-1 i = 1, 2, 3, and 4 refer to CPO, coconut oil, soybean oil, and crude oil, respectively.

Methodology
The multivariate GAS model The generalized autoregressive score (GAS) model is an observation-driven model based on the score function of the predictive model density at time-t. The basic idea of the multivariate GAS model is to update the time-varying parameters based on the information available in time-t. Let the return data is a N-dimensional vector at time t with conditional distribution as follows: | 1: −1~( ; , ) (1) 1: −1 ≡ ( 1 ′ , 2 ′ , … , −1 ′ ) ′ is contains the past value of up to time t-1. is the vector time-varying parameter and is a static parameter vector. The time-varying parameter vector are driven by the scaled score function of the conditional observation density in equation (1).
For simplicity, this study uses a first-order multivariate GAS model with p = 1 and q = 1 or multivariate GAS (1,1) model. Therefore, the updating mechanism of the time-varying parameter vector f t is given by autoregressive updating equation follows as: , A(θ) and (θ) are matrices of coefficients with proper dimensions, and s t is scaled score function derivative of the density function (1) at the time t with respect to the time-varying parameter vector , as follows: ∇ ≡ ∂ln p( | −1 ; , ) f t is a positive definite scaling matrix known at time t. Via the choice of scaling matrix , the GAS model allows more flexibility in the score function is used for updating [12]. ∇ is the score of (1) evaluated at f t . Scaling matrix that make the GAS model encompasses the well-known classic volatility model such us GARCH as follows: where | −1 is information matrix of . Parameter usually takes value in the set {0, 1 2 , 1} and fixed by the researcher. Creal et al. (2013) mention that the quantity of indicated the direction to update the vector of parameter to +1 , acting as a steepest ascent algorithm for improving the model local fit given the current parameter position. This updating procedure resembles the Newtown-Raphson algorithm.
The parameter vector in (2) has linear specification and not restricted. In practical, the timevarying parameter such as variance, correlation and degree of freedom number is often restricted. The solution of this problem under the GAS model is to use a non-linear link function that map ̃ into and has a linear dynamic specification of (2) [13].
The GAS model has a useful property that the time-varying parameter vector is perfectly predictable and the log-likelihood function can be easily evaluated with the prediction error decomposition. In other words, the time-varying parameter vector can be estimated by the Maximum Likelihood Estimation (MLE) method [14].
The multivariate DCC-GARCH model The DCC GARCH model was developed by Engle in 2002 with the basic idea is decomposed the covariance matrix into a conditional standard deviation matrix and a conditional correlation matrix [15]. The DCC-GARCH models are defined as: r t = μ t +a t (4) = n x 1 return data vector of n assets at time t a t = n x 1 mean-corrected return vector of n assets at time t with E[a t =0 and Cov[a t ]= H t μ t = n x 1 the expected value vector of the conditional r t H t = n x n conditional covariance matrix of a t at time t D t = n x n conditional standard deviation matrix of a t at time t R t = n x n conditional correlation matrix of a t at time t z t = n x 1 vector of iid errors such that E[z t ]=0 and E[z t z ′ ]=I μ t in mean model (4) can be a constant or a time series model such as AR (p), MA (q), or ARMA (p,q). In this study, the average model used is the time series model from the identification results.
The element of the diagonal matrix of conditional standard deviation D t is standard deviations from a univariate variance model such as the GARCH model.
For simplicity, this research uses the univariate GARCH (1,1) model specification, such as: The requirements have to be considered when specifying a form of R t :  Matrix R t has to be definite positive to ensure Matrix H t has to be a definite positive because it is a covariance matrix.  All element of matrix R t has to be equal or less than one by definition. To ensure the requirement of the DCC Model, R t is decomposed into: is the unconditional covariance matrix of the standardized error. Q t * is a diagonal matrix with the square root of diagonal elements of Q t at the diagonal. The parameter a and b are scalars. For simplicity and comparable with the Multivariate GAS (1,1) model, The DCC GARCH model also uses the first-order model with the correlation structure model in (8) and the univariate GARCH (1,1) model in (7).
The method used in estimating the conditional variance matrix is the two-stage Quasi Maximum Likelihood Method [16]. The first step is to estimate the univariate GARCH parameters and the second stage is to estimate the parameters of the correlation structure model.

Evaluation Forecasting Performance
Forecasting performance is evaluated using the smallest RMSE and MAE of the two models. The RMSE and MAE formulas are following Poon (2005) and Chen and Xu (2019) as follow [17] -[18]: ̂ and ρ t is 1-step ahead rolling forecast results of volatility and correlation at period t for each model. and ρ t is volatility and correlation realization as proxies of the actual volatility and correlation. Volatility is a latent variable. Therefore, in the evaluation of volatility forecasting, several measures are often used as a proxy for the actual volatility. This study uses the absolute value of return as a proxy for actual volatility because the absolute value of returns can predict volatility well [18]. Mapa et al. used a rolling correlation with specific periods as a proxy for the actual correlation [19]. Therefore, this study using a rolling correlation of daily price returns of two commodity price returns with a period of 5 weeks as actual correlation.

Empirical result
The data exploration and preliminary analysis The price returns for the four commodities fluctuated throughout the study period. The returns of CPO and coconut oil are of extreme value at several points in time. In 2015, the price of CPO decreased drastically, resulting in high return value and a negative sign. The decline in CPO prices during this period was caused by reduced in demand from exporting countries, an oversupply of soybean oil that is a substitute commodity, the ineffective biodiesel program in Indonesia and Malaysia, and the decline in crude oil prices.
The returns range differs between periods. The difference returns range is seen in the return of crude oil prices ( Figure 1). In 2010-2014, the returns range of crude oil was -8 to 7. In the following period, the returns range was between -15 to 15. The difference in the returns range is an initial indication of the non-constant variance of error.
The scatter plot between the returns from the four commodities along with the Pearson correlation is presented in Figure 2. Based on this figure, there is a significant relationship between the six pairs of returns. The strongest correlation is between CPO and soybean oil, which is 0.5. The high correlation between them is due to the use of the two commodities which are mutually substituting. The weakest correlation is between the returns of palm oil and crude oil.
Early identification of dynamic correlation can be done using a rolling correlation throughout the study period. A 15-week rolling correlation shows a fluctuating correlation that tends to be positive. The 15-week rolling correlation between returns of CPO and soybean oil tends to be more consistently positive than other pairs. The existence of significant correlations justifies the reasons for conducting analyzes of volatility and correlation simultaneously.  Table 1 shows descriptive statistics of the price returns for the four commodities. The standard deviation is a measure of unconditional volatility. The largest standard deviation is coconut oil return of 4.13, while the other three returns range from 2 to 3.5. That indicates that the return of coconut oil has the highest volatility compared to other commodities.
The skewness of CPO and crude oil is negative, while the skewness of coconut oil and crude oil is positive. The kurtosis value of the return of the four commodities is less than 3, which means that the  The univariate normality test on returns using the Jarque-Bera test rejects the null hypothesis on returns of CPO, coconut oil, and crude oil. It means that the three returns do not follow the normal distribution, while returns of soybean oil follow the normal distribution at the 1% significant level. The stationarity test using the Dickey-Fuller test shows that the four returns series are stationary at the 1% significant level. However, all four returns series exists an autocorrelation based on the Ljung-box test at lag 20. The ARCH test on error returns from the mean rejects the null hypothesis that there is no ARCH effect. It shows that the error has a non-constant variance. The results of these tests confirm the use of the multivariate GAS model and the DCC-GARCH model for forecasting the volatility of all returns series.
The multivariate normality test on returns using the Doornik Hansen test shows the rejection of the null hypothesis that the distributions of returns are symmetric [20]. An empirical experiment will be conducted to find the distribution of the data return approach. The basic idea of this empirical illustration is the distribution of the sample approachable with the distribution of the majority of the subsample [11].
First, we divide the data into sub-samples with a fixed size and sequential in chronological order. This experiment divided the sample into sub-samples with sizes from 20 to 400 with the addition of 10. Second, we perform the Doornik-Hansen test for each sub-sample. Third, we calculate the rejection rate from the test results. The rejection rate is the ratio of the number of sub-samples that reject the hypothesis to the total number of subsamples at a fixed sub-sample size. The high rejection rate means that majority of the sub-samples did not follow the multivariate normal distribution.
The results show that the smaller the sub-sample size, the smaller the rejection rate ( Figure 3). It means that the smaller the sample size, the more subsamples that follow the multivariate normal distribution. Based on the results of these experiments, the distribution of the returns can be approximated by the multivariate student's t distribution. Therefore the parameter estimation of the multivariate GAS model and the DCC-GARCH model is based on the multivariate student's t distribution error.
The estimation result of multivariate GAS model The multivariate GAS model is a data-driven model for modelling time-varying parameters. The mechanism updating of the time-varying parameter is using the autoregressive function. There are four types of time-varying parameters for the multivariate GAS model based on the multivariate student's t distribution error (the multivariate t-GAS model), namely location (µ), volatility ( ), correlation ( ), and shape ( ). The selection of time-varying parameters in the model from the four types of parameters in the multivariate t-GAS used the likelihood ratio test. This study performs three stages of the likelihood ratio test for testing the correlation, location, and shape as the time-varying parameter. The basic multivariate GAS model is a model with volatility as a time-varying parameter.
Note: Model 1: t-GAS model with volatility as time-varying parameters Model 2: t-GAS model with volatility and correlation as time-varying parameters Model 3: t-GAS model with volatility, correlation, and shape as time-varying parameters Model 4: t-GAS model with volatility, correlation, shape, and locations as time-varying parameters The testing of correlation parameters as time-varying parameters results in rejection of the null hypothesis at the 10% significant level ( Table 2). Multivariate t-GAS models with volatility and correlation as time-varying parameters are better than models with only volatility as time-varying parameters. The location parameter is also significant as a time-varying parameter with a p-value that is close to zero. Meanwhile, the shape parameter is not statistically significant as the time-varying parameter. Based on the likelihood ratio test, the best multivariate t-GAS model is a model with volatility, correlation, and location as time-varying parameters.
The coefficients of the autoregressive model for updating the time-varying parameters are κ, A, and B. The matrix κ is a diagonal matrix that contains constants that correspond to time-varying parameters  The estimation result of the multivariate t-GAS model is presented in the close to 1 and a high level of significance. However, the multivariate t-GAS model cannot capture a strong and consistent correlation between returns. It is supported by a small value of and a lower level of significance.

The estimation result of multivariate DCC-GARCH model
Estimating parameters of the DCC GARCH model (1,1) was carried out in two stages. The first stage estimates the variance model parameters with univariate GARCH on each return series. The second stage estimates the correlation model on the DCC GARCH (1,1). The mean model needs to be identified precisely so that the variance model accurately models the fluctuation of returns to the average model.
The model means were identified using Autocorrelation Function (ACF), Partial Autocorrelation Function (PACF), Extended Autocorrelation Function (EACF), and auto ARIMA in the R program. The results are presented in Table 4. The best mean model is selected based on the smallest AIC from several tentative models. The best mean model for returns of CPO is the AR (1) model, while the other returns are the MA (1) model.  All coefficients are significant at the 5% level except for the constant on the mean model of each commodity and the coefficient 31 on the variance model for returns of soybean oil. It shows that returns of soybean oil have a short term memory while other returns have a long term memory. All coefficients on the variance model are significant at the 1% level so that this model can capture the existence of volatility.
The estimator for 1 and 1 in the correlation structure equation is 0.0341 and 0.6403, respectively, which are significant at the 5 % level. It shows a statistically significant correlation between returns. The existence of a dynamic correlation between the price returns of these four commodities is not strong enough because the sum of 1 + 1 is 0.6743 and is not close to 1. The fit performance on in-of-sample data and the forecasting performance on out-of-sample data The evaluation of the best model for estimating and forecasting volatility and the dynamic correlation between the multivariate t-GAS and t-DCC GARCH models use the RMSE and MAE values. The best model has the smallest RMSE and MAE values. The RMSE and MAE values in estimating volatility and the correlation in the in-of-samples data are presented in Table 7. Evaluation for forecasting volatility and correlation in the out-of-sample data are presented in Table 8.
The multivariate t-GAS model produces RMSE and MAE values that are smaller than the GARCH t-DCC model in estimating and forecasting volatility in the four returns. Based on these results, the multivariate t-GAS model has better performance in estimation and forecasting volatility. It is in line with the results of the research by Chen and Xu in 2019 that stated that the multivariate t-GAS model has good performance in forecasting volatility [11].
The estimate and forecast values of soybean oil return volatility produce the smallest RMSE and MAE. The largest RMSE and MAE occur in estimating and forecasting the volatility of coconut oil. It shows that the returns of coconut oil have a complex movement and make the error of the two models in forecasting is still large. The value of the volatility forecast for the four returns series based on the multivariate t-GAS model tends to be lower than the t-DCC GARCH model. Similar results also occur in the volatility estimation of the in-of-sample data. However, in general, the forecast results for the two models are quite close. The difference in forecast results between the two models is high when the return is far from the other values. The t-DCC GARCH model tends to be reactive to extreme returns. The forecast results based on this model are not robust on extreme values. It is consistent with the statement of Creal et al. in 2012 that the multivariate t-GAS model produces robust forecast values [12]. The comparison of the results of estimating volatility between the multivariate t-GAS and t-DCC GARCH models is presented in Figure 3, while the comparison of the volatility forecasting results is presented in Figure 4.  Table 7. The comparison of RMSE and MAE for estimating and forecasting volatility on in-ofsample data and out-of-sample data

Volatility
The multivariate t-GAS model produces RMSE and MAE values that are smaller than the GARCH t-DCC model in estimating and forecasting volatility in the four returns. The multivariate t-GAS model has better performance than the t-DCC GARCH model in estimating and forecasting volatility in the four returns series. However, the multivariate t-GAS model performed well only in estimating and forecasting the correlation between returns for CPO and soybean oil. The rolling correlation between CPO and soybean oil tends to be more consistent than the other pairs supporting these results.
Modelling of volatility and dynamic correlation to hedge assets, risk management, and asset allocation is better to use a multivariate GAS model. This model is more robust in extreme returns, is more practical in its application, and has a good performance in volatility forecasting. For further research, it is better to include exogenous variables and asymmetric elements in the GAS model, considering the increasingly complex data structure.