VECM and Bayesian VECM for Overparameterization Problem

Overparameterization occurs when the number of observations used in a study is smaller than the number of estimated parameters. It can weaken forecasting ability because the fitted model is not suitable. The problem often arises in complex models such as the Vector Error Correction Model (VECM). This study discusses the VECM and the Bayesian VECM (BVECM) with the aim of analyzing the relationship between macroeconomic variables in Indonesia. First, the parameters of the VECM are estimated with Maximum Likelihood Estimation. Second, they are estimated with a Bayesian approach (BVECM). The data are six Indonesian macroeconomic variables observed from the first quarter of 2010 to the fourth quarter of 2019: GDP, the money supply, the exchange rate of the rupiah to the US dollar, exports, imports and interest rates. The number of observations is smaller than the number of estimated parameters, which causes an overparameterization problem. According to the literature, the Bayesian method can avoid overparameterization problems that cannot be overcome by Maximum Likelihood Estimation. The models obtained in this study are the VECM(3) and the BVECM(3). In the VECM analysis, the residuals do not meet the model diagnostic assumptions, whereas the diagnostics of the BVECM indicate that the model is suitable. This conclusion is consistent with the statement that the Bayesian method can solve the overparameterization problem.


Introduction
In economics, multivariate time series analysis is widely used, and one such method is the Vector Error Correction Model (VECM), or restricted VAR model. The VECM analyzes the long-run and short-run relationships of economic variables whose data are not stationary [1]. The variables in a VECM have a long-run relationship, called cointegration. The most commonly used parameter estimation methods for the VECM are Least Squares and Maximum Likelihood Estimation. These methods have the advantage of being easier to apply when the number of observations is large. A problem that often occurs in VAR and VECM models is that there are too many parameters to estimate and the number of observations is smaller than the number of estimated parameters (over-parameterization), so that forecasting ability is weak because the model is not suitable [2]. Classical estimation methods cannot solve the over-parameterization problem. Therefore, the study in [3] uses Bayesian parameter estimation in the VAR model to avoid over-parameterization in economic data. The Bayesian approach to cointegration is called the Bayesian Vector Error Correction Model (BVECM) [4].
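To make the over-parameterization condition concrete, the following minimal sketch counts the coefficients of a VECM with K variables and p-1 lagged differences. The counting convention (an unrestricted K x K long-run matrix, no deterministic terms, a symmetric residual covariance matrix) and the function name are illustrative assumptions, not the accounting used in this study. The inputs are the six variables, the VECM(3) order and the 2010-2019 quarterly span stated in the abstract.

```python
def vecm_parameter_count(k: int, lagged_diffs: int) -> dict:
    """Count free parameters of an unrestricted VECM (illustrative convention)."""
    long_run = k * k                   # Pi: K x K error-correction matrix
    short_run = k * k * lagged_diffs   # Gamma_1 .. Gamma_{p-1}, each K x K
    covariance = k * (k + 1) // 2      # symmetric residual covariance matrix
    return {
        "long_run": long_run,
        "short_run": short_run,
        "covariance": covariance,
        "total": long_run + short_run + covariance,
    }


# Six variables, VECM(3), quarterly data for 10 years (2010Q1 to 2019Q4).
counts = vecm_parameter_count(k=6, lagged_diffs=3)
n_quarters = 10 * 4

print(counts)
print("observations per series:", n_quarters)
print("more parameters than observations:", counts["total"] > n_quarters)
```

Under these assumptions the count is 36 + 108 + 21 = 165 parameters against 40 quarterly observations per series.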

Method
The general form of the VECM(p-1), where p is the lag of the endogenous variables and the cointegration rank is at most r, is given in [5]. The parameter estimation method most often used for the VECM is Maximum Likelihood Estimation (MLE). MLE estimates parameters from the distribution of the observed data together with the distributional assumptions placed on those data. It provides a general method that, under certain conditions on the random sample, yields a consistent estimator [6]. The weakness of MLE is that the data must satisfy the assumption of multivariate normality and the sample size should be in the range of 100 to 300 for the resulting parameter estimates to be good (unbiased) [7]. Mathematically, the VECM parameter estimates under MLE are obtained as in [5]: the likelihood function of the VECM is written for the sample, its logarithm gives the log-likelihood function, and equation (5) is then differentiated with respect to Π and Γ.

An alternative to classical parameter estimation such as Maximum Likelihood Estimation is the Bayesian method. The difference between the two approaches is that the Bayesian method treats the parameters as random variables. The prior distribution must therefore be determined first. The researcher can choose it according to prior beliefs, but it must be useful as initial information for estimating the VECM parameters. The posterior distribution combines the information in the prior distribution and in the observed data [8]: it is proportional to the product of the likelihood function and the prior distribution. The likelihood function of the VECM follows equation (5), and the joint posterior distribution of the VECM parameters is obtained by multiplying it by the prior distribution.

The steps of the VECM and Bayesian VECM analysis are as follows:
1. Test stationarity with a unit root test (the Augmented Dickey-Fuller test) [9]. If the data are not stationary, differencing is applied.
2. Once the data are stationary, determine the lag length (p) using information criteria. A commonly used criterion is the Akaike Information Criterion (AIC) [10]; the selected lag length is the one with the minimum AIC value.
3. Test for cointegration using the Johansen approach. A VECM is built if the cointegration rank (r) is greater than zero [11].

4. Estimate the parameters of the Vector Error Correction Model with Maximum Likelihood Estimation (MLE). The significance of the VECM parameters is tested using p-values.
5. Estimate the parameters of the Vector Error Correction Model with a Bayesian approach, namely through Gibbs Sampler simulation. The Gibbs Sampler algorithm uses the conditional posterior distributions. Briefly, the steps of the Gibbs Sampler algorithm [4] are:
   1. Set the initial value for each parameter.
   2. Generate a new value of each parameter in turn from its conditional posterior distribution, given the current values of the other parameters.
   3. Once the parameter values for iteration t have been generated, store them as the set of values used for generation in iteration (t + 1) of the algorithm.
   4. Perform steps 1 to 3 for as many iterations as desired.
   The average of the simulated values for each element of the parameter is taken as the parameter estimator.

Convergence of the simulation results is examined using the trace plot, the MC error and the autocorrelation. The convergence check determines whether the generated observations match the posterior distribution. It can be done with trace plots and density plots of the generated observations [8]. If convergence has not been reached, the number of iterations needs to be increased.
6. Test the significance of the BVECM parameters using the credible interval, with the 2.5% percentile as the lower limit and the 97.5% percentile as the upper limit [8].
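For orientation, the model, the Gaussian log-likelihood and the posterior relation referred to in this section are commonly written as below. This is the standard textbook notation and is an assumption here: the symbols K (number of variables), T (sample size), y_t, u_t, Σ_u, α and β, the absence of deterministic terms, and the layout of the formulas are not confirmed by this paper and need not match its equation (5) term by term.

```latex
% VECM with p-1 lagged differences (assumed notation, no deterministic terms)
\Delta y_t = \Pi\, y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i\, \Delta y_{t-i} + u_t,
\qquad \Pi = \alpha \beta^{\top}, \qquad u_t \sim N(0, \Sigma_u)

% Gaussian log-likelihood for T observations on K variables,
% with \Gamma = (\Gamma_1, \dots, \Gamma_{p-1})
\ln L(Y \mid \Pi, \Gamma, \Sigma_u)
  = -\frac{KT}{2}\ln(2\pi) - \frac{T}{2}\ln\lvert \Sigma_u \rvert
    - \frac{1}{2}\sum_{t=1}^{T} u_t^{\top} \Sigma_u^{-1} u_t

% Posterior is proportional to likelihood times prior
p(\Pi, \Gamma, \Sigma_u \mid Y) \propto L(Y \mid \Pi, \Gamma, \Sigma_u)\; p(\Pi, \Gamma, \Sigma_u)
```

In the log-likelihood, u_t stands for the residual implied by the first line, so maximizing over Π and Γ amounts to differentiating this expression with respect to those matrices.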

Stationarity Test
The stationarity of the data can be tested using the Augmented Dickey-Fuller (ADF) method. The output for each variable is shown in Table 1. For the data in levels, Table 1 shows that the money supply, the exchange rate of the rupiah to the US dollar, exports, imports, interest rates and GDP all have p-values greater than α (0.1), so the six macroeconomic variables are not stationary. For the differenced data, all variables have p-values less than α (0.1), so it can be concluded that the six macroeconomic variables are stationary in first differences.

Cointegration Test
The optimum lag of the VECM was selected using the Akaike Information Criterion (AIC), and the optimum lag is 4. The cointegration test, carried out with the Johansen test, is shown in Table 2. Based on Table 2, the trace statistic for the hypothesis test of rank 4 is 10.85. This value is less than the critical values at α = 10%, 5% and 1%, so the decision is to accept H0. Thus, the results of the cointegration test indicate that there are at most 4 cointegrating equations.

Parameter Estimation of VECM
Based on the preceding analysis, the model of the relationship between the macroeconomic variables in this study is the VECM(3). The parameters are estimated using Maximum Likelihood Estimation. The results are summarized in Table 3, which shows the estimated parameters and their p-values. After the estimated parameters of each model are obtained, all estimators need to be transformed back using an exponential transformation before the parameter estimates are tested. Based on Table 3, the variables that significantly affect GDP are GDP in the previous one to three quarters, the money supply two quarters earlier, the exchange rate of the rupiah to the US dollar one quarter earlier, exports two quarters earlier and imports in the previous one to three quarters. In turn, the macroeconomic variables that affect GDP are themselves influenced by other variables.

Parameter Estimation of Bayesian VECM
Based on the preceding analysis, the model of the relationship between the macroeconomic variables in this study is the BVECM(3). The parameters are estimated using Gibbs Sampler simulation with 500 iterations to obtain convergent results. The results are summarized in Table 4, which shows the average parameter values and the credible intervals. The next step in the BVECM approach is to check the convergence of the estimated parameters. Convergence can be assessed from the trace plot and the density plot of the generated observations. One of the trace plots of the model is shown in Figure 1. The trace plot does not form a regular pattern over the 500 iterations, so the random sample can be said to have converged and the iterations can be stopped. Next, consider the posterior distributions of the BVECM parameters. The posterior distribution of each estimated parameter can be seen from its kernel density [8]. Some of the resulting posterior distributions are shown in Figure 2. The posterior distributions of several parameters have the shape of a normal distribution, which indicates that the model parameters have converged. Based on the convergence checks with the trace plot and the density plot, it can be concluded that the model has met the convergence criteria.
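The mechanics of a 500-iteration Gibbs run with simple convergence summaries can be sketched on a much smaller problem. The example below is a Gibbs sampler for a linear regression with conjugate normal and inverse-gamma priors, not the BVECM conditional posteriors of [4]; the priors, the function names, the batch-means MC error and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np


def gibbs_linear_regression(y, X, n_iter=500, tau2=100.0, a0=2.0, b0=1.0, seed=0):
    """Gibbs sampler for y = X b + e, e ~ N(0, s2), with priors
    b ~ N(0, tau2 * I) and s2 ~ Inverse-Gamma(a0, b0)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    xtx, xty = X.T @ X, X.T @ y
    b, s2 = np.zeros(k), 1.0                       # initial values
    draws_b, draws_s2 = np.empty((n_iter, k)), np.empty(n_iter)
    for t in range(n_iter):
        # Draw b from its conditional posterior given s2.
        cov = np.linalg.inv(xtx / s2 + np.eye(k) / tau2)
        b = rng.multivariate_normal(cov @ (xty / s2), cov)
        # Draw s2 from its conditional posterior given b.
        resid = y - X @ b
        s2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + resid @ resid / 2.0))
        # Store the draws; they condition the next iteration.
        draws_b[t], draws_s2[t] = b, s2
    return draws_b, draws_s2


def lag1_autocorrelation(chain):
    """Lag-1 sample autocorrelation of one chain."""
    c = chain - chain.mean()
    return float(c[1:] @ c[:-1] / (c @ c))


def mc_error(chain, n_batches=10):
    """Batch-means Monte Carlo standard error of the chain mean."""
    means = np.array([part.mean() for part in np.array_split(chain, n_batches)])
    return float(means.std(ddof=1) / np.sqrt(n_batches))


# Simulated placeholder data with known coefficients (1.0, -0.5).
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=40)

draws_b, draws_s2 = gibbs_linear_regression(y, X, n_iter=500)
posterior_mean = draws_b.mean(axis=0)               # point estimates
autocorr = [lag1_autocorrelation(draws_b[:, j]) for j in range(2)]
errors = [mc_error(draws_b[:, j]) for j in range(2)]

print("posterior means:", posterior_mean)
print("lag-1 autocorrelations:", autocorr)
print("MC errors:", errors)
```

A chain whose trace shows no regular pattern typically also has a lag-1 autocorrelation near zero and an MC error that is small relative to the posterior standard deviation, so these numbers complement the visual check of the trace plot.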
After the parameter estimators of each model in Table 4 are obtained, all estimators need to be transformed with an exponential transformation so that they can be interpreted. Table 4 reports the mean of each parameter estimator and its credible interval, with the 2.5% percentile as the lower limit and the 97.5% percentile as the upper limit. The null hypothesis is rejected, meaning that the predictor variable affects the response variable, if the credible interval does not contain zero. Based on Table 4, the variables that significantly affect GDP are GDP in the previous one to three quarters, the money supply in the previous quarter, the exchange rate of the rupiah to the US dollar three quarters earlier, exports two quarters earlier, imports one and three quarters earlier and interest rates three quarters earlier. In turn, the macroeconomic variables that affect GDP are themselves influenced by other variables.
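The credible-interval rule can be sketched in a few lines. The draws below are placeholders standing in for Gibbs output, and the helper names are assumptions; only the 2.5% and 97.5% percentiles and the zero-exclusion criterion come from the text.

```python
import numpy as np


def credible_interval(draws, lower=2.5, upper=97.5):
    """Equal-tailed credible interval from posterior draws (rows are iterations)."""
    draws = np.asarray(draws, dtype=float)
    lo, hi = np.percentile(draws, [lower, upper], axis=0)
    return lo, hi


def is_significant(lo, hi):
    """A parameter is flagged when its interval excludes zero."""
    return (lo > 0) | (hi < 0)


# Placeholder draws for two parameters: one centred away from zero,
# one centred at zero.
rng = np.random.default_rng(7)
draws = np.column_stack([
    rng.normal(loc=2.0, scale=0.5, size=500),
    rng.normal(loc=0.0, scale=0.5, size=500),
])

lo, hi = credible_interval(draws)
flags = is_significant(lo, hi)
print("lower limits:", lo)
print("upper limits:", hi)
print("significant:", flags)
```

The first parameter has an interval entirely above zero and is flagged, while the second has an interval that straddles zero and is not.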

Model Diagnostics
The next step is the model diagnostic test, which tests the adequacy of the model. The diagnostic tests performed on the VECM are normality of the residuals and non-autocorrelation of the residuals. First, residual normality is tested with the Jarque-Bera test, which uses the skewness and kurtosis values [12]. Second, residual non-autocorrelation is tested with the Portmanteau test or the Ljung-Box test.
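The two diagnostics can be sketched in their univariate form as follows. This is a simplification: the study applies multivariate versions to the whole residual vector, whereas the functions below treat one residual series at a time and make no degrees-of-freedom adjustment for estimated parameters. The function names, the simulated series and the closed-form chi-square tail (valid only for even degrees of freedom) are assumptions.

```python
import math
import numpy as np


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def jarque_bera(resid):
    """Univariate Jarque-Bera statistic and p-value (chi-square with 2 df)."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    s2 = (e ** 2).mean()
    skew = (e ** 3).mean() / s2 ** 1.5
    kurt = (e ** 4).mean() / s2 ** 2
    stat = e.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return float(stat), chi2_sf_even_df(float(stat), 2)


def ljung_box(resid, lags=4):
    """Ljung-Box Q statistic and p-value (chi-square with `lags` df, lags even)."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    n, denom = e.size, e @ e
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = (e[k:] @ e[:-k]) / denom       # lag-k sample autocorrelation
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    return float(q), chi2_sf_even_df(float(q), lags)


# Simulated placeholder residuals: a strongly autocorrelated AR(1) series
# and a strongly skewed (exponential) series.
rng = np.random.default_rng(3)
shocks = rng.normal(size=200)
ar1 = np.empty(200)
ar1[0] = shocks[0]
for t in range(1, 200):
    ar1[t] = 0.9 * ar1[t - 1] + shocks[t]
skewed = rng.exponential(size=200)

q_stat, q_p = ljung_box(ar1, lags=4)
jb_stat, jb_p = jarque_bera(skewed)
print("Ljung-Box on AR(1) series:", q_stat, q_p)
print("Jarque-Bera on skewed series:", jb_stat, jb_p)
```

Small p-values lead to rejection of H0, that is, to the conclusion that the residuals are autocorrelated or not normally distributed.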

Based on the portmanteau test for the VECM, the p-value at each lag is less than α = 0.01, so H0 is rejected, which means that the residuals are autocorrelated. Based on the skewness test for the VECM, the p-value is less than α = 0.05, so H0 is rejected, which means that the residuals do not follow a multivariate normal distribution. The kurtosis test gives a p-value greater than α = 0.05, so H0 is accepted for that test. The skewness and kurtosis tests are interrelated: if either one rejects H0, the residuals are concluded not to be multivariate normal. The diagnostic requirements of the model are therefore not fulfilled [12].
If the residuals do not satisfy the multivariate normality assumption and are autocorrelated, the model is not suitable. This unsuitability is caused by over-parameterization. Maximum Likelihood and other classical estimation methods such as Least Squares cannot overcome the over-parameterization problem [2], [3].
Based on the residual normality test for the Bayesian VECM, the p-values of the multivariate skewness and multivariate kurtosis tests are greater than α (0.05), so H0 is accepted. That is, at the 95% confidence level there is sufficient evidence to state that the residuals follow a multivariate normal distribution. Based on the residual autocorrelation test for the Bayesian VECM, the p-value of the Ljung-Box test listed in Table 5 is greater than α (0.05), so H0 is accepted, which means that the residuals have no serial correlation. The two diagnostic tests show that the Bayesian VECM is adequate and can be used for forecasting. According to the model diagnostics, the BVECM(3) is the appropriate model. From the results in Table 4, the BVECM(3) can be formed as follows.