The autoregressive integrated vector model approach for covid-19 data in Indonesia and Singapore

Almost every country in the world has been affected by coronavirus disease 2019 (covid-19), which started from pneumonia cases in Wuhan, China, in December 2019. The virus has spread to many regions, including Southeast Asia, where Singapore and Indonesia are the two countries with the highest number of cases. Simultaneous modeling of cases in both countries is important as a source of information on the development of covid-19 cases. The Vector Autoregressive Integrated (VARI) model is a multivariate time series model that can be used to model non-stationary time series data from several locations simultaneously. This study aims to analyze the development of covid-19 cases and to build a VARI model of covid-19 cases in Indonesia and Singapore. The data are the daily confirmed covid-19 cases from March 16th until April 19th, 2020. Plots and statistical tests of the series from both countries show non-stationary series with trend and fluctuation. Based on optimum lag identification using the minimum corrected Akaike Information Criterion (AICc), the parsimonious model obtained is VARI(1,1), which satisfies the multivariate normality, white noise, and homogeneity assumptions. The results show that covid-19 cases in Indonesia and Singapore have a strong positive correlation. However, the cases in each country were influenced only by previous cases in that same country. The accuracy measure shows that the model is good enough for forecasting covid-19 cases in both countries.


Introduction
At present, the world is being shaken by an outbreak that began with mysterious cases of pneumonia in Wuhan, China, reported in December 2019 [1]. The pneumonia cases spread and infected people outside Wuhan and then throughout China. On February 12th, 2020, the World Health Organization (WHO) officially announced "Coronavirus Disease (covid-19)" as the name of the new disease, which is caused by the virus "Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)" [2]. The virus spread to other countries, and WHO declared covid-19 a pandemic on March 11th, 2020. As of April 19th, 2020, there were 2,324,731 confirmed covid-19 cases worldwide, spread across 215 countries, with 160,434 deaths [14]. Covid-19 also spread in the Southeast Asia region, with Indonesia and Singapore as the two countries with the highest number of cases. On April 19th, 2020, there were 6,575 and 6,588 confirmed cases in Indonesia and Singapore, respectively. Indonesia recorded 582 deaths, while Singapore recorded 11 deaths [14].
The impact of the covid-19 pandemic has been felt across countries, communities, and economies. In most areas, governments face a difficult choice between protecting public safety and reviving the economy [3]. Lockdown and physical distancing policies intended to prevent the spread of the virus harm the economies of developing countries and are prone to triggering poverty and inequality [4].
Since the first case of covid-19 was recorded, the number of cases in Indonesia and Singapore increased through the end of April 2020, and it is uncertain when the covid-19 pandemic will end. Research on covid-19 in Indonesia, particularly research using time series data, is still limited. Previous studies of covid-19 in Indonesia showed that the trend of cases over time followed an exponential function [5]. Other studies have found that the geographic proximity of countries affects each other's covid-19 case numbers [6].
An analysis of covid-19 cases is needed to obtain a simultaneous model of cases in Indonesia and Singapore, which can provide information on the development of covid-19 cases in both countries. The VARI model is a multivariate time series model that can be used to model non-stationary time series data from several locations simultaneously [7]. Thus, this study aims to analyze the development of covid-19 cases and to apply the VARI model to covid-19 cases in Indonesia and Singapore.

Data Source
This study used covid-19 data obtained from the Worldometers page (https://www.worldometers.info/coronavirus/). The variable used is the number of daily confirmed cases in Indonesia and Singapore from March 16th until April 19th, 2020. The cases in the two countries are modeled simultaneously using a multivariate time series method.

Stationary
Time series analysis generally requires the series to be stationary. A time series whose properties do not depend on the time at which it is observed is called a stationary time series [8]. According to [9], a stochastic process is stationary if its first and second moments are time-invariant. A stationary process $\{Z_t\}$ has mean $E(Z_t) = \mu$, variance $\mathrm{Var}(Z_t) = E(Z_t - \mu)^2 = \sigma^2$, and covariance $\mathrm{Cov}(Z_t, Z_{t+k}) = E[(Z_t - \mu)(Z_{t+k} - \mu)]$, which depends only on the lag $k$ [10]. Differencing is needed when the series is non-stationary in the mean, and transformations such as the logarithm are used to stabilize the variance.
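The two stabilizing steps mentioned above, a logarithmic transformation followed by first differencing, can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the daily counts are hypothetical and are not the data analyzed in this study.

```python
import math

def log_transform(series):
    """Natural log of each value; requires strictly positive observations."""
    return [math.log(x) for x in series]

def first_difference(series):
    """Return z[t] - z[t-1] for t = 1..n-1 (one observation is lost)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical daily counts, used only to illustrate the two steps.
cases = [17, 38, 55, 82, 60, 81, 64, 65]
log_cases = log_transform(cases)              # stabilizes the variance
diff_log_cases = first_difference(log_cases)  # removes the trend in the mean
```

Note that the differenced series is one observation shorter than the original, which matters when the sample is as short as a few weeks of daily data.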
The stationarity condition for a Vector Autoregressive (VAR) model can be checked simply. A necessary condition for a VAR(1) series to be stationary is that all eigenvalues of its coefficient matrix are less than 1 in absolute value. The condition is also sufficient: if all eigenvalues are less than 1 in absolute value, then the VAR(1) series is stationary [11].
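As an illustration of the eigenvalue condition, the sketch below checks a bivariate VAR(1) coefficient matrix using the closed-form roots of its characteristic polynomial. The two matrices are hypothetical examples, not estimates from this study, and the helper names are chosen here for illustration only.

```python
import cmath

def eigenvalues_2x2(phi):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial
    lambda^2 - trace*lambda + det = 0 (the roots may be complex)."""
    (a, b), (c, d) = phi
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def is_stationary_var1(phi):
    """True when every eigenvalue of phi is less than 1 in absolute value."""
    return all(abs(lam) < 1 for lam in eigenvalues_2x2(phi))

# Hypothetical coefficient matrices.
phi_stable = [[0.5, 0.1], [0.2, 0.3]]         # both eigenvalues inside the unit circle
phi_nonstationary = [[1.1, 0.0], [0.0, 0.5]]  # one eigenvalue equals 1.1

stable_ok = is_stationary_var1(phi_stable)
nonstationary_ok = is_stationary_var1(phi_nonstationary)
```

For models with more than two series, a general eigenvalue routine from a numerical library would replace the closed-form formula.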

Order Identification
The autocorrelation function (ACF) and partial autocorrelation function (PACF) are tools for identifying the order of univariate AR(p) and MA(q) models. For a multivariate time series model, the order is identified using the matrix autocorrelation function (MACF) and the matrix partial autocorrelation function (MPACF). The sample correlation matrix function is defined as $\hat{\rho}(k) = [\hat{\rho}_{ij}(k)]$ [10], where $\hat{\rho}_{ij}(k)$ is the sample cross-correlation between the $i$-th and $j$-th component series at lag $k$:
$$\hat{\rho}_{ij}(k) = \frac{\sum_{t=1}^{n-k} (Z_{i,t} - \bar{Z}_i)(Z_{j,t+k} - \bar{Z}_j)}{\left[\sum_{t=1}^{n} (Z_{i,t} - \bar{Z}_i)^2 \sum_{t=1}^{n} (Z_{j,t} - \bar{Z}_j)^2\right]^{1/2}}$$
with $\bar{Z}_i$ and $\bar{Z}_j$ the sample means of the corresponding component series.
The partial autoregression matrix at lag $s$, denoted by $\mathcal{P}(s)$, is the last matrix coefficient when the data are fitted to a vector autoregressive process of order $s$ [10]. For a vector AR(p) model, $\mathcal{P}(s) = \Phi_p$ for $s = p$ and $\mathcal{P}(s) = 0$ for $s > p$, so the MPACF cuts off after lag $p$.
The MACF and MPACF matrices become complex as the dimension of the vector increases. A simple method for reading them is given by [10], which places the signs (+), (-), and (.) in the $(i, j)$ position of the matrix. The sign (+) denotes a value greater than 2 times the estimated standard error, (-) denotes a value less than -2 times the estimated standard error, and (.) denotes a value within 2 estimated standard errors. The optimum lag is reached when the matrix at lag $s$ is dominated by the sign (.).
An alternative way to determine the optimum lag uses the corrected Akaike Information Criterion (AICc): the optimum lag is the one with the lowest AICc value. The AICc, given in [12], is computed from the number of observations, the dimension of the autoregressive vector, the autoregressive order, and the estimated variance-covariance matrix $\hat{\Sigma}$.
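The sign notation for the sample correlation matrices can be sketched as below. The sketch assumes that the standard error of each sample cross-correlation is approximated by $1/\sqrt{n}$, so the threshold is $2/\sqrt{n}$; this approximation, the function names, and the two demonstration series are assumptions made for illustration and are not taken from this study.

```python
import math

def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t] and y[t + lag], for lag >= 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag))
    den = math.sqrt(sum((v - mean_x) ** 2 for v in x)
                    * sum((v - mean_y) ** 2 for v in y))
    return num / den

def sign_matrix(series, lag):
    """Matrix of '+', '-', '.' symbols for the correlation matrix at `lag`.

    Assumption: each entry has standard error of about 1/sqrt(n).
    """
    n = len(series[0])
    limit = 2 / math.sqrt(n)
    rows = []
    for x in series:
        row = []
        for y in series:
            r = cross_correlation(x, y, lag)
            row.append('+' if r > limit else '-' if r < -limit else '.')
        rows.append(row)
    return rows

# Hypothetical alternating series and its mirror image.
x_demo = [1, -1] * 10
y_demo = [-v for v in x_demo]
signs_lag0 = sign_matrix([x_demo, y_demo], 0)
signs_lag1 = sign_matrix([x_demo, y_demo], 1)
```

In this toy example every entry is significant at both lags; for identification, the lag of interest is the one beyond which the matrices consist mostly of (.) entries.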

Vector Autoregressive Integrated (VARI)
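The VARI model is a development of the Autoregressive Integrated (ARI) model for non-stationary data, in which each variable is influenced by its own values and by the values of the other variables in the previous period [13]. If first differencing is applied to produce stationary data, the VAR(1) model becomes the VARI(1,1) model, which can be written as $\Delta Z_t = \Phi_1 \Delta Z_{t-1} + a_t$, where $\Delta Z_t = Z_t - Z_{t-1}$ is the differenced vector series, $\Phi_1$ is the autoregressive coefficient matrix, and $a_t$ is the white-noise error vector.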
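To make the structure of a VARI(1,1) model concrete, the sketch below assumes the bivariate form in which the differenced vector at time t equals a 2x2 coefficient matrix times the differenced vector at time t-1 plus an error term, with no intercept. It estimates the coefficient matrix by ordinary least squares and produces a one-step forecast in levels. The function names, the no-intercept form, and the noise-free demonstration data are assumptions for illustration, not the estimation code used in this study.

```python
def difference(levels):
    """First-difference a list of [z1, z2] observations."""
    return [[c[0] - p[0], c[1] - p[1]] for p, c in zip(levels, levels[1:])]

def fit_vari11(levels):
    """OLS estimate of Phi in dZ[t] = Phi dZ[t-1] + a[t] (no intercept)."""
    d = difference(levels)
    x, y = d[:-1], d[1:]  # regressors dZ[t-1] and responses dZ[t]
    # Cross-product sums S_xx = sum x x' and S_yx = sum y x'.
    sxx = [[sum(u[i] * u[j] for u in x) for j in range(2)] for i in range(2)]
    syx = [[sum(v[i] * u[j] for u, v in zip(x, y)) for j in range(2)]
           for i in range(2)]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    # Phi = S_yx S_xx^{-1}
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def forecast_next(levels, phi):
    """One-step forecast in levels: Z[n+1] = Z[n] + Phi (Z[n] - Z[n-1])."""
    last, prev = levels[-1], levels[-2]
    dz = [last[0] - prev[0], last[1] - prev[1]]
    return [last[i] + phi[i][0] * dz[0] + phi[i][1] * dz[1] for i in range(2)]

# Noise-free hypothetical data generated from a known coefficient matrix,
# so the OLS estimate should recover it up to rounding error.
true_phi = [[0.5, 0.1], [0.2, 0.3]]
levels = [[0.0, 0.0], [1.0, 2.0]]
for _ in range(8):
    levels.append(forecast_next(levels, true_phi))
phi_hat = fit_vari11(levels)
```

With real data the error term is not zero, so the estimate differs from the true matrix and the significance of each coefficient has to be tested separately.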

Mean Absolute Percentage Error (MAPE)
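Model selection criteria are usually based on goodness of fit, such as the residual mean square. However, if the purpose of the model is to forecast future values, then criteria based on forecast errors can be used as an alternative for model selection [10]. The MAPE can be written as follows:
$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{Z_t - \hat{Z}_t}{Z_t} \right| \times 100\%$$
where $Z_t$ is the actual data, $\hat{Z}_t$ is the forecast data, and $n$ is the number of forecasts.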
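A minimal sketch of the MAPE computation is given below, taking MAPE as the average of the absolute forecast errors divided by the actual values, expressed in percent. The function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100 * total / len(actual)

# Hypothetical actual and forecast values.
actual = [100, 200, 400]
forecast = [110, 180, 400]
error_pct = mape(actual, forecast)  # (10% + 10% + 0%) / 3
```

Because each error is divided by the actual value, the measure is undefined on days with zero cases, which should be kept in mind for daily count data.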

Descriptive Analysis
The first confirmed covid-19 cases in Indonesia were reported on March 2nd, 2020, while Singapore announced its first confirmed case on January 23rd, 2020. The daily confirmed cases in Indonesia and Singapore continued to increase. Figure 1 shows that from March 16th until April 19th, 2020, the confirmed cases in the two countries followed a relatively similar pattern: both series fluctuate and increase. This suggests that an increase or decrease in confirmed covid-19 cases in one country tends to be accompanied by an increase or decrease in the other country.

Figure 1. The confirmed covid-19 cases in Indonesia and Singapore from March 16th until April 19th, 2020.
Figure 1 shows that daily confirmed cases in Indonesia were always higher than in Singapore in the first part of the period, until April 12th, 2020. However, Singapore experienced a spike in daily confirmed cases from April 15th, 2020, with a peak of 942 cases on April 18th, 2020, while the highest daily count in Indonesia was 407 cases on April 17th, 2020. Table 1 shows that the average number of daily confirmed cases in Indonesia was 184.51, higher than in Singapore, where it was 181.77. The standard deviations for both countries are quite high, which indicates that the daily confirmed cases varied considerably. The correlation coefficient between confirmed covid-19 cases in Indonesia and Singapore was 0.7842, which shows a fairly strong positive relationship between the two countries. This correlation suggests that each series may be modeled as a function of the other, and therefore the appropriate model for these data is a multivariate time series model.

Stationary
Figure 1 shows that the series of both countries have an increasing trend and tend to fluctuate, which indicates that the two series are not stationary. Stationarity was tested using the Box-Cox lambda statistic for stationarity in the variance and the Augmented Dickey-Fuller (ADF) test for stationarity in the mean.
The Box-Cox lambda statistics were less than one, which means that the two series were not stationary in the variance, so a logarithmic transformation was applied. Next, the Indonesia and Singapore series were tested using the ADF test. The p-values were 0.5495 and 0.9364, respectively, which were greater than the significance level of $\alpha = 5\%$. This means that the two series were not stationary in the mean, and hence first differencing was carried out. After first differencing of both series, the ADF test gave p-values of 0.0187 and 0.0206, respectively, which were less than the significance level of $\alpha = 5\%$. This means that the two differenced series were stationary in the mean. Figure 2 shows that the two series are stationary after differencing.
IOP Publishing doi:10.1088/1742-6596/1722/1/012057

Identification Model
The time series model was identified through the partial autocorrelation function (PACF), as seen in Figure 3 and Figure 4. The PACF plots of both the Indonesia and Singapore series cut off after lag 1, which indicates that both series follow an AR(1) model. Multivariate time series identification can also be done using MACF and MPACF plots. These plots are quite accurate but less practical and tend to be subjective, and hence the order was also determined by looking at the AICc values.
Figure 6 shows the MACF and MPACF plots of the differenced data. Both the MACF and MPACF show the sign (-) at lag 1, which means that the negative correlation was statistically significant. Meanwhile, the sign (.) dominates all lags after lag 1, which shows that no significant correlation occurs at those lags. It can be concluded that both the MACF and MPACF cut off after lag 1, so more than one candidate model can be formed. Therefore, the AICc values of several orders need to be compared, and the order with the smallest AICc value gives the appropriate VARI model, namely VARI(1,1).
Multivariate stationarity can also be checked from the eigenvalues of the VARI(1,1) model, which were $\lambda_1 = 0.4630$ and $\lambda_2 = 0.4194$. These eigenvalues satisfy the stationarity theorem, which states that if all eigenvalues satisfy $|\lambda| < 1$ then the multivariate time series model is stationary. Hence the VARI(1,1) model for confirmed covid-19 cases in Indonesia and Singapore was stationary.

VARI(1,1) Model
The parameters of the VARI(1,1) model were estimated by the Ordinary Least Squares (OLS) method, as seen in Table 3. The results show that the p-values of the F test for the Indonesia and Singapore series were 0.03022 and 0.03748, respectively, which were less than the significance level of $\alpha = 5\%$. This means that at least one variable has a significant effect in each equation. The partial test results for the two equations show the same result: only the variable's own value in the previous period has a significant effect. This indicates that the covid-19 cases in Indonesia and Singapore were affected only by the previous covid-19 cases in each country. In other words, there was no influence between countries.
The residuals of the VARI(1,1) model were checked with diagnostic tests. The Jarque-Bera test statistic was 2.8163 with a p-value of 0.589, which was greater than the significance level of $\alpha = 5\%$. This means that the residuals follow a multivariate normal distribution. The Portmanteau test statistic was 32.776 with a p-value of 0.6227, which was greater than the significance level of $\alpha = 5\%$. This indicates that the white noise assumption was satisfied. The ARCH-Lagrange Multiplier (LM) test statistic was 69 with a p-value of 0.9511, which was greater than the significance level of $\alpha = 5\%$, indicating that the residual variance was homogeneous.