Ability to forecast the Standardized Precipitation Index in the Vietnamese Mekong Delta for dry season months based on sea surface temperature

Drought has major impacts on agriculture, society, and ecosystems, so early prediction of drought plays an important role in mitigating its impacts. This study investigates the relationship between the Standardized Precipitation Index (SPI) in the Mekong Delta and global sea surface temperature (SST) in order to find potential variables for improving SPI forecast quality. Potential predictors are identified by analysing multiple correlation coefficients. Based on these potential predictors, SPI is predicted by stepwise regression combined with the leave-one-out cross-validation technique. The data used are the rainfall at 15 stations in the Vietnamese Mekong Delta and the global SST from 1977 to 2020. The results show that one pair of variables has the best relationship with SPI in the study area: SST in the Niño 3.4 region and SST in the region from 13 °N to 23 °N and from 116 °E to 126 °E. For forecast terms of 1 to 2 months, using this pair of variables gives better forecast quality than using only predictors from the Niño indices.


Introduction
The Vietnamese Mekong Delta (VMD) has an area of 40,922 km², of which 25,752 km² is used for agricultural production, generating 55% of Vietnam's rice production. VMD has a dry season lasting 6 months, from November to April, with a total rainfall of about 290 mm, accounting for 16% of the annual rainfall. This is the time to grow winter-spring rice and start the summer-autumn rice crop. In this area, the lack of rainfall in the dry season often causes severe drought and significantly affects rice productivity (Lam et al 2019, Minh et al 2022, Lavane et al 2023).
The effects of El Niño and La Niña are evident in the VMD region (Ju and Slingo 1995, Chen et al 2012, Gobin et al 2015, Luong 2021). During El Niño years, in addition to the significant lack of rainfall, saltwater intrusion and high temperatures often occur, which exacerbates the water scarcity in this region. In particular, during the strong El Niño that lasted from November 2014 to May 2016, the region suffered its worst drought and saline intrusion in 90 years. During the last months of the 2016 dry season, of the 12 provinces in this region, 6 declared natural disasters of saline intrusion and 3 declared natural disasters of drought and saline intrusion.
According to the World Meteorological Organization (WMO) and the Global Water Partnership (GWP) (2016), there is no single indicator that is applicable to all drought types, climate regimes and sectors affected by drought. Among drought indices, the Standardized Precipitation Index (SPI) is flexible and simple to calculate, and is recommended for use in meteorological drought monitoring (Hayes et al 2011). This index is calculated from rainfall data for the different time scales at which drought occurs. Based on statistics of articles retrieved from the Scopus database and published from January 2000 to November 2021, SPI is the most used drought indicator (Zaki and Noda 2022). This index is also widely used for meteorological drought monitoring and prediction (Bonaccorso et al 2015, Ma et al 2015, Mo and Lyon 2015).
In the tropics, climate variability is often associated with El Niño and La Niña activity (Webster and Yang 1992, Ropelewski and Halpert 1996, Zhou and Chan 2007). Sea surface temperature (SST) variability causes temperature and precipitation changes in many places (Trenberth et al 1998, Fang and Xie 2020). Therefore, SST data are widely used for climate prediction (Ren et al 2018).
Drought forecasting is based on statistical, dynamical, and hybrid methods. In the statistical approach, drought is forecast through drought indices, using potential predictor variables that include large-scale climate indices and global SST (Rasmusson and Carpenter 1982, Wu and Newell 1998, Garfinkel and Hartmann 2007, Deser et al 2010, Messie and Chavez 2011, Santos et al 2014, Bonaccorso et al 2015, Hao et al 2018, Lin and Qian 2019). Among statistical methods, Multiple Linear Regression (MLR) is commonly applied because the prediction accuracy can be improved through the choice of independent variables (Esha and Imteaz 2018, Kim et al 2020).
In this study, drought is forecast for VMD through SPI, using the MLR method with predictor variables from SST. To improve the forecast quality of this method, it is necessary to add potential variables suited to VMD alongside the traditional predictor variables, namely SST in the Niño regions.

Data
The data used in this study cover 1977 to 2020 and are monthly means. They include: (1) precipitation at 15 weather stations in VMD (figure 1); (2) the global gridded SST field with a spatial resolution of 1° × 1° (latitude by longitude), obtained from the National Oceanic and Atmospheric Administration (NOAA) via https://psl.noaa.gov/data/gridded; and (3) sea surface temperature anomalies (SSTA) in the Niño 3, Niño 3.4, Niño 4 and Niño.west regions, known as Niño indices. SSTA in Niño.west is taken from https://ds.data.jma.go.jp/, and SSTA in the remaining regions is taken from http://www.esrl.noaa.gov.

Methodology
This study aims to identify potential variables and build the SPI prediction equation for VMD. The main steps were: (1) calculating SPI for the monitoring stations; (2) training the SPI forecasting equations based on Niño indices and a time variable, and evaluating forecast accuracy; (3) building a new predictor based on SST data in grid cells; and (4) training the SPI prediction equation with the new predictor variables added, and evaluating forecast accuracy. Figure 2 shows the main steps for building the SPI prediction equations.
In this flow chart, R_{12} is the correlation coefficient between X_1 and X_2, and R_{1.234} is the multiple correlation coefficient between X_1 and the variables X_2, X_3 and X_4. Here X_1 is the SPI at a monitoring station in VMD, X_2 is a Niño index, X_3 is the SSTA in a grid cell, and X_4 is a time variable (t). The variable t is used to take into account the trends of the variables participating in the forecast equation. 'Mask' is the region used to extract the gridded global SSTA data to create a new potential predictor variable. This is the region with the largest value of R_{1.234}, and it is rectangular in shape. Its size and location are determined by gradual testing to ensure the best SPI forecast quality. The methods used in this diagram are as follows.

Calculating SPI
SPI was developed by McKee and colleagues in 1993 and is widely used around the world. SPI is determined from the distribution of precipitation and can be calculated for time scales ranging from a month to several years. In this study the time scale chosen was one month, in order to meet short-duration drought warning requirements.
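Let X be the rainfall corresponding to a given time scale. SPI is then calculated in the following steps:

- Calculating the shape parameter (α) and scale parameter (β) of the gamma distribution:

$$\alpha = \frac{1}{4U}\left(1 + \sqrt{1 + \frac{4U}{3}}\right) \tag{1}$$

$$\beta = \frac{\bar{X}}{\alpha} \tag{2}$$

where $\bar{X}$ is the mean of X and U is the statistical coefficient. Let n be the number of observations; then U is calculated as follows:

$$U = \ln\left(\bar{X}\right) - \frac{1}{n}\sum_{i=1}^{n}\ln\left(X_i\right) \tag{3}$$

- Building the cumulative probability function:

$$G(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\int_{0}^{x} x^{\alpha-1} e^{-x/\beta}\,dx \tag{4}$$

where Γ(α) is the Gamma function. Since the gamma distribution is not defined for x = 0 and the precipitation distribution may contain zeros, the cumulative probability is calculated as follows:

$$H(x) = q + (1 - q)\,G(x) \tag{5}$$

where q is the probability of zero precipitation.

The cumulative probability H(x) is then transformed to the standard normal random variable, which is the value of the SPI:

$$\mathrm{SPI} = \Phi^{-1}\left(H(x)\right) \tag{6}$$

where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. To calculate the SPI, this study uses the gamma function in the Cdflib.f90 program from the Florida State University website: https://people.sc.fsu.edu/~jburkardt/f_src/cdflib/cdflib.html.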
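The SPI steps described above can be illustrated with a minimal Python sketch that uses only the standard library. It is not the Cdflib.f90 routine used in the study: the cumulative gamma probability is evaluated here with a power series for the regularized incomplete gamma function, and the gamma parameters are fitted with Thom's approximation, which is assumed to correspond to the statistical coefficient U. The function names and the clipping of H(x) away from 0 and 1 are illustrative assumptions.

```python
import math
from statistics import NormalDist


def fit_gamma_thom(values):
    """Return (shape, scale) of a gamma distribution fitted to positive values.

    Uses Thom's approximation; assumes the values are not all identical.
    """
    n = len(values)
    mean = sum(values) / n
    u = math.log(mean) - sum(math.log(v) for v in values) / n
    shape = (1.0 + math.sqrt(1.0 + 4.0 * u / 3.0)) / (4.0 * u)
    scale = mean / shape
    return shape, scale


def gamma_cdf(x, shape, scale, tol=1e-12, max_terms=10_000):
    """Cumulative gamma probability G(x) from the power series of the
    regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for k in range(1, max_terms):
        term *= z / (shape + k)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def spi_series(rain):
    """SPI for each rainfall total in `rain` (one value per year for a
    given month or time scale)."""
    positive = [r for r in rain if r > 0.0]
    q = 1.0 - len(positive) / len(rain)        # probability of zero rainfall
    shape, scale = fit_gamma_thom(positive)
    std_normal = NormalDist()
    spi = []
    for r in rain:
        h = q + (1.0 - q) * gamma_cdf(r, shape, scale)
        h = min(max(h, 1e-6), 1.0 - 1e-6)      # assumption: keep inv_cdf finite
        spi.append(std_normal.inv_cdf(h))
    return spi
```

For example, `spi_series([12.0, 0.0, 30.5, 8.2, 55.1, 20.0, 3.4, 41.7])` returns one SPI value per input total, with the zero-rainfall entry receiving the lowest value.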

Finding potential variables for SPI prediction
In this study, SPI is first forecast by the MLR method with predictor variables from the Niño indices. In the next step, to improve the forecast quality, new predictor variables are sought from the SST of the grid cells. The new variable is determined from the value of the multiple correlation coefficient between SPI (X_1), one of the Niño indices (X_2) and the SST of a grid cell (X_3). Since X_1, X_2 and X_3 have trends, a time variable (X_4) is added when calculating the multiple correlation coefficient to account for them.
The multiple correlation coefficient between X_1 and X_2, X_3 and X_4 is calculated as follows:

$$R_{1.234} = \sqrt{1 - \left(1 - r_{12}^{2}\right)\left(1 - r_{13.2}^{2}\right)\left(1 - r_{14.23}^{2}\right)} \tag{8}$$

where r_{12} is the correlation between X_1 and X_2, r_{13.2} is a first-order partial correlation, and r_{14.23} is a second-order partial correlation.

The value of r_{14.23} can be computed by the following formula:

$$r_{14.23} = \frac{r_{14.2} - r_{13.2}\,r_{34.2}}{\sqrt{\left(1 - r_{13.2}^{2}\right)\left(1 - r_{34.2}^{2}\right)}} \tag{9}$$

The first-order partial correlation in equations (8) and (9) between X_i and X_j while controlling for X_k is computed by the following equation:

$$r_{ij.k} = \frac{r_{ij} - r_{ik}\,r_{jk}}{\sqrt{\left(1 - r_{ik}^{2}\right)\left(1 - r_{jk}^{2}\right)}} \tag{10}$$

Based on the multiple correlation coefficient obtained from equation (8) for all grid cells, the region with the highest correlation coefficient is determined and used to extract SST. These extracted values are the new predictor variables. Similar to the Niño regions, the extracted region is rectangular in shape and sized so that the SPI prediction results are the best.
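The chain from pairwise correlations to the multiple correlation coefficient can be sketched in a few lines of Python. The sketch assumes the standard recursive definitions of first-order and second-order partial correlation; the function names are illustrative and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_first_order(r_ij, r_ik, r_jk):
    """Correlation between X_i and X_j while controlling for X_k."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def multiple_correlation(x1, x2, x3, x4):
    """Multiple correlation R_1.234 of x1 (e.g. SPI) with x2 (a Nino index),
    x3 (grid-cell SSTA) and x4 (the time variable)."""
    r12, r13, r14 = pearson(x1, x2), pearson(x1, x3), pearson(x1, x4)
    r23, r24, r34 = pearson(x2, x3), pearson(x2, x4), pearson(x3, x4)
    r13_2 = partial_first_order(r13, r12, r23)
    r14_2 = partial_first_order(r14, r12, r24)
    r34_2 = partial_first_order(r34, r23, r24)
    # Second-order partial correlation between X_1 and X_4 given X_2 and X_3.
    r14_23 = (r14_2 - r13_2 * r34_2) / math.sqrt(
        (1 - r13_2 ** 2) * (1 - r34_2 ** 2)
    )
    r_squared = 1 - (1 - r12 ** 2) * (1 - r13_2 ** 2) * (1 - r14_23 ** 2)
    return math.sqrt(r_squared)
```

Applying `multiple_correlation` to every grid cell in turn, with x3 taken from that cell, gives a map of the kind used to locate the region with the highest coefficient. By construction the result is never smaller than |r_{12}|, so each added variable can only raise the coefficient.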

SPI prediction by MLR method
SPI is predicted by stepwise regression combined with the leave-one-out cross-validation (LOOCV) technique. In this method, potential explanatory variables are added or removed according to a test of statistical significance after each iteration. Let t be the time variable and X_i the predictor variables extracted from the SST; SPI is then determined through the following equation:

$$\widehat{\mathrm{SPI}} = a + b\,t + \sum_{i} c_i X_i \tag{11}$$

In this equation, $\widehat{\mathrm{SPI}}$ is the simulated SPI; a, b and c_i are regression coefficients; and bt is the part related to the trend of SPI and X_i. The variable t is the ordinal number of the month of observation; with data from 1977 to 2020, t runs from 1 to 528. X_i is the SST anomaly in the Niño regions and in the new region. X_i has a time scale of one to several months and is taken one to a few months before SPI.
In order to improve forecasting accuracy, equation (11) is constructed not for each individual month but for the entire dry season. With a 6-month dry season and 44 years of data, the length of the data series included in the analysis is 264.
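A minimal sketch of how leave-one-out cross-validation can be applied to a regression of this form is given below, using only the Python standard library. It covers the cross-validation for a fixed set of predictors and the construction of the time variable for the pooled dry-season months; the stepwise selection itself is not reproduced. All function names are illustrative assumptions, and the least-squares fit is done with the normal equations purely for self-containment.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [a, b, c_1, ...] for rows of [t, X_1, ...]."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(coef, row):
    """Evaluate a + b*t + sum(c_i * X_i) for one row [t, X_1, ...]."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loocv_predictions(rows, y):
    """Predict each sample from a model fitted to all other samples."""
    preds = []
    for k in range(len(y)):
        train_rows = rows[:k] + rows[k + 1:]
        train_y = y[:k] + y[k + 1:]
        preds.append(predict(fit_ols(train_rows, train_y), rows[k]))
    return preds


def dry_season_time_index(first_year=1977, last_year=2020,
                          dry_months=(11, 12, 1, 2, 3, 4)):
    """Values of t (1 = January of first_year) for the dry-season months."""
    return [(year - first_year) * 12 + month
            for year in range(first_year, last_year + 1)
            for month in range(1, 13) if month in dry_months]
```

With the default arguments, `dry_season_time_index()` yields 264 values of t between 1 and 528, matching the series length stated above. The cross-validated predictions can then be compared with the observed SPI to score a candidate predictor set.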
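The forecast quality is evaluated, and the decision to add or remove a variable is made, through four statistical parameters: the correlation coefficient (R), the root mean square error of prediction (RMSE), the Nash-Sutcliffe model efficiency coefficient (NSE), and Willmott's index of agreement (d). These coefficients are determined as follows:

$$R = \frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(S_i - \bar{S}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}\,\sum_{i=1}^{n}\left(S_i - \bar{S}\right)^{2}}} \tag{12}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^{2}} \tag{13}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - S_i\right)^{2}}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}} \tag{14}$$

$$d = 1 - \frac{\sum_{i=1}^{n}\left(O_i - S_i\right)^{2}}{\sum_{i=1}^{n}\left(\left|S_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^{2}} \tag{15}$$

In the above formulas, n is the length of the data series, O_i and S_i are the observed and simulated SPI, and $\bar{O}$ and $\bar{S}$ are the averages of O and S respectively.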
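The four evaluation coefficients can be computed together, as in the following Python sketch. It assumes the usual definitions of the Nash-Sutcliffe efficiency and of Willmott's original index of agreement; the function name and the dictionary return format are illustrative choices.

```python
import math


def forecast_scores(obs, sim):
    """Return R, RMSE, NSE and Willmott's d for observed and simulated SPI."""
    n = len(obs)
    o_bar = sum(obs) / n
    s_bar = sum(sim) / n
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_s = sum((s - s_bar) ** 2 for s in sim)
    potential = sum((abs(s - o_bar) + abs(o - o_bar)) ** 2
                    for o, s in zip(obs, sim))
    return {
        "R": cov / math.sqrt(var_o * var_s),
        "RMSE": math.sqrt(sq_err / n),
        "NSE": 1.0 - sq_err / var_o,
        "d": 1.0 - sq_err / potential,
    }
```

A forecast that is perfectly correlated with the observations but biased still loses skill in NSE and d. For instance, `forecast_scores([1, 2, 3, 4], [2, 3, 4, 5])` gives R = 1 and RMSE = 1, but NSE = 0.2 and d = 0.84, which is why the four measures are read together.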

Correlation coefficients between SPI and SST in Niño regions
The correlation coefficient between the SPI in the study area and the Niño indices is shown in figure 3. In this figure, the correlation coefficient is calculated for each month, the Niño indices are taken a few months before the SPI, and the colored areas mark correlation coefficients that are significant at the 0.05 level. The figure shows that from June to October the correlation coefficient is often lower than the critical value at this significance level. These are also the rainy months, accounting for about 74% of annual rainfall. The figure also shows that the SPI of the less rainy months has a better relationship with the Niño indices than that of the rainy months.
Figure 4, derived from figure 3, shows the maximum lag correlation coefficient between SPI and the Niño indices in the less rainy months (November to May). According to this figure, Niño.west has the best relationship with SPI in the period from December to April, with a correlation coefficient of about 0.5. However, as shown in figure 3, the time lag between the Niño.west index and the SPI is only zero to one month, so this index is of little use for predicting the SPI directly. Also according to figure 4, among the Niño indices in the equatorial Pacific, Niño 3.4 has the best relationship with SPI in VMD.

Multiple correlation coefficient between SPI in dry season months and Niño indices and SST in grid cells
This section aims to find a pair of regions where the SST has a good relationship with the SPI in VMD, to be used as predictors. Because SST and SPI have trends, the multiple correlation coefficient is calculated between SPI and three variables: (1) SSTA in a grid cell, (2) one of the Niño indices, and (3) a time variable. The unit of the time variable is the month, and January of the first year is assigned the value 1. In VMD, since SPI lags SST by 0 to 3 months, lag time is also included in the analysis. To forecast SPI directly from SST, a lag of 1 to 3 months is chosen. In addition, to ensure reliability, the correlation coefficient is calculated not for each month separately but for the whole dry season.

In the case of the Niño.west index, figure 5(a) shows two regions where the multiple correlation coefficient between SPI and the Niño.west index, SSTA in grid cells and a time variable is highest. One is located in the Niño 4 region; the other coincides with the Philippine Area of Responsibility (PAR) for tropical cyclone monitoring. The lowest correlation in figure 5(a) is 0.55, which is the multiple correlation between SPI and the Niño.west index and a time variable. When SST in the grid cells is added as a variable, the highest multiple correlation coefficient is 0.62, an increase of about 0.07. In the case of the Niño 4, Niño 3.4 or Niño 3 index (figures 5(b)-(d)), the results show that the PAR region is where the correlation coefficient is highest. For the PAR region, the multiple correlation coefficient increases in turn when using the Niño.west, Niño 3, Niño 4 and Niño 3.4 indices.
Thus, the multiple correlation coefficient between SPI and SST in the Niño 3.4 and PAR regions is the highest. In particular, within the PAR region there is an area surrounding the Luzon Strait, located between 13 °N and 23 °N and between 116 °E and 126 °E, which has a rather high correlation coefficient. Here the multiple correlation coefficient has a value from 0.67 to 0.69. Compared with the lowest correlation coefficient in this region, 0.55, adding the SST variable from this area increases the correlation coefficient by more than 0.1.

Ability to predict SPI in VMD based on SSTA in Niño regions
This section provides the baseline results needed to compare the accuracy of the SPI forecast before and after the addition of a new predictor variable extracted from the SST in the PAR region.
SPI is predicted using the MLR method. The predictor variables are selected by stepwise regression, with the forecast quality evaluated by the LOOCV technique. The candidate predictors are SSTA in the Niño regions and a time variable. The SSTA variables have time scales from 1 to 4 months, with the value assigned to the last month of the averaging period, and are taken 1 to 3 months before SPI, giving 13 independent variables in total. The predicted variable is the average SPI of VMD or the SPI at the monitoring stations. To increase stability, the forecasting equation is built not for each month but for all months of the dry season. Forecast results are evaluated using RMSE, R, NSE and d.
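The construction of lagged predictors with different time scales can be sketched as follows. The sketch assumes that a time scale of k months means the mean of k consecutive monthly SSTA values, and that the lead is counted from the last month of that window to the SPI month; both are readings of the description above rather than confirmed details. The labels such as "3-2" (time scale 3 months, taken 2 months before SPI) and the function names are illustrative.

```python
def lagged_mean(series, target_index, scale, lead):
    """Mean of `scale` consecutive monthly values whose last month lies
    `lead` months before the month at `target_index`.

    Returns None when the series does not reach far enough back.
    """
    end = target_index - lead           # index of the last month in the window
    start = end - scale + 1             # index of the first month in the window
    if start < 0:
        return None
    window = series[start:end + 1]
    return sum(window) / len(window)


def build_predictors(series, target_index, scales=(1, 2, 3, 4), leads=(1, 2, 3)):
    """Candidate predictors for one SPI month, keyed by 'scale-lead'."""
    return {
        f"{scale}-{lead}": lagged_mean(series, target_index, scale, lead)
        for scale in scales
        for lead in leads
    }
```

For one index, four time scales and three leads give 12 candidates; together with the time variable this would account for the 13 independent variables, although that decomposition is an interpretation rather than something stated explicitly.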

The ability to predict the average SPI of VMD
With the average SPI of VMD as the predicted variable, the selected predictor variables and the coefficients of the linear regression equation are shown in table 1. According to this table, two variables are selected into the forecasting equation, namely Niño 3.4_{3-2} and a time variable t. The notation Niño 3.4_{3-2} denotes SSTA in the Niño 3.4 region with a time scale of 3 months, taken 2 months before SPI. The table shows that the quality of the regression equation is not high, with a correlation coefficient of only 0.52. The standard errors of the regression coefficients show that the contribution of the trend component to the SPI forecast value is low.
The scatter between the actual SPI and the SPI predicted by the LOOCV technique is shown in figure 6. The accuracy of the SPI prediction is as follows: RMSE = 0.61, R = 0.51, NSE = 0.26, and d = 0.52. Compared with the correlation coefficient of the regression equation, the correlation coefficient of the forecast is only slightly lower. The decrease in R is slight because the data series used in the regression equation is long enough. These evaluation coefficients and the slope coefficient in figure 6 also show that the forecast quality is low; in other words, if only SST in the Niño regions is used in the SPI forecast, the reliability will not be high.
To clarify the forecast quality within the dry season, the pairs of SPI observations and forecasts in figure 6 are separated by month and then evaluated. The scatter between the actual and forecast SPI is illustrated in figure 7(a) for April, and the results of the forecast quality evaluation are shown in figure 7(b).
As shown in figure 7(b), the forecast quality gradually increases from November to April. November has a rather low forecast quality, with NSE close to zero and R below the critical value at the 0.05 significance level. The remaining months have RMSE between 0.58 and 0.63, R from 0.47 to 0.61, NSE from 0.23 to 0.37 and d from 0.46 to 0.58. The forecast quality for March and April is roughly similar. Thus, for December to April, the average SPI of this region can be forecast from SSTA in the Niño 3.4 region with a forecast term of 2 months.

Ability to predict SPI at stations
For consistency and to improve forecast quality, the predictor variables are the same as in the VMD average SPI forecast, namely Niño 3.4_{3-2} and t. For the 15 monitoring stations, the coefficients of the forecast equations and the forecast accuracy are summarized statistically in table 2. The values of significance F show that the regression equations are reliable, but compared with table 1, the confidence level is lower. The standard deviation values in this table show that the coefficients of the SPI prediction equations do not differ much between stations. This is because VMD is not very large and has flat terrain.
The coefficients RMSE, R, NSE and d in table 2 show that the forecast quality is significantly reduced compared with the case where the predicted variable is the mean SPI of VMD. RMSE increased from 0.61 to 0.83, R decreased from 0.51 to 0.39, and NSE and d were both quite low. The decrease in forecast quality is due to the higher randomness of precipitation as the spatial scale becomes smaller.
The accuracy of the SPI forecast for each month is shown in figure 8. According to this figure, the forecast quality tends to increase gradually from the beginning to the end of the dry season. In November, most of the stations have R below the critical value at the 0.05 significance level. In this month, the mean values of RMSE and R were 0.9 and 0.2 respectively, indicating a rather low forecast quality. March and April have the highest forecast quality, but even then the average value of R is only around 0.5. The values of NSE and d also show that the prediction accuracy is quite low, with NSE usually less than 0.3 and d usually less than 0.5.

Ability to predict SPI based on SSTA in Niño and PAR regions
The new variable in the SPI prediction equation is the SSTA extracted from the region with the largest R in figure 5(c). The extracted area is rectangular in shape, is not too small, and is chosen to give the highest prediction accuracy. The selected region that meets these conditions extends from 13 °N to 23 °N in latitude and from 116 °E to 126 °E in longitude. Because this region lies within the PAR, for simplicity the SSTA in this region is called SSTA_P.
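Extracting a single SSTA value from the rectangular region can be sketched as below. The sketch assumes the gridded field for one month is a list of rows indexed by latitude, that grid-cell centres are used to decide membership in the box, and that a plain unweighted mean over ocean cells is taken; the study does not state whether area weighting was applied, so this is only one plausible choice. The function name and the use of None for land cells are illustrative.

```python
def box_mean_ssta(grid, lats, lons,
                  lat_min=13.0, lat_max=23.0, lon_min=116.0, lon_max=126.0):
    """Unweighted mean of grid[i][j] over cells whose centre lies in the box.

    `grid[i][j]` is the SSTA at latitude lats[i] and longitude lons[j];
    cells holding None (for example land) are skipped.
    """
    values = [
        grid[i][j]
        for i, lat in enumerate(lats) if lat_min <= lat <= lat_max
        for j, lon in enumerate(lons) if lon_min <= lon <= lon_max
        if grid[i][j] is not None
    ]
    return sum(values) / len(values) if values else None
```

Calling the function once per month yields a monthly series for the region, which can then be averaged and lagged in the same way as the Niño indices.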
Similar to the predictors built from SSTA in the Niño regions, the predictors built from SSTA_P have a time scale of a few months, with the value assigned to the last month, and are taken one to two months before SPI.
Ability to predict the average SPI of VMD when using SSTA in the Niño regions and SSTA_P
When the independent variables are built from SSTA in the Niño regions and SSTA_P, the selected predictors and the coefficients of the regression equation are presented in tables 3 and 4. These tables correspond to forecast terms of 1 and 2 months, respectively. In the predictor variable names of these tables, the subscript has two digits: the first indicates the time scale of the SSTA, and the second indicates the number of months by which it precedes SPI. Forecast terms of 3 months or more are not included in the analysis because adding new variables from SSTA_P produced no significant change in their forecast quality.
The predictors in tables 3 and 4 are related only to SSTA in the Niño 3.4 region and SSTA_P, without a trend component. The absence of the trend component may be because it is already included in the SSTA_P variable. In these equations, the selection of SSTA_{P,1-1} or SSTA_{P,1-2} as the first variable indicates that SSTA_P is the most sensitive. It also shows that SSTA_P is best suited to the 1-month SPI forecast. The standard errors of the regression coefficients also indicate that the role of SSTA_P is greater than that of SSTA in the Niño 3.4 region. As with the forecast equation built with SSTA in the Niño regions, Niño 3.4_{3-2} is also selected in these equations. Thus, the predictor built from SSTA in the Niño 3.4 region with a time scale of 3 months and taken 2 months before SPI gives the best forecast quality. The coefficients representing the SPI forecast accuracy are presented in table 5. Comparing them with the results obtained using only SSTA in the Niño regions shows that the accuracy of the forecast results has increased significantly. For forecast terms of 2 months and 1 month, respectively, RMSE decreased from 0.61 to 0.59 and 0.54, R increased from 0.51 to 0.56 and 0.65, and d increased from 0.52 to 0.58 and 0.65. Figure 9 is a scatter plot of the relationship between the observed and forecast SPI. Figure 10 illustrates the dispersion between the actual and simulated SPI for April. The comparison of the slope coefficient between the actual and forecast SPI values in figures 6 and 9 also shows an increase in forecast quality. Before adding the predictor variable from SSTA_P, this coefficient has a value of 0.27; after adding it, the value is 0.33 and 0.43 for forecast terms of 2 months and 1 month, respectively. However, in other tests where SSTA_P was taken 3 months or more before SPI, the forecast quality did not increase or increased only insignificantly. Therefore, adding SSTA_P significantly increases the SPI forecast quality in VMD only for forecast terms of 1 to 2 months.

Ability to predict SPI for each station when adding predictor variable from SSTA_P
To improve the forecast quality and maintain consistency among stations, the predictor variables in this case are taken from the average SPI prediction equation. As in the case of the average SPI forecast, the forecast term is 1 and 2 months. That is, if the forecast term is 1 month, the predictors are SSTA_{P,1-1} and Niño 3.4_{3-2}, and if the forecast term is 2 months, the predictors are Niño 3.4_{3-2} and SSTA_{P,1-2}. Based on these predictors, the multiple linear regression equations are built and their statistics are presented in table 6.
For a forecast term of 1 month, comparing the forecast accuracy coefficients in tables 2 and 6 shows that adding a new predictor from SSTA_P improves the forecast quality. The correlation coefficient between observed and forecast data increased from 0.39 to 0.50, Willmott's index of agreement increased from 0.43 to 0.53, and the root mean square error decreased from 0.83 to 0.79. For a forecast term of 2 months, compared with before adding the new predictor from SSTA_P, the significance level F of the forecasting equation has increased, but not significantly. Forecast quality also improved, but only slightly: R increased from 0.39 to 0.42 and d increased from 0.43 to 0.47.
The accuracy of the SPI forecast for each month is shown in figure 11. For a 1-month forecast term, this figure and figure 8 show that the forecast quality has increased in all months after adding the new variable. The most obvious increase was in November, December, and April. On average over these 3 months, RMSE decreased by 0.07, R increased by 0.16, and NSE and d increased by 0.13. Figure 11 shows that the forecast quality increases gradually from the beginning to the end of the dry season. April is the month with the best forecast quality, with about 75% of stations having R between 0.58 and 0.7 and RMSE between 0.67 and 0.75. For a 2-month forecast term, the forecast quality increases only from February to April; the change in the remaining months is not obvious. April had the largest increase in forecast quality, but even there R increased by only about 0.09 and RMSE decreased by about 0.04.
In both cases, whether the predicted variable is the average SPI or the SPI of each station, the quality of the 1-month forecast increases significantly if a new predictor from SSTA_P is added. If the forecast period is two months, the forecast quality improves only in the last three months of the dry season. This limited forecast period is related to the lag time of only about one month between SSTA_P and SPI in VMD.

Discussion
Based on the close relationship between SST and drought indices, drought forecasting can be based on time lags and on defining drought conditions (Vicente-Serrano et al 2011, Seager and Hoerling 2014). Droughts often appear some time after the associated SST anomalies (Hamlet and Lettenmaier 1999, Barton and Ramírez 2004, Zambrano et al 2018). In VMD, according to figure 3, the lag time between them is about 2 months. This result is consistent with findings for weather stations located on the west coast of the Pacific Ocean (Nguyen et al 2014). Therefore, it is possible to use Niño indices to give warning of SPI in VMD with a forecast term of about a few months. The results above show that the correlation between the Niño indices and SPI is evident only in the dry season. Puryajati et al (2021) reported similar results. In other words, forecasting SPI from Niño indices is possible only during the dry season. Rainfall in the rainy season in VMD is rather high and stable. The total rainfall in the 6 months of the rainy season in VMD is 1480 mm and is distributed quite evenly by month, so agricultural drought does not occur during these months. During the dry season, agricultural drought often occurs when there is a shortage of rainfall. Therefore, the need for SPI forecasting is much higher during the dry season months than during the rainy season, and Niño indices can meet this requirement.
During the June to October period, SST in Niño regions had a poor relationship with SPI in the VMD possibly because El Niño or La Niña episodes typically begin in the fall and end in the spring.According to statistics of Luong (2021), El Niño and La Niña are less active in the period from April to July.According to figure 3, the SPI in VMD fluctuates about 2 to 3 months later than the Niño 3, Niño 3.4 and Niño 4 indices.These could be the reasons that SPI in VMD from June to September is less affected by El Niño and La Niña.
The analysis results show that the Niño 3.4 index has the best relationship with SPI in VMD, which is consistent with previous research on the use of Niño 3.4 in climate warnings for this area (Nguyen et al 2014). SST in the Niño 3.4 region is used to calculate the Oceanic Niño Index (ONI), an index widely used in El Niño and La Niña monitoring and climate warning (Webb and Magi 2022). The highest correlation coefficients between the Niño 3.4 index and the SPI occur from February to May, with values from 0.5 to 0.6 (figure 4). Thus, if the Niño indices are used alone, the ability to predict SPI in VMD is not high.
Based on the analysis of the multiple correlation coefficient between SPI in dry season months, Niño indices and SST in grid cells, this research has found a new variable for building the SPI forecast equation. To show the effect of adding the new variable, the forecast quality for the average SPI of VMD is analysed below. The change in SPI forecast accuracy before and after the addition of a new predictor from SSTA_P is shown in figure 12. This figure shows the difference in the evaluation coefficients between after and before the new predictor was added, for the dry season months.
In the case of a 1-month forecast term (figure 12(a)), the forecast quality improved significantly in all months of the dry season. The most obvious change is in the first and last months of the dry season. In November, RMSE decreased by 0.11 and R increased by 0.33. In April, RMSE fell by 0.13 and R increased by 0.18. In the remaining months, RMSE decreased by 0.04 to 0.07 and R increased by 0.08 to 0.15. In the case of a 2-month forecast term (figure 12(b)), the forecast quality from November to January did not change markedly. Forecast quality from February to April increased, but not by much, with the most obvious increase in April. During these months RMSE decreased by 0.04 to 0.09, R increased by 0.07 to 0.14, NSE increased by 0.08 to 0.18 and d increased by 0.03 to 0.1.
Thus, in the period from February to April, the SPI forecast quality increased in both the 1-month and 2-month forecast term.These are the driest months, so the improved quality of the SPI forecast is very meaningful in responding to drought.

Conclusion
The analysis of the relationship between SPI in VMD and the Niño indices shows that the correlation coefficient between them is highest in the dry season months and is statistically significant. The Niño.west index has the best relationship with SPI for the period from December to April, and the lag between them is from 0 to 1 month. Among the Niño indices in the equatorial Pacific, Niño 3.4 has the best relationship with SPI in VMD. The highest correlation between them occurs from February to May, and the lag between them is about two months. The magnitude of the correlation coefficients and the time delay between SPI in VMD and the Niño indices show that if the Niño indices are used alone, the predictability of SPI in VMD is not high.
The analysis of multiple correlation coefficients between SPI in VMD and one of the Niño indices, SSTA in grid cells and a time variable found that the region from 13 °N to 23 °N and from 116 °E to 126 °E has the highest correlation coefficient. Here the multiple correlation coefficient has a value from 0.67 to 0.69, and SSTA in this region is chosen as the new predictor.
When the predictor variables are taken only from SSTA in the Niño regions, with the trend taken into account, the MLR prediction results show that only one variable, from the Niño 3.4 region, is selected as the predictor. This variable has a three-month time scale and is taken two months before SPI. The evaluation of forecast accuracy shows that if only this predictor variable is used, the forecast quality is still quite low and SPI cannot be forecast for the first month of the dry season.
When a new predictor from SSTA_P is added, the forecasting equation changes. If the forecast period is one month, the first selected predictor is SSTA_{P,1-1} and the second is Niño 3.4_{3-2}. This shows that, in this case, SST in the PAR region is more sensitive to SPI fluctuations than SST in Niño 3.4. If the forecast period is 2 months, the order of the predictors is reversed and the forecast quality is lower than for the 1-month forecast period.
The comparison between before and after the addition of new predictors from SSTA in the PAR region shows that the forecast quality has improved significantly. If the forecast period is one month, the forecast quality increases in all months. If the forecast period is 2 months, the forecast quality increases only in the period from February to April. If the forecast period is 3 months or more, the forecast quality does not increase significantly. The ability to improve the quality of SPI forecasts in the dry months, especially from February to April, is very meaningful for VMD's drought response.
The forecast quality is lower when the predicted variable is the SPI at each station than when it is the average SPI of the stations. Thus, in order to reduce the randomness of the rainfall data and improve the quality of the SPI forecast, the predicted variable should be representative of an area or a watershed, and the number of rainfall monitoring stations should be increased.

Figure 2. Flow chart for building the SPI forecasting equations.

Figure 3. Correlation coefficient between SPI in VMD and Niño indices with lag time from 0 to 7 months.

Figure 4. Magnitude of maximum lag correlation coefficient between SPI and Niño indices in the less rainy months.

Figure 6. Scatter plot between actual and simulated SPI when the forecasting equation is built from SST in the Niño regions.

Figure 7. (a) Scatter plot between actual and forecast SPI for April and (b) forecast accuracy of each month.

Figure 8. Accuracy in SPI forecast at stations and dry season months.

Figure 10. Dispersion between actual and simulated SPI in April when adding a new predictor from SSTA_P, for 1-month and 2-month forecast terms.

Figure 11. Accuracy in SPI forecast for each station when adding a predictor variable from SSTA_P.

Figure 12. The difference in the evaluation coefficients between after and before adding the predictor from SSTA_P.

Table 1. Coefficients of the multiple linear regression equation for the average SPI simulation.

Table 2. Statistical values of the coefficients in the linear regression equations for SPI prediction at stations.

Thus, the predictor variable is built from SSTA in Niño 3.4 region with a time scale of 3 months and is taken 2 months before SPI for the best forecast quality. The values of the coefficients representing the SPI forecast accuracy are presented in table 5. Comparing the coefficients in this table with table

Table 3. Linear regression equation coefficients when the predicted variable is the mean SPI of VMD, with a one-month forecast term and a new predictor variable from SSTA_P.

Table 4. Linear regression equation coefficients when the predicted variable is the mean SPI of VMD, with a two-month forecast term and a new predictor variable from SSTA_P.

Table 5. Accuracy in SPI forecast for dry season.

Figure 9. Scatter plots of real and simulated SPI for the dry season when adding variables from SSTA_P with forecast terms of 1 month and 2 months.

Table 6. Statistical values of the coefficients in the linear regression equations for SPI prediction at stations with the addition of a new predictor variable.