Ten daily rainfall forecasting using SSA algorithms and Seasonal ARIMA model to determine the beginning of the rainy season

Rainfall is one of the most important hydrological parameters. Rainfall predictions support several kinds of planning, including predicting the start of the planting season from the beginning of the rainy season and preparing cropping patterns. The purpose of this study is to predict the ten daily rainfall for the 12 months from January to December of the following year and to predict the start of the rainy season based on BMKG criteria. Modeling and forecasting are carried out on the ten daily regional rainfall in the catchment of the Citarum-Majalaya water gauge using the Singular Spectrum Analysis (SSA) algorithm and the Seasonal ARIMA (SARIMA) model. Forecasting performance is measured using the mean absolute percentage error (MAPE). The performance tests show that the SSA algorithm and the Seasonal ARIMA model have fairly good accuracy for predicting rainfall over the next 6 months, with MAPE values of 36.8 percent and 40.0 percent, respectively. Based on the SSA forecasting results, the beginning of the rainy season at the study location is most likely to occur in the 3rd ten daily of October, in accordance with BMKG predictions. Based on the SARIMA forecasting results, the beginning of the rainy season is likely to occur in the 1st ten daily of November, with another possibility in the 2nd ten daily of October.


Introduction
Indonesia has two seasons each year, the rainy season and the dry season. The rainy season generally occurs from October to March, and the dry season lasts from April to September. The peak of the rainy season occurs from December to February, and the peak of the dry season from June to August [1], [2].
1314 (2024) 012121 IOP Publishing doi:10.1088/1755-1315/1314/1/012121
Rainfall is the main determinant of the occurrence of the rainy season and of river discharge variability. The Meteorology, Climatology and Geophysics Agency (BMKG) predicts the arrival of the rainy and dry seasons based on rainfall. Ten daily rainfall is the total rainfall over ten days. Each month consists of three ten daily rainfall totals: the 1st ten daily covers the 1st to the 10th, the 2nd ten daily covers the 11th to the 20th, and the 3rd ten daily covers the 21st to the end of the month. One year therefore consists of 36 ten daily rainfall totals [3]. If the ten daily rainfall is above 50 mm in three consecutive ten daily periods, the first of them is determined as the beginning of the rainy season. Conversely, if the ten daily rainfall is below 50 mm in three consecutive ten daily periods, the first of them is determined as the beginning of the dry season [4].
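The onset criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `season_onset` and `ten_daily_label` are hypothetical, the example values are invented, and the scan simply starts at the first value of the series, so it ignores a rainy season that is already under way at the start of the year.

```python
from typing import Optional, Sequence

THRESHOLD_MM = 50.0  # threshold quoted in the text
RUN_LENGTH = 3       # three consecutive ten daily periods
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def season_onset(rain_mm: Sequence[float], wet: bool = True) -> Optional[int]:
    """Return the 0-based index of the first ten daily that starts a run of
    RUN_LENGTH consecutive values above (wet=True) or below (wet=False)
    THRESHOLD_MM, or None if the series contains no such run."""
    for start in range(len(rain_mm) - RUN_LENGTH + 1):
        window = rain_mm[start:start + RUN_LENGTH]
        if wet and all(v > THRESHOLD_MM for v in window):
            return start
        if not wet and all(v < THRESHOLD_MM for v in window):
            return start
    return None


def ten_daily_label(index: int) -> str:
    """Label a 0-based ten daily index, assuming index 0 is the 1st ten
    daily of January."""
    return f"ten daily {index % 3 + 1} of {MONTHS[(index // 3) % 12]}"


# Illustrative values only (not data from the study), ordered from the
# 1st ten daily of January to the 3rd ten daily of December.
example = [80, 70, 20] + [10] * 26 + [60, 70, 90, 55, 40, 65, 70]
onset = season_onset(example)
print(onset, ten_daily_label(onset))
```

With these invented values the first run of three wet periods starts at index 29, which the label helper reports as the 3rd ten daily of October. Calling `season_onset(example, wet=False)` applies the dry season rule in the same way.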
Time series models for predicting rainfall have previously been built using the Singular Spectrum Analysis algorithm and the Seasonal ARIMA model, as in [5], [6] and [7]. The Seasonal ARIMA model is commonly used for forecasting time series that contain seasonality and has good accuracy, while the Singular Spectrum Analysis algorithm is a nonparametric statistical method that does not require data normality yet has good accuracy. The purpose of this study is to forecast the ten daily rainfall for the next year based on historical rainfall data and to predict the start of the rainy season based on BMKG criteria. Modeling and forecasting are carried out on the ten daily regional rainfall in the catchment of Citarum-Majalaya's Water Estimation Post (PDA) using the Singular Spectrum Analysis (SSA) algorithm and the Seasonal ARIMA model. Several previous studies show that modeling using the SSA method is better and more accurate than other methods such as ARIMA and Holt-Winter based on MAE and MEAE [8], [9], and that the SSA-ARIMA hybrid model, both multiplicative and additive, has very good forecasting accuracy [5].

Methodology
This section explains the research data, the study location, and the methods used for modeling and forecasting the ten daily rainfall. An outline of the research steps is presented in Figure 1.

Study Area and Regional Rainfall
The study was carried out in the catchment of Citarum-Majalaya's PDA, with an area of 203.77 km², located in the upstream Citarum sub-watershed, which is administratively part of Bandung Regency, West Java Province, Indonesia. The data needed for modeling and forecasting are the daily rainfall series recorded at 3 rain stations, namely the Cibeureum, Cipaku-Paseh and Situ Cisanti rain stations, over a period of 20 years from 2001 to 2020. The daily rainfall series from the 3 rain stations are processed into a regional daily rainfall series using the Thiessen polygon method. The area of influence of each rain station and the Thiessen polygon map are presented in Table 1 and Figure 2. Over the 20 years, a series of 720 regional ten daily rainfall values is obtained. For analysis purposes, the data are divided into two parts: 684 training data are used for modeling and 36 testing data are used for model performance evaluation.
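The data preparation described above (Thiessen weighting, aggregation to ten daily totals, and the 684/36 split) can be sketched as follows. The weights in `WEIGHTS` are hypothetical placeholders, since the actual areas of influence are given in Table 1 and are not reproduced here; the function names are also illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical Thiessen weights (fractions of the catchment area).
# The real values come from Table 1 of the study.
WEIGHTS = {"Cibeureum": 0.40, "Cipaku-Paseh": 0.35, "Situ Cisanti": 0.25}


def regional_rainfall(station_mm):
    """Area-weighted (Thiessen) mean of the station rainfall for one day."""
    return sum(WEIGHTS[name] * mm for name, mm in station_mm.items())


def ten_daily_index(d):
    """Return (year, month, k), k = 1, 2, 3 for days 1-10, 11-20, 21-end."""
    return d.year, d.month, min((d.day - 1) // 10, 2) + 1


def ten_daily_totals(daily):
    """Sum a {date: regional mm} mapping into ten daily totals."""
    totals = defaultdict(float)
    for d, mm in daily.items():
        totals[ten_daily_index(d)] += mm
    return dict(totals)


# Illustrative input: 1 mm of regional rain on every day of January 2001.
january = {date(2001, 1, 1) + timedelta(days=i): 1.0 for i in range(31)}
totals = ten_daily_totals(january)

# Train/test split of a 720-value ten daily series: last 36 values for testing.
series = list(range(720))  # stand-in for the real series
train, test = series[:-36], series[-36:]
```

In this example the third ten daily of January collects 11 days, which is why the day-of-month bucket is capped at 3 rather than computed by plain integer division.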

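Decomposition Stages
At the embedding stage, the window length parameter $L$ must be chosen under the condition $2 \le L \le N/2$, where $N$ is the length of the series $x_1, \dots, x_N$. With $K = N - L + 1$, the result of this process is the trajectory matrix $\mathbf{X}$ of size $L \times K$ [5]:
$$\mathbf{X} = (x_{i+j-1})_{i,j=1}^{L,K} = \begin{pmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_N \end{pmatrix} \quad (1)$$
The next step is to compute the Singular Value Decomposition (SVD) of the trajectory matrix $\mathbf{X}$ and represent it as a sum of biorthogonal matrices. Let $\lambda_1, \dots, \lambda_L$ be the eigenvalues of $\mathbf{X}\mathbf{X}^T$ in descending order ($\lambda_1 \ge \cdots \ge \lambda_L \ge 0$) and $U_1, \dots, U_L$ the corresponding eigenvectors. The rank of the matrix $\mathbf{X}$ is
$$d = \max\{i : \lambda_i > 0\} = \operatorname{rank} \mathbf{X} \quad (2)$$
For $i = 1, \dots, d$, the SVD of the trajectory matrix can be expressed as follows:
$$\mathbf{X} = \mathbf{X}_1 + \cdots + \mathbf{X}_d \quad (3)$$
where $V_i = \mathbf{X}^T U_i / \sqrt{\lambda_i}$, so that $\mathbf{X}_i = \sqrt{\lambda_i}\, U_i V_i^T$. Each matrix $\mathbf{X}_i$ is formed from the eigenvector $U_i$, the singular value $\sqrt{\lambda_i}$ and the principal component $V_i^T$.
These three elements form the SVD and are called an eigentriple.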
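The decomposition stage can be illustrated with a short NumPy sketch. The study itself uses the Rssa package in R, so this is not the authors' implementation; `ssa_decompose`, the relative rank tolerance and the synthetic series are all assumptions made for the example.

```python
import numpy as np


def ssa_decompose(series, L):
    """Embedding followed by SVD.

    Returns the trajectory matrix X and the list of elementary matrices
    X_i = sqrt(lambda_i) * U_i * V_i^T, one per nonzero singular value.
    """
    x = np.asarray(series, dtype=float)
    N = x.size
    if not 2 <= L <= N // 2:
        raise ValueError("window length must satisfy 2 <= L <= N/2")
    K = N - L + 1
    # Trajectory (Hankel) matrix: column j holds x[j], ..., x[j + L - 1].
    X = np.column_stack([x[j:j + L] for j in range(K)])
    # The singular values s_i of X equal sqrt(lambda_i), where lambda_i
    # are the eigenvalues of X X^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Numerical rank: keep singular values above a relative tolerance
    # (the tolerance 1e-10 is an arbitrary choice for this sketch).
    d = int(np.sum(s > 1e-10 * s[0]))
    elementary = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(d)]
    return X, elementary


# Synthetic series with a period of 36, loosely mimicking ten daily data.
t = np.arange(72)
x = 100 + 50 * np.sin(2 * np.pi * t / 36)
X, parts = ssa_decompose(x, L=36)
print(X.shape, len(parts), np.allclose(sum(parts), X))
```

Summing all elementary matrices recovers the trajectory matrix, which is the property the grouping step relies on when it keeps only some of the eigentriples.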

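Reconstruction Stages
The first step of the reconstruction stage is grouping, namely partitioning the elementary matrices $\mathbf{X}_i$ of the SVD into several groups and summing the matrices within each group. Let $I = \{i_1, \dots, i_p\}$ be a group of indices $i_1, \dots, i_p$. The matrix $\mathbf{X}_I$ corresponding to the group $I$ is defined as $\mathbf{X}_I = \mathbf{X}_{i_1} + \cdots + \mathbf{X}_{i_p}$. Splitting the index set $J = \{1, \dots, d\}$ into disjoint subsets $I_1, \dots, I_m$ corresponds to the following representation:
$$\mathbf{X} = \mathbf{X}_{I_1} + \cdots + \mathbf{X}_{I_m} \quad (4)$$
The next step converts each matrix of the grouped decomposition into a new one-dimensional series of length $N$; this step is called diagonal averaging. Let $\mathbf{Y}$ denote a matrix of order $L \times K$ with elements $y_{ij}$, $1 \le i \le L$, $1 \le j \le K$, and set $L^* = \min(L, K)$, $K^* = \max(L, K)$ and $T = L + K - 1$ (so that $T = N$). Let $y^*_{ij} = y_{ij}$ if $L < K$ and $y^*_{ij} = y_{ji}$ otherwise. Diagonal averaging transforms the matrix $\mathbf{Y}$ into a series $g_1, g_2, \dots, g_T$ with the formula
$$g_k = \begin{cases} \dfrac{1}{k} \displaystyle\sum_{m=1}^{k} y^*_{m,\,k-m+1}, & 1 \le k < L^* \\[2ex] \dfrac{1}{L^*} \displaystyle\sum_{m=1}^{L^*} y^*_{m,\,k-m+1}, & L^* \le k \le K^* \\[2ex] \dfrac{1}{T-k+1} \displaystyle\sum_{m=k-K^*+1}^{T-K^*+1} y^*_{m,\,k-m+1}, & K^* < k \le T \end{cases} \quad (5)$$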
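The reconstruction stage can be sketched in plain Python on nested lists. Averaging every anti-diagonal (all entries with the same value of i + j) is equivalent to the piecewise diagonal averaging formula; the helper names `group_sum` and `diagonal_average` are illustrative and not taken from the study.

```python
def group_sum(matrices, indices):
    """Sum the elementary matrices whose 0-based indices form one group."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(matrices[i][r][c] for i in indices) for c in range(cols)]
            for r in range(rows)]


def diagonal_average(Y):
    """Average the anti-diagonals of an L x K matrix Y and return the
    resulting series of length L + K - 1."""
    L, K = len(Y), len(Y[0])
    sums = [0.0] * (L + K - 1)
    counts = [0] * (L + K - 1)
    for i in range(L):
        for j in range(K):
            sums[i + j] += Y[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


# A Hankel matrix built from the series 1..5 with L = 2 is recovered exactly.
hankel = [[1, 2, 3, 4],
          [2, 3, 4, 5]]
recovered = diagonal_average(hankel)

# A matrix that is not Hankel is replaced by its anti-diagonal means.
smoothed = diagonal_average([[1, 2],
                             [3, 4]])

# Grouping: add two (illustrative) elementary matrices into one component.
component = group_sum([[[1, 0], [0, 1]], [[2, 2], [2, 2]]], indices=[0, 1])
```

A grouped matrix is generally not Hankel, so diagonal averaging is what turns it back into a proper time series component of the original length.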

Seasonal ARIMA
The Seasonal ARIMA model is a development of the ARIMA model, introduced to improve ARIMA's performance in modeling seasonal time series. The ARIMA model is a linear nonstationary time series model popularized by Box and Jenkins in 1976 [11]. Seasonal elements in time series data can be detected visually through time series plots (Buys Ballot plots) and autocorrelation function (ACF) plots. A series with a seasonal pattern has a wave-like plot, and its ACF plot appears sinusoidal. Seasonal elements and periods can also be detected statistically using the spectral regression method. The seasonal ARIMA model is written SARIMA$(p,d,q)(P,D,Q)^S$ and has the following mathematical form:
$$\phi_p(B)\,\Phi_P(B^S)\,(1-B)^d (1-B^S)^D Z_t = \theta_q(B)\,\Theta_Q(B^S)\,a_t$$
where $B$ is the backshift operator, $Z_t$ is the observed series, $a_t$ is the white noise residual, $S$ is the seasonal period, $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$ are the non-seasonal AR and MA polynomials, and $\Phi_P(B^S) = 1 - \Phi_1 B^S - \cdots - \Phi_P B^{PS}$ and $\Theta_Q(B^S) = 1 - \Theta_1 B^S - \cdots - \Theta_Q B^{QS}$ are the seasonal AR and MA polynomials.
The best model is selected based on several criteria: its parameters are significant, its residuals are white noise and normally distributed, and it has the best goodness-of-fit value as measured by the mean absolute percentage error (MAPE) and the mean square error (MSE). The best Seasonal ARIMA model is then used for forecasting the ten daily rainfall series. Seasonal ARIMA modeling has previously been applied in the field of water resources, including irrigation system modeling [12] and water demand modeling [13]. In this study, seasonal ARIMA modeling and forecasting are carried out using Minitab software.
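The differencing part of a SARIMA model, that is, regular differencing of order d and seasonal differencing of order D at period S, can be sketched as below. The function `difference` is a hypothetical helper, the series is synthetic, and S = 36 is used because a year contains 36 ten daily periods.

```python
def difference(series, lag=1):
    """Apply the operator (1 - B^lag): z_t - z_{t-lag} for t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic series of 684 values: linear trend plus a pattern that repeats
# every 36 steps (illustrative only, not rainfall data).
pattern = [(k * 7) % 36 for k in range(36)]
series = [2 * t + pattern[t % 36] for t in range(684)]

seasonal = difference(series, lag=36)     # (1 - B^36) removes the pattern
stationary = difference(seasonal, lag=1)  # (1 - B) removes what is left

print(len(seasonal), len(stationary))
```

Each application of a lag-k difference shortens the series by k values, so one seasonal and one regular difference leave 684 - 36 - 1 = 647 values for parameter estimation.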

Singular Spectrum Analysis (SSA)
Rainfall forecasting with the SSA algorithm is carried out in R using the Rssa package. The output and plot of the SSA forecasting results are presented in Figure 3 and Figure 4. Figure 3 shows that the MAPE and RMSE values for the last year of the training process (in sample) are close to zero (very good). The evaluation of forecasting performance shows that short-term forecasts for 6 months or 18 ten daily ahead (out of sample) have a MAPE value of 36.8%, which can be categorized as quite good. Forecasts beyond 6 months have a MAPE value above 50%, which is categorized as poor performance. The performance criteria for modeling and forecasting based on MAPE values are presented in Table 5.
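For reference, the error measures mentioned above can be computed as in the following sketch. The function names and the numbers are illustrative; rejecting zero observations in `mape` is an assumption of this example, since the source does not say how ten daily periods without rain are handled.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, forecast):
    """Root mean square error, in the units of the data."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative ten daily totals in mm (not data from the study).
actual = [100.0, 50.0, 200.0]
forecast = [90.0, 60.0, 150.0]
print(round(mape(actual, forecast), 1), rmse(actual, forecast))
```

Here the percentage errors are 10%, 20% and 25%, giving a MAPE of about 18.3%, while the RMSE is 30 mm.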

Seasonal ARIMA Model
A plot of the rainfall series against time (Buys Ballot plot) is used to detect seasonal elements and trends in the ten daily rainfall series. The graph in Figure 5(a) shows wave patterns that repeat at certain time periods; however, the series does not follow a systematic pattern over time, that is, it has no trend. Seasonal patterns can also be detected visually through the ACF plot: an ACF plot with a sinusoidal pattern indicates a seasonal pattern. Detection of seasonal patterns and periods using the spectral regression method confirms that the rainfall series has a seasonal pattern with a period of 36.
The plot of the ten daily rainfall data in Figure 5(a) indicates that the data are not stationary in variance and mean. Stationarity in variance is checked using Box-Cox plots: the data are stationary if the rounded value of lambda (λ) is 1. Stationarity in the mean is then checked through the ACF plot of the Box-Cox transformed data. The ACF plot in Figure 5(b) shows that the first 3 lags are outside the confidence limits, so the data are not stationary in the mean and need to be differenced. The seasonal ACF plot in Figure 7 shows that after differencing once the majority of the lags are within the confidence limits, so the data are stationary in the mean. In the ACF plot in Figure 7, the non-seasonal part cuts off at lags 1 and 3, and the seasonal part cuts off at lag 72. In the PACF plot, the non-seasonal part cuts off at lags 1 and 2, and the seasonal part cuts off at lag 72. Furthermore, in the ACF plot of the seasonally differenced data in Figure 7(c), the non-seasonal part cuts off at lags 1 and 2, and the seasonal part cuts off at lag 36. Likewise, in the seasonal PACF plot in Figure 7(d), the non-seasonal part cuts off at lags 1 and 2, and the seasonal part cuts off at lags 36 and 72. Based on the ACF and PACF plots as well as some other information, several candidate Seasonal ARIMA models are obtained, i.e. SARIMA$(2,1,1)(0,1,1)^{36}$, SARIMA$(1,1,1)(1,1,1)^{36}$, SARIMA$(2,1,0)(0,1,1)^{36}$, SARIMA$(3,1,1)(1,1,1)^{36}$ and SARIMA$(2,1,0)(1,1,1)^{36}$.
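The ACF-based checks described above can be illustrated with a small sketch. The function `acf` is a hypothetical helper, the series is a synthetic sinusoid with period 36, and the limit ±1.96/√n is a common large-sample approximation that may differ from the confidence limits drawn by Minitab.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation r_0, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


# Synthetic seasonal series: 10 "years" of 36 values each.
n = 360
series = [math.sin(2 * math.pi * t / 36) for t in range(n)]
r = acf(series, max_lag=72)
limit = 1.96 / math.sqrt(n)  # approximate 95% limit

outside = [k for k in range(1, 73) if abs(r[k]) > limit]
print(round(r[18], 2), round(r[36], 2), round(limit, 3))
```

For this series the autocorrelation is strongly negative at lag 18 and strongly positive at lag 36, which is the sinusoidal ACF signature of a seasonal pattern with period 36.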
The SARIMA$(2,1,0)(0,1,1)^{36}$ model is the best model because it has significant coefficients, its residuals are white noise and normally distributed, as presented in Table 2, Table 3 and Figure 8, and it has the smallest MSE, 0.128798, as presented in Table 4. Based on the data in Table 2, the SARIMA$(2,1,0)(0,1,1)^{36}$ model can be written in the form of the following mathematical equation, with the values of the coefficients $\phi_1$, $\phi_2$ and $\Theta_1$ taken from Table 2:
$$(1 - \phi_1 B - \phi_2 B^2)(1 - B)(1 - B^{36}) Z_t = (1 - \Theta_1 B^{36})\, a_t \quad (10)$$
where $B$ is the backshift operator, $Z_t$ is the ten daily rainfall series (after any Box-Cox transformation) and $a_t$ is the white noise residual. Forecasting results using equation (10) are presented in Table 7. The forecast data are the ten daily rainfall series for the next year, from the 1st ten daily of January 2020 to the 3rd ten daily of December 2020. The data show that ten daily rainfall greater than 50 mm in consecutive periods starts from the 1st ten daily of November. Thus, the beginning of the rainy season occurs in the 1st ten daily of November. The plot of the rainfall series from the Seasonal ARIMA forecasting model is presented in Figure 9. The results of the model performance evaluation, with a MAPE value of 40.0% (Table 6), show that the performance of the Seasonal ARIMA model for the next 6 months (18 ten daily) is quite good.

Comparison of Forecasting Results
Figure 9 shows that the rainfall pattern forecast by the Seasonal ARIMA model broadly follows the pattern forecast by the SSA algorithm. The forecasting performance of the two models is also similar: both have a MAPE value of less than 50%, namely 36.8% and 40.0%, so both are categorized as having fairly good performance, with the SSA algorithm relatively better than the Seasonal ARIMA model. Based on the data in Table 7, the start of the rainy season at the study location is predicted to occur in the 3rd ten daily of October according to the SSA forecasting results, and in the 1st ten daily of November according to the Seasonal ARIMA forecasting results. Thus, the prediction based on the SSA forecasting results is in accordance with the BMKG prediction, namely that the rainy season in the southern part of Bandung will begin in the 3rd ten daily of October [14]. Determining the start of the rainy and dry seasons based on BMKG criteria can be illustrated as presented in Figure 10.

Probability of Beginning of The Rainy Season
The prediction of the beginning of the rainy season based on ten daily rainfall data contains an element of uncertainty due to random rain events. The probability of the beginning of the rainy season is calculated from several possible ten daily rainfall forecast series, obtained through scenarios that vary the length of the training data series, using 3 approaches: 1) forecasting rain for one year ahead one lag at a time, 2) applying bias corrections to the results of the first approach, and 3) forecasting rain for the whole year ahead simultaneously.
Based on these scenarios and approaches and their implementation for forecasting 2019 and 2020, a total of 27 rainfall series are obtained. Applying the BMKG criteria to the 27 ten daily rainfall series, the beginning of the rainy season falls in the 3rd ten daily of October 16 times, in the 2nd ten daily of October 6 times, in the 1st ten daily of November 4 times, and in the 1st ten daily of October once. Thus, the beginning of the rainy season at the study location predominantly occurs in the 3rd ten daily of October, with a probability of 0.59. A shift in the start of the rainy season is likely to fall in the 2nd ten daily of October or the 1st ten daily of November and is less likely to fall in the 1st ten daily of October. The probability of the beginning of the rainy season based on forecasting results and BMKG criteria is presented in Figure 10.
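The probabilities follow directly from the counts reported above, as the following sketch shows. The labels such as "Oct-3" are shorthand introduced for this example (month and ten daily number); only the counts 16, 6, 4 and 1 come from the text.

```python
from collections import Counter

# Onset period of each of the 27 forecast series, using the counts in the text.
onsets = ["Oct-3"] * 16 + ["Oct-2"] * 6 + ["Nov-1"] * 4 + ["Oct-1"] * 1

counts = Counter(onsets)
total = sum(counts.values())
probabilities = {period: round(n / total, 2) for period, n in counts.items()}

most_likely, _ = counts.most_common(1)[0]
print(total, most_likely, probabilities)
```

The relative frequency of the 3rd ten daily of October is 16/27, which rounds to the reported 0.59; the other periods come out at roughly 0.22, 0.15 and 0.04.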

Conclusion
The model performance evaluation shows that the performance of the SSA algorithm and the Seasonal ARIMA model is fairly good for the next 6 months (18 ten daily), with MAPE values of 36.8% and 40.0%, respectively, so the SSA algorithm is relatively better than the Seasonal ARIMA model.
Based on the BMKG criteria and the ten daily rainfall data from the SSA forecasting results, the beginning of the rainy season at the study location is predicted to occur in the 3rd ten daily of October, while based on the Seasonal ARIMA forecasting results it occurs in the 1st ten daily of November. The SSA prediction of the beginning of the rainy season is in accordance with the BMKG prediction.
The beginning of the rainy season at the study location is most likely to occur in the 3rd ten daily of October, with a probability of 0.59. If there is a shift, it is likely to fall in the 2nd ten daily of October or the 1st ten daily of November; it is unlikely to fall in the 1st ten daily of October.

Figure 2. The Thiessen Polygon of the Catchment Area of Citarum-Majalaya's PDA

SSA Algorithm
The basic SSA algorithm consists of two complementary stages, namely decomposition and reconstruction. The decomposition stage consists of the embedding and Singular Value Decomposition (SVD) steps, and the reconstruction stage consists of the grouping and diagonal averaging steps [10].

Figure 3. Forecasting output of the SSA algorithm

Figure 4. Plot of the SSA algorithm forecasting results

Figure 5. (a) Plot of the ten daily rainfall data series and (b) ACF plot

Figure 8. Normal Distribution of Residuals

Figure 9. Actual ten daily rainfall and Forecasting Results for 2020

Table 1. The Thiessen Polygon of the Catchment Area of Citarum-Majalaya's PDA

• Identification of the model is done by testing stationarity in the variance and in the mean. Time series data that are not stationary in the mean are made stationary through differencing, and those that are not stationary in the variance are made stationary with the Box-Cox transformation.
• Parameter estimation is performed for each alternative model and is followed by a parameter significance test.
• Residual diagnostics are carried out by testing the assumption that the residuals are white noise and the assumption that they are normally distributed. The residuals are white noise if they are uncorrelated, with a mean equal to zero and a constant variance. The white noise test is performed using the Ljung-Box statistical test. The residuals are normally distributed if the residual values are mostly close to the average value. The normality test is performed using the Kolmogorov-Smirnov test.

Table 5. The criteria for forecasting performance based on the MAPE value

Table 6. Performance of the SSA Algorithm and Seasonal ARIMA Model