Statistical bias correction modelling for seasonal rainfall forecast for the case of Bali island

Rainfall is a climate element that strongly influences the agricultural sector. The pattern and distribution of rain largely determine the sustainability of agricultural activities. Information on rainfall is therefore very useful to the agricultural sector and to farmers in anticipating extreme events, which often cause failures of agricultural production. This research aims to identify the biases in the seasonal rainfall forecast products of ECMWF (European Centre for Medium-Range Weather Forecasts) and to build a transfer function that corrects the distribution biases, yielding a new prediction model based on the quantile mapping approach. We apply this approach to the case of Bali Island. Correcting the systematic biases of the model gives better results: the new prediction model performs better than the prediction model before bias correction. We found that, in general, the bias correction approach performs better during the rainy season than during the dry season.


Introduction
Rainfall in the tropics is one of the climate factors with the highest variability. Indonesia is a tropical country that lies across the equator, between 6° North and 11° South latitude and between 95° East and 141° East longitude. Rainfall characteristics differ among its regions. These differences are caused by several factors, namely the position of the region, its surface conditions, its mountains and valleys, and even the structure and orientation of the island. As a result, the pattern of rainfall distribution tends to be uneven across regions at large scales [1].
Rainfall is one of the climate factors that most strongly influence the agricultural sector. Consistency in the pattern and distribution of rain largely determines the sustainability of agricultural activities. This consistency is disturbed during extreme climate phenomena, such as the dry extreme (El Niño) and the wet extreme (La Niña), which can cause rainfall patterns to deviate from normal conditions.
According to an analysis by the Indonesian Agency for Meteorological, Climatological and Geophysics, the probability of extreme rain reaching 500 mm/month in Banten, the Special Capital Region of Jakarta, and West Java increased to 13% in 1970-1999, whereas in 1900-1929 the probability of extreme rain in these three regions was only 3% [2]. Indonesia also experienced a clear climate anomaly in 2010: the dry season stayed wet throughout the year (a "wet drought"), and the year ended in drought, which illustrates how much the climate has changed [3]. Based on data released by BPS Bali Province in 2014, agricultural production decreased by 2.74% to 857,944 tons of dry milled grain [8]. The decline was mainly due to a decrease in paddy harvest area of 5.11%, or 7,683 ha, caused by drought during the dry season. From June to September 2014, extreme climate conditions occurred in the form of dry months (months with an average rainfall of less than 100 mm), resulting in a drought-affected area of 807 ha, higher than the 54 ha recorded in 2013 [9]. This dry season delayed the planting and harvest seasons by about one month. Taken together, these data indicate a very large total loss.
Information on rainfall patterns is therefore very useful to farmers in anticipating extreme events that cause failures of agricultural production. Many researchers have studied methods for modelling rainfall patterns. Piani et al. [4] designed and applied a bias correction to the output of a daily climate model over Europe, obtaining a distribution that approximates the observations. This method is known as Statistical Bias Correction. Their results show that the daily climate model output is consistently improved by the bias correction. Bambang et al. [5] identified the systematic error in TRMM data and built a transfer function to correct the bias of TRMM rainfall. They found that the corrected TRMM rainfall pattern is similar to the observed pattern, but the rainfall amounts are not.
In this research, the Statistical Bias Correction method is used to examine the relation between the ECMWF (European Centre for Medium-Range Weather Forecasts) rainfall forecast and observations from the Indonesian Agency for Meteorological, Climatological, and Geophysics for 1996-2015. The research aims to identify the systematic error in ECMWF data and to build a transfer function that corrects the bias of the most recent ECMWF rainfall forecast, which then serves as the prediction model. The systematic error can be identified by comparing the shape and distribution of the ECMWF hindcast data with those of the observation data, using the interpolation and closest-point approaches. The resulting ensemble model is the predicted rainfall for the period January-December 2016. We then assess the quality of the results against the observed monthly rainfall in 2016.

Data
The data used in this research are ECMWF (European Centre for Medium-Range Weather Forecasts) rainfall data and weather station observations obtained from the central office of the Indonesian Agency for Meteorological, Climatological and Geophysics in Jakarta, covering 1996 to 2016. The observations are taken from stations on Bali Island.

2.2.1 Data Extraction. Data extraction was performed on the gridded ECMWF rainfall data with MATLAB software to obtain output at the same locations as the observation data. The data were extracted based on the latitude and longitude of every weather station. Two extraction methods were used:
- Closest point method. An algorithm computes the distance between the station and each grid point and selects the grid point at the smallest distance. The distance between two points is computed with the formula given in [6].
- Cubic interpolation. Interpolation is a way of finding a value between known data points. It is a numerical approach that is needed when the value of a function is required but its exact formulation is not known. In this research, cubic interpolation is used as one method for extracting ECMWF rainfall data. It determines a value between four known points by fitting a cubic polynomial through them, as described in [7].
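The two extraction approaches can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB procedure: the haversine formula stands in for the distance formula of [6], a one-dimensional Lagrange cubic through four points stands in for the cubic polynomial of [7], and the station coordinates, grid nodes, and rainfall values are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees.
    Assumption: the paper's own distance formula [6] may differ."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def closest_grid_point(station, grid_points):
    """Return the (lat, lon) grid point nearest to the station."""
    return min(grid_points,
               key=lambda g: haversine_km(station[0], station[1], g[0], g[1]))

def cubic_interpolate(xs, ys, x):
    """Evaluate the cubic polynomial through four points (Lagrange form) at x."""
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("cubic interpolation needs exactly four points")
    total = 0.0
    for i in range(4):
        term = ys[i]
        for j in range(4):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

# Hypothetical station location and grid nodes (not the paper's data).
station = (-8.68, 115.21)
grid = [(-8.0, 115.0), (-8.75, 115.25), (-9.5, 115.5), (-8.75, 114.5)]
nearest = closest_grid_point(station, grid)

# Hypothetical rainfall values (mm/day) at four grid longitudes on one latitude.
lons = [114.5, 115.25, 116.0, 116.75]
rain = [8.0, 10.0, 9.0, 7.0]
value_at_station = cubic_interpolate(lons, rain, station[1])
```

On a real two-dimensional grid the interpolation would be applied along both latitude and longitude (bicubic); the one-dimensional form is kept here only to show the idea.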

2.2.2 Identification of Data Distribution. In this stage, the distribution parameters were estimated and the distribution closest to the data was identified. This step is required to determine the cdf used in the bias correction process. The data were checked with MATLAB software against the following candidate distributions: beta, Birnbaum-Saunders, exponential, extreme value, gamma, generalized extreme value, generalized Pareto, inverse Gaussian, logistic, log-logistic, lognormal, Nakagami, normal, Rayleigh, Rician, t location-scale, and Weibull. The candidates were ranked by the following criteria: NLogL (negative of the log likelihood), BIC (Bayesian information criterion), AIC (Akaike information criterion), and AICc (AIC with a correction for finite sample sizes).
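The ranking step can be illustrated with a short sketch. It is not the MATLAB routine used in the study: only three candidate distributions with closed-form maximum likelihood estimates are included, the sample values are hypothetical, and the function names are illustrative. The criteria use their standard definitions, with k fitted parameters and n data points: AIC = 2k + 2 NLogL, BIC = k ln n + 2 NLogL, and AICc = AIC + 2k(k+1)/(n-k-1).

```python
import math

def fit_exponential(data):
    """MLE fit of an exponential distribution; returns (NLogL, number of parameters)."""
    n = len(data)
    mean = sum(data) / n
    return n * math.log(mean) + sum(data) / mean, 1

def fit_normal(data):
    """MLE fit of a normal distribution; returns (NLogL, number of parameters)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE variance (divides by n)
    return 0.5 * n * (math.log(2 * math.pi * var) + 1), 2

def fit_lognormal(data):
    """MLE fit of a lognormal distribution (requires strictly positive data)."""
    logs = [math.log(x) for x in data]
    nlogl_of_logs, k = fit_normal(logs)
    # The change of variables adds sum(log x) to the negative log likelihood.
    return nlogl_of_logs + sum(logs), k

def information_criteria(nlogl, k, n):
    """Standard AIC, BIC and AICc from the negative log likelihood."""
    aic = 2 * k + 2 * nlogl
    bic = k * math.log(n) + 2 * nlogl
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return {"NLogL": nlogl, "AIC": aic, "BIC": bic, "AICc": aicc}

def rank_distributions(data):
    """Return (name, criteria) pairs sorted from best (smallest AIC) to worst."""
    candidates = {"exponential": fit_exponential,
                  "normal": fit_normal,
                  "lognormal": fit_lognormal}
    n = len(data)
    table = {name: information_criteria(*fit(data), n)
             for name, fit in candidates.items()}
    return sorted(table.items(), key=lambda item: item[1]["AIC"])

# Hypothetical January rainfall totals (mm) for 20 years, for illustration only.
rainfall_mm = [312.0, 280.5, 401.2, 198.7, 356.1, 267.3, 422.8, 305.9, 240.4, 331.6,
               289.0, 375.5, 218.2, 344.7, 296.3, 410.1, 259.8, 322.4, 273.6, 367.9]
ranking = rank_distributions(rainfall_mm)
best_name = ranking[0][0]
```

Sorting by AIC is one choice among the four criteria; with samples as short as 20 values, AICc may be the more cautious key.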

2.2.3 Bias Correction. Bias correction was performed to establish the relation between the observed rainfall and the predicted rainfall (ECMWF data) in the form of a transfer function [ ]. The relation has the form $\mathrm{cdf}_{\mathrm{observed}} = \mathrm{cdf}_{\mathrm{predicted}}$. The relation between the observed and predicted data can be expressed as a regression equation, which may be linear, exponential, or polynomial. The cdf can be formed with the following formula:

$$\mathrm{cdf}(x) = \int_0^{x} \frac{e^{-x'/\theta}\, x'^{\,k-1}}{\Gamma(k)\,\theta^{k}}\, dx' + \mathrm{cdf}(0)$$

where $x$ = average daily rainfall, $k$ = shape parameter, $\theta$ = scale parameter, and $\Gamma$ = Gamma function, which for a positive integer argument can be evaluated with the factorial function, $\Gamma(k) = (k-1)!$. The term $\mathrm{cdf}(0)$ accounts for data with zero value, meaning that there is no rain on that day (dry spell) [4]. The result of this bias correction procedure was then used to correct the model of the predicted rainfall in 2016.
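The idea of matching the observed and predicted cdfs can be sketched with an empirical version of quantile mapping. Note the substitution: the study fits parametric distributions, whereas this sketch pairs the sorted hindcast and observed values rank by rank and interpolates linearly between the pairs. The function name and all numbers are hypothetical.

```python
import bisect

def make_transfer_function(hindcast, observed):
    """Build an empirical quantile-mapping transfer function.

    The i-th smallest hindcast value is mapped to the i-th smallest observed
    value, so both share the same empirical cdf level. Values in between are
    interpolated linearly; values outside the hindcast range are clamped.
    """
    if len(hindcast) != len(observed):
        raise ValueError("this sketch assumes samples of equal length")
    h = sorted(hindcast)
    o = sorted(observed)

    def transfer(x):
        if x <= h[0]:
            return o[0]
        if x >= h[-1]:
            return o[-1]
        i = bisect.bisect_right(h, x)  # guarantees h[i-1] <= x < h[i]
        weight = (x - h[i - 1]) / (h[i] - h[i - 1])
        return o[i - 1] + weight * (o[i] - o[i - 1])

    return transfer

# Hypothetical monthly rainfall (mm): model hindcast vs. station observations.
hindcast = [100.0, 150.0, 200.0, 250.0, 300.0]
observed = [180.0, 210.0, 260.0, 330.0, 400.0]
transfer = make_transfer_function(hindcast, observed)

# Apply the transfer function to hypothetical ensemble members of a new forecast.
forecast_members = [120.0, 175.0, 260.0]
corrected_members = [transfer(x) for x in forecast_members]
```

Clamping outside the hindcast range is a simplification; a fitted parametric cdf, as used in the study, extrapolates beyond the historical extremes instead.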

2.2.4 Prediction Evaluation and Determination of the Best Model. The prediction was evaluated by comparing the mean values and standard deviations of the pdfs of the rainfall models. The best model was determined by two metrics, x and y (8), which are built from the following quantities: the mean of the pdf of the corrected rainfall, the mean of the pdf of the rainfall before correction, the observed monthly rainfall, the standard deviation of the pdf of the corrected rainfall, and the standard deviation of the pdf of the rainfall before correction. The statistical bias correction performs well when the values of both metrics lie within the unit interval [8][9].

3. Results

General Description of Data
The general pattern of the ECMWF rainfall data and of the observed rainfall data from the Indonesian Agency for Meteorological, Climatological and Geophysics is shown in Figure 1. Figure 1(a) shows that the ECMWF rainfall data are gridded. Gridded data are produced by interpolation, that is, by estimating values in areas that are not sampled or measured, so that a map of the distribution over the region is formed. The rainfall data have four variables: average daily rainfall, latitude, longitude, and time. Figure 1(b) shows the monthly rainfall observed by the Indonesian Agency for Meteorological, Climatological, and Geophysics at Sanglah station in January. Both data sets cover 1996-2016: the 1996-2015 data are used to establish the relation between the two types of data, and the 2016 data serve as the basis for verifying the predicted rainfall.

Data Extraction
Extracting the gridded ECMWF rainfall data with MATLAB software yielded 11 ensemble members for every observation month in 1996-2015. The extraction for 2016 yielded 51 ensemble members.

Identification of Data Distribution
The most suitable distribution for the data is the one with the smallest values of NLogL (negative of the log likelihood), BIC (Bayesian information criterion), AIC (Akaike information criterion), and AICc (AIC with a correction for finite sample sizes) in Table 1. Based on Table 1, the ECMWF rainfall data for the Sanglah region in January 1996-2015 follow a Weibull distribution.

Bias Correction
In this research, rainfall bias correction was performed as a step in building a new predicted rainfall model. The ECMWF rainfall model for 1996-2015 was corrected based on the observed rainfall in the same period.
This process was performed for every month at each location, so every location yields a rainfall model for January to December 2016. The statistical bias correction process is illustrated in Figure 2. The result of the bias correction, obtained from the relation between the observed and predicted cdfs, is then presented as a PDF plot so that it is easier to interpret. Figure 3 shows that the bias correction method identifies the bias of the historical rainfall forecast model well. The transfer function obtained from the bias correction process can be used to improve the distribution of the 2016 rainfall prediction in the Sanglah area. This can be seen from the 2016 rainfall prediction model for Sanglah, which has been improved using the bias information from the historical rainfall model.

The Best Prediction Model
After the procedures were applied, the results can be analysed from the following tables. Table 2 shows that the predicted rainfall model for Sanglah with the interpolation approach is good for prediction in March, April, May, July, and October. This can be seen from the x and y values, which meet the criteria. In these months, the x value lies within the required range, which means that the mean of the corrected rainfall distribution is not far from the actual rainfall. This value also implies that the accuracy of the new prediction model is better than that of the prediction model before the rainfall bias is corrected. A y value of less than 1 indicates that the actual rainfall is estimated by an improved distribution with a reduced standard deviation. This value also shows that the new prediction model has better precision than the prediction model before the rainfall bias is corrected. In the other months, the predicted rainfall model after correction can be considered fairly good, since it meets one of the two criteria.
Table 3 shows that the corrected rainfall prediction modelled with the closest-point approach also gives good results in some cases, namely in February and May. However, it does not perform well in some months, such as September and December, which do not meet the criteria. An x value that does not meet the criterion indicates that the mean of the corrected rainfall distribution in these months is far from the actual rainfall. In addition, a larger y value indicates that larger uncertainties were obtained after the procedure was applied. This value also indicates that the precision of the new prediction model is no better than that of the prediction model before its rainfall bias is corrected. The closest-point method is therefore less effective than the interpolation method. For the other areas of the island of Bali, the rainfall estimates obtained with interpolation are generally better than those obtained with the nearest-point approach. The results of the evaluation of the bias correction process for all observation areas on the island of Bali are shown in Figures 4 and 5.
The results averaged over all regions meet the criteria. In general, during the rainy season months of February, March, November and December, bias correction gives better results than in other months for all observation areas on the island of Bali. In these months the corrected model meets at least one of the criteria for the best model, namely the mean value for accuracy and the standard deviation for precision [8][9]. The results obtained with the nearest-point approach, by contrast, show no clear relation to the seasons that occur in Indonesia.
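The reading of the x and y values above can be made concrete with a small sketch. The exact form of the metrics is not recoverable from the text, so the definitions here are assumptions consistent with the stated interpretation: x is taken as the distance of the corrected mean from the observation divided by the distance of the uncorrected mean from the observation, and y as the corrected standard deviation divided by the uncorrected one. The function names and sample values are hypothetical.

```python
import statistics

def correction_metrics(corrected, raw, observed):
    """Return (x, y) under the assumed definitions.

    x = |mean(corrected) - observed| / |mean(raw) - observed|   (accuracy)
    y = stdev(corrected) / stdev(raw)                           (precision)
    Assumption: these forms are inferred, not taken from the paper's equation.
    """
    raw_error = abs(statistics.mean(raw) - observed)
    if raw_error == 0:
        raise ValueError("uncorrected mean equals the observation; x is undefined")
    x = abs(statistics.mean(corrected) - observed) / raw_error
    y = statistics.stdev(corrected) / statistics.stdev(raw)
    return x, y

def meets_criteria(x, y):
    """Both metrics inside the unit interval means the correction helped."""
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0

# Hypothetical ensemble members (mm/month) before and after correction.
raw_members = [180.0, 220.0, 200.0, 240.0, 160.0]
corrected_members = [290.0, 310.0, 300.0, 320.0, 280.0]
observed_rainfall = 310.0

x, y = correction_metrics(corrected_members, raw_members, observed_rainfall)
improved = meets_criteria(x, y)
```

Under these assumptions, x below 1 means the corrected mean lies closer to the observation than the uncorrected mean, and y below 1 means the corrected distribution is narrower.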

Conclusion
The bias correction method identifies the bias of the historical rainfall forecast model well. The transfer function obtained from the bias correction process can be used to improve the distribution of the 2016 rainfall prediction on the island of Bali and thus to obtain a better prediction. In general, during the rainy season months of February, March, November, and December, bias correction with the interpolation approach gives better results than in the dry season. By contrast, the nearest-point approach has in general not given good results in either the rainy or the dry season. This may be due to the limited amount of rainfall data used in this study. Further research could use more rainfall data in order to produce a better distribution.