Evaluation of bias correction methods for multi-satellite rainfall estimation products

Satellite rainfall data offer hydrological studies distinct temporal and spatial advantages. However, satellite data contain biases, so robust validation and correction methods based on ground observations are necessary. This research corrects and validates multi-satellite rainfall data (TRMM, GPM-IMERG, and GSMaP) to enable hydrological applications. The correction methods are linear scaling (LS), empirical quantile mapping (EQM), and local intensity scaling (LOCI). Three statistical metrics are used for validation: the correlation coefficient (R), root mean squared error (RMSE), and relative bias (RB). Based on ten years of monthly data from the Kuranji watershed, LS and EQM emerged as the optimal bias correction methods for all satellites, while LOCI performed well for TRMM and GSMaP. Monthly rainfall corrected with LS and EQM closely matches the observed data, substantially reducing the discrepancies between field records and satellite-derived rainfall. This improves the usability of satellite data for in-depth hydrological studies.


Introduction
Rainfall estimation is of great importance in various disciplines [1], [2]. Rainfall data are used as inputs for physical models in disaster-related weather and hydrological forecasting and for effective disaster management. Rainfall estimates are available from several sources, such as gauge station observations, satellites (e.g., TRMM), and weather radar. Gauge station observations are considered the most accurate because they measure rainfall at ground level, but they are sparse, so spatial interpolation is used to estimate rainfall at unmeasured locations [3], [4], [5]. Another approach is to use remote sensing, such as satellite data, to measure the distribution of rainfall over an area [6], [7]. However, the accuracy of remotely sensed rainfall estimates is still debated and varies across regions [8], [9].
A wide variety of satellite products is currently available and widely used. They are derived from well-known missions and algorithms such as the Integrated Multi-satellitE Retrievals for GPM (IMERG), GSMaP, TRMM, and CMORPH [10], [11], and they are freely accessible. All of these products were created with algorithms based on thermal infrared or passive microwave observations. These technologies have advanced significantly, making high-resolution spatial and temporal data available worldwide. However, the products still have limitations, including biases originating from multiple sources, such as indirect measurements [12], sampling uncertainties [13], and constraints imposed by sensor capabilities [14].
A major challenge with satellite-based rainfall estimates (SREs) is the notable bias they exhibit compared with ground-based data. Consequently, bias correction is imperative before SREs are used in hydrological and climatological investigations. Various bias-correction methods have been employed, including linear scaling [15], local intensity scaling [16], power transformation [17], and distribution mapping [18], [19], [20]. Linear scaling was applied to correct biases in IMERG and TRMM 3B42V7 rainfall estimates in Myanmar, with the primary aim of improving the accuracy of daily and monthly streamflow simulations [20]. In Indonesia, the power transformation method was used to correct biases in TRMM 3B42RT estimates, and the adjusted values performed satisfactorily compared with gauge data [21]. Distribution mapping played a crucial role in correcting TRMM 3B42RTV7 estimates in both China [22] and India [23]. This method was also applied to estimates from several satellite products, namely CMORPH, TRMM 3B42V7, SM2R-CCI, TAMSAT, and CFSR, across the Upper Blue Nile basin [24].
In Indonesia, research on satellite-derived rainfall has mainly used a restricted set of data sources, such as TRMM and PERSIANN, with a predominant focus on water resources. In contrast, other data sources such as GPM-IMERG and GSMaP remain significantly underexplored and have rarely been linked to water resources analysis. Previous studies have used TRMM data to examine the three types of rainfall pattern in Indonesia (monsoonal, equatorial, and local) and derived correction equations for each type [25]. Hence, further research on satellite rainfall analysis covering a wider range of satellite data sources is clearly needed. Such research should follow the fundamental principles of water resources management, with particular emphasis on understanding how rainfall is transformed into discharge within the study area, typically a watershed.
Given the significance of Satellite-Based Rainfall Estimation (SRE) in hydrological modeling, particularly for watersheds lacking ground-based gauge coverage, this study consolidates and builds on the findings of prior research. Previous studies have validated rainfall estimates for the Kuranji watershed using data from the TRMM and GPM-IMERG satellites [26]. Building on this work, our study analyzes three high-resolution (daily) SREs, namely TRMM 3B42RT, GPM-IMERG, and GSMaP, and assesses them against ground-based rainfall data spanning a decade, from 2010 to 2019. We also investigate bias correction methods to improve the performance of SREs in hydrological applications.
Through this research, we aim to improve the precision of Satellite-Based Rainfall Estimation and its suitability for hydrological modeling in areas with sparse ground-based gauges.

Study area
The Kuranji watershed in Padang, West Sumatra, was selected as the study area (Figure 1). It lies between 0°48' and 0°56' South Latitude and between 100°20' and 100°34' East Longitude. The area is bounded by the subdistricts of Pauh, North Padang, Nanggalo, Kuranji, and Koto Tangah. It is situated on the west coast of Sumatra, adjacent to Padang City and Solok Regency [27].

Datasets
The data used are secondary data. Daily rainfall data for the Kuranji watershed were used for ten years, from 2010 to 2019. Daily rainfall data from the observation stations and the satellite data were collected for the research location and for the period covered by the station records. The daily rainfall data were then converted into monthly data.
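The conversion from daily to monthly data can be sketched with pandas as below. This is a minimal sketch, not the authors' procedure: it assumes monthly totals (the sum of daily values, in mm) rather than monthly means, and it assumes that a month with no valid daily values should stay missing instead of being reported as 0 mm, since the text does not state how gaps were handled. The function name `daily_to_monthly` is hypothetical.

```python
import numpy as np
import pandas as pd


def daily_to_monthly(daily: pd.Series) -> pd.Series:
    """Sum a daily rainfall series (mm/day, DatetimeIndex) into monthly totals (mm).

    Assumption: min_count=1 returns NaN for a month whose daily values are all
    missing, so data gaps are not silently reported as 0 mm of rain.
    """
    # "MS" groups the data into calendar months labelled by their first day.
    return daily.resample("MS").sum(min_count=1)


if __name__ == "__main__":
    # Synthetic example: 1 mm every day in Jan-Feb 2010, March 2010 entirely missing.
    idx = pd.date_range("2010-01-01", "2010-03-31", freq="D")
    daily = pd.Series(1.0, index=idx)
    daily[daily.index.month == 3] = np.nan
    print(daily_to_monthly(daily))
```

In this example January sums to 31 mm, February to 28 mm, and March stays missing.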

Rain Gauge Data
The gauge data for 2010-2019 come from three observation stations in the Kuranji watershed: Batu Busuk, Koto Tuo, and Gunung Nago. They are daily precipitation data from the West Sumatra PSDA service.

Satellite Rainfall Products
This study assesses the TRMM, GPM-IMERG, and GSMaP satellite rainfall products. TRMM is a cooperative NASA-JAXA project launched on November 27, 1997. Its primary objectives are to monitor and study tropical precipitation and to promote studies of global climate change [28], [29]. TRMM rainfall data from product 3B42, version 7, were retrieved through the Giovanni platform (GES DISC Online Visualization and Analysis Infrastructure), available at https://giovanni.gsfc.nasa.gov/. The daily TRMM measurements were aggregated and evaluated at a spatial resolution of 0.25° x 0.25° [30].
The GPM mission, jointly developed by NASA and JAXA, is designed to measure global rainfall using remote sensing instruments. Launched in 2014, it succeeds TRMM, which focused exclusively on tropical rainfall. GPM-IMERG has three main products, IMERG Early, IMERG Late, and IMERG Final, which differ in accuracy and latency. These products are highly valuable for hydrological analysis, particularly in regions lacking ground-based rainfall stations. In addition, GES DISC offers datasets such as GPM IMERG Final Precipitation L3 1 day 0.10 x 0.10 V06 and GPM Level 3 IMERG Late Daily 10 x 10 km, further expanding the resources available for global rainfall analysis [29].
GSMaP was developed in conjunction with the Precipitation Measuring Mission (PMM) of the Japan Aerospace Exploration Agency (JAXA). The PMM team produced three precipitation datasets for the GSMaP project: the Global Satellite Mapping of Precipitation Microwave-IR Combined Product (GSMaP-MVK), the Gauge-calibrated Rainfall Product (GSMaP-Gauge), and the Global Rainfall Map in Near Real Time (GSMaP-NRT) [31].
IOP Publishing doi:10.1088/1755-1315/1317/1/012008

Bias correction methods
Bias correction aims to eliminate the differences between observed rainfall and unprocessed satellite data. This study compared three commonly used bias correction methods: linear scaling (LS), local intensity scaling (LOCI), and empirical quantile mapping (EQM) [32], [33], [34]. When applied to rainfall variables, these methods range in complexity from basic scaling to more advanced quantile mapping (QM) approaches [35], [36].

Linear scaling (LS)
The linear scaling approach reduces biases by aligning the monthly mean of the adjusted rainfall with that of the observed data through a multiplicative factor [15]. The method involves two steps. First, the monthly mean of the observed data is divided by the monthly mean of the raw (uncorrected) satellite data, which represents the model's historical (control) record; this yields the monthly scaling factor. Second, each daily value of the uncorrected precipitation in the relevant month is multiplied by that scaling factor. This approach is represented mathematically as follows [37]:
$$P_{cor,m,d} = P_{raw,m,d} \times \frac{\mu(P_{obs,m})}{\mu(P_{raw,m})} \qquad (1)$$
where Pcor,m,d denotes the corrected precipitation on the dth day of the mth month, Praw,m,d denotes the raw precipitation on the same day, and µ(.) is the expectation (mean) operator; for example, µ(Pobs,m) is the mean observed precipitation for month m.
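Equation (1) can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code: it assumes the raw satellite series and the gauge series are aligned arrays covering the same calibration period, each value tagged with its calendar month (1-12), and it keeps the factor at 1.0 when a month is absent or its raw mean is zero. The function names `ls_factors` and `ls_apply` are hypothetical.

```python
import numpy as np


def ls_factors(raw, obs, months):
    """Monthly linear-scaling factors mu(P_obs,m) / mu(P_raw,m), index 0..11 = Jan..Dec."""
    raw = np.asarray(raw, dtype=float)
    obs = np.asarray(obs, dtype=float)
    months = np.asarray(months, dtype=int)
    factors = np.ones(12)
    for m in range(1, 13):
        sel = months == m
        if not sel.any():
            continue  # month not present: keep factor 1.0 (assumption)
        raw_mean = raw[sel].mean()
        if raw_mean > 0:  # avoid division by zero for an all-dry raw month
            factors[m - 1] = obs[sel].mean() / raw_mean
    return factors


def ls_apply(raw, months, factors):
    """Eq. (1): multiply each raw value by the factor of its calendar month."""
    raw = np.asarray(raw, dtype=float)
    months = np.asarray(months, dtype=int)
    return raw * factors[months - 1]


if __name__ == "__main__":
    raw = [10.0, 20.0, 5.0, 5.0]    # hypothetical satellite values (mm)
    obs = [15.0, 30.0, 5.0, 15.0]   # hypothetical gauge values (mm)
    months = [1, 1, 2, 2]
    f = ls_factors(raw, obs, months)
    print(f[:2])                    # January 1.5, February 2.0
    print(ls_apply(raw, months, f))
```

Only the calendar-month label is used, so the same sketch works for daily or monthly values; with twelve months present it yields the 12 coefficients mentioned in the results. After correction, the monthly mean of the corrected series equals the observed monthly mean by construction.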

Local intensity scaling (LOCI)
LOCI was introduced as an extension of linear scaling (LS). In addition to the monthly precipitation means, LOCI also corrects the frequency and intensity of wet days. The adjusted precipitation values are determined in the following steps [16], [35]:
1) A wet-day threshold for month m, Pth,m, is calibrated from the daily time series of raw model rainfall so that the number of model days at or above the threshold matches the number of observed wet days (rainfall > 0 mm). Raw values below the threshold are set to zero:
$$P^{*}_{raw,m,d} = \begin{cases} 0, & P_{raw,m,d} < P_{th,m} \\ P_{raw,m,d}, & \text{otherwise} \end{cases} \qquad (2)$$
2) The scaling factor (Sf) is calculated from the long-term monthly mean intensity of wet days only, defined as days with observed rainfall above 0 mm and model days with rainfall above the calibrated threshold Pth,m:
$$Sf_{m} = \frac{\mu(P_{obs,m,d} \mid P_{obs,m,d} > 0)}{\mu(P^{*}_{raw,m,d} \mid P^{*}_{raw,m,d} > 0)} \qquad (3)$$
3) Bias-free model data are obtained by multiplying the thresholded raw data by the scaling factor:
$$P_{cor,m,d} = P^{*}_{raw,m,d} \times Sf_{m} \qquad (4)$$
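Steps 1 to 3 above can be sketched month by month as follows. This is an illustrative implementation under stated assumptions, not the authors' code: raw and gauge values are aligned arrays for the calibration period tagged with calendar months; the threshold is taken as the n-th largest raw value, where n is the number of observed wet days, so ties at the threshold can retain slightly more days; months with no observed wet days are set to zero; and a scaling factor of 1.0 is used when no model wet days remain. The function name `loci_correct` is hypothetical.

```python
import numpy as np


def loci_correct(raw, obs, months):
    """Local intensity scaling (Eqs. 2-4), calibrated separately for each calendar month.

    Returns the corrected series, the 12 thresholds P_th,m and the 12 factors Sf_m
    (NaN for months that are absent or have no observed wet days).
    """
    raw = np.asarray(raw, dtype=float)
    obs = np.asarray(obs, dtype=float)
    months = np.asarray(months, dtype=int)
    corrected = np.zeros_like(raw)
    thresholds = np.full(12, np.nan)
    factors = np.full(12, np.nan)
    for m in range(1, 13):
        sel = months == m
        if not sel.any():
            continue
        r, o = raw[sel], obs[sel]
        n_wet = int((o > 0).sum())          # observed wet days (> 0 mm)
        if n_wet == 0:
            corrected[sel] = 0.0            # assumption: no observed rain -> all dry
            continue
        # Step 1: threshold so that (about) n_wet raw values are >= P_th,m.
        ranked = np.sort(r)[::-1]           # descending order
        th = ranked[n_wet - 1]
        r_star = np.where(r < th, 0.0, r)   # Eq. (2)
        # Step 2: ratio of mean wet-day intensities, Eq. (3).
        wet_model = r_star[r_star > 0]
        sf = o[o > 0].mean() / wet_model.mean() if wet_model.size else 1.0
        # Step 3: scale the thresholded raw values, Eq. (4).
        corrected[sel] = r_star * sf
        thresholds[m - 1], factors[m - 1] = th, sf
    return corrected, thresholds, factors


if __name__ == "__main__":
    raw = [0.5, 3.0, 6.0, 1.0]   # hypothetical satellite daily values (mm), one month
    obs = [0.0, 4.0, 8.0, 0.0]   # hypothetical gauge values: 2 wet days, mean 6 mm
    corrected, th, sf = loci_correct(raw, obs, [1, 1, 1, 1])
    print(corrected, th[0], sf[0])
```

In the example, the two small raw values fall below the calibrated threshold (3 mm) and become dry days, so the corrected series has the same number of wet days as the gauge and the same mean wet-day intensity.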

Empirical quantile mapping (EQM)
Quantile mapping (QM) is a versatile non-parametric approach that adapts to various forms of rainfall distribution [35]. It efficiently corrects biases in the quantiles, standard deviation, mean, wet-day frequency, and other statistics [38]. The EQM approach, a QM-based method, calibrates the model's cumulative distribution function (CDF) by combining the average rainfall, the quantiles of the observed rainfall distribution, and the delta changes in the associated quantiles [39]. Through quantile-to-quantile matching, the distribution of the model is matched to that of the observations [18]. This produces a transfer function that converts the raw model rainfall into corrected rainfall. After this adjustment, the CDFs of the corrected model and the observations should, in theory, be equal:
$$P_{cor,m,d} = ECDF^{-1}_{obs,m}\left(ECDF_{raw,m}(P_{raw,m,d})\right) \qquad (5)$$
In Equation (5), ECDF^-1_obs,m is the inverse of the empirical CDF of the observed data for month m, and ECDF_raw,m is the empirical CDF of the raw satellite rainfall data.
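Equation (5) can be sketched with NumPy as shown below, building one empirical transfer function per calendar month. This is a minimal sketch under assumptions not stated in the text: raw and gauge values are aligned arrays without missing values, the transfer function is built and applied on the same calibration period, and the inverse ECDF is taken as the smallest observed value whose ECDF reaches the raw value's probability. The function name `eqm_correct` is hypothetical.

```python
import numpy as np


def eqm_correct(raw, obs, months):
    """Empirical quantile mapping, Eq. (5), with one transfer function per calendar month.

    Each raw value gets the ECDF probability p = k / n_raw (k = number of raw values
    <= it) and is replaced by the smallest observed value whose ECDF is >= p.
    """
    raw = np.asarray(raw, dtype=float)
    obs = np.asarray(obs, dtype=float)
    months = np.asarray(months, dtype=int)
    corrected = np.full_like(raw, np.nan)
    for m in range(1, 13):
        sel = months == m
        if not sel.any():
            continue
        r_sorted = np.sort(raw[sel])
        o_sorted = np.sort(obs[sel])
        n_raw, n_obs = r_sorted.size, o_sorted.size
        # ECDF_raw,m(x) = k / n_raw, with k = count of raw values <= x.
        k = np.searchsorted(r_sorted, raw[sel], side="right")
        # ECDF^-1_obs,m(k / n_raw): index ceil(k * n_obs / n_raw) - 1 in the sorted
        # observations, computed with integers to avoid floating-point rounding.
        j = -(-k * n_obs // n_raw) - 1
        corrected[sel] = o_sorted[j]
    return corrected


if __name__ == "__main__":
    raw = [4.0, 1.0, 1.0, 2.0]       # hypothetical satellite values for one month (mm)
    obs = [10.0, 20.0, 30.0, 40.0]   # hypothetical gauge values for the same month (mm)
    print(eqm_correct(raw, obs, [1, 1, 1, 1]))
```

Because each value is replaced by an observed value of the same rank, the corrected sample reproduces the observed distribution for the calibration period. Applying the transfer function to a different period would also need a rule for raw values outside the calibrated range, which this sketch does not provide.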

Performance Evaluation of the Bias Correction Method
The correlation coefficient (R), root mean square error (RMSE), and relative bias (RB) were used to evaluate the effectiveness of the three bias correction techniques. These metrics quantify the differences between the output of each bias correction method and the observed values during validation. They are defined as follows:
$$R = \frac{\sum_{i=1}^{n}(G_i-\bar{G})(S_i-\bar{S})}{\sqrt{\sum_{i=1}^{n}(G_i-\bar{G})^2}\sqrt{\sum_{i=1}^{n}(S_i-\bar{S})^2}}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(S_i-G_i)^2}$$
$$RB = \frac{\sum_{i=1}^{n}(S_i-G_i)}{\sum_{i=1}^{n}G_i}$$
where n denotes the sample count, Si is the satellite-estimated precipitation, Gi is the observed rain gauge precipitation, and the overbars denote means over the n samples.
The correlation coefficient (R) describes the relationship between the gauge rainfall data and the SRE estimates. R ranges from −1 to 1; values closer to 1 indicate stronger agreement between the SRE data and the gauge, with 1 denoting perfect correlation. The RMSE, which ranges from 0 to +∞, measures the average magnitude of the differences between the estimated and observed values; the closer the RMSE is to zero, the more accurate the estimate. The Relative Bias (RB) expresses the relative difference between the estimated and observed volumes. Positive values indicate overestimation, and negative values indicate underestimation [40], [41], [42].
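The three criteria can be computed with NumPy as sketched below. The function name `evaluate` is hypothetical, and RB is returned as a fraction (multiply by 100 for a percentage); the text does not say which convention Table 1 uses.

```python
import numpy as np


def evaluate(sat, gauge):
    """Return (R, RMSE, RB) of satellite estimates S_i against gauge values G_i."""
    s = np.asarray(sat, dtype=float)
    g = np.asarray(gauge, dtype=float)
    r = np.corrcoef(g, s)[0, 1]              # Pearson correlation coefficient
    rmse = np.sqrt(np.mean((s - g) ** 2))    # same unit as the data (mm)
    rb = np.sum(s - g) / np.sum(g)           # > 0 overestimation, < 0 underestimation
    return r, rmse, rb


if __name__ == "__main__":
    gauge = [100.0, 200.0, 300.0, 400.0]     # hypothetical monthly gauge totals (mm)
    sat = [110.0, 190.0, 310.0, 390.0]       # hypothetical corrected satellite totals (mm)
    r, rmse, rb = evaluate(sat, gauge)
    print(f"R = {r:.3f}, RMSE = {rmse:.1f} mm, RB = {rb:.3f}")
```

The example shows why the three metrics are reported together: errors of +10 and -10 mm cancel in RB (0) but not in RMSE (10 mm), while R stays close to 1. Conversely, a satellite series equal to 0.8 times the gauge series has R = 1 but RB = -0.2.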

Results and discussion
Rainfall data from the Kuranji watershed and multi-satellite rainfall data from 2010 to 2019 are plotted in Figure 2, and the statistical parameter values are computed. Comparing the recorded rainfall with the satellite sources (TRMM, GPM-IMERG, and GSMaP) reveals disparities in rainfall measurements, as depicted in Figure 2. These discrepancies can arise from various factors, such as sensor inaccuracies [43] and differences in retrieval algorithms [44]. Other studies attribute such incongruities to factors like cloud characteristics, climate, seasonality, geographical location, and topographic relief [45], [46].
Correction of satellite-derived rainfall data is often limited by the spatial and temporal availability of observed rainfall data. Challenges arise from discontinuous recording, insufficient length of rainfall time series, data gaps, and uneven distribution of rain gauge stations [47].

Bias correction
The Linear Scaling (LS) method relies on Equation 1, which calculates the ratio between the average observational data and the average satellite data. This ratio is calibrated on monthly data, resulting in 12 coefficients. Figure 3 compares the corrected multi-satellite data with the observational data, using the ten-year average of monthly rainfall. The rainfall patterns of the corrected TRMM and GSMaP data are strikingly similar to the observed monthly rainfall. This similarity is particularly notable for rainfall depths exceeding 200 mm, where the corrected TRMM values closely align with the observed rainfall. However, Figure 3 also shows a consistent underestimation of the average monthly rainfall by all three satellites. This underestimation is most pronounced for the GPM-IMERG satellite, whose peak rainfall of 143.62 mm occurs in January. The corrected TRMM and GSMaP data, on the other hand, show the rainy season peaking in February, with values above 200 mm. The LS method adjusts the daily rainfall series for each month on the basis of monthly averages. By closely aligning the monthly mean values, this process effectively reduces the bias in the mean [36]. The other two methods also substantially reduce the difference in mean values, albeit to a slightly lesser degree [37].

Figure 3. Results of bias correction data using linear scaling (LS) method
Figure 4 depicts the outcomes of bias reduction using the LOCI method. The corrected TRMM, GPM-IMERG, and GSMaP data exhibit a pattern very similar to the observed monthly rainfall trend, particularly from June to October. Across months, all three corrected satellite datasets consistently tend to overestimate the average monthly rainfall. This overestimation is most prominent during the rainy season, which shows distinct peaks in February and July. February stands out as the wettest month in all three satellite datasets, while August is the driest.
Figure 5 illustrates the outcomes of mitigating data bias through the EQM approach. The corrected GPM-IMERG data typically underestimate the observed data, whereas the corrected TRMM and GSMaP data typically overestimate the observed rainfall values. Notably, the highest rainfall in the corrected GPM-IMERG and TRMM data occurs in December, while for the corrected GSMaP data the wettest month is June.
Figure 5. Results of bias correction data using the Empirical Quantile Mapping (EQM) method
The study's findings imply that rainfall in the Kuranji watershed follows an equatorial pattern. This conclusion is drawn from the results of bias correction applied to the rainfall data using the three approaches. The region has a bimodal monthly rainfall distribution, with two peaks during the rainy season, and most of the year meets wet-season criteria. This pattern is influenced by the apparent movement of the sun across the equator, particularly in the equatorial zone where West Sumatra is located [48].
Among the three bias-correction techniques, the LS method stands out as the most effective, showing the best overall agreement between the bias-corrected long-term averages and the observed values. The LOCI approach performs satisfactorily in wet months but does not adequately correct the mean across the entire time series. The EQM-corrected data agree satisfactorily with observations during wet and post-monsoon months. Notably, the LS method corrects the monthly mean rainfall values, as has also been reported for raw regional climate model (RCM) data [49]. However, this method applies the same correction factor to all events within a month, so it cannot account for differences in wet-day frequency.

Performance Evaluation of the Bias Correction Method
The statistical parameter values (R, RMSE, and RB) for the validation of the multi-satellite data after bias correction are summarized in Table 1. Overall, the TRMM data outperform the GPM-IMERG and GSMaP data. The TRMM data show a particularly strong relationship with the gauge data, with a correlation coefficient (R) of 0.95. In addition, the relative bias (RB) of the TRMM data is lower than that of the GPM-IMERG and GSMaP data. Although the R values of the multi-satellite data are satisfactory, the RMSE values remain large, particularly for the GPM-IMERG dataset. The higher RMSE values of the corrected GSMaP data also indicate a substantial degree of error relative to the gauge data, which is primarily attributed to the tendency of corrected GSMaP data to overestimate [50].
A comparison of the bias correction results using the Local Intensity Scaling (LOCI) approach shows that all three satellite datasets correlate very significantly with the observation data. The GSMaP rainfall data perform better than the TRMM and GPM-IMERG data, with a high correlation coefficient (R) of 0.87. Even though the multi-satellite data have fairly good R values, the RMSE values are still high, particularly for the GPM-IMERG data. As with the previous results, this discrepancy is attributed to the tendency of corrected GSMaP data to overestimate [50]. Meanwhile, the RB values for all three satellites are negative, indicating that the multi-satellite estimates are systematically lower than the rain gauge data.
The comparison of bias correction results using Empirical Quantile Mapping (EQM) also shows a very significant overall association between the three satellite datasets and the observational data. The GSMaP data, in particular, exhibit a very strong correlation (R = 0.93), outperforming the TRMM and GPM-IMERG data. However, the RMSE values deviate substantially. The RB values for all three satellites are negative, indicating that the multi-satellite estimates consistently underestimate the rain gauge data. For the EQM method, TRMM satellite data can be considered the most suitable choice, given their strong correlation, relatively low RMSE, and favorable RB compared with the other two satellites.
Over the 10-year validation period, the Empirical Quantile Mapping (EQM) method emerges as the most suitable for correcting TRMM, GPM-IMERG, and GSMaP rainfall data, as shown by correlation coefficients (R) exceeding 0.90 for all three satellites. For GPM-IMERG data, bias correction with the EQM method produces the lowest RMSE value and an RB value close to the ideal value of zero. In contrast, the LOCI correction of GPM-IMERG data performs comparatively poorly, with a correlation coefficient of 0.75 and a large RMSE.
According to the performance results for the Kuranji watershed, the LS and EQM methods significantly outperform the LOCI method. Notably, all three bias correction methods reduce bias effectively. LS and EQM are the best choices across all the satellite products considered (TRMM, GPM-IMERG, and GSMaP), whereas LOCI performs comparatively well for TRMM and GSMaP. In most cases, the LS and EQM methods are more effective than LOCI at mitigating both positive and negative bias [37]. The LS method outperformed the other methods because it effectively removed bias from average daily rainfall. This is consistent with earlier research [15], [35], [36], which also attests to the method's usefulness in removing bias from daily rainfall data.

Conclusion
This study applied bias correction methods to satellite rainfall estimates for the Kuranji watershed, with the primary aim of reducing the discrepancy between satellite-derived rainfall data and ground rainfall records. EQM and LS were the most effective bias reduction methods across all three satellite datasets, while LOCI showed lower efficacy. The strength of LS is attributed to its ability to correct bias in monthly mean rainfall. However, the validity of this bias correction approach under climate change scenarios remains uncertain. Future research should explore new bias correction procedures or improvements in satellite capabilities. The study also showed that, after bias correction, the selected multi-satellite data perform well enough in the Kuranji watershed to be suitable for broader hydrological assessments.

Figure 4. Results of bias correction data using the Local Intensity Scaling (LOCI) method


Table 1. Comparison of the statistical analysis results of multi-satellite data after bias reduction