Significant advancement in subseasonal-to-seasonal summer precipitation ensemble forecast skills in China mainland through an innovative hybrid CSG-UNET method

Reliable subseasonal-to-seasonal (S2S) precipitation forecasts are critical for disaster prevention and mitigation. In this study, an innovative hybrid method, CSG-UNET, which combines the UNET with the censored and shifted gamma distribution based ensemble model output statistics (CSG-EMOS), is proposed to calibrate ECMWF ensemble precipitation forecasts over the China mainland during boreal summer. Additional atmospheric variable forecasts and data augmentation are included to address the low signal-to-noise ratio and the relatively small sample sizes that limit traditional S2S precipitation forecast correction. The hybrid CSG-UNET shows a notable advantage over both the individual UNET and CSG-EMOS in improving ensemble precipitation forecasts: it improves forecast skill at lead times of 1–2 weeks and extends the skillful forecast lead time to ∼4 weeks. Specifically, compared with the raw ECMWF forecasts, the climatology-based Brier skill scores for extreme precipitation are improved by ∼0.4 at almost all lead times. Feature importance analysis of the CSG-UNET model indicates that the atmospheric predictors contribute increasingly to the prediction skill as the lead time increases. The CSG-UNET method is promising for subseasonal precipitation forecasting and could be applied to routine forecasts of other atmospheric and oceanic phenomena in the future.


Introduction
S2S precipitation forecasts play an important role in disaster prevention and mitigation, filling the gap between weather forecasts (up to 2 weeks) and climate predictions (up to 3 months) (Liang et al 2018, Vigaud et al 2019). However, S2S forecasting has advanced relatively slowly and remains a major challenge for mission agencies and the research community: the lead time is long enough that the memory of the atmospheric initial conditions fades rapidly, yet too short to make full use of information from slowly varying boundary forcings (Fan et al 2022, Vitart et al 2012, Lyu et al 2022, Zhu et al 2023a).
In recent years, S2S forecasts have benefited considerably from progress in understanding the sources of S2S predictability, such as the El Niño-Southern Oscillation, the Madden-Julian Oscillation and stratosphere-troposphere interactions (Liang and Lin 2018, Kolstad et al 2021, Wang et al 2022). Meanwhile, improvements in numerical models, observing systems, data assimilation and computing resources have also promoted the development of S2S forecasts (Feng et al 2020, 2022, Zhang et al 2021). Several meteorological services have now developed forecasting systems that provide S2S precipitation forecasts, which have been applied in water resources management and associated early warnings. However, raw forecasts from S2S models tend to suffer from substantial biases and only marginal predictive skill (Li et al 2021, Lyu et al 2022, Yin et al 2023). Forecast calibration is therefore necessary before these forecasts are used in hydrological modeling and decision support systems.
Several post-processing methods have been proposed to obtain reliable probabilistic guidance from ensemble precipitation forecasts, such as the Bayesian joint probability model (Wang et al 2009), ensemble model output statistics (EMOS; Gneiting et al 2005, Scheuerer 2014) and isotonic distributional regression (Henzi et al 2021). These approaches have been widely used in weather forecasts and climate outlooks with promising results (Ji et al 2019, Li et al 2022a, 2023). However, owing to the combined effects of the low signal-to-noise ratio, the mixed discrete-continuous distribution of S2S precipitation and the relatively small sample sizes of available S2S reforecasts, current statistical post-processing methods are limited in their ability to enhance the predictive accuracy of S2S precipitation forecasts (Mariotti et al 2020, Scheuerer et al 2020, Huang et al 2022).
Recently, data-driven deep learning (DL) methods, which are adept at capturing intricate nonlinear relationships between data and features, have been applied to model output calibration (Rasp and Lerch 2018, Ling et al 2022). These DL methods generally produce more skillful forecasts because they account for the complicated relationships and covariability between predictors and predictands (Fan et al 2023, Guo et al 2023, Sun et al 2023a). Among them, the UNET, which captures multi-scale spatial patterns and reduces both temporal and spatial errors (Sun et al 2023a, Zhu et al 2022, Lyu et al 2023), has proven fairly effective in improving S2S probabilistic precipitation forecasts (Vitart et al 2022, Horat and Lerch 2023). However, the effectiveness of DL methods often varies considerably between regions (Scheuerer et al 2020), and it remains to be investigated over the China mainland, especially at local scales. Moreover, previous studies mainly provide the probability of precipitation at certain thresholds rather than a continuous probability distribution of precipitation, which limits their full potential and practical application (Sønderby et al 2020, Vitart et al 2022).
In this study, we propose a novel hybrid method that combines the UNET with the censored and shifted gamma distribution based ensemble model output statistics (CSG-EMOS) to obtain reliable S2S probabilistic precipitation guidance from European Centre for Medium-Range Weather Forecasts (ECMWF) forecasts over the China mainland during boreal summer. Notably, ECMWF shows limited precipitation forecast skill at longer lead times, and the low signal-to-noise ratio further constrains the potential benefits of straightforward post-processing of precipitation forecasts (Tian et al 2017, Li et al 2021). However, ECMWF forecast skill for the associated atmospheric thermodynamic variables is largely maintained at lead times of 3-4 weeks (Kim et al 2018, Liang and Lin 2018). Given the connections between these variables and precipitation, we include atmospheric variable forecasts as predictors to provide additional windows of opportunity for improving precipitation forecasts. The additional atmospheric variable forecasts and data augmentation are included to deal with the low signal-to-noise ratio and the relatively small sample sizes, respectively. The individual CSG-EMOS and UNET models are used as benchmarks. Furthermore, different aspects of the predictions are analyzed in detail based on the decomposed Brier score (BS). The rest of this paper is organized as follows. Data and methods are described in section 2. Results are presented in section 3. Finally, section 4 provides the summary and discussion.

Data
The reforecast datasets of precipitation and associated factors, including the single-level fields 2 m temperature (t2m), 2 m dewpoint temperature (d2m), sea surface temperature, total cloud cover and total column water, along with the zonal wind (u), meridional wind (v), vertical velocity (w), temperature (t), geopotential height and specific humidity (q) at the 200, 500 and 850 hPa pressure levels over the China mainland, are derived from the ECMWF forecasts in the S2S Project (Vitart et al 2017). The reforecasts are initialized twice a week with a horizontal resolution of 1.5° × 1.5°. Details of the atmospheric variables can be found in table S1. The selection of factors is motivated by previous studies (Lyu et al 2023, Zhang et al 2023) and domain knowledge.
The observational CN05.1 precipitation dataset (Wu and Gao 2013, Zhao et al 2023, Zhu et al 2023b), provided by the National Meteorological Information Center, is used as the reference for forecast verification. The raw CN05.1 data have a horizontal resolution of 0.25° × 0.25° and are regridded to 1.5° × 1.5° so that forecasts and observations share the same grid.
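The regridding scheme is not specified here. As one possible choice, the sketch below coarsens a 0.25° field to 1.5° by averaging 6 × 6 blocks with cos(latitude) area weights. It assumes the fine and coarse grids are aligned so that each coarse cell contains exactly 6 × 6 fine cells, and it does not handle missing values (e.g. ocean cells in a land-only dataset); the function name `block_average` is hypothetical.

```python
import math


def block_average(field, lats, factor=6):
    """Coarsen a 2-D (lat x lon) field by averaging factor x factor blocks.

    Rows are weighted by cos(latitude) so each block mean approximates an
    area average. With factor=6, a 0.25 deg grid becomes a 1.5 deg grid.
    Missing values are NOT handled in this sketch.
    """
    ny, nx = len(field), len(field[0])
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for i0 in range(0, ny, factor):
        row = []
        for j0 in range(0, nx, factor):
            num = den = 0.0
            for i in range(i0, i0 + factor):
                w = math.cos(math.radians(lats[i]))  # area weight of this row
                for j in range(j0, j0 + factor):
                    num += w * field[i][j]
                    den += w
            row.append(num / den)
        coarse.append(row)
    return coarse
```

For a 12 × 12 block of 0.25° cells, `block_average(field, lats)` returns a 2 × 2 field at 1.5° resolution. A conservative remapping tool would be preferable when the grids are not aligned.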
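Moreover, the study focuses on the extended boreal summer seasons (May-September) from 2001 to 2020, comprising a total of 880 samples (44 initializations per year × 20 years). Following previous studies (Ling et al 2022), we split the entire dataset into a training set (2001-2012), a validation set (2013-2015; used to optimize the model hyperparameters) and an examination set (2016-2020). To deal with the potential problem of relatively small sample sizes, data augmentation, which has shown promising results in previous studies (Wang and Perez 2017, Sun et al 2023a), is used to enrich the samples. Specifically, we interpolated the twice-weekly reforecast data in the training and validation sets into daily reforecast data based on the method proposed by Yang et al (2018). Hence, the samples in the training and validation sets are enlarged to 1836 and 459, respectively. To mitigate the effect of interpolation errors on the calibration, forecast skill is finally evaluated on the 220 uninterpolated samples in the examination set.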
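The augmentation step can be illustrated with a simple stand-in. The method of Yang et al (2018) is not reproduced here: the hypothetical `interpolate_to_daily` below simply interpolates linearly in initialization date between consecutive twice-weekly reforecasts, and in practice would be applied separately to each grid point, variable and lead time. The enlarged sample sizes quoted above are consistent with one sample per calendar day of May-September (153 days) in each of the 12 training and 3 validation years.

```python
from datetime import date, timedelta

DAYS_MAY_TO_SEP = 31 + 30 + 31 + 31 + 30  # 153 days per extended summer


def interpolate_to_daily(init_dates, values):
    """Linearly interpolate values given at sparse, increasing initialization
    dates onto every calendar day from the first to the last date."""
    daily = {}
    for (d0, v0), (d1, v1) in zip(zip(init_dates, values),
                                  zip(init_dates[1:], values[1:])):
        span = (d1 - d0).days
        for step in range(span):
            weight = step / span  # 0 at d0, approaching 1 at d1
            daily[d0 + timedelta(days=step)] = (1 - weight) * v0 + weight * v1
    daily[init_dates[-1]] = values[-1]
    return daily


# Sample-count bookkeeping: 12 training years and 3 validation years.
print(DAYS_MAY_TO_SEP * 12, DAYS_MAY_TO_SEP * 3)  # 1836 459
```

For example, interpolating between a Thursday (3 May 2001) and the following Monday (7 May 2001) with values 1.0 and 5.0 yields 2.0, 3.0 and 4.0 for the three intermediate days.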
As noted in section 1, this study examines and calibrates forecasts over the China mainland. To describe regional differences, we further divide China into eight subregions with distinct climatological features following previous studies (Xu et al 2015, Sun et al 2023b): North China (NC), East China (EC), Central China (CC), South China (SC), Northeast China (NE), Northwest China (NW), Southwest China (SW) and the Tibetan Plateau (TP). Details can be found in figure 1.

The hybrid CSG-UNET architecture
The UNET architecture is a convolutional neural network that captures multi-scale spatial patterns and shows promising potential for post-processing model output (Sun et al 2023a). Figure 2 illustrates the structure and data flow of the UNET framework employed in this study, comprising convolution layers, pooling layers, transposed convolution layers and skip connections. The whole network is shaped like the letter 'U'. The left-hand side is the downsampling (encoding) path, and the right-hand side is the upsampling (decoding) path. The UNET used in this study has four depth levels, as our preliminary experiments showed that increasing the depth beyond four levels had little impact on forecast skill while significantly reducing computational efficiency. The numbers of convolution kernels in the four depth levels are 32, 64, 128 and 256, respectively. Specifically, the input image has size (48, 32, 33): a width of 48 and a height of 32 grid points covering the study area, and 33 channels representing the input features (i.e. ECMWF forecasts of precipitation and the associated atmospheric variables). The width and height remain unchanged after the two convolution operations at each level, and the number of convolution kernels determines the number of output channels. Max-pooling and transposed convolution operations preserve the number of channels but reduce and increase the spatial dimensions, respectively. The skip connections preserve small-scale and high-frequency information. The learning rate used in this study is 10⁻⁴.
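The shape bookkeeping described above can be traced with a small, framework-free sketch. It assumes 'same'-padded convolutions, 2 × 2 max pooling and stride-2 transposed convolutions that keep the channel count (as stated above), and a final 1 × 1 convolution to `out_channels`. The default of 10 matches the ten precipitation categories of the classification UNET; the number of CSG parameters produced by CSG-UNET is not given in this section, so any value passed for it is illustrative. The hypothetical `unet_shapes` only traces tensor shapes; it does not build or train a network.

```python
def unet_shapes(width=48, height=32, in_channels=33,
                filters=(32, 64, 128, 256), out_channels=10):
    """Trace (width, height, channels) through a four-level UNET.

    Assumes 'same'-padded convolutions, 2x2 max pooling and stride-2
    transposed convolutions that keep the channel count.
    """
    shapes = {"input": (width, height, in_channels)}
    skips = []
    w, h = width, height
    for depth, f in enumerate(filters, start=1):
        c = f  # two convolutions keep (w, h); channels = number of kernels
        shapes[f"enc{depth}"] = (w, h, c)
        if depth < len(filters):
            skips.append((w, h, c))
            if w % 2 or h % 2:
                raise ValueError("spatial size must stay divisible by 2")
            w, h = w // 2, h // 2  # max pooling halves (w, h), keeps channels
    for depth, f in reversed(list(enumerate(filters[:-1], start=1))):
        w, h = w * 2, h * 2  # transposed convolution doubles (w, h)
        sw, sh, sc = skips.pop()
        assert (w, h) == (sw, sh), "skip connection size mismatch"
        c = c + sc  # skip connection concatenates encoder features
        shapes[f"concat{depth}"] = (w, h, c)
        c = f  # two convolutions reduce channels to the level's kernel count
        shapes[f"dec{depth}"] = (w, h, c)
    shapes["output"] = (w, h, out_channels)  # final 1x1 convolution
    return shapes


for name, shape in unet_shapes().items():
    print(name, shape)
```

With the defaults, the bottleneck is (6, 4, 256) after three pooling steps, and the output keeps the 48 × 32 input grid.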
The widely used UNET-based post-processing approach to probabilistic forecasting transforms the problem into a classification task by partitioning the observation range into distinct classes and assigning a probability to each class (e.g. Vitart et al 2022). In this study, we discretize precipitation into 10 categories defined by climatological quantiles, spanning the intervals 0%-9.9%, 10%-19.9%, 20%-29.9%, …, and >90%. However, this approach cannot provide a continuous probability distribution of precipitation, which limits its full potential and practical application. In this context, inspired by the state-of-the-art probabilistic post-processing method CSG-EMOS, which assumes a censored and shifted gamma (CSG) distribution for precipitation (Scheuerer 2014, Ji et al 2023), we propose an innovative hybrid method combining the UNET with CSG-EMOS, named CSG-UNET, in which the UNET model predicts the parameters of the CSG distribution (figure 2). Categorical cross-entropy and the continuous ranked probability score (CRPS) are used as the loss functions for UNET and CSG-UNET, respectively. CSG-EMOS is also used as a benchmark. Details of CSG-EMOS can be found in Text S1.
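The CRPS loss of CSG-UNET can be illustrated with a minimal sketch. It assumes the CSG parameterization commonly used in CSG-EMOS, Y = max(0, X + δ) with X ~ Gamma(k, θ) and shift δ ≤ 0, so that the mass at zero is the probability of a dry outcome. The hypothetical helper `shape_scale` converts a predicted mean and standard deviation into (k, θ); which parameters the UNET actually outputs is described in figure 2 and Text S1, not here. The CRPS is evaluated by numerical integration for transparency only; a training loss would need a differentiable (e.g. closed-form) CRPS written in a deep-learning framework.

```python
import math


def gamma_cdf(x, k, theta=1.0):
    """CDF of Gamma(shape k, scale theta) via the series expansion of the
    regularized lower incomplete gamma function (fine for moderate x/theta)."""
    if x <= 0.0:
        return 0.0
    z = x / theta
    term = total = 1.0 / k
    for n in range(1, 10_000):
        term *= z / (k + n)
        total += term
        if term < 1e-16 * total:
            break
    return min(1.0, math.exp(k * math.log(z) - z - math.lgamma(k)) * total)


def shape_scale(mu, sigma):
    """Gamma shape and scale from a predicted mean and standard deviation."""
    return (mu / sigma) ** 2, sigma ** 2 / mu


def csg_cdf(y, k, theta, delta):
    """CDF of Y = max(0, X + delta) with X ~ Gamma(k, theta), delta <= 0.
    csg_cdf(0, ...) is the probability of a dry outcome."""
    return 0.0 if y < 0.0 else gamma_cdf(y - delta, k, theta)


def _simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0


def csg_crps(y, k, theta, delta):
    """CRPS of a CSG forecast for an observation y >= 0:
    int_0^y F(x)^2 dx + int_y^inf (1 - F(x))^2 dx
    (F is zero below 0, so the negative half-line contributes nothing)."""
    upper = max(y, delta + theta * (k + 20.0 * math.sqrt(k) + 20.0))
    cdf = lambda x: csg_cdf(x, k, theta, delta)
    below = _simpson(lambda x: cdf(x) ** 2, 0.0, y)
    above = _simpson(lambda x: (1.0 - cdf(x)) ** 2, y, upper)
    return below + above
```

As a sanity check, with k = 1 and δ = 0 the CSG reduces to an exponential distribution, whose CRPS has the closed form y + 2θe^(−y/θ) − 1.5θ; for y = 1 and θ = 1 both give about 0.2358.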

Overall evaluation
Based on the BS, which measures the mean squared error of the probabilistic forecast of exceeding a given threshold, the Brier skill score (BSS) is used to evaluate the probabilistic precipitation forecast skill of the models; it indicates the improvement of a target model relative to a reference forecast (the climatological prediction in this study). The associated formulae are provided in Text S2. Unlike previous studies in which only a few fixed thresholds are evaluated, the current study evaluates forecast skill at each grid point for thresholds defined by that grid point's own precipitation percentiles, ranging from 10% to 90%.
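A minimal sketch of the grid-point BSS follows. It assumes that (i) forecast probabilities are ensemble exceedance fractions (for CSG-EMOS or CSG-UNET one would instead use one minus the predicted CDF at the threshold), (ii) the reference forecast is the climatological base rate of the event, and (iii) percentiles are computed by linear interpolation with `statistics.quantiles(..., method="inclusive")`. The function names are hypothetical, and the exact formulae used in the paper are given in Text S2.

```python
import statistics


def brier_score(probs, outcomes):
    """Mean squared difference between event probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def grid_point_bss(ens_forecasts, observations, climatology, percentile):
    """BSS for the event 'precipitation exceeds the local climatological
    percentile', with the climatological base rate as the reference.

    ens_forecasts: one list of ensemble members per forecast case.
    climatology:   observed values defining the threshold (e.g. training years).
    percentile:    integer between 1 and 99, e.g. 90.
    """
    cuts = statistics.quantiles(climatology, n=100, method="inclusive")
    threshold = cuts[percentile - 1]
    outcomes = [1.0 if y > threshold else 0.0 for y in observations]
    # Forecast probability = fraction of ensemble members above the threshold.
    probs = [sum(m > threshold for m in ens) / len(ens) for ens in ens_forecasts]
    base_rate = sum(y > threshold for y in climatology) / len(climatology)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0.0:
        return float("nan")  # event never or always observed: BSS undefined
    return 1.0 - brier_score(probs, outcomes) / bs_ref
```

A perfect forecast scores 1, a forecast that always issues the climatological base rate scores 0, and negative values indicate less skill than climatology.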
Figure 3 displays the BSSs of ECMWF, CSG-EMOS, UNET and CSG-UNET averaged over the China mainland at lead times of 1-4 weeks for the period 2016-2020. Generally, the probabilistic precipitation forecast skill of ECMWF decreases rapidly with increasing lead time, and skillful forecasts are limited to lead times of up to 2 weeks for nearly all thresholds. The three post-processing methods show clearly higher BSSs for all thresholds at all lead times, with CSG-UNET performing best. However, CSG-EMOS shows negative skill for extremes (e.g. 80% and 90%) at lead times of 2-4 weeks, indicating its limitation in improving probabilistic forecasts of extreme precipitation. In contrast, the two DL methods, especially CSG-UNET, show overall superiority and positive BSSs even at the 4-week lead time, improving the BSS by ∼0.4 compared with the ECMWF prediction.
To investigate the spatial characteristics of forecast skill, figure 4 shows the BSSs averaged over the eight subregions for forecasts from ECMWF, CSG-EMOS, UNET and CSG-UNET. Generally, ECMWF shows lower forecast skill over western China than over eastern China, which has been partly attributed to the model's insufficient representation of the complex local terrain (Bromwich et al 1999, Bao et al 2011) and to the limited availability of in situ measurements there (Bao and Zhang 2013, Jung et al 2016). In particular, ECMWF shows limited forecast skill over SW, NW and TP, with negative BSSs for the 10%, 80% and 90% thresholds even at the 1-week lead time. Meanwhile, the decrease in BSS with lead time in the ECMWF forecasts is more evident over CC, SW, NW and TP, whereas forecast skill is maintained to some extent over NE, NC, EC and SC at lead times of up to 4 weeks.
After calibration with CSG-EMOS, UNET and CSG-UNET, the BSSs improve noticeably across almost all subregions, with larger improvements over western China than over the other areas. However, although CSG-EMOS is overall superior to ECMWF, it is limited in enhancing forecast skill for extreme precipitation over western regions with complex terrain, such as NW and TP. By contrast, UNET and CSG-UNET partly make up for this deficiency, and CSG-UNET performs best for most subregions and thresholds.

Detailed analysis
To further analyze which aspects of the forecasts are improved by the post-processing methods, the BS is decomposed into three components (Wilks 2011): (i) reliability, measuring the extent to which forecast probabilities match the observed relative frequencies; (ii) resolution, evaluating the ability of forecasts to distinguish between different outcomes; and (iii) uncertainty, representing the variability of the observations. Details can be found in Text S2. Because the uncertainty term is independent of the forecasts, figure 5 shows only the other two terms (i.e. BS_Reliability and BS_Resolution) for the forecasts averaged over the China mainland, together with the corresponding skill scores of CSG-EMOS, UNET and CSG-UNET relative to ECMWF. A lower BS_Reliability indicates higher probabilistic forecast skill, whereas a lower BS_Resolution indicates lower skill.
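The three-term decomposition can be sketched as follows, assuming the standard Murphy form BS = REL − RES + UNC with forecasts grouped into equal-width probability bins (10 here, an arbitrary choice; the paper's version is in Text S2). When every forecast in a bin has the same probability, as with ensemble fractions that fall on bin values, the identity holds exactly; otherwise a small within-bin term is neglected. The function name is hypothetical.

```python
def brier_decomposition(probs, outcomes, n_bins=10):
    """Murphy decomposition of the Brier score into (reliability,
    resolution, uncertainty) using n_bins equal-width probability bins."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    bins = {}
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes to the top bin
        bins.setdefault(b, []).append((p, o))
    rel = res = 0.0
    for members in bins.values():
        n_k = len(members)
        p_k = sum(p for p, _ in members) / n_k  # mean forecast probability
        o_k = sum(o for _, o in members) / n_k  # observed relative frequency
        rel += n_k * (p_k - o_k) ** 2  # penalizes miscalibration
        res += n_k * (o_k - base_rate) ** 2  # rewards departure from base rate
    unc = base_rate * (1.0 - base_rate)  # independent of the forecasts
    return rel / n, res / n, unc


probs = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
outcomes = [0, 1, 0, 1, 1, 1]
rel, res, unc = brier_decomposition(probs, outcomes)
print(rel - res + unc)  # equals the Brier score, 0.25, up to rounding
```

This makes the sign convention in the text concrete: reliability enters the Brier score with a plus sign (lower is better), resolution with a minus sign (higher is better).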
Generally, the BS_Reliability of ECMWF increases rapidly with increasing threshold, implying decreasing consistency between forecast probabilities and observed frequencies. After post-processing with any of the three methods, BS_Reliability is clearly reduced for all thresholds. In terms of skill scores, the two DL methods show clearly greater calibration capability than CSG-EMOS for extreme precipitation forecasts, indicating the superiority of DL methods in improving the consistency between predicted and observed extreme precipitation.
In terms of the resolution term, ECMWF exhibits high discriminative capability for moderate-intensity rainfall but relatively poor discrimination for light rainfall and extreme precipitation. Notably, the two benchmarks, CSG-EMOS and UNET, show limited improvement in the original BS_Resolution, especially at lead times of 1-2 weeks. In contrast, the hybrid CSG-UNET model shows positive BS_Resolution_SS for most thresholds at all lead times, which is even more evident at lead times of 3-4 weeks. This further demonstrates the ability of CSG-UNET to discriminate between different precipitation outcomes even at lead times of 3-4 weeks.
In summary, all three post-processing methods improve the reliability of ECMWF for all thresholds, with the two DL methods outperforming CSG-EMOS for extreme precipitation forecasts. Meanwhile, UNET and CSG-EMOS show limited improvements in the discriminative ability of ECMWF, especially at lead times of 1-2 weeks, whereas the newly proposed CSG-UNET shows generally promising results.

Summary and discussion
The present study proposes an innovative hybrid post-processing method, CSG-UNET, based on UNET and CSG-EMOS, to obtain reliable S2S probabilistic precipitation guidance from ECMWF over the China mainland during boreal summer. Additional atmospheric variable forecasts and data augmentation are included to deal with the potential problems of the low signal-to-noise ratio and the relatively small sample sizes, respectively.
CSG-UNET markedly improves S2S precipitation forecast skill compared with the individual use of CSG-EMOS or UNET. After applying CSG-UNET, skillful precipitation forecasts for all thresholds extend to the 4-week lead time, whereas ECMWF forecasts are skillful only up to the 2-week lead time. Notably, CSG-UNET shows substantial advantages for extreme precipitation, improving the BSS by ∼0.4 compared with the ECMWF prediction, especially over regions with complex terrain (i.e. NW and TP). Further analysis attributes this to the two DL methods improving the consistency between forecast and observed extreme precipitation relative to CSG-EMOS. In addition, CSG-UNET clearly improves the resolution for most thresholds compared with UNET at 1-4-week lead times, which benefits from replacing discrete threshold categories with CSG distributions and accounts for the superiority of CSG-UNET over UNET. On the other hand, although we include atmospheric variable forecasts to offer more opportunities for enhancing precipitation forecast skill at longer lead times, the low signal-to-noise ratio at those lead times remains unavoidable; it constrains the potential benefits of post-processing models and makes the advantage of CSG-UNET over UNET less pronounced at lead times of 3-4 weeks. Nevertheless, the resolution of CSG-UNET still shows a significant advantage over UNET at 3-4-week lead times. Thus, the hybrid approach holds great promise for S2S precipitation forecasting from both theoretical and practical perspectives.
We also use the permutation importance method (PIM) to investigate the relative importance of each predictor for the CSG-UNET model; the PIM does not depend on knowledge of the model's internals and has been widely used in previous studies (Rasp and Lerch 2018, Li et al 2022b). Details of the PIM and the associated results can be found in Text S3 and figure S1. The forecast skill of CSG-UNET derives mainly from the precipitation forecast itself at lead times of 1-3 weeks. The contributions of the other variable forecasts increase with lead time, and the v200 predictor plays the most important role at the 4-week lead time. This indicates that including additional variable forecasts may create more opportunities for S2S precipitation prediction.
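The PIM idea can be sketched generically: shuffle one predictor across samples, re-score the trained model and record the increase in loss. In the hypothetical helper below, `predict` and `score` stand in for the trained CSG-UNET and a loss such as the CRPS (lower is better), and each row of `X` is one sample; for CSG-UNET a "predictor" would be a whole input channel (e.g. v200) permuted across forecast cases. The number of repeats and the seed are arbitrary choices, and the paper's exact procedure is described in Text S3.

```python
import random


def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Mean increase in a loss-type score (lower is better) when one
    predictor column of X is shuffled across samples."""
    rng = random.Random(seed)
    baseline = score(predict(X), y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            increases.append(score(predict(X_perm), y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy check: the "model" uses only the first predictor.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [row[0] for row in X]
predict = lambda rows: [row[0] for row in rows]
mae = lambda pred, obs: sum(abs(a - b) for a, b in zip(pred, obs)) / len(obs)
print(permutation_importance(predict, X, y, mae))  # second entry is 0.0
```

Because the method only needs predictions, it treats the network as a black box, which is why it does not depend on internal knowledge of the model.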
Notably, the above-mentioned methods use only variable forecasts derived from the dynamical model as predictors, while subseasonal predictability signals in the initial fields, such as ENSO and the MJO, are disregarded (Wang et al 2018, Kolstad et al 2021). Incorporating these subseasonal predictability signals into our DL model may therefore enhance forecast skill and will be examined in the future. Moreover, although the PIM can quantify the importance of features based on their impact on a trained ML model's predictions, more advanced methods are still needed to make DL methods more explainable.

Data availability statement
The S2S reforecast dataset of ECMWF can be obtained from the S2S Archiving Data Center in the ECMWF data portal at https://apps.ecmwf.int/datasets/. The observational CN05.1 precipitation dataset is available from the Climate Change Research Center, Chinese Academy of Sciences at https://ccrc.iap.ac.cn/resource/detail?id=228.
All data that support the findings of this study are included within the article (and any supplementary files).

Figure 2. Sketch of the UNET architecture. Forecasts of precipitation and the associated atmospheric variables extracted from ECMWF are used as input data. The number of channels in the UNET is indicated inside each layer. The horizontal dimensions of each layer are given on the left. Green arrows indicate convolutional operations followed by a ReLU activation function. Gray arrows represent the skip connections, i.e. feature concatenation. Orange and yellow arrows denote the max pooling and transposed convolutions, respectively.

Figure 3. BSSs (Y-axis) of forecasts for multiple precipitation thresholds in percentage (X-axis) derived from ECMWF, CSG-EMOS, UNET and CSG-UNET at lead times of 1-4 weeks averaged over the China mainland (climatology is used as the reference forecast). Shading denotes the 95% confidence interval of the estimated mean of the metric, obtained by bootstrapping.
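The bootstrapped 95% confidence intervals can be illustrated by a percentile bootstrap of the mean. The resampling unit (here assumed to be individual forecast cases with one score each), the number of resamples and the seed are assumptions rather than details taken from the paper, and `bootstrap_mean_ci` is a hypothetical name.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


scores = [float(v) for v in range(100)]  # illustrative per-case scores
print(bootstrap_mean_ci(scores))
```

Resampling whole forecast cases, rather than individual grid points, would keep the spatial correlation within each case intact; whether the paper does so is not stated here.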

Figure 4. BSSs (%) of forecasts for multiple precipitation thresholds in percentage (Y-axis) derived from ECMWF, CSG-EMOS, UNET and CSG-UNET at lead times of 1-4 weeks averaged over eight subregions of the China mainland (X-axis). Climatology is used as the reference forecast. Dots indicate forecast skill higher than climatology.

Figure 5. The decomposed Brier score terms (i.e. BS_Reliability and BS_Resolution) of the models averaged over the China mainland, together with the corresponding skill scores of CSG-EMOS, UNET and CSG-UNET relative to ECMWF. Shading denotes the 95% confidence interval of the estimated mean of the metric, obtained by bootstrapping.