Skillful seasonal prediction of the 2022–23 mega soil drought over the Yangtze River basin by combining dynamical climate prediction and copula analysis

An unprecedented soil moisture drought broke out over the Yangtze River basin (YRB) in the summer of 2022 and lasted until the spring of 2023, caused great economic losses and serious environmental issues. With the rapid onset and long-lasting duration, the mega soil drought challenges the current seasonal prediction capacity. Whether the state-of-the-art climate models provide skillful predictions of the onset, persistence and recovery of the 2022–23 mega soil drought needs to be assessed. Identified by the drought area percentage, here we show that the mega soil drought over the YRB started from July, 2022, reached the peak in August, and diminished in April, 2023. Combined with real-time predictions of monthly precipitation released by three climate models participating in the North American multi-model ensemble (NMME) project, we predict the monthly evolution of the 2022–23 soil drought through a joint distribution between precipitation and soil moisture established by the copula method. The results indicate that the NMME/copula prediction well reproduced the spatiotemporal evolution of the mega soil drought at 1 month lead. Using the climatological prediction that relies on the information of initial soil moisture conditions as the reference forecast, the Brier skill score (BSS) values for NMME multi-model ensemble are 0.26, 0.23 and 0.2 for the forecast lead times increased from 1 to 3 months during the entire soil drought period. Specifically, the BSS is 0.14 at 2 months lead during drought onset stage, and 0.26 at 3 months lead during persistence stage, while it is close to zero at all leads during the recovery stage. Our study implies that climate models have great potential in probabilistic seasonal prediction of the onset and persistency of mega soil drought through combining with the copula method.


Introduction
Drought prediction is a great challenge due to the diverse and complex origins of drought events (Kiem et al 2016, Hao et al 2018), especially for the mega drought events that break historical records.Under climate change, the changes in drought evolution characteristics at regional and global scales further challenge the drought early warning and prediction (Pendergrass et al 2020, Tyagi et al 2022, Ma et al 2023, Yuan et al 2023).In the summer of 2022, the entire Yangtze River basin (YRB) experienced an unprecedented mega drought without sufficient early warning (Wang and Yuan 2023), which was accompanied by a rapid spatiotemporal onset and eventually developed into a seasonal drought due to a persistent precipitation deficit.The rapid and silent propagation of the 2022-23 mega soil moisture drought prevented the stakeholders from taking timely actions, resulting in direct economic losses of 51.28 billion Chinese Yuan.Previous studies have focused on the causes and impacts of this mega drought especially during the onset stage (Ma et al 2022, Liang et al 2023, Liu et al 2023, Wang et al 2023), but have not assessed the predictive skill during the entire evolution processes of the drought.
Over the past two decades, the research and operation of drought monitoring and prediction systems have significant progress.The U.S. drought monitor blends multiple physical drought indicators with subjective impact assessments from local experts to establish the drought classification scheme (Svoboda et al 2002).The African drought monitor estimates drought conditions through a combination of hydrological modeling and satellite remote sensing (Sheffield et al 2014).The pan-European probabilistic seasonal drought early warning system has been developed by utilizing the outputs of the LISFLOOD hydrological model (Van Der Knijff et al 2010), which can generate a comprehensive list of drought indices and quantify drought characteristics with the variable threshold method (Sutanto et al 2020, Sutanto and Van Lanen 2021).However, the accuracy and timeliness of drought prediction are still limited.The standardized precipitation index (SPI) has been proposed to identify meteorological drought events (Mckee et al 1993), while the substantial differences in the prediction of SPI between various dynamical climate prediction models may even exceed typical drought thresholds (Mo and Lyon 2015), and only less than 30% of seasonal meteorological drought onsets can be detected by climate prediction models at the global scale (Yuan and Wood 2013).Soil moisture anomaly is a good choice for identifying agricultural droughts that have implications for agricultural yields and water resource management (Ryu and Famiglietti 2005), while predicting the soil drought evolution process is also challenging due to complex causes and hydrometeorological processes (Wang and Ma 2023).The soil droughts that caused losses of billions of dollars, such as the 2012 Great Plains drought (Hoerling et al 2014) and the 2017 drought across the U.S. Northern Great Plains and Canadian prairies (Hoell et al 2020), were not well predicted.
The multi-model ensemble is an effective method for addressing forecast uncertainty and providing valuable probabilistic prediction information for disaster risk management (Pan et al 2013, Wanders and Wood 2016), which can also improve the skill of seasonal forecasting (Hagedorn et al 2005).The North American multi-model ensemble (NMME; Kirtman et al 2014) system provides real-time forecasts of meteorological data (Becker et al 2020), which can be directly used for the prediction of meteorological drought and has shown promise for both research and operational applications (Yuan and Wood 2013, Mo and Lyon 2015, Wanders and Wood 2016, Slater et al 2019).Moreover, the NMME data are used for driving the hydrological models to carry out the ensemble prediction of soil moisture droughts (Mo and Lettenmaier 2014, Thober et al 2015, Yuan et al 2015), which is always accompanied by a large amount of computation, and the accuracy of probabilistic drought prediction is significantly affected by the structures and parameters of the hydrological models (Hao et al 2018).The combination of dynamical climate prediction models with statistical models supplements existing approaches, which provides an alternative method to generate efficient probabilistic drought prediction that can be easily implemented in operational systems (Hao et al 2018, AghaKouchak et al 2022).The dynamic-statistical blended prediction is expected to leverage the advantages of these two methods to enhance the performance in drought prediction and provide better accuracy and more reliable estimations of uncertainty associated with drought prediction (Nandgude et al 2023).However, whether it can provide useful information for the prediction of the evolution process of the mega soil drought needs further investigation.
The copula method can establish the joint probability distribution between random variables (Salvadori and De Michele 2010), which can be used for soil drought prediction after establishing the relationship between soil moisture and the prediction of accumulated precipitation.Pan et al (2013) proposed a probabilistic framework based on the copula method, which enabled to assess the risk of a specific drought index threshold occurring under specific precipitation conditions.However, due to the lack of real-time precipitation forecasts and quantitative evaluation methods, the framework has not yet been applied to recent drought prediction.Here, focusing on the prediction of the unprecedented interseasonal soil drought event over the YRB in 2022-23, we improve the probabilistic drought prediction framework by combining the dynamical climate prediction and statistical analysis, and aim to answer the following questions: (1) What are the evolution characteristics of the 2022-23 mega soil drought over the YRB? (2) How to develop a probabilistic prediction framework for mega soil droughts based on NMME and copula analysis?(3) Does the real-time prediction framework perform well for the 2022-23 mega soil drought over the YRB?

ERA5 and NMME data
In this study, the hourly top 1 m soil moisture and precipitation from the fifth-generation reanalysis data (ERA5; Hersbach et al 2020) released by the European Center for Medium-Range Weather Forecasts were used to establish the relationship between precipitation and soil moisture, and consequently the probabilistic drought prediction framework (see section 2.3 for detail).The monthly precipitation data from three NMME models with real-time forecast data (Kirtman et al 2014; table S1), CanSIPS-IC3 (CanSIPS), COLA-RSMAS-CCSM4 (COLA), and NASA-GEOSS2S (NASA), were interpolated to a spatial resolution of 0.25 • × 0.25 • and used to predict the evolution process of the 2022-23 mega drought over the YRB.Here, the ERA5 data during 1961-2023 and NMME data during 1982-2023 were collected, and they were both converted to percentiles at a monthly time scale based on the same period of 1982-2021 (the data during 2022-2023 are used for prediction).For each NMME model, they have different ensemble numbers and the ensemble means were used in this study.In addition, the means of all ensemble members from three models were used as the multi-model ensemble mean of NMME (NMME multi-model ensemble).

Drought definition
We determined drought events based on the soil moisture percentiles (SMP) and drought areas at the monthly scale, and a strict threshold combined with relaxation criteria was used to avoid misidentification of the drought period (Wang andYuan 2023, Zhou andYuan 2023).Over the YRB, grids with SMP lower or equal to 20% are classified as experiencing drought, the drought onset is defined as the basin drought area greater than 50% for two consecutive months, while the drought area lower than 40% for two consecutive months marks the demise of the drought.For a drought event, the period when the drought area is greater than 50% and continues to increase is considered as the onset stage, the remaining drought period is the persistence stage, and the first month after the drought period is the recovery stage.

Copula-based probabilistic prediction framework
In this study, the copula method was used to establish the probabilistic prediction framework for drought events.The copula method links random variables with the joint distribution of their respective marginal distributions (Salvadori and De Michele 2010), and was used to analyze the joint distribution of SMP and cumulative precipitation percentile (CPP).Using x as CPP and y as SMP, the joint probability density functions of CPP and SMP (H X, Y ) can be defined as: where C is the copula function.F X and F Y are the marginal cumulative distribution functions of CPP and SMP, respectively.When the CPP is given, we can calculate the probability of drought occurring with the following equations: where x * represents the predicted CPP, which was provided by NMME models.The y * represents the drought threshold, and we used the SMP of 20%.
The ∂ represents the partial derivative.With the above equations, we can determine the probability of drought occurrence (SMP ⩽ 20%) when the CPP is given.Instead of the sorting index method, the Kernel density estimation was used to calculate the percentiles to better represent the extremes of soil moisture and cumulative precipitation during 2022-23 (Rosenblatt 1956, Parzen 1962).In addition, the loglikelihood estimation was used to select the optimal copula functions for the joint distributions to ensure the accuracy of fitting (Zhang and Singh 2012).
During the 2022-23 mega drought, the above method was applied to each grid over the YRB, and figure 1 illustrates the estimation procedure with a lead time of one month.Taking figure 1(a) as an example, the blue points are samples with the similar initial condition of SMP in July 2022 within a range of ± 1 nyr+1 % (nyr = 40; the number of years in the baseline period; 1982-2021), which are selected from the 31 sets of 31 day moving averages during 1961-2021 covering the period from 16 June-16 July to 16 July-15 August.The horizontal and vertical coordinates represent the CPP (x in equation ( 1)) and SMP (y in equation ( 1)) of the samples for the next 31 day (covering the period from 17 July-16 August to 16 August-15 September) during 1961-2021, which are used for copula fitting and constructing the probabilistic prediction framework.With this framework, we can obtain the probability of August mean SMP in 2022 at different thresholds (y * in equations ( 2) and ( 3)) when we input the corresponding predicted CPP for August, 2022 (x * in equations ( 2) and ( 3)).In the grid shown in figure 1(a), the probability of drought occurring in August, 2022 is 49% when the CPP is 25%, and 33% when the CPP is 50%.Here, with the observed CPP during 2022-23 from ERA5, the ERA5 experiment was carried out to assess the suitability of the probabilistic prediction framework for the 2022-23 mega drought.Additionally, we also conducted the estimations based on precipitation from climatological prediction (CLIP) and NMME climate models.The CLIP experiment was used as a reference (the gray lines in figure 1), which assumes that the future precipitation always remains at the climatology level (CPP = 50%).The NMME experiment can be used for real-time prediction, and the CPPs were provided by the NMME models.

Verification measures
The Brier score (BS; Wilks 1995) is used for assessing the accuracy of probabilistic prediction.It is essentially the mean square error of the probability forecast, which is calculated as follows: where N represents the number of grids over YRB, y k represents the predicted probability of drought occurrence, and o k represents whether the grid is experiencing drought in the observed data from ERA5.The o k equals 1 when the drought occurs, and equals 0 when the drought does not occur.A higher BS value is obtained when the accuracy of drought prediction is low, and a perfect prediction has a BS value of 0. Additionally, taking the BS of CLIP experiment as a reference, the improvements in prediction skill can be assessed with the Brier skill scores (BSS) (Hamill and Juras 2006), using the following equation: The BS and BS CLIP represent the BS of the NMME model and the CLIP experiment, respectively.A higher BSS indicates a greater improvement in predictive skills for the NMME model compared to the CLIP experiment.

Results
Figures 2(a)-(f) display the spatial patterns of the 2022-23 mega drought evolution process over YRB, and the observed monthly CPP from ERA5 can be found in figures S1(a)-(e).Combined with the time series of drought area percentage (figure 2(g)) and regional averaged SMP (figure 2(h)) over the YRB, we found that the drought developed since July and then reached its peak in August when the SMP decreased below 20% over more than 80% areas of the YRB.During the period of drought intensification, the YRB experienced an extreme precipitation deficit.The percentiles of precipitation over most areas dropped below 10% (figure S1(a)), and the regional-averaged precipitation percentiles are about 20% (figure S2(a)).Simultaneously, both the expansion rate of drought areas and the decline rate of SMP are rapid, the percentage of drought areas increased from less than 20% to greater than 80% and the regional averaged SMP decreased from above 50% to less than 20% within two months (figures 2(g)-( h  occurring over the areas where SMP is no greater than 20% (figures 2(b)-(f)), which indicates that the probabilistic prediction framework based on the copula method is effective in predicting the evolution process of 2022-23 mega soil drought when giving the accurate precipitation.Subsequently, with the predicted CPPs from climatology and NMME models (figures S1 and 2), we carried out the soil drought prediction and showed the spatial patterns of drought occurrence probability with a lead time of one month (figures 3 and S3).In the CLIP experiment, the predicted results still showed a very low probability for drought onset even if the YRB drought had reached its peak in August (figure 3  the persistence stage increased (figures 4(a) and (d)) while the improvement of predictive skills against CLIP decreased (figures 4(e) and (h)), and the BS and BSS values both decreased accordingly.Compared to the individual models, the NMME multi-model ensemble performed more stably during the whole drought period, and its predictive skills were further improved (figures 4(a) and (e)).During the whole drought period, the predictive skill of NMME multimodel ensemble is generally comparable to the bestperforming models (figures 4(a) and (e)).The corresponding BSS is 0.26, increased by 0.01, 0.1, and 0.14 compared to CanSIPS (0.25), COLA (0.16), and NASA (0.12), respectively (figure 4(h)).In short, with a lead time of one month, the CLIP has a higher potential for drought prediction during the persistence and recovery stages than the onset stage.Compared with the CLIP experiment, the predictive skills of three NMME models increased significantly during the onset and persistence stages, and the skill of CanSIPS is the highest.In addition, the NMME multi-model ensemble has further improved the prediction skill of drought evolution compared to the single model, with the BSS increased up to 0.14 throughout the entire drought period.
Furthermore, we assessed the predictive ability of CLIP and NMME models for drought evolution at lead times of two and three months.The predicted CPP is presented in figure S2 c)) and the BSS approaches 0 (figures 4(f) and (g)), which means that the NMME models nearly lost the predictive skills during the onset stage.However, the performance of the three models was overall better than the CLIP experiment during the whole drought period (figures 4(b)-(d) and (f)-(h)).Among the three models, the CanSIPS still performed the best, while the prediction skill of COLA declined the fastest and performed the worst due to the significant overestimation of CPP.As the lead time increased from one month to three months, the BS of the NMME multi-model experiment increased from 0.22 to 0.66, from 0.17 to 0.31, and from 0.

Summary and discussion
During the summer of 2022, an unprecedented soil drought broke out over the YRB without early warning and eventually developed into a seasonal drought due to the persistent precipitation deficit.The evolution process of SMP revealed that the drought had a rapid onset process and long-lasting duration, it developed since July and reached its peak in August, and finally recovered until April 2023.Focused on the 2022-23 YRB drought, we developed a probabilistic prediction framework using the copula method, and tried to predict the soil drought evolution process with the real-time predicted precipitation released by the three NMME models.The results indicate that the NMME models combined with copula have potential to predict the onset and persistence process of soil drought at one-month lead, and have higher predictive skills than the CLIP experiment.In addition, the NMME multi-model ensemble can further improve the predictive skill, with the BSS increasing up to 0.14 during the drought period compared to the individual NMME models.During the drought period, the accuracy of drought prediction is higher during the persistence stage than in the onset stage, while the prediction skill of the NMME models increased more significantly during the onset stage compared to the CLIP experiment.As the lead time extended, the predictive skills of the NMME multi-model ensemble during the entire drought period decreased while remaining higher than that of the CLIP experiment, with the BSS of the NMME multi-model ensemble decreasing from 0.26 to 0.2.Specifically, the BSS values changed from 0.56 to 0, from 0.18 to 0.26, and from 0.05 to −0.01 during the drought onset, persistence and recovery stages respectively.This suggests that the NMME models have great potential for predicting the persistence of drought, while the predictions for drought onset and recovery will quickly lose skill with the increase of lead times.
This study combines the dynamic climate prediction data with the copula method to provide a dynamic-statistical framework for drought prediction.Compared to the process-based model, this framework has fewer data requirements (no need for refined surface parameters and diverse meteorological forcings), does not require complex parameter calibration, and its computational process is also more efficient.More importantly, it has indeed demonstrated high potential in drought prediction.Overall, it is an alternative method to generate skillful probabilistic drought prediction that can be easily implemented in operational systems at regional to global scales.However, we also found that the NMME models have large uncertainties in the precipitation prediction as the lead time increases due to the inherently chaotic nature of the atmosphere (Lavers et al 2009, Yuan et al 2011, Smith et al 2012), and the prediction of drought onset quickly loses skill after one month.Given that the sea-air teleconnection and land-atmosphere coupling are key processes related to the generation of extreme events (Taylor et al 2012, Hao et al 2018, Shao et al 2023), more accurate simulations of the atmospheric circulation response to sea surface temperature (Seager andHoerling 2014, Schubert et al 2016), as well as further taking into account land-atmosphere coupling that facilitates the propagation of meteorological drought to soil moisture drought may provide prospects for the prediction of soil drought onset at long lead times (Schubert et al 2007, Hao et al 2018), which will benefit a variety of sectors by allowing sufficient time for drought mitigation efforts.Meanwhile, since the catchment memory, that is, prevailing catchment storage capacity (soil water, groundwater), plays an important role in explaining the performance of drought events (Van Hateren et al 2019, Sutanto andVan Lanen 2022), a reasonable reflection of catchment memory in the dynamic-statistical framework may further improve the predictive skill of soil drought.In addition, drought prediction under the changing environment is another challenge due to the influence of land use (Huntington 2006, Milly et al 2008).Here, the probabilistic prediction framework was established based on the statistical relationship between the SMP and CPP from historical records.However, the statistical relationship may not be stationary due to the temporal variabilities involving trends, oscillations, and sudden shifts, which can be influenced by the land cover changes (Nicholls 2001, Sveinsson et al 2003, Mishra and Singh 2011).Therefore, efforts are required to utilize the predictability associated with these nonstationary behaviors (NRC 2010) and enhance existing models to incorporate nonstationary features for drought prediction (Hao et al 2018).Lastly, besides the 2022-23 mega drought over the YRB, further studies should be conducted to determine whether the real-time prediction framework can provide a skillful prediction for drought events across different regions and seasons.

Figure 1 .
Figure 1.Illustration of the drought probabilistic prediction with the copula method.The blue dots represent the sample points with similar initial soil moisture percentile (SMP) of the 2022-23 mega drought during the historical period (1961-2021) at 27 • N, 115.5 • E. The green shading is the joint probability distribution function (PDF) fitted with copula method, and the red lines show the drought threshold (SMP = 20%).The left blue (right gray) lines are the conditional PDF of the drought index under cumulative precipitation percentile (CPP) equal to 25% (50%), and the integral values below the red lines (black numbers) represent the probabilities of drought occurrences under different CPPs.

Figure 2 .
Figure 2. The evolution process of the 2022-23 mega drought over the Yangtze River basin (YRB).(a)-(f) are the spatial patterns of SMP during Jun 2022-April 2023 at a monthly timescale.(g) Time series of drought area percentages (%) over YRB from Jun 2022 to May 2023.The dashed lines are the thresholds for distinguishing the onset (50%) and recovery (40%) of drought.(h) Time series of area averaged SMP (%) over YRB from Jun 2022 to May 2023.The red asterisks, blue circles and black squares show the onset, persistence and recovery stages, respectively.
)). Subsequently, precipitation first occurred over the upper YRB (figure S1(b)) and the drought entered the persistence stage.In October 2022, the SMP increased to above 20% over most areas of the upper YRB (figure 2(c)), while the drought over the middle and lower YRB persisted and finally recovered until April 2023.In summary, the 2022-23 mega soil drought broke out rapidly over the whole YRB with the extreme precipitation deficit and finally developed into a long-lasting drought over the central region, the period of July-August 2022 was identified as the onset stage, September 2022-March 2023 was the persistence stage, and April 2023 was the recovery stage.Focused on the 2022-23 mega soil drought over the YRB, we established the probabilistic prediction framework introduced in section 2.3 and assessed its accuracy with the observed CPP provided by ERA5 (figures S1(a)-(e)).The drought probability was displayed in figures S3(a)-(e), and it was discovered that there is a high likelihood of drought

Figure 3 .
Figure 3.The spatial patterns of drought occurrence (SMP ⩽ 20%) probability under the climatology cumulative precipitation ((a)-(e); CLIP; CPP = 50%) and ensemble average cumulative precipitation predicted by NMME models (f)-(j) with the lead time of one month.
(a)), which means that the CLIP experiment is unable to predict the onset of this mega drought.However, the central region over the YRB shows a high probability for experiencing drought after August (figures 3(b)-(d)) and the accuracy of drought prediction greatly improved accordingly (figure 4(a)), with the BS decreasing from 0.5 in onset stage to 0.21 in the persistence and recovery stages (figures 4(a) and (d)).In comparison to the CLIP experiment, the NMME models can provide more accurate precipitation prediction (figures S1 and S2), which ensures higher predictive skills for drought evolution at the onemonth lead time (figure 4(a)), particularly during the onset stage (figure 4(d)).The CanSIPS model can well predict the spatiotemporal characteristics of CPP (figures S1(k)-(o)), thus roughly capturing the evolution process of 2022-23 mega drought (figures S3(f)-(j)).However, the COLA and NASA models overestimated the precipitation over the middle and lower YRB (figures S1(p) and (u)), so the two models only captured the drought onset over the upper YRB (figures S3(k) and (p)).Subsequently, all three models overpredicted the precipitation in February 2023, and their corresponding predictive skills were low (figures 4(a) and (e)).During the recovery stage, only the CanSIPS model outperforms the CLIP experiment (figure 4(a)).As for the NMME multi-model ensemble, it well captured the drought evolution process and indicated a probability of drought onset exceeding 90% over most of the YRB in August (figures 3(f)-(j)).Compared to the onset stage, the accuracy of drought prediction during

Figure 4 .
Figure 4. Bar charts of the Brier skill (BS; (a)-(d)) and Brier skill score (BSS; (e)-(h)) for the NMME models with the lead times of one month ((a), (e)), two months (b), (f) and three months (c), (g) during the drought period.The results of BS and BSS at different drought stages and different lead times ((d), (h)).The red, orange and blue asterisks represent the CanSIPS-IC3, COLA-RSMAS-CCSM4 and NASA-GEOSS2S models, respectively.The blue bars represent the CLIP experiment, and the khaki bars represent the ensemble average of three models (NMME).
, while the BS and BSS of drought prediction are shown in figures 4(b)-(d) and (f)-(h).It was found that the prediction accuracy of CLIP and NMME models both decreased rapidly as the lead time increased.At the lead time of two (three) months, the BS of NMME models during July (July-August) is almost equivalent to the CLIP experiment (figures 4(b) and ( 2 to 0.3 during the onset, persistence (figure 4(d)), and recovery stages (figures 4(a)-(c)), respectively.Meanwhile, the BSS values decreased from 0.26 to 0.2 during the drought period (including onset and persistence stages; figure 4(h)), and from 0.05 to −0.01 during recovery stage (figures 4(e)-(g)).