Improvement of disastrous extreme precipitation forecasting in North China by Pangu-weather AI-driven regional WRF model

In weather forecasting, Artificial Intelligence (AI) represents a transformative approach. However, AI weather forecasting methods still struggle to accurately predict meso- and smaller-scale processes and fail to directly capture extreme precipitation, owing to the nature of regression algorithms, coarse resolution, and limitations in key variables such as precipitation. We therefore propose an approach that integrates the strengths of Pangu-weather AI weather forecasting with a traditional regional weather model, focusing specifically on the prediction of extreme precipitation events. The approach is exemplified mainly by an unprecedented precipitation event in North China from 29 July to 1 August 2023, with an additional extraordinary precipitation event used as supplementary validation. The results show that the AI-driven approach exhibits superior performance in capturing the spatial and temporal dynamics of extreme precipitation events. Remarkably, at a threshold of 400 mm, the AI-driven model attains a Threat Score (TS) of 0.1 at forecast lead times of up to 8.5 d. This notably surpasses the traditional GFS-driven model, which achieves a similar TS only within a 3-day forecast lead time. This considerable gain in forecast accuracy, especially at extended lead times, illustrates the potential of the AI-driven model to advance long-lead forecasts of extreme precipitation, previously considered challenging, and emphasizes the potential of AI to augment and refine traditional weather prediction.


Introduction
The integration of Artificial Intelligence (AI) technologies into meteorological forecasting marks a profound paradigm shift in the field (Dueben and Bauer 2018, Weyn et al 2019, Zhang et al 2023). This development is evident in various forecasting methods (Zhang et al 2023), particularly in medium-range weather forecasting (Pathak et al 2022, Chen et al 2023), where notable innovations have been observed. A prominent example of this progress is Pangu-weather AI (hereafter Pangu; Bi et al 2023), a sophisticated system that applies advanced deep learning algorithms to improve forecast accuracy and efficiency, especially for large-scale meteorological events with severe weather conditions. Pangu has demonstrated superior performance over established numerical global forecasting models in certain instances, underscoring AI's transformative impact on forecast accuracy and its contribution to meteorological science.
However, the application of AI in medium-range global weather forecasting faces certain limitations (Pathak et al 2022, Bi et al 2023, Tian et al 2024).
A key concern is the model's handling of extreme precipitation. An essential challenge in advanced AI weather forecasting is its limited efficacy in accurately predicting extreme precipitation, which arises because the training data consist predominantly of reanalysis data that have inherently coarse resolution and inadequately represent extreme precipitation events. Moreover, deep learning methods tend to produce smoother forecast outputs, which may lead to an underestimation of the severity of extreme precipitation. Accurate forecasting of extreme precipitation events is vitally important for disaster prevention and mitigation (Deng et al 2022, Liu et al 2023), as demonstrated by the recent catastrophic event in North China from 29 July to 1 August 2023, which resulted in substantial human and property losses (CMA 2023). The current inability of AI weather forecasting to precisely predict such severe weather conditions stresses the urgent need for advancements in this technology.
In contrast, traditional regional weather models excel in capturing finer atmospheric structures (Zhang and Zhang 2011) and enhancing the spatial resolution (Vukicevic et al 2022). One of the most widely used is the Weather Research and Forecasting (WRF) model, which is extensively employed for kilometer-scale (Xu et al 2022, 2023) and sub-kilometer-scale forecasting (Bryan et al 2003, Xu and Duan 2022). This capability is key for accurately representing meso- and small-scale meteorological events and effectively improving extreme precipitation forecasts (Ding et al 2022). These models, designed to supplement the lower resolution of global models, provide a comprehensive array of meteorological variables and offer a more detailed view of atmospheric conditions (Dowell et al 2022). Hence, regional model output is pivotal in mitigating the limitations of AI forecasting models, which may not capture all essential weather elements or precisely predict localized weather events. Nevertheless, traditional regional numerical models, driven by global numerical forecasts, continue to face constraints in forecasting extreme precipitation, with a limited skillful lead time. The question arises: can AI alter this existing landscape?
This study aims to integrate the WRF model to address the challenges of AI weather forecasting mentioned above. Although AI has been applied in numerical modeling for postprocessing, hybrid modeling, and purely data-driven weather forecasting (Rasp et al 2020), to the best of our knowledge, this work is the first to improve the extreme precipitation forecast ability of an AI weather forecasting model by combining AI's computational efficiency and advanced data processing with the WRF model's detailed, physically based insights. Specifically, AI weather forecasts are used as initial fields and lateral boundary conditions to drive the WRF regional model, a framework referred to as AI-Driven WRF, as opposed to the traditional approach driven by a numerical global model, referred to as NWP-Driven WRF.

Dataset
In this study, the China Meteorological Administration (CMA) precipitation analysis dataset was employed to validate model precipitation. This dataset, noted for its high spatiotemporal resolution of 1 km and 1 h intervals, amalgamates data from over 30 000 rain gauge stations across China with CMORPH satellite products (Shen et al 2014). The ECMWF ERA5 reanalysis, used for analyzing the synoptic environment and atmospheric circulations during the extreme rainfall event as well as for driving the Pangu model, integrates a diverse range of observations through sophisticated modeling and data assimilation techniques (Hersbach et al 2020). The National Centers for Environmental Prediction (NCEP) Global Upper Air Observations, an extensive collection of surface and upper air reports operationally gathered by the NCEP, were employed to evaluate the accuracy of our model forecasts across different atmospheric elements. Additionally, the CMA Best Track Archive provided data for verifying the tropical cyclone track.

Pangu-weather forecasting
In our study, we adopted the version of the Pangu model described in detail by Bi et al (2023), which operates at a spatial resolution of 0.25° × 0.25° and a temporal resolution of 6 h. This version was trained on 39 years of ERA5 reanalysis data, from 1979 to 2017. Upon completion of training, it operates as a forecasting system that initiates forecasts or simulations given the initial conditions. The choice by Bi et al (2023) to use 2019 data for validation and 2018 data for testing, aimed at a direct comparison with FourCastNet2, was designed to demonstrate the Pangu model's forecasting superiority. Because the test year lies between the training and validation datasets, this methodology is not considered best practice: it introduces the potential for data leakage and overfitting in the evaluation of the test data, whereas test data are meant to assess model performance in the unknown future. However, Bi et al (2023) performed an additional analysis for the years 2020 and 2021, as shown in their Extended Data figure 4, finding that the data leakage/overfitting led to a measurable though not serious decrease in forecasting accuracy, thereby affirming the model's promising performance. Importantly, this methodology has undergone experimental testing and continuous verification at ECMWF and the CMA, where the Pangu model's forecasts demonstrated promising results. For example, current forecasts by Pangu can be viewed at the ECMWF's website (https://charts.ecmwf.int/?query=PANGU).

WRF model
For this study, the Advanced Research WRF model, version 4.2.1, was integrated with Pangu, with a particular focus on the Typhoon Doksuri-induced extreme rainfall event. The WRF model configuration included two nested domains: a larger domain with a 12 km grid spacing covering a substantial portion of East Asia, and a finer, inner domain with a 3 km grid spacing targeting the North China region (figure S1).
The model, with 50 vertical levels extending from the surface to 50 hPa, effectively captured the spatial distribution, intensity, and temporal dynamics of the extreme precipitation event. This configuration enabled a comprehensive reproduction of the rainfall process, positioning the WRF model as a robust comparative framework for analyzing the impacts of AI-driven regional modeling on extreme rainfall events. The detailed physics options of the WRF model are given in the supporting information.

Experimental design
This study encompassed nine experiments (refer to table S1) focused on the extreme rainfall event in North China, to assess the comparative efficacy of Pangu and the NCEP GFS forecasts when used as initial conditions for the WRF model. GFS was selected as the reference because of its open accessibility and prevalent application in driving regional models. The experiment commences with the Pangu model, initialized with ERA5 reanalysis data. Key initialization times are designated as 12:00 UTC on 23 July 2023 (termed Pangu_2312), 00:00 UTC on 26 July 2023 (termed Pangu_2600), and 12:00 UTC on 28 July 2023 (termed Pangu_2812), selected for their relevance to the study period. Subsequently, Pangu-weather AI generates forecasts for 5.5 d, 3 d, and 0.5 d ahead of 00:00 UTC on 29 July 2023. These forecasts are then employed as the initial and boundary conditions to drive the WRF model simulations, termed WRF_Pangu_2312, WRF_Pangu_2600, and WRF_Pangu_2812, respectively.
In a parallel setup, the WRF model is also run using forecasts from the GFS for identical lead times, labeled GFS_2312, GFS_2600, and GFS_2812. These WRF runs, correspondingly termed WRF_GFS_2312, WRF_GFS_2600, and WRF_GFS_2812, serve as a control for evaluating the performance of the AI-driven forecasts. The effectiveness of employing Pangu-weather AI forecasts as initial conditions is assessed through various metrics, including the accuracy of precipitation forecasts, the intensity of weather events, and the spatial-temporal congruence of forecasted weather phenomena with observed data. This systematic comparative analysis aims to discern the relative strengths and weaknesses of AI-driven forecasts in the context of regional weather modeling.
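The lead times quoted in this section follow directly from the initialization times. The short sketch below tabulates them for the six WRF runs named in the text. The names `WRF_START`, `ACCUM_END`, `INIT_TIMES`, `lead_days` and `experiment_table` are illustrative and not part of the study's code, and counting a second lead time to the end of the 72 h accumulation window is our assumption about how the longest lead time could be interpreted.

```python
from datetime import datetime, timedelta

# Times taken from the experimental design (all UTC).
WRF_START = datetime(2023, 7, 29, 0)          # start of the WRF simulations
ACCUM_END = WRF_START + timedelta(hours=72)   # end of the 72 h accumulation window

# Global-forecast initialization times and their tags (day + hour).
INIT_TIMES = {
    "2312": datetime(2023, 7, 23, 12),
    "2600": datetime(2023, 7, 26, 0),
    "2812": datetime(2023, 7, 28, 12),
}


def lead_days(init, target):
    """Lead time in days between an initialization time and a target time."""
    return (target - init).total_seconds() / 86400.0


def experiment_table():
    """List the six WRF runs with the lead times of their driving forecasts."""
    rows = []
    for tag, init in INIT_TIMES.items():
        for driver in ("Pangu", "GFS"):
            rows.append({
                "global": f"{driver}_{tag}",
                "regional": f"WRF_{driver}_{tag}",
                "lead_to_wrf_start_d": lead_days(init, WRF_START),
                # Assumption: lead time counted to the end of the rainfall window.
                "lead_to_accum_end_d": lead_days(init, ACCUM_END),
            })
    return rows


if __name__ == "__main__":
    for row in experiment_table():
        print(row)
```

Under this assumption, the earliest initialization (12:00 UTC on 23 July) lies 5.5 d before the WRF start and 8.5 d before the end of the accumulation window.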

Evaluation metrics
To evaluate the accuracy of initial conditions, the root mean square error (RMSE) and Bias are employed (Vergara-Temprado et al 2020). They are defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)^2} \tag{1}$$

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right) \tag{2}$$

where $O_i$ denotes observed values, $P_i$ predicted values, and $N$ the count of observations. Furthermore, the efficacy of quantitative precipitation forecasting is measured using the Threat Score (TS), Hit Rate and False-Alarm Rate, calculated as follows:

$$\mathrm{TS} = \frac{\mathrm{Hits}}{\mathrm{Hits} + \mathrm{Misses} + \mathrm{FalseAlarms}} \tag{3}$$

$$\mathrm{HitRate} = \frac{\mathrm{Hits}}{\mathrm{Hits} + \mathrm{Misses}} \tag{4}$$

$$\mathrm{FalseAlarmRate} = \frac{\mathrm{FalseAlarms}}{\mathrm{FalseAlarms} + \mathrm{CorrectNegatives}} \tag{5}$$

For TS and HitRate, a value from 0 to 1 reflects the forecast's accuracy, with 1 representing perfect accuracy. Conversely, for the FalseAlarmRate, a value approaching 0 indicates higher forecast precision. Hits correspond to the number of correctly forecasted events, Misses to the number of events that occurred but were not forecasted, FalseAlarms to the number of non-occurring events that were incorrectly forecasted, and CorrectNegatives to the number of non-occurring events correctly forecasted. The HitRate and the FalseAlarmRate are the metrics used to construct the receiver operating characteristic (ROC) diagram.
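As an illustration of how these metrics can be evaluated on gridded or station data, a minimal NumPy sketch follows. It uses the definitions of O_i, P_i and N given in the text and the contingency-table counts for TS, HitRate and FalseAlarmRate. The function names, the forecast-minus-observation sign convention for Bias, and the use of `>=` to define an event at a given precipitation threshold are our assumptions, not details confirmed by the study.

```python
import numpy as np


def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def bias(pred, obs):
    """Mean error, assuming the convention predicted minus observed."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.mean(pred - obs))


def contingency(pred, obs, threshold):
    """Return (hits, misses, false_alarms, correct_negatives) for one threshold.

    Assumption: an event is a value greater than or equal to the threshold.
    """
    p = np.asarray(pred) >= threshold
    o = np.asarray(obs) >= threshold
    hits = int(np.sum(p & o))
    misses = int(np.sum(~p & o))
    false_alarms = int(np.sum(p & ~o))
    correct_negatives = int(np.sum(~p & ~o))
    return hits, misses, false_alarms, correct_negatives


def _ratio(numerator, denominator):
    """Safe division; returns NaN when the score is undefined."""
    return numerator / denominator if denominator > 0 else float("nan")


def precipitation_scores(pred, obs, threshold):
    """Threat Score, Hit Rate and False-Alarm Rate for one threshold."""
    h, m, f, c = contingency(pred, obs, threshold)
    return {
        "TS": _ratio(h, h + m + f),
        "HitRate": _ratio(h, h + m),
        "FalseAlarmRate": _ratio(f, f + c),
    }


if __name__ == "__main__":
    # Toy accumulated-rainfall values (mm); purely illustrative numbers.
    obs = [10, 120, 300, 450, 50, 500]
    pred = [20, 90, 350, 420, 260, 480]
    print(rmse(pred, obs), bias(pred, obs))
    print(precipitation_scores(pred, obs, threshold=250))
```

Returning NaN when a denominator is zero keeps thresholds at which no event was observed or forecast from being mistaken for a score of zero.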

Meteorological overview
From 29 July to 1 August 2023, Typhoon Doksuri hit North China and caused extreme precipitation, flooding, and landslides. This catastrophic event impacted nearly 1.29 million people across the affected areas and resulted in 33 fatalities in Beijing. The intensity of this precipitation event was striking, particularly under the influence of complex terrain and favorable weather conditions (figures 1(a)-(c)). Beijing received an average rainfall of 276.5 mm, peaking at 744.8 mm, while Hebei province reported an average of 153.2 mm, with a maximum of 1003.0 mm. Precipitation at 26 meteorological stations broke historical records. These amounts surpassed the maximum precipitation recorded during the previous destructive extreme rainfall events in North China in 1996, 2012, and 2016 (CMA 2023, Zhao et al 2024).
Figure 1(c) depicts the precipitable water, 850 hPa wind, and 500 hPa geopotential height as of 0000 UTC on 30 July 2023, illustrating a complex weather situation characterized by various dynamic atmospheric and meteorological factors. Firstly, the western Pacific subtropical high (WPSH), delineated by the 588 dgpm contour line, extended its influence across eastern and northern China, the Korean Peninsula, and Japan. Within this atmospheric configuration, Doksuri is discernible, situated at approximately 35°N in North China. Notably, its quasi-stationary movement was constrained by the overarching presence of the WPSH. This atmospheric alignment fostered conditions conducive to vertical air motions, which in turn facilitated moisture condensation, culminating in heavy precipitation.
Concurrently, two prominent moisture flows towards North China complicated the atmospheric dynamics (figures 1(d) and (e)). A monsoonal southerly flow generated a moisture pathway from the south, while a southeasterly flow channeled a second moisture influx into the extreme rainfall region. These converging low-level winds, shaped by the topography in figure 1(b), induced an unexpected moisture aggregation over North China, leading to the extreme rainfall event. Overall, the persistence of moisture transport, together with the WPSH and the track of Typhoon Doksuri, was instrumental in the genesis of this extreme rainfall event.
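For readers who wish to reproduce a diagnostic of this kind, the sketch below shows one common way to compute a column-integrated moisture flux from pressure-level data, namely Q = (1/g) ∫ q V dp evaluated with the trapezoidal rule. This is a generic formulation given under stated assumptions and is not necessarily the exact procedure used for figures 1(d) and (e). The function name, the choice of pressure levels, and the integration bounds are hypothetical.

```python
import numpy as np

G = 9.80665  # standard gravity (m s^-2)


def column_moisture_flux(q, u, v, p):
    """Vertically integrated moisture flux from pressure-level fields.

    q    : specific humidity (kg kg^-1), first axis = pressure level
    u, v : wind components (m s^-1), same shape as q
    p    : 1-D pressure levels (Pa), one value per level
    Returns (Qu, Qv, |Q|) in kg m^-1 s^-1.
    """
    q, u, v, p = (np.asarray(a, dtype=float) for a in (q, u, v, p))

    # Sort levels by increasing pressure so that every layer thickness is positive.
    order = np.argsort(p)
    q, u, v, p = q[order], u[order], v[order], p[order]

    # Layer thicknesses, reshaped to broadcast over any trailing (lat, lon) axes.
    dp = np.diff(p).reshape((-1,) + (1,) * (q.ndim - 1))

    def integrate(field):
        # Trapezoidal rule over pressure, divided by g.
        return np.sum(0.5 * (field[1:] + field[:-1]) * dp, axis=0) / G

    qu = integrate(q * u)
    qv = integrate(q * v)
    return qu, qv, np.hypot(qu, qv)


if __name__ == "__main__":
    # Idealized column: uniform humidity and a purely zonal wind.
    levels = np.array([100000.0, 85000.0, 70000.0, 50000.0, 30000.0])
    q = np.full(levels.shape, 0.01)
    u = np.full(levels.shape, 10.0)
    v = np.zeros(levels.shape)
    print(column_moisture_flux(q, u, v, levels))
```

The units work out as (kg kg^-1)(m s^-1)(Pa)/(m s^-2) = kg m^-1 s^-1, matching the units quoted in the figure captions.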

Performance of Pangu
Before employing Pangu forecast data as initial conditions for WRF simulations, a rigorous evaluation of Pangu's predictive accuracy at the WRF initialization time is imperative. Such an analysis informs our understanding of the error propagation in WRF simulations driven by Pangu data. Given the pivotal role of moisture transport in the analyzed extreme precipitation event, our initial focus is on the fidelity of Pangu's total column water vapor forecasts.
Figures 2(a)-(f) illustrate the column-integrated moisture flux from the Pangu model at lead times of 5.5, 3, and 0.5 d ahead of 00:00 UTC on 29 July 2023. A notable stability in the distribution of column-integrated moisture flux is observed, especially in the vicinity of Typhoon Doksuri in North China. Importantly, the difference in moisture flux between the Pangu model and the ERA5 reanalysis grows with longer forecast lead times, highlighting the model's sensitivity to lead time. For example, the Pangu_2812 (figures 2(a) and (d)) case shows a pattern of moisture transport closely resembling that of ERA5, whereas the Pangu_2600 (figures 2(b) and (e)) experiment reveals a marked southward displacement of Typhoon Doksuri. This displacement is associated with an increased difference from ERA5 in and around North China, reduced moisture transport, and intensified cyclonic circulation, likely indicative of an overestimation of the storm's intensity. Interestingly, although the discrepancy is more pronounced in the Pangu_2312 (figures 2(c) and (f)) experiment than in Pangu_2600, the area of greatest difference from ERA5, particularly around North China, shows relatively little variation, underscoring the Pangu model's stability and predictable performance in critical regions.
In stark contrast, the GFS forecast lacks such stability in the distribution of column-integrated moisture flux (figure S2), displaying notable shifts in error magnitude and spatial distribution from GFS_2600 (figures S2(b) and (e)) to GFS_2312 (figures S2(c) and (f)). The consistent error behavior of the Pangu model, as opposed to the variability seen in GFS forecasts, not only underscores Pangu's potential robustness and dependability as a forecast initiator but also presents an opportunity for further improvement through targeted error correction strategies.
Figures 2(g)-(l) present a comparative analysis of the forecasting accuracy of Pangu and the operational NCEP GFS using the RMSE and Bias metrics. Our focus is on upper-atmospheric variables, including specific humidity, temperature, and geopotential height. Both forecasts were validated against radiosonde data from the NCEP global upper air observations (NCEP 2008). For a 12 h forecast, Pangu and the operational GFS display comparable RMSE and Bias values, indicating a similar overall forecasting accuracy in short-term predictions. However, closer examination reveals subtle yet measurable differences in their performance. Specifically, Pangu_2812 consistently shows a reduced error in lower atmospheric humidity (figure 2(g)) and in geopotential fields above 925 hPa (figure 2(i)). This suggests that Pangu has an enhanced ability to predict water vapor distributions and atmospheric structures, albeit with only marginal differences in RMSE.
Although both models exhibit comparable overall Bias and RMSE values, significant distinctions in bias levels become evident upon closer inspection. Specifically, within the lower troposphere, Pangu_2812 demonstrates a notably lower bias than GFS_2812 in predicting both humidity and geopotential height (figures 2(j) and (l)). The larger bias of GFS_2812, primarily observed in the boundary layer, can largely be attributed to the boundary layer parameterization used by numerical models. AI models such as Pangu operate without reliance on sub-grid scale parameterizations, a key distinction that potentially contributes to their enhanced forecasting performance and accuracy.
As the forecast lead time increased, a rise in RMSE and Bias was notable for both Pangu and GFS, reflecting a typical trend in atmospheric forecasting. However, Pangu exhibited a notably smaller escalation in error compared to GFS, especially in forecasting moisture and geopotential height (figures 2(g)-(i)). This smaller error growth, relative to traditional numerical models, underscores Pangu's enhanced stability and reliability in medium-range forecasting scenarios. This observation aligns with the findings of Bi et al (2023), who reported on Pangu's accuracy in extended weather forecasting, substantiated by verification data from 2018 and 2019. Our analysis, focusing on an extreme rainfall event, further underscores Pangu's potential advantages in longer-term forecasting. Its proficiency in predicting moisture content and geopotential fields over extended periods differentiates Pangu from conventional numerical weather prediction (NWP) models, highlighting its robustness and reliability. These attributes reinforce Pangu's potential to improve medium-to-long-range atmospheric forecasts, especially of large-scale circulation and moisture transport.
However, our analysis reveals a marked escalation in bias, particularly in humidity forecasts, as demonstrated in the Pangu_2312 and Pangu_2600 simulations (figure 2(j)). This rise in bias, especially notable in the underestimation of Typhoon Khanun's intensity as depicted in figure 2(d), likely stems from the coarser resolution inherent to Pangu, coupled with the tendency of deep learning algorithms to yield smoother outputs than might be realistically expected. This situation emphasizes the crucial need for advancements in AI model simulations at finer scales.

Precipitation patterns
Figure 3 presents the spatial distribution of 72 h accumulated precipitation across the North China region, from 0000 UTC 29 July to 0000 UTC 1 August 2023. This analysis incorporates simulations driven by a range of initial and lateral boundary conditions, derived from both Pangu and GFS forecasts with varying lead times, as described in section 2. The simulations exhibit marked variability in the distribution of cumulative precipitation over the 72 h period, strongly influenced by the driving global forecast and its lead time.
In the WRF_Pangu_2812 (figure 3(c)) and WRF_GFS_2812 (figure 3(f)) experiments, the simulated distribution of precipitation in North China closely aligns with the observed values. Notably, the WRF_Pangu_2812 simulation exhibits a broader spatial coverage of intense precipitation, exceeding 250 mm, compared to WRF_GFS_2812. While both simulations tend to overestimate precipitation in the western part of Hebei province (figure 1(b)), they generally replicate the observed spatial patterns and intensity of precipitation effectively, as evidenced by the correlation coefficients and RMSE values for WRF_Pangu_2812 (Corr: 0.76 and RMSE: 86.2 mm, figure 3(c)) and WRF_GFS_2812 (Corr: 0.68 and RMSE: 96.4 mm, figure 3(f)). This underscores the effectiveness of both GFS and Pangu in short-term forecasting, highlighting their ability to accurately predict severe weather events within a limited timeframe. The WRF_Pangu_2312 (Corr: 0.67 and RMSE: 104.7 mm, figure 3(a)) and WRF_Pangu_2600 (Corr: 0.40 and RMSE: 137.6 mm, figure 3(b)) simulations still exhibit a commendable performance. A case in point is the WRF_Pangu_2312 simulation, which, despite projecting a lower overall intensity and extent of precipitation, accurately delineates the north-south zones of heavy precipitation and successfully forecasts peak precipitation amounts surpassing 400 mm.
In stark contrast, the WRF_GFS_2312 (Corr: −0.18 and RMSE: 198.7 mm, figure 3(d)) and WRF_GFS_2600 (Corr: 0.45 and RMSE: 132.2 mm, figure 3(e)) simulations reveal notable shifts in precipitation patterns when compared to the WRF_GFS_2812 simulation (figure 3(f)). Specifically, the WRF_GFS_2600 simulation tends to concentrate precipitation predominantly in the northeastern regions of North China, clearly underestimating the intense rainfall in southern Hebei. More prominently, the WRF_GFS_2312 simulation exhibits an almost total lack of precipitation coverage in North China, further highlighting the variations in forecast accuracy among these simulations. This discrepancy underscores the relative stability and reliability of the AI-driven WRF model in forecasting this extreme rainfall event, compared to the traditional GFS-driven simulations.
In this assessment, the WRF_Pangu_2812 and WRF_GFS_2812 simulations exhibit noteworthy proficiency, closely mirroring observed data and effectively capturing the temporal dynamics of the precipitation (figure 4(a)). A key aspect of their superior performance is reflected in their TS values, which exceed 0.1 at precipitation thresholds of 600 mm in WRF_Pangu_2812 and 500 mm in WRF_GFS_2812 (figure 4(b)), indicating a robust capability in forecasting intense rainfall. Moreover, the WRF_Pangu_2600 and WRF_Pangu_2312 simulations, while not as precise as the WRF_Pangu_2812 simulation, still demonstrate promising accuracy. These simulations effectively replicate the precipitation patterns, with WRF_Pangu_2600 achieving a TS of 0.1 at a 450 mm threshold, and WRF_Pangu_2312 at a 400 mm threshold. Despite scoring lower on the TS than WRF_Pangu_2812, their performance remains noteworthy, indicating promising predictive capacity in extreme rainfall forecasting.

Quantitative assessment of extreme precipitation
Conversely, the WRF_GFS_2600 and WRF_GFS_2312 simulations show a marked decline in accuracy for long-lead extreme rainfall forecasting. These simulations deviate clearly from the observed data (figure 4(a)), with WRF_GFS_2600 achieving a TS of 0.1 (a value determined through empirical judgment) for only a 200 mm threshold, and WRF_GFS_2312 exhibiting minimal skill above 50 mm (figure 4(b)). Analysis of the area under the curve (AUC) in the ROC diagram, where a higher AUC value indicates better performance, confirms these findings, aligning with the TS evaluations (figure 4(c)). This difference highlights the challenges of using regional numerical models driven by traditional global numerical models for extended-range extreme rainfall forecasting.
Furthermore, our examination extends to the Hit Rate and False-Alarm Ratio (figure S4), which reinforce the insights gained from the TS (figure 4(b)) and ROC (figure 4(c)) evaluations, highlighting the efficacy of the AI-Driven approach. It is worth noting that the WRF_Pangu_2600 simulation exhibited an increased False-Alarm Ratio, a consequence of spatial mismatches in predicted precipitation within the northern regions. This discrepancy slightly reduced its TS and ROC metrics when compared to the WRF_Pangu_2312 simulation, emphasizing the challenges and nuances in accurately forecasting extreme weather events.
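To make the AUC comparison concrete, the following sketch integrates a ROC curve with the trapezoidal rule, given a set of (False-Alarm Rate, Hit Rate) pairs. How the ROC points were generated in this study (for example, one point per precipitation threshold) is not specified in the text, so the function takes the points as input rather than assuming a construction. The function name `roc_auc` and the anchoring of the curve at (0, 0) and (1, 1) are our assumptions.

```python
import numpy as np


def roc_auc(false_alarm_rates, hit_rates):
    """Area under a ROC curve defined by (FalseAlarmRate, HitRate) points.

    The curve is anchored at (0, 0) and (1, 1), the points are sorted by
    FalseAlarmRate (then HitRate), and the area is computed with the
    trapezoidal rule.
    """
    points = [(0.0, 0.0), (1.0, 1.0)]
    points += [(float(f), float(h)) for f, h in zip(false_alarm_rates, hit_rates)]
    points = sorted(points)

    x = np.array([pt[0] for pt in points])
    y = np.array([pt[1] for pt in points])
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


if __name__ == "__main__":
    # Illustrative points only; not values from the study.
    print(roc_auc([0.05, 0.2, 0.5], [0.4, 0.8, 0.95]))
```

An AUC of 0.5 corresponds to the no-skill diagonal and 1.0 to a perfect forecast, which is why a larger AUC is read as better performance.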

Assessment of circulation patterns and the track of Typhoon Doksuri
Our findings from the WRF model simulations, particularly those driven by the Pangu AI system, underscore the crucial impact of initial errors (figures 2 and S2) on the overall forecast accuracy. In the case of the WRF model driven by Pangu, the errors at the initial time of the simulations increased with lead time. Notably, however, the location of these errors remained relatively consistent across different forecast periods. This consistency in error distribution likely contributes to the stability of the Pangu-driven WRF simulations.
This observation is further supported by a comparison of the simulated tracks from our WRF experiments with the CMA best-track data for Typhoon Doksuri from 00:00 UTC on 29 July to 00:00 UTC on 30 July. A critical aspect of this comparison is how strongly the initial position of the typhoon influences its projected path. For instance, in the WRF_Pangu_2600 and WRF_Pangu_2312 simulations, the consistent accuracy in the initial location of Typhoon Doksuri (figure 4(d)), combined with more precise predictions of the geopotential height and humidity fields (figures 2(g) and (i)), likely contributed to their superior performance in forecasting extreme precipitation, compared to the WRF_GFS_2600 and WRF_GFS_2312 simulations (figure 3).
In addition to accurately capturing the typhoon's position, the WRF_Pangu_2312 (figure S3(a)) and WRF_Pangu_2600 (figure S3(b)) experiments produced circulation patterns and moisture transport in the vicinity of the rainfall area that were more stable and closer to the reanalysis, particularly the circulation of the WPSH on the eastern side of North China, compared to the WRF_GFS_2600 (figure S3(d)) and WRF_GFS_2312 (figure S3(e)) experiments.

Conclusions and discussion
The integration of AI into meteorological forecasting, as exemplified by the implementation of Pangu-weather AI, represents an initial exploration in the field. However, the capability of AI to accurately predict extreme precipitation events, such as the unprecedented rainfall in North China during late July and early August 2023, still requires further investigation. The essence of this research lies in driving the WRF model with Pangu AI forecasts (AI-Driven WRF) rather than with traditional global NWP forecasts (NWP-Driven WRF). Our comparative analysis of these two forms of input, Pangu AI for the AI-Driven WRF (Pangu-WRF) and the traditional NCEP GFS model for the NWP-Driven WRF (GFS-WRF), reveals measurable differences in forecast accuracy. Pangu-WRF simulations demonstrate enhanced precision in predicting rainfall patterns for extreme events, outperforming traditional GFS-WRF forecasts with accuracy maintained up to 8.5 d (figures 4(b) and (c)). This improvement is quantitatively supported by a TS of 0.1 for Pangu-WRF at 8.5 d, surpassing the NWP-Driven WRF model, which only achieves a TS of 0.1 within a 3 day forecast.
This enhanced forecasting accuracy benefits from Pangu's smaller error growth over time and the relative stability of its error locations compared to the GFS model. These differences in error dynamics are evident not only in the North China case study but also in another notable extreme precipitation event in China from 19 to 21 July 2021, as detailed in our supplementary information (figures S5-S8). This successful integration of Pangu's large-scale forecasting strengths with WRF's detailed modeling capabilities contrasts sharply with the GFS-driven simulations, which display considerable inaccuracies in estimating rainfall intensity and distribution. Given the remarkable computational efficiency gains of AI over traditional global models, surpassing 10 000-fold improvements, the potential for AI in meteorology is immense (Bi et al 2023, Chen et al 2023, Lam et al 2023). This study not only highlights the enhanced accuracy and efficiency of AI-driven weather forecasting models but also underscores their critical role in advancing meteorological forecasting techniques. Moreover, by improving the accuracy and extending the lead time of forecasts for extreme weather events, AI-driven regional models have the potential to significantly enhance societal preparedness, contributing to more effective disaster risk reduction and management strategies.
While the forecast accuracy demonstrated by the AI-Driven WRF is encouraging, our study's limitations must be acknowledged. Firstly, this study is based on the analysis of only two cases, which are among the most notable instances of extreme precipitation in the China region. The robustness and general applicability of this integrated method need further validation across various weather events and conditions. Secondly, regional model configurations may influence outcomes, underscoring the necessity for a detailed exploration of these effects. Thirdly, using ERA5 data to drive the Pangu model might introduce biases compared to GFS-initialized forecasts. Future research should involve more extensive case studies, diverse initial conditions to drive Pangu, and an exploration of the efficacy of ensemble Pangu forecasts in driving the WRF model, in order to fully assess the strengths and limitations of integrating AI-driven forecasts with regional weather models and thereby enrich our understanding and application of these advanced forecasting techniques.

Figure 1. (a) Cumulative 72 h precipitation (units: mm) from 0000 UTC on 29 July 2023 to 0000 UTC on 1 August 2023 from the China Meteorological Administration dataset, (b) spatial distribution of terrain elevation (units: m), and (c) precipitable water (shaded; units: mm), 850 hPa wind fields (vectors; units: m s⁻¹), and 500 hPa geopotential height (contours; units: dgpm), as of 0000 UTC on 29 July 2023. Column-integrated moisture flux (vectors; units: kg m⁻¹ s⁻¹) and its corresponding magnitude (shading; units: kg m⁻¹ s⁻¹) for the ERA5 dataset at (d) 0000 UTC on 29 July and (e) 0000 UTC on 30 July 2023. The red rectangles in (c)-(e) indicate the location of the North China region. The labels 'BJ', 'TJ', and 'HB' in white font signify the locations of Beijing City, Tianjin City, and Hebei Province, respectively.

Figure 2. Column-integrated moisture flux (vectors) and its corresponding magnitude (shading; units: kg m⁻¹ s⁻¹) for the (a) Pangu_2312, (b) Pangu_2600, and (c) Pangu_2812 simulations, at 0000 UTC on 29 July 2023. Difference in the column-integrated moisture flux and its magnitude at 0000 UTC on 29 July 2023 between the (d) Pangu_2312, (e) Pangu_2600, and (f) Pangu_2812 simulations and the ERA5 dataset. The black dots represent areas where the difference is significant at the 95% confidence level. The red rectangles indicate the location of the North China region. Vertical profiles of the RMSE and Bias for (g), (j) specific humidity (units: 10⁻³ g kg⁻¹), (h), (k) temperature (units: K), and (i), (l) geopotential height (units: m² s⁻²). The shaded areas represent the 95% confidence interval.

Figure 4. (a) Time series of accumulated rainfall (units: mm) over the region demarcated by the red box in figure 1(a), as derived from both the CMA observational dataset and the WRF experiments. (b) TS and (c) ROC diagram for the 72 h accumulated rainfall between 0000 UTC on 29 July and 0000 UTC on 1 August 2023 in different experiments against the CMA precipitation analysis. (d) Best-track data from the China Meteorological Administration (CMA) (black) and the simulated tracks from the WRF experiments for Typhoon Doksuri, from 0000 UTC on 29 July to 0000 UTC on 30 July 2023.
Examination of the WRF model simulations, incorporating accumulated rainfall time series (figure 4(a)), the Threat Score (TS, figure 4(b)), and the ROC diagram (figure 4(c)) for 72 h cumulative precipitation, corroborates the previously discussed variability in the forecasting accuracy of precipitation patterns, especially when contrasting simulations driven by different forcing inputs from the Pangu AI and GFS numerical models.