Characterization and process understanding of tropical cyclone-induced floods derived from observations in Shenzhen, China

Coastal cities like Shenzhen are confronting escalating flood risks under the combined impact of climate change and rapid urbanization, especially from tropical cyclone (TC)-induced floods. Incorporating the impact of climate change and urbanization on flooding, this study constructed a new TC-induced flood model for western Shenzhen that embeds a statistical approach. Based on extensive historical data and machine learning techniques, the temporal characteristics and changes of flooding were revealed. The results reveal an increase in the frequency of TC-induced floods during 1964–2022, especially after the 1990s, which is attributed to a decrease in the distance between the location of maximum TC intensity (observed within 800 km of the study area) and the land, averaging 11.4 km per decade. This shift towards land is likely due to changes in the locations of TC genesis. Furthermore, a 'rainfall-sea level' threshold for western Shenzhen was derived from the modelling results, which would enable decision-makers to quickly assess TC-induced flood risks. The proposed methods offer alternative approaches for predicting TC-induced floods in regions where gathering hydro-meteorological data is challenging or where economic and technological resources are limited.


Introduction
Coastal regions have historically been hubs of civilization, trade, and culture. However, they are also at the forefront of the challenges brought by climate change. About 38% of the global population lives within 100 km of the coastline, and 80% of flood-related fatalities worldwide between 1975 and 2016 occurred in this zone (Barbier 2015, Hu et al 2018). Coastal cities are more vulnerable to extreme hydrometeorological events (IPCC 2021). Floods induced by tropical cyclones (TC) are of particular concern because the accompanying intense rainfall and storm surges often have catastrophic effects on urban life and production. The combined impacts of sea level rise, land subsidence, and rapid urbanization would exacerbate these effects and lead to unprecedented socioeconomic loss (Liu et al 2022).
Located on the coast of the South China Sea, Shenzhen, a typical coastal megacity, epitomizes the challenges faced by coastal regions globally. Since its establishment in 1979, Shenzhen has undergone urbanization at an unprecedented pace. In about 40 years, its built-up area has expanded 35-fold, growing from 27 km² in 1979 to 946 km² in 2017 (Yu et al 2019). This rapid urbanization has brought about transformative land-use changes that, coupled with rising sea levels, may intensify flood risks. In recent years, Shenzhen has experienced multiple severe floods triggered by TCs. On 2 and 8 September 2023, Super Typhoon Saola and Typhoon Haikui hit the city within a week of each other and triggered shutdowns. The downpours brought by Typhoon Haikui broke seven of the city's rainfall records dating back to 1952: the maximum rainfall over 2 h (195.8 mm), 3 h (246.8 mm), 6 h (349.7 mm), 12 h (465.5 mm), 24 h (557.8 mm), 48 h (613.8 mm) and 72 h (614.6 mm), causing massive flooding in Shenzhen (Xinhua 2023). Earlier super typhoons, Hato in 2017 and Mangkhut in 2018, had similarly devastating effects, highlighting the urgent need for effective TC-flood forecasting (Shenzhen Climate Bulletin 2018).
In response, the local government departments in Shenzhen have conducted a rigorous assessment of urban flood thresholds based on historical disaster data, survey insights, and other pertinent sources. This widely adopted assessment has informed the creation of a comprehensive table, enabled the estimation of specific rainfall thresholds at the subdistrict scale, and facilitated the evaluation of flood probabilities (Risk Warning for Major Meteorological Hazards in Shenzhen 2022). Machine learning (ML) has recently advanced urban flood forecasting, with studies such as Zhang et al (2023) and Ke et al (2020) applying ML algorithms to develop models for Shenzhen. However, these studies predominantly relied on rainfall as the only input variable, potentially ignoring the complexity of TC-induced floods.
Hydrological models are reliable tools for studying TC-induced floods, offering robust support for a deeper understanding and prediction of flood events. However, most previous studies focused on individual TC flood events, with relatively little research on long-term variations and trends of TC floods (Lee et al 2020, Yang et al 2021). In recent years, some researchers have begun exploring this field (Joyce et al 2017, Zhang and Najafi 2020). Qiang et al (2021) also employed the SWMM model to simulate flood maps for the western coastal areas of Shenzhen under various combinations of rainfall and storm surge. Regrettably, these studies mainly concentrated on long-term assessments of the impacts of sea-level rise on flooding, overlooking the potential effects of urbanization. Moreover, as pointed out by Marsooli et al (2019), unlike simulations of rainfall-induced flooding, TC-induced flood studies need to consider the variations in TC characteristics under climate change, an aspect that remains inadequately studied. In addition, the establishment and operation of hydrological models often demand substantial computational resources and workforce, especially for acquiring high-quality hydro-meteorological and geographical data over larger regions, which limits their application to urban floods.
Considering the aforementioned issues and challenges, the objectives of this study are to: (1) develop a TC-induced flood forecasting model that combines machine learning techniques with statistical approaches; and (2) identify and analyse historical potential TC-induced flood events in Shenzhen, examine changes in their frequency and discern possible natural causes.

Study area & data sources
Shenzhen, a coastal metropolis, is located in the south of Guangdong Province, China, between 113°43′-114°38′ E and 22°24′-22°52′ N. The city borders the Pearl River Estuary to the west and Hong Kong to the south (figure 1). On average, around five TCs affect Shenzhen every year. This study focuses on the western part of the city because long-term sea level observations are available at the Chiwan site, which represents Shenzhen's western sea area. The study area covers four of the nine river basins that drain into this sea area, namely the Maozhou River, Pearl River Estuary, Shenzhen Bay and Shenzhen River basins. It is hydrologically significant, contains Shenzhen's core urban zones, and has been identified by prior research as the most vulnerable part of the city (Sarica et al 2021).
The observational 24-h rainfall data (00:00-24:00 UTC) and daily maximum sea level data (astronomical tide + storm surge) are collected from the Shenzhen Hydrological Yearbook (1964-2019) and the Shenzhen Meteorological Bureau (2020-2022). The locations of the stations are shown in figure 1(b). Given the lack of storm surge data for Shenzhen, the dates of maximum storm surges and sea levels at Hong Kong (North Point/Quarry Bay station) are used here, obtained from the Tropical Cyclone Yearbook provided by the Hong Kong Observatory and from Chan (1983). Historical TC flood disaster information is provided by the Shenzhen Meteorological Bureau and the Shenzhen Water Affairs Bureau. Additionally, records of seven TC-induced flood events (IDA, 1964; RUBY, 1964; ELLEN, 1983; GORDON, 1989; BECKY, 1993; SAM, 1999; YORK, 1999) are from the Collection of Storm Surge Disasters Historical Data in China 1949-2009 (Yu et al 2015) and the Shenzhen flood control plan revision and river improvement plan: Revision report of flood control plan (2014-2020). The land-use data originate from the National Earth System Science Data Centre. The historical length of the drainage network is collected from the official website of the Shenzhen Water Authority and from Peng et al (1999). The track and intensity data of TCs are obtained from the Western North Pacific tropical cyclone database (Ying et al 2014, Lu et al 2021). The monthly sea surface temperature (SST) data are from the Met Office Hadley Centre SST dataset, and the monthly wind data are from the NCEP/NCAR (Reanalysis-1) dataset.

Methods
This study consists of four steps as shown in figure 2.

Maximum sea level & maximum daily rainfall
The daily maximum rainfall and maximum sea level occurring within ±1 d of the maximum storm surge event are collected. In total, 340 TCs are studied in this research. Because the Shenzhen sea level record for Typhoon Viola (1964) is missing, it is estimated with a function fitted to the sea level data of the remaining 339 TCs at Shenzhen and Hong Kong (figure A1). The maximum sea level of Typhoon Viola (1964) in Shenzhen was calculated to be about 1.4 m. Detailed information can be found in the appendix.

Flood risk composite index (FRCI)
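To address the potential influences of climate change and urbanization on flooding events over time, an FRCI is constructed. The process involves:

1. Establishment of Flood Sensitivity Factors: (1) Based on land-use data, the ratio of impervious area to green area (forest, grassland, farmland) and the water area were calculated for each year in the study area (figure A2). Since Shenzhen was established in 1979, with minimal land-use changes before that, data from 1980 were used for the period between 1964 and 1979. (2) Historical drainage network length served as a measure of urban drainage capacity. Similarly, the construction of Shenzhen's drainage system began in 1980 and was minimal before that (Chen 2014). (3) Accounting for the effects of sea-level rise and land subsidence, a relative sea-level rise measurement was constructed. The average rate of sea level rise near the study area is approximately 3.5 mm yr⁻¹ (Zou et al 2021, Ministry of Natural Resources 2023), and the mean subsidence rate is about 2.5 mm yr⁻¹ (Ma et al 2019, Wang et al 2012b). The sum of the two rates is 6 mm yr⁻¹, consistent with the findings of Nicholls et al (2021). However, the rate of land subsidence in Shenzhen before 1986 tended to be zero (Huang et al 2001), and considering that Shenzhen entered a period of rapid construction and development after 1990 (Ng 2003, Du 2020), the influence of urban construction on land subsidence gradually became apparent. Therefore, for the period from 1964 to 1990, only sea-level rise was considered as a factor, while both sea-level rise and ground subsidence were included from 1990 onwards.
2. Normalization: To eliminate the influence of scales, Min-Max Scaling was applied to the four factors (figure A3).
3. Linear Relationship Analysis: To reduce redundancy among factors, the Variance Inflation Factor (VIF) was utilized to assess the linear relationships between factors. It was found that the VIF values for 'Relative SLR', 'Impervious-to-Green Ratio' and 'Drainage capacity' are significantly greater than 10, indicating potentially serious multicollinearity.
4. Dimensionality Reduction: Principal component analysis was employed to reduce the dimensionality of these factors, and the first principal component, which explained approximately 91% of the variance (figure A4(a)), was selected as the FRCI (figure A4(b)).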
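The normalization and dimensionality-reduction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the factor values are hypothetical, and the first principal component is obtained by plain power iteration on the covariance matrix rather than with whatever PCA routine the study used. The function names are illustrative.

```python
import math


def min_max_scale(column):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def first_principal_component(rows, iterations=500):
    """Return (scores, explained_ratio) for the first principal component.

    rows holds one sample per year. Power iteration on the covariance
    matrix is used, which is adequate for a handful of factors.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    scores = [sum(c[j] * vec[j] for j in range(p)) for c in centred]
    return scores, eigenvalue / total_variance


# Hypothetical yearly values of the four flood sensitivity factors.
impervious_to_green = [0.05, 0.10, 0.30, 0.60, 0.85, 0.95]
water_area_km2 = [120, 115, 100, 80, 70, 66]
drainage_length_km = [50, 300, 1500, 4000, 6500, 7200]
relative_slr_mm = [0, 35, 70, 130, 190, 250]

scaled = [min_max_scale(col) for col in (impervious_to_green, water_area_km2,
                                         drainage_length_km, relative_slr_mm)]
frci, explained = first_principal_component(list(zip(*scaled)))
print(f"variance explained by the first component: {explained:.2f}")
```

Because the hypothetical factors are strongly collinear, a single component captures most of the variance, which is the same situation that motivates using the first principal component as a composite index.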

Model building & application
Machine learning extracts the main features of large data samples through algorithms and makes predictions according to the learned rules.
The process of building a machine learning model includes structuring the data, building the model, and validation. Flood prediction is usually treated as binary classification (e.g. 'true and false', 'yes and no'), which aims to distinguish flood events from non-flood events based on hydrological variables (Ke et al 2020, Schmidt et al 2020, Chang et al 2022). However, this study aims to establish a model that predicts the likelihood of flood occurrence, providing a continuous probability value instead of a binary classification.
Considering the limited sample size, several commonly used regression models were considered, encompassing both linear models (logistic regression (LR), support vector regression (SVR) and ridge regression (RR)) and non-linear models (random forest regression (RFR), gradient boosting regression (GBR) and regression tree (RT)). To determine the optimal model, five-fold cross-validation was employed and the performance metrics of each model were compared, including Accuracy, area under the curve (AUC), Recall, Precision, F1 Score and Matthews correlation coefficient (MCC).
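For reference, most of the listed metrics follow directly from the confusion matrix of one validation fold. The sketch below computes Accuracy, Precision, Recall, F1 Score and MCC for hypothetical labels; AUC is omitted because it requires the continuous scores rather than thresholded labels. How the regression outputs were converted to labels is not stated here, so the binary predictions are simply assumed as given.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute Accuracy, Precision, Recall, F1 and MCC for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical validation fold: 1 = flood, 0 = no flood.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a five-fold setting, these values would be computed on each held-out fold and then averaged per model.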
The decision boundary partitions the input feature space, providing insight into a model's decision-making process. By comparing these boundaries, the model that best captures the inherent data patterns can be chosen for accurate flood predictions. Therefore, the decision boundaries of the optimal linear and non-linear models were analysed to select the best-fit model for this study. Finally, this three-dimensional best-fit model was used to estimate the TC flood threshold for 2022 and to calculate the probability of flooding for all TC events.
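To make the idea of a continuous flood probability concrete, the following sketch fits a three-feature logistic model by batch gradient descent. It is illustrative only: the training samples are hypothetical and pre-scaled to [0, 1], the learning rate and epoch count are arbitrary choices, and the study's own fitting procedure and coefficients are not reproduced.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, p = len(features), len(features[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j in range(p):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def flood_probability(weights, bias, x):
    """Return the modelled probability of flooding for one TC event."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Hypothetical scaled samples: (rainfall, sea level, FRCI); 1 = flood.
X = [[0.05, 0.10, 0.2], [0.10, 0.20, 0.3], [0.20, 0.15, 0.5],
     [0.15, 0.30, 0.8], [0.60, 0.50, 0.4], [0.70, 0.40, 0.7],
     [0.50, 0.80, 0.6], [0.90, 0.70, 0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X, y)
p_wet = flood_probability(weights, bias, [0.8, 0.8, 0.8])
p_dry = flood_probability(weights, bias, [0.05, 0.05, 0.2])
print(f"high rainfall and sea level: {p_wet:.2f}; low: {p_dry:.2f}")
```

The set of points where the modelled probability equals a chosen value forms a plane in the three-dimensional feature space, which is the kind of decision boundary compared in this section.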

Attribution analysis methods
Standard linear regression and correlation analysis were employed in this study, and significance levels were assessed using the standard two-tailed Student's t-test. In addition, the Mann-Kendall (M-K) test was used to detect abrupt changes. This method is widely applied to climate data and is highly effective in identifying a transition from one relatively stable state to another (Xing et al 2018). More details about this method can be found in Mann (1945).
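A minimal sketch of the sequential form of the M-K test, as commonly used for abrupt-change detection, is given below. It assumes the usual rank-based statistic with mean k(k-1)/4 and variance k(k-1)(2k+5)/72 for the first k values; the forward curve (UF) is computed on the series as given and the backward curve (UB) on the reversed series with the sign flipped. The input series is hypothetical.

```python
import math


def sequential_mk(series):
    """Return the sequential M-K statistic for each prefix of the series."""
    n = len(series)
    stat = [0.0] * n  # the statistic for a single value is defined as 0
    s = 0
    for k in range(1, n):
        # Count earlier values that are smaller than the current one.
        s += sum(1 for j in range(k) if series[k] > series[j])
        m = k + 1  # number of values included so far
        mean = m * (m - 1) / 4
        var = m * (m - 1) * (2 * m + 5) / 72
        stat[k] = (s - mean) / math.sqrt(var)
    return stat


def mk_curves(series):
    """Return the forward (UF) and backward (UB) curves."""
    uf = sequential_mk(series)
    ub = [-v for v in sequential_mk(series[::-1])][::-1]
    return uf, ub


# Hypothetical annual mean distances (km) with a drop partway through.
distances = [420, 415, 430, 410, 425, 418, 360, 355, 365, 350, 358, 352]
uf, ub = mk_curves(distances)
print([round(v, 2) for v in uf])
```

An intersection of the UF and UB curves lying between the ±1.96 bounds (5% significance level) is conventionally read as the approximate onset of an abrupt change.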

Best-fit model for TC-induced flood prediction
Historical disaster information was collected for 123 TCs, 51 of which caused floods and 72 of which did not. Modelling was conducted on these TCs, with the presence of flooding as the target variable and rainfall, sea level and FRCI as the features. The results of the five-fold cross-validation for the six ML models are shown in table 1. Among the linear models, LR slightly lags behind SVR and RR in accuracy but performs best in the other metrics. Among the three non-linear models, RFR has a slightly higher AUC and MCC than the others. In summary, LR is considered the top-performing linear model, while RFR is considered the top-performing non-linear model. The decision boundaries of the LR and RFR models are illustrated in figure 3. The RFR model underestimates the flood probability for events featuring high sea levels and low rainfall. This contradicts the reality that the topographic elevation of the study area is fixed: when the sea level exceeds the coastal elevation, flooding is bound to occur even in the absence of rainfall. In comparison, the decision boundary provided by the LR model aligns more closely with reality and offers better interpretability. Therefore, the LR model is selected as the best-fit model to predict TC-induced flood events.

TC-induced flood threshold for 2022
To determine the 'Rainfall-Sea level' flood threshold for western Shenzhen, the FRCI for the year 2022 was held constant.Subsequently, 10 000 sets of data points, comprising rainfall and sea levels, were randomly generated.These data points were subjected to the established model to assess their respective flood risk levels, categorized as high, moderate, and low, as illustrated in figure 4.
To better align with practical emergency requirements and enhance the effectiveness of flood warnings, the decision boundary with a probability of 0.3 is taken as the present-day TC-induced flood threshold for Shenzhen. The threshold line can be presented as equation (1):

$$F(x) = -0.0265x + 2.2 \qquad (1)$$

where x is the maximum daily rainfall (mm) and F(x) is the corresponding threshold sea level (m).
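The threshold line reported in this study, F(x) = −0.0265x + 2.2, can be applied as a quick screening rule. The sketch below assumes x is the maximum daily rainfall in mm and F(x) the threshold sea level in m, in line with the intercepts quoted in the conclusion; the function names and the two example events are hypothetical.

```python
SLOPE_M_PER_MM = -0.0265  # coefficient of the threshold line
INTERCEPT_M = 2.2         # threshold sea level (m) at zero rainfall


def threshold_sea_level(rainfall_mm: float) -> float:
    """Sea level (m) on the threshold line for a given daily rainfall."""
    return SLOPE_M_PER_MM * rainfall_mm + INTERCEPT_M


def exceeds_threshold(rainfall_mm: float, sea_level_m: float) -> bool:
    """True if the (rainfall, sea level) pair lies on or above the line."""
    return sea_level_m >= threshold_sea_level(rainfall_mm)


# Rainfall at which the line reaches zero sea level.
rainfall_intercept_mm = -INTERCEPT_M / SLOPE_M_PER_MM

print(exceeds_threshold(100.0, 0.0))  # heavy rain, sea level at datum
print(exceeds_threshold(20.0, 1.0))   # light rain, moderate sea level
print(round(rainfall_intercept_mm))
```

The two intercepts of the line are 2.2 m of sea level at zero rainfall and roughly 83 mm of daily rainfall at zero sea level.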

Changes of TC-induced flood events and attribution analysis
Using the established best-fit model, predictions were carried out for all TC events in the 59-year dataset, and the probability of flood occurrence was computed for each TC event. TC events with flood probabilities ⩾0.3 were then identified as potential TC-induced flood events. In total, 102 potential flooding events were selected from the 340 TCs.
Between 1964 and 2022, the frequency of potential TC-induced flooding events showed a significant increasing trend (figure 5). In particular, the frequency of occurrence has clearly been higher since the 1990s. However, during the same period, there is no statistically significant trend in the number of TCs occurring in the South China Sea (SCS), coming within 800 km of the study area, or bringing storm surges to the region (figure A5). Therefore, the following sections consider possible reasons for this increasing trend.
Changes in TC characteristics can directly lead to changes in rainfall and storm surge, and thus affect the frequency of TC-induced flooding. Therefore, we conducted a series of statistical analyses for all 340 TCs during the 59 years. The results show that the number of TCs entering various distance ranges of the study area exhibits no discernible trend (figure A6). The same holds for translation speed between 1964 and 2022, analysed both as the mean overwater translation speed across different distance ranges of the study area and as the translation speed before and after landfall (figures A7 and A8). Furthermore, a significant decrease (99.9% confidence level) in the annual mean maximum intensity of TCs within 800 km was observed during 1964-2022 (figures A9 and A10). It therefore seems that none of these factors, namely the unchanged counts and translation speeds of TCs and their weakened intensity, explains the increase in TC flooding events.
Observations indicate that TC motion has migrated coastward, poleward and westward due to tropical expansion as well as the higher relative SST along the coast (Daloz and Camargo 2017, Sun et al 2018, 2019, Knutson et al 2021, Wang and Toumi 2021). In this study, we found no significant change in the distribution of the closest locations to Shenzhen before TCs made landfall on the mainland (figure A11). However, the annual mean distance between the TC maximum intensity location and the study area decreases significantly (99.9% confidence level) at a rate of approximately 11.4 km per decade (figure 6(b)). The maximum intensity here refers to the maximum intensity within 800 km of the study area rather than the maximum intensity over the whole TC lifetime. To further analyse the trend in the annual mean distance, the M-K test was applied (figure 6(a)). From the intersection of the UF and UB curves in figure 6(a), it can be inferred that the decrease in distance shows an apparent bifurcation (abrupt change) point around 1992. The annual mean distance is about 417 km for 1964-1991 and about 358 km for 1992-2022, a difference of 59 km. Figure 6(c) shows the distribution of the maximum intensity points in these two periods. The mean positions for the two periods are at 20.15° N, 114.68° E (triangle) and 20.72° N, 114.01° E (square), respectively, and the distance between them is about 94 km. Overall, the location of the TC maximum intensity moved closer to Shenzhen and the coast during 1964-2022, a finding that is consistent with the results of Wang and Toumi (2022). These results suggest that the increased frequency of TC-induced floods is likely related to a decrease in the distance between the location of the TC maximum intensity and the land.
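The separation between the two mean positions of maximum intensity can be checked with a great-circle calculation. The sketch below uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption; the distance method used in the study is not stated.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Mean maximum intensity positions for 1964-1991 and 1992-2022.
shift_km = haversine_km(20.15, 114.68, 20.72, 114.01)
print(f"shift between period means: {shift_km:.0f} km")
```

The result is close to the roughly 94 km reported above, with the later position lying to the north-west of the earlier one.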
Wang and Toumi (2022) suggest that the primary driver of the shift in maximum intensity position could be the zonal changes in the environmental steering flow. However, in this study, there is no noticeable difference in the environmental steering flow near Shenzhen between the two periods (figure A12). We noticed significant differences in SSTs between the two periods, which are associated with the location of TC genesis. After comparison, we found that the genesis locations of TCs affecting Shenzhen were closer to the coast from 1992 to 2022, with a mean position that shifted north-westward by about 782 km compared to the mean genesis position during 1964-1991 (figure 7). In addition, the proportion of TCs generated in the SCS (0°-30° N, 106° E-120° E) increased from 21% to 34%. The identification of the SCS region here follows the partitioning methods previously reported (Song et al 2019, Wang et al 2012a). The observed north-westward migration of TC genesis locations may be associated with a 'La Niña-like' intensification of the zonal SST gradient across the equatorial Pacific. As detailed by Lee et al (2022), this La Niña-like/zonally asymmetric convection is manifested in an enhanced SST gradient, a strengthened Walker circulation, and altered

Conclusion & discussion
To summarise, this study employed statistical methods combined with machine learning techniques, based on a rich historical dataset spanning several decades, to predict flood events and to examine the variations and characteristics of TC-induced flood events. The following conclusions have been drawn: First, the LR model is the best-fit model to predict TC-induced flood events, utilizing rainfall, sea level, and the flood risk composite index as characteristic variables. The model gives the current flood thresholds for Shenzhen. The threshold line was determined to be: F(x) = −0.0265x + 2.2. According to Nie et al (2016), when the sea level at Chiwan is higher than 1.4 m during a TC event, seawater intrusion will occur in western Shenzhen. However, the corresponding rainfall was not mentioned in their research. Furthermore, Zhou et al (2017) reported that between 2012 and 2014, the maximum 24-hour precipitation of rainstorm-induced disasters in Shenzhen ranged from 46 to 631 mm and was mainly concentrated in the 100-300 mm range. Again, the effect of sea level was not considered in their study. The threshold of this study indicates that flooding may occur if the sea level exceeds 2.2 m without precipitation, or if 24-hour rainfall exceeds 83 mm when the sea level is 0 m.
Combined with the results of previous studies, it can be concluded that the TC-induced flood threshold calculated in our study has a higher degree of confidence than those in the literature. Nevertheless, the validation of our threshold still requires simulation through hydrological models or observational data from future TC events. In addition, daily rainfall data do not adequately represent the intensity of rainfall at different times of the day. If the rainfall is gentle and within the capacity of the urban drainage system, the cumulative 24-hour rainfall can surpass the threshold line without causing flooding. Therefore, the intensity of TC-induced flooding needs to be further investigated in the future.
Second, the frequency of TC-induced floods was found to increase at an annual average rate of 1.3%, which is considered to be related to a decrease in the distance between the location of TC maximum intensity, observed within 800 km of the study area, and the land. The distance between the location of TC maximum intensity and Shenzhen/the coast decreased at a rate of approximately 11.4 km per decade, with an apparent bifurcation point in 1992. This shift is most likely due to changes in the location of TC genesis, which are believed to be mainly driven by a 'La Niña-like' intensification of the zonal SST gradient across the equatorial Pacific under global warming. Since rapid urbanization began in the 1990s, the increase in flood frequency may also be due to urbanization enhancing TC rainfall or changing urban hydrological processes. Additionally, the northward shift of the maximum intensity location could heighten storm surge and might also be influenced by urban land cover, which in turn could amplify precipitation. All these issues are planned to be investigated using numerical models in future work to provide a more comprehensive understanding of the complex interactions between urbanization and TC-induced floods.
Despite the inherent uncertainties in models and data, flood predictions based on the 'rainfall-sea level' threshold derived by the machine learning approach in this study will enable decision-makers to quickly evaluate the flood risk associated with a TC. The findings of this study hold significant implications for Shenzhen's effective response to TC-induced flooding. Furthermore, the method of establishing the flood model in this paper provides a valuable approach for areas constrained by hydrological data acquisition or limited by economic and technological conditions.

Figure 1. Study area and locations of rain and tide gauges.
5 mm yr −1 (Zou et al 2021, Ministry of Natural Resources 2023), and the mean subsidence rate is about 2.5 mm yr −1 (Ma et al 2019, Wang et al 2012b).The sum of the two rates is 6 mm yr −1 , consistent with the findings of Nicholls et al (2021).However, the rate of land subsidence in Shenzhen before 1986 tended to be zero (Huang et al 2001), and considering that Shenzhen entered a period of rapid construction and development after 1990 (Ng 2003, Du 2020), the influence of urban construction on land subsidence gradually became apparent.Therefore, for the period from 1964 to 1990, only sea-level rise was considered as a factor, while both sea-level rise and ground subsidence were included from 1990 onwards. 2. Normalization: To eliminate the influence of scales, Min-Max Scaling was applied to the four factors (figure A3). 3. Linear Relationship Analysis: To reduce redundancy among factors, the Variance Inflation Factor

Figure 2. The framework of the study.

Figure 3. Three-dimensional decision boundaries for logistic regression model (a) and random forest regression model (b).

Figure 4. Flooding threshold for 2022 derived from Logistic Regression model.

Figure 5. Frequency changes of the potential flood events. The dashed line represents the mean over the 59 years. Red and blue lines stand for the linear trends.

Figure 6. (a) M-K mutation test for the annual mean distance between the TC maximum intensity location within 800 km and the study area. (b) Changes of the annual mean distance. (c) Distribution of the maximum intensity locations.

Figure 7. Distribution of the genesis positions of TCs.

Figure A3. Changes in Flood Sensitivity Factors over time (normalized data).

Figure A4. (a) Cumulative explained variance by principal components. (b) Change in the first principal component over time.

Figure A5. Frequency changes of TC events.

Figure A6. Numbers of TCs in different ranges from the study area.

Figure A7. Changes of the mean translation speed of TCs before landfall within different ranges of the study area.

Figure A8. The annual mean translation speed before and after landfall.

Figure A9. The annual mean maximum intensity of TCs.

Figure A11. The distribution of the closest locations to Shenzhen before TCs landed on the mainland.

Figure A12. The difference in the summer mean (June-October) environmental steering flow (500 hPa level) and SST between the periods of 1964-1991 and 1992-2022.

Table 1. Performance metrics of machine learning models.