Assessment of typhoon disaster loss based on the factor analysis-random forest model

Typhoon disasters in China’s coastal areas pose significant challenges for disaster prevention and mitigation, urban planning and national economic construction. This study addresses two problems: disaster assessment indicators that are not comparable across studies, and the low prediction accuracy of machine learning on small samples. It establishes an index system based on the classification standards used in practical disaster investigation, which ensures reliable data sources and uniformity. It also proposes a combined factor analysis-random forest regression algorithm for predicting direct economic losses, which improves the prediction of typhoon disaster losses. The results show that the optimized model has higher accuracy than the single decision tree model, the random forest model and the factor analysis-decision tree model. The factor analysis method verifies the importance of the influencing factors and indicates that China faces great risks of coastal floods caused by super typhoons. The combination regression model can predict disaster losses reasonably, providing effective technical support for typhoon disaster assessment and management.


Introduction
Typhoons are causing increasingly severe flooding in China's coastal areas due to global warming and rising sea levels [1]. To reduce the losses from typhoon disasters, it is important to have a scientific and effective assessment of the economic impacts [2,3]. However, the evaluation of typhoon disaster losses is a complex system that involves multiple aspects [4] and the mechanism of how typhoons affect the economic losses is not well understood [5]. Therefore, how to select sensitive indicators and measure the economic losses caused by typhoons, and then take appropriate disaster prevention and mitigation measures, is a key issue for the government, academia, and the public [4,6].
The existing research methods can be classified into three categories: statistical analysis, remote sensing numerical simulation, and artificial intelligence. The first category of methods uses historical loss data to construct models such as HAZUS-MU, GCOM2D/3D, FCHLPM [7]. However, these models have limitations in applying to different regions due to the variation of parameters [4,8]. The second category of methods can capture the dynamics of factors such as typhoons, storm surges, waves, and rainfall with high-resolution geographic data [4,9]. However, these methods are costly, time-consuming, and not generalizable to other regions [9,10].
Machine learning methods have gradually become a new research hotspot with their widespread application recently [11]. These methods have low requirements for data processing and can adapt to complex nonlinear relationships between factors, which makes them an effective way to comprehensively evaluate disaster losses [11][12][13]. Some scholars have used different methods to predict disaster losses, such as principal component analysis [14], neural network [7], random forest [15], and support vector machine [16]. These studies provide theoretical and methodological support for typhoon disaster warning and management based on machine learning. However, the subjective choice of the index framework makes it difficult to compare the research results across different studies [17,18]. In this study, we built a practical framework based on the classification standards in disaster investigation in China and used a combination model of factor analysis and random forest regression to construct a reliable and scientific typhoon disaster loss assessment model.

Data Source and Indicator Selection
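We collected data from the official reports Bulletin of Flood and Drought Disaster in China, Bulletin of China Marine Disaster, China Statistical Yearbook, and Yearbook of Meteorological Disaster in China from 2009 to 2021. We categorized the data according to the losses of different sectors, such as agriculture, transportation, electricity, water conservancy facilities, and housing. To express the economic loss data at 2021 price levels, we used the consumer price index (CPI) for each year to adjust the data uniformly and obtain direct economic losses as the output data for typhoon disaster losses [19]. We used 10-fold cross-validation to estimate prediction accuracy for the small sample: the data are split into 10 folds, and each fold is used once for testing while the remaining nine are used for training.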
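The CPI adjustment described above can be illustrated with a minimal sketch. It assumes a fixed-base CPI series and rescales each nominal loss by the ratio of the 2021 index to the index of the event year. The function name, the CPI values and the loss figures are hypothetical and are not taken from the paper's data.

```python
def adjust_to_base_year(loss, year, cpi, base_year=2021):
    """Express a nominal loss from `year` in `base_year` prices."""
    return loss * cpi[base_year] / cpi[year]


# Hypothetical fixed-base CPI values (2009 = 100); not the official series.
cpi = {2009: 100.0, 2015: 115.0, 2021: 125.0}

# Hypothetical nominal direct economic losses as (year, loss) pairs.
nominal = [(2009, 80.0), (2015, 46.0), (2021, 30.0)]

adjusted = [adjust_to_base_year(loss, year, cpi) for year, loss in nominal]
for (year, loss), value in zip(nominal, adjusted):
    print(f"{year}: nominal {loss:.1f} -> 2021 prices {value:.1f}")
```

If the CPI is published as year-on-year changes rather than as a fixed-base index, the yearly factors would first have to be chained into a single series before applying the ratio.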

Research Methodology
Factor Analysis.
We applied factor analysis to reduce data redundancy and selection errors and to improve model accuracy, because the indicator variables were numerous and correlated [20]. Factor analysis groups variables by their correlation and reduces variables with overlapping information and complex relationships to a few independent factors. The method transforms the original p variables into m independent factors, where m < p. Each observed variable is modeled as a linear combination of the unobserved (latent) factors plus a random error term [20,21].
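For reference, the standard orthogonal factor model behind this description can be written as follows. The symbols x_i (standardized observed variable), f_j (latent factor), a_{ij} (loading), \varepsilon_i (error term), A (loading matrix) and \Psi (diagonal error covariance) are notation assumed here, not taken from the paper; only p and m come from the text.

```latex
\[
  x_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{im} f_m + \varepsilon_i,
  \qquad i = 1, \dots, p, \quad m < p
\]
\[
  \Sigma = A A^{\top} + \Psi
\]
```

The second line holds when the factors are uncorrelated with unit variance and independent of the errors. A large absolute loading a_{ij} means that variable x_i is strongly associated with factor f_j.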

Random Forest Regression.
We applied random forest for its noise resistance and ability to avoid over-fitting [21]. For regression problems, random forest selects n samples by bootstrapping and k attributes at random to build a CART decision tree. This process is repeated m times to build m CART decision trees. For regression, the final prediction is the average of the outputs of all CART decision trees; majority voting is used only when the task is classification [21,22].
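The procedure above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: the k attributes are drawn at every split (the usual random forest convention), trees are depth-limited, and all function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, ys, k, rng, depth=0, max_depth=3, min_size=2):
    """Grow a CART-style regression tree, trying k random attributes per split."""
    if depth >= max_depth or len(rows) < 2 * min_size or len(set(ys)) == 1:
        return mean(ys)  # leaf: mean target of the samples that reach it
    best = None
    for j in rng.sample(range(len(rows[0])), k):
        for t in sorted({r[j] for r in rows})[1:]:  # candidate thresholds
            left = [y for r, y in zip(rows, ys) if r[j] < t]
            right = [y for r, y in zip(rows, ys) if r[j] >= t]
            if len(left) < min_size or len(right) < min_size:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return mean(ys)
    _, j, t = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [ys[i] for i in idx],
                          k, rng, depth + 1, max_depth, min_size)

    left_idx = [i for i, r in enumerate(rows) if r[j] < t]
    right_idx = [i for i, r in enumerate(rows) if r[j] >= t]
    return (j, t, grow(left_idx), grow(right_idx))


def predict_tree(node, row):
    """Follow the splits until a leaf value is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] < t else right
    return node


def fit_forest(rows, ys, n_trees=25, k=1, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [ys[i] for i in idx], k, rng))
    return forest


def predict_forest(forest, row):
    """Regression output: the average of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)


# Toy data: the target depends on the first attribute only.
rows = [[float(i), float(i % 3)] for i in range(20)]
ys = [2.0 * r[0] for r in rows]
forest = fit_forest(rows, ys, n_trees=25, k=2)
low = predict_forest(forest, [5.0, 1.0])
high = predict_forest(forest, [15.0, 1.0])
print(low, high)
```

Averaging over bootstrapped trees is what reduces the variance of a single tree; in practice a library implementation would normally be preferred over a hand-written one.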

Factor analysis-Random Forest Regression.
The combination of factor analysis and random forest regression is useful for reducing data dimensionality and extracting latent factors. Factor analysis finds a lower-dimensional representation that captures the data structure and variable relationships. Applying it before random forest regression reduces the number of features and avoids the curse of dimensionality, which affects model performance and interpretability. Factor analysis also identifies key factors that influence the response variable and gives insight into the data.
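A minimal sketch of such a combined model, assuming scikit-learn is available, chains standardization, factor analysis and a random forest regressor in one pipeline. The synthetic data, the sample size of 60, the choice of two factors and all hyperparameters are illustrative assumptions; the paper does not state which software or settings were used.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 events, 23 correlated indicators
# generated from 2 hidden factors plus noise (not the paper's data).
hidden = rng.normal(size=(60, 2))
loadings = rng.normal(size=(2, 23))
X = hidden @ loadings + 0.3 * rng.normal(size=(60, 23))
y = 3.0 * hidden[:, 0] + hidden[:, 1] ** 2 + 0.1 * rng.normal(size=60)

model = make_pipeline(
    StandardScaler(),                                # put indicators on one scale
    FactorAnalysis(n_components=2, random_state=0),  # 23 indicators -> 2 factor scores
    RandomForestRegressor(n_estimators=200, random_state=0),
)
model.fit(X, y)

factors = model[:-1].transform(X)  # latent factor scores passed to the forest
predictions = model.predict(X)
print(factors.shape, predictions.shape)
```

Keeping the factor analysis step inside the pipeline means it is refitted on the training portion only whenever the pipeline is cross-validated, which avoids leaking information from the test fold.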

Influencing indicators analysis
We used the random forest method to analyze the weights of 23 indicators influencing typhoon disaster losses. The indicators with greater weights were typhoon wind speed, number of damaged embankments, affected population, typhoon-affected population, damaged large and medium-sized reservoirs, damaged farmland area, and length of damaged embankments (Figure 1). Affected population is the total number of people affected by a typhoon process, including multiple disaster factors such as strong winds, heavy rain, storm surges, and floods. Typhoon-affected population is the instantaneous population impact caused by strong winds during typhoon landfall.
We quantified and ranked the weights of the influencing factors using the random forest method to guide more targeted disaster reduction measures. The results show that typhoon wind speed was the primary influencing factor, indicating that typhoon intensity dominated disaster losses. The number of damaged embankments was the second most important factor; such damage is generally caused by waves and storm surges from strong typhoons. Correspondingly, storm surge flooding after seawall damage affected large populations, making affected population the third most important factor. At the same time, high wind speeds during super typhoon landfall were more likely to cause fatalities. However, the impact caused by strong winds was smaller than the total fatalities. We speculate that, in addition to casualties caused by strong winds, heavy rain and marine disasters were also main causes of fatalities. The damage to large and medium-sized reservoirs (the fifth most important factor) was presumably directly related to the heavy rainfall that strong typhoons bring over wider areas. Besides personnel safety, damage to water conservancy facilities also carried a relatively important weight because of the associated farmland damage.
From the overall weight analysis results, it can be inferred that super typhoons have a strong disaster-causing effect. As embankments and reservoirs were damaged, population and farmland were the main affected objects, resulting in high direct economic losses. This conclusion is consistent with the view that extreme weather has led to more severe coastal floods in the context of climate change [23].
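One common way to obtain indicator weights from a random forest is the impurity-based importance exposed by scikit-learn, sketched below. Whether the study used this measure or another one (for example permutation importance) is not stated, so this is an assumption; the indicator names and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical indicator names and synthetic data (not the paper's data).
names = ["wind_speed", "damaged_embankments", "affected_population", "noise"]
X = rng.normal(size=(80, 4))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + 0.1 * rng.normal(size=80)

forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1.
ranking = sorted(zip(names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

Impurity-based importances can be split among strongly correlated indicators, which is worth keeping in mind when many of the 23 indicators overlap.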

Factor analysis
At the second stage, we conducted factor analysis to enhance the interpretability of the indicators and improve the prediction effect. We subjected the 23 indicator variables in the index system to factor analysis. For the first latent factor, the observed variables with absolute loadings greater than 0.5 were, in descending order, typhoon central pressure, typhoon-collapsed houses, affected population, and damaged large and medium-sized reservoirs (four variables in total). The second latent factor included collapsed houses, death toll, damaged houses, affected population, affected farmland, damaged farmland, damaged small reservoirs, damaged sluices, damaged dams, number of damaged embankments, length of damaged embankments, and typhoon wind speed (12 variables in total) (Figure 2).
The first latent factor reflects the severe impact of super typhoons on coastal areas and is named the super typhoon factor. Its main influencing variables are similar to those identified by the random forest analysis. Both methods show that affected population, rather than deaths, is the main factor, which suggests that comprehensive disaster prevention measures have been taken to protect personnel safety when super typhoons arrive. This interpretation is consistent with China's disaster prevention and mitigation measures in recent years. Additionally, heavy rainfall caused by super typhoons puts greater pressure on large and medium-sized reservoirs, raising the secondary risk of reservoir dam breaches in coastal areas; this is also the main risk that coastal areas face under uncertain extreme weather conditions [1,8].
The second latent factor reflects the large-scale impact of typhoons and can be called the coastal flooding factor; it mainly affects houses, population, farmland, and small and medium-sized water conservancy facilities. This is consistent with the main risks that China's coastal floods pose in practice. In the second latent factor, deaths are included as a major variable, indicating that areas with higher vulnerability may face greater personnel safety issues when struck by typhoons [1,24]. This contrasts with the results for the first latent factor and the random forest analysis regarding population loss. We speculate that although relatively complete disaster reduction measures have been taken in typhoon landfall areas [24], there are still areas with high vulnerability or inadequate disaster warning measures whose housing, farmland and facilities are more vulnerable [11,14]. It is worth noting that damaged embankments are also among the high-weight variables, indicating that marine disasters such as storm surges and waves caused by typhoons are also a key focus of typhoon disaster loss assessment [4,9].

Factor analysis-random forest regression model
In this stage, we used the latent factors as the input variables of the random forest regression model to predict direct economic losses, and we used 10-fold cross-validation to analyze the prediction accuracy of the model. We compared the combination model with the single decision tree (DT) and random forest (RF) models and with the factor analysis-decision tree (FA-DT) combination model in terms of accuracy (Table 1). Among the four methods, the factor analysis-random forest regression model had the highest prediction accuracy. Introducing factor analysis improved on the accuracy of the single model, which supports the use of the combined factor analysis-random forest regression algorithm for direct economic loss prediction.
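The four-way comparison can be sketched as follows, assuming scikit-learn. The data are synthetic, the hyperparameters and the choice of two factors are illustrative assumptions, and the printed errors say nothing about the values reported in Table 1.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a small sample: 60 events, 23 correlated indicators.
hidden = rng.normal(size=(60, 2))
X = hidden @ rng.normal(size=(2, 23)) + 0.3 * rng.normal(size=(60, 23))
y = 3.0 * hidden[:, 0] + hidden[:, 1] ** 2 + 0.1 * rng.normal(size=60)


def with_factors(regressor):
    """Prefix a regressor with scaling and a 2-factor analysis step."""
    return make_pipeline(StandardScaler(),
                         FactorAnalysis(n_components=2, random_state=0),
                         regressor)


models = {
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "FA-DT": with_factors(DecisionTreeRegressor(random_state=0)),
    "FA-RF": with_factors(RandomForestRegressor(n_estimators=100, random_state=0)),
}

folds = KFold(n_splits=10, shuffle=True, random_state=0)
rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=folds,
                             scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()  # the scorer returns the negated RMSE
    print(f"{name}: mean RMSE over 10 folds = {rmse[name]:.3f}")
```

Reusing the same `KFold` object for every model keeps the folds identical, so differences in error come from the models rather than from the split.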
The results show that the random forest method scored 0.90, which is much higher than the decision tree model score of 0.49. Both the single and combination models of random forest outperform the decision tree model. Random forest is a machine learning algorithm that uses multiple decision trees to make predictions. It reduces the variance seen in decision trees by using different samples for training, specifying random feature subsets, and building and combining small trees. Decision trees, on the other hand, are graphs that illustrate all possible outcomes of a decision using a branching approach, which is simple but prone to overfitting. This is the main reason why random forest achieves higher prediction accuracy.
Comparing the influence weights from random forest (Figure 1) with those from the decision tree (Figure 3), the decision tree results indicate that affected farmland area is the absolutely dominant factor contributing to disaster losses. Other variables such as affected population, dam damage, power interruption, and damaged reservoirs have smaller weights, and not all variables enter the decision tree analysis. Judged against actual disaster situations, the weight analysis results of the random forest method are more reasonable and comprehensive than those of the decision tree. Besides accuracy, an advantage of random forest is that it makes the weights of the influencing factors more reasonable and interpretable.
Resampling mitigated the problem of inaccurate predictions caused by the small sample size, supporting the reliability of the random forest method in predicting direct economic losses caused by typhoons. However, expanding the sample size to improve prediction accuracy remains the main challenge in typhoon disaster loss prediction [12,21]. Although this problem can be eased to some extent by data preprocessing, statistical methods, machine learning and other methods, data availability has always been a core problem that limits the accuracy of disaster assessment [23,25].

Conclusion
This paper builds a typhoon disaster loss assessment system based on disaster investigation standards and uses factor analysis to obtain super typhoon and coastal flood factors, which help interpret the impact indexes and improve prediction accuracy. We then use the factor analysis-random forest model to predict direct disaster losses. This model is well suited to nonlinear typhoon disaster data and shows the highest accuracy, as verified with cross-validation and RMSE.
In this paper, we select indicators based on disaster investigation standards for data uniformity, without considering spatial differences, vulnerability and resilience, which leaves the indicator set incomplete. In the next step, we need to analyze the adaptability and sensitivity of the indicators more comprehensively within the risk assessment framework. We will also explore how to integrate relevant indicators with prediction models to support disaster prevention and reduction. In the simulation and prediction process, attention should be paid to redundancies and couplings among the factors, so as to simulate the mechanism of typhoon disaster economic losses more comprehensively and accurately and to explore how the model can be applied to address shortcomings in disaster prevention and mitigation policy.

Figure 1. Variable weights in random forest
Figure 2.

[1] Wang Z, Chen X, Qi Z and Cui C 2023 Flood sensitivity assessment of super cities Scientific Reports 1 5582
[2] Xie L and Zhang Z 2010 Study on the relationship between intensity, spatial-temporal distribution of storm surges and disaster losses along the coast of China in past 20 years Marine Science Bulletin 6 690
[3] Gan S, Zhang W and Zong H 2012 Analysis of typhoon storm surge disasters along the south China coast and disaster prevention measures Hydro-Science and Engineering 6 51
[4] Wang S 2021 Risk Assessment and Zoning of Storm Surge Disaster Using GIS Techniques and Convolutional Neural Network (Wuhan: China University of Geosciences)
[5] Jiang X, Mori N and Tatano H 2019 Simulation-Based Exceedance Probability Curves to Assess the Economic Impact of Storm Surge Inundations due to Climate Change: A Case Study in Ise Bay, Japan Sustainability 4

Bulletin of Flood and Drought Disaster in China, Bulletin of China Marine Disaster, China Statistical Yearbook, and Yearbook of Meteorological Disaster in China from 2009 to 2021. We categorized the data according to the losses of different sectors, such as agriculture, transportation, electricity, water conservancy facilities, and housing.