Comparing and Evaluating Macao Flood Prediction Models

Climate change causes extreme weather in Macao, especially typhoons and flooding. Some raw flood data from the Macao Meteorological and Geophysical Bureau is missing because several flood sensors were damaged during Typhoon Hato in 2017 and Typhoon Mangkhut in 2018. We therefore use data interpolation to construct new datasets and curve fitting to simulate the real inundation depth. We then explore Neural Network, Long Short-Term Memory, Random Forest, Adaptive Boosting, and Linear Regression models to analyze, compare, and evaluate the best combinations of flood prediction models, datasets, and typhoon-induced scenarios in Macao. Furthermore, we apply a Bayesian Network to these models to evaluate their accuracy in predicting typhoon-induced flooding. The experimental results show that the models differ in performance when predicting specific scenarios.


Introduction
Flooding from typhoons not only injures and kills people but also destroys fundamental infrastructure and spreads disease. Super Typhoon Hato (No. 1713) brought extremely destructive winds and caused severe flooding in Macao [1], killing 10 people and injuring 244 more. During the passage of Super Typhoon Mangkhut (No. 1822), the storm surge rose to 1.5 meters in Macao, about 21,000 homes lost power, and 7,000 homes lost internet access [2]. Therefore, it is essential to forecast floods so that residents can be warned in time and make preparations.
The authors of [3] analyzed the reasons for the sharp increase of Typhoon Hato and the change process of hourly rainfall within 3 days of Typhoon Mangkhut landing in China, then assessed the risk and dynamic change of the rainstorm disaster [4]. The authors analyzed the influence of Typhoon Hato on Sanshui and Sihui of Foshan in Guangdong Province and determined the evolution of its intensity [5]. The authors of [6] predicted the formation process of typhoons and improved rainfall forecasting research [7] by studying the spatial distribution of rainfall and the dynamic prediction of typhoons in Wenzhou, Zhejiang Province [8].
Flood forecasting samples for typhoons in Macao are insufficient for the following reasons. First, the water level monitoring stations were built only after 2015. Second, the sample size is low.

Related work
The authors built a Neural Network model to train on the data and used an LSTM model to predict wind speed [9]. Meanwhile, ensemble learning was introduced into forecasting, such as an AdaBoost model with added undersampling [10]. Tests on rainstorm prediction verified that this method can improve prediction accuracy. Then, the authors built an optimized Random Forest model for training and testing the data [11]. Finally, they compared the accuracy of this model with that of other machine learning models, and the result showed that it has high recognition accuracy. In indoor localization, Linear Regression was trained on historical data to predict the location of the destination [12]. The authors adopted a probabilistic framework for prediction [13], because a probabilistic model can handle observations whose values are uncertain. Bayes optimization is the most effective way to solve this problem because it obtains an approximately optimal solution at a lower evaluation cost [14].

Dataset design
The private dataset comes from the Macao Meteorological and Geophysical Bureau (SMG), the Macau Maritime Administration (MMA), and the Meteorological Bureau of Shenzhen Municipality (MBSM). Table 1 shows the data supplied by MBSM, including water level (Level), rainfall (PRES), wind direction (TG WDIR), wind speed (WND), average wind speed (TG WSPD), the tide of Shekou Port (SK Tide), the tide of Shenzhen Airport (JCH Tide), and the tide of DaPeng Peninsula (DAP Tide). Table 2 displays part of the data from the 11 water level monitoring stations supplied by SMG. First, we clean the original water level data, which includes missing data reprocessing, data integration, and data [...]. Six of the water level monitoring stations are at the main flooding locations in Macao, where the water level is low and vulnerable to reflux. Because some data was missing from the water level monitoring stations in Macao, we use interpolation to construct new data points within the range of known data points by analyzing the tide level data from MBSM.
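The interpolation step can be sketched as follows. This is only an illustration: it assumes plain linear interpolation between the nearest known readings, which the paper does not confirm, and the readings and the helper name fill_missing are hypothetical.

```python
def fill_missing(levels):
    """Linearly interpolate None gaps that lie between two known readings.

    Gaps at the very start or end are left as None, because they have
    only one known neighbour and would require extrapolation.
    """
    filled = list(levels)
    known = [i for i, v in enumerate(levels) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = levels[left] + frac * (levels[right] - levels[left])
    return filled


# Hypothetical water levels (metres); None marks a damaged-sensor gap.
readings = [1.0, None, None, 2.5, 3.0, None, 2.0]
filled = fill_missing(readings)
print(filled)
```

Each gap is filled only from its two known neighbours, so constructed points always stay within the range of the surrounding measurements.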

Model selection
This section describes the Machine Learning models used to train on the data and the approaches used to describe the relationships within the data.

Neural Network
A Neural Network (NN) is trained by forward propagation of signals and back propagation of errors. In forward propagation, input data enters the input layer and is processed in the hidden layer, and the prediction is then emitted from the output layer. If the prediction differs from the real data, the error is fed back: it propagates backward through the output layer, hidden layer, and input layer to revise the weight of every connection.
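A minimal sketch of forward propagation and error back propagation for a network with one hidden layer is given below. The toy data, layer size, tanh activation, and learning rate are all assumptions chosen for illustration and are not the configuration used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative only): learn y = sin(x).
X = np.linspace(-2.0, 2.0, 64).reshape(-1, 1)
y = np.sin(X)

# One hidden layer of 8 tanh units and a linear output layer.
W1 = rng.normal(0.0, 0.5, size=(1, 8))
b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(8, 1))
b2 = np.zeros(1)


def forward(inputs):
    """Forward propagation: input layer -> hidden layer -> output layer."""
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2


learning_rate = 0.05
losses = []
for _ in range(500):
    hidden, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Back propagation: push the error from the output layer to the input side.
    grad_pred = 2.0 * err / len(X)
    grad_W2 = hidden.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_hidden = grad_pred @ W2.T
    grad_z = grad_hidden * (1.0 - hidden ** 2)  # derivative of tanh
    grad_W1 = X.T @ grad_z
    grad_b1 = grad_z.sum(axis=0)

    # Revise every weight against its gradient.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print(losses[0], losses[-1])
```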

Long Short-Term Memory
Long Short-Term Memory (LSTM) is designed to solve the vanishing gradient problem that arises when training a Recurrent Neural Network (RNN). On top of the RNN, the LSTM adds three gates: an input gate, an output gate, and a forget gate.
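For reference, the three gates can be written in the commonly used LSTM formulation below. The notation is an assumption, since the paper does not give its own: x_t is the input, h_t the hidden state, c_t the cell state, \sigma the logistic sigmoid, and \odot the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of c_t is what lets gradients flow across many time steps without vanishing as quickly as in a plain RNN.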

Random Forest
In Random Forest (RF), the decision trees are built independently, so there is no correlation between them. Building each decision tree involves random sampling and complete splitting. Rows and columns are sampled separately; rows are sampled with replacement, and the number of sampled rows equals the number of input rows. After row sampling, k features are chosen from all N features (k ≤ N), the sampled data is split on them, and the decision tree is built. Splitting continues until each leaf node either cannot be split further or contains samples of a single class. The random sampling at the start helps avoid overfitting.
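The two sampling steps can be sketched as follows, following the per-tree description above. The rows, the value of k, and the helper names are hypothetical. Note that many implementations, such as scikit-learn, draw the feature subset at each split rather than once per tree.

```python
import random


def bootstrap_sample(rows, rng):
    """Row sampling: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def feature_subset(n_features, k, rng):
    """Column sampling: pick k distinct feature indices out of n_features."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(42)

# Hypothetical rows with N = 4 features each.
rows = [
    (0.1, 5.0, 1.2, 0.3),
    (0.4, 3.0, 1.1, 0.9),
    (0.7, 4.0, 0.8, 0.5),
]

sample = bootstrap_sample(rows, rng)
cols = feature_subset(n_features=4, k=2, rng=rng)

# Data handed to one decision tree: sampled rows restricted to k columns.
tree_input = [[row[c] for c in cols] for row in sample]
print(cols, tree_input)
```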

Adaptive Boosting
Adaptive Boosting (AdaBoost) classifies the training data in every training round. If a sample is classified correctly, its weight is reduced; otherwise, its weight is increased. The classification accuracy of the previous round thus determines the weight of each sample in the training data set. The reweighted data set is then passed to the next classifier for training. Finally, the weak learners are combined to reach a higher accuracy.
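One round of the reweighting described above can be sketched with the classic discrete AdaBoost update for classification. The sample weights and the correctness flags are hypothetical, and the regression variant used later in the experiments applies a different update rule.

```python
import math


def update_weights(weights, correct):
    """One AdaBoost round: shrink weights of correct samples, grow the rest.

    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1.0 - err) / err)  # say of this weak learner
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return [w / total for w in raw], alpha


# Four equally weighted samples; the weak learner gets the last one wrong.
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]

new_weights, alpha = update_weights(weights, correct)
print(new_weights, alpha)
```

After normalization the misclassified sample carries half of the total weight, so the next weak learner is pushed to focus on it.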

Linear Regression
Linear Regression (LR) is an approach for modeling the relationship between independent variables and dependent variables. LR uses a linear formula to describe the correlation between them. The output value is estimated by substituting the independent variables into the formula, and the model accuracy is then calculated.
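A minimal sketch with a single independent variable is shown below, using the ordinary least squares closed form. The data points are made up, and using R squared as the accuracy measure is an assumption, since the paper does not name its metric.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical (independent, dependent) observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(xs, ys)
score = r_squared(xs, ys, slope, intercept)
print(slope, intercept, score)
```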

Bayesian Network
A Bayesian Network (BN) represents causal relationships between events as conditional probabilities, so it can infer the probability of a cause given the presence of an effect, and vice versa. New evidence updates the previous conclusions. Reasoning can still be carried out even when the data is incomplete.
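Inferring a cause from an observed effect can be illustrated on the smallest possible network, a single cause node pointing to a single effect node, using Bayes' rule. All probabilities below are invented for illustration and are not taken from the experiments.

```python
def posterior(prior, p_effect_given_cause, p_effect_given_no_cause):
    """P(cause | effect) for a two-node network: cause -> effect."""
    evidence = (
        p_effect_given_cause * prior
        + p_effect_given_no_cause * (1.0 - prior)
    )
    return p_effect_given_cause * prior / evidence


# Hypothetical numbers: cause = typhoon present, effect = flooding observed.
p_typhoon_given_flood = posterior(
    prior=0.1,                     # P(typhoon)
    p_effect_given_cause=0.8,      # P(flood | typhoon)
    p_effect_given_no_cause=0.05,  # P(flood | no typhoon)
)
print(p_typhoon_given_flood)
```

Observing the effect raises the belief in the cause from the prior of 0.1 to the posterior, which is how new evidence revises earlier reasoning.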

Experiment
The water warning level definitions are given in Table 3, where the pl symbol represents the predicted water level and the rs symbol represents the road surface level.
Table 3. Classification of water warning levels in Macao.

RF and AdaBoost Result
This experiment uses the library functions RandomForestRegressor() and AdaBoostRegressor() to predict the water level of the Inner Harbor area, training on data with different time densities. The comparison of results is shown in Fig. 4. The ideal model is the Mangkhut model trained on the data of Typhoon Mangkhut, and the best time density is five minutes. Finally, the Mangkhut model is evaluated using BN.
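A hedged sketch of how the two regressors might be trained and compared with scikit-learn follows. The synthetic features stand in for the private dataset, and the hyperparameters, the chronological train/test split, and the mean absolute error metric are assumptions rather than the settings used in this experiment.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-in for (tide, rainfall, wind speed) -> water level.
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300)

# Keep the time order: train on the first 240 rows, test on the rest.
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

models = {
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(name, round(scores[name], 3))
```

Repeating this loop once per time density would give the kind of side-by-side comparison that the experiment reports.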

AdaBoost
The previous experimental results in Table 4 show that Neural Network works best in predicting flooding warning levels 2, 3, 4. Random Forest Detection works best in levels [...]. [...] Regression Detection works best [...] in levels 3, 4. LSTM works best in levels 1, 2. Neural Network Detection works best in levels 3, 4, 5.
As a result, we observe that the combination model of RF and LSTM suits low- and middle-risk areas (Fig. 6). The combination of NN and LR performs better in middle- and high-risk areas, and NN can be used in high-risk areas.

Conclusion
This paper utilizes typhoon rainfall data in Macao, which has post-typhoon flooding. The data is categorized by time density and used to realize models including Random Forest, AdaBoost, and Linear Regression. First, we use the data of Typhoon Hato to train the Typhoon Hato model, and train the Mangkhut model separately using the data of Typhoon Hato and Typhoon Mangkhut. Second, we analyze the results and build the spatial distribution model of typhoon rainfall in Macao and the prediction model for the dynamic change of typhoon rainfall. Third, the sample is tested using the established prediction model to obtain the accuracy probability distribution. Fourth, parameters and permutation are utilized to find the best method to predict the water situation in Macao. Fifth, the Bayes Distribution Map of the overall evaluation of the severe flooding in Macao is drawn using the Bayes software Netica. As a result, we observe that the combination model of Random Forest and LSTM suits low- and middle-risk areas. The combination of BP Neural Network and Linear Regression is better in middle- and high-risk areas. BP Neural Network can be used in high-risk areas.