Reliability-based prediction of landslide using efficient Monte Carlo Simulation

Several studies have attempted to develop thresholds for prediction of landslide initiation due to antecedent rainfall by using the physical-based approach. The approach offers a better understanding of the underlying process of the landslide mechanism. In the previous studies, however, determinate values of hydro-mechanical soil parameters are commonly considered, neglecting the uncertainties of the variables. Thus, this study aims to propose a relatively computationally, inexpensive reliability approach to address the uncertainties of the multivariate. The application of the proposed model is demonstrated by taking a case study from Sabah, Malaysia. The Latin Hypercube Sampling was adopted for efficient sampling, while the dependencies of the multivariate were modelled using a higher-order copula. The infiltration process and slope stability were assessed based on a fully-coupled seepage-deformation and shear strength reduction technique, respectively in a finite element programme. The probabilities of failure for various antecedent rainfall condition were computed by using Monte Carlo Simulation. In order to reduce the computational cost of the simulation, the Multilayer Perceptron regressor was applied to form the non-linear surrogate model for an implicit limit state function.


Introduction
In the tropical climate region, many rainfall-induced landslides have been reported, such as those in China, India, Indonesia, Singapore, Thailand, as well as in Malaysia. The occurrences of the landslide can be predicted by using the physical-based threshold [1]. The physical-based (also known as conceptual or process-based) threshold involves a simulation of landslide mechanism due to rainfall infiltration. In unsaturated residual soil, Rahardjo et al. [2] have highlighted the significant role of antecedent rainfall in triggering a landslide. The antecedent rainfall can be defined as the precipitation that occurs in several days closely before the landslide. Their conclusion concurs with the previous study by Wei et al. [3] who have found out that the landslide at Bukit Batok, Singapore happened after several days of torrential downpour, but no rainfall occurred during the landslide incident. Their findings conclude that a landslide is not necessarily triggered by rainfall on the landslide event and provide evidence of the critical role of antecedent rainfall in triggering a landslide.
In the development of the physical-based threshold, most of the previous researches considered determinate value for each hydraulic and mechanical soil parameters, except that by Dou et al. [4] and Ering and Babu [5] to the best of the author's knowledge. However, the geomaterial contains a certain degree of uncertainties due to spatial variability, random testing error, sub-standard measurement procedure, and statistical error such as too few data [6]. Dou et al. [4] applied the Monte Carlo Simulation by considering the variability in hydraulic conductivity for a hypothetical infinite slope. Ering and Babu [5] considered the uncertainties of cohesion, friction angle, and matric suction by IOP Publishing doi: 10.1088/1757-899X/1153/1/012002 2 adopting the Bayesian analysis for a case study in Malin, India. In the Bayesian approach, the parameter value which characterise a system is inferred based on prior information of measured value. However, the approach suffers from the main limitation that the selection of prior probability is subjective and requires information from previous studies, which may not be available or inadequate. In the broader context of the rainfall-induced landslide assessment, several other studies have adopted reliability analysis [7,8]. Nonetheless, the construction of physical threshold for landslide initiation due to antecedent rainfall, which consider the various uncertainties in hydromechanical soil parameters is still lacking.
Therefore, this study aims to propose a relatively computationally, inexpensive reliability analysis framework to construct the physical threshold for the landslide initiation in unsaturated soil, subjected to antecedent rainfall. The uncertainties of random variables of both hydraulic and mechanical soil parameters are considered. The application of the proposed model will be demonstrated based on a case study of a rainfall-induced landslide in Sabah, Malaysia.

Study area
The capital city of Sabah state in Malaysia, i.e. Kota Kinabalu is one of the densely populated areas in the state. One of the landslide-prone areas within the vicinity of Kota Kinabalu is situated around a local access road, known as Jalan Penempatan ( Figure 1). The road which stretches of about 3 km long (from 5.954484 °N, 116.094497 °E to 5.934683 °N, 116.092300 °E) serves as an important route which connects several housing areas to the city centre. The study location which covers an area of about 1.5 km 2 , pertains to a natural hilly terrain with an elevation from 10 to 85 m above mean sea level. The site is underlain by the Eocene-Oligocene Crocker Formation, which mainly consisted of sandstone and shale. The residual soil typically consists of stiff to very hard sandy, clayey silt with an average depth of 10 m. The bedrock comprised of very weak to moderately strong sandstone.
A major landslide in the study area has occurred on 9 th June 2010 at Kilometre 2 (5.941728 °N, 116.094071 °E). The landslide event was due to occurrence of about 9-day rainfall before the incident where (θ) is permeability; is pressure head; z is elevation head; t is time, and; (θ) is specific moisture capacity. .
where θ is volumetric water content at a matric suction; θ is residual volumetric water content, θ is saturated volumetric water content; is matric suction, and; α, are fitting parameter. The fitting parameter α (also known as scaling parameter) is related to the air entry value of the SWCC, while the fitting parameter n (also known as the shape parameter) defines the slope of the SWCC. The bestfitting parameters can be determined by adopting the technique of curve fitting, in which the objective is set to minimise the sum of squared errors between the measured volumetric water content and the modelled volumetric water content. The combined equation by Van Genuchten [8] and Mualem [9] was used to derive the hydraulic conductivity function as shown in Equation 3. where is hydraulic conductivity at a matric suction, is saturated hydraulic conductivity, is effective water content, is parameter describing the pore structure of soil, and; m is a fitting parameter calculated as 1−1/ .

3.1.
3 Elasto-plastic model. The shear strength of the unsaturated slope in this study was determined based on the modified Mohr-Coulomb failure criterion as shown in Equation 4.
where τ f is the shear strength; c′ is effective cohesion; φ' is effective internal friction angle; σ is total normal stress; u a is pore air pressure; u w is pore water pressure; and χ is an effective stress parameter which is a function of degree of saturation.

Geostatistical model and dependencies of multivariate
In this study, eight number of hydro-mechanical soil parameters namely the saturated hydraulic conductivity (ks), fitting parameter α and n of SWCC, saturated volumetric water content (θs), residual volumetric water content (θr), unit weight (γ), effective cohesion (c'), and effective internal friction angle (ϕ') were considered as the random variables. The geostatistical modelling involved the best fitting of marginal distributions and the generation of random samples. Based on these data, the dependencies of the multivariate can be constructed by using a higher-order copula.
The best-fit marginal distribution for each variable was identified based on the lowest score of Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) by applying Equation 5 and 6. Four candidates of marginal distribution, i.e. normal, lognormal, Gumbel, and Weibull distributions were considered.
where N is the number of input variable; ( ) is the distribution of variable xi; and k is the number of distribution parameter. Then, the Latin Hypercube Sampling was applied for efficient sampling, where the space of sample was evenly divided into possible subspaces, and samples point in each subspace were selected concurrently for an equal density of sampling in each subspace. No specific guideline is available on the required number of sample size. However, Silvestrini [10] have proposed sample size of 15D, where D is the dimension of input variables, as a safe rule of thumb for effective computer experiment with acceptable accuracy. Thus, in this study, the number of samples was taken as 15D to maintain both accuracy and computational cost.
Next, the dependencies of multivariate amongst the eight soil parameters were modelled using the drawable vine (D-vine) copula. The vine copula allows the modelling of the relationship amongst higher-order variables by taking the bivariate copula as the building block. Besides, different types of bivariate copula can be selected for the vine copula construction. In this study, five models of bivariate copula, namely the Clayton, Frank, Gumbel, Gaussian, and Student's t were considered as the candidates of the best-fit bivariate copula. The D-vine copula can be constructed according to Equation 7.
where νj is a random element of ν, and ν−j represents the (n − 1)-dimensional vector ν except νj. The function of bivariate copula is defined by | − with parameter θ defined in tree n. The most appropriate bivariate copula was identified based on the lowest scores of AIC and BIC (Equation 9 and 10).
where θ is copula parameter, L is the log-likelihood function, u is standard uniform random samples, N is the number of input variable, and k is the unknown variable. For Clayton, Frank, Gaussian, and Gumbel copula, θ = 1, and therefore, k = 1. For the Student's t copula, a degree-of-freedom, ν is considered in addition to the copula parameter θ, and thus, k = 2. Random samples from the cumulative distribution function, constructed from the conditional bivariate copulas can be generated by performing Rosenblatt Transformation. The transformation involves successive conditioning, as explained in Equation 11.
where xi is random samples, u is standard uniform random samples [0,1], and −1 is inverse of the marginal cumulative distribution function.

Finite element model
A two-dimensional slope geometry (Figure 3) was modelled in a general-purpose finite element software, Abaqus. Homogeneous soil layer of sandy, clayey silt was considered in the simulation. The boundary conditions of the slope model were assigned as follows: (i) GH and EF as Dirichlet seepage indicated by the groundwater table; (ii) AH, DE, and FG as zero flux to develop the pore water pressure within the slope model; (iii) AB, BC, and CD as surface flux due to rainfall infiltration; (iv) ABCD as drainage-only condition so that developed pore pressure is not greater than zero; (v) AG and DF were restricted in the X-axis direction; and (vi) GF was restricted in both X-and Y-axes direction.  The element type used was a four-node plane strain quadrilateral with free mesh type. Mesh convergence study has been performed to arrive at an acceptable accuracy of finite element result with a reasonable computational cost. Therefore, the global mesh size of 1 m was applied on the slope model, while a localised coarser mesh of 1.5 m was applied at the base of the slope model to reduce the computational cost. After performing the coupled seepage-deformation analysis, the shear strength reduction technique was applied to compute the factor of safety.

Regression analysis based on machine learning
A regression analysis was performed to determine the limit state function of the sampling-based reliability analysis. In this study, the feed-forward artificial neural network of Multilayer Perceptron was selected for the supervised regression. The multilayer perceptron network consists of three layers, namely the input, hidden, and output layers. The number of hidden layer in this study was fixed at one as it is found that the single layer is adequate to train any complex non-linear functions. Not only will redundant hidden layers adding model complexity, but they also tend to cause overfitting of the model. The learning and convergence rate using a single hidden layer is also more flexible and faster, respectively, as compared to multiple hidden layers. The relatively latest method with improved performance, proposed by Sheela and Deepa [11] was adopted in this study as an initial guess on the required number of the hidden node. The method is explained by Equation 12.
where ℎ is the number of hidden node and in the number of input node. Firstly, the training and testing dataset were randomly split into 70% and 30% respectively. Then, the input datasets were normalised according to the minimum-maximum scaling. The early stopping method was applied to achieve an optimum model performance while preventing overfitting. Next, a 10-fold cross-validation was adopted for the training of the model to minimise the bias due to the random selection of training and remaining dataset. The performances from the 10-fold crossvalidation were then averaged to assess the overall performance of the model training.
The relationship between the input vector x and output vector y in a Multilayer Perceptron network can generally be expressed by Equation 13.
where is the number of input node; ℎ is the node number in the hidden layer; is the ith input unit; ℎ is the weight between the input node i and hidden node ℎ; ℎ is the weight between the hidden node ℎ and output node o; ℎ is the bias (also known as threshold) for the hidden node h; is the bias for the output node o; ℎ is the transfer function of the hidden layer; and is the transfer function for the output layer.
Lastly, the performances of the regression model for the training and testing dataset were assessed in terms of correlation coefficient (R 2 ), root mean squared error (RMSE), and mean absolute error (MAE).

Sampling-based reliability analysis
A sampling-based reliability analysis, namely the Monte Carlo Simulation (MCS), has been adopted in this study. The main idea of MCS is the generation of sets of random numbers according to the probability density function of the input variables and the performance function for each set are then computed. The MCS exhibits several merits such as a large number of simulations can be realised to achieve higher accuracy, a various statistical distribution of variables can be considered, the error in result can be easily determined, the velocity of simulation convergence is independent of the dimension of the random variables, the simulation procedure is not affected by the complexity in performance function, and the system failure probabilities of slopes which are governed by several representative slip surface can be captured [12]. However, the direct simulation demands a substantial computational cost, especially in the case of a smaller probability of failure (e.g. lesser than 10 −6 ). Therefore, the MCS is integrated with the regression function from the Multilayer Perceptron to address the processing time efficiently. In this study, it was found that the probability of failure and coefficient of variation were converging towards the 100,000-realisation number. The expected performance levels for the different probability of failure can be evaluated based on the target reliability indices published by the US Army Corps of Engineers [13].

Geostatistical and multivariate dependencies model
Based on the lowest scores of AIC and BIC (Table 1), the saturated hydraulic conductivity, fitting parameter α and n of SWCC, saturated water content, and effective friction angle can be best-fitted by using the lognormal distribution. The unit weight and effective cohesion are best-fitted by the normal distribution. The Gumbel type is the most appropriate distribution to represent the residual water content. Then, the dependencies of the multivariate were modelled by using the D-vine copula (Figure 4). At the first tree of the vine copula model, the sequence of pair of variables is determined by identifying the order with the maximum summation of the total pairwise Kendall's correlation coefficient. Thus, the bivariates in the first tree are arranged in the following sequence: 1) unit weight with saturated water content (joint by Clayton copula); 2) effective friction angle with unit weight (joint by Frank copula); 3) effective cohesion with effective friction angle (joint by Student's t distribution); 4) residual water content and effective cohesion (joint by Student's t distribution); 5) shape parameter n and residual water content (joint by Clayton copula); 6) saturated hydraulic conductivity and shape parameter n (joint by Clayton copula); and 7) scaling parameter α and saturated hydraulic conductivity (joint by Clayton copula). In tree two, six conditional bivariate copulas are constructed by the decomposition of the copula in the tree one. The decompositions of the conditional bivariate copulas in tree two were then performed to construct new conditional bivariate copulas in tree three. These steps were continued until the last tree. In the D-vine copula, the Clayton copula was applied for ten times, the Student's t copula was selected for eight times, the Frank copula was chosen for four times, and lastly, the Gaussian and Gumbel copulas were chosen for three times respectively. In total, 28 numbers of pair copula were constructed in the D-vine copula.  Figure 4. Conditional bivariate copula modelled for seven trees in the D-vine copula. The letter in upper-right or -left corner of each copula diagram represent the best-fitted bivariate copula where "C" is Clayton copula, "F" is Frank copula, "G" is Gumbel copula, "N" is Gaussian copula, and "t" is Student's t copula. The letter "T" indicates the tree number for each row. The variables joint in the bivariate copulas are indicated as follows: 1 for saturated hydraulic conductivity; 2 for scaling parameter α; 3 for shape parameter n; 4 for saturated water content; 5 for residual water content; 6 for unit weight; 7 for effective cohesion; and 8 for effective friction angle.

Regression model based on Multilayer Perceptron
The topology of the neural network model constructed in this study is explained in Figure 5. The initial hidden node number is assumed as five. The regression model performance is assessed by applying a higher number of hidden nodes, yet the increase in accuracy is negligible. Thus, the selection of five number of hidden nodes is justified in this study. The limit state function equation can then be derived. The R 2 coefficients for the training dataset range from 0.9882 to 0.9898. The MAE and RMSE for the training dataset vary from 0.0271 to 0.0366 and 0.0311 to 0.0376, respectively. The testing dataset show R 2 coefficients between 0.9477 and 0.9557. It also displays MAE and RMSE from 0.0641 to 0.9542 and 0.0772 to 0.1227, respectively. As the value of R 2 coefficients are mostly approaching one, while the values of MAE and RMSE are close to zero for both training and testing dataset, the performance of the regression model is acceptable.

Threshold for landslide initiation
In this study, the probability of failure ranges from 0.0001 from 0.051 for different antecedent rainfall condition ( Figure 6). The mean factor of safety display values between 0.949 and 1.782. The probability of failure is generally increasing, while the mean factor of safety is decreasing with the increase in the rainfall duration and cumulative precipitation. This can be explained that the shear strength of the soil is reduced due to seepage in the slope. During the initial stage, prior to application of antecedent rainfall, the probability of failure is 0.0001, which can be categorised as "good" performance level. When the rainfall was applied for one day, the probability of failure increase to 0.00094. At this stage, the performance level is approaching the threshold of "above average". Further application of antecedent rainfall from day two to day seven has caused a rise in the probability of failure from 0.0014 to 0.0056, with a reduction in the mean factor of safety from 1.602 to 1.196. During this 6-day rainfall duration, the slope shows the performance level of "above average". On day two and day five, no increase in rainfall infiltration has been observed. However, the probability of failure is kept increasing due to redistribution of rainwater in the slope. The performance level of the slope decline to "below average" upon application of 9-day antecedent rainfall. The slope is considered fail when the performance level reaches the "poor" threshold. The corresponding probability of failure is 0.0051. Therefore, the landslide can be defined as the condition when the "poor" performance level is reached or exceeded. The 9-day antecedent rainfall with a cumulative amount of 273 mm depth, combined with a major storm of 74 mm depth is therefore considered as the rainfall threshold which may trigger landslide in the study area.

Conclusion
This study has presented a relatively computationally, inexpensive reliability-based approach to evaluate the physical threshold of landslide due to antecedent rainfall. The application of the framework has been demonstrated by taking a case study from Sabah, Malaysia. The uncertainties considered are the eight dependent random variables (saturated hydraulic conductivity, scaling parameter α and shape parameter n of SWCC, saturated water content, residual water content, unit weight, effective cohesion, effective friction angle), which were efficiently sampled using the Latin Hypercube Sampling. The dependencies of the multivariate were also constructed using the flexible Dvine copula, in which the best-fitted bivariate copula can be selected as the building block. The Multilayer Perceptron network, which can capture the non-linear relationship between input and output variables is adopted to derive the limit state function from the coupled seepage-deformation and shear strength reduction analysis. By taking the limit state function as the surrogate model for the expensive deterministic analysis, the Monte Carlo Simulation is realised to compute the probability of failure and mean factor of safety. It is found that the proposed model displays an acceptable performance with the R 2 coefficients approaching one, while the MAE and RMSE are close to zero for both training and testing dataset. The expected performance level can then be assessed based on the probability of failure, which provides a more representative index for the slope stability. It has been demonstrated in this study that the landslide may occur when the "poor" performance level is reached or exceeded. The physical threshold in the study which may trigger landslide pertains to a cumulative rainfall depth of 273 mm for 9-day antecedent rainfall and a major storm of 74 mm depth.