Landslide Susceptibility Mapping using Genetic Expression Programming

The increasing demand for land and the mismanagement of land have increased the occurrence of landslides around the world. Recognizing landslide characteristics and the factors that influence this phenomenon is important for mitigating its adverse economic and environmental impacts. This study estimates landslide susceptibility in an area of Siahkal in Gilan province, Iran, by formulating a model using Gene Expression Programming (GEP). Seven conditioning factors were used in this research: altitude, aspect, slope, proximity to rivers, proximity to faults, land use and lithology. The proposed model was developed as an equation, and its accuracy was assessed by the area under the Receiver Operating Characteristic (ROC) curve, which was 0.82 for the training data and 0.77 for the test data. The result was also compared with the landslide inventory map of the study area, which was constructed by field surveying and interpretation of airborne and satellite images. The landslide susceptibility map indicates that the northern part of the area has the highest landslide susceptibility, which is in agreement with the inventory map of previous landslides.


Introduction
Natural hazards can cause social and economic crises and may lead to numerous casualties [1; 2]. Landslides are among the most devastating geological hazards and may cause serious damage to infrastructure and the environment [1; 3]. It is therefore crucial to predict and delineate landslide-prone areas in order to avoid the adverse effects of landslides on people's lives and property [1; 4]. Different methods have been used to understand the importance of the various factors that affect landslide susceptibility [4]. These include both statistical methods and machine learning methods [5]. A drawback of statistical methods, which are inherently linear, is that they require the environmental factors to be normally distributed [5]. Advances in computing technologies, on the other hand, have made it possible to extend statistical methods into machine learning methods [6]. Machine learning techniques can use input variables without statistical regularity [5]. Well-known machine learning methods include Support Vector Machines (SVM), Decision Trees (DT), Logistic Regression (LR) and Artificial Neural Networks (ANN) [5; 6; 7].
ANN has been applied to landslide susceptibility mapping in several studies [8; 9; 10; 11; 12], as have SVM [8; 9; 13; 14] and LR [9; 12; 15; 16; 17]. ANN is suited to complex modelling. However, it is a black-box model whose outcome is difficult to interpret, and models generated by ANN are prone to local optima and overfitting [5; 18; 19]. SVM training is time-consuming, and the resulting model is likewise not interpretable [5; 13]. These limitations encouraged researchers to use Evolutionary Algorithms (EA), namely Genetic Algorithm (GA), Genetic Programming (GP) and Gene Expression Programming (GEP) [19; 20]. GP is less likely to become trapped in local optima, and it is capable of modelling a nonlinear problem such as landslide susceptibility [19; 21]. GP can express the model as explicit, comprehensible equations, which has led researchers to conduct studies based on evolutionary computation approaches [19].
Some previous studies have used GEP to create models based on different environmental factors that affect landslide occurrence. In this research, GEP is likewise used to produce a landslide susceptibility map. GEP was applied to seven conditioning factors: altitude, aspect, slope, proximity to rivers, proximity to faults, land use and lithology. First, the conditioning factors were classified, and then GEP was used to determine the relations between the different classes. Finally, the equation generated by GEP from all of the conditioning factors was used to create the landslide susceptibility map.

Study area
Our study area, Siahkal, is located between latitudes 36°52'30" N and 37°00'00" N and longitudes 49°52'30" E and 50°00'00" E, with an area of around 70 km² in Gilan province in the north of Iran (Figure 1A and 1B). The region consists of highlands, with elevations ranging from nearly 300 m to about 2,500 m above sea level. A substantial part of the study area is covered by rainforest, followed by farmland as the second most extensive land cover. The climate in the region is cold and humid. Moreover, due to the high level of evaporation from the Caspian Sea, there is significant precipitation in the study area. High humidity in the area results in only slight temperature fluctuations between summer and winter. Given its humidity, frequent precipitation and mountainous terrain, the area has a critically high potential for landslides.

Conditioning factors
We chose the conditioning factors for determining landslide susceptibility on the basis of landslide types, data availability for our selected area and previous research [1; 2; 4-6; 22-28]. We used the aforementioned seven conditioning factors, which greatly influence the chances of landslides. In order to collect and prepare the data, a topographical map (1:25,000 scale) of the region was obtained from the National Cartographic Center (NCC) of Iran along with a geological map (1:100,000 scale) from the Geological Survey of Iran (GSI).
Landslide inventory maps were used to train and test the GEP-generated model (Figure 1C). The inventory maps (1:50,000 scale) were gathered from the Forests, Range and Watershed Management Organization (FRWO) of Iran. The inventory map used in this research was generated from field surveying, aerial photographs and satellite images. The seven conditioning factors were classified using ArcGIS. The elevation data was categorized into ten classes (Figure 2A). The aspect data was classified into eight directional classes plus one directionless class, namely northward, north-eastward, eastward, south-eastward, southward, south-westward, westward, north-westward and flat (Figure 2B). The slope data was divided into six classes with equal intervals (Figure 2C). Euclidean distance was used to calculate the proximity to rivers and faults (Figure 2D and 2E, respectively). Distance from rivers was classified into five classes with equal intervals, and distance from faults was categorized into four classes (0-100, 100-200, 200-500 and >500). The lithology data has 13 types of rocks which are present in the study area (Figure 2F). The land use data has three classes, namely garden, cultivation and forest (Figure 2G).
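The classification step described above was carried out in ArcGIS. As a rough illustration only, the following Python sketch shows how the three kinds of classes mentioned (fixed distance bands, equal intervals and aspect sectors) could be computed. The function names, the handling of boundary values, the 0-60 slope range in the usage note and the use of -1 as the value of flat cells are assumptions of this sketch, not details given in the study.

```python
from bisect import bisect_left

# Upper edges of the first three fault-distance classes, taken from the text
# (0-100, 100-200, 200-500, >500). Assigning a boundary value to the lower
# class is an assumption of this sketch.
FAULT_EDGES = [100.0, 200.0, 500.0]

ASPECT_NAMES = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def classify_by_edges(value, edges):
    """Return a 1-based class index for value, given ascending class edges."""
    return bisect_left(edges, value) + 1


def equal_interval_edges(vmin, vmax, n_classes):
    """Interior edges that split [vmin, vmax] into n_classes equal intervals."""
    step = (vmax - vmin) / n_classes
    return [vmin + step * i for i in range(1, n_classes)]


def classify_aspect(aspect_deg, flat_value=-1.0):
    """Map an aspect in degrees (clockwise from north) to one of nine classes.

    flat_value marks cells without a downslope direction (assumed to be -1).
    """
    if aspect_deg == flat_value:
        return "flat"
    # Each sector is 45 degrees wide and centred on its compass direction.
    sector = int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8
    return ASPECT_NAMES[sector]
```

For example, with a hypothetical slope range of 0 to 60, `equal_interval_edges(0.0, 60.0, 6)` gives the five interior edges of six equal classes, and `classify_by_edges(25.0, ...)` then places a slope of 25 in the third class.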

Gene Expression Programming (GEP)
GEP is an Evolutionary Algorithm (EA) that mimics biological evolution, in which the fittest individuals survive [7; 29]. The algorithm was first introduced in 2001 [30]. The package used for landslide modelling in this research was GeneXproTools. The GEP algorithm begins by creating the chromosomes of the initial population [30]. A set of functions was defined as the characters used to encode the chromosomes (Table 1). Decision boundaries were then defined to evaluate the fitness of the individuals [7]. Model fitness was evaluated by the area under the Receiver Operating Characteristic (ROC) curve. As long as the termination criteria are not satisfied, individuals are selected and several genetic operations, such as replication, mutation and recombination, are applied to produce the next generation. In recombination, two selected parents exchange genetic elements, and in mutation, certain gene elements are modified randomly [7].
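To make the chromosome encoding and the genetic operations more concrete, the following is a minimal Python sketch of a single GEP gene with a head and a tail, random mutation, an exchange of genetic elements between two parents at one cut point, and breadth-first decoding of the gene into an expression. It follows the general GEP scheme and is not the GeneXproTools implementation; the function set, terminal names, head length and the protected square root are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical function set (symbol -> number of arguments); "Q" is square root.
FUNCTIONS = {"+": 2, "*": 2, "Q": 1}
TERMINALS = ["a", "b"]

HEAD = 5
# A tail of length HEAD * (max_arity - 1) + 1 holds only terminals and
# guarantees that every gene decodes to a complete expression.
TAIL = HEAD * (max(FUNCTIONS.values()) - 1) + 1
GENE_LEN = HEAD + TAIL


def random_gene(rng):
    """Create a gene: functions or terminals in the head, terminals in the tail."""
    head = [rng.choice(list(FUNCTIONS) + TERMINALS) for _ in range(HEAD)]
    tail = [rng.choice(TERMINALS) for _ in range(TAIL)]
    return head + tail


def mutate(gene, rng, rate=0.1):
    """Randomly replace symbols while keeping the tail free of functions."""
    out = list(gene)
    for i in range(GENE_LEN):
        if rng.random() < rate:
            pool = (list(FUNCTIONS) + TERMINALS) if i < HEAD else TERMINALS
            out[i] = rng.choice(pool)
    return out


def one_point_recombination(parent1, parent2, rng):
    """Exchange everything after a random cut point between two parents."""
    cut = rng.randrange(1, GENE_LEN)
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]


def evaluate(gene, env):
    """Decode the gene breadth-first and evaluate it for the values in env."""
    children, next_free, i = [], 1, 0
    while i < next_free:  # symbols beyond next_free are not expressed
        arity = FUNCTIONS.get(gene[i], 0)
        children.append(list(range(next_free, next_free + arity)))
        next_free += arity
        i += 1
    values = [0.0] * next_free
    for i in range(next_free - 1, -1, -1):  # children have larger indices
        args = [values[j] for j in children[i]]
        if gene[i] == "+":
            values[i] = args[0] + args[1]
        elif gene[i] == "*":
            values[i] = args[0] * args[1]
        elif gene[i] == "Q":
            values[i] = math.sqrt(abs(args[0]))  # protected square root
        else:
            values[i] = env[gene[i]]
    return values[0]
```

As a usage example, the gene `list("Q*+ab" + "abaaaa")` decodes to sqrt((b + a) * a), so `evaluate(..., {"a": 2.0, "b": 6.0})` returns 4.0. Because the tail contains only terminals, mutated and recombined genes remain valid expressions.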

Results and Discussion
To model the landslide phenomenon, this study uses GEP because of several advantages: its chromosomes have simple, interconnected linear structures that are relatively small and convenient for genetic modification (e.g. replication, mutation, recombination and rearrangement), and they can form tree structures in a variety of sizes and shapes. The software used for this is GeneXproTools. For landslide susceptibility mapping, the seven conditioning factors, i.e. elevation, aspect, slope, proximity to rivers, proximity to faults, lithology and land use, were initially prepared in ArcGIS and then passed as input to GeneXproTools. The data were divided into a training set (70%) and a test set (30%) for training and testing the model, respectively. The area under the ROC curve (AUC) was used as the performance measure. The AUCs for the training and test data were 0.82 and 0.77, respectively (Figure 3). An AUC between 0.7 and 0.8 was considered acceptable, and an AUC above 0.8 was considered good classification performance [7]. By this criterion, the model performs well on the training data and acceptably on the test data.
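The evaluation procedure can be sketched in a few lines of Python. The sketch below splits samples 70/30 and computes the AUC as the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample, with ties counted as one half. The function names, the random seed and the label "below acceptable" are assumptions of this illustration; the study itself computed the AUC in GeneXproTools.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle the samples and split them into 70% training and 30% test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def auc(labels, scores):
    """Area under the ROC curve from binary labels (1 = landslide) and scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rate_auc(value):
    """Apply the thresholds quoted in the text: 0.7-0.8 acceptable, >0.8 good."""
    if value > 0.8:
        return "good"
    if value >= 0.7:
        return "acceptable"
    return "below acceptable"  # label assumed; the text defines only two bands
```

With these thresholds, `rate_auc(0.82)` gives "good" for the training data and `rate_auc(0.77)` gives "acceptable" for the test data.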
The best output model, considering an efficient program structure in this study, is provided in Equation (1):

$$\left(d_3 \tan^{-1}\!\left(d_1^{2}\,\tan^{-1}\!\left(\sqrt{c}\; d_0\right)\right)\right) + d_6\, d_4^{2} \left(\sqrt{d_4 d_3} + d_8^{3}\right) + \sqrt{(d_5 \cdots} \qquad (1)$$

where $d_0, d_1, \ldots, d_8$ represent aspect, distance to faults, distance to rivers, lithology, slope, elevation, land use (cultivation), land use (forest) and land use (gardens), respectively. The value of c in the equation was 2.003. Finally, a landslide susceptibility map of the study area was generated using the proposed model (Figure 4). The landslide susceptibility was classified from low to high by the natural break method in ArcGIS. The result indicates that the majority of the landslide-prone regions lie in the north and north-west of the study area. Similarly, the inventory map indicates that the largest past landslide occurred in the northern region. This agreement suggests that the model identifies landslide-prone areas consistently with the inventory.

Figure 4. Landslide susceptibility map produced by the GEP method, overlaid on the map of the study area using Google Earth.
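Natural breaks classification is commonly implemented as Jenks optimization, which chooses class boundaries that minimize the sum of squared deviations within each class. The following Python sketch is a small dynamic-programming version of that idea. It is an illustration under that assumption, not the ArcGIS implementation, and the susceptibility scores in the usage note are hypothetical.

```python
def natural_breaks(values, n_classes):
    """Return the upper bound of each class for a 1-D natural breaks split.

    The split minimizes the total within-class sum of squared deviations.
    Requires len(values) >= n_classes.
    """
    xs = sorted(values)
    n = len(xs)
    # Prefix sums give the squared-deviation cost of any slice in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared deviations from the mean for xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting the first j values into c classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i

    # Walk back through the stored cut points to recover the class bounds.
    bounds, j = [], n
    for c in range(n_classes, 0, -1):
        bounds.append(xs[j - 1])
        j = cut[c][j]
    return bounds[::-1]
```

For instance, `natural_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)` returns `[3, 12, 22]`, which separates the three visible clusters into low, medium and high classes.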

Conclusions
The adverse impacts of landslides encourage researchers to develop models for predicting this phenomenon in order to help decision-makers mitigate the effects of landslide hazards. Landslides are a complicated nonlinear phenomenon, and our results suggest that GEP is an appropriate algorithm for the problem because it can deal with complex nonlinear problems. Also, unlike common approaches such as artificial neural networks, which yield solutions that are difficult to interpret, GEP provides results in the form of a clear and readily understandable equation. GEP is also less likely to become trapped in local optima than other evolutionary counterparts such as GA and GP. Therefore, this approach is recommended for mapping potential landslide areas because of its accuracy, simplicity and ability to provide an explicit formula. The AUC values of 0.82 for the training data and 0.77 for the test data indicate good and acceptable performance, respectively, for the model generated for our study area. The landslide susceptibility map generated from the proposed model is also in agreement with the reference inventory map.