Correlation-based feature optimization and object-based approach for distinguishing shallow and deep-seated landslides using high resolution airborne laser scanning data

Landslides pose a serious threat to many regions worldwide, particularly densely vegetated areas where they are hard to identify. Precise inventory mapping methods are therefore required to assess landslide susceptibility, hazard, and risk. Developing such methods is hindered by three obstacles: the choice of optimization technique, the choice of feature selection method, and model transferability. The present study combines correlation-based feature selection (CFS) with an object-based approach applied to LiDAR data, using a LiDAR-derived digital elevation model (DEM) together with high-resolution orthophotos. A fuzzy-based segmentation parameter optimizer was used to optimize the segmentation parameters. A support vector machine (SVM) was then used to assess the effectiveness of the proposed method, and the results illustrate the robustness of the algorithm for landslide identification. The transferability results also demonstrated the method's ease of use, its accuracy, and its capability to identify landslides as either shallow or deep-seated. In summary, the study suggests that the developed method is effective for landslide detection, especially in tropical regions such as Malaysia.


Introduction
Landslide inventory maps are required for many applications, such as recording the magnitude of landslides in a region, serving as the initial step in landslide susceptibility, hazard, and risk analysis [1], and examining patterns of landslide distribution with regard to the landscape change caused by landslide occurrence [2]. However, producing landslide inventory maps for landscapes such as the tropics, which are covered by heavy vegetation, is not straightforward. Even with the most advanced landslide detection methods, the masking effect of vegetation poses difficulties, calling for a more rapid yet precise method. Studies have shown that in heavily vegetated areas, limited visibility of the underlying landscape hampers tectonic-geomorphic mapping [3]. Remote sensing is one of the advanced approaches to landslide detection in such areas, and LiDAR is one such data source [4]. LiDAR has emerged as an effective option because it penetrates dense vegetation and provides terrain information at high point density. Many studies have illustrated its capability to map landslides in densely vegetated areas [5], [6], [7].
Landslides may be categorized according to movement characteristics and volume into two types: shallow and deep-seated [8]. The two categories differ in size, volume, and the extent of the damage caused [9]. Studies have validated the use of LiDAR data for landslide identification [10], [11], showing that it can provide essential information on the geological features and topography of active landslides. The differences between landslide types must therefore be taken into account in order to properly investigate geomorphological change and to mitigate landslide hazard [12].
Remote sensing and geoscience applications frequently use image analysis techniques to investigate landslides. Gao and Mas (2008) reported the use of pixel-based and object-based image analysis techniques in various landslide studies. Object-based image analysis (OBIA) has been more widely used across varying scales than the pixel-based approach, since it can effectively form semantic features and additional geometry for classification [13]. Object-based methods using LiDAR data have been applied in very densely vegetated areas as an appropriate alternative to pixel-based methods because of the uneven terrain of these landforms [14]. Pixel-based methods [15], on the other hand, suffer from the salt-and-pepper effect, which degrades the visibility of landslides and hinders their identification [16]. Feature selection is vital for data mining in such applications [17]. High-dimensional datasets make training and testing in classification problems more complex. A few object-based landslide studies have applied feature selection to LiDAR data [18], [19]. One study [17] investigated the importance of feature selection by using correlation-based feature selection (CFS) in conjunction with gain ratio algorithms. Another study [20] used random forest (RF) for feature selection. More recently, ant colony optimization (ACO) has also been used for this purpose, with effective results [21]. This literature survey shows that feature selection methods have commonly been used in conjunction with object-based methods. Nevertheless, there is little literature on the combined use of CFS and OBIA with LiDAR data. The present study seeks to integrate CFS with OBIA to distinguish shallow from deep-seated landslides.
Airborne laser scanning data were used for the study, with the following objectives: (i) to optimize the multiresolution segmentation parameters, (ii) to apply CFS for feature selection from high-resolution airborne laser scanning data, and (iii) to use a support vector machine (SVM) to differentiate landslide types.

Methodology
A high-resolution DEM (0.5 m) was derived from the LiDAR point clouds and was in turn used to generate other LiDAR-derived products and landslide conditioning factors: aspect, slope, height (nDSM), intensity, and hillshade. These products and the orthophotos were then corrected for geometric distortion and integrated, which brought them into the same coordinate system and prepared them for feature extraction in a geographic information system (GIS). The fuzzy-based segmentation parameter (FbSP) optimizer [22] was then used to obtain the scale, shape, and compactness parameters. CFS was used to rank the features from most to least significant and to select the appropriate ones. SVM was then used to evaluate the performance of the proposed methodology. Lastly, the transferability of the model was tested at the test site, and the results were validated with a confusion matrix. Figure (

Study Area
The study area is Cameron Highlands, a tropical and densely vegetated region covering 26.7 km² (see Figure 2). The area was chosen because of its susceptibility to landslides. Cameron Highlands is located in the northern part of West Malaysia, between latitudes 4° 26' 3" and 4° 26' 18" and longitudes 101° 23' 48" and 101° 24' 4". The LiDAR data had root-mean-square errors within 0.15 m vertically and 0.3 m horizontally. Orthophotos were acquired by the same system that collected the LiDAR point data. Non-ground points were removed, and a DEM of 0.5 m spatial resolution was then interpolated from the remaining LiDAR points using inverse distance weighting, with GDM2000 / Peninsula RSO as the spatial reference. Next, the LiDAR-based DEM was used to produce derived layers that help identify landslide locations and features [23]. Slope is one of the most crucial factors in land stability and governs landslide phenomenology [24]; it is also the main factor behind landslide occurrence [25]. For landslide mapping, geometric and texture features are also highly pertinent to improving classification accuracy [19]. DEM accuracy is influenced by terrain morphology, sampling density, and the interpolation algorithm used [26]. Figure (3) illustrates the data used in the present study.
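To illustrate how a derived layer such as slope can be computed from a gridded DEM, the following is a minimal sketch using central differences on interior cells. It is not the GIS workflow used in the study: the function name `slope_degrees` and the toy grid are assumptions made for illustration, and only the 0.5 m cell size is taken from the text.

```python
import math

def slope_degrees(dem, cell_size=0.5):
    """Slope in degrees for the interior cells of a DEM given as a list of rows.

    Uses simple central differences; border cells are left as None.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per metre in the x (column) and y (row) directions.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out

# Hypothetical 3x3 DEM: elevation rises 0.5 m per 0.5 m cell towards the east,
# which corresponds to a uniform 45-degree slope.
toy_dem = [
    [0.0, 0.5, 1.0],
    [0.0, 0.5, 1.0],
    [0.0, 0.5, 1.0],
]
slope = slope_degrees(toy_dem)
print(round(slope[1][1], 2))  # prints 45.0
```

Production tools typically use a 3x3 weighted kernel rather than plain central differences, so values from GIS software may differ slightly on rough terrain.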

Image Segmentation
The selection of segmentation parameters depends on the environment under analysis, the application, and the input data [27]. Three parameters must be set for the multiresolution segmentation algorithm: scale, shape, and compactness. They are conventionally chosen by trial and error, which is laborious and time-consuming [1]. Numerous studies have investigated automatic and semi-automatic methods for identifying the best parameters [28], [29]. Two cutting-edge techniques for automatic segmentation parameter selection are the Taguchi optimization technique [1] and the supervised fuzzy logic approach [22].

Support Vector Machine (SVM)
Support vector machines (SVMs) are supervised learning classifiers commonly used in remote sensing studies [31], [32]. The technique applies a nonlinear transformation of the covariates into a high-dimensional feature space. SVM trained on small datasets has been found to be more accurate than maximum likelihood, decision tree, and even artificial neural network classification trained on larger datasets [25]. Another study showed that just a quarter of the training dataset was enough for high classification accuracy [30]. SVM has also been shown to be very accurate when training data are limited [11]. In the current study, SVM was implemented with the e1071 package [33] in the R statistical computing software (R Development Core Team) [34]. Because the performance of an SVM classifier depends on its hyperparameters, their selection was optimized and their sensitivity analyzed using a grid search with 5-fold cross-validation.

Feature Selection
Feature selection techniques fall into three groups: filter, wrapper, and embedded methods [35]. Filters need less computing time, particularly on larger datasets [35]. However, they are independent of the classification algorithm, so the subset they select may be suboptimal for it. The wrapper method is time-consuming and complicated because the features are evaluated by the classification algorithm itself [36]; the chosen features therefore depend heavily on the classifier used. In contrast to the wrapper method, embedded methods need less computing time and also address the issue of overfitting [37]. When a large number of features is involved, overfitting occurs because of irrelevant input features [19]. Smaller feature sets, by contrast, are effective in producing optimal classification results [38].

Optimizing the boundary of the types of landslide
The FbSP optimizer described above was used to find the optimal parameters for multiresolution segmentation, namely scale, shape, and compactness. Optimized parameters improve classification accuracy by delineating segment boundaries according to landslide type. Using optimized segmentation parameters also makes it possible to exploit spatial and textural aspects during feature selection. Because the subsequent steps depend on accurate segmentation, the optimal values of the segmentation parameters were selected using sufficient training samples, comprising both landslide and non-landslide objects. Table 1 lists the chosen values for scale, shape, and compactness, and Figure 4 depicts the segmentation process.

Effects of SVM Parameters
The effectiveness of an SVM classifier relies heavily on its hyperparameters, so their sensitivity must be inspected in order to select the best values. Three SVM parameters were evaluated: the penalty parameter, the kernel function, and the gamma parameter. Table 2 lists these three parameters together with their search space. The best parameters within this search space were identified using a grid search with 5-fold cross-validation.
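The grid search with 5-fold cross-validation described above can be sketched as follows. This is an illustrative stand-in written in standard-library Python, not the e1071/R code used in the study: the solver is a kernelized Pegasos sub-gradient method swapped in to keep the example self-contained, the kernel is fixed to RBF, and the grid values, toy data, and all function names are assumptions rather than the values in Table 2.

```python
import itertools
import math

def rbf(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def train_svm(X, y, C, gamma, epochs=20):
    """Kernelized Pegasos solver for a soft-margin SVM; labels must be +1/-1."""
    n = len(X)
    lam = 1.0 / (C * n)      # regularization strength implied by the penalty C
    alpha = [0] * n          # how often each sample violated the margin
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    t = 0
    for _ in range(epochs):
        for i in range(n):   # deterministic pass instead of random sampling
            t += 1
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n)) / (lam * t)
            if y[i] * score < 1:
                alpha[i] += 1

    def predict(x):
        s = sum(alpha[j] * y[j] * rbf(X[j], x, gamma) for j in range(n))
        return 1 if s >= 0 else -1
    return predict

def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(f, n, k)) for f in range(k)]

def cv_accuracy(X, y, C, gamma, k=5):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = train_svm([X[i] for i in train_idx],
                            [y[i] for i in train_idx], C, gamma)
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def grid_search(X, y, C_values, gamma_values, k=5):
    """Evaluate every (C, gamma) pair and return the best pair and all scores."""
    scores = {(C, g): cv_accuracy(X, y, C, g, k)
              for C, g in itertools.product(C_values, gamma_values)}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical two-feature objects: class +1 near y = 0, class -1 near y = 3.
X = [(0.1 * i, 0.0) if i % 2 == 0 else (3.0 + 0.1 * i, 3.0) for i in range(20)]
y = [1 if i % 2 == 0 else -1 for i in range(20)]

best, scores = grid_search(X, y, C_values=[1, 10], gamma_values=[0.1, 1.0])
print("best (C, gamma):", best, "CV accuracy:", scores[best])
```

Inspecting the full `scores` dictionary, rather than only the best pair, is what allows the sensitivity of the classifier to each parameter to be judged.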

Selection of relevant features based on the CFS method
The current study investigates feature selection algorithms that support the choice of the best features for identifying shallow and deep-seated landslides. The CFS method was used for this purpose. A total of 86 features were considered in the landslide differentiation process. For the LiDAR-derived layers (DTM, slope, height, DSM, and intensity), the mean and standard deviation (StdDev) were considered. For the orthophoto, the red, green, and blue bands, Max. diff, and brightness were considered. The texture features were the gray-level co-occurrence matrix (GLCM) correlation, dissimilarity, angular second moment, mean, StdDev, entropy, contrast, and homogeneity, and the grey-level difference vector (GLDV) angular second moment, mean, entropy, and contrast. The geometry features included shape, length and width, density, and region.
The highest accuracy was obtained with a subset of 9 features, as shown in Table 3. The CFS results show that the best subset improved the differentiation between shallow and deep-seated landslides in the chosen study area. When the full set of features was used to train the SVM classifier, misclassification occurred between the two landslide types, and landslides were also confused with other landscape objects such as artificial surfaces and cut slopes (Table 4). The SVM classifier trained on the best features, on the other hand, gave much more accurate qualitative results and was able to distinguish effectively between the two landslide types. The quantitative results were 86.36% for shallow landslides and 87.78% for deep-seated landslides, as shown in Table 4. These findings show that greater accuracy is achieved when CFS is used for feature selection. This may be because the feature values of shallow and deep-seated landslides differ, which makes the two types much simpler to distinguish. In particular, traits of shallow landslides such as size, run-out, and depth were found to differ from those of deep-seated landslides, which helped to separate the two types clearly, as Figure 5 shows. The SVM results also demonstrated the capability of the CFS algorithm and the OBIA optimization technique, used together with LiDAR data, texture features, geometric features, and orthophotos, to improve landslide detection. The entire process is illustrated in Figure 6.
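The idea behind CFS can be sketched in a few lines: a subset of k features is scored by the merit k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, so that subsets of relevant but mutually uncorrelated features score highest. The sketch below assumes absolute Pearson correlation as the correlation measure and a greedy forward search; the feature names and values are hypothetical and are not the study's 86 object features.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def merit(subset, features, target):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(features, target):
    """Greedy forward search: keep adding the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        score, name = max((merit(selected + [f], features, target), f)
                          for f in remaining)
        if score <= best:
            break            # no candidate improves the subset any further
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best

# Hypothetical object features for four image objects (0 = shallow, 1 = deep-seated).
target = [0, 0, 1, 1]
features = {
    "slope_mean": [1, 2, 3, 4],      # strongly related to the class
    "height_mean": [5, 1, 7, 3],     # weaker, but uncorrelated with slope_mean
    "intensity_mean": [1, 3, 4, 4],  # related to the class, overlaps with slope_mean
    "glcm_noise": [0, 2, 2, 0],      # unrelated to the class
}
selected, best = cfs_forward(features, target)
print(selected, round(best, 3))
```

In this toy case the feature that carries no class information is left out, while a weaker feature is kept because it is uncorrelated with the ones already selected.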

Evaluation of CFS based feature selection
The transferability of the developed model was assessed at a second site (the test site), for which the segmentation parameters were optimized. Using the full feature set at the test site produced a lower-quality result in the qualitative assessment, with overall accuracies of 64.43% for shallow and 65.38% for deep-seated landslides (Table 4). Shallow and deep-seated landslides were misclassified as other landscape objects, such as cut slopes, bare soil, and artificial surfaces. With the optimal features selected, by contrast, the overall accuracy of the SVM classifier was 85.32% for shallow and 85.75% for deep-seated landslides (Table 4). The present study also showed that optimal segmentation scales support the exploitation of feature selection, which makes it less complex to obtain a transferable classifier. The SVM accuracy at the test site was somewhat lower than in the study area but remains acceptable for the application. The decrease can be attributed to several complicating factors: landslide type (shallow or deep-seated), mixtures of landslides, shape, area, time elapsed since landslide formation, complex terrain, and so on. The transferability results show the significance of features from high-resolution LiDAR data, textures, orthophotos, and geometry for landslide classification (Figure 7). It may therefore be concluded that selecting the most important features reduces the dimensionality of the object features while also improving classification accuracy. These results are in line with the study by [39]. In summary, the SVM algorithm was found to be sensitive to the feature selection process.
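Accuracy figures of this kind are read off a confusion matrix. The following minimal sketch shows how overall, producer's, and user's accuracy are derived; the counts are invented for illustration and are not the values behind Table 4, and the function name `accuracy_report` is an assumption.

```python
def accuracy_report(matrix, labels):
    """Accuracy measures from a confusion matrix.

    matrix[i][j] is the number of reference-class-i objects predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {"overall": correct / total}
    for i, name in enumerate(labels):
        row_total = sum(matrix[i])                   # reference objects of this class
        col_total = sum(row[i] for row in matrix)    # objects predicted as this class
        report[name] = {
            "producer": matrix[i][i] / row_total,    # 1 - omission error
            "user": matrix[i][i] / col_total,        # 1 - commission error
        }
    return report

# Hypothetical counts (rows: reference class, columns: predicted class).
labels = ["shallow", "deep-seated", "non-landslide"]
toy_matrix = [
    [40, 5, 5],
    [4, 42, 4],
    [6, 3, 91],
]
report = accuracy_report(toy_matrix, labels)
print(report["overall"])
print(report["shallow"])
```

Reporting producer's and user's accuracy per class, alongside the overall figure, shows whether errors come from missed landslides or from other objects such as cut slopes being labelled as landslides.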

Field investigation
A field investigation was undertaken to validate the proposed method. Landslide types were determined in the field with the aid of a GeoExplorer 6000 handheld GPS device (Figure 8). Further information on landslide extent, source area, volume, and deposition was also collected during the field investigation. The field measurements made it possible to assess the precision and reliability of the mapped landslide inventory.

Conclusion
The present study focused on improving the precision of landslide mapping by optimizing the parameters used in multiresolution segmentation. The chosen parameters were found to greatly improve the classification of the two landslide types, shallow and deep-seated. Choosing appropriate features was shown to improve classification accuracy while also making more efficient use of computational resources. Model transferability was also improved. The findings demonstrate the importance of integrated models, in which the following factors were combined to improve landslide classification: high-resolution LiDAR data, geometric features, texture features, parameter definition for the SVM classifier, and orthophotos. In addition, the transferability findings showed that combining CFS with the object-based approach led to effective results, improving the efficiency and reducing the cost of landslide inventory mapping. In summary, the developed method may be used to enhance landslide detection and classification by producing robust inventory maps that can ultimately support disaster management applications.