The probability of coseismic landslides under different peak ground accelerations: a case study of the 1927 Gulang M8.0 event

A catastrophic Ms 8.0 earthquake struck Gulang County, Gansu Province, on May 23, 1927, and triggered numerous coseismic landslides. Based on high-resolution remote sensing images from the Google Earth platform, we delineated 936 coseismic landslides with a total area of 58.6 km². Ten factors were then selected as impact factors for mapping the earthquake-triggered landslide hazard. Following Bayesian probability theory, we propose a sampling method in which the proportion of sliding samples matches the ratio of the landslide area to the study area; a total of 247,080 samples are selected. We establish a multi-factor impact model for landslide probability using the logistic regression (LR) model, calculate the regression coefficients, and obtain the occurrence probability of coseismic landslides. The model is then applied to calculate the probability of coseismic landslides under different peak ground accelerations (PGAs). The study area is divided into extremely high-risk, high-risk, medium-risk, low-risk, and extremely low-risk areas. The results provide a scientific reference for mitigating earthquake-related landslides in earthquake-prone areas.


Introduction
As the economy and society develop, the population is becoming increasingly concentrated, major and lifeline engineering projects are becoming more densely built, and sensitivity and vulnerability to earthquake disasters are growing. Thus, the risk of earthquake disasters cannot be ignored. On May 23, 1927, an earthquake of magnitude 8 occurred near Gulang County, Gansu Province. The seismic intensity reached XI, and the macroscopic focal depth of the earthquake was 12 km [1]. The earthquake killed or injured about 40,000 people and caused numerous landslides, collapses, and other geological disasters. Some of these landslides resulted in significant casualties and property losses. The Dengshanzhuang landslide, for example, buried an entire village and killed 104 people [2].
Strong earthquakes occur frequently on the northeastern margin of the Tibetan Plateau and are accompanied by severe coseismic landslide disasters. Historically, the coseismic landslides of the 1718 Tongwei M7.5 earthquake resulted in a considerable number of deaths [3]. Nearly half of the more than 200,000 casualties in the 1920 Haiyuan M8.5 earthquake were caused directly by coseismic loess landslides [4]. Further, two remote loess landslides triggered by the Minxian Ms 6.7 earthquake in 2013 resulted in at least 12 deaths [5]. The Gulang earthquake was also one of the most destructive earthquakes on the northeastern margin of the Tibetan Plateau. In addition, the Qilian-Haiyuan fault, which runs between the Haiyuan and Gulang earthquake source areas over a length of 280 km, has no historical record of strong earthquakes in the past 800 years. This area is called the Tianzhu seismic gap and is considered the most vulnerable to future severe earthquakes [6]. This study applies modern remote sensing and geographic information system technology to the risk assessment of earthquake-triggered landslides. The findings have practical application value and can provide a scientific reference for future disaster prevention and mitigation in this area, as well as theoretical support for major projects.

Study area
This study focuses on the high-intensity region (VIII-XI) of the 1927 Gulang earthquake, which has a total area of 12354.1 km² and is located in Wuwei City, Gansu Province. Wuwei is a prefecture-level city in the central and western parts of Gansu Province, in the eastern section of the Hexi Corridor. The elevation in the study area ranges from 1420 to 4586 m, and the terrain is generally high in the south and west and low in the north and east. The study area contains 8 active faults with strong tectonic activity and a series of NNE-trending drainage systems (Fig. 1).

Seismic landslide data
Using high-resolution satellite remote sensing images from the Google Earth platform, this study maps landslides in the high-intensity area of the Gulang earthquake by visual interpretation. A total of 1516 landslides are interpreted, including about 600 glacial freeze-thaw landslides for which it is uncertain whether they were triggered by the 1927 Gulang earthquake. This study therefore excludes such landslides, and 936 landslides are finally retained (Fig. 2). These landslides are mainly small and medium-sized. The total landslide area is 58.55 km², the landslide point density is 0.075 km⁻², and the landslide area accounts for 0.47% of the study area.

A sample selection method based on Bayesian thinking
Under Bayesian probability theory, the coseismic landslide probability of a single seismic event is the ratio of the total area of all landslides to the area of the study region; this ratio is the prior probability of coseismic landslide occurrence for that event. Thus, the probability of the occurrence of a coseismic landslide in a region (Pcols) can be defined as the total area of the coseismic landslides in the region divided by the total area of the region.
Therefore, the selection of training samples should ensure that the proportion of sliding and nonsliding samples is consistent with the proportion of landslide and nonlandslide area in the actual region. Most prior investigations using this model took equal numbers of sliding and nonsliding samples. That sampling method artificially exaggerates the proportion of landslide samples in the study area, so the evaluation results reflect only the relative risk rather than the true probability of landslide occurrence, and the probability generated by the model overestimates the actual probability of landslide occurrence [7]. Thus, we require the probability of generating a sliding sample to be proportional to the landslide area.
Sample points are randomly generated in the study area; the points falling into a landslide area are sliding samples, whereas the rest are nonsliding samples. This ensures that the proportion of sliding samples in the study area is equal to the probability of a coseismic landslide. After experimenting with various sampling intensities, a density of 20 points km⁻² was chosen with a minimum distance of 100 m between sample points, giving a total of 247,080 samples in the study area. The total landslide area was 58.55 km², the study area was 12354.1 km², and the probability of a landslide associated with this earthquake (Pcols) was about 0.47%. In the entire training sample database, there were 1163 sliding samples and 245,917 nonsliding samples, so the proportion of sliding samples was also 0.47%, equal to the landslide probability (Pcols). This indicates that a sampling intensity of 20 points km⁻² satisfies the requirements.
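The sampling idea above can be illustrated with a minimal sketch in Python. It first recomputes Pcols from the areas and sample counts reported in the text, and then draws random points in a hypothetical 100 km x 100 km toy region containing two rectangular "landslides" (both the region and the rectangles are assumptions made for illustration, not data from this study). The 100 m minimum spacing between points is omitted for brevity.

```python
import random

# Values reported in the text.
STUDY_AREA_KM2 = 12354.1
LANDSLIDE_AREA_KM2 = 58.55
P_COLS = LANDSLIDE_AREA_KM2 / STUDY_AREA_KM2   # prior probability of sliding
REPORTED_SHARE = 1163 / 247080                 # sliding samples / all samples


def random_points(width_km, height_km, density_per_km2, rng):
    """Draw uniformly distributed points at the given density (points per km2)."""
    n = round(width_km * height_km * density_per_km2)
    return [(rng.uniform(0.0, width_km), rng.uniform(0.0, height_km))
            for _ in range(n)]


def is_sliding(point, landslides):
    """Return True if the point falls inside any landslide rectangle."""
    x, y = point
    return any(x0 <= x <= x1 and y0 <= y <= y1
               for x0, y0, x1, y1 in landslides)


# Hypothetical toy region: two rectangles (x0, y0, x1, y1) in km covering
# 20 + 27 = 47 km2, i.e. 0.47 % of the 10,000 km2 region.
TOY_LANDSLIDES = [(10.0, 10.0, 14.0, 15.0), (50.0, 60.0, 53.0, 69.0)]
TOY_RATIO = 47.0 / (100.0 * 100.0)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
points = random_points(100.0, 100.0, 20, rng)
labels = [1 if is_sliding(p, TOY_LANDSLIDES) else 0 for p in points]
sample_share = sum(labels) / len(labels)

print(f"Pcols from reported areas  : {P_COLS:.4%}")
print(f"share in reported samples  : {REPORTED_SHARE:.4%}")
print(f"toy area ratio / toy share : {TOY_RATIO:.4%} / {sample_share:.4%}")
```

Because the points are uniform over the region, the share of sliding samples converges to the area ratio as the density grows, which is the property the sampling design relies on.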

Logistic regression model
The logistic regression (LR) model, one of the most widely used machine learning models, is the mathematical model used in this study. To predict the probability of an event occurring in a specified area, LR analysis forms a multiple regression relationship between a dependent variable (whether an event occurs) and multiple independent variables (impact factors). The dependent variable in this study represents the occurrence of coseismic landslides, with values of 1 and 0 representing occurrence and absence, respectively. The independent variables are the impact factors. The relationship between the probability of landslide occurrence and the impact factors can be expressed as follows:
$$P = \frac{1}{1 + e^{-Z}} \tag{1}$$
$$Z = B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_n X_n \tag{2}$$
where P is the predicted probability of landslide occurrence, ranging from 0 to 1; Z is the linearly weighted sum of the superposed factors; B0 is the constant term; Bi is the regression coefficient of each factor; and Xi represents each impact factor.
After comprehensive consideration of the factors influencing the Gulang earthquake landslides, ten factors are included in the model: elevation, slope, slope aspect, topographic position index (TPI), distance from rivers, stratigraphic age, distance from the epicenter, seismic intensity, distance from faults, and peak ground acceleration.
The topographic data used in the impact factor analysis come from the ASTER GDEM V1 digital elevation model (DEM), with a spatial resolution of one arc second, obtained from the Geospatial Data Cloud website (http://www.gscloud.cn). We further process it into 20 m resolution DEM data with ArcGIS software to increase the accuracy. The DEM data are used to derive topographic factors such as TPI, slope, and slope aspect, and to extract the river network. The strata age data come from the  Table 1.

LR coefficient calculation
The 247,080 training samples were combined with the impact factors to obtain the grade of each impact factor at each sample. The training samples were then imported into SPSS software for LR analysis, yielding the regression coefficient of each grade under each impact factor. The regression coefficient indicates the degree of hazard, or importance, associated with that grade of the impact factor. Finally, the regression coefficients of each grade under all ten impact factors (Fig. 3) were obtained, together with the constant value B0 = −24.663, with the regression coefficient of the last grade of each factor set to 0.
Figure 3. Regression coefficients of each impact factor.
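Setting the coefficient of the last grade of each factor to 0 is consistent with indicator (dummy) coding that uses the last grade as the reference category. The following Python sketch illustrates that coding under this assumption; the four-grade factor and its coefficient values are hypothetical, and only the constant B0 is taken from the text.

```python
import math


def indicator_code(grade_index, n_grades):
    """Indicator (dummy) coding of one factor with the last grade as reference.

    Returns n_grades - 1 zero/one values. The reference grade maps to all
    zeros, which is equivalent to fixing its regression coefficient at 0.
    """
    if not 0 <= grade_index < n_grades:
        raise ValueError("grade_index out of range")
    return [1 if k == grade_index else 0 for k in range(n_grades - 1)]


def factor_term(grade_index, fitted_coefficients):
    """Contribution of one factor to Z, given the non-reference coefficients."""
    dummies = indicator_code(grade_index, len(fitted_coefficients) + 1)
    return sum(b * x for b, x in zip(fitted_coefficients, dummies))


# Hypothetical coefficients of a four-grade factor (three fitted grades,
# the fourth grade is the reference).
hypothetical_coefficients = [-1.8, -0.6, 0.4]

print(indicator_code(0, 4), factor_term(0, hypothetical_coefficients))
print(indicator_code(3, 4), factor_term(3, hypothetical_coefficients))

# Constant reported in the text. If every factor sits at its reference
# grade, Z reduces to B0 and the probability is extremely small.
B0 = -24.663
baseline_probability = 1.0 / (1.0 + math.exp(-B0))
print(baseline_probability)
```

Under this coding, each fitted coefficient measures the change in Z relative to the reference grade of the same factor, so coefficients are comparable within a factor but not directly across factors.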

Seismic landslide hazard assessment map
A weight value was assigned to each grade under each impact factor according to the regression coefficients of the LR model obtained from SPSS software. Based on Formulas 1 and 2, the impact factor layers were overlaid on the ArcGIS platform, and the distribution map of the landslide risk index in the area of the Gulang earthquake was obtained (Fig. 4). The landslide hazard index was graded in intervals of 0.2, dividing the study area into the extremely low-risk area (0-0.2), low-risk area (0.2-0.4), medium-risk area (0.4-0.6), high-risk area (0.6-0.8), and extremely high-risk area (0.8-1). Figure 5 shows the landslide risk assessment map of the Gulang earthquake area. The extremely high-risk areas lie near Dengshanzhuang and along the 25-km-long section between Xishanpu and Huangyangchuan, which are also the areas where the mapped landslides are concentrated. This indicates that the landslide-intensive area of the Gulang earthquake remains a high-risk area for large-scale landslides and collapses in the future.
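The five-grade division of the hazard index can be written as a small Python helper. This is only a sketch: the text does not state which class a boundary value such as 0.2 belongs to, so assigning boundaries to the upper class is an assumption made here.

```python
import bisect

# Class boundaries and labels taken from the text.
BREAKS = [0.2, 0.4, 0.6, 0.8]
LABELS = [
    "extremely low-risk",
    "low-risk",
    "medium-risk",
    "high-risk",
    "extremely high-risk",
]


def risk_class(hazard_index):
    """Map a hazard index in [0, 1] to one of the five risk classes.

    Assumption: a value exactly on a boundary goes to the upper class.
    """
    if not 0.0 <= hazard_index <= 1.0:
        raise ValueError("hazard index must lie in [0, 1]")
    return LABELS[bisect.bisect_right(BREAKS, hazard_index)]


for value in (0.05, 0.2, 0.5, 0.79, 1.0):
    print(value, risk_class(value))
```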

Seismic landslide risk at a given PGA value
To better prepare for earthquakes of different severity, the landslide occurrence probability under different PGA values in the study area was predicted by setting the PGA as the trigger factor and substituting it into the existing model. Figure 6 shows the landslide risk assessment map of the study area when the PGA is at five levels: 0.1, 0.2, 0.5, 0.8, and 1.0 g. As before, the study area was divided into five levels based on the landslide risk index: extremely low-risk, low-risk, medium-risk, high-risk, and extremely high-risk areas.
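The scenario calculation can be sketched as follows, assuming that a scenario replaces the PGA term of every cell with the coefficient of the chosen PGA level while the contributions of the other factors stay fixed. The constant, the combined contribution of the other factors, and the PGA coefficients in this Python sketch are hypothetical values chosen for illustration, not the fitted values of this study.

```python
import math


def logistic(z):
    """Numerically stable logistic function, P = 1 / (1 + exp(-Z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# PGA levels considered in the text.
PGA_LEVELS = ["0.1g", "0.2g", "0.5g", "0.8g", "1.0g"]

# Hypothetical regression coefficients of the PGA grades (illustration only).
HYPOTHETICAL_PGA_COEFFICIENTS = {
    "0.1g": -3.0, "0.2g": -2.0, "0.5g": -1.0, "0.8g": -0.4, "1.0g": 0.0,
}


def scenario_probabilities(b0, other_terms, pga_coefficients):
    """Probability at one cell for each PGA scenario.

    b0          -- constant term of the LR model
    other_terms -- summed contribution of the non-PGA factors at this cell
    """
    return {level: logistic(b0 + other_terms + pga_coefficients[level])
            for level in PGA_LEVELS}


# One hypothetical cell: constant -2.0, other factors contribute +1.5.
cell = scenario_probabilities(-2.0, 1.5, HYPOTHETICAL_PGA_COEFFICIENTS)
for level in PGA_LEVELS:
    print(level, round(cell[level], 4))
```

Repeating this calculation for every cell of the study area and grading the result into the five risk levels gives one map per PGA scenario.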

Conclusions
Based on Bayesian probability theory and real landslide data from the Gulang earthquake, we selected 247,080 sample points in the study area and ten impact factors for LR analysis. We conducted probability modeling of coseismic landslide occurrence and obtained a landslide risk assessment map of the Gulang earthquake area. Finally, the occurrence probabilities of coseismic landslides in this area under different peak ground accelerations (0.1, 0.2, 0.5, 0.8, and 1.0 g) were obtained. The findings can help prevent and mitigate landslide disasters and provide a technical and methodological reference for future studies in other areas.