Suitability model using support vector machine for land use planning scenarios in Java Island, Indonesia

Java Island has experienced rapid population growth over the past four decades. As the population grows, human needs also increase, driven further by economic and social growth. However, the land resources available to meet these needs are becoming increasingly limited. Meeting human needs changes ecosystems by creating ecological pressures, such as land cover change, resource extraction and depletion, and pollutant emissions. Sustainable land use planning is therefore needed to meet future human needs, and such planning must consider the suitability of the land. In this paper, a suitability evaluation method was developed that jointly considers topographic, meteorological, ecoregion, and water supply conditions. The suitability evaluation model uses a Support Vector Machine (SVM) to classify land into various designations. The SVM training algorithm finds a hyperplane that separates the dataset into a predefined number of discrete classes, consistent with the training examples. The optimal hyperplane is the decision boundary, obtained in the training step, that maximizes the margin between classes while minimizing misclassifications. Model calibration and validation were performed using the 2016 land-use status. The validated model was then used to simulate the planning scenarios. The model is expected to be applied to both short-term and long-term land planning scenarios. This study provides a synthetic suitability evaluation method for creating land-use planning scenarios, overcoming the lack of objectivity in the traditional way of assigning land-use scenarios.


Introduction
The land can produce goods and services (supply) to meet human needs (demand). These needs include food, water, air, homeostasis, rest, and excretion, the most basic physiological needs in Maslow's Hierarchy of Needs [1]-[4]. Land products include food, fiber, and environmental services such as water supply and airflow management. The ability of the land to provide these products depends on its ecological quality [5], [6]. Environmental quality is affected by environmental stress (environmental distress syndrome) caused by changes to ecosystems made to meet human needs, such as land cover change, resource extraction and depletion (e.g., deforestation and overfishing), waste disposal and pollutant emissions, and the modification and movement of organisms [7], [8]. The resulting environmental impacts include, but are not limited to, climate change, land degradation, biodiversity loss, and environmental pollution [9]-[11]. These problems have reduced land productivity and have become a global issue in recent decades [4], [12]. Therefore, sustainable planning is needed that takes into account not only social and economic needs but also environmental sustainability.
Land suitability is an essential component in development activities to meet human needs. Meeting human needs through development activities requires land allocation. Land-use allocation is the determination of the amount of land for a particular use (or not used) through legal and administrative steps, which will ultimately lead to the implementation of planning [13]. Thus, land use allocation determines economic development performance and environmental quality [14]. Therefore, the allocation of land becomes a critical stage in development planning.
Land use allocation can be determined using the concepts of spatial pattern recognition and land suitability. The concept of land suitability aims at optimal and sustainable land use [15]. This is achieved by providing information on the relationship between land characteristics and land quality. Land characteristics are land attributes that can be measured or estimated, such as slope, rainfall, soil texture, and vegetation [15], [16]. Several land suitability (land evaluation) assessments have been built on the classification frameworks of the United States Department of Agriculture (USDA) (1961) and the Food and Agriculture Organization (FAO) (1976) [15], [17], [18]. The models developed are generally spatial (location-based) and follow either deterministic or probabilistic approaches. This study adopts the land characteristics used in these assessment systems to determine where land uses should be allocated.

Case Study (Java Island, Indonesia)
The aim of this study is to analyze land-use suitability using a Support Vector Machine (SVM), with Java Island as a case study. The land and sea areas of Java Island provide a variety of biotic and abiotic natural resources of high economic and ecological value. In 2018, Java's population growth rate reached 1.23%, its population density was 1,317 people/km², and 56.7% of the total population lived in urban areas [19]. The population of Java Island is almost equal to the combined population of Kalimantan, Sulawesi, Maluku, Papua, and NTT (East Nusa Tenggara). Indonesia's energy consumption is concentrated in Java, which accounts for more than 60% of national consumption because 57% of consumers are located there [20]. Energy consumption per capita reached 883.91 kg of oil equivalent in 2014 and increased by 5.9% in 2016 [21]. Indonesia's CO2 emissions rose from 0.41 metric tons per capita in 1975 to 1.8 metric tons per capita in 2014 [22], [23]. Energy consumption and CO2 emissions have been trending upward in recent years. Empirical evidence shows that energy consumption increases carbon emissions and that economic growth is a significant contributor [24].

Figure 1. Case Study - Java Island, Indonesia

Suitability Modelling using SVM
Ecological and environmental changes can be assessed through land cover and land use, which serve as the underlying basis [25]. Land cover portrays the physical properties of the heterogeneous Earth surface, for instance, water bodies, wetlands, and vegetation. These physical conditions shape how land is used for social and economic activities, described as land use, for example, industrial and agricultural areas [17], [26], [27]. Proper land use planning and management can enable development with minimal ecosystem damage [28].
To optimize and ensure the sustainability of land use, this research employs the concept of land suitability, which provides information on the relationship between the characteristics and the quality of the land. Land suitability indicates how well a particular class of land fits a specific use [28]. It can be assessed for present conditions (actual suitability) or after improvement (potential suitability), by matching the land characteristics against the requirements of each use according to defined standard criteria. The results help planners identify the optimal land use [29].
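The matching step described above can be sketched as a simple limiting-factor rule. This is a conventional criteria-matching illustration, not the SVM model used in this study: the FAO-style classes S1 (highly suitable), S2 (moderately suitable), S3 (marginally suitable), and N (not suitable) are assumed, and the characteristic names (`slope_pct`, `elevation_m`) and all threshold values in `CRITERIA` are hypothetical. The overall class of a land unit is taken as its most limiting (worst-rated) characteristic.

```python
# Hypothetical criteria-matching sketch; not the SVM model used in this study.
ORDER = ["S1", "S2", "S3", "N"]  # best to worst suitability class

# Illustrative thresholds for ONE land use (assumed values, not from the study).
# Each tuple gives the upper bound of S1, S2 and S3; larger values rate N.
CRITERIA = {
    "slope_pct": (8.0, 15.0, 25.0),
    "elevation_m": (500.0, 1000.0, 1500.0),
}


def rate(value, bounds):
    """Return the suitability class of a single land characteristic."""
    for cls, upper in zip(ORDER, bounds):  # pairs S1, S2, S3 with their bounds
        if value <= upper:
            return cls
    return "N"


def suitability(land_unit, criteria=CRITERIA):
    """Overall class = the most limiting (worst) class across characteristics."""
    ratings = [rate(land_unit[name], bounds) for name, bounds in criteria.items()]
    return max(ratings, key=ORDER.index)


print(suitability({"slope_pct": 5.0, "elevation_m": 1200.0}))  # S3
```

Taking the worst rating reflects the idea that a single severe limitation can make land unsuitable regardless of its other qualities; probabilistic schemes such as the SVM approach in this study relax that assumption.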
Land characteristics are commonly used as parameters in land suitability analysis. In this study, besides land use, the parameters for the suitability analysis are elevation, slope, water provisioning, precipitation, temperature, and ecoregion. Socio-economic parameters, such as population density and distance to the main road, can also be used; they are needed when searching for suitable locations for socio-economic activities such as industry, trade, and settlements. The land use/land cover data, the main parameter in this research, were originally classified into 22 classes. They were reclassified into 8 classes based on the standard land cover classes for the interpretation of medium-resolution optical imagery: forest, wetland agriculture, dryland agriculture, plantation, built-area, grassland, embankments, and conservation area. Table 1 shows the data used in this study.
This research uses the Support Vector Machine (SVM) to build a suitability evaluation model by classifying various designations. SVM is a classification algorithm that produces the hyperplane representing the optimal division between linearly separable classes [25]. It is commonly used for hyperspectral image classification and object detection, and has been applied in many other areas, including multispectral analysis [25], [30]. In this study, SVM was employed with the kernel trick. The kernel trick allows SVM to solve non-linear problems: a kernel function computes inner products in a higher-dimensional feature space, where a linear separating hyperplane can be found, without explicitly computing the mapping into that space [32]. The classification involves two main steps: defining the classification system and selecting the training samples.
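The kernel trick can be illustrated with a short, dependency-free sketch. For 2-D inputs (the study's six parameters would give 6-D inputs; 2-D keeps the example readable), the degree-2 polynomial kernel k(x, z) = (x · z)² equals an ordinary dot product after the explicit mapping φ(x) = (x₁², √2·x₁x₂, x₂²), so the SVM never has to build φ(x) itself. The Gaussian (RBF) kernel and the `decision` function, which evaluates an SVM decision value from support vectors, are included for illustration only; the kernel choice, `gamma`, and all numbers are assumptions, since the text does not state which kernel was used.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z) ** 2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for 2-D input that matches poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision(x, support_vectors, coefs, b, kernel=rbf_kernel):
    """SVM decision value f(x) = sum_i coef_i * k(s_i, x) + b, with coef_i = alpha_i * y_i.

    The sign of f(x) tells which side of the separating hyperplane x falls on.
    """
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, coefs)) + b


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))   # computed in the 2-D input space
print(dot(phi(x), phi(z)))  # same value (up to rounding) in the 3-D feature space
```

The point of the trick is visible in `decision`: the classifier only ever calls `kernel`, so swapping the RBF kernel for another valid kernel changes the feature space without changing the training or prediction code.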

Significance Test of Parameters and Sampling
The parameters of elevation, slope, water availability, precipitation, temperature, and ecoregion are assumed to be related to land use. This assumption is a hypothesis that must be tested with a significance test. In research, significance expresses the level of confidence with which a hypothesis is accepted or rejected. The significance level is expressed as a percentage indicating the risk of error in the test results. The test yields a significance value of 2.4% (Table 2). This means that, if the parameters were unrelated to land use (the null hypothesis, H0), results at least as extreme as these would occur with a probability of only 2.4%. This study uses the common 95% confidence level, i.e., a significance threshold of 5%. Because 2.4% is below 5%, H0 is rejected at the 95% confidence level, and all parameters are considered to have a significant relationship with land use. Sampling was conducted on a population of 159,757 grids in Java Island. The sample comprises 1,050 grids: 150 sample grids for each of the seven land-use classes (excluding protected/conservation areas). The number of sample points follows the standard in SNI ISO 19157:2015: for a population of 159,757 grids (within the 150,001-500,000 range), a minimum of 800 sample grids is required [33], so the 1,050 samples satisfy this requirement.
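The significance check and the sample-size check in this paragraph reduce to simple arithmetic. In this sketch, the significance value of 0.024 is taken from the 2.4% reported in the text, α = 0.05 corresponds to the 95% confidence level, and the 800-sample minimum applies only to the 150,001-500,000 population range cited from SNI ISO 19157:2015; the constant and function names are illustrative.

```python
ALPHA = 0.05          # significance threshold for a 95% confidence level
P_VALUE = 0.024       # significance value reported in the text (Table 2)

POPULATION = 159_757  # grids in Java Island
N_CLASSES = 7         # sampled land-use classes (conservation areas excluded)
PER_CLASS = 150       # sample grids drawn per class
MIN_SAMPLES = 800     # SNI ISO 19157:2015 minimum for 150,001-500,000 grids


def reject_null(p_value, alpha=ALPHA):
    """Reject H0 (no relationship with land use) when p < alpha."""
    return p_value < alpha


total_samples = N_CLASSES * PER_CLASS
in_range = 150_001 <= POPULATION <= 500_000  # the cited minimum applies

print(reject_null(P_VALUE))                          # True
print(total_samples, total_samples >= MIN_SAMPLES)   # 1050 True
```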
Sampling was carried out using stratified random sampling. This method divides the population into non-overlapping strata, or subpopulations, each of which is internally homogeneous [33]. Here, the strata are the land-use classes, each with its own characteristics. For each land-use class, 150 sample grid locations were selected at random from all regions of Java.
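Stratified random sampling as described above can be sketched in plain Python. The grid IDs and class labels below are synthetic stand-ins (159,757 grids with randomly assigned classes), not the study's data, and `stratified_sample` is a hypothetical helper that draws 150 grids without replacement from each land-use stratum while skipping the conservation class.

```python
import random
from collections import defaultdict

CLASSES = ["forest", "wetland agriculture", "dryland agriculture", "plantation",
           "built-area", "grassland", "embankments", "conservation"]


def stratified_sample(grid_classes, per_class=150, exclude=("conservation",), seed=None):
    """Draw `per_class` grid IDs at random from each land-use stratum.

    grid_classes maps grid_id -> land-use class. Classes in `exclude` are skipped.
    Raises ValueError if a stratum has fewer than `per_class` grids.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for grid_id, cls in grid_classes.items():
        if cls not in exclude:
            strata[cls].append(grid_id)
    # Each grid belongs to exactly one stratum and sampling is without
    # replacement, so no grid can be selected twice.
    return {cls: rng.sample(ids, per_class) for cls, ids in strata.items()}


# Synthetic population: every grid gets a random class (illustration only).
gen = random.Random(0)
grids = {i: gen.choice(CLASSES) for i in range(159_757)}

sample = stratified_sample(grids, seed=42)
print(len(sample), sum(len(ids) for ids in sample.values()))
```

Sampling within each stratum guarantees that every land-use class contributes the same number of training grids, even classes that cover only a small share of the island; simple random sampling over all grids would not.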

Calculation of Land Suitability Probability Values
In this model, every grid in Java has values for the six physical parameters and its 2016 land-use class. The samples determined in the previous stage serve as training points for classifying the other grids. From the sample grids, which contain physical parameter values and land-use classes, SVM constructs hyperplanes with margins that separate each land-use class from the others. In theory, any land-use class can be changed to any other land-use class, but with varying degrees of suitability. Probability values are used to represent the level of land-use suitability. Because SVM does not output probability values when classifying, it is combined with the error-correcting output coding (ECOC) algorithm. ECOC converts a multiclass classification problem into several two-class (binary) classification problems. In ECOC, the code bit values in each column of the error-correcting code matrix are used to relabel the training samples into two groups for one binary classifier. The probability of land suitability can thus be calculated for each of the seven land-use classes. Table 3 shows the probability values for n grids in Java Island. With the ECOC SVM method, each grid in Java has seven probability values, one for each land-use class, ranging from 0% to 100%. The land-use class with the highest probability value is considered the most suitable for a grid. For example, grid A has its highest probability value in the wetland agriculture class (88.20%) and its lowest in the embankments class (0.01%), so grid A is best suited for wetland agriculture. The highest probability value also indicates the level of confidence of the land-use suitability model. Across Java, the highest probability value reaches 0.995004 (99.5004%), and the average of the per-grid highest values is 0.693878 (69.3878%). The distribution graph shows that the highest probability values of most grids lie above this average (> 69.3878%).
This concentration of high values is also supported by the map of the highest probability values in Java, which shows that, across the 159,757 grids of Java Island, the probability values are dominated by values above 75% (75.001%-100%) (Figure 3).
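The ECOC idea described above can be sketched without any library. This minimal example assumes a one-versus-all coding matrix for the seven land-use classes (the text does not say which coding design was used), hinge-loss decoding, and hypothetical binary SVM scores for one grid. `relabel_for_learner` shows how one matrix column relabels the training samples for a binary learner. The final softmax over negative losses is an illustrative way to obtain values that sum to 1; it is not a calibrated posterior and not necessarily how the study computed its probabilities.

```python
import math

CLASSES = ["forest", "wetland agriculture", "dryland agriculture", "plantation",
           "built-area", "grassland", "embankments"]


def one_vs_all_matrix(n_classes):
    """Coding matrix M (classes x binary learners): +1 on the diagonal, -1 elsewhere."""
    return [[1 if k == l else -1 for l in range(n_classes)] for k in range(n_classes)]


def relabel_for_learner(M, learner, sample_classes):
    """Binary training labels for one learner: the code bit of each sample's class."""
    return [M[c][learner] for c in sample_classes]


def hinge(bit, score):
    """Hinge loss of a binary learner's score against a code bit in {-1, +1}."""
    return max(0.0, 1.0 - bit * score) / 2.0


def ecoc_losses(M, scores):
    """Average loss of each class codeword; code bits of 0 (class ignored) are skipped."""
    losses = []
    for row in M:
        used = [(bit, s) for bit, s in zip(row, scores) if bit != 0]
        losses.append(sum(hinge(bit, s) for bit, s in used) / len(used))
    return losses


def pseudo_probabilities(losses, temperature=0.1):
    """Illustrative softmax over negative losses (assumption, not a calibrated posterior)."""
    weights = [math.exp(-loss / temperature) for loss in losses]
    total = sum(weights)
    return [w / total for w in weights]


M = one_vs_all_matrix(len(CLASSES))
scores = [-0.8, 1.2, -0.3, -1.0, -0.9, -1.1, -1.5]  # hypothetical SVM outputs for one grid
probs = pseudo_probabilities(ecoc_losses(M, scores))
best = max(range(len(CLASSES)), key=lambda k: probs[k])
print(CLASSES[best])  # wetland agriculture
```

With a one-versus-all design, each column trains one binary SVM to separate a single class from the rest; denser ECOC designs add redundant columns so that an error by one binary learner can be corrected by the others, which is where the method gets its name.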

Figure 3. Distribution of maximum probability values of each grid
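The summary statistics reported for the highest probability values (the maximum, the mean, and the share of grids above 75%) can be computed from per-grid probability vectors as in this sketch. The four probability vectors are hypothetical, only three classes are used for brevity, and the 25%-wide bin edges are an assumption rather than the classes actually used in Figure 3.

```python
from statistics import mean


def summarize_max_probabilities(prob_rows, bin_uppers=(0.25, 0.50, 0.75, 1.0)):
    """Summarize each grid's highest class probability.

    prob_rows: one list of class probabilities per grid.
    Returns (maximum, mean, shares), where shares[i] is the fraction of grids
    whose highest probability falls in bin i (upper bounds inclusive).
    """
    best = [max(row) for row in prob_rows]
    counts = [0] * len(bin_uppers)
    for p in best:
        for i, upper in enumerate(bin_uppers):
            if p <= upper:
                counts[i] += 1
                break
    shares = [c / len(best) for c in counts]
    return max(best), mean(best), shares


rows = [[0.90, 0.06, 0.04],   # hypothetical grids
        [0.55, 0.30, 0.15],
        [0.80, 0.15, 0.05],
        [0.40, 0.35, 0.25]]
top, avg, shares = summarize_max_probabilities(rows)
print(top, round(avg, 4), shares)
```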
The validity of the land-use suitability model is assessed by cross-tabulating the total area of each class in the model against the existing land use (Table 4). The area difference between the model and PL 2016 (the existing 2016 land use) indicates the accuracy of the land suitability model. In addition, the area percentage of each land-use class in the model follows the same pattern as the existing land use. The probability values are influenced by the parameters included in the model. In the modeling results, the highest probability on each grid indicates the most suitable type of land use. Some land-use classes cover a larger area in the model than in the existing land use. This means that when such a class has an area deficit, the deficit can be partially met by grids that are suitable for that class. For example, plantation covers 40.31% of the area in the model but 35.73% in existing conditions. If the plantation allocation experiences a deficit, it can be met from the difference of 4.58 percentage points of land that is suitable for plantation.

Table 4
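The cross-tabulation and the plantation example can be sketched as follows. The grid labels are hypothetical, grids are assumed to have equal area so that grid counts stand in for area, and the grid-level agreement rate is an extra illustrative check rather than a statistic reported in the text; only the plantation percentages (40.31% and 35.73%) come from the study.

```python
from collections import Counter


def cross_tab(existing, model):
    """Count grids for each (existing class, model class) pair."""
    return Counter(zip(existing, model))


def overall_agreement(table):
    """Share of grids whose model class equals the existing class."""
    total = sum(table.values())
    return sum(n for (e, m), n in table.items() if e == m) / total


def area_shares(labels):
    """Percentage of grids in each class (equal-area grids assumed)."""
    counts = Counter(labels)
    return {cls: 100.0 * n / len(labels) for cls, n in counts.items()}


def surplus(model_pct, existing_pct):
    """Model minus existing share in percentage points; positive = spare suitable land."""
    classes = set(model_pct) | set(existing_pct)
    return {c: model_pct.get(c, 0.0) - existing_pct.get(c, 0.0) for c in classes}


existing = ["forest", "plantation", "plantation", "built-area"]  # hypothetical grids
model = ["forest", "plantation", "forest", "built-area"]
table = cross_tab(existing, model)
print(overall_agreement(table))  # 0.75

# Plantation shares reported in the text: model 40.31%, existing 35.73%.
gap = surplus({"plantation": 40.31}, {"plantation": 35.73})["plantation"]
print(round(gap, 2))  # 4.58
```

A positive `surplus` marks classes whose suitable area exceeds their current extent, which is the land pool the text proposes drawing on when that class runs a deficit.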

Conclusions
Alternative locations for potential land uses can be determined probabilistically using the SVM method, taking into account the physical parameters of the land, namely elevation, slope, water availability, ecoregion, precipitation, and temperature. Most grid probability values are already high (>75%); however, the probability values of land-use suitability could be increased further by adding parameters that have a significant relationship with land use.