Natural and revitalized grassy ecosystems as biodiversity refuges: on the abilities of remote sensing for their detection and study

Natural grassy ecosystems, and those that have revitalized on former fields, can serve as refuges (places of conservation in natural conditions) for many rare plant species. Such ecosystems are of great interest for environmental research and management planning in many countries, including the EU and the USA. In the Samara region, grassy ecosystems survive only as remnants, a result of the intensive agricultural development of the virgin grasslands once located here. Combined use of remote sensing data and ground-based surveys is now common practice. This paper describes the possibilities of classifying high nature value grasslands in the Samara region using intra-annual time series of multispectral remote sensing data of medium spatial resolution together with point ground-based surveys. The study covers training set generation for large natural vegetation communities with optimization of the ground-based surveys, and presents classification results for two large grassland sites with protected status.


Introduction
The plant communities of the steppes, once extensive on the plains of the temperate and subtropical zones of the Northern and Southern Hemispheres, have been plowed up or subjected to other forms of anthropogenic transformation [1; 2]. As a result, in the countries of Western and Central Europe and in the European part of the Russian Federation, fragments of virgin steppe have survived only in nature reserves and on land unsuitable for agriculture. The significant proportion of rare and protected higher plant species in steppe ecosystems increases their importance as valuable natural heritage sites and as components of the planet's biological diversity [1]. In Western European countries, grass communities (steppes and grasslands) are currently being conserved successfully through the protection and monitoring of communities with a reduced agricultural load or none at all. The environmental legislation of the European Union (EU) countries provides, in particular, for the preservation and monitoring of so-called "high nature value (HNV) farmlands" (for example, [3][4][5][6][7]). Following the adoption of these laws, a system of HNV indicators was introduced, which requires all EU member states to create a nationwide system for monitoring HNV areas. These monitoring systems deal with objects of complex spatial configuration, which makes field monitoring costly and time-consuming.
IOP Publishing, doi: 10.1088/1742-6596/1368/3/032021
An effective way to optimize monitoring is to introduce remote sensing (RS) methods, with which EU countries already have considerable experience [8][9][10][11][12]. Steppe communities with different species composition can be recognized because they are visually distinguishable and are confined to particular combinations of relief, hydrological and soil conditions. At the same time, comprehensive surveys at reference ground sites are clearly needed, since their results are required to process remote sensing data adequately and to build classifiers [13][14][15]. In the Samara region, RS materials are already used by the organizations responsible for environmental protection, in particular to determine the boundaries of protected areas (nature reserves), which is generally close to land cadastre practice. However, the automatic differentiation of vegetated area types, and especially the identification of natural ecosystems with different species composition, requires a deeper qualitative study of the materials, including the development of adequate classifiers and the selection of high-quality recognition patterns.
This article considers the possibility of automatically classifying the natural grassy communities of the Samara region from RS data and ground-based point surveys. Section 2 gives a general description of the grass communities of the Samara region and explains why certain types of grass communities are potentially recognizable. Section 3 reviews world experience in classifying grass vegetation using RS and justifies the chosen recognition method. Section 4 presents the details of the study: it describes the study area, justifies the choice of a specific spacecraft as the RS data source, gives the classification algorithm, describes the construction of the training set in detail, and reports the classification results.
In this paper, intra-annual time series of multispectral RS data from the Sentinel-2 satellites are proposed for grass community recognition, since they provide a sufficient number of images with a spatial resolution of up to 10 m over the growing period. The Support Vector Machine method with a Radial Basis Function kernel (SVM-RBF), well proven for classifying hyperspectral data [17][18], is chosen as the classifier. Particular attention is paid to the construction of the training set, because the control plots on which ground surveys were conducted are small. To generate reference plots of adequate size, it is proposed to combine ground-based measurements, high-resolution remote sensing data from open sources, and the results of clustering a composite of NDVI values for the initial growing period. This approach makes it possible to reduce the number and complexity of ground surveys. Experimental studies have shown good consistency of the classification results with visual analysis of a series of seasonal images.

General characteristics of the grassy communities of the Samara region
The areas of steppe vegetation of various types that have survived in the Samara region, including the largest of them, located in the Bolshechernigovsky district, are confined to certain landforms, specific soil conditions, and moisture regimes. Such areas are on average quite large (hundreds of hectares) and have highly irregular borders.
The Samara region is characterized by the predominance of such grassy communities as dry meadows, fescue steppe, wormwood-grass steppe and fescue-feather grass steppe. For these kinds of steppe, the following physiognomic differences can be distinguished, which affect the reflective properties and form different textures of these areas in satellite images: the height of the grass stand; the percentage of projective cover of the grass stand, which characterizes the presence of open soil within the plant area; and the concentration of photosynthetic pigments in the leaves, which affects both the colour of the plants and the intensity of photosynthesis during the vegetative cycle (Figure 1).
In the Samara region, for the dominant steppe types considered, the height of the grass stand varies from 20 cm to 1 m depending on the type of steppe phytocenosis. The projective cover of vegetation can reach 100% (dry meadow), or there can be a significant share of open soil between clumps of cereals (fescue steppe). A dry meadow, which occupies low relief areas and temporary watercourses, is characterized by the brightest reflection in the green part of the spectrum, while the wormwood-grass steppe is distinguished by the lowest.
The steppes differ in the species richness of higher plants, in the total number of species, and in the representation of rare species (those included in Red Books of various levels: Russia, the Samara region and other regions). Therefore, another important parameter for assessing grass communities is the number of rare plant species growing within them, which can be determined through ground-based observations. Thus, the types of steppe that dominate in the Samara region potentially have the characteristics needed to recognize them in RS data, and their identification is of great importance for environmental protection activities.

World experience in classifying grassy ecosystems using RS data
The problem of classifying grass vegetation and extracting its characteristics from satellite images is considered comprehensively in [19][20][21]. These papers consider the use of different principles of image acquisition (active and passive), spatial resolutions (from low to very high), spectral resolutions (multispectral and hyperspectral sensors), and temporal resolutions (frequency of images obtained during the growing season). Particular attention is paid to the choice of vegetation classification methods, and the advantages and disadvantages of various classifiers are discussed. Various aspects of grass vegetation classification are also considered in more narrowly focused works [22][23][24][25][26][27][28][29][30].
In general, the use of a series of images taken during the growing season provides more opportunities for classification than the use of a single image [12; 22; 23]. However, this significantly increases the dimensionality of the feature space and, as a result, the computational complexity. To reduce the dimensionality, principal component analysis (PCA) is typically used, or several spectral channels are replaced with a single channel containing the values of the normalized difference vegetation index (NDVI). Here, however, there is a risk of losing some of the relevant information. In addition, some works question the use of NDVI for steppe vegetation: the study [24], for example, concluded that this index is poorly suited for biomass estimation in drylands.
Until recently, vegetation was classified from remote sensing data mainly into rather broad classes: by climatic zone (alpine tundra, mountain steppe, desert steppe) or by type of plant community (meadows and steppe, agrocenoses, forests, salt marshes, etc.). However, with the advent of new, often open, Earth remote sensing data of high spatial, spectral and temporal resolution, more and more studies are devoted to more detailed classification, for example, the search for invasive plants [25; 26] or the classification of grass vegetation by species composition [12; 19; 27-29].
Among the classification methods, the most widely used are the support vector machine (SVM) method, decision trees, object-based methods, methods based on fuzzy logic, and various combinations of these. The article [31] notes that SVM classifiers, characterized by self-adaptability, fast learning and limited requirements for the training set size, have proven their reliability in the intelligent processing of remote sensing data.
Based on this review, the authors focused on classifying grass vegetation using a series of multispectral remote sensing images acquired during the vegetation period, with dimensionality reduction by PCA and the SVM method as the classifier.
In this article, steppe sites of different types were recognized in two territories in the vicinity of Bol'shaya Chernigovka village. The study sites include the Specially Protected Natural Areas (PA) "Urochische Mulin Dol" ("Mulin Dol natural landmark") and "Uchastok tipchakovo-kovyl'noy tzelinnoy stepi" ("Plot of fescue-feather grass virgin steppe"), as well as the territory in the vicinity of these PAs (Figure 2). For brevity (on the map), these areas are denoted as PA "Mulin Dol" and PA "Fescue-feather virgin steppe" respectively.

RS data used
The typical size of the analyzed objects (segments of steppe with a uniform species composition) ranges from several tens to several hundred meters across; therefore, high-resolution and medium-resolution satellite images are acceptable.
To assess the state of steppe-type ecosystems in the Samara region, the authors analyzed the available open medium-resolution RS data for 2018 and, as a result, selected the priority data source and the procedure for its processing. A strict requirement for RS data used in vegetation classification is the presence of an infrared channel, which is the most informative for analyzing vegetation cover. In addition, because the growing season progresses rapidly, a short revisit period plays an important role. As a result, the choice was made between data from the Sentinel-2A and 2B satellites (MSI instrument) [32] and Landsat-8 (OLI instrument) [33]. Their comparative characteristics are presented in Table 1. Table 1 shows the advantage of Sentinel-2 images over Landsat-8 in all major indicators: spatial resolution, revisit frequency, and swath width. Therefore, data from the Sentinel-2 satellites (A and B) were selected for the research.
The main criterion for image selection was the complete absence of clouds within the study area during the whole vegetation period from April to October 2018. Nine images were selected; their dates are shown in Table 2. The Sentinel-2 data used corresponded to the L2Ac product level (with atmospheric correction performed) and contained information from 8 spectral channels in the wavelength range from 496.6 nm to 864.8 nm with a spatial resolution of 10 and 20 m. During preliminary processing, the spectral channels with 20 m spatial resolution were converted to 10 m resolution, and an integrated 72-channel composite was compiled from the resulting 8-channel images.
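The compilation of the composite can be illustrated with a minimal sketch. It assumes nearest-neighbour upsampling of the 20 m channels (the paper does not state the resampling method) and an even split of four 10 m and four 20 m channels per image; the function names `upsample_2x` and `build_composite` are hypothetical, and random arrays stand in for real Sentinel-2 channels.

```python
import numpy as np

def upsample_2x(band_20m):
    """Nearest-neighbour upsampling of a 20 m channel to the 10 m grid (assumed method)."""
    return np.repeat(np.repeat(band_20m, 2, axis=0), 2, axis=1)

def build_composite(scenes):
    """Stack all channels of all dates into one (rows, cols, n_dates * n_channels) array.

    scenes: list of dicts with keys "b10" (list of 10 m channels)
            and "b20" (list of 20 m channels), one dict per acquisition date.
    """
    layers = []
    for scene in scenes:
        layers.extend(scene["b10"])
        layers.extend(upsample_2x(band) for band in scene["b20"])
    return np.stack(layers, axis=-1)

# Synthetic stand-in: 9 dates, each with four 10 m and four 20 m channels (assumed split).
rng = np.random.default_rng(0)
scenes = [
    {
        "b10": [rng.random((20, 30)) for _ in range(4)],
        "b20": [rng.random((10, 15)) for _ in range(4)],
    }
    for _ in range(9)
]
composite = build_composite(scenes)
print(composite.shape)
```

With nine dates and eight channels per date, the last axis of `composite` has 9 × 8 = 72 entries, matching the size of the composite described above.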

Classification methods and algorithms
For the classification of steppes in this article, the Support Vector Machine method with Radial Basis Functions (SVM-RBF) is used. The classifier was implemented in the MATLAB environment.
Briefly, the classification process can be described as follows:
1. Feature calculation. The RS data are normalized, and the dimension of the feature space is then reduced using principal component analysis. Normalization is performed by linear contrast stretching of each channel of the composite, bringing its values to the range from -1 to 1.
2. Spatial preprocessing. Since the SVM method is sensitive to noise, a median filter with a 3×3 pixel window was used to remove random outliers in the source data.
To train a classifier, reliable ground-truth data are necessary. The training set generation process is discussed in more detail in the next subsection.
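The normalization and median filtering steps listed above can be sketched as follows. This is an illustration in Python rather than the authors' MATLAB implementation; the function names, the replication of edge pixels at the image border, and the choice to filter before normalizing are assumptions of the example.

```python
import numpy as np

def normalize_channels(composite):
    """Linearly stretch each channel of a (rows, cols, channels) array to [-1, 1]."""
    lo = composite.min(axis=(0, 1), keepdims=True)
    hi = composite.max(axis=(0, 1), keepdims=True)
    # Constant channels get span 1 so they map to -1 instead of dividing by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return 2.0 * (composite - lo) / span - 1.0

def median_filter_3x3(composite):
    """Apply a 3x3 median filter to every channel; borders are padded by edge replication."""
    padded = np.pad(composite, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Resulting shape: (rows, cols, channels, 3, 3)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(0, 1))
    return np.median(windows, axis=(-2, -1))

# Synthetic composite with one isolated outlier pixel.
rng = np.random.default_rng(0)
composite = rng.random((6, 7, 3)) * 1000.0
composite[2, 3, 0] = 1.0e6
filtered = median_filter_3x3(composite)
features = normalize_channels(filtered)
print(filtered.max() < 1000.0, features.min(), features.max())
```

Filtering before the contrast stretch keeps a single outlier from compressing the useful range of a channel, which is why the example applies the steps in that order.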

Training set construction based on the analysis of several data sources
For the PAs under consideration, there is no up-to-date detailed vegetation map that could be used for classifier training and classification quality estimation. Therefore, to build a training set, clustering of RS images was first used to select potential sites of interest for the research. Then ground-based point studies were carried out at the identified sites by environmentalists from the team of authors, in order to describe these sites and identify the classes of recognizable plant communities.
To determine the grass communities that are potentially separable in RS data, images from the phenological period most informative for this type of vegetation, the initial growing season, were used. Since the NDVI vegetation index characterizes the stages of vegetation development quite well, the composite of RS images for clustering was built from NDVI values calculated from Sentinel-2 data acquired from 1 to 30 May 2018. The result was a 5-channel composite containing, at each point, the NDVI value for a different date: 2, 14, 22, 24 and 27 May.
To identify the vegetation classes that are potentially separable in RS images, and the corresponding sites, the generalized EM clustering algorithm [34] was applied to the NDVI composite. This algorithm estimates the number of clusters and their parameters in accordance with a Gaussian mixture model.
For the task of selecting reference sites for ground-based studies, the resulting cluster labels should correspond to particular plant communities, which differ mainly in the timing of the first shoots and of the formation of dense green vegetation cover.
Within the largest areas obtained by clustering, which correspond to different classes of natural vegetation, points for a ground survey were set (16 points, Figure 3). The criteria for point selection included accessibility (location near field roads) and grouping (to minimize travel time between points). Environmental specialists from the team of authors collected data on the state of the natural communities at these points in the summer of 2018. During the comprehensive ground survey and subsequent data processing, they examined these areas and assessed how well the clustering results agreed with the species diversity of the vegetation there. Most of the points fell within steppe ecosystems, for which it was possible to select the largest number of sites of sufficient area and with clear differences in species composition. For the steppe areas in the vicinity of the selected points, geolocation, a comprehensive survey of the vegetation cover, and an assessment of significance by the presence of rare species were performed.
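A minimal sketch of how such an NDVI composite could be assembled is given below. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red); the function names are hypothetical, the handling of zero denominators is an assumption of the example, and random reflectances stand in for the red and near-infrared channels of the five May scenes.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); pixels with a zero denominator are set to 0."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def ndvi_composite(scenes):
    """Stack per-date NDVI layers into a (rows, cols, n_dates) array.

    scenes: list of (nir, red) array pairs, one pair per acquisition date.
    """
    return np.stack([ndvi(nir, red) for nir, red in scenes], axis=-1)

# Synthetic reflectances for the five May 2018 dates named in the text.
rng = np.random.default_rng(0)
dates = ["2018-05-02", "2018-05-14", "2018-05-22", "2018-05-24", "2018-05-27"]
scenes = [
    (rng.uniform(0.2, 0.5, (4, 6)), rng.uniform(0.03, 0.1, (4, 6)))
    for _ in dates
]
composite = ndvi_composite(scenes)
print(composite.shape)
```

Each pixel of `composite` then carries a five-value NDVI profile, which is the feature vector passed to clustering.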
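The idea of fitting a Gaussian mixture by EM while estimating the number of clusters can be sketched as follows. This is not the generalized EM algorithm of [34]: as a stand-in, the example fits scikit-learn's `GaussianMixture` for several candidate cluster counts and keeps the model with the lowest Bayesian information criterion (BIC). The function name `cluster_ndvi`, the BIC criterion, and the synthetic two-class NDVI profiles are all assumptions of the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_ndvi(composite, max_clusters=8, seed=0):
    """Cluster the pixels of a (rows, cols, n_dates) NDVI composite.

    Fits Gaussian mixtures with 1..max_clusters components by EM and keeps
    the one with the lowest BIC (a substitute for the method of [34]).
    """
    rows, cols, n_dates = composite.shape
    pixels = composite.reshape(-1, n_dates)
    best_model, best_bic = None, np.inf
    for k in range(1, max_clusters + 1):
        model = GaussianMixture(n_components=k, covariance_type="full",
                                random_state=seed).fit(pixels)
        bic = model.bic(pixels)
        if bic < best_bic:
            best_model, best_bic = model, bic
    labels = best_model.predict(pixels).reshape(rows, cols)
    return labels, best_model.n_components

# Synthetic scene: two communities with different spring green-up profiles.
rng = np.random.default_rng(0)
profiles = np.array([[0.20, 0.25, 0.30, 0.32, 0.35],
                     [0.50, 0.60, 0.70, 0.72, 0.75]])
truth = np.zeros((20, 20), dtype=int)
truth[:, 10:] = 1
composite = profiles[truth] + rng.normal(scale=0.02, size=(20, 20, 5))
labels, n_clusters = cluster_ndvi(composite, max_clusters=4)
print(n_clusters)
```

The resulting label map plays the role of the cluster map from which the largest homogeneous areas are chosen as candidates for ground survey points.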
The ground survey results were combined with the clustering results, and the following main classes of natural vegetation detected by the clustering method under consideration were identified:
1. fescue-wormwood grass steppe: points 1, 14;
2. meadow steppes in depressions and shrubs along the rivers: points 4, 5, 8-10;
3. fescue-feather grass steppe: points 7, 13, 15;
4. motley grass-feather steppe and saline meadows: points 2, 3, 6, 11, 12, 16.
Additionally, using high-resolution data from open sources (Yandex maps [35]), classes such as forest areas and shrubs were identified. A class of steppe areas exposed to mowing and grazing was also distinguished. As a result of the analysis of the clustering areas, ground survey data, and high-resolution images, a final list of classes was compiled (Table 3) and areas for classifier training were selected (Figure 4).

Table 3. Land cover classes used in the classification of steppe territories.
№  Title
1  Water
2  Forest
3  Bushy communities
4  Steppes in lowlands
5  Fescue-feather grass steppe
6  Motley grass-fescue-feather grass steppe
7  Motley grass-fescue steppe, possible mowing and pastures
8  Fescue-wormwood grass steppe

In addition, the results of the ground surveys were included in the regionally verified database of reference sites for ground support of Earth remote sensing, created in recent years through the joint efforts of specialists in geoinformatics and ecology at Samara University. At the different reference sites, depending on the type of steppe community, between 3 and 13 higher plant species listed in the Red Book of the Samara Region were recorded, with an average of 8-9 species.
This makes it possible to consider the remaining areas of natural steppe vegetation as potential refugia, where rare plant species of various protected statuses grow with high probability, and it heightens the importance of identifying and monitoring such fragments in the Samara region and adjacent territories.

Classification of the territory of natural grassy ecosystems
The classification was carried out according to the description given in subsection 4.3. The features were the components of the PCA decomposition in which the main energy of the image is concentrated. For the studied image (multispectral composite), the main energy (98%) was concentrated in the first nine components of the transformation. However, since classification quality is not unambiguously related to how well the source data are represented, it made sense to choose the feature set "with a margin", so the first 40 components of the transformation were taken.
The results of the classification are presented in Figure 5. For the territories under consideration, there is no up-to-date and detailed vegetation map that could be used for classifier training and quality estimation (such as forest taxation data or data on agricultural crops). Therefore, the performance of the classifier can be evaluated numerically only on the training set. On the training data, the error probability is 0.0034 (evaluated using cross-validation), which indicates good separability of the classes and good quality of the training sample (it contains no admixture of extraneous classes that were not considered). For the entire territory of natural steppe communities under consideration, the classification results were evaluated visually.
The classification delineates extensive areas of fescue-wormwood, fescue-feather and motley grass-fescue-feather grass steppe. These large homogeneous areas correspond to virgin steppe. The steppe areas on fallow land are characterized by a mosaic combination of several steppe classes. Plots of forest and shrubs were identified with good quality, as confirmed by high-resolution images from open sources. Areas corresponding to the class "motley grass-fescue steppe, mowings and pastures" were not found outside the training set, which requires further study.
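The relationship between the retained components and the share of image energy can be illustrated with a small sketch. It computes PCA through the singular value decomposition of the centered pixel matrix and reports the cumulative share of variance ("energy") per component. The function name `pca_features` and the synthetic 72-channel composite driven by five latent factors are assumptions of the example, so the component counts it yields say nothing about the real data.

```python
import numpy as np

def pca_features(composite, n_components):
    """Project a (rows, cols, channels) composite onto its first principal components.

    Returns the feature image and the cumulative energy (share of total
    variance) captured by the first 1, 2, ... components.
    """
    rows, cols, n_channels = composite.shape
    x = composite.reshape(-1, n_channels)
    x = x - x.mean(axis=0)
    # Rows of vt are the principal axes; s**2 is proportional to the variance along them.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    scores = x @ vt[:n_components].T
    return scores.reshape(rows, cols, n_components), energy

# Synthetic 72-channel composite generated from 5 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30 * 30, 5))
mixing = rng.normal(size=(5, 72))
composite = (latent @ mixing + rng.normal(scale=0.05, size=(900, 72))).reshape(30, 30, 72)

features, energy = pca_features(composite, n_components=40)
# Smallest number of components whose cumulative energy reaches 98%.
n_98 = int(np.searchsorted(energy, 0.98) + 1)
print(features.shape, n_98)
```

Keeping more components than the energy threshold requires, as done in the paper, costs little at this dimensionality and leaves the classifier free to use low-variance directions that may still separate classes.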
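How a cross-validated error estimate on a training set can be obtained for an SVM-RBF classifier is sketched below. The example uses scikit-learn instead of the authors' MATLAB implementation; the values of `C` and `gamma`, the number of folds, the helper name `cv_error`, and the synthetic eight-class, 40-feature training set are assumptions, since the paper does not report these details.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def cv_error(features, labels, folds=5):
    """Estimate the SVM-RBF error rate by stratified k-fold cross-validation."""
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # hyperparameters are assumed values
    accuracy = cross_val_score(clf, features, labels, cv=folds)
    return 1.0 - accuracy.mean()

# Synthetic training set: 8 classes, 40 features, well-separated class centres.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 8, 60, 40
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(scale=0.5, size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

error = cv_error(X, y)
print(error)
```

A low value obtained this way reflects separability of the labelled sites only; as noted above, it cannot replace validation against an independent vegetation map.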

Conclusions
The article examines the possibility of classifying grass communities in the Samara region using remote sensing data. For this purpose, the typical grassy communities of the Samara region were analyzed, the target classes were identified, and their semantic description was compiled. Based on an analysis of world practice in this area, the SVM method and composites of medium-resolution multispectral RS images covering the vegetation period were proposed for the classification. For classifier training, the article proposes an approach to reference site selection based on preliminary clustering of a composite of NDVI values for different dates within the growing period, followed by point ground surveys within the boundaries of the largest clusters. The classifier was studied experimentally on the territory of two protected areas of the Samara region.
From the experiments with the steppe vegetation classifier, the following conclusions can be drawn. The proposed approach to reference site selection makes it possible to select sites in a targeted way, taking into account how well they can be interpreted in the RS images used, thereby reducing the amount of field research and increasing its efficiency. The constructed steppe vegetation classifier provides high accuracy on the training set. The results of the classification are generally consistent with the visual analysis of satellite images for the period under consideration (April-September 2018).