Agroforestry mapping using multi temporal hybrid CNN+LSTM framework with landsat 8 satellite imagery and google earth engine

Agroforestry is indeed a traditional practice followed in tropical countries like India. About 28.43 million hectare area is used for agroforest cultivation. By 2050 India has the mission of increasing the area under agroforestry to 53 million hectares. In this study, we have made an effort to map the agroforest areas using the geospatial tools and hybrid deep learning techniques. The land utilized for cultivation and various agroforestry activities such as rubber, tea, coconut, and banana plantation were classified as forest canopy by the existing classifiers taking the tree canopy density as a parameter. In light of proposing a solution to the issue, we have put forth a multi temporal hybrid deep learning framework which is a fusion of convolutional neural network, a deep neural net and long short term memory network to classify agroforestry distinguishing it from the forest canopy using remote sensing data. The experimentation was carried out in the southern districts of India, and Landsat 8 imagery was used to classify the agroforestry of the study area that includes tea, banana, rubber, coconut, and crop lands. An efficient multi temporal hybrid deep learning framework was designed to classify the agroforest plantation distinguishing it from crop lands and forest clusters. The experimental results of multi temporal hybrid CNN+LSTM outperformed CNN, LSTM, BiLSTM model reducing the error rate with respective accuracy and kappa score of 98.23% and 0.88. The proposed method provides a benchmark to accurately classify and estimate the LULC, particularly mapping the agroforest plantation for other regions across the country.


Introduction and background
India is one of the countries that is rich in flora and fauna and it is ranked as world's eighth most biodiverse region [1].The biodiverse ecosystem is a key factor which influences the climatic conditions.Agroforestry plays a vital role in sustaining biodiversity [2], improves soil fertility and reduces soil erosion.The focus of India is to increase the agroforest area by 53m hectare by the year 2050 [3].The country aims to increase the tree canopy from 24% to 33% by promoting agroforest cultivation [4].A team from NITI Aayog has made a detailed analysis of the suitability of the wastelands to be converted to agroforests [5].The agroforest area of Koraput district of Odisha, India was mapped [6] by way of extracting and eliminating the forest cover from the classification.Balaji et al [7] mapped the agroforest coconut planatation in the districts of Tamilnadu, India.It is inferred that the tree crops, bamboo, banana plantation, coconut, teak, fruit orchards and other agroforestry are categorized as forest [8,9].It is the need of the hour to develop an improved classifier model to precisely classify the agroforest plantation and other land cover of India.
The land use land cover (LULC) is monitored and mapped by proper class labelling of satellite images and machine learning techniques can be used to automate the same.It is challenging and time intensive to keep the cartographic data updated.Several LULC classifier models were designed to determine spatial and temporal variations in the land cover.There are numerous discrepancies between these estimates and the national estimates.From the literature, it is noticed that several works on agroforest mapping were carried out.
It was observed that there is misclassification of natural forests and human raised plantations like oil palm, and rubber in Ghana and Indonesia as cited by Jacob Abramowitz et al [10] and Inggit et al [11], respectively.In Bangladesh, a standard classification system was introduced to provide consistent Land cover information [12].The National Forest and Tree Resource Assessment (NFA) was administered and the resulting national forest inventory and land cover map was obtained which assessed various land covers in Bangladesh.By merging both the results by harmonizing data and loss of data the final estimate of land cover was made.Huaqiang et al [13] collected training samples based on various types of data and created an architecture using a Decision Tree (DT) and mapped bamboo forest correctly using Landsat 8 Operational Land Imager (OLI) data and MODIS data.By applying pixel unmixing algorithm, distribution of bamboo was mapped.Jinwei et al [14] mapped the rubber plantation in Danzhou region of Hainan Province in China.Hong et al [15] mapped the Mango plantation in Sanya City in China and compared the performance using Random Forest and Support Vector Machine.
Patrick Helber et al [16] created a benchmark dataset EuroSAT and used Bag of Visual Words (BoVW), Convolutional Neural Network (CNN), ResNet50, and a GoogleNet model.EuroSAT dataset has 10 classes comprising of 27,000 images and 2000-3000 images per class with 10m/pixel resolution.In the IEEE GRSS Data Fusion Contest (LULC classification) [17] conducted in 2018, the first prize was received by the team from Wuhan University which used fully convolutional network for classification.It was also noted that multispectral Light Detection and Ranging (LiDAR) signals reduced classification errors.Xiang Xu et al [18] analysed the intrinsic characteristics of hyper spectral imagery by processing the image at subpixel level which showed high signal to noise ratio.Xiaodong Li et al [19] proposed a model where Super Resolution land cover Mapping (SRM) predicts various LULC classes within mixed pixel.In that technique, the pixels were split into subpixels and their purity is examined.It was observed that deep CNNs resulted in high accuracy compared to all other non-deep learning approaches.
Miae Kim et al [20] compared the results of CNN with Random Forest (RF) and Support Vector Machine (SVM) and demonstrated the superior performance of CNN.The standard deviation images [21] were obtained, and by applying linear spectral pixel method, the landscape was classified into shade fraction images and vegetation.Shooshtari et al [22] designed the land cover change model in Neka Watershed, Iran using Multi-Layer Perceptron(MLP).Cai et al [23] designed Deep Neural Networks to classify crop types and used ReLU as activation function.Dogan et.al [24] modelled an artificial neural network to map the land use and land cover in Kastamonu area.Babu et al [25] suggested a method which used pixel based classification with LSTM classifier applying swarm optimization.Wang et al [26] suggested BiLSTM model to map LULC in china.Google Earth engine [27] offers a platform for geospatial analysis and validation of samples using images tagged by users which also provides geospatial datasets.The existing classification techniques classifies agroforestry and agricultural plantation as forests.
From the aforementioned studies, it is comprehended that the plantation forests like oil palm, rubber, eucalyptus, bannana, tea plantation, coconut, and acacia were misclassified as natural forests.The prime focus of the present study is to reduce classification errors and categorize the agroforest areas accurately, discarding forest areas and automatically providing labels to the cartographic data.By facilitating a multi-class classification, it is possible to distinguish the agroforest plantations from the forest area eliminating the inclusion of forest canopy into the agroforest class.
The objective of the proposed work will pave a way in eliminating the classification errors of misclassifying agroforest as forest, and provide an accurate information on the agroforest plantation.
The main contribution of our research includes the following: 1. We have presented a multi temporal hybrid CNN+LSTM Framework to classify LULC, particularly agroforest plantation.
2. A 1D convolutional network is used to extract the spectral signatures.Convolution operation of CNN extracts the spectral signature maps.The LSTM network decides which spectral features to pass through the state and what features to discard which is accomplished by the input gate and the forget gate respectively thereby improving the learning process.
3. We compared the results with the existing studies and other hybrid models like CNN+BiLSTM and CNN +GRU where the proposed CNN+LSTM method showcases a better performance.

Study area
The study area of our research encompasses southern districts of Tamil Nadu viz., Kanyakumari, Tirunelveli and Tenkasi.The Kanyakumari district is geographically located at 8.08 • N 77.57• E, having a lengthy coastal line which is 71.5 km.long with an area of 1,672 km 2 .It lies in the southern tip of India covered by the Indian Ocean in the south, Bay of Bengal in the west and Arabian Sea in the east.Major plantation crops include coconut, rubber, tea and coffee.Other food crops include paddy, jackfruit, mango, and banana.Tirunelveli is geographically located at 8.65 • N 77.383 • E. This district has a diverse flora and fauna which includes mountains, plains, rivers, and thick inland forests.The major agricultural crops include paddy, banana, coconut, spices and other forest based products.The district has a total area of 3,907 km 2 .As of 2019, Tenkasi, which was part of Tirunelveli, became a separate district.Figure 1 shows the geology and study area's map.

Data and methodology
The proposed method involves the following steps viz., (i) image acquisition ,sample Collection and preprocessing (ii) extraction and identification of variables from satellite imagery (iii) class-labelling and selection of classiers (iv) setting a classification scheme (iv) quality checking and comparison of results.

Image acquisition, sample collection and pre-processing
The satellite imagery used in the area of study is obtained from the Landsat 8 which was launched in the year 2013.The image collection comprises a total of 11 bands with a Pan Band (0.50 to 0.68 μm) for which the spatial resolution is 15 m and 8 multispectral bands whose spatial resolution is 30 m captured by OLI and 2 spectral bands captured by Thermal Infrared Sensor (TIRS) with a spatial resolution of 100 m.Spectral bands captured by OLI includes Visible Band (B1, B2, B3), Red band (B4), Near-Infrared band (B5), Short wave Infra-Red: SWIR 1(B6), SWIR 2 Band (B7), (PAN) (B8), Cirrus (B9) and two other Spectral bands captured by TIRS viz., TIRS 1 (B10), TIRS 2 (B11).Satellite Imagery used in the study is derived from the Tier 1 scenes Landsat Collection 2. Multispectral satellite images from the Landsat satellite covering the study area from the districts of Kanyakumari, Tirunelveli, and Tenkasi acquired via Google Earth Engine are atmospherically and geometrically corrected [28].The Landsat 8 image collection of 30m spatial resolution is acquired and the study area is clipped from the obtained imagery.The OLI and TIRS sensors aboard Landsat 8 collect the images of the Earth for every 16 days referenced to WRS-2 and the image collection from the scenes of the study area are collected to a maximum of 22 or 23 times per year.The different seasonal data between 01-01-2018 and 01-01-2021 corresponding to the region of our study were chosen through the Google Earth Engine (GEE) platform.This is further subjected to pre-processing including removal of cloud cover and cloud shadow which is carried out in the Google Earth Engine.Spectral indices like Normalized Difference Water Index (NDWI) [29], Normalized Difference Built-up Index (NDBI) [30], Normalized Difference Bareness Index (NDBaI) [31] and Normalized Difference Vegetation Index (NDVI) [32] are added to the imagery.Furthermore, the mean composite and standard deviation of the collection is computed to strengthen the classification results due to annual variations like seasonal changes which affect the spectral reflectance.Further, kriging, a geostatistical interpolation technique is applied to remove the missing values.The imagery with additional bands for the samples of the respective classes are collected from the study area is trained by the proposed hybrid deep learning model.All these spectral bands added are learnt by the classifier which supports effective classification and mapping of the spectral information into the respective classes.Figure 2 illustrates the steps involved in the pre-processing of the satellite data.
The homogenous data points are identified and suitable class names are given.Samples for all the 24 identified classes are collected using high resolution satellite images from the Maxar Tech provided in Google Earth.Out of the 3302 geocoded samples, 615 samples are validated by means of field visits and the rest were validated with crowd sourced images provided in Google maps, IRS AWiFS LULC data and other mapping platforms This dataset is subjected to quality checking and the errors are corrected manually.The attributes related to class labels, for all the samples were added using QGIS platform.24 classes for the study were identified using previous studies that were done for the region such as National Remote Sensing Centre's (NRSC) LULC 50K (2015-16).These pre-processed samples are used for validation and training.
In the present study, the globally accepted coordinate system, the European Petroleum Survey Group: 4326-World Geodetic System 1984 (EPSG: 4326-WGS 84) is used, as the Landsat 8 scenes are in the same coordinate system.

Extraction and identification of variables from satellite imagery
The collected Landsat images have 11 bands corresponding to the reflectance from different wavelengths.However, as this spectral reflectance varies between different seasons, the mean is computed for the image collection gathered for the entire year.Indices like NDVI, NDBI, NDBaI and NDWI are calculated, and these bands are added to enhance the differences between the classes like vegetation, built-up, barren land, and water thereby improving the classifiers accuracy resulting in a total of 15 bands.The spectral reflectance values of the spectral bands highlight the features of every pixel on earth's surface.NDBI is computed using shortwave infrared (SWIR) and near-infrared (NIR) bands identifying built-up area accurately.NDWI is formulated using NIR and shortwave infrared (SWIR) bands which identifies water bodies.NDBaI is computed using shortwave infrared (SWIR) and red bands which identifies barren land.The median (15 bands) and standard deviation (15 bands) of the collection is computed and the bands are merged together to get a total of 30 bands including the indices.The mean value and standard deviation values are the parameters which contributes to understand the population distribution.As there is a high temporal variability between different seasons for many of the classes, the mean and standard deviation composite improves the classifier's performance.Pre-processing of the Landsat image collection is performed using the Google Earth Engine (GEE) script, removing the cloud cover, and this raster data extracted holds the spectral information of all the bands corresponding to the sample points.Spatial, spectral information, vegetation index and other indices are extremely essential for classifying agroforest, vegetation and forest region.The right parameters from the satellite imagery have been carefully chosen to improve the accuracy of the classifier [33].

Class-labelling and selection of classifiers
With the help of ground truth values, existing maps and geo referencing, the images are labelled into predefined classes viz., built-up, water body, agriculture land, low vegetation/ natural landscape and thick vegetation.The satellite images are subjected to classification and the pixels with higher intra-class similarity are identified.The proposed Multi Temporal Hybrid CNN+LSTM Framework is compared with other hybrid models.

Hybrid deep learning approach-CNN+LSTM
In the proposed method, feature extraction is performed by CNN, and LSTM plays the role of the classifier.Feature extraction from the remote sensing imagery is effectively done by CNN while the temporal dependencies are captured by LSTM and the combination of both effectively handles the spatiotemporal aspects while mapping the agroforest canopy.The hybrid deep learning models have significantly improved the classification accuracy.
Convolutional Neural Network : In the proposed method, the CNN is composed of four one-dimensional convolutional layers and a max pooling layer where 90 output filters where used.To extract features convolutional windows of size two, three, four and five are used and the Rectified Linear Unit (ReLU) is used along with dropout.The extracted features are then given as input to the max pooling layer to further extract high level features and the most prominent ones are preserved avoiding overfitting, thus improving the classifier's accuracy by improving generalization compared to other models.
Long Short-Term Memory Network : The CNN is then preceded by a neural network called LSTM, whose input is received from the CNN.LSTM is a type of RNN which has three gate functions namely forget gate, input and output gate.Every LSTM cell is implemented using the following functions: , 2 The functions of input gate, forget gate and output gate are represented as in t , f t and out t respectively.Here, σ and tanh are the sigmoid and hyperbolic tangent mapping functions.Weights of input gate, candidate input gate and output gate are given by wgt i , wgt c and wgt o respectively.bias i , bias c and bias o represents the bias of input gate, candidate input gate and output gate, respectively.hid t and hid t-1 represent the output of current and previous LSTM cells, respectively.x t represents current input vector.C t and C t−1 represents the current and previous cell memory.In the biLSTM approach, two LSTM layers are used where the data is processed in two directions.Figure 3 illustrates the architecture diagram of the proposed Multi Temporal Hybrid CNN+LSTM Framework.

Setting the proposed classification scheme
In the entire data collected, 70% is utilized for training and 30% is used as test set.The classes which are green shows minimal interclass variation which makes the classification challenging.The results are enhanced by including additional classes like low vegetation and thick vegetation, which have sub classes like rubber, coconut, deciduous forest, evergreen forest, shola grasslands and mangrove forest.As far as agroforestry is concerned, an explicit definition becomes mandatory to exist in order to classify various types of agroforest ecosystem and vegetation.When a classifier is designed, there must be a classification scheme with class labels and the criteria discriminating them, and the definition of boundaries should be unambiguous.A total of 3302 samples are collected with five major classes: built-up (779 samples), water body (413 samples), agriculture land (611 samples), low vegetation/ natural landscape (812 samples) and thick vegetation   (687 samples).The specifics of the classes and its sub classes are presented in table 1.The proposed classification scheme used for the study is illustrated in figure 4.

Quality checking and comparison of results
Out of the samples collected, 70% of the dataset is utilized for model training and 30% is utilized to validate the model and evaluate the model accuracy.The test set is validated, the error rate is calculated, and the results are compared to identify a better classifier.

Accuracy assessment
Metrics like F1-Score, Cohen's Kappa coefficient and IoU score are computed to learn the performance of the proposed multi temporal deep learning CNN+LSTM model and the results were compared with other deep learning models from previous study carried out by researchers.The outcome illustrates that the model proposed has reduced the error rate when compared to CNN, LSTM and BiLSTM.In general, precision reveals the number of samples which belong to a certain class out of all the samples predicted to fall under that class.
And, recall provides a clear indication of the number of samples that actually belong to the class out of the samples that were predicted as class members and non-class members.It gives the probability of identifying the class members.F1-score is a combination of both recall and precision forming a single metric.These three metrics help us understand how accurate the classifier is in identifying the land cover particularly agroforest plantation.Cohen's Kappa coefficient is a metric used to compute the interrater reliability which is best suited for applications involving multiple classes.

Results and discussion
The existing classifiers identifies plantations like coconut, arecanut, rubber, and other dense plantations as forest cover [34,35] since they also have higher canopy densities.Apart from the canopy density and NDVI , this paper attempts to use the spectral features of agroforest plantations like tea, banana, rubber, and coconut to improve classification accuracy using hybrid deep learning techniques.[20]).Extraction of spectral signatures of agroforestry plantation by the 1D convolutional network is carried out in our study to distinguish agroforest and other plantation, and it involves multi class classification.The LSTM network simplifies the learning process by filtering only the meaningful information enhancing the classifier's performance.
The details of the performance measures of different deep learning models are presented in table 2. The classified map based on CNN+LSTM is shown in figure 5. Table 3 shows F1-score for the classes: crop land, tea, banana, savanna, high dense scrub forest, shola grassland, evergreen forest, deciduous forest, rubber, coconut, mangrove forests, and fallow land.Figure 6 illustrates the comparison of the metrics like classification Accuracy, Kappa Score, and IoU score of various deep learning models with the proposed Hybrid CNN+LSTM framework.Figure 7 shows the satellite imagery and the corresponding classification map of the hybrid model CNN+LSTM using Landsat 8 imagery.In figure 7 (a), the landscape around the reservoir Papanasam lower is accurately classified as deep water and the region surrounding the reservoir is classified as forest.
The rocky mountains in the forest, the low dense and high dense urban area and the croplands nearby are classified correctly.In figure 7 (b), the banana plantations are clearly distinguished from the forest canopy in the landscape near Vaigaikulam.In figure 7 (c), the forest canopy and the shola grasslands in the Sivagiri Reserve Forest are distinguished and mapped.Similarly, in figure 7 (d), the croplands on either side of the Thamirabarani river are classified and the forest region is mapped accurately.The landscape in figure 7 (e) is the coastal region of Kanyakumari district near Mutam beach, where coconut plantations are in surplus and are classified successfully without categorizing it as forest.The landscape in figure 7 (f) is the Chitar dam of the Kanyakumari district which is classified as deep water.The deciduous forest and rubber plantation surrounding the dam is mapped without classification errors.CNN+LSTM produced high F1-scores of 0.96, 0.85, 0.92, 0.97, 0.9,0.95,and 0.98 for tea plantation, low dense scrub forest, evergreen forest, deciduous forest and high dense scrub forest,Rubber and Coconut respectively.
From the observations made, it is noticed that the proposed method clearly distinguishes the agroforest from the forest ecosystem and the major challenge with this technique is the collection of samples from different agroforest cluster groups through field survey, minimizing inter-class similarity with varying spectral signatures.In the future this work can be extended to capture the changes in the agroforest canopy dynamically to monitor the demolishment of agroforestry and other agricultural lands.

Conclusion
This paper proposes a multi temporal hybrid deep learning model for classifying LULC distinguishing agroforest plantation from natural vegetation, forest and other plantation.The proposed method distinguishes agroforestry from the forest canopy by extracting the spectral features providing better results than the existing deep learning models.The CNN+LSTM model produced optimal results of classification showing superiority in performance with accuracy and kappa score of 98.23 and 0.88 respectively.This methodology effectively maps rubber, banana, coconut plantation, tea plantation, evergreen forests, deciduous forests and shola grasslands (High altitude grasslands) in the districts of Tamil Nadu.The agroforest plantations like rubber, coconut and banana in the southern districts of Tamil Nadu are mapped distinguishing high altitude grasslands and other vegetation.Experimental results show better performance by adding spectral indices with fusion of bands obtained by computing composite median and standard deviation.The eleven bands and the computed spectral indices have contributed more in distinguishing different vegetation from the agroforest canopy which outperformed MLP, DNN, ANN, LSTM, biLSTM, CNN, and CNN+biLSTM.By utilizing advanced satellite imagery, remote sensing technologies, and geographic information systems, areas that are experiencing demolishment of agroforest plantation in near real time can be identified.In the future, this model of distinguishing agroforest area from other vegetation can be deployed and used across other states in India.

Figure 1 .
Figure 1.Map of the study area.

Figure 4 .
Figure 4. Proposed classification scheme used for the study.

Figure 5 .
Figure 5. Classification of agroforest plantation using hybrid CNN+LSTM in the study area.

Figure 6 .
Figure 6.Performance metrics graph: classification accuracy, IoU Score and Kappa score of various deep learning models.

Figure 7 .
Figure 7. False color composite of satellite imagery and the corresponding classification map of hybrid model CNN+LSTM using Landsat 8 imagery.

Table 1 .
Samples collected under every class in the study area

Table 2 .
Classification accuracy for various deep learning models.

Table 3 .
F1-Score for the classes crop land, tea, banana and savanna, high dense scrub forest, shola grassland, evergreen and deciduous forest, rubber, coconut, mangrove forests, and fallow land.