Synergistic Use of WorldView-2 Imagery and Airborne LiDAR Data for Urban Land Cover Classification

Deriving urban land cover types from high resolution optical imagery is challenging because of the spectral similarity of different objects, mixed pixels, and shadows cast by buildings and large tree crowns. To reduce these uncertainties, urban remote sensing has recently turned toward classifying urban land cover from multi-source sensor data. In this study, a hierarchical support vector machine (SVM) classification method was applied to urban land cover mapping using WorldView-2 imagery and airborne Light Detection and Ranging (LiDAR) data. The results showed that: (1) The overall accuracy (OA) and overall kappa (OK) were 72.92% and 0.66 for WorldView-2 imagery alone, and improved to 89.44% and 0.87 with the synergistic use of the two data sources. (2) Buildings and roads/parking lots extracted from the fused data were more precise and better shaped. Both classes had higher producer’s accuracy and user’s accuracy from the fused data than from WorldView-2 imagery alone. Trees were also more easily separated from grasslands when the airborne LiDAR data were added. (3) The fused data reduced the effect of spectral variation within complex and detailed objects, and helped address the problem of shadows from high-rise buildings. These results indicate that the synergistic use of high resolution optical imagery and airborne LiDAR data can be an efficient approach to improving urban land cover classification.


Introduction
Sustainable and cost-effective dynamic monitoring of urban sprawl is desirable, but developing and implementing accurate urban land cover classification remains a significant challenge. It is often difficult to classify urban land cover accurately from Very High Resolution (VHR) optical images alone. The limitations include: (a) Some urban materials have similar spectral properties. For example, buildings with grey roofs share similar reflectance characteristics with asphalt roads, and high albedo objects such as bare soils, clouds, sand and buildings with white roofs are spectrally similar [1]. (b) Because of the heterogeneity of urban landscapes, mixed pixels have been recognized as an obstacle to improving classification accuracy [2]. (c) Shadows cast by buildings or large tree crowns constrain the classification [3]. Recent advances in Earth Observation, however, make it possible to use multi-source high resolution data for more detailed urban land cover classification.
The fusion of Light Detection and Ranging (LiDAR) data and VHR optical imagery has emerged as an approach to urban land cover mapping. Chen et al. used normalized digital surface model (nDSM) data to classify urban areas with a hierarchical object-oriented classification method. After the height information from the LiDAR data was integrated, high buildings and low buildings could be distinguished within the original buildings class, and roads could be further divided into roads and crossroads [4]. Kyle et al. conducted land cover classification and mapping with the Classification and Regression Tree (CART) method, based on the fusion of National Agriculture Imagery Program (NAIP) multispectral data and LiDAR-derived data (intensity, digital terrain model and canopy height model). They found that adding the LiDAR height information produced a more accurate urban land cover classification than the NAIP multispectral (MS) data and the derived Normalized Difference Vegetation Index (NDVI) data alone, increasing the overall accuracy by 5.2% [5]. Im et al. quantified urban impervious surface using a synthesis of artificial immune networks, based on LiDAR nDSM data and WorldView-2 multispectral imagery. They successfully identified the urban impervious surface with an overall accuracy greater than 90% [6]. Zhou et al. fused WorldView-2 imagery with the LiDAR elevation, intensity layer and pseudo-waveform to classify image objects using a Kullback-Leibler divergence-based curve matching approach. They showed that the fused dataset improved the overall classification accuracy by 7.61% over WorldView-2 imagery alone [7]. Langkvist et al. performed land cover classification from a WorldView-2 multispectral image and airborne LiDAR DSM data using convolutional neural networks (CNN). The overall accuracy ranged from 90.02% to 94.49% [8]. Chen et al. improved the accuracy of thematic classification from hyperspectral and LiDAR data using a Support Vector Machine (SVM) with extended morphological attribute profiles (EMAPs) and a CNN. Deep fusion of the two data sources increased the overall accuracy by 3%-5% compared with the hyperspectral image alone and by 12%-15% compared with LiDAR height alone [9]. From these studies, we conclude that LiDAR data, and LiDAR-derived height information in particular, are useful for urban land cover classification.
In this study, a hierarchical support vector machine (SVM) classification method was used to map urban land cover types through the synergistic use of a WorldView-2 image and airborne LiDAR data. A comparative analysis was conducted between WorldView-2 imagery alone and the fusion of the two data sources. The objective of this study is to expand the application of VHR optical imagery and LiDAR data fusion in the field of urban land cover mapping.

Datasets and Method
The study area is located in Pingdingshan City, Henan Province, China. The WorldView-2 imagery was acquired on 21 August 2014. The 11-bit WorldView-2 imagery consists of one panchromatic (PAN) band with a 0.5 × 0.5 m pixel size and eight multispectral bands covering wavelengths from 400 to 1040 nm at a spatial resolution of 2 m. The airborne LiDAR data were acquired on 19 August 2013 and were provided in ASCII format, including the X, Y, and Z coordinates and the first-return intensities. The two-dimensional mean point density was 23.0 points/m². The digital surface model (DSM) and the digital elevation model (DEM) were derived from the airborne LiDAR point clouds. The nDSM image, with 0.5 m spatial resolution, was generated by subtracting the DEM from the DSM (Yan, 2015). Figure 1 shows the data processing workflow for classifying urban land cover from WorldView-2 imagery and airborne LiDAR data, including data pre-processing, hierarchical SVM classifier construction, accuracy assessment and analysis of the classification results.
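The nDSM derivation described above can be sketched in a few lines of Python. This is only an illustration: the function name compute_ndsm, the nodata handling and the clamping of small negative heights to zero are assumptions, not details given in the source, and the sample elevations are invented.

```python
def compute_ndsm(dsm, dem, nodata=None):
    """Return the nDSM (DSM minus DEM) for two equally sized 2-D grids."""
    if len(dsm) != len(dem) or any(len(r) != len(s) for r, s in zip(dsm, dem)):
        raise ValueError("DSM and DEM must have the same shape")
    ndsm = []
    for dsm_row, dem_row in zip(dsm, dem):
        row = []
        for surface, terrain in zip(dsm_row, dem_row):
            if nodata is not None and (surface == nodata or terrain == nodata):
                row.append(nodata)
            else:
                # Assumption: negative heights from interpolation noise are set to zero.
                row.append(max(surface - terrain, 0.0))
        ndsm.append(row)
    return ndsm


# Invented elevations in metres, for illustration only.
dsm = [[105.0, 112.5], [100.2, 100.0]]
dem = [[100.0, 100.5], [100.0, 100.1]]
ndsm = compute_ndsm(dsm, dem)
```

Each nDSM cell then holds the height of the object above the terrain, which is the quantity used later to separate ground from non-ground objects.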

Data pre-processing
During data pre-processing, the WorldView-2 imagery was rectified to the Universal Transverse Mercator (UTM) coordinate system using the corresponding rational polynomial coefficient (RPC) sensor model. The LiDAR nDSM image, with 0.5 m resolution, was registered to the WorldView-2 imagery (UTM), and the Root Mean Square (RMS) error was less than 1 pixel. A Haze-and-Ratio-based (HR) fusion scheme was used to fuse the WorldView-2 MS and PAN bands. The HR method is a PAN modulation (PM) based fusion method that takes haze into account [10]. Equation (1)
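As a rough illustration of haze-aware PAN modulation, the sketch below applies a generic haze-corrected ratio to a single pixel. It is an assumed general form and may differ from Equation (1) of this paper and from the formulation in [10]; the function name, argument names and sample values are all hypothetical.

```python
def hr_fuse_pixel(ms_up, pan, pan_low, haze_ms, haze_pan):
    """Fuse one upsampled MS value with PAN detail, after removing haze.

    ms_up    : MS band value resampled to the PAN grid
    pan      : PAN value at the same pixel
    pan_low  : PAN value degraded to the MS resolution
    haze_ms  : haze value assumed for the MS band
    haze_pan : haze value assumed for the PAN band
    """
    denominator = pan_low - haze_pan
    if denominator <= 0:
        # Assumption: fall back to the unfused MS value when the ratio is undefined.
        return ms_up
    # The haze-free MS signal is scaled by the haze-free PAN ratio, then haze is restored.
    return (ms_up - haze_ms) * (pan - haze_pan) / denominator + haze_ms


# Invented digital numbers, for illustration only.
fused = hr_fuse_pixel(ms_up=120.0, pan=300.0, pan_low=250.0, haze_ms=20.0, haze_pan=50.0)
unchanged = hr_fuse_pixel(ms_up=120.0, pan=250.0, pan_low=250.0, haze_ms=20.0, haze_pan=50.0)
```

When the PAN value equals its low-resolution counterpart, no spatial detail is injected and the MS value is returned unchanged.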

Hierarchical support vector machine (SVM) classifier
A pixel-based SVM method was used for the classification. SVMs are supervised, non-parametric statistical learning techniques designed to solve classification and regression problems [11]. SVM has shown great capability in modelling complex, multi-modal, nonlinear data distributions in high-dimensional feature spaces [12]. In our study, an SVM with a Radial Basis Function (RBF) kernel was used to perform the classification. The kernel parameter G was set to 0.1, the reciprocal of the number of bands (10), and the penalty parameter was set to 100 [13].
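The parameter choice above can be made concrete with a small sketch of the RBF kernel, K(x, y) = exp(-G * ||x - y||^2), using G = 1/10. The constant and function names are illustrative, and the feature vectors are invented. The source does not state which software was used; in scikit-learn, for instance, the same settings would correspond to SVC(kernel="rbf", gamma=0.1, C=100).

```python
import math

N_BANDS = 10            # number of input bands stated in the text
GAMMA = 1.0 / N_BANDS   # kernel parameter G = 0.1
PENALTY_C = 100.0       # penalty parameter stated in the text


def rbf_kernel(x, y, gamma=GAMMA):
    """Radial Basis Function kernel between two pixel feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Invented, already scaled feature vectors for two pixels.
pixel_a = [0.0] * N_BANDS
pixel_b = [1.0] * N_BANDS
similarity_same = rbf_kernel(pixel_a, pixel_a)
similarity_diff = rbf_kernel(pixel_a, pixel_b)
```

Identical pixels give a kernel value of 1, and the value decays toward 0 as the spectral distance grows; a larger G makes that decay faster.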
A hierarchical classification step was added to the traditional SVM model. An nDSM threshold of 2.4 m, determined by a decision tree, was used to split ground and non-ground objects. After masking, the SVM classifier was applied to the ground and non-ground images separately. The land cover in our study area was classified into six classes: buildings and trees among the non-ground objects, and roads/parking lots, grasslands, bare soils and waterbody among the ground objects. The waterbody was manually clipped before classification because it covered only a small proportion of the area. Pure samples were manually selected by cross-validating geo-referenced field sites and aerial photos. The selected sample polygons were randomly split into training and test datasets at a 2:1 ratio. The training samples were used to train the SVM, and the test samples were used to build the confusion matrix and validate the accuracy of the SVM model. A "Majority/Minority" filter with a 3×3 window was applied to each class to reduce the "salt and pepper" effect in the pixel-based classification.
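The hierarchical step and the 3×3 post-filter can be sketched as follows. This is a minimal illustration under stated assumptions: the two classifier callables stand in for the trained ground and non-ground SVMs, pixels strictly above the threshold are treated as non-ground (the source does not say how ties at exactly 2.4 m are handled), and the filter is a plain majority filter rather than the specific "Majority/Minority" tool used by the authors. The single-value features and their 0.3 cut-off are invented.

```python
from collections import Counter

NDSM_THRESHOLD_M = 2.4  # ground / non-ground split stated in the text


def hierarchical_classify(features, ndsm, classify_ground, classify_nonground,
                          threshold=NDSM_THRESHOLD_M):
    """Route each pixel to the ground or non-ground classifier by its nDSM height."""
    labels = []
    for feature_row, height_row in zip(features, ndsm):
        labels.append([
            classify_nonground(f) if h > threshold else classify_ground(f)
            for f, h in zip(feature_row, height_row)
        ])
    return labels


def majority_filter(labels):
    """Replace each pixel by the most frequent label in its 3x3 neighbourhood."""
    rows, cols = len(labels), len(labels[0])
    filtered = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [labels[i][j]
                      for i in range(max(r - 1, 0), min(r + 2, rows))
                      for j in range(max(c - 1, 0), min(c + 2, cols))]
            # Ties go to the label met first in the window (assumption).
            filtered[r][c] = Counter(window).most_common(1)[0][0]
    return filtered


# Hypothetical stand-ins for the two trained SVMs, using one invented feature.
def toy_nonground(value):
    return "trees" if value > 0.3 else "buildings"


def toy_ground(value):
    return "grasslands" if value > 0.3 else "roads/parking lots"


routed = hierarchical_classify([[0.8, 0.1]], [[5.0, 0.3]], toy_ground, toy_nonground)
noisy = [["g", "g", "g"], ["g", "t", "g"], ["g", "g", "g"]]
smoothed = majority_filter(noisy)
```

Because trees and grasslands never reach the same classifier, height alone resolves their spectral confusion, and the isolated "t" pixel in the toy grid is removed by the filter.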

Accuracy validation
For accuracy validation, random points were selected for every class (Buildings: 200, Trees: 150, Roads/Parking lots: 200, Grasslands: 100, Bare soils: 100) based on aerial photos and field truth data, and the producer's accuracy (PA), user's accuracy (UA), overall accuracy (OA) and overall kappa coefficient (Kappa) were calculated. A confusion matrix was built to analyse the classification accuracy. The hierarchical classification result for our study area from the fusion of WorldView-2 imagery and LiDAR nDSM data was compared with the result from WorldView-2 imagery alone using a nonhierarchical classification (Figure 2). The results of the accuracy evaluation are listed in Table 1.
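The four accuracy measures follow from the confusion matrix in the standard way, sketched below. The function name and the convention that rows hold the reference classes are assumptions, and the two-class matrix is invented for illustration; it is not data from this study.

```python
def accuracy_metrics(matrix):
    """Compute OA, Kappa, PA and UA from a square confusion matrix.

    matrix[i][j] is the number of reference-class-i samples labelled as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]                           # reference totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]     # classified totals

    overall_accuracy = correct / total
    # Producer's accuracy: correct / reference total (complement of omission error).
    producers = [matrix[i][i] / row_totals[i] if row_totals[i] else 0.0 for i in range(n)]
    # User's accuracy: correct / classified total (complement of commission error).
    users = [matrix[j][j] / col_totals[j] if col_totals[j] else 0.0 for j in range(n)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected) if expected < 1 else 1.0
    return overall_accuracy, kappa, producers, users


# Invented two-class matrix, not taken from Table 1.
example = [[45, 5],
           [10, 40]]
oa, kappa, pa, ua = accuracy_metrics(example)
```

For this toy matrix the chance agreement is 0.5, so an overall accuracy of 0.85 corresponds to a kappa of 0.7.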

Results and analysis
According to Figure 2 and Table 1, the fusion of WorldView-2 imagery and LiDAR nDSM data produced a finer classification result than WorldView-2 imagery alone. Buildings and roads/parking lots extracted from the fused data were more precise and better shaped. Only 68.26% of buildings were classified correctly from WorldView-2 imagery alone, whereas 96.57% of buildings were extracted accurately after integrating the LiDAR height information. Similarly, the producer's accuracy and user's accuracy of roads/parking lots improved from 64.56% to 86.19% and from 66.83% to 90.50%, respectively. Moreover, with the height information added, the PA of grasslands rose from 70.59% to 85.71% and the UA of trees rose from 85.53% to 96.63%. The fused data reduced the misclassification between grasslands and trees. In summary, with the synergistic use of WorldView-2 imagery and airborne LiDAR data, the overall accuracy improved from 72.92% to 89.44% and the overall Kappa coefficient improved from 0.66 to 0.87.
A visual comparison is shown in Figure 3. The height information improved building extraction. For WorldView-2 imagery alone, many grey roofs were misclassified as roads and parking lots (Figure 3(a)). Despite the spectral differences within an individual roof, the buildings extracted from the fused data were well shaped (Figure 3(b)). Furthermore, because the WorldView-2 high spatial resolution data are not orthographic, the roofs and floors of tall buildings were classified as buildings and roads respectively, and the shadows cast on the roads by high-rise buildings were misclassified as buildings. After adding the nDSM data, the high-rise buildings were extracted more precisely and the misclassification between shadows and buildings was reduced (Figure 3(c)). In addition, the hierarchical classification of the fused data reduced the confusion between some grasslands and trees, showing that height information can contribute to the classification (Figure 3(d)).
Figure 3. (a) WV-2+LiDAR classification reduced the confusion between grey roofs and asphalt roads; (b) WV-2+LiDAR classification helped resolve the spectral differences within an individual roof; (c) WV-2+LiDAR classification improved the extraction of high-rise buildings and shaded roads; (d) WV-2+LiDAR classification separated trees and grasslands more accurately.

Conclusions
In this study, a hierarchical SVM classification method based on the fusion of WorldView-2 imagery and airborne LiDAR data was applied to urban land cover classification and compared with a nonhierarchical classification using WorldView-2 imagery alone. Three results were obtained: (1) Compared with the WorldView-2 image alone, the overall accuracy and overall Kappa coefficient of the classification from the fused data improved from 72.92% to 89.44% and from 0.66 to 0.87, respectively. The improvement came mainly from the higher classification accuracy of buildings and roads/parking lots.
(2) With the height information added, the fused data reduced the confusion of white roofs with high albedo roads and of grey roofs with low albedo roads. The fused data also reduced the imprecise extraction of buildings caused by spectral differences within a roof, which arise from the complex detail of urban objects.
(3) Moreover, the fused data extracted high-rise buildings more precisely because the LiDAR nDSM is an orthographic image, and they helped, to some extent, to address the shadows cast by buildings.