Validation of FABDEM, a global bare-earth elevation model, against UAV-lidar derived elevation in a complex forested mountain catchment

Space-based, global-extent digital elevation models (DEMs) are key inputs to many Earth sciences applications. However, many of these applications require the use of a ‘bare-Earth’ DEM versus a digital surface model (DSM), the latter of which may include systematic positive biases due to tree canopies in forested areas. Critical topographic features may be obscured by these biases. Vegetation-free datasets have been created by using statistical relationships and machine learning to train on local-scale datasets (e.g., lidar) to de-bias the global-extent datasets. Recent advances in satellite platforms coupled with increased availability of computational resources and lidar reference products has allowed for a new generation of vegetation- and urban-canopy removals. One of these is the Forest And Buildings removed Copernicus DEM (FABDEM), based on the most recent and most accurate global DSM Copernicus-30. Among the more challenging landscapes to quantify surface elevations are densely forested mountain catchments, where even airborne lidar applications struggle to capture surface returns. The increasing affordability and availability of UAV-based lidar platforms have resulted in new capacity to fly modest spatial extents with unrivalled point densities. These data allow an unprecedented ability to validate global sub-canopy DEMs against representative UAV-based lidar data. In this work, the FABDEM is validated against up-scaled lidar data in a steep and forested mountain catchment considering elevation, slope, and Terrain Position Index (TPI) metrics. Comparisons of FABDEM with SRTM, MERIT, and the Copernicus-30 dataset are made. It was found that the FABDEM had a 24% reduction in elevation RMSE and a 135% reduction in bias compared to the Copernicus-30 dataset. Overall, the FABDEM provides a clear improvement over existing deforested DEM products in complex mountain topography such as the MERIT DEM. This study supports the use of FABDEM in forested mountain catchments as the current best-in-class data product.


Introduction
Space-based, global-extent digital elevation models (DEMs) are key inputs to many Earth sciences including flood hazard mapping (Sampson et al 2016, Garrote 2022, hydrological modelling (O'Callaghan and Mark 1984, Beven and Freer 2001, Gharari et al 2020, shortwave radiation estimates (Garnier and Ohmura 1968, Dozier and Frew 1990, Marsh et al 2012 , landslide hazard assessment (Fenton et al 2013), ecology (Illés et al 2011, Moudrý et al 2018, geomorphology (Liu et al 2009), and landform classification (Reu et al 2013). However, many of these applications require the use of a 'bare-Earth' digital elevation model (DEM) versus a digital surface model (DSM), the latter of which may include systematic positive biases due to tree canopies in forested areas (O'Loughlin et al 2016). These biases occur as vegetation obscures and/or obstructs the Earth's surface from most space-based remote sensing techniques (Magruder et al 2021) resulting in the

Methodology
Digital datasets that describe the elevation of the Earth's surface may either include the natural and built surface features (e.g., vegetation, buildings) or may only include the Earth's surface. The former of these are referred to as Digital Surface Models (DSM) and the latter are Digital Elevation Models (DEM) that are 'bare Earth' representations of the surface. Herein, these terms are used to distinguish between the datasets. When both DEM and DSM are referred to collectively, the term 'dataset' is used.

Data
To evaluate the accuracy of the FABDEM DEM, a reference DEM is needed. This reference DEM was acquired from a UAV-based lidar platform. Due to the slow and low flight of a UAV platform, compared to other remote sensing platforms, a significantly higher number of lidar returns are available (>100 pts m −2 ) and therefore it is possible to more accurately characterize the sub-canopy topography. FABDEM was also compared to the hydrologically conditioned, vegetation-removed, MERIT DEM. The MERIT DEM was included due to its widespread use, as well as to provide a benchmark sub-canopy dataset to compare FABDEM. As MERIT and FABDEM are derivative from the SRTM and Copernicus-30 DSMs respectively, both SRTM and Copernicus-30 datasets were included in the comparison. Other SRTM-derivative datasets, such as NASADEM, are not considered in this comparison. This is firstly due to focusing on vegetation-removed DEMs (i.e., further comparisons of DSMs against a lidar sub-canopy DEM are not particularly instructive). Secondly, there is a strong motivation to consider near-global datasets. Many datasets, e.g., NASADEM, exclude much of northern Canada, Alaska, Norway, Sweden, Finland, and Russia. These are rugged areas with significant boreal forest canopy coverage and are areas with users who would be primarily interested in DEMs such as FABDEM and MERIT.

UAV-Lidar
The Fortress Mountain Research Basin lidar data was acquired in late summer (September 2018) with a Riegl miniVUX1-UAV lidar with an integrated Applanix APX-15 inertial measurement unit (IMU) mounted on a DJI M600 Pro UAV platform. Complete information on the data collection, processing, and accuracy assessment, including data and code, for the UAV-lidar DEM generation, can be found in Harder et al (2020). The area of interest was mapped in a sequence of terrain following flights at m s 7 , 1 ⋅ -110 m above the ground surface, and with 80 m parallel line spacing. The highest elevations of the basin area >2500 m.a.s.l. were not surveyed due to visual line-of-sight limitations and logistical challenges. Absolute positioning of the platform was facilitated with the logging of Global Navigation Satellite System (GNSS) data over the course of the campaign with a Leica GS16 base station located within the flight area and precise point positioning with post-processing with the Natural Resources Canada PPP system (Natural Resources Canada 2023). Post-processing (see Harder et al (2020) for more details) integrated IMU and GNSS base station data within the Applanix POSPAC UAV software to generate trajectory data of the UAV-lidar system. The RiProcess software suite, propriety Riegl software, integrated and optimized the trajectory and laser scan data to generate a raw lidar point cloud (figure 1, top left). The raw point cloud was classified into surface and vegetation points (figure 1, top right), with the lasground_new function of LAStools (Isenburg 2023) which differentiated ground surface points from vegetation with an adaptive TIN (Triangular Irregular Network) algorithm adapted from Axelsson (2000). A 0.1 m resolution DEM (figure 1, bottom left) and canopy height model (CHM) were generated with the blast2dem and lascanopy functions, respectively, in LAStools. Uncertainty of the DEM was assessed with respect to 119 surface elevations observed with a Leica GS16 RTK rover-base GNSS system. Vertical differences between the manual survey and the UAV-lidar DEM were found to have Root Mean Squared Error (RMSE) of 0.11 m, 0.13 m, and 0.14 m for open, shrub, and forested surfaces respectively (Harder et al 2020).  Specifically, the version 3, void-filled SRTMGL1Nv003 (https://lpdaac.usgs.gov/products/srtmgl1nv003/) product was used in this study.

Copernicus-30
The Copernicus-30 (COP30) TanDEM-X X-band global radar 1' (≈30 m) DSM (AIRBUS 2020). It is currently considered the most recent and most accurate global DSM (Gesch 2018, Grohmann 2018, Guth and Geoffroy 2021, Hawker et al 2022. Not only does COP30 include more recent topography, but it also uses the Earth Gravitational Model 2008 (EGM2008) geoid model (Pavlis et al 2012) as compared to EGM1996 as used by the other datasets. EGM2008 is substantially improved over EGM1996 and Goyal et al (2021) reported differences between the two up to 12 m between the two models. The COP30 dataset is an ESA/Airbus product acquired between 2010 and 2015.

Study region
The study region is a 4.74 km 2 domain, shown in red in figure 2. It is situated almost entirely within the Fortress Mountain Research Basin (FMRB), located in the Kananaskis River Valley, Alberta, Canada. In figure 2 the black outline is the FMRB proper, the red outline is the lidar acquisition study region, and the blue line is Fortress Creek. Shown in green is the vegetation coverage greater than 3 m, derived from Potapov et al (2021) for visualization purposes. Topography from the Copernicus-30 DSM dataset is shown in greyscale. The FMRB is representative of the heavily forested, subalpine, rugged mountain terrain of the Canadian Rockies. Each dataset is shown in figure 3. The basin ranges in elevation from 2000 m to 2900 m (study area: 1936 m to 2348 m) and comprises forested slopes transitioning into above-tree-line alpine (Harder et al 2020). The FMRB forests are primarily composed of fir and spruce ranging from 4 m in height in the younger stands to ≈9.5 m in the mature stands (Langs et al 2021) transitioning to larches near the treeline, this is shown in figure 4. The distribution of canopy height is shown in figure 5. The distribution is split for vegetation £ 3 m and > 3 m, corresponding to the threshold for the vegetation-removal algorithm in FABDEM (see data section for more details).

Processing
The lidar dataset was reprojected to geographic WGS84 from UTM 11N and a bounding box of the irregular lidar domain was computed. The COP30, MERIT, SRTM, and FABDEM datasets were then extracted to this rectangular subdomain. As each dataset is aligned to a different grid, each dataset was regridded to a common alignment (defined by the lidar dataset) and resolution (30 m) using bilinear interpolation (2 × 2 window) via the QGIS Align Rasters functionality. This was the only non-automated step in this workflow. These aligned datasets were then clipped to match the irregular domain of the lidar dataset (red outline in figure 2) using the gdalwarp tool from the Geospatial Data Abstraction Library (GDAL; GDAL/OGR contributors 2022). Once aligned and clipped, all the aligned datasets were projected to EPSG 32611 (UTM zone 11N) to be able to compute secondary landscape characteristics. These characteristics were Terrain Position Index (TPI) and slope, both generated using the gdaldem GDAL tool.

Landscape attributes
The complex topography present in mountain headwater catchments represents a mix of steep slopes and forested surfaces that could prove to be problematic with deforestation algorithms. Topographic indices derived    from remote sense elevation data are commonly used as surrogates for field measurements (Moudrý et al 2018).
In this study, two landscape attributes were computed for each input elevation dataset and compared to those computed from the lidar. Although an almost infinite set of possible elevation-derived landscape characteristics could be compared, slope and TPI were selected due to their common usage in the hydrological and environmental sciences. Slope was selected as it is the first derivative of the elevation field. The use of slope is widespread in many Earth science investigations such as in the: simulation of subsurface flow (Beven 1981;Beven and Freer 2001), impact of topography on near-surface wind fields (Ryan 1977, Wagenbrenner et al 2019, Dujardin and Lehning 2022, estimation of avalanches (Bernhardt and Schulz 2010), impact of steep terrain on incident shortwave radiation (Garnier and Ohmura 1968, Dozier and Frew 1990, Marsh et al 2012, use in hydrological modelling (Kienzle 2011, Gharari et al 2020, simulation of blowing snow (Marsh et al 2019), estimation of ice field katabatic winds (Conway et al 2021), and landslide hazard assessment (Fenton et al 2013). The correct estimation of slope in complex terrain is critical when distributing meteorological inputs (Kienzle 2011) or comparing model output to remotely sensed data (Vionnet et al 2021). It is a key secondary characteristic of the landscape that should be estimated correctly. The slope was computed using the gdaldem slope tool with default parameters.
The Topographic Position Index (TPI) is a landscape classification index (Weiss 2001) that measures the relative topographic position of a point as a difference between its elevation and the mean elevation within a predetermined neighbourhood (Weiss 2001, Reu et al 2013. TPI is scale-dependent and is highly influenced by the quality of the DEM. By thresholding these TPI values, a landscape can be classified into discrete slope position classes and combined with elevation to produce landforms (Weiss 2001). TPI has had substantial use in, for example, geomorphology (Liu et al 2009), soil mapping (Illés et al 2011), watershed metrics (Weiss 2001), and ecology (Giroday et al 2011). A comprehensive summary can be found in Reu et al (2013). TPI is a key landscape attribute that has broad interdisciplinary application. TPI was calculated via the GDAL TPI function using the default (and non-configurable) search radius of the surrounding cells.

Comparison metrics
The difference between the lidar reference elevation dataset and the candidate elevation dataset, i.e., COP30, FABDEM, SRTM, or MERIT, is described by one of three metrics: difference, root mean squared error, and mean bias. These are described briefly below. Each method is used for elevation, slope, and TPI comparisons.

Slope
Slopes of the aligned datasets were compared spatially to the lidar-derived slope. The slope difference (per equation (1)) is shown in figure 7 with Root Mean Squared Errors (RMSE) and Mean Biases summarized in table 2.
The COP30 and FABDEM slope differences tended to be quite similar, and areas of large slope difference were co-located with the areas with steep gradients and in the transition areas between higher and lower error, as seen in figure 6. The COP30 RMSE (1.122°) and bias (0.121°) are slightly smaller than that of FABDEM's RMSE  (1.317°) and bias (0.224°). The MERIT error pattern was similar to that of COP30 and FABDEM, however in increased magnitude. The substantial error in elevation in the northern portion of the domain was reflected in large errors in slope estimates. The RMSE (3.179°) and bias (0.60°) were worse than FABDEM and COP30. The SRTM slope error pattern closely matched that of the elevation errors with substantial North-South striping. The RMSE (3.685°) and bias (0.755°) were the worst.

TPI
The aligned datasets were used to produce the slope and were compared spatially to the lidar-derived Terrain Position Index (TPI). The TPI differences (per equation (1)) are shown in figure 8 with Root Mean Squared Errors (RMSE) and Mean Biases summarized in table 3.  The COP30 had the smallest errors in TPI (RMSE = 0.366, bias = −0.001) with the main spatial pattern along the westward-facing slope. The FABDEM had a slightly increased RMSE (0.480) and bias (−0.002), with a similar spatial pattern. Small differences along the eastern portion of the domain corresponded to the steep slopes in this area. The MERIT RMSE (0.869) was the second worst and had the worst bias (−0.005). The terrain features along the North portion of the domain had a further knock-on impact, producing nonsensical TPI values. The SRTM striping resulted in similar spatial error patterns once more, with a North-South stripping with some East-West features. This resulted in the worst RMSE (1.088) and the second-worst bias (0.004).

Elevation
The distribution of elevation across the domain is shown below in figure 9. The Copernicus-30 (COP30) is shown in red, FABDEM is shown in yellow, the resampled lidar is shown in green, the MERIT is shown in blue,  and SRTM is in purple. All the datasets had a generally positive bias versus the lidar at low elevations, and a low bias in the mid-elevations. In general, all the datasets have the same shape of the distribution of elevations. The MERIT DEM notably underestimated the occurrence of mid-elevations (≈2150 m) and was the most different of the distributions. The SRTM overestimated the occurrence of the 2100 m elevation band. Both MERIT and SRTM both overestimated high elevation (2300 m) occurrences.

Slope
The distribution of the slopes derived from each elevation dataset is shown in figure 10. The Copernicus-30 (COP30) is shown in red, FABDEM is shown in yellow, the resampled lidar is shown in green, the MERIT is shown in blue, and SRTM is in purple. The SRTM had a substantially different distribution than the other datasets. It had a low slope bias, overestimating the occurrence of low slopes (<10°). The SRTM underestimated the second peak at ≈17-20°and overestimated the occurrence of large slopes. The FABDEM generally matched the lidar the most closely, although it underestimated slope occurrences in the ≈15°region. The MERIT DEM is more similar to SRTM and underestimated occurrences of the low slopes and, similar to COP30, overestimated the mid-slopes (≈17°-20°). All elevation datasets generally captured the bimodal distribution, with the SRTM being the weakest in doing so.

TPI
The distribution of TPI derived from each dataset is shown in figure 11 with the TPI derived from the Copernicus-30 (COP30) shown in red, FABDEM shown in yellow, the resampled lidar is shown in green, the MERIT shown in blue, and SRTM in purple. The MERIT DEM produced the most biased distribution of TPI, heavily biased towards small, slightly negative TPI values. SRTM DSM is the opposite, with too many high and low TPI values. COP30 DSM almost exactly matched the lidar values, whereas FABDEM DEM slightly overestimated these small, slightly negative TPI valuesi.e., the small depressions in the slopes.

Elevation
The distribution of elevation differences across the domain is shown below in figure 12. The COP30 is shown in red, FABDEM is shown in yellow, the MERIT is shown in blue, and SRTM is in purple. Negative values reflect the dataset being at a higher elevation than the lidar. These distributions of differences reflect the range of values informing the previously noted RMSE and bias, with errors due to a positive bias versus the lidar. The SRTM DSM had a more closely centred distribution around the 0 m elevation, showing a more equal mix of over and underestimation. The MERIT DEM had a notable underestimation bias, whereas the FABDEM DEM had an overestimation bias.

Slope
The distribution of slope differences across the domain is shown below in figure 13. The COP30 is shown in red, FABDEM is shown in yellow, the MERIT is shown in blue, and SRTM is in purple. Negative values reflect the dataset covering a larger slope than the lidar.  The FABDEM and COP30 distributions were well-centred at 0°. The MERIT DEM and SRTM DSM showed an increasing prevalence of larger slope differences. In these distributions, the SRTM DSM clearly performed the worst of all the other datasets.

TPI
The distribution of TPI differences across the domain is shown below in figure 14. The COP30 is shown in red, FABDEM is shown in yellow, the MERIT is shown in blue, and SRTM is in purple. Negative values reflect the DEM is at a higher elevation than the lidar.
Like the elevation and slope, the COP30 and FABDEM were visually indistinguishable. The MERIT and SRMT error distributions were, respectively, increasingly worse showing a much larger distribution of errors.

Consideration of 3 m threshold
The FABDEM only has bias corrections for canopies exceeding 3 m. The impact of this threshold is shown in figure 15. The errors associated with a location with a canopy below 3 m are shown in red, and the errors associated with a canopy larger than or equal to 3 m are shown in blue. There was an overestimation in the FABDEM elevations in the regions where the vegetation removal was not run. In the areas of tall canopies, there was a slight negative bias in the FABDEM.

Discussion
The COP30 DSM is considered one of the best, free, global DSM datasets currently available (Gesch 2018;Grohmann 2018, Guth and Geoffroy 2021, Hawker et al 2022. The application of COP30 to environmental forecasting is already providing improved results, e.g., Garrot (2022). The derivative dataset, FABDEM, represents the first 30 m de-forested DEM. Thus, there is a substantial opportunity for FABDEM to provide an improved and usable DEM in complex topography for use in the environmental sciences in locations where the COP30 DSM has too much vertical bias due to tall vegetation canopies. However, the accuracy associated with machine learning approaches in complex mountain topography to de-bias DSM datasets remains an open question. Recent advances in UAV-based lidar systems are providing unprecedented sub-canopy lidar coverage, allowing for detailed canopy height models. In turn, this allows for the creation of low-error reference DEMs that can then be used to validate DEMs.
When compared to the lidar DEM, the FABDEM had the smallest elevation RMSE and mean bias, representing a substantial improvement over existing datasets such as MERIT. Indeed, it also suggests that the vegetation removal over Fortress Mountain Research Basin was generally successful. The areas of largest error in the COP30 dataset almost exactly match the spatial extents of the tallest canopies in this domain. In these tallcanopy areas, the FABDEM substantially improves upon COP30. The shape of the elevation distributions was more similar to that of the resampled lidar (30 m), with a negative bias offset. Without this offset, the two distributions are almost identical, suggesting that the small-scale features are well captured by the FABDEM DEM. This is also reflected in the slope distribution which is almost identical to that of the resampled lidar. The exception is the representation of the ≈17°slopes whose occurrence is slightly underestimated by FABDEM. The width of the distribution of TPI is similar to that of the lidar, however, there is an over occurrence of low TPI topography versus the lidar. Both these areas of underperformance are likely due to the smoothing passes performed as a post-process step in the creation of FABDEM. Additional improvements in the RMSE of FABDEM are likely attributed to the improved gravitational model used by COP30 (and by extension, FABDEM). COP30 uses the Earth Gravitational Model 2008 (EGM2008) geoid model (Pavlis et al 2012) as compared to EGM1996 as used by the other datasets. EGM2008 is substantially improved over EGM1996 and Goyal et al (2021) reported differences between the two up to 12 m between the two models.
Due to the hydrological conditioning of the MERIT DEM, it has been widely used in hydrological applications. However, the coarse resolution of 90 m has clear limitations in the complex topography of mountain basins. It is clearly better than the SRTM DSM, however, the artefacts around a few of the gullies in this domain are problematic. In a comparison of SRTM and MERIT in the Rockies, Hirt (2018) found that MERIT had improved void filling with fewer artefacts than SRTM. In the study region herein, the co-location of the artefacts with hydrological features, such as gullies, may attribute these errors to the hydrological conditioning. This is further substantiated as the SRTM data does not have these artefacts. Due to the colocation with steep slopes, it may also be related to void-filling strategies, e.g., Bildirici et al (2007). Despite these artefacts, it was found that MERIT is greatly improved versus the SRTM DSM in this area, which agrees with previous work (Hirt 2018). The strong striping in the SRTM dataset was exceptionally problematic. Such patterns have been described in various SRTM evaluations (Yamazaki et al 2017, Goyal et al 2021. The stripping pattern would have profound impacts on many processes' simulation and quantification, and these data should be used with care in steep mountain basins. The FABDEM elevation bias and RMSE were the best of all the datasets, improving upon the COP30 dataset, and these improvements are clearly a result of the vegetation debiasing. The SRTM DSM has a clear midelevation bias from including the vegetation canopy. However, the FABDEM debiasing improvement comes at the cost of slightly reduced slope and TPI scores, although differences are so small (RMSE differences of 0.2°and 0.1°respectively) that the tradeoff is almost certainly worthwhile. However, smoothing may be problematic in areas of low slopes, such as the interior plains or Arctic coastal plains.

Conclusions
Space-based, global-extent digital elevation models are an integral part of quantifying Earth system processes and are a key input for hydrological models. Obtaining high-quality elevation data in steep, forested mountainous terrain is a key challenge for simulating headwater mountain catchments. Without correcting for forest cover, hydrological and ecological process predictions in complex, forested terrain may end up biased. Due to the complex land-atmosphere interactions with various environmental processes, the use of deforested DEMs is strongly motivated. Although aeroplane-based lidar data have been available for mountain basins, the recent deployment of UAV-based lidar systems has further constrained the sub-canopy estimates of elevation in a way previously not possible. This is due to the substantially denser point clouds, with many more sub-canopy returns helping to better resolve the sub-canopy topography. These sub-canopy elevation data sets provide an optimal dataset with which to evaluate existing (MERIT) and new (FABDEM) deforested DEMs in a steep, representative mountain catchment.
The use of machine learning to improve the fidelity of remote-sensing based, global datasets is a clear opportunity to further improve the inputs to various Earth-science workflows. However, the uncertainties associated with machine learning approaches to complex mountain topography remain an open question. In this study, the evaluation of FABDEM, SRTM, Copernicus-30, and MERIT datasets against high-fidelity subcanopy lidar data presents a new opportunity for quantifying the error in these datasets.
It was found that the new FABDEM had a 24% reduction in elevation RMSE (2.58 m versus 1.95 m) and a 135% reduction in bias (−1.933 m versus 0.685 m) compared to the Copernicus-30 dataset from which it is derived. The spatial elevation error in Copernicus-30 was colocated with the tall vegetation in the domain, these areas had reduced error in the FABDEM. Further, the FABDEM better preserved the shape of the elevation and derivative parameters such as TPI and slope versus the other datasets of MERIT and SRTM. The FABDEM elevation was the best of all the datasets, clearly a result of the vegetation debiasing although the Copernicus-30 had slightly better slope and TPI RMSE/bias versus the lidar. However, the differences between FABDEM and Copernicus-30 are so small (RMSE differences of 0.2°and 0.1°respective) that the tradeoff is almost certainly worthwhile.
Overall, the FABDEM provides a clear improvement over existing deforested DEM products in complex mountain topography such as the MERIT DEM. The preservation of key aspects of the distribution of derivative landscape products such as slope and TPI suggests that the use in hydrological and ecological models would benefit from these high-resolution data, especially when simulating processes with substantial heterogeneity. The domain evaluated here is representative of the steep, forested mountain headwater basins present in the Canadian Rockies. Further comparisons should be conducted in the future over broader extents as more UAVlidar data becomes available. However, this study supports the use of FABDEM in forested mountain catchments as the current best-in-class data product.