Extracting shallow water depth from the fusion of multi-temporal ICESat-2 data and multi-spectral imagery

With the development of satellite remote sensing and laser altimetry, fusing laser altimetry data with multi-spectral images for shallow water depth inversion has become an economical and convenient approach. However, few methods take the temporal dimension of the data into account to integrate water depth inversion results from multiple temporal phases. To address this issue, this article proposed a shallow water depth inversion method that integrates multi-source and multi-temporal remote sensing data. The method used the random forest (RF) algorithm to estimate water depth at each temporal phase, took the contextual (neighbourhood) information of Sentinel-2 imagery into account, and used total least squares as the theoretical model to fuse the multi-temporal inversion results. Taking the Yongle Islands in the South China Sea as the study area, shallow water inversion experiments were conducted using ICESat-2 (Ice, Cloud, and land Elevation Satellite-2) data from 5 temporal phases and one Sentinel-2 image. The results show that, with single temporal phase data, the proposed method achieved a mean RMSE (root mean square error) of 1.53 m and a mean R² (coefficient of determination) of 0.7610, outperforming traditional methods that ignore image context information. The multi-temporal fusion model achieved an RMSE of 1.15 m and an R² of 0.8622, compared with an RMSE of 1.24 m and an R² of 0.8423 for the existing median filtering fusion method, i.e., an R² improvement of 0.0199.


Introduction
Islands and reefs are closely related to human activities, and the study of their surrounding shallow water topography is of great significance for economic production, maritime rescue, route management, and other activities [1]. Shallow water topography is an effective indicator of the coastal environment of islands and reefs [2], and bathymetry is a key step in obtaining underwater topography in shallow water areas. Rapid and efficient acquisition of shallow water depth is currently a research focus [3]-[5]. The traditional method is mainly based on shipborne sonar measurement, which yields highly accurate results but requires considerable manpower and material resources; moreover, areas such as islands, reefs, and coastal zones are often difficult to access due to rights disputes, so on-site shallow water depth measurement cannot be conducted there. In contrast, the wide field of view, good timeliness, and low cost of satellite remote sensing data make it an important means of water depth measurement, facilitating real-time, synchronous, and continuous monitoring of large areas [6]. In the 1970s, Lyzenga et al. [7] proposed a method for inverting shallow water depth using multi-spectral images. Subsequently, researchers proposed numerous shallow water depth inversion methods based on remote sensing data. Experiments have shown that shallow water depth measurement using remote sensing data is significantly more efficient than traditional methods [8], [9].
With the development of remote sensing technology, a number of studies on water depth extraction have been conducted. Wang et al. established a distributed SVM model using an IKONOS-2 multi-spectral image and airborne lidar water depth samples, and the results were relatively reliable [10]. However, this method relied on in-situ data. Qiu et al. compared RF with the single band model, the dual band ratio model, and the multi band linear model, finding that RF was superior to the multi band linear model, and that both were significantly superior to the single band and dual band ratio models [11]. In addition, fusing active and passive optical remote sensing data for water depth extraction has been a main research direction in recent years. Its advantages lie in improving the accuracy of water depth inversion while reducing both the dependence on active optical data and its cost [12]. Multi-source water depth inversion has developed further since the launch of the ICESat-2 satellite in 2018. ICESat-2 offers high along-track photon resolution and is not constrained by time or location, but it only provides elevation profiles along its ground tracks, with observation gaps between parallel tracks [13], so its data are sparsely distributed in large-scale water depth measurement. Therefore, combining ICESat-2 measurements with multi-spectral remote sensing imagery for water depth inversion has become one of the main current research directions. Pan Chunmei et al. [14] fused the TM extraction results with the raster SAR extraction results at the pixel level, and used the weighted average of the water depths extracted from two temporal images as the final pixel value; Deng et al. [15] applied QuickBird imagery to extract the water depth of the Beilun River mouth, but the result was affected by clouds; Melsheimer et al. [16] proposed an extraction method that requires no water depth control points, based on the different tide levels in multi-temporal remote sensing images; Ye et al. [17] established a multi-temporal single band model using TM images and applied it to extract the water depth in Jiaozhou Bay; Liu [18] proposed a multi-source, multi-temporal extraction method based on image segmentation, and conducted water depth fusion experiments on homologous multi-temporal images (TM/ETM+) and on multi-source multi-temporal images (TM/ETM+ and SPOT-5), respectively.
Although existing methods for depth extraction are relatively mature, several points still deserve attention. First, the correlation between pixels should be considered when establishing the water depth extraction model. Traditional methods use the pixel values of each band in optical images as features, without considering contextual features. Ai et al. established a convolutional neural network (CNN) model that considers the spatial correlation of pixels, which improved the accuracy of water depth extraction to some extent [19]. However, the extraction results still had significant errors, making it difficult to meet practical application needs. Second, multi-source and multi-temporal data need to be integrated to improve the accuracy of the water depth extraction model. Most existing methods focus on extracting water depth from multi-source, single temporal remote sensing data and use data from other temporal phases only as validation sets, without fully utilizing the information available within the study area. Traditional multi-temporal fusion methods, such as median filtering, are considered simple and effective, but their resistance to noise is limited, and their fusion results may fail to reflect the true water depth. Therefore, how to integrate multi-temporal remote sensing data to obtain high-precision shallow water depth extraction results remains a major open problem.
This paper used Sentinel-2 and ICESat-2 data for shallow water depth inversion, proposed a feature extraction method that takes neighbourhood information into account, and used the RF model to estimate the shallow water depth at each temporal phase in the study area. The total least squares estimation method was then employed to fuse the water depth extraction results of the different temporal phases, improving the accuracy of the results by integrating multi-source and multi-temporal data.

Study area
The study area of this paper is the Yongle Islands, located in the Xisha Islands in the South China Sea, with latitude ranging from 15°46′N to 17°07′N and longitude from 111°11′E to 112°06′E. The archipelago is composed of eight reefs distributed in a ring shape, with an area of about 8 square kilometers. The reefs share similar geomorphic characteristics, and the underwater topography is complex and varied. Except for Money Island, the depth of the inner sea area enclosed by each reef is less than 50 meters, and the water transparency in the shallow area around the islands and reefs ranges from 15 to 20 meters, with a maximum of 34 meters.

Data and preprocessing
We used ICESat-2 ATL03 photon data acquired over the study area at five different times: February, April, May, and July 2019, and July 2020. To reduce the influence of photons outside the shallow sea area, only shallow sea photons were retained for depth extraction, as shown in figure 1. The acquisition conditions and data distribution are listed in table 1. The remote sensing image was a Sentinel-2 image acquired in 2019. The MultiSpectral Instrument (MSI) aboard Sentinel-2 covers 13 bands from visible light to shortwave infrared, at spatial resolutions of 10 m, 20 m, and 60 m. This article selected Level-1C (L1C) data with less than 10% cloud cover; L1C products are ortho-rectified and geometrically corrected to sub-pixel precision and provide top of atmosphere (TOA) reflectance, which was used as the original data.
For validation, because measured water depth data in the South China Sea region are not publicly available, this article used independent ICESat-2 photon data over the study area as the validation set for the extraction results.

Preprocessing of Sentinel-2 imagery
This article used the Sen2Cor processor in the SNAP software to generate Level-2A (L2A) products from the Sentinel-2 imagery; Sen2Cor performs atmospheric correction and outputs bottom of atmosphere (BOA) reflectance. Land areas were then removed using the four bands with 10 m spatial resolution (bands 2, 3, 4, and 8).
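The text does not state the rule used to remove land with the 10 m bands. As one hedged illustration, a common choice is the normalized difference water index (NDWI) computed from the green band (band 3) and the near-infrared band (band 8). The index, the threshold of 0, and the function name `water_mask_ndwi` are assumptions for this sketch, not the authors' documented procedure.

```python
import numpy as np


def water_mask_ndwi(green, nir, threshold=0.0):
    """Return a boolean mask that is True for pixels classified as water.

    Hypothetical sketch: NDWI = (B3 - B8) / (B3 + B8), water where NDWI > threshold.
    """
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = green + nir
    # Guard against division by zero where both reflectances are 0 (NDWI set to 0 there).
    ndwi = np.divide(green - nir, denom, out=np.zeros_like(denom), where=denom != 0)
    return ndwi > threshold


# Usage on toy L2A reflectances: water pixels are bright in green and dark in NIR.
b3 = np.array([[0.08, 0.05], [0.10, 0.03]])
b8 = np.array([[0.02, 0.30], [0.01, 0.25]])
mask = water_mask_ndwi(b3, b8)  # left column water, right column land
```

Over clear, shallow reef water the threshold may need local tuning, since bright sandy bottoms can raise the NIR signal.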

Preprocessing of ICESat-2 data
The Advanced Topographic Laser Altimeter System (ATLAS), the multi-beam laser altimeter carried by the ICESat-2 satellite, uses photon counting detection, which results in a noisy point cloud [13]. Therefore, this article first applied an elevation threshold (10 m) for preliminary denoising. To ensure the consistency of the multi-temporal photon data, this paper adopted the TPXO9-atlas tidal model to apply tide correction to the ICESat-2 photon data [20]. On this basis, geographic inverse encoding was applied to the photons to obtain their corresponding pixel coordinates on the Sentinel-2 image.
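The three steps above (elevation threshold, tide correction, and mapping photons to image pixels) can be sketched as follows. This is a minimal sketch under several assumptions: the 10 m threshold is applied to the absolute photon height, tide heights (for instance predicted by a tide model such as TPXO9-atlas) are already interpolated to each photon and simply subtracted, photon coordinates are already projected into the image's coordinate reference system, and the image georeferencing follows the GDAL six-element geotransform convention without rotation terms. The function name `preprocess_photons` is hypothetical.

```python
import numpy as np


def preprocess_photons(x, y, h, tide, geotransform, h_max=10.0):
    """Hypothetical sketch of ICESat-2 photon preprocessing.

    x, y         : photon coordinates in the image's projected CRS (assumed)
    h            : photon heights in metres
    tide         : tide height at each photon in metres (assumed precomputed)
    geotransform : GDAL-style (x0, dx, 0, y0, 0, dy); dy < 0 for north-up images
    """
    x, y, h, tide = (np.asarray(a, dtype=float) for a in (x, y, h, tide))
    # 1) Coarse denoising: keep photons whose absolute height is within h_max.
    keep = np.abs(h) <= h_max
    x, y, h, tide = x[keep], y[keep], h[keep], tide[keep]
    # 2) Tide correction: remove the instantaneous tide height.
    h_corr = h - tide
    # 3) Geographic inverse encoding: projected coordinates -> pixel row/column.
    x0, dx, _, y0, _, dy = geotransform
    col = np.floor((x - x0) / dx).astype(int)
    row = np.floor((y - y0) / dy).astype(int)
    return row, col, h_corr


# Usage with a 10 m north-up grid; the second photon exceeds the threshold.
gt = (500000.0, 10.0, 0.0, 1800000.0, 0.0, -10.0)
rows, cols, depths = preprocess_photons(
    [500005.0, 500015.0, 500025.0], [1799995.0, 1799985.0, 1799975.0],
    [-3.0, 25.0, -8.0], [0.5, 0.5, 0.4], gt)
```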

Water body separation based on adaptive variable ellipse filter
Owing to atmospheric and other effects, ICESat-2 ATL03 photon data usually contain a large amount of noise and cannot be used directly to extract sea surface and water bottom information. In addition, adaptively extracting sea surface and seabed photons based on differences in photon density is also a major challenge. To address these issues, we used a local adaptive variable ellipse filtering method [21], [22] to classify signal and noise photons. The method automatically determines the filter parameters from the photon density distribution in different environments and water depths, achieving adaptive separation of sea surface, water bottom, and noise photons.
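The adaptive variable ellipse filter itself is defined in [21], [22]. The sketch below only illustrates the underlying idea of density-based signal/noise separation with an elliptical neighbourhood: each photon's neighbours are counted inside an ellipse elongated along track, and photons with high counts are labelled signal. The fixed ellipse semi-axes, the use of Otsu's method to choose the count threshold (a swapped-in technique, not taken from the paper), and all function names are assumptions.

```python
import numpy as np


def otsu_threshold(values):
    """Return the value that maximises between-class variance (Otsu's method)."""
    values = np.asarray(values, dtype=float)
    candidates = np.unique(values)  # sorted unique counts
    best_t, best_var = candidates[0], -1.0
    for t in candidates[:-1]:
        low, high = values[values <= t], values[values > t]
        w0, w1 = low.size / values.size, high.size / values.size
        between = w0 * w1 * (low.mean() - high.mean()) ** 2
        if between > best_var:
            best_t, best_var = t, between
    # If all counts are equal, best_t is that value and every photon becomes noise.
    return best_t


def ellipse_density_filter(along_track, height, a=20.0, b=1.0):
    """Label photons as signal (True) or noise (False) from elliptical neighbour counts.

    a: along-track semi-axis (m); b: vertical semi-axis (m). Both are fixed,
    illustrative values; the filter in [21], [22] adapts them to local density.
    """
    x = np.asarray(along_track, dtype=float)
    z = np.asarray(height, dtype=float)
    # Normalised elliptical distance between every pair of photons (O(n^2) memory).
    d2 = ((x[:, None] - x[None, :]) / a) ** 2 + ((z[:, None] - z[None, :]) / b) ** 2
    counts = (d2 <= 1.0).sum(axis=1) - 1  # neighbours inside the ellipse, excluding self
    return counts > otsu_threshold(counts)
```

The quadratic pairwise matrix keeps the sketch short; for full ATL03 granules a spatial index such as a k-d tree would be the practical choice.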

Water depth extraction based on RF model
An RF model was adopted for regression, with decision trees as the base regressors combined by bagging. The reflectance values of the multi-spectral image pixels corresponding to water bottom photons were used as features, and the water depth values were used as labels. The model was trained with tuned parameters and then used to predict shallow water depth.
Most water depth extraction models based on optical remote sensing imagery construct a relationship between spectral features and water depth, but they mainly consider a one-to-one (pixel-to-depth) relationship and ignore neighbourhood information. Therefore, this paper incorporated neighbourhood information by using the spectral values within a k × k neighbourhood window on the imagery (k was set experimentally to 3), i.e., an image block, as features. However, using image blocks as features increases the feature dimension. Traditional approaches use methods such as principal component analysis (PCA) to reduce the feature dimension, but such reduction is unsupervised, so it is difficult to ensure that the processed features benefit water depth extraction. The RF model is a meta-estimator that fits multiple decision trees on bootstrap samples drawn from the dataset and averages their predictions to improve accuracy and control over-fitting. It therefore offers high training efficiency and accuracy and can exploit powerful feature combinations for water depth extraction, so we used the image blocks directly as feature input. The process is shown in figure 2: the image block of each band is flattened into a vector of length k², and the four band vectors are concatenated into a vector of length 4k², which is used as the feature input to train the RF model.
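The image-block feature construction can be sketched with NumPy as below, assuming the four 10 m bands are stacked into a `(bands, height, width)` array and photon positions are given as pixel row/column indices. Reflect padding at image edges and the function name `patch_features` are assumptions.

```python
import numpy as np


def patch_features(image, rows, cols, k=3):
    """Build one feature vector of length bands * k * k per photon position.

    image : (bands, height, width) reflectance array
    rows, cols : pixel indices of the water bottom photons
    """
    image = np.asarray(image, dtype=float)
    r = k // 2
    # Reflect-pad spatially so windows centred on edge pixels stay k x k (assumption).
    padded = np.pad(image, ((0, 0), (r, r), (r, r)), mode="reflect")
    feats = []
    for row, col in zip(rows, cols):
        # Pixel (row, col) sits at (row + r, col + r) in the padded array,
        # so its k x k window starts at (row, col).
        block = padded[:, row:row + k, col:col + k]  # shape (bands, k, k)
        # C-order flattening: band 1's k*k values, then band 2's, and so on.
        feats.append(block.reshape(-1))
    return np.stack(feats)


# Usage: a 4-band 5 x 5 toy image; the centre pixel yields a 36-element vector.
img = np.arange(4 * 5 * 5).reshape(4, 5, 5)
F = patch_features(img, [2], [2])
```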

Fusion of multi-temporal extraction results
Classical error theory holds that every observation of a specific target inevitably contains error. To obtain a solution closer to the true value, redundant observations are often made of the same target, and an over-determined system of equations is constructed to obtain a reasonable estimate of the unknown parameters under the least squares criterion. The multi-temporal water depth extraction results at the same location can be regarded as multiple observations of the water depth of the same object. In this scenario, the observation equation is given by equation (1). When the number of temporal phases is greater than or equal to 2, the least squares solution can be computed to obtain the estimated water depth X.

$$H = BX \qquad (1)$$

where H is the column vector of observations at the different temporal phases, i.e., the water depth values extracted from the ATL03 photons at each phase; X is the unknown water depth to be estimated; and B is a column vector of ones.
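Because B is a column of ones, the ordinary least squares solution $(B^{T}B)^{-1}B^{T}H$ of equation (1) reduces to the arithmetic mean of the phase-wise depths, so a single gross error shifts the estimate in proportion to its size. A minimal NumPy check of this, using the hypothetical helper `ls_fuse`:

```python
import numpy as np


def ls_fuse(H):
    """Ordinary least-squares estimate of X in H = B X, with B a column of ones."""
    H = np.asarray(H, dtype=float).reshape(-1, 1)
    B = np.ones_like(H)
    X, *_ = np.linalg.lstsq(B, H, rcond=None)  # returns (solution, residuals, rank, sv)
    return float(X[0, 0])


clean = ls_fuse([5.0, 5.2, 4.8, 5.4, 4.6])   # equals the mean, 5.0 m
outlier = ls_fuse([3.0, 3.1, 2.9, 12.0, 3.0])  # one gross error drags it to 4.8 m
```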
In the process of using multi-source and multi-temporal data to invert water depth, differences in imaging conditions, the accuracy of tidal correction, and parameter errors of the RF algorithm mean that the results often contain gross errors of varying degrees. If the least squares solution is used for multi-temporal fusion, the results will be affected by the gross errors in the observations, reducing accuracy. The conventional approach of fusing multi-temporal results by median filtering can eliminate the interference of gross errors to some extent, but cannot make effective use of the multi-temporal extraction results.
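For reference, the median filtering baseline amounts to a pixel-wise median over the temporal stack. The sketch below assumes the phase-wise depth maps are co-registered into a `(T, height, width)` array and that missing predictions are stored as NaN; `median_fuse` is a hypothetical name.

```python
import numpy as np


def median_fuse(depth_stack):
    """Pixel-wise median of a (T, height, width) stack of depth maps, ignoring NaNs."""
    return np.nanmedian(np.asarray(depth_stack, dtype=float), axis=0)


# Five temporal phases of a 1 x 2 map; the fourth phase has a gross error in pixel 2.
stack = np.array([[[5.0, 3.0]], [[5.2, 3.1]], [[4.9, 2.9]], [[5.1, 12.0]], [[5.0, 3.0]]])
fused = median_fuse(stack)  # pixel-wise medians: 5.0 and 3.0 m
```

In the second pixel the median stays at 3.0 m, whereas the arithmetic mean would be 4.8 m; the other four observations, however, contribute only through their rank, which is the under-use of multi-temporal information noted above.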
This paper assumes that the anomalous water depth values that introduce gross errors into the multi-temporal inversion results reside in the coefficient matrix B rather than in the weight matrix, giving B an error $E_B$. The error $E_B$ and the estimated water depth X were taken as unknown parameters, and total least squares estimation (TLSE) was used for multi-temporal water depth data fusion. First, observation equation (1) was modified into equation (2),
where V is the correction (residual) vector of the observations H.
Taking the error $E_B$ in B into account and treating the weight matrix as an identity matrix, the least squares criterion was modified into equation (3).
With the observation equation as a constraint, the Lagrange multiplier method was used to solve this conditional extremum problem, and the normal equation was derived as equation (4).
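The text refers to equations (2) and (3) without giving their explicit form. A standard errors-in-variables formulation consistent with the description (unit weights, V correcting H, $e_B = \operatorname{vec}(E_B)$, and λ the vector of Lagrange multipliers) is sketched below; the sign convention is an assumption and may differ from the authors' notation.

$$
\begin{aligned}
&H + V = (B + E_B)\,X \\
&V^{T}V + e_B^{T}e_B \rightarrow \min \\
&\Phi(V, e_B, X, \lambda) = V^{T}V + e_B^{T}e_B + 2\lambda^{T}\left[(B + E_B)X - H - V\right]
\end{aligned}
$$

Setting the partial derivatives of Φ with respect to V, $e_B$, X, and λ to zero yields a nonlinear normal equation in X of the kind given as equation (4).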

$$\hat{X} = \left(B^{T}B - vI\right)^{-1}B^{T}H \qquad (4)$$

where v is a scale factor and I is the identity matrix. Both sides of the above equation contain unknown quantities, so an iterative solution is needed. With the initial value of the scale factor $v^{(0)} = 0$, $X^{(1)} = (B^{T}B)^{-1}B^{T}H$. During the i-th iteration (i ≥ 2), the solutions were calculated following equation (5) and equation (6).
where i denotes the iteration number. When the absolute value of the difference between $X^{(i)}$ and $X^{(i+1)}$ was less than the threshold ε (empirically set to 0.00001), the iteration stopped.
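Equations (5) and (6) are referenced but not written out. A common iterative solution of the TLS normal equation, assumed here, alternates $X^{(i)} = (B^{T}B - v^{(i-1)}I)^{-1}B^{T}H$ and $v^{(i)} = \lVert H - BX^{(i)}\rVert^{2} / (1 + X^{(i)T}X^{(i)})$, starting from $v^{(0)} = 0$ so that $X^{(1)}$ is the ordinary least squares solution. The NumPy sketch below implements that assumed scheme for one pixel with the stopping threshold ε = 0.00001; the name `tls_fuse` and the iteration cap are hypothetical.

```python
import numpy as np


def tls_fuse(H, B=None, eps=1e-5, max_iter=100):
    """Iterative total least squares estimate of X in H + V = (B + E_B) X.

    Assumed scheme (unit weights):
        X^(i) = (B^T B - v^(i-1) I)^(-1) B^T H
        v^(i) = ||H - B X^(i)||^2 / (1 + X^(i)T X^(i)),  with v^(0) = 0.
    Returns X as an (m, 1) array; B defaults to a column of ones (equation (1)).
    """
    H = np.asarray(H, dtype=float).reshape(-1, 1)
    B = np.ones_like(H) if B is None else np.asarray(B, dtype=float)
    m = B.shape[1]
    BtB, BtH = B.T @ B, B.T @ H
    X = np.linalg.solve(BtB, BtH)  # X^(1): ordinary least squares, since v^(0) = 0
    for _ in range(max_iter):
        r = H - B @ X
        v = float(np.sum(r ** 2)) / (1.0 + float(np.sum(X ** 2)))  # scale factor
        X_new = np.linalg.solve(BtB - v * np.eye(m), BtH)
        if np.max(np.abs(X_new - X)) < eps:  # stop when |X^(i+1) - X^(i)| < eps
            return X_new
        X = X_new
    return X


x_hat = tls_fuse([5.0, 5.2, 4.8])[0, 0]
```

Here the estimate is slightly above the arithmetic mean of 5.0 m, because v is positive. For B a column of ones the iteration converges to $\sum_i H_i / (n - \lambda_{\min})$, where $\lambda_{\min}$ is the smallest eigenvalue of $[B\;H]^{T}[B\;H]$, which gives an independent check on the sketch.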

Precision metrics
To verify the accuracy of the water depth extraction results, a randomly selected quarter of the photon data was used as validation data. R² and RMSE, calculated using equation (7), were used as the precision metrics.

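$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i-y_i\right)^2},\qquad R^2=1-\frac{\sum_{i=1}^{n}\left(\hat{y}_i-y_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2} \qquad (7)$$

where $y_i$ is the water depth value of the validation set, taken as the true depth; $\hat{y}_i$ is the estimated water depth; $\bar{y}$ is the mean of the true water depths; and n is the number of photons in the validation set.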
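The validation split and equation (7) translate directly into code. In the sketch below the random seed, the rounding of "a quarter" to the nearest integer, and the helper names are arbitrary assumptions.

```python
import numpy as np


def holdout_split(n, frac=0.25, seed=0):
    """Randomly reserve a fraction of photon indices for validation."""
    idx = np.random.default_rng(seed).permutation(n)
    n_val = int(round(frac * n))
    return idx[n_val:], idx[:n_val]  # (train indices, validation indices)


def rmse(y_true, y_pred):
    """Root mean square error, equation (7) left."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination, equation (7) right."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


train_idx, val_idx = holdout_split(8)
err = rmse([1, 2, 3, 4], [1, 2, 3, 5])
fit = r2([1, 2, 3, 4], [1, 2, 3, 5])
```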

Parameter setting
This paper established an RF model for shallow water depth inversion based on single temporal phase data. The number of iterations (trees) was set to 50, the maximum tree depth to 10, the minimum number of samples to 4, and the number of leaf nodes to 5.
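The reported settings do not name a library. Assuming scikit-learn's `RandomForestRegressor`, one plausible mapping is sketched below; the mapping of "minimum number of samples" to `min_samples_split` and "leaf nodes" to `min_samples_leaf` is a guess, and the training data are synthetic stand-ins shaped like the 36-element (4 bands × 3 × 3) block features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Assumed mapping of the reported settings onto scikit-learn parameters:
#   iterations (trees) 50         -> n_estimators=50
#   maximum tree depth 10         -> max_depth=10
#   minimum number of samples 4   -> min_samples_split=4   (assumption)
#   leaf nodes 5                  -> min_samples_leaf=5    (assumption)
rf = RandomForestRegressor(
    n_estimators=50, max_depth=10, min_samples_split=4, min_samples_leaf=5,
    random_state=0,  # fixed only for reproducibility of this sketch
)

# Synthetic stand-in data: 200 photons x 36 features; not real reflectances or depths.
rng = np.random.default_rng(0)
X_train = rng.random((200, 36))
# Toy depth driven by feature 4, the centre pixel of the first band's 3 x 3 block.
y_train = 20.0 * X_train[:, 4] + rng.normal(0.0, 0.5, 200)
rf.fit(X_train, y_train)
pred = rf.predict(X_train[:5])
```

In scikit-learn a leaf minimum of 5 already requires at least 10 samples before a split can produce two valid leaves, so under this mapping the split setting of 4 would have no effect; that is one reason to treat the mapping as tentative.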

Water depth extraction results
For method comparison, a multi-temporal fusion method based on median filtering was implemented for comparison with the proposed TLSE fusion method, and an RF model based on single-pixel features was constructed for comparison with the model based on neighbourhood window features. The comparison results are shown in table 2. Regarding feature type, with TLSE fusion, the method using neighbourhood window features yielded an RMSE of 1.15 m and an R² of 0.8622, better than the RMSE of 1.18 m and R² of 0.7408 obtained with single-pixel features; the same held with median filtering fusion. Regarding fusion method, with neighbourhood window features, TLSE fusion yielded an RMSE of 1.15 m and an R² of 0.8622, better than the RMSE of 1.24 m and R² of 0.8423 obtained with median filtering fusion; the same held with single-pixel features. To compare the two fusion methods in detail, 100 samples were randomly selected from the validation results and their true and fused depth values were compared (figure 3); the results based on TLSE fusion agreed more closely with the true water depth. With R² values of 0.8622 versus 0.8423 and RMSE values of 1.15 m versus 1.24 m, the TLSE fusion results showed smaller dispersion and closer agreement with the true depths than the median filtering results. The water depth map obtained by fusing the multi-temporal data with the TLSE method is shown in figure 4.

Conclusion
This paper proposed a water depth inversion method that integrates multi-source and multi-temporal remote sensing data, combining a feature extraction method that takes the contextual information in the image into account with a TLSE method for fusing the inversion results of multiple temporal phases. This improved the extraction accuracy and efficiency of the RF model. The experiments show that the methods using neighbourhood window features achieved a higher R² and a smaller RMSE than those using single-pixel features, indicating that the contextual information provided by neighbouring pixels helps the model learn the spatial correlation between features and water depth, and thus improves the accuracy of water depth prediction to a certain extent. Likewise, TLSE fusion achieved a higher R² and a smaller RMSE than median filtering fusion, suggesting that TLSE fusion helps capture temporal variation and thereby improves accuracy. By contrast, median filtering fusion does not fully use the multi-temporal data: information contained in inversion results with varying errors is discarded, which makes the result sensitive to temporal variation and limits its accuracy. These results show the potential of TLSE for multi-temporal data fusion. Combining neighbourhood window features with TLSE fusion, the study achieved an RMSE of 1.15 m and an R² of 0.8622.

Figure 3. Comparison of consistency between true depth and fused depth from the TLSE and median filtering methods.

Figure 4. Fused water depth map of the Yongle Islands obtained with the TLSE fusion method. (a) Shiyu-Jinqing Island and Duncan Island, (b) Yinyu Island and All-wealth Island, (c) Robert Island and Coral Island, (d) Money Island and Antelope Reef.

Table 1. An overview of the ATL03 data in the study area.

Table 2. Model comparison with respect to feature selection and multi-temporal data fusion method.