Fine-resolution estimation for urban surface water pollution susceptibility with multi-modal earth observation data

The sustainability and suitability of water resources are of great importance for maintaining urban populations. The landscapes and environment around urban waters have always been a main focus in maintaining water quality for sustainable water supplies. Early-stage field investigations recognized the influence of land use/land cover (LULC) on water quality. To extend the research scope in the spatial and temporal dimensions, remote sensing techniques have been used to discover the relationships between LULC and water quality. However, these remote sensing datasets generally had a medium spatial resolution, making them unable to support the fine-detailed land classifications that are critical for exploring water quality in an urban area. Moreover, although currently available high-resolution and very-high-resolution remote sensing images provide more details of the land surface, this land surface information is too complex for state-of-the-art deep learning approaches and benchmark datasets. This manuscript reports our efforts to develop a framework for exploring the fine-resolution relationship between surface water pollution and LULC. To address the cost of computing time and the limited availability of well-labelled datasets, we employ a foundation model-enhanced approach for water extraction and for LULC classification around water bodies. We propose an estimator of surface water pollution susceptibility to the main pollutants based on the surrounding LULCs. Selecting the Future City of Beijing as the study area and using very-high-resolution remote sensing images, the experiment showed that the proposed approach could effectively map the susceptibility of surface water to pollution caused by its surrounding land use and land cover. To our knowledge, the relationship between LULC and water quality has not previously been investigated using 0.5 m spatial resolution data. We hope our work can support fine-detailed water quality analysis in the remote sensing and water environment community.


Introduction
The sustainability and suitability of water resources are of significant importance in maintaining various urban uses, including drinking, sanitation, landscaping, and living [1]. Moreover, the United Nations has identified water sustainability as a key development goal, namely 'Ensuring access to water and sanitation for all.' In urban areas, the quality of surface water is often heavily influenced by the neighbouring areas on both sides [2,3]. Therefore, the results of land use/land cover (LULC) recognition in water neighbourhoods can enhance our understanding of the vulnerability of water resources [4,5]. Traditional methods rely on manual labour, which may not guarantee completeness in terms of spatial and temporal aspects. Remote sensing, in contrast, observes the Earth's surface using sensors that do not make direct physical contact with the observed object or area. In the context of water vulnerability, a multi-modal remote sensing dataset can provide a comprehensive representation of LULC conditions around water bodies, minimizing spatial gaps. This information is crucial for making informed water resource management decisions and ensuring sustainable water resource usage [1,6].
The accurate classification of LULC from remote sensing datasets is essential for effectively understanding water neighbourhoods. Previous researchers have successfully conducted numerous studies exploring the relationship between water quality and LULC [5][6][7][8][9][10][11]. Furthermore, several deep learning techniques have been developed for LULC classification [12,13]. However, the availability of labelled data and computing capabilities still pose significant challenges that need to be addressed. These challenges become even more pronounced when dealing with very-high-resolution unmanned aerial vehicle (UAV) images, which often involve multiple objects and landscapes. The existing benchmark datasets for LULC classification still have limitations in representing the diverse range of landscapes found in water neighbourhoods. Additionally, the complex nature of urban water landscapes makes the pretraining or fine-tuning of deep learning models time-consuming and computationally intensive.
Large-scale models (LSMs), also known as foundation models, offer a promising solution to these computationally expensive challenges. Foundation models are typically developed using a substantial amount of data resources, enabling them to effectively handle various pattern recognition tasks [14]. Notably, foundation models [15] such as the Segment Anything Model (SAM) [16] and InternImage [17] have achieved state-of-the-art classification results and have been successfully applied in other tasks.
This manuscript presents our research efforts in utilizing a foundation model to accurately extract detailed LULC information associated with water quality. Additionally, we propose a spatial-temporal template that enables the prediction of potential water pollutants in response to changes in LULC.

Surface water pollution susceptibility index
Surface water pollution can be classified according to various systems, including pollutant properties (physical, chemical, and biological), spatial distributions (point, polyline, and polygon), and temporal terms (persistent or intermittent) [6]. As previously mentioned, earlier studies [5][6][7][8][9][10][11] have investigated the correlation between surface water quality and LULC and have identified potential water pollutant types associated with specific LULCs. Table 1 presents a compilation of the surface water pollutants that may be attributed to specific LULCs.

Study area and dataset
Figure 1(a) depicts the study area, which is the Future City (未来科学城) in Beijing. This location is characterized by a high density of water bodies and has undergone substantial transformation from a rural area to a bustling downtown over the past few decades. This remarkable transition serves as a powerful illustration of the dynamic interplay between human activities and the natural environment. Furthermore, figure 1(b) showcases the dataset employed in this manuscript, comprising UAV images with a spatial resolution of 0.5 m, as well as Gaofen-2 satellite images with a spatial resolution of 3 m and RGB channels. The acquisition dates for these datasets are as follows: 1 November 2018; 1 November 2019; 1 July 2020; 1 September 2021; 1 September 2022; and 1 September 2023. Figure 1(c) presents the site planning map.

LSM-enhanced land classification

SAM
This paper adopts SAM, which was pretrained on a vast dataset of 11 million images and more than 1 billion masks [16], to conduct LSM-enhanced land classification. It comprises an image encoder, image embedding, prompt encoder, and mask decoder. The model's architecture is similar to ViT, where the image is divided into patches, and the backbone is a masked autoencoder (MAE) pretrained ViT, which primarily focuses on global image features. The prompt encoder processes features from prompts, such as prompt points or polygons, to accurately label the location of positive objects. The mask decoder combines features from both the image and prompt encoders within the foundation model to segment the test image.
The workflow of the SAM module encompasses three key modules [16]:
• Image encoder and image embedding: This module extracts relevant features from the input remote sensing image.
• Prompt encoder: The prompt encoder extracts prompt features from masks or labelled remote sensing images. It transforms both the input image and prompt features into a set of tokens.
• Mask decoder: Using the tokens generated from the input image and prompts, the mask decoder performs segmentation on the test remote sensing image, effectively delineating different regions of interest.

Architecture
Figure 2 illustrates the workflow of LULC classification, which consists of three main components.
In the first part, surface water areas are extracted from the 4 m resolution Gaofen-2 satellite image, and prompts are generated based on these extracted areas. Subsequently, the SAM module is employed to segment the surface water within the 0.5 m resolution UAV image, utilizing the generated prompts. In the second part of the workflow, buffering areas are created on both sides of the extracted surface water. This is followed by conducting LULC classification specifically within these buffering areas.

(1) Surface water extraction and segmentation
The water areas are extracted from the Gaofen-2 satellite imagery using the normalized difference water index (NDWI) [17], which is defined as follows,
$$\mathrm{NDWI} = \frac{g - nir}{g + nir}$$
where g and nir represent the green band and the near-infrared band, respectively. Next, we perform an overlay of the extracted surface water areas (at 4 m resolution) with the corresponding UAV image (at 0.5 m resolution). This allows us to identify all pixels within the UAV image that correspond to the extracted surface water areas. From the linear areas formed by these detected pixels, we select the centre pixels as prompt points. Finally, we utilize these prompt points to extract surface water from the UAV images through SAM segmentation.
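The NDWI thresholding and prompt-point selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the NDWI threshold of 0, the scale factor of 8 (4 m satellite pixels over 0.5 m UAV pixels, assumed co-registered) and the toy reflectance values are all hypothetical, and the centre pixel is taken per image row on the assumption that each row crosses a single water body.

```python
def ndwi(green, nir):
    """NDWI = (g - nir) / (g + nir); returns 0.0 where the denominator is zero."""
    total = green + nir
    return (green - nir) / total if total != 0 else 0.0


def water_mask(green_band, nir_band, threshold=0.0):
    """Per-pixel water mask from two equally sized 2-D lists of reflectance.

    The threshold of 0.0 is an assumption; the text does not state one.
    """
    return [
        [ndwi(g, n) > threshold for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]


def prompt_points(mask, scale=8):
    """Map the centre water pixel of each coarse row to UAV-grid (x, y) coordinates.

    scale = 8 assumes 4 m satellite pixels over 0.5 m UAV pixels.
    """
    points = []
    for row_idx, row in enumerate(mask):
        cols = [c for c, is_water in enumerate(row) if is_water]
        if not cols:
            continue
        centre_col = cols[len(cols) // 2]
        # Centre of the coarse pixel expressed in fine-pixel units.
        points.append((centre_col * scale + scale // 2,
                       row_idx * scale + scale // 2))
    return points


# Toy 2 x 5 scene (hypothetical values): a narrow channel in the middle columns.
green = [[0.10, 0.30, 0.32, 0.31, 0.10],
         [0.10, 0.10, 0.30, 0.10, 0.10]]
nir = [[0.40, 0.05, 0.04, 0.06, 0.40],
       [0.40, 0.40, 0.05, 0.40, 0.40]]

mask = water_mask(green, nir)
points = prompt_points(mask)
print(points)
```

In this toy scene both prompt points fall at UAV column 20, the centre of the water run in each coarse row; such points would then be passed to SAM as positive point prompts.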
(2) LULC classification on surface water neighbourhood
According to previous research [6], different types of LULC, including agriculture, built-up, grass, forest, and bare soil, influence water quality in different ways. To assess the LULC within the buffer areas, we create a 500 m buffer region surrounding the extracted surface water areas. Subsequently, we generate prompts based on these buffer areas for conducting LULC classification with multispectral imagery of 4 m to 20 m spatial resolution. In addition, we used 0.5 m UAV imagery in this study, meaning that more details of LULC could be represented.
As mentioned earlier, prompt generation plays a crucial role in achieving state-of-the-art segmentation results with SAM. Therefore, based on the LULC categories defined in benchmark datasets [18], we manually select point and area prompts by visual interpretation and identification. These carefully chosen prompts serve as guiding inputs for SAM to generate accurate segmentation results. Based on the findings reported in the referenced studies, we employ nine LULC classes, including commercial built-up, industrial built-up, residential built-up, agriculture, grass, forest, construction site, other impervious surfaces and bare area. Each LULC category consists of 1000 image samples, which are utilized for training and fine-tuning the SAM model.
To ensure classification precision, we classify only one LULC category at a time using SAM. For instance, if the buffer areas encompass five different LULC categories, we perform five separate segmentations using SAM, with each segmentation focusing exclusively on extracting a single LULC category.
Moreover, commercial built-up, industrial built-up and residential built-up, which cannot be distinguished from one another in UAV imagery, would influence water quality differently. Thus, we used the planning map shown in figure 1(c) to determine these three types of built-up areas.

Pollution susceptibility estimation of surface water
Based on existing research discoveries, we propose an estimator that predicts surface water pollution susceptibility from LULC types and LULC changes. Figure 3 shows the architecture of the proposed estimator of surface water pollution susceptibility, which is built on a weighted datacube.
In figure 3(a), the nine LULC types are arranged in a 2-dimensional layout incorporating the horizontal and vertical dimensions of the UAV image. The X-dimension runs parallel to the waterside, while the Y-dimension is perpendicular to it. Each grid cell in this 2-dimensional space represents a pixel in the UAV image. The LULC categories corresponding to a pixel at different points in time are organized along the time dimension. Therefore, utilizing this cube structure, we can effectively depict the LULC changes for each pixel using the following expression:
where $p(x, y)$ refers to the pixel located at $(x, y)$; $l$ refers to the shortest distance between $p(x, y)$ and the waterside; $t_1, \ldots, t_k$ refer to the temporal term, where $k$ denotes the $k$th year; $w_l$ denotes the weight determined by $l$; and $p(x, y)_{l,t_k}$ represents the LULC type of $p(x, y)$ in the $k$th year. $w_l$ is defined by the distance of the buffer area and is inversely proportional to $l$.
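One way to hold the structure described above in code is a per-pixel record that stores the distance to the waterside together with the LULC label at each acquisition date. The sketch below is illustrative only: the class name `PixelSeries`, the `weight` function with its 100 m reference distance, and the example pixel are assumptions, since the text states only that the weight is inversely proportional to the distance.

```python
from dataclasses import dataclass


@dataclass
class PixelSeries:
    """One column of the datacube: a pixel, its distance l, and its LULC history."""
    x: int
    y: int
    distance_m: float  # l: shortest distance to the waterside, in metres
    lulc: tuple        # LULC label at t1, ..., tk


def weight(distance_m, reference_m=100.0):
    """Hypothetical w_l: inversely proportional to l, capped at 1 near the water."""
    return min(1.0, reference_m / distance_m) if distance_m > 0 else 1.0


def transitions(series):
    """Return the (from, to) LULC changes between consecutive acquisition dates."""
    return [(a, b) for a, b in zip(series.lulc, series.lulc[1:]) if a != b]


# Hypothetical pixel 250 m from the waterside with three acquisition dates.
pixel = PixelSeries(x=12, y=40, distance_m=250.0,
                    lulc=("agriculture", "construction site", "forest"))

print(weight(pixel.distance_m))
print(transitions(pixel))
```

Keeping the full label sequence per pixel, rather than only the latest class, is what allows LULC changes to be read off along the time dimension.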
In figure 3(b), four cases are presented, illustrating different LULC transitions. These cases include the transitions from agriculture to construction site to forest, from forest to grass to other impervious surfaces, from construction site to other impervious surfaces to commercial built-up, and from construction site to construction site to residential built-up. Building upon the findings listed in table 1, we can assess surface water pollution susceptibility using the following expression:
where $PS(p(x, y))$ denotes the pollution susceptibility value, covering the various pollutants, that relies on $\{p(x, y)_{l,t_1} \mid Q\}$, and $\{p(x, y)_{l,t_1} \mid Q\}$ denotes the pollutants that depend on the result of $p(x, y)_{l,t_1}$. For example, when $p(x, y)_{l,t_1}$ is agriculture, we would have $\{p(x, y)_{l,t_1} \mid Q\} \subseteq \{$sediments, DOM, TN, TP, eutrophication, heavy metals, hydrocarbon$\}$.
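The lookup from LULC type to candidate pollutants can be illustrated with a small Python sketch. Only the agriculture entry below comes from the text; the other rows of table 1 are not reproduced here, so classes missing from the dictionary simply contribute nothing in this example. Summing the weight once per acquisition date is an assumption made for illustration, as are the function name and the example series, and it should not be read as the authors' exact estimator.

```python
from collections import Counter

# Only the agriculture entry is quoted in the text; the remaining rows of
# table 1 are not reproduced here and must be filled in before real use.
POLLUTANTS_BY_LULC = {
    "agriculture": {"sediments", "DOM", "TN", "TP", "eutrophication",
                    "heavy metals", "hydrocarbon"},
}


def susceptibility(lulc_series, weight, table=POLLUTANTS_BY_LULC):
    """Add `weight` to each pollutant for every date whose LULC is in the table."""
    scores = Counter()
    for lulc in lulc_series:
        for pollutant in table.get(lulc, ()):
            scores[pollutant] += weight
    return dict(scores)


# Hypothetical pixel: agriculture on two dates, then forest (not in the table here).
scores = susceptibility(("agriculture", "agriculture", "forest"), weight=0.5)
print(sorted(scores.items()))
```

With a weight of 0.5 and two agriculture dates, each of the seven agriculture-related pollutants accumulates a score of 1.0.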

Results of surface water extraction
Figure 4(a) shows the results of surface water extraction using state-of-the-art semantic segmentation and SAM. While previous studies have highlighted the significance of SAM in semantic segmentation, these investigations were primarily based on 1 m resolution satellite images. In the case of the 0.5 m resolution UAV image, both dark and bright tones/colours can be observed on the surface of the water due to water depth and turbidity differences, as well as radiometric interferences. Additionally, roads often exhibit similar tones/colours to water bodies due to their asphalt surfaces. Consequently, generating accurate surface water-labelled samples becomes challenging. To overcome this limitation, we leverage the auxiliary information provided by the water bodies extracted from the 4 m resolution satellite image. This approach enables effective extraction of surface water from the UAV image. Figure 4 showcases the results of surface water extraction in the study area using the proposed framework discussed in section 3.2. The green stars within the SAM-enhanced segmentation results indicate the point prompts defined in SAM. It is evident that even state-of-the-art foundation model (FM) approaches have limitations when it comes to capturing fine details of LULC in very-high-resolution UAV images. However, incorporating explainable knowledge from remote sensing, such as spectral indices for water detection, proves to be effective in enhancing the accuracy of surface water extraction. This highlights the significance of multi-resolution remote sensing data fusion in accurately recognizing fine-resolution details of complex land surfaces.

LULC classification for buffering area
We created the buffer area using a distance of 500 m. In other words, we focus on the area that is within 500 m of either edge of a water body. Moreover, the first law of geography states that nearer things are more related than more distant things [19]. Thus, a kernel function should be employed to set the weight for the buffer area at different buffering distances. According to spatial statistical research, basic weighted kernel functions include Gaussian, exponential, box-car, bi-square, tri-cube, etc. [22]. These five weighted kernel functions are expressed as follows,
$$w_{exp} = \exp(-d)$$
where $d$ refers to the normalized distance; $d_1$, $d_2$, and $d_k$ refer to different thresholds; and $num_1, \ldots, num_k$ refer to the predefined values. When $n = 2$, $w_n$ is the bi-square weight. When $n = 3$, $w_n$ is the tri-cube weight.
Then, we employed the fine-tuned SAM to conduct semantic segmentation on the buffer area. The classification precision for agriculture, bare land, grass, forest, construction site, residential built-up and other impervious surfaces was around 90%-95%. Since the labelled datasets for commercial built-up and industrial built-up were not annotated well, the fine-tuned SAM could not recognize the difference between these two built-up classes. Thus, we manually corrected the segmentation results for commercial built-up and industrial built-up. Moreover, considering that the change results were fragmentary at the pixel level, we clustered similar pixels into superpixels, or image regions.
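For reference, the kernel families named above can be written as short Python functions of the normalized distance d. Only the exponential form exp(-d) is taken from the text. The Gaussian form exp(-d^2 / 2), the power form (1 - d^n)^n with zero weight outside the unit bandwidth, the step-function reading of the box-car kernel, and all function and parameter names are assumptions based on forms commonly used in geographically weighted modelling, so this is a sketch rather than the paper's definitions.

```python
import math


def gaussian(d):
    """Assumed Gaussian form: exp(-d^2 / 2)."""
    return math.exp(-0.5 * d * d)


def exponential(d):
    """Exponential form given in the text: exp(-d)."""
    return math.exp(-d)


def box_car(d, thresholds, values):
    """Step kernel: values[i] for the first threshold with d <= thresholds[i], else 0."""
    for limit, value in zip(thresholds, values):
        if d <= limit:
            return value
    return 0.0


def power_kernel(d, n):
    """Assumed form (1 - d^n)^n for 0 <= d < 1, else 0; n=2 bi-square, n=3 tri-cube."""
    return (1.0 - d ** n) ** n if 0.0 <= d < 1.0 else 0.0


# Compare the kernels at a normalized distance of 0.5 (hypothetical input).
d = 0.5
print(gaussian(d), exponential(d), power_kernel(d, 2), power_kernel(d, 3))
```

All four functions return 1 at d = 0 under these assumptions; they differ in how quickly the weight decays and in whether it reaches exactly zero at the edge of the buffer.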
Figure 5(a) illustrates the results of LULC classification for the six years presented in section 2. We fine-tuned SAM with benchmark datasets covering nine classes, each with 1000 samples. In the experiment, we selected nine classes: agriculture, bare land, grass, forest, construction site, commercial built-up, industrial built-up, residential built-up, and other impervious surfaces. The first four classes can be found in table 1. To deal with fine-detailed LULCs, we divided the built-up, or urban, class into construction site, commercial, industrial and residential based on the research reported in [6]. Moreover, we defined the other impervious surfaces class to cover the types of impervious surfaces that have not been reported in the literature to influence water quality, such as parking lots, concrete roads, and concrete spaces.

Estimation of surface water pollution susceptibility
Based on the LULC classification results, we developed the weighted datacube (figure 3) for assessing the surface water pollution susceptibility. The possible surface water pollutants associated with various LULC classes are listed in table 1. Figure 5(b) illustrates the results of the weighted cube for nine areas selected from the buffer areas, the positions of which are labelled in figure 5(a). The following calculation was conducted based on the box-car weighted kernel function, setting the weights to 1, 1/2, 1/3, 1/4 and 1/5 for the areas from 0 to 100 m, from 100 to 200 m, from 200 to 300 m, from 300 to 400 m, and from 400 to 500 m, respectively.
Based on the distance l of the nine selected areas, we defined 1/4 as the weights for areas 1, 2, 3, and 4, 1/2 as the weights for areas 5 and 6, 1 as the weight for area 7, and 1/3 as the weights for areas 8 and 9.
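The band weights above amount to a simple lookup from distance to weight, which can be sketched as follows. The function name and the example distances are hypothetical, since the text reports only the resulting weights for the nine areas and not their distances, and treating each band's upper bound as inclusive is an assumption.

```python
import math
from fractions import Fraction


def band_weight(distance_m, band_m=100, max_m=500):
    """Weight 1/k for the k-th 100 m band from the waterside; 0 outside the buffer."""
    if distance_m < 0 or distance_m > max_m:
        return Fraction(0)
    k = max(1, math.ceil(distance_m / band_m))
    return Fraction(1, k)


# Hypothetical distances chosen to fall in the bands implied by the reported weights.
for distance in (50, 150, 250, 350, 450):
    print(distance, band_weight(distance))
```

Using `Fraction` keeps the weights as exact values such as 1/3 instead of rounded decimals, which makes them easy to compare with the figures quoted in the text.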
Then, we determine the LULC of each area, the result of which is shown in figure 5(b). The fine-detailed pollution susceptibility of surface water for each selected area is calculated using its corresponding weight. The details are as follows, area 1: PS(p(x, y)) =
Fusing the above calculated results, we have the following statistics: table 2 presents the results obtained with different commonly used kernel functions. The value of each susceptibility pollutant is given by weight × susceptibility index. The relationships between LULC and susceptibility pollutants are listed in table 1, which presents the pollutants related to each type of LULC. Moreover, the weights are calculated with the weighted kernel functions in section 4.2. Referring to the research discoveries listed in table 2, we could measure the surface water pollution susceptibility from 2018 to 2023.
Above all, scale is critical for the estimation of surface water pollution susceptibility in urban areas. In this experiment, we could obtain very-high-resolution details of LULC information relevant to surface water quality. This detailed information offers great potential for exploring the influence of LULC on water pollution.

Conclusion
Previous studies have highlighted the impact of LULC on surface water quality, with urban areas experiencing a more pronounced influence due to frequent human activities. However, traditional approaches primarily focus on coarser spatial scales such as streams and riversides, typically ranging from 10 m to 100 m in spatial resolution. To gain a deeper understanding of the human-water interaction in urban areas, it is essential to consider the fine details of LULC, which can be obtained from high-resolution datasets. This paper presents our proposed approach for assessing the fine-detailed pollution susceptibility of surface water based on spatial and temporal changes, drawing on very-high-resolution LULC information relevant to surface water quality. In comparison to previous works on the relationship between water pollution susceptibility and medium-resolution LULC maps, the proposed approach makes fine-detailed information available. However, the topography and watershed boundaries have not been considered in this research due to the well-shaped waterside in the study area. Future work might integrate these factors into the research framework for estimating water pollution susceptibility.

Figure 1. Study area and dataset. (a) The location of the selected study area. (b) Study area visualized by satellite image. (c) Site planning map of the study area. (d) Legend of the site planning map shown in (c).

Figure 2. Architecture of the proposed framework.

Figure 3. The proposed estimator of surface water pollution susceptibility.

Figure 4. Illustration of surface water extraction results by SAM, and surface water extraction results by the proposed workflow.

Table 2. Susceptibility to surface water pollution from 2018 to 2023 generated by different weighted kernel functions. The greater the susceptibility index, the more vulnerable surface water is to this pollutant.