Building Edge Detection Technology From Remote Sensing Image Based On NSCT And Tensor Voting

An edge detection technology based on the combination of the nonsubsampled contourlet transform (NSCT) and tensor voting is proposed, which aims to obtain more accurate and detailed edge information of buildings in remote sensing images. Firstly, NSCT is used to decompose the image into subband frequency information at different scales and angles. Then, position encoding is performed on these subband coefficients to obtain second-order symmetric tensors at the corresponding positions. Tensors of different scales and angles at the same position are weighted and summed to complete feature fusion. Finally, the edge features of the image are obtained based on tensor voting theory. The experimental results show that, compared with common edge detection technologies such as Canny, Fast Edge (fast edge detection using structured forests) and HED (Holistically-Nested Edge Detection), our method reflects the boundaries of buildings and the edge information of roofs more accurately and densely, providing better support for the analysis of building types and architectural styles. Compared with HED, which is based on deep learning, our method improves PSNR and SSIM by 0.98 and 0.03, respectively.


Introduction
Building edge detection is a classic problem in remote sensing image processing. It significantly reduces the amount of data to be processed while retaining useful structural information about building boundaries. It is widely used in urban planning, environmental protection, map drawing, and agricultural resource investigation [1]. At present, two categories of methods are generally used for building edge detection in remote sensing images. One directly applies general image edge detection algorithms [2] [3] [4] [5] [6], and the other uses deep semantic segmentation networks to extract building boundaries [1] [7] [8] [9]. In practice, both categories find it difficult or even impossible to obtain complete and detailed edge information of building roofs, which affects the judgment of building types and styles.
Among the commonly used edge detection algorithms, Sobel detection [2] is one of the pioneering methods in this field. It computes the first-order derivatives of the image in the horizontal and vertical directions to detect edges. The method is simple, but the results are noisy. The Canny detection operator [3] builds on the Sobel operator and combines double thresholding with non-maximum suppression to filter out non-edge noise to a certain extent, but its results are easily affected by the threshold settings. In recent years, with the rapid development of machine learning, especially deep learning, learning-based edge detection methods such as Fast Edge (fast edge detection using structured forests) [4], HED (Holistically-Nested Edge Detection) [5], and EDTER (Edge Detection with Transformer) [6] have also emerged. Fast Edge uses a random decision forest framework to predict edge information for the linear or T-shaped structures present in local image windows. HED is based on VGGNet; the network is modified by adding a side output layer after the last convolutional layer of each stage, followed by supervised learning. The side output layers are multi-scale and multi-level, and the final detection result is predicted through a weighted fusion layer. EDTER applies the ViT (Vision Transformer) structure to edge detection, using the Transformer to obtain global contextual information and local detailed cues and to extract meaningful target boundaries. When these methods are applied to building detection in remote sensing images, they tend to overlook key details such as the shape and details of building roofs. The detection results generally contain only the rough outline of the building area, and the detailed information of the roof edges is lost.
Methods that extract building boundaries using deep semantic segmentation networks [7] [8] [9] achieve state-of-the-art segmentation results. However, this type of method can only segment the approximate area of a building, so only the outer boundaries of buildings are obtained and no interior edge information, such as that of the roof, is recovered.
In order to obtain more detailed and accurate edge information of building boundaries and roofs in remote sensing images, we propose an edge detection technology based on the nonsubsampled contourlet transform (NSCT) [10] [11] and tensor voting [12]. Our method is not learning-based, which means it requires neither a dataset nor lengthy model training. The second part of this paper introduces NSCT, the third part introduces tensor voting, and the fourth part presents the experimental results and analysis, which demonstrate the effectiveness of our method.

Image Transformation Based on NSCT
NSCT is a multi-scale and multi-angle image decomposition and transformation tool. It uses a nonsubsampled pyramid (NSP) filter to perform multi-scale decomposition while maintaining the image size, yielding a low-frequency component and high-frequency components in different subbands. It then uses a nonsubsampled directional filter bank (NSDFB) to perform multi-angle decomposition on the high-frequency subband images obtained from the NSP, producing directional sub-images at different angles. Together, these steps provide multi-scale and multi-angle information.
NSP and NSDFB are two different filtering stages. The former achieves multi-scale decomposition, while the latter achieves directional decomposition at multiple angles. It should be pointed out that there is no dependency relationship between NSP and NSDFB. In this paper, we use NSCT composed of NSP and NSDFB to achieve multi-scale and multi-angle decomposition of remote sensing images. The decomposition process is shown in Figure 1.
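The size-preserving property of the pyramid stage can be illustrated with a small sketch. This is a simplified stand-in, not the NSP filters used in NSCT: it applies an undecimated ("à trous") decomposition to a single row of pixels, with an assumed 3-tap smoothing kernel and periodic boundaries. The function name `atrous_pyramid` is hypothetical, and the directional stage (NSDFB) is not covered.

```python
def atrous_pyramid(signal, levels):
    """Undecimated multi-scale split of a 1-D signal.

    Returns (low, highs): the final low-pass band and one high-pass band
    per level. Every band has the same length as the input.
    """
    kernel = (0.25, 0.5, 0.25)      # assumed smoothing filter, not the NSCT one
    n = len(signal)
    approx = list(signal)
    highs = []
    for level in range(levels):
        step = 2 ** level           # spacing between filter taps doubles per level
        smooth = [
            sum(w * approx[(i + (k - 1) * step) % n] for k, w in enumerate(kernel))
            for i in range(n)       # periodic boundary handling
        ]
        highs.append([a - s for a, s in zip(approx, smooth)])
        approx = smooth
    return approx, highs


# Hypothetical row of pixel intensities crossing a bright roof region.
row = [0, 0, 0, 10, 10, 10, 0, 0]
low, highs = atrous_pyramid(row, levels=3)
recon = [low[i] + sum(h[i] for h in highs) for i in range(len(row))]
```

Because no band is downsampled, each high-frequency band stays aligned pixel by pixel with the input, and adding all bands back together recovers the original row. This alignment is what allows coefficients from different scales to be combined at the same position later on.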

Tensor Encoding and Tensor Voting
Assume the size of the original remote sensing image is $w \times h$. After NSCT decomposition, we obtain subband frequency coefficients that retain the spatial size $w \times h$, because NSCT involves no downsampling. Next, position encoding is performed on these subband coefficients to obtain second-order symmetric tensors at their respective positions. Then, the tensors of different scales and angles at the same position are weighted and summed to complete feature fusion, which is the process of tensor encoding. Finally, tensor voting is performed based on tensor voting theory [12], followed by tensor decomposition to obtain the edge features of the image.

Tensor Encoding
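Unlike hard thresholding, which yields only a simple binary image, tensor encoding can simultaneously encode the saliency information of different categories for the pixel at any position [13]. The encoded tensor is second-order symmetric. Therefore, by encoding the coefficient tensors of the different scales and angles at a position and taking their weighted sum, feature fusion yields a tensor that is again second-order symmetric.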
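A minimal sketch of this encoding and fusion step follows. The exact encoding formula is not recoverable from the text, so the sketch assumes a common choice: a coefficient of magnitude |c| in a subband with direction θ is encoded as the stick tensor |c|·d·dᵀ with d = (cos θ, sin θ). The names `encode_stick` and `fuse`, the sample coefficients, and the uniform weights are all hypothetical.

```python
import math


def encode_stick(coeff, theta):
    """Encode one subband coefficient as a 2x2 symmetric tensor.

    Assumption: the tensor is |coeff| * d d^T with d = (cos theta, sin theta).
    It is returned as (txx, txy, tyy), because txy equals tyx.
    """
    m = abs(coeff)
    c, s = math.cos(theta), math.sin(theta)
    return (m * c * c, m * c * s, m * s * s)


def fuse(tensors, weights):
    """Weighted sum of the tensors at one pixel position (feature fusion)."""
    return tuple(
        sum(w * t[k] for t, w in zip(tensors, weights)) for k in range(3)
    )


# Hypothetical coefficients of one pixel in a 4-direction subband set.
angles = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
coeffs = [0.9, 0.1, -0.05, 0.1]
fused = fuse([encode_stick(c, a) for c, a in zip(coeffs, angles)], [1.0] * 4)
```

A weighted sum of symmetric tensors is itself symmetric, so three numbers per pixel are enough to store the fused result. In this example the 0-degree subband dominates, so the `txx` component of `fused` is much larger than `tyy`.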

Tensor Voting
Tensor voting is a process of accumulating field strength information. The tensor at any pixel position is affected by the neighbouring pixels in the image, and its eigenvalues and eigenvectors change accordingly. Usually, two different fields are generated, namely the stick tensor field and the ball tensor field. The quantities $s$, $c$ and $k$ in formula (2) are calculated by formulas (3), (4) and (5), respectively.
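For reference, the decay function commonly used in the tensor voting literature is reproduced below. It is assumed, not confirmed by the surviving text, that formulas (2) to (5) of this paper follow this standard form. The symbols $l$ (the distance between the voting point O and the receiving point P) and $\theta$ (the angle between the voter's tangent and the line OP) are likewise assumptions taken from that literature.

```latex
\begin{aligned}
DF(s, k, \sigma) &= \exp\!\left(-\frac{s^{2} + c\,k^{2}}{\sigma^{2}}\right) && (2)\\
s &= \frac{\theta\, l}{\sin\theta} && (3)\\
c &= \frac{-16\,\ln(0.1)\,(\sigma - 1)}{\pi^{2}} && (4)\\
k &= \frac{2\sin\theta}{l} && (5)
\end{aligned}
```

Under this form, $s$ is the length of the circular arc joining O and P, $k$ is its curvature, and $c$ balances the penalty on curvature against the penalty on distance, so that $\sigma$ remains the only free parameter.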

Ball Tensor Field
In tensor voting theory, the voting calculation of the stick tensor field is constrained by direction, while that of the ball tensor field is non-directional, which means that the receiving point P accepts the sum of the stick tensor votes cast in all directions from the voting point O. The calculation is shown in formula (6). In formula (6), $R_\theta$ is the rotation matrix, which represents the rotation angle of the stick tensor, and $S$ is the rotated stick tensor. In the actual calculation, this can be achieved through discrete approximation and summation.
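The discrete approximation mentioned above can be sketched as follows. The sketch assumes the standard stick vote from the tensor voting literature: a decay of exp(-(s² + c·k²)/σ²), a 45-degree voting cone, and a received normal taken from the circle through O and P. Rotating the voter's normal directly stands in for the rotation matrix of formula (6). The function names, the averaging over `n_dirs` orientations, and the sample receiver position are hypothetical.

```python
import math


def stick_vote(px, py, nx, ny, sigma):
    """Vote cast by a unit stick voter at the origin with normal (nx, ny)
    to a receiver at (px, py). Returns (txx, txy, tyy)."""
    l = math.hypot(px, py)
    if l == 0.0:
        return (nx * nx, nx * ny, ny * ny)
    v = px * nx + py * ny                     # receiver offset along the normal
    sin_t = min(1.0, abs(v) / l)
    theta = math.asin(sin_t)                  # angle between tangent and OP
    if theta > math.pi / 4 + 1e-12:           # outside the assumed 45-degree cone
        return (0.0, 0.0, 0.0)
    s = l if sin_t == 0.0 else theta * l / sin_t    # arc length
    kappa = 2.0 * sin_t / l                         # curvature
    c = -16.0 * math.log(0.1) * (sigma - 1.0) / math.pi ** 2
    decay = math.exp(-(s * s + c * kappa * kappa) / sigma ** 2)
    # Unit normal at the receiver of the circle through O and P.
    rx = (2.0 * v * px - l * l * nx) / (l * l)
    ry = (2.0 * v * py - l * l * ny) / (l * l)
    return (decay * rx * rx, decay * rx * ry, decay * ry * ry)


def ball_vote(px, py, sigma, n_dirs=36):
    """Approximate the ball vote as the mean of stick votes over n_dirs
    voter orientations spread evenly over half a turn."""
    acc = [0.0, 0.0, 0.0]
    for k in range(n_dirs):
        phi = math.pi * k / n_dirs
        vote = stick_vote(px, py, math.cos(phi), math.sin(phi), sigma)
        for j in range(3):
            acc[j] += vote[j] / n_dirs
    return tuple(acc)


straight = stick_vote(2.0, 0.0, 0.0, 1.0, sigma=3.0)
ball = ball_vote(2.0, 0.0, sigma=3.0)
```

Half a turn is enough because a normal and its negative describe the same stick tensor. For a receiver on the x-axis, the off-diagonal term of `ball` cancels by symmetry, and the component perpendicular to OP is the larger one.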

Tensor Decomposition
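Let $T$ denote the encoding information at position $(x, y)$ after tensor voting. If $\lambda_1 \ge \lambda_2 \ge 0$ are the two eigenvalues of $T$, and $e_1$ and $e_2$ are the corresponding eigenvectors, then according to tensor voting theory $T$ can be expressed as formula (7).
$$T = \lambda_1 e_1 e_1^{T} + \lambda_2 e_2 e_2^{T} = (\lambda_1 - \lambda_2)\, e_1 e_1^{T} + \lambda_2 \left(e_1 e_1^{T} + e_2 e_2^{T}\right) \tag{7}$$
In formula (7), $(\lambda_1 - \lambda_2)\, e_1 e_1^{T}$ is a stick tensor for linear features, and $\lambda_2 (e_1 e_1^{T} + e_2 e_2^{T})$ is a ball tensor for point-like features. In edge detection, we mainly focus on linear features, so the final $(\lambda_1 - \lambda_2)$ is the edge feature desired in this paper.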
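For a 2×2 symmetric tensor, this decomposition has a closed form, sketched below. The function name `decompose` and the sample tensor are hypothetical. The tensor is passed as its three independent entries (txx, txy, tyy).

```python
import math


def decompose(txx, txy, tyy):
    """Split a 2x2 symmetric tensor into stick and ball saliency.

    Returns (lam1 - lam2, lam2, angle), where angle is the direction of
    the principal eigenvector e1 in radians.
    """
    mean = 0.5 * (txx + tyy)
    radius = math.hypot(0.5 * (txx - tyy), txy)
    lam1, lam2 = mean + radius, mean - radius       # lam1 >= lam2
    angle = 0.5 * math.atan2(2.0 * txy, txx - tyy)
    return lam1 - lam2, lam2, angle


# Hypothetical voted tensor at one pixel.
stick, ball, angle = decompose(1.0, 0.0, 0.15)

# An isotropic tensor has no preferred direction, so its stick part vanishes.
iso_stick, iso_ball, _ = decompose(1.0, 0.0, 1.0)
```

Applying `decompose` at every pixel and keeping only the first return value gives a map of λ1 − λ2, which is the quantity the text uses as the edge feature.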

Experiments
To verify the effectiveness of the algorithm proposed in this paper, we selected multiple remote sensing images from different datasets for experiments. All experiments were completed on a PC with a 3.4 GHz Intel Core i7 CPU and 32 GB of memory, in the MATLAB R2020b programming environment. Three representative images were selected from the Inria Aerial dataset [14], which has a spatial resolution of 0.3 meters. Three images with the same resolution were manually downloaded from the Chinese Ancient Architecture Style Remote Sensing Database [15]. All images have the same size of $500 \times 500$ pixels. We set the free variable σ in formula (2) to 3.

Results
To avoid noise interference, Gaussian filtering is first used to smooth the original image to a certain extent. Then NSCT combined with tensor voting is used for decomposition and edge feature extraction. Three images from Inria Aerial were compared using Canny, Fast Edge, HED, and our method.
The experimental results are shown in Figure 3. In Figure 3, columns (a) show the original images, while columns (b) to (e) show the results of Canny, Fast Edge, HED and our method, respectively. Compared with the other three methods, ours detects the edges of buildings accurately and in detail, and also detects fine edge information such as the ridge lines of building roofs, from which the approximate shape of the roof can be determined. In contrast, the other three methods fail to obtain the edge information of the roof, and the building boundaries they produce are rough and imprecise.
To further demonstrate the effectiveness of the proposed method, three images with a resolution of 0.3 m were selected from the Chinese Ancient Architecture Style Remote Sensing Database [15], as shown in Figure 4. Compared with the other three methods, ours can still detect building edges accurately and in detail on these images. At the same time, it can also detect fine edge information such as ridges on building roofs. Taking the last image in the first column as an example, the result of our method shows that it is a Chinese tower-style building, whereas it is difficult to determine the style and type of the building from the results of the other three methods.

Experimental Analysis and Evaluation
We use NSCT to perform multi-scale and multi-angle decomposition of the image, obtaining directional sub-band coefficients in 2, 4, and 8 directions. Then, we weight and sum these sub-band coefficients to complete fusion encoding, and finally obtain edge features through tensor voting. The experimental analysis and evaluation in this section address three issues. Firstly, we encode the 2-direction, 4-direction, and 8-direction sub-band coefficients individually (without summation) and then perform tensor voting, to verify the effectiveness of fusion. Secondly, we sum the sub-band coefficients but skip tensor voting, to verify the effectiveness of tensor voting. Thirdly, by calculating the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [16], we evaluate the results of Canny, Fast Edge, HED, and our method, showing that our method also outperforms the other methods on these two metrics.

The inferiority of encoding sub-band coefficients individually
Referring to Figure 1, we encode the sub-band coefficients of the 2-direction, 4-direction, and 8-direction decompositions individually, and calculate the anisotropic coefficient encoding as shown in formula (8), where $i \in \{2, 4, 8\}$ denotes the number of directions. We only select the last image in the first column of Figure 3 and of Figure 4 for explanation. The experimental results are shown in Figure 5. The first column of Figure 5 shows the original images.

The evaluation of PSNR and SSIM
In order to evaluate the performance of the detection algorithms quantitatively, two evaluation metrics, PSNR and SSIM, were selected to evaluate Canny, Fast Edge, HED, and our method. Detailed calculations of PSNR and SSIM can be found in the relevant reference [16]. Generally speaking, a larger PSNR indicates lower image noise and higher visual quality, while a larger SSIM indicates that the edge image is closer to the original image in terms of structure, brightness, and contrast. We calculate the PSNR and SSIM between each original image in Figure 3(a) and Figure 4(a) and the corresponding detection result, and the results are shown in Table 1 and Table 2. Our method improves PSNR by 0.98 and SSIM by 0.03 compared with HED. Because an edge detection image contains only edge information, it differs significantly from the original image visually, so the values of PSNR and SSIM are both very small. However, compared with the other three methods, our method still performs best on both PSNR and SSIM, further verifying that it obtains more accurate and detailed edge information from the image.
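As a reminder of how the first metric is computed, the sketch below evaluates PSNR between an original grayscale image and an edge map, using the usual definition 10·log10(peak² / MSE). The 8-bit peak value of 255, the nested-list image representation, the function name `psnr`, and the tiny sample images are assumptions for illustration. SSIM is omitted because it additionally requires local windowed statistics.

```python
import math


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (nested lists)."""
    diffs = [
        (r - t) ** 2
        for row_r, row_t in zip(reference, test)
        for r, t in zip(row_r, row_t)
    ]
    mse = sum(diffs) / len(diffs)       # mean squared error over all pixels
    if mse == 0:
        return math.inf                 # identical images
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical 2x2 images with intensities in [0, 255].
original = [[0, 255], [255, 0]]
edge_map = [[0, 250], [255, 10]]
value = psnr(original, edge_map)
```

Because the mean squared error sits in the denominator, an edge map that differs strongly from the original image yields a low PSNR, which is consistent with the small absolute values reported in the text.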

Conclusion
Common edge detection technologies are not accurate and detailed enough when detecting the boundaries of buildings in remote sensing images, and tend to ignore the shape and detailed edge information of building roofs. To address this issue, a building edge detection technology for remote sensing images based on the combination of NSCT and tensor voting is proposed. NSCT is used to decompose the image into fine subband frequency information at multiple scales and angles; position encoding is then performed, a weighted sum completes feature fusion, and finally tensor voting is performed to obtain the edge features of the image. The experimental results show that our method is effective. Compared with typical detection methods such as Canny, Fast Edge, and HED, our method performs better in terms of PSNR and SSIM. The detection results reflect the boundary information of buildings and the edge information of roofs more accurately and finely, which provides better support for analysing the types and styles of buildings in remote sensing images. Because remote sensing images often contain a high level of noise, the proposed algorithm is susceptible to noisy images. Therefore, the image should be denoised and filtered to a certain degree before the algorithm is applied.

Fig. 1. The NSCT structure adopted in this paper
Firstly, we decompose the remote sensing image into two parts with the NSP: the high-frequency subband $H_1(z)$ and the low-frequency subband $H_0(z)$. The high-frequency subband $H_1(z)$ is further decomposed using NSDFB to obtain multiple directional subbands. Then, the low-frequency subband $H_0(z)$ obtained from the first NSP decomposition is subjected to the second NSP decomposition and the subsequent NSDFB decomposition. This process is repeated iteratively until the preset numbers of scale decomposition levels and directional subbands are reached. We adopt three levels of scale decomposition and three levels of directional decomposition in this paper. The final outputs are one low-frequency subband $y_1$ and three high-frequency subbands $y_2$, $y_3$ and $y_4$. The subband frequency coefficient obtained through NSCT decomposition for the pixel at any position $(x, y)$ in the original image is indexed by scale and direction, where $j$ denotes the $j$-th scale and $\theta_i$ denotes the $i$-th angular direction. Tensor encoding is then performed on these subband frequency coefficients.

Fig. 2. Schematic diagram of stick tensor voting
We can define the voting function as shown in formula (2), where σ is the scale factor of voting and the only free variable in the formula, $c$ is a function of σ, and $k$ is the curvature of the arc.

Here $\lambda_1$ and $\lambda_2$, with $\lambda_1 \ge \lambda_2 \ge 0$, are the two eigenvalues of the tensor $T$, and $e_1$ and $e_2$ are the corresponding eigenvectors, as used in the spectral decomposition of tensor voting theory.

Fig. 3. Typical image comparison experiments in the Inria Aerial dataset

Fig. 4. Typical image comparison experiments from the Chinese Ancient Architecture Style Remote Sensing Database. Columns (a) show the original images, and columns (b) to (e) show the results of Canny, Fast Edge, HED and our method, respectively. Compared with the images in the Inria Aerial dataset, the images from this database are noisier, less clear, and more difficult for edge detection.

Figure 5(b)-(d) show the results of individually encoding and decomposing the 2-direction, 4-direction, and 8-direction subband coefficients corresponding to Figure 5(a). Figure 5(e) shows the final results after multi-scale and multi-directional fusion. It can be seen that the results obtained through multi-scale and multi-directional fusion are more refined and accurate, which demonstrates the effectiveness of the fusion.

Fig. 5. Subband coefficients encoded individually

The validity of tensor voting
Taking the images in Figure 5(a) as an example, Figure 6(a) and (b) show the sums of the subband coefficients corresponding to Figure 5(a). Without tensor voting, their edge features are not obvious. For a better visual experience, we have enlarged parts of the images.

Fig. 6. Sum of subband coefficients before tensor voting

Table 1. Calculation results of PSNR

Table 2. Calculation results of SSIM