CDMnet: Cloud Detection in Remote Sensing Images Based on CNN

The main objective of cloud detection is to accurately distinguish cloud from non-cloud areas so that accurate surface information can be extracted. To achieve this, we propose a novel method called Cloud Detection Method Net (CDMnet). The method employs a lightweight network backbone that reduces computational cost relative to traditional classical algorithms such as SVM while maintaining accuracy. Additionally, we use dilated (atrous) convolutions with varying dilation rates to extract cloud features in parallel, and apply a spatial attention module (SAM) multiple times to refine cloud localization and obtain more accurate cloud masks. The experimental results show that our method is more efficient than other traditional methods.


Introduction
Cloud detection holds significant application value in various fields, including meteorology, climate research, and environmental monitoring [1]. However, cloud detection is challenging because clouds can resemble ground objects and other confounding features in remote sensing images, such as snow-covered mountains, lakes, and glaciers [2]. In response to these challenges, researchers have proposed various cloud detection methods based on feature extraction [3]. In particular, methods based on multi-feature extraction were developed to enhance the performance and robustness of cloud detection: by combining and analyzing diverse features, cloud regions can be identified and extracted more accurately.
For instance, Z. Shao et al. proposed a multi-scale feature-based Convolutional Neural Network (MF-CNN) capable of simultaneously detecting thin clouds, thick clouds, and non-cloud pixels in remote sensing images [4]. M. Alqahtani et al. presented a cloud detection model based on the XGBoost algorithm [5].
The similarity in texture and color between clouds and ground areas can lead to confusion and misclassification; for instance, ice- and snow-covered regions may appear just as white as cloud areas in an image. The method presented in this paper overcomes these challenges by analyzing and comparing the differences between the various features, producing excellent results under comparable conditions. Our contributions are as follows: (1) we introduce the Cloud Detection Method Net (CDMnet) and evaluate its performance on the GF-1 dataset, achieving favorable results; (2) we introduce the SAM, FEM, and ASPP modules to precisely determine the position of clouds and facilitate accurate detection.

CDMnet Model Network Architecture
The objective of this work is to devise a more precise method, Cloud Detection Method Net (CDMnet), for cloud recognition in remote sensing images. Taking inspiration from MobileNetV3, we adopt a Feature Extraction Module (FEM) as the network backbone to extract abundant spatial and high-level semantic information. The model takes a remote sensing image with three RGB channels as input and produces a cloud mask with two channels as output. Figure 1 illustrates the model architecture. In the feature extraction part, convolutions with different dilation rates extract cloud features in parallel to generate the cloud mask. Accurately locating clouds is another major challenge, so a Spatial Attention Module (SAM) is applied several times during mask generation to refine cloud positioning and improve the correctness of the generated masks. The resulting method is lightweight, distinguishes clouds from non-cloud regions effectively, and shows good results in our experiments.
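The data flow described above (3-channel RGB input to 2-channel cloud mask output) can be sketched at the shape level. Note that the stages below are random stand-ins for illustration only, not the actual FEM, SAM, or classification head of CDMnet:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cdmnet_forward(img, n_feat=16, seed=0):
    """Shape-level sketch: (3, H, W) RGB -> (2, H, W) cloud/non-cloud logits.

    The backbone, attention, and head below are random stand-ins that only
    reproduce the tensor shapes described in the text.
    """
    rng = np.random.default_rng(seed)
    _, h, w = img.shape
    feats = rng.standard_normal((n_feat, h, w))   # FEM backbone stand-in
    att = sigmoid(feats.mean(axis=0))             # SAM-style spatial map
    feats = feats * att[None, :, :]               # reweight features spatially
    head = rng.standard_normal((2, n_feat))       # 1x1 conv head stand-in
    return np.tensordot(head, feats, axes=1)      # (2, H, W) logits

mask_logits = cdmnet_forward(np.zeros((3, 8, 8)))
```

The two output channels correspond to the cloud and non-cloud classes; an argmax over the channel axis would yield the final binary mask.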

Other modules
Based on the strong performance of MobileNetV3 in cloud detection tasks, the model employs the MobileNetV3 module for cloud and non-cloud detection. MobileNetV3 is a lightweight network known for its efficiency: it incorporates a channel attention (SE) module and uses a NAS search approach to address the computational and memory requirements that traditional convolutional neural networks impose on mobile and embedded devices, while maintaining a high level of performance. Integrating MobileNetV3 into the cloud detection model presented in this paper therefore allows more accurate detection of both cloud and non-cloud regions, which contributes greatly to the overall results.
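The parallel dilated convolutions used for multi-scale feature extraction can be illustrated with a minimal NumPy sketch (single channel, 3x3 kernel, 'same' padding). The dilation rates and kernel here are illustrative, not the network's trained weights:

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Naive 3x3 dilated convolution with 'same' padding on a 2-D array."""
    h, w = x.shape
    xp = np.pad(x, rate)  # zero-pad by the dilation rate on every side
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # sample a 3x3 neighborhood whose taps are `rate` pixels apart
            patch = xp[i:i + 2 * rate + 1:rate, j:j + 2 * rate + 1:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def aspp_like(x, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several dilation rates and stack the maps."""
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])
```

Each rate enlarges the receptive field without adding parameters, so stacking the resulting maps captures local detail and wider context simultaneously, which is the idea behind ASPP-style modules.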

Dataset and Metrics
In this experiment, we use the publicly available remote sensing image dataset GF-1_WHU, published by the SENDIMAGE Lab of Wuhan University, which provides globally distributed validation images. The following five metrics are used to evaluate the experimental results.
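The five metrics are not enumerated in this section; a common choice for binary cloud masks is overall accuracy, precision, recall, F1 score, and IoU, which can be computed from the confusion counts as sketched below (the paper's exact metric set may differ):

```python
import numpy as np

def cloud_mask_metrics(pred, gt):
    """Common binary-mask metrics from confusion counts (cloud = 1)."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = np.sum(pred & gt)      # cloud pixels correctly detected
    fp = np.sum(pred & ~gt)     # non-cloud pixels flagged as cloud
    fn = np.sum(~pred & gt)     # cloud pixels missed
    tn = np.sum(~pred & ~gt)    # non-cloud pixels correctly rejected
    eps = 1e-12                 # avoid division by zero on empty masks
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn + eps),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall + eps),
        "iou": tp / (tp + fp + fn + eps),
    }
```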

Model Construction
In this experiment, the FEM serves as the basic network structure, which is lightweight and efficient. To effectively capture multi-scale features, the model extracts features with different receptive fields (via convolutions at different dilation rates) and integrates a SAM, ultimately improving the representation of cloud regions.
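A CBAM-style block (channel-wise average and max pooling fused into a single spatial map) is one common form such a spatial attention module can take. In this hypothetical sketch, a simple sum stands in for the learned 7x7 convolution that would normally fuse the two pooled maps:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x):
    """CBAM-style spatial attention on a (C, H, W) feature map.

    Pools across channels, fuses the two maps, and reweights every spatial
    position; a learned 7x7 conv would normally perform the fusion step.
    """
    avg_map = x.mean(axis=0)          # (H, W) channel-wise average pool
    max_map = x.max(axis=0)           # (H, W) channel-wise max pool
    att = sigmoid(avg_map + max_map)  # attention weights in (0, 1)
    return x * att[None, :, :]        # broadcast over channels
```

Because the attention map is computed per spatial position, repeated application of such a block can progressively sharpen where the network attends, which matches the paper's use of SAM several times during mask generation.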

Experimental Results and Comparative Evaluation
The experimental results demonstrate the substantial advantages of this method, surpassing traditional methods in effectiveness. The comparative findings are presented in Table 1.

Conclusion
Through this work we have developed a new method, CDMnet, aimed at extracting rich spatial and high-level semantic information from remote sensing images. By combining multi-scale feature extraction modules and dilated convolutions with different dilation rates, cloud features are extracted in parallel. Under the same evaluation criteria, the experimental comparison shows that the proposed method is superior to other methods such as support vector machines. In future research, we plan to explore new network models and continue improving our method for better cloud detection.
Figure 2 illustrates the structure diagram of the feature extraction module (FEM).

Figure 3
Figure 3 illustrates that in this experiment, SAM was used to extract spatial information specifically related to clouds in the image, thereby improving the representation of cloud regions. Figure 4 shows that we use dilated convolution at different rates to obtain receptive fields at different scales, enabling us to capture local and global contextual information in the image.

Table 1:
Experimental results