A deep learning-based method for detecting dust on solar PV panels using Unmanned Aerial Vehicles

As the number of solar photovoltaic (PV) panels grows, detecting dust on the panels becomes increasingly important. In this paper, we propose a deep learning-based method that detects dust on solar PV panels using Unmanned Aerial Vehicles (UAVs). The model applies an improved YOLOv5 method to detect PV panel dust in aerial images. It is lightweight, requires fewer computing resources and less time, and can run in real time on an ordinary CPU. Moreover, a prediction head is added to YOLOv5 to cope with the significant changes in target scale that arise when UAVs capture images at different altitudes, and new tricks are introduced to help detect dust targets in images covering large areas. Experimental validation shows that the proposed method outperforms state-of-the-art methods in detection accuracy, detection speed, F1 score, and other metrics, making it well suited for UAV-based inspection of dust on PV panels.


Introduction
In recent years, growing energy demand and the urgency of addressing climate change and global warming have drawn increasing attention to the development and utilization of renewable energy. Investigations show that the energy the sun radiates to the Earth is tens of thousands of times greater than the energy provided by other sources. Because solar energy is clean and safe, abundant, widely distributed, and available over the long term, it is considered one of the important means of achieving carbon neutrality [1].
As solar energy is deployed on a large scale as a clean energy source, the number of solar PV panels keeps increasing. Besides the degradation caused by manufacturing, transportation, and operation in harsh environments, dust on the surface of PV panels also reduces the efficiency and performance of PV power generation. To ensure the efficient and reliable operation of PV plants, regular dust detection and cleaning are required.
For the inspection of photovoltaic panels, the traditional working method requires staff to climb onto the roof and visually check the status of the panels. This method is not only time-consuming and laborious but also carries certain risks. In contrast, UAV-based aerial inspection is a non-contact method that can be applied directly on-site during the normal operation of a photovoltaic system. Therefore, more and more researchers are using UAVs to conduct efficient, safe, and accurate inspections of photovoltaic panels and their related components.
However, current inspection UAVs are typically used only as cameras. This solution has poor timeliness: if an emergency occurs, it cannot be handled immediately, which may cause unnecessary losses. In addition, because UAVs fly at different altitudes, the target scale varies greatly, and existing detection models struggle to detect targets accurately, which places higher demands on the construction and optimization of the network model.
To address the above challenges, we design a lightweight, multi-scale solar PV panel dust detection scheme deployed on UAVs. The specific contributions of this study are: 1) a lightweight PV panel dust detection method that can be deployed on drones for real-time detection at inspection sites; 2) a multi-scale target detection method for PV panel dust that achieves accurate detection when UAVs fly at different heights; 3) an attention mechanism that makes it easier to locate PV panel dust in images covering large areas.

Related Work
Scholars have carried out related work in recent years on monitoring and inspecting the status of PV panels [2]-[6]. In terms of object detection, the popular detection methods fall into three main modes.
The first is based on image processing. In [7], the researchers used Gaussian filtering for edge detection, applied to pattern recognition of the boundaries of PV panels and their components and to the detection of dust and other targets. However, such algorithms are fixed, image processing requires certain prior knowledge, and robustness is poor, so this mode is not suitable for UAV inspection.
The second mode is based on machine learning. It requires manually designed features and distinguishes targets according to the features extracted from the image. Its accuracy is higher than that of image-processing-based methods, but its timeliness is still relatively poor. In [8], an improved K-nearest neighbor (KNN) algorithm is used to iteratively approximate the difference between the theoretical and actual output of photovoltaic panels.
The third mode, which is also the most widely used at present, employs deep Convolutional Neural Networks (CNNs) [9]. CNNs have shown clear advantages in visual recognition tasks, automatically extracting features and performing in-depth analysis and recognition. In [10] and [11], different levels of dust coverage were simulated, and CNNs were used to evaluate the pollution degree of photovoltaic panels from the extracted features. In [12], scholars employed the Visual Geometry Group network (VGG) to extract features from PV panel images collected by UAVs. However, VGG has on the order of a hundred million trainable parameters, and inference demands very high computational resources. Although such models ensure detection accuracy, their real-time performance is poor, so a model with low complexity and high real-time performance is needed for deployment on UAVs.
The popular real-time object detection algorithms at present are the YOLO series, including YOLO, YOLOv3, YOLOv4, and YOLOv5 [13]-[16]. In [17], scholars detect photovoltaic power stations in India from UAV inspection images using the YOLO method. Among the YOLO series, YOLOv5 is the most widely used, and many advanced algorithms build on it.

Workflow Design
This section introduces the design of the workflow. The traditional method transmits the video stream captured by the UAV to a cloud server, where a network model performs inference and detection. In contrast, the method in this paper uses a lightweight model to perform edge inference during UAV inspection of PV panels, ensuring real-time detection of dust on the panels. This not only greatly reduces the bandwidth and time required to upload images to the cloud, but also ensures timeliness. The overall workflow is shown in Figure 1.

First, we build a dataset of PV panel dust images taken by UAVs and label them with an annotation tool. To cope with data scarcity and ensure the performance of the dust detection model, Mosaic augmentation [18] is adopted in this paper: it enriches the training data by stitching together four randomly scaled and cropped images. The improved YOLOv5 network is covered in detail in the next section.

Model training is a learning process in which the neurons of a convolutional neural network capture image features by learning parameters such as weights and biases. Model optimization continually updates the trainable weights and biases so that the loss decreases and the model's predictions approach the ground-truth labels, ensuring accuracy and robustness. The optimizer selected in this paper is SGD, and the loss function is composed of classification, regression, and confidence terms. The formulas are as follows.
$$L = \frac{1}{N}\sum_{n=1}^{N} l_n(x, y)$$

where $l_n$ is the loss corresponding to the $n$th sample, $x$ and $y$ respectively represent the input and the corresponding predicted output, and $N$ is the number of samples. The evaluation metrics are

$$\text{precision} = \frac{TruePositive}{TruePositive + FalsePositive} \qquad (5)$$

$$\text{recall} = \frac{TruePositive}{TruePositive + FalseNegative} \qquad (6)$$

where TruePositive is the number of positive samples predicted correctly, FalseNegative is the number of positive samples wrongly predicted as negative, and FalsePositive is the number of negative samples wrongly predicted as positive. If the mAP of the model meets the threshold requirement, the model is next evaluated on the test data; otherwise, it continues to optimize its parameters.
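The precision, recall, and F1 metrics above can be computed directly from the detection counts. The following is a minimal sketch (the function name and signature are illustrative, not from the paper):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 score from detection counts.

    tp: dust regions detected correctly (true positives)
    fp: background wrongly predicted as dust (false positives)
    fn: dust regions the model missed (false negatives)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, 80 correct detections with 20 false alarms and 20 misses give precision, recall, and F1 all equal to 0.8.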

Network architecture
The improved YOLOv5 network in this paper is composed of three parts: backbone, neck, and detection head. The structural details are shown in Figure 2.
The backbone combines Focus, Conv, C3, SPP, and Transformer encoder modules and is responsible for extracting features from the input image. First, Focus slices the image, turning the 3-channel RGB input into 12 channels and producing a two-fold down-sampled feature map without information loss. The Conv module consists of Conv2d, Batch Normalization, and the SiLU activation function. Transformer encoder and C3 modules replace some of the convolution blocks and CSP bottleneck blocks of the original YOLOv5. C3 is a lightweight convolutional module composed of two branches, used to increase the nonlinear capacity of the model and extract image features. At the end of the backbone and the beginning of the neck is a Transformer encoder, which uses multi-head self-attention to effectively capture global information and extract target features from low-resolution feature maps.
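The Focus slicing step described above can be sketched in a few lines. This numpy version shows only the lossless pixel rearrangement (the real YOLOv5 Focus module follows it with a convolution, which is omitted here):

```python
import numpy as np

def focus_slice(x):
    """Focus slicing: (H, W, C) -> (H/2, W/2, 4C) with no information loss.

    Every 2x2 pixel block is split into four channel groups, so a
    3-channel RGB image becomes a 12-channel, half-resolution map.
    """
    return np.concatenate(
        [x[::2, ::2], x[1::2, ::2], x[::2, 1::2], x[1::2, 1::2]], axis=-1
    )
```

Because the four slices partition the pixels, every input value appears exactly once in the output, which is why the down-sampling loses no information.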
Meanwhile, to overcome the vanishing-gradient problem as network depth increases and to ensure the effective combination of shallow and deep features, residual connections are adopted in the network structure [19].
The Neck is designed to make better use of the different levels of features extracted by the Backbone. By combining the extracted semantic features with positioning information, the performance of the detector can be optimized. The Neck in this paper adopts FPN + PAN. FPN works top-down, fusing up-sampled high-level features with lower-level features to ensure full integration of semantic information at different scales. PAN, in turn, passes location information from shallow to deep layers, enhancing multi-scale localization in deep network structures. In the Neck, this paper also adds CBAM, a lightweight module combining channel attention and spatial attention. It can substantially improve model performance at a small cost in computation and parameters, and it can be embedded in any CNN.
The Channel Attention Module mainly identifies the meaningful features of the input image, while the Spatial Attention Module extracts its rich spatial information. The two complement each other to ensure that the useful information in the input image is fully captured. Moreover, when extracting feature-layer information, the Transformer encoder module and CBAM are combined to quickly locate the dust targets to be detected in the wide-coverage geographic images taken by UAVs.
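The two attention stages can be sketched as follows. This is a simplified numpy illustration of the CBAM idea, not the paper's implementation: the learned 7x7 convolution in the spatial branch is replaced here by a plain average of the mean and max maps for brevity, and the MLP weights `w1`/`w2` are placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(x, w1, w2):
    """Simplified CBAM sketch on a (C, H, W) feature map.

    Channel attention: average- and max-pooled channel descriptors pass
    through a shared two-layer MLP (w1: C->C//r, w2: C//r->C), are summed,
    and squashed with a sigmoid to weight each channel.
    Spatial attention: channel-wise mean and max maps are combined; the
    real module applies a learned 7x7 conv here, simplified to an average.
    """
    # --- channel attention ---
    avg_desc = x.mean(axis=(1, 2))               # (C,)
    max_desc = x.max(axis=(1, 2))                # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # shared MLP with ReLU
    ch_att = sigmoid(mlp(avg_desc) + mlp(max_desc))   # (C,) in (0, 1)
    x = x * ch_att[:, None, None]
    # --- spatial attention ---
    sp_att = sigmoid((x.mean(axis=0) + x.max(axis=0)) / 2.0)  # (H, W)
    return x * sp_att[None, :, :]
```

Because both attention maps lie in (0, 1), the module only re-weights the feature map; it never amplifies activations, which is what keeps its cost low.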
Compared with YOLOv5, a new detection head is added to the head network. Although this increases the computational cost, it allows finer-grained targets to be detected, which is very beneficial for the small targets seen by UAVs inspecting at high altitudes.
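To see why an extra head helps small targets, consider the prediction grid at each scale. Standard YOLOv5 predicts at strides 8, 16, and 32; a common choice for an added small-object head (assumed here, as the paper does not state the stride) is stride 4, which quadruples the number of grid cells relative to the stride-8 head:

```python
def head_grid_sizes(img_size=640, strides=(4, 8, 16, 32)):
    """Grid resolution of each prediction head for a square input.

    An object only a few pixels wide at 640x640 may occupy less than one
    cell on the 20x20 stride-32 grid, but several cells on the 160x160
    stride-4 grid, making it far easier to localize.
    """
    return {s: img_size // s for s in strides}
```

For a 640x640 input this yields grids of 160, 80, 40, and 20 cells per side for strides 4, 8, 16, and 32 respectively.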

Experimental configuration
The experimental configuration is shown in Table 1.

Experimental analysis
The dataset used in the experiments includes both images of entire dust-covered solar PV panel areas and images of local dust, corresponding to the targets of different sizes captured by UAVs at different flight altitudes. There are 5160 images of local dust and 1220 images of entire dust-covered areas. The dataset is divided into training, validation, and test sets at a ratio of 3:1:1.
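The 3:1:1 split can be sketched as a shuffled partition. The helper below is illustrative (its name and seed are not from the paper); for the full set of 5160 + 1220 = 6380 images it yields 3828/1276/1276 samples:

```python
import random

def split_dataset(items, ratios=(3, 1, 1), seed=0):
    """Shuffle a collection and split it into train/val/test by ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Shuffling before splitting matters here because the two image types (local dust vs. whole-panel dust) would otherwise end up concentrated in different subsets.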
Since the method designed in this paper improves on the popular YOLOv5s, its performance is verified by comparison with YOLOv5s and related improved strategies, namely YOLOv5s-CBAM and YOLOv5s-Transformer. The training parameters are as follows: the batch size is set to 16, the learning rate is 0.01, and the models are trained for 100 epochs. Because the performance of the compared methods barely changes after the 60th epoch, only the evaluation indicators of the first 60 epochs are analyzed and compared.
Figure 3 shows that the method designed in this paper reaches its mAP peak faster than YOLOv5s-CBAM and YOLOv5s-Transformer, and its peak mAP is higher, indicating stronger learning ability. The original YOLOv5s lags far behind the three improved versions in learning ability. Table 2 shows the evaluation indicators of the above four methods: Recall, Precision, mAP, and F1 score. In terms of recall, the method in this paper is 0.57% lower than YOLOv5s-CBAM, but it performs best in all other aspects. In terms of F1 score, it is 9.1%, 0.36%, and 1.33% better than the other three methods, respectively. Overall, our proposed approach achieves the best results.

Figure 1. Workflow of the proposed method: dataset construction, annotation, and augmentation; Loss_obj adopts BCE and Loss_box adopts IoU.

Figure 2. The structure of the improved YOLOv5.

Figure 3. The mAP curves of the compared methods.

Figure 4. Detection results of the proposed method on solar PV panel dust. (a) An entire large patch of dust, which is relatively easy to detect; the method detects it clearly and accurately. (b) A small local area of dust; the method still recognizes such small targets well.

Table 1. Experimental environment configuration. Experiments were run on a 24 GB NVIDIA GeForce RTX 3090 under Ubuntu 18.04 with PyTorch 1.10.2+cu113.

Table 2. Evaluation indicators of the compared methods.