Insulator defect detection algorithm based on a lightweight network

Insulators are key components of transmission lines, and the identification and detection of insulator defects directly affect the stable operation of those lines. To improve the efficiency of locating insulators and their defects, a faster defect detection algorithm based on YOLOv5 is proposed. First, a lightweight Ghost module is introduced into the YOLOv5 backbone network, which significantly improves detection speed while maintaining accuracy. Second, CBAM is introduced into the YOLOv5 Neck network to further improve detection accuracy. The experimental results show that the improved model is smaller than the original YOLOv5 network and that detection speed improves greatly while detection accuracy is maintained, which is of great significance to power grid operation and maintenance.


1. Introduction
The insulator is a key component of the transmission line. Its main role is to support the wire and insulate it from the pole and tower, providing both electrical insulation and mechanical support [1]. Insulator state detection is therefore an important guarantee of the stable operation of power transmission lines. Because insulators are exposed to the outdoor environment, weather and other factors easily damage the insulator string and produce defects [2]. Such defects seriously affect the safe and stable operation of the power grid and also pose a great threat to the social economy [3]. Therefore, insulator defect detection is of great significance for power grid operation and maintenance.
With the rapid development of deep learning, various deep-learning-based object detection algorithms have been proposed and widely used in insulator detection [4]. In reference [5], a convolution layer in the original YOLOv3 network was replaced with DenseNet, which realized multilayer feature fusion of insulator images and improved detection accuracy. In reference [6], a new residual block with 4× downsampling, which is friendly to small targets, was added between the second and third residual blocks of Darknet-53, the feature extraction network of YOLOv3, to improve the detection accuracy of small targets. To address the difficulty of small target detection, reference [7] combined the non-maximum suppression algorithm, a sliding window, and the YOLOv4 neural network to improve the accuracy of small target detection in insulator defect areas. Reference [8] proposed an intelligent identification algorithm for insulator defects in HSR catenary based on deep learning: an EfficientNet-B3 network was used to discriminate the insulator status, and a YOLOv5 model was then used to accurately detect insulator defect information, which improved the accuracy of intelligent identification of insulator defects in catenary.
In YOLOv5, the Input network preprocesses images and comprises three parts: adaptive anchor box calculation, adaptive image padding, and Mosaic [9] data augmentation. It resizes the original image to a uniform size of 608×608×3 so that the subsequent networks can process it further. The Backbone network extracts features from the images and contains the Focus structure and the C3 structure. The Focus structure performs a slicing operation.
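The slicing performed by the Focus structure can be illustrated with a minimal sketch in plain Python. It assumes the image is stored as nested lists in height × width × channel order; the function name `focus_slice` and the toy 4×4×3 input are hypothetical and chosen only for illustration. Each 2×2 pixel neighbourhood is folded into the channel axis, so the spatial size halves and the channel count quadruples without discarding any pixel.

```python
def focus_slice(image):
    """Rearrange an H x W x C image (nested lists) into H/2 x W/2 x 4C."""
    height, width = len(image), len(image[0])
    assert height % 2 == 0 and width % 2 == 0, "H and W must be even"
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Concatenate the four pixels of the 2x2 block along channels.
            row.append(image[y][x] + image[y + 1][x]
                       + image[y][x + 1] + image[y + 1][x + 1])
        out.append(row)
    return out


# Toy 4 x 4 x 3 "image" whose pixel values encode their own position.
toy = [[[y, x, 0] for x in range(4)] for y in range(4)]
sliced = focus_slice(toy)
focus_shape = (len(sliced), len(sliced[0]), len(sliced[0][0]))
print(focus_shape)  # (2, 2, 12)
```

By the same arithmetic, a 608×608×3 input becomes 304×304×12 after slicing.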
The 608×608×3 input image is first sliced into a 304×304×12 feature map, which is then convolved with 32 convolution kernels to produce a new 304×304×32 feature map. The C3 structure performs local cross-layer fusion, using the feature information of different layers to obtain richer feature maps. The Neck network generates feature pyramids and contains FPN [10] and PAN [11]. FPN conveys strong high-level semantic features from top to bottom, and PAN complements FPN by conveying strong localization features from bottom to top. Used together, they strengthen the feature fusion capability of the network. The Head network performs the final detection and includes the loss function of the prediction box and the NMS [12] (non-maximum suppression) algorithm. YOLOv5 uses GIoU_Loss as the loss function of the prediction box, which effectively handles the case where bounding boxes do not overlap and improves the speed and accuracy of prediction box regression. NMS is used to filter the candidate detection boxes in the prediction stage.
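The two Head components can be sketched in plain Python. GIoU extends IoU with a penalty based on the smallest box enclosing both boxes, and the regression loss is commonly taken as 1 − GIoU; NMS is shown in its greedy form. The function names, the (x1, y1, x2, y2) box format, the 0.5 threshold, and the sample boxes are assumptions for illustration, not the YOLOv5 implementation, and boxes are assumed to have positive area.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; empty boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes with positive area."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    # Smallest axis-aligned box that encloses both a and b.
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3])))
    iou = inter / union
    return iou, iou - (hull - union) / hull


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep a box unless it overlaps a higher-scoring kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_and_giou(boxes[i], boxes[k])[0] <= iou_threshold
               for k in keep):
            keep.append(i)
    return keep


# Non-overlapping boxes: IoU is 0, but GIoU still reflects their distance.
iou_far, giou_far = iou_and_giou((0, 0, 1, 1), (2, 0, 3, 1))
giou_loss_far = 1.0 - giou_far

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])
print(kept)  # [0, 2]
```

For the two separated boxes the plain IoU is 0 regardless of how far apart they are, whereas GIoU is negative and moves toward 0 as the boxes approach each other, which is why it remains usable as a regression signal.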

Improved insulator detection model
Currently, in practical insulator detection applications, detection models are mostly deployed on mobile or embedded devices, which cannot provide sufficient computing power for large network models. YOLOv5 has a large number of network parameters, and its feature extraction network produces many redundant feature maps during convolution, which consumes considerable hardware computing resources and lengthens the detection time per image. To further improve the detection speed of the model, the network structure of YOLOv5 is improved.
First, CBAM [13] is introduced into the Neck network; its structure is shown in Figure 3. The CBAM module contains two submodules, the Channel Attention Module [14] and the Spatial Attention Module [15]. The channel attention weight $M_c$ is calculated as Formula (1), where $\sigma$ is the sigmoid activation function, $\mathrm{MLP}$ is the shared multilayer perceptron, and $F^c_{avg}$ and $F^c_{max}$ are the channel descriptors obtained from the input feature $F$ by global average pooling and global max pooling.
$$M_c(F) = \sigma\big(\mathrm{MLP}(F^c_{avg}) + \mathrm{MLP}(F^c_{max})\big) \tag{1}$$
The Spatial Attention Module focuses on spatial position and determines which locations contain the main information about the target. Its basic principle is as follows. First, the input feature $F$ is passed through the average pooling layer and the max pooling layer to obtain the spatial features $F^s_{avg}$ and $F^s_{max}$. Then $F^s_{avg}$ and $F^s_{max}$ are concatenated to form a new spatial feature map with 2 channels. Finally, the new spatial feature map is convolved and the sigmoid activation function is applied to obtain the spatial attention weight feature map $M_s$. $M_s$ is calculated as Formula (2), where $\sigma$ is the sigmoid activation function and $f^{7\times7}$ represents a convolution layer with a 7×7 convolution kernel.
$$M_s(F) = \sigma\big(f^{7\times7}([F^s_{avg};\, F^s_{max}])\big) \tag{2}$$
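The channel and spatial attention weights can be sketched in plain Python on a small C × H × W feature map stored as nested lists. This is an illustrative sketch only: the weights `w0`, `w1`, and `kernel` are hypothetical constants rather than trained parameters, biases are omitted, zero padding is assumed for the convolution, and the two submodules are chained channel-first, which is the usual CBAM order.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(feat, w0, w1):
    """M_c: shared MLP over global avg- and max-pooled channel descriptors."""
    def mlp(vec):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec)))  # ReLU
                  for row in w0]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w1]

    flat = [[v for row in ch for v in row] for ch in feat]
    avg = mlp([sum(ch) / len(ch) for ch in flat])
    mx = mlp([max(ch) for ch in flat])
    return [sigmoid(a + m) for a, m in zip(avg, mx)]


def spatial_attention(feat, kernel):
    """M_s: k x k convolution over the [avg; max] maps taken across channels."""
    h, w, k = len(feat[0]), len(feat[0][0]), len(kernel[0])
    pad = k // 2
    pooled = [
        [[sum(ch[y][x] for ch in feat) / len(feat) for x in range(w)]
         for y in range(h)],
        [[max(ch[y][x] for ch in feat) for x in range(w)] for y in range(h)],
    ]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for c in range(2):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < h and 0 <= xx < w:  # zero padding
                            acc += kernel[c][dy][dx] * pooled[c][yy][xx]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def cbam(feat, w0, w1, kernel):
    """Apply channel attention, then spatial attention, to the feature map."""
    mc = channel_attention(feat, w0, w1)
    refined = [[[mc[c] * v for v in row] for row in ch]
               for c, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[y][x] * ch[y][x] for x in range(len(ch[0]))]
             for y in range(len(ch))] for ch in refined]


# Hypothetical 2 x 3 x 3 feature map and untrained illustrative weights.
feat = [[[float(c + y + x) for x in range(3)] for y in range(3)]
        for c in range(2)]
w0 = [[0.5, -0.5]]       # reduces 2 channels to 1 hidden unit
w1 = [[1.0], [-1.0]]     # expands back to 2 channels
kernel = [[[0.01] * 7 for _ in range(7)] for _ in range(2)]
out = cbam(feat, w0, w1, kernel)
```

Because both weights pass through a sigmoid, every output value is the input value scaled by a factor between 0 and 1; the module re-weights features rather than changing the shape of the feature map.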
In summary, the output feature $F''$ is calculated from the input feature $F$ as shown in Formula (3), where $M_c$ is the channel attention weight feature, $M_s$ is the spatial attention weight feature, and $\otimes$ denotes element-wise multiplication.
$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F' \tag{3}$$
The Conv module and C3 module in the backbone network of YOLOv5 are then replaced with the Ghost module [16] and the Ghost Bottleneck module. Through the above operations, the computation of the model in the feature extraction stage is reduced and the detection speed is improved significantly while the detection accuracy of the model is maintained. The Ghost module is a lightweight, efficient convolution module that splits a traditional convolution into two steps: first, a smaller number of traditional convolutions generate the intrinsic feature maps; then cheap linear operations are applied to these intrinsic feature maps to generate additional feature maps. Finally, the two sets of feature maps are concatenated to obtain the final output. The structure of the Ghost module is shown in Figure 4.
To further reduce the need for hardware resources, the C3 module in YOLOv5 can be replaced with the efficient Ghost Bottleneck module, as shown in Figure 4 (b). There are currently two main types of Ghost Bottleneck. Ghost Bottleneck 1 is a simple stack of two Ghost modules whose input and output are connected by a shortcut. Compared with Ghost Bottleneck 1, Ghost Bottleneck 2 adds a downsampling layer that reduces the size of the feature map, and the input and output of the two Ghost modules are again connected by a shortcut.
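The saving offered by the Ghost module can be checked with simple weight counting. The sketch below assumes that the cheap linear operation is a d × d depthwise convolution and that half of the output channels are generated cheaply (`ratio=2`); the channel counts and kernel sizes are hypothetical examples, and biases and normalization layers are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, ratio=2, d=3):
    """Weights of a Ghost module that outputs c_out channels.

    A primary k x k convolution creates c_out / ratio intrinsic maps; each
    intrinsic map then yields (ratio - 1) extra maps through a d x d
    depthwise convolution, and both groups are concatenated.
    """
    assert c_out % ratio == 0, "c_out must be divisible by ratio"
    intrinsic = c_out // ratio
    primary = conv_params(c_in, intrinsic, k)
    cheap = intrinsic * (ratio - 1) * d * d  # one d x d filter per extra map
    return primary + cheap


standard = conv_params(128, 256, 3)
ghost = ghost_params(128, 256, 3)
print(standard, ghost)  # 294912 148608
```

With these example sizes the Ghost module needs roughly half the weights of the standard convolution, which is the source of the reduction in model size and computation described above.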
Finally, the improved model is named YOLOv5_CG, where CG combines the initials of CBAM and the Ghost module.

The introduction of datasets
The dataset of defective insulators used in this study comes from the Internet. It contains 240 insulator images with a resolution of 1152×864 pixels. To avoid over-fitting during network training caused by too few samples, the dataset was augmented by randomly adding noise and randomly changing image brightness, yielding a total of 2400 images. LabelImg was then used to annotate the images: insulators were labeled "insulator" and their defects "defect". After annotation, the dataset was divided into a training set (90%) and a test set (10%).
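The augmentation and split can be sketched in plain Python as follows. The noise level, the brightness range, the choice of ten variants per source image, and the tiny 2×2 grayscale stand-in images are all assumptions for illustration; the paper does not state these settings.

```python
import random


def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a grayscale image (list of rows)."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in image]


def change_brightness(image, factor):
    """Scale pixel intensities and clamp them to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row]
            for row in image]


def augment(image, copies, rng):
    """Create `copies` variants, each by noise or by a brightness change."""
    out = []
    for _ in range(copies):
        if rng.random() < 0.5:
            out.append(add_noise(image, sigma=rng.uniform(5, 20), rng=rng))
        else:
            out.append(change_brightness(image, rng.uniform(0.6, 1.4)))
    return out


def split(items, train_fraction, rng):
    """Shuffle and split into training and test subsets."""
    items = list(items)
    rng.shuffle(items)
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
originals = [[[100, 150], [200, 250]] for _ in range(240)]  # stand-in images
dataset = [v for img in originals for v in augment(img, 10, rng)]
train, test = split(dataset, 0.9, rng)
print(len(dataset), len(train), len(test))  # 2400 2160 240
```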

Experimental evaluation indicators
mAP (mean average precision), FPS (number of images processed per second), and Model Size are used as evaluation indexes in this study. mAP is the mean of the average precision over all categories in the dataset. The calculation of mAP with N detection categories is shown in Formula (4), where P and R represent the precision and recall respectively, as shown in Formulas (5) and (6).
$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i, \qquad AP_i = \int_0^1 P_i(R)\,dR \tag{4}$$
$$P = \frac{TP}{TP + FP} \tag{5}$$
$$R = \frac{TP}{TP + FN} \tag{6}$$
TP is the number of samples correctly classified into a certain category; TN is the number of negative samples correctly predicted as negative; FP is the number of samples incorrectly assigned to a category; FN is the number of targets of a category that were missed. FPS represents the number of images processed per second; it depends not only on the computation of the algorithm model but also on the hardware performance during the experiment. Model Size represents the storage space that the algorithm model occupies, typically in MB.
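The metrics can be sketched in plain Python. The area under the precision-recall curve is computed here with all-point interpolation, which is one common convention and an assumption of this sketch; the detection lists are made-up toy values, not results from the experiments.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scored_hits, num_targets):
    """Area under the P-R curve for one class.

    scored_hits: (confidence, is_true_positive) for every detection.
    num_targets: number of ground-truth objects of this class.
    """
    tp = fp = 0
    points = []
    for _, hit in sorted(scored_hits, key=lambda d: d[0], reverse=True):
        tp, fp = tp + hit, fp + (not hit)
        points.append((tp / num_targets, tp / (tp + fp)))
    # Make precision non-increasing in recall (all-point interpolation).
    best = 0.0
    for i in range(len(points) - 1, -1, -1):
        best = max(best, points[i][1])
        points[i] = (points[i][0], best)
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy detections per class: (confidence, matched a ground-truth box?).
ap_insulator = average_precision(
    [(0.9, True), (0.8, True), (0.7, False), (0.6, True)], num_targets=4)
ap_defect = average_precision([(0.9, True), (0.5, False)], num_targets=2)
mean_ap = (ap_insulator + ap_defect) / 2
print(ap_insulator, ap_defect, mean_ap)  # 0.6875 0.5 0.59375
```

TN does not appear in either formula, which is why detection benchmarks report precision, recall, and mAP rather than plain accuracy.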

Experimental platform introduction and training parameter setting
The model in this paper is built on the PyTorch deep learning framework. The system configuration is NVIDIA TITAN RTX ×2 GPU (32G), i7-10700K ×2 CPU, Windows 10, CUDA 11.2, and Python 3.8. After the experimental environment is set up, the model is trained with the following main parameters: an input image size of 640×640, a batch size of 16, and 200 training epochs.
Table 1 compares the metrics of the two algorithms on the dataset. The experimental results show that the size of the improved model is only 66.04% of that of the original YOLOv5, its running speed on the GPU is 29.18% higher, and its mAP is 1.4% higher. Therefore, the improved model increases detection speed and reduces model parameters while maintaining detection accuracy.
Table 2 compares the detection confidence of YOLOv5 and YOLOv5_CG on the insulators in Figure 5. Table 2 shows that, compared with the original YOLOv5 network, YOLOv5_CG improves the average confidence in detecting insulators and their defect locations. The effectiveness of the improvements is therefore verified.

Conclusion
In actual insulator detection environments, devices such as drones cannot provide sufficient computing power for large network models. In this paper, we design and implement the lightweight YOLOv5_CG network model. The size of this model is only 66.04% of that of YOLOv5, it runs 29.18% faster on the GPU, and its mAP is 1.4% higher. It therefore greatly reduces the dependence on the hardware environment and can meet the needs of practical applications. The algorithm also has some limitations. In the next step, the diversity and complexity of the data will be increased to account for other related factors such as weather. At the same time, the network structure will be further improved, and subsequent work will address how to deploy the model efficiently on mobile devices and will verify and improve the proposed model in practical applications.