Transmission line defect detection based on feature fusion

Transmission lines are the lifeblood of the power system, and regular inspections are required to ensure their reliable operation. However, owing to the special high-voltage environment of transmission lines, the pictures collected by UAVs contain heavy background interference and small defect samples, which makes direct line inspection with deep-learning detectors difficult. For this reason, this paper proposes a high- and low-dimensional feature fusion framework that divides the target detection task into four modules; through the collaboration between modules, it greatly reduces the computational cost of the model while retaining high-dimensional features. This paper also introduces a window attention mechanism into YOLOv7, which allows the model to focus on local information and improves its detection of small targets, and introduces the Wise-IoU loss to improve the model's prediction accuracy and inference time. Experiments show that the proposed method significantly improves prediction accuracy while meeting the speed requirements of industrial scenarios.


Introduction
The power system is the core infrastructure of the national economy, and all aspects of economic development and social life depend on the power supply. As the power system expands, both its complexity and the need for inspection increase. Power transmission is one of its key parts. To ensure the stability of power transmission, staff need to carry out regular inspections and detect potential faults in time, so that transmission lines keep operating normally.
Current mainstream target detection algorithms fall into two types: the two-stage R-CNN series based on a region proposal network [1] [2], and the single-stage YOLO series [3] based on the regression idea. Along with the development of target detection algorithms, many deep-learning detectors have been applied to line inspection. Chen et al. used an improved Faster R-CNN to detect bolt defects [4], but detection is slow (only 22 FPS), which cannot satisfy real-time detection. Pan et al. expanded the dataset with a super-resolution reconstruction method [5], which alleviated the scarcity of defective data in industrial scenarios but caused model overfitting. Bao et al. detected damping hammers with Cascade R-CNN [6], but the model has many parameters and is difficult to deploy. Cheng et al. detected insulator defects based on YOLOv4 [7], but the detection accuracy is poor when defects are small or blurred.

The purpose of this paper is to construct a set of intelligent algorithms for line inspection that solve the problem of small, hard-to-detect targets in UAV inspection, reduce the workload of inspectors, and finally reach the goal of automated line inspection. The main contributions of this paper are as follows. 1. We propose a feature fusion framework that reduces the computational cost of the model while retaining high-dimensional features, improving the detection of small-scale targets without sacrificing efficiency. 2. We introduce a window attention mechanism to optimize the model, which enhances its ability to extract local features and improves its generalization ability and stability.

Transmission line defect detection process
Transmission line defect detection is generally accomplished through UAV inspection, as presented in Figure 1, and features three main steps: picture acquisition, model training, and defect detection. Before deployment, a large number of transmission line images must be gathered by UAV photography as the model dataset and used to train the model. After deployment, the inspection procedure runs as follows: images are captured by the UAV and transmitted over the network to the ground-based control center, where the inspection model analyzes them to identify any defective parts of the transmission line. In this process, the UAVs usually collect transmission line pictures with a resolution of about 5000 pixels, whereas the model resizes the picture to 640 pixels during detection, which causes pixel loss and blurred details. Furthermore, because defects frequently occur at a small scale and some are similar to normal samples, resizing the picture can significantly impair the model's ability to detect them. As a solution, we propose a fusion detection framework that incorporates high- and low-dimensional features.
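The scale of this resizing loss can be illustrated with a quick calculation; the 5000- and 640-pixel figures come from the text, while the 50-pixel defect size is a hypothetical example:

```python
# Back-of-envelope check of the resizing loss: the 5000- and 640-pixel
# figures are from the text; the 50-pixel defect is hypothetical.
def scaled_size(defect_px: float, src_px: int = 5000, dst_px: int = 640) -> float:
    """Size (in pixels) of a defect after uniformly resizing the image."""
    return defect_px * dst_px / src_px

print(scaled_size(50))  # a 50-pixel defect shrinks to 6.4 pixels
```

At this scale a defect spans only a handful of pixels, which is why direct detection on the resized image struggles.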

Feature fusion framework
The feature fusion detection framework is based on the concept of the region proposal network in Faster R-CNN [1]. It blends the image's high- and low-dimensional features between the region proposal network and the detection head. The framework divides the target detection task into four sub-modules; the structure is illustrated in Figure 2.
The region proposal module is responsible for extracting proposed regions from low-resolution images; these regions are the areas most likely to contain defective targets. We use YOLOv7 [8] as the foundation of the region proposal module. CIoU-loss, the standard loss function in YOLOv7, considers the overlap of the prediction box, the distance between centroids, and the aspect ratio. However, the definition of the aspect ratio term is ambiguous, and the balance between hard and easy samples is disregarded. Consequently, we employ WIoU-loss [9]; its v1 form is

L_WIoU = R_WIoU * L_IoU,  R_WIoU = exp(((x - x_gt)^2 + (y - y_gt)^2) / (W_g^2 + H_g^2)*),

where (x, y) and (x_gt, y_gt) are the centers of the predicted and ground-truth boxes, W_g and H_g are the width and height of the smallest box enclosing both, and the superscript * indicates the term is detached from the gradient computation. Because of the significant background interference in transmission line images, we use a window-attention-based model, Effective Model [10], to optimize the backbone network of the region proposal module. The window attention mechanism lets the model concentrate on local regions that contain targets and exclude background regions, enhancing the module's predictive accuracy.
The data generation module receives predictions from the region proposal module and maps them onto the high-resolution image, producing proposed regions with high-precision features that serve as the input of the local prediction module. The prediction targets for these regions are then generated from the real labels. By incorporating the data generation module, the framework fuses high-resolution image information with low-resolution proposed regions, dismissing most background-only regions and concentrating on the key regions for detection targets. This approach maintains high-precision inputs while keeping subsequent modules computationally efficient.
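As an illustration of the WIoU-loss used here, the following is a minimal pure-Python sketch for a single box pair, not the paper's training code (which would operate on batched tensors with the denominator detached from the gradient graph):

```python
import math

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def wiou_v1(pred, target):
    """WIoU v1 loss for one box pair: a distance-based focusing factor
    R_WIoU multiplied by the plain IoU loss (1 - IoU). W_g and H_g are the
    width/height of the smallest box enclosing both boxes."""
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    r = math.exp(((px - tx) ** 2 + (py - ty) ** 2) / (wg ** 2 + hg ** 2))
    return r * (1.0 - iou(pred, target))

print(wiou_v1((0, 0, 2, 2), (0, 0, 2, 2)))  # perfect overlap -> 0.0
```

The exponential factor grows with the center distance, so poorly localized boxes receive a larger loss than plain IoU loss would give them.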
The local prediction module receives high-resolution images from the data generation module and either detects defective parts or assesses whether a part is defective. For detecting whether defects exist on components (e.g., lightning damage on insulators), the module is realized by a detection network; for judging whether small parts are damaged (e.g., broken signboards, rusted connecting fittings), it is realized by a classification network. In our experiments, the module is implemented with the YOLOv7 detection network to locate defects.
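The mapping performed by the data generation module can be sketched as follows; the 640- and 5000-pixel sizes come from the text, while the function name and the 10% context-padding fraction are illustrative assumptions:

```python
def map_proposal_to_highres(box, low_size=640, high_size=5000, context=0.1):
    """Scale a proposal box (x1, y1, x2, y2) from the low-resolution input
    up to the original high-resolution image, then pad each side with a
    fraction of the box size as context (fraction is illustrative)."""
    s = high_size / low_size
    x1, y1, x2, y2 = (c * s for c in box)
    pad_w, pad_h = (x2 - x1) * context, (y2 - y1) * context
    return (max(0.0, x1 - pad_w), max(0.0, y1 - pad_h),
            min(float(high_size), x2 + pad_w), min(float(high_size), y2 + pad_h))

# A 64..128-pixel box in the 640-pixel input maps to 500..1000 in the original
print(map_proposal_to_highres((64, 64, 128, 128), context=0.0))
```

The crop extracted from these coordinates keeps the original pixel density, which is what gives the local prediction module its finer features.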
The post-processing module is responsible for remapping the local prediction results to the original image. Its inputs are the local prediction results and the associated information of the proposed region; after geometric operations, the predictions are mapped back to the original image dimensions, yielding the output of the whole framework. In the data generation module, we use an image context filling method, which allows each crop to contain more contextual information and helps the model learn more local information, but it also introduces redundancy into the generated images. During inference, the local prediction module may therefore receive overlapping image content, causing its predictions for the same target to be repeated many times. To solve this problem, the post-processing module sets the NMS threshold to 0.2; this value largely removes the duplicates caused by context filling and ensures the correctness of the model's predictions.
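A minimal sketch of both post-processing steps described above: remapping a crop-level box to original-image coordinates, then suppressing duplicates with the 0.2 NMS threshold from the text. The function names and the 480-pixel crop-input default are illustrative, not the paper's code:

```python
def remap_to_original(local_box, crop_region, crop_input=480):
    """Map a box predicted inside a resized crop back to original-image
    coordinates. crop_region is (x1, y1, x2, y2) of the crop in the original
    image; the crop was resized to crop_input x crop_input before prediction."""
    cx1, cy1, cx2, cy2 = crop_region
    sx, sy = (cx2 - cx1) / crop_input, (cy2 - cy1) / crop_input
    x1, y1, x2, y2 = local_box
    return (cx1 + x1 * sx, cy1 + y1 * sy, cx1 + x2 * sx, cy1 + y2 * sy)

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.2):
    """Greedy NMS; the aggressive 0.2 threshold suppresses duplicate
    detections produced by overlapping, context-padded crops."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep

# Two near-duplicate detections of one target plus a distinct one
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # keeps indices 0 and 2
```

A threshold as low as 0.2 would be too strict for crowded scenes, but here duplicate boxes of one defect overlap heavily while distinct defects rarely do.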
By performing feature fusion in the data generation module, the feature fusion detection framework ensures that the local prediction module receives finer features. Compared with directly inputting high-resolution images, it greatly reduces the model's computation; compared with directly inputting low-resolution images, it lets the model obtain finer features while avoiding excessive overhead.

Dataset
The dataset used in this paper consists of transmission line pictures taken by UAVs, provided by China Southern Power Grid: 8,730 pictures in total, of which 6,984 are used as the training set and the rest as the test set. Some of the faulty samples in the dataset are shown in Figure 4. Since the resolution of the images is not uniform, this paper uniformly resizes them to 640x640 before training, which makes the model easier to train and converge. The width-height distribution of the defective samples in the normalized dataset is visualized in Figure 3. The experimental platform used in this paper is an NVIDIA GeForce RTX 3090.

Experiments and analysis
To validate the model, we conducted a series of ablation experiments comparing the feature fusion detection framework under different parameter configurations. First, we compare the detection performance of different YOLOv7 variants in the region proposal module; the results are shown in Table 1. Among the variants, YOLOv7-tiny keeps the model parameters and FLOPs small with little loss of accuracy, so we chose YOLOv7-tiny as the baseline of the region proposal module. We then compare the effect of introducing the attention mechanism and WIoU-loss in the region proposal module; the results are shown in Table 2. The table shows that introducing the attention mechanism and WIoU-loss makes the model less susceptible to background-irrelevant information, so that it pays more attention to the regions that really contain detection targets, improving prediction accuracy while reducing the number of parameters and FLOPs. We also test the effect of different input image sizes on the local prediction module: by controlling the output of the data generation module, images of different sizes can be generated for it. This paper tries resolutions of 384, 480, and 640; the experimental results are shown in Table 3. Since the 480-pixel size balances accuracy and computation, this paper finally chooses 480 pixels as the image size generated by the data generation module. Finally, we compare the high- and low-dimensional fusion framework with direct YOLOv7 detection; the results are shown in Table 4.
Since the local prediction module in this framework receives high-resolution images as input, it obtains finer features than direct detection and thus achieves higher accuracy. At the same time, because the algorithm is a two-stage detector in the spirit of Faster R-CNN, it has the disadvantage of slower inference and lower FPS; however, actual deployment tests show that the method still meets the speed requirements of line inspection. Figure 5 shows some of the model's prediction results.

Conclusions
We design a feature fusion detection framework for the problems of heavy background interference and small defect features in UAV transmission line inspection. Compared with detecting defects directly in the images, the proposed method substantially improves accuracy with little speed loss. This paper also enhances the model's ability to focus on local features by introducing window attention, reducing the impact of background interference on the prediction results. On the validation dataset, the method achieves 84.9% AP0.5, and its detection speed meets the demands of industrial deployment.

Figure 1 .
Figure 1. Transmission line defect detection process.

Figure 3 .
Figure 3. Width and height distribution of the dataset (image size 640).

Figure 4 .
Figure 4. Defect images of a broken signboard and insulator lightning damage.

Table 1 .
Results of different YOLOv7 models in the region proposal module.

Table 2 .
Results of different methods in the region proposal module.

Table 3 .
Results of different image sizes in the local prediction module.