Interactive Technology for High-Voltage Tower Acceptance UAVs Based on AR and YOLOv7

To address the many shortcomings of the current unmanned aerial vehicle (UAV) inspection mode for transmission-line high-voltage towers and to improve inspection efficiency, a UAV tower foreign-object detection technology based on Augmented Reality (AR) and You Only Look Once (YOLOv7) has been developed. Firstly, the system architecture, including the UAV, the ground station, and AR glasses, is established. Then, the AR scene is built using the Unity engine, and the interactive control mode between the AR glasses and the UAV is designed. Subsequently, a mapping between head movements and UAV commands is devised to synchronize the UAV's actions with the operator's head. Finally, the tower is inspected for foreign objects using YOLOv7 visual detection technology. The experimental results validate the efficacy of the built AR scene, which seamlessly completes the video-streaming and attitude-data transmission functions. The AR glasses achieve interactive control of the UAV attitude. In addition, the YOLOv7 visual detection algorithm accurately detects foreign objects attached to the high-voltage tower.


Introduction
With the rapid development of the economy and society, the demand for electricity has surged, and the burden of grid operation and maintenance keeps growing. To address these challenges, unmanned aerial vehicles (UAVs) have emerged as an effective method for high-altitude inspections and are now used extensively in transmission-line operation and maintenance inspection [1][2][3]. However, in actual usage, operators must watch the video the UAV sends back to the ground station while remotely controlling the aircraft. This imposes a heavy operational burden, and the observation effect is poor because the ground-station video lacks depth information. Augmented Reality (AR) can interact effectively with the surrounding environment, and its applications are becoming ever more extensive and in-depth [4][5][6][7][8]. By combining AR technology [9,10], visual inspection, and the UAV, the operator can observe from a first-person perspective. Moreover, incorporating intelligent warnings into the system can effectively address tower inspection, acceptance, line-following UAV manipulation, and other related difficulties.
Regarding power grid construction, operation, and maintenance, few studies combine UAVs, AR technology, and visual inspection, mainly because AR technology has only recently matured.

AR-assisted acceptance system architecture
Before conducting inspection and acceptance, the entire AR-assisted acceptance system must be established, as shown in Figure 1. During operation, the UAV is first remotely controlled for takeoff and is then switched to acceptance mode through the ground station. In this mode, the AR glasses take over control of the UAV: they detect the operator's head movements and convert them into control commands. These commands are sent to the ground station through the LAN router, and the ground station forwards them to the UAV, completing the control loop between the AR glasses and the UAV. Throughout the operation, the drone continuously sends its video stream back to the ground station. Concurrently, the ground station runs a YOLOv7-based algorithm model to process the video and generate abnormality warnings, pushing them, together with the video stream, to the AR glasses through the router.
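As a rough illustration of this command path, the following Python sketch shows a ground-station relay loop that accepts head-pose commands from the AR glasses over the LAN and forwards them to the UAV link. The port, the message format, and the send_to_uav helper are hypothetical, not taken from the paper's implementation.

```python
import socket

GLASSES_PORT = 9000  # assumed LAN port on which the AR glasses connect

def relay_commands(send_to_uav):
    """Forward head-pose commands from the AR glasses to the UAV link.
    send_to_uav is a caller-supplied function writing to the UAV radio."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", GLASSES_PORT))
    srv.listen(1)
    conn, _ = srv.accept()            # AR glasses connect via the LAN router
    with conn:
        while True:
            cmd = conn.recv(1024)     # e.g. b"yaw:+0.4"
            if not cmd:               # connection closed by the glasses
                break
            send_to_uav(cmd)          # forward while in acceptance mode
```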

Unity scene building
To achieve a hands-free human-computer interaction environment with the AR glasses, the Unity engine is used to construct the AR scene and visualize the information flow within it. The AR scene construction includes the user interface (UI) design and the embedding of the background programs, as shown in Figure 2. The UI provides a playback window for the real-time streaming protocol (RTSP) video stream, together with an axes plot and a parameter box for displaying the operator's current pose. The background consists of two main programs: a socket client that sends the pose stream and an RTSP program that receives the video stream. The two background programs run in separate threads so that blocking does not cause the main program to lag.
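The Unity background programs themselves run inside the Unity engine; as a language-neutral sketch of the same two-thread pattern, the Python fragment below runs a socket client for the pose stream and an RTSP receiver in separate daemon threads. The stream URL, the endpoint address, and the get_pose/on_frame callbacks are assumptions for illustration.

```python
import socket
import threading

import cv2  # OpenCV reads RTSP streams through its FFmpeg backend

RTSP_URL = "rtsp://192.168.1.10/live"    # assumed UAV stream address
GROUND_STATION = ("192.168.1.2", 9000)   # assumed pose-stream endpoint

def pose_sender(get_pose):
    """Socket client thread: continuously send serialized head-pose data."""
    with socket.create_connection(GROUND_STATION) as s:
        while True:
            s.sendall(get_pose())

def video_receiver(on_frame):
    """RTSP thread: pull frames for the playback window."""
    cap = cv2.VideoCapture(RTSP_URL)
    while cap.isOpened():
        ok, frame = cap.read()
        if ok:
            on_frame(frame)

# Daemon threads keep the main (UI) thread from blocking:
# threading.Thread(target=pose_sender, args=(get_pose,), daemon=True).start()
# threading.Thread(target=video_receiver, args=(on_frame,), daemon=True).start()
```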

MRTK human-computer interaction design
For the AR glasses to control the UAV, the way the operator interacts with the AR system must be designed. This system uses the Mixed Reality Toolkit (MRTK), which provides head-movement sensing functions, to develop the human-computer interaction method.
Since transmission-line acceptance inspection requires the UAV to fly around the tower frequently, the main flight actions encompass left and right rotations, flying along the tower edge after rotation, and forward and backward flight to adjust the distance from the tower. To enhance acceptance-inspection efficiency, the human-computer interaction must be designed around the natural behavior of human observation: the UAV projects the captured video onto the AR glasses for the operator to observe, the MRTK module of the AR glasses senses the operator's head movements, and the two are linked in real time. The correspondence between head and UAV movements is shown in Figure 3. When the head of a worker wearing AR glasses turns horizontally to the left or right, the UAV executes a yaw movement to the left or right. Tilting the head to the left or right corresponds to the UAV rolling to the left or right. Finally, raising and lowering the head corresponds to the pitching of the UAV, controlling its forward and backward movements.

AR glasses and drone command mapping
Using the natural movements of the operator's head to control the drone requires a mapping mechanism between head posture and the corresponding drone commands.
To begin with, the AR system captures the angles of the operator's head. Since people make natural random movements, small angles can be ignored. On the other hand, the drone can rotate continuously, whereas the human head cannot. To address this, a unilateral design principle is adopted: when the head rotates to one side, drone control commands are generated; rotating the head to the opposite side generates commands in the opposite direction; and the return of the head to center generates no control commands.
The specific approach employs the AR glasses to measure the angular range of the head attitude and then transforms it, through a bias, into left-right symmetric data that directly express the direction of head movement. Finally, normalization is applied to handle the fact that the head rotates over different ranges about the X, Y, and Z coordinate axes. The interval of -10° to 10° is treated as natural head movement and left unprocessed. The mapping relationship between head attitude and UAV motion is shown in Table 1.
This paper uses the linear mapping of Equation (1) to determine the correlation between the head angle and the UAV motion speed. The resulting speed mapping is shown in Figure 4.

$$\mathrm{Speed}_i = \frac{\left|\mathrm{Angle}_i\right|}{\mathrm{Range}_i}, \qquad i \in \{X, Y, Z\} \tag{1}$$

where $i$ represents the three rotational axes, i.e., X, Y, and Z; $\mathrm{Speed}_i$ indicates the mapping speed of axis $i$; $\mathrm{Range}_i$ denotes the range of head movement corresponding to axis $i$; and $\mathrm{Angle}_i$ symbolizes the head attitude angle corresponding to axis $i$.
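A minimal Python sketch of this mapping, combining the ±10° dead zone with the normalized linear rule of Equation (1); the per-axis movement ranges and the example command are illustrative assumptions, not measured values.

```python
def head_to_speed(angle_deg, range_deg, dead_zone=10.0):
    """Map a head attitude angle on one axis to a normalized, signed
    UAV speed command: dead zone for natural movement, then Equation (1)."""
    if abs(angle_deg) <= dead_zone:
        return 0.0                                 # -10..10 degrees: ignore
    speed = min(abs(angle_deg) / range_deg, 1.0)   # linear, normalized
    return speed if angle_deg > 0 else -speed      # sign gives direction

# Assumed per-axis head movement ranges in degrees (illustrative only):
RANGES = {"X": 45.0, "Y": 60.0, "Z": 40.0}
yaw_speed = head_to_speed(25.0, RANGES["Y"])  # horizontal turn -> yaw command
```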

Tower foreign object detection based on YOLOv7
Even after the human-computer interaction problem between the UAV and the operator is solved, observing through AR glasses for a long time causes visual fatigue and degrades the acceptance-inspection effect. A visual learning algorithm that provides warning information therefore plays an important role in alleviating the operational burden. The YOLOv7 algorithm balances the efficiency and the accuracy of image recognition; compared to previous versions, it is faster and performs better on embedded devices. In this research, this algorithmic framework is used for the assisted detection of foreign objects.

Principles of the YOLOv7 algorithm
The core of the YOLOv7 algorithm is its positive- and negative-sample assignment strategy, mainly comprising the positive-sample screening strategy, a secondary screening with simplified optimal transport assignment (SimOTA), and the auxiliary (AUX) output. In the positive-sample screening strategy, the k-means clustering algorithm matches the prior boxes (anchors) to each labeled true box (ground truth) before the image data are formally used for training. Ground truths with a low degree of match are rejected as background, and the rest are used as positive samples. This process also yields the grid cell of the network layer corresponding to each anchor box. If the center point of a ground truth falls in a certain grid, the anchor matched with the ground truth in that grid, together with the anchors in the two adjacent grids, is taken as a positive sample, as shown in Figure 5. This paper uses this positive-sample screening strategy to select positive samples. In the example figure, for the ground truth represented by the solid rectangular box, the strategy selects the grid centered at (2.5, 1.5) and the two grids centered at (1.5, 1.5) and (2.5, 2.5). Three grids and their matching anchors are thus obtained, with centroids located at the centers of the three aforementioned grids and with the same length and width as the corresponding anchors. Note that one ground truth may match more than one anchor, so 3-27 positive samples can be obtained; this effective increase in the number of positive samples significantly enhances the model's accuracy.
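The grid-expansion step can be sketched in a few lines of Python. For a ground-truth center given in grid units, the cell containing it plus the two nearer neighbors are selected; exact tie-breaking at an offset of exactly 0.5 varies by implementation, so the example coordinates below are chosen slightly off the figure's centers to show the idea.

```python
def select_positive_grids(cx, cy):
    """Return the grid containing the ground-truth center plus the two
    nearest neighboring grids (YOLOv7-style positive-sample expansion).
    cx, cy are the center coordinates in grid units."""
    gx, gy = int(cx), int(cy)                    # cell containing the center
    grids = [(gx, gy)]
    # nearer horizontal neighbor
    grids.append((gx - 1, gy) if (cx - gx) < 0.5 else (gx + 1, gy))
    # nearer vertical neighbor
    grids.append((gx, gy - 1) if (cy - gy) < 0.5 else (gx, gy + 1))
    return grids

# A center near (2.4, 1.6) selects the cells whose centers are
# (2.5, 1.5), (1.5, 1.5) and (2.5, 2.5), as in Figure 5:
print(select_positive_grids(2.4, 1.6))  # [(2, 1), (1, 1), (2, 2)]
```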
Having obtained the candidate positive samples, the next task is a secondary screening with SimOTA. This method calculates the intersection over union (IOU) between each positive sample and the ground truth. These values are ranked in descending order, and the sum of the first ten values is rounded to yield a value denoted b. Subsequently, the cost function of each positive sample is calculated, and the samples are ranked in ascending order of cost; the first b samples (those with the lowest cost) are kept as positive samples, thereby dynamically allocating different numbers of positive samples to different targets. Moreover, when the same grid prediction box is associated with two ground truths, the method assigns the prediction box, as a positive sample, to the ground truth with the smaller cost-function value.
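The dynamic-k selection can be sketched as follows for a single ground truth; this simplified PyTorch fragment omits the cross-ground-truth conflict resolution described above and assumes 1-D tensors of IOUs and costs over the candidate samples.

```python
import torch

def simota_dynamic_k(ious, costs, topk=10):
    """SimOTA-style secondary screening for one ground truth.
    ious, costs: 1-D tensors over the candidate positive samples."""
    k = min(topk, ious.numel())
    # b = rounded sum of the top-10 IOUs, at least 1
    b = max(int(ious.topk(k).values.sum().round()), 1)
    b = min(b, costs.numel())
    # keep the b candidates with the LOWEST cost
    return costs.topk(b, largest=False).indices

# Example with five candidates: b = round(2.6) = 3, lowest-cost kept
keep = simota_dynamic_k(torch.tensor([0.8, 0.7, 0.3, 0.6, 0.2]),
                        torch.tensor([1.2, 0.9, 3.0, 1.1, 2.5]))
```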
To allocate training samples effectively across the different foreign-object species, the YOLOv7 framework incorporates two heads: the auxiliary head (aux head) and the lead head. These heads allow the middle layers of the network to learn more information and obtain richer gradients, helping the model train better neural-network parameters. The aux head operates as an auxiliary output module, while the head module performs target detection using grid-based anchors on feature maps of various scales. Like the lead head, the aux head filters positive samples, but it sets the offset to 1 during grid filtering. Furthermore, when performing SimOTA, the aux head sums the first 20 IOU values and rounds the result up, increasing the recall rate and preventing missed detections. With these improvements, the YOLOv7 framework tackles the training-sample allocation problem for foreign-object species with notable efficacy, yielding enhanced accuracy and robustness.

Image detection process
With the core algorithm clarified, the detection process for tower foreign-object images in this paper mainly includes image preprocessing, data training, data testing, and result prediction.
The image preprocessing comprises the following key steps (a sketch of the full pipeline follows this list): 1) Image grayscale processing: graying by channel averaging, i.e., the average of the three channels' values is taken as the gray value, removing the meaningless color information.
2) Image grayscale enhancement: piecewise linear grayscale correction is applied, which increases the dynamic range of the gray levels and makes the desired foreign-object target easier to identify.
3) Image denoising: Gaussian filtering is employed to suppress the noise interference caused by high-altitude airflow, lens swing due to jitter, and the analog-to-digital conversion in the digital equipment.
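A compact OpenCV sketch of the three preprocessing steps; the piecewise-linear breakpoints and the kernel size are assumptions, as the paper does not give these values.

```python
import cv2
import numpy as np

def preprocess(img_bgr):
    """Channel-average grayscale, piecewise-linear gray-level stretch,
    then Gaussian denoising (breakpoints below are illustrative)."""
    # 1) grayscale = mean of the three color channels
    gray = img_bgr.mean(axis=2).astype(np.uint8)
    # 2) piecewise-linear correction: stretch the mid-range [r1, r2] to [s1, s2]
    r1, s1, r2, s2 = 70, 30, 180, 220
    x = np.arange(256)
    lut = np.empty(256, dtype=np.uint8)
    lo = x <= r1
    lut[lo] = (s1 / r1 * x[lo]).astype(np.uint8)
    mid = (x > r1) & (x <= r2)
    lut[mid] = (s1 + (s2 - s1) / (r2 - r1) * (x[mid] - r1)).astype(np.uint8)
    hi = x > r2
    lut[hi] = (s2 + (255 - s2) / (255 - r2) * (x[hi] - r2)).astype(np.uint8)
    gray = cv2.LUT(gray, lut)
    # 3) Gaussian filtering against airflow/jitter/ADC noise
    return cv2.GaussianBlur(gray, (5, 5), sigmaX=1.0)
```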
Model training begins with selecting the neural-network parameters required by the YOLOv7 framework according to the requirements of the detection task and storing them in yaml form as a pre-training file. In this paper, the YOLOv7-tiny lightweight pre-training model is chosen, which effectively shortens the training time. During training, the pre-training file is parsed, the yaml file is converted into Python's built-in dictionary format, and the pre-trained weights are loaded into the model.
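A minimal sketch of the configuration-parsing step, assuming PyYAML; the file name is hypothetical, while nc (class count) and anchors are typical keys in YOLO-style model yaml files.

```python
import yaml  # PyYAML

# Parse the pre-training file into Python's built-in dict format:
with open("yolov7-tiny-towers.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print(cfg["nc"], cfg["anchors"])  # e.g. class count and anchor sizes
```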
Following the successful completion of algorithm testing, the video captured by the UAV is cropped in advance, and each acquired frame is used as the test set. The precision and recall of foreign-object detection are calculated using the trained model parameters, and the model's quality is judged by the harmonic mean of the two. Finally, image prediction is performed. The steps are basically the same as in testing, but the index data are no longer output; instead, the output is the predicted image, on which the target detection box is labeled along with the foreign-object category and the confidence.

Experimental validation
The test image size is set to be consistent with training, and the initial target confidence threshold is set to 0.6. The results of foreign-object detection on the high-voltage pylons in the images intercepted from the video are shown in Figure 6, where each image is cropped and enlarged so that the detection box is easy to view. Multiple sets of images across the four categories were tested, not just four images; only four are selected for display here, with four images per iteration as the amount of data for one test iteration. Figure 7 depicts an example of the predicted image output. As can be seen, the algorithm correctly identifies bird nests on pylons with an accuracy rate of 95%. The values in the figure represent the precision (check-accuracy rate) of foreign-object recognition, i.e., the proportion of samples whose true value is 1 among all samples predicted as 1, expressing the degree of confidence that the foreign object is recognized as a specific species.
The parameter settings are analyzed further, with the F1 score serving as the harmonic mean of the precision and the recall (the check-completeness rate, i.e., the proportion of true foreign objects that are picked out), thereby providing a comprehensive measure of the recognition effect. The F1 curve, obtained after 300 training rounds, is shown in Figure 8 and indicates that a confidence level in the range 0.7-0.85 yields a superior F1 score. Finally, the dataset records are summarized in a confusion matrix according to the two judging criteria: the true categories and the categories predicted by the classification model. As shown in Figure 9, the rows of the matrix represent the true values and the columns the predicted values. The diagonal elements reveal that the prediction accuracy for all four foreign-object types surpasses 97%, indicating that the relevance and accuracy of this study meet the expected requirements. The experimental dataset contains 2660 pictures of bird's nests, 507 of balloons, 479 of garbage, and 476 of kites; this diverse dataset better covers the attachment of foreign objects to high-voltage towers.
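To make the two judging criteria concrete, the following sketch builds a rows-true/columns-predicted confusion matrix and computes the F1 score as the harmonic mean of precision and recall; the class names and label values are illustrative, not the paper's data.

```python
import numpy as np

CLASSES = ["nest", "balloon", "garbage", "kite"]

def confusion_matrix(y_true, y_pred, n=len(CLASSES)):
    """Rows = true class, columns = predicted class (as in Figure 9)."""
    m = np.zeros((n, n), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

def f1(precision, recall):
    """F1 = harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Illustrative labels only:
m = confusion_matrix([0, 0, 1, 2, 3, 3], [0, 0, 1, 2, 3, 1])
precision_per_class = m.diagonal() / m.sum(axis=0).clip(min=1)
recall_per_class = m.diagonal() / m.sum(axis=1).clip(min=1)
```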

Conclusion
This paper investigated an inspection and acceptance method combining AR glasses, a UAV, and visual processing to solve the existing problems in UAV transmission-line acceptance and inspection operations. Firstly, a hands-free human-computer interaction environment based on AR glasses was developed, and drone-operation rules based on the natural movements of the human head were designed so that operating personnel can manipulate the drone from a first-person viewpoint. Then, to reduce the recognition fatigue caused by long-time operation, a foreign-object detection algorithm model was trained on the YOLOv7 framework; experiments verified that the model has good accuracy and universality. In addition, pushing the warning information produced by the YOLOv7 algorithm to the AR glasses greatly reduced the operator's burden.

Figure 3. AR Glasses and Drone Control Correspondence.

Figure 5. Schematic diagram of positive sample screening.

Figure 8. Graph of analysis results.

Table 1. Head Attitude Corresponding to Drone Motion.