A Simple Model for Fault Detection of UAV Photoelectric Load Based on Image Classification

A simple image-based fault detection model for unmanned aerial vehicle (UAV) optoelectronic payloads is proposed to address the types of distortion that arise during imaging due to device failures. The model adopts the classic AlexNet network, adjusts its key parameters, adds a BN layer, and then optimizes the original network with LeakyReLU, yielding a fault detection model based on image classification. Experimental results show that the classification accuracy of our model exceeds 90%, higher than that of the comparison methods. Moreover, the model requires only a small number of training samples and is easy to implement.


Introduction
With the continuous development of UAVs, aerial images have become an important means for humans to obtain geographic and environmental information. Clear aerial images obtained through UAV optoelectronic payloads are widely used in fields such as the military, transportation construction, water conservancy engineering, ecological research, and urban planning. However, during the imaging process, image distortion is often caused by the photoelectric payload's imaging device itself. Classified by fault cause, the typical types of distortion in the optoelectronic payload of a UAV are device damage, defocus distortion, and lens contamination. Figure 1 shows common types of UAV optoelectronic payload fault images. Device damage refers to damage to photoelectric converter components (such as CCD and CMOS sensors), resulting in fixed areas or rows of the image carrying no signal, which can be considered a form of occlusion distortion. Defocus distortion refers to blurring of the entire image due to improper focus settings, producing distortion with isotropic characteristics. When the lens is contaminated with water droplets, dirt, or similar material, the resulting image distortion is called lens contamination; this distortion is local, fixed in area, and can also blur the scene. Different types of distortion degrade images in different ways. Therefore, automatically identifying the cause of a faulty photoelectric payload from the distorted image and predicting the fault type provides valuable information for the maintenance and support of the payload.
Driven by the rapid development of deep learning, numerous scholars have conducted extensive research and proposed a series of intelligent fault detection models based on image information. Tao et al. used two cascaded Fast-RCNN networks to detect defects and damage of insulators in aerial images [1]. Xue et al. designed a Position Sensitive ROI Pooling module tailored to the characteristics of their specific fault targets, achieving better fault detection performance [2]. These methods require a large amount of manual annotation, and when the dataset cannot meet this requirement, detection performance is often poor. Zong et al. generated low-dimensional representations and reconstruction errors for input data through autoencoders and jointly optimized them to improve the ability to distinguish errors [3]. Gong et al. introduced a memory module into the autoencoder to suppress its generalization, making abnormal samples easier to screen [4]. Akcay et al. utilized an encoder and two discriminators to detect hazardous materials in X-ray security checks [5]. Schlegl et al. analyzed the optimal sequence and method for training a GAN and encoder, and proposed a medical image lesion detection method [6]. Akcay et al. added a skip connection between the encoder and generator to localize abnormal regions more accurately [7]. All of these methods provide new ideas, but their effectiveness in practical applications is often unsatisfactory, and the models are complex and difficult to implement.
For these reasons, this paper proposes a simple optoelectronic payload fault detection model for UAVs based on image classification. By using the classic AlexNet backbone network and adjusting the model architecture and parameters, the model achieves high-precision fault classification and detection with only small-scale training data, with high detection efficiency and easy implementation.

AlexNet
The AlexNet input size is 224×224×3. The network generally comprises 5 convolutional layers (with 3 pooling layers) and 3 fully connected (FC) layers; each convolutional layer contains convolutional kernels, bias terms, ReLU activation functions, and local response normalization (LRN) modules [8]. The first, second, and fifth convolutional layers are each followed by a max-pooling layer, and the last three layers are FC layers. The final output layer is a softmax, which converts the network output into probability values for predicting the image category. Beyond its scale, AlexNet did substantial work on mitigating overfitting, resulting in excellent performance on large-scale datasets. The structure of AlexNet is shown in Table 1.
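As a concrete illustration of the final layer, the following is a minimal NumPy sketch of the softmax that converts raw network outputs into class probabilities; the score values are made-up, not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative raw outputs for 3 classes
probs = softmax(scores)              # probabilities summing to 1
```

The predicted category is then simply the index of the largest probability.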

Convolutional layer and pooling layer. C1: The convolutional kernels are of size 11×11×3 with a stride of 4, producing 55×55×48 feature maps (per GPU branch) that are passed through the ReLU activation function to generate activation maps. The activation maps are max-pooled with a 3×3 window and a stride of 2, giving pooled feature maps of size 27×27×48; LRN is applied after pooling. C2: Takes the (response-normalized and pooled) output of C1 as input and filters it with 256 convolutional kernels of size 5×5×48. C3: 384 kernels of size 3×3×256, connected to the (normalized, pooled) output of C2. C4: 384 kernels of size 3×3×192. C5: 256 kernels of size 3×3×192. Unlike C3 and C4, C5 is followed by an additional max-pooling layer, again of size 3×3 with a stride of 2. C3, C4, and C5 are connected directly, with no pooling or normalization layers between them.
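The feature-map sizes above follow from the standard convolution output formula; the small sketch below checks them. Note the widely discussed detail that a 224×224 input only yields 55×55 maps under the assumption of a 227×227 input (or equivalent padding); the 227 below is that common assumption, not a figure from this paper.

```python
def conv_out(size, kernel, stride, pad=0):
    # Standard formula: floor((size - kernel + 2*pad) / stride) + 1
    return (size - kernel + 2 * pad) // stride + 1

# C1: 11x11 kernel, stride 4 (assuming the commonly cited 227x227 effective input).
c1 = conv_out(227, 11, 4)   # -> 55
# Max-pooling: 3x3 window, stride 2.
p1 = conv_out(c1, 3, 2)     # -> 27
```

The same formula reproduces the remaining layer sizes in Table 1.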
2.1.2. Fully connected layer. F6: 4096 convolutional kernels of size 6×6×256 generate 4096 feature maps of size 1×1, which are then passed through the ReLU function and Dropout. It is worth noting that AlexNet uses a Dropout layer to reduce overfitting. F7: Same as F6. F8: The output of the last fully connected layer is the input of a 1000-dimensional softmax, which generates predicted values for 1000 categories.
 and  represent the mean and variance of the convolutional features of the layer,  and  represent the displacement coefficients.

2.2.2. LeakyReLU. The ReLU activation function used in AlexNet discards values less than 0, mainly so that the trained network is not a simple ax + b form but instead a more complex nonlinear structure. Moreover, ReLU effectively alleviates the vanishing-gradient problem during training. The LeakyReLU activation function retains the advantages of ReLU while appropriately preserving values less than 0, allowing more parameters to be learned during training [10]. Therefore, the ReLU in AlexNet is changed to LeakyReLU to retain more convolutional features. LeakyReLU is computed as

y(i, j) = x(i, j),      if x(i, j) ≥ 0
y(i, j) = α·x(i, j),    if x(i, j) < 0

where α is a small positive leak coefficient.

2.2.3. Our network. Building on the strong performance and low computational cost of AlexNet, this paper designs a classification model for UAV optoelectronic payload faults. Fault diagnosis is treated as a pattern classification problem with four categories: normal, device damage, defocus distortion, and lens contamination. Figure 2 shows the model framework. Research has shown that the large first-layer kernel size and stride in Krizhevsky's CNN cause the learned first-layer features to cover only high- and low-frequency information, missing mid-frequency information; at the same time, aliasing artifacts appear in the second-layer feature maps and interfere with the features of that layer. Extracting features with that kernel size and stride is therefore not ideal. Because the four fault types differ only subtly in the images, and in order to extract feature information better, the first convolutional layer of the model is reduced from 11×11 to 7×7 and its stride from 4 to 2, a BN layer is added after the last convolutional layer, and ReLU is changed to LeakyReLU. The other structures are the same as AlexNet. The model consists of 9 layers: the first 5 are convolutional layers, followed by the BN layer, and the last 3 are fully connected layers. To match the target classes, the last fully connected layer produces a 4-dimensional softmax input for detection. This new framework helps retain more image information in the first- and second-layer features and preserves as many features as possible after the fifth convolutional layer, thereby improving subsequent fault classification performance. The comparison between the improved framework and the original AlexNet is shown in Table 2.
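A one-line NumPy sketch of LeakyReLU as defined above; the slope α = 0.01 is a common default, assumed here because the paper does not state its value.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Pass non-negative values through; scale negative values by a small slope.
    return np.where(x >= 0, x, alpha * x)

a = np.array([-2.0, -0.5, 0.0, 1.5])
y = leaky_relu(a)   # -> [-0.02, -0.005, 0.0, 1.5]
```

Unlike ReLU, the negative inputs keep a small nonzero gradient, which is what lets the modified network retain more convolutional features.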

Dataset
This paper conducts classification experiments on four types of optoelectronic payload images. Three of these are images affected by payload device faults: device damage, defocus distortion, and lens contamination. The fourth type consists of images collected without a payload device fault, which may be clear or partially distorted; since this distortion is not caused by a device fault, such images are referred to as normal images in this paper.
Because the distortion in an image does not change with the scene content, and the number of fault images collected during real flights is relatively small, the images used in the experiment were drawn from publicly available image datasets of the same fault types so that the algorithm could obtain sufficient training samples. Defocus Blur images from ImageNet-C were selected for defocus distortion, and Raindrop images from the Peking University dataset were selected for lens contamination [11]. Device damage was generated by computer simulation, while the normal class consists of clear images and non-device-fault images.
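Device damage is described earlier as fixed signal-free rows or areas. A minimal NumPy sketch of how such damage might be simulated (as the paper says it was generated by computer simulation, though its exact procedure is not specified); the band position and width here are arbitrary illustrative values.

```python
import numpy as np

def simulate_device_damage(img, row_start, n_rows):
    # Zero out a fixed band of rows, mimicking dead CCD/CMOS sensor lines.
    damaged = img.copy()
    damaged[row_start:row_start + n_rows, :, :] = 0
    return damaged

clean = np.full((224, 224, 3), 128, dtype=np.uint8)   # placeholder gray image
faulty = simulate_device_damage(clean, row_start=100, n_rows=8)
```

The same idea extends to rectangular dead regions by slicing both axes.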

Experimental results and analysis
Accuracy refers to the ratio of correctly classified photoelectric payload images to the total number of images. To demonstrate the superiority of this method, the proposed model is compared with two traditional machine learning classifiers, SVM and K-means, and with the simple, classic classification model AlexNet; each model learns and classifies on the dataset, its accuracy is calculated, and the results are compared. To ensure fairness, training and testing samples were randomly selected at a quantity ratio of 3:2. The final performance on the test set is shown in Table 3. It can be seen that, compared with the traditional algorithms, this method is strongly competitive.
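A minimal sketch of the 3:2 random split and the accuracy metric as defined above; the 2400-image total comes from the dataset description, and the toy label arrays are illustrative.

```python
import numpy as np

def train_test_split(n, train_frac=0.6, seed=0):
    # Shuffle indices and split 3:2 (60% train, 40% test), as in the experiments.
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

def accuracy(y_true, y_pred):
    # Fraction of images whose predicted class matches the true class.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

train_idx, test_idx = train_test_split(2400)     # 1440 train / 960 test
acc = accuracy([0, 1, 2, 3, 3], [0, 1, 2, 3, 1]) # -> 0.8
```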

Conclusion
This paper focuses on three types of distortion caused by device failures in the imaging process of UAV optoelectronic payloads: device damage, defocus distortion, and lens contamination. Image classification is used to detect the fault type. The method adopts the classic AlexNet as the backbone and, by changing network parameters, achieves a classification model with a simple framework structure. The experimental results show that the proposed method has a simple structure, accurate classification results, fast detection speed, and clear advantages. In the future, we will enrich the sample data and further improve the running speed and classification performance of the network.

Figure 1.
Common types of UAV optoelectronic load fault images. (From left to right: device damage, defocus distortion, and lens contamination.)

Figure 3 shows the dataset sample. 600 images were selected for each category, totaling 2400 images for the experimental dataset, and the images were resized to a uniform 224×224. 50 images were randomly selected, with 60% used for training and 40% for testing, over 10,000 iterations.

Figure 3.
Dataset sample. (From left to right: normal image, device damage, defocus distortion, and lens contamination.)

Table 1.
AlexNet structure [9].

2.2.1. Batch normalization. A batch normalization (BN) layer can accelerate the training of network models and prevent overfitting [9]. To enhance convolutional feature extraction while keeping the model as simple as possible, a BN layer is added after the last convolutional layer to shift and scale the features.

Table 2.
Comparison of our network with AlexNet.

Table 3.
Method comparison.