Research on Surface Defect Detection of Solar PV Panels Based on a Pre-Trained Network and Feature Fusion

To address the lack of training samples and the complexity of defect images in defect detection tasks, this paper proposes an improved deep classification model based on the pre-trained VGG-19 network, drawing on the ideas of transfer learning and hierarchical feature fusion, the basic principles of feature extraction in convolutional neural networks, and the design of the feature pyramid network. The model is then trained on a small-scale set of solar PV panel defect images. Finally, a solar PV panel data set containing four kinds of defects (cracks, debris, broken gates and black areas) is used to comprehensively verify the effectiveness of the improved network in the defect detection task. The experimental results show that the proposed method outperforms the classical VGG-19 network model on four evaluation indexes: accuracy, precision, recall and F1 score.


Introduction
As a major manufacturing country, China produces a large number of industrial products every day. With rising living standards, consumers demand ever higher product quality: beyond performance, products must also have a flawless appearance. In the manufacturing process, however, surface defects are often unavoidable. They affect not only the appearance and comfort of products but also, to varying degrees, their performance. Manufacturers therefore attach great importance to surface defect detection, both to control product quality effectively and to diagnose problems in the production process from the inspection results, thereby eliminating or reducing defective products. If defective products reach the market, they can cause trade disputes, damage a company's reputation, and even endanger personal safety.
For example, in the production process of solar PV (photovoltaic) panels, some defects arise that are difficult to detect with the naked eye, such as cracks, debris, broken gates and black areas, and these seriously reduce power generation efficiency. Defect detection technology for solar PV panels is therefore an important link in ensuring the efficiency of PV power generation [1].
At present, manual quality inspection is still the most traditional method of surface defect detection. It suffers from low sampling rates, low accuracy, poor real-time performance, low efficiency and high labor intensity, and is strongly affected by the experience and subjective judgment of inspection workers. Machine vision, as a non-contact, non-destructive automatic detection technology, offers the outstanding advantages of safety, stability and high efficiency. With the rapid development of deep learning models, represented by the convolutional neural network (CNN), machine vision has been successfully applied in many fields [2], and many deep-learning-based defect detection methods are now widely used in industrial settings.
However, unlike in theoretical research, surface defect inspection in practice still faces many problems: 1) large-scale collections of product defect images are often unavailable for training a deep neural network model in engineering practice, which leads to overfitting and convergence to poor local optima; 2) in real, complex industrial environments, defects may differ only slightly from the background, with low contrast, large variations in defect scale, a wide variety of defect types, and substantial noise or interference in the defect images captured on the factory floor, as shown in Figure 1.

Fine-tuning strategy of the pre-trained network based on transfer learning
In the practice of deep learning, classical deep network models (such as VGG-16, VGG-19, Inception-v3 and ResNet-50), once their parameters have been adjusted by thorough training on the ImageNet data set, often perform well on the test set [2]; that is, the models show good generalization ability and robustness. However, if only a small number of training samples are used to train such a deep network, its performance on the test set is poor.

Transfer learning
Transfer learning [3] refers to transferring knowledge from a source domain to a target domain, that is, applying previously learned knowledge to a new domain. To transfer the recognition ability of a pre-trained network model to a user-defined data set, this paper introduces the idea of transfer learning and lists three fine-tuning strategies for classical pre-trained networks, as shown in Figure 2.
Figure 2. Fine-tuning strategies of transfer learning.

Selection rules of the fine-tuning strategy
According to the size of the user-defined data set and its similarity to the pre-training data set, the fine-tuning rules can be summarized as follows:
(1) When the similarity between the user-defined data set and the pre-training data set is small, and the user-defined data set is relatively large, the whole model is trained starting from the original parameters of the pre-trained network, that is, Option 1.
(2) When the user-defined data set is somewhat similar to the pre-training data set and is not large, the lower convolution layers of the pre-trained network are frozen, and the small user-defined data set is used to train only the high-level convolution groups, that is, Option 2.
(3) When the user-defined data set is highly similar to the pre-training data set and small in scale, all convolution layers are frozen and only the fully connected layers of the pre-trained network are trained, that is, Option 3.
As shown in Figure 3, the VGG-19 network consists of five convolution groups and three fully connected layers. The five convolution groups contain 2, 2, 4, 4 and 4 convolution layers respectively, giving 19 weight layers in total. As pointed out in [4], convolution layers at different depths have different receptive fields and extract features at different levels of an image. Shallow convolution kernels extract sharp, detailed boundary, texture and gray-level features, which are largely universal across different images. From a theoretical standpoint, the low-level convolution groups can therefore be frozen: the shallow weight parameters obtained from pre-training can be used directly to extract the shallow features of a user-defined data set. The high-level convolution groups, in contrast, have larger receptive fields and extract more abstract semantic information, so high-level features can be fed directly into the classifier. To make the target features more discriminative, the semantic-level parameters of the model must be trained and adjusted on the user-defined data set. Finally, the fully connected layers act as the classifier and must be adapted to the specific classification task, for example by setting a reasonable number of neurons and classification labels and choosing a suitable loss function and optimizer. In transfer learning, this part can be replaced by user-defined fully connected layers.
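The layer-freezing rule behind the three options can be sketched in a few lines of Python. This is an illustrative helper (the function name `trainable_layers` is ours, not from the paper); layer names follow the usual VGG-19 convention of five blocks with 2, 2, 4, 4 and 4 convolution layers.

```python
# Which VGG-19 convolution layers stay trainable under each fine-tuning option.
# Layer names use the conventional blockN_convM scheme (16 conv layers total).
VGG19_CONV_LAYERS = [
    f"block{b}_conv{c}"
    for b, n in zip(range(1, 6), (2, 2, 4, 4, 4))  # 2+2+4+4+4 conv layers
    for c in range(1, n + 1)
]

def trainable_layers(option: int) -> list:
    """Return the conv layers left trainable under the given option."""
    if option == 1:
        # Dissimilar, large user-defined data set: train the whole model.
        return list(VGG19_CONV_LAYERS)
    if option == 2:
        # Somewhat similar, small data set: freeze low groups,
        # train only the high-level convolution groups (blocks 4 and 5).
        return [l for l in VGG19_CONV_LAYERS
                if l.startswith(("block4", "block5"))]
    # Highly similar, small data set: freeze all conv layers,
    # train only the fully connected classifier.
    return []
```

Under Option 2, only the eight layers of blocks 4 and 5 remain trainable, which matches the strategy adopted later for the solar PV panel data set.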

Hierarchical feature fusion mechanism
As shown in Figure 4, the classic Feature Pyramid Network (FPN) [5] integrates features of different depths, strengthens the information flow between layers, and provides richer detail for visual tasks. By fusing feature maps of different depths, the high-level features (low resolution, high-level semantics) are connected top-down with the low-level features (high resolution, low-level semantics), so that features at all scales carry rich semantic information. FPN has therefore been widely used in object detection and image segmentation.

Attention mechanism
The attention mechanism was first introduced into convolutional neural networks in SENet (squeeze-and-excitation network) [6]. With an attention mechanism, a network model can autonomously learn and record the importance of each feature channel; this importance is applied to the feature channels to enhance useful channels and suppress useless ones. This is in effect a weighted fusion process that assigns appropriate weight coefficients according to channel importance. SENet greatly improves the efficiency of feature extraction through this mechanism. Inspired by SENet, Zhang et al. [7] designed a channel attention network, shown schematically in Figure 5. Suppose the input feature map has size H × W × C, where H, W and C are the height, width and number of channels of the feature maps respectively. First, each channel's feature map is globally pooled, and the pooled value serves as that channel's descriptor, yielding a C-dimensional feature vector. Next, a two-layer perceptron fuses the information of different channels, where W_D reduces the number of channels and W_U expands it again, with reduction/expansion scaling factor r. A new feature vector is obtained and passed through the activation function f, and finally the original feature maps are weighted and fused channel by channel.
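The channel-attention steps described above can be sketched in NumPy: global average pooling, a two-layer perceptron (reduce by r, then expand), a sigmoid gate, and channel-wise reweighting. This is a minimal sketch; the weight matrices are random stand-ins for learned parameters, and the function name is ours.

```python
import numpy as np

def channel_attention(x, w_d, w_u):
    """x: feature map (H, W, C); w_d: (C, C//r) reduction; w_u: (C//r, C) expansion."""
    z = x.mean(axis=(0, 1))                # global average pooling -> (C,)
    s = np.maximum(z @ w_d, 0.0)           # channel reduction + ReLU -> (C//r,)
    a = 1.0 / (1.0 + np.exp(-(s @ w_u)))   # expansion + sigmoid gate -> (C,)
    return x * a                           # weight each channel of the input

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
w_d = rng.standard_normal((C, C // r)) * 0.1   # stand-in for learned W_D
w_u = rng.standard_normal((C // r, C)) * 0.1   # stand-in for learned W_U
y = channel_attention(x, w_d, w_u)
```

Because the sigmoid gate lies strictly between 0 and 1, every channel of the output is a scaled-down copy of the corresponding input channel, with more important channels scaled down less.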

Improved VGG-19 pre-training network
By introducing the ideas of transfer learning and the attention mechanism, and drawing inspiration from the feature pyramid network, we propose an improved VGG-19 deep network model, as shown in Figure 6.

Fine-tuning strategy of the backbone network based on VGG-19
In this study, the VGG-19 backbone network is selected. After full training on the ImageNet data set, the network shows good generalization ability on the test set. However, the solar PV panel data set used in this study is small in scale and not highly similar to the ImageNet data set. Therefore, according to the fine-tuning rules shown in Figure 2, Option 2 is selected as the fine-tuning strategy for this study.
The specific adjustments are as follows: 1) Freezing the low-level convolution groups: the basic texture parameters of the pre-trained VGG-19 are kept fixed and used to extract the original texture features. 2) Training the high-level convolution groups: the collected small-scale data set is used to fully train and adjust the semantic-level parameters of the pre-trained network, so that the extracted target features are more discriminative. 3) A user-defined fully connected head is designed to replace that of the pre-trained network (parameters: fully connected layer 1, size = 4096; fully connected layer 2, size = 4096; softmax, size = 4; loss = "sparse_categorical_crossentropy"; optimizer = stochastic gradient descent).
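The user-defined head in step 3 can be sketched with tf.keras. This is a hedged sketch, not the paper's exact code: the input shape assumes the 7 × 7 × 512 output of the VGG-19 convolution stack at a 224 × 224 input, and in practice this head would be attached to the partially frozen backbone.

```python
import tensorflow as tf

# User-defined classification head replacing the pre-trained FC layers:
# FC-4096, FC-4096, softmax over the four defect classes.
head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(7, 7, 512)),        # VGG-19 conv output (assumed 224x224 input)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu"),  # fully connected layer 1
    tf.keras.layers.Dense(4096, activation="relu"),  # fully connected layer 2
    tf.keras.layers.Dense(4, activation="softmax"),  # four defect classes
])
head.compile(optimizer=tf.keras.optimizers.SGD(),    # stochastic gradient descent
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])
```

The sparse categorical cross-entropy loss lets the four defect classes be labeled with plain integers 0–3 rather than one-hot vectors.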

Hierarchical feature extraction module
Inspired by the pyramidal feature fusion module, and noting that the VGG-19 backbone has five convolution groups, we extract a feature map f_i from the last convolution layer of each group (i.e., conv1_2, conv2_2, conv3_4, conv4_4, conv5_4), obtaining a set of feature maps F = (f_1, f_2, f_3, f_4, f_5), as shown in Figure 7.
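Tapping the last convolution layer of each group can be sketched with a multi-output Keras model. Keras names the layers blockN_convM, corresponding to the convN_M notation above; `weights=None` is used here only to avoid a download, whereas in practice `weights="imagenet"` would load the pre-trained parameters.

```python
import tensorflow as tf

# Build the VGG-19 backbone (imagenet weights in practice; None here).
vgg = tf.keras.applications.VGG19(include_top=False, weights=None,
                                  input_shape=(224, 224, 3))

# Last conv layer of each of the five convolution groups.
taps = ["block1_conv2", "block2_conv2", "block3_conv4",
        "block4_conv4", "block5_conv4"]
extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=[vgg.get_layer(name).output for name in taps])

# F = (f1, f2, f3, f4, f5): one feature map per convolution group.
feats = extractor(tf.zeros((1, 224, 224, 3)))
```

Each successive tap halves the spatial resolution, from 224 × 224 at f_1 down to 14 × 14 at f_5, which is why the fusion stage below must upsample the deeper maps.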

Hierarchical feature fusion attention module
Since convolution involves downsampling, features from deeper layers have lower resolution. Taking the features extracted by the first convolution group as the reference, the features from the remaining four groups are upsampled (by deconvolution) to the same resolution. The channel attention mechanism of Figure 5 is then introduced into the fusion scheme, and the five features of different depths are weighted and fused. Finally, the fused feature map is sent to the classifier for classification.
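The alignment-and-fusion step can be sketched in NumPy. This is a simplified illustration: nearest-neighbour upsampling stands in for the learned deconvolution, the feature maps are random placeholders following the VGG-19 channel counts at a small 32 × 32 input, and fusion is shown as channel concatenation before the attention-weighted step.

```python
import numpy as np

def upsample(f, factor):
    """Nearest-neighbour upsampling of an (H, W, C) map (deconv stand-in)."""
    return f.repeat(factor, axis=0).repeat(factor, axis=1)

rng = np.random.default_rng(0)
# f1..f5: each group halves resolution; channels follow VGG-19 (64..512).
feats = [rng.standard_normal((32 // 2**i, 32 // 2**i, c))
         for i, c in enumerate((64, 128, 256, 512, 512))]

# Upsample f2..f5 to f1's 32x32 resolution, then fuse along channels.
aligned = [upsample(f, 2**i) for i, f in enumerate(feats)]
fused = np.concatenate(aligned, axis=-1)   # channel attention is applied next
```

The fused map then carries all 64 + 128 + 256 + 512 + 512 = 1472 channels at the reference resolution, ready for channel-attention weighting and classification.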

Experimental results and analysis
In this experiment, 800 images of solar PV panels were collected, containing four kinds of defects, namely cracks, debris, broken gates and black areas, as shown in Figure 8. The evaluation indexes are accuracy (AC), precision (P), recall (R) and F1 score.
The accuracy rate is the percentage of correct predictions among all samples:

AC = (TP + TN) / (TP + TN + FP + FN)

The precision rate is the probability that a sample predicted to be positive is actually positive:

P = TP / (TP + FP)

The recall rate is the probability that an actually positive sample is predicted to be positive:

R = TP / (TP + FN)

The F1 score combines precision and recall to evaluate the overall performance of the model:

F1 = 2PR / (P + R)

where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives respectively. With identical training and test data, the experimental results of the VGG-19 model and the improved VGG-19 model on the four evaluation indexes are shown in Table 1 and Table 2. Comparing the two tables, although the classical VGG-19 network reaches an accuracy above 90% on the four common defect types of solar PV panel images, its precision and recall are low. In particular, crack and debris defects, which have low contrast with normal images, are relatively difficult to distinguish accurately. The improved VGG-19 network with the feature fusion attention module can, in theory, extract more detailed information; the experimental data show that the detection precision for cracks and debris improves significantly, and the recall rises from 76.67% to 83.33%. Overall, the improved VGG-19 network improves on the classical VGG-19 model in all four indexes (accuracy, precision, recall and F1 score), which verifies the effectiveness of the improved model in the defect detection task.
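The four evaluation indexes follow directly from the confusion-matrix counts. A small sketch with illustrative numbers (not the paper's actual results):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    ac = (tp + tn) / (tp + tn + fp + fn)  # accuracy: correct / all
    p = tp / (tp + fp)                    # precision: true among predicted positive
    r = tp / (tp + fn)                    # recall: found among actual positive
    f1 = 2 * p * r / (p + r)              # harmonic mean of precision and recall
    return ac, p, r, f1

# Illustrative counts only, e.g. 30 predicted positives out of 200 samples.
ac, p, r, f1 = metrics(tp=25, fp=5, fn=5, tn=165)
```

Note that accuracy can stay above 90% even when precision and recall for a rare defect class are much lower, which is exactly the pattern observed for the classical VGG-19 in Tables 1 and 2.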

Conclusions
In this paper, to address problems in automatic defect detection such as insufficient samples, low contrast, intra-class variation and inter-class similarity, an improved defect detection model is proposed based on a pre-trained backbone network, a feature fusion mechanism and an attention mechanism. The experimental results show that the deep network model can be trained using only a small-scale PV panel defect data set, that its evaluation indexes are better than those of the classical pre-trained network, and that its F1 score exceeds 90%. However, for practical industrial application, the detection accuracy needs further improvement. In addition, this study uses only one data set, covering only four defect types, to verify the model, which is far from sufficient to comprehensively evaluate the improved model on a wider range of defect data sets. Nevertheless, this paper offers a useful reference for solving defect detection problems in similar products.