Yolov5-based channel pruning for real-time detection of construction workers' safety helmets and anti-slip shoes on informationized construction sites

To address the problems that object detection models deployed on edge devices occupy too much memory and run slowly, a lightweight model based on Yolov5 is proposed to detect whether safety helmets and anti-slip shoes are worn. Channel pruning is carried out on the original model: channels with low weights are deleted to reduce the number of parameters and improve detection speed. The experimental results show that the recognition accuracy of the pruned network is 93.2%, essentially the same as before pruning, while the number of parameters is reduced by 63.9% and the detection speed reaches 123.45 fps. The work provides technical support for the development of embedded detection equipment.


Introduction
Production safety is crucial for enterprises, and every year tragedies occur because workers fail to wear the required protective equipment [1]. In the cylinder driving operation of a tobacco factory in China, workers are required to wear safety helmets and anti-slip shoes before construction begins, so their wearing condition, covering both safety helmets and anti-slip shoes, needs to be checked beforehand. At present this check relies mainly on manual inspection, which requires personnel to be present at the construction site at all times and consumes considerable human resources; a machine that can replace manual inspection is therefore needed to ensure the safety of construction personnel continuously. Detection based on traditional large servers requires a large up-front investment, and the construction site is not suitable for deploying large equipment, so detection must rely on edge devices. However, traditional large models run slowly on embedded devices and cannot meet real-time requirements, so the model must be compressed.
Yolo has clear advantages in several object detection fields [2][3][4], but there is still room to optimize YOLO for less powerful devices, such as edge computing devices [5][6].

Data
The data were collected from a cigarette factory. Video was shot with an iPhone XR and then captured frame by frame as pictures; blurry and incomplete pictures were deleted, leaving a total of 11567 pictures. The specific distribution of the data set is shown in Table 1, and wearing examples are shown in Figure 1.

Channel pruning of Yolov5s
The essence of the channel pruning algorithm is to identify network channels with low weights, eliminate those connection points, and disconnect the inputs and outputs attached to them [7]. Compared with layer pruning, this algorithm reduces the number of parameters that must be stored and has lower hardware requirements, so it is easy to deploy on embedded devices, mobile terminals and other small computing platforms [8][9]. In this study, therefore, a balance was struck between flexibility and implementation cost: a channel pruning algorithm was applied to the trained Yolov5s safety-helmet and anti-slip-shoe detection network. In a Convolution-BN-Activation module, the BN layer can realize channel scaling. The BN layer performs two operations, as shown in Eqs. (1)-(2): after batch normalization, a linear transformation is applied whose scaling coefficient γ weights each channel. When a channel's γ is very small, the corresponding activation (Zout) is correspondingly small, and these weakly responding channels can be pruned away, achieving channel pruning at the BN layer. The channel pruning of Yolov5 consists of the following steps:
1. Sparse training. L1 regularization is applied to the BN layers of the safety-helmet and anti-slip-shoe detection network, so that the parameters with small weights are identified and the network structure becomes sparse.
2. Channel pruning. A network with a smaller width is obtained by cutting off, at a given pruning rate, the input and output connections of the low-weight channels. The unimportant channels are identified and removed automatically, which hardly affects the generalization ability of the network.
3. Fine-tuning of the pruned model. Channel pruning may cause a large loss of precision, so the model is fine-tuned afterwards to restore its accuracy.
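Eqs. (1)-(2) are not reproduced in this excerpt; a sketch of the standard batch-normalization form they presumably follow (with μB and σB² the mini-batch mean and variance, γ and β the learned scale and shift) is:

```latex
\hat{z} = \frac{z_{\mathrm{in}} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}} \qquad (1)
```
```latex
z_{\mathrm{out}} = \gamma \hat{z} + \beta \qquad (2)
```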
During the whole process, the objective function is shown in Eq. (3), where the first term is the training loss of the network; the second term is the L1 regularization constraint on the γ coefficients of the BN layers; x and y are the training input and output, respectively; W denotes the trainable parameters of the network; and λ is the penalty factor.
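Eq. (3) itself is missing from this excerpt; given the description above, it presumably takes the standard network-slimming form, with l the training loss, Γ the set of BN scaling factors, and g(γ) = |γ| the L1 sparsity penalty:

```latex
L = \sum_{(x, y)} l\bigl(f(x, W),\, y\bigr) + \lambda \sum_{\gamma \in \Gamma} g(\gamma),
\qquad g(\gamma) = |\gamma| \qquad (3)
```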
Parameter settings for each stage of channel pruning are shown in Table 2. Figure 3 shows the sparsity training process: as sparsity training progresses, γ gradually approaches 0 without reaching it, indicating that the network has gradually become sparse. From the 14th epoch onwards the change in γ is no longer obvious, indicating that sparsity training can be stopped; an attempt to train for 300 epochs to push γ closer to 0 showed no difference from the 14th epoch. After sparse training and channel pruning, the channel number distribution is shown in Figure 4: a total of 4496 channels are pruned, a pruning ratio of 46.7%, and the number of channels in most layers is significantly reduced. Table 3 shows the changes in the number of parameters and in accuracy before and after channel pruning: the model size was reduced by nearly 63.9%, while the accuracy remained essentially unchanged. Together, these two observations show that channel pruning is effective.
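The pruning step described above can be sketched as follows. This is a minimal illustration, assuming (as is common in BN-based channel pruning) that the global pruning ratio determines a single threshold on the absolute BN scaling factors; the function and variable names are illustrative, not from the paper:

```python
def prune_mask(gammas, prune_ratio):
    """Return a keep/prune mask over channels: channels whose |gamma|
    falls below the global threshold implied by prune_ratio are pruned."""
    scores = sorted(abs(g) for g in gammas)       # global ranking of BN scales
    cut = int(len(scores) * prune_ratio)          # number of channels to drop
    threshold = scores[cut] if cut < len(scores) else float("inf")
    return [abs(g) >= threshold for g in gammas]  # True = keep channel

# Example: 8 channels, prune the weakest 50%
gammas = [0.9, 0.01, 0.7, 0.02, 0.5, 0.03, 0.8, 0.04]
mask = prune_mask(gammas, 0.5)
kept = sum(mask)  # 4 channels survive
```

In the actual network, channels marked False would have their convolution filters and the connected input/output weights removed before fine-tuning.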

Evaluation of model performance
To evaluate the performance of the model, 3400 images were tested. Four indexes were selected for evaluation: precision (P), recall (R), mean average precision (mAP) and detection speed (FPS). Their calculation formulas are shown in Eqs. (4)-(6).
Here TP, FP and FN are the numbers of true positive, false positive and false negative cases, respectively, and c is the number of detection categories. There are four wearing situations in this study, so c = 4.
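Eqs. (4)-(6) are not reproduced in this excerpt; assuming they are the standard definitions P = TP/(TP+FP), R = TP/(TP+FN), and mAP as the mean of the per-class average precisions over the c classes, a small illustrative computation (all counts and AP values below are hypothetical) is:

```python
def precision(tp, fp):
    # fraction of detections that are correct
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of ground-truth objects that are detected
    return tp / (tp + fn)

def mean_average_precision(per_class_ap):
    # mAP: average of the AP values over the c detection categories
    return sum(per_class_ap) / len(per_class_ap)

# Hypothetical counts for one wearing category
p = precision(tp=90, fp=10)   # 90 / 100 = 0.9
r = recall(tp=90, fn=18)      # 90 / 108
m = mean_average_precision([0.95, 0.93, 0.92, 0.93])  # c = 4 classes
```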
A total of 3403 images were selected for the test, with the same number of images for each wearing condition. The test results are shown in Table 4: the detection speed exceeds 100 frames per second while accuracy is maintained, indicating that the model performs well on embedded devices and small systems and providing a technical reference for applications on small embedded platforms.

Comparison of different object detection algorithms
At present, besides the Yolo target detection networks, there are other object detection networks such as Faster-RCNN [10], RetinaNet [11] and Yolov4-tiny [12], and the test results are compared with them. As shown in Table 5, compared with the other algorithms the mAP of Yolov5s after channel pruning decreased by less than 1%, while the model size was reduced by a factor of at least 4.5. Both Faster-RCNN and RetinaNet reach several hundred MB, so they clearly cannot run on embedded and small mobile devices or meet real-time requirements. Yolov4-tiny can basically meet the real-time requirements, but its mAP is 0.06% lower than that of Yolov5s after channel pruning. As a result, the channel-pruned Yolov5s is up to 37 times faster than the other networks without compromising accuracy. The results show that channel pruning of Yolov5s enables effective detection of whether an operator is wearing a helmet and anti-slip shoes.

Conclusions
In this study, the channel pruning method is used to reduce the hardware requirements for running the model and to lower the cost of large-scale application. The principle of channel pruning is to judge the importance of connection points in the neural network model by their weight coefficients: connection points with small weight coefficients are disconnected from their corresponding inputs and outputs, and because these points contribute little to the model's accuracy, removing them does not affect the final detection result. After channel pruning, a total of 4496 channels with weight coefficients below 0.5 were deleted, reducing the parameters of Yolov5s by 63.9%. After fine-tuning, the running speed reaches 123.45 fps and the mAP reaches 93.2%, which provides technical support for the development of embedded monitoring robots. Compared with the Faster-RCNN, RetinaNet and Yolov4-tiny algorithms, the model size is reduced by up to 73 times while the mAP decreases by less than 1%, and the running speed is 1.89 times that of Yolov4-tiny, meeting both real-time and accuracy requirements. In future research, the efficiency of the channel pruning algorithm will be further studied to improve detection accuracy.