Flame Detection Based on Improved YOLOv7

A flame detection algorithm based on an improved YOLOv7 is proposed to address the problems that small flame targets are easily missed or misdetected, and that detection is easily disturbed by bright objects. In the improved algorithm, the SE attention module is introduced to enhance the model's channel-wise perception; ConvNeXt is used to retain the features of small target flames in flame images; and the K-means++ clustering algorithm is employed to obtain anchor boxes that better match flame sizes, enhancing detection accuracy. The results show that the improved algorithm proposed in this paper outperforms YOLOv7 in both average precision and precision.


Introduction
Fire is a common natural disaster in real life. The occurrence of fire often produces toxic gases, pollutes the environment, and causes great harm to the economy and to human life and safety. In addition, a fire is accompanied by an obvious flame, so a model that can accurately identify flames is of great practical significance.
Deep learning for object detection is advancing rapidly, and a series of major breakthroughs have been made. Object detection algorithms are principally classified into one-stage and two-stage methods. A two-stage algorithm first extracts candidate boxes from the image to ensure accuracy and recall, and then sends the candidate boxes to a classifier for classification prediction. Two-stage detection is represented by the R-CNN family [1][2][3], with Faster R-CNN [3] being the most popular framework. A one-stage algorithm generates detection results directly from the input and is fast, thus meeting the requirements of flame detection in this paper. Typical frameworks are the YOLO series [4][5], SSD [6], and EfficientDet [7]. Redmon et al. [8][9] applied the YOLOv1 and YOLOv2 networks successively to fire detection, improving both speed and accuracy. Based on YOLOv7 combined with the ConvNeXt block, Du et al. [10] built a module with stronger flame feature extraction ability and a lighter structure. Based on the Faster R-CNN network, Liu Tongjun et al. [11] designed a flame recognition algorithm whose training set included flame images of various shapes as well as light sources of various shapes, which reduced the interference of bright objects on flame recognition and improved the accuracy of the system [17]. Gao et al. [12] introduced a channel attention module into YOLOv4, which significantly improved the detection effect.
In view of the performance requirements of flame detection, and of the relatively complex shape of flames, which are easily confused with the surrounding environment and with objects of similar appearance, an improved YOLOv7 flame detection model is proposed. In this model, an attention mechanism (the SE module) is introduced to obtain more important feature information, and the pure convolutional neural network ConvNeXt is introduced to improve recognition [21]. The anchor boxes are updated using the K-means++ clustering algorithm. Lastly, the effectiveness of the proposed method for flame detection is verified through comparative experiments.

YOLOv7 network models
YOLOv7 is mainly divided into two parts: Backbone and Head. The input side performs a series of operations such as data augmentation and scaling images to a fixed size [19][20]. The Backbone is the feature-extraction trunk of the network, including CBS convolutional layers, ELAN modules (Efficient Layer Aggregation Network), and max-pooling convolutions. The Head network is the same as in YOLOv5 and adopts the traditional PAFPN structure. The output of the Backbone is downsampled and enhanced by SPPCSP in a bottom-up path, enriching features at different levels. The obtained features are then fused to produce outputs at 3 different scales, corresponding to features of 3 sizes. The original network structure is shown in figure 1.
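As a concrete illustration of the basic Backbone unit, the CBS block (Conv, BatchNorm, activation) can be sketched in PyTorch. This is a generic reconstruction assuming the SiLU activation commonly used in the YOLOv5/YOLOv7 family, not the authors' exact code.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU: the basic building unit of the YOLOv7 Backbone."""
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        # "same" padding for odd kernel sizes; bias is absorbed by BatchNorm
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))
```

Stacking such units with different strides produces the progressively downsampled feature maps that the ELAN modules aggregate.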
Although YOLOv7 is an efficient detection network with good performance on many datasets, there are few datasets for flame detection. Compared with other targets, the shape of a flame is not fixed, and in chaotic backgrounds or with small target flames, YOLOv7 is prone to missed and false detections. To solve these problems, this paper improves the YOLOv7 algorithm and compares the modified algorithm with the original one, which verifies the usefulness of the proposed method for flame detection.

SE attention mechanism
The SE (Squeeze-and-Excitation) attention module is a submodule that can be built into other detection models. The SE module reconstructs the original features through a weight matrix, assigns weights to different positions of the image from the channel perspective, and obtains more important feature information. Figure 2 illustrates the structure of the SE module. In the literature [12][13], the SE module has greatly improved the performance of different models, which proves its effectiveness.
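The channel reweighting described above can be expressed compactly in PyTorch. The sketch below is a generic implementation of the original Squeeze-and-Excitation design (with a reduction ratio of 16, the value typically used), not the authors' exact module.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweights feature channels by global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),  # squeeze
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),  # excite
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))            # global average pooling -> (b, c)
        w = self.fc(w).view(b, c, 1, 1)   # channel attention weights
        return x * w                      # rescale the original features
```

Because the block preserves the input shape, it can be dropped into the backbone after any convolutional stage.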

ConvNeXt
Because the data set contains flame images with small targets, YOLOv7 loses small-target flame features during feature extraction, resulting in inadequate extraction of small-target flame features. In this paper, the YOLOv7 model is improved by introducing the ConvNeXt [14] architecture, which has stronger feature extraction ability.
ConvNeXt is a convolutional model based on ResNet and improved in five respects by drawing on the design ideas of the Swin Transformer [15][18]. The ConvNeXt structure is shown in figure 3.

Macro design.
ConvNeXt improves on ResNet and, like the Swin Transformer, has four stages. The difference is the block stacking ratio of each stage: [3,4,6,3] in ResNet versus [3,3,9,3] in ConvNeXt. At the same time, the stem is set to a 4×4 convolution with stride 4.

Depthwise convolution.
ConvNeXt draws on the idea of ResNeXt [16], replacing ordinary convolution with depthwise convolution; this improvement better balances model size and model performance.
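The parameter saving of depthwise convolution is easy to verify: setting `groups` equal to the channel count makes each filter see only one channel. The channel count 96 below is chosen purely for illustration (it is ConvNeXt's first-stage width, not a value from this paper).

```python
import torch.nn as nn

c = 96  # channel count, chosen for illustration

# standard 3x3 convolution: every output channel mixes all input channels
standard = nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False)
# depthwise 3x3 convolution: one filter per channel (groups == channels)
depthwise = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c, bias=False)

n_std = sum(p.numel() for p in standard.parameters())   # c * c * 3 * 3
n_dw = sum(p.numel() for p in depthwise.parameters())   # c * 3 * 3
print(n_std, n_dw)  # prints "82944 864": depthwise uses 1/c of the parameters
```

The lost cross-channel mixing is restored by the 1×1 (pointwise) convolutions that follow in the block, which is the same split the Swin Transformer makes between spatial attention and MLP mixing.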

Inverted bottleneck.
The standard ResNet uses a bottleneck structure. MobileNetV2 [18] uses an inverted bottleneck structure to prevent information loss when features are transformed between spaces of different dimensions. ConvNeXt also adopts this inverted bottleneck, which effectively avoids losing the flame feature information of small targets during downsampling.

Large kernel.
Modern convolutional neural networks mostly use 3×3 kernels, but the Swin Transformer showed that large kernels can be used again without hurting performance. Therefore, a 7×7 convolution kernel is used in ConvNeXt.

Micro design.
In addition to the above adjustments, ConvNeXt also makes adjustments at the micro scale. ConvNeXt replaces ReLU with GELU and employs fewer activation functions. It also employs fewer normalization layers and replaces batch normalization with layer normalization. ConvNeXt further draws on the Swin Transformer block; the comparison between the ConvNeXt block and the Swin Transformer block is shown in figure 3.
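Putting the pieces above together, a minimal PyTorch sketch of the ConvNeXt block (7×7 depthwise convolution, layer normalization, inverted 4× bottleneck with a single GELU) looks as follows. Details such as layer scale and stochastic depth are omitted; this is a generic reconstruction, not the authors' exact module.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """7x7 depthwise conv -> LayerNorm -> inverted bottleneck (4x) -> residual."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)           # normalizes over the channel dim
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise expand (inverted bottleneck)
        self.act = nn.GELU()                    # the block's single activation
        self.pwconv2 = nn.Linear(4 * dim, dim)  # pointwise project back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)  # back to (N, C, H, W)
        return shortcut + x        # residual connection
```

Note how the single GELU and single LayerNorm per block reflect the "fewer activation/normalization layers" choice described above.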
The point with the largest D(x) thus has the highest probability of being chosen as a new cluster center. Then, following the roulette-wheel method, the next cluster center is selected, and the process repeats until K cluster centers are obtained. These K centers are used to initialize the standard k-means clustering algorithm: for each class, its cluster center is recalculated, with the calculation formula shown in (2):

$$c_i = \frac{1}{|S_i|} \sum_{x \in S_i} x \qquad (2)$$

where $S_i$ is the set of samples assigned to class $i$.
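The seeding procedure described above can be sketched in plain Python. Here anchor candidates are (width, height) pairs and D(x) is taken as Euclidean distance for simplicity; anchor-box clustering implementations often use 1 − IoU as the distance instead, and the paper's exact choice is not shown.

```python
import random

def kmeans_pp_init(points, k, seed=0):
    """K-means++ seeding: spread initial centers via D(x)^2 roulette selection."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform random sample
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen center
        d2 = [min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)
              for p in points]
        # roulette wheel: points far from all centers are more likely chosen
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

After seeding, the centers feed the standard k-means loop of assignment and centroid recalculation.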

Experimental environment
The operating system used in this paper is Windows 10, with PyTorch 1.8.0 as the deep learning framework. All experiments are carried out in a virtual environment created by Anaconda; the editor is PyCharm Professional and the programming language is Python 3.7.10. Hardware acceleration uses cuDNN 8.0.5 and CUDA 11.1. The experimental environment is displayed in table 1. In equations (3)~(5), TP is the number of positive samples the model predicts correctly, FP is the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative.

Experimental dataset
At present, there are few open datasets for flame detection, and the flame environments in them are relatively simple. The data set employed consists of flame images crawled from the web by the authors. The image scenes include forest, indoor, outdoor, day, and night, and the set contains 2059 images.
The self-made data set is annotated with LabelImg, with "fire" as the only annotation category. The dataset is partitioned into training, validation, and test sets in a ratio of 6:2:2, on which the model is then evaluated.
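The 6:2:2 partition can be done with a simple shuffle-and-slice. The sketch below uses integer indices as stand-ins for the image paths; the paper does not specify how the split was implemented.

```python
import random

def split_dataset(items, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle and split items into train/val/test by the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test

train, val, test = split_dataset(range(2059))
print(len(train), len(val), len(test))  # prints "1235 411 413"
```

With 2059 images this yields 1235 training, 411 validation, and 413 test images.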

Clustering results
For this data set, the cluster centers are determined by applying the k-means++ clustering algorithm, giving the relationship between the number of candidate boxes and the average intersection-over-union. The clustering results of K-means++ are displayed in figure 4.
As depicted in figure 4, as the number of candidate boxes increases, the average intersection-over-union gradually rises; but when the number of candidate boxes is greater than 8 the curve gradually flattens out, so 9 anchor boxes are selected in this paper.

AP comparison.
The P, R, and AP values of the two models before and after improvement are displayed in table 2. The original YOLOv7 achieves a P of 67.0% and an AP of 66.7%, while the P and AP of the improved model are 79.0% and 68.0% respectively; the improved model therefore performs better on both. To test the detection effect of the improved model on flames in complex backgrounds and on small target flames, three pictures from the test set are chosen to compare against the original model, with the results shown in figure 5.
Comparative analysis shows that our algorithm performs better and is better suited to flame detection. The comparison in figure 5 shows that our model outperforms the original model on small target fires. For the small-target flame areas in figure 5(a)(c) that YOLOv7 fails to detect, the model in this paper detects them well.
As shown in figure 5(e)(f), the model can still detect fire well under the interference of bright objects. Therefore, the improvement to YOLOv7 in this paper is significant, with good detection accuracy both on interfered objects and on small targets.

Ablation experiment
SSD, YOLOv5s, [10], and the proposed algorithm were each used to detect the data set separately; the results are displayed in table 3.
To confirm the effectiveness of each improvement, an ablation experiment is carried out. The evaluation criteria include P, R, AP, and the size of the weight model. Experimental results for each improvement made to the model are displayed in table 4.
As shown in table 4, after experimental comparison of different combinations of networks, P is improved by 12% compared with the original method, the recall rate is improved by 3.5% despite some fluctuation, and AP is improved by 1.3%, which proves the superiority of the detection network.

Improvement of the prior anchor box
YOLOv7 uses pre-set anchor boxes, which are not consistent with the actual bounding boxes of flame targets. Suitable anchor boxes can greatly increase the accuracy of flame detection. For fast and accurate detection, the k-means++ algorithm is selected for clustering and initializing the anchor boxes; it improves the cluster-center initialization of the k-means algorithm. In the k-means++ algorithm, a sample point is randomly selected from the data set as the first cluster center $c_1$; then, at each iteration, the shortest distance $D(x)$ from each sample point to the already-chosen cluster centers is calculated. Each sample point $x$ is then selected as the next cluster center with probability $P(x)$, with the calculation formula shown in (1):

$$P(x) = \frac{D(x)^2}{\sum_{x \in X} D(x)^2} \qquad (1)$$

Comparative analysis of models
Model training. The flame data set is trained with the improved network model: the initial learning rate is set to 0.01, the momentum factor to 0.937, the training lasts 500 epochs, the batch size is 16, and the input size is 640×640×3. During training, the training results are saved and the weights are updated in real time; the weight at convergence is kept as the best weight. The loss curve is shown in figure 4. It can be seen that after 460 epochs of network training the curve becomes smooth, indicating that the model has converged, and the current weights are used for our experiments.

Figure 5. Comparison of the detection results.

Table 1. Experimental environment.

Accuracy P, recall R, and average precision AP are used as assessment metrics. Since there is only one detection target category in this experiment, the average precision can be represented by AP, calculated by (3)~(5):

$$P = \frac{TP}{TP + FP} \qquad (3)$$

$$R = \frac{TP}{TP + FN} \qquad (4)$$

$$AP = \int_0^1 P(R)\,dR \qquad (5)$$
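The metrics above can be computed directly from the TP/FP/FN counts. The sketch below approximates the AP integral with a simple trapezoidal area under the P-R curve; YOLO implementations typically use an interpolated variant, so this is illustrative rather than the paper's exact evaluation code.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """P = TP/(TP+FP), R = TP/(TP+FN); guards against empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(precisions, recalls):
    """AP as the trapezoidal area under the precision-recall curve."""
    pairs = sorted(zip(recalls, precisions))  # order points by recall
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(pairs, pairs[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2
    return ap
```

For example, 8 true positives with 2 false positives and 2 false negatives gives P = R = 0.8.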

Table 3. Comparison of detection results.

Table 4. Ablation experiment ('×' means the improvement technique in the corresponding column is not used; '√' means it is used).

Conclusion
Aiming at the shortcomings of deep learning in flame detection, we propose an improved YOLOv7 target detection model, which sets anchor boxes with K-means++ to enhance detection accuracy. The backbone network is enhanced by incorporating the SE attention mechanism to strengthen the channel-wise feature extraction capability of the network. ConvNeXt is introduced in the head to retain the features of small targets. The results show that the proposed method lifts AP compared with the original YOLOv7 model, which proves its effectiveness. Next, we will enhance the network model to attain improved detection speed and accuracy, especially in environments with fire-like features that are prone to false positives.