Research on YOLOv3 algorithm based on darknet framework

Traffic jams and accidents occur frequently in modern cities. In the context of smart cities, target detection technology can support effective intelligent traffic control. To address the slow detection speed and low accuracy of traditional vehicle detection algorithms, this paper proposes a YOLOv3 algorithm based on K-means++, which uses K-means++ clustering to improve the accuracy of bounding box detection. Compared with the original YOLOv3 detection algorithm, the improved algorithm improves both detection speed and accuracy. Experiments show that, in actual tests, the improved algorithm achieves a higher recognition rate for small targets while reducing the false detection rate and improving accuracy.


Introduction
With the development of modern cities and rising consumption levels, urban traffic jams and traffic accidents occur frequently, causing great inconvenience to people's work and daily life. In response, the concept of the smart city has been proposed, with the aim of managing traffic intelligently so as to relieve congestion effectively and provide evidence of traffic violations [1]. Meanwhile, researchers in China and abroad have gradually applied target detection technology to vehicle detection. In traditional machine learning, researchers extract target features and feed them to classifiers such as support vector machines and iterative classifiers for recognition; however, these methods work only on small-scale data and generalize poorly, which makes it difficult to identify targets accurately [2]. In deep learning, convolutional neural networks, which combine artificial neural networks with convolution operations, have given rise to many target detection algorithms. YOLO (You Only Look Once) is an end-to-end detection algorithm that maintains high accuracy while detecting rapidly [3]. Owing to its good detection accuracy and speed, the single-stage detector YOLOv3 has become a mainstream detection algorithm in engineering applications. This paper improves the YOLOv3 algorithm, and experiments show that the improved algorithm achieves higher target detection accuracy than the original YOLOv3 algorithm.

The basic idea of the YOLO algorithm
The basic principle of the YOLOv1 algorithm is to divide an image into S × S grid cells; if the center of an object falls in a grid cell, that cell is responsible for predicting the object [4]. Each grid cell has B bounding boxes, and each bounding box has five parameters: x, y, w, h and confidence. The first four record the position and size of the bounding box, and the confidence reflects whether the predicted box contains an object and how accurate the box is. It is calculated as
$$\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}_{\text{pred}}^{\text{truth}} \qquad (1)$$
If a grid cell contains an object, the first term is 1; otherwise it is 0. The second term is the IOU between the predicted bounding box and the ground-truth box.
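As a minimal sketch of the confidence in equation (1), the Python below computes the IOU of a predicted box and a ground-truth box, both assumed to be given in (x, y, w, h) center format, and multiplies it by $\Pr(\text{Object})$, which is 1 or 0. The function names and the center-format convention are illustrative assumptions, not taken from the YOLO code.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x_center, y_center, w, h)."""
    # convert center format to corner coordinates
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # width and height of the overlap (0 if the boxes do not intersect)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def confidence_target(cell_has_object, pred_box, truth_box):
    """Equation (1): Pr(Object) * IOU(pred, truth), with Pr(Object) in {0, 1}."""
    pr_object = 1.0 if cell_has_object else 0.0
    if pr_object == 0.0:
        return 0.0  # no object in the cell: confidence is 0
    return pr_object * iou(pred_box, truth_box)


pred = (0.5, 0.5, 0.4, 0.4)
truth = (0.6, 0.5, 0.4, 0.4)
print(round(confidence_target(True, pred, truth), 3))  # 0.6
```

Shifting the prediction by a quarter of its width leaves an overlap of 0.12 against a union of 0.2, hence a confidence of 0.6.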
Each grid cell also predicts class information, that is, the probability that the object belongs to each category, recorded as C. In the YOLO model there are 20 categories, so C contains 20 parameters. It follows that the output for each image is a tensor of size
$$S \times S \times (B \times 5 + C)$$
The network structure is shown in Figure 1 (YOLOv1 network structure). The YOLOv1 algorithm uses convolutional layers to extract features and fully connected layers to produce the predicted values; its detection network contains 24 convolutional layers and 2 fully connected layers.
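The output size follows directly from counting parameters per cell. The sketch below assumes the 7 × 7 grid used in the original YOLOv1 paper, together with B = 2 boxes per cell and C = 20 classes; the helper name is hypothetical.

```python
def yolov1_output_shape(S=7, B=2, C=20):
    """Shape of the YOLOv1 prediction tensor: S x S x (B * 5 + C)."""
    # each of the B boxes contributes x, y, w, h and confidence (5 values);
    # the C class probabilities are shared by all boxes in the cell
    return (S, S, B * 5 + C)


shape = yolov1_output_shape()
num_values = shape[0] * shape[1] * shape[2]
print(shape)       # (7, 7, 30)
print(num_values)  # 1470
```

Because the class probabilities are counted once per cell rather than once per box, the per-cell depth is B × 5 + C and not B × (5 + C).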
Although the YOLOv1 algorithm runs fast and recognizes objects accurately, it still has some defects. For example, if a grid cell contains no object, then $\Pr(\text{Object}) = 0$. Moreover, although each grid cell has two bounding boxes, they share the same set of class probabilities, so a cell can recognize only one object, which leads to inaccurate recognition and missed detections.

YOLOv3 algorithm
The YOLOv3 algorithm adopts the anchor boxes of the Faster R-CNN algorithm, which effectively improves the accuracy with which bounding boxes localize objects [5]. Anchor boxes are the box shapes and sizes that occur most frequently among all ground-truth boxes in the training set, and they are obtained with k-means clustering [6]. YOLOv3 predicts bounding boxes in the same way as YOLO9000, determining the anchor boxes through dimension clustering. For each bounding box the network predicts four coordinate offsets, $t_x, t_y, t_w, t_h$. If a cell of the feature map is offset by $(c_x, c_y)$ from the upper-left corner of the image, and the prior (anchor) box of the bounding box has width and height $p_w, p_h$, then the predicted coordinates, at the feature-map scale, are
$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$
where $\sigma$ is the sigmoid function. $g_x, g_y, g_w, g_h$ denote the ground-truth box mapped onto the feature map. The offsets $t_x, t_y, t_w, t_h$ are learned so that $b_x, b_y, b_w, b_h$ agree with $g_x, g_y, g_w, g_h$, as shown in Figure 2.
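The offset-to-box mapping described in this section can be sketched in Python as follows. `decode` turns the offsets $t$ into a box $b$ at the feature-map scale; `encode` is a hypothetical helper that inverts the mapping to show which offsets reproduce a ground-truth box $g$ exactly, assuming the center offset $g_x - c_x$ lies strictly inside the cell. Many implementations instead regress $\sigma(t_x)$ directly towards $g_x - c_x$ during training, so `encode` is an illustration, not a claim about the training target used here.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode(t, cell, prior):
    """Offsets (t_x, t_y, t_w, t_h) -> box (b_x, b_y, b_w, b_h) in feature-map units."""
    tx, ty, tw, th = t
    cx, cy = cell    # offset of the grid cell from the top-left corner
    pw, ph = prior   # anchor (prior) width and height
    bx = sigmoid(tx) + cx   # sigmoid keeps the center inside its cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)  # exponential keeps width and height positive
    bh = ph * math.exp(th)
    return bx, by, bw, bh


def encode(g, cell, prior):
    """Hypothetical inverse of decode: offsets whose decoded box equals g."""
    gx, gy, gw, gh = g
    cx, cy = cell
    pw, ph = prior
    sx, sy = gx - cx, gy - cy  # must lie strictly between 0 and 1
    tx = math.log(sx / (1.0 - sx))  # logit, the inverse of the sigmoid
    ty = math.log(sy / (1.0 - sy))
    tw = math.log(gw / pw)
    th = math.log(gh / ph)
    return tx, ty, tw, th


t = encode((3.5, 2.25, 2.0, 1.5), cell=(3, 2), prior=(1.0, 3.0))
print(decode(t, cell=(3, 2), prior=(1.0, 3.0)))  # approximately (3.5, 2.25, 2.0, 1.5)
```

With all offsets at zero, the decoded box sits at the center of its cell and has exactly the anchor's size, which is why anchors that match typical object shapes make the regression easier.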

Improved YOLOv3 algorithm
The original YOLOv3 algorithm uses the K-means algorithm, which randomly selects K points in the dataset as the initial cluster centers. Because of this random initialization, the clustering result is uncertain, and adding statistical prior (or human) experience to the model requires multiple experiments. This paper therefore proposes a YOLOv3 algorithm based on K-means++ [7][8][9].
The K-means++ algorithm selects the K cluster centers as follows. Suppose that n initial cluster centers ($0 < n < K$) have already been selected. When selecting the (n + 1)-th cluster center, points farther from the current n cluster centers have a higher probability of being selected. The first cluster center ($n = 1$) is also selected at random [10].
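The seeding rule can be sketched in Python as below, clustering boxes by width and height only. Two assumptions are made because the text does not fix them: the distance is d = 1 - IOU between (w, h) pairs aligned at a common center, a common choice for anchor clustering, and "farther points are more likely" is implemented with the standard K-means++ weighting, in which a box is picked with probability proportional to its squared distance to the nearest chosen center. All names are hypothetical.

```python
import random


def wh_iou(a, b):
    """IOU of two boxes given only as (w, h), aligned at a common center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def distance(box, center):
    """Assumed clustering distance: 1 - IOU (0 for identical shapes)."""
    return 1.0 - wh_iou(box, center)


def kmeanspp_seeds(boxes, k, seed=0):
    """Pick k initial anchor centers from (w, h) boxes with K-means++ seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(boxes)]  # first center: chosen uniformly at random
    while len(centers) < k:
        # squared distance from each box to its nearest already-chosen center
        d2 = [min(distance(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:
            # every box coincides with a chosen center; fall back to uniform
            centers.append(rng.choice(boxes))
        else:
            # farther boxes are proportionally more likely to be picked
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


boxes = [(10, 12), (12, 10), (100, 90), (90, 110), (300, 60), (310, 50)]
print(kmeanspp_seeds(boxes, 3))
```

A box that is already a center has distance 0 and therefore weight 0, so it cannot be drawn again while other boxes remain at a positive distance.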
The following are the specific steps for applying the K-means++ algorithm to the YOLOv3 algorithm:
(1) The raw clustering data is a detection dataset containing only labeled boxes. YOLOv3 generates a TXT file recording the location and category of each labeled box, where each line contains […]
[…] to the cluster center closest to it in "distance";
(4) After all labeled boxes have been assigned, the cluster center of each cluster is recalculated as the mean width and height of its members:
$$W_i' = \frac{1}{N_i}\sum_{j=1}^{N_i} w_j, \qquad H_i' = \frac{1}{N_i}\sum_{j=1}^{N_i} h_j$$
where $N_i$ is the number of boxes in cluster $i$ and $w_j, h_j$ are their widths and heights.
The experimental platform is configured as follows: the operating system is Windows 7, the CUDA version is 10.0, the cuDNN version is 7.4.1, and TensorFlow 2.0 is used for training.
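The assignment and update steps of the clustering can be sketched in Python as below. The sketch assumes the "distance" is 1 - IOU between (w, h) pairs (so the closest center is the one with the largest IOU) and recomputes each center as the mean width and height of its members, the standard K-means update. In the method described here the initial centers come from K-means++ seeding; for brevity this hypothetical `cluster_anchors` helper takes them as an argument.

```python
def wh_iou(a, b):
    """IOU of two boxes given only as (w, h), aligned at a common center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def cluster_anchors(boxes, init_centers, max_iter=100):
    """Refine (w, h) anchor centers by repeated assignment and mean update."""
    centers = [tuple(map(float, c)) for c in init_centers]
    for _ in range(max_iter):
        # assignment: each box joins the center with the smallest 1 - IOU,
        # i.e. the largest IOU
        clusters = [[] for _ in centers]
        for b in boxes:
            best = max(range(len(centers)), key=lambda i: wh_iou(b, centers[i]))
            clusters[best].append(b)
        # update: each center becomes the mean width and height of its members
        new_centers = []
        for c, members in zip(centers, clusters):
            if not members:  # keep an empty cluster's center unchanged
                new_centers.append(c)
                continue
            n = len(members)
            new_centers.append((sum(w for w, _ in members) / n,
                                sum(h for _, h in members) / n))
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    # sort anchors from smallest to largest area
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


boxes = [(10, 12), (12, 10), (100, 90), (90, 110), (300, 60), (310, 50)]
anchors = cluster_anchors(boxes, init_centers=[(10, 12), (100, 90), (300, 60)])
print(anchors)  # [(11.0, 11.0), (95.0, 100.0), (305.0, 55.0)]
```

Using 1 - IOU rather than Euclidean distance keeps large boxes from dominating the clustering, since IOU is scale-invariant in the sense that it compares shapes relative to their size.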

Experiment and result analysis
During verification, a match is counted as successful when the predicted probability is greater than 50% and the IOU between the predicted box and the manually labeled ground-truth rectangle is greater than 0.5. The actual test uses cars in a video recorded on a road. Frames 1, 25, 50, 75 and 100 were tested, and the experimental results are shown in the figure below (left: results of the YOLOv3 algorithm; right: results of the algorithm in this paper). The test accuracy is shown in Table 1. In frames 1 and 100, the algorithm in this paper shows better recognition ability and accuracy, with a significantly higher recognition rate. In frames 25, 50 and 75, the algorithm in this paper performs better than YOLOv3 on the blurrier small targets, which the YOLOv3 algorithm misses. In summary, the algorithm in this paper identifies targets better and more accurately.
ICAMLDS 2020 Journal of Physics: Conference Series 1629 (2020) 012062 IOP Publishing doi:10.1088/1742-6596/1629/1/012062
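The matching criterion used during verification can be sketched in Python as below. Boxes are assumed to be (x1, y1, x2, y2) pixel corners, and each ground-truth box may be matched at most once, greedily in descending order of probability; the one-to-one rule and all names are assumptions, since the text states only the two thresholds (probability > 0.5 and IOU > 0.5).

```python
def iou_xyxy(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(predictions, truths, prob_thr=0.5, iou_thr=0.5):
    """Count predictions with probability > prob_thr and IOU > iou_thr.

    predictions: list of (probability, box); truths: list of boxes.
    Each ground-truth box is matched at most once (assumed greedy rule).
    """
    used = set()
    matched = 0
    for prob, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        if prob <= prob_thr:
            continue  # probability must exceed 50%
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(truths):
            if j in used:
                continue
            v = iou_xyxy(box, gt)
            if v > best_iou:  # IOU must exceed 0.5
                best_j, best_iou = j, v
        if best_j is not None:
            used.add(best_j)
            matched += 1
    return matched


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 1, 10, 10)), (0.8, (20, 20, 29, 31)),
         (0.4, (0, 0, 10, 10)), (0.7, (50, 50, 60, 60))]
print(count_matches(preds, truths))  # 2
```

The third prediction overlaps a car perfectly but is rejected by the probability threshold, and the fourth passes the threshold but overlaps nothing, so only two matches are counted.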

Conclusions
This paper builds on the YOLOv3 algorithm and improves the clustering algorithm used to determine the sizes of the prior (anchor) bounding boxes, replacing the K-means algorithm of the original YOLOv3 with the K-means++ algorithm, which effectively improves the accuracy of the bounding boxes. Experimental results show that, compared with the original YOLOv3 network, the improved YOLOv3 algorithm achieves higher detection accuracy, a lower misrecognition rate and good robustness for small targets in real road target detection tasks.