An adaptive label assignment scheme for slender object detection

Slender objects have a large aspect ratio and are generally oriented, which causes current general-purpose detectors to perform poorly on slender object detection. This paper therefore proposes an adaptive label assignment scheme for slender object detection. First, a central axis prior for positive training samples is proposed so that the final positions of the positive training samples are distributed more reasonably. Second, the number of positive training samples for slender objects is further increased to alleviate the imbalance of positive training samples between slender objects and regular objects. Experimental results on the MS COCO dataset demonstrate the effectiveness of the proposed method.


Introduction
Object detection aims to identify and locate the objects of interest in an image, and it is a fundamental yet challenging problem in computer vision. In recent years, object detection algorithms based on deep neural networks have developed rapidly and achieved great success in general object detection. However, industrial applications involve many special scenarios, such as small objects, dense objects, and slender objects [1][2][3]. Many excellent works have addressed small object detection and dense object detection, but to the best of our knowledge, there is little research on slender object detection.
Figure 1. Illustrations of redundant background and overlap in horizontal bounding boxes: (a) redundant background, (b) overlap.
Horizontal bounding boxes are commonly used to represent the position of objects in object detection [4]. However, this representation has two problems. First, when an object is oriented or irregular in shape, the box contains redundant background or parts of other objects, as shown in Figure 1(a). Second, when multiple oriented objects appear densely, their horizontal bounding boxes overlap heavily, as shown in Figure 1(b). Slender objects have a large aspect ratio, so we argue that these problems have a great impact on the detection performance of slender objects.
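Both problems can be quantified with a short Python sketch (our own illustration, not code from the paper). It assumes a rotated box is given as (cx, cy, w, h, θ), with θ the angle between the x-axis and the width side, builds two parallel 100 × 10 objects tilted by 45° that do not touch each other, and measures how much of each horizontal box the object fills and how strongly the two horizontal boxes overlap.

```python
import math

def rbox_corners(cx, cy, w, h, theta):
    """Corners of a rotated box whose width side makes angle theta (radians) with the x-axis."""
    ux, uy = math.cos(theta) * w / 2, math.sin(theta) * w / 2    # half-width vector
    vx, vy = -math.sin(theta) * h / 2, math.cos(theta) * h / 2   # half-height vector
    return [(cx + a * ux + b * vx, cy + a * uy + b * vy)
            for a, b in ((1, 1), (1, -1), (-1, -1), (-1, 1))]

def horizontal_box(corners):
    """Tightest axis-aligned box (x1, y1, x2, y2) enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou(a, b):
    """IoU of two axis-aligned boxes."""
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    return inter / (box_area(a) + box_area(b) - inter)

# Two parallel 100 x 10 objects at 45 degrees, 20 px apart across their long axis.
theta = math.pi / 4
off_x, off_y = -math.sin(theta) * 20, math.cos(theta) * 20   # shift along the normal
stick_a = (50.0, 50.0, 100.0, 10.0, theta)
stick_b = (50.0 + off_x, 50.0 + off_y, 100.0, 10.0, theta)

hbb_a = horizontal_box(rbox_corners(*stick_a))
hbb_b = horizontal_box(rbox_corners(*stick_b))
fill_ratio = (100.0 * 10.0) / box_area(hbb_a)   # object area / horizontal box area
print(f"object fills {fill_ratio:.1%} of its horizontal box")
print(f"horizontal-box IoU between non-touching objects: {iou(hbb_a, hbb_b):.3f}")
```

It prints about 16.5% and 0.503: more than 80% of each horizontal box is background, and the two horizontal boxes reach an IoU above 0.5 although the objects are disjoint, which could be enough for non-maximum suppression at a 0.5 threshold to remove one of them.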
To address these problems, state-of-the-art detectors mainly work from the perspective of label assignment [5,6]. OTA formulates label assignment as an optimal transport problem: it calculates the transportation cost between all ground-truth boxes (gts) and prediction boxes, and then obtains the lowest total transportation cost by finding the optimal mapping. Compared with other methods, OTA considers the global information of all gts and can find the globally optimal label assignment [7]. SimOTA replaces the iterative Sinkhorn-Knopp solution used in OTA with a dynamic k estimation strategy, reducing training time by 25% without compromising performance [8]. All these label assignment methods adopt the center prior, which restricts positive samples to lie within a distance r from the center of the object. However, we expect the positive samples of slender objects to be distributed along the central axis, so the center prior is not suitable for slender objects.
Besides, we find that slender objects generally make up a small proportion of the datasets and that their position regression accuracy is lower than that of regular objects; as a result, the dynamic k estimation algorithm assigns them fewer positive training samples than regular objects. This imbalance of positive samples between slender objects and regular objects further degrades the detection performance on slender objects.
To address these problems, we propose an adaptive label assignment scheme that improves the detection performance of slender objects. First, the prediction boxes are replaced by rotated boxes to reduce redundant background and overlap. Then, an adaptive central axis prior is proposed to make the position distribution of the finally selected positive training samples more reasonable. Finally, to alleviate the imbalance in the number of positive samples between slender objects and regular objects, we increase the number of positive samples of slender objects, which further improves their detection performance.
For convenience of evaluation and analysis, a slenderness measure of objects needs to be defined. Horizontal bounding boxes are commonly used in object detection, and slenderness s can be approximately computed from the width w and height h of the horizontal bounding box as s = h/w. However, this definition is inapplicable to oriented slender objects: as shown in Figure 2, s computed this way does not reflect the slender shape of the skateboard. A more precise approach is to use the rotated bounding box that covers the object with minimal area. With w and h denoting the side lengths of this rotated box, slenderness is defined as:
$$s = \frac{\min(w, h)}{\max(w, h)} \tag{1}$$
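The slenderness of Equation (1), read here as s = min(w, h)/max(w, h) over the sides of the minimum-area rotated box, can be computed from an object's outline with a short sketch. It assumes the object is given as a list of 2D points, finds the minimum-area rectangle by trying every convex hull edge as a rectangle side (an optimal rectangle always has a side collinear with a hull edge), and compares the result with the same ratio on the horizontal box; all function names are our own.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rect_sides(points):
    """Side lengths (w, h) of the minimum-area enclosing rectangle."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        ang = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(ang), math.sin(ang)
        us = [x * c + y * s for x, y in hull]    # coordinates along the edge
        vs = [-x * s + y * c for x, y in hull]   # coordinates across the edge
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0] * best[1]:
            best = (w, h)
    return best

def slenderness(points):
    w, h = min_area_rect_sides(points)
    return min(w, h) / max(w, h)      # ratio on the minimum-area rotated box

def horizontal_slenderness(points):
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return min(w, h) / max(w, h)      # same ratio on the horizontal box

# A 10 x 2 "skateboard" rotated by 45 degrees.
c45 = math.cos(math.pi / 4)
board = [((x - y) * c45, (x + y) * c45) for x, y in [(5, 1), (5, -1), (-5, -1), (-5, 1)]]
print(round(slenderness(board), 3), round(horizontal_slenderness(board), 3))  # 0.2 1.0
```

The horizontal box of the tilted board is a square, so its ratio of 1.0 makes the board look like a regular object, while the rotated box recovers its 1:5 shape, which is the point Figure 2 illustrates.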

Rotated object detection head
Current general object detectors can only predict horizontal bounding boxes. To precisely predict the minimum-area bounding boxes of objects, a detector must also be able to predict their rotation angle. One of the simplest and most intuitive methods is to add an angle regression branch to the detection head, which gives the detector the ability to predict the angle of the bounding boxes. Taking the decoupled detection head of YOLOX as an example, the added angle regression branch is shown in Figure 3.
SimOTA mainly consists of the following procedures. First, the candidate region of positive samples is determined: the anchors within a 3×3 square range of the object center are taken as candidate positive samples. Then, the cost, or matching degree, between all anchors and gts is calculated as the weighted sum of the classification loss and the regression loss, as shown in Equation (2); this makes the assignment loss-aware. Finally, the dynamic k estimation algorithm predicts the number k of positive samples to be assigned to each gt, and the top k anchors with the smallest cost values are assigned as the positive samples of that gt.
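The three SimOTA steps can be sketched in a few lines of Python. This is a simplified illustration, not YOLOX's code: the candidate region is passed in as a boolean mask, the regression term is assumed to be the IoU loss −log(IoU) weighted by a hypothetical `lam = 3.0` (the paper's Equation (2) may weight the terms differently), `q = 10` is an illustrative default, and "rounded" is read as truncation toward zero.

```python
import math

def simota_assign(cls_cost, ious, candidate, lam=3.0, q=10):
    """Simplified SimOTA-style assignment.

    cls_cost[i][j]:  classification loss if anchor j predicts gt i.
    ious[i][j]:      IoU between anchor j's predicted box and gt i.
    candidate[i][j]: True if anchor j lies in gt i's candidate region.
    Returns {anchor index: gt index} for the positive samples.
    """
    num_gt, num_anchor = len(ious), len(ious[0])
    matched = {}  # anchor -> (gt, cost)
    for i in range(num_gt):
        cand = [j for j in range(num_anchor) if candidate[i][j]]
        if not cand:
            continue
        # Step 2: cost = classification loss + lam * IoU loss (loss-aware matching).
        cost = {j: cls_cost[i][j] - lam * math.log(ious[i][j] + 1e-8) for j in cand}
        # Step 3: dynamic k = sum of the q largest IoUs, truncated, at least 1.
        top_ious = sorted((ious[i][j] for j in cand), reverse=True)[:q]
        k = max(1, int(sum(top_ious)))
        for j in sorted(cand, key=cost.get)[:k]:
            # An anchor claimed by several gts goes to the gt with the lowest cost.
            if j not in matched or cost[j] < matched[j][1]:
                matched[j] = (i, cost[j])
    return {j: gt for j, (gt, _) in matched.items()}

ious = [[0.95, 0.9, 0.2, 0.0],
        [0.0, 0.7, 0.9, 0.6]]
candidate = [[True, True, True, False],
             [False, True, True, True]]
cls_cost = [[0.5] * 4, [0.5] * 4]
print(simota_assign(cls_cost, ious, candidate))  # {0: 0, 1: 0, 2: 1}
```

Both gts get k = 2 here, and both want anchor 1; it goes to gt 0 because its cost there is lower, so gt 1 ends up with a single positive sample.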

Central axis prior.
To stabilize training and speed up convergence, YOLOX and other anchor-free object detectors first determine a candidate region of positive samples and then dynamically select the positive samples from this prior region. Anchors near the center of an object are generally considered to characterize the object better, so a square area within a certain distance from the object center is generally used as the candidate region; this is called the center prior and is shown with the horizontal bounding box in Figure 4. However, this practice is not suitable for slender objects. We believe that the features learned by anchors located near the central axis of an object carry richer semantic information and characterize the object better, which benefits subsequent classification and localization. Therefore, this paper proposes the central axis prior, which mainly expands the candidate region of positive samples to the rotated rectangle
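The difference between the two candidate regions can be illustrated with a short sketch. The exact shape of the paper's central axis region is not given here, so the strip below is a hypothetical stand-in: it covers the object's full length along its long axis and extends `r` strides on either side of that axis, with `r = 1.5` chosen so that the center prior reproduces a 3×3 square of anchors. Boxes use the assumed (cx, cy, w, h, θ) convention with θ the angle of the width side.

```python
import math

def center_prior(points, box, stride, r=1.5):
    """Square candidate region: within r strides of the object center on both axes."""
    cx, cy = box[0], box[1]
    return [abs(x - cx) <= r * stride and abs(y - cy) <= r * stride for x, y in points]

def central_axis_prior(points, box, stride, r=1.5):
    """Hypothetical strip along the object's long axis, within r strides of that axis."""
    cx, cy, w, h, theta = box
    if w >= h:                        # long side is the width side
        axis, half_len = theta, w / 2
    else:                             # long side is the height side
        axis, half_len = theta + math.pi / 2, h / 2
    c, s = math.cos(axis), math.sin(axis)
    mask = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        along = dx * c + dy * s       # position along the central axis
        across = -dx * s + dy * c     # distance from the central axis
        mask.append(abs(along) <= half_len and abs(across) <= r * stride)
    return mask

# Anchor points at the centers of an 8-px grid covering a 128 x 128 image.
stride = 8
points = [(stride * i + stride / 2, stride * j + stride / 2) for j in range(16) for i in range(16)]
box = (64.0, 64.0, 120.0, 8.0, math.pi / 4)      # a slender object at 45 degrees
center = center_prior(points, box, stride)
axis = central_axis_prior(points, box, stride)
print(sum(center), sum(axis))  # 16 52
```

The center prior keeps a small square that includes background corners such as the anchor at (76, 52), whereas the axis strip drops that anchor and instead reaches anchors such as (100, 100) that lie on the object far from its center.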

Increase the number of positive samples for slender objects.
Dynamic k estimation was proposed in SimOTA to adaptively estimate an appropriate number of positive samples for each gt. Specifically, the IoUs between all prediction boxes and gts are calculated, the q largest IoU values are summed for each gt, and the rounded result is used as that gt's number of positive samples. We observe that slender objects receive significantly fewer positive samples than regular objects: their position predictions are relatively inaccurate, so the IoUs between their prediction boxes and gts are generally small. In addition, slender objects generally make up a small proportion of the training datasets. Together, these two factors lead to an imbalanced number of positive samples, which biases the model towards regular objects. Therefore, this paper improves the dynamic k estimation algorithm: increasing the number of positive samples of slender objects, as shown in Equation (4), alleviates the imbalance and makes the training of slender objects more sufficient.
Experiments are conducted on MS COCO 2017 [10], which contains about 118k training images and 20k test images. Following common practice, we train detectors on the training set and compare performance on the test set. The YOLOX-M model with a modified CSPNet backbone pre-trained on ImageNet is used as the baseline. We use stochastic gradient descent (SGD) with a weight decay of 5e-4 and momentum of 0.9 as the optimizer, and a cosine learning rate schedule with a base learning rate of 0.01 for a total of 300 epochs. During training, we use a multi-scale strategy in which the input size is drawn uniformly from 416 to 768 with a stride of 32; during testing, the input size is 640×640. The three hyperparameters used in Equation (4) are set to 1.5, 1.2, and 10. Besides the AP commonly used in object detection, we also report APXS, APS, and APR for extra slender, slender, and regular objects, respectively, analogous to the size-based APS, APM, and APL commonly used on COCO.
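The modified dynamic k estimation can be sketched as follows. The sketch assumes one possible form of Equation (4): the SimOTA estimate (sum of the q largest IoUs) is multiplied by a factor alpha = 1.5 for extra slender objects and beta = 1.2 for slender objects before truncation. The mapping of 1.5 and 1.2 to these factors, q = 10, and the slenderness thresholds 1/5 and 1/3 (with s = min(w, h)/max(w, h)) are all hypothetical choices for illustration.

```python
def slenderness_factor(s, s_xs=1 / 5, s_s=1 / 3, alpha=1.5, beta=1.2):
    """Hypothetical multiplier: alpha for extra slender, beta for slender, 1 otherwise.

    s is min(w, h) / max(w, h); the thresholds s_xs and s_s are illustrative assumptions.
    """
    if s < s_xs:
        return alpha
    if s < s_s:
        return beta
    return 1.0

def dynamic_k(ious, s, q=10):
    """SimOTA-style dynamic k, scaled up for slender objects (assumed form of Eq. (4))."""
    top = sorted(ious, reverse=True)[:q]
    return max(1, int(slenderness_factor(s) * sum(top)))

ious = [0.75, 0.75, 0.75, 0.75, 0.5, 0.5, 0.5, 0.5]   # q largest IoUs sum to 5.0
print(dynamic_k(ious, 0.5), dynamic_k(ious, 0.25), dynamic_k(ious, 0.1))  # 5 6 7
```

With identical IoUs, a regular object gets 5 positive samples while a slender and an extra slender object get 6 and 7, which is the kind of compensation the paragraph above describes for objects whose predicted boxes tend to have low IoU.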

Results and discussion
As shown in Table 1, the YOLOX-M baseline model achieves 47.2% AP. However, its APXS and APS are only 22.9% and 36.5%, respectively, far below its APR of 52.8%, so we conclude that the detection performance on slender objects is lower than that on regular objects. Changing the representation from horizontal bounding boxes to rotated bounding boxes reduces redundant background and overlap: the APXS and APS increase to 23.4% and 36.9%, respectively, and the APR also increases from 52.8% to 53.0%. After applying the central axis prior, the performance on slender objects improves further, with the APXS and APS increasing by 2.8% and 1.7%, respectively, while the APR is unchanged and the AP increases from 47.4% to 47.6%. When we further increase the number of positive samples of slender objects, the APXS increases from 26.2% to 27.3% and the APS from 38.6% to 39.1%, while the APR drops slightly by 0.1% and the AP stays unchanged. With all three strategies applied together, the APXS, APS, and APR increase by 4.4%, 2.6%, and 0.1%, respectively, and the AP increases by 0.4%.
We visualize some label assignment results in Figure 5; for better visualization, only the FPN layer with the largest number of positive training samples is shown. When predicting horizontal bounding boxes and adopting the center prior, the positive samples of slender objects are assigned to the region near the object center, and some of them fall into the background or overlapping regions. When adopting rotated bounding boxes and the central axis prior, the positive samples are distributed along the central axis of the object and almost none fall into overlapping areas, so their position distribution is more reasonable. After the dynamic k estimation algorithm is improved, more positive training samples are assigned to slender objects.

Conclusion
In this work, we propose an adaptive label assignment strategy for slender object detection. We point out that horizontal bounding boxes are not suitable for slender objects, and that rotated bounding boxes align better with slender objects and enable more accurate identification. We then propose changing the candidate region of positive training samples from the commonly used center prior to the central axis prior, which helps select more suitable positive samples for network training. Furthermore, we find that, compared with regular objects, slender objects make up a lower proportion of the dataset and have lower localization accuracy, which results in an imbalance of positive samples. Therefore, we propose to further increase the number of positive samples of slender objects by improving the dynamic k estimation algorithm.

Figure 2. An illustration of the definition of slenderness, s = min(w, h)/max(w, h), where w and h are the side lengths of the minimum-area rotated bounding box of the object.

There are three common ways to define the angle of rotated boxes. In this paper we use the OpenCV definition, with the angle θ ∈ (0, π/2] [9].
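Restricting θ to (0, π/2] is possible because a rectangle turned by a quarter turn with its two sides swapped is the same rectangle. The sketch below normalizes an arbitrary (cx, cy, w, h, θ) box into that range; it assumes θ is measured from the x-axis to the width side, which is our reading of the convention rather than a specification of OpenCV's API, and the function names are hypothetical.

```python
import math

def normalize_oc(cx, cy, w, h, theta):
    """Map a rotated box to an equivalent one with theta in (0, pi/2].

    A quarter turn plus a swap of w and h describes the same rectangle, so theta can be
    shifted by multiples of pi/2 as long as w and h are swapped on odd shifts.
    """
    half_pi = math.pi / 2
    n = math.ceil(theta / half_pi) - 1      # quarter turns to remove
    theta -= n * half_pi
    if n % 2:                               # odd number of quarter turns: swap sides
        w, h = h, w
    return cx, cy, w, h, theta

def corners(cx, cy, w, h, theta):
    """Sorted, rounded corners, used to check that two parameterizations match."""
    ux, uy = math.cos(theta) * w / 2, math.sin(theta) * w / 2
    vx, vy = -math.sin(theta) * h / 2, math.cos(theta) * h / 2
    return sorted((round(cx + a * ux + b * vx, 6), round(cy + a * uy + b * vy, 6))
                  for a in (1, -1) for b in (1, -1))

box = (0.0, 0.0, 10.0, 2.0, 3 * math.pi / 4)
norm = normalize_oc(*box)
print(norm[2], norm[3], round(norm[4], 6), corners(*box) == corners(*norm))  # 2.0 10.0 0.785398 True
```

Keeping a single canonical range avoids having two different regression targets for the same box, although it introduces a discontinuity at the range boundary that an angle regression branch has to cope with.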

Figure 3. A simple rotated object detection head.

Figure 4. Comparison of center prior and central axis prior.

Figure 5. Visualization of label assignment results. (a) YOLOX; (b) applying rotated boxes and the central axis prior; (c) increasing the number of positive samples of slender objects.

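Table 1. Performance comparisons on the MS COCO 2017 test-dev set.
Method                                   AP     APXS   APS    APR
YOLOX-M (baseline)                       47.2   22.9   36.5   52.8
+ rotated bounding boxes                 47.4   23.4   36.9   53.0
+ central axis prior                     47.6   26.2   38.6   53.0
+ more positive samples for slender      47.6   27.3   39.1   52.9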