Improved Object Detection Using a Data Enhancement Method Based on Generative Adversarial Nets

In deep learning-based object detection, and especially in face detection, small targets and small faces have long been a common practical difficulty because of their low resolution, blur, limited information and high noise. In some applications, sensing image data is hard to collect, which limits object detection performance. In this paper, we investigate using a generative adversarial network (GAN) model to augment data for object detection in images. We use the GAN to generate diverse objects based on the existing image data. An improved GAN is added to the network, and a new loss function is applied during training to generate diverse, high-quality training images. Experiments show that images generated by the proposed GAN have higher quality than those of its counterparts.


Introduction
Small object detection is one of the most fundamental parts of image understanding and computer vision. It is technically the basis of many other computer vision applications, such as real-time object tracking [1], image object segmentation [2,3], image captioning and labelling [4], person action capturing [5] and virtual scene understanding [6]. If the images themselves have a low resolution, small objects may be represented by only a few pixels. With such limited information, even humans would not perform well. In addition, most computer vision architectures scale the input to a fixed size, which is usually very small, and this may reduce the image size further. Moreover, many computer vision models use pooling layers, which may remove features from the images; this becomes a serious problem if features of small objects are lost after pooling. While detecting small objects may not always be necessary, it can be crucial in some real-world problems. For example, vehicle perception models for self-driving require extremely accurate detection of small objects: any failure to detect a traffic light can cause severe problems.
To alleviate the inefficiency caused by small training image sets, data enhancement techniques have been proposed to increase the number and diversity of images. These methods can be classified into three categories [7]. The first is spatial geometric transformation, which uses rotation, flipping, zooming, cropping and shifting to generate diverse images. The second is pixel transformation, such as colour jittering, noise addition and random erasing, which generates more types of training images to feed into the machine learning model. The third uses over-sampling [8] and sample pairing [9] to generate images. However, these image data enhancement methods depend on many hand-crafted rules, which are difficult for inexperienced users to devise, and the images produced are of limited diversity. In this paper, a novel generative adversarial network (GAN) based model is proposed to generate high-quality and diverse images. The main contributions of this paper are: (1) We apply an optimized GAN model to augment data for small-scale object detection problems. (2) We use data distribution features in the generative model to improve the clarity of the generated images. (3) An R-CNN based sample selection technique is used to sample features from objects to obtain object diversity. (4) Experimental studies show the effectiveness of the proposed method.
The rest of the paper is organized as follows. In Section 2, related work on existing small object detection algorithms and augmentation methods is described. Section 3 introduces the GAN framework. In Section 4, we introduce the object tracking network and optimization. Experimental studies are presented in Section 5. In Section 6, we conclude the paper.
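As a rough illustration of the conventional, rule-based enhancement methods discussed above, the sketch below implements a few geometric transformations (first category) and sample pairing (third category). It is not part of the proposed method. It assumes, purely for illustration, that an image is a 2-D list of grey values; the function names hflip, rot90, random_crop and sample_pairing are hypothetical.

```python
import random

def hflip(img):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate a 2-D image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    top = rng.randrange(len(img) - size + 1)
    left = rng.randrange(len(img[0]) - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

def sample_pairing(img_a, img_b):
    """Average two same-sized images pixel by pixel (sample pairing)."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

# Toy 3 x 3 image; real inputs would be full-sized training images.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
flipped = hflip(image)
rotated = rot90(image)
crop = random_crop(image, 2, random.Random(0))
paired = sample_pairing(image, flipped)
```

Each rule has to be chosen and parameterised by hand, which is the limitation that motivates a learned generator.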

Related Works
Object detection is widely used in image understanding and video surveillance applications. Object detection methods based on deep convolutional neural networks (DCNNs) [11][12][13] have greatly improved detection accuracy over traditional methods. In some applications, training images are hard to obtain [13] and small in size. Thus, different image enhancement techniques are used to improve the accuracy of object detection methods. [14] used a Faster R-CNN model and covered image pixels with random values to enhance image generation ability. [15] used image mirroring, rotation, Gaussian blur and Gaussian noise to enhance the training images. [16] enhanced the training image data by randomly dropping some objects in the training images, which can produce a richer variety of images than the method in [17]. In [18], the authors adopted colour jittering and geometric alteration to generate diverse kinds of training images. [19] simulated camera shooting and illumination synthesis to generate different unevenly illuminated marine images for the detection of marine organisms. [20] generated images by changing physical characteristics to improve object detection.
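Two of the pixel-level techniques mentioned above, covering pixels with random values and adding Gaussian noise, can be sketched as follows. This is a minimal illustration rather than the implementation used in the cited works; the image representation (a 2-D list of grey values in [0, 255]), the patch size, the noise level and the function names are all assumptions.

```python
import random

def random_erase(img, height, width, rng, low=0, high=255):
    """Return a copy with one height x width patch replaced by random values."""
    out = [row[:] for row in img]
    top = rng.randrange(len(img) - height + 1)
    left = rng.randrange(len(img[0]) - width + 1)
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = rng.randint(low, high)
    return out

def add_gaussian_noise(img, sigma, rng, low=0, high=255):
    """Return a copy with zero-mean Gaussian noise added, clipped to [low, high]."""
    return [[min(high, max(low, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

rng = random.Random(42)
image = [[128] * 6 for _ in range(6)]        # flat grey 6 x 6 test image
erased = random_erase(image, 2, 3, rng)      # at most 2 * 3 pixels change
noisy = add_gaussian_noise(image, 10.0, rng)
```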

Basics of Generative Adversarial Network
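The GAN model was first proposed in [21]. Generally, a GAN consists of two models: a generator G and a discriminator D. The optimization objective function of the GAN is defined as follows:

$$\min_G \max_D V(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_{noise}(z)}[\log(1 - D(G(z)))] \qquad (1)$$

where G denotes the generative model, D represents the discriminative model, E is the expectation (mean) operation, x is a real sample drawn from the data distribution $P_{data}(x)$, and z is noise following the Gaussian distribution $P_{noise}(z)$.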
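To make the objective concrete, the sketch below estimates the value function of the standard GAN of [21], the mean of log D(x) over real samples plus the mean of log(1 - D(G(z))) over generated samples, from a handful of discriminator outputs. The function name gan_value and the example probabilities are illustrative assumptions; in practice D and G are neural networks and the averages are taken over mini-batches.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: values D(x) for real samples, each in (0, 1)
    d_fake: values D(G(z)) for generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])
# A confident and correct discriminator pushes the value towards 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator is trained to raise this value and the generator to lower it; a value of -2 log 2 corresponds to a discriminator that outputs 0.5 for every input.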

Network improvement and optimization
We first optimize the network by replacing the resolution network of the generator G with 32 residual blocks of size 96 × 96, and add a discriminator network D to form a complete GAN. To allow comparison with other methods, we use the Wasserstein GAN (WGAN) [22] model and add a gradient penalty to it. The loss function of the discriminator used in the model is formulated in (2), where $x_{HR}$ is the high-resolution image, $x_{SR} = G_{srq}(x_{LR})$ is the generated super-resolution image, $x_{LR}$ is the low-resolution image, $G_{srq}$ is the super-resolution generator, $D_f$ is the discriminator model, $P_r$ is the real image data distribution, $P_g$ is the generated image data distribution, $\nabla D_f$ is the gradient of the discriminator, and $\hat{x}$ is a random vector sampled between $x_{HR}$ and $x_{SR}$.
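For orientation, a WGAN discriminator loss with gradient penalty is commonly written as the mean discriminator score on generated images, minus the mean score on real images, plus a weighted penalty on the deviation of the gradient norm at interpolated points from 1. The sketch below follows that common form, which is not necessarily the exact expression used in this paper. To stay self-contained it replaces the deep discriminator with a hypothetical linear critic, whose gradient with respect to its input is simply its weight vector; the weight lam = 10 and all names are assumptions.

```python
import math
import random

def critic(w, b, x):
    """Linear critic D_f(x) = w . x + b (a stand-in for a deep network)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def critic_grad(w, x):
    """Gradient of the linear critic with respect to x, which is simply w."""
    return list(w)

def wgan_gp_critic_loss(w, b, real, fake, lam, rng):
    """E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)|| - 1)^2].

    real and fake must contain the same number of samples.
    """
    n = len(real)
    wasserstein = (sum(critic(w, b, f) for f in fake) / n
                   - sum(critic(w, b, r) for r in real) / n)
    penalty = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()  # interpolation weight in [0, 1)
        x_hat = [eps * ri + (1.0 - eps) * fi for ri, fi in zip(r, f)]
        norm = math.sqrt(sum(g * g for g in critic_grad(w, x_hat)))
        penalty += (norm - 1.0) ** 2
    return wasserstein + lam * penalty / n

rng = random.Random(0)
real = [[1.0, 1.0], [2.0, 2.0]]   # stand-ins for high-resolution images
fake = [[0.0, 0.0], [1.0, 0.0]]   # stand-ins for super-resolved images
loss_unit = wgan_gp_critic_loss([0.6, 0.8], 0.0, real, fake, 10.0, rng)
loss_steep = wgan_gp_critic_loss([1.2, 1.6], 0.0, real, fake, 10.0, rng)
```

With the unit-norm weights the penalty vanishes and only the score difference remains; doubling the weights makes the gradient norm 2, so the penalty term dominates. With a deep discriminator the gradient would be obtained by automatic differentiation instead of the closed form used here.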
The next proposed improvement uses an HR image as input to generate a low-resolution image, as shown in Figure 2.
The loss function used is given in (3). In this loss function, we integrate the super-resolution image into the discriminator model. The GAN model computes the loss between the super-resolved images generated by $G_{hr}$ and the high-resolution images. If the super-resolution image is similar to the high-resolution image, the low-resolution image can be regarded as generated from the high-resolution image.

Experimental Study
To validate the performance of the proposed GAN model, we compare it with two popular GAN methods (WGAN [24], LSGAN [25]) and run experiments to evaluate the performance of these GAN models. For a fair comparison, we set the parameters similar to those used in the experiments of those works (the number of epochs for all GAN models is set to 500). The model is implemented with Python 3.5 and the PyTorch library. Figure 3 shows the images generated by the GAN model over the YOLO network. The generated images are better than those of the LR and EDSR methods. From Table 1, we can see that, compared with the result obtained from high-resolution images, the GAN-based models can detect all vehicles in the image, whereas only some of the vehicles were detected in the super-resolution images generated by the bicubic interpolation algorithm and the EDSR model. Among the tested models, the YOLO-based model provided the most accurate results, as shown in Table 1.
The proposed image enhancement method is also compared with traditional data augmentation techniques and other GAN models. We first use the GAN model to enhance the training images; the images are then rotated, flipped and transformed to obtain more varied forms. Table 2 shows the detection results. From Table 2, we can see that the average precision increases. The reason is that the training images become richer after rotating, flipping and transforming. With more training images, the detection model is better trained and object detection accuracy is greatly improved. With the combined method, the image set is expanded considerably and the diversity of the images is enough to reduce the imbalanced distribution of positive and negative image samples in the training process. With both the size of the image set and its diversity improved, the accuracy of object detection can be improved substantially.
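The average precision referred to above can be computed in several ways, and the text does not state which protocol was used. The sketch below shows one common, non-interpolated variant as an assumption. It takes detections that have already been matched to ground-truth objects (for example by an overlap threshold, which is also assumed) and averages the precision at each true positive over the number of ground-truth objects; the function name and the example detections are hypothetical.

```python
def average_precision(detections, num_ground_truth):
    """Non-interpolated AP from (confidence, is_true_positive) pairs."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / num_ground_truth if num_ground_truth else 0.0

# Four detections for an image set containing four ground-truth vehicles;
# one detection is a false positive and one vehicle is missed.
detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
ap = average_precision(detections, num_ground_truth=4)
perfect = average_precision([(0.9, True), (0.5, True)], num_ground_truth=2)
```

Missed objects lower the score because the sum is divided by the number of ground-truth objects rather than by the number of detections.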

Conclusion
In this paper, we investigate a GAN model that enhances image datasets to improve the accuracy of object detection on small image sets. We use the GAN model to generate different levels of images. Furthermore, the diversity of the images is obtained with some mapping. With the newly generated images, the learning model is well trained and more object features are extracted. Experimental studies against some popular methods show that the proposed method can improve the average detection precision by 4.48%. In the future, we will work on tuning the parameters of the model to reduce training time and memory usage.