Automatic Ship Detection in SAR Images Based on Multi-scale Faster R-CNN

Automatic ship detection in synthetic aperture radar (SAR) images is widely used in maritime surveillance, because SAR imaging works in all weather conditions and at all times of day. Many object detection methods, ranging from traditional techniques to deep learning, have therefore been proposed. However, objects in large-scale remote sensing images are relatively small and often appear at different scales, and current ship detection methods are insensitive to small-scale vessels. To solve these problems, this paper proposes a novel multi-scale ship detection method for SAR images based on a Multi-scale Faster R-CNN network. First, a multi-scale network decomposes the SAR images into a pyramid structure and extracts features. Then, the region proposal network (RPN) is applied to the feature map of each layer to obtain proposals that contain ship targets. Finally, these proposals are fed to the classification and regression network to obtain the final detection results. Multi-scale Faster R-CNN achieves a mean average precision (mAP) of 0.986 on the SAR-Ship-Dataset, which indicates that the proposed method has high detection accuracy and a low miss rate.


Introduction
Synthetic aperture radar (SAR) images are not affected by weather or lighting conditions and are widely used in marine monitoring, environmental monitoring and military applications, such as maritime transport safety and ship detection. Germany launched the TerraSAR-X satellite in June 2007, the European Space Agency (ESA) launched Sentinel-1 in April 2014, and China launched Gaofen-3 in August 2016. With the growing number of satellite platforms, a vast amount of SAR imagery has been produced, and SAR ship detection has become an urgent technical problem.
Traditional SAR ship detection mainly uses the Constant False Alarm Rate (CFAR) method, but CFAR cannot adapt to ships of all sizes. CFAR relies only on pixel-based information, from which it is difficult to extract effective features. In highly complex scenes it produces too many false alarms, which degrades its detection performance [1]. Cell-averaging CFAR (CA-CFAR) and order statistic CFAR (OS-CFAR) detectors are presented in [2] [3]. These two detectors determine the detection threshold adaptively. However, they ignore the local characteristics of SAR ship images and perform poorly in complex scenes. In [4], a multi-scale fused heterogeneity detector and a contrarious decision-based target detector are proposed. By reducing the interference of background and speckle noise, they highlight the target effectively and separate it accurately from the background. However, these traditional detectors are slow and have low precision. In [5], the Single Shot Multibox Detector (SSD) is applied to SAR ship images, but its accuracy is relatively low. In [6], Faster R-CNN is applied to ship detection in SAR images. It improves accuracy to some extent, but it does not consider the multi-scale characteristics of SAR ship images. In [7], RetinaNet is used to detect ship targets, but some small targets remain undetected.
Based on the above analysis, Multi-scale Faster R-CNN is proposed to detect small targets in SAR ship images. It contains a multi-scale network that extracts the multi-scale features of ships. These features are fed into the region proposal network (RPN) to generate candidate regions, from which the final detections are obtained.
This paper is organized as follows. Section 2 relates to the proposed method. Section 3 reports on the experiments, including the dataset and experimental analysis. Section 4 offers the conclusion.

Multi-scale Faster R-CNN
The architecture of Multi-scale Faster R-CNN has three components: the backbone with the multi-scale network, the RPN, and the classification and regression network. The network structure is shown in figure 1.

Multi-scale Network
In SAR ship images, ships usually appear at different scales, which results in different characteristics. In convolutional neural networks, different depths correspond to different levels of semantic features. The original Faster R-CNN uses only the top-level features and ignores the lower-level ones. However, these low-level features are very helpful for detecting small objects [8]. Therefore, this paper adds a multi-scale sub-network to Faster R-CNN, as shown in figure 2, which consists of a bottom-up pathway, a top-down pathway and lateral connections. The bottom-up pathway is built with ResNet. It consists of several convolution modules, each of which contains several convolution layers. Moving up from one module to the next, the spatial resolution is halved. The output of each convolution module is later used in the top-down pathway. Specifically, each feature map from the bottom-up pathway passes through a 1×1 convolution filter to reduce its channel dimension.
In the top-down pathway, the spatial resolution is upsampled by a factor of 2 using nearest-neighbour interpolation. The feature maps from the bottom-up pathway and the top-down pathway are then merged by element-wise addition. Finally, a 3×3 convolution filter is applied to each merged layer to generate the final feature map, which reduces the aliasing effect of upsampling. These feature maps are then fed into the RPN.
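The merging procedure described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: the function names (`conv1x1`, `upsample_nearest_2x`, `conv3x3`, `build_pyramid`), the channel counts and the map sizes are hypothetical, and random weights stand in for learned filters.

```python
import numpy as np

def conv1x1(feature, weight):
    """1x1 convolution: mixes channels at each pixel.
    feature: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feature)

def upsample_nearest_2x(feature):
    """Nearest-neighbour upsampling by a factor of 2 in height and width."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)

def conv3x3(feature, weight):
    """3x3 convolution with zero padding. weight: (C_out, C_in, 3, 3)."""
    _, h, w = feature.shape
    padded = np.pad(feature, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], window)
    return out

def build_pyramid(bottom_up, lateral_weights, smooth_weights):
    """bottom_up: feature maps ordered from fine (large) to coarse (small)."""
    # Lateral 1x1 convolutions bring every level to the same channel count.
    laterals = [conv1x1(f, w) for f, w in zip(bottom_up, lateral_weights)]
    merged = [laterals[-1]]  # the top-down pathway starts at the coarsest level
    for lateral in reversed(laterals[:-1]):
        merged.append(lateral + upsample_nearest_2x(merged[-1]))
    merged.reverse()  # restore fine-to-coarse order
    # A 3x3 convolution on every merged map reduces upsampling aliasing.
    return [conv3x3(m, w) for m, w in zip(merged, smooth_weights)]

# Hypothetical sizes: three bottom-up levels, each half the size of the previous.
rng = np.random.default_rng(0)
channels, sizes, out_channels = [8, 16, 32], [32, 16, 8], 4
bottom_up = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
lateral_weights = [rng.standard_normal((out_channels, c)) for c in channels]
smooth_weights = [rng.standard_normal((out_channels, out_channels, 3, 3))
                  for _ in channels]
pyramid = build_pyramid(bottom_up, lateral_weights, smooth_weights)
```

Each level of `pyramid` keeps the spatial size of its bottom-up input but has the same number of channels, so one RPN head can be shared across all levels.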

RPN
In this paper, the RPN of Faster R-CNN is used to generate proposals [9]. The RPN is a fully convolutional network that generates bounding boxes containing candidate targets. The feature maps of different scales produced by the multi-scale network are input into the RPN head, which generates anchors. These anchors are fed into two sibling fully connected layers, a classification layer and a regression layer. The classification layer outputs, for each anchor, the probability that it is an object and the probability that it is not. The object probability is denoted $p_i$, where $i$ is the index of an anchor in a mini-batch.
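As an illustration of the anchor and objectness steps, the sketch below tiles anchors over one feature map and converts two-class logits into the object probability $p_i$. The stride, anchor size and aspect ratios are assumptions chosen for the example and are not taken from this paper; `generate_anchors` and `objectness` are hypothetical names.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, size, ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * len(ratios), 4) anchors as (x1, y1, x2, y2)."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride  # anchor centre in image coordinates
            cy = (row + 0.5) * stride
            for ratio in ratios:  # ratio = height / width, area fixed at size**2
                w = size / np.sqrt(ratio)
                h = size * np.sqrt(ratio)
                anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

def objectness(logits):
    """Two-class softmax over (not-object, object) logits; returns p_i per anchor."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp[:, 1] / exp.sum(axis=1)

# Assumed values: a 4x4 feature map with stride 16 and 32-pixel anchors.
anchors = generate_anchors(feat_h=4, feat_w=4, stride=16, size=32)
logits = np.zeros((len(anchors), 2))  # placeholder for the classification output
p = objectness(logits)
```

With a multi-scale network, one such call would typically be made per pyramid level, with a larger stride and anchor size on the coarser levels.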
For bounding box regression, the parameterization of the four coordinates follows [9]. By default, the balancing weight is set to $\lambda = 10$, so that the classification and regression terms are roughly equally weighted. At the end of the RPN, about 1000 to 2000 proposals are retained and used in the next stage.
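For reference, the coordinate parameterization and the multi-task loss of Faster R-CNN, as defined in [9], are restated below. This assumes that the formulas referred to in the text are the ones from [9]. Here $x$, $y$, $w$, $h$ denote the centre coordinates, width and height of a box; $x$, $x_a$ and $x^*$ refer to the predicted box, the anchor box and the ground-truth box respectively (likewise for $y$, $w$, $h$); $p_i^*$ is the ground-truth label of anchor $i$; and $N_{cls}$ and $N_{reg}$ are normalization terms.

```latex
\begin{aligned}
t_x &= (x - x_a)/w_a,     & t_y &= (y - y_a)/h_a, \\
t_w &= \log(w/w_a),       & t_h &= \log(h/h_a), \\
t_x^* &= (x^* - x_a)/w_a, & t_y^* &= (y^* - y_a)/h_a, \\
t_w^* &= \log(w^*/w_a),   & t_h^* &= \log(h^*/h_a),
\end{aligned}

L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
                    + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)
```

The factor $p_i^*$ in the second term means that the regression loss is only active for positive anchors, and $\lambda$ balances the two terms.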

Classification and Regression
Because RoIPooling suffers from region misalignment, RoIAlign is used in this paper [10]. RoIAlign traverses each candidate region and keeps its floating-point boundaries without quantizing them. First, the candidate region is divided into k×k cells, and the boundary of each cell is not quantized either. Four sampling positions are computed in each cell, and the feature values at these positions are obtained by bilinear interpolation. Then, max pooling over these values gives the output of the cell. After RoIAlign, the resulting proposal features are sent to the fully connected layers for classification and regression to obtain the results.
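The RoIAlign steps above can be sketched for a single-channel feature map as follows. This is a simplified illustration and not the implementation used in this paper: the box is assumed to be given in feature-map coordinates, integer coordinates are assumed to be pixel centres, and the function names `bilinear` and `roi_align` are hypothetical.

```python
import numpy as np

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a fractional (y, x) position."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    top = (1 - lx) * feature[y0, x0] + lx * feature[y0, x1]
    bottom = (1 - lx) * feature[y1, x0] + lx * feature[y1, x1]
    return (1 - ly) * top + ly * bottom

def roi_align(feature, box, k=2):
    """Pool box = (x1, y1, x2, y2) in feature-map coordinates to a k x k grid."""
    x1, y1, x2, y2 = box
    cell_w, cell_h = (x2 - x1) / k, (y2 - y1) / k  # cell edges stay fractional
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            samples = [
                bilinear(feature, y1 + (i + fy) * cell_h, x1 + (j + fx) * cell_w)
                for fy in (0.25, 0.75) for fx in (0.25, 0.75)  # four points per cell
            ]
            out[i, j] = max(samples)  # max pooling over the four samples
    return out

# Example on a 4x4 map whose value at (y, x) is 4*y + x.
feature = np.arange(16.0).reshape(4, 4)
pooled = roi_align(feature, (0.5, 0.5, 2.5, 2.5), k=2)
```

Because neither the box nor the cell boundaries are rounded, a small shift of the box changes the pooled values smoothly, which matters most for small ships that cover only a few feature-map cells.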

Experiments on SAR Ship Dataset
The dataset used in this paper is the SAR Ship Dataset published in [11]. A total of 102 Chinese Gaofen-3 images and 108 Sentinel-1 images are used. Details are shown in table 1. Each ship slice is 256×256 pixels, and the slices cover different scales and backgrounds. Each ship slice has a corresponding label file in XML format. The label contains the name of the ship slice, the position of the ship in the slice, the name of the ship, etc. The 43819 ship slices were randomly divided into a training set and a test set, with the training set accounting for 80% and the test set for 20%.
In object detection, the object must be classified and located at the same time, so common accuracy metrics cannot be used directly. The mean average precision (mAP) is therefore used as the evaluation metric. It is the mean of the average precision (AP) over all categories, where $C$ is the number of categories.
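A common way to compute this metric is sketched below: AP is taken as the area under the precision-recall curve of one category, and mAP as $\frac{1}{C}\sum_{c=1}^{C} AP_c$. The exact AP variant used in this paper is not stated, so the all-point interpolation here is an assumption, as are the function names. Each detection is assumed to have already been marked as a true or false positive by matching it to the ground truth.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one category."""
    # Rank detections by descending confidence score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: use the best precision reached at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangles under the envelope.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(ap_per_category):
    """mAP = (1 / C) * sum of AP over the C categories."""
    return sum(ap_per_category) / len(ap_per_category)

# Toy example: four detections of one category, four ground-truth ships.
ap_ship = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, True],
    num_ground_truth=4,
)
map_score = mean_average_precision([ap_ship])
```

With a single category such as "ship", $C = 1$ and the mAP equals the AP of that category.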

Experiments on SAR ship Dataset
To better evaluate the performance of the proposed method, three other models, Faster R-CNN, SSD and RetinaNet, are trained on the same dataset for comparison. The experimental results are shown in table 2.  It can be seen from table 2

Conclusion
This paper designs a Multi-scale Faster R-CNN for ship detection in high-resolution SAR images. The experimental results show that Multi-scale Faster R-CNN achieves the best detection performance among the detectors compared in this paper. Future work will focus on ship detection near ports.