SDNet: A Novel Separable Structure for Single Image Deraining

Rain streaks degrade the visual quality of images and severely affect the performance of computer vision systems. Recently, deep convolutional neural networks (DCNNs) have shown potential in single image deraining. However, their complex structures with large numbers of parameters demand large storage, which still limits the applications of deraining systems. In this paper, we propose a separable deraining network (SDNet), which provides a more flexible and simple baseline deraining network that uses a cascaded framework to achieve an automatic trade-off between model performance and computational resources. Besides, a novel module called the group-atrous spatial pyramid (G-ASP) and a network separation strategy (NSS) are proposed to extract more discriminative multi-scale information with fewer parameters and to adapt our model to different levels of rain streaks for automatic, reasonable allocation of computational resources. The experimental results on public datasets demonstrate that our SDNet achieves promising performance with much fewer parameters than other methods.


Introduction
Rain streaks degrade the visual quality of images and then severely affect the performance of outdoor computer vision tasks such as object detection, segmentation, and video surveillance [1]. Thus, effective methods for removing rain streaks, which aim at restoring a clean background image from a rainy image, are required for a wide range of practical applications.
Several methods based on traditional optimization separate rain streaks from object details by using low-level image features [2,3]. However, it is difficult to remove rain streaks and preserve structural information simultaneously when objects and rain streaks have similar structures and orientations. Fortunately, inspired by the unprecedented success of deep convolutional neural networks (DCNNs) in low-level vision tasks [3,4,5], Fu et al. [6,7] utilized a relatively shallow network to remove rain streaks, which brought a great performance improvement over traditional methods. Zhang et al. [8] proposed a GAN-based deraining algorithm, and [9] used a low-rank coding-based method to remove rain streaks.
However, existing models for single-image rain-streak removal tend toward complex architectures with tons of parameters to learn negative residuals of different modalities and intensities; it is wasteful to rely on such a resource-hungry model to handle diverse rainy conditions. Practical applications require models that are simultaneously light-weight, adaptive, and high-performing.
To address the above problems, we propose a separable deraining network (SDNet) with a cascaded structure. The SDNet consists of several cascaded residual learning blocks, and each block can be regarded as an independent rain-streak removal network. The visual results are shown in Fig.1. We further propose a network separation strategy (NSS) to help SDNet meet diverse rainy conditions automatically at test phase. Investigations show that representing features at multiple scales is of great importance for numerous vision tasks [10]. Therefore, we propose a novel module for multi-scale feature extraction called the group-atrous spatial pyramid (G-ASP), which adopts a group strategy and introduces atrous convolutions with different rates to gain stronger scale adaptability. Our contributions can be summarized as follows: (1): We build a single image deraining network that is both light-weight and high-performing, which strongly reduces the storage requirement in practical applications.
(2): A network separation strategy (NSS) is proposed for the testing procedure, which achieves adaptability across different rainy conditions.
(3): We propose a novel multi-scale feature extraction module, called G-ASP, to enhance the model's ability to extract discriminative features, and a parameter reuse strategy is adopted in G-ASP to reduce model complexity while maintaining decent performance.

Methods
In this part, we first describe the structure of G-ASP and its parameter reuse strategy. Then, we further propose NSS. Finally, the overall architecture of SDNet and loss functions are discussed.

Group-atrous Spatial Pyramid
In recent years, the improvement in model performance brought by multi-scale information has attracted attention [10]. To obtain more abundant multi-scale features, we propose a novel group-atrous spatial pyramid (G-ASP), shown in Fig.2.
The original features are first expanded to double the number of channels by a 1 × 1 convolutional kernel. Then, we evenly split the new features into 4 groups, denoted as x_i, where i ∈ {1, 2, 3, 4}. Each feature group has the same spatial size but 1/4 the number of channels compared with the original features. Each x_i except x_1 is followed by an atrous convolution, denoted as A_i(·), where i ∈ {2, 3, 4}. We denote y_i as the output of group i; y_i can be calculated as Eq. 1:

y_1 = x_1,  y_i = A_i(x_i + y_{i−1}),  i ∈ {2, 3, 4},  (1)

where A_i(·), i ∈ {2, 3, 4} represents the function mapping of the atrous convolution with rate 1, 2, 3, respectively. The output of G-ASP can be described as O = C(y_1, y_2, y_3, y_4), where C(·) denotes the concatenation operation. Each group thus passes through a different number of convolutional layers while the kernels share the same size, which maintains a steady growth of the receptive field within each group while enlarging the feature diversity across groups, so that higher-quality features with abundant multi-scale information can be extracted. Meanwhile, hierarchical information connections (HIC) among different groups (shown by the red lines in Fig. 2) are also built; thus, the multi-scale features from different groups achieve hierarchical feature fusion and information guidance along these connections. In addition, we further propose a parameter reuse strategy in which the atrous convolutions within the same group share parameters. Moreover, the recursive computation method of [5] is also adopted. In this paper, we set each G-ASP as a recursive block; the inference of G-ASP at recursion stage t can be formulated as H_{t+1} = F(H_t), where F(·) denotes the function mapping of the recursive block, t is the recursion stage, and the number of recursions is set to 5 in each forward propagation. Network parameters are reused across different stages, which reduces the model complexity significantly.
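As a minimal sketch of the group computation in Eq. 1, the following replaces the atrous convolution with an identity stand-in so the example stays dependency-free (a real implementation would use e.g. `tf.nn.atrous_conv2d`); the function names are ours:

```python
import numpy as np

def atrous_conv(x, rate):
    # Stand-in for a 3x3 atrous (dilated) convolution with the given rate.
    # In TensorFlow this would be e.g. tf.nn.atrous_conv2d(x, kernel, rate, "SAME");
    # the identity keeps this sketch dependency-free.
    return x

def g_asp(features):
    # Hierarchical group computation of Eq. 1:
    #   y_1 = x_1,  y_i = A_i(x_i + y_{i-1}) for i in {2, 3, 4}
    groups = np.split(features, 4, axis=-1)        # x_1 .. x_4, each with C/4 channels
    outputs = [groups[0]]                          # y_1 = x_1 (no convolution)
    for i, rate in zip((1, 2, 3), (1, 2, 3)):      # groups 2..4 with rates 1, 2, 3
        # hierarchical information connection: previous output feeds the next group
        outputs.append(atrous_conv(groups[i] + outputs[-1], rate))
    return np.concatenate(outputs, axis=-1)        # O = C(y_1, y_2, y_3, y_4)

x = np.random.rand(16, 16, 32).astype(np.float32)  # channels already doubled by the 1x1 conv
y = g_asp(x)
```

Note that the output keeps the same shape as the expanded input, so G-ASP can be stacked recursively as H_{t+1} = F(H_t) without any reshaping.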

Progressive Network Separation Strategy
In order to meet the requirements of practical applications, we further propose a network separation strategy (NSS), which aims at adjusting the model structure at test phase to adapt to diverse rainy conditions according to a particular activation threshold. In the image processing field, SSIM [11] is a widely used index for evaluating image quality, which mainly considers image similarity in brightness, contrast, and structure. We therefore choose the SSIM value to construct the activation threshold for network separation in Algorithm 1. We consider that there exists an obvious structural difference between rainy images and derained images, which is reflected in the SSIM value. Thus, if adding blocks does not bring the expected SSIM fluctuation, the current network has already reached its "effective capacity", and a larger block budget pays off less and less.
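Algorithm 1 is not reproduced here; the sketch below shows one plausible reading of NSS, in which blocks are applied one by one and the cascade is cut once the SSIM change between consecutive derained outputs falls below a threshold eps. The simplified single-window SSIM and all names are ours:

```python
import numpy as np

def ssim(a, b, c1=0.01**2, c2=0.03**2):
    # Global single-window SSIM; the paper uses the standard local SSIM [11].
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2)
    return num / den

def separate(rainy, blocks, eps=1e-3):
    # Run the cascaded blocks; stop when the SSIM between the rainy input and the
    # current output stops changing, i.e. the "effective capacity" is reached.
    out, prev = rainy, 1.0  # SSIM(rainy, rainy) = 1
    for k, block in enumerate(blocks, start=1):
        out = block(out)
        cur = ssim(rainy, out)
        if abs(prev - cur) < eps:  # no meaningful structural change: cut here
            break
        prev = cur
    return out, k

rng = np.random.default_rng(0)
rainy = rng.random((16, 16))
# toy blocks: the first changes the image, the rest are identities
out, used = separate(rainy, [lambda x: 0.5 * x, lambda x: x, lambda x: x])
```

With these toy blocks the second (identity) block produces no SSIM change, so the cascade is cut after two blocks, mirroring how NSS spends fewer blocks on light rain.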

Overview Architecture
The overall architecture of SDNet is shown in Fig.3 and Fig.4. Our proposed SDNet is a cascade of several residual learning blocks with a repetitive structure. Fig.5 indicates that SDNet obtains a more eye-pleasing reconstruction and higher PSNR and SSIM as the blocks go deeper. In this paper, each block is composed of three parts: initial feature extraction, multi-scale feature extraction, and reconstruction. Besides, we adopt global residual learning with a long shortcut in each residual learning block, since the negative residual is easier to learn. It is denoted that R_n = f_rec(F_n) and O_n = I + R_n, where f_rec(·) denotes the reconstruction part, which has only one convolutional layer, F_n the extracted features, I the input rainy image, R_n the predicted negative residual, and O_n the output of the n-th block. In addition, we use dense connections between shallow blocks and deep blocks to compensate for the information loss brought by the cascaded scheme. The features produced by G-ASP in shallow blocks are concatenated into the corresponding input of G-ASP in deep blocks. Therefore, the deeper blocks can merge the features extracted by all previous blocks to predict a finer estimation. Most methods optimized with the Euclidean distance generate blurry predictions, since per-pixel losses do not capture the perceptual difference between output and ground truth as human visual perception does [12]. In particular, it is hard to distinguish between rain streaks and object structures relying on the ℓ2 loss alone when rain streaks blend with object edges and the background scene. We adopt a combination of the ℓ2 loss and the SSIM loss [11] to better preserve the global structure while maintaining per-pixel similarity in each block. Meanwhile, global supervision is introduced into each block by minimizing the loss collection of all blocks. The loss function for the n-th block is defined as follows:

L_n = L_MSE + λ L_SSIM,  (2)

where λ is a hyper-parameter to balance the MSE loss and the SSIM loss; λ is set to 1 via cross-validation experiments. We can hereby calculate the global loss function as Eq. 3:

L = Σ_{n=1}^{N} λ_n L_n,  (3)

where the global loss L is the collection of the losses of all blocks, N is the number of blocks, and λ_n is a hyper-parameter to balance every L_n; we set λ_n as 1.
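A compact sketch of the per-block and global losses in Eq. 2 and Eq. 3, assuming the common 1 − SSIM form for the SSIM loss and a single-window SSIM (the paper uses the standard local SSIM [11]); all function names are ours:

```python
import numpy as np

def ssim(a, b, c1=0.01**2, c2=0.03**2):
    # Global single-window SSIM; a simplification of the local SSIM [11].
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2)
    return num / den

def block_loss(pred, gt, lam=1.0):
    # Eq. 2: L_n = L_MSE + lam * L_SSIM, with L_SSIM taken as 1 - SSIM (our assumption)
    return ((pred - gt) ** 2).mean() + lam * (1.0 - ssim(pred, gt))

def global_loss(preds, gt, lam_n=1.0):
    # Eq. 3: L = sum over all N block outputs, each weighted by lam_n (set to 1)
    return sum(lam_n * block_loss(p, gt) for p in preds)

gt = np.random.rand(32, 32)
```

Because every block output appears in the sum, each block receives a direct supervision signal, which is the "global supervision" described above.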

Implementation Details
All experiments are conducted using TensorFlow in a Python environment on an NVIDIA GeForce GTX 1080 with 8 GB of GPU memory. We use the Xavier method to initialize the network parameters and choose the RMSprop optimizer. The initial learning rate is set to 0.001, the batch size to 16, and the training process requires 50,000 iterations. All kernels are of size 3×3, and each convolutional layer has 16 kernels. During the training stage, we randomly generate 0.8 million clean/rainy image pairs of size 128×128. In this paper, SDNet with 5 blocks is used as its conventional form for the experiments. We use Rain100H and Rain100L for training and testing. Rain100H and Rain100L are two synthetic datasets with heavy and light rain streaks, respectively [13], and each of them has 100 rainy images for testing.
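For quick reference, the stated training hyper-parameters can be collected in one place (the dictionary keys are ours, not from the paper):

```python
# Training configuration as stated in the text above.
config = {
    "optimizer": "RMSprop",
    "init": "Xavier",
    "learning_rate": 1e-3,
    "batch_size": 16,
    "iterations": 50_000,
    "kernel_size": 3,          # all kernels are 3x3
    "filters_per_layer": 16,
    "patch_size": 128,         # 128x128 training pairs
    "num_train_pairs": 800_000,
    "num_blocks": 5,           # conventional form of SDNet
}
```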

Evaluation on Synthetic Dataset
Effects of G-ASP: We assess the impact of G-ASP. A network that replaces G-ASP with a single convolutional layer (denoted SDNet_O) is trained as the baseline. Meanwhile, we conduct experiments for a quantitative analysis of the hierarchical information connections (HIC). Table 1 shows that the atrous pyramid and HIC are essential parts of G-ASP and bring distinct improvements in PSNR [14] and SSIM [11]. This demonstrates that the multi-scale features and feature fusion brought by G-ASP have a positive effect on the deraining task.

Effects of NSS: Table 2 shows that under most light rainy conditions, a light-weight (3-block) SDNet is able to produce a good result, and further increasing the number of blocks does not bring obvious performance improvement. Heavy rainy conditions, however, require a more powerful network to obtain better reconstructions.

Quantitative results on synthetic datasets: We compare our model (5 blocks) with several state-of-the-art deep and non-deep algorithms on synthetic datasets in Table 3. The negative sign in the last column of Table 3 indicates that the number of parameters is not mentioned in the corresponding paper. Our SDNet achieves PSNR and SSIM comparable to JORDER while outperforming the other methods, which is consistent with the visual results. We find that the intermediate result from SDNet_3 is also decent. Table 3 indicates that, as the number of blocks increases, SDNet attains higher SSIM with fewer parameters than ResGuideNet, which means SDNet is a more efficient structure for the deraining task. The visual comparisons are shown in Fig.6; benefiting from the multi-scale feature extraction and hierarchical fusion brought by G-ASP, our model produces satisfactory reconstructions with less blurring and more abundant details.

Evaluation on real-world dataset
The ultimate goal of image processing algorithms is to serve practical applications. Therefore, performance on real-world data is a significant evaluation indicator for deraining algorithms. In this section, SDNet is trained on the synthetic dataset, and we also implement the other methods according to their optimal settings. The visual results are shown in Fig.7. Since no ground truth exists, we only show the reconstruction results. We find that SDNet still works well on real-world images and handles multiple kinds of rain streaks with less blurring and more details.