Medical Image Denoising Using Two-Stage Iterative Down-Up CNN and SURF Features

This paper proposes a Two-Stage Deep Iterative Down-Up Convolutional Neural Network for denoising medical images, which repeatedly decreases and increases the feature-map resolution. Speeded Up Robust Features (SURF), a patented local feature detector and descriptor, is employed in the proposed system to handle the vanishing-gradient problem. Experiments are conducted on several kinds of noisy medical images, such as Computed Tomography (CT) and ultrasound images, and the proposed system is found to consistently outperform current state-of-the-art image denoising methods.


I. INTRODUCTION
Noise in medical images is a major barrier in measurement procedures: it degrades image quality and introduces uncertainty into diagnosis, which can make the images unusable. Many approaches to medical image denoising have been proposed over the past decades. The noise differs between medical imaging instruments and is classified as fixed-pattern noise or independent random noise.
Speckle is a multiplicative noise that appears as a granular, mottled pattern on B-scans. Phased-array ultrasound scanners are coherent in nature. Hence, random tissue inhomogeneities produce mottled B-scan images of organs such as the kidney and liver, whose underlying structures are too small to be resolved by an ultrasound scan. Speckle degrades the quality of B-scan images, which in turn reduces the ability of a human observer to discern fine details during diagnosis. It also reduces the efficiency of edge detection [1]. Since its invention, ultrasound imaging has been used increasingly in real time and is regarded as one of the most crucial techniques in medical diagnosis. Image quality is essential for successful ultrasonic diagnosis, but ultrasound images are often affected by speckle, a form of acoustic noise that strongly degrades their quality. Speckle also appears in laser and microwave radar imaging; in ultrasound it is an interference effect caused by scattering of the ultrasonic beam from microscopic tissue inhomogeneities. The resulting granular pattern is not related to the actual tissue microstructure, yet it masks lesions in the images and hinders the perception of minute details. Thus, it is important to suppress speckle noise in medical ultrasound images so that fine details can be diagnosed successfully [2]. CT is a radiographic examination technique. It acquires a large series of 2D images of an object's cross-sections and generates a 3D image of the same object. CT is used in many of the clinical situations associated with conventional radiography. The major disadvantage of conventional radiography, which depicts 3D objects as 2D images, is that structures are overlaid on one another in the image. CT solves this issue by scanning thin slices of the body with a narrow beam of X-rays that rotates around the stationary patient [3].
Several automotive, gaming and entertainment applications rely on 3D vision, in which estimating the distance between the sensor and the object is essential. This technique is called range or depth sensing and can be performed with structured light, LiDAR, depth from stereo, or Time-of-Flight (ToF) active sensing. ToF is currently popular because it delivers scene geometry (i.e., range or depth) as a 2D map, which can then be interpreted as a 2D grayscale image [4]. Digital cameras and camcorders embed sophisticated algorithms for image sharpening and noise reduction. It is well known that contrast enhancement amplifies noise as a side effect of sharpening, which is the major disadvantage of several existing contrast enhancement algorithms. This drawback is evident in classical linear unsharp masking, in which a high-pass-filtered version of the image is added to the input data. Replacing the high-pass filter with a nonlinear filter can bring a clear improvement, as it combines edge enhancement and noise reduction effectively, limiting noise amplification while sharpening the image [5]. Breast cancer is now common among women and is a cause of death. The National Cancer Institute reported that about one in eight women in the United States is affected by this disease. The cause of breast cancer is still unknown, and hence primary prevention appears to be impossible. However, detecting the disease at an early stage is a key factor in improving the prognosis. The most popular procedure for screening and diagnosing breast cancer is X-ray mammography. Early detection of breast cancer can reduce the mortality rate by up to 25%, but around 10-30% of breast lesions are missed or cannot be interpreted during screening [6].
Image denoising methods aim to reduce noise while preserving image quality, because images are often subjected to additive noise during acquisition and transmission. The many existing estimators for image denoising can be classified as (1) those applied directly to the signal and (2) those that use a wavelet transform before processing [7]. Flat-detector Computed Tomography is used in several domains such as biomedical imaging, materials science and industrial diagnostic systems. The main problem is reconstructing high-quality images, particularly in biomedical fields for diagnosing diseases, organ defects and cancers. Scintillator defects and non-uniform detector sensitivity produce defective scanning data, which cause concentric ring artifacts in CT slices reconstructed by FDK-like algorithms. Ring artifacts degrade the quality of the CT slices and need to be corrected before the slices are used [8]. Medical image fusion aims to integrate data from multi-modal medical images to obtain highly accurate information about an object, which eases diagnosis and treatment. Medical images are known to be corrupted by noise during acquisition and transmission, which reduces the fusion quality. Noisy medical images lead to mistakes in characterizing the images, and hence noise is a major challenge for many traditional image fusion methods [9].
Denoising an image is a vital process, since the denoised image can be used in many applications. Moreover, image denoising is relevant in every field that involves image processing, as it is the simplest possible inverse problem [10]. Signal denoising is especially important in measurement and instrumentation because it reduces the uncertainty of the measuring process and enhances the accuracy and credibility of many procedures. Methods such as adaptive filters, statistical estimators and transform-domain methods have been explored to address this problem. Recently, sparse representation theory has gained ground; it suggests that the clean part of a noisy signal has a good sparse representation over a pre-defined dictionary, whereas the dictionary cannot sparsely represent the noise [11]. Sparse representation is a useful tool for image fusion, image denoising, compressive sensing, the electroencephalogram inverse problem, voxel selection, magnetic resonance spectroscopy, etc. Traditional sparse representation systems assume that the non-zero coefficients appear at random. However, sparse coefficients often show intrinsic structure in the form of clusters, which standard sparse representation does not consider. Hence, integrating this intrinsic structure into the sparse representation is a way to enhance its performance [12].
The sparsity of a representation varies with the transform and the signal properties. Multiresolution transforms achieve good sparsity for spatially localized information such as edges and singularities, which are abundant in natural images and carry a substantial part of their information. Hence, such transforms can be applied effectively in image denoising [13]. Various methods are available for denoising Magnetic Resonance Spectroscopic Imaging (MRSI) signals. The signal can be denoised by successive projections onto regions characterized by a set of linear time-frequency transforms. Explicit parametric models can also be employed in MRSI denoising, either before or during metabolite quantitation. This approach is very effective when the model is correct, but it introduces bias when the model is incorrect [14]. Remote sensing imagery is a significant technology used in applications such as earth climate monitoring, the military and agriculture. In practice, however, different types of stripe noise are present in remote sensing imagery acquired with cross-track and push-broom imaging, owing to variations in calibration error, detector response, etc. Stripe noise degrades the visual quality of remote sensing images and compromises their suitability for further processing [15].
Industrial tomography determines the distribution of materials in an imaging region from a set of multi-angle measurements. In Electrical Capacitance Tomography (ECT), 12 electrodes are mounted on the outer wall of the tube. The capacitance between electrode pairs is measured and then used for image reconstruction, which requires solving an inverse problem [16]. Single Image Super-Resolution (SISR) generates a high-resolution image from a low-resolution one. It is commonly used in computer vision applications such as medical, surveillance and security imaging, where finer image details are required. Traditional SISR methods include interpolation, such as bicubic interpolation and Lanczos resampling, and methods that use internal patch recurrence or statistical image priors [17]. X-ray CT is widely used in medicine and in some other industries. The increasing use of medical computed tomography leads to excessive exposure of patients to radiation. Based on the ALARA (As Low As Reasonably Achievable) principle, extensive research is under way to reduce the CT dose [18].
Patients who routinely undergo X-ray CT scans are at risk of developing cancer because of frequent exposure to radiation. Hence, it is important to minimize routine scanning of patients. However, low-dose X-ray CT suffers from severe artifacts due to the smaller number of photons, beam hardening, etc., which lowers the reliability of diagnosis. Therefore, interest in reconstructing high-quality data from low-dose X-ray CT is growing in the CT community. Conventional denoising methods are computationally expensive, and general image denoising methods cannot readily eliminate CT-specific noise. To overcome these issues, a deep learning algorithm for low-dose X-ray CT was introduced [19]. Medical imaging modalities, including CT, Magnetic Resonance Imaging (MRI), X-ray and ultrasound, are vulnerable to noise. The cause of the noise varies with the image acquisition method and with efforts to reduce patient exposure to radiation; i.e., if the exposure time is decreased, the noise increases. Hence, denoising is vital for proper image analysis by both machines and humans [20]. The contributions of the proposed medical image denoising method based on a two-stage iterative down-up CNN and SURF features are summarized as follows:
• The characteristics of noise and image are considered from the perspective of image decomposition. The proposed system restores the desired image by leveraging the noise map, which makes the method more robust to various noise levels.
• The proposed two-stage iterative down-up CNN model can process several different types of medical noise faster and with better performance. Experiments on noisy medical images such as CT and ultrasound images show that the proposed system consistently outperforms current state-of-the-art image denoising methods in both effectiveness and efficiency.

RELATED WORKS
For medical B-scan imaging, a speckle suppression method based on adaptive smoothing is introduced. In this method, filtering is performed with local kernels of appropriate shape and size. For each pixel, a filter kernel is obtained that is fitted to the local homogeneous area containing the processed pixel, using a region-growing technique based on local statistics. The model is examined using phantoms and images, and the filter is found to reduce speckle effectively while preserving resolvable information [1]. A space-invariant filter applies the same operation to every pixel of an image, which is not suitable for all image regions. The same problem occurs in nonlinear edge-preserving filters. Hence, a space-varying algorithm that makes use of local image content can be used to solve this problem [2]. Several noise reduction filters for CT images are examined in [3], including a combination of Gaussian and Prewitt operators and anisotropic diffusion-based filters. The modeling and removal of fixed-pattern noise in photonic mixer devices, which use the time-of-flight principle to calculate range and depth, is addressed in [4]. The input image is filtered using two types of directional smoothers, called Type 1 and Type 2 PWL directional smoothing, and a sharpening module processes the filter output. The amount of smoothing and sharpening is chosen based on the level of Gaussian noise [5]. A dyadic wavelet processing-based image denoising and enhancement algorithm that makes use of local iterative noise variance estimation is demonstrated in [6].
A hyper-analytic wavelet transform combined with filters used in the discrete wavelet transform is introduced; it is a simple yet fast image denoising model [7]. A variation-based model that uses a sparse prior to remove ring artifacts in CT slices is proposed. With the help of the Alternating Direction Method of Multipliers (ADMM), the model is divided into several simple sub-problems that can be solved easily [8]. A variation-based model that performs multi-modal medical image fusion and medical image denoising is presented. It uses a multiscale alternating sequential filter for image fusion and an adaptive fractional-order total variation (AFOTV) constraint to denoise the medical images [9].
Zero-mean, white, homogeneous additive Gaussian noise is removed from the input image using the K-Singular Value Decomposition (K-SVD) algorithm, based on sparse and redundant representations over a trained dictionary [10]. A Random Refined Orthogonal Matching Pursuit (RROMP) algorithm, which is a sparse recovery algorithm, denoises the signal by generating multiple sparse representations using False Discovery Rate (FDR) control and a multi-selection strategy [11]. A Dictionary Learning method based on Group Sparsity and Graph Regularization (DL-GSGR) is introduced; it denoises 3D medical images using a group sparse representation [12].
An image denoising method that uses an enhanced sparse representation in the transform domain is described. In this method, similar 2D fragments of an image are grouped into 3D arrays, and each 3D group is processed by collaborative Wiener filtering [13]. MRSI images with high Signal-to-Noise Ratio (SNR) are denoised using Casorati and Hankel matrix forms and low-rank approximation through Singular Value Decomposition (SVD). This method is also applicable to spatial-temporal and spatial-spectral images [14]. A Low-Rank-Based Single-Image Decomposition (LRSID) model is employed to convert the image de-striping problem into an image decomposition problem, separating the original image from the stripes [15].
A low-rank image decomposition method is demonstrated to reconstruct ECT images robustly and to recover corrupted and missing pixels. Convex optimization is carried out to obtain the exact error matrix and low-rank matrix [16]. A Single Image Super-Resolution model based on a very deep convolutional neural network, inspired by the VGG-net used for ImageNet classification, is established together with a simple and effective training procedure [17]. A Residual Encoder-Decoder Convolutional Neural Network (RED-CNN), which combines a deconvolution network, an auto-encoder and shortcut connections, is introduced for deep learning-based low-dose CT imaging [18].
A CNN model operating on the wavelet transform coefficients of low-dose CT images is described in [19]; it extracts the directional components of artifacts and exploits intra-band and inter-band correlations to reduce CT image noise. Denoising auto-encoders with convolutional layers are demonstrated for efficient medical image denoising, with the sample size boosted by combining heterogeneous images to enhance performance [20].

SYSTEM METHODOLOGY
The proposed model uses a Deep Two-Stage Iterative Down-Up CNN for medical image denoising, which repeatedly varies the feature-map resolution, together with Speeded Up Robust Features (SURF), which detects, extracts and describes the features of the medical images in a dataset that are useful for successful image denoising.

(i). Deep Learning Based Image Denoising
Integrating deep learning into medical image denoising improves the performance of the denoising algorithm. Several deep learning-based image denoising algorithms are available, and some of them are described as follows. A simple CNN architecture with 5 denoising layers did not achieve significant performance. An image denoising model based on an auto-encoder was also observed to perform worse than Block Matching and 3D Filtering (BM3D). A multilayer perceptron trained to learn the mapping from noisy image patches to denoised patches attains performance similar to that of BM3D. The Denoising Convolutional Neural Network (DnCNN) improves on conventional models by using convolution, batch normalization and the Rectified Linear Unit (ReLU) as basic building blocks and by training a deep network with global residual learning. A 7-layer CNN model was introduced that considers the trade-off between accuracy and computational cost; it successfully enlarges the receptive field using dilated convolution. The Memory Network (MemNet) provides a larger receptive field while keeping the number of parameters small with the help of a recursive CNN and dense skip connections. A Residual Dense Network (RDN), which uses residual learning and dense connections as its basic structure, increases feature reuse and attains a considerable performance gain in removing Gaussian noise. Each of these methods has its own limitations; to overcome them, a deep network based on down-up scaling is proposed in this paper.

(ii). Deep Networks Using Up-Down Scaling
To enlarge the receptive field while maintaining the depth and computational complexity of a network, a dilated convolution can be used; however, it sub-samples the features sparsely and is affected by gridding artifacts. A better trade-off between computational cost and receptive field is obtained by down-scaling and up-scaling the feature maps. The resolution of the feature maps can be halved by max pooling, while the number of feature maps is doubled to minimize information loss. Up-sampling followed by a 2×2 convolution is used to up-scale the low-resolution features. The Wavelet Transform (WT), which consists of convolution and sub-sampling, can also be used for down-scaling and up-scaling the feature maps. The deep down-up network comprises feature extraction, a Down-Up Block (DUB), reconstruction and enhancement. Feature extraction is carried out using SURF. For an input image of size $X \times Y$, the Deep Iterative Down-Up Network (DIDN) first extracts $N$ features using a 3×3 convolution, and then extracts features of size $\frac{X}{2} \times \frac{Y}{2} \times 2N$ via a convolution layer with stride 2. The features extracted from the input image are then passed to the DUB, in which down-scaling and up-scaling are each performed twice. A 3×3 convolution layer with stride 2 is used for down-scaling, and a sub-pixel layer is used for up-scaling. The reconstruction block comprises 9 convolution layers and a Parametric Rectified Linear Unit (PReLU). The enhancement block uses a 1×1 convolution to reduce the number of feature maps output by the reconstruction block; an up-scaling step in a sub-pixel layer then produces the denoised image as the output.
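The shape bookkeeping behind down-scaling and sub-pixel up-scaling can be sketched as follows. This is a minimal illustration, not the network used in the paper: the function names are hypothetical, a padding of 1 is assumed for the 3×3 stride-2 convolution, and the channel ordering of the sub-pixel rearrangement follows the commonly used pixel-shuffle convention, which is also an assumption here.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution (standard formula; padding=1 is assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def down_scale_shape(shape):
    """(H, W, C) -> (H/2, W/2, 2C): stride-2 3x3 convolution that doubles the feature maps."""
    h, w, c = shape
    return (conv_out_size(h), conv_out_size(w), 2 * c)


def sub_pixel_shape(shape, r=2):
    """(H, W, C) -> (rH, rW, C / r^2): shape produced by a sub-pixel layer."""
    h, w, c = shape
    if c % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    return (h * r, w * r, c // (r * r))


def pixel_shuffle(x, r=2):
    """Rearrange x[channel][row][col] so that groups of r*r channels become r x r pixel blocks."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for row in range(h * r):
            for col in range(w * r):
                # Source channel is chosen by the position inside the r x r block.
                src = c * r * r + (row % r) * r + (col % r)
                out[c][row][col] = x[src][row // r][col // r]
    return out


features = down_scale_shape((64, 64, 32))   # halve the resolution, double the feature maps
restored = sub_pixel_shape(features)        # double the resolution, divide the feature maps by 4
```

The sub-pixel layer only rearranges values, so any increase in the number of feature maps before it has to come from a preceding convolution.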

(iii). Two-Stage Convolutional Neural Network
The proposed deep iterative down-up CNN iteratively down-scales and up-scales the feature maps with the help of a convolutional layer and a deconvolutional layer, each with stride 2.

a. Dataset
In this paper, two different datasets are used. The medical image dataset developed for the denoising process includes images from devices such as Computed Tomography (CT) and ultrasound imaging. The CT image dataset is obtained from the National Biomedical Imaging Archive (NBIA), and the ultrasound image dataset is obtained from the teaching files of Gelderse Vallei Hospital in Ede, Netherlands.

b. Image Decomposition
The noise in medical images mostly consists of random and structural noise. Hence, image degradation can be modeled as $Y = X + N$, where $Y$ is the observed noisy image, $X$ is the clean image and $N$ is the noise component. The image decomposition process simultaneously estimates $X$ and $N$ from $Y$.
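The decomposition idea can be illustrated with a small sketch. It assumes an additive degradation model, in which the observed image is the clean image plus a noise map; this model, the Gaussian noise, the value of sigma and the function names are all assumptions made for illustration. In the proposed system the noise map would be estimated by the network, whereas here the true noise map is reused to show the relationship.

```python
import random


def degrade(clean, sigma, seed=0):
    """Return (noisy, noise), where noisy = clean + noise (assumed additive model)."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, sigma) for _ in row] for row in clean]
    noisy = [[x + n for x, n in zip(x_row, n_row)]
             for x_row, n_row in zip(clean, noise)]
    return noisy, noise


def restore(noisy, noise_estimate):
    """Recover the image by subtracting an estimated noise map from the observation."""
    return [[y - n for y, n in zip(y_row, n_row)]
            for y_row, n_row in zip(noisy, noise_estimate)]


clean = [[10.0, 20.0], [30.0, 40.0]]          # toy 2x2 "image"
noisy, noise = degrade(clean, sigma=5.0)
recovered = restore(noisy, noise)             # a perfect noise estimate gives back the clean image
```

The better the estimated noise map, the closer the restored image is to the clean one, which is why leveraging the noise map helps across noise levels.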

c. SURF Feature Extraction
SURF extracts robust local features, using the Hessian matrix for detection and the distribution of Haar wavelet responses for description. SURF uses box filters to approximate the Laplacian of Gaussian (LoG). This approximation has the advantage that convolution with a box filter is easily computed using integral images and can be done in parallel at different scales. In addition, SURF relies on the determinant of the Hessian matrix for both location and scale.
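The reason box filters are cheap with integral images can be seen in a short sketch. The function names are hypothetical and the image is a plain nested list; the point is that, once the integral image is built, the sum over any rectangle costs four lookups regardless of the rectangle size.

```python
def integral_image(img):
    """Build a (H+1) x (W+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the rectangle starting at (top, left), using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)        # whole image
lower_right = box_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

A box-filter response is a weighted combination of a few such rectangle sums, so its cost does not grow with the filter size.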

HESSIAN DETECTOR
The Hessian detector detects important features automatically regardless of variations in viewpoint. Scale invariance is attained by finding and verifying features at different scales, i.e., in a scale space that is divided into octaves and levels. The scale space is implemented in the form of an image pyramid, as shown in Fig. 3. The levels of the pyramid are obtained by smoothing with Gaussian kernels and sub-sampling. SURF applies a Hessian detector across these layers. The scale space is evaluated by up-scaling the filter size, using integral images combined with the fast Hessian matrix method. The processing time of the SURF filters is independent of their size. Hence, scales can be processed simultaneously and image sub-sampling can be avoided, which improves performance. SURF uses a Hessian-based blob detector to find the important features. The determinant of the Hessian matrix gives the strength of the response and expresses the local change around the region. A key step in SURF detection is non-maximal suppression of the determinants of the Hessian matrix. Because convolution is costly, it is approximated and sped up using integral images and approximated kernels. SURF approximates the kernels with rectangular box filters. Hence, the approximated convolution can be computed efficiently for arbitrarily sized kernels using the integral image.
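A minimal sketch of the detection step is given below. It assumes the standard SURF approximation of the Hessian determinant, in which the mixed box-filter response is scaled by a relative weight (0.9 by default), and a 3×3×3 neighbourhood for non-maximal suppression across scale and space. The function names, the threshold parameter and the nested-list layout are illustrative assumptions.

```python
def hessian_det(dxx, dyy, dxy, w=0.9):
    """Approximated determinant of the Hessian from box-filter responses."""
    return dxx * dyy - (w * dxy) ** 2


def is_local_max(stack, s, y, x, threshold=0.0):
    """True if stack[s][y][x] exceeds the threshold and all 26 neighbours.

    stack is indexed as stack[scale][row][col]; (s, y, x) must not lie on a border.
    """
    value = stack[s][y][x]
    if value <= threshold:
        return False
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (ds, dy, dx) == (0, 0, 0):
                    continue
                if stack[s + ds][y + dy][x + dx] >= value:
                    return False
    return True


# Toy response stack: 3 scales of 3x3 responses with one clear peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = hessian_det(2.0, 3.0, 1.0)
peak = is_local_max(stack, 1, 1, 1)
```

Only positions that survive this suppression are kept as interest points; a practical detector would additionally interpolate the peak position in scale and space.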
The approximated, discrete kernels are denoted $D_{yy}$ for $L_{yy}$ and $D_{xy}$ for $L_{xy}$. The weight $w$ is scale-sensitive but is kept constant at 0.9. A descriptor must provide robust and unique descriptions of features. It is generated from the region around the interest point. The SURF descriptor uses Haar wavelet responses, which are computed efficiently using integral images. To achieve rotational invariance, a unique orientation must be determined for each feature; before the descriptor is calculated, the region surrounding the interest point is rotated to this orientation. The SURF descriptor defines a region of interest of size $20s$, where $s$ is the scale of the interest point. The region of interest is split into 4×4 sub-regions, each described by the wavelet response values in the $x$ and $y$ directions, denoted $d_x$ and $d_y$ respectively. For each sub-region, a vector $v$ is computed on the basis of 5×5 samples. In Fig. 4, the selected square delimits one of the 16 sub-regions and the dots represent the sample points. The feature descriptor is the concatenation of the vectors of the 16 sub-regions. It is then normalized to achieve invariance to contrast variation, which acts as a linear scaling of the descriptor.
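The assembly of the descriptor can be sketched as follows, assuming the standard SURF choice of four values per sub-region (the sums of the responses and of their absolute values in each direction), which gives a 64-dimensional vector. The Gaussian weighting and the rotation to the dominant orientation are omitted for brevity, and the function name and the 20×20 nested-list inputs are illustrative assumptions.

```python
import math


def surf_descriptor(dx, dy):
    """Build a normalized 64-d descriptor from 20x20 grids of Haar responses."""
    desc = []
    for block_row in range(4):
        for block_col in range(4):
            sum_dx = sum_dy = abs_dx = abs_dy = 0.0
            # Each sub-region covers 5x5 regularly spaced samples.
            for y in range(block_row * 5, block_row * 5 + 5):
                for x in range(block_col * 5, block_col * 5 + 5):
                    sum_dx += dx[y][x]
                    sum_dy += dy[y][x]
                    abs_dx += abs(dx[y][x])
                    abs_dy += abs(dy[y][x])
            desc.extend([sum_dx, sum_dy, abs_dx, abs_dy])
    norm = math.sqrt(sum(v * v for v in desc))
    # Unit-length normalization removes the effect of a linear contrast change.
    return [v / norm for v in desc] if norm > 0 else desc


dx = [[1.0] * 20 for _ in range(20)]
dy = [[-1.0] * 20 for _ in range(20)]
descriptor = surf_descriptor(dx, dy)

# Scaling the responses (a contrast change) leaves the descriptor unchanged.
scaled = surf_descriptor([[3.0 * v for v in row] for row in dx],
                         [[3.0 * v for v in row] for row in dy])
```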

d. Two-Stage Iterative CNN (TSIC)
In this paper, a Two-Stage Deep Iterative Down-Up CNN model is proposed for denoising medical images and making them suitable for diagnosis. The network consists of three types of layers: convolution, batch normalization and Rectified Linear Unit (ReLU) layers. The convolutional layer extracts the features using SURF. The batch normalization layer prevents the vanishing-gradient and divergence problems. The ReLU layer provides sparsity and nonlinearity. The algorithms for training and testing the two-stage iterative down-up CNN model for medical image denoising are given in Algorithm-1 and Algorithm-2, respectively.
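The roles of the batch normalization and ReLU layers can be illustrated on a single list of activations. This is a simplified sketch: the scale and shift parameters (gamma, beta) and the epsilon value are assumed defaults, the statistics are computed over one small batch, and the function names are hypothetical.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize activations to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(values):
    """Zero out negative activations; this is the source of nonlinearity and sparsity."""
    return [max(0.0, v) for v in values]


activations = [-200.0, -50.0, 0.0, 50.0, 200.0]   # widely spread pre-activations
normalized = batch_norm(activations)              # rescaled to a well-behaved range
outputs = relu(normalized)                        # negative entries become exactly zero
sparsity = sum(1 for v in outputs if v == 0.0) / len(outputs)
```

Keeping activations in a controlled range is what helps gradients neither vanish nor blow up as the network gets deeper.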

PERFORMANCE MEASURES
The performance metrics used to measure the performance of the proposed methodology on the different datasets are given as follows:

(i). Accuracy
The accuracy of the proposed model is the ratio of the sum of the True Positives and True Negatives to the total of the True Positives, True Negatives, False Positives and False Negatives obtained from the given dataset of medical images.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

(vi). Recall
Recall is the ratio of the True Positives to the sum of the True Positives and False Negatives obtained from the dataset of medical images.

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

(vii). Precision
Precision is the ratio of the True Positives to the sum of the True Positives and False Positives obtained from the dataset of medical images.

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

(viii). F1-Score
The F-measure, $F_\beta$, is computed from the Precision and Recall metrics. Setting $\beta = 1$ gives the F1-score, which is the harmonic mean of Precision and Recall.
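The metrics described in this section can be computed from the four confusion-matrix counts as in the sketch below. The function names and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; the F-measure uses the standard definition with a weighting parameter beta, which reduces to the harmonic mean of Precision and Recall when beta is 1.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def f_beta(precision, recall, beta=1.0):
    """Standard F-measure; beta = 1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return safe_div((1 + b2) * precision * recall, b2 * precision + recall)


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, FPR, recall, precision and F1 from confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "fpr": safe_div(fp, fp + tn),
        "recall": recall,
        "precision": precision,
        "f1": f_beta(precision, recall, beta=1.0),
    }


metrics = classification_metrics(tp=6, tn=2, fp=2, fn=2)
```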

CONCLUSION
In this paper, a Two-Stage Deep Iterative Down-Up Convolutional Neural Network is proposed for denoising medical images, which repeatedly decreases and increases the feature-map resolution. In contrast to existing approaches to medical image denoising, the proposed approach utilizes the characteristics of both the image and the noise components simultaneously. Speeded Up Robust Features (SURF) is used in the proposed system as a feature detector and descriptor. Extensive experiments on noisy medical image datasets such as Computed Tomography (CT) and ultrasound images show that the proposed model consistently exceeds the performance of the current state-of-the-art in medical image denoising, both in efficiency and in effectiveness. The major advantages of the proposed model are speed and latency, security, and cost savings. Speed and latency mean that the traffic load of a large enterprise computing medical data is reduced, thereby improving the performance of applications and services. Security means that problems related to local compliance, privacy regulations and data sovereignty are resolved. Cost savings mean that the computation not only eliminates the need for the cloud but also optimizes the data flow, which reduces computational costs.