Deep learning-based segmentation of breast masses using convolutional neural networks

Automatic breast tumor segmentation based on convolutional neural networks (CNNs) is significant for the diagnosis and monitoring of breast cancer. CNNs have become an important method for early diagnosis of breast cancer and can thus help decrease the mortality rate. To assist medical professionals in breast cancer investigation, a computerized system based on two encoder-decoder architectures for breast tumor segmentation has been developed, using two pre-trained models: DeepLabV3+ and U-Net. The encoder generates a high-dimensional feature vector, while the decoder analyses the low-resolution feature vector provided by the encoder and generates a semantic segmentation mask. Semantic segmentation based on deep learning techniques can overcome the limitations of traditional algorithms. To assess the efficiency of breast ultrasound image segmentation, we compare the segmentation results provided by the CNNs against the Local Graph Cut technique (a semi-automatic segmentation method) in the Image Segmenter application. The segmentation outputs were evaluated using the Dice similarity coefficient, which compares the ground truth images provided by the specialists against the predicted segmentation results provided by the CNNs and the Local Graph Cut algorithm. The proposed approach is validated on 780 breast ultrasonographic images of the public BUSI database, of which 437 are benign and 210 are malignant. The BUSI database provides classification labels (benign or malignant) and ground truth binary mask images. The average Dice scores computed between the ground truth images and the CNN predictions were 0.9360 (malignant) and 0.9325 (benign) for the DeepLabV3+ architecture, and 0.6251 (malignant) and 0.6252 (benign) for U-Net, respectively.
When the segmentation results provided by the CNNs were compared with the Local Graph Cut segmented images, the Dice scores were 0.9377 (malignant) and 0.9204 (benign) for the DeepLabV3+ architecture, and 0.6115 (malignant) and 0.6119 (benign) for U-Net, respectively. The results show that DeepLabV3+ has significantly better segmentation performance and outperforms the U-Net network.


Introduction
Nowadays, breast cancer is still the main cause of death among women. To detect possible breast tumors, the examination is performed using different screening procedures. Among them, mammography is a primary screening method for breast cancer, with very good performance in the detection of small tumors and microcalcifications. However, it uses ionizing radiation and breast compression, and dense breast tissue can hide small tumors; this is the main limitation that considerably reduces the sensitivity of the mammography technique [1]. Another breast imaging method is ultrasound imaging. This technique does not use ionizing radiation, is more cost-effective, and allows the detection of tumors in dense breast tissue. To differentiate benign and solid breast lesions, the ultrasound technique evaluates various features such as morphology, orientation, lesion boundary and lesion size. As a diagnostic tool, breast ultrasound (BUS) imaging performs well in the early detection of cancer.
Various tools for suspicious lesion localization and segmentation (i.e., outlining the boundaries of the lesion to differentiate it from the background tissue) have been developed. Localization and segmentation of the tumor aim to separate it from the normal breast tissue; correct segmentation of breast tumors is a necessary stage in the diagnostic process. Automatic segmentation can provide quantitative data useful to radiologists in the analysis of breast cancer. Segmentation performance is assessed against ground truth images that are generated manually through boundary delineation. Early efforts were devoted to computer-aided diagnostic systems, which are important tools for assisting medical imaging professionals [2,3]. Nowadays, artificial intelligence (AI) plays a leading role in clinical practice, as it saves time, performs tedious activities much faster, diminishes radiologist overload and, in some cases, helps less experienced practitioners [4][5][6][7][8][9][10][11][12][13][14][15][16][17]. AI includes machine learning and deep learning as efficient computational tools for biomedical big data storage, analysis and understanding. In image processing, the main deep learning tool is the convolutional neural network (CNN), with several models proposed in recent years. Generally, a CNN has input and output layers, along with convolution, max-pooling and fully connected layers, which enable the CNN to learn a huge number of abstract features.
The proposed approach aims to find a deep learning network solution to automatically detect and segment breast cancer with high accuracy. To this end, our main contributions are as follows: (i) BUS image segmentation using the Local Graph Cut method from MATLAB as a benchmark; (ii) BUS image segmentation using two encoder-decoder architectures, namely DeepLabV3+ and U-Net; (iii) segmentation performance analysis using the Dice similarity coefficient computed between the ground truth images and the predicted segmentation results provided by the two CNNs; and (iv) a comparison between the Local Graph Cut segmented images and the predicted segmentation results provided by the two CNNs.

Related Works
The development of AI research on breast ultrasound has increased tremendously. Many studies devoted to image classification, object detection, segmentation and synthetic imaging of breast lesions have been published. Vakanski et al. [6] proposed a deep learning model for breast tumor segmentation in BUS images. This approach introduced attention blocks into a U-Net architecture and learned feature representations that prioritize spatial regions with high levels of saliency. The model was trained on a dataset of 510 images and a Dice similarity coefficient of 90.5% was reported. Two different encoder-decoder architectures for breast tumor segmentation, SegNet and U-Net, were proposed in [7]. The proposed model used a 0.85/0.15 split for the training/validation datasets. The U-Net architecture returned the best qualitative and quantitative results: the SegNet architecture provided a mean intersection over union (IoU) of 68.88%, against 76.14% for the U-Net architecture. Four semantic segmentation models based on the AlexNet, U-Net, SegNet and DeepLabV3+ CNNs were analyzed in [8]. Over 3000 BUS images were used for training and validation, and the segmentation performance was quantified by the F1-score and IoU. The best results were achieved by the models based on SegNet and DeepLabV3+, with an F1-score > 0.90 and an IoU > 0.81. Tsochatzidis et al. [9] proposed a CNN approach to analyze mammographic information for breast cancer diagnosis, obtaining improved diagnostic performance with a CNN classifier. Xu et al. [10] used CNNs to segment 3D BUS images and various metrics to evaluate the segmentation performance; their results indicated that the obtained segmentations can facilitate breast cancer diagnosis. Singh et al. [11] proposed a deep learning method for breast tumor segmentation based on texture features and contextual dependencies. The dilated (atrous) convolution allows capturing the spatial context (the position and size of the tumors), and the model can examine tumors of various shapes and sizes. They reported Dice and IoU values of 93.76% and 88.82%, respectively.
A combination of a graph CNN and a classical CNN to improve the detection of malignant lesions in breast mammograms was proposed in [12]. The authors reported improved performance, i.e., a sensitivity of 96.20%, a specificity of 96.00% and an accuracy of 96.10%, and concluded that the proposed method improves the detection of malignant breast masses. Salama and Aly [13] proposed a new technique based on the ResNet50, MobileNetV2, InceptionV3, VGG16 and DenseNet-121 models to segment the area of interest in mammographic images. Three mammographic datasets were used to evaluate the proposed models: MIAS, DDSM and CBIS-DDSM. The best classification performance was reported for InceptionV3 and a modified U-Net on the DDSM mammography dataset. Luo et al. [14] proposed a new segmentation framework based on CAD technology and deep learning algorithms for breast tumor classification. Initially, the network is trained to obtain enhanced images of the segmented tumors; features are then extracted from the raw and enhanced images by two parallel networks. A new cascaded CNN consisting of a U-Net, a bidirectional attention guidance network and a refinement residual network for breast lesion segmentation was proposed in [15]. The results indicated that the cascaded convolutional algorithm is able to improve diagnostic performance. A selective kernel U-Net CNN model for BUS image segmentation was developed in [16]; the network's receptive fields are adjusted using both a selective kernel and an attention mechanism to provide the fused feature maps. Another CNN model was used to build an "attention enhanced U-Net" for breast segmentation, with improvements in the obtained results [17].

Proposed method
The encoder-decoder architecture employs a decoder network that restores the resolution of the feature maps produced by the encoder network layers. This mapping aims to recover a mask that retains the tumor segmentation at the original image size.
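This resolution bookkeeping can be sketched by tracing the spatial size of the feature maps through a four-level encoder-decoder. The sketch below is illustrative only (it is not the paper's code): it assumes 'same'-padded convolutions, so that only pooling and up-sampling change the spatial size, and the 256-pixel input size is a hypothetical choice.

```python
# Illustrative sketch: trace feature-map spatial sizes through a four-level
# encoder-decoder, assuming 'same'-padded 3x3 convolutions (size-preserving),
# 2x2 max-pooling in the encoder and 2x2 transposed convolutions in the decoder.

def unet_spatial_sizes(input_size, depth=4):
    """Return (encoder_sizes, decoder_sizes) for an encoder-decoder of given depth."""
    encoder = [input_size]
    s = input_size
    for _ in range(depth):      # each stage: two same-padded convs, then pool
        s //= 2                 # 2x2 max-pooling halves the resolution
        encoder.append(s)
    decoder = []
    for _ in range(depth):      # each stage: 2x2 transposed conv doubles it
        s *= 2
        decoder.append(s)
    return encoder, decoder

enc, dec = unet_spatial_sizes(256)
print(enc)  # [256, 128, 64, 32, 16]
print(dec)  # [32, 64, 128, 256] -- the mask is recovered at the input size
```

The last decoder size equals the input size, which is exactly the property the decoder needs in order to produce a mask at the original image resolution.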
DeepLabV3+ employs Xception-65 as its backbone. This module is based on depthwise separable convolutions with different strides, which decompose a convolution into a depthwise convolution and a pointwise convolution. It uses atrous spatial pyramid pooling (ASPP) to increase the field-of-view without increasing the number of parameters. In the decoder, the ASPP output feature map is up-sampled to match the resolution of the low-level encoder features and, after refinement, is up-sampled again. The spatial information is thus restored progressively to pick up the boundary information of the target, so the loss of intrinsic spatial information is minimized. The decoder uses bilinear up-sampling to restore the initial spatial resolution.

U-Net architecture description
U-Net [22] is a CNN used for semantic segmentation with a symmetric architecture. It consists of an encoder devoted to spatial feature extraction and a decoder that generates the segmentation map using the encoded features. Each encoder stage consists of two 3 × 3 convolution operations followed by one 2 × 2 max-pooling operation; this stage is repeated four times. In the decoder, a 2 × 2 transposed convolution operation followed by two 3 × 3 convolution operations is used for feature map generation; this sequence is also repeated four times. Finally, the segmentation map is obtained by a 1 × 1 convolution operation. The ReLU (Rectified Linear Unit) activation function is used in the convolutional layers, while the final convolutional layer uses a Sigmoid activation function. The connection between the encoder and the decoder is made by a sequence of two 3 × 3 convolution operations. The U-Net network architecture is presented in figure 3.

The Local Graph Cut segmentation method is a semi-automated segmentation technique. To segment the breast mass, a region of interest (ROI) is drawn around it; the boundary of the ROI marks the breast mass to be segmented. It is an interactive segmentation in which some information is provided by the user.
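The atrous (dilated) convolution at the core of the ASPP module discussed above can be illustrated with a naive sketch. This is not the paper's implementation; it is a minimal pure-NumPy version with a hypothetical input, showing that a dilation rate r enlarges the effective receptive field of a k × k kernel to k + (k − 1)(r − 1) pixels while keeping the same k × k parameters.

```python
import numpy as np

# Illustrative sketch: naive 2D atrous (dilated) convolution. The kernel
# weights stay k x k, but they are applied on a dilated sampling grid.

def atrous_conv2d(image, kernel, rate=1):
    k = kernel.shape[0]
    eff = k + (k - 1) * (rate - 1)          # effective receptive field size
    h, w = image.shape
    out = np.zeros((h - eff + 1, w - eff + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input on a grid spaced by the dilation rate
            patch = image[i:i + eff:rate, j:j + eff:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
avg = np.full((3, 3), 1 / 9)                 # 3x3 averaging kernel
print(atrous_conv2d(img, avg, rate=1).shape)  # (4, 4): 3x3 field of view
print(atrous_conv2d(img, avg, rate=2).shape)  # (2, 2): 5x5 field of view
```

ASPP applies several such convolutions with different rates in parallel, so the network sees the tumor at multiple scales without any extra parameters per kernel.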
The segmentation performance is evaluated using the Dice similarity coefficient. This coefficient is computed between the CNN segmentation results and the Local Graph Cut segmentations used as benchmark (to compare the performance and efficiency of the segmentation methods). The segmentation results provided by the CNNs are also compared with the segmentations performed manually by the radiologists (ground truth images). The flowchart of the segmentation step is shown in figure 4.
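The Dice similarity coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect agreement). A minimal sketch of the computation follows; the tiny masks are toy examples, not BUSI data.

```python
import numpy as np

# Sketch of the Dice similarity coefficient: 2 * |A ∩ B| / (|A| + |B|).
# A small epsilon guards against division by zero for two empty masks.

def dice_coefficient(pred, truth, eps=1e-7):
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)

gt   = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])  # toy ground-truth mask
pred = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])  # toy predicted mask
print(round(dice_coefficient(pred, gt), 4))  # 2*2/(3+3) -> 0.6667
```

In this study the same formula is evaluated per image, once against the radiologists' masks and once against the Local Graph Cut masks, and the per-class averages are reported.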

Results and Discussion
The BUS images are segmented using the Local Graph Cut algorithm in the MATLAB environment. These segmentation results are used as benchmark images, together with the ground truth images provided by the radiologists, for evaluating the subsequent segmentations performed with the convolutional networks.
Visualizations of the segmentation results are displayed in figures 5 and 6. Overall, the segmentation outputs provided by DeepLabV3+ are closer to the ground truth images provided by both the radiologists and the Graph Cut algorithm used as benchmark (figure 5). DeepLabV3+ provides more accurate segmentations than the U-Net network: in terms of Dice scores (0.9360 for malignant and 0.9325 for benign lesions), DeepLabV3+ clearly outperformed U-Net, whose segmentation performance is significantly worse (0.6251 for malignant and 0.6252 for benign lesions). The encoder-decoder structure of the DeepLabV3+ network allows better control of the resolution of the extracted encoder features.
Moreover, the DeepLabV3+ model performed slightly better on malignant tumors than on benign ones. This finding suggests that the low intensity of benign tumors made them more difficult to segment. For the U-Net model, we could not find a pattern in the segmentation performance. Our results indicate that a classical encoder-decoder structure alone cannot successfully generate accurate segmentation outputs. A previous study reported a similar segmentation performance for a DeepLabV3+ model, with a Dice coefficient of 0.8690 [23]. Another study tested the segmentation performance of U-Net models on BUS images and reported a Dice score of 0.7177 as the best performance of the model [24].

Conclusions
To assist medical professionals in breast cancer diagnosis, a computerized system based on two encoder-decoder architectures for breast tumor segmentation was proposed. The segmentation performance of the DeepLabV3+ and U-Net models was investigated. Significant performance differences were found in terms of the Dice similarity coefficient under various experimental conditions. The proposed DeepLabV3+ achieves promising performance, while U-Net provides the worst results among the analyzed methods. Semantic segmentation based on deep learning techniques can overcome the limitations of traditional algorithms.

Figure 1. Various BUS images of the BUSI dataset [16]. Benign lesion in gray-scale image (a) and ground truth image (b). Malignant lesion in gray-scale image (c) and ground truth image (d).

Figure 2. The description of the DeepLabV3+ network architecture.

Figure 3. The U-Net network architecture.

Figure 4. The flowchart of breast mass segmentation by the CNNs.

Figure 5. Visualizations of the DeepLabV3+ segmentation results. The top left three images are benign images (from left to right: original raw image, ground truth provided by the radiologists, and segmentation output by DeepLabV3+). The top right three images are benign images (from left to right: original raw image, ground truth provided by the Graph Cut algorithm, and segmentation output by DeepLabV3+). The bottom images are malignant images and the experimental conditions are the same.

Figure 6. Visualizations of the U-Net segmentation results. The images in the first row are benign images (the first two images, from left to right, correspond to the ground truth provided by the radiologists and the result predicted by U-Net; the following two images correspond to the ground truth provided by the Graph Cut algorithm and the result predicted by U-Net). The images in the second row are malignant images and the experimental conditions are the same.

Figure 7. (a) Average Dice score for the segmentation performance of DeepLabV3+; (b) average Dice score for the segmentation performance of U-Net. The central lines indicate the median Dice score values; the "boxes" are the interquartile range and the "whiskers" indicate the smallest and largest values.