Deep Convolutional Neural Network for Computer-Aided Detection of Breast Cancer Using Histopathology Images

Innovation in medical imaging technologies is driving rapid change in health care. In recent years, various deep learning algorithms have played a significant role in medical image classification and diagnosis. The deep convolutional neural network (DCNN) has obtained impressive results in many health-related applications. Fine-tuning parameters and initializing weights are the major tasks in adapting pre-trained convolutional models. We explored transfer learning approaches using AlexNet and VGG-16 and analyzed their behavior. In addition, a DCNN framework was developed and compared with the AlexNet and VGG-16 transfer learning models. The DCNN attained significantly better results than the transfer learning models, procuring outstanding performance for binary (93.38%) and multi-class (average 89.29%) classification that exceeds previous state-of-the-art techniques in the literature.


Introduction
Breast cancer is one of the leading causes of cancer worldwide. In 2012, 1.7 million new breast cancer cases were registered, and breast cancer is also the dominant cause of cancer death around the world. An estimated 627,000 women died of breast cancer in 2018, approximately 15% of all cancer deaths among women. Breast cancer accounts for 25.2% of cancer cases across all age groups, higher than any other cancer [1]. Early-stage detection increases the chance of initiating effective treatment and improving survival rates [2]. Initial-stage breast cancer diagnosis is performed with different imaging modalities such as mammography, thermal imaging, and ultrasonography. The possibility of carcinoma growth is identified by breast biopsy techniques [3]. The examined cancer tissue is stained with hematoxylin and eosin (H&E), which colors the cytoplasm pink and the nuclei purple. Staining is the process used to identify and grade cancer cells. In this work, histopathology images taken from slides are classified using deep learning techniques.
Deep learning is a powerful tool that learns from automatically extracted features to perform tasks such as regression and classification. Deep learning developed from the classical neural network and differs in its many hidden layers and training paradigms. It plays a substantial role in pattern recognition, machine learning, and computer vision [4]. The raw data passes through multiple layer-by-layer operations with optimal weights, and at a high level of abstraction the feature set is extracted automatically. A variety of histopathology image analysis techniques have been applied using deep convolutional neural networks [5], [6]. In computer-aided diagnosis, the deep neural network provides hierarchical unsupervised learning for feature extraction, and supervised steps are needed to optimize the classification process [7]-[9]. In [22], the authors introduced a patch-based CNN classifier that operates in two modes: one patch in one decision and all patches in one decision. In [23], the authors proposed a CNN-based method to classify histopathological images from 269 images; the accuracies of the CNN-based SVM are 77.8% (4-class) and 83.3% (2-class). Szegedy et al. proposed the Inception_V3 network, which obtained 78-93.9% accuracy [24]. Patient-level classification of breast cancer with CNN and multi-task CNN (MTCNN) models achieved patient recognition rates of 83.25% for the CNN and 82.13% for the MTCNN [25]. Mutaza et al. (2019) proposed transfer learning using AlexNet on the BreakHis dataset and achieved 81.25% accuracy, 77.46% specificity, 82.49% sensitivity, 91.79% precision, and 86.89% F-measure; these results were reached by fine-tuning the hyperparameters [40]. In [41], the authors utilized the ICPR 2012 and 2014 datasets for histopathology image diagnosis using a region convolutional neural network (R-CNN); their grading system, which fuses ResNet-50 and DenseNet-201, accomplished quality results.

Dataset
Fabio et al. introduced a breast cancer dataset of histopathology images (BreakHis), acquired from 82 patients [26]. Performance was examined using BreakHis for binary and multi-class classification. BreakHis is a large-scale dataset comprising 7,909 benign and malignant images, with four subclasses for each category. The images are categorized under four magnification factors (40×, 100×, 200×, 400×) and eight subclasses [27]. Each image is an eight-bit-depth RGB image of size 700×460. Table 1 shows the composition of the BreakHis dataset.

Methods
In this section, we describe the deep convolutional neural network (DCNN) and the fine-tuned transfer learning classification approaches. First, we present the DCNN architecture with 15 layers. Then we apply the pretrained models AlexNet and VGG-16 to our dataset. Finally, we compare the classification accuracy of the DCNN and the transfer learning methods for binary and multi-class classification.

The DCNN Architecture
The CNN is designed as a deep 15-layer architecture with learnable parameters for binary and multi-class classification. The layer-by-layer DCNN design is shown in figure 1. All input histopathology images are resized to 28×28 with eight-bit-depth RGB channels. The convolution layer retrieves features from the input layer. The weights convolved with the input are known as the kernel. The amount of filter shifting is called the stride, which is usually smaller than the kernel [28], [29]. The kernel size is 3×3, and the stride value is 2. The weights are passed through the ReLU activation layer. The pooling layer performs dimensionality reduction by downsampling the features with stride 1. The fully connected layer has complete connections to all neurons, and its activation is computed with a bias offset and matrix multiplication.
The learning parameters used to develop the DCNN model are based on the stochastic gradient descent (SGD) optimizer [30], and the parameters are selected as in [31]. SGD performs frequent weight updates and hence converges faster on larger datasets. For the training process, the learning rate is chosen as 0.001, which gives better results. Training is terminated at a maximum of 300 epochs, since accuracy showed no further improvement beyond that point. The validation frequency is set to 10. Fig 2 shows the schematic representation of the proposed DCNN for breast cancer classification.
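The SGD weight update described above can be sketched in a few lines of Python (a minimal illustration only; the experiments themselves were run in MATLAB, and the weight and gradient values here are hypothetical):

```python
def sgd_step(weights, grads, lr=0.001):
    """One stochastic gradient descent update, w <- w - lr * dL/dw,
    using the learning rate of 0.001 chosen for the DCNN."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Hypothetical weights and per-weight gradients for a single update step.
weights = [0.5, -0.2, 0.1]
grads = [1.0, -2.0, 0.5]
weights = sgd_step(weights, grads)
```

In full SGD training, this update is repeated for every mini-batch over up to 300 epochs, with validation accuracy checked every 10 iterations as stated above.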
The DCNN is a feed-forward neural network that includes a series of convolution and subsampling layers. The function of each layer depends on three parameters: height, width, and the number of channels.
ReLU is the activation function utilized between the pooling layer and the convolution layer [32]. The activation function produces an activation map M:

M = f(W * h + b)    (1)

where W denotes the weights, h the input, and b the bias. The dimension of the activation map is given as

((W1 - F)/S + 1) × ((H1 - F)/S + 1) × 1    (2)

where W1 and H1 are the input width and height, F the filter size, and S the stride. By applying K different kernels, the dimension of the activation map becomes

((W1 - F)/S + 1) × ((H1 - F)/S + 1) × K    (3)

The convolution layer (l) takes as input n1 feature maps of size n2 × n3. The convolution layer receives the raw input images and sends feature maps to the output layer; the output likewise consists of feature maps whose size is given by equation (5) [34], [35]. The j-th feature map in the convolution layer is denoted by

x_j^l = b_j^l + Σ_i k_ij^l * x_i^(l-1)    (4)

where b_j^l is the bias parameter learned from the training data and k_ij^l denotes the kernel. The convolution layer width and height can be determined using

W_c = (W_in - F_w)/S_w + 1,  H_c = (H_in - F_h)/S_h + 1    (5)

where W_c and H_c represent the convolution layer width and height, W_in and H_in denote the input image width and height, F_w and F_h refer to the convolution filter width and height, and S_w and S_h represent the stride width and height.
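As a concrete illustration of equations (4) and (5), the following minimal pure-Python sketch computes one output feature map of a valid convolution over a single input channel (a toy example for exposition only; the actual model was built from standard framework layers, which implement the sliding-window product as cross-correlation):

```python
def conv2d_valid(x, k, bias=0, stride=1):
    """Valid 2D convolution of input x with kernel k, as in eq. (4):
    y[r][c] = bias + sum over (u, v) of k[u][v] * x[r*s+u][c*s+v].
    The output size per dimension follows eq. (5): (W_in - F)/S + 1."""
    F = len(k)
    out_h = (len(x) - F) // stride + 1
    out_w = (len(x[0]) - F) // stride + 1
    y = [[bias] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            for u in range(F):
                for v in range(F):
                    y[r][c] += k[u][v] * x[r * stride + u][c * stride + v]
    return y

# A 4x4 input with a 3x3 all-ones kernel: output side is (4-3)/1 + 1 = 2.
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
y = conv2d_valid(x, [[1] * 3 for _ in range(3)])
# y == [[54, 63], [90, 99]]: each entry sums one 3x3 window of x.
```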
The mathematical representation of the DCNN architecture with input size 28×28 is illustrated by the following calculation. In the first convolution layer (C1), we use a filter size of 3×3, stride 1, and 8 kernels, so by equation (5) each feature map has side (28 - 3)/1 + 1 = 26, giving a 26×26×8 output.
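That size arithmetic can be checked with a one-line helper (illustrative only; the layer name C1 follows the text):

```python
def conv_output_size(in_size, kernel, stride):
    """Spatial output size of a valid convolution, per eq. (5)."""
    return (in_size - kernel) // stride + 1

# First convolution layer C1: 28x28 input, 3x3 filter, stride 1, 8 kernels.
side = conv_output_size(28, 3, 1)  # 26
c1_shape = (side, side, 8)         # (26, 26, 8) output feature maps
```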

Alex-net
Transfer learning is the process of using fine-tuned parameters from a network trained on one task to perform a different task [36]. AlexNet is a deep neural network architecture whose fully connected layers contain 4096 neurons. Using the pretrained network, we analyze the strength of the network when transferring it to the medical image domain. Training AlexNet took more than 4 hours, depending on the specification of the system and the size of the dataset. In this experiment, the pathological database is divided into two sections: one for training (80%) and another (20%) for validation. Termination of training is based on observation of the highest accuracy. The training process using AlexNet is shown in figure 3.
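The 80%/20% split described above can be sketched as follows (the file names are hypothetical stand-ins; the real experiment partitions the BreakHis images):

```python
import random

def train_val_split(items, train_frac=0.8, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

# Hypothetical image file names standing in for BreakHis slides.
images = ["img_%04d.png" % i for i in range(100)]
train, val = train_val_split(images)
# len(train) == 80, len(val) == 20
```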

Figure 3. Schematic representation of the fine-tuned AlexNet
In our experiment, we change the input image size to 227×227. The complete dataset is then split into two groups for binary and multi-class classification. The final fully connected layer is replaced for dimensionality reduction: for binary classification it has 2 outputs, and for multi-class classification it has 4. Training is limited to 3 epochs, and the learning rate is 10^-5.

VGG-16
VGG-16 is a multi-layered deep convolutional neural network; the layers used in VGG-16 are depicted in figure 6. The fully connected layer comprises 4096 neurons [37]. Convolution and max-pooling layers are used for feature extraction and dimensionality reduction [38]. The entire dataset is divided into binary and multi-class sets for training and validation: 80% of the data is utilized for training, and the remainder is used for validation. The validation frequency is 20 with three epochs, and the learning rate is 10^-5. The training time depends on the size of the dataset and the specification of the system. The fine-tuned VGG-16 pre-trained model for histology image classification is shown in figure 4.
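For reference, the VGG-16 fine-tuning settings above can be collected in one place (the dictionary form is purely illustrative; the values are those stated in the text):

```python
# VGG-16 fine-tuning hyperparameters as stated in the text
# (the dictionary itself is illustrative, not part of the original setup).
vgg16_finetune = {
    "train_fraction": 0.8,      # 80% training, 20% validation
    "epochs": 3,
    "learning_rate": 1e-5,
    "validation_frequency": 20,
}
```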

Results and Discussions
The present work manifests uniform results in the pretrained models and improved results in the DCNN model. The DCNN, AlexNet, and VGG-16 experiments were carried out with the following system configuration: Intel(R) Core(TM) i5, Windows 7, 4 GB RAM, single CPU. The experiments were run using Matlab 2018b. In the DCNN, 90% of the data is used for training and 10% for validation. In the pre-trained models (AlexNet and VGG-16), 80% of the data is used for training, and the rest is applied for validation. From Table 2, it can be observed that the DCNN accomplishes significantly dominant results, whereas AlexNet and VGG-16 are comparable. To analyze the performance of the models with respect to training and validation for both binary and multi-class classification, the splits DCNN (90%-10%), AlexNet (80%-20%), and VGG-16 (80%-20%) are employed to obtain better accuracy. Our proposed method attained 93.38% average accuracy for the binary class and 89.29% average accuracy for the multi-class case. The proposed method is compared with existing models in Table 3. Figures 6 and 7 illustrate the different performance graphs for binary and multi-class classification with various magnification factors using the pre-trained models (AlexNet and VGG-16). Figure 8 shows the training and validation accuracy of histopathology classification using the deep convolutional neural network (DCNN). The outcome of the proposed CNN model surpasses the modified AlexNet and VGG-16.

Conclusion and Future works
Breast cancer detection at an early stage helps the patient undergo timely medical treatment. Handcrafted features combined with machine learning algorithms give lower accuracy, and the final decision depends on the robustness of the extracted features. Hence, deep learning techniques with convolutional neural networks are explored in the present work. In this paper, binary and multi-class classification is performed using deep features extracted by the DCNN, AlexNet, and VGG-16. The BreakHis dataset is used to carry out this work. AlexNet and VGG-16 are adapted to identify breast cancer in histopathology images. The comparison results clearly indicate that the proposed DCNN architecture outperforms the fine-tuned pre-trained models. Future research could be directed toward using different pre-trained models and varying hyperparameters. In real-time applications, binary classification can be done first by the developed DCNN, and further grading can then be performed with the developed multi-class DCNN. Thus, the developed architecture is immensely useful for real-time applications.