Prognostication of Acute Lymphocytic Leukemia (ALL) using Capsule Network Algorithm

Leukemia is a type of cancer that affects the blood-forming tissues of the body, including the lymphatic system and bone marrow. Acute lymphoblastic leukemia, also called acute lymphocytic leukemia (ALL), is the second most commonly occurring acute leukemia. Around 25% of cases arise from malignant T-cell precursors, while the remaining 75% arise from precursors of the B-cell lineage. In general, response to chemotherapy, white blood cell count and age are the clinical factors that contribute to risk stratification. In recent years, however, genetic alterations have been found to inform individual prognosis and recovery. Despite advances in technology, chemotherapy using anthracycline, corticosteroids and vincristine remains the backbone therapy for this disease. In this work, we use a deep convolutional neural network to detect the presence of ALL accurately and, based on the screened image, categorize it into one of 4 subclasses. Using the Capsule Network algorithm (CapsNet), we obtain 100% average sensitivity for ALL detection, with a highest specificity of 99.56%, precision of 99.82% and accuracy of 99.36%. Compared with similar methodologies, the capsule network achieves higher accuracy without microscopic image segmentation.


Introduction
Leukemia is a type of cancer that gives rise to malignant white blood cells (WBCs). These cells affect the bone marrow as well as the blood cells, weakening immunity and leaving the human body vulnerable to infection [1]. They also impair the bone marrow's capacity to generate platelets and red blood cells. The WBCs may also damage the organs through which they flow, such as the brain, spleen, kidney and liver. Leukemia can be classified as myelogenous or lymphoblastic/lymphocytic based on the kind of WBC that is affected [2]. In myelogenous leukemia the affected cells are monocytes and granulocytes, whereas in lymphocytic leukemia the affected cells are lymphocytes. According to a case study in the United States, ALL in adults was at a peak, with more than 7000 cases recorded in a year. ALL is caused by genetic alterations and chromosomal abnormalities in the proliferation and differentiation of lymphoid precursor cells [3]. Acute lymphoblastic leukemia is the proliferation and malignant transformation of lymphoid progenitor cells in the bone marrow, blood and extramedullary sites. Though ALL is predominantly found in children, it often proves fatal when it occurs in adults. Another survey conducted in the United States put the incidence of ALL at 1.6 per 100,000 people. A total of 7862 new cases of ALL were diagnosed in 2018, accounting for 1425 deaths. These observations indicate that ALL incidence follows a bimodal distribution: the first peak occurs during childhood, while the second occurs at the age of 53 [4]. In pediatric patients, dose-intensification strategies are effective in overcoming the disease at early stages. In adults, however, the prognosis remains poor.
In fact, beyond a certain age, only 30-40% of patients respond to high-dose chemotherapy [5]. In ALL, owing to certain abnormalities, the affected cells tend to divide and grow very quickly and may live longer than normal cells. Over time, these cells accumulate in the bone marrow and crowd out healthy blood cells, reducing the count of healthy platelets, red blood cells and white blood cells and producing the signs and symptoms of leukemia [6]. In this paper, we concentrate on acute lymphocytic leukemia (ALL) and a means to detect it at an early stage.
ALL can be further categorized into three subtypes, L1, L2 and L3, according to the French-American-British (FAB) classification.
• L1 type cells: These cells are usually small and uniform in physical appearance.
• L2 type cells: These cells are considerably larger than L1 type cells and have variable shapes.
• L3 type cells: These cells have an oval or round nucleus of normal size and are identical to each other. They hold a reasonable amount of cytoplasm, in which vacuoles form. They are larger than L1 type cells.
A complete blood count test can be used to diagnose lymphoblastic leukemia; a spike in the WBC count is a strong indicator of leukemia [7]. Many methods are used to diagnose leukemia manually. In this paper, we propose a methodology that applies image processing to captured microscopic blood images to identify the presence of leukemia.

Related Work
In recent years, a number of studies have been conducted to enhance image quality. Contrast stretching is one such technique that is useful in diagnosing ALL, as it brings out extra detail in the cell nucleus and cytoplasm [8]. A median filter is used to improve image quality, producing a smoother object while preserving edge detail. Segmentation is an important step in identifying medical objects; it is used to isolate the part of the cell body that holds the cytoplasm and nucleus [9]. One method for segmentation is thresholding. Similarly, K-means clustering can separate objects using color as the medium. Other methods such as morphological reconstruction and the watershed distance transform are also useful for segmentation [10]. Feature extraction follows the segmentation process. The extracted features are input values that capture an object's basic characteristics. Such features have been reported to reach an accuracy of 96%, while other features such as the ratio and area of the nucleus reach about 90% in identifying the object. The standard deviation and mean of the RGB colors can also be used as color features [11]. Object classification has been performed with the support vector machine (SVM), a supervised classification technique. An SVM is effective for both linearly and non-linearly separable data, splitting the data with a hyperplane classifier [12]. Applied with linear separation, this technique segregated 6 cell types with an accuracy of 96% [13].
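As an illustration of the contrast stretching step mentioned above, the following is a minimal NumPy sketch, not the implementation used in the cited studies. The function name `contrast_stretch` and the percentile cut-offs are assumptions chosen for the example.

```python
import numpy as np

def contrast_stretch(image, low_pct=2.0, high_pct=98.0):
    """Linearly rescale intensities so the chosen percentiles map to 0 and 255.

    The 2nd/98th percentile defaults are illustrative assumptions.
    """
    img = image.astype(np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: there is no intensity range to stretch.
        return image.astype(np.uint8)
    stretched = (img - lo) / (hi - lo)
    # Values outside the percentile range are clipped to the output limits.
    return (np.clip(stretched, 0.0, 1.0) * 255).round().astype(np.uint8)

# Hypothetical low-contrast patch whose values occupy only 100..150.
patch = np.array([[100, 125], [150, 125]], dtype=np.uint8)
stretched_patch = contrast_stretch(patch)
```

After stretching, the darkest and brightest pixels of the patch span the full 0 to 255 range, which is what makes nucleus and cytoplasm boundaries easier to separate in later steps.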

Image Acquisition
This work uses the publicly available ALL-Image DataBase (IDB) online data set [14] for study and analysis. Two sub-datasets are created with images from the main dataset. The first set, ALL1, consists of 100 images, of which 60 are from healthy individuals and 40 are from leukemia-affected subjects.
Figure 2 shows sample healthy, L1, L2 and L3 blast cells from the database. The inter- and intra-class similarities and variabilities pose challenges in the classification of ALL images. For this reason, we use the Capsule Network algorithm to enable quick and efficient classification of the ALL images. In addition, more ALL images are obtained from other sources such as Google for testing and verification. 60% of the obtained images are used for training and the remaining 40% for evaluation. Overtraining is avoided with data augmentation.
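The 60/40 division can be sketched as follows. The text does not say how images were assigned to each part, so the per-class (stratified) shuffling, the fixed seed and the function name `stratified_split` are assumptions made for illustration; the class counts assume that the 40 non-healthy images of the first set are leukemia images.

```python
import random

def stratified_split(labels, train_fraction=0.6, seed=0):
    """Return (train_idx, eval_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, eval_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        eval_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(eval_idx)

# Assumed class counts for the first set: 60 healthy, 40 leukemia.
labels = ["healthy"] * 60 + ["leukemia"] * 40
train_idx, eval_idx = stratified_split(labels)
```

Splitting within each class keeps the healthy-to-leukemia ratio the same in the training and evaluation parts, which matters when the classes are of unequal size.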

Image Processing
Overfitting may occur if the number of training cases is insufficient. To overcome this issue, a data augmentation scheme manipulates the available images with mirroring and rotation. This increases the number of images to 1000, of which 384 are from healthy subjects and 616 are from leukemia-affected subjects. The leukemia images are further categorized by type: 295 L1 type images, 235 L2 type images and 86 L3 type images. The final results may be imbalanced due to the difference in the number of images in each class.
The length of a capsule's output vector can be interpreted as the probability that the current input contains the entity characterized by the capsule. A CapsNet can contain several capsule layers. In this paper, we use a primary capsule layer that transforms the convolution layer output and a CancerCaps layer that categorizes the images into L1, L2 and L3 types. As many convolution layers as required can be placed before the primary capsule layer. Convolution with a stride greater than 1 can be used to reduce the dimensionality of the model. Each capsule in the CancerCaps layer is connected to every capsule in the primary capsule layer. To improve the learning capability, a routing-by-agreement algorithm is implemented.
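The two capsule ideas above, vector length as a probability and routing-by-agreement between the primary capsules and the CancerCaps layer, can be sketched in NumPy as below. This follows the commonly used dynamic routing formulation and is not claimed to be the exact network of this work; the capsule counts, the vector dimension, the three routing iterations and the random prediction vectors are all assumptions for illustration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vectors so their length lies in [0, 1) while keeping direction."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route_by_agreement(u_hat, num_iters=3):
    """Route primary-capsule predictions to class capsules.

    u_hat: array of shape (num_primary, num_classes, dim) holding each primary
    capsule's prediction for each class capsule.
    Returns (v, c): class capsule outputs of shape (num_classes, dim) and the
    coupling coefficients of shape (num_primary, num_classes).
    """
    num_primary, num_classes, _ = u_hat.shape
    b = np.zeros((num_primary, num_classes))        # routing logits
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)        # softmax over class capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum of predictions
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # reward agreeing predictions
    return v, c

# Hypothetical sizes: 8 primary capsules, 3 class capsules (L1, L2, L3), 4-D vectors.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 3, 4))
v, c = route_by_agreement(u_hat)
class_scores = np.linalg.norm(v, axis=-1)  # vector lengths, read as class probabilities
```

The predicted type is the class capsule with the longest output vector, for example `int(np.argmax(class_scores))`.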

Results and Discussion
Detection and classification of ALL and its types are performed using a trained CapsNet model. RGB color images, HSV (hue, saturation, value) color images, YCbCr (luminance, chroma blue, chroma red) color images and a combination of these representations are used in each dataset during training. 60% of the obtained images are used for training and the remaining 40% for evaluation. 100% average sensitivity is obtained for ALL detection, with a highest specificity of 99.56%, precision of 99.82% and accuracy of 99.36%, as shown in Table 1. Classification is performed during the training stage and the output is modelled during the testing phase. Images are evaluated with Minimal Overlap Probability over multiple iterations and multiple feature combinations of the images. The gamma and cost values are also evaluated during the training and testing phases of data processing. A radial basis function kernel is used for comparison with the proposed CapsNet model. Six features, namely area, perimeter, circularity, nucleus ratio, mean and standard deviation, are considered in this work. Figure 4 plots the accuracy for combinations of these features.
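For reference, the four reported measures follow from the counts of a binary confusion matrix (leukemia as the positive class). The sketch below uses the standard definitions; the function name `detection_metrics` and the example counts are hypothetical and are not the results of this work.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of leukemia images detected
        "specificity": tn / (tn + fp),   # share of healthy images correctly cleared
        "precision": tp / (tp + fp),     # share of leukemia predictions that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

A sensitivity of 100% corresponds to `fn == 0`, meaning no leukemia image in the evaluation set is classified as healthy.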

Fig. 4: Accuracy for combination of various features
The highest-accuracy parameter is assessed and combinations of multiple features are studied. Figure 5 compares the accuracy levels for each type of cell analysed in this work. Compared with previous work in this domain, the proposed model provides the highest accuracy. This work also uses the largest dataset of images for both training and evaluation. CapsNet also addresses noise suppression in the image by using frequency and conservative smoothing filters, which improves the performance of the proposed model. Max pooling and convolution are performed in this work. A maximum likelihood estimator is used for routing. The lower-level capsule votes are iteratively fitted to the higher-level capsules by means of a Gaussian distribution. Cluster estimation of the higher-level capsules' coupling coefficients is performed using the capsule activations and their probability values. Network regularization is performed with routing alone; the training process does not involve image reconstruction. Routing and convolution transformations are executed within a specific receptive field. Compared with the conventional routing procedure, this scheme significantly reduces the number of parameters involved.

Conclusion and Future Scope
In this work, we have introduced a novel capsule network algorithm for identifying and classifying ALL based on its characteristics. Three types are diagnosed and, according to the type of ALL detected, the analysis can be passed on for treatment of the disease. Using the Capsule Network algorithm (CapsNet), we obtained 100% average sensitivity for ALL detection, with a highest specificity of 99.56%, precision of 99.82% and accuracy of 99.36%. This methodology can be used to diagnose leukemia at an early stage and help prevent its spread. To optimize the proposed work, an in-depth analysis of the architecture can also be made in terms of the dimensions of capsules in different layers, the number of convolutional layers and the type of capsule network used. A cross-validation of the observed results proves very promising, indicating that the proposed work performs better than classical convolutional neural networks.
As part of future work, researchers can apply this methodology to various deep learning architectures. Apart from ALL, other types of leukemia can also be detected by incorporating the capsule network algorithm. Since the algorithm learns the aspects of ALL from scratch, we have used data sets extensively. This will also enable oncologists and pathologists to understand and analyze leukemia more efficiently. Researchers can also attempt full automation of the system by determining the output and input parameters. Similarly, acute myeloid leukemia (AML) can be detected using this methodology to identify and differentiate between the different blood cells. More broadly, a computer-aided decision system can be used to diagnose the disease more quickly.