Cervical single cell of squamous intraepithelial lesion classification using shape features and extreme learning machine

Cervical cancer is an abnormal growth of cells on the cervix. It is generally identified early through a Pap smear test. However, this examination is still performed manually by doctors, and the results are subjective. This study therefore aims to classify Squamous Intraepithelial Lesions automatically from cervical single cells. The classes are normal cervical cells, Low-Grade Squamous Intraepithelial Lesion (LSIL), and High-Grade Squamous Intraepithelial Lesion (HSIL). We used the Extreme Learning Machine (ELM) as a classifier and compared its performance with that of a Backpropagation Neural Network. We used 225 images in 3 classes: normal, LSIL, and HSIL. Manual cropping and segmentation served as image pre-processing, and feature extraction was based on shape features: Circularity, Semi Major and Minor Axis Length, Equivalent Diameter, Average Radius, and Compactness. The study concluded that ELM performed better than the Backpropagation Neural Network in classifying Squamous Intraepithelial Lesions. With both methods run on the 225 images, the highest training accuracy was 96.67% for Backpropagation and 100% for ELM.


Introduction
Cancer is a disease that arises when body tissue cells grow abnormally and turn into cancer cells. Among the types of cancer, cervical cancer ranks fourth in incidence, after breast cancer, in developing countries [1]. Cervical cancer is generally caused by the Human Papilloma Virus (HPV), which changes the DNA of cells. This causes cells to grow continuously, so early detection is necessary. Diagnosis is often delayed because symptoms are not visible and become obvious only at the final stage. Cervical cancer is generally detected through an early examination, the Pap smear, which is very important in reducing the incidence of cervical cancer [2]. The examination requires medical personnel to reach an accurate diagnosis. However, medical personnel still analyse the result visually, so the results are subjective. An automatic analysis is therefore needed to support the diagnosis of cervical cancer as a second opinion for doctors, so that cervical cancer cells can be diagnosed accurately.
Previous studies have identified cervical cancer cells using SVM, k-NN, and ANN to separate normal and abnormal cells. One such study extracted features with the Gray Level Co-occurrence Matrix (GLCM) and reported accuracies of 86% with SVM, 70% with k-NN, and 65% with ANN. Subsequent research used morphological feature extraction and classified the images with the k-nearest neighbor (k-NN) method, reaching an accuracy of 82.9% with 5-fold cross-validation [3]. Another study employed the random forest method and obtained an accuracy of 81.71%; to identify cells and extract features, it used the Gray Level Co-occurrence Matrix, local binary pattern, and Tamura features [4]. Further research used the multilayer perceptron method to identify normal and abnormal cells, extracting morphological features such as size, shape, and texture, with an accuracy of 85.05% [5]. Another study employed the fuzzy min-max neural network to classify normal cells, low-grade squamous intra-epithelial lesions (LSIL), and high-grade squamous intra-epithelial lesions (HSIL), with adaptive fuzzy moving k-means (AFMKM) as the feature extraction method. The accuracy obtained in that study was 75% [6].
We used the Extreme Learning Machine (ELM) method and compared the result with Backpropagation, a commonly used image classification method. The ELM has the advantage of working well even on complex functions, with both linear and non-linear data [7], and it has a good learning ability for producing accurate image classification. In this study, the ELM method was used to classify Squamous Intraepithelial Lesions automatically from images of cervical single cells.

Datasets
Cervical cancer Pap smear image data were obtained from Dr. Soetomo General Hospital. The data are images of Squamous Intraepithelial Lesion in jpg format, consisting of the Normal, LSIL (Low-Grade SIL), and HSIL (High-Grade SIL) classes. A total of 225 cervical cancer Pap smear images were used: 180 for the training process and 45 for the testing process.

Feature Extraction
The shape features to be extracted are Circularity, Semi Major and Minor Axis Length, Equivalent Diameter, Average Radius, and Compactness. Normal and abnormal cervical cells have a broadly similar shape; however, abnormal cells are larger and more irregular than normal cells. At the feature extraction stage, 8 parameters are calculated for the nucleus and the cytoplasm, as shown in Table 1.
In the ELM, $g(w_i \cdot x_j + b_i)$ is the output of the $i$-th hidden neuron for the input $x_j$, where $w_i$ is the input weight vector and $b_i$ is the bias of that neuron. Collecting these outputs in the hidden-layer output matrix $H$ gives $H\beta = T$, where $\beta$ is the matrix of the output weights and $T$ is the matrix of the targets. In ELM, the input weights and hidden biases are determined randomly, so the output weights associated with the hidden layer can be determined from the equation:
$$\beta = H^{\dagger} T$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$. The structure of the ELM is shown in Figure 4.
Data normalization is carried out so that the features do not differ too widely in range, which helps the classifier produce accurate results when the data are processed.
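The ELM training rule described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the sigmoid activation, the uniform range of the random weights, the number of hidden neurons, min-max scaling as the normalization scheme, and the synthetic 8-feature data are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def minmax_normalize(X):
    """Scale each feature column to [0, 1] (an assumed normalization scheme)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant columns
    return (X - lo) / span

def hidden_output(X, W, b):
    """Sigmoid activation g(w_i . x_j + b_i) for every sample j and hidden neuron i."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden, rng):
    """Draw random input weights and biases, then solve beta = pinv(H) @ T."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = hidden_output(X, W, b)                               # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Return the class index with the largest network output."""
    return np.argmax(hidden_output(X, W, b) @ beta, axis=1)

# Synthetic stand-in data (NOT the study's data): 3 classes, 60 samples each, 8 features.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 8, [4.0] * 4 + [0.0] * 4, [0.0] * 4 + [4.0] * 4])
labels = np.repeat(np.arange(3), 60)
X = minmax_normalize(centers[labels] + rng.normal(scale=0.3, size=(180, 8)))
T = np.eye(3)[labels]                                        # one-hot target matrix

W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
train_acc = np.mean(elm_predict(X, W, b, beta) == labels)
print(f"training accuracy on toy data: {train_acc:.3f}")
```

Because only the output weights are fitted, and in a single linear least-squares step, training needs no iterative weight updates; this is the usual explanation for ELM training being faster than Backpropagation.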

Segmentation
Prior to the extraction of shape features, manual segmentation was carried out through cropping. A single field of view contains several cells, so each field is cropped into several images containing cells indicated as the same type. The cropped images measure 431 x 432 pixels, as shown in Figure 5. The RGB image produced by manual segmentation is converted into a YCbCr image. This is done to differentiate the nucleus from the cytoplasm, so that both regions can be selected and outlined using the regionprops function. The program can then calculate the 8 shape parameters of the cytoplasm and of the nucleus. The average values of the calculated parameters are shown in Figure 6 and Figure 7.
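The colour-space conversion step can be illustrated as follows. This is a sketch assuming the full-range (JPEG/JFIF) form of the BT.601 YCbCr transform; the study does not say which variant it used, and some toolboxes default to the studio-range form instead. The function names and the luma threshold in `dark_region_mask` are hypothetical and are not values taken from the study.

```python
import numpy as np

# Full-range BT.601 coefficients (the JPEG/JFIF convention); an assumed variant.
_RGB_TO_YCBCR = np.array([
    [0.299, 0.587, 0.114],            # Y  (luma)
    [-0.168736, -0.331264, 0.5],      # Cb (blue-difference chroma)
    [0.5, -0.418688, -0.081312],      # Cr (red-difference chroma)
])

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) uint8 RGB image to a uint8 YCbCr image."""
    ycbcr = rgb.astype(np.float64) @ _RGB_TO_YCBCR.T
    ycbcr[..., 1:] += 128.0           # centre both chroma channels on 128
    return np.clip(np.rint(ycbcr), 0, 255).astype(np.uint8)

def dark_region_mask(ycbcr, luma_threshold=100):
    """Hypothetical helper: mark pixels whose luma is below a chosen threshold."""
    return ycbcr[..., 0] < luma_threshold

# Tiny 1 x 3 test image: mid-grey, pure red, white.
pixels = np.array([[[128, 128, 128], [255, 0, 0], [255, 255, 255]]], dtype=np.uint8)
converted = rgb_to_ycbcr(pixels)
print(converted[0])
```

Separating luma from chroma in this way lets a region be selected on one channel at a time, for example by thresholding, before its shape is measured.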

Feature Extraction
Feature extraction is the stage of taking the important characteristics (features) of the cells after the image segmentation process. This study uses shape features consisting of 8 parameters, among them Circularity, Semi Major and Minor Axis Length, Equivalent Diameter, Average Radius, and Compactness.
The results of segmentation and feature extraction show that normal and abnormal epithelial cells differ in size and shape. The graph in Figure 7 shows that the nuclei of normal and abnormal epithelial cells have a similar overall shape, but not a similar regularity of form. The value of each parameter increases from class to class, which indicates that normal nuclei are smaller in size and shape than abnormal nuclei. It can therefore be concluded from the extraction results that the higher the abnormal class, the larger the shape and size of the nucleus. Abnormal epithelial cells also have a more irregular shape than normal epithelial cells. For the cytoplasm, on the other hand, the more abnormal the cervical cells are, the smaller its shape and size, to the point where it may not be visible at all. As shown in Figure 8, the graph of each parameter decreases, which indicates that the size and shape of the cytoplasm become smaller from normal cells to the HSIL class.
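The named shape features can be computed from a closed region outline as sketched below. The study's own formulas are not reproduced in the text, so the definitions here are common textbook ones and should be read as assumptions: circularity as 4πA/P², compactness as its inverse, equivalent diameter as the diameter of a circle with the same area, average radius as the mean centroid-to-outline distance, and semi-axes from the region's second central moments. The function name is hypothetical, and the dictionary keys, which include area and perimeter as intermediate values, are not necessarily the study's 8 parameters.

```python
import numpy as np

def shape_features(contour):
    """Shape descriptors of a closed polygon given as an (N, 2) array of (x, y) vertices."""
    x, y = contour[:, 0].astype(float), contour[:, 1].astype(float)
    xn, yn = np.roll(x, -1), np.roll(y, -1)        # next vertex, wrapping around
    cross = x * yn - xn * y
    signed_area = 0.5 * cross.sum()                # shoelace formula (sign = orientation)
    area = abs(signed_area)
    perimeter = np.hypot(xn - x, yn - y).sum()
    # Centroid of the enclosed region.
    cx = ((x + xn) * cross).sum() / (6.0 * signed_area)
    cy = ((y + yn) * cross).sum() / (6.0 * signed_area)
    # Second moments of the region, shifted to the centroid (variances and covariance).
    sxx = ((x**2 + x * xn + xn**2) * cross).sum() / (12.0 * signed_area) - cx**2
    syy = ((y**2 + y * yn + yn**2) * cross).sum() / (12.0 * signed_area) - cy**2
    sxy = ((x * yn + 2 * x * y + 2 * xn * yn + xn * y) * cross).sum() / (24.0 * signed_area) - cx * cy
    eigvals = np.linalg.eigvalsh(np.array([[sxx, sxy], [sxy, syy]]))  # ascending order
    return {
        "area": area,
        "perimeter": perimeter,
        "circularity": 4.0 * np.pi * area / perimeter**2,
        "compactness": perimeter**2 / (4.0 * np.pi * area),
        "equivalent_diameter": np.sqrt(4.0 * area / np.pi),
        "average_radius": np.hypot(x - cx, y - cy).mean(),
        "semi_major_axis": 2.0 * np.sqrt(eigvals[1]),  # exact for an ideal ellipse
        "semi_minor_axis": 2.0 * np.sqrt(eigvals[0]),
    }

# Synthetic outlines (not study data): a circle of radius 10 and a 30 x 20 ellipse.
t = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
circle = shape_features(np.column_stack([10 * np.cos(t), 10 * np.sin(t)]))
ellipse = shape_features(np.column_stack([50 + 30 * np.cos(t), 40 + 20 * np.sin(t)]))
print(circle["circularity"], ellipse["circularity"])
```

Under these definitions a circle has a circularity close to 1 and any less regular outline scores lower, which is the kind of size and regularity difference the features are meant to capture.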

Training and Testing Process
In this study, k-fold cross-validation was used to divide the data into k subsets, here with k = 5 (5-fold cross-validation). The results in Table 2 show that ELM reaches an accuracy of 100% on fold 1 with a training time of 6 seconds, while Backpropagation reaches an accuracy of 96.67% with a training time of 12 seconds.
In the ELM testing process, accuracy was evaluated with 5-fold cross-validation in order to obtain a reliable validation accuracy, and the validation result obtained was 95%. The backpropagation results in Table 2 likewise come from testing the program on normal, LSIL, and HSIL images. The highest accuracy at the testing stage, 93.89%, was obtained in the 4th fold. It can be seen that ELM and backpropagation neural networks are
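The fold arithmetic can be checked with a short sketch: with 225 samples and k = 5, each fold holds out 45 samples for testing and leaves 180 for training, matching the split given in the Datasets section. The plain shuffled split below is an assumption, since the text does not say whether the folds were stratified by class, and the function name is hypothetical.

```python
import numpy as np

def kfold_indices(n_samples, k, rng):
    """Yield (train_idx, test_idx) pairs for k shuffled folds of near-equal size."""
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]                        # fold i is held out for testing
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(kfold_indices(225, 5, np.random.default_rng(0)))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: {len(train_idx)} training, {len(test_idx)} testing")
```

Every sample appears in exactly one test fold, so averaging the per-fold accuracies uses each image once for validation.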

Conclusion
The classification of Squamous Intraepithelial Lesions using ELM performs better than classification using the Backpropagation Neural Network. The training time was 6 seconds for ELM and 12 seconds for Backpropagation, so the ELM computation is faster and the program takes less time to run during training and testing. The highest training accuracy was 96.67% for Backpropagation and 100% for ELM when both methods were run on the 225 images.