Comparison of machine learning algorithms for chest X-ray image COVID-19 classification

Artificial Intelligence and Machine Learning algorithms have been used to identify coronavirus (COVID-19) from chest X-ray images. In this analysis, the authors propose a model for early coronavirus detection based on image filtering strategies and a hybrid feature selection model. Traditional statistical and machine learning methods are used to extract features from the images. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers are then applied to the selected features, and a confusion matrix separating COVID-19-infected patients from normal patients is obtained for each, so that the two approaches can be compared. The results show that SVM achieves the highest precision, 97%, compared with 86% for KNN.


Introduction
Coronavirus (COVID-19) spreads from an infected person through saliva droplets, coughs, and sneezes. Most individuals infected with COVID-19 develop mild respiratory illness, but those with underlying conditions such as cardiovascular disease, asthma, chronic respiratory disease, and cancer are more likely to develop serious illness. Individuals over the age of 60 and those with underlying medical conditions are at high risk for COVID-19 [1]. The coronavirus, which causes severe acute respiratory syndrome, spreads rapidly among humans. One proposed method classifies COVID-19 from CT and X-ray images together with clinical signs. In that work, 253 samples from COVID-19-infected patients were collected from different sources, and a licensed clinical laboratory tested blood samples from 49 patients, 24 of whom were infected. Using cross-validation, the approach detected infected patients with a sensitivity of 96.95 percent and a precision of 95 percent [2].

Detecting the disease from X-ray scans is also one of the fastest ways to identify infected patients. Early studies found abnormalities in the chest X-rays of infected patients, and Artificial Intelligence and Machine Learning algorithms can identify coronavirus from chest X-ray images. Another study classified the images using a CNN with a softmax classifier, SVM, and random forest. The CNN was used in two scenarios: direct image classification, and feature extraction in a hybrid method, where the parameters were trained and evaluated using the derived features. With the proposed algorithm, the CNN achieved a precision of 95.2 percent, higher than the other approaches [3]. Using radiographic scans to detect the infection is one of the quickest diagnostic strategies. Early findings indicate that the chest X-rays of COVID-19 patients are visibly

Methodology
The dataset repository contains 5,863 chest X-ray images (JPEG) in two categories (Pneumonia/Normal). The dataset was obtained from the open-source Kaggle website; 80% of it is used for training and 20% for testing [6]. The chest X-ray scans were acquired as part of patients' routine outpatient care. All chest X-rays were first screened for quality control, and low-quality or unreadable scans were excluded. Two specialists then graded the diagnoses of the images before they were cleared for training the AI system. A third specialist reviewed the evaluation set to check for grading errors. The X-ray images vary in grey levels, patch scale, dimensions, and features [7].
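The 80/20 split described above can be sketched as follows. This is a minimal sketch assuming a simple random (unstratified) split of image indices with a fixed seed; the section does not state how the split was drawn, and the function name `split_80_20` is hypothetical.

```python
import numpy as np

def split_80_20(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into train and test index arrays.

    Hypothetical helper: a plain random split, not necessarily the one used here.
    """
    rng = np.random.default_rng(seed)           # fixed seed for reproducibility (assumption)
    indices = rng.permutation(n_samples)        # random order of 0 .. n_samples - 1
    n_train = int(n_samples * train_fraction)   # floor of 80% of the samples
    return indices[:n_train], indices[n_train:]

# Usage with the 5,863 images in the repository.
train_idx, test_idx = split_80_20(5863)
print(len(train_idx), len(test_idx))  # 4690 1173
```

Because both classes (Pneumonia/Normal) are present, a stratified split that preserves the class ratio in each part would be a reasonable alternative.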

Pre-processing
The preprocessing phase applies a median filter, an average filter, and histogram equalization to remove noise and enhance contrast across the entire image. The median filter replaces each pixel with the median of the pixel values in its surrounding neighborhood. The average filter smooths the image by reducing intensity variations between adjacent pixels, replacing each pixel with the average value of its neighborhood, including itself [8]. Histogram equalization increases image contrast by spreading the most frequent intensity values across the full intensity range, producing a higher-contrast image.
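The three preprocessing operations can be sketched in NumPy as below. This is a minimal sketch, assuming 8-bit greyscale images, a 3 x 3 neighborhood with edge padding, and the classic cumulative-histogram form of equalization; the kernel size, padding mode, order of application, and all function names are assumptions rather than the settings used in this study.

```python
import numpy as np

def _windows(img, k):
    """Return every k x k neighborhood of img (edge-padded), shape (H, W, k*k); k odd."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.stack(
        [padded[i:i + h, j:j + w] for i in range(k) for j in range(k)], axis=-1
    )

def median_filter(img, k=3):
    # Replace each pixel with the median of its k x k neighborhood.
    return np.median(_windows(img, k), axis=-1).astype(img.dtype)

def average_filter(img, k=3):
    # Replace each pixel with the mean of its k x k neighborhood, the pixel included.
    return np.mean(_windows(img, k), axis=-1).round().astype(img.dtype)

def histogram_equalization(img):
    # Map each grey level through the normalized cumulative histogram (uint8 input).
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                  # count of the darkest level present
    denom = max(cdf[-1] - cdf_min, 1)          # avoid division by zero on flat images
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[img]

# Usage: a single bright "salt" pixel on a flat background.
img = np.array([[10, 10, 10], [10, 200, 10], [10, 10, 10]], dtype=np.uint8)
print(median_filter(img))    # the isolated bright pixel is removed
print(average_filter(img))   # the bright pixel is spread over its neighbors
enhanced = histogram_equalization(average_filter(median_filter(img)))
```

The median filter removes isolated outliers while preserving edges, whereas the average filter blurs them into the neighborhood, which is why the two are often used together before contrast enhancement.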

Feature extraction
Various features have been widely used in feature extraction and selection. The Histogram of Oriented Gradients (HOG) summarizes the distribution of gradient orientations within local image regions and is particularly helpful for describing the shape of deformable structures. The histogram can be computed efficiently [9].
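The HOG computation described above can be sketched in NumPy as below. This is a simplified sketch, not the configuration used in this study: the cell size (8 pixels), 9 unsigned orientation bins, and 2 x 2-cell blocks with L2 normalization are assumed values commonly used with HOG, and votes are not interpolated between neighboring bins. In practice a library routine such as `skimage.feature.hog` from scikit-image could be used instead; the name `hog_features` is hypothetical.

```python
import numpy as np

def hog_features(img, cell=8, bins=9, block=2, eps=1e-6):
    """Simplified HOG descriptor for a 2-D greyscale image (no bin interpolation)."""
    img = img.astype(np.float64)
    # Central-difference gradients; border pixels are left at zero.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((orientation / (180.0 / bins)).astype(int), bins - 1)

    # One orientation histogram per cell; each pixel votes with its gradient magnitude.
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=bins)

    # L2-normalize overlapping block x block groups of cells and concatenate them.
    feats = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats) if feats else np.zeros(0)

# Usage: a 64 x 64 image gives 7 x 7 blocks of 2 x 2 cells x 9 bins = 1764 values.
features = hog_features(np.random.default_rng(0).random((64, 64)))
print(features.shape)  # (1764,)
```

Block normalization makes the descriptor less sensitive to local changes in illumination and contrast, which matters for X-ray images whose grey levels vary between scans.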

Machine learning
Machine learning is a multidisciplinary field supported by a diverse set of scientific domains. It is closely linked to computational statistics, which focuses on making predictions using computers [10]. It is also linked to mathematical optimization, a branch of applied mathematics that supplies methods, theory, and application domains to the field. Machine learning is used in many computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible [11][12][13].

Support Vector Machine (SVM)
The SVM classifies an entity based on its extracted features. Features are first extracted and then passed to the SVM module, which assigns each sample to the correct class. The SVM separates the classes with a hyperplane, chosen to maximize the margin between the classes, and assigns each sample to a class according to the side of the hyperplane on which it falls. The number and type of features must be chosen carefully so that each feature contributes useful information and detection is effective. SVM generally provides reliable performance; its main limitation is that classification takes longer than with simpler, weaker classifiers. An SVM library is used to build the module [14].
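The hyperplane-based classification described above can be sketched with a linear soft-margin SVM trained by the Pegasos stochastic subgradient method. This is a from-scratch sketch, not the SVM library used in this study, whose kernel and parameters are not stated here: the linear kernel, regularization strength `lam`, number of epochs, and the labels -1/+1 (for example, normal/COVID-19) are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def _augment(X):
    # Append a constant 1 feature so the last weight acts as the bias term.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM trained with Pegasos-style stochastic subgradient descent.

    X: (n, d) feature matrix; y: labels in {-1, +1}. Returns the augmented weight vector.
    """
    Xa = _augment(np.asarray(X, dtype=float))
    rng = np.random.default_rng(seed)
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            violates = y[i] * (Xa[i] @ w) < 1     # inside the margin or misclassified?
            w *= 1.0 - eta * lam                  # step on the L2 regularizer
            if violates:
                w += eta * y[i] * Xa[i]           # step on the hinge loss
    return w

def predict_svm(X, w):
    # The predicted class is the side of the hyperplane w . [x, 1] = 0 the sample falls on.
    return np.where(_augment(np.asarray(X, dtype=float)) @ w >= 0, 1, -1)

# Usage on two well-separated synthetic clusters of 2-D feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = train_linear_svm(X, y)
print("training accuracy:", (predict_svm(X, w) == y).mean())
```

Only samples that violate the margin move the weights, which is the subgradient counterpart of the support vectors defining the hyperplane.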

K-Nearest Neighbor (KNN)
The K-Nearest Neighbor (KNN) algorithm classifies an object according to the objects nearest to it. It classifies a new sample whose class is unknown by selecting the k training samples closest to it. The most frequent class among these k neighbors is assigned as the predicted class. In general, k is chosen to be odd so that a vote between two classes cannot end in a tie. The distance between samples is measured with the Euclidean distance [15][16][5].
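The KNN procedure described above (Euclidean distance, k nearest training samples, majority vote) can be sketched as follows. This is a minimal sketch; the value k = 3, the class names, and the function name `knn_predict` are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_new, k=3):
    """Predict labels for X_new by majority vote among the k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    preds = []
    for x in np.asarray(X_new, dtype=float):
        # Euclidean distance from x to every training sample.
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]                 # indices of the k closest samples
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])        # most frequent class among neighbors
    return np.array(preds)

# Usage with hypothetical 2-D feature vectors and labels.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array(["normal", "normal", "normal", "covid", "covid", "covid"])
result = knn_predict(X_train, y_train, [[0.5, 0.5], [5.5, 5.5]], k=3)
print(result)  # ['normal' 'covid']
```

KNN has no training phase beyond storing the samples, so its cost is paid at prediction time, when distances to every training sample are computed.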

Performance Evaluation
In machine learning, the confusion matrix is widely used to assess a classification model's performance. In a classification problem, the correct and incorrect predictions are tallied against the reference (ground-truth) labels. Accuracy, Precision, Recall, Specificity, and F1-score are among the most popular metrics. They are computed from four counts in the confusion matrix: true positive (TP), true negative (TN), false positive (FP), and false negative (FN), as shown in equations (2) to (5).
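Written out, the standard definitions of the five metrics in terms of TP, TN, FP, and FN are given below; F1 is the harmonic mean of precision and recall, and recall is also called sensitivity.

```latex
\begin{aligned}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision}   &= \frac{TP}{TP + FP} \\
\text{Recall}      &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{F1}          &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```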

Results and Discussion
In this research, a chest X-ray dataset was used to distinguish COVID-19-infected patients from normal patients. Two machine learning algorithms, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN), were trained and validated on the chest X-ray images. The SVM algorithm obtained 98 percent accuracy, 97 percent precision, 94 percent recall, 94 percent specificity, and a 98 percent F1-score. The confusion matrices for COVID-19-infected and normal patients obtained with the two machine learning techniques are shown in Table 1 and Table 2. In its confusion matrix, the SVM correctly identified 269 COVID-19 images as true positives and 139 normal images as true negatives, achieving 98 percent accuracy. The KNN achieved 94 percent precision, with 250 true-positive COVID-19 images and 152 true-negative images. The confusion matrices of the two classifiers are plotted in Figures 1 and 2, and both a standard and a normalized confusion matrix were constructed for SVM and KNN. Overall, SVM gives the best performance of the machine learning models, with 97 percent precision, 100 percent recall, 94 percent specificity, and a 99 percent F1-score, compared with KNN, which achieves 86 percent precision, 87 percent recall, 73 percent specificity, and an 86 percent F1-score, as shown in Figure 3.
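The metrics and the normalized confusion matrix discussed above can be computed from raw counts as sketched below. The counts in the example are hypothetical and chosen only for illustration; they are not the values behind Tables 1 and 2. Row normalization (each true-class row sums to 1) is assumed, since the section does not state which normalization was used, and the function names are hypothetical.

```python
import numpy as np

def metrics_from_counts(tp, tn, fp, fn):
    """Compute the five metrics from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

def normalize_confusion(cm):
    # Row-normalize: each row (true class) sums to 1, giving per-class rates.
    cm = np.asarray(cm, dtype=float)
    return cm / cm.sum(axis=1, keepdims=True)

# Hypothetical counts for illustration only (not the values reported in this study).
# Rows = true class (COVID-19, normal); columns = predicted class (COVID-19, normal).
cm = np.array([[90, 10],
               [5, 95]])
tp, fn, fp, tn = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
m = metrics_from_counts(tp, tn, fp, fn)
print(m)                       # accuracy 0.925, precision ~0.947, recall 0.9, ...
print(normalize_confusion(cm))
```

In the row-normalized matrix, the diagonal entries are the recall of the COVID-19 class and the specificity (the recall of the normal class).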

Conclusion
Together with other measures, early COVID-19 prediction might have helped to end the epidemic. This research applied machine learning techniques to chest X-rays to distinguish COVID-19-infected patients from normal chest X-ray images. The comparison of the techniques shows that SVM reaches the highest precision, 97%, compared with 86% for KNN. In future work, other classification methods, such as Random Forest or Naïve Bayes, could be compared and further feature extraction techniques explored to improve performance and support decision-making in clinical practice.