xCovNet: A wide deep learning model for CXR-based COVID-19 detection

The high death toll and economic impact of the COVID-19 pandemic emphasize the need for effective population screening technologies. The high cost, limited availability, and slow nature of CT scans and PCR-based tests render them impractical for frequent use among the general public. As chest X-ray (CXR) imaging is fast and economical, a high-accuracy CXR-based test would be well-suited for such screenings. Deep learning algorithms are widely used to aid medical image diagnoses. We use a collection of state-of-the-art pre-trained deep neural network models with additional layers to detect COVID-19 cases from a sample of healthy, COVID-19, and pneumonia patients. We observed that models trained with concatenated features of multiple pre-trained deep learning architectures outperform the individual and ensemble models. Our final model obtained a recall of greater than 98% and a precision of greater than 95% on two separate datasets. The wide architecture of xCovNet may contribute to its robust behavior, providing a systematic approach to construct a reliable deep learning model for emerging datasets.


Introduction
COVID-19 is a highly contagious infection caused by the novel SARS-CoV-2 coronavirus. Although RT-PCR tests are considered the gold standard, their major drawbacks are the cost and time involved. While many airlines require negative PCR-based COVID-19 tests for international flights, it can be extremely challenging for passengers to find testing centers before their journey. Moreover, this policy cannot guard against passengers who are infected between the time of their COVID-19 test and their flight. Chest CT scans have shown superior performance to RT-PCR [1], but they are impractical for general population screening because they are not portable, and their exorbitant cost is a major concern for many developing countries, making them infeasible as a global screening tool.
Chest X-rays can be rapidly acquired and are cost-effective, and CXR-based detection can reach very high precision and sensitivity, so they can be an effective solution for population screening. Furthermore, automatic detection of potential patients would greatly reduce human error. Some of the best-performing X-ray-based COVID-19 detection algorithms are the COVID-Net model [2], with a COVID-19 detection sensitivity of 80% and precision of 88.9%, and the model proposed by Oh et al. [3], with a sensitivity of 100% and precision of 76.9%.
To overcome the challenge of the limited size of COVID-19 CXR datasets, we use a collection of state-of-the-art pre-trained deep neural network models with additional layers to detect COVID-19 cases from a sample of a) healthy people, b) COVID-19 patients, and c) pneumonia patients. We observed that models trained on concatenated features of multiple pre-trained deep learning architectures outperform the individual and ensemble models as well as the ones discussed above: the COVID-Net model [2] and the model proposed by Oh et al. [3].
Oh et al. [3] achieved perfect sensitivity for COVID-19 classification on datasets similar to those of Wang et al. [2], but this high sensitivity comes at the expense of a high false-positive rate. Tested on a dataset similar to that of Oh et al., our model (xCovNet) obtained a sensitivity of 98%, which is slightly lower than that of Oh et al. (100%), while its precision of 95% is much higher than that of Oh et al. (76.9%). The COVID-19 class F1 score for xCovNet is 96%, much higher than the F1 score of the model of Oh et al. (86.9%).
We used a global approach, as opposed to a patch-based approach such as that of [3], as patch-based approaches may miss relevant features. Since the training datasets are small, pre-trained deep learning models with additional layers as well as data augmentation (rotation up to 15 degrees) were used to train the classifiers. A callback function was used to prevent overfitting. The model classifies a chest X-ray into 3 categories: Normal, Pneumonia, and COVID-19. The first two classes may be combined as Non-COVID-19 cases for binary classification into Non-COVID-19 and COVID-19. The proposed model achieved significantly higher precision and sensitivity on a comparable test dataset for detection of COVID-19, compared to the models proposed by Oh et al. [3] and Wang et al. [2].
After resizing, the images were normalized by dividing the individual pixel intensities by 255 so that all the pixels in an image had a value between 0 and 1. The X-ray images comprise three classes: Normal, COVID-19, and Pneumonia. DenseNet201, MobileNetV2, ResNet152V2, and Xception were chosen as the backbone models. The topological depth, number of parameters, top-1 accuracy, and top-5 accuracy of these models can be found in the Keras documentation [4]. Top-1 accuracy and top-5 accuracy refer to the performance of the model on the ImageNet dataset [5].
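As a minimal illustration of the preprocessing described above, the following sketch scales 8-bit pixel intensities to [0, 1] and draws a rotation angle for augmentation. The function names and the tiny 2x2 image are hypothetical, and sampling the angle uniformly from a symmetric range of ±15 degrees is an assumption; the text only states that rotations of up to 15 degrees were used.

```python
import random


def normalize(image):
    """Scale 8-bit pixel intensities (0-255) to the range [0, 1]."""
    return [[pixel / 255 for pixel in row] for row in image]


def sample_rotation(max_degrees=15, rng=random):
    """Draw a random rotation angle in degrees.

    Assumption: the angle is uniform in [-max_degrees, +max_degrees].
    """
    return rng.uniform(-max_degrees, max_degrees)


# Tiny 2x2 grayscale stand-in for a 224x224 chest X-ray (hypothetical data).
image = [[0, 51], [204, 255]]
scaled = normalize(image)
angle = sample_rotation(rng=random.Random(0))  # seeded for reproducibility
print(scaled)
print(f"rotation angle: {angle:.2f} degrees")
```

In practice the rotation itself would be applied by an image library or by the training framework's augmentation utilities; only the parameter sampling is sketched here.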

Model Architecture
Our proposed xCovNet model concatenates the outputs of modified versions of MobileNetV2, ResNet152V2, and Xception. These models act as feature extractors, each extracting a slightly different set of features. Our pipeline illustrates that combining these features improves the accuracy of the model when compared with standalone models. Each of the predefined models was imported with ImageNet weight initialization, except for the topmost classification block. The extracted features were concatenated in a single layer, and the concatenated layer was connected to the output layer, which has a Softmax activation function and three nodes, one for each class. Cross-entropy loss was used for training. Test and validation images were assigned to the class with the largest predicted probability. The validation set was used for tuning the hyperparameters as well as for model selection.
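The concatenation and output stage described above can be sketched in plain Python. This is only an illustration under stated assumptions: the feature vectors, weights, and biases are hypothetical toy values (real backbone features have many thousands of entries), the class order is arbitrary, and no additional dense layers are included.

```python
import math

CLASSES = ["Normal", "Pneumonia", "COVID-19"]  # class order is an assumption


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross-entropy loss for a single example with a one-hot label."""
    return -math.log(probs[true_index])


def classify(feature_sets, weights, biases):
    """Concatenate backbone features, apply a dense Softmax layer, take argmax."""
    features = [x for fs in feature_sets for x in fs]  # concatenation step
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return probs, predicted


# Hypothetical toy features from three backbones (2 + 1 + 2 = 5 values).
feature_sets = [[0.2, 0.1], [0.7], [0.4, 0.9]]
# Hypothetical output-layer weights: one row of 5 weights per class.
weights = [[1.0, 0.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0, 1.0]]
biases = [0.0, 0.0, 0.0]

probs, predicted = classify(feature_sets, weights, biases)
loss = cross_entropy(probs, CLASSES.index("COVID-19"))
print(CLASSES[predicted], [round(p, 3) for p in probs], round(loss, 3))
```

The point of the sketch is that the output layer sees one wide feature vector, so features from every backbone can contribute to each class score.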
Rectified linear unit (ReLU) activation was used in all the additional dense layers. Since the dataset has more than two classes, Softmax activation was used in the last layer to obtain class probabilities. The Adam optimizer was used for training, with an initial learning rate of 0.001, a learning rate decay factor of 0.2, and a patience factor of 6. That is, whenever the validation loss did not improve for 6 consecutive epochs, the learning rate was multiplied by 0.2, until it reached a value of 0.0001. Hyperparameter tuning was performed using a grid search; the weights for each of these layers were determined through this grid search on the validation set. To avoid overfitting, a Keras callback function with early stopping was used to terminate the training if the performance did not improve for 6 epochs. The model achieved the highest validation accuracy at the 6th epoch. Afterward, the model started to overfit the training data and the validation accuracy started to decrease. Training was stopped early at the 12th epoch and the weights were restored from the 6th epoch.
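The two training controls described above can be illustrated with a small simulation. This is a simplified re-implementation of the general logic, not the Keras callbacks themselves; the validation curves are hypothetical, and treating 0.0001 as a floor on the learning rate is an assumption about how the schedule was configured.

```python
def reduce_lr_on_plateau(val_losses, lr=1e-3, factor=0.2, patience=6, min_lr=1e-4):
    """Return the learning rate used in each epoch.

    The rate is multiplied by `factor` whenever the validation loss has not
    improved for `patience` consecutive epochs, and never drops below `min_lr`.
    """
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return history


def early_stopping(val_scores, patience=6):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    stop_epoch is None if training is never stopped early. The weights from
    best_epoch are the ones that would be restored.
    """
    best_score = float("-inf")
    best_epoch = None
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, None


# Hypothetical validation loss that stops improving after epoch 2.
lr_history = reduce_lr_on_plateau([1.0, 0.9] + [0.95] * 13)

# Hypothetical validation accuracy that peaks at epoch 6.
val_acc = [0.80, 0.85, 0.88, 0.90, 0.92, 0.95,
           0.94, 0.94, 0.93, 0.93, 0.92, 0.92, 0.91]
best_epoch, stop_epoch = early_stopping(val_acc)
print(lr_history)
print(best_epoch, stop_epoch)
```

With a patience of 6 and a peak at epoch 6, the simulated run stops at epoch 12 and restores the epoch-6 weights, which matches the behavior reported in the text.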

Datasets
Three different open-source datasets were used for model training and prediction across three classes: Normal, COVID-19, and Pneumonia (comprising bacterial and viral pneumonia).
The COVID-19 chest X-ray images were taken from the open image database created by Joseph Paul Cohen at the University of Montreal [6]. The chest CT scan images and lateral X-ray images were deleted from the dataset, and only frontal chest X-ray images were retained. The dataset also contained a few chest X-ray images of people affected by SARS, MERS, and ARDS, all of which were removed. Consequently, the final COVID-19 dataset consisted of 275 frontal chest X-ray images of patients suffering from COVID-19.
The X-ray images for the Normal and Pneumonia classes were taken from the open-source Kaggle dataset of 5,863 images collected by Paul Mooney [7]. From this dataset, 275 images were randomly selected for each of the Normal and Pneumonia categories to prevent class imbalance. The chest X-ray data repository used in this study therefore consists of 825 images, with 275 images in each class. The dataset was further randomly split into train, validation, and test sets using a 70/10/20 split ratio. Table 1 provides a summary of the data split.
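A random 70/10/20 split of the 825 images can be sketched as follows. The image identifiers, the seed, and the rounding rule (truncating the train and validation sizes and assigning the remainder to the test set) are assumptions made for illustration; the exact procedure and resulting counts used in the study are those summarized in Table 1.

```python
import random


def split_dataset(items, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded shuffle for reproducibility
    n_train = int(fractions[0] * len(items))
    n_val = int(fractions[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical identifiers: 275 images for each of the three classes.
image_ids = [f"{label}_{i:03d}"
             for label in ("normal", "pneumonia", "covid19")
             for i in range(275)]

train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))
```

A stratified split, which samples each class separately, would additionally keep the three classes balanced within every subset; whether the study stratified the split is not stated.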

Implementation Challenges, Limitations, and Proposed Solutions
The major challenges and limitations of this work are as follows: (i) Limited training data: Although the COVID-19 pandemic has already infected more than 65 million individuals, its pressure on healthcare providers around the globe has limited the capacity to de-identify and release X-ray images for research purposes.
(ii) Architecture optimization: The small size of publicly available datasets impedes reliable architecture optimization, since fully training each deep learning model risks overfitting to the dataset. Hence, pre-trained models were used, and the extracted feature sets of each network were then concatenated.

Initial Architecture Assessment
In line with the above discussion, to achieve high accuracy and reduced training time, 4 standalone pre-trained models were selected for further evaluation: DenseNet201 (D201), MobileNetV2 (MNV2), ResNet152V2 (R15V2), and Xception (XPN). These models achieve high prediction accuracy on the ImageNet dataset, suggesting they are efficient feature extractors. However, it is not clear that a single architecture succeeds at extracting all relevant features. Therefore, we combine the extracted features of multiple models to obtain a wide supermodel that has higher predictive power than any of the individual models. To perform transfer learning, the top layer was removed and all of the other layers were kept frozen with the pre-trained ImageNet-based weights. The final flatten layers were concatenated before being connected to the output layer for classification. We examined different combinations of these models.

Performance Metrics
High predictive accuracy alone is not enough to ensure high performance and reliability of a model. Precision and sensitivity are among the metrics that are used within the medical community and are standard tools to assess the performance of a classifier. For the COVID-19 class, which is the main focus of this research, precision, sensitivity, and F1 score were calculated based on the predictions on the test set, in addition to the overall accuracy.
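For reference, the standard definitions of these metrics for the COVID-19 class are given below, where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives for that class. The two worked lines simply recompute the F1 scores from the precision and sensitivity values quoted in the Introduction, as a consistency check.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}
           {\text{Precision} + \text{Sensitivity}}
\]
\[
F_1^{\text{Oh et al.}} = \frac{2 \times 0.769 \times 1.00}{0.769 + 1.00} \approx 0.869
\]
\[
F_1^{\text{xCovNet}} = \frac{2 \times 0.95 \times 0.98}{0.95 + 0.98} \approx 0.965
\]
```

Both values agree with the reported F1 scores of 86.9% and 96% up to rounding of the inputs.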

Experimentation with different models
Initially, the four modified standalone models with ImageNet weights were fine-tuned and evaluated on the validation dataset to obtain overall accuracy. As Table 2 lists, D201 secured the highest accuracy, followed by R15V2 and XPN.
We concatenated the extracted features of the top-performing models to create wider models with a more diverse set of features. The deep learning feature concatenation pipeline is provided in Figure 1. Intuitively, such a strategy improves deep feature representation: features captured by the individual models are encoded in the learned representation and can collectively contribute to the prediction. We considered all architectures consisting of two or three backbone models. Performance metrics of these models are provided in Table 2. DenseNet201, MobileNetV2, ResNet152V2, and Xception are abbreviated as D, M, R, and X, respectively. A combined model is named by concatenating the initials of its backbone architectures. For instance, RMX is the model composed of ResNet152V2, MobileNetV2, and Xception. A list of these combined models and their corresponding metrics on the validation set is provided in Table 3.
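The set of candidate architectures described above can be enumerated with a short sketch. The ordering of the initials (D, R, M, X) is an assumption chosen so that the name RMX from the text is reproduced; the names used for the other combinations in Table 3 may differ.

```python
from itertools import combinations

# Initials as defined in the text; the ordering here is an assumption.
BACKBONES = {
    "D": "DenseNet201",
    "R": "ResNet152V2",
    "M": "MobileNetV2",
    "X": "Xception",
}


def combined_models(initials, sizes=(2, 3)):
    """List the names of all combinations of two or three backbones."""
    return ["".join(combo)
            for size in sizes
            for combo in combinations(initials, size)]


names = combined_models(list(BACKBONES))
for name in names:
    print(name, "=", " + ".join(BACKBONES[initial] for initial in name))
```

With four backbones this yields six pairs and four triples, ten candidate models in total.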

Model Analysis
Given the small differences in the performance metrics of the models in Table 3, we used validation loss as a more stable metric to select the final model. Note that models with higher overall accuracy tend to have higher COVID-19 F1 scores as well (see Table 3). Since RMX has the lowest validation loss and outperforms the other models, we hereafter focus on this architecture and rebrand it as xCovNet. We benchmark our proposed xCovNet against other state-of-the-art models to further evaluate its performance.
The confusion matrices of xCovNet predictions on the test set are provided in Figure 2. Figure 2 describes a 3-way classifier on validation and test sets, respectively, depicting the generalizability of the trained classifiers. Given the focus on COVID-19 detection, the confusion matrix for binary classification on test data is also provided in Figure 2, where we have collapsed the Normal and Pneumonia classes together. The confusion matrices of Figure 2 (a) correspond to F1 scores of 99% for binary classification.

Validation on External Data
To assess the generalizability of xCovNet, we tested its performance on an independent dataset. Two new datasets were combined to construct an external test set for this evaluation.
The combined dataset has a total of 525 images, with 175 images for each of the 3 classes: Normal, Pneumonia, and COVID-19. The COVID-19 chest X-ray images were taken from the open image database at the IEEE DataPort, Sheet, et al. [8], submitted by Rakshith Sathish as part of the COVID-19 Action Group on AI and Radiology. The chest X-ray images in this dataset have been prepared by combining images from several open-source databases. This dataset consists of multiple classes, of which only a sample of 175 COVID-19 images was selected. Additional X-ray images for the Normal and Pneumonia classes were taken from the open-source Mendeley dataset Version 3 [9]. The dataset has over 110,000 labeled images from multiple diseases. 175 images were randomly selected from each of the Normal and Pneumonia classes (see Table 1 for details).

Discussion
xCovNet is an efficient model for analyzing X-ray images to flag high-risk individuals. It shows outstanding performance on the test and external validation datasets. Concatenating the feature spaces of several deep learning models pre-trained on ImageNet data allows for efficient detection of morphological features present in the image while keeping the number of trainable model parameters sufficiently small to avoid overfitting. This is especially desirable for analyzing COVID-19 images, as publicly available datasets are not yet large enough to fully train CNNs. Not only are deep learning models prone to overfitting, but they may also fail by correlating dataset-specific artifacts with class labels, which may result in large performance drops on new datasets. xCovNet achieved high accuracy on an external test set, suggesting it is using a robust set of features for its predictions.

Conclusion
Chest X-ray is a fast and economical test for detecting potential COVID-19 patients, which reduces the caseload of the healthcare workforce. The proposed xCovNet model is an efficient tool to automate X-ray-based screenings. It efficiently filters large population groups and outputs a shortlist of potentially infected individuals. The transfer learning-based approach allows for effective training when the lack of large datasets prevents us from fully training a model from scratch. External validation illustrates that xCovNet can also differentiate between healthy, COVID-19, and pneumonia images, from which the next steps of a screening protocol can benefit. xCovNet consistently achieved a precision of greater than 95% and a sensitivity of greater than 98% when tested on multiple independent COVID-19 datasets collected by different labs in different countries. These numbers support confidence in the use of this technology as a first-phase filtration mechanism at a large scale.

Figure 1. Overall and detailed representations of the proposed xCovNet model.

Figure 1 shows an overview of the proposed xCovNet model layout. The CXR images in each dataset were resized to 224 by 224 pixels, which is in line with the standard practices for image classification techniques.

Figure 2 (b) provides the confusion matrices of xCovNet predictions on the external dataset. xCovNet preserves its high prediction accuracy for all classes. In particular, out of the 175 COVID-19 images, only 1 was mislabeled as Normal and only 3 were mislabeled as Pneumonia. The model has slightly lower accuracy on Pneumonia images, mislabeling 6 as COVID-19 and 11 as Normal. Finally, 9 Normal images were mislabeled, of which only 3 were mislabeled as COVID-19. As Figure 2 (b) suggests, for binary classification, xCovNet is an effective model for population screening, i.e., filtering a large pool of individuals to create a shortlist of high-risk potential patients (see Table 4).
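The counts quoted above are enough to rebuild the external-dataset confusion matrix and recompute the COVID-19 metrics. In the sketch below, the diagonal entries are derived from the 175 images per class, and the 6 Normal images assigned to Pneumonia are inferred from "9 mislabeled, of which 3 as COVID-19"; the class order and function name are illustrative choices.

```python
CLASSES = ["Normal", "Pneumonia", "COVID-19"]

# Rows are true classes, columns are predicted classes (175 images per row).
confusion = [
    [166, 6, 3],     # Normal: 9 mislabeled, 3 of them as COVID-19
    [11, 158, 6],    # Pneumonia: 11 as Normal, 6 as COVID-19
    [1, 3, 171],     # COVID-19: 1 as Normal, 3 as Pneumonia
]


def class_metrics(matrix, k):
    """Return (precision, sensitivity, F1) for class index k."""
    tp = matrix[k][k]
    fp = sum(row[k] for row in matrix) - tp  # other classes predicted as k
    fn = sum(matrix[k]) - tp                 # class k predicted as others
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f1


covid = CLASSES.index("COVID-19")
precision, sensitivity, f1 = class_metrics(confusion, covid)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(CLASSES))) / total
print(f"precision={precision:.3f} sensitivity={sensitivity:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

Under these counts the COVID-19 precision is 171/180 = 0.95 and the sensitivity is 171/175, or about 0.977. Collapsing Normal and Pneumonia into a single Non-COVID-19 class leaves the COVID-19 true positives, false positives, and false negatives unchanged, so the same precision and sensitivity apply to the binary setting.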

Table 1. Dataset summary for experiment.

Table 2. Overall accuracy for standalone models on the validation set.