CNN-based Clinical Diagnosis and Decision Support System for Chest X-ray

Chest radiographs are widely used as a first-line diagnostic tool for various chest diseases. However, correctly interpreting the information in a radiograph is a major challenge. We aim to develop a suitable convolutional neural network (CNN) framework for the automatic detection of disease in chest radiographs. We have designed a 15-layer CNN architecture with three output classes: Covid, Normal, and Pneumonia. This architecture was trained with 10,011 images. To avoid the risk of over-fitting, data augmentation techniques were applied to the training samples, and dropout layers were included in the CNN architecture. During training, we obtained 98.24% accuracy on the training dataset. On the test dataset, the experimental results indicate a classification accuracy of 95.8%, precision of 95.83%, recall of 95.77%, and f1-score of 95.79% on average, which compares favorably with several existing research works.


Introduction
In medical practice, chest radiography is the most widely used diagnostic tool due to its essential clinical value in the diagnosis of lung diseases such as tuberculosis, pneumonia, interstitial lung diseases, early lung cancer, and many more [1]. It is estimated that over 2 billion chest radiograph procedures are performed each year [2]. The worldwide acceptance of chest radiography stems from its lower cost, ease of acquisition, accessibility, portability, and relatively low radiation dose [3]. However, the correct interpretation of the information in chest radiographs remains a major challenge even for experienced radiologists and other clinicians. A chest radiograph is a 2D representation of a 3D anatomical structure. As the X-rays pass through the body, they are absorbed by multiple anatomical structures, producing different pixel values in the resulting radiograph. Additionally, different anatomic structures can overlap each other in a single 2D image, numerous pathological and physiological changes can appear identical, and a single pathology may display various features. Thus, the reported error rates for chest radiograph interpretation have remained high for decades. It is reported that the interpretation of chest radiographs accounts for more than 22% of all diagnostic radiological errors [4]. At the same time, interpreting chest radiographs demands a radiologist's expertise, time, and energy. However, there are not enough qualified radiologists to adequately address the growth in examinations, which results in a greater workload for radiologists [5]. Thus, improving diagnostic accuracy and minimizing error rates in chest radiograph interpretation has become a hot research topic, especially with the rapid development of artificial intelligence (AI) in medical applications. Recent developments in the deep learning field, especially CNNs, and the availability of large volumes of data have shown great success in
the automatic diagnosis of diseases in chest radiographs. A CNN is an excellent deep learning algorithm for image classification because of its strong ability to extract features and classify images even when trained from scratch. Some research works have dealt with radiography-based disease detection using CNNs [6][7]. These works showed that CNN systems can outperform humans in various cases of disease detection and help make decisions on chest radiographs. However, the authors of these works usually dealt with two-class classification problems and mostly worked on only one dataset with a small number of samples. Also, most of them make use of pre-trained models such as LeNet, AlexNet, Inception, VGGNet, etc. But these models are huge, have millions of trainable parameters, need a lot of processing power, and are time-consuming [8]. In contrast, this work builds a CNN architecture from scratch that can classify chest radiographs into three classes (two diseases and normal) on a combined dataset from multiple sources. Besides, the model performance is observed and contrasted with other related works in an attempt to implement an efficient decision support system for radiologists in medical practice. The rest of the paper is structured as follows: Section 2 presents materials and methods. The results and discussions are given in Sections 3 and 4. The work is concluded in Section 5.

Dataset
In this work, the dataset includes 11,826 frontal chest X-ray radiographs obtained from the Kaggle repositories corresponding to the chest X-ray and COVID-19_Radiography_Dataset datasets [9-10]. The images come in varying resolutions, from 299 × 299 to 1762 × 1535. Hence, these images are later reshaped to 256 × 256 to match the input layer of the proposed CNN architecture. There are a total of 3937 Normal, 3616 Covid, and 4273 Pneumonia images in the dataset. The dataset is randomly divided into two sets, a training set and a test set, with a composition of 85% and 15%, respectively. Some of the X-ray image samples are shown in Figure 1, while Table 1 represents the randomly selected distribution of the training and testing dataset. We train our neural network model using the training dataset, while the testing dataset is used to assess the performance of the trained model.
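The random 85/15 split can be sketched with a simple permutation of image indices (a plain-NumPy sketch; the exact shuffling procedure and seed used in this work are not specified, and the illustrative counts below differ slightly from the 10,011/1,815 split used in the experiments):

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.15, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]   # (train, test)

train_idx, test_idx = train_test_split_indices(11826)
print(len(train_idx), len(test_idx))  # → 10052 1774
```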

Convolution Layer
The convolution process takes place in this layer, where the CNN model extracts features from the input image. The convolution layer is also called the feature extraction layer, as it extracts features using convolution filters. As shown in Table 3, we have used five different convolution layers, each with a (3 × 3) kernel size and a different number of filters (N). The early convolutional layers in the network extract low-level features such as edges, while the deeper convolutional layers can detect complex features such as corners and objects. In comparison to normal feed-forward layers, convolution layers have far fewer parameters and use weight sharing, which reduces the computational effort [12]. When an input image is passed through a convolutional layer, each kernel is convolved across the width and height of the input image, giving a 2D feature map for that filter. Hence, in our architecture, 32 feature maps are created when the input images pass through the first convolutional layer. Similarly, corresponding feature maps are created according to the number of filters in the respective convolutional layers. The values within the filters are adjusted during the training process.
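As an illustration, producing one feature map from a single-channel 256 × 256 image with a 3 × 3 kernel can be sketched in plain NumPy (the kernel values here are an arbitrary vertical-edge filter, not weights from the trained model, and the padding scheme of the actual architecture is not specified):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' convolution (cross-correlation, as used in CNNs):
    slide the kernel over the image and take the elementwise product sum."""
    h, w = image.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

image = np.random.rand(256, 256)              # one grayscale input image
kernel = np.array([[1., 0., -1.],             # arbitrary 3x3 vertical-edge filter
                   [1., 0., -1.],
                   [1., 0., -1.]])
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # → (254, 254)
```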

Pooling Layer
In a CNN, the pooling layer is repeatedly used to down-sample the outputs of previous groups of neurons, which reduces the size of the representation, speeding up computation, helping to prevent overfitting, and reducing the memory requirement as well [13]. There are different types of pooling layers, including average pooling, max pooling, and sum pooling. Average pooling calculates the average of each patch of the feature map, max pooling selects the largest value in each patch, and sum pooling calculates the sum of all elements in each patch. Among them, max pooling is the most prevalent and frequently used. In our CNN architecture, we have used four max pooling layers, each placed after a convolutional layer. These max pooling layers each have a (2, 2) pool size and (2, 2) stride. By selecting the largest value from each (2, 2) input window for each channel, these layers down-sample the input along the spatial dimensions (height and width). The window is shifted by (2, 2) strides along each dimension. Thus, for an input of height H and width W, pool size p, and stride s, the spatial shape of the max pooling output is given in Equation (1):

H_out = ⌊(H − p) / s⌋ + 1,  W_out = ⌊(W − p) / s⌋ + 1 (1)
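A minimal NumPy sketch of (2, 2) max pooling with (2, 2) stride, whose output shape follows Equation (1):

```python
import numpy as np

def max_pool2d(x, pool=2, stride=2):
    """Max pooling over (pool, pool) windows; the output shape is
    floor((H - pool) / stride) + 1 by floor((W - pool) / stride) + 1."""
    h, w = x.shape
    out_h = (h - pool) // stride + 1
    out_w = (w - pool) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + pool,
                          j * stride:j * stride + pool].max()
    return out

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 3., 2.],
              [2., 0., 1., 4.]])
print(max_pool2d(x))  # → [[4. 5.]
                      #    [2. 4.]]
```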
Dropout Layer
The dropout layer is one of the techniques that helps reduce overfitting during model training. At each training step, this layer randomly sets a fraction of its input units to zero with a set rate, while the remaining inputs are scaled up by 1/(1 − rate) so that the sum over all inputs is unchanged [14]. In our CNN architecture, we have used two dropout layers with dropout rates of 25%. Thus, a random 25% of the inputs to these layers are dropped at each step during model training, which helps to avoid overfitting of our model.

Flatten Layer
The flatten layer is used to translate the multidimensional input into a single dimension. This is necessary before passing data into a fully connected dense layer. Thus, in our CNN architecture, we have used a flatten layer before the dense layers.
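For example, flattening amounts to a simple reshape (the feature-map volume below is made up for illustration, not taken from Table 3):

```python
import numpy as np

# Illustrative output volume from a final pooling layer:
# 14 x 14 activations across 64 channels (shape is an assumption).
feature_maps = np.random.rand(14, 14, 64)
flattened = feature_maps.reshape(-1)   # collapse to a single dimension
print(flattened.shape)  # → (12544,)
```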

Dense Layer
The dense layer is also known as the fully connected layer. The operation of the dense layer is represented in Equation (2):

output = activation(input · kernel + bias) (2)

Here, the kernel is a weights matrix created by the layer, activation is the element-wise activation function supplied as the activation argument, and bias is a bias vector created by the layer. In our CNN architecture, we have used two dense layers after the flatten layer, where the second dense layer performs the classification with three output classes for the Covid, Normal, and Pneumonia cases.
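Equation (2) can be sketched directly in NumPy (the vector sizes, weight values, and ReLU activation below are toy assumptions; the actual layer sizes are those in Table 3):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def dense(inputs, kernel, bias, activation=relu):
    """Fully connected layer, Equation (2): activation(inputs . kernel + bias)."""
    return activation(inputs @ kernel + bias)

inputs = np.array([1.0, 2.0, 0.5])   # toy flattened feature vector
kernel = np.full((3, 2), 0.5)        # weights matrix created by the layer (toy values)
bias = np.array([0.1, -1.0])         # bias vector created by the layer
print(dense(inputs, kernel, bias))   # → [1.85 0.75]
```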

Activation Function
As real-world problems are non-linear, activation functions are used to introduce non-linearity into the model. Various forms of activation functions have been used to transform data into a non-linear form.
In our case, we have used the rectified linear unit (ReLU) as the activation function in the hidden layers because ReLU speeds up the training process and is fast to compute [15]. We have also used the sigmoid function in the output layer, as it is suitable for producing a probability as output. These functions are expressed by Equations (3) and (4):

ReLU(x) = max(0, x) (3)

sigmoid(x) = 1 / (1 + e^(−x)) (4)
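Equations (3) and (4) translate directly into code, a minimal sketch:

```python
import numpy as np

def relu(x):
    """Equation (3): ReLU(x) = max(0, x)."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Equation (4): sigmoid(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # → [0. 0. 3.]
print(sigmoid(0.0))  # → 0.5
```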

Model Training
Our proposed model was trained with 10,011 labeled images. Before inputting the images into our model, the data augmentation techniques listed in Table 2 were applied, and the images were downscaled to 256 × 256 to match the input layer of our model. Table 4 displays the values of the hyperparameters used to train the CNN model. During the training phase, the outputs from the CNN and the labels are fed to the error function. The errors are backpropagated through the CNN to adjust its parameters, thus reducing the error. As our CNN model has three label classes, we have used sparse categorical cross-entropy as the loss function. During training, this loss function is minimized by an optimizer that adjusts the parameters of the CNN. We have employed the Adam optimizer, a stochastic gradient descent technique based on adaptive estimation of first-order and second-order moments. In addition, it is computationally efficient, invariant to diagonal rescaling of gradients, has a small memory requirement, and is well suited to problems that are large in terms of data and parameters [16]. We have trained our model for 100 epochs, which was sufficient for our model to obtain 98.24% accuracy on the training dataset. One epoch corresponds to all the training data being passed through the model once. We have implemented a model checkpoint to save the best model during the training process by monitoring accuracy at each epoch. Also, we have set early stopping to break off model training if there is no change in model accuracy of at least 0.01 for 20 epochs. Figure 3 represents the loss vs. accuracy graph of our CNN model over 100 epochs: the red curve represents the accuracy of our model, and the blue curve represents its loss. During this process, we obtained the highest accuracy value of 0.9824 at a loss value of 0.0663 during epoch 80. Thus, the parameters of the CNN model for this accuracy and loss value were saved as the best model, which was later loaded for further evaluation.
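The checkpoint and early-stopping behaviour can be sketched as a plain-Python bookkeeping loop (a sketch only; the actual training would use framework callbacks, and the `history` values below are made-up per-epoch accuracies):

```python
def monitor_training(accuracies, min_delta=0.01, patience=20):
    """Track the best epoch (model checkpoint) and stop early when accuracy
    has not improved by at least min_delta for `patience` consecutive epochs."""
    best_acc, best_epoch, wait = float("-inf"), -1, 0
    for epoch, acc in enumerate(accuracies):
        if acc > best_acc + min_delta:
            best_acc, best_epoch, wait = acc, epoch, 0   # "save" this checkpoint
        else:
            wait += 1
            if wait >= patience:
                break                                    # early stopping
    return best_epoch, best_acc

history = [0.60, 0.75, 0.80] + [0.805] * 25   # made-up accuracy curve
print(monitor_training(history))  # → (2, 0.8)
```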

Qualitative Evaluation
Once model training was completed, the best model with the highest accuracy was loaded to assess the performance of our model in classifying the input image into three classes, namely Covid, Normal, and Pneumonia. Four common evaluation metrics (accuracy, recall, precision, and f1-score) were employed to assess our model. To define these metrics, the following terms are defined first.

True Positive (TP)
It refers to an object belonging to the positive class that is also predicted as positive by the model.

False Positive (FP)
It refers to an object belonging to the negative class that is, however, predicted as positive by the model.

True Negative (TN)
It refers to an object belonging to the negative class that is also predicted as negative by the model.

False Negative (FN)
It refers to an object belonging to the positive class that is, however, predicted as negative by the model. In terms of the above quantities, the four evaluation metrics are represented by Equations (5), (6), (7), and (8):

Accuracy = (TP + TN) / (TP + TN + FP + FN) (5)

Precision = TP / (TP + FP) (6)

Recall = TP / (TP + FN) (7)

F1-score = 2 × (Precision × Recall) / (Precision + Recall) (8)

In our case, we provided 1815 test images to our model for prediction into one of the three classes. The resulting evaluation metrics are shown in Table 5, while the confusion matrix is shown in Figure 4. The accuracy of the model provides an overall measure of the total number of correct predictions. If the dataset is imbalanced, accuracy alone cannot provide insight into the performance of the model. In such situations, the precision and recall values are used to assess the performance of the model. Precision gives the accuracy of the model over all positive-label predictions, while recall measures the correct positive predictions by the model over all ground-truth positives. In our case, we have achieved an accuracy of 95.8%, with a precision of 95.83%, a recall of 95.77%, and an f1-score of 95.79% on average.
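Equations (5)-(8) can be computed directly from a confusion matrix; the sketch below macro-averages over the three classes and uses made-up counts, not the matrix in Figure 4:

```python
import numpy as np

def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1-score from a
    confusion matrix with rows = true class, columns = predicted class."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp   # belonging to the class but missed
    accuracy = tp.sum() / cm.sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Made-up 3-class counts (Covid, Normal, Pneumonia), for illustration only
cm = [[50, 2, 3],
      [1, 45, 4],
      [2, 3, 40]]
acc, prec, rec, f1 = macro_metrics(cm)
print(round(acc, 3))  # → 0.9
```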

Comparison with Related Works
There is some existing research on automatic disease detection from chest radiographs using CNNs. We have compared our results with the results of these related works. The comparison is given in Table 6.
Most of these works are based on small datasets and address two-class classification problems. Also, many of them make use of a transfer learning approach, using various pre-trained models for feature extraction. However, transfer learning from a pre-trained model may involve negative transfer when the CNN is pre-trained on data dissimilar to chest radiographs. In this regard, we have developed our own custom CNN architecture from scratch that can classify input chest radiographs into three classes and is trained with a sufficient amount of data, giving satisfactory classification accuracy.

Discussion
The results presented in this research work demonstrate that a convolutional neural network can successfully detect diseases in chest radiographs at a level comparable to, or even higher than, practicing radiologists. We have successfully developed a model that can classify input chest radiographs into one of three classes, namely Covid, Normal, and Pneumonia. The system can also generate activation maps depicting discriminative areas, taken from the last convolution layer of our CNN architecture, for each predicted class. Figure 5 shows the original images and their corresponding activation maps from the last convolution layer of our CNN architecture for each case. Clinical integration of this system could contribute greatly to patient care by decreasing the time to diagnosis and increasing access to interpretation of chest radiographs. Although the results are encouraging and superior to other related works, there are still some limitations in our model. First, although we have combined datasets from two sources, the data are still not sufficient, as deep learning algorithms tend to work better with larger datasets. Second, a chest radiograph can be used as a diagnostic tool for a large number of lung diseases; however, we have only made a classification for three classes. The lack of datasets for other diseases limits the applicability of our model to a wider range of disease classifications. Third, only frontal radiographs were used to develop our model; however, accurate diagnoses require a lateral view as well.

Conclusion and Future Work
Chest radiography plays a vital role in the diagnosis of various lung diseases, and it is widely used as a first-line diagnostic tool worldwide. However, the correct interpretation of the information in a chest radiograph is a major challenge. In this regard, we have developed a CNN architecture from scratch to detect Normal, Covid-19, and Pneumonia cases from input chest radiographs. With this architecture, we achieved an average accuracy of over 95%, which is comparatively higher than that of other related works.
In the future, we aim to work on larger datasets with wider classification capabilities. We also plan to improve the classification accuracy by fine-tuning and optimizing the hyper-parameters.
In addition, we will be validating the performance of our system on actual medical chest radiographs collected from different hospitals and imaging centers to employ it as a decision support system for radiologists in clinical practice.

Figure 1. Chest radiographs: (a) Covid-19, (b) Normal, and (c) Pneumonia

Figure 3. Loss vs. accuracy graph during training of the model for 100 epochs

Figure 5. Model-predicted images with their corresponding heat-map activation images from the last convolutional layer: (a) Covid, (b) Normal, and (c) Pneumonia.

Table 2. Data augmentation techniques

In our research work, we have designed a CNN architecture that can classify input chest radiographs into three classes, namely Covid, Normal, and Pneumonia. The general block diagram of our proposed CNN model is shown in Figure 2. It is a 15-layer CNN architecture based on six basic components, namely the convolution layer, activation function, pooling layer, dropout layer, flatten layer, and dense layer. These elements are utilized in various layers of our proposed model and have their respective functionality in the CNN architecture. A summary of our proposed 15-layer CNN architecture is presented in Table 3, where there are a total of 3,609,187 trainable parameters.

Table 3. Summary of the CNN model with parameters for each layer

Figure 2. The general block diagram of the proposed CNN model

Table 4. Hyperparameters of the CNN model

Table 5. Experimental performance metrics results

Table 6. Comparison of our CNN model with other related works