VGG-based discrimination for pneumonia images enhanced by Generative Adversarial Network

Chest X-ray images often suffer from blurring and from overlapping shadows of the heart and lungs, which limits the accuracy of COVID-19 screening. This study uses a Generative Adversarial Network (GAN) to segment chest X-ray images and then feeds the segmented images into a classification algorithm to decide whether pneumonia is present. The dataset is labeled "normal" and "pneumonia". To remove the interference of the ribs and heart with the lung contour, the original dataset is segmented with GAN; the segmented images retain only the lung silhouette, which simplifies the subsequent recognition and feature extraction. The classification algorithm is Visual Geometry Group (VGG)-16. During training, only fully connected layers FC1 and FC2 are trained, and the remaining layers are frozen. When the model loads the original weights, the top of the model is replaced by a head model, which is placed on the base model and becomes the trainable part used to determine the best weights. After the original X-ray images are processed by GAN, classification accuracy on the segmented lung images improves from 84% to 98%, and the loss decreases from 17% to 3%. This shows that segmentation effectively removes irrelevant regions during recognition: with noise interference removed and the lung contour made clearer, accuracy improves substantially.


Introduction
According to the World Health Organization (WHO), since the outbreak of COVID-19 most patients (about 80%) may be asymptomatic, while about 20% may need hospitalization due to dyspnea. The key to fighting the epidemic is detecting virus carriers quickly. At present, novel coronavirus pneumonia is diagnosed mainly by detecting genetic material through a virological polymerase chain reaction test or by chest X-ray imaging [1]. Molecular test results take hours or even days, whereas an X-ray examination takes only a few minutes. Most doctors still rely on visual observation to determine whether patients are infected with the COVID-19 virus, but artificial intelligence (AI) recognition of X-ray images can assist doctors' decision-making. Recognition accuracy depends on the algorithm and on the quantity and quality of the datasets.
However, the main obstacles to chest X-ray image segmentation and recognition are image blurring, local artifacts, and cardiopulmonary overlap [2], which lead to a high error rate; a system whose accuracy falls short of requirements cannot be put into normal use. Since the development of medical imaging, recognition technologies beyond X-ray and various other imaging applications have emerged. For example, XRayCovid-19 of UFRRJ is a project under development that uses an AI-assisted health system to process COVID-19 cases during diagnosis [3]. In addition, in biomedical informatics, medical digital image transmission protocol technology has been developed to exchange and consult the generated digital image files. Common medical imaging technologies include angiography and cardiovascular angiography [4].
However, insufficient samples or novel presentations can sometimes yield false-positive or unidentifiable results that doctors must review, increasing their burden. In terms of the "nutrition" of AI products, there are also problems such as small data volumes, few dimensions, low quality, and "data islands". Meanwhile, the low concentration of the medical imaging equipment industry and the uneven quality of hospital equipment raise the requirements for the robustness and adaptability of medical AI products.
To reduce the influence of the dataset on recognition results, GAN can be used as a preprocessing step to optimize the dataset. A Generative Adversarial Network (GAN) is a deep learning model and one of the most promising unsupervised learning methods for complex distributions in recent years [5]. The model produces good output through adversarial learning between at least two modules in the framework: a generative model and a discriminative model. In recent years GAN has found many applications in medical imaging, such as reconstructing lost image data and synthesizing certain types of medical images [6]. For example, the Korean Academy of Science and Technology is studying the synthesis of dislocation CT images.
In an X-ray image, only the lung region is the target: it occupies a sizeable area and differs from the adjacent heart and ribs in color and shape. Therefore, to reduce the interference of ghosting and occlusion with computer recognition, this paper first separates the lungs from other organs through GAN image segmentation, obtaining black-and-white segmented images containing only the lungs. The segmented lung images are then imported into the classification algorithm as a new dataset labeled "normal" and "pneumonia", and the classification results are compared with those obtained on the original, unsegmented images.

Dataset description and preprocessing
The dataset used in this study comes from the data website Kaggle [7]. There are two label categories, "normal" and "pneumonia"; an example is shown in Figure 1. In Figure 1, the normal X-ray shows clear lungs without any abnormal turbid areas, while the pneumonia image shows shadows and gaps in the lungs. When patients' chest X-ray images are abnormal, most patients are involved bilaterally [8]. Bilateral multilobular and subsegmental consolidation areas are typical chest CT findings for patients admitted to the intensive care unit (ICU). The training set contains 1423 "normal" and 897 "pneumonia" images. In this study, all original images were resized to 150×150 and normalized to [0, 1].
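As a minimal sketch, the resizing and normalization step might look like the following. Pure NumPy with nearest-neighbour resampling is used here for illustration; the study does not specify its exact resizing method.

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 150) -> np.ndarray:
    """Resize a grayscale image to size x size and scale pixels to [0, 1]."""
    h, w = img.shape[:2]
    # nearest-neighbour resampling via integer index selection
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0
```

In practice a library routine (e.g. an image library's resize with interpolation) would typically replace the index-selection step, but the output contract is the same: a 150×150 array with values in [0, 1].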

GAN
The main inspiration for GAN comes from the idea of a zero-sum game in game theory. Applied to deep neural networks, the distribution of the data is learned through a continuous game between the generator (G) and the discriminator (D). In the field of image generation, a trained G can produce realistic images from random noise.
The main function of GAN here is to pair each original image with its segmented image [9]. Each training file holds the two parts of one X-ray film, the semantic (segmented) image and the original image, side by side; each file is cropped into its two halves and the pixel values are recorded, building the complete paired dataset.
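Under the assumption that each paired file stores the X-ray and its segmentation mask side by side (the usual pix2pix layout), splitting such a pair can be sketched as:

```python
import numpy as np

def split_pair(pair: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a side-by-side (H, 2W) paired image into (original, mask)."""
    w = pair.shape[1] // 2
    return pair[:, :w], pair[:, w:]
```

The left and right halves then serve as the input and target of the image-to-image translation model, respectively.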
Constructing the generator and discriminator requires importing all the layer types needed to build the model: the main convolution and transposed-convolution layers, as well as batch normalization and leaky ReLU layers. A concatenation layer is required to build the U-Net architecture. The generator encodes the input several times to obtain a feature map of the original image, then decodes the feature map back to a full-resolution image, so most of its layers are simply encoder and decoder blocks. The discriminator is a Keras implementation of the model used in the pix2pix GAN paper. Leaky ReLU is used in place of plain ReLU so that negative values are taken into account, which improves convergence speed. Since the discriminator performs binary classification, a sigmoid activation is used in the last layer and the loss function is set to binary cross-entropy. After several attempts in this project, 10 batches were found to produce the best results. Figure 2 presents an original image and the corresponding image produced by the GAN.
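The two design choices named above, leaky ReLU activations in the discriminator and a sigmoid output trained with binary cross-entropy, can be illustrated in a few lines. This is a NumPy sketch of the underlying functions, not the study's Keras code:

```python
import numpy as np

def leaky_relu(x: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    # unlike plain ReLU, negative inputs keep a small slope (alpha) instead
    # of being zeroed, which lets gradients flow and speeds up convergence
    return np.where(x > 0, x, alpha * x)

def binary_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # loss for the discriminator's real/fake decision on sigmoid outputs
    eps = 1e-7
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))
```

In Keras these correspond to the built-in `LeakyReLU` layer and the `binary_crossentropy` loss; the slope 0.2 is the value used in the pix2pix paper.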

Convolutional neural network
A convolutional neural network (CNN) is a kind of feedforward neural network and one of the best-known deep learning algorithms; it performs convolution computations and has a deep structure. Thanks to its strong representation learning ability, a CNN can classify input information according to its hierarchical structure [10,11].
As a kind of CNN, the Visual Geometry Group (VGG) network performs very well on image recognition. VGG's convolution layers use small convolution kernels, allowing a nonlinear activation function to be added after each layer, which enhances the model's learning ability and its capacity for feature abstraction.
The pooling layers adopt 2×2 pooling kernels, which extract features more effectively. VGG16 is used in this study; the workflow of the method is shown in Figure 3. VGG16 has 16 layers, and its structure is concise and easy to modify. The convolution kernels are mainly used to expand the number of channels, while the pooling layers reduce height and width, making the model architecture deeper and wider within an acceptable computing budget. In the framework used in this study, only the last two layers (FC1 and FC2) are trained locally: of more than 15 million parameters in total, nearly 600,000 are trained locally and the rest are "frozen". The first layer of the standard VGG16 architecture expects 224×224×3 images, so this study must ensure that the X-ray images to be trained match the expected dimensions. When the model is loaded with the original weights (weights="imagenet"), the top of the model is not retained (include_top=False); those layers are replaced by the head model, which is placed on the base model and becomes the part of the model that is actually trained (to determine the best weights).
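A minimal Keras sketch of this transfer-learning setup is shown below. The frozen base and the `weights="imagenet"` / `include_top=False` flags follow the text; the exact head layout (dense width 64, dropout 0.5, sigmoid output) is an assumption drawn from the hyperparameters reported in the implementation section, and the 150×150 input matches the dataset section.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(input_shape=(150, 150, 3), base_weights="imagenet"):
    # base model: VGG16 without its top classifier (include_top=False);
    # its convolutional layers are frozen so only the head is trained
    base = tf.keras.applications.VGG16(
        weights=base_weights, include_top=False, input_shape=input_shape)
    base.trainable = False
    # head model placed on top of the base, replacing the original FC layers
    return models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(64, activation="relu"),    # locally trained layer
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # normal vs. pneumonia
    ])
```

With `include_top=False`, Keras accepts input sizes other than 224×224, which is how a 150×150 input can be used with ImageNet-pretrained convolutional weights.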

Implementation details
This study is based on TensorFlow. It reads the X-ray images, generates the corresponding label values (normal is label 0 and pneumonia is label 1), and builds the training and validation sets. In addition, the learning rate, number of epochs, nodes_dense0, dropout ratio, maximum database connection value, and max pooling size are set to 1e-3, 10, 64, 0.5, [0.0, 0.1, 0.2, 0.3, 0.4, 0.5], and (4, 4), respectively. Table 1 presents the loss and accuracy for two datasets, one based on the original images and the other on the images preprocessed by the GAN. The structure of the CNN classification model is kept unchanged; only the datasets differ. On the original, merely preprocessed X-ray images, the accuracy is 0.8396 and the loss is 0.1766, as shown in Table 1. On the images segmented by the GAN, the accuracy reaches 0.9899 with a loss of only 0.0308. The proposed method is thus about 15 percentage points more accurate than the method based on the original dataset, which demonstrates its effectiveness.
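The reported gain can be checked directly from the two Table 1 accuracies; "15% higher" here means roughly 15 percentage points of absolute accuracy:

```python
orig_acc, orig_loss = 0.8396, 0.1766   # original X-ray images (Table 1)
gan_acc, gan_loss = 0.9899, 0.0308     # GAN-segmented images (Table 1)

gain = gan_acc - orig_acc  # absolute accuracy improvement
print(f"accuracy gain: {gain:.4f} ({gain * 100:.1f} percentage points)")
```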

Discussion
After GAN segmentation, the classification performance improves greatly, as shown in Table 1. The likely reason is that segmentation eliminates the influence of local artifacts and cardiopulmonary overlap, leaving only the essential pulmonary contour. The segmented image is black and white, with high contrast and clear boundaries, whereas in the original image the color difference between the ribs and other organs is small. In addition, the original image may be blurred by patient motion or imaging problems, which also affects recognition accuracy. During CNN image recognition, both local and global information are extracted; from the original image, unnecessary local information such as the ghosting of ribs and heart may be extracted, reducing the efficiency of model recognition to a certain extent. From the segmented image, only the essential lung contour is extracted.

Conclusion
This study determines whether pneumonia is present by segmenting the chest X-ray image and feeding the segmented lung image into a classification algorithm. The images are segmented with GAN, and the classifier is a pre-trained VGG16 in which only the FC1 and FC2 layers are changed. The classification results are labeled "normal" and "pneumonia". The accuracy after image segmentation reaches 98%, 15 percentage points higher than without segmentation. However, this is still a preliminary experiment and cannot yet be put into use; there is a long way to go before it can serve as a medical tool. In future work, GAN segmentation will be applied to images of other body parts, and a swarm intelligence module could be added to analyze errors, with professionals manually annotating the data before training.

Figure 1. Sample data from the collected dataset: a normal chest X-ray (left) and a pneumonia case (right).

Figure 2. The original image (left) and the segmented image (right).

Figure 3. The structure of the proposed method.

Table 1. The performance based on the two kinds of datasets.