Face mask detection for the COVID-19 pandemic using PyTorch and deep learning

The World Health Organization (WHO) has stated that the COVID-19 virus spreads in two main ways: respiratory droplets and physical contact. Limiting the spread therefore requires precautionary steps such as social distancing and the wearing of masks. Of these two precautions, mask wearing is considered the more important for limiting the spread, because droplets can land on any surface. Keeping track of whether people are wearing masks is therefore important. Here we present a mask detection system that detects masks of different types and shapes in video streams, to support compliance with the rules set by governments. A deep learning algorithm is used, implemented with the PyTorch library for Python, to detect masks in images and video streams. The proposed system distinguishes people who are wearing masks from those who are not.


INTRODUCTION
The COVID-19 pandemic emerged in December 2019, when the first case of the virus was found in China. From there the virus spread to almost every country in the world [1]. Measures such as staying home, travelling in one's own vehicle, avoiding public transportation, and avoiding travel to infected cities can help stop the spread of the disease [2]. WHO identifies two main routes of transmission: respiratory droplets and physical contact between people [3]. When an infected person sneezes or coughs, respiratory droplets may reach people in close contact (within 1 m); the droplets may also spread through the air and settle on nearby surfaces. The virus can remain on almost any surface, which may lead to contact transmission. The precautions prescribed by governments are social distancing and the wearing of masks [4]. People use various types of masks, for example medical masks, surgical masks, procedure masks, and designer masks of different shapes (such as cup-shaped masks) [5]. These masks stop the respiratory droplets of an infected person from spreading, and may also provide adequate breathability, fluid penetration resistance, and high filtration.
IOP Publishing doi:10.1088/1757-899X/1070/1/012061
Social distancing simply means maintaining a distance of at least 2 meters from other people and objects. The Internet of Things, artificial intelligence, blockchain, unmanned aerial vehicles, and related technologies are also expected to play a crucial role in detecting COVID-19 [6]. In our proposed approach we have designed a mask detection method that can detect masks of various types and designs.
Our approach uses computer vision to detect whether people in images and video streams are wearing masks. A deep learning algorithm is applied, and the PyTorch library for Python is used for the implementation. The approach first trains a deep learning model, MobileNetV2, and then applies it to detect masks in images and video streams.

LITERATURE REVIEW
Face recognition systems are useful in many fields. One such area is presentation attack detection for face recognition. The authors of one paper used 3D silicone face masks of real subjects, together with a new database containing 8 custom 3D silicone masks along with bona fide presentations. They evaluated the effectiveness of presentation attack detection against such 3D silicone masks and found it to be good [7]. Another paper proposes a mask detection system for health care personnel inside the operating theatre. Because personnel must wear a mask there, the system raises an alert for anyone who is not wearing one. It uses two detectors, one for faces and one for medical masks, and achieved almost 90% recall with a false positive rate below 5%. The images were taken by cameras from a distance of 5 m [8]. A further work addresses masked face detection in video. Detection proceeds in four main steps: estimating the distance between the camera and the person, detecting the eye line, detecting the part of the face, and detecting the eyes. The authors analyzed their algorithm on various video surveillance systems and reported good accuracy [9]. Another paper presents a masked face detection model for security purposes. It uses a cascaded CNN with three CNN stages. The authors also created their own MASKED FACE dataset, because the existing masked face datasets were not sufficient to evaluate algorithms precisely. The proposed CNN model performed well and detected masked faces accurately [10].
Another work first detects the masked face and then recovers the original face. It proceeds in two stages: the first detects masks that cover a larger area of the face than needed, and the second recovers a face that is not present in the training dataset. The authors applied a GAN-based algorithm trained on the CelebA dataset and achieved higher accuracy [11]. A related work finds faces covered by masks in video and then detects the original faces from the masked ones. For this the authors proposed a Multi-task Cascaded CNN together with an SVM for classification. The system is able to detect both the masked faces and the original faces of people [12]. Another paper proposes a face recognition system based on a fully convolutional network (FCN), which generates face segmentations from images of any size. The FCN is trained to extract features using gradient descent with a binomial cross-entropy loss function. The approach achieved 93.884% mean pixel accuracy [13].
Another work builds on the fast and reliable face detection method presented by Viola. The algorithm uses a skin mask for face detection, which makes it 4 times faster. To lower the false detection rate, the system is trained to model a face with either two eyes, or one eye and a nose. The authors obtained a false negative rate of 2.4% instead of 10% [14].

PROPOSED APPROACH
This section describes the proposed approach to mask detection and the implementation steps it involves. The overall flow is presented in the flowchart below, shown in figure 1. In the first phase, the mask detection dataset is loaded and preprocessed with PyTorch's torchvision. The MobileNetV2 model is then built and trained using the PyTorch deep learning framework, and the resulting face mask classifier is serialized. In the second phase, the serialized face mask classifier is loaded, and faces are loaded from the available image or video stream. Each face is preprocessed using PyTorch transforms and OpenCV for Python. Finally, the face mask detector classifies each person as wearing a mask or not, and the results are displayed.
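The second (detection) phase described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the tiny stand-in network, the `LABELS` order, the 224x224 input size, and the helper names `preprocess` and `classify_faces` are all hypothetical. In the real pipeline the serialized MobileNetV2 classifier would be loaded instead of the stand-in, and the face crops would come from frames read with OpenCV.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LABELS = ["Mask", "NoMask"]  # assumed class order, not given in the paper


def preprocess(face_crops, size=224):
    """Turn HxWx3 uint8 face crops into one float batch of shape (N, 3, size, size)."""
    batch = []
    for crop in face_crops:
        x = crop.permute(2, 0, 1).float().div(255.0)  # HWC uint8 -> CHW in [0, 1]
        x = F.interpolate(x.unsqueeze(0), size=(size, size),
                          mode="bilinear", align_corners=False)
        batch.append(x)
    return torch.cat(batch, dim=0)


@torch.no_grad()
def classify_faces(model, face_crops):
    """Return a (label, confidence) pair for every face crop."""
    model.eval()
    probs = torch.softmax(model(preprocess(face_crops)), dim=1)
    conf, idx = probs.max(dim=1)
    return [(LABELS[i], c) for i, c in zip(idx.tolist(), conf.tolist())]


# Stand-in for the serialized classifier (hypothetical, untrained).
stand_in = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, len(LABELS)),
)

# Two random "face crops" of different sizes stand in for detected faces.
crops = [
    torch.randint(0, 256, (120, 90, 3), dtype=torch.uint8),
    torch.randint(0, 256, (64, 64, 3), dtype=torch.uint8),
]
results = classify_faces(stand_in, crops)
```

Because the stand-in network is untrained, the labels it returns are meaningless; the sketch only shows how crops of different sizes are brought to a common shape and mapped to a label and a confidence.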

IMPLEMENTATION
This section presents the implementation of the proposed approach: the steps followed, the data used, and the selection of each library, framework, algorithm, and dataset.

Data at Source
For our experiment we downloaded the raw images from an article on PyImageSearch, and augmented them using OpenCV. The images are tagged "Mask" or "NoMask" from the start. Their sizes and resolutions vary because they were taken with various devices having different configurations.

Data Preprocessing
The preprocessing steps described here make the images noise free and clear enough for detection, so that the preprocessed images can be given to the neural network model as input. The algorithm is developed in PyTorch using the following modules:
• PyTorch DataLoader - loads the data from the folder of images.
• PyTorch Datasets ImageFolder - locates the image sources and provides a ready-made mechanism for labelling the target variable.
• PyTorch Transforms - applies the preprocessing steps to the images as they are read from the source folder.
• PyTorch Device - identifies the capabilities of the system, such as CPU or GPU, for training the model, and allows switching between them.
• PyTorch TorchVision - loads ready-made resources such as pretrained models, image sources, and more. It is treated as a core package in PyTorch.
• PyTorch nn - the core module, mainly used to create the deep neural network model. It provides the components needed for model building, such as Linear layers, 1D, 2D and 3D convolution layers (Conv1d, Conv2d, Conv3d), Sequential, CrossEntropyLoss (loss function), Softmax, ReLU, and so on.
• PyTorch Optim - defines the model optimizer, which controls how the model learns from the data, e.g. Adam, SGD, etc.
• PIL (Python Imaging Library, used alongside PyTorch) - loads the images from the source.
• PyTorch AutoGrad - performs automatic differentiation of operations on tensors. Gradients can be computed in a single line of code by calling backward(), which is very useful for back propagation in a DNN.
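
How the modules listed above fit together can be sketched in a short training loop. This is an illustration under stated assumptions rather than the authors' training script: random tensors wrapped in a `TensorDataset` stand in for the image folder (with real data, `torchvision.datasets.ImageFolder` and `torchvision.transforms` would supply the images and labels), and the small network, batch size, learning rate, and epoch count are hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Device: use the GPU when one is available, otherwise the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in data: 32 RGB "images" of 16x16 pixels with labels 0 or 1.
images = torch.rand(32, 3, 16, 16)
labels = torch.randint(0, 2, (32,))
loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)

# nn: a deliberately tiny classifier (hypothetical, not MobileNetV2).
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
).to(device)

criterion = nn.CrossEntropyLoss()                           # loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # optim

epoch_losses = []
for epoch in range(2):
    running = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()            # clear gradients from the previous step
        loss = criterion(model(x), y)
        loss.backward()                  # autograd computes all gradients here
        optimizer.step()                 # optimizer updates the weights
        running += loss.item() * x.size(0)
    epoch_losses.append(running / len(loader.dataset))
```

The single `loss.backward()` call is the "one line of code" mentioned for AutoGrad: it fills the `.grad` field of every parameter, which the optimizer then consumes.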

Algorithm for image classification using PyTorch
CNNs come in various architectures such as ResNet, Inception, AlexNet, MobileNet, etc. We implemented our approach on the MobileNetV2 architecture because it is lightweight and very efficient.

MobileNetV2
This model builds on the previous version, MobileNetV1, and uses depthwise separable convolutions as its building blocks. MobileNetV2 adds two features: a) linear bottlenecks between the layers, and b) shortcut connections between the bottlenecks. Figure 2 below shows the structure of MobileNetV2.
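
The three ideas named above (depthwise separable convolution, linear bottleneck, shortcut connection) can be seen together in a single building block. The following is a simplified sketch of such a block, assuming the commonly described layout with an expansion ratio of 6 and ReLU6 activations; the class name and the channel sizes in the usage lines are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """Simplified MobileNetV2-style block (illustrative sketch)."""

    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # The shortcut is only valid when input and output shapes match.
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise convolution expands the channels.
            nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution: groups=hidden filters each channel separately.
            nn.Conv2d(hidden, hidden, kernel_size=3, stride=stride,
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise projection with no activation: the linear bottleneck.
            nn.Conv2d(hidden, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out


x = torch.rand(2, 16, 32, 32)
same = InvertedResidual(16, 16, stride=1)(x)   # shortcut connection is used
down = InvertedResidual(16, 24, stride=2)(x)   # no shortcut: shape changes
```

The depthwise 3x3 convolution followed by a 1x1 pointwise convolution is what makes the convolution "separable"; leaving the last projection without an activation is what makes the bottleneck "linear".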

RESULTS
This section discusses the accuracy of the proposed mask detection approach. The dataset was divided into two sets: a training set and a validation set. The graph below shows the image classification accuracy obtained on the training and validation sets. The training set contains 5000 images with masks and 4000 images without masks, as shown in figure 5.
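
For reference, classification accuracy on a set is the percentage of images whose predicted label matches the true label. The sketch below shows the calculation; the function name and the five example labels are hypothetical toy values, not results from the experiment.

```python
def accuracy(predictions, targets):
    """Percentage of predictions that match the true labels."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)


# Toy validation labels (hypothetical): 4 of the 5 predictions are correct.
val_targets = ["Mask", "Mask", "NoMask", "NoMask", "Mask"]
val_predictions = ["Mask", "NoMask", "NoMask", "NoMask", "Mask"]
val_acc = accuracy(val_predictions, val_targets)
```

Computing the same quantity separately on the training set and on the validation set gives the two curves that an accuracy graph compares.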

CONCLUSION
The COVID-19 pandemic has brought many challenges to the world, and the spread of the virus must be controlled: it has affected more than one crore (10 million) people worldwide, and the count is still rising. One of the major precautions is wearing a mask, which stops the respiratory droplets of infected people from spreading through coughs or sneezes; healthy people should also wear masks. We have presented an approach that uses a deep learning algorithm, with the MobileNetV2 architecture implemented using PyTorch and OpenCV for Python. The results show that the proposed model can detect people with or without masks in images as well as in video streams. The accuracy on the training and validation sets was compared and found to be 79.24%.