Face mask detection services of Covid19 monitoring system to maintain a safe environment using deep learning method

COVID-19 is a pandemic disease that has hit the whole world, including Indonesia. Efforts to prevent the spread of the virus include implementing health protocols, providing information services on the spread of the virus and emergency response, providing detection services for people suspected of or infected with the virus, and running prevention and vaccination programs for all elements of society in Indonesia. This research focuses on developing a face mask detection system that detects whether people are wearing masks, using a deep learning method oriented toward object detection. Face mask detection systems can assist in monitoring mask compliance in public areas where crowds are likely to form, helping to create a safe environment. The deep learning method, based on the MobileNetV2 architecture, was quite effective: it reached an accuracy of 0.99 on the classification training data at 10 epochs, and it has also been implemented in direct real-time testing.


Introduction
COVID-19 is a pandemic disease that has hit the whole world, including Indonesia. The virus has been spreading in Indonesia since early January 2020 and has infected a large number of people. The government is currently accelerating its efforts to prevent the spread of COVID-19 by launching a vaccination program for all elements of society in Indonesia. The vaccine aims to give the body immunity against COVID-19 infection and, in turn, to encourage the formation of herd immunity (group immunity) in society. However, the vaccination program, which has been running for more than a few months, has not yet reached all Indonesian people evenly and is currently limited to priority groups of vaccine recipients. Therefore, efforts to prevent the spread of the virus through compliance with health protocols, such as wearing masks, must continue to be implemented and monitored. Mask use is currently monitored directly by security personnel. This supervision is not effective in public areas and crowds, so compliance with mask use has not been optimal. To assist in monitoring mask compliance in the community, this research develops an Information and Computer Technology (ICT)-based application for detecting mask use that can be applied in public areas where crowds are likely to form. The detection system uses digital images obtained from CCTV cameras, which are then analyzed with image processing techniques and machine learning. This analysis detects people who are not wearing masks as well as masks that are not worn in accordance with the provisions.
Services for monitoring compliance with health protocols, in this case mask wearing, have also been developed by many researchers [1][2][3][4][5][6][7]. Such a service detects mask use and can be applied in public areas where crowds are likely to form. As shown in Figure 1, the detection system uses digital images obtained from CCTV cameras, which are then analyzed with image processing techniques and machine learning to detect people who are not wearing masks and masks that are not worn in accordance with the provisions. Object detection in image processing is also applied in other research areas, such as robotics, autonomous vehicles, autonomous driving, health, and surveillance. One widely used object detection algorithm is You Only Look Once (YOLO) [1][8][9]. Detecting a mask on a face involves locating the face and then determining whether the face is wearing a mask and whether the mask is worn in accordance with the provisions. Besides the YOLO algorithm, object detection can be built with common machine learning tools, namely TensorFlow, Keras, OpenCV, and scikit-learn [2][4].

Figure 1. Mask Face Detection Services
This research develops a face mask detection system that detects whether people are wearing masks, using a deep learning method oriented toward object detection. Face mask detection systems can assist in monitoring mask compliance in public areas where crowds are likely to form, helping to create a safe environment. Several researchers have developed mask compliance detection systems based on object identification in image processing. This study uses an RGB camera as a non-contact sensor to detect mask compliance in public areas. The application detects not only whether a mask is worn but also whether it is placed correctly on the face. The deep learning method is implemented with the MobileNetV2 architecture. This paper consists of four sections: section one presents the background, section two describes the material and methods, section three presents the experiments and results, and section four contains the conclusions.

Material and Methods
This study develops a mask detection system that also checks the correct placement of face masks, using image processing methods oriented toward object detection. The stages of detecting mask compliance are shown in Figure 2. The first stage of the detection process is to acquire raw data from a camera at a given resolution. The development of the face mask detector begins with data collection. In this step, each image is cropped until the only visible object is the person's face. After collection, the data are labeled and grouped into two classes: with mask and without mask. The preprocessing stage converts the raw data into noise-free data that is ready to be used as input to a machine learning model, for example a neural network. The preprocessing steps are: resizing the input image, applying color filtering (RGB), scaling or normalizing the image, cropping the center of the image, and converting the image into a tensor. Image resizing is an important preprocessing step in computer vision because it affects how efficiently the model can be trained: the smaller the image, the less computation is needed and the faster the model runs. In this research, the images are resized to 140 x 147 pixels.
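The preprocessing steps listed above can be illustrated with a minimal, dependency-free sketch. It is not the implementation used in this study: the function names are hypothetical, nearest-neighbour interpolation stands in for a library resize, the image is assumed to be a nested list of RGB pixels, the crop is applied before the resize to preserve the aspect ratio, and 140 x 147 is read as width x height.

```python
from typing import List

Pixel = List[int]            # [R, G, B], each channel in 0..255
Image = List[List[Pixel]]    # rows of pixels

# Assumption: the paper's "140 x 147" is interpreted as width x height.
TARGET_W, TARGET_H = 140, 147


def center_crop(img: Image, crop_w: int, crop_h: int) -> Image:
    """Keep the central crop_w x crop_h window of the image."""
    h, w = len(img), len(img[0])
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def resize_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Nearest-neighbour resize (a simple stand-in for a library resize)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def normalize(img: Image) -> List[List[List[float]]]:
    """Scale 8-bit channel values into the range [0, 1]."""
    return [[[c / 255.0 for c in px] for px in row] for row in img]


def preprocess(img: Image) -> List[List[List[float]]]:
    """Center-crop to the target aspect ratio, resize, then normalize.

    The result is a height x width x 3 nested list, i.e. the layout a
    tensor library would expect for a single RGB image.
    """
    h, w = len(img), len(img[0])
    # Largest centred window with (approximately) the target aspect ratio.
    crop_w = max(1, min(w, h * TARGET_W // TARGET_H))
    crop_h = max(1, min(h, crop_w * TARGET_H // TARGET_W))
    cropped = center_crop(img, crop_w, crop_h)
    return normalize(resize_nearest(cropped, TARGET_W, TARGET_H))
```

In practice these steps would normally be delegated to an image library; the sketch only makes the order of operations and the resulting array shape explicit.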
The feature extraction process converts the image data into a set of features that form a feature vector, which is then used in the classification process. Features that can be extracted from image data include color features, geometric features, texture features, and statistical features. The classification stage determines whether a person is wearing a mask or not. The algorithm generally used in this stage is the Convolutional Neural Network (CNN), which is a deep learning algorithm. The evaluation stage then measures the accuracy of the classification results. The CNN architecture used in this detection process is MobileNetV2, which is faster than a conventional CNN. The basic difference lies in its use of convolution layers whose filter depth is matched to the thickness (number of channels) of the input, that is, depthwise convolutions.
Figure 3.
The face detection model uses OpenCV, and its output is a Region of Interest (ROI) that contains data such as the location, width, and height of the face. The next step is to build a training model that classifies masked and unmasked faces from a dataset processed with the MobileNetV2 architecture. This trained model is then applied in a real face detection test. The dataset of faces with and without masks is shown in Figure 4.
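The speed advantage attributed to MobileNetV2 above can be made concrete by counting weights. The sketch below compares a standard convolution with a depthwise separable convolution (a depthwise convolution followed by a 1 x 1 pointwise convolution), which is the building block the MobileNet family relies on. The layer sizes are illustrative assumptions and do not come from this study; biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64    # illustrative layer sizes, not from the paper
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(std / sep, 1))  # prints: 18432 2336 7.9
```

For this example layer the separable form needs roughly eight times fewer weights, and the number of multiply-accumulate operations per output position shrinks by the same factor.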

Experimental and Result
This study develops a mask detection system that also checks the correct placement of a face mask, using an image processing method oriented toward object detection. The dataset contains 4,095 images, of which 2,165 show faces with a mask and 1,930 show faces without a mask, each with a size of 140 x 147 pixels. The classification model trained on the MobileNetV2 architecture consists of a 7x7 max-pooling layer, a flatten layer, a hidden layer with a ReLU activation function, a dropout rate of 0.5, and an output layer of two neurons with a softmax activation function. The parameter settings used for this architecture are a learning rate of 1e-4, 20 epochs, a batch size of 32, the Adam optimizer, and binary cross-entropy as the loss function. The image data used as input to the neural network are scaled to 140x147 pixels. The face classification training process splits the dataset into test data and training data at a ratio of 30:70. Based on this ratio, the amounts of training data and test data were 2,886 and 1,229, respectively. As shown in Figure 5, the training model converges at about 10 epochs, with a training accuracy of 0.99 and a training loss of 0.01 for detecting face mask wearing. The next test was carried out using the metrics precision, recall, F1-score, macro average, and weighted average. The metric values of the test results are shown in Table 1.
The face mask detection system can be applied to still images and to real-time video streams. The images used in the detection process cover several scenarios with different numbers of people with and without masks. Figure 6 shows the results of applying the face mask detection system to images.
Figure 6. Face Mask Detection System using Images
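The evaluation metrics named above (precision, recall, F1-score, macro average, and weighted average) can be computed as in the following minimal sketch. The function name `mask_report` and the toy labels are hypothetical and serve only to illustrate the definitions; they are not the values reported in Table 1.

```python
from typing import Dict, List


def mask_report(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall and F1, plus macro and weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    report: Dict[str, Dict[str, float]] = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        # Support is the number of true samples of this class.
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": tp + fn}

    total = sum(report[label]["support"] for label in labels)
    metrics = ("precision", "recall", "f1")
    # Macro average: every class counts equally.
    macro = {m: sum(report[label][m] for label in labels) / len(labels)
             for m in metrics}
    # Weighted average: every class counts in proportion to its support.
    weighted = {m: sum(report[label][m] * report[label]["support"]
                       for label in labels) / total
                for m in metrics}
    report["macro avg"] = macro
    report["weighted avg"] = weighted
    return report


if __name__ == "__main__":
    # Toy example with made-up labels, only to show the calculation.
    truth = ["mask"] * 4 + ["no_mask"] * 2
    preds = ["mask", "mask", "mask", "no_mask", "no_mask", "mask"]
    for name, values in mask_report(truth, preds).items():
        print(name, values)
```

With nearly balanced classes, as in this dataset, the macro and weighted averages stay close to each other; they diverge when one class dominates.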

Conclusion
This study developed a mask detection system that also checks the correct placement of a face mask, using an image processing method oriented toward object detection. The dataset contains 4,095 images, of which 2,165 show faces with a mask and 1,930 show faces without a mask, each with a size of 140 x 147 pixels. Face mask detection systems can assist in monitoring mask compliance in public areas where crowds are likely to form, helping to create a safe environment. The deep learning method, based on the MobileNetV2 architecture, was quite effective and reached an accuracy of 0.99 on the classification training data at 10 epochs.