Face Spoof Detection Using VGG-Face Architecture

Face recognition systems have gained substantial importance in the modern world, with security systems being a major application. However, the ability of a face recognition system to withstand an attack by an unauthorized person is an important concern. Face recognition systems are vulnerable to photograph and video spoof attacks. Anti-spoofing systems are used to counter such attacks, and robust solutions are required to make face recognition systems resistant to spoofing. In this paper, the detected face is denoised, converted to the YCbCr and CIELUV colour models, and passed through the VGG-Face architecture to extract face embeddings for each colour space. The extracted embeddings are then concatenated and passed to an SVC (Support Vector Classifier), which classifies faces as real or spoof. The proposed method obtains a test accuracy of 99.6% with a specificity of 99.5% for spoof detection.


Introduction
Biometric systems such as face recognition and fingerprint identification are extensively used for personal identification. They are more secure than traditional methods such as passcode entry, ID cards, or keys. Face recognition is also more convenient than the traditional methods. However, face recognition is prone to presentation attacks, which include print attacks and video/replay attacks. In a print attack, the attacker presents a photo of a valid user displayed on a digital device or printed on paper. In a video/replay attack, the attacker uses a video recording of a valid user's natural movements. Many hardware and software methods have been developed to detect spoof faces.
Software-based methods analyse liveness properties such as texture, structural information, liveness signs, and image quality. All of these methods are very sensitive to environmental noise such as low-light conditions, and they require high computation time to obtain a result.
Texture-based methods use the reflection of light from the surface of the target: human skin reflects light differently from plain paper or a screen. Spoof detection is based on the difference between the visual and tactile texture of real and spoof faces. Tactile texture represents the roughness or smoothness of the surface, while visual texture is the apparent quality of the surface as perceived visually. The algorithms used for texture-based methods include local binary patterns (LBP), Fourier analysis, and colour texture analysis. These methods are more susceptible to environmental noise.
Structure-based methods capture the differences in structural properties between a 2-D planar surface and a 3-D surface. Light falling on a 3-D surface diffuses more slowly than on a 2-D surface; the delay in diffusion is due to the non-uniform spread of light over the 3-D surface. These methods require more time for detection.
Liveness detection methods use eye blinking and mouth movement for spoof detection. They require the cooperation of the user and fail against video/replay attacks. Image quality analysis methods assess the quality of the image used for detection; higher-resolution images can pass this kind of analysis.
Hardware-based methods use additional hardware such as infrared cameras or multiple 2-D cameras. They provide highly accurate results but at a high cost. In this paper, the proposed system performs spoof detection by combining the features extracted from the detected and denoised face after conversion to the CIELUV and YCbCr colour spaces.

Literature Review
Yousef et al. used a patch-based CNN to extract local features and a depth-based CNN to generate depth maps, which are then used to distinguish real from spoof face images. The patch-based CNN increases the amount of training data and retains the native resolution of the original image. Classification is performed using features extracted from the depth map by the depth-based CNN. Kant et al. proposed an approach that uses both a camera and a thermal sensor. For detection, both the camera and the thermal sensor capture the user, and a camera frame is compared with the thermal image from the thermal sensor, which discriminates facial skin from a 2-D surface. They achieved an accuracy of 98% using thermal face recognition [7]. Jukka et al. proposed a method in which the input image is passed through a face detector and an upper-body detector. Once the upper body is found, the image is passed through a spoofing medium detector to classify real and spoof faces. Combining the CASIA and NUAA datasets, they achieved an EER of 6.8% [8].

Dataset
The dataset used for training and testing face spoof detection consists of real and spoof images from the NUAA Photograph Imposter dataset [8] and a custom-created dataset. The numbers of real and spoof images in the NUAA Photograph Imposter dataset and the custom-created dataset are shown in Table 1. Samples from the respective datasets are shown in Figures 1 and 2.

Methodology
This section describes the working of the facial anti-spoofing system. Facial anti-spoofing is the task of preventing presentation attacks from achieving false facial authentication. The proposed system is illustrated in Figure 3.
In this paper, face detection is performed using MTCNN (Multi-Task Cascaded Convolutional Neural Network) [10]. MTCNN comprises three networks, one for each step. Initially, an input image is passed through P-Net to predict possible face positions and their bounding boxes. This output contains a large number of false positives. Hence, the output is passed through R-Net for bounding-box regression, which eliminates false positives.
The face features extracted from VGG-Face are passed to a Support Vector Classifier (SVC) for classification of real and spoof images. The detected face in the YCbCr and CIELUV colour spaces is shown in Figure 5.
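As an illustration of the colour space step, the following minimal sketch converts a single 8-bit sRGB pixel to YCbCr and CIELUV in plain Python. It is not the authors' implementation: the function names are hypothetical, the YCbCr formula assumes the full-range BT.601 (JFIF) variant, and the CIELUV conversion assumes sRGB input with a D65 white point. In practice a library routine such as OpenCV's cv2.cvtColor would be applied to the whole face image; note that OpenCV orders the channels Y, Cr, Cb and rescales Luv values for 8-bit images.

```python
# Illustrative sketch only. Function names and formula variants are
# assumptions made for this example, not details taken from the paper.


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) conversion of one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr  # floats; an 8-bit image would round and clip these


def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for a channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _linear_rgb_to_xyz(r, g, b):
    """Linear sRGB to CIE XYZ, assuming the D65 white point."""
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    return x, y, z


def _uv_prime(x, y, z):
    """CIE 1976 u', v' chromaticity coordinates (0, 0 for pure black)."""
    denom = x + 15.0 * y + 3.0 * z
    if denom == 0.0:
        return 0.0, 0.0
    return 4.0 * x / denom, 9.0 * y / denom


def rgb_to_cieluv(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*u*v*."""
    linear = [_srgb_to_linear(c / 255.0) for c in (r, g, b)]
    x, y, z = _linear_rgb_to_xyz(*linear)
    # Reference white: the XYZ value of linear RGB (1, 1, 1).
    xn, yn, zn = _linear_rgb_to_xyz(1.0, 1.0, 1.0)

    t = y / yn
    if t > (6.0 / 29.0) ** 3:
        lightness = 116.0 * t ** (1.0 / 3.0) - 16.0
    else:
        lightness = (29.0 / 3.0) ** 3 * t

    u_p, v_p = _uv_prime(x, y, z)
    un_p, vn_p = _uv_prime(xn, yn, zn)
    u = 13.0 * lightness * (u_p - un_p)
    v = 13.0 * lightness * (v_p - vn_p)
    return lightness, u, v


if __name__ == "__main__":
    pixel = (200, 150, 120)  # arbitrary example pixel, not from the dataset
    print("YCbCr :", rgb_to_ycbcr(*pixel))
    print("CIELUV:", rgb_to_cieluv(*pixel))
```

YCbCr separates luminance from chrominance with a linear transform, while CIELUV applies a non-linear lightness curve, so the two representations expose different colour cues for the same face crop.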

Experimental Setup
Face feature extraction and the training and testing of the Support Vector Classifier (SVC) were carried out on a quad-core Intel Core i5-8265U processor with a base speed of 1.6 GHz, a maximum speed of 3.9 GHz, and 6 MB of Smart Cache, with 8 GB of RAM. The deep learning toolkits tensorflow and keras and the Python modules open-cv, scikit-learn, mtcnn and vggface were used.

Results
This section evaluates the proposed spoof detection method using several performance metrics: accuracy, specificity, sensitivity, and the AUC-ROC curve. Accuracy measures the proportion of all predictions that are correct. Specificity and sensitivity measure the proportion of negative and positive samples, respectively, that are correctly predicted. The ROC (Receiver Operating Characteristic) curve plots the TPR against the FPR, showing the classifier's ability at different thresholds and allowing comparison with a no-skill classifier. The AUC (Area Under the Curve) of the ROC is the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative sample, and it ranges from 0 to 1.
The face features are extracted from the YCbCr and CIELUV converted images using VGG-Face with pre-trained weights. These extracted features act as the input to the SVC (Support Vector Classifier). The numbers of training and testing images for the SVC in each class are given in Table 2. The fine-tuned SVC is evaluated on completely unseen data to assess how well the classifier generalizes. The confusion matrix in Figure 6 shows 14 misclassified images. Evaluation metrics based on the predictions on the test data are given in Table 3.    Figure 9. A comparison of the results with other works is given in Table 4.
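The metrics named in this section can be computed directly from binary labels and classifier scores. The sketch below is a plain-Python illustration under stated assumptions: the function names are hypothetical, real faces are encoded as 1 (positive) and spoof faces as 0 (negative), which the paper does not specify, and the toy labels and scores are invented for the example and are unrelated to the reported results.

```python
# Illustrative sketch only. The label encoding (1 = real, 0 = spoof) and all
# data below are assumptions for the example, not values from the paper.


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (TPR) and specificity (TNR) as fractions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: four real and four spoof samples.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
    print(classification_metrics(y_true, y_pred))
    print("AUC:", auc_roc(y_true, scores))
```

With a scikit-learn SVC, the scores could come from its decision_function output; the pairwise AUC shown here is quadratic in the number of samples and is meant for clarity rather than speed.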