Can machines recognize a long-missed old friend? A test of the FaceNet face recognition algorithm

Facial recognition has received considerable attention in recent years. Given the increasing demand for portable and easy-to-use face recognition programs, several coding libraries have been developed that include algorithms for identifying and classifying faces. One such library, the "face_recognition" library developed by Adam Geitgey, allows fast execution of face recognition tasks and requires only one photo as training input. It adopts the Convolutional Neural Network (CNN) from OpenFace, which is built on the FaceNet algorithm. Although a high accuracy is reported on tests with common face datasets, it is not clear whether the algorithm can identify people consistently across different ages. In this paper, a face dataset is constructed by collecting publicly available photos of celebrities that can be traced to a known date. These photos are grouped into sets of five per person, with each photo taken roughly four years after the previous one. Experiments are then conducted to test the effectiveness of the face_recognition library in identifying the person from these groups of photos. The results report an over 75% success rate for the algorithm on the image identification task, indicating that the face encoding remains largely consistent across gaps in time.


Introduction
Facial recognition is a widely researched field of computer technology. It aims to allow computers and machines to analyze images containing human faces and identify the person within the image. With the increasing demand for automatic identity checking, the deployment of face recognition systems is rapidly expanding. Indeed, facial recognition technology has been widely applied in fields ranging from device access control to security screening.
As human beings, the ability to recognize facial patterns is innately built into our visual and nervous systems. Inspired by how the eyes and brain work, the task of machine facial recognition can be divided into four steps: 1. detect and capture faces in the image; 2. apply transformations to the face image to account for variations in viewing angle and environmental factors; 3. encode the facial pattern; 4. recognize the identity of the face using the encoded data. For a machine to recognize faces from an image, all of the above steps need to be performed. However, step 3, the encoding of the facial pattern, is the most essential part of the problem. To allow fast identification of a certain face from a vast amount of image input, quantitative measurements need to be extracted from facial features. In recent years, researchers have offered various solutions for generating facial pattern data using deep learning techniques, owing to their excellent performance across a range of tasks [1,2]. Specifically, OpenFace utilizes a deep Convolutional Neural Network (CNN) to compute 128 numerical measurements for each face. It is built on the FaceNet algorithm, which trains the CNN to directly optimize the 128-dimensional face embedding [3]. After training, the network generates roughly consistent measurements for the same face, allowing accurate similarity detection and face identity verification from image input. In addition, the facial recognition system provided by OpenFace allows fast and efficient execution even on a personal computer. Unlike previous approaches, the FaceNet algorithm does not rely on an intermediate bottleneck layer and is therefore able to achieve superior efficiency [4].
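For illustration, the sketch below shows how two such 128-dimensional measurements can be compared with a simple distance threshold. The 0.6 threshold is borrowed from the default used by the face_recognition library discussed below, not a value prescribed by FaceNet itself, and the embeddings here are synthetic stand-ins for real CNN output.

```python
import numpy as np

def same_person(embedding_a: np.ndarray, embedding_b: np.ndarray,
                threshold: float = 0.6) -> bool:
    """Compare two 128-dimensional face embeddings by Euclidean distance.

    The 0.6 threshold mirrors the face_recognition library's default and is
    used here purely for illustration.
    """
    distance = np.linalg.norm(embedding_a - embedding_b)
    return distance <= threshold

# Synthetic embeddings standing in for real CNN output.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=128)
emb_b = emb_a + rng.normal(scale=0.01, size=128)  # nearly identical embedding
print(same_person(emb_a, emb_b))
```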
Elaborating on this approach, Adam Geitgey integrated various machine learning toolkits such as OpenCV and dlib to build the "face_recognition" library [5]. It is an efficient and easy-to-use toolkit that provides face recognition functions for individual programmers. Using the library, a user only needs to feed one image per person into the system, and the algorithm can identify the presence of that person in new images with high accuracy. In fact, when tested on the "Labeled Faces in the Wild" dataset, the model achieves an impressive accuracy of 99.38% [5].
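As a minimal illustration of this workflow (the image file names below are hypothetical), the library can verify a new photo against a single reference encoding as follows:

```python
import face_recognition

# One reference photo per person is enough to "train" the system.
known_image = face_recognition.load_image_file("reference_photo.jpg")   # hypothetical file
known_encoding = face_recognition.face_encodings(known_image)[0]

# Check whether a new photo shows the same person.
unknown_image = face_recognition.load_image_file("unknown_photo.jpg")   # hypothetical file
unknown_encoding = face_recognition.face_encodings(unknown_image)[0]

match = face_recognition.compare_faces([known_encoding], unknown_encoding)[0]
distance = face_recognition.face_distance([known_encoding], unknown_encoding)[0]
print(f"same person: {match}, embedding distance: {distance:.3f}")
```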
However, although Geitgey's approach achieves high performance on common face datasets, it is unclear whether the algorithm can generate consistent measurements for a person at different ages. As discussed above, the CNN encodes facial pattern data from a single image input. But as people grow up, the alignment of facial features can vary considerably. Human brains possess the ability to capture the identity of a long-missed acquaintance, but it is unclear whether the existing CNN can likewise take this natural variation of faces into account. In this regard, this study designs a test to evaluate the ability of the face_recognition library to recognize that images taken over a long period of time belong to the same person. 100 images of 20 celebrities were collected from the Internet, with 5 images per person. For every person, each image was taken roughly 4 years after the previous one. Given the middle image as training input, the experiment tested whether the algorithm could verify that the other 4 images belong to the same person.
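A sketch of this test protocol for a single person is shown below. The file names are hypothetical, and the actual scripts used in the experiment may differ; the sketch only illustrates the idea of training on the middle photo and testing on the remaining four.

```python
import face_recognition

# Five photos of one person, ordered by year and roughly 4 years apart
# (file names are hypothetical). The middle photo is the training input.
photos = ["2006.jpg", "2010.jpg", "2014.jpg", "2018.jpg", "2022.jpg"]
reference_encoding = face_recognition.face_encodings(
    face_recognition.load_image_file(photos[2]))[0]

hits = 0
test_photos = [p for i, p in enumerate(photos) if i != 2]
for path in test_photos:
    encoding = face_recognition.face_encodings(
        face_recognition.load_image_file(path))[0]
    # compare_faces returns True if the encodings are within the tolerance.
    if face_recognition.compare_faces([reference_encoding], encoding)[0]:
        hits += 1

print(f"identified {hits} of {len(test_photos)} photos")
```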

Dataset description and pre-processing
Currently, there is no easily accessible dataset organized into groups of images of the same people taken at different ages. Therefore, in this study, a dataset of face images is manually constructed by collecting pictures of the top 20 celebrities from the Forbes "The World's Highest-Paid Celebrities" list [6]. Given the abundance of public image resources available for these famous people, the dataset is carefully selected to include a series of 5 images for every person, with each image taken roughly 4 years after the previous one. Sample photos are provided in Figure 1. Since the applied facial recognition algorithm already contains a series of steps to process the input image, there is no need for additional data preprocessing. The image processing techniques used are described below.

Approach
This study tests the effectiveness of the "face_recognition" library created by Adam Geitgey [5]. Figure 2 is a detailed flow chart of Geitgey's approach [7].
Face Capturing. The first step of the algorithm is to detect and capture faces from the image input. Here, the facial recognition system uses dlib's face detection algorithm [8]. Specifically, dlib's face detector utilizes a HOG (Histogram of Oriented Gradients) + linear SVM approach [9]. Essentially, the algorithm transforms the original image into gradients showing the flow of brightness. From the gradient chart, the algorithm can detect the presence of a face pattern [7]. Using this approach, the face detector achieves high accuracy while maintaining good computational efficiency [9].
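A minimal sketch of this detection step, using dlib's built-in HOG-based detector on a hypothetical input image, is shown below.

```python
import dlib

# dlib's frontal face detector implements the HOG + linear SVM approach.
detector = dlib.get_frontal_face_detector()

image = dlib.load_rgb_image("photo.jpg")   # hypothetical input image
# The second argument upsamples the image once to help find smaller faces.
faces = detector(image, 1)
for rect in faces:
    print("face at", rect.left(), rect.top(), rect.right(), rect.bottom())
```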
Face Transformation. After acquiring the face from the image, the next step is to apply a transformation to the face image to eliminate the influence of pose and viewing angle. In detail, the system relies on the face landmark estimation algorithm developed by Vahid Kazemi and Josephine Sullivan [10]. The idea of the algorithm is to locate 68 landmarks on every face in the image. The program then applies rotation and scaling so that, no matter which way the face was originally turned, the eyes and mouth are centered at roughly the same position.
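A brief sketch of this step using dlib's publicly available 68-point shape predictor is given below; the predictor model file must be downloaded separately, the input image name is hypothetical, and the alignment call is one possible way to produce a centered face crop.

```python
import dlib

detector = dlib.get_frontal_face_detector()
# Standard 68-point landmark model distributed with dlib (downloaded separately).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = dlib.load_rgb_image("photo.jpg")   # hypothetical input image
for rect in detector(image, 1):
    landmarks = predictor(image, rect)                    # 68 facial landmarks
    points = [(p.x, p.y) for p in landmarks.parts()]
    # Produce a rotated and scaled face crop with eyes and mouth centered.
    aligned_face = dlib.get_face_chip(image, landmarks, size=150)
```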
Face Encoding. The most essential part of the facial recognition system is a CNN trained to encode facial pattern data from the image input. As described in the introduction, the system is built from the CNN trained by OpenFace [3]. Specifically, during the training phase, the algorithm picks a face image of a person and generates measurements from it. It then picks two further pictures: one of the same person and one of a different, randomly chosen person. Next, the program compares the 128 measurements it generates from the pictures and slightly modifies the neural network so that the measurements for the two pictures of the same person move closer together, while the measurements for the picture of the other person move farther away [7]. Finally, after training, the network generates a roughly consistent set of 128 measurements for each face.
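The following sketch illustrates this triplet-style objective on ready-made embeddings. It is a simplified stand-in for the actual network training, the embeddings are synthetic, and the 0.2 margin is only an assumed illustrative value.

```python
import numpy as np

def triplet_loss(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray,
                 margin: float = 0.2) -> float:
    """Illustrative triplet loss over 128-dimensional face embeddings.

    anchor and positive come from the same person, negative from another
    person; training pushes the positive pair closer than the negative pair
    by at least the margin.
    """
    pos_dist = np.sum((anchor - positive) ** 2)   # squared distance, same person
    neg_dist = np.sum((anchor - negative) ** 2)   # squared distance, other person
    return float(max(pos_dist - neg_dist + margin, 0.0))

# Synthetic embeddings standing in for CNN output.
rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
positive = anchor + rng.normal(scale=0.05, size=128)
negative = rng.normal(size=128)
print(triplet_loss(anchor, positive, negative))
```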
Face Classification. The final step for the system is to build a classifier for the face measurements. The classification algorithm searches the database for the face that most closely matches the new image to be identified. In Geitgey's approach, a linear SVM classifier is applied [7].
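As an illustration of this last step (not the exact code used by the library), a linear SVM from scikit-learn can be fit on labeled encodings in a few lines; the encodings below are random stand-ins for real CNN output.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in data: in the real pipeline each row would be a 128-dimensional
# encoding produced by the CNN, and each label the corresponding person.
rng = np.random.default_rng(0)
encodings = rng.normal(size=(10, 128))
names = ["person_a"] * 5 + ["person_b"] * 5

classifier = LinearSVC()          # linear SVM, as in Geitgey's approach
classifier.fit(encodings, names)

# The classifier predicts an identity for a newly encoded face.
new_encoding = rng.normal(size=(1, 128))
print(classifier.predict(new_encoding)[0])
```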

Results
By feeding the algorithm with images from different ages, three experimental conditions were built. Given one photo as training input, the success rates for the algorithm to correctly identify the person in the other 4 images are shown in Table 1. As demonstrated, the facial recognition algorithm achieved a relatively high accuracy, with all experimental conditions reaching a success rate of over 85%. Among the three conditions, training the algorithm with the earliest photo resulted in the lowest success rate for identifying the other 4 pictures.
Given the same experimental conditions, the results for female and male subjects were separated and are shown in Table 2 and Table 3. Female photos were generally less likely to be recognized than male photos, and this gender difference increases as the time gap between the training photo and the testing photo expands.

Discussion
Given the generally high accuracy achieved in the identification test, it can be inferred that the FaceNet facial recognition algorithm has a good ability to recognize faces taken across a span of 16 years.
However, it is worth noting that the success rate is lower for the earliest group of photos. Specifically, when trained on the middle photo of the series, the algorithm is significantly less likely to identify the photo taken 10 years earlier than the photo taken 10 years later. Also, compared with taking the latest photo as training input, taking the earliest photo as training input leads to a lower probability of identifying the other 4 photos. One possible explanation is that human faces vary more in early years.
Finally, compared with identifying men, the algorithm was less competent at identifying women given the same age variation. At present, the reason behind this gender difference is unclear. It is possible that female faces vary more across time than male faces. But it may also be that women tend to wear distinct makeup at different ages, and the variation in makeup could influence the encoding of facial features by the algorithm.

Conclusion
This study tests the effectiveness of the FaceNet face recognition algorithm at consistently identifying an individual person under the challenge of aging. Given the photos collected from the Internet, the algorithm maintains a high identification accuracy across age differences. However, the algorithm displayed a slightly weaker ability to match young faces with mature ones. The experimental results also show a gender difference, as the algorithm performed better when identifying males than females. Future work could extend this research by conducting experiments on a more sophisticated dataset with a larger sample size and faces without makeup.

Figure 1.
Sample images of Lionel Messi from 2006 to 2022.

Figure 2.
Flow chart for the algorithm.

Table 1.
Rate of successful identification based on age groups.

Table 2.
Rate of successful identification for female.

Table 3.
Rate of successful identification for male.