Analysis of Non-invasive Video Based Heart Rate Monitoring System obtained from Various Distances and Different Facial Spot

Heart rate (HR) is one of the crucial indicators of a person's psychological state. Recent works have shown that a standard camera is able to detect illumination changes in the facial skin caused by the cardiac pulse, and that these changes can be used to estimate HR. However, most previous systems work at close range with a single face patch, so the usefulness of camera-based remote heart rate estimation at long range remains unclear. This paper addresses the problem by analyzing an optimal framework that works properly under these conditions. Initially, facial landmarks are estimated by applying a cascaded regression mechanism. Then, the region of interest (ROI) is selected based on the facial landmarks, in a location where non-rigid motion is minimal. A temporal photoplethysmography (PPG) signal is extracted from the ROI, and unwanted signals such as environmental illumination or motion artifacts are eliminated using an Independent Component Analysis (ICA) filter. The PPG signal is then further processed using a series of temporal filters to exclude frequencies outside the range of interest prior to estimating the HR. Since the HR is estimated independently from multiple local regions, a histogram analysis is constructed to calculate the average HR estimate accurately. From the experiments, it can be concluded that the HR can be detected at a range of up to 5 meters with 94% accuracy using the full face region.


Introduction
Many non-contact heart rate monitoring methods have been developed. Generally, existing non-contact methods fall into three types. The first type is the electromagnetic-based heart rate monitoring system. In general, an electromagnetic signal is transmitted to the targeted person, the backscattered signal is measured, and the heart rate is estimated from that backscattered signal. Continuous-wave (CW) and ultra-wideband (UWB) pulsed radar are two types of electromagnetic-based monitoring systems [1][2][3]. For laser-based heart rate monitoring systems, a device called a laser Doppler vibrometer (LVDi) is typically used [4]. LVDi detects the skin movement related to the blood pressure pulse over the carotid artery by measuring the small skin surface displacement caused by expansion and contraction of the artery [5].
Recently, video-based remote HR monitoring has become a subject of interest because a large amount of data is available from a cost-effective device. Generally, in a video-based HR system, the PPG signal is extracted from the recorded face video and multiple signal processing filters are applied in order to obtain the HR from the PPG signal. A PPG signal is an optical measurement of volumetric changes in the human body, usually caused by fluctuations of air or blood in the body.
One of the early works was published in 2008 by Verkruysse et al [6]. They stated that the signal captured in a face video recorded under normal ambient lighting is rich enough to measure the HR of a human being. They recorded the subjects' faces using a standard camera with 640*480 (Video Graphics Array, VGA) resolution at a frame rate of 30 frames per second (30 FPS). In their experiment, the ROI was selected manually and the raw signal was computed frame by frame as the mean value of the Red, Green and Blue (RGB) colour channels. They concluded that the green channel contains the strongest PPG signal compared to the red and blue channels.
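The frame-by-frame averaging described above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the nested-list frame layout, the `(top, left, bottom, right)` ROI convention and the function names are assumptions made for the example.

```python
from statistics import fmean

def channel_means(frame, roi):
    """Mean (R, G, B) inside roi = (top, left, bottom, right), end-exclusive.

    `frame` is assumed to be a list of rows, each row a list of (R, G, B) tuples.
    """
    top, left, bottom, right = roi
    pixels = [frame[y][x] for y in range(top, bottom) for x in range(left, right)]
    return tuple(fmean(p[c] for p in pixels) for c in range(3))

def raw_traces(frames, roi):
    """Return one temporal trace per colour channel (one sample per frame)."""
    means = [channel_means(f, roi) for f in frames]
    red, green, blue = (list(ch) for ch in zip(*means))
    return red, green, blue

# Two tiny synthetic 2x2 frames; only the green channel changes over time.
frames = [
    [[(10, 100, 20), (10, 102, 20)], [(10, 98, 20), (10, 100, 20)]],
    [[(10, 104, 20), (10, 106, 20)], [(10, 102, 20), (10, 104, 20)]],
]
red, green, blue = raw_traces(frames, (0, 0, 2, 2))
```

The green trace is the one that would be passed on to the later filtering and frequency analysis steps.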
The research then evolved with the work of Poh et al [7]. Instead of using a video recorder, Poh and his colleagues used a standard laptop webcam to record the subjects' faces. They used a face detector to detect the subject's face frame by frame rather than defining the ROI manually. Another difference in their approach was that they used all three RGB colour channels to obtain the raw PPG signal. ICA was then used to separate the raw PPG signal from unwanted noise that may affect the HR reading. Lastly, they converted the resulting signal to the frequency domain and selected the frequency with the highest power as the HR estimate. Balakrishnan et al [8] proposed an HR estimation system that extracts the raw PPG signal from subtle head motion of the subject. They claimed that the raw PPG signal can be extracted using this method because of the influx of blood to the head. However, the system is not robust to motion artifacts and subjects had to remain stationary during the recording.
Even though all of the works mentioned above reported very promising results and accuracies, their experiments were conducted in controlled environments: the distance between the subject and the camera was limited, the subjects were not allowed to move, and all of the experiments used only one ROI, with no comparison between ROIs. Thus, researchers are now focusing on creating algorithms that can handle more realistic conditions. This led to a different approach by the Philips Research Group, which tried to overcome the problem of a subject moving with respect to the illumination source. They stated that an optimal fixed combination of band-passed RGB signals can be found based on the ratio of normalized colour signals when assuming standardized skin, eliminating noise due to specular reflection [9].
Recently, ROI selection has become an increasingly popular research topic, because it has been reported that the ROI has a significant influence on HR estimation accuracy [7]. Intelligent ROI selection and tracking algorithms have therefore been developed in order to achieve robustness to motion artifacts [10][11][12][13][14]. Facial landmark detection is used so that more detailed face regions can be selected. Feng et al [15] discovered a set of points on the forehead that can subsequently be used to update the ROI. They later improved the algorithm and used the cheek area instead. M. Kumar et al [13] tried a different method by dividing the face region into smaller ROIs. The raw PPG signal is extracted from all of the selected regions and combined using a weighted average. Although much research has been done on HR estimation systems during the past few years, there is still considerable room for improvement in order to obtain the optimal algorithm for the system. In this paper, we investigate the capability of remote HR estimation to acquire data from various distances and provide an optimal framework that is capable of satisfying this constraint.

PPG General Algorithm
Several general PPG algorithms existed prior to our project. In this section we discuss and classify the chosen approaches accordingly. We divide this section into three important steps: Signal Extraction, Filtering and HR Estimation.

ROI Detection
ROI detection is important to the HR system because the PPG algorithm is based on the human face. Hence it is crucial to detect the face of the subject in the video frame. The most commonly used ROI detection method is the Viola and Jones (VJ) algorithm [16]. The VJ algorithm is a machine-learning-based algorithm that uses a cascade of Haar-like features to detect faces. This algorithm is popular due to its high detection rate and fast processing time, and it is easy to use because it is available in widely used software such as OpenCV and MATLAB. A different method is to use a skin detection algorithm. The advantage of skin detection is that additional areas such as the neck or hands of the subject are included in the ROI. However, the same property also causes a problem: objects with a colour similar to the subject's skin introduce unwanted noise into the extracted PPG signal. Facial landmark tracking is used to tackle the motion artifact and illumination issues. Furthermore, facial landmark tracking results in more detailed ROI selection thanks to the basic points provided by the tracking system. Two popular facial landmark algorithms used by many researchers in this field are Active Appearance Models (AAM) [17] and Discriminative Response Map Fitting (DRMF) [18].
Many improvements have been made over the past few years in order to obtain the most suitable method for face detection. In this paper, ROI detection uses the VJ algorithm with Haar cascade features together with facial landmark tracking. We chose the VJ algorithm because of its high detection rate and because it is practical for real-time application, in case this project is later extended to a real-time version. We combine the face detection with a facial landmark tracking system to obtain a more detailed ROI selection. ROI selection has a large influence on the estimated HR, so selecting a more detailed ROI leads to more accurate results.

ROI Selection
Poh et al [7] stated that the ROI selection influences the estimated HR reading because the raw PPG signal strength varies according to the area of the face. In most publications, the face areas commonly selected are the forehead and cheeks [19][20][21]. These areas are chosen because they involve less non-rigid motion than other parts of the face. Lam et al [22] stated that the optimal area for a good PPG signal is located at the center of the face, which includes the nose and mouth area but excludes the eye area. Since our project involves varying distances, we use all of the mentioned ROIs in order to determine the optimal patch for our system.

Signal Extraction, Filtering and HR Estimation
Many algorithms can be used to extract the raw PPG signal from the selected ROIs. The most widely used method is blind source separation (BSS) using ICA algorithms. Many researchers [7][8] used this method due to its ability to separate a mixed signal into independent signals effectively. Another known method for extracting the raw PPG signal is to apply the Fast Fourier Transform directly, without BSS. Because BSS with ICA effectively separates a mixed signal into independent signals and eliminates most of the unwanted signal from the PPG signal, we use it as our algorithm for PPG signal extraction in this project.
Filtering is an important step in the HR estimation system because, despite using BSS for signal extraction and noise removal, the signal may still contain some unwanted noise that could not be eliminated by BSS. Hence a further filtering process is required in order to obtain a genuine PPG signal. Three types of filtering algorithm are commonly used for HR estimation systems. The first is bandpass filtering. A bandpass filter can eliminate both high and low unwanted frequencies. However, this method requires a definite frequency range that is feasible for human HR. A common frequency range for human HR is 0.7 Hz to 4 Hz (42 bpm to 240 bpm).
Another way is to use both a detrending filter and a moving average filter. Similar to a high-pass filter, a detrending filter removes the unwanted long-running trend from the PPG signal. A moving average filter, on the other hand, acts as a low-pass filter. It is commonly used to smooth out short-term fluctuations and highlight longer-term trends or cycles. Both filters need to be applied to the PPG signal in order to remove both high and low unwanted frequencies.
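The two filters can be sketched as below. This is a simplified illustration under stated assumptions: the detrending step here removes only a least-squares straight line, whereas published PPG pipelines often use more elaborate detrending, and the window length of the moving average is an arbitrary example value. Function names are hypothetical.

```python
def detrend_linear(signal):
    """Subtract the least-squares straight line from the signal (high-pass-like).

    Assumes the signal has at least two samples.
    """
    n = len(signal)
    t_mean = (n - 1) / 2
    y_mean = sum(signal) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(signal))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(signal)]

def moving_average(signal, window=5):
    """Centered moving average (low-pass-like); the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

# A pure ramp is removed entirely by the detrending step.
flat = detrend_linear([1.0, 3.0, 5.0, 7.0])
# A single spike is spread out and attenuated by the moving average.
smooth = moving_average([0.0, 3.0, 0.0], window=3)
```

Applying `detrend_linear` first and `moving_average` second mirrors the order in which the two filters are described above.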
A different approach is to use an adaptive bandpass filter. An adaptive bandpass filter progressively changes its cutoff frequencies according to the previous HR estimate, and hence produces consistent HR results. However, the drawback of this filter is that it depends on the previously estimated HR, which means that it cannot eliminate unwanted noise independently. For this project, we therefore used detrending and moving average filtering due to their effectiveness and promising results for our HR algorithm.
Lastly, frequency analysis can be used to estimate the HR reading. The PPG signal is converted to a power spectral density (PSD) because the PSD displays a strong frequency component that corresponds to the human HR [22]. Many frequency analysis methods can be used to convert raw PPG signals to a PSD, such as the Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), Welch's method and the Short-time Fourier Transform (STFT). Once the signal has been converted, the frequency that exhibits the highest spectral power is chosen as the HR estimate.
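As a rough sketch of this peak-picking step, the example below computes a plain DFT periodogram and returns the strongest frequency inside the 0.7 Hz to 4 Hz band, converted to beats per minute. The direct DFT is chosen only to keep the example dependency-free; the function name and the synthetic test signal are assumptions for illustration.

```python
import cmath
import math

def estimate_hr_bpm(signal, fps, f_lo=0.7, f_hi=4.0):
    """Return the in-band frequency with the highest spectral power, in bpm."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]  # remove the DC component
    best_f, best_p = None, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fps / n  # frequency of DFT bin k in Hz
        if not f_lo <= f <= f_hi:
            continue
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_p:
            best_f, best_p = f, power
    if best_f is None:
        raise ValueError("no DFT bin falls inside the requested band")
    return best_f * 60.0

# Synthetic 10 s trace at 30 FPS containing a 1.2 Hz (72 bpm) oscillation.
fps = 30
signal = [math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
hr = estimate_hr_bpm(signal, fps)
```

The frequency resolution is `fps / n`, so a longer recording gives a finer HR estimate.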
In conclusion, previous works used only a certain part of the face at a time in their experiments, so no proper comparison between ROIs was made. This paper aims to propose an optimal ROI by investigating the relationship between the face region used for PPG signal extraction and the distance of the subject from the camera.

SYSTEM OVERVIEW
The proposed framework consists of five important steps: facial tracking, signal extraction, BSS using ICA, signal filtering and, lastly, histogram analysis. Figure 1 shows the block diagram of the system flow.

Facial Tracker
The facial tracker is one of the important steps in this system; it detects the face and extracts facial landmark information. For the facial tracker in this system, an AdaBoost-based cascade with Haar-like features [16] was used. This classifier, trained on positive and negative images, works by constructing a strong classifier as a linear combination of weak classifiers. From the detected face region, facial landmarks are detected using a cascaded pose regression model [23]. This facial tracker tracks rigid and non-rigid facial landmarks, and 49 important facial points are displayed. The current shape estimate of the facial landmark locations is obtained using the following formulation, where S = (x1, x2, …, xp) denotes the coordinates of all p facial landmarks in a bounding box I and r_t(·,·) is the regressor at stage t of the cascade.

$$S^{(t+1)} = S^{(t)} + r_t\left(I, S^{(t)}\right) \qquad (1)$$

Based on the facial landmark locations obtained, an ROI of the face region was constructed. In this paper, four ROIs were investigated to determine the most suitable patch for use at various distances, as shown in Figure 2(b). The first patch was on the right cheek of the face. The right cheek was chosen because, according to [24], the cheeks are the most practical area for ROI selection since they are rarely covered by clothing or facial hair, and less non-rigid motion is generated in this area compared to other regions.
Next, the second patch was selected at the center of the face, which includes the eye and nose region but excludes the forehead area since the forehead tends to be covered by hair. This area was chosen because a previous study [25] reported the center of the face to be the most suitable ROI for video-based HR. The third patch was the whole face, since the larger the region, the higher the possibility of extracting the PPG information from a far distance. This means that the information loss can be reduced even when the subject is farther from the camera. Finally, the fourth patch was selected at the lower part of the face, which includes the nose and mouth but excludes the eye and chin areas. This region was chosen because it contains less non-rigid motion and has a wider ROI dimension than the patch placed on the right cheek.
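One simple way to turn a subset of landmarks into a rectangular patch is to take their bounding box, as sketched below. The landmark coordinates and the choice of which points belong to each patch are hypothetical, since the index layout of the 49-point tracker is not given here; the sketch only illustrates why the whole-face patch covers far more pixels than the cheek patch.

```python
def bounding_roi(points, margin=0):
    """Axis-aligned box (left, top, right, bottom) around (x, y) landmarks."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

def roi_area(roi):
    """Number of pixels covered by a (left, top, right, bottom) box."""
    left, top, right, bottom = roi
    return (right - left) * (bottom - top)

# Hypothetical landmark coordinates in pixels (not the real 49-point layout).
whole_face_points = [(100, 80), (220, 80), (100, 240), (220, 240), (160, 160)]
right_cheek_points = [(110, 170), (140, 170), (110, 200), (140, 200)]

whole_face = bounding_roi(whole_face_points)
right_cheek = bounding_roi(right_cheek_points)
```

Because the pixel count shrinks roughly with the square of the subject distance, a small patch such as the cheek loses usable pixels fastest as the subject moves away from the camera.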

PPG Signal Extraction and Noise Removal
The raw PPG signal was extracted from the ROI constructed from the 49 facial landmark locations. The signal was taken from the green channel, since the green channel provides the strongest PPG reading due to the light absorption characteristics of hemoglobin. The extracted raw PPG signal was a mixed signal that contained unwanted components such as background illumination and motion artifacts. To eliminate the unwanted components, an ICA-based BSS method was used to separate the mixed signal.
ICA is one of the BSS techniques capable of recovering unobserved signals from a set of observed mixtures in which the mixing process is unknown and assumed to be linear. The general mathematical model of ICA is given by equation (2), where x is the observed (mixed) signal, A is the unknown mixing matrix of two independent signals and s is the vector of independent components.
$$x = A s \qquad (2)$$
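The linear mixing model can be illustrated numerically as follows. Note the strong simplification: the mixing matrix is assumed known here and is simply inverted, whereas ICA has to estimate the unmixing matrix from the observations alone (and can only do so up to scaling and ordering of the components). The matrix values and signal shapes are arbitrary choices for the example.

```python
import math

def mix(A, sources):
    """Apply x = A s sample by sample; `sources` is a list of equal-length lists."""
    n = len(sources[0])
    return [[sum(a * s[t] for a, s in zip(row, sources)) for t in range(n)]
            for row in A]

def invert_2x2(A):
    """Closed-form inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

fps = 30
t = [i / fps for i in range(90)]
pulse = [math.sin(2 * math.pi * 1.2 * ti) for ti in t]               # cardiac component
illumination = [0.5 * math.sin(2 * math.pi * 0.1 * ti) for ti in t]  # slow lighting drift

A = [[1.0, 0.6], [0.4, 1.0]]             # hypothetical mixing matrix
observed = mix(A, [pulse, illumination])  # x = A s
recovered = mix(invert_2x2(A), observed)  # s = A^{-1} x, possible only because A is known
```

In practice an ICA routine would be given only `observed` and would return estimates of the two components without access to `A`.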
Thus, based on the ICA mathematical model, we can say that the blood volume variation caused by the cardiac pulse and the illumination changes are the two important factors that affect the raw PPG signal. The influence of these factors on the PPG signal can be represented by the linear combination defined in equation (3), in which s represents the green channel signal due to the cardiac pulse and y represents the variation of illumination. Ideally, if y could be estimated, the pure cardiac pulse signal could be obtained. However, in practice y cannot be measured directly.
$$PPG_{raw} = s + y \qquad (3)$$
The raw PPG traces obtained are fed to the ICA under the assumption that the rectified PPG signal and the illumination component are independent. After solving for the components, the ICA produces two separated signals. An average Euclidean distance measure is used to compute a similarity score between the signals; the one with the lowest value is labeled as the rectified PPG signal, while the other is taken as the illumination variation.

Signal Processing and Filtering
The PPG signal obtained after BSS was considered a refined but not a pure signal. It still contained some unwanted noise that could affect the HR reading. Thus, further signal processing and filtering were applied to the refined signal. In this paper, the PPG signal filtering process had two steps. Firstly, a detrending filter was applied to the PPG signal in order to reduce slow and non-stationary trends. Then a moving average filter was applied to decrease the noise present in the signal. Next, the filtered PPG signal was converted to the frequency domain, and the power spectral density of the signal within the range of 0.7 Hz to 4 Hz, which corresponds to HR values from 42 bpm to 240 bpm, was computed using Welch's method. The frequency with the highest spectral power was multiplied by 60 in order to get the estimated heart rate value. After that, a confidence ratio was computed, because a high ratio shows that the PPG signal has a dominant frequency and therefore gives more confidence that the estimated heart rate is correct. An illustration of the signal generated by this process can be seen in Figure 4.
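The peak selection and confidence ratio can be sketched as follows. The exact definition of the confidence ratio is not given in the text, so this example assumes one plausible form: the power at the spectral peak divided by the total power inside the 0.7 Hz to 4 Hz band. The PSD values are made up for illustration; in practice they would come from Welch's method.

```python
def hr_and_confidence(freqs_hz, psd, f_lo=0.7, f_hi=4.0):
    """Return (HR in bpm, confidence ratio) from a precomputed PSD.

    Confidence ratio (assumed definition): peak power / total in-band power.
    """
    band = [(f, p) for f, p in zip(freqs_hz, psd) if f_lo <= f <= f_hi]
    if not band:
        raise ValueError("no PSD sample falls inside the requested band")
    peak_f, peak_p = max(band, key=lambda fp: fp[1])
    total = sum(p for _, p in band)
    ratio = peak_p / total if total > 0 else 0.0
    return peak_f * 60.0, ratio

# Hypothetical PSD samples; the 0.5 Hz bin lies outside the band and is ignored.
freqs = [0.5, 1.0, 1.5, 2.0]
psd = [9.0, 1.0, 6.0, 1.0]
hr_bpm, confidence = hr_and_confidence(freqs, psd)
```

A ratio close to 1 means almost all in-band power sits at a single frequency, while a ratio near the reciprocal of the number of bins indicates a flat, noise-like spectrum.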

Histogram Analysis
The HR value obtained from a single sequence calculation is still subject to variation, and hence a histogram-based analysis is performed to increase the accuracy of the estimated heart rate. The histogram analysis is done by repeatedly selecting another pair of random points from the ROI, extracting the PPG signal from the green channel, applying the filters to the PPG signal, computing the confidence ratio, computing the heart rate value and adding it to a histogram. Lastly, the final heart rate estimate is obtained from the histogram.
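A minimal sketch of the histogram step is given below. The text does not specify the bin width or how the final value is read from the histogram, so this example assumes 5 bpm bins, an optional confidence threshold, and a final estimate equal to the mean of the values in the most populated bin. All names and numbers are illustrative.

```python
from collections import defaultdict

def histogram_hr(estimates, bin_width=5.0, min_confidence=0.0):
    """Combine repeated (hr_bpm, confidence) estimates via a histogram.

    Assumed rule: keep estimates at or above min_confidence, bin them,
    and return the mean of the estimates in the most populated bin.
    """
    bins = defaultdict(list)
    for hr, conf in estimates:
        if conf >= min_confidence:
            bins[int(hr // bin_width)].append(hr)
    if not bins:
        raise ValueError("no estimate passed the confidence threshold")
    best = max(bins.values(), key=len)
    return sum(best) / len(best)

# Hypothetical repeated estimates: three agree, two are outliers.
estimates = [(71.0, 0.8), (72.0, 0.7), (73.0, 0.9), (95.0, 0.2), (120.0, 0.1)]
final_hr = histogram_hr(estimates)
```

Taking the mode of the histogram rather than the plain average keeps isolated outliers, such as the 95 and 120 bpm values here, from shifting the final estimate.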

RESULTS AND ANALYSIS
This section discusses the performance of the proposed system. In the experiment, face videos were recorded in an indoor environment under controlled lighting conditions. The videos were recorded at a resolution of 1440*1080 pixels and 60 FPS, with the participants at distances of 1 meter, 3 meters and 5 meters from the camera. During the experiment, the subjects were allowed to exhibit normal static motion. Eight subjects of different races and skin colours were involved in this experiment. Figures 5 and 6 show the experimental setup used for the proposed framework.
Three analyses were conducted to test the performance of the proposed system. The first analysis determined the heart rate value using the proposed system at various distances, the second identified the patch regions that give more accurate heart rate readings, and the last observed the effect of different skin tones on the estimated HR. Since the analyses focus on heart rate at various distances, the distance in the experiments ranges up to five meters. Quantitative measures, namely accuracy and percent error, are calculated from the measured value, which is the heart rate obtained from the camera, and the actual value, which is the heart rate obtained using a pulse oximeter (ground truth).
The analysis is conducted to determine the optimal relationship between distance and face ROI that gives the highest accuracy of the estimated HR value. As mentioned in the system overview, four ROIs were selected for this experiment: right cheek, center face, whole face and lower face. All of the patches are used in each of the videos taken at the three distances explained above, and the results are shown in Table 1.
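The accuracy and percent error measures can be sketched as follows. The defining equation is not reproduced in this text, so the conventional definitions are assumed: percent error is the absolute difference between measured and actual values relative to the actual value, and accuracy is 100 minus the percent error. The example values are hypothetical and are not taken from the experiment.

```python
def percent_error(measured, actual):
    """Absolute relative error in percent (assumed conventional definition)."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(measured - actual) / abs(actual) * 100.0

def accuracy(measured, actual):
    """Accuracy in percent, assumed to be 100 minus the percent error."""
    return 100.0 - percent_error(measured, actual)

# Hypothetical reading: camera estimate of 76 bpm against a pulse oximeter reading of 80 bpm.
err = percent_error(76.0, 80.0)
acc = accuracy(76.0, 80.0)
```

Under these definitions, an overestimate and an underestimate of the same size give the same accuracy.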
From the table, at 1 meter the most accurate HR was generated from the whole face region, with 94.11% accuracy. This happens because the selected region was the largest of the four regions.
A larger ROI means that more information about the raw PPG signal can be extracted, resulting in a more accurate HR reading. At the 3 meter distance, the most accurate HR was obtained from the center of the face, with 90.09% accuracy. Like the whole face region, the center patch is one of the larger regions, and it can generate an accurate HR reading even from a far distance. For the 5 meter case, which is the farthest distance from the camera, the most accurate HR was again obtained from the whole face area, with 91.53% accuracy. As before, this is attributed to the larger region available for PPG signal extraction, which makes the obtained PPG signal more accurate.
From the obtained data, for a single person's HR reading over a 5 meter range, it can be deduced that the ROI producing the least accurate reading was the patch placed on the right cheek, with an average accuracy of 84.11%, while the optimal ROI is the whole face region, with an average accuracy of 91.86%. One of the reasons for the inaccuracy of the HR reading is the size of the ROI, which contributes to information loss during the signal extraction process. However, in general all of the selected patches are capable of producing around 80% accurate HR estimation at each of the distances. Finally, the third analysis concerned the effect of the subjects' skin tones on the estimated HR. As mentioned previously, the volunteer subjects for this experiment came from different races, namely Malay, Chinese and Indian. The HR results corresponding to the skin tones are shown in Table 2.