Facial Expression Recognition using Multiclass Ensemble Least-Square Support Vector Machine

Facial expression is one of the behavioral characteristics of human beings. A biometric system based on facial expression characteristics makes it possible to recognize a person's mood or emotion. The basic components of a facial expression analysis system are face detection, face image extraction, facial classification, and facial expression recognition. This paper uses the Principal Component Analysis (PCA) algorithm to extract facial features for six expression classes, i.e., happy, sad, neutral, angry, fearful, and disgusted. A Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM) is then used for the facial expression classification process. On our 185 expression images of 10 persons, the MELS-SVM model achieved a high accuracy of 99.998% using the RBF kernel.


Introduction
In everyday human life, facial expression is a natural response that conveys the feelings a person experiences when interacting with something. In interaction between human beings, expression is used as a part of communication. On the other hand, technology has made great leaps from time to time whenever a new mode of interaction is introduced, from keys, monitor screens, GUIs, and touch screens to voice commands. The key to technological development lies in the user; therefore, human-computer interaction is an important part of technological development, and understanding the user's response becomes very important. One approach to understanding the user's response is a facial expression recognition system. By understanding the expression of the user, the computer can behave as if it understands human feelings.
Research on facial expressions started in 1978 [1], was extensively pursued in computer science during the 90s, and continues to the present day. An expression recognition system consists of two important stages, namely facial feature representation and classifier design. Facial feature representation is obtained by extracting features from an input digital image using a particular method. The feature extraction method used in this work to build the facial expression recognition system is the Principal Component Analysis (PCA) algorithm. PCA is a method that can process the image of a person's face so that the system automatically recognizes the face through its main features, such as the eyes, nose, lips, and eyebrows, as identity [2]. The identity of a person's face image is recognized by the system through training samples stored in a database. The training phase collects the extraction results from different face images and stores them in the database. Face images extracted with the PCA algorithm are then compared with a new face image, used as the test image, to determine whether it is similar, or almost similar, enough to be recognized by the system [3].

Related Works
Yubo Wang et al. [4] proposed a new method for recognizing facial expressions. The facial expressions are extracted from the human face by an expression classifier learned from Haar features with LUT weak classifiers. Their expression recognition system consists of three modules: face detection, facial feature landmark extraction, and facial expression recognition. The system can automatically recognize seven expressions in real time: anger, disgust, fear, pleasure, neutrality, sadness, and surprise. Reda Shbib and Shikun Zhou [5] studied facial expression recognition using the Active Shape Model, adopting an AdaBoost classifier and Haar-like features to detect faces. AdaBoost is a method to increase classifier accuracy obtained from component learners such as the Support Vector Machine (SVM) [6,7,8]. Data mining is a process that uses statistical techniques, computation, artificial intelligence, and machine learning to extract and identify useful information and related knowledge from large databases [9]. SVM initially could only classify data into two classes [8,10,11]. However, in further research SVM was developed so that it can classify data into more than two classes (multiclass) [12,13,14]. Classifying $M$ classes means predicting the class labels $y_i \in \{1, \dots, M\}$; one way to solve the $M$-class problem is to formulate it as $L$ binary classification problems [15]. The SVM concept is, simply put, to find the best hyperplane that separates two classes in the input space, with patterns belonging to the two classes labeled +1 and -1 and separated by parallel bounding planes. The best separating hyperplane not only separates the data but also has the largest margin, where the margin is the distance between the separating hyperplane and the closest pattern of each class.
Let $\{x_1, \dots, x_n\}$ be the dataset and $y_i \in \{+1, -1\}$ the class label of each data point $x_i$. A pair of parallel bounding planes separates the two classes: the first bounding plane limits the first class while the second limits the second class, so that [16]

$$w \cdot x_i + b \ge +1 \quad \text{for } y_i = +1, \qquad w \cdot x_i + b \le -1 \quad \text{for } y_i = -1.$$
The best separating hyperplane has the largest margin, which can be formulated as the quadratic programming problem

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

with constraints $y_i(w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$, where $\xi_i$, $i = 1, \dots, n$, is a slack variable that determines the level of misclassification of the data samples, and $C > 0$ is a penalty parameter. A kernel method is used when the ordinary SVM classifier cannot separate the data linearly. The kernel method transforms the data into a feature space in which it can be linearly separated, and can be formulated as

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j).$$

LS-SVM is one of the SVM modifications that is solved as a linear system [7,8,11,13,14]. If the SVM separating plane is given as above, then the LS-SVM problem is

$$\min_{w, b, e} \; \frac{1}{2}\|w\|^2 + \frac{\gamma}{2} \sum_{i=1}^{n} e_i^2 \quad \text{subject to} \quad y_i(w \cdot \phi(x_i) + b) = 1 - e_i, \;\; i = 1, \dots, n. \tag{1}$$

The above problem can be solved with Lagrangian multipliers,
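As a concrete illustration of the kernel method described above, the following sketch computes an RBF (Gaussian) kernel matrix. The paper's experiments use MATLAB; this is a minimal Python/NumPy equivalent, with the kernel width σ = 0.5 taken from the experimental section.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=0.5):
    """RBF kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-sq_dist / (2.0 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X)
# Diagonal entries are 1 (zero distance); off-diagonal entries
# decay with the squared distance between the points.
```

Note that the kernel matrix is all the classifier needs: the mapping φ never has to be computed explicitly.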
where the Lagrangian multipliers $\alpha_i$ can be either positive or negative.
To obtain the optimality conditions, we set the partial derivatives of the Lagrangian with respect to $w$, $b$, $e_i$, and $\alpha_i$ equal to zero, which gives

$$w = \sum_{i=1}^{n} \alpha_i y_i \phi(x_i), \qquad \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \alpha_i = \gamma e_i, \qquad y_i(w \cdot \phi(x_i) + b) = 1 - e_i.$$
Using the One-Against-All (OAA) method, $k$ binary models can be constructed (where $k$ is the number of classes). Each $i$-th model is trained on the entire dataset, with class $i$ labeled +1 and all other classes labeled -1. For example, for a classification problem with 3 classes, the training step constructs 3 binary classifiers, each with an objective function of the LS-SVM form above. A confusion matrix is a table that records the number of test samples that are correctly classified and misclassified. The table uses the following quantities to calculate accuracy, sensitivity, and false discovery rate. True Positive (TP) is the number of samples from class 1 correctly classified as class 1. True Negative (TN) is the number of samples from class 0 correctly classified as class 0. False Positive (FP) is the number of samples from class 0 incorrectly classified as class 1. False Negative (FN) is the number of samples from class 1 misclassified as class 0.
Performance in terms of accuracy, sensitivity, and false discovery rate can be calculated using the following formulas, respectively:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{FDR} = \frac{FP}{FP + TP}.$$
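The three measures above can be computed directly from the four confusion-matrix counts. A minimal Python sketch (the counts used in the usage example are hypothetical, not taken from the paper's experiments):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall), and false discovery rate
    from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    fdr = fp / (fp + tp)
    return accuracy, sensitivity, fdr

# Hypothetical counts for illustration only.
acc, sens, fdr = classification_metrics(tp=45, tn=50, fp=2, fn=3)
# acc = 0.95, sens = 0.9375, fdr = 2/47
```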

Experimental Details
The process of facial classification is explained by the following activities:

Principal Component Analysis (PCA)
The sequence of features extraction process using PCA is given as follows.
1. Read the input data.
2. Calculate the average of each column. All training images form the input parameter "ImageTraining" of the PCA function. The column means of an image can be calculated with the following Matlab code:
I = imread('YM.AN1.61.tiff');
Average = mean(I);

Calculate the zero mean
3. To calculate the zero mean, the row of column averages must be replicated as many times as there are samples (240 rows) and subtracted from the training data; bsxfun performs this replication implicitly:
ZeroMean = bsxfun(@minus, double(ImageTraining), Average);
4. Calculate the covariance. The covariance matrix is obtained by multiplying the zero mean by its transpose and dividing the result by (NumberOfSamples - 1):
CovMatric = (1/(240-1)) * (ZeroMean * ZeroMean');

Calculate eigen value and eigen vector
5. The resulting covariance matrix has 240 rows and 240 columns. The dimension has thus been reduced significantly, from 10304 to 240, a decrease of 10064 (10304 - 240 = 10064). The covariance matrix is then used to determine the eigenvalues and eigenvectors with the command:
[eigvector, eigvalue] = eig(CovMatric);

Sort the eigen value in descending order
6. To sort the eigenvalues in descending order, take the diagonal of the eigenvalue matrix and sort it. The sorted eigenvalues are stored in the "Results" variable and the sorting index in the "Index" variable:
eigvalue = diag(eigvalue);
[Results, Index] = sort(eigvalue, 'descend');
eigvalue = eigvalue(Index);
7. Reorder the eigenvector columns to follow the new eigenvalue order.
8. Check for eigenvalues close to 0. The least dominant eigenvectors tend to correspond to eigenvalues close to zero, so the eigenvalues must be checked and those close to zero counted:
NotDominant = find(eigvalue < 0.00000000001);
Amount = length(NotDominant);
9. Calculate the projection matrix. The projection matrix is obtained by multiplying the transposed zero-mean data by the eigenvectors:
ProjectionMatric = ZeroMean' * eigvector;
The result is then normalized so that each column of the projection matrix has unit length:
RootSumProjectionMatric = (1 ./ (sum(ProjectionMatric .^ 2) .^ 0.5));
ProjectionMatric = bsxfun(@times, ProjectionMatric, RootSumProjectionMatric);
10. Discard the eigenvector columns whose positions correspond to eigenvalues close to 0. If such eigenvalues are found, the corresponding eigenvector columns need not be used in the next process:
eigvector = eigvector(:, 1:end-Amount);
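The PCA steps above can be sketched end to end as follows. This is a Python/NumPy transcription of the Matlab procedure, not the paper's own code; the toy data (10 "images" of 64 pixels) and the tolerance for near-zero eigenvalues are illustrative assumptions.

```python
import numpy as np

def pca_projection(X, tol=1e-11):
    """PCA via the small N x N covariance trick used in the text:
    with N images of D pixels (N << D), eigendecompose the N x N matrix."""
    N = X.shape[0]
    mean = X.mean(axis=0)                         # step 2: column means
    zero_mean = X - mean                          # step 3: zero-mean data
    cov = (zero_mean @ zero_mean.T) / (N - 1)     # step 4: N x N covariance
    eigval, eigvec = np.linalg.eigh(cov)          # step 5: eigendecomposition
    order = np.argsort(eigval)[::-1]              # steps 6-7: descending sort
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > tol                           # steps 8, 10: drop near-zero modes
    eigvec = eigvec[:, keep]
    proj = zero_mean.T @ eigvec                   # step 9: D x k projection matrix
    proj /= np.sqrt((proj**2).sum(axis=0))        # normalize columns to unit length
    return mean, proj

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 64))      # 10 toy "images" of 64 pixels each
mean, proj = pca_projection(X)
features = (X - mean) @ proj       # PCA features fed to the classifier
```

Because the rows of the zero-mean matrix sum to zero, one eigenvalue is numerically zero and its eigenvector is discarded, so 10 samples yield at most 9 components.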

Dataset
We conducted a simulation of the proposed method, i.e., face detection, feature extraction using the PCA algorithm, and facial expression classification using the Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM). The data used in this simulation are facial expression images from the Olivetti Research Laboratory (ORL) and the JAFFE database. The expressions are divided into six classes, i.e., angry, disgust, fear, happy, neutral, and sad. The dataset contains 240 facial expression images.

Classification Process
The expression classification process starts by reading the image from a file; the image is then processed with image processing software to obtain numerical values, which are used as input data for the learning and validation process. After learning, the resulting model is used for the testing process.

Multiclass Algorithm Method
The analysis used for this multiclass classification is the One-Against-All classification method. The multiclass algorithm is as follows:
1. Input the dataset.
2. Identify the input dataset:
   a. the feature values of the training data (xi),
   b. the classes of the training data (yi),
   c. the feature values of the test data (xti),
   d. the classes of the test data (yti).
3. Initialize the LS-SVM objects before the training process with the initlssvm function:
   a. specify the training data features (xi),
   b. specify the training data classes (yi),
   c. choose a classifier to classify the data,
   d. select the kernel and its parameters.
4. Select the multiclass encoding to use (code_OneVsAll for One-Against-All).
5. Run the training process with the trainlssvm function.
6. Calculate the values of w.
7. Make predictions for the test data features (xti) based on the obtained model with the simlssvm function.
8. Create a confusion matrix.
9. Calculate the accuracy with the formula Accuracy = C / N, where C is the number of correct predictions and N is the total number of data tested.
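The steps above rely on the LS-SVMlab toolbox in Matlab. As a self-contained illustration of the same One-Against-All LS-SVM idea, the following Python/NumPy sketch trains one binary LS-SVM per class by solving the standard LS-SVM linear (KKT) system and picks the class with the largest score. The toy three-cluster data, γ = 10, and σ = 0.5 are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def rbf(X1, X2, sigma=0.5):
    """RBF kernel matrix between two sets of points."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma**2))

def train_ls_svm(X, y, gamma=10.0, sigma=0.5):
    """Binary LS-SVM: solve the KKT linear system
    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(y)
    Omega = np.outer(y, y) * rbf(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def one_vs_all_fit_predict(X, labels, Xt, gamma=10.0, sigma=0.5):
    """One binary LS-SVM per class; predict the class with the largest
    score f(x) = sum_i alpha_i y_i K(x, x_i) + b."""
    classes = np.unique(labels)
    scores = []
    for c in classes:
        y = np.where(labels == c, 1.0, -1.0)
        alpha, b = train_ls_svm(X, y, gamma, sigma)
        scores.append(rbf(Xt, X, sigma) @ (alpha * y) + b)
    return classes[np.argmax(np.column_stack(scores), axis=1)]

# Toy 3-class problem: three well-separated clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(20, 2)) for m in ([0, 0], [3, 0], [0, 3])])
labels = np.repeat([0, 1, 2], 20)
pred = one_vs_all_fit_predict(X, labels, X)
accuracy = (pred == labels).mean()
```

The key LS-SVM property used here is that the equality constraints turn training into a single linear solve rather than a quadratic program.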

Implementation Results on the Dataset
The separating function for the One-Against-All method on this dataset uses the RBF (Radial Basis Function) kernel with parameter σ = 0.5.

Results and Discussion
The values of w and b for each kernel type and parameter setting when using the One-Against-All method on the expression dataset can be seen in Table 1. Based on Table 1 and Figure 1, the RBF kernel with parameter σ = 0.5 achieves the highest accuracy on the dataset, 99.99832%, for classification with 6 classes.

Conclusion
A facial expression recognition method has been proposed using the Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM). The face images are extracted using the Principal Component Analysis (PCA) algorithm to obtain the projection matrix and weight matrix. The classification process uses the MELS-SVM to construct the classifier model from the training images. Finally, the test images are used to measure accuracy; with the RBF kernel, our results showed an accuracy of 99.99832%.