Emotion Recognition by Facial Expression Using Deep Learning

Facial expression is the visible manifestation of the emotional state, personality, intentions, cognitive activity, and psychopathology of an individual, and it plays an important role in social interaction. Automatic recognition of facial expressions can be a vital component of natural human-machine interfaces. It can also be applied in behavioral science and in clinical practice. A fully automatic facial expression recognition system must perform detection and location of faces in a cluttered scene, facial feature extraction, and facial expression classification. Emotion recognition by facial expression here utilizes a deep learning system implemented with a Convolutional Neural Network (CNN). The CNN model of the project is based on the LeNet architecture. The Kaggle facial expression dataset with seven expression labels (fear, anger, happy, surprise, sad, neutral, and disgust) is used in this project. The system achieved 60.37% accuracy.


Introduction
A facial expression is the visible display of the personality, psychopathology, cognitive activity, and intentions of a person, and it plays an important role in interpersonal relations. Human facial expressions are classified into seven basic emotions: fear, anger, happy, surprise, sad, neutral, and disgust. Facial emotions are expressed through the activation of specific sets of facial muscles. A plentiful amount of information about our state of mind is carried by sometimes subtle, yet complex, signals in an expression.
Automatic recognition of facial expressions can be an important constituent of natural human-machine interfaces, and it may also be used in behavioral science and in clinical practice. Although the problem has been studied for a long time and has seen much progress in recent decades, identifying facial expressions with high accuracy remains difficult because of their complexity and variety.
In day-to-day life, humans commonly identify emotions by characteristic features that are presented as part of a facial expression. For example, happiness is readily associated with a smile, an upward movement of the corners of the lips. In the same way, other emotions are characterized by other deformations typical of the particular expression. Research into automatic recognition of facial expressions addresses the problems surrounding the representation and classification of the static or dynamic characteristics of these deformations of facial features.
A convolutional neural network (CNN, or ConvNet) is a type of feed-forward artificial neural network used in machine learning. The connectivity pattern between its neurons is inspired by the organization of the visual cortex: individual cortical neurons respond to stimuli in a restricted region of space known as the receptive field. The receptive fields of different neurons partially overlap such that they tile the visual field. Within its receptive field, the response of an individual neuron to stimuli can be approximated mathematically by a convolution operation. Convolutional networks were thus inspired by biological processes. These networks may be considered a variation of the multilayer perceptron, designed to use a minimal amount of pre-processing.
Convolutional networks have broad applications in image and video recognition, recommender systems, and natural language processing. The convolutional neural network is also known as a shift invariant or space invariant artificial neural network (SIANN), a name based on its translation invariance characteristics and its shared-weights architecture.
LeNet is one of the very first convolutional neural networks, one of the networks that helped propel the field of deep learning. This pioneering work by Yann LeCun [3][5] was named LeNet-5. It was used mostly for character recognition tasks such as reading digits and zip codes. The basic architecture of LeNet is outlined below.
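As an illustration, the layer-by-layer feature-map sizes of the classic LeNet-5 design (32 x 32 input, 5 x 5 filters, 2 x 2 subsampling) can be traced with a short sketch; the helper names are illustrative, and the sizes follow the same valid-convolution and pooling formulas used later in this paper.

```python
def conv_size(s, f):
    """Output side length of a valid convolution: s - f + 1."""
    return s - f + 1

def pool_size(s, p):
    """Output side length after non-overlapping p x p pooling."""
    return s // p

# Classic LeNet-5 feature-map sizes for a 32 x 32 grayscale input.
s = 32
s = conv_size(s, 5)   # C1: 6 maps of 28 x 28
c1 = s
s = pool_size(s, 2)   # S2: 6 maps of 14 x 14
s = conv_size(s, 5)   # C3: 16 maps of 10 x 10
s = pool_size(s, 2)   # S4: 16 maps of 5 x 5
print(c1, s)          # 28 5
```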

Objective
The objective of this paper is to design a convolutional neural network deep learning model that can detect human emotion in real time from facial expressions. Just as humans can identify the emotions of others with minimal effort, automatic detection of emotion from a human face is desirable, and facial emotion recognition can serve as the basis for many real-time applications.

Literature Review
Two distinct processes used for facial expression recognition involve different methodologies. The primary and essential difference between them is whether the face is divided into separate action units or kept intact for further processing.
In each of these processes, different methodologies, namely the 'geometric-based' and the 'appearance-based' parameterizations, may be used.
In appearance-based parameterization, rather than tracking spatial points and using movement and positioning parameters that vary over time, the color (pixel) information of relevant regions of the face is processed in order to obtain the parameters that form the feature vectors. For the classification problem, algorithms such as neural networks, support vector machines, naive Bayes, and deep learning are used.

Previous Work
A model has been proposed for robust facial expression recognition using local binary patterns [1]. A facial expression recognition system has also been built [2].
Work has also been done on subject-independent facial expression recognition with robust face detection using a convolutional neural network [4].
A facial expression recognition system has been built upon recent research to classify images of human faces into discrete emotion categories using a convolutional neural network [6]. A facial expression recognition system has also been developed using convolutional neural networks based on the Torch model [7].
The "emotion" column contains a numeric code ranging from 0 to 6, inclusive, for the emotion that is present in the image. The "pixels" column contains a string surrounded by quotes for each image. The contents of this string are space-separated pixel values in row-major order. The test file contains only the "pixels" column, and the task is to predict the emotion column. The training set consists of 28,709 examples. The test set consists of 3,589 examples, and the validation set consists of another 3,589 examples.
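Decoding the "pixels" string into a 48 x 48 image takes only a few lines of NumPy; the function name `parse_pixels` is illustrative and not taken from the original code.

```python
import numpy as np

def parse_pixels(pixel_string, size=48):
    """Decode a space-separated pixel string (row-major order)
    into a normalized size x size grayscale image."""
    values = np.array(pixel_string.split(), dtype=np.float32)
    return values.reshape(size, size) / 255.0

# Example with a synthetic 48 x 48 pixel string.
row = " ".join(str(v % 256) for v in range(48 * 48))
img = parse_pixels(row)
print(img.shape)   # (48, 48)
```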

Architecture of CNN
The convolutional neural network's architecture contains an input layer, some convolutional layers, some fully connected layers, and an output layer. The CNN is designed with some amendments to the LeNet architecture [7]. It has six layers, not counting the input and output layers. The architecture of the convolutional neural network used here is shown in Figure 1.2. The input layer has pre-determined, fixed dimensions, so each image must be pre-processed before being fed into the layer. Normalized grayscale images of size 48 x 48 pixels from the Kaggle dataset are used for training, validation, and testing. Laptop webcam images are also used for testing; in these, the face is detected and cropped using the OpenCV Haar cascade classifier and then normalized.

Convolution and Pooling (ConvPool) Layers
Convolution and pooling are done batch-wise. Each batch has N images, and the CNN filter weights are updated on those batches. Each convolution layer takes an image batch input of four dimensions: N x color-channels x width x height. The feature maps (filters) for convolution are also four-dimensional (filter width, filter height, number of feature maps in, number of feature maps out). A four-dimensional convolution is computed between the image batch and the feature maps in every convolution layer. After convolution, the only parameters that change are the image height and width.
New image width = old image width - filter width + 1; new image height = old image height - filter height + 1.
For dimensionality reduction, down-sampling (sub-sampling) is done after each convolution layer. This process is called pooling. Max pooling and average pooling are two well-known pooling techniques. In this paper, max pooling is done after each convolution. A pool size of 2 x 2 is taken, which splits the image into a grid of blocks, each of size 2 x 2, and takes the maximum of the 4 pixels. After pooling, only the height and width are affected.
In the architecture, two convolution layers and two pooling layers are used. At the first convolution layer, the size of the input image batch is N x 1 x 48 x 48: the batch size is N, the number of color channels is 1, and both image height and width are 48 pixels. Convolution with a feature map of size 1 x 20 x 5 x 5 produces an image batch of size N x 20 x 44 x 44. After convolution, pooling is done with a pool size of 2 x 2, which yields an image batch of size N x 20 x 22 x 22. This is followed by the second convolution layer with a feature map of size 20 x 20 x 5 x 5, which produces an image batch of size N x 20 x 18 x 18. This is followed by a pooling layer with pool size 2 x 2, which yields an image batch of size N x 20 x 9 x 9.
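The shape arithmetic above can be checked directly: each stage applies the valid-convolution formula followed by 2 x 2 max pooling.

```python
def conv_pool(h, w, f=5, p=2):
    """Apply one f x f valid convolution, then p x p max pooling."""
    h, w = h - f + 1, w - f + 1   # convolution shrinks by f - 1
    return h // p, w // p         # pooling halves each side

h, w = 48, 48
h, w = conv_pool(h, w)    # first ConvPool stage:  22 x 22
h, w = conv_pool(h, w)    # second ConvPool stage: 9 x 9
print(h, w, 20 * h * w)   # 9 9 1620 (flattened size fed to the FC layers)
```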
Fully Connected Layer
This layer is inspired by the way neurons transmit signals through the brain. It takes a large number of input features and transforms the features through layers connected with trainable weights. Two hidden layers of size 500 and 300 units are used in the fully connected part. The weights of these layers are trained by backward propagation of errors and forward propagation of training data. The difference between the prediction and the true value is estimated, and the weight adjustment needed for each preceding layer is determined by the back-propagation method. By tuning hyperparameters such as network density and learning rate, the complexity of the design as well as the training rate can be controlled. The momentum, learning rate, decay, and regularization parameters are the hyperparameters for this layer.
The output of the second pooling layer is of size N x 20 x 9 x 9, while the input of the first hidden layer of the fully connected part is of size N x 500. The output of the pooling layer is therefore flattened to size N x 1620 and fed to the first hidden layer. The first hidden layer's output is fed to the second hidden layer, of size N x 300, whose output is fed to the output layer, whose size equals the number of facial expression classes.
Output Layer
The output from the second hidden layer is connected to an output layer with seven distinct classes. The output is obtained using the probabilities for each of the seven classes, computed with the Softmax activation function. The predicted class is the class with the highest probability.
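A minimal NumPy sketch of the fully connected head described above (1620 -> 500 -> 300 -> 7 with a softmax output). The weights are random placeholders rather than trained parameters, and the tanh activation is an illustrative choice, not one stated in the paper.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N = 4                                   # batch size
x = rng.standard_normal((N, 1620))      # flattened output of the second pooling layer

# Hidden layers of 500 and 300 units, output layer of 7 classes.
W1 = rng.standard_normal((1620, 500)) * 0.01
W2 = rng.standard_normal((500, 300)) * 0.01
W3 = rng.standard_normal((300, 7)) * 0.01

h1 = np.tanh(x @ W1)
h2 = np.tanh(h1 @ W2)
probs = softmax(h2 @ W3)                # one probability row per image
pred = probs.argmax(axis=1)             # class with the highest probability
print(probs.shape)                      # (4, 7)
```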

Results And Analysis
The CNN architecture for facial expression recognition is implemented in Python. Along with the Python programming language, the NumPy, Theano, and CUDA libraries were used.
For both convolution layers, the training image batch size is taken as 30 and the filter map is of size 20 x 5 x 5. To validate the training process, a validation set was used. In the last batch of every epoch, the validation cost, validation error, training cost, and training error are calculated. The image set and the corresponding output labels are taken as input parameters for training.
The training process updated the weights of the feature maps and hidden layers based on hyperparameters such as momentum, learning rate, decay, and regularization. In this system, a batch-wise learning rate of 10e-5 was used, with momentum 0.99, regularization 10e-7, and decay 0.99999. 6,000 images were used to test the model. The classifier gave 56.77% accuracy. The confusion matrix for the seven facial expression classes is shown in Table 1. The table shows that this model correctly predicts anger 246 times out of 457 images, while it predicted 1 image of the anger class as disgust, 45 images of anger as fear, and so on for the other emotions. The correctly predicted values are the values on the diagonal of the matrix for each emotion; for example, for disgust it is 28 and for fear it is 194, out of the total number of images of that particular emotion. On the basis of the confusion matrix, we calculate the error rate and accuracy of our model by finding recall, precision, and F1-score.
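These metrics follow directly from the confusion matrix; the small 3 x 3 matrix below is synthetic, for illustration only (it is not Table 1).

```python
import numpy as np

def metrics(cm):
    """Per-class precision, recall, and F1 from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    tp = np.diag(cm).astype(float)            # diagonal = correct predictions
    precision = tp / cm.sum(axis=0)           # correct / predicted-as-class
    recall = tp / cm.sum(axis=1)              # correct / actually-in-class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy

cm = np.array([[50,  5,  5],
               [10, 40, 10],
               [ 5,  5, 50]])
p, r, f1, acc = metrics(cm)
print(round(acc, 4))   # 0.7778
```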
The precision, recall, and F1-score of each facial expression class are shown in Table 2. The overall precision and recall are 0.57 and 0.57 respectively. The model performs quite well on classifying positive emotions, resulting in relatively high precision scores for happy and surprised. Disgust has the highest precision and recall, at 0.95 and 0.99, because images in this class were oversampled to address class imbalance. The precision of happy is 0.69 and its recall is 0.68, which can be explained by the large number of examples (6,500) in the training set.
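Oversampling a minority class, as done here for disgust, can be sketched by duplicating randomly chosen samples of that class until it reaches a target count; the function and variable names below are illustrative, not from the original code.

```python
import numpy as np

def oversample(images, labels, minority, target, seed=0):
    """Duplicate random samples of the minority class until it has
    `target` examples; other classes are left untouched."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(labels == minority)
    extra = rng.choice(idx, size=target - idx.size, replace=True)
    return (np.concatenate([images, images[extra]]),
            np.concatenate([labels, labels[extra]]))

labels = np.array([0, 0, 0, 0, 1])    # class 1 is under-represented
images = np.zeros((5, 48, 48))
X, y = oversample(images, labels, minority=1, target=4)
print((y == 1).sum())   # 4
```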
In addition, the precision of surprise is 0.69 and its recall is 0.65, even though it has the fewest examples in the training set; the surprise expression must carry very strong signals. On average, model performance is weaker across the negative emotions. In particular, the emotion sad has a low precision of only 0.43 and a recall of 0.39. The model often misclassified fear, angry, and neutral faces as sad. The model is also most confused when required to predict neutral and sad faces, because these two emotions are perhaps the least expressive ones (excluding faces with crying expressions).
The overall F1-score is also 0.57. The F1-score is highest for disgust due to the oversampling of its images. Happy and surprise have higher F1-scores of 0.69 and 0.67 respectively. Fear has the lowest F1-score of 0.39, and sad, anger, and neutral also have low F1-scores.

Conclusion
The facial expression recognition system presented in this research work contributes a robust face recognition model based on the mapping of behavioral characteristics to the physiological biometric characteristics. The intrinsic characteristics of the human face relevant to the various expressions, such as happiness, sadness, anger, fear, disgust, and surprise, are associated with geometric structures, which are reused as the base matching template for the recognition system.
Here a LeNet-architecture-based six-layer convolutional neural network is implemented to classify human facial expressions, i.e., happy, surprise, sad, fear, disgust, anger, and neutral. The system has been evaluated using accuracy, precision, recall, and F1-score. The classifier achieved an accuracy of 60.37%, a precision of 0.57, a recall of 0.57, and an F1-score of 0.57.

Future Scope
In future work, the model can be extended to color images. This will allow examining the effectiveness of pre-trained models such as AlexNet or VGGNet for facial emotion recognition.