Identification of hand motion using background subtraction method and extraction of image binary with backpropagation neural network on skeleton model

Human motion is captured and recorded mostly for applications in sports, health, animated films, criminology, and robotics. This study combines background subtraction with a back propagation neural network in order to identify similar movements. The data were acquired with an 8 MP camera as an MP4 video with a duration of 48 seconds at 30 frames per second. Extracting the video produced 1444 frames, which were then passed to the hand motion identification process. The image processing consists of three phases: segmentation, feature extraction, and identification. Segmentation uses background subtraction, and the extracted features are used to distinguish one object from another. Feature extraction is performed by motion-based morphological analysis using seven invariant moments, producing four motion classes: no object, hand down, hand to side, and hands up. The identification process recognizes the hand movement from these seven inputs. Training and testing with a variety of parameters showed that the architecture with one hundred neurons in the hidden layer gives the highest accuracy. This architecture is used to propagate the input values in the system implementation with a user interface. The identification of the type of human movement achieved a highest accuracy of 98.5447%. The training process is carried out to obtain the best results.


Introduction
Computer vision is a field that aims to make useful decisions about real physical objects and scenes based on images. It combines image processing and pattern recognition, and its output is an understanding of the image. The field is developed by adapting the ability of human vision to take in information. As a discipline, computer vision draws on supporting theories such as artificial intelligence to extract information from the displayed image. Image data can be obtained from video, from one or more cameras, or as multidimensional data from a scanner. The areas of computer vision include pattern recognition and classification, image segmentation, image restoration, and other related areas. Digital image processing is widely used to study matters relating to image quality improvement (contrast enhancement, color transformation, image restoration), image transformation, the selection of image features that are optimal for analysis, the extraction of information or object descriptions, the recognition of objects contained in the image, and compression or reduction to shorten data processing time, with the results of image processing serving as input [13]. The feature extraction process in image processing is used to recognize the objects in an image or video. Through feature extraction, the existing classes can be determined and significant feature areas of an image or video object can be located based on its intrinsic elements [14]. Feature extraction is generally used to identify the characteristics that form the best representation of an object, so that one object can be distinguished from another.
Shape feature extraction is used to match an image or object against other objects within the region used. The process involves computing a number of characteristic feature values of an object's shape that are independent of size or orientation. Shape features can be calculated for each object identified in each image or stored image. Two types of features are used: global features (including aspect ratio, circularity, and moment invariants) and local features (sequential boundary segments). Background subtraction is one of the methods used to separate one object from another by removing the background image. Also known as foreground detection, it is one of the techniques in digital image processing and computer vision for detecting or retrieving the foreground from the corresponding background, based on the movement of humans, text, and other objects [3]. In general, background subtraction is used to detect moving objects in an image or video based on the difference between the reference background and the frames extracted from the video. Identification is the determination of the identity of people and objects [15] [10]. In general, identification is defined as assigning a mark to a class of goods or things in order to distinguish one component from another. An artificial neural network is an information processing system whose characteristics resemble those of biological nerves. Artificial neural networks are used for pattern recognition, signal processing, and forecasting [6].

Digital Image Processing
Digital image processing is a discipline that studies matters relating to image quality improvement (contrast enhancement, color transformation, image restoration), image transformation, the selection of image features that are optimal for analysis, the extraction of information or object descriptions, the recognition of objects contained in the image, and compression or reduction for the purposes of data storage, data transmission, and data processing time.

Computer Vision
Computer vision is a combination of image processing and pattern recognition; its output is an understanding of the image. The areas of computer vision include recognition, motion, image restoration, and other related areas. The most common functions performed include image acquisition, preprocessing, feature extraction, detection, segmentation, high-level processing, and decision making [7]. Figure 1 shows the stages of image processing. The initial stage is image acquisition, which aims to obtain a digital image, determine the data needed, and choose the method of recording the digital image. The steps at this stage include selecting the object to be photographed and preparing the tools and actors for imaging. The result of image acquisition is strongly influenced by the ability of the sensor to digitize the signal it receives. Preprocessing relates to image quality improvement, noise removal, image transformation, and determining the part of the image to be observed. Segmentation aims to partition the image into subsections that contain important information, such as separating objects from backgrounds. Representation and description represent a region as a list of coordinate points on a closed curve and describe the image through feature selection and feature extraction. Recognition and interpretation label the objects and give meaning to groups of recognized objects. The last stage, the knowledge base, guides the operation of each processing module and controls the interaction between modules, for example in template matching or pattern recognition.

Background Subtraction
Background subtraction is one of the methods used to separate one object from another by removing the background image. Also known as foreground detection, it is one of the techniques in digital image processing and computer vision for detecting or retrieving the foreground from the corresponding background, based on the movement of humans, text, and other objects [3]. In general, background subtraction is used to detect moving objects in an image or video based on the difference between the reference background and the frames produced from the video.
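The difference-based detection described above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists of intensities and a fixed threshold; the function name `subtract_background`, the threshold value of 30, and the toy pixel values are all hypothetical and are not taken from the study.

```python
def subtract_background(background, frame, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background.

    Both images are lists of rows of grayscale intensities (assumed 0-255).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    same_shape = len(background) == len(frame) and all(
        len(b_row) == len(f_row) for b_row, f_row in zip(background, frame)
    )
    if not same_shape:
        raise ValueError("background and frame must have the same size")
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Toy example: a static background and a frame in which a bright object appears.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 12, 10, 10],
         [10, 200, 210, 10],
         [10, 190, 10, 11]]
mask = subtract_background(background, frame)
```

Small intensity changes (such as 10 to 12) stay below the threshold and are treated as background, while the three bright pixels are marked as foreground in the binary mask.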

Artificial Neural Network
An artificial neural network is an information processing system whose characteristics resemble those of biological nerves. Artificial neural networks are used for pattern recognition, signal processing, and forecasting [6]. The feature extraction process in image processing is used to recognize and classify the objects in an image or video. Through feature extraction, the existing classes can be determined and significant feature areas of an image or video object can be located, relying on the intrinsic elements used. Identifying these characteristics yields the best representation of an object for distinguishing it from other objects. Shape feature extraction is used to match an image or object against those within the region used. The process involves computing a number of characteristic feature values of an object's shape that are independent of size or orientation. These features can be calculated for each object that can be identified in each image and in stored images.

Skeletonization
There are several ways to form a skeleton. A skeleton is a unique form of an object that represents the structure of the object. One way to obtain a skeleton is through thinning. Thinning is a morphological operation used to minimize the geometric size of an object, with a skeleton as the end result. The skeleton expresses the topology and characteristics of the image, as in the grass field model [16].
3. The other device used is a smartphone camera with 8 MP resolution; the video is acquired using this camera.
4. Extract the video frames into images.
5. Perform background subtraction on the object component to obtain a binary image.
6. Perform skeletonization and extract the binary image characteristics using invariant moments.
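As an illustration of thinning, the following is a minimal sketch that assumes the Zhang-Suen scheme; the text does not state which thinning algorithm was used, so this choice, the function name `thin`, and the toy image are assumptions. The binary image is a list of 0/1 rows whose border pixels are assumed to be background.

```python
def thin(image):
    """Zhang-Suen thinning of a binary image given as a list of 0/1 rows.

    Assumes the outermost rows and columns contain only background (0).
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] == 0:
                        continue
                    # Eight neighbours, clockwise starting from north (p2..p9).
                    p = [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                         img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                         img[r][c - 1], img[r - 1][c - 1]]
                    b = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Pixels are removed only after the whole sub-iteration is scanned.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Toy example: a horizontal bar three pixels thick inside a 7 x 14 image.
bar = [[1 if 2 <= r <= 4 and 2 <= c <= 11 else 0 for c in range(14)]
       for r in range(7)]
skeleton = thin(bar)
```

For this toy bar, the result is a one-pixel-wide line along the middle row, which is the kind of reduced representation the skeleton step aims for.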

Data Acquisition
In this research, data acquisition is performed using a smartphone camera with 8 megapixel resolution. The data are a video containing human movement with various hand positions, including hands down, hands to side, and hands up. The specifications of the acquired video are shown in Table 2.

Video Frame Extraction
The acquired video is then extracted into multiple frames. In this study, a total of 1444 frames were extracted from the video. Image processing is then performed on each extracted frame to detect the type of hand movement. The following is the code for extracting the video frames:
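A minimal sketch of such a frame extraction step is given here. It is not the authors' original listing: it assumes that OpenCV's Python bindings (`cv2`) are installed, and the function names, the output file pattern, and the paths are hypothetical.

```python
import os


def frame_filename(index, prefix="frame", ext="png"):
    """Build a zero-padded file name such as frame_0001.png (hypothetical pattern)."""
    return f"{prefix}_{index:04d}.{ext}"


def extract_frames(video_path, out_dir):
    """Write every frame of the video to out_dir and return the frame count."""
    # Imported here so the helper above works without OpenCV;
    # this function assumes the opencv-python package is available.
    import cv2

    os.makedirs(out_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError(f"cannot open video: {video_path}")
    count = 0
    try:
        while True:
            ok, frame = capture.read()  # ok becomes False at the end of the video
            if not ok:
                break
            count += 1
            cv2.imwrite(os.path.join(out_dir, frame_filename(count)), frame)
    finally:
        capture.release()
    return count
```

A call such as `extract_frames("hand_motion.mp4", "frames")`, where both arguments are placeholder names, would write one image per frame and return the number of frames written.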

Image Processing
Image processing is performed on each extracted frame in order to identify the type of human movement. The stages of image processing implemented are image segmentation, feature extraction, and identification.

Image segmentation
In this research, image segmentation is done using the background subtraction method to separate the foreground (human) from the background (objects other than the human). The steps of image segmentation are as follows:
1. Define the frame containing the background (background frame) and the frame to be detected (current frame).
The image segmentation process is shown in Table 4.
Table 4. Display of the image segmentation process

Feature Extraction
After the object in the image has been separated from the background, the feature extraction step is performed. The extracted features are used to distinguish the type of movement of one object from another. In this study, feature extraction is done through morphological (shape) analysis based on seven invariant moment values. The values are extracted from each image to distinguish the four output classes: no object, hand down, hand to side, and hands up. The calculation of the invariant moment values is based on the set of moments of a function f(x, y) of two variables, defined as follows:

$$m_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^{p} y^{q} f(x, y) \, dx \, dy, \qquad p, q = 0, 1, 2, \ldots$$

The two-dimensional moment of order (p + q) of a digital image of size M x N is defined as:

$$m_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} x^{p} y^{q} f(x, y)$$

The seven moment values do not change with respect to translation, scale change, reflection, and rotation. The extracted invariant moments are then used as input to the identification algorithm. Image processing is performed on each extracted frame in order to identify the type of human movement. The stages of image processing implemented are image segmentation, feature extraction, and identification. Image segmentation is done using the background subtraction method to separate the foreground (human) from the background (objects other than the human).
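The computation of seven invariant moments from a binary image can be sketched as follows. This sketch assumes the standard Hu formulation built on normalized central moments; the text does not list the seven expressions, so that formulation, the function names, the treatment of an empty image, and the toy images are assumptions.

```python
def raw_moment(img, p, q):
    """Discrete moment m_pq, with x as the column index and y as the row index."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def hu_moments(img):
    """Return seven invariant moments of an image given as a list of rows."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        # Assumption: an empty image ("no object") maps to seven zeros.
        return [0.0] * 7
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00

    def eta(p, q):
        # Central moment, normalized for scale invariance.
        mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
                 for y, row in enumerate(img) for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


def shift(img, dr, dc):
    """Translate the foreground by dr rows and dc columns (toy helper)."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and 0 <= r + dr < rows and 0 <= c + dc < cols:
                out[r + dr][c + dc] = img[r][c]
    return out


# Toy L-shaped blob, a translated copy, and a copy rotated by 90 degrees.
blob = [[0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]]
shifted = shift(blob, 2, 2)
rotated = [list(row) for row in zip(*blob[::-1])]
features = hu_moments(blob)
```

The seven values in `features` would then serve as the input vector of the identification step; the translated and rotated copies yield the same values up to floating-point rounding.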

Identification process
The artificial neural network used consists of three layers: one input layer, one hidden layer, and one output layer. The identification process identifies the type of human movement based on the seven invariant moment values previously extracted. It uses a back propagation neural network with the architecture shown in Figure 3.
Figure 3. Identification process using a back propagation artificial neural network
The training process uses the bipolar sigmoid activation function in the hidden layer and the Levenberg-Marquardt training function. In this process, the seven invariant moment values are propagated forward through the weights that were previously initialized. The input values are propagated to the neurons in the hidden layer. There, the total value received by each neuron is processed by the activation function to obtain the value of each hidden neuron. These values are then propagated forward to the output layer to obtain the output values. The output value is compared with the target value. If the resulting error is smaller than the previously set target error, the propagation process stops. Otherwise, back propagation is performed by updating the weight values. The back propagation neural network algorithm used is as follows:
Algorithm 1. Training
Step 1: Initialize. Initialize all the weights of the hidden and output layers, and define the activation function used at each layer and the learning rate. The weights are initialized with random numbers within a small range.
Step 2: Activate. Activate the network by applying the inputs and the expected outputs.
a. Calculate the output obtained from the neurons in the hidden layer.
b. Calculate the output obtained from the neurons in the output layer.
Step 3: Update the weights. The weights are updated as the error is propagated backward through the network; the returned error corresponds to the output signal.
a. Calculate the error gradient for the neurons in the output layer, calculate the weight corrections, and update the weights of the output layer neurons.
b. Calculate the error gradient for the neurons in the hidden layer, calculate the weight corrections, and update the weights of the hidden layer neurons.
Step 4: Iterate. Increase the iteration p by one, go back to Step 2, and repeat the process until the error criterion is reached.
The artificial neural network training process is done by varying the number of neurons in the hidden layer; the results obtained are shown in Table 6.
2nd International Conference on Computing and Applied Informatics 2017 IOP Publishing
IOP Conf. Series: Journal of Physics: Conf. Series 978 (2018) 012020 doi:10.1088/1742-6596/978/1/012020
Figure 5. User interface for the hand-down class
Figure 6. User interface for the hand-to-side movement
Figure 7. User interface for the hands-up movement

4. Conclusions
1. The background subtraction method in this research is used to separate the foreground (human) from the background (objects other than the human), and it succeeds in producing a good skeleton.
2. The video was extracted into 1444 frames. Image processing was then performed on each extracted frame to detect the type of hand movement. The identification process, using a back propagation neural network with seven invariant moment values as input, produced an accuracy of 98.5557% in identifying hand movements across four classes: no object, hand down, hand to side, and hands up.
3. Further development can test various movements and videos of various formats, so that the results can be compared and the approach with the highest identification accuracy can be recommended. Future work can also address the identification of other actions.
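As a supplement to Algorithm 1, the four training steps can be sketched in a few lines. This sketch uses plain gradient descent rather than the Levenberg-Marquardt training function named in the text, applies the bipolar sigmoid in both layers, and trains a tiny 2-3-1 network on toy data instead of the 7-input architecture of the study. The function names, learning rate, weight range, and data are all hypothetical.

```python
import math
import random


def bipolar_sigmoid(x):
    """Bipolar sigmoid activation with outputs in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0


def bipolar_sigmoid_deriv(y):
    """Derivative of the bipolar sigmoid, written in terms of its output y."""
    return 0.5 * (1.0 + y) * (1.0 - y)


def init_network(n_in, n_hidden, n_out, rng):
    """Step 1: small random weights; the last weight of each neuron is its bias."""
    def layer(n_from, n_to):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_from + 1)]
                for _ in range(n_to)]
    return {"hidden": layer(n_in, n_hidden), "output": layer(n_hidden, n_out)}


def forward(net, x):
    """Step 2: compute the hidden layer outputs, then the output layer outputs."""
    h = [bipolar_sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
         for row in net["hidden"]]
    y = [bipolar_sigmoid(sum(w * v for w, v in zip(row, h + [1.0])))
         for row in net["output"]]
    return h, y


def train_step(net, x, target, lr):
    """Step 3: propagate the error backward and update both layers."""
    h, y = forward(net, x)
    delta_out = [(t - o) * bipolar_sigmoid_deriv(o) for t, o in zip(target, y)]
    # Hidden gradients use the output weights from before the update.
    delta_hid = [bipolar_sigmoid_deriv(h[j]) *
                 sum(delta_out[k] * net["output"][k][j] for k in range(len(y)))
                 for j in range(len(h))]
    for k, row in enumerate(net["output"]):
        for j, v in enumerate(h + [1.0]):
            row[j] += lr * delta_out[k] * v
    for j, row in enumerate(net["hidden"]):
        for i, v in enumerate(x + [1.0]):
            row[i] += lr * delta_hid[j] * v
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, y))


def train(net, samples, lr=0.1, max_epochs=500, target_error=0.01):
    """Step 4: repeat until the error criterion or the epoch limit is reached."""
    history = []
    for _ in range(max_epochs):
        total = sum(train_step(net, x, t, lr) for x, t in samples)
        history.append(total)
        if total < target_error:
            break
    return history


# Toy data with bipolar inputs and targets (a logical AND), not the study's data.
rng = random.Random(0)
net = init_network(2, 3, 1, rng)
samples = [([-1.0, -1.0], [-1.0]), ([-1.0, 1.0], [-1.0]),
           ([1.0, -1.0], [-1.0]), ([1.0, 1.0], [1.0])]
history = train(net, samples)
```

The list `history` holds the summed squared error per epoch, which makes it easy to compare settings such as different numbers of hidden neurons, in the spirit of the experiments summarized in Table 6.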