A Comparative Experimental Study on the Use of Machine Learning Approaches for Automated Valve Monitoring Based on Acoustic Emission Parameters

Acoustic emission (AE) analysis has become a vital tool for initiating maintenance tasks in many industries. However, the analysis process and the interpretation of its results have been found to be highly dependent on experts. An automated monitoring method is therefore required to reduce the cost and time consumed in the interpretation of AE signals. This paper investigates the application of two of the most common machine learning approaches, namely the artificial neural network (ANN) and the support vector machine (SVM), to automate the diagnosis of valve faults in a reciprocating compressor based on AE signal parameters. Since accuracy is an essential factor in any automated diagnostic system, this paper also provides a comparative study of the predictive performance of the ANN and SVM. AE parameter data were acquired from a single-stage reciprocating air compressor under different operational and valve conditions. ANN and SVM diagnostic models were subsequently devised by combining the AE parameters of the different conditions. The results demonstrate that the ANN and SVM models achieve the same prediction accuracy. However, the SVM model is recommended for automated diagnosis of the valve condition due to its ability to handle a high number of input features with small data sets.


AE Signal Parameters
Acoustic emission refers to the generation of transient elastic waves produced by a rapid release of energy from a localized source within a material, as reported by the American Society for Testing and Materials (ASTM) [16]. In this paper, AE is defined as the transient elastic waves produced by the impact of one surface on another in a reciprocating motion; in other words, the transient elastic waves produced by the impingement of the plates inside the valve on the upper and lower plate housing during reciprocating compressor operation. Each AE hit has specific parameters related to the signal event, and the interpretations of AE parameters are often related to the machine condition [17]. In this study, the AE parameters extracted from the acquired AE hits include amplitude, counts, duration, energy, absolute energy, ASL and signal strength. See Fig. 1 and Table 1.
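As an illustration, the hit parameters listed above can be computed from a raw AE waveform with a simple threshold-based procedure. The Python sketch below is not the acquisition software used in this study; the function name, the crossing conventions and the 1 µV reference for ASL are assumptions for illustration only.

```python
import numpy as np

def ae_hit_parameters(signal, fs, threshold):
    """Illustrative threshold-based extraction of basic AE hit parameters.

    signal    : 1-D array of AE samples (volts)
    fs        : sampling rate (Hz)
    threshold : detection threshold (volts); crossings define the hit
    """
    above = np.abs(signal) > threshold
    idx = np.flatnonzero(above)
    if idx.size == 0:
        return None  # no hit detected above the threshold

    amplitude = float(np.max(np.abs(signal)))                        # peak amplitude
    counts = int(np.count_nonzero(np.diff(above.astype(int)) == 1))  # rising threshold crossings
    duration = (idx[-1] - idx[0]) / fs                               # first-to-last crossing (s)
    energy = float(np.sum(signal[idx[0]:idx[-1] + 1] ** 2) / fs)     # signal energy over the hit
    asl = 20.0 * np.log10(np.mean(np.abs(signal)) / 1e-6)            # average signal level (dB re 1 µV, assumed)
    return {"amplitude": amplitude, "counts": counts,
            "duration": duration, "energy": energy, "asl": asl}
```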

Support Vector Machine
Support vector machine is a supervised machine learning method that relies on statistical learning theory, with an ability to handle a high number of input features with small data sets [18]. This learning technique uses input vectors for pattern classification. During the training process, the SVM creates a hyperplane that places the majority of points of the same class on the same side, while maximizing the distance of the two classes to this hyperplane [2]. See Fig. 2. This hyperplane can be either linear or non-linear, depending on the kernel function [19]. SVM training seeks a globally optimized solution and avoids over-fitting, so that it can deal with a large number of features. A comprehensive description of the SVM method, including its limitations and drawbacks, is available in [20,21]. In the linearly separable case, there exists a separating hyperplane whose function is:

$f(x) = w \cdot x + b$ (1)

where $w$ is the weight vector, $x$ the input vector and $b$ the bias, which implies:

$y_i (w \cdot x_i + b) - 1 \ge 0, \quad i = 1, \dots, N$ (2)

The SVM algorithm tries to determine a distinctive separating hyperplane by minimizing $\|w\|$, the Euclidean norm of $w$, which maximizes the margin $2/\|w\|$ between the data points of the two categories. When Lagrange multipliers $\alpha_i$ are introduced, the SVM training process becomes one of solving a convex quadratic programming (QP) problem. The solution employs the following equation:

$w = \sum_i \alpha_i y_i x_i$ (3)

where $\alpha_i$ are the Lagrange multipliers. Only the training points with corresponding $\alpha_i > 0$ enter this sum; these are known as support vectors. During the model training process, the decision function is represented by:

$f(x) = \operatorname{sign}\Big(\sum_i \alpha_i y_i (x_i \cdot x) + b\Big)$ (4)

In this study, the SVM tries to place a margin between the faulty and healthy data and adjusts it so as to keep the distance between the margin and the data points of each group maximal. The nearest data points are used to define the margin and are known as support vectors. However, in most cases the patterns are not linearly separable; therefore, a kernel function is used to perform the transformation. Hsu et al. [22] proposed the RBF kernel function as the first kernel to try for an SVM model. Chen et al. [23] found that the RBF kernel gives better test accuracy compared to the polynomial kernel. Therefore, an SVM with the RBF kernel function was deployed in this study.
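The models in this study were built in Matlab; as a minimal illustration of the same idea, an RBF-kernel SVM can be trained on standardized features using scikit-learn. The synthetic healthy/faulty samples below are placeholders, not the experimental AE data, and the class means and kernel settings are assumptions for the sketch.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for 9 AE/operational features of healthy vs. faulty valves
healthy = rng.normal(0.0, 1.0, size=(100, 9))
faulty = rng.normal(2.0, 1.0, size=(100, 9))
X = np.vstack([healthy, faulty])
y = np.array([0] * 100 + [1] * 100)  # 0 = healthy, 1 = faulty

# Standardize the features (mean shift and 1/std scaling), then fit an RBF-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```

The `StandardScaler` step mirrors the kind of normalization applied before training, and the fitted `SVC` exposes its support vectors and dual coefficients (the `α_i y_i` weights) for inspection.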

Artificial Neural Network
Artificial neural networks (ANNs) are a family of statistical learning models inspired by the biological neural networks of the human brain. ANNs may not have the exact precision of traditional computing approaches; yet they are sufficient to obtain close approximations for systems about which we have inadequate information to construct an exact solution [24]. ANNs are generally presented as systems of interconnected "neurons" which exchange messages with each other. A neuron typically receives several inputs $x_i$. Each input is multiplied by a weight $w_i$, and the weighted inputs are then summed together with a bias $b$ to give the neuron's activation:

$z = \sum_i w_i x_i + b$ (5)

Once the activation $z$ is obtained by the neuron, the activation function $f(z)$ is calculated. There are many types of activation functions, and the selection depends on the network application and the ANN architecture choice; the most used activation functions include the linear, threshold and sigmoid functions. The output of the activation function is the output of the neuron. Moreover, an ANN consists of interconnections between all neurons, which are arranged in layers. The most common ANN includes three layers (an input layer, a hidden or interactive layer and an output layer). In this study, a feed-forward network was employed, in which the neurons in each layer are connected to the next layer in one direction. See Fig. 3. The network was trained to receive 9 inputs (duration, counts, amplitude, ASL, signal strength, energy, absolute energy, speed and flow rate) and 2 outputs (healthy and faulty). The data were divided randomly into three groups: 70% as the training set, 15% as the validation set and 15% as the testing set [25]. Training and validation samples were used to develop the model, whilst the testing samples were held out and then applied to the developed model to evaluate its performance.
Once training is complete, a relationship between the input and output data is established. In the training stage, the node weights are adjusted until the outputs get close to the target values for all available inputs. However, if over-fitting is identified, the computational processing stops. Over-fitting takes place when the model performs well during training but its performance starts to decline when it is tested with unseen data.
Two error values, the mean squared error (MSE) and the percent error (%E), were used to check the network during the training, validation and testing processes. MSE is defined as the average squared difference between outputs and targets; lower values are better (MSE = 0 means no error). The percent error is defined as the fraction of samples which are misclassified by the ANN (%E = 0 indicates that no misclassifications were made by the network, while a value of 100 indicates maximum misclassification). A feed-forward supervised network was used to classify the inputs according to the target classes. Scaled conjugate gradient back propagation (TRAINSCG) was utilized as the training algorithm. See Fig. 4. Training stops when the validation set MSE has increased for more than the maximum number of validation failures, set to 6, to avoid over-fitting [26,27]. Further, the hyperbolic tangent sigmoid (TANSIG) was employed as the transfer function, which allows the network to process both negative and positive input values quickly [28]. Equation 6 presents the TANSIG transfer function:

$\mathrm{tansig}(z) = \dfrac{2}{1 + e^{-2z}} - 1$ (6)

The number of neurons in the hidden layer was optimized by changing the number of neurons and retraining the network until the highest accuracy and lowest MSE were achieved.
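The building blocks described above, the weighted-sum activation, the TANSIG transfer function of Equation 6, and the MSE and %E checks, can be written out directly. The Python below is an illustrative re-implementation of these formulas, not the Matlab network trained in the study.

```python
import numpy as np

def tansig(z):
    # Hyperbolic tangent sigmoid transfer function (Equation 6);
    # algebraically identical to tanh(z)
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0

def neuron(x, w, b):
    # Weighted sum of inputs plus bias (Equation 5), passed through the transfer function
    z = np.dot(w, x) + b
    return tansig(z)

def mse(outputs, targets):
    # Mean squared error between network outputs and targets (0 = no error)
    return float(np.mean((np.asarray(outputs, float) - np.asarray(targets, float)) ** 2))

def percent_error(predicted_classes, true_classes):
    # Fraction of misclassified samples, as a percentage (0 = no misclassifications)
    predicted_classes = np.asarray(predicted_classes)
    true_classes = np.asarray(true_classes)
    return 100.0 * float(np.mean(predicted_classes != true_classes))
```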

Experimental Procedure
The experiment began by acquiring the AE signal (baseline signal) from the compressor with the valve in a healthy condition. The experiments were conducted under various operational conditions in terms of speed and airflow rate: thirteen operational speeds ranging from 200-800 rpm (in increments of 50 rpm) and three flow rates (0%, 50% and 100%) were employed. Speeds were controlled by the speed controller, whilst the flow rates were controlled using a flow meter at the compressor outlet. Next, the experiment was repeated under the same operational conditions but with two types of actual faults, corrosion and clogging, emulated individually at the compressor valve (including both the suction and discharge parts). Corrosion was introduced into the valve plates, whilst clogs were introduced into the valve body. Each fault was simulated at different severity levels to emulate progressive fault deterioration. Table 2 illustrates the types of defects with their severities; Fig. 6 and Fig. 7 illustrate the defect simulation. All defects in the experimental specimens (spare valves) were simulated in advance. Next, the first defective valve was fitted inside the reciprocating compressor, and the first AE signal was acquired with the test rig operating at the first speed and flow rate. The test was repeated for the other speed and flow rate conditions until the signal had been acquired for all the operational conditions. Then, the test rig was shut down and the valve was replaced with the second specimen with another fault severity. The procedure was repeated, and another set of AE signals was recorded. To complete the experimental procedure, the test rig was operated under 39 different operational conditions (13 speeds × 3 flow rates = 39) and sixteen valve conditions (8 valve conditions × 2 fault locations = 16), for a total of 624 tests. Each test lasted 30 s and was repeated 3 times, and the average was calculated.
All experiments were conducted at laboratory temperatures between 25 and 30 °C and at standard atmospheric pressure. According to the hold and train method [25], the data were divided randomly into two groups: 85% as the training set and 15% as the validation set. Training samples were used to develop the model, whilst the validation samples were held out and then applied to the developed model to evaluate its performance.
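A random hold-out division of this kind can be sketched as follows. The function below is an illustrative stand-in for the 85%/15% split, not the exact procedure of [25]; the function name and seed handling are assumptions.

```python
import numpy as np

def holdout_split(X, y, train_frac=0.85, seed=0):
    """Randomly split (X, y) into a training set and a held-out validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle sample indices
    n_train = int(round(train_frac * len(X)))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```

With the 624 tests of this study, an 85% split would place 530 samples in the training set and 94 in the validation set.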

Artificial Neural Network Model
The best network was selected according to the highest classification accuracy and minimum MSE, obtained by changing the number of neurons in the hidden layer and retraining the network until the highest accuracy and minimum MSE were achieved. Twelve attempts at changing the number of neurons were investigated to select the network. The MSE value shows a significant decrease as the number of neurons increases, until it achieves a minimum value < 0.01 for the network with 40 neurons in the hidden layer. Any attempt to increase the number of neurons beyond 40 resulted in an increase in this value. Accordingly, the ANN with the proposed number of neurons (output layer: 2 neurons) classifies the data into healthy/faulty.

Support Vector Machine Model
In this method, the SVM model was trained on the same feature space and data division, and its training seeks a globally optimized solution based on the training data. Table 3 shows the output arguments for the SVM model. The support vectors are the range of data points within each row after normalization has been applied. The alpha values are the weights of the support vectors; the sign of the weight is positive for support vectors belonging to the first group (healthy) and negative for the second group (faulty). The bias refers to the intercept of the hyperplane that separates the two groups. The RBF kernel has been used as the kernel function. The group names refer to the total data samples. The support vector indices refer to the training data points that were selected as support vectors after the data was normalized. The shift refers to the negative of the mean across an observation in training, while the scale factor refers to 1 divided by the standard deviation of an observation in training. Based on the training data, the overall accuracy of the SVM model was 99.4%.

ANN and SVM Models Performance
The ANN and SVM models were validated using the 15% validation samples, which were separated randomly from the original acquired data set. This method allows the fitted models to predict the valve condition from the validation samples. The process was performed many times to obtain a distribution of the predictive performance for each model. Thus, if the models classify the data correctly, their usability in other contexts can be assured; a lack of fit is indicated if a model is unable to classify the data. In addition, the receiver operating characteristic (ROC) curve is another method for comparing the models' classification ability [30]. The ROC curve is created by plotting the true positive rate (sensitivity) against the false positive rate (one minus the specificity). The point on the curve nearest to the upper left corner corresponds to the maximum sensitivity and specificity of the model's classification. The classification accuracies for the ANN and SVM models are presented in Table 4 and Table 5 respectively, and Fig. 11 shows the ROC curves for both models.
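The ROC construction described above can be sketched directly: sweep a decision threshold over the classifier scores, compute the (FPR, TPR) pair at each threshold, and pick the point nearest the upper-left corner. The scores, labels and thresholds below are placeholders, not the models' actual outputs.

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """(false positive rate, true positive rate) at each threshold; labels: 1 = faulty."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels)
    points = []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))   # faulty correctly flagged
        fn = np.sum(~pred & (labels == 1))  # faulty missed
        fp = np.sum(pred & (labels == 0))   # healthy wrongly flagged
        tn = np.sum(~pred & (labels == 0))  # healthy correctly passed
        tpr = tp / (tp + fn)                # sensitivity
        fpr = fp / (fp + tn)                # 1 - specificity
        points.append((float(fpr), float(tpr)))
    return points

def best_operating_point(points):
    # Point nearest the upper-left corner (0, 1): maximum sensitivity and specificity
    return min(points, key=lambda p: p[0] ** 2 + (p[1] - 1.0) ** 2)
```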

Conclusion
The performance of two machine learning approaches, ANN and SVM, has been evaluated for detecting valve faults in a single-stage reciprocating compressor based on the AE signal. The experimental procedure was conducted by inducing two typical valve faults at different severities to acquire a combination of healthy/faulty AE signals under different speed and flow rate conditions. The same data sets, operational conditions and AE parameters were used in constructing both models, and the ANN and SVM models were trained and evaluated based on the hold and train method using Matlab. In evaluating the results, we considered the total detection accuracy and the ease of model building. The ANN and SVM models demonstrated slightly different sensitivity and specificity, while showing the same ability to detect the valve condition with an accuracy of 99.4%. However, ANN accuracy is highly dependent on the neural network structure, such as the number of nodes and hidden layers, whereas the SVM has an excellent ability to handle a high number of input features with small data sets [31]. The SVM model is therefore recommended for automated diagnosis of the valve condition in a single-stage reciprocating compressor.