The implementation of two stages clustering (k-means clustering and adaptive neuro fuzzy inference system) for prediction of medicine need based on medical data

Medication planning aims to obtain the types and amounts of medicine that match actual needs and to avoid stock-outs, based on disease patterns. In practice, medicine planning still relies on the ability and experience of the leadership. Planning takes a long time and skilled staff, definite disease data are difficult to obtain, good record keeping and reporting are required, and planning depends on the budget. As a result, planning does not go well, and shortages and surpluses of medicines are frequent. In this research, we propose the Adaptive Neuro-Fuzzy Inference System (ANFIS) method to predict medication needs in 2016 and 2017 based on medical data from 2015 and 2016 from two hospitals. The analysis framework uses two approaches. The first applies ANFIS directly to the data source. The second also uses ANFIS, but only after the data have been clustered with the K-Means algorithm. For both approaches, the Root Mean Square Error (RMSE) is calculated for training and testing. The test results show that the proposed method gives better prediction rates than existing systems, based on quantitative and qualitative evaluation. Applying the K-Means algorithm before ANFIS affects the training time and provides significantly better classification accuracy than ANFIS without clustering.


Introduction
Medicines are among the items needed by people who are sick [31]. Medicine sales affect 50%-60% of a hospital's budget and expenditure, so the optimal medicine needs must be predicted, because an error in planning affects both medical treatment and its cost [31]. Medicine planning aims to obtain the types and quantities of medicines that match requirements and to avoid stock-outs, based on the pattern of disease and the frequency of patient visits [31]. In general, the morbidity method is used for medicine planning because the estimated needs are close to the actual ones and it supports better use of medicines in line with standard treatment patterns [31]. However, it takes a long time and skilled labor, definite disease data are difficult to obtain, and good recording and reporting are needed [31]. The consumption method is easy to use, does not need disease data, and yields very small shortages or surpluses of medicines, but medicine data and the number of patient contacts are difficult to obtain, and the records of shortage, surplus, and loss of medicines are not reliable [31]. The composite method was designed to improve on the weaknesses of the morbidity and consumption methods, but uncertainty in the number of visits and the frequency of illnesses makes it less accurate for predicting medicine needs.
A Fuzzy Inference System (FIS) can handle uncertainty factors [19], but the result of the decision-making depends on the membership functions chosen for the FIS [32]. Forecasting has been done with combinations of data mining methods such as Genetic Algorithm with BP Neural Network, SOP with Neural Network, and SVR with RBFNN, as outlined in [30], and ARIMA with Neural Network [7]. Neural Network (NN) is a method used by many researchers for forecasting or prediction [35]. NN is better and more accurate than ARIMA for forecasting gold prices [13], and its ability to generalize, to identify non-linear relationships, and to be implemented in various applications is very useful [9]. However, NN needs large training data and a long computing time to identify patterns in high-dimensional data [27].
Adaptive Neuro-Fuzzy Inference System (ANFIS) is an algorithm that combines a fuzzy system with an artificial neural network [1]. ANFIS can be used for forecasting or prediction [25] [18], and it predicts more accurately than the ARIMA and NN methods [33] [24]. However, the characteristics of the training data affect the structure of the ANFIS, which in turn affects the prediction accuracy [14].
Clustering is a technique for grouping objects or patterns so that the members of each cluster are similar to one another [20]. Clustering algorithms include K-Means, Fuzzy C-Means (FCM), and Possibilistic C-Means (PCM) [26]. FCM is a reasonably optimal clustering algorithm, but it needs a long computing time [16]. PCM is very robust against noise [17], but it is too sensitive to cluster initialization [22]. K-Means clustering is efficient at grouping large data sets and fast to compute [16], but it is limited to numeric data [10].
Based on the results of these studies, a combination of K-Means clustering and the ANFIS algorithm is proposed for predicting medicine needs. Good prediction results greatly support decision-making aimed at a more optimal medical and medicine budget. The data analysis framework predicts medicine needs with the ANFIS method, and the data are derived from the hospital database. In the first approach, we use the 2015-2016 medical data to predict medicine needs with a standard ANFIS structure and then calculate the Root Mean Square Error (RMSE) for training and testing. In the second approach, we still use the ANFIS algorithm, but only after the data have been grouped with the K-Means algorithm. The end results of the two approaches are compared to determine the impact of the grouping on the accuracy of the ANFIS prediction. This paper is organized as follows: Section 2 reviews related research, Section 3 describes the methods used, Section 4 explains the proposed model, Section 5 presents the results and discussion, and Section 6 concludes.

Related Research
Zubir Haider Khan [23] used an Artificial Neural Network (ANN) for stock price prediction with two models: one trained using backpropagation and a multilayer feedforward network used for testing. Fauzi Yudhi Septiawan [15] applied the Genetic Algorithm to predict the movement of Forex currency pairs traded in Indonesia: USDJPY, USDCHF, GBPUSD, and EURUSD. Imam Baihaqi Siregar et al. [8] compared single exponential smoothing, Holt's double exponential smoothing, and triple exponential smoothing for palm oil production forecasts, analyzing the accuracy of the models and the characteristics of the production data; the single exponential smoothing forecast had the lower RMSE. David Ijegwa Acheme [2] developed share price prediction using the Fuzzy Logic method to support buy and sell decisions based on market trends, with four indicators: moving average convergence/divergence (MACD), relative strength index (RSI), stochastic oscillator (SO), and on-balance volume (OBV).
Mustain Billah et al. [8] predicted closing share prices with the ANFIS and ANN methods for the Dhaka Stock Exchange (DSE). Paul Sunjay [28] used an ANFIS approach for optimal supply forecasts. Abbasi [1] predicted the stock price of Iran Khodro Corporation over a long-term period using triangular membership functions and four independent variables, namely dividends per share (DPS), price to earnings ratio (P/E), closing price, and stock price, the latter being used as short-term independent variables. Hien Nguyen Nhu [25] combined the Firefly algorithm with ANFIS. The ANFIS network was trained with the Firefly algorithm and used to predict share prices on the Vietnam stock market, in order to compare its performance with ANFIS trained by the hybrid algorithm, back propagation, and Particle Swarm Optimization (PSO).
Osman Hegazy [18] proposed combining the Genetic Algorithm (GA) with the Adaptive Neuro-Fuzzy Inference System (ANFIS) to optimize stock price prediction. Previous stock prices are combined as model inputs to find the best prediction of the next day's stock trend, evaluated by the mean square error (MSE). GA optimizes the fuzzy parameters to overcome the constraints of statistical methods. Georgia Makridou [24] used the ANFIS method to predict the price of gold and compared it with the 'Buy and Hold' (B&H) strategy; in the accuracy tests, ANFIS outperformed B&H. Ehsan Lotfi [21] proposed that the ANFIS method can be applied to crude oil price forecasts by considering the input parameters of the ANFIS network.

Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS is a neuro-fuzzy approach introduced by Jang (Jang 1993; Jang et al. 1997) that builds on the Takagi and Sugeno fuzzy model (1985) and can be used for control applications, prediction, or forecasting. The ANFIS architecture is similar to a neural network with radial basis functions under a few restrictions. The neural network learning process is used to optimize the fuzzification parameter values. The neuro-fuzzy system consists of five layers. The first layer holds the parameters of the fuzzy membership functions, which are nonlinear with respect to the system output. These parameters are learned with Error Back-Propagation (EBP), which updates their values. In the fourth layer, the parameters are linear with respect to the system output and form the basis of the fuzzy rules. This layer uses the Least-Squares Estimator (LSE) method.
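The five layers described above can be illustrated with a minimal forward pass for a first-order Sugeno system. This is a sketch under stated assumptions, not the configuration used in this study: it assumes two inputs, two rules, and Gaussian membership functions, and the names gaussian_mf and anfis_forward are hypothetical. Training (EBP for the layer-1 parameters, LSE for the layer-4 parameters) is not shown.

```python
import math


def gaussian_mf(x, center, sigma):
    """Layer 1: Gaussian membership degree (assumed shape; premise parameters)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x1, x2, premise, consequent):
    """Forward pass of a two-input first-order Sugeno ANFIS.

    premise:    one entry per rule, ((c1, s1), (c2, s2)) for inputs x1 and x2
    consequent: one entry per rule, (p, q, r) so that f = p*x1 + q*x2 + r
    Returns the crisp output and the normalized firing strengths.
    """
    # Layer 1: fuzzify each input with the rule's membership functions
    memberships = [
        (gaussian_mf(x1, *mf1), gaussian_mf(x2, *mf2)) for mf1, mf2 in premise
    ]
    # Layer 2: firing strength of each rule (product of membership degrees)
    w = [m1 * m2 for m1, m2 in memberships]
    # Layer 3: normalize the firing strengths
    total = sum(w)
    w_norm = [wi / total for wi in w]
    # Layer 4: weight each rule's linear consequent (parameters linear in the output)
    weighted = [
        wn * (p * x1 + q * x2 + r) for wn, (p, q, r) in zip(w_norm, consequent)
    ]
    # Layer 5: the overall output is the sum of the weighted consequents
    return sum(weighted), w_norm


# Illustrative parameters only (not taken from the paper)
premise = [((0.0, 1.0), (0.0, 1.0)), ((2.0, 1.0), (2.0, 1.0))]
consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 4.0)]
output, strengths = anfis_forward(1.0, 1.0, premise, consequent)
```

With the input placed midway between the two rule centers, both rules fire equally, so the output is the average of the two rule consequents.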

K-Means Clustering
K-Means clustering is one of the most popular clustering algorithms because it is simple, easy to implement, and computationally efficient [5]. The K-Means algorithm assigns each object to one of K clusters based on the object's attributes/features. The clusters are distinguished by their centers. The algorithm is very sensitive to the initial placement of the cluster centers. Determining the number of clusters (K) is very important in this algorithm, but there is no general rule for choosing it [34]. The steps of the K-Means algorithm are as follows:
1. Prepare the training data.
2. Set the number of clusters K.
3. Set the initial centroid values.
4. Compute the distance between each data point and each centroid using the Euclidean distance.
5. Assign each data point to the partition with the minimum distance.
6. Iterate while data points are still moving between partitions: return to step 3 with the updated centroids.
7. If the current grouping is the same as the previous one (no objects move to other partitions), stop iterating.
8. The data are now partitioned according to the final centroid values.
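The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this study: the function names, the random choice of initial centroids, and the max_iter safeguard are assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension (step 4)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100, seed=0):
    """Partition points (a list of tuples) into k clusters.

    Returns the final centroids and the cluster index of every point.
    """
    rng = random.Random(seed)
    # Step 3: pick k distinct data points as initial centroids (assumed strategy)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Steps 4-5: assign every point to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        # Step 7: stop when no point changed its partition
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 6: update each centroid to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Step 8: data are partitioned according to the final centroids
    return centroids, assignment


# Illustrative data only: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
```

Because the result depends on the initial centroids, running the function with several seeds and keeping the partition with the smallest total distance is a common safeguard.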

Accuracy Calculation
The ANFIS predictions are evaluated on the training and testing data using the Root Mean Square Error (RMSE). RMSE is one of the methods for evaluating forecasting techniques. The smaller the RMSE value, the more valid the estimate of the model or variable.
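For n observations with actual values y_i and predictions p_i, RMSE is the square root of the mean of the squared differences (y_i - p_i)^2. A minimal sketch of the calculation, with a hypothetical function name:

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A perfect prediction gives an RMSE of 0, and larger errors are penalized more strongly because the differences are squared before averaging.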

The Proposed Model
The proposed model for predicting medicine needs at the hospital is based on observations from two hospitals. Figure 1 illustrates the research framework. The first step is to make observations at a private and a government hospital; the results of interviews with the head of IFRS, the pharmacy, and the director serve as input material. Three main categories are used as indicator parameters, namely patient, diagnosis, and inventory data. The purpose of this research is to see how grouping the data influences the prediction accuracy of the first-order Sugeno ANFIS method, so two different approaches are used. In the first approach, the ANFIS method is applied directly to train and test on the medical data. In the second approach, the data are first grouped with the K-Means algorithm and the groups are used as input parameters. The results of both approaches will serve as input for medicine planning in IFRS.

Results and Discussion
The prediction of medicine needs in IFRS uses data from observations at the hospital in 2015 and 2016. The medical data variables consist of gender, age, and the patient's ICD code.

Prediction with ANFIS without grouping
In the first approach, training and testing apply the ANFIS method directly to the data source, with five input indicators, namely gender, patient age, diagnosis code, length of hospitalization, and stock, and one output. The first-order Sugeno ANFIS model uses 34 rules. The first stage searches for the optimal learning rate. This test trains on 50% of the total data, and the smallest RMSE produced is 0.004. The setting with the smallest RMSE is then used to test RMSE and accuracy with different combinations of training and test data. For the RMSE and accuracy test on training, 85% of the total data is used; for testing, 70% of the total data is used. Each data combination is repeated ten times to obtain the RMSE and accuracy of the test results. The training and testing results can be seen in Figures 2 and 3.
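The repeated evaluation described above can be sketched as follows. This is an illustration under assumptions, not the procedure used in this study: the data are assumed to be shuffled and split at random on each repetition, the function names are hypothetical, and a trivial mean predictor stands in for the trained ANFIS model so that the example stays self-contained.

```python
import math
import random


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def fit_mean_model(train):
    """Placeholder for ANFIS training: always predicts the mean training target."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def repeated_rmse(data, train_fraction, repeats=10, seed=0):
    """Average test RMSE over several random train/test splits.

    data: list of (inputs, target) pairs
    train_fraction: share of the data used for training, e.g. 0.5
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train, test = shuffled[:cut], shuffled[cut:]
        model = fit_mean_model(train)
        scores.append(rmse([y for _, y in test], [model(x) for x, _ in test]))
    return sum(scores) / len(scores)


# Illustrative data only: a constant target is predicted perfectly by the mean
data = [((i,), 5.0) for i in range(20)]
average_error = repeated_rmse(data, train_fraction=0.5)
```

Averaging over repetitions reduces the influence of any single split on the reported RMSE.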

Prediction with K-Means ANFIS (K-ANFIS)
The second approach to predicting medicine needs groups the data first. The data are grouped with the K-Means algorithm and then passed to ANFIS. Table 5 shows the result of clustering the medical data into three groups, namely gender, ICD, and age. The grouping is done in two stages: the first groups the 2015 medical data and the second groups the 2016 data. Cluster3 is the most dominant group in determining the medicine used, which is reasonable because the strongest indication for choosing a medicine is the doctor's disease diagnosis. Cluster1 is the lowest domain because there are no rules that determine medicines by gender. Cluster2 is the classification based on age, with domain number 2 the most dominant, because different types of medicines are used for different patient ages.
The computing time of this grouping is still too high: grouping a domain takes at most 0.09 seconds and at least 0.02 seconds, with an average of 0.05 seconds per iteration on 60 percent of the total data. For the International Classification of Diseases (ICD) codes, the 2015 data produced 139 clusters that were not defined and 23 that were not detected, while the 2016 data produced 145 clusters, of which 12 were not known. Table 1 gives the distances and ratios between the clusters. The results of the medical data grouping (gender, age, ICD) are used as ANFIS input parameters for training and testing the prediction of medicine needs. The test results can be seen in Figure 6, which shows that, with the K-Means algorithm applied for data grouping, the prediction is significantly more accurate and the training and testing processes are faster. Figures 7 and 8 compare the ANFIS and K-ANFIS predictions of medicine needs for 2016 and 2017; the prediction results are reasonably accurate. Applying the K-Means algorithm before ANFIS thus affects the training time and provides classification accuracy that is significantly better than the first approach without grouping. However, the computing time of the K-Means algorithm is still too high, averaging 0.05 seconds per iteration, so further research is needed to optimize the computing time by applying other classification algorithms.