Mining association rule based on the diseases population for recommendation of medicine need

Inappropriate medicine selection leads to empty medicine stock (stock-outs), which affects both medical services and economic value in a hospital. Because appropriate medicine selection is so important, an automated way is needed to select medicine needs based on the development of patients' illnesses. In this study, we analyzed patient prescriptions to identify the relationship between diseases and the medicines physicians use to treat them. The analytical framework includes: (1) collecting patient prescription data, (2) applying k-means clustering to classify the top 10 diseases, and (3) applying the Apriori algorithm to find association rules based on support, confidence and lift values. In tests on the 2015-2016 patient prescription datasets, applying k-means to cluster the 10 dominant diseases significantly affected the confidence and support values of all association rules found by Apriori, making it more consistent in finding association rules between diseases and related medicines. The support, confidence and lift values of diseases and related medicines can be used as recommendations for appropriate medicine selection. Based on the disease progression observed in the hospital, medicine procurement can thus be made more optimal.


Introduction
A hospital is a health care facility for the healing and recovery of patients [1]. Most hospitals have implemented a Hospital Management Information System (HMIS) in their health care activities, so the health services provided to patients, from registration through payment of health costs, are recorded in a database. However, medicine management in hospital pharmacy installations is still not optimal, because medicine stock-outs still occur frequently. Medication is an essential need for patients, and 50-60% of a hospital's overall budget is spent on medication and medical equipment. Non-optimal medicine management therefore adversely affects a hospital both medically and economically [2].
Selecting the medicines that are needed is the first phase of the medicine procurement planning cycle [3], cited in [2]. Medicine selection is based on the patients' disease population, which may change over time according to the volume of patient diagnoses stored in the database, so an automated way is needed to select medicine requirements based on disease progression. Data mining is a useful technique for extracting patterns from datasets in databases and turning them into information [4]. Many researchers have applied data mining methods in the health field, for example to predict heart disease [5], [6], in health insurance [7], for disease identification [6] and hypertension [8], and for insurance fraud detection, low-cost patient medical solutions, related-disease detection and treatment methods [9].
MECnIT IOP Publishing. IOP Conf. Series: Journal of Physics: Conf. Series 1007 (2018) 012017, doi:10.1088/1742-6596/1007/1/012017
Apriori is an association rule mining method in data mining that finds all associated items in database transactions that satisfy a minimum support threshold, a minimum confidence threshold or other constraints [10]. Many researchers have applied the Apriori algorithm, for example to find association rules in Chinese traditional medicine [11] and Chinese herbal medicine [12], to diagnose diseases in patients with hypertensive symptoms [8], for disease identification [6], to detect heart disease factors in men and women [5], and to identify symptoms in traditional Korean therapy [13]. Apriori reduces the number of candidates that must be counted by pruning, which gives it good performance [14], but the database scan required in each iteration increases computation time.
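The level-wise search and candidate pruning described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this study; the function names `apriori_gen` and `frequent_itemsets` and the toy transactions are hypothetical. Each pass of the `while` loop scans every transaction once, which is the per-iteration cost mentioned above.

```python
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-item candidates and prune every
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Pruning: all (k-1)-subsets of a frequent k-itemset must be frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    db = [frozenset(t) for t in transactions]
    n = len(db)
    candidates = {frozenset([item]) for t in db for item in t}
    result = {}
    k = 1
    while candidates:
        # One full database scan per iteration to count candidate support.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        k += 1
        candidates = apriori_gen(frequent, k)
    return result


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    found = frequent_itemsets(toy, 0.5)
    for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), found[itemset])
```

With these five toy transactions and a minimum support of 0.5, the candidate {a, b, c} survives pruning (all of its 2-item subsets are frequent) but is discarded after counting, because its support is only 0.4.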
Some researchers combine Apriori with other methods to reduce its computation time; one such approach applies k-means clustering first, which has proved to be very accurate [15], [16], [17]. K-means is a hard-partition clustering technique that groups large datasets quickly [18], but it is limited to numerical data [19].
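Because k-means works only on numerical vectors, categorical attributes must be converted before clustering. One common option, assumed here purely for illustration since the encoding used in this study is not specified, is one-hot encoding. The category lists below follow the gender and age-group values used later in this paper; the function name `encode` is hypothetical.

```python
# Hypothetical category lists based on the gender and age-group values
# described in this paper; the one-hot encoding itself is an assumption.
GENDERS = ["male", "female"]
AGE_GROUPS = ["infant", "toddler", "child", "adult", "elderly"]


def encode(gender, age_group):
    """One-hot encode a (gender, age_group) pair into a numeric vector that a
    Euclidean-distance method such as k-means can use."""
    vec = [0.0] * (len(GENDERS) + len(AGE_GROUPS))
    vec[GENDERS.index(gender)] = 1.0
    vec[len(GENDERS) + AGE_GROUPS.index(age_group)] = 1.0
    return vec


if __name__ == "__main__":
    print(encode("female", "adult"))
```

The same idea extends to a categorical disease variable such as an ICD-10 code, at the cost of one column per distinct code.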
In this research, we analyze patient prescriptions based on doctors' diagnoses in the hospital database to identify the relationship between diseases and the medicines physicians use to treat them. K-means clustering was applied to find the 10 most dominant diseases in the 2015 and 2016 health datasets, and we then use the Apriori algorithm to find useful relationships between diseases and related medicines based on support, confidence and lift values. The paper is structured as follows: Section 2 reviews related research, Section 3 describes the proposed method, Section 4 presents the results, and Section 5 concludes.

Related research
Research [15] combined k-means clustering with Apriori on consumer data to find association rules. Clustering the consumer data with k-means first had a significantly better and more consistent influence on Apriori in terms of support values, giving service providers useful information for offering the right products or ads to the right consumers. [16] applied a clustering algorithm to improve the performance of Apriori by generating different itemsets at each site of a cloud-based network. [17] proposed a novel clustering method using k-means and Apriori that allocates a unique ID to each object in a cluster; each object in a group has a position that may vary with circumstances, and the allocated IDs are then used by k-means clustering together with the Apriori algorithm. [20] applied k-means clustering to analyze treatment goals in breast cancer based on user behavior, using a UCI dataset with 569 instances and 32 attributes. [11] proposed a modified association rule mining method to study the structural characteristics of Traditional Chinese Medicine (TCM) pairs, using a dataset of 625 medicine records covering 347 medicines and five properties: cold, hot, warm, cool and normal. Apriori was used to find which specific medicines or properties are more commonly used in medicine pairs and was compared with the proposed method; statistical tests showed that the proposed method found association rules on medicines better than the previous methods.
[21] proposed mining health data to find patterns of illness in patients by searching for relations between symptoms and disorders in a medical database. Yan Yan, Wang Chunyan and Li Min developed a multi-model approach based on the Apriori algorithm to extract data from a hospital database and produce useful information for medical decision making. [20] used the Apriori algorithm to find the characteristics of headaches in traditional medicine, making it easier for doctors to decide on prescriptions for various kinds of headache sufferers.

Methodology
The Apriori algorithm is the best-known algorithm for finding patterns in a database whose frequency, or support, is above a threshold called the minimum support. Apriori consists of several iterations. Each iteration scans the database to count the support of the candidate itemsets; in the first iteration, items whose support is above the minimum support are selected as frequent patterns of length one, called 1-itemsets. A k-itemset is a set of k items, so the second iteration produces 2-itemsets, each containing two items [22]. An association rule is an implication of the form X → Y, where X is the antecedent, Y is the consequent, and X ∩ Y = ∅. The support of an itemset is the ratio of the number of transactions containing the itemset to the total number of transactions. The confidence of rule X → Y is the conditional probability that a transaction contains Y given that it contains X. For a transaction database D, the support, confidence and lift (also known as interest) of an association rule X → Y are:

$$\mathrm{Support}(X \rightarrow Y) = \mathrm{Support}(X \cup Y) = \frac{|\{T \in D : X \cup Y \subseteq T\}|}{|D|} \tag{1}$$

$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \tag{2}$$

$$\mathrm{Lift}(X \rightarrow Y) = \frac{\mathrm{Confidence}(X \rightarrow Y)}{\mathrm{Support}(Y)} \tag{3}$$

where Support(X) and Support(Y) are the supports of the itemsets X and Y, defined in the same way as in (1).
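The three measures can be computed directly from a list of transactions. The sketch below is a hypothetical helper (`rule_metrics` is not from this study): support is the fraction of transactions containing both X and Y, confidence is the support of X ∪ Y divided by the support of X, and lift is the confidence divided by the support of Y.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    x, y = frozenset(antecedent), frozenset(consequent)
    if x & y:
        raise ValueError("antecedent and consequent must be disjoint")
    db = [frozenset(t) for t in transactions]
    n = len(db)
    count_x = sum(1 for t in db if x <= t)         # transactions containing X
    count_y = sum(1 for t in db if y <= t)         # transactions containing Y
    count_xy = sum(1 for t in db if (x | y) <= t)  # transactions containing X and Y
    support = count_xy / n
    confidence = count_xy / count_x if count_x else 0.0
    lift = confidence / (count_y / n) if count_y else 0.0
    return support, confidence, lift


if __name__ == "__main__":
    toy = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B"}]
    print(rule_metrics(toy, {"A"}, {"B"}))
```

For these four toy transactions, the rule A → B has support 0.5, confidence 2/3 and lift 8/9 (about 0.89); a lift below 1 indicates that B appears with A less often than its overall frequency would suggest.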
The k-means algorithm is one of the most popular clustering algorithms because it is simple, easy to implement and computationally efficient [23]. K-means groups objects by their proximity to one another according to the Euclidean distance. It takes the number of clusters k as an input parameter and partitions a set of n objects into k clusters. The mean value of the objects in a cluster is taken as the cluster center, and similarity to this center is used to form the clusters. The initial cluster means (centers) are formed by randomly selecting k objects, and each remaining object is assigned to the cluster it is most similar to. For each data vector, the algorithm calculates the distance between the data vector and each cluster centroid using the Euclidean distance [24]:

$$d(x_i, c_j) = \sqrt{\sum_{p=1}^{m} (x_{ip} - c_{jp})^2}$$

where x_i is the i-th data vector, c_j is the centroid of cluster j, and m is the number of attributes.

The steps in the k-means algorithm are as follows:
1. Select k objects at random as the initial cluster centers.
2. Assign each object to the cluster whose center is nearest by Euclidean distance.
3. Recompute each cluster center as the mean of the objects assigned to it.
4. Repeat steps 2 and 3 until the cluster assignments no longer change.

In this study we used patient prescription datasets from 2015 and 2016 from two hospitals. We loaded these datasets into a MySQL database to facilitate data cleaning and transformation. After cleaning noisy data, the 2015 dataset contained 651,378 prescriptions, 12,015 patients and 1,945 medicine types, while the 2016 dataset contained 956,152 prescriptions, 18,416 patients and 1,835 medicine types, as listed in the first table. The main objective of this study was to cluster the 10 most dominant disease populations based on the patients' disease progression using the k-means algorithm on the patient prescription dataset. From this clustering, we apply the Apriori algorithm to establish the relationship between diseases and related medicines based on support, confidence and lift. This knowledge can serve as a recommendation for appropriate medicine selection, making medicine procurement more optimal and avoiding stock-outs in the hospital pharmacy.
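The k-means steps described above can be sketched in plain Python as below. This is a minimal illustration under stated assumptions (a fixed random seed, Euclidean distance, and keeping a center unchanged if its cluster becomes empty); `kmeans` and `euclidean` are hypothetical names, not the implementation used in this study.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (labels, centers) after running k-means on a list of vectors."""
    rng = random.Random(seed)
    # Step 1: pick k objects at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to its nearest center.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        # Step 4: stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, 2))
```

Because the initial centers are chosen at random, the cluster numbering can differ between seeds; only the grouping itself is meaningful.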

Results and Discussion
Tests were conducted to find associations between diseases and medicines in the patient prescription datasets; the 2015 and 2016 datasets served as material for analyzing and identifying patients' disease patterns. In the first step, the patient prescription datasets were clustered with the k-means algorithm. The 2015 dataset consists of 651,378 prescriptions, 12,015 patients and 1,945 medicines, while the 2016 dataset consists of 965,152 prescriptions, 18,416 patients and 1,835 medicines. To cluster the diseases, we used three variables: age, gender and disease. Gender has two values (male and female), age has five groups (infant, toddler, child, adult and elderly), and the disease variable uses ICD-10 codes covering 21,591 kinds of diseases. Table 1 shows the clusters and the number of instances, and Table 2 shows the cluster attributes for the 2015 patient prescription dataset, which consists of 10 clusters.

Based on the 10 highest-ranking disease clusters in the 2016 dataset, we used association rules to find relationships between diseases and related medicines by forming binary matrices in which columns represent medicines, rows represent the 10 highest-ranking diseases, and each cell is 0 or 1. We did not consider the dosage or the way the medicine is used, because doses vary. Table 5 shows the support, confidence and lift of the association rules between the top ten diseases (antecedent) and related medicines (consequent) with a minimum support of 20% and a minimum confidence of 65%. Confidence and lift values can be used to assess association rules, and medicines with high confidence and lift values are strongly related to a disease. For example, for the disease with ICD code H26.9 (unspecified cataract), the rule to Ciprofloxacin 500 Mg Tablet has high confidence and support, meaning that Ciprofloxacin 500 Mg Tablet is the medicine most commonly used for unspecified cataract; however, its lift value is relatively low. This implies that Ciprofloxacin 500 Mg Tablet is also frequently used for other diseases. In contrast, Polidemisin Eye Drop has a high lift value, which means that it is a specific remedy for unspecified cataract.
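The disease-medicine analysis above can be sketched as follows, assuming (for illustration only) that each row of the binary matrix is one prescription labelled with its diagnosis code and each column is a medicine. The function names, the disease codes `D1`/`D2` and the medicine names `M1` to `M3` are hypothetical toy data, not values from this study; the default thresholds mirror the 20% minimum support and 65% minimum confidence used above.

```python
def binary_matrix(prescriptions, medicines):
    """One 0/1 row per prescription; column j is 1 if medicines[j] was prescribed."""
    return [[1 if m in meds else 0 for m in medicines] for _, meds in prescriptions]


def disease_medicine_rules(prescriptions, min_support=0.20, min_confidence=0.65):
    """Rules disease -> medicine as (disease, medicine, support, confidence, lift),
    keeping only rules that meet both thresholds."""
    medicines = sorted({m for _, meds in prescriptions for m in meds})
    matrix = binary_matrix(prescriptions, medicines)
    diseases = [d for d, _ in prescriptions]
    n = len(prescriptions)
    rules = []
    for d in sorted(set(diseases)):
        rows_d = [row for row, dis in zip(matrix, diseases) if dis == d]
        for j, m in enumerate(medicines):
            both = sum(row[j] for row in rows_d)       # prescriptions with d and m
            if both == 0:
                continue
            med_total = sum(row[j] for row in matrix)  # all prescriptions with m
            support = both / n
            confidence = both / len(rows_d)
            lift = confidence / (med_total / n)
            if support >= min_support and confidence >= min_confidence:
                rules.append((d, m, support, confidence, lift))
    return rules


if __name__ == "__main__":
    # Hypothetical toy prescriptions: (diagnosis code, prescribed medicines).
    toy = [
        ("D1", ["M1", "M2"]),
        ("D1", ["M1", "M2"]),
        ("D1", ["M1"]),
        ("D2", ["M1"]),
        ("D2", ["M3"]),
    ]
    for rule in disease_medicine_rules(toy):
        print(rule)
```

On this toy data, D1 → M1 has the higher confidence (1.0) but a lift of only 1.25, because M1 is also prescribed for D2, while D1 → M2 has lower confidence (about 0.67) but a higher lift (about 1.67), because M2 is prescribed only for D1. This mirrors the contrast drawn above between a widely used medicine and a disease-specific one.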

Conclusion
In this study, the Apriori algorithm was applied to extract useful information from patient prescription databases sourced from two different hospitals. We used association rules to find the relationship between diseases and related medicines based on disease clusters produced by the k-means algorithm. The test results show that k-means accurately clusters the 10 dominant diseases in the 2015 and 2016 patient prescription datasets, which significantly affects the Apriori algorithm and makes it more consistent in finding association rules between diseases and related medicines. The support, confidence and lift values between diseases and related medicines can serve as recommendations for appropriate medicine selection based on the patients' disease progression, so that medicine procurement in the hospital is more optimal.