Comparative analysis of machine learning techniques based on chronic kidney disease dataset

Kidney failure is a chronic disease that is becoming a very common health issue worldwide. It is a state in which the kidneys are damaged and cannot clean the blood as well as they should. The resulting build-up of excess fluid and waste can cause further health problems, and the condition takes a long time to diagnose. Moreover, because there are no early symptoms, the disease is often detected only at a late, critical stage. As chronic kidney disease is becoming a growing threat, research is being done on a large scale to predict its presence using machine learning. Machine learning plays a major role in healthcare, in tasks such as disease identification and diagnosis, drug discovery and manufacturing, and smart health records, using techniques such as Support Vector Machine, Decision Tree, Naïve Bayes and Random Forest. This paper comparatively analyzes the accuracy of existing techniques for predicting chronic kidney disease, based on data from various research papers. Furthermore, the study considers the different attributes used, drawn either from existing databases or from real-life databases, across multiple machine learning techniques. It concludes that working with real-life datasets, with all possible attributes taken into consideration, yields the most accurate prediction of the presence of chronic kidney disease using machine learning.


Introduction
The kidneys are an essential part of the body: we cannot survive without them, because they filter waste and excess fluid from the blood. Nowadays, however, kidney failure is becoming a very common health issue worldwide. It is a chronic disease, meaning that it is long-lasting and takes a long time to diagnose [1]. Chronic kidney disease is a state in which the kidneys are damaged and cannot clean the blood as well as they should. Excess fluid and waste may cause further health problems, and the condition is a threat all over the world [2]. As kidney disease progresses and causes irreversible damage, it may finally lead to kidney failure, which requires dialysis or a kidney transplant for survival. In India, 1 in 10 people is estimated to be suffering from chronic kidney disease. Early analysis can help prevent kidney failure. The best way to evaluate kidney function, or to estimate the stage of kidney disease, is to monitor the Glomerular Filtration Rate (GFR) regularly [3]. GFR is calculated using age, gender, race and blood creatinine level. Kidney disease is usually asymptomatic in its early stages and can go undiagnosed until it has advanced, so it is often referred to as a silent disease [4]. The stages are:
• Stage 1: Kidney function is normal when the calculated GFR is greater than 90 mL/min (GFR > 90 mL/min).
• Stage 2: If the estimated GFR is between 60 and 89 mL/min, a minor loss of kidney function has started.
• Stage 3: If the calculated GFR is between 30 and 59 mL/min, a moderate loss of kidney function is indicated.
• Stage 4: If the estimated GFR is between 15 and 29 mL/min, a major loss of kidney function has started.
• Stage 5: If the calculated GFR is less than 15 mL/min (GFR < 15 mL/min), this is the last stage of kidney failure, which needs dialysis or a kidney transplant for survival.
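The staging rule above can be sketched as a small lookup function. This is only an illustration: the function name `ckd_stage` is hypothetical, GFR is assumed to be given in mL/min, and the boundaries at 90, 60, 30 and 15 are assumed to be inclusive at the lower end of each stage, which the text does not specify.

```python
def ckd_stage(gfr):
    """Return the CKD stage (1 to 5) for a GFR value given in mL/min.

    Assumption: each stage includes its lower boundary
    (for example, a GFR of exactly 60 is treated as stage 2).
    """
    if gfr < 0:
        raise ValueError("GFR cannot be negative")
    if gfr >= 90:
        return 1  # normal kidney function
    if gfr >= 60:
        return 2  # minor loss of kidney function
    if gfr >= 30:
        return 3  # intermediate stage
    if gfr >= 15:
        return 4  # major loss of kidney function
    return 5      # kidney failure: dialysis or transplant needed


# Example readings (made-up values, not patient data).
for reading in (95, 75, 45, 20, 10):
    print(reading, "-> stage", ckd_stage(reading))
```

A threshold cascade like this keeps the stages mutually exclusive, so every non-negative GFR value maps to exactly one stage.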
Dialysis: When about 90% of kidney function has failed, an artificial process is required to clean the blood of excess fluid. There are two such processes. When an artificial kidney is used to remove waste and excess fluid from the blood, the process is called hemodialysis [5]; access to the blood is typically gained through a fistula in the arm. When a tube is fitted in the belly so that the blood is purified inside the body, the process is called peritoneal dialysis. Both processes are performed under a doctor's supervision.

End-Stage Renal Disease (ESRD):
The stage at which dialysis or transplantation is the only option for treating complete and permanent kidney failure.

Glomerular Filtration Rate (GFR):
The rate at which the kidneys filter out waste and surplus fluid; calculated in millilitres per minute.
Proteinuria:
When urine contains unusually high levels of protein, such as albumin.

Although kidney disease is asymptomatic in its early stages, chronic kidney failure can be prevented if the disease is diagnosed early and treated with appropriate medical care [6]. If an expert system can predict kidney disease by examining patient data such as blood and urine test results [7], it gives doctors enough time to treat the patient. In the current scenario, machine learning algorithms are playing an important role in the healthcare sector and are delivering accurate results.

Machine learning
Machine learning (ML) is a prominent area within artificial intelligence. It focuses on developing systems that behave intelligently, and it has had the largest real-life impact on business. The main function of ML is to enable a machine to learn by itself without being explicitly programmed: its algorithms let a program learn, grow and change on its own when exposed to new data. A computer program is said to learn if its measurable performance on an assigned task improves as it gains more experience; more informally, machine learning is at work whenever a machine makes decisions and predictions from data. Classification, regression and clustering are the main types of problem solved through machine learning [9]. The choice of technique depends mainly on the type and category of data provided to the model. The available approaches are supervised, unsupervised, semi-supervised and reinforcement learning.

Supervised Learning
It is the simplest type of learning method. When the labelled dataset acts as a teacher that guides the model during training, the approach is called supervised learning [10]. Once trained in this way, the model can make predictions and decisions on new data.
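As a minimal sketch of this idea, the example below uses a nearest-neighbour classifier, chosen here only because it fits in a few lines; it is not the method of any particular reviewed study. The function name, the two features and all values are made up for illustration.

```python
import math


def nearest_neighbour_predict(train_features, train_labels, query):
    """Predict the label of `query` as the label of its closest training example."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_features, train_labels):
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy, made-up training data: (serum creatinine, blood urea) with known labels.
# The labels play the role of the "teacher" during training.
train_features = [(0.9, 30.0), (1.0, 35.0), (4.5, 120.0), (6.0, 150.0)]
train_labels = ["notckd", "notckd", "ckd", "ckd"]

print(nearest_neighbour_predict(train_features, train_labels, (5.0, 130.0)))
print(nearest_neighbour_predict(train_features, train_labels, (1.1, 32.0)))
```

In practice the features would usually be scaled first, since an attribute with a large numeric range otherwise dominates the distance.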

Unsupervised learning
When the model tries to find patterns in the given dataset on its own, that is, when it learns from its observations and discovers the structure in the data without being given labels, the approach is called unsupervised learning [11][12].
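One way to picture this is clustering. The sketch below uses k-means (Lloyd's algorithm) as an illustrative stand-in; the function name `k_means`, the starting centroids and the data points are all assumptions made for the example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Group points around the given starting centroids (Lloyd's algorithm)."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Made-up, unlabelled points that form two visible groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
assignment, centroids = k_means(points, centroids=[points[0], points[3]])
print(assignment)
```

No labels are supplied anywhere: the grouping comes only from the distances between the points.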

Semi supervised learning
As the name suggests, it lies between supervised and unsupervised learning. In practice, labelling data can be very expensive because it requires skilled people, so in many cases only a few labels are available and most are absent. Semi-supervised learning is designed to work with such data, where only a small part is labelled.

Reinforcement
This type of learning is quite different from supervised and unsupervised learning. It is built around a feedback loop between an agent and an environment [13]. The agent is connected to the environment through a set of actions. A video game is a good example of reinforcement learning. Reinforcement learning goes through the following steps:
• The agent observes the input state.
• The agent performs an action chosen by its decision-making function.
• Once the action is performed, the agent receives reinforcement from the environment.
• This state-action information is stored for further use.
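The four steps above can be sketched with tabular Q-learning on a toy corridor, where the agent starts at the left end and is rewarded for reaching the right end. Q-learning is used here only as one common instance of the loop; the environment, the function name `train_corridor_agent` and all parameter values are assumptions made for illustration.

```python
import random


def train_corridor_agent(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                         epsilon=0.2, seed=0):
    """Learn a greedy policy for walking along a corridor to the goal."""
    rng = random.Random(seed)
    actions = (-1, +1)  # step left or step right
    goal = n_states - 1
    # Stored state-action information (the Q-table).
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}

    def choose_action(state):
        """Decision-making function: epsilon-greedy with random tie-breaking."""
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    for _ in range(episodes):
        state = 0  # step 1: the agent observes the input state
        while state != goal:
            action = choose_action(state)  # step 2: the agent performs an action
            next_state = min(max(state + action, 0), goal)
            # Step 3: the environment returns the reinforcement (reward).
            reward = 1.0 if next_state == goal else 0.0
            best_next = max(q[(next_state, a)] for a in actions)
            # Step 4: the state-action information is stored for further use.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state

    # Greedy action for every non-goal state.
    return [max(actions, key=lambda a: q[(s, a)]) for s in range(goal)]


policy = train_corridor_agent()
print(policy)
```

After training, the learned policy should prefer stepping right (`+1`) in every state, since that is the only way to reach the reward.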

Machine learning in healthcare
New technological inventions are giving people the opportunity to develop systems that help doctors detect chronic disease at an early stage and cure it, and machine learning is performing remarkably well in the healthcare system [5] [14]. There are many applications of machine learning in the healthcare sector, such as disease identification and diagnosis, drug discovery and manufacturing, medical imaging diagnosis, smart health records and better radiotherapy [8]. There are many chronic diseases, such as cancer, lung disease, heart disease and kidney failure. Because they show no early symptoms, people come to know of them only at the end stage; here machine learning offers many ways to detect the problem early and treat it.

Requirement of Decision Support in Healthcare
Many people die every year because the healthcare system is not able to detect their disease early. Work on health information technology has suggested a few strategies, such as association, significant consumer selection of clinician society, and IT adoption [15].

Decision Support
A health protection system based on machine learning depends on the computer's large computing capacity and on the doctor's reasoning ability [13]. Doctors and machines both search for patterns, but a doctor cannot monitor the heartbeat of every patient as efficiently as a machine can. The machine therefore performs these tasks and presents the outcome to the doctor for confirmation. A decision support system in healthcare helps in early disease detection by keeping doctors aware of health issues and of each patient's background [12]. It keeps track of insurance policies, refund details, charges, accounts receivable and accounts payable. The system also helps identify the patient's condition, so that the doctor can then advise when and how to use a drug to treat the disease.

Literature review
This table presents the work done on predicting chronic kidney disease using machine learning. It examines how dataset collection (open-source or real-life data) and pre-processing affect the accuracy achieved in each study.

The study presented a model for identifying chronic kidney disease in patients using a pre-existing patient dataset, which is cleaned to achieve a better result before being provided to the model.
The dataset was downloaded from the UCI repository: 400 patient records with 25 attributes.
Attributes used include red blood cell, blood pressure, white blood cell, etc.
The raw data needs to be cleaned so that the model can use it properly; for example, missing values are handled by the WEKA function "Replace Missing Value", with NA, or by ...
The accuracy achieved by Decision Tree is 91.75% and by SVM is 96.7%.
Calculation time is low.
Limitation: the dataset is not very strong, because of its size and missing attributes.

The purpose of the study is to develop a CKDPS system using a different approach, namely finding the correlation between input parameters; for example, creatinine and urea have a strong relationship.
On the basis of urine or blood tests, the study tries to predict CKD by finding correlations between attributes, so as to remove redundancy and noise from the data and finally provide only correlated data to the model for better prediction results.
The machine learning classifiers used in this study are regression tree, support vector machine, logistic regression and multi-layer perceptron neural network.
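The two pre-processing ideas mentioned in these studies, replacing missing values and checking the correlation between attributes, can be sketched as follows. This is an illustration under stated assumptions: mean imputation is used as one common choice (the studies' exact settings are not given here), the function names are hypothetical, and the creatinine and urea values are made up.

```python
import math


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up attribute columns; None marks a missing measurement.
creatinine = [0.8, 1.2, None, 4.0, 6.5]
urea = [25.0, 40.0, 60.0, 110.0, 160.0]

creatinine_filled = impute_mean(creatinine)
r = pearson(creatinine_filled, urea)
print(creatinine_filled)
print(round(r, 3))
```

A coefficient close to 1 would indicate that the two attributes carry largely overlapping information, which is the kind of relationship the second study looks for between creatinine and urea.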

Accuracy analysis of various techniques
The following comparison shows the accuracy of various techniques that use different attributes, drawn either from a pre-existing database or from a real-life dataset, to predict the presence of chronic kidney disease in an individual.
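For reference, accuracy in comparisons of this kind is usually the fraction of records whose predicted class matches the true class. The sketch below assumes that definition; the function name, the labels and the two sets of predictions are made up and do not come from any of the reviewed studies.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels and predictions from two hypothetical classifiers.
true_labels = ["ckd", "ckd", "notckd", "notckd", "ckd"]
predictions = {
    "model_a": ["ckd", "notckd", "notckd", "notckd", "ckd"],
    "model_b": ["ckd", "ckd", "notckd", "notckd", "ckd"],
}

scores = {name: accuracy(true_labels, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Accuracy alone can be misleading when one class is much more frequent than the other, so it is often read alongside other measures.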
• Figure 2 shows the accuracy comparison of the SVM and DT algorithms with 14 attributes of 400 patient records downloaded from the UCI Repository [18]. SVM provides the best result with 96.7% accuracy, but the dataset is not very strong because of its size and missing attributes.
• Figure 3 shows the accuracy comparison of the LOGR, SVM, MLP and RPART algorithms with 24 attributes of a dataset downloaded from the UCI Repository [19]. MLP gives the best result with 99.5% accuracy, but the approach relies only on the correlation between input parameters to get the desired result (for example, creatinine and urea have a strong relationship), and the values of most attributes are neglected to achieve better accuracy.
• Figure 4 shows the accuracy comparison of the SVM, DT and RF algorithms with 25 attributes of a dataset downloaded from the UCI Repository. RF gives the best result with 99.16% accuracy [20].
Figure 4. Accuracy comparison of SVM, DT and RF techniques based on 25 attributes of UCI Repository dataset.
• Figure 5 shows the accuracy comparison of the DT, NB, KNN and RNN algorithms with 43400 records, structured as well as unstructured, collected from hospitals [21]. RNN gives the best result with 97.62% accuracy. This may be time-consuming, so for better results in future only MRI scans or X-rays could be used.
• Figure 6 shows the accuracy comparison of SVM, DT