Performance analysis of machine learning algorithms: Single Model vs. Ensemble Model

Machine Learning is a branch of Artificial Intelligence in which a model is trained on data to predict naturally occurring events and is then evaluated on unseen data. This paper analyzes the performance of single and ensemble machine learning algorithms on the Cleveland Heart Disease data set. The experimental study shows that both the accuracy score and the area under the ROC curve of the ensemble machine learning model are higher than those of the single machine learning model in predicting non-CVD and CVD patients.


Introduction
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." Generally there are six steps to train/make a machine learn: Deciding/Selecting the trainingexperience, Deciding/Selecting the function to map the target, Deciding/Selecting a way in which a function of target can be represented, Deciding/Selecting a function approximation algorithm, Estimating training values and Adjusting the weights.By following these steps the machine can learn by three different ways: Supervised, Unsupervised and Reinforcement.Supervised learning happens in the case of labelled input and output data whereas in unsupervised learning category of input and output data is not known.Reinforcement learning is a reward/feedback based learning.
The purpose of automating a system is to train a machine, or to develop a model, that solves real-life problems effectively and quickly. Broadly, such problems can be categorized according to the learning approach that is followed.
1.1 Types of Learning in ML

While training a system/machine/computer, different kinds of learning are followed, based on the approach to prediction/classification. The learnings are:

Concept learning: This learning can be applied to boolean-valued outcome data. In the process of learning, a potential hypothesis (a boolean-valued function mapping input data to output data) that best fits the training data is acquired. Find-S does not consider negative examples (no-outcome examples) while training; List-Then-Eliminate generates a Version Space (VS: the set of consistent hypotheses) in random order and works on a finite hypothesis space H; Candidate-Elimination (CDE) works on the principle of List-Then-Eliminate and produces an ordered VS, from the most general hypothesis to the most specific one. None of these three algorithms considers any bias; they use only conjunctions (∧) of attributes and choose only the single best-fit hypothesis.
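As a concrete illustration of concept learning, the following is a minimal sketch of Find-S in Python; the toy weather data and the function name are hypothetical and for illustration only.

```python
def find_s(examples):
    """Find-S: return the most specific hypothesis consistent with the
    positive training examples; negative examples are simply ignored."""
    positives = [x for x, label in examples if label]
    h = list(positives[0])                      # start from the first positive example
    for x in positives[1:]:
        for i, (hi, xi) in enumerate(zip(h, x)):
            if hi != xi:
                h[i] = "?"                      # generalize attributes that disagree
    return h

# Hypothetical toy data: (attribute tuple, positive outcome?)
data = [
    (("sunny", "warm", "normal", "strong"), True),
    (("sunny", "warm", "high", "strong"), True),
    (("rainy", "cold", "high", "strong"), False),   # ignored by Find-S
]
print(find_s(data))   # -> ['sunny', 'warm', '?', 'strong']
```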
Introducing bias into concept learning leads to inductive learning. For example, a Decision Tree (DT) follows inductive learning and generates a flexible model in which the best-fit hypothesis is a combination of two or more hypotheses. These hypotheses are joined by disjunction (∨), which gives better flexibility in prediction [3]. Bayesian learning: in this approach to prediction, hypotheses follow a probability distribution, and Bayesian methods classify new cases/instances by a weighted average of the probabilities of the hypotheses, taking their posterior probabilities as weights. Algorithms include Bayes theorem, the Bayes Optimal Classifier, the Naïve Bayes Classifier (NBC), the Brute-Force MAP learning algorithm, etc.
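Stated in standard notation (a reference sketch, not taken from the paper), the posterior-weighted average described above is the Bayes Optimal Classifier, which assigns the class value v̂ maximizing the weighted sum over hypotheses h ∈ H given training data D:

```latex
\hat{v} = \operatorname*{arg\,max}_{v_j \in V} \; \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)
```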
Using Bayesian learning, we can derive the most probable hypothesis for Find-S and CDE [4]. Computational learning: This approach has two types of learning: • PAC learning: Probably Approximately Correct learning, in which the sample complexity, computational complexity, and mistake bound of a learner (machine learning algorithm) can be calculated, under the assumption that the hypothesis space H is finite and consistent, with a training error of zero and a test error smaller than ɛ (a finite value of allowed prediction error).
• Agnostic learning: When H is finite but inconsistent, PAC learning leads to agnostic learning, where the training error may not be zero in all cases [5].

Instance-based learning (IBL): This is memory-based learning. New instances are compared with the instances seen during training, and a new hypothesis is generated for each new query; this approach to prediction is therefore called local approximation. For example, the K-nearest neighbor (KNN) algorithm chooses the K nearest neighbors of each new input datum. Unlike DT and NBC, which train a model by global approximation so that the generated hypothesis remains fixed for all new input data, KNN does not generate any specific model [6].

1.2 Errors in ML

Error is a measure of how well the model performs over a given set of data. In supervised learning, the approximated target function is based on sample data and is therefore incomplete and noisy, so there is a difference between the expected and the predicted output, termed the prediction error. A loss/cost function such as the Mean Squared Error is applied to calculate the error in ML.
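For reference, the Mean Squared Error over n samples, with true outputs y_i and predictions ŷ_i, is:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)^2
```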
The error of any supervised machine learning algorithm comprises three parts:
• Bias error: This arises from the assumptions made while training the model. Bias produces an underfit model, one that neither performs well on the training data (training error ≠ 0) nor generalizes well to the test data (test error ≠ 0). This error may be due to insufficient input data and can be reduced by considering data complex enough to fit the true function correctly.
• Variance error: When the input data set is very large and contains much variance, the model learns this variance and incorporates variance error. A model with variance error is an overfit model, one that learns too much complexity while fitting the true function. The variation in the data may be due to noise. This error can be reduced by K-fold cross-validation and hyperparameter tuning.
• Noise: This is an error present in the input data, introduced by the system or by humans. This error cannot be removed, but it can be minimized by limiting human intervention while collecting the input data.
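A minimal sketch of both variance-reducing remedies using scikit-learn follows; a bundled toy data set stands in for the heart-disease data, and the model choice and parameter grid are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)   # stand-in data set

# K-fold cross-validation: the spread of fold scores hints at variance error
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=10)
print("mean=%.3f, std=%.3f" % (scores.mean(), scores.std()))

# Hyperparameter tuning: search K to balance underfitting and overfitting
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": [3, 5, 7, 9, 11]}, cv=10)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```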
We want the model to neither underfit nor overfit the data. The balance between the bias error and the variance error, called the bias-variance trade-off, produces a good-fit model [7].
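The trade-off is often summarized by the standard decomposition of the expected squared error of an estimator f̂ at a point x, where σ² denotes the irreducible noise; it is included here only as a reference:

```latex
\mathbb{E}\bigl[(y - \hat{f}(x))^2\bigr]
  = \mathrm{Bias}\bigl[\hat{f}(x)\bigr]^2 + \mathrm{Var}\bigl[\hat{f}(x)\bigr] + \sigma^2
```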

1.3 Ensemble Prediction Model
The ensemble approach suitably merges two or more individual single-classifier models to enhance performance. The merged classifiers are known as base classifiers/learners. Ensembling is done to build a more robust system that incorporates the predictions of all the base learners. Base learners can be Decision Tree, Naïve Bayes, Logistic Regression, KNN, SVM, etc. Can the efficiency of a prediction model be enhanced using an ensemble classifier approach? The following are the three approaches to obtaining an ensemble model:
• Bagging: Reduces variance and helps to avoid overfitting. In this approach, n base classifiers are merged in parallel to obtain a more stable and more accurate model; preferably, n should be odd. The training data set is randomly sampled into n sets of fixed size (bootstrap sampling, i.e., sampling with replacement) and fed in parallel to the n classifiers. The resultant class/category/label of the target/outcome variable is decided by averaging, weighted averaging, majority voting, etc. RandomForestClassifier is one example of a Bagging classifier.
• Boosting: Reduces bias and helps to avoid underfitting. In this approach, n weak learners are combined sequentially in such a way that the current base learner is more effective than the previous one. There are three types of boosting: Adaptive, Gradient, and XG-Boost.
• Stacking: To obtain better prediction results, the strengths of Bagging and Boosting are merged by stacking one layer of ML models onto another. The output of one layer serves as the input to the next layer and gets refined [8].
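A hedged sketch of the three approaches using scikit-learn's built-in implementations follows; the class choices and parameters are illustrative assumptions, not the paper's exact configuration.

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Bagging: n parallel base classifiers on bootstrap samples (n kept odd)
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=11)

# Boosting: weak learners combined sequentially (adaptive boosting shown)
boosting = AdaBoostClassifier(n_estimators=50)

# Stacking: one layer of models feeds a second, refining layer
stacking = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier()),
                ("rf", RandomForestClassifier())],
    final_estimator=LogisticRegression())
```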

Literature Survey
Three different approaches to classification have been observed in the literature: i) the single-classifier approach, ii) the homogeneous ensemble classification approach, and iii) the heterogeneous ensemble classification approach [9]. In the single-classifier approach, two or more classifiers are fed with the same training data set, consisting of extracted/reduced features (Figure 1). The performance of all the classifiers is evaluated on certain parameters: accuracy, sensitivity, ROC curve fitting, confusion matrix, etc., and the best-performing classifier is selected for prediction on the new data set.
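A minimal sketch of the evaluation parameters named above, computed with scikit-learn (the label arrays are hypothetical):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

y_true = [0, 1, 1, 0, 1, 1]   # hypothetical ground truth (1 = CVD)
y_pred = [0, 1, 0, 0, 1, 1]   # hypothetical classifier output

print(accuracy_score(y_true, y_pred))    # Accuracy
print(recall_score(y_true, y_pred))      # Sensitivity (recall on the CVD class)
print(confusion_matrix(y_true, y_pred))  # Confusion matrix
```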
From the literature review, it has been observed that the death rate due to heart disease is continuously increasing and that a single-classifier system is not sufficient to predict CAD; the recent trend is therefore toward hybrid systems, one approach being the ensemble [11]. An ensemble system can be homogeneous and/or heterogeneous.
In one homogeneous ensemble approach (Figure 2), the data set is sampled either horizontally (in parallel), as in Bagging and Boosting, or vertically, as in Attribute Bagging, and fed to a single learning algorithm. In another approach, different training parameters of the same learning algorithm are used to train the system [12,13,14].

The data set is downloaded from "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/" [15]. The data set is preprocessed and does not have any missing values; it contains roughly 300+ records of patients/potential patients. There are 76 features recorded in total, but the most-studied features are described below in Table 1.

Table 1: Attributes Description

KNN: K-nearest neighbor is a nonparametric, supervised, instance-based learning algorithm that can be applied to both classification and regression problems. Here K is the number of nearest neighbors of the unseen/new instance within the memorized training data set. The generally used distance metric is the Euclidean distance, d(x, x') = √(Σᵢ (xᵢ − x'ᵢ)²). In this algorithm the value of K should be odd, and the data should be clustered; KNN is not efficient on non-clustered data. The size of K also plays an important role in the efficacy of KNN: a very small K may cause KNN to underperform, and a very large K reduces its speed on large data sets [17]. With this knowledge, the KNN classifier has been developed using the following steps:
• Load the data file.
• Normalize the data, as KNN uses the Euclidean distance to find the K nearest neighbors.

• Divide the input data into training and test data.
• Create and apply the model.

• Find the accuracy of prediction.
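A minimal sketch of these steps with scikit-learn follows; the file name, column layout, and parameter values are assumptions about the Cleveland data file and may need adjusting.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load the data file (assumed name; '?' marks missing values in the raw file)
df = pd.read_csv("processed.cleveland.data", header=None, na_values="?").dropna()
X = df.iloc[:, :-1]
y = (df.iloc[:, -1] > 0).astype(int)        # 1 = CVD, 0 = non-CVD

# Normalize the data, since KNN relies on Euclidean distances
X = StandardScaler().fit_transform(X)

# Divide the input data into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Create and apply the model (K chosen odd)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Find the accuracy of prediction
print(accuracy_score(y_test, knn.predict(X_test)))
```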

Results and Analysis
From Table 2, it is concluded that the Bagging ensemble classifier with LogisticRegression has the highest prediction/classification accuracy on the Cleveland Heart Disease data set. Another metric, the area under the ROC curve (AUC), is used to compare the performance of the ensemble and single-model machine learning algorithms. Figure 3 shows the ROC curves and the AUC for the single-classifier models. Among DecisionTree, LogisticRegression, KNearestNeighbor, and SVM, the LogisticRegression single-classifier model has the largest area under the ROC curve, meaning its true positive rate is high compared to its false positive rate.

Figure 3: ROC - Single-Classifier Models
Figure 4: ROC - Ensemble Classifier Models

Similarly, from the ROC curves of Figure 4, it is visible that among VotingClassifier, BaggingClassifier, and RandomForest, the Voting ensemble classifier has the largest AUC and is thus the best-performing ensemble classifier.
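A hedged sketch of how such an AUC comparison could be produced, reusing the train/test split from the KNN sketch above (the model choices and settings are illustrative assumptions, not the paper's exact configuration):

```python
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import KNeighborsClassifier

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "Bagging(LogReg)": BaggingClassifier(LogisticRegression(max_iter=1000)),
    "RandomForest": RandomForestClassifier(random_state=42),
    "Voting": VotingClassifier(
        [("lr", LogisticRegression(max_iter=1000)),
         ("knn", KNeighborsClassifier(n_neighbors=5))],
        voting="soft"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```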

Conclusion
There are various machine learning algorithms for solving prediction/classification problems. Single-classifier prediction models may suffer from underfitting or overfitting, whereas ensemble models can overcome the weaknesses of a single model and generate a more stable and accurate model.
An ensemble classifier is one way to improve accuracy by combining the strengths of different algorithms while lessening their weaknesses.

Figure 2: Homogeneous Ensemble Classifier [8]

Table 2 represents the prediction accuracy of the different developed classifiers [16]: