Predicting risk factors for postoperative coronary artery bypass grafting using logistic regression and CHAID

Non-fatal postoperative complications constitute postoperative morbidity, which can affect a patient’s functional status and quality of life. Evaluating postoperative morbidity is a necessary step in assessing and improving the quality of patient care. Therefore, a method is required to predict the risk factors that contribute to a patient’s postoperative morbidity. The results can then be used to determine insurance premiums. In this research, logistic regression is used to identify the risk factors for complications in patients who have undergone Coronary Artery Bypass Grafting (CABG) surgery. We then use the CHAID method to classify readmission based on patient characteristics. Based on the two analyses, it can be concluded that the CHAID analysis supports the logistic regression analysis: two risk factors significantly influence the complications of patients after CABG, namely sex and ejection fraction.


Introduction
Based on data from the Sample Registration System (SRS) in 2014, coronary heart disease (CHD) was the leading cause of death, at 12.9 % [1]. One way to reduce deaths from CHD is to perform Coronary Artery Bypass Grafting (CABG). CABG is a procedure for treating CHD by overcoming coronary artery blockage and maximizing blood flow through the vessels [2].
Before a CABG procedure is performed, the preoperative risk factors need to be analyzed to determine the postoperative risk. The data used for this analysis are derived from the medical records of CABG patients [3]. Preoperative risk factors are needed to determine the morbidity that will occur postoperatively, and postoperative morbidity after CABG in turn determines the length of the patient's hospital stay. Therefore, a risk factor prediction model plays an important role in predicting the morbidity that will occur after CABG surgery.
Several studies have developed risk prediction models for CABG operations. In 2002, Hakala et al. predicted the risk of atrial fibrillation after CABG surgery in Finland using logistic regression and found that ejection fraction (EF) influences patient complications [4]. In 2003, Huijskes et al. studied risk prediction after CABG and valve surgery in the Netherlands using multivariate logistic regression and found that sex and ejection fraction influence patient complications [5]. Another study used logistic regression with forward stepwise selection to identify the risk factors arising after CABG surgery in England and found that renal dysfunction, unstable angina, ejection fraction, peripheral vascular disease, obesity, age, smoking, diabetes, priority, hypercholesterolaemia, and hypertension affect patient complications [6]. In 2009, Antunes et al. studied risk prediction after CABG operations in Portugal using logistic regression model development and bootstrap analysis and found that diabetes, obesity, and peripheral vascular disease affect patient complications [7]. In this study, risk factors for CABG surgery are predicted using the logistic regression and CHAID methods to inform patients about the risk factors that influence complications following a CABG operation.

Logistic regression model
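Logistic regression is a regression model used to analyze the relationship between predictor variables and a response variable, where the response variable has only two possible outcomes, namely success and failure, expressed by the values 1 and 0. The following is a general form of logistic regression [8]:
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i \tag{1} $$
or can be written as:
$$ y_i = \mathbf{x}_i' \boldsymbol{\beta} + \varepsilon_i \tag{2} $$
where $i = 1, 2, 3, \dots, n$, and $n$ denotes the number of observations, $\mathbf{x}_i' = [1, x_{i1}, x_{i2}, \dots, x_{ik}]$ and $\boldsymbol{\beta}' = [\beta_0, \beta_1, \beta_2, \dots, \beta_k]$. It is assumed that $Y_i$ has a Bernoulli distribution ($Y_i$ is the response variable) and has the following probability distribution (table 1).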
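Since $E(\varepsilon_i) = 0$, the expected value of the response variable is given as follows, where $\pi_i = P(Y_i = 1)$:
$$ E(Y_i) = 1 \cdot \pi_i + 0 \cdot (1 - \pi_i) = \pi_i \tag{3} $$
which means:
$$ E(Y_i) = \mathbf{x}_i' \boldsymbol{\beta} = \pi_i \tag{4} $$
Since $Y_i$ is a binary response variable, the value of $E(Y_i)$ has the following restriction:
$$ 0 \le E(Y_i) = \pi_i \le 1 \tag{5} $$
In this logistic regression, we use a link function in the form of the logit link, which is defined as follows:
$$ \eta_i = \ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \mathbf{x}_i' \boldsymbol{\beta} \tag{6} $$
so that
$$ \pi_i = \frac{\exp(\mathbf{x}_i' \boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i' \boldsymbol{\beta})} \tag{7} $$
This function guarantees that the expected value of the response variable stays within the interval $[0, 1]$, even though the linear predictor $\mathbf{x}_i' \boldsymbol{\beta}$ can take any value in $(-\infty, \infty)$. We use the maximum likelihood method to estimate the parameter vector $\boldsymbol{\beta}$. Each observation follows the Bernoulli distribution. Hence, the probability distribution of each observation is
$$ f_i(y_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}, \quad i = 1, 2, \dots, n \tag{8} $$
where the observation $y_i$ has a value of 0 or 1. Since the observations are mutually independent, the likelihood function is given as follows:
$$ L(y_1, y_2, \dots, y_n; \boldsymbol{\beta}) = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} \tag{9} $$
By substituting equation 7 into equation 9, the likelihood function of $\boldsymbol{\beta}$ is obtained as follows:
$$ L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \frac{\exp(y_i \mathbf{x}_i' \boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i' \boldsymbol{\beta})} \tag{10} $$
Therefore, the log-likelihood function is
$$ \ell(\boldsymbol{\beta}) = \ln L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \mathbf{x}_i' \boldsymbol{\beta} - \ln\left(1 + \exp(\mathbf{x}_i' \boldsymbol{\beta})\right) \right] \tag{11} $$
The next step is to take the derivative of equation 11 with respect to $\beta_0, \beta_1, \beta_2, \dots, \beta_k$. The resulting derivatives are set equal to zero to obtain the optimal estimates. Hence, we get the following general equation, with $x_{i0} = 1$:
$$ \frac{\partial \ell(\boldsymbol{\beta})}{\partial \beta_j} = \sum_{i=1}^{n} (y_i - \pi_i)\, x_{ij} = 0, \quad j = 0, 1, 2, \dots, k \tag{12} $$
Furthermore, the estimator $\hat{\boldsymbol{\beta}}$ is obtained by solving equation 12, but equation 12 is still in implicit form, so a numerical method is needed to solve it. In this research, the Newton-Raphson method is used. The principle of the Newton-Raphson method is to find $\hat{\boldsymbol{\beta}}$ that satisfies $f(\hat{\boldsymbol{\beta}}) = \mathbf{0}$, where $f(\boldsymbol{\beta})$ is the vector of first derivatives on the left-hand side of equation 12. Next, we take the second derivative of the log-likelihood function with respect to each pair of parameters, and the general equation can be written compactly as follows:
$$ \frac{\partial^2 \ell(\boldsymbol{\beta})}{\partial \beta_j\, \partial \beta_l} = -\sum_{i=1}^{n} x_{ij}\, x_{il}\, \pi_i (1 - \pi_i), \quad j, l = 0, 1, 2, \dots, k \tag{13} $$
Using equations 12 and 13, the iteration $\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} - \mathbf{H}(\boldsymbol{\beta}^{(t)})^{-1} f(\boldsymbol{\beta}^{(t)})$ is performed to find the estimated value of $\boldsymbol{\beta}$, where $\mathbf{H}$ is the matrix of second derivatives in equation 13. After obtaining the estimates $\hat{\beta}_j$, $j = 0, 1, 2, \dots, k$, the next step is to test the goodness of fit to find out whether the model obtained is suitable for use. The hypotheses used in this test are as follows: $H_0$: the model is suitable for use. $H_1$: the model is not suitable for use.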
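The Newton-Raphson iteration described above can be sketched in a few lines of code. The following is a minimal illustration only, not the implementation used in this study: the function names (`fit_logistic`, `solve`) and the toy data are hypothetical, and the sketch assumes the standard update in which the vector of first derivatives is multiplied by the inverse of the negative second-derivative matrix.

```python
import math


def sigmoid(z):
    """Logistic response function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(xs, ys, iterations=25, tol=1e-10):
    """Estimate logistic regression coefficients by Newton-Raphson.

    xs: list of predictor rows (without the leading 1); ys: list of 0/1 responses.
    Returns [beta_0, beta_1, ..., beta_k].
    """
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        pis = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in rows]
        # First derivatives of the log-likelihood: sum of (y_i - pi_i) * x_ij
        score = [sum((y - pi) * row[j] for y, pi, row in zip(ys, pis, rows))
                 for j in range(p)]
        # Negative second derivatives: sum of x_ij * x_il * pi_i * (1 - pi_i)
        info = [[sum(pi * (1.0 - pi) * row[j] * row[l] for pi, row in zip(pis, rows))
                 for l in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical toy data: one binary predictor, eight observations.
toy_x = [[0], [0], [0], [0], [1], [1], [1], [1]]
toy_y = [0, 0, 0, 1, 0, 1, 1, 1]
beta_hat = fit_logistic(toy_x, toy_y)
print(beta_hat)
```

With a single binary predictor, the estimates can be checked by hand: the intercept should equal the log-odds in the group with predictor value 0, and the slope should equal the log of the odds ratio between the two groups.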
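The test statistic used in this study is the deviance, which is formulated as follows (with the convention $0 \ln 0 = 0$):
$$ D = 2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{\hat{\pi}_i}\right) + (1 - y_i) \ln\left(\frac{1 - y_i}{1 - \hat{\pi}_i}\right) \right] \tag{14} $$
The decision rule is to reject $H_0$ if $D > \chi^2_{\alpha,(n-p)}$, where $p$ is the number of estimated parameters. After the deviance test, Wald tests are performed to determine the effect of each risk factor on postoperative readmission. The results of this test indicate whether a risk factor should be included in the logistic regression model [9].
The hypotheses used in this test are as follows: $H_0$: the risk factor has no influence on patient readmission ($\beta_j = 0$). $H_1$: the risk factor has an influence on patient readmission ($\beta_j \ne 0$).
The test statistic used in this test is:
$$ W_j = \left( \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)} \right)^2 \tag{15} $$
which under $H_0$ approximately follows a chi-square distribution with one degree of freedom. The parameters of the logistic regression model are interpreted through the odds ratio. The odds ratio measures how strongly a risk factor influences post-surgery patient readmission and is defined as follows [10]:
$$ O_R = \frac{\pi(1) / [1 - \pi(1)]}{\pi(0) / [1 - \pi(0)]} = e^{\beta_1} \tag{16} $$
where $\pi(x) = P(Y = 1 \mid X = x)$, $\beta_1$ is the coefficient of $X$, and $O_R$ represents the ratio between the odds of the binary response variable $Y$ at $X = 1$ and at $X = 0$ for a binary predictor variable $X$.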
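The three quantities discussed in this section (deviance, Wald statistic, and odds ratio) are simple to compute once fitted probabilities and coefficient estimates are available. The sketch below is illustrative only: the function names and all numerical inputs are hypothetical, the deviance is written for ungrouped 0/1 responses (where it reduces to minus twice the fitted log-likelihood), and the Wald statistic is assumed to be in its squared form.

```python
import math


def deviance(ys, fitted):
    """Deviance for ungrouped 0/1 responses.

    For y in {0, 1} the saturated log-likelihood is zero, so the deviance
    equals -2 times the log-likelihood of the fitted probabilities.
    """
    total = 0.0
    for y, pi in zip(ys, fitted):
        total += math.log(pi) if y == 1 else math.log(1.0 - pi)
    return -2.0 * total


def wald_statistic(beta_hat, std_error):
    """Squared ratio of a coefficient estimate to its standard error."""
    return (beta_hat / std_error) ** 2


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the odds of the event at X = 1 to the odds at X = 0."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical inputs, chosen only to exercise the functions.
print(deviance([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.1]))
print(wald_statistic(beta_hat=0.6, std_error=0.3))
print(odds_ratio(p_exposed=0.75, p_unexposed=0.25))
```

In practice the deviance would be compared with a chi-square critical value and the Wald statistic with a chi-square critical value on one degree of freedom; those quantiles are not computed here because they require a statistical library.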

Chi-square automatic interaction detection
The chi-square automatic interaction detection (CHAID) method was first introduced by Kass in 1980. It is a method for classifying categorical data by dividing a data set into sub-groups. In this study, the CHAID method is used to classify readmission based on patient characteristics. The results of the CHAID classification are displayed in a tree diagram. The CHAID method is divided into three stages [11], namely:
a. Merging Phase. In this phase, the significance of each category of the independent variables is examined with respect to the dependent variable. The first step is to form a two-way contingency table for each independent variable with the dependent variable. The chi-square statistic is then calculated for each pair of categories that can be chosen to be combined into one, using the following test statistic [12]:
$$ \chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \quad E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n} \tag{17} $$
where $n_{ij}$ is the observed count in row $i$ and column $j$ of the contingency table, $n_{i\cdot}$ and $n_{\cdot j}$ are the row and column totals, and $n$ is the total number of observations.
b. Splitting Phase. In this phase, the best node separator is selected. The risk factor variable that has the smallest $p$-value is used as the best node separator, and if no risk factor variable has a $p$-value smaller than $\alpha$, the separation is not performed [13].
c. Termination Phase. This phase is reached if there is no significant relationship between the risk factor variables and the response variable, or if the size of the child node is less than the specified minimum child-node size.
If categories of an explanatory variable are merged, a Bonferroni correction is applied according to the type of the independent variable. The Bonferroni correction is a correction applied when several statistical tests are carried out simultaneously. The Bonferroni multiplier of the independent type, for nominal original variables, is [14]:
$$ B = \sum_{i=0}^{r-1} (-1)^i \frac{(r - i)^c}{i!\,(r - i)!} \tag{18} $$
The Bonferroni multiplier of the monotonic type, for ordinal original variables, is:
$$ B = \binom{c - 1}{r - 1} \tag{19} $$
with $B$: the Bonferroni multiplier, $c$: the number of original categories, and $r$: the number of new categories.
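The two computations at the core of the merging phase, the Pearson chi-square statistic of a contingency table and the Bonferroni multipliers, can be sketched as follows. This is an illustration under stated assumptions rather than the software used in the study: the function names and the example table are hypothetical, and the multipliers follow the usual formulas attributed to Kass, with c original categories merged into r new categories.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way contingency table.

    table: list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def bonferroni_nominal(c, r):
    """Multiplier for a nominal predictor: ways to group c categories into r."""
    return sum(
        (-1) ** i * (r - i) ** c / (math.factorial(i) * math.factorial(r - i))
        for i in range(r)
    )


def bonferroni_ordinal(c, r):
    """Multiplier for an ordinal predictor: only adjacent categories may merge."""
    return math.comb(c - 1, r - 1)


# Hypothetical 2 x 2 table (rows: predictor category, columns: outcome).
example_table = [[10, 20], [20, 10]]
print(chi_square_statistic(example_table))

# Four original categories merged into two new categories.
print(bonferroni_nominal(4, 2), bonferroni_ordinal(4, 2))
```

The adjusted p-value of a merged predictor is obtained by multiplying its unadjusted p-value by the corresponding multiplier. The ordinal multiplier is never larger than the nominal one because fewer groupings are admissible when the category order must be preserved.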

Results and discussion
The data used in this study came from the European Society of Cardiology. In this study, the observed risk factors are used to construct a prediction model with logistic regression. Using the Newton-Raphson method, an estimate of the coefficient $\beta$ for each risk factor is obtained, as shown in figure 1.
After obtaining the estimated coefficients $\beta$ for the prediction model, the next step is to carry out the Wald test to determine the individual effect of each risk factor on the occurrence of complications after CABG. Based on the Wald values in figure 1, with $\alpha = 10\,\%$, three risk factors significantly influence complications after CABG, namely sex, NYHA, and EF. Thus, the prediction model is built from these three risk factors.
Figure 2. Classification tree with $\alpha = 10\,\%$ on CHAID method.
From the odds ratios in figure 1, we found that the odds of complications for men are 1.944 times those for women, that the odds for patients with the NYHA risk factor are 1.438 times those for patients without it, and that the odds ratio for EF is 0.982, which, being below 1, indicates slightly lower rather than higher odds. Then, the CHAID analysis is carried out with $\alpha = 10\,\%$. The result obtained from the CHAID analysis can be seen in figure 2.
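Because an odds ratio is the exponential of a logistic regression coefficient, the reported odds ratios can be converted back to the coefficients they imply. The short sketch below does this for the three values quoted above; it assumes that each reported odds ratio equals $e^{\beta_j}$, and the variable and function names are hypothetical. It does not recover the intercept, which cannot be derived from the odds ratios.

```python
import math

# Odds ratios as reported in the text for the three significant risk factors.
reported_odds_ratios = {"Sex": 1.944, "NYHA": 1.438, "EF": 0.982}


def implied_coefficient(odds_ratio):
    """Coefficient implied by an odds ratio, assuming odds ratio = exp(beta)."""
    return math.log(odds_ratio)


implied = {name: implied_coefficient(value)
           for name, value in reported_odds_ratios.items()}

for name, beta in implied.items():
    direction = "higher" if beta > 0 else "lower"
    print(f"{name}: beta is about {beta:.3f} ({direction} odds)")
```

Under this assumption the implied coefficients are roughly 0.665 for sex, 0.363 for NYHA, and -0.018 for EF. The negative sign for EF reflects its odds ratio being below 1.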
As shown in figure 2, the risk factor that has the strongest influence on the occurrence of morbidity is sex. Out of 388 respondents, around 25 % of male patients experienced postoperative morbidity, compared with 12.9 % of female patients. Among male patients, the tree then splits on EF at a cut-off value of 38: around 43.6 % of the patients in one EF group experienced morbidity, compared with around 20.1 % in the other.

Conclusion
Based on the logistic regression analysis of data on patients after CABG surgery from the European Society of Cardiology, three risk factors significantly influence patient complications after CABG, namely sex, NYHA, and ejection fraction. With the CHAID analysis, two risk factors significantly influence patient complications after CABG, namely sex and ejection fraction. Based on the data analysis carried out, it can be concluded that the CHAID analysis supports the logistic regression analysis in predicting risk factors for postoperative CABG.