Prediction model of exacerbations in patients with Chronic Obstructive Pulmonary Disease (COPD) at RSCM

Chronic Obstructive Pulmonary Disease (COPD) is a worldwide health problem. COPD is prone to exacerbations, which are acute worsenings of respiratory symptoms that result in additional therapy. Exacerbations in COPD increase the risk of death. The objective of this study is to determine a prediction model of exacerbations in patients with COPD at RSCM (Rumah Sakit Cipto Mangunkusumo), based on the factors affecting those exacerbations. The data used in this study are secondary data from the medical records of patients with COPD at RSCM. The sample of 107 patients with COPD was chosen using a purposive sampling technique. The method used is binary logistic regression analysis. The results indicate that the factors that significantly influence exacerbations of COPD are breathlessness, history of inhaled corticosteroid (ICS) use, and history of antibiotics use. An appropriate logistic regression model has been obtained. It indicates that patients with COPD who have breathlessness, a history of ICS use, and a history of antibiotics use are at higher risk of exacerbations than those who do not. An accuracy test was conducted using a classification table at a cutpoint of 0.5. The prediction model has an accuracy rate of 74.77 %.


Introduction
Chronic Obstructive Pulmonary Disease (COPD) is a common, preventable and treatable disease that is characterized by persistent respiratory symptoms and airflow limitation due to airway and/or alveolar abnormalities, usually caused by significant exposure to noxious particles or gases [1]. COPD is a worldwide health problem. According to the World Health Organization (WHO), an estimated 3.17 million deaths worldwide were caused by COPD in 2015. In Indonesia, the prevalence of COPD is 3.7 % [2]. COPD has two states: the stable state and the exacerbation state. COPD exacerbations are defined as an acute worsening of respiratory symptoms that results in additional therapy [1]. This worsening is characterized by an increase in dyspnea (shortness of breath), an increase in the amount of sputum, and a change in the color of sputum to yellow or green.
Several previous studies have examined COPD exacerbations. A study by Make et al. in 2015 suggested that the factors associated with short-term (6 months) risk of exacerbations in COPD were COPD maintenance medications, daily reliever use, the number of exacerbations during the previous year, the ratio of forced expiratory volume in 1 second (FEV1) to forced vital capacity (FVC), and gender [3]. These factors were used to develop a score called SCOPEX. Meanwhile, a study by Steer et al. in 2012 suggested that the factors that most significantly affect mortality in patients hospitalised with an exacerbation of COPD were dyspnea, eosinopenia, consolidation, acidaemia, and atrial fibrillation [4]. These factors were combined to form the DECAF score.
The characteristics of patients with COPD in other countries are suspected to differ from those in Indonesia. For this reason, this research was conducted in Indonesia, specifically at Cipto Mangunkusumo Hospital (RSCM). RSCM was chosen because it is a national referral hospital that functions not only in health services but also in health research and education.
The problem addressed in this study is how to determine a prediction model of exacerbations in patients with COPD at RSCM, based on the factors affecting those exacerbations. Accordingly, the objective of this study is to determine such a model.
The scope of this study is as follows: 1. This study uses medical records of RSCM from 2010 to 2017. 2. The patients studied are COPD patients who were inpatients or outpatients at the pulmonology clinic of RSCM.

Variables in the study
The variables in this study consist of a dependent variable and independent variables. The dependent variable is exacerbation, defined as an acute worsening of respiratory symptoms resulting in additional therapy. This variable consists of two categories, exacerbation and not exacerbation.
The independent variables in this study are selected based on previous studies. They are as follows:
(a). Gender. This variable is defined as the gender of COPD patients. It consists of two categories, female and male.
(b). Age. This variable is defined as the life duration of COPD patients counted from the patient's birth date until the patient is admitted to the hospital due to COPD, expressed in years.
(c). Body Mass Index (BMI). This variable is defined as an assessment of adult nutritional status that is measured by weight in kilograms (kg) divided by the square of the height in metres (kg/m²). Low BMI is associated with high mortality in COPD patients [5].
(d). Smoking history. This variable is defined as a habit of smoking cigarettes in patients with COPD. It consists of two categories, smoking and not smoking. Around 20-25 % of smokers are at risk of developing COPD or lung cancer [6].
(e). Eosinophil level. This variable is defined as the percentage of eosinophils in the blood of patients with COPD based on the blood test results.
(f). Neutrophil level. This variable is defined as the percentage of neutrophils in the blood of patients with COPD based on the blood test results.
(g). pCO2. This variable is defined as the partial pressure of carbon dioxide in the blood of patients with COPD based on blood gas analysis. It is expressed in mmHg.
(h). pO2. This variable is defined as the partial pressure of oxygen in the blood of patients with COPD based on blood gas analysis. It is expressed in mmHg.
(i). Oxygen saturation. This variable is defined as the percentage of oxygen molecules bonded with haemoglobin in patients with COPD.
(j). Breathlessness. This variable is defined as the condition of shortness of breath or difficulty breathing. It consists of two categories, breathless and not breathless.
(k). Sputum production. This variable is defined as a condition in which the COPD patient's airway is filled with phlegm or mucus. It consists of two categories, producing sputum and not producing sputum.
(l). History of bronchodilator use. A bronchodilator is a medicine that can dilate the bronchi and bronchioles, decrease resistance in the respiratory airway, and increase airflow to the lungs. This variable consists of two categories, having history and not having history.
(m). History of antimuscarinic use. This variable consists of two categories, having history and not having history.
(n). History of ICS (Inhaled Corticosteroid) use. This variable is defined as the history of using corticosteroid in inhalation form. This variable consists of two categories, having history and not having history. A study suggests that COPD patients treated with ICS experience significantly fewer exacerbations than patients on placebo [7].
(o). History of antibiotics use. Antibiotics are medicines that can destroy or slow down the growth of bacteria. This variable consists of two categories, having history and not having history.
(p). Cardiovascular-related comorbidities. This variable is defined as comorbidities associated with the heart and blood vessels that are suffered by patients with COPD. It consists of two categories, having and not having.
(q). Respiratory-related comorbidities. This variable is defined as comorbidities associated with the respiratory system that are suffered by patients with COPD. It consists of two categories, having and not having.

Experimental method
The population in this study consists of patients with COPD at RSCM who were listed in the medical records in 2010-2017. The sample is 107 COPD patients selected using purposive sampling. The data used are secondary data from the medical records of COPD patients at RSCM from 2010 to 2017. The method of data analysis is binary logistic regression.

Binary logistic regression model
Regression analysis is a statistical technique for investigating and modeling the relationship between variables [8]. One case of regression is binary logistic regression, an approach for analyzing the relationship between a binary (dichotomous) dependent variable and numerical or categorical independent variables. The dependent variable has only two possible outcomes, generically called success and failure and denoted by 1 and 0, respectively. The binary logistic regression model has the form:
$$\pi(\mathbf{x}_i) = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip})} \tag{1}$$
where $\pi(\mathbf{x}_i)$ is the probability of "success", $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are unknown parameters, $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is the set of independent variables, $i = 1, 2, \ldots, n$, and $n$ is the number of observations. If some of the independent variables are nominal, it is inappropriate to include them in the model as if they were interval scale variables. In this situation, the method of choice is to use design variables (dummy variables) [9]. In general, if a nominal scaled variable has $k$ possible values, then $k - 1$ dummy variables will be needed. Suppose that the $j$th independent variable $x_j$ has $k_j$ levels. The $k_j - 1$ dummy variables will be denoted as $D_{jl}$ and the coefficients for these dummy variables will be denoted as $\beta_{jl}$, $l = 1, 2, \ldots, k_j - 1$. Thus, the model with $p$ variables and the $j$th variable being discrete would be, in logit form:
$$\ln\left[\frac{\pi(\mathbf{x})}{1 - \pi(\mathbf{x})}\right] = \beta_0 + \beta_1 x_1 + \cdots + \sum_{l=1}^{k_j - 1} \beta_{jl} D_{jl} + \cdots + \beta_p x_p \tag{2}$$
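The model form and the dummy-variable coding can be illustrated with a minimal Python sketch (the study itself used R; this is not the authors' code). The function names, the level labels, and the coefficient values are hypothetical and serve only to show the mechanics.

```python
import math

def dummy_code(value, levels):
    """Encode a nominal value with k levels as k - 1 dummy variables.

    The first entry of `levels` is treated as the reference category
    (an assumption of this sketch), so it maps to all zeros.
    """
    return [1 if value == level else 0 for level in levels[1:]]

def logistic_probability(beta0, betas, x):
    """Return pi(x) = exp(eta) / (1 + exp(eta)), eta = beta0 + sum(beta_j * x_j)."""
    eta = beta0 + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))  # algebraically equal to exp(eta) / (1 + exp(eta))

# Hypothetical example: one two-level variable (1 dummy) plus one numerical variable.
levels = ["not breathless", "breathless"]
x = dummy_code("breathless", levels) + [64.0]
p = logistic_probability(-1.0, [0.8, 0.01], x)  # illustrative coefficients only
print(round(p, 3))
```

A variable with two categories yields a single 0/1 dummy, which is the case for every categorical variable in this study.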

Parameter estimation
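The unknown parameters $\boldsymbol{\beta}' = (\beta_0, \beta_1, \ldots, \beta_p)$ must be estimated to form a logistic regression model. The method used to estimate the parameters in logistic regression is the maximum likelihood method, which looks for the parameters $\boldsymbol{\beta}' = (\beta_0, \beta_1, \ldots, \beta_p)$ that maximize the likelihood function. Each observation follows the Bernoulli distribution. Since the observations are assumed to be independent, the likelihood function is obtained as the product of the probability functions of the observations as follows:
$$l(\boldsymbol{\beta}) = \prod_{i=1}^{n} \pi(\mathbf{x}_i)^{y_i} \left[1 - \pi(\mathbf{x}_i)\right]^{1 - y_i} \tag{3}$$
It is mathematically easier to maximize the likelihood function in logarithmic form. This expression, called the log likelihood, is defined as follows:
$$L(\boldsymbol{\beta}) = \ln\left[l(\boldsymbol{\beta})\right] = \sum_{i=1}^{n} \left\{ y_i \ln \pi(\mathbf{x}_i) + (1 - y_i) \ln\left[1 - \pi(\mathbf{x}_i)\right] \right\} \tag{4}$$
The value of $\boldsymbol{\beta}$ that maximizes $L(\boldsymbol{\beta})$ can be found by differentiating $L(\boldsymbol{\beta})$ with respect to the parameters $\beta_0, \beta_1, \ldots, \beta_p$ and setting the resulting expressions equal to zero. These equations are:
$$\sum_{i=1}^{n} \left[ y_i - \pi(\mathbf{x}_i) \right] = 0 \tag{5}$$
and
$$\sum_{i=1}^{n} x_{ij} \left[ y_i - \pi(\mathbf{x}_i) \right] = 0 \tag{6}$$
for $j = 1, 2, \ldots, p$. Equations 5 and 6 are nonlinear in the parameters $\beta_0, \beta_1, \ldots, \beta_p$, so they must be solved using an iterative method.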
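The text does not say which iterative method was used. As one common choice, the sketch below applies Newton-Raphson to the two likelihood equations for the simplest case of an intercept and a single independent variable. The function name and the toy data are hypothetical, and the code is in Python rather than the R software used in the study.

```python
import math

def fit_logistic_newton(x, y, max_iter=25, tol=1e-10):
    """Solve the likelihood equations for (beta0, beta1) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector: the left-hand sides of the two likelihood equations.
        u0 = sum(yi - p for yi, p in zip(y, pi))
        u1 = sum(xi * (yi - p) for xi, yi, p in zip(x, y, pi))
        # Information matrix (negative second derivatives of the log likelihood).
        w = [p * (1.0 - p) for p in pi]
        i00 = sum(w)
        i01 = sum(xi * wi for xi, wi in zip(x, w))
        i11 = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data (hypothetical): 3 of 10 "successes" when x = 0, 7 of 10 when x = 1.
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
b0, b1 = fit_logistic_newton(x, y)
print(b0, b1)
```

For a single binary independent variable the maximum likelihood estimates have a closed form (the intercept is the log odds in the group with x = 0, and the slope is the difference in log odds between the groups), which gives a convenient check on the iteration.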

Testing for significance of parameters
Testing for the significance of the parameters is performed to determine whether the independent variables in the model significantly affect the dependent variable. This test is performed both simultaneously and partially. The simultaneous test assesses whether the parameters jointly have a significant effect on the dependent variable. The hypotheses are $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$ and $H_1$: at least one $\beta_i \neq 0$, $i = 1, 2, \ldots, p$. The test statistic is $G$, formulated as
$$G = -2 \ln\left[\frac{\text{likelihood without the independent variables}}{\text{likelihood with the independent variables}}\right].$$
The partial test assesses the effect of each $\beta_i$ individually. The hypotheses are $H_0$: $\beta_i = 0$ and $H_1$: $\beta_i \neq 0$; $i = 1, 2, \ldots, p$. The test statistic is Wald ($W$), defined as $W = \hat{\beta}_i / SE(\hat{\beta}_i)$, where $\hat{\beta}_i$ is the estimator of $\beta_i$ and $SE(\hat{\beta}_i)$ is the standard error of $\hat{\beta}_i$.
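A minimal Python sketch of the two test statistics follows. The helper names and the toy probabilities are hypothetical. Comparing $W$ with a standard normal distribution is the usual convention and is assumed here, since the text does not state the reference distribution.

```python
import math

def log_likelihood(y, probs):
    """Bernoulli log likelihood of outcomes y given fitted probabilities."""
    return sum(yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
               for yi, p in zip(y, probs))

def likelihood_ratio_g(loglik_without, loglik_with):
    """G = -2 ln(L_without / L_with), computed from the two log likelihoods."""
    return -2.0 * (loglik_without - loglik_with)

def wald_statistic(beta_hat, se):
    """W = estimated coefficient divided by its standard error."""
    return beta_hat / se

def wald_p_value(w):
    """Two-sided p-value, assuming W is approximately standard normal."""
    return math.erfc(abs(w) / math.sqrt(2.0))

# Toy data (hypothetical): 3 of 10 "successes" when x = 0, 7 of 10 when x = 1.
y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
probs_without = [0.5] * 20            # intercept-only fit: overall proportion 10/20
probs_with = [0.3] * 10 + [0.7] * 10  # fit with x: the two group proportions
g = likelihood_ratio_g(log_likelihood(y, probs_without),
                       log_likelihood(y, probs_with))
print(round(g, 4))
```

Working with log likelihoods avoids forming the ratio of two very small likelihood values directly.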

Goodness-of-fit test
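The goodness-of-fit of the model is assessed to determine how well the model describes the dependent variable. This assessment uses the Hosmer and Lemeshow test. The hypotheses are $H_0$: the model fits the data and $H_1$: the model does not fit the data. The test statistic is $\hat{C}$, which in its standard form is formulated as
$$\hat{C} = \sum_{k=1}^{g} \frac{\left(o_k - n'_k \bar{\pi}_k\right)^2}{n'_k \bar{\pi}_k \left(1 - \bar{\pi}_k\right)}$$
where $g$ is the number of groups, $n'_k$ is the number of observations in the $k$th group, $o_k$ is the observed number of "successes" in the $k$th group, and $\bar{\pi}_k$ is the average estimated probability in the $k$th group.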
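The Hosmer and Lemeshow statistic can be sketched in Python as follows. This is a simplified illustration: observations are sorted by estimated probability and cut into groups of nearly equal size, whereas statistical packages may handle ties and group boundaries differently. The function name and the toy data are hypothetical.

```python
def hosmer_lemeshow_c(y, probs, groups=10):
    """Sum over groups of (observed - expected)^2 / (n * pbar * (1 - pbar))."""
    pairs = sorted(zip(probs, y), key=lambda t: t[0])
    n = len(pairs)
    c_hat = 0.0
    for k in range(groups):
        # Slice boundaries give groups of nearly equal size.
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        if not chunk:
            continue
        n_k = len(chunk)
        o_k = sum(yi for _, yi in chunk)        # observed "successes" in the group
        pi_bar = sum(p for p, _ in chunk) / n_k  # average estimated probability
        c_hat += (o_k - n_k * pi_bar) ** 2 / (n_k * pi_bar * (1.0 - pi_bar))
    return c_hat

# Toy data (hypothetical): two groups of five observations each.
probs = [0.2] * 5 + [0.8] * 5
y = [0, 0, 0, 1, 1] + [1, 1, 1, 1, 0]
c_hat = hosmer_lemeshow_c(y, probs, groups=2)
print(c_hat)
```

In the toy data the first group has 2 observed successes against 1 expected and the second group matches its expectation exactly, so only the first group contributes to the statistic.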

Interpretation of the coefficients
The coefficients in the logistic regression model are interpreted to give meaning to the parameter estimates of the independent variables. For categorical variables, the parameters are interpreted using the odds ratio. The odds is the ratio of the probability of a "success" to the probability of a "failure" in a category. The odds ratio ($OR$) is defined as the ratio of the odds for $x = 1$ to the odds for $x = 0$, and is expressed by $OR = e^{\beta_1}$. The interpretation of this odds ratio is that the odds of $y = 1$ at $x = 1$ are $e^{\beta_1}$ times the odds at $x = 0$. For numerical variables, the interpretation of the coefficient $\beta_1$ is that every time the independent variable increases by $c$ units, the odds of $y = 1$ are multiplied by $e^{c\beta_1}$.
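As a small worked illustration in Python, the odds ratio can be computed both from a coefficient and directly from two probabilities. The function names and all numbers are hypothetical and are not taken from the study's results.

```python
import math

def odds(p):
    """Odds of "success": probability of success divided by probability of failure."""
    return p / (1.0 - p)

def odds_ratio(beta, c=1.0):
    """Multiplicative change in the odds of y = 1 for a c-unit increase in x."""
    return math.exp(c * beta)

# Hypothetical probabilities of y = 1 at x = 0 and at x = 1.
p0, p1 = 0.3, 0.7
beta1 = math.log(odds(p1)) - math.log(odds(p0))  # coefficient implied by p0 and p1
print(odds_ratio(beta1), odds(p1) / odds(p0))
```

Both printed values agree up to floating-point rounding, because the coefficient of a 0/1 variable is the difference in log odds between its two categories.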

Classification table
The classification table is used to determine the accuracy of the prediction model. This table is the result of cross-classifying the dependent variable with a dichotomous variable whose values are derived from the estimated logistic probabilities $\hat{\pi}(\mathbf{x}_i)$ [9]. The derived dichotomous variable is obtained by defining a cutpoint $c$, $0 < c < 1$, and comparing each estimated probability to $c$. If the estimated probability exceeds $c$, then the derived variable is set equal to 1; otherwise it is set equal to 0. The most commonly used value for $c$ is 0.5. The classification table is presented in table 1.
Two useful summaries of predictive power are sensitivity $= P(\hat{y} = 1 \mid y = 1)$ and specificity $= P(\hat{y} = 0 \mid y = 0)$. Another summary of predictive power from the classification table is the overall proportion of correct classifications. This estimates
$$P(\text{correct classification}) = P(\hat{y} = 1 \mid y = 1)\,P(y = 1) + P(\hat{y} = 0 \mid y = 0)\,P(y = 0),$$
which is a weighted average of sensitivity and specificity [10].
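The classification summaries can be sketched in Python as follows. The function names are hypothetical. The four cell counts in the example are not stated in the text: they are an inference from the percentages reported in the conclusion (accuracy 74.77 %, specificity 81.63 %, sensitivity 68.97 %, with 107 patients) and should be read as an assumption.

```python
def cross_classify(y, probs, cutpoint=0.5):
    """Count (tp, fn, tn, fp) by comparing each estimated probability to the cutpoint."""
    tp = fn = tn = fp = 0
    for yi, p in zip(y, probs):
        y_hat = 1 if p > cutpoint else 0
        if yi == 1 and y_hat == 1:
            tp += 1
        elif yi == 1:
            fn += 1
        elif y_hat == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp

def classification_summary(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from the four cell counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Cell counts inferred from the reported percentages (assumption, see above).
sens, spec, acc = classification_summary(tp=40, fn=18, tn=40, fp=9)
# Accuracy as a weighted average of sensitivity and specificity.
weighted = sens * (58 / 107) + spec * (49 / 107)
print(round(100 * sens, 2), round(100 * spec, 2), round(100 * acc, 2))
```

The weights in the weighted average are the observed proportions of exacerbation and non-exacerbation cases, so a model can reach high accuracy while one of the two summaries stays comparatively low.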

Results and discussion
The descriptive analysis shows that the majority of patients with COPD at RSCM experience exacerbations, are male, have a smoking history, experience breathlessness, produce sputum, have a history of bronchodilator use, have a history of antimuscarinic use, have a history of ICS use, have a history of antibiotics use, have cardiovascular-related comorbidities, and have respiratory-related comorbidities. In addition, on average, they are 64 years old, have normal weight, have a normal eosinophil level, have a high neutrophil level, have a low pCO2 value, have a high pO2 value, and have low oxygen saturation.
The data obtained are analyzed using binary logistic regression analysis with R software version 3.3.3. In this study, 10 of the independent variables are categorical and are therefore coded as dummy variables. These are gender, smoking history, breathlessness, sputum production, history of bronchodilator use, history of antimuscarinic use, history of ICS use, history of antibiotics use, cardiovascular-related comorbidities, and respiratory-related comorbidities. Each of these variables has 2 categories, so each requires $2 - 1 = 1$ dummy variable. The parameters in the logistic regression are then estimated using the maximum likelihood method. The result is presented in table 2. The simultaneous test is performed to assess whether the parameters jointly have a significant effect on the dependent variable. At $\alpha = 0.05$, the p-value is $5.359945 \times 10^{-5}$. Because the p-value $< \alpha$, $H_0$ is rejected, so the independent variables simultaneously affect the exacerbations.
The partial test is performed to find out which independent variables are worth entering into the model. The results of this test are shown in table 2. At $\alpha = 0.05$, the independent variables that have a significant effect, that is, those with p-value $< 0.05$, are breathlessness, history of ICS (Inhaled Corticosteroid) use, and history of antibiotics use. The logistic regression model is formed using these three significant variables. In this model, $x_1$ represents the breathlessness variable, $x_2$ represents the history of ICS use variable, and $x_3$ represents the history of antibiotics use variable.
The goodness-of-fit test of the model is performed to determine whether there is a difference between the observations and the predictions. The resulting chi-square value is 6.144 with a p-value of 0.6311. Because the p-value $> \alpha = 0.05$, $H_0$ is not rejected, so it can be concluded that the model fits the data.
Since all significant independent variables are dichotomous, the interpretation uses the odds ratios shown in table 4.
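The reported p-value can be checked with a short Python sketch. The degrees of freedom are not stated in the text; the value 8 used below corresponds to the common default of 10 groups (degrees of freedom equal to the number of groups minus 2) and is an assumption. The function name is hypothetical, and the closed form applies only to even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1, built up term by term.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(6.144, 8)  # df = 8 is an assumption (10 groups)
print(round(p_value, 4))
```

Under this assumption the computed value agrees with the reported p-value of 0.6311 to four decimal places, which suggests the test was run with 10 groups.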
The percentage of classification accuracy is calculated as the ratio of the number of correctly classified observations to the total number of observations. At the cutpoint of 0.5, this gives an accuracy rate of 74.77 %.
Based on the logistic regression model and the interpretation of the odds ratios, patients with COPD who have breathlessness, a history of ICS use, and a history of antibiotics use are at higher risk of exacerbations than those who do not. Breathlessness generally indicates that the condition of the lungs is getting worse, and a worsening condition of the lungs indicates the occurrence of exacerbations. A study suggests that the degree of breathlessness affects mortality in COPD exacerbations [4]: patients with COPD exacerbations who have higher degrees of breathlessness also have a higher risk of mortality.
Corticosteroids are medicines that are commonly used to relieve symptoms of swelling, itching, and allergic reactions. ICS (Inhaled Corticosteroid) is a corticosteroid used in inhalation form. It is possible that the use of ICS comes with an increased risk of COPD exacerbations [11]. Patients with COPD use ICS because their COPD symptoms worsen, and the worsening of COPD symptoms leads to exacerbations.
Antibiotics are recommended for patients whose exacerbation symptoms indicate bacterial infection, namely increased sputum volume, sputum purulence, and increased breathlessness [12]. Unnecessary use of antibiotics, such as in patients who do not have increased purulent sputum, should be avoided because overuse of antibiotics may cause resistance. The antibiotics then fail to overcome the infection, which results in exacerbation.

Conclusion
This section contains the conclusions of the research and suggestions for further research. The limitations of this study are that the sample is small, at only 107 patients, and that it was selected using a non-probability sampling technique. In addition, this study does not include the results of spirometry tests as an independent variable due to inadequate data. Based on the results of data analysis using the binary logistic regression method, the factors that significantly influence exacerbations of COPD are breathlessness, history of ICS use, and history of antibiotics use. A prediction model of exacerbations in patients with COPD at RSCM, based on the factors affecting those exacerbations, has been obtained. It indicates that patients with COPD who have breathlessness, a history of ICS use, and a history of antibiotics use are at higher risk of exacerbations than those who do not. The accuracy of the model is calculated using a classification table at a cutpoint of 0.5. The accuracy rate of the prediction model is 74.77 %, the specificity is 81.63 %, and the sensitivity is 68.97 %.
Based on the limitations of this research, the suggestions for its further development are: 1. Future similar research is expected to use a larger sample so that the data describe the real situation better. 2. Future similar research is expected to consider other factors that may affect exacerbations in patients with COPD, such as spirometry test results.