Carbon dioxide emission prediction using support vector machine

This paper proposes a support vector machine (SVM) model for predicting carbon dioxide (CO2) emissions. Energy consumption, namely electrical energy and burning coal, directly increases CO2 emissions and was used as the input to build the model. Our objective is to monitor CO2 emissions based on the electrical energy and coal used in the production process. The data on electrical energy and coal consumption were obtained from the alcohol industry and used to train and test the model. The data were divided by cross-validation into 90% training data and 10% testing data. The optimal parameters of the SVM model were sought by a trial-and-error approach in which the C parameter and Epsilon were adjusted. The results show that the SVM model performs best with C = 0.1 and Epsilon = 0. The model error, measured by the Root Mean Square Error (RMSE), was 0.004. A smaller error indicates a more accurate prediction. In practice, this paper can help executive managers make effective decisions for business operations by monitoring CO2 emissions.


Introduction
Global warming has become a universal problem for all nations. The Intergovernmental Panel on Climate Change (IPCC) reported that scientists were more than 95% certain that most global warming is caused by increasing concentrations of greenhouse gases and other human (anthropogenic) activities [1]. The balance between the earth and the atmosphere is affected by an increase in carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), hydrofluorocarbons (HFC) and perfluorocarbons (PFC), more commonly known as greenhouse gases. In particular, CO2 is a major cause of global warming [2]. About eight billion tons of carbon per year are emitted globally in the form of CO2 through the burning of fossil fuels for transport and for the production of heat and electricity [3]. Carbon dioxide (CO2), a greenhouse gas, is a product of combustion, alongside water (H2O) and carbon monoxide (CO). A concept used as a reference in the measurement of CO2 emissions is the carbon footprint [4].
1 Corresponding Author: Chairul@uii.ac.id (Chairul Saleh) 2 Nurrachmandzakiyullah@gmail.com 3 jbayunugroho@gmail.com
Carbon footprint has become a widely used term and concept in the public debate on responsibility and abatement action against the threat of global climate change. Its public appearance has increased tremendously over the last few months and years, and it is now a buzzword widely used across the media, the government and the business world [5]. It is a measure of the total amount of carbon dioxide released into the atmosphere in a given time frame that is directly or indirectly caused by an activity to provide a service or product [6]. A carbon footprint is a measure of the amount of carbon dioxide emitted through the combustion of fossil fuels. In the case of a business organization, it is the amount of CO2 emitted either directly or indirectly as a result of its everyday operations. It might also reflect the fossil energy represented in a product or commodity reaching the market [7]. The carbon footprint of U.S. households is about 5 times greater than the global average, which is approximately 10 tons CO2e per household per year. For most U.S. households, the single most important action to reduce their carbon footprint is driving less or switching to a more efficient vehicle [8].
During the last decade, the growth of energy consumption and CO2 emissions has been more significant than industrial growth trends, which suggests that a reduction of energy consumption and CO2 emissions, even a slight one, need not cause a decrease in industrial growth. The industrial sector is the main contributor to total CO2 emissions in the world [9]. Considering the growth of the manufacturing sector, serious concern about carbon dioxide emissions is to be expected [10]. The use of energy is the greatest source of emissions. It represents about 65 percent of all greenhouse gas emissions, and the level of emissions is expected to rise if no adequate steps are taken. Coal and electricity have been the preferred forms of energy consumption and have consistently registered a higher growth rate than other forms of energy. Increased consumption of electrical power is closely linked to increased emission levels of CO2 [11]. Recently, much effort has been devoted to the complex economy-environment system and the related CO2 emission reduction issues [12]. To minimize CO2 emissions, a new manufacturing process is needed, better known as green manufacturing, which suits a sustainable development strategy [13]. Green manufacturing is an economically driven, system-wide and integrated approach to the reduction and elimination of all waste streams associated with the design, manufacture, use and disposal of products and materials [14]. The standards for green manufacturing include zero potential safety problems, zero health threats to operators and product users, zero environmental pollution, and as much waste recycling and waste disposal as possible during the production process [15].

Literature Review
Research on CO2 emissions is continually evolving. The qualitative and quantitative studies carried out so far are still under discussion, so the theories and methods for calculating CO2 emissions have not yet been unified [16]. Mitigation of carbon dioxide emissions is a future challenge in stabilizing global warming [17]. CO2 prediction using a computational intelligence approach was carried out in [18]. An adaptive neuro-fuzzy inference system (ANFIS) and a multi-layer perceptron artificial neural network (MLP-ANN) were developed to estimate CO2, and the proposed models demonstrate that both methods can solve the CO2 prediction problem. Another prediction method is the support vector machine (SVM). Recently, several applications of SVM have been reported for both classification and regression problems [19]. SVM is more accurate than semi-empirical equations for predicting the solubility of different solutes in supercritical carbon dioxide [20]. In addition, SVM implements the structural risk minimization principle [21]. SVM has also been applied successfully to the prediction of the CO2 exchange rate [22]. In that study, different tests were performed along the North Atlantic oceanic region with data obtained during 2009, and the proposed model demonstrates that SVM can solve the CO2 prediction problem. SVM is a powerful machine learning tool that can be used for time-series prediction [26].
SVM is a learning machine first developed by Vapnik in 1995 [23][24][25]. It is a learning system that uses a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias. SVM uses the concept of the ε-insensitive loss function and can be generalized to function approximation (regression), where it is known as support vector regression (SVR). Conceptually, SVM uses a single hyperplane in a high-dimensional space, so that a partition that is nonlinear in the original space can still be resolved. SVM considers two-class classification, with class labels y_i ∈ {+1, −1}. To separate the training patterns of the two classes, SVM finds the maximum-margin hyperplane. The hyperplane is given by the linear equation w · x + b = 0, where w is the normal to the hyperplane and b is the bias.
If the training data are linearly separable, there exists a pair (w, b) such that y_i (w · x_i + b) ≥ 1 for every training pattern x_i with class label y_i. If the classes are not linearly separable, a non-negative slack vector ξ = (ξ1, ..., ξm) is introduced such that y_i (w · x_i + b) ≥ 1 − ξ_i.
The studies reviewed above show that several methods predict carbon dioxide emissions well. This research predicts carbon dioxide emissions in the manufacturing industry, using the support vector machine approach as the prediction method.
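For reference, the ε-insensitive loss and the soft-margin regression problem solved by SVR are commonly written as below. This is the standard textbook formulation, offered on the assumption that it matches the usage in this paper; the symbols f, ξ_i^* and the sample count m are chosen here for illustration and are not taken from the source.

```latex
% epsilon-insensitive loss: errors smaller than epsilon cost nothing
L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\; |y - f(x)| - \varepsilon\bigr),
\qquad f(x) = w \cdot x + b

% soft-margin SVR: C trades off flatness of f against training errors
\min_{w,\, b,\, \xi,\, \xi^*} \; \frac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{m} \bigl(\xi_i + \xi_i^*\bigr)
\quad \text{subject to} \quad
\begin{cases}
y_i - f(x_i) \le \varepsilon + \xi_i, \\
f(x_i) - y_i \le \varepsilon + \xi_i^*, \\
\xi_i \ge 0, \;\; \xi_i^* \ge 0, \quad i = 1, \dots, m
\end{cases}
```

In this form, C weights the penalty on points that fall outside the ε-tube, and Epsilon = 0 means that every deviation from the target is penalized.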

Proposed SVM model for carbon dioxide emission prediction
This paper investigates CO2 emissions using an SVM prediction model that considers the energy consumption variables that give rise to CO2 emissions. In this study, we consider electrical energy and burning coal as the energy consumption that directly increases CO2 emissions [11]. The energy consumption data were collected from the alcohol industry in Yogyakarta, Indonesia. They were retrieved from historical records of the production process and comprise electrical energy in kilowatt-hours (kWh) and burned coal in kilograms (kg). The electrical energy and coal consumption data must be converted into CO2 emission values (kgCO2). The emission factor for electricity is the one set by Perusahaan Listrik Negara (PLN), the state-owned company that deals with all aspects of electricity in Indonesia, and is equal to 0.725 kg/kWh. The conversion is obtained by multiplying the amount of energy used by the calorific value or emission factor, as shown in Equation 6. The CO2 emission data are then obtained as the total CO2 emitted from electrical energy consumption and coal.
To design the SVM prediction model, training and testing data are needed. In this research, electrical energy consumption and coal consumption were selected as the input data and the CO2 emission as the output, so the CO2 emission is modelled as a function of these two inputs. The steps for designing the SVM prediction model are shown in Figure 1. Before the model is designed, the data must be pre-processed, a data mining technique that transforms raw data into an understandable and proper format.
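The conversion from energy consumption to CO2 emissions described above can be sketched as follows. The electricity factor of 0.725 kg/kWh comes from the text. The coal emission factor is not stated in this section, so `coal_factor_kg_per_kg` is a hypothetical parameter that must be supplied by the reader, and all function names are illustrative.

```python
# Emission factor for grid electricity quoted in the text (PLN), in kgCO2 per kWh.
ELECTRICITY_FACTOR_KG_PER_KWH = 0.725


def electricity_emission(kwh, factor=ELECTRICITY_FACTOR_KG_PER_KWH):
    """Return the kgCO2 attributed to `kwh` kilowatt-hours of electricity."""
    if kwh < 0:
        raise ValueError("electricity consumption must be non-negative")
    return kwh * factor


def coal_emission(coal_kg, coal_factor_kg_per_kg):
    """Return the kgCO2 attributed to `coal_kg` kilograms of burned coal.

    The coal factor is NOT given in the text; it is a hypothetical
    parameter that the caller has to provide.
    """
    if coal_kg < 0 or coal_factor_kg_per_kg < 0:
        raise ValueError("coal mass and emission factor must be non-negative")
    return coal_kg * coal_factor_kg_per_kg


def total_emission(kwh, coal_kg, coal_factor_kg_per_kg):
    """Total kgCO2 from electricity and coal, i.e. the model's output variable."""
    return electricity_emission(kwh) + coal_emission(coal_kg, coal_factor_kg_per_kg)
```

For example, `electricity_emission(1000)` multiplies 1000 kWh by 0.725 kg/kWh, giving about 725 kgCO2.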
The pre-processing step is normalization, which is needed to improve the generalization performance of SVM [27]. Then, to validate the model, cross-validation is used, dividing the dataset into training data (90%) and testing data (10%). The model proposed in this research uses the support vector machine algorithm as the learning method for CO2 emission prediction. The learning process uses Support Vector Regression (SVR). SVR is the part of the support vector machine family that is specialized in obtaining regression models by means of a change in the dimensionality of the data. The SVR concept is based on structural risk minimization, i.e. estimating a function by minimizing an upper bound on the generalization error, so that SVR is able to overcome overfitting. To find the optimal SVM model in the training process, the C parameter and Epsilon must be tuned. The root mean square error (RMSE) is used to evaluate each parameter setting [28]. The resulting RMSE indicates the accuracy of the prediction. Finally, the prediction results are analyzed.
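The pre-processing, validation and error-measurement steps above can be illustrated with a short standard-library sketch. The text does not say which normalization was used, so min-max scaling is an assumption here, as are the function names and the shuffling seed.

```python
import math
import random


def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

With `k=10`, each split trains on about 90% of the samples and tests on the remaining 10%, which corresponds to the 90/10 division described in the text.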

Experiment result
In this work, RapidMiner 5.3 was used to run the experiments for predicting CO2 emissions. The experiments were carried out on a computer with an AMD A8-5550M 2.1 GHz processor, 2 GB of RAM and Windows 10 as the operating system. Our objective is to monitor CO2 emissions from energy consumption by finding an accurate prediction with the lowest error. As mentioned before, cross-validation was used to split the data into equal parts, with 90% used as the training set and 10% as the testing set. The error was measured on each 90/10 split, and this was repeated for all 10 possible splits. To obtain the optimal parameters of the SVM model, SVR learning was performed with a trial-and-error approach during training. The aim was to find the best values of the C parameter and Epsilon, with the "dot" function used as the kernel type. We set the maximum number of iterations to 10,000. During training, C was varied from -0.1 to 1 and from 2 to 50, and the value of C with the smallest RMSE was selected. After C was obtained, Epsilon was varied over the range 0.1 to 1, and the best Epsilon was likewise selected by the smallest RMSE. The C parameter controls the penalty on training errors and helps to avoid overfitting and underfitting if it is properly selected. Overfitting arises when the training algorithm captures noise from the data, so that the model shows low bias but high variance, while underfitting arises when the training algorithm cannot accurately recognize the underlying trend of the data, so that the model shows high bias but low variance. Epsilon sets the width of the tube around the regression function within which training errors are ignored. Both parameters can be selected by the user [29]. Table 1 shows the optimal parameters of the SVM model for predicting CO2 emissions. The experimental results show that the optimal parameters of the SVM are C = 0.1 and Epsilon = 0.
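The trial-and-error procedure described above, in which C is chosen first by smallest RMSE and Epsilon is then chosen at that C, can be sketched as follows. This is a minimal illustration, not the RapidMiner workflow used in the experiments: `evaluate` is a hypothetical callable that would train an SVR and return its cross-validated RMSE, `toy_rmse` is a stand-in used only to make the sketch runnable, and the step sizes, the restriction to positive C values and the starting Epsilon are assumptions, since the text does not state them.

```python
def sequential_search(evaluate, c_values, eps_values, initial_eps):
    """Pick C by lowest error at a fixed Epsilon, then pick Epsilon at that C.

    `evaluate(c, eps)` must return the error (e.g. cross-validated RMSE)
    for one parameter pair; it is a hypothetical interface.
    """
    best_c = min(c_values, key=lambda c: evaluate(c, initial_eps))
    best_eps = min(eps_values, key=lambda eps: evaluate(best_c, eps))
    return best_c, best_eps, evaluate(best_c, best_eps)


def toy_rmse(c, eps):
    """Toy error surface standing in for a real train-and-validate routine."""
    return abs(c - 0.5) + abs(eps - 0.2)


# Assumed grids: 0.1 to 1 in steps of 0.1, then the integers 2 to 50 for C;
# 0.1 to 1 in steps of 0.1 for Epsilon.
c_grid = [round(0.1 * i, 1) for i in range(1, 11)] + list(range(2, 51))
eps_grid = [round(0.1 * i, 1) for i in range(1, 11)]

best_c, best_eps, best_error = sequential_search(
    toy_rmse, c_grid, eps_grid, initial_eps=0.1
)
```

Searching the two parameters one after the other needs far fewer model fits than a full grid over every (C, Epsilon) pair, at the cost of possibly missing a better combination when the two parameters interact.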
With the parameters in Table 1, the SVM model has an error value (RMSE) of 0.004. Using these parameters, we compared the actual data with the predicted CO2 emissions. In the last step, the output of the SVM model was analyzed. To check whether the prediction results follow a normal distribution, a statistical test was applied to the output of the trained SVM model. Several statistical tests can be used to test for normality, such as the Shapiro-Wilk test, the Lilliefors test, the D'Agostino-Pearson K2 test, the Jarque-Bera test, and the Anderson-Darling test [30]. The Shapiro-Wilk test was chosen because it is regarded as the most widely applicable method across types of distribution and sample. If the predicted values are normally distributed, this is taken as an indication that the prediction model has higher accuracy and precision. We set the significance level (α) at 0.05; if p < 0.05, the predicted values are not normally distributed. The result of the Shapiro-Wilk test is shown in Table 2. As can be seen from Table 2, the prediction data have a significance value of 0.104, which is greater than 0.05. Thus, the predicted values can be treated as normally distributed. Moreover, the histogram of the prediction data is shown in Figure 3. Statistical analysis of this kind is generally used to understand the probability distribution of the data, which can help in making the right decision. In Figure 3, the prediction results approximate a normal distribution, with the histogram centered over the true value, which indicates that the prediction data have higher accuracy. This suggests that the SVM model can handle nonlinear data and give more accurate predictions.

Conclusions and Further Work
A Support Vector Machine model was applied in this paper to predict CO2 emissions from energy consumption. The model is used to monitor the electrical energy and burning coal that affect the amount of CO2 emitted. A trial-and-error approach was applied to obtain a better prediction model with a lower error. The results show that the lowest error (RMSE) was 0.004, with optimal SVM parameters of 0.1 for C and 0 for Epsilon. Accurate prediction provides useful information about CO2 emissions. The main objective in this work was to achieve a low RMSE when designing the prediction model, since a lower RMSE corresponds to a more accurate prediction model. Monitoring energy consumption can help managers develop policies or make decisions that reduce the negative impact of the production process on the environment. In further research, the parameters of the SVM model could be selected automatically by integrating an optimization technique such as a genetic algorithm or particle swarm optimization.