The development of bank applications for debtors’ selection by using Naïve Bayes classifier technique

The purpose of this study is to create an application that automatically analyzes bank customer data with high accuracy. Such an application is needed because non-performing loans occur frequently as a result of inaccurate creditworthiness assessments by credit analysts. This can be seen in an incident at a public bank located in Bandung. The bank has no database for storing historical data, and its method of assessing creditworthiness is based merely on simple statistical analysis. This reduces the accuracy and speed of the decision-making process. This research applies the Naïve Bayes Classifier (NBC) method, a data mining technique, to help credit analysts select customers who are truly eligible for credit so that non-performing loans can be avoided. NBC calculates the probability of each class from each group of attributes and determines which class is the most probable. In a test on 501 records, the NBC predictions matched the decisions made by a credit analyst in 94% of cases. It can be concluded that this application is very helpful for credit analysts in recommending customers who are eligible for a loan to the bank's decision maker.


Introduction
In the banking world, one of the services offered to customers is lending, or credit. Credit is given to qualified customers in accordance with the terms set by the bank. However, because of the large number of customers applying for credit, credit analysts frequently make errors in selecting customers who are qualified to receive a loan [1,2]. As a result, individual as well as corporate customers are unable to repay their credit in a timely manner, a situation known as a non-performing loan. A loan is non-performing when the customer has been unable to make the minimum payment and it is overdue for more than three months [3]. This condition creates a need for a system that can make selections rapidly and predictions accurately, using existing information as a benchmark for granting credit to customers who are truly qualified [4,5].
In this research, an application is developed by applying data mining technology, which is able to find previously unknown patterns in data. Data mining is an activity that involves the collection and use of historical data in order to find regular patterns or relationships in large datasets [6,7]. The identification of data patterns is approached through conditional probability. The Naïve Bayes Classifier (NBC) is one of the classification techniques in data mining; it predicts future probabilities based on past experience and then finds a function that connects past data patterns with the desired output [7]. The method forms a classification model after a number of training characteristics are given, and the model determines the class that suits an analyzed sample. By using the NBC method, the application is expected to classify the data of customers who apply for credit on a large scale and to serve as a tool for credit analysts in determining whether or not a customer is qualified to receive a loan.
The NBC technique has been applied in other studies. Patil and Sherekar compared the evaluation results of the Naïve Bayes classifier with J48 in the context of bank defaulters [8]. The Naïve Bayes classifier showed good results; however, J48 was found to be more cost-efficient. Ginting and Trinanda found Naïve Bayes to be more accurate in book searching for library applications [9,10]. The list of books can be displayed based not only on title, category or author but also on the descriptions of the books. Bustami applied the Naïve Bayes algorithm to classify insurance customer data based on payment history [11] and used the result to determine future eligible customers.
The purpose of this study is to develop an application that applies the Naïve Bayes Classifier, a well-known data mining technique, to examine the creditworthiness of an individual or corporate debtor. With the help of this application, banks are expected to be able to make loan decisions more quickly and more accurately. The application can also reduce operational expenses by identifying the area of credit on which the bank should focus in the future. Only a few studies have addressed bank data analysis, particularly in Indonesia. Thus, the results of this study are expected to serve as a reference for other studies in the same field.

Classification
Data classification is a process that finds common properties among a set of objects within a database and classifies the objects into different classes according to an established classification model. The purpose of classification is to find, from a training set, a model that assigns objects to the appropriate categories or classes based on their attributes; the model is then used to classify objects whose class is previously unknown [6,7].

Naïve Bayes technique
Naïve Bayes classification is a classification method based on probability and statistics that predicts future probabilities from past experience, using what is known as the Bayes theorem. The theorem is combined with the "naïve" assumption that the attributes are independent of one another [6,7,11].

Naïve Bayes classifier technique
NBC is a classification technique rooted in the Bayes theorem. Its main characteristic is a very strong (naïve) assumption that each condition or event is independent. Because the Bayes theorem is the basis of the method, it is discussed first. For two separate events A and B, the Bayes theorem is formulated as follows [11,12,13]:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Applied to a class C and a set of attributes F1, F2, ..., Fn, the theorem reads:

$$P(C \mid F_1, \dots, F_n) = \frac{P(C)\,P(F_1, \dots, F_n \mid C)}{P(F_1, \dots, F_n)}$$

Here the strong (naïve) independence assumption is used, in which each attribute (F1, F2, ..., Fn) is independent of the others. With this assumption, the following equality holds:

$$P(F_i \mid C, F_j) = P(F_i \mid C), \quad i \neq j$$

From the equation above it can be concluded that the naïve independence assumption makes the probability calculation feasible. Furthermore, P(F1, F2, ..., Fn | C) can be simplified as follows:

$$P(F_1, \dots, F_n \mid C) = \prod_{i=1}^{n} P(F_i \mid C)$$

With the above equation, the Bayes theorem can be written as follows:

$$P(C \mid F_1, \dots, F_n) = \frac{1}{Z}\,P(C) \prod_{i=1}^{n} P(F_i \mid C)$$

The above equation is the Naïve Bayes model that is then used in the classification process. Z represents the evidence, P(F1, F2, ..., Fn), which is constant for all classes on a single sample.
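The model described in this section can be sketched in a few lines of Python: the score of each class is its prior multiplied by the conditional probabilities of the observed attributes, and dividing by the evidence Z turns the scores into posteriors. This is a minimal illustration, not the application built in the study; the function name `naive_bayes_posterior` and all probability values are hypothetical.

```python
from math import prod

def naive_bayes_posterior(priors, likelihoods):
    """Return P(C | F1..Fn) for each class C.

    priors:      dict mapping class -> P(C)
    likelihoods: dict mapping class -> list of P(Fi | C) for the observed attributes
    """
    # Unnormalized score per class: P(C) * product of P(Fi | C)
    scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    # Z is the evidence; it is the same constant for every class on one sample
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Hypothetical numbers, for illustration only (not taken from the bank data)
priors = {"Eligible": 0.6, "Ineligible": 0.4}
likelihoods = {"Eligible": [0.5, 0.8], "Ineligible": [0.25, 0.5]}
posterior = naive_bayes_posterior(priors, likelihoods)
print(posterior)
```

Because Z is identical for all classes, the class with the largest unnormalized score is also the class with the largest posterior, so the normalization step can be skipped when only the predicted class is needed.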

Data source
The data used are historical microcredit data from 2010 to 2015, taken from the database of a public bank in Bandung. The data are kept at the bank and can be obtained when customers apply for credit. The historical data are divided into two sets: training data and testing data.

Implementation calculation using NBC
The customer attributes determine whether a customer will be given credit (Eligible/Accepted) or not (Ineligible/Denied). Before one of the two classes is assigned, two stages are carried out: the training process and the classification process. In the training phase, the conditional probability values and the prior probability values are computed. After the prediction model has been built in the training process, the classification of unknown data is performed.
NBC is applied to the processes described above. The system generates two kinds of output. The first consists of the conditional probability values, the prior probability values and the classification accuracy level. The second is the predicted result, namely the credit risk class (Eligible/Accepted or Ineligible/Denied). The class assigned is the one with the largest probability: Eligible/Accepted is obtained if P(Eligible | X) > P(Ineligible | X), where X is the set of known attributes, and Ineligible/Denied is obtained if P(Eligible | X) < P(Ineligible | X).
When a new input is provided, the classification of the bank customer data can be determined by the following steps:
1. Counting the number of classes/labels.
   P(Y = Eligible) = 12/20, the number of "Eligible" records in the training data divided by the total number of records.
   P(Y = Ineligible) = 8/20, the number of "Ineligible" records in the training data divided by the total number of records.
From the results, the highest probability value is found in the Eligible class, so it can be concluded that the prospective customer is classified as Eligible.
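The training phase can be sketched as a counting procedure: the prior of each class is its relative frequency in the training data, and each conditional probability is the relative frequency of an attribute value within a class. The sketch below assumes categorical attributes; the function name `train_nbc`, the attribute names `income` and `collateral`, and the four toy records are hypothetical and do not come from the bank data.

```python
from collections import Counter, defaultdict

def train_nbc(records, labels):
    """Estimate prior and conditional probabilities by relative frequency."""
    n = len(labels)
    class_counts = Counter(labels)
    # Prior: share of training records that belong to each class
    priors = {c: class_counts[c] / n for c in class_counts}
    # value_counts[c][attr][value] = number of class-c records with attr == value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for rec, c in zip(records, labels):
        for attr, value in rec.items():
            value_counts[c][attr][value] += 1
    # Conditional: P(attr = value | class) = count / number of records in the class
    cond = {
        c: {attr: {v: k / class_counts[c] for v, k in counts.items()}
            for attr, counts in attrs.items()}
        for c, attrs in value_counts.items()
    }
    return priors, cond

# Hypothetical toy training set (attribute names are illustrative assumptions)
records = [
    {"income": "high", "collateral": "yes"},
    {"income": "high", "collateral": "no"},
    {"income": "low", "collateral": "yes"},
    {"income": "low", "collateral": "no"},
]
labels = ["Eligible", "Eligible", "Eligible", "Ineligible"]
priors, cond = train_nbc(records, labels)
print(priors)
print(cond["Eligible"]["income"])
```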
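The decision rule can be illustrated as follows: compute the score of each class for the known attributes X and return the class with the larger score. This is a sketch under stated assumptions; the function name `classify`, the `income` attribute and the probability tables are hypothetical. The text does not specify what happens when the two probabilities are equal; in this sketch a tie goes to whichever class is listed first in `priors`.

```python
def classify(priors, cond, x):
    """Return the class with the largest score P(C) * product of P(x_attr | C)."""
    scores = {}
    for c in priors:
        score = priors[c]
        for attr, value in x.items():
            # A value never observed with class c contributes probability 0
            score *= cond[c].get(attr, {}).get(value, 0.0)
        scores[c] = score
    # max() keeps the first class on a tie (an assumption of this sketch)
    return max(scores, key=scores.get), scores

# Hypothetical probability tables, for illustration only
priors = {"Eligible": 0.6, "Ineligible": 0.4}
cond = {
    "Eligible": {"income": {"high": 0.75, "low": 0.25}},
    "Ineligible": {"income": {"high": 0.25, "low": 0.75}},
}
label, scores = classify(priors, cond, {"income": "low"})
print(label, scores)
```

The evidence term is omitted here because it is the same for both classes and therefore does not affect which score is larger.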
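The class priors in step 1 can be reproduced directly from the counts given there (12 Eligible and 8 Ineligible records out of 20). The snippet covers only this prior step; the variable names are illustrative.

```python
from fractions import Fraction

# Class counts in the training data, as stated in step 1
n_eligible, n_ineligible = 12, 8
total = n_eligible + n_ineligible

# Prior probability of each class = class count / total number of records
p_eligible = Fraction(n_eligible, total)
p_ineligible = Fraction(n_ineligible, total)

print(float(p_eligible), float(p_ineligible))  # 0.6 0.4
```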

Test result
The following are the test results for credit approval obtained with the application that was built. From figure 1, it can be observed that the predicted outcome produced by the data mining application, figure 1(a), is compatible with the factual (historical) data, figure 1(b). In the testing data, 38 out of 50 records are compatible with the factual data and 12 are incompatible. The percentage of success is:

$$\frac{38}{50} \times 100\% = 76\%$$

Based on figure 1(c) and 1(d), 472 out of 501 records are compatible and 29 are incompatible. The percentage of success is:

$$\frac{472}{501} \times 100\% \approx 94.2\%$$

It can be seen that this application performs well in analyzing the data of customers who are eligible for loans; however, this application will work optimally when the data provided is [...]

In figure 2(a), the result obtained is Ineligible (Tidak Layak) because the multiplication result of the Eligible class is smaller than that of the Ineligible class. In figure 2(b), the result obtained is Eligible (Layak) because the multiplication result of the Eligible class is larger than that of the Ineligible class. The details of the calculations are described in the previous section.
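The success percentage used in this section is the share of predictions that match the historical decision. A minimal sketch follows; the function name `accuracy_percent` is hypothetical, and the label lists are synthetic, constructed only so that 38 of 50 predictions match, as in the first test.

```python
def accuracy_percent(predicted, actual):
    """Percentage of predictions that match the historical (factual) labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * matches / len(actual)

# Synthetic labels: only the counts (38 matches out of 50) come from the text
actual = ["Eligible"] * 50
predicted = ["Eligible"] * 38 + ["Ineligible"] * 12
result = accuracy_percent(predicted, actual)
print(result)  # 76.0
```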

Conclusion
After the analysis, design, and testing, the following can be concluded:
- The classification technique with NBC can be used to predict credit risk classes quickly.
- If the training data is Nil, NBC cannot classify the record, so the predicted result will be Nil or Ineligible.
- From the results of testing the system, the success rate is 63.6% on a testing set of 11 records and 76% on a testing set of 50 records. The data mining application using NBC is considered helpful in the process of deciding whether to give credit to bank customers.
- Besides producing the application, this study can classify bank customers as Eligible or Ineligible; on the other hand, by observing the results of the eligibility classification, the bank can also determine which area will be the focus of its next marketing target, so that it can minimize operational cost.