A classification model for student exchange using the CART algorithm

The university has cooperative relationships with other universities, including universities abroad. The cooperation programs cover various fields, one of which is the academic field. Through student exchange programs, students have the opportunity to exchange information, knowledge, and culture. However, not all students are eligible to join these programs, because they must meet terms and conditions in several areas, such as academic performance, attitude, and even financial condition. The purpose of this study is to analyse and preprocess the training data and then build a classification model using CART. Based on the test results, the proposed model gives satisfactory results, with an accuracy of 90%.


Introduction
A student exchange is organized by certain parties to give students the opportunity to study abroad for a certain period. The program has many benefits, especially for the students who take part in it. It is effective in challenging students to develop a global perspective [1][2][3][4].
Currently, tertiary institutions in Indonesia organize student exchange programs. Observations of the process of accepting student exchange participants revealed several problems, including unilateral withdrawals by prospective participants. These withdrawals occurred because the participants were not prepared technically, non-technically, or financially. The purpose of this study is to minimize the acceptance of participants who are not eligible to join the student exchange program. Not all students at a university are entitled to participate in student exchange programs, and the organizer usually provides only a limited quota. However, every student has the same opportunity to take part in the selection process. Students must fulfil many conditions, including a GPA requirement, an English essay, an interview conducted in English, and a Qur'an reading test (optional).
Data mining has several functions, such as classification, clustering, and association [5,6]. CART (Classification And Regression Tree) is one of the classification methods in data mining [7][8][9][10][11]. It produces a classification tree if the response variable is categorical and a regression tree if the response variable is continuous [7,12]. The main purpose of CART is to partition the data into accurate groups, each of which characterizes one class. A distinctive feature of the CART algorithm is that every decision node splits into exactly two branches (a binary split). In practice, each record is assigned to one of the classes of the target variable based on the values of its predictor variables. The proposed model uses CART to classify the eligibility of prospective student exchange participants. Previous research reports that this method achieves high accuracy [13][14][15][16][17][18], and it is widely used in several fields [19][20][21].
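The following minimal Python sketch makes the binary-split property concrete by showing how a fitted classification tree routes a record. Every internal node tests one predictor and sends the record to exactly one of two children, and this repeats until a leaf assigns a class. The node classes, the predictors `gpa` and `interview`, and the thresholds are hypothetical and do not come from the tree in figure 1. The labels `L` and `TL` are the statuses used in the paper's tables.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class assigned to every record that reaches this leaf


@dataclass
class Node:
    feature: str      # predictor variable tested at this node
    threshold: float  # records with value <= threshold go left, others go right
    left: "Tree"
    right: "Tree"


Tree = Union[Node, Leaf]


def classify(tree: Tree, record: dict) -> str:
    """Follow one of exactly two branches at each node until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if record[tree.feature] <= tree.threshold else tree.right
    return tree.label


# Hypothetical tree: split on GPA first, then on an interview score.
example_tree = Node("gpa", 3.0, Leaf("TL"), Node("interview", 70.0, Leaf("TL"), Leaf("L")))

print(classify(example_tree, {"gpa": 3.5, "interview": 80}))  # L
print(classify(example_tree, {"gpa": 2.8, "interview": 90}))  # TL
```

Because each node has exactly two children, a record follows a single path from the root to one leaf. The leaf's label is the predicted status.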

Research methods
CART is the algorithm used in this study. It is a data exploration technique based on decision trees: a nonparametric statistical methodology developed for classification analysis, for both categorical and continuous response variables. The candidate left and right branches used to build the decision tree are shown in table 1. PL (Prior Left) and PR (Prior Right) are calculated using equations (1) and (2).
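The quantities can be written out as below. This assumes that the paper follows the common textbook formulation of CART's goodness of split. The descriptions of tables 2 to 4 are consistent with it, but the goodness formula for Φ is an assumption here. The notation is as follows:

- n is the total number of records.
- n(t_L) and n(t_R) are the numbers of records that meet the left and right branch candidate criteria.
- n(j, t_L) and n(j, t_R) count those records that also have status j ∈ {L, TL}.
- Φ(s|t) is the goodness of candidate split s at node t.

```latex
\begin{aligned}
P_L &= \frac{n(t_L)}{n}, \qquad P_R = \frac{n(t_R)}{n} \\
P(j \mid t_L) &= \frac{n(j, t_L)}{n(t_L)}, \qquad P(j \mid t_R) = \frac{n(j, t_R)}{n(t_R)} \\
\Phi(s \mid t) &= 2 P_L P_R \sum_{j} \left| P(j \mid t_L) - P(j \mid t_R) \right|
\end{aligned}
```

The factor 2 P_L P_R rewards balanced splits. The sum rewards splits whose two branches have very different class proportions.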
Table 2 shows the results of the PL calculation. PL is the number of records that meet the criteria of the left branch candidate divided by the total number of records. Likewise, PR is the number of records that meet the criteria of the right branch candidate divided by the total number of records. P(j|tL) and P(j|tR) are calculated using equations (3) and (4), and table 3 shows the results. As tables 2 and 3 show, P(j|tL) for status L is the number of records that meet the left branch candidate criteria and have status L, divided by the total number of records that meet the left branch candidate criteria. P(j|tL) for status TL is obtained in the same way from the records with status TL. Similarly, P(j|tR) for status L is the number of records that meet the right branch candidate criteria and have status L, divided by the total number of records that meet the right branch candidate criteria, and P(j|tR) for status TL is obtained in the same way. Table 4 shows that candidate branch 5 has the greatest goodness value, so it becomes the first branch (the root split) of the decision tree; the remaining branches are chosen in the same way.
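The calculation described above can be sketched in Python. The sketch computes PL, PR, P(j|tL), and P(j|tR) for each candidate split, combines them into a goodness value, and picks the candidate with the greatest value, as table 4 does for candidate branch 5. The goodness formula 2·PL·PR·Σ|P(j|tL) − P(j|tR)| is an assumed textbook formulation. The function names, the dictionary-based candidates, and the toy records are hypothetical and are not the paper's dataset.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Split = Callable[[Record], bool]  # True -> record goes to the left branch


def goodness_of_split(records: List[Record], labels: List[str], goes_left: Split) -> float:
    """Phi(s|t) = 2 * PL * PR * sum_j |P(j|tL) - P(j|tR)| (assumed formulation)."""
    left = [y for x, y in zip(records, labels) if goes_left(x)]
    right = [y for x, y in zip(records, labels) if not goes_left(x)]
    if not left or not right:
        return 0.0  # a candidate that sends every record one way is useless
    n = len(labels)
    p_left = len(left) / n    # PL, equation (1)
    p_right = len(right) / n  # PR, equation (2)
    q = 0.0
    for j in sorted(set(labels)):                # class statuses, e.g. "L" and "TL"
        p_j_left = left.count(j) / len(left)     # P(j|tL), equation (3)
        p_j_right = right.count(j) / len(right)  # P(j|tR), equation (4)
        q += abs(p_j_left - p_j_right)
    return 2 * p_left * p_right * q


def best_candidate(records: List[Record], labels: List[str], candidates: Dict[str, Split]) -> str:
    """Return the name of the candidate split with the greatest goodness value."""
    return max(candidates, key=lambda name: goodness_of_split(records, labels, candidates[name]))


# Hypothetical toy data and candidate branches (not from tables 1 to 4).
records = [
    {"gpa": 3.6, "essay": "good"},
    {"gpa": 3.4, "essay": "good"},
    {"gpa": 2.9, "essay": "good"},
    {"gpa": 3.1, "essay": "poor"},
    {"gpa": 2.7, "essay": "poor"},
    {"gpa": 3.8, "essay": "poor"},
]
labels = ["L", "L", "TL", "TL", "TL", "L"]
candidates: Dict[str, Split] = {
    "gpa >= 3.3": lambda r: r["gpa"] >= 3.3,
    "essay == good": lambda r: r["essay"] == "good",
}

print(best_candidate(records, labels, candidates))  # gpa >= 3.3
```

With this toy data, the GPA candidate separates the two statuses perfectly and scores 1.0. The essay candidate scores about 0.333, so `best_candidate` selects the GPA split. Growing the whole tree would repeat this selection on the records in each child node.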

Results and discussion
The testing phase is carried out to find out how the algorithm works and what results it produces. In the testing process, 171 records, amounting to 30% of the data, were used. The data were already classified and are original data obtained from the authorities of the university organizing the program. The test calculates the recall value, expressed as a percentage, to measure the ability of the algorithm to retrieve the relevant information. Across the three test scenarios, the highest accuracy was obtained in the second test, which splits the data into 50% training and 50% testing data. The lowest accuracy was obtained in the third test, which uses 40% training data and 60% testing data. This suggests that the amounts of training and testing data both affect the accuracy of the results. The final rule is detailed in figure 1.
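The evaluation above varies the training/testing proportion and reports recall and accuracy. The following sketch shows these pieces using the standard definitions of both measures. The paper's exact formula is assumed to be the standard one. The function names, the seeded random shuffle used for splitting, and the choice of status `L` as the positive class in the example are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, train_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle record indices and cut them into training and testing parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n)
    return indices[:cut], indices[cut:]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Percentage of testing records whose predicted status matches the actual one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Percentage of actual `positive` records that the model also labels `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 100.0 * tp / (tp + fn) if tp + fn else 0.0


# Hypothetical predictions for five testing records.
actual = ["L", "L", "TL", "TL", "L"]
predicted = ["L", "TL", "TL", "TL", "L"]
print(accuracy(actual, predicted))     # 80.0
print(recall(actual, predicted, "L"))  # about 66.67
```

As an example, `split_indices(n, 0.5)` mirrors the 50%/50% scenario and `split_indices(n, 0.4)` mirrors the 40%/60% scenario. Fixing the seed keeps each scenario reproducible. Accuracy counts every correct prediction, while recall only looks at records whose actual status is the chosen positive class, so the two measures can differ on the same test set.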

Conclusion
The proposed model can be used to predict suitable student exchange participants. The determinant variables used, such as GPA, income, essays, interviews, and others, provide fairly accurate results, and all test scenarios in the testing phase show positive results. For future work, we suggest improving the model with additional determinant factors that can be adjusted to different conditions and policies. The model should also be compared with other classification methods such as C4.5, Support Vector Machine, J48, and so on.