Comparison of the K-Nearest Neighbor Algorithm and the Decision Tree on Soil Moisture Classification

Soil moisture is a parameter that plants need for growth, and determining the appropriate soil moisture for plants requires a control system. This study compares the K-Nearest Neighbor (KNN) and Decision Tree algorithms for classifying soil moisture into two classes, moist and dry, to determine which has better accuracy. The Decision Tree algorithm achieved an accuracy of 55.77%, with class recall of 62.69% (dry) and 51.92% (moist) and class precision of 58.33% (dry) and 47.37% (moist), while K-Nearest Neighbor achieved an accuracy of 72.69%, with class recall of 80.60% (dry) and 63.16% (moist) and class precision of 72.00% (dry) and 73.47% (moist). From these model tests it can be concluded that K-Nearest Neighbor is the more accurate algorithm for soil moisture classification.


Introduction
In terms of plant growth, soil moisture is a parameter that plants need [1] [2]. Soil moisture is highly dynamic, driven by soil surface evaporation, transpiration, and percolation [3] [4]. Moisture information can also be used for management of water reserves, drought early warning, irrigation scheduling, and weather forecasting [5]. A control system that keeps soil moisture appropriate for plants therefore needs a way to determine the soil's condition. Classification is a data mining technique that assigns class labels to records based on patterns learned from a data set [6] [7] [8]. The K-Nearest Neighbor algorithm classifies an object by the closest distances to examples in the data set [9], here a soil moisture dataset [10] [11] [12], so that it can produce a good learning pattern. Meanwhile, the Decision Tree algorithm's learning pattern adopts a tree structure featuring a root node, intermediate nodes, and leaf nodes [13] [14]. To estimate the soil moisture classification with the output classes moist and dry [15] [16], this study compares the K-Nearest Neighbor and Decision Tree algorithms to determine which has better accuracy [17] [18] [19].

Methodology
The approach used is a design analysis of the K-Nearest Neighbor and Decision Tree algorithms. The steps of the research flow chart are as follows.

First Step: Data Collection
The data used is observation data from a soil moisture sensor, collected at six test sites with different soil compositions, yielding 124 data records for analysis.

Second Step: Data Preprocessing
Preprocessing is a mandatory step before modeling the algorithms, so that duplicated, erroneous, and incomplete records can be minimized as far as possible. There are several techniques for preprocessing data; in this study, the technique used was handling of missing values.
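As a concrete illustration of this step (the paper does not list its raw sensor fields, so the column names and values below are hypothetical), missing-value handling of the kind described could be sketched in pandas:

```python
import pandas as pd

# Hypothetical records standing in for the sensor observations.
df = pd.DataFrame({
    "site": ["A", "B", None, "D"],
    "moisture": [410.0, None, 355.0, 390.0],
})

# Drop records whose site identity is missing ...
df = df.dropna(subset=["site"])
# ... and fill missing numeric sensor readings with the column mean.
df["moisture"] = df["moisture"].fillna(df["moisture"].mean())
```

After this sketch runs, no record has a gap: the row with no site is dropped, and the missing reading is replaced by the mean of the remaining readings.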

Third Step: Transformation
After preprocessing, the data has the following attribute identities: the test identity, with the test-site value, as polynomial; the soil type identity, namely Andisol and Vertisol, as binominal; the test time identity, namely morning, afternoon, and evening, as polynomial; the soil reading identity as numerical; and the classification result identity, namely moist and dry, as binominal.
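These attribute types can be mirrored in code; a minimal sketch, using hypothetical records with the same typing (binominal soil type, polynomial test time, numerical reading), might encode the categorical attributes as:

```python
import pandas as pd

# Hypothetical records mirroring the attribute types described above.
df = pd.DataFrame({
    "soil_type": ["Andisol", "Vertisol", "Andisol"],   # binominal
    "test_time": ["morning", "afternoon", "evening"],  # polynomial
    "reading": [410.0, 355.0, 390.0],                  # numerical
    "label": ["moist", "dry", "moist"],                # binominal class
})

# One-hot encode the categorical attributes so distance-based learners
# such as K-Nearest Neighbor can compute distances over them.
X = pd.get_dummies(df[["soil_type", "test_time", "reading"]])
y = df["label"]
```

The two binominal/polynomial attributes expand into indicator columns while the numerical reading passes through unchanged.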

Fourth Step: Data Mining Model
The goal of this step is to model the classification by carrying out training, testing, and evaluation with the Decision Tree and K-Nearest Neighbor algorithms.

Fifth Step: Knowledge and Action
At this step, the patterns produced by the models are presented as easy-to-understand results that can be used as information for decision-making.

Results
The primary objective of this study is to determine the accuracy of the data mining classification algorithms in predicting soil moisture. The results of the Decision Tree and K-Nearest Neighbor analyses can then be compared by assessing their degree of precision. Each algorithm's performance is checked before the comparison is made; as shown in Figure 2, 10-Fold Cross Validation is used to train the algorithms. Based on Figure 2, the data mining model first uses the Retrieve operator, which provides the dataset; second, the Multiply operator, which duplicates the data so it can be used by several different operators; and third, the Cross Validation operator, which acts as a sub-process for the training and testing performed.
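The 10-Fold Cross Validation used in the model design can be sketched as follows, again on synthetic stand-in data rather than the paper's actual dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 124-record soil moisture dataset.
X, y = make_classification(n_samples=124, n_features=4, random_state=0)

# 10-fold cross validation: the data is split into 10 folds, and each
# fold serves once as the test set while the other 9 train the model.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=10)
print(scores.mean())
```

The reported accuracy corresponds to the mean over the 10 folds.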

Decision Tree Model Testing
The implementation of the training and testing setup can be seen in the following figure, based on the outcome of the data mining model design. Figure 4 shows an accuracy of 55.77%, with the following details: predicted dry and actually dry, 42 records; predicted dry but actually moist, 30 records; predicted moist but actually dry, 25 records; and predicted moist and actually moist, 27 records. The class recall and class precision percentages can be seen in the following table.
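The dry-class recall and precision follow directly from the confusion-matrix counts above; a quick check:

```python
# Confusion-matrix counts reported in Figure 4 for the Decision Tree.
pred_dry_true_dry = 42
pred_dry_true_moist = 30
pred_moist_true_dry = 25
pred_moist_true_moist = 27

# Class recall (dry): correct dry predictions over all truly dry records.
recall_dry = pred_dry_true_dry / (pred_dry_true_dry + pred_moist_true_dry)
# Class precision (dry): correct dry predictions over all dry predictions.
precision_dry = pred_dry_true_dry / (pred_dry_true_dry + pred_dry_true_moist)

print(f"dry recall {recall_dry:.2%}, dry precision {precision_dry:.2%}")
# → dry recall 62.69%, dry precision 58.33%
```

Both figures match the dry-class values reported for the Decision Tree.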

K-Nearest Neighbor Model Testing
The implementation of the training and testing setup can be seen in the following figure, based on the results of the K-Nearest Neighbor design method. Based on Figure 6, the K-Nearest Neighbor model uses three operators: first the K-Nearest Neighbor operator, second the Apply Model operator, and third the Performance operator. The level of accuracy is then generated as follows. Based on Figure 7, the K-Nearest Neighbor algorithm reaches an accuracy of 72.69%, with the following details: predicted dry and actually dry, 54 records; predicted dry but actually moist, 21 records; predicted moist but actually dry, 13 records; and predicted moist and actually moist, 36 records. The class recall and class precision percentages can be seen in the following table.
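All four class recall and class precision figures reported for K-Nearest Neighbor can be reproduced from the confusion-matrix counts above:

```python
# Confusion-matrix counts reported in Figure 7 for K-Nearest Neighbor.
pred_dry_true_dry = 54
pred_dry_true_moist = 21
pred_moist_true_dry = 13
pred_moist_true_moist = 36

# Class recall: correct predictions over all records truly in the class.
recall_dry = pred_dry_true_dry / (pred_dry_true_dry + pred_moist_true_dry)
recall_moist = pred_moist_true_moist / (pred_moist_true_moist + pred_dry_true_moist)
# Class precision: correct predictions over all predictions of the class.
precision_dry = pred_dry_true_dry / (pred_dry_true_dry + pred_dry_true_moist)
precision_moist = pred_moist_true_moist / (pred_moist_true_moist + pred_moist_true_dry)

print(f"recall dry {recall_dry:.2%}, moist {recall_moist:.2%}")
print(f"precision dry {precision_dry:.2%}, moist {precision_moist:.2%}")
# → recall dry 80.60%, moist 63.16%
# → precision dry 72.00%, moist 73.47%
```

These match the per-class values reported in the table for K-Nearest Neighbor.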

Discussion
Based on the results of the performance test, the comparison data can be seen as follows.

Conclusion
This research concludes with a comparison of the accuracy of the two algorithms: the Decision Tree, with an accuracy of 55.77%, dry class recall of 62.69%, moist class recall of 51.92%, dry class precision of 58.33%, and moist class precision of 47.37%; and K-Nearest Neighbor, with an accuracy of 72.69%, dry class recall of 80.60%, moist class recall of 63.16%, dry class precision of 72.00%, and moist class precision of 73.47%. From these model tests it can be concluded that K-Nearest Neighbor is the more accurate algorithm for soil moisture classification.