Paper The following article is Open access

Machine learning approach for categorical document mining

P Sarma and H Deka

Published under licence by IOP Publishing Ltd
Citation: P Sarma and H Deka 2021 IOP Conf. Ser.: Mater. Sci. Eng. 1070 012052, DOI 10.1088/1757-899X/1070/1/012052

Abstract

Text mining, or document mining, refers to the automatic classification of text documents into categories according to their content. This area of research currently attracts many researchers because of the growing use of electronic documents in everyday life. In this paper the authors propose a document mining method that uses the Bisecting K-means algorithm, a KNN classifier and a decision tree. Many machine learning classification algorithms are used for information retrieval, but most of them have very high computational complexity. The Bisecting K-means clustering algorithm is therefore used here in place of the standard K-means algorithm, and this approach reduces the number of comparisons relative to the alternatives. A decision tree is applied as the final step to obtain the subcategories more accurately. The analysis finds that the combination of Bisecting K-means and the KNN classifier enhances the accuracy of the categorization. The accuracy of the proposed system is reported for each category in the Result and discussion section.
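The abstract does not spell out how the clustering and classification stages are connected. The following is a minimal Python sketch of one plausible reading, not the authors' implementation: training documents are grouped with bisecting K-means, and a new document is then labelled by KNN voting restricted to the cluster with the nearest centroid, which is where the saving in comparisons would come from. The toy two-dimensional term-count vectors, the category labels, the deterministic seeding of each split, and the function names (`two_means`, `bisecting_kmeans`, `classify`) are all assumptions made for illustration. The final decision-tree step for subcategories is not sketched.

```python
from collections import Counter
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(items):
    """Mean vector of a list of (vector, label) pairs."""
    vectors = [vec for vec, _ in items]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sse(items):
    """Sum of squared distances from each vector to the cluster centroid."""
    c = centroid(items)
    return sum(dist(vec, c) ** 2 for vec, _ in items)

def two_means(items, iters=10):
    """Split one cluster in two with plain 2-means (deterministic seeds: assumption)."""
    c0 = centroid(items)
    a = max(items, key=lambda it: dist(it[0], c0))[0]  # farthest from centroid
    b = max(items, key=lambda it: dist(it[0], a))[0]   # farthest from first seed
    left, right = items, []
    for _ in range(iters):
        left = [it for it in items if dist(it[0], a) <= dist(it[0], b)]
        right = [it for it in items if dist(it[0], a) > dist(it[0], b)]
        if not left or not right:
            break
        a, b = centroid(left), centroid(right)
    return left, right

def bisecting_kmeans(items, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [items]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        worst = clusters.pop(i)
        left, right = two_means(worst)
        if not left or not right:  # cluster cannot be split further
            clusters.append(worst)
            break
        clusters += [left, right]
    return clusters

def classify(query, clusters, k=3):
    """KNN vote inside the cluster whose centroid is nearest to the query."""
    nearest = min(clusters, key=lambda c: dist(query, centroid(c)))
    neighbours = sorted(nearest, key=lambda it: dist(query, it[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical toy corpus: (term-count vector, category label).
docs = [
    ([5, 0], "sports"), ([4, 1], "sports"), ([5, 1], "sports"),
    ([0, 5], "politics"), ([1, 4], "politics"), ([0, 4], "politics"),
]
clusters = bisecting_kmeans(docs, 2)
predicted = classify([4, 0], clusters)
```

In this sketch a query is compared against one centroid per cluster and then only against the members of the chosen cluster, rather than against every training document as in plain KNN.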


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
