Machine learning approach for categorical document mining

Text mining, or document mining, refers to the automatic classification of text documents into categories according to their content. The area currently attracts many researchers because of the increasing use of electronic documents in everyday life. In this paper the authors propose a document mining method that uses the Bisecting K-means algorithm, a KNN classifier and a decision tree. Many machine learning classification algorithms are used for information retrieval, but most of them have very high computational complexity. The Bisecting K-means clustering algorithm is therefore used here instead of the standard K-means algorithm, and this approach reduces the number of comparisons relative to the others. A decision tree is applied as the final step to obtain the subcategories more accurately. The analysis shows that combining Bisecting K-means with the KNN classifier enhances the accuracy of the categorization. The accuracy of the proposed system is reported for each category in the Result and discussion section.
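As an illustration only, the combination of Bisecting K-means and KNN described above might be sketched as follows. The abstract does not give the paper's features, parameters or exact pipeline, so everything here is an assumption: the two-dimensional vectors stand in for document feature vectors, the labels are hypothetical, the deterministic seeding of each split is a convenience, and the idea of running KNN only inside the cluster with the nearest centroid is one plausible reading of how the number of comparisons could be reduced. The decision tree step for subcategories is omitted.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sse(vectors):
    """Sum of squared distances from each vector to the cluster centroid."""
    c = centroid(vectors)
    return sum(sq_dist(v, c) for v in vectors)

def two_means(vectors, iters=20):
    """Split one cluster in two with plain 2-means.

    Seeding is deterministic (an assumption made for reproducibility):
    the vector farthest from the centroid, then the vector farthest from it.
    """
    c0 = max(vectors, key=lambda v: sq_dist(v, centroid(vectors)))
    c1 = max(vectors, key=lambda v: sq_dist(v, c0))
    centers = [c0, c1]
    groups = ([], [])
    for _ in range(iters):
        new = ([], [])
        for v in vectors:
            closer_to_second = sq_dist(v, centers[1]) < sq_dist(v, centers[0])
            new[1 if closer_to_second else 0].append(v)
        if new == groups:  # assignments stopped changing
            break
        groups = new
        centers = [centroid(g) for g in groups]
    return list(groups)

def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(vectors)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        if sse(clusters[i]) == 0:  # nothing left that can be split
            break
        clusters.extend(two_means(clusters.pop(i)))
    return clusters

def classify(query, clusters, labels, k=3):
    """KNN vote restricted to the cluster whose centroid is nearest to the query."""
    nearest = min(clusters, key=lambda c: sq_dist(query, centroid(c)))
    neighbours = sorted(nearest, key=lambda v: sq_dist(query, v))[:k]
    return Counter(labels[v] for v in neighbours).most_common(1)[0][0]

# Hypothetical toy data: feature vector -> category label.
docs = {
    (0.9, 0.1): "sports", (0.8, 0.2): "sports", (1.0, 0.0): "sports",
    (0.1, 0.9): "politics", (0.2, 0.8): "politics", (0.0, 1.0): "politics",
}
clusters = bisecting_kmeans(list(docs), 2)
prediction = classify((0.85, 0.15), clusters, docs)
```

In this sketch a query is compared with the cluster centroids and then only with the members of one cluster, instead of with every labelled document, which is where a saving in comparisons would come from under the stated assumption.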


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

