Feature extraction for document text using Latent Dirichlet Allocation

P M Prihatini; I K Suryawan; IN Mandia

doi:10.1088/1742-6596/953/1/012047

Journal of Physics: Conference Series

Paper • The following article is Open access

P M Prihatini¹, I K Suryawan¹ and IN Mandia²

Published under licence by IOP Publishing Ltd
Journal of Physics: Conference Series, Volume 953, The 2nd International Joint Conference on Science and Technology (IJCST) 2017, 27–28 September 2017, Bali, Indonesia

Citation: P M Prihatini et al 2018 J. Phys.: Conf. Ser. 953 012047

DOI: 10.1088/1742-6596/953/1/012047


Author e-mails

manikprihatini@pnb.ac.id

Author affiliations

¹ Electrical Engineering Department, Politeknik Negeri Bali, Kampus Bukit Jimbaran, Kuta, Badung 80361 Bali Indonesia

² Accounting Department, Politeknik Negeri Bali, Kampus Bukit Jimbaran, Kuta, Badung 80361 Bali Indonesia


Abstract

Feature extraction is a stage of an information retrieval system that extracts the distinctive feature values of a text document. It can be carried out with several methods, one of which is Latent Dirichlet Allocation (LDA). However, studies of text feature extraction with LDA are rarely found for Indonesian text. This research therefore implements LDA-based text feature extraction for Indonesian text. The research method consists of data acquisition, text pre-processing, initialization, topic sampling and evaluation. The evaluation compares the Precision, Recall and F-Measure values of LDA with those of Term Frequency-Inverse Document Frequency (TF-IDF) K-Means, which is commonly used for feature extraction. The results show that LDA achieves higher Precision, Recall and F-Measure values than TF-IDF K-Means, which indicates that LDA extracts features and clusters Indonesian text better than TF-IDF K-Means.
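The initialization, topic sampling and evaluation stages named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' sampler, so the code below assumes a generic collapsed Gibbs sampler for Latent Dirichlet Allocation; the function names, the hyperparameters alpha and beta, and the integer word-id input format are all illustrative assumptions, not the paper's implementation.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=50, seed=0):
    """Generic collapsed Gibbs sampler for LDA (an assumed, illustrative setup).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). Returns (theta, topic_word): per-document topic
    proportions and the raw topic-word count matrix.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                              # tokens in topic k
    assignments = []

    # Initialization: give every token a random topic and record the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Topic sampling: resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Extracted features: smoothed topic proportions of each document.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom
                      for k in range(num_topics)])
    return theta, topic_word


def precision_recall_f(tp, fp, fn):
    """Precision, Recall and F-Measure from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Under these assumptions, a call such as lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4) returns one topic-proportion vector per document. Such a vector could serve as the document's feature vector, and a document could be assigned to the cluster of its highest-proportion topic before scoring with precision_recall_f.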


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
