Paper (Open access)

Modified union feature selection method on English translation of hadith text clustering


Published under licence by IOP Publishing Ltd
Citation: A F Huda et al 2019 J. Phys.: Conf. Ser. 1402 066052. DOI: 10.1088/1742-6596/1402/6/066052


Abstract

High feature-space dimensionality is one of the main issues to be addressed in text clustering. Various dimensionality reduction methods have therefore been introduced to select an informative subset of features. Each method uses a different strategy to select this subset, so the results differ even on the same data set. Union and intersection methods are typically used to combine the feature subsets selected by different reduction methods: the union method keeps every feature selected by any of the methods, while the intersection method keeps only the features selected by all of them. As a result, the union approach increases the feature dimension, and the intersection approach loses some important features. To exploit the strengths of each approach while reducing its weaknesses, this research proposes a new approach called modified union. It applies the union method to the top-ranking features and the intersection method to the remaining features. Feature selection uses the Term Variance (TV) and Document Frequency (DF) methods to calculate the relevance value of each feature. The effectiveness of the proposed method is tested on the Hadith Shahih Bukhary data set. The results show that the proposed method improves clustering performance over the other methods, with a Davies-Bouldin (DB) index of 2.7.
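The selection rule described in the abstract can be illustrated with a short sketch. It is a minimal, hypothetical illustration, not the authors' implementation, and it rests on several assumptions not stated in the abstract: TV is taken as the population variance of a term's raw count across documents, DF as the number of documents containing the term, and "top-ranking" versus "the rest" is controlled by two hypothetical parameters, `top_k` (ranks merged by union) and `n_select` (ranks up to which the remaining features are merged by intersection). The helper names `term_variance`, `document_frequency`, `rank`, `union_select`, `intersection_select` and `modified_union` are likewise invented for this example.

```python
from collections import Counter

# Hypothetical sketch: TV/DF scoring plus union, intersection and
# "modified union" combination of the two rankings (assumed definitions).

def term_variance(docs):
    """Assumed TV: TV(t) = (1/N) * sum_j (f_tj - mean_t)^2,
    where f_tj is the count of term t in document j."""
    counts = [Counter(doc) for doc in docs]
    vocab = {term for doc in docs for term in doc}
    n = len(docs)
    scores = {}
    for term in vocab:
        freqs = [c[term] for c in counts]  # Counter yields 0 for absent terms
        mean = sum(freqs) / n
        scores[term] = sum((f - mean) ** 2 for f in freqs) / n
    return scores

def document_frequency(docs):
    """DF: number of documents that contain the term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return dict(df)

def rank(scores):
    """Terms by descending score; ties broken alphabetically for determinism."""
    return sorted(scores, key=lambda term: (-scores[term], term))

def union_select(rank_a, rank_b, n_select):
    """Plain union: any term in either method's top n_select."""
    return set(rank_a[:n_select]) | set(rank_b[:n_select])

def intersection_select(rank_a, rank_b, n_select):
    """Plain intersection: only terms in both methods' top n_select."""
    return set(rank_a[:n_select]) & set(rank_b[:n_select])

def modified_union(rank_a, rank_b, top_k, n_select):
    """Union of each method's top_k terms, plus the intersection of the
    terms each method ranks from top_k + 1 to n_select (assumed split)."""
    top = set(rank_a[:top_k]) | set(rank_b[:top_k])
    rest = set(rank_a[top_k:n_select]) & set(rank_b[top_k:n_select])
    return top | rest

# Toy tokenised "documents" (invented data, not from the hadith corpus).
docs = [
    ["charity"] * 4 + ["prayer", "fasting", "prophet"],
    ["prayer", "fasting", "prophet"] + ["pilgrimage"] * 3,
    ["prayer", "prophet"] + ["fasting"] * 4,
    ["prayer"],
]

tv_rank = rank(term_variance(docs))
df_rank = rank(document_frequency(docs))
print(tv_rank)  # ['charity', 'fasting', 'pilgrimage', 'prophet', 'prayer']
print(df_rank)  # ['prayer', 'fasting', 'prophet', 'charity', 'pilgrimage']
print(sorted(union_select(tv_rank, df_rank, 3)))
# ['charity', 'fasting', 'pilgrimage', 'prayer', 'prophet']
print(sorted(intersection_select(tv_rank, df_rank, 3)))  # ['fasting']
print(sorted(modified_union(tv_rank, df_rank, top_k=1, n_select=3)))
# ['charity', 'fasting', 'prayer']
```

On this toy data the plain union keeps five terms and the plain intersection keeps only one, while the modified union keeps three: the best term from each ranking (`charity` by TV, `prayer` by DF) survives even though the other method ranks it low, and the lower-ranked candidates are kept only when both methods agree (`fasting`).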


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
