Abstract
Traditional feature extraction relies on statistical information such as document frequency and word frequency, and therefore ignores the semantic correlation between words in the text. Feature selection based on complex networks takes the semantic association between words into account, but ignores statistical information such as word frequency. Neither approach selects feature words satisfactorily, which degrades text classification performance. This paper therefore combines the two into a new feature selection method and, to address the low accuracy of a single classification algorithm, uses ensemble learning [1] to strengthen the classifier. The results show that the method is feasible and achieves good classification performance.
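As a rough illustration of combining the two kinds of information, the sketch below scores each word by a weighted sum of its normalized document frequency and its normalized degree in a word co-occurrence network. The abstract does not give the actual scoring formula, so the function names, the co-occurrence window, the use of node degree as the network measure, and the mixing weight `alpha` are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def combined_feature_scores(docs, window=2, alpha=0.5):
    """Score words from tokenized documents (hypothetical helper).

    docs   : list of token lists
    window : how many following tokens count as co-occurring (assumed)
    alpha  : weight of the statistical term vs. the network term (assumed)
    """
    df = Counter()                # document frequency of each word
    neighbors = defaultdict(set)  # adjacency sets of the co-occurrence network

    for tokens in docs:
        df.update(set(tokens))
        for i, w in enumerate(tokens):
            # Link each word to the next `window` tokens in the same document.
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    neighbors[w].add(v)
                    neighbors[v].add(w)

    if not df:
        return {}

    max_df = max(df.values())
    max_deg = max((len(n) for n in neighbors.values()), default=0) or 1

    # Weighted sum of normalized document frequency and normalized degree.
    return {
        w: alpha * df[w] / max_df + (1 - alpha) * len(neighbors[w]) / max_deg
        for w in df
    }


def select_features(docs, k, window=2, alpha=0.5):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = combined_feature_scores(docs, window, alpha)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

With `alpha=1.0` the score reduces to plain document frequency, and with `alpha=0.0` it depends only on the network degree, so the parameter makes the trade-off between the two information sources explicit. The selected words could then be passed to any ensemble classifier; that step is not sketched here.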
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.