Paper The following article is Open access

An LSTM model for extracting hierarchical relations between words for better topic modeling

Published under licence by IOP Publishing Ltd
Citation: Arshad Javeed 2021 J. Phys.: Conf. Ser. 1780 012019. DOI: 10.1088/1742-6596/1780/1/012019


Abstract

When dealing with text data, valuable information often lies in the relationships between the words in the corpus. The relationships sought here are "has-a" and "is-a", from which one can build a hierarchical representation of words. Since each language has its own rules and syntax, extracting these relationships ultimately comes down to understanding the syntax of the particular language and using the relevant features in the process.

This paper presents a machine-learning model for understanding language syntax and deducing the relationships between the words encountered. Specifically, a sequence modeling approach is followed, in which the model receives a sequence of words and uses various properties of the words to build a hierarchical graph. The algorithm described is independent of the language, and the model should be versatile enough to be trained for different languages. The paper also describes how this information can be used to build better topic models from a corpus of text.
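As an illustration of what a hierarchical representation built from "is-a" and "has-a" relations might look like, the following is a minimal sketch and not the paper's model. The relation triples, the function names `build_hierarchy` and `ancestors`, and the single-parent assumption are all hypothetical; in the approach described above, the triples would be produced by the sequence model rather than written by hand.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples. In the paper's setting these
# would be predicted by the sequence model; here they are hand-written.
RELATIONS = [
    ("dog", "is-a", "animal"),
    ("cat", "is-a", "animal"),
    ("dog", "has-a", "tail"),
    ("animal", "is-a", "organism"),
]


def build_hierarchy(relations):
    """Return (is_a, has_a): general term -> specific terms, and owner -> parts."""
    is_a = defaultdict(list)
    has_a = defaultdict(list)
    for head, rel, tail in relations:
        if rel == "is-a":
            is_a[tail].append(head)   # tail is the more general term
        elif rel == "has-a":
            has_a[head].append(tail)  # head owns the part
        else:
            raise ValueError(f"unknown relation: {rel}")
    return dict(is_a), dict(has_a)


def ancestors(word, relations):
    """Follow is-a edges upward from word (assumes one parent per word)."""
    parent_of = {h: t for h, r, t in relations if r == "is-a"}
    chain, seen = [], {word}
    while word in parent_of:
        word = parent_of[word]
        if word in seen:  # guard against cycles in noisy predictions
            break
        seen.add(word)
        chain.append(word)
    return chain


if __name__ == "__main__":
    is_a, has_a = build_hierarchy(RELATIONS)
    print(is_a)
    print(has_a)
    print(ancestors("dog", RELATIONS))
```

Walking the "is-a" chain upward gives progressively more general terms for a word, which is one plausible way such a hierarchy could feed into topic modeling, for example by grouping specific words under a shared general term.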


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
