Paper The following article is Open access

Development of the algorithm for graphematic analysis and isolating of semantically significant constructions in poorly structured text

A V Rabin and A A Petrushevskaya

Published under licence by IOP Publishing Ltd
Citation: A V Rabin and A A Petrushevskaya 2020 J. Phys.: Conf. Ser. 1679 042002, DOI 10.1088/1742-6596/1679/4/042002


Abstract

This article presents an analysis of pattern detection in semi-structured and unstructured texts. Text preprocessing removes terms that carry no useful information, reduces the words of the text to a single representation, and forms a set of statistical, linguistic, and structural indices for a text information resource. An algorithm for graphematic analysis has been developed as the first stage of automatic natural-language text processing. The algorithm uses graphematic descriptors to isolate semantically significant natural-language constructions in semi-structured text and to detect and replace abbreviations and acronyms.
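To make the idea of graphematic analysis concrete, the following is a minimal Python sketch, not the authors' algorithm. It splits text into tokens, assigns each token a graphematic descriptor, and replaces known abbreviations. The descriptor names (`WORD_UPPER`, `NUMBER`, and so on), the two small dictionaries, and the function names are all assumptions chosen for illustration; the abstract does not specify the actual descriptor set or abbreviation list.

```python
import re

# Illustrative dictionaries (assumptions, not taken from the paper).
DOTTED = {"fig": "figure", "approx": "approximately"}   # written with a trailing period
ACRONYMS = {"NLP": "natural language processing"}       # written in upper case

# A token is a run of letters, a run of digits, or one punctuation character.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|[^\w\s]")


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_RE.findall(text)


def descriptor(token):
    """Return a hypothetical graphematic descriptor for one token."""
    if token.isdigit():
        return "NUMBER"
    if token.isalpha():
        if token.isupper() and len(token) > 1:
            return "WORD_UPPER"
        if token.islower():
            return "WORD_LOWER"
        if token[0].isupper() and (len(token) == 1 or token[1:].islower()):
            return "WORD_CAPITALIZED"
        return "WORD_MIXED"
    return "PUNCT"


def analyze(text):
    """Pair every token with its descriptor."""
    return [(tok, descriptor(tok)) for tok in tokenize(text)]


def expand_abbreviations(tokens):
    """Replace dictionary abbreviations; a dotted form also consumes its period."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.lower() in DOTTED and nxt == ".":
            out.append(DOTTED[tok.lower()])
            i += 2
        elif tok in ACRONYMS:
            out.append(ACRONYMS[tok])
            i += 1
        else:
            out.append(tok)
            i += 1
    return out


if __name__ == "__main__":
    sample = "See fig. 3 of the NLP report."
    print(analyze(sample))
    print(expand_abbreviations(tokenize(sample)))
```

A dictionary lookup is the simplest way to detect abbreviations. A fuller system would probably also use the descriptors themselves, for example by treating an unknown `WORD_UPPER` token as a candidate acronym, but that step is beyond what the abstract describes.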


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
