Abstract
The article presents the results of an analysis of pattern detection in semi-structured and unstructured texts. Text preprocessing aims to remove terms that carry no useful information, reduce the words of the text to a single representation, and form a set of indices of a text information resource: statistical, linguistic, and structural. An algorithm for graphematic analysis has been developed as the first stage of automatic processing of natural-language texts. The algorithm makes it possible to single out semantically significant natural-language constructions in a semi-structured text using graphematic descriptors, and to detect and replace abbreviations.
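As a rough illustration of what a graphematic pass might look like, the sketch below splits a text into tokens, attaches a descriptor to each one, and then replaces known abbreviations. It is not the algorithm developed in the article: the descriptor names (ABBR, NUMBER, WORD, PUNCT), the regular expressions, the EXPANSIONS dictionary, and both function names are assumptions made only for this example.

```python
import re

# Hypothetical descriptor set; the article does not list its actual descriptors.
# Alternatives are tried left to right, so abbreviations win over plain words.
TOKEN_PATTERN = re.compile(
    r"(?P<ABBR>\b(?:[A-Za-z]\.){2,}|\b[A-Z]{2,}\b)"  # "e.g." or "NLP"
    r"|(?P<NUMBER>\d+(?:[.,]\d+)?)"                   # "3" or "3.5"
    r"|(?P<WORD>[^\W\d_]+)"                           # run of letters
    r"|(?P<PUNCT>[^\w\s])"                            # single punctuation mark
)

# Hypothetical expansion dictionary, assumed for illustration only.
EXPANSIONS = {
    "e.g.": "for example",
    "NLP": "natural language processing",
}


def graphematic_analysis(text):
    """Return a list of (token, descriptor) pairs for the given text."""
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]


def expand_abbreviations(tokens, expansions=EXPANSIONS):
    """Replace tokens tagged ABBR with their expansion when one is known."""
    return [
        (expansions.get(token, token), descriptor)
        if descriptor == "ABBR"
        else (token, descriptor)
        for token, descriptor in tokens
    ]


if __name__ == "__main__":
    sample = "NLP tools, e.g. parsers, cost 3.5 units."
    for token, descriptor in expand_abbreviations(graphematic_analysis(sample)):
        print(descriptor, token)
```

A single regular expression with named groups keeps the descriptor assignment in one place; a production system would more likely use language-specific rules and a curated abbreviation list.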
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.