Development of the algorithm for graphematic analysis and isolating of semantically significant constructions in poorly structured text

A V Rabin; A A Petrushevskaya

doi:10.1088/1742-6596/1679/4/042002

Journal of Physics: Conference Series

Paper • The following article is Open access

A V Rabin¹ and A A Petrushevskaya¹

Published under licence by IOP Publishing Ltd
Journal of Physics: Conference Series, Volume 1679, Cybernetics and IT

Citation: A V Rabin and A A Petrushevskaya 2020 J. Phys.: Conf. Ser. 1679 042002

DOI: 10.1088/1742-6596/1679/4/042002

Article and author information

Author e-mails

alexey.rabin@guap.ru

aap@guap.ru

Author affiliations

¹ Saint-Petersburg State University of Aerospace Instrumentation (SUAI), ul. Bolshaya Morskaya, 67, lit. A, St. Petersburg, 190000, Russia

Abstract

The article presents the results of analyzing the process of detecting patterns in semi-structured and unstructured texts. Text preprocessing aims to remove terms that carry no useful information, to reduce the words of the text to a single representation, and to form a set of indices (statistical, linguistic, and structural) for a text information resource. An algorithm for graphematic analysis has been developed as the first stage of automatic processing of natural-language texts. Using graphematic descriptors, the algorithm singles out semantically significant natural-language constructions in a semi-structured text and detects and replaces abbreviations.
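
The abstract does not give the algorithm itself, so the following Python fragment is only a minimal sketch of what a graphematic pass of this kind might look like, not the authors' implementation. It splits text into tokens, attaches a descriptor to each token, and replaces known abbreviations with their expansions. The descriptor names (`UPPER`, `CAPITALIZED`, `LOWER`, `MIXED`, `NUMBER`, `PUNCT`, `ABBR`), the function name `graphematic_tokens`, and the small `ABBREVIATIONS` table are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; the article's actual dictionary is not given.
ABBREVIATIONS = {
    "e.g.": "for example",
    "approx.": "approximately",
    "fig.": "figure",
}

# Longer abbreviations are tried first so a shorter one cannot shadow them.
_ABBR = "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))

# Alternatives are tried left to right: abbreviations come first so that
# their trailing dots are not split off as separate punctuation tokens.
TOKEN_RE = re.compile(
    rf"(?P<abbr>(?<!\w)(?:{_ABBR})(?!\w))"
    r"|(?P<word>[^\W\d_]+)"
    r"|(?P<num>\d+(?:[.,]\d+)*)"
    r"|(?P<punct>[^\w\s]|_)",
    re.IGNORECASE,
)


def word_descriptor(word):
    """Classify a purely alphabetic token by its letter case."""
    if word.islower():
        return "LOWER"
    if word.istitle():
        return "CAPITALIZED"
    if word.isupper():
        return "UPPER"
    return "MIXED"


def graphematic_tokens(text):
    """Return (token, descriptor) pairs, with known abbreviations expanded."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "abbr":
            # Replace the abbreviation with its expansion from the table.
            tokens.append((ABBREVIATIONS[value.lower()], "ABBR"))
        elif kind == "word":
            tokens.append((value, word_descriptor(value)))
        elif kind == "num":
            tokens.append((value, "NUMBER"))
        else:
            tokens.append((value, "PUNCT"))
    return tokens


if __name__ == "__main__":
    for token, descriptor in graphematic_tokens("See Fig. 3, e.g. the NLP tokens."):
        print(f"{descriptor:12} {token}")
```

In this sketch the abbreviation lookup runs inside the tokenizer rather than as a separate pass, which keeps the dot in "Fig." from being read as a sentence boundary. A real system would presumably need a much larger dictionary and language-specific rules.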

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
