Analysis of world and domestic experience in the use of XML schemas for information interaction when maintaining the information model of a capital construction object

The lack of integration solutions for transferring the information model of a capital construction object, or parts of it, between different systems (for example, from the customer to the expert review body, or from the contractor to the State Information System for the Provision of Urban Planning Activities (GISOGD) or other state information systems) leads to the loss of data stored in the information model, which then has to be reproduced manually or transmitted in a native format that the receiving information system cannot process. The purpose of this article is to analyze world and domestic experience in using XML schemas to organize the transfer and storage of structured data in information systems (including government ones) and software, with a reasoned selection of best practices.


Statement of the research problem
This paper presents proposals for implementing a set of measures of the federal project "Digital Public Administration" of the national project "Digital Economy of the Russian Federation" [1], pursuant to the instructions of the President of the Russian Federation V.V. Putin of July 19, 2018, No. Pr-1235, on the transition to a life cycle management system for capital construction objects through the introduction of information modeling technologies, in order to modernize the construction industry and improve the quality of construction [2].
The purpose of this article is to analyze world and domestic experience in using XML schemas to organize the transfer and storage of structured data in information systems (including government ones) and software, with a reasoned selection of best practices. To this end, the article contains:
1. A study of available XML schemas for storing an object-oriented information model (3D/2D), namely IFC, and of XML schemas for storing a digital terrain model, namely spatial data.
2. A study of Russian and foreign rules, norms, practices, scientific developments and methods for organizing information about capital construction objects using XML schemas or similar solutions, including a description of the XML schema for the conclusion of the expert review of project documentation and/or engineering survey results.
3. A comparative analysis of the best international practices, scientific developments and methods of using XML schemas or similar solutions to determine the structure and composition of the information model of a capital construction object.
4. A study of foreign integration solutions for exchanging the information model of a capital construction object, or specific parts of it, between various information systems, including state ones, together with a comparative analysis of the requirements for exchanging information model data between different stages of the object's life cycle.
IOP Publishing, doi: 10.1088/1757-899X/1030/1/012067

Research of available XML storage schemes for digital information models
This section discusses available XML schemas for storing an object-oriented information model (3D/2D), namely IFC, and XML schemas for storing a digital terrain model, namely spatial data. It also analyzes XML schema description languages and selects the most suitable one for developing XML schemas of information models.

Comprehensive analysis of the description and storage of engineering information
Within the framework of this study, engineering information or digital engineering information is understood as information sets or collections of files that contain drawings (two-dimensional and three-dimensional), descriptions of electronic models of products, descriptions of simulation and calculation models (in any modeling languages or in the form of program code), images (in any format), descriptions of geospatial data and other materials necessary for organizing production (any industrial, construction, etc.) [3]. The collection of this information will be called the information model. However, for the purposes of this study, engineering information management is considered based on the life cycle of a capital construction object with elements of storage and description of the territory, and when organizing production, only construction production is considered.
First of all, it is necessary to distinguish two technologically different approaches to engineering information management: document-oriented and data-oriented [4]. The first approach involves the exchange of complete, fully formed documents containing engineering data between the participants in the life cycle of an object. Data can be extracted from these documents either automatically or manually. The second approach involves providing access to a data source from which the necessary data can be extracted on request. Examples of this approach are data integration interfaces such as the Standard Data Access Interface (SDAI), Tekla Open API, Revit API (for creating and managing models and documents), ARCHICAD API, GeoAPI, and others. It is worth noting that the way participants exchange information and interact largely determines the requirements and restrictions on the approaches and formats used. In this work, only the document-oriented approach is considered.
To date, a fairly large number of formats have been developed in the world that allow describing the engineering information of the life cycle of a capital construction object [5]. To form a complex information model of a capital construction object, it is necessary to combine at least three types of engineering information: industrial data, geospatial data and text data (explanatory notes, etc.).
Industrial data exchange formats include CIS/2, DSTV, SDNF, DGN, DXF, DWG, IGES, STEP and JT. There are also older formats that various software may still support, but they are not of interest in this study. The IFC format deserves separate mention, since it was developed specifically for the construction industry [6].
Most formats for describing geospatial data are published on the website of the Open Geospatial Consortium (OGC). For the purposes of this study, we highlight the following: 3D Tiles, ARML 2.0, CityGML, GeoSciML, OpenGIS® GML, EO-GeoJSON, GroundwaterML, Indexed 3D Scene Layers (I3S), IndoorGML, LandInfra/InfraGML, OGC PipelineML and WaterML. The following formats can be used for exchanging text documents: PDF, PostScript, TeX, Open Document Format (ODF) and Office Open XML (ECMA-376 and ISO/IEC 29500).
To ensure a systematic approach to the analysis of engineering data formats, and based on world practice, it is advisable to divide the information into the following blocks:
- graphic information;
- attributive information;
- information that determines the level of accuracy (error) of the generated engineering data;
- information describing the processes of interaction between participants;
- information describing the processes of use or maintenance.
Taken together, these blocks constitute the information that determines the requirements for developing the information model and, accordingly, the requirements for its composition and content. Based on these five groups, we consider the applicable formats for storing and managing engineering and scientific data. Table 1 presents summary information on the formats for the exchange of engineering data, compiled from various sources [7,8,9]. The use of XML markup in most engineering data formats supports the claim that XML has good prospects as the basis of the industry data format (IFC) [10]. XML was chosen to mark up most engineering data formats because it meets the following important requirements:
- potential extensibility of language constructs, allowing new types of information to be added;
- availability of standard software tools for parsing markup, searching marked-up text and transforming markup.
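The two requirements above can be illustrated with a minimal Python sketch using only the standard library. The element and attribute names (`building`, `wall`, `fireRating`, `survey`) are hypothetical and do not come from any of the formats listed in this paper. The sketch assumes a consumer that reads wall heights, and shows that the same consumer keeps working after the document has been extended with new types of information.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative only.
ORIGINAL = """
<building id="B-1">
  <wall id="W-1" height="3.0"/>
  <wall id="W-2" height="2.7"/>
</building>
"""

# The same document after new kinds of information were added (extensibility).
EXTENDED = """
<building id="B-1">
  <wall id="W-1" height="3.0">
    <fireRating>REI 60</fireRating>
  </wall>
  <wall id="W-2" height="2.7"/>
  <survey accuracy="0.05"/>
</building>
"""


def wall_heights(xml_text):
    """Parse the markup and search it with a path expression."""
    root = ET.fromstring(xml_text)
    return {w.get("id"): float(w.get("height")) for w in root.findall("./wall")}


def to_flat_listing(xml_text):
    """A simple markup transformation: walls as 'id;height' text lines."""
    heights = wall_heights(xml_text)
    return "\n".join(f"{wid};{h}" for wid, h in sorted(heights.items()))


# The consumer written for ORIGINAL ignores the elements it does not know.
print(wall_heights(ORIGINAL))
print(wall_heights(EXTENDED))
print(to_flat_listing(EXTENDED))
```

The consumer selects only the elements it understands, so producers can add information without breaking existing readers. This holds only while the additions do not change the meaning of existing elements.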

Analysis of XML schema description languages
The XML Schema specification was created and recommended for describing the structure of an XML document. Like most XML description languages, XML Schema was conceived to define the rules that a document must obey. Unlike other languages, however, XML Schema was designed so that it can be used directly when building software for processing XML documents.
After checking a document for compliance with an XML Schema, the reading program can build a data model of the document that includes:
- a vocabulary (the names of elements and attributes);
- a content model (the relationships between elements and attributes and their structure);
- data types.
Each element in this model is associated with a specific data type, allowing you to build an object in memory that matches the structure of an XML document. Object-oriented programming languages find it much easier to deal with such an object than with a text file.
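The idea of turning a document into a typed in-memory object can be sketched as follows. This is a simplified illustration, not XML Schema validation: the Python standard library does not include an XSD validator, so the `FIELD_TYPES` table below is an assumed stand-in for the vocabulary and data types that a schema would declare. All element names and the `ConstructionObject` class are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date

# Stand-in for what a schema would declare: element names and data types.
# Every name here is hypothetical.
FIELD_TYPES = {
    "name": str,
    "floors": int,
    "area": float,
    "commissioned": date.fromisoformat,
}


@dataclass
class ConstructionObject:
    name: str
    floors: int
    area: float
    commissioned: date


def load_object(xml_text):
    """Build a typed object whose fields mirror the document structure."""
    root = ET.fromstring(xml_text)
    values = {}
    for field_name, convert in FIELD_TYPES.items():
        node = root.find(field_name)
        if node is None or node.text is None:
            raise ValueError(f"missing required element <{field_name}>")
        # Conversion fails with ValueError if the text has the wrong type.
        values[field_name] = convert(node.text.strip())
    return ConstructionObject(**values)


SAMPLE = """
<constructionObject>
  <name>Warehouse 7</name>
  <floors>2</floors>
  <area>1540.5</area>
  <commissioned>2020-03-15</commissioned>
</constructionObject>
"""

obj = load_object(SAMPLE)
print(obj)
```

Once loaded, `obj.floors` is an integer and `obj.commissioned` is a date, so the rest of the program works with typed fields rather than with text.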
Another convenience of XML Schema is that one vocabulary can refer to another, so a developer can reuse existing vocabularies and more easily establish and distribute XML structure standards for specific tasks. Table 2 gives brief information about XML schema description languages. A comparative analysis shows that many XML schema description languages exist and continue to evolve, each with its own characteristics. From the point of view of optimizing the size of the transmitted information, Schematron is the most promising XML schema description language.

Analysis of the practice of using XML-schemas to organize information about a capital construction object
This section examines Russian and foreign rules, norms, practices, scientific developments and methods for organizing information about capital construction objects using XML schemas or similar solutions. It analyzes the best international practices of using XML schemas or similar solutions to determine the structure and composition of the information model of a capital construction object. It also reviews foreign integration solutions for exchanging the information model of a capital construction object, or specific parts of it, between various information systems, including state ones, and between different stages of the object's life cycle.

Public Sector XML Schema Rules, Regulations, and Practices
On April 7, 2011, Order No. 79 of the Federal Fund for Compulsory Health Insurance "On the approval of the general principles of the construction and operation of information systems and the procedure for information interaction in the field of compulsory health insurance" was issued. In this Order, information interaction between the Regional and Central Segments of the Unified Register of Insured Persons is provided in XML format.
On January 27, 2015, the Board of the Eurasian Economic Commission adopted a decision formulating the rules for electronic data exchange in the integrated information system of foreign and mutual trade, in which XML is defined as the data exchange format.
On January 1, 2016, GOST R ISO/IEC 19770-2-2014 was introduced in the Russian Federation. This standard adopts the international standard for software identification tags. A software identification tag is an XML file that contains reliable identification and control information about a software product. The tag is installed on the computing device along with the software product. It can be created during installation or added later for already installed untagged software.
On August 18, 2016, Order No. P/0390 of the Federal Service for State Cadastre and Cartography was issued, "On the organization of work on the provision by the cadastral registration body of information entered in the state cadastre of real estate, submission to the cadastral registration body of applications for state cadastral registration and requests for information entered in the state cadastre of real estate". The order provides for the use of XML schemas for organizing the exchange of electronic documents.
On December 24, 2019, Decision No. 239 of the Board of the Eurasian Economic Commission, "On the requirements for the composition and structure of information in electronic form on the amounts of indirect taxes paid to the budgets of the member states of the Eurasian Economic Union", was issued. The Decision stipulates that the tax authorities exchange information in the form of XML files. On February 7, 2020, the Federal Tax Service of the Russian Federation published a letter circulating the XSD schema for the KND 1155118 register.

Special Plans for Internal Reforms (PERI, Spain)
Defining 3D urban models, creating and co-editing these models has several advantages in urban governance processes. The 3D component makes it easy to present complete information that is easily understood by any user. These models facilitate the collaboration of experts in different fields (design, operation, management, etc.), contributing to the development of their knowledge and the creation of a unified model.
The urban governance process with the greatest practical effect in Spain is the special plan for internal reform (PERI, from the Spanish Plan Especial de Reforma Interior) [11]. The aim of a PERI is to improve urban space. A PERI allows the definition of integrated internal tasks covering all areas and problems of the urban environment. PERIs are created by local municipalities and city districts. However, defining a PERI requires interaction with industry professionals and companies, as well as with citizens: first to collect their needs and concerns, and then to collect feedback in the form of improvements or suggestions after the administration has communicated the initial proposal.
The management process outlined above is a prime example of the need to advance the development of urban information systems based on open models and standards that combine different scales (BIM for construction and design, GIS for the urban environment). Executing this model of urban management produces a representation system that integrates heterogeneous information covering several domains and evolving over time. Such a system has considerable semantic power, which makes it possible to reason about the model and evaluate multi-domain interactions.

OpenStreetMap project
Co-creating and publishing a 3D city model has several advantages, including faster results because the work is done in parallel, or the ability for experts in different fields to collaborate and share their knowledge in one model. Nevertheless, the creation and publication of 3D city models is still an open problem due to the high degree of complexity and volume.
The OpenStreetMap (OSM) project is an excellent example of co-creation and maintenance of georeferenced 2D geographic models [12]. OSM has its own tools and data model with built-in dedicated support for multi-user editing and version control.
The integration of information at different scales is one of the main aspects to consider when managing urban information. There are now systems and tools that address urban management from a geospatial perspective (GIS) and others from a building perspective (BIM). Integration of both domains is an urgent task for which solutions are offered at different levels: data level, process level and application level. CityGML is currently the most widely used standard for the presentation and exchange of city information. This data model is especially relevant when integration between GIS and BIM is required through semantic modeling and 3D representation of geospatial information. The most relevant integration proposals currently developed are based on the use of standard and open data models (mainly CityGML and IFC) and integration at the process level using semantic technologies or based on web services.

3D-UIS project
The 3D Urban Information System (3D-UIS) is a centralized information model based on the philosophy of a shared data environment that provides a single point of access to city information that is properly structured and connected, generated from multiple external sources and connected to different external tools [13]. The system allows multi-scale presentation and is based on standard data models (CityGML and IFC).
The 3D Urban Information System (3D-UIS) is divided into three main components:
1) A city storage, which is a PostgreSQL relational database supplemented by the PostGIS extension (which adds support for georeferenced 3D objects). The storage of the data model is based on the 3D City Database, which allows the representation and storage of information modeled in CityGML.
2) A building repository, which is a cloud file storage. This repository includes all IFC files, which contain most of the building information, as well as other IFC-derived file formats that are used for visualization purposes (e.g. COLLADA, XML).
3) A City-Building Relationship Database, which is a relational database that stores the relationships (via URIs) between the various models held in the city storage and the building repository. This link database represents relationships between models, making it possible to link buildings or individual objects such as windows or walls. The users involved in a project can interact and share resources such as images, send messages, or resolve other issues. This information is also included in and linked to this database.
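The role of the relationship database can be sketched with a single link table. This is an illustrative assumption only: the source does not describe the actual 3D-UIS schema, so the table name `model_link`, its columns and the `urn:example:` URIs below are hypothetical, and SQLite is used in place of the project's database simply to keep the example self-contained.

```python
import sqlite3

# In-memory database standing in for the relationship database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE model_link (
        city_uri     TEXT NOT NULL,  -- object in the city storage (CityGML)
        building_uri TEXT NOT NULL,  -- object in an IFC file in the repository
        relation     TEXT NOT NULL,
        PRIMARY KEY (city_uri, building_uri)
    )
    """
)

# Hypothetical URIs: one link for a whole building, one for a single wall.
links = [
    ("urn:example:city:building/42",
     "urn:example:ifc:model-42.ifc#building", "sameAs"),
    ("urn:example:city:building/42/wall/3",
     "urn:example:ifc:model-42.ifc#wall-17", "sameAs"),
]
conn.executemany("INSERT INTO model_link VALUES (?, ?, ?)", links)


def ifc_counterparts(city_uri):
    """Return the IFC-side URIs linked to a given city-model URI."""
    rows = conn.execute(
        "SELECT building_uri FROM model_link "
        "WHERE city_uri = ? ORDER BY building_uri",
        (city_uri,),
    )
    return [row[0] for row in rows]


print(ifc_counterparts("urn:example:city:building/42/wall/3"))
```

Keeping only URIs in the link table leaves the CityGML and IFC data in their own stores, which matches the separation of components described above.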

buildingSMART project
buildingSMART promotes the Industry Foundation Classes (IFC), which were published as an ISO standard in 2003 [14]. IFC is a free, vendor-independent standard that includes a large set of representations of building information, including many different geometric representations and a large set of semantic objects modeled in a strictly object-oriented way. To provide dynamic (schema-invariant) extensions and adaptation to local or national requirements, the IFC data model provides a PropertySet (PSet) mechanism that relies on dynamically defined name-value pairs.
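The name-value principle behind property sets can be sketched in a few lines of Python. This is not the IFC API or the IFC schema: the `PropertySet` and `Wall` classes are hypothetical, the property names under `Pset_WallCommon` are used for illustration, and `Pset_Example_LocalRequirements` is an invented local extension. The point is only that new properties are attached as data, without changing the class definitions.

```python
from dataclasses import dataclass, field


@dataclass
class PropertySet:
    """A named collection of dynamically defined name-value pairs."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Wall:
    """Hypothetical element with a fixed core and open-ended property sets."""
    global_id: str
    property_sets: dict = field(default_factory=dict)

    def attach(self, pset):
        self.property_sets[pset.name] = pset

    def get_property(self, pset_name, prop_name, default=None):
        pset = self.property_sets.get(pset_name)
        if pset is None:
            return default
        return pset.properties.get(prop_name, default)


wall = Wall(global_id="wall-0001")
# Common properties (names used illustratively).
wall.attach(PropertySet("Pset_WallCommon",
                        {"IsExternal": True, "LoadBearing": False}))
# An invented local extension: no change to the Wall class is needed.
wall.attach(PropertySet("Pset_Example_LocalRequirements", {"SeismicZone": 7}))

print(wall.get_property("Pset_WallCommon", "IsExternal"))
print(wall.get_property("Pset_Example_LocalRequirements", "SeismicZone"))
```

Because property sets are identified only by name, exchanging them reliably depends on agreed terminology for those names.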
In addition to exchanging data using IFC, working with different types of construction information, such as property sets and definitions, requires standardized terminology. For this purpose, the buildingSMART Data Dictionary (bsDD) was designed as a central repository that stores multilingual IFC object definitions and generic schema extensions, such as the IfcWall object description and Pset_WallCommon.
It is important to note that in the overwhelming majority of the considered world information systems, and in all the considered information systems implemented in the Russian Federation, the W3C XML Schema language is used to describe XML documents.

Conclusion
Based on the analysis of available XML schema description languages and the experience of using XML schemas for organizing data transfer and storage in information systems, the following conclusions can be drawn:
1. XML documents are a common and efficient way to exchange electronic documents.
2. In foreign and corporate practice, sufficient experience has been accumulated in the use of XML and XML Schema (XSD) for the exchange of data on capital construction objects, including project documentation.
3. Many XML schema description languages exist and continue to evolve, each with its own characteristics. From the point of view of optimizing the size of the transmitted information, Schematron is the most promising XML schema description language.
4. In the overwhelming majority of the considered world information systems, and in all the considered information systems implemented in the Russian Federation, the W3C XML Schema language is used to describe XML schemas (XSD).
From these points, it can be concluded that XML schemas are an advisable tool for organizing the exchange of electronic documents at all stages of the life cycle of a capital construction object. At present, it is advisable to use the W3C XML Schema language to describe XML schemas, both to ensure compatibility with other information systems and because of the proven reliability of schemas described in this way. At the same time, the potential for optimizing the transmission and storage of electronic documents by using more modern XML schema description languages (such as Schematron) should be kept in mind. Provision should be made over the next 2-3 years for additional studies, possibly including pilot areas for the transition to new XML schema description languages.
This work was financially supported by the Ministry of Science and Higher Education of the Russian Federation (#SP-4555.2018.1).