Extraction of isomorphic subformulas as a tool for logic ontology construction

This paper considers a way of constructing a logic ontology for complex structured objects whose descriptions are given in the language of predicate calculus. We describe an algorithm that extracts subsets of objects with the same properties and establishes their relative position. To construct such an ontology, we use an algorithm that extracts the largest subformula isomorphic to subformulas of the descriptions of all objects in a class.


Introduction
The concept of ontology as a branch of philosophy has existed since ancient times. In philosophy, ontology is understood as abstractions that take into account some features of objects (or relations between parts of these objects) and ignore others. With the development of computer technology and informatics, the term came to denote a detailed formalization of a certain area of knowledge that takes into account the structure of the objects under study and the relationships between their parts. Creating a domain ontology is one of the ways of representing knowledge for use in artificial intelligence systems and for extracting knowledge from databases [1-3].
For a mathematician, an ontology is a directed graph whose nodes define certain concepts (classes of objects). If an oriented edge $(N_1, N_2)$ connects two concepts $N_1$ and $N_2$, then all objects included in the concept $N_2$ are also included in the concept $N_1$.
A logical ontology is usually understood as an ontology over complex structured objects. The elements of such objects have predetermined features and stand in predetermined relationships. Moreover, every object included in a concept has a subset of its elements satisfying one and the same formula.
The predicate calculus language is convenient for describing such objects [4]. Earlier [5], the author introduced the concept of isomorphic elementary conjunctions of literals. This relation differs from logical equivalence: it means that two elementary conjunctions of literals coincide up to the names of their arguments and the order of their literals, and hence define the same property of their arguments.
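To make the isomorphism relation concrete, here is a minimal Python sketch (not the algorithm of [5]) that tests whether two elementary conjunctions coincide up to a renaming of their arguments by trying every bijection between their variable lists. The encoding is an assumption of this sketch: a literal is a pair (predicate name, argument tuple), a conjunction is a `frozenset` of literals so that literal order is irrelevant, and negated literals would need, for instance, a separate predicate name. The predicate `lt` is hypothetical.

```python
from itertools import permutations


def variables(conj):
    """Sorted list of the variables occurring in an elementary conjunction."""
    return sorted({v for _, args in conj for v in args})


def rename(conj, mapping):
    """Apply a renaming of variables to every literal of a conjunction."""
    return frozenset((p, tuple(mapping[v] for v in args)) for p, args in conj)


def isomorphism(a, b):
    """Return a bijection of variables that turns a into b, or None if there is none.

    Brute force: up to k! candidate bijections for k variables.
    """
    va, vb = variables(a), variables(b)
    if len(a) != len(b) or len(va) != len(vb):
        return None
    for perm in permutations(vb):
        mapping = dict(zip(va, perm))
        if rename(a, mapping) == b:
            return mapping
    return None


# Two chains "x < y < z" written with different variable names.
A = frozenset({("lt", ("x", "y")), ("lt", ("y", "z"))})
B = frozenset({("lt", ("v", "w")), ("lt", ("u", "v"))})
print(isomorphism(A, B))  # {'x': 'u', 'y': 'v', 'z': 'w'}
```

The returned dictionary plays the role of the renaming that witnesses the isomorphism; the factorial search is exactly the cost that smarter algorithms try to avoid.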
An algorithm for extracting a formula isomorphic to subformulas of descriptions of a set of objects is described in [5]. This algorithm is the basis for creating a logical ontology.
The following procedure for constructing a logical ontology is considered in this paper. A set of complex structured objects is given. The descriptions of these objects are presented in the form of elementary conjunctions of predicate formulas that specify the properties of the objects' elements and the relations between them. This set defines the initial concept for a logical ontology.
For each set of objects defining an already constructed concept of the ontology, we must select the maximal formulas that are (up to variable names) subformulas of the descriptions of a group of objects from this concept. The sets of objects satisfying these subformulas define new concepts of the logical ontology, which are children of the concept from which they were obtained.
This paper presents the main definitions and the scheme of an algorithm that implements the procedure for constructing a logical ontology just described.

Main definitions
Before stating the problem, we formulate the basic definitions. ..., $b_m$) and are denoted as $U_{R,P}$ and $U_{R,Q}$, respectively [5].
represent a simpler example of isomorphic formulas. They are isomorphic to the relation x < y & y < z & x  z, "the number y belongs to the interval (x, z)".
It is easy to prove that isomorphism is an equivalence relation.

For example, let
Each of its literals is their common subformula, but only the literal r(x, z) is their maximal common subformula. The formula is their maximal common subformula up to the names of variables, with the unifiers $U^*_{P,A} = (x, y) \to (u, v)$ and $U^*_{P,B} = (x, z) \to (u, v)$, because
¹ The words "up to the names of arguments" will be omitted below.

The problem setting
Let $N_0$ be a set of complex structured objects, each given in the form $\omega = \{\omega_1, \ldots, \omega_t\}$. The predicates $p_1, \ldots, p_n$ defined on the elements of $\omega$ specify the properties of its elements and the relations between them. The description $S(\omega)$ of an object $\omega$ is the elementary conjunction of all atomic formulas with the predicates $p_1, \ldots, p_n$ that are true on $\omega$.
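As an illustration of this definition, the following Python sketch computes $S(\omega)$ by evaluating every predicate on every tuple of elements of $\omega$ of the right arity and keeping the atomic formulas that are true. The object (three elements named `a`, `b`, `c` with numeric values), the predicate `lt`, and the encoding of predicates as (arity, test) pairs are hypothetical choices made only for this example.

```python
from itertools import product


def description(elements, predicates):
    """S(omega): all atomic formulas p(e_1, ..., e_k) that are true on the object.

    `predicates` maps a predicate name to a pair (arity, test function).
    """
    atoms = set()
    for name, (arity, holds) in predicates.items():
        for args in product(elements, repeat=arity):
            if holds(*args):
                atoms.add((name, args))
    return frozenset(atoms)


# Hypothetical object: elements a, b, c carrying the numbers 1, 2, 3.
values = {"a": 1, "b": 2, "c": 3}
predicates = {"lt": (2, lambda s, t: values[s] < values[t])}
S = description(values, predicates)  # iterating the dict yields the element names
print(sorted(S))  # [('lt', ('a', 'b')), ('lt', ('a', 'c')), ('lt', ('b', 'c'))]
```

The number of candidate atoms grows as $t^k$ for a $k$-ary predicate on $t$ elements, so in practice descriptions are usually given directly rather than computed this way.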
For every subset $N \subseteq N_0$, the description $S(N)$ of the elements of $N$ is a disjunction of pairwise nonisomorphic elementary conjunctions, each of which is isomorphic to the description of some object from $N$. If $\omega \in N$, then every object $\omega'$ such that $S(\omega')$ is isomorphic to $S(\omega)$ also belongs to $N$ and is a representative of the same equivalence class. Below we assume that $N$ contains only representatives of such equivalence classes. That is why we can write
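Keeping only representatives of the isomorphism classes, so that $S(N)$ becomes a disjunction of pairwise nonisomorphic conjunctions, can be sketched in Python as below. It reuses a brute-force isomorphism test under the same assumed encoding (literal = (predicate, arguments) pair, conjunction = `frozenset`); the three descriptions and the predicate `lt` are hypothetical, and element names are treated as renameable symbols.

```python
from itertools import permutations


def variables(conj):
    """Sorted list of the symbols occurring as arguments in a conjunction."""
    return sorted({v for _, args in conj for v in args})


def isomorphic(a, b):
    """True if some bijection between the symbols of a and b maps a onto b."""
    va, vb = variables(a), variables(b)
    if len(a) != len(b) or len(va) != len(vb):
        return False
    for perm in permutations(vb):
        m = dict(zip(va, perm))
        if frozenset((p, tuple(m[v] for v in args)) for p, args in a) == b:
            return True
    return False


def representatives(descriptions):
    """Keep the first description of each isomorphism class, in input order."""
    reps = []
    for d in descriptions:
        if not any(isomorphic(d, r) for r in reps):
            reps.append(d)
    return reps


# Hypothetical descriptions of three objects built from one binary predicate "lt".
chain1 = frozenset({("lt", ("a", "b")), ("lt", ("b", "c"))})
chain2 = frozenset({("lt", ("d", "e")), ("lt", ("e", "f"))})
fork = frozenset({("lt", ("a", "b")), ("lt", ("a", "c"))})
reps = representatives([chain1, chain2, fork])
print(len(reps))  # 2: chain2 is isomorphic to chain1 and is dropped
```

Because isomorphism is an equivalence relation, comparing each new description only against the representatives kept so far is enough.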

Definition 4.
A formula which is a maximal common subformula for every pair of descriptions of objects from N is called the maximal common property (MCP) of elements from the set N.
Let $N_1, \ldots, N_m$ (possibly intersecting) be subsets of the set $N_0$. It is required to construct an oriented graph whose node with zero in-degree is labeled by the set $N_0$ and corresponds to the set of nonisomorphic formulas, each of which is isomorphic to the description of some object from $N_0$.
Moreover, if there are oriented edges from a node labeled by $N_k$ to nodes labeled by $N_{k_1}, \ldots, N_{k_r}$, then
- $N_k$ is the union of the sets $N_{k_1}, \ldots, N_{k_r}$;
- for each $i = 1, \ldots, r$, each formula in the description of objects from $N_{k_i}$ is isomorphic to one of the formulas in the description of $N_k$, i.e. objects from $N_{k_i}$ have all the properties common to all objects from $N_k$;
- if $i \ne j$, then the MCPs of the objects from $N_{k_i}$ and $N_{k_j}$ are different.
In addition, we label the nodes of the graph (except $N_0$) not by the names of the sets but by the formulas defining the MCP of the elements of the corresponding sets.
To solve this problem, one can use an algorithm for extracting a maximal formula isomorphic to subformulas of two elementary conjunctions described in [5]. The idea of extracting a maximal formula isomorphic to subformulas of two elementary conjunctions is based on the notion of incomplete sequent and is presented in [6]. This idea is as follows.
Let $\bar{x}$ and $\bar{y}$ be the lists of variables of the elementary conjunctions $A(\bar{x})$ and $B(\bar{y})$. Deleting the literals of the formula $B(\bar{y})$ one at a time and obtaining a formula $B'(\bar{y}')$, we check the logical sequent
$A(\bar{x}) \Rightarrow \exists_{\ne}\, \bar{y}'\, B'(\bar{y}')$.²
If the sequent holds, the resulting formula $B'(\bar{y}')$ is the MCP of the formulas $A(\bar{x})$ and $B(\bar{y})$. When constructive methods are used to check this sequent (the predicate sequent calculus, the resolution method for predicate calculus, or an exhaustive search of strings of values from $\bar{x}$ for $\bar{y}'$³), the values $\bar{x}'$ found for $\bar{y}'$ define the unifier $U^*_{A,B'}$.
It should be noted that, firstly, an inefficient exhaustive search of formulas which may turn out to be the maximal common subformula is performed here, and, secondly, only the MCP of the greatest length is extracted. These disadvantages are eliminated in the algorithm described in [5].
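The exhaustive variant of this check (a search over strings of values from $\bar{x}$ for $\bar{y}'$) can be sketched in Python as follows. It deletes literals of $B$, fewest deletions first, and tests whether the remainder $B'$ can be mapped into $A$ with pairwise different values; the first success is an MCP of greatest length, and the assignment found plays the role of the unifier. This is a hypothetical illustration under the same assumed encoding (literal = (predicate, arguments) pair), not the improved algorithm of [5], and it shares both drawbacks noted above. The names `embed`, `largest_common_subformula` and `lt` are illustrative.

```python
from itertools import combinations, permutations


def variables(conj):
    """Sorted list of the variables occurring in a conjunction."""
    return sorted({v for _, args in conj for v in args})


def embed(sub, a):
    """Look for pairwise different values from the variables of a for the variables
    of sub such that every literal of sub becomes a literal of a; None if impossible."""
    vs, va = variables(sub), variables(a)
    for values in permutations(va, len(vs)):  # injective: pairwise different values
        m = dict(zip(vs, values))
        if all((p, tuple(m[v] for v in args)) in a for p, args in sub):
            return m
    return None


def largest_common_subformula(a, b):
    """Delete literals of b, fewest deletions first, until the remainder B' embeds into a.

    Returns (B', unifier); the unifier maps the variables of B' to variables of a.
    """
    lits = sorted(b)
    for k in range(len(lits), 0, -1):  # k = number of literals of b that are kept
        for kept in combinations(lits, k):
            sub = frozenset(kept)
            m = embed(sub, a)
            if m is not None:
                return sub, m
    return frozenset(), {}


A = frozenset({("lt", ("x", "y")), ("lt", ("y", "z"))})
B = frozenset({("lt", ("u", "v")), ("lt", ("u", "w")), ("lt", ("v", "w"))})
sub, unifier = largest_common_subformula(A, B)
print(sorted(sub))  # [('lt', ('u', 'v')), ('lt', ('v', 'w'))]
print(unifier)      # {'u': 'x', 'v': 'y', 'w': 'z'}
```

Both loops are exponential: the outer one in the number of literals of $B$, the inner `permutations` call in the number of variables, which matches the complexity remark in the footnotes.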
² $\exists_{\ne}\, \bar{y}'$ means "there are pairwise different values for the list of variables $\bar{y}'$".
³ Although these methods give the same results, they differ significantly in computational complexity. All of them are exponential (the problem is NP-complete), but for the logical methods the exponent contains the number of literals of the formula $B'(\bar{y}')$, whereas for the exhaustive algorithm it contains the number of its variables.
If $\mathrm{MCP}(N_i, N_j) = P_i$, then $N_j \subseteq N_i$. This means that the graph defining the ontology contains an oriented edge from $N_i$ to $N_j$.
If $\mathrm{MCP}(N_i, N_j) = P$, where $P$ is isomorphic to none of $P_0, P_1, \ldots, P_m$, then there is no inclusion relation between $N_i$ and $N_j$.
If $\mathrm{MCP}(N_i, N_j) = \varnothing$, then there is neither an inclusion nor an intersection relation between $N_i$ and $N_j$.
3. Draw oriented edges from the node labeled by $N_i$ to all nodes labeled by $N_j$ for which $\mathrm{MCP}(N_i, N_j) = P_i$.
4. If there is an edge $(N_i, N_j)$ and also an oriented path of length greater than one from $N_i$ to $N_j$, remove the edge $(N_i, N_j)$.
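Steps 3 and 4 can be sketched in Python as follows; step 4 amounts to a transitive reduction of the inclusion graph. The earlier steps are assumed to be supplied as two hypothetical functions, `mcp(P, Q)` returning a maximal common subformula and `iso(P, Q)` testing isomorphism. For the demonstration only, the formulas are written in one fixed naming convention with hypothetical literals `a`, `b`, `c`, `d`, so that the MCP reduces to set intersection and isomorphism to equality; in general both require the search described above.

```python
def build_ontology(props, mcp, iso):
    """props maps a set name N_i to the formula P_i defining its MCP.

    Returns the edges (N_i, N_j) of the ontology graph after steps 3 and 4.
    """
    names = list(props)
    # Step 3: edge N_i -> N_j whenever MCP(N_i, N_j) is isomorphic to P_i (so N_j ⊆ N_i).
    edges = {
        (i, j)
        for i in names
        for j in names
        if i != j and iso(mcp(props[i], props[j]), props[i])
    }

    def longer_path(i, j):
        """True if j is reachable from i by an oriented path of length >= 2."""
        stack = [k for (s, k) in edges if s == i and k != j]
        seen = set()
        while stack:
            k = stack.pop()
            if k == j:
                return True
            if k not in seen:
                seen.add(k)
                stack.extend(t for (s, t) in edges if s == k)
        return False

    # Step 4: drop every edge (N_i, N_j) that is implied by a longer oriented path.
    return {(i, j) for (i, j) in edges if not longer_path(i, j)}


# Demo under a fixed naming convention: MCP = intersection, isomorphism = equality.
props = {
    "N0": frozenset({"a"}),
    "N1": frozenset({"a", "b"}),
    "N2": frozenset({"a", "b", "c"}),
    "N3": frozenset({"a", "b", "d"}),
    "N4": frozenset({"a", "b", "c", "d"}),
}
edges = build_ontology(props, mcp=lambda p, q: p & q, iso=lambda p, q: p == q)
print(sorted(edges))
# [('N0', 'N1'), ('N1', 'N2'), ('N1', 'N3'), ('N2', 'N4'), ('N3', 'N4')]
```

Here the direct edges such as (N0, N2) and (N1, N4) are created in step 3 and removed in step 4, since longer paths N0 → N1 → N2 and N1 → N2 → N4 exist.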
Note that a "vicious circle" cannot arise, since any oriented path $(N_{i_1}, \ldots, N_{i_k})$ in the graph corresponds to the inclusions $N_{i_k} \subseteq \ldots \subseteq N_{i_1}$.
In addition, the construction may produce an ontology containing sets that are not represented in the input data. These are the sets whose MCPs are formulas $P$ that arise in item 1 of the algorithm and are isomorphic to none of the formulas $P_0, P_1, \ldots, P_m$.
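Detecting such new sets can be sketched in Python as follows: compute the MCP of every pair of given formulas and keep the nonempty ones that coincide with none of $P_0, \ldots, P_m$. As in the graph sketch, this assumes a fixed naming convention (MCP = set intersection, isomorphism = equality); the function name `new_concepts` and the literals `a`, `b`, `c` are hypothetical.

```python
from itertools import combinations


def new_concepts(props):
    """Nonempty pairwise MCPs that coincide with none of the given formulas.

    Assumes a fixed naming convention, so that MCP(P, Q) is just P & Q
    (set intersection) and isomorphism is equality.
    """
    known = set(props.values())
    found = set()
    for p, q in combinations(props.values(), 2):
        common = p & q
        if common and common not in known:  # an empty MCP means no relation at all
            found.add(common)
    return found


props = {
    "N1": frozenset({"a", "b"}),
    "N2": frozenset({"a", "c"}),
    "N3": frozenset({"a", "b", "c"}),
}
print(new_concepts(props))  # {frozenset({'a'})}
```

The formula `{'a'}` is common to N1 and N2 but defines none of the given sets, so it would label a new node in the ontology.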

Model example
Consider a model example of constructing an ontology of convex quadrangles.
In this example, a quadrangle is a sequence of names of four points connected by segments in a cycle.
Predicates p, e, r are defined on the set of quadrangles:
Representatives of the sets "Parallelogram", "Rectangle", "Rhombus", "Square", "Trapezium", "Isosceles trapezoid" and "Deltoid" are given as representatives of the set of convex quadrangles. The following notation is used below: P for "Parallelogram", RP for "Rectangle", R for "Rhombus", S for "Square", T for "Trapezium", IT for "Isosceles trapezoid", D for "Deltoid". In this example, it is assumed that the vertices and sides of the quadrangles are denoted in the formulas presenting the MCP of these sets as shown in Fig. 1. This makes it possible to find the maximal common subword in the notations of the corresponding formulas instead of the maximal common subformula up to the names of the arguments.
Extracting the MCP of the elements of each set yields formulas that define the corresponding properties of the quadrangles.
P: p(x,y,z,u) …
The ontology of such quadrangles is shown in Fig. 2. The edges from the node labeled CONVEX QUADRANGLES to the other nodes are deleted because there are longer oriented paths from it to every other node.
This ontology does not take into account the sets of quadrangles that were not specified initially but appeared as a result of pairwise extraction of MCPs. These are the sets $M_1 = $ … In this case, the ontology obtained in a similar way looks as shown in Fig. 3.

Conclusion
The proposed algorithm for constructing a logical ontology has exponential complexity, because the problem of extracting a maximal formula isomorphic to subformulas of two elementary conjunctions is NP-hard. However, the resulting ontology, like any other ontology, makes it possible to classify an object under study quickly.
In the example given in the article, the exponential complexity was avoided because conventions were adopted on the order of the arguments in the formulas defining the MCP of two sets. The example focused on the process of constructing the graph defining an ontology, relying on the fact that, for two sets, their maximal common formula (up to variable names) is the formula that defines the MCP of the elements of one of the sets. Unfortunately, such conventions cannot be adopted for every family of sets for which an ontology is built.
The example also showed that extracting the MCP of two sets may produce sets that were not given in advance. Here this is natural: the ontology in the example has existed for more than 1000 years, and the extracted sets $M_1$ and $M_2$ were of no interest to researchers. However, the appearance of such sets suggests a problem in which the subsets of the set $N_0$ of objects under study are not given but must be selected.
The proposed approach makes it possible to solve such a problem, but a huge number of subsets will be selected in the process. A subject-matter expert can then separate the "important" subsets from the "irrelevant" ones.