Modelling of the process of a program compilation according to the training sample data using the collective intelligence of the simplest computers

The description of the system intended for genetic programming and consisting of a large number of elementary computers is given. Commands and data are placed in a cell consisting of four fields – the first address field, the second address field, the arithmetic field and the result field. This architecture made it possible to place a software binary tree formed by commands and addresses into sequentially located cells of an elementary computer. This technique made it possible to simplify the crossover and mutation operators, as well as the coding of the problem without involving complex programming systems and supercomputers. In the future, it is planned to use table processors tested on complex problems of genetic programming to significantly reduce power consumption in on-board computers of aviation and space technology.


Introduction
Modern production planning tools make it possible to describe entire factories in the form of electronic models [1][2][3]. These models create the necessary prerequisites for the large-scale implementation of genetic programming (GP) methods.
John Koza is a pioneer in the use of genetic programming to optimize complex problems. He developed a general approach for the formal description of various subject areas. This approach allows us to search for ways to improve real technical systems using GP methods. J. Koza proposed to create robots that can make functional copies of artificial organisms. Several original solutions have already been found, and two of them are protected by patents. Examples include a new method for determining the optimal configuration of wire antennas with the required electromagnetic characteristics and the design of the wings of large aircraft, which aroused the interest of Boeing [4].
The main obstacle to commercializing these methods is the extremely high energy demand of the computing resources they require. Similar problems of reducing energy costs were discussed in [5][6][7]. These problems arise, for instance, when carrying out evolutionary computations for scheduling a directed acyclic graph (DAG) on multiprocessor systems [5]. That work gives a systematic review of the literature on the scheduling of multiprocessor tasks and notes that the issue has attracted significant attention over the past three decades.
The complex organization of computational algorithms is the second problem in GP. Programming languages such as LISP [8,9] or Prolog [10] are traditionally used to describe and manipulate trees. The programming costs often exceed the computational costs of these programs.
A wide range of research is also focused on the development of evolutionary algorithms. These works relate, for example, to topics such as characteristics of problems, heterogeneity of the environment and optimization criteria [5]. Evolutionary algorithms have many fields of application in multiprocessor systems. A hybrid genetic approach is presented, for example, in [6]. This approach helps to distribute and schedule parallel computational processes on a multiprocessor computer. That work considers a system of real-time tasks with load balancing across processors and reports that "the proposed hybrid genetic approach has better results than the classical GAs in terms of load balancing, minimum response time and good flexibility". An implementation method for solving synchronization problems in multiprocessor systems is given in [7]. This method is called the "Dynamic Genetic Algorithm for Real-Time Planning" (GA-RTS). "A significant feature of GA-RTS is that it can handle dynamic as well as static scheduling of inter-dependent tasks for real-time systems". Generally speaking, such schedules for parallelizing tasks on multiprocessor systems are drawn up using the powerful software and hardware resources of third-party computers, most often supercomputers and/or cloud computing.
We propose to build specialized multiprocessor systems based on table processors with a command system that provides a simple implementation of software trees to significantly reduce energy costs for GP. The construction of specialized computing systems is one of the possible ways to solve these problems. These systems consist of a large number of elementary computers created on the basis of table processors. They replace floating point calculations with calculations using tables of logarithms and antilogarithms. This approach reduces energy consumption for computations by an order of magnitude [11]. Table processors tested on such a complex task as genetic programming can be widely used in on-board computers of aviation and space technology for solving a wide class of problems. This is the topic of our further research.
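The idea of replacing floating point multiplication with lookups in tables of logarithms and antilogarithms can be illustrated by the minimal sketch below. It is not the table processor described in [11]: the table size `SIZE`, the base-2 logarithms and the function names `table_log2`, `table_exp2` and `table_multiply` are assumptions made only for this illustration.

```python
import math

# Assumed table resolution (hypothetical, not taken from the paper).
SIZE = 4096

# LOG_TABLE[i] holds log2(1 + i/SIZE) for mantissas in [1, 2).
LOG_TABLE = [math.log2(1 + i / SIZE) for i in range(SIZE)]
# ANTILOG_TABLE[i] holds 2**(i/SIZE) for fractional exponents in [0, 1).
ANTILOG_TABLE = [2 ** (i / SIZE) for i in range(SIZE)]


def table_log2(x):
    """Approximate log2(x) for x > 0 with one table lookup."""
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= m < 1
    index = int((2 * m - 1) * SIZE)   # position of the mantissa 2m in [1, 2)
    return (e - 1) + LOG_TABLE[index]


def table_exp2(y):
    """Approximate 2**y with one table lookup."""
    whole = math.floor(y)
    index = min(int((y - whole) * SIZE), SIZE - 1)  # guard against rounding to 1.0
    return math.ldexp(ANTILOG_TABLE[index], whole)


def table_multiply(a, b):
    """Multiply by adding table logarithms and taking the table antilogarithm."""
    if a == 0 or b == 0:
        return 0.0
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * table_exp2(table_log2(abs(a)) + table_log2(abs(b)))
```

With this table size the product carries a small relative error caused by truncating to the nearest table entry; a larger table would reduce the error at the cost of memory.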
The architecture of elementary computers (EC) should be adapted to work with trees immediately. This will greatly simplify the programming of tasks and reduce the cost of computations due to a significant reduction in power consumption. A feature of this work that distinguishes it from the rest is that evolutionary computations are performed in the system itself using simple tree-like structures implemented in a sequential structure of the simplest two-address commands. The application of this approach makes it possible to significantly simplify the programming of complex problems of genetic programming and other types of evolutionary computations, using the software and hardware of the multiprocessor systems themselves.
The novelty of this work lies in the use of the multiprocessor systems themselves for genetic programming of planning problems. Their resources are proposed to be used both for creating a primary population of individual programs and for executing the genetic programming operators: mutation, crossover and selection.

Formulation of the problem
Let a function F be given on the set of real numbers, connecting n arguments Xi (i=1,2,3…,n) and m constants Cj ( j=1,2,3,…,m) by the arithmetic operations of addition, subtraction and multiplication. Also, a table is given, consisting of k rows containing elements of the known values and the values of the function calculated for them. Suppose that the function F has been "lost" and it is required to find an algorithm for recovering this function from a given set of values using a toroidal matrix of elementary computers. For small n and m, the problem is solved using personal computers. Problems of medium and high difficulty are solved with varying success using powerful software systems on supercomputers, on computer clusters, or using cloud computing. Let us consider ways of solving such problems using a system which consists of a large number of ECs with a special architecture and which uses the rules of interactions taken from the GA.
To connect ECs to each other, various architectures can be used [11], including homogeneous computing environments [12]. Our research was carried out using a computer model of a specialized computing system in the form of a toroidal EC matrix of size 10×10. Figure 1 shows a similar communication graph for an EC matrix of size 5×5. The vertex numbers correspond to an ordered pair of EC numbers along each coordinate, and to provide cyclic links, the last EC is connected to the first. To test the possibility of solving GP problems with such a system, a computer model was built in the Delphi environment [13]. The basic set of operations included addition, subtraction and multiplication. For testing, a training sample was taken, placed in table 1.
In the program (figure 2), the sample rows with the data X1, X2, X3 are entered in order in the fourth fields of the first three cells. Random constants C1-C5, taking integer values from -5 to 5, are entered in the fourth fields of the fourth to eighth cells of the EC. Seven random two-address commands are then formed in the first three fields of the ninth to fifteenth cells. The intermediate results of the calculations are entered in the fourth fields of the ninth to fourteenth cells, and the result of the program is entered in the fourth field of the fifteenth cell. When a program is generated, a simple rule is followed: the operands of the i-th instruction can only be cells with addresses from 1 to i-1. This simple rule avoids many complex procedures used by developers of software shells, for example the removal of unnecessary program code (the so-called introns).
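The cell layout and the operand rule described above can be sketched as follows. This is an illustration only, not the Delphi model of [13]: the list representation `[addr1, addr2, op, value]`, the order of fields and the function names `random_program` and `run_program` are assumptions.

```python
import random

# The basic operation set named in the text.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
N_INPUTS, N_CONSTS, N_COMMANDS = 3, 5, 7   # cells 1-3, 4-8 and 9-15


def random_program(rng):
    """Build 15 cells [addr1, addr2, op, value]; addresses are 1-based."""
    cells = [[None, None, None, 0.0] for _ in range(N_INPUTS)]
    cells += [[None, None, None, rng.randint(-5, 5)] for _ in range(N_CONSTS)]
    first = N_INPUTS + N_CONSTS + 1
    for i in range(first, first + N_COMMANDS):
        # Operand rule: the i-th command may only address cells 1..i-1.
        cells.append([rng.randint(1, i - 1), rng.randint(1, i - 1),
                      rng.choice(list(OPS)), None])
    return cells


def run_program(cells, inputs):
    """Load one sample row into cells 1-3, execute the commands in order
    and return the value of the last cell."""
    for k, x in enumerate(inputs):
        cells[k][3] = x
    for cell in cells[N_INPUTS + N_CONSTS:]:
        addr1, addr2, op, _ = cell
        cell[3] = OPS[op](cells[addr1 - 1][3], cells[addr2 - 1][3])
    return cells[-1][3]
```

For example, `run_program(random_program(random.Random(0)), [1.0, 2.0, 3.0])` evaluates one random specimen on a single sample row. Because every command refers only to lower addresses, a single pass through the cells is sufficient and no cyclic references can occur.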
Further, each EC executes the random program it has formed and evaluates the specimen: it calculates the sum of the squares of the differences between the results obtained by the program and those taken from the training sample (the so-called fitness function in GA). According to the given communication graph, it exchanges the programs and constants it has generated with neighboring ECs. In accordance with the rules of interaction, the EC crosses randomly selected pairs consisting of a neighbor's program and its own program using one-point crossover, calculates the fitness function and keeps the best program for itself. With a predetermined probability, which is a tuning parameter of the system, the EC mutates the constants or the commands and calculates the fitness function for the mutated specimen.
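The interaction step described above can be sketched as follows. The sketch rests on several assumptions that are not stated in the text: each EC is linked to four neighbors on the torus, a program is a list of cells, and the evaluator `run` is supplied by the caller. The names `torus_neighbors`, `fitness`, `one_point_crossover` and `keep_best` are hypothetical.

```python
import random


def torus_neighbors(row, col, size):
    """Neighbors of EC (row, col) on a size x size torus
    (four-neighbor links are an assumption)."""
    return [((row - 1) % size, col), ((row + 1) % size, col),
            (row, (col - 1) % size), (row, (col + 1) % size)]


def fitness(program, sample, run):
    """Sum of squared differences between program results and sample values.
    `sample` is a list of (inputs, expected) pairs."""
    return sum((run(program, xs) - y) ** 2 for xs, y in sample)


def one_point_crossover(own, other, rng):
    """Take the cells before a random cut from `own` and the rest from `other`."""
    cut = rng.randrange(1, len(own))
    return [list(c) for c in own[:cut]] + [list(c) for c in other[cut:]]


def keep_best(own, other, sample, run, rng):
    """Cross the EC's own program with a neighbor's one and keep the fitter
    of the own program and the child."""
    child = one_point_crossover(own, other, rng)
    return min((own, child), key=lambda p: fitness(p, sample, run))
```

If both parents obey the rule that a command addresses only earlier cells, any cut point yields a child that obeys it as well, which is why the crossover needs no repair step in this representation.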
The diagram of the crossover and mutation operators is shown in figure 3; the numbers indicate the order in which the GP operators are executed. The processes of exchange, crossover, mutation and natural selection then continue until a specimen appears with a fitness function value equal to zero. Thus, within the limits of the binary tree, the EC system finds the required sequence of commands and the number and values of the constants that satisfy the training sample. For the test training sample presented above, more than ten thousand program restorations from various starting positions were carried out with the toroidal EC matrix emulated on a personal computer (PC). The collective EC system demonstrated stable restoration of the "lost" function. One of these results is shown in table 2.

Verification of the correctness of the resulting GP scheme
In [14] the results of the analytical construction of the mathematical model are presented. To further check the correctness of the resulting scheme, we compared them with the results obtained using the GP scheme.
In [15], the method of evolutionary decision reconciliation (EDR) is described. It is a new method of making collective decisions by a group of experts, based on the application of interaction rules taken from genetic algorithms.
Let us consider a modification of this method. Each expert acts in two roles: as a generator of decisions or their parts at the first stage (the stage of generation) and as an evaluator of other experts' decisions at the second stage (the stage of reconciliation). If the case is especially difficult, the experts may answer "I do not know". Suppose the group consists of M experts and they are assigned a task that they cannot solve alone. Let us introduce the notation. Let G be the average probability that an expert would offer the correct decision; G, the average probability that an expert would answer "I do not know". Consequently, M * G is the number of experts who would answer "I do not know". At the second stage (the stage of reconciliation), the experts who said "I do not know" are provided with all the solutions, correct and incorrect, obtained at the generation stage.
Let us assume that they will choose correct decisions from this list with probability E. Let the average probability of an expert in the group making the correct decision at the second stage be P. It is clear that P ≥ G and that P depends on M, G, E, G. It is required to find this dependence.
Let us find it using GP. We apply the Monte Carlo method to a computer model that simulates the work of a group of experts using the EDR method and construct the training sample. The restoration was launched twice; in both cases the restored function differed from the original one only from the seventh significant digit onward. After changing the notation of variables, we thus obtained an expression for the desired function. Taking into account that the restoration was carried out at M = 3, the final expression for the desired function is fully consistent with the analytical conclusion of [11]. Let the group consist of M experts. To obtain the probability Q of a correct decision by a majority of votes, we apply the Condorcet formula [16] and obtain the complete model of this type of EDR method.
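The majority-vote step alone can be illustrated with the standard binomial form of the Condorcet jury result. The sketch below is not the complete EDR model of the paper: it assumes that the M experts vote independently, that each is correct with the same probability P, and that M is odd so that no ties occur. The function name `majority_correct_probability` is hypothetical.

```python
from math import comb


def majority_correct_probability(p, m):
    """Probability Q that more than half of m independent experts,
    each correct with probability p, reach the correct decision."""
    if m % 2 == 0:
        raise ValueError("m is assumed to be odd so that no ties occur")
    return sum(comb(m, k) * p ** k * (1 - p) ** (m - k)
               for k in range(m // 2 + 1, m + 1))
```

For M = 3 the sum reduces to Q = 3P²(1 - P) + P³, so a group of three experts with P = 0.6 reaches a correct majority decision with probability 0.648 under these assumptions.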

Conclusion
We can conclude that, with a collective intelligence system that works according to interaction rules taken from genetic algorithms, the classical GP problem of compiling program code from a training sample can be solved not only on supercomputers using the most complex software systems, but also on a matrix of the simplest ECs. The results obtained are of fundamental importance, since they demonstrate the efficiency of the basic principles underlying collective intelligence systems when solving a complex GP problem with interaction rules taken from genetic algorithms. It can also be concluded that the proposed approach of replacing floating point calculations with tabular calculations can be applied to significantly reduce the consumption of electrical energy in the power supplies of on-board computers of aviation and space technology systems, in which reducing the weight of equipment is of fundamental importance.