Distributed approach for optimization problems

The paper deals with a distributed approach to optimization problems based on the use of bioinspired algorithms. This class of algorithms makes it possible to split a set of solutions into independent subsets and process them in separate threads. One of the main problems is the decomposition and subsequent convolution of the solution, and it becomes more complex when several decomposition levels are required. From the point of view of computing power, the simultaneous processing of parallel threads requires significant CPU time and RAM. In this regard, the use of several compute nodes that interact via network interfaces both increases the available computational resources and enhances the fault tolerance of the system. The paper proposes a distributed subsystem for solving NP-complete optimization problems that splits a set of input data into subsets in an automated mode, distributes subtasks between computational nodes, and collects the results to solve the original problem. To confirm the system's performance, a software implementation was developed in Java, with the RabbitMQ message broker ensuring the interaction of software agents with each other. A series of experiments was carried out with several simultaneously running tasks and agents.


Introduction
In the world community, bioinspired search optimization methods, based on the simulation of natural phenomena and biological systems, are used to speed up the search for solutions and to partially mitigate the problem of premature convergence in NP-hard problems [1]. One of the main features of this class of algorithms is that they process a set of alternative solutions simultaneously, which makes them an alternative to classical search algorithms that work with only one solution [2]. Thus, optimal and quasi-optimal solutions can be found in polynomial time. As the amount of input data increases, the number of possible alternative solutions grows factorially. Therefore, approaches that decompose the task into parts and process these parts in parallel are an important research direction [1][2][3].
This paper proposes a new, distributed approach to the implementation of bioinspired algorithms for optimization problems. The main idea of the proposed approach is to automatically divide the problem into parts according to a given criterion and a certain rule, with subsequent convolution of the partial solutions. The classic traveling salesman problem, well-known benchmarks (Eilon 50, Eilon 75, Oliver 30) [1][2][3], and randomly generated graphs with given parameters were chosen as the test problems.
All bioinspired algorithms belong to the class of heuristic algorithms. For these algorithms, convergence to the global optimum has not been proved theoretically, but it has been empirically established that the probability of obtaining the optimal or quasi-optimal solution is high [3,4].

A general scheme of bioinspired algorithms
In general, all bioinspired algorithms include the following steps.
 Initialization of the initial set of alternative solutions (population). A given number of alternative solutions, close to the desired solution to varying degrees, is created in the specified search area. Deterministic or random algorithms can be used to obtain the initial population [4][5][6].
 Generation of new alternative solutions by applying the algorithm's operators to the current population, followed by fitness evaluation and selection.
 Checking the stopping criterion. When it is reached, the best solution in the population is taken as an approximate solution of the given problem.
One of the main advantages of bioinspired algorithms is their modular structure. It allows a large number of new variants of the algorithm to be obtained quickly by developing new rules and modifying the existing rules for initializing and generating new agents. All bioinspired algorithms are characterized by the following properties of agents [7]: autonomy, stochastic behaviour, communicability, informational limitations, and decentralization.
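The general loop above can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: the fitness function (minimize f(x) = x^2) and the Gaussian mutation operator are placeholder assumptions chosen only to make the skeleton runnable.

```java
import java.util.Random;

// A minimal sketch of the general bioinspired loop described above:
// initialize a population, generate new solutions, select, repeat.
// The fitness function and mutation operator are illustrative placeholders.
public class BioinspiredSketch {
    static final Random RNG = new Random(42);

    // Illustrative fitness: minimize f(x) = x^2 (optimum at x = 0).
    static double fitness(double x) { return x * x; }

    public static double optimize(int populationSize, int iterations) {
        // Step 1: random initialization of the initial population.
        double[] population = new double[populationSize];
        for (int i = 0; i < populationSize; i++) {
            population[i] = RNG.nextDouble() * 20 - 10;
        }
        for (int t = 0; t < iterations; t++) {
            for (int i = 0; i < populationSize; i++) {
                // Step 2: generate a new alternative solution (mutation).
                double candidate = population[i] + RNG.nextGaussian() * 0.5;
                // Step 3: selection -- keep the better of the two.
                if (fitness(candidate) < fitness(population[i])) {
                    population[i] = candidate;
                }
            }
        }
        // Step 4: stopping criterion reached -- return the best solution.
        double best = population[0];
        for (double x : population) {
            if (fitness(x) < fitness(best)) best = x;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("Best solution found: " + optimize(30, 200));
    }
}
```

Swapping the placeholder operators for problem-specific initialization, generation, and selection rules yields the "new variants of the algorithm" mentioned above.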
Bioinspired optimization algorithms have a number of advantages over classical optimization methods, especially when solving poorly formalized and high-dimensional problems. Under such conditions, bioinspired algorithms provide a high probability of finding the optimal or quasi-optimal (approximate) solution in polynomial time, and an approximate solution is often sufficient. Nevertheless, with large amounts of input data, a significant increase in the running time of bioinspired heuristics is observed, because the time complexity of these algorithms varies from O(n^2) to O(n^3). The authors suggest a distributed approach to address the high-dimensionality problem. The main idea is to use a message broker and distribute "tasks" between connected software agents.

Architecture of distributed system for optimization problems
To increase the speed of algorithms for optimization problems, the authors suggest a system that automatically decomposes tasks into subtasks and reconciles the results [6-8]. As an example, a solution of the traveling salesman problem based on a preliminary decomposition of the input data is proposed. A generalized scheme of the algorithm is presented in figure 1. The system is built around a message broker; it is proposed to use the well-established open-source message broker RabbitMQ [6-9].
The term "task" denotes the following set of data: the decomposition conditions and rules, the reconciliation (convolution) and fitness function calculation rules, the rules for decision calculation, etc.
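The "task" data set described above can be sketched as a simple data structure. The field names, types, and functional interfaces here are assumptions for illustration, not the paper's actual input format.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative sketch of a "task": input data plus the decomposition
// condition, decomposition rule, and convolution rule. All names and
// types are assumptions, not the paper's actual schema.
public class TaskSketch<I, O> {
    public final String id;
    public final String parentId;                         // empty for the root task
    public final I inputData;
    public final Predicate<I> decompositionCondition;     // condition of Eq. (1)
    public final Function<I, List<I>> decompositionRule;  // splits the input data
    public final Function<List<O>, O> convolutionRule;    // merges child solutions

    public TaskSketch(String id, String parentId, I inputData,
                      Predicate<I> decompositionCondition,
                      Function<I, List<I>> decompositionRule,
                      Function<List<O>, O> convolutionRule) {
        this.id = id;
        this.parentId = parentId;
        this.inputData = inputData;
        this.decompositionCondition = decompositionCondition;
        this.decompositionRule = decompositionRule;
        this.convolutionRule = convolutionRule;
    }

    // A task whose parent reference is empty is the root task.
    public boolean isRoot() { return parentId == null || parentId.isEmpty(); }
}
```

Packaging the rules with the data is what lets an agent process any task pulled from the queue without problem-specific code.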
If the decomposition condition in Eq. (1) is met for a task T with input data X, the decomposition rule R_d is applied to decompose the set of input data into subsets X'_i and to create the set of child subtasks T' = {T'_i} = R_d(T).
The result of the convolution rule is a set of output data (solutions) (2). These solutions can be the result of both subtasks and the root task. In this paper, it is proposed that each task stores a reference to its parent task. If for the current task the reference to the parent task is an empty set, then this task is the root, and its solution is the solution to the original applied task as a whole.
In the general case, the life cycle diagram of the task is presented in Figures 2 and 3. Figure 3 shows the convolution mechanism.
The original task goes to the message broker's calculation queue. The computation module of one of the agents selects the next task from the queue, registers it in the data warehouse with a unique identifier and the status IN_PROGRESS, and checks the decomposition rule. If the rule is triggered, the original task is divided into subtasks, which are sent to the message queue; otherwise, calculations are performed and the results are stored in the repository for the current task with the status DONE. To convolve the obtained results, each agent queries the data store at each time interval Δτ and selects data according to the sample rule S, which can be described as follows:
 To count the number of tasks with the same parent, a query is made to the repository; this query counts all registered tasks that are children of one task.
 To count the number of tasks with the status DONE and the same parent, a second query is made; it counts all registered completed tasks that are children of one task.
 If for some parent the total number of tasks coincides with the number of completed tasks, then the convolution rule can be executed for all tasks with that parent. If there are several such groups, the current agent selects the first from the list.
In order for the convolution rule not to be applied simultaneously to the same subtasks by different agents, the data warehouse denies access to other agents during the execution of the sample rule S. Let us consider an example of the sample rule S for a relational data warehouse using MySQL.
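The selection logic of the sample rule S can be sketched in memory before turning to the relational implementation. This is an illustrative sketch under assumed names: a group of sibling subtasks is ready for convolution when its total count equals its count of DONE tasks.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory sketch of the sample rule S: find a parent task whose children
// are all registered with status DONE, so the convolution rule can be
// applied to that group. All names here are illustrative assumptions.
public class SampleRuleSketch {
    public record TaskRow(String id, String parentId, String status) {}

    // Returns the parent id of a group ready for convolution, or null.
    public static String firstReadyGroup(List<TaskRow> rows) {
        // Count all registered children per parent task.
        Map<String, Long> total = rows.stream()
            .filter(r -> r.parentId() != null)
            .collect(Collectors.groupingBy(TaskRow::parentId, Collectors.counting()));
        // Count children with status DONE per parent task.
        Map<String, Long> done = rows.stream()
            .filter(r -> r.parentId() != null && "DONE".equals(r.status()))
            .collect(Collectors.groupingBy(TaskRow::parentId, Collectors.counting()));
        // A group is ready when the two counts coincide.
        for (Map.Entry<String, Long> e : total.entrySet()) {
            if (e.getValue().equals(done.get(e.getKey()))) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

In the distributed setting this check must run under a lock in the data warehouse, as described above, so that two agents cannot convolve the same group simultaneously.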
There is a table TASK with the following fields: ID is a unique identifier of the task, PARENT_ID is a unique identifier of the parent task, STATUS is the current status of the task.
Consequently, the following queries will be applied:
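The paper's actual queries are not reproduced here; the constants below are a hypothetical reconstruction of queries over the TASK table that implement the three steps of the sample rule S, given only the described logic.

```java
// Hypothetical MySQL queries implementing the sample rule S over the TASK
// table (ID, PARENT_ID, STATUS). These are an assumption reconstructed from
// the described logic, not the queries used in the original implementation.
public class TaskQueries {
    // Number of all registered child tasks per parent task.
    public static final String COUNT_ALL =
        "SELECT PARENT_ID, COUNT(*) AS TOTAL FROM TASK " +
        "WHERE PARENT_ID IS NOT NULL GROUP BY PARENT_ID";

    // Number of completed (DONE) child tasks per parent task.
    public static final String COUNT_DONE =
        "SELECT PARENT_ID, COUNT(*) AS DONE_CNT FROM TASK " +
        "WHERE PARENT_ID IS NOT NULL AND STATUS = 'DONE' GROUP BY PARENT_ID";

    // Parent tasks whose total and completed child counts coincide,
    // i.e. groups that are ready for the convolution rule.
    public static final String READY_GROUPS =
        "SELECT PARENT_ID FROM TASK WHERE PARENT_ID IS NOT NULL " +
        "GROUP BY PARENT_ID " +
        "HAVING COUNT(*) = SUM(CASE WHEN STATUS = 'DONE' THEN 1 ELSE 0 END)";
}
```

The third query folds the first two into a single HAVING clause, which avoids a race between the two separate counts.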

Experiments
To confirm the mechanism proposed in this work, a distributed system for solving optimization problems was implemented. RabbitMQ [7-9] was chosen as the message broker, and the MySQL DBMS [8,9] as the data storage. Software agents are implemented in Java. To obtain estimates of the system's operating time and the quality of solutions, the classical optimization problem of finding a Hamiltonian cycle in a graph was chosen. The benchmarks Eilon_50 and Oliver_30, as well as graphs generated with specified parameters, are considered as test graphs. A simple genetic algorithm was chosen to solve the problem. The condition for decomposition is n > 0.2 * n_0, where n is the number of vertices in the subtask's graph and n_0 is the number of vertices in the original graph. The decomposition rule splits the set of input data in half: the graph of the task is randomly partitioned into two subgraphs, and two subtasks are initialized whose input data are the subgraphs obtained by the splitting. The control parameters of the genetic algorithm are as follows: the population size is proportional to n_0; the number of iterations is 1000; the probability of the crossover operator is 85%; the probability of the mutation operator is 15%; the number of simultaneously running agents is 3.
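The recursive decomposition used in the experiments can be sketched as follows. As a simplification, the graph is represented only by its vertex list and the "random partition into two subgraphs" is a random halving of that list; the edge structure and the actual partitioning heuristic are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of the recursive decomposition from the experiments: a subtask is
// split in half at random while it holds more than 20% of the original
// graph's vertices (n > 0.2 * n_0). The graph's edges are omitted here.
public class DecompositionSketch {
    static final Random RNG = new Random(1);

    public static List<List<Integer>> decompose(List<Integer> vertices, int originalSize) {
        List<List<Integer>> leaves = new ArrayList<>();
        split(new ArrayList<>(vertices), originalSize, leaves);
        return leaves;
    }

    private static void split(List<Integer> part, int originalSize,
                              List<List<Integer>> leaves) {
        // Decomposition condition: subtask size exceeds 20% of the original.
        if (part.size() > 0.2 * originalSize && part.size() > 1) {
            Collections.shuffle(part, RNG);   // random partition of the vertex set
            int mid = part.size() / 2;
            split(new ArrayList<>(part.subList(0, mid)), originalSize, leaves);
            split(new ArrayList<>(part.subList(mid, part.size())), originalSize, leaves);
        } else {
            leaves.add(part);                 // small enough: solve directly
        }
    }
}
```

Each leaf corresponds to a subtask sent to the message queue; the convolution mechanism then merges the partial tours back into a solution of the original graph.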
In the first series of experiments, the dependence of the system speed on the number of simultaneously performed tasks was identified. In each run, N identical tasks are simultaneously added to the system; the next run is performed for N + ΔN, where ΔN is the increment of the number of simultaneously running tasks.
The dependence of time on the number of tasks in the system is shown in figure 4.

Conclusion
The paper presents a bioinspired distributed system for optimization problems based on the automated decomposition of input data according to given rules for solving subtasks. The obtained solutions are then automatically merged into the solution of the initial task. The main feature of the system is that the engineer solving a specific applied task does not need to implement the software modules supporting the algorithm and can concentrate directly on the applied methods and algorithms. The paper presents the main stages of the system's operation, as well as the schemes of the decomposition and convolution mechanisms. The format of the input data has been described in detail. A prototype of the distributed system for solving optimization problems has been implemented based on the described mechanisms. The dependence of the system speed on the number of simultaneously running agents and simultaneously running tasks was studied. The results of the experiments showed that, for the most efficient operation of the system, the number of simultaneous agents should equal the expected number of simultaneously solved tasks. Further development of the distributed system consists in the integration of a non-relational high-performance DBMS as the data warehouse to increase the overall system speed.