Model and method for optimizing heterogeneous systems

A methodology for boosting the performance of distributed computing by reducing the number of delays is proposed. The concept of an n-dimensional requirements triangle is introduced. A dynamic mathematical model of resource use in distributed computing systems is described.


TIAA2016
IOP Publishing, IOP Conf. Series: Materials Science and Engineering 155 (2016) 012043, doi:10.1088/1757-899X/155/1/012043. Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd.

An element $p_i^1$ of $P^1$ is characterized by the following parameters: $q_i^t$, the free processing power of node $p_i$ at time point $t$; $m_i^t$, the free storage capacity provided for data storage in node $p_i$ at time point $t$; and $w_i^t$, the reliability measure of node $p_i$ at time point $t$, calculated as the probability of a breakdown of node $p_i$.
A data warehouse $p_i^2$ of $P^2$ has the following parameters: $f_i^t$, the disk space allocated for new data, and $a_i$, the database type. Channels $p_i^3$ of $P^3$ are characterized by their own parameters.

To simplify the application of optimization algorithms, these sets may be joined, provided that all parameters of a subset's elements that relate to another subset are zero. For example, for $P^2$ all parameters except $f_i^t$ and $a_i$ are zero.
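The joined set may be illustrated by a minimal Python sketch. It assumes one record type holding a snapshot of all parameters at a single time point $t$; the class name `Resource`, the helper `is_warehouse`, and the integer coding of the database type are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One element of the joined resource set at a fixed time point t.

    Parameters that belong to another subset keep their zero default.
    """
    name: str
    q: float = 0.0  # free processing power (node parameter)
    m: float = 0.0  # free storage capacity (node parameter)
    w: float = 0.0  # reliability measure (node parameter)
    f: float = 0.0  # disk space allocated for new data (warehouse parameter)
    a: int = 0      # database type code, 0 = not applicable (assumed coding)


def is_warehouse(r: Resource) -> bool:
    """A record is treated as a warehouse if only f and a are non-zero."""
    return (r.q, r.m, r.w) == (0.0, 0.0, 0.0) and (r.f != 0.0 or r.a != 0)


node = Resource("p1", q=4.0, m=16.0, w=0.01)
warehouse = Resource("d1", f=500.0, a=2)
```

With this layout an optimization routine can iterate over a single list of records without branching on the element type.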
Considering the above, the function that defines each edge may be written as $W_{ji}^t$; it describes the transfer from node $j$ to node $i$ at time point $t$. The function $W_{ji}^t$ is multi-criterion and defines the free capacity $o_{ji}^t$ of the edge from $i$ to $j$ at time point $t$.
As mentioned earlier, input data for distributed computer systems are generated mainly by automated data capture systems that collect information from different information gathering systems. Let us call them sources in terms of network modeling. The main task of distributed computing is to process raw data coming from sources as required (applying predetermined models and methods) and to transfer the results to destinations for use in decision support systems at the workplaces of decision makers [7].

To describe the processes applied to data on its way from source to destination, we represent it as a flow. Let us denote by $q_{ji}^k \in Q$ (the set of all flows in the system) the $k$-th flow with its source in node $p_i$ and its destination in node $p_j$. This flow is characterized by the following: $V_{input}^k$, the input information volume; $V_{output}^k$, the output information volume; $c^k$, the amount of work (number of calculations) needed to transform the input information into the output; and the requirements on the data transfer route: cost of transfer in rubles, time frame for packet transfer, and reliability.
If necessary, $q^k$ may be divided into $N$ stages for serial or parallel processing, so that $c^k = \sum_{n=1}^{N} c_n^k$, where $c_n^k$ is the volume of the $n$-th stage of $q^k$. The time required for processing and transfer of the $k$-th flow is $\tau^k = \sum_{n=1}^{N} \tau_n^k$, where $\tau_n^k$ is the expected processing time of the $n$-th stage of $q^k$.
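The division into stages may be sketched as follows. The sketch assumes an equal split of the work between stages, a sum of stage times for serial processing, and the maximum of stage times for fully parallel processing; the equal split and the parallel rule are assumptions of this example, and the function names are hypothetical.

```python
def split_work(c_k: float, n_stages: int) -> list[float]:
    """Split the total work c_k of a flow into n_stages equal parts."""
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    return [c_k / n_stages] * n_stages


def flow_time(stage_times: list[float], parallel: bool = False) -> float:
    """Expected time of a flow from the expected times of its stages.

    Serial stages add up; for parallel stages the slowest stage
    determines the total (assumption of this sketch).
    """
    return max(stage_times) if parallel else sum(stage_times)


stages = split_work(12.0, 4)
serial = flow_time([1.0, 2.0, 3.0])
concurrent = flow_time([1.0, 2.0, 3.0], parallel=True)
```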
One may easily see that the whole flow of input and output data may be divided into separate parts if the calculations are divided in this way. To denote them we introduce the following: $V_{input}^{k,n}$, the input data volume for stage $n$, and $V_{output}^{k,n}$, the expected output data volume of stage $n$.

More generally, the task may be stated as follows: minimize the transfer time of all flows existing at time point $t$ from sources in nodes $p_i$ to destinations in nodes $p_j$ under limited resources defined for each edge as $o_{ji}^t$. Because at each subsequent time point $t+1$ the function may have a different extremum, the task is dynamic.
The formal representation of this statement is as follows: $\sum_k \tau_k^t \to \min$, where $\tau_k^t$ is the expected processing time of the $k$-th flow existing at time point $t$, under the limitations $\sum_{k,n} c_{n,ij}^{k,t} \le o_{ji}^t$, where $\sum_{k,n} c_{n,ij}^{k,t}$ is the total of all jobs to be run on the edge from $i$ to $j$ at time point $t$.
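A minimal sketch of how a candidate assignment could be evaluated against this statement is given below. It only computes the objective and checks the edge limitations; it does not search for the minimum. The data layout (edges as tuples, jobs as pairs of edge and work) and the function names are assumptions of the example.

```python
def total_time(flow_times: dict[str, float]) -> float:
    """Objective: sum of expected processing times of all flows at time t."""
    return sum(flow_times.values())


def feasible(jobs: list[tuple[tuple[str, str], float]],
             capacity: dict[tuple[str, str], float]) -> bool:
    """Check that the total work placed on every edge fits its free capacity."""
    load: dict[tuple[str, str], float] = {}
    for edge, work in jobs:
        load[edge] = load.get(edge, 0.0) + work
    # An edge missing from `capacity` is treated as having no free capacity.
    return all(total <= capacity.get(edge, 0.0) for edge, total in load.items())


jobs = [(("i", "j"), 3.0), (("i", "j"), 4.0), (("j", "k"), 2.0)]
ok = feasible(jobs, {("i", "j"): 8.0, ("j", "k"): 2.0})
too_tight = feasible(jobs, {("i", "j"): 6.0, ("j", "k"): 2.0})
objective = total_time({"q1": 1.5, "q2": 2.5})
```

Since the limitations change from one time point to the next, such a check would have to be repeated with fresh capacity values at each $t$.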
One of the most important tasks in calculating this model is the calculation of the limitations $o_{ji}^t$, because they originate not only from internal factors. To calculate $o_{ji}^t$ we use the parameter $s_i^t$ (the forecasted delay of $p_i^3$ at time point $t$), which depends on external unaccounted effects. In most cases, companies transfer information over external communication channels whose performance and bandwidth are unreliable.
The best instrument for forecasting in this case is neural networks. Statistical methods and time series methods may also be useful, but the configuration of the global network is constantly changing, and in most cases these methods do not solve the problem [8]. Unlike these methods, neural networks are capable not only of running a predetermined sequence of operations but also of analyzing incoming information on the fly, finding patterns in it, and making forecasts. Neural networks learn continuously from previous values.
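The idea of learning from previous values on the fly may be illustrated with a deliberately small sketch: a single linear neuron trained online with the least-mean-squares rule to predict the next delay from a sliding window of past delays. This is a stand-in chosen for brevity, not the network architecture of the proposed method, which the text does not specify; the window size, learning rate, and function names are assumptions.

```python
def train_online(delays: list[float], window: int = 3, lr: float = 0.05):
    """Fit one linear neuron to predict delays[t] from the previous `window` values.

    The weights are updated after every observation (LMS rule), so the
    predictor keeps adapting as new delay measurements arrive.
    """
    weights = [0.0] * window
    bias = 0.0
    for t in range(window, len(delays)):
        x = delays[t - window:t]
        predicted = bias + sum(w * xi for w, xi in zip(weights, x))
        error = delays[t] - predicted
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
        bias += lr * error
    return weights, bias


def forecast(delays: list[float], weights: list[float], bias: float) -> float:
    """Predict the next delay from the most recent window of observations."""
    x = delays[-len(weights):]
    return bias + sum(w * xi for w, xi in zip(weights, x))


history = [1.0] * 500  # synthetic, constant delay used only for illustration
weights, bias = train_online(history)
next_delay = forecast(history, weights, bias)
```

On real channel measurements the inputs would need scaling, and a multi-layer network would replace the single neuron.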
Method. There are two methods to solve this problem:
1. Static: scaling the capacity of the computer network, first of all at bottlenecks or problem nodes. This approach is not universal at the given level of complexity because it does not allow the problem to be solved in real time. It is also ineffective when data volumes and flow directions change constantly, and it is costly [9].
2. Dynamic: taking the load off overloaded nodes and moving it to idle ones in real time (resource allocation optimization).
The dynamic approach balances the load between the nodes of the computer system, making it more uniform; it increases the performance and reliability of information processing and decreases the total cost of ownership of the resources.
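The dynamic method may be sketched as a simple greedy redistribution: the excess load of every overloaded node is moved to the nodes with the most free capacity. The greedy rule, the scalar notion of load, and the function name `rebalance` are assumptions of this example and not the optimization procedure of the model.

```python
def rebalance(loads: dict[str, float], capacity: dict[str, float]):
    """Move excess load from overloaded nodes to nodes with free capacity.

    Returns the new load per node and the list of moves (source, target, amount).
    """
    loads = dict(loads)  # work on a copy, keep the input unchanged
    moves = []
    overloaded = [n for n in loads if loads[n] > capacity[n]]
    for src in overloaded:
        excess = loads[src] - capacity[src]
        # Try the idlest nodes first (largest free capacity).
        targets = sorted(loads, key=lambda n: capacity[n] - loads[n], reverse=True)
        for dst in targets:
            if excess <= 0:
                break
            free = capacity[dst] - loads[dst]
            if dst == src or free <= 0:
                continue
            moved = min(excess, free)
            loads[src] -= moved
            loads[dst] += moved
            excess -= moved
            moves.append((src, dst, moved))
    return loads, moves


new_loads, moves = rebalance(
    {"a": 12.0, "b": 2.0, "c": 5.0},
    {"a": 8.0, "b": 8.0, "c": 8.0},
)
```

The total load is conserved; if the idle nodes cannot absorb all of the excess, the remainder stays on the overloaded node.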
The rule of the n-dimensional requirements triangle may be used to define the optimal set of criteria for the equation.
Criteria are divided into interrelated, mutually conflicting triplets. The values of all criteria should be normalized: we take 1 as the best value of a criterion and 0 as the worst. Based on the conflict and interconnection of the requirements, let us consider each third requirement to conflict with the two others. This may be interpreted as a system of equations in which the angles $\gamma$ are those between segments AO and OB, BO and OC, and AO and OC, respectively. Segments AB, BC and AC in this case define not only the interconnection but also the conflict.
Let us introduce a requirements balancing function. For three requirements its geometric meaning is the area of triangle ABC. An equilateral triangle has the maximal area. In the analyzed case, equality of the sides is achieved when all requirements are equal, that is, balanced.
The number of requirements in most real-world jobs is much greater than three. To apply the rule of the n-dimensional triangle, it is necessary to draw a triangle in every plane defined by a triple of requirements. To simplify the task, the areas of the drawn triangles may be multiplied together and the product maximized.
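A minimal numerical sketch of the balancing function is given below. It assumes that the three requirements of a triple are laid out on mutually orthogonal axes, so that points A, B and C lie at distances equal to the normalized criterion values from the origin O; the orthogonality is an assumption of the example, since the angles are parameters of the model. The function names are hypothetical.

```python
from itertools import combinations
from math import prod, sqrt


def triangle_area(a: float, b: float, c: float) -> float:
    """Area of triangle ABC with A=(a,0,0), B=(0,b,0), C=(0,0,c).

    Half the norm of the cross product AB x AC = (bc, ac, ab).
    """
    return 0.5 * sqrt((a * b) ** 2 + (b * c) ** 2 + (c * a) ** 2)


def balance(values: list[float]) -> float:
    """Product of the triangle areas over all triples of requirements."""
    return prod(triangle_area(*triple) for triple in combinations(values, 3))


balanced = triangle_area(0.5, 0.5, 0.5)
skewed = triangle_area(1.0, 0.25, 0.25)  # same sum of values, unequal split
four_requirements = balance([1.0, 1.0, 1.0, 1.0])
```

For the same sum of criterion values, the balanced triple gives the larger area, which agrees with the geometric interpretation above.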
Summary. The proposed methodology allows formalizing the process of resource allocation in a distributed computer system by means of a dynamic mathematical model. Software implementation of