Energy-efficient computing through software optimization aimed at execution time

Despite all the capabilities of modern hardware, software optimization has always remained a relevant task. Traditionally, software optimization has aimed to reduce execution time. In recent decades, however, the enhanced energy-saving capabilities of hardware have sometimes made it possible also to reduce energy consumption and/or improve the energy efficiency of computing. Indeed, if computations finish faster, the CPU can suspend unused computational and control blocks. In the case of parallelization, which is one of the optimization techniques, power consumption can rise during short periods of computation because multiple CPUs are used simultaneously. At the same time, execution time can decrease dramatically, so the energy efficiency of computing still increases. In this paper the authors present experiments on the small-size mini PC Raspberry Pi 3B which show that the tiling optimization method not only speeds up processing but also increases the energy efficiency of computing. For such low-power systems this can be useful for extending operating time on a battery power source.


Introduction
The power consumption of computer systems has been attracting attention for a long time. There are several reasons for this. First, the demand for higher computing performance has increased the number and activity of nodes in computer systems. The second reason is the development of computing systems for different purposes, such as stand-alone computing units, mobile devices and on-board computers. Development and research of embedded computing systems have brought particular attention to energy use issues [1]. The third reason worth mentioning is the development of systems based on multiprocessor computing, such as network servers, data centers and multi-machine complexes for high-performance computing. A feature of this class of systems is the need for a special infrastructure that provides sustainable power, cooling, etc.
Andrae and Edler [2] show various projected scenarios of data center power consumption by 2030. Given this projected increase in data center power consumption and the structure of the demand, it can be concluded that a comprehensive approach is needed to reduce electricity consumption in this sector. In addition to measures that reduce computing costs, solutions that reduce the energy consumed by data center infrastructure should be implemented. This is true for both high-performance computing and global information networks.
Ciancarini et al. [3] showed that activities in various sectors of the economy are increasingly being digitized. Human activities are increasingly dependent on information technology, resulting in increased energy consumption. The use of smartphones, tablets and other mobile devices significantly expands the number of active users.
Gupta and Singh [4] show the power consumption structure of a personal computer, including the printer and communication system. The authors presented a technical overview of methods for minimizing the energy consumption of computer systems. Based on methods of calculating energy consumption that use hardware, software, processor and various algorithmic approaches, they conclude that there is a great need to change how the operating system works at moments of inactivity. The options available in the power circuits of computers and peripherals are based on a timeout approach. This is not enough to solve the problem of minimizing energy consumption.
Szydlowski and Chvála, Jr. [5] give a fairly detailed study of the power consumption of personal computers and workstations. The study should be taken into account even though, given evolving technologies, the data may appear outdated at first glance. With a load of 75 to 175 W, a standard PC consumed 144 W and a standard workstation 173 W. Peripheral equipment such as printers and modems accounted for only 16% of total workstation consumption. This indicates that energy-saving measures should focus on computers (50% of total consumption) and monitors (35% of total consumption). Laser printers, at 12% of total workstation consumption, also require attention because they usually remain in operation 24 h/day.
The present paper treats software optimization as one component of a comprehensive solution to the problem of reducing energy consumption. The research focuses on solutions for the design and use of mobile and embedded systems. This is an additional direction for research in the field of device energy efficiency.
The growth in energy consumption has also affected the field of mobile information technologies. It has outpaced the development of energy storage technologies, in particular battery capacity. This is reflected in a significant reduction in the battery life of mobile devices as their functionality increases. Thus, the challenge of improving the energy efficiency of information technology is not only to save energy but also to extend the battery life of mobile devices.
Hassan et al. [6] show how the performance and energy consumption of computer systems depend on the quality of program code. The authors demonstrated that the best performance indicators for a computer system require a balance between performance, capacity and energy consumption, and that the choice of the right coding style together with the "right" compiler determines whether this balance is achieved.
In this article, the authors' approach rests on three provisions. First, two types of power dissipation can generally be distinguished in a transistor gate: static and dynamic. Static power dissipation occurs while the transistor switch remains in its open or closed state. Dynamic losses occur when the transistor switches from one state to another. Dynamic losses account for up to 80% of the total power dissipation of digital circuits. Second, analysis of the algorithms run on computing systems shows that software spends most of its computational time in loops. By improving loops or loop sections, it is possible to reduce the activity of electronic components and, accordingly, the dynamic losses of digital circuits. Third, algorithm transformations should be done at the system level. This provides a significant benefit in reducing the impact of poor program source code on the performance of a computer system. So, the main idea of the authors' approach is to transform program sources with a focus on loop statements, using source-to-source transformation.
In what follows, the first section presents modern methods and approaches used in the optimization of computer programs. The next section describes the hardware and software tools used to estimate the execution time of the initial and optimized test programs. The results of applying the tiling and parallelization optimization methods, which show the change in the performance and energy efficiency of calculations, are given in the last section.

Motivation
From the very beginning of the development of computing systems, a lot of effort has been spent on making hardware smaller and less power-hungry. Methods such as dynamic clock rates, hibernation of unused blocks, lower operating voltages and comprehensive design of electrical layers have been used, and are still used, to reduce power consumption.
At present most electronic components are built using complementary metal-oxide-semiconductor (CMOS) technology. This technology has been studied by different authors [7,8], and all effects which lead to higher power consumption are known. All sources of power dissipation in CMOS digital circuits can be divided into 4 groups.
Dynamic dissipation is the main contributor to power dissipation and can account for up to 80% of total energy dissipation [9]. The dynamic power dissipation is given by

$$P_{dyn} = k \, C \, V^{2} \, f,$$

where C is the total capacitance of the circuit, V is the voltage swing, k is the activity factor and f is the switching frequency. Dynamic dissipation takes place when a circuit switches from one digital state to another. It is linear in the switching frequency: if there is no switching, dynamic power is absent. The more active elements are involved, the higher the dynamic dissipation [10].
In terms of software optimization, it can be concluded that any method which reduces the use of registers, memory or functional blocks can, in theory, also reduce dynamic dissipation. Software optimization methods target different software parameters. Often the main goal of optimization is to reduce the execution time of a program. This makes it possible to perform computational tasks faster or to extend the functionality of software with new features that were not feasible with slow computations. Software spends most of its execution time in computational loops, which perform the same operations on different data. Improving any single operation in a loop improves every iteration of it. Thus, optimization methods that target execution time are designed to be applied to computational loops.
A number of such optimization methods exist, and they take different approaches to speeding up a computational loop. Some of them are intuitively clear, like parallelization, where data are distributed between CPUs for faster execution. Others are less obvious, for example the tiling method, whose practical use is examined below.
Several models have been proposed for the mathematical description and handling of different optimization methods. One of the most widely used is the polyhedral model [11]. This model represents a loop or loop nest as a well-defined mathematical abstraction [12]. Every loop is described by a set of inequalities and bounds. Such an approach makes it possible to alter one or several loops by changing their parameters and then to convert the model back to code. The polyhedral model can handle the tiling method. Tiling is an optimization method which divides the iteration space into smaller parts called tiles. Computing on a smaller data range provides better data locality and fewer cache misses, which can lead to faster execution. Another promising factor is that tiling naturally yields separate computational blocks which can be distributed across different CPUs if there are no data dependences between them. Combining tiling and parallelization can have a synergistic effect.

Hardware and software set
As mentioned above, many optimization methods are used to gain different improvements: faster execution, lower data or code memory usage, etc. But what if the desired goal of optimization is lower power consumption or better energy efficiency? How should software be changed to achieve it?
Because a computing system is a complex set of many hardware sub-parts which consume energy, there is no single target parameter available in code to optimize. Whether some optimization methods can influence energy consumption therefore has to be investigated.
The authors performed several experiments to verify this. First, a software optimization method had to be chosen. The authors chose the tiling method, which uses the polyhedral model as its mathematical description. The main idea of the method is to divide the iteration space into smaller parts called tiles (figure 1). This produces better use of cache memory and improves data locality. In addition, a parallelization method was used to exploit the power of the multi-core CPU.
Usually, optimization is performed by the compiler or manually. There is another way to optimize source code that does not rely on the compiler's features.
To apply the tiling method, the authors used the Pluto framework [13]. This is a set of tools which can automatically apply several tiling methods as well as parallelization. This is achieved by transforming C source code into another source code according to a set of rules. The new code is then compiled as usual. Such code-to-code transformation allows researchers and developers to focus on and improve specific optimization methods. No changes to the compiler are required. Moreover, all of the compiler's own optimization methods are still applied at the compilation stage.
Next, test applications are required to verify any optimization. In general, an optimization aimed at reducing execution time or energy consumption can only be verified by actually executing the software. The authors chose the Polybench [14] set of test applications. It includes about 30 tests covering linear algebra, simulation, and vector and matrix computations. The tests have embedded tools for measuring execution time. By altering the tests through source-to-source transformation and then compiling and executing the new code, it is possible to measure the new execution time and compare it with the original.
Finally, a Raspberry Pi 3B board was chosen as the hardware platform for the tests (figure 2). This credit-card-sized mini PC has 4 ARM-based CPU cores, 1 GB of RAM, video and audio outputs and general-purpose input/output (GPIO) signals. It can run Linux-based distributions and is usually chosen for home automation, home or NAS servers and IoT. The main idea of the tests is to measure the speedup in execution time and the energy consumption, and to calculate the energy efficiency of the optimized tests.

Experiments
IOP Publishing doi:10.1088/1755-1315/1254/1/012037
To evaluate energy efficiency, it is not enough to measure only execution time. Optimized software can draw more or less electric power (for example, in the case of multi-core optimization) and be faster or slower, so electrical power consumption must also be taken into account. The energy required to compute a result is the integral of electric power over time or, in simplified form, when the exact power consumption over time is not known, the product of the average power and the computation time:

$$E = P \cdot T,$$

where P is the average power consumption and T is the execution time. To estimate the energy efficiency coefficient of an optimized version relative to the non-optimized one, it is necessary to compare the energy required in both cases. When the required energy is lower than before, the coefficient of energy efficiency is greater than 1; when it is higher, the coefficient is less than 1. Applying the previous formula, and denoting the coefficient by K and the non-optimized and optimized versions by the subscripts orig and opt, we obtain:

$$K = \frac{E_{orig}}{E_{opt}} = \frac{P_{orig} \, T_{orig}}{P_{opt} \, T_{opt}}.$$

To evaluate the energy efficiency coefficients, two measurements were made. The first was a measurement of execution time and the second a measurement of average power consumption. Both measurements were made for the non-optimized and all optimized versions of the test programs. The optimized versions were produced with different tiling options of the Pluto software. Applying the formula above yielded the energy efficiency coefficients, which are collected in table 1.
As the table shows, different coefficients of energy efficiency were obtained. For better readability the data are shown in figure 3. The test programs are on the horizontal axis and the coefficient of energy efficiency is on the vertical axis. For the non-optimized program it is always 1. The different optimization methods are depicted by lines of different colors. Some of the optimization methods are only variations of the tiling method, while others are combined with parallelization; in that case the word "parallel" appears in the method's name. The plot shows that some of the optimizations give much better results than others. Often these are the blue and red lines, which correspond to the "Tile Parallel" and "Innerpar Tile Parallel" methods. Both of them include the parallelization option. This suggests that parallelization contributes substantially to the gain in energy efficiency. As the figure shows, it is quite often possible to obtain an energy efficiency coefficient greater than 1. This means that the optimized software requires less energy to compute the same algorithm. On the other hand, there are several algorithms for which the total required energy was not reduced. For example, the Floyd-Warshall algorithm, which finds the shortest paths between all pairs of vertices in a weighted graph, did not achieve better energy efficiency with any of the applied optimization methods.

Conclusion
Reviewing the obtained results, it can be concluded that the optimization methods provide better energy efficiency in most cases. Usually this is achieved by a significant reduction in execution time accompanied by slightly increased power consumption. Together these results produce better energy efficiency of computing. In some particular cases the test program did not respond to the optimization methods, and in such cases energy efficiency stayed the same or even got worse.
The large differences in energy efficiency across tests and optimization methods show that the optimization method should be chosen carefully in each particular case. Several experiments should be done with different optimization methods, and the original non-optimized software should be measured as well. If several tiling methods give no speedup in execution time, then no tiling method will improve it, and another optimization method should be chosen.
The authors also performed such experiments with tiling methods on other hardware platforms. In general, the bigger the CPU cache, the bigger the speedup in execution that can be obtained.
To summarize all the tests, it should be highlighted that software optimization can indeed improve the energy efficiency of computing by reducing the number of CMOS circuit switching events. It improves program locality, which reduces dynamic and thus total power dissipation.

Figure 1. Example of original and tiled iteration spaces. The obtained tiles are marked with gray rectangles.
Figure 2 (b) also shows the Raspberry Pi hardware structure.

Figure 3. Coefficients of energy efficiency for different test programs and optimization methods.

Table 1. Coefficients of energy efficiency for Polybench tests on the Raspberry Pi 3 board for different optimization methods, times.