2 \chapter{Conclusion and Perspectives}
6 In this dissertation, we have proposed a method to optimize both the energy consumption and
7 the performance of synchronous and asynchronous iterative applications running over
8 cluster and grid platforms. Dynamic voltage and frequency scaling (DVFS) technique was used to
9 lower the frequency of the processor to reduce its energy consumption while computing. Reducing
10 the frequency of the processor reduces the computing power of a processor and thus the performance of the application running over that processor is decreased too. However, as the first step in this work, different energy consumption and performance models were developed to effectively used in the prediction and measurement processes for energy and performance of iterative applications. Depending on these models, an objective function was defined the best trade-off relation between the energy and the performance. Then, this objective function was used in algorithms which optimize both energy consumption and the performance of the iterative application at the same time.
12 The first part of this dissertation, chapter \ref{ch1}, presented different types of parallelism levels which have been categorized using some hardware and software techniques. Different parallel architectures have been
13 demonstrated and classified according to how the processing units and the memory model connected to each others. Both shared and distributed platforms as well as their depending parallel programming models have been categorized. The two types of parallel iterative applications: synchronous and asynchronous ones have
14 been discussed and compared to each others. It showed that applying the synchronous iterative methods are well adapted to local homogeneous clusters with a high speed network link, while the asynchronous iterative methods are more suited to the distributed heterogeneous clusters. At the end of this chapter, an energy consumption model proposed in the literature to measure the energy consumption of parallel applications was explained. This model does not consider the communication time of the parallel application being executed. Also, it is not well adapted to a heterogeneous architecture when there are different power consumption values for each type of processor.
16 In the second part of the dissertation, we have developed a new energy and performance models for
17 synchronous and asynchronous message passing applications running over a cluster and a grid. To simultaneously optimize the energy and performance of these applications, the trade-off relation between the two metrics have been defined as a maximum distance between the predicted energy and performance curves to select the best possible frequency scaling factors. We have proposed four different frequencies selecting algorithms to select the best frequencies that gives minimum energy reduction with maximum possible performance at the same time. They used the computation and communication times measured at the first iteration of the iterative application to predict the energy consumption and the performance of the parallel application at every available frequency. All these algorithms work online with a very small overhead on the execution time of a parallel application.
19 In chapter \ref{ch2}, we were proposed a new online scaling factor selection method that optimized simultaneously the energy and performance of a distributed synchronous application running on a homogeneous cluster. We have applied this algorithm to the NAS benchmarks of the class C and evaluated by the SimGrid simulator. Firstly, Rauber and Rünger’s energy model used by the proposed algorithm to select best frequency gear. The proposed algorithm was compared to the Rauber and Rünger optimization method, which gives better energy and performance trade-off ratios compare to them. Secondly, a new energy consumption model was developed to take into consideration both the computation and communication times and their relation with the frequency scaling factor. The new enenrgy model was used by the proposed algorithm to select different frequencies. Thus, a new simulation results have been shown, which are more accurate and realistic than other results obtained using the Rauber and Rünger's energy model.
20 In chapter \ref{ch3}, we were proposed two new online frequency scaling factors selecting algorithms to select the best possible vectors of frequency scaling factors that give best trade-off between the predicted energy and the predicted performance values of synchronous iterative application running over a heterogeneous cluster and a grid. Each algorithm depends on a new energy and performance models, which takes into account the underline parallel platform being used. Firstly, the proposed
21 scaling factors selection algorithm for a heterogeneous local cluster was implemented to the NAS parallel benchmarks of the class C and simulated by SimGrid. The results of the experiments showed that the algorithm on average reduces by 29.8\% the energy consumption of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance by 3.8\%.
22 Different frequency scaling factors were selected by the algorithm according to the ratio between the computation and communication times when different number of nodes were used, and when different values have been used for static and dynamic powers of the CPU. Secondly, the proposed scaling factors selection algorithm for a grid was implemented to the NAS parallel benchmarks of the class D and executed over Grid5000 testbed platform. The experiments on 16 nodes, distributed over three clusters, showed that the algorithm on average reduces by 30\% the energy consumption for all the NAS benchmarks while on average only degrading by 3.2\% the performance.
23 The algorithm was also evaluated in different scenarios that vary in the distribution of the computing nodes between different clusters’ sites or use multi-cores per node architecture or consume different static power values. The algorithm selects different vectors of frequencies according to the computations and communication times ratios, and the values of the static and measured dynamic powers of the CPUs.
24 Both of the proposed algorithms were compared to another method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithms outperform the latter by selecting a vectors of frequencies that give a better trade-off between energy consumption reduction and performance.
25 In chapter \ref{ch4}, we were presented a new online frequency selection algorithm for asynchronous iterative applications running over a grid. The algorithm uses new energy and performance models to predict the energy consumption and the execution time of asynchronous or hybrid message passing
26 iterative applications running over a grid. The proposed algorithm was evaluated twice
27 over the SimGrid simulator and Grid’5000 testbed while running a multi-splitting (MS)
28 application that solves 3D problems. The experiments were executed over different grid
29 scenarios composed of different numbers of clusters and different numbers of nodes
30 per cluster. The proposed algorithm was applied synchronously and asynchronously on a
31 synchronous and an asynchronous version of the MS application. Both the simulation
32 and real experiment results show that applying synchronous frequency selecting algorithm on an
33 asynchronous MS application gives the best tradeoff between energy consumption reduction
34 and performance compared to other scenarios. In the simulation results, this scenario
35 saves on average the energy consumption by 22\% and reduces the execution time of
36 the application by 5.72\%. This version optimizes both of the dynamic energy
37 consumption by applying synchronously the HSA algorithm at the end of the first iteration and the
38 static energy consumption by using asynchronous communications between nodes from
39 different clusters which are overlapped by computations. The proposed algorithm was also
40 evaluated over three power scenarios, which selects different vectors of frequencies proportionally to dynamic and static powers values. More energy reduction has been achieved when the ratio of the
41 dynamic power have been increase and vice versa. whereas, the performance degradation percentages were decreased when the static power ratio has been increased.
42 In the Grid’5000 results, this scenario saves the energy consumption by 26.93\% and
43 reduces the execution time of the application by 21.48\%. The experiments executed over Grid'5000 give better results than those simulated with SimGrid because the nodes used in Grid'5000 were more heterogeneous than the ones simulated by SimGrid.
44 In both of the Simulation and Grid'5000 testbed experiments, we have compared the proposed algorithm to a method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithm outperforms the latter by selecting a vector of frequencies that gives
45 a better trade-off between the energy consumption reduction and the performance.
46 Finally, we outline some perspectives that will be applied to this work in the future as in the next section.
49 \section{Perspectives}
50 In the near future, we will adapt the proposed algorithms to take into consideration the variability between some iterations. For example, each proposed algorithm can be executed twice: after the first iteration the frequencies are scaled down according to the execution times measured in the first iteration, then after a fixed number of iterations, the frequencies are adjusted according to the execution times measured during the fixed number of iterations. If the computing power of the system is constantly changing, it would be interesting to implement a mechanism that detects this change and adjusts the frequencies according to the variability of the system.
52 Also, it would be interesting to evaluate the scalability of the proposed algorithm by running it on large platforms composed of many thousands of cores. The scalability of the algorithm can be improved by distributing it in a hierarchical manner where a leader is chosen for each cluster or a group of nodes to compute their scaled frequencies and by using asynchronous messages to exchange the the data measured at the first iteration.
54 The proposed algorithms would evaluate to other message passing iterative methods in order to see how it adapts to the characteristics of the new method.
55 Also, it would be interesting to explore if a relation can be found between the numbers of asynchronous iterations required to global convergence and the applied frequencies to the
56 nodes. The number of iterations required by each node for global convergence
57 is not known in advance and the change in CPUs frequencies changes the
58 number of iterations required by each node for global convergence.
60 Furthermore, the proposed algorithms for a heterogeneous platforms, chapters \ref{ch3} and \ref{ch4}, should be applied to a heterogeneous platform composed of CPUs and GPUs, where all works of energy minimization in the literature over a collection of GPUs and CPUs platform have been shown more energy efficiency compared those composed from only CPUs.
62 Finally, it would be interesting to compare the accuracy of the
63 results of the proposed energy models to the values given by instruments
64 that measure the energy consumptions of CPUs during the execution time, as in \cite{ref106}.