correction

[ThesisAhmed.git] / CONCLUSION.tex
diff --git a/CONCLUSION.tex b/CONCLUSION.tex

index 06cdaace0f67d2b3714068ee98c4209fd147d806..b5faa8061ac6b897d33eba6c81226303f2022627 100644 (file)
--- a/CONCLUSION.tex
+++ b/CONCLUSION.tex
@@ -4,68 +4,62 @@
  
  \section{Conclusion}
  In this dissertation, we have proposed a method to optimize both the energy consumption and 
-the performance of synchronous and asynchronous iterative applications running over
+the performance at the same time of synchronous and asynchronous  applications with iterations  running over
  cluster and grid platforms. Dynamic voltage and frequency scaling (DVFS) technique was used to 
  lower the frequency of the processor to reduce its energy consumption while computing. Reducing 
-the frequency of the processor reduces the computing power of a processor and thus the performance  of the application running over that processor is decreased too.  However, as the first step in this work,  different energy consumption and performance models were developed to effectively used in the prediction and measurement processes for energy and performance of iterative applications. Depending on these models, an objective function was defined the best trade-off relation between the energy and the performance. Then, this objective function was used in algorithms which optimize both energy consumption and the performance of the iterative application at the same time.
+the frequency of the processor decreases  its computing power which might increase the execution time.  
+In this work, different energy consumption and performance models were developed to  predict the energy consumption and performance of parallel applications with iterations. Depending on these models, an objective function was defined as the best trade-off relation between the energy consumption and the performance of the parallel application. This objective function was used in the frequency selecting algorithms which optimize at the same time both the energy consumption and the performance of the  parallel application with iterations.
  
-The first part of this dissertation, chapter \ref{ch1}, presented different types of parallelism levels which have been categorized using some hardware and software techniques. Different parallel architectures have been
-demonstrated and classified according to how the processing units and the memory model connected to each others. Both shared and distributed platforms as well as their depending parallel programming models have been categorized. The two types of parallel iterative applications: synchronous and asynchronous ones have 
-been discussed and compared to each others. It showed that applying the synchronous iterative methods are well adapted to local homogeneous clusters with a high speed network link, while the asynchronous iterative methods are more suited to the distributed heterogeneous clusters. At the end of this chapter, an energy consumption model proposed in the literature to measure the energy consumption of parallel applications was explained. This model does not consider the communication time of the parallel application being executed. Also, it is not well adapted to a heterogeneous architecture when there are different power consumption values for each type of processor. 
+The first part of this dissertation, chapter \ref{ch1}, presented different types of parallelism levels which have been classified according to the used hardware and software techniques. Different parallel architectures have also been
+described and classified according to the used  memory model: shared and distributed memory.  The two types of parallel  applications with iterations: synchronous and asynchronous ones have been discussed and compared to each others. Synchronous distributed applications are well adapted to local homogeneous clusters with a high speed network link, while the asynchronous ones are more suited to grids. At the end of this chapter, an energy consumption model proposed in the literature to estimate the energy consumption of parallel applications was explained. This model does not take into account the communication time of the parallel application being executed. Also, it is not well adapted to a heterogeneous architecture where each type of processor might have a different power consumption value. 
  
-In the second part of the dissertation, we have developed a new energy and performance models for 
-synchronous and asynchronous message passing applications running over a cluster and  a grid. To simultaneously optimize the energy and performance of these applications, the trade-off relation between the two metrics have been defined as a maximum distance between the predicted energy and performance curves to select the best possible frequency scaling factors. We have proposed four different frequencies selecting  algorithms to select the best frequencies that gives minimum energy reduction with maximum possible performance at the same time. They used the computation and communication times measured at the first iteration of the iterative application to predict the energy consumption and the performance of the parallel application at every available frequency. All these algorithms work online with a very small overhead on the execution time of a parallel application. 
+In the second part of the dissertation,  a new energy and performance models for 
+synchronous and asynchronous message passing applications with iterations running over clusters and grid,  were presented. To simultaneously optimize the energy and performance of these applications, the trade-off relation has been defined as the maximum distance between the predicted energy and performance curves. This objective function is used by the frequency selecting algorithm to select the available frequency scaling factors that give the optimal energy consumption to performance trade-off. We have proposed four different frequency scaling algorithms, each one of them is adapted to a different execution  context, such as synchronous or asynchronous communications, homogeneous or heterogeneous nodes, and local or distributed architectures. They used the computation and communication times measured at the first iteration of the parallel application with iterations to predict the energy consumption and the performance of the parallel application at every available frequency. All these algorithms work online and introduce a very small runtime overhead. They also do not require any profiling or  training. 
  
-In chapter \ref{ch2},  we were proposed a new online scaling factor selection method that optimized simultaneously the energy and performance of a distributed synchronous application running on a homogeneous cluster. We have applied  this algorithm to the NAS benchmarks of the class C and evaluated by the SimGrid simulator. Firstly, Rauber and Rünger’s energy model used by the proposed algorithm to select best frequency gear. The proposed algorithm was compared to the Rauber and Rünger optimization method, which gives better energy and performance trade-off ratios compare to them. Secondly, a new  energy consumption  model was developed to take into consideration both the computation and communication times and their relation with the frequency scaling factor. The new enenrgy model was used by the proposed algorithm to select different frequencies. Thus,  a new simulation results have been shown, which are more accurate and realistic than other results obtained using the Rauber and Rünger's energy model.  
-In chapter \ref{ch3}, we were proposed  two new online frequency scaling factors selecting algorithms to select the best possible vectors of frequency scaling factors that give best trade-off between the predicted energy and the predicted performance values  of synchronous iterative application running over a heterogeneous cluster and a grid.  Each algorithm depends on a new energy and performance models, which takes into account the underline parallel platform being used. Firstly, the proposed
-scaling factors selection algorithm for a heterogeneous local cluster was implemented  to the  NAS parallel benchmarks of the class C and simulated by SimGrid. The results of the experiments showed that the algorithm on average reduces by 29.8\% the energy consumption of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance by 3.8\%. 
-Different frequency scaling factors were selected by the algorithm according to the ratio between the computation and communication times when different number of nodes were used, and  when different values have been used for static and dynamic powers of the CPU. Secondly, the proposed scaling factors selection algorithm for a grid was implemented to the NAS parallel benchmarks of the class D and executed over Grid5000 testbed platform. The experiments on 16 nodes, distributed over three clusters, showed that the algorithm on average reduces by 30\% the energy consumption for all the NAS benchmarks while on average only degrading by 3.2\% the performance. 
-The algorithm was also evaluated in different scenarios that vary in the distribution of the computing nodes between different clusters’ sites or use multi-cores per node architecture or consume  different static power values. The algorithm selects different vectors of frequencies according to the computations and communication times ratios, and the values of the static and measured dynamic powers of the CPUs.
-Both of the proposed  algorithms were compared to another method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithms outperform the latter by selecting a vectors of frequencies that give a better trade-off between energy consumption reduction and performance.
-In chapter \ref{ch4}, we were presented a new online frequency selection algorithm for asynchronous iterative applications running over a grid.  The algorithm uses new energy and performance models to predict the energy consumption and the execution time of asynchronous or hybrid message passing
+In chapter \ref{ch2}, a new online scaling factor selection method that optimizes simultaneously the energy and performance of a distributed synchronous application with iterations running on a homogeneous cluster has been proposed. This algorithm was applied to the NAS benchmarks of the class C and executed over the SimGrid simulator. Firstly, Rauber and Rünger’s energy model was used in the proposed algorithm to select the best frequency gear. The proposed algorithm was compared to the Rauber and Rünger's  optimization method. The results of the comparison showed that the proposed algorithm   gives better energy to  performance trade-off ratios compared to their methods while using the same energy model. Secondly, a new  energy consumption  model was developed to take into consideration both the computation and communication times and their relation with the frequency scaling factor. The new energy model was used by the proposed algorithm. The new simulation results demonstrated that the new model is  more  accurate and realistic than the previous one.  
+
+In chapter \ref{ch3},  two new online frequency scaling factors selecting algorithms adapted for synchronous application with iterations  running over a heterogeneous cluster and a grid were presented.  Each algorithm uses new energy and performance models which take into account the characteristics of the parallel platform being used. Firstly, the proposed
+scaling factors selection algorithm for a heterogeneous local cluster was applied to the  NAS parallel benchmarks and evaluated over SimGrid. The results of the experiments showed that the algorithm on average reduces by 29.8\% the energy consumption of the class C of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance to 3.8\%. 
+Different frequency scaling factors were selected by the algorithm according to the ratio between the computation and communication times when different number of nodes were used, and  when different static and dynamic CPU powers have been used. Secondly, the proposed scaling factors selection algorithm for a grid was applied to the NAS parallel benchmarks and the class D of these benchmarks was executed over the Grid5000 testbed platform. The experiments conducted over 16 nodes distributed over three clusters, showed that the algorithm on average reduces by 30\% the energy consumption for all the NAS benchmarks while on average only degrading by 3.2\% their performance. 
+The algorithm was also evaluated in different scenarios that vary in the distribution of the computing nodes between different clusters’ sites or use multi-cores per node architectures or consume  different static power values. The algorithm selects different vectors of frequencies according to the computations and communication times ratios, and the values of the static and measured dynamic powers of the CPUs.
+Both of the proposed  algorithms were compared to another method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithms outperform the latter by selecting  vectors of frequencies that give a better trade-off between energy consumption reduction and performance.
+
+In chapter \ref{ch4}, a new online frequency selection algorithm were adapted for asynchronous iterative applications running over a grid was presented.  The algorithm uses new energy and performance models to predict the energy consumption and the execution time of asynchronous or hybrid message passing
  iterative applications running over a  grid. The proposed algorithm was evaluated twice
  over the SimGrid simulator and Grid’5000 testbed while running a multi-splitting (MS)
  application that solves 3D problems. The experiments were executed over different grid
  scenarios composed of different numbers of clusters and different numbers of nodes
-per cluster. The proposed algorithm was applied synchronously and asynchronously on a
-synchronous and an asynchronous version of the MS application. Both the simulation
-and real experiment results show that applying synchronous frequency selecting algorithm on an 
+per cluster. The proposed algorithm was applied synchronously and asynchronously on 
+synchronous and  asynchronous versions of the MS iterative application. Both the simulations
+and real experiments results showed that applying synchronously the frequency selecting algorithm on an 
  asynchronous MS application gives the best tradeoff between energy consumption reduction
-and performance compared to other scenarios. In the simulation results, this scenario
-saves on average the energy consumption by 22\% and reduces the execution time of
+and performance when compared to the other scenarios. In the simulation results, this scenario
+reduces on average the energy consumption by 22\% and decreases the execution time of
  the application by 5.72\%. This version optimizes both of the dynamic energy 
-consumption by applying synchronously the HSA algorithm at the end of the first iteration and the
-static energy consumption by using asynchronous communications between nodes from
+consumption by applying synchronously the HSA algorithm at the end of the first iteration of the iterative application  and the static energy consumption by using asynchronous communications between nodes from
  different clusters which are overlapped by computations. The proposed algorithm was also
-evaluated over three power scenarios, which  selects different vectors of frequencies proportionally to dynamic and static powers values. More energy reduction has been achieved when the  ratio of the 
-dynamic power have been increase and vice versa. whereas, the performance degradation percentages were decreased when the static power ratio has been increased.
-In the Grid’5000 results, this scenario saves the energy consumption by 26.93\% and 
-reduces the execution time of the application by 21.48\%. The experiments executed over Grid'5000 give better results than those simulated with SimGrid because the nodes used in Grid'5000 were more heterogeneous than the ones simulated by SimGrid.
-In both of the Simulation and Grid'5000 testbed experiments, we have compared the proposed algorithm to a method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithm outperforms the latter by selecting a vector of frequencies that gives
+evaluated over three power scenarios which  selects different vectors of frequencies proportionally to the  dynamic and static powers values. More energy reduction was achieved when the  ratio of the 
+dynamic power was increased and vice versa. Whereas, the performance degradation percentages were decreased when the static power ratio was increased.
+In the Grid’5000 experiments, this scenario reduces the energy consumption by 26.93\% and 
+decreases the execution time of the application by 21.48\%. The experiments executed over Grid'5000 give better results than those simulated with SimGrid because the nodes used in Grid'5000 were more heterogeneous than the ones simulated by SimGrid.
+In both of the Simulations and real experiments, the proposed algorithm was compared to a method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithm outperforms the latter by selecting a vector of frequencies that gives
  a better trade-off between the energy consumption reduction and the performance.
-Finally, we outline some perspectives that will be applied to this work in the future as in the next section.
  
  
  \section{Perspectives}
  In the near future, we will adapt the proposed algorithms to take into consideration the variability between some iterations.  For example, each proposed algorithm can be executed twice: after the first iteration the frequencies are scaled down according to the execution times measured in the first iteration, then after a fixed number of iterations, the frequencies are adjusted according to the execution times measured during the fixed number of iterations. If the  computing power of the system is constantly changing, it would be interesting to implement a mechanism that detects this change and adjusts the frequencies according to the variability of the system. 
+Also, it would be interesting to evaluate the scalability of the proposed algorithms by running them on large platforms composed of many thousands of cores. The scalability of the algorithms can be improved by distributing them  in a hierarchical manner where a leader is chosen for each cluster or a group of nodes  to compute their scaled frequencies and by using asynchronous messages to exchange the the data measured at the first iteration. 
  
-Also, it would be interesting to evaluate the scalability of the proposed algorithm by running it on large platforms composed of many thousands of cores. The scalability of the algorithm can be improved by distributing it  in a hierarchical manner where a leader is chosen for each cluster or a group of nodes  to compute their scaled frequencies and by using asynchronous messages to exchange the the data measured at the first iteration. 
-
-The proposed algorithms  would evaluate to other message passing iterative methods in order to see how it adapts to the characteristics of the new method. 
+The proposed algorithms  should be applied to other message passing  methods  with iterations in order to see how they adapt to the characteristics of these methods. 
  Also, it would be interesting to explore if a relation can be found between the numbers of asynchronous iterations required to global convergence and the applied frequencies to the
  nodes. The number of iterations required by each node for global convergence
  is not known in advance and the change in CPUs frequencies changes the
  number of iterations required by each node for global convergence.
  
-Furthermore, the proposed algorithms for a heterogeneous platforms,  chapters \ref{ch3} and \ref{ch4}, should be applied to a heterogeneous platform composed of  CPUs and GPUs, where all works of energy minimization in the literature over a collection of GPUs and CPUs platform  have been shown more energy efficiency compared those composed from only CPUs.
+Furthermore, the proposed algorithms for  heterogeneous platforms,  in chapters \ref{ch3} and \ref{ch4}, should be applied to heterogeneous platforms composed of  CPUs and GPUs. Indeed, most of the works in the 
+green computing field showed  that these mixed platforms of GPUs and CPUs are more energy efficient than those composed of only CPUS. 
  
-Finally, it would be interesting to compare the accuracy of the
-results of the proposed energy models to the values given by instruments
+Finally, it would be interesting to verify the accuracy  of the
+results returned by  the  energy models by comparing them to the values given by instruments
  that measure the energy consumptions of CPUs during the execution time, as in \cite{ref106}.
  
-
-
-
-
-
-