From 74744b2d2a5a2ce6ebad4b47c112102a7a572766 Mon Sep 17 00:00:00 2001
From: asider <ar.sider@univ-bejaia.dz>
Date: Wed, 20 Jan 2016 15:53:02 +0100
Subject: [PATCH 1/1] touch up conclusion

---
 paper.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/paper.tex b/paper.tex
index 5e1d528..f34155e 100644
--- a/paper.tex
+++ b/paper.tex
@@ -464,9 +464,9 @@ These experiments report the execution times of the EA method for sparse and ful
 
 \section{Conclusion}
 \label{sec6}
-In this paper, we have presented parallel implementations of the Ehrlich-Aberth algorithm to solve full and sparse polynomials, on a single GPU with CUDA and on multiple GPUs using two parallel paradigms: shared memory with OpenMP and distributed memory with MPI. These architectures were addressed by a CUDA-OpenMP approach and CUDA-MPI approach, respectively. Experiments show that, using parallel programming model like (OpenMP or MPI), we can efficiently manage multiple graphics cards to solve the same problem and accelerate the parallel execution with 4 GPUs and solve a polynomial of degree up-to 5,000,000 four times faster than on single GPU. 
+In this paper, we have presented parallel implementations of the Ehrlich-Aberth algorithm for finding the roots of full and sparse polynomials, on a single GPU with CUDA and on multiple GPUs using two parallel paradigms: shared memory with OpenMP and distributed memory with MPI. These architectures were addressed by a CUDA-OpenMP approach and a CUDA-MPI approach, respectively. Experiments show that, using a parallel programming model such as OpenMP or MPI, we can efficiently manage multiple graphics cards working on the same problem: with 4 GPUs, a polynomial of degree up to 5,000,000 is solved four times faster than on a single GPU.
 
-Our next objective is to extend the model presented here with clusters of GPU nodes, with a three-level scheme: inter-node communications via MPI processes (distributed memory), management of multi-GPU nodes by OpenMP threads (shared memory).
+Our next objective is to extend the model presented here to clusters of GPU nodes, with a three-level scheme: inter-node communications via MPI processes (distributed memory), management of multi-GPU nodes by OpenMP threads (shared memory). Current platforms are also likely to contain purely multi-core nodes without any GPU. Such a heterogeneous setting may call for load-balancing algorithms to make the best use of hardware resources.
 
 
 \section*{Acknowledgment}
-- 
2.39.5