From dbf513e0795ba066aabeaff58cbe12983f1413ff Mon Sep 17 00:00:00 2001
From: Kahina <kahina@kahina-VPCEH3K1E.(none)>
Date: Wed, 30 Dec 2015 07:19:45 +0100
Subject: [PATCH] Update

---
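Illustration for the OpenMP paragraph (first hunk) and the CUDA-OpenMP approach named in the conclusion. The paper's CUDA kernels cannot be reproduced here, so what follows is a CPU-only C sketch of the loop-level parallelism OpenMP provides, under stated assumptions. It performs simultaneous (Jacobi-style) Ehrlich-Aberth sweeps, z_i <- z_i - N_i / (1 - N_i * sum_{j != i} 1/(z_i - z_j)) with N_i = p(z_i)/p'(z_i), and lets `#pragma omp parallel for` split the loop over roots among threads. Several details are assumptions made for illustration and are not the authors' implementation: the function names (`horner`, `norm2`, `aberth_sweep`, `aberth_solve`), the ascending coefficient order, the Jacobi-style update, and the caller-supplied initial guesses. Without `-fopenmp` the pragma is ignored and the code runs serially.

```c
/* Hypothetical CPU sketch; names and conventions are illustrative assumptions,
   not the paper's CUDA-OpenMP code. */
#include <assert.h>
#include <complex.h>
#include <stdlib.h>
#include <string.h>

/* Evaluate p(z) and p'(z) with Horner's rule. Coefficients are stored in
   ascending order: p(z) = a[0] + a[1] z + ... + a[n] z^n. */
static void horner(const double complex *a, int n, double complex z,
                   double complex *p, double complex *dp) {
    double complex v = a[n], d = 0.0;
    for (int k = n - 1; k >= 0; --k) {
        d = d * z + v;        /* derivative uses the previous value of v */
        v = v * z + a[k];
    }
    *p = v;
    *dp = d;
}

/* Squared modulus |w|^2, which avoids needing cabs() from libm. */
static double norm2(double complex w) {
    return creal(w) * creal(w) + cimag(w) * cimag(w);
}

/* One simultaneous Ehrlich-Aberth sweep. Root i reads only the previous
   iterate `old` and writes only z[i]. The iterations of the i-loop are
   therefore independent, and OpenMP may distribute them among threads.
   Returns the largest squared correction |w_i|^2 of the sweep. */
static double aberth_sweep(const double complex *a, int n,
                           const double complex *old, double complex *z) {
    double max_step2 = 0.0;
    #pragma omp parallel for reduction(max : max_step2)
    for (int i = 0; i < n; ++i) {
        double complex p, dp;
        horner(a, n, old[i], &p, &dp);
        double complex newton = p / dp;                   /* N_i = p/p' */
        double complex sum = 0.0;
        for (int j = 0; j < n; ++j)
            if (j != i)
                sum += 1.0 / (old[i] - old[j]);
        double complex w = newton / (1.0 - newton * sum); /* Aberth step */
        z[i] = old[i] - w;
        if (norm2(w) > max_step2)
            max_step2 = norm2(w);
    }
    return max_step2;
}

/* Refine the n distinct initial guesses in z (for example, points on a
   circle enclosing the roots) until every correction is below tol.
   Returns 1 on convergence and 0 if max_iter sweeps were not enough. */
static int aberth_solve(const double complex *a, int n, double complex *z,
                        int max_iter, double tol) {
    assert(n >= 1 && a[n] != 0.0);
    double complex *old = malloc((size_t)n * sizeof *old);
    assert(old != NULL);
    int converged = 0;
    for (int it = 0; it < max_iter && !converged; ++it) {
        memcpy(old, z, (size_t)n * sizeof *old);  /* freeze previous iterate */
        converged = aberth_sweep(a, n, old, z) < tol * tol;
    }
    free(old);
    return converged;
}
```

Each sweep reads only `old` and writes only `z[i]`, so no two threads write the same entry. In a CUDA-OpenMP arrangement, each OpenMP thread would instead drive one GPU and launch a kernel on its own slice of roots. This CPU sketch does not model that.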
 paper.tex | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

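Illustration for the MPI paragraph and the CUDA-MPI approach. Each process (one per GPU) updates only part of the roots, yet every process needs all current roots for the next sweep. The sketch below computes the per-process counts and offsets of such a split, assuming a contiguous block distribution. That distribution is an assumption: the paper's exact decomposition is not shown in this patch. `block_partition` is a hypothetical helper name.

```c
/* Hypothetical helper; the block distribution is an illustrative assumption. */
#include <assert.h>

/* Split n roots into `parts` contiguous blocks whose sizes differ by at
   most one; the first n % parts blocks get one extra root. Block r covers
   indices [displs[r], displs[r] + counts[r]). This is the layout
   MPI_Allgatherv expects for its recvcounts and displs arguments. */
static void block_partition(int n, int parts, int *counts, int *displs) {
    assert(n >= 0 && parts > 0);
    int base = n / parts, extra = n % parts, offset = 0;
    for (int r = 0; r < parts; ++r) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }
}
```

With this layout, each rank could call `MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, z, counts, displs, MPI_C_DOUBLE_COMPLEX, MPI_COMM_WORLD)` after each sweep. All ranks would then hold the full, updated vector of roots. The counts are measured in roots because the datatype is a single complex number.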
diff --git a/paper.tex b/paper.tex
index fbdcac7..688f393 100644
--- a/paper.tex
+++ b/paper.tex
@@ -463,7 +463,7 @@ Open Multi-Processing (OpenMP) is a shared memory architecture API that provides
 a portable approach for parallel programming on shared memory systems based on compiler directives, that can be included in order
 to parallelize a loop. In this way, a set of loops can be distributed along the different threads that will access to different data allocated in local shared memory. One of the advantages of OpenMP is its global view of application memory address space that allows relatively fast development of parallel applications with easier maintenance. However, it is often difficult to get high rates of performance in large scale applications. Although usage of OpenMP  threads and managed data explicitly done with MPI can be considered, this approcache undermines the advantages of OpenMP.
 
-%\subsection{OpenMP} %L'article en Français Programmation multiGPU – OpenMP versus MPI
+%\subsection{OpenMP} 
 %OpenMP is a shared memory programming API based on threads from
 %the same system process. Designed for multiprocessor shared memory UMA or
 %NUMA [10], it relies on the execution model SPMD ( Single Program, Multiple Data Stream )
@@ -474,9 +474,9 @@ to parallelize a loop. In this way, a set of loops can be distributed along the
 %have private memory areas [6].
 
 \subsection{MPI} 
-The MPI (Message Passing Interface) library allows to create computer programs that run on a distributed memory architecture. The various processes have their own environment of execution and execute their code in a asynchronous way, according to the MIMD model  (Multiple Instruction streams, Multiple Data streams); they communicate and synchronise by exchanging messages~\cite{Peter96}. MPI messages are explicitly sent, while the exchanges are implicit within the framework of a multi-thread programming environment like OpenMP or Pthreads.
+The MPI (Message Passing Interface) library makes it possible to write programs that run on distributed memory architectures. Each process has its own execution environment and executes its code asynchronously, following the MIMD (Multiple Instruction streams, Multiple Data streams) model; processes communicate and synchronize by exchanging messages~\cite{Peter96}. In MPI, messages are sent explicitly, whereas in a multi-threaded programming environment such as OpenMP or Pthreads, data exchanges are implicit through the shared memory.
  
-\subsection{CUDA}%L'article en anglais Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications
+\subsection{CUDA}
 CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA~\cite{CUDA10}. The
 unit of execution in CUDA is called a thread. Each thread executes a kernel by the streaming processors in parallel. In CUDA,
 a group of threads that are executed together is called a thread block, and the computational grid consists of a grid of thread
@@ -964,12 +964,13 @@ This is due to the use of parallelization MPI paradigm that divides the polynomi
 
 
 \section{Conclusion}
-In this paper, we have presented a parallel implementation of Ehrlich-Aberth algorithm for solving full and sparse polynomials, on single GPU with CUDA and Multi-GPUs using two parallel paradigm, shared memory with OpenMP, distributed memory with MPI.(CUDA-OpenMP) approach and (CUDA-MPI) approach, 
-We have performed many experiments with the Ehrlich-Aberth method in single GPU, Multi-GPU with (CUDA-OpenMP) approach, Multi-GPU with (CUDA-MPI) approach for sparse and full polynomials. the experiments show that, using parallel programming model like (OpenMP, MPI) can effectively manage multiple graphics cards to work together to solve the same problem and accelerate parallel applications, like (CUDA MPI) approach with 4 GPUs can solve a polynomial of 1,000,000 4 speed up than on single GPU.
+In this paper, we have presented a parallel implementation of the Ehrlich-Aberth algorithm for finding the roots of full and sparse polynomials, on a single GPU with CUDA and on multiple GPUs with two parallel paradigms: shared memory with OpenMP (the CUDA-OpenMP approach) and distributed memory with MPI (the CUDA-MPI approach).
+We have performed many experiments with the Ehrlich-Aberth method on sparse and full polynomials, on a single GPU and on multiple GPUs with both the CUDA-OpenMP and the CUDA-MPI approaches. The experiments show that parallel programming models such as OpenMP and MPI can efficiently manage multiple graphics cards working together on the same problem and accelerate parallel applications; for example, with 4 GPUs the CUDA-MPI approach solves a polynomial of degree 1,000,000 with a speedup of 4 over a single GPU.
 
 
-In future, we will evaluate our parallel implementation of Ehrlich-Aberth algorithm on other parallel programming model 
+%In the future, we will evaluate our parallel implementation of the Ehrlich-Aberth algorithm with other parallel programming models.
 
+Our next objective is to extend the model presented here to clusters of multi-GPU nodes, with a three-level scheme: inter-node communication via MPI processes (distributed memory), management of the GPUs of each node by OpenMP threads (shared memory), and computation on each GPU with CUDA.
 
 %present a communication approach between multiple GPUs. The comparison between MPI and OpenMP as GPUs controllers shows that these
 %solutions can effectively manage multiple graphics cards to work together
-- 
2.39.5