-Finding polynomial roots rapidly and accurately is the main objective of our work. In this paper we propose the parallelization of Ehrlich-Aberth method using two parallel programming paradigms OpenMP and MPI on multi-GPU platforms. {\color{red}{We consider two architectures: shared-memory computers with OpenMP API and distributed-memory computers with MPI API. The first approach is based on threads from the same system process, with each thread attached to one GPU and after the various memory allocations, each thread launches its part of computations. To do this we must first load on the GPU required data and after the computations are carried, repatriate the result on the host. The second approach i.e distributed memory with MPI relies on the MPI library which is often used for parallel programming~\cite{Peter96} in
-cluster systems because it is a message-passing programming language. Each GPU is attached to one MPI process, and a loop is in charge of the distribution of tasks between the MPI processes. This solution can be used on one GPU, or executed on a distributed cluster of GPUs, employing the Message Passing Interface (MPI) to communicate between separate CUDA cards. This solution permits scaling of the problem size to larger classes than would be possible on a single device and demonstrates the performance which users might expect from future HPC architectures where accelerators are deployed.}}
-\LZK{Too detailed and poorly explained. \\ We consider two architectures: shared-memory and distributed-memory computers. The first parallel algorithm is implemented on shared-memory computers by using OpenMP API. It is based on threads created from the same system process, such that each thread is attached to one GPU. In this case the communications between GPUs are done by OpenMP threads through shared memory. The second parallel algorithm uses the MPI API, such that each GPU is attached and managed by a MPI process. The GPUs exchange their data by message-passing communications. This latter approach is more used on distributed-memory clusters to solve very complex problems that are too large for traditional supercomputers, which are very expensive to build and run.}
-
-{\color{red}{This paper is organized as follows. In Section~\ref{sec2} we recall the Ehrlich-Aberth method. In section 3 we present EA algorithm on single GPU. In section 4 we propose the EA algorithm implementation on Multi-GPU for (OpenMP-CUDA) approach and (MPI-CUDA) approach. In section 5 we present our experiments and discus it. Finally, Section~\ref{sec6} concludes this paper and gives some hints for future research directions in this topic.}}\LZK{This whole organization needs to be revised}
-
+%Finding polynomial roots rapidly and accurately is the main objective of our work. In this paper we propose the parallelization of Ehrlich-Aberth method using two parallel programming paradigms OpenMP and MPI on multi-GPU platforms. We consider two architectures: shared memory and distributed memory computers. The first parallel algorithm is implemented on shared memory computers by using OpenMP API. It is based on threads created from the same system process, such that each thread is attached to one GPU. In this case the communications between GPUs are done by OpenMP threads through shared memory. The second parallel algorithm uses the MPI API, such that each GPU is attached and managed by a MPI process. The GPUs exchange their data by message-passing communications. This latter approach is more used on distributed memory clusters to solve very complex problems that are too large for traditional supercomputers, which are very expensive to build and run.
+%\LZK{This part has been rewritten. \\ Otherwise, what has been done for accuracy in this paper (Finding polynomial roots rapidly and accurately is the main objective of our work.)?}
+%\LZK{The contributions are not defined!}
+
+In this paper we propose the parallelization of the Ehrlich-Aberth method on CUDA multi-GPU platforms using two parallel programming paradigms, OpenMP and MPI. Our CUDA/MPI and CUDA/OpenMP codes are the first implementations of the Ehrlich-Aberth method with multiple GPUs for finding the roots of polynomials. Our main contributions are:
+\LZK{Why the Ehrlich-Aberth method and not others?}
+ \begin{itemize}
+\item A parallel implementation of the Ehrlich-Aberth algorithm on a shared-memory multi-GPU platform using the OpenMP API. It is based on threads created from the same system process, such that each thread is attached to one GPU. In this case, the communications between GPUs are done by the OpenMP threads through shared memory.
+\item A parallel implementation of the Ehrlich-Aberth algorithm on a distributed-memory multi-GPU platform using the MPI API, such that each GPU is attached to and managed by an MPI process. The GPUs exchange their data by message-passing communications. This latter approach is more commonly used on clusters to solve very complex problems that are too large for traditional supercomputers, which are very expensive to build and run.
+ \end{itemize}
+\LZK{No other possible contributions?}
+
+%This paper is organized as follows. In Section~\ref{sec2} we recall the Ehrlich-Aberth method. In Section~\ref{sec3} we present the EA algorithm on a single GPU. In Section~\ref{sec4} we propose the EA algorithm implementation on multiple GPUs for the (OpenMP-CUDA) approach and the (MPI-CUDA) approach. In Section~\ref{sec5} we present our experiments and discuss them. Finally, Section~\ref{sec6} concludes this paper and gives some hints for future research directions in this topic.
+
+The paper is organized as follows. In Section~\ref{sec2} we present three parallel programming models: OpenMP, MPI, and CUDA. In Section~\ref{sec3} we present the implementation of the Ehrlich-Aberth algorithm on a single GPU. In Section~\ref{sec4} we present the parallel implementations of the Ehrlich-Aberth algorithm on multiple GPUs using the OpenMP and MPI approaches.
+\LZK{This whole organization needs to be revised}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%