the EA implementation on (CUDA, OpenMP)

author Kahina <kahina@kahina-VPCEH3K1E.(none)>

Sun, 20 Dec 2015 06:58:21 +0000 (07:58 +0100)

committer Kahina <kahina@kahina-VPCEH3K1E.(none)>

Sun, 20 Dec 2015 06:58:21 +0000 (07:58 +0100)
diff --git a/paper.tex b/paper.tex

index 962c7f9167c745362c36f252c511288c398d8ad5..88d882317d55082178510383bde12d407168e403 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -565,8 +565,6 @@ Algorithm~\ref{alg2-cuda} shows a sketch of the Ehrlich-Aberth method using CUDA
  \section{The EA algorithm on Multi-GPU}
  
  \subsection{MGPU (OpenMP-CUDA) approach}
-Before beginning the calculation, our implementation parallel with OpenMP and CUDA shares the input data between threads OpenMP, these input data sotn Z: the vector solution, P: the polynomial to solve,
-
  Before starting the computation, our parallel implementation shares the input data of the polynomial root-finding problem among the OpenMP threads. From Algorithm 1, the input data are the solution vector $Z$ and the polynomial to solve $P$. The number of OpenMP threads is set equal to the number of GPUs; each OpenMP thread (T-omp) controls one GPU and a part of the shared memory, namely a part of the vector $Z$ holding $n/Nbr_{gpu}$ roots, where $n$ is the degree of the polynomial and $Nbr_{gpu}$ is the number of GPUs. Each GPU then has its own computation grid, sized according to its performance and to the amount of data it handles. A grid is defined by two parameters: DimGrid, the number of blocks per grid, and DimBloc, the number of threads per block. The following scheme shows the (CUDA, OpenMP) architecture.