The kernels terminate it computations when all the roots converge. Finally, the solution of the root finding problem is copied back from GPU global memory to CPU memory. We use the communication functions of CUDA for the memory allocation in the GPU \verb=(cudaMalloc())= and for data transfers from the CPU memory to the GPU memory \verb=(cudaMemcpyHostToDevice)=
or from GPU memory to CPU memory \verb=(cudaMemcpyDeviceToHost))=.
%%HIER END MY REVISIONS (SIDER)
The kernels terminate it computations when all the roots converge. Finally, the solution of the root finding problem is copied back from GPU global memory to CPU memory. We use the communication functions of CUDA for the memory allocation in the GPU \verb=(cudaMalloc())= and for data transfers from the CPU memory to the GPU memory \verb=(cudaMemcpyHostToDevice)=
or from GPU memory to CPU memory \verb=(cudaMemcpyDeviceToHost))=.
%%HIER END MY REVISIONS (SIDER)