allocates memory on the GPU. The second parameter represents the size of the
allocated variable; this size is expressed in bytes.
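For instance, allocating an array of 1,024 floats on the GPU could be written as follows (the variable name \texttt{d\_arr} is illustrative, and the error handling is only sketched):

\begin{lstlisting}[language=C]
/* Allocate an array of 1,024 floats on the GPU.
   The second parameter of cudaMalloc is a size in bytes,
   hence the multiplication by sizeof(float). */
float *d_arr = NULL;
cudaError_t err = cudaMalloc((void **)&d_arr, 1024 * sizeof(float));
if (err != cudaSuccess) {
    /* handle the allocation failure */
}
\end{lstlisting}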
In this example, we want to compare the execution time of the addition of two
arrays on the CPU and on the GPU. So for both of these operations, a timer is
created to measure the time. CUDA allows timers to be manipulated quite easily.
The first step is to create the timer, then to start it, and at
the end to stop it. For each of these operations a dedicated function is used.
In order to compute the same sum on a GPU, the first step consists of
transferring the data from the CPU (considered as the host with CUDA) to the GPU
(considered as the device with CUDA). A call to \texttt{cudaMemcpy} copies the
content of an array allocated on the host to the device when the fourth
parameter is set to \texttt{cudaMemcpyHostToDevice}. The first
parameter of the function is the destination array, the second is the
source array, and the third is the size of the data to copy (expressed in
bytes).
The GPU can perform the addition of all the elements of the two arrays in
parallel, provided the number of blocks and threads per block is sufficient.
At the beginning of Listing~\ref{ch2:lst:ex1}, a simple kernel, called
\texttt{addition}, is defined to compute the sum of the two arrays in
parallel. With CUDA, a kernel starts with the keyword \texttt{\_\_global\_\_}, which
indicates that this kernel can be called from the C code. The first instruction
in this kernel computes the variable \texttt{tid}, which represents the global
index of the thread. This index is computed from the block index
(called \texttt{blockIdx}\index{CUDA keywords!blockIdx} in CUDA) and of the
thread index (called \texttt{threadIdx}\index{CUDA keywords!threadIdx} in
CUDA). Blocks of threads can have 1 dimension, 2 dimensions, or 3 dimensions.
According to the dimension of the manipulated data, the dimension of the blocks
of threads must be chosen carefully. In our example, only one dimension is
used. Then using the notation \texttt{.x}, we can access the first dimension
(\texttt{.y} and \texttt{.z}, respectively, allow access to the second and
third dimensions).
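A kernel matching this description could be sketched as follows (the exact signature in Listing~\ref{ch2:lst:ex1} may differ; the array names are illustrative):

\begin{lstlisting}[language=C]
/* Kernel callable from the C code: adds two arrays element by element. */
__global__ void addition(int size, int *d_c, int *d_a, int *d_b) {
    /* tid: global thread index, computed from the block index
       (blockIdx) and the thread index (threadIdx), using only
       the first dimension (.x). */
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size)
        d_c[tid] = d_a[tid] + d_b[tid];
}
\end{lstlisting}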
The Basic Linear Algebra Subprograms (BLAS) define a standard set of routines
for vector, matrix-vector, and matrix-matrix
operations~\cite{ch2:journals/ijhpca/Dongarra02}. Some of those operations seem
to be easy to implement with CUDA. Nevertheless, as soon as a reduction is
needed, implementing an efficient reduction routine with CUDA is far from being
simple. A reduction is an operation which combines all the elements of an array
and extracts a single number computed from all of them. For example, a sum, a
maximum, or a dot product are reduction operations.
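To illustrate why an efficient reduction is nontrivial, a basic block-level sum reduction using shared memory could be sketched as follows (the kernel name is hypothetical; a complete version would still need a second pass, or atomics, to combine the per-block results):

\begin{lstlisting}[language=C]
/* Sum reduction within each block using shared memory.
   Each block writes its partial sum to d_out[blockIdx.x];
   a second kernel launch (or a host loop) must then combine
   the partial sums. */
__global__ void block_sum(int n, const float *d_in, float *d_out) {
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? d_in[i] : 0.0f;
    __syncthreads();

    /* Tree-based reduction: halve the number of active threads
       at each step. */
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        d_out[blockIdx.x] = sdata[0];
}
\end{lstlisting}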