array, \texttt{A[i*size+j]} gives access to the element in the $i^{th}$
row and the $j^{th}$ column.
In the sequential version, the matrix multiplication is performed using three nested loops. Supposing that $A$ and $B$ are two square matrices, the result of the multiplication $A \times B$ is the matrix $C$ whose elements are $C_{i,j} = \sum_{k} A_{i,k} \times B_{k,j}$.

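The three-loop sequential multiplication can be sketched as follows. This is only a minimal illustration, assuming matrices stored row by row in one-dimensional arrays as described above; the function name \texttt{matmul\_seq}, the output array \texttt{C}, and the \texttt{double} element type are assumptions made for the example, not taken from the measured code.

```c
#include <stddef.h>

/* Sequential product C = A * B of two size x size matrices.
   All three matrices are stored row by row in one-dimensional arrays,
   so element (i, j) is found at index i * size + j.
   The name matmul_seq and the double element type are illustrative. */
void matmul_seq(const double *A, const double *B, double *C, size_t size)
{
    for (size_t i = 0; i < size; i++) {          /* rows of A and C    */
        for (size_t j = 0; j < size; j++) {      /* columns of B and C */
            double sum = 0.0;
            for (size_t k = 0; k < size; k++) {  /* dot product        */
                sum += A[i * size + k] * B[k * size + j];
            }
            C[i * size + j] = sum;
        }
    }
}
```

This sketch is a CPU reference only: each of the $size^2$ output elements needs $size$ multiply-add operations, which is why the cost grows quickly with the matrix size. It is not the code whose timings are reported in this section.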
On a C2070M Tesla card, this code takes 37.68ms to perform the multiplication. On an
Intel Xeon E31245 at 3.30GHz, it takes 2465ms without any parallelization (using
only one core). Consequently, the speedup between the CPU and GPU versions is