unit of execution in CUDA is called a thread. Each thread executes the kernel, and the streaming processors run many threads in parallel. In CUDA,
a group of threads that execute together is called a thread block, and the computational grid consists of thread
blocks. Additionally, a thread block can use the shared memory of a single multiprocessor, while the grid executes a single