CUDA built-in variables
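    dim3  gridDim;    // number of blocks along each dimension of the grid
    uint3 blockIdx;   // index of the block within the grid
    dim3  blockDim;   // number of threads along each dimension of a block
    uint3 threadIdx;  // index of the thread within its block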
Calling a kernel function
    dim3 gridDims, blockDims;  // seen on the device as gridDim and blockDim
    kernelFunc<<<gridDims, blockDims>>>(args);
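As a minimal sketch of the launch syntax, the program below launches a kernel with a 3D grid and a 3D block and copies back the `gridDim` and `blockDim` values the device observes. The names `recordDims` and `queryDims` and the chosen dimensions are hypothetical and exist only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the very first thread of the grid records the launch
// configuration it sees through the built-in variables.
__global__ void recordDims(unsigned int *out)
{
    bool firstBlock  = blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0;
    bool firstThread = threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0;
    if (firstBlock && firstThread) {
        out[0] = gridDim.x;  out[1] = gridDim.y;  out[2] = gridDim.z;
        out[3] = blockDim.x; out[4] = blockDim.y; out[5] = blockDim.z;
    }
}

// Hypothetical host helper: launches the kernel and returns the six values.
bool queryDims(dim3 gridDims, dim3 blockDims, unsigned int out[6])
{
    unsigned int *dOut = nullptr;
    if (cudaMalloc(&dOut, 6 * sizeof(unsigned int)) != cudaSuccess) return false;
    cudaMemset(dOut, 0, 6 * sizeof(unsigned int));

    recordDims<<<gridDims, blockDims>>>(dOut);

    // A blocking copy on the default stream also waits for the kernel.
    cudaError_t err = cudaMemcpy(out, dOut, 6 * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err == cudaSuccess;
}

int main()
{
    dim3 gridDims(4, 2, 1);   // 4 x 2 x 1 blocks
    dim3 blockDims(8, 4, 2);  // 8 x 4 x 2 = 64 threads per block

    unsigned int dims[6];
    if (!queryDims(gridDims, blockDims, dims)) {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("gridDim  = (%u, %u, %u)\n", dims[0], dims[1], dims[2]);
    printf("blockDim = (%u, %u, %u)\n", dims[3], dims[4], dims[5]);
    return 0;
}
```

Unspecified components of a `dim3` default to 1, so `dim3 g(4, 2)` describes the same grid as `dim3 g(4, 2, 1)`.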
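From the hardware's point of view, threads within a block are contiguous in the order threadIdx.x, threadIdx.y, threadIdx.z, from the lowest dimension to the highest. Different blocks may execute on different SMs, so no adjacency between blocks can be assumed. The absolute index of a thread can be computed in either of the following two ways: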
    // Method 1: number the blocks first, then the threads inside each block.
    unsigned int numThreadsPerBlockLine  = blockDim.x;
    unsigned int numThreadsPerBlockPlane = numThreadsPerBlockLine * blockDim.y;
    unsigned int numThreadsPerBlock      = numThreadsPerBlockPlane * blockDim.z;
    unsigned int numThreadsPerGridLine   = numThreadsPerBlock * gridDim.x;
    unsigned int numThreadsPerGridPlane  = numThreadsPerGridLine * gridDim.y;
    unsigned int id = numThreadsPerGridPlane * blockIdx.z
                    + numThreadsPerGridLine * blockIdx.y
                    + numThreadsPerBlock * blockIdx.x
                    + numThreadsPerBlockPlane * threadIdx.z
                    + numThreadsPerBlockLine * threadIdx.y
                    + threadIdx.x;

    // Method 2: compute global (ix, iy, iz) coordinates, then flatten them.
    unsigned int ix = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int iy = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int iz = blockIdx.z * blockDim.z + threadIdx.z;
    unsigned int threadsPerLine   = gridDim.x * blockDim.x;
    unsigned int threadsPerColumn = gridDim.y * blockDim.y;
    unsigned int threadsPerPlane  = threadsPerLine * threadsPerColumn;
    unsigned int id2 = threadsPerPlane * iz + threadsPerLine * iy + ix;
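The two index computations above can be checked with a small sketch: every thread increments one counter per scheme, and the host confirms that each index in [0, total threads) was hit exactly once. The names `blockMajorId`, `coordMajorId`, `countHits`, and `checkIndexSchemes` are hypothetical labels for this illustration, not terms from the text.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Scheme 1: number the blocks first, then the threads inside each block.
__device__ unsigned int blockMajorId()
{
    unsigned int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    unsigned int blockId = (blockIdx.z * gridDim.y + blockIdx.y) * gridDim.x + blockIdx.x;
    unsigned int localId = (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x + threadIdx.x;
    return blockId * threadsPerBlock + localId;
}

// Scheme 2: flatten the global (ix, iy, iz) coordinates.
__device__ unsigned int coordMajorId()
{
    unsigned int ix = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int iy = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int iz = blockIdx.z * blockDim.z + threadIdx.z;
    unsigned int threadsPerLine   = gridDim.x * blockDim.x;
    unsigned int threadsPerColumn = gridDim.y * blockDim.y;
    return (iz * threadsPerColumn + iy) * threadsPerLine + ix;
}

__global__ void countHits(unsigned int *hitsA, unsigned int *hitsB)
{
    atomicAdd(&hitsA[blockMajorId()], 1u);
    atomicAdd(&hitsB[coordMajorId()], 1u);
}

// Returns true if both schemes assign every index in [0, n) to exactly one thread.
bool checkIndexSchemes(dim3 grid, dim3 block)
{
    size_t n = (size_t)grid.x * grid.y * grid.z * block.x * block.y * block.z;
    size_t bytes = n * sizeof(unsigned int);
    unsigned int *dA = nullptr, *dB = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return false; }
    cudaMemset(dA, 0, bytes);
    cudaMemset(dB, 0, bytes);

    countHits<<<grid, block>>>(dA, dB);

    std::vector<unsigned int> hA(n), hB(n);
    bool ok = cudaMemcpy(hA.data(), dA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
           && cudaMemcpy(hB.data(), dB, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dA);
    cudaFree(dB);
    for (size_t i = 0; ok && i < n; ++i)
        ok = (hA[i] == 1u && hB[i] == 1u);
    return ok;
}

int main()
{
    bool ok = checkIndexSchemes(dim3(3, 2, 2), dim3(4, 3, 2));
    printf("%s\n", ok ? "both schemes are one-to-one" : "check failed");
    return ok ? 0 : 1;
}
```

Both schemes cover the same range, but they order threads differently: in the first, the threads of one block occupy a contiguous run of indices; in the second, a block's threads are spread across rows of the global coordinate space.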
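- Converting 2D thread and block indices into a 2D memory index is usually done to process images, so the pixels handled by the threads of one block should be spatially local. The usual solutions are to cache the block's pixels in shared memory or to use texture memory. With either approach, the 2D memory index must not be computed with the absolute-thread-index method above, because that method yields indices that are contiguous in memory, and a contiguous run of memory conflicts with 2D spatial locality.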
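A minimal sketch of the shared-memory approach, under stated assumptions: the image is a row-major `int` array, each block is `TILE x TILE` threads, and the per-pixel operation (adding the right-hand neighbor inside the same tile, wrapping at the tile edge) is an arbitrary stand-in chosen only so that threads read values loaded by other threads. `TILE`, `tileNeighborSum`, and `runTileDemo` are hypothetical names.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // assumed tile edge; the block is launched as TILE x TILE threads

__global__ void tileNeighborSum(const int *in, int *out, int width, int height)
{
    __shared__ int tile[TILE][TILE];

    // 2D pixel coordinates from the 2D block and thread indices.
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    bool inside = ix < width && iy < height;

    // Row-major pixel address; threads outside the image stage a zero.
    tile[threadIdx.y][threadIdx.x] = inside ? in[iy * width + ix] : 0;
    __syncthreads();  // the whole tile must be loaded before neighbors are read

    if (inside) {
        int nx = (threadIdx.x + 1) % TILE;  // right-hand neighbor within the tile
        out[iy * width + ix] = tile[threadIdx.y][threadIdx.x] + tile[threadIdx.y][nx];
    }
}

// Runs the kernel and compares the result against a host reference.
bool runTileDemo(int width, int height)
{
    size_t n = (size_t)width * height, bytes = n * sizeof(int);
    std::vector<int> hIn(n), hOut(n, -1);
    for (size_t i = 0; i < n; ++i) hIn[i] = (int)(i % 251);

    int *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    dim3 blockDims(TILE, TILE);
    dim3 gridDims((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);  // round up
    tileNeighborSum<<<gridDims, blockDims>>>(dIn, dOut, width, height);

    bool ok = cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dIn);
    cudaFree(dOut);

    for (int y = 0; ok && y < height; ++y)
        for (int x = 0; ok && x < width; ++x) {
            int tx = x % TILE;
            int nx = x - tx + (tx + 1) % TILE;  // same neighbor rule as the kernel
            int neighbor = nx < width ? hIn[(size_t)y * width + nx] : 0;
            ok = hOut[(size_t)y * width + x] == hIn[(size_t)y * width + x] + neighbor;
        }
    return ok;
}

int main()
{
    bool ok = runTileDemo(100, 37);  // deliberately not a multiple of TILE
    printf("%s\n", ok ? "tile result matches host reference" : "mismatch");
    return ok ? 0 : 1;
}
```

The grid size is rounded up so that partial tiles at the right and bottom edges are still covered, which is why the kernel needs the `inside` check.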
Thread organization