CUDA Sample: cdpSimplePrint

License: CC 4.0 BY-SA

Tags: #cuda #sample

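Concepts

CUDA: Compute Unified Device Architecture

CDP: CUDA Dynamic Parallelism, the feature that lets a kernel launch further kernels from device code

Kernel: a function declared with __global__ and launched as func<<<m, n>>>(...)

Grid: every launch of a kernel func runs on a grid of threads whose shape is given by <<<m, n>>>

Thread block: m is the number of thread blocks in the grid

Thread: n is the number of threads in each thread block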
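As a minimal sketch of these terms, the following program launches a grid of m = 2 blocks with n = 3 threads each, and every thread reports its position. The kernel name `whoami` and the launch sizes are illustrative assumptions, not part of the cdpSimplePrint sample.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (hypothetical name): each thread prints its coordinates.
__global__ void whoami()
{
    printf("block %u of %u, thread %u of %u\n",
           blockIdx.x, gridDim.x, threadIdx.x, blockDim.x);
}

int main()
{
    const int m = 2; // number of thread blocks in the grid
    const int n = 3; // number of threads in each block

    whoami<<<m, n>>>();

    // Launches are asynchronous; synchronizing also flushes the device printf buffer.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
    {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

This prints six lines, one per thread. The order in which lines from different blocks appear is not guaranteed.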

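What the sample does

For every block in the current grid, the thread with ID 0 prints a line; the kernel then launches itself recursively, and max_depth bounds the recursion depth.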

The kernel

__global__ void cdp_kernel(int max_depth, int depth, int thread, int parent_uid)
{
    // We create a unique ID per block. Thread 0 does that and shares the value with the other threads.
    __shared__ int s_uid;

    if (threadIdx.x == 0)
    {
        s_uid = atomicAdd(&g_uids, 1);
    }

    __syncthreads();

    // We print the ID of the block and information about its parent.
    print_info(depth, thread, s_uid, parent_uid);

    // We launch new blocks if we haven't reached the max_depth yet.
    if (++depth >= max_depth)
    {
        return;
    }

    cdp_kernel<<<gridDim.x, blockDim.x>>>(max_depth, depth, threadIdx.x, s_uid);
}
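The kernel relies on two names that this article does not show: the counter `g_uids` and the helper `print_info`. The following is a self-contained sketch under the assumptions that `g_uids` is a `__device__` global integer and that `print_info` prints one line per block from thread 0. The output format here is made up, so the original sample may print something different. The launch parameters in `main` (2 blocks, 2 threads, max_depth = 2) are also chosen purely for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed definition: a device-global counter used to hand out block IDs.
__device__ int g_uids = 0;

// Assumed helper: thread 0 of each block prints one line describing the block.
__device__ void print_info(int depth, int thread, int uid, int parent_uid)
{
    if (threadIdx.x == 0)
    {
        if (depth == 0)
            printf("depth 0: BLOCK %d launched by the host\n", uid);
        else
            printf("depth %d: BLOCK %d launched by thread %d of block %d\n",
                   depth, uid, thread, parent_uid);
    }
}

__global__ void cdp_kernel(int max_depth, int depth, int thread, int parent_uid)
{
    // Thread 0 obtains a unique ID for the block and shares it via shared memory.
    __shared__ int s_uid;

    if (threadIdx.x == 0)
    {
        s_uid = atomicAdd(&g_uids, 1);
    }

    __syncthreads();

    print_info(depth, thread, s_uid, parent_uid);

    // Stop once the next level would reach max_depth.
    if (++depth >= max_depth)
    {
        return;
    }

    // Every thread of every block launches its own child grid.
    cdp_kernel<<<gridDim.x, blockDim.x>>>(max_depth, depth, threadIdx.x, s_uid);
}

int main()
{
    const int max_depth = 2;

    // Host launch at depth 0; -1 is a placeholder for "no parent block".
    cdp_kernel<<<2, 2>>>(max_depth, 0, 0, -1);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize(); // waits for child grids and flushes printf
    if (err != cudaSuccess)
    {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Device-side launches require relocatable device code, so a build along the lines of `nvcc -rdc=true cdp_sketch.cu -o cdp_sketch -lcudadevrt` is typically needed (the file name is a placeholder). With these parameters the sketch prints 10 lines: 2 for the host-launched blocks and 8 for the blocks of the 4 child grids, one child grid per parent thread.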

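atomicAdd(): atomic addition, used here as an atomic increment. It returns the value stored before the addition; see the CUDA C Programming Guide for details.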

__syncthreads(): a barrier that waits until all threads in the same thread block have reached it.
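The two calls can be seen working together in a small sketch that is separate from the sample: one thread per block draws an ID with `atomicAdd`, and the barrier makes that ID visible to the rest of the block before anyone reads it. The names `g_counter` and `assign_ids` and the launch sizes are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int g_counter = 0; // hypothetical counter, analogous to g_uids

// Each block obtains one ID; every thread of the block then writes it out.
__global__ void assign_ids(int *out)
{
    __shared__ int s_id;

    if (threadIdx.x == 0)
    {
        // atomicAdd returns the value held before the addition,
        // so concurrent callers each receive a different number.
        s_id = atomicAdd(&g_counter, 1);
    }

    // Barrier: without it, other threads could read s_id before thread 0 wrote it.
    __syncthreads();

    out[blockIdx.x * blockDim.x + threadIdx.x] = s_id;
}

int main()
{
    const int m = 4; // blocks
    const int n = 8; // threads per block
    int h_out[m * n];
    int *d_out = nullptr;

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess)
        return 1;

    assign_ids<<<m, n>>>(d_out);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // All threads of a block hold the same ID; which block gets which ID may vary.
    for (int b = 0; b < m; ++b)
        printf("block %d -> id %d\n", b, h_out[b * n]);
    return 0;
}
```

The IDs are distinct values from 0 to m - 1, but their assignment to blocks depends on scheduling and can differ between runs.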

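In this sample the printed output changes with the recursion depth and with the kernel's launch parameters m and n; varying them is a good way to deepen your understanding.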
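To get a feel for how quickly the output grows, the number of printed lines can be estimated on the host. This sketch assumes that `print_info` prints exactly one line per block, which the article does not show. It relies on the fact that every thread of every block executes the child launch, so a grid of m blocks with n threads spawns m * n child grids. The function name `total_blocks` is made up for illustration.

```cuda
#include <cstdio>

// Number of blocks across all recursion levels.
// Level d has (m*n)^d grids of m blocks each, for d = 0 .. max_depth - 1.
// The host-launched grid always runs, so at least one level is counted.
long long total_blocks(int max_depth, int m, int n)
{
    int levels = max_depth < 1 ? 1 : max_depth;
    long long grids = 1; // the single grid launched by the host
    long long total = 0;
    for (int d = 0; d < levels; ++d)
    {
        total += grids * m;        // blocks at this level
        grids *= (long long)m * n; // each thread launches one child grid
    }
    return total;
}

int main()
{
    printf("%lld\n", total_blocks(2, 2, 2)); // 2 + 8 = 10
    printf("%lld\n", total_blocks(3, 2, 2)); // 2 + 8 + 32 = 42
    return 0;
}
```

The count grows geometrically with max_depth, so small values of m, n, and max_depth are the practical choice when experimenting; device launch limits also cap how far the recursion can go.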