CUDA Practice (2): Grayscale Histogram

License: CC 4.0 BY-SA

Tags:

#cuda #opencv #image

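This article computes the histogram of a grayscale image in four ways: a CPU version, a naive GPU version that is incorrect, a GPU version that uses atomic operations, and a GPU version optimized with shared memory. In testing, the shared-memory implementation was significantly faster. The article also shows that leaving out atomic operations produces incorrect counts.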

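Writing the code

First, copy over the grayscale conversion program from the previous exercise and use it to generate the grayscale image.

Four implementations were written: cpu, gpu_wrong_naive, gpu_naive, and gpu_usesharemem.

CPU version

The CPU version is very simple: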

// hist must hold 256 bins, all set to zero by the caller.
void getGrayHistincpu(unsigned char * const grayData, 
                    unsigned int * const hist,
                    uint imgheight,
                    uint imgwidth)
{
    for(uint i = 0; i < imgheight; i++)          // rows
    {
        for (uint j = 0; j < imgwidth; j++)      // columns
        {
            hist[grayData[i*imgwidth+j]]++;      // one count per pixel
        }
    }
}
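The CPU function only gives correct counts if the caller zeroes all 256 bins first. Below is a minimal host-only sketch of that calling pattern. The names `cpuHistogram` and `histogramOf` are hypothetical and not part of the original program; the loop is repeated only so the block compiles on its own with nvcc or any C++ compiler.

```cuda
#include <vector>

// Same logic as getGrayHistincpu above, repeated so this sketch is self-contained.
void cpuHistogram(const unsigned char* grayData, unsigned int* hist,
                  unsigned int imgheight, unsigned int imgwidth)
{
    for (unsigned int i = 0; i < imgheight; i++)
        for (unsigned int j = 0; j < imgwidth; j++)
            hist[grayData[i * imgwidth + j]]++;
}

// Hypothetical helper: allocates a zeroed 256-bin histogram and fills it.
std::vector<unsigned int> histogramOf(const std::vector<unsigned char>& img,
                                      unsigned int imgheight, unsigned int imgwidth)
{
    std::vector<unsigned int> hist(256, 0u);  // bins must start at zero
    cpuHistogram(img.data(), hist.data(), imgheight, imgwidth);
    return hist;
}
```

A cheap sanity check for any of the versions is that the sum of all bins equals imgheight * imgwidth.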

GPU version 1: direct port (gpu_wrong_naive)

__global__ void getGrayHistincuda_wrong_naive(unsigned char * const grayData, 
                                unsigned int * const hist,
                                uint imgheight,
                                uint imgwidth)  // data race: the counts change from run to run
{
    const unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
    const unsigned int idy = blockDim.y * blockIdx.y + threadIdx.y;    

    if(idx < imgwidth && idy < imgheight)
    {
        const unsigned long pid = imgwidth * idy + idx;
        const unsigned char value = grayData[pid];
        hist[value]++;  // non-atomic read-modify-write: concurrent updates are lost
    }
}

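This code is wrong. hist[value]++ is a non-atomic read-modify-write on global memory: when several threads update the same bin at the same time, they read the same old value and overwrite each other's results. Increments are lost, so the counts are incorrect and differ from run to run.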
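The lost-update effect can be isolated with a single shared counter. The following is a minimal sketch, not taken from the original program: the kernel `racyIncrement` and the helper `runRacyIncrement` are hypothetical names, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Every thread increments the same counter without synchronization,
// which mirrors hist[value]++ when many pixels share one gray level.
__global__ void racyIncrement(unsigned int* counter)
{
    // Expands to: load *counter, add 1, store the result.
    // Another thread may store in between, and that update is overwritten.
    (*counter)++;
}

// Hypothetical helper: launches blocks * threads racy increments
// and returns the final counter value.
unsigned int runRacyIncrement(unsigned int blocks, unsigned int threads)
{
    unsigned int* d_counter = nullptr;
    unsigned int result = 0;
    cudaMalloc(&d_counter, sizeof(unsigned int));
    cudaMemset(d_counter, 0, sizeof(unsigned int));
    racyIncrement<<<blocks, threads>>>(d_counter);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&result, d_counter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return result;
}
```

Calling runRacyIncrement(1024, 256) issues 262144 increments, yet the returned value is typically much smaller and can change between runs. The exact value depends on the hardware and on scheduling, so no particular number should be expected.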

GPU version 2: atomic operations (gpu_naive)

__global__ void getGrayHistincuda_naive(unsigned char * const grayData, 
                                            unsigned int * const hist,
                                            uint imgheight,
                                            uint imgwidth)  // atomic operations keep the counts correct
{
    const unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
    const unsigned int idy = blockDim.y * blockIdx.y + threadIdx.y;    

    if(idx < imgwidth && idy < imgheight)
    {
        const unsigned long pid = imgwidth * idy + idx;
        const unsigned char value = grayData[pid];
        atomicAdd(&(hist[value]), 1);
    }
}
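The host side of such a kernel might look like the sketch below. It is an illustration under assumptions rather than the original program's launch code: the wrapper `gpuHistogram`, the kernel name `histAtomicKernel`, and the 16x16 block shape are all hypothetical choices, and error checking is omitted for brevity. The kernel body repeats the atomic version so that the block compiles on its own.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same logic as getGrayHistincuda_naive above, repeated so this sketch is self-contained.
__global__ void histAtomicKernel(const unsigned char* grayData, unsigned int* hist,
                                 unsigned int imgheight, unsigned int imgwidth)
{
    const unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
    const unsigned int idy = blockDim.y * blockIdx.y + threadIdx.y;
    if (idx < imgwidth && idy < imgheight)
        atomicAdd(&hist[grayData[imgwidth * idy + idx]], 1u);
}

// Hypothetical host wrapper: copies the image to the device, zeroes the bins,
// launches one thread per pixel, and copies the 256 bins back.
std::vector<unsigned int> gpuHistogram(const std::vector<unsigned char>& img,
                                       unsigned int imgheight, unsigned int imgwidth)
{
    const size_t numPixels = static_cast<size_t>(imgheight) * imgwidth;
    unsigned char* d_gray = nullptr;
    unsigned int* d_hist = nullptr;
    cudaMalloc(&d_gray, numPixels);
    cudaMalloc(&d_hist, 256 * sizeof(unsigned int));
    cudaMemcpy(d_gray, img.data(), numPixels, cudaMemcpyHostToDevice);
    cudaMemset(d_hist, 0, 256 * sizeof(unsigned int));  // bins must start at zero

    const dim3 block(16, 16);  // assumed block shape, not from the original
    const dim3 grid((imgwidth + block.x - 1) / block.x,    // round up so the grid
                    (imgheight + block.y - 1) / block.y);  // covers the whole image
    histAtomicKernel<<<grid, block>>>(d_gray, d_hist, imgheight, imgwidth);

    std::vector<unsigned int> hist(256);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hist.data(), d_hist, 256 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_gray);
    cudaFree(d_hist);
    return hist;
}
```

Because the grid is rounded up, some threads fall outside the image. The bounds check in the kernel makes them exit without touching memory.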