Cuda By Example - 10 (Texture Memory)

Latest recommended article published on 2024-11-01 10:18:25

我且低吟浅唱

Category: cuda by example reading notes. Tags: cuda

Article link: https://blog.youkuaiyun.com/dairyman000/article/details/143290620


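CUDA by Example was written a long time ago, and texture memory has changed a great deal since then: the texture reference API the book uses was later deprecated and, as of CUDA 12, removed in favor of texture objects. The old usage is awkward in many places, but the ideas behind it are still worth borrowing.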

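The way blend_kernel accesses dev_inSrc, where each thread reads its own cell and its immediate neighbors, is a good fit for texture memory, which can speed up such reads. Let's see how to use texture memory.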

Declaring Texture Memory

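First we declare three texture variables: one for the heat sources, one for the input, and one for the output. Temperatures are stored as float, so the type is texture<float>. If the texture<float> syntax is unfamiliar, read up on C++ class templates. These variables are declared as globals.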

texture<float> texConstSrc;
texture<float> texIn;
texture<float> texOut;
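
For readers new to class templates, here is a minimal sketch of what the angle brackets mean. The names toy_texture and toy_read_mode are made up for illustration and are not part of CUDA. As far as I know, the real legacy texture template also takes further parameters with default values (dimensionality and read mode), which is why writing only texture<float> is enough.

```cuda
#include <cstdio>

// Hypothetical stand-in for a texture-like class template (not a CUDA type).
enum toy_read_mode { toy_read_element, toy_read_normalized };

// T is the element type; Dim and Mode have defaults, so they can be omitted.
template <typename T, int Dim = 1, toy_read_mode Mode = toy_read_element>
struct toy_texture {
    static const int dim = Dim;
    static const toy_read_mode mode = Mode;
};

// Same type as toy_texture<float, 1, toy_read_element>.
toy_texture<float> toyIn;

// Returns the dimensionality picked by the default template argument.
int toy_dim_of_default()
{
    return toy_texture<float>::dim;
}

int main()
{
    printf("default dim: %d\n", toy_dim_of_default());
    return 0;
}
```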

Binding to Global Memory

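Next we bind the texture variables to the GPU memory allocated with cudaMalloc, so that the memory pointed to by dev_constSrc, dev_inSrc, and dev_outSrc can be read through the texture variables. The binding is done with the CUDA C function cudaBindTexture, and it goes in main after the dev_* buffers have been allocated.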

cudaBindTexture(NULL, texConstSrc, d.dev_constSrc, imageSize);
cudaBindTexture(NULL, texIn, d.dev_inSrc, imageSize);
cudaBindTexture(NULL, texOut, d.dev_outSrc, imageSize);

Reading Data from Texture Memory

Reading from a texture variable requires a special function: tex1Dfetch(). For example, to read the value at offset from texIn, we use the following statement:

float v = tex1Dfetch(texIn, offset);
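
Putting the three steps together (declare, bind, fetch), a minimal round trip might look like the sketch below. It uses the same legacy texture reference API as the book, so it should only build with toolkits older than CUDA 12. The kernel copy_through_texture and the helper run_round_trip are names I made up for this sketch, and the cudaUnbindTexture cleanup call is my addition rather than something shown in the excerpt above.

```cuda
// Legacy texture-reference API: expected to compile only with CUDA < 12.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Texture references must be declared at file scope.
texture<float> texIn;

// Each thread fetches one element through the texture and writes it to dst.
__global__ void copy_through_texture(float *dst, int n)
{
    int offset = threadIdx.x + blockIdx.x * blockDim.x;
    if (offset < n)
        dst[offset] = tex1Dfetch(texIn, offset);
}

// Hypothetical helper: returns the number of mismatched elements,
// or -1 if a CUDA call fails. Error handling is kept minimal.
int run_round_trip(int n)
{
    std::vector<float> host_in(n), host_out(n, -1.0f);
    for (int i = 0; i < n; ++i)
        host_in[i] = 0.5f * i;

    size_t bytes = n * sizeof(float);
    float *dev_in = NULL, *dev_out = NULL;
    if (cudaMalloc(&dev_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dev_out, bytes) != cudaSuccess) { cudaFree(dev_in); return -1; }
    cudaMemcpy(dev_in, host_in.data(), bytes, cudaMemcpyHostToDevice);

    // Bind the texture to the input buffer, as in the notes above.
    cudaBindTexture(NULL, texIn, dev_in, bytes);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    copy_through_texture<<<blocks, threads>>>(dev_out, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host_out.data(), dev_out, bytes,
                                 cudaMemcpyDeviceToHost);

    cudaUnbindTexture(texIn);
    cudaFree(dev_in);
    cudaFree(dev_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (host_out[i] != host_in[i]) ++mismatches;
    return mismatches;
}

int main()
{
    int bad = run_round_trip(1 << 16);
    printf("mismatched elements: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```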

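tex1Dfetch brings a problem of its own: its first argument, the texture, has to be known at compile time. So blend_kernel cannot take a texture variable as a parameter and then hand it to tex1Dfetch as the first argument. Yet the function still needs to flip dev_inSrc and dev_outSrc so that they take turns as the input and output buffers. The workaround used in the example is to pass in a boolean flag that says which buffer serves as input and which as output.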
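
For comparison, here is a sketch of how the same ping-pong might look with the newer texture object API, where a cudaTextureObject_t is an ordinary value that can be passed to a kernel, so no boolean flag is needed. This is not the book's method. It assumes a GPU of compute capability 3.0 or later, and make_linear_texture, smooth_kernel, and run_ping_pong are hypothetical names. The kernel is a simplified 1D stand-in for blend_kernel that averages the clamped left and right neighbors.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: wrap a linear float buffer in a texture object.
cudaTextureObject_t make_linear_texture(float *dev_ptr, size_t bytes)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dev_ptr;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;

    cudaTextureDesc desc = {};
    desc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &desc, NULL);
    return tex;
}

// Simplified 1D stand-in for blend_kernel; the texture arrives as a parameter.
__global__ void smooth_kernel(float *dst, cudaTextureObject_t src, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;
    int left = (i == 0) ? i : i - 1;        // clamp at the left edge
    int right = (i == n - 1) ? i : i + 1;   // clamp at the right edge
    dst[i] = 0.5f * (tex1Dfetch<float>(src, left) +
                     tex1Dfetch<float>(src, right));
}

// Returns the number of cells that differ from a CPU reference,
// or -1 on a CUDA error. Error handling is kept minimal for brevity.
int run_ping_pong(int n, int steps)
{
    std::vector<float> ref(n, 0.0f), next(n), gpu(n);
    ref[n / 2] = 1.0f;                      // a single hot cell
    size_t bytes = n * sizeof(float);

    float *buf[2] = {NULL, NULL};
    cudaTextureObject_t tex[2] = {0, 0};
    for (int k = 0; k < 2; ++k) {
        if (cudaMalloc(&buf[k], bytes) != cudaSuccess) return -1;
        tex[k] = make_linear_texture(buf[k], bytes);
    }
    cudaMemcpy(buf[0], ref.data(), bytes, cudaMemcpyHostToDevice);

    int in = 0, threads = 256, blocks = (n + threads - 1) / threads;
    for (int s = 0; s < steps; ++s) {
        // Read through tex[in], write to the other buffer, then swap roles.
        smooth_kernel<<<blocks, threads>>>(buf[1 - in], tex[in], n);
        in = 1 - in;
        // Same step on the CPU, used as the reference.
        for (int i = 0; i < n; ++i)
            next[i] = 0.5f * (ref[i == 0 ? i : i - 1] +
                              ref[i == n - 1 ? i : i + 1]);
        ref.swap(next);
    }
    // After the last swap, buf[in] holds the most recent output.
    cudaError_t err = cudaMemcpy(gpu.data(), buf[in], bytes,
                                 cudaMemcpyDeviceToHost);
    for (int k = 0; k < 2; ++k) {
        cudaDestroyTextureObject(tex[k]);
        cudaFree(buf[k]);
    }
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (std::fabs(gpu[i] - ref[i]) > 1e-6f) ++mismatches;
    return mismatches;
}

int main()
{
    int bad = run_ping_pong(1024, 10);
    printf("cells differing from CPU reference: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Because the input texture is chosen on the host for each launch, the kernel body stays free of the if/else on a flag that the texture reference version requires.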

The new blend_kernel function looks like this:

__global__ void blend_kernel(float *dst, bool dstOut)
{
    // map from threadIdx/blockIdx to pixel position
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    int left = offset - 1;
    int right = offset + 1;
    if (x == 0)
        left++;
    if (x == DIM -1)
        right--;

    int top = offset - DIM;
    int bottom = offset + DIM;
    if (y == 0)
        top += DIM;
    if (y == DIM -1)
        bottom -= DIM;

    float t, l, c, r, b;
    if (dstOut) {
        t = tex1Dfetch(texIn, top);
        l = tex1Dfetch(texIn, left);
        c = tex1D
