CUDA纹理笔记

最新推荐文章于 2024-09-21 16:26:46 发布

fb_help

最新推荐文章于 2024-09-21 16:26:46 发布

阅读量1.1k

点赞数

CC 4.0 BY-SA版权

分类专栏： CUDA 文章标签：笔记

本文链接：https://blog.youkuaiyun.com/fb_help/article/details/79548209

CUDA 专栏收录该内容

51 篇文章

订阅专栏

CUDA纹理笔记

本文参照CUDA C programming guide有关纹理内存的内容。
有关纹理内存讲的好的内容：CUDA texture，纹理内存访问原理

Device memory

CUDA arrays
CUDA arrays are opaque memory layouts optimized for texture fetching. They are described in Texture and Surface Memory.
Linear memory
The normal memory we used via cudaMalloc() cudeFree() cudeMemcpy() and 2D 3D with
cudaMallocPitch() cudaMallocPitch() cudaMemcpy2D() and cudaMemcpy3D()
Othres

__constant__ float constData[256];
float data[256];
cudaMemcpyToSymbol(constData, data, sizeof(data));
cudaMemcpyFromSymbol(data, constData, sizeof(data));

__device__ float devData;
float value = 3.14f;
cudaMemcpyToSymbol(devData, &value, sizeof(float));

__device__ float* devPointer;
float* ptr;
cudaMalloc(&ptr, 256 * sizeof(float));
cudaMemcpyToSymbol(devPointer, &ptr, sizeof(ptr));

cudaMemcpyToSymbol()与cudaMemcpyFromSymbol() 重点记，它们是申请设备内存的一种方式，主要用于申请常量内存。

Texture and Surface Memory

纹理和表面内存空间驻留在设备内存中，并缓存在纹理高速缓存中，因此纹理读取或表面读取只需要在缓存未命中时从设备内存中读取一个内存，否则仅花费一次从纹理高速缓存读取。纹理缓存针对2D空间局部性进行了优化，因此读取纹理或表面地址相同的warp的线程可以获得最佳性能。此外，它设计用于具有恒定延迟的流式抓取; 缓存命中减少了DRAM带宽需求，但不能获取延迟。

通过纹理或表面读取读取设备内存可带来一些好处，可以使其成为从全局或常量内存中读取设备内存的有利替代方案：
1. 如果内存的访问的请求方式对global or constance memory 不友好（如非合并或warp内不需要广播的情况），texture memory is another alternative method to progress the locality(局部) memory request.
2. 有关地址的计算是有专用单元在核外执行的
3. 打包数据可以被广播给在单个操作中独立的变量;
4. 8位和16位整型输入数据可以选择性地转换为[0.0，1.0]或[-1.0,1.0]范围内的32位浮点值（请参阅纹理存储器）。

每个texture fetch 是texture object（纹理对象）或texture reference（纹理参考）中的一个参数。
纹理参考在编译时被确定，纹理（内存）在运行时与纹理参考绑定。
不同的纹理参考可以对应相同的纹理内存
纹理内存的表现形式是CUDA array （专门用于texture or surface 的内存形式）或 linear memory （用cudamalloc申请的设备内存）
纹理内存的维数决定了纹理坐标的表现形式是一维二维还是三维，纹理内存中的元素被即texture elements 被简写为 texel。
纹理内存的最大的长宽深度以及纹理缓存的大小见：

大小描述	大小
Cache working set per multiprocessor for texture memory （每个多处理器的纹理内存缓存工作集）	Between 24 KB and 48 KB
Maximum width for a 1D texture reference bound to a CUDA array	65536
Maximum width for a 1D texture reference bound to linear memory	2`27
Maximum width and number of layers for a 1D layered texture reference	16384 x 2048
Maximum width and height for a 2D texture reference bound to a CUDA array	65536 x 65535
Maximum width and height for a 2D texture reference bound to linear memory	65000 x 65000
Maximum width and height for a 2D texture reference bound to a CUDA array supporting texture gather	16384 x 16384
Maximum width, height, and number of layers for a 2D layered texture reference	16384 x 16384 x 2048
Maximum width, height, and depth for a 3D texture reference bound to a CUDA array	4096 x 4096 x 4096
Maximum width (and height) for a cubemap texture reference	16384
Maximum width (and height) and number of layers for a cubemap layered texture reference	16384 x 2046
Maximum number of textures that can be bound to a kernel	256
Maximum width for a 1D surface reference bound to a CUDA array	65536
Maximum width and number of layers for a 1D layered surface reference	65536 x 2048
Maximum width and height for a 2D surface reference bound to a CUDA array	65536 x 32768
Maximum width, height, and number of layers for a 2D layered surface reference	65536 x 32768 x 2048
Maximum width, height, and depth for a 3D surface reference bound to a CUDA array	65536 x 32768 x 2048
Maximum width (and height) for a cubemap surface reference bound to a CUDA array	32768
Maximum width (and height) and number of layers for a cubemap layered surface reference	32768 x 2046
Maximum number of surfaces that can be bound to a kernel	16

7. 纹理内存的元素可定义为char, short, int, long, longlong, float, double等类型。
8. 读取模式（Read mode）是否归一化：cudaReadModeNormalizedFloat or cudaReadModeElementType 属性可以决定读取的是归一化的数据还是原始类型的数据。
如果是：cudaReadModeNormalizedFloat，且纹理元素是1字节或2字节的整型。纹理获取返回的值实际上是作为浮点类型返回的，整数类型的整个范围被映射为无符号整数类型的[0.0,1.0]和有符号整数类型的[-1.0,1.0]; 例如，值为0xff的无符号8位纹理元素读取为1.
9. 纹理坐标是否标准化（归一化）：默认情况下纹理坐标的范围在[0,N-1]，如果使用纹理坐标归一化，那么纹理坐标的范围是[0,1/(N-1)]，在某些应用中可以用到，例如不考虑纹理的大小，只考虑比例的程序。
10. 寻址模式（Addressing mode），当请求在边界内（纹理内存范围内）时，是标准的情况，但当寻址请求超出了边界，超出的部分我们通过不同的模式可以处理它：cudaAddressModeBorder，cudaAddressModeClamp，cudaAddressModeWrap和cudaAddressModeMirror; cudaAddressModeWrap，其中后两个在支持归一化读取时可用。
cudaAddressModeClamp：超出范围就用边界值代替，示意： AA | ABCDE | EE
cudaAddressModeBorder：超出范围就用零代替，示意： 00 | ABCDE | 00
cudaAddressModeWrap：重叠模式（循环），示意： DE | ABCDE || AB
cudaAddressModeMirror：镜像模式，示意： BA | ABCDE | ED
参考：http://blog.youkuaiyun.com/Kelvin_Yan/article/details/54019017
11. 纹理过滤模式（Flter mode），定义了fetch返回结果的计算方式，或称插值模式，线性纹理过滤只能对配置为返回浮点数据的纹理进行，当访问的纹理坐标在已有的纹理坐标之间时，插值就是获得一个理想的估计值，过滤模式有：
cudaFilterModePoint or cudaFilterModeLinear
cudaFilterModePoint：点模式，返回最接近的一个点，即最近邻插值。插值公式 tex(x) = T(i)，i=floor(x)，注意是对坐标向下取整，所以一般对输入坐标值+0.5，避免无法精确表示的某些数值出现错误取值，如 x=3，实际是2.99999，此时实际获取的是x=2的元素
cudaFilterModeLinear：线性模式，即线性插值，对于一维纹理，两点插值；对于二维纹理，四点插值；对于三维纹理，八点插值。线性模式只有在fetch返回浮点类型数据（注意并非指read mode的归一化浮点模式）下才有效
参考：http://blog.youkuaiyun.com/Kelvin_Yan/article/details/54019017

Texture Object API：

纹理对象是使用cudaCreateTextureObject（）从指定纹理的struct cudaResourceDesc类型的资源描述创建的，也可以是从如此定义的纹理描述中创建的：

struct cudaTextureDesc
{
    enum cudaTextureAddressMode addressMode[3];
    enum cudaTextureFilterMode  filterMode;
    enum cudaTextureReadMode    readMode;
    int                         sRGB;
    int                         normalizedCoords;
    unsigned int                maxAnisotropy;
    enum cudaTextureFilterMode  mipmapFilterMode;
    float                       mipmapLevelBias;
    float                       minMipmapLevelClamp;
    float                       maxMipmapLevelClamp;
};

addressMode specifies the addressing mode;
filterMode specifies the filter mode;
readMode specifies the read mode;
normalizedCoords specifies whether texture coordinates are normalized or not;
See reference manual for sRGB, maxAnisotropy, mipmapFilterMode, mipmapLevelBias, minMipmapLevelClamp, and maxMipmapLevelClamp.

Texture Object 程序的相关步骤是：
1. Alloate CUDA array
2. Copy the host memory to the CUDA array in device memory
3. Specify the Texture ,defined resDesc, using CUDA array above
4. Specify the Texture Object parameters define texDesc.
5. Create the Texture Object by the resDesc and texDesc.
6. Launch the kernel and completed the calculate.
7. Free the Texture Object and the CUDA array.

Texture Reference API:

Texture Reference 的一些变量是不可变的，必须在编译时就确定。它作为一个纹理类型的变量（a variable of type texture）即texture类型的变量。在文件范围内被声明。

A texture reference can only be declared as a static global variable and cannot be passed as an argument to a function.
texture<DataType, Type, ReadMode> texRef;

DataType specifies the type of the texel;
DataTyepe定义了纹理元素的类型
Type specifies the type of the texture reference and is equal to cudaTextureType1D, cudaTextureType2D, or cudaTextureType3D, for a one-dimensional, two-dimensional, or three-dimensional texture, respectively, or cudaTextureType1DLayered or cudaTextureType2DLayered for a one-dimensional or two-dimensional layered texture respectively; Type is an optional argument which defaults to cudaTextureType1D;
Type定义了Texture Reference的类型，是1D ，2D或3D。
ReadMode specifies the read mode; it is an optional argument which defaults to cudaReadModeElementType.
ReadMode 定义了数据读取的模式是归一化还是原值。默认是原值： cudaReadModeElementType。

纹理引用的其他属性是可变的，并且可以在运行时通过主机运行时更改。如参考手册中所述，运行时API具有低级C风格界面和高级C ++风格界面。纹理类型在高级API中被定义为从低级API中定义的textureReference类型公开派生的结构，如下所示：

struct textureReference {
    int                          normalized;
    enum cudaTextureFilterMode   filterMode;
    enum cudaTextureAddressMode  addressMode[3];
    struct cudaChannelFormatDesc channelDesc;
    int                          sRGB;
    unsigned int                 maxAnisotropy;
    enum cudaTextureFilterMode   mipmapFilterMode;
    float                        mipmapLevelBias;
    float                        minMipmapLevelClamp;
    float                        maxMipmapLevelClamp;
}

normalized specifies whether texture coordinates are normalized or not;
filterMode specifies the filtering mode;
addressMode specifies the addressing mode;
channelDesc describes the format of the texture element; it must match the DataType argument of the texture reference declaration; channelDesc is of the following type:

struct cudaChannelFormatDesc {
  int x, y, z, w;
  enum cudaChannelFormatKind f;
};

where x, y, z, and w are equal to the number of bits of each component of the returned value and f is:

cudaChannelFormatKindSigned if these components are of signed integer type,
cudaChannelFormatKindUnsigned if they are of unsigned integer type,
cudaChannelFormatKindFloat if they are of floating point type.
normalized, addressMode, and filterMode may be directly modified in host code.
1.在内核可以使用纹理参考来读取纹理内存之前， texture reference（纹理参考）必须使用线性内存的cudaBindTexture（）或cudaBindTexture2D（）或CUDA阵列的cudaBindTextureToArray（）来绑定到纹理。
2.cudaUnbindTexture（）用于解除绑定纹理参考。一旦纹理参考被解除绑定，即使使用先前绑定纹理的内核尚未完成，也可以安全地将其重新绑定到另一个阵列。
3.建议使用cudaMallocPitch（）在线性内存中分配二维纹理，并使用cudaMallocPitch（）返回的间距作为cudaBindTexture2D（）的输入参数。

The following code samples bind a 2D texture reference to linear memory pointed to by devPtr:

Using the low-level API:

texture<float, cudaTextureType2D,
        cudaReadModeElementType> texRef;
textureReference* texRefPtr;
cudaGetTextureReference(&texRefPtr, &texRef);
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
size_t offset;
cudaBindTexture2D(&offset, texRefPtr, devPtr, &channelDesc,
                  width, height, pitch);

Using the high-level API:

texture<float, cudaTextureType2D,
        cudaReadModeElementType> texRef;
cudaChannelFormatDesc channelDesc =
                             cudaCreateChannelDesc<float>();
size_t offset;
cudaBindTexture2D(&offset, texRef, devPtr, channelDesc,
                  width, height, pitch);

The following code samples bind a 2D texture reference to a CUDA array cuArray:
- Using the low-level API:

texture<float, cudaTextureType2D,
        cudaReadModeElementType> texRef;
textureReference* texRefPtr;
cudaGetTextureReference(&texRefPtr, &texRef);
cudaChannelFormatDesc channelDesc;
cudaGetChannelDesc(&channelDesc, cuArray);
cudaBindTextureToArray(texRef, cuArray, &channelDesc);

Using the low-level API:

texture<float, cudaTextureType2D,
        cudaReadModeElementType> texRef;
cudaBindTextureToArray(texRef, cuArray);

16位浮点：

CUDA阵列支持的16位浮点或半格式与IEEE 754-2008 binary2格式相同。
CUDA C不支持匹配的数据类型，但通过unsigned short类型提供内部函数来转换32位浮点格式：__float2half_rn（float）和__half2float（unsigned short）。这些功能仅在设备代码中受支持。例如，主机代码的等效函数可以在OpenEXR库中找到。
在执行任何过滤之前，16位浮点组件在纹理提取期间被提升为32位浮点。
可以通过调用cudaCreateChannelDescHalf *（）函数之一来创建16位浮点格式的通道描述。

纹理图层（Texture Layer）

一维或二维分层纹理（也称为Direct3D中的纹理阵列和OpenGL中的阵列纹理）是由一系列图层组成的纹理，所有图层都是具有相同维度，大小和数据类型的常规纹理。

使用整数索引和浮点纹理坐标来寻址一维分层纹理;该索引表示该序列内的一个层，并且该坐标对该层内的纹理元素进行寻址。使用整数索引和两个浮点纹理坐标来寻址二维分层纹理;索引表示序列中的一个层，坐标表示该层内的纹理元素。

通过使用cudaArrayLayered标志调用cudaMalloc3DArray（）（一维分层纹理的高度为零），分层纹理只能是CUDA数组。

使用tex1DLayered（），tex1DLayered（），tex2DLayered（）和tex2DLayered（）中描述的设备函数获取分层纹理。纹理过滤（请参见纹理拾取）仅在一个图层内完成，而不是跨层完成。

立方体纹理（Cubemap Textures）

立方体贴图纹理是一种特殊类型的二维分层纹理，它具有六个层来表示立方体的面：

立方体纹理图层（Cubemap Layered Textures）

立方体贴图分层纹理是分层纹理，其图层是具有相同尺寸的立方体贴图。

面内存（Surface Memory）

可使用cudaArraySurfaceLoadStore标志创建的CUDA数组（可在Cubemap表面中描述）通过表面对象或表面参考进行读取和写入。其他基本和texture memory一样。

图形互操作性

来自OpenGL和Direct3D的一些资源可能被映射到CUDA的地址空间中，以使CUDA能够读取由OpenGL或Direct3D写入的数据，或者使CUDA能够写入数据供OpenGL或Direct3D使用。
在使用OpenGL互操作性和Direct3D互操作性中提到的功能进行映射之前，资源必须注册到CUDA。这些函数返回一个指向struct cudaGraphicsResource类型的CUDA图形资源的指针。注册资源的开销可能很高，因此通常每个资源仅调用一次。使用cudaGraphicsUnregisterResource（）取消注册CUDA图形资源。打算使用资源的每个CUDA上下文都需要单独注册。