OpenCL Notes 1: Memory Model

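This post describes the components of the GPU memory model, including registers, local memory, shared memory, constant memory, and several kinds of texture arrays, and discusses the characteristics and usage restrictions of each memory region. It also covers global memory and how its speed compares with the other memory types.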

Memory model

Registers

Like a CPU register file, the register file is private to each thread and is both readable and writable. The number of registers available to a thread is limited and depends on the occupancy, the kernel complexity, and the GPU generation. If the register file is exhausted, data spills into local memory.


Local Memory

Local memory acts as a dynamic extension of the register file, introduced to overcome the hardware limit on registers: it holds per-thread data that does not fit in registers. The price to be paid is a loss in performance.


Shared Memory

Shared memory can be used for communication between all threads of a thread block, and also as the primary local storage space. It is generally the lowest-latency way for threads to communicate. It is readable and writable, but no coherency is guaranteed if two threads access the same location at the same time. For that reason, the framework includes atomic functions. Note that the names used here follow CUDA usage: OpenCL calls this region local memory (`__local`), which is not the per-thread local memory described in the previous section.


Constant Memory

Constant memory is one of the read-only address spaces: kernels can read it but cannot write to it.


1-D Texture Array

In contrast to constant memory, a texture array supports automatic interpolation between neighboring values, performed in hardware, according to the requested position.
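
As a rough picture of what such a hardware lookup computes, here is a minimal CPU-side sketch in C. The function name `tex1d_linear` is hypothetical, and the sketch assumes unnormalized coordinates, texel `i` located exactly at position `i`, and clamping at the edges. Real hardware may place texel centers differently and uses reduced-precision weights, so treat this as an illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* CPU-side sketch of a linearly filtered 1-D texture fetch (not a GPU API).
   Assumptions: texel i sits at position i, and positions outside
   [0, n-1] are clamped to the edge texels. */
float tex1d_linear(const float *texels, size_t n, float x)
{
    assert(texels != NULL && n > 0);
    if (x <= 0.0f) return texels[0];
    if (x >= (float)(n - 1)) return texels[n - 1];

    size_t i = (size_t)x;        /* x is positive here, so the cast truncates to floor(x) */
    float a = x - (float)i;      /* weight of the right-hand neighbor */
    return (1.0f - a) * texels[i] + a * texels[i + 1];
}
```

For the texels `{1, 3, 7}`, a fetch at position 0.5 would blend the first two values equally, and a fetch at an integer position returns the stored texel unchanged.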

1-D Linear Texture

In contrast to the 1-D texture array, the 1-D linear texture is writable from kernel functions. Because the texture caches do not enforce coherence, the behavior is undefined if one thread writes to a position while another thread is reading the same position.

2-D Texture Array

Similar to the 1-D texture array, but it provides bilinear interpolation in hardware.
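
Bilinear interpolation blends the four texels surrounding the requested position, first along x and then along y. The following C sketch emulates that on the CPU. The names `tex2d_bilinear` and `clamp_coord` are hypothetical, and the sketch assumes a row-major array, unnormalized coordinates with texel `(i, j)` located exactly at `(i, j)`, and clamping at the borders.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp a coordinate to [0, hi]. */
static float clamp_coord(float v, float hi)
{
    if (v < 0.0f) return 0.0f;
    if (v > hi) return hi;
    return v;
}

/* CPU-side sketch of a bilinearly filtered 2-D texture fetch (not a GPU API).
   texels is a row-major width x height array. */
float tex2d_bilinear(const float *texels, size_t width, size_t height,
                     float x, float y)
{
    assert(texels != NULL && width > 0 && height > 0);
    x = clamp_coord(x, (float)(width - 1));
    y = clamp_coord(y, (float)(height - 1));

    size_t x0 = (size_t)x, y0 = (size_t)y;          /* top-left texel */
    size_t x1 = (x0 + 1 < width) ? x0 + 1 : x0;     /* right neighbor, clamped */
    size_t y1 = (y0 + 1 < height) ? y0 + 1 : y0;    /* lower neighbor, clamped */
    float ax = x - (float)x0, ay = y - (float)y0;   /* fractional weights */

    /* Interpolate along x on both rows, then along y between the rows. */
    float top = (1.0f - ax) * texels[y0 * width + x0] + ax * texels[y0 * width + x1];
    float bot = (1.0f - ax) * texels[y1 * width + x0] + ax * texels[y1 * width + x1];
    return (1.0f - ay) * top + ay * bot;
}
```

Sampling the center of a 2x2 texture in this sketch gives the average of its four texels.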

2-D Texture from Pitch-Linear Memory

Similar to the 1-D linear texture, it is writable by the kernel.
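
Pitch-linear memory stores a 2-D array row by row, with each row padded to a fixed byte width called the pitch, so an element is located by row offset plus column offset. The C sketch below shows that addressing scheme on the CPU. The helper names `row_pitch` and `pitched_elem` are hypothetical, and the power-of-two alignment is an assumption made for illustration: on a real device the pitch is chosen by the runtime (for example by `cudaMallocPitch` in CUDA), not computed by the application.

```c
#include <assert.h>
#include <stddef.h>

/* Round a row's width in bytes up to a multiple of alignment.
   Assumption: alignment is a power of two. */
size_t row_pitch(size_t width_bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (width_bytes + alignment - 1) & ~(alignment - 1);
}

/* Address of element (x, y) in a pitched array of float:
   step y rows of pitch bytes each, then x elements within the row. */
float *pitched_elem(void *base, size_t pitch, size_t x, size_t y)
{
    assert(base != NULL && pitch % sizeof(float) == 0);
    return (float *)((char *)base + y * pitch) + x;
}
```

With 4-byte floats, a row of 10 elements occupies 40 bytes; under an assumed 32-byte alignment it would be padded to a 64-byte pitch, so the second row starts 16 floats after the first.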

3-D Texture Array

Unfortunately, no writable 3-D texture is available.

Global Memory

Compared with the other memory types, global memory has the slowest access. Its size is limited only by the amount of memory available on the graphics card.




Conclusion: a parallel implementation is still possible without detailed knowledge of this memory model, but a large loss in performance is very likely.
