GPU Notes

Original article · last modified 2022-12-02 18:33:59 · 580 views

License: CC 4.0 BY-SA

Tags: #gpu

First published 2022-12-02 18:06:14

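This article introduces parallel-computing concepts in GPUs, including what a lane is and the role it plays in GPU architecture, and explains how the SIMT stack works and how it lets threads appear to run independently of one another.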

To improve efficiency, a GPU generally groups a number of threads together. NVIDIA calls such a group a warp; AMD calls it a wavefront. Scheduling is done in units of warps.
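The grouping described above can be sketched as simple index arithmetic. This is only an illustration: the warp size of 32 is the value commonly used on NVIDIA hardware (AMD wavefronts are commonly 64 or 32 threads), and the function names `warp_and_lane` and `num_warps` are hypothetical, not part of any GPU API.

```python
WARP_SIZE = 32  # assumption: NVIDIA-style warp; pass 64 to model a 64-wide wavefront


def warp_and_lane(thread_id, warp_size=WARP_SIZE):
    """Return (warp index, position inside the warp) for a linear thread index."""
    return divmod(thread_id, warp_size)


def num_warps(threads_per_block, warp_size=WARP_SIZE):
    """Number of warps needed to hold a block; the last warp may be partly empty."""
    return (threads_per_block + warp_size - 1) // warp_size


if __name__ == "__main__":
    print(warp_and_lane(70))   # thread 70 sits in warp 2, position 6
    print(num_warps(100))      # 100 threads need 4 warps of 32
```

Because the scheduler issues whole warps, a block of 100 threads in this model still occupies four warps, with 28 positions of the last warp unused.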
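What is a lane in a GPU?
Computer Architecture: A Quantitative Approach describes it as follows:
All modern vector computers have vector functional units with multiple parallel pipelines (or lanes) that can produce two or more results per clock cycle, but they may also have some functional units that are not fully pipelined.
In other words, a lane is one of several parallel pipelines inside a vector functional unit. Because the lanes operate side by side, the unit as a whole can produce multiple results in a single clock cycle.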
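To make the effect of lanes concrete, here is a rough back-of-the-envelope model. It assumes every lane is fully pipelined and delivers one result per cycle, ignores start-up latency, and assumes elements are interleaved across lanes (element i goes to lane i mod the lane count). The names `vector_cycles` and `lane_of` are made up for this sketch.

```python
import math


def vector_cycles(n_elements, n_lanes):
    """Cycles to process a vector when each lane yields one result per cycle."""
    return math.ceil(n_elements / n_lanes)


def lane_of(element_index, n_lanes):
    """Lane that handles a given element under simple interleaving (an assumption)."""
    return element_index % n_lanes


if __name__ == "__main__":
    for lanes in (1, 4):
        print(lanes, "lane(s):", vector_cycles(64, lanes), "cycles for 64 elements")
```

Under these assumptions, going from one lane to four cuts the cycle count for a 64-element vector from 64 to 16.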
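General-Purpose Graphics Processor Architectures describes it as follows:
Each thread executes on the function unit associated with a lane ……
That is, each lane has function units associated with it, and a thread executes on the function unit that belongs to its lane. Which function units are there? As an example, the function units of an NVIDIA GPU include the special function unit (SFU), the load/store unit, the floating-point function unit, the integer function unit, and so on.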
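With the SIMT stack and predication, a GPU can make its threads look independent of one another from the programmer's point of view.
What is the SIMT stack? It is a "stack of predicate masks that we shall refer to as the SIMT stack". The SIMT stack addresses two key issues that arise when threads are allowed to run independently: nested control flow and skipping computation.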
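The two issues can be illustrated with a small software model of one warp. This is a deliberately simplified sketch, not how any particular GPU implements it: the stack holds only predicate masks (real SIMT stack entries also track program counters for the next instruction and the reconvergence point), control flow is limited to structured if/else, and all names (`simt_if_else`, `run_warp`, the block labels) are hypothetical.

```python
def mask_str(mask):
    """Render a predicate mask as a bit string, position 0 first."""
    return "".join("1" if m else "0" for m in mask)


def simt_if_else(stack, cond, then_body, else_body, trace):
    """Run a structured if/else for one warp using a stack of predicate masks."""
    active = stack[-1]
    then_mask = [a and c for a, c in zip(active, cond)]
    else_mask = [a and not c for a, c in zip(active, cond)]
    for mask, body in ((then_mask, then_body), (else_mask, else_body)):
        if not any(mask):
            # Skipping computation: no active thread needs this path.
            trace.append(("skipped", body.__name__))
            continue
        stack.append(mask)   # nested control flow pushes a narrower mask
        body(stack, trace)
        stack.pop()          # reconverge: restore the enclosing mask


def run_warp(x):
    """y = 2 if x > 10, 1 if 0 < x <= 10, else 0, evaluated warp-wide."""
    y = [None] * len(x)
    trace = []
    stack = [[True] * len(x)]  # initially every thread is active

    def write(value, label):
        def body(stack, trace):
            trace.append((label, mask_str(stack[-1])))
            for i, on in enumerate(stack[-1]):
                if on:  # predication: only active threads commit results
                    y[i] = value
        body.__name__ = label
        return body

    def inner(stack, trace):
        trace.append(("inner", mask_str(stack[-1])))
        simt_if_else(stack, [v > 10 for v in x],
                     write(2, "big"), write(1, "small"), trace)

    simt_if_else(stack, [v > 0 for v in x],
                 inner, write(0, "nonpositive"), trace)
    return y, trace


if __name__ == "__main__":
    print(run_warp([5, 20, -1, 7]))
    print(run_warp([1, 2, 3, 4]))
```

In the first call the warp diverges twice, so the nested branch runs under progressively narrower masks. In the second call every thread takes the same path, so the other paths are skipped entirely rather than executed with an empty mask.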