
GPU/CUDA
Firehotest
Self Summary: Basic concepts of GPU
Some basic concepts of GPU programming. Here is an overview of a GPU (Fermi architecture) [1]: it is a 16-way many-core GPU (16 SMs). Each of these cores has an architecture like t…
Original · 2016-02-09 22:10:22 · 692 views · 0 comments
Study Note: Shared Memory Optimisation -- Avoiding Bank Conflicts
This article is based on devices of compute capability 2.x. Typically, shared memory is 16 KB in total, and it has 32 banks on a 2.x device. A bank is a unit of parallel read…
Original · 2016-02-28 13:11:37 · 1382 views · 0 comments
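As a rough illustration of the 32-bank layout mentioned in this teaser, the sketch below counts how many threads of one warp land on the same bank for a given access stride. It is plain host-side C++, not code from the article. It assumes that successive 32-bit words map to successive banks and that a warp has 32 threads; the names `bank_of_word` and `max_conflict_degree` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kNumBanks = 32;  // from the teaser: 32 banks on a 2.x device
constexpr int kWarpSize = 32;  // assumption: 32 threads per warp

// Bank holding the given 32-bit word of shared memory.
// Assumption: successive words are assigned to successive banks.
int bank_of_word(int word_index) {
    assert(word_index >= 0);
    return word_index % kNumBanks;
}

// Largest number of threads in one warp that hit the same bank when
// thread `tid` accesses word `tid * stride`. A result of 1 means no
// conflict. This simple count does not model hardware broadcast when
// several threads read the very same word.
int max_conflict_degree(int stride) {
    assert(stride >= 0);
    int hits[kNumBanks] = {0};
    for (int tid = 0; tid < kWarpSize; ++tid) {
        ++hits[bank_of_word(tid * stride)];
    }
    return *std::max_element(hits, hits + kNumBanks);
}
```

Under these assumptions, a stride of 1 touches every bank once, a stride of 2 makes pairs of threads share a bank, and a stride of 32 sends the whole warp to bank 0.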
Study Note: Schedule Optimisation and math_intrinsic in CUDA Programming
Let us first introduce a new term, occupancy [1]. It is the ratio of active warps to the maximum number of warps (32). It depends on three parameters: 1) threads/block (set in <<< >>>) 2) registers/th…
Original · 2016-02-29 09:41:27 · 689 views · 0 comments
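The ratio described in this teaser can be written out as a small host-side calculation. This is a minimal sketch, not the article's code. The default of 32 maximum warps is taken from the teaser, a warp size of 32 threads is assumed, and the helper names `warps_per_block` and `occupancy` are hypothetical.

```cpp
#include <cassert>

// Number of warps needed for one block, rounding up because a partly
// filled warp still occupies a whole warp slot.
// Assumption: 32 threads per warp.
int warps_per_block(int threads_per_block, int warp_size = 32) {
    assert(threads_per_block > 0 && warp_size > 0);
    return (threads_per_block + warp_size - 1) / warp_size;
}

// Ratio of active warps to the maximum number of warps.
// The default maximum of 32 is the figure quoted in the teaser.
double occupancy(int active_warps, int max_warps = 32) {
    assert(max_warps > 0);
    assert(active_warps >= 0 && active_warps <= max_warps);
    return static_cast<double>(active_warps) / max_warps;
}
```

For example, if four blocks of 128 threads were resident at once, that would be 4 * `warps_per_block(128)` = 16 active warps, and `occupancy(16)` gives 0.5 under the stated maximum.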
Study Note: Instruction Optimisation of CUDA programming
Consideration 1: Branch divergence. Before discussing it, let us go through what actually happens in the GPU. Here is an abstract model of an SM [1]: every SM has one con…
Original · 2016-02-28 22:55:24 · 977 views · 0 comments
Study Note: Global memory optimisation of CUDA programming
Global memory coalescing: global memory on the GPU is laid out in row-major order, because there are no true two-dimensional arrays on the GPU. Take a matrix as an example [1]: Knowledge of…
Original · 2016-02-27 23:49:19 · 836 views · 0 comments
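The row-major layout mentioned in this teaser amounts to one index formula, sketched below in host-side C++ that is equally valid inside a kernel. It is an illustration rather than the article's code, and the helper name `row_major_index` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Linear offset of element (row, col) in a matrix stored row by row
// in a flat one-dimensional array with `width` columns.
std::size_t row_major_index(std::size_t row, std::size_t col,
                            std::size_t width) {
    assert(col < width);
    return row * width + col;
}
```

Neighbouring columns of the same row are one element apart in memory, while neighbouring rows are `width` elements apart. That is why threads that walk along a row touch consecutive addresses.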
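CUDA/GPU: Execution Parameters of CUDA Kernel Functions
Reposted from: http://blog.youkuaiyun.com/jonny_super/article/details/23208227
A kernel function is the program that runs on each GPU thread. It must be defined with the __global__ function type qualifier, in the following form: __global__ void kernel(param list){ } A kernel can only be called from the host side, and the call must declare the…
Repost · 2017-03-17 08:10:56 · 1797 views · 0 comments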