1. Compute Unit Diagram
2. Kernel Implementation
2.1 Initializing the Local Histogram
Code:
__local int localHistorgram[HIST_BINS]; /* one copy per work-group, in local memory */
int lid = get_local_id(0);
/* Initialize local histogram to zero: work-item lid clears bins
 * lid, lid + local_size, lid + 2 * local_size, ... */
for (int i = lid; i < HIST_BINS; i += get_local_size(0)) {
    localHistorgram[i] = 0;
}
/* Wait until all work-items within
 * the work-group have completed their stores */
barrier(CLK_LOCAL_MEM_FENCE);
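Execution diagram: in a work-group of local size L, work-item lid zeroes bins lid, lid + L, lid + 2L and so on. Together the work-items clear every bin exactly once. The barrier then guarantees that all bins are zero before any work-item starts counting.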
2.2 Computing the Histogram
Code:
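/* Compute local histogram: each work-item strides over the input by the
 * global size (assumes every data[i] lies in [0, HIST_BINS)); atomic_add
 * is required because several work-items may hit the same bin */
int gid = get_global_id(0);
for (int i = gid; i < numData; i += get_global_size(0)) {
    atomic_add(&(localHistorgram[data[i]]), 1);
}
/* Wait until all work-items within
 * the work-group have completed their stores */
barrier(CLK_LOCAL_MEM_FENCE);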
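The execution pattern resembles the initialization: each work-item processes its elements serially, while different work-items run in parallel. The difference is the stride. Here it is the global size rather than the local size, so the input is split across all work-items of all work-groups. Each work-group's local histogram therefore counts only its own share of the data.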
2.3 Merging into the Global Histogram
Code:
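/* Write the local histogram out to the global histogram; atomic_add is
 * required because every work-group adds into the same global bins
 * (histogram must be zeroed before the kernel is launched) */
for (int i = lid; i < HIST_BINS; i += get_local_size(0)) {
    atomic_add(&(histogram[i]), localHistorgram[i]);
}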