(Caffe) Programming Tips

Latest recommended article published on 2019-10-31 13:43:21

沤江一流

Categories: Cuda, Caffe

Article link: https://blog.youkuaiyun.com/mounty_fsc/article/details/51296091

1. In CUDA, when the number of data elements N to process exceeds the number of available threads N'

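Take the vector multiplication function as an example: mul_kernel(n, a, b, y) computes the element-wise product of a and b, both of length n, and stores the result in y (an element-wise product, not an inner product, since y also has n elements).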

template <typename Dtype>
__global__ void mul_kernel(const int n, const Dtype* a,
    const Dtype* b, Dtype* y) {

  CUDA_KERNEL_LOOP(index, n) {
    y[index] = a[index] * b[index];
  }

}

Expanding CUDA_KERNEL_LOOP(index, n), with the loop variable written as i, gives

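template <typename Dtype>
__global__ void mul_kernel(const int n, const Dtype* a,
    const Dtype* b, Dtype* y) {

  // Each thread starts at its global index and then advances by the
  // total number of threads in the grid (blockDim.x * gridDim.x).
  for (int i = blockIdx.x * blockDim.x + threadIdx.x;
       i < (n);
       i += blockDim.x * gridDim.x) {
    y[i] = a[i] * b[i];
  }

}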

Notes: