caffe源码分析--math_functions.cu代码研究

本文深入解析CUDA宏定义CUDA_KERNEL_LOOP的使用方式,并介绍如何利用这一技巧实现一个向量点积运算的CUDA核函数。讨论了当向量维数超过线程总数时,如何通过循环使单个线程处理多个元素。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >



其中用到一个宏定义CUDA_KERNEL_LOOP

common.hpp中有。


#defineCUDA_KERNEL_LOOP(i,n) \

for(inti = blockIdx.x * blockDim.x + threadIdx.x; \

i < (n); \

i +=blockDim.x * gridDim.x)



先看看caffe采取的线程格和线程块的维数设计,

还是从common.hpp可以看到

CAFFE_CUDA_NUM_THREADS

CAFFE_GET_BLOCKS(constintN)

明显都是一维的。


整理一下CUDA_KERNEL_LOOP格式看看,

for(inti = blockIdx.x * blockDim.x + threadIdx.x;

i< (n);

i+= blockDim.x * gridDim.x)

blockDim.x* gridDim.x表示的是该线程格所有线程的数量。

n表示核函数总共要处理的元素个数。

有时候,n会大于blockDim.x* gridDim.x,因此并不能一个线程处理一个元素。

由此通过上面的方法,让一个线程串行(for循环)处理几个元素。

这其实是常用的伎俩,得借鉴学习一下。




再来看一下这个核函数的实现。


template<typename Dtype>

__global__void mul_kernel(const int n, const Dtype* a,

constDtype* b, Dtype* y)

{

CUDA_KERNEL_LOOP(index,n)

{

y[index]= a[index] * b[index];

}

}


明显就是算两个向量的点积了。

由于向量的维数可能大于该kernel函数线程格的总线程数量。

因此有些线程可以要串行处理几个元素。





-- Build files have been written to: /root/Workspace/openpose/build/caffe/src/openpose_lib-build [ 75%] Performing build step for 'openpose_lib' [ 1%] Running C++/Python protocol buffer compiler on /root/Workspace/openpose/3rdparty/caffe/src/caffe/proto/caffe.proto [ 1%] Building CXX object src/caffe/CMakeFiles/caffeproto.dir/__/__/include/caffe/proto/caffe.pb.cc.o [ 1%] Linking CXX static library ../../lib/libcaffeproto.a [ 1%] Built target caffeproto [ 1%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile_1.dir/util/cuda_compile_1_generated_math_functions.cu.o [ 1%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile_1.dir/layers/cuda_compile_1_generated_absval_layer.cu.o [ 2%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile_1.dir/layers/cuda_compile_1_generated_base_data_layer.cu.o [ 2%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile_1.dir/layers/cuda_compile_1_generated_accuracy_layer.cu.o [ 2%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile_1.dir/layers/cuda_compile_1_generated_batch_norm_layer.cu.o [ 2%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile_1.dir/layers/cuda_compile_1_generated_batch_reindex_layer.cu.o [ 4%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile_1.dir/layers/cuda_compile_1_generated_bnll_layer.cu.o [ 4%] Building NVCC (Device) object src/caffe/CMakeFiles/cuda_compile_1.dir/layers/cuda_compile_1_generated_bias_layer.cu.o In file included from /root/Workspace/openpose/3rdparty/caffe/src/caffe/util/math_functions.cu:1: /usr/local/cuda-11.8/include/math_functions.h:54:2: warning: #warning "math_functions.h is an internal header file and must not be used directly. This file will be removed in a future CUDA release. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp] 54 | #warning "math_functions.h is an internal header file and must not be used directly. This file will be removed in a future CUDA release.
最新发布
03-15
评论 4
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值