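I have been playing with deep learning lately and wanted to use GPU acceleration to improve my project's performance.
References: http://blog.youkuaiyun.com/xuhang0910/article/details/45601035 and http://tieba.baidu.com/p/3329042929
I followed those steps one by one and eventually got everything working. I used CUDA 6.5 + VS2013; the software versions differ from the references, but in my view the procedure is much the same. I ran into a lot of trouble during configuration, and the end of this article records my own lessons and experience.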
Preparation
You can check whether your graphics card supports CUDA at https://developer.nvidia.com/cuda-gpus#collapse4
On that page, pick the product family of your card; mine is a GeForce GT 730M, so I select GeForce.
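Configuration
2.1 Open CMake
- Set the Source Folder to the OpenCV source directory (note: this is the source folder that contains CMakeLists.txt)
- Set the Output Folder (note: create a new folder to hold the files generated by the build)
- Check Advanced
2.2 Click Configure and choose the compiler
- Select 'Visual Studio 12 2013 Win64'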
2.3 Configure the CUDA options
- Uncheck 'BUILD_DOCS' and 'BUILD_EXAMPLES'
- Uncheck 'CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE'
- Check 'CMAKE_LINKER' and make sure it points to Visual Studio 12.0 (VS2013)
- Check 'WITH_CUBLAS', 'WITH_CUDA', 'WITH_OPENGL' and 'WITH_TBB'
- Click Configure to refresh the configuration
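2.4 Configure the TBB options
- Set the TBB include path; mine, for example, is 'D:\toolkits\tbb43_20140724oss\include'. The path must end at the include folder itself.
- Click Configure to refresh
- The TBB library directory is now filled in automatically, but it may be wrong; it has to point at the folder directly above the Debug and Release folders. In my case I had to append 'vc12', giving D:/toolkits/tbb43_20140724oss/lib/intel64/vc12
- Click Configure to refresh
If the message pane at the bottom shows lines such as Use TBB: YES (ver 4.1 interface 6105) and Use Cuda: YES (ver 5.0), Intel TBB and CUDA have been configured correctly. The version numbers shown depend on what you have installed.
2.5 Click Generate to produce OpenCV.sln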
3. Modify the OpenCV source
- Open 'opencv-2.4.9\modules\gpu\src\nvidia\core\NCV.cu' and add #include <algorithm>. Otherwise the build fails with a *max* undefined error
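Why the one-line fix works: std::max and std::min are declared in <algorithm>, so a source file that uses them needs that header when the compiler's other standard headers do not pull it in indirectly. A minimal sketch in plain C++, assuming nothing about NCV.cu itself (the helper name clampToRange is hypothetical and only for illustration):

```cpp
#include <algorithm>  // declares std::max and std::min

// Hypothetical helper, not part of NCV.cu.
// Clamps value into the closed range [lo, hi] using std::max / std::min.
// Without the <algorithm> include above, a compiler whose other headers
// do not include it indirectly rejects this function because max and
// min are undeclared.
int clampToRange(int value, int lo, int hi)
{
    return std::max(lo, std::min(value, hi));
}
```

For example, clampToRange(300, 0, 255) evaluates to 255 and clampToRange(-5, 0, 255) evaluates to 0.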
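4. Build OpenCV.sln
- If any of OpenCV, TBB or Python is installed under C:\Program Files, you need to run VS2013 with administrator privileges
- I suggest building *opencv_core* and *opencv_gpu* first (right-click the project and choose *BUILD*). If these two build without errors, the rest should build as well
- Right-click *ALL_BUILD* and choose *BUILD*
- Then build *INSTALL* to collect the compiled files under *<Output Folder>\install*
- Then switch to Release and repeat *ALL_BUILD* and *INSTALL*
- The *Debug* build is expected to report one error; the *Release* build should report none
- The build takes a very long time (it took me a whole night; opencv_gpu in particular is extremely slow to compile)
When it finishes, the generated files are in the bin folder under the install folder.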
Configuring OpenCV for a CUDA project
1. Open VS2013, create a new CUDA project, and set up the OpenCV environment for VS2013: open the Property Manager,
then, under Debug and then under Release, right-click Microsoft.Cpp.x64.user, choose Properties, and go to VC++ Directories:
Include Directories: D:\tfiles\opencv-2.4.9\build-vs2013\install\include;
D:\tfiles\opencv-2.4.9\build-vs2013\install\include\opencv;
D:\tfiles\opencv-2.4.9\build-vs2013\install\include\opencv2;
2. Executable Directories: D:\tfiles\tbb43_20150424oss\bin\intel64\vc12
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin
Libraries to link for Debug:
opencv_calib3d249d.lib
opencv_contrib249d.lib
opencv_core249d.lib
opencv_features2d249d.lib
opencv_flann249d.lib
opencv_gpu249d.lib
opencv_highgui249d.lib
opencv_imgproc249d.lib
opencv_legacy249d.lib
opencv_ml249d.lib
opencv_nonfree249d.lib
opencv_objdetect249d.lib
opencv_ocl249d.lib
opencv_photo249d.lib
opencv_stitching249d.lib
opencv_superres249d.lib
opencv_ts249d.lib
opencv_video249d.lib
opencv_videostab249d.lib
Release:
opencv_calib3d249.lib
opencv_contrib249.lib
opencv_core249.lib
opencv_features2d249.lib
opencv_flann249.lib
opencv_gpu249.lib
opencv_highgui249.lib
opencv_imgproc249.lib
opencv_legacy249.lib
opencv_ml249.lib
opencv_nonfree249.lib
opencv_objdetect249.lib
opencv_ocl249.lib
opencv_photo249.lib
opencv_stitching249.lib
opencv_superres249.lib
opencv_ts249.lib
opencv_video249.lib
opencv_videostab249.lib
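My libraries carry the suffix 248 rather than 249; that makes no difference, just use the suffix that matches your build.
In the verification program I link the libraries directly from the source code with #pragma comment, as the program further down shows:
#pragma comment(lib,"opencv_gpu248d.lib")
#pragma comment(lib,"opencv_core248d.lib")
#pragma comment(lib,"opencv_highgui248d.lib")
#pragma comment(lib,"opencv_imgproc248d.lib")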
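The library names in the lists above follow one pattern: opencv_, the module name, the three-digit version, an optional d for Debug, and .lib. As a rough sketch of that naming rule (the function opencvLibName is a hypothetical helper written for this article, not an OpenCV API):

```cpp
#include <string>

// Hypothetical helper, not part of OpenCV.
// Builds the import-library file name for an OpenCV 2.4.x module.
//   module  : module name such as "core" or "gpu"
//   version : three-digit suffix such as "248" or "249"
//   debug   : true for the Debug library (adds the trailing "d")
std::string opencvLibName(const std::string& module,
                          const std::string& version,
                          bool debug)
{
    return "opencv_" + module + version + (debug ? "d" : "") + ".lib";
}
```

With this rule, opencvLibName("gpu", "248", true) gives opencv_gpu248d.lib and opencvLibName("core", "249", false) gives opencv_core249.lib, which is why mixing up the version or the d suffix leads to a library that the linker cannot find.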
5. Add D:\tfiles\opencv-2.4.9\build-vs2013\install\x64\vc12\bin (this is my directory) to the Windows system environment variable Path, then restart.
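If the program later fails to start because a DLL is missing, it can help to confirm that the directory really is on Path for the running process. The following is only a sketch under simple assumptions: it does an exact, case-sensitive string comparison against each ';'-separated entry, so a trailing backslash or different letter case counts as a mismatch. Both function names are hypothetical.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Returns true if dir is exactly one entry of a ';'-separated path list.
bool pathListContains(const std::string& pathValue, const std::string& dir)
{
    std::istringstream entries(pathValue);
    std::string entry;
    while (std::getline(entries, entry, ';'))
    {
        if (entry == dir)
        {
            return true;
        }
    }
    return false;
}

// Checks the PATH variable of the current process.
// Returns false when PATH is not set at all.
bool dirIsOnPath(const std::string& dir)
{
    const char* path = std::getenv("PATH");
    return path != nullptr && pathListContains(path, dir);
}
```

Calling dirIsOnPath with the bin directory from step 5 at the start of main is one way to use it; a false result after editing the variable usually just means the process was started before the change took effect.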
Verification:
Create a new CUDA project and set the platform to Debug x64.
#include <stdlib.h>
#include <iostream>
#include <device_launch_parameters.h>
#include <cuda_runtime.h>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
// Link the Debug libraries (change 248 to the suffix of your build)
#pragma comment(lib,"opencv_gpu248d.lib")
#pragma comment(lib,"opencv_core248d.lib")
#pragma comment(lib,"opencv_highgui248d.lib")
#pragma comment(lib,"opencv_imgproc248d.lib")
int main()
{
    // A count of 0 or less means no usable CUDA device was found
    int num_devices = cv::gpu::getCudaEnabledDeviceCount();
    if (num_devices <= 0)
    {
        std::cerr << "There is no device" << std::endl;
        return -1;
    }
    // Look for a device that the GPU module was built for
    int enable_device_id = -1;
    for (int i = 0; i < num_devices; i++)
    {
        cv::gpu::DeviceInfo dev_info(i);
        if (dev_info.isCompatible())
        {
            enable_device_id = i;
        }
    }
    if (enable_device_id < 0)
    {
        std::cerr << "GPU module isn't built for this GPU" << std::endl;
        return -1;
    }
    cv::gpu::setDevice(enable_device_id);
    cv::Mat src_image = cv::imread("test.jpg");
    if (src_image.empty())
    {
        std::cerr << "Could not read test.jpg" << std::endl;
        return -1;
    }
    cv::Mat dst_image;
    cv::gpu::GpuMat d_src_img(src_image); // upload the source image to the GPU
    cv::gpu::GpuMat d_dst_img;
    cv::gpu::cvtColor(d_src_img, d_dst_img, CV_BGR2GRAY); // convert BGR to grayscale on the GPU
    d_dst_img.download(dst_image); // download the result back to the CPU
    cv::imshow("test", dst_image);
    cv::waitKey(50000);
    return 0;
}
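Here, getCudaEnabledDeviceCount() must return a number greater than 0 to show that the earlier build succeeded; you can set a breakpoint and check the value in the debugger.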
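When the count is not positive, the value itself gives a hint about what went wrong. The sketch below maps it to a message. The OpenCV documentation states that 0 is returned when OpenCV is compiled without GPU support; the reading of a negative value as a driver problem is my assumption based on the 2.4 sources, so treat that branch as a hint rather than a guarantee. The function describeDeviceCount is hypothetical and takes the count as a plain int so it can be tried without a GPU.

```cpp
#include <string>

// Hypothetical helper, not part of OpenCV.
// Interprets the value returned by cv::gpu::getCudaEnabledDeviceCount().
std::string describeDeviceCount(int count)
{
    if (count > 0)
    {
        return "CUDA device found: the GPU build looks usable";
    }
    if (count == 0)
    {
        // Documented case: OpenCV built without GPU support,
        // or no CUDA-capable device is present.
        return "no CUDA device, or OpenCV was built without CUDA";
    }
    // Assumption: a negative count points at a missing or outdated driver.
    return "CUDA driver missing or too old";
}
```

In the verification program this could replace the fixed "There is no device" message, for example std::cerr << describeDeviceCount(num_devices) << std::endl;.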
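While debugging I also ran into the error "cuda error MSB3721 exited with code 2". I looked at many proposed solutions, none of which gave a definitive answer, and eventually got things working step by step with the help of http://stackoverflow.com/questions/12888247/cuda-error-msb3721-exited-with-code-2. Some people say the build path must not contain Chinese characters; my VS path does contain Chinese characters, and it made no difference. It took two days, but everything finally works, and the next step is to use GPU acceleration in my own project.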