Summary of Solutions for Compiling Caffe on Ubuntu

License: CC 4.0 BY-SA

Tags: #caffe #makefile #deeplearning #cuda

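This article documents the full process of setting up the Caffe deep learning framework on Ubuntu 16.04, including fixes for common errors such as HDF5 library version mismatches and missing files. It should help readers configure the environment correctly and resolve the problems that come up during installation.

The next cohort of students will build their undergraduate research project on top of mine, so I am writing this down both for my own future reference and to save them some detours.
Caffe places heavy demands on the environment and pulls in a lot of dependencies, so things go wrong easily. There are plenty of Caffe installation tutorials online, but few of them explain how to tell whether each step succeeded, what errors may appear, or how to handle them.
Operating system environments vary widely, and following a blog tutorial step by step can still end in failure. Always read the official documentation first, and do not blindly copy commands from someone else's technical blog.
The official Caffe installation guide for Ubuntu is on caffe.berkeleyvision.org.

A setup guide that does not state its exact environment is just being irresponsible.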

Environment

Operating system: Ubuntu 16.04
GCC/G++: 5.4.3
OpenCV: 2.4.11 and 3.1.0
Matlab: R2014b(a)
Python: 2.7

“fatal error: hdf5.h: No such file or directory”

Solution:
1. Edit Makefile.config:

Change
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
to:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial

2. Edit the Makefile in the caffe directory:

Change:
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
to:
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
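The Makefile.config change above can also be scripted so it is repeatable on a fresh checkout. The following is only a sketch: patch_hdf5_paths is a hypothetical helper name, it assumes GNU sed and the Ubuntu 16.04 x86_64 paths quoted in this section, and it only touches the uncommented INCLUDE_DIRS and LIBRARY_DIRS lines.

```shell
# Hypothetical helper (not part of Caffe): append the HDF5 serial paths to
# the INCLUDE_DIRS and LIBRARY_DIRS lines of a Makefile.config.
# Assumes GNU sed and the Ubuntu 16.04 x86_64 layout described above.
patch_hdf5_paths() {
    local cfg="$1"
    local inc="/usr/include/hdf5/serial"
    local lib="/usr/lib/x86_64-linux-gnu/hdf5/serial"
    # Append the include directory only if the line does not have it yet.
    grep -q "^INCLUDE_DIRS.*${inc}" "$cfg" ||
        sed -i "s|^INCLUDE_DIRS := .*|& ${inc}|" "$cfg"
    # Same for the library directories, so running it twice is harmless.
    grep -q "^LIBRARY_DIRS.*${lib}" "$cfg" ||
        sed -i "s|^LIBRARY_DIRS := .*|& /usr/lib/x86_64-linux-gnu ${lib}|" "$cfg"
}

# Usage (run from the caffe directory):
#   patch_hdf5_paths Makefile.config
```

Keeping a backup copy of Makefile.config before running it is a good idea, since sed -i edits the file in place.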

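apt-get: package not fully installed or removed

The error is raised by the pre-remove script of package x, which prevents the package from being removed normally.
Solution:
Find x.prerm in /var/lib/dpkg/info, edit it to clear out the script inside, then remove the broken package with

sudo apt-get remove --purge x

which removes the package outright.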

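leveldb/db.h: No such file or directory

Solution:
LevelDB is a fast key-value (string to string) storage library written by Google. The steps below apply to LevelDB versions that still ship a Makefile (the listing shows 1.19); newer checkouts build with CMake instead.
1. Download the LevelDB sources:

git clone https://github.com/google/leveldb.git

2. Change into the leveldb directory and run make.
When it finishes, leveldb/ contains two new directories, out-shared and out-static. out-shared/ holds:

db db_bench helpers libleveldb.so libleveldb.so.1 libleveldb.so.1.19 port table util

3. Copy the library and headers into place:

sudo cp out-shared/libleveldb.so* /usr/local/lib && sudo cp -R include/* /usr/local/include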

make: protoc: command not found

Solution:

sudo apt-get install protobuf-c-compiler protobuf-compiler

fatal error: gflags/gflags.h: No such file or directory

Solution:

sudo apt-get install libgflags-dev

fatal error: glog/logging.h: No such file or directory

Solution:

sudo apt-get install libgoogle-glog-dev

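fatal error: hdf5.h: No such file or directory

Solution:
Install the HDF5 development package. On Ubuntu 16.04 it places the header under /usr/include/hdf5/serial, so the Makefile.config change from the first section is still needed.

sudo apt-get install libhdf5-dev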

fatal error: lmdb.h: No such file or directory

Solution:

sudo apt-get install liblmdb-dev

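./include/caffe/util/cudnn.hpp:5:19: fatal error: cudnn.h: No such file or directory

Solution: add the directory that contains cudnn.h to INCLUDE_DIRS in Makefile.config (for example /usr/local/cuda/include, if cuDNN was copied into the CUDA tree).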

error: ‘GetSolver’ is not a member of ‘caffe’

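Solution:
GetSolver was replaced by SolverRegistry in newer Caffe versions. Both are templates, so the type argument is required; float is used here, substitute the Dtype your solver uses.

Change
mySolver.reset(caffe::GetSolver<float>(solver_param));
to
mySolver.reset(caffe::SolverRegistry<float>::CreateSolver(solver_param));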

caffe/blob.hpp:9:34: fatal error: caffe/proto/caffe.pb.h: No such file or directory

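Solution:
Use protoc to generate caffe.pb.h and caffe.pb.cc from caffe/src/caffe/proto/caffe.proto. The . below writes the output into the current directory; if the header is still not found afterwards, copy caffe.pb.h to include/caffe/proto/, which is where the #include expects it.

~/caffe/src/caffe/proto$ protoc --cpp_out=. caffe.proto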

Makefile:594: recipe for target ‘.build_release/cuda/src/caffe/layers/power_layer.o’ failed

Solution:
1. CUDA_ARCH in Makefile.config was not set as required for the installed CUDA version (CUDA 9.0 here):

    # CUDA architecture setting: going with all of them.
    # For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
    # For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
    # For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
    CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
                    -gencode arch=compute_20,code=sm_21 \
                    -gencode arch=compute_30,code=sm_30 \
                    -gencode arch=compute_35,code=sm_35 \
                    -gencode arch=compute_50,code=sm_50 \
                    -gencode arch=compute_52,code=sm_52 \
                    -gencode arch=compute_60,code=sm_60 \
                    -gencode arch=compute_61,code=sm_61 \
                    -gencode arch=compute_61,code=compute_61

2. Delete the lines that are incompatible with your CUDA version. For CUDA 9.0 these are:

    -gencode arch=compute_20,code=sm_20 \
    -gencode arch=compute_20,code=sm_21 \

3. In Makefile.config a long line can be continued with \. The \ must be the very last character on the line: nothing may follow it, not even a space, or the build fails.
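Trailing whitespace after a continuation backslash is invisible in most editors, so it helps to check for it mechanically. The sketch below uses two hypothetical helper names, find_bad_continuations and fix_bad_continuations, and assumes GNU grep and GNU sed as shipped with Ubuntu.

```shell
# Hypothetical helpers (not part of Caffe or make).

# Print "line-number:line" for every line where a continuation backslash
# is followed by trailing spaces, tabs, or a stray carriage return.
find_bad_continuations() {
    grep -nE '\\[[:space:]]+$' "$1"
}

# Strip that trailing whitespace in place so the backslash ends the line.
fix_bad_continuations() {
    sed -i -E 's/\\[[:space:]]+$/\\/' "$1"
}

# Usage:
#   find_bad_continuations Makefile.config
#   fix_bad_continuations Makefile.config
```

find_bad_continuations prints nothing and returns a non-zero status when the file is clean, which makes it easy to use in a pre-build check.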

error while loading shared libraries: libcudnn.so.7: cannot open shared object file: No such file or directory

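Solution:
1. The lib directory is already on the system library path; ls -al shows that the problem is with the file permissions, so first delete the existing symlinks:

cd /usr/local/cuda/lib64/

sudo rm -rf libcudnn.so libcudnn.so.7

2. Fix the file permissions and create new symlinks: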

sudo chmod u=rwx,g=rx,o=rx libcudnn.so.7.1.2

sudo ln -s libcudnn.so.7.1.2 libcudnn.so.7

sudo ln -s libcudnn.so.7 libcudnn.so

sudo ldconfig -v
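After recreating the links it is worth confirming that the whole chain resolves to a readable file. The following is a rough sketch under stated assumptions: check_cudnn_links is a hypothetical helper name, the default directory is the one used above, and readlink -e comes from GNU coreutils.

```shell
# Hypothetical helper: verify that libcudnn.so and libcudnn.so.7 both
# resolve, through their symlinks, to an existing and readable file.
check_cudnn_links() {
    local dir="${1:-/usr/local/cuda/lib64}"
    local name
    local target
    for name in libcudnn.so libcudnn.so.7; do
        # readlink -e fails if any link in the chain is dangling.
        if ! target=$(readlink -e "$dir/$name"); then
            echo "broken or missing: $dir/$name"
            return 1
        fi
        if [ ! -r "$target" ]; then
            echo "not readable: $target"
            return 1
        fi
        echo "$name -> $target"
    done
}

# Usage:
#   check_cudnn_links                 # checks /usr/local/cuda/lib64
#   check_cudnn_links /some/other/dir
```

The readability test runs as the calling user, so run it as the account that will actually launch Caffe rather than as root.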

Makefile:532: recipe for target ‘runtest’ failed

Solution:

export MKL_CBWR=AUTO

export CUDA_VISIBLE_DEVICES=0

Check failed: error == cudaSuccess (8 vs. 0) invalid device function

Solution:
1. Edit this part of Makefile.config:

# CUDA architecture setting: going with all of them (up to CUDA 5.5 compatible).
# For the latest architecture, you need to install CUDA >= 6.0 and uncomment
# the *_50 lines below.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
             -gencode arch=compute_20,code=sm_21 \
             -gencode arch=compute_30,code=sm_30 \
             -gencode arch=compute_35,code=sm_35 \
             -gencode=arch=compute_50,code=sm_50 \
             -gencode=arch=compute_50,code=compute_50

2. Look up the compute capability of your NVIDIA card, then adjust CUDA_ARCH in Makefile.config to match, commenting out the entries that do not apply.
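Once the compute capability is known, the matching CUDA_ARCH entry follows a fixed pattern. A minimal sketch, assuming the capability is given as a string such as 6.1; gencode_for is a hypothetical helper name. Recent NVIDIA drivers can report the value through the compute_cap query field of nvidia-smi, but older drivers cannot, in which case NVIDIA's published GPU list is the fallback.

```shell
# Hypothetical helper: turn a compute capability such as "6.1" into the
# corresponding -gencode flag for CUDA_ARCH.
gencode_for() {
    local cc
    # Drop the dot: "6.1" becomes "61".
    cc=$(printf '%s' "$1" | tr -d '.')
    echo "-gencode arch=compute_${cc},code=sm_${cc}"
}

# Usage:
#   gencode_for 6.1
#   gencode_for 5.2
```

The output line can be pasted into CUDA_ARCH as is; remember that every entry except the last needs a trailing \ with nothing after it.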

Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version

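Solution:
1. On a dual-GPU system, the NVIDIA card must be the one currently in use while CUDA runs. Otherwise no GPU device can be obtained and cudaGetDeviceCount fails with error code 35.
2. When switching to the NVIDIA card with nvidia-prime, logging out and back in as the prompt suggests is not enough. You must reboot, otherwise you get error code 30.
3. Caffe calls CUDA's cudaGetDeviceCount function to obtain the GPU devices. To check which GPU is active, run

nvidia-settings

4. The PRIME Profiles page did indeed show the Intel integrated GPU as the one in use, so I switched to the NVIDIA card and rebooted.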

error while loading shared libraries: libcublas.so.8.0: cannot open shared object file: No such file or directory

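Solution:
cd into the caffe directory and rebuild, so that the binaries are linked against the CUDA libraries that are actually installed: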

cd caffe/

1.make all
2.make test
3.make runtest

HDF5 library and header mismatch error

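Solution:
While installing Caffe, sudo make runtest reported that the HDF5 header version does not match the HDF5 library. The main cause is a conflict between Anaconda and the system installation: several versions of the h5py library exist on the system, and the system Python still has an older h5py, so keep only the new version.
1. Download the matching version from the hdfgroup.org website.
First make sure the Python environment variables are configured correctly.
In my case the message was: Headers are 1.10.1, library is 1.8.16
2. Uninstall the old version that causes the conflict:

sudo conda uninstall h5py

3. Unpack and install the new version:

tar -xvf hdf5-1.8.16.tar
cd hdf5-1.8.16
./configure --prefix=/usr/local/hdf5
make
make check
sudo make install
make check-install
conda install h5py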

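Summary:

Build environments and dependencies differ, so everyone runs into a different set of problems. In most cases, though, the first two fixes above resolve the majority of them. Check CUDA, cuDNN and OpenCV version compatibility first, and take extra care when installing Caffe.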