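Distributed Parallel Caffe on Ubuntu 16.04

Mostly based on this reference tutorial: http://www.cnblogs.com/beihaidao/p/6866342.html

Parallel Caffe repository: https://github.com/yjxiong/caffe.git

To download: git clone https://github.com/yjxiong/caffe.git

Before installing Caffe, install CUDA, cuDNN, OpenCV, OpenMPI, HDF5, and so on (these are the dependencies I ran into during my own installation).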
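A quick way to see which command-line tools are already present before starting is sketched below. It only checks commands on PATH; libraries such as cuDNN, OpenCV, and HDF5 are not covered. The helper name check_tools and the list of tool names in the example call are assumptions for illustration.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and
# returns non-zero if at least one command is absent.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Example call (tool names are assumptions; adjust to your setup):
# check_tools nvcc mpirun cmake git
```

An empty output with a zero exit status means every listed command was found.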

 

CUDA and cuDNN

(Installation steps omitted.)

Versions: CUDA 8.0, cuDNN 5.0

Installing OpenCV

Version: 2.4.13

Build command (CMake configuration):

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so -D CUDA_ARCH_BIN=5.2 -D CUDA_ARCH_PTX="" -D WITH_CUDA=ON -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D ENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 -D WITH_NVCUVID:BOOL="1" .

 

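Installing OpenMPI

Version: 2.1.0

Based on this online tutorial (http://blog.youkuaiyun.com/among12345/article/details/69053506)

Some details are worth spelling out.

Official installation steps (the example there uses version 3.0.0; substitute the tarball you actually downloaded, 2.1.0 in my case): https://www.open-mpi.org/faq/?category=building#easy-build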

 

gunzip -c openmpi-3.0.0.tar.gz | tar xf -
cd openmpi-3.0.0
./configure --prefix=/usr/local
<...lots of output...>
make all install


Two things need to change:

1. Following yjxiong's answer in issue 9 (https://github.com/yjxiong/caffe/issues/9), the full configure command is:

 

./configure --prefix=/usr/local --with-cuda --enable-mpi-thread-multiple


2. Run make all install with sudo. Otherwise the install step may fail, since /usr/local is normally writable only by root.

 

 

 

Then test whether the installation succeeded:

 

cd openmpi-2.1.0/examples

make

mpirun -np 4 hello_c
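Because the build was configured with --with-cuda, it may also be worth confirming that the installed OpenMPI reports CUDA support. The sketch below filters the output of ompi_info for the mpi_built_with_cuda_support parameter, which the Open MPI FAQ describes. The helper name cuda_support_enabled is an assumption, and the exact parameter line may differ between OpenMPI versions.

```shell
# Succeeds if the ompi_info output read from stdin reports
# that this OpenMPI build has CUDA support enabled.
cuda_support_enabled() {
    grep -q 'mpi_built_with_cuda_support:value:true'
}

# Usage on a machine where OpenMPI is installed:
# ompi_info --parsable --all | cuda_support_enabled \
#     && echo "CUDA-aware" || echo "not CUDA-aware"
```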

 

 

 

 

 

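Installing Caffe (the key part)

Go to the caffe directory, i.e. the folder fetched with git clone.

Copy Makefile.config.example to a new file named Makefile.config.

Edit Makefile.config. The final result looks like this: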

 

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!
 
# cuDNN acceleration switch (uncomment to build with cuDNN).
 USE_CUDNN := 1
 
# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1
 
# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++
 
# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr
 
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50
 
# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas
 
# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib
 
# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
 MATLAB_DIR := /usr/local/MATLAB/R2014a
# MATLAB_DIR := /Applications/MATLAB_R2012b.app
 
# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
        /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
        # $(ANACONDA_HOME)/include/python2.7 \
        # $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \
 
# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib
 
# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib
 
# Uncomment to support layers written in Python (will link against Python libs)
 WITH_PYTHON_LAYER := 1
 
# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
 
# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib
 
# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1
 
BUILD_DIR := build
DISTRIBUTE_DIR := distribute
 
# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1
 
# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0
 
# enable pretty build (comment to see full commands)
Q ?= @

 

 

 

 

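Then run the following commands in the caffe directory.

Create a build folder and enter it:

mkdir build

cd build

Now configure the Caffe build (this is the critical step!).

Configuration command:

cmake -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF -DUSE_MPI=ON -DMPI_CXX_COMPILER=/usr/local/openmpi/bin/mpicxx ..

Note: the mpicxx path must point into the installed OpenMPI, not into the versioned openmpi source folder in your home directory. With --prefix=/usr/local as used in the configure command above, mpicxx would normally be at /usr/local/bin/mpicxx, so adjust the path to wherever it exists on your machine.

Note: -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF is not in the reference tutorial. If the Caffe build (make all) fails with "cannot find -lopencv_dep_cudart", add this option, rerun cmake, and then run make all again.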
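Since the location of mpicxx depends on the prefix OpenMPI was configured with, one option is to pick the first candidate path that actually exists and pass it to cmake. This is only a sketch: the helper name first_executable and the candidate paths in the example are assumptions, not something the fork's documentation prescribes.

```shell
# Print the first path among the arguments that is an executable file.
# Returns non-zero if none of the candidates qualifies.
first_executable() {
    for candidate in "$@"; do
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example (candidate paths are assumptions):
# MPICXX=$(first_executable /usr/local/openmpi/bin/mpicxx /usr/local/bin/mpicxx) \
#     && cmake -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF -DUSE_MPI=ON -DMPI_CXX_COMPILER="$MPICXX" ..
```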

 

 

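Then build and install:

make all must be run from the caffe root directory!

make all -j8 (-j8 runs 8 parallel jobs to speed up the build; it can be omitted)

sudo make install (note the sudo) [this step is probably unnecessary]

Finally, run the tests:

make runtest (the reference tutorial mentions that two tests fail; the same happened in my installation)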

 

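Last is the Python interface (I did not use the MATLAB interface):

Both were already installed before Caffe.

Build the Python interface:

Add the environment variable:

vi ~/.bashrc

Add this line:

export PYTHONPATH=/your/path/caffe/python:$PYTHONPATH

Save and exit, then run source to apply the change:
source ~/.bashrc

Then, in the caffe root directory:

sudo make pycaffe

Note: the command above must be run with sudo!

The build has worked only if the output includes: CXX/LD -o python/caffe/_caffe.so python/caffe/_caffe.cpp

Otherwise import caffe fails with: ImportError: No module named _caffe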
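Whether the compiled module was actually produced can be checked without starting Python. The sketch below tests for python/caffe/_caffe.so under a given Caffe root. The helper name pycaffe_built is an assumption, and /your/path/caffe is the same placeholder used in the PYTHONPATH line.

```shell
# Succeeds if the compiled extension module exists under
# CAFFE_ROOT/python/caffe, where CAFFE_ROOT is the first argument.
pycaffe_built() {
    [ -f "$1/python/caffe/_caffe.so" ]
}

# Usage:
# pycaffe_built /your/path/caffe && echo "_caffe.so found" || echo "_caffe.so missing"
```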

 

Problems encountered during make pycaffe

The reference tutorial links to a page about handling errors, but it did not match the problems I hit.

Problem 1:

 

In file included from src/caffe/solver.cpp:10:0:
./include/caffe/util/io.hpp:8:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
Makefile:516: recipe for target '.build_release/src/caffe/solver.o' failed
make: *** [.build_release/src/caffe/solver.o] Error 1

 

 

(Reference: http://blog.youkuaiyun.com/jessir/article/details/71195115)

Problem: "fatal error: hdf5.h: No such file or directory"
Fix:
(1) On line 85 of Makefile.config (the INCLUDE_DIRS line), add /usr/include/hdf5/serial/ to INCLUDE_DIRS, i.e. change the first line below into the second.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/

 


Problem 2:

 

 

LD -o .build_release/lib/libcaffe.so
/usr/bin/ld: cannot find -lhdf5_hl
/usr/bin/ld: cannot find -lhdf5
collect2: error: ld returned 1 exit status
Makefile:508: recipe for target '.build_release/lib/libcaffe.so' failed
make: *** [.build_release/lib/libcaffe.so] Error 1


Fix:

(Reference: http://blog.youkuaiyun.com/md_learning/article/details/53185992)

Change the lines under "# Whatever else you find you need goes here." from
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
to:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
This is needed because Ubuntu 16.04 changed where headers and libraries are installed, HDF5 in particular, so these paths have to be updated.


cd /usr/lib/x86_64-linux-gnu

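Then, depending on your situation, run the following two commands (the version suffixes must match the libhdf5_serial files actually present in this directory):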
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
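The version suffixes in the two ln commands depend on the HDF5 package installed, so it helps to list the files that are really there first. A minimal sketch follows. The helper name list_hdf5_serial_libs is an assumption, and the default directory is the one used above.

```shell
# List the versioned HDF5 serial libraries found in a directory
# (defaults to /usr/lib/x86_64-linux-gnu). Prints nothing if none exist.
list_hdf5_serial_libs() {
    dir=${1:-/usr/lib/x86_64-linux-gnu}
    for f in "$dir"/libhdf5_serial*.so.*; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Usage:
# list_hdf5_serial_libs
```

Use the file names it prints as the link targets in the ln -s commands.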

 

Problem 3:

 

nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release 
(Use -Wno-deprecated-gpu-targets to suppress warning).

 

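Following the method in this tutorial (https://www.jianshu.com/p/7df78120803a):

Delete these entries from Makefile.config (the CUDA_ARCH := prefix itself must stay, in front of the first remaining -gencode entry):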

 

-gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \

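Then run make again.

The cause is that newer CUDA toolkits (CUDA 8.0 here) deprecate these older GPU architectures. The warning comes from nvcc, not from Caffe itself.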
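The deletion described above could also be scripted. The sketch below strips the two compute_20 entries with sed while keeping the CUDA_ARCH := prefix, and saves a backup next to the file. The helper name drop_compute_20 is an assumption, and the patterns assume the entries are written exactly as in the Makefile.config shown earlier.

```shell
# Remove the deprecated compute_20 -gencode entries from a Makefile.config,
# keeping the "CUDA_ARCH :=" prefix. The original is saved as FILE.bak.
drop_compute_20() {
    cp "$1" "$1.bak" &&
    sed -e 's/-gencode arch=compute_20,code=sm_2[01] *//' \
        -e '/^[[:space:]]*\\$/d' "$1.bak" > "$1"
}

# Usage, from the caffe root directory:
# drop_compute_20 Makefile.config
```

The first expression deletes the entry text. The second drops any line left holding only a continuation backslash, so the variable definition stays well-formed.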

 

 

Finally, run:

python

import caffe

If no error is reported, the installation succeeded.

 

Problem 4:

>>> import caffe
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hywang/caffe-Multi/python/caffe/__init__.py", line 1, in <module>
    from .pycaffe import Net, SGDSolver
  File "/home/hywang/caffe-Multi/python/caffe/pycaffe.py", line 14, in <module>
    import caffe.io
  File "/home/hywang/caffe-Multi/python/caffe/io.py", line 8, in <module>
    from caffe.proto import caffe_pb2
  File "/home/hywang/caffe-Multi/python/caffe/proto/caffe_pb2.py", line 6, in <module>
    from google.protobuf.internal import enum_type_wrapper
ImportError: No module named google.protobuf.internal

According to this Stack Overflow answer (https://stackoverflow.com/questions/37666241/importing-caffe-results-in-importerror-no-module-named-google-protobuf-interna):

Installing protobuf fixes it:

pip install protobuf

 

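------------------------Follow-up--------------------------------

With the steps above the installation completes, but running code then fails with an error:

"field named 'fix_crop' can not found" (approximate wording)

I reinstalled using a different sequence of steps:

1. cp Makefile.config.example Makefile.config (then edit the file: cuDNN; PYTHON_INCLUDE; PYTHON_LIB)

2. mkdir build; cd build; cmake -DUSE_MPI=ON ..

3. make all -j (note: this make runs inside build)

4. cd back to the caffe root (cd .. from build); make all -j (note: this make runs in the caffe root directory)

5. make pycaffe -j

Reference: https://www.cnblogs.com/go-better/p/7161006.html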

 

 

 

 

 

 

 

 

 
