Table of Contents
- 1 Ubuntu Server Installation
- 2 Pre-Installation Actions
- [2.1. Verify You Have a CUDA-Capable GPU](https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html#verify-you-have-cuda-enabled-system)
- [2.2. Verify You Have a Supported Version of Linux](https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html#verify-you-have-supported-version-of-linux)
- [2.3. Verify the System Has gcc Installed](https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html#verify-that-gcc-is-installed)
- [2.4. Verify the System has the Correct Kernel Headers and Development Packages Installed](https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html#verify-kernel-packages)
- [2.5. Choose an Installation Method](https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html#choose-installation-method)
- [2.6. Download the NVIDIA CUDA Toolkit](https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html#download-nvidia-driver-and-cuda-software)
- 3 CUDA 8.0 Installation
- 4 cuDNN
- 5 Anaconda3
- 6 TensorFlow
- 7 References
Hardware
Host: Dell T620 tower server
GPU: NVIDIA Tesla K20c
System & driver configuration
Ubuntu Server 16.04
NVIDIA driver: 375.26
CUDA 8.0
cuDNN 7.0.5
TensorFlow 1.3
1 Ubuntu Server Installation
OS: Ubuntu Server 16.04
Installation method: optical disc (installing from a USB drive runs into an issue where the ISO file cannot be mounted)
Steps to burn the installation disc on Windows:
- Open the ISO file with Windows Disc Image Burner
- Click Burn
1.1 Network
After installation the DHCP client may not be running; start it manually with the dhclient command.
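A minimal sketch of requesting a lease manually, assuming the interface name eno2 seen in the ifconfig output below (substitute your own):
# Bring up DHCP on a specific interface
sudo dhclient eno2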
Use ifconfig to view the network interface information:
zjw@t620:~$ ifconfig
eno1 Link encap:Ethernet HWaddr f0:1f:af:e8:79:0e
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Memory:dad00000-dadfffff
eno2 Link encap:Ethernet HWaddr f0:1f:af:e8:79:0f
inet addr:219.223.196.65 Bcast:219.223.199.255 Mask:255.255.248.0
inet6 addr: 2001:250:3c02:200:760:ae61:106:5168/128 Scope:Global
inet6 addr: fe80::306f:6961:652b:a64b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:16373 errors:0 dropped:0 overruns:0 frame:0
TX packets:6720 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15745390 (15.7 MB) TX bytes:971604 (971.6 KB)
Memory:dae00000-daefffff
idrac Link encap:Ethernet HWaddr f0:1f:af:e8:79:11
inet addr:169.254.0.2 Bcast:169.254.0.255 Mask:255.255.255.0
inet6 addr: fe80::b5f1:8170:e317:6c31/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2 errors:0 dropped:0 overruns:0 frame:0
TX packets:33 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:594 (594.0 B) TX bytes:4752 (4.7 KB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:232 errors:0 dropped:0 overruns:0 frame:0
TX packets:232 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:25105 (25.1 KB) TX bytes:25105 (25.1 KB)
This server's address is the one on the eno2 interface.
On the campus network you normally have to log in with a username and password through a browser before getting Internet access; the following command does the same thing from the console:
curl -X POST -F 'action=login' -F 'username=<your-account>' -F 'password=<your-password>' -F 'ac_id=1' -F 'ajax=1' http://10.0.10.66/include/auth_action.php
Before running it, 10.0.10.66 must be reachable; after logging in, test whether the machine is actually online (see the check and the curl test below).
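A minimal reachability check before attempting the login (nothing is assumed beyond the auth server address above):
# Confirm the auth server responds before sending the login request
ping -c 3 10.0.10.66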
zjw@t620:~$ curl www.baidu.com
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下,你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>使用百度前必读</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a> 京ICP证030173号 <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
The page content was fetched, so the machine is online.
1.2 Switch the apt sources to the Tsinghua (TUNA) mirror
Ubuntu's package source configuration file is /etc/apt/sources.list. Back up the stock file, then replace its contents with the following to use the TUNA mirror.
$ sudo vim /etc/apt/sources.list    # Ubuntu Server has no desktop, so use a console editor such as vim or nano
Replace the contents with:
# Source (deb-src) mirrors are commented out by default to speed up apt update; uncomment them if needed
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
# Pre-release (proposed) repository; enabling it is not recommended
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
After editing, run sudo apt-get update for the change to take effect.
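Putting the whole step together, a minimal sketch (the backup file name is just an example):
# Back up the stock source list, edit it, then refresh the package index
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo vim /etc/apt/sources.list    # paste in the TUNA entries above
sudo apt-get update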
1.3 Remote login
Install OpenSSH: sudo apt-get install openssh-server
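Once sshd is running you can log in from another machine; a sketch using the user name and address seen earlier in this guide (substitute your own):
# Connect from a client machine
ssh zjw@219.223.196.65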
==================================================
Check GPU status with nvidia-smi:
zjw@t620:~$ nvidia-smi
Sat Sep 22 15:22:55 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20c Off | 00000000:02:00.0 Off | 0 |
| 30% 33C P0 52W / 225W | 0MiB / 4742MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Download the driver: https://www.nvidia.cn/Download/index.aspx?lang=cn
According to the driver search results, on Ubuntu 16.04 the K20c requires CUDA 8.0 or newer.
2 Pre-Installation Actions
2.1. Verify You Have a CUDA-Capable GPU
zjw@t620:~$ lspci | grep -i nvidia
02:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
2.2. Verify You Have a Supported Version of Linux
zjw@t620:~$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
2.3. Verify the System Has gcc Installed
zjw@t620:~$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2.4. Verify the System has the Correct Kernel Headers and Development Packages Installed
zjw@t620:~$ uname -r
4.4.0-87-generic
Install the matching kernel headers and development packages:
$ sudo apt-get install linux-headers-$(uname -r)
2.5. Choose an Installation Method
runfile (recommended) / deb
2.6. Download the NVIDIA CUDA Toolkit
CUDA Toolkit 8.0 download: https://developer.nvidia.com/cuda-80-ga2-download-archive
CUDA Toolkit 8.0 installation guide (following it closely should avoid most problems): https://docs.nvidia.com/cuda/archive/8.0/
3 CUDA 8.0 Installation
First remove any existing CUDA packages:
sudo apt-get remove cuda
sudo apt-get autoclean
sudo apt-get --purge remove nvidia*    # remove NVIDIA-related packages
Then change to /usr/local/ and delete any old CUDA directories:
cd /usr/local/
sudo rm -r cuda-*
3.1 runfile installation
The runfile method is recommended (a deb installation is more awkward to remove later).
runfile download: https://developer.nvidia.com/cuda-80-ga2-download-archive
You can fetch it with wget:
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run
sudo sh cuda_8.0.61_375.26_linux-run
The installer walks through a series of prompts. Answer no to the OpenGL installation and yes (or the default) to everything else. If a newer driver is already installed, do not let this installer install the driver.
-------------------------------------------------------------
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
(y)es/(n)o/(q)uit: n
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/cmfchina ]:
Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Installing the CUDA Samples in /home/cmfchina ...
Copying samples to /home/cmfchina/NVIDIA_CUDA-8.0_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /home/cmfchina, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-8.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
3.2 Set the environment variables
Edit the shell configuration file:
vim ~/.bashrc
Append the following lines at the end of the file (press 'i' in vim to start editing):
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
Save and exit, then run the following so the changes take effect immediately:
source ~/.bashrc
sudo ldconfig
Reboot once the installation is complete.
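An optional quick check that the variables are actually visible in the current shell (paths as set above):
# The CUDA paths should appear in both variables, and nvcc should resolve to the toolkit
echo $PATH | grep cuda
echo $LD_LIBRARY_PATH | grep cuda
which nvcc    # expected: /usr/local/cuda-8.0/bin/nvcc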
3.3 Check the CUDA configuration
root@t620:/home/zjw# nvidia-smi
Sun Sep 23 17:31:45 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20c Off | 0000:02:00.0 Off | 0 |
| 30% 32C P0 52W / 225W | 0MiB / 4742MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Check that CUDA is configured correctly:
zjw@t620:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
3.4 Test the CUDA samples
Go into the CUDA samples directory and run make to build all of the demos. Note that this builds every demo under the samples folder, which is convenient but slow; if you only want to test a single example, cd into its folder and build just that one (see the sketch after the command block below).
# Change to the CUDA samples directory
cd /usr/local/cuda-8.0/samples    # or cd /home/NVIDIA_CUDA-8.0_Samples
# If make is not installed, install it first: sudo apt-get install build-essential
# -j builds in parallel, using the CPU as fully as possible to speed up compilation
make -j
# When the build finishes, change to the release directory
# (full path: /usr/local/cuda-8.0/samples/bin/x86_64/linux/release)
cd ./bin/x86_64/linux/release
# Verify the installation by running an example
./deviceQuery
# The output lists your NVIDIA GPU's properties; Result = PASS at the end means success.
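If you only want to build one sample rather than the whole tree, a minimal sketch using deviceQuery (the 1_Utilities path is the standard layout of the CUDA 8.0 samples):
# Build and run just deviceQuery
cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
sudo make    # sudo is needed if the samples live under /usr/local
./deviceQuery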
After the build completes, switch to the bin directory and run ./deviceQuery:
root@t620:/home/zjw/CUDA_Samples/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Tesla K20c"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 4742 MBytes (4972412928 bytes)
(13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
GPU Max Clock rate: 706 MHz (0.71 GHz)
Memory Clock rate: 2600 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 1310720 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla K20c
Result = PASS
The output lists the GPU's properties and ends with Result = PASS, which means CUDA is fully and correctly installed.
Next, check the connection between the system and the CUDA-capable device:
root@t620:/home/zjw/CUDA_Samples/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release# ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: Tesla K20c
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6160.9
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6550.6
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 146967.3
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
4 cuDNN
Official installation guide (following it closely should avoid most problems)
4.1 Download cuDNN
cuDNN is NVIDIA's GPU-accelerated library for deep neural networks. Download it from the official site (https://developer.nvidia.com/rdp/cudnn-download); an NVIDIA developer account is required, so register one if you do not have it. Since this machine has a K20c with CUDA 8.0, the newest usable version is v7:
The download speed can drop to a few KB/s because NVIDIA blocks many Chinese IPs; switching a proxy to global mode so the download goes through a foreign IP fixes it (tested).
4.2 Install cuDNN
Installing cuDNN is straightforward: it boils down to copying a few files, namely the header and the libraries. Copy the cuDNN header into CUDA's include directory and the cuDNN libraries into CUDA's lib64 directory, as follows.
# Extract the archive (the download is saved with a .solitairetheme8 extension; rename it to .tgz first)
zjw@t620:~$ cp cudnn-8.0-linux-x64-v7.solitairetheme8 cudnn-8.0-linux-x64-v7.tgz
zjw@t620:~$ tar -zxvf cudnn-8.0-linux-x64-v7.tgz
# Change into the cuda directory that was just extracted
cd cuda
# Copy the header from include/ into the CUDA include directory
sudo cp include/cudnn.h /usr/local/cuda/include/
# Copy the libraries from lib64/ into the CUDA lib64 directory
sudo cp lib64/libcudnn* /usr/local/cuda/lib64/
# Set read permissions
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
# ====== Update the symlinks ======
cd /usr/local/cuda/lib64/
sudo rm -rf libcudnn.so libcudnn.so.7    # remove the old links; the version number may differ, check the cuDNN lib64 folder
sudo ln -s libcudnn.so.7.0.5 libcudnn.so.7    # create the symlink (match the cuDNN version you downloaded; check your libcudnn version in /usr/local/cuda/lib64)
sudo ln -s libcudnn.so.7 libcudnn.so    # create the symlink
sudo ldconfig -v    # make it take effect immediately
Note: the version numbers in the symlinks above must match the cuDNN library version you actually downloaded.
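An optional way to confirm which version the installed header reports (this assumes cudnn.h was copied to /usr/local/cuda/include as above; cuDNN 7.x defines these macros there):
# Print the cuDNN version macros from the installed header
grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h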
Finally, verify that CUDA still works after installing cuDNN:
zjw@t620:/usr/local/cuda/lib64$ nvcc --version # or nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
4.3 Verify the cuDNN installation
cuDNN is now installed, but to confirm it actually works we can run a cuDNN sample. On https://developer.nvidia.com/rdp/cudnn-archive find the entry for your cuDNN version; it includes a cuDNN Code Samples download (the version label there may differ, just take the one matching your install).
After downloading, go to the mnistCUDNN directory inside the extracted samples:
# Copy the cuDNN sample to a writable path.
$cp -r /usr/src/cudnn_samples_v7/ $HOME
# Go to the writable path
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
# Compile the mnistCUDNN sample
$make clean
$make
Run the mnistCUDNN sample
zjw@t620:~/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7005 , CUDNN_VERSION from cudnn.h : 7005 (7.0.5)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 13 Capabilities 3.5, SmClock 705.5 Mhz, MemSize (Mb) 4742, MemClock 2600.0 Mhz, Ecc=1, boardGroupID=0
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 2
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.041376 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.079680 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.092288 time requiring 100 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.153120 time requiring 203008 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.172800 time requiring 207360 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 2
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.061024 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.061312 time requiring 100 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.086560 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.172704 time requiring 203008 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.186944 time requiring 207360 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Test passed! cuDNN is now installed successfully.
5 Anaconda3
Anaconda is a scientific-computing distribution of Python that bundles hundreds of commonly used packages, many of which are TensorFlow dependencies. Installing Anaconda gives us a clean environment in which to install TensorFlow directly.
bash Anaconda3-4.2.0-Linux-x86_64.sh
Run the installer and press Enter to page through the license, then accept it. At the end it asks whether to add Anaconda's bin directory to the user's PATH; answer yes. If typing python in the terminal still launches the system Python, the PATH change simply has not taken effect yet; run the commands below to reload it. If conda --version prints nothing at all, the PATH was never updated and you need to add it by hand:
root@t620:/home/zjw# vim ~/.bashrc
root@t620:/home/zjw# source ~/.bashrc
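If it has to be added by hand, the line appended to ~/.bashrc is typically the following sketch, assuming the default install location ~/anaconda3 (adjust if Anaconda was installed elsewhere):
# Put Anaconda's bin directory in front of the system PATH
export PATH="$HOME/anaconda3/bin:$PATH"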
Check that the environment change took effect:
zjw@t620:~$ conda --version
conda 4.4.10
zjw@t620:~$ python
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The reported Python is now Anaconda's build rather than the system's, so the environment update has taken effect.
6 TensorFlow
See the official TensorFlow installation guide (https://www.tensorflow.org/install/); it covers installing with pip, Docker, Virtualenv, Anaconda, or building from source. Here we only walk through the Anaconda route; for the other methods, refer to the official guide.
6.1 Install TensorFlow
TensorFlow is installed here through Anaconda. The official release downloads are provided on GitHub (https://github.com/tensorflow/tensorflow); find the release matching the version you need.
Official documentation for installing TensorFlow with Anaconda: https://www.tensorflow.org/install/install_linux#InstallingAnaconda
6.2 Create a conda environment named tensorflow (Python 3.6)
# Python 2.7
conda create -n tensorflow python=2.7
# Python 3.4
conda create -n tensorflow python=3.4
# Python 3.5
conda create -n tensorflow python=3.5
# Python 3.6
conda create -n tensorflow python=3.6    # the TensorFlow wheel used here targets Python 3.6, so this is the line used
Note: the Python version must match the TensorFlow build you intend to install; get this wrong and all kinds of errors show up later.
Environment creation failed with:
zjw@t620:~$ conda create -n tensorflow python=3.6
Solving environment: failed
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.continuum.io/pkgs/main/linux-64/repodata.json.bz2>
Elapsed: -
...
Fix: turn off the VPN.
The following may also help, though it is not guaranteed to:
# Add the Tsinghua mirror channels first
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --set show_channel_urls yes
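To confirm the mirror channels were registered, an optional check:
# List the configured channels; the TUNA URLs should appear here
conda config --get channels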
6.3 Activate the conda environment
zjw@t620:~$ conda create -n tensorflow pip python=3.6
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.4.10
latest version: 4.5.11
Please update conda by running
$ conda update -n base conda
## Package Plan ##
environment location: /home/zjw/.conda/envs/tensorflow
added / updated specs:
- pip
- python=3.6
The following packages will be downloaded:
package | build
---------------------------|-----------------
xz-5.2.3 | 0 667 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
tk-8.5.18 | 0 1.9 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
wheel-0.29.0 | py36_0 88 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
openssl-1.0.2l | 0 3.2 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
readline-6.2 | 2 606 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
python-3.6.2 | 0 16.5 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pip-9.0.1 | py36_1 1.7 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
sqlite-3.13.0 | 0 4.0 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
certifi-2016.2.28 | py36_0 216 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
setuptools-36.4.0 | py36_1 563 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
zlib-1.2.11 | 0 109 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
------------------------------------------------------------
Total: 29.3 MB
The following NEW packages will be INSTALLED:
certifi: 2016.2.28-py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
openssl: 1.0.2l-0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
pip: 9.0.1-py36_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
python: 3.6.2-0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
readline: 6.2-2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
setuptools: 36.4.0-py36_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
sqlite: 3.13.0-0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
tk: 8.5.18-0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
wheel: 0.29.0-py36_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
xz: 5.2.3-0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
zlib: 1.2.11-0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
Proceed ([y]/n)? y
Downloading and Extracting Packages
xz 5.2.3: ################################################################################### | 100%
tk 8.5.18: ################################################################################## | 100%
wheel 0.29.0: ############################################################################### | 100%
openssl 1.0.2l: ############################################################################# | 100%
readline 6.2: ############################################################################### | 100%
python 3.6.2: ############################################################################### | 100%
pip 9.0.1: ################################################################################## | 100%
sqlite 3.13.0: ############################################################################## | 100%
certifi 2016.2.28: ########################################################################## | 100%
setuptools 36.4.0: ########################################################################## | 100%
zlib 1.2.11: ################################################################################ | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate tensorflow
#
# To deactivate an active environment, use
#
# $ conda deactivate
zjw@t620:~$ conda activate tensorflow
If conda activate is not available in your shell, use the older form instead: source activate tensorflow
6.4 Install the GPU build of TensorFlow in the conda environment
Since the conda environment was created with Python 3.6, use the Python 3.6 GPU wheel URL for the installation:
# Install the TensorFlow GPU build for Python 3.6
sudo pip3 install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.3.0-cp36-cp36m-linux_x86_64.whl
# Note: the cpXX / cpXXm parts of the URL correspond to the Python version
That failed, so instead download the GPU wheel manually, activate the environment, and install the local .whl file:
(tensorflow) zjw@t620:~$ wget https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.3.0-cp36-cp36m-linux_x86_64.whl
source activate tensorflow    # activate the tensorflow environment (skip this if it is already active)
(tensorflow) zjw@t620:~$ pip install --ignore-installed --upgrade tensorflow_gpu-1.3.0-cp36-cp36m-linux_x86_64.whl
Processing ./tensorflow_gpu-1.3.0-cp36-cp36m-linux_x86_64.whl
Collecting wheel>=0.26 (from tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/81/30/e935244ca6165187ae8be876b6316ae201b71485538ffac1d718843025a9/wheel-0.31.1-py2.py3-none-any.whl (41kB)
100% |████████████████████████████████| 51kB 96kB/s
Collecting numpy>=1.11.0 (from tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/22/02/bae88c4aaea4256d890adbf3f7cf33e59a443f9985cf91cd08a35656676a/numpy-1.15.2-cp36-cp36m-manylinux1_x86_64.whl (13.9MB)
100% |████████████████████████████████| 13.9MB 46kB/s
Collecting protobuf>=3.3.0 (from tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/c2/f9/28787754923612ca9bfdffc588daa05580ed70698add063a5629d1a4209d/protobuf-3.6.1-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
100% |████████████████████████████████| 1.1MB 110kB/s
Collecting six>=1.10.0 (from tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Collecting tensorflow-tensorboard<0.2.0,>=0.1.0 (from tensorflow-gpu==1.3.0)
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out. (read timeout=15)",)': /simple/tensorflow-tensorboard/
Downloading https://files.pythonhosted.org/packages/93/31/bb4111c3141d22bd7b2b553a26aa0c1863c86cb723919e5bd7847b3de4fc/tensorflow_tensorboard-0.1.8-py3-none-any.whl (1.6MB)
100% |████████████████████████████████| 1.6MB 531kB/s
Collecting setuptools (from protobuf>=3.3.0->tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/6e/9c/cc2eb661d85f4aa541910af1a72b834a0f5c9209079fcbd1438fa6da17c6/setuptools-40.4.2-py2.py3-none-any.whl (569kB)
100% |████████████████████████████████| 573kB 303kB/s
Collecting werkzeug>=0.11.10 (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/20/c4/12e3e56473e52375aa29c4764e70d1b8f3efa6682bef8d0aae04fe335243/Werkzeug-0.14.1-py2.py3-none-any.whl (322kB)
100% |████████████████████████████████| 327kB 329kB/s
Collecting bleach==1.5.0 (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/33/70/86c5fec937ea4964184d4d6c4f0b9551564f821e1c3575907639036d9b90/bleach-1.5.0-py2.py3-none-any.whl
Collecting html5lib==0.9999999 (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/ae/ae/bcb60402c60932b32dfaf19bb53870b29eda2cd17551ba5639219fb5ebf9/html5lib-0.9999999.tar.gz (889kB)
100% |████████████████████████████████| 890kB 274kB/s
Collecting markdown>=2.6.8 (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow-gpu==1.3.0)
Downloading https://files.pythonhosted.org/packages/7a/fd/e22357c299e93c0bc11ec8ba54e79f98dd568e09adfe9b39d6852c744938/Markdown-3.0-py2.py3-none-any.whl (89kB)
100% |████████████████████████████████| 92kB 331kB/s
Building wheels for collected packages: html5lib
Running setup.py bdist_wheel for html5lib ... done
Stored in directory: /home/zjw/.cache/pip/wheels/50/ae/f9/d2b189788efcf61d1ee0e36045476735c838898eef1cad6e29
Successfully built html5lib
Installing collected packages: wheel, numpy, six, setuptools, protobuf, werkzeug, html5lib, bleach, markdown, tensorflow-tensorboard, tensorflow-gpu
Successfully installed bleach-1.5.0 html5lib-0.9999999 markdown-3.0 numpy-1.15.2 protobuf-3.6.1 setuptools-40.4.2 six-1.11.0 tensorflow-gpu-1.3.0 tensorflow-tensorboard-0.1.8 werkzeug-0.14.1 wheel-0.31.1
You are using pip version 9.0.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
6.5 Deactivate the environment when you are not using TensorFlow
source deactivate tensorflow
6.6 After installation, activate the conda environment (the activation step from 6.3) each time you want to use TensorFlow.
6.7 Common problems
The error "ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory" appears:
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/home/cmfchina/.conda/envs/tensorflow/lib/python3.6/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory
Fix:
find / -name libcudnn.so.*
If the file is found, go on to the next step; if not, re-check the cuDNN installation and whether the environment variables from earlier were set correctly.
Create a symbolic link:
sudo ln -s <path>libcudnn.so.7.* <path>libcudnn.so.6    # <path> is the directory containing libcudnn.so.7
# or, after cd-ing into the directory containing libcudnn.so.7:
sudo ln -s libcudnn.so.7.* libcudnn.so.6
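A concrete sketch using the install path from section 4.2; the 7.0.5 version number is the one installed earlier, so adjust it to match your own libcudnn:
# TensorFlow 1.3 looks for libcudnn.so.6, so point that name at the installed cuDNN 7 library
cd /usr/local/cuda/lib64
sudo ln -s libcudnn.so.7.0.5 libcudnn.so.6
sudo ldconfig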
6.8 Uninstall TensorFlow
If TensorFlow needs to be removed, use:
sudo pip uninstall tensorflow #Python2.7
sudo pip3 uninstall tensorflow #Python3.x
6.9 Test TensorFlow
(tensorflow) zjw@t620:/usr/local$ python
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2018-09-24 10:25:36.621775: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-09-24 10:25:36.621832: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-09-24 10:25:36.621852: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-09-24 10:25:38.176557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla K20c
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:02:00.0
Total memory: 4.63GiB
Free memory: 4.57GiB
2018-09-24 10:25:38.176610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2018-09-24 10:25:38.176620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2018-09-24 10:25:38.176637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20c, pci bus id: 0000:02:00.0)
>>> sess.run(hello)
b'Hello, TensorFlow!'
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> sess.run(a + b)
42
>>> sess.close()