Installation steps:
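1. Disable UEFI Secure Boot
Check in the BIOS whether UEFI Secure Boot is enabled. If it is, turn it off right away. This is very important: with Secure Boot on, the NVIDIA kernel module is very likely to fail to install or load. I hit exactly this pitfall and wasted a lot of time on it. I won't go into how to turn it off, because the BIOS differs from one computer model to another.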
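Before rebooting into the firmware settings, you can check the current state from Linux. A minimal sketch, assuming a distribution like CentOS 7: the kernel exposes /sys/firmware/efi only when it was booted through UEFI, and mokutil --sb-state (from the mokutil package, which may need to be installed first) reports the Secure Boot state. The function names are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report boot mode and Secure Boot state before installing the driver.

# Prints "UEFI" if the kernel was booted through UEFI, "legacy" otherwise.
# The directory argument only exists so the check can be pointed elsewhere for testing.
boot_mode() {
  local efi_dir="${1:-/sys/firmware/efi}"
  if [ -d "$efi_dir" ]; then
    echo "UEFI"
  else
    echo "legacy"
  fi
}

# Reads `mokutil --sb-state` output on stdin; succeeds if Secure Boot is enabled.
secure_boot_enabled() {
  grep -q 'SecureBoot enabled'
}
```

After sourcing the functions:
- $ boot_mode
- $ mokutil --sb-state | secure_boot_enabled && echo "Secure Boot is on: disable it in the BIOS"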
2. Confirm that your GPU supports CUDA
- [littlebei@localhost ~]$ lspci | grep -i nvidia
- 01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 745] (rev a2)
- 01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
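Output like the above shows that the machine has an NVIDIA GPU. To be sure it supports CUDA, look the model up in NVIDIA's list of CUDA-enabled GPUs.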
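The lspci check can be wrapped in a small function for reuse in a setup script. This is a sketch with a hypothetical function name; it only confirms that an NVIDIA display controller is present. Data-center cards such as Tesla models often appear as "3D controller" rather than "VGA compatible controller", so both are matched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `lspci` output on stdin and succeeds if an NVIDIA
# display controller ("VGA compatible controller" or "3D controller") is listed.
# The NVIDIA audio function (e.g. "Audio device: NVIDIA ...") is deliberately ignored.
has_nvidia_gpu() {
  grep -iE 'vga compatible controller|3d controller' | grep -qi 'nvidia'
}
```

After sourcing the function:
- $ lspci | has_nvidia_gpu && echo "NVIDIA GPU found"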
3. Confirm that your Linux version supports CUDA
- [littlebei@localhost ~]$ uname -m && cat /etc/*release
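This prints the machine architecture and the distribution release. Compare them with the supported distributions listed in NVIDIA's CUDA installation guide for the CUDA version you plan to install.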
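To make that comparison easier, the distribution ID and version can be read from /etc/os-release, a systemd-standard file present on CentOS 7 whose contents are shell-style KEY=value lines. A sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<arch> <ID> <VERSION_ID>" for comparison with the
# supported-distribution table in NVIDIA's CUDA installation guide.
# The file is sourced in a subshell so its variables do not leak into the caller.
platform_info() {
  local file="${1:-/etc/os-release}"
  local arch="${2:-$(uname -m)}"
  ( . "$file" && echo "$arch $ID $VERSION_ID" )
}
```

On a 64-bit CentOS 7 machine this should print x86_64 centos 7.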
4. Check whether gcc is installed
- [littlebei@localhost ~]$ gcc --version
- gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
- Copyright (C) 2015 Free Software Foundation, Inc.
- This is free software; see the source for copying conditions. There is NO
- warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
If you see output like the above, gcc is installed.
If it is not installed, install it with the following command:
- [littlebei@localhost ~]$ sudo yum install gcc gcc-c++
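In a script, the check and the install can be combined. A sketch with hypothetical function names; extracting the version number is useful because each CUDA release supports only certain gcc versions (the CUDA installation guide lists the exact range).

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the gcc check from step 4.

# Reads `gcc --version` output on stdin and prints the first X.Y.Z number on the
# first line, e.g. "4.8.5" for "gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)".
gcc_version() {
  head -n1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1
}

# Installs gcc and g++ with yum only if gcc is not already on PATH, then prints its version.
ensure_gcc() {
  if ! command -v gcc >/dev/null 2>&1; then
    sudo yum install -y gcc gcc-c++
  fi
  gcc --version | gcc_version
}
```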
5. Install kernel-devel and kernel-headers
- $ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
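The driver installer needs kernel-devel for exactly the running kernel, and a version mismatch is a frequent cause of the "unable to locate the kernel source" error described in step 9. A sketch with a hypothetical function name that compares the output of rpm -q kernel-devel with uname -r:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `rpm -q kernel-devel` output on stdin (one package per
# line, e.g. "kernel-devel-3.10.0-693.2.2.el7.x86_64") and succeeds if the package
# for the given kernel release (default: the running kernel) is among them.
kernel_devel_matches() {
  local release="${1:-$(uname -r)}"
  grep -qxF "kernel-devel-$release"
}
```

After sourcing the function:
- $ rpm -q kernel-devel | kernel_devel_matches || echo "kernel-devel does not match $(uname -r)"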
6. Stop the X server
- $ sudo systemctl stop gdm.service
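Note that the reboot in step 8 starts gdm again. One way to keep the desktop off until the driver is installed is to make systemd boot into text mode (multi-user.target) and switch back afterwards. A sketch with hypothetical helper names; run it from a text console or over SSH, since it stops the desktop. Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for keeping X/gdm stopped across the reboot in step 8.

# Runs a command, or just prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Leave the graphical session now and boot into text mode from now on.
enter_text_mode() {
  run sudo systemctl isolate multi-user.target
  run sudo systemctl set-default multi-user.target
}

# Once the NVIDIA driver works, make the desktop the default again.
restore_graphical_mode() {
  run sudo systemctl set-default graphical.target
}
```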
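7. Disable nouveau (nouveau is the open-source NVIDIA driver that most Linux distributions ship with; it conflicts with the official NVIDIA driver, so it must be disabled)
(1) Add the nouveau driver to the blacklist:
Add the line blacklist nouveau to /usr/lib/modprobe.d/dist-blacklist.conf (this method applies only to CentOS 7; on other Linux distributions you will need to find the equivalent yourself).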
(2) Back up the current initramfs file, then regenerate it:
- $ sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
- $ sudo dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
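NVIDIA's CUDA installation guide describes an alternative to editing the distribution's own file: a separate blacklist file under /etc/modprobe.d that also turns off nouveau's kernel mode setting, followed by regenerating the initramfs as in (2). A sketch with hypothetical function names; writing to /etc requires root (e.g. run the script with sudo). After the reboot in step 8, nouveau_loaded should fail, meaning nouveau is no longer loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 7.

# Writes a modprobe configuration that blacklists nouveau and disables its modesetting.
# The default path follows NVIDIA's CUDA installation guide; writing it needs root.
write_nouveau_blacklist() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$conf"
}

# Succeeds if the nouveau module is currently loaded (lsmod lists one module per line).
nouveau_loaded() {
  lsmod | grep -q '^nouveau[[:space:]]'
}
```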
8. Reboot
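9. Install the NVIDIA driver
Installing the NVIDIA driver is the most important step: once it succeeds, the rest is mostly smooth sailing.
(1) Use the method from step 2 to find your GPU model, then download the matching driver from NVIDIA's official driver download page.
(2) Install it with the following command: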
- $ sudo sh NVIDIAxxx --kernel-source-path=/usr/src/kernels/x.xx.x-xxxxx
Here NVIDIAxxx is the NVIDIA driver installer script and x.xx.x-xxxxx is the kernel version, which you can find with:
- [littlebei@localhost ~]$ uname -r
- 3.10.0-693.2.2.el7.x86_64
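Because the kernel source directory is /usr/src/kernels/ followed by the output of uname -r, the command can be assembled automatically. A sketch with a hypothetical function name that only prints the command, so it can be checked before running; NVIDIA-Linux-x86_64-375.26.run is just an example installer file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the driver install command for the running kernel
# (or for the kernel release given as the second argument).
nvidia_install_cmd() {
  local installer="$1"
  local release="${2:-$(uname -r)}"
  local ksrc="/usr/src/kernels/$release"
  # Warn early: a missing directory means kernel-devel for this kernel is not installed.
  [ -d "$ksrc" ] || echo "warning: $ksrc not found; install kernel-devel-$release first" >&2
  printf 'sudo sh %s --kernel-source-path=%s\n' "$installer" "$ksrc"
}
```

After sourcing the function:
- $ nvidia_install_cmd NVIDIA-Linux-x86_64-375.26.run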
During installation, you may run into one of the following two errors.
Error 1:
- The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are
- installed and set up correctly.
- If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.
Solution:
- $ sudo yum install epel-release
- $ sudo yum install --enablerepo=epel dkms
Error 2:
- ERROR: Unable to load the 'nvidia-drm' kernel module.
Solution:
- One likely reason is that the system boots via UEFI with the Secure Boot option turned on in the BIOS settings.
- Turn Secure Boot off and the problem should be solved.
(3) Walking through the installer
On the license page choose Accept, on the 32-bit compatibility libraries page choose No, and on the X configuration page choose Yes.
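Once the installer finishes, nvidia-smi should list the GPU. A sketch with hypothetical function names that reads the driver version through nvidia-smi's query interface and compares it with the 361.00 minimum that the CUDA 8.0 installer reports in step 10; the comparison relies on GNU sort -V, which CentOS 7 provides.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for verifying the driver installed in step 9.

# Succeeds if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Reads the driver version of the first GPU and checks it against CUDA 8.0's minimum.
check_driver_for_cuda8() {
  local ver
  ver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  if version_ge "$ver" 361.00; then
    echo "driver $ver is new enough for CUDA 8.0"
  else
    echo "driver '$ver' is older than 361.00 (or nvidia-smi failed)" >&2
    return 1
  fi
}
```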
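10. Install CUDA
Download the CUDA version that matches your OS from NVIDIA's CUDA download page. I recommend not downloading a very new CUDA release, because it is quite likely not to match your TensorFlow version; this is another pitfall I ran into.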
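To make the version matching concrete: TensorFlow's documentation publishes a table of tested build configurations, and based on that table the early tensorflow-gpu 1.x releases line up roughly as in the sketch below. The function name is hypothetical, and the mapping should be double-checked against TensorFlow's current table before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical lookup based on TensorFlow's "tested build configurations" table
# for the early tensorflow-gpu 1.x releases; verify against the official table.
tf_cuda_requirements() {
  case "$1" in
    1.0.*|1.1.*|1.2.*) echo "CUDA 8.0, cuDNN 5.1" ;;
    1.3.*|1.4.*)       echo "CUDA 8.0, cuDNN 6" ;;
    *)                 echo "unknown version: check the TensorFlow docs" >&2; return 1 ;;
  esac
}
```

With the CUDA 8.0 and cuDNN 6.0 packages used in this post, that points to tensorflow-gpu 1.3 or 1.4, so the pip command in step 13 could be pinned, for example:
- $ sudo pip install tensorflow-gpu==1.4.0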
Installation command:
- $ sudo sh cuda_8.0.61_375.26_linux.run
The installer then walks through the following prompts:
- # accept
- -------------------------------------------------------------
- Do you accept the previously read EULA?accept/decline/quit: accept
- # no (the driver was already installed in step 9)
- Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?(y)es/(n)o/(q)uit: n
- -------------------------------------------------------------
- # for the remaining prompts, answer yes or accept the default
- Do you want to install the OpenGL libraries?
- (y)es/(n)o/(q)uit [ default is yes ]:
- Do you want to run nvidia-xconfig?
- This will update the system X configuration file so that the NVIDIA X driver is used.
- The pre-existing X configuration file will be backed up.
- This option should not be used on systems that require a custom X configuration,
- such as systems with multiple GPU vendors.
- (y)es/(n)o/(q)uit [ default is no ]: y
- Install the CUDA 8.0 Toolkit?
- (y)es/(n)o/(q)uit: y
- Enter Toolkit Location [ default is /usr/local/cuda-8.0 ]:
- Do you want to install a symbolic link at /usr/local/cuda?
- (y)es/(n)o/(q)uit: y
- Install the CUDA 8.0 Samples?
- (y)es/(n)o/(q)uit: y
- Enter CUDA Samples Location
- [ default is /root ]:
- Installing the NVIDIA display driver...
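Output like the following means the CUDA Toolkit and samples were installed successfully. The driver-related messages (the driver failure and the "Incomplete installation" warning) do not matter here, since the driver was already installed in step 9: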
- The driver installation has failed due to an unknown error. Please consult the driver
- installation log located at /var/log/nvidia-installer.log.
- ===========
- = Summary =
- ===========
- Driver: Not Selected
- Toolkit: Installed in /usr/local/cuda-8.0
- Samples: Installed in /root, but missing recommended libraries
- Please make sure that
- - PATH includes /usr/local/cuda-8.0/bin
- - LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or,
- add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
- To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
- Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed
- information on setting up CUDA.
- ***WARNING: Incomplete installation! This installation did not install the CUDA Driver.
- A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
- To install the driver using this installer, run the following command,
- replacing <CudaInstaller> with the name of this run file:
- sudo <CudaInstaller>.run -silent -driver
- Logfile is /tmp/cuda_install_192.log
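Since the summary only says where the toolkit went, a quick check that nvcc and the libraries are actually in place can save a debugging round later. A sketch with a hypothetical function name, defaulting to the /usr/local/cuda-8.0 location shown in the summary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that a CUDA toolkit directory contains nvcc and lib64.
check_cuda_toolkit() {
  local root="${1:-/usr/local/cuda-8.0}"
  if [ ! -x "$root/bin/nvcc" ]; then
    echo "nvcc not found under $root/bin" >&2
    return 1
  fi
  if [ ! -d "$root/lib64" ]; then
    echo "$root/lib64 is missing" >&2
    return 1
  fi
  echo "CUDA toolkit found at $root"
}
```

After sourcing the function:
- $ check_cuda_toolkit && /usr/local/cuda-8.0/bin/nvcc --version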
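11. Configure the CUDA environment variables
Edit ~/.bashrc (it is your own file, so sudo is not needed):
- $ vim ~/.bashrc
Add the following lines. The PATH entry makes nvcc available, as the installer summary above requests:
- export PATH=/usr/local/cuda-8.0/bin:$PATH
- export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
- export CUDA_HOME=/usr/local/cuda-8.0/
Then reload the file so the current shell picks up the changes:
- $ source ~/.bashrc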
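If you set up several machines, appending these lines by hand gets repetitive. A sketch with a hypothetical function name that appends the same variables, plus PATH as the installer summary asks, only once, using a marker comment to avoid duplicates:

```shell
#!/usr/bin/env bash
# Hypothetical helper: appends CUDA environment variables to a shell rc file once.
# $1: rc file (default ~/.bashrc), $2: CUDA install directory (default /usr/local/cuda-8.0).
add_cuda_env() {
  local rc="${1:-$HOME/.bashrc}"
  local cuda="${2:-/usr/local/cuda-8.0}"
  local marker="# CUDA environment (added by add_cuda_env)"
  # Skip if a previous run already added the block.
  if grep -qxF "$marker" "$rc" 2>/dev/null; then
    return 0
  fi
  # \$PATH and \$LD_LIBRARY_PATH stay literal so they expand when the rc file is sourced.
  cat >> "$rc" <<EOF
$marker
export PATH=$cuda/bin:\$PATH
export LD_LIBRARY_PATH=$cuda/lib64:$cuda/extras/CUPTI/lib64:\$LD_LIBRARY_PATH
export CUDA_HOME=$cuda
EOF
}
```

After sourcing the function:
- $ add_cuda_env && source ~/.bashrc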
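12. Install cuDNN
Download the cuDNN package from NVIDIA's website (make sure its version matches your CUDA version).
After the download finishes, run the following. The archive unpacks into a cuda/ directory, and copying into /usr/local/cuda requires root:
- $ tar -xvzf cudnn-8.0-linux-x64-v6.0.tgz
- $ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
- $ sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
- $ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*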
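To confirm which cuDNN version ended up in /usr/local/cuda, the version macros in cudnn.h can be read back. A sketch with a hypothetical function name; it assumes the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines live in cudnn.h, which holds for cuDNN releases of this era (cuDNN 8 and later move them to cudnn_version.h).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the cuDNN version (MAJOR.MINOR.PATCHLEVEL) from its header.
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn.h}"
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major "." minor "." patch }
  ' "$header"
}
```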
13. Install the GPU version of TensorFlow
- $ sudo pip install tensorflow-gpu
This installs it directly with pip. If pip is not installed on your machine, see my other blog post, which includes a guide to installing pip.
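14. Test TensorFlow
Having made it past all the bumps along the way, we have finally reached the test. Feeling happy yet? Start the Python interpreter and run the following: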
- Python 2.7.5 (default, Jun 17 2014, 18:11:42)
- [GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
- Type "help", "copyright", "credits" or "license" for more information.
- >>> import tensorflow as tf
- >>> hello = tf.constant('Hello, TensorFlow!')
- >>> sess = tf.Session()
- 2017-06-28 16:42:53.518877: W tensorflow/core/platform/cpu_feature_guard.cc:45]
- The TensorFlow library wasn't compiled to use SSE4.1 instructions,
- but these are available on your machine and could speed up CPU computations.
- 2017-06-28 16:42:53.518906: W tensorflow/core/platform/cpu_feature_guard.cc:45]
- The TensorFlow library wasn't compiled to use SSE4.2 instructions,
- but these are available on your machine and could speed up CPU computations.
- 2017-06-28 16:42:53.518914: W tensorflow/core/platform/cpu_feature_guard.cc:45]
- The TensorFlow library wasn't compiled to use AVX instructions,
- but these are available on your machine and could speed up CPU computations.
- 2017-06-28 16:42:53.518921: W tensorflow/core/platform/cpu_feature_guard.cc:45]
- The TensorFlow library wasn't compiled to use AVX2 instructions,
- but these are available on your machine and could speed up CPU computations.
- 2017-06-28 16:42:53.518929: W tensorflow/core/platform/cpu_feature_guard.cc:45]
- The TensorFlow library wasn't compiled to use FMA instructions,
- but these are available on your machine and could speed up CPU computations.
- 2017-06-28 16:42:54.099744: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901]
- successful NUMA node read from SysFS had negative value (-1),
- but there must be at least one NUMA node, so returning NUMA node zero
- 2017-06-28 16:42:54.100218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887]
- Found device 0 with properties:
- name: Tesla M60
- major: 5 minor: 2 memoryClockRate (GHz) 1.1775
- pciBusID 0000:00:02.0
- Total memory: 7.93GiB
- Free memory: 7.86GiB
- 2017-06-28 16:42:54.100243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
- 2017-06-28 16:42:54.100251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
- 2017-06-28 16:42:54.100266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]
- Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla M60, pci bus id: 0000:00:02.0)
- >>> print(sess.run(hello))
- Hello, TensorFlow!
If you can run the small example above correctly, congratulations: the GPU version of TensorFlow is installed. What are you waiting for? Go build something!
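When scripting this test, the line showing that TensorFlow actually created a GPU device is the "Creating TensorFlow device (/gpu:0)" message in the log above. A sketch with a hypothetical function name that looks for it; the pattern matches the TensorFlow 1.x log format shown here, and other releases may word this line differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads TensorFlow's log output on stdin and succeeds if a
# GPU device was created (matches the TensorFlow 1.x message format shown above).
tf_uses_gpu() {
  grep -qF 'Creating TensorFlow device (/gpu:'
}
```

After sourcing the function (TensorFlow logs to stderr, hence 2>&1):
- $ python -c 'import tensorflow as tf; tf.Session()' 2>&1 | tf_uses_gpu && echo "GPU OK"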