Linux - Specifying the CUDA Environment - Running a tensorflow-gpu Program

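I. Background

Yesterday I needed to deploy a program from Windows 10 to a Linux server.

The program's runtime environment: Python 3.6 + tensorflow-gpu 1.9.

On the server I only had a user account, with no permission to change the system.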

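II. Actual procedure

1. First, get the program ready for deployment.

2. Download PuTTY to connect to the Linux server (PuTTY download link).

3. Download pscp.exe for transferring files to and from the Linux server (pscp is available from the same download page as PuTTY). After downloading it, put pscp.exe in C:/Windows/System32 so that it can be run from any command prompt.

4. Connect to the Linux server: open PuTTY, enter the server's IP address, and then enter the username and password in the window that pops up.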

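5. Use pscp.exe to upload the program to the Linux server. Run it from a local Windows command prompt (not inside the PuTTY session):

pscp source_file username@ip:target_path


For example, to upload a .rar file:
pscp D:/java/apache-tomcat-5.5.2/webapps/szfdc.rar dev@192.168.68.249:/home/dev


To upload a folder, add the -r option:
pscp -r source_folder username@ip:target_path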
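If you upload repeatedly, the pscp calls above can be scripted. The following is a minimal Python sketch, assuming pscp.exe is on PATH as in step 3; `build_pscp_command` is a hypothetical helper, not part of PuTTY.

```python
from typing import List


def build_pscp_command(source: str, user: str, host: str, target: str,
                       recursive: bool = False) -> List[str]:
    """Return the argv list for `pscp [-r] source user@host:target` (hypothetical helper)."""
    cmd = ["pscp"]
    if recursive:
        # -r copies a whole directory tree
        cmd.append("-r")
    cmd += [source, "{}@{}:{}".format(user, host, target)]
    return cmd


if __name__ == "__main__":
    cmd = build_pscp_command(
        "D:/java/apache-tomcat-5.5.2/webapps/szfdc.rar",
        "dev", "192.168.68.249", "/home/dev")
    # Only prints the command; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually perform the upload.
    print(" ".join(cmd))
```

Keeping the command as a list (instead of one string) means subprocess takes care of quoting Windows paths that contain spaces.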

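6. Next, it turned out that the server did not have the tensorflow-gpu 1.9 environment I needed, so I installed it:

1. Download Anaconda3
wget https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh

2. Install Anaconda3 (by default it installs into the home directory, ~/anaconda3, so no root access is needed)
sh Anaconda3-5.2.0-Linux-x86_64.sh

3. Install tensorflow-gpu 1.9
pip install tensorflow-gpu==1.9
The download was extremely slow, so I switched to a mirror:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==1.9
OK, it finished downloading in minutes.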
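Which `pip` runs depends on PATH, so it is easy to install into a different Python than the one that will run the program. A minimal sketch, assuming it is executed with the Anaconda interpreter: invoking pip as `python -m pip` ties the install to that interpreter, and the Tsinghua mirror is passed through pip's `-i` (`--index-url`) option. `pip_install_command` is a hypothetical helper name.

```python
import sys
from typing import List

# Tsinghua PyPI mirror used above
MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"


def pip_install_command(package: str, index_url: str = MIRROR) -> List[str]:
    # sys.executable is the interpreter running this script, so
    # "python -m pip" installs into exactly that Python (e.g. Anaconda's),
    # not into whatever pip happens to be first on PATH.
    return [sys.executable, "-m", "pip", "install", "-i", index_url, package]


if __name__ == "__main__":
    cmd = pip_install_command("tensorflow-gpu==1.9")
    # Only prints the command; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually install.
    print(" ".join(cmd))
```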

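7. Then I ran the program and got this error:

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

8. This error is caused by the wrong CUDA version: tensorflow-gpu 1.9 is built against CUDA 9.0, but the server uses CUDA 8.0 by default. Fortunately, CUDA 9.0 was already installed on the server, so the program can be made to use CUDA 9.0 by changing my user environment variables. (I had no permission to change anything and couldn't even use sudo, which was painful. My senior colleague understandably didn't want to hand permissions to me, so in the end he taught me to simply modify my user environment variables, which works. Thanks for the guidance!)

Modify the user environment variables:

export PATH=/usr/local/cuda-9.0/bin:$PATH

Note that PATH only controls where executables such as nvcc are looked up; shared libraries such as libcublas.so.9.0 are located by the dynamic loader through LD_LIBRARY_PATH, which is what step 9 ends up setting.

I ran the program again and it still failed. Hmm, deploying to a server is not that simple after all. The error: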

ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory
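Both errors mean the dynamic loader could not find a shared library. A quick way to check which of the libraries are visible, without importing TensorFlow, is to try loading them with Python's `ctypes`, which goes through roughly the same loader search (LD_LIBRARY_PATH, the system cache, default directories). This is a minimal sketch: `check_shared_libs` is a hypothetical helper, and the library names are taken from the two error messages above. Run it from the same shell (same environment variables) as the program.

```python
import ctypes
from typing import Dict, Iterable


def check_shared_libs(names: Iterable[str]) -> Dict[str, bool]:
    """Try to load each shared library; map name -> True if it was found."""
    result = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # dlopen() under the hood on Linux
            result[name] = True
        except OSError:
            # Same situation as "cannot open shared object file"
            result[name] = False
    return result


if __name__ == "__main__":
    # Sonames from the error messages above (tensorflow-gpu 1.9: CUDA 9.0, cuDNN 7)
    for lib, ok in check_shared_libs(["libcublas.so.9.0", "libcudnn.so.7"]).items():
        print(f"{lib}: {'found' if ok else 'NOT found'}")
```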

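9. I looked through a lot of material, which said this error is caused by missing symbolic links. (So I went to my senior colleague again. He rather helplessly told me to create a script and then configured the environment for me. He really is a good guy; cue tears of technical inadequacy.) His solution: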

export LD_LIBRARY_PATH=/usr/local/cudnn_9.0/lib64:/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH

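Impressive, though I didn't really understand it: instead of creating symlinks, he directly specified the paths to search. (LD_LIBRARY_PATH lists extra directories that the dynamic loader searches for shared libraries such as libcudnn.so.7 and libcublas.so.9.0 ahead of the standard system directories, so no symlinks in system directories are needed. An export like this only lasts for the current shell session unless it is put into a script or ~/.bashrc.) Looks like I still need to brush up on this area. There's never enough time, and there is so much to learn.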
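The colleague's script boils down to setting LD_LIBRARY_PATH before the program starts. Below is a minimal Python sketch of such a launcher, assuming the directories are the ones from the export line above; `env_with_lib_dirs` and `run_with_cuda` are hypothetical names. It starts the program as a child process because the dynamic loader reads LD_LIBRARY_PATH only at process start-up, so changing os.environ inside an interpreter that is already running does not affect its own library lookups.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional

# Assumed directories, copied from the export line above; adjust to your server.
CUDA_LIB_DIRS = ["/usr/local/cudnn_9.0/lib64", "/usr/local/cuda-9.0/lib64"]


def env_with_lib_dirs(dirs: List[str],
                      base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: os.environ) and prepend `dirs` to LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    old = env.get("LD_LIBRARY_PATH", "")
    # Only append the old value if there is one, so no empty entry is created.
    env["LD_LIBRARY_PATH"] = ":".join(list(dirs) + ([old] if old else []))
    return env


def run_with_cuda(script: str) -> int:
    # The child interpreter starts with the new LD_LIBRARY_PATH already set.
    env = env_with_lib_dirs(CUDA_LIB_DIRS)
    return subprocess.run([sys.executable, script], env=env).returncode


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python run_with_cuda.py your_program.py")
    else:
        sys.exit(run_with_cuda(sys.argv[1]))
```

Usage, with a hypothetical entry point: `python run_with_cuda.py main.py`.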

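III. References

https://blog.youkuaiyun.com/lambert310/article/details/52412059 (pip mirror)

https://blog.youkuaiyun.com/u010329292/article/details/70243721 (pscp upload and download)

https://blog.youkuaiyun.com/w5688414/article/details/79187499 (ImportError: libcublas.so.9.0)

https://tieba.baidu.com/p/5721780567?red_tag=1534022437 (ImportError: libcudnn.so.7; it didn't solve the problem directly, but it did help pin down the cause)

I haven't included links for installing tensorflow-gpu itself: I went through a lot of material at the time, don't remember exactly which ones I used, and haven't redone the whole installation to check. Please bear with me.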

Self-compiled TensorFlow:
1. Python 3.5, TensorFlow 1.12;
2. Support for CUDA 10.0, cuDNN 7.3.1 and TensorRT-5.0.2.6-cuda10.0-cudnn7.3;
3. No MKL support.

Hardware/software environment: Ubuntu 16.04, GeForce GTX 1080 Ti

Configuration:

hp@dla:~/work/ts_compile/tensorflow$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3

Found possible Python library paths:
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]:

Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.3.1

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]:

Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/x86_64-linux-gnu]://home/hp/bin/TensorRT-5.0.2.6-cuda10.0-cudnn7.3/targets/x86_64-linux-gnu

Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1,6.1,6.1]:

Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=" to your build command. See .bazelrc for more details.
  --config=mkl              # Build with MKL support.
  --config=monolithic       # Config for mostly static monolithic build.
  --config=gdr              # Build with GDR support.
  --config=verbs            # Build with libverbs support.
  --config=ngraph           # Build with Intel nGraph support.
  --config=dynamic_kernels  # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
  --config=noaws            # Disable AWS S3 filesystem support.
  --config=nogcp            # Disable GCP support.
  --config=nohdfs           # Disable HDFS support.
  --config=noignite         # Disable Apacha Ignite support.
  --config=nokafka          # Disable Apache Kafka support.
  --config=nonccl           # Disable NVIDIA NCCL support.
Configuration finished

Build:
bazel build --config=opt --verbose_failures //tensorflow/tools/pip_package:build_pip_package
(This builds the packaging script; the .whl file itself is produced by running the generated bazel-bin/tensorflow/tools/pip_package/build_pip_package script with an output directory as its argument.)

Uninstall the existing TensorFlow:
hp@dla:~/temp$ sudo pip3 uninstall tensorflow

Install the self-built package:
hp@dla:~/temp$ sudo pip3 install tensorflow-1.12.0-cp35-cp35m-linux_x86_64.whl
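pip only installs a wheel whose tags match the interpreter (otherwise it reports that the file "is not a supported wheel on this platform"), and the wheel above is tagged cp35, i.e. CPython 3.5. A minimal sketch for checking this before installing, assuming a CPython interpreter; the helper names are hypothetical, and the parsing follows the wheel filename convention {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl, in which the Python tag is always the third field from the end.

```python
import sys


def wheel_python_tag(wheel_filename: str) -> str:
    """Return the Python tag of a wheel file name, e.g. 'cp35'."""
    stem = wheel_filename[:-len(".whl")] if wheel_filename.endswith(".whl") else wheel_filename
    # {dist}-{version}(-{build})?-{python}-{abi}-{platform}
    return stem.split("-")[-3]


def current_cpython_tag() -> str:
    """Tag of the running CPython interpreter, e.g. 'cp35' for Python 3.5."""
    return "cp{}{}".format(*sys.version_info[:2])


def matches_interpreter(wheel_filename: str) -> bool:
    return wheel_python_tag(wheel_filename) == current_cpython_tag()


if __name__ == "__main__":
    whl = "tensorflow-1.12.0-cp35-cp35m-linux_x86_64.whl"
    print(wheel_python_tag(whl), current_cpython_tag(), matches_interpreter(whl))
```

Run it with the same interpreter that `pip3` belongs to (e.g. `python3 check_wheel.py`, a hypothetical file name).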