0. System environment:
Ubuntu 16.04 + TensorFlow 1.14.0 + Python 3.5
1. Installing TensorFlow is slow (switch the package index)
Install directly with the following commands:
# For CPU
pip3 install tensorflow
# For GPU
pip3 install tensorflow-gpu
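Because the default index (PyPI) is hosted overseas, downloads are slow, so switch to a mirror: create a .pip folder in your home directory and a pip.conf file inside it: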
vim ~/.pip/pip.conf
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
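One thing that is easy to get wrong in pip.conf is that trusted-host should name the same host as index-url. Here is a minimal sketch that checks this with the standard library; the function name trusted_host_matches and the SAMPLE text are my own illustration, not part of pip.

```python
import configparser
from urllib.parse import urlparse

# Illustrative pip.conf content (an assumption for this sketch).
SAMPLE = """
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
"""


def trusted_host_matches(conf_text):
    """Return True if the index-url host is listed under trusted-host."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    index_host = urlparse(parser.get("global", "index-url")).hostname
    trusted = parser.get("install", "trusted-host", fallback="")
    # trusted-host may hold several whitespace-separated hosts.
    return index_host in trusted.split()


print(trusted_host_matches(SAMPLE))
```

To check your real file, read ~/.pip/pip.conf into a string and pass it in instead of SAMPLE.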
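2. A virtual machine cannot access the physical machine's GPU (TensorFlow-GPU will not run)
Explanation: a VM created with VMware gets a virtual graphics card, not the physical machine's own GPU. Consider the options below.
Options:
1. Use Docker instead of a VM; containers can use the physical machine's hardware directly. (Recommended; see item 7 for nvidia-docker.)
2. Dual boot.
3. Install the CPU build of TensorFlow. (Not tried yet.)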
3. numpy error (FutureWarning)
root@48e02d5a30a1:~/python/lx_express# python3 main.py
/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:458: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:459: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:460: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:461: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:462: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/root/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/dtypes.py:465: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
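Cause: the numpy version is too new for this TensorFlow version. Run pip show numpy to check the installed version, then downgrade:
pip show numpy # check the version
pip uninstall numpy # uninstall numpy
pip install numpy==1.16 # install numpy 1.16 specifically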
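To check from inside a script whether the installed numpy is new enough to trigger these warnings, something like the sketch below should work. I am assuming that numpy 1.17 is the first release that emits this FutureWarning; the helper names parse_version and numpy_too_new are made up for this example.

```python
def parse_version(version):
    """Turn '1.16.4' into (1, 16, 4), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def numpy_too_new(version, first_bad=(1, 17)):
    """Assumption: releases from first_bad onward emit the FutureWarning."""
    return parse_version(version)[:2] >= first_bad


try:
    import numpy as np
    installed = np.__version__
except ImportError:
    installed = None  # numpy is not installed in this environment

if installed is None:
    print("numpy is not installed")
elif numpy_too_new(installed):
    print("numpy", installed, "is likely to trigger the warning; consider numpy==1.16")
else:
    print("numpy", installed, "should be fine")
```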
4. Cannot uninstall 'wrapt'
ERROR: Cannot uninstall 'wrapt'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Fix:
pip install -U --ignore-installed wrapt enum34 simplejson netaddr
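5. KeyError: 'IteratorFromStringHandleV2'
KeyError: 'IteratorFromStringHandleV2'
Fix:
Code that ran fine in my local environment failed with this error after I moved it into Docker. It turned out the TensorFlow version in the container was too old; installing the same version as my local one (1.14.0) solved the problem.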
6. UnicodeEncodeError: 'ascii' codec can't encode characters in position 159-168: ordinal not in range(128)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 159-168: ordinal not in range(128)
Fix:
Add the following statements to the code being run, so that standard output is written as UTF-8:
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
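To see why this helps, here is a small sketch that reproduces the failure and the fix without touching the real sys.stdout. The in-memory io.BytesIO stream stands in for the binary stream you get from sys.stdout.detach(); the sample text is just an illustration.

```python
import codecs
import io

text = "中文输出"  # sample non-ASCII text (illustrative)

# The ASCII codec cannot represent these characters, which is what
# happens when stdout falls back to ASCII inside a bare container.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Wrapping a binary stream in a UTF-8 writer makes the same write succeed.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)
writer.write(text)
utf8_bytes = raw.getvalue()

print(ascii_ok, utf8_bytes.decode("utf-8") == text)
```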
7. ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory (in a Docker environment)
ImportError: Traceback (most recent call last):
File "/storage/xuminghong/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/storage/xuminghong/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/storage/xuminghong/.local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/local/lib/python3.6/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/local/lib/python3.6/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
Fix:
I was running inside Docker. I had been starting the container with plain docker run; after switching to nvidia-docker run, the error above went away.
8. Configuring CUDA in PyCharm
1. Open PyCharm: Run -> Edit Configurations
2. Set Environment variables: LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64 (adjust to your own CUDA install path)
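9. Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I had been using Keras and everything ran fine locally, but after deploying to Docker this error appeared. The cause was that TensorFlow tries to allocate too much GPU memory at startup (probably more than it needs) and the GPU did not have that much available. Adding the following code makes it allocate GPU memory on demand.
import tensorflow as tf # import it if you have not already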
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
sess = tf.Session(config=config)
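With Keras, the session also has to be handed to the Keras backend, otherwise Keras may create its own default session and ignore this config. Below is a minimal sketch assuming TensorFlow 1.x with tf.keras; the wrapper function make_growth_session is my own name for illustration.

```python
def make_growth_session():
    """Create a TF 1.x session with on-demand GPU memory and give it to Keras."""
    import tensorflow as tf  # assumes TensorFlow 1.x (tf.ConfigProto, tf.Session)

    config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=config)
    # Assumption: the model is built with tf.keras. With standalone Keras, use
    # `from keras import backend as K; K.set_session(sess)` instead.
    tf.keras.backend.set_session(sess)
    return sess
```

Call make_growth_session() once at the start of the program, before building or loading any model.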
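10. Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
For libcudart.so.n, the n tells you which CUDA version's environment could not be found. This error means the CUDA 10.0 path was not found,
so the CUDA environment variables need to be set on Ubuntu, in ~/.bashrc (adjust the paths to your own CUDA install):
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Remember to run source ~/.bashrc afterwards.
Note: if the error occurs inside PyCharm, see item 8.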
[Figure: output after the environment variables are set correctly]
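A quick way to confirm that the variable is actually visible to your Python process is to search the LD_LIBRARY_PATH directories for the library yourself. This is only a rough sketch: the helper find_in_library_path is my own, and the dynamic loader also consults other locations (such as the ldconfig cache), so a miss here does not prove that loading will fail.

```python
import os


def find_in_library_path(lib_name, library_path):
    """Return the first path under library_path that contains lib_name, else None."""
    for directory in library_path.split(os.pathsep):
        candidate = os.path.join(directory, lib_name)
        if directory and os.path.isfile(candidate):
            return candidate
    return None


found = find_in_library_path(
    "libcudart.so.10.0", os.environ.get("LD_LIBRARY_PATH", "")
)
print(found or "libcudart.so.10.0 not found on LD_LIBRARY_PATH")
```

If this prints a path in a terminal but not when run from PyCharm, the run configuration is probably missing the variable.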
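11. ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow
Explanation:
I was installing TensorFlow (I also tried mxnet) on a Bitmain TPU box (model: SE3), and pip install tensorflow kept failing. Some posts online blamed a mismatch between the Python and TensorFlow versions, but that did not solve it. It turned out the CPU architecture is different, so the package cannot be installed directly from the index with pip; an ARM build has to be downloaded instead.
Run arch in the TPU box's terminal to see the CPU architecture.
Typical servers use the x86_64 CPU architecture.
I downloaded TensorFlow from https://github.com/lhelontra/tensorflow-on-arm/releases
Then install the wheel directly: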
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-1.13.1-cp35-none-linux_aarch64.whl
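Before downloading a wheel, it helps to compare the platform tag at the end of its filename with what the machine reports. The sketch below does that with platform.machine(), which on Linux should give the same string as arch. The helper names are my own, and the check is deliberately simple (it does not handle pure-Python wheels tagged any).

```python
import platform


def wheel_platform(wheel_name):
    """Return the platform tag, e.g. 'linux_aarch64', from a wheel filename."""
    stem = wheel_name[:-len(".whl")] if wheel_name.endswith(".whl") else wheel_name
    return stem.split("-")[-1]


def wheel_matches_machine(wheel_name, machine):
    """True if the wheel's platform tag ends with the given CPU architecture."""
    return wheel_platform(wheel_name).endswith("_" + machine)


wheel = "tensorflow-1.13.1-cp35-none-linux_aarch64.whl"
print(platform.machine(), wheel_matches_machine(wheel, platform.machine()))
```

The cp35 part of the filename also has to match the Python version (3.5 here), which this sketch does not check.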