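I. Origin of the Problem
The stack trace below shows that the cuBLAS library failed to initialize (the cuBLAS handle could not be created) and that the exception was raised while executing a MatMul op. The most likely cause is that not enough GPU memory was available, possibly because the data set is too large.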
2021-08-10 16:38:04.917501: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.960048: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.986898: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.992366: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.992389: W tensorflow/stream_executor/stream.cc:1455] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "/home/mango/PycharmProjects/DeepLearing/minist_conv.py", line 32, in <module>
model.fit(train_images, train_labels, epochs=5, batch_size=64)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
return self._stateless_fn(*args, **kwds)
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 3023, in __call__
return graph_function._call_flat(
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 591, in call
outputs = execute.execute(
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMM launch failed : a.shape=[1,64,576], b.shape=[1,576,64], m=64, n=64, k=576
[[node sequential/dense/MatMul (defined at home/mango/PycharmProjects/DeepLearing/minist_conv.py:32) ]] [Op:__inference_train_function_993]
Function call stack:
train_function
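As a rough sanity check on the memory hypothesis, the shapes in the error message above can be turned into a byte count. The sketch below is illustrative only: it assumes float32 tensors (4 bytes per element, the usual Keras default, but not confirmed by the log), and `gemm_bytes` is a hypothetical helper name, not part of TensorFlow.

```python
import re

# Error text copied from the traceback above.
ERROR = ("Blas xGEMM launch failed : a.shape=[1,64,576], "
         "b.shape=[1,576,64], m=64, n=64, k=576")

BYTES_PER_ELEMENT = 4  # assumption: float32 operands


def gemm_bytes(message, bytes_per_element=BYTES_PER_ELEMENT):
    """Estimate the memory taken by the operands of C = A @ B.

    A is m x k, B is k x n, C is m x n; m, n, k are read from the message.
    """
    dims = {}
    for name in ("m", "n", "k"):
        match = re.search(rf"\b{name}=(\d+)", message)
        if match is None:
            raise ValueError(f"{name} not found in message")
        dims[name] = int(match.group(1))

    a = dims["m"] * dims["k"] * bytes_per_element
    b = dims["k"] * dims["n"] * bytes_per_element
    c = dims["m"] * dims["n"] * bytes_per_element
    return {"a": a, "b": b, "c": c, "total": a + b + c}


sizes = gemm_bytes(ERROR)
print(sizes)  # {'a': 147456, 'b': 147456, 'c': 16384, 'total': 311296}
print(f"{sizes['total'] / 1024:.0f} KiB")  # 304 KiB
```

Under that assumption the failing multiplication needs only about 304 KiB, so if memory is the cause, the shortage would concern the GPU's overall memory budget at the time the cuBLAS handle was created rather than this single MatMul.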
II. Development Environment
mango@mango-ubuntu:~$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0
mango@mango-ubuntu:~$ tail -n 10 /usr/include/cudnn_version.h
#ifndef CUDNN_VERSION_H_
#define CUDNN_VERSION_H_
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 2
#define CUDNN_PATCHLEVEL 2
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#endif /* CUDNN_VERSION_H */
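The `CUDNN_VERSION` macro in the header above combines the three components into a single integer. A minimal sketch of that arithmetic follows; `cudnn_version` is a hypothetical helper written for illustration, and the header text is pasted in as a string rather than read from `/usr/include/cudnn_version.h`.

```python
import re

# Header lines copied from the output above.
HEADER = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 2
#define CUDNN_PATCHLEVEL 2
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""


def cudnn_version(header_text):
    """Return (major, minor, patch, combined) parsed from the header text."""
    fields = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(rf"#define\s+CUDNN_{name}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"CUDNN_{name} not found")
        fields[name] = int(match.group(1))

    # Same formula as the CUDNN_VERSION macro.
    combined = (fields["MAJOR"] * 1000
                + fields["MINOR"] * 100
                + fields["PATCHLEVEL"])
    return fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"], combined


print(cudnn_version(HEADER))  # (8, 2, 2, 8202)
```

So the installed library is cuDNN 8.2.2, which the macro encodes as 8202.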
mango@mango-ubuntu:~$ python3 --version
Python 3.9.5
mango@mango-ubuntu:~$ nvidia-smi
Tue Aug 10 19:57:58 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|------------------------------