Win10 + GTX1060 + tf1.15 + Object Detection API: pitfall notes (4)

License: CC 4.0 BY-SA

Tags: #tensorflow #object detection #python

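This post describes a GPU failure. The setup uses TensorFlow 1.15.5 with CUDA 11.0 and several cuDNN versions, yet the error 'Internal: invalid device function' still occurs. Downgrading CUDA and limiting GPU memory usage were both tried, and neither resolved it. Possible causes are a CUDA version that is too high or one that does not match the TensorFlow version.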

GPU failure

2021-09-08 06:53:46.096205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4863 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-09-08 06:53:47.587810: F .\tensorflow/core/kernels/random_op_gpu.h:227] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: invalid device function
Fatal Python error: Aborted
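
The fatal line carries CUDA's `invalid device function` status, which in general means that no kernel binary compatible with the device's architecture was available. As a first step toward checking that, the sketch below pulls the GPU name and compute capability out of a `Created TensorFlow device` log line. The helper name `parse_gpu_device` and the regular expression are illustrative assumptions based on the log format shown above; they are not part of TensorFlow.

```python
import re

# Pattern assumed from the log line shown above; other TensorFlow
# versions may format this message differently.
_DEVICE_RE = re.compile(
    r"name: (?P<name>[^,]+), pci bus id: (?P<bus>[^,]+), "
    r"compute capability: (?P<major>\d+)\.(?P<minor>\d+)"
)


def parse_gpu_device(log_line):
    """Return (name, (major, minor)) from a device-creation log line, or None."""
    match = _DEVICE_RE.search(log_line)
    if match is None:
        return None
    capability = (int(match.group("major")), int(match.group("minor")))
    return match.group("name"), capability


line = (
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 "
    "with 4863 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, "
    "pci bus id: 0000:01:00.0, compute capability: 7.5)"
)
print(parse_gpu_device(line))
```

The extracted capability (7.5 here) is the value to compare against the architectures the installed TensorFlow build was compiled for.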

Check the TF version

(base) PS D:\models-master\research> python
Python 3.7.10 (default, Feb 26 2021, 13:06:18) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-09-08 07:04:27.742112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
>>> tf.__version__
'1.15.5'
>>> exit()
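
The import log also names the CUDA runtime DLL that TensorFlow actually opened, which is worth recording separately from whatever toolkit is installed. A minimal sketch, assuming the three-digit naming seen here (`cudart64_100.dll` read as CUDA 10.0); the function name `cuda_version_from_cudart` is hypothetical and the naming convention is an assumption, not something TensorFlow guarantees:

```python
import re


def cuda_version_from_cudart(log_line):
    """Infer 'major.minor' from a cudart64_<NNN>.dll name in a dso_loader line."""
    # Only the three-digit form seen in the log above is handled
    # (two digits of major version, one digit of minor version).
    match = re.search(r"cudart64_(\d{2})(\d)\.dll", log_line)
    if match is None:
        return None
    return f"{int(match.group(1))}.{match.group(2)}"


log = "Successfully opened dynamic library cudart64_100.dll"
print(cuda_version_from_cudart(log))
```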

Check CUDA

(base) PS D:\models-master\research> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:48_Pacific_Daylight_Time_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.relgpu_drvr445TC445_37.28540450_0
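
`nvcc -V` reports the toolkit found first on PATH (release 11.0 here), while the earlier import log shows TensorFlow opening `cudart64_100.dll`. The sketch below extracts the release number from `nvcc -V` text so the two can be compared. The function name `nvcc_release` is hypothetical, and treating `cudart64_100.dll` as CUDA 10.0 is an assumption based on the file name.

```python
import re


def nvcc_release(nvcc_output):
    """Return the 'major.minor' release string from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


sample = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.0, V11.0.194"""

toolkit = nvcc_release(sample)
loaded_by_tf = "10.0"  # assumption: read off cudart64_100.dll in the import log
print(toolkit, toolkit == loaded_by_tf)
```

A mismatch like this does not by itself explain the crash, but it shows that the toolkit `nvcc` sees is not necessarily the runtime TensorFlow loads.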

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     19812    C+G   ...bbwe\Microsoft.Photos.exe    N/A      |
+-----------------------------------------------------------------------------+

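Possible causes:

The CUDA version is too high; switch to 10.0?

Switched CUDA to 10.0 and tested all 12 cuDNN builds for CUDA 10.0, from 7.3 through 7.6. The failure persists.

The default GPU memory allocation exceeds what the GPU can provide, in other words the default usage is too large, so it has to be limited manually.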

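import os
# Set the environment variables before importing TensorFlow.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# The log above reports a single GPU at index 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import tensorflow as tf
# Let this process use at most 70% of the GPU memory.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))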

Changing gpu_memory_fraction from 1 to 0.7 gives the same problem.
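
Besides a fixed fraction, TF 1.x `tf.GPUOptions` also accepts `allow_growth`, which allocates GPU memory on demand. The sketch below wraps both settings; the helper names `gpu_option_kwargs` and `make_session` are hypothetical, and nothing in the log above shows that memory settings are related to the `invalid device function` error, so treat this as one more thing to rule out rather than a fix.

```python
def gpu_option_kwargs(memory_fraction=None, allow_growth=False):
    """Build keyword arguments for tf.GPUOptions (TF 1.x)."""
    kwargs = {}
    if memory_fraction is not None:
        if not 0.0 < memory_fraction <= 1.0:
            raise ValueError("memory_fraction must be in (0, 1]")
        kwargs["per_process_gpu_memory_fraction"] = memory_fraction
    if allow_growth:
        kwargs["allow_growth"] = True
    return kwargs


def make_session(**kwargs):
    """Create a TF 1.x session with the requested GPU memory settings."""
    # Imported lazily so gpu_option_kwargs works without TensorFlow installed.
    import tensorflow as tf
    gpu_options = tf.GPUOptions(**gpu_option_kwargs(**kwargs))
    return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))


print(gpu_option_kwargs(allow_growth=True))
```

With TensorFlow 1.15 installed, `make_session(allow_growth=True)` or `make_session(memory_fraction=0.7)` would return a configured session.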

Reinstalled CUDA and cuDNN?
