TensorFlow: If you want to see a list of allocated tensors when OOM happens, add report_tensor_alloca

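This post looks at an out-of-memory (OOM) error hit while training a deep learning model with TensorFlow. The error occurs when TensorFlow tries to allocate a tensor of shape [2,33,1024,1024]; GPU memory ran out because batch_size was set too large and several large networks were running at the same time. The fixes are to reduce batch_size, close unnecessary applications to free memory, and make sure the system has enough resources available.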

Error: tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,33,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: training/Adam/gradients/AddN_3-1-TransposeNHWCToNCHW-LayoutOptimizer = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/batch_normalization_11/cond/FusedBatchNorm/Switch_grad/cond_grad, PermConstNHWCToNCHW-LayoutOptimizer)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: loss/mul/_831 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6298_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
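The hint in the log refers to a field on the `RunOptions` protobuf that is passed to `Session.run`. The following is a minimal sketch of how it might be enabled, assuming a session-based (TF1-style) training loop accessed through `tf.compat.v1`; the placeholder `a` and the op `total` are hypothetical stand-ins for a real model and are not from the original post.

```python
import tensorflow as tf

# Use the TF1-style graph/session API via tf.compat.v1 (an assumption about
# the setup; the original post does not show its training code).
tf.compat.v1.disable_eager_execution()

# RunOptions is a protobuf message; this field asks TensorFlow to list the
# currently allocated tensors if an OOM error is raised during this run call.
run_options = tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)

# Hypothetical stand-in for a real training op.
a = tf.compat.v1.placeholder(tf.float32, shape=[None, 4])
total = tf.reduce_sum(a)

with tf.compat.v1.Session() as sess:
    # Pass the options to every run call that should report allocations.
    result = sess.run(
        total,
        feed_dict={a: [[1.0, 2.0, 3.0, 4.0]]},
        options=run_options,
    )
```

The option only adds diagnostic output when an OOM actually occurs; it does not reduce memory use by itself.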

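Cause: batch_size was set too large. Reduce it, and close any windows and programs you do not need so that the code has enough memory to run. In my case, I was running two large networks at the same time, and both were using both GPUs, which is why the error appeared.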
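To get a feel for why batch size matters here, the size of the single tensor named in the error can be worked out by hand. The sketch below assumes that `type float` means 4-byte float32 (`DT_FLOAT`) and that the leading dimension of `[2,33,1024,1024]` is the batch dimension; the helper `tensor_bytes` is a hypothetical name introduced only for illustration.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape reported in the OOM message; the first dimension is assumed to be the batch.
oom_shape = [2, 33, 1024, 1024]
size_mib = tensor_bytes(oom_shape) / 2**20

# The same tensor with the batch dimension halved.
halved_mib = tensor_bytes([1, 33, 1024, 1024]) / 2**20

print(f"batch 2: {size_mib:.0f} MiB, batch 1: {halved_mib:.0f} MiB")
# prints "batch 2: 264 MiB, batch 1: 132 MiB"
```

This is only one intermediate tensor. Training keeps many such activations and gradients alive at once, and they all scale with batch size, so a second large network sharing the same GPUs can easily push the total past the available memory.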
