Stage 2: Illustrated Guide to TensorFlow Programs (8): Debugging TensorFlow Programs

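This tutorial shows how to use the TensorFlow debugger (tfdbg) to debug nans and infs that appear during model training. tfdbg is a debugger designed specifically for TensorFlow that helps you inspect the internal state of a graph during training and inference. By adding the tfdbg wrapper and using specific commands, you can track down numerical problems such as nans and infs when training fails. It also shows how to use tfdbg with tf-learn Estimators and Experiments, and how to debug remote sessions offline.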


TensorFlow debugger (tfdbg) is a specialized debugger for TensorFlow. It lets you view the internal structure and states of running TensorFlow graphs during training and inference, which is difficult to debug with general-purpose debuggers such as Python’s pdb due to TensorFlow’s computation-graph paradigm.

  • NOTE: The system requirements of tfdbg on supported external
    platforms include the following. On Mac OS X, the ncurses library is
    required. It can be installed with brew install
    homebrew/dupes/ncurses. On Windows, pyreadline is required. If you
    use Anaconda3, you can install it with a command such as
    "C:\Program Files\Anaconda3\Scripts\pip.exe" install pyreadline.

This tutorial demonstrates how to use the tfdbg command-line interface (CLI) to debug the appearance of nans and infs, a frequently-encountered type of bug in TensorFlow model development. The following example is for users who use the low-level Session API of TensorFlow. A later section of this document describes how to use tfdbg with a higher-level API, namely tf-learn Estimators and Experiments. To observe such an issue, run the following command without the debugger (the source code can be found here):

python -m tensorflow.python.debug.examples.debug_mnist

This code trains a simple neural network for MNIST digit image recognition. Notice that the accuracy increases slightly after the first training step, but then gets stuck at a low (near-chance) level:

Accuracy at step 0: 0.1113
Accuracy at step 1: 0.3183
Accuracy at step 2: 0.098
Accuracy at step 3: 0.098
Accuracy at step 4: 0.098

Wondering what might have gone wrong, you suspect that certain nodes in the training graph generated bad numeric values such as infs and nans, because this is a common cause of this type of training failure. Let’s use tfdbg to debug this issue and pinpoint the exact graph node where this numeric problem first surfaced.
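
As a small illustration of why such values are so damaging, the sketch below shows how an inf can arise and how a nan, once produced, survives every later arithmetic step. The numbers are hypothetical and are not taken from debug_mnist; they only demonstrate IEEE 754 floating-point behavior in plain Python.

```python
import math

# Hypothetical values, for illustration only; not taken from debug_mnist.
inf = float("inf")

overflow = 1e308 * 10             # exceeds the float64 range, so it becomes inf
undefined = inf - inf             # inf minus inf has no defined value: nan
poisoned = 0.5 * undefined + 1.0  # arithmetic on a nan yields nan again

print(math.isinf(overflow), math.isnan(undefined), math.isnan(poisoned))
```

Because a single nan contaminates everything computed from it, finding the first node that produced one is usually more informative than inspecting the final loss.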

1,Wrapping TensorFlow Sessions with tfdbg

To add support for tfdbg in our example, all that is needed is to add the following lines of code and wrap the Session object with a debugger wrapper. This code is already added in debug_mnist.py, so you can activate the tfdbg CLI with the --debug flag at the command line.

# Let your BUILD target depend on "//tensorflow/python/debug:debug_py"
# (You don't need to worry about the BUILD dependency if you are using a pip
#  install of open-source TensorFlow.)
from tensorflow.python import debug as tf_debug

sess = tf_debug.LocalCLIDebugWrapperSession(sess)

This wrapper has the same interface as Session, so enabling debugging requires no other changes to the code. The wrapper provides additional features, including:

  • Bringing up a CLI before and after Session.run() calls, to let you
    control the execution and inspect the graph’s internal state.
  • Allowing you to register special filters for tensor values, to
    facilitate the diagnosis of issues.
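
One way to put the wrapping step behind a command-line flag is sketched below. This is a minimal sketch assuming a TensorFlow 1.x installation; the helper names maybe_wrap_session and parse_args are hypothetical and are not claimed to match how debug_mnist.py is organized. The filter registration uses add_tensor_filter on the wrapper session.

```python
import argparse


def maybe_wrap_session(sess, debug):
    """Return sess unchanged, or wrapped in the tfdbg CLI wrapper if debug is set.

    Hypothetical helper for illustration; assumes TensorFlow 1.x when debug is True.
    """
    if not debug:
        return sess
    # Imported lazily so the non-debug path does not load the debug module.
    from tensorflow.python import debug as tf_debug

    wrapped = tf_debug.LocalCLIDebugWrapperSession(sess)
    # Register the stock filter that flags tensors containing inf or nan.
    wrapped.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
    return wrapped


def parse_args(argv=None):
    """Parse a single --debug switch (hypothetical flag handling)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true",
                        help="Wrap the session with the tfdbg CLI wrapper.")
    return parser.parse_args(argv)
```

Since the wrapper exposes the same interface as Session, the rest of the training loop can call the returned object without knowing whether debugging is enabled.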

In this example, we have already registered a tensor filter called tfdbg.has_inf_or_nan, which simply determines if there are any nan or inf values in any intermediate tensors (tensors that are neither inputs nor outputs of the Session.run() call, but are in the path leading from the inputs to the outputs). This filter for nans and infs is a common enough use case that we ship it with the debug_data module.

Note: You can also write your own custom filters. See the API documentation of DebugDumpDir.find() for additional information.
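
As a rough idea of what a custom filter might look like, the sketch below defines a predicate that flags floating-point tensors containing very large magnitudes. It assumes that tfdbg calls a filter with two arguments, a datum describing the dumped tensor and the tensor value as a NumPy array; the function name has_large_magnitude and the limit of 1e6 are hypothetical choices made for illustration.

```python
import numpy as np


def has_large_magnitude(datum, tensor, limit=1e6):
    """Return True if a floating-point tensor has any element beyond limit in magnitude.

    Hypothetical filter for illustration; datum (metadata about the dump) is unused.
    """
    if not isinstance(tensor, np.ndarray):
        return False  # skip values that are not arrays, such as uninitialized tensors
    if not np.issubdtype(tensor.dtype, np.floating):
        return False  # only floating-point tensors are checked
    return bool(np.any(np.abs(tensor) > limit))


# Registration on a wrapped session would then look like this (not executed here):
# sess.add_tensor_filter("has_large_magnitude", has_large_magnitude)
```

A filter like this can surface values that are about to overflow, before they turn into infs.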

2,Debugging Model Training with tfdbg

Let’s try training the model again, but with the --debug flag added this time:

python -m tensorflow.python.debug.examples.debug_mnist --debug

The debug wrapper session will prompt you when it is about to execute the first Session.run() call, with information regarding the fetched tensor and feed dictionaries displayed on the screen.

This is what we refer to as the run-start CLI. It lists the feeds and fetches to the current Session.run call, before executing anything.

If the screen size is too small to display the content of the message in its entirety, you can resize it.

Use the PageUp / PageDown / Home / End keys to navigate the screen output. On most keyboards that lack those keys, Fn + Up / Fn + Down / Fn + Right / Fn + Left will work.

Enter the run command (or just r) at the command prompt:

tfdbg> run