TensorFlow debugger (tfdbg) is a specialized debugger for TensorFlow. It lets you view the internal structure and states of running TensorFlow graphs during training and inference. These are difficult to inspect with general-purpose debuggers such as Python’s pdb because of TensorFlow’s computation-graph paradigm.
Note: The system requirements of tfdbg on supported external platforms include the following. On Mac OS X, the ncurses library is required. It can be installed with brew install homebrew/dupes/ncurses. On Windows, pyreadline is required. If you use Anaconda3, you can install it with a command such as "C:\Program Files\Anaconda3\Scripts\pip.exe" install pyreadline.
This tutorial demonstrates how to use the tfdbg command-line interface (CLI) to debug the appearance of nans and infs, a frequently encountered type of bug in TensorFlow model development. The following example is for users of TensorFlow's low-level Session API. A later section of this document describes how to use tfdbg with a higher-level API, namely tf-learn Estimators and Experiments. To observe such an issue, run the following command without the debugger (the source code can be found here):
python -m tensorflow.python.debug.examples.debug_mnist
This code trains a simple neural network for MNIST digit image recognition. Notice that the accuracy increases slightly after the first training step, but then gets stuck at a low (near-chance) level:
Accuracy at step 0: 0.1113
Accuracy at step 1: 0.3183
Accuracy at step 2: 0.098
Accuracy at step 3: 0.098
Accuracy at step 4: 0.098
Wondering what might have gone wrong, you suspect that certain nodes in the training graph generated bad numeric values such as infs and nans, because this is a common cause of this type of training failure. Let’s use tfdbg to debug this issue and pinpoint the exact graph node where this numeric problem first surfaced.
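To make the suspicion concrete, here is a minimal pure-Python sketch of one way a nan can arise and spread. It assumes a naive, unclipped cross-entropy loss; the function name and the numbers are hypothetical and are not taken from debug_mnist.py, and whether this is the actual cause in the example is exactly what tfdbg is used to find out.

```python
import math


def naive_cross_entropy(labels, probs):
    """Compute -sum(label * log(prob)) with no clipping.

    log(0) is treated as -inf, as floating-point array libraries do,
    so a zero probability paired with a zero label yields 0 * -inf = nan.
    """
    total = 0.0
    for label, prob in zip(labels, probs):
        log_p = math.log(prob) if prob > 0.0 else float("-inf")
        total += label * log_p
    return -total


# A well-behaved prediction gives a finite loss.
healthy_loss = naive_cross_entropy([0.0, 1.0, 0.0], [0.2, 0.7, 0.1])

# A saturated prediction contains exact zeros, so the loss becomes nan.
saturated_loss = naive_cross_entropy([0.0, 1.0, 0.0], [0.0, 1.0, 0.0])

# Any value computed from a nan is also nan, so one bad step can
# contaminate every later update of a (hypothetical) weight.
updated_weight = 0.5 - 0.1 * saturated_loss

print(healthy_loss, saturated_loss, updated_weight)
```

Once a nan reaches the weights, every later value derived from them is also nan, which is consistent with an accuracy that stops changing.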
1. Wrapping TensorFlow Sessions with tfdbg
To add support for tfdbg in our example, all that is needed is to add the following lines of code and wrap the Session object with a debugger wrapper. This code is already added in debug_mnist.py, so you can activate the tfdbg CLI with the --debug flag at the command line.
# Let your BUILD target depend on "//tensorflow/python/debug:debug_py"
# (You don't need to worry about the BUILD dependency if you are using a pip
# install of open-source TensorFlow.)
from tensorflow.python import debug as tf_debug
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
This wrapper has the same interface as Session, so enabling debugging requires no other changes to the code. The wrapper provides additional features, including:
- Bringing up a CLI before and after Session.run() calls, to let you control the execution and inspect the graph’s internal state.
- Allowing you to register special filters for tensor values, to facilitate the diagnosis of issues.
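The wrapper idea itself can be illustrated without TensorFlow. The sketch below is a toy: ToySession and ToyDebugWrapper are hypothetical names, not part of tfdbg. It only shows how an object with the same run() signature can hook into the moments before and after each call, which is where tfdbg brings up its CLI.

```python
class ToySession:
    """Hypothetical stand-in for a session: evaluates a named op on a feed."""

    def __init__(self, ops):
        self._ops = ops  # maps op name -> callable taking the feed dict

    def run(self, fetch, feed_dict=None):
        return self._ops[fetch](feed_dict or {})


class ToyDebugWrapper:
    """Exposes the same run() signature, plus hooks around every call."""

    def __init__(self, session, on_run_start, on_run_end):
        self._session = session
        self._on_run_start = on_run_start
        self._on_run_end = on_run_end

    def run(self, fetch, feed_dict=None):
        self._on_run_start(fetch, feed_dict)   # the "run-start" moment
        result = self._session.run(fetch, feed_dict)
        self._on_run_end(fetch, result)        # the "run-end" moment
        return result


events = []
sess = ToySession({"double": lambda feed: 2 * feed["x"]})

# Rebinding the same name mirrors sess = tf_debug.LocalCLIDebugWrapperSession(sess):
# the calling code below does not need to change.
sess = ToyDebugWrapper(
    sess,
    on_run_start=lambda fetch, feed: events.append(("start", fetch, feed)),
    on_run_end=lambda fetch, result: events.append(("end", fetch, result)),
)

output = sess.run("double", feed_dict={"x": 21})
print(output, events)
```

Because the wrapper keeps the call signature intact, the training loop stays untouched and only the line that creates the session changes.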
In this example, we have already registered a tensor filter called tfdbg.has_inf_or_nan, which simply determines whether any intermediate tensor contains nan or inf values. (Intermediate tensors are tensors that are neither inputs nor outputs of the Session.run() call, but lie on the path leading from the inputs to the outputs.) Filtering for nans and infs is a common enough use case that we ship this filter with the debug_data module.
Note: You can also write your own custom filters. See the API documentation of DebugDumpDir.find() for additional information.
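As a rough illustration of what such a filter looks like, the sketch below assumes the two-argument filter signature (a datum describing the tensor, and the tensor value as a NumPy array). The function name is hypothetical and this is not the implementation shipped in debug_data; it only mirrors the behavior described above.

```python
import numpy as np


def has_inf_or_nan_sketch(datum, tensor):
    """Return True if the tensor value contains any nan or inf.

    datum is assumed to carry metadata about the dumped tensor (node name,
    output slot, and so on); this sketch does not use it.
    """
    if not isinstance(tensor, np.ndarray) or tensor.size == 0:
        return False  # uninitialized or empty values cannot be checked
    if not np.issubdtype(tensor.dtype, np.floating):
        return False  # integer, bool and string tensors cannot hold nan/inf
    return bool(np.any(np.isnan(tensor)) or np.any(np.isinf(tensor)))


clean = has_inf_or_nan_sketch(None, np.array([0.1, 0.9]))
with_nan = has_inf_or_nan_sketch(None, np.array([0.1, np.nan]))
with_inf = has_inf_or_nan_sketch(None, np.array([[1.0, -np.inf]]))
int_tensor = has_inf_or_nan_sketch(None, np.array([1, 2, 3]))
print(clean, with_nan, with_inf, int_tensor)

# With a wrapped session, a filter like this would typically be registered
# under a name of your choice, for example:
#   sess.add_tensor_filter("my_filter", has_inf_or_nan_sketch)
```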
2. Debugging Model Training with tfdbg
Let’s try training the model again, but with the --debug flag added this time:
python -m tensorflow.python.debug.examples.debug_mnist --debug
The debug wrapper session will prompt you when it is about to execute the first Session.run() call, with information regarding the fetched tensor and feed dictionaries displayed on the screen.
This is what we refer to as the run-start CLI. It lists the feeds and fetches of the current Session.run() call before executing anything.
If the screen is too small to display the message in its entirety, you can resize the terminal window.
Use the PageUp / PageDown / Home / End keys to navigate the screen output. On most keyboards lacking those keys, Fn + Up / Fn + Down / Fn + Right / Fn + Left will work.
Enter the run command (or just r) at the command prompt:
tfdbg> run