1. The LSTM Computation Graph
The previous post built and trained the TensorFlow-based chatbot model and saved the training logs to a given directory. Running "tensorboard --logdir=XXX" on the command line and opening the URL it prints lets you visualize the whole computation graph and watch how the parameters evolve during training. First switch to the "GRAPHS" tab to see the complete graph. The whole sequence is very long, so the parts are expanded one by one below.
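If the logging setup from the previous post is not at hand, here is a minimal sketch of how tflearn produces the TensorBoard event files. The toy network and the directory name ./chatbot_logs are placeholders, not the actual chatbot model:

import tflearn

# Hypothetical toy network; the real chatbot network comes from the previous post.
# tflearn.DNN writes TensorBoard event files under tensorboard_dir.
net = tflearn.input_data(shape=[None, 8, 200])
net = tflearn.lstm(net, 200)
net = tflearn.regression(net, optimizer='sgd', loss='mean_square', learning_rate=0.1)
model = tflearn.DNN(net, tensorboard_dir='./chatbot_logs', tensorboard_verbose=3)
# After model.fit(...), run: tensorboard --logdir=./chatbot_logs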

Fig 1-1 chatbot graphs
1.1 Encoder LSTM
In the chatbot code from the previous post we used tflearn.lstm to build the Encoder:
# Run the encoding step; encoder_output_tensor is later expanded into a
# (?, 1, 200) tensor that tflearn.regression can consume
(encoder_output_tensor, states) = tflearn.lstm(encoder_inputs, self.word_vec_dim,
                                                return_state=True, scope="encoder_lstm")
Double-clicking encoder_lstm to zoom in, the first level looks like this:

Fig 1-2 encoder_lstm
The panel on the right lists the Inputs and Outputs of encoder_lstm. Note that Outputs (8) does not mean encoder_lstm actually produces eight outputs; the LSTM only outputs encoder_output_tensor and states. The (8) means that these two outputs of encoder_lstm are consumed by (depended on by) eight downstream modules. The lstm source in tflearn is as follows:
# @File : tflearn.layers.recurrent.py
def lstm(incoming, n_units, activation='tanh', inner_activation='sigmoid',
         dropout=None, bias=True, weights_init=None, forget_bias=1.0,
         return_seq=False, return_state=False, initial_state=None,
         dynamic=False, trainable=True, restore=True, reuse=False,
         scope=None, name="LSTM"):
    cell = BasicLSTMCell(n_units, activation=activation,
                         inner_activation=inner_activation,
                         forget_bias=forget_bias, bias=bias,
                         weights_init=weights_init, trainable=trainable,
                         restore=restore, reuse=reuse)
    x = _rnn_template(incoming, cell=cell, dropout=dropout,
                      return_seq=return_seq, return_state=return_state,
                      initial_state=initial_state, dynamic=dynamic,
                      scope=scope, name=name)
    return x
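For orientation, a hedged usage sketch of the parameters that matter in this post: return_state=True makes lstm() also return the final state, and passing dropout as a tuple triggers the DropoutWrapper path shown in _rnn_template below. The call reuses the encoder_inputs tensor from the earlier snippet; the scope name here is made up:

output, states = tflearn.lstm(encoder_inputs, 200,
                              dropout=(0.8, 0.8),   # (input keep prob, output keep prob)
                              return_state=True,
                              scope="encoder_lstm_demo")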
Here _rnn_template() defines the template for the recurrent network (RNN), while BasicLSTMCell only defines a single Cell node inside the recurrence. Let us first look at how _rnn_template() unrolls each Cell; the core code is as follows:
# @File : tflearn.layers.recurrent.py
def _rnn_template(incoming, cell, dropout=None, return_seq=False,
                  return_state=False, initial_state=None, dynamic=False,
                  scope=None, reuse=False, name="LSTM"):
    """ RNN Layer Template. """
    with tf.variable_scope(scope, default_name=name, values=[incoming],
                           reuse=reuse) as scope:
        name = scope.name
        _cell = cell
        # Apply dropout
        if dropout:
            if type(dropout) in [tuple, list]:
                in_keep_prob = dropout[0]
                out_keep_prob = dropout[1]
            elif isinstance(dropout, float):
                in_keep_prob, out_keep_prob = dropout, dropout
            else:
                raise Exception("Invalid dropout type (must be a 2-D tuple of "
                                "float)")
            cell = DropoutWrapper(cell, in_keep_prob, out_keep_prob)  # dropout wrapping happens here, see reference [1]

        inference = incoming
        # If a tensor given, convert it to a per timestep list
        # (excerpt: input_shape and sequence_length are derived from `incoming` in lines omitted here)
        if type(inference) not in [list, np.array]:
            ndim = len(input_shape)
            assert ndim >= 3, "Input dim should be at least 3."
            axes = [1, 0] + list(range(2, ndim))
            inference = tf.transpose(inference, (axes))
            inference = tf.unstack(inference)

        outputs, state = _rnn(cell, inference, dtype=tf.float32,
                              initial_state=initial_state, scope=name,
                              sequence_length=sequence_length)
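Before moving on, here is a standalone sketch of the batch-major to time-major conversion that the tf.transpose and tf.unstack calls above perform, assuming TensorFlow 1.x and the (batch, 8, 200) input shape used in this post:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x assumed, as used by tflearn

x = tf.placeholder(tf.float32, shape=[None, 8, 200])  # (batch, time, word_vec_dim)
x_t = tf.transpose(x, [1, 0, 2])                      # (time, batch, word_vec_dim) = (8, ?, 200)
steps = tf.unstack(x_t)                               # Python list of 8 tensors, each (?, 200)

with tf.Session() as sess:
    out = sess.run(steps[0], feed_dict={x: np.zeros((4, 8, 200), np.float32)})
    print(out.shape)  # (4, 200) -- the input slice for timestep 0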
The tf.transpose and tf.unstack calls correspond to the compute nodes in the green box of Fig 1-2. _rnn then defines how the cell is unrolled over the timesteps; its core code is as follows:
# @File : tensorflow.python.ops.rnn.py
def static_rnn(cell,
               inputs,
               initial_state=None,
               dtype=None,
               sequence_length=None,
               scope=None):
  """Creates a recurrent neural network specified by RNNCell `cell`."""
  ......
  for time, input_ in enumerate(inputs):
    if time > 0:
      varscope.reuse_variables()
    # pylint: disable=cell-var-from-loop
    call_cell = lambda: cell(input_, state)
    # pylint: enable=cell-var-from-loop
    if sequence_length is not None:
      (output, state) = _rnn_step(
          time=time,
          sequence_length=sequence_length,
          min_sequence_length=min_sequence_length,
          max_sequence_length=max_sequence_length,
          zero_output=zero_output,
          state=state,
          call_cell=call_cell,
          state_size=cell.state_size)
    else:
      (output, state) = call_cell()
    outputs.append(output)
  return (outputs, state)
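To see the same unrolling outside of tflearn, here is a minimal sketch using TensorFlow's own BasicLSTMCell and static_rnn; the cell size and timestep count simply mirror this post's example, and the scope name is made up:

import tensorflow as tf  # TensorFlow 1.x assumed

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=200)
# One (?, 200) input tensor per timestep, matching the unstacked list built above
inputs = [tf.placeholder(tf.float32, [None, 200], name="step_%d" % t) for t in range(8)]
outputs, final_state = tf.nn.static_rnn(cell, inputs, dtype=tf.float32, scope="demo_rnn")
print(len(outputs))            # 8: one cell application per timestep, as TensorBoard shows
print(outputs[0].get_shape())  # (?, 200)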
In TensorBoard, double-clicking the encoder_lstm submodule inside encoder_lstm shows the unrolled RNN sequence:

1.2 BasicLSTMCell
Each node in the unrolled sequence is the BasicLSTMCell used by the tflearn.lstm() method above. tflearn's BasicLSTMCell is implemented following the paper Recurrent Neural Network Regularization. The paper first restates the per-variable, step-by-step updates from the classic LSTM literature as a batched matrix iteration, shown below:

Fig 1-4 Batched iteration equations
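The figure itself is not reproduced here; for reference, the batched form given in that paper (a transcription in the paper's notation, where T_{2n,4n} denotes an affine transform and sigm the logistic sigmoid) is:

\begin{aligned}
\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix}
&= \begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix}
   T_{2n,4n} \begin{pmatrix} h_t^{l-1} \\ h_{t-1}^{l} \end{pmatrix} \\
c_t^l &= f \odot c_{t-1}^l + i \odot g \\
h_t^l &= o \odot \tanh\left(c_t^l\right)
\end{aligned}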
where