1. The LSTM Computation Graph
The previous post built and trained the TensorFlow-based chatbot model and saved the training logs to a given directory. Running "tensorboard --logdir=XXX" on the command line and opening the URL it prints lets you visualize the whole computation graph and watch how the parameters evolve during training. First switch to the "GRAPHS" tab to see the complete graph. The whole sequence is very long, so the parts are expanded one by one below.
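If the logging setup from the previous post is not at hand, here is a minimal sketch of how tflearn produces the TensorBoard event files. The toy network and the directory name ./chatbot_logs are placeholders, not the actual chatbot model:

import tflearn

# Hypothetical toy network; the real chatbot network comes from the previous post.
# tflearn.DNN writes TensorBoard event files under tensorboard_dir.
net = tflearn.input_data(shape=[None, 8, 200])
net = tflearn.lstm(net, 200)
net = tflearn.regression(net, optimizer='sgd', loss='mean_square', learning_rate=0.1)
model = tflearn.DNN(net, tensorboard_dir='./chatbot_logs', tensorboard_verbose=3)
# After model.fit(...), run: tensorboard --logdir=./chatbot_logs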

Fig 1-1 chatbot graphs
1.1 Encoder LSTM
In the chatbot code from the previous post we used tflearn.lstm to build the Encoder:
# Run the encoding step; encoder_output_tensor is later expanded into a
# (?, 1, 200) tensor that tflearn.regression can consume
(encoder_output_tensor, states) = tflearn.lstm(encoder_inputs, self.word_vec_dim,
                                                return_state=True, scope="encoder_lstm")
Double-clicking encoder_lstm to zoom in, the first level looks like this:

Fig 1-2 encoder_lstm
The panel on the right lists the Inputs and Outputs of encoder_lstm. Note that Outputs (8) does not mean encoder_lstm actually produces eight outputs; the LSTM only outputs encoder_output_tensor and states. The (8) means that these two outputs of encoder_lstm are consumed by (depended on by) eight downstream modules. The lstm source in tflearn is as follows:
# @File : tflearn.layers.recurrent.py
def lstm(incoming, n_units, activation='tanh', inner_activation='sigmoid',
         dropout=None, bias=True, weights_init=None, forget_bias=1.0,
         return_seq=False, return_state=False, initial_state=None,
         dynamic=False, trainable=True, restore=True, reuse=False,
         scope=None, name="LSTM"):
    cell = BasicLSTMCell(n_units, activation=activation,
                         inner_activation=inner_activation,
                         forget_bias=forget_bias, bias=bias,
                         weights_init=weights_init, trainable=trainable,
                         restore=restore, reuse=reuse)
    x = _rnn_template(incoming, cell=cell, dropout=dropout,
                      return_seq=return_seq, return_state=return_state,
                      initial_state=initial_state, dynamic=dynamic,
                      scope=scope, name=name)
    return x
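For orientation, a hedged usage sketch of the parameters that matter in this post: return_state=True makes lstm() also return the final state, and passing dropout as a tuple triggers the DropoutWrapper path shown in _rnn_template below. The call reuses the encoder_inputs tensor from the earlier snippet; the scope name here is made up:

output, states = tflearn.lstm(encoder_inputs, 200,
                              dropout=(0.8, 0.8),   # (input keep prob, output keep prob)
                              return_state=True,
                              scope="encoder_lstm_demo")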
Here _rnn_template() defines the template for the recurrent network (RNN), while BasicLSTMCell only defines a single Cell node inside the recurrence. Let us first look at how _rnn_template() unrolls each Cell; the core code is as follows:
# @File : tflearn.layers.recurrent.py
def _rnn_template(incoming, cell, dropout=None, return_seq=False,
                  return_state=False, initial_state=None, dynamic=False,
                  scope=None, reuse=False, name="LSTM"):
    """ RNN Layer Template. """
    with tf.variable_scope(scope, default_name=name, values=[incoming],
                           reuse=reuse) as scope:
        name = scope.name
        _cell = cell
        # Apply dropout
        if dropout:
            if type(dropout) in [tuple, list]:
                in_keep_prob = dropout[0]
                out_keep_prob = dropout[1]
            elif isinstance(dropout, float):
                in_keep_prob, out_keep_prob = dropout, dropout
            else:
                raise Exception("Invalid dropout type (must be a 2-D tuple of "
                                "float)")
            cell = DropoutWrapper(cell, in_keep_prob, out_keep_prob)  # dropout wrapping happens here, see reference [1]

        inference = incoming
        # If a tensor given, convert it to a per timestep list
        # (excerpt: input_shape and sequence_length are derived from `incoming` in lines omitted here)
        if type(inference) not in [list, np.array]:
            ndim = len(input_shape)
            assert ndim >= 3, "Input dim should be at least 3."
            axes = [1, 0] + list(range(2, ndim))
            inference = tf.transpose(inference, (axes))
            inference = tf.unstack(inference)

        outputs, state = _rnn(cell, inference, dtype=tf.float32,
                              initial_state=initial_state, scope=name,
                              sequence_length=sequence_length)
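Before moving on, here is a standalone sketch of the batch-major to time-major conversion that the tf.transpose and tf.unstack calls above perform, assuming TensorFlow 1.x and the (batch, 8, 200) input shape used in this post:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x assumed, as used by tflearn

x = tf.placeholder(tf.float32, shape=[None, 8, 200])  # (batch, time, word_vec_dim)
x_t = tf.transpose(x, [1, 0, 2])                      # (time, batch, word_vec_dim) = (8, ?, 200)
steps = tf.unstack(x_t)                               # Python list of 8 tensors, each (?, 200)

with tf.Session() as sess:
    out = sess.run(steps[0], feed_dict={x: np.zeros((4, 8, 200), np.float32)})
    print(out.shape)  # (4, 200) -- the input slice for timestep 0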
The tf.transpose and tf.unstack calls correspond to the compute nodes in the green box of Fig 1-2. _rnn then defines how the cell is unrolled over the timesteps; its core code is as follows:
# @File : tensorflow.python.ops.rnn.py
def static_rnn(cell,
               inputs,
               initial_state=None,
               dtype=None,
               sequence_length=None,
               scope=None):
  """Creates a recurrent neural network specified by RNNCell `cell`."""
  ......
  for time, input_ in enumerate(inputs):
    if time > 0:
      varscope.reuse_variables()
    # pylint: disable=cell-var-from-loop
    call_cell = lambda: cell(input_, state)
    # pylint: enable=cell-var-from-loop
    if sequence_length is not None:
      (output, state) = _rnn_step(
          time=time,
          sequence_length=sequence_length,
          min_sequence_length=min_sequence_length,
          max_sequence_length=max_sequence_length,
          zero_output=zero_output,
          state=state,
          call_cell=call_cell,
          state_size=cell.state_size)
    else:
      (output, state) = call_cell()
    outputs.append(output)
  return (outputs, state)
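To see the same unrolling outside of tflearn, here is a minimal sketch using TensorFlow's own BasicLSTMCell and static_rnn; the cell size and timestep count simply mirror this post's example, and the scope name is made up:

import tensorflow as tf  # TensorFlow 1.x assumed

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=200)
# One (?, 200) input tensor per timestep, matching the unstacked list built above
inputs = [tf.placeholder(tf.float32, [None, 200], name="step_%d" % t) for t in range(8)]
outputs, final_state = tf.nn.static_rnn(cell, inputs, dtype=tf.float32, scope="demo_rnn")
print(len(outputs))            # 8: one cell application per timestep, as TensorBoard shows
print(outputs[0].get_shape())  # (?, 200)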
In TensorBoard, double-clicking the encoder_lstm submodule inside encoder_lstm shows the unrolled RNN sequence:

1.2 BasicLSTMCell
Each node in the unrolled sequence is the BasicLSTMCell used by the tflearn.lstm() method above. tflearn's BasicLSTMCell is implemented following the paper Recurrent Neural Network Regularization. The paper first restates the per-variable, step-by-step updates from the classic LSTM literature as a batched matrix iteration, shown below:

Fig 1-4 Batched iteration equations
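The figure itself is not reproduced here; for reference, the batched form given in that paper (a transcription in the paper's notation, where T_{2n,4n} denotes an affine transform and sigm the logistic sigmoid) is:

\begin{aligned}
\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix}
&= \begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix}
   T_{2n,4n} \begin{pmatrix} h_t^{l-1} \\ h_{t-1}^{l} \end{pmatrix} \\
c_t^l &= f \odot c_{t-1}^l + i \odot g \\
h_t^l &= o \odot \tanh\left(c_t^l\right)
\end{aligned}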
where