On TensorFlow: understanding the sequence_length parameter of stack_bidirectional_dynamic_rnn and bidirectional_dynamic_rnn
I have been using the stack_bidirectional_dynamic_rnn API in the code for my graduation project, and I spent two or three days going back and forth on its sequence_length parameter, unsure whether it really needs to be passed in. After reading the relevant source code yesterday, I now have a reasonably clear picture.
Reading the source shows that, internally, this function calls bidirectional_dynamic_rnn in a for loop. Here is the relevant source (link):
def stack_bidirectional_dynamic_rnn(cells_fw,
                                    cells_bw,
                                    inputs,
                                    initial_states_fw=None,
                                    initial_states_bw=None,
                                    dtype=None,
                                    sequence_length=None,
                                    parallel_iterations=None,
                                    time_major=False,
                                    scope=None):
  """
  ...
  Args:
    sequence_length: (optional) An int32/int64 vector, size `[batch_size]`,
      containing the actual lengths for each of the sequences.
  ...
  """
  ...
  states_fw = []
  states_bw = []
  prev_layer = inputs

  with vs.variable_scope(scope or "stack_bidirectional_rnn"):
    for i, (cell_fw, cell_bw) in enumerate(zip(cells_fw, cells_bw)):
      initial_state_fw = None
      initial_state_bw = None
      if initial_states_fw:
        initial_state_fw = initial_states_fw[i]
      if initial_states_bw:
        initial_state_bw = initial_states_bw[i]

      with vs.variable_scope("cell_%d" % i):
        outputs, (state_fw, state_bw) = rnn.bidirectional_dynamic_rnn(
            cell_fw,
            cell_bw,
            prev_layer,
            initial_state_fw=initial_state_fw,
            initial_state_bw=initial_state_bw,
            sequence_length=sequence_length,
            parallel_iterations=parallel_iterations,
            dtype=dtype,
            time_major=time_major)
        # Concat the outputs to create the new input.
        prev_layer = array_ops.concat(outputs, 2)
      states_fw.append(state_fw)
      states_bw.append(state_bw)

  return prev_layer, tuple(states_fw), tuple(states_bw)
Note that the docstring marks sequence_length as optional. When debugging, however, I found that if sequence_length is not passed, then in the returned output of shape [max_time, layers_output] for a given example, the time steps beyond the example's actual length still carry distinct, changing values; the RNN simply keeps stepping forward over the padding. During training, this can therefore affect the final prediction.
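For reference, here is a minimal usage sketch of the tf.contrib.rnn version of the function (TensorFlow 1.x assumed; the two layers, the cell sizes and the all-zero padding convention are only for illustration):

import tensorflow as tf
import numpy as np

# Padded batch of shape [batch_size=2, max_time=10, input_dim=8], padded with zeros.
X = np.random.randn(2, 10, 8).astype(np.float32)
X[1, 6:] = 0.0                               # the second example really has length 6
seq_len = np.array([10, 6], dtype=np.int32)  # the actual lengths

inputs = tf.constant(X)
cells_fw = [tf.nn.rnn_cell.LSTMCell(5) for _ in range(2)]
cells_bw = [tf.nn.rnn_cell.LSTMCell(5) for _ in range(2)]

outputs, states_fw, states_bw = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    cells_fw, cells_bw, inputs,
    sequence_length=seq_len,                 # pass the real lengths explicitly
    dtype=tf.float32)

If the lengths are not known in advance, a common trick (assuming the padding steps are all zeros) is to derive them from the input itself, e.g. tf.reduce_sum(tf.cast(tf.sign(tf.reduce_max(tf.abs(inputs), axis=2)), tf.int32), axis=1).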
Now let's see how bidirectional_dynamic_rnn (link) handles the sequence_length parameter; note that this function internally calls dynamic_rnn.
def bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None,
                              initial_state_fw=None, initial_state_bw=None,
                              dtype=None, parallel_iterations=None,
                              swap_memory=False, time_major=False, scope=None):
  ...
  Args:
    sequence_length: (optional) An int32/int64 vector, size `[batch_size]`,
      containing the actual lengths for each of the sequences in the batch.
      If not provided, all batch entries are assumed to be full sequences; and
      time reversal is applied from time `0` to `max_time` for each sequence.
  ...
  with vs.variable_scope(scope or "bidirectional_rnn"):
    # Forward direction
    with vs.variable_scope("fw") as fw_scope:
      output_fw, output_state_fw = dynamic_rnn(
          cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
          initial_state=initial_state_fw, dtype=dtype,
          parallel_iterations=parallel_iterations, swap_memory=swap_memory,
          time_major=time_major, scope=fw_scope)
In the forward direction, sequence_length is simply passed down to dynamic_rnn.
Now comes the key part: in the backward direction, the input has to be reversed first, and only then run through dynamic_rnn.
    # Backward direction
    if not time_major:
      time_axis = 1
      batch_axis = 0
    else:
      time_axis = 0
      batch_axis = 1

    def _reverse(input_, seq_lengths, seq_axis, batch_axis):
      if seq_lengths is not None:
        return array_ops.reverse_sequence(
            input=input_, seq_lengths=seq_lengths,
            seq_axis=seq_axis, batch_axis=batch_axis)
      else:
        return array_ops.reverse(input_, axis=[seq_axis])

    with vs.variable_scope("bw") as bw_scope:

      def _map_reverse(inp):
        return _reverse(
            inp,
            seq_lengths=sequence_length,
            seq_axis=time_axis,
            batch_axis=batch_axis)

      inputs_reverse = nest.map_structure(_map_reverse, inputs)
      tmp, output_state_bw = dynamic_rnn(
          cell=cell_bw, inputs=inputs_reverse, sequence_length=sequence_length,
          initial_state=initial_state_bw, dtype=dtype,
          parallel_iterations=parallel_iterations, swap_memory=swap_memory,
          time_major=time_major, scope=bw_scope)
In the backward direction, the input is reversed via _map_reverse, which simply calls _reverse, and this is exactly where sequence_length is handled. If the parameter is not supplied, the whole sequence is flipped with array_ops.reverse, so for variable-length inputs the zero padding ends up at the front and is fed into the backward pass first. If sequence_length is supplied, the input is instead flipped with array_ops.reverse_sequence.
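For completeness, a little further down in the same function (TF 1.x source) the backward outputs are flipped back into the original time order with the same _reverse helper, so output_bw stays time-aligned with output_fw, and the pair is returned:

  output_bw = _reverse(
      tmp, seq_lengths=sequence_length,
      seq_axis=time_axis, batch_axis=batch_axis)

  outputs = (output_fw, output_bw)
  output_states = (output_state_fw, output_state_bw)

  return (outputs, output_states)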
Let's first look at the difference between the two reverse ops:
import tensorflow as tf
import numpy as np

with tf.Session() as sess:
    # a has shape [3, 4, 2]
    a = np.array([[[1, 2], [2, 1], [4, 3], [0, 0]],
                  [[2, 1], [3, 4], [0, 0], [0, 0]],
                  [[3, 5], [1, 3], [4, 6], [5, 2]]])
    seq_length = [3, 2, 4]
    b = tf.reverse_sequence(a, seq_length, 1, 0)
    c = tf.reverse(a, axis=[1])
    print(sess.run(b))
    print(sess.run(c))
Output:
[[[4 3]
  [2 1]
  [1 2]
  [0 0]]

 [[3 4]
  [2 1]
  [0 0]
  [0 0]]

 [[5 2]
  [4 6]
  [1 3]
  [3 5]]]
[[[0 0]
  [4 3]
  [2 1]
  [1 2]]

 [[0 0]
  [0 0]
  [3 4]
  [2 1]]

 [[5 2]
  [4 6]
  [1 3]
  [3 5]]]
As you can see, reverse_sequence only reverses the first seq_length steps of each example and leaves the padding zeros at the end, whereas reverse simply flips the entire time axis.
This gives us the first conclusion: the way stack_bidirectional_dynamic_rnn and bidirectional_dynamic_rnn handle sequence_length comes down to choosing between two different reverse ops, reverse_sequence and reverse.
So, what effect does passing (or not passing) sequence_length have on the outputs of stack_bidirectional_dynamic_rnn and bidirectional_dynamic_rnn?
As I see it, there are two ways it can affect the output:
- First, in the forward direction, an input shorter than max_time keeps being processed over the zero-padded time steps after the real data ends, producing unnecessary, meaningless values and lengthening training.
- Second, because this is a bidirectional RNN, in the backward direction (after a plain reverse) the leading zero-padded time steps are processed first, and all of that computation is wasted; the useful computation only begins once the real data is reached.
What still needs to be settled is whether passing sequence_length actually affects the accuracy of the results.
Starting from the basics, let's first look at the effect of sequence_length on the output of dynamic_rnn:
with tf.Session() as sess:
    X = np.random.randn(2, 10, 8)
    # The second example has an actual length of 6; the remaining steps are zero padding.
    X[1, 6:] = 0.0
    # X[1, :6] = 0.0
    X_lengths = [10, 6]

    cell = tf.nn.rnn_cell.LSTMCell(num_units=5)

    # With sequence_length
    outputs, last_states = tf.nn.dynamic_rnn(
        cell=cell,
        dtype=tf.float64,
        sequence_length=X_lengths,
        inputs=X)

    # Without sequence_length (same cell, so the weights are shared)
    outputs1, last_states1 = tf.nn.dynamic_rnn(
        cell=cell,
        dtype=tf.float64,
        inputs=X)
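    # (Sketch) One way to compare the two runs, reusing the graph built above;
    # the point to check is whether, with sequence_length, the outputs of the
    # second example are zeroed out beyond step 6, while without it the cell
    # keeps producing values over the padding steps.
    sess.run(tf.global_variables_initializer())
    o, o1 = sess.run([outputs, outputs1])
    print(o[1, 6:])   # with sequence_length
    print(o1[1, 6:])  # without sequence_length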