On TensorFlow: understanding the sequence_length parameter of stack_bidirectional_dynamic_rnn and bidirectional_dynamic_rnn
I have been using the stack_bidirectional_dynamic_rnn API in the code for my graduation project, and I spent two or three days going back and forth on its sequence_length parameter, unsure whether it really needs to be passed in. After reading the relevant source code yesterday, I now have a reasonably clear picture.
Reading the source shows that, internally, this function calls bidirectional_dynamic_rnn in a for loop. Here is the relevant source (link):
def stack_bidirectional_dynamic_rnn(cells_fw,
                                    cells_bw,
                                    inputs,
                                    initial_states_fw=None,
                                    initial_states_bw=None,
                                    dtype=None,
                                    sequence_length=None,
                                    parallel_iterations=None,
                                    time_major=False,
                                    scope=None):
  """
  ...
  Args:
    sequence_length: (optional) An int32/int64 vector, size `[batch_size]`,
      containing the actual lengths for each of the sequences.
  ...
  """
  ...
  states_fw = []
  states_bw = []
  prev_layer = inputs

  with vs.variable_scope(scope or "stack_bidirectional_rnn"):
    for i, (cell_fw, cell_bw) in enumerate(zip(cells_fw, cells_bw)):
      initial_state_fw = None
      initial_state_bw = None
      if initial_states_fw:
        initial_state_fw = initial_states_fw[i]
      if initial_states_bw:
        initial_state_bw = initial_states_bw[i]

      with vs.variable_scope("cell_%d" % i):
        outputs, (state_fw, state_bw) = rnn.bidirectional_dynamic_rnn(
            cell_fw,
            cell_bw,
            prev_layer,
            initial_state_fw=initial_state_fw,
            initial_state_bw=initial_state_bw,
            sequence_length=sequence_length,
            parallel_iterations=parallel_iterations,
            dtype=dtype,
            time_major=time_major)
        # Concat the outputs to create the new input.
        prev_layer = array_ops.concat(outputs, 2)
      states_fw.append(state_fw)
      states_bw.append(state_bw)

  return prev_layer, tuple(states_fw), tuple(states_bw)
Note that the docstring marks sequence_length as optional. When debugging, however, I found that if sequence_length is not passed, then in the returned output of shape [max_time, layers_output] for a given example, the time steps beyond the example's actual length still carry distinct, changing values; the RNN simply keeps stepping forward over the padding. During training, this can therefore affect the final prediction.
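For reference, here is a minimal usage sketch of the tf.contrib.rnn version of the function (TensorFlow 1.x assumed; the two layers, the cell sizes and the all-zero padding convention are only for illustration):

import tensorflow as tf
import numpy as np

# Padded batch of shape [batch_size=2, max_time=10, input_dim=8], padded with zeros.
X = np.random.randn(2, 10, 8).astype(np.float32)
X[1, 6:] = 0.0                               # the second example really has length 6
seq_len = np.array([10, 6], dtype=np.int32)  # the actual lengths

inputs = tf.constant(X)
cells_fw = [tf.nn.rnn_cell.LSTMCell(5) for _ in range(2)]
cells_bw = [tf.nn.rnn_cell.LSTMCell(5) for _ in range(2)]

outputs, states_fw, states_bw = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    cells_fw, cells_bw, inputs,
    sequence_length=seq_len,                 # pass the real lengths explicitly
    dtype=tf.float32)

If the lengths are not known in advance, a common trick (assuming the padding steps are all zeros) is to derive them from the input itself, e.g. tf.reduce_sum(tf.cast(tf.sign(tf.reduce_max(tf.abs(inputs), axis=2)), tf.int32), axis=1).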
Now let's see how bidirectional_dynamic_rnn (link) handles the sequence_length parameter; note that this function internally calls dynamic_rnn.
def bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None,
                              initial_state_fw=None, initial_state_bw=None,
                              dtype=None, parallel_iterations=None,
                              swap_memory=False, time_major=False, scope=None):
  ...
  Args:
    sequence_length: (optional) An int32/int64 vector, size `[batch_size]`,
      containing the actual lengths for each of the sequences in the batch.
      If not provided, all batch entries are assumed to be full sequences; and
      time reversal is applied from time `0` to `max_time` for each sequence.
  ...
  with vs.variable_scope(scope or "bidirectional_rnn"):
    # Forward direction
    with vs.variable_scope("fw") as fw_scope:
      output_fw, output_state_fw = dynamic_rnn(
          cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
          initial_state=initial_state_fw, dtype=dtype,
          parallel_iterations=parallel_iterations, swap_memory=swap_memory,
          time_major=time_major, scope=fw_scope)
In the forward direction, sequence_length is simply passed down to dynamic_rnn.
Now comes the key part: in the backward direction, the input has to be reversed first, and only then run through dynamic_rnn.
    # Backward direction
    if not time_major:
      time_axis = 1
      batch_axis = 0
    else:
      time_axis = 0
      batch_axis = 1

    def _reverse(input_, seq_lengths, seq_axis, batch_axis):
      if seq_lengths is not None:
        return array_ops.reverse_sequence(
            input=input_, seq_lengths=seq_lengths,
            seq_axis=seq_axis, batch_axis=batch_axis)
      else:
        return array_ops.reverse(input_, axis=[seq_axis])

    with vs.variable_scope("bw") as bw_scope:

      def _map_reverse(inp):
        return _reverse(
            inp,
            seq_lengths=sequence_length,
            seq_axis=time_axis,
            batch_axis=batch_axis)

      inputs_reverse = nest.map_structure(_map_reverse, inputs)
      tmp, output_state_bw = dynamic_rnn(
          cell=cell_bw, inputs=inputs_reverse, sequence_length=sequence_length,
          initial_state=initial_state_bw, dtype=dtype,
          parallel_iterations=parallel_iterations, swap_memory=swap_memory,
          time_major=time_major, scope=bw_scope)
In the backward direction, the input is reversed via _map_reverse, which simply calls _reverse, and this is exactly where sequence_length is handled. If the parameter is not supplied, the whole sequence is flipped with array_ops.reverse, so for variable-length inputs the zero padding ends up at the front and is fed into the backward pass first. If sequence_length is supplied, the input is instead flipped with array_ops.reverse_sequence.
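For completeness, a little further down in the same function (TF 1.x source) the backward outputs are flipped back into the original time order with the same _reverse helper, so output_bw stays time-aligned with output_fw, and the pair is returned:

  output_bw = _reverse(
      tmp, seq_lengths=sequence_length,
      seq_axis=time_axis, batch_axis=batch_axis)

  outputs = (output_fw, output_bw)
  output_states = (output_state_fw, output_state_bw)

  return (outputs, output_states)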
Let's first look at the difference between the two reverse ops:
import tensorflow as tf
import numpy as np

with tf.Session() as sess:
    # a has shape [3, 4, 2]
    a = np.array([[[1, 2], [2, 1], [4, 3], [0, 0]],
                  [[2, 1], [3, 4], [0, 0], [0, 0]],
                  [[3, 5], [1, 3], [4, 6], [5, 2]]])
    seq_length = [3, 2, 4]
    b = tf.reverse_sequence(a, seq_length, 1, 0)
    c = tf.reverse(a, axis=[1])
    print(sess.run(b))
    print(sess.run(c))
Output:
[[[4 3]
  [2 1]
  [1 2]
  [0 0]]

 [[3 4]
  [2 1]
  [0 0]
  [0 0]]

 [[5 2]
  [4 6]
  [1 3]
  [3 5]]]
[[[0 0]
  [4 3]
  [2 1]
  [1 2]]

 [[0 0]
  [0 0]
  [3 4]
  [2 1]]

 [[5 2]
  [4 6]
  [1 3]
  [3 5]]]
As you can see, reverse_sequence only reverses the first seq_length steps of each example and leaves the padding zeros at the end, whereas reverse simply flips the entire time axis.
This gives us the first conclusion: the way stack_bidirectional_dynamic_rnn and bidirectional_dynamic_rnn handle sequence_length comes down to choosing between two different reverse ops, reverse_sequence and reverse.
So, what effect does passing (or not passing) sequence_length have on the outputs of stack_bidirectional_dynamic_rnn and bidirectional_dynamic_rnn?
As I see it, there are two ways it can affect the output:
- First, in the forward direction, an input shorter than max_time keeps being processed over the zero-padded time steps after the real data ends, producing unnecessary, meaningless values and lengthening training.
- Second, because this is a bidirectional RNN, in the backward direction (after a plain reverse) the leading zero-padded time steps are processed first, and all of that computation is wasted; the useful computation only begins once the real data is reached.
What still needs to be settled is whether passing sequence_length actually affects the accuracy of the results.
Starting from the basics, let's first look at the effect of sequence_length on the output of dynamic_rnn:
with tf.Session() as sess:
    X = np.random.randn(2, 10, 8)
    # The second example has an actual length of 6; the remaining steps are zero padding.
    X[1, 6:] = 0.0
    # X[1, :6] = 0.0
    X_lengths = [10, 6]

    cell = tf.nn.rnn_cell.LSTMCell(num_units=5)

    # With sequence_length
    outputs, last_states = tf.nn.dynamic_rnn(
        cell=cell,
        dtype=tf.float64,
        sequence_length=X_lengths,
        inputs=X)

    # Without sequence_length (same cell, so the weights are shared)
    outputs1, last_states1 = tf.nn.dynamic_rnn(
        cell=cell,
        dtype=tf.float64,
        inputs=X)
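    # (Sketch) One way to compare the two runs, reusing the graph built above;
    # the point to check is whether, with sequence_length, the outputs of the
    # second example are zeroed out beyond step 6, while without it the cell
    # keeps producing values over the padding steps.
    sess.run(tf.global_variables_initializer())
    o, o1 = sess.run([outputs, outputs1])
    print(o[1, 6:])   # with sequence_length
    print(o1[1, 6:])  # without sequence_length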