TensorFlow tutorial: LSTMCell and BasicLSTMCell

abclhq2005

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/abclhq2005/article/details/78683530

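This article walks through the differences between two TensorFlow long short-term memory cells, LSTMCell and BasicLSTMCell, covering their parameters, internal computation, and use cases.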

tf.contrib.rnn.BasicLSTMCell
Defined in tensorflow/python/ops/rnn_cell_impl.py.

__init__(
    num_units,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None
)

Initialize the basic LSTM cell.

Args:

*num_units*: int, The number of units in the LSTM cell.
*forget_bias*: float, The bias added to forget gates (see above). Must be set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.
*state_is_tuple*: If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. The latter behavior will soon be deprecated.
*activation*: Activation function of the inner states. Default: tanh.
*reuse*: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
When restoring from CudnnLSTM-trained checkpoints, use CudnnCompatibleLSTMCell instead.
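To make the state_is_tuple option concrete, here is a small NumPy sketch of the two state layouts. It does not call TensorFlow; the sizes and the array names are assumptions chosen only for illustration.

```python
import numpy as np

batch_size, num_units = 2, 3            # illustrative sizes, not from the docs
c = np.zeros((batch_size, num_units))   # c_state (cell state)
h = np.ones((batch_size, num_units))    # m_state (hidden state / output)

# state_is_tuple=True: the state is kept as a 2-tuple (c, h).
state_tuple = (c, h)

# state_is_tuple=False: c and h are concatenated along the column axis.
state_concat = np.concatenate([c, h], axis=1)

# For BasicLSTMCell both halves have width num_units,
# so the concatenated state can be split back into equal parts.
c_back, h_back = np.split(state_concat, 2, axis=1)
```

The tuple form avoids the concatenate/split round trip at every step, which is one plausible reason the concatenated layout is marked for deprecation.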

tf.contrib.rnn.LSTMCell
Defined in tensorflow/python/ops/rnn_cell_impl.py.

__init__(
    num_units,
    use_peepholes=False,
    cell_clip=None,
    initializer=None,
    num_proj=None,
    proj_clip=None,
    num_unit_shards=None,
    num_proj_shards=None,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None
)

Initialize the parameters for an LSTM cell.

Args:

*num_units*: int, The number of units in the LSTM cell.
*use_peepholes*: bool, set True to enable diagonal/peephole connections.
*cell_clip*: (optional) A float value, if provided the cell state is clipped by this value prior to the cell output activation.
*initializer*: (optional) The initializer to use for the weight and projection matrices.
*num_proj*: (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.
*proj_clip*: (optional) A float value. If num_proj > 0 and proj_clip is provided, then the projected values are clipped elementwise to within [-proj_clip, proj_clip].
*num_unit_shards*: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
*num_proj_shards*: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
*forget_bias*: Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of training. Must be set manually to 0.0 when restoring from CudnnLSTM-trained checkpoints.
*state_is_tuple*: If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. This latter behavior will soon be deprecated.
*activation*: Activation function of the inner states. Default: tanh.
*reuse*: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
When restoring from CudnnLSTM-trained checkpoints, use CudnnCompatibleLSTMCell instead.
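The effect of forget_bias=1.0 at the start of training can be checked with a few lines of plain Python. This is a sketch under the assumption that the untrained forget-gate pre-activation is close to zero; the variable names are made up for the example.

```python
import math

def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))

# Assumption: before training, the forget-gate pre-activation f is near 0.
f_preactivation = 0.0

# Fraction of the previous cell state that is kept, i.e. sigmoid(f + forget_bias).
keep_without_bias = sigmoid(f_preactivation + 0.0)  # forget_bias=0.0
keep_with_bias = sigmoid(f_preactivation + 1.0)     # forget_bias=1.0 (default)
```

With the default bias the gate keeps roughly 73% of the previous cell state instead of 50%, so less is forgotten early in training.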

Differences between LSTMCell and BasicLSTMCell
BasicLSTMCell:

    # Excerpt from BasicLSTMCell.call(); c and h are the previous cell and hidden states.
    if self._linear is None:
      self._linear = _Linear([inputs, h], 4 * self._num_units, True)
    # i = input_gate, j = new_input, f = forget_gate, o = output_gate
    i, j, f, o = array_ops.split(
        value=self._linear([inputs, h]), num_or_size_splits=4, axis=1)

    new_c = (
        c * sigmoid(f + self._forget_bias) + sigmoid(i) * self._activation(j))
    new_h = self._activation(new_c) * sigmoid(o)

    if self._state_is_tuple:
      new_state = LSTMStateTuple(new_c, new_h)
    else:
      new_state = array_ops.concat([new_c, new_h], 1)
    return new_h, new_state
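The BasicLSTMCell excerpt above can be mirrored in NumPy to see the shapes involved. This is a minimal sketch, not the TensorFlow implementation: kernel and bias are hypothetical stand-ins for the weights held by _Linear, the activation is fixed to tanh, and the state is always returned as a tuple.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def basic_lstm_step(inputs, c, h, kernel, bias, forget_bias=1.0):
    """One BasicLSTMCell-style step; kernel/bias are assumed stand-ins for _Linear."""
    # A single matmul over [inputs, h] yields all four gate pre-activations.
    gates = np.concatenate([inputs, h], axis=1) @ kernel + bias
    # i = input_gate, j = new_input, f = forget_gate, o = output_gate
    i, j, f, o = np.split(gates, 4, axis=1)
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)

# Illustrative sizes and random weights (assumptions for the example).
rng = np.random.default_rng(0)
batch_size, input_size, num_units = 2, 5, 3
kernel = rng.normal(scale=0.1, size=(input_size + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)
x = rng.normal(size=(batch_size, input_size))
c0 = np.zeros((batch_size, num_units))
h0 = np.zeros((batch_size, num_units))

out, (c1, h1) = basic_lstm_step(x, c0, h0, kernel, bias)
```

Note that the output and the hidden state are the same array here, which matches the excerpt returning new_h both directly and inside new_state.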

LSTMCell:

    # Excerpt from LSTMCell.call(); c_prev and m_prev are the previous cell state and output.
    # i = input_gate, j = new_input, f = forget_gate, o = output_gate
    lstm_matrix = self._linear1([inputs, m_prev])
    i, j, f, o = array_ops.split(
        value=lstm_matrix, num_or_size_splits=4, axis=1)
    # Diagonal connections
    if self._use_peepholes and not self._w_f_diag:
      scope = vs.get_variable_scope()
      with vs.variable_scope(
          scope, initializer=self._initializer) as unit_scope:
        with vs.variable_scope(unit_scope):
          self._w_f_diag = vs.get_variable(
              "w_f_diag", shape=[self._num_units], dtype=dtype)
          self._w_i_diag = vs.get_variable(
              "w_i_diag", shape=[self._num_units], dtype=dtype)
          self._w_o_diag = vs.get_variable(
              "w_o_diag", shape=[self._num_units], dtype=dtype)

    if self._use_peepholes:
      c = (sigmoid(f + self._forget_bias + self._w_f_diag * c_prev) * c_prev +
           sigmoid(i + self._w_i_diag * c_prev) * self._activation(j))
    else:
      c = (sigmoid(f + self._forget_bias) * c_prev + sigmoid(i) *
           self._activation(j))

    if self._cell_clip is not None:
      # pylint: disable=invalid-unary-operand-type
      c = clip_ops.clip_by_value(c, -self._cell_clip, self._cell_clip)
      # pylint: enable=invalid-unary-operand-type
    if self._use_peepholes:
      m = sigmoid(o + self._w_o_diag * c) * self._activation(c)
    else:
      m = sigmoid(o) * self._activation(c)

    if self._num_proj is not None:
      if self._linear2 is None:
        scope = vs.get_variable_scope()
        with vs.variable_scope(scope, initializer=self._initializer):
          with vs.variable_scope("projection") as proj_scope:
            if self._num_proj_shards is not None:
              proj_scope.set_partitioner(
                  partitioned_variables.fixed_size_partitioner(
                      self._num_proj_shards))
            self._linear2 = _Linear(m, self._num_proj, False)
      m = self._linear2(m)

      if self._proj_clip is not None:
        # pylint: disable=invalid-unary-operand-type
        m = clip_ops.clip_by_value(m, -self._proj_clip, self._proj_clip)
        # pylint: enable=invalid-unary-operand-type

    new_state = (LSTMStateTuple(c, m) if self._state_is_tuple else
                 array_ops.concat([c, m], 1))
    return m, new_state
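The same kind of NumPy sketch can follow the LSTMCell excerpt above, including the optional peepholes, cell clipping, and projection. All parameter names (kernel, bias, peepholes, proj_kernel) are assumptions standing in for the TensorFlow variables, the activation is fixed to tanh, and variable sharding is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(inputs, c_prev, m_prev, kernel, bias, forget_bias=1.0,
              peepholes=None, cell_clip=None, proj_kernel=None, proj_clip=None):
    """One LSTMCell-style step; every weight argument is a hypothetical stand-in."""
    lstm_matrix = np.concatenate([inputs, m_prev], axis=1) @ kernel + bias
    i, j, f, o = np.split(lstm_matrix, 4, axis=1)
    if peepholes is not None:
        w_f, w_i, w_o = peepholes  # diagonal weights, each of shape [num_units]
        c = (sigmoid(f + forget_bias + w_f * c_prev) * c_prev
             + sigmoid(i + w_i * c_prev) * np.tanh(j))
    else:
        c = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(j)
    if cell_clip is not None:
        c = np.clip(c, -cell_clip, cell_clip)
    if peepholes is not None:
        m = sigmoid(o + w_o * c) * np.tanh(c)
    else:
        m = sigmoid(o) * np.tanh(c)
    if proj_kernel is not None:
        m = m @ proj_kernel  # linear projection without bias
        if proj_clip is not None:
            m = np.clip(m, -proj_clip, proj_clip)
    return m, (c, m)

# Illustrative sizes (assumptions); with projection, m_prev has width num_proj.
rng = np.random.default_rng(0)
batch_size, input_size, num_units, num_proj = 2, 5, 4, 2
kernel = rng.normal(size=(input_size + num_proj, 4 * num_units))
bias = np.zeros(4 * num_units)
peepholes = tuple(rng.normal(size=num_units) for _ in range(3))
proj_kernel = rng.normal(size=(num_units, num_proj))
x = rng.normal(size=(batch_size, input_size))
c_prev = np.full((batch_size, num_units), 5.0)
m_prev = np.zeros((batch_size, num_proj))

m, (c, _) = lstm_step(x, c_prev, m_prev, kernel, bias, peepholes=peepholes,
                      cell_clip=1.0, proj_kernel=proj_kernel, proj_clip=0.5)
```

With all optional arguments left at None, the function reduces to the BasicLSTMCell computation, which is the point of the comparison in this article.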

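How LSTMCell differs from BasicLSTMCell:
1. It adds use_peepholes (bool). When True, peephole connections are enabled: the gates also take the cell state as input through the diagonal weights w_f_diag, w_i_diag, and w_o_diag.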
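Written as equations, the peephole branch of the LSTMCell code above reads roughly as follows. The notation is introduced here for illustration: hats mark the gate pre-activations split from lstm_matrix, $W$ and $b$ stand for the weights of the first linear layer, $g$ is the activation (tanh by default), $b_{\text{forget}}$ is forget_bias, and $\odot$ is the elementwise product. Cell clipping and projection are omitted.

```latex
\begin{aligned}
(\hat{i}_t,\ \hat{j}_t,\ \hat{f}_t,\ \hat{o}_t) &= [x_t,\ m_{t-1}]\,W + b \\
c_t &= \sigma\!\left(\hat{f}_t + b_{\text{forget}} + w_f \odot c_{t-1}\right) \odot c_{t-1}
     + \sigma\!\left(\hat{i}_t + w_i \odot c_{t-1}\right) \odot g(\hat{j}_t) \\
m_t &= \sigma\!\left(\hat{o}_t + w_o \odot c_t\right) \odot g(c_t)
\end{aligned}
```

The forget and input gates look at the previous cell state $c_{t-1}$, while the output gate looks at the freshly computed $c_t$. Setting $w_f = w_i = w_o = 0$ recovers the BasicLSTMCell update.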
2. It adds cell_clip (float), which clips the cell state to the range [-cell_clip, cell_clip]:

c = clip_ops.clip_by_value(c, -self._cell_clip, self._cell_clip)
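As a quick illustration of what this clipping does to individual values, here is a NumPy sketch that uses np.clip as an analogue of clip_ops.clip_by_value. The numbers are made up for the example.

```python
import numpy as np

cell_clip = 3.0                             # illustrative threshold
c = np.array([[-7.5, -1.0, 0.2, 4.0]])      # made-up cell state values

# Values outside [-cell_clip, cell_clip] are moved to the nearest bound;
# values already inside the range are left unchanged.
c_clipped = np.clip(c, -cell_clip, cell_clip)
```

Because the clipping happens before the output activation, it bounds the state carried to the next step as well as the value fed into tanh.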

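3. It adds num_proj (int) and proj_clip (float). Unlike BasicLSTMCell, LSTMCell applies an extra linear projection (without bias) after the output m is computed, then clips the projected values to [-proj_clip, proj_clip]: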

m = self._linear2(m)  # built as _Linear(m, self._num_proj, False), i.e. no bias
m = clip_ops.clip_by_value(m, -self._proj_clip, self._proj_clip)
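One practical consequence of the projection is a smaller recurrent input, and therefore fewer weights. The helper below is a hypothetical sketch for counting parameters; it assumes the gate kernel has shape [input_size + recurrent_size, 4 * num_units] with a bias of size 4 * num_units, which is consistent with the single linear call over [inputs, m_prev] in the code above, and it assumes a projection kernel of shape [num_units, num_proj] with no bias.

```python
def lstm_param_count(input_size, num_units, num_proj=None, use_peepholes=False):
    """Hypothetical helper: counts weights under the shape assumptions stated above."""
    # With projection, the recurrent input m_prev has width num_proj, not num_units.
    recurrent_size = num_units if num_proj is None else num_proj
    count = (input_size + recurrent_size) * 4 * num_units  # gate kernel
    count += 4 * num_units                                  # gate bias
    if use_peepholes:
        count += 3 * num_units                              # w_f_diag, w_i_diag, w_o_diag
    if num_proj is not None:
        count += num_units * num_proj                       # projection kernel, no bias
    return count

# Illustrative sizes (assumptions for the example).
full = lstm_param_count(input_size=128, num_units=1024)
projected = lstm_param_count(input_size=128, num_units=1024, num_proj=256)
```

For these sizes the projected cell needs well under half the parameters of the unprojected one, while peepholes add only 3 * num_units extra weights.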