In TensorFlow 2, the Keras `layers` module packages the common recurrent network layers, including classes such as keras.layers.SimpleRNN, keras.layers.SimpleRNNCell, keras.layers.LSTM, and keras.layers.LSTMCell. The keras.layers.SimpleRNN, keras.layers.LSTM, and keras.layers.GRU classes are the TensorFlow 2 counterparts of the familiar RNN, LSTM, and GRU models. The inputs and outputs of these recurrent layers are briefly introduced below.
The output a recurrent network should return depends on the task: sometimes only the last time step's output is needed, and sometimes the outputs of all time steps are used.
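The `*Cell` classes mentioned above process a single time step; to run one over a whole sequence it is wrapped in `tf.keras.layers.RNN`. A minimal sketch (the layer size 4 and input shape are arbitrary examples):

```python
import numpy as np
import tensorflow as tf

# SimpleRNNCell computes one time step; tf.keras.layers.RNN unrolls it
# over the time axis, giving the same behavior as keras.layers.SimpleRNN.
cell = tf.keras.layers.SimpleRNNCell(4)
rnn = tf.keras.layers.RNN(cell)

inputs = np.random.random([32, 10, 8]).astype(np.float32)  # [batch, time, features]
output = rnn(inputs)
print(output.shape)  # (32, 4)
```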
RNN

Output at the last time step only
Input shape: [batch_size, time_step, input_features]
Output shape: [batch_size, time_step, units] (with return_sequences=True) or [batch_size, units]
import numpy as np
import tensorflow as tf

inputs = np.random.random([32, 10, 8]).astype(np.float32)
simple_rnn = tf.keras.layers.SimpleRNN(4)
output = simple_rnn(inputs)
Output shape
The output has shape [32, 4].
Outputs at every time step
simple_rnn = tf.keras.layers.SimpleRNN(
    4, return_sequences=True, return_state=True)
whole_sequence_output, final_state = simple_rnn(inputs)
Output shapes
whole_sequence_output has shape [32, 10, 4].
final_state has shape [32, 4].
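For SimpleRNN the hidden state *is* the per-step output, so the returned final_state equals the last time step of the full sequence output. A quick check (same shapes as above):

```python
import numpy as np
import tensorflow as tf

inputs = np.random.random([32, 10, 8]).astype(np.float32)
simple_rnn = tf.keras.layers.SimpleRNN(4, return_sequences=True, return_state=True)
whole_sequence_output, final_state = simple_rnn(inputs)

# The final state equals the output at the last time step.
assert np.allclose(whole_sequence_output[:, -1, :], final_state)
```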
LSTM
The LSTM maintains an internal cell state c_t; passing it through the LSTM's output gate finally yields the hidden state h_t.
Output at the last time step only
inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)
Output shape
The output has shape [32, 4].
Getting every time step's output and the final states
lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
print(whole_seq_output.shape)
print(final_memory_state.shape)
print(final_carry_state.shape)
Output shapes
whole_seq_output has shape [32, 10, 4].
final_memory_state has shape [32, 4].
final_carry_state has shape [32, 4].
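Tying this back to the states above: final_memory_state is the hidden state h_t, so it equals the last time step of whole_seq_output, while final_carry_state is the internal cell state c_t and is generally different. A quick check:

```python
import numpy as np
import tensorflow as tf

inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)

# final_memory_state is h_t: identical to the last step of the sequence output.
# final_carry_state is c_t: the cell state before the output gate is applied.
assert np.allclose(whole_seq_output[:, -1, :], final_memory_state)
```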
GRU

Output at the last time step only
inputs = tf.random.normal([32, 10, 8])
gru = tf.keras.layers.GRU(4)
output = gru(inputs)
print(output.shape)
Output shape
The output has shape [32, 4].
Outputs at every time step
gru = tf.keras.layers.GRU(4, return_sequences=True, return_state=True)
whole_sequence_output, final_state = gru(inputs)
print(whole_sequence_output.shape)
print(final_state.shape)
Output shapes
whole_sequence_output has shape [32, 10, 4].
final_state has shape [32, 4].
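One common use of the all-time-steps output: when recurrent layers are stacked, every layer except the last needs return_sequences=True so the next layer still receives a 3-D [batch, time, units] input. A sketch with two GRU layers (unit sizes are arbitrary examples):

```python
import tensorflow as tf

# Stacked GRUs: the first layer emits all time steps so the second layer
# receives a [batch, time, 16] sequence; the last layer returns only the
# final time step.
model = tf.keras.Sequential([
    tf.keras.layers.GRU(16, return_sequences=True),
    tf.keras.layers.GRU(4),
])

inputs = tf.random.normal([32, 10, 8])
output = model(inputs)
print(output.shape)  # (32, 4)
```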