Using LSTM and GRU in PyTorch and Understanding Their Parameters

SpadeA_Iverxin


Article link: https://blog.youkuaiyun.com/KuXiaoQuShiHuai/article/details/113367305



Using LSTM and GRU in PyTorch

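In PyTorch, nn.LSTM is constructed and called in almost the same way as nn.GRU. The main difference is that an LSTM also carries a cell state, so its initial state is passed as a tuple (h_0, c_0) and it returns (h_n, c_n) alongside output. The examples below use GRU.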
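A minimal sketch contrasting the two call signatures. The sizes (input_size=10, hidden_size=20, num_layers=2, batch 3, 5 time steps) are arbitrary values chosen only for illustration. nn.GRU keeps a single hidden tensor, while nn.LSTM takes and returns its state as a tuple (h, c).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
x = torch.randn(seq_len, batch, input_size)

# GRU: the recurrent state is a single tensor h
gru = nn.GRU(input_size, hidden_size, num_layers)
h0 = torch.zeros(num_layers, batch, hidden_size)
gru_out, gru_hn = gru(x, h0)

# LSTM: the recurrent state is a tuple (h, c) of two tensors with the same shape
lstm = nn.LSTM(input_size, hidden_size, num_layers)
c0 = torch.zeros(num_layers, batch, hidden_size)
lstm_out, (lstm_hn, lstm_cn) = lstm(x, (h0, c0))

print(gru_out.shape, gru_hn.shape)                    # torch.Size([5, 3, 20]) torch.Size([2, 3, 20])
print(lstm_out.shape, lstm_hn.shape, lstm_cn.shape)   # torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])
```

Everything else (input layout, output shape, the meaning of num_layers and bidirectional) is shared, which is why the rest of this article only walks through GRU.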

GRU

Initialization

rnn = nn.GRU(input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False)

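input_size: the number of features in each time step of the input.
hidden_size: the number of features in the hidden state (the width of the hidden layer).
num_layers: the number of stacked recurrent layers, default 1. Setting it to 2 stacks two GRU layers, with the second layer taking the outputs of the first as its input; together they are used as a single GRU module.
bias: True or False, whether the layers use bias weights; default True.
batch_first: True or False, default False. By default the input is 3-dimensional with shape (seq, batch, feature): the first dimension is the time step, the second is the batch, and the third is the features. If set to True, the input and output become (batch, seq, feature), i.e. batch, time step, features at each time step. This setting does not change the shape of h_0 and h_n.
dropout: if non-zero, applies dropout with this probability to the outputs of every GRU layer except the last one; default 0. It therefore has no effect when num_layers=1.
bidirectional: True or False, default False. If True, the GRU is bidirectional: each sequence is processed once in forward order and once in reverse order, and the outputs of the two directions are concatenated.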
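The following sketch shows how batch_first, bidirectional and num_layers change the tensor shapes. The concrete sizes are arbitrary assumptions for illustration; note in particular that h_n keeps its (num_layers * num_directions, batch, hidden_size) layout even when batch_first=True.

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size, num_layers = 3, 7, 8, 16, 2

# batch_first=True  -> input and output are (batch, seq, feature)
# bidirectional=True -> num_directions = 2
# dropout=0.2 is applied between the two stacked layers (only in training mode)
rnn = nn.GRU(input_size, hidden_size, num_layers=num_layers, bias=True,
             batch_first=True, dropout=0.2, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)   # (batch, seq, feature) because batch_first=True
output, h_n = rnn(x)                           # h_0 omitted, so it defaults to zeros

print(output.shape)  # torch.Size([3, 7, 32]): last dim = num_directions * hidden_size
print(h_n.shape)     # torch.Size([4, 3, 16]): (num_layers * num_directions, batch, hidden_size)
```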

Calling the module (inputs):

output, h_n = rnn(input, h_0)

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence for details.

The input is 3-dimensional: time step, batch, and features (with batch_first=True the order is batch, time step, features).

For rnn.pack_padded_sequence, see [pack_padded_sequence() and pad_packed_sequence() for RNNs in PyTorch].
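A minimal sketch of feeding variable-length sequences through the GRU with pack_padded_sequence and pad_packed_sequence from torch.nn.utils.rnn. The three lengths (5, 3, 2) and the feature sizes are made up for illustration. The practical benefit shown here is that h_n holds each sequence's state at its last real time step rather than after the padding.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

input_size, hidden_size = 10, 20
rnn = nn.GRU(input_size, hidden_size)

# Three sequences with true lengths 5, 3, 2, zero-padded to (seq_len=5, batch=3, input_size)
lengths = torch.tensor([5, 3, 2])
padded = torch.zeros(5, 3, input_size)
for b, L in enumerate(lengths.tolist()):
    padded[:L, b] = torch.randn(L, input_size)

# Packing lets the GRU skip the padded steps; enforce_sorted=False allows lengths in any order
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)
packed_out, h_n = rnn(packed)

# Unpack to a padded tensor; positions past each sequence's length are filled with 0
output, out_lengths = pad_packed_sequence(packed_out)
print(output.shape)          # torch.Size([5, 3, 20])
print(out_lengths.tolist())  # [5, 3, 2]
```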
h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

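The initial value of the hidden state, with shape (number of stacked GRU layers * number of directions [1 or 2], batch, hidden_size). If it is not provided, it defaults to zeros.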
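A small check of the two h_0 rules above, assuming an arbitrary 2-layer bidirectional GRU: passing an all-zero h_0 gives the same result as omitting it, and an h_0 that forgets the direction factor is rejected with a RuntimeError.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size, num_layers, num_directions, batch = 10, 20, 2, 2, 3
rnn = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)
x = torch.randn(5, batch, input_size)

# Explicit all-zero initial state: (num_layers * num_directions, batch, hidden_size)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)
out_explicit, hn_explicit = rnn(x, h0)

# h_0 omitted: PyTorch falls back to zeros, so the results match
out_default, hn_default = rnn(x)
print(torch.allclose(out_explicit, out_default))  # True

# Leaving out the num_directions factor produces a shape mismatch
try:
    rnn(x, torch.zeros(num_layers, batch, hidden_size))
    shape_error = False
except RuntimeError:
    shape_error = True
print(shape_error)  # True
```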

Outputs:

output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.

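Shape of output: (seq_len, batch, num_directions [1 or 2] * hidden_size). output contains the output of the last GRU layer at every time step. Use output.view(seq_len, batch, num_directions, hidden_size) to separate the direction dimension.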
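A sketch of separating the two directions of a bidirectional GRU's output, with sizes chosen arbitrarily for illustration. It confirms the layout described in the docstring: the last dimension is the forward half followed by the backward half, so the view and plain slicing give the same tensors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
rnn = nn.GRU(input_size, hidden_size, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
output, h_n = rnn(x)                                   # output: (5, 3, 2 * 20)

# Split the direction dimension: index 0 = forward, index 1 = backward
directions = output.view(seq_len, batch, 2, hidden_size)
forward_out = directions[:, :, 0]                      # (5, 3, 20)
backward_out = directions[:, :, 1]                     # (5, 3, 20)

# Equivalent to slicing the concatenated last dimension
print(torch.equal(forward_out, output[..., :hidden_size]))   # True
print(torch.equal(backward_out, output[..., hidden_size:]))  # True
```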
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size).

Shape of h_n: (number of stacked GRU layers * number of directions [1 or 2], batch, hidden_size). Use h_n.view(num_layers, num_directions, batch, hidden_size) to separate the layer and direction dimensions.
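A sketch relating h_n to output for a 2-layer bidirectional GRU (sizes are illustrative assumptions). output only covers the last layer, and the two directions finish at opposite ends of the sequence: the forward state ends at the last time step, while the backward state ends at the first time step because it reads the sequence in reverse.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
rnn = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)
output, h_n = rnn(x)                                   # h_n: (2 * 2, 3, 20)

# Separate layers and directions: h[layer, direction]
h = h_n.view(num_layers, 2, batch, hidden_size)
last_fwd = h[-1, 0]                                    # last layer, forward direction
last_bwd = h[-1, 1]                                    # last layer, backward direction

print(torch.allclose(last_fwd, output[-1, :, :hidden_size]))  # True
print(torch.allclose(last_bwd, output[0, :, hidden_size:]))   # True
```

One practical consequence: for a bidirectional GRU, output[-1] is not a full summary of the sequence, since its backward half has only seen the final time step. Concatenating h[-1, 0] and h[-1, 1] is the usual way to get both directions' final states.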

Example 1 (from the official source code):

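import torch
import torch.nn as nn

rnn = nn.GRU(10, 20, 2)          # input_size=10, hidden_size=20, num_layers=2 (two stacked GRU layers)
input = torch.randn(5, 3, 10)    # random input: seq_len=5 time steps, batch=3, 10 features per step
h0 = torch.randn(2, 3, 20)       # initial hidden state: (num_layers * num_directions=2, batch=3, hidden_size=20)
output, hn = rnn(input, h0)      # all 5 time steps of the 3 sequences are processed in one call
# output: (5, 3, 20), hn: (2, 3, 20)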

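Example 2: when the input contains only 1 time step, the GRU runs the recurrence for a single step. To process a sequence this way you have to write the loop over the time steps yourself, and the layer then effectively acts as a single GRU cell (PyTorch also provides nn.GRUCell for this use case).
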
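rnn = nn.GRU(10, 20, 1)          # input_size=10, hidden_size=20, a single GRU layer
input = torch.randn(1, 1, 10)    # random input: 1 time step, batch_size=1, 10 features
h0 = torch.randn(1, 1, 20)       # initial hidden state: (num_layers=1, batch=1, hidden_size=20)
output, hn = rnn(input, h0)      # one step only: here output and hn hold the same values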
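A minimal sketch of the hand-written loop that Example 2 implies, using the same nn.GRU(10, 20, 1) layer and an assumed 5-step sequence. It feeds one time step per call, carries the hidden state forward manually, and checks that the result matches passing the whole sequence in a single call (up to floating-point rounding).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.GRU(10, 20, 1)
seq = torch.randn(5, 1, 10)      # 5 time steps, batch_size=1, 10 features
h = torch.zeros(1, 1, 20)

# One time step per call; the hidden state is passed along by hand
step_outputs = []
for t in range(seq.size(0)):
    out_t, h = rnn(seq[t:t + 1], h)   # seq[t:t+1] keeps the time dimension: (1, 1, 10)
    step_outputs.append(out_t)
loop_output = torch.cat(step_outputs, dim=0)   # (5, 1, 20)

# Same computation in a single call over the full sequence
full_output, full_hn = rnn(seq, torch.zeros(1, 1, 20))
print(torch.allclose(loop_output, full_output, atol=1e-6))  # True
print(torch.allclose(h, full_hn, atol=1e-6))                # True
```

The step-by-step form is mainly useful when the next input depends on the previous output (for example autoregressive decoding); otherwise the single call is simpler and faster.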