In PyTorch, should the input a Bi-LSTM passes to the linear layer be lstm_out[:,-1,:] or torch.cat([h_n[-1,:,:],h

Article link: https://blog.youkuaiyun.com/w_x_yhao/article/details/124198755

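This article looks at two common readings of the Bi-LSTM network structure in PyTorch and uses example code to analyze the difference between the Bi-LSTM outputs `lstm_out[:,-1,:]` and `torch.cat([h_n[-1,:,:],h_n[-2,:,:]],dim=-1)`. The experiment shows that `lstm_out[:,-1,:]` is not the same as that concatenation: it holds the outputs of both directions at the last time step, where the backward direction has processed only the last input. To make full use of the Bi-LSTM, the article therefore recommends passing `torch.cat([h_n[-1,:,:],h_n[-2,:,:]],dim=-1)` to the linear layer.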


I. The confusion:

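Whether the input a Bi-LSTM passes to the linear layer in PyTorch should be lstm_out[:,-1,:] or torch.cat([h_n[-1,:,:],h_n[-2,:,:]],dim=-1) puzzled me for a long time.
The first source of confusion is conceptual: many of the Bi-LSTM network diagrams found online are somewhat misleading.
Take the following two diagrams, for example: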

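1. The forward pass (left to right over the sequence) contributes its last hidden state, and the backward pass (right to left over the sequence) contributes its first hidden state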

Image source: https://www.weiyangx.com/362968.html

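Under this reading, the backward pass would use only the data at time step x_t, so it would not achieve the goal of learning earlier positions from later ones. (The second diagram is therefore the one to follow.)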

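2. The forward pass (left to right over the sequence) contributes its last hidden state, and the backward pass (right to left over the sequence) also contributes its last hidden state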

Image source: https://blog.youkuaiyun.com/weixin_38981611/article/details/120366207
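
The objection to the first diagram can be checked numerically. The following is a minimal sketch, not code from the original experiment: the sequence length of 5, the random inputs, and names such as `x_changed` and `bwd_last` are assumptions made for illustration. It changes every time step except the last one and looks at the backward half of the output, assuming the documented `nn.LSTM` layout in which the last dimension of the output is the forward states followed by the backward states.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden_size = 2
lstm = nn.LSTM(input_size=2, hidden_size=hidden_size,
               batch_first=True, bidirectional=True)

x = torch.randn(10, 5, 2)                      # [batch, time_step, input_size]
x_changed = x.clone()
x_changed[:, :-1, :] = torch.randn(10, 4, 2)   # alter every step except the last

with torch.no_grad():
    out, _ = lstm(x)
    out_changed, _ = lstm(x_changed)

# Backward-direction half of the output at the last and first time steps
bwd_last, bwd_last_changed = out[:, -1, hidden_size:], out_changed[:, -1, hidden_size:]
bwd_first, bwd_first_changed = out[:, 0, hidden_size:], out_changed[:, 0, hidden_size:]

# The backward state at the last step has only seen the last input
print(torch.allclose(bwd_last, bwd_last_changed))    # True
# The backward state at the first step has seen the whole sequence
print(torch.allclose(bwd_first, bwd_first_changed))  # False
```

The backward state at the last time step is unaffected by the earlier inputs, which is why it carries no information learned from the rest of the sequence.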

II. lstm_out[:,-1,:] or torch.cat([h_n[-1,:,:],h_n[-2,:,:]],dim=-1)?

(I do not work in software; I design hardware accelerators, so what follows may contain mistakes. If you spot any, please point them out.)

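From the analysis of the two diagrams above, we know that what should be sent to the linear layer is the combination of the forward pass's last hidden state (left to right over the sequence) and the backward pass's last hidden state (right to left over the sequence).
When reading other people's code, I found that different authors pass different LSTM outputs to the linear layer in PyTorch: either lstm_out[:,-1,:] or torch.cat([h_n[-1,:,:],h_n[-2,:,:]],dim=-1). Note: for a unidirectional LSTM, lstm_out[:,-1,:] and h_n[-1,:,:] (the final hidden state of the last layer) are the same.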
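
The note on the unidirectional case can be checked with a short sketch. It assumes the same sizes as the experiment below (batch of 10, 2 time steps, 2 input features, 2 hidden units); the random input and the variable name `same` are illustrative only.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Unidirectional LSTM: bidirectional defaults to False
lstm = nn.LSTM(input_size=2, hidden_size=2, num_layers=1, batch_first=True)

x = torch.randn(10, 2, 2)   # [batch, time_step, input_size]
with torch.no_grad():
    lstm_out, (h_n, c_n) = lstm(x)

print(lstm_out.shape)   # torch.Size([10, 2, 2])
print(h_n.shape)        # torch.Size([1, 10, 2])

# The output at the last time step is the final hidden state of the last layer
same = torch.allclose(lstm_out[:, -1, :], h_n[-1])
print(same)             # True
```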
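This raises a question: for a Bi-LSTM, are lstm_out[:,-1,:] and torch.cat([h_n[-1,:,:],h_n[-2,:,:]],dim=-1) equivalent? In other words, is lstm_out[:,-1,:] the concatenation of the forward pass's last hidden state and the backward pass's last hidden state?
I wrote a small bidirectional LSTM network to check this, with: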
batch_size = 10
time_step = 2
input_size = 2
hidden_size = 2
num_layers = 1
The code is as follows:

import torch
from torch import nn

time_step = 2
input_size = 2
learning_rate = 0.001
hidden_size = 2
num_layers = 1
end_lenth=6500

torch.set_printoptions(threshold=3000000)
class Bi_LSTM(nn.Module):
    """Build the LSTM"""
    def __init__(self):
        super(Bi_LSTM, self).__init__()
        # LSTM layer
        self.lstm = nn.LSTM(input_size=input_size,      # number of input features
                            hidden_size=hidden_size,    # number of hidden units
                            num_layers=num_layers,      # number of stacked layers
                            batch_first=True,           # True: [batch, time_step, input_size]; False: [time_step, batch, input_size]
                            bidirectional=True)

        # Output layer
        self.output_layers= nn.Linear(in_features=hidden_size*2,    # number of input features (forward + backward)
                                       out_features=1)  # number of output features

    def forward(self, x):
        lstm_out, (h_n, h_c) = self.lstm(x, None)   # None: initial hidden and cell states default to zeros
        print('-'*40)
        print('lstm_out:    ',lstm_out.shape)
        print('h_n:         ',h_n.shape)
        print('-'*20,'lstm_out[:,-1,:]','-'*20)
        print(lstm_out[:,-1,:])
        print('-' * 20, 'lstm_out[:,0,:]', '-' * 20)
        print(lstm_out[:,0,:])
        print('-' * 20, 'h_n[-2,:,:]', '-' * 20)
        print(h_n[-2, :, :])
        print('-' * 20, 'h_n[-1,:,:]', '-' * 20)
        print(h_n[-1,:,:])
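        # h_n[-2] is the forward direction's final state, h_n[-1] the backward direction's
        output=torch.cat([h_n[-1,:,:],h_n[-2,:,:]],dim=-1)
        output = self.output_layers(output)
        return output

if __name__ == '__main__':
    # Run one forward pass on random data so that the prints above are produced
    model = Bi_LSTM()
    model(torch.randn(10, time_step, input_size))   # batch_size = 10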

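The forward pass prints lstm_out at the first and last time steps (there are only two time steps in total), as well as h_n[-1,:,:] and h_n[-2,:,:]. These are compared below: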

[Screenshot: values of lstm_out at the last and first time steps]
[Screenshot: values of h_n[-2,:,:] and h_n[-1,:,:]]

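The comparison shows that lstm_out[:,-1,:] is not the concatenation of h_n[-2,:,:] and h_n[-1,:,:]. Instead, lstm_out[:,-1,:] concatenates the forward pass's last hidden state (left to right over the sequence) with the backward pass's first hidden state (right to left over the sequence). The backward pass's last hidden state, h_n[-1,:,:], appears in the second half of lstm_out[:,0,:].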
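
The same comparison can be made without reading printed tensors by eye. The sketch below is one possible check, not the original experiment: it uses the same sizes but its own random weights and input, and the `checks` dictionary and the names `fwd_final` and `bwd_final` are introduced here for illustration. It relies on the documented `nn.LSTM` layout, where the last dimension of the output holds the forward states first and the backward states second.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, time_step, input_size, hidden_size = 10, 2, 2, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers=1,
               batch_first=True, bidirectional=True)

x = torch.randn(batch_size, time_step, input_size)
with torch.no_grad():
    lstm_out, (h_n, c_n) = lstm(x)

fwd_final, bwd_final = h_n[-2], h_n[-1]   # final states of each direction

checks = {
    # forward half at the last time step is the forward final state
    "lstm_out[:,-1,:H] == h_n[-2]": torch.allclose(lstm_out[:, -1, :hidden_size], fwd_final),
    # backward half at the first time step is the backward final state
    "lstm_out[:,0,H:] == h_n[-1]": torch.allclose(lstm_out[:, 0, hidden_size:], bwd_final),
    # the whole last time step is NOT the concatenation of both final states
    "lstm_out[:,-1,:] == cat(h_n[-2], h_n[-1])": torch.allclose(
        lstm_out[:, -1, :], torch.cat([fwd_final, bwd_final], dim=-1)),
}
for name, ok in checks.items():
    print(name, ok)
```

The first two checks should print True and the third False. Note that torch.cat([h_n[-1,:,:],h_n[-2,:,:]],dim=-1) places the backward state first; for a following linear layer this only permutes the input features, so either order works as long as it is used consistently.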