Day07: Implementing the LSTM Network Structure by Hand

Original article · last modified 2025-04-11 15:56:01

License: CC 4.0 BY-SA

Article tags:

#lstm #artificial-intelligence #natural-language-processing

First published 2025-04-08 18:50:53

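Implementing the LSTM network structure by hand

Goal

This article reproduces the basic computation of an LSTM (long short-term memory network) by hand with matrix operations and compares the result with PyTorch's LSTM layer. The aim is to help understand how an LSTM works and to provide hands-on support for tasks such as model conversion.

Implementation

Core formulas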

$\begin{aligned} \text{Input gate: } i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\ \text{Forget gate: } f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\ \text{Candidate cell state: } g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\ \text{Cell state update: } c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\ \text{Output gate: } o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\ \text{Hidden state: } h_t &= o_t \odot \tanh(c_t) \end{aligned}$

Original paper: the LSTM paper

1. sigmoid function

def sigmoid(x):
    return 1/(1 + np.exp(-x))

The classic sigmoid activation function maps any real input to an output in (0, 1) and is commonly used as a gating function in neural networks.
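The plain formula can overflow inside np.exp for very negative inputs; NumPy then emits a RuntimeWarning, although the result still rounds to 0. A minimal sketch of a numerically stable variant follows. It is not part of the original code, and the name stable_sigmoid is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that only ever exponentiates non-positive numbers (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(-np.abs(x))  # always in [0, 1], so it cannot overflow
    # For x >= 0 use 1 / (1 + e^-x); for x < 0 use the equivalent e^x / (1 + e^x)
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)
```

For moderate inputs it agrees with the simple 1/(1 + np.exp(-x)) form, so it could be swapped in without changing the comparison against PyTorch.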

2. numpy_lstm function

def numpy_lstm(x, state_dict):
    ...

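The numpy_lstm function implements the LSTM computation by hand with NumPy.
The input x is a sequence of time steps with shape (sequence_length, input_dim), i.e. the number of time steps and the input dimension.
state_dict is the state dictionary of the PyTorch LSTM layer; it contains the weights and biases.
hidden_size is the size of the hidden layer; it is not an argument of numpy_lstm.

An LSTM has three gates, the input gate (i_t), the forget gate (f_t), and the output gate (o_t), plus a candidate cell state (g). The weights and biases for all four are stored in state_dict.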
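To make the layout concrete, here is a small NumPy-only sketch that mimics the keys and shapes of a single-layer LSTM state_dict. The helper mock_lstm_state_dict is hypothetical and holds random NumPy arrays; the real state_dict holds PyTorch tensors, which is why numpy_lstm calls .numpy() on each entry.

```python
import numpy as np

def mock_lstm_state_dict(input_dim, hidden_size, seed=0):
    """NumPy stand-in (hypothetical) with the same keys and shapes as a one-layer LSTM state_dict."""
    rng = np.random.default_rng(seed)
    return {
        # The four gates are stacked along the first axis, hence 4 * hidden_size rows
        "weight_ih_l0": rng.standard_normal((4 * hidden_size, input_dim)),
        "weight_hh_l0": rng.standard_normal((4 * hidden_size, hidden_size)),
        "bias_ih_l0": rng.standard_normal(4 * hidden_size),
        "bias_hh_l0": rng.standard_normal(4 * hidden_size),
    }

shapes = {key: value.shape for key, value in mock_lstm_state_dict(12, 7).items()}
for key, shape in shapes.items():
    print(key, shape)
```

With input_dim = 12 and hidden_size = 7, as in the main function of this article, every parameter has 28 rows (or 28 entries for the biases).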

3. Splitting the weights and biases

w_i_x, w_f_x, w_c_x, w_o_x = weight_ih[0:hidden_size, :], ...
w_i_h, w_f_h, w_c_h, w_o_h = weight_hh[0:hidden_size, :], ...
b_i_x, b_f_x, b_c_x, b_o_x = bias_ih[0:hidden_size], ...
b_i_h, b_f_h, b_c_h, b_o_h = bias_hh[0:hidden_size], ...

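This part takes the weights and biases of the LSTM layer defined in PyTorch and splits them into per-gate weights and biases. PyTorch stores the four gates' parameters stacked along the first dimension, in the order input gate, forget gate, cell candidate, output gate, so a manual implementation has to slice them apart.

4. Matrix concatenation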
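Because the four blocks have equal size, the manual slicing can also be written with np.split. The sketch below uses a random array as a stand-in for weight_ih_l0 and checks that the result matches the slice used in the article; it is an alternative spelling, not the code the article runs.

```python
import numpy as np

hidden_size, input_dim = 7, 12
rng = np.random.default_rng(0)
weight_ih = rng.standard_normal((4 * hidden_size, input_dim))  # stand-in for weight_ih_l0

# Four equal blocks along axis 0, in the gate order: input, forget, cell candidate, output
w_i_x, w_f_x, w_c_x, w_o_x = np.split(weight_ih, 4, axis=0)

same_as_slicing = np.array_equal(w_f_x, weight_ih[hidden_size:hidden_size * 2, :])
print(w_i_x.shape, same_as_slicing)
```

The same call works for the bias vectors, since they are one-dimensional arrays of length 4 * hidden_size.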

w_i = np.concatenate([w_i_h, w_i_x], axis=1)
w_f = np.concatenate([w_f_h, w_f_x], axis=1)
w_c = np.concatenate([w_c_h, w_c_x], axis=1)
w_o = np.concatenate([w_o_h, w_o_x], axis=1)

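Each gate's weight matrix is obtained by concatenating, along the column axis, the weights for the previous hidden state (h_{t-1}) and the weights for the input (x_t). The order, recurrent weights first, must match the order in which the hidden state and the input are concatenated into hx in the time-step loop.

5. Computing the LSTM state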
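A quick way to convince yourself that the fused form is equivalent to the separate terms in the formulas is to compare both on random data. The sketch below does this for a single gate; all arrays are random stand-ins, and the two biases are summed in the same way as in the full listing.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_dim = 7, 12
w_x = rng.standard_normal((hidden_size, input_dim))    # one gate's input weights
w_h = rng.standard_normal((hidden_size, hidden_size))  # one gate's recurrent weights
b_x = rng.standard_normal(hidden_size)
b_h = rng.standard_normal(hidden_size)
x_t = rng.standard_normal((1, input_dim))
h_prev = rng.standard_normal((1, hidden_size))

# Separate form, term by term as in the formulas
separate = x_t @ w_x.T + b_x + h_prev @ w_h.T + b_h

# Fused form: the column order of w must match the order inside hx
w = np.concatenate([w_h, w_x], axis=1)
hx = np.concatenate([h_prev, x_t], axis=1)
fused = hx @ w.T + (b_x + b_h)

print(np.allclose(separate, fused))
```

Swapping the order in only one of the two concatenations would either raise a shape error or, when hidden_size equals input_dim, silently give wrong results.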

f_t = sigmoid(np.dot(hx, w_f.T) + b_f)
i_t = sigmoid(np.dot(hx, w_i.T) + b_i)
g = np.tanh(np.dot(hx, w_c.T) + b_c)
c_t = f_t * c_t + i_t * g
o_t = sigmoid(np.dot(hx, w_o.T) + b_o)
h_t = o_t * np.tanh(c_t)

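For each time step x_t, we compute the forget gate (f_t), the input gate (i_t), the candidate cell state (g), and the output gate (o_t) with matrix operations.
The current cell state c_t and hidden state h_t are then computed from these gate outputs.
c_t is the LSTM's cell state, which stores long-term memory; h_t is the hidden state at the current time step, which is passed on to the next layer or used as the model's output.

6. The LSTM time-step loop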
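The six lines above can be wrapped into a single step function, which makes them easy to sanity-check in isolation. This is a minimal sketch: lstm_step and the dictionary layout keyed by "i", "f", "c", "o" are hypothetical and not how the article's code organizes its variables.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(hx, c_prev, w, b):
    """One LSTM step; w and b are dicts keyed by "i", "f", "c", "o" (hypothetical layout)."""
    f_t = sigmoid(hx @ w["f"].T + b["f"])
    i_t = sigmoid(hx @ w["i"].T + b["i"])
    g = np.tanh(hx @ w["c"].T + b["c"])
    c_t = f_t * c_prev + i_t * g
    o_t = sigmoid(hx @ w["o"].T + b["o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# With all-zero parameters every gate is sigmoid(0) = 0.5 and g = tanh(0) = 0
hidden_size, input_dim = 3, 4
w = {k: np.zeros((hidden_size, hidden_size + input_dim)) for k in "ifco"}
b = {k: np.zeros(hidden_size) for k in "ifco"}
hx = np.ones((1, hidden_size + input_dim))
c_prev = np.full((1, hidden_size), 2.0)

h_t, c_t = lstm_step(hx, c_prev, w, b)
# c_t = 0.5 * 2.0 + 0.5 * 0 = 1.0 and h_t = 0.5 * tanh(1.0)
print(c_t, h_t)
```

The zero-parameter case is useful because the expected values can be worked out by hand from the formulas.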

for x_t in x:
    x_t = x_t[np.newaxis, :]
    hx = np.concatenate([h_t, x_t], axis=1)
    f_t = sigmoid(np.dot(hx, w_f.T) + b_f)
    i_t = sigmoid(np.dot(hx, w_i.T) + b_i)
    g = np.tanh(np.dot(hx, w_c.T) + b_c)
    c_t = f_t * c_t + i_t * g
    o_t = sigmoid(np.dot(hx, w_o.T) + b_o)
    h_t = o_t * np.tanh(c_t)
    sequence_output.append(h_t)

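At each time step, the input x_t is concatenated with the previous hidden state (still held in h_t before it is overwritten), and the current state and output are computed with the formulas above.
Through this loop the LSTM processes the whole sequence and emits the hidden state h_t at every time step; the function finally returns the sequence of all per-step outputs.

7. Main function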
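One detail worth noting when comparing outputs: each h_t has shape (1, hidden_size), so stacking the list gives an array of shape (length, 1, hidden_size), while a PyTorch LSTM with batch_first=True returns (1, length, hidden_size). The sketch below uses dummy per-step states to show the two layouts and how a transpose lines them up; the values are placeholders, not real LSTM outputs.

```python
import numpy as np

length, hidden_size = 6, 7
# Stand-ins for the hidden states collected in the loop, each of shape (1, hidden_size)
sequence_output = [np.full((1, hidden_size), float(t)) for t in range(length)]

stacked = np.array(sequence_output)       # (length, 1, hidden_size), as the loop collects them
batch_first = stacked.transpose(1, 0, 2)  # (1, length, hidden_size), the batch_first layout

print(stacked.shape, batch_first.shape)
```

The numbers are the same in both layouts, which is why the printed outputs can still be compared row by row.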

if __name__ == "__main__":
    # Build an input
    length = 6
    input_dim = 12
    hidden_size = 7
    x = np.random.random((length, input_dim))

    # Use PyTorch's LSTM layer
    torch_lstm = nn.LSTM(input_dim, hidden_size, batch_first=True)
    for key, weight in torch_lstm.state_dict().items():
        print(key, weight.shape)
    torch_sequence_output, (torch_h, torch_c) = torch_lstm(torch.Tensor([x]))
    numpy_sequence_output, (numpy_h, numpy_c) = numpy_lstm(x, torch_lstm.state_dict())
    print(torch_sequence_output)
    print(numpy_sequence_output)
    print("--------")
    print(torch_h)
    print(numpy_h)
    print("--------")
    print(torch_c)
    print(numpy_c)

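The main block generates a random input sequence x and defines a PyTorch LSTM layer.
It runs a forward pass through the PyTorch LSTM layer, then passes that layer's state dictionary (state_dict) to numpy_lstm to carry out the same computation.
The output has two parts: the per-step output of the LSTM layer (sequence_output) and the final hidden and cell states (h_t and c_t).
torch_sequence_output and numpy_sequence_output are printed one after the other to check whether the hand-written LSTM agrees with PyTorch's LSTM layer.

The full code follows: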


import torch
import torch.nn as nn
import numpy as np

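'''
Reproduce some basic model structures with matrix operations.
Knowing the computational details of a model helps deepen understanding of it and supports work such as model conversion.
'''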
def sigmoid(x):
    return 1/(1 + np.exp(-x))

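# Take the weights out of PyTorch's LSTM layer and reproduce the LSTM computation with NumPy matrix operations
def numpy_lstm(x, state_dict):
    weight_ih = state_dict["weight_ih_l0"].numpy()
    weight_hh = state_dict["weight_hh_l0"].numpy()
    bias_ih = state_dict["bias_ih_l0"].numpy()
    bias_hh = state_dict["bias_hh_l0"].numpy()
    # weight_hh has shape (4 * hidden_size, hidden_size); infer hidden_size here instead of relying on a global
    hidden_size = weight_hh.shape[1]
    # PyTorch stores the weights of the four gates stacked together; split them apart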
    w_i_x, w_f_x, w_c_x, w_o_x = weight_ih[0:hidden_size, :], \
                                 weight_ih[hidden_size:hidden_size*2, :],\
                                 weight_ih[hidden_size*2:hidden_size*3, :],\
                                 weight_ih[hidden_size*3:hidden_size*4, :]
    w_i_h, w_f_h, w_c_h, w_o_h = weight_hh[0:hidden_size, :], \
                                 weight_hh[hidden_size:hidden_size * 2, :], \
                                 weight_hh[hidden_size * 2:hidden_size * 3, :], \
                                 weight_hh[hidden_size * 3:hidden_size * 4, :]
    b_i_x, b_f_x, b_c_x, b_o_x = bias_ih[0:hidden_size], \
                                 bias_ih[hidden_size:hidden_size * 2], \
                                 bias_ih[hidden_size * 2:hidden_size * 3], \
                                 bias_ih[hidden_size * 3:hidden_size * 4]
    b_i_h, b_f_h, b_c_h, b_o_h = bias_hh[0:hidden_size], \
                                 bias_hh[hidden_size:hidden_size * 2], \
                                 bias_hh[hidden_size * 2:hidden_size * 3], \
                                 bias_hh[hidden_size * 3:hidden_size * 4]
    w_i = np.concatenate([w_i_h, w_i_x], axis=1)
    w_f = np.concatenate([w_f_h, w_f_x], axis=1)
    w_c = np.concatenate([w_c_h, w_c_x], axis=1)
    w_o = np.concatenate([w_o_h, w_o_x], axis=1)
    b_f = b_f_h + b_f_x
    b_i = b_i_h + b_i_x
    b_c = b_c_h + b_c_x
    b_o = b_o_h + b_o_x
    c_t = np.zeros((1, hidden_size))
    h_t = np.zeros((1, hidden_size))
    sequence_output = []
    for x_t in x:
        x_t = x_t[np.newaxis, :]
        hx = np.concatenate([h_t, x_t], axis=1)
        # f_t = sigmoid(np.dot(x_t, w_f_x.T) + b_f_x + np.dot(h_t, w_f_h.T) + b_f_h)
        f_t = sigmoid(np.dot(hx, w_f.T) + b_f)
        # i_t = sigmoid(np.dot(x_t, w_i_x.T) + b_i_x + np.dot(h_t, w_i_h.T) + b_i_h)
        i_t = sigmoid(np.dot(hx, w_i.T) + b_i)
        # g = np.tanh(np.dot(x_t, w_c_x.T) + b_c_x + np.dot(h_t, w_c_h.T) + b_c_h)
        g = np.tanh(np.dot(hx, w_c.T) + b_c)
        c_t = f_t * c_t + i_t * g
        # o_t = sigmoid(np.dot(x_t, w_o_x.T) + b_o_x + np.dot(h_t, w_o_h.T) + b_o_h)
        o_t = sigmoid(np.dot(hx, w_o.T) + b_o)
        h_t = o_t * np.tanh(c_t)
        sequence_output.append(h_t)
    return np.array(sequence_output), (h_t, c_t)


if __name__ == "__main__":
    # Build an input
    length = 6
    input_dim = 12
    hidden_size = 7
    x = np.random.random((length, input_dim))
    # print(x)

    # Use PyTorch's LSTM layer
    torch_lstm = nn.LSTM(input_dim, hidden_size, batch_first=True)
    for key, weight in torch_lstm.state_dict().items():
        print(key, weight.shape)
    torch_sequence_output, (torch_h, torch_c) = torch_lstm(torch.Tensor([x]))
    numpy_sequence_output, (numpy_h, numpy_c) = numpy_lstm(x, torch_lstm.state_dict())
    print(torch_sequence_output)
    print(numpy_sequence_output)
    print("--------")
    print(torch_h)
    print(numpy_h)
    print("--------")
    print(torch_c)
    print(numpy_c)

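Summary

Implementing the LSTM's matrix operations by hand and comparing them with PyTorch's LSTM layer gives a deeper understanding of the network's internal computation, in particular how state is carried and updated from one time step to the next. Working directly with the weights and biases makes it clearer how an LSTM actually operates.

Output comparison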

tensor([[[-0.1205, -0.1704,  0.0310, -0.0989,  0.0871,  0.1023,  0.0655],
         [-0.1579, -0.2710,  0.0359, -0.2037,  0.1490,  0.1068,  0.1143],
         [-0.1529, -0.3607,  0.0552, -0.2770,  0.2027,  0.0644,  0.1113],
         [-0.1655, -0.3208,  0.0550, -0.3131,  0.2490,  0.0651,  0.1200],
         [-0.1228, -0.2922,  0.0145, -0.2761,  0.2012,  0.0490,  0.1081],
         [-0.1530, -0.2648,  0.0325, -0.3218,  0.2041,  0.0555,  0.0857]]],
       grad_fn=<TransposeBackward0>)
[[[-0.120453   -0.17039223  0.03101122 -0.098935    0.08713611
    0.10227061  0.06551073]]

 [[-0.15789323 -0.27098775  0.03587566 -0.20372095  0.14899549
    0.10679599  0.11429273]]

 [[-0.15292168 -0.36072989  0.05515171 -0.27695148  0.20270405
    0.06443658  0.1112967 ]]

 [[-0.16553585 -0.32076167  0.05504265 -0.31307045  0.24900938
    0.06508521  0.12003417]]

 [[-0.12280774 -0.29223718  0.01452096 -0.27610671  0.20121294
    0.04902264  0.10812964]]

 [[-0.15301787 -0.26477319  0.0324808  -0.32181077  0.20412219
    0.05554343  0.08568159]]]

Process finished with exit code 0
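Rather than comparing the printed numbers by eye, the agreement can be checked numerically. The sketch below copies the first time step from the two printouts above and compares them with np.allclose. The tolerance of 1e-4 is an assumption chosen because the PyTorch printout is rounded to four decimal places.

```python
import numpy as np

# First time step, copied from the printed output above
torch_row = np.array([-0.1205, -0.1704, 0.0310, -0.0989, 0.0871, 0.1023, 0.0655])
numpy_row = np.array([-0.120453, -0.17039223, 0.03101122, -0.098935,
                      0.08713611, 0.10227061, 0.06551073])

max_abs_diff = np.max(np.abs(torch_row - numpy_row))
matches = np.allclose(torch_row, numpy_row, rtol=0.0, atol=1e-4)
print(max_abs_diff, matches)
```

In a real run the same check could be applied to the full arrays instead of copied values, with the tolerance chosen to allow for PyTorch computing in float32 while the NumPy version works in float64.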