MXNet Bidirectional Recurrent Neural Networks: A Bidirectional RNN with a Single Hidden Layer (Program)


License: CC 4.0 BY-SA

Tags: #NeuralNetworks #DeepLearning #Algorithms


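This post describes and implements a bidirectional recurrent neural network (RNN) with a single hidden layer, worked through as an exercise from the book *Dive into Deep Learning* (《动手学深度学习》). It explains the mathematics of the bidirectional RNN and gives a concrete implementation showing how to build and train the model with MXNet.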


My own solution to an exercise from Chapter 6, Section 10 of *Dive into Deep Learning*.

The figure below shows the architecture of a bidirectional RNN with a single hidden layer.

[Figure: architecture of a bidirectional RNN with a single hidden layer (../img/birnn.svg)]

We now give the precise definitions. Given a minibatch input $\boldsymbol{X}_t \in \mathbb{R}^{n \times d}$ at time step $t$ ($n$ examples, $d$ inputs) and a hidden-layer activation function $\phi$, let the forward hidden state at this time step be $\overrightarrow{\boldsymbol{H}}_t \in \mathbb{R}^{n \times h}$ (with $h$ forward hidden units) and the backward hidden state be $\overleftarrow{\boldsymbol{H}}_t \in \mathbb{R}^{n \times h}$ (with $h$ backward hidden units). The two are computed separately:

$$\begin{aligned} \overrightarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(f)} + \overrightarrow{\boldsymbol{H}}_{t-1} \boldsymbol{W}_{hh}^{(f)} + \boldsymbol{b}_h^{(f)}),\\ \overleftarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(b)} + \overleftarrow{\boldsymbol{H}}_{t+1} \boldsymbol{W}_{hh}^{(b)} + \boldsymbol{b}_h^{(b)}), \end{aligned}$$

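where the weights $\boldsymbol{W}_{xh}^{(f)} \in \mathbb{R}^{d \times h}$, $\boldsymbol{W}_{hh}^{(f)} \in \mathbb{R}^{h \times h}$, $\boldsymbol{W}_{xh}^{(b)} \in \mathbb{R}^{d \times h}$, $\boldsymbol{W}_{hh}^{(b)} \in \mathbb{R}^{h \times h}$ and the biases $\boldsymbol{b}_h^{(f)} \in \mathbb{R}^{1 \times h}$, $\boldsymbol{b}_h^{(b)} \in \mathbb{R}^{1 \times h}$ are all model parameters. Because $\overrightarrow{\boldsymbol{H}}_t$ depends on $\overrightarrow{\boldsymbol{H}}_{t-1}$ while $\overleftarrow{\boldsymbol{H}}_t$ depends on $\overleftarrow{\boldsymbol{H}}_{t+1}$, the forward states are computed in increasing order of $t$ and the backward states in decreasing order of $t$.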
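To make the two recurrences concrete, here is a worked unrolling for a sequence of length 3. It assumes zero boundary states $\overrightarrow{\boldsymbol{H}}_0 = \overleftarrow{\boldsymbol{H}}_4 = \boldsymbol{0}$; the text does not fix the initial states, although the code below initializes them with zeros.

$$
\begin{aligned}
\overrightarrow{\boldsymbol{H}}_1 &= \phi(\boldsymbol{X}_1 \boldsymbol{W}_{xh}^{(f)} + \boldsymbol{b}_h^{(f)}), &
\overleftarrow{\boldsymbol{H}}_3 &= \phi(\boldsymbol{X}_3 \boldsymbol{W}_{xh}^{(b)} + \boldsymbol{b}_h^{(b)}),\\
\overrightarrow{\boldsymbol{H}}_2 &= \phi(\boldsymbol{X}_2 \boldsymbol{W}_{xh}^{(f)} + \overrightarrow{\boldsymbol{H}}_1 \boldsymbol{W}_{hh}^{(f)} + \boldsymbol{b}_h^{(f)}), &
\overleftarrow{\boldsymbol{H}}_2 &= \phi(\boldsymbol{X}_2 \boldsymbol{W}_{xh}^{(b)} + \overleftarrow{\boldsymbol{H}}_3 \boldsymbol{W}_{hh}^{(b)} + \boldsymbol{b}_h^{(b)}),\\
\overrightarrow{\boldsymbol{H}}_3 &= \phi(\boldsymbol{X}_3 \boldsymbol{W}_{xh}^{(f)} + \overrightarrow{\boldsymbol{H}}_2 \boldsymbol{W}_{hh}^{(f)} + \boldsymbol{b}_h^{(f)}), &
\overleftarrow{\boldsymbol{H}}_1 &= \phi(\boldsymbol{X}_1 \boldsymbol{W}_{xh}^{(b)} + \overleftarrow{\boldsymbol{H}}_2 \boldsymbol{W}_{hh}^{(b)} + \boldsymbol{b}_h^{(b)}).
\end{aligned}
$$

Reading down each column gives the evaluation order. At the middle step, $\overrightarrow{\boldsymbol{H}}_2$ depends on $\boldsymbol{X}_1, \boldsymbol{X}_2$ and $\overleftarrow{\boldsymbol{H}}_2$ on $\boldsymbol{X}_2, \boldsymbol{X}_3$, so together they cover the whole sequence.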

We then concatenate the hidden states of the two directions, $\overrightarrow{\boldsymbol{H}}_t$ and $\overleftarrow{\boldsymbol{H}}_t$, to obtain the hidden state $\boldsymbol{H}_t \in \mathbb{R}^{n \times 2h}$ and feed it into the output layer, which computes the output $\boldsymbol{O}_t \in \mathbb{R}^{n \times q}$ ($q$ outputs):

$$\boldsymbol{O}_t = \boldsymbol{H}_t \boldsymbol{W}_{hq} + \boldsymbol{b}_q,$$

where the weight $\boldsymbol{W}_{hq} \in \mathbb{R}^{2h \times q}$ and the bias $\boldsymbol{b}_q \in \mathbb{R}^{1 \times q}$ are the model parameters of the output layer. The two directions may also use different numbers of hidden units.
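A small NumPy sketch (NumPy stands in for MXNet's `nd` here; all sizes are made up) of the concatenation and output layer when the two directions use different hidden sizes $h_f \neq h_b$. In that case $\boldsymbol{W}_{hq}$ needs $h_f + h_b$ rows instead of $2h$.

```python
import numpy as np

# Hypothetical sizes: batch n, forward/backward hidden units h_f != h_b, outputs q
n, h_f, h_b, q = 4, 3, 5, 2
rng = np.random.default_rng(0)

H_fwd = rng.normal(size=(n, h_f))  # stand-in for the forward state at step t
H_bwd = rng.normal(size=(n, h_b))  # stand-in for the backward state at step t

# Concatenate along the feature axis: H_t has shape (n, h_f + h_b)
H_t = np.concatenate([H_fwd, H_bwd], axis=1)

# W_hq needs h_f + h_b rows (2h when both directions use h units)
W_hq = rng.normal(scale=0.01, size=(h_f + h_b, q))
b_q = np.zeros((1, q))
O_t = H_t @ W_hq + b_q  # b_q broadcasts over the n rows

# Equivalent view: each direction gets its own block of rows in W_hq
O_split = H_fwd @ W_hq[:h_f] + H_bwd @ W_hq[h_f:] + b_q
print(H_t.shape, O_t.shape, np.allclose(O_t, O_split))
```

The last check shows that concatenating and then projecting is the same as projecting each direction with its own slice of $\boldsymbol{W}_{hq}$ and summing, which is why unequal hidden sizes pose no difficulty.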

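Summary

In a bidirectional RNN, the hidden state at each time step depends on both the subsequence before that time step and the subsequence after it (including the input at the current time step).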

Exercise

Referring to the figure above, design a bidirectional RNN with multiple hidden layers.
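The program that follows implements the single-layer model from the figure. For the exercise itself, one common design is to stack bidirectional layers so that layer $l$ takes the concatenated forward and backward states of layer $l-1$ (width $2h$) as its input. The NumPy sketch below illustrates that design under stated assumptions: the helper names `init_layer`, `bi_layer` and `deep_birnn` are hypothetical, $\phi$ is taken to be `tanh` (the post's code uses sigmoid), and all sizes are made up.

```python
import numpy as np

def init_layer(d, h, rng):
    """Hypothetical helper: (W_xh, W_hh, b_h) for each direction of one layer."""
    def one(shape):
        return rng.normal(scale=0.01, size=shape)
    return (one((d, h)), one((h, h)), np.zeros(h),   # forward direction
            one((d, h)), one((h, h)), np.zeros(h))   # backward direction

def bi_layer(xs, params, h):
    """One bidirectional layer over a list of (n, d) inputs; returns (n, 2h) states."""
    W_xf, W_hf, b_f, W_xb, W_hb, b_b = params
    n = xs[0].shape[0]
    Hf, fwd = np.zeros((n, h)), []
    for X in xs:                      # forward pass: t = 1, ..., T
        Hf = np.tanh(X @ W_xf + Hf @ W_hf + b_f)
        fwd.append(Hf)
    Hb, bwd = np.zeros((n, h)), []
    for X in reversed(xs):            # backward pass: t = T, ..., 1
        Hb = np.tanh(X @ W_xb + Hb @ W_hb + b_b)
        bwd.append(Hb)
    bwd.reverse()                     # realign so bwd[t] belongs to step t
    return [np.concatenate([f, b], axis=1) for f, b in zip(fwd, bwd)]

def deep_birnn(xs, layers, h):
    # Layer l reads the concatenated (width 2h) states of layer l - 1
    for params in layers:
        xs = bi_layer(xs, params, h)
    return xs

rng = np.random.default_rng(0)
T, n, d, h, num_layers = 6, 2, 10, 4, 3
xs = [rng.normal(size=(n, d)) for _ in range(T)]
layers = [init_layer(d if i == 0 else 2 * h, h, rng) for i in range(num_layers)]
Hs = deep_birnn(xs, layers, h)
print(len(Hs), Hs[0].shape)
```

An output layer with $\boldsymbol{W}_{hq} \in \mathbb{R}^{2h \times q}$ would then sit on top of the last layer's states, as in the single-layer case. Feeding each layer both directions of the layer below, rather than running two independent stacks, lets every upper layer mix past and future context.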

```python
import d2lzh as d2l
from mxnet import nd
from mxnet.gluon import rnn

# Load the Jay Chou lyrics dataset bundled with d2lzh
(corpus_indices, char_to_idx, idx_to_char,
 vocab_size) = d2l.load_data_jay_lyrics()
```

Initializing model parameters

Create the parameters that appear in the formulas and initialize them.

```python
num_inputs, num_hiddens, num_outputs = vocab_size, 256, vocab_size
ctx = d2l.try_gpu()

def get_params():
    def _one(shape):
        # Small Gaussian initialization
        return nd.random.normal(scale=0.01, shape=shape, ctx=ctx)

    def _three():
        # (W_xh, W_hh, b_h) for one direction
        return (_one((num_inputs, num_hiddens)),
                _one((num_hiddens, num_hiddens)),
                nd.zeros(num_hiddens, ctx=ctx))

    W_xhf, W_hhf, b_hf = _three()  # forward-direction parameters
    W_xhb, W_hhb, b_hb = _three()  # backward-direction parameters

    # Output-layer parameters: W_hq has 2 * num_hiddens rows (concatenated states)
    W_hq = _one((2*num_hiddens, num_outputs))
    b_q = nd.zeros(num_outputs, ctx=ctx)
    # Attach gradients
    params = [W_xhf, W_hhf, b_hf, W_xhb, W_hhb, b_hb,
              W_hq, b_q]
    for param in params:
        param.attach_grad()
    return params
```
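As a quick check on `get_params`, the plain-Python sketch below lists the eight shapes it creates and counts the parameters, both by summing the shapes and with the closed form $2(dh + h^2 + h) + 2hq + q$. The helper names are hypothetical and the sizes are toy values; `math.prod` needs Python 3.8 or later.

```python
import math

def birnn_param_shapes(d, h, q):
    """Shapes created by get_params: d inputs, h hidden units per direction, q outputs."""
    one_direction = [(d, h), (h, h), (h,)]          # W_xh, W_hh, b_h
    return one_direction * 2 + [(2 * h, q), (q,)]   # forward, backward, W_hq, b_q

def birnn_param_count(d, h, q):
    # Closed form: 2 (dh + h^2 + h) + 2hq + q
    return 2 * (d * h + h * h + h) + 2 * h * q + q

shapes = birnn_param_shapes(d=10, h=4, q=10)
total = sum(math.prod(s) for s in shapes)
print(len(shapes), total, birnn_param_count(10, 4, 10))
```

Compared with a one-directional RNN of the same `num_hiddens`, the recurrent parameters double and the output matrix gains `num_hiddens` extra rows.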

Defining the model

Using the formulas for the two directional hidden states $\overrightarrow{\boldsymbol{H}}_t$ and $\overleftarrow{\boldsymbol{H}}_t$ and for the output $\boldsymbol{O}_t$, we build the bidirectional RNN model.

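```python
def init_bi_state(batch_size, num_hiddens, ctx):
    # Initial (forward, backward) hidden states, both all zeros
    return (nd.zeros(shape=(batch_size, num_hiddens), ctx=ctx),
            nd.zeros(shape=(batch_size, num_hiddens), ctx=ctx))

def birnn(inputs, state, params):
    [W_xhf, W_hhf, b_hf, W_xhb, W_hhb, b_hb,
     W_hq, b_q] = params
    Hf, Hb = state
    outputs = []
    for X in inputs:
        # Caveat: Hb advances in the same forward order as Hf, so it is not the
        # backward state of the formula (that would need reversed(inputs)).
        Hf = nd.sigmoid(nd.dot(X, W_xhf) + nd.dot(Hf, W_hhf) + b_hf)
        Hb = nd.sigmoid(nd.dot(X, W_xhb) + nd.dot(Hb, W_hhb) + b_hb)
        # Concatenate along the feature axis: (batch_size, 2 * num_hiddens)
        H = nd.concat(Hf, Hb, dim=1)
        Y = nd.dot(H, W_hq) + b_q
        outputs.append(Y)
    return outputs, (Hf, Hb)
```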
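In `birnn` above, `Hb` is updated inside the same `for X in inputs` loop as `Hf`, so it advances in forward time order: the model is effectively two forward RNNs whose states are concatenated, not the $\overleftarrow{\boldsymbol{H}}_t$ that depends on $\overleftarrow{\boldsymbol{H}}_{t+1}$. The sketch below is written in NumPy rather than MXNet so that it runs standalone. It keeps the post's parameter order and sigmoid activation and contrasts that loop (`birnn_same_order`) with one that runs the backward recurrence over `reversed(inputs)` (`birnn_reversed`). Both function names, and restarting the backward state from zero for each sequence, are assumptions of this sketch. The check perturbs the input at one step and asks whether the output one step earlier changes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def birnn_same_order(inputs, state, params):
    # Mirrors the post's birnn: both states advance in forward time order
    W_xhf, W_hhf, b_hf, W_xhb, W_hhb, b_hb, W_hq, b_q = params
    Hf, Hb = state
    outputs = []
    for X in inputs:
        Hf = sigmoid(X @ W_xhf + Hf @ W_hhf + b_hf)
        Hb = sigmoid(X @ W_xhb + Hb @ W_hhb + b_hb)
        outputs.append(np.concatenate([Hf, Hb], axis=1) @ W_hq + b_q)
    return outputs, (Hf, Hb)

def birnn_reversed(inputs, state, params):
    # Follows the formulas: the backward state runs from the last step to the first
    W_xhf, W_hhf, b_hf, W_xhb, W_hhb, b_hb, W_hq, b_q = params
    Hf, _ = state
    Hb = np.zeros_like(Hf)  # assumption: backward state restarts at zero per sequence
    fwd, bwd = [], []
    for X in inputs:                     # t = 1, ..., T
        Hf = sigmoid(X @ W_xhf + Hf @ W_hhf + b_hf)
        fwd.append(Hf)
    for X in reversed(inputs):           # t = T, ..., 1
        Hb = sigmoid(X @ W_xhb + Hb @ W_hhb + b_hb)
        bwd.append(Hb)
    bwd.reverse()                        # bwd[t] now pairs with fwd[t]
    outputs = [np.concatenate([f, b], axis=1) @ W_hq + b_q
               for f, b in zip(fwd, bwd)]
    return outputs, (Hf, Hb)

rng = np.random.default_rng(0)
n, d, h, q, T = 2, 6, 4, 6, 5

def one(shape):
    return rng.normal(scale=0.5, size=shape)

params = [one((d, h)), one((h, h)), np.zeros(h),
          one((d, h)), one((h, h)), np.zeros(h),
          one((2 * h, q)), np.zeros(q)]
state = (np.zeros((n, h)), np.zeros((n, h)))
inputs = [rng.normal(size=(n, d)) for _ in range(T)]
changed = list(inputs)
changed[3] = changed[3] + 1.0            # perturb only step 3 (0-based)

for rnn_fn in (birnn_same_order, birnn_reversed):
    before, _ = rnn_fn(inputs, state, params)
    after, _ = rnn_fn(changed, state, params)
    # Does the output at step 2 react to the input at step 3?
    print(rnn_fn.__name__, not np.allclose(before[2], after[2]))
```

The first line should report `False` and the second `True`. This matters for the task in this post: the training target at step $t$ is the next character, i.e. the input at step $t+1$, so a true backward direction lets $\boldsymbol{O}_t$ see its own target during training, and a low training perplexity would then say little about next-character prediction. The training log below comes from the post's `birnn` as written.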

```python
num_epochs, num_steps, batch_size, lr, clipping_theta = 160, 35, 32, 1e2, 1e-1  # author's note: with a smaller lr, there was no output
pred_period, pred_len, prefixes = 40, 50, ['分开', '不分开']
```
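The large `lr` is less extreme than it looks because, to my understanding of `d2lzh`, `train_and_predict_rnn` rescales the gradients by their global L2 norm (threshold `clipping_theta`) before each SGD step. The NumPy sketch below (function name hypothetical, gradients random, plain SGD step `p - lr * g` assumed) shows that after clipping, one step moves the parameters by at most `lr * clipping_theta` = 100 × 0.1 = 10 in L2 norm.

```python
import numpy as np

def clip_by_global_norm(grads, theta):
    """Scale all gradients together so that their joint L2 norm is at most theta."""
    norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if norm > theta:
        grads = [g * (theta / norm) for g in grads]
    return grads, norm

rng = np.random.default_rng(0)
grads = [rng.normal(size=(3, 4)), rng.normal(size=4)]  # hypothetical gradients
lr, clipping_theta = 1e2, 1e-1

clipped, norm_before = clip_by_global_norm(grads, clipping_theta)
norm_after = np.sqrt(sum(float((g ** 2).sum()) for g in clipped))
# With a plain SGD step p <- p - lr * g, the update's L2 norm is lr * norm_after
print(norm_before, norm_after, lr * norm_after)
```

Scaling every gradient by the same factor keeps the update direction unchanged and only shortens the step, which is why clipping and a large learning rate can coexist.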

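Intermediate test

```python
# Build the parameters and initial states, then concatenate the two states
[W_xhf, W_hhf, b_hf, W_xhb, W_hhb, b_hb, W_hq, b_q] = get_params()
(Hf, Hb) = init_bi_state(batch_size, num_hiddens, ctx)
Hf, Hb, W_xhf
H = nd.concat(Hf, Hb, dim=1)
H  # shape (batch_size, 2 * num_hiddens) = (32, 512)
```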

```
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
<NDArray 32x512 @gpu(0)>
```

Training and prediction

```python
# False: use consecutive (not random) sampling of the corpus
d2l.train_and_predict_rnn(birnn, get_params, init_bi_state, num_hiddens,
                          vocab_size, ctx, corpus_indices, idx_to_char,
                          char_to_idx, False, num_epochs, num_steps, lr,
                          clipping_theta, batch_size, pred_period, pred_len,
                          prefixes)
```

```
epoch 40, perplexity 153.796754, time 1.32 sec
 - 分开 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我
 - 不分开哼不能 我不能 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 
epoch 80, perplexity 33.458324, time 1.34 sec
 - 分开 我不能够不起 我不能够不起 我不能够不起 我不能够不起 我不能够不起 我不能够不起 我不能够不起 
 - 不分开始的我 我不能够不起 我不能够不起 我不能够不起 我不能够不起 我不能够不起 我不能够不起 我不能够
epoch 120, perplexity 10.509801, time 1.32 sec
 - 分开 我 这里什么 不会B血 我想就这样牵着你的手 不会B血 我想就这样牵着你的手 不会B血 我想就这样
 - 不分开不 我不能再想 我不能再想 我不能再想 我不能再想 我不能再想 我不能再想 我不能再想 我不能再想 
epoch 160, perplexity 4.527863, time 1.29 sec
 - 分开的可爱女人 坏坏的让我疯狂的可爱女人 坏坏的让我疯狂的可爱女人 坏坏的让我疯狂的可爱女人 坏坏的让我
 - 不分开我爱你 一朵莫默默默离开这样打我妈妈这样牵着你的手不放开离开我 不能承受我已无处可躲可以演戏 快使用
```
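Assuming d2lzh's usual definition, the perplexity in the log is the exponential of the average cross-entropy per predicted character, i.e. the reciprocal of the geometric mean of the probabilities the model assigns to the true next characters. A plain-Python sketch (function name and vocabulary size hypothetical):

```python
import math

def perplexity(target_probs):
    """exp(mean negative log-probability of the true next characters)."""
    nll = [-math.log(p) for p in target_probs]
    return math.exp(sum(nll) / len(nll))

V = 1000                                 # hypothetical vocabulary size
print(perplexity([1 / V] * 50))          # uniform guessing over V characters: about V
print(perplexity([1.0] * 50))            # always certain and correct: 1.0
print(perplexity([0.5, 0.125]))          # 1 / geometric mean of 0.5 and 0.125: about 4
```

On this scale, the drop from about 153.8 at epoch 40 to about 4.5 at epoch 160 means the model went from being roughly as uncertain as a uniform choice among about 150 characters to one among 4 or 5.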