Bi-LSTM

weixin_40200315

Categories: Machine Learning, python, Deep Learning Theory. Tags: Bi-LSTM

Article link: https://blog.youkuaiyun.com/weixin_40200315/article/details/97887473

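This article explains how a Bi-LSTM works. Two LSTMCells process the input sequence: one reads it forward and the other reads it in reverse. The article then builds a Bi-LSTM network in TensorFlow, covering character embedding, LSTM layers, Dropout and multi-layer LSTMs, and shows how to stack bidirectional RNN layers. Finally, a fully connected layer transforms the Bi-LSTM output for the prediction task.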

https://blog.youkuaiyun.com/vivian_ll/article/details/88974691
https://blog.youkuaiyun.com/jerr__y/article/details/70471066

[Figure: Bi-LSTM network structure, with the input layer at the bottom]

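The rough idea of a Bi-LSTM is as follows. Look at the input layer at the bottom of the figure. Suppose one sample (a sentence) has 10 timesteps (characters) of input $x_1, x_2, \dots, x_{10}$. There are two separate LSTMCells:

For the forward fw_cell, the sample is fed into the cell in the order $x_1, x_2, \dots, x_{10}$, producing the first set of state outputs $\{\overrightarrow{h_1}, \overrightarrow{h_2}, \dots, \overrightarrow{h_{10}}\}$;
For the backward bw_cell, the sample is fed into the cell in the reverse order $x_{10}, x_9, \dots, x_1$, producing the second set of state outputs $\{\overleftarrow{h_{10}}, \overleftarrow{h_9}, \dots, \overleftarrow{h_1}\}$.
Each element of the two sets is a vector of length hidden_size; in general, $\overrightarrow{h_1}$ and $\overleftarrow{h_1}$ have the same length. The two sets are then concatenated timestep by timestep: $\{[\overrightarrow{h_1}, \overleftarrow{h_1}], [\overrightarrow{h_2}, \overleftarrow{h_2}], \dots, [\overrightarrow{h_{10}}, \overleftarrow{h_{10}}]\}$.
As a result, each timestep input $x_t$ gets a state output $H_t = [\overrightarrow{h_t}, \overleftarrow{h_t}]$ of length 2*hidden_size. From here on, the processing is the same as for a unidirectional LSTM.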
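The forward/backward procedure can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the article's TensorFlow code. The names `LSTMCell`, `run` and `bi_lstm_states` are hypothetical. The gate layout (one fused weight matrix, zero initial states, small random weights) is an assumption. The sizes (10 timesteps, hidden_size = 16) are arbitrary. The key step is that the backward cell reads the reversed sequence, and its outputs are flipped back so that position t holds the backward state for $x_t$ before concatenation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    """Hypothetical minimal LSTM cell (input/forget/output gates + candidate), for illustration only."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # Assumed layout: one weight matrix maps [x_t, h_{t-1}] to all four gate pre-activations
        self.W = rng.normal(0.0, 0.1, size=(input_size + hidden_size, 4 * hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, o, g = np.split(z, 4)          # four slices of length hidden_size
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

def run(cell, xs):
    """Feed xs through the cell in the given order, starting from zero states; return all hidden states."""
    h = np.zeros(cell.hidden_size)
    c = np.zeros(cell.hidden_size)
    outputs = []
    for x in xs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs

def bi_lstm_states(xs, fw_cell, bw_cell):
    """Return H_t = [forward h_t, backward h_t] for every t, shape [timesteps, 2 * hidden_size]."""
    fw = run(fw_cell, xs)               # reads x_1, ..., x_T
    bw = run(bw_cell, xs[::-1])[::-1]   # reads x_T, ..., x_1, then flipped so bw[t] lines up with x_t
    return np.stack([np.concatenate([f, b]) for f, b in zip(fw, bw)])

rng = np.random.default_rng(0)
timesteps, input_size, hidden_size = 10, 8, 16
xs = [rng.normal(size=input_size) for _ in range(timesteps)]
fw_cell = LSTMCell(input_size, hidden_size, rng)
bw_cell = LSTMCell(input_size, hidden_size, rng)
H = bi_lstm_states(xs, fw_cell, bw_cell)
print(H.shape)  # (10, 32)
```

The two cells have separate weights, matching the "two separate LSTMCells" in the text. The first hidden_size entries of each row of `H` come from the forward pass and the last hidden_size entries come from the backward pass.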

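import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable, tf.contrib)
from tensorflow.contrib import rnn

def bi_lstm(X_inputs):
    """Build the bi-LSTM network. Return y_pred."""
    # vocab_size, embedding_size and hidden_size are hyperparameters defined elsewhere in the script
    # 0. Character embedding. Make sure you understand how embeddings work; anyone doing NLP needs to.
    embedding = tf.get_variable("embedding", [vocab_size, embedding_size], dtype=tf.float32)
    # X_inputs.shape = [batchsize, timestep_size] -> inputs.shape = [batchsize, timestep_size, embedding_size]
    inputs = tf.nn.embedding_lookup(embedding, X_inputs)
    # 1. LSTM layer
    lstm_fw_cell = rnn.BasicLSTMCell(hidden_size, forget_bi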
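When `tf.nn.embedding_lookup(embedding, X_inputs)` is called with a single embedding matrix, it gathers one row of `embedding` per character id. That gather is what turns the [batchsize, timestep_size] id tensor into a [batchsize, timestep_size, embedding_size] tensor. A NumPy sketch of that shape change follows. The sizes and random ids are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes, for illustration only
vocab_size, embedding_size = 100, 8
batch_size, timestep_size = 4, 10

rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, embedding_size))                 # one row per character id
X_inputs = rng.integers(0, vocab_size, size=(batch_size, timestep_size))  # character ids

# Row lookup by id (a gather along axis 0), the same operation embedding_lookup performs on one matrix
inputs = embedding[X_inputs]
print(inputs.shape)  # (4, 10, 8)
```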
