Preparation

We will train an RNN for language modeling: given a history of words, the goal is to predict the next word. For this we use the Penn Tree Bank (PTB) dataset, a standard benchmark for measuring the quality of these models. It is small and relatively fast to train.

The PTB dataset is already preprocessed and contains 10,000 distinct words overall, including an end-of-sentence marker and a special symbol (<unk>) for rare words.

To make the data easier to process, in reader.py we convert each word into a unique integer identifier.
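The gist of that conversion can be sketched as follows. This is a minimal illustration in the spirit of reader.py, not its exact code: ids are assigned in decreasing order of word frequency.

import collections

def build_vocab(words):
    # Count word frequencies and assign ids in decreasing order of
    # frequency, breaking ties alphabetically.
    counter = collections.Counter(words)
    sorted_words = sorted(counter, key=lambda w: (-counter[w], w))
    return {word: i for i, word in enumerate(sorted_words)}

word_to_id = build_vocab(["the", "fox", "the", "<eos>"])
ids = [word_to_id[w] for w in ["the", "fox"]]  # -> [0, 2]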
| File | Purpose |
| --- | --- |
| ptb_word_lm.py | Code that trains a language model on the PTB dataset |
| reader.py | Code that reads the dataset |
Click here to download the data.
Building the Model
1. LSTM
The core of the model consists of an LSTM cell that processes one word at a time and computes probabilities for the possible values of the next word in the sentence. The memory state of the LSTM is initialized with a vector of zeros and gets updated after reading each word. For computational reasons, we process data in mini-batches of size batch_size; each word in a batch corresponds to a time step t, and TensorFlow automatically sums the gradients of each batch for you.
For example:
t=0 t=1 t=2 t=3 t=4
[The, brown, fox, is, quick]
[The, red, fox, jumped, high]
words_in_dataset[0] = [The, The]
words_in_dataset[1] = [brown, red]
words_in_dataset[2] = [fox, fox]
words_in_dataset[3] = [is, jumped]
words_in_dataset[4] = [quick, high]
batch_size = 2, time_steps = 5
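As a quick illustration of that layout, the following sketch (plain NumPy; the ids are hypothetical) arranges the two sentences above, already mapped to word ids, into a [time_steps, batch_size] array:

import numpy as np

# Two sentences of five word ids each: rows are sentences,
# so the shape is [batch_size, time_steps].
sentences = np.array([[0, 1, 2, 3, 4],    # "The brown fox is quick"
                      [0, 5, 2, 6, 7]])   # "The red fox jumped high"

# Transpose to [time_steps, batch_size]: row t holds the words of
# both sentences at time step t.
words_in_dataset = sentences.T
print(words_in_dataset[1])  # [1 5] -> ["brown", "red"]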
The basic pseudo-code is as follows:
words_in_dataset = tf.placeholder(tf.float32, [time_steps, batch_size, num_features])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
hidden_state = tf.zeros([batch_size, lstm.state_size])
current_state = tf.zeros([batch_size, lstm.state_size])
state = hidden_state, current_state
probabilities = []
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities.append(tf.nn.softmax(logits))
    loss += loss_function(probabilities, target_words)
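The pseudo-code leaves loss_function undefined. One plausible instantiation (a sketch, not the tutorial's exact code) is the average per-word cross-entropy; in practice it is computed from the logits rather than the softmax probabilities, for numerical stability:

def loss_function(logits, target_words):
    # target_words: int32 tensor of word ids, shape [batch_size].
    # Average cross-entropy between the predicted distribution and the targets.
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=target_words, logits=logits))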
2. Truncated Backpropagation
By design, the output of an RNN depends on arbitrarily distant inputs. Unfortunately, this makes backpropagation hard to compute. To make the learning process tractable, it is common practice to create an "unrolled" version of the network that contains a fixed number (num_steps) of LSTM inputs and outputs. This can be implemented by feeding inputs of length num_steps at a time and performing a backward pass after each such input block.
Here is a simplified version of the code for creating a graph that performs truncated backpropagation:
# Placeholder for the inputs in a given iteration.
words = tf.placeholder(tf.int32, [batch_size, num_steps])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
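For reference, TF 1.x can also build this fixed-length unrolling for you. A one-line sketch, assuming inputs is a Python list of num_steps tensors of shape [batch_size, input_size] and initial_state matches the cell's state structure:

outputs, final_state = tf.nn.static_rnn(lstm, inputs, initial_state=initial_state)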
And this is how to implement an iteration over the whole dataset:
# A numpy array holding the state of LSTM after each batch of words.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state, words: current_batch_of_words})
    total_loss += current_loss
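Putting the pieces together, here is one way this could run end to end. This is a hedged sketch, not the tutorial's ptb_word_lm.py: it assumes TensorFlow 1.x, uses cell.zero_state() because BasicLSTMCell's state is actually a (c, h) tuple rather than a single zero vector, and batches is a hypothetical iterator of (input, target) id arrays of shape [batch_size, num_steps].

import tensorflow as tf

batch_size, num_steps, lstm_size, vocab_size = 2, 5, 32, 10000

words = tf.placeholder(tf.int32, [batch_size, num_steps])
targets = tf.placeholder(tf.int32, [batch_size, num_steps])

# Look up a trainable embedding vector for every word id.
embedding = tf.get_variable("embedding", [vocab_size, lstm_size])
inputs = tf.nn.embedding_lookup(embedding, words)

cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
initial_state = cell.zero_state(batch_size, tf.float32)

# Unroll the LSTM for num_steps time steps, reusing weights after step 0.
outputs = []
state = initial_state
with tf.variable_scope("RNN"):
    for i in range(num_steps):
        if i > 0:
            tf.get_variable_scope().reuse_variables()
        output, state = cell(inputs[:, i, :], state)
        outputs.append(output)
final_state = state

# Project every output onto the vocabulary and compute average cross-entropy.
output = tf.reshape(tf.concat(outputs, 1), [-1, lstm_size])
softmax_w = tf.get_variable("softmax_w", [lstm_size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
logits = tf.matmul(output, softmax_w) + softmax_b
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.reshape(targets, [-1]), logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    numpy_state = session.run(initial_state)
    for x, y in batches:  # hypothetical iterator over the dataset
        # Carry the LSTM state across consecutive num_steps windows.
        numpy_state, current_loss, _ = session.run(
            [final_state, loss, train_op],
            feed_dict={initial_state.c: numpy_state.c,
                       initial_state.h: numpy_state.h,
                       words: x, targets: y})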