Preparation

We will train an RNN for language modeling: given a history of words, the goal is to predict the next word. For this we use the Penn Tree Bank (PTB) dataset, a standard benchmark for measuring the quality of these models. It is small and relatively fast to train.

The PTB dataset is already preprocessed and contains 10,000 distinct words overall, including an end-of-sentence marker and a special symbol (<unk>) for rare words.

To make the data easier to process, in reader.py we convert each word into a unique integer identifier.
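The gist of that conversion can be sketched as follows. This is a minimal illustration in the spirit of reader.py, not its exact code: ids are assigned in decreasing order of word frequency.

import collections

def build_vocab(words):
    # Count word frequencies and assign ids in decreasing order of
    # frequency, breaking ties alphabetically.
    counter = collections.Counter(words)
    sorted_words = sorted(counter, key=lambda w: (-counter[w], w))
    return {word: i for i, word in enumerate(sorted_words)}

word_to_id = build_vocab(["the", "fox", "the", "<eos>"])
ids = [word_to_id[w] for w in ["the", "fox"]]  # -> [0, 2]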
| File | Purpose |
| --- | --- |
| ptb_word_lm.py | Code that trains a language model on the PTB dataset |
| reader.py | Code that reads the dataset |
Click here to download the data.
Building the Model
1. LSTM
The core of the model consists of an LSTM cell that processes one word at a time and computes probabilities for the possible values of the next word in the sentence. The memory state of the LSTM is initialized with a vector of zeros and gets updated after reading each word. For computational reasons, we process data in mini-batches of size batch_size; each word in a batch corresponds to a time step t, and TensorFlow automatically sums the gradients of each batch for you.
For example:
t=0 t=1 t=2 t=3 t=4
[The, brown, fox, is, quick]
[The, red, fox, jumped, high]
words_in_dataset[0] = [The, The]
words_in_dataset[1] = [brown, red]
words_in_dataset[2] = [fox, fox]
words_in_dataset[3] = [is, jumped]
words_in_dataset[4] = [quick, high]
batch_size = 2, time_steps = 5
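As a quick illustration of that layout, the following sketch (plain NumPy; the ids are hypothetical) arranges the two sentences above, already mapped to word ids, into a [time_steps, batch_size] array:

import numpy as np

# Two sentences of five word ids each: rows are sentences,
# so the shape is [batch_size, time_steps].
sentences = np.array([[0, 1, 2, 3, 4],    # "The brown fox is quick"
                      [0, 5, 2, 6, 7]])   # "The red fox jumped high"

# Transpose to [time_steps, batch_size]: row t holds the words of
# both sentences at time step t.
words_in_dataset = sentences.T
print(words_in_dataset[1])  # [1 5] -> ["brown", "red"]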
The basic pseudo-code is as follows:
words_in_dataset = tf.placeholder(tf.float32, [time_steps, batch_size, num_features])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
hidden_state = tf.zeros([batch_size, lstm.state_size])
current_state = tf.zeros([batch_size, lstm.state_size])
state = hidden_state, current_state
probabilities = []
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities.append(tf.nn.softmax(logits))
    loss += loss_function(probabilities, target_words)
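The pseudo-code leaves loss_function undefined. One plausible instantiation (a sketch, not the tutorial's exact code) is the average per-word cross-entropy; in practice it is computed from the logits rather than the softmax probabilities, for numerical stability:

def loss_function(logits, target_words):
    # target_words: int32 tensor of word ids, shape [batch_size].
    # Average cross-entropy between the predicted distribution and the targets.
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=target_words, logits=logits))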
2. Truncated Backpropagation
By design, the output of an RNN depends on arbitrarily distant inputs. Unfortunately, this makes backpropagation hard to compute. To make the learning process tractable, it is common practice to create an "unrolled" version of the network that contains a fixed number (num_steps) of LSTM inputs and outputs. This can be implemented by feeding inputs of length num_steps at a time and performing a backward pass after each such input block.
Here is a simplified version of the code for creating a graph that performs truncated backpropagation:
# Placeholder for the inputs in a given iteration.
words = tf.placeholder(tf.int32, [batch_size, num_steps])
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state
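For reference, TF 1.x can also build this fixed-length unrolling for you. A one-line sketch, assuming inputs is a Python list of num_steps tensors of shape [batch_size, input_size] and initial_state matches the cell's state structure:

outputs, final_state = tf.nn.static_rnn(lstm, inputs, initial_state=initial_state)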
And this is how to implement an iteration over the whole dataset:
# A numpy array holding the state of LSTM after each batch of words.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state, words: current_batch_of_words})
    total_loss += current_loss
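Putting the pieces together, here is one way this could run end to end. This is a hedged sketch, not the tutorial's ptb_word_lm.py: it assumes TensorFlow 1.x, uses cell.zero_state() because BasicLSTMCell's state is actually a (c, h) tuple rather than a single zero vector, and batches is a hypothetical iterator of (input, target) id arrays of shape [batch_size, num_steps].

import tensorflow as tf

batch_size, num_steps, lstm_size, vocab_size = 2, 5, 32, 10000

words = tf.placeholder(tf.int32, [batch_size, num_steps])
targets = tf.placeholder(tf.int32, [batch_size, num_steps])

# Look up a trainable embedding vector for every word id.
embedding = tf.get_variable("embedding", [vocab_size, lstm_size])
inputs = tf.nn.embedding_lookup(embedding, words)

cell = tf.contrib.rnn.BasicLSTMCell(lstm_size)
initial_state = cell.zero_state(batch_size, tf.float32)

# Unroll the LSTM for num_steps time steps, reusing weights after step 0.
outputs = []
state = initial_state
with tf.variable_scope("RNN"):
    for i in range(num_steps):
        if i > 0:
            tf.get_variable_scope().reuse_variables()
        output, state = cell(inputs[:, i, :], state)
        outputs.append(output)
final_state = state

# Project every output onto the vocabulary and compute average cross-entropy.
output = tf.reshape(tf.concat(outputs, 1), [-1, lstm_size])
softmax_w = tf.get_variable("softmax_w", [lstm_size, vocab_size])
softmax_b = tf.get_variable("softmax_b", [vocab_size])
logits = tf.matmul(output, softmax_w) + softmax_b
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.reshape(targets, [-1]), logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    numpy_state = session.run(initial_state)
    for x, y in batches:  # hypothetical iterator over the dataset
        # Carry the LSTM state across consecutive num_steps windows.
        numpy_state, current_loss, _ = session.run(
            [final_state, loss, train_op],
            feed_dict={initial_state.c: numpy_state.c,
                       initial_state.h: numpy_state.h,
                       words: x, targets: y})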