The classical-poetry generation project is now well past the halfway point. I ran into all kinds of problems along the way; it was hard, but it genuinely deepened my understanding of many TensorFlow functions.
tf.contrib.seq2seq.TrainingHelper and tf.contrib.seq2seq.GreedyEmbeddingHelper
These two are used in the training phase and in the final inference phase, respectively. Both prepare the decoder's input at each step. During training we use TrainingHelper, which makes the decoder's input at each time step the ground-truth value from the previous time step of the real sequence. For example, with the couplet's first line "白日依山尽" and second line "黄河入海流", the decoder's inputs over its iterations are [0, '黄', '河', '入', '海', '流'], i.e. the <GO> id 0 followed by the target shifted by one step. GreedyEmbeddingHelper, in contrast, never uses the real decoder sequence: at each step it embeds the id produced by the previous decoding step and feeds that back in as the input to the next step.
The full source of both can be found here.
Precisely because tf.contrib.seq2seq.GreedyEmbeddingHelper does not use the real decoder sequence, note that its first argument is not the decoder sequence but the embedding matrix used to look up each sampled id.
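To make the difference concrete, here is a minimal sketch of how the two helpers are constructed (names such as decoder_inputs_embedded, decoder_seq_lengths, embedding_matrix, start_tokens and end_token_id are placeholders, not from the project code):

# Training: feed the ground-truth decoder inputs (already embedded), shifted by one step.
train_helper = tf.contrib.seq2seq.TrainingHelper(
    inputs=decoder_inputs_embedded,       # [batch, max_time, embed_dim]
    sequence_length=decoder_seq_lengths)  # [batch]

# Inference: no ground-truth sequence; the first argument is the embedding matrix
# (or an embedding callable), followed by the start tokens and the end-token id.
infer_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding=embedding_matrix,           # [vocab_size, embed_dim]
    start_tokens=start_tokens,            # [batch], filled with the <GO> id
    end_token=end_token_id)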
tf.contrib.seq2seq.TrainingHelper
def next_inputs(self, time, outputs, state, name=None, **unused_kwargs):
    with ops.name_scope(name, "TrainingHelperNextInputs", [time, outputs, state]):
        next_time = time + 1
        finished = (next_time >= self._sequence_length)
        all_finished = math_ops.reduce_all(finished)

        # Read the next value directly from decoder_inputs and use it as the
        # decoder input for the next time step.
        def read_from_ta(inp):
            return inp.read(next_time)

        next_inputs = control_flow_ops.cond(
            all_finished, lambda: self._zero_inputs,
            lambda: nest.map_structure(read_from_ta, self._input_tas))
        return (finished, next_inputs, state)
tf.contrib.seq2seq.GreedyEmbeddingHelper
def next_inputs(self, time, outputs, state, sample_ids, name=None):
    del time, outputs  # unused by next_inputs_fn
    finished = math_ops.equal(sample_ids, self._end_token)
    all_finished = math_ops.reduce_all(finished)

    # Embed sample_ids to get the word vectors fed to the decoder at the next step.
    next_inputs = control_flow_ops.cond(
        all_finished,
        # If we're finished, the next_inputs value doesn't matter
        lambda: self._start_inputs,
        lambda: self._embedding_fn(sample_ids))
    return (finished, next_inputs, state)
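For completeness, the "greedy" part lives in the helper's sample() method, which (roughly, paraphrasing the TF 1.x source) simply takes the argmax of the output logits at each step and casts it to an id:

def sample(self, time, outputs, state, name=None):
    del time, state  # unused by sample
    # Greedy decoding: pick the highest-scoring id from the logits.
    sample_ids = math_ops.cast(math_ops.argmax(outputs, axis=-1), dtypes.int32)
    return sample_ids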
From the code you can see what each helper feeds the decoder at every step. Also note the very first input: for TrainingHelper it is 0 (the <GO> id at the start of the decoder input sequence), while for GreedyEmbeddingHelper it is the start token.
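As a hypothetical sketch of the preprocessing that produces that shifted input (target_ids, embedding_matrix and batch_size are placeholder names, not project code), the decoder input is the target sequence shifted right by one position with the <GO> id (0 here) prepended:

go_id = 0
decoder_input_ids = tf.concat(
    [tf.fill([batch_size, 1], go_id), target_ids[:, :-1]], axis=1)
decoder_inputs_embedded = tf.nn.embedding_lookup(embedding_matrix, decoder_input_ids)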
train_decoder and infer_decoder
The decoder used for training and the decoder used for inference should be one and the same, so their variables need to be shared.
def decoder(in_seq_len, target_seq, target_seq_len,
            encoder_state, num_units, layers, output_size, embedding):
    projection_layer = tf.layers.Dense(output_size)
    decoder_cell = getLayeredCell(layers, num_units)

    # Training decoder: feeds the ground-truth target sequence via TrainingHelper.
    with tf.variable_scope("decoder"):
        helper = tf.contrib.seq2seq.TrainingHelper(
            target_seq, target_seq_len, time_major=False)
        decoder = tf.contrib.seq2seq.BasicDecoder(
            decoder_cell, helper, encoder_state, output_layer=projection_layer)
        outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
            decoder, impute_finished=True, maximum_iterations=20)

    # Inference decoder: reuses the same variables, feeds its own predictions back in.
    with tf.variable_scope("decoder", reuse=True):
        batch_size = tf.shape(in_seq_len)[0]
        start_tokens = tf.tile(tf.constant([2], dtype=tf.int32), [batch_size],
                               name='start_token')
        infer_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
            embedding, start_tokens, 3)
        infer_decoder = tf.contrib.seq2seq.BasicDecoder(
            decoder_cell, infer_helper, encoder_state, output_layer=projection_layer)
        infer_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
            infer_decoder, impute_finished=True, maximum_iterations=20)

    return outputs.rnn_output, infer_outputs, outputs.sample_id
As the code above shows, with tf.variable_scope("decoder", reuse=True): is used to guarantee that the parameters are shared.
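A tiny standalone sketch of what reuse=True means for TF 1.x variable scopes (an illustration, not project code): the second tf.get_variable call with the same name returns the very same variable instead of creating a new one.

import tensorflow as tf

with tf.variable_scope("decoder"):
    w1 = tf.get_variable("w", shape=[4])
with tf.variable_scope("decoder", reuse=True):
    w2 = tf.get_variable("w", shape=[4])
assert w1 is w2  # the same underlying variable is shared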
No output at inference time
If the model is under-trained, running inference directly produces no output. The reason is that the under-trained model never emits the end-of-sentence token, so the decoder keeps looping and never returns a result. The fix is to cap the number of decoding steps with dynamic_decode's maximum_iterations argument, as follows:
tf.contrib.seq2seq.dynamic_decode(infer_decoder,impute_finished=True,maximum_iterations=20)
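Once the model does learn to emit the end token, the generated ids can also be truncated at that token in post-processing; a small sketch, assuming the end-token id 3 used above:

def trim_at_end_token(ids, end_token=3):
    # Keep everything before the first end token (if any).
    out = []
    for i in ids:
        if i == end_token:
            break
        out.append(i)
    return out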
The GPU is already in use
During training it sometimes threw a pile of errors (I forget the exact messages) because the GPU was occupied by something else; closing the game basically fixed it.
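Not something the original notes relied on, but one common way to keep TensorFlow 1.x from grabbing all GPU memory up front (so it coexists better with whatever else is using the card) is the allow_growth option:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)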