【深度学习系列（三）】：基于CNN+seq2seq公式识别系统实现（3）

wxplol

已于 2022-06-14 09:16:46 修改

阅读量1.2k

点赞数

CC 4.0 BY-SA版权

分类专栏：公式识别文章标签：深度学习 cnn 机器学习计算机视觉

于 2019-09-05 17:03:48 首次发布

本文链接：https://blog.youkuaiyun.com/wxplol/article/details/100033978

公式识别专栏收录该内容

10 篇文章

订阅专栏

本文深入解析了两种关键的解码算法——贪心搜索与集束搜索。贪心搜索在每一步选择当前最大概率的结果，而集束搜索则考虑多个可能性，通过设定集束宽来平衡准确性和效率。文章详细介绍了这两种算法的工作原理及其在序列生成任务中的应用。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

3、解码

3.1、贪心搜索

贪心搜索是一种来自计算机科学的算法，生成第一个词的分布以后，它将会根据你的条件语言模型挑选出最有可能的第一个词进入你的机器翻译模型中，在挑选出第一个词之后它将会继续挑选出最有可能的第二个词，然后继续挑选第三个最有可能的词，这种算法就叫做贪心搜索。这种算法就是在每一步中挑选当前最大概率的结果，并将这个结果作为下一次的输入，并得到这一步的最大概率作为这一次的结果，以此类推。该代码位于：./model/components/greedy_decoder_cell.py中。

    def step(self, time, state, embedding, finished):
        # next step of attention cell
        logits, new_state = self._attention_cell.step(embedding, state)

        # get ids of words predicted and get embedding
        # 计算当前结果的最大值的下标，logits.shape=[N,301]
        new_ids = tf.cast(tf.argmax(logits, axis=-1), tf.int32)
        new_embedding = tf.nn.embedding_lookup(self._embeddings, new_ids)

        # create new state of decoder
        new_output = DecoderOutput(logits, new_ids)

        new_finished = tf.logical_or(finished, tf.equal(new_ids, self._end_token))

        return (new_output, new_state, new_embedding, new_finished)

当然这只是获取当前输出的结果，在./model/components/dynamic_decode.py中，通过dynamic_decode()函数实现在序列中将前时间的输出提供给下一次输入，其主要通过tf.while_loop来实现循环输入。

def dynamic_decode(decoder_cell, maximum_iterations):
    """Similar to dynamic_rnn but to decode

    Args:
        decoder_cell: (instance of DecoderCell) with step method
        maximum_iterations: (int)

    """
    try:
        maximum_iterations = tf.convert_to_tensor(maximum_iterations, dtype=tf.int32)
    except ValueError:
        pass

    # create TA for outputs by mimicing the structure of decodercell output
    def create_tensor_array(d):
        return tf.TensorArray(dtype=d, size=0, dynamic_size=True)

    initial_time = tf.constant(0, dtype=tf.int32)
    initial_outputs_ta = nest.map_structure(create_tensor_array, decoder_cell.output_dtype)
    initial_state, initial_inputs, initial_finished = decoder_cell.initialize()

    def condition(time, unused_outputs_ta, unused_state, unused_inputs,
        finished):
        return tf.logical_not(tf.reduce_all(finished))

    def body(time, outputs_ta, state, inputs, finished):
        new_output, new_state, new_inputs, new_finished = decoder_cell.step(
            time, state, inputs, finished)

        outputs_ta = nest.map_structure(lambda ta, out: ta.write(time, out),
                                      outputs_ta, new_output)

        new_finished = tf.logical_or(
            tf.greater_equal(time, maximum_iterations),
            new_finished)

        return (time + 1, outputs_ta, new_state, new_inputs, new_finished)

    with tf.variable_scope("rnn"):
        res = tf.while_loop(
            condition,
            body,
            loop_vars=[initial_time, initial_outputs_ta, initial_state,
                       initial_inputs, initial_finished],
            back_prop=False)

    # get final outputs and states
    final_outputs_ta, final_state = res[1], res[2]

    # unfold and stack the structure from the nested tas
    final_outputs = nest.map_structure(lambda ta: ta.stack(), final_outputs_ta)

    # finalize the computation from the decoder cell
    final_outputs = decoder_cell.finalize(final_outputs, final_state)

    # transpose the final output
    final_outputs = nest.map_structure(transpose_batch_time, final_outputs)

    return final_outputs, final_state

参考文献：

Greedy search与beam search

3.2、beam search

集束搜索会考虑多个选择，集束搜索算法会有一个参数B，叫做集束宽（beam width）。集束搜索可以认为是维特比算法的贪心形式，在维特比所有中由于利用动态规划导致当字典较大时效率低，而集束搜索使用beam size参数来限制在每一步保留下来的可能性词的数量。集束搜索是在测试阶段为了获得更好准确性而采取的一种策略，在训练阶段无需使用。

    def step(self, time, state, embedding, finished):
        """
        Args:
            time: tensorf or int
            embedding: shape [batch_size, beam_size, d]
            state: structure of shape [bach_size, beam_size, ...]
            finished: structure of shape [batch_size, beam_size, ...]

        """
        # merge batch and beam dimension before callling step of cell
        cell_state = nest.map_structure(merge_batch_beam, state.cell_state)
        embedding = merge_batch_beam(embedding)

        # compute new logits
        logits, new_cell_state = self._cell.step(embedding, cell_state)

        # split batch and beam dimension before beam search logic
        new_logits = split_batch_beam(logits, self._beam_size)
        new_cell_state = nest.map_structure(
            lambda t: split_batch_beam(t, self._beam_size), new_cell_state)

        # compute log probs of the step
        # shape = [batch_size, beam_size, vocab_size]
        step_log_probs = tf.nn.log_softmax(new_logits)
        # shape = [batch_size, beam_size, vocab_size]
        step_log_probs = mask_probs(step_log_probs, self._end_token, finished)
        # shape = [batch_size, beam_size, vocab_size]
        log_probs = tf.expand_dims(state.log_probs, axis=-1) + step_log_probs
        log_probs = add_div_penalty(log_probs, self._div_gamma, self._div_prob,
                                    self._batch_size, self._beam_size, self._vocab_size)

        # compute the best beams
        # shape =  (batch_size, beam_size * vocab_size)
        log_probs_flat = tf.reshape(log_probs,
                                    [self._batch_size, self._beam_size * self._vocab_size])
        # if time = 0, consider only one beam, otherwise beams are equal
        log_probs_flat = tf.cond(time > 0, lambda: log_probs_flat,
                                 lambda: log_probs[:, 0])
        new_probs, indices = tf.nn.top_k(log_probs_flat, self._beam_size)

        # of shape [batch_size, beam_size]
        new_ids = indices % self._vocab_size
        new_parents = indices // self._vocab_size

        # get ids of words predicted and get embedding
        new_embedding = tf.nn.embedding_lookup(self._embeddings, new_ids)

        # compute end of beam
        finished = gather_helper(finished, new_parents,
                                 self._batch_size, self._beam_size)
        new_finished = tf.logical_or(finished,
                                     tf.equal(new_ids, self._end_token))

        new_cell_state = nest.map_structure(
            lambda t: gather_helper(t, new_parents, self._batch_size,
                                    self._beam_size), new_cell_state)

        # create new state of decoder
        new_state = BeamSearchDecoderCellState(cell_state=new_cell_state,
                                               log_probs=new_probs)

        new_output = BeamSearchDecoderOutput(logits=new_logits, ids=new_ids,
                                             parents=new_parents)

        return (new_output, new_state, new_embedding, new_finished)

参考文献：

NLP 自然语言处理集束搜索beam search和贪心搜索greedy search