TensorFlow (13): A Walkthrough of the seq2seq.py Source Code (Part 1)

License: CC 4.0 BY-SA

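This article walks through the implementation and source code of the seq2seq models in TensorFlow 1.2.1, including the different types of sequence models and where they apply. It is aimed at developers who want a deeper understanding of TensorFlow internals.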

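I. Preface

Ever since I started learning the TensorFlow framework, I have kept running into baffling error messages, and solutions to similar problems are hard to find online. So I have long wanted to study the TensorFlow source code to understand the framework at a deeper level, but for various reasons it kept getting put off. I am now starting in earnest. Because I plan to take part in the upcoming JD dialogue system challenge, I am starting with seq2seq in the NLP part. The TensorFlow version used here is 1.2.1.

II. Some tips for reading the source

When reading the source files you will find that almost every file cross-references many others. It helps to first note which files are imported without reading their contents yet. When you run into an unfamiliar function or class name, go and look it up in the imported files at that point. The tool I use is PyCharm on Windows.

[Figure: analysis diagram of the file]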

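III. Getting started

The seq2seq.py file

1. Introduction

seq2seq.py lives under tensorflow/contrib/legacy_seq2seq/python/ops. It is the seq2seq interface from versions before 1.2.1, but it is still bundled in 1.2.1. Since many people still use versions below 1.2.1, I cover this file first.
Purpose of the file: a library for creating sequence-to-sequence models in TensorFlow.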

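2. What the file contains:

* Full sequence-to-sequence models:

- basic_rnn_seq2seq:
  # The simplest version. Inputs and outputs are already embedded vectors; the encoder's final state vector becomes the decoder's initial state; the encoder and decoder use the same RNN cell type but do not share weights.

- tied_rnn_seq2seq:
  # Same as basic_rnn_seq2seq, but the encoder and decoder share weights.

- embedding_rnn_seq2seq:
  # Same as basic_rnn_seq2seq, but inputs and outputs are ids. The function internally creates separate embedding matrices for the encoder and the decoder.

- embedding_tied_rnn_seq2seq:
  # Same as tied_rnn_seq2seq, but inputs and outputs are ids. The function internally creates one embedding matrix that the encoder and decoder share.

- embedding_attention_seq2seq:
  # Same as embedding_rnn_seq2seq, but with an attention mechanism added. Recommended for complex tasks.

* Multi-task sequence-to-sequence models:

- one2many_rnn_seq2seq: the embedding model with multiple decoders.

* Decoders (when you write your own encoder, you can use these to decode):

- rnn_decoder: the basic decoder based on a pure RNN.
- attention_decoder: a decoder that uses the attention mechanism.

* Losses:

- sequence_loss: loss for a sequence model, returning the average log-perplexity.
- sequence_loss_by_example: same as above, but without averaging over all examples.

* model_with_buckets: a convenience function for creating models with bucketing.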
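
The difference between the two losses is easier to see in a few lines of code. Below is a minimal pure-Python sketch of the idea, not the library's implementation: the function names `loss_by_example` and `loss`, the nested-list layout `[time][batch]`, and the small `1e-12` constant are my own assumptions for illustration.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy of one logit vector against an integer target id."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def loss_by_example(logits, targets, weights):
    """Return one weighted, time-averaged loss per batch example.

    logits[t][b] is a logit vector, targets[t][b] an id, weights[t][b] a float
    (hypothetical layout chosen for this sketch).
    """
    steps = range(len(targets))
    losses = []
    for b in range(len(targets[0])):
        total = sum(cross_entropy(logits[t][b], targets[t][b]) * weights[t][b]
                    for t in steps)
        # Small constant avoids division by zero for fully padded examples.
        total_weight = sum(weights[t][b] for t in steps) + 1e-12
        losses.append(total / total_weight)
    return losses

def loss(logits, targets, weights):
    """Same as loss_by_example, then averaged over the batch."""
    per_example = loss_by_example(logits, targets, weights)
    return sum(per_example) / len(per_example)

# Two time steps, batch of two, vocabulary of two, uniform logits.
logits = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
targets = [[0, 1], [1, 0]]
weights = [[1.0, 1.0], [1.0, 0.0]]  # second example is padded at step 2
per_example = loss_by_example(logits, targets, weights)
batch_loss = loss(logits, targets, weights)
```

With uniform logits over two symbols, every unpadded step contributes log 2, so both per-example values and the batch average come out at roughly log 2.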

3. The source code:

(1) Imports

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import copy

# We disable pylint because we need python3 compatibility.
from six.moves import xrange  # pylint: disable=redefined-builtin
from six.moves import zip  # pylint: disable=redefined-builtin

# core_rnn_cell lives in tensorflow/contrib/rnn/python/ops; this imports the RNN cell module
from tensorflow.contrib.rnn.python.ops import core_rnn_cell

# Tensor element types
from tensorflow.python.framework import dtypes

# Classes and functions used to build a graph
from tensorflow.python.framework import ops

# Array operations. The remaining imports are not annotated one by one;
# to see what each imported file does, open the corresponding file.
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.ops import embedding_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import rnn
from tensorflow.python.ops import rnn_cell_impl
from tensorflow.python.ops import variable_scope
from tensorflow.python.util import nest

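The functions are introduced below one by one, in the order they appear in the source.

(2) _extract_argmax_and_embed()

Purpose: returns a loop_function that extracts the previous symbol and embeds it.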
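def _extract_argmax_and_embed(embedding,
                              output_projection=None,
                              update_embedding=True):
  '''
  Args:
    embedding: the embedding tensor for symbols.
    output_projection: None or a pair (W, B). If provided, each fed previous
      output is first multiplied by W and then has B added.
    update_embedding: Boolean; if False, gradients do not propagate through
      the embeddings.
  '''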

  def loop_function(prev, _):
    if output_projection is not None:
      prev = nn_ops.xw_plus_b(prev, output_projection[0], output_projection[1])
    prev_symbol = math_ops.argmax(prev, 1)
    # Note that gradients will not propagate through the second parameter of
    # embedding_lookup.
    emb_prev = embedding_ops.embedding_lookup(embedding, prev_symbol)
    if not update_embedding:
      emb_prev = array_ops.stop_gradient(emb_prev)
    return emb_prev

  # The returned loop_function is used by the functions that follow
  return loop_function
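
To make the three steps (optional projection, argmax, embedding lookup) concrete, here is a minimal pure-Python sketch with no TensorFlow involved. It is only an illustration: `make_loop_function` is a hypothetical name, tensors are replaced by nested lists, and the stop_gradient branch is left out because plain lists have no gradients.

```python
def make_loop_function(embedding, output_projection=None):
    """Build a function that turns previous outputs into the next embedded inputs.

    embedding: list of rows, one embedding vector per symbol id.
    output_projection: None or a pair (w, b), with w of shape
      [output_size][num_symbols] and b of length num_symbols.
    """
    def loop_function(prev, _):
        if output_projection is not None:
            w, b = output_projection
            # Equivalent of prev * W + B, computed row by row.
            prev = [
                [sum(x * w[k][j] for k, x in enumerate(row)) + b[j]
                 for j in range(len(b))]
                for row in prev
            ]
        # argmax over the symbol axis for every batch row.
        prev_symbol = [max(range(len(row)), key=row.__getitem__) for row in prev]
        # Embedding lookup: pick the row that belongs to each symbol id.
        return [embedding[s] for s in prev_symbol]
    return loop_function

embedding = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

# Without projection: prev already holds one score per symbol.
direct = make_loop_function(embedding)
next_direct = direct([[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]], 1)

# With projection: prev has size 2 and is mapped to 3 symbol scores first.
projected = make_loop_function(
    embedding, ([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0]))
next_projected = projected([[0.2, 0.8]], 1)
```

The second argument of loop_function (the step index) is ignored here, just as in the original function.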

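(3) rnn_decoder()

Purpose: RNN decoder for the sequence-to-sequence model.
def rnn_decoder(decoder_inputs,
                initial_state,
                cell,
                loop_function=None,
                scope=None):
  '''
  Args:
    decoder_inputs: a list in which each element is the input at time step
      t_i. Each element holds batch_size inputs, and each input (usually a
      word or token) has input_size dimensions.

    initial_state: the initial state, usually the encoder's final hidden state.

    cell: the RNN cell. Note that rnn_decoder itself has no output_projection
      parameter; that argument belongs to callers such as
      embedding_rnn_seq2seq. When output_projection is left at its default of
      None there, the cell is wrapped in an OutputProjectionWrapper, which
      maps the output from [batch_size, output_size] to
      [batch_size, num_symbols]. When output_projection is not None, the cell
      output stays [batch_size, output_size].

    loop_function: if set, the first decoder input (the "GO" symbol) is still
      fed in, but the inputs at all later time steps are ignored and replaced
      by input_{t+1} = loop_function(output_t, t+1). The loop_function takes
      two arguments, (prev, i), and returns next.
  '''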

  with variable_scope.variable_scope(scope or "rnn_decoder"):
    state = initial_state
    outputs = []
    prev = None
    for i, inp in enumerate(decoder_inputs):
      if loop_function is not None and prev is not None:
        with variable_scope.variable_scope("loop_function", reuse=True):
          inp = loop_function(prev, i)
      if i > 0:
        variable_scope.get_variable_scope().reuse_variables()
      output, state = cell(inp, state)
      outputs.append(output)
      if loop_function is not None:
        prev = output
  return outputs, state
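
The effect of loop_function on the decoding loop can be checked with a toy version of the same control flow. This is a sketch under simplifying assumptions: `rnn_decoder_sketch` and `toy_cell` are hypothetical names, scalars stand in for tensors, and the variable-scope handling is dropped because there are no variables to reuse.

```python
def rnn_decoder_sketch(decoder_inputs, initial_state, cell, loop_function=None):
    """Mirror the loop in rnn_decoder using plain Python values."""
    state = initial_state
    outputs = []
    prev = None
    for i, inp in enumerate(decoder_inputs):
        # From the second step on, the provided input is replaced by
        # whatever loop_function makes of the previous output.
        if loop_function is not None and prev is not None:
            inp = loop_function(prev, i)
        output, state = cell(inp, state)
        outputs.append(output)
        if loop_function is not None:
            prev = output
    return outputs, state

def toy_cell(inp, state):
    """Hypothetical cell: the new state is the running sum, output equals state."""
    new_state = state + inp
    return new_state, new_state

# Without loop_function every decoder input is consumed.
plain, plain_state = rnn_decoder_sketch([1, 10, 100], 0, toy_cell)

# With loop_function only the first input is used; later ones are ignored.
fed_back, fed_back_state = rnn_decoder_sketch(
    [1, 10, 100], 0, toy_cell, loop_function=lambda prev, i: prev)
```

In the second call the inputs 10 and 100 never reach the cell, which is exactly the behaviour used at inference time, when the true next token is not known.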

(4) basic_rnn_seq2seq()

Purpose: Basic RNN sequence-to-sequence model.
def basic_rnn_seq2seq(encoder_inputs,
                      decoder_inputs,
                      cell,
                      dtype=dtypes.float32,
                      scope=None):
  """
  For the detailed description, read the English below; it is easier to follow.
  This model first runs an RNN to encode encoder_inputs into a state vector,
  then runs decoder, initialized with the last encoder state, on decoder_inputs.
  Encoder and decoder use the same RNN cell type, but don't share parameters.

  Args:
    encoder_inputs: A list of 2D Tensors [batch_size x input_size].
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    cell: tf.nn.rnn_cell.RNNCell defining the cell function and size.
    dtype: The dtype of the initial state of the RNN cell (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "basic_rnn_seq2seq".

  Returns:
    # A tuple made up of outputs and state
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing the generated outputs.
      state: The state of each decoder cell in the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with variable_scope.variable_scope(scope or "basic_rnn_seq2seq"):
    enc_cell = copy.deepcopy(cell)
    _, enc_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
    return rnn_decoder(decoder_inputs, enc_state, cell)
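
The body is short: copy the cell for the encoder, run the encoder to get its final state, then start the decoder from that state. The toy sketch below follows the same three steps with plain Python numbers. `ToyCell`, `static_rnn_sketch` and `basic_seq2seq_sketch` are hypothetical stand-ins, and the single `weight` attribute is my assumption for what "parameters" means in this miniature setting.

```python
import copy

class ToyCell:
    """Hypothetical cell: new_state = weight * state + input; output equals state."""
    def __init__(self, weight):
        self.weight = weight

    def __call__(self, inp, state):
        new_state = self.weight * state + inp
        return new_state, new_state

def static_rnn_sketch(cell, inputs, initial_state=0.0):
    """Unroll the cell over a list of inputs and return (outputs, final_state)."""
    state = initial_state
    outputs = []
    for inp in inputs:
        output, state = cell(inp, state)
        outputs.append(output)
    return outputs, state

def basic_seq2seq_sketch(encoder_inputs, decoder_inputs, cell):
    # The encoder works on its own copy, so the two sides hold separate parameters.
    enc_cell = copy.deepcopy(cell)
    _, enc_state = static_rnn_sketch(enc_cell, encoder_inputs)
    # The decoder starts from the encoder's final state.
    return static_rnn_sketch(cell, decoder_inputs, initial_state=enc_state)

cell = ToyCell(2.0)
outputs, state = basic_seq2seq_sketch([1.0, 1.0], [0.0, 1.0], cell)
```

Here the encoder state goes 0 -> 1 -> 3, and the decoder continues from 3, which is why the first decoder output already reflects the encoded input.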

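For reasons of length, the remaining APIs are covered in the next article. My knowledge is limited, so there are bound to be mistakes here; corrections are welcome. Thank you.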