Creating Recurrent Neural Networks with TensorFlow - CSDN Blog

Article link: https://blog.youkuaiyun.com/u011415481/article/details/72757980

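This article shows how to implement an RNN cell in TensorFlow and how to build a recurrent neural network from it. It covers the GRUCell implementation, the forward pass with BasicLSTMCell, multi-layer stacking with MultiRNNCell, and dropout with DropoutWrapper.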


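I have been working with deep learning for quite a while and have read plenty of RNN code, but when I actually tried to implement something in TensorFlow I found I had no idea where to start. I dug up some material, worked through it, and am writing it down here.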

1. Implementing an RNN cell in TensorFlow

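TensorFlow's API changes a lot between versions. In the 1.x releases (1.0 and later, where tf.contrib is available), a cell can be created as follows: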
from tensorflow.contrib import rnn
cell_fun = rnn.GRUCell(rnn_hidden_size)

In TensorFlow, the GRUCell used above is implemented as follows (the source is available on GitHub):

class GRUCell(RNNCell):
  """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078)."""

  def __init__(self, num_units, input_size=None, activation=tanh):
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._activation = activation

  @property
  def state_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with vs.variable_scope(scope or "gru_cell"):
      with vs.variable_scope("gates"):  # Reset gate and update gate.
        # We start with bias of 1.0 to not reset and not update.
        r, u = array_ops.split(
            value=_linear(
                [inputs, state], 2 * self._num_units, True, 1.0, scope=scope),
            num_or_size_splits=2,
            axis=1)
        r, u = sigmoid(r), sigmoid(u)
      with vs.variable_scope("candidate"):
        c = self._activation(_linear([inputs, r * state],
                                     self._num_units, True,
                                     scope=scope))
      new_h = u * state + (1 - u) * c
    return new_h, new_h

Note the __call__ method. Defining it lets an instance of the class be used as if it were a function: given an instance gru of the GRUCell class above, we can write gru(input, last_state) directly.
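As a minimal illustration of that mechanism outside TensorFlow, the sketch below defines a toy class whose __call__ method makes its instances callable. ToyCell and its decay parameter are hypothetical names made up for this example; they are not part of TensorFlow.

```python
class ToyCell(object):
    """A toy 'cell' (hypothetical, not part of TensorFlow) that is callable."""

    def __init__(self, decay):
        self.decay = decay

    def __call__(self, inputs, state):
        # Mix the previous state with the new input, then return
        # (output, new_state), mirroring the RNNCell calling convention.
        new_state = self.decay * state + (1.0 - self.decay) * inputs
        return new_state, new_state


cell = ToyCell(decay=0.5)
# Calling the object invokes ToyCell.__call__(cell, 4.0, 2.0).
output, state = cell(4.0, 2.0)
```

Here cell(4.0, 2.0) is just shorthand for cell.__call__(4.0, 2.0), which is why a cell object can be dropped into a loop as if it were a plain function.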

At first I did not know TensorFlow already provided this, so I wrote a GRU cell of my own. It is included here for reference only:

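# -*- coding: utf-8 -*-
# @Last Modified    : 5/23/2017 1:56 PM
# @Author  : SummmerSnow
# @Description: A hand-written GRU cell (TensorFlow 1.x API).

import tensorflow as tf

class GRU(object):

    def __init__(self, name, input_len, hidden_len):
        self.name = name
        self.input_len = input_len
        self.hidden_len = hidden_len
        self.define_param()

    def define_param(self):
        # The weights of the three parts are packed side by side:
        # [reset gate | update gate | candidate state].
        self.W = tf.get_variable(self.name + "_W", [self.input_len, 3*self.hidden_len])
        self.U = tf.get_variable(self.name + "_U", [self.hidden_len, 3*self.hidden_len])
        self.B = tf.get_variable(self.name + "_B", [3*self.hidden_len],
                                 initializer=tf.zeros_initializer())

    def build_net(self, input_data, last_hidden):
        # input_data: [batch, input_len]; last_hidden: [batch, hidden_len]
        xW = tf.add(tf.matmul(input_data, self.W), self.B)
        hU = tf.matmul(last_hidden, self.U)
        xw1, xw2, xw3 = tf.split(xW, 3, 1)
        hu1, hu2, hu3 = tf.split(hU, 3, 1)
        r = tf.sigmoid(xw1 + hu1)    # reset gate
        z = tf.sigmoid(xw2 + hu2)    # update gate
        h1 = tf.tanh(xw3 + r*hu3)    # candidate state
        # Equivalent to z * h1 + (1 - z) * last_hidden.
        h = (h1 - last_hidden) * z + last_hidden

        return h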
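To sanity-check the update rule in build_net without building a TensorFlow graph, the same arithmetic can be sketched in NumPy. The function name gru_step, the random initialization, and the sizes are assumptions chosen for illustration. As in the class above, the reset gate is applied to the already-projected hidden state.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, W, U, B):
    """One GRU step. x: [batch, input_len], h_prev: [batch, hidden_len]."""
    xW = x @ W + B            # [batch, 3 * hidden_len]
    hU = h_prev @ U           # [batch, 3 * hidden_len]
    xw1, xw2, xw3 = np.split(xW, 3, axis=1)
    hu1, hu2, hu3 = np.split(hU, 3, axis=1)
    r = sigmoid(xw1 + hu1)            # reset gate
    z = sigmoid(xw2 + hu2)            # update gate
    h_cand = np.tanh(xw3 + r * hu3)   # candidate state
    # Same as z * h_cand + (1 - z) * h_prev.
    return (h_cand - h_prev) * z + h_prev


rng = np.random.RandomState(0)
batch, input_len, hidden_len = 2, 4, 3
W = rng.randn(input_len, 3 * hidden_len) * 0.1
U = rng.randn(hidden_len, 3 * hidden_len) * 0.1
B = np.zeros(3 * hidden_len)

x = rng.randn(batch, input_len)
h0 = np.zeros((batch, hidden_len))
h1 = gru_step(x, h0, W, U, B)
```

Because the new state is a gated blend of the old state and a tanh output, every entry of h1 stays strictly inside (-1, 1) when the initial state is zero.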

2. Building an RNN in TensorFlow

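The previous section only created an RNN cell. The remaining questions are how to turn that cell into a network that loops over time and how to compute the loss. [Note: this section only covers how to implement an RNN and assumes you already know how RNNs work. If the theory is unfamiliar, consult other material first.]
Several ways to implement it:
[Reposted from: http://www.what21.com/article/b_android_1491375010268.html]
To implement the forward pass of an LSTM recurrent network in TensorFlow, use BasicLSTMCell: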

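# current_input, expected_output, fully_connected and calc_loss are placeholders that are not defined here

# Define an LSTM cell; the variables used by the LSTM are declared automatically inside it
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_hidden_size)

# Initialize the LSTM state to all zeros; batch_size is the size of one batch
state = lstm.zero_state(batch_size, tf.float32)

# Accumulated loss
loss = 0.0

# num_steps is the maximum sequence length
for i in range(num_steps):
  # The LSTM variables are declared at the first time step; every later step must reuse them
  if i>0:
    tf.get_variable_scope().reuse_variables()
  # Each iteration handles one time step of the sequence. Passing the current input
  # (current_input) and the previous state (state) to the LSTM cell yields the current
  # output lstm_output and the updated state
  lstm_output, state = lstm(current_input, state)

  # Feed the LSTM output at this time step through a fully connected layer to get the final output
  final_output = fully_connected(lstm_output)

  # Add the loss of the output at this time step
  loss += calc_loss(final_output, expected_output)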
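The loop above leaves current_input, fully_connected, and calc_loss undefined. A framework-free sketch of the same pattern may help: here a scalar tanh cell stands in for the LSTM, a single multiplication stands in for the fully connected layer, and a squared error stands in for the loss. All three, along with the weight values, are assumptions for illustration and not what the original code uses.

```python
import math

# Hypothetical fixed weights; a real model would learn these.
w_in, w_rec, w_out = 0.5, 0.8, 2.0


def cell(current_input, state):
    """A scalar tanh RNN cell standing in for the LSTM."""
    new_state = math.tanh(w_in * current_input + w_rec * state)
    return new_state, new_state


inputs = [1.0, 0.0, -1.0]     # one value per time step
targets = [0.5, 0.5, 0.0]     # expected output per time step

state = 0.0                   # plays the role of zero_state
loss = 0.0
outputs = []
for current_input, expected_output in zip(inputs, targets):
    # The same cell (same weights) is applied at every time step.
    cell_output, state = cell(current_input, state)
    final_output = w_out * cell_output          # stand-in for the dense layer
    loss += (final_output - expected_output) ** 2
    outputs.append(final_output)
```

The point to notice is that only state and loss change from one iteration to the next; the weights are shared across all time steps, which is what reuse_variables() enforces in the TensorFlow version.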

To implement a deep (multi-layer) RNN in TensorFlow, use MultiRNNCell:

# Create one cell per layer; [lstm] * number_of_layers would put the same cell object in every layer
cells = [tf.contrib.rnn.BasicLSTMCell(lstm_hidden_size) for _ in range(number_of_layers)]
# MultiRNNCell implements one time step of the forward pass through a deep recurrent network; number_of_layers is the number of layers
stacked_lstm = tf.contrib.rnn.MultiRNNCell(cells)

state = stacked_lstm.zero_state(batch_size, tf.float32)

loss = 0.0
for i in range(num_steps):
  if i>0:
    tf.get_variable_scope().reuse_variables()
  stacked_lstm_output, state = stacked_lstm(current_input, state)
  final_output = fully_connected(stacked_lstm_output)
  loss += calc_loss(final_output, expected_output)
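One Python detail matters when building the list of layers: [cell] * n repeats a reference to a single object rather than creating n cells, so every "layer" would be the same cell. The sketch below uses a toy scalar cell (TinyCell, a hypothetical stand-in for an LSTM cell, with made-up weights) to show the difference, and to show how a stack passes each layer's output to the next layer while keeping one state per layer.

```python
import math


class TinyCell(object):
    """A scalar tanh cell (hypothetical stand-in for an LSTM cell)."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, inputs, state):
        new_state = math.tanh(self.weight * (inputs + state))
        return new_state, new_state


shared = [TinyCell(0.5)] * 3                          # three references to one object
separate = [TinyCell(w) for w in (0.5, 0.7, 0.9)]     # three distinct objects


def stacked_step(cells, current_input, states):
    """Run one time step through every layer; returns (top output, new states)."""
    layer_input = current_input
    new_states = []
    for layer_cell, layer_state in zip(cells, states):
        # The output of this layer becomes the input of the next one.
        layer_input, new_state = layer_cell(layer_input, layer_state)
        new_states.append(new_state)
    return layer_input, new_states


states = [0.0, 0.0, 0.0]          # one state per layer
output, states = stacked_step(separate, 1.0, states)
```

Building the list with a comprehension, as in separate, gives each layer its own object and therefore its own parameters.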

Dropout in a recurrent neural network, using DropoutWrapper:

# Define one LSTM cell per layer and wrap each in DropoutWrapper to add dropout.
# input_keep_prob is the probability of keeping each input unit
# (output_keep_prob is the corresponding setting for the outputs)
dropout_cells = [
    tf.contrib.rnn.DropoutWrapper(
        tf.contrib.rnn.BasicLSTMCell(lstm_hidden_size), input_keep_prob=0.5)
    for _ in range(number_of_layers)]

stacked_lstm = tf.contrib.rnn.MultiRNNCell(dropout_cells)
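For intuition about what a keep probability means, here is a framework-free sketch of inverted dropout: each element is kept with probability keep_prob, and kept elements are scaled by 1 / keep_prob so the expected value is unchanged. The function name and the use of Python's random module are assumptions for illustration; this is not DropoutWrapper's actual implementation.

```python
import random


def inverted_dropout(values, keep_prob, rng):
    """Keep each element with probability keep_prob, scaling survivors by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
# With keep_prob=0.5, each value is either zeroed or doubled.
dropped = inverted_dropout(activations, keep_prob=0.5, rng=rng)
# With keep_prob=1.0, nothing is dropped and nothing is rescaled.
unchanged = inverted_dropout(activations, keep_prob=1.0, rng=rng)
```

A keep probability of 1.0 turns dropout off, which is the usual setting at evaluation time.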