Recurrent Neural Network Example 2: Improving RNNs

License: CC 4.0 BY-SA

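Summary: Because of its activation function, a basic RNN breaks down on more complex problems and cannot learn the features of long sequences. This post introduces RNN variants such as LSTM and Bi-RNN, and mentions CTC, a key technique in speech recognition. It then covers how to build RNNs in TensorFlow (static, dynamic, and bidirectional), walks through an example that classifies MNIST with an RNN, and closes with optimization methods specific to RNNs.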

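For relatively complex problems, the basic RNN shows its weakness, and the cause is again the activation function. As a rule of thumb, a network built on saturating activation functions such as sigmoid or tanh can only be trained to a depth of about 6 layers, because the error propagated backward becomes smaller and smaller as the number of layers grows (the vanishing gradient problem). In an RNN, the error is propagated not only between layers but also across the time steps of the sequence within each layer, so a basic RNN cannot learn the features of very long sequences.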
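The shrinking error signal can be seen numerically. The following is a minimal sketch, assuming a scalar RNN `h_t = tanh(w * h_{t-1} + x_t)`; the function name `gradient_through_time` and the values `w=0.9` and `x_t=0.5` are illustrative assumptions, not part of the original post.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule for one step: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)

# The longer the sequence, the smaller the gradient that reaches the first step.
for steps in (5, 20, 50):
    print(steps, gradient_through_time(w=0.9, inputs=[0.5] * steps))
```

Every step multiplies the gradient by a factor whose magnitude is below 1, so the contribution of early time steps fades roughly exponentially with sequence length.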

As a result, many variants of the RNN have been developed that allow a model to learn features over longer sequences.

Long Short-Term Memory (LSTM), a time-recurrent neural network

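Peephole connections were introduced to compensate for a shortcoming of the forget gate: the current cell state cannot influence the outputs of the Input Gate and Forget Gate at the next time step, so the cell loses part of the information from its processing of the previous sequence element. The peephole connections are the dashed lines in the figure, and the computation proceeds in this order: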

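(1) The data output by the cell at the previous time step is fed into the Input Gate and Forget Gate together with the data of the current time step.

(2) The outputs of the Input Gate and Forget Gate are both fed into the cell.

(3) The data coming out of the cell is fed into the Output Gate at the current time step, and also into the Input Gate and Forget Gate at the next time step.

(4) The output of the Output Gate, combined with the activated cell data, forms the output of the whole block.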
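The four steps above can be sketched for a single scalar unit. This is an illustrative sketch under stated assumptions, not TensorFlow's implementation: the function `peephole_lstm_step` and the weight names in `WEIGHT_NAMES` are hypothetical, and all values are scalars to keep the data flow visible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weight names: w* act on the input, u* on the previous output,
# p* are the peephole weights on the cell state, b* are biases.
WEIGHT_NAMES = ("wi", "ui", "pi", "bi", "wf", "uf", "pf", "bf",
                "wg", "ug", "bg", "wo", "uo", "po", "bo")

def peephole_lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM unit with peephole connections."""
    # (1) Input and forget gates see x_t, h_{t-1} and, via the peephole, c_{t-1}.
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["pi"] * c_prev + p["bi"])
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["pf"] * c_prev + p["bf"])
    # (2) Both gate outputs enter the cell and update its state.
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    c = f * c_prev + i * g
    # (3) The output gate peeks at the current cell state c_t.
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["po"] * c + p["bo"])
    # (4) Block output = output gate * activated cell state.
    h = o * math.tanh(c)
    return h, c

params = dict.fromkeys(WEIGHT_NAMES, 0.5)
h, c = 0.0, 0.0
for x in (1.0, 0.5, -0.5):
    h, c = peephole_lstm_step(x, h, c, params)
    print(h, c)
```

The returned `c` is what the next call receives as `c_prev`, which is how the cell state reaches the input and forget gates of the following time step.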

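Bi-RNN (bidirectional RNN) uses two RNNs that run in opposite directions: one reads the sequence forward and the other reads it backward, so the output at each step can draw on both past and future context.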
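A minimal sketch of the two-direction idea, assuming a toy scalar RNN; the helper names `run_rnn` and `bidirectional_rnn` and the weights `w` and `u` are illustrative assumptions.

```python
import math

def run_rnn(inputs, w=0.5, u=0.5):
    """Toy scalar RNN: h_t = tanh(w * x_t + u * h_{t-1}); returns every h_t."""
    h, outputs = 0.0, []
    for x in inputs:
        h = math.tanh(w * x + u * h)
        outputs.append(h)
    return outputs

def bidirectional_rnn(inputs):
    """Pair each step's forward output with the backward output for that step."""
    forward = run_rnn(inputs)
    # Run on the reversed sequence, then reverse again to restore time order.
    backward = run_rnn(inputs[::-1])[::-1]
    return list(zip(forward, backward))

# At step 0 the forward pass has seen nothing informative yet,
# while the backward pass has already seen the 1.0 at the end.
for step, pair in enumerate(bidirectional_rnn([0.0, 0.0, 1.0])):
    print(step, pair)
```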

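Connectionist Temporal Classification (CTC), a neural-network-based method for classifying time series, is a key technique in speech recognition. It adds an extra symbol that stands for NULL (the blank) to solve the repeated-character problem.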

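The method mainly concerns how the loss is computed: by inserting blanks (empty labels) where the label sequence does not line up, it aligns the predicted outputs with the given labels along the time axis, and then computes the loss with a cross-entropy calculation.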

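In speech recognition, for example, each utterance has its sequence of values and the corresponding text. The CTC loss function gives the loss between the model output and the label, and the model is trained by letting an optimizer iteratively reduce that loss.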
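To see why the blank symbol solves the repeated-character problem, here is a small sketch of the CTC collapsing rule (merge consecutive repeats, then drop blanks). It covers only the decoding side, not the loss; the function name `ctc_collapse` and the choice of `-` as the blank are assumptions made for illustration.

```python
def ctc_collapse(path, blank="-"):
    """Map a frame-level path to a label: merge repeats, then remove blanks."""
    result = []
    previous = None
    for symbol in path:
        # Keep a symbol only if it starts a new run and is not the blank.
        if symbol != previous and symbol != blank:
            result.append(symbol)
        previous = symbol
    return "".join(result)

print(ctc_collapse("hh-e-ll-l-oo"))  # blank separates the two l's -> hello
print(ctc_collapse("hheelllloo"))    # no blank, the double l is lost -> helo
```

Without a blank between them, consecutive frames of the same character are indistinguishable from one long character, which is the ambiguity the extra symbol removes.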

RNNs in TensorFlow

Once the cell objects are defined, they still have to be connected to form an RNN.

1. Building a static RNN: static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)

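cell: an already constructed cell object
inputs: the input data; must be a list of 2-D tensors. The order of the list is the time order, and each element holds the values for one time step.
initial_state: the initial state of the cell
dtype: the expected type of the output and of the initial state
sequence_length: the sequence length of each input
scope: the variable scope (namespace)
Return value: two items, the outputs and the cell state. The outputs contain one element per input time step.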
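The contract described above (a list of per-step inputs in, one output per step plus a final state out) can be sketched in plain Python. This illustrates the unrolling idea only and is not TensorFlow's code; `static_rnn_sketch` and `running_sum_cell` are hypothetical names.

```python
def static_rnn_sketch(cell, inputs, initial_state):
    """Unroll `cell` over a Python list of per-step inputs."""
    state = initial_state
    outputs = []
    for x_t in inputs:  # the list order is the time order
        output, state = cell(x_t, state)
        outputs.append(output)
    return outputs, state

def running_sum_cell(x_t, state):
    """Toy cell used only for illustration: the state accumulates the inputs."""
    new_state = state + x_t
    return new_state, new_state

outputs, final_state = static_rnn_sketch(running_sum_cell, [1, 2, 3, 4], 0)
print(outputs)      # [1, 3, 6, 10], one output per time step
print(final_state)  # 10
```

Because the loop runs over a Python list, the number of time steps is fixed when the network is built, which is why the MNIST example first splits its input with tf.unstack.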

2. Building a dynamic RNN: dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)

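cell: an already constructed cell object
inputs: the input data as a tensor, usually 3-D, [batch_size, max_time, ...]
initial_state: the initial state of the cell
dtype: the expected type of the output and of the initial state
sequence_length: the sequence length of each input
time_major: defaults to False, in which case the input shape is [batch_size, max_time, ...]. If True, the shape is [max_time, batch_size, ...]
scope: the variable scope (namespace)
Return value: the outputs, [batch_size, max_time, ...], and the cell state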
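The two layouts controlled by time_major differ only in the order of the first two axes. The sketch below shows the conversion on nested lists, with strings standing in for feature vectors; `to_time_major` is a hypothetical helper, not a TensorFlow function.

```python
def to_time_major(batch_major):
    """[batch_size, max_time, ...] -> [max_time, batch_size, ...] for nested lists."""
    return [list(step) for step in zip(*batch_major)]

# Two samples ("a" and "b"), three time steps each.
batch_major = [
    ["a0", "a1", "a2"],
    ["b0", "b1", "b2"],
]
time_major = to_time_major(batch_major)
print(time_major)      # [['a0', 'b0'], ['a1', 'b1'], ['a2', 'b2']]
print(time_major[-1])  # last time step of every sample: ['a2', 'b2']
```

This is the same reordering that tf.transpose(outputs, [1, 0, 2]) performs in the MNIST example further down, so that outputs[-1] selects the last time step for the whole batch.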

3. Building a bidirectional RNN: four functions are available

4. Handling variable-length sequences with a dynamic RNN

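A more advanced feature of the dynamic RNN is that it can handle variable-length sequences. To use it, prepare the length of each sample along with the samples and pass those lengths as an argument (sequence_length) when creating the dynamic RNN.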
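What passing the lengths achieves can be sketched for a single padded sequence: steps beyond the true length produce a zero output and leave the state untouched, which is the behavior documented for dynamic_rnn. The names `dynamic_rnn_sketch`, `running_sum_cell`, and `PAD` are assumptions for this plain-Python illustration.

```python
PAD = -1  # arbitrary padding value; it must never influence the result

def dynamic_rnn_sketch(cell, sequence, length, initial_state, zero_output=0):
    """Run `cell` over one padded sequence, ignoring steps at or beyond `length`."""
    state = initial_state
    outputs = []
    for t, x_t in enumerate(sequence):
        if t < length:
            output, state = cell(x_t, state)
        else:
            output = zero_output  # padded step: emit zero, keep the state as is
        outputs.append(output)
    return outputs, state

def running_sum_cell(x_t, state):
    """Toy cell used only for illustration: the state accumulates the inputs."""
    new_state = state + x_t
    return new_state, new_state

padded = [1, 2, 3, PAD, PAD]  # true length 3, padded to max_time 5
outputs, final_state = dynamic_rnn_sketch(running_sum_cell, padded, 3, 0)
print(outputs)      # [1, 3, 6, 0, 0]
print(final_state)  # 6, the state after the last real step
```

The final state therefore corresponds to the last real element of each sequence, not to the last padded position.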

Example: classifying MNIST with an RNN

import tensorflow as tf
# Import the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/data/", one_hot=True)

n_input = 28 # MNIST data input (img shape: 28*28), one image row per time step
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10  # MNIST classes (digits 0-9, 10 classes in total)

tf.reset_default_graph()

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])


x1 = tf.unstack(x, n_steps, 1)

#1 BasicLSTMCell
lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)

#2 LSTMCell
#lstm_cell = tf.contrib.rnn.LSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)

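#3 GRU (static_rnn returns both the outputs and the final state)
#gru = tf.contrib.rnn.GRUCell(n_hidden)
#outputs, states = tf.contrib.rnn.static_rnn(gru, x1, dtype=tf.float32)

#4 Build a dynamic RNN (takes the 3-D tensor x directly and needs gru from option 3)
#outputs,_  = tf.nn.dynamic_rnn(gru,x,dtype=tf.float32)
#outputs = tf.transpose(outputs, [1, 0, 2])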

pred = tf.contrib.layers.fully_connected(outputs[-1],n_classes,activation_fn = None)

learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Launch the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Compute the accuracy on the current batch
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print (" Finished!")

    # Compute the accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

Optimizing RNNs

There are many techniques for optimizing RNNs; this section introduces two that are specific to RNNs.

1. Dropout: RNNs have their own dropout, lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=output_keep_prob)

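When the state is passed from time step t-1 to time step t, no dropout is applied to the memory. Dropout is applied only within the same time step t, where information passes between stacked cell layers. This is why the RNN dropout has two parameters: input_keep_prob (the keep rate for what enters the cell) and output_keep_prob (the keep rate for what leaves the cell).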
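The placement of the two dropout masks can be sketched in plain Python. This is an illustration of the idea, not the DropoutWrapper source: `dropout`, `dropout_wrapped_step`, and `toy_cell` are hypothetical names, and inverted dropout (rescaling kept values by 1/keep_prob) is assumed.

```python
import random

def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob, rescale the rest."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

def dropout_wrapped_step(cell, x_t, state, input_keep_prob, output_keep_prob, rng):
    """One time step of a cell with dropout on its input and output only."""
    x_t = dropout(x_t, input_keep_prob, rng)         # mask what enters the cell
    output, new_state = cell(x_t, state)
    output = dropout(output, output_keep_prob, rng)  # mask what goes to the next layer
    return output, new_state                         # the recurrent state is not masked

def toy_cell(x_t, state):
    """Toy cell used only for illustration: the state accumulates the inputs."""
    new_state = [s + x for s, x in zip(state, x_t)]
    return list(new_state), new_state

rng = random.Random(0)
output, state = dropout_wrapped_step(toy_cell, [1.0, 2.0], [0.0, 0.0], 1.0, 0.5, rng)
print(output)  # each entry is either 0.0 or twice the matching state value
print(state)   # [1.0, 2.0], carried to the next time step without dropout
```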

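2. Layer normalization (LN): because of its special structure, the input of an RNN differs from that of the fully connected and convolutional networks discussed earlier.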

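In BN (batch normalization), the input of each layer only has to account for the current batch of samples (or their transformed values).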

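In an RNN, however, the input of each layer includes not only the transformed values of the current batch but also the output for the previous element of the sequence. BN is therefore no longer applicable, because a mini-batch cannot cover all of the input data. Instead, normalization is performed over the inputs to a single layer, which is layer normalization.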
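Before the TensorFlow version, here is a small numeric sketch of the core computation: each sample is normalized over its own features, independently of the rest of the batch. The function name `layer_norm` is an assumption, and the learned scale and shift parameters are omitted for brevity.

```python
import math

def layer_norm(features, epsilon=1e-5):
    """Normalize one sample's feature vector to zero mean and roughly unit variance."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + epsilon) for f in features]

# Two samples with very different scales.
batch = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
]
normalized = [layer_norm(sample) for sample in batch]
for row in normalized:
    print(row)
```

Both rows come out nearly identical, because the statistics are taken per sample rather than across the batch. That independence from the batch is what lets the same normalization be applied at every time step of an RNN.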

import numpy as np
import tensorflow as tf
# Import the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/data/", one_hot=True)

from tensorflow.python.ops.rnn_cell_impl import _RNNCell as RNNCell
from tensorflow.python.ops.math_ops import sigmoid
from tensorflow.python.ops.math_ops import tanh
from tensorflow.python.ops import variable_scope as vs
from tensorflow.python.ops import array_ops
from tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl import _linear 

print(tf.__version__)

tf.reset_default_graph()

def ln(tensor, scope = None, epsilon = 1e-5):
    """ Layer normalizes a 2D tensor along its second axis """
    assert(len(tensor.get_shape()) == 2)
    m, v = tf.nn.moments(tensor, [1], keep_dims=True)
    if not isinstance(scope, str):
        scope = ''
    with tf.variable_scope(scope + 'layer_norm'):
        scale = tf.get_variable('scale',
                                shape=[tensor.get_shape()[1]],
                                initializer=tf.constant_initializer(1))
        shift = tf.get_variable('shift',
                                shape=[tensor.get_shape()[1]],
                                initializer=tf.constant_initializer(0))
    LN_initial = (tensor - m) / tf.sqrt(v + epsilon)

    return LN_initial * scale + shift

class LNGRUCell(RNNCell):
    """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078)."""

    def __init__(self, num_units, input_size=None, activation=tanh):
        if input_size is not None:
            print("%s: The input_size parameter is deprecated." % self)
        self._num_units = num_units
        self._activation = activation

    @property
    def state_size(self):
        return self._num_units

    @property
    def output_size(self):
        return self._num_units

    def __call__(self, inputs, state):
        """Gated recurrent unit (GRU) with nunits cells."""
        with vs.variable_scope("Gates"):  # Reset gate and update gate.
            # We start with bias of 1.0 to not reset and not update.
            value =_linear([inputs, state], 2 * self._num_units, True, 1.0)
            r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)
            r = ln(r, scope = 'r/')
            u = ln(u, scope = 'u/')
            r, u = sigmoid(r), sigmoid(u)
        with vs.variable_scope("Candidate"):
#            with vs.variable_scope("Layer_Parameters"):
            Cand = _linear([inputs,  r *state], self._num_units, True)
            c_pre = ln(Cand,  scope = 'new_h/')
            c = self._activation(c_pre)
        new_h = u * state + (1 - u) * c
        return new_h, new_h

n_input = 28 # MNIST data input (img shape: 28*28), one image row per time step
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10  # MNIST classes (digits 0-9, 10 classes in total)

tf.reset_default_graph()

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

x1 = tf.unstack(x, n_steps, 1)

#1 BasicLSTMCell
#lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)

#2 LSTMCell
#lstm_cell = tf.contrib.rnn.LSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)

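#3 GRU (here the layer-normalized LNGRUCell defined above replaces GRUCell)
#gru = tf.contrib.rnn.GRUCell(n_hidden)
gru = LNGRUCell(n_hidden)
#outputs, states = tf.contrib.rnn.static_rnn(gru, x1, dtype=tf.float32)

#4 Build a dynamic RNN
outputs,_  = tf.nn.dynamic_rnn(gru,x,dtype=tf.float32)
# Switch to [max_time, batch_size, n_hidden] so that outputs[-1] is the last time step
outputs = tf.transpose(outputs, [1, 0, 2])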

pred = tf.contrib.layers.fully_connected(outputs[-1],n_classes,activation_fn = None)

learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Launch the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Compute the accuracy on the current batch
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print (" Finished!")

    # Compute the accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: test_data, y: test_label}))
