This chapter introduces several techniques for training deep neural networks and implements them in Python (mainly CNNs); it then surveys the deep learning techniques recently used in image recognition, speech recognition, and other applications; finally it closes with an outlook on the future of neural networks and AI.
Introducing convolutional networks
之前的手写识别,用了如下的层间全连接的神经网络:
具体来说,输入层有
28∗28
个神经元,对应着
784(=28∗28)
个像素;训练这个神经网络的biases和weights,使得这个网络正确地判别数字。
The earlier fully-connected network already reaches about 98% accuracy, but a fully-connected architecture is short-sighted: it ignores the spatial structure of the images. For example, it treats two adjacent pixels the same way as two pixels that are far apart.
Convolutional neural networks (CNNs) are an architecture that exploits this spatial structure. It makes the network faster to train and is currently widely used for image recognition.
A CNN is built from three ideas: local receptive fields, shared weights, and pooling.
Local receptive fields
In a CNN the input image is treated as a two-dimensional array, for example a 28×28 array:
As before, each pixel forms one neuron of the input layer. Unlike before, each neuron of the first hidden layer is not connected to every input neuron; instead it is connected to a small region of input neurons. In the example below, that region is 5×5:
This region is called the local receptive field. Each connection has a weight, and each hidden neuron also has a bias.
Repeating this for each receptive field gives the first neuron of the first hidden layer:
the second neuron:
and so on, building up the complete first hidden layer.
In this example, with a 28×28 input image and 5×5 local receptive fields, we obtain a 24×24 (= 28−5+1) hidden layer.
The example above slides the local receptive field one pixel at a time; a stride of two or more pixels is also sometimes used. A quick size calculation is sketched below.
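As a small check of the arithmetic, here is a minimal sketch (the function name is illustrative, not from the book's code) of how the hidden-layer size follows from the input size, the receptive-field size, and the stride:

```python
# Minimal sketch: hidden-layer width/height from input size, local receptive
# field size, and stride (illustrative helper, not part of network3.py).
def conv_output_size(input_size, field_size, stride=1):
    return (input_size - field_size) // stride + 1

print(conv_output_size(28, 5))            # 24 -> a 24x24 hidden layer
print(conv_output_size(28, 5, stride=2))  # 12 -> sliding the field 2 pixels at a time
```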
Shared weights and biases
In the example above, each hidden neuron has one bias and a 5×5 array of weights connected to its local receptive field, and the same weights and bias are used for all 24×24 hidden neurons. The output of the j,k-th hidden neuron is

σ(b + Σ_{l=0}^{4} Σ_{m=0}^{4} w_{l,m} a_{j+l, k+m})        (125)

where σ is the activation function, b is the shared bias, w_{l,m} is the 5×5 array of shared weights, and a_{x,y} is the input activation at position x, y.
This means that every neuron in the first hidden layer detects exactly the same feature. For this reason the map from the input layer to the hidden layer is called a feature map, its weights are called the shared weights, and its bias the shared bias. Together, the shared weights and shared bias are referred to as a kernel or filter.
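To make equation (125) concrete, here is a minimal numpy sketch (not the Theano code used later) that computes a single 24×24 feature map from a 28×28 input using one 5×5 set of shared weights and one shared bias:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(a, w, b):
    """a: 28x28 input activations, w: 5x5 shared weights, b: shared bias."""
    out = np.zeros((24, 24))
    for j in range(24):
        for k in range(24):
            # Equation (125): the same w and b are used for every hidden neuron
            out[j, k] = sigmoid(b + np.sum(w * a[j:j+5, k:k+5]))
    return out

a = np.random.rand(28, 28)          # stand-in for an MNIST image
w = np.random.randn(5, 5) * 0.1
b = 0.0
print(feature_map(a, w, b).shape)   # (24, 24)
```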
For image recognition we need more than one feature map, so a complete convolutional layer consists of several feature maps:
The example above has 3 feature maps, each defined by a 5×5 set of shared weights plus one shared bias, so this network can detect 3 different features.
The big advantage of shared weights and biases is that they greatly reduce the number of parameters compared with a fully-connected model, which speeds up training; a rough count follows.
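As a rough illustration of the savings (assuming, for the sake of the comparison, 20 feature maps in the convolutional layer and a fully-connected first layer of 30 hidden neurons as in the earlier chapters):

```python
# Parameter counts under the stated assumptions.
conv_params = 20 * (5 * 5 + 1)   # 20 feature maps, each 5x5 shared weights + 1 shared bias = 520
fc_params = 784 * 30 + 30        # 784 inputs fully connected to 30 hidden neurons, plus biases = 23550
print(conv_params, fc_params)
```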
One last point: this structure is called convolutional because expression (125) has the form of a convolution.
As for convolution itself, I still remember my digital-signal-processing teacher sketching superposed waves, and my signals-and-systems teacher's chained-bell analogy and the area-shifting method for computing convolutions. Seen that way, this expression really does have the flavour of sliding a window across the input and summing over the overlap.
Pooling layers
Pooling layers usually follow the convolutional layers; their role is to simplify the information in the convolutional layer's output.
Concretely, a pooling layer condenses each feature map output by the convolutional layer into a smaller feature map.
For example, each 2×2 block of neurons can be condensed by taking the maximum of their outputs; this is called max-pooling:
Where each feature map previously had 24×24 neurons, only 12×12 remain.
Combining everything described above gives the following structure:
Max-pooling can be thought of as the network asking whether a given feature is found anywhere in a region of the image; it keeps only that rough location and discards the exact positional information.
Commonly used pooling methods include max-pooling and L2 pooling (taking the square root of the sum of the squares of the activations in each 2×2 block); both are sketched below.
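A minimal numpy sketch of both pooling variants on a single 24×24 feature map (illustrative only; the Theano code later uses max_pool_2d):

```python
import numpy as np

def pool_2x2(fmap, mode="max"):
    """Condense each non-overlapping 2x2 block of `fmap` to a single number."""
    h, w = fmap.shape
    blocks = fmap.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))              # max-pooling
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))  # L2 pooling

fmap = np.random.rand(24, 24)
print(pool_2x2(fmap, "max").shape)  # (12, 12)
print(pool_2x2(fmap, "l2").shape)   # (12, 12)
```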
Putting it all together
Finally we add 10 output neurons, one for each digit from 0 to 9; the complete network looks like this:
Note that the output layer is fully connected to the pooling layer; the figure above simplifies this.
The goal of this architecture is the same as before: use the training data to learn weights and biases so that the network classifies its input well. Training still uses backpropagation and SGD, although some of the details change.
I'll leave the following problem for now and work through it later.
Problem: Backpropagation in a convolutional network
The core equations of backpropagation in a network with fully-connected layers are (BP1)-(BP4) (link). Suppose we have a network containing a convolutional layer, a max-pooling layer, and a fully-connected output layer, as in the network discussed above. How are the equations of backpropagation modified?
Convolutional neural networks in practice
This section shows how a CNN is applied to the handwritten-digit example. The full code appears in the next section; here it is used as a library for experiments and demonstrations, and the Theano library is used to handle backpropagation in the CNN.
The experiments use the various techniques introduced earlier, such as different activation functions, a softmax output, dropout, and so on.
This also answers the question raised in the previous chapter: given the vanishing and exploding gradient problems, why can we train these networks at all?
The answer is not that these problems are avoided, but that several techniques are used to mitigate them:
- Using convolutional layers greatly reduces the number of parameters in those layers, which makes learning easier.
- Using powerful regularization techniques (dropout, and the convolutional layers themselves) reduces overfitting.
- Using ReLU instead of sigmoid speeds up learning.
- Using GPUs.
There are other details as well: enough training data to avoid overfitting; the right cost function to avoid a learning slowdown; good weight initialization to avoid slowdown and neuron saturation; artificially expanding the training data; and so on. A sketch of how the code in the next section is driven follows.
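For reference, this is how the code from the next section is used in the book's experiments: a ConvPoolLayer, a fully-connected layer, and a softmax output layer, trained with mini-batch SGD. The hyperparameters here (60 epochs, learning rate 0.1, 100 hidden neurons, 20 feature maps) are illustrative:

```python
import network3
from network3 import Network, ConvPoolLayer, FullyConnectedLayer, SoftmaxLayer

training_data, validation_data, test_data = network3.load_data_shared()
mini_batch_size = 10
net = Network([
    ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28),
                  filter_shape=(20, 1, 5, 5),       # 20 feature maps, 5x5 filters
                  poolsize=(2, 2)),
    FullyConnectedLayer(n_in=20*12*12, n_out=100),  # 12x12 = pooled output per feature map
    SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 60, mini_batch_size, 0.1, validation_data, test_data)
```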
The code for our convolutional networks
The full code is below; I haven't polished the details yet.
"""network3.py
~~~~~~~~~~~~~~
A Theano-based program for training and running simple neural
networks.
Supports several layer types (fully connected, convolutional, max
pooling, softmax), and activation functions (sigmoid, tanh, and
rectified linear units, with more easily added).
When run on a CPU, this program is much faster than network.py and
network2.py. However, unlike network.py and network2.py it can also
be run on a GPU, which makes it faster still.
Because the code is based on Theano, the code is different in many
ways from network.py and network2.py. However, where possible I have
tried to maintain consistency with the earlier programs. In
particular, the API is similar to network2.py. Note that I have
focused on making the code simple, easily readable, and easily
modifiable. It is not optimized, and omits many desirable features.
This program incorporates ideas from the Theano documentation on
convolutional neural nets (notably,
http://deeplearning.net/tutorial/lenet.html ), from Misha Denil's
implementation of dropout (https://github.com/mdenil/dropout ), and
from Chris Olah (http://colah.github.io ).
"""
#### Libraries
# Standard library
import cPickle
import gzip
# Third-party libraries
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv
from theano.tensor.nnet import softmax
from theano.tensor import shared_randomstreams
from theano.tensor.signal import downsample
# Activation functions for neurons
def linear(z): return z
def ReLU(z): return T.maximum(0.0, z)
from theano.tensor.nnet import sigmoid
from theano.tensor import tanh
#### Constants
GPU = True
if GPU:
print "Trying to run under a GPU. If this is not desired, then modify "+\
"network3.py\nto set the GPU flag to False."
try: theano.config.device = 'gpu'
except: pass # it's already set
theano.config.floatX = 'float32'
else:
print "Running with a CPU. If this is not desired, then the modify "+\
"network3.py to set\nthe GPU flag to True."
#### Load the MNIST data
def load_data_shared(filename="../data/mnist.pkl.gz"):
f = gzip.open(filename, 'rb')
training_data, validation_data, test_data = cPickle.load(f)
f.close()
def shared(data):
"""Place the data into shared variables. This allows Theano to copy
the data to the GPU, if one is available.
"""
shared_x = theano.shared(
np.asarray(data[0], dtype=theano.config.floatX), borrow=True)
shared_y = theano.shared(
np.asarray(data[1], dtype=theano.config.floatX), borrow=True)
return shared_x, T.cast(shared_y, "int32")
return [shared(training_data), shared(validation_data), shared(test_data)]
#### Main class used to construct and train networks
class Network(object):
def __init__(self, layers, mini_batch_size):
"""Takes a list of `layers`, describing the network architecture, and
a value for the `mini_batch_size` to be used during training
by stochastic gradient descent.
"""
self.layers = layers
self.mini_batch_size = mini_batch_size
self.params = [param for layer in self.layers for param in layer.params]
self.x = T.matrix("x")
self.y = T.ivector("y")
init_layer = self.layers[0]
init_layer.set_inpt(self.x, self.x, self.mini_batch_size)
for j in xrange(1, len(self.layers)):
prev_layer, layer = self.layers[j-1], self.layers[j]
layer.set_inpt(
prev_layer.output, prev_layer.output_dropout, self.mini_batch_size)
self.output = self.layers[-1].output
self.output_dropout = self.layers[-1].output_dropout
def SGD(self, training_data, epochs, mini_batch_size, eta,
validation_data, test_data, lmbda=0.0):
"""Train the network using mini-batch stochastic gradient descent."""
training_x, training_y = training_data
validation_x, validation_y = validation_data
test_x, test_y = test_data
# compute number of minibatches for training, validation and testing
num_training_batches = size(training_data)/mini_batch_size
num_validation_batches = size(validation_data)/mini_batch_size
num_test_batches = size(test_data)/mini_batch_size
# define the (regularized) cost function, symbolic gradients, and updates
l2_norm_squared = sum([(layer.w**2).sum() for layer in self.layers])
cost = self.layers[-1].cost(self)+\
0.5*lmbda*l2_norm_squared/num_training_batches
grads = T.grad(cost, self.params)
updates = [(param, param-eta*grad)
for param, grad in zip(self.params, grads)]
# define functions to train a mini-batch, and to compute the
# accuracy in validation and test mini-batches.
i = T.lscalar() # mini-batch index
train_mb = theano.function(
[i], cost, updates=updates,
givens={
self.x:
training_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
self.y:
training_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
})
validate_mb_accuracy = theano.function(
[i], self.layers[-1].accuracy(self.y),
givens={
self.x:
validation_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
self.y:
validation_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
})
test_mb_accuracy = theano.function(
[i], self.layers[-1].accuracy(self.y),
givens={
self.x:
test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
self.y:
test_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
})
self.test_mb_predictions = theano.function(
[i], self.layers[-1].y_out,
givens={
self.x:
test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
})
# Do the actual training
best_validation_accuracy = 0.0
for epoch in xrange(epochs):
for minibatch_index in xrange(num_training_batches):
iteration = num_training_batches*epoch+minibatch_index
if iteration % 1000 == 0:
print("Training mini-batch number {0}".format(iteration))
cost_ij = train_mb(minibatch_index)
if (iteration+1) % num_training_batches == 0:
validation_accuracy = np.mean(
[validate_mb_accuracy(j) for j in xrange(num_validation_batches)])
print("Epoch {0}: validation accuracy {1:.2%}".format(
epoch, validation_accuracy))
if validation_accuracy >= best_validation_accuracy:
print("This is the best validation accuracy to date.")
best_validation_accuracy = validation_accuracy
best_iteration = iteration
if test_data:
test_accuracy = np.mean(
[test_mb_accuracy(j) for j in xrange(num_test_batches)])
print('The corresponding test accuracy is {0:.2%}'.format(
test_accuracy))
print("Finished training network.")
print("Best validation accuracy of {0:.2%} obtained at iteration {1}".format(
best_validation_accuracy, best_iteration))
print("Corresponding test accuracy of {0:.2%}".format(test_accuracy))
#### Define layer types
class ConvPoolLayer(object):
"""Used to create a combination of a convolutional and a max-pooling
layer. A more sophisticated implementation would separate the
two, but for our purposes we'll always use them together, and it
simplifies the code, so it makes sense to combine them.
"""
def __init__(self, filter_shape, image_shape, poolsize=(2, 2),
activation_fn=sigmoid):
"""`filter_shape` is a tuple of length 4, whose entries are the number
of filters, the number of input feature maps, the filter height, and the
filter width.
`image_shape` is a tuple of length 4, whose entries are the
mini-batch size, the number of input feature maps, the image
height, and the image width.
`poolsize` is a tuple of length 2, whose entries are the y and
x pooling sizes.
"""
self.filter_shape = filter_shape
self.image_shape = image_shape
self.poolsize = poolsize
self.activation_fn=activation_fn
# initialize weights and biases
n_out = (filter_shape[0]*np.prod(filter_shape[2:])/np.prod(poolsize))
self.w = theano.shared(
np.asarray(
np.random.normal(loc=0, scale=np.sqrt(1.0/n_out), size=filter_shape),
dtype=theano.config.floatX),
borrow=True)
self.b = theano.shared(
np.asarray(
np.random.normal(loc=0, scale=1.0, size=(filter_shape[0],)),
dtype=theano.config.floatX),
borrow=True)
self.params = [self.w, self.b]
def set_inpt(self, inpt, inpt_dropout, mini_batch_size):
self.inpt = inpt.reshape(self.image_shape)
conv_out = conv.conv2d(
input=self.inpt, filters=self.w, filter_shape=self.filter_shape,
image_shape=self.image_shape)
pooled_out = downsample.max_pool_2d(
input=conv_out, ds=self.poolsize, ignore_border=True)
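        # dimshuffle broadcasts the per-feature-map bias over the mini-batch and spatial dimensions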
self.output = self.activation_fn(
pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
self.output_dropout = self.output # no dropout in the convolutional layers
class FullyConnectedLayer(object):
def __init__(self, n_in, n_out, activation_fn=sigmoid, p_dropout=0.0):
self.n_in = n_in
self.n_out = n_out
self.activation_fn = activation_fn
self.p_dropout = p_dropout
# Initialize weights and biases
self.w = theano.shared(
np.asarray(
np.random.normal(
loc=0.0, scale=np.sqrt(1.0/n_out), size=(n_in, n_out)),
dtype=theano.config.floatX),
name='w', borrow=True)
self.b = theano.shared(
np.asarray(np.random.normal(loc=0.0, scale=1.0, size=(n_out,)),
dtype=theano.config.floatX),
name='b', borrow=True)
self.params = [self.w, self.b]
def set_inpt(self, inpt, inpt_dropout, mini_batch_size):
self.inpt = inpt.reshape((mini_batch_size, self.n_in))
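        # Scale by (1 - p_dropout) at evaluation time so expected activations match dropout training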
self.output = self.activation_fn(
(1-self.p_dropout)*T.dot(self.inpt, self.w) + self.b)
self.y_out = T.argmax(self.output, axis=1)
self.inpt_dropout = dropout_layer(
inpt_dropout.reshape((mini_batch_size, self.n_in)), self.p_dropout)
self.output_dropout = self.activation_fn(
T.dot(self.inpt_dropout, self.w) + self.b)
def accuracy(self, y):
"Return the accuracy for the mini-batch."
return T.mean(T.eq(y, self.y_out))
class SoftmaxLayer(object):
def __init__(self, n_in, n_out, p_dropout=0.0):
self.n_in = n_in
self.n_out = n_out
self.p_dropout = p_dropout
# Initialize weights and biases
self.w = theano.shared(
np.zeros((n_in, n_out), dtype=theano.config.floatX),
name='w', borrow=True)
self.b = theano.shared(
np.zeros((n_out,), dtype=theano.config.floatX),
name='b', borrow=True)
self.params = [self.w, self.b]
def set_inpt(self, inpt, inpt_dropout, mini_batch_size):
self.inpt = inpt.reshape((mini_batch_size, self.n_in))
self.output = softmax((1-self.p_dropout)*T.dot(self.inpt, self.w) + self.b)
self.y_out = T.argmax(self.output, axis=1)
self.inpt_dropout = dropout_layer(
inpt_dropout.reshape((mini_batch_size, self.n_in)), self.p_dropout)
self.output_dropout = softmax(T.dot(self.inpt_dropout, self.w) + self.b)
def cost(self, net):
"Return the log-likelihood cost."
return -T.mean(T.log(self.output_dropout)[T.arange(net.y.shape[0]), net.y])
def accuracy(self, y):
"Return the accuracy for the mini-batch."
return T.mean(T.eq(y, self.y_out))
#### Miscellanea
def size(data):
"Return the size of the dataset `data`."
return data[0].get_value(borrow=True).shape[0]
def dropout_layer(layer, p_dropout):
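    # Zero each activation independently with probability p_dropout via a binomial mask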
srng = shared_randomstreams.RandomStreams(
np.random.RandomState(0).randint(999999))
mask = srng.binomial(n=1, p=1-p_dropout, size=layer.shape)
return layer*T.cast(mask, theano.config.floatX)
Recent progress in image recognition
This part surveys some image-recognition research from the previous couple of years.
Other approaches to deep neural nets
This part briefly introduces RNNs, LSTMs, deep belief nets, generative models, and Boltzmann machines.
On the future of neural networks
This part looks ahead to the future of neural networks.
With that, after a little over a month I have finally finished the book. It felt a bit inefficient; probably I have been too comfortable lately.
While reading I focused mainly on the theory and only skimmed most of the code. Next I plan to collect the papers referenced in the book, and then look for material for the next stage of study.