Deep learn toolbox:CNN BP求导解析

本文深入探讨了《Notes on Convolutional Neural Networks》中关于卷积神经网络(CNN)的BP过程,并结合deeplearn toolbox的源码进行解析。重点解释了卷积层与池化层之间的误差传递机制,包括不同池化方法如均值池化和最大池化在反向传播过程中的应用。同时,详细介绍了CNN BP算法在deeplearn toolbox中的具体实现,包括误差敏感项的计算与梯度的计算过程。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >


《Notes on Convolutional Neural Networks》中详细讲解了CNN的BP过程,下面结合Deep learn toolbox中CNN的BP源码对此做一些解析

 卷积层:

       卷积层的前向传导:

                     

       误差反传:

        当卷基层的下一层是pooling层时,如果pooling层的误差敏感项为时,那么卷基层的误差敏感项为:

      

       其中,upsample表示对进行上采样,表示激活函数对输入的导数,代表点积操作

       upsample根据pooling时采用方法来定,大概思想为:pooling层的每个节点是由卷积层中多个节点(一般为一个矩形区域)共同计算得到,所以pooling层每个节点的误差敏感值也是由卷积层中多个节点的误差敏感值共同产生的,只需满足两层的误差敏感值总和相等即可,下面以mean-pooling和max-pooling为例来说明。

假设卷积层的矩形大小为4×4, pooling区域大小为2×2, 很容易知道pooling后得到的矩形大小也为2*2(本文默认pooling过程是没有重叠的,卷积过程是每次移动一个像素,即是有重叠的,后续不再声明),如果此时pooling后的矩形误差敏感值如下:

   

  则按照mean-pooling,首先得到的卷积层应该是4×4大小,其值分布为(等值复制):

   

  因为得满足反向传播时各层见误差敏感总和不变,所以卷积层对应每个值需要平摊(除以pooling区域大小即可,这里pooling层大小为2×2=4)),最后的卷积层值

分布为:

   

  如果是max-pooling,则需要记录前向传播过程中pooling区域中最大值的位置,这里假设pooling层值1,3,2,4对应的pooling区域位置分别为右下、右上、左上、左下。则此时对应卷积层误差敏感值分布为:

   

  求得后,我们就要求卷积层的导数了,论文中给出的公式为:

      

    上式中表示的(u,v)项的值,表示进行卷积的结果的(u,v)项所对应的的patch。

     上述公式和下面的公式是等价的:

    损失函数对b的导数为:

         

 deep learn toolbox就是按照上述2个公式计算的。

                                                        


 pooling层

      pooling层的前向传导:

                                       

    down(.)表示一个下采样函数。典型的操作一般是对输入图像的不同nxn的块的所有像素进行求和。这样输出图像在两个维度上都缩小了n倍。每个输出map都对应一个属于自己的乘性偏置β和一个加性偏置b

    已知卷积层的误差敏感项时,那么pooling层的误差敏感项为:

                                                   

    得到误差敏感项后,由于pooling只有两个参数,分别求导:

                                                    

                                                    

                                                   

   但在deep learn toolbox中,只是简单地进行subsampling,并没有加sigmoid激活函数,因而pooling层没有参数,不需要对pooling层求导,也不

要对其参数进行更新。

   

   下面是deep learn toolbox 中CNN BP算法的代码:

   

function net = cnnbp(net, y)  
    n = numel(net.layers);  
    //  error  
    net.e = net.o - y;  
    //  loss function  
    net.L = 1/2* sum(net.e(:) .^ 2) / size(net.e, 2);  
    //从最后一层的error倒推回来deltas  
    //和神经网络的bp有些类似  
    ////  backprop deltas  
    net.od = net.e .* (net.o .* (1 - net.o));   //  output delta  
    net.fvd = (net.ffW' * net.od);              //  feature vector delta  
    if strcmp(net.layers{n}.type, 'c')         //  only conv layers has sigm function  
        net.fvd = net.fvd .* (net.fv .* (1 - net.fv));  
    end  
    //和神经网络类似,参看神经网络的bp  
  
    //  reshape feature vector deltas into output map style  
    sa = size(net.layers{n}.a{1});  
    fvnum = sa(1) * sa(2);  
    for j = 1 : numel(net.layers{n}.a)  
        net.layers{n}.d{j} = reshape(net.fvd(((j - 1) * fvnum + 1) : j * fvnum, :), sa(1), sa(2), sa(3));  
    end  
    //这是算delta的步骤  
    //这部分的计算参看Notes on Convolutional Neural Networks,其中的变化有些复杂  
    //和这篇文章里稍微有些不一样的是这个toolbox在subsampling(也就是pooling层)没有加sigmoid激活函数  
    //所以这地方还需仔细辨别  
    //这这个toolbox里的subsampling是不用计算gradient的,而在上面那篇note里是计算了的  
    for l = (n - 1) : -1 : 1  
        if strcmp(net.layers{l}.type, 'c')  
            for j = 1 : numel(net.layers{l}.a)  
                net.layers{l}.d{j} = net.layers{l}.a{j} .* (1 - net.layers{l}.a{j}) .* (expand(net.layers{l + 1}.d{j}, [net.layers{l + 1}.scale net.layers{l + 1}.scale 1]) / net.layers{l + 1}.scale ^ 2);  
            end  
        elseif strcmp(net.layers{l}.type, 's')  
            for i = 1 : numel(net.layers{l}.a)  
                z = zeros(size(net.layers{l}.a{1}));  
                for j = 1 : numel(net.layers{l + 1}.a)  
                     z = z + convn(net.layers{l + 1}.d{j}, rot180(net.layers{l + 1}.k{i}{j}), 'full');  
                end  
                net.layers{l}.d{i} = z;  
            end  
        end  
    end  
    //参见paper,注意这里只计算了'c'层的gradient,因为只有这层有参数  
    ////  calc gradients  
    for l = 2 : n  
        if strcmp(net.layers{l}.type, 'c')  
            for j = 1 : numel(net.layers{l}.a)  
                for i = 1 : numel(net.layers{l - 1}.a)  
                    net.layers{l}.dk{i}{j} = convn(flipall(net.layers{l - 1}.a{i}), net.layers{l}.d{j}, 'valid') / size(net.layers{l}.d{j}, 3);  
                end  
                net.layers{l}.db{j} = sum(net.layers{l}.d{j}(:)) / size(net.layers{l}.d{j}, 3);  
            end  
        end  
    end  
    //最后一层perceptron的gradient的计算  
    net.dffW = net.od * (net.fv)' / size(net.od, 2);  
    net.dffb = mean(net.od, 2);  
  
    function X = rot180(X)  
        X = flipdim(flipdim(X, 1), 2);  
    end  
end  


     

    


深度学习工具包 Deprecation notice. ----- This toolbox is outdated and no longer maintained. There are much better tools available for deep learning than this toolbox, e.g. [Theano](http://deeplearning.net/software/theano/), [torch](http://torch.ch/) or [tensorflow](http://www.tensorflow.org/) I would suggest you use one of the tools mentioned above rather than use this toolbox. Best, Rasmus. DeepLearnToolbox ================ A Matlab toolbox for Deep Learning. Deep Learning is a new subfield of machine learning that focuses on learning deep hierarchical models of data. It is inspired by the human brain's apparent deep (layered, hierarchical) architecture. A good overview of the theory of Deep Learning theory is [Learning Deep Architectures for AI](http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf) For a more informal introduction, see the following videos by Geoffrey Hinton and Andrew Ng. * [The Next Generation of Neural Networks](http://www.youtube.com/watch?v=AyzOUbkUf3M) (Hinton, 2007) * [Recent Developments in Deep Learning](http://www.youtube.com/watch?v=VdIURAu1-aU) (Hinton, 2010) * [Unsupervised Feature Learning and Deep Learning](http://www.youtube.com/watch?v=ZmNOAtZIgIk) (Ng, 2011) If you use this toolbox in your research please cite [Prediction as a candidate for learning deep hierarchical models of data](http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6284) ``` @MASTERSTHESIS\{IMM2012-06284, author = "R. B. Palm", title = "Prediction as a candidate for learning deep hierarchical models of data", year = "2012", } ``` Contact: rasmusbergpalm at gmail dot com Directories included in the toolbox ----------------------------------- `NN/` - A library for Feedforward Backpropagation Neural Networks `CNN/` - A library for Convolutional Neural Networks `DBN/` - A library for Deep Belief Networks `SAE/` - A library for Stacked Auto-Encoders `CAE/` - A library for Convolutional Auto-Encoders `util/` - Utility functions used by the libraries `data/` - Data used by the examples `tests/` - unit tests to verify toolbox is working For references on each library check REFS.md Setup ----- 1. Download. 2. addpath(genpath('DeepLearnToolbox')); Example: Deep Belief Network --------------------- ```matlab function test_example_DBN load mnist_uint8; train_x = double(train_x) / 255; test_x = double(test_x) / 255; train_y = double(train_y); test_y = double(test_y); %% ex1 train a 100 hidden unit RBM and visualize its weights rand('state',0) dbn.sizes = [100]; opts.numepochs = 1; opts.batchsize = 100; opts.momentum = 0; opts.alpha = 1; dbn = dbnsetup(dbn, train_x, opts); dbn = dbntrain(dbn, train_x, opts); figure; visualize(dbn.rbm{1}.W'); % Visualize the RBM weights %% ex2 train a 100-100 hidden unit DBN and use its weights to initialize a NN rand('state',0) %train dbn dbn.sizes = [100 100]; opts.numepochs = 1; opts.batchsize = 100; opts.momentum = 0; opts.alpha = 1; dbn = dbnsetup(dbn, train_x, opts); dbn = dbntrain(dbn, train_x, opts); %unfold dbn to nn nn = dbnunfoldtonn(dbn, 10); nn.activation_function = 'sigm'; %train nn opts.numepochs = 1; opts.batchsize = 100; nn = nntrain(nn, train_x, train_y, opts); [er, bad] = nntest(nn, test_x, test_y); assert(er < 0.10, 'Too big error'); ``` Example: Stacked Auto-Encoders --------------------- ```matlab function test_example_SAE load mnist_uint8; train_x = double(train_x)/255; test_x = double(test_x)/255; train_y = double(train_y); test_y = double(test_y); %% ex1 train a 100 hidden unit SDAE and use it to initialize a FFNN % Setup and train a stacked denoising autoencoder (SDAE) rand('state',0) sae = saesetup([784 100]); sae.ae{1}.activation_function = 'sigm'; sae.ae{1}.learningRate = 1; sae.ae{1}.inputZeroMaskedFraction = 0.5; opts.numepochs = 1; opts.batchsize = 100; sae = saetrain(sae, train_x, opts); visualize(sae.ae{1}.W{1}(:,2:end)') % Use the SDAE to initialize a FFNN nn = nnsetup([784 100 10]); nn.activation_function = 'sigm'; nn.learningRate = 1; nn.W{1} = sae.ae{1}.W{1}; % Train the FFNN opts.numepochs = 1; opts.batchsize = 100; nn = nntrain(nn, train_x, train_y, opts); [er, bad] = nntest(nn, test_x, test_y); assert(er < 0.16, 'Too big error'); ``` Example: Convolutional Neural Nets --------------------- ```matlab function test_example_CNN load mnist_uint8; train_x = double(reshape(train_x',28,28,60000))/255; test_x = double(reshape(test_x',28,28,10000))/255; train_y = double(train_y'); test_y = double(test_y'); %% ex1 Train a 6c-2s-12c-2s Convolutional neural network %will run 1 epoch in about 200 second and get around 11% error. %With 100 epochs you'll get around 1.2% error rand('state',0) cnn.layers = { struct('type', 'i') %input layer struct('type', 'c', 'outputmaps', 6, 'kernelsize', 5) %convolution layer struct('type', 's', 'scale', 2) %sub sampling layer struct('type', 'c', 'outputmaps', 12, 'kernelsize', 5) %convolution layer struct('type', 's', 'scale', 2) %subsampling layer }; cnn = cnnsetup(cnn, train_x, train_y); opts.alpha = 1; opts.batchsize = 50; opts.numepochs = 1; cnn = cnntrain(cnn, train_x, train_y, opts); [er, bad] = cnntest(cnn, test_x, test_y); %plot mean squared error figure; plot(cnn.rL); assert(er<0.12, 'Too big error'); ``` Example: Neural Networks --------------------- ```matlab function test_example_NN load mnist_uint8; train_x = double(train_x) / 255; test_x = double(test_x) / 255; train_y = double(train_y); test_y = double(test_y); % normalize [train_x, mu, sigma] = zscore(train_x); test_x = normalize(test_x, mu, sigma); %% ex1 vanilla neural net rand('state',0) nn = nnsetup([784 100 10]); opts.numepochs = 1; % Number of full sweeps through data opts.batchsize = 100; % Take a mean gradient step over this many samples [nn, L] = nntrain(nn, train_x, train_y, opts); [er, bad] = nntest(nn, test_x, test_y); assert(er < 0.08, 'Too big error'); %% ex2 neural net with L2 weight decay rand('state',0) nn = nnsetup([784 100 10]); nn.weightPenaltyL2 = 1e-4; % L2 weight decay opts.numepochs = 1; % Number of full sweeps through data opts.batchsize = 100; % Take a mean gradient step over this many samples nn = nntrain(nn, train_x, train_y, opts); [er, bad] = nntest(nn, test_x, test_y); assert(er < 0.1, 'Too big error'); %% ex3 neural net with dropout rand('state',0) nn = nnsetup([784 100 10]); nn.dropoutFraction = 0.5; % Dropout fraction opts.numepochs = 1; % Number of full sweeps through data opts.batchsize = 100; % Take a mean gradient step over this many samples nn = nntrain(nn, train_x, train_y, opts); [er, bad] = nntest(nn, test_x, test_y); assert(er < 0.1, 'Too big error'); %% ex4 neural net with sigmoid activation function rand('state',0) nn = nnsetup([784 100 10]); nn.activation_function = 'sigm'; % Sigmoid activation function nn.learningRate = 1; % Sigm require a lower learning rate opts.numepochs = 1; % Number of full sweeps through data opts.batchsize = 100; % Take a mean gradient step over this many samples nn = nntrain(nn, train_x, train_y, opts); [er, bad] = nntest(nn, test_x, test_y); assert(er < 0.1, 'Too big error'); %% ex5 plotting functionality rand('state',0) nn = nnsetup([784 20 10]); opts.numepochs = 5; % Number of full sweeps through data nn.output = 'softmax'; % use softmax output opts.batchsize = 1000; % Take a mean gradient step over this many samples opts.plot = 1; % enable plotting nn = nntrain(nn, train_x, train_y, opts); [er, bad] = nntest(nn, test_x, test_y); assert(er < 0.1, 'Too big error'); %% ex6 neural net with sigmoid activation and plotting of validation and training error % split training data into training and validation data vx = train_x(1:10000,:); tx = train_x(10001:end,:); vy = train_y(1:10000,:); ty = train_y(10001:end,:); rand('state',0) nn = nnsetup([784 20 10]); nn.output = 'softmax'; % use softmax output opts.numepochs = 5; % Number of full sweeps through data opts.batchsize = 1000; % Take a mean gradient step over this many samples opts.plot = 1; % enable plotting nn = nntrain(nn, tx, ty, opts, vx, vy); % nntrain takes validation set as last two arguments (optionally) [er, bad] = nntest(nn, test_x, test_y); assert(er < 0.1, 'Too big error'); ``` [![Bitdeli Badge](https://d2weczhvl823v0.cloudfront.net/rasmusbergpalm/deeplearntoolbox/trend.png)](https://bitdeli.com/free "Bitdeli Badge")
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值