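[Repost] [Code-Oriented] Learning Deep Learning (1): Neural Network

By walking through the DeepLearnToolbox code, this post covers how a neural network is set up, trained, and tested, and looks at how dropout and denoising autoencoders are applied.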


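I have been reading about Deep Learning lately and have gone through quite a few blog posts and papers.

To be honest, though, that has left me short on implementation practice: my computer is not very powerful, and I am not yet able to write a toolbox of my own.

So far I have only filled in code within an existing framework while following Andrew Ng's UFLDL tutorial (that code is on GitHub).

Later I came across a MATLAB toolbox for Deep Learning whose code is quite simple, which makes it well suited to learning the algorithms.

A MATLAB implementation also avoids a lot of data-structure code, so the flow of each algorithm stays very clear.

So I want to read through this toolbox's code to consolidate what I have learned and to lay the groundwork for the practical work that follows.

(This post only explains the algorithms from the code's point of view; for the theory and the detailed steps you still need to read the papers.

I will name some of the relevant papers along the way. The aim is to trace the flow of the algorithms, not to dig into the underlying principles and formulas.)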

==========================================================================================

Code used: DeepLearnToolbox (download link given in the original post). Thanks to the toolbox's author.

==========================================================================================

This first chapter starts with the NN (neural network), since it is the overall framework for the rest of deep learning; see UFLDL.

==========================================================================================

Start with \tests\test_example_NN.m. Skipping the part that normalizes the data, the key lines are:

(So that the comments show up in color, I changed every % in the MATLAB code to //.)

 

nn = nnsetup([784 100 10]);
opts.numepochs =  1;   //  Number of full sweeps through data
opts.batchsize = 100;  //  Take a mean gradient step over this many samples
[nn, L] = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);

 

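In just a few steps we have trained an NN. The most important functions here are nnsetup, nntrain, and nntest.

 

Let us go through these functions one by one, starting with \NN\nnsetup.m.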

nnsetup

 

function nn = nnsetup(architecture)
//First read the overall network structure from the architecture argument; nn.n is the number of layers (compare the sample call nnsetup([784 100 10]) above)

    nn.size   = architecture;
    nn.n      = numel(nn.size);
    //Next comes a long list of parameters; each one is explained where it is actually used
    nn.activation_function              = 'tanh_opt';   //  Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
    nn.learningRate                     = 2;            //  learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
    nn.momentum                         = 0.5;          //  Momentum
    nn.scaling_learningRate             = 1;            //  Scaling factor for the learning rate (each epoch)
    nn.weightPenaltyL2                  = 0;            //  L2 regularization
    nn.nonSparsityPenalty               = 0;            //  Non sparsity penalty
    nn.sparsityTarget                   = 0.05;         //  Sparsity target
    nn.inputZeroMaskedFraction          = 0;            //  Used for Denoising AutoEncoders
    nn.dropoutFraction                  = 0;            //  Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
    nn.testing                          = 0;            //  Internal variable. nntest sets this to one.
    nn.output                           = 'sigm';       //  output unit 'sigm' (=logistic), 'softmax' and 'linear'
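    //Initialize each layer's structure with three fields: W, vW, and p; W holds the weights, the main parameters
    //vW is the momentum buffer used when updating the weights, and p is the running mean activation used for sparsity (explained where the code uses it)
    for i = 2 : nn.n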
        // weights and weight momentum
        nn.W{i - 1} = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
        nn.vW{i - 1} = zeros(size(nn.W{i - 1}));
        
        // average activations (for use with sparsity)
        nn.p{i}     = zeros(1, nn.size(i));   
    end
end
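
To make the initialization above concrete, here is a minimal sketch in Python (the toolbox itself is MATLAB; the function name init_weights, the seed argument, and the nested-list layout are assumptions made for illustration). It mirrors the expression in nnsetup: each matrix has one row per unit in the next layer and one extra column for the bias, with entries drawn uniformly from [-r, r) where r = 4 * sqrt(6 / (fan_in + fan_out)).

```python
import math
import random

def init_weights(sizes, seed=0):
    """Return one weight matrix per layer transition, as nested lists.

    Matrix i has shape (sizes[i + 1], sizes[i] + 1); the extra column is the bias.
    """
    rng = random.Random(seed)
    weights = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # Half-width of the uniform range, as in 4 * sqrt(6 / (fan_in + fan_out)).
        r = 4.0 * math.sqrt(6.0 / (fan_in + fan_out))
        # (u - 0.5) * 2 maps u in [0, 1) to [-1, 1); scaling by r gives [-r, r).
        W = [[(rng.random() - 0.5) * 2.0 * r for _ in range(fan_in + 1)]
             for _ in range(fan_out)]
        weights.append(W)
    return weights

weights = init_weights([784, 100, 10])
print(len(weights), len(weights[0]), len(weights[0][0]))  # prints: 2 100 785
```

For the [784 100 10] network this gives a 100 x 785 matrix and a 10 x 101 matrix, which is why nn.W has nn.n - 1 entries.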

nntrain

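That is roughly all there is to setup. Next comes training: open \NN\nntrain.m.

We skip the code that validates the input arguments and go straight to the key part.

For the denoising part, see the paper: Extracting and Composing Robust Features with Denoising Autoencoders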

 

m = size(train_x, 1);
//m is the number of training samples
//opts was set by the caller; batchsize is the number of samples in each mini-batch gradient step
batchsize = opts.batchsize; numepochs = opts.numepochs;
numbatches = m / batchsize;  //number of mini-batches per epoch
assert(rem(numbatches, 1) == 0, 'numbatches must be a integer');
L = zeros(numepochs*numbatches,1);
n = 1;
//numepochs is the number of full passes over the training set
for i = 1 : numepochs
    tic;
    kk = randperm(m);
    //Shuffle the samples so that each epoch draws different batches; randperm(m) returns a random permutation of 1..m
    for l = 1 : numbatches
        batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);
        //Add noise to input (for use in denoising autoencoder)
        //See the paper "Extracting and Composing Robust Features with Denoising Autoencoders"
        //The noise sets a random subset of the input entries to 0; inputZeroMaskedFraction is the fraction that is zeroed
        if(nn.inputZeroMaskedFraction ~= 0)
            batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
        end
        batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);
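        //The next three calls do the real work:
        //nnff runs the forward pass, nnbp runs backpropagation, and nnapplygrads applies the gradient-descent update
        //Each of them is analyzed below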
        nn = nnff(nn, batch_x, batch_y);
        nn = nnbp(nn);
        nn = nnapplygrads(nn);
        L(n) = nn.L;
        n = n + 1;
    end
    
    t = toc;
    if ishandle(fhandle)
        if opts.validation == 1
            loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
        else
            loss = nneval(nn, loss, train_x, train_y);
        end
        nnupdatefigures(nn, fhandle, loss, opts, i);
    end
        
    disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1))))]);
    nn.learningRate = nn.learningRate * nn.scaling_learningRate;
end
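
The batching and masking-noise steps in the loop above can be sketched as follows. This is an illustrative Python sketch, not the toolbox's code: the names minibatches and mask_inputs, and the use of index lists instead of matrix slicing, are assumptions. It shuffles the sample indices once per epoch (the role of randperm), cuts them into equal batches, and zeroes each input entry independently with probability zero_fraction, which is what the comparison rand(...) > inputZeroMaskedFraction does.

```python
import random

def minibatches(num_samples, batch_size, rng):
    """Yield lists of sample indices that together cover one shuffled epoch."""
    if num_samples % batch_size != 0:
        raise ValueError("num_samples must be a multiple of batch_size")
    order = list(range(num_samples))
    rng.shuffle(order)  # plays the role of randperm(m)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

def mask_inputs(batch, zero_fraction, rng):
    """Zero each entry independently with probability zero_fraction."""
    return [[x if rng.random() > zero_fraction else 0.0 for x in row]
            for row in batch]

rng = random.Random(0)
data = [[float(i), float(i) + 0.5] for i in range(6)]
for idx in minibatches(len(data), 2, rng):
    batch = [data[i] for i in idx]
    noisy = mask_inputs(batch, 0.5, rng)
    print(idx, noisy)
```

Note that only the inputs are corrupted; in the nntrain loop batch_y is taken from train_y unchanged, so a denoising autoencoder is still asked to reconstruct the clean data.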

Now let us analyze the three functions nnff, nnbp, and nnapplygrads.

 

nnff

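nnff performs the feedforward pass. It is very simple: the whole network is just run forward once.

It also contains the dropout and sparsity computations.

For details, see the paper "Improving Neural Networks with Dropout" and "Autoencoders and Sparsity".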

function nn = nnff(nn, x, y)
//NNFF performs a feedforward pass
// nn = nnff(nn, x, y) returns an neural network structure with updated
// layer activations, error and loss (nn.a, nn.e and nn.L)

    n = nn.n;
    m = size(x, 1);
    
    x = [ones(m,1) x];
    nn.a{1} = x;

    //feedforward pass
    for i = 2 : n-1
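        //Forward propagation depends on the chosen activation function
        //(see activation_function, the first parameter set in nnsetup)
        //sigm is the sigmoid function; tanh_opt is a scaled variant of tanh used by this toolbox:
        //tanh_opt(A) = 1.7159*tanh(2/3.*A)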
        switch nn.activation_function 
            case 'sigm'
                // Calculate the unit's outputs (including the bias term)
                nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
            case 'tanh_opt'
                nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
        end
        
        //Dropout: dropoutFraction is a parameter that can be set after nnsetup
        if(nn.dropoutFraction > 0)
            if(nn.testing)
                nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
            else
                nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
                nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
            end
        end
        //Sparsity: nonSparsityPenalty is the penalty coefficient applied when the mean activations miss sparsityTarget
        //calculate running exponential activations for use with sparsity
        if(nn.nonSparsityPenalty>0)
            nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);
        end
        
        //Add the bias term
        nn.a{i} = [ones(m,1) nn.a{i}];
    end
    switch nn.output 
        case 'sigm'
            nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
        case 'linear'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
        case 'softmax'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
            nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));
            nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2)); 
    end
    //error and loss: compute the output error first
    nn.e = y - nn.a{n};
    
    switch nn.output
        case {'sigm', 'linear'}
            nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m; 
        case 'softmax'
            nn.L = -sum(sum(y .* log(nn.a{n}))) / m;
    end
end
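
Two steps in nnff are easy to misread, so here is a small Python sketch of each (function names and the list-based representation are assumptions for illustration, not the toolbox's API). dropout_forward follows the branch structure above: during training a unit is kept when its random draw exceeds the drop fraction, and at test time every unit is kept but scaled by (1 - drop fraction). softmax shows why the code subtracts the row maximum before exponentiating: the result is unchanged mathematically, but exp can no longer overflow.

```python
import math
import random

def dropout_forward(activations, drop_fraction, testing, rng):
    """Return (output, mask); mask is None when no random mask was drawn."""
    if drop_fraction <= 0:
        return list(activations), None
    if testing:
        # Test time: keep every unit and scale by the keep probability.
        keep = 1.0 - drop_fraction
        return [a * keep for a in activations], None
    # Training: keep a unit when its random draw exceeds drop_fraction.
    mask = [1.0 if rng.random() > drop_fraction else 0.0 for _ in activations]
    return [a * m for a, m in zip(activations, mask)], mask

def softmax(row):
    """Numerically stable softmax of one row of pre-activations."""
    shift = max(row)  # subtracting the max leaves the ratios unchanged
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
print(dropout_forward([1.0, 2.0, 4.0], 0.5, True, rng)[0])  # prints: [0.5, 1.0, 2.0]
print(softmax([1000.0, 1000.0]))  # prints: [0.5, 0.5]
```

The training-time mask is returned because backpropagation needs the same mask again, which is what nn.dropOutMask{i} stores.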

 

nnbp

 

Code: \NN\nnbp.m

nnbp performs back propagation. The procedure is fairly standard and essentially matches the Neural Network section of UFLDL.

The parts worth noting are, again, dropout and sparsity.

        if(nn.nonSparsityPenalty>0)
            pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
            sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];
        end
        
        // Backpropagate first derivatives
        if i+1==n // in this case in d{n} there is not the bias term to be removed
            d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; // Bishop (5.56)
        else // in this case in d{i} the bias term has to be removed
            d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;
        end
        
        if(nn.dropoutFraction>0)
            d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
        end

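That covers the implementation. In the nnbp code, d{i} is the delta of layer i, as explained in UFLDL.

 

dW{i} is essentially the computed gradient, although a few terms are added to it afterwards and it is adjusted before being applied.

For the underlying theory, see the paper "Improving Neural Networks with Dropout" and "Autoencoders and Sparsity".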
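
The sparsity term in nnbp is the derivative of a KL-divergence penalty with respect to the mean activation of each hidden unit. A minimal Python sketch of the two pieces involved, under the assumption that p holds one running mean per unit and lies strictly between 0 and 1 (the function names are hypothetical):

```python
def update_mean_activation(p, batch_mean, decay=0.99):
    """Exponential running mean, as in 0.99 * p + 0.01 * mean(a, 1)."""
    return [decay * pj + (1.0 - decay) * mj for pj, mj in zip(p, batch_mean)]

def sparsity_delta(p, target, penalty):
    """penalty * d/dp KL(target || p) for each hidden unit.

    Assumes 0 < p_j < 1; the term is zero when p_j equals the target,
    positive when the unit is too active and negative when it is too quiet.
    """
    return [penalty * (-target / pj + (1.0 - target) / (1.0 - pj)) for pj in p]

p = update_mean_activation([0.05, 0.05], [0.9, 0.0])
print(p)
print(sparsity_delta(p, target=0.05, penalty=1.0))
```

In the toolbox code this value is added to the backpropagated error before it is multiplied by the activation derivative, with a zero column in front so that the bias unit is left alone.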

nnapplygrads

Code file: \NN\nnapplygrads.m

    for i = 1 : (nn.n - 1)
        if(nn.weightPenaltyL2>0)
            dW = nn.dW{i} + nn.weightPenaltyL2 * nn.W{i};
        else
            dW = nn.dW{i};
        end
        
        dW = nn.learningRate * dW;
        
        if(nn.momentum>0)
            nn.vW{i} = nn.momentum*nn.vW{i} + dW;
            dW = nn.vW{i};
        end
            
        nn.W{i} = nn.W{i} - dW;
    end

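This part is simple. nn.weightPenaltyL2 is the weight decay term, another parameter that can be set after nnsetup.

 

If it is set, the weight penalty is added to the gradient to guard against overfitting; the step is then adjusted according to the momentum, and finally nn.W{i} is updated.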
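
The update order described above (weight penalty, learning rate, momentum, then the weight change) can be traced on a single scalar weight. This is a Python sketch under the assumption of one weight and one velocity value; apply_gradient and its parameter names are hypothetical, chosen to mirror the fields used in nnapplygrads.

```python
def apply_gradient(w, grad, velocity, learning_rate, momentum=0.0, l2=0.0):
    """Return (new_w, new_velocity) for a single scalar weight."""
    # Optional L2 weight decay: add l2 * w to the raw gradient.
    step = grad + l2 * w if l2 > 0 else grad
    step = learning_rate * step
    if momentum > 0:
        # The velocity accumulates a decaying sum of past steps.
        velocity = momentum * velocity + step
        step = velocity
    return w - step, velocity

w, v = 1.0, 0.0
w, v = apply_gradient(w, 0.5, v, learning_rate=2.0, momentum=0.5)
print(w, v)  # prints: 0.0 1.0
w, v = apply_gradient(w, 0.5, v, learning_rate=2.0, momentum=0.5)
print(w, v)  # prints: -1.5 1.5
```

With the same gradient on both calls, the second step is larger (1.5 instead of 1.0) because half of the previous step is carried over; that carry-over is what nn.vW{i} stores per layer.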

nntest

nntest could hardly be simpler: it calls nnpredict and then compares the result against the labels of the test set.

function [er, bad] = nntest(nn, x, y)
    labels = nnpredict(nn, x);
    [~, expected] = max(y,[],2);
    bad = find(labels ~= expected);    
    er = numel(bad) / size(x, 1);
end

 

nnpredict

 

Code file: \NN\nnpredict.m

function labels = nnpredict(nn, x)
    nn.testing = 1;
    nn = nnff(nn, x, zeros(size(x,1), nn.size(end)));
    nn.testing = 0;
    
    [~, i] = max(nn.a{end},[],2);
    labels = i;
end

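Again very simple: predict just runs nnff once and takes the final output.

 

max(nn.a{end},[],2) returns the maximum of each row together with its column index, so labels holds the predicted class indices.

(This test function seems to be written specifically for classification problems; for other tasks it is enough to take the final output of nnff.)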
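
The row-wise argmax and the error count in nntest and nnpredict can be sketched in Python as below. The names argmax and error_rate are assumptions for illustration, and indices are 0-based here whereas MATLAB's are 1-based; the targets are one-hot rows, as in the toolbox's MNIST example.

```python
def argmax(row):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(row)), key=lambda j: row[j])

def error_rate(outputs, one_hot_targets):
    """Return (fraction misclassified, indices of the misclassified rows)."""
    predicted = [argmax(r) for r in outputs]
    expected = [argmax(r) for r in one_hot_targets]
    bad = [i for i, (p, e) in enumerate(zip(predicted, expected)) if p != e]
    return len(bad) / len(outputs), bad

outputs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
targets = [[0, 1], [0, 1], [0, 1]]
er, bad = error_rate(outputs, targets)
print(bad)  # prints: [1]
```

Only the second row is misclassified, so the error rate is one third; the returned index list corresponds to the bad output of nntest.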

 

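Summary

 
   Overall, the neural network code is fairly conventional and easy to follow, and it differs little from the material in UFLDL.
   The only additions are the dropout and denoising parts.
   This post does not expect to explain those topics fully; it only offers a path for studying along with the code, to deepen your understanding of the algorithms and your ability to apply them.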

Reposted from: https://www.cnblogs.com/daleloogn/p/4162459.html
