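Deep Learning Tools

Those who come after will look on us as we now look on the past.

Being in the middle of a boom means you get to watch things change firsthand and experience what "changing in the blink of an eye" really means. Back then we started from object recognition, relied on whatever material was available online, went through round after round of installing, hitting pitfalls, and reinstalling, and split up to try Caffe, MXNet, TensorFlow, Torch, CNTK, and Darknet.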

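  1. CNTK had too few users and was essentially dropped.
  2. Darknet was tied exclusively to YOLO and hard to extend, so we never looked at it closely.
  3. Torch at that time only supported Lua. I bought a Lua book and worked through it, but the ecosystem remained a big problem. Under pressure from TensorFlow, PyTorch was eventually released, and it now looks set to overtake TensorFlow.
  4. TensorFlow felt more and more bloated, and so did building it with Bazel...
  5. MXNet, from Mu Li, Tianqi Chen, and others, was also a strong contender at the time, but without a big company backing it the ecosystem struggled to take off. Mu Li himself now has a hands-on, step-by-step tutorial series (《手把手教你》).
  6. Caffe really is an heirloom tool: released early, famous early, with a broad user base, and excellent material for studying by debugging the source. For a while it was crushed by TensorFlow and its Python interface, but after Yangqing Jia joined Facebook and released Caffe2, pairing it with PyTorch gave an end-to-end training-plus-deployment workflow that was remarkably friendly.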

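Caffe Overview

The advantage of an heirloom tool is that those who came before have already chewed it up and digested it for you, which greatly improves a beginner's learning efficiency. Overall, it looks roughly like this:
[Figure: overview of Caffe (image not available)]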

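Deriving Backpropagation in Caffe

Given the work plan and my own interests, I focused on debugging the backpropagation part of the source code.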
Take the softmax function as an example. Suppose the data $\mathbf{x}$ has label $y$, and the probability that the observed data $\mathbf{x}$ belongs to class $i$ is $o_i$. The softmax function is $\boldsymbol{\sigma}(\mathbf{z}) = (\sigma_1(\mathbf{z}), \ldots, \sigma_m(\mathbf{z}))$, with
$$o_i = \sigma_i(\mathbf{z}) = \frac{\exp(z_i)}{\sum_{j=1}^{m} \exp(z_j)}, \quad i = 1, \ldots, m$$
The driving source of backpropagation: Multinomial Logistic Loss
$$l(y, o) = -\log(o_y)$$
$$\frac{\partial l(y, o)}{\partial o_i} = -\frac{\delta_{iy}}{o_y}$$
$$\delta_{ky} = \begin{cases} 1 & k = y \\ 0 & k \neq y \end{cases}$$
Derivative of softmax
$$\frac{\partial o_i}{\partial z_k} = \frac{\delta_{ik} e^{z_i} \left(\sum_{j=1}^{m} e^{z_j}\right) - e^{z_i} e^{z_k}}{\left(\sum_{j=1}^{m} e^{z_j}\right)^2} = \delta_{ik} o_k - o_i o_k$$
Applying the chain rule gives the derivative of SoftmaxWithLoss
$$\sum_{i=1}^{m} \frac{\partial o_i}{\partial z_k} \cdot \frac{\partial l(y, o)}{\partial o_i} = o_k - \delta_{yk} \frac{o_k}{o_y} = o_k - \delta_{yk}$$
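To convince myself of the result $o_k - \delta_{yk}$, a small standalone check helps. The following is only a sketch, not Caffe code: the function names are made up for illustration, and the max-subtraction inside `softmax` is my own addition for safety. It compares the analytic gradient with a central finite difference of $-\log(o_y)$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax with the usual max-subtraction (mathematically the same result).
std::vector<double> softmax(const std::vector<double>& z) {
  assert(!z.empty());
  const double z_max = *std::max_element(z.begin(), z.end());
  std::vector<double> o(z.size());
  double sum = 0.0;
  for (std::size_t j = 0; j < z.size(); ++j) {
    o[j] = std::exp(z[j] - z_max);
    sum += o[j];
  }
  for (std::size_t j = 0; j < z.size(); ++j) o[j] /= sum;
  return o;
}

// Multinomial logistic loss l(y, o) = -log(o_y).
double softmax_loss(const std::vector<double>& z, std::size_t y) {
  return -std::log(softmax(z)[y]);
}

// Analytic gradient from the derivation: dl/dz_k = o_k - delta_{yk}.
std::vector<double> softmax_loss_grad(const std::vector<double>& z,
                                      std::size_t y) {
  std::vector<double> g = softmax(z);
  g[y] -= 1.0;
  return g;
}

// Central finite difference of the loss with respect to z_k.
double numeric_grad(std::vector<double> z, std::size_t y, std::size_t k,
                    double eps = 1e-6) {
  z[k] += eps;
  const double loss_plus = softmax_loss(z, y);
  z[k] -= 2.0 * eps;
  const double loss_minus = softmax_loss(z, y);
  return (loss_plus - loss_minus) / (2.0 * eps);
}
```

Calling `softmax_loss_grad(z, y)` and `numeric_grad(z, y, k)` for each `k` should give values that agree up to finite-difference error, and the analytic gradient should sum to zero because the $o_k$ sum to one.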
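Numerical Stability:
If the computation is split into two layers, it not only costs more operations but is also less numerically stable. Floating-point numbers have limited precision, so every extra operation accumulates some error. In addition, the two-step computation has to evaluate $\frac{\delta_{iy}}{o_y}$; when the prediction is badly wrong and the probability assigned to the correct class is very small, this division can overflow.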
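The difference can be reproduced in a few lines. This is a sketch under my own assumptions (single-precision IEEE-754 arithmetic, illustrative function names, none of it taken from Caffe): the two-step version divides by $o_y$ and then multiplies by the softmax Jacobian, while the fused version uses $o_k - \delta_{yk}$ directly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Single-precision softmax (max-subtracted so exp itself cannot overflow).
std::vector<float> softmax_f(const std::vector<float>& z) {
  assert(!z.empty());
  float z_max = z[0];
  for (float v : z) z_max = std::fmax(z_max, v);
  std::vector<float> o(z.size());
  float sum = 0.0f;
  for (std::size_t j = 0; j < z.size(); ++j) {
    o[j] = std::exp(z[j] - z_max);
    sum += o[j];
  }
  for (float& v : o) v /= sum;
  return o;
}

// Two-step gradient: dl/do_y = -1 / o_y first, then multiply by
// do_y/dz_k = delta_{yk} o_k - o_y o_k (only i = y contributes to the sum).
std::vector<float> grad_two_step(const std::vector<float>& z, std::size_t y) {
  const std::vector<float> o = softmax_f(z);
  const float dl_do_y = -1.0f / o[y];  // -inf once o_y has underflowed to 0
  std::vector<float> g(z.size());
  for (std::size_t k = 0; k < z.size(); ++k) {
    const float do_y_dz_k = (k == y ? o[k] : 0.0f) - o[y] * o[k];
    g[k] = do_y_dz_k * dl_do_y;
  }
  return g;
}

// Fused gradient: o_k - delta_{yk}, with no division at all.
std::vector<float> grad_fused(const std::vector<float>& z, std::size_t y) {
  std::vector<float> g = softmax_f(z);
  g[y] -= 1.0f;
  return g;
}
```

With `z = {0, 200}` and true class `0`, $o_y$ underflows to zero in `float`, so the two-step version ends up multiplying zero by infinity and returns NaN, whereas the fused version returns a finite gradient.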

Caffe's Backpropagation Code

Backward pass of SoftmaxWithLoss

template <typename Dtype>
void KLNSOFTMAXLossLayer<Dtype>::Backward_cpu(
    const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down[0]) {
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const Dtype* prob_data = prob_.cpu_data();
    // Start from the softmax output: bottom_diff = o.
    caffe_copy(prob_.count(), prob_data, bottom_diff);
    const Dtype* label = bottom[1]->cpu_data();
    int dim = prob_.count() / outer_num_;
    int count = 0;
    for (int i = 0; i < outer_num_; ++i) {
      for (int j = 0; j < inner_num_; ++j) {
        const int label_value =
            static_cast<int>(label[i * inner_num_ + j]);
        if (has_ignore_label_ && label_value == ignore_label_) {
          // Ignored positions contribute no gradient.
          for (int c = 0; c < bottom[0]->shape(kln_softmax_axis_); ++c) {
            bottom_diff[i * dim + c * inner_num_ + j] = 0;
          }
        } else {
          // o_k - delta_{yk}: subtract 1 at the true class.
          bottom_diff[i * dim + label_value * inner_num_ + j] -= 1;
          ++count;
        }
      }
    }
    // Scale gradient
    Dtype loss_weight = top[0]->cpu_diff()[0] /
                        get_normalizer(normalization_, count);
    caffe_scal(prob_.count(), loss_weight, bottom_diff);
  }
}
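The indexing `i * dim + c * inner_num_ + j` is easier to follow outside of Caffe. Below is a hypothetical standalone version of the same loop on flat `std::vector` buffers laid out as [outer, channels, inner]. The function name and signature are mine; it assumes a loss weight of 1 and normalization by the number of non-ignored positions, which is only one of the modes `get_normalizer` may implement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// prob:  softmax outputs, shape [outer_num, channels, inner_num], flattened.
// label: one class index per (outer, inner) position.
// Returns the gradient with respect to the softmax input.
std::vector<float> softmax_loss_backward(const std::vector<float>& prob,
                                         const std::vector<int>& label,
                                         int outer_num, int channels,
                                         int inner_num, int ignore_label) {
  assert(prob.size() ==
         static_cast<std::size_t>(outer_num) * channels * inner_num);
  assert(label.size() == static_cast<std::size_t>(outer_num) * inner_num);
  std::vector<float> diff(prob);         // start from o
  const int dim = channels * inner_num;  // elements per outer index
  int count = 0;
  for (int i = 0; i < outer_num; ++i) {
    for (int j = 0; j < inner_num; ++j) {
      const int y = label[i * inner_num + j];
      if (y == ignore_label) {
        // Ignored positions get zero gradient for every class.
        for (int c = 0; c < channels; ++c) {
          diff[i * dim + c * inner_num + j] = 0.0f;
        }
      } else {
        assert(y >= 0 && y < channels);
        diff[i * dim + y * inner_num + j] -= 1.0f;  // o_k - delta_{yk}
        ++count;
      }
    }
  }
  // Assumed normalization: divide by the number of counted positions.
  const float normalizer = count > 0 ? static_cast<float>(count) : 1.0f;
  for (float& d : diff) d /= normalizer;
  return diff;
}
```

For a plain classification batch `inner_num` is 1 and `outer_num` is the batch size; for dense per-pixel labels `inner_num` would be height times width.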

Adding Your Own Layer to Caffe by Hand

  • Step 1: Add the message definition for KLNReLU to caffe.proto
message KLNReLUParameter {
  optional float negative_slope = 1 [default = 0];
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 2 [default = DEFAULT];
}
  • Step 2: Add a field with the next unused ID to LayerParameter in caffe.proto
message LayerParameter {
  optional string name = 1; 
  optional string type = 2; 
  repeated string bottom = 3; 
  repeated string top = 4; 
......
  optional AccuracyParameter accuracy_param = 102;
  optional ArgMaxParameter argmax_param = 103;
  optional BatchNormParameter batch_norm_param = 139;
  optional BiasParameter bias_param = 141;
  optional ConcatParameter concat_param = 104;
......
  optional KLNReLUParameter kln_relu_param = 147;
}
  • Step 3: Add a header file kln_relu_layer.hpp under include/caffe/layers/
#ifndef CAFFE_KLN_RELU_LAYER_HPP_
#define CAFFE_KLN_RELU_LAYER_HPP_
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/layers/kln_neuron_layer.hpp"
namespace caffe {
template <typename Dtype>
class KLNReLULayer : public KLNNeuronLayer<Dtype> {
 public:
  explicit KLNReLULayer(const LayerParameter& param)
      : KLNNeuronLayer<Dtype>(param) {}
  virtual inline const char* type() const { return "KLNReLU"; }
 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  // The GPU path is not provided in this example.
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    NOT_IMPLEMENTED;
  }
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) {
    NOT_IMPLEMENTED;
  }
};
}  // namespace caffe
#endif  // CAFFE_KLN_RELU_LAYER_HPP_
  • Step 4: Add the corresponding implementation file kln_relu_layer.cpp under src/caffe/layers/

#include <algorithm>
#include <vector>
#include "caffe/layers/kln_relu_layer.hpp"
namespace caffe {
template <typename Dtype>
void KLNReLULayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  Dtype negative_slope =
      this->layer_param_.kln_relu_param().negative_slope();
  // y = max(x, 0) + negative_slope * min(x, 0)
  for (int i = 0; i < count; ++i) {
    top_data[i] = std::max(bottom_data[i], Dtype(0))
        + negative_slope * std::min(bottom_data[i], Dtype(0));
  }
}
template <typename Dtype>
void KLNReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    Dtype negative_slope =
        this->layer_param_.kln_relu_param().negative_slope();
    // dx = dy * (1 if x > 0, otherwise negative_slope)
    for (int i = 0; i < count; ++i) {
      bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
          + negative_slope * (bottom_data[i] <= 0));
    }
  }
}
INSTANTIATE_CLASS(KLNReLULayer);
}  // namespace caffe
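The forward and backward formulas in this layer can be tried out without building Caffe. The following is a minimal sketch on plain `std::vector<double>`; the function names are hypothetical and nothing here touches Blob or the layer machinery.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Forward: y = max(x, 0) + negative_slope * min(x, 0).
std::vector<double> relu_forward(const std::vector<double>& x,
                                 double negative_slope) {
  std::vector<double> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = std::max(x[i], 0.0) + negative_slope * std::min(x[i], 0.0);
  }
  return y;
}

// Backward: dx = dy * (1 if x > 0, otherwise negative_slope).
std::vector<double> relu_backward(const std::vector<double>& x,
                                  const std::vector<double>& top_diff,
                                  double negative_slope) {
  assert(x.size() == top_diff.size());
  std::vector<double> dx(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    dx[i] = top_diff[i] * ((x[i] > 0) + negative_slope * (x[i] <= 0));
  }
  return dx;
}
```

With `negative_slope = 0` this is the ordinary ReLU; a small positive value gives the leaky variant, which is why the proto message carries that one parameter.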
  • Step 5: Add the corresponding registration code to src/caffe/layer_factory.cpp
...
#include "caffe/layers/kln_relu_layer.hpp"
...
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetKLNReLULayer(const LayerParameter& param) {
  KLNReLUParameter_Engine engine = param.kln_relu_param().engine();
  // Only the CAFFE engine is implemented here, so DEFAULT maps to it.
  if (engine == KLNReLUParameter_Engine_DEFAULT) {
    engine = KLNReLUParameter_Engine_CAFFE;
  }
  if (engine == KLNReLUParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new KLNReLULayer<Dtype>(param));
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
    throw;  // Avoids missing return warning
  }
}
REGISTER_LAYER_CREATOR(KLNReLU, GetKLNReLULayer);
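What registration buys you is that a layer can be created from its type string alone. As far as I understand it, the general pattern is a table from type name to creator function, filled in by static objects before main runs. The toy sketch below shows that pattern only; every `Toy*` name is hypothetical and this is not Caffe's actual implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for LayerParameter and Layer.
struct ToyParam {
  std::string type;
  float negative_slope = 0.0f;
};

struct ToyLayer {
  virtual ~ToyLayer() = default;
  virtual const char* type() const = 0;
};

struct ToyReLU : ToyLayer {
  explicit ToyReLU(const ToyParam& p) : slope(p.negative_slope) {}
  const char* type() const override { return "ToyReLU"; }
  float slope;
};

// Registry mapping a type string to a creator function.
class ToyRegistry {
 public:
  using Creator = std::function<std::shared_ptr<ToyLayer>(const ToyParam&)>;

  static std::map<std::string, Creator>& table() {
    static std::map<std::string, Creator> t;  // constructed on first use
    return t;
  }
  static void add(const std::string& type, Creator creator) {
    assert(table().count(type) == 0 && "layer type already registered");
    table()[type] = std::move(creator);
  }
  static std::shared_ptr<ToyLayer> create(const ToyParam& p) {
    auto it = table().find(p.type);
    if (it == table().end()) {
      throw std::runtime_error("unknown layer type: " + p.type);
    }
    return it->second(p);
  }
};

// A static object whose constructor performs the registration, which is
// the job a registration macro would generate for you.
struct ToyRegisterer {
  ToyRegisterer(const std::string& type, ToyRegistry::Creator creator) {
    ToyRegistry::add(type, std::move(creator));
  }
};

static ToyRegisterer g_register_toy_relu(
    "ToyReLU", [](const ToyParam& p) -> std::shared_ptr<ToyLayer> {
      return std::make_shared<ToyReLU>(p);
    });
```

A caller then only needs a parameter object whose `type` is "ToyReLU" and a call to `ToyRegistry::create`; an unregistered type name raises an exception in this sketch instead of aborting.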

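Postscript

Having read this far, you can see why Caffe, once all the rage, was later overwhelmed by the "newly emerged" TensorFlow, and why TensorFlow itself now shows signs of going the same way. "Whenever I read what moved the ancients, it matches my own feelings like the two halves of a tally; I never fail to sigh over their writings, yet cannot explain it to myself. I know well that equating life with death is empty talk, and that treating a long life and an early death as the same is absurd. Those who come after will look on us as we now look on the past."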

- [1] Zhao Yongke. 深度学习:21天实战Caffe (Deep Learning: Caffe in Practice in 21 Days). Publishing House of Electronics Industry.
- [2] http://freemind.pluskid.org/machine-learning/softmax-vs-softmax-loss-numerical-stability/