To use Caffe, you mainly need to write prototxt files first, typically train.prototxt, deploy.prototxt and solver.prototxt. train.prototxt describes the network structure used for training, layer by layer; deploy.prototxt is almost identical, the main difference being that it has no loss layer; solver.prototxt configures how training is run: the optimization method, learning rate, model snapshotting and so on.
For example, solver.prototxt:
net: "model_simple/train.prototxt"
base_lr: 0.00001
lr_policy: "multistep"
gamma: 0.5
stepvalue: 200000
stepvalue: 300000
stepvalue: 400000
stepvalue: 500000
momentum: 0.9
weight_decay: 0.0004
max_iter: 1200000
snapshot: 5000
snapshot_prefix: "snapshot/flow"
solver_mode: GPU # Some of our layers only support GPU
type: "Adam"
momentum2: 0.999
display: 50
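For reference, with the multistep policy the learning rate is base_lr * gamma^k, where k is the number of stepvalue thresholds the current iteration has already passed: with the settings above it starts at 1e-5, halves at iterations 200000, 300000, 400000 and 500000, and ends up at 6.25e-7.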
train.prototxt looks roughly like this (the example is the handwritten-digit network, LeNet, that ships with Caffe; I trimmed a few layers because the full file is too long):
name: "LeNet"
input: "data"
input_shape {
  dim: 64
  dim: 1
  dim: 28
  dim: 28
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "pool1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
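Note that this snippet actually ends the way a deploy.prototxt would: a fixed input shape and a plain Softmax. In the real training prototxt the input comes from a data layer that also produces a "label" blob, and the final layer is a SoftmaxWithLoss that consumes both, roughly:
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}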
After compiling Caffe, the command-line interface for training and testing is:
/path/to/caffe train -solver ./model/solver.prototxt -gpu gpu_id    # train from scratch
/path/to/caffe train -solver ./model/solver.prototxt -snapshot flow_iter_last.solverstate -gpu gpu_id    # resume training from a snapshot
/path/to/caffe test -model deploy.prototxt -weights model_filename.caffemodel -gpu gpu_id    # test
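Besides the command-line tool, the same deploy.prototxt / .caffemodel pair can also be loaded programmatically through the C++ API. A minimal sketch (the file names are just placeholders matching the solver example above):
#include <caffe/caffe.hpp>

int main() {
  caffe::Caffe::set_mode(caffe::Caffe::GPU);  // or caffe::Caffe::CPU
  // Build the net from the deploy definition and load the trained weights.
  caffe::Net<float> net("model_simple/deploy.prototxt", caffe::TEST);
  net.CopyTrainedLayersFrom("snapshot/flow_iter_500000.caffemodel");

  // Fill the input blob with data, then run one forward pass.
  float* input_data = net.input_blobs()[0]->mutable_cpu_data();
  // ... copy the input image(s) into input_data here ...
  net.Forward();

  const float* output_data = net.output_blobs()[0]->cpu_data();
  // output_data now holds the network's prediction.
  return 0;
}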
This post focuses on the layer structure in .prototxt files.
Caffe defines many kinds of layers. Looking at the source tree, the header file of each layer lives under include/caffe/layers/, and the implementations live under src/caffe/layers/ as .cpp and .cu files.
The parameters that may appear in a .prototxt file are all registered in src/caffe/proto/caffe.proto, where each layer block is parsed as a LayerParameter message. Its fields include basic ones such as name and type, plus the layer-specific parameter messages for the different layer types:
message LayerParameter {
optional string name = 1; // the layer name
optional string type = 2; // the layer type
repeated string bottom = 3; // the name of each bottom blob
repeated string top = 4; // the name of each top blob
// The train / test phase for computation.
optional Phase phase = 10;
// The amount of weight to assign each top blob in the objective.
// Each layer assigns a default value, usually of either 0 or 1,
// to each top blob.
repeated float loss_weight = 5;
// Specifies training parameters (multipliers on global learning constants,
// and the name and other settings used for weight sharing).
repeated ParamSpec param = 6;
// The blobs containing the numeric parameters of the layer.
repeated BlobProto blobs = 7;
// Specifies whether to backpropagate to each bottom. If unspecified,
// Caffe will automatically infer whether each input needs backpropagation
// to compute parameter gradients. If set to true for some inputs,
// backpropagation to those inputs is forced; if set false for some inputs,
// backpropagation to those inputs is skipped.
//
// The size must be either 0 or equal to the number of bottoms.
repeated bool propagate_down = 11;
// Rules controlling whether and when a layer is included in the network,
// based on the current NetState. You may specify a non-zero number of rules
// to include OR exclude, but not both. If no include or exclude rules are
// specified, the layer is always included. If the current NetState meets
// ANY (i.e., one or more) of the specified rules, the layer is
// included/excluded.
repeated NetStateRule include = 8;
repeated NetStateRule exclude = 9;
// Parameters for data pre-processing.
optional TransformationParameter transform_param = 100;
// Parameters shared by loss layers.
optional LossParameter loss_param = 101;
// Layer type-specific parameters.
//
// Note: certain layers may have more than one computational engine
// for their implementation. These layers include an Engine type and
// engine parameter for selecting the implementation.
// The default for the engine is set by the ENGINE switch at compile-time.
optional AccuracyParameter accuracy_param = 102;
optional ArgMaxParameter argmax_param = 103;
optional BatchNormParameter batch_norm_param = 139;
optional BiasParameter bias_param = 141;
optional ConcatParameter concat_param = 104;
optional ContrastiveLossParameter contrastive_loss_param = 105;
optional ConvolutionParameter convolution_param = 106;
optional CropParameter crop_param = 144;
optional DataParameter data_param = 107;
optional DropoutParameter dropout_param = 108;
optional DummyDataParameter dummy_data_param = 109;
optional EltwiseParameter eltwise_param = 110;
optional ELUParameter elu_param = 140;
optional EmbedParameter embed_param = 137;
optional ExpParameter exp_param = 111;
optional FlattenParameter flatten_param = 135;
optional HDF5DataParameter hdf5_data_param = 112;
optional HDF5OutputParameter hdf5_output_param = 113;
optional HingeLossParameter hinge_loss_param = 114;
optional ImageDataParameter image_data_param = 115;
optional InfogainLossParameter infogain_loss_param = 116;
optional InnerProductParameter inner_product_param = 117;
optional InputParameter input_param = 143;
optional LogParameter log_param = 134;
optional LRNParameter lrn_param = 118;
optional MemoryDataParameter memory_data_param = 119;
optional MVNParameter mvn_param = 120;
optional ParameterParameter parameter_param = 145;
optional PoolingParameter pooling_param = 121;
optional PowerParameter power_param = 122;
optional PReLUParameter prelu_param = 131;
optional PythonParameter python_param = 130;
optional RecurrentParameter recurrent_param = 146;
optional ReductionParameter reduction_param = 136;
optional ReLUParameter relu_param = 123;
optional ReshapeParameter reshape_param = 133;
optional ScaleParameter scale_param = 142;
optional SigmoidParameter sigmoid_param = 124;
optional SoftmaxParameter softmax_param = 125;
optional SPPParameter spp_param = 132;
optional SliceParameter slice_param = 126;
optional TanHParameter tanh_param = 127;
optional ThresholdParameter threshold_param = 128;
optional TileParameter tile_param = 138;
optional WindowDataParameter window_data_param = 129;
optional CoeffScheduleParameter coeff_schedule_param = 148;
optional AugmentationParameter augmentation_param = 149;
optional CorrelationParameter correlation_param = 150;
optional L1LossParameter l1_loss_param = 151;
optional WriterParameter writer_param = 152;
optional ReaderParameter reader_param = 153;
optional MeanParameter mean_param = 154;
optional ResampleParameter resample_param = 155;
optional DownsampleParameter downsample_param = 156;
optional LpqLossParameter lpq_loss_param = 158;
optional FlowWarpParameter flow_warp_param = 159;
optional AccumParameter accum_param = 160;
optional BlackAugmentationParameter black_augmentation_param = 161;
// If should run reshape every iteration
optional bool reshape_every_iter = 157 [default = true];
}
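On the C++ side, each layer stores its parsed prototxt block in the inherited member layer_param_ (a LayerParameter object) and reads its type-specific settings through the generated protobuf accessors. A rough sketch for a hypothetical convolution-like layer (the class name is made up; exact accessor names can vary slightly between Caffe versions):
template <typename Dtype>
void MyConvLikeLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // name: "conv1", type: "Convolution" from the prototxt
  const std::string& name = this->layer_param_.name();
  // The layer-type-specific message, e.g. convolution_param { ... }
  const caffe::ConvolutionParameter& conv_param =
      this->layer_param_.convolution_param();
  unsigned int num_output = conv_param.num_output();       // num_output: 20
  std::string filler = conv_param.weight_filler().type();  // "xavier"
  // ... use these values to allocate this->blobs_ (weights and biases) ...
}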
Looking at the code, it is easy to see that every Caffe layer inherits from the class Layer<Dtype>, e.g. class BaseConvolutionLayer : public Layer<Dtype>.
In include/caffe/layer.hpp, note the following:
/**
* @brief Implements common layer setup functionality.
*
* @param bottom the preshaped input blobs
* @param top
* the allocated but unshaped output blobs, to be shaped by Reshape
*
* Checks that the number of bottom and top blobs is correct.
* Calls LayerSetUp to do special layer setup for individual layer types,
* followed by Reshape to set up sizes of top blobs and internal buffers.
* Sets up the loss weight multiplier blobs for any non-zero loss weights.
* This method may not be overridden.
*/
void SetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  InitMutex();
  CheckBlobCounts(bottom, top);
  LayerSetUp(bottom, top);
  Reshape(bottom, top);
  SetLossWeights(top);
}
So for every layer, SetUp() first calls CheckBlobCounts() to verify that the numbers of bottom and top blobs are what the layer expects, then LayerSetUp() for layer-specific setup, then Reshape() to set the sizes of the top blobs and internal buffers.
The last call, SetLossWeights(top), mainly matters for loss layers; according to https://stackoverflow.com/questions/43094891/caffe-what-is-setlossweights:
The purpose of the loss weight is to combine loss from multiple layers. So Layer::SetLossWeights assigns the loss weight to the loss_ variable and to the diff blob, which is used in forward to compute the total loss.
By default, layers with the suffix "Loss" have loss weight 1 and the others 0, but any layer that is able to backpropagate can be given a non-zero loss_weight.
In other words, the concrete layer types all inherit from Layer and override LayerSetUp(), Reshape(), plus a few small functions that report the layer type and the expected numbers of blobs, and of course the four fundamental functions:
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top);
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
These are mostly self-explanatory; propagate_down has one entry per bottom blob and indicates whether the gradient should be backpropagated to that bottom.
To define a custom layer, you have to write the forward and backward passes yourself.
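To make that concrete, here is a minimal sketch of what a custom layer's header looks like. The class name is hypothetical; a real layer additionally needs a .cpp/.cu implementation, registration via REGISTER_LAYER_CLASS, and, if it has its own settings, a new parameter message in caffe.proto:
#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe {

// Hypothetical layer with exactly one bottom and one top blob.
template <typename Dtype>
class MyCustomLayer : public Layer<Dtype> {
 public:
  explicit MyCustomLayer(const LayerParameter& param) : Layer<Dtype>(param) {}

  // Called once from SetUp(): read this->layer_param_, allocate this->blobs_, etc.
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  // Called from SetUp() and whenever input sizes change: shape the top blobs.
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "MyCustom"; }
  virtual inline int ExactNumBottomBlobs() const { return 1; }
  virtual inline int ExactNumTopBlobs() const { return 1; }

 protected:
  // The four functions listed above.  The GPU versions can be omitted;
  // the base class then falls back to the CPU implementation.
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
};

}  // namespace caffe
The ExactNumBottomBlobs()/ExactNumTopBlobs() overrides are what CheckBlobCounts() uses in SetUp() to validate the bottom/top lists written in the prototxt.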