To use Caffe, you mainly need to write prototxt files first, typically train.prototxt, deploy.prototxt and solver.prototxt. train.prototxt describes the network structure used for training, layer by layer; deploy.prototxt is almost identical, the main difference being that it has no loss layer; solver.prototxt configures how training is run: the optimization method, learning rate, model snapshotting and so on.
For example, solver.prototxt:
net: "model_simple/train.prototxt"
base_lr: 0.00001
lr_policy: "multistep"
gamma: 0.5
stepvalue: 200000
stepvalue: 300000
stepvalue: 400000
stepvalue: 500000
momentum: 0.9
weight_decay: 0.0004
max_iter: 1200000
snapshot: 5000
snapshot_prefix: "snapshot/flow"
solver_mode: GPU # Some of our layers only support GPU
type: "Adam"
momentum2: 0.999
display: 50
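For reference, with the multistep policy the learning rate is base_lr * gamma^k, where k is the number of stepvalue thresholds the current iteration has already passed: with the settings above it starts at 1e-5, halves at iterations 200000, 300000, 400000 and 500000, and ends up at 6.25e-7.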
train.prototxt looks roughly like this (the example is the handwritten-digit network, LeNet, that ships with Caffe; I trimmed a few layers because the full file is too long):
name: "LeNet"
input: "data"
input_shape {
  dim: 64
  dim: 1
  dim: 28
  dim: 28
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "pool1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
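Note that this snippet actually ends the way a deploy.prototxt would: a fixed input shape and a plain Softmax. In the real training prototxt the input comes from a data layer that also produces a "label" blob, and the final layer is a SoftmaxWithLoss that consumes both, roughly:
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}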
After compiling Caffe, the command-line interface for training and testing is:
/path/to/caffe train -solver ./model/solver.prototxt -gpu gpu_id    # train from scratch
/path/to/caffe train -solver ./model/solver.prototxt -snapshot flow_iter_last.solverstate -gpu gpu_id    # resume training from a snapshot
/path/to/caffe test -model deploy.prototxt -weights model_filename.caffemodel -gpu gpu_id    # test
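Besides the command-line tool, the same deploy.prototxt / .caffemodel pair can also be loaded programmatically through the C++ API. A minimal sketch (the file names are just placeholders matching the solver example above):
#include <caffe/caffe.hpp>

int main() {
  caffe::Caffe::set_mode(caffe::Caffe::GPU);  // or caffe::Caffe::CPU
  // Build the net from the deploy definition and load the trained weights.
  caffe::Net<float> net("model_simple/deploy.prototxt", caffe::TEST);
  net.CopyTrainedLayersFrom("snapshot/flow_iter_500000.caffemodel");

  // Fill the input blob with data, then run one forward pass.
  float* input_data = net.input_blobs()[0]->mutable_cpu_data();
  // ... copy the input image(s) into input_data here ...
  net.Forward();

  const float* output_data = net.output_blobs()[0]->cpu_data();
  // output_data now holds the network's prediction.
  return 0;
}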
This post focuses on the layer structure in .prototxt files.
Caffe defines many kinds of layers. Looking at the source tree, the header file of each layer lives under include/caffe/layers/, and the implementations live under src/caffe/layers/ as .cpp and .cu files.
The parameters that may appear in a .prototxt file are all registered in src/caffe/proto/caffe.proto, where each layer block is parsed as a LayerParameter message. Its fields include basic ones such as name and type, plus the layer-specific parameter messages for the different layer types:
message LayerParameter {
optional string name = 1; // the layer name
optional string type = 2; // the layer type
repeated string bottom = 3; // the name of each bottom blob
repeated string top = 4; // the name of each top blob
// The train / test phase for computation.
optional Phase phase = 10;
// The amount of weight to assign each top blob in the objective.
// Each layer assigns a default value, usually of either 0 or 1,
// to each top blob.
repeated float loss_weight = 5;
// Specifies training parameters (multipliers on global learning constants,
// and the name and other settings used for weight sharing).
repeated ParamSpec param = 6;
// The blobs containing the numeric parameters of the layer.
repeated BlobProto blobs = 7;
// Specifies whether to backpropagate to each bottom. If unspecified,
// Caffe will automatically infer whether each input needs backpropagation
// to compute parameter gradients. If set to true for some inputs,
// backpropagation to those inputs is forced; if set false for some inputs,
// backpropagation to those inputs is skipped.
//
// The size must be either 0 or equal to the number of bottoms.
repeated bool propagate_down = 11;
// Rules controlling whether and when a layer is included in the network,
// based on the current NetState. You may specify a non-zero number of rules
// to include OR exclude, but not both. If no include or exclude rules are
// specified, the layer is always included. If the current NetState meets
// ANY (i.e., one or more) of the specified rules, the layer is
// included/excluded.
repeated NetStateRule include = 8;
repeated NetStateRule exclude = 9;
// Parameters for data pre-processing.
optional TransformationParameter transform_param = 100;
// Parameters shared by loss layers.
optional LossParameter loss_param = 101;
// Layer type-specific parameters.
//
// Note: certain layers may have more than one computational engine
// for their implementation. These layers include an Engine type and
// engine parameter for selecting the implementation.
// The default for the engine is set by the ENGINE switch at compile-time.
optional AccuracyParameter accuracy_param = 102;
optional ArgMaxParameter argmax_param = 103;
optional BatchNormParameter batch_norm_param = 139;
optional BiasParameter bias_param = 141;
optional ConcatParameter concat_param = 104;
optional ContrastiveLossParameter contrastive_loss_param = 105;
optional ConvolutionParameter convolution_param = 106;
optional CropParameter crop_param = 144;
optional DataParameter data_param = 107;
optional DropoutParameter dropout_param = 108;
optional DummyDataParameter dummy_data_param = 109;
optional EltwiseParameter eltwise_param = 110;
optional ELUParameter elu_param = 140;
optional EmbedParameter embed_param = 137;
optional ExpParameter exp_param = 111;
optional FlattenParameter flatten_param = 135;
optional HDF5DataParameter hdf5_data_param = 112;
optional HDF5OutputParameter hdf5_output_param = 113;
optional HingeLossParameter hinge_loss_param = 114;
optional ImageDataParameter image_data_param = 115;
optional InfogainLossParameter infogain_loss_param = 116;
optional InnerProductParameter inner_product_param = 117;
optional InputParameter input_param = 143;
optional LogParameter log_param = 134;
optional LRNParameter lrn_param = 118;
optional MemoryDataParameter memory_data_param = 119;
optional MVNParameter mvn_param = 120;
optional ParameterParameter parameter_param = 145;
optional PoolingParameter pooling_param = 121;
optional PowerParameter power_param = 122;
optional PReLUParameter prelu_param = 131;
optional PythonParameter python_param = 130;
optional RecurrentParameter recurrent_param = 146;
optional ReductionParameter reduction_param = 136;
optional ReLUParameter relu_param = 123;
optional ReshapeParameter reshape_param = 133;
optional ScaleParameter scale_param = 142;
optional SigmoidParameter sigmoid_param = 124;
optional SoftmaxParameter softmax_param = 125;
optional SPPParameter spp_param = 132;
optional SliceParameter slice_param = 126;
optional TanHParameter tanh_param = 127;
optional ThresholdParameter threshold_param = 128;
optional TileParameter tile_param = 138;
optional WindowDataParameter window_data_param = 129;
optional CoeffScheduleParameter coeff_schedule_param = 148;
optional AugmentationParameter augmentation_param = 149;
optional CorrelationParameter correlation_param = 150;
optional L1LossParameter l1_loss_param = 151;
optional WriterParameter writer_param = 152;
optional ReaderParameter reader_param = 153;
optional MeanParameter mean_param = 154;
optional ResampleParameter resample_param = 155;
optional DownsampleParameter downsample_param = 156;
optional LpqLossParameter lpq_loss_param = 158;
optional FlowWarpParameter flow_warp_param = 159;
optional AccumParameter accum_param = 160;
optional BlackAugmentationParameter black_augmentation_param = 161;
// If should run reshape every iteration
optional bool reshape_every_iter = 157 [default = true];
}
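On the C++ side, each layer stores its parsed prototxt block in the inherited member layer_param_ (a LayerParameter object) and reads its type-specific settings through the generated protobuf accessors. A rough sketch for a hypothetical convolution-like layer (the class name is made up; exact accessor names can vary slightly between Caffe versions):
template <typename Dtype>
void MyConvLikeLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // name: "conv1", type: "Convolution" from the prototxt
  const std::string& name = this->layer_param_.name();
  // The layer-type-specific message, e.g. convolution_param { ... }
  const caffe::ConvolutionParameter& conv_param =
      this->layer_param_.convolution_param();
  unsigned int num_output = conv_param.num_output();       // num_output: 20
  std::string filler = conv_param.weight_filler().type();  // "xavier"
  // ... use these values to allocate this->blobs_ (weights and biases) ...
}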
Looking at the code, it is easy to see that every Caffe layer inherits from the class Layer<Dtype>, e.g. class BaseConvolutionLayer : public Layer<Dtype>.
In include/caffe/layer.hpp, note the following:
/**
* @brief Implements common layer setup functionality.
*
* @param bottom the preshaped input blobs
* @param top
* the allocated but unshaped output blobs, to be shaped by Reshape
*
* Checks that the number of bottom and top blobs is correct.
* Calls LayerSetUp to do special layer setup for individual layer types,
* followed by Reshape to set up sizes of top blobs and internal buffers.
* Sets up the loss weight multiplier blobs for any non-zero loss weights.
* This method may not be overridden.
*/
void SetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  InitMutex();
  CheckBlobCounts(bottom, top);
  LayerSetUp(bottom, top);
  Reshape(bottom, top);
  SetLossWeights(top);
}
So for every layer, SetUp() first calls CheckBlobCounts() to verify that the numbers of bottom and top blobs are what the layer expects, then LayerSetUp() for layer-specific setup, then Reshape() to set the sizes of the top blobs and internal buffers.
The last call, SetLossWeights(top), mainly matters for loss layers; according to https://stackoverflow.com/questions/43094891/caffe-what-is-setlossweights:
The purpose of the loss weight is to combine loss from multiple layers. So Layer::SetLossWeights assigns the loss weight to the loss_ variable and to the diff blob, which is used in forward to compute the total loss.
By default, layers with the suffix "Loss" have loss weight 1 and the others 0, but any layer that is able to backpropagate can be given a non-zero loss_weight.
In other words, the concrete layer types all inherit from Layer and override LayerSetUp(), Reshape(), plus a few small functions that report the layer type and the expected numbers of blobs, and of course the four fundamental functions:
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top);
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
These are mostly self-explanatory; propagate_down has one entry per bottom blob and indicates whether the gradient should be backpropagated to that bottom.
To define a custom layer, you have to write the forward and backward passes yourself.
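To make that concrete, here is a minimal sketch of what a custom layer's header looks like. The class name is hypothetical; a real layer additionally needs a .cpp/.cu implementation, registration via REGISTER_LAYER_CLASS, and, if it has its own settings, a new parameter message in caffe.proto:
#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe {

// Hypothetical layer with exactly one bottom and one top blob.
template <typename Dtype>
class MyCustomLayer : public Layer<Dtype> {
 public:
  explicit MyCustomLayer(const LayerParameter& param) : Layer<Dtype>(param) {}

  // Called once from SetUp(): read this->layer_param_, allocate this->blobs_, etc.
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  // Called from SetUp() and whenever input sizes change: shape the top blobs.
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "MyCustom"; }
  virtual inline int ExactNumBottomBlobs() const { return 1; }
  virtual inline int ExactNumTopBlobs() const { return 1; }

 protected:
  // The four functions listed above.  The GPU versions can be omitted;
  // the base class then falls back to the CPU implementation.
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
};

}  // namespace caffe
The ExactNumBottomBlobs()/ExactNumTopBlobs() overrides are what CheckBlobCounts() uses in SetUp() to validate the bottom/top lists written in the prototxt.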