Notes on xavierFiller

Latest recommended article published 2019-08-07 22:17:36

License: CC 4.0 BY-SA

Tags: #caffe #protobuf

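When I used Caffe earlier, I filled the weights with the xavier filler (xavierFiller) and paired it with a fairly small learning rate (SGD). The results were surprisingly good: with a casually configured network I reached 94%+ accuracy, on both the training set and the test set.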

I am not sure I have identified the cause correctly, so I am posting the configuration here for reference.

Here is the solver configuration:

# The train/test net protocol buffer definition
net: "./car_detector_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of PLAN3, we have test batch size 64 and 2 test iterations,
# covering the full 106 testing images.
test_iter: 2
# Carry out testing every 100 training iterations.
test_interval: 100
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.05 #0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 200
snapshot_prefix: "./solve/car_detector"
# solver mode: CPU or GPU
solver_mode: CPU

Here is the model configuration:

name: "car_detector_train_test"
layer {
  name: "car_detector"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "../train_set/car_detector_mean.binaryproto"
    scale: 0.00390625
  }
  data_param {
    source: "../train_set"
    batch_size: 64
    backend: LMDB
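    mirror: true  # note: caffe.proto marks mirror inside data_param as deprecated; its documented home is transform_param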
  }
}
layer {
  name: "car_detector"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "../train_set/car_detector_mean.binaryproto"
    scale: 0.00390625
  }
  data_param {
    source: "../test_set"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    name: "conv1_w"
    lr_mult: 1
  }
  param {
    name: "conv1_b"
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    name: "conv2_w"
    lr_mult: 1
  }
  param {
    name: "conv2_b"
    lr_mult: 2
  }
  convolution_param {
    num_output: 40
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    name: "ip1_w"
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    name: "ip2_w"
    lr_mult: 1
  }
  param {
    name: "ip2_b"
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  #include {
  #  phase: TEST
  #}
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
#layer {
#  name: "prob"
#  type: "Softmax"
#  bottom: "ip2"
#  top: "prob"
#}
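
Every weight filler in the model above is `type: "xavier"`. As a rough illustration of what that does, here is a Python sketch based on my reading of Caffe's `filler.hpp`, assuming the default `variance_norm` (FAN_IN): weights are drawn uniformly from [-s, s] with s = sqrt(3 / fan_in), which gives each weight a variance of 1 / fan_in. The function names `xavier_scale` and `xavier_fill` are hypothetical, and the sketch needs Python 3.8+ for `math.prod`.

```python
import math
import random


def xavier_scale(shape, variance_norm="FAN_IN"):
    """Half-width s of the uniform range [-s, s] for a weight blob of this shape."""
    count = math.prod(shape)
    fan_in = count // shape[0]                               # inputs feeding one output
    fan_out = count // shape[1] if len(shape) > 1 else count
    if variance_norm == "AVERAGE":
        n = (fan_in + fan_out) / 2.0
    elif variance_norm == "FAN_OUT":
        n = fan_out
    else:                                                    # assumed default: FAN_IN
        n = fan_in
    return math.sqrt(3.0 / n)


def xavier_fill(shape, rng=random):
    """Flat list of weights sampled uniformly from [-s, s]."""
    s = xavier_scale(shape)
    return [rng.uniform(-s, s) for _ in range(math.prod(shape))]


# Shapes implied by the prototxt: conv2 has 40 outputs, 20 input channels, 5x5 kernels;
# ip2 has 2 outputs and 10 inputs.
conv2_scale = xavier_scale((40, 20, 5, 5))   # sqrt(3 / 500)
ip2_scale = xavier_scale((2, 10))            # sqrt(3 / 10)
conv2_weights = xavier_fill((40, 20, 5, 5))
```

Because the range shrinks as fan-in grows, layers with many inputs start with smaller weights, which keeps the scale of activations roughly constant from layer to layer at initialization.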

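Some notes on the solver configuration, based on https://github.com/BVLC/caffe/blob/master/src/caffe/solver.cpp

The optimizer is SGD (the default).

lr_policy inv: base_lr * (1 + gamma * iter) ^ (- power). I believe iter here is the number of batches processed so far: with batch = 64, for example, every 64 samples processed count as one iteration, and iter++.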

// policies are as follows:
//    - fixed: always return base_lr.
//    - step: return base_lr * gamma ^ (floor(iter / step))
//    - exp: return base_lr * gamma ^ iter
//    - inv: return base_lr * (1 + gamma * iter) ^ (- power)
//    - multistep: similar to step but it allows non uniform steps defined by
//      stepvalue
//    - poly: the effective learning rate follows a polynomial decay, to be
//      zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
//    - sigmoid: the effective learning rate follows a sigmod decay
//      return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
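
The policies listed in that comment can be written out as a small Python function. This is only a sketch that transcribes the formulas from the comment (it leaves out `multistep`), and the function name `caffe_lr` and its keyword arguments are my own, not part of Caffe:

```python
import math


def caffe_lr(policy, it, base_lr, gamma=0.0, power=0.0, stepsize=1, max_iter=1):
    """Learning rate at iteration `it`, following the formulas in the quoted comment."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1 / (1 + math.exp(-gamma * (it - stepsize))))
    raise ValueError("unsupported lr_policy: " + policy)


# The solver settings from this post: inv policy, base_lr 0.01, gamma 0.0001, power 0.75.
lr_start = caffe_lr("inv", 0, base_lr=0.01, gamma=0.0001, power=0.75)
lr_end = caffe_lr("inv", 10000, base_lr=0.01, gamma=0.0001, power=0.75)
```

With these settings the rate goes from 0.01 at iteration 0 to about 0.0059 at `max_iter` (10000), so it does not even halve over the whole run.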

So how does this work in Torch? If you use sgd from the optim package, the code (https://github.com/torch/optim/blob/master/sgd.lua) shows:

local clr = lr / (1 + nevals*lrd)

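lr is the base learning rate, nevals is the number of times optimMethod(...) has been called, i.e. once per batch, and lrd is the learning rate decay factor.

With that in mind, I lowered lrd, for example to 0.0004. In my experiment, this value reduces the learning rate by almost exactly 0.1 over roughly one full pass through the sample set.

With lr = 0.01, this reproduced the fairly good results.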
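
To compare the two decay rules side by side, here is a small Python sketch using the values quoted in this post (Caffe: base_lr 0.01, gamma 0.0001, power 0.75; Torch: lr 0.01, lrd 0.0004). The function names are made up for illustration, and the Torch formula is simply the `clr` line above rewritten in Python:

```python
def caffe_inv(it, base_lr=0.01, gamma=0.0001, power=0.75):
    """Caffe 'inv' policy: base_lr * (1 + gamma * iter) ^ (-power)."""
    return base_lr * (1 + gamma * it) ** (-power)


def torch_sgd_clr(nevals, lr=0.01, lrd=0.0004):
    """Torch optim.sgd: clr = lr / (1 + nevals * lrd)."""
    return lr / (1 + nevals * lrd)


# Print both schedules at a few batch counts.
for it in (0, 100, 1000, 10000):
    print(it, caffe_inv(it), torch_sgd_clr(it))
```

The Torch rule has the same shape as Caffe's `inv` policy with power fixed at 1 and gamma equal to lrd. With the values above, the Torch rate is below the Caffe rate at every iteration after the first, so both decay slowly but the Torch setting is the faster of the two.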

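So the learning rate does need to shrink, but not too quickly. As for the base learning rate, something between 0.001 and 0.01 is about right, following the advice on the Caffe website: http://caffe.berkeleyvision.org/tutorial/solver.html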