Paper Reading Notes: AM-Softmax: Additive Margin Softmax for Face Verification

License: CC 4.0 BY-SA



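This paper comes from the University of Electronic Science and Technology of China (UESTC). It builds on NormFace and A-Softmax and proposes AM-Softmax.

Main Idea

L-Softmax and A-Softmax introduced the notion of an angular margin to improve the traditional softmax loss, so that face features have larger inter-class distances and smaller intra-class distances. Inspired by these methods, the authors propose a more intuitive and more interpretable additive margin Softmax (AM-Softmax). The paper also emphasizes and discusses the importance of feature normalization. Experiments show that AM-Softmax achieves better results on LFW and MegaFace than the previous methods.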

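Algorithm

Both L-Softmax and A-Softmax introduce a factor m that turns the cosine similarity between the weight W and the feature f from cos(θ) into cos(mθ), and use m to control the distance between features. In a similar spirit, AM-Softmax replaces cos(θ) with the function below, which is monotonically decreasing in θ on [0, π] and is simpler in both form and computation than the Ψ(θ) used by L-Softmax/A-Softmax:

$\Psi(\theta) = \cos(\theta)-m$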

The resulting loss over n samples and c classes is:

$L_{AMS}=-\frac{1}{n}\sum_{i=1}^{n}\log\frac{e^{s\cdot(\cos\theta_{y_i}-m)}}{e^{s\cdot(\cos\theta_{y_i}-m)}+\sum_{j=1,j\neq y_i}^{c}e^{s\cdot\cos\theta_j}}$
where s is a scale factor, fixed at 30 in the paper.
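
To make the formula concrete, here is a minimal C++ sketch that evaluates the AM-Softmax loss for a single sample, given the cosines between its feature and every class weight. The function name `am_softmax_loss` is hypothetical, the default m = 0.35 is taken from the prototxt later in these notes, and the max-subtraction trick is an ordinary numerical-stability measure rather than something prescribed by the paper.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// AM-Softmax loss for ONE sample (the 1/n average over the batch is left out).
// cosines[j] = cos(theta_j): cosine between the normalized feature and the
// normalized weight vector of class j. `label` is the ground-truth class y_i.
// Assumes `cosines` is non-empty and `label` is a valid index.
double am_softmax_loss(const std::vector<double>& cosines, std::size_t label,
                       double s = 30.0, double m = 0.35) {
  // Scaled logits: s * (cos(theta_j) - m) for the target class,
  // s * cos(theta_j) for every other class.
  std::vector<double> logits(cosines.size());
  for (std::size_t j = 0; j < cosines.size(); ++j) {
    const double margin = (j == label) ? m : 0.0;
    logits[j] = s * (cosines[j] - margin);
  }
  // Subtract the maximum logit before exponentiating, for numerical stability.
  const double max_logit = *std::max_element(logits.begin(), logits.end());
  double sum_exp = 0.0;
  for (double z : logits) sum_exp += std::exp(z - max_logit);
  // Negative log of the softmax probability of the target class.
  return -(logits[label] - max_logit - std::log(sum_exp));
}
```

With m = 0 this reduces to the ordinary softmax loss on the scaled cosines. For a fixed set of cosines, any m > 0 yields a larger loss, which is what pushes the target cosine to exceed the others by at least the margin.
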
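Angular margin vs. cosine margin: A-Softmax multiplies θ by m, whereas AM-Softmax subtracts m from cos θ. This is the biggest difference between the two: one is a margin in angle, the other a margin in cosine. The reason for choosing cos θ - m rather than cos(θ - m) is that what the network produces is the inner product of W and f; optimizing cos(θ - m) would involve an arccos operation, which is computationally too expensive.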
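Feature normalization: features extracted from high-quality images have large norms, while features extracted from low-quality images have small norms. After feature normalization, the features of the lower-quality images produce larger gradients, so the network focuses more of its attention on these samples during training. Feature normalization is therefore better suited to datasets whose images are of poor quality.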
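
The link between a small feature norm and a larger gradient can be made explicit with a short derivation. This is only a sketch, and the notation is an assumption not used in the original notes: $\hat{f}$ denotes the L2-normalized feature and $L$ the loss.

$\hat{f} = \frac{f}{\|f\|}, \qquad \frac{\partial \hat{f}}{\partial f} = \frac{1}{\|f\|}\left(I - \hat{f}\hat{f}^{T}\right)$

$\frac{\partial L}{\partial f} = \frac{1}{\|f\|}\left(I - \hat{f}\hat{f}^{T}\right)\frac{\partial L}{\partial \hat{f}}$

For the same gradient with respect to $\hat{f}$, the gradient that reaches $f$ scales as $1/\|f\|$, so features with small norms receive proportionally larger updates.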

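Experimental Results

Notably, on LFW the results without feature normalization are better than those with feature normalization; the authors attribute this to the high image quality of LFW.
Here, dropping feature normalization amounts to changing the parameter s of the Scale layer: instead of a fixed s, the scale becomes the norm of each feature, so the scaling differs from sample to sample.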

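Summary

With both features and weights normalized, the paper proposes an additive margin Softmax that is more intuitive and easier to interpret, and that also achieves better experimental results than A-Softmax.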

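Code Implementation

The code can follow the NormFace code, which is quite similar, with the corresponding modifications made on top of it.
Building on NormFace, a new layer, LabelSpecificAdd, is introduced. It is the core of AM-Softmax and subtracts m from cos θ.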

layer {
  name: "norm1"
  type: "Normalize"
  bottom: "fc5"
  top: "norm1"
}
layer {
  name: "fc6_l2"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc6"
  param {
    lr_mult: 1
  }
  inner_product_param{
    num_output: 10516
    normalize: true
    weight_filler {
      type: "xavier"
    }
    bias_term: false
  }
}
layer {
  name: "label_specific_margin"
  type: "LabelSpecificAdd"
  bottom: "fc6"
  bottom: "label"
  top: "fc6_margin"
  label_specific_add_param {
    bias: -0.35
  }
}
layer {
  name: "fc6_margin_scale"
  type: "Scale"
  bottom: "fc6_margin"
  top: "fc6_margin_scale"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler{
      type: "constant"
      value: 30
    }
  }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6_margin_scale"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 1
}
layer {
  name: "Accuracy"
  type: "Accuracy"
  bottom: "fc6"
  bottom: "label"
  top: "accuracy"
  include { 
    phase: TEST
  }
}
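
The layer chain above (Normalize, InnerProduct with `normalize: true`, LabelSpecificAdd, Scale) can be mirrored in plain C++ without Caffe. The sketch below is an illustration under assumptions: the function names `l2_normalize` and `am_softmax_logits` are hypothetical, and it always subtracts the margin, omitting the extra guard that the actual LabelSpecificAdd layer applies before adding the bias.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns v scaled to unit L2 norm (a zero vector is returned unchanged).
std::vector<double> l2_normalize(const std::vector<double>& v) {
  double sq_sum = 0.0;
  for (double x : v) sq_sum += x * x;
  const double norm = std::sqrt(sq_sum);
  if (norm == 0.0) return v;
  std::vector<double> out(v.size());
  for (std::size_t k = 0; k < v.size(); ++k) out[k] = v[k] / norm;
  return out;
}

// Produces the values that would be fed to SoftmaxWithLoss:
// s * (cos(theta_j) - m) for the labelled class and s * cos(theta_j) otherwise.
// class_weights[j] is the weight vector of class j, with the same length as `feature`.
std::vector<double> am_softmax_logits(
    const std::vector<double>& feature,
    const std::vector<std::vector<double>>& class_weights,
    std::size_t label, double s = 30.0, double m = 0.35) {
  const std::vector<double> f = l2_normalize(feature);       // "Normalize" layer
  std::vector<double> logits(class_weights.size());
  for (std::size_t j = 0; j < class_weights.size(); ++j) {
    const std::vector<double> w = l2_normalize(class_weights[j]);  // normalize: true
    double cosine = 0.0;                                      // InnerProduct
    for (std::size_t k = 0; k < f.size(); ++k) cosine += w[k] * f[k];
    if (j == label) cosine -= m;                              // LabelSpecificAdd, bias = -m
    logits[j] = s * cosine;                                   // Scale with constant s
  }
  return logits;
}
```

Note that in the prototxt the Accuracy layer reads `fc6`, the cosines before the margin is applied, so the margin only affects the training loss.
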

label_specific_add_layer.hpp/label_specific_add_layer.cpp

These two files implement the operation that subtracts m from cos θ, following the formula:

$\Psi(\theta) = \cos(\theta)-m$

#ifndef CAFFE_LABEL_SPECIFIC_ADD_LAYER_HPP_
#define CAFFE_LABEL_SPECIFIC_ADD_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

namespace caffe {

template <typename Dtype>
class LabelSpecificAddLayer : public Layer<Dtype> {
 public:
  explicit LabelSpecificAddLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
                          const vector<Blob<Dtype>*>& top);
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
                       const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "LabelSpecificAdd"; }
  virtual inline int MinNumBottomBlobs() const { return 2; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

  Dtype bias_;
  bool transform_test_;
  bool anneal_bias_;
  Dtype bias_base_;
  Dtype bias_gamma_;
  Dtype bias_power_;
  Dtype bias_min_;
  Dtype bias_max_;
  int iteration_;
};

}  // namespace caffe

#endif  // CAFFE_LABEL_SPECIFIC_ADD_LAYER_HPP_

#include <algorithm>
#include <vector>

#include "caffe/layers/label_specific_add_layer.hpp"

namespace caffe {

  template <typename Dtype>
  void LabelSpecificAddLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
                                                    const vector<Blob<Dtype>*>& top) {
    const LabelSpecificAddParameter& param = this->layer_param_.label_specific_add_param();
    bias_ = param.bias();
    transform_test_ = param.transform_test() & (this->phase_ == TRAIN);
    anneal_bias_ = param.has_bias_base();
    bias_base_ = param.bias_base();
    bias_gamma_ = param.bias_gamma();
    bias_power_ = param.bias_power();
    bias_min_ = param.bias_min();
    bias_max_ = param.bias_max();
    iteration_ = param.iteration();
  }

  template <typename Dtype>
  void LabelSpecificAddLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
                                                    const vector<Blob<Dtype>*>& top) {
    if(top[0] != bottom[0]) top[0]->ReshapeLike(*bottom[0]);
    if (top.size() == 2) top[1]->Reshape({ 1 });
  }

template <typename Dtype>
void LabelSpecificAddLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                                                  const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* label_data = bottom[1]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();

  int num = bottom[0]->num();   // batch size
  int count = bottom[0]->count();   // total number of elements (batch size * number of classes)
  int dim = count / num;    // number of output classes per sample

  if (top[0] != bottom[0]) caffe_copy(count, bottom_data, top_data);

  if (!transform_test_ && this->phase_ == TEST) return;     // at test time, skip the margin and pass the scores through unchanged

  if (anneal_bias_) {   // optionally anneal the bias so that it changes gradually over iterations
    bias_ = bias_base_ + pow(((Dtype)1. + bias_gamma_ * iteration_), bias_power_) - (Dtype)1.;
    bias_ = std::max(bias_, bias_min_);
    bias_ = std::min(bias_, bias_max_);
    iteration_++;
  }
  if (top.size() == 2) {
    top[1]->mutable_cpu_data()[0] = bias_;
  }     // export the current bias through the optional second top blob

  for (int i = 0; i < num; ++i) {
    int gt = static_cast<int>(label_data[i]);
    // add bias (= -m) at the ground-truth class, only where the score exceeds -bias
    if(top_data[i * dim + gt] > -bias_) top_data[i * dim + gt] += bias_;
  }
}

template <typename Dtype>
void LabelSpecificAddLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
                                                   const vector<bool>& propagate_down,
                                                   const vector<Blob<Dtype>*>& bottom) {       // the layer only adds a constant, so the gradient passes through unchanged: copy the top diff to the bottom diff
  if (top[0] != bottom[0] && propagate_down[0]) {
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int count = bottom[0]->count();
    caffe_copy(count, top_diff, bottom_diff);
  }
}


#ifdef CPU_ONLY
STUB_GPU(LabelSpecificAddLayer);
#endif

INSTANTIATE_CLASS(LabelSpecificAddLayer);
REGISTER_LAYER_CLASS(LabelSpecificAdd);

}  // namespace caffe
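
To experiment with the core of `Forward_cpu` outside Caffe, the two steps can be written as standalone functions. This is a sketch under assumptions: the names `annealed_bias` and `label_specific_add` are hypothetical, plain `std::vector<double>` stands in for Caffe blobs, and the values used to call them are illustrative rather than settings from the paper.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Mirrors the annealing expression in Forward_cpu:
//   bias = bias_base + (1 + bias_gamma * iteration)^bias_power - 1,
// clamped to the range [bias_min, bias_max].
double annealed_bias(double bias_base, double bias_gamma, double bias_power,
                     int iteration, double bias_min, double bias_max) {
  double bias = bias_base + std::pow(1.0 + bias_gamma * iteration, bias_power) - 1.0;
  bias = std::max(bias, bias_min);
  bias = std::min(bias, bias_max);
  return bias;
}

// In-place label-specific add on a row-major [num x dim] score matrix.
// For each sample i, the bias (= -m) is added at the ground-truth column,
// but only when the score exceeds -bias, the same guard as in the layer.
void label_specific_add(std::vector<double>& scores,
                        const std::vector<int>& labels,
                        std::size_t dim, double bias) {
  for (std::size_t i = 0; i < labels.size(); ++i) {
    const std::size_t idx = i * dim + static_cast<std::size_t>(labels[i]);
    if (scores[idx] > -bias) scores[idx] += bias;
  }
}
```

With `bias = -0.35`, a ground-truth cosine of 0.8 becomes 0.45, while a ground-truth cosine of 0.2 is left untouched because it does not exceed 0.35. Scores at all other class positions are never modified.
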