READING NOTE: Face Detection with End-to-End Integration of a ConvNet and a 3D Model

TITLE: Face Detection with End-to-End Integration of a ConvNet and a 3D Model

AUTHOR: Yunzhu Li, Benyuan Sun, Tianfu Wu, Yizhou Wang

ASSOCIATION: Peking University, North Carolina State University

FROM: arXiv:1606.00850

CONTRIBUTIONS

  1. It presents a simple yet effective method that integrates a ConvNet and a 3D model in end-to-end learning with a multi-task loss for face detection in the wild.
  2. It addresses two limitations in adapting the state-of-the-art Faster R-CNN to face detection: it eliminates the heuristic design of anchor boxes by leveraging a 3D model, and it replaces the generic, predefined RoI pooling with a configuration pooling that exploits the underlying structural configuration of the object.
  3. It obtains very competitive, state-of-the-art performance on the FDDB and AFW benchmarks.

METHOD

The main inference pipeline is shown in the following figure.

The input image is fed into a ConvNet (e.g. VGG) with an upsampling layer. The network then generates face proposals, each scored by the sum of the log probabilities of its keypoints, whose locations are predicted using the predefined 3D face model.
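
The proposal scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the 3D transformation is a weak-perspective camera (scale, rotation, translation), that keypoint probability maps are nested lists indexed as `[keypoint][row][col]`, and that each projected point is looked up at its nearest pixel. The function names `project_weak_perspective` and `proposal_score` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def project_weak_perspective(
    model_pts: Sequence[Point3D],
    scale: float,
    rotation: Sequence[Sequence[float]],
    translation: Point2D,
) -> List[Point2D]:
    """Project 3D model keypoints onto the image plane.

    ASSUMPTION (not from the paper): a weak-perspective camera,
    u = s * (R[0] . X) + t_x and v = s * (R[1] . X) + t_y, where R is a
    3x3 row-major rotation matrix and only its first two rows are used.
    """
    out = []
    for p in model_pts:
        u = scale * sum(r * c for r, c in zip(rotation[0], p)) + translation[0]
        v = scale * sum(r * c for r, c in zip(rotation[1], p)) + translation[1]
        out.append((u, v))
    return out


def proposal_score(
    prob_maps: Sequence[Sequence[Sequence[float]]],
    points: Sequence[Point2D],
    eps: float = 1e-12,
) -> float:
    """Score a face proposal as the sum of log keypoint probabilities.

    prob_maps[k][row][col] is the predicted probability that pixel
    (col, row) is keypoint k; points[k] is the projected (u, v) of keypoint k.
    Locations are rounded to a pixel and clamped to the map borders.
    """
    score = 0.0
    for k, (u, v) in enumerate(points):
        grid = prob_maps[k]
        row = min(max(int(round(v)), 0), len(grid) - 1)
        col = min(max(int(round(u)), 0), len(grid[0]) - 1)
        # Clamp to eps so a zero probability does not produce log(0).
        score += math.log(max(grid[row][col], eps))
    return score


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Two made-up model keypoints; the projection drops the z-coordinate.
    pts = project_weak_perspective(
        [(0.0, 0.0, 0.0), (1.0, 0.0, 5.0)], 1.0, identity, (2.0, 2.0)
    )
    maps = [[[0.1] * 5 for _ in range(5)] for _ in range(2)]
    maps[0][2][2] = 0.9  # keypoint 0 is confident at pixel (2, 2)
    maps[1][2][3] = 0.8  # keypoint 1 is confident at pixel (3, 2)
    print(pts, proposal_score(maps, pts))
```

In this toy example the keypoints land on the two confident pixels, so the score is log(0.9) + log(0.8); a proposal whose projected keypoints fall on low-probability pixels scores lower.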

SOME DETAILS

  1. The classification loss over keypoint labels is defined as

    $$L_{cls}(\omega) = -\frac{1}{2m}\sum_{i=1}^{2m}\log\left(p_{x_i}^{l_i}\right)$$

    where $\omega$ denotes the learnable weights of the ConvNet, $m$ is the number of keypoints, and $p_{x_i}^{l_i}$ is the predicted probability that the point at location $x_i$ (obtained from the annotations) belongs to label $l_i$.

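  2. The loss on keypoint locations is defined as

    $$L_{pt}^{loc}(\omega) = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{t\in\{x,y\}}\mathrm{Smooth}\left(t_i - \hat{t}_{i,j}\right)$$

    where $\mathrm{Smooth}(\cdot)$ is the smooth L1 loss, $t_i$ is the ground-truth $x$ or $y$ coordinate of keypoint $i$, and $\hat{t}_{i,j}$ is the corresponding coordinate of keypoint $i$ predicted from the $j$-th set. For each ground-truth keypoint, a set of predicted keypoints is generated from the 3D face model and the 3D transformation parameters. With $m$ keypoints per face, this yields $m$ sets of predicted keypoints, so each keypoint receives $m$ predicted locations.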

  3. The Configuration Pooling layer is similar to the RoI Pooling layer in Faster R-CNN, but features are extracted according to the locations and relations of the keypoints rather than a predefined receptive field.
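
The two keypoint losses in the list above can be sketched in plain Python. This is a minimal, non-authoritative illustration: it assumes the standard smooth L1 with threshold 1 (as in Fast R-CNN), takes the probabilities $p_{x_i}^{l_i}$ as an already-gathered list whose length plays the role of $2m$, and stores $\hat{t}_{i,j}$ as an $m \times m$ grid of $(x, y)$ points. The note does not say how the loss terms are weighted, so `multi_task_loss` and its `loc_weight` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def keypoint_cls_loss(probs: Sequence[float], eps: float = 1e-12) -> float:
    """L_cls = -(1/2m) * sum_{i=1}^{2m} log p_{x_i}^{l_i}.

    probs[i] is the predicted probability that location x_i has its
    annotated label l_i; len(probs) plays the role of 2m.
    """
    if not probs:
        raise ValueError("need at least one sampled location")
    # Clamp to eps so a zero probability does not produce log(0).
    return -sum(math.log(max(p, eps)) for p in probs) / len(probs)


def smooth_l1(d: float) -> float:
    """Smooth L1 with threshold 1: 0.5 * d^2 if |d| < 1, else |d| - 0.5."""
    a = abs(d)
    return 0.5 * d * d if a < 1.0 else a - 0.5


def keypoint_loc_loss(gt: Sequence[Point], pred: Sequence[Sequence[Point]]) -> float:
    """L_pt^loc = (1/m^2) * sum_i sum_j sum_{t in {x,y}} Smooth(t_i - t_hat_{i,j}).

    gt[i] is the ground-truth (x, y) of keypoint i; pred[i][j] is the (x, y)
    of keypoint i predicted from the j-th set (one set per keypoint).
    """
    m = len(gt)
    if m == 0 or len(pred) != m or any(len(row) != m for row in pred):
        raise ValueError("pred must be an m x m grid of (x, y) points")
    total = 0.0
    for i in range(m):
        for j in range(m):
            for t in range(2):  # t = 0 is x, t = 1 is y
                total += smooth_l1(gt[i][t] - pred[i][j][t])
    return total / (m * m)


def multi_task_loss(
    probs: Sequence[float],
    gt: Sequence[Point],
    pred: Sequence[Sequence[Point]],
    loc_weight: float = 1.0,
) -> float:
    """HYPOTHETICAL combination of the two terms: L_cls + loc_weight * L_pt^loc."""
    return keypoint_cls_loss(probs) + loc_weight * keypoint_loc_loss(gt, pred)


if __name__ == "__main__":
    gt = [(10.0, 20.0), (30.0, 20.0)]
    pred = [[(10.5, 20.0), (12.0, 20.0)],
            [(30.0, 20.0), (30.0, 19.5)]]
    print(keypoint_loc_loss(gt, pred))  # prints 0.4375
```

Averaging over all $m \times m$ predictions means every keypoint is supervised by every transformation set, so a badly estimated transformation at one keypoint is penalized through all $m$ of the locations it predicts.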