Face Detection: Supervised Transformer Network for Efficient Face Detection

Supervised Transformer Network for Efficient Face Detection
ECCV 2016

Face detection: a cascaded network, trained end-to-end, that jointly conducts face detection and face alignment.
Our detector runs at 30 FPS on a single CPU core for a VGA-resolution image.

2 Network Architecture
2.1 Overview
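The network consists of two main modules:
1) The first module is a multi-task Region Proposal Network (RPN). It produces face candidate regions together with the corresponding facial landmarks. Within each local neighborhood, only the top K candidate regions are kept and all others are discarded.
2) The second module is a Supervised Transformer layer followed by an RCNN. The Transformer layer takes a face region and its landmarks as input and maps the face to a canonical pose, i.e. it frontalizes the face. This is done by landmark alignment: the facial landmarks and the canonical positions are in one-to-one correspondence. Finally, the aligned face region is fed to the RCNN network for binary face / non-face classification.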

2.2 Multi-task RPN
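The multi-task RPN is inspired by [16]. It performs face detection and facial landmark localization at the same time. Our method is very similar to [20], except that the regression targets are facial landmark positions rather than bounding-box coordinates.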

2.3 The supervised transformer layer
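This module mainly handles the scale and rotation variation of faces.

The common approach is to train a prediction model that detects facial landmarks and then use the landmark correspondences to map the face to a canonical pose.
This procedure has at least two problems:
1) The canonical locations have to be set manually, which is not only time-consuming but also suboptimal.
2) Annotating facial landmark points in the training samples is not easy; it is a highly subjective process.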

We propose to learn both the canonical positions and the prediction of the facial landmarks end-to-end from the network, with additional supervision information from the classification objective of the RCNN, using end-to-end back propagation.

In other words, both the canonical positions and the landmark predictions are learned rather than specified by hand.

The paper then gives the formula derivation.
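To make the idea of landmark-based alignment concrete, here is a minimal sketch of a least-squares similarity fit (scale, rotation, translation) between detected landmarks and canonical positions. It is an illustration only, not the paper's derivation: the function names `fit_similarity` and `apply_similarity` are hypothetical, and the canonical positions are passed in as fixed values here, whereas the paper learns them. Each 2D point is treated as a complex number so that scale plus rotation becomes a single complex multiplication.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points onto dst points.

    Points are (x, y) tuples. Returns (z, t) as complex numbers such that
    dst ~= z * src + t, where z encodes scale and rotation, t the translation.
    Assumes the src points are not all identical.
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    # Closed-form solution of min_z sum |z * (p_i - mean_p) - (q_i - mean_q)|^2
    num = sum((a - mean_p).conjugate() * (b - mean_q) for a, b in zip(p, q))
    den = sum(abs(a - mean_p) ** 2 for a in p)
    z = num / den
    t = mean_q - z * mean_p
    return z, t


def apply_similarity(z, t, pts):
    """Apply the transform (z, t) to a list of (x, y) points."""
    out = []
    for x, y in pts:
        w = z * complex(x, y) + t
        out.append((w.real, w.imag))
    return out
```

In use, one would fit the transform from the landmarks predicted by the RPN to the canonical positions and then warp the face region with it before classification. The same `(z, t)` pair can be inverted to sample the source image on the canonical grid.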

2.4 Non-top K suppression
Keep only the K candidate regions with the highest confidence for each potential face; the remaining candidates in that neighborhood are discarded.
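The selection rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: candidates are grouped into "potential faces" by IoU overlap with the highest-scoring box of each group, and the names `iou`, `non_top_k_suppression`, `k`, and `iou_thresh` are hypothetical. Unlike ordinary non-maximum suppression, which keeps one box per group, this keeps up to K.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_top_k_suppression(boxes, scores, k=3, iou_thresh=0.5):
    """Return indices of the boxes kept: at most k per group of overlapping boxes."""
    # Visit candidates from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    groups = []  # each group is a list of indices; groups[g][0] is its seed
    for i in order:
        for g in groups:
            if iou(boxes[i], boxes[g[0]]) >= iou_thresh:
                # Same potential face: keep only while the group has room.
                if len(g) < k:
                    g.append(i)
                break
        else:
            # No overlapping seed found: this box starts a new potential face.
            groups.append([i])
    return [i for g in groups for i in g]
```

Because candidates are visited in descending score order, the first K members assigned to a group are exactly its K most confident boxes.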

2.5 Multi-granularity feature combination
Combining features at multiple scales helps improve system performance, so here the RPN features and the RCNN features are combined.

3 The ROI convolution
3.1 Motivation
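Speeding up CNN computation on a CPU is an important problem; the convolutional layers account for roughly 90% of the network's total computation.
The main acceleration idea here is to use a standard cascade face detector to quickly reject non-face regions and produce a binary ROI mask. The ROI mask has the same size as the input image, with 0 for background and 1 for face regions. The DNN convolution is computed only where the mask is 1.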
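The masking idea can be illustrated with a small pure-Python sketch. It is a toy single-channel version under stated assumptions (zero padding, stride 1, output left at 0 outside the mask), and the function name `roi_conv2d` is hypothetical; the paper's implementation is not reproduced here. The `evaluated` counter shows how much work the mask saves.

```python
def roi_conv2d(image, kernel, mask):
    """Correlate image with kernel only at positions where mask is 1.

    image and mask are lists of rows with the same height and width; kernel
    has odd height and width. Returns (output, number_of_positions_evaluated).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    evaluated = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # background pixel: skip the inner product entirely
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Pixels outside the image are treated as zero padding.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
            evaluated += 1
    return out, evaluated
```

The cost scales with the number of mask pixels set to 1 rather than with the full image area, which is where the speed-up comes from when faces cover only a small part of the frame.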

3.2 Implementation details
Cascade pre-filter: essentially an enhanced Viola-Jones detector, with more weak classifiers and more training data.

ROI convolution
It mainly uses the binary mask to speed up the convolution computation.
The original DNN detector can run at 50 FPS on a GPU and 10 FPS on a CPU for a VGA image. With ROI convolution, it speeds up to 30 FPS on the CPU with little accuracy loss.