
二. APPROACH
In this section, we will describe our approach towards joint
face detection and alignment.
A. Overall Framework
The overall pipeline of our approach is shown in Fig. 1.
Given an image, we initially resize it to different scales to build
an image pyramid, which is the input of the following
three-stage cascaded framework:
Stage 1
: We exploit a fully convolutional network[?], called
Proposal Network (P-Net), to obtain the candidate windows
and their bounding box regression vectors in a similar manner
as [29]. Then we use the estimated bounding box regression
vectors to calibrate the candidates. After that, we employ
non-maximum suppression (NMS) to merge highly overlapped
candidates.
Stage 2
: all candidates are fed to another CNN, called Refine
Network (R-Net), which further rejects a large number of false
candidates, performs calibration with bounding box regression,
and NMS candidate merge.
Stage 3
: This stage is similar to the second stage, but in this
stage we aim to describe the face in more details. In particular,

最低0.47元/天 解锁文章
756

被折叠的 条评论
为什么被折叠?



