2. Methodology
To address the challenges above, effective measures need to be taken. In this section, we first focus on the design of the loss function, which simultaneously takes care of Challenges #1, #2, and #3. Then, we detail our architecture. The whole deep network consists of a backbone subnet for predicting landmark coordinates, which specifically considers Challenge #4, as well as an auxiliary one for estimating geometric information.
2.1 Loss Function
The quality of training greatly depends on the design of the loss function, especially when the scale of training data is not sufficiently large. For penalizing errors between ground-truth landmarks $X := [x_1, ..., x_N] \in \mathbb{R}^{2 \times N}$ and predicted ones $Y := [y_1, ..., y_N] \in \mathbb{R}^{2 \times N}$, the simplest choices are arguably the $\ell_2$ and $\ell_1$ losses. However, measuring the differences of all landmark pairs equally, without considering geometric/structural information, is not so wise. For instance, given a pair $x_i$ and $y_i$ with deviation $d_i := x_i - y_i$ in the image space, if two different projections (poses with respect to a camera) are applied from the 3D real face to the 2D image, the intrinsic distances on the real face could be significantly different. Hence, integrating geometric information into the penalization helps mitigate this issue.
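The point about poses can be made concrete with a small numerical sketch. This is an illustration only, not the paper's implementation: the helper names `l2_loss`, `l1_loss`, `yaw_rotation`, and `project` are hypothetical, and the projection used here (rotate by a yaw angle, drop the depth coordinate, scale) is an assumed, simplified weak-perspective model. It takes one fixed prediction error on the 3D face and shows how the plain $\ell_2$ and $\ell_1$ losses on the 2D deviation change with pose.

```python
import numpy as np


def l2_loss(X, Y):
    """Mean squared Euclidean deviation over N landmarks; X and Y have shape (2, N)."""
    d = X - Y
    return float(np.mean(np.sum(d ** 2, axis=0)))


def l1_loss(X, Y):
    """Mean absolute deviation, summed over both coordinates, over N landmarks."""
    return float(np.mean(np.sum(np.abs(X - Y), axis=0)))


def yaw_rotation(deg):
    """3x3 rotation matrix about the vertical (y) axis."""
    t = np.deg2rad(deg)
    return np.array([
        [np.cos(t), 0.0, np.sin(t)],
        [0.0, 1.0, 0.0],
        [-np.sin(t), 0.0, np.cos(t)],
    ])


def project(points_3d, yaw_deg, scale=1.0):
    """Assumed weak-perspective projection: rotate, drop depth, scale.

    points_3d has shape (3, N); the result has shape (2, N).
    """
    R = yaw_rotation(yaw_deg)
    return scale * (R @ points_3d)[:2, :]


# One ground-truth landmark and a prediction that is off by exactly
# one unit along the face's own x-axis (the same 3D error in both poses).
true_3d = np.array([[0.0], [0.0], [0.0]])
pred_3d = np.array([[1.0], [0.0], [0.0]])

for yaw in (0.0, 60.0):
    X = project(true_3d, yaw)  # ground truth in the image
    Y = project(pred_3d, yaw)  # prediction in the image
    print(f"yaw={yaw:5.1f}  l2={l2_loss(X, Y):.3f}  l1={l1_loss(X, Y):.3f}")
```

Under this assumed model, the frontal pose gives an $\ell_2$ loss of about 1.0, while a 60-degree yaw shrinks the same 3D error to about 0.25 (and the $\ell_1$ loss from about 1.0 to 0.5), because the 2D deviation is foreshortened by the cosine of the yaw angle. A loss that ignores pose would therefore under-penalize errors on profile faces.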
For face images, the global geometric status (the 3D pose) is sufficient to determine the manner of projection. Formally, let $X$ denote the concerned locations of the 2D landmarks, which are a projection of the 3D face landmarks, i.e. $U \in \mathbb{R}^{4 \times N}$, each column of which corresponds to a 3D location $[u_i, v_i, z_i, 1]^T$. By assuming a weak perspective model as in [14], a 2 ×




