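Today I came across this [paper](https://arxiv.org/abs/1910.06278) on arXiv, and I find it a very interesting piece of work. It uses the maximum of the heatmap and its position $m$ to estimate the mean $\mu$ of the true Gaussian distribution. This greatly reduces the quantization error, i.e. the error of up to one quantization unit introduced by downsampling.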
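The paper's experiments show that this method is more accurate than the common empirical estimate. That estimate shifts the peak by 1/4 pixel toward the second-highest neighbour, which is itself quite consistent with a Gaussian assumption.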
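To make the comparison concrete, here is a minimal 1D sketch in pure Python. It assumes a noise-free heatmap sampled from an exact Gaussian, and the function names and test values are my own. `quarter_offset` is the empirical rule above. `taylor_1d` applies $\mu = m - D^{\prime}(m)/D^{\prime\prime}(m)$ to the log-heatmap and estimates the derivatives by central differences; this discretisation is my choice, not necessarily the paper's.

```python
import math


def gaussian_heatmap_1d(mu, sigma, length):
    """Hypothetical test input: unnormalised 1D Gaussian sampled at 0..length-1."""
    return [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in range(length)]


def argmax(values):
    """Index of the largest value (the quantised peak m)."""
    return max(range(len(values)), key=values.__getitem__)


def quarter_offset(heatmap):
    """Empirical rule: move 0.25 from the peak toward the higher neighbour."""
    m = argmax(heatmap)
    left = heatmap[m - 1] if m > 0 else float("-inf")
    right = heatmap[m + 1] if m + 1 < len(heatmap) else float("-inf")
    if right > left:
        return m + 0.25
    if left > right:
        return m - 0.25
    return float(m)


def taylor_1d(heatmap):
    """mu = m - D'(m) / D''(m) on ln(heatmap), derivatives via central differences."""
    m = argmax(heatmap)
    if m == 0 or m == len(heatmap) - 1:
        return float(m)  # peak on the border: no neighbours on both sides, keep m
    p = [math.log(v) for v in heatmap[m - 1:m + 2]]  # ln values at m-1, m, m+1
    d1 = (p[2] - p[0]) / 2             # D'(m)
    d2 = p[2] - 2 * p[1] + p[0]        # D''(m)
    return m - d1 / d2
```

With `h = gaussian_heatmap_1d(10.3, 2.0, 21)`, the argmax is 10 and `quarter_offset(h)` returns 10.25. `taylor_1d(h)` returns 10.3 up to floating-point error, because the log of an exact Gaussian is quadratic and the central differences are then exact.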
Eq. (6), the first derivative:
$$\left.\mathcal{D}^{\prime}(\boldsymbol{x})\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=\left.\frac{\partial \mathcal{P}^{T}}{\partial \boldsymbol{x}}\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=-\left.\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right|_{\boldsymbol{x}=\boldsymbol{\mu}}=0$$
So $\mathcal{D}^{\prime}(\boldsymbol{x})$ is a vector with the same shape as $\boldsymbol{x}$. However, consider the Taylor expansion over the vector $\boldsymbol{\mu}$ in Eq. (7):
Eq. (7), the second-order Taylor expansion of $\mathcal{P}$ at the Gaussian mean $\mu$ around the position $m$:
$$\mathcal{P}(\boldsymbol{\mu})=\mathcal{P}(\boldsymbol{m})+\mathcal{D}^{\prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})+\frac{1}{2}(\boldsymbol{\mu}-\boldsymbol{m})^{T} \mathcal{D}^{\prime \prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$$
Look at the second term, $\mathcal{D}^{\prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$. Shouldn't $\mathcal{D}^{\prime}(\boldsymbol{m})$ be transposed so that the result is a scalar, i.e. $\mathcal{D}^{\prime}(\boldsymbol{m})^{T}(\boldsymbol{\mu}-\boldsymbol{m})$?
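A quick numerical check illustrates the point about the transpose. $\mathcal{D}^{\prime}(\boldsymbol{m})$ has the same shape as $\boldsymbol{x}$ (a 2-vector here), so the first-order term only yields a scalar as the inner product $\mathcal{D}^{\prime}(\boldsymbol{m})^{T}(\boldsymbol{\mu}-\boldsymbol{m})$. The minimal pure-Python sketch below uses that inner product in Eq. (7). The helper names and the deliberately non-diagonal example $\Sigma$ are my own assumptions.

```python
import math


def mat_inv2(a):
    """Inverse of a 2x2 matrix [[p, q], [r, s]]."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def mat_vec(a, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def dot(u, v):
    """Inner product u^T v of two 2-vectors (a scalar)."""
    return u[0] * v[0] + u[1] * v[1]


def log_gauss(x, mu, sigma):
    """P(x; mu, Sigma) = ln G for a 2D Gaussian."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    return (-math.log(2 * math.pi) - 0.5 * math.log(det)
            - 0.5 * dot(d, mat_vec(mat_inv2(sigma), d)))


def grad(x, mu, sigma):
    """D'(x) = -Sigma^{-1}(x - mu): a 2-vector, the same shape as x."""
    return [-g for g in mat_vec(mat_inv2(sigma), [x[0] - mu[0], x[1] - mu[1]])]


def hessian(sigma):
    """D''(x) = -Sigma^{-1}, the same for every x."""
    return [[-v for v in row] for row in mat_inv2(sigma)]


def taylor2(m, mu, sigma):
    """Eq. (7) with the transposed first-order term D'(m)^T (mu - m)."""
    d = [mu[0] - m[0], mu[1] - m[1]]
    first = dot(grad(m, mu, sigma), d)                 # scalar
    second = 0.5 * dot(d, mat_vec(hessian(sigma), d))  # scalar
    return log_gauss(m, mu, sigma) + first + second
```

Take `mu = [10.3, 7.6]`, `sigma = [[4.0, 1.0], [1.0, 3.0]]` and `m = [10.0, 8.0]`. Then `taylor2(m, mu, sigma)` matches `log_gauss(mu, mu, sigma)` to floating-point precision, since $\mathcal{P}$ is exactly quadratic in $\boldsymbol{x}$. The true $\boldsymbol{\mu}$ is passed in only so that $\mathcal{D}^{\prime}(\boldsymbol{m})$ can be evaluated for the check.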
Derivation
Taylor expansion, with the transpose added to the first-order term:
$$\mathcal{P}(\boldsymbol{\mu})=\mathcal{P}(\boldsymbol{m})+\mathcal{D}^{\prime}(\boldsymbol{m})^{T}(\boldsymbol{\mu}-\boldsymbol{m})+\frac{1}{2}(\boldsymbol{\mu}-\boldsymbol{m})^{T} \mathcal{D}^{\prime \prime}(\boldsymbol{m})(\boldsymbol{\mu}-\boldsymbol{m})$$
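Substitute the Gaussian expressions for $\mathcal{P}(\boldsymbol{\mu})$ and $\mathcal{P}(\boldsymbol{m})$. That is, plug $\boldsymbol{\mu}$ and $\boldsymbol{m}$ into the expression below and cancel the constant terms: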
$$\begin{aligned} \mathcal{P}(\boldsymbol{x} ; \boldsymbol{\mu}, \Sigma)=\ln (\mathcal{G})=&-\ln (2 \pi)-\frac{1}{2} \ln (|\Sigma|) \\ &-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T} \Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu}) \end{aligned}$$
This gives the derivation below. The step from the first to the second line uses $D^{\prime\prime}(m)=-\Sigma^{-1}$, so that $-\frac{1}{2}(m-\mu)^{\top}\Sigma^{-1}(m-\mu)=\frac{1}{2}(\mu-m)^{\top}D^{\prime\prime}(m)(\mu-m)$:
$$\begin{aligned} 0&=-\frac{1}{2}(m-\mu)^{\top} \Sigma^{-1}(m-\mu) +D^{\prime}(m)^{\top}(\mu-m)+\frac{1}{2}(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m)\\ -D^{\prime}(m)^{\top}(\mu-m)&=(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m) \\ -D^{\prime}(m)^{\top} &=(\mu-m)^{\top} D^{\prime \prime}(m) \\ -D^{\prime}(m)^{\top} D^{\prime \prime}(m)^{-1} &=(\mu-m)^{\top} \\ -D^{\prime\prime}(m)^{-\top} D^{\prime}(m) &=\mu-m \\ \mu &=m-D^{\prime \prime}(m)^{-\top} D^{\prime}(m) \end{aligned}$$
Since $D^{\prime \prime}(m)=-\Sigma^{-1}$, and the paper assumes a diagonal covariance matrix $\Sigma=\left[\begin{array}{ll}{\sigma^{2}} & {0} \\ {0} & {\sigma^{2}}\end{array}\right]$ (the x and y directions are independent), we have $D^{\prime \prime}(m)=D^{\prime \prime}(m)^{T}$. This $\Sigma$ is invertible because $\sigma^{2}>0$. Therefore
$$\begin{aligned}\mu &=m-D^{\prime \prime}(m)^{-\top} D^{\prime}(m) \\ \mu&=m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)\end{aligned}$$
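On a real heatmap, $D^{\prime}(m)$ and $D^{\prime\prime}(m)$ have to be estimated from discrete samples before this formula can be applied. Below is a minimal pure-Python sketch. It assumes, as the derivation does, that the log-heatmap around the peak is the quadratic $\mathcal{P}$, and that the heatmap is strictly positive there. The central-difference stencils, the function names and the synthetic test heatmap are my own choices, not taken from the paper or its code.

```python
import math


def make_heatmap(mu, sigma, h, w):
    """Hypothetical test input: isotropic Gaussian with mean mu = (x, y) on an h x w grid."""
    return [[math.exp(-((x - mu[0]) ** 2 + (y - mu[1]) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def refine(heatmap):
    """Return (x, y) = m - D''(m)^{-1} D'(m), derivatives of ln(heatmap) by central differences."""
    h, w = len(heatmap), len(heatmap[0])
    # m = (px, py): integer position of the heatmap maximum
    py, px = max(((y, x) for y in range(h) for x in range(w)),
                 key=lambda p: heatmap[p[0]][p[1]])
    if not (1 <= px < w - 1 and 1 <= py < h - 1):
        return float(px), float(py)  # peak on the border: keep m

    def P(y, x):
        return math.log(heatmap[y][x])

    # D'(m): first derivatives
    dx = (P(py, px + 1) - P(py, px - 1)) / 2
    dy = (P(py + 1, px) - P(py - 1, px)) / 2
    # D''(m): symmetric 2x2 Hessian [[dxx, dxy], [dxy, dyy]]
    dxx = P(py, px + 1) - 2 * P(py, px) + P(py, px - 1)
    dyy = P(py + 1, px) - 2 * P(py, px) + P(py - 1, px)
    dxy = (P(py + 1, px + 1) - P(py + 1, px - 1)
           - P(py - 1, px + 1) + P(py - 1, px - 1)) / 4
    det = dxx * dyy - dxy * dxy
    if det == 0:
        return float(px), float(py)  # singular Hessian: keep m
    # D''(m)^{-1} D'(m) via the closed-form 2x2 inverse
    step_x = (dyy * dx - dxy * dy) / det
    step_y = (dxx * dy - dxy * dx) / det
    return px - step_x, py - step_y
```

For an exact Gaussian the central differences are exact, so `refine(make_heatmap((12.3, 7.8), 2.0, 16, 24))` returns approximately `(12.3, 7.8)`. The plain argmax would give `(12, 8)`. If the peak already sits on a grid point, $D^{\prime}(m)=0$ and the estimate stays at $m$. Real network outputs are only approximately Gaussian, so there the result is an approximation.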
One more detail:
In the derivation above, cancelling $(\mu-m)$ in the third line assumes $\mu \neq m$. The following derivation is therefore more complete:
$$\begin{aligned} 0&=-\frac{1}{2}(m-\mu)^{\top} \Sigma^{-1}(m-\mu) +D^{\prime}(m)^{\top}(\mu-m)+\frac{1}{2}(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m)\\ -D^{\prime}(m)^{\top}(\mu-m)&=(\mu-m)^{\top} D^{\prime \prime}(m)(\mu-m) \\ -D^{\prime}(m)^{\top} D^{\prime \prime}(m)^{-1} (\mu-m)&=(\mu-m)^{\top}(\mu-m) \\ 0 &=[\mu-m+D^{\prime \prime}(m)^{-\top} D^{\prime}(m)]^{\top}(\mu-m) \\ 0 &=[\mu-m+D^{\prime \prime}(m)^{-1} D^{\prime}(m)]^{\top}(\mu-m) \end{aligned}$$
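In 1D the factorised form can be checked directly. With $D^{\prime\prime}(m)=-1/\sigma^{2}$, the first line of the derivation is a quadratic in the unknown $\mu$. Its roots are $\mu=m$ and $\mu=m-D^{\prime\prime}(m)^{-1}D^{\prime}(m)$. Below is a minimal pure-Python sketch, assuming a 1D Gaussian with known $\sigma$. The names and numbers are hypothetical, and $D^{\prime}(m)$ is computed from a chosen true mean only to produce test data.

```python
def residual(mu, m, d1, sigma):
    """First line of the derivation in 1D, as a function of the unknown mu."""
    d2 = -1.0 / sigma ** 2  # D''(m) = -Sigma^{-1} in 1D
    return -0.5 * (m - mu) ** 2 / sigma ** 2 + d1 * (mu - m) + 0.5 * d2 * (mu - m) ** 2


def roots(m, d1, sigma):
    """Both roots of residual(mu) = 0: mu = m and mu = m - D''(m)^{-1} D'(m)."""
    d2 = -1.0 / sigma ** 2
    return m, m - d1 / d2


# Hypothetical example: true mean 10.3, sigma 2, heatmap peak at m = 10.
true_mu, sigma, m = 10.3, 2.0, 10.0
d1 = -(m - true_mu) / sigma ** 2  # D'(m) = -Sigma^{-1}(m - mu)
print(roots(m, d1, sigma))
```

Note that $\mu=m$ solves this equation for any $m$, so the equation alone cannot rule it out. It is the correct answer exactly when $D^{\prime}(m)=0$, and in that case the two roots coincide.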
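This derivation rests on two assumptions:
(1) The values of the downsampled heatmap follow a Gaussian distribution centred at the true keypoint location.
(2) The second-order Taylor expansion is an adequate approximation.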
Then $\mu =m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)$ also covers the case $\mu=m$. When $D^{\prime}(m)=0$ the correction term vanishes and the formula returns $m$ itself, and
$$D^{\prime}(m)=0 \Leftrightarrow m \text{ is at the mean of the Gaussian} \Leftrightarrow \mu=m$$
So $\mu =m-D^{\prime \prime}(m)^{-1} D^{\prime}(m)$ is complete.
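For comparison, here is a shorter route to the same formula. It is the standard Newton-step argument, not necessarily how the paper derives it. Expand the gradient $D^{\prime}$ (instead of $\mathcal{P}$) to first order around $m$, and use $D^{\prime}(\mu)=0$ from Eq. (6). Because $D^{\prime\prime}(x)=-\Sigma^{-1}$ is constant for a Gaussian, this expansion is exact. It never divides by $(\mu-m)$, so the case $\mu=m$ needs no separate treatment:

$$\begin{aligned} 0=D^{\prime}(\mu)&=D^{\prime}(m)+D^{\prime\prime}(m)(\mu-m)\\ \mu&=m-D^{\prime\prime}(m)^{-1}D^{\prime}(m)\end{aligned}$$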
If anything here is wrong, please point it out~
The original author also gave an explanation of the formulas: http://www.ilovepose.cn/t/99