Reference:
https://blog.youkuaiyun.com/dcz1994/article/details/88837760
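A conditional random field can be characterized by a Gibbs distribution:

$$P(\mathbf{X} \mid \mathbf{I})=\frac{1}{Z(\mathbf{I})} \exp \left(-\sum_{c \in \mathcal{C}_{\mathcal{G}}} \phi_{c}\left(\mathbf{X}_{c} \mid \mathbf{I}\right)\right)$$

where $Z(\mathbf{I})$ is the normalizing constant (partition function), $\mathcal{C}_{\mathcal{G}}$ is the set of cliques of the graph $\mathcal{G}$, and $\phi_c$ is the potential on clique $c$.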
The labeling is taken to be the $\mathbf{x}$ that maximizes the posterior probability of the field (MAP):

$$\mathbf{x}^{*}=\arg\max_{\mathbf{x} \in \mathcal{L}^{N}} P(\mathbf{x} \mid \mathbf{I})$$
The Gibbs energy of the whole field is:

$$E(\mathbf{x})=\sum_{i} \psi_{u}\left(x_{i}\right)+\sum_{i<j} \psi_{p}\left(x_{i}, x_{j}\right)$$
where $\psi_{u}(x_{i})$ and $\psi_{p}(x_{i},x_{j})$ are the potentials on the unary and pairwise cliques, respectively (so that $P(\mathbf{x}\mid\mathbf{I}) \propto \exp(-E(\mathbf{x}))$).
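To make the Gibbs distribution, the energy, and the MAP labeling concrete, here is a minimal brute-force sketch in Python (NumPy). It is only an illustration under assumptions: the array layout of `unary` and `pairwise`, the helper names, and the toy numbers are hypothetical, and enumerating all $L^N$ labelings is feasible only for a handful of pixels, which is exactly why approximate inference is needed.

```python
import itertools
import numpy as np

def energy(x, unary, pairwise):
    """Gibbs energy E(x) = sum_i psi_u(x_i) + sum_{i<j} psi_p(x_i, x_j).

    unary:    (N, L) array, unary[i, l] = psi_u(x_i = l)
    pairwise: (N, N, L, L) array, pairwise[i, j, a, b] = psi_p(x_i = a, x_j = b)
    """
    n = len(x)
    e = sum(unary[i, x[i]] for i in range(n))
    e += sum(pairwise[i, j, x[i], x[j]] for i in range(n) for j in range(i + 1, n))
    return e

def gibbs_and_map(unary, pairwise):
    """Enumerate all L**N labelings; return them, P(x) = exp(-E(x)) / Z, and the MAP labeling."""
    n, num_labels = unary.shape
    labelings = list(itertools.product(range(num_labels), repeat=n))
    energies = np.array([energy(x, unary, pairwise) for x in labelings])
    unnorm = np.exp(-(energies - energies.min()))  # shift by the minimum for numerical stability
    probs = unnorm / unnorm.sum()                  # divide by the partition function Z
    x_map = labelings[int(np.argmax(probs))]
    return labelings, probs, x_map

# Toy usage: 3 pixels, 2 labels, Potts pairwise term (cost 1 whenever two labels differ)
unary = np.array([[0.0, 2.0], [1.0, 0.5], [0.0, 1.5]])
potts = np.ones((3, 3, 2, 2)) * (1 - np.eye(2))
labelings, probs, x_map = gibbs_and_map(unary, potts)
```

With the Potts pairwise term the middle pixel is pulled to label 0 even though its unary term prefers label 1; with the pairwise term set to zero, the MAP is simply the per-pixel unary minimum.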
Consider the pairwise potential:

$$\psi_{p}\left(x_{i}, x_{j}\right)=\mu\left(x_{i}, x_{j}\right) \underbrace{\sum_{m=1}^{K} w^{(m)} k^{(m)}\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right)}_{k\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right)}$$
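This is the potential function of a single pairwise clique in the graphical model. Does $K$ mean there are $K$ Gaussian kernels in total? Yes: each $k^{(m)}$ is a Gaussian kernel $k^{(m)}(\mathbf{f}_i,\mathbf{f}_j)=\exp\left(-\frac{1}{2}(\mathbf{f}_i-\mathbf{f}_j)^{\top}\Lambda^{(m)}(\mathbf{f}_i-\mathbf{f}_j)\right)$ on the feature vectors $\mathbf{f}_i,\mathbf{f}_j$, weighted by $w^{(m)}$, and $K$ is the number of kernels ($K=2$ in the segmentation model below).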
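A small sketch of this pairwise potential, assuming (as in the fully connected CRF paper) that each $k^{(m)}$ is a Gaussian kernel with a positive-definite precision matrix $\Lambda^{(m)}$. The function names and argument layout are hypothetical:

```python
import numpy as np

def gaussian_kernel(f_i, f_j, precision):
    """k^(m)(f_i, f_j) = exp(-0.5 * (f_i - f_j)^T Lambda^(m) (f_i - f_j))."""
    d = np.asarray(f_i, dtype=float) - np.asarray(f_j, dtype=float)
    return float(np.exp(-0.5 * d @ precision @ d))

def pairwise_potential(x_i, x_j, f_i, f_j, mu, weights, precisions):
    """psi_p(x_i, x_j) = mu(x_i, x_j) * sum_m w^(m) k^(m)(f_i, f_j).

    mu:         (L, L) label compatibility matrix
    weights:    length-K list of w^(m)
    precisions: length-K list of (D, D) precision matrices Lambda^(m)
    """
    k = sum(w * gaussian_kernel(f_i, f_j, lam) for w, lam in zip(weights, precisions))
    return mu[x_i, x_j] * k

# Toy usage: K = 2 kernels on 2-D features, Potts compatibility
mu = 1.0 - np.eye(2)
lams = [np.eye(2), np.eye(2)]
value = pairwise_potential(0, 1, [0, 0], [1, 0], mu, [2.0, 3.0], lams)
```

The kernel mixture measures how similar the two features are, while `mu` decides whether that similarity is penalized for the given label pair; with Potts, equal labels cost nothing.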
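$\mu(x_i,x_j)$ is the label compatibility function (a common choice is the Potts model, $\mu(x_i,x_j)=[x_i\neq x_j]$). For multi-class image segmentation, contrast-sensitive two-kernel potentials are used, where $I_i$ and $I_j$ are color vectors and $p_i$ and $p_j$ are positions: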
$$k\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right)=\underbrace{w^{(1)} \exp \left(-\frac{\left|p_{i}-p_{j}\right|^{2}}{2 \theta_{\alpha}^{2}}-\frac{\left|I_{i}-I_{j}\right|^{2}}{2 \theta_{\beta}^{2}}\right)}_{\text{appearance kernel}}+w^{(2)} \underbrace{\exp \left(-\frac{\left|p_{i}-p_{j}\right|^{2}}{2 \theta_{\gamma}^{2}}\right)}_{\text{smoothness kernel}}$$
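The two-kernel potential can be evaluated directly. The sketch below is illustrative: the function name and the parameter values in the usage lines are assumptions chosen only to show the behavior, not settings taken from the paper.

```python
import numpy as np

def two_kernel(p_i, p_j, I_i, I_j, w1, w2, theta_alpha, theta_beta, theta_gamma):
    """k(f_i, f_j) = w1 * appearance kernel + w2 * smoothness kernel."""
    p_d2 = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)  # |p_i - p_j|^2
    I_d2 = np.sum((np.asarray(I_i, float) - np.asarray(I_j, float)) ** 2)  # |I_i - I_j|^2
    appearance = np.exp(-p_d2 / (2 * theta_alpha**2) - I_d2 / (2 * theta_beta**2))
    smoothness = np.exp(-p_d2 / (2 * theta_gamma**2))
    return w1 * appearance + w2 * smoothness

# Illustrative (hypothetical) parameters
params = dict(w1=10.0, w2=3.0, theta_alpha=80.0, theta_beta=13.0, theta_gamma=3.0)
same_color = two_kernel([0, 0], [1, 0], [200, 0, 0], [200, 0, 0], **params)
diff_color = two_kernel([0, 0], [1, 0], [200, 0, 0], [0, 0, 200], **params)
```

Two neighboring pixels with similar colors get a larger kernel value (stronger coupling) than two neighbors with very different colors, which is what makes the potential contrast-sensitive; the smoothness term alone ignores color.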
Efficient Inference in Fully Connected CRFs
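Approximate the original distribution $P(X)$ with a distribution $Q(X)$ chosen to minimize the KL divergence $D(Q\|P)$. In the mean-field approximation, $Q$ factorizes over the variables: $Q(X)=\prod_i Q_i(X_i)$.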
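A minimal sketch of the mean-field family and the KL objective on a toy problem, assuming $Q$ is the fully factorized product of per-variable marginals. The helper names and the toy distribution are hypothetical:

```python
import itertools
import numpy as np

def factorized_q(marginals):
    """Joint Q(x) = prod_i Q_i(x_i) over all labelings, from an (N, L) array of marginals."""
    n, num_labels = marginals.shape
    labelings = list(itertools.product(range(num_labels), repeat=n))
    q = np.array([np.prod([marginals[i, x[i]] for i in range(n)]) for x in labelings])
    return labelings, q

def kl_divergence(q, p):
    """D(Q || P) = sum_x Q(x) log(Q(x) / P(x)), skipping terms with Q(x) = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Toy P over two binary variables that prefers equal labels (ordering (0,0),(0,1),(1,0),(1,1))
p = np.array([0.4, 0.1, 0.1, 0.4])
labelings, q = factorized_q(np.array([[0.5, 0.5], [0.5, 0.5]]))
kl = kl_divergence(q, p)
```

Because this $P$ couples the two variables, no factorized $Q$ can reach $D(Q\|P)=0$; the uniform marginals give $\log 1.25$.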
The derivation follows the post "FCN(5): DenseCRF derivation".
I've copied it over directly because that makes for convenient notes, haha.
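The goal of the variational inference below is to find a distribution $Q(x)$ that approximates $P(x)$, reducing the computational complexity of inference. The derivation shows that $Q$ has to be found by iterative updates. The CRF's parameters include $\theta$ and $w$; learning them requires a separate algorithm.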
We first write down the Gibbs distribution of the dense CRF:

$$P(X)=\frac{1}{Z} \tilde{P}(X)=\frac{1}{Z} \exp \left(-\sum_{i} \psi_{u}\left(x_{i}\right)-\sum_{i<j} \psi_{p}\left(x_{i}, x_{j}\right)\right)$$
$$D(Q \| P)=\sum_{x} Q(x) \log \left(\frac{Q(x)}{P(x)}\right)=-\sum_{x} Q(x) \log P(x)+\sum_{x} Q(x) \log Q(x)$$

$$=-E_{X \sim Q}[\log P(X)]+E_{X \sim Q}[\log Q(X)]$$

$$=-E_{X \sim Q}[\log \tilde{P}(X)]+E_{X \sim Q}[\log Z]+\sum_{i} E_{X_{i} \sim Q_{i}}\left[\log Q_{i}\left(X_{i}\right)\right]$$

$$=-E_{X \sim Q}[\log \tilde{P}(X)]+\log Z+\sum_{i} E_{X_{i} \sim Q_{i}}\left[\log Q_{i}\left(X_{i}\right)\right]$$
Since we are solving for $Q$ and the $\log Z$ term does not depend on $Q$, that term can be dropped.
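The decomposition can be checked numerically. This sketch uses arbitrary random values for $\log\tilde P$ and for the factorized marginals on a three-variable toy model, and compares the direct KL with the three-term form; the $\log Z$ term is the same for every $Q$, which is why it can be dropped when optimizing.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, num_labels = 3, 2
labelings = list(itertools.product(range(num_labels), repeat=n))

# Arbitrary unnormalized log-probabilities log P~(x) for a toy model
log_p_tilde = -rng.uniform(0.0, 3.0, size=len(labelings))
log_z = np.log(np.sum(np.exp(log_p_tilde)))
log_p = log_p_tilde - log_z

# Factorized Q(x) = prod_i Q_i(x_i) with random marginals (each row sums to 1)
marginals = rng.dirichlet(np.ones(num_labels), size=n)
q = np.array([np.prod([marginals[i, x[i]] for i in range(n)]) for x in labelings])

# Direct definition: D(Q||P) = sum_x Q(x) (log Q(x) - log P(x))
kl_direct = float(np.sum(q * (np.log(q) - log_p)))

# Three-term form: -E_Q[log P~] + log Z + sum_i E_{Q_i}[log Q_i]
kl_decomposed = float(-np.sum(q * log_p_tilde) + log_z
                      + np.sum(marginals * np.log(marginals)))
```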
Here $Q_i(x_i)$ is the probability, given the current input, that variable (pixel) $i$ takes the label $x_i$.
$Q$ must also satisfy probability normalization:

$$\sum_{x_{i}} Q_{i}\left(x_{i}\right)=1$$
Introducing a Lagrange multiplier $\lambda$ for this constraint, we get

$$L\left(Q_{i}\right)=-E_{X \sim Q}[\log \tilde{P}(X)]+\sum_{i} E_{x_{i} \sim Q_{i}}\left[\log Q_{i}\left(x_{i}\right)\right]+\lambda\left(\sum_{x_{i}} Q_{i}\left(x_{i}\right)-1\right)$$
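The last two terms of this expression are relatively simple, but the first term is more involved, so we treat it separately.
In the expansion of $D(Q\|P)$ above, this term is $-\sum_{x} Q(x) \log \tilde{P}(x)$ (the $\log P$ term with $\log Z$ split off). Expanding the factorized $Q$: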
$$-E_{X \sim Q}[\log \tilde{P}(X)]=-\int \prod_{j} Q_{j}\left(x_{j}\right)[\log \tilde{P}(X)]\, dX$$

$$=-\int Q_{i}\left(x_{i}\right) \prod_{j \neq i} Q_{j}\left(x_{j}\right)[\log \tilde{P}(X)]\, d x_{i}\, d \bar{X}$$

$$=-\int Q_{i}\left(x_{i}\right) E_{\bar{X} \sim Q}\left[\log \tilde{P}(X) \mid x_{i}\right] d x_{i}$$

where $\bar{X}$ denotes all variables except $x_i$ (for discrete labels, the integrals are sums over labelings).
With the terms rearranged this way, we can take the partial derivative with respect to $Q_i(x_i)$ (the derivative of $Q_i \log Q_i$ is $\log Q_i + 1$):

$$\frac{\partial L\left(Q_{i}\right)}{\partial Q_{i}\left(x_{i}\right)}=-E_{\bar{X} \sim Q}\left[\log \tilde{P}(X) \mid x_{i}\right]+\log Q_{i}\left(x_{i}\right)+1+\lambda$$

Setting the derivative to zero gives the extremum:

$$Q_{i}\left(x_{i}\right)=\exp (-\lambda-1) \exp \left(E_{\bar{X} \sim Q}\left[\log \tilde{P}(X) \mid x_{i}\right]\right)$$

Since the factor $\exp(-\lambda-1)$ is the same for every value of $x_i$, we treat it as a constant that cancels when $Q_i$ is renormalized, so the $Q$ function becomes:

$$Q_{i}\left(x_{i}\right)=\frac{1}{Z_{1}} \exp \left(E_{\bar{X} \sim Q}\left[\log \tilde{P}(X) \mid x_{i}\right]\right)$$
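The update can be checked by brute force on a toy model: compute $a(l)=E_{\bar X\sim Q}[\log\tilde P(X)\mid x_i=l]$ by enumeration, set $Q_i(l)\propto\exp(a(l))$, and watch the KL divergence. This is a sketch under stated assumptions (random toy values for $\log\tilde P$, hypothetical helper names). Each coordinate update exactly minimizes the KL over $Q_i$ with the other marginals fixed, so the KL should never increase.

```python
import itertools
import numpy as np

def conditional_expectation(log_p_tilde, labelings, marginals, i):
    """a(l) = E_{X_bar ~ Q}[log P~(X) | x_i = l], computed by enumerating all labelings."""
    n, num_labels = marginals.shape
    a = np.zeros(num_labels)
    for x, lp in zip(labelings, log_p_tilde):
        # Probability of the other variables under Q: prod_{j != i} Q_j(x_j)
        w = np.prod([marginals[j, x[j]] for j in range(n) if j != i])
        a[x[i]] += w * lp
    return a

def coordinate_update(log_p_tilde, labelings, marginals, i):
    """Set Q_i(l) = exp(a(l)) / Z_1; the constant exp(-lambda - 1) is absorbed by renormalizing."""
    a = conditional_expectation(log_p_tilde, labelings, marginals, i)
    q_i = np.exp(a - a.max())  # shift by the max for numerical stability
    new = marginals.copy()
    new[i] = q_i / q_i.sum()
    return new

def kl_to_p(log_p_tilde, labelings, marginals):
    """D(Q || P) for the factorized Q, with P = P~ / Z computed by enumeration."""
    log_p = log_p_tilde - np.log(np.sum(np.exp(log_p_tilde)))
    q = np.array([np.prod([marginals[k, x[k]] for k in range(len(x))]) for x in labelings])
    return float(np.sum(q * (np.log(q) - log_p)))

# Toy model: 3 binary variables with arbitrary (random) unnormalized log-probabilities
rng = np.random.default_rng(1)
n, num_labels = 3, 2
labelings = list(itertools.product(range(num_labels), repeat=n))
log_p_tilde = -rng.uniform(0.0, 3.0, size=len(labelings))

marginals = np.full((n, num_labels), 1.0 / num_labels)  # start from uniform Q_i
history = [kl_to_p(log_p_tilde, labelings, marginals)]
for sweep in range(5):
    for i in range(n):
        marginals = coordinate_update(log_p_tilde, labelings, marginals, i)
        history.append(kl_to_p(log_p_tilde, labelings, marginals))
```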
Substituting the definition of $\tilde{P}$ from the start of the derivation, and keeping only the terms that involve $x_i$ (the remaining unary and pairwise terms are constant in $x_i$ and are absorbed into $Z_1$), we get

$$Q_{i}\left(x_{i}\right)=\frac{1}{Z_{1}} \exp \left(-E_{\bar{X} \sim Q}\left[\psi_{u}\left(x_{i}\right)+\sum_{j \neq i} \psi_{p}\left(x_{i}, x_{j}\right) \,\middle|\, x_{i}\right]\right)$$
Because $x_i$ is held fixed inside the conditional expectation, we obtain the result given in the paper's supplementary material (with slightly different variable names):

$$Q_{i}\left(x_{i}=l\right)=\frac{1}{Z_{i}} \exp \left[-\psi_{u}(l)-\sum_{j \neq i} E_{X_j \sim Q_{j}}\left[\psi_{p}\left(l, X_{j}\right)\right]\right]$$
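This step can also be checked numerically. For a toy model with random symmetric pairwise tables and random marginals, the brute-force conditional expectation $E_{\bar X\sim Q}[\log\tilde P(X)\mid x_i=l]$ with $\log\tilde P=-E$ should differ from $-\psi_u(l)-\sum_{j\neq i}E_{X_j\sim Q_j}[\psi_p(l,X_j)]$ only by a constant that does not depend on $l$, and that constant disappears into $Z_i$. All sizes and values below are arbitrary assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, num_labels, i = 4, 3, 1   # toy sizes; i is the variable being updated
unary = rng.uniform(0.0, 2.0, size=(n, num_labels))                 # psi_u
pair = rng.uniform(0.0, 1.0, size=(n, n, num_labels, num_labels))   # psi_p tables
pair = (pair + pair.transpose(1, 0, 3, 2)) / 2   # enforce psi_p(x_i, x_j) = psi_p(x_j, x_i)
marginals = rng.dirichlet(np.ones(num_labels), size=n)              # Q_j

def energy(x):
    """E(x) = sum_k psi_u(x_k) + sum_{k<j} psi_p(x_k, x_j)."""
    return (sum(unary[k, x[k]] for k in range(n))
            + sum(pair[k, j, x[k], x[j]] for k in range(n) for j in range(k + 1, n)))

# Brute force: E_{X_bar ~ Q}[log P~(X) | x_i = l] with log P~(X) = -E(X)
lhs = np.zeros(num_labels)
for x in itertools.product(range(num_labels), repeat=n):
    w = np.prod([marginals[j, x[j]] for j in range(n) if j != i])
    lhs[x[i]] -= w * energy(x)

# Closed form: -psi_u(l) - sum_{j != i} sum_{l'} Q_j(l') psi_p(l, l')
rhs = np.array([-unary[i, l] - sum(marginals[j] @ pair[i, j, l] for j in range(n) if j != i)
                for l in range(num_labels)])

offset = lhs - rhs  # expected to be the same for every label l
```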
Expanding further gives

$$=\frac{1}{Z_{i}} \exp \left[-\psi_{u}(l)-\sum_{m=1}^{K} w^{(m)} \sum_{j \neq i} E_{X_j \sim Q_{j}}\left[\mu\left(l, X_{j}\right) k^{(m)}\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right)\right]\right]$$

$$=\frac{1}{Z_{i}} \exp \left[-\psi_{u}(l)-\sum_{m=1}^{K} w^{(m)} \sum_{j \neq i} \sum_{l^{\prime} \in \mathcal{L}} Q_{j}\left(l^{\prime}\right) \mu\left(l, l^{\prime}\right) k^{(m)}\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right)\right]$$

$$=\frac{1}{Z_{i}} \exp \left[-\psi_{u}(l)-\sum_{l^{\prime} \in \mathcal{L}} \mu\left(l, l^{\prime}\right) \sum_{m=1}^{K} w^{(m)} \sum_{j \neq i} Q_{j}\left(l^{\prime}\right) k^{(m)}\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right)\right]$$
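To see that reordering the sums changes nothing, this sketch computes the pairwise term for one pixel with literal loops exactly as written in the last line, and for all pixels at once as a matrix product (kernel matrix with a zero diagonal, times $Q$, times $\mu^{\top}$). The Gaussian kernels with precision matrices, the random toy data, and all names are illustrative assumptions; the matrix version still costs $O(N^2)$, which is the cost that Gaussian filtering is meant to avoid.

```python
import numpy as np

def pairwise_term_loop(Q, feats, mu, weights, precisions, i, l):
    """sum_{l'} mu(l, l') sum_m w^(m) sum_{j != i} Q_j(l') k^(m)(f_i, f_j), with literal loops."""
    n, num_labels = Q.shape
    total = 0.0
    for lp in range(num_labels):
        inner = 0.0
        for w, lam in zip(weights, precisions):
            for j in range(n):
                if j == i:
                    continue
                d = feats[i] - feats[j]
                inner += w * np.exp(-0.5 * d @ lam @ d) * Q[j, lp]
        total += mu[l, lp] * inner
    return total

def pairwise_term_matrix(Q, feats, mu, weights, precisions):
    """Same quantity for every (i, l): (sum_m w^(m) K^(m) with zero diagonal) @ Q @ mu^T."""
    diff = feats[:, None, :] - feats[None, :, :]          # (N, N, D) pairwise feature differences
    K = np.zeros((len(feats), len(feats)))
    for w, lam in zip(weights, precisions):
        K += w * np.exp(-0.5 * np.einsum('ijd,de,ije->ij', diff, lam, diff))
    np.fill_diagonal(K, 0.0)                             # exclude j == i
    return K @ Q @ mu.T

# Toy usage with arbitrary random features and marginals
rng = np.random.default_rng(4)
feats = rng.normal(size=(5, 2))
Q = rng.dirichlet(np.ones(3), size=5)
mu = 1.0 - np.eye(3)                                     # Potts compatibility
weights, precisions = [1.0, 0.5], [np.eye(2), 2.0 * np.eye(2)]
M = pairwise_term_matrix(Q, feats, mu, weights, precisions)
```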
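This completes the derivation of a message-passing-like update. The innermost sum can be computed with truncated Gaussian filtering. Copying over the last few formulas, we get: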
$$\tilde{Q}_{i}^{(m)}(l)=\sum_{j \neq i} k^{(m)}\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right) Q_{j}(l)=\sum_{j} k^{(m)}\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right) Q_{j}(l)-Q_{i}(l)$$

Subtracting $Q_i(l)$ removes the $j=i$ term because a Gaussian kernel satisfies $k^{(m)}(\mathbf{f}_i,\mathbf{f}_i)=1$.
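A sketch of this trick for the smoothness (position-only) kernel on an $H\times W$ grid: because that Gaussian factorizes over the two axes, the sum over all $j$ can be done as a separable, truncated filter, and then $Q_i(l)$ is subtracted. This is an assumption-laden illustration with hypothetical names: it builds dense 1-D kernel matrices for clarity rather than speed, and it does not cover the appearance kernel, which also depends on color and needs a high-dimensional filter (the paper uses the permutohedral lattice for that).

```python
import numpy as np

def truncated_gauss_matrix(size, theta, radius):
    """1-D kernel matrix G[a, b] = exp(-(a - b)^2 / (2 theta^2)) if |a - b| <= radius, else 0."""
    idx = np.arange(size)
    d = idx[:, None] - idx[None, :]
    g = np.exp(-d**2 / (2 * theta**2))
    g[np.abs(d) > radius] = 0.0
    return g

def smoothness_message(Q, theta, radius):
    """Q~_i(l) = sum_{j != i} k(p_i, p_j) Q_j(l) for the smoothness kernel on an H x W grid.

    Filters over all j (separably along rows and columns), then subtracts Q_i(l),
    which is valid because k(p_i, p_i) = exp(0) = 1.
    """
    h, w, _ = Q.shape
    gy = truncated_gauss_matrix(h, theta, radius)
    gx = truncated_gauss_matrix(w, theta, radius)
    filtered = np.einsum('ab,cd,bdl->acl', gy, gx, Q)   # sum over all j, including j == i
    return filtered - Q
```

Usage: with `radius` at least as large as the grid, no truncation happens and the result should match the brute-force $O(N^2)$ sum; a smaller radius trades accuracy for speed.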
The final iterative update is:

$$Q_{i}\left(x_{i}=l\right)=\frac{1}{Z_{i}} \exp \left\{-\psi_{u}(l)-\sum_{l^{\prime} \in \mathcal{L}} \mu\left(l, l^{\prime}\right) \sum_{m=1}^{K} w^{(m)} \sum_{j \neq i} k^{(m)}\left(\mathbf{f}_{i}, \mathbf{f}_{j}\right) Q_{j}\left(l^{\prime}\right)\right\}$$
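Putting the pieces together, here is a naive NumPy sketch of the resulting mean-field iteration. Assumptions: kernels are evaluated as a dense $N\times N$ matrix (so it only suits tiny inputs; the paper replaces this step with fast high-dimensional filtering), each kernel is $\exp(-\tfrac12\|\mathbf f_i-\mathbf f_j\|^2)$ on features already divided by their $\theta$ bandwidths, all $Q_i$ are updated in parallel, and $Q$ is initialized from the unaries. Function and argument names are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def dense_crf_mean_field(unary, feats_list, weights, mu, n_iters=5):
    """Naive O(N^2) mean-field inference for a fully connected CRF.

    unary:      (N, L) array of psi_u(x_i = l)
    feats_list: K arrays of shape (N, D_m); features pre-scaled by their theta bandwidths
    weights:    K kernel weights w^(m)
    mu:         (L, L) label compatibility matrix
    """
    n = unary.shape[0]
    kernel = np.zeros((n, n))
    for f, w in zip(feats_list, weights):
        d2 = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
        kernel += w * np.exp(-0.5 * d2)      # w^(m) k^(m)(f_i, f_j)
    np.fill_diagonal(kernel, 0.0)            # messages only from j != i
    Q = softmax(-unary)                      # initialize Q_i(l) = exp(-psi_u(l)) / Z_i
    for _ in range(n_iters):
        message = kernel @ Q                 # sum_m w^(m) sum_{j != i} k^(m)(f_i, f_j) Q_j(l')
        pairwise = message @ mu.T            # sum_{l'} mu(l, l') * message[i, l']
        Q = softmax(-unary - pairwise)       # exp{-psi_u(l) - pairwise} / Z_i
    return Q

# Toy 1-D "image": pixels 0-2 share one color, pixels 3-5 another
p = np.arange(6, dtype=float)
I = np.array([0, 0, 0, 1, 1, 1], dtype=float)
unary = np.array([[0, 2], [0, 2], [0.6, 0.4], [2, 0], [2, 0], [2, 0]], dtype=float)
feats = [np.stack([p / 3.0, I / 0.1], axis=1)]   # appearance kernel only, theta_alpha=3, theta_beta=0.1
Q = dense_crf_mean_field(unary, feats, [1.0], 1.0 - np.eye(2), n_iters=5)
```

Pixel 2 has the same color as pixels 0 and 1 but a unary term that slightly prefers label 1; after inference the appearance kernel pulls it to label 0, while the color edge between pixels 2 and 3 keeps the two regions from influencing each other.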