看BACF论文中的公式,有很多地方的推导感觉不对劲,推导出的结果也和代码不一致,自己又推导了一遍,得到了与代码一致的推导。
损失函数
E
(
h
)
=
1
2
∑
j
=
1
T
∣
∣
y
(
j
)
−
∑
k
=
1
K
h
k
P
T
x
k
[
∇
τ
j
]
∣
∣
2
2
+
λ
2
∣
∣
h
k
∣
∣
2
2
E(\mathbf{h})=\frac 12\sum_{j=1}^T ||\mathbf{y}(j)-\sum_{k=1}^K\mathbf{h}_k\mathbf{P}^T\mathbf{x}_k[\nabla \tau_j]||^2_2+\frac{\lambda}2||\mathbf{h}_k||^2_2
E(h)=21j=1∑T∣∣y(j)−k=1∑KhkPTxk[∇τj]∣∣22+2λ∣∣hk∣∣22
其中
T
T
T为
x
\mathbf{x}
x的像素个数。
转到频域,即论文的公式(4),根据帕斯瓦尔定理,前面应该多一个
1
T
\frac 1T
T1的系数。
E
(
h
,
g
^
)
=
1
2
T
∣
∣
y
^
−
X
^
g
^
∣
∣
2
2
+
λ
2
∣
∣
h
∣
∣
2
2
s
.
t
g
^
=
T
(
F
P
T
⊗
I
K
)
h
E(\mathbf{h},\hat{\mathbf{g}})=\frac 1{2T}||\hat{\mathbf{y}}-\hat{\mathbf{X}}\hat{\mathbf{g}}||^2_2+\frac {\lambda}2||\mathbf{h}||^2_2\\ s.t\quad \hat{\mathbf{g}}=\sqrt{T}(\mathbf{FP}^T\otimes \mathbf{I_K})\mathbf{h}
E(h,g^)=2T1∣∣y^−X^g^∣∣22+2λ∣∣h∣∣22s.tg^=T(FPT⊗IK)h
其中
^
\hat{}
^表示傅里叶变换,
a
^
=
T
F
a
\hat{\mathbf{a}}=\sqrt{T}\mathbf{Fa}
a^=TFa,
X ^ = [ d i a g ( x 1 ) T , . . . d i a g ( x K ) T ] \hat{\mathbf{X}}=[diag(\mathbf{x_1})^T, ... diag(\mathbf{x}_K)^T] X^=[diag(x1)T,...diag(xK)T], h = [ h 1 T , . . . , h K T ] T \mathbf{h}=[\mathbf{h}^T_1,...,\mathbf{h}^T_K]^T h=[h1T,...,hKT]T , g ^ = [ g 1 T ^ , . . . , g K T ^ ] T \hat{\mathbf{g}}=[\hat{\mathbf{g}^T_1},...,\hat{\mathbf{g}^T_K}]^T g^=[g1T^,...,gKT^]T。
构造增广拉格朗日函数
L
(
g
^
,
h
,
ζ
^
)
=
1
2
T
∥
y
^
−
X
^
g
^
∥
2
2
+
λ
2
∥
h
∥
2
2
+
ζ
^
⊤
(
g
^
−
T
(
F
P
⊤
⊗
I
K
)
h
)
+
μ
2
∥
g
^
−
T
(
F
P
⊤
⊗
I
K
)
h
∥
2
2
\begin{aligned} \mathcal{L}(\hat{\mathbf{g}}, \mathbf{h}, \hat{\zeta})=& \frac{1}{2T}\|\hat{\mathbf{y}}-\hat{\mathbf{X}} \hat{\mathbf{g}}\|_{2}^{2}+\frac{\lambda}{2}\|\mathbf{h}\|_{2}^{2} \\ &+\hat{\zeta}^{\top}\left(\hat{\mathbf{g}}-\sqrt{T}\left(\mathbf{F} \mathbf{P}^{\top} \otimes \mathbf{I}_{K}\right) \mathbf{h}\right) \\ &+\frac{\mu}{2}\left\|\hat{\mathbf{g}}-\sqrt{T}\left(\mathbf{F} \mathbf{P}^{\top} \otimes \mathbf{I}_{K}\right) \mathbf{h}\right\|_{2}^{2} \end{aligned}
L(g^,h,ζ^)=2T1∥y^−X^g^∥22+2λ∥h∥22+ζ^⊤(g^−T(FP⊤⊗IK)h)+2μ∥∥∥g^−T(FP⊤⊗IK)h∥∥∥22
使用交替方向乘子法(ADMM)求解。
子问题
h
=
a
r
g
m
i
n
h
L
(
g
^
,
h
,
ζ
^
)
\mathbf{h}=\underset{\mathbf{h}}{\rm argmin} \mathcal{L}(\hat{\mathbf{g}}, \mathbf{h}, \hat{\zeta})
h=hargminL(g^,h,ζ^),对
h
\mathbf{h}
h求偏导。
∂
L
∂
h
=
λ
h
−
T
(
F
P
T
⊗
I
K
)
T
ζ
^
−
μ
T
(
F
P
T
⊗
I
K
)
T
(
g
^
−
T
(
F
P
T
⊗
I
K
)
h
)
=
λ
h
−
T
ζ
−
μ
T
g
+
μ
T
h
=
0
\frac {\partial \mathcal{L}}{\partial \mathbf{h}} =\lambda\mathbf{h}-\sqrt{T}(\mathbf{FP}^T\otimes \mathbf{I}_K)^T\hat{\zeta}-\mu\sqrt{T}(\mathbf{FP}^T\otimes \mathbf{I}_K)^T(\hat{\mathbf{g}}-\sqrt{T}(\mathbf{FP}^T\otimes \mathbf{I}_K)\mathbf{h})\\ = \lambda \mathbf{h}-T\mathbf{\zeta}-\mu T\mathbf{g}+\mu T \mathbf{h} = 0
∂h∂L=λh−T(FPT⊗IK)Tζ^−μT(FPT⊗IK)T(g^−T(FPT⊗IK)h)=λh−Tζ−μTg+μTh=0
其中
g
=
1
T
(
P
F
T
⊗
I
K
)
g
^
\mathbf{g}=\frac 1{\sqrt{T}}(\mathbf{PF}^T\otimes \mathbf{I}_K)\hat{\mathbf{g}}
g=T1(PFT⊗IK)g^,
ζ
=
1
T
(
P
F
T
⊗
I
K
)
ζ
^
\mathbf{\zeta}=\frac 1{\sqrt{T}}(\mathbf{PF}^T\otimes \mathbf{I}_K)\hat{\mathbf{\zeta}}
ζ=T1(PFT⊗IK)ζ^,即傅里叶反变换。
解得
h
=
μ
g
+
ζ
λ
T
+
μ
\mathbf{h}=\frac{\mu\mathbf{g}+\mathbf{\zeta}}{\frac{\lambda}T+\mu}
h=Tλ+μμg+ζ
与论文(6)式不一致。
子问题
g
^
=
a
r
g
m
i
n
g
^
L
(
g
^
,
h
,
ζ
^
)
\mathbf{\hat{g}}=\underset{\mathbf{\hat{g}}}{\rm argmin} \mathcal{L}(\hat{\mathbf{g}}, \mathbf{h}, \hat{\zeta})
g^=g^argminL(g^,h,ζ^), 由于每个像素的值独立,将问题分解为
T
T
T个子问题,
t
=
1
,
2
,
.
.
.
,
T
t=1,2,...,T
t=1,2,...,T。
g
^
(
t
)
∗
=
arg
min
g
^
(
t
)
{
1
2
T
∥
y
^
(
t
)
−
x
^
(
t
)
⊤
g
^
(
t
)
∥
2
2
+
ζ
^
(
t
)
⊤
(
g
^
(
t
)
−
h
^
(
t
)
)
+
μ
2
∥
g
^
(
t
)
−
h
^
(
t
)
∥
2
2
}
\begin{aligned} \hat{\mathbf{g}}(t)^{*}=& \arg \min _{\hat{\mathbf{g}}(t)}\left\{\frac{1}{2T}\left\|\hat{\mathbf{y}}(t)-\hat{\mathbf{x}}(t)^{\top} \hat{\mathbf{g}}(t)\right\|_{2}^{2}\right.\\ &+{\hat{\zeta}}(t)^{\top}(\hat{\mathbf{g}}(t)-\hat{\mathbf{h}}(t)) \\ &\left.+\frac{\mu}{2}\|\hat{\mathbf{g}}(t)-\hat{\mathbf{h}}(t)\|_{2}^{2}\right\} \end{aligned}
g^(t)∗=argg^(t)min{2T1∥∥y^(t)−x^(t)⊤g^(t)∥∥22+ζ^(t)⊤(g^(t)−h^(t))+2μ∥g^(t)−h^(t)∥22}
求偏导,令为0
−
1
T
x
^
(
t
)
(
y
^
(
t
)
−
x
^
(
t
)
T
g
^
(
t
)
)
+
ζ
^
+
μ
(
g
^
(
t
)
−
h
^
(
t
)
)
=
0
-\frac 1{T}\hat{\mathbf{x}}(t)(\hat{\mathbf{y}}(t)-\hat{\mathbf{x}}(t)^T\hat{\mathbf{g}}(t))+\hat{\mathbf{\zeta}}+\mu(\hat{\mathbf{g}}(t)-\hat{\mathbf{h}}(t))=0
−T1x^(t)(y^(t)−x^(t)Tg^(t))+ζ^+μ(g^(t)−h^(t))=0
( x ^ ( t ) x ^ ( t ) T + μ T I ) g ^ ( t ) = x ^ ( t ) y ^ ( t ) − T ζ ^ + μ T h ^ ( t ) (\hat{\mathbf{x}}(t)\hat{\mathbf{x}}(t)^T+\mu T \mathbf{I})\hat{\mathbf{g}}(t)= \hat{\mathbf{x}}(t)\hat{\mathbf{y}}(t)-T\hat{\mathbf{\zeta}}+\mu T \hat{\mathbf{h}}(t) (x^(t)x^(t)T+μTI)g^(t)=x^(t)y^(t)−Tζ^+μTh^(t)
这步得到的结果和论文公式(9)一致,也解答了 T T T从哪推出来的疑惑。
得到
g
^
(
t
)
∗
=
(
x
^
(
t
)
x
^
(
t
)
⊤
+
T
μ
I
K
)
−
1
(
y
^
(
t
)
x
^
(
t
)
−
T
ζ
^
(
t
)
+
T
μ
h
^
(
t
)
)
\begin{aligned} \hat{\mathbf{g}}(t)^{*}=&\left(\hat{\mathbf{x}}(t) \hat{\mathbf{x}}(t)^{\top}+T \mu \mathbf{I}_{K}\right)^{-1} \\ &(\hat{\mathbf{y}}(t) \hat{\mathbf{x}}(t)-T \hat{\zeta}(t)+T \mu \hat{\mathbf{h}}(t)) \end{aligned}
g^(t)∗=(x^(t)x^(t)⊤+TμIK)−1(y^(t)x^(t)−Tζ^(t)+Tμh^(t))
根据Sherman-Morrison公式
(
A
+
u
v
T
)
−
1
=
A
−
1
−
A
−
1
u
v
T
A
−
1
1
+
v
T
A
−
1
u
\left(\mathbf{A}+\mathbf{u v}^{T}\right)^{-1}=\mathbf{A}^{-1}-\frac{\mathbf{A}^{-1} \mathbf{u} \mathbf{v}^{T} \mathbf{A}^{-1}}{1+\mathbf{v}^{T} \mathbf{A}^{-1} \mathbf{u}}
(A+uvT)−1=A−1−1+vTA−1uA−1uvTA−1
令
q
=
y
^
(
t
)
x
^
(
t
)
−
T
ζ
^
(
t
)
+
T
μ
h
^
(
t
)
\mathbf{q}=\hat{\mathbf{y}}(t) \hat{\mathbf{x}}(t)-T \hat{\zeta}(t)+T \mu \hat{\mathbf{h}}(t)
q=y^(t)x^(t)−Tζ^(t)+Tμh^(t)
得
g
^
(
t
)
∗
=
1
μ
T
(
I
−
x
^
(
t
)
x
^
(
t
)
T
μ
T
+
x
^
(
t
)
T
x
^
(
t
)
)
q
\hat{\mathbf{g}}(t)^*=\frac 1{\mu T}(\mathbf{I}-\frac {\hat{\mathbf{x}}(t)\hat{\mathbf{x}}(t)^T}{\mu T+\hat{\mathbf{x}}(t)^T\hat{\mathbf{x}}(t)})\mathbf{q}
g^(t)∗=μT1(I−μT+x^(t)Tx^(t)x^(t)x^(t)T)q
整理一下应该是
g
^
(
t
)
∗
=
1
μ
T
(
y
^
(
t
)
x
^
(
t
)
−
T
ζ
^
(
t
)
+
μ
T
h
^
(
t
)
)
−
x
^
(
t
)
μ
T
b
(
y
^
(
t
)
s
^
x
(
t
)
−
T
s
^
ζ
(
t
)
+
μ
T
s
^
h
(
t
)
)
\begin{aligned} \hat{\mathbf{g}}(t)^{*}=& \frac{1}{\mu T}( \hat{\mathbf{y}}(t) \hat{\mathbf{x}}(t)-T\hat{\boldsymbol{\zeta}}(t)+\mu T \hat{\mathbf{h}}(t)) \\ &-\frac{\hat{\mathbf{x}}(t)}{\mu T b}\left( \hat{\mathbf{y}}(t) \hat{s}_{\mathbf{x}}(t)-T\hat{s}_{\zeta}(t)+\mu T \hat{s}_{\mathbf{h}}(t)\right) \end{aligned}
g^(t)∗=μT1(y^(t)x^(t)−Tζ^(t)+μTh^(t))−μTbx^(t)(y^(t)s^x(t)−Ts^ζ(t)+μTs^h(t))
其中
s
^
x
(
t
)
=
x
^
(
t
)
⊤
x
^
,
s
^
ζ
(
t
)
=
x
^
(
t
)
⊤
ζ
^
,
s
^
h
(
t
)
=
x
^
(
t
)
⊤
h
^
,
b
=
s
^
x
(
t
)
+
T
μ
{\hat{s}_{\mathbf{x}}(t)=\hat{\mathbf{x}}(t)^{\top} \hat{\mathbf{x}}, \hat{s}_{\zeta}(t)=\hat{\mathbf{x}}(t)^{\top} \hat{\boldsymbol{\zeta}}, \hat{s}_{\mathbf{h}}(t)=\hat{\mathbf{x}}(t)^{\top} \hat{\mathbf{h}}}, b=\hat{s}_{\mathbf{x}}(t)+T \mu
s^x(t)=x^(t)⊤x^,s^ζ(t)=x^(t)⊤ζ^,s^h(t)=x^(t)⊤h^,b=s^x(t)+Tμ
再看看代码里的求解过程,是不是一致了?
g_f = single(zeros(size(xf)));
h_f = g_f;
l_f = g_f;
mu = 1;
betha = 10;
mumax = 10000;
i = 1;
T = prod(use_sz);
S_xx = sum(conj(model_xf) .* model_xf, 3);
params.admm_iterations = 2;
% ADMM
while (i <= params.admm_iterations)
% solve for G- please refer to the paper for more details
B = S_xx + (T * mu);
%B = S_xx + T*mu;
S_lx = sum(conj(model_xf) .* l_f, 3);
S_hx = sum(conj(model_xf) .* h_f, 3);
g_f = (((1/(T*mu)) * bsxfun(@times, yf, model_xf)) - ((1/mu) * l_f) + h_f) - ...
bsxfun(@rdivide,(((1/(T*mu)) * bsxfun(@times, model_xf, (S_xx .* yf))) - ((1/mu) * bsxfun(@times, model_xf, S_lx)) + (bsxfun(@times, model_xf, S_hx))), B);
%g_f = (((1/(mu)) * bsxfun(@times, yf, model_xf)) - ((1/mu) * l_f) + h_f) - ...
% bsxfun(@rdivide,(((1/(mu)) * bsxfun(@times, model_xf, (S_xx .* yf))) - ((1/mu) * bsxfun(@times, model_xf, S_lx)) + (bsxfun(@times, model_xf, S_hx))), B);
% solve for H
h = (T/((mu*T)+ params.admm_lambda))* ifft2((mu*g_f) + l_f);
%h = (sqrt(T)/((mu*sqrt(T))+ params.admm_lambda))* ifft2((mu*g_f) + l_f);
[sx,sy,h] = get_subwindow_no_window(h, floor(use_sz/2) , small_filter_sz);
t = single(zeros(use_sz(1), use_sz(2), size(h,3)));
t(sx,sy,:) = h;
h_f = fft2(t);
% update L
l_f = l_f + (mu * (g_f - h_f));
% update mu- betha = 10.
mu = min(betha * mu, mumax);
i = i+1;
end