==PART1 ==
1. Problem
How do we solve for the unknown parameter $W$ in the expression below? By taking the partial derivative?
$$\min f(W)=\min_W \|(XW \circ \breve{D})B\|_F^2 \tag{0}$$
Here only $W$ is unknown, and
$$X \in \mathbb{R}^{n \times m},\quad W \in \mathbb{R}^{m \times c},\quad \breve{D} \in \mathbb{R}^{n \times c},\quad B \in \mathbb{R}^{c \times c}.$$
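The symbol $\circ$ denotes the Hadamard product, i.e. element-wise matrix multiplication (the .* operator in MATLAB). Because $XW$ and $\breve{D}$ are both $n \times c$, the expression $XW \circ \breve{D}$ is read as $(XW) \circ \breve{D}$.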
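To make the notation concrete, here is a minimal NumPy sketch that evaluates the objective in (0). The sizes, the random data, and the function name `objective` are illustrative assumptions, not part of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 5, 4, 3                  # illustrative sizes (assumed)

X = rng.standard_normal((n, m))
W = rng.standard_normal((m, c))
D = rng.standard_normal((n, c))    # plays the role of \breve{D}
B = rng.standard_normal((c, c))

def objective(W, X, D, B):
    """f(W) = ||((X W) ∘ D) B||_F^2."""
    S = (X @ W) * D                # '@' is the matrix product, '*' is the Hadamard product
    return np.linalg.norm(S @ B, "fro") ** 2

f_val = objective(W, X, D, B)
print(f_val)
```

In NumPy, `*` between two arrays of the same shape is the element-wise product, which matches MATLAB's `.*`.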
2. Derivation
Let $S=XW \circ \breve{D}$. Then
$$
\begin{aligned}
f(W) &= \|(XW \circ \breve{D})B\|_F^2 \\
&= \|SB\|_F^2 \\
&= \mathrm{tr}(SBB^TS^T)
\end{aligned}
\tag{1}
$$
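The last equality in (1) is the standard fact $\|M\|_F^2=\mathrm{tr}(MM^T)$ applied to $M=SB$. A small numeric check, using arbitrary random matrices as stand-ins for $S$ and $B$:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((5, 3))    # stand-in for S = XW ∘ D (assumed random here)
B = rng.standard_normal((3, 3))

frob_sq = np.linalg.norm(S @ B, "fro") ** 2    # ||SB||_F^2
trace_form = np.trace(S @ B @ B.T @ S.T)       # tr(S B B^T S^T)

print(frob_sq, trace_form)
```

The two printed values should agree up to floating-point rounding.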
Using the identity (here $X$ and $B$ stand for generic matrices)
$$\frac{\partial\, \mathrm{tr}(XBX^T)}{\partial X}=XB^T + XB, \tag{2}$$
and applying (2) with $S$ in place of $X$ and the symmetric matrix $BB^T$ in place of $B$, we get
$$\frac{\partial f}{\partial S}=SBB^T + SBB^T=2SBB^T \tag{3}$$
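One way to sanity-check (3) is to compare it against a central finite-difference approximation. The sketch below does that on random data; the helper `numerical_grad` and the step size `eps` are assumptions made for illustration.

```python
import numpy as np

def numerical_grad(func, M, eps=1e-6):
    """Central-difference approximation of d func / d M, one entry at a time."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (func(M + E) - func(M - E)) / (2.0 * eps)
    return G

rng = np.random.default_rng(2)
S = rng.standard_normal((5, 3))
B = rng.standard_normal((3, 3))

def f_of_S(S_):
    return np.trace(S_ @ B @ B.T @ S_.T)    # f as a function of S, from (1)

grad_analytic = 2.0 * S @ B @ B.T           # formula (3)
grad_numeric = numerical_grad(f_of_S, S)
max_err = np.max(np.abs(grad_analytic - grad_numeric))
print(max_err)
```

Since $f$ is quadratic in $S$, the central difference is exact up to rounding, so the printed error should be tiny.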
Continuing with the differential:
$$
\begin{aligned}
df &= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T dS\right]
= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T d(XW \circ \breve{D})\right] \\
&= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T \left(d(XW) \circ \breve{D}\right)\right] \\
&= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T \left(X\,dW \circ \breve{D}\right)\right] \\
&= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T \left(\breve{D} \circ X\,dW\right)\right] \\
&= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S} \circ \breve{D}\right)^T X\,dW\right] \\
&= \mathrm{tr}\left[\left(X^T\left(\frac{\partial f}{\partial S} \circ \breve{D}\right)\right)^T dW\right]
\end{aligned}
$$
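The step that moves $\breve{D}$ from one factor to the other inside the trace relies on $\mathrm{tr}\left(A^T(C \circ D)\right)=\sum_{ij}A_{ij}C_{ij}D_{ij}=\mathrm{tr}\left((A \circ D)^T C\right)$ for matrices of equal shape. The sketch below checks this numerically; `A`, `C`, and `D` are random stand-ins for $\partial f/\partial S$, $X\,dW$, and $\breve{D}$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))    # stand-in for ∂f/∂S
C = rng.standard_normal((5, 3))    # stand-in for X dW
D = rng.standard_normal((5, 3))    # stand-in for \breve{D}

lhs = np.trace(A.T @ (C * D))      # tr(A^T (C ∘ D))
rhs = np.trace((A * D).T @ C)      # tr((A ∘ D)^T C)
elementwise = np.sum(A * C * D)    # sum_ij A_ij C_ij D_ij

print(lhs, rhs, elementwise)
```

All three numbers should coincide up to rounding, because each is the same sum over entries.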
Therefore:
$$
\begin{aligned}
\frac{\partial f}{\partial W} &= X^T\left(\frac{\partial f}{\partial S} \circ \breve{D}\right) = X^T\left[(2SBB^T) \circ \breve{D}\right] \\
&= X^T\left[\left(2(XW \circ \breve{D})BB^T\right) \circ \breve{D}\right]
\end{aligned}
$$
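The closed form for $\partial f/\partial W$ can be compared with a finite-difference gradient of the original objective. This is only a sketch on random data; the function names, sizes, and step size are assumptions for illustration.

```python
import numpy as np

def objective(W, X, D, B):
    """f(W) = ||((X W) ∘ D) B||_F^2."""
    S = (X @ W) * D
    return np.sum((S @ B) ** 2)

def analytic_grad(W, X, D, B):
    """Closed form from the derivation: X^T [ (2 S B B^T) ∘ D ]."""
    S = (X @ W) * D
    return X.T @ ((2.0 * S @ B @ B.T) * D)

def numerical_grad(func, M, eps=1e-6):
    """Central-difference approximation of d func / d M, one entry at a time."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (func(M + E) - func(M - E)) / (2.0 * eps)
    return G

rng = np.random.default_rng(4)
n, m, c = 6, 4, 3                  # illustrative sizes (assumed)
X = rng.standard_normal((n, m))
W = rng.standard_normal((m, c))
D = rng.standard_normal((n, c))    # plays the role of \breve{D}
B = rng.standard_normal((c, c))

g_analytic = analytic_grad(W, X, D, B)
g_numeric = numerical_grad(lambda W_: objective(W_, X, D, B), W)
print(np.max(np.abs(g_analytic - g_numeric)))
```

Note that the gradient has the same shape as $W$ ($m \times c$), which is a quick dimensional check on the formula.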
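3. Notes
- This post covers the derivative of a scalar with respect to a matrix; the derivative of a matrix with respect to a matrix is harder.
- The derivation follows the "matrix differentiation technique" (矩阵求导术) material; for a more detailed explanation, see https://blog.youkuaiyun.com/lgl123ok/article/details/120780368
- Key formulas: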
==PART2 ==
Problem 2: What is the partial derivative of $\|XW-Z\|_F^2$ with respect to $W$?
Answer: Let $f(W)=\|XW-Z\|_F^2$ and $S=XW-Z$. Then
$$\frac{\partial f}{\partial S}=2S=2(XW-Z)$$
$$
\begin{aligned}
df &= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T dS\right]
= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T d(XW - Z)\right] \\
&= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T (X\,dW)\right] \\
&= \mathrm{tr}\left[\left(X^T\frac{\partial f}{\partial S}\right)^T dW\right]
\end{aligned}
$$
Therefore:
$$
\begin{aligned}
\frac{\partial f}{\partial W} &= X^T\frac{\partial f}{\partial S} = X^T\left[2(XW-Z)\right] \\
&= 2X^T(XW-Z)
\end{aligned}
$$
This result is correct; see Eq. (11) of "Multi-label feature selection via manifold regularization and dependence maximization".
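As an independent check of the Problem 2 gradient, the sketch below compares $2X^T(XW-Z)$ with a finite-difference directional derivative along a random direction `V`, and also confirms that the gradient vanishes at the least-squares solution returned by `np.linalg.lstsq`. The data, sizes, and the direction `V` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, c = 8, 4, 3                  # illustrative sizes (assumed)
X = rng.standard_normal((n, m))
Z = rng.standard_normal((n, c))
W = rng.standard_normal((m, c))

def f(W):
    """f(W) = ||X W - Z||_F^2."""
    return np.sum((X @ W - Z) ** 2)

def grad(W):
    """Closed form from the derivation: 2 X^T (X W - Z)."""
    return 2.0 * X.T @ (X @ W - Z)

# Directional derivative along V: (f(W + tV) - f(W - tV)) / (2t) ≈ <grad(W), V>
V = rng.standard_normal(W.shape)
t = 1e-6
directional_fd = (f(W + t * V) - f(W - t * V)) / (2.0 * t)
directional_an = np.sum(grad(W) * V)

# At a least-squares minimizer the gradient should be (numerically) zero.
W_ls, *_ = np.linalg.lstsq(X, Z, rcond=None)
grad_at_ls = grad(W_ls)

print(directional_fd, directional_an, np.max(np.abs(grad_at_ls)))
```

Setting the gradient to zero gives the normal equations $X^TXW=X^TZ$, which is why the least-squares solution makes it vanish.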
==PART3 ==
Problem 3: What is the partial derivative of $\|Q- W \circ A + \frac{1}{\rho}\Gamma_1\|_F^2$ with respect to $W$? Here $\circ$ is the element-wise product.
Answer: Let $f(W)=\|Q- W \circ A + \frac{1}{\rho}\Gamma_1\|_F^2$ and $S=Q- W \circ A + \frac{1}{\rho}\Gamma_1$. Then
$$\frac{\partial f}{\partial S}=2S=2\left(Q- W \circ A + \frac{1}{\rho}\Gamma_1\right)$$
$$
\begin{aligned}
df &= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T dS\right]
= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T d\left(Q- W \circ A + \frac{1}{\rho}\Gamma_1\right)\right] \\
&= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T (-A \circ dW)\right] \\
&= \mathrm{tr}\left[\left(\frac{\partial f}{\partial S} \circ (-A)\right)^T dW\right]
\end{aligned}
$$
Therefore:
$$
\begin{aligned}
\frac{\partial f}{\partial W} &= \frac{\partial f}{\partial S} \circ (-A) = 2\left(Q- W \circ A + \frac{1}{\rho}\Gamma_1\right) \circ (-A) \\
&= -2\left(Q- W \circ A + \frac{1}{\rho}\Gamma_1\right) \circ A
\end{aligned}
$$
Treat this result as a rough reference only.
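A finite-difference comparison is one way to gain some confidence in the Problem 3 formula. The sketch below assumes $Q$, $W$, $A$, and $\Gamma_1$ all share the same shape and that $\rho$ is a positive scalar; the sizes, the random data, and the name `Gamma1` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
p, q = 4, 3                        # illustrative shape shared by Q, W, A, Gamma1 (assumed)
Q = rng.standard_normal((p, q))
W = rng.standard_normal((p, q))
A = rng.standard_normal((p, q))
Gamma1 = rng.standard_normal((p, q))
rho = 2.0                          # assumed positive scalar

def f(W):
    """f(W) = ||Q - W ∘ A + Gamma1 / rho||_F^2."""
    return np.sum((Q - W * A + Gamma1 / rho) ** 2)

def grad(W):
    """Closed form from the derivation: -2 (Q - W ∘ A + Gamma1 / rho) ∘ A."""
    return -2.0 * (Q - W * A + Gamma1 / rho) * A

# Central differences, one entry of W at a time.
eps = 1e-6
g_numeric = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    E = np.zeros_like(W)
    E[idx] = eps
    g_numeric[idx] = (f(W + E) - f(W - E)) / (2.0 * eps)

max_err = np.max(np.abs(grad(W) - g_numeric))
print(max_err)
```

A small printed error on random inputs supports the formula under these assumptions, though it is a numerical check rather than a proof.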