0. Minimization with constraints
$$
\begin{aligned}
& \min f(x) \\
& \text{s.t.} \\
& (1)\ g_i(x) \leq 0, \quad i = 1, 2, \cdots, m \\
& (2)\ h_i(x) = 0, \quad i = 1, 2, \cdots, q
\end{aligned}
$$
Introducing Lagrange multipliers folds the constraints into a single unconstrained function, the Lagrangian, to which the KKT conditions are then applied:

$$
y(x \mid \lambda_i, v_i) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{i=1}^q v_i h_i(x)
$$

where $\lambda_i$ and $v_i$ are the Lagrange multipliers.
A local minimizer $x^*$ (under a suitable constraint qualification) satisfies the KKT conditions:
- $g_i(x^*) \leq 0, \quad i = 1, 2, \cdots, m$
- $h_i(x^*) = 0, \quad i = 1, 2, \cdots, q$
- $\lambda_i^* \geq 0, \quad i = 1, 2, \cdots, m$
- $\lambda_i^* g_i(x^*) = 0, \quad i = 1, 2, \cdots, m$
- $\nabla y(x^* \mid \lambda_i^*, v_i^*) = 0$
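The conditions above can be checked mechanically on a small example. The following is a minimal sketch on a hypothetical one-dimensional problem that is not from the original text: minimize $f(x) = (x-2)^2$ subject to $g(x) = x - 1 \leq 0$. The function names and the tolerance are illustrative assumptions.

```python
# Toy problem (an assumption for illustration, not from the original text):
#   minimize f(x) = (x - 2)^2   subject to   g(x) = x - 1 <= 0

def f_grad(x):
    """Derivative of f(x) = (x - 2)^2."""
    return 2.0 * (x - 2.0)

def g(x):
    """Inequality constraint, required to satisfy g(x) <= 0."""
    return x - 1.0

def g_grad(x):
    """Derivative of g(x) = x - 1."""
    return 1.0

def kkt_satisfied(x, lam, tol=1e-9):
    """Check the four KKT conditions for one inequality constraint."""
    primal = g(x) <= tol                                  # g(x*) <= 0
    dual = lam >= -tol                                    # lambda >= 0
    slack = abs(lam * g(x)) <= tol                        # lambda * g(x*) = 0
    stationary = abs(f_grad(x) + lam * g_grad(x)) <= tol  # grad of Lagrangian = 0
    return primal and dual and slack and stationary

print(kkt_satisfied(1.0, 2.0))  # candidate on the boundary with multiplier 2
print(kkt_satisfied(2.0, 0.0))  # unconstrained minimizer, which violates g(x) <= 0
```

The first candidate sits on the constraint boundary with an active multiplier. The second is the unconstrained minimizer and fails primal feasibility.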
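1. Definition of non-negative matrix factorization
NMF involves three matrices:
(1) The data matrix $V$ of size $m \times n$, where $m$ is the feature dimension of each sample and $n$ is the number of samples.
(2) The basis matrix $W$ of size $m \times d$, where $d$ is the dimension of a sample in the latent feature space, with $W \geq 0$.
(3) The coefficient matrix $H$ of size $d \times n$, with $H \geq 0$.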
The NMF objective is:

$$
\begin{aligned}
& \min J = \|V - WH\|_F^2 \\
& \text{s.t. } W \geq 0,\ H \geq 0
\end{aligned}
$$
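As a concrete reading of the objective, here is a minimal NumPy sketch that evaluates $J$ for given factors. The function name `nmf_objective`, the matrix sizes, and the random data are illustrative assumptions.

```python
import numpy as np

def nmf_objective(V, W, H):
    """Squared Frobenius reconstruction error J = ||V - WH||_F^2."""
    R = V - W @ H              # residual matrix, shape (m, n)
    return float(np.sum(R * R))

rng = np.random.default_rng(0)
m, n, d = 6, 5, 2              # feature dimension, sample count, latent dimension
W_true = rng.random((m, d))    # non-negative basis matrix
H_true = rng.random((d, n))    # non-negative coefficient matrix
V = W_true @ H_true            # data that factorizes exactly

J_exact = nmf_objective(V, W_true, H_true)                       # exact factors
J_random = nmf_objective(V, rng.random((m, d)), rng.random((d, n)))  # random factors
print(J_exact, J_random)
```

The exact factors reproduce $V$ and give a zero objective, while unrelated random factors give a strictly positive one.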
2. Solving the non-negative matrix factorization
$$
\begin{aligned}
& J = \|V - WH\|_F^2 = \mathrm{tr}\left((V-WH)^T(V-WH)\right) \\
& \text{s.t. } W \geq 0,\ H \geq 0
\end{aligned}
$$
Let $A = [A_{ij}]$ and $B = [B_{ij}]$ be the Lagrange multiplier matrices for the constraints $-W_{ij} \leq 0$ and $-H_{ij} \leq 0$. Folding the constraints into the objective $J$ gives:

$$
\begin{aligned}
J &= \|V-WH\|_F^2 - \sum_{i=1}^m\sum_{j=1}^d A_{ij}W_{ij} - \sum_{i=1}^d\sum_{j=1}^n B_{ij}H_{ij} \\
  &= \mathrm{tr}\left((V-WH)^T(V-WH)\right) - \mathrm{tr}(A^TW) - \mathrm{tr}(B^TH)
\end{aligned}
$$
2.1 Differentiating the objective
Let:

$$
\begin{aligned}
J_1 &= \mathrm{tr}\left((V-WH)^T(V-WH)\right) = \mathrm{tr}(V^TV) - \mathrm{tr}(V^TWH) - \mathrm{tr}(H^TW^TV) + \mathrm{tr}(H^TW^TWH) \\
J_2 &= \mathrm{tr}(A^TW) \\
J_3 &= \mathrm{tr}(B^TH)
\end{aligned}
$$

so that $J = J_1 - J_2 - J_3$.
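The trace identities used here are easy to confirm numerically. The snippet below is a small sanity check under assumed random matrices (the sizes and seed are arbitrary): it compares the Frobenius form of $J_1$ with its four-term trace expansion, and $\mathrm{tr}(A^TW)$ with the double sum $\sum_{ij} A_{ij}W_{ij}$.

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.random((4, 3))   # data matrix, m x n
W = rng.random((4, 2))   # basis matrix, m x d
H = rng.random((2, 3))   # coefficient matrix, d x n
A = rng.random((4, 2))   # multiplier matrix, same shape as W

# J_1 written as a squared Frobenius norm
frob = np.linalg.norm(V - W @ H, "fro") ** 2

# J_1 written as the four-term trace expansion
expanded = (np.trace(V.T @ V) - np.trace(V.T @ W @ H)
            - np.trace(H.T @ W.T @ V) + np.trace(H.T @ W.T @ W @ H))

# J_2 = tr(A^T W) equals the element-wise sum of A_ij * W_ij
trace_form = np.trace(A.T @ W)
sum_form = np.sum(A * W)

print(np.isclose(frob, expanded), np.isclose(trace_form, sum_form))
```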
2.1.1 Computing the derivative with respect to W
(1) Derivative of $J_1$ with respect to $W$
The term $\mathrm{tr}(V^TV)$ does not depend on $W$. For the remaining terms, the cyclic property of the trace gives:

$$
\frac{\partial\, \mathrm{tr}(V^TWH)}{\partial W} = \frac{\partial\, \mathrm{tr}(HV^TW)}{\partial W} = (HV^T)^T = VH^T
$$

$$
\frac{\partial\, \mathrm{tr}(H^TW^TV)}{\partial W} = \frac{\partial\, \mathrm{tr}(VH^TW^T)}{\partial W} = VH^T
$$

$$
\frac{\partial\, \mathrm{tr}(H^TW^TWH)}{\partial W} = \frac{\partial\, \mathrm{tr}(WHH^TW^T)}{\partial W} = 2WHH^T
$$
Therefore,

$$
\frac{\partial J_1}{\partial W} = -VH^T - VH^T + 2WHH^T = -2(V-WH)H^T
$$
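A finite-difference check is a quick way to confirm a matrix derivative such as this one. The sketch below compares the closed form $-2(V-WH)H^T$ with central differences on random matrices. The helper names (`J1`, `grad_J1_W`, `numerical_grad_W`) and the step size are illustrative assumptions.

```python
import numpy as np

def J1(V, W, H):
    """J_1 = ||V - WH||_F^2."""
    R = V - W @ H
    return np.sum(R * R)

def grad_J1_W(V, W, H):
    """Closed-form derivative of J_1 with respect to W."""
    return -2.0 * (V - W @ H) @ H.T

def numerical_grad_W(V, W, H, eps=1e-6):
    """Central-difference approximation, perturbing one entry of W at a time."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp = W.copy()
            Wm = W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            G[i, j] = (J1(V, Wp, H) - J1(V, Wm, H)) / (2.0 * eps)
    return G

rng = np.random.default_rng(2)
V = rng.random((5, 4))
W = rng.random((5, 3))
H = rng.random((3, 4))

max_err = np.max(np.abs(grad_J1_W(V, W, H) - numerical_grad_W(V, W, H)))
print(max_err)  # should be tiny, limited only by floating-point rounding
```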
(2) Derivative of $J_2$ with respect to $W$

$$
\frac{\partial J_2}{\partial W} = \frac{\partial\, \mathrm{tr}(A^TW)}{\partial W} = A
$$
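(3) Derivative of $J$ with respect to $W$
Since $J = J_1 - J_2 - J_3$ and $J_3$ does not depend on $W$, (1) and (2) give

$$
\frac{\partial J}{\partial W} = -2(V-WH)H^T - A
$$

(4) Applying the KKT conditions
Complementary slackness requires $A_{ij}(-W_{ij}) = 0$ and $B_{ij}(-H_{ij}) = 0$, so

$$
A \odot W = O, \quad B \odot H = O
$$

where $\odot$ denotes the element-wise (Hadamard) product.
(5) Deriving the iteration formula
Setting $\frac{\partial J}{\partial W} = 0$ and taking the Hadamard product of both sides with $W$ gives

$$
-2\left((V-WH)H^T\right) \odot W - A \odot W = O
$$

Substituting (4) gives

$$
\left((V-WH)H^T\right) \odot W = O
$$

that is, $(VH^T)_{ij} W_{ij} = (WHH^T)_{ij} W_{ij}$ for every entry. Treating this as a fixed-point condition yields the multiplicative update:

$$
\frac{(VH^T)_{ij}}{(WHH^T)_{ij}} W_{ij} \rightarrow W_{ij}
$$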
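The update rule for $W$ translates directly into element-wise array operations. The following is a minimal sketch of a single step with $H$ held fixed. The function name `update_W` and the small constant `eps` in the denominator are assumptions added here to guard against division by zero; they are not part of the derivation.

```python
import numpy as np

def update_W(V, W, H, eps=1e-12):
    """One multiplicative step: W_ij <- W_ij * (V H^T)_ij / (W H H^T)_ij.

    eps is an assumed safeguard against a zero denominator.
    """
    return W * (V @ H.T) / (W @ H @ H.T + eps)

def objective(V, W, H):
    """J = ||V - WH||_F^2."""
    return np.linalg.norm(V - W @ H, "fro") ** 2

rng = np.random.default_rng(3)
V = rng.random((6, 5))
W = rng.random((6, 2)) + 0.1   # strictly positive start
H = rng.random((2, 5)) + 0.1

J_before = objective(V, W, H)
W_new = update_W(V, W, H)
J_after = objective(V, W_new, H)
print(J_before, J_after)

# When V = WH exactly, the ratio is 1 everywhere and W is a fixed point.
W_fixed = update_W(W @ H, W, H)
print(np.allclose(W_fixed, W))
```

Because the step only multiplies by non-negative ratios, a non-negative $W$ stays non-negative without any explicit projection.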
2.1.2 Computing the derivative with respect to H
Following the same steps as for $W$, the derivative of $J$ with respect to $H$ is:

$$
\frac{\partial J}{\partial H} = -2W^T(V-WH) - B
$$

Setting $\frac{\partial J}{\partial H} = 0$ and taking the Hadamard product of both sides with $H$ gives

$$
-2\left(W^T(V-WH)\right) \odot H - B \odot H = O
$$

Since $B \odot H = O$, this reduces to $\left(W^T(V-WH)\right) \odot H = O$.
The update for $H$ is therefore:

$$
\frac{[W^TV]_{kj}}{[W^TWH]_{kj}} H_{kj} \rightarrow H_{kj}
$$
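Alternating the two update rules gives a complete iterative procedure. The sketch below is one possible arrangement, not a definitive implementation: the function name `nmf`, the random initialization, the fixed iteration count, and the `eps` safeguard in the denominators are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, d, n_iter=200, eps=1e-12, seed=0):
    """Alternate the multiplicative updates for W and H.

    Returns W (m x d), H (d x n) and the objective value after each iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, d)) + eps   # strictly positive initial basis
    H = rng.random((d, n)) + eps   # strictly positive initial coefficients
    history = []
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W_ij update, H fixed
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H_kj update, W fixed
        history.append(np.linalg.norm(V - W @ H, "fro") ** 2)
    return W, H, history

# Synthetic non-negative data of rank at most 3 (an illustrative choice).
rng = np.random.default_rng(42)
V = rng.random((8, 3)) @ rng.random((3, 10))

W, H, history = nmf(V, d=3)
print(history[0], history[-1])   # objective after the first and last iteration
```

Recording the objective after each sweep makes it easy to watch the reconstruction error fall and to pick a stopping criterion, for example a threshold on the change between consecutive values.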