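Problem: for the objective function below, what are its derivatives with respect to $P$ and $Q$, and how are they obtained? This question puzzled me for a long time and took quite a bit of effort to work out, so I am writing it down here.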
$$\min_{P,Q}\ F(P,Q)=\|X-PQ\|_F^2+\alpha \|P\|_F^2+\alpha \|Q\|_F^2$$

$$\frac{\partial F}{\partial P}=\textcolor{red}{?}, \quad \frac{\partial F}{\partial Q}=\textcolor{red}{?}$$

And what does the full derivation look like?
1. Common formulas:
$$\|X\|_F^2=tr(XX^T) \tag{54}$$

$$(A+B)^T=A^T+B^T \tag{4}$$

$$tr(A)=tr(A^T) \tag{13}$$

$$\frac{\partial \|x\|_2^2}{\partial x}=2x, \quad x \text{ is a vector} \tag{131}$$

$$\frac{\partial \|X\|_F^2}{\partial X}=2X, \quad X \text{ is a matrix} \tag{132}$$

$$\frac{\partial\, tr(F(X))}{\partial X}=f(X)^T, \quad \text{where } f(X) \text{ is the scalar derivative of } F(X) \tag{98-99}$$

$$\frac{\partial\, tr(XBX^T)}{\partial X}=XB^T+XB \tag{111}$$

$$\frac{\partial\, tr(AXBX^TC)}{\partial X}=A^TC^TXB^T+CAXB \tag{118}$$

$$\partial A=0, \quad A \text{ is a constant matrix} \tag{33}$$

$$\partial(XY)=(\partial X)Y+X(\partial Y), \quad \text{ordinary matrix product} \tag{37}$$

$$\partial(X\circ Y)=(\partial X)\circ Y+X\circ(\partial Y), \quad \text{Hadamard product} \tag{38}$$

$$\partial(X\otimes Y)=(\partial X)\otimes Y+X\otimes(\partial Y), \quad \text{Kronecker product} \tag{39}$$
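The identities above can be sanity-checked numerically. Below is a minimal pure-Python sketch (no NumPy, so it runs anywhere) that checks (54) and (13) directly and compares (111) against a central-difference gradient. The helper names (`matmul`, `numerical_grad`, and so on), the 4×3 and 3×3 sizes, and the random seed are illustrative assumptions, not part of the original post.

```python
import random

def matmul(A, B):
    # Product of A (m x k) and B (k x n); matrices are lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def numerical_grad(f, M, h=1e-6):
    # Central-difference estimate of df/dM, perturbing one entry at a time.
    G = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            Mp = [row[:] for row in M]
            Mm = [row[:] for row in M]
            Mp[i][j] += h
            Mm[i][j] -= h
            G[i][j] = (f(Mp) - f(Mm)) / (2 * h)
    return G

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 x 3
B = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # 3 x 3, generally not symmetric

# (54): ||X||_F^2 equals tr(X X^T)
err_54 = abs(sum(x * x for row in X for x in row) - trace(matmul(X, transpose(X))))

# (13): tr(A) = tr(A^T), here with A = X B X^T (4 x 4)
A = matmul(matmul(X, B), transpose(X))
err_13 = abs(trace(A) - trace(transpose(A)))

# (111): d tr(X B X^T) / dX = X B^T + X B
XBt, XB = matmul(X, transpose(B)), matmul(X, B)
analytic = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(XBt, XB)]
numeric = numerical_grad(lambda Y: trace(matmul(matmul(Y, B), transpose(Y))), X)
err_111 = max_abs_diff(analytic, numeric)

print(err_54, err_13, err_111)  # all three should be tiny (rounding error only)
```

Since $tr(XBX^T)$ is quadratic in $X$, the central difference is exact up to floating-point rounding, which is why a tight tolerance is reasonable here.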
2. Derivation:
First, expand the objective function:
$$
\begin{aligned}
F(P,Q) &= \|X-PQ\|_F^2+\alpha \|P\|_F^2+\alpha \|Q\|_F^2 \\
&= tr\big((X-PQ)(X-PQ)^T\big)+\alpha\, tr(PP^T)+\alpha\, tr(QQ^T) \\
&= tr\big((X-PQ)(X^T-Q^TP^T)\big)+\alpha\, tr(PP^T)+\alpha\, tr(QQ^T) \\
&= tr(XX^T)-tr(PQX^T)-tr(XQ^TP^T)+tr(PQQ^TP^T)+\alpha\, tr(PP^T)+\alpha\, tr(QQ^T) \\
&= tr(XX^T)-2\,tr(PQX^T)+tr(PQQ^TP^T)+\alpha\, tr(PP^T)+\alpha\, tr(QQ^T)
\end{aligned}
$$

The second line uses (54), the third uses (4) together with $(PQ)^T=Q^TP^T$, and the last step merges the two cross terms, because by (13) $tr(XQ^TP^T)=tr\big((PQX^T)^T\big)=tr(PQX^T)$.
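To confirm the expansion numerically, this sketch evaluates the first and last lines of the derivation on random matrices and also checks that the two merged cross terms coincide. It assumes the shapes $X\in\mathbb{R}^{m\times n}$, $P\in\mathbb{R}^{m\times k}$, $Q\in\mathbb{R}^{k\times n}$, which the original does not state; the concrete sizes, $\alpha=0.1$, the seed, and the helper names are hypothetical.

```python
import random

def matmul(A, B):
    # Product of A (m x k) and B (k x n); matrices are lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def fro2(A):
    # Squared Frobenius norm: sum of squared entries.
    return sum(x * x for row in A for x in row)

rng = random.Random(0)
m, n, k, alpha = 4, 3, 2, 0.1  # hypothetical: X is m x n, P is m x k, Q is k x n

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

X, P, Q = rand_matrix(m, n), rand_matrix(m, k), rand_matrix(k, n)
Xt, Pt, Qt = transpose(X), transpose(P), transpose(Q)
PQ = matmul(P, Q)

# First line of the derivation: the objective as written.
residual = [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, PQ)]
F_direct = fro2(residual) + alpha * fro2(P) + alpha * fro2(Q)

# The two cross terms that are merged in the last step.
cross1 = trace(matmul(PQ, Xt))             # tr(P Q X^T)
cross2 = trace(matmul(matmul(X, Qt), Pt))  # tr(X Q^T P^T)

# Last line of the derivation: the expanded trace form.
F_expanded = (trace(matmul(X, Xt)) - 2 * cross1
              + trace(matmul(matmul(PQ, Qt), Pt))
              + alpha * trace(matmul(P, Pt)) + alpha * trace(matmul(Q, Qt)))

print(abs(cross1 - cross2), abs(F_direct - F_expanded))  # both should be near zero
```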
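Now differentiate term by term with respect to $P$. The terms $tr(XX^T)$ and $\alpha\, tr(QQ^T)$ do not depend on $P$, so their derivatives are 0 by (33). For the cross term, the standard result $\partial\, tr(PA)/\partial P=A^T$ with $A=QX^T$ gives $(QX^T)^T$. The term $tr(PQQ^TP^T)$ is (111) with $B=QQ^T$, and $\alpha\, tr(PP^T)=\alpha\, tr(PIP^T)$ is (111) with $B=I$ (the identity matrix). Therefore:

$$
\begin{aligned}
\frac{\partial F}{\partial P} &= 0-2(QX^T)^T+P(QQ^T)^T+P(QQ^T)+\alpha P I^T+\alpha P I+0 \\
&= -2XQ^T+PQQ^T+PQQ^T+2\alpha P \\
&= -2XQ^T+2PQQ^T+2\alpha P
\end{aligned}
$$

where $(QQ^T)^T=QQ^T$ because $QQ^T$ is symmetric.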
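The formula for $\partial F/\partial P$ can be checked against finite differences. This is a minimal pure-Python sketch under assumed shapes ($m=4$, $n=3$, $k=2$, $\alpha=0.1$, all hypothetical, as are the helper names). Because $F$ is quadratic in $P$ for fixed $Q$, central differences are exact up to rounding, so a tight tolerance is appropriate.

```python
import random

def matmul(A, B):
    # Product of A (m x k) and B (k x n); matrices are lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def lincomb(*terms):
    # Sum of c * M over (c, M) pairs; all matrices M must have the same shape.
    first = terms[0][1]
    return [[sum(c * M[i][j] for c, M in terms) for j in range(len(first[0]))]
            for i in range(len(first))]

def fro2(A):
    # Squared Frobenius norm.
    return sum(x * x for row in A for x in row)

def objective(X, P, Q, alpha):
    # F(P, Q) = ||X - PQ||_F^2 + alpha ||P||_F^2 + alpha ||Q||_F^2
    return fro2(lincomb((1.0, X), (-1.0, matmul(P, Q)))) + alpha * (fro2(P) + fro2(Q))

def numerical_grad(f, M, h=1e-6):
    # Central-difference estimate of df/dM, perturbing one entry at a time.
    G = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            Mp = [row[:] for row in M]
            Mm = [row[:] for row in M]
            Mp[i][j] += h
            Mm[i][j] -= h
            G[i][j] = (f(Mp) - f(Mm)) / (2 * h)
    return G

rng = random.Random(1)
m, n, k, alpha = 4, 3, 2, 0.1  # hypothetical sizes and regularization weight

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

X, P, Q = rand_matrix(m, n), rand_matrix(m, k), rand_matrix(k, n)
Qt = transpose(Q)

# Analytic result: dF/dP = -2 X Q^T + 2 P Q Q^T + 2 alpha P
grad_P = lincomb((-2.0, matmul(X, Qt)), (2.0, matmul(P, matmul(Q, Qt))), (2.0 * alpha, P))
num_P = numerical_grad(lambda P_: objective(X, P_, Q, alpha), P)
err_P = max(abs(a - b) for ra, rb in zip(grad_P, num_P) for a, b in zip(ra, rb))
print(err_P)  # should be tiny (rounding error only)
```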
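Similarly for $Q$: by the cyclic property of the trace, $tr(PQX^T)=tr(X^TPQ)$, so its derivative is $(X^TP)^T=P^TX$. The term $tr(PQQ^TP^T)$ is (118) with $A=P$, $B=I$, $C=P^T$, which gives $P^TPQ+P^TPQ$, and $\alpha\, tr(QQ^T)$ gives $2\alpha Q$ by (111) with $B=I$. Therefore:

$$
\begin{aligned}
\frac{\partial F}{\partial Q} &= 0-2P^TX+P^TPQ+P^TPQ+2\alpha Q \\
&= -2P^TX+2P^TPQ+2\alpha Q
\end{aligned}
$$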
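The same kind of finite-difference check works for $\partial F/\partial Q$. As before, this is a sketch under assumed shapes and hypothetical helper names ($m=4$, $n=3$, $k=2$, $\alpha=0.1$); $F$ is quadratic in $Q$ for fixed $P$, so the central difference should agree with the analytic gradient up to rounding.

```python
import random

def matmul(A, B):
    # Product of A (m x k) and B (k x n); matrices are lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def lincomb(*terms):
    # Sum of c * M over (c, M) pairs; all matrices M must have the same shape.
    first = terms[0][1]
    return [[sum(c * M[i][j] for c, M in terms) for j in range(len(first[0]))]
            for i in range(len(first))]

def fro2(A):
    # Squared Frobenius norm.
    return sum(x * x for row in A for x in row)

def objective(X, P, Q, alpha):
    # F(P, Q) = ||X - PQ||_F^2 + alpha ||P||_F^2 + alpha ||Q||_F^2
    return fro2(lincomb((1.0, X), (-1.0, matmul(P, Q)))) + alpha * (fro2(P) + fro2(Q))

def numerical_grad(f, M, h=1e-6):
    # Central-difference estimate of df/dM, perturbing one entry at a time.
    G = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            Mp = [row[:] for row in M]
            Mm = [row[:] for row in M]
            Mp[i][j] += h
            Mm[i][j] -= h
            G[i][j] = (f(Mp) - f(Mm)) / (2 * h)
    return G

rng = random.Random(2)
m, n, k, alpha = 4, 3, 2, 0.1  # hypothetical sizes and regularization weight

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

X, P, Q = rand_matrix(m, n), rand_matrix(m, k), rand_matrix(k, n)
Pt = transpose(P)

# Analytic result: dF/dQ = -2 P^T X + 2 P^T P Q + 2 alpha Q
grad_Q = lincomb((-2.0, matmul(Pt, X)), (2.0, matmul(matmul(Pt, P), Q)), (2.0 * alpha, Q))
num_Q = numerical_grad(lambda Q_: objective(X, P, Q_, alpha), Q)
err_Q = max(abs(a - b) for ra, rb in zip(grad_Q, num_Q) for a, b in zip(ra, rb))
print(err_Q)  # should be tiny (rounding error only)
```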