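Matrix Differentiation

This post covers matrix differentiation in detail: the standard gradient formulas, properties of the matrix trace, and several important properties of matrix derivatives, with proofs that show how to differentiate a function of a matrix. It focuses on the form the derivative takes when the argument is a scalar, a vector, or a matrix, and on how to use matrix differentiation to solve practical problems.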

Matrix Differentiation (Part 1)

Standard gradient formulas

Scalar argument

$$ Df(x) = \lim_{t \to 0} \frac{f(x+t) - f(x)}{t} $$

Vector argument

$$ D_{\mathbf{w}} f(\mathbf{x}) = \lim_{t \to 0} \frac{f(\mathbf{x} + t\mathbf{w}) - f(\mathbf{x})}{t} $$

Matrix argument

$$ D_{W} f(X) = \lim_{t \to 0} \frac{f(X + tW) - f(X)}{t} $$
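The limit above can be approximated numerically. The following is a minimal sketch in plain Python (no external libraries); the example function `f`, the helper names, and the step size `t` are assumptions chosen for illustration, and a central difference replaces the one-sided quotient for better accuracy.

```python
def add_scaled(X, W, t):
    """Return X + t*W for matrices stored as lists of rows."""
    return [[x + t * w for x, w in zip(x_row, w_row)] for x_row, w_row in zip(X, W)]

def f(X):
    """Example function (an assumption): the sum of squared entries of X."""
    return sum(x * x for row in X for x in row)

def directional_derivative(f, X, W, t=1e-6):
    """Central-difference approximation of D_W f(X)."""
    return (f(add_scaled(X, W, t)) - f(add_scaled(X, W, -t))) / (2 * t)

X = [[1.0, 2.0], [3.0, 4.0]]
W = [[0.5, -1.0], [2.0, 0.0]]

approx = directional_derivative(f, X, W)
# For this particular f, the exact directional derivative is 2 * sum_ij w_ij * x_ij.
exact = 2 * sum(w * x for w_row, x_row in zip(W, X) for w, x in zip(w_row, x_row))
print(approx, exact)
```

The two printed numbers should agree up to floating-point rounding.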

Properties of the matrix trace

Property 1

$$ \operatorname{tr}(A) = \operatorname{tr}(A^{T}) $$

Property 2

$$ \operatorname{tr}(AB) = \operatorname{tr}(BA) $$

More generally, the trace is invariant under cyclic permutations of a product:

$$ \operatorname{tr}(ABCD) = \operatorname{tr}(DABC) = \operatorname{tr}(CDAB) = \operatorname{tr}(BCDA) $$

Property 3

$$ \operatorname{tr}(A+B) = \operatorname{tr}(A) + \operatorname{tr}(B) $$

Property 4

$$ \operatorname{tr}(\alpha A) = \alpha \operatorname{tr}(A) $$

Property 5

Let H and U both be n × m matrices. Then:

$$ \sum_{j=1}^{m} \sum_{i=1}^{n} h_{ij} u_{ij} = \sum_{j=1}^{m} \sum_{i=1}^{n} (H^{T})_{ji} u_{ij} = \operatorname{tr}(H^{T}U) $$
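A quick numerical check can make trace Properties 2 and 5 concrete. This is a sketch in plain Python; the helper functions and the sample matrices are assumptions chosen for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# Property 2: tr(AB) = tr(BA), even though AB is 2 x 2 and BA is 3 x 3.
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # 2 x 3
B = [[1.0, 0.0], [2.0, -1.0], [0.5, 3.0]]     # 3 x 2
tr_AB = trace(matmul(A, B))
tr_BA = trace(matmul(B, A))

# Property 5: the entrywise sum of h_ij * u_ij equals tr(H^T U).
H = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # n x m with n = 2, m = 3
U = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]]       # same shape as H
entrywise_sum = sum(h * u for h_row, u_row in zip(H, U) for h, u in zip(h_row, u_row))
tr_HtU = trace(matmul(transpose(H), U))

print(tr_AB, tr_BA)
print(entrywise_sum, tr_HtU)
```

Each printed pair should contain two equal numbers.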

Properties of matrix derivatives

Let f be a scalar-valued function of an m × n matrix A, written f(A). The derivative of f(A) with respect to A is:

$$ \nabla_{A} f(A) = \frac{\partial f(A)}{\partial A} = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \frac{\partial f}{\partial A_{12}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\ \frac{\partial f}{\partial A_{21}} & \frac{\partial f}{\partial A_{22}} & \cdots & \frac{\partial f}{\partial A_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \frac{\partial f}{\partial A_{m2}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix} $$

Property 1

$$ \nabla_{A^{T}} f(A) = \left( \nabla_{A} f(A) \right)^{T} $$

Proof

$$ \nabla_{A^{T}} f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \frac{\partial f}{\partial A_{21}} & \cdots & \frac{\partial f}{\partial A_{m1}} \\ \frac{\partial f}{\partial A_{12}} & \frac{\partial f}{\partial A_{22}} & \cdots & \frac{\partial f}{\partial A_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{1n}} & \frac{\partial f}{\partial A_{2n}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix} = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \frac{\partial f}{\partial A_{12}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\ \frac{\partial f}{\partial A_{21}} & \frac{\partial f}{\partial A_{22}} & \cdots & \frac{\partial f}{\partial A_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \frac{\partial f}{\partial A_{m2}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix}^{T} = \left( \frac{\partial f(A)}{\partial A} \right)^{T} = \left( \nabla_{A} f(A) \right)^{T} $$

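Property 2

Suppose there exists a matrix U such that the following holds for every W:

$$ D_{W} f(X) = \lim_{t \to 0} \frac{f(X + tW) - f(X)}{t} = \operatorname{tr}(W^{T}U) $$

Such a U exists whenever f is differentiable at X, because the directional derivative is then linear in W; functions defined through a trace, as in the properties below, are the typical case.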

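Now take W to be the single-entry matrix $E_{ij}$, which has a 1 in position $(i, j)$ and zeros everywhere else. Perturbing X along $E_{ij}$ changes only the entry $x_{ij}$, so the directional derivative is exactly the partial derivative with respect to $x_{ij}$:

$$ D_{E_{ij}} f(X) = \frac{\partial f(X)}{\partial x_{ij}} = \operatorname{tr}(E_{ij}^{T}U) = \sum_{l} \sum_{k} (E_{ij})_{kl}\, u_{kl} = u_{ij} $$

This is just the usual notion of a partial derivative: every entry other than $x_{ij}$ is held constant. In the double sum, every term vanishes except the one where $(E_{ij})_{kl} = 1$, so only $u_{ij}$ survives.

The partial derivative with respect to each entry $x_{ij}$ is therefore $u_{ij}$. Because the assumed identity holds for every W, it holds for every $E_{ij}$, and collecting the results for all $(i, j)$ into one matrix gives U itself. Hence the derivative of f(X) with respect to X equals U:
$$ \frac{\partial f(X)}{\partial X} = U $$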
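To illustrate the argument, the sketch below builds U one entry at a time from single-entry directions and then checks that the same U predicts the directional derivative along an arbitrary W. It is plain Python; the example function (the determinant of a 2 × 2 matrix), the helper names, and the step size are assumptions made for this illustration.

```python
def f(X):
    """Example function (an assumption): determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def add_scaled(X, W, t):
    """Return X + t*W for matrices stored as lists of rows."""
    return [[x + t * w for x, w in zip(x_row, w_row)] for x_row, w_row in zip(X, W)]

def directional_derivative(f, X, W, t=1e-6):
    """Central-difference approximation of D_W f(X)."""
    return (f(add_scaled(X, W, t)) - f(add_scaled(X, W, -t))) / (2 * t)

def single_entry(rows, cols, i, j):
    """Matrix with a 1 at position (i, j) and zeros elsewhere."""
    E = [[0.0] * cols for _ in range(rows)]
    E[i][j] = 1.0
    return E

X = [[1.0, 2.0], [3.0, 4.0]]
rows, cols = len(X), len(X[0])

# u_ij is the directional derivative along the single-entry matrix for (i, j).
U = [[directional_derivative(f, X, single_entry(rows, cols, i, j))
      for j in range(cols)] for i in range(rows)]

# The same U should reproduce D_W f(X) = tr(W^T U) for any other direction W.
W = [[0.5, -1.0], [2.0, 1.5]]
lhs = directional_derivative(f, X, W)
rhs = sum(w * u for w_row, u_row in zip(W, U) for w, u in zip(w_row, u_row))
print(U)
print(lhs, rhs)
```

For the determinant, U should come out close to the matrix with rows (4, -3) and (-2, 1), and the two numbers on the last line should agree up to rounding.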

The value of this result is that a problem of the form "given a function f(X) of a matrix X, find its derivative with respect to X" turns immediately into the problem of finding U, and U can be found from the standard derivative formula above. To summarize the steps:

  • Compute $\lim_{t \to 0} \frac{f(X + tW) - f(X)}{t}$ and simplify it until it takes the form $\operatorname{tr}(W^{T}Q)$.

  • Since $\frac{\partial f(X)}{\partial X} = U$, we have $\operatorname{tr}(W^{T}Q) = \operatorname{tr}(W^{T}U)$ for every W, and therefore $\frac{\partial f(X)}{\partial X} = U = Q$.
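As a worked instance of the two steps above, take $f(X) = \operatorname{tr}(X^{T}X)$. This function is an example chosen here for illustration and does not appear in the original post. Expanding the product and using the trace properties gives:

```latex
\begin{aligned}
D_{W} f(X)
  &= \lim_{t \to 0} \frac{\operatorname{tr}\big((X + tW)^{T}(X + tW)\big) - \operatorname{tr}(X^{T}X)}{t} \\
  &= \lim_{t \to 0} \frac{t\,\operatorname{tr}(X^{T}W) + t\,\operatorname{tr}(W^{T}X) + t^{2}\,\operatorname{tr}(W^{T}W)}{t} \\
  &= \operatorname{tr}(X^{T}W) + \operatorname{tr}(W^{T}X) \\
  &= 2\,\operatorname{tr}(W^{T}X) \qquad \text{since } \operatorname{tr}(X^{T}W) = \operatorname{tr}\big((X^{T}W)^{T}\big) = \operatorname{tr}(W^{T}X) \\
  &= \operatorname{tr}\big(W^{T}(2X)\big)
\end{aligned}
```

The limit has the form $\operatorname{tr}(W^{T}Q)$ with $Q = 2X$, so $\frac{\partial \operatorname{tr}(X^{T}X)}{\partial X} = 2X$.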

Property 3

$$ \frac{\partial \operatorname{tr}(AX)}{\partial X} = A^{T} $$

Proof

Let:
$$ f(X) = \operatorname{tr}(AX) $$
By the result above, it is enough to simplify the following limit in order to obtain $\frac{\partial \operatorname{tr}(AX)}{\partial X}$:
$$
\begin{aligned}
D_{W} f(X) &= \lim_{t \to 0} \frac{f(X + tW) - f(X)}{t} \\
&= \lim_{t \to 0} \frac{\operatorname{tr}(A(X + tW)) - \operatorname{tr}(AX)}{t} \\
&= \lim_{t \to 0} \frac{\operatorname{tr}(AX + tAW) - \operatorname{tr}(AX)}{t} \\
&= \lim_{t \to 0} \frac{\operatorname{tr}(AX) + \operatorname{tr}(tAW) - \operatorname{tr}(AX)}{t} \\
&= \lim_{t \to 0} \frac{\operatorname{tr}(tAW)}{t} \\
&= \lim_{t \to 0} \frac{t\,\operatorname{tr}(AW)}{t} \\
&= \lim_{t \to 0} \operatorname{tr}(AW) \\
&= \operatorname{tr}(AW) \\
&= \operatorname{tr}((AW)^{T}) \\
&= \operatorname{tr}(W^{T}A^{T})
\end{aligned}
$$

Therefore:
$$ D_{W} f(X) = \operatorname{tr}(W^{T}A^{T}) = \operatorname{tr}(W^{T}U) $$

$$ U = A^{T} $$

This proves:
$$ \frac{\partial \operatorname{tr}(AX)}{\partial X} = U = A^{T} $$
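The result can also be checked numerically by differentiating tr(AX) entry by entry. This is a sketch in plain Python; the helper names, the sample matrices, and the finite-difference step are assumptions made for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def numerical_gradient(f, X, t=1e-6):
    """Entrywise central-difference approximation of the gradient of f at X."""
    grad = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [r[:] for r in X]
            X_minus = [r[:] for r in X]
            X_plus[i][j] += t
            X_minus[i][j] -= t
            row.append((f(X_plus) - f(X_minus)) / (2 * t))
        grad.append(row)
    return grad

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]       # 2 x 3
X = [[0.5, -1.0], [2.0, 0.0], [1.0, 3.0]]    # 3 x 2, so AX is 2 x 2

grad = numerical_gradient(lambda M: trace(matmul(A, M)), X)
expected = transpose(A)                       # the claimed derivative, 3 x 2
max_error = max(abs(g - e) for g_row, e_row in zip(grad, expected)
                for g, e in zip(g_row, e_row))
print(max_error)
```

The printed error should be on the order of floating-point rounding. Note that the gradient does not depend on the particular X, as expected for a function that is linear in X.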

Property 4

$$ \frac{\partial \operatorname{tr}(X^{T}A^{T})}{\partial X} = A^{T} $$
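The post states Property 4 without proof. One short way to see it, using only trace Property 1 and Property 3 above, is the following sketch:

```latex
\operatorname{tr}(X^{T}A^{T})
  = \operatorname{tr}\big((AX)^{T}\big)
  = \operatorname{tr}(AX)
\quad\Longrightarrow\quad
\frac{\partial \operatorname{tr}(X^{T}A^{T})}{\partial X}
  = \frac{\partial \operatorname{tr}(AX)}{\partial X}
  = A^{T}
```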

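Property 5

$$ \nabla_{X} \operatorname{tr}(X) = \operatorname{tr}(\nabla_{X} X) $$

The trace is linear, so it commutes with differentiation; in differential form, $d\operatorname{tr}(X) = \operatorname{tr}(dX)$. Entrywise, for a square matrix X, $\nabla_{X} \operatorname{tr}(X) = I$.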

Property 6

$$ \nabla_{X} \operatorname{tr}(AXBX^{T}C) = A^{T}C^{T}XB^{T} + CAXB $$

Property 7

$$ \nabla_{X} \operatorname{tr}(XBX^{T}C) = C^{T}XB^{T} + CXB $$
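Properties 6 and 7 are stated without proof, so a numerical comparison against finite differences is a useful sanity check. This is a sketch in plain Python; the helper names, the matrix shapes, and the sample values are assumptions chosen so that every product is defined.

```python
from functools import reduce

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def chain(*matrices):
    """Left-to-right product of several matrices."""
    return reduce(matmul, matrices)

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def add(P, Q):
    return [[p + q for p, q in zip(p_row, q_row)] for p_row, q_row in zip(P, Q)]

def numerical_gradient(f, X, t=1e-5):
    """Entrywise central-difference approximation of the gradient of f at X."""
    grad = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [r[:] for r in X]
            X_minus = [r[:] for r in X]
            X_plus[i][j] += t
            X_minus[i][j] -= t
            row.append((f(X_plus) - f(X_minus)) / (2 * t))
        grad.append(row)
    return grad

def max_abs_diff(P, Q):
    return max(abs(p - q) for p_row, q_row in zip(P, Q) for p, q in zip(p_row, q_row))

A = [[1.0, 2.0], [0.0, -1.0]]                               # 2 x 2
B = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, -1.0]]    # 3 x 3
C = [[0.5, 1.0], [2.0, 0.0]]                                # 2 x 2
X = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]]                     # 2 x 3

# Property 6: gradient of tr(A X B X^T C)
f6 = lambda M: trace(chain(A, M, B, transpose(M), C))
formula6 = add(chain(transpose(A), transpose(C), X, transpose(B)), chain(C, A, X, B))
err6 = max_abs_diff(numerical_gradient(f6, X), formula6)

# Property 7: gradient of tr(X B X^T C), i.e. Property 6 with A equal to the identity
f7 = lambda M: trace(chain(M, B, transpose(M), C))
formula7 = add(chain(transpose(C), X, transpose(B)), chain(C, X, B))
err7 = max_abs_diff(numerical_gradient(f7, X), formula7)

print(err6, err7)
```

Both printed errors should be tiny compared with the size of the gradient entries, which is consistent with the two formulas.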
