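Linear Algebra (5): A Supplement on Matrix Calculus

This article covers the core concepts of matrix calculus: the Jacobian matrix, rules for computing partial derivatives, and derivative layout conventions, including the differentiation of vectors and matrices. The results are used in machine learning, deep learning, and related fields.

A Supplement on Matrix Calculus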

Convention 1:

Let $\mathbf y=f(\mathbf x)$, where $\mathbf y$ is a vector with $m$ elements and $\mathbf x$ is a vector with $n$ elements. Then:
$$
\frac{\partial \mathbf y}{\partial \mathbf x} =
\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{matrix} \right]
$$
This $m \times n$ matrix collects the partial derivatives of $\mathbf y$ with respect to $\mathbf x$. It is called the Jacobian matrix.

Note: if $\mathbf x$ is a scalar, the Jacobian is an $m \times 1$ column vector. If $\mathbf y$ is a scalar, the Jacobian is a $1 \times n$ row vector.
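As a rough illustration of Convention 1, the sketch below approximates a Jacobian with central differences and checks its shape. The helper `numerical_jacobian` and the example map `f` are hypothetical names introduced here for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian; entry (i, j) approximates dy_i/dx_j."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2 * h))
    return np.stack(cols, axis=1)   # one column per input element

# Hypothetical example map from R^2 to R^3 (m = 3, n = 2).
def f(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

x0 = np.array([1.0, 2.0])
J = numerical_jacobian(f, x0)

# Hand-derived Jacobian of f for comparison.
exact = np.array([[x0[1], x0[0]],
                  [np.cos(x0[0]), 0.0],
                  [0.0, 2 * x0[1]]])

print(J.shape)   # (3, 2): m rows, n columns
print(np.allclose(J, exact, atol=1e-6))
```

The rows are indexed by the elements of $\mathbf y$ and the columns by the elements of $\mathbf x$, matching the $m \times n$ shape stated above.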

Proposition 1:

Let
$$
\mathbf y=A\mathbf x
$$
where $\mathbf y$ is an $m \times 1$ column vector, $\mathbf x$ is an $n\times 1$ column vector, and $A$ is an $m \times n$ matrix that does not depend on $\mathbf x$. Then
$$
\frac{\partial \mathbf y}{\partial \mathbf x}=A
$$
Proof:

The $i$-th element of $\mathbf y$ is
$$
y_i=\sum^n_{k=1}a_{ik}x_k
$$
so
$$
\frac{\partial y_i}{\partial x_j}=a_{ij}
$$
for all $i=1,2,\cdots,m$ and $j=1,2,\cdots,n$. Hence
$$
\frac{\partial \mathbf y}{\partial \mathbf x}=A
$$
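Proposition 1 can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated $A$ and $\mathbf x$ (the sizes and seed are arbitrary choices made here): it builds the Jacobian of $A\mathbf x$ column by column with central differences and compares it with $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Column j of the Jacobian is (y(x + h e_j) - y(x - h e_j)) / (2h).
h = 1e-6
J = np.zeros((m, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = h
    J[:, j] = (A @ (x + e) - A @ (x - e)) / (2 * h)

# The finite-difference Jacobian should agree with A up to rounding error.
print(np.allclose(J, A, atol=1e-6))
```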
Proposition 2:

Let
$$
\mathbf y=A\mathbf x
$$
where $\mathbf y$ is an $m \times 1$ column vector, $\mathbf x$ is an $n\times 1$ column vector, and $A$ is an $m \times n$ matrix that does not depend on $\mathbf x$. Suppose in addition that $\mathbf x$ is a function of a vector $\mathbf z$. Then
$$
\frac{\partial \mathbf y}{\partial \mathbf z}=A\frac{\partial \mathbf x}{\partial \mathbf z}
$$
Proof:

The $i$-th element of $\mathbf y$ is
$$
y_i=\sum^n_{k=1}a_{ik}x_k
$$
Differentiating with respect to $z_j$ gives
$$
\frac{\partial y_i}{\partial z_j}=\sum^n_{k=1}a_{ik}\frac{\partial x_k}{\partial z_j}
$$
The right-hand side is exactly the $(i,j)$ element of $A\,{\partial \mathbf x}/{\partial \mathbf z}$, so
$$
\frac {\partial \mathbf y}{\partial \mathbf z}=
\frac {\partial \mathbf y}{\partial \mathbf x}
\frac {\partial \mathbf x}{\partial \mathbf z}=
A\frac {\partial \mathbf x}{\partial \mathbf z}
$$
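The chain rule in Proposition 2 can be sketched numerically as follows. The inner map `x_of_z`, its hand-derived Jacobian `dx_dz`, and the helper `numerical_jacobian` are hypothetical choices made for this example, and NumPy is assumed.

```python
import numpy as np

def numerical_jacobian(f, z, h=1e-6):
    """Central-difference Jacobian; entry (i, j) approximates df_i/dz_j."""
    z = np.asarray(z, dtype=float)
    cols = []
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        cols.append((np.atleast_1d(f(z + e)) - np.atleast_1d(f(z - e))) / (2 * h))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))   # m = 2, n = 3, independent of x and z

def x_of_z(z):
    # Hypothetical inner map x(z) from R^2 to R^3.
    return np.array([np.sin(z[0]), z[0] * z[1], z[1] ** 2])

def dx_dz(z):
    # Hand-derived Jacobian of x_of_z, shape (3, 2).
    return np.array([[np.cos(z[0]), 0.0],
                     [z[1], z[0]],
                     [0.0, 2 * z[1]]])

z0 = np.array([0.5, -1.5])
lhs = numerical_jacobian(lambda z: A @ x_of_z(z), z0)   # dy/dz, shape (2, 2)
rhs = A @ dx_dz(z0)                                     # A times dx/dz
print(np.allclose(lhs, rhs, atol=1e-6))
```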
Proposition 3:

Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf y^TA \mathbf x
$$
where $\mathbf y$ is an $m \times 1$ column vector, $\mathbf x$ is an $n\times 1$ column vector, and $A$ is an $m \times n$ matrix that does not depend on $\mathbf x$ or $\mathbf y$. Then
$$
\frac {\partial \alpha}{\partial \mathbf x}=\mathbf y^TA
$$
and
$$
\frac {\partial \alpha}{\partial \mathbf y}=\mathbf x^TA^T
$$
Proof:

Define
$$
\mathbf w^T=\mathbf y^TA
$$
so that $\alpha$ can be written as
$$
\alpha=\mathbf w^T \mathbf x
$$
By Proposition 1 (with the $1 \times n$ matrix $\mathbf w^T$ in place of $A$) we obtain
$$
\frac {\partial \alpha}{\partial \mathbf x}=\mathbf w^T=\mathbf y^TA
$$
which is the first result. Since $\alpha$ is a scalar, we also have
$$
\alpha=\alpha^T=\mathbf x^TA^T\mathbf y
$$
Applying Proposition 1 again gives
$$
\frac {\partial \alpha}{\partial \mathbf y}=\mathbf x^TA^T
$$
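Both results of Proposition 3 can be compared against finite differences. This is only a sketch under stated assumptions: NumPy is available, the data is random, and `numerical_jacobian` is a hypothetical helper defined here.

```python
import numpy as np

def numerical_jacobian(f, v, h=1e-6):
    """Central-difference Jacobian; a scalar-valued f yields a 1 x n row."""
    v = np.asarray(v, dtype=float)
    cols = []
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = h
        cols.append((np.atleast_1d(f(v + e)) - np.atleast_1d(f(v - e))) / (2 * h))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(2)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(m)

d_dx = numerical_jacobian(lambda v: y @ A @ v, x)   # shape (1, n)
d_dy = numerical_jacobian(lambda v: v @ A @ x, y)   # shape (1, m)

row_x = (y @ A)[None, :]   # y^T A as a 1 x n row vector
row_y = (A @ x)[None, :]   # x^T A^T = (A x)^T as a 1 x m row vector

print(np.allclose(d_dx, row_x, atol=1e-6), np.allclose(d_dy, row_y, atol=1e-6))
```

Both derivatives come out as row vectors, consistent with the Jacobian convention used in this supplement.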
Proposition 4:

Consider the special case in which the scalar $\alpha$ is a quadratic form:
$$
\alpha=\mathbf x^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector and $A$ is an $n \times n$ matrix that does not depend on $\mathbf x$. Then
$$
\frac {\partial \alpha}{\partial \mathbf x}=\mathbf x^T(A+A^T)
$$
Proof:

By definition,
$$
\alpha=\sum^n_{j=1}\sum^n_{i=1}a_{ij}x_ix_j
$$
Differentiating with respect to the $k$-th element of $\mathbf x$ gives, for each $k=1,2,\cdots,n$,
$$
\frac{\partial \alpha}{\partial x_k}=\sum^n_{j=1}a_{kj}x_j+\sum^n_{i=1}a_{ik}x_i
$$
The first sum is the $k$-th element of $\mathbf x^TA^T$ and the second is the $k$-th element of $\mathbf x^TA$, so
$$
\frac{\partial \alpha}{\partial \mathbf x}=\mathbf x^TA^T+\mathbf x^TA=\mathbf x^T(A^T+A)
$$
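The sketch below checks Proposition 4 with a deliberately non-symmetric matrix, which also shows why the shortcut $2\mathbf x^TA$ is not valid in general. NumPy, the random data, and the helper name `numerical_jacobian` are assumptions of this example.

```python
import numpy as np

def numerical_jacobian(f, v, h=1e-6):
    """Central-difference Jacobian; a scalar-valued f yields a 1 x n row."""
    v = np.asarray(v, dtype=float)
    cols = []
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = h
        cols.append((np.atleast_1d(f(v + e)) - np.atleast_1d(f(v - e))) / (2 * h))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))   # random, hence not symmetric
x = rng.standard_normal(n)

grad_row = numerical_jacobian(lambda v: v @ A @ v, x)[0]   # length-n row
formula = x @ (A + A.T)    # x^T (A + A^T), Proposition 4
shortcut = 2 * x @ A       # 2 x^T A, only correct when A is symmetric

print(np.allclose(grad_row, formula, atol=1e-6))
print(np.allclose(grad_row, shortcut, atol=1e-6))
```

For a random non-symmetric `A`, the first comparison should succeed and the second should fail.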

Note: this result differs slightly from the one in Chapter 4, which states:
$$
\frac{\partial \alpha}{\partial \mathbf x}=(A^T+A) \mathbf x
$$
The two results differ only by a transpose:
$$
((A^T+A) \mathbf x)^T=\mathbf x^T(A+A^T)=\mathbf x^T(A^T+A)
$$
Both are vectors holding the same partial derivatives. The transpose only changes whether the entries are laid out vertically or horizontally ($(A^T+A) \mathbf x$ is a column vector, $\mathbf x^T(A+A^T)$ is a row vector), so the two results are essentially the same.

So where does the difference come from?

In Chapter 4, the derivative with respect to a matrix (or vector) takes the shape of the variable being differentiated against:
$$
\nabla_Af(A) \in \mathbb{R}^{m \times n} =
\left[ \begin{matrix}
\frac{\partial f(A)}{\partial A_{11}} & \frac{\partial f(A)}{\partial A_{12}} & \cdots & \frac{\partial f(A)}{\partial A_{1n}}\\
\frac{\partial f(A)}{\partial A_{21}} & \frac{\partial f(A)}{\partial A_{22}} & \cdots & \frac{\partial f(A)}{\partial A_{2n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f(A)}{\partial A_{m1}} & \frac{\partial f(A)}{\partial A_{m2}} & \cdots & \frac{\partial f(A)}{\partial A_{mn}}
\end{matrix} \right]
$$
In this supplement, by contrast, the derivative always follows the Jacobian form:
$$
\frac{\partial \mathbf y}{\partial \mathbf x} =
\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{matrix} \right]
$$
So under the Chapter 4 definition, the derivative of a scalar with respect to a column vector is a column vector.

Under the Jacobian definition used in this supplement, the derivative of a scalar with respect to a vector is a row vector. This is where the transpose comes from.

In fact, matrix calculus has no single agreed notation for many of its derivatives. The conventions fall roughly into two layouts:

  • Numerator layout
  • Denominator layout

  1. Numerator layout

In
$$
\frac{\partial \mathbf y}{\partial \mathbf x}
$$
treat the numerator vector $\mathbf y$ as a column vector and the denominator vector $\mathbf x$ as a row vector (a vector on its own is neither a row nor a column; the orientation is purely a convention). The result is the Jacobian matrix:
$$
\frac{\partial \mathbf y}{\partial \mathbf x} =
\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{matrix} \right]
$$
If the numerator vector $\mathbf y$ reduces to a scalar $y$:
$$
\frac{\partial y}{\partial \mathbf x} =
\left[ \begin{matrix}
\frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n}
\end{matrix} \right]
$$
If the denominator vector $\mathbf x$ reduces to a scalar $x$:
$$
\frac{\partial \mathbf y}{\partial x} =
\left[ \begin{matrix}
\frac{\partial y_1}{\partial x} \\
\frac{\partial y_2}{\partial x} \\
\vdots \\
\frac{\partial y_m}{\partial x}
\end{matrix} \right]
$$
The following case is normally written only in numerator layout.

The numerator is a matrix $Y$ and the denominator is a scalar $x$:
$$
\frac{\partial Y}{\partial x} =
\left[ \begin{matrix}
\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x}\\
\frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x}
\end{matrix} \right]
$$

  2. Denominator layout

In
$$
\frac{\partial \mathbf y}{\partial \mathbf x}
$$
treat the numerator vector $\mathbf y$ as a row vector and the denominator vector $\mathbf x$ as a column vector (again, the orientation is purely a convention). The result is:
$$
\frac{\partial \mathbf y}{\partial \mathbf x} =
\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{matrix} \right]
$$
If the numerator vector $\mathbf y$ reduces to a scalar $y$:
$$
\frac{\partial y}{\partial \mathbf x} =
\left[ \begin{matrix}
\frac{\partial y}{\partial x_1} \\
\frac{\partial y}{\partial x_2} \\
\vdots \\
\frac{\partial y}{\partial x_n}
\end{matrix} \right]
$$
If the denominator vector $\mathbf x$ reduces to a scalar $x$:
$$
\frac{\partial \mathbf y}{\partial x} =
\left[ \begin{matrix}
\frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x}
\end{matrix} \right]
$$
The following case is normally written only in denominator layout.

The numerator is a scalar $y$ and the denominator is a matrix $X$:
$$
\frac{\partial y}{\partial X} =
\left[ \begin{matrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1n}}\\
\frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y}{\partial x_{m1}} & \frac{\partial y}{\partial x_{m2}} & \cdots & \frac{\partial y}{\partial x_{mn}}
\end{matrix} \right]
$$
This denominator-layout derivative is exactly the gradient introduced in Chapter 4.

Comparing the two, the numerator layout and the denominator layout differ only by a transpose.

A mnemonic for the two layouts: the quantity the layout is named after is the one treated as a column, and it is the one that stays fixed along each row of the result.

For example, in numerator layout the numerator is treated as a column vector, and every entry in a given row of the resulting matrix has the same numerator.

In practice, however, one fixes at the outset whether $\mathbf x,\mathbf y$ are column vectors or row vectors (column vectors are assumed below). Then:

Numerator layout:
$$
\frac{\partial \mathbf x}{\partial \mathbf x^T} =
\left[ \begin{matrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{matrix} \right]=I
$$
Denominator layout:
$$
\frac{\partial \mathbf x^T}{\partial \mathbf x} =
\left[ \begin{matrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{matrix} \right]=I
$$
(Note that column-by-column and row-by-row derivatives are not discussed here.)

Writing this down happens to resolve something that had puzzled me for a while. Let
$$
\mathbf y = \mathbf x^TA
$$
where $\mathbf x$ is an $n \times 1$ column vector and $A$ is an $n \times m$ matrix, so $\mathbf y$ is clearly a $1 \times m$ row vector. In denominator layout we get:
$$
\begin{aligned}
\frac{\partial \mathbf y}{\partial \mathbf x}&=\frac{\partial \mathbf x^TA}{\partial \mathbf x}\\
&=\frac{\partial \left[ \begin{matrix} \sum_{i=1}^na_{i1}x_i & \sum_{i=1}^na_{i2}x_i & \cdots & \sum_{i=1}^na_{im}x_i \end{matrix} \right]}{\partial \left[ \begin{matrix} x_1\\ x_2\\ \vdots \\ x_n \end{matrix} \right]}\\
&=\left[ \begin{matrix}
a_{11} & a_{12} & \cdots & a_{1m}\\
a_{21} & a_{22} & \cdots & a_{2m}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{matrix} \right]\\
&=A
\end{aligned}
$$
I then wondered whether the derivative of the transpose $\mathbf y^T$ with respect to $\mathbf x$ is related to this result. But $\mathbf y^T$ is a column vector and $\mathbf x$ is also a column vector, and a column-by-column derivative is again a column vector (namely $vec(A)$, the vectorization of $A$), which changes the shape of the matrix $A$. So we should instead write (numerator layout):
$$
\begin{aligned}
\frac{\partial \mathbf y^T}{\partial \mathbf x^T}&=\frac{\partial A^T\mathbf x}{\partial \mathbf x^T}\\
&=\frac{\partial \left[ \begin{matrix} \sum_{i=1}^na_{i1}x_i\\ \sum_{i=1}^na_{i2}x_i\\ \vdots \\ \sum_{i=1}^na_{im}x_i \end{matrix} \right]}{\partial \left[ \begin{matrix} x_1 & x_2 & \cdots & x_n \end{matrix} \right]}\\
&=\left[ \begin{matrix}
a_{11} & a_{21} & \cdots & a_{n1}\\
a_{12} & a_{22} & \cdots & a_{n2}\\
\vdots & \vdots & \ddots & \vdots\\
a_{1m} & a_{2m} & \cdots & a_{nm}
\end{matrix} \right]\\
&=A^T
\end{aligned}
$$
The numerator-layout and denominator-layout results differ only by a transpose, that is,
$$
\left(\frac{\partial \mathbf y}{\partial \mathbf x}\right)^T=\frac{\partial \mathbf y^T}{\partial \mathbf x^T}
$$
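The relationship between the two layouts can be illustrated numerically. This sketch assumes NumPy and a hypothetical helper `numerical_jacobian` that returns the numerator-layout (Jacobian) form. The denominator-layout form is then obtained simply by transposing.

```python
import numpy as np

def numerical_jacobian(f, v, h=1e-6):
    """Central-difference Jacobian in numerator layout: rows follow f, columns follow v."""
    v = np.asarray(v, dtype=float)
    cols = []
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = h
        cols.append((np.atleast_1d(f(v + e)) - np.atleast_1d(f(v - e))) / (2 * h))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(4)
n, m = 3, 2
A = rng.standard_normal((n, m))
x = rng.standard_normal(n)

# y = x^T A has m elements; as a column vector it is A^T x.
numerator_layout = numerical_jacobian(lambda v: A.T @ v, x)   # shape (m, n)
denominator_layout = numerator_layout.T                        # shape (n, m)

print(numerator_layout.shape, denominator_layout.shape)   # (2, 3) (3, 2)
print(np.allclose(numerator_layout, A.T, atol=1e-6))
print(np.allclose(denominator_layout, A, atol=1e-6))
```

Both arrays hold the same partial derivatives. Only the arrangement differs, which is why mixing results from sources that use different layouts requires a transpose.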
What happens if $\mathbf y$ reduces to a scalar $y$?

Let
$$
y=\mathbf x^T \mathbf a
$$
where $\mathbf x$ is an $n \times 1$ column vector and $\mathbf a$ is an $n \times 1$ column vector, so $y$ is clearly a scalar. Differentiating with respect to the column vector $\mathbf x$ (denominator layout, as in the $\mathbf x^TA$ example above) gives:
$$
\begin{aligned}
\frac{\partial y}{\partial \mathbf x}&=\frac{\partial \mathbf x^T\mathbf a}{\partial \mathbf x}\\
&=\frac{\partial \left( \sum^n_{i=1}a_{i}x_i \right)}{\partial \left[ \begin{matrix} x_1\\ x_2\\ \vdots \\ x_n \end{matrix} \right]}\\
&=\left[ \begin{matrix} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{matrix} \right]\\
&=\mathbf a
\end{aligned}
$$
Since $y$ is a scalar, we have:
$$
y=(y)^T=\mathbf a^T \mathbf x
$$
and therefore:
$$
\frac{\partial \mathbf a^T\mathbf x}{\partial \mathbf x}=\mathbf a
$$
What if the scalar $y^T$ is differentiated with respect to $\mathbf x^T$ instead (numerator layout)?
$$
\begin{aligned}
\frac{\partial y^T}{\partial \mathbf x^T}&=\frac{\partial \mathbf a^T\mathbf x}{\partial \mathbf x^T}\\
&=\frac{\partial \left( \sum^n_{i=1}a_{i}x_i \right)}{\partial \left[ \begin{matrix} x_1 & x_2 & \cdots & x_n \end{matrix} \right]}\\
&=\left[ \begin{matrix} a_{1} & a_{2} & \cdots & a_{n} \end{matrix} \right]\\
&=\mathbf a^T
\end{aligned}
$$

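Proposition 5:

Suppose the matrix $A$ in Proposition 4 is symmetric:
$$
\alpha=\mathbf x^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector and $A$ is an $n \times n$ symmetric matrix that does not depend on $\mathbf x$. Then
$$
\frac {\partial \alpha}{\partial \mathbf x}=2\mathbf x^TA
$$
Proof: this follows from Proposition 4, since $A^T=A$ gives $\mathbf x^T(A+A^T)=2\mathbf x^TA$.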

Proposition 6:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf y^T\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector, $\mathbf y$ is an $n\times 1$ column vector, and both $\mathbf x$ and $\mathbf y$ are functions of a vector $\mathbf z$. Then
$$
\frac {\partial \alpha}{\partial \mathbf z}=\mathbf x^T\frac {\partial \mathbf y}{\partial \mathbf z}+\mathbf y^T\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: by definition,
$$
\alpha=\sum^n_{i=1}x_iy_i
$$
Differentiating with respect to the $k$-th element of $\mathbf z$:
$$
\frac {\partial \alpha}{\partial z_k}=\sum^n_{i=1} \left(x_i\frac {\partial y_i}{\partial z_k}+y_i\frac {\partial x_i}{\partial z_k}\right)
$$
Therefore:
$$
\frac {\partial \alpha}{\partial \mathbf z}=
\frac {\partial \alpha}{\partial \mathbf y}\frac {\partial \mathbf y}{\partial \mathbf z}+\frac {\partial \alpha}{\partial \mathbf x}\frac {\partial \mathbf x}{\partial \mathbf z}=\mathbf x^T\frac {\partial \mathbf y}{\partial \mathbf z}+\mathbf y^T\frac {\partial \mathbf x}{\partial \mathbf z}
$$
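The product rule in Proposition 6 can be sketched as follows. The maps `x_of_z` and `y_of_z`, their hand-derived Jacobians, and the helper `numerical_jacobian` are hypothetical and chosen only for illustration. NumPy is assumed.

```python
import numpy as np

def numerical_jacobian(f, z, h=1e-6):
    """Central-difference Jacobian; a scalar-valued f yields a 1 x n row."""
    z = np.asarray(z, dtype=float)
    cols = []
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        cols.append((np.atleast_1d(f(z + e)) - np.atleast_1d(f(z - e))) / (2 * h))
    return np.stack(cols, axis=1)

# Hypothetical maps x(z), y(z) from R^2 to R^3 with hand-derived Jacobians.
def x_of_z(z):
    return np.array([z[0] ** 2, z[0] * z[1], np.cos(z[1])])

def dx_dz(z):
    return np.array([[2 * z[0], 0.0],
                     [z[1], z[0]],
                     [0.0, -np.sin(z[1])]])

def y_of_z(z):
    return np.array([np.exp(z[0]), z[1], z[0] + z[1]])

def dy_dz(z):
    return np.array([[np.exp(z[0]), 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])

z0 = np.array([0.3, 0.7])
lhs = numerical_jacobian(lambda z: y_of_z(z) @ x_of_z(z), z0)[0]    # d(alpha)/dz
rhs = x_of_z(z0) @ dy_dz(z0) + y_of_z(z0) @ dx_dz(z0)               # x^T dy/dz + y^T dx/dz
print(np.allclose(lhs, rhs, atol=1e-6))
```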
Proposition 7:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf x^T\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector and $\mathbf x$ is a function of a vector $\mathbf z$. Then
$$
\frac {\partial \alpha}{\partial \mathbf z}=2\mathbf x^T\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: apply Proposition 6 with $\mathbf y=\mathbf x$.

Proposition 8:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf y^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector, $\mathbf y$ is an $m\times 1$ column vector, $A$ is an $m \times n$ matrix that does not depend on $\mathbf z$, and $\mathbf x,\mathbf y$ are functions of a vector $\mathbf z$. Then
$$
\frac {\partial \alpha}{\partial \mathbf z}=\mathbf x^TA^T\frac {\partial \mathbf y}{\partial \mathbf z}+\mathbf y^TA\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: define
$$
\mathbf w^T=\mathbf y^TA
$$
so that $\alpha$ can be written as
$$
\alpha=\mathbf w^T \mathbf x
$$
By Proposition 6 we obtain
$$
\frac {\partial \alpha}{\partial \mathbf z}=\mathbf x^T\frac {\partial \mathbf w}{\partial \mathbf z}+\mathbf w^T\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Substituting $\mathbf w=A^T\mathbf y$ back in and applying Proposition 2 to $\partial (A^T\mathbf y)/\partial \mathbf z$:
$$
\begin{aligned}
\frac {\partial \alpha}{\partial \mathbf z}&=\mathbf x^T\frac {\partial ( A^T \mathbf y)}{\partial \mathbf z}+\mathbf y^TA\frac {\partial \mathbf x}{\partial \mathbf z}\\
&=\mathbf x^TA^T\frac {\partial \mathbf y}{\partial \mathbf z}+\mathbf y^TA\frac {\partial \mathbf x}{\partial \mathbf z}
\end{aligned}
$$
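A numerical check of Proposition 8 with a non-square $A$ makes the shape requirement ($A$ is $m \times n$) concrete. The maps, sizes, seed, and the helper `numerical_jacobian` below are assumptions of this sketch, not part of the original derivation.

```python
import numpy as np

def numerical_jacobian(f, z, h=1e-6):
    """Central-difference Jacobian; a scalar-valued f yields a 1 x n row."""
    z = np.asarray(z, dtype=float)
    cols = []
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        cols.append((np.atleast_1d(f(z + e)) - np.atleast_1d(f(z - e))) / (2 * h))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(8)
A = rng.standard_normal((2, 3))   # m = 2, n = 3, independent of z

def x_of_z(z):   # hypothetical x(z): R^2 -> R^3
    return np.array([z[0] ** 2, z[0] * z[1], np.cos(z[1])])

def dx_dz(z):
    return np.array([[2 * z[0], 0.0],
                     [z[1], z[0]],
                     [0.0, -np.sin(z[1])]])

def y_of_z(z):   # hypothetical y(z): R^2 -> R^2
    return np.array([z[0] * z[1], np.sin(z[0])])

def dy_dz(z):
    return np.array([[z[1], z[0]],
                     [np.cos(z[0]), 0.0]])

z0 = np.array([0.4, -0.9])
x0, y0 = x_of_z(z0), y_of_z(z0)

lhs = numerical_jacobian(lambda z: y_of_z(z) @ A @ x_of_z(z), z0)[0]
rhs = x0 @ A.T @ dy_dz(z0) + y0 @ A @ dx_dz(z0)   # x^T A^T dy/dz + y^T A dx/dz
print(np.allclose(lhs, rhs, atol=1e-6))
```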
Proposition 9:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf x^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector, $A$ is an $n \times n$ matrix that does not depend on $\mathbf z$, and $\mathbf x$ is a function of a vector $\mathbf z$. Then
$$
\frac {\partial \alpha}{\partial \mathbf z}=\mathbf x^T(A+A^T)\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: apply Proposition 8 with $\mathbf y=\mathbf x$.

Proposition 10:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf x^TA\mathbf x
$$
where $A$ is a symmetric matrix, $\mathbf x$ is an $n\times 1$ column vector, $A$ is $n \times n$ and does not depend on $\mathbf z$, and $\mathbf x$ is a function of a vector $\mathbf z$. Then
$$
\frac {\partial \alpha}{\partial \mathbf z}=2\mathbf x^TA\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: this follows from Proposition 9 with $A^T=A$.
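Propositions 9 and 10 can be checked side by side. In this sketch (NumPy assumed), `x_of_z`, `dx_dz`, and `numerical_jacobian` are hypothetical helpers, `A` is a random general matrix, and `S` is a symmetric matrix built from it for the symmetric case.

```python
import numpy as np

def numerical_jacobian(f, z, h=1e-6):
    """Central-difference Jacobian; a scalar-valued f yields a 1 x n row."""
    z = np.asarray(z, dtype=float)
    cols = []
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        cols.append((np.atleast_1d(f(z + e)) - np.atleast_1d(f(z - e))) / (2 * h))
    return np.stack(cols, axis=1)

def x_of_z(z):   # hypothetical x(z): R^2 -> R^3
    return np.array([z[0] + z[1], z[0] * z[1], np.sin(z[1])])

def dx_dz(z):    # hand-derived Jacobian of x_of_z, shape (3, 2)
    return np.array([[1.0, 1.0],
                     [z[1], z[0]],
                     [0.0, np.cos(z[1])]])

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 3))   # general matrix (Proposition 9)
S = A + A.T                       # symmetric matrix (Proposition 10)

z0 = np.array([0.2, 1.1])
x0, Jx = x_of_z(z0), dx_dz(z0)

num_general = numerical_jacobian(lambda z: x_of_z(z) @ A @ x_of_z(z), z0)[0]
num_symmetric = numerical_jacobian(lambda z: x_of_z(z) @ S @ x_of_z(z), z0)[0]

ok_prop9 = np.allclose(num_general, x0 @ (A + A.T) @ Jx, atol=1e-6)
ok_prop10 = np.allclose(num_symmetric, 2 * x0 @ S @ Jx, atol=1e-6)
print(ok_prop9, ok_prop10)
```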

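Proposition 11:

If $A$ is an $m\times m$ invertible matrix whose elements depend on a scalar $\alpha$, then the derivative of $A^{-1}$ with respect to $\alpha$ is:
$$
\frac {\partial A^{-1}}{\partial \alpha}=-A^{-1}\frac {\partial A}{\partial \alpha}A^{-1}
$$
Proof: by definition,
$$
A^{-1}A=I
$$
Differentiating both sides with respect to the scalar $\alpha$ (using the product rule):
$$
A^{-1}\frac {\partial A}{\partial \alpha}+\frac {\partial A^{-1}}{\partial \alpha}A=0
$$
Moving the first term to the right-hand side and multiplying on the right by $A^{-1}$:
$$
\frac {\partial A^{-1}}{\partial \alpha}=-A^{-1}\frac {\partial A}{\partial \alpha}A^{-1}
$$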
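To see the inverse-derivative formula in action, the sketch below compares it with a central difference of `np.linalg.inv`. The matrix function `A_of` and its derivative `dA_of` are hypothetical examples chosen so that the matrix is invertible near the evaluation point. NumPy is assumed.

```python
import numpy as np

def A_of(alpha):
    # Hypothetical 2 x 2 matrix whose elements depend on the scalar alpha.
    return np.array([[2.0 + alpha, alpha ** 2],
                     [np.sin(alpha), 3.0]])

def dA_of(alpha):
    # Element-wise derivative of A_of with respect to alpha.
    return np.array([[1.0, 2.0 * alpha],
                     [np.cos(alpha), 0.0]])

alpha0, h = 0.4, 1e-6

# Central difference of the inverse, element by element.
numeric = (np.linalg.inv(A_of(alpha0 + h)) - np.linalg.inv(A_of(alpha0 - h))) / (2 * h)

# Closed form from Proposition 11: -A^{-1} (dA/dalpha) A^{-1}.
A_inv = np.linalg.inv(A_of(alpha0))
formula = -A_inv @ dA_of(alpha0) @ A_inv

print(np.allclose(numeric, formula, atol=1e-6))
```

Note that the order of the factors matters: $A^{-1}$ appears on both sides of $\partial A/\partial \alpha$ because matrix multiplication does not commute.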
