Linear Algebra (5): Supplementary Notes on Matrix Calculus

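This article covers the core ideas of matrix calculus: the Jacobian matrix, rules for computing partial derivatives, and the layout conventions used for matrix derivatives. It works through derivatives involving vectors and matrices of the kind that appear in machine learning and deep learning.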


Supplementary Notes on Matrix Calculus

Convention 1:

Let $\mathbf y=f(\mathbf x)$, where $\mathbf y$ is a vector with $m$ elements and $\mathbf x$ is a vector with $n$ elements. Then:
$$
\frac{\partial \mathbf y}{\partial \mathbf x} =\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} &\cdots& \frac{\partial y_1}{\partial x_n}\\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} &\cdots& \frac{\partial y_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} &\cdots& \frac{\partial y_m}{\partial x_n}
\end{matrix} \right]
$$
This $m \times n$ matrix collects the partial derivatives of $\mathbf y$ with respect to $\mathbf x$. It is called the Jacobian matrix.

Note: if $\mathbf x$ is a scalar, the Jacobian is an $m \times 1$ column vector. If $\mathbf y$ is a scalar, the Jacobian is a $1 \times n$ row vector.
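The definition above can be checked numerically. The following is a minimal sketch in plain Python (standard library only) that approximates the Jacobian by central differences; the helper name `numerical_jacobian` and the example map `f` are illustrative assumptions, not part of the original text.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the m x n Jacobian of f at x by central differences.

    Entry [i][j] approximates d y_i / d x_j, matching Convention 1.
    """
    m, n = len(f(x)), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return jac


def f(x):
    # Hypothetical example map from R^2 to R^3 (m = 3, n = 2).
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


J = numerical_jacobian(f, [2.0, 3.0])
# Analytic Jacobian [[x2, x1], [1, 1], [2*x1, 0]] evaluated at (2, 3).
J_exact = [[3.0, 2.0], [1.0, 1.0], [4.0, 0.0]]
for row in J:
    print([round(v, 4) for v in row])
```

The result has 3 rows (one per element of $\mathbf y$) and 2 columns (one per element of $\mathbf x$), which is the $m \times n$ shape stated in the convention.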

Proposition 1:

$$
\mathbf y=A\mathbf x
$$
where $\mathbf y$ is an $m \times 1$ column vector, $\mathbf x$ is an $n\times 1$ column vector, and $A$ is an $m \times n$ matrix that does not depend on $\mathbf x$. Then
$$
\frac{\partial \mathbf y}{\partial \mathbf x}=A
$$
Proof:

The $i$-th element of $\mathbf y$ is
$$
y_i=\sum^n_{k=1}a_{ik}x_k
$$
from which it follows directly that
$$
\frac{\partial y_i}{\partial x_j}=a_{ij}
$$
for all $i=1,2,\cdots,m,\ j=1,2,\cdots,n$. Hence
$$
\frac{\partial \mathbf y}{\partial \mathbf x}=A
$$
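As a concrete instance of Proposition 1 (the matrix below is an illustrative choice, not taken from the original text), let $m=3$ and $n=2$:
$$
A=\left[ \begin{matrix} 1 & 2\\ 3 & 4\\ 5 & 6 \end{matrix} \right],\qquad
\mathbf y=A\mathbf x=\left[ \begin{matrix} x_1+2x_2\\ 3x_1+4x_2\\ 5x_1+6x_2 \end{matrix} \right]
$$
Differentiating each $y_i$ with respect to each $x_j$ and arranging the results as in Convention 1 gives
$$
\frac{\partial \mathbf y}{\partial \mathbf x}=\left[ \begin{matrix} 1 & 2\\ 3 & 4\\ 5 & 6 \end{matrix} \right]=A
$$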
Proposition 2:

$$
\mathbf y=A\mathbf x
$$
where $\mathbf y$ is an $m \times 1$ column vector, $\mathbf x$ is an $n\times 1$ column vector, and $A$ is an $m \times n$ matrix that does not depend on $\mathbf x$. Suppose further that $\mathbf x$ is a function of a vector $\mathbf z$. Then
$$
\frac{\partial \mathbf y}{\partial \mathbf z}=A\frac{\partial \mathbf x}{\partial \mathbf z}
$$
Proof:

The $i$-th element of $\mathbf y$ is
$$
y_i=\sum^n_{k=1}a_{ik}x_k
$$
so that
$$
\frac{\partial y_i}{\partial z_j}=\sum^n_{k=1}a_{ik}\frac{\partial x_k}{\partial z_j}
$$
The right-hand side is exactly the $(i,j)$ element of $A\,{\partial \mathbf x}/{\partial \mathbf z}$. Therefore
$$
\frac {\partial \mathbf y}{\partial \mathbf z}= \frac {\partial \mathbf y}{\partial \mathbf x} \frac {\partial \mathbf x}{\partial \mathbf z}= A\frac {\partial \mathbf x}{\partial \mathbf z}
$$
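The chain rule in Proposition 2 can be sanity-checked numerically. The sketch below uses plain Python with central differences; the matrix `A`, the map `x_of_z`, and the helper names are all hypothetical choices made for illustration.

```python
def numerical_jacobian(f, v, eps=1e-6):
    """Central-difference Jacobian: entry [i][j] approximates d f_i / d v_j."""
    m, n = len(f(v)), len(v)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        v_plus, v_minus = list(v), list(v)
        v_plus[j] += eps
        v_minus[j] -= eps
        f_plus, f_minus = f(v_plus), f(v_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]  # hypothetical constant 2 x 3 matrix


def x_of_z(z):
    # Hypothetical x(z): R^2 -> R^3
    z1, z2 = z
    return [z1 * z2, z1 ** 2, z2]


def y_of_z(z):
    # y = A x(z)
    x = x_of_z(z)
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


z0 = [1.5, -0.5]
dy_dz = numerical_jacobian(y_of_z, z0)               # 2 x 2
A_dx_dz = matmul(A, numerical_jacobian(x_of_z, z0))  # (2 x 3)(3 x 2)
max_gap = max(abs(dy_dz[i][j] - A_dx_dz[i][j])
              for i in range(2) for j in range(2))
print("max |dy/dz - A dx/dz| =", max_gap)
```

Differentiating $\mathbf y$ directly with respect to $\mathbf z$ and multiplying $A$ by the Jacobian of $\mathbf x$ agree up to floating-point error, as the proposition predicts.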
Proposition 3:

Let the scalar $\alpha$ be defined as
$$
\alpha=\mathbf y^TA \mathbf x
$$
where $\mathbf y$ is an $m \times 1$ column vector, $\mathbf x$ is an $n\times 1$ column vector, and $A$ is an $m \times n$ matrix that does not depend on $\mathbf x$ or $\mathbf y$. Then:
$$
\frac {\partial \alpha}{\partial \mathbf x}=\mathbf y^TA
$$
and:
$$
\frac {\partial \alpha}{\partial \mathbf y}=\mathbf x^TA^T
$$
Proof:

Define
$$
\mathbf w^T=\mathbf y^TA
$$
so that $\alpha$ can be written as
$$
\alpha=\mathbf w^T \mathbf x
$$
By Proposition 1,
$$
\frac {\partial \alpha}{\partial \mathbf x}=\mathbf w^T=\mathbf y^TA
$$
which is the first result. Since $\alpha$ is a scalar, it equals its own transpose:
$$
\alpha=\alpha^T=\mathbf x^TA^T\mathbf y
$$
Applying Proposition 1 again gives
$$
\frac {\partial \alpha}{\partial \mathbf y}=\mathbf x^TA^T
$$
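A small worked example makes both results of Proposition 3 concrete. The $2 \times 2$ matrix here is an illustrative assumption:
$$
A=\left[ \begin{matrix} 1 & 2\\ 3 & 4 \end{matrix} \right],\qquad
\alpha=\mathbf y^TA\mathbf x=y_1x_1+2y_1x_2+3y_2x_1+4y_2x_2
$$
Differentiating with respect to $x_1, x_2$ and then with respect to $y_1, y_2$:
$$
\frac{\partial \alpha}{\partial \mathbf x}=\left[ \begin{matrix} y_1+3y_2 & 2y_1+4y_2 \end{matrix} \right]=\mathbf y^TA,\qquad
\frac{\partial \alpha}{\partial \mathbf y}=\left[ \begin{matrix} x_1+2x_2 & 3x_1+4x_2 \end{matrix} \right]=\mathbf x^TA^T
$$
Both derivatives are row vectors, as expected for a scalar differentiated by a vector under Convention 1.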
Proposition 4:

Consider the special case where the scalar $\alpha$ is a quadratic form:
$$
\alpha=\mathbf x^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector and $A$ is an $n \times n$ matrix that does not depend on $\mathbf x$. Then:
$$
\frac {\partial \alpha}{\partial \mathbf x}=\mathbf x^T(A+A^T)
$$
Proof:

By definition,
$$
\alpha=\sum^n_{j=1}\sum^n_{i=1}a_{ij}x_ix_j
$$
Differentiating with respect to the $k$-th element of $\mathbf x$:
$$
\frac{\partial \alpha}{\partial x_k}=\sum^n_{j=1}a_{kj}x_j+\sum^n_{i=1}a_{ik}x_i
$$
The first sum is the $k$-th entry of $\mathbf x^TA^T$ and the second is the $k$-th entry of $\mathbf x^TA$, so:
$$
\frac{\partial \alpha}{\partial \mathbf x}=\mathbf x^TA^T+\mathbf x^TA=\mathbf x^T(A^T+A)
$$
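The formula in Proposition 4 can be compared against a finite-difference gradient. This is a minimal sketch in plain Python; the non-symmetric matrix `A`, the point `x0`, and the helper names are assumptions chosen only for illustration.

```python
A = [[1.0, 2.0],
     [0.0, 3.0]]  # hypothetical non-symmetric 2 x 2 matrix


def alpha(x):
    """Quadratic form x^T A x written as an explicit double sum."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of [d f / d x_k for each k]."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


x0 = [1.0, -2.0]
n = len(x0)
# Row vector x^T (A + A^T): entry k is sum_i x_i * (a_ik + a_ki).
formula = [sum(x0[i] * (A[i][k] + A[k][i]) for i in range(n)) for k in range(n)]
numeric = numerical_gradient(alpha, x0)
print("formula:", formula)
print("numeric:", [round(v, 6) for v in numeric])
```

Because `A` is not symmetric here, using $2\mathbf x^TA$ instead would give a different (wrong) answer, which is why the general formula keeps both $A$ and $A^T$.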

Note: this result differs slightly from the one in Chapter 4, which states:
$$
\frac{\partial \alpha}{\partial \mathbf x}=(A^T+A) \mathbf x
$$
The two results differ only by a transpose:
$$
((A^T+A) \mathbf x)^T=\mathbf x^T(A+A^T)=\mathbf x^T(A^T+A)
$$
This is because the derivative here is a vector. For the individual elements of that vector, a transpose only changes whether they are laid out horizontally or vertically ($(A^T+A) \mathbf x$ is a column vector, $\mathbf x^T(A+A^T)$ is a row vector), so the two results carry the same information.

But why does this difference arise?

From the context, in Chapter 4 the shape of a derivative with respect to a matrix follows the shape of the variable (vector or matrix):
$$
\nabla_Af(A) \in \mathbb{R}^{m \times n} =\left[ \begin{matrix}
\frac{\partial f(A)}{\partial A_{11}} & \frac{\partial f(A)}{\partial A_{12}} &\cdots& \frac{\partial f(A)}{\partial A_{1n}}\\
\frac{\partial f(A)}{\partial A_{21}} & \frac{\partial f(A)}{\partial A_{22}} &\cdots& \frac{\partial f(A)}{\partial A_{2n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f(A)}{\partial A_{m1}} & \frac{\partial f(A)}{\partial A_{m2}} &\cdots& \frac{\partial f(A)}{\partial A_{mn}}
\end{matrix} \right]
$$
In this supplement, by contrast, the derivative always takes the form of the Jacobian matrix:
$$
\frac{\partial \mathbf y}{\partial \mathbf x} =\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} &\cdots& \frac{\partial y_1}{\partial x_n}\\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} &\cdots& \frac{\partial y_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} &\cdots& \frac{\partial y_m}{\partial x_n}
\end{matrix} \right]
$$
So under the Chapter 4 definition, the derivative of a scalar with respect to a column vector is a column vector.

Under the Jacobian definition used in this supplement, the derivative of a scalar with respect to a vector is a row vector. That is where the transpose comes from.

In matrix calculus, much of the notation for derivatives is in fact not standardized. Broadly, though, the conventions fall into two layouts:

  • Numerator layout
  • Denominator layout

  1. Numerator layout

In
$$
\frac{\partial \mathbf y}{\partial \mathbf x}
$$
treat the numerator vector $\mathbf y$ as a column vector and the denominator vector $\mathbf x$ as a row vector (a vector on its own has no inherent row or column orientation; that is purely a convention). The result is the Jacobian matrix:
$$
\frac{\partial \mathbf y}{\partial \mathbf x} =\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} &\cdots& \frac{\partial y_1}{\partial x_n}\\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} &\cdots& \frac{\partial y_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} &\cdots& \frac{\partial y_m}{\partial x_n}
\end{matrix} \right]
$$
If the numerator vector $\mathbf y$ reduces to a scalar $y$:
$$
\frac{\partial y}{\partial \mathbf x} =\left[ \begin{matrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} &\cdots& \frac{\partial y}{\partial x_n} \end{matrix} \right]
$$
If the denominator vector $\mathbf x$ reduces to a scalar $x$:
$$
\frac{\partial \mathbf y}{\partial x} =\left[ \begin{matrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{matrix} \right]
$$
The following case exists only in numerator layout:

The numerator is a matrix $Y$ and the denominator is a scalar $x$:
$$
\frac{\partial Y}{\partial x} =\left[ \begin{matrix}
\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} &\cdots& \frac{\partial y_{1n}}{\partial x}\\
\frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} &\cdots& \frac{\partial y_{2n}}{\partial x}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} &\cdots& \frac{\partial y_{mn}}{\partial x}
\end{matrix} \right]
$$

  2. Denominator layout

In
$$
\frac{\partial \mathbf y}{\partial \mathbf x}
$$
treat the numerator vector $\mathbf y$ as a row vector and the denominator vector $\mathbf x$ as a column vector (again, a vector on its own has no inherent row or column orientation; that is purely a convention). The result is:
$$
\frac{\partial \mathbf y}{\partial \mathbf x} =\left[ \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} &\cdots& \frac{\partial y_m}{\partial x_1}\\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} &\cdots& \frac{\partial y_m}{\partial x_2}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} &\cdots& \frac{\partial y_m}{\partial x_n}
\end{matrix} \right]
$$
If the numerator vector $\mathbf y$ reduces to a scalar $y$:
$$
\frac{\partial y}{\partial \mathbf x} =\left[ \begin{matrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{matrix} \right]
$$
If the denominator vector $\mathbf x$ reduces to a scalar $x$:
$$
\frac{\partial \mathbf y}{\partial x} =\left[ \begin{matrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} &\cdots& \frac{\partial y_m}{\partial x} \end{matrix} \right]
$$
The following case exists only in denominator layout:

The numerator is a scalar $y$ and the denominator is a matrix $X$:
$$
\frac{\partial y}{\partial X} =\left[ \begin{matrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} &\cdots& \frac{\partial y}{\partial x_{1n}}\\
\frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} &\cdots& \frac{\partial y}{\partial x_{2n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y}{\partial x_{m1}} & \frac{\partial y}{\partial x_{m2}} &\cdots& \frac{\partial y}{\partial x_{mn}}
\end{matrix} \right]
$$
This denominator-layout form is exactly the gradient introduced in Chapter 4.

Comparing the two, numerator layout and denominator layout differ in form only by a transpose.

The two layouts can be summed up with a rule of thumb: whichever quantity the layout is named after is treated as the column, and that quantity is the one that stays fixed.

For example, in numerator layout the numerator is the column (it is treated as a column vector) and the numerator stays fixed (within each row of the resulting matrix, the numerator is the same).
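The relationship between the two layouts can be seen in the shapes alone. The sketch below, in plain Python, builds a numerator-layout Jacobian by central differences and obtains the denominator-layout version by transposing it; the map `f` and the helper names are hypothetical.

```python
def jacobian_numerator_layout(f, x, eps=1e-6):
    """m x n matrix: row i belongs to y_i, column j to x_j."""
    m, n = len(f(x)), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return jac


def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def f(x):
    # Hypothetical map R^3 -> R^2 (n = 3, m = 2).
    x1, x2, x3 = x
    return [x1 + 2 * x2, x2 * x3]


x0 = [1.0, 2.0, 3.0]
numerator = jacobian_numerator_layout(f, x0)  # 2 x 3, rows follow y
denominator = transpose(numerator)            # 3 x 2, rows follow x
print("numerator layout shape:  ", len(numerator), "x", len(numerator[0]))
print("denominator layout shape:", len(denominator), "x", len(denominator[0]))
```

The entries are identical in both; only the arrangement changes, so a formula derived in one layout can be converted to the other by transposing.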

In practice, however, $\mathbf x$ and $\mathbf y$ are fixed up front as either column vectors or row vectors (from here on, vectors are column vectors by default). Then:

Numerator layout:
$$
\frac{\partial \mathbf x}{\partial \mathbf x^T} =\left[ \begin{matrix} 1 & 0 &\cdots& 0\\ 0 & 1&\cdots& 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 &0 &\cdots& 1 \end{matrix} \right]=I
$$
Denominator layout:
$$
\frac{\partial \mathbf x^T}{\partial \mathbf x} =\left[ \begin{matrix} 1 & 0 &\cdots& 0\\ 0 & 1&\cdots& 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 &0 &\cdots& 1 \end{matrix} \right]=I
$$
(Note that column-by-column and row-by-row derivatives are not discussed here.)

Writing this out happens to resolve something that had puzzled me for a while:
$$
\mathbf y = \mathbf x^TA
$$
where $\mathbf x$ is an $n \times 1$ column vector and $A$ is an $n \times m$ matrix, so $\mathbf y$ is clearly a $1 \times m$ row vector. In denominator layout we get:
$$
\begin{aligned}
\frac{\partial \mathbf y}{\partial \mathbf x}&=\frac{\partial \mathbf x^TA}{\partial \mathbf x}\\
&=\frac{\partial {\left[ \begin{matrix} \sum_{i=1}^na_{i1}x_i& \sum_{i=1}^na_{i2}x_i& \cdots & \sum_{i=1}^na_{im}x_i \end{matrix} \right]}}{\partial {\left[ \begin{matrix} x_1\\ x_2\\ \vdots \\ x_n \end{matrix} \right]}}\\
&=\left[ \begin{matrix} a_{11} & a_{12} &\cdots& a_{1m}\\ a_{21} & a_{22}&\cdots& a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} &a_{n2} &\cdots& a_{nm} \end{matrix} \right]\\
&=A
\end{aligned}
$$
I then wondered whether the derivative of the transpose $\mathbf y^T$ with respect to $\mathbf x$ is related to this result. However, $\mathbf y^T$ is a column vector and $\mathbf x$ is also a column vector, and differentiating a column vector by a column vector would again give a column vector (the vectorization $\mathrm{vec}(A)$ of $A$), which loses the matrix form of $A$. So it should instead be written as follows (numerator layout):
$$
\begin{aligned}
\frac{\partial \mathbf y^T}{\partial \mathbf x^T}&=\frac{\partial A^T\mathbf x}{\partial \mathbf x^T}\\
&=\frac{\partial {\left[ \begin{matrix} \sum_{i=1}^na_{i1}x_i\\ \sum_{i=1}^na_{i2}x_i\\ \vdots \\ \sum_{i=1}^na_{im}x_i \end{matrix} \right]}}{\partial {\left[ \begin{matrix} x_1& x_2& \cdots & x_n \end{matrix} \right]}}\\
&=\left[ \begin{matrix} a_{11} & a_{21} &\cdots& a_{n1}\\ a_{12} & a_{22}&\cdots& a_{n2}\\ \vdots & \vdots & \ddots & \vdots\\ a_{1m} &a_{2m} &\cdots& a_{nm} \end{matrix} \right]\\
&=A^T
\end{aligned}
$$
So numerator layout and denominator layout differ only by a transpose, that is,
$$
\left(\frac{\partial \mathbf y}{\partial \mathbf x}\right)^T=\frac{\partial \mathbf y^T}{\partial \mathbf x^T}
$$
What happens if $\mathbf y$ reduces to a scalar $y$?

Let
$$
y=\mathbf x^T \mathbf a
$$
where $\mathbf x$ and $\mathbf a$ are both $n \times 1$ column vectors, so $y$ is a scalar. Differentiating with respect to the column vector $\mathbf x$ (denominator layout, since the result is a column):
$$
\begin{aligned}
\frac{\partial y}{\partial \mathbf x}&=\frac{\partial \mathbf x^T\mathbf a}{\partial \mathbf x}\\
&=\frac{\partial {\left( \sum^n_{i=1}a_{i}x_i \right) }}{\partial {\left[ \begin{matrix} x_1\\ x_2\\ \vdots \\ x_n \end{matrix} \right]}}\\
&=\left[ \begin{matrix} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{matrix} \right]\\
&=\mathbf a
\end{aligned}
$$
Since $y$ is a scalar,
$$
y=(y)^T=\mathbf a^T \mathbf x
$$
and therefore
$$
\frac{\partial \mathbf a^T\mathbf x}{\partial \mathbf x}=\mathbf a
$$
What if the scalar $y^T = y$ is differentiated with respect to the row vector $\mathbf x^T$ instead (numerator layout, since the result is a row)?
$$
\begin{aligned}
\frac{\partial y^T}{\partial \mathbf x^T}&=\frac{\partial \mathbf a^T\mathbf x}{\partial \mathbf x^T}\\
&=\frac{\partial {\left( \sum^n_{i=1}a_{i}x_i \right) }}{\partial {\left[ \begin{matrix} x_1& x_2& \cdots & x_n \end{matrix} \right]}}\\
&=\left[ \begin{matrix} a_{1} & a_{2} & \cdots & a_{n} \end{matrix} \right]\\
&=\mathbf a^T
\end{aligned}
$$

Proposition 5:

Suppose the matrix $A$ in Proposition 4 is symmetric:
$$
\alpha=\mathbf x^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector and $A$ is an $n \times n$ symmetric matrix that does not depend on $\mathbf x$. Then:
$$
\frac {\partial \alpha}{\partial \mathbf x}=2\mathbf x^TA
$$
Proof: this follows from Proposition 4 by substituting $A^T=A$.
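For a quick check of Proposition 5, take an illustrative symmetric matrix (an assumption made for this example):
$$
A=\left[ \begin{matrix} 2 & 1\\ 1 & 3 \end{matrix} \right],\qquad
\alpha=\mathbf x^TA\mathbf x=2x_1^2+2x_1x_2+3x_2^2
$$
Differentiating directly and comparing with the formula:
$$
\frac{\partial \alpha}{\partial \mathbf x}=\left[ \begin{matrix} 4x_1+2x_2 & 2x_1+6x_2 \end{matrix} \right]
=2\left[ \begin{matrix} x_1 & x_2 \end{matrix} \right]\left[ \begin{matrix} 2 & 1\\ 1 & 3 \end{matrix} \right]=2\mathbf x^TA
$$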

Proposition 6:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf y^T\mathbf x
$$
where $\mathbf x$ and $\mathbf y$ are both $n\times 1$ column vectors, and both are functions of a vector $\mathbf z$. Then:
$$
\frac {\partial \alpha}{\partial \mathbf z}=\mathbf x^T\frac {\partial \mathbf y}{\partial \mathbf z}+\mathbf y^T\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof:
$$
\alpha=\sum^n_{i=1}x_iy_i
$$
Differentiating with respect to the $k$-th element of $\mathbf z$:
$$
\frac {\partial \alpha}{\partial z_k}=\sum^n_{i=1} \left(x_i\frac {\partial y_i}{\partial z_k}+y_i\frac {\partial x_i}{\partial z_k}\right)
$$
Hence:
$$
\frac {\partial \alpha}{\partial \mathbf z}= \frac {\partial \alpha}{\partial \mathbf y}\frac {\partial \mathbf y}{\partial \mathbf z}+\frac {\partial \alpha}{\partial \mathbf x}\frac {\partial \mathbf x}{\partial \mathbf z}=\mathbf x^T\frac {\partial \mathbf y}{\partial \mathbf z}+\mathbf y^T\frac {\partial \mathbf x}{\partial \mathbf z}
$$
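The product rule in Proposition 6 can also be checked numerically. The following plain-Python sketch compares a finite-difference derivative of $\alpha$ with the right-hand side of the proposition; the maps `x_of_z` and `y_of_z` and the helper names are hypothetical examples.

```python
def numerical_jacobian(f, v, eps=1e-6):
    """Central-difference Jacobian: entry [i][j] approximates d f_i / d v_j."""
    m, n = len(f(v)), len(v)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        v_plus, v_minus = list(v), list(v)
        v_plus[j] += eps
        v_minus[j] -= eps
        f_plus, f_minus = f(v_plus), f(v_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def vecmat(v, M):
    """Row vector times matrix: returns v^T M as a flat list."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]


def x_of_z(z):
    z1, z2 = z
    return [z1 ** 2, z1 * z2]   # hypothetical x(z)


def y_of_z(z):
    z1, z2 = z
    return [z2, z1 + z2]        # hypothetical y(z)


def alpha_of_z(z):
    # Scalar y^T x, wrapped in a list so the helper returns a 1 x n row.
    return [sum(xi * yi for xi, yi in zip(x_of_z(z), y_of_z(z)))]


z0 = [1.0, 2.0]
lhs = numerical_jacobian(alpha_of_z, z0)[0]
term_y = vecmat(x_of_z(z0), numerical_jacobian(y_of_z, z0))  # x^T dy/dz
term_x = vecmat(y_of_z(z0), numerical_jacobian(x_of_z, z0))  # y^T dx/dz
rhs = [a + b for a, b in zip(term_y, term_x)]
print("direct:      ", [round(v, 5) for v in lhs])
print("product rule:", [round(v, 5) for v in rhs])
```

For these choices $\alpha = 2z_1^2z_2 + z_1z_2^2$, so at $\mathbf z = (1, 2)$ the exact derivative is $(12, 6)$, which both computations reproduce up to rounding.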
Proposition 7:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf x^T\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector that is a function of a vector $\mathbf z$. Then:
$$
\frac {\partial \alpha}{\partial \mathbf z}=2\mathbf x^T\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: this follows from Proposition 6 with $\mathbf y=\mathbf x$.
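As an illustration of Proposition 7 (the particular $\mathbf x(\mathbf z)$ is an assumption made for this example), let
$$
\mathbf x=\left[ \begin{matrix} z_1\\ z_1z_2 \end{matrix} \right],\qquad
\frac{\partial \mathbf x}{\partial \mathbf z}=\left[ \begin{matrix} 1 & 0\\ z_2 & z_1 \end{matrix} \right],\qquad
\alpha=\mathbf x^T\mathbf x=z_1^2+z_1^2z_2^2
$$
The formula gives
$$
2\mathbf x^T\frac{\partial \mathbf x}{\partial \mathbf z}
=2\left[ \begin{matrix} z_1 & z_1z_2 \end{matrix} \right]\left[ \begin{matrix} 1 & 0\\ z_2 & z_1 \end{matrix} \right]
=\left[ \begin{matrix} 2z_1+2z_1z_2^2 & 2z_1^2z_2 \end{matrix} \right]
$$
which matches differentiating $\alpha$ directly with respect to $z_1$ and $z_2$.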

Proposition 8:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf y^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector, $\mathbf y$ is an $m\times 1$ column vector, $A$ is an $m \times n$ matrix that does not depend on $\mathbf z$, and $\mathbf x,\mathbf y$ are functions of a vector $\mathbf z$. Then:
$$
\frac {\partial \alpha}{\partial \mathbf z}=\mathbf x^TA^T\frac {\partial \mathbf y}{\partial \mathbf z}+\mathbf y^TA\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: define
$$
\mathbf w^T=\mathbf y^TA
$$
so that $\alpha$ can be written as
$$
\alpha=\mathbf w^T \mathbf x
$$
By Proposition 6,
$$
\frac {\partial \alpha}{\partial \mathbf z}=\mathbf x^T\frac {\partial \mathbf w}{\partial \mathbf z}+\mathbf w^T\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Substituting $\mathbf w=A^T\mathbf y$ back in and applying Proposition 2:
$$
\begin{aligned}
\frac {\partial \alpha}{\partial \mathbf z}&=\mathbf x^T\frac {\partial ( A^T \mathbf y)}{\partial \mathbf z}+\mathbf y^TA\frac {\partial \mathbf x}{\partial \mathbf z}\\
&=\mathbf x^TA^T\frac {\partial \mathbf y}{\partial \mathbf z}+\mathbf y^TA\frac {\partial \mathbf x}{\partial \mathbf z}
\end{aligned}
$$
Proposition 9:

Let the scalar $\alpha$ be
$$
\alpha=\mathbf x^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector, $A$ is an $n \times n$ matrix that does not depend on $\mathbf z$, and $\mathbf x$ is a function of a vector $\mathbf z$. Then:
$$
\frac {\partial \alpha}{\partial \mathbf z}=\mathbf x^T(A+A^T)\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: this follows from Proposition 8 with $\mathbf y=\mathbf x$.

Proposition 10:

Let the scalar $\alpha$ be as follows, where $A$ is a symmetric matrix:
$$
\alpha=\mathbf x^TA\mathbf x
$$
where $\mathbf x$ is an $n\times 1$ column vector, $A$ is an $n \times n$ matrix that does not depend on $\mathbf z$, and $\mathbf x$ is a function of a vector $\mathbf z$. Then:
$$
\frac {\partial \alpha}{\partial \mathbf z}=2\mathbf x^TA\frac {\partial \mathbf x}{\partial \mathbf z}
$$
Proof: this follows from Proposition 9 with $A^T=A$.

Proposition 11:

If $A$ is an invertible $m\times m$ matrix whose elements are functions of a scalar $\alpha$, then the derivative of $A^{-1}$ with respect to $\alpha$ is:
$$
\frac {\partial A^{-1}}{\partial \alpha}=-A^{-1}\frac {\partial A}{\partial \alpha}A^{-1}
$$
Proof: by definition,
$$
A^{-1}A=I
$$
Differentiating both sides with respect to the scalar $\alpha$ using the product rule:
$$
A^{-1}\frac {\partial A}{\partial \alpha}+\frac {\partial A^{-1}}{\partial \alpha}A=0
$$
Moving the first term to the right-hand side and right-multiplying by $A^{-1}$:
$$
\frac {\partial A^{-1}}{\partial \alpha}=-A^{-1}\frac {\partial A}{\partial \alpha}A^{-1}
$$
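Proposition 11 can be checked on a small case. The sketch below, in plain Python, uses a hypothetical $2 \times 2$ matrix $A(\alpha)$ with a closed-form inverse and compares a central-difference derivative of $A^{-1}$ with $-A^{-1}\,(\partial A/\partial \alpha)\,A^{-1}$; all function names and the choice of $A(\alpha)$ are assumptions for illustration.

```python
def inv2(M):
    """Closed-form inverse of a 2 x 2 matrix (assumes det != 0)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def A_of(alpha):
    # Hypothetical matrix whose entries depend on the scalar alpha.
    return [[1.0 + alpha, alpha], [alpha ** 2, 2.0]]


def dA_of(alpha):
    # Entry-wise derivative of A_of with respect to alpha.
    return [[1.0, 1.0], [2.0 * alpha, 0.0]]


alpha0, eps = 0.5, 1e-6
inv_plus = inv2(A_of(alpha0 + eps))
inv_minus = inv2(A_of(alpha0 - eps))
# Central-difference derivative of the inverse, entry by entry.
numeric = [[(inv_plus[i][j] - inv_minus[i][j]) / (2 * eps) for j in range(2)]
           for i in range(2)]

A_inv = inv2(A_of(alpha0))
product = matmul(matmul(A_inv, dA_of(alpha0)), A_inv)
formula = [[-v for v in row] for row in product]  # -A^{-1} (dA/dalpha) A^{-1}

max_gap = max(abs(numeric[i][j] - formula[i][j])
              for i in range(2) for j in range(2))
print("max |numeric - formula| =", max_gap)
```

Note the order of the factors: because matrix multiplication does not commute, $\partial A/\partial \alpha$ must sit between the two copies of $A^{-1}$.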
