Let $X = (x_{ij})_{m \times n}$, and let $f(X) = f(x_{11}, x_{12}, \ldots, x_{1n}, x_{21}, \ldots, x_{mn})$ be a scalar function of the $mn$ entries of $X$ whose partial derivatives

$$
\frac{\partial f}{\partial x_{ij}} \quad (i=1,2,\ldots,m,\ j=1,2,\ldots,n)
$$

all exist. The derivative of $f(X)$ with respect to the matrix $X$ is defined as

$$
\frac{df(X)}{dX} = \left( \frac{\partial f}{\partial x_{ij}} \right)_{m \times n} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{m1}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{bmatrix}
$$
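The definition can be checked numerically. The sketch below is only an illustration, not part of the original derivation: the helper name `matrix_derivative`, the step size `eps`, and the test function $f(X) = \sum_{i,j} x_{ij}^2$ are all assumptions chosen for the example. It approximates each $\partial f / \partial x_{ij}$ with a central difference and stores it at position $(i, j)$, so the result has the same shape as $X$.

```python
import numpy as np

def matrix_derivative(f, X, eps=1e-6):
    """Approximate df/dX entry by entry with central differences.

    Entry (i, j) of the result approximates the partial derivative of f
    with respect to x_ij, so the result has the same shape as X.
    """
    X = np.asarray(X, dtype=float)
    D = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps                      # perturb a single entry of X
        D[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return D

# Illustrative function (an assumption): f(X) = sum of squared entries,
# whose derivative with respect to X is 2X.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
D = matrix_derivative(lambda M: np.sum(M ** 2), X)
print(D)
```

For this choice of $f$ the printed matrix should agree with $2X$ up to rounding error.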
(1) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$ and let $f(\mathbf{x})$ be a function of $n$ variables. Find $\frac{df}{d\mathbf{x}^\top}$, $\frac{df}{d\mathbf{x}}$ and $\frac{d^2f}{d\mathbf{x}^2}$.

Differentiating with respect to the row vector $\mathbf{x}^\top$ gives a row vector:

$$
\frac{df}{d\mathbf{x}^\top} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1}, \frac{\partial f}{\partial \xi_2}, \cdots, \frac{\partial f}{\partial \xi_n} \end{pmatrix}
$$

Differentiating with respect to the column vector $\mathbf{x}$ gives a column vector, which is the gradient:

$$
\nabla f(\mathbf{x}) = \frac{df}{d\mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix}
$$

The second derivative is the Hessian matrix:

$$
H(\mathbf{x}) = \nabla^2 f(\mathbf{x}) = \frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^\top} = \begin{bmatrix} \frac{\partial^2 f}{\partial \xi_1^2} & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_n} \\ \frac{\partial^2 f}{\partial \xi_2 \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_2^2} & \cdots & \frac{\partial^2 f}{\partial \xi_2 \partial \xi_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial \xi_n \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_n \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_n^2} \end{bmatrix}
$$

The Hessian is symmetric whenever the second partial derivatives of $f$ are continuous, since the mixed partials then do not depend on the order of differentiation.
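As a concrete check of the gradient and the Hessian, here is a minimal finite-difference sketch. The function $f(\mathbf{x}) = \xi_1^2 \xi_2 + \sin \xi_3$, the evaluation point, the helper names `gradient` and `hessian`, and the step sizes are all assumptions made for illustration. For this $f$ the analytic gradient is $(2\xi_1\xi_2,\ \xi_1^2,\ \cos\xi_3)^\top$.

```python
import numpy as np

def gradient(f, x, eps=1e-5):
    """Central-difference approximation of the column gradient df/dx."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        g[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def hessian(f, x, eps=1e-4):
    """Approximate the Hessian by differencing the gradient column by column."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros_like(x)
        e[j] = eps
        # Column j holds the derivative of the gradient with respect to xi_j.
        H[:, j] = (gradient(f, x + e) - gradient(f, x - e)) / (2 * eps)
    return H

def f(x):
    # Illustrative function (an assumption): xi_1^2 * xi_2 + sin(xi_3)
    return x[0] ** 2 * x[1] + np.sin(x[2])

x0 = np.array([1.0, 2.0, 0.5])
g = gradient(f, x0)
H = hessian(f, x0)
print(g)
print(H)
```

Because this $f$ is smooth, the computed `H` should agree with its transpose up to finite-difference error.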
(2) Let $\mathbf{a} = (a_1, a_2, \cdots, a_n)^\top$ be a constant vector and let $f(\mathbf{x}) = (\mathbf{x}, \mathbf{a}) = \mathbf{a}^\top \mathbf{x}$ be the inner product of $\mathbf{x}$ and $\mathbf{a}$. Find $\frac{\partial f}{\partial \mathbf{x}}$.

Solution: since $f(\mathbf{x}) = \sum_{i=1}^{n} a_i \xi_i$, we have $\frac{\partial f}{\partial \xi_j} = a_j$ for $j = 1, 2, \cdots, n$, so

$$
\frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \mathbf{a}
$$
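The result $\partial(\mathbf{a}^\top\mathbf{x})/\partial\mathbf{x} = \mathbf{a}$ can be confirmed numerically. In the sketch below the dimension `n`, the random seed, and the step size `eps` are arbitrary assumptions; the gradient is approximated one coordinate at a time using the rows of the identity matrix as unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a = rng.standard_normal(n)   # constant vector
x = rng.standard_normal(n)   # point at which the derivative is evaluated

def f(v):
    # f(x) = a^T x = sum_i a_i * xi_i
    return a @ v

eps = 1e-6
# Central difference along each coordinate direction e_j.
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(n)])

print(np.allclose(grad, a, atol=1e-6))
```

Since $f$ is linear, the answer does not depend on the point `x`, which matches the fact that the derivative is the constant vector $\mathbf{a}$.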
(3) Let $A = (a_{ij})_{m \times n}$ be a constant matrix and $X = (x_{ij})_{n \times m}$ a matrix variable, and let $f(X) = \operatorname{tr}(AX)$. Find $\frac{\partial f}{\partial X}$.

Analysis: write the product $C = AX$ entry by entry:
$$
\begin{pmatrix}
c_{11} & \cdots & c_{1m} \\
\vdots & \ddots & \vdots \\
c_{m1} & \cdots & c_{mm}
\end{pmatrix}=\begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{pmatrix}\begin{pmatrix}
x_{11} & \cdots & x_{1m} \\
\vdots & \ddots & \vdots \\
x_{n1} & \cdots & x_{nm}
\end{pmatrix}
$$
Expanding the diagonal entries of the product gives:

\begin{equation}
\begin{aligned}
c_{11} &= a_{11}x_{11} + a_{12}x_{21} + \cdots + a_{1n}x_{n1}, \\
c_{22} &= a_{21}x_{12} + a_{22}x_{22} + \cdots + a_{2n}x_{n2}, \\
&\qquad \mathllap{\vdots} \\
c_{mm} &= a_{m1}x_{1m} + a_{m2}x_{2m} + \cdots + a_{mn}x_{nm}
\end{aligned}
\end{equation}
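Pattern: every entry of $X$ appears exactly once in the sum of these diagonal entries, and $x_{ks}$ is always multiplied by $a_{sk}$, so the subscripts of $x$ are those of $a$ in reverse order.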
Solution: the entry of $AX$ in row $s$ and column $t$ is $\sum_{k=1}^n a_{sk}x_{kt}$, that is, $AX = \left(\sum_{k=1}^n a_{sk}x_{kt}\right)_{m \times m}$.
Summing the diagonal entries ($t = s$) gives $f(X) = \operatorname{tr}(AX) = \sum_{s=1}^{m} \sum_{k=1}^n a_{sk} x_{ks}$.
Differentiating with respect to each entry $x_{ij}$ gives:

$$
\left( \frac{\partial f}{\partial x_{ij}} \right)_{n \times m} = (a_{ji})_{n \times m} \quad (i=1,2,\cdots,n,\ j = 1,2,\cdots,m)
$$
Therefore:

$$
\frac{\partial f}{\partial X} = \left( \frac{\partial f}{\partial x_{ij}} \right)_{n \times m} = (a_{ji})_{n \times m} = A^\top
$$
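The identity $\partial \operatorname{tr}(AX) / \partial X = A^\top$ is easy to verify numerically. The following sketch assumes arbitrary sizes `m = 3`, `n = 4`, a fixed random seed, and a central-difference step `eps`; none of these come from the derivation itself. It also checks the double-sum form of the trace used in the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))   # constant m x n matrix
X = rng.standard_normal((n, m))   # n x m matrix variable

def f(M):
    # f(X) = tr(AX)
    return np.trace(A @ M)

# tr(AX) = sum_s sum_k a_sk * x_ks, i.e. the elementwise product of A and X^T.
double_sum = np.sum(A * X.T)

eps = 1e-6
D = np.zeros_like(X)              # will hold df/dX, shape n x m
for i in range(n):
    for j in range(m):
        E = np.zeros_like(X)
        E[i, j] = eps             # perturb only x_ij
        D[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.isclose(f(X), double_sum))
print(np.allclose(D, A.T, atol=1e-6))
```

Note that `D` has the shape of $X$ ($n \times m$), which is why the result is $A^\top$ rather than $A$.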
(4) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$, let $A = (a_{ij})_{n \times n}$ be a matrix, and let $f(\mathbf{x}) = \mathbf{x}^\top A \mathbf{x}$ be a function of $n$ variables. Find the derivative $\dfrac{df}{d\mathbf{x}}$.
Solution: expanding the quadratic form,

\begin{align*}
f\left( \mathbf{x} \right)
&= \mathbf{x}^\top A \mathbf{x} \\
&=
\left( \xi_1, \xi_2, \cdots, \xi_n \right)
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix}
\xi_1 \\
\xi_2 \\
\vdots \\
\xi_n
\end{pmatrix} \\
&=
\left(
\begin{array}{cccccc}
\xi_1 & \xi_2 & \cdots & \xi_k & \cdots & \xi_n
\end{array}
\right)
\left(
\begin{array}{c}
\displaystyle \sum_{j=1}^n a_{1j} \xi_j \\
\displaystyle \sum_{j=1}^n a_{2j} \xi_j \\
\vdots \\
\displaystyle \sum_{j=1}^n a_{kj} \xi_j \\
\vdots \\
\displaystyle \sum_{j=1}^n a_{nj} \xi_j
\end{array}
\right) \\
&=
\xi_1\sum_{j=1}^{n}a_{1j}\xi_j + \cdots + \xi_k\sum_{j=1}^{n}a_{kj}\xi_j + \cdots + \xi_n\sum_{j=1}^{n}a_{nj}\xi_j
\end{align*}
Differentiating with respect to $\xi_k$ (the $k$-th term is differentiated by the product rule; every other term $\xi_i \sum_j a_{ij}\xi_j$ contributes $\xi_i a_{ik}$):

\begin{align*}
\frac{\partial f(\mathbf{x})}{\partial \xi_k}
&= \xi_1 a_{1k} + \cdots + \xi_{k-1} a_{k-1,k} + \left( \sum_{j=1}^{n} a_{kj} \xi_j + \xi_k a_{kk}\right) + \xi_{k+1} a_{k+1,k} + \cdots + \xi_n a_{nk} \\
&= \sum_{i=1}^n a_{ik} \xi_i + \sum_{j=1}^n a_{kj} \xi_j, \quad k=1,2,\cdots,n
\end{align*}
Stacking these components into a column vector:

\begin{align*}
\dfrac{d f}{d \mathbf{x}}
&=\begin{pmatrix}
\frac{\partial f}{\partial \xi_1} \\
\frac{\partial f}{\partial \xi_2} \\
\vdots \\
\frac{\partial f}{\partial \xi_n}
\end{pmatrix}
=\left(
\begin{array}{c}
\displaystyle \sum_{j=1}^n a_{1j} \xi_j \\
\displaystyle \sum_{j=1}^n a_{2j} \xi_j \\
\vdots \\
\displaystyle \sum_{j=1}^n a_{nj} \xi_j
\end{array}
\right) + \left(
\begin{array}{c}
\displaystyle \sum_{i=1}^n a_{i1} \xi_i \\
\displaystyle \sum_{i=1}^n a_{i2} \xi_i \\
\vdots \\
\displaystyle \sum_{i=1}^n a_{in} \xi_i
\end{array}
\right) \\
&=A\mathbf{x} + A^\top \mathbf{x} = (A + A^\top)\mathbf{x}
\end{align*}
In particular, when $A$ is symmetric, $\dfrac{d f}{d \mathbf{x}} = 2A\mathbf{x}$.
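Both the general formula $(A + A^\top)\mathbf{x}$ and the symmetric special case $2A\mathbf{x}$ can be checked with finite differences. This is a sketch under assumptions: the size `n = 4`, the random seed, the helper names `quad` and `num_grad`, and the step `eps` are illustrative choices only. The matrix `A` is deliberately not symmetric, and `S` is its symmetric part.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # deliberately not symmetric
x = rng.standard_normal(n)

def quad(M, v):
    # Quadratic form v^T M v
    return v @ M @ v

def num_grad(M, v, eps=1e-6):
    """Central-difference gradient of v -> v^T M v."""
    return np.array([(quad(M, v + eps * e) - quad(M, v - eps * e)) / (2 * eps)
                     for e in np.eye(len(v))])

g = num_grad(A, x)                # general case
S = (A + A.T) / 2                 # symmetric part of A
g_sym = num_grad(S, x)            # symmetric case

print(np.allclose(g, (A + A.T) @ x, atol=1e-6))
print(np.allclose(g_sym, 2 * S @ x, atol=1e-6))
```

The two gradients coincide because $\mathbf{x}^\top A \mathbf{x} = \mathbf{x}^\top S \mathbf{x}$: only the symmetric part of $A$ contributes to the quadratic form, which is consistent with $(A + A^\top)\mathbf{x} = 2S\mathbf{x}$.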



