Partial Derivatives with Respect to a Matrix

Let $X = (x_{ij})_{m \times n}$, and let $f(X) = f(x_{11}, x_{12}, \ldots, x_{1n}, x_{21}, \ldots, x_{mn})$ be a function of $mn$ variables whose partial derivatives

$$\frac{\partial f}{\partial x_{ij}} \quad (i=1,2,\ldots,m,\ j=1,2,\ldots,n)$$

all exist. The derivative of $f(X)$ with respect to the matrix $X$ is defined as

$$\frac{df(X)}{dX} = \left( \frac{\partial f}{\partial x_{ij}} \right)_{m \times n} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{m1}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{bmatrix}$$
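The definition can be checked numerically. The sketch below is only an illustration: the helper `matrix_derivative`, the test function `sum_of_squares`, and the step size `h` are assumptions made for this example, not part of the original text. It approximates each entry $\partial f / \partial x_{ij}$ by a central difference and arranges the results in a matrix of the same $m \times n$ shape as $X$.

```python
def matrix_derivative(f, X, h=1e-6):
    """Approximate df/dX entry by entry with central differences.

    X is a list of rows; the result has the same m x n shape as X.
    """
    m, n = len(X), len(X[0])
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Perturb only the (i, j) entry, leaving all others fixed.
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            D[i][j] = (f(plus) - f(minus)) / (2 * h)
    return D


def sum_of_squares(X):
    # Illustrative scalar function: f(X) = sum of x_ij^2, so df/dX = 2X.
    return sum(x * x for row in X for x in row)


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
D = matrix_derivative(sum_of_squares, X)
for row in D:
    print([round(v, 4) for v in row])
```

For this choice of $f$ the printed matrix should be close to $2X$, entry by entry.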

(1) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$ and let $f(\mathbf{x})$ be a function of $n$ variables. Find $\frac{df}{d\mathbf{x}^\top}$, $\frac{df}{d\mathbf{x}}$, and $\frac{d^2f}{d\mathbf{x}^2}$.

$$\frac{df}{d\mathbf{x}^\top} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1}, & \frac{\partial f}{\partial \xi_2}, & \cdots, & \frac{\partial f}{\partial \xi_n} \end{pmatrix}$$

$$\nabla f(\mathbf{x}) = \frac{df}{d\mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix}$$

This column vector is the gradient.

$$H(\mathbf{x}) = \nabla^2 f(\mathbf{x}) = \frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^\top} = \begin{bmatrix} \frac{\partial^2 f}{\partial \xi_1^2} & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_n} \\ \frac{\partial^2 f}{\partial \xi_2 \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_2^2} & \cdots & \frac{\partial^2 f}{\partial \xi_2 \partial \xi_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial \xi_n \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_n \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_n^2} \end{bmatrix}$$

This is the Hessian matrix. It is symmetric whenever the second partial derivatives are continuous.
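As a small worked example (the function is chosen only for illustration and does not come from the text above), take $n = 2$ and $f(\mathbf{x}) = \xi_1^2 + 3\xi_1\xi_2 + 2\xi_2^2$. Differentiating term by term gives

$$\nabla f(\mathbf{x}) = \begin{pmatrix} 2\xi_1 + 3\xi_2 \\ 3\xi_1 + 4\xi_2 \end{pmatrix}, \qquad H(\mathbf{x}) = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}.$$

The two off-diagonal entries of $H$ agree because the mixed partial derivatives $\frac{\partial^2 f}{\partial \xi_1 \partial \xi_2}$ and $\frac{\partial^2 f}{\partial \xi_2 \partial \xi_1}$ are both equal to 3.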

(2) Let $\mathbf{a} = (a_1, a_2, \cdots, a_n)^\top$ be a constant vector, let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$ be the vector variable, and let $f(\mathbf{x}) = \mathbf{a}^\top \mathbf{x}$ be the inner product of $\mathbf{x}$ and $\mathbf{a}$. Find $\frac{\partial f}{\partial \mathbf{x}}$.

Solution: since $f(\mathbf{x}) = \sum_{i=1}^{n} a_i \xi_i$, we have $\frac{\partial f}{\partial \xi_j} = a_j$ $(j = 1,2,\cdots,n)$, so

$$\frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \mathbf{a}$$

(3) Let $A = (a_{ij})_{m \times n}$ be a constant matrix, let $X = (x_{ij})_{n \times m}$ be the matrix variable, and let $f(X) = \operatorname{tr}(AX)$. Find $\frac{\partial f}{\partial X}$.

Analysis: write the product $C = AX$ as

$$\begin{pmatrix} c_{11} & \cdots & c_{1m} \\ \vdots & \ddots & \vdots \\ c_{m1} & \cdots & c_{mm} \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}$$

Expanding the diagonal entries gives

$$\begin{aligned} c_{11} &= a_{11}x_{11} + a_{12}x_{21} + \cdots + a_{1n}x_{n1}, \\ c_{22} &= a_{21}x_{12} + a_{22}x_{22} + \cdots + a_{2n}x_{n2}, \\ &\ \ \vdots \\ c_{mm} &= a_{m1}x_{1m} + a_{m2}x_{2m} + \cdots + a_{mn}x_{nm} \end{aligned}$$

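Pattern: in the sum of the diagonal entries $c_{ss}$, each $x_{ks}$ appears exactly once, and its subscripts are those of its coefficient $a_{sk}$ in reverse order.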

Solution: since $AX = \left(\sum_{k=1}^n a_{ik}x_{kj}\right)_{m \times m}$,

we have $f(X) = \operatorname{tr}(AX) = \sum_{s=1}^{m} \sum_{k=1}^n a_{sk} x_{ks}$.

Therefore

$$\left( \frac{\partial f}{\partial x_{ij}} \right)_{n \times m} = (a_{ji})_{n \times m} \quad (i=1,2,\cdots,n,\ j = 1,2,\cdots,m),$$

and hence

$$\frac{\partial f}{\partial X} = \left( \frac{\partial f}{\partial x_{ij}} \right) = (a_{ji})_{n \times m} = A^\top$$
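The result $\frac{\partial}{\partial X}\operatorname{tr}(AX) = A^\top$ can be sanity-checked with finite differences. This is a minimal sketch under stated assumptions: the helper names `trace_of_product` and `numerical_derivative` and the sample matrices are made up for this illustration, with $m = 2$ and $n = 3$.

```python
def trace_of_product(A, X):
    """Return tr(AX) for A (m x n) and X (n x m), both given as lists of rows."""
    m, n = len(A), len(A[0])
    return sum(A[s][k] * X[k][s] for s in range(m) for k in range(n))


def numerical_derivative(f, X, h=1e-6):
    """Central-difference approximation of df/dX, same shape as X."""
    rows, cols = len(X), len(X[0])
    D = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            D[i][j] = (f(plus) - f(minus)) / (2 * h)
    return D


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # hypothetical constant matrix, m = 2, n = 3
X = [[0.5, -1.0],
     [2.0, 0.0],
     [1.5, 3.0]]        # hypothetical point of evaluation, n = 3, m = 2

D = numerical_derivative(lambda M: trace_of_product(A, M), X)
A_transpose = [list(col) for col in zip(*A)]
max_error = max(abs(D[i][j] - A_transpose[i][j])
                for i in range(3) for j in range(2))
print(max_error < 1e-6)
```

Because $\operatorname{tr}(AX)$ is linear in $X$, the numerical derivative does not depend on the point $X$ chosen, which matches the constant answer $A^\top$.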

(4) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$, let $A = (a_{ij})_{n \times n}$ be a constant matrix, and let $f(\mathbf{x}) = \mathbf{x}^\top A \mathbf{x}$ be a function of $n$ variables. Find the derivative $\dfrac{df}{d\mathbf{x}}$.

Solution: since

$$\begin{aligned} f(\mathbf{x}) &= \mathbf{x}^\top A \mathbf{x} \\ &= \left( \xi_1, \xi_2, \cdots, \xi_n \right) \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_n \end{pmatrix} \\ &= \begin{pmatrix} \xi_1 & \xi_2 & \cdots & \xi_k & \cdots & \xi_n \end{pmatrix} \begin{pmatrix} \displaystyle \sum_{i=1}^n a_{1i} \xi_i \\ \displaystyle \sum_{i=1}^n a_{2i} \xi_i \\ \vdots \\ \displaystyle \sum_{i=1}^n a_{ki} \xi_i \\ \vdots \\ \displaystyle \sum_{i=1}^n a_{ni} \xi_i \end{pmatrix} \\ &= \xi_1\sum_{j=1}^{n}a_{1j}\xi_j + \cdots + \xi_k\sum_{j=1}^{n}a_{kj}\xi_j + \cdots + \xi_n\sum_{j=1}^{n}a_{nj}\xi_j \end{aligned}$$

it follows that

$$\begin{aligned} \frac{\partial f(\mathbf{x})}{\partial \xi_k} &= \xi_1 a_{1k} + \cdots + \xi_{k-1} a_{k-1,k} + \left( \sum_{j=1}^{n} a_{kj} \xi_j + \xi_k a_{kk}\right) + \xi_{k+1} a_{k+1,k} + \cdots + \xi_n a_{nk} \\ &= \sum_{i=1}^n a_{ik} \xi_i + \sum_{j=1}^n a_{kj} \xi_j, \quad k=1,2,\cdots,n \end{aligned}$$

Hence

$$\begin{aligned} \dfrac{df}{d\mathbf{x}} &= \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix} = \begin{pmatrix} \displaystyle \sum_{j=1}^n a_{1j} \xi_j \\ \displaystyle \sum_{j=1}^n a_{2j} \xi_j \\ \vdots \\ \displaystyle \sum_{j=1}^n a_{nj} \xi_j \end{pmatrix} + \begin{pmatrix} \displaystyle \sum_{i=1}^n a_{i1} \xi_i \\ \displaystyle \sum_{i=1}^n a_{i2} \xi_i \\ \vdots \\ \displaystyle \sum_{i=1}^n a_{in} \xi_i \end{pmatrix} \\ &= A\mathbf{x} + A^\top \mathbf{x} = (A + A^\top)\mathbf{x} \end{aligned}$$

In particular, when $A$ is symmetric, $\dfrac{df}{d\mathbf{x}} = 2A\mathbf{x}$.
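The formula $\frac{df}{d\mathbf{x}} = (A + A^\top)\mathbf{x}$ can also be compared against a finite-difference gradient. The sketch below is illustrative only: the function names and the sample values of `A` and `x` are assumptions for this example, and `A` is deliberately non-symmetric so that the $A^\top$ term matters.

```python
def quadratic_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        plus, minus = x[:], x[:]
        plus[k] += h
        minus[k] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


def closed_form_gradient(A, x):
    """Evaluate (A + A^T) x, the gradient derived above."""
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]


A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [5.0, 0.5, 2.0]]   # hypothetical non-symmetric matrix
x = [1.0, -2.0, 0.5]    # hypothetical point of evaluation

numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)
exact = closed_form_gradient(A, x)
max_error = max(abs(a - b) for a, b in zip(numeric, exact))

print(exact)             # [2.5, -8.75, -2.0]
print(max_error < 1e-6)
```

Replacing `A` with a symmetric matrix makes `closed_form_gradient` agree with $2A\mathbf{x}$, the special case noted above.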
