Matrix Differentiation
Three commonly used differentiation rules are worked through below:
Example 1: $f(x)=Ax$, where
$$A=\left[\begin{matrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{matrix}\right]$$
$$x=\left[\begin{matrix}x_1\\x_2\\x_3\end{matrix}\right]$$
Then
$$f(x)=\left[\begin{matrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{matrix}\right]\left[\begin{matrix}x_1\\x_2\\x_3\end{matrix}\right]=\left[\begin{matrix}a_{11}x_1+a_{12}x_2+a_{13}x_3\\a_{21}x_1+a_{22}x_2+a_{23}x_3\end{matrix}\right]=\left[\begin{matrix}f_1\\f_2\end{matrix}\right]$$
Differentiating the vector $f(x)$ with respect to $x^T$:
$$\frac{\partial f(x)}{\partial x^T}=\left[\begin{matrix}\frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\frac{\partial f_1}{\partial x_3}\\\frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\frac{\partial f_2}{\partial x_3}\end{matrix}\right]=\left[\begin{matrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\end{matrix}\right]=A$$
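Summary:
$$\frac{\partial f(x)}{\partial x^T}=\frac{\partial (Ax)}{\partial x^T}=A$$
Multiplying a matrix $A$ by a column vector $x$ and then differentiating with respect to the transpose of that column vector gives the matrix $A$ itself. This is the rule for differentiating a column vector ($f$) with respect to a row vector ($x^T$).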
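The rule above can be checked numerically. The following is a minimal sketch in plain Python, assuming central finite differences as the approximation; the helper names (`matvec`, `numerical_jacobian`) and the sample values of `A` and `x` are hypothetical and chosen only for illustration.

```python
def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def numerical_jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f at x with central differences.

    Entry [i][j] approximates d f_i / d x_j, which matches the layout
    of the derivative with respect to x^T used in Example 1.
    """
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Hypothetical sample values (assumptions, not from the text).
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.5, -1.0, 2.0]

J = numerical_jacobian(lambda v: matvec(A, v), x)

# Largest entrywise gap between the numerical Jacobian and A.
max_error = max(abs(J[i][j] - A[i][j]) for i in range(2) for j in range(3))
print(max_error)
```

Because $f(x)=Ax$ is linear, the finite-difference Jacobian should agree with $A$ up to floating-point rounding, whatever point `x` is chosen.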
Example 2: Let $x=\left[\begin{matrix}x_1\\x_2\end{matrix}\right]$ and $A=\left[\begin{matrix}a&b\\c&d\end{matrix}\right]$. Then
$$f(x)=x^TAx=\left[\begin{matrix}x_1&x_2\end{matrix}\right]\left[\begin{matrix}a&b\\c&d\end{matrix}\right]\left[\begin{matrix}x_1\\x_2\end{matrix}\right]=\left[\begin{matrix}ax_1+cx_2&bx_1+dx_2\end{matrix}\right]\left[\begin{matrix}x_1\\x_2\end{matrix}\right]=ax_1^2+(b+c)x_1x_2+dx_2^2$$
(the result is a real number).
Differentiating:
$$\frac{\partial f}{\partial x}=\left[\begin{matrix}\frac{\partial f}{\partial x_1}\\\frac{\partial f}{\partial x_2}\end{matrix}\right]=\left[\begin{matrix}2ax_1+(b+c)x_2\\(b+c)x_1+2dx_2\end{matrix}\right]=\left(\left[\begin{matrix}a&b\\c&d\end{matrix}\right]+\left[\begin{matrix}a&c\\b&d\end{matrix}\right]\right)\left[\begin{matrix}x_1\\x_2\end{matrix}\right]=(A+A^T)x$$
(here $f$ denotes $f(x)$).
In particular, when $A$ is a symmetric matrix ($A=A^T$), $\frac{\partial f}{\partial x}=2Ax$.
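The $2\times 2$ computation above carries over to any $n\times n$ matrix. As a sketch, write $f$ as a double sum and differentiate with respect to a single component $x_k$. The index notation $a_{ij}$, $n$, and $k$ is introduced here for illustration and is not part of the original example:

$$f(x)=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_ix_j,\qquad \frac{\partial f}{\partial x_k}=\sum_{j=1}^{n}a_{kj}x_j+\sum_{i=1}^{n}a_{ik}x_i=(Ax)_k+(A^Tx)_k$$

The first sum collects the terms where $x_k$ appears as the left factor ($i=k$), and the second collects the terms where it appears as the right factor ($j=k$). Stacking the components for $k=1,\dots,n$ gives $(A+A^T)x$.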
Summary:
The derivative of the real-valued function $f(x)=x^TAx$ with respect to the column vector $x$ is:
$$\frac{\partial f(x)}{\partial x}=\frac{\partial (x^TAx)}{\partial x}=Ax+A^Tx$$
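As a quick sanity check of this rule, the sketch below compares $(A+A^T)x$ against a central-difference gradient in plain Python. The function names and the sample values are hypothetical; the matrix is deliberately non-symmetric so that $Ax+A^Tx$ differs from $2Ax$.

```python
def quadratic_form(A, x):
    """Return the scalar x^T A x for a square list-of-rows matrix A."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x (central differences)."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def analytic_gradient(A, x):
    """Return (A + A^T) x, the closed-form gradient of x^T A x."""
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]


# Hypothetical sample values (assumptions, not from the text).
A = [[1.0, 2.0],
     [3.0, 4.0]]
x = [0.5, -1.5]

grad_numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)
grad_analytic = analytic_gradient(A, x)

# Largest componentwise gap between the two gradients.
max_error = max(abs(g_n - g_a) for g_n, g_a in zip(grad_numeric, grad_analytic))
print(grad_analytic, max_error)
```

For these values the closed form works out by hand to $(-6.5,\,-9.5)$, and the numerical gradient should match it up to rounding.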
Example 3: Let $a=\left[\begin{matrix}a_1\\a_2\end{matrix}\right]$, $x=\left[\begin{matrix}x_1\\x_2\end{matrix}\right]$, and $f(x)=a^Tx$. Find the derivative of $f(x)=a^Tx$.
$$f=f(x)=\left[\begin{matrix}a_1&a_2\end{matrix}\right]\left[\begin{matrix}x_1\\x_2\end{matrix}\right]=a_1x_1+a_2x_2$$
Differentiating with respect to $x$:
$$\frac{\partial f}{\partial x}=\left[\begin{matrix}\frac{\partial f}{\partial x_1}\\\frac{\partial f}{\partial x_2}\end{matrix}\right]=\left[\begin{matrix}a_1\\a_2\end{matrix}\right]=a$$
Differentiating with respect to $a$:
$$\frac{\partial f}{\partial a}=\left[\begin{matrix}\frac{\partial f}{\partial a_1}\\\frac{\partial f}{\partial a_2}\end{matrix}\right]=\left[\begin{matrix}x_1\\x_2\end{matrix}\right]=x$$
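Both derivatives of the inner product can be confirmed the same way. This is a minimal plain-Python sketch, again assuming central finite differences; `dot`, `numerical_gradient`, and the sample vectors are hypothetical names and values used only for illustration.

```python
def dot(u, v):
    """Return the inner product u^T v of two equal-length lists."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x (central differences)."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Hypothetical sample values (assumptions, not from the text).
a = [2.0, -3.0]
x = [0.5, 4.0]

# Hold a fixed and vary x: the gradient should reproduce a.
grad_x = numerical_gradient(lambda v: dot(a, v), x)

# Hold x fixed and vary a: the gradient should reproduce x.
grad_a = numerical_gradient(lambda v: dot(v, x), a)

print(grad_x, grad_a)
```

The symmetry between the two results reflects that $a^Tx=x^Ta$, so swapping the roles of $a$ and $x$ swaps the two derivatives.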