Derivative Formulas in Machine Learning
Derivatives of the Loss Function
Let $\operatorname{loss}(X)$ be the loss function for a single sample $X$,
$A = g(Z) = \begin{pmatrix} g(z_1) \\ \vdots \\ g(z_n) \end{pmatrix}$, i.e. $a_i = g(z_i)$, where $g$ is the activation function.
$Z = WX + b$, i.e. $z_i = \sum\limits_{j} w_{i,j} x_j + b_i$.
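The forward map above can be sketched in NumPy; the dimensions ($n$ outputs, $m$ inputs) and the tanh activation are illustrative assumptions, not part of the original derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4                     # n outputs, m inputs (arbitrary choice)
W = rng.normal(size=(n, m))     # weight matrix W
b = rng.normal(size=(n, 1))     # bias b, a column vector
X = rng.normal(size=(m, 1))     # single sample X as a column vector

Z = W @ X + b                   # Z = WX + b
A = np.tanh(Z)                  # A = g(Z), applied element-wise

# The matrix form agrees with the index form z_i = sum_j w_ij x_j + b_i:
assert np.allclose(Z[1, 0], W[1] @ X[:, 0] + b[1, 0])
```

Note that $A$ inherits the shape of $Z$ because $g$ acts element-wise.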
For any variable $x$, write $\mathrm{d}x = \dfrac{\partial}{\partial x}\operatorname{loss}(X)$.
Then:
$\mathrm{d}Z = \mathrm{d}A * g'(Z)$, where $*$ denotes the element-wise (Hadamard) product.
$\mathrm{d}X = W^{\intercal} \cdot \mathrm{d}Z$
$\mathrm{d}W = \mathrm{d}Z \cdot X^{\intercal}$
$\mathrm{d}b = \mathrm{d}Z$ (for a single sample, the bias gradient equals $\mathrm{d}Z$ directly, since $\partial z_i / \partial b_i = 1$).
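The four backward formulas can be checked numerically. The sketch below assumes a sigmoid activation and a toy loss $\operatorname{loss}(X) = \sum_i a_i$ (both hypothetical choices for illustration); the analytic $\mathrm{d}X$ from the formulas is compared against central finite differences:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
n, m = 3, 4
W = rng.normal(size=(n, m))
b = rng.normal(size=(n, 1))
X = rng.normal(size=(m, 1))          # single sample as a column vector

def loss(X):
    # Toy scalar loss: sum of the activations (assumption for this sketch)
    return sigmoid(W @ X + b).sum()

# Forward pass
Z = W @ X + b
A = sigmoid(Z)

# Backward pass, using the formulas above
dA = np.ones_like(A)                 # d(sum A)/dA = 1 for the toy loss
dZ = dA * sigmoid_prime(Z)           # dZ = dA * g'(Z), element-wise
dX = W.T @ dZ                        # dX = W^T . dZ
dW = dZ @ X.T                        # dW = dZ . X^T
db = dZ                              # db = dZ

# Numerical check of dX via central differences
eps = 1e-6
dX_num = np.zeros_like(X)
for j in range(m):
    Xp, Xm = X.copy(), X.copy()
    Xp[j, 0] += eps
    Xm[j, 0] -= eps
    dX_num[j, 0] = (loss(Xp) - loss(Xm)) / (2 * eps)

assert np.allclose(dX, dX_num, atol=1e-5)
```

The same finite-difference check can be repeated for $\mathrm{d}W$ and $\mathrm{d}b$ by perturbing the corresponding entries of $W$ and $b$.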