Derivative Formulas in Machine Learning
Derivatives of the Loss Function
Let $\operatorname{loss}(X)$ denote the loss function for a single sample $X$,
$A = g(Z) = \begin{pmatrix} g(z_1) \\ \vdots \\ g(z_n) \end{pmatrix}$, i.e. $a_i = g(z_i)$, where $g$ is the activation function.
$Z = WX + b$, i.e. $z_i = \sum\limits_{j} w_{i,j} x_j + b_i$.
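The forward pass above can be sketched in NumPy for a single sample. The text leaves $g$ generic, so the sigmoid used here (and the layer sizes) are illustrative assumptions, not part of the original:

```python
import numpy as np

def sigmoid(z):
    """Assumed activation g; the text leaves g generic."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_out = 3, 2                       # hypothetical layer sizes
W = rng.standard_normal((n_out, n_in))   # weight matrix, entries w_{i,j}
b = rng.standard_normal((n_out, 1))      # bias vector, entries b_i
X = rng.standard_normal((n_in, 1))       # single sample as a column vector

Z = W @ X + b      # z_i = sum_j w_{i,j} x_j + b_i
A = sigmoid(Z)     # a_i = g(z_i), applied element-wise
print(Z.shape, A.shape)  # (2, 1) (2, 1)
```

Keeping $X$, $Z$, and $A$ as column vectors makes the shapes line up exactly with the matrix formulas in the text.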
For any variable $x$, write $\mathrm{d}x = \dfrac{\partial}{\partial x}\operatorname{loss}(X)$.
Then:
$\mathrm{d}Z = \mathrm{d}A * g'(Z)$, where $*$ denotes the element-wise (Hadamard) product.
$\mathrm{d}X = W^{\intercal} \cdot \mathrm{d}Z$
$\mathrm{d}W = \mathrm{d}Z \cdot X^{\intercal}$
$\mathrm{d}b = \mathrm{d}Z$
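The four gradient formulas can be checked numerically. This sketch assumes a sigmoid activation and a squared-error loss $\operatorname{loss}(X) = \frac{1}{2}\lVert A - y \rVert^2$ (so $\mathrm{d}A = A - y$); neither choice is specified in the text, they only serve to make the chain of formulas concrete. The backward pass is then compared against central finite differences on $W$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(W, b, X, y):
    """Assumed loss: 0.5 * ||g(WX + b) - y||^2 for a single sample."""
    A = sigmoid(W @ X + b)
    return 0.5 * np.sum((A - y) ** 2)

rng = np.random.default_rng(1)
n_in, n_out = 3, 2
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal((n_out, 1))
X = rng.standard_normal((n_in, 1))
y = rng.standard_normal((n_out, 1))  # hypothetical target

# Forward pass
Z = W @ X + b
A = sigmoid(Z)

# Backward pass, following the formulas above
dA = A - y                                # gradient of the assumed loss w.r.t. A
dZ = dA * sigmoid(Z) * (1 - sigmoid(Z))   # dZ = dA * g'(Z), element-wise
dW = dZ @ X.T                             # dW = dZ · X^T
dX = W.T @ dZ                             # dX = W^T · dZ
db = dZ                                   # db = dZ

# Numerical check of dW via central differences
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num_dW[i, j] = (loss_fn(Wp, b, X, y) - loss_fn(Wm, b, X, y)) / (2 * eps)

print(np.max(np.abs(dW - num_dW)))  # small: analytic and numeric gradients agree
```

Note the shapes fall out automatically: $\mathrm{d}Z \cdot X^{\intercal}$ has the shape of $W$, and $W^{\intercal} \cdot \mathrm{d}Z$ has the shape of $X$, which is a quick sanity check on each formula.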