Softmax Cross Entropy Gradient Derivation
The Gradient of Softmax
Softmax is defined as:
$$p_i = \frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}}, \quad i \in \{1, \dots, N\}$$
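Translated directly into code, this definition works for moderate inputs but overflows for large ones, which is what motivates the stable variant below. A minimal NumPy sketch (the function name `softmax_naive` is mine):

```python
import numpy as np

def softmax_naive(a):
    """Direct implementation of the softmax definition above."""
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a)

print(softmax_naive(np.array([1.0, 2.0, 3.0])))   # [0.09003057 0.24472847 0.66524096]
print(softmax_naive(np.array([1000.0, 1000.0])))  # [nan nan] -- exp(1000) overflows to inf
```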
Numerically stable Softmax:
$$\frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}} = \frac{C\,e^{a_i}}{C\sum_{k=1}^N e^{a_k}} = \frac{e^{a_i + \log C}}{\sum_{k=1}^N e^{a_k + \log C}}, \quad i \in \{1, \dots, N\}$$
where $\log C = -\max(\bm{a})$.
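With this shift, the largest exponent becomes $0$, so nothing overflows. A minimal NumPy sketch of the stable version:

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax: shift inputs by max(a) before exponentiating."""
    shifted = a - np.max(a)          # a_i + log C, with log C = -max(a)
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted)

print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5] -- no overflow
```

Since $C$ cancels in the ratio, the shift changes nothing mathematically; it only keeps the intermediate values representable in floating point.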
For the vector-valued Softmax function, the partial derivative of the $i$-th output with respect to the $j$-th input is:
$$\frac{\partial p_i}{\partial a_j} = \frac{\partial}{\partial a_j} \left( \frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}} \right)$$
Applying the quotient rule:
$$f(x) = \frac{g(x)}{h(x)}, \quad f'(x) = \frac{g'(x)h(x) - h'(x)g(x)}{(h(x))^2}$$
Here, $g_i = e^{a_i}$ and $h_i = \sum_{k=1}^N e^{a_k}$.
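Carrying the quotient rule through with these choices of $g_i$ and $h_i$ leads to the well-known softmax Jacobian $\frac{\partial p_i}{\partial a_j} = p_i(\delta_{ij} - p_j)$. As a sanity check on that result (a sketch only; the variable names are mine), we can compare it against a finite-difference Jacobian:

```python
import numpy as np

def softmax(a):
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

a = np.array([0.5, -1.0, 2.0])
p = softmax(a)

# Analytical Jacobian: J[i, j] = p_i * (delta_ij - p_j)
J_analytic = np.diag(p) - np.outer(p, p)

# Numerical Jacobian via central differences: perturb each input a_j separately
eps = 1e-6
J_numeric = np.zeros((3, 3))
for j in range(3):
    d = np.zeros(3)
    d[j] = eps
    J_numeric[:, j] = (softmax(a + d) - softmax(a - d)) / (2 * eps)

print(np.max(np.abs(J_analytic - J_numeric)))  # ~1e-10: the two agree
```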