1. Definition of the $\operatorname{softmax}$ function
For any $n$-dimensional vector $X = \begin{pmatrix} \vdots \\ x_i \\ \vdots \end{pmatrix}$, define

$$\operatorname{softmax}(X) = \begin{pmatrix} \vdots \\ \dfrac{e^{x_i}}{\sum\limits_{k} e^{x_k}} \\ \vdots \end{pmatrix}$$
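As a sketch, the definition above translates directly to NumPy. Subtracting $\max(x)$ before exponentiating leaves each ratio $e^{x_i} / \sum_k e^{x_k}$ unchanged (numerator and denominator are both scaled by $e^{-\max(x)}$) but prevents overflow; the function name `softmax` here is just an illustrative choice, not an API from the text.

```python
import numpy as np

def softmax(x):
    """Compute softmax(X)_i = e^{x_i} / sum_k e^{x_k}.

    Shifting by max(x) is a standard numerical-stability trick:
    it cancels in the ratio, so the result is mathematically identical.
    """
    z = np.exp(x - np.max(x))
    return z / z.sum()

x = np.array([1.0, 2.0, 3.0])
y = softmax(x)  # entries are positive and sum to 1
```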
2. Gradient of $\operatorname{softmax}$
Let the $n$-dimensional vector $\widehat{Y} = \begin{pmatrix} \vdots \\ \hat{y}_i \\ \vdots \end{pmatrix} = \operatorname{softmax}(X)$. Then
$$\dfrac{\partial \hat{y}_i}{\partial x_j} = \begin{cases} \dfrac{e^{x_j} \sum\limits_{k} e^{x_k} - e^{x_j} e^{x_j}}{\left( \sum\limits_{k} e^{x_k} \right)^2}, & i = j \\[3ex] -\dfrac{e^{x_i} e^{x_j}}{\left( \sum\limits_{k} e^{x_k} \right)^2}, & i \neq j \end{cases} = \begin{cases} \hat{y}_j \left( 1 - \hat{y}_j \right), & i = j \\ -\hat{y}_i \hat{y}_j, & i \neq j \end{cases}$$
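The piecewise formula packs neatly into matrix form: the full Jacobian is $\operatorname{diag}(\hat{y}) - \hat{y}\hat{y}^{\mathsf T}$, since the diagonal entries are $\hat{y}_j(1-\hat{y}_j)$ and the off-diagonal entries are $-\hat{y}_i \hat{y}_j$. A minimal sketch (the helper names `softmax` and `softmax_jacobian` are illustrative, not from the text):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax_jacobian(x):
    """Jacobian J[i, j] = d y_hat_i / d x_j from the piecewise gradient:
      y_j (1 - y_j)  on the diagonal  (i == j)
      -y_i * y_j     off the diagonal (i != j)
    which is exactly diag(y) - y y^T.
    """
    y = softmax(x)
    return np.diag(y) - np.outer(y, y)
```

Two quick sanity checks follow from the formula: the Jacobian is symmetric, and each row sums to zero (perturbing any $x_j$ cannot change the total probability mass of 1).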