$$
L=\frac{1}{N}\sum_{i=1}^{N}-\log\left(\frac{e^{S_{y_i}}}{\sum_{j}e^{S_j}}\right)+\lambda R(W_1)+\lambda R(W_2)\\
S=W_2\max(0,W_1x+b_1)+b_2
$$
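As a rough illustration, the loss above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `two_layer_loss` is hypothetical, $R(W)$ is assumed to be the L2 penalty $\sum W^2$ (the text does not specify it), and samples are stored as rows of `X`, so the scores are computed as $\max(0, XW_1+b_1)W_2+b_2$ rather than in the column-vector form written above.

```python
import numpy as np

def two_layer_loss(X, y, W1, b1, W2, b2, lam):
    """Softmax loss of a two-layer ReLU network (R assumed to be an L2 penalty)."""
    # Row convention (assumption): each row of X is one sample.
    hidden = np.maximum(0, X @ W1 + b1)   # ReLU hidden layer, shape (N, H)
    scores = hidden @ W2 + b2             # class scores S, shape (N, C)

    # Subtract the per-row maximum for numerical stability; softmax is unchanged.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average negative log-probability of the correct class y_i.
    n = X.shape[0]
    data_loss = -log_probs[np.arange(n), y].mean()

    # lambda * R(W1) + lambda * R(W2), with R(W) = sum of squared entries (assumption).
    reg_loss = lam * np.sum(W1 * W1) + lam * np.sum(W2 * W2)
    return data_loss + reg_loss
```

With all weights and biases set to zero, every class gets the same score, so the function returns $\log C$ for $C$ classes, which is a convenient sanity check.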
**Note:** $\frac{dL}{dW}\neq\frac{dL}{dS_{y_i}}\cdot\frac{dS_{y_i}}{dW}+\frac{dL}{dS_j}\cdot\frac{dS_j}{dW}$, because $S_{y_i}$