Derivative of the softmax loss function

We work through back-propagation in a neural network with a Softmax classifier, which uses the Softmax function:
\[\hat y_i=\frac{\exp(o_i)}{\sum_j \exp(o_j)}\]

This is used in a loss function of the form:
\[\mathcal{L}=-\sum_j{y_j\log \hat y_j}\]
where \(o\) is the vector of logits. We need the derivative of \(\mathcal{L}\) with respect to \(o\).
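As a concrete illustration, here is a minimal MATLAB sketch (not part of the original post) that evaluates the softmax and the cross-entropy loss for a hypothetical logit vector \(o\) and one-hot label \(y\); the numerical values are made up purely for illustration.

```matlab
% Minimal sketch: forward pass of softmax + cross-entropy for one example.
% The logits o and the one-hot label y below are hypothetical values.
o = [2.0; 1.0; 0.1];            % logits o_j
y = [1; 0; 0];                  % one-hot target y_j
yhat = exp(o) ./ sum(exp(o));   % softmax: yhat_i = exp(o_i) / sum_j exp(o_j)
L = -sum(y .* log(yhat))        % cross-entropy loss
```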

Derivative of the softmax function

If \(i=j\),
\[\frac{\partial \hat y_j}{\partial o_i}=\frac{\exp(o_i)\times \sum_k \exp(o_k) - \exp(o_i)\exp(o_i)}{\left(\sum_k \exp(o_k)\right)^2}=\hat y_i(1-\hat y_i)\]
If \(i\ne j\),

\[\frac{\partial \hat y_j}{\partial o_i}=\frac{0 - \exp(o_j)\exp(o_i)}{\left(\sum_k \exp(o_k)\right)^2}=-\hat y_i \hat y_j\]

These two cases can be combined conveniently using the Kronecker delta, so the gradient becomes

\[\frac{\partial \hat y_j}{\partial o_i}=\hat y_j(\delta_{ij}-\hat y_i)\]

where the Kronecker delta \(\delta_{ij}\) is defined as:
\[\delta_{ij} = \begin{cases} 0 &\text{if } i \neq j, \\ 1 &\text{if } i=j. \end{cases}\]
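The combined Jacobian formula can be sanity-checked numerically. The following sketch (using an arbitrary logit vector; not from the original post) compares \(\hat y_j(\delta_{ij}-\hat y_i)\) against a central finite-difference approximation of \(\partial \hat y_j/\partial o_i\).

```matlab
% Sketch: finite-difference check of the softmax Jacobian
% J(j,i) = d yhat_j / d o_i = yhat_j * (delta_ij - yhat_i).
o = [2.0; 1.0; 0.1];                      % arbitrary logits
softmaxFcn = @(v) exp(v) ./ sum(exp(v));
yhat = softmaxFcn(o);
n = numel(o);
J_analytic = diag(yhat) - yhat * yhat';   % entry (j,i): yhat_j*(delta_ij - yhat_i)
J_numeric  = zeros(n);
h = 1e-6;
for i = 1:n
    e = zeros(n,1); e(i) = h;
    J_numeric(:, i) = (softmaxFcn(o + e) - softmaxFcn(o - e)) / (2*h);
end
max(abs(J_analytic(:) - J_numeric(:)))    % expected to be ~1e-10 or smaller
```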

Derivative of the cross-entropy cost function

\[\begin{split}\frac{\partial \mathcal{L}}{\partial o_i}&=-\sum_k y_k\frac{\partial \log \hat y_k}{\partial o_i}=-\sum_k y_k\frac{1}{\hat y_k}\frac{\partial \hat y_k}{\partial o_i}\\ &=-y_i(1-\hat y_i)-\sum_{k\neq i}y_k\frac{1}{\hat y_k}(-\hat y_k \hat y_i)\\ &=-y_i(1-\hat y_i)+\sum_{k\neq i}y_k \hat y_i\\ &=-y_i +y_i\hat y_i+ \hat y_i\sum_{k\ne i}{y_k}\\ &=\hat y_i\sum_k{y_k}-y_i\\ &=\hat y_i-y_i\end{split}\]

given that \(\sum_k y_k=1\) (since \(y\) is a one-hot vector: exactly one element is \(1\) and the rest are \(0\)).

Finally, we get
\[\frac{\partial \mathcal{L}}{\partial o_i} = \hat y_i - y_i\]
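As a quick numerical check of this result (again a sketch with arbitrary example values, not from the original post), we can compare \(\hat y - y\) against a finite-difference gradient of the loss with respect to the logits.

```matlab
% Sketch: numerical check that dL/do = yhat - y for cross-entropy over softmax.
o = [2.0; 1.0; 0.1];                             % arbitrary logits
y = [0; 1; 0];                                   % one-hot label
softmaxFcn = @(v) exp(v) ./ sum(exp(v));
loss       = @(v) -sum(y .* log(softmaxFcn(v))); % L as a function of the logits
g_analytic = softmaxFcn(o) - y;                  % the result derived above
n = numel(o); g_numeric = zeros(n,1); h = 1e-6;
for i = 1:n
    e = zeros(n,1); e(i) = h;
    g_numeric(i) = (loss(o + e) - loss(o - e)) / (2*h);
end
max(abs(g_analytic - g_numeric))                 % expected to be ~1e-10 or smaller
```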

Reposted from: https://www.cnblogs.com/ZJUT-jiangnan/p/5791115.html

Below is example code that replaces the ReLU activation function with a sigmoid activation function, implemented as a custom MATLAB layer:

```matlab
classdef SigmoidLayer < nnet.layer.Layer
    properties
        % (Optional) Layer properties.
        % Layer properties can be set and used during training and prediction.
        % MyProperty
    end

    methods
        function layer = SigmoidLayer(name)
            % (Optional) Create the layer.
            % This function must have the same name as the class.
            % For example: layer = SigmoidLayer(name)

            % Set layer name.
            layer.Name = name;

            % (Optional) Set layer description.
            layer.Description = 'Sigmoid Layer';
        end

        function Z = predict(layer, X)
            % Forward input data through the layer and output the result.
            % For example: Z = layer.predict(X)
            Z = 1 ./ (1 + exp(-X));        % elementwise logistic function
        end

        function [dLdX] = backward(layer, X, Z, dLdZ, memory)
            % Backward propagate the derivative of the loss function
            % through the layer.
            % For example: dLdX = layer.backward(X, Z, dLdZ, memory)
            dLdX = dLdZ .* Z .* (1 - Z);   % sigmoid'(x) = sigmoid(x)*(1 - sigmoid(x))
        end
    end
end
```

You can then use it alongside the layers of a convolutional neural network, for example:

```matlab
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    SigmoidLayer('sigmoid1')
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(5,50)
    SigmoidLayer('sigmoid2')
    maxPooling2dLayer(2,'Stride',2)
    fullyConnectedLayer(500)
    SigmoidLayer('sigmoid3')
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'MaxEpochs',20, ...
    'InitialLearnRate',0.01);

net = trainNetwork(trainData,layers,options);
```