1. Derivatives of the softmax function and the softmax cross-entropy loss
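Given a set of inputs [1, 2, …, i, …], the softmax formula converts them into a probability distribution. The softmax formula is:
$$y_i=\frac{e^i}{\sum_j{e^j}}$$
where $i$ is the $i$-th input in the set and $y_i$ is the corresponding softmax output.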
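The softmax formula can be sketched in a few lines of plain Python. The function name `softmax` and the sample inputs are illustrative choices, not part of the original text. Subtracting the maximum before exponentiating is a common numerical-stability trick added here as an assumption; it cancels in the ratio and does not change the result.

```python
import math

def softmax(inputs):
    """Map a list of real inputs to a probability distribution."""
    # Assumption: shift by the maximum so exp() cannot overflow.
    # The shift cancels between numerator and denominator.
    shift = max(inputs)
    exps = [math.exp(x - shift) for x in inputs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three probabilities, largest for the largest input
print(sum(probs))  # sums to 1 up to floating-point rounding
```

Larger inputs receive larger probabilities, and the outputs always sum to 1.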
For classification problems, the softmax cross-entropy loss is commonly used. It has the form:
$$Loss=-\sum_i{t_i\ln{y_i}}$$
where $t_i$ is the true value and $y_i$ is the predicted value. When the true class is the $i$-th one, $t_i=1$ and all other targets are 0, so the loss reduces to:
$$Loss_i=-\ln{y_i}$$
where
$$y_i = \frac{e^i}{\sum_j{e^j}}=1-\frac{\sum_{j\neq i}{e^j}}{\sum_j{e^j}}$$
(the sum in the numerator runs over $j \neq i$).
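As a small illustration of the loss, the sketch below evaluates $-\sum_i t_i \ln y_i$ for a one-hot target and confirms that it equals $-\ln y_i$ for the true class. The helper names `softmax` and `cross_entropy`, the sample inputs, and the choice of class 2 as the true class are all assumptions made for this example.

```python
import math

def softmax(inputs):
    """Softmax with a max-shift for numerical stability (an added assumption)."""
    shift = max(inputs)
    exps = [math.exp(x - shift) for x in inputs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, probs):
    """Loss = -sum_i t_i * ln(y_i); terms with t_i == 0 are skipped."""
    return -sum(t * math.log(y) for t, y in zip(targets, probs) if t != 0)

inputs = [1.0, 2.0, 3.0]
targets = [0.0, 0.0, 1.0]   # one-hot: the true class is index 2
probs = softmax(inputs)

loss = cross_entropy(targets, probs)
single_term = -math.log(probs[2])
print(loss, single_term)    # the two values agree
```

With a one-hot target, only the true class contributes, which is why the full sum collapses to a single logarithm.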
Taking the partial derivative of $Loss_i$ with respect to the input $i$ gives:
$$
\begin{aligned}
\frac{\partial{Loss_i}}{\partial{i}}&=-\frac{\partial{\ln{y_i}}}{\partial{i}} \\
&=\frac{\partial{\left(-\ln{\frac{e^i}{\sum_j{e^j}}}\right)}}{\partial{i}} \\
&=-\frac{1}{\frac{e^i}{\sum_j{e^j}}}\cdot\frac{\partial{\left(\frac{e^i}{\sum_j{e^j}}\right)}}{\partial{i}} \\
&=-\frac{\sum_j{e^j}}{e^i}\cdot\frac{\partial{\left(1-\frac{\sum_{j\neq i}{e^j}}{\sum_j{e^j}}\right)}}{\partial{i}} \\
&=-\frac{\sum_j{e^j}}{e^i}\cdot\left(-\sum_{j\neq i}{e^j}\right)\cdot\frac{\partial{\left(\frac{1}{\sum_j{e^j}}\right)}}{\partial{i}} \\
&=\frac{\sum_j{e^j}\cdot\sum_{j\neq i}{e^j}}{e^i}\cdot\frac{-e^i}{\left(\sum_j{e^j}\right)^2} \\
&=-\frac{\sum_{j\neq i}{e^j}}{\sum_j{e^j}} \\
&=-\left(1-\frac{e^i}{\sum_j{e^j}}\right) \\
&=-(1-y_i) \\
&=y_i-1
\end{aligned}
$$
The fifth line uses the fact that $\sum_{j\neq i}{e^j}$ does not depend on $i$, so it can be pulled out of the derivative as a constant factor.
That is, $\frac{\partial{Loss_i}}{\partial{i}}=y_i - 1$.
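The result $y_i - 1$ can be sanity-checked numerically. The following sketch compares it against a central finite difference of $Loss_i = -\ln y_i$. The function names, the sample inputs, the chosen class index, and the step size `eps` are illustrative assumptions.

```python
import math

def softmax(inputs):
    """Softmax with a max-shift for numerical stability (an added assumption)."""
    shift = max(inputs)
    exps = [math.exp(x - shift) for x in inputs]
    total = sum(exps)
    return [e / total for e in exps]

def loss_for_class(inputs, i):
    """Loss_i = -ln(y_i), the loss when class i is the true class."""
    return -math.log(softmax(inputs)[i])

def numerical_grad(inputs, i, eps=1e-6):
    """Central-difference estimate of dLoss_i / d(input i)."""
    plus = list(inputs)
    minus = list(inputs)
    plus[i] += eps
    minus[i] -= eps
    return (loss_for_class(plus, i) - loss_for_class(minus, i)) / (2 * eps)

inputs = [1.0, 2.0, 3.0]
i = 1
analytic = softmax(inputs)[i] - 1.0   # the closed form y_i - 1
numeric = numerical_grad(inputs, i)
print(analytic, numeric)              # the two values agree closely
```

Because $0 < y_i < 1$, the gradient with respect to the true class input is always negative: increasing that input always lowers the loss.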
The softmax function can also be differentiated directly:
$$
\frac{\partial{\frac{e^j}{\sum_k{e^k}}}}{\partial{i}}= D_jS_i=\begin{cases}
S_i(1-S_j) & i=j\\
-S_jS_i & i \neq j
\end{cases}
$$
Here $D_j$ corresponds to $\partial{\frac{e^j}{\sum_k{e^k}}}$, $S_i$ is the softmax output corresponding to input $i$, and $i$ is the input defined above.
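The two cases of the softmax derivative can be collected into a matrix and checked against finite differences. In this sketch, entry `[j][i]` holds the derivative of the $j$-th output with respect to the $i$-th input. The names `softmax_jacobian` and `numerical_jacobian`, the sample inputs, and the step size are assumptions made for illustration.

```python
import math

def softmax(inputs):
    """Softmax with a max-shift for numerical stability (an added assumption)."""
    shift = max(inputs)
    exps = [math.exp(x - shift) for x in inputs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(inputs):
    """Closed form: S_i * (1 - S_j) when i == j, otherwise -S_j * S_i."""
    s = softmax(inputs)
    n = len(s)
    return [[s[i] * (1.0 - s[j]) if i == j else -s[j] * s[i]
             for i in range(n)]
            for j in range(n)]

def numerical_jacobian(inputs, eps=1e-6):
    """Central-difference estimate of d(output j) / d(input i)."""
    n = len(inputs)
    jac = [[0.0] * n for _ in range(n)]
    for i in range(n):
        plus = list(inputs)
        minus = list(inputs)
        plus[i] += eps
        minus[i] -= eps
        s_plus, s_minus = softmax(plus), softmax(minus)
        for j in range(n):
            jac[j][i] = (s_plus[j] - s_minus[j]) / (2 * eps)
    return jac

inputs = [1.0, 2.0, 3.0]
analytic = softmax_jacobian(inputs)
numeric = numerical_jacobian(inputs)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(analytic, numeric)
               for a, b in zip(row_a, row_b))
print(max_diff)  # small, limited by finite-difference error
```

Each row of the matrix sums to zero up to rounding, which reflects the fact that the softmax outputs always sum to 1.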
2. The sigmoid function and its derivative
The sigmoid function has the form:
$$f(z) = \frac{1}{1+e^{-z}}$$
It follows that $1-f(z) = f(-z)$.
Differentiating $f(z)$ gives:
$$
\begin{aligned}
f'(z) &= \left(\frac{1}{1+e^{-z}}\right)' \\
&= \frac{e^{-z}}{(1+e^{-z})^{2}} \\
&= \frac{1+e^{-z}-1}{(1+e^{-z})^{2}} \\
&= \frac{1}{1+e^{-z}}\left(1-\frac{1}{1+e^{-z}}\right) \\
&= f(z)(1-f(z))
\end{aligned}
$$
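Both sigmoid results, the symmetry $1-f(z)=f(-z)$ and the derivative $f(z)(1-f(z))$, can be checked with a short sketch. The function names, the test points, and the step size `eps` are illustrative assumptions. This simple version does not guard against overflow in `math.exp` for very negative `z`.

```python
import math

def sigmoid(z):
    """f(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Closed-form derivative f(z) * (1 - f(z))."""
    f = sigmoid(z)
    return f * (1.0 - f)

eps = 1e-6
results = []
for z in (-2.0, 0.0, 1.5):
    # Central-difference estimate of f'(z).
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    # Gap between the two sides of the identity 1 - f(z) = f(-z).
    symmetry_gap = abs((1.0 - sigmoid(z)) - sigmoid(-z))
    results.append((z, sigmoid_prime(z), numeric, symmetry_gap))
    print(z, sigmoid_prime(z), numeric, symmetry_gap)
```

The derivative is largest at $z=0$, where $f(0)=0.5$ and $f'(0)=0.25$, and it shrinks toward zero as $|z|$ grows.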