$\mathbf{sigmoid\ function}$
$$\sigma(x)=\frac{1}{1+e^{-x}}$$
$$\sigma'(x)=(1-\sigma(x))\,\sigma(x)$$
Proof:
$$
\begin{aligned}
\frac{\partial\sigma(x)}{\partial x}&=\frac{e^{-x}}{(1+e^{-x})^2} \\
&=\frac{1+e^{-x}-1}{(1+e^{-x})^2} \\
&=\frac{1}{1+e^{-x}}-\frac{1}{(1+e^{-x})^2} \\
&=\left(1-\frac{1}{1+e^{-x}}\right)\left(\frac{1}{1+e^{-x}}\right) \\
&=(1-\sigma(x))\,\sigma(x)
\end{aligned}
$$
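The identity can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper names `sigmoid`, `sigmoid_grad`, and `central_difference` are chosen here for illustration, and the step size `h` is an assumed value. It compares the closed form $(1-\sigma(x))\sigma(x)$ against a central finite difference.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Closed-form derivative from the proof: (1 - sigma(x)) * sigma(x)."""
    s = sigmoid(x)
    return (1.0 - s) * s


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative (f(x+h) - f(x-h)) / (2h); h is an assumed step size."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Compare the analytic and numerical derivatives at a few sample points.
    for x in (-3.0, 0.0, 2.5):
        analytic = sigmoid_grad(x)
        numeric = central_difference(sigmoid, x)
        print(f"x={x:+.1f}  analytic={analytic:.8f}  numeric={numeric:.8f}")
```

The two columns should agree to several decimal places. At $x=0$ the closed form gives $\sigma(0)(1-\sigma(0))=0.5\cdot 0.5=0.25$, which is the maximum of the derivative.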
$\mathbf{tanh\ function}$
$$\tanh(x)=\frac{e^{2x}-1}{e^{2x}+1}$$
$$\tanh'(x)=1-\tanh^2(x)$$
Proof:
$$
\begin{aligned}
\frac{\partial \tanh(x)}{\partial x}&=\left(1-\frac{2}{e^{2x}+1}\right)' \\
&=-2\cdot\frac{-2e^{2x}}{(e^{2x}+1)^2} \\
&=\frac{4e^{2x}}{(e^{2x}+1)^2} \\
&=\frac{(e^{2x}+1)^2-(e^{2x}-1)^2}{(e^{2x}+1)^2} \\
&=1-\left(\frac{e^{2x}-1}{e^{2x}+1}\right)^2 \\
&=1-\tanh^2(x)
\end{aligned}
$$
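As a quick sanity check of this result, here is a small sketch under stated assumptions: the function names `tanh_from_exp`, `tanh_grad`, and `central_difference` are illustrative, and the step size `h` is assumed. It implements $\tanh$ from the exponential definition above, confirms it matches the standard library's `math.tanh`, and compares $1-\tanh^2(x)$ with a central finite difference.

```python
import math


def tanh_from_exp(x: float) -> float:
    """tanh via the definition used in the text: (e^{2x} - 1) / (e^{2x} + 1)."""
    e2x = math.exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)


def tanh_grad(x: float) -> float:
    """Closed-form derivative from the proof: 1 - tanh(x)^2."""
    t = tanh_from_exp(x)
    return 1.0 - t * t


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative (f(x+h) - f(x-h)) / (2h); h is an assumed step size."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.5, 1.5):
        print(
            f"x={x:+.1f}  tanh={tanh_from_exp(x):.8f}  "
            f"math.tanh={math.tanh(x):.8f}  "
            f"analytic'={tanh_grad(x):.8f}  "
            f"numeric'={central_difference(tanh_from_exp, x):.8f}"
        )
```

Note that the exponential form overflows for large positive `x` because `math.exp(2 * x)` grows without bound; this sketch is only meant for moderate inputs.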
$\mathbf{softmax\ function}$
$$\hat y_{t_i}=\mathrm{softmax}(o_{t_i})=\frac{e^{o_{t_i}}}{\sum_k e^{o_{t_k}}}$$
$$
\mathrm{softmax}'(o_{t_i})=\frac{\partial \hat y_{t_i}}{\partial o_{t_j}}=
\begin{cases}
\hat y_{t_i}(1-\hat y_{t_i}), & \text{if } i=j \\
-\hat y_{t_i}\hat y_{t_j}, & \text{if } i\neq j
\end{cases}
$$
Proof:
$$\mathrm{softmax}'(o_{t_i})=\frac{\partial \hat y_{t_i}}{\partial o_{t_j}}$$
If $i=j$:

$$
\begin{aligned}
\frac{\partial \hat y_{t_i}}{\partial o_{t_i}}&=\left(\frac{e^{o_{t_i}}}{\sum_k e^{o_{t_k}}}\right)' \\
&=\left(1-\frac{S}{e^{o_{t_i}}+S}\right)' \qquad \text{(let } S=\sum_{k\neq i}e^{o_{t_k}}\text{)} \\
&=\frac{Se^{o_{t_i}}}{(e^{o_{t_i}}+S)^2} \\
&=\frac{S}{e^{o_{t_i}}+S}\cdot\frac{e^{o_{t_i}}}{e^{o_{t_i}}+S} \\
&=\left(1-\frac{e^{o_{t_i}}}{e^{o_{t_i}}+S}\right)\frac{e^{o_{t_i}}}{e^{o_{t_i}}+S} \\
&=(1-\hat y_{t_i})\,\hat y_{t_i}
\end{aligned}
$$
Otherwise ($i\neq j$):

$$
\begin{aligned}
\frac{\partial \hat y_{t_i}}{\partial o_{t_j}}&=\left(\frac{e^{o_{t_i}}}{\sum_k e^{o_{t_k}}}\right)' \\
&=\left(\frac{e^{o_{t_i}}}{S+e^{o_{t_j}}}\right)' \qquad \text{(let } S=\sum_{k\neq j}e^{o_{t_k}}\text{)} \\
&=-\frac{e^{o_{t_i}}e^{o_{t_j}}}{(S+e^{o_{t_j}})^2} \\
&=-\frac{e^{o_{t_i}}}{S+e^{o_{t_j}}}\cdot\frac{e^{o_{t_j}}}{S+e^{o_{t_j}}} \\
&=-\hat y_{t_i}\hat y_{t_j}
\end{aligned}
$$
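Both cases can be written compactly as $\frac{\partial \hat y_{t_i}}{\partial o_{t_j}}=\hat y_{t_i}(\delta_{ij}-\hat y_{t_j})$, where $\delta_{ij}$ is 1 when $i=j$ and 0 otherwise. The sketch below builds the full Jacobian from that expression and compares it with central finite differences. The function names, the sample logits, and the step size `h` are assumptions made for illustration. Subtracting the maximum logit before exponentiating is a common stability trick that is not part of the derivation above; it leaves the softmax output unchanged.

```python
import math


def softmax(o):
    """Softmax of a list of logits: y_i = e^{o_i} / sum_k e^{o_k}."""
    m = max(o)  # assumption: shift by the max for numerical stability (result unchanged)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(o):
    """Analytic Jacobian J[i][j] = dy_i/do_j = y_i * (delta_ij - y_j)."""
    y = softmax(o)
    n = len(y)
    return [
        [y[i] * ((1.0 if i == j else 0.0) - y[j]) for j in range(n)]
        for i in range(n)
    ]


def numeric_jacobian(o, h: float = 1e-6):
    """Central-difference Jacobian; h is an assumed step size."""
    n = len(o)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(o), list(o)
        up[j] += h
        down[j] -= h
        y_up, y_down = softmax(up), softmax(down)
        for i in range(n):
            jac[i][j] = (y_up[i] - y_down[i]) / (2.0 * h)
    return jac


if __name__ == "__main__":
    logits = [1.0, 2.0, 0.5]  # hypothetical example input
    analytic = softmax_jacobian(logits)
    numeric = numeric_jacobian(logits)
    max_err = max(
        abs(analytic[i][j] - numeric[i][j])
        for i in range(len(logits))
        for j in range(len(logits))
    )
    print("max abs difference:", max_err)
```

Because the outputs $\hat y_{t_i}$ always sum to 1, each row of the Jacobian sums to 0, which is a useful extra check on an implementation. The diagonal entries are positive and the off-diagonal entries are negative, matching the two cases of the proof.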