Consider a three-layer network.
The output layer has no activation function; only the middle (hidden) layer does:
$$y = W_{out} \cdot \tanh(W_{in} \cdot x)$$
Split this into steps:
$$y = W_{out} \cdot A$$
$$A = \tanh(Z)$$
$$Z = W_{in} \cdot x$$
Here, A stands for "activation", i.e. the value obtained after applying the activation function.
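The three-step split maps directly onto code. The following is a minimal NumPy sketch rather than a definitive implementation; the function name `forward`, the layer sizes (3 inputs, 4 hidden units, 2 outputs), and the choice of `x` as a column vector are assumptions made only for illustration.

```python
import numpy as np

def forward(W_in, W_out, x):
    """Forward pass of y = W_out · tanh(W_in · x), returning the intermediates."""
    Z = W_in @ x      # hidden-layer pre-activation
    A = np.tanh(Z)    # hidden-layer activation
    y = W_out @ A     # linear output layer (no activation)
    return Z, A, y

# Hypothetical sizes, chosen only for this example.
rng = np.random.default_rng(0)
W_in = rng.standard_normal((4, 3))
W_out = rng.standard_normal((2, 4))
x = rng.standard_normal((3, 1))

Z, A, y = forward(W_in, W_out, x)
print(Z.shape, A.shape, y.shape)  # (4, 1) (4, 1) (2, 1)
```

Returning `Z` and `A` alongside `y` is a deliberate choice: the backward pass reuses both intermediates.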
The loss function:
$$L = \frac{1}{2} \|y - \bar{y}\|^2$$
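As a quick numerical illustration of this loss, here is a small sketch. It assumes that $\bar{y}$ is the target value (the text does not say so explicitly), and the function name `loss` and the sample numbers are made up for the example.

```python
import numpy as np

def loss(y, y_bar):
    """Half the squared Euclidean distance between y and y_bar."""
    diff = y - y_bar
    return 0.5 * float(np.sum(diff ** 2))

# Hypothetical prediction and target (assumption: y_bar is the target).
y = np.array([[1.0], [2.0]])
y_bar = np.array([[0.0], [4.0]])

print(loss(y, y_bar))  # 0.5 * (1 + 4) = 2.5
```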
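Hence, writing $\delta v$ for the gradient $\partial L / \partial v$ of the loss with respect to a variable $v$: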
$$\delta y = y - \bar{y}$$
$$\delta W_{out} = \delta y \cdot A^T$$
$$\delta A = W_{out}^T \cdot \delta y$$
$$\delta Z = \delta A \odot \tanh'(Z) = \delta A \odot \big(1 - \tanh^2(Z)\big)$$
$$\delta W_{in} = \delta Z \cdot x^T$$
Here $\odot$ denotes element-wise multiplication of matrices (the Hadamard product).
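One way to check the backward formulas is to compare them with finite-difference gradients. The sketch below does this in NumPy under stated assumptions: the helper names (`forward`, `loss`, `backward`, `numerical_grad`), the layer sizes, the random inputs, and the step size `eps` are all hypothetical, and $\bar{y}$ is treated as the target.

```python
import numpy as np

def forward(W_in, W_out, x):
    Z = W_in @ x
    A = np.tanh(Z)
    y = W_out @ A
    return Z, A, y

def loss(y, y_bar):
    return 0.5 * float(np.sum((y - y_bar) ** 2))

def backward(W_out, x, Z, A, y, y_bar):
    dy = y - y_bar                      # δy = y − ȳ
    dW_out = dy @ A.T                   # δW_out = δy · Aᵀ
    dA = W_out.T @ dy                   # δA = W_outᵀ · δy
    dZ = dA * (1.0 - np.tanh(Z) ** 2)   # δZ = δA ⊙ (1 − tanh²(Z))
    dW_in = dZ @ x.T                    # δW_in = δZ · xᵀ
    return dW_in, dW_out

def numerical_grad(f, W, eps=1e-5):
    """Central-difference estimate of dL/dW, perturbing one entry at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old                    # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W_in = rng.standard_normal((4, 3))
W_out = rng.standard_normal((2, 4))
x = rng.standard_normal((3, 1))
y_bar = rng.standard_normal((2, 1))

Z, A, y = forward(W_in, W_out, x)
dW_in, dW_out = backward(W_out, x, Z, A, y, y_bar)

# Loss as a function of the current (in-place perturbed) weights.
f = lambda: loss(forward(W_in, W_out, x)[2], y_bar)
num_dW_in = numerical_grad(f, W_in)
num_dW_out = numerical_grad(f, W_out)

print(np.allclose(dW_in, num_dW_in, atol=1e-6))
print(np.allclose(dW_out, num_dW_out, atol=1e-6))
```

Note that `*` is the element-wise product $\odot$, while `@` is the matrix product; with `x` and `A` as column vectors, `dy @ A.T` and `dZ @ x.T` are outer products with the same shapes as `W_out` and `W_in`.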