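Let $a^{l-1}$ be the output of layer $l-1$ and $W^l$ the weights (kernel) of layer $l$, with entries $w^l_{ij}$. The symbol $*$ here denotes what deep learning calls convolution, which mathematically is a cross-correlation: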
$$z^l = a^{l-1} * W^l$$
$$
\begin{bmatrix}
a^{l-1}_{11} & a^{l-1}_{12} & a^{l-1}_{13} \\
a^{l-1}_{21} & a^{l-1}_{22} & a^{l-1}_{23} \\
a^{l-1}_{31} & a^{l-1}_{32} & a^{l-1}_{33}
\end{bmatrix}
*
\begin{bmatrix}
w^l_{11} & w^l_{12} \\
w^l_{21} & w^l_{22}
\end{bmatrix}
=
\begin{bmatrix}
z^l_{11} & z^l_{12} \\
z^l_{21} & z^l_{22}
\end{bmatrix}
$$
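To keep the notation light, the layer superscripts are dropped from here on: $a$ belongs to layer $l-1$, while $w$ and the bias $b$ belong to layer $l$. With $stride = 1$ we have: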
$$z_{i,j} = \left(\sum_{m=1}^{2}\sum_{n=1}^{2} a_{i+m-1,\,j+n-1}\, w_{m,n}\right) + b$$
That is:
$$
\begin{aligned}
z_{11} &= a_{11}w_{11} + a_{12}w_{12} + a_{21}w_{21} + a_{22}w_{22} + b \\
z_{12} &= a_{12}w_{11} + a_{13}w_{12} + a_{22}w_{21} + a_{23}w_{22} + b \\
z_{21} &= a_{21}w_{11} + a_{22}w_{12} + a_{31}w_{21} + a_{32}w_{22} + b \\
z_{22} &= a_{22}w_{11} + a_{23}w_{12} + a_{32}w_{21} + a_{33}w_{22} + b
\end{aligned}
$$
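The forward pass can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name `cross_correlate` and the numeric values are made up for the example, and the code uses 0-based indices where the formulas use 1-based ones.

```python
def cross_correlate(a, w, b=0.0):
    """Valid cross-correlation of a 2-D list `a` with kernel `w` (stride 1), plus bias `b`."""
    k_rows, k_cols = len(w), len(w[0])
    out_rows = len(a) - k_rows + 1
    out_cols = len(a[0]) - k_cols + 1
    return [
        [
            sum(a[i + m][j + n] * w[m][n]
                for m in range(k_rows) for n in range(k_cols)) + b
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# Illustrative values only; they do not come from the original text.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
w = [[1.0, 2.0],
     [3.0, 4.0]]
b = 0.5

z = cross_correlate(a, w, b)
print(z)  # [[37.5, 47.5], [67.5, 77.5]]
```

For instance, `z[0][0]` is $a_{11}w_{11} + a_{12}w_{12} + a_{21}w_{21} + a_{22}w_{22} + b = 1 + 4 + 12 + 20 + 0.5$.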
Backpropagation
Let the sensitivity map (error term) of the current layer be:
$$
\delta =
\begin{bmatrix}
\delta_{11} & \delta_{12} \\
\delta_{21} & \delta_{22}
\end{bmatrix}
$$
Then the sensitivity map of the previous layer is:
$$\delta^{l-1} = \frac{\partial C}{\partial z^{l-1}} = \frac{\partial C}{\partial a^{l-1}} \frac{\partial a^{l-1}}{\partial z^{l-1}}$$
where
$$
\nabla a_{i,j} = \frac{\partial C}{\partial a^{l-1}_{i,j}}
= \sum_{m=1}^{2}\sum_{n=1}^{2} \frac{\partial C}{\partial z^{l}_{m,n}} \frac{\partial z^{l}_{m,n}}{\partial a^{l-1}_{i,j}}
= \sum_{m=1}^{2}\sum_{n=1}^{2} \delta^{l}_{m,n} \frac{\partial z^{l}_{m,n}}{\partial a^{l-1}_{i,j}}
$$
That is:
$$
\begin{aligned}
\nabla a_{11} &= \delta_{11}w_{11} \\
\nabla a_{12} &= \delta_{11}w_{12} + \delta_{12}w_{11} \\
\nabla a_{13} &= \delta_{12}w_{12} \\
\nabla a_{21} &= \delta_{11}w_{21} + \delta_{21}w_{11} \\
\nabla a_{22} &= \delta_{11}w_{22} + \delta_{12}w_{21} + \delta_{21}w_{12} + \delta_{22}w_{11} \\
\nabla a_{23} &= \delta_{12}w_{22} + \delta_{22}w_{12} \\
\nabla a_{31} &= \delta_{21}w_{21} \\
\nabla a_{32} &= \delta_{21}w_{22} + \delta_{22}w_{21} \\
\nabla a_{33} &= \delta_{22}w_{22}
\end{aligned}
$$
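In fact, the same result is obtained by padding the layer-$l$ sensitivity map with one ring of zeros, rotating the kernel by $180^\circ$, and cross-correlating the two, which yields $\nabla a$ as shown below: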
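$$
\nabla a =
\begin{bmatrix}
\nabla a_{11} & \nabla a_{12} & \nabla a_{13} \\
\nabla a_{21} & \nabla a_{22} & \nabla a_{23} \\
\nabla a_{31} & \nabla a_{32} & \nabla a_{33}
\end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & \delta_{11} & \delta_{12} & 0 \\
0 & \delta_{21} & \delta_{22} & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
*
\begin{bmatrix}
w_{22} & w_{21} \\
w_{12} & w_{11}
\end{bmatrix}
= \delta^{l} * \mathrm{rot180}(w^{l})
$$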
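The equivalence between the element-by-element chain rule and the "pad, rotate, cross-correlate" shortcut can be checked numerically. The sketch below is an assumption-laden illustration: the helper names (`cross_correlate`, `rot180`, `pad_zeros`, `grad_input_chain_rule`) and the values of `delta` and `w` are invented for the example.

```python
def cross_correlate(x, k):
    """Valid cross-correlation of 2-D list `x` with kernel `k`, stride 1."""
    k_rows, k_cols = len(k), len(k[0])
    return [
        [
            sum(x[i + m][j + n] * k[m][n]
                for m in range(k_rows) for n in range(k_cols))
            for j in range(len(x[0]) - k_cols + 1)
        ]
        for i in range(len(x) - k_rows + 1)
    ]


def rot180(k):
    """Rotate a kernel by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in k[::-1]]


def pad_zeros(x, p):
    """Surround `x` with `p` rings of zeros."""
    width = len(x[0]) + 2 * p
    top = [[0.0] * width for _ in range(p)]
    body = [[0.0] * p + list(row) + [0.0] * p for row in x]
    bottom = [[0.0] * width for _ in range(p)]
    return top + body + bottom


def grad_input_chain_rule(delta, w, in_rows, in_cols):
    """Add delta[m][n] * w[p][q] to the input cell (m + p, n + q) that z[m][n] reads."""
    grad = [[0.0] * in_cols for _ in range(in_rows)]
    for m in range(len(delta)):
        for n in range(len(delta[0])):
            for p in range(len(w)):
                for q in range(len(w[0])):
                    grad[m + p][n + q] += delta[m][n] * w[p][q]
    return grad


# Illustrative values only.
delta = [[1.0, 2.0], [3.0, 4.0]]
w = [[5.0, 6.0], [7.0, 8.0]]

grad_a_direct = grad_input_chain_rule(delta, w, 3, 3)
grad_a_conv = cross_correlate(pad_zeros(delta, len(w) - 1), rot180(w))

print(grad_a_conv)  # [[5.0, 16.0, 12.0], [22.0, 60.0, 40.0], [21.0, 52.0, 32.0]]
print(grad_a_conv == grad_a_direct)  # True
```

The padding width is the kernel size minus one, which is what brings the $2 \times 2$ sensitivity map back to the $3 \times 3$ input size.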
So the sensitivity map of the previous layer is:
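$$
\delta^{l-1} = \frac{\partial C}{\partial z^{l-1}}
= \nabla a \odot \frac{\partial a^{l-1}}{\partial z^{l-1}}
= \left(\delta^{l} * \mathrm{rot180}(w^{l})\right) \odot \sigma'(z^{l-1})
$$
Here $\sigma$ is the activation function of layer $l-1$, $\sigma'$ is its derivative, and $\odot$ denotes the element-wise product.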
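As a small illustration of this last step, the sketch below multiplies an input gradient element-wise by the activation derivative. It assumes a sigmoid activation, which the text does not specify, and the names `previous_delta`, `grad_a`, and `z_prev` and all values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def previous_delta(grad_a, z_prev):
    """Element-wise product of grad_a with sigma'(z_prev)."""
    return [
        [g * sigmoid_prime(z) for g, z in zip(g_row, z_row)]
        for g_row, z_row in zip(grad_a, z_prev)
    ]


# Illustrative values: a 3x3 input gradient and all-zero pre-activations,
# for which the sigmoid derivative is exactly 0.25.
grad_a = [[5.0, 16.0, 12.0],
          [22.0, 60.0, 40.0],
          [21.0, 52.0, 32.0]]
z_prev = [[0.0] * 3 for _ in range(3)]

delta_prev = previous_delta(grad_a, z_prev)
print(delta_prev[1][1])  # 15.0
```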
Gradient of the weights W
$$
\frac{\partial C}{\partial w^{l}_{i,j}} = \sum_{m=1}^{2}\sum_{n=1}^{2} \left( \frac{\partial C}{\partial z^{l}_{m,n}} \frac{\partial z^{l}_{m,n}}{\partial w^{l}_{i,j}} \right)
$$
That is:
$$
\begin{aligned}
\nabla w_{11} &= \delta_{11}a_{11} + \delta_{12}a_{12} + \delta_{21}a_{21} + \delta_{22}a_{22} \\
\nabla w_{12} &= \delta_{11}a_{12} + \delta_{12}a_{13} + \delta_{21}a_{22} + \delta_{22}a_{23} \\
\nabla w_{21} &= \delta_{11}a_{21} + \delta_{12}a_{22} + \delta_{21}a_{31} + \delta_{22}a_{32} \\
\nabla w_{22} &= \delta_{11}a_{22} + \delta_{12}a_{23} + \delta_{21}a_{32} + \delta_{22}a_{33}
\end{aligned}
$$
This is equivalent to:
$$
\nabla w =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
*
\begin{bmatrix}
\delta_{11} & \delta_{12} \\
\delta_{21} & \delta_{22}
\end{bmatrix}
= a^{l-1} * \delta^{l}
$$
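One way to sanity-check the weight gradient is to compare the cross-correlation of the input with the sensitivity map against a finite-difference estimate. The sketch below assumes a surrogate loss $C = \sum_{m,n} \delta_{m,n} z_{m,n}$, chosen only so that $\partial C / \partial z$ equals the given `delta`; the helper names and all values are illustrative, not from the original text.

```python
def cross_correlate(x, k):
    """Valid cross-correlation of 2-D list `x` with kernel `k`, stride 1."""
    k_rows, k_cols = len(k), len(k[0])
    return [
        [
            sum(x[i + m][j + n] * k[m][n]
                for m in range(k_rows) for n in range(k_cols))
            for j in range(len(x[0]) - k_cols + 1)
        ]
        for i in range(len(x) - k_rows + 1)
    ]


def loss(a, w, delta):
    """Surrogate loss C = sum(delta * z), so that dC/dz is exactly `delta`."""
    z = cross_correlate(a, w)
    return sum(d * v for d_row, z_row in zip(delta, z)
               for d, v in zip(d_row, z_row))


# Illustrative values only.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
w = [[0.5, -1.0],
     [2.0, 0.0]]
delta = [[1.0, 2.0],
         [3.0, 4.0]]

# Analytic gradient: cross-correlate the input with the sensitivity map.
grad_w = cross_correlate(a, delta)

# Central finite differences, one kernel entry at a time.
eps = 1e-4
grad_w_numeric = [[0.0, 0.0], [0.0, 0.0]]
for i in range(2):
    for j in range(2):
        w_plus = [row[:] for row in w]
        w_minus = [row[:] for row in w]
        w_plus[i][j] += eps
        w_minus[i][j] -= eps
        grad_w_numeric[i][j] = (loss(a, w_plus, delta) - loss(a, w_minus, delta)) / (2 * eps)

print(grad_w)  # [[37.0, 47.0], [67.0, 77.0]]
```

The entries of `grad_w_numeric` should agree with `grad_w` up to floating-point rounding, since the surrogate loss is linear in the weights.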
Gradient of the bias b
$$
\frac{\partial C}{\partial b^{l}}
= \sum_{m=1}^{2}\sum_{n=1}^{2} \left( \frac{\partial C}{\partial z^{l}_{m,n}} \frac{\partial z^{l}_{m,n}}{\partial b^{l}} \right)
= \sum_{m=1}^{2}\sum_{n=1}^{2} \frac{\partial C}{\partial z^{l}_{m,n}}
= \sum_{m=1}^{2}\sum_{n=1}^{2} \delta^{l}_{m,n}
$$
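Because the bias is added once to every output $z_{m,n}$, each $\partial z_{m,n} / \partial b$ equals 1 and the gradient reduces to the sum of the sensitivity map. A minimal sketch, with a hypothetical helper name `grad_bias` and illustrative values:

```python
def grad_bias(delta):
    """dC/db: every entry of the sensitivity map contributes exactly once."""
    return sum(sum(row) for row in delta)


# Illustrative values only.
delta = [[1.0, 2.0],
         [3.0, 4.0]]

print(grad_bias(delta))  # 10.0
```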