Define the activation function $\text{sigmoid}\left( \right)$:

$$
g\left( z \right) =\text{sigmoid}\left( z \right) =\left( 1+e^{-z} \right) ^{-1}
$$
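As a minimal sketch of this definition, the activation can be written with Python's standard library alone. The function name `sigmoid` and the two-branch form are illustrative choices, not part of the original text; the branch for negative `z` is algebraically identical to $(1+e^{-z})^{-1}$ and is used only to keep `math.exp` from overflowing on large negative inputs.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = (1 + e^{-z})^{-1}."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For z < 0, use the equivalent form e^z / (1 + e^z) to avoid overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

The output always lies between 0 and 1, and `sigmoid(0.0)` is exactly `0.5`.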
Layer 2 in the figure is the hidden layer. Its neurons are computed as follows:

$$
\begin{aligned}
a_1^{\left( 2 \right)}&=g\left( \boldsymbol{\Theta }_{10}^{\left( 1 \right)}x_0+\boldsymbol{\Theta }_{11}^{\left( 1 \right)}x_1+\boldsymbol{\Theta }_{12}^{\left( 1 \right)}x_2+\boldsymbol{\Theta }_{13}^{\left( 1 \right)}x_3 \right) \\
a_2^{\left( 2 \right)}&=g\left( \boldsymbol{\Theta }_{20}^{\left( 1 \right)}x_0+\boldsymbol{\Theta }_{21}^{\left( 1 \right)}x_1+\boldsymbol{\Theta }_{22}^{\left( 1 \right)}x_2+\boldsymbol{\Theta }_{23}^{\left( 1 \right)}x_3 \right) \\
a_3^{\left( 2 \right)}&=g\left( \boldsymbol{\Theta }_{30}^{\left( 1 \right)}x_0+\boldsymbol{\Theta }_{31}^{\left( 1 \right)}x_1+\boldsymbol{\Theta }_{32}^{\left( 1 \right)}x_2+\boldsymbol{\Theta }_{33}^{\left( 1 \right)}x_3 \right)
\end{aligned}
$$
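To make one of these equations concrete, here is a small sketch of a single hidden neuron in plain Python. The helper name `neuron` and all numeric values are made up for illustration; the only thing taken from the text is the structure, a weighted sum over $x_0,\dots,x_3$ passed through $g$.

```python
import math


def sigmoid(z):
    """g(z) = (1 + e^{-z})^{-1}; adequate for the small values used here."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(theta_row, x):
    """Compute a_i = g(theta_i0*x0 + theta_i1*x1 + theta_i2*x2 + theta_i3*x3)."""
    z = sum(t * xi for t, xi in zip(theta_row, x))
    return sigmoid(z)


# x0 = 1 is the bias input; the remaining entries are hypothetical features.
x = [1.0, 0.5, -1.0, 2.0]
# Hypothetical first row of Theta^(1): [Theta_10, Theta_11, Theta_12, Theta_13].
theta_row_1 = [0.1, 0.2, 0.3, 0.1]

a1 = neuron(theta_row_1, x)  # g(0.1 + 0.1 - 0.3 + 0.2) = g(0.1)
```

Because $x_0$ is always 1, the weight `theta_row_1[0]` acts as the neuron's bias: it shifts the weighted sum regardless of the input features.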
Stack the layer-2 neurons into a vector $\boldsymbol{a}^{\left( 2 \right)}$:

$$
\boldsymbol{a}^{\left( 2 \right)}=\left[ \begin{array}{c} a_0^{\left( 2 \right)}=1\\ a_1^{\left( 2 \right)}\\ a_2^{\left( 2 \right)}\\ a_3^{\left( 2 \right)}\\ \end{array} \right]
$$
Stack the input features into a vector $\boldsymbol{x}$. Note the extra default entry $x_0=1$:

$$
\boldsymbol{x}=\left[ \begin{array}{c} x_0=1\\ x_1\\ x_2\\ x_3\\ \end{array} \right]
$$
Collect the weights that follow layer $j$ into a matrix $\boldsymbol{\Theta }^{\left( j \right)}$. Its dimensions are (number of units in layer $j+1$) $\times$ (number of units in layer $j$ + 1), where the unit counts exclude the bias unit. For example, $\boldsymbol{\Theta }^{\left( 1 \right)}$ below is $3\times4$:

$$
\boldsymbol{\Theta }^{\left( 1 \right)}=\left[ \begin{matrix} \boldsymbol{\Theta }_{10}^{\left( 1 \right)}& \boldsymbol{\Theta }_{11}^{\left( 1 \right)}& \boldsymbol{\Theta }_{12}^{\left( 1 \right)}& \boldsymbol{\Theta }_{13}^{\left( 1 \right)}\\ \boldsymbol{\Theta }_{20}^{\left( 1 \right)}& \boldsymbol{\Theta }_{21}^{\left( 1 \right)}& \boldsymbol{\Theta }_{22}^{\left( 1 \right)}& \boldsymbol{\Theta }_{23}^{\left( 1 \right)}\\ \boldsymbol{\Theta }_{30}^{\left( 1 \right)}& \boldsymbol{\Theta }_{31}^{\left( 1 \right)}& \boldsymbol{\Theta }_{32}^{\left( 1 \right)}& \boldsymbol{\Theta }_{33}^{\left( 1 \right)}\\ \end{matrix} \right]
$$
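The dimension rule can be checked with a tiny helper. This is only a sketch: `theta_shape` and its parameter names are hypothetical, and both unit counts are assumed to exclude the bias unit, as in the rule above.

```python
def theta_shape(units_next, units_current):
    """Return (rows, columns) of the weight matrix between two layers.

    rows    = number of units in layer j+1
    columns = number of units in layer j, plus 1 for the bias input
    """
    return (units_next, units_current + 1)


# Layer 1 has 3 input features and layer 2 has 3 hidden units,
# so Theta^(1) is 3 x 4.
shape_theta_1 = theta_shape(units_next=3, units_current=3)
```

The extra column exists because every row of the matrix also needs a weight for the constant input $x_0=1$.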
Continuing in the same way gives the following formulas, where the 1 is the bias term that is always present:

$$
\boldsymbol{a}^{\left( 2 \right)}=\left[ \begin{array}{c} 1\\ g\left( \boldsymbol{\Theta }^{\left( 1 \right)}\boldsymbol{x} \right)\\ \end{array} \right]
$$

$$
\boldsymbol{a}^{\left( 3 \right)}=\left[ \begin{array}{c} 1\\ g\left( \boldsymbol{\Theta }^{\left( 2 \right)}\boldsymbol{a}^{\left( 2 \right)} \right)\\ \end{array} \right]
$$

$$
\boldsymbol{y}=g\left( \boldsymbol{\Theta }^{\left( 3 \right)}\boldsymbol{a}^{\left( 3 \right)} \right)
$$
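The three formulas share one pattern: prepend a 1, multiply by the weight matrix, apply $g$. A minimal pure-Python sketch of that loop follows. The function names (`add_bias`, `layer`, `forward`) are illustrative, and the sizes of layer 3 and of the output in the usage lines are assumptions, since only the $3\times4$ shape of $\boldsymbol{\Theta}^{(1)}$ is given in the text.

```python
import math


def sigmoid(z):
    """g(z) = (1 + e^{-z})^{-1}, written to avoid overflow for negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def add_bias(a):
    """Prepend the constant 1 used as the bias input of the next layer."""
    return [1.0] + list(a)


def layer(theta, a_with_bias):
    """Compute g(Theta @ a) where theta is a list of rows."""
    for row in theta:
        if len(row) != len(a_with_bias):
            raise ValueError("each row of theta needs one weight per input, bias included")
    return [sigmoid(sum(w * v for w, v in zip(row, a_with_bias))) for row in theta]


def forward(thetas, features):
    """Apply a = g(Theta^(j) [1; a]) for each weight matrix in order."""
    a = list(features)  # x1..x3, without x0
    for theta in thetas:
        a = layer(theta, add_bias(a))
    return a  # the last activation is the output y


# Hypothetical network: 3 inputs -> 3 hidden -> 2 hidden -> 1 output.
# All-zero weights are used only so the result is easy to reason about.
thetas = [
    [[0.0] * 4 for _ in range(3)],  # Theta^(1): 3 x 4
    [[0.0] * 4 for _ in range(2)],  # Theta^(2): 2 x 4 (assumed size)
    [[0.0] * 3],                    # Theta^(3): 1 x 3 (assumed size)
]
y = forward(thetas, [0.2, -0.4, 1.5])
```

With all weights at zero, every weighted sum is 0, so every activation, including `y`, equals $g(0)=0.5$. The bias 1 is never stored in the returned activations; it is attached only at the moment a layer is fed into the next one.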
Note that the implicit 1 is added to the current layer only when the next layer is being computed. In other words, $\boldsymbol{a}^{\left( 2 \right)}$ has two meanings. The first is the vector shown in the figure:

$$
\boldsymbol{a}^{\left( 2 \right)}=\left[ \begin{array}{c} a_1^{\left( 2 \right)}\\ a_2^{\left( 2 \right)}\\ a_3^{\left( 2 \right)}\\ \end{array} \right] =g\left( \boldsymbol{\Theta }^{\left( 1 \right)}\boldsymbol{x} \right)
$$
The second is the same vector with the implicit bias 1 prepended, which is the form used to compute the neurons of the next layer:

$$
\boldsymbol{a}^{\left( 2 \right)}=\left[ \begin{array}{c} a_0^{\left( 2 \right)}=1\\ a_1^{\left( 2 \right)}\\ a_2^{\left( 2 \right)}\\ a_3^{\left( 2 \right)}\\ \end{array} \right] =\left[ \begin{array}{c} 1\\ g\left( \boldsymbol{\Theta }^{\left( 1 \right)}\boldsymbol{x} \right)\\ \end{array} \right]
$$