8 Neural Networks: Representation
8-1 Non-linear hypotheses
Non-linear classification: the labels y are still 0 or 1, but the decision boundary is non-linear. With many input features, adding all the polynomial terms to logistic regression becomes computationally infeasible, which motivates neural networks.
8-2 Neurons and the brain
Neural Networks
Origins: Algorithms that try to mimic the brain.
Was very widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: State-of-the-art technique for many applications
The “one learning algorithm” hypothesis
Neuro-rewiring experiments
Sensor representations in the brain
8-3 Model representation I
Neuron in the brain
Neuron model: logistic unit
sigmoid (logistic) activation function.
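A minimal NumPy sketch of this single logistic unit; the weights here are made up purely for illustration, and the bias input $x_0 = 1$ is added inside the function:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) activation function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, theta):
    """A single neuron: prepend the bias unit x0 = 1, then apply g(theta^T x)."""
    x_with_bias = np.concatenate(([1.0], x))
    return sigmoid(theta @ x_with_bias)

# Example: 3 input features, so theta has 4 entries (bias weight first).
theta = np.array([-1.0, 0.5, 0.5, 0.5])   # hypothetical weights
x = np.array([1.0, 2.0, 3.0])
print(logistic_unit(x, theta))            # a value in (0, 1)
```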
Neural Network
input layer
hidden layer
output layer
$a_i^{(j)}$ = “activation” of unit $i$ in layer $j$
$\theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
$$
\begin{aligned}
a_1^{(2)} &= g(\theta_{10}^{(1)}x_0+\theta_{11}^{(1)}x_1+\theta_{12}^{(1)}x_2+\theta_{13}^{(1)}x_3)\\
a_2^{(2)} &= g(\theta_{20}^{(1)}x_0+\theta_{21}^{(1)}x_1+\theta_{22}^{(1)}x_2+\theta_{23}^{(1)}x_3)\\
a_3^{(2)} &= g(\theta_{30}^{(1)}x_0+\theta_{31}^{(1)}x_1+\theta_{32}^{(1)}x_2+\theta_{33}^{(1)}x_3)\\
h_\theta(x) &= a_1^{(3)} = g(\theta_{10}^{(2)}a_0^{(2)}+\theta_{11}^{(2)}a_1^{(2)}+\theta_{12}^{(2)}a_2^{(2)}+\theta_{13}^{(2)}a_3^{(2)})
\end{aligned}
$$
If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\theta^{(j)}$ will be of dimension $s_{j+1}\times(s_j+1)$.
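A quick sanity check on this dimension rule, as a sketch using the 3-input, 3-hidden-unit, 1-output network from the equations above:

```python
import numpy as np

# Layer sizes for the network above: s_1 = 3, s_2 = 3, s_3 = 1.
s = [3, 3, 1]

# theta^{(j)} maps layer j to layer j+1 and has shape s_{j+1} x (s_j + 1);
# the extra column multiplies the bias unit of layer j.
Theta1 = np.zeros((s[1], s[0] + 1))   # 3 x 4
Theta2 = np.zeros((s[2], s[1] + 1))   # 1 x 4
print(Theta1.shape, Theta2.shape)     # (3, 4) (1, 4)
```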
8-4 Model representation II
Forward propagation: Vectorized implementation
$$
x =\begin{bmatrix}x_0 \\ x_1 \\ x_2 \\ x_3\end{bmatrix},\qquad
z^{(2)} = \begin{bmatrix}z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)}\end{bmatrix}
$$

$z^{(2)} = \theta^{(1)}x$
$a^{(2)} = g(z^{(2)})$
Add $a_0^{(2)} = 1$
$z^{(3)} = \theta^{(2)}a^{(2)}$
$h_\theta(x) = a^{(3)} = g(z^{(3)})$
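A minimal NumPy sketch of these vectorized steps, assuming the 3-3-1 network above and using random weights just so the function runs end to end:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Vectorized forward propagation: z^(2) = Theta^(1) x, a^(2) = g(z^(2)), ..."""
    a1 = np.concatenate(([1.0], x))             # add bias unit x0 = 1
    z2 = Theta1 @ a1                            # z^(2) = Theta^(1) x
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # a^(2), with bias a_0^(2) = 1
    z3 = Theta2 @ a2                            # z^(3) = Theta^(2) a^(2)
    return sigmoid(z3)                          # h_theta(x) = a^(3)

# Hypothetical weights, only for illustration.
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))
Theta2 = rng.normal(size=(1, 4))
x = np.array([1.0, 0.0, 1.0])
print(forward(x, Theta1, Theta2))
```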
Neural Network learning its own features
Other network architectures
8-5 Examples and intuitions I
XOR/XNOR
AND
OR
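A small sketch of single sigmoid units computing AND and OR, using the weight choices from the lecture; the large weights saturate the sigmoid to roughly 0 or 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(theta, x1, x2):
    """Single sigmoid unit over inputs (1, x1, x2)."""
    return sigmoid(theta @ np.array([1.0, x1, x2]))

THETA_AND = np.array([-30.0, 20.0, 20.0])   # ~1 only when x1 = x2 = 1
THETA_OR  = np.array([-10.0, 20.0, 20.0])   # ~1 when x1 = 1 or x2 = 1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(unit(THETA_AND, x1, x2)), round(unit(THETA_OR, x1, x2)))
# prints the truth tables of AND and OR
```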
8-6 Examples and intuitions II
Negation
Putting it together:
AND, (NOT $x_1$) AND (NOT $x_2$), OR $\longrightarrow$ $x_1$ XNOR $x_2$
(Similar to the combination of multiple logic circuits)
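Putting the AND, (NOT $x_1$) AND (NOT $x_2$), and OR units together gives the XNOR network from the lecture; here is a runnable sketch of that combination:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xnor_network(x1, x2):
    """Two-layer network computing x1 XNOR x2 from the logic units above."""
    x = np.array([1.0, x1, x2])                           # input with bias
    a1 = sigmoid(np.array([-30.0,  20.0,  20.0]) @ x)     # x1 AND x2
    a2 = sigmoid(np.array([ 10.0, -20.0, -20.0]) @ x)     # (NOT x1) AND (NOT x2)
    a = np.array([1.0, a1, a2])                           # hidden layer with bias
    return sigmoid(np.array([-10.0, 20.0, 20.0]) @ a)     # a1 OR a2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xnor_network(x1, x2)))            # 1, 0, 0, 1
```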
Handwritten digit classification
8-7 Multi-class classification
Multiple output units: One-vs-all
Want $h_\theta(x)\approx \begin{bmatrix}1\\0\\0\\0\end{bmatrix}$, $h_\theta(x)\approx \begin{bmatrix}0\\1\\0\\0\end{bmatrix}$, $h_\theta(x)\approx \begin{bmatrix}0\\0\\1\\0\end{bmatrix}$, etc.

Training set: $(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\cdots,(x^{(m)},y^{(m)})$

$y^{(i)}$ is one of $\begin{bmatrix}1\\0\\0\\0\end{bmatrix}$, $\begin{bmatrix}0\\1\\0\\0\end{bmatrix}$, $\begin{bmatrix}0\\0\\1\\0\end{bmatrix}$, $\begin{bmatrix}0\\0\\0\\1\end{bmatrix}$
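A small sketch of recoding integer class labels as these one-hot target vectors, assuming the classes are numbered 1 through K:

```python
import numpy as np

def to_one_hot(y, num_classes):
    """Recode integer labels (1..K) as the unit vectors shown above."""
    Y = np.zeros((len(y), num_classes))
    Y[np.arange(len(y)), np.asarray(y) - 1] = 1.0
    return Y

# Hypothetical labels for a 4-class problem (e.g. pedestrian, car, motorcycle, truck).
y = [1, 3, 4, 2]
print(to_one_hot(y, 4))
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]]
```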