8 Neural Networks: Representation
8-1 Non-linear hypotheses
Non-linear classification: the labels y are still 0 or 1, but the decision boundary is non-linear. With many input features, adding all the polynomial terms to logistic regression becomes computationally infeasible, which motivates neural networks.
8-2 Neurons and the brain
Neural Networks
Origins: Algorithms that try to mimic the brain.
Was very widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: State-of-the-art technique for many applications
The “one learning algorithm” hypothesis
Neuro-rewiring experiments
Sensor representations in the brain
8-3 Model representation I
Neuron in the brain
Neuron model: logistic unit
sigmoid (logistic) activation function.
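A minimal NumPy sketch of this single logistic unit; the weights here are made up purely for illustration, and the bias input $x_0 = 1$ is added inside the function:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) activation function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, theta):
    """A single neuron: prepend the bias unit x0 = 1, then apply g(theta^T x)."""
    x_with_bias = np.concatenate(([1.0], x))
    return sigmoid(theta @ x_with_bias)

# Example: 3 input features, so theta has 4 entries (bias weight first).
theta = np.array([-1.0, 0.5, 0.5, 0.5])   # hypothetical weights
x = np.array([1.0, 2.0, 3.0])
print(logistic_unit(x, theta))            # a value in (0, 1)
```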
Neural Network
input layer
hidden layer
output layer
$a_i^{(j)}$ = “activation” of unit $i$ in layer $j$
$\theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
$$
\begin{aligned}
a_1^{(2)} &= g(\theta_{10}^{(1)}x_0+\theta_{11}^{(1)}x_1+\theta_{12}^{(1)}x_2+\theta_{13}^{(1)}x_3)\\
a_2^{(2)} &= g(\theta_{20}^{(1)}x_0+\theta_{21}^{(1)}x_1+\theta_{22}^{(1)}x_2+\theta_{23}^{(1)}x_3)\\
a_3^{(2)} &= g(\theta_{30}^{(1)}x_0+\theta_{31}^{(1)}x_1+\theta_{32}^{(1)}x_2+\theta_{33}^{(1)}x_3)\\
h_\theta(x) &= a_1^{(3)} = g(\theta_{10}^{(2)}a_0^{(2)}+\theta_{11}^{(2)}a_1^{(2)}+\theta_{12}^{(2)}a_2^{(2)}+\theta_{13}^{(2)}a_3^{(2)})
\end{aligned}
$$
If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\theta^{(j)}$ will be of dimension $s_{j+1}\times(s_j+1)$.
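A quick sanity check on this dimension rule, as a sketch using the 3-input, 3-hidden-unit, 1-output network from the equations above:

```python
import numpy as np

# Layer sizes for the network above: s_1 = 3, s_2 = 3, s_3 = 1.
s = [3, 3, 1]

# theta^{(j)} maps layer j to layer j+1 and has shape s_{j+1} x (s_j + 1);
# the extra column multiplies the bias unit of layer j.
Theta1 = np.zeros((s[1], s[0] + 1))   # 3 x 4
Theta2 = np.zeros((s[2], s[1] + 1))   # 1 x 4
print(Theta1.shape, Theta2.shape)     # (3, 4) (1, 4)
```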
8-4 Model representation II
Forward propagation: Vectorized implementation
$$
x =\begin{bmatrix}x_0 \\ x_1 \\ x_2 \\ x_3\end{bmatrix},\qquad
z^{(2)} = \begin{bmatrix}z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)}\end{bmatrix}
$$

$z^{(2)} = \theta^{(1)}x$
$a^{(2)} = g(z^{(2)})$
Add $a_0^{(2)} = 1$
$z^{(3)} = \theta^{(2)}a^{(2)}$
$h_\theta(x) = a^{(3)} = g(z^{(3)})$
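A minimal NumPy sketch of these vectorized steps, assuming the 3-3-1 network above and using random weights just so the function runs end to end:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Vectorized forward propagation: z^(2) = Theta^(1) x, a^(2) = g(z^(2)), ..."""
    a1 = np.concatenate(([1.0], x))             # add bias unit x0 = 1
    z2 = Theta1 @ a1                            # z^(2) = Theta^(1) x
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # a^(2), with bias a_0^(2) = 1
    z3 = Theta2 @ a2                            # z^(3) = Theta^(2) a^(2)
    return sigmoid(z3)                          # h_theta(x) = a^(3)

# Hypothetical weights, only for illustration.
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))
Theta2 = rng.normal(size=(1, 4))
x = np.array([1.0, 0.0, 1.0])
print(forward(x, Theta1, Theta2))
```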
Neural Network learning its own features
Other network architectures
8-5 Examples and intuitions I
XOR/XNOR
AND
OR
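A small sketch of single sigmoid units computing AND and OR, using the weight choices from the lecture; the large weights saturate the sigmoid to roughly 0 or 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(theta, x1, x2):
    """Single sigmoid unit over inputs (1, x1, x2)."""
    return sigmoid(theta @ np.array([1.0, x1, x2]))

THETA_AND = np.array([-30.0, 20.0, 20.0])   # ~1 only when x1 = x2 = 1
THETA_OR  = np.array([-10.0, 20.0, 20.0])   # ~1 when x1 = 1 or x2 = 1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(unit(THETA_AND, x1, x2)), round(unit(THETA_OR, x1, x2)))
# prints the truth tables of AND and OR
```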
8-6 Examples and intuitions II
Negation
Putting it together:
AND, (NOT $x_1$) AND (NOT $x_2$), OR $\longrightarrow$ $x_1$ XNOR $x_2$
(Similar to the combination of multiple logic circuits)
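Putting the AND, (NOT $x_1$) AND (NOT $x_2$), and OR units together gives the XNOR network from the lecture; here is a runnable sketch of that combination:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xnor_network(x1, x2):
    """Two-layer network computing x1 XNOR x2 from the logic units above."""
    x = np.array([1.0, x1, x2])                           # input with bias
    a1 = sigmoid(np.array([-30.0,  20.0,  20.0]) @ x)     # x1 AND x2
    a2 = sigmoid(np.array([ 10.0, -20.0, -20.0]) @ x)     # (NOT x1) AND (NOT x2)
    a = np.array([1.0, a1, a2])                           # hidden layer with bias
    return sigmoid(np.array([-10.0, 20.0, 20.0]) @ a)     # a1 OR a2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xnor_network(x1, x2)))            # 1, 0, 0, 1
```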
Handwritten digit classification
8-7 Multi-class classification
Multiple output units: One-vs-all
Want $h_\theta(x)\approx \begin{bmatrix}1\\0\\0\\0\end{bmatrix}$, $h_\theta(x)\approx \begin{bmatrix}0\\1\\0\\0\end{bmatrix}$, $h_\theta(x)\approx \begin{bmatrix}0\\0\\1\\0\end{bmatrix}$, etc.

Training set: $(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\cdots,(x^{(m)},y^{(m)})$

$y^{(i)}$ is one of $\begin{bmatrix}1\\0\\0\\0\end{bmatrix}$, $\begin{bmatrix}0\\1\\0\\0\end{bmatrix}$, $\begin{bmatrix}0\\0\\1\\0\end{bmatrix}$, $\begin{bmatrix}0\\0\\0\\1\end{bmatrix}$
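A small sketch of recoding integer class labels as these one-hot target vectors, assuming the classes are numbered 1 through K:

```python
import numpy as np

def to_one_hot(y, num_classes):
    """Recode integer labels (1..K) as the unit vectors shown above."""
    Y = np.zeros((len(y), num_classes))
    Y[np.arange(len(y)), np.asarray(y) - 1] = 1.0
    return Y

# Hypothetical labels for a 4-class problem (e.g. pedestrian, car, motorcycle, truck).
y = [1, 3, 4, 2]
print(to_one_hot(y, 4))
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]]
```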