[Introduction to Deep Learning] NNDL Study Notes (Part 1)

Preface

http://neuralnetworksanddeeplearning.com

This post contains my study notes for the e-book above. The notes are now mostly complete; I will fill in Softmax and some of the exercises when I find time.

Chapter 1: Using Neural Nets to Recognize Handwritten Digits

A neural network uses the examples to automatically infer rules for recognizing handwritten digits.

Two important types of artificial neuron: the perceptron and the sigmoid neuron.

The standard learning algorithm for neural networks: stochastic gradient descent.

Perceptrons

1. A method for weighing evidence to make decisions, and for computing elementary logical functions.

A perceptron takes several binary inputs, x_1, x_2, \dots, and produces a single binary output:

The neuron's output, 0 or 1, is determined by whether the weighted sum w\cdot x\equiv \sum_j w_jx_j is less than or greater than some threshold value. The threshold is a real number which is a parameter of the neuron.

 

output=\begin{cases} 0 & \text{if } w\cdot x+b\leq 0\\ 1 & \text{if } w\cdot x+b>0 \end{cases}

Perceptrons are also universal for computation.
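For example, the book shows that a perceptron with weights -2, -2 and bias 3 computes NAND, and NAND gates alone are universal for computation. A minimal sketch (the function name is mine):

```python
# A perceptron with weights (-2, -2) and bias 3 computes NAND, the example
# used in the book; NAND alone is universal for computation.
def perceptron(x, w, b):
    # Binary output: 1 if w . x + b > 0, else 0.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w=(-2, -2), b=3))  # outputs 1, 1, 1, 0
```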

Sigmoid Neurons

1. Crucial fact to learn: a small change in a weight (or bias) causes only a small change in the output.

activation function f(w\cdot x+b); for the sigmoid neuron, f is the sigmoid function \sigma:

output=\frac{1}{1+exp(-\sum_jw_j x_j-b)}=\frac{1}{1+exp(-w\cdot x-b)}

\Delta output is approximately a linear function of the changes \Delta w_j and \Delta b.
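A minimal NumPy sketch of a sigmoid neuron's output (the weights, bias, and input values here are arbitrary illustrations, not from the book):

```python
import numpy as np

def sigmoid(z):
    # The sigmoid function; np.exp applies elementwise, so z may be a vector.
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.6, -0.4])  # illustrative weights
b = 0.9                    # illustrative bias
x = np.array([1.0, 0.0])   # inputs no longer need to be binary
print(sigmoid(np.dot(w, x) + b))  # a smooth value between 0 and 1 (~0.82 here)
```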

Exercises

1. Suppose we take all the weights and biases in a network of perceptrons and multiply them by a positive constant c > 0. The behavior of the network doesn't change, because multiplying by c > 0 never changes the sign of w\cdot x+b, and a perceptron's output depends only on that sign.

2. The simulation can fail when w\cdot x+b=0 for some input: then \sigma(c(w\cdot x+b))=\sigma(0)=\frac{1}{2} no matter how large c is, whereas the perceptron outputs 0.
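A quick numeric check of both cases (my own illustration, not from the book): as c grows, \sigma(cz) approaches the perceptron's step behavior whenever z = w\cdot x+b \neq 0, but stays at 1/2 when z = 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [0.1, -0.1, 0.0]:  # z stands for w . x + b
    print(z, [sigmoid(c * z) for c in (1, 10, 1000)])
# z = 0.1  -> approaches 1 as c grows (matches the perceptron)
# z = -0.1 -> approaches 0 as c grows (matches the perceptron)
# z = 0.0  -> stays at 0.5 for every c (the perceptron would output 0)
```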

The Architecture of a NN

1. MLPs = multilayer perceptrons

2. Feedforward NN vs. recurrent NN (recurrent networks allow feedback loops, but a neuron's output only affects its input at some later time).

A Simple Network to Classify Handwritten Digits

1. Learning with gradient descent

What we'd like is an algorithm which lets us find weights and biases so that the output from the network approximates y(x) for all training inputs x. To quantify how well we're achieving this goal we define a cost function (sometimes referred to as a loss or objective function).

Quadratic cost function / mean squared error (MSE): C(w,b)\equiv \frac{1}{2n} \sum_x \|y(x)-a\|^2, where n is the number of training inputs and a is the network's output for input x.
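A direct NumPy translation of this cost (a sketch; the helper name and toy values are mine):

```python
import numpy as np

def mse_cost(outputs, targets):
    # C = (1/2n) * sum over x of ||y(x) - a||^2, with n training inputs.
    n = len(outputs)
    return sum(np.linalg.norm(y - a) ** 2
               for a, y in zip(outputs, targets)) / (2.0 * n)

outputs = [np.array([0.8, 0.1]), np.array([0.3, 0.9])]  # network outputs a
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # desired outputs y(x)
print(mse_cost(outputs, targets))  # 0.0375
```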

 

Suppose in particular that C is a function of m variables, v_1,\dots,v_m. Then \Delta C \approx \nabla C \cdot \Delta v.

\Delta v = -\eta\nabla C, \qquad v' = v - \eta\nabla C
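A minimal sketch of repeatedly applying the update rule v' = v - \eta\nabla C, on a toy cost C(v)=\|v\|^2 of my own choosing:

```python
import numpy as np

def grad_C(v):
    # Gradient of the toy cost C(v) = ||v||^2, minimized at v = 0.
    return 2 * v

v = np.array([3.0, -4.0])  # arbitrary starting point
eta = 0.1                  # learning rate
for _ in range(100):
    v = v - eta * grad_C(v)  # the update v' = v - eta * grad C
print(v)  # very close to the minimum at (0, 0)
```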

One problem: we need to compute the gradients \nabla C_x separately for each training input x, which is slow when the number of inputs is large.

Solution: stochastic gradient descent.

Estimate the gradient by averaging \nabla C_x over a mini-batch of m randomly chosen training inputs; this is a commonly used and powerful technique:

 \frac{\sum_{j=1}^{m} \nabla C_{X_j}}{m} \approx \frac{\sum_{x} \nabla C_x}{n} = \nabla C
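The following sketch (my own toy least-squares example, not the book's) checks that averaging \nabla C_x over a random mini-batch approximates the full-data gradient:

```python
import numpy as np

np.random.seed(0)
X, t = np.random.randn(1000, 2), np.random.randn(1000)  # toy inputs/targets
w = np.zeros(2)

def grad_Cx(w, x, tx):
    # Gradient (w.r.t. w) of the per-example cost C_x = 0.5 * (w . x - tx)^2.
    return (np.dot(w, x) - tx) * x

full_grad = np.mean([grad_Cx(w, x, tx) for x, tx in zip(X, t)], axis=0)
idx = np.random.choice(len(X), size=10, replace=False)  # one mini-batch, m = 10
mini_grad = np.mean([grad_Cx(w, X[j], t[j]) for j in idx], axis=0)
print(full_grad, mini_grad)  # the mini-batch mean roughly tracks the full mean
```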

2. Ball-mimicking variations

They have advantages, but also a major disadvantage: it turns out to be necessary to compute second partial derivatives of C, and this can be quite costly.

Exercises

An extreme version of gradient descent is to use a mini-batch size of just 1. This procedure is known as online, on-line, or incremental learning. In online learning, a neural network learns from just one training input at a time (just as human beings do).

One advantage: learning is faster, since each update requires only a single gradient computation.

One disadvantage: a single training input is not sufficient to represent the whole data set, so the gradient estimate is noisy, and learning depends heavily on the order in which inputs are presented.

Implementing the network to classify digits

Implemented with Python 2.7 and NumPy.

1. Network class

If w is the matrix of weights connecting the second and third layers, and a is the vector of activations of the second layer, then the vector of activations of the third layer is a'=\sigma(wa+b).

Vectorizing: apply the function \sigma elementwise to every entry in the vector wa+b.
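This is essentially how the book's network.py initializes the weights and computes a forward pass (condensed here; the full class also implements SGD and backpropagation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Network(object):
    def __init__(self, sizes):
        # e.g. sizes = [784, 30, 10]: input, hidden, and output layer sizes.
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        # Apply a' = sigmoid(w a + b) layer by layer and return the output.
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a
```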

2. hyper-parameters

The number of epochs of training, the mini-batch size, and the learning rate η.
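These three hyper-parameters are the arguments of the book's Network.SGD method; chapter 1 trains with a call like the following (mnist_loader is the helper module from the book's code repository):

```python
import mnist_loader  # helper module shipped with the book's code

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
net = Network([784, 30, 10])
# 30 epochs, mini-batch size 10, learning rate eta = 3.0
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
```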

3. SVM (support vector machine)

Python library: scikit-learn, which provides a simple Python interface to a fast C-based library for SVMs known as LIBSVM.
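A self-contained sketch of this baseline, using scikit-learn's small built-in digits dataset so it runs as-is (the book's accompanying script applies the same svm.SVC() idea to full MNIST):

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Small built-in digits set stands in for MNIST here.
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = svm.SVC()  # default SVM classifier, backed by LIBSVM
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # baseline accuracy with no tuning
```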

sophisticated algorithm ≤ simple learning algorithm + good training data.

Toward Deep Learning

Networks with this kind of many-layer structure - two or more hidden layers - are called deep neural networks.
