NumPy - A Basic Neural Network
NumPy is optimized for the CPU. Projects that can use TensorFlow or PyTorch should prefer those two deep-learning libraries.
Key points:
1. A neural network is essentially a stack of matrix operations!
2. The core parameters of a neural network are the weights W and the biases b.
3. Training by backpropagation to minimize the fitting error is really solving an optimization problem: minimize the error.
1 Matrix Operations
1.1 Regression as Matrix Operations
The core numerical computation in machine learning is matrix computation, and the most important operation is matrix multiplication. The matrix product A·B is computed in NumPy as np.dot(A, B).
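A minimal sketch of this (array values chosen arbitrarily for illustration); note that np.dot and the @ operator compute the matrix product, whereas A * B would be an element-wise product:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])       # shape (2, 2)
B = np.array([[5.],
              [6.]])           # shape (2, 1)

C = np.dot(A, B)               # matrix product, shape (2, 1)
assert np.allclose(C, A @ B)   # the @ operator is equivalent to np.dot for 2-D arrays
print(C)                       # [[17.] [39.]]
```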
1.2 Classification and Regression with Neural Networks
Weights and biases are the key parameters of a neural network!
A neural network is used in two ways:
- ⭐️ Regression: predicting a value.
  A network of weight matrices performs weighted (matrix) computations on the input and produces a predicted value.
- Classification: predicting a category.
  First, a network of weight matrices performs weighted computations to obtain a predicted value; then a secondary transformation maps that value to class information. For example, applying a sigmoid to the output yields a binary classification result. Along the way the network extracts representative features.
```python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```
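As a hedged illustration of that secondary step (the input, weights, and bias below are made-up values), a weighted sum is squashed into (0, 1) by the sigmoid and then thresholded to obtain a binary class:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([[0.5, -1.2]])       # one sample with two features (arbitrary values)
w = np.array([[1.0],
              [0.3]])             # weight matrix, shape (2, 1) (arbitrary values)
b = np.array([[0.1]])             # bias

score = x.dot(w) + b              # weighted sum: the raw "regression" output
prob = sigmoid(score)             # squashed into (0, 1), read as P(class 1)
label = (prob > 0.5).astype(int)  # threshold to get the binary class
print(prob, label)                # ~0.56 -> class 1
```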
1.3 Forward Pass of a Multi-Layer Network
This part comes with a code exercise. The key parameters of the forward pass are the weights and biases, and the forward pass itself is nothing more than matrix computation.
```python
import numpy as np
import matplotlib.pyplot as plt

def draw_scatter(x, y):
    plt.scatter(x.ravel(), y.ravel())
    plt.show()

# Data: 10 samples, one feature each
x = np.linspace(-1, 1, 10)[:, None]                          # shape [10, 1]
y = np.random.normal(loc=0, scale=0.2, size=[10, 1]) + x     # shape [10, 1], Gaussian noise around x
draw_scatter(x, y)

def layer(in_dim, out_dim):
    weights = np.random.normal(loc=0, scale=0.1, size=[in_dim, out_dim])
    bias = np.full([1, out_dim], 0.1)
    return {"w": weights, "b": bias}

# Model
l1 = layer(1, 3)
l2 = layer(3, 1)

# Computation
o = x.dot(l1["w"]) + l1["b"]             # O = X·W1 + B1
print("shape after layer 1:", o.shape)   # 10*3 = (10*1)*(1*3) .+ 1*3 (B1 is broadcast onto each row of X·W1)
o = o.dot(l2["w"]) + l2["b"]             # O = O·W2 + B2
print("shape after layer 2:", o.shape)   # 10*1 = (10*3)*(3*1) .+ 1*1
print("output:", o)
draw_scatter(x, o)
```
1.4 Activation Functions and Their Derivatives
An activation function (also called an excitation function) is simply a nonlinear function. It lets the network handle predictions on complex (nonlinear) data and increases its capacity for nonlinear fitting. Activation functions are placed after the neuron units. Note: an activation function must be differentiable, otherwise the error cannot be propagated backward.
The most commonly used activation functions are tanh() and ReLU():
- tanh(x)
$$
\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}
\qquad
\tanh'(x)=\frac{(e^{x}+e^{-x})^2-(e^{x}-e^{-x})^2}{(e^{x}+e^{-x})^2}=1-\tanh^2(x)
$$
```python
def tanh(x):
    return np.tanh(x)

def tanh_derivative(x):  # derivative
    return 1 - np.square(np.tanh(x))
```
- relu(x)=max(0,x)
$$
\mathrm{relu}(x)=\max(0,x)
\qquad
\mathrm{relu}'(x)=
\begin{cases}
0, & x<0\\
1, & x>0
\end{cases}
$$
```python
def relu(x):
    return np.maximum(0, x)  # np.maximum compares two arrays element-wise; np.max finds the maximum within one array

def relu_derivative(x):  # derivative
    return np.where(x > 0, np.ones_like(x), np.zeros_like(x))
```
- sigmoid(x)=1/(1+exp(-x))
$$
\mathrm{sigmoid}(x)=\frac{1}{1+e^{-x}}
\qquad
\mathrm{sigmoid}'(x)=\frac{e^{-x}}{(1+e^{-x})^2}=\mathrm{sigmoid}(x)\,(1-\mathrm{sigmoid}(x))
$$
```python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):  # derivative
    o = sigmoid(x)
    return o * (1 - o)
```
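Because backpropagation relies on these derivatives, a quick numerical check is useful. Here is a small sketch (test points chosen arbitrarily, avoiding x = 0 where relu is not differentiable) that compares each hand-written derivative, assuming the definitions above are in scope, against a central finite difference:

```python
import numpy as np

def numerical_derivative(f, x, eps=1e-5):
    # central finite difference, applied element-wise
    return (f(x + eps) - f(x - eps)) / (2 * eps)

xs = np.array([-2.5, -1.0, -0.3, 0.7, 1.2, 3.0])  # arbitrary test points, none at 0
for fn, dfn in [(tanh, tanh_derivative),
                (relu, relu_derivative),
                (sigmoid, sigmoid_derivative)]:
    print(fn.__name__, np.allclose(numerical_derivative(fn, xs), dfn(xs), atol=1e-4))
```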
```python
# Layer 1
o = x.dot(l1["w"]) + l1["b"]
# Add an activation function here to strengthen the nonlinear fitting capacity
o = relu(o)
# o = tanh(o)
# Layer 2
o = o.dot(l2["w"]) + l2["b"]
print(o.shape)
draw_scatter(x, o)
```
Different kinds of neural networks tend to suit different activation functions:
Neural network | Activation function |
---|---|
Convolutional neural network (CNN) | relu() |
Recurrent neural network (RNN) | tanh(), relu() |
2 Automatic Training
A neural network can carry information forward and make predictions, but forward propagation alone does not give accurate results. The model has to keep learning through backward propagation, repeatedly correcting its own parameters (the weights and biases) to improve accuracy.
Neural network training steps:
2.1 Backward Propagation of the Model
The idea of backpropagation: compute the partial derivative of the output error with respect to every weight in the network, then adjust the weights according to those partial derivatives (usually by gradient descent) so that the network's output error gradually shrinks. The partial derivative for every neuron is computed with the chain rule.
2.1.1 Loss Function
The loss function describes the error between the prediction obtained from forward propagation and the true value. The most commonly used loss is the least-squares (squared-error) loss.
- Least-squares loss

$$
E=\frac{1}{2}(Y_{predict}-Y)^2
$$
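As a small sketch of this loss in code (variable names are illustrative; the factor 1/2 is kept so that the gradient is simply the residual):

```python
import numpy as np

def squared_error_loss(y_pred, y):
    # E = 1/2 * (y_pred - y)^2, averaged over the batch
    return np.mean(0.5 * np.square(y_pred - y))

def squared_error_grad(y_pred, y):
    # dE/dy_pred = (y_pred - y): the 1/2 cancels the exponent
    return y_pred - y
```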
For a summary of loss functions, see:
https://www.51cto.com/article/758058.html
神经网络损失函数汇总 (CSDN blog by 猿代码_xiao)
2.2 Example
Train the neural network shown in the figure below (the two-layer 1 → 3 → 1 network defined earlier).
2.2.1 Mathematical Derivation
Training to minimize the fitting error is really solving an optimization problem (minimize the error), so the training above can be written as the following mathematical formulation:
$$
\min_{W_1,W_2,b_1,b_2}\ E=(o_2-y)^2
$$
$$
\text{s.t.}\quad o_1=XW_1+b_1,\qquad a_1=f(o_1),\qquad o_2=a_1W_2+b_2
$$
$$
\Rightarrow\ \min\ E=\bigl(f(XW_1+b_1)\cdot W_2+b_2-y\bigr)^2
$$

Annotating each constraint with the matrix dimensions of the example network (".+" denotes the broadcast addition of the bias row):

$$
o_1=XW_1+b_1 \qquad\big|\qquad 10\times 3=(10\times 1)(1\times 3)\ .\!+\ 1\times 3
$$
$$
a_1=f(o_1) \qquad\big|\qquad 10\times 3=10\times 3
$$
$$
o_2=a_1W_2+b_2 \qquad\big|\qquad 10\times 1=(10\times 3)(3\times 1)\ .\!+\ 1\times 1
$$
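A quick sketch (reusing the layer sizes of the example network, with relu as the assumed f) that confirms these dimension annotations:

```python
import numpy as np

X = np.zeros([10, 1])                          # 10 samples, 1 feature
W1, b1 = np.zeros([1, 3]), np.zeros([1, 3])
W2, b2 = np.zeros([3, 1]), np.zeros([1, 1])

o1 = X.dot(W1) + b1          # (10,1)(1,3) .+ (1,3) -> (10,3)
a1 = np.maximum(0, o1)       # element-wise, still (10,3)
o2 = a1.dot(W2) + b2         # (10,3)(3,1) .+ (1,1) -> (10,1)
print(o1.shape, a1.shape, o2.shape)            # (10, 3) (10, 3) (10, 1)
```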
Compute the partial derivatives of the output error E with respect to each weight in the network:
$$
\frac{\partial E}{\partial W_2}=\frac{\partial E}{\partial o_2}\cdot\frac{\partial o_2}{\partial W_2}=2(o_2-y)\cdot a_1^T
$$
$$
\frac{\partial E}{\partial b_2}=\frac{\partial E}{\partial o_2}\cdot\frac{\partial o_2}{\partial b_2}=2(o_2-y)\cdot 1^T
$$
$$
\frac{\partial E}{\partial W_1}=\frac{\partial E}{\partial o_2}\cdot\frac{\partial o_2}{\partial a_1}\cdot\frac{\partial a_1}{\partial o_1}\cdot\frac{\partial o_1}{\partial W_1}=2(o_2-y)\cdot W_2^T\cdot f'\cdot X^T
$$
$$
\frac{\partial E}{\partial b_1}=\frac{\partial E}{\partial o_2}\cdot\frac{\partial o_2}{\partial a_1}\cdot\frac{\partial a_1}{\partial o_1}\cdot\frac{\partial o_1}{\partial b_1}=2(o_2-y)\cdot W_2^T\cdot f'\cdot 1^T
$$

Taking the matrix dimensions into account, the factors must be ordered and transposed so that the shapes line up:

$$
\frac{\partial E\ (1)}{\partial W_2\ (3\times 1)}
=\Bigl\{\frac{\partial o_2\ (10\times 1)}{\partial W_2\ (3\times 1)}\ (3\times 10)\Bigr\}\cdot\Bigl\{\frac{\partial E\ (1)}{\partial o_2\ (10\times 1)}\ (10\times 1)\Bigr\}
=a_1^T\cdot 2(o_2-y)\quad (3\times 10\ \cdot\ 10\times 1=3\times 1)
$$
$$
\frac{\partial E\ (1)}{\partial b_2\ (1)}
=\Bigl\{\frac{\partial o_2\ (10\times 1)}{\partial b_2\ (1)}\ (1\times 10)\Bigr\}\cdot\Bigl\{\frac{\partial E\ (1)}{\partial o_2\ (10\times 1)}\ (10\times 1)\Bigr\}
=1^T\cdot 2(o_2-y)\quad (1\times 10\ \cdot\ 10\times 1=1\times 1)
$$
$$
\frac{\partial E}{\partial W_1}\ (1\times 3)
=\frac{\partial E}{\partial o_2}\ (10\times 1)\cdot\frac{\partial o_2}{\partial a_1}\ (1\times 3)\cdot\frac{\partial a_1}{\partial o_1}\ (3\times 3)\cdot\frac{\partial o_1}{\partial W_1}\ (1\times 10)
=X^T\cdot 2(o_2-y)\cdot W_2^T\cdot f'
$$
$$
\frac{\partial E}{\partial b_1}\ (1\times 3)
=\frac{\partial E}{\partial o_2}\cdot\frac{\partial o_2}{\partial a_1}\cdot\frac{\partial a_1}{\partial o_1}\cdot\frac{\partial o_1}{\partial b_1}
=1^T\cdot 2(o_2-y)\cdot W_2^T\cdot f'
$$
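To make the derivation concrete, here is a sketch that evaluates these gradients directly from the formulas above and spot-checks one entry against a numerical estimate (the data, the parameter values, and the choice of relu for f are assumptions made for the example):

```python
import numpy as np

np.random.seed(0)
X = np.random.normal(size=[10, 1])
y = np.random.normal(size=[10, 1])
W1, b1 = np.random.normal(size=[1, 3]), np.zeros([1, 3])
W2, b2 = np.random.normal(size=[3, 1]), np.zeros([1, 1])

relu = lambda v: np.maximum(0, v)
relu_derivative = lambda v: (v > 0).astype(float)

def error(W1_):  # E as a function of W1, used only for the numerical spot-check
    o2 = relu(X.dot(W1_) + b1).dot(W2) + b2
    return np.sum(np.square(o2 - y))

# Forward pass
o1 = X.dot(W1) + b1
a1 = relu(o1)
o2 = a1.dot(W2) + b2

# Gradients, exactly as derived above
dE_do2 = 2 * (o2 - y)                             # dE/do2                                   -> (10, 1)
gW2 = a1.T.dot(dE_do2)                            # a1^T · 2(o2 - y)                         -> (3, 1)
gb2 = np.sum(dE_do2, axis=0, keepdims=True)       # 1^T · 2(o2 - y)                          -> (1, 1)
dE_do1 = dE_do2.dot(W2.T) * relu_derivative(o1)   # 2(o2 - y) · W2^T, times f'(o1) element-wise -> (10, 3)
gW1 = X.T.dot(dE_do1)                             # X^T · (...)                              -> (1, 3)
gb1 = np.sum(dE_do1, axis=0, keepdims=True)       # 1^T · (...)                              -> (1, 3)

# Numerical spot-check of dE/dW1[0, 0]
eps = 1e-6
dW = np.zeros_like(W1); dW[0, 0] = eps
print(gW1[0, 0], (error(W1 + dW) - error(W1 - dW)) / (2 * eps))  # the two values should agree
```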
Since this is an unconstrained optimization problem, it can be solved by gradient descent, which gives the update rules:
$$
W_2=W_2-\alpha\cdot\frac{\partial E}{\partial W_2}\qquad
b_2=b_2-\alpha\cdot\frac{\partial E}{\partial b_2}\qquad
W_1=W_1-\alpha\cdot\frac{\partial E}{\partial W_1}\qquad
b_1=b_1-\alpha\cdot\frac{\partial E}{\partial b_1}
$$
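Continuing the gradient sketch above, a single gradient-descent step under these update rules would look like this (α = 0.01 is just an assumed learning rate):

```python
learning_rate = 0.01   # alpha, an assumed value

W2 -= learning_rate * gW2
b2 -= learning_rate * gb2
W1 -= learning_rate * gW1
b1 -= learning_rate * gb1
```

Repeating the forward pass, the gradient computation, and this update is exactly what the training loop in the next section automates.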
2.2.2 Python Code
np_neuralNetwork.py
```python
import numpy as np
import matplotlib.pyplot as plt

def draw_scatter(x, y):
    plt.scatter(x.ravel(), y.ravel())
    plt.show()

def draw_line(x, y):
    idx = np.argsort(x.ravel())
    plt.plot(x.ravel()[idx], y.ravel()[idx])
    plt.show()

def layer(in_dim, out_dim):
    weights = np.random.normal(loc=0, scale=0.1, size=[in_dim, out_dim])
    bias = np.full([1, out_dim], 0.1)
    return {"w": weights, "b": bias}

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):  # derivative
    return np.where(x > 0, np.ones_like(x), np.zeros_like(x))

# Data: 10 samples, one feature each
x = np.linspace(-1, 1, 10)[:, None]                          # shape [10, 1]
y = np.random.normal(loc=0, scale=0.2, size=[10, 1]) + x     # shape [10, 1], Gaussian noise around x

# Model
l1 = layer(1, 3)
l2 = layer(3, 1)

def predict(x):
    o1 = x.dot(l1["w"]) + l1["b"]
    a1 = relu(o1)  # activation function added here
    o2 = a1.dot(l2["w"]) + l2["b"]
    return [o1, a1, o2]

def backprop(dz, layer, layer_in):  # current layer and its input
    gw = layer_in.T.dot(dz)
    gb = np.sum(dz, axis=0, keepdims=True)
    new_dz = dz.dot(layer["w"].T)   # W^T * dz, the error passed back to the previous layer
    layer["w"] -= learning_rate * gw
    layer["b"] -= learning_rate * gb
    return new_dz

# Train for 300 iterations
learning_rate = 0.01
for i in range(300):
    # Forward prediction
    o1, a1, o2 = predict(x)
    # Error computation
    if i % 10 == 0:
        average_cost = np.mean(np.square(o2 - y))
        print(average_cost)
    # Backpropagation and gradient update
    dz2 = 2 * (o2 - y)           # derivative of the output error (o2 - y)**2, i.e. dz2 = dE/dO2
    dz1 = backprop(dz2, l2, a1)  # dz1 = dE/da1 = (dO2/da1) * (dE/dO2)
    dz1 *= relu_derivative(o1)   # backprop through the activation: dz1 = dE/dO1 = dE/da1 * (da1/dO1)
    _ = backprop(dz1, l1, x)

draw_line(x, predict(x)[-1])
draw_scatter(x, y)
```