I find that Teacher GPT writes better than most bloggers (myself included): the explanations are easier to follow, and whenever something is unclear I can just ask GPT directly. Tireless, conscientious, on call around the clock, endlessly patient. With a teacher this good, not studying would be a terrible waste.
So I am setting myself a challenge: learn with Teacher GPT for 365 days. Every day I will organize my takeaways and the thread of what I studied (most of the text is generated directly by GPT, which is surely better than anything I would write myself). Thank you, Teacher GPT!
Full series: Learning with GPT - AI series
Having covered all those concepts and formulas, it is time to implement a neural network, and the whole backpropagation process, by hand.
Download the dataset
import urllib.request
import os

data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-labels.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-labels.gz'))
Output
('C:\\Users\\jike4\\data\\test-labels.gz',
<http.client.HTTPMessage at 0x228e7de9990>)
Extract the dataset
import gzip
import shutil
import os

def extract_gz(file_path):
    with gzip.open(file_path, 'rb') as f_in:
        with open(file_path.replace('.gz', ''), 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    print(f'Extracted {file_path} to {file_path.replace(".gz", "")}')

# Data directory
data_dir = 'C:\\Users\\jike4\\data'  # replace with your own path

# Walk the directory and extract every .gz file
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if file.endswith('.gz'):
            extract_gz(os.path.join(root, file))
print("All files have been extracted successfully.")
Load the dataset
import numpy as np
import struct
import os

def load_mnist_images(file_path):
    with open(file_path, 'rb') as f:
        # IDX header: magic number, image count, rows, cols (big-endian uint32)
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.fromfile(f, dtype=np.uint8).reshape(num, 784)
    return images

def load_mnist_labels(file_path):
    with open(file_path, 'rb') as f:
        # IDX header: magic number, label count (big-endian uint32)
        magic, num = struct.unpack('>II', f.read(8))
        labels = np.fromfile(f, dtype=np.uint8)
    return labels

# Read the data (file names match what extract_gz produced above)
data_dir = 'C:\\Users\\jike4\\data'  # replace with your own path
train_images = load_mnist_images(os.path.join(data_dir, 'train-images'))
train_labels = load_mnist_labels(os.path.join(data_dir, 'train-labels'))
test_images = load_mnist_images(os.path.join(data_dir, 'test-images'))
test_labels = load_mnist_labels(os.path.join(data_dir, 'test-labels'))

# Preprocessing
train_images = train_images / 255.0
test_images = test_images / 255.0
train_labels = np.eye(10)[train_labels]  # One-hot encoding
test_labels = np.eye(10)[test_labels]

print(f'Train images shape: {train_images.shape}')
print(f'Train labels shape: {train_labels.shape}')
print(f'Test images shape: {test_images.shape}')
print(f'Test labels shape: {test_labels.shape}')
Normalizing the data
Why divide by 255? The images in the MNIST dataset are grayscale: each pixel value lies between 0 and 255, where 0 is black, 255 is white, and the values in between are shades of gray.
- Range normalization: maps pixel values from the 0-255 range into the 0-1 range (see the quick check after this list).
- With all inputs in [0, 1], every feature lives on the same scale, which avoids the training difficulties caused by some features having a much larger or smaller range than others.
- Benefits of normalization:
  - A stabler training process: gradient descent converges faster on normalized data because weight updates are smoother, which helps avoid exploding or vanishing gradients.
  - Better model performance: normalized inputs tend to fit the model's initial weights better, helping the model pick up the patterns in the data and reach a good solution sooner.
  - Lower computational cost: small numbers (between 0 and 1) are generally cheaper to work with than large ones (0 to 255), especially in floating-point arithmetic.
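As a quick sanity check (a minimal sketch; the raw_pixels values are made up for illustration), dividing by 255 maps uint8 pixels into [0, 1]:
import numpy as np

# Hypothetical raw uint8 pixel values, like those returned by load_mnist_images
raw_pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

scaled = raw_pixels / 255.0                # float64 values in [0, 1]
print(raw_pixels.min(), raw_pixels.max())  # 0 255
print(scaled.min(), scaled.max())          # 0.0 1.0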
One-hot encoding
One-hot encoding converts categorical labels into binary vectors. For a classification problem with n classes, each class is mapped to an n-dimensional binary vector in which only the position whose index corresponds to that class is 1; every other position is 0.
Example:
Suppose there are 3 classes (0, 1, 2). Their one-hot encodings are:
- Class 0: [1, 0, 0]
- Class 1: [0, 1, 0]
- Class 2: [0, 0, 1]
Why use one-hot encoding
- Compatibility: a network's output layer typically uses a softmax activation to produce a probability distribution; one-hot labels can be compared directly with that output when computing the loss.
- No spurious ordering: raw labels are integers such as 0, 1, 2, which a model could misinterpret as ordered quantities, distorting training. Encoding each class as a vector avoids this.
- Better classification performance: with one-hot targets the model treats each class independently, which helps it identify each class accurately.
train_labels = np.eye(10)[train_labels]  # one-hot encode the training labels
test_labels = np.eye(10)[test_labels]    # one-hot encode the test labels
np.eye(10) creates a 10x10 identity matrix, with 1s on the diagonal and 0s everywhere else:
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
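Indexing that identity matrix with an array of labels picks out one row per label, which is exactly the one-hot encoding. A minimal sketch (the sample labels here are made up for illustration):
import numpy as np

labels = np.array([3, 0, 7])  # hypothetical label array
one_hot = np.eye(10)[labels]  # row i of the identity matrix is the encoding of class i
print(one_hot)
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]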
Initialize the weights and biases
# Network hyperparameters
input_size = 784   # input-layer neurons (28x28 pixels)
hidden_size = 64   # hidden-layer neurons
output_size = 10   # output-layer neurons (digits 0-9)

# Initialize weights and biases
W1 = np.random.randn(input_size, hidden_size) * 0.01  # small random values keep initial activations small
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))
print(f'W1: {W1.shape}')
print(f'b1: {b1.shape}')
print(f'W2: {W2.shape}')
print(f'b2: {b2.shape}')
Output
W1: (784, 64)
b1: (1, 64)
W2: (64, 10)
b2: (1, 10)
ReLU activation function
# ReLU activation: max(0, x), applied element-wise
def relu(x):
    return np.maximum(0, x)
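A quick check of relu on a few arbitrary sample values: negatives are clipped to 0, non-negatives pass through unchanged.
import numpy as np

def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0.  0.  3.5]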
Forward propagation
# Forward propagation
def forward_propagation(X):
    Z1 = np.dot(X, W1) + b1   # hidden-layer pre-activation
    A1 = relu(Z1)             # hidden-layer activation
    Z2 = np.dot(A1, W2) + b2  # output-layer pre-activation
    A2 = relu(Z2)             # output-layer activation (this implementation uses ReLU here, not softmax)
    return Z1, A1, Z2, A2
Z1, A1, Z2, A2 = forward_propagation(train_images)
print(f'Z1: {Z1.shape}')
print(f'A1: {A1.shape}')
print(f'Z2: {Z2.shape}')
print(f'A2: {A2.shape}')
Verify the propagation results
Z1: (60000, 64)
A1: (60000, 64)
Z2: (60000, 10)
A2: (60000, 10)
Derivative of the ReLU activation
# Derivative of ReLU: 1 where x > 0, else 0
def relu_derivative(x):
    return np.where(x > 0, 1, 0)
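Note that ReLU is not differentiable at exactly 0; like most implementations, this one adopts the convention relu'(0) = 0. A quick check on arbitrary sample values:
import numpy as np

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

print(relu_derivative(np.array([-2.0, 0.0, 3.5])))  # [0 0 1]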
Loss function
# Mean squared error loss (the 1/2 factor simplifies the gradient)
def compute_loss_mse(A2, Y):
    m = Y.shape[0]
    cost = np.sum((A2 - Y) ** 2) / (2 * m)
    return cost
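A tiny worked example (the 2-sample, 3-class arrays are made up for illustration): the squared errors sum to 0.5, so the cost is 0.5 / (2 * 2) = 0.125.
import numpy as np

A2 = np.array([[0.5, 0.5, 0.0],
               [0.0, 1.0, 0.0]])  # hypothetical predictions
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # one-hot targets

m = Y.shape[0]                          # m = 2 samples
cost = np.sum((A2 - Y) ** 2) / (2 * m)  # (0.25 + 0.25) / 4
print(cost)                             # 0.125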
Backpropagation
# Backpropagation
def backward_propagation(X, Y, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - Y  # output-layer error term
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * relu_derivative(Z1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    return dW1, db1, dW2, db2
In many standard implementations, the factor of 2 that appears when differentiating the squared error is not kept explicitly; here the 1/2 built into the loss cancels it exactly, so the output-layer error term is simply
dZ2 = A2 - Y
(Strictly speaking, this also omits the relu_derivative(Z2) factor from the chain rule, which effectively treats the output layer as linear when computing the gradient; it is a common simplification.)
For an output-layer weight w_{ij}^{(L)}, the gradient is:
\frac{\partial C}{\partial w_{ij}^{(L)}} = (a_j^{(L)} - y_j) \cdot \sigma'(z_j^{(L)}) \cdot a_i^{(L-1)} = dZ2 * A1
Since A1 has shape (60000, 64) and dZ2 (the same shape as A2) has shape (60000, 10), producing the (64, 10) weight gradient requires the transpose:
dW2 = np.dot(A1.T, dZ2) / m
For an output-layer bias b_j^{(L)}, the gradient is:
\frac{\partial C}{\partial b_j^{(L)}} = (a_j^{(L)} - y_j) \cdot \sigma'(z_j^{(L)}) = dZ2
so, averaged over the batch:
db2 = np.sum(dZ2, axis=0, keepdims=True) / m
Since
\frac{\partial C}{\partial a^{(l)}} = \frac{\partial C}{\partial z^{(l+1)}} \cdot w^{(l+1)}
we have
dA1 = np.dot(dZ2, W2.T)
Since
\frac{\partial C}{\partial z^{(l)}} = \frac{\partial C}{\partial a^{(l)}} \cdot \sigma'(z^{(l)})
we have
dZ1 = dA1 * relu_derivative(Z1)
Since
\frac{\partial C}{\partial w^{(l)}} = \frac{\partial C}{\partial z^{(l)}} \cdot a^{(l-1)} = dZ1 \cdot X
where a^{(l-1)} = X, because there is only one hidden layer, so the previous layer is the input layer itself, we get
dW1 = np.dot(X.T, dZ1) / m
Since
\frac{\partial C}{\partial b^{(l)}} = \frac{\partial C}{\partial z^{(l)}} = dZ1
we get
db1 = np.sum(dZ1, axis=0, keepdims=True) / m
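To gain confidence in these formulas, a standard sanity test is a numerical gradient check: nudge one parameter, recompute the loss, and compare the finite-difference slope with the analytical gradient. Below is a minimal sketch under this post's setup (it assumes W2, the functions above, and the training arrays are already defined; the entry indices and the 100-sample slice are arbitrary choices to keep it fast). Because dZ2 = A2 - Y omits the relu_derivative(Z2) factor, as noted above, the two values agree only for contributions where Z2 > 0, so small discrepancies here actually make that simplification visible.
# Numerical gradient check for a few entries of W2 (a sketch, not part of the original derivation)
eps = 1e-5
X_small, Y_small = train_images[:100], train_labels[:100]

Z1, A1, Z2, A2 = forward_propagation(X_small)
dW1, db1, dW2, db2 = backward_propagation(X_small, Y_small, Z1, A1, Z2, A2)

for (i, j) in [(0, 0), (5, 3), (63, 9)]:  # arbitrary entries of W2 to spot-check
    original = W2[i, j]
    W2[i, j] = original + eps
    loss_plus = compute_loss_mse(forward_propagation(X_small)[3], Y_small)
    W2[i, j] = original - eps
    loss_minus = compute_loss_mse(forward_propagation(X_small)[3], Y_small)
    W2[i, j] = original  # restore the weight
    numeric = (loss_plus - loss_minus) / (2 * eps)
    print(f'W2[{i},{j}]: analytic={dW2[i, j]:.8f}, numeric={numeric:.8f}')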
Update the parameters
def update_parameters(dW1, db1, dW2, db2, learning_rate=0.01):
    global W1, b1, W2, b2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
Train the model
num_epochs = 1000
learning_rate = 0.01

for epoch in range(num_epochs):
    Z1, A1, Z2, A2 = forward_propagation(train_images)
    cost = compute_loss_mse(A2, train_labels)
    dW1, db1, dW2, db2 = backward_propagation(train_images, train_labels, Z1, A1, Z2, A2)
    update_parameters(dW1, db1, dW2, db2, learning_rate)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Cost: {cost}')
Evaluate the model
def predict(X):
    _, _, _, A2 = forward_propagation(X)
    predictions = np.argmax(A2, axis=1)
    return predictions

train_predictions = predict(train_images)
test_predictions = predict(test_images)
train_accuracy = np.mean(np.argmax(train_labels, axis=1) == train_predictions)
test_accuracy = np.mean(np.argmax(test_labels, axis=1) == test_predictions)
print(f'Train Accuracy: {train_accuracy}')
print(f'Test Accuracy: {test_accuracy}')
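Beyond the aggregate accuracy, it can help to eyeball a few individual predictions next to their labels (a small sketch using the arrays defined above; the sample indices are arbitrary):
# Compare predicted and true digits for a few test samples
for idx in [0, 1, 2, 3, 4]:
    true_digit = np.argmax(test_labels[idx])  # undo the one-hot encoding
    pred_digit = test_predictions[idx]
    print(f'sample {idx}: predicted {pred_digit}, actual {true_digit}')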
Full code
# Download
import urllib.request
import os

data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-labels.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-labels.gz'))

# Extract the dataset
import gzip
import shutil

def extract_gz(file_path):
    with gzip.open(file_path, 'rb') as f_in:
        with open(file_path.replace('.gz', ''), 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    print(f'Extracted {file_path} to {file_path.replace(".gz", "")}')

# Data directory
data_dir = 'C:\\Users\\jike4\\data'  # replace with your own path

# Walk the directory and extract every .gz file
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if file.endswith('.gz'):
            extract_gz(os.path.join(root, file))
print("All files have been extracted successfully.")

# Load the dataset
import numpy as np
import struct

def load_mnist_images(file_path):
    with open(file_path, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))  # big-endian IDX header
        images = np.fromfile(f, dtype=np.uint8).reshape(num, 784)
    return images

def load_mnist_labels(file_path):
    with open(file_path, 'rb') as f:
        magic, num = struct.unpack('>II', f.read(8))  # big-endian IDX header
        labels = np.fromfile(f, dtype=np.uint8)
    return labels

# Read the data (file names match what extract_gz produced above)
train_images = load_mnist_images(os.path.join(data_dir, 'train-images'))
train_labels = load_mnist_labels(os.path.join(data_dir, 'train-labels'))
test_images = load_mnist_images(os.path.join(data_dir, 'test-images'))
test_labels = load_mnist_labels(os.path.join(data_dir, 'test-labels'))

# Preprocessing
train_images = train_images / 255.0
test_images = test_images / 255.0
train_labels = np.eye(10)[train_labels]  # One-hot encoding
test_labels = np.eye(10)[test_labels]

# Network hyperparameters
input_size = 784   # input-layer neurons (28x28 pixels)
hidden_size = 64   # hidden-layer neurons
output_size = 10   # output-layer neurons (digits 0-9)

# Initialize weights and biases
W1 = np.random.randn(input_size, hidden_size) * 0.01
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

# ReLU activation
def relu(x):
    return np.maximum(0, x)

# Forward propagation
def forward_propagation(X):
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = relu(Z2)
    return Z1, A1, Z2, A2

# Derivative of ReLU
def relu_derivative(x):
    return np.where(x > 0, 1, 0)

# Mean squared error loss
def compute_loss_mse(A2, Y):
    m = Y.shape[0]
    cost = np.sum((A2 - Y) ** 2) / (2 * m)
    return cost

# Backpropagation
def backward_propagation(X, Y, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - Y  # output-layer error term
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * relu_derivative(Z1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    return dW1, db1, dW2, db2

# Update the parameters
def update_parameters(dW1, db1, dW2, db2, learning_rate=0.01):
    global W1, b1, W2, b2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

# Train the model
num_epochs = 1000
learning_rate = 0.01
for epoch in range(num_epochs):
    Z1, A1, Z2, A2 = forward_propagation(train_images)
    cost = compute_loss_mse(A2, train_labels)
    dW1, db1, dW2, db2 = backward_propagation(train_images, train_labels, Z1, A1, Z2, A2)
    update_parameters(dW1, db1, dW2, db2, learning_rate)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Cost: {cost}')

# Evaluate the model
def predict(X):
    _, _, _, A2 = forward_propagation(X)
    predictions = np.argmax(A2, axis=1)
    return predictions

train_predictions = predict(train_images)
test_predictions = predict(test_images)
train_accuracy = np.mean(np.argmax(train_labels, axis=1) == train_predictions)
test_accuracy = np.mean(np.argmax(test_labels, axis=1) == test_predictions)
print(f'Train Accuracy: {train_accuracy}')
print(f'Test Accuracy: {test_accuracy}')