I find that Teacher GPT writes better than most bloggers (myself included): the explanations are easier to follow, and whenever something is unclear I can just ask GPT directly. Tireless, conscientious, on call around the clock, endlessly patient. With a teacher this good, not studying would be a terrible waste.
So I am setting myself a challenge: learn with Teacher GPT for 365 days. Every day I will organize my takeaways and the thread of what I studied (most of the text is generated directly by GPT, which is surely better than anything I would write myself). Thank you, Teacher GPT!
Full series: Learning with GPT - AI series
Having covered all those concepts and formulas, it is time to implement a neural network, and the whole backpropagation process, by hand.
Download the dataset
import urllib.request
import os

data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-labels.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-labels.gz'))
Output
('C:\\Users\\jike4\\data\\test-labels.gz',
<http.client.HTTPMessage at 0x228e7de9990>)
Extract the dataset
import gzip
import shutil
import os

def extract_gz(file_path):
    with gzip.open(file_path, 'rb') as f_in:
        with open(file_path.replace('.gz', ''), 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    print(f'Extracted {file_path} to {file_path.replace(".gz", "")}')

# Data directory
data_dir = 'C:\\Users\\jike4\\data'  # replace with your own path

# Walk the directory and extract every .gz file
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if file.endswith('.gz'):
            extract_gz(os.path.join(root, file))
print("All files have been extracted successfully.")
Load the dataset
import numpy as np
import struct
import os

def load_mnist_images(file_path):
    with open(file_path, 'rb') as f:
        # IDX header: magic number, image count, rows, cols (big-endian uint32)
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.fromfile(f, dtype=np.uint8).reshape(num, 784)
    return images

def load_mnist_labels(file_path):
    with open(file_path, 'rb') as f:
        # IDX header: magic number, label count (big-endian uint32)
        magic, num = struct.unpack('>II', f.read(8))
        labels = np.fromfile(f, dtype=np.uint8)
    return labels

# Read the data (file names match what extract_gz produced above)
data_dir = 'C:\\Users\\jike4\\data'  # replace with your own path
train_images = load_mnist_images(os.path.join(data_dir, 'train-images'))
train_labels = load_mnist_labels(os.path.join(data_dir, 'train-labels'))
test_images = load_mnist_images(os.path.join(data_dir, 'test-images'))
test_labels = load_mnist_labels(os.path.join(data_dir, 'test-labels'))

# Preprocessing
train_images = train_images / 255.0
test_images = test_images / 255.0
train_labels = np.eye(10)[train_labels]  # One-hot encoding
test_labels = np.eye(10)[test_labels]

print(f'Train images shape: {train_images.shape}')
print(f'Train labels shape: {train_labels.shape}')
print(f'Test images shape: {test_images.shape}')
print(f'Test labels shape: {test_labels.shape}')
Normalizing the data
Why divide by 255? The images in the MNIST dataset are grayscale: each pixel value lies between 0 and 255, where 0 is black, 255 is white, and the values in between are shades of gray.
- Range normalization: maps pixel values from the 0-255 range into the 0-1 range (see the quick check after this list).
- With all inputs in [0, 1], every feature lives on the same scale, which avoids the training difficulties caused by some features having a much larger or smaller range than others.
- Benefits of normalization:
  - A stabler training process: gradient descent converges faster on normalized data because weight updates are smoother, which helps avoid exploding or vanishing gradients.
  - Better model performance: normalized inputs tend to fit the model's initial weights better, helping the model pick up the patterns in the data and reach a good solution sooner.
  - Lower computational cost: small numbers (between 0 and 1) are generally cheaper to work with than large ones (0 to 255), especially in floating-point arithmetic.
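As a quick sanity check (a minimal sketch; the raw_pixels values are made up for illustration), dividing by 255 maps uint8 pixels into [0, 1]:
import numpy as np

# Hypothetical raw uint8 pixel values, like those returned by load_mnist_images
raw_pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

scaled = raw_pixels / 255.0                # float64 values in [0, 1]
print(raw_pixels.min(), raw_pixels.max())  # 0 255
print(scaled.min(), scaled.max())          # 0.0 1.0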
One-hot encoding
One-hot encoding converts categorical labels into binary vectors. For a classification problem with n classes, each class is mapped to an n-dimensional binary vector in which only the position whose index corresponds to that class is 1; every other position is 0.
Example:
Suppose there are 3 classes (0, 1, 2). Their one-hot encodings are:
- Class 0: [1, 0, 0]
- Class 1: [0, 1, 0]
- Class 2: [0, 0, 1]
Why use one-hot encoding
- Compatibility: a network's output layer typically uses a softmax activation to produce a probability distribution; one-hot labels can be compared directly with that output when computing the loss.
- No spurious ordering: raw labels are integers such as 0, 1, 2, which a model could misinterpret as ordered quantities, distorting training. Encoding each class as a vector avoids this.
- Better classification performance: with one-hot targets the model treats each class independently, which helps it identify each class accurately.
train_labels = np.eye(10)[train_labels]  # one-hot encode the training labels
test_labels = np.eye(10)[test_labels]    # one-hot encode the test labels
np.eye(10) creates a 10x10 identity matrix, with 1s on the diagonal and 0s everywhere else:
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
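Indexing that identity matrix with an array of labels picks out one row per label, which is exactly the one-hot encoding. A minimal sketch (the sample labels here are made up for illustration):
import numpy as np

labels = np.array([3, 0, 7])  # hypothetical label array
one_hot = np.eye(10)[labels]  # row i of the identity matrix is the encoding of class i
print(one_hot)
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]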
Initialize the weights and biases
# Network hyperparameters
input_size = 784   # input-layer neurons (28x28 pixels)
hidden_size = 64   # hidden-layer neurons
output_size = 10   # output-layer neurons (digits 0-9)

# Initialize weights and biases
W1 = np.random.randn(input_size, hidden_size) * 0.01  # small random values keep initial activations small
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))
print(f'W1: {W1.shape}')
print(f'b1: {b1.shape}')
print(f'W2: {W2.shape}')
print(f'b2: {b2.shape}')
Output
W1: (784, 64)
b1: (1, 64)
W2: (64, 10)
b2: (1, 10)
ReLU activation function
# ReLU activation: max(0, x), applied element-wise
def relu(x):
    return np.maximum(0, x)
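A quick check of relu on a few arbitrary sample values: negatives are clipped to 0, non-negatives pass through unchanged.
import numpy as np

def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0.  0.  3.5]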
Forward propagation
# Forward propagation
def forward_propagation(X):
    Z1 = np.dot(X, W1) + b1   # hidden-layer pre-activation
    A1 = relu(Z1)             # hidden-layer activation
    Z2 = np.dot(A1, W2) + b2  # output-layer pre-activation
    A2 = relu(Z2)             # output-layer activation (this implementation uses ReLU here, not softmax)
    return Z1, A1, Z2, A2
Z1, A1, Z2, A2 = forward_propagation(train_images)
print(f'Z1: {Z1.shape}')
print(f'A1: {A1.shape}')
print(f'Z2: {Z2.shape}')
print(f'A2: {A2.shape}')
Verify the propagation results
Z1: (60000, 64)
A1: (60000, 64)
Z2: (60000, 10)
A2: (60000, 10)
Derivative of the ReLU activation
# Derivative of ReLU: 1 where x > 0, else 0
def relu_derivative(x):
    return np.where(x > 0, 1, 0)
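Note that ReLU is not differentiable at exactly 0; like most implementations, this one adopts the convention relu'(0) = 0. A quick check on arbitrary sample values:
import numpy as np

def relu_derivative(x):
    return np.where(x > 0, 1, 0)

print(relu_derivative(np.array([-2.0, 0.0, 3.5])))  # [0 0 1]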
Loss function
# Mean squared error loss (the 1/2 factor simplifies the gradient)
def compute_loss_mse(A2, Y):
    m = Y.shape[0]
    cost = np.sum((A2 - Y) ** 2) / (2 * m)
    return cost
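A tiny worked example (the 2-sample, 3-class arrays are made up for illustration): the squared errors sum to 0.5, so the cost is 0.5 / (2 * 2) = 0.125.
import numpy as np

A2 = np.array([[0.5, 0.5, 0.0],
               [0.0, 1.0, 0.0]])  # hypothetical predictions
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # one-hot targets

m = Y.shape[0]                          # m = 2 samples
cost = np.sum((A2 - Y) ** 2) / (2 * m)  # (0.25 + 0.25) / 4
print(cost)                             # 0.125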
Backpropagation
# Backpropagation
def backward_propagation(X, Y, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - Y  # output-layer error term
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * relu_derivative(Z1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    return dW1, db1, dW2, db2
In many standard implementations, the factor of 2 that appears when differentiating the squared error is not kept explicitly; here the 1/2 built into the loss cancels it exactly, so the output-layer error term is simply
dZ2 = A2 - Y
(Strictly speaking, this also omits the relu_derivative(Z2) factor from the chain rule, which effectively treats the output layer as linear when computing the gradient; it is a common simplification.)
For an output-layer weight w_{ij}^{(L)}, the gradient is:
\frac{\partial C}{\partial w_{ij}^{(L)}} = (a_j^{(L)} - y_j) \cdot \sigma'(z_j^{(L)}) \cdot a_i^{(L-1)} = dZ2 * A1
Since A1 has shape (60000, 64) and dZ2 (the same shape as A2) has shape (60000, 10), producing the (64, 10) weight gradient requires the transpose:
dW2 = np.dot(A1.T, dZ2) / m
For an output-layer bias b_j^{(L)}, the gradient is:
\frac{\partial C}{\partial b_j^{(L)}} = (a_j^{(L)} - y_j) \cdot \sigma'(z_j^{(L)}) = dZ2
so, averaged over the batch:
db2 = np.sum(dZ2, axis=0, keepdims=True) / m
Since
\frac{\partial C}{\partial a^{(l)}} = \frac{\partial C}{\partial z^{(l+1)}} \cdot w^{(l+1)}
we have
dA1 = np.dot(dZ2, W2.T)
Since
\frac{\partial C}{\partial z^{(l)}} = \frac{\partial C}{\partial a^{(l)}} \cdot \sigma'(z^{(l)})
we have
dZ1 = dA1 * relu_derivative(Z1)
Since
\frac{\partial C}{\partial w^{(l)}} = \frac{\partial C}{\partial z^{(l)}} \cdot a^{(l-1)} = dZ1 \cdot X
where a^{(l-1)} = X, because there is only one hidden layer, so the previous layer is the input layer itself, we get
dW1 = np.dot(X.T, dZ1) / m
Since
\frac{\partial C}{\partial b^{(l)}} = \frac{\partial C}{\partial z^{(l)}} = dZ1
we get
db1 = np.sum(dZ1, axis=0, keepdims=True) / m
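To gain confidence in these formulas, a standard sanity test is a numerical gradient check: nudge one parameter, recompute the loss, and compare the finite-difference slope with the analytical gradient. Below is a minimal sketch under this post's setup (it assumes W2, the functions above, and the training arrays are already defined; the entry indices and the 100-sample slice are arbitrary choices to keep it fast). Because dZ2 = A2 - Y omits the relu_derivative(Z2) factor, as noted above, the two values agree only for contributions where Z2 > 0, so small discrepancies here actually make that simplification visible.
# Numerical gradient check for a few entries of W2 (a sketch, not part of the original derivation)
eps = 1e-5
X_small, Y_small = train_images[:100], train_labels[:100]

Z1, A1, Z2, A2 = forward_propagation(X_small)
dW1, db1, dW2, db2 = backward_propagation(X_small, Y_small, Z1, A1, Z2, A2)

for (i, j) in [(0, 0), (5, 3), (63, 9)]:  # arbitrary entries of W2 to spot-check
    original = W2[i, j]
    W2[i, j] = original + eps
    loss_plus = compute_loss_mse(forward_propagation(X_small)[3], Y_small)
    W2[i, j] = original - eps
    loss_minus = compute_loss_mse(forward_propagation(X_small)[3], Y_small)
    W2[i, j] = original  # restore the weight
    numeric = (loss_plus - loss_minus) / (2 * eps)
    print(f'W2[{i},{j}]: analytic={dW2[i, j]:.8f}, numeric={numeric:.8f}')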
Update the parameters
def update_parameters(dW1, db1, dW2, db2, learning_rate=0.01):
    global W1, b1, W2, b2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
Train the model
num_epochs = 1000
learning_rate = 0.01

for epoch in range(num_epochs):
    Z1, A1, Z2, A2 = forward_propagation(train_images)
    cost = compute_loss_mse(A2, train_labels)
    dW1, db1, dW2, db2 = backward_propagation(train_images, train_labels, Z1, A1, Z2, A2)
    update_parameters(dW1, db1, dW2, db2, learning_rate)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Cost: {cost}')
Evaluate the model
def predict(X):
    _, _, _, A2 = forward_propagation(X)
    predictions = np.argmax(A2, axis=1)
    return predictions

train_predictions = predict(train_images)
test_predictions = predict(test_images)
train_accuracy = np.mean(np.argmax(train_labels, axis=1) == train_predictions)
test_accuracy = np.mean(np.argmax(test_labels, axis=1) == test_predictions)
print(f'Train Accuracy: {train_accuracy}')
print(f'Test Accuracy: {test_accuracy}')
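Beyond the aggregate accuracy, it can help to eyeball a few individual predictions next to their labels (a small sketch using the arrays defined above; the sample indices are arbitrary):
# Compare predicted and true digits for a few test samples
for idx in [0, 1, 2, 3, 4]:
    true_digit = np.argmax(test_labels[idx])  # undo the one-hot encoding
    pred_digit = test_predictions[idx]
    print(f'sample {idx}: predicted {pred_digit}, actual {true_digit}')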
Full code
# Download
import urllib.request
import os

data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-labels.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-labels.gz'))

# Extract the dataset
import gzip
import shutil

def extract_gz(file_path):
    with gzip.open(file_path, 'rb') as f_in:
        with open(file_path.replace('.gz', ''), 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    print(f'Extracted {file_path} to {file_path.replace(".gz", "")}')

# Data directory
data_dir = 'C:\\Users\\jike4\\data'  # replace with your own path

# Walk the directory and extract every .gz file
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if file.endswith('.gz'):
            extract_gz(os.path.join(root, file))
print("All files have been extracted successfully.")

# Load the dataset
import numpy as np
import struct

def load_mnist_images(file_path):
    with open(file_path, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))  # big-endian IDX header
        images = np.fromfile(f, dtype=np.uint8).reshape(num, 784)
    return images

def load_mnist_labels(file_path):
    with open(file_path, 'rb') as f:
        magic, num = struct.unpack('>II', f.read(8))  # big-endian IDX header
        labels = np.fromfile(f, dtype=np.uint8)
    return labels

# Read the data (file names match what extract_gz produced above)
train_images = load_mnist_images(os.path.join(data_dir, 'train-images'))
train_labels = load_mnist_labels(os.path.join(data_dir, 'train-labels'))
test_images = load_mnist_images(os.path.join(data_dir, 'test-images'))
test_labels = load_mnist_labels(os.path.join(data_dir, 'test-labels'))

# Preprocessing
train_images = train_images / 255.0
test_images = test_images / 255.0
train_labels = np.eye(10)[train_labels]  # One-hot encoding
test_labels = np.eye(10)[test_labels]

# Network hyperparameters
input_size = 784   # input-layer neurons (28x28 pixels)
hidden_size = 64   # hidden-layer neurons
output_size = 10   # output-layer neurons (digits 0-9)

# Initialize weights and biases
W1 = np.random.randn(input_size, hidden_size) * 0.01
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

# ReLU activation
def relu(x):
    return np.maximum(0, x)

# Forward propagation
def forward_propagation(X):
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = relu(Z2)
    return Z1, A1, Z2, A2

# Derivative of ReLU
def relu_derivative(x):
    return np.where(x > 0, 1, 0)

# Mean squared error loss
def compute_loss_mse(A2, Y):
    m = Y.shape[0]
    cost = np.sum((A2 - Y) ** 2) / (2 * m)
    return cost

# Backpropagation
def backward_propagation(X, Y, Z1, A1, Z2, A2):
    m = X.shape[0]
    dZ2 = A2 - Y  # output-layer error term
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * relu_derivative(Z1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    return dW1, db1, dW2, db2

# Update the parameters
def update_parameters(dW1, db1, dW2, db2, learning_rate=0.01):
    global W1, b1, W2, b2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

# Train the model
num_epochs = 1000
learning_rate = 0.01
for epoch in range(num_epochs):
    Z1, A1, Z2, A2 = forward_propagation(train_images)
    cost = compute_loss_mse(A2, train_labels)
    dW1, db1, dW2, db2 = backward_propagation(train_images, train_labels, Z1, A1, Z2, A2)
    update_parameters(dW1, db1, dW2, db2, learning_rate)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Cost: {cost}')

# Evaluate the model
def predict(X):
    _, _, _, A2 = forward_propagation(X)
    predictions = np.argmax(A2, axis=1)
    return predictions

train_predictions = predict(train_images)
test_predictions = predict(test_images)
train_accuracy = np.mean(np.argmax(train_labels, axis=1) == train_predictions)
test_accuracy = np.mean(np.argmax(test_labels, axis=1) == test_predictions)
print(f'Train Accuracy: {train_accuracy}')
print(f'Test Accuracy: {test_accuracy}')