Learning with GPT: A Complete Code Implementation of a Neural Network and Backpropagation

I find that Teacher GPT's writing is better than that of most bloggers (myself included); it is easier to understand, and whenever something is unclear you can simply ask GPT directly. Tireless, dedicated, on call around the clock, and endlessly considerate. With such a capable teacher, it would be a waste not to learn.

So I am setting myself a challenge: study with Teacher GPT for 365 days. Every day I will organize my notes and the thread of what I learned (most of the text is generated directly by GPT, which is surely better than anything I could write myself). Thank you, Teacher GPT!

Full series: Learning with GPT - AI series

Having covered so many concepts and formulas, let's now learn how to implement a neural network and the full backpropagation process by hand.

Downloading the dataset

import urllib.request
import os

data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-labels.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-labels.gz'))

Output

('C:\\Users\\jike4\\data\\test-labels.gz',
 <http.client.HTTPMessage at 0x228e7de9990>)


Extracting the dataset

import gzip
import shutil
import os

def extract_gz(file_path):
    with gzip.open(file_path, 'rb') as f_in:
        with open(file_path.replace('.gz', ''), 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    print(f'Extracted {file_path} to {file_path.replace(".gz", "")}')

# Data directory: the same folder the downloads were saved into
data_dir = os.path.join(os.getcwd(), 'data')

# Walk the directory and extract every .gz file
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if file.endswith('.gz'):
            extract_gz(os.path.join(root, file))

print("All files have been extracted successfully.")

Loading the dataset

import numpy as np
import struct
import os

def load_mnist_images(file_path):
    with open(file_path, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.fromfile(f, dtype=np.uint8).reshape(num, 784)
    return images

def load_mnist_labels(file_path):
    with open(file_path, 'rb') as f:
        magic, num = struct.unpack('>II', f.read(8))
        labels = np.fromfile(f, dtype=np.uint8)
    return labels

# Load the data
data_dir = os.path.join(os.getcwd(), 'data')  # the same folder as above

train_images = load_mnist_images(os.path.join(data_dir, 'train-images'))
train_labels = load_mnist_labels(os.path.join(data_dir, 'train-labels'))
test_images = load_mnist_images(os.path.join(data_dir, 'test-images'))
test_labels = load_mnist_labels(os.path.join(data_dir, 'test-labels'))

# Preprocessing
train_images = train_images / 255.0
test_images = test_images / 255.0

train_labels = np.eye(10)[train_labels]  # One-hot encoding
test_labels = np.eye(10)[test_labels]

print(f'Train images shape: {train_images.shape}')
print(f'Train labels shape: {train_labels.shape}')
print(f'Test images shape: {test_images.shape}')
print(f'Test labels shape: {test_labels.shape}')

Normalizing the data

Why divide by 255? The images in the MNIST dataset are grayscale; each pixel value ranges from 0 to 255, where 0 is black, 255 is white, and values in between are intermediate gray levels.

  • Range normalization: maps pixel values from the 0-255 range into the 0-1 range (a quick sanity check follows this list).
    • After normalization the data lies in [0, 1], which puts different features on the same scale and avoids training difficulties caused by features whose value ranges are much larger or smaller than others.
  • Benefits of normalization:
    • Stabilizes training: normalized data makes training more stable and speeds up gradient-descent convergence, because scaled inputs lead to smoother weight updates, helping to avoid exploding or vanishing gradients.

    • Improves model performance: normalized inputs help the model capture the patterns in the data; they also tend to match the scale of the initial weights better, letting the model reach a good solution faster.

    • Reduces computational cost: working with small numbers (between 0 and 1) is generally more efficient than working with large ones (0 to 255), especially in floating-point arithmetic.
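A quick sanity check (a minimal sketch, using a stand-in array rather than the real data) confirms what the scaling does to the value range:

import numpy as np

raw = np.random.randint(0, 256, size=(5, 784), dtype=np.uint8)  # stand-in for raw MNIST rows
scaled = raw / 255.0  # same operation as applied to train_images above

print(raw.min(), raw.max())        # e.g. 0 255 (uint8 range)
print(scaled.min(), scaled.max())  # e.g. 0.0 1.0
print(scaled.dtype)                # float64: the division promotes uint8 to float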

One-hot encoding

One-hot encoding converts class labels into binary vectors. For a classification problem with $n$ classes, each label is turned into an $n$-dimensional binary vector in which only the index corresponding to the class is 1 and every other position is 0.

Example:
Suppose there are 3 classes (0, 1, 2); their One-hot encodings are:

  • Class 0: [1, 0, 0]
  • Class 1: [0, 1, 0]
  • Class 2: [0, 0, 1]

Why use One-hot encoding

  • Compatibility: a neural network's output layer typically uses a softmax activation to produce a probability distribution. One-hot encoded labels can be compared directly against this output when computing the loss.
  • Avoids spurious ordering: raw labels are integers such as 0, 1, 2. Left as integers, they could be misread as ordered values, which would distort training. One-hot encoding represents each class as a vector and sidesteps the problem.
  • Improves model performance: with One-hot encoding the model handles the classification task cleanly and can identify each class accurately.

The encoding step from the loading code above:

train_labels = np.eye(10)[train_labels]  # One-hot encode the training labels
test_labels = np.eye(10)[test_labels]    # One-hot encode the test labels

np.eye(10) creates a 10x10 identity matrix, with 1s on the diagonal and 0s everywhere else:

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
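
Indexing this identity matrix with an integer label selects the corresponding row, and np.argmax inverts the encoding. A minimal sketch:

import numpy as np

labels = np.array([3, 0, 7])
one_hot = np.eye(10)[labels]       # shape (3, 10): row i has a 1 at column labels[i]
print(one_hot[0])                  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(np.argmax(one_hot, axis=1))  # [3 0 7]: recovers the original labels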

Initializing the weights and biases

# Initialize network parameters
input_size = 784    # input layer size (28x28 pixels)
hidden_size = 64    # hidden layer size
output_size = 10    # output layer size (digits 0-9)

# Weight and bias initialization
W1 = np.random.randn(input_size, hidden_size) * 0.01
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

print(f'W1: {W1.shape}')
print(f'b1: {b1.shape}')
print(f'W2: {W2.shape}')
print(f'b2: {b2.shape}')

Output

W1: (784, 64)
b1: (1, 64)
W2: (64, 10)
b2: (1, 10)
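
Multiplying by 0.01 keeps the initial weights small so that the early activations stay in a moderate range. For ReLU networks, a commonly used alternative (not used in this article's code) is He initialization, which scales the random values by sqrt(2 / fan_in); a hedged sketch:

import numpy as np

# He initialization: variance 2 / fan_in compensates for ReLU zeroing
# out roughly half of its inputs
W1_he = np.random.randn(784, 64) * np.sqrt(2.0 / 784)
W2_he = np.random.randn(64, 10) * np.sqrt(2.0 / 64)
print(round(W1_he.std(), 4))  # roughly sqrt(2/784) ≈ 0.0505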

ReLU activation function

# ReLU activation function
def relu(x):
    return np.maximum(0, x)
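
A quick check of the elementwise behavior (assuming the relu function just defined):

import numpy as np

x = np.array([-2.0, 0.0, 3.5])
print(relu(x))  # [0.  0.  3.5]: negatives are clamped to 0, non-negatives pass through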

Implementing forward propagation

# Forward propagation
def forward_propagation(X):
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = relu(Z2)  # ReLU on the output layer; see the note in the backpropagation section
    return Z1, A1, Z2, A2

Z1, A1, Z2, A2 = forward_propagation(train_images)
print(f'Z1: {Z1.shape}')
print(f'A1: {A1.shape}')
print(f'Z2: {Z2.shape}')
print(f'A2: {A2.shape}')

Verifying the results of the forward pass

Z1: (60000, 64)
A1: (60000, 64)
Z2: (60000, 10)
A2: (60000, 10)

Derivative of the ReLU activation

# Derivative of the ReLU activation
def relu_derivative(x):
    return np.where(x > 0, 1, 0)
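
Note that ReLU is not differentiable at exactly x = 0; the implementation above follows the common convention of assigning the derivative 0 there, which has no practical effect on training.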

Implementing the loss function

# Compute the loss as the mean squared error
def compute_loss_mse(A2, Y):
    m = Y.shape[0]
    cost = np.sum((A2 - Y) ** 2) / (2 * m)
    return cost
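
In formula form, for m samples this is

$$C = \frac{1}{2m} \sum_{k=1}^{m} \sum_{j} \left(a_{kj}^{(L)} - y_{kj}\right)^2$$

The conventional factor 1/2 cancels the 2 produced by differentiating the square, which keeps the gradients below clean.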

Backpropagation

# Backpropagation
def backward_propagation(X, Y, Z1, A1, Z2, A2):
    m = X.shape[0]
    
    dZ2 = A2 - Y  # output-layer error
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    
    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * relu_derivative(Z1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    
    return dW1, db1, dW2, db2

In many standard implementations of the squared-error gradient, the factor of 2 from differentiating the square is not kept explicitly; here it cancels exactly against the 1/2 in the loss, so the output-layer error term is simply

dZ2 = A2 - Y

(Strictly, with the ReLU output used in forward_propagation, this should be dZ2 = (A2 - Y) * relu_derivative(Z2); dropping the σ' factor is exact wherever Z2 > 0 and amounts to treating the output layer as linear.)

For an output-layer weight $w_{ij}^{(L)}$, the gradient is:

$$\frac{\partial C}{\partial w_{ij}^{(L)}} = (a_j^{(L)} - y_j) \cdot \sigma'(z_j^{(L)}) \cdot a_i^{(L-1)} = dZ2 \cdot A1$$

A1 has shape (60000, 64) and dZ2 has shape (60000, 10), so in code this becomes

dW2 = np.dot(A1.T, dZ2) / m

For an output-layer bias $b_j^{(L)}$, the gradient is:

$$\frac{\partial C}{\partial b_j^{(L)}} = (a_j^{(L)} - y_j) \cdot \sigma'(z_j^{(L)}) = dZ2$$

so, summing over the batch,

db2 = np.sum(dZ2, axis=0, keepdims=True) / m

Because

$$\frac{\partial C}{\partial a^{(l)}} = \frac{\partial C}{\partial z^{(l+1)}} \cdot w^{(l+1)}$$

we have

dA1 = np.dot(dZ2, W2.T)

Because

$$\frac{\partial C}{\partial z^{(l)}} = \frac{\partial C}{\partial a^{(l)}} \cdot \sigma'(z^{(l)})$$

we have

dZ1 = dA1 * relu_derivative(Z1)

Because

$$\frac{\partial C}{\partial w^{(l)}} = \frac{\partial C}{\partial z^{(l)}} \cdot a^{(l-1)} = dZ1 \cdot x$$

where $a^{(l-1)} = x$, since there is only one hidden layer and the layer before it is the input layer, we have

dW1 = np.dot(X.T, dZ1) / m

Because

$$\frac{\partial C}{\partial b^{(l)}} = \frac{\partial C}{\partial z^{(l)}} = dZ1$$

we have

db1 = np.sum(dZ1, axis=0, keepdims=True) / m
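
A standard way to build confidence in a hand-written backward pass is a numerical gradient check: perturb one parameter, recompute the loss, and compare the finite-difference slope with the analytic gradient. A minimal sketch, assuming the functions and globals defined above; it checks one entry of W2 and computes the exact analytic gradient, keeping the output ReLU's derivative that the simplified dZ2 = A2 - Y drops:

X, Y = train_images[:256], train_labels[:256]  # small slice keeps the check fast
eps = 1e-5
i, j = 0, 0  # check a single entry of W2

# exact analytic gradient for this MSE + ReLU-output network
Z1, A1, Z2, A2 = forward_propagation(X)
dZ2_exact = (A2 - Y) * relu_derivative(Z2)
dW2_exact = np.dot(A1.T, dZ2_exact) / X.shape[0]

# numerical gradient via central differences
W2[i, j] += eps
loss_plus = compute_loss_mse(forward_propagation(X)[3], Y)
W2[i, j] -= 2 * eps
loss_minus = compute_loss_mse(forward_propagation(X)[3], Y)
W2[i, j] += eps  # restore the original value

numeric = (loss_plus - loss_minus) / (2 * eps)
print(numeric, dW2_exact[i, j])  # the two should agree to several decimal places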

Updating the parameters

def update_parameters(dW1, db1, dW2, db2, learning_rate=0.01):
    global W1, b1, W2, b2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

Training the model

num_epochs = 1000
learning_rate = 0.01

for epoch in range(num_epochs):
    Z1, A1, Z2, A2 = forward_propagation(train_images)
    cost = compute_loss_mse(A2, train_labels)
    dW1, db1, dW2, db2 = backward_propagation(train_images, train_labels, Z1, A1, Z2, A2)
    update_parameters(dW1, db1, dW2, db2, learning_rate)
    
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Cost: {cost}')
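
This loop performs full-batch gradient descent: each epoch computes gradients over all 60,000 training images at once. A common refinement (not part of the original code) is mini-batch stochastic gradient descent, which makes many small updates per epoch and usually converges much faster in wall-clock time. A hedged sketch of how the loop could be restructured:

batch_size = 128
num_train = train_images.shape[0]

for epoch in range(10):  # far fewer epochs are needed with many updates per epoch
    perm = np.random.permutation(num_train)  # reshuffle the data each epoch
    for start in range(0, num_train, batch_size):
        idx = perm[start:start + batch_size]
        X_batch, Y_batch = train_images[idx], train_labels[idx]
        Z1, A1, Z2, A2 = forward_propagation(X_batch)
        grads = backward_propagation(X_batch, Y_batch, Z1, A1, Z2, A2)
        update_parameters(*grads, learning_rate)
    cost = compute_loss_mse(forward_propagation(train_images)[3], train_labels)
    print(f'Epoch {epoch}, Cost: {cost}')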

Evaluating the model

def predict(X):
    _, _, _, A2 = forward_propagation(X)
    predictions = np.argmax(A2, axis=1)
    return predictions

train_predictions = predict(train_images)
test_predictions = predict(test_images)

train_accuracy = np.mean(np.argmax(train_labels, axis=1) == train_predictions)
test_accuracy = np.mean(np.argmax(test_labels, axis=1) == test_predictions)

print(f'Train Accuracy: {train_accuracy}')
print(f'Test Accuracy: {test_accuracy}')
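
To inspect an individual prediction (a minimal sketch; predict expects a 2-D batch, so a single image is passed with shape (1, 784)):

sample = test_images[0:1]  # slicing with 0:1 keeps the batch dimension
predicted = predict(sample)[0]
actual = np.argmax(test_labels[0])
print(f'Predicted: {predicted}, actual: {actual}')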

Complete code

# Download the dataset
import urllib.request
import os

data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'train-labels.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-images.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',
                           filename=os.path.join(data_folder, 'test-labels.gz'))

# Extract the dataset
import gzip
import shutil
import os

def extract_gz(file_path):
    with gzip.open(file_path, 'rb') as f_in:
        with open(file_path.replace('.gz', ''), 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    print(f'Extracted {file_path} to {file_path.replace(".gz", "")}')

# Data directory: the same folder the downloads were saved into
data_dir = os.path.join(os.getcwd(), 'data')

# Walk the directory and extract every .gz file
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if file.endswith('.gz'):
            extract_gz(os.path.join(root, file))

print("All files have been extracted successfully.")

# Load the dataset
import numpy as np
import struct
import os

def load_mnist_images(file_path):
    with open(file_path, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.fromfile(f, dtype=np.uint8).reshape(num, 784)
    return images

def load_mnist_labels(file_path):
    with open(file_path, 'rb') as f:
        magic, num = struct.unpack('>II', f.read(8))
        labels = np.fromfile(f, dtype=np.uint8)
    return labels

# Load the data
data_dir = os.path.join(os.getcwd(), 'data')  # the same folder as above

train_images = load_mnist_images(os.path.join(data_dir, 'train-images'))
train_labels = load_mnist_labels(os.path.join(data_dir, 'train-labels'))
test_images = load_mnist_images(os.path.join(data_dir, 'test-images'))
test_labels = load_mnist_labels(os.path.join(data_dir, 'test-labels'))

# Preprocessing
train_images = train_images / 255.0
test_images = test_images / 255.0

train_labels = np.eye(10)[train_labels]  # One-hot encoding
test_labels = np.eye(10)[test_labels]

# Initialize weights and biases
# Network parameters
input_size = 784    # input layer size (28x28 pixels)
hidden_size = 64    # hidden layer size
output_size = 10    # output layer size (digits 0-9)

# Weight and bias initialization
W1 = np.random.randn(input_size, hidden_size) * 0.01
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

# ReLU activation function
def relu(x):
    return np.maximum(0, x)

# Forward propagation
def forward_propagation(X):
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = relu(Z2)  # ReLU on the output layer
    return Z1, A1, Z2, A2

# Derivative of the ReLU activation
def relu_derivative(x):
    return np.where(x > 0, 1, 0)

# Compute the loss as the mean squared error
def compute_loss_mse(A2, Y):
    m = Y.shape[0]
    cost = np.sum((A2 - Y) ** 2) / (2 * m)
    return cost

# Backpropagation
def backward_propagation(X, Y, Z1, A1, Z2, A2):
    m = X.shape[0]
    
    dZ2 = A2 - Y  # output-layer error
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m
    
    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * relu_derivative(Z1)
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m
    
    return dW1, db1, dW2, db2

# Update the parameters
def update_parameters(dW1, db1, dW2, db2, learning_rate=0.01):
    global W1, b1, W2, b2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

# Train the model
num_epochs = 1000
learning_rate = 0.01

for epoch in range(num_epochs):
    Z1, A1, Z2, A2 = forward_propagation(train_images)
    cost = compute_loss_mse(A2, train_labels)
    dW1, db1, dW2, db2 = backward_propagation(train_images, train_labels, Z1, A1, Z2, A2)
    update_parameters(dW1, db1, dW2, db2, learning_rate)
    
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Cost: {cost}')

# Evaluate the model
def predict(X):
    _, _, _, A2 = forward_propagation(X)
    predictions = np.argmax(A2, axis=1)
    return predictions

train_predictions = predict(train_images)
test_predictions = predict(test_images)

train_accuracy = np.mean(np.argmax(train_labels, axis=1) == train_predictions)
test_accuracy = np.mean(np.argmax(test_labels, axis=1) == test_predictions)

print(f'Train Accuracy: {train_accuracy}')
print(f'Test Accuracy: {test_accuracy}')