Mini-batch Gradient Descent: Principles and Implementation - 优快云 Blog

Article link: https://blog.youkuaiyun.com/qq_57063846/article/details/145525819

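Mini-batch gradient descent is a common method for optimizing model parameters in machine learning. It sits between full-batch gradient descent (which uses all the data in every step) and stochastic gradient descent (which uses a single sample per step). Each iteration computes the gradient on a small batch of data, which makes a single step much cheaper than a full-batch step while giving a less noisy gradient estimate, and therefore steadier convergence, than a single-sample step. This article walks through the core steps alongside a code implementation.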

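I. Algorithm Principles

1. Comparison with Full-batch and Stochastic Gradient Descent

Full-batch gradient descent: every iteration uses the whole dataset. The gradient is exact, but each step is slow.
Stochastic gradient descent: every iteration uses one randomly chosen sample. Each step is fast, but the gradient is noisy and convergence is unstable.
Mini-batch gradient descent: a compromise. Every iteration computes the gradient on a small batch (for example 10, 32, or 64 samples), balancing efficiency and stability.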
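The stability claim above can be illustrated numerically. The following is a minimal sketch, not part of the original article: it estimates the gradient at a fixed point from random batches of different sizes and measures how much the estimates vary. The helper names `mean_gradient` and `gradient_spread`, the seed, and the choice of a mean (rather than summed) gradient are assumptions made so that batch sizes are comparable.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
m = 100
x = 2 * rng.random((m, 1))
y = 4 + 3 * x + rng.standard_normal((m, 1))
X = np.c_[np.ones((m, 1)), x]   # add bias column x0 = 1
theta = np.zeros((2, 1))        # fixed point at which gradients are compared

def mean_gradient(idx):
    """Squared-error gradient averaged over the rows in idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T.dot(Xb.dot(theta) - yb) / len(idx)

def gradient_spread(batch_size, n_trials=500):
    """Std. dev. of the gradient estimate across random batches, summed over components."""
    grads = [mean_gradient(rng.choice(m, size=batch_size, replace=False)).ravel()
             for _ in range(n_trials)]
    return np.std(grads, axis=0).sum()

for b in (1, 10, 100):  # stochastic, mini-batch, full-batch
    print(b, gradient_spread(b))
```

The spread should shrink as the batch grows and vanish (up to rounding) when the batch is the whole dataset, which is the trade-off the three variants make against per-step cost.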

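2. Core Formula

Parameter update rule:
$\theta = \theta - \eta \cdot \nabla_\theta J(\theta)$
where $\eta$ is the learning rate and $\nabla_\theta J(\theta)$ is the gradient of the loss function with respect to the parameters. In mini-batch gradient descent, this gradient is evaluated on the current batch only.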
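As a worked example of a single update, here is a small sketch on two noiseless samples of y = 4 + 3x. The numbers, the starting point `theta = 0`, and the learning rate `eta = 0.1` are chosen purely for illustration; the gradient uses the same unnormalized least-squares form as the training code later in this article.

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0]])       # bias column plus one feature (x = 1 and x = 2)
y = np.array([[7.0], [10.0]])    # noiseless targets from y = 4 + 3x
theta = np.zeros((2, 1))         # illustrative starting point
eta = 0.1                        # illustrative learning rate

grad = X.T.dot(X.dot(theta) - y)  # residuals are [-7, -10], so grad = [[-17], [-27]]
theta_new = theta - eta * grad    # [[1.7], [2.7]]
print(grad.ravel(), theta_new.ravel())
```

One step already moves both parameters from 0 toward the true values (4, 3), because the negative gradient points in the direction that reduces the squared error.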

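II. Code Implementation and Walkthrough

1. Data Generation

Generate linear regression data with added random noise:

import numpy as np

# Generate synthetic data: y = 4 + 3x + noise
x = 2 * np.random.rand(100, 1)            # 100 samples, uniform in [0, 2)
y = 4 + 3 * x + np.random.randn(100, 1)   # standard normal noise
x_b = np.c_[np.ones((100, 1)), x]         # add bias term x0 = 1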

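2. Hyperparameters

n_epochs: number of passes over the training set
batch_size: number of samples per batch
n_batchs: number of batches per epoch

n_epochs = 10000
m = 100                        # total number of samples
batch_size = 10                # samples per batch
n_batchs = m // batch_size     # batches per epoch = 10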
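The settings above divide evenly (100 / 10). When `m` is not a multiple of `batch_size`, rounding down drops the leftover samples in each epoch. One possible way to cover them is sketched below; `batch_slices` is a hypothetical helper introduced here, not part of the original code.

```python
def batch_slices(m, batch_size):
    """Yield (start, end) index pairs covering all m samples; the last batch may be shorter."""
    for start in range(0, m, batch_size):
        yield start, min(start + batch_size, m)

print(list(batch_slices(25, 10)))  # [(0, 10), (10, 20), (20, 25)]
```

With `m = 25` and `batch_size = 10`, integer division would give 2 batches and skip 5 samples per epoch, whereas the generator yields a third, shorter batch. If the gradient is summed rather than averaged over the batch, a shorter batch also produces a proportionally smaller step.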

3. Parameter Initialization

Randomly initialize the parameter vector $\theta$ (bias term and weight):

theta = np.random.randn(2, 1)  # random initial values: [bias, weight]

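4. Learning Rate Schedule

Decay the learning rate over time so that late updates are small and the parameters stop oscillating around the minimum:

t0, t1 = 1, 200
def learning_rate_adjust(t):
    return t0 / (t + t1)  # learning rate decays as the step count t grows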
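To get a feel for this schedule, the sketch below evaluates it at a few step counts. The sample steps are arbitrary, except that 99,999 is the last step when running 10,000 epochs of 10 batches each.

```python
t0, t1 = 1, 200

def learning_rate_adjust(t):
    return t0 / (t + t1)

for step in (0, 200, 1800, 99_999):
    print(step, learning_rate_adjust(step))
# step 0    -> 0.005
# step 200  -> 0.0025
# step 1800 -> 0.0005
# step 99999 is roughly 1e-5
```

The rate halves after the first `t1` steps and then falls off roughly as 1/t, so `t1` controls how long the early, larger steps last and `t0 / t1` sets the initial rate.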

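5. Training Loop

Core steps: shuffle the data, compute the gradient batch by batch, and update the parameters.

for t in range(n_epochs):
    # Shuffle so that each epoch visits every sample once, in a new random order
    shuffled_indices = np.random.permutation(m)
    x_shuffled = x_b[shuffled_indices]
    y_shuffled = y[shuffled_indices]

    # Iterate over the batches
    for i in range(n_batchs):
        start = i * batch_size
        end = start + batch_size
        x_batch = x_shuffled[start:end]  # features of the current batch
        y_batch = y_shuffled[start:end]  # targets of the current batch

        # Compute the gradient and update the parameters
        learning_rate = learning_rate_adjust(t * n_batchs + i)  # global step index
        gradients = x_batch.T.dot(x_batch.dot(theta) - y_batch)  # summed over the batch
        theta = theta - learning_rate * gradients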
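For experimentation, the loop above can be wrapped in a function and compared against the closed-form least-squares fit. This is a sketch under a few assumptions that are not in the original: the function name `minibatch_gd`, the seeded `np.random.default_rng` generators, and a default of 1,000 epochs (instead of 10,000) so that it runs quickly.

```python
import numpy as np

def minibatch_gd(x_b, y, n_epochs=1000, batch_size=10, t0=1, t1=200, seed=0):
    """Mini-batch gradient descent for linear regression (sketch mirroring the loop above)."""
    rng = np.random.default_rng(seed)
    m = x_b.shape[0]
    n_batchs = m // batch_size                 # leftover samples, if any, are skipped
    theta = rng.standard_normal((x_b.shape[1], 1))
    for epoch in range(n_epochs):
        idx = rng.permutation(m)               # new sample order every epoch
        x_shuffled, y_shuffled = x_b[idx], y[idx]
        for i in range(n_batchs):
            start = i * batch_size
            x_batch = x_shuffled[start:start + batch_size]
            y_batch = y_shuffled[start:start + batch_size]
            lr = t0 / (epoch * n_batchs + i + t1)   # same decay schedule as above
            theta = theta - lr * x_batch.T.dot(x_batch.dot(theta) - y_batch)
    return theta

# Synthetic data as in the article: y = 4 + 3x + noise
rng = np.random.default_rng(42)
x = 2 * rng.random((100, 1))
y = 4 + 3 * x + rng.standard_normal((100, 1))
x_b = np.c_[np.ones((100, 1)), x]

theta = minibatch_gd(x_b, y)
theta_ref, *_ = np.linalg.lstsq(x_b, y, rcond=None)  # closed-form least-squares fit
print(theta.ravel(), theta_ref.ravel())
```

Both estimates should land near (4, 3) and close to each other; they will not match the true coefficients exactly because of the noise in the data. The notes that follow apply to this wrapped version as well.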

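Code walkthrough:

Shuffling: np.random.permutation generates a random index order, so each epoch processes the samples in a different order.
Batch splitting: the start and end indices select each batch. Because m is divisible by batch_size here, the slices cover every sample exactly once per epoch; if it were not, the rounded-down batch count would skip the leftover samples.
Gradient computation:
The formula is $\nabla_\theta J(\theta) = X_{\text{batch}}^T \cdot (X_{\text{batch}} \theta - y_{\text{batch}})$, which corresponds to the code x_batch.T.dot(x_batch.dot(theta) - y_batch). This is the gradient of $J(\theta) = \tfrac{1}{2} \lVert X_{\text{batch}} \theta - y_{\text{batch}} \rVert^2$, the half sum of squared errors over the batch. It is not divided by the batch size, so that constant factor is effectively folded into the learning rate.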
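A gradient formula like this can be checked numerically with central finite differences. The sketch below does so for one random batch, assuming the loss is the half sum of squared errors over the batch; the data, the seed, and the helper `loss` are illustrative choices, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
X = np.c_[np.ones((10, 1)), 2 * rng.random((10, 1))]   # one batch of 10 samples
y = 4 + 3 * X[:, [1]] + rng.standard_normal((10, 1))
theta = rng.standard_normal((2, 1))

def loss(th):
    """Half sum of squared errors over the batch."""
    r = X.dot(th) - y
    return 0.5 * float(np.sum(r ** 2))

# Analytic gradient, same expression as in the training loop
analytic = X.T.dot(X.dot(theta) - y)

# Central finite differences, one parameter at a time
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j, 0] = eps
    numeric[j, 0] = (loss(theta + step) - loss(theta - step)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))
```

Agreement between the two confirms that the code expression matches the stated loss. The same check is a quick way to catch sign or transpose mistakes when changing the loss.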