PyTorch Learning Diary 6: Optimizing Model Parameters

This post walks through training and optimizing model parameters in deep learning, covering key steps such as data preparation, model construction, hyperparameter selection, and stochastic gradient descent.


Main content:

Once the model and data are ready, the model's parameters are optimized through training. Training is an iterative process: in each iteration (called an epoch), the model makes predictions, the prediction error (the loss) is computed, backpropagation obtains the derivatives of the error with respect to the parameters, and gradient descent optimizes those parameters. This post explains how to optimize model parameters.

1. Preparing the Data and Model

This part was covered in detail in the earlier diary entries, so I won't dwell on it here; here is the code.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Load the data (see Learning Diary 2 for details)
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

# Using DataLoader (see Learning Diary 2 for details)
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)


# Build the neural network (see Learning Diary 4 for details)
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

2. Hyperparameters

Hyperparameters are adjustable parameters that let you control the optimization process. Different hyperparameter values can affect model training and convergence rates. The main hyperparameters are:

number of epochs: the number of times to iterate over the dataset.

batch size: the number of data samples propagated through the network before the parameters are updated.

learning rate: how much to update the model parameters at each batch. Smaller values yield a slow learning speed, while larger values may result in unpredictable behavior during training.

# Hyperparameters
learning_rate = 1e-3    # learning rate
batch_size = 64         # batch size
epochs = 5              # number of epochs

3. The Optimization Loop

Each epoch consists of two parts: train_loop and test_loop.

train_loop: iterates over the training dataset, trying to converge to the optimal parameters.

test_loop: iterates over the test dataset to check whether model performance is improving.

To get a clearer picture, let's look at the code first; focus on the overall structure rather than the details inside train_loop and test_loop. At this point we have prepared the data and the model, set the hyperparameters, and defined train_loop and test_loop; what remains is to initialize the loss function and the optimizer and then call train_loop and test_loop. Let's first look at these two concepts: the loss function and the optimizer.

# train_loop: iterate over the training dataset, trying to converge to the optimal parameters
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)   # total number of training samples
    for batch, (X, y) in enumerate(dataloader):
        # batch runs 938 times per epoch; each step feeds an X tensor of shape [64, 1, 28, 28]; y holds the true labels
        # Compute the predictions
        pred = model(X)
        # Compute the loss; the result is a tensor such as tensor(1.1020)
        loss = loss_fn(pred, y)

        # Zero the gradients
        optimizer.zero_grad()
        # Backpropagate the loss to produce gradients
        loss.backward()
        # Adjust the parameters (one gradient-descent step)
        optimizer.step()

        # Every 100 batches, print the loss and the progress through the dataset
        if batch % 100 == 0:
            # Use item() to get the scalar value
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


# test_loop: iterate over the test dataset to check whether model performance is improving
def test_loop(dataloader, model, loss_fn):

    size = len(dataloader.dataset)  # 10000 test samples
    num_batches = len(dataloader)   # 157 batches
    test_loss, correct = 0, 0

    # No gradient computation is needed during testing
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            # Accumulate the loss
            test_loss += loss_fn(pred, y).item()
            # pred.argmax(1) returns the index of the maximum value; sum() counts the correct predictions in the batch
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    # total loss / number of batches = average loss
    test_loss /= num_batches
    # total correct / total samples = accuracy
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")


loss_fn = nn.CrossEntropyLoss()   # Initialize the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  # Initialize the SGD (stochastic gradient descent) optimizer

# Increase the number of epochs to track model performance
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)   # run train_loop
    test_loop(test_dataloader, model, loss_fn)                # run test_loop
print("Done!")

The loss function loss_fn

When presented with training data, our untrained network is unlikely to give the correct answer. The loss function measures how dissimilar the obtained result is from the target value, and it is this loss that we want to minimize during training. To compute the loss, we make a prediction from the inputs of a given data sample and compare it against the true label. Common loss functions include:

nn.MSELoss (mean squared error), for regression tasks

nn.NLLLoss (negative log likelihood), for classification tasks

nn.CrossEntropyLoss (cross-entropy loss), for classification tasks: combines nn.LogSoftmax and nn.NLLLoss

For a more detailed look at nn.CrossEntropyLoss, see this article:

A detailed explanation of nn.CrossEntropyLoss

We pass the model's output logits to nn.CrossEntropyLoss, which normalizes the logits and computes the prediction error. The loss function is initialized as follows.

loss_fn = nn.CrossEntropyLoss()   # Initialize the loss function
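
As a quick check (a minimal sketch of my own, not part of the original tutorial code), you can confirm that nn.CrossEntropyLoss behaves like nn.LogSoftmax followed by nn.NLLLoss by computing both on the same random logits:

# Minimal sketch: CrossEntropyLoss vs. LogSoftmax + NLLLoss on the same inputs
logits = torch.randn(4, 10)           # e.g. a batch of 4 samples, 10 classes
labels = torch.tensor([1, 0, 3, 9])   # ground-truth class indices

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
print(ce.item(), nll.item())          # the two values agree up to floating-point error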

The optimizer

Optimization is the process of adjusting model parameters at each training step to reduce the model error. The optimization algorithm defines how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the optimizer object. Here we use the SGD optimizer; in addition, PyTorch provides many other optimizers, such as Adam and RMSProp, that work better for certain types of models and data.

We initialize the optimizer by registering the model parameters to be trained and passing in the learning rate hyperparameter.

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  # Initialize the SGD (stochastic gradient descent) optimizer
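
Since the optimizers in torch.optim share this interface, switching to one of the alternatives mentioned above only changes this single line; the training loop itself stays the same. A hedged sketch:

# Illustrative alternatives (same model.parameters() and learning_rate as above):
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)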

How train_loop is implemented

Next, let's walk through how train_loop is implemented.

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)   # total number of training samples: 60000
    for batch, (X, y) in enumerate(dataloader):
        # batch runs 938 times per epoch; each step feeds an X tensor of shape [64, 1, 28, 28]; y holds the true labels
        # Compute the predictions
        pred = model(X)   # pred has shape [64, 10], y has shape [64]
        # Compute the loss; the result is a tensor such as tensor(1.1020)
        loss = loss_fn(pred, y)

        # Zero the gradients
        optimizer.zero_grad()
        # Backpropagate the loss to produce gradients
        loss.backward()
        # Adjust the parameters (one gradient-descent step)
        optimizer.step()

        # Every 100 batches, print the loss and the progress through the dataset
        if batch % 100 == 0:
            print(pred.size())   # torch.Size([64, 10])
            print(y.size())      # torch.Size([64])
            # Use item() to get the scalar value
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

print(len(train_dataloader.dataset))  # total training samples: 60000
print(len(train_dataloader))          # 938 batches (938*64 = 60032 >= 60000)
print(len(test_dataloader.dataset))   # total test samples: 10000
print(len(test_dataloader))           # 157 batches (157*64 = 10048 >= 10000)
X, y = next(iter(train_dataloader))
print(len(X))    # 64   (batch_size = 64)
print(X.size())  # torch.Size([64, 1, 28, 28])
print(y.size())  # torch.Size([64])

First, the data: the training set has 60000 samples in total and batch_size is 64, so there are 938 batches. X is a tensor of shape [64, 1, 28, 28], which is easy to understand: batch_size is 64 and a single image is [1, 28, 28]. y holds the true labels.
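
A quick sanity check on those batch counts (my own illustration, not from the original post): since DataLoader keeps the last incomplete batch by default, the number of batches is just the ceiling of the dataset size divided by the batch size.

import math

print(math.ceil(60000 / 64))   # 938 training batches (the last one holds only 32 samples)
print(math.ceil(10000 / 64))   # 157 test batches (the last one holds only 16 samples)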

Next, the process. Every training step goes through these stages: a forward pass computes the prediction pred; the loss function computes the error from pred and y; the gradients are zeroed; the error is backpropagated to obtain the parameter gradients; and the optimizer adjusts the parameters according to those gradients.

About optimizer.zero_grad()

What it does: it resets the gradients to zero, i.e., sets the stored derivatives of the loss with respect to the weights to 0.

Why does optimizer.zero_grad() need to be called for every batch?
Because of how backward() works in PyTorch: when gradients flow back to the network parameters, they are accumulated rather than replaced. We clearly do not want the gradients of two different batches mixed together, so zero_grad must be called once per batch.
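
A minimal sketch of this accumulation behavior (my own illustration, not from the original post): calling backward() twice without zeroing adds the gradients together, and zero_() clears them the way optimizer.zero_grad() does for every registered parameter.

w = torch.ones(1, requires_grad=True)
(w * 2).backward()
print(w.grad)      # tensor([2.])
(w * 2).backward()
print(w.grad)      # tensor([4.])  -- accumulated, not replaced
w.grad.zero_()     # this is what optimizer.zero_grad() does for each parameter
print(w.grad)      # tensor([0.])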

The rest should be clear from the comments.

How test_loop is implemented

# test_loop: iterate over the test dataset to check whether model performance is improving
def test_loop(dataloader, model, loss_fn):

    size = len(dataloader.dataset)  # 10000 test samples
    num_batches = len(dataloader)   # 157 batches
    test_loss, correct = 0, 0

    # No gradient computation is needed during testing
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            # Accumulate the loss
            test_loss += loss_fn(pred, y).item()
            # pred.argmax(1) returns the index of the maximum value; sum() counts the correct predictions in the batch
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    # total loss / number of batches = average loss
    test_loss /= num_batches
    # total correct / total samples = accuracy
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

This part is relatively simple; the main things to explain are how test_loss and correct are computed.

test_loss starts at 0 and is accumulated once per batch. item() extracts the value from a tensor containing a single element as a plain Python number; this is needed because loss_fn returns a tensor, e.g. tensor(1.0028).
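
For example (illustrative values only, assuming the torch and nn imports above): the loss is a zero-dimensional tensor, and item() turns it into a plain Python float that can be added to test_loss.

loss = nn.CrossEntropyLoss()(torch.randn(4, 10), torch.tensor([0, 1, 2, 3]))
print(loss)         # e.g. tensor(2.4312)
print(loss.item())  # e.g. 2.4312... as a plain Python float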

How correct is computed: pred has shape [64, 10]; pred.argmax(1) finds, within each row, the column index of the maximum value, so the result has shape [64]. These predicted classes are compared with the true labels using ==, which returns True where they match and False otherwise. The booleans are then converted to floats with type(torch.float) and summed over all elements with sum(). Finally, item() converts the result to a scalar value.
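
A small worked example of this chain (illustrative numbers of my own, with only 3 samples and 2 classes for clarity):

pred = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])    # shape [3, 2]
y = torch.tensor([1, 0, 0])                                  # true labels

print(pred.argmax(1))                                        # tensor([1, 0, 1]), shape [3]
print(pred.argmax(1) == y)                                   # tensor([ True,  True, False])
print((pred.argmax(1) == y).type(torch.float).sum().item())  # 2.0 correct predictions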

4. Complete Runnable Code

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Load the data (see Learning Diary 2 for details)
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

# Using DataLoader (see Learning Diary 2 for details)
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)


# Build the neural network (see Learning Diary 4 for details)
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()


# Hyperparameters
learning_rate = 1e-3    # learning rate
batch_size = 64         # batch size
epochs = 5              # number of epochs


# train_loop: iterate over the training dataset, trying to converge to the optimal parameters
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)   # total number of training samples
    for batch, (X, y) in enumerate(dataloader):
        # batch runs 938 times per epoch; each step feeds an X tensor of shape [64, 1, 28, 28]; y holds the true labels
        # Compute the predictions
        pred = model(X)   # pred has shape [64, 10], y has shape [64]
        # Compute the loss; the result is a tensor such as tensor(1.1020)
        loss = loss_fn(pred, y)

        # Zero the gradients
        optimizer.zero_grad()
        # Backpropagate the loss to produce gradients
        loss.backward()
        # Adjust the parameters (one gradient-descent step)
        optimizer.step()

        # Every 100 batches, print the loss and the progress through the dataset
        if batch % 100 == 0:
            # Use item() to get the scalar value
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


# test_loop: iterate over the test dataset to check whether model performance is improving
def test_loop(dataloader, model, loss_fn):

    size = len(dataloader.dataset)  # 10000 test samples
    num_batches = len(dataloader)   # 157 batches
    test_loss, correct = 0, 0

    # No gradient computation is needed during testing
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            # Accumulate the loss
            test_loss += loss_fn(pred, y).item()
            # pred.argmax(1) returns the index of the maximum value; sum() counts the correct predictions in the batch
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    # total loss / number of batches = average loss
    test_loss /= num_batches
    # total correct / total samples = accuracy
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")


loss_fn = nn.CrossEntropyLoss()   # Initialize the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  # Initialize the SGD (stochastic gradient descent) optimizer

# Increase the number of epochs to track model performance
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)   # run train_loop
    test_loop(test_dataloader, model, loss_fn)                # run test_loop
print("Done!")

'''
print(len(train_dataloader.dataset))  # total training samples: 60000
print(len(train_dataloader))          # 938 batches (938*64 = 60032 >= 60000)
print(len(test_dataloader.dataset))   # total test samples: 10000
print(len(test_dataloader))           # 157 batches (157*64 = 10048 >= 10000)
X, y = next(iter(train_dataloader))
print(len(X))    # 64   (batch_size = 64)
print(X.size())  # torch.Size([64, 1, 28, 28])
print(y.size())  # torch.Size([64])
'''
