20. Getting started with PyTorch: train and test a model with CUDA, or train with CUDA and test on the CPU

1. Training

To train with CUDA, the code has to change in three places so that the model, the loss function, and the data all end up on the GPU. There are two ways to write this change (covered right after the list):

1. Network structure

2. Loss function

3. Data, immediately before use

There are two ways to use CUDA:
1. xx.cuda()
2. xx.to(device=torch.device("cuda"))
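
As a rough illustration of the three places listed above, the sketch below runs a single training step on a tiny stand-in model. The `nn.Linear(4, 3)` model and the random batch are assumptions made for this example only; they are not the LeNet_5 model or the CIFAR10 data used in this chapter. The device falls back to the CPU when no GPU is present, so the sketch should run on either kind of machine.

```python
import torch
from torch import nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Network structure (a stand-in model, not LeNet_5)
model = nn.Linear(4, 3).to(device)

# 2. Loss function
loss_fn = nn.CrossEntropyLoss().to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# 3. Data, immediately before use (a random stand-in batch)
imgs = torch.randn(8, 4).to(device)
targets = torch.randint(0, 3, (8,)).to(device)

# One ordinary training step; everything involved lives on the same device.
outputs = model(imgs)
loss = loss_fn(outputs, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(outputs.device, loss.item())
```

If any one of the three is left on a different device from the others, PyTorch raises a device-mismatch error at the forward pass or at the loss computation.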

Way 1: xx.cuda()

1. Network structure
model.cuda()

2. Loss function
cross_entropy_loss.cuda()

3. Data, immediately before use
imgs, targets = data
imgs = imgs.cuda()        # Tensor.cuda() returns a copy on the GPU, so reassign it
targets = targets.cuda()

Note: in practice it is more convenient to define the device as an option with argparse.ArgumentParser() at the top of the training script. To keep the code easy to read, this chapter does not do that.
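
For readers curious what the argparse approach might look like, here is a minimal sketch. The `--device` flag name and the `parse_args` helper are hypothetical choices for this example; they do not come from the original training script.

```python
import argparse

import torch


def parse_args(argv=None):
    """Parse command-line options; argv=None makes argparse read sys.argv."""
    parser = argparse.ArgumentParser(description="Train with a selectable device")
    parser.add_argument(
        "--device",  # hypothetical flag name chosen for this sketch
        default="cuda" if torch.cuda.is_available() else "cpu",
        help='device string such as "cpu", "cuda" or "cuda:0"',
    )
    return parser.parse_args(argv)


# An explicit argument list is passed here so the sketch is reproducible;
# a real script would call parse_args() with no arguments.
args = parse_args(["--device", "cpu"])
device = torch.device(args.device)
print(device)
```

The rest of the script can then call `model.to(device)` and `imgs.to(device)` without any further `torch.cuda.is_available()` checks.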

Code:

from torch.utils.data import DataLoader
from LeNet_5 import *
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter


# 1.Create SummaryWriter
writer = SummaryWriter("log_loss")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="data", train=True, transform=torchvision.transforms.ToTensor(),
                                             download=True)

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = LeNet_5()
# a.add cuda
if torch.cuda.is_available():
    model = model.cuda()

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()
# b.add cuda
if torch.cuda.is_available():
    cross_entropy_loss = cross_entropy_loss.cuda()

# 7.Optimizer
learning_rate = 1e-2
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)

# 8. Set some parameters to control loop
# epoch
epoch = 80

total_train_step = 0

for i in range(epoch):
    print(" -----------------the {} number of training epoch --------------".format(i + 1))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        # c.add cuda
        if torch.cuda.is_available():
            imgs = imgs.cuda()
            targets = targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)

        optim.zero_grad()
        loss_train.backward()
        optim.step()
        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("the training step is {} and its loss of model is {}".format(total_train_step, loss_train.item()))
            writer.add_scalar("train_loss", loss_train.item(), total_train_step)
            # Checkpoint every 10000 steps (the model_save directory must already exist)
            if total_train_step % 10000 == 0:
                torch.save(model.state_dict(), "model_save/model_{}_GPU.pth".format(total_train_step))
                print("the model of {} training step was saved! ".format(total_train_step))
            # During the last epoch, also checkpoint at every 100th step
            if i == (epoch - 1):
                torch.save(model.state_dict(), "model_save/model_{}_GPU.pth".format(total_train_step))
                print("the model of {} training step was saved! ".format(total_train_step))
writer.close()

Way 2: xx.to(device)

1. Network structure
model.to(device=torch.device("cuda"))

2. Loss function
cross_entropy_loss.to(device=torch.device("cuda"))

3. Data, immediately before use
imgs, targets = data
imgs = imgs.to(device=torch.device("cuda"))        # Tensor.to() returns a copy, so reassign it
targets = targets.to(device=torch.device("cuda"))
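
One detail of the snippet above is easy to miss: `.to()` behaves differently for modules and for tensors. The sketch below, which uses a stand-in `nn.Linear` model and a random tensor as assumptions for illustration, shows that `nn.Module.to()` moves the parameters in place and returns the same module, while the result of `Tensor.to()` has to be kept.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 3)        # stand-in model, not LeNet_5
returned = model.to(device)    # parameters are moved in place
print(returned is model)       # the module itself is returned

imgs = torch.randn(2, 4)
imgs.to(device)                # result discarded: imgs still refers to the CPU tensor
imgs = imgs.to(device)         # keep the returned tensor instead
print(imgs.device.type == device.type)
```

This is why the training loops in this chapter write `imgs = imgs.to(device)` and `targets = targets.to(device)`, while `model.to(device)` would work even without the assignment.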

Code:

from torch.utils.data import DataLoader
from LeNet_5 import *
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter

# 1. torch choose cuda or cpu
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# 2.Create SummaryWriter
writer = SummaryWriter("log_loss")

# 3.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="data", train=True, transform=torchvision.transforms.ToTensor(),
                                             download=True)

# 4.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 5.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 6.Create model
model = LeNet_5()
# a.add cuda
model = model.to(device=device)

# 7.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()
# b.add cuda
cross_entropy_loss = cross_entropy_loss.to(device=device)

# 8.Optimizer
learning_rate = 1e-2
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)

# 9. Set some parameters to control loop
# epoch
epoch = 80

total_train_step = 0

for i in range(epoch):
    print(" -----------------the {} number of training epoch --------------".format(i + 1))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        imgs = imgs.to(device)
        targets = targets.to(device)
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)

        optim.zero_grad()
        loss_train.backward()
        optim.step()
        total_train_step = total_train_step + 1
        if total_train_step % 100 == 0:
            print("the training step is {} and its loss of model is {}".format(total_train_step, loss_train.item()))
            writer.add_scalar("train_loss", loss_train.item(), total_train_step)
            # Checkpoint every 10000 steps (the model_save directory must already exist)
            if total_train_step % 10000 == 0:
                torch.save(model.state_dict(), "model_save/model_{}_GPU.pth".format(total_train_step))
                print("the model of {} training step was saved! ".format(total_train_step))
            # During the last epoch, also checkpoint at every 100th step
            if i == (epoch - 1):
                torch.save(model.state_dict(), "model_save/model_{}_GPU.pth".format(total_train_step))
                print("the model of {} training step was saved! ".format(total_train_step))
writer.close()

2. Testing

2.1. Train with CUDA, test on the CPU

Code:

import torch
from torch.utils.data import DataLoader
from LeNet_5 import *
import torchvision

# test

# 1.Create model
model = LeNet_5()

# 2.Ready Dataset
test_dataset = torchvision.datasets.CIFAR10(root="data", train=False, transform=torchvision.transforms.ToTensor(),
                                            download=True)
# 3.Length
test_dataset_size = len(test_dataset)
print("the test dataset size is {}".format(test_dataset_size))

# 4.DataLoader
test_dataloader = DataLoader(dataset=test_dataset, batch_size=64)

# 5. Load the weights saved on the GPU once, before the test loop;
#    map_location remaps the stored tensors to the CPU
model_load = torch.load("model_save/model_62500_GPU.pth", map_location=torch.device("cpu"))
model.load_state_dict(model_load)

# 6. Set some parameters for testing the network
total_accuracy = 0

# test
model.eval()
with torch.no_grad():
    for data in test_dataloader:
        imgs, targets = data
        outputs = model(imgs)
        # number of correct predictions in this batch
        accuracy = (outputs.argmax(1) == targets).sum().item()
        total_accuracy = total_accuracy + accuracy
# divide once, after every batch has been counted
accuracy = total_accuracy / test_dataset_size
print("the total accuracy is {}".format(accuracy))
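
The CPU test script above depends on the `map_location` argument of `torch.load`. As a small sketch of what that argument does, the example below saves the state dict of a stand-in `nn.Linear` model to a temporary file and loads it back onto the CPU. The model and the file name `model_demo.pth` are assumptions for illustration, not the checkpoint produced by the training script.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 3)  # stand-in model, not LeNet_5

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_demo.pth")  # hypothetical file name
    torch.save(model.state_dict(), path)

    # map_location remaps every stored tensor to the CPU while loading,
    # which is what allows a GPU-trained checkpoint to load without a GPU.
    state_dict = torch.load(path, map_location=torch.device("cpu"))

cpu_model = nn.Linear(4, 3)
cpu_model.load_state_dict(state_dict)
print({name: tensor.device.type for name, tensor in state_dict.items()})
```

Without `map_location`, `torch.load` tries to restore each tensor to the device it was saved from, which fails on a machine that has no CUDA device.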

2.2. Train with CUDA, test with CUDA

Code:

import torch
from torch.utils.data import DataLoader
from LeNet_5 import *
import torchvision

# test

# 1.Create model
model = LeNet_5()
if torch.cuda.is_available():
    model = model.cuda()

# 2.Ready Dataset
test_dataset = torchvision.datasets.CIFAR10(root="data", train=False, transform=torchvision.transforms.ToTensor(),
                                            download=True)
# 3.Length
test_dataset_size = len(test_dataset)
print("the test dataset size is {}".format(test_dataset_size))

# 4.DataLoader
test_dataloader = DataLoader(dataset=test_dataset, batch_size=64)

# 5. Load the trained weights once, before the test loop
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_load = torch.load("model_save/model_62500_GPU.pth", map_location=device)
model.load_state_dict(model_load)

# 6. Set some parameters for testing the network
total_accuracy = 0

# test
model.eval()
with torch.no_grad():
    for data in test_dataloader:
        imgs, targets = data
        # add cuda
        if torch.cuda.is_available():
            imgs = imgs.cuda()
            targets = targets.cuda()
        outputs = model(imgs)
        # number of correct predictions in this batch
        accuracy = (outputs.argmax(1) == targets).sum().item()
        total_accuracy = total_accuracy + accuracy
# divide once, after every batch has been counted
accuracy = total_accuracy / test_dataset_size
print("the total accuracy is {}".format(accuracy))


For the results, please refer to the previous chapters; they are not repeated here.
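
Both test scripts count correct predictions in the same way, so the loop can be wrapped in one device-agnostic helper. The sketch below is one possible arrangement; the `evaluate` function name and the tiny synthetic dataset are assumptions for illustration. The stand-in "images" are one-hot score vectors, so an identity model classifies all of them correctly.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, dataloader, device):
    """Return the fraction of samples whose argmax prediction matches the target."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for imgs, targets in dataloader:
            imgs = imgs.to(device)
            targets = targets.to(device)
            outputs = model(imgs)
            correct += (outputs.argmax(1) == targets).sum().item()
            total += targets.size(0)
    return correct / total


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in data: every input row is the one-hot vector of its own label.
targets = torch.tensor([0, 1, 2, 1, 0, 2])
inputs = torch.eye(3)[targets]
loader = DataLoader(TensorDataset(inputs, targets), batch_size=4)

accuracy = evaluate(nn.Identity().to(device), loader, device)
print("the total accuracy is {}".format(accuracy))
```

With the real model, the call would look like `evaluate(model, test_dataloader, device)` after the weights have been loaded.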

Previous chapter: 19. Getting started with PyTorch: the complete model training routine (cleaned-up code)

To be continued…
