Reusable PyTorch Code Templates (continuously updated)

诸神缄默不语

Last modified 2023-04-24 16:48:53

Licensed under CC 4.0 BY-SA

First published 2021-05-24 14:39:46

Article link: https://blog.youkuaiyun.com/PolarisRisingWar/article/details/117223028

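This article walks through the practical workflow of deep learning with PyTorch: building a model, splitting a dataset, the training/validation/testing loop, visualization helpers, and saving the model. It covers activation functions, data loading, training strategy, loss-curve plotting, and accuracy analysis, giving readers a complete template for a deep learning project.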

First updated: 2021.5.24
Last updated: 2023.4.24

1. Imports, reproducibility settings, anomaly detection, and miscellaneous

import torch
import random
import numpy as np

import matplotlib.pyplot as plt

myseed = 12345
torch.manual_seed(myseed)
torch.random.manual_seed(myseed)  # the same function as torch.manual_seed
random.seed(myseed)
np.random.seed(myseed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
torch.cuda.manual_seed_all(myseed)

torch.autograd.set_detect_anomaly(True)  # raises an error when NaN appears and helps locate the offending code. This turns on autograd anomaly detection globally, so it also covers the forward pass
# To turn detection on only around the backward pass:
#with torch.autograd.detect_anomaly():
#    loss.backward()

torch.multiprocessing.set_sharing_strategy('file_system')

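The last line appears to work around a DataLoader problem: when the batch size and num_workers are large, too many files are opened and the workers can no longer communicate, which would otherwise require raising the open-file limit. For details, see this post: pytorch常见问题_Du_JuneLi的博客-优快云博客_torch.multiprocessing.set_sharing_strategy('file_s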
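
The seeding calls at the top of this section can be wrapped in one helper so that every script seeds all three libraries with the same value. This is a minimal sketch: `set_seed` is a hypothetical name, not something from the original post, and the check at the end only shows that the same seed reproduces the same draws on the CPU.

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed Python, NumPy and PyTorch (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(12345)
first = (random.random(), np.random.rand(), torch.rand(3))
set_seed(12345)
second = (random.random(), np.random.rand(), torch.rand(3))

# Re-seeding with the same value reproduces the same draws from all three libraries
same = (first[0] == second[0]
        and first[1] == second[1]
        and torch.equal(first[2], second[2]))
print(same)
```

Seeding alone may not make GPU training bit-for-bit reproducible, since some CUDA operations are nondeterministic.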

Other settings placed at the top of a script that I do not yet understand:
This one comes from CECP:

import os 
os.environ["OMP_NUM_THREADS"] = "1"    # prevent numpy from using multiple threads

2. Building the model

Template:

import torch

class Net(torch.nn.Module):

    def __init__(self):
        super(Net, self).__init__()
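        # Network layers and any hyperparameters needed later usually go here

    def forward(self, x):
        # Build the network from the Modules defined in __init__
        # (new layers, such as activation functions, can also be added here)
        # Return the output.
        return x  # placeholder so that the template runs; replace with the real computation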

model=Net()

Example:

from torch import nn  # another common spelling is import torch.nn as nn
from torch.nn import functional as F  # another common spelling is import torch.nn.functional as F

class Residual(nn.Module):
    def __init__(self, input_channels, num_channels,
                 use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels,
                               kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels,
                               kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

(Source of this code: https://github.com/linkedlist771/DeepLearning-Basics/blob/17afc5bc0ee6dcc9ea796f613b1babb504165855/d2l/torch.py)

2.1 Activation functions

Common activation functions: Sigmoid, ReLU, tanh
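
As a quick illustration, the three activations can be applied to a small tensor in their functional form. This is only a sketch with made-up input values. The module forms (`nn.Sigmoid`, `nn.ReLU`, `nn.Tanh`) compute the same thing and are convenient inside `nn.Sequential`.

```python
import torch
from torch.nn import functional as F

x = torch.tensor([-2.0, 0.0, 2.0])

sig = torch.sigmoid(x)  # squashes values into (0, 1)
relu = F.relu(x)        # max(0, x), element-wise
tanh = torch.tanh(x)    # squashes values into (-1, 1)

print(sig, relu, tanh)
```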

3. Datasets

Here is a small dataset-splitting function that can be dropped into any project with minor changes:

import random

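def split_dataset(dataset,ratio=(6,2,2),random_seed=20221207):
    """
    dataset: list
    ratio: relative sizes of the training, validation, and test sets
    random_seed: fixed so that every run produces the same split

    Returns a tuple whose elements are, in order: the training set, the validation set, the test set, and the indices of each of the three sets in the original dataset
    (If the dataset is a numpy array, it can be sliced directly with a list of indices, which avoids the list comprehensions below)
    """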
    random.seed(random_seed)

    index=list(range(len(dataset)))
    random.shuffle(index)

    split_point1=int(len(dataset)*ratio[0]/sum(ratio))
    split_point2=int(len(dataset)*(ratio[0]+ratio[1])/sum(ratio))

    train_index=index[:split_point1]
    valid_index=index[split_point1:split_point2]
    test_index=index[split_point2:]

    train_data=[dataset[i] for i in train_index]
    valid_data=[dataset[i] for i in valid_index]
    test_data=[dataset[i] for i in test_index]

    return (train_data,valid_data,test_data,train_index,valid_index,test_index)

random.seed(1614)
(d1,d2,d3,i1,i2,i3)=split_dataset([[random.gauss(10,1) for _ in range(128)] for _ in range(1000)])
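
The docstring notes that a numpy dataset can be sliced directly with index arrays. A minimal sketch of that variant follows. The synthetic data and variable names are assumptions for illustration, and it uses numpy's own random generator rather than the `random` module, so the resulting permutation differs from the one `split_dataset` produces.

```python
import numpy as np

rng = np.random.default_rng(20221207)
data = rng.normal(10, 1, size=(1000, 128))  # synthetic data with the same shape as the example above

index = rng.permutation(len(data))          # shuffled indices 0..999
ratio = (6, 2, 2)
p1 = int(len(data) * ratio[0] / sum(ratio))
p2 = int(len(data) * (ratio[0] + ratio[1]) / sum(ratio))

train_index, valid_index, test_index = index[:p1], index[p1:p2], index[p2:]
# Fancy indexing selects the rows directly, no list comprehension needed
train_data, valid_data, test_data = data[train_index], data[valid_index], data[test_index]
print(train_data.shape, valid_data.shape, test_data.shape)
```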

3.1 Dataset

from torch.utils.data import Dataset, DataLoader
class MyDataset(Dataset):
    def __init__(self):
        # The data can be split here according to a mode argument (train/val/test)
        self.data = ...
    def __getitem__(self, index):
        # One data sample
        return self.data[index]
    def __len__(self):
        # Total number of samples
        return len(self.data)
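
To make the `mode` idea from the comment concrete, here is a sketch of a dataset that generates synthetic regression data and keeps only the slice belonging to the requested split. The class name, the 60/20/20 split, and the data itself are all assumptions for illustration.

```python
import torch
from torch.utils.data import Dataset


class ToyRegressionDataset(Dataset):
    """Hypothetical dataset: the target is the sum of the features."""

    def __init__(self, mode="train", n_samples=100, n_features=4, seed=0):
        generator = torch.Generator().manual_seed(seed)  # same data for every mode
        x = torch.randn(n_samples, n_features, generator=generator)
        y = x.sum(dim=1, keepdim=True)
        p1, p2 = n_samples * 6 // 10, n_samples * 8 // 10
        bounds = {"train": (0, p1), "val": (p1, p2), "test": (p2, n_samples)}
        start, end = bounds[mode]
        self.x, self.y = x[start:end], y[start:end]

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)


sizes = {mode: len(ToyRegressionDataset(mode)) for mode in ("train", "val", "test")}
print(sizes)
```

Because all three modes use the same seed, they slice the same underlying data and the splits do not overlap.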

3.2 DataLoader

(The dataset argument can also be any other object that supports indexing and len(), such as a list)

dataset = MyDataset()
batch_size = 32  # example value
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

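DataLoader arguments:

batch_size: number of samples per batch
shuffle: set to True for training and False for testing
drop_last: whether to drop the last batch when it is incomplete
pin_memory: whether to place loaded tensors in pinned (page-locked) memory, which can speed up transfers to the GPU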
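
The effect of `shuffle` and `drop_last` is easy to see on a plain list used as the dataset. The sketch below uses made-up samples: ten items with a batch size of four, so the last batch holds only two items unless it is dropped.

```python
import torch
from torch.utils.data import DataLoader

# A plain list of (feature, label) pairs works as a map-style dataset
samples = [(torch.full((3,), float(i)), i % 2) for i in range(10)]

train_loader = DataLoader(samples, batch_size=4, shuffle=True, drop_last=True)
test_loader = DataLoader(samples, batch_size=4, shuffle=False, drop_last=False)

train_sizes = [len(y) for _, y in train_loader]  # incomplete last batch dropped
test_sizes = [len(y) for _, y in test_loader]    # incomplete last batch kept
print(train_sizes, test_sizes)
```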

4. Training → validation → testing, and saving the model

4.1 Training

Training the model

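# Move the model to the GPU (single-GPU case only. For multiple GPUs, see my earlier post https://blog.youkuaiyun.com/PolarisRisingWar/article/details/116069338, which covers some ways of using torch.nn.DataParallel. I plan to cover other distributed-training approaches in later posts)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

# Define the optimizer and the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
criterion = torch.nn.MSELoss()
# Note: some models need a custom loss function (e.g. the PTA model)
# PTA and similar models define a loss_function method inside the model that returns a Tensor holding only the loss

# Optional: pre-processing (e.g. the PTA model runs a label propagation pass before training starts)
epochs = 10  # example value
for epoch in range(epochs):
    model.train()  # set inside the loop because validation below switches the model to eval mode
    for batch_x, batch_y in train_data:  # train_data yields (input, target) batches, e.g. a DataLoader
        # Some models are not trained in mini-batches; many GNN models, for example, process everything in one pass
        batch_x, batch_y = batch_x.to(device), batch_y.to(device)
        prediction = model(batch_x)
        loss = criterion(prediction, batch_y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    # Optional: post-processing (e.g. the C&S and PTA models)
    # Validate
    # Save the model that performs best on the validation set
    # Early stopping: stop once validation performance has failed to improve for more than a set number of epochs
    # On ordering: computing the loss → backward → step must happen in that order. zero_grad can be called on the network or on the optimizer as needed (usually the optimizer), and can go either before backward or after step (usually after step, i.e. at the end of each batch iteration)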
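
The per-epoch steps listed in the comments (validate, keep the best model, stop early) can be sketched as follows. Everything here is an assumption for illustration: a toy linear model on synthetic data, a hypothetical `patience` of 5 epochs, and the best weights kept in memory rather than written to disk.

```python
import copy

import torch

torch.manual_seed(0)
x = torch.randn(200, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])  # synthetic linear target
train_x, train_y, val_x, val_y = x[:160], y[:160], x[160:], y[160:]

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
criterion = torch.nn.MSELoss()

patience = 5              # hypothetical: epochs to wait without improvement
best_val = float("inf")
best_state = None
bad_epochs = 0

for epoch in range(200):
    model.train()
    for i in range(0, len(train_x), 32):
        loss = criterion(model(train_x[i:i + 32]), train_y[i:i + 32])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(val_x), val_y).item()

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        # deepcopy, because state_dict() returns references to the live parameters
        best_state = copy.deepcopy(model.state_dict())
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # early stopping

model.load_state_dict(best_state)  # restore the best weights seen on validation
print(f"best validation loss: {best_val:.6f}")
```

In a real project the best `state_dict` would usually be written to disk with `torch.save` instead of being held in memory.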

Saving the model

PATH = './model.pth'
torch.save(model.state_dict(), PATH)

4.2 Loading the model, validation and testing

Loading a model saved locally

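model = Net()
model.load_state_dict(torch.load(PATH, map_location=device))  # map_location handles checkpoints saved on a different device
model.to(device)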
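
A save-and-load round trip can be checked end to end with a temporary file. The sketch below uses a small `torch.nn.Linear` as a stand-in for `Net()` and stays on the CPU. The key requirement it illustrates is that the restored model must have the same architecture as the saved one.

```python
import os
import tempfile

import torch

model = torch.nn.Linear(4, 2)  # stand-in for Net()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pth")
    torch.save(model.state_dict(), path)      # only the parameters are saved

    restored = torch.nn.Linear(4, 2)          # same architecture, fresh weights
    restored.load_state_dict(torch.load(path, map_location="cpu"))

x = torch.randn(3, 4)
model.eval()
restored.eval()
with torch.no_grad():
    same_output = torch.equal(model(x), restored(x))
print(same_output)
```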

Validation and testing

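model.eval()
with torch.no_grad():
    for batch_x in test_data:
        batch_x = batch_x.to(device)  # .to() returns a new tensor, so the result must be assigned
        prediction = model(batch_x)
        # On the validation set the loss can also be computed: loss = criterion(prediction, batch_y)

        # To take the highest-scoring entry of prediction (logits) as the predicted label:
        predicted_label = torch.argmax(prediction, 1)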
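
Once the predicted labels are available, overall accuracy is the fraction that match the true labels. A small sketch with hand-written logits and labels (made-up values) shows the computation:

```python
import torch

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0],
                       [1.5, 1.6, 0.0],
                       [0.9, 0.1, 0.2]])
labels = torch.tensor([0, 2, 0, 0])

predicted_label = torch.argmax(logits, dim=1)        # index of the largest logit per row
correct = (predicted_label == labels).sum().item()   # number of matching predictions
accuracy = correct / len(labels)
print(predicted_label.tolist(), accuracy)
```

When evaluating over several batches, accumulate `correct` and the sample count across batches and divide once at the end.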

5. Visualization helpers

5.1 Plotting the loss curve over epochs (record the values during training or validation)

def plot_learning_curve(loss_record, title=''):
    ''' Plot learning curve of your DNN (train & dev loss) '''
    total_steps = len(loss_record['train'])
    x_1 = range(total_steps)
    x_2 = x_1[::len(loss_record['train']) // len(loss_record['dev'])]
    plt.figure(figsize=(6, 4))
    plt.plot(x_1, loss_record['train'], c='tab:red', label='train')
    plt.plot(x_2, loss_record['dev'], c='tab:cyan', label='dev')
    plt.ylim(0.0, 5.)
    plt.xlabel('Training steps')
    plt.ylabel('MSE loss')
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()
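
The function above expects `loss_record['train']` to hold one value per training step and `loss_record['dev']` one value per validation pass. A sketch of how such a record might be filled during training, on a toy model with synthetic data (all values are assumptions for illustration):

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 2)
y = x.sum(dim=1, keepdim=True)
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()

loss_record = {'train': [], 'dev': []}
for epoch in range(5):
    model.train()
    for i in range(0, 48, 16):                    # 3 training batches per epoch
        loss = criterion(model(x[i:i + 16]), y[i:i + 16])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        loss_record['train'].append(loss.item())  # one entry per training step
    model.eval()
    with torch.no_grad():
        dev_loss = criterion(model(x[48:]), y[48:]).item()
    loss_record['dev'].append(dev_loss)           # one entry per epoch

print(len(loss_record['train']), len(loss_record['dev']))
# loss_record can now be passed to plot_learning_curve(loss_record, title='toy model')
```

The x-axis slicing in `plot_learning_curve` lines up cleanly when the number of training steps is a multiple of the number of dev entries, as it is here (15 and 5).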

5.2 Plotting loss and accuracy curves over epochs (record the values during training or validation)

plt.title(dataset_name+' dataset, '+model_name+' model: loss')
plt.plot(train_losses, label="training loss")
plt.plot(val_losses, label="validating loss")
plt.plot(test_losses, label="testing loss")
plt.legend()
plt.savefig(pics_root+'/loss_'+pics_name)
plt.close()  # close the figure so the next plot does not draw on top of it

plt.title(dataset_name+' dataset, '+model_name+' model: ACC')
plt.plot(train_accs, label="training acc")
plt.plot(val_accs, label="validating acc")
plt.plot(test_accs, label="testing acc")
plt.legend()
plt.savefig(pics_root+'/acc_'+pics_name)
plt.close()

5.3 Scatter plot of predicted values against ground-truth labels

def plot_pred(dv_set, model, device, lim=35., preds=None, targets=None):
    ''' Plot prediction of your DNN '''
    if preds is None or targets is None:
        model.eval()
        preds, targets = [], []
        for x, y in dv_set:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                pred = model(x)
                preds.append(pred.detach().cpu())
                targets.append(y.detach().cpu())
        preds = torch.cat(preds, dim=0).numpy()
        targets = torch.cat(targets, dim=0).numpy()

    plt.figure(figsize=(5, 5))
    plt.scatter(targets, preds, c='r', alpha=0.5)
    plt.plot([-0.2, lim], [-0.2, lim], c='b')
    plt.xlim(-0.2, lim)
    plt.ylim(-0.2, lim)
    plt.xlabel('ground truth value')
    plt.ylabel('predicted value')
    plt.title('Ground Truth v.s. Prediction')
    plt.show()

5.4 Printing per-class accuracy (multi-class, single-label classification)

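def get_report(labels, preds):
    """
    Inputs are the true label and the predicted label of every sample (lists or 1-D arrays).
    Labels must be integer class indices starting from 0.
    """
    labels, preds = np.asarray(labels), np.asarray(preds)  # so that == compares element-wise
    N_CLASSES = int(labels.max()) + 1
    class_correct = list(0. for i in range(N_CLASSES))
    class_total = list(0. for i in range(N_CLASSES))
    c = (preds == labels)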
    for i in range(len(labels)):
        label = labels[i]
        class_correct[label] += c[i]
        class_total[label] += 1
    report = ""
    for i in range(N_CLASSES):
        if class_total[i]:
            report += 'Accuracy of %d : %d/%d=%.4f' % (
            i, class_correct[i], class_total[i], class_correct[i] / class_total[i]) + "\n"
    return report

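The output is a string that lists, for each class, the number of correctly predicted samples over the total number of samples with that label. Printed, it looks like this: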

Accuracy of 0 : 0/201=0.0000
Accuracy of 1 : 0/37=0.0000
Accuracy of 2 : 0/191=0.0000
Accuracy of 3 : 0/263=0.0000
Accuracy of 4 : 0/194=0.0000
Accuracy of 5 : 0/11=0.0000
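
The same per-class counts can also be computed without an explicit loop over samples. The sketch below is an alternative to the function above, not a replacement the original post prescribes. It uses `np.bincount` on made-up labels and predictions and prints in the same format.

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 1, 2])
preds = np.array([0, 1, 1, 1, 0, 2])

n_classes = int(labels.max()) + 1
# Count samples per class, then count only the correctly predicted ones
class_total = np.bincount(labels, minlength=n_classes)
class_correct = np.bincount(labels[preds == labels], minlength=n_classes)

for i in range(n_classes):
    if class_total[i]:
        print('Accuracy of %d : %d/%d=%.4f' % (
            i, class_correct[i], class_total[i], class_correct[i] / class_total[i]))
```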