Building Networks and Plotting

Article link: https://blog.youkuaiyun.com/yykzyj123456/article/details/125410178

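This article shows how to manage the parameters of several networks with a single Adam optimizer in PyTorch and walks through the basic steps for plotting a learning curve. It then covers switching model state (model.eval() and model.train()), ways to stop training part of a network, and what with torch.no_grad() does to tensor operations. It also explains how to move tensors between devices and what to(device) does.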


PyTorch: passing the parameters of several networks to a single optimizer

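import itertools
import torch.optim as optim
# Chain both parameter iterators so that one Adam instance updates the encoder and the decoder.
self.optimizer = optim.Adam(itertools.chain(self.encoder.parameters(), self.decoder.parameters()), lr=self.opt.lr, betas=(self.opt.beta1, 0.999))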
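
The line above is a fragment from inside a class. As a rough, self-contained sketch of the same idea, the snippet below builds two small stand-in networks and hands both parameter sets to one Adam optimizer. The layer sizes, the learning rate, and the beta1 value are assumptions chosen only for illustration; they are not taken from the original code.

```python
import itertools

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-ins for self.encoder and self.decoder.
encoder = nn.Linear(8, 4)
decoder = nn.Linear(4, 8)

# Hypothetical values standing in for self.opt.lr and self.opt.beta1.
lr, beta1 = 2e-4, 0.5

# chain() yields the encoder parameters first, then the decoder parameters.
optimizer = optim.Adam(
    itertools.chain(encoder.parameters(), decoder.parameters()),
    lr=lr,
    betas=(beta1, 0.999),
)

# A single optimization step now updates both networks.
x = torch.randn(16, 8)
loss = nn.functional.mse_loss(decoder(encoder(x)), x)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# All parameters end up in one parameter group.
num_params = len(optimizer.param_groups[0]["params"])
print(num_params)
```

If the two networks need different learning rates, the optimizer also accepts a list of dicts, each with its own "params" entry, instead of a single chained iterator.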

Plotting

import matplotlib.pyplot as plt
import numpy as np

axis_x = np.array([-8, -7, -6, -5, -4, -3, -2, -1])
axis_y = np.array([0, 1, 2, 3, 4, 5, 6, 7])
fig1 = plt.figure(1)
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.plot(axis_x, axis_y)
plt.show(block=False)  # non-blocking, so the script goes on to plt.pause()
plt.pause(4)  # keep the window open for 4 seconds
plt.close(fig1)

fig2 = plt.figure(2)
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.plot(axis_y, axis_x)
plt.show(block=False)  # non-blocking, so the script goes on to plt.pause()
plt.pause(6)  # keep the window open for 6 seconds
plt.close(fig2)
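
The arrays above are placeholder data. For an actual learning curve, a minimal sketch might look like the following, assuming the per-epoch losses have been collected in a Python list. The loss values and the output file name are made up for illustration, and the Agg backend is used so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical per-epoch training losses, for illustration only.
losses = [0.92, 0.61, 0.45, 0.38, 0.33]
epochs = range(1, len(losses) + 1)

fig, ax = plt.subplots()
ax.plot(epochs, losses, marker="o")
ax.set_xlabel("Epoch")
ax.set_ylabel("Loss")
fig.savefig("loss_curve.png")  # hypothetical output file name
plt.close(fig)
```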

Ways to stop training part of a network
(1) torch.load()

>>> import io
>>> import torch
>>> torch.load('tensors.pt')

# Load all tensors onto the CPU
>>> torch.load('tensors.pt', map_location=torch.device('cpu'))

# Load all tensors onto the CPU, using a function
>>> torch.load('tensors.pt', map_location=lambda storage, loc: storage)

# Load all tensors onto GPU 1
>>> torch.load('tensors.pt', map_location=lambda storage, loc: storage.cuda(1))

# Map tensors from GPU 1 to GPU 0
>>> torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'})

# Load tensor from io.BytesIO object
>>> with open('tensor.pt', 'rb') as f:
...     buffer = io.BytesIO(f.read())
>>> torch.load(buffer)

# Load a module with 'ascii' encoding for unpickling
>>> torch.load('module.pt', encoding='ascii')
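
The calls above assume that files such as tensors.pt already exist. A small round trip that needs no file on disk could be sketched as follows: it saves a dictionary to an in-memory buffer and loads it back onto the CPU. The dictionary contents are hypothetical and stand in for a real checkpoint.

```python
import io

import torch

# Hypothetical checkpoint contents, saved to an in-memory buffer instead of a file.
state = {"weight": torch.arange(6.0).reshape(2, 3), "epoch": 3}
buffer = io.BytesIO()
torch.save(state, buffer)

buffer.seek(0)  # rewind the buffer before reading it back
loaded = torch.load(buffer, map_location=torch.device("cpu"))

print(loaded["epoch"], loaded["weight"].device)
```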

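(2) What model.eval() and model.train() do in PyTorch
The difference between model.train() and model.eval() lies mainly in how Batch Normalization and Dropout layers behave.

model.eval(): Batch Normalization stops updating its running mean and variance and normalizes with the stored running statistics. Without this call, the statistics keep changing whenever data passes through the network, even if no training step is run. Dropout is turned off, so all neurons take part in the computation. Note that model.eval() does not disable gradient computation or freeze any parameters.

model.train(): Batch Normalization normalizes each batch with that batch's own mean and variance and updates its running statistics. Dropout is turned on. A newly defined model is in model.train() mode by default.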
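
The behavior described above can be checked with a small experiment. The sketch below uses a toy model (an assumption made for illustration) that contains one BatchNorm1d layer and one Dropout layer, and compares the running mean and the outputs in the two modes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model containing both layer types.
model = nn.Sequential(nn.BatchNorm1d(4), nn.Dropout(p=0.5))
bn = model[0]
x = torch.randn(32, 4) + 3.0  # batch mean is far from the initial running mean of 0

initial_training = model.training  # mode of a freshly constructed module

model.train()
model(x)
mean_after_train = bn.running_mean.clone()  # updated from the batch statistics

model.eval()
out1 = model(x)
out2 = model(x)
mean_after_eval = bn.running_mean.clone()

# In eval mode the running statistics stay fixed and dropout is inactive,
# so two forward passes on the same input give identical outputs.
running_stats_frozen = torch.equal(mean_after_train, mean_after_eval)
deterministic = torch.equal(out1, out2)
print(initial_training, running_stats_frozen, deterministic)
```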

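(3) with torch.no_grad()

torch.no_grad() is a context manager, not a loop. Inside the with block, autograd does not record operations, so every tensor computed there has requires_grad set to False, even when its inputs have requires_grad=True. The requires_grad flag of tensors that already exist is not changed.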

# import torch library
import torch

# define a torch tensor
x = torch.tensor(2., requires_grad = True)
print("x:", x)   ## x: tensor(2., requires_grad=True)

# define a function y
with torch.no_grad():
   y = x ** 2
print("y:", y)   ## y: tensor(4.)

# check gradient for Y
print("y.requires_grad:", y.requires_grad)   ## y.requires_grad: False
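
Since this section is about stopping the training of part of a network, here is one way the same mechanism might be applied: run the part to be frozen inside torch.no_grad() so that no gradient reaches its parameters. The encoder and decoder below are hypothetical stand-ins, not code from the original post.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(8, 4)  # hypothetical part to freeze
decoder = nn.Linear(4, 2)  # hypothetical part that keeps training

x = torch.randn(16, 8)

# No graph is recorded for the encoder, so no gradient can reach its parameters.
with torch.no_grad():
    features = encoder(x)

loss = decoder(features).sum()
loss.backward()

encoder_has_grad = any(p.grad is not None for p in encoder.parameters())
decoder_has_grad = all(p.grad is not None for p in decoder.parameters())
print(encoder_has_grad, decoder_has_grad)
```

A common alternative is to call requires_grad_(False) on each parameter of the frozen part, which has a similar effect without changing the forward code.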

(4) to(device)

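## For input data
my_tensor = my_tensor.to(device)
## This line copies the tensor that was read in at the start to the device given by `device` (for example a GPU). Later computations then run on that device, and tensors derived from it are created there as well.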

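For Tensor data, the return value of .to(device) must be assigned to a variable; only the returned tensor is on the requested device, and the original tensor is left unchanged.

For a Module object, calling .to(device) is enough to move the model to the specified device, because the module is modified in place. The return value does not need to be kept.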
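
The difference between the two cases can be sketched as follows. The snippet picks a GPU when one is available and falls back to the CPU otherwise; the tensor shape and the nn.Linear model are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Use a GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

my_tensor = torch.randn(4, 3)
my_tensor = my_tensor.to(device)  # Tensor.to returns a tensor, so keep the result

model = nn.Linear(3, 2)  # hypothetical model
model.to(device)  # Module.to moves the parameters in place; reassigning is optional

out = model(my_tensor)  # input and parameters are on the same device
print(out.device)
```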