Notes organized from Long Liangqu's (龙良曲) PyTorch course videos; video links:
【计算机-AI】PyTorch学这个就够了!
(好课推荐)深度学习与PyTorch入门实战——主讲人龙良曲
25. Visdom Visualization
- Tensorboard: TensorFlow visualization tool from Google
- TensorboardX: the corresponding visualization tool for PyTorch
  - data must first be moved to the CPU and converted to numpy before it can be visualized
- Visdom: from Facebook; efficient and good-looking
Visdom installation
- Installing directly with pip is not recommended; it tends to produce errors
- Installing Visdom from source is recommended
Steps for installing from source
- Source package: fossasia/visdom
- After extracting, open cmd in the visdom-master directory and run
pip install -e .
- After installation, copy the folders under the py directory of visdom-master into the installed package directory
- Then run in cmd
python -m visdom.server
This starts the server; find the URL it prints and open it in a browser.
- If the command hangs too long at "Checking for scripts.", locate Visdom's installed files and comment out download_scripts() in server.py
- If the Visdom page in the browser shows a blue screen when you run your program, the Visdom static files were blocked from downloading and need to be manually overwritten again
Visualization code
from visdom import Visdom
# lines: single trace
viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
viz.line([loss.item()], [global_step], win='train_loss', update='append')
# lines: multi-traces
viz.line([[0., 0.]], [0.], win='test', opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))
viz.line([[test_loss, correct / len(test_loader.dataset)]], [global_step], win='test', update='append')
# visualize a batch of input images and the predicted labels
viz.images(data.view(-1, 1, 28, 28), win='x')
viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))
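The snippets above assume `loss`, `global_step`, `test_loss`, `data`, and `pred` come from a real training loop. A minimal self-contained sketch, with dummy values standing in for real training quantities (names like `fake_loss` and `fake_acc` are purely illustrative), that can be run directly against a local Visdom server:

import numpy as np
from visdom import Visdom

viz = Visdom()  # requires `python -m visdom.server` to be running

# single-trace line: initialize the window, then append one point per step
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
for global_step in range(1, 100):
    fake_loss = 1.0 / global_step                # dummy value standing in for loss.item()
    viz.line([fake_loss], [global_step], win='train_loss', update='append')

# multi-trace line: one window with two curves (loss and accuracy)
viz.line([[0., 0.]], [0.], win='test',
         opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))
for global_step in range(1, 100):
    fake_loss = 1.0 / global_step
    fake_acc = 1.0 - fake_loss
    viz.line([[fake_loss, fake_acc]], [global_step], win='test', update='append')

# images and text: a random batch in MNIST shape and a fake prediction vector
fake_batch = np.random.rand(16, 1, 28, 28)
viz.images(fake_batch, win='x')
viz.text(str(np.arange(16)), win='pred', opts=dict(title='pred'))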
Visualization results (no handwritten-digit dataset available???)
26. Overfitting & Underfitting
- underfitting: Estimated capacity < Ground-truth complexity
  - e.g. WGAN
  - train acc. is bad
  - test acc. is bad as well
- overfitting: Estimated capacity > Ground-truth complexity
  - how to detect: compare train vs. validation/test performance (a sketch follows this list)
  - how to reduce: more data, regularization, dropout, early stopping (covered in the next sections)
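A minimal detection sketch, assuming a trained flattened-input `model` (as in the MLPs used in these notes) plus `train_loader`/`val_loader`; the `evaluate` helper name and the gap threshold are illustrative:

import torch

def evaluate(model, loader, device='cpu'):
    """Return accuracy of `model` on `loader` (illustrative helper)."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            pred = model(data.view(data.size(0), -1)).argmax(dim=1)
            correct += (pred == target).sum().item()
            total += target.size(0)
    return correct / total

# after each epoch: a large train/val gap suggests overfitting,
# while both accuracies being low suggests underfitting
train_acc = evaluate(model, train_loader)
val_acc = evaluate(model, val_loader)
if train_acc - val_acc > 0.1:      # threshold is arbitrary, for illustration only
    print('possible overfitting: train %.3f vs val %.3f' % (train_acc, val_acc))
elif train_acc < 0.8:              # also arbitrary
    print('possible underfitting: train %.3f' % train_acc)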
27. Train-Val-Test Split
Parameters must not be tuned based on the Test set's performance; tune them only on the Validation set's results, otherwise the test data is contaminated (data leakage).
k-fold cross validation
- merge train/val sets
- randomly sample 1/k as val set
- this lets as much data as possible take part in backward parameter tuning, while preventing the model from memorizing one fixed validation split (a k-fold sketch follows the data-loading code below)
import torch
from torchvision import datasets, transforms

batch_size = 200  # example batch size

# load data
train_db = datasets.MNIST('../data', train=True, download=True,
                          transform=transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))
                          ]))
train_loader = torch.utils.data.DataLoader(
    train_db,
    batch_size=batch_size, shuffle=True
)
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(
    test_db,
    batch_size=batch_size, shuffle=True
)
print('train', len(train_db), 'test', len(test_db))

# carve a validation set out of the 60k training images
train_db, val_db = torch.utils.data.random_split(train_db, [50000, 10000])
print('db1', len(train_db), 'db2', len(val_db))
train_loader = torch.utils.data.DataLoader(
    train_db,
    batch_size=batch_size, shuffle=True
)
val_loader = torch.utils.data.DataLoader(
    val_db,
    batch_size=batch_size, shuffle=True
)
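The split above fixes one validation set. A minimal k-fold sketch on the same `train_db` and `batch_size` as above; the fold count `k` and the loop body are illustrative, not the course's code:

import torch

k = 6
fold_size = len(train_db) // k
# split once into k fixed folds; the last fold absorbs any remainder
lengths = [fold_size] * (k - 1) + [len(train_db) - fold_size * (k - 1)]
folds = torch.utils.data.random_split(train_db, lengths)

for i in range(k):
    # fold i plays the role of the validation set this round
    val_part = folds[i]
    train_part = torch.utils.data.ConcatDataset([f for j, f in enumerate(folds) if j != i])

    train_loader = torch.utils.data.DataLoader(train_part, batch_size=batch_size, shuffle=True)
    val_loader = torch.utils.data.DataLoader(val_part, batch_size=batch_size, shuffle=False)
    # ... train on train_loader, evaluate on val_loader, keep the best-performing model ...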
28. Regularization
Occam’s Razor
- More things should not be used than are necessary
Reduce Overfitting
- More data
- Constrain model complexity
  - shallow network
  - regularization
- Dropout
- Data augmentation
- Early Stopping
Regularization (weight decay): add a regularization term to the loss
- L1-regularization: $\lambda \sum_{i=1}^{n} |\theta_i|$
- L2-regularization: $\frac{1}{2}\lambda \|W\|^2$
L2 regularization (built into PyTorch optimizers via the weight_decay argument)
optimizer = optim.SGD(net.parameters(), lr=learning_rate, weight_decay=0.01)
L1 regularization (must be implemented manually)
# sum of absolute values of all parameters, added to the classification loss
regularization_loss = 0
for param in model.parameters():
    regularization_loss += torch.sum(torch.abs(param))

classify_loss = criteon(logits, target)
loss = classify_loss + 0.01 * regularization_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
29. Momentum and Learning-Rate Decay
momentum
- takes the previous update direction (history) into account
- plain gradient update: $w^{k+1}=w^k-\alpha\nabla f(w^k)$
- with momentum: $w^{k+1}=w^k-\alpha z^{k+1}$, where $z^{k+1}=\beta z^k+\nabla f(w^k)$, so each step also carries the previous direction $z^k$
- PyTorch's Adam optimizer has momentum built in; for SGD it is passed explicitly:
optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.78, weight_decay=0.01)
learning-rate decay
# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05    if epoch < 30
# lr = 0.005   if 30 <= epoch < 60
# lr = 0.0005  if 60 <= epoch < 90
# ...
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()  # decay the learning rate once per epoch, after the optimizer updates
30. Early Stopping, Dropout, SGD
early-stopping
- use the Validation set to select parameters
- monitor validation performance
- stop at the highest val perf. (a minimal sketch follows below)
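A minimal early-stopping sketch, assuming hypothetical `train_one_epoch` and `evaluate` helpers plus `model`, `train_loader`, `val_loader`, and `optimizer` from the surrounding training code; the patience value is illustrative:

import copy

max_epochs = 100
patience = 5                 # stop after this many epochs without improvement (illustrative)
best_val_acc = 0.0
best_state = copy.deepcopy(model.state_dict())
epochs_without_improvement = 0

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)    # assumed helper
    val_acc = evaluate(model, val_loader)              # assumed helper returning accuracy

    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state = copy.deepcopy(model.state_dict())  # remember the best weights
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print('early stopping at epoch', epoch)
            break

model.load_state_dict(best_state)   # roll back to the highest-validation-accuracy model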
dropout
- Learning less to learn better
- Each connection is dropped with probability p ∈ [0, 1]
- no dropout at test time
torch.nn.Dropout(p=dropout_prob)  # PyTorch: p is the probability of dropping a unit
tensorflow.nn.dropout(keep_prob)  # TensorFlow: keep_prob is the probability of keeping a unit
from torch import nn

net_dropped = nn.Sequential(
    nn.Linear(784, 200),
    nn.Dropout(0.5),
    nn.LeakyReLU(inplace=True),
    nn.Linear(200, 200),
    nn.Dropout(0.5),
    nn.LeakyReLU(inplace=True),
    nn.Linear(200, 10),
    nn.LeakyReLU(inplace=True),
)

for epoch in range(epochs):
    # train: enable dropout
    net_dropped.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        ...

    # test: disable dropout
    net_dropped.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        ...
Stochastic Gradient Descent (SGD)
- Stochastic: not the same as random; instead of loading the whole dataset at once, the gradient is computed on one batch at a time (see the sketch below)
- Deterministic gradient descent, by contrast, uses the entire dataset for every update
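A minimal sketch of the difference on a toy regression problem, so it runs without any dataset (the toy data, learning rate, and batch size are purely illustrative):

import torch

# toy data: fit w so that x * w ≈ y
x = torch.randn(1000, 1)
y = 3.0 * x
w = torch.zeros(1, requires_grad=True)
lr = 0.1

# deterministic (full-batch) gradient descent: one step uses all 1000 samples
loss = ((x * w - y) ** 2).mean()
loss.backward()
with torch.no_grad():
    w -= lr * w.grad
    w.grad.zero_()

# stochastic (mini-batch) gradient descent: each step uses a small random batch
for step in range(100):
    idx = torch.randint(0, len(x), (32,))       # sample a batch of 32
    batch_x, batch_y = x[idx], y[idx]
    loss = ((batch_x * w - batch_y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        w.grad.zero_()

print(w.item())   # should approach 3.0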
31. Bayes' Theorem
$P(A|B)=\frac{P(B|A)\times P(A)}{P(B)}$
I searched the whole web and could not find a tutorial for this section :(
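Since the notes stop at the bare formula, here is a small worked example with made-up numbers (the scenario and the probabilities are purely illustrative). Let $A$ = "has the condition" and $B$ = "test is positive", with $P(A)=0.01$, $P(B|A)=0.99$, $P(B|\neg A)=0.05$. Then
$P(B)=P(B|A)P(A)+P(B|\neg A)P(\neg A)=0.99\times 0.01+0.05\times 0.99=0.0594$
$P(A|B)=\frac{P(B|A)P(A)}{P(B)}=\frac{0.0099}{0.0594}\approx 0.167$
so even with a positive test, the posterior probability is only about 17% because the prior $P(A)$ is small.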