PyTorch: Using Different Optimizers at Different Stages of Training
Recently, while using a cosine annealing learning rate scheduler to update the learning rate dynamically, I ran into a problem: with a single optimizer, the loss exploded in the middle and late stages of training because the initial_lr parameter was set too large. So I wondered: could I split training by epoch and use a different optimizer in each stage?
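To see why this happens, here is a minimal, self-contained sketch (a stand-in nn.Linear model, with the BASE_LR and T_0 values from the config below) of how CosineAnnealingWarmRestarts snaps the learning rate back up to the base LR at every restart; with a base LR that is too large, those jumps late in training are what destabilize the loss.

# Minimal sketch (stand-in model, values taken from the config below) showing the
# restart behaviour of CosineAnnealingWarmRestarts: at every restart the LR jumps
# back to the base LR, which is why a too-large base LR hurts late in training.
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = nn.Linear(10, 1)                       # stand-in for the real network
opt = optim.Adam(model.parameters(), lr=2e-4)  # BASE_LR
sched = CosineAnnealingWarmRestarts(opt, T_0=10, T_mult=1, eta_min=1e-6)

for epoch in range(30):
    # ... one epoch of training would run here ...
    print(epoch, sched.get_last_lr()[0])       # decays within a cycle, jumps back at epochs 10, 20, ...
    sched.step()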
First, add two parameters to the *.yaml configuration file: TURN_EPOCH and BASE_LR_SECOND, which specify the turning-point epoch and the peak learning rate of the second-stage optimizer, respectively.
SOLVER:
  EPOCHS: 70              # Total number of training epochs
  TURN_EPOCH: 40          # Epoch at which training switches from the first stage to the second
  T_0: 10                 # Number of epochs in the first cosine cycle (the LR restarts after it)
  T_MULT: 1               # Cycle lengths are (T_0, T_0*T_MULT, T_0*T_MULT^2, T_0*T_MULT^3, ...)
  ETA_MIN: 0.000001       # Minimum learning rate reached at the end of each cycle
  BASE_LR: 0.0002         # Peak (initial) learning rate of each cycle in the first stage
  BASE_LR_SECOND: 0.0001  # Peak (initial) learning rate of each cycle in the second stage
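The post does not show how this YAML becomes args; a minimal sketch, assuming a plain PyYAML load into a simple namespace (the real project presumably has its own config system, and fields such as LOAD_EPOCH and NETS_DIR used below would come from elsewhere in the config or the command line):

# Hypothetical config loader, shown only so that args.TURN_EPOCH, args.BASE_LR_SECOND,
# etc. used below have a concrete origin; 'config.yaml' is a placeholder file name.
import yaml
from types import SimpleNamespace

with open('config.yaml') as f:
    cfg = yaml.safe_load(f)
args = SimpleNamespace(**cfg['SOLVER'])  # args.EPOCHS, args.TURN_EPOCH, args.BASE_LR, ...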
Next, define the two optimizers in train.py:
# create optimizers
import torch
import torch.optim as optim

# Stage 1: epoch 0 to TURN_EPOCH (40)
optimizer1 = optim.Adam([{'params': model.parameters(), 'initial_lr': args.BASE_LR}], betas=(0.9, 0.999))
# Stage 2: epoch 41 to the end
optimizer2 = optim.Adam([{'params': model.parameters(), 'initial_lr': args.BASE_LR_SECOND}], betas=(0.9, 0.999))
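A note on the 'initial_lr' key (my reading of PyTorch's scheduler behaviour, not stated in the original): when a scheduler is constructed with last_epoch other than -1, it takes each param group's 'initial_lr' as the base learning rate of its cosine cycles, so this key is where BASE_LR and BASE_LR_SECOND actually take effect.

# When resuming (last_epoch != -1), the scheduler reads its base LR from 'initial_lr'
# and then overwrites param_groups[0]['lr'] with the value it computes for that epoch.
print(optimizer1.param_groups[0]['initial_lr'])  # 0.0002 (BASE_LR)
print(optimizer2.param_groups[0]['initial_lr'])  # 0.0001 (BASE_LR_SECOND)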
When resuming, load the checkpoint and restore the optimizer state so that training continues seamlessly. One thing to watch out for: at the turning-point epoch, do not load the optimizer state from the checkpoint, otherwise a series of problems will follow:
def load_checkpoint(model, optimizer1, optimizer2, load_epoch):
    load_dir = args.NETS_DIR + '/checkpoint' + '_' + '%06d' % load_epoch + '.tar'
    print('Loading pre-trained checkpoint %s' % load_dir)
    checkpoint = torch.load(load_dir)
    model.load_state_dict(checkpoint['state_dict'])
    optimizer_dict = checkpoint['optimizer']
    if load_epoch < args.TURN_EPOCH:
        optimizer1.load_state_dict(optimizer_dict)
    elif load_epoch > args.TURN_EPOCH:
        optimizer2.load_state_dict(optimizer_dict)
    else:
        # To avoid a series of strange issues, if load_epoch is exactly the turning point,
        # just start stage 2 with the fresh optimizer2 and skip restoring the state.
        pass
    learning_rate = checkpoint['learning_rate']
    iters = checkpoint['iters']
    print('Learning rate recorded from the checkpoint: %s' % str(learning_rate))
    return learning_rate, iters
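The matching save side is not shown in the post; a sketch that writes the same keys load_checkpoint reads (the save_checkpoint name and the idea of saving only the currently active optimizer are my assumptions) might look like:

# Hypothetical save counterpart: stores only the optimizer that is active at this epoch,
# since load_checkpoint decides which one to restore from the epoch number alone.
def save_checkpoint(model, optimizer, learning_rate, iters, epoch):
    save_dir = args.NETS_DIR + '/checkpoint' + '_' + '%06d' % epoch + '.tar'
    torch.save({'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
                'learning_rate': learning_rate,
                'iters': iters}, save_dir)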
Then give each optimizer its own lr_scheduler.
# create learning rate schedulers
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Stage-1 scheduler: counts epochs from the very beginning of training
lr_scheduler1 = CosineAnnealingWarmRestarts(optimizer1, T_0=args.T_0, T_mult=args.T_MULT, eta_min=args.ETA_MIN,
                                            last_epoch=args.LOAD_EPOCH - 1)
# Stage-2 scheduler: counts epochs from the turning point, hence the TURN_EPOCH offset
lr_scheduler2 = CosineAnnealingWarmRestarts(optimizer2, T_0=args.T_0, T_mult=args.T_MULT, eta_min=args.ETA_MIN,
                                            last_epoch=args.LOAD_EPOCH - 1 - args.TURN_EPOCH)
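The two last_epoch arguments are what keep resuming consistent: lr_scheduler1 continues counting from the start of training, while lr_scheduler2 counts epochs from the turning point. A quick sanity check, assuming a hypothetical resume at LOAD_EPOCH = 45 with TURN_EPOCH = 40:

# With LOAD_EPOCH = 45 and TURN_EPOCH = 40 (hypothetical resume point), lr_scheduler1
# resumes deep into its cosine cycles while lr_scheduler2 is only a few epochs into its
# own, so the second stage's warm restarts are counted from the turning point onwards.
print(lr_scheduler1.get_last_lr()[0])
print(lr_scheduler2.get_last_lr()[0])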
During training, pick the optimizer and lr_scheduler that match the current epoch, and you are done!
for epoch in range(args.LOAD_EPOCH + 1, args.EPOCHS + 1):
    if epoch <= args.TURN_EPOCH:
        learning_rate, avg_train_loss, iters = train_epoch(args, TrainImgLoader, model, model_fn, optimizer1, epoch,
                                                           iters, lr_scheduler1)
    else:
        learning_rate, avg_train_loss, iters = train_epoch(args, TrainImgLoader, model, model_fn, optimizer2, epoch,
                                                           iters, lr_scheduler2)
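train_epoch itself is not shown in the post; the sketch below assumes a model_fn(args, sample, model, iters) signature that returns the loss, and steps the scheduler once per epoch (the original may well step it per batch instead):

# Hypothetical train_epoch matching the call above; model_fn's signature is assumed.
def train_epoch(args, loader, model, model_fn, optimizer, epoch, iters, lr_scheduler):
    model.train()
    total_loss = 0.0
    for sample in loader:
        loss = model_fn(args, sample, model, iters)  # forward pass + loss (assumed helper)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        iters += 1
    # advance the cosine schedule; it continues from the last_epoch it was built with
    lr_scheduler.step()
    learning_rate = optimizer.param_groups[0]['lr']
    return learning_rate, total_loss / len(loader), iters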