nohup training
1. ps aux | grep python (use a keyword that matches your process name): lists every related process started with nohup (see the launch example below).
2. kill -9 PID (PID is the second column of each line; it is the ID of the process itself, not of its parent process).
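For reference, a typical way to launch a training script in the background in the first place (a minimal sketch; train.py and train.log are placeholder names):

nohup python train.py > train.log 2>&1 &  # redirect stdout/stderr to train.log and detach from the terminal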
Fixing a frozen PyCharm window
Use ps combined with grep to find the PyCharm-related processes:
ps aux | grep pycharm
kill -9 [PID]
As for identifying which process is the frozen one: from initial observation, the frozen process prints the longest line in the ps output, and the project name appears at the end of that line. A one-step alternative is sketched below.
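If you prefer to skip the manual PID lookup, pkill can match and kill in one step (note this kills every process whose full command line contains "pycharm", so close any PyCharm instance you want to keep first):

pkill -9 -f pycharm  # -f matches against the full command line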
TensorBoard
Logging the loss
import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(comment='test_your_comment', filename_suffix="_test_your_filename_suffix")
# add_scalars draws both curves under the same "Loss" tag;
# loss, valid_curve and iter_count come from the surrounding training loop
writer.add_scalars("Loss", {"Train": loss.item()}, iter_count)
writer.add_scalars("Loss", {"Valid": np.mean(valid_curve)}, iter_count)
Logging feature maps
import torchvision.utils as vutils

# fmap_1: feature-map tensor in BCHW layout, e.g. reshaped to (C, 1, H, W)
# so that each channel is rendered as a separate grayscale image
fmap_1_grid = vutils.make_grid(fmap_1, normalize=True, scale_each=True, nrow=8)
writer.add_image('feature map in conv1', fmap_1_grid, global_step=322)
writer.close()
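One way to obtain fmap_1 in the first place is a forward hook; the sketch below is illustrative (it assumes AlexNet's first conv layer and a dummy input) and reshapes the output so each channel becomes one grayscale image:

import torch
import torchvision.models as models
import torchvision.utils as vutils
from torch.utils.tensorboard import SummaryWriter

model = models.alexnet(pretrained=True)
feature_maps = {}

def hook(module, inputs, output):
    # output: (B, C, H, W) -> (C, 1, H, W), one grayscale image per channel
    feature_maps['conv1'] = output[0].unsqueeze(1).detach()

model.features[0].register_forward_hook(hook)  # first conv layer
model(torch.randn(1, 3, 224, 224))             # dummy forward pass

writer = SummaryWriter(comment='test_your_comment')
fmap_1_grid = vutils.make_grid(feature_maps['conv1'], normalize=True, scale_each=True, nrow=8)
writer.add_image('feature map in conv1', fmap_1_grid, global_step=322)
writer.close()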
Logging kernels
import torch.nn as nn
import torchvision.models as models
import torchvision.utils as vutils
from torch.utils.tensorboard import SummaryWriter

flag = 1
if flag:
    writer = SummaryWriter(comment='test_your_comment', filename_suffix="_test_your_filename_suffix")

    alexnet = models.alexnet(pretrained=True)

    kernel_num = -1
    vis_max = 1  # visualize only the first vis_max + 1 conv layers

    for sub_module in alexnet.modules():
        if isinstance(sub_module, nn.Conv2d):
            kernel_num += 1
            if kernel_num > vis_max:
                break
            kernels = sub_module.weight
            c_out, c_in, k_h, k_w = tuple(kernels.shape)  # Conv2d weight is (out, in, kH, kW)

            # one grid per output channel, split along the input channels
            for o_idx in range(c_out):
                kernel_idx = kernels[o_idx, :, :, :].unsqueeze(1)  # make_grid expects BCHW; expand the C dim
                kernel_grid = vutils.make_grid(kernel_idx, normalize=True, scale_each=True, nrow=c_in)
                writer.add_image('{}_Convlayer_split_in_channel'.format(kernel_num), kernel_grid, global_step=o_idx)

            # all kernels at once, grouping every 3 input channels as one RGB image
            kernel_all = kernels.view(-1, 3, k_h, k_w)
            kernel_grid = vutils.make_grid(kernel_all, normalize=True, scale_each=True, nrow=8)
            writer.add_image('{}_all'.format(kernel_num), kernel_grid, global_step=322)

            print("{}_convlayer shape:{}".format(kernel_num, tuple(kernels.shape)))

    writer.close()
Launching TensorBoard
tensorboard --logdir=logs --port=6006  # specify the event-file directory and the port
Note that --logdir must point at the directory where SummaryWriter wrote its event files; with the writers above (no log_dir argument), that is ./runs by default.
Using GPUs
For multi-GPU training, use an environment variable to control which GPUs are visible (a DataParallel sketch follows the snippet):
import os
import torch

gpu_list = [0, 1]
gpu_list_str = ','.join(map(str, gpu_list))
# must be set before CUDA is first initialized; setdefault keeps any value
# already exported in the shell environment
os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
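With the visible devices set, a common way to actually spread training across them is nn.DataParallel (a minimal sketch with a toy model; torch.nn.parallel.DistributedDataParallel is generally preferred for serious multi-GPU training):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 1)  # toy model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the model and splits each batch across visible GPUs
model.to(device)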
Regularization: weight_decay
optim_wdecay = torch.optim.SGD(net_weight_decay.parameters(), lr=lr_init, momentum=0.9, weight_decay=1e-2)  # net_weight_decay and lr_init are defined elsewhere in the script
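For reference, with plain SGD (ignoring momentum) the weight_decay term amounts to an L2 penalty folded into the gradient step:

$$w_{t+1} = w_t - lr\left(\nabla L(w_t) + \lambda\, w_t\right), \qquad \lambda = \text{weight\_decay}$$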
On the focal loss function (the penalty-reduced variant used for keypoint heatmaps)
$$
L_{kp} = -\frac{1}{N}\sum_{cvu}\begin{cases}
(1-\hat{y}_{cvu})^{\alpha}\log(\hat{y}_{cvu}), & \text{if } y_{cvu}=1\\
(1-y_{cvu})^{\beta}\,\hat{y}_{cvu}^{\alpha}\log(1-\hat{y}_{cvu}), & \text{otherwise}
\end{cases}
$$
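As a concrete reference, a minimal PyTorch sketch of this loss (assuming pred is a sigmoid-activated heatmap and gt holds the smoothed ground-truth targets; the clamping is added here for numerical stability):

import torch

def keypoint_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    # pred, gt: heatmaps of shape (B, C, H, W); pred in (0, 1)
    pred = pred.clamp(1e-6, 1 - 1e-6)  # avoid log(0)
    pos_mask = gt.eq(1).float()        # y_cvu == 1
    neg_mask = gt.lt(1).float()        # everything else
    pos_loss = (1 - pred).pow(alpha) * torch.log(pred) * pos_mask
    neg_loss = (1 - gt).pow(beta) * pred.pow(alpha) * torch.log(1 - pred) * neg_mask
    num_pos = pos_mask.sum().clamp(min=1)  # N: number of positive keypoints
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos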
When computing the gradient of $L_{kp}$, split the discussion by the value of $y_{cvu}$:
- If $y_{cvu}=1$: $L_1 = (1-\hat{y}_{cvu})^{\alpha}\log(\hat{y}_{cvu})$
- If $y_{cvu}=0$: $L_2 = \hat{y}_{cvu}^{\alpha}\log(1-\hat{y}_{cvu})$
- If $y_{cvu}\neq 1$ in general (i.e. $0 \le y_{cvu} < 1$): the term is the case above multiplied by the factor $(1-y_{cvu})^{\beta}$
Differentiating $L_1$ with respect to $\hat{y}_{cvu}$ gives:

$$\frac{\partial L_1}{\partial \hat{y}_{cvu}} = -\alpha\,(1-\hat{y}_{cvu})^{\alpha-1}\log(\hat{y}_{cvu}) + \frac{(1-\hat{y}_{cvu})^{\alpha}}{\hat{y}_{cvu}}$$
And differentiating $L_2$:

$$\frac{\partial L_2}{\partial \hat{y}_{cvu}} = \alpha\,\hat{y}_{cvu}^{\alpha-1}\log(1-\hat{y}_{cvu}) - \frac{\hat{y}_{cvu}^{\alpha}}{1-\hat{y}_{cvu}}$$