Unstable training results: what to do when model training results cannot be reproduced

Article link: https://blog.youkuaiyun.com/qq_44901346/article/details/116262346

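This article discusses how to reproduce deep learning experiment results by setting random seeds. It points out in particular that the 'bilinear' mode of the nn.Upsample layer can make results nondeterministic, and recommends the 'nearest' mode for more consistent results. Controlling the random seeds and adjusting the model structure makes deep learning experiments easier to control and reproduce.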


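In general, running the same network several times will not give identical results. To reproduce a result, first try fixing the random seeds, which removes most of the network's randomness. Set the seeds as follows: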

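import logging
import os
import random
import numpy as np
import torch

logger = logging.getLogger(__name__)

def set_seed(manual_seed=None):  # wrapper function name is illustrative
    if manual_seed is not None:
        logger.info(f'Seed the RNG for all devices with {manual_seed}')
        # Only inherited by child processes; this interpreter's hash seed is already fixed
        os.environ['PYTHONHASHSEED'] = str(manual_seed)
        torch.manual_seed(manual_seed)
        torch.cuda.manual_seed(manual_seed)  # silently ignored if CUDA is unavailable
        torch.cuda.manual_seed_all(manual_seed)  # if you are using multi-GPU
        random.seed(manual_seed)
        np.random.seed(manual_seed)  # seed must be in [0, 2**32)
        torch.backends.cudnn.deterministic = True  # use deterministic cuDNN kernels
        torch.backends.cudnn.benchmark = False  # no autotuning, so kernel choice stays fixed
        torch.backends.cudnn.enabled = True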
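A quick way to confirm that seeding works is to reseed and draw again: the same seed must give the same numbers. The sketch below checks this only for Python's random module and NumPy, so it runs without a GPU; the helper name draw_after_seeding is hypothetical.

```python
import random

import numpy as np


def draw_after_seeding(seed):
    # Hypothetical helper: reseed both generators, then draw a few values from each.
    random.seed(seed)
    np.random.seed(seed)
    return [random.random() for _ in range(3)], np.random.rand(3).tolist()


first = draw_after_seeding(1234)
second = draw_after_seeding(1234)
other = draw_after_seeding(4321)
print(first == second)  # same seed: the draws should match
print(first == other)   # different seed: the draws should differ
```

The same check works for torch.rand after torch.manual_seed. If these draws match between runs but GPU training still diverges, the remaining randomness comes from the operations themselves rather than from the random number generators.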

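The seed setup should run at the very beginning of the training code.
To use a different seed for each experiment, you can generate one with the code below; just make sure each seed is recorded somewhere so the result can be reproduced later.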

import time
seed = int(time.time() * 256) % 2**32  # np.random.seed only accepts seeds in [0, 2**32)
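One way to record the seed is to write it to a small JSON file next to the run's outputs and read it back when reproducing the run. This is a minimal sketch: the file name seed.json and the helpers new_seed, save_seed and load_seed are hypothetical, and the modulo keeps the value inside the range that np.random.seed accepts.

```python
import json
import tempfile
import time
from pathlib import Path


def new_seed():
    # Time-based seed, folded into [0, 2**32) so np.random.seed accepts it too.
    return int(time.time() * 256) % 2**32


def save_seed(seed, run_dir):
    # Write the seed next to the run's other outputs (hypothetical file name).
    path = Path(run_dir) / "seed.json"
    path.write_text(json.dumps({"seed": seed}))
    return path


def load_seed(run_dir):
    # Read the recorded seed back when re-running the experiment.
    return json.loads((Path(run_dir) / "seed.json").read_text())["seed"]


# Usage: record a fresh seed for a run, then recover it later.
with tempfile.TemporaryDirectory() as run_dir:
    seed = new_seed()
    save_seed(seed, run_dir)
    restored = load_seed(run_dir)
    print(seed, restored)
```

In a real project the run directory would be the experiment's output folder rather than a temporary one, and the restored seed would be passed to the seeding code at the start of training.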

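If the results still cannot be reproduced after fixing the seeds, the cause most likely lies in the model definition itself.
That is what happened to me: with the seeds fixed, the first few batches produced identical results, but later the results gradually drifted further apart. After many experiments I traced the randomness to the nn.Upsample layer: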
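To pin down where two runs start to drift apart, it helps to log the loss at every step in both runs and compare the logs. A minimal sketch, assuming the per-step losses have already been collected into two Python lists; the function name first_divergence and its tol parameter are hypothetical, and the loss values in the usage lines are made up for illustration.

```python
import math


def first_divergence(losses_a, losses_b, tol=0.0):
    """Return the first step at which two runs' losses differ by more than tol,
    or None if they agree over their common length."""
    for step, (a, b) in enumerate(zip(losses_a, losses_b)):
        # isclose (rather than abs(a - b) > tol) also flags NaN as a mismatch.
        if not math.isclose(a, b, rel_tol=0.0, abs_tol=tol):
            return step
    return None


# Made-up losses from two runs with the same seed.
run_a = [2.31, 1.87, 1.52, 1.30, 1.12]
run_b = [2.31, 1.87, 1.52, 1.31, 1.15]
print(first_divergence(run_a, run_b))
```

Once the divergent step is known, the same comparison can be applied to the outputs or gradients of individual layers within that step, which is one systematic way to narrow the search down to a single layer.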

self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)

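Using mode='bilinear' in nn.Upsample makes the results nondeterministic. nn.Upsample is implemented with torch.nn.functional.interpolate, and the PyTorch documentation notes that its backward pass can be nondeterministic on CUDA for the linear-family modes (linear, bilinear, bicubic, trilinear).
You can change the mode to mode='nearest':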

self.up = nn.Upsample(scale_factor=2, mode='nearest')  # align_corners must not be set for 'nearest'
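Why can 'nearest' be deterministic? In nearest-neighbour upsampling every output pixel copies exactly one input pixel, so the backward pass only has to add up each input pixel's own block of output gradients, and that sum can be computed in a fixed order. The NumPy sketch below illustrates this for a single 2-D channel with a factor of 2. The function names are hypothetical, and it shows the idea rather than PyTorch's actual CUDA kernels.

```python
import numpy as np


def upsample_nearest_2x(x):
    # Forward: every output pixel copies exactly one input pixel,
    # so each input pixel is repeated into a 2x2 block.
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def upsample_nearest_2x_backward(grad_out):
    # Backward: each input pixel collects the gradients of its own 2x2 block.
    # Computing this per input pixel (a gather) fixes the summation order,
    # so repeated runs add the same numbers in the same order.
    h, w = grad_out.shape[0] // 2, grad_out.shape[1] // 2
    return grad_out.reshape(h, 2, w, 2).sum(axis=(1, 3))


x = np.arange(4.0).reshape(2, 2)
y = upsample_nearest_2x(x)
grad_x = upsample_nearest_2x_backward(np.ones_like(y))
print(y)
print(grad_x)
```

With an all-ones upstream gradient, each input pixel receives the sum of its four copies, i.e. 4.0.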

The same applies to torch.nn.functional.interpolate: mode='bilinear' introduces the same nondeterminism, so mode='nearest' can be used there as well.
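The underlying mechanism is that floating-point addition is not associative. When a GPU kernel accumulates many gradient contributions into the same element with atomic additions, the result depends on the order in which the threads happen to run. The sketch below reproduces the effect on the CPU by summing the same three numbers in every possible order; it illustrates the mechanism, not PyTorch's kernels. The helper name sums_in_all_orders is hypothetical. In PyTorch versions that provide it, torch.use_deterministic_algorithms(True) makes operations without a deterministic implementation raise a RuntimeError, which can help locate layers like this one.

```python
import itertools


def sums_in_all_orders(values):
    # Add the same numbers left to right in every possible order
    # and collect the distinct totals.
    results = set()
    for order in itertools.permutations(values):
        total = 0.0
        for v in order:
            total += v
        results.add(total)
    return results


# 1.0 is lost when added to 1e16 first, but survives when 1e16 - 1e16 is done first.
print(sums_in_all_orders([1e16, 1.0, -1e16]))  # two different totals: 0.0 and 1.0
```

A difference this small in a single step is enough to explain the behaviour described above: identical first batches, then a gap that grows as training amplifies it.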