Auto Seg-Loss: Automatic Loss Function Design

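This post introduces Auto Seg-Loss, a method that uses reinforcement learning to automatically search for loss functions optimized for a specific metric, improving semantic segmentation performance, especially on boundary-related metrics.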



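A while ago, one news story drew a lot of attention:

The National Swimming Championships sparked controversy: five top swimmers who ranked first in their preliminaries, Fu Yuanhui among them, missed the finals because of low physical fitness test scores
Fu Yuanhui ranked first in the preliminaries but missed the finals, and similar cases occurred in fencing, gymnastics, and other events. What problems does this reflect? Is it reasonable for physical fitness to decide athletes' competition results? (www.zhihu.com)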

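Does physical fitness reflect competitive ability? For ordinary people, fitness and performance in a single event are mostly positively correlated. Elite athletes, however, need training targeted at their specific event, and a higher fitness level does not imply better results. In some events (long-distance running, for example), an overly strong upper body is actually a burden. As a result, many elite athletes failed the fitness test even though they had broken Asian records in their own events. Conversely, if an athlete trained every day only for the items in the fitness test and ended up mastering long-distance running, sprinting, pull-ups, squats, and so on, they could probably far outperform ordinary people in some events on physical ability alone, but that would not be enough to make them a top athlete.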

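Why bring up this story? If we think of a neural network model as an athlete, then in many cases the evaluation metric of the event this athlete competes in (mIoU in semantic segmentation, for example) is not fully consistent with its training objective (the commonly used Cross Entropy Loss, for example). CE Loss trains a decent model most of the time, but that result comes from sufficiently strong "physical ability" and lacks optimization targeted at the specific event.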

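So can we guide training with the event's own evaluation metric, such as mIoU? Unfortunately, most evaluation metrics are non-differentiable and cannot be trained through backpropagation directly. This has not stopped researchers, of course. Many works build a differentiable approximation of the metric and use it as a surrogate loss to guide training (for example, The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks, and Learning Surrogates via Deep Embedding). However, these differentiable approximations do not necessarily train the model well: even when the training objective is consistent with the metric, the quality of the coach's training plan (the gradients) also determines how well the athlete (the model) is trained. Designing a surrogate loss therefore requires expertise and a high trial-and-error cost. Even then, many hand-designed surrogate losses cannot stand on their own and must be trained jointly with CE Loss to reach good results.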
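
To make the non-differentiability point concrete, here is a minimal sketch in plain Python. The helper names (`argmax`, `iou_for_class`) and the toy scores are illustrative assumptions, not code from the paper. It computes the IoU of one class from hard argmax predictions and shows that a small change to the scores leaves the metric unchanged, which is why the gradient is zero almost everywhere.

```python
def argmax(scores):
    """Index of the largest score (a hard, non-differentiable decision)."""
    return max(range(len(scores)), key=lambda c: scores[c])


def iou_for_class(pixel_scores, labels, cls):
    """IoU of one class, computed from hard argmax predictions."""
    preds = [argmax(s) for s in pixel_scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == cls and y == cls)
    fp = sum(1 for p, y in zip(preds, labels) if p == cls and y != cls)
    fn = sum(1 for p, y in zip(preds, labels) if p != cls and y == cls)
    denom = tp + fp + fn
    # Convention chosen for this sketch: an absent class counts as IoU 1.
    return tp / denom if denom else 1.0


# Four pixels, two classes; each row holds the per-class scores of one pixel.
scores = [[2.0, 0.5], [0.2, 1.5], [1.0, 0.9], [0.3, 0.4]]
labels = [0, 1, 1, 0]

base = iou_for_class(scores, labels, cls=0)

# Nudge every score slightly: no argmax decision flips, so the metric is flat.
nudged = [[s + 0.01 * (i + 1) for i, s in enumerate(row)] for row in scores]
shifted = iou_for_class(nudged, labels, cls=0)

print(base, shifted)
```

Both values are identical, so a gradient-based optimizer gets no signal from the metric itself.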

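Auto Seg-Loss aims to automate this process (for semantic segmentation). In short, we found that the mainstream semantic segmentation metrics (essentially composed of TP/TN/FP/FN) can all be written in terms of differentiable operations (such as addition and multiplication), quantization (one-hot), and logical operations (AND, OR). Because logical operations are really only defined on $\{0,1\}^2$, we interpolate them with a parameterized surface so that they have a differentiable definition on $[0,1]^2$, and we replace the one-hot quantization with softmax, which yields a differentiable surrogate loss. We then use a reinforcement learning algorithm (PPO2) to search over the shape of this surface, so that the surrogate loss guides training well. Figure 1 shows the overall framework of our algorithm.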

Figure 1: Overall framework
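
The relaxation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the fixed product `a * b` stands in for the interpolated AND (it agrees with logical AND on the four corners of the unit square), whereas the method searches over a parameterized surface; the names `soft_and`, `soft_or`, and `soft_iou` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; replaces the one-hot quantization."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_and(a, b):
    # Assumed interpolation: the product matches AND on {0,1}^2.
    return a * b


def soft_or(a, b):
    # De Morgan: OR(a, b) = NOT AND(NOT a, NOT b).
    return 1.0 - soft_and(1.0 - a, 1.0 - b)


def soft_iou(pixel_scores, labels, cls):
    """Relaxed IoU of one class: sum of AND over sum of OR."""
    inter = 0.0
    union = 0.0
    for s, y in zip(pixel_scores, labels):
        p = softmax(s)[cls]              # soft prediction in [0, 1]
        t = 1.0 if y == cls else 0.0     # ground truth in {0, 1}
        inter += soft_and(p, t)
        union += soft_or(p, t)
    return inter / union if union > 0 else 1.0


scores = [[2.0, 0.5], [0.2, 1.5], [1.0, 0.9], [0.3, 0.4]]
labels = [0, 1, 1, 0]
surrogate_loss = 1.0 - soft_iou(scores, labels, cls=0)

# With very confident scores the relaxed value approaches the hard IoU.
confident = [[50.0, -50.0], [-50.0, 50.0], [50.0, -50.0], [-50.0, 50.0]]
near_hard = soft_iou(confident, labels, cls=0)
print(surrogate_loss, near_hard)
```

Because every step is built from additions, multiplications, and softmax, the same expression written in an autodiff framework would have useful gradients with respect to the scores.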

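In our implementation, we tried two parameterizations: piecewise Bezier curves and piecewise linear curves. We propose a truth-table constraint and a monotonicity constraint as priors that restrict the shape of the parameterized surface. Experiments show that both parameterizations yield good surrogate losses, and that the two constraints effectively improve both the efficiency and the quality of the search.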
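
As a rough illustration of how the two constraints can shape a piecewise linear parameterization, consider the sketch below. It is an assumption-laden toy, not the paper's implementation: the curve `g` is defined by values at evenly spaced knots, the endpoints are pinned to 0 and 1 to respect the truth table, the control values must be non-decreasing for monotonicity, and the AND surface is assumed to be the product `g(a) * g(b)`. The names `make_piecewise_linear` and `h_and` are hypothetical.

```python
def make_piecewise_linear(inner_values):
    """Build g: [0, 1] -> [0, 1] from values at evenly spaced interior knots."""
    # Truth-table constraint: pin the endpoints so that g(0) = 0 and g(1) = 1.
    values = [0.0] + list(inner_values) + [1.0]
    # Monotonicity constraint: control values must never decrease.
    if any(b < a for a, b in zip(values, values[1:])):
        raise ValueError("control values must be non-decreasing")
    n = len(values) - 1  # number of linear segments

    def g(x):
        x = min(max(x, 0.0), 1.0)      # clamp to the unit interval
        i = min(int(x * n), n - 1)     # index of the segment containing x
        frac = x * n - i               # position inside that segment
        return values[i] + frac * (values[i + 1] - values[i])

    return g


g = make_piecewise_linear([0.1, 0.6])


def h_and(a, b):
    # Assumed construction of the AND surface from the 1-D curve.
    return g(a) * g(b)


# The surface reproduces the AND truth table on the corners of [0, 1]^2.
corners = [h_and(a, b) for a in (0.0, 1.0) for b in (0.0, 1.0)]
print(corners, g(0.5))
```

In a search setting, the interior control values (here `[0.1, 0.6]`) would be the parameters being tuned, and rejecting non-monotonic candidates shrinks the space that has to be explored.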

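We ran experiments on PASCAL VOC and Cityscapes. Compared with hand-designed surrogate losses and improved variants of Cross Entropy, the searched surrogate losses are on par or better on the mainstream semantic segmentation metrics, with particularly large gains on boundary-related metrics. Two points are worth mentioning: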

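  1. Possibly thanks to the design of the search space, our search is fairly efficient. On VOC with DeepLab V3+, the search for mIoU takes only about 8 hours (8 V100 GPUs; in practice the search had largely converged halfway through, and the time is roughly equivalent to two normal training runs on the same dataset);
  2. The surrogate losses we find transfer well to other model architectures and datasets, so a single search can be reused many times (moreover, we found that the parameters searched for mIoU also work for similar metrics such as FWIoU and Boundary IoU).

The two tables below show the experimental results for the two parameterizations we tried.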

Table 1: Experimental results with the piecewise Bezier parameterization

Table 2: Experimental results with the piecewise linear parameterization

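For boundary-related metrics, we found that training with a boundary metric alone makes the model attend only to the segmentation results near boundaries. Figure 2 illustrates this. By combining the boundary metric with a global metric (such as mIoU) during training, the model can improve boundary segmentation while keeping reasonable overall performance. Another interesting finding is that when the Boundary F1 score, which evaluates boundary accuracy, is used to guide training with a non-zero error tolerance, it can produce jagged boundaries, which is effectively a hack of the metric. We discuss this issue in the appendix.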

Figure 2: Using a boundary metric alone versus combining it with a global metric
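
A toy example may help show why a boundary-only objective is blind to errors away from boundaries. Everything here is a simplification chosen for illustration: boundary pixels are taken to be those with a differently labeled 4-neighbor in the ground truth, plain pixel accuracy stands in for the actual metrics, and the 0.5/0.5 weights are arbitrary. None of this is the Boundary IoU or Boundary F1 definition.

```python
def boundary_pixels(grid):
    """Coordinates whose 4-neighborhood contains a different label."""
    h, w = len(grid), len(grid[0])
    out = set()
    for r in range(h):
        for c in range(w):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and grid[rr][cc] != grid[r][c]:
                    out.add((r, c))
                    break
    return out


def accuracy(pred, gt, coords):
    """Fraction of the given pixels that are predicted correctly."""
    coords = list(coords)
    return sum(pred[r][c] == gt[r][c] for r, c in coords) / len(coords)


gt = [[0, 0, 1, 1, 1, 1],
      [0, 0, 1, 1, 1, 1],
      [0, 0, 1, 1, 1, 1]]
# The prediction is perfect near the boundary but wrong in the last column.
pred = [[0, 0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1, 0]]

all_pixels = [(r, c) for r in range(len(gt)) for c in range(len(gt[0]))]
boundary_score = accuracy(pred, gt, boundary_pixels(gt))
global_score = accuracy(pred, gt, all_pixels)

# Arbitrary equal weighting of the two terms, for illustration only.
combined_score = 0.5 * boundary_score + 0.5 * global_score
print(boundary_score, global_score, combined_score)
```

The boundary-only score is perfect even though a whole column is misclassified, while the combined score still penalizes that interior error.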

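We hope Auto Seg-Loss can reduce the trial-and-error cost, in both research and production, of designing and tuning a loss function for a given metric (such as IoU or F-score on boundary regions), and take a step toward automatic loss function design. Our paper is already available on arXiv, and the code will soon be open-sourced and integrated into several open-source segmentation frameworks. We look forward to your trying it out and to discussing it with you!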

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation (arxiv.org)