Why the multi-label option in YOLOv8's NMS is not a true multi-label implementation

License: CC 4.0 BY-SA

Tags: #yolov8 #multi-class

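This article examines how non-maximum suppression (NMS) in YOLOv8 is applied to multi-label classification and points out that NMS does not make the model a true multi-label network. It also explains in detail how the v8 loss function is computed, in particular the key role TaskAlignedAssigner plays in sample assignment and loss computation, and stresses that the model relies on the largest-area ground-truth box at each pixel, which limits its ability to train on overlapping boxes of different classes.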

I. What is multi-label?

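Multi-label classification: each sample is given a set of target labels. The labels describe attributes of the sample and are not mutually exclusive. For example, a cat in an image can carry both the labels cat and animal, so the model has to predict a set of concepts rather than a single class.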
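
The difference from ordinary multi-class prediction can be illustrated with a minimal sketch. The label set, the logits, and the 0.5 threshold below are made-up values for illustration and are not taken from YOLOv8: a multi-class head picks one label through softmax and argmax, while a multi-label head scores every label independently with a sigmoid and keeps all that pass the threshold.

```python
import math

CLASS_NAMES = ["cat", "dog", "animal"]  # hypothetical label set


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_multi_class(logits):
    """Mutually exclusive classes: return only the single most likely label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [CLASS_NAMES[best]]


def predict_multi_label(logits, threshold=0.5):
    """Independent labels: keep every label whose sigmoid score passes the threshold."""
    return [name for name, z in zip(CLASS_NAMES, logits) if sigmoid(z) > threshold]


logits = [2.0, -3.0, 1.5]  # made-up scores for one image of a cat
multi_class_result = predict_multi_class(logits)
multi_label_result = predict_multi_label(logits)
```

With these scores the multi-class prediction keeps only cat, while the multi-label prediction keeps both cat and animal.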

2. How is a multi-label task usually implemented?

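One way is to use several models: run two models in parallel on the same object, with each model predicting a different label for it. For example, one model labels the cat in the image as cat and the other labels it as animal. This approach is simple and practical, but it cannot be used where inference must run on a single model.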
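
As a rough sketch of the multi-model idea, the snippet below merges the outputs of two detectors by matching boxes on IoU. The functions `model_fine` and `model_coarse` are hypothetical stand-ins that return hard-coded detections, and the 0.5 matching threshold is an assumption chosen for illustration.

```python
def model_fine(image):
    """Hypothetical detector trained on fine-grained labels (hard-coded output)."""
    return [((10.0, 20.0, 110.0, 220.0), "cat", 0.91)]


def model_coarse(image):
    """Hypothetical detector trained on coarse labels (hard-coded output)."""
    return [((12.0, 18.0, 108.0, 222.0), "animal", 0.88)]


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def merge_predictions(dets_a, dets_b, iou_thres=0.5):
    """Attach labels from dets_b to every box in dets_a that overlaps enough."""
    merged = []
    for box_a, label_a, score_a in dets_a:
        labels = {label_a: score_a}
        for box_b, label_b, score_b in dets_b:
            if iou(box_a, box_b) >= iou_thres:
                labels[label_b] = score_b
        merged.append((box_a, labels))
    return merged


image = None  # placeholder; the stand-in models ignore their input
merged = merge_predictions(model_fine(image), model_coarse(image))
```

Here the two boxes overlap heavily, so the merged result is one box carrying both the cat and animal labels. The cost is two forward passes per image, which is exactly what a single-model deployment cannot afford.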

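The other way is to design a single network that is trained on all of an object's labels at once. Design ideas:
1. Modify the network's dataset, inputs, loss function, and label assignment strategy.
2. Branch the network output into parallel heads, similar to a multi-task network.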
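
To make the first idea more concrete, here is a minimal sketch of what changing the labels and the loss can look like: targets become multi-hot vectors and each class is scored with an independent binary cross-entropy term. The label set and logits are assumptions for illustration; this is not the YOLOv8 loss itself.

```python
import math

CLASS_NAMES = ["cat", "dog", "animal"]  # hypothetical label set


def multi_hot(labels):
    """Encode a set of label names as a 0/1 vector over CLASS_NAMES."""
    return [1.0 if name in labels else 0.0 for name in CLASS_NAMES]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes, in the numerically stable form
    max(z, 0) - z * t + log(1 + exp(-|z|))."""
    losses = [
        max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
        for z, t in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


target = multi_hot({"cat", "animal"})  # one object, two labels
good_logits = [3.0, -3.0, 3.0]   # predicts cat and animal
bad_logits = [3.0, -3.0, -3.0]   # predicts cat only, misses animal
loss_good = bce_with_logits(good_logits, target)
loss_bad = bce_with_logits(bad_logits, target)
```

Because every class has its own binary term, missing the animal label is penalized even when cat is predicted correctly, which a single-label target cannot express.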

(Neither approach is the subject of this article, so both are mentioned only in passing.)

II. multi-label in YOLOv8's nms function

First, here is the NMS source code from v8:

def non_max_suppression(
        prediction,
        conf_thres=0.25,
        iou_thres=0.45,
        classes=None,
        agnostic=False,
        multi_label=False,
        labels=(),
        max_det=300,
        nc=0,  # number of classes (optional)
        max_time_img=0.05,
        max_nms=30000,
        max_wh=7680,
):
    """
    Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.

    Args:
        prediction (torch.Tensor): A tensor of shape (batch_size, num_classes + 4 + num_masks, num_boxes)
            containing the predicted boxes, classes, and masks. The tensor should be in the format
            output by a model, such as YOLO.
        conf_thres (float): The confidence threshold below which boxes will be filtered out.
            Valid values are between 0.0 and 1.0.
        iou_thres (float): The IoU threshold below which boxes will be filtered out during NMS.
            Valid values are between 0.0 and 1.0.
        classes (List[int]): A list of class indices to consider. If None, all classes will be considered.
        agnostic (bool): If True, the model is agnostic to the number of classes, and all
            classes will be considered as one.
        multi_label (bool): If True, each box may have multiple labels.
        labels (List[List[Union[int, float, torch.Tensor]]]): A list of lists, where each inner
            list contains the apriori labels for a given image. The list should be in the format
            output by a dataloader, with each label being a tuple of (class_index, x1, y1, x2, y2).
        max_det (int): The maximum number of boxes to keep after NMS.
        nc (int, optional): The number of classes output by the model. Any indices after this will be considered masks.
        max_time_img (float): The maximum time (seconds) for processing one image.
        max_nms (int): The maximum number of boxes into torchvision.ops.nms().
        max_wh (int): The maximum box width and height in pixels

    Returns:
        (List[torch.Tensor]): A list of length batch_size, where each element is a tensor of
            shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns
            (x1, y1, x2, y2, confidence, class, mask1, mask2, ...).
    """

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv8 model in validation model, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = nc or (prediction.shape[1] - 4)  # number of classes
    nm = prediction.shape[1] - nc - 4
    mi = 4 + nc  # mask