DW_Object Detection Basics

Licensed under CC 4.0 BY-SA

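This article introduces the basic concepts of object detection, including object localization, IoU computation, and the detailed structure of the VOC dataset. It focuses on the VOC dataset's class breakdown, its sample information, and how the dataloader is built, in particular the implementation of PascalVOCDataset and its data preprocessing.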

Basic concepts of object detection
The VOC object detection dataset
- Dataset overview
- Building the dataloader

Basic concepts of object detection

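Object detection: on top of recognizing the class of each object in the image (image classification), the task must also localize each object precisely and mark it with a bounding rectangle.
Object location: a sliding window proposes a large number of candidate boxes that enumerate the possible regions of the image, and each candidate box is then classified and fine-tuned. Every region therefore yields five attributes (class, x1, y1, x2, y2), and aggregating them gives the classes and coordinates of the objects in the image.
In addition, every box sent through the classification network receives a score (the confidence of that box). The highest-scoring box is taken to be the most accurate one, and its position is the final detected location of the object.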
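The sliding-window idea and the "keep the highest-scoring box" rule can be sketched in plain Python. This is only an illustration: `sliding_windows`, `best_box`, and `toy_score` are hypothetical names, and `toy_score` is a stand-in for a real classification network, assumed here to prefer boxes centred near (60, 60).

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> List[Box]:
    """Enumerate square candidate boxes of side `win`, shifted by `stride` pixels."""
    boxes = []
    for y1 in range(0, img_h - win + 1, stride):
        for x1 in range(0, img_w - win + 1, stride):
            boxes.append((x1, y1, x1 + win, y1 + win))
    return boxes


def best_box(boxes: List[Box], score_fn: Callable[[Box], float]) -> Tuple[Box, float]:
    """Return the candidate with the highest score, together with that score."""
    scored = [(box, score_fn(box)) for box in boxes]
    return max(scored, key=lambda pair: pair[1])


def toy_score(box: Box) -> float:
    """Hypothetical classifier stand-in: larger when the box centre is closer to (60, 60)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return -((cx - 60) ** 2 + (cy - 60) ** 2)


candidates = sliding_windows(img_w=100, img_h=100, win=40, stride=20)
box, score = best_box(candidates, toy_score)
print(len(candidates), box)
```

A real detector would also vary the window size and aspect ratio, and would keep one box per object rather than a single global maximum.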
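Bounding box definition: a detection label has five components. Besides the class label, it must also carry the object's location, that is, the object's bounding box (bbox).
A bbox is usually expressed in one of two formats: (x1, y1, x2, y2) or (x_c, y_c, w, h).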

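Each of the two formats is the more convenient one to compute with in different situations later on.
The conversion between the two formats is implemented in utils.py: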

import torch


def xy_to_cxcy(xy):
    """
    Convert bounding boxes from boundary coordinates (x_min, y_min, x_max, y_max) to center-size coordinates (c_x, c_y, w, h).

    :param xy: bounding boxes in boundary coordinates, a tensor of size (n_boxes, 4)
    :return: bounding boxes in center-size coordinates, a tensor of size (n_boxes, 4)
    """
    return torch.cat([(xy[:, 2:] + xy[:, :2]) / 2,  # c_x, c_y
                      xy[:, 2:] - xy[:, :2]], 1)  # w, h


def cxcy_to_xy(cxcy):
    """
    Convert bounding boxes from center-size coordinates (c_x, c_y, w, h) to boundary coordinates (x_min, y_min, x_max, y_max).

    :param cxcy: bounding boxes in center-size coordinates, a tensor of size (n_boxes, 4)
    :return: bounding boxes in boundary coordinates, a tensor of size (n_boxes, 4)
    """
    return torch.cat([cxcy[:, :2] - (cxcy[:, 2:] / 2),  # x_min, y_min
                      cxcy[:, :2] + (cxcy[:, 2:] / 2)], 1)  # x_max, y_max
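
To check the arithmetic of the two conversions by hand, here is a minimal single-box version in plain Python. The helper names `xy_to_cxcy_single` and `cxcy_to_xy_single` are hypothetical and are not part of utils.py; they mirror the tensor functions above for one box only.

```python
def xy_to_cxcy_single(box):
    """(x_min, y_min, x_max, y_max) -> (c_x, c_y, w, h) for a single box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)


def cxcy_to_xy_single(box):
    """(c_x, c_y, w, h) -> (x_min, y_min, x_max, y_max) for a single box."""
    c_x, c_y, w, h = box
    return (c_x - w / 2, c_y - h / 2, c_x + w / 2, c_y + h / 2)


box_xy = (10.0, 20.0, 50.0, 80.0)
box_cxcy = xy_to_cxcy_single(box_xy)      # centre (30, 50), width 40, height 60
round_trip = cxcy_to_xy_single(box_cxcy)  # converting back recovers the original box
print(box_cxcy, round_trip)
```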

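Intersection over Union (IoU)

Procedure:

1. Get the coordinates of the two boxes. Red box: top-left (red_x1, red_y1), bottom-right (red_x2, red_y2). Green box: top-left (green_x1, green_y1), bottom-right (green_x2, green_y2).
2. Take the coordinate-wise maximum of the two top-left points, (max(red_x1, green_x1), max(red_y1, green_y1)), and the coordinate-wise minimum of the two bottom-right points, (min(red_x2, green_x2), min(red_y2, green_y2)).
3. Use the two points from step 2 to compute the area of the yellow box (the overlap region): yellow_area. If the boxes do not overlap, the resulting width or height is negative and must be clamped to 0.
4. Compute the areas of the red and green boxes: red_area and green_area.
5. iou = yellow_area / (red_area + green_area - yellow_area)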
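
The five steps above can be sketched for a single pair of boxes in plain Python. This is an illustrative scalar version, with a hypothetical function name `iou`, not the batched tensor implementation; the clamp to 0 for non-overlapping boxes and the zero-union guard are assumptions added for safety.

```python
def iou(red, green):
    """IoU of two boxes given as (x1, y1, x2, y2) in boundary coordinates."""
    red_x1, red_y1, red_x2, red_y2 = red
    green_x1, green_y1, green_x2, green_y2 = green

    # Step 2: corners of the overlap (yellow) box.
    inter_x1, inter_y1 = max(red_x1, green_x1), max(red_y1, green_y1)
    inter_x2, inter_y2 = min(red_x2, green_x2), min(red_y2, green_y2)

    # Step 3: overlap area, clamped to 0 when the boxes do not intersect.
    yellow_area = max(0.0, inter_x2 - inter_x1) * max(0.0, inter_y2 - inter_y1)

    # Step 4: areas of the individual boxes.
    red_area = (red_x2 - red_x1) * (red_y2 - red_y1)
    green_area = (green_x2 - green_x1) * (green_y2 - green_y1)

    # Step 5: intersection over union (guarding against degenerate boxes).
    union = red_area + green_area - yellow_area
    return yellow_area / union if union > 0 else 0.0


overlapping = iou((0, 0, 4, 4), (2, 2, 6, 6))  # overlap 4, union 28
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))     # no overlap
identical = iou((0, 0, 4, 4), (0, 0, 4, 4))    # same box
print(overlapping, disjoint, identical)
```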

def find_intersection(set_1, set_2):
    """ 
    Find the intersection of every box combination between two sets of boxes that are in boundary coordinates.

    :param set_1: set 1, a tensor of dimensions (n1, 4)                                                                                                           
    :param set_2: set 2, a tensor of dimensions (n2, 4)
    :return: intersection of each of the boxes in set 1 with respect to each of the boxes in set 2, a tensor of dimensions (n1, n2)
    """

    # PyTorch auto-broadcasts singleton dimensions
    lower_bounds = torch.max(set_1[:, :