The difference between the scale and multi-scale training parameters in YOLO

When training YOLOv5/v7, there are two parameters related to multiple scales: one is scale, the other is multi-scale (YOLOv8 removed the latter).
scale is set in the hyperparameter configuration file:
[screenshot: the scale entry in the hyperparameter YAML]
multi-scale is set as a command-line option of the training script:
[screenshot: the --multi-scale argument of train.py]
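
For reference, a rough sketch of what the two settings look like in the public YOLOv5/v7 repositories (file names and default values vary between repos and versions, so treat these as illustrative):

    # hyperparameter YAML (e.g. data/hyps/hyp.scratch-low.yaml in YOLOv5)
    scale: 0.5  # image scale (+/- gain)

    # training command: multi-scale is a boolean flag of train.py
    python train.py --img 640 --multi-scale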
So what is the difference between these two parameters?
First, let's look at where each of them is used in the code.
scale
scale is used in random_perspective in datasets.py; it randomly scales the image (it is the scale component of the random perspective/affine transform).

    # Rotation and Scale -- the `scale` hyperparameter sets the range of the random factor s
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1.1 + scale)  # e.g. scale=0.5 -> s in [0.5, 1.6]
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)
    ...

    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        if perspective:
            img = cv2.warpPerspective(img, M, dsize=(width, height), borderValue=(114, 114, 114))
        else:  # affine
            img = cv2.warpAffine(img, M[:2], dsize=(width, height), borderValue=(114, 114, 114))

After the image has been transformed, the box annotations must be transformed with the same matrix:

    # warp boxes with the combined matrix M
    xy = np.ones((n * 4, 3))
    xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
    xy = xy @ M.T  # transform
    xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine

    # create new boxes
    x = xy[:, [0, 2, 4, 6]]
    y = xy[:, [1, 3, 5, 7]]
    new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

    # clip
    new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
    new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)

scale takes effect while the dataset is being loaded: the resulting image still has the network's input size, but the size of the objects (their proportion of the image) changes.
[screenshot: example of a training image after the scale augmentation]
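
To make this concrete, here is a minimal, self-contained sketch (not the YOLO code itself) that applies only the scaling part of random_perspective to a dummy 640x640 image and a single box, assuming scale=0.5. It shows that the canvas size stays fixed while the box, and therefore the object's share of the image, changes:

    import random

    import cv2
    import numpy as np

    scale = 0.5
    s = random.uniform(1 - scale, 1.1 + scale)  # same range as the snippet above
    M = np.eye(3)
    M[:2] = cv2.getRotationMatrix2D(angle=0, center=(0, 0), scale=s)  # pure scaling, no rotation

    img = np.full((640, 640, 3), 114, np.uint8)  # dummy letterboxed image
    out = cv2.warpAffine(img, M[:2], dsize=(640, 640), borderValue=(114, 114, 114))
    print(out.shape)  # (640, 640, 3): the canvas size is unchanged

    box = np.array([[100, 100, 200, 200]], dtype=float)  # one x1y1x2y2 box
    xy = np.ones((4, 3))
    xy[:, :2] = box[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(4, 2)  # the 4 corners
    xy = xy @ M.T  # transform the corners with the same matrix
    new = [xy[:, 0].min(), xy[:, 1].min(), xy[:, 0].max(), xy[:, 1].max()]
    print(new)  # box coordinates scaled by s: the object's proportion of the image changed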

multi-scale

multi-scale is applied in train.py, right after a batch has been read from the dataset:

    # Multi-scale
    if opt.multi_scale:
        sz = random.randrange(imgsz * 0.5, imgsz * 1.5 + gs) // gs * gs  # size
        sf = sz / max(imgs.shape[2:])  # scale factor
        if sf != 1:
            ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]]  # new shape (stretched to gs-multiple)
            imgs = F.interpolate(imgs, size=ns, mode='bilinear', align_corners=False)

The resulting image size no longer necessarily equals the network's nominal input size, so this effectively trains the network with multiple input resolutions. The boxes are not rescaled here because the label coordinates are normalized; resizing the whole image does not change an object's relative position or proportion, so no box processing is needed.
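
A minimal runnable sketch of the same step outside train.py, assuming imgsz=640 and gs=32 (the maximum stride) and a dummy batch in place of the dataloader output (the int() casts are added here because newer Python versions reject float arguments to randrange):

    import math
    import random

    import torch
    import torch.nn.functional as F

    imgsz, gs = 640, 32  # assumed nominal input size and maximum stride
    imgs = torch.zeros(2, 3, 640, 640)  # dummy batch (B, C, H, W)

    sz = random.randrange(int(imgsz * 0.5), int(imgsz * 1.5 + gs)) // gs * gs  # random size, multiple of gs
    sf = sz / max(imgs.shape[2:])  # scale factor relative to the current batch
    if sf != 1:
        ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]]  # new H, W (gs-multiples)
        imgs = F.interpolate(imgs, size=ns, mode='bilinear', align_corners=False)
    print(imgs.shape)  # e.g. torch.Size([2, 3, 416, 416]): the whole batch was resized

    # The labels stay untouched: they are normalized xywh in [0, 1], so resizing the
    # image does not change an object's relative position or proportion.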

Closing remarks

scale and multi-scale are two scale-related parameters in YOLO. YOLOv8 removed multi-scale, and the project maintainers on GitHub say that training with this option is not recommended.
