Object Detection: Anchor Box Concept and Code Implementation (优快云 Blog)

Article link: https://blog.youkuaiyun.com/TOPthemaster/article/details/120831120

Preface

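After image classification, the natural next step is the more complex task of object detection. Starting with this chapter, I will keep notes on what I learn about object detection in images. Most of the ideas and code come from Li Mu's course Dive into Deep Learning (动手学深度学习). However, I will avoid the d2l library wherever possible and instead extract or rewrite its functions, so that they are easier to understand and can be used on their own.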

Anchor Box Concept

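In object detection we have to output the coordinates of the box that encloses each target. Early deep-learning detectors introduced anchor boxes for this: a handful of virtual boxes (around 5) with different sizes and aspect ratios are laid out in advance, each centered on a pixel, and this is repeated for every pixel. For example:
given a 500x500 input image with 5 anchors per pixel, the number of anchor boxes is:
500 x 500 x 5 = 1,250,000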
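Where does "about 5 per pixel" come from? In the multibox_prior function in this post, n sizes and m aspect ratios do not give n × m anchors per pixel but n + m - 1: every size is paired with the first ratio, and the first size is paired with each remaining ratio. Below is a minimal plain-Python sketch of that bookkeeping. The helper names anchor_shapes, count_anchors and anchor_wh_pixels, and the example sizes and ratios, are illustrative assumptions and not part of the original post or the d2l library. The pixel width and height formula is derived from how multibox_prior scales w by in_height / in_width.

```python
import math


def anchor_shapes(sizes, ratios):
    """Return the (size, ratio) pairs used at every pixel.

    Mirrors the pairing in multibox_prior: each size with ratios[0],
    then sizes[0] with each remaining ratio, giving
    len(sizes) + len(ratios) - 1 pairs instead of len(sizes) * len(ratios).
    """
    pairs = [(s, ratios[0]) for s in sizes]
    pairs += [(sizes[0], r) for r in ratios[1:]]
    return pairs


def count_anchors(img_h, img_w, sizes, ratios):
    """Total anchors for an img_h x img_w image: one full set per pixel."""
    return img_h * img_w * len(anchor_shapes(sizes, ratios))


def anchor_wh_pixels(size, ratio, img_h):
    """Anchor width and height in pixels under multibox_prior's convention
    (derived from its code): size is relative to the image height and
    width / height == ratio."""
    return size * math.sqrt(ratio) * img_h, size / math.sqrt(ratio) * img_h


sizes = [0.75, 0.5, 0.25]  # illustrative values (assumption)
ratios = [1, 2, 0.5]       # illustrative values (assumption)
print(anchor_shapes(sizes, ratios))
# [(0.75, 1), (0.5, 1), (0.25, 1), (0.75, 2), (0.75, 0.5)]
print(count_anchors(500, 500, sizes, ratios))  # 1250000
print(anchor_wh_pixels(0.5, 2, 500))  # width is twice the height
```

With 3 sizes and 3 ratios this gives the 5 anchors per pixel used in the example above. Pairing only with the first size/ratio keeps the anchor count linear in n + m, at the cost of not covering every size/ratio combination.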
The code implementation is as follows:


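import torch


def multibox_prior(data, sizes, ratios):
    """Generate anchor boxes centered on every pixel of `data` (shape [..., H, W]).

    Returns a tensor of shape (1, H * W * (len(sizes) + len(ratios) - 1), 4)
    holding (xmin, ymin, xmax, ymax) in coordinates normalized to [0, 1].
    """
    in_height, in_width = data.shape[-2:]
    device = data.device
    num_sizes, num_ratios = len(sizes), len(ratios)
    boxes_per_pixel = num_sizes + num_ratios - 1  # number of anchors per pixel

    size_tensor = torch.tensor(sizes, device=device, dtype=torch.float32)
    ratio_tensor = torch.tensor(ratios, device=device, dtype=torch.float32)

    # Shift by half a pixel so that each anchor is centered on its pixel
    offset_h, offset_w = 0.5, 0.5
    # Normalization: height/width of one pixel in [0, 1] coordinates
    steps_h = 1.0 / in_height
    steps_w = 1.0 / in_width

    # Normalized coordinates of every pixel center
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing="ij")
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)

    # Each pixel has boxes_per_pixel anchors, so repeat every center boxes_per_pixel times
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1).repeat_interleave(boxes_per_pixel, dim=0)

    # Width/height of each anchor shape: all sizes with ratios[0], then sizes[0] with the other ratios.
    # Multiplying w by in_height / in_width handles non-square images: the anchor's
    # width:height ratio in pixels stays equal to r, and its size is relative to the image height.
    w = torch.cat(
        (size_tensor * torch.sqrt(ratio_tensor[0]), sizes[0] * torch.sqrt(ratio_tensor[1:]))) * in_height / in_width
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]), sizes[0] / torch.sqrt(ratio_tensor[1:])))

    # Offsets of the top-left and bottom-right corners from the pixel center (half width/height)
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(in_height * in_width, 1) / 2

    output = out_grid + anchor_manipulations
    return output.unsqueeze(0)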
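
from PIL import Image
import matplotlib.pyplot as plt
import torch
from torchvision import transforms

img_get = Image.open("../img/1.jpeg")  # read the image
plt.imshow(img_get)
# plt.show()
print(img_get)
trans = transforms.Compose([  # chain all transform operations together
    transforms.ToTensor()  # PIL image (H, W, C) in [0, 255] -> float tensor (C, H, W) in [0, 1]
])
img = img_get.convert("RGB")
img = trans(img)
img = torch.unsqueeze(img, dim=0)  # add a batch dimension: (1, C, H, W)
print(img.shape)
h, w = img.shape[-2:]
print(h, w)
# Build a placeholder tensor with the same size as the image (multibox_prior only uses its shape and device)
X = torch.rand(size=(1, 3, h, w))

print("X.shape", X.shape)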
print(