A Hardcore Source-Code Walkthrough of the RPN (Region Proposal Network)

An In-Depth Look at the Anchor Generation Mechanism


License: CC 4.0 BY-SA

Tags: #Faster-RCNN #RPN #source-code-walkthrough


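This article examines how anchors are generated in object detection, covering a reading of RBG's Caffe source code and a simplified PyTorch implementation. From the parameter settings to the logic of the core functions, it explains in detail how anchors of different scales and aspect ratios are produced.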


1. Anchor Generation Layer

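This walkthrough of anchor generation draws on two code bases:

RBG's Caffe source: https://github.com/rbgirshick/py-faster-rcnn
A PyTorch reimplementation on GitHub: https://github.com/chenyuntc/simple-faster-rcnn-pytorch

The two generate anchors with different techniques, so they are discussed separately. The focus is on RBG's code, which is used to explain both the principle and the tricks behind anchor generation.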

1.1 Caffe source code

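First, the important parameters:
- base_size=16: after the convolution and pooling layers, the feature map is $\frac{1}{16}$ the size of the input image, so each cell of the feature map used for sampling anchors corresponds to a $16 \times 16$ region of the input image.
- ratios=[0.5, 1, 2]: the aspect ratios at a fixed anchor area, expressed as height : width (the code computes hs = ws * ratios), i.e. $[1:2 \quad 1:1 \quad 2:1]$.
- scales=[8, 16, 32]: the factors by which the anchors are enlarged. Where they are used is explained in detail later.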
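
To make the role of base_size concrete, here is a minimal sketch that maps a feature-map cell back to the image window it covers. It assumes an exact stride of 16 with no padding effects; the helper name `cell_to_image_window` is hypothetical and does not appear in either code base.

```python
def cell_to_image_window(row, col, stride=16):
    """Image-space window (xmin, ymin, xmax, ymax) covered by one feature-map cell.

    Assumes the feature map is exactly 1/stride of the input image.
    """
    xmin = col * stride
    ymin = row * stride
    # Pixel-inclusive corners, so a 16-pixel-wide window ends at xmin + 15
    return (xmin, ymin, xmin + stride - 1, ymin + stride - 1)


first_cell = cell_to_image_window(0, 0)
other_cell = cell_to_image_window(2, 3)
print(first_cell)   # (0, 0, 15, 15)
print(other_cell)   # (48, 32, 63, 47)
```

The window of cell (0, 0) is exactly the (0, 0, 15, 15) reference window that the anchor code starts from.
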
Next, we walk through the anchor generation flow following RBG's source code.
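```python
import numpy as np


def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """

    # Reference window (xmin, ymin, xmax, ymax) = (0, 0, 15, 15)
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    # One anchor per aspect ratio, all centered on the base anchor
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    # One anchor per scale for each ratio anchor
    # (the original Python 2 source uses xrange here)
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])
    return anchors
```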
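  - generate_anchors() is where everything starts. It first defines base_anchor: image coordinates take the top-left corner as the origin (0, 0), so the base_anchor coordinates (xmin, ymin, xmax, ymax) are (0, 0, 15, 15).
  - Next, it calls _ratio_enum(), shown below.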
```python
def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """

    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
```
  - To compute w, h, x_ctr, y_ctr, it in turn calls _whctrs(), shown below.
```python
def _whctrs(anchor):
    """
    Return width, height, x center, and y center for an anchor (window).
    """

    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr
```
  - _whctrs() takes an anchor given as (top-left x, top-left y, bottom-right x, bottom-right y) and converts it to (width, height, center x, center y).
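
As a quick sanity check of this conversion, the sketch below repeats the same arithmetic in plain Python and round-trips the base anchor. The names `whctrs` and `to_corners` are illustrative only; `to_corners` is a hypothetical inverse helper that is not part of py-faster-rcnn.

```python
def whctrs(anchor):
    """Corner form (xmin, ymin, xmax, ymax) -> (w, h, x_ctr, y_ctr), pixel-inclusive."""
    xmin, ymin, xmax, ymax = anchor
    w = xmax - xmin + 1
    h = ymax - ymin + 1
    return w, h, xmin + 0.5 * (w - 1), ymin + 0.5 * (h - 1)


def to_corners(w, h, x_ctr, y_ctr):
    """Inverse conversion (hypothetical helper for this example)."""
    return (x_ctr - 0.5 * (w - 1), y_ctr - 0.5 * (h - 1),
            x_ctr + 0.5 * (w - 1), y_ctr + 0.5 * (h - 1))


base = (0, 0, 15, 15)
center_form = whctrs(base)
print(center_form)                # (16, 16, 7.5, 7.5)
print(to_corners(*center_form))   # (0.0, 0.0, 15.0, 15.0)
```

The +1 and -1 terms come from treating both corner pixels as part of the box, which is why a box from 0 to 15 is 16 pixels wide and centered at 7.5.
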
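- Back to _ratio_enum():
  - For base_anchor, (width, height, center x, center y) evaluates to (16, 16, 7.5, 7.5).
  - size = w x h = 16 x 16 = 256
  - size_ratios = $\frac{256}{[0.5 \quad 1 \quad 2]} = [512 \quad 256 \quad 128]$
  - Taking the square root of size_ratios and rounding gives ws = [23, 16, 11].
  - Multiplying ws by ratios and rounding again gives hs = [12, 16, 22].
  - ws and hs are the widths and heights of anchors with different aspect ratios at the same nominal area. Because of the rounding, however, the areas ws x hs are not exactly equal (276, 256 and 242 here).
  - With these values in hand, _mkanchors() is called to build and return the anchors. The function is shown below.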
```python
def _mkanchors(ws, hs, x_ctr, y_ctr):
    """
    Given a vector of widths (ws) and heights (hs) around a center
    (x_ctr, y_ctr), output a set of anchors (windows).
    """

    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors
```
  - Plugging the values into the code above gives the following calculations, where ws - 1 = [22, 15, 10] and hs - 1 = [11, 15, 21]:
    
    $x_{min} = 7.5 - \frac{1}{2}\left[\begin{matrix} 22 \\ 15 \\ 10 \end{matrix}\right] = \left[\begin{matrix} -3.5\\ 0\\ 2.5\end{matrix}\right]$
    
    $y_{min} = 7.5 - \frac{1}{2}\left[\begin{matrix} 11\\ 15\\ 21\end{matrix}\right] = \left[\begin{matrix} 2\\ 0\\ -3\end{matrix}\right]$
    
    $x_{max} = 7.5 + \frac{1}{2}\left[\begin{matrix} 22 \\ 15 \\ 10 \end{matrix}\right] = \left[\begin{matrix} 18.5\\ 15\\ 12.5\end{matrix}\right]$
    
    $y_{max} = 7.5 + \frac{1}{2}\left[\begin{matrix} 11\\ 15\\ 21\end{matrix}\right] = \left[\begin{matrix} 13\\ 15\\ 18\end{matrix}\right]$
  - The resulting anchors are $\left[\begin{matrix} -3.5 & 2 & 18.5 & 13\\ 0 & 0 & 15 & 15\\ 2.5 & -3 & 12.5 & 18\end{matrix}\right]$
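  - These are the anchors for the different aspect ratios, all with an area of roughly 256 and all centered at (7.5, 7.5). As the formulas show, each coordinate is the center value 7.5 minus or plus half of (width - 1) or (height - 1), which yields new coordinates in (top-left x, top-left y, bottom-right x, bottom-right y) form. Some coordinates are negative because the corresponding corner falls outside the image.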
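
The ratio step can be reproduced in a few lines of NumPy. This is a condensed sketch that folds _whctrs(), _ratio_enum() and _mkanchors() into one function, not the original code; the function name `ratio_anchors_sketch` is made up for this example.

```python
import numpy as np


def ratio_anchors_sketch(base_size=16, ratios=(0.5, 1, 2)):
    """Widths, heights and corner-form anchors for each aspect ratio."""
    ratios = np.asarray(ratios, dtype=float)
    ctr = 0.5 * (base_size - 1)                 # 7.5 for base_size=16
    size = float(base_size * base_size)         # 256
    ws = np.round(np.sqrt(size / ratios))       # widths at (roughly) equal area
    hs = np.round(ws * ratios)                  # heights from the h/w ratio
    anchors = np.stack([ctr - 0.5 * (ws - 1),   # xmin
                        ctr - 0.5 * (hs - 1),   # ymin
                        ctr + 0.5 * (ws - 1),   # xmax
                        ctr + 0.5 * (hs - 1)],  # ymax
                       axis=1)
    return ws, hs, anchors


ws, hs, anchors = ratio_anchors_sketch()
print(ws.tolist())          # [23.0, 16.0, 11.0]
print(hs.tolist())          # [12.0, 16.0, 22.0]
print((ws * hs).tolist())   # [276.0, 256.0, 242.0]
print(anchors.tolist())
```

Note that np.round rounds halves to the nearest even number, so 23 * 0.5 = 11.5 becomes 12.
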
- With these anchors, we return to generate_anchors():
  - The chain of calls above produced ratio_anchors, namely $\left[\begin{matrix} -3.5 & 2 & 18.5 & 13\\ 0 & 0 & 15 & 15\\ 2.5 & -3 & 12.5 & 18\end{matrix}\right]$
  - The last step calls _scale_enum() to obtain anchors for every combination of scale and aspect ratio. The scales are [8, 16, 32]. _scale_enum() is called once for each row of ratio_anchors, that is, once for each aspect-ratio anchor centered at (7.5, 7.5). Each call returns 3 anchors, one per scale, so the final result has 9 anchors. _scale_enum() is shown below.
```python
def _scale_enum(anchor, scales):
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    """

    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
```
  - Take $[-3.5 \quad 2 \quad 18.5 \quad 13]$ as an example.
  - Calling _whctrs() gives the center form: w, h, x_ctr, y_ctr = $[23 \quad 12 \quad 7.5 \quad 7.5]$
  - $ws = 23 \times \left[\begin{matrix} 8\\ 16\\ 32\end{matrix}\right] = \left[\begin{matrix} 184\\ 368\\ 736\end{matrix}\right]$, i.e. the width of 23 scaled up.
  - $hs = 12 \times \left[\begin{matrix} 8\\ 16\\ 32\end{matrix}\right] = \left[\begin{matrix} 96\\ 192\\ 384\end{matrix}\right]$, i.e. the height of 12 scaled up.
  - The center stays at (7.5, 7.5) while the width and height change, so _mkanchors() is called again to recompute the corner coordinates around the same center with the new width and height.
  - The resulting anchors are $\left[\begin{matrix} -84 & -40 & 99 & 55\\ -176 & -88 & 191 & 103\\ -360 & -184 & 375 & 199\end{matrix}\right]$
That completes the walkthrough of RBG's anchor generation method.
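
To see all nine anchors at once, the whole pipeline can be condensed into a single vectorized function. This is a sketch under the assumption that every ratio anchor keeps the center (7.5, 7.5), which holds for the default parameters; it is a compact rewrite for illustration (the name `generate_anchors_sketch` is made up), not the original implementation.

```python
import numpy as np


def generate_anchors_sketch(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Return a (len(ratios) * len(scales), 4) array of (xmin, ymin, xmax, ymax)."""
    ratios = np.asarray(ratios, dtype=float)
    scales = np.asarray(scales, dtype=float)
    ctr = 0.5 * (base_size - 1)

    # Step 1: ratio enumeration at (roughly) constant area
    ws = np.round(np.sqrt(base_size * base_size / ratios))
    hs = np.round(ws * ratios)

    # Step 2: scale enumeration, every (w, h) pair times every scale;
    # rows are ordered ratio-major, scale-minor like the np.vstack in the original
    ws = (ws[:, None] * scales[None, :]).ravel()
    hs = (hs[:, None] * scales[None, :]).ravel()

    return np.stack([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                     ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)], axis=1)


anchors = generate_anchors_sketch()
print(anchors.shape)          # (9, 4)
print(anchors[:3].tolist())   # the three scales of the first ratio anchor
```

The first three rows are the anchors derived by hand above; rows 3 to 5 belong to the 1:1 ratio and rows 6 to 8 to the 2:1 ratio.
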

1.2 PyTorch source code

The PyTorch version is not explained in detail. Here is the code, which is short and easy to follow.

```python
import numpy as np


def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2],
                         anchor_scales=[8, 16, 32]):
    """
    Returns:
        ~numpy.ndarray:
        An array of shape :math:`(R, 4)`.
        Each element is a set of coordinates of a bounding box.
        The second axis corresponds to
        :math:`(y_{min}, x_{min}, y_{max}, x_{max})` of a bounding box.
    """
    # Center of the base cell: (8, 8) for base_size=16
    py = base_size / 2.
    px = base_size / 2.

    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    # The original source iterates with six.moves.range; plain range is
    # equivalent on Python 3
    for i in range(len(ratios)):
        for j in range(len(anchor_scales)):
            # Height and width for this ratio (h / w) and scale
            h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
            w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])

            # Row index: ratio-major, scale-minor
            index = i * len(anchor_scales) + j
            anchor_base[index, 0] = py - h / 2.
            anchor_base[index, 1] = px - w / 2.
            anchor_base[index, 2] = py + h / 2.
            anchor_base[index, 3] = px + w / 2.
    return anchor_base
```

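The parameters are the same as in the Caffe version. The difference lies in how anchor_base is computed:
- There is no -1 here: the center is (base_size / 2, base_size / 2) = (8, 8) rather than (7.5, 7.5), and the widths and heights are not rounded.
- Two nested loops run 9 iterations in total, each producing the coordinates of one anchor.
- Taking the square root of ratios looks odd at first, but it follows from requiring $h / w = r$ and $h \cdot w = (\text{base\_size} \cdot s)^2$ for ratio $r$ and scale $s$, which gives $h = \text{base\_size} \cdot s \cdot \sqrt{r}$ and $w = \text{base\_size} \cdot s \cdot \sqrt{1 / r}$.
- Finally, each coordinate of anchor_base is computed directly from the center, in (ymin, xmin, ymax, xmax) order. Note that this is y-first, unlike the (xmin, ymin, xmax, ymax) order of the Caffe version.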
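
The square-root formula can be checked numerically. The sketch below uses only the standard library and verifies, for every ratio and scale, that the height-to-width ratio equals ratios[i] and that the area equals (base_size * scale) squared; the helper name `anchor_hw` is hypothetical.

```python
import math

base_size = 16
ratios = (0.5, 1, 2)
scales = (8, 16, 32)


def anchor_hw(ratio, scale):
    """Height and width as computed inside generate_anchor_base()."""
    side = base_size * scale
    return side * math.sqrt(ratio), side * math.sqrt(1.0 / ratio)


for r in ratios:
    for s in scales:
        h, w = anchor_hw(r, s)
        assert math.isclose(h / w, r)                      # aspect ratio is preserved
        assert math.isclose(h * w, (base_size * s) ** 2)   # area is fixed per scale

# First anchor (ratio 0.5, scale 8) in (ymin, xmin, ymax, xmax) order
h, w = anchor_hw(0.5, 8)
cy = cx = base_size / 2.0
first = (cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2)
print([round(v, 2) for v in first])   # [-37.25, -82.51, 53.25, 98.51]
```

Compared with the first Caffe anchor (-84, -40, 99, 55) in x-first order, the PyTorch anchor differs by about a pixel because it skips the rounding and the -1 offset.
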

This article is the author's original work. Please credit the source when reposting.
