Understanding the Score Map Generation for Quadrangle Section of the EAST Algorithm

Article link: https://blog.youkuaiyun.com/qq_39298073/article/details/108931785

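EAST shrinks quadrangle text boxes to reduce manual annotation error. It computes a reference length for each vertex, then shrinks the two longer edges first and the two shorter edges second, so that the positive region on the score map is a shrunk version of the original annotation. This keeps background pixels from being labeled as text.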


Table of Contents

Overview
I. Original Paper Text
II. Understanding and Analysis
1. Translation
2. Analysis
3. Code

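Overview

To reduce manual annotation error, EAST applies a shrink step to annotation boxes that are given as quadrangles. The corresponding section of the paper is 3.3.1 Score Map Generation for Quadrangle, and this post briefly records my understanding of that part.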

I. Original Paper Text

3.3.1 Score Map Generation for Quadrangle

Without loss of generality, we only consider the case where the geometry is a quadrangle. The positive area of the quadrangle on the score map is designed to be roughly a shrunk version of the original one, illustrated in Fig.4(a).
[Figure: Fig. 4(a) of the paper, the original quadrangle and its shrunk version]
For a quadrangle $Q = \{p_i \mid i \in \{1, 2, 3, 4\}\}$, where $p_i = \{x_i, y_i\}$ are vertices on the quadrangle in clockwise order. To shrink $Q$, we first compute a reference length $r_i$ for each vertex $p_i$ as
$r_i = \min(D(p_i, p_{(i \bmod 4) + 1}), D(p_i, p_{((i + 2) \bmod 4) + 1}))$

where $D(p_i, p_j)$ is the $L_2$ distance between $p_i$ and $p_j$.
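
The two subscripts in the formula are easier to read once they are enumerated. The following is a minimal sketch using the paper's 1-based vertex numbering; the helper name `neighbor_indices` is made up for illustration. It shows that the subscripts simply select the next and the previous vertex of $p_i$:

```python
def neighbor_indices(i):
    """Return the (next, previous) vertex numbers for 1-based vertex i of a quadrangle."""
    nxt = (i % 4) + 1          # subscript used in the first distance term
    prev = ((i + 2) % 4) + 1   # subscript used in the second distance term
    return nxt, prev


for i in range(1, 5):
    print(i, neighbor_indices(i))
```

So $r_i$ only ever compares the two edges that meet at $p_i$.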

We first shrink the two longer edges of a quadrangle, and then the two shorter ones. For each pair of two opposing edges, we determine the “longer” pair by comparing the mean of their lengths. For each edge $\langle p_i, p_{(i \bmod 4) + 1} \rangle$, we shrink it by moving its two endpoints inward along the edge by $0.3r_i$ and $0.3r_{(i \bmod 4) + 1}$ respectively.
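
The "longer pair" decision can be sketched in a few lines. This is only an illustration under stated assumptions: vertices are stored 0-based in clockwise order, and the function name `shrink_order` is hypothetical rather than taken from any EAST implementation.

```python
import numpy as np


def shrink_order(poly):
    """Return the four edges as 0-based vertex index pairs, longer pair first."""
    poly = np.asarray(poly, dtype=np.float64)

    def length(a, b):
        return np.linalg.norm(poly[a] - poly[b])

    pair_a = [(0, 1), (2, 3)]  # edges <p1, p2> and <p3, p4> in the paper's numbering
    pair_b = [(3, 0), (1, 2)]  # edges <p4, p1> and <p2, p3>
    mean_a = (length(0, 1) + length(2, 3)) / 2
    mean_b = (length(3, 0) + length(1, 2)) / 2
    # The pair with the larger mean length is shrunk first
    return pair_a + pair_b if mean_a > mean_b else pair_b + pair_a


wide = [(0, 0), (100, 0), (100, 20), (0, 20)]
tall = [(0, 0), (20, 0), (20, 100), (0, 100)]
print(shrink_order(wide))
print(shrink_order(tall))
```

Comparing means rather than individual edges keeps the decision stable for quadrangles whose opposing edges differ in length.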

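II. Understanding and Analysis

1. Translation

Note: some terms, such as score map, are kept as they appear in the paper for easier reading.

Without loss of generality, we only consider the case where the text geometry is annotated as a quadrangle. On the score map, the positive area of the quadrangle is designed to be roughly a shrunk version of the original annotation, as shown in Fig. 4(a). There, yellow is the original quadrangle text box and the solid green line is the quadrangle after shrinking.

For a quadrangle $Q = \{p_i \mid i \in \{1, 2, 3, 4\}\}$, where $p_i = \{x_i, y_i\}$ are the four vertices of the quadrangle in clockwise order, we shrink $Q$ by first computing a reference length $r_i$ (the length that scales the shrink) for each vertex $p_i$:

$r_i = \min(D(p_i, p_{(i \bmod 4) + 1}), D(p_i, p_{((i + 2) \bmod 4) + 1}))$

where $D(p_i, p_j)$ is the $L_2$ distance between the points $p_i$ and $p_j$.

We first shrink the two longer edges of the quadrangle and then the two shorter ones. A quadrangle has two pairs of opposing edges, and the "longer" pair is decided by comparing the mean edge length of each pair. Each edge $\langle p_i, p_{(i \bmod 4) + 1} \rangle$ is shrunk by moving its two endpoints inward along the edge by $0.3r_i$ and $0.3r_{(i \bmod 4) + 1}$ respectively.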

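2. Analysis

Applying the reference length formula to the example quadrangle, whose long edges are $p_1p_2$ and $p_3p_4$, gives:
$r_1 = \min(D(p_1, p_2), D(p_1, p_4)) = D(p_1, p_4)$
$r_2 = \min(D(p_2, p_3), D(p_2, p_1)) = D(p_2, p_3)$
$r_3 = \min(D(p_3, p_4), D(p_3, p_2)) = D(p_3, p_2)$
$r_4 = \min(D(p_4, p_1), D(p_4, p_3)) = D(p_4, p_1)$
In effect, computing $r_i$ means finding the shorter of the two edges that meet at each vertex and using its length as the reference for shrinking.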
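
As a concrete check, take a hypothetical axis-aligned rectangle (chosen for illustration, not the one in the paper's figure) with $p_1 = (0, 0)$, $p_2 = (100, 0)$, $p_3 = (100, 20)$, $p_4 = (0, 20)$. The long edges have length 100 and the short edges have length 20, so:
$r_1 = \min(D(p_1, p_2), D(p_1, p_4)) = \min(100, 20) = 20$
$r_2 = \min(D(p_2, p_3), D(p_2, p_1)) = \min(20, 100) = 20$
$r_3 = \min(D(p_3, p_4), D(p_3, p_2)) = \min(100, 20) = 20$
$r_4 = \min(D(p_4, p_1), D(p_4, p_3)) = \min(20, 100) = 20$
Every endpoint therefore moves by $0.3 r_i = 6$ along each edge it belongs to, regardless of whether that edge is long or short.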

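[Figure: Quadrangle shrink]
Following the paper, we first shrink the two longer edges, which are $p_1p_2$ and $p_3p_4$ in the figure above. From the edge definition $\langle p_i, p_{(i \bmod 4) + 1} \rangle$, the points $p_1$ and $p_2$ move inward along edge $p_1p_2$ by $0.3r_1$ and $0.3r_2$ respectively.
Likewise, $p_3$ and $p_4$ move inward along edge $p_3p_4$ by $0.3r_3$ and $0.3r_4$.

Then the two shorter ones. In the same way, $p_4$ and $p_1$ move inward along edge $p_4p_1$ by $0.3r_4$ and $0.3r_1$,
and $p_2$ and $p_3$ move inward along edge $p_2p_3$ by $0.3r_2$ and $0.3r_3$.

After these steps, the original annotated quadrangle (black dashed line) has been shrunk into the solid blue box.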
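
The steps above can be sketched compactly with unit vectors instead of angles. This is a minimal illustration of the paper's description, not the reference implementation: the name `shrink_quad` and its `ratio` parameter are hypothetical, vertices are assumed to be 0-based in clockwise order, and every edge is assumed to have non-zero length.

```python
import numpy as np


def shrink_quad(poly, ratio=0.3):
    """Shrink a clockwise quadrangle by moving edge endpoints inward along each edge."""
    poly = np.array(poly, dtype=np.float64)  # float copy, so the caller's data is untouched
    # Reference length per vertex: the shorter of the two edges meeting at that vertex
    r = [min(np.linalg.norm(poly[i] - poly[(i + 1) % 4]),
             np.linalg.norm(poly[i] - poly[(i - 1) % 4])) for i in range(4)]
    pair_a, pair_b = [(0, 1), (2, 3)], [(3, 0), (1, 2)]
    len_a = sum(np.linalg.norm(poly[a] - poly[b]) for a, b in pair_a)
    len_b = sum(np.linalg.norm(poly[a] - poly[b]) for a, b in pair_b)
    # Longer pair of opposing edges first, then the shorter pair
    order = pair_a + pair_b if len_a > len_b else pair_b + pair_a
    for a, b in order:
        edge = poly[b] - poly[a]
        unit = edge / np.linalg.norm(edge)  # direction from vertex a to vertex b
        poly[a] += ratio * r[a] * unit      # move a toward b
        poly[b] -= ratio * r[b] * unit      # move b toward a
    return poly


rect = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(shrink_quad(rect))
```

For the 100 x 20 rectangle every reference length is 20, so each vertex moves 6 units along both of its edges and the result is the rectangle with corners (6, 6), (94, 6), (94, 14), (6, 14).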

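This step reduces annotation error for the following reason. The original bounding box is drawn by a person who picks four vertices that enclose the text region, which inevitably brings negative samples, i.e. non-text background, into the border of the box. The core of text detection is telling text regions apart from background. Without shrinking, training would wrongly learn those background pixels as positive text pixels. EAST's shrink step treats the border band entirely as negative, which makes the positive samples purer.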
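
To make the effect on the labels visible, the sketch below rasterizes a box and its shrunk version into binary maps. It is an illustration under assumptions: the rasterizer `fill_convex_quad` is a hypothetical pure-NumPy stand-in for a polygon fill routine, it only handles convex quadrangles given clockwise in image coordinates (y pointing down), and the shrunk corners are worked out by hand for a 100 x 20 box (each corner moves 6 pixels along both edges).

```python
import numpy as np


def fill_convex_quad(shape, quad):
    """Binary mask of the pixels inside a convex quadrangle (clockwise, y axis down)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    quad = np.asarray(quad, dtype=np.float64)
    inside = np.ones(shape, dtype=bool)
    for i in range(4):
        x0, y0 = quad[i]
        x1, y1 = quad[(i + 1) % 4]
        # Sign of the cross product tells on which side of the edge a pixel lies
        cross = (x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0)
        inside &= cross >= 0
    return inside.astype(np.uint8)


original = [(10, 10), (110, 10), (110, 30), (10, 30)]
shrunk = [(16, 16), (104, 16), (104, 24), (16, 24)]

original_mask = fill_convex_quad((40, 120), original)
score_map = fill_convex_quad((40, 120), shrunk)

# A pixel near the border is inside the annotation but negative on the score map
print(original_mask[12, 12], score_map[12, 12])
print(original_mask.sum(), score_map.sum())
```

Pixels in the band between the two boxes, where the annotator may have included background, are labeled negative on the score map.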

3. Code

Computing $r_i$:

        # For each vertex, find the shorter of the two edges that meet at it
        r = [None, None, None, None]  # r[i] is the length of the shorter edge at vertex i
        for i in range(4):
            # np.linalg.norm defaults to the 2-norm, i.e. the Euclidean distance
            # d = sqrt((x1 - x2)^2 + (y1 - y2)^2) between the two points
            r[i] = min(np.linalg.norm(poly[i] - poly[(i + 1) % 4]),
                       np.linalg.norm(poly[i] - poly[(i - 1) % 4]))

        # Shrink the original box by 0.3 * r once all four r values are known;
        # this reduces manual annotation error and yields a cleaner label.
        shrinked_poly = shrink_poly(poly.copy(), r).astype(np.int32)[np.newaxis, :, :]
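
The fragment above sits inside a larger function, so it is not runnable on its own. A self-contained, vectorized variant might look like the sketch below; the function name `reference_lengths` and the sample trapezoid are assumptions made for illustration.

```python
import numpy as np


def reference_lengths(poly):
    """Return r[i], the shorter of the two edges meeting at vertex i, for a (4, 2) array."""
    poly = np.asarray(poly, dtype=np.float64)
    # np.roll(poly, -1, axis=0)[i] is poly[(i + 1) % 4]; np.roll(poly, 1, axis=0)[i] is poly[(i - 1) % 4]
    to_next = np.linalg.norm(poly - np.roll(poly, -1, axis=0), axis=1)
    to_prev = np.linalg.norm(poly - np.roll(poly, 1, axis=0), axis=1)
    return np.minimum(to_next, to_prev)


# Hypothetical trapezoid, clockwise in image coordinates
sample = np.array([[0, 0], [100, 0], [90, 20], [10, 20]], dtype=np.float64)
r = reference_lengths(sample)
print(r)
```

The result matches the loop version element by element; the vectorized form just avoids the explicit index arithmetic.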

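The shrink_poly(poly, r) function:
Purpose: shrink the original text box. The source code is more general than the example above because it takes the angle of each edge into account.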

import numpy as np


def shrink_poly(poly, r):
    """
    :param poly: the text poly
    :param r: r in the paper
    :return: the shrinked poly
    """
    # shrink ratio
    R = 0.3
    # find the longer pair of opposing edges
    if np.linalg.norm(poly[0] - poly[1]) + np.linalg.norm(poly[2] - poly[3]) > \
            np.linalg.norm(poly[0] - poly[3]) + np.linalg.norm(poly[1] - poly[2]):
        # first move (p0, p1), (p2, p3), then (p0, p3), (p1, p2) 
        # p0, p1
        theta = np.arctan2((poly[1][1] - poly[0][1]), (poly[1][0] - poly[0][0]))
        poly[0][0] += R * r[0] * np.cos(theta)
        poly[0][1] += R * r[0] * np.sin(theta)
        poly[1][0] -= R * r[1] * np.cos(theta)
        poly[1][1] -= R * r[1] * np.sin(theta)
        # p2, p3
        theta = np.arctan2((poly[2][1] - poly[3][1]), (poly[2][0] - poly[3][0]))
        poly[3][0] += R * r[3] * np.cos(theta)
        poly[3][1] += R * r[3] * np.sin(theta)
        poly[2][0] -= R * r[2] * np.cos(theta)
        poly[2][1] -= R * r[2] * np.sin(theta)
        # p0, p3
        theta = np.arctan2((poly[3][0] - poly[0][0]), (poly[3][1] - poly[0][1]))
        poly[0][0] += R * r[0] * np.sin(theta)
        poly[0][1] += R * r[0] * np.cos(theta)
        poly[3][0] -= R * r[3] * np.sin(theta)
        poly[3][1] -= R * r[3] * np.cos(theta)
        # p1, p2
        theta = np.arctan2((poly[2][0] - poly[1][0]), (poly[2][1] - poly[1][1]))
        poly[1][0] += R * r[1] * np.sin(theta)
        poly[1][1] += R * r[1] * np.cos(theta)
        poly[2][0] -= R * r[2] * np.sin(theta)
        poly[2][1] -= R * r[2] * np.cos(theta)
    else:
        # p0, p3
        # print poly
        theta = np.arctan2((poly[3][0] - poly[0][0]), (poly[3][1] - poly[0][1]))
        poly[0][0] += R * r[0] * np.sin(theta)
        poly[0][1] += R * r[0] * np.cos(theta)
        poly[3][0] -= R * r[3] * np.sin(theta)
        poly[3][1] -= R * r[3] * np.cos(theta)
        # p1, p2
        theta = np.arctan2((poly[2][0] - poly[1][0]), (poly[2][1] - poly[1][1]))
        poly[1][0] += R * r[1] * np.sin(theta)
        poly[1][1] += R * r[1] * np.cos(theta)
        poly[2][0] -= R * r[2] * np.sin(theta)
        poly[2][1] -= R * r[2] * np.cos(theta)
        # p0, p1
        theta = np.arctan2((poly[1][1] - poly[0][1]), (poly[1][0] - poly[0][0]))
        poly[0][0] += R * r[0] * np.cos(theta)
        poly[0][1] += R * r[0] * np.sin(theta)
        poly[1][0] -= R * r[1] * np.cos(theta)
        poly[1][1] -= R * r[1] * np.sin(theta)
        # p2, p3
        theta = np.arctan2((poly[2][1] - poly[3][1]), (poly[2][0] - poly[3][0]))
        poly[3][0] += R * r[3] * np.cos(theta)
        poly[3][1] += R * r[3] * np.sin(theta)
        poly[2][0] -= R * r[2] * np.cos(theta)
        poly[2][1] -= R * r[2] * np.sin(theta)
    return poly
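
One detail of the function that can look like a bug is that the (p0, p3) and (p1, p2) branches pass the x difference first to `np.arctan2` and then use `sin` for x and `cos` for y. The short check below, with made-up sample points, suggests that this is equivalent to the conventional form and to a plain unit vector along the edge:

```python
import numpy as np

# Hypothetical endpoints of an edge, chosen only for illustration
p0 = np.array([3.0, 1.0])
p3 = np.array([1.0, 7.0])

# Form used for edges (p0, p3) and (p1, p2): arguments swapped, sin for x, cos for y
theta_swapped = np.arctan2(p3[0] - p0[0], p3[1] - p0[1])
step_swapped = np.array([np.sin(theta_swapped), np.cos(theta_swapped)])

# Conventional form used for edges (p0, p1) and (p2, p3): cos for x, sin for y
theta = np.arctan2(p3[1] - p0[1], p3[0] - p0[0])
step = np.array([np.cos(theta), np.sin(theta)])

# Unit vector from p0 to p3
unit = (p3 - p0) / np.linalg.norm(p3 - p0)

print(step_swapped, step, unit)
```

All three give the same direction, so both branches move each endpoint along its edge toward the opposite endpoint.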