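Overview
To remove manual annotation error, the EAST algorithm shrinks quadrangle-shaped annotation boxes. The corresponding section of the paper is 3.3.1 Score Map Generation for Quadrangle; this post briefly records my understanding of that part.
I. Original Paper Text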
3.3.1 Score Map Generation for Quadrangle
Without loss of generality, we only consider the case where the geometry is a quadrangle. The positive area of the quadrangle on the score map is designed to be roughly a shrunk version of the original one, illustrated in Fig.4(a).
For a quadrangle $Q = \{p_i \mid i \in \{1, 2, 3, 4\}\}$, where $p_i = \{x_i, y_i\}$ are vertices on the quadrangle in clockwise order. To shrink $Q$, we first compute a reference length $r_i$ for each vertex $p_i$ as

$$r_i = \min\left(D(p_i, p_{(i \bmod 4) + 1}),\ D(p_i, p_{((i + 2) \bmod 4) + 1})\right)$$

where $D(p_i, p_j)$ is the $L_2$ distance between $p_i$ and $p_j$.
We first shrink the two longer edges of a quadrangle, and then the two shorter ones. For each pair of two opposing edges, we determine the “longer” pair by comparing the mean of their lengths. For each edge $\langle p_i, p_{(i \bmod 4) + 1} \rangle$, we shrink it by moving its two endpoints inward along the edge by $0.3 r_i$ and $0.3 r_{(i \bmod 4) + 1}$ respectively.
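II. Understanding and Analysis
1. Translation
Note: some terms, such as score map, are kept exactly as they appear in the paper, which makes the passage easier to follow.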
Without loss of generality, we only consider the case where the text geometry is annotated as a quadrangle. On the score map, the positive area of the quadrangle is set to a roughly shrunk version of the original annotation, as shown in Fig. 4(a): the yellow box is the original quadrangle text box, and the solid green box is the quadrangle after shrinking.
For a quadrangle $Q = \{p_i \mid i \in \{1, 2, 3, 4\}\}$, where $p_i = \{x_i, y_i\}$ are the four vertices of the quadrangle in clockwise order, we shrink $Q$ by first computing a reference length $r_i$ for each vertex $p_i$ (the length that scales the shrink), using the formula:

$$r_i = \min\left(D(p_i, p_{(i \bmod 4) + 1}),\ D(p_i, p_{((i + 2) \bmod 4) + 1})\right)$$

where $D(p_i, p_j)$ is the $L_2$ distance between the two points $p_i$ and $p_j$.
We first shrink the two longer edges of the quadrangle and then the two shorter ones. Of the two pairs of opposing edges (a quadrangle has exactly two such pairs), the “longer” pair is determined by comparing the mean edge lengths. Each edge $\langle p_i, p_{(i \bmod 4) + 1} \rangle$ is shrunk by moving its two endpoints inward along the edge by $0.3 r_i$ and $0.3 r_{(i \bmod 4) + 1}$ respectively.
2. Analysis
From the formula for the reference length, we get:

$$r_1 = \min(D(p_1, p_2), D(p_1, p_4)) = D(p_1, p_4)$$

$$r_2 = \min(D(p_2, p_3), D(p_2, p_1)) = D(p_2, p_3)$$

$$r_3 = \min(D(p_3, p_4), D(p_3, p_2)) = D(p_3, p_2)$$

$$r_4 = \min(D(p_4, p_1), D(p_4, p_3)) = D(p_4, p_1)$$

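In effect, computing $r_i$ amounts to finding the shorter of the two edges that pass through each vertex and using its length as the reference for shrinking.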
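The "shorter of the two edges at each vertex" rule can be sketched in a few lines of NumPy. The function name `reference_lengths` and the sample rectangle are assumptions made for illustration. Vertices are 0-indexed here, so the paper's $p_{(i \bmod 4) + 1}$ becomes the next row and $p_{((i + 2) \bmod 4) + 1}$ the previous row.

```python
import numpy as np


def reference_lengths(quad):
    """Return r_i per vertex: the shorter of the two edges meeting at it."""
    quad = np.asarray(quad, dtype=np.float64)
    next_vertex = np.roll(quad, -1, axis=0)  # row i holds p_{i+1}
    prev_vertex = np.roll(quad, 1, axis=0)   # row i holds p_{i-1}
    d_next = np.linalg.norm(quad - next_vertex, axis=1)
    d_prev = np.linalg.norm(quad - prev_vertex, axis=1)
    return np.minimum(d_next, d_prev)


# Hypothetical 100 x 20 box, clockwise in image coordinates (y points down).
rect = [(0, 0), (100, 0), (100, 20), (0, 20)]
print(reference_lengths(rect))  # [20. 20. 20. 20.]
```

For a rectangle every vertex touches one short edge, so all four reference lengths equal the short side.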
According to the paper, we first shrink the two longer edges. In the figure above, the longer edges are $p_1p_2$ and $p_3p_4$. Applying $\langle p_i, p_{(i \bmod 4) + 1} \rangle$, points $p_1$ and $p_2$ move inward along the edge by $0.3 r_1$ and $0.3 r_2$ respectively. Likewise, points $p_3$ and $p_4$ move inward along the edge by $0.3 r_3$ and $0.3 r_4$ respectively.
Then the two shorter ones. As with the endpoints of the longer edges, points $p_4$ and $p_1$ move inward along the edge by $0.3 r_4$ and $0.3 r_1$ respectively, and points $p_2$ and $p_3$ move inward along the edge by $0.3 r_2$ and $0.3 r_3$ respectively.
After these steps, the original annotated quadrangle (black dashed line) has been shrunk into the solid blue box.
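As a worked example with made-up numbers (the rectangle is an assumption for illustration, not the quadrangle from the figure), take an axis-aligned $100 \times 20$ box in image coordinates with $p_1 = (0, 0)$, $p_2 = (100, 0)$, $p_3 = (100, 20)$, $p_4 = (0, 20)$. Every vertex touches one edge of length 20, so $r_1 = r_2 = r_3 = r_4 = 20$ and each move is $0.3 \times 20 = 6$:

$$
\begin{aligned}
\text{longer edges: } & p_1 \to (6, 0),\ p_2 \to (94, 0),\ p_3 \to (94, 20),\ p_4 \to (6, 20) \\
\text{shorter edges: } & p_1 \to (6, 6),\ p_2 \to (94, 6),\ p_3 \to (94, 14),\ p_4 \to (6, 14)
\end{aligned}
$$

The $100 \times 20$ box therefore shrinks to an $88 \times 8$ box centered at the same point.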
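This process reduces annotation error for the following reason. The original bounding box is produced by a human annotator who picks four vertices that enclose the text region, which inevitably introduces negative samples, that is, non-text background, along the edges of the box. The core of text detection is distinguishing text regions from background regions in the image. Without shrinking, pixels that are actually background would be learned as positive text pixels during training. EAST therefore shrinks the box so that the edge region is treated entirely as negative, which makes the positive samples purer.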
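To make the effect on the labels concrete, the sketch below rasterizes a box and its shrunk version and counts the pixels that switch from positive to negative. Everything in it is an assumption for illustration: the helper `convex_quad_mask`, the 30 x 110 canvas, and the two hard-coded boxes (a 100 x 20 rectangle and the result of shrinking it with ratio 0.3). It uses a plain NumPy half-plane test, not the rasterizer of the original code.

```python
import numpy as np


def convex_quad_mask(quad, height, width):
    """Mask of integer pixel coordinates inside or on a convex quadrangle."""
    quad = np.asarray(quad, dtype=np.float64)
    ys, xs = np.mgrid[0:height, 0:width]
    crosses = []
    for i in range(4):
        x0, y0 = quad[i]
        x1, y1 = quad[(i + 1) % 4]
        # Cross product of the edge vector with the vector to each pixel;
        # its sign says on which side of the edge the pixel lies.
        crosses.append((x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0))
    crosses = np.stack(crosses)
    # Inside a convex polygon, all four signs agree (either orientation).
    return np.all(crosses >= 0, axis=0) | np.all(crosses <= 0, axis=0)


original_box = [(0, 0), (100, 0), (100, 20), (0, 20)]
shrunk_box = [(6, 6), (94, 6), (94, 14), (6, 14)]

original = convex_quad_mask(original_box, 30, 110)
positive = convex_quad_mask(shrunk_box, 30, 110)
border = original & ~positive  # annotated as text, but labelled negative

score_map = positive.astype(np.uint8)
print(int(original.sum()), int(positive.sum()), int(border.sum()))  # 2121 801 1320
```

In this toy case well over half of the annotated pixels fall in the border band, which shows how strongly a 0.3 shrink ratio affects thin boxes.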
3. Code
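Computing $r_i$:

    import numpy as np

    # poly: (4, 2) array holding the quadrangle vertices in clockwise order.
    # For each vertex, take the shorter of the two edges that pass through it.
    r = [None, None, None, None]  # r[i] is the length of the shorter edge at vertex i
    for i in range(4):
        # np.linalg.norm defaults to the 2-norm, i.e. the Euclidean distance
        # d = sqrt((x1 - x2)^2 + (y1 - y2)^2) between the two points.
        r[i] = min(np.linalg.norm(poly[i] - poly[(i + 1) % 4]),
                   np.linalg.norm(poly[i] - poly[(i - 1) % 4]))
    # Shrink the annotated box by 0.3 times the reference length to further
    # remove manual annotation error and obtain a more accurate label.
    shrinked_poly = shrink_poly(poly.copy(), r).astype(np.int32)[np.newaxis, :, :]
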
The shrink_poly(poly, r) function:
Purpose: shrink the original text box. The source code is more general than the example above because it takes the edge angles into account.

    def shrink_poly(poly, r):
        """
        :param poly: the text poly
        :param r: r in the paper
        :return: the shrinked poly
        """
        # shrink ratio
        R = 0.3
        # find the longer of the two pairs of opposing edges
        if np.linalg.norm(poly[0] - poly[1]) + np.linalg.norm(poly[2] - poly[3]) > \
                np.linalg.norm(poly[0] - poly[3]) + np.linalg.norm(poly[1] - poly[2]):
            # first move (p0, p1), (p2, p3), then (p0, p3), (p1, p2)
            # p0, p1
            theta = np.arctan2((poly[1][1] - poly[0][1]), (poly[1][0] - poly[0][0]))
            poly[0][0] += R * r[0] * np.cos(theta)
            poly[0][1] += R * r[0] * np.sin(theta)
            poly[1][0] -= R * r[1] * np.cos(theta)
            poly[1][1] -= R * r[1] * np.sin(theta)
            # p2, p3
            theta = np.arctan2((poly[2][1] - poly[3][1]), (poly[2][0] - poly[3][0]))
            poly[3][0] += R * r[3] * np.cos(theta)
            poly[3][1] += R * r[3] * np.sin(theta)
            poly[2][0] -= R * r[2] * np.cos(theta)
            poly[2][1] -= R * r[2] * np.sin(theta)
            # p0, p3 (arctan2 takes (dx, dy) here, so sin and cos swap roles)
            theta = np.arctan2((poly[3][0] - poly[0][0]), (poly[3][1] - poly[0][1]))
            poly[0][0] += R * r[0] * np.sin(theta)
            poly[0][1] += R * r[0] * np.cos(theta)
            poly[3][0] -= R * r[3] * np.sin(theta)
            poly[3][1] -= R * r[3] * np.cos(theta)
            # p1, p2
            theta = np.arctan2((poly[2][0] - poly[1][0]), (poly[2][1] - poly[1][1]))
            poly[1][0] += R * r[1] * np.sin(theta)
            poly[1][1] += R * r[1] * np.cos(theta)
            poly[2][0] -= R * r[2] * np.sin(theta)
            poly[2][1] -= R * r[2] * np.cos(theta)
        else:
            # first move (p0, p3), (p1, p2), then (p0, p1), (p2, p3)
            # p0, p3 (arctan2 takes (dx, dy) here, so sin and cos swap roles)
            theta = np.arctan2((poly[3][0] - poly[0][0]), (poly[3][1] - poly[0][1]))
            poly[0][0] += R * r[0] * np.sin(theta)
            poly[0][1] += R * r[0] * np.cos(theta)
            poly[3][0] -= R * r[3] * np.sin(theta)
            poly[3][1] -= R * r[3] * np.cos(theta)
            # p1, p2
            theta = np.arctan2((poly[2][0] - poly[1][0]), (poly[2][1] - poly[1][1]))
            poly[1][0] += R * r[1] * np.sin(theta)
            poly[1][1] += R * r[1] * np.cos(theta)
            poly[2][0] -= R * r[2] * np.sin(theta)
            poly[2][1] -= R * r[2] * np.cos(theta)
            # p0, p1
            theta = np.arctan2((poly[1][1] - poly[0][1]), (poly[1][0] - poly[0][0]))
            poly[0][0] += R * r[0] * np.cos(theta)
            poly[0][1] += R * r[0] * np.sin(theta)
            poly[1][0] -= R * r[1] * np.cos(theta)
            poly[1][1] -= R * r[1] * np.sin(theta)
            # p2, p3
            theta = np.arctan2((poly[2][1] - poly[3][1]), (poly[2][0] - poly[3][0]))
            poly[3][0] += R * r[3] * np.cos(theta)
            poly[3][1] += R * r[3] * np.sin(theta)
            poly[2][0] -= R * r[2] * np.cos(theta)
            poly[2][1] -= R * r[2] * np.sin(theta)
        return poly
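
Because the cosine and sine of `np.arctan2(dy, dx)` are simply the components of the unit vector along the edge, the same moves can be written without angles. The following is a compact sketch under that observation, not the original implementation: `shrink_quad` and `move_inward` are hypothetical names, and the zero-length guard is an added assumption.

```python
import numpy as np


def shrink_quad(quad, ratio=0.3):
    """Shrink a quadrangle, moving the longer pair of edges first."""
    p = np.array(quad, dtype=np.float64)
    # Reference length per vertex, computed once on the original quadrangle.
    r = np.minimum(
        np.linalg.norm(p - np.roll(p, -1, axis=0), axis=1),
        np.linalg.norm(p - np.roll(p, 1, axis=0), axis=1),
    )

    def move_inward(a, b):
        # Move p[a] toward p[b] and p[b] toward p[a] along their shared edge.
        direction = p[b] - p[a]
        length = np.linalg.norm(direction)
        if length == 0:
            return  # assumption: skip degenerate edges to avoid dividing by zero
        unit = direction / length
        p[a] += ratio * r[a] * unit
        p[b] -= ratio * r[b] * unit

    pair_a = [(0, 1), (3, 2)]  # edges p0p1 and p3p2
    pair_b = [(0, 3), (1, 2)]  # edges p0p3 and p1p2
    len_a = np.linalg.norm(p[0] - p[1]) + np.linalg.norm(p[2] - p[3])
    len_b = np.linalg.norm(p[0] - p[3]) + np.linalg.norm(p[1] - p[2])
    order = pair_a + pair_b if len_a > len_b else pair_b + pair_a
    for a, b in order:
        move_inward(a, b)
    return p


wide = shrink_quad([(0, 0), (100, 0), (100, 20), (0, 20)])
print(wide)
```

For the 100 x 20 rectangle the result should be close to the corners (6, 6), (94, 6), (94, 14), (6, 14). As in the function above, the direction of each later edge is taken from the already-moved vertices, so the order of the two passes matters for non-rectangular quadrangles.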