(Paper reading) ACE loss

loopun

Published 2019-07-10 19:22:03

License: CC 4.0 BY-SA

Category: Text Recognition. Tags: Text Recognition

Article link: https://blog.youkuaiyun.com/loopun/article/details/95368887

Abstract

The proposed ACE loss function exhibits two noteworthy properties:

(1) it can be directly applied for 2D prediction by flattening the 2D prediction into 1D prediction as the input;

(2) it requires only characters and their numbers in the sequence annotation for supervision.

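Regarding the first point, this looks like it could handle text in arbitrary arrangements. The paper covers both the 1D and the 2D case, which overturns the earlier locate-then-recognize pipeline, and it does look promising.
Regarding the second point (weaker supervision), there is an open question: if the ground truth carries no sequence information at all, can the network still learn sequence information? Does ACE loss cost the network its ability to handle sequence information?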
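To make the two properties and the question above concrete, here is a minimal sketch in plain Python. The helper names `count_annotation` and `flatten_2d`, the three-letter alphabet, and the toy grid are all assumptions made for illustration; they are not taken from the paper or from any library. The sketch shows that a count-only annotation is identical for two words that differ only in character order, and that flattening a 2D prediction simply drops the grid layout.

```python
from collections import Counter

def count_annotation(text, alphabet):
    """Count-only label: how often each alphabet character occurs in text.

    Hypothetical helper; the order of characters in text is discarded.
    """
    counts = Counter(text)
    unknown = set(counts) - set(alphabet)
    if unknown:
        raise ValueError(f"characters outside the alphabet: {sorted(unknown)}")
    return [counts[c] for c in alphabet]

def flatten_2d(pred):
    """Flatten an H x W grid of per-position class distributions
    into a flat list of H*W distributions (row-major order)."""
    return [cell for row in pred for cell in row]

ALPHABET = "clo"  # assumed toy alphabet

# Two different words with the same characters in a different order
label_a = count_annotation("cool", ALPHABET)
label_b = count_annotation("loco", ALPHABET)

# Toy 2 x 3 prediction grid; each cell is a distribution over ALPHABET
uniform = [1 / 3, 1 / 3, 1 / 3]
grid = [[uniform, uniform, uniform],
        [uniform, uniform, uniform]]
flat = flatten_2d(grid)
```

Here `label_a` and `label_b` are the same list, so a loss computed only from such counts cannot tell "cool" from "loco". That is exactly why it is fair to ask where the ordering information would come from.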

Related Work

Connectionist temporal classification: CNN-LSTM-CTC
Attention mechanism: uses an attention mechanism to locate each character.

Aggregation Cross-Entropy

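For recognition tasks, the loss function can be abstracted into the following form, where $S$ is the annotation, $I$ is the input, $\omega$ denotes the network parameters, $Q$ is the training data, $S_l$ is the $l$-th character of $S$, and $L$ is the length of $S$.
$\mathcal{L}(\omega)=-\sum_{(I,S)\in Q}\log P(S\mid I;\omega)\\ =-\sum_{(I,S)\in Q}{\sum_{l=1}^{L}\log P(S_l\mid l,I;\omega)}$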
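The general form above can be sketched in a few lines of plain Python. This is only an illustration of the per-position negative log-likelihood, not the ACE loss itself; the function names and the toy probabilities are assumptions. Note that it needs a target class for every position, which is the kind of aligned, ordered supervision that a count-only annotation does not provide.

```python
import math

def sequence_nll(position_probs, target):
    """Negative log-likelihood of one annotated sequence.

    position_probs[l][k] is the (assumed) predicted probability of class k
    at position l; target[l] is the class index of the l-th character.
    """
    if len(position_probs) != len(target):
        raise ValueError("need one probability row per target character")
    # Sum -log P(S_l | l, I) over all positions l
    return -sum(math.log(row[k]) for row, k in zip(position_probs, target))

def dataset_loss(samples):
    """Sum the per-sequence losses over (position_probs, target) pairs."""
    return sum(sequence_nll(p, t) for p, t in samples)

# Toy example: one sequence of length 2 over 2 classes
probs = [[0.5, 0.5], [0.25, 0.75]]
target = [0, 1]
loss = dataset_loss([(probs, target)])
```

For this toy input the loss equals $-(\log 0.5 + \log 0.75)$, i.e. the sum of the negative log-probabilities of the target class at each position.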
