[Computer Vision] A Survey of Text Detection (through 2019)

License: CC 4.0 BY-SA

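This article summarizes the mainstream approaches, solutions, and commonly used datasets for text detection as of mid-2019, including the ICDAR series and MSRA-TD500. It also introduces improved models built on mainstream frameworks such as Faster R-CNN, SSD, and Mask R-CNN, and their application to text detection.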

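1 Mainstream approaches to text detection

The text detection approaches available as of mid-2019 are summarized below.

2 Text detection solutions

Includes detection results on commonly used datasets.

3 Commonly used text detection datasets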

Benchmark Datasets

ICDAR 2013 (Focused Scene Text) (horizontal text)

ICDAR 2015 (Incidental Scene Text) (oriented text)

ICDAR 2017 MLT (mainly multilingual text)

MSRA-TD500 Text Detection

Current state of existing benchmarks

4 Mainstream frameworks

(1) Anchor/RoI-pooling based methods

High-level illustration of existing anchor/RoI-pooling based methods: (a) Similar to YOLO, predicting at each anchor position. Representative methods include rotating default boxes. (b) Variants of SSD, including TextBoxes, predicting at feature maps of different sizes. (c) Direct regression of bounding boxes, also predicting at each anchor position. (d) Region proposal based methods, including rotated Regions of Interest (RoIs) and RoIs of varying aspect ratios.
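The idea of predicting at each anchor position on feature maps of different sizes can be made concrete with a small sketch. The code below is only an illustration, not the implementation of any specific detector: the function name `default_boxes`, the feature-map sizes, the scales, and the aspect ratios are all assumptions chosen for the example. It places one default box per aspect ratio at the center of every feature-map cell, using wide ratios of the kind that suit long horizontal words.

```python
from typing import List, Tuple

# (cx, cy, w, h), all normalized to the [0, 1] image coordinate range
Box = Tuple[float, float, float, float]


def default_boxes(fmap_size: int, scale: float,
                  aspect_ratios: List[float]) -> List[Box]:
    """Place one default box per aspect ratio at every feature-map cell."""
    boxes: List[Box] = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Center of the cell in normalized image coordinates
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Keep the area near scale**2 while changing the shape
                w = scale * ar ** 0.5
                h = scale / ar ** 0.5
                boxes.append((cx, cy, w, h))
    return boxes


# Hypothetical settings: two feature maps, with wide ratios for long words
ratios = [1.0, 2.0, 5.0]
all_boxes: List[Box] = []
for size, scale in [(4, 0.4), (2, 0.7)]:
    all_boxes.extend(default_boxes(size, scale, ratios))

print(len(all_boxes))
```

A real detector would attach a classification score and box offsets to each of these default boxes; rotated variants would add an angle to each box.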

(2) Bottom-up methods

Illustration of representative bottom-up methods: (a) SegLink: with SSD as the base network, predict word segments at each anchor position, and connections between adjacent anchors. (b) PixelLink: predict, for each pixel, text/non-text classification and whether it belongs to the same text as adjacent pixels or not. (c) Corner Localization: predict the four corners of each text and group those belonging to the same text instance. (d) TextSnake: predict text/non-text and local geometries, which are used to reconstruct text instances.
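The grouping step shared by these bottom-up methods can be sketched with a union-find structure. The code below is a minimal illustration in the spirit of the PixelLink description above, not the published implementation: the function name `group_pixels`, the input format (a binary text mask plus a set of positive links between pixel pairs), and the toy data are assumptions made for the example. Pixels classified as text are merged into one instance only when a positive link connects them.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def group_pixels(text_mask: List[List[int]],
                 links: Set[Tuple[Pixel, Pixel]]) -> List[Set[Pixel]]:
    """Group text pixels into instances by following positive links."""
    # Every text pixel starts as its own group
    parent: Dict[Pixel, Pixel] = {}
    for r, row in enumerate(text_mask):
        for c, is_text in enumerate(row):
            if is_text:
                parent[(r, c)] = (r, c)

    def find(p: Pixel) -> Pixel:
        # Walk to the root, halving the path along the way
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Merge the two groups joined by each link between text pixels
    for a, b in links:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups: Dict[Pixel, Set[Pixel]] = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())


# Toy example: two text regions separated by a non-text column
mask = [[1, 1, 0, 1],
        [0, 0, 0, 1]]
positive_links = {((0, 0), (0, 1)), ((0, 3), (1, 3))}
instances = group_pixels(mask, positive_links)
print(len(instances))
```

Adjacent text pixels without a positive link stay in separate groups, which is what lets this kind of method separate text instances that lie close together.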