This is a thesis from Stanford University, USA (author: David J. Wu), 60 pages in total.
Full end-to-end text recognition in natural images is a challenging problem that has recently received much attention in computer vision and machine learning. Traditional systems in this area have relied on elaborate models that incorporate carefully hand-engineered features or large amounts of prior knowledge. In this thesis, I describe an alternative approach that combines the representational power of large, multilayer neural networks with recent developments in unsupervised feature learning. This particular approach enables us to train highly accurate text detection and character recognition modules. Because of the high degree of accuracy and robustness of these detection and recognition modules, it becomes possible to integrate them into a full end-to-end, lexicon-driven, scene text recognition system using only simple off-the-shelf techniques. In doing so, we demonstrate state-of-the-art performance on standard benchmarks in both cropped-word recognition as well as full end-to-end text recognition.
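1 Introduction
2 Background and Related Work
2.1 Scene Text Recognition
2.2 Unsupervised Feature Learning
2.3 Convolutional Neural Networks
3 Methodology
3.1 Detection and Recognition Modules
3.2 Text Line Detection
3.3 End-to-End Integration
4 Experiments
4.1 Text Detection
4.2 Character and Word Recognition
4.3 Full End-to-End Text Recognition
5 Conclusion
5.1 Summary
5.2 Limitations of the System and Future Research Directions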
Download the original English text:
http://page2.dfpan.com/fs/2lc8j2b21f293166c07/