Restricting Tesseract to a Specified Set of Characters

License: CC 4.0 BY-SA

Tags: #opencv #Tesseract #Emgu #OCR

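By configuring the Tesseract object in Emgu.CV.OCR.Tesseract, you can set the whitelist parameter "tessedit_char_whitelist" so that Tesseract recognizes only specific characters, such as digits or letters, which improves recognition accuracy. For example, setting "tessedit_char_whitelist" to "0123456789" recognizes digits only, and changing it to "abcdefghijklmnopqrstuvwxyz" recognizes lowercase letters only.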

Tesseract can be used for OCR from both OpenCV and Emgu, the C# version of OpenCV, since both integrate it.

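In practice, however, misreads are common: "s" is recognized as "5", or "1" as "l" or "i". Setting the appropriate parameter restricts recognition to a specified set of characters, so characters outside that set can no longer be returned.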
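As a rough illustration outside Emgu, the same "tessedit_char_whitelist" variable can also be passed to the Tesseract command-line tool through its `-c` option. The Python sketch below only assembles the argument list and does not run OCR itself. The helper names `whitelist_option` and `build_command` and the file name `plate.png` are hypothetical, and actually running the command assumes a `tesseract` executable is installed and on the PATH.

```python
import string

# Character sets matching the two examples in the article.
DIGITS = string.digits                # "0123456789"
LOWERCASE = string.ascii_lowercase    # "abcdefghijklmnopqrstuvwxyz"


def whitelist_option(chars):
    """Return the command-line option pair that sets tessedit_char_whitelist."""
    if not chars:
        raise ValueError("whitelist must contain at least one character")
    return ["-c", "tessedit_char_whitelist=" + chars]


def build_command(image_path, chars, language="eng"):
    """Build a tesseract command that prints the recognized text to stdout.

    Hypothetical helper: it only constructs the argument list.
    """
    return ["tesseract", image_path, "stdout", "-l", language,
            *whitelist_option(chars)]


if __name__ == "__main__":
    # Show the command for a digits-only run on a hypothetical image.
    print(" ".join(build_command("plate.png", DIGITS)))
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` to perform the recognition. Passing the arguments as a list avoids shell quoting problems when the whitelist contains punctuation.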

The Emgu API documentation for the Tesseract constructor that accepts a whitelist reads as follows:

Emgu.CV.OCR.Tesseract.Tesseract(string, string, Emgu.CV.OCR.Tesseract.OcrEngineMode, string)

public Tesseract(string dataPath, string language, Emgu.CV.OCR.Tesseract.OcrEngineMode mode, string whiteList)
Member of Emgu.CV.OCR.Tesseract

Summary:
Create a Tesseract OCR engine.

Parameters:
dataPath: The datapath must be the name of the parent directory of tessdata and must end in / . Any name after the last / will be stripped.
language: The language is (usually) an ISO 639-3 string or NULL will default to eng. It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change language, or just to reset the classifier. The language may be a string of the form [~]%lt
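The dataPath rule quoted above (parent directory of tessdata, ending in /) is easy to get wrong, especially with Windows-style paths. Here is a minimal Python sketch of checking it before constructing the engine. It is for illustration only, since Emgu itself is used from C#, and the helper names `normalize_datapath` and `has_tessdata` are hypothetical.

```python
import os


def normalize_datapath(path):
    """Return the path with forward slashes and exactly one trailing '/'."""
    if not path:
        # Assumption: treat an empty path as the current directory.
        return "./"
    return path.replace("\\", "/").rstrip("/") + "/"


def has_tessdata(datapath):
    """Check that the directory contains a 'tessdata' subdirectory."""
    return os.path.isdir(os.path.join(datapath, "tessdata"))
```

With these helpers, `normalize_datapath("C:\\ocr")` yields `"C:/ocr/"`, and `has_tessdata` can be used to fail early with a clear message instead of an engine initialization error.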

Author: try-catch-finally
5 comments

ljjjjy 2017.08.10
Can a Chinese language pack be used for recognition?
- try-catch-finally replying to ljjjjy 2017.08.17
  Sorry, it has been too long and I no longer remember.

bmw-chenzengbing 2017.01.18
Can the whitelist be set to a range of Chinese characters, for example tesseract.SetVariable("tessedit_char_whitelist", "中文");
- try-catch-finally replying to bmw-chenzengbing 2017.08.17
  Sorry, it has been too long and I no longer remember.