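Compiling and Installing Tesseract

This article walks through compiling and installing Tesseract OCR from source: installing dependencies such as leptonica and automake, running the configuration steps, and building and installing the training tools. After installation, the Tesseract tools are located in /usr/local/tesseract/bin and can be used for text recognition and training.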


Compiling and installing Tesseract

Install dependencies

leptonica

wget http://www.leptonica.com/source/leptonica-1.72.tar.gz
tar -xvf leptonica-1.72.tar.gz  
cd leptonica-1.72
./configure && make && make install

automake and libtool

yum install automake libtool
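
Before running the build, it can help to confirm that the prerequisites are actually on PATH. The sketch below uses a hypothetical helper, require_cmds, which is not part of Tesseract or the autotools; the list of commands passed to it is an assumption about what the build needs and may differ on your system.

```shell
#!/bin/sh
# require_cmds is a hypothetical helper (not part of Tesseract or autotools).
# It prints one "missing: NAME" line for every command that is not on PATH
# and returns non-zero if at least one command is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible call, with an assumed tool list: require_cmds automake aclocal autoconf libtoolize make gcc g++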

Install Tesseract

wget -O Tesseract3.04.00.tar.gz https://github.com/tesseract-ocr/tesseract/archive/3.04.00.tar.gz
tar -xvf Tesseract3.04.00.tar.gz  
cd tesseract-3.04.00/
./autogen.sh
./configure --prefix=/usr/local/tesseract
make && make install

Note: when configure finishes, it prints the following message:

Configuration is done.
You can now build and install tesseract by running:

make
sudo make install

Training tools can be build and installed (after building of tesseract) with:

make training
sudo make training-install

Running make training and make training-install adds the following training tools to /usr/local/tesseract/bin:

/usr/local/tesseract/bin/ambiguous_words
/usr/local/tesseract/bin/classifier_tester
/usr/local/tesseract/bin/cntraining
/usr/local/tesseract/bin/combine_tessdata
/usr/local/tesseract/bin/dawg2wordlist
/usr/local/tesseract/bin/mftraining
/usr/local/tesseract/bin/set_unicharset_properties
/usr/local/tesseract/bin/shapeclustering
/usr/local/tesseract/bin/text2image
/usr/local/tesseract/bin/unicharset_extractor
/usr/local/tesseract/bin/wordlist2dawg
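
To confirm that the training tools listed above were installed, a small check along these lines may be useful. check_training_tools is a hypothetical helper, not something shipped with Tesseract; it takes the bin directory as an optional argument and defaults to the prefix used in this article.

```shell
#!/bin/sh
# check_training_tools is a hypothetical helper (not part of Tesseract).
# It reports every expected training tool that is absent or not executable
# in the given bin directory and returns non-zero if any is missing.
check_training_tools() {
    bin_dir="${1:-/usr/local/tesseract/bin}"
    missing=0
    for tool in ambiguous_words classifier_tester cntraining combine_tessdata \
                dawg2wordlist mftraining set_unicharset_properties \
                shapeclustering text2image unicharset_extractor wordlist2dawg; do
        if [ ! -x "$bin_dir/$tool" ]; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_training_tools with no argument checks /usr/local/tesseract/bin; no output and a zero exit status mean all eleven tools are present.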

Configuration

Download the language data

cd /usr/local/tesseract/share/tessdata
wget --no-check-certificate https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
wget --no-check-certificate https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata 

Set environment variables

vim ~/.bash_profile   # add the lines below, then run: source ~/.bash_profile
export TESS_ROOT=/usr/local/tesseract
export TESSDATA_PREFIX=/usr/local/tesseract/share
export PATH=$PATH:$TESS_ROOT/bin

# Optional
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/tesseract/lib           # add the library search path
export PKG_CONFIG_PATH=${PKG_CONFIG_PATH}:/usr/local/tesseract/lib/pkgconfig  # add the pkg-config search path
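
With the layout used in this article, TESSDATA_PREFIX points at the parent of the tessdata directory, so the language files are expected under $TESSDATA_PREFIX/tessdata. The following sketch checks that assumption for a list of languages. check_tessdata is a hypothetical helper, and its fallback path is simply the value set above.

```shell
#!/bin/sh
# check_tessdata is a hypothetical helper (not part of Tesseract).
# For each language code given, it checks that a non-empty
# <lang>.traineddata file exists under $TESSDATA_PREFIX/tessdata.
check_tessdata() {
    prefix="${TESSDATA_PREFIX:-/usr/local/tesseract/share}"
    status=0
    for lang in "$@"; do
        # ${prefix%/} drops one trailing slash so the path has no "//".
        file="${prefix%/}/tessdata/$lang.traineddata"
        if [ -s "$file" ]; then
            echo "ok: $file"
        else
            echo "missing or empty: $file"
            status=1
        fi
    done
    return "$status"
}
```

For the two files downloaded earlier, the call would be: check_tessdata eng chi_sim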

Test

convert -negate card.jpg card.tif
tesseract card.tif ./b  -psm 3 -l chi_sim+eng   
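
Tesseract appends .txt to the output base name, so the command above writes its result to ./b.txt. The two test commands can be wrapped with a few sanity checks, roughly as follows. ocr_card is a hypothetical helper, its return codes are arbitrary choices, and it assumes the input file name has an extension. The -psm spelling is the one used in this article for 3.04; newer Tesseract releases spell it --psm.

```shell
#!/bin/sh
# ocr_card is a hypothetical wrapper around the two test commands.
# Usage: ocr_card card.jpg [output_base]
# Returns 1 if the input is missing, 2 if a required program is not on PATH,
# 3 if conversion or recognition fails.
ocr_card() {
    src="$1"
    out_base="${2:-./b}"
    if [ ! -f "$src" ]; then
        echo "input not found: $src" >&2
        return 1
    fi
    for cmd in convert tesseract; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd is not on PATH" >&2
            return 2
        fi
    done
    tif="${src%.*}.tif"                      # card.jpg -> card.tif
    convert -negate "$src" "$tif" || return 3
    tesseract "$tif" "$out_base" -psm 3 -l chi_sim+eng || return 3
    cat "$out_base.txt"                      # print the recognized text
}
```

For example, ocr_card card.jpg ./b reproduces the test above and prints the contents of ./b.txt.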

