Install Leptonica
First install the third-party dependencies:
yum install libtiff-devel libjpeg-devel libpng-devel -y
wget http://www.leptonica.org/source/leptonica-1.78.0.tar.gz
tar -xzvf leptonica-1.78.0.tar.gz
cd leptonica-1.78.0
./configure
make && make install
Install Tesseract-OCR
wget https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/4.1.1
tar -xvf 4.1.1
cd tesseract-4.1.1/
./autogen.sh
./configure
make && make install
sudo ldconfig
Possible errors during ./autogen.sh and ./configure, and how to fix them
Problem 1
Error:
[root@iZwz9bpg2u1r39ml9st8qzZ tesseract-master]# ./autogen.sh
Unable to find a valid copy of libtoolize or glibtoolize in your PATH!
./autogen.sh: line 59: bail_out: command not found
Running aclocal
./autogen.sh: line 88: aclocal: command not found
Something went wrong, bailing out!
Fix: yum install automake -y
Problem 2
Error:
Unable to find a valid copy of libtoolize or glibtoolize in your PATH!
./autogen.sh: line 59: bail_out: command not found
Running aclocal
Running
./autogen.sh: line 87: -f: command not found
Something went wrong, bailing out!
Fix: yum install libtool -y
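Both errors above come from build tools missing from PATH (aclocal ships with automake, libtoolize with libtool). A minimal sketch of a check that can be run before ./autogen.sh; the function name missing_tools is made up for this example, and the tool list in the usage comment is an assumption rather than the exact set autogen.sh requires:

```shell
# Report which of the given build tools are missing from PATH.
# Prints one missing tool name per line; returns 1 if any are missing.
missing_tools() {
    local tool rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (assumed tool list; adjust to whatever autogen.sh complains about):
#   missing_tools aclocal automake autoconf libtoolize pkg-config
```

If the function prints nothing, every listed tool was found; otherwise install the package that provides each printed tool and rerun ./autogen.sh.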
Problem 3
Error:
Leptonica 1.74 or higher is required. Try to install libleptonica-dev package
Fix:
Set the environment variables so that configure can find Leptonica, then rerun ./configure:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBLEPT_HEADERSDIR=/usr/local/include
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
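To confirm the variables took effect before rerunning ./configure, one option is to ask pkg-config for the Leptonica version directly. This is a rough sketch: the helper names version_ge and check_leptonica are invented here, it assumes Leptonica installed its pkg-config file as lept.pc, and it relies on GNU sort's -V option:

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Check that pkg-config can see Leptonica and that it is new enough.
# "lept" is assumed to be the pkg-config module name (lept.pc).
check_leptonica() {
    local required="${1:-1.74}" found
    if ! found="$(pkg-config --modversion lept 2>/dev/null)"; then
        echo "lept.pc not found; check PKG_CONFIG_PATH" >&2
        return 1
    fi
    if version_ge "$found" "$required"; then
        echo "Leptonica $found OK"
    else
        echo "Leptonica $found is older than $required" >&2
        return 1
    fi
}
```

Running check_leptonica in the same shell where the exports were made should report the installed version; a "not found" message suggests PKG_CONFIG_PATH does not point at the directory containing lept.pc.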
Install language packs
# Fetch all language packs
git clone https://github.com/tesseract-ocr/tessdata.git
Or download a single language file from https://github.com/tesseract-ocr/tessdata, for example:
wget --no-check-certificate https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
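chi_sim.traineddata Simplified Chinese
eng.traineddata English
enm.traineddata Middle English (1100-1500), not a digits model
The default language pack directory is usually /usr/local/share/tessdata,
so moving the language packs you need into that directory is enough.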
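The copy step can be wrapped in a small helper. The sketch below is only illustrative: install_traineddata is a made-up name, the source directory is whatever folder holds the downloaded files, and the default destination assumes the /usr/local/share/tessdata path mentioned above:

```shell
# Copy every *.traineddata file from a source directory into a tessdata
# directory. The default destination is an assumption for this sketch.
install_traineddata() {
    local src="$1" dest="${2:-/usr/local/share/tessdata}" f count=0
    mkdir -p "$dest" || return 1
    for f in "$src"/*.traineddata; do
        [ -e "$f" ] || continue      # glob matched nothing
        cp -- "$f" "$dest"/ || return 1
        count=$((count + 1))
    done
    echo "copied $count file(s) to $dest"
}
```

For example, after the git clone above, install_traineddata ./tessdata copies all language files (root privileges may be needed for the default destination). Running tesseract --list-langs afterwards should list the languages Tesseract can actually see.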
Run OCR
tesseract /home/aa.jpg stdout -l chi_sim+eng
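The command above prints the result for one image to stdout. For a folder of images, a loop along these lines may be handy; ocr_dir is a hypothetical helper name, and the sketch assumes .jpg files and the same chi_sim+eng language string as above:

```shell
# Run OCR on every .jpg in a directory, writing <name>.txt next to each image.
# The language string defaults to the one used in the command above.
ocr_dir() {
    local dir="$1" langs="${2:-chi_sim+eng}" img
    for img in "$dir"/*.jpg; do
        [ -e "$img" ] || continue    # glob matched nothing
        # tesseract appends ".txt" to the output base name itself
        tesseract "$img" "${img%.jpg}" -l "$langs" || echo "failed: $img" >&2
    done
}
```

Calling ocr_dir /home would then produce /home/aa.txt for /home/aa.jpg, and a second argument such as eng switches the language.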
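Summary
Recognition quality was mediocre, most likely because the trained model does not handle Chinese very well.
See the following two articles for reference: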