Language pack download: https://gitcode.net/mirrors/tesseract-ocr/tessdata?utm_source=csdn_github_accelerator  Simplified Chinese pack: chi_sim.traineddata
On Windows it works out of the box, so I won't go into detail here.
Linux deployment steps
Tess4J version 5.1.0, Maven dependency:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>5.1.0</version>
</dependency>
Based on the Tess4J version, look up the matching Tesseract and Leptonica versions (the versions must correspond).
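After adding the dependency to the pom, reference net.sourceforge.lept4j.util.LeptUtils in your code and navigate into the jar contents to read the Leptonica version; as shown in the figure, it is 1.82.0.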
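Opening the jar in the IDE works; as a scriptable alternative, the JDK-only sketch below lists the entries of a jar whose names contain a keyword. It assumes the jar bundles native libraries (such as Windows DLLs) whose file names reveal a version, which you should confirm for your Lept4J/Tess4J release. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarNativeScan {

    /** Names of all non-directory entries whose path contains the keyword (case-insensitive). */
    public static List<String> findEntries(String jarPath, String keyword) throws IOException {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!entry.isDirectory() && name.toLowerCase(Locale.ROOT).contains(needle)) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a jar from the local Maven repository (e.g. the lept4j or tess4j jar)
        // args[1]: keyword to look for, e.g. "lept" or "tesseract"
        for (String name : findEntries(args[0], args[1])) {
            System.out.println(name);
        }
    }
}
```

Point it at the jar in your local Maven repository (the Lept4J jar sits under ~/.m2/repository/net/sourceforge/lept4j/). The JDK's `jar tf` command gives the same listing from the shell.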
Tess4J (GitHub): https://github.com/nguyenq/tess4j/releases
Tesseract (GitHub): https://github.com/tesseract-ocr/tesseract/releases
Leptonica (official site): http://www.leptonica.org/download.html
Download the matching .tar.gz source archives for Tesseract and Leptonica from these pages.
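Step 4 extracts `5.0.1.tar.gz` and then enters `tesseract-5.0.1`. That naming follows from how GitHub serves tag archives: the URL ends in `<tag>.tar.gz`, which is the name plain wget saves it under by default, and the archive's top-level folder is `<repo>-<tag>`. The helper below (hypothetical names) just builds those strings for a given version.

```java
public class ReleaseArchive {

    /** GitHub's "Source code (tar.gz)" download URL for a tag. */
    public static String tagArchiveUrl(String owner, String repo, String tag) {
        return "https://github.com/" + owner + "/" + repo + "/archive/refs/tags/" + tag + ".tar.gz";
    }

    /** Last path segment of the URL, i.e. the file name wget uses by default. */
    public static String fileNameOf(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    /** Top-level folder inside the archive: <repo>-<tag> (for tags without a leading "v", such as 5.0.1). */
    public static String extractedDir(String repo, String tag) {
        return repo + "-" + tag;
    }

    public static void main(String[] args) {
        String url = tagArchiveUrl("tesseract-ocr", "tesseract", "5.0.1");
        System.out.println("download:    " + url);
        System.out.println("saved as:    " + fileNameOf(url));
        System.out.println("extracts to: " + extractedDir("tesseract", "5.0.1"));
    }
}
```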
With the packages ready, start the installation:
1. Upload the Tesseract and Leptonica archives to /root/
2. Install the build dependencies
yum install autoconf automake libtool pkgconfig libjpeg-devel libpng-devel libtiff-devel zlib-devel gcc gcc-c++
3. Install Leptonica (4 steps)
mkdir /usr/local/leptonica
tar -xzvf leptonica-1.82.0.tar.gz
cd leptonica-1.82.0
./configure --prefix=/usr/local/leptonica && make && make install
4. Install Tesseract
mkdir /usr/local/tesseract
tar -xzvf 5.0.1.tar.gz
cd tesseract-5.0.1
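./autogen.sh
# let configure find the Leptonica built above through its pkg-config file
export PKG_CONFIG_PATH=/usr/local/leptonica/lib/pkgconfig:$PKG_CONFIG_PATH
./configure --prefix=/usr/local/tesseract && make && make install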
5. Configure environment variables
vim /etc/profile   # open the environment configuration file
# append the following lines
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/leptonica/lib:/usr/local/tesseract/lib
export LD_LIBRARY_PATH
LIBRARY_PATH=$LIBRARY_PATH:/usr/local/leptonica/lib:/usr/local/tesseract/lib
export LIBRARY_PATH
LIBLEPT_HEADERSDIR=/usr/local/leptonica/include/leptonica
export LIBLEPT_HEADERSDIR
PATH=$PATH:/usr/local/tesseract/bin
export PATH
# directory containing the .traineddata language packs (e.g. chi_sim.traineddata)
export TESSDATA_PREFIX=/usr/local/tesseract/share/tessdata
# end
source /etc/profile   # make the variables take effect in the current shell
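/etc/profile is read by login shells, and `source` only updates the current shell, so a Java service started some other way (for example by systemd or from an IDE) may not see these variables. A quick check from inside the JVM, sketched below with hypothetical names, splits a path-list variable on the platform separator and looks for the expected directories.

```java
import java.io.File;
import java.util.Arrays;
import java.util.regex.Pattern;

public class EnvCheck {

    /** True if dir is one of the entries of a path list such as LD_LIBRARY_PATH (":"-separated on Linux). */
    public static boolean hasEntry(String pathList, String dir) {
        if (pathList == null || pathList.isEmpty()) {
            return false;
        }
        return Arrays.asList(pathList.split(Pattern.quote(File.pathSeparator))).contains(dir);
    }

    public static void main(String[] args) {
        String ld = System.getenv("LD_LIBRARY_PATH");
        System.out.println("leptonica lib on LD_LIBRARY_PATH: " + hasEntry(ld, "/usr/local/leptonica/lib"));
        System.out.println("tesseract lib on LD_LIBRARY_PATH: " + hasEntry(ld, "/usr/local/tesseract/lib"));
        System.out.println("TESSDATA_PREFIX = " + System.getenv("TESSDATA_PREFIX"));
    }
}
```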
6. Verify the installation
tesseract --version
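To confirm the binary is also visible to the Java process (its PATH may differ from your shell's), you can run the same command via `ProcessBuilder`. This sketch (hypothetical class name) merges stderr into stdout so the version text is captured whichever stream it is printed to, and treats a non-zero exit code as failure.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class CommandCheck {

    /** Runs the command, returns its combined stdout/stderr, and throws if it exits non-zero. */
    public static String run(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        // Read all output before waiting, so a full pipe buffer cannot block the child process
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("exit code " + exitCode + ": " + output);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // start() throws IOException if "tesseract" is not on this JVM's PATH
        System.out.println(run("tesseract", "--version"));
    }
}
```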
7. Fetch the compiled libraries from the Linux server:
from /usr/local/tesseract/lib, download libtesseract.so.5.0.0
from /usr/local/leptonica/lib, download liblept.so.5.0.4
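After `make install`, the lib folders typically also contain shorter names such as `libtesseract.so` and `libtesseract.so.5`; these are symlinks created by libtool that point at the fully versioned file, which is the one to download. The exact suffix depends on your build, so instead of hard-coding `5.0.0`, the sketch below (hypothetical names) globs the folder and keeps only regular files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SharedLibFinder {

    /** Regular files (symlinks excluded) in dir whose names match the glob, e.g. "libtesseract.so.*". */
    public static List<Path> realFiles(Path dir, String glob) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                // NOFOLLOW_LINKS makes symlinks report as non-regular files
                if (Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)) {
                    found.add(p);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(realFiles(Paths.get("/usr/local/tesseract/lib"), "libtesseract.so.*"));
        System.out.println(realFiles(Paths.get("/usr/local/leptonica/lib"), "liblept.so.*"));
    }
}
```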
8. Rename them to drop the version suffix (libtesseract.so and liblept.so) and add them to the project.
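Step 8 can be scripted as below. `stripVersion` cuts everything after `.so`, and `copyUnversioned` copies the file under that name. The names are hypothetical, and so is the target folder `src/main/resources/linux-x86-64`: it is an assumption based on JNA (which Tess4J uses to load native code) looking for libraries on the classpath under its platform prefix, `linux-x86-64` on 64-bit x86 Linux. Adjust it if you place the libraries elsewhere, for example in a directory passed via `-Djna.library.path`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class NativeLibCopier {

    /** "libtesseract.so.5.0.0" -> "libtesseract.so"; names without a ".so." suffix are returned unchanged. */
    public static String stripVersion(String fileName) {
        int idx = fileName.indexOf(".so.");
        return idx >= 0 ? fileName.substring(0, idx + ".so".length()) : fileName;
    }

    /** Copies source into targetDir under its unversioned name and returns the copied path. */
    public static Path copyUnversioned(Path source, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(stripVersion(source.getFileName().toString()));
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Assumed target folder (JNA classpath prefix for 64-bit x86 Linux); adjust as needed
        Path target = Paths.get("src/main/resources/linux-x86-64");
        copyUnversioned(Paths.get("libtesseract.so.5.0.0"), target);
        copyUnversioned(Paths.get("liblept.so.5.0.4"), target);
    }
}
```

The version suffix is dropped presumably because the libraries are loaded by their base names (`libtesseract.so`, `liblept.so`).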
9. Usage
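import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.OCRResult;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.Word;

// Illustrative class name
public class OcrDemo {

    public static void getResult(BufferedImage bufferedImage) {
        try {
            ITesseract instance = new Tesseract();
            // Language pack name without the extension: chi_sim.traineddata -> "chi_sim"
            instance.setLanguage("chi_sim");
            // Folder containing the language pack: for d:\\tess\\chi_sim.traineddata use d:\\tess.
            // The path differs between Linux and Windows, but the folder must contain the pack above.
            String path = "";
            System.out.println("Data path: " + path);
            instance.setDatapath(path);
            // OCR engine mode 1: LSTM neural-network engine only
            instance.setOcrEngineMode(1);
            // Page segmentation mode 6: treat the image as a single uniform block of text
            instance.setPageSegMode(6);
            List<ITesseract.RenderedFormat> formats = new ArrayList<>();
            formats.add(ITesseract.RenderedFormat.TEXT);
            // Last argument 3 is the word-level iterator (RIL_WORD), so getWords() yields words
            OCRResult result = instance.createDocumentsWithResults(bufferedImage, "", "", formats, 3);
            List<Word> words = result.getWords();
            System.out.println("Confidence: " + result.getConfidence());
            for (Word word : words) {
                String text = word.getText(); // recognized content
                System.out.println(text);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}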
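A common failure at this step is a data path that does not directly contain the language pack. The sketch below (hypothetical names) checks `<dataPath>/<language>.traineddata` before you call `setDatapath`, and derives the language name from the file name the same way the comment in `getResult` describes.

```java
import java.io.File;

public class TessdataCheck {

    /** "chi_sim.traineddata" -> "chi_sim", the value passed to setLanguage. */
    public static String languageName(String trainedDataFile) {
        String suffix = ".traineddata";
        if (!trainedDataFile.endsWith(suffix)) {
            throw new IllegalArgumentException("not a .traineddata file: " + trainedDataFile);
        }
        return trainedDataFile.substring(0, trainedDataFile.length() - suffix.length());
    }

    /** True if dataPath is the folder that directly contains <language>.traineddata. */
    public static boolean hasLanguage(String dataPath, String language) {
        return new File(dataPath, language + ".traineddata").isFile();
    }

    public static void main(String[] args) {
        String dataPath = args.length > 0 ? args[0] : "/usr/local/tesseract/share/tessdata";
        String lang = languageName("chi_sim.traineddata");
        System.out.println(lang + " present in " + dataPath + ": " + hasLanguage(dataPath, lang));
    }
}
```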