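1. Check that the environment is supported
The environment I used for this walkthrough:
One Windows7_x64 machine and one Windows10_x64 machine
Python 3.8.5
pip 21.0.1
I installed the CPU version, so there are no particular CUDA requirements.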
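To record the same facts on another machine, a small script along these lines can help. It is only a sketch: the function names are made up for illustration, and the 64-bit check assumes (as I understand the PaddlePaddle install guide) that a 64-bit Python is expected.

```python
import platform
import struct
import sys


def describe_environment():
    """Collect the facts listed above: OS, bitness, and Python version."""
    return {
        "os": platform.system(),            # e.g. "Windows" or "Linux"
        "os_release": platform.release(),   # e.g. "10"
        "bits": struct.calcsize("P") * 8,   # pointer size of the running interpreter
        "python": ".".join(str(part) for part in sys.version_info[:3]),
    }


def is_64bit_python(env):
    """Return True when the interpreter described by env is 64-bit."""
    return env["bits"] == 64


if __name__ == "__main__":
    env = describe_environment()
    for key, value in env.items():
        print(f"{key}: {value}")
    if not is_64bit_python(env):
        print("Warning: this does not look like a 64-bit Python")
```

Running `pip --version` in the same virtual environment shows the pip version to go with it.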
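2. Create a new project in PyCharm
I upgraded pip once along the way. Note that before upgrading I had to delete the old pip package from the virtual environment; otherwise the upgrade would not go through.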
3. Install the PaddlePaddle package
Install it with pip inside the virtual environment:
pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
4. Verify the installation
import paddle

if __name__ == '__main__':
    # Runs PaddlePaddle's built-in installation self-check
    paddle.utils.run_check()
5. Download the PaddleOCR repo code
You can use git clone (recommended):
git clone https://github.com/PaddlePaddle/PaddleOCR
Or you can use svn checkout:
svn co https://github.com/PaddlePaddle/PaddleOCR/trunk
In PyCharm, mark the PaddleOCR folder as a source root.
6. Install the required third-party libraries
cd PaddleOCR
pip install -r requirements.txt
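Before installing, you may need to switch the package index to: https://mirror.baidu.com/pypi/simple/
This step also depends on the VC++ 14 build tools; without them the installation fails with an error.
You can download them from the official link given in the error message: https://visualstudio.microsoft.com/visual-cpp-build-tools/
Then double-click the installer to run it.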
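One way to point pip at the mirror without editing any configuration is to pass it with `-i`, as in step 3. The sketch below only builds that command; `pip_install_command` is a name chosen for illustration, and it assumes the script is started with the virtual environment's interpreter from inside the PaddleOCR directory.

```python
import sys

# Mirror address taken from the note above.
MIRROR = "https://mirror.baidu.com/pypi/simple/"


def pip_install_command(requirements="requirements.txt", index_url=MIRROR):
    """Build a pip command that uses the current interpreter and the given index."""
    return [
        sys.executable, "-m", "pip", "install",
        "-r", requirements,
        "-i", index_url,
    ]
```

To actually run it, pass the list to `subprocess.run(pip_install_command(), check=True)`.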
Once the third-party libraries are installed, the output looks like this:
(venv) D:\000\PaddleStudy\PaddleOCR>pip install -r requirements.txt
Requirement already satisfied: shapely in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 1)) (1.7.1)
Requirement already satisfied: scikit-image==0.17.2 in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 2)) (0.17.2)
Requirement already satisfied: imgaug==0.4.0 in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 3)) (0.4.0)
Requirement already satisfied: pyclipper in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 4)) (1.2.1)
Requirement already satisfied: lmdb in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 5)) (1.1.1)
Requirement already satisfied: opencv-python==4.2.0.32 in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 6)) (4.2.0.32)
Requirement already satisfied: tqdm in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 7)) (4.59.0)
Requirement already satisfied: numpy in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 8)) (1.19.3)
Requirement already satisfied: visualdl in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 9)) (2.1.1)
Requirement already satisfied: python-Levenshtein in d:\000\paddlestudy\venv\lib\site-packages (from -r requirements.txt (line 10)) (0.12.2)
Requirement already satisfied: matplotlib in d:\000\paddlestudy\venv\lib\site-packages (from imgaug==0.4.0->-r requirements.txt (line 3)) (3.3.4)
Requirement already satisfied: Pillow in d:\000\paddlestudy\venv\lib\site-packages (from imgaug==0.4.0->-r requirements.txt (line 3)) (8.1.2)
Requirement already satisfied: imageio in d:\000\paddlestudy\venv\lib\site-packages (from imgaug==0.4.0->-r requirements.txt (line 3)) (2.9.0)
Requirement already satisfied: scipy in d:\000\paddlestudy\venv\lib\site-packages (from imgaug==0.4.0->-r requirements.txt (line 3)) (1.6.1)
Requirement already satisfied: six in d:\000\paddlestudy\venv\lib\site-packages (from imgaug==0.4.0->-r requirements.txt (line 3)) (1.15.0)
Requirement already satisfied: tifffile>=2019.7.26 in d:\000\paddlestudy\venv\lib\site-packages (from scikit-image==0.17.2->-r requirements.txt (line 2)) (2021.3.5)
Requirement already satisfied: networkx>=2.0 in d:\000\paddlestudy\venv\lib\site-packages (from scikit-image==0.17.2->-r requirements.txt (line 2)) (2.5)
Requirement already satisfied: PyWavelets>=1.1.1 in d:\000\paddlestudy\venv\lib\site-packages (from scikit-image==0.17.2->-r requirements.txt (line 2)) (1.1.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in d:\000\paddlestudy\venv\lib\site-packages (from matplotlib->imgaug==0.4.0->-r requirements.txt (line 3)) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in d:\000\paddlestudy\venv\lib\site-packages (from matplotlib->imgaug==0.4.0->-r requirements.txt (line 3)) (1.3.1)
Requirement already satisfied: python-dateutil>=2.1 in d:\000\paddlestudy\venv\lib\site-packages (from matplotlib->imgaug==0.4.0->-r requirements.txt (line 3)) (2.8.1)
Requirement already satisfied: cycler>=0.10 in d:\000\paddlestudy\venv\lib\site-packages (from matplotlib->imgaug==0.4.0->-r requirements.txt (line 3)) (0.10.0)
Requirement already satisfied: decorator>=4.3.0 in d:\000\paddlestudy\venv\lib\site-packages (from networkx>=2.0->scikit-image==0.17.2->-r requirements.txt (line 2)) (4.4.2)
Requirement already satisfied: setuptools in d:\000\paddlestudy\venv\lib\site-packages (from python-Levenshtein->-r requirements.txt (line 10)) (54.1.0)
Requirement already satisfied: pre-commit in d:\000\paddlestudy\venv\lib\site-packages (from visualdl->-r requirements.txt (line 9)) (2.10.1)
Requirement already satisfied: flake8>=3.7.9 in d:\000\paddlestudy\venv\lib\site-packages (from visualdl->-r requirements.txt (line 9)) (3.8.4)
Requirement already satisfied: requests in d:\000\paddlestudy\venv\lib\site-packages (from visualdl->-r requirements.txt (line 9)) (2.25.1)
Requirement already satisfied: Flask-Babel>=1.0.0 in d:\000\paddlestudy\venv\lib\site-packages (from visualdl->-r requirements.txt (line 9)) (2.0.0)
Requirement already satisfied: bce-python-sdk in d:\000\paddlestudy\venv\lib\site-packages (from visualdl->-r requirements.txt (line 9)) (0.8.53)
Requirement already satisfied: protobuf>=3.11.0 in d:\000\paddlestudy\venv\lib\site-packages (from visualdl->-r requirements.txt (line 9)) (3.15.5)
Requirement already satisfied: shellcheck-py in d:\000\paddlestudy\venv\lib\site-packages (from visualdl->-r requirements.txt (line 9)) (0.7.1.1)
Requirement already satisfied: flask>=1.1.1 in d:\000\paddlestudy\venv\lib\site-packages (from visualdl->-r requirements.txt (line 9)) (1.1.2)
Requirement already satisfied: pyflakes<2.3.0,>=2.2.0 in d:\000\paddlestudy\venv\lib\site-packages (from flake8>=3.7.9->visualdl->-r requirements.txt (line 9)) (2.2.0)
Requirement already satisfied: mccabe<0.7.0,>=0.6.0 in d:\000\paddlestudy\venv\lib\site-packages (from flake8>=3.7.9->visualdl->-r requirements.txt (line 9)) (0.6.1)
Requirement already satisfied: pycodestyle<2.7.0,>=2.6.0a1 in d:\000\paddlestudy\venv\lib\site-packages (from flake8>=3.7.9->visualdl->-r requirements.txt (line 9)) (2.6.0)
Requirement already satisfied: Jinja2>=2.10.1 in d:\000\paddlestudy\venv\lib\site-packages (from flask>=1.1.1->visualdl->-r requirements.txt (line 9)) (2.11.3)
Requirement already satisfied: click>=5.1 in d:\000\paddlestudy\venv\lib\site-packages (from flask>=1.1.1->visualdl->-r requirements.txt (line 9)) (7.1.2)
Requirement already satisfied: itsdangerous>=0.24 in d:\000\paddlestudy\venv\lib\site-packages (from flask>=1.1.1->visualdl->-r requirements.txt (line 9)) (1.1.0)
Requirement already satisfied: Werkzeug>=0.15 in d:\000\paddlestudy\venv\lib\site-packages (from flask>=1.1.1->visualdl->-r requirements.txt (line 9)) (1.0.1)
Requirement already satisfied: pytz in d:\000\paddlestudy\venv\lib\site-packages (from Flask-Babel>=1.0.0->visualdl->-r requirements.txt (line 9)) (2021.1)
Requirement already satisfied: Babel>=2.3 in d:\000\paddlestudy\venv\lib\site-packages (from Flask-Babel>=1.0.0->visualdl->-r requirements.txt (line 9)) (2.9.0)
Requirement already satisfied: MarkupSafe>=0.23 in d:\000\paddlestudy\venv\lib\site-packages (from Jinja2>=2.10.1->flask>=1.1.1->visualdl->-r requirements.txt (line 9)) (1.1.1)
Requirement already satisfied: pycryptodome>=3.8.0 in d:\000\paddlestudy\venv\lib\site-packages (from bce-python-sdk->visualdl->-r requirements.txt (line 9)) (3.10.1)
Requirement already satisfied: future>=0.6.0 in d:\000\paddlestudy\venv\lib\site-packages (from bce-python-sdk->visualdl->-r requirements.txt (line 9)) (0.18.2)
Requirement already satisfied: toml in d:\000\paddlestudy\venv\lib\site-packages (from pre-commit->visualdl->-r requirements.txt (line 9)) (0.10.2)
Requirement already satisfied: virtualenv>=20.0.8 in d:\000\paddlestudy\venv\lib\site-packages (from pre-commit->visualdl->-r requirements.txt (line 9)) (20.4.2)
Requirement already satisfied: identify>=1.0.0 in d:\000\paddlestudy\venv\lib\site-packages (from pre-commit->visualdl->-r requirements.txt (line 9)) (2.1.0)
Requirement already satisfied: cfgv>=2.0.0 in d:\000\paddlestudy\venv\lib\site-packages (from pre-commit->visualdl->-r requirements.txt (line 9)) (3.2.0)
Requirement already satisfied: pyyaml>=5.1 in d:\000\paddlestudy\venv\lib\site-packages (from pre-commit->visualdl->-r requirements.txt (line 9)) (5.4.1)
Requirement already satisfied: nodeenv>=0.11.1 in d:\000\paddlestudy\venv\lib\site-packages (from pre-commit->visualdl->-r requirements.txt (line 9)) (1.5.0)
Requirement already satisfied: distlib<1,>=0.3.1 in d:\000\paddlestudy\venv\lib\site-packages (from virtualenv>=20.0.8->pre-commit->visualdl->-r requirements.txt (line 9)) (0.3.1)
Requirement already satisfied: appdirs<2,>=1.4.3 in d:\000\paddlestudy\venv\lib\site-packages (from virtualenv>=20.0.8->pre-commit->visualdl->-r requirements.txt (line 9)) (1.4.4)
Requirement already satisfied: filelock<4,>=3.0.0 in d:\000\paddlestudy\venv\lib\site-packages (from virtualenv>=20.0.8->pre-commit->visualdl->-r requirements.txt (line 9)) (3.0.12)
Requirement already satisfied: certifi>=2017.4.17 in d:\000\paddlestudy\venv\lib\site-packages (from requests->visualdl->-r requirements.txt (line 9)) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in d:\000\paddlestudy\venv\lib\site-packages (from requests->visualdl->-r requirements.txt (line 9)) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in d:\000\paddlestudy\venv\lib\site-packages (from requests->visualdl->-r requirements.txt (line 9)) (1.26.3)
Requirement already satisfied: chardet<5,>=3.0.2 in d:\000\paddlestudy\venv\lib\site-packages (from requests->visualdl->-r requirements.txt (line 9)) (4.0.0)
7. Create directories
The following directories need to be created by hand:
ch_lite
det_db
inference
inference_results
models
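If you would rather script this step, here is a minimal sketch using pathlib. The helper name is made up, and placing det_db under inference is my assumption, based on the ./inference/det_db save path used in step 9; adjust the list to match your own layout.

```python
from pathlib import Path

# Directory names from the list above. Nesting det_db under inference is an
# assumption, based on the ./inference/det_db save path used in step 9.
DEFAULT_DIRS = ["ch_lite", "inference/det_db", "inference_results", "models"]


def create_dirs(root, names=DEFAULT_DIRS):
    """Create each directory under root; directories that already exist are left alone."""
    created = []
    for name in names:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_dirs(".")` from inside the PaddleOCR folder creates the directories there, and calling it a second time is harmless.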
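8. Download the parameter files for the text detection, text recognition, and text direction classification models
[Text detection model] Download URL:
https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar
Extract it into the ch_lite directory.
[Text recognition model] Download URL:
https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar
Extract it into the ch_lite directory.
[Text direction classification model] Download URL:
https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar
Extract it into the ch_lite directory.
If you download and extract the archives directly, some "folders" inside them may show up as "files".
I tried downloading the tar files with wget on Linux and extracting them with tar -xf, which solved the problem. The cause is most likely the extraction step, so on Windows it is also worth trying a different archive tool.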
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar -C ./ch_lite/
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_rec_train.tar -C ./ch_lite/
wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_cls_train.tar -C ./ch_lite/
[user@localhost tmp]$ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar -C ./ch_lite/
--2021-03-11 16:05:36-- https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 2409:8c00:6c21:10ad:0:ff:b00e:67d, 112.34.111.44, 39.156.69.23
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|2409:8c00:6c21:10ad:0:ff:b00e:67d|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12400640 (12M) [application/x-tar]
Saving to: ‘./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar’
100%[==================================================>] 12,400,640 3.17MB/s in 6.0s
2021-03-11 16:05:43 (1.98 MB/s) - ‘./ch_lite/ch_ppocr_mobile_v1.1_det_train.tar’ saved [12400640/12400640]
[user@localhost tmp]$ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_rec_train.tar -C ./ch_lite/
--2021-03-11 16:05:53-- https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 2409:8c00:6c21:10ad:0:ff:b00e:67d, 39.156.69.23, 112.34.111.44
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|2409:8c00:6c21:10ad:0:ff:b00e:67d|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13675015 (13M) [application/x-tar]
Saving to: ‘./ch_lite/ch_ppocr_mobile_v1.1_rec_train.tar’
100%[==================================================>] 13,675,015 2.45MB/s in 7.6s
2021-03-11 16:06:01 (1.71 MB/s) - ‘./ch_lite/ch_ppocr_mobile_v1.1_rec_train.tar’ saved [13675015/13675015]
[user@localhost tmp]$ wget -P ./ch_lite/ https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar && tar xf ./ch_lite/ch_ppocr_mobile_v1.1_cls_train.tar -C ./ch_lite/
--2021-03-11 16:06:10-- https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 2409:8c00:6c21:10ad:0:ff:b00e:67d, 112.34.111.44, 39.156.69.23
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|2409:8c00:6c21:10ad:0:ff:b00e:67d|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3957248 (3.8M) [application/x-tar]
Saving to: ‘./ch_lite/ch_ppocr_mobile_v1.1_cls_train.tar’
100%[==================================================>] 3,957,248 1.05MB/s in 4.0s
2021-03-11 16:06:14 (976 KB/s) - ‘./ch_lite/ch_ppocr_mobile_v1.1_cls_train.tar’ saved [3957248/3957248]
[user@localhost tmp]$ ll
total 4
drwxrwxr-x. 5 user group 4096 Mar 11 16:06 ch_lite
[user@localhost tmp]$ cd ch_lite/
[user@localhost ch_lite]$ ll
total 29336
drwxr-xr-x. 2 user group 184 Sep 15 16:11 ch_ppocr_mobile_v1.1_cls_train
-rw-rw-r--. 1 user group 3957248 Sep 17 14:43 ch_ppocr_mobile_v1.1_cls_train.tar
drwxr-xr-x. 2 user group 92 Sep 16 14:56 ch_ppocr_mobile_v1.1_det_train
-rw-rw-r--. 1 user group 12400640 Sep 16 15:18 ch_ppocr_mobile_v1.1_det_train.tar
drwxrwxr-x. 2 user group 92 Sep 24 09:15 ch_ppocr_mobile_v1.1_rec_train
-rw-rw-r--. 1 user group 13675015 Sep 24 10:48 ch_ppocr_mobile_v1.1_rec_train.tar
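Since the problem seemed to come from the extraction tool, another option that works the same way on Windows and Linux is Python's own tarfile module. I have not verified that it avoids the "folders shown as files" issue on every setup, so treat this as a sketch. The function names are illustrative, the URLs are the three listed in this step, and `extractall` is used on the assumption that the archives are trusted.

```python
import tarfile
import urllib.request
from pathlib import Path

BASE = "https://paddleocr.bj.bcebos.com/20-09-22"
MODEL_URLS = [
    f"{BASE}/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar",
    f"{BASE}/mobile/rec/ch_ppocr_mobile_v1.1_rec_train.tar",
    f"{BASE}/cls/ch_ppocr_mobile_v1.1_cls_train.tar",
]


def extract_tar(tar_path, dest_dir):
    """Extract a tar archive into dest_dir and return its top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as archive:
        names = archive.getnames()
        archive.extractall(dest)  # assumes the archive comes from a trusted source
    return sorted({name.split("/")[0] for name in names})


def download_and_extract(url, dest_dir="ch_lite"):
    """Download one archive into dest_dir, then unpack it in the same place."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    tar_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, tar_path)
    return extract_tar(tar_path, dest)
```

Looping over `MODEL_URLS` and calling `download_and_extract(url)` from the PaddleOCR folder should leave the same three `*_train` directories under ch_lite as in the listing above.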
9. Convert the model parameter files into inference models
(venv) E:\test\PycharmProjects\PaddleStudy\PaddleOCR>python tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_det_train/best_accuracy Global.save_inference_dir=./inference/det_db
[2021/03/11 16:21:47] root INFO: resume from ./ch_lite/ch_ppocr_mobile_v1.1_det_train/best_accuracy
[2021/03/11 16:21:52] root INFO: inference model is saved to ./inference/det_db//inference
(venv) E:\test\PycharmProjects\PaddleStudy\PaddleOCR>python tools/export_model.py -c configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_rec_train/best_accuracy Global.save_inference_dir=./inference/rec_crnn/
[2021/03/11 16:22:04] root INFO: resume from ./ch_lite/ch_ppocr_mobile_v1.1_rec_train/best_accuracy
[2021/03/11 16:22:08] root INFO: inference model is saved to ./inference/rec_crnn//inference
(venv) E:\test\PycharmProjects\PaddleStudy\PaddleOCR>python tools/export_model.py -c configs/cls/cls_mv3.yml -o Global.checkpoints=./ch_lite/ch_ppocr_mobile_v1.1_cls_train/best_accuracy Global.save_inference_dir=./inference/cls/
[2021/03/11 16:22:15] root INFO: resume from ./ch_lite/ch_ppocr_mobile_v1.1_cls_train/best_accuracy
[2021/03/11 16:22:19] root INFO: inference model is saved to ./inference/cls//inference
(venv) E:\test\PycharmProjects\PaddleStudy\PaddleOCR>
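Where:
# -c sets the yml configuration file of the training algorithm
# -o sets optional parameters
# Global.checkpoints points to the model parameter files saved during training; do not add the file suffix .pdmodel, .pdopt, or .pdparams.
# Global.save_inference_dir sets the directory where the converted model will be saved.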
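To avoid retyping the three long commands, they can be generated from a small table. This is a sketch rather than part of PaddleOCR: `EXPORT_JOBS` and `export_command` are names I made up, and the paths are copied from the commands in this step.

```python
import sys

# (config file, checkpoint prefix without suffix, output directory),
# copied from the three export commands in this step.
EXPORT_JOBS = [
    ("configs/det/det_mv3_db.yml",
     "./ch_lite/ch_ppocr_mobile_v1.1_det_train/best_accuracy",
     "./inference/det_db"),
    ("configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml",
     "./ch_lite/ch_ppocr_mobile_v1.1_rec_train/best_accuracy",
     "./inference/rec_crnn/"),
    ("configs/cls/cls_mv3.yml",
     "./ch_lite/ch_ppocr_mobile_v1.1_cls_train/best_accuracy",
     "./inference/cls/"),
]


def export_command(config, checkpoint, save_dir):
    """Build one tools/export_model.py command line as an argument list."""
    return [
        sys.executable, "tools/export_model.py",
        "-c", config,
        "-o", f"Global.checkpoints={checkpoint}",
        f"Global.save_inference_dir={save_dir}",
    ]
```

From the PaddleOCR directory, each command can then be run with `subprocess.run(export_command(*job), check=True)` for every `job` in `EXPORT_JOBS`.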
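After the conversion, the exported model files are generated under:
inference/det_db (text detection model)
inference/rec_crnn (text recognition model)
inference/cls (text direction classification model)
I noticed that the generated .pdmodel files differ in size across environments (the corresponding .pdiparams and .info files are the same size). On my other Windows 10 machine the sizes were: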
inference/det_db/inference.pdmodel - 1149KB
inference/rec_crnn/inference.pdmodel - 848KB
inference/cls/inference.pdmodel - 835KB
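To compare sizes between machines without reading them off a file manager, the inference directory can be walked with a few lines of Python. This is a sketch with an illustrative function name; sizes are rounded up to whole KB, which may differ slightly from what a particular file manager shows.

```python
from pathlib import Path


def inference_file_sizes(root="inference"):
    """Map each file under root (as a relative path) to its size in KB, rounded up."""
    sizes = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            size_kb = -(-path.stat().st_size // 1024)  # ceiling division
            sizes[path.relative_to(root).as_posix()] = size_kb
    return sizes
```

Printing the result of `inference_file_sizes()` on each machine gives two lists that can be compared line by line.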
[References]
PaddlePaddle installation guide:
https://www.paddlepaddle.org.cn/install/quick
Inference with the Python prediction engine:
https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/inference.md
Simple text recognition with PaddleOCR:
https://blog.youkuaiyun.com/weixin_43134049/article/details/110670762