Preface:
A collection of applied OCR scenarios: nine vertical-domain models, including seven-segment digit displays, LCD screens, license plates, the high-accuracy SVTR model, and handwriting recognition, covering the main vertical OCR applications in general use, manufacturing, finance, and transportation.
I. Environment Setup
(The steps below work on Python 3.8 and 3.9; Python 3.10 does not work, and other versions are untested.)
1. Setting up the PaddleOCR environment
conda create -n ppocr python=3.8
conda activate ppocr
Go to the PaddlePaddle website and install the GPU build of PaddlePaddle with the following command.
(My CUDA version is 11.8; pick the build that matches your machine.)
pip install paddlepaddle-gpu==2.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
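To confirm the GPU build works before moving on, a quick sanity check (run it inside the ppocr environment):
python -c "import paddle; paddle.utils.run_check()"
It should report that PaddlePaddle is installed successfully and list the GPU devices it found.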
Go to the PaddlePaddle / PaddleOCR site, download the PaddleOCR 2.7 release, and put it in your home directory.
First, clone the PaddleOCR repository. (Mirrors go down from time to time, so keep a couple of alternatives.)
git clone https://github.com/PaddlePaddle/PaddleOCR.git
# clone from a mirror instead
git clone https://gitcode.com/gh_mirrors/pa/PaddleOCR.git
Then install the dependencies:
Baidu mirror (hosts Baidu's own packages, so those download quickly; if it misbehaves, switch to another mirror):
pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple/
Aliyun mirror:
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Huawei mirror (very fast for some packages that crawl along at a few tens of KB/s on the Baidu mirror):
pip install -r requirements.txt -i https://mirrors.huaweicloud.com/repository/pypi/simple/
Tsinghua mirror (consistently decent speed):
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
Other mirrors, in case the ones above are slow:
Tencent: https://mirrors.cloud.tencent.com/pypi/simple/
Douban: https://pypi.douban.com/simple/
NetEase: https://mirrors.163.com/pypi/simple/
USTC: https://pypi.mirrors.ustc.edu.cn/simple/
If no errors are reported, the paddlepaddle-gpu environment is installed successfully.
2. Downloading PaddleOCR models (official PaddleOCR model page)
The most recently updated models are listed on the ppocr model page.
Download the detection and the recognition inference models separately.
The downloads are two archives. Create an inference_model folder in the PaddleOCR-release-2.7 root directory and extract both archives into it, as shown in the figure.
Open an Anaconda prompt, activate the environment, cd into the PaddleOCR-release-2.7 directory, and run the command below: image_dir is the image to recognize, det_model_dir is the text detection model you just downloaded, and rec_model_dir is the text recognition model you just downloaded.
python tools/infer/predict_system.py --image_dir="./test_images/1.jpg" --det_model_dir="./inference_model/ch_PP-OCRv4_det_infer/" --rec_model_dir="./inference_model/ch_PP-OCRv4_rec_infer/"
Environment setup complete.
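If you also install the paddleocr wheel (pip install paddleocr), the same sanity check can be done from Python. A minimal sketch; the test image path is an assumption, and the default models are downloaded on first use:
from paddleocr import PaddleOCR  # requires: pip install paddleocr

ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # downloads default det/rec/cls models on first run
result = ocr.ocr("./test_images/1.jpg", cls=True)  # hypothetical test image
for line in result[0]:  # each entry: [box points, (text, confidence)]
    print(line)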
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3. Installing the PPOCRLabel environment
From the PPOCRLabel directory, run:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
Run PPOCRLabel:
python PPOCRLabel.py --lang ch
Problem encountered:
ValueError: operands could not be broadcast together with shapes (213,488,4) (1,1,3)
Fix:
Find PaddleOCR/data/paddle.png and rename it to paddle1.png.
Another error:
FatalError: `Process abort signal` is detected by the operating system.
[TimeInfo: *** Aborted at 1728701280 (unix time) try "date -d @1728701280" if you are using GNU date ***]
[SignalInfo: *** SIGABRT (@0x3e800004230) received by PID 16944 (TID 0x79e6f6ad2740) from PID 16944 ***]
Aborted (core dumped)
Fix:
pip install opencv-python-headless -i https://pypi.tuna.tsinghua.edu.cn/simple
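(If the abort persists, a likely cause is a Qt conflict between PyQt5 and the regular opencv-python build; uninstalling opencv-python and opencv-contrib-python before installing the headless build is a common remedy.)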
Run it again:
python PPOCRLabel.py --lang ch
If it still errors, add the following lines at the very top of the PPOCRLabel.py file in the PPOCRLabel directory:
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
The PPOCRLabel window should now open:
II. Building the Dataset
Select auto-labeling, click OK, and wait for auto-labeling to finish.
Auto-labeling:
When auto-labeling finishes, review the images starting from the first one. Press Q to draw a box around any text that was missed. For text that was recognized incorrectly, click its box and correct the text on the right, and give each box its key-field information (click Edit, then Change Box Key Information). Finally, delete any useless entries; press D to move to the next image, as shown in the figure.
Once everything is labeled, click File and choose Export Label, then click File again and choose Export Recognition Result. Four new items appear in the image folder: fileState.txt, Label.txt, rec_gt.txt, and the crop_img folder. The images in crop_img are used to train the text recognition model, fileState.txt records which images have been labeled, Label.txt holds the labels for training the text detection model, and rec_gt.txt holds the labels for training the text recognition model.
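For reference, Label.txt has one line per image (image path, a tab, then a JSON list of boxes) and rec_gt.txt has one line per cropped text image (crop path, a tab, then the transcription). The paths and text below are made-up examples:
train_data/drivingData/1.jpg	[{"transcription": "苏A12345", "points": [[10, 20], [200, 20], [200, 60], [10, 60]], "difficult": false}]
crop_img/1_crop_0.jpg	苏A12345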
With labeling done, the next step is building the dataset. Create a train_data folder in the PaddleOCR root directory and put the generated label files and images into it.
Open a terminal in the PPOCRLabel folder and run the following script to split the dataset.
gen_ocr_train_val_test.py lives in the PPOCRLabel directory:
# coding:utf8
import os
import shutil
import random
import argparse


# Delete any existing train/val/test split folder and recreate it empty
def isCreateOrDeleteFolder(path, flag):
    flagPath = os.path.join(path, flag)
    if os.path.exists(flagPath):
        shutil.rmtree(flagPath)
    os.makedirs(flagPath)
    flagAbsPath = os.path.abspath(flagPath)
    return flagAbsPath


def splitTrainVal(root, absTrainRootPath, absValRootPath, absTestRootPath, trainTxt, valTxt, testTxt, flag):
    # Split the data into train/val/test sets according to the given ratio
    dataAbsPath = os.path.abspath(root)
    if flag == "det":
        labelFilePath = os.path.join(dataAbsPath, args.detLabelFileName)
    elif flag == "rec":
        labelFilePath = os.path.join(dataAbsPath, args.recLabelFileName)
    labelFileRead = open(labelFilePath, "r", encoding="UTF-8")
    labelFileContent = labelFileRead.readlines()
    random.shuffle(labelFileContent)
    labelRecordLen = len(labelFileContent)
    for index, labelRecordInfo in enumerate(labelFileContent):
        imageRelativePath = labelRecordInfo.split('\t')[0]
        imageLabel = labelRecordInfo.split('\t')[1]
        imageName = os.path.basename(imageRelativePath)
        if flag == "det":
            imagePath = os.path.join(dataAbsPath, imageName)
        elif flag == "rec":
            imagePath = os.path.join(dataAbsPath, "{}/{}".format(args.recImageDirName, imageName))
        # Assign this record to train/val/test according to the preset ratio
        trainValTestRatio = args.trainValTestRatio.split(":")
        trainRatio = float(trainValTestRatio[0]) / 10  # float() instead of eval(): equivalent for numeric ratios and safer
        valRatio = trainRatio + float(trainValTestRatio[1]) / 10
        curRatio = index / labelRecordLen
        if curRatio < trainRatio:
            imageCopyPath = os.path.join(absTrainRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            trainTxt.write("{}\t{}".format(imageCopyPath, imageLabel))
        elif curRatio >= trainRatio and curRatio < valRatio:
            imageCopyPath = os.path.join(absValRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            valTxt.write("{}\t{}".format(imageCopyPath, imageLabel))
        else:
            imageCopyPath = os.path.join(absTestRootPath, imageName)
            shutil.copy(imagePath, imageCopyPath)
            testTxt.write("{}\t{}".format(imageCopyPath, imageLabel))


# Remove the file if it exists
def removeFile(path):
    if os.path.exists(path):
        os.remove(path)


def genDetRecTrainVal(args):
    detAbsTrainRootPath = isCreateOrDeleteFolder(args.detRootPath, "train")
    detAbsValRootPath = isCreateOrDeleteFolder(args.detRootPath, "val")
    detAbsTestRootPath = isCreateOrDeleteFolder(args.detRootPath, "test")
    recAbsTrainRootPath = isCreateOrDeleteFolder(args.recRootPath, "train")
    recAbsValRootPath = isCreateOrDeleteFolder(args.recRootPath, "val")
    recAbsTestRootPath = isCreateOrDeleteFolder(args.recRootPath, "test")

    removeFile(os.path.join(args.detRootPath, "train.txt"))
    removeFile(os.path.join(args.detRootPath, "val.txt"))
    removeFile(os.path.join(args.detRootPath, "test.txt"))
    removeFile(os.path.join(args.recRootPath, "train.txt"))
    removeFile(os.path.join(args.recRootPath, "val.txt"))
    removeFile(os.path.join(args.recRootPath, "test.txt"))

    detTrainTxt = open(os.path.join(args.detRootPath, "train.txt"), "a", encoding="UTF-8")
    detValTxt = open(os.path.join(args.detRootPath, "val.txt"), "a", encoding="UTF-8")
    detTestTxt = open(os.path.join(args.detRootPath, "test.txt"), "a", encoding="UTF-8")
    recTrainTxt = open(os.path.join(args.recRootPath, "train.txt"), "a", encoding="UTF-8")
    recValTxt = open(os.path.join(args.recRootPath, "val.txt"), "a", encoding="UTF-8")
    recTestTxt = open(os.path.join(args.recRootPath, "test.txt"), "a", encoding="UTF-8")

    splitTrainVal(args.datasetRootPath, detAbsTrainRootPath, detAbsValRootPath, detAbsTestRootPath,
                  detTrainTxt, detValTxt, detTestTxt, "det")

    for root, dirs, files in os.walk(args.datasetRootPath):
        for dir in dirs:
            if dir == 'crop_img':
                splitTrainVal(root, recAbsTrainRootPath, recAbsValRootPath, recAbsTestRootPath,
                              recTrainTxt, recValTxt, recTestTxt, "rec")
            else:
                continue
        break


if __name__ == "__main__":
    # Splits both the detection and the recognition data into train/val/test sets.
    # Note: adjust the arguments to your own paths and needs. Image data is often labeled
    # in batches by several people, each batch in its own folder labeled with PPOCRLabel,
    # so multiple labeled folders frequently need to be merged and then split.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--trainValTestRatio",
        type=str,
        default="6:2:2",
        help="ratio of trainset:valset:testset")
    parser.add_argument(
        "--datasetRootPath",
        type=str,
        default="../train_data/",
        help="path to the dataset labeled with PPOCRLabel, e.g. dataset folders named 1,2,3...")
    parser.add_argument(
        "--detRootPath",
        type=str,
        default="../train_data/det",
        help="the path where the split detection dataset is placed")
    parser.add_argument(
        "--recRootPath",
        type=str,
        default="../train_data/rec",
        help="the path where the split recognition dataset is placed")
    parser.add_argument(
        "--detLabelFileName",
        type=str,
        default="Label.txt",
        help="the name of the detection annotation file")
    parser.add_argument(
        "--recLabelFileName",
        type=str,
        default="rec_gt.txt",
        help="the name of the recognition annotation file")
    parser.add_argument(
        "--recImageDirName",
        type=str,
        default="crop_img",
        help="the name of the folder holding the cropped recognition images")
    args = parser.parse_args()
    genDetRecTrainVal(args)
python gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --datasetRootPath ../train_data/drivingData
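With the default arguments, the script above leaves a layout like this under train_data (derived from the code):
train_data/
├── det/
│   ├── train/, val/, test/          # copied detection images
│   └── train.txt, val.txt, test.txt # detection labels per split
└── rec/
    ├── train/, val/, test/          # copied crop_img images
    └── train.txt, val.txt, test.txt # recognition labels per split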
The text detection and text recognition datasets are now both ready!
What the subfolders under configs/ in PaddleOCR are for:
- cls: Classification. The text-direction (angle) classifier, which decides whether a cropped text line needs to be rotated before recognition.
- det: Detection. Detection locates and identifies objects or features in an image or video; in text detection, the model finds the category and position of each region.
- e2e: End-to-End. End-to-end models produce the final output directly from the raw input, with no intermediate manual steps or feature engineering.
- kie: Key Information Extraction. Automatically extracting specific information (entities, relations, events, etc.) from text.
- rec: Recognition. Recognition converts input data into readable output, e.g. handwritten-character recognition or speech recognition.
- sr: Super-Resolution. An image-processing technique that produces a high-resolution image from a low-resolution one.
- table: Table-related tasks, such as table structure recognition and table content extraction.
III. Training the Text Detection Model
1. Download the model training files (official training model download).
After downloading, create a pretrain_models folder in the PaddleOCR-release-2.7 root directory and extract the training model into it, as shown:
2. Configure the ppocr detection model file
Find the ch_det_res18_db_v2.0.yml config file under configs/det/ch_ppocr_v2.0/.
Parameters that need changing:
use_gpu: true # set to false if you have no GPU
epoch_num: 50 # total number of training epochs
print_batch_step: 2 # print a training log every N iterations
save_epoch_step: 50 # save a checkpoint every N epochs
save_model_dir: ./output/ch_db_res18/ # where checkpoints are written
pretrained_model: ./pretrain_models/ch_PP-OCRv4_det_train/best_accuracy.pdparams # the training model you just downloaded
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ # train_data path
    label_file_list:
      - ./train_data/det/train.txt # detection training labels
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ # train_data path
    label_file_list:
      - ./train_data/det/val.txt # detection evaluation labels
My own config:
Global:
  use_gpu: true # set to false if you have no GPU
  epoch_num: 50 # total number of training epochs
  log_smooth_window: 20
  print_batch_step: 2 # print a training log every N iterations
  save_model_dir: ./output/ch_db_res18/ # where checkpoints are written
  save_epoch_step: 50 # save a checkpoint every N epochs
  # evaluation is run every 2000 iterations after the 3000th iteration
  eval_batch_step: [3000, 2000]
  cal_metric_during_train: False
  pretrained_model: ./pretrain_models/ch_PP-OCRv4_det_train/best_accuracy.pdparams # the training model you just downloaded
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./output/det_db/predicts_db.txt
Architecture:
  model_type: det
  algorithm: DB
  Transform:
  Backbone:
    name: ResNet_vd
    layers: 18
    disable_se: True
  Neck:
    name: DBFPN
    out_channels: 256
  Head:
    name: DBHead
    k: 50
Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: 'L2'
    factor: 0
PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5
Metric:
  name: DetMetric
  main_indicator: hmean
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ # train_data path
    label_file_list:
      - ./train_data/det/train.txt # detection training labels
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - IaaAugment:
          augmenter_args:
            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
            - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
            - { 'type': Resize, 'args': { 'size': [0.5, 3] } }
      - EastRandomCropData:
          size: [960, 960]
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask'] # the order of the dataloader list
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 2
    num_workers: 2
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ # train_data path
    label_file_list:
      - ./train_data/det/val.txt # detection evaluation labels
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - DetResizeForTest:
          # image_shape: [736, 1280]
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 2
3. Start training
Open an Anaconda prompt, activate the environment, and cd into the PaddleOCR-release-2.7 root directory.
Run the following command to start training:
python tools/train.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml
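If training gets interrupted, it can be resumed from the last checkpoint via Global.checkpoints (the path is given without the .pdparams suffix):
python tools/train.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o Global.checkpoints=./output/ch_db_res18/latest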
4. Test the trained model
Checkpoints are saved under ./output/ch_db_res18/.
Test with best_accuracy.pdparams; if it is not there (training ran for too few iterations for an evaluation to fire), test with latest.pdparams instead.
Run the following in the Anaconda prompt, where Global.pretrained_model is the trained model to test and Global.infer_img is the path of the image to run detection on.
python tools/infer_det.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o Global.pretrained_model=output/ch_db_res18/latest.pdparams Global.infer_img="./test_images/1.jpg"
Check the output:
The detection model is now done. Its validation metrics can also be computed, as shown below, before moving on to training and testing the recognition model.
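A sketch using tools/eval.py with the same config; Global.checkpoints points at the checkpoint to evaluate, and precision, recall, and hmean are reported on the val set:
python tools/eval.py -c configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml -o Global.checkpoints=./output/ch_db_res18/best_accuracy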
IV. Training the Text Recognition Model
1. Modify the recognition model config
Text recognition uses the ch_PP-OCRv3_rec.yml config file.
Find ch_PP-OCRv3_rec.yml under configs/rec/PP-OCRv3/.
The changes mirror those made for text detection.
Parameters that need changing:
use_gpu: true # set to false if you have no GPU
epoch_num: 50 # total number of training epochs
print_batch_step: 2 # print a training log every N iterations
save_epoch_step: 50 # save a checkpoint every N epochs
save_model_dir: ./output/rec_ppocr_v3/ # where checkpoints are written
pretrained_model: ./pretrain_models/ch_PP-OCRv4_rec_train/student.pdparams # path to the recognition training model
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ # train_data path
    label_file_list:
      - ./train_data/rec/train.txt # recognition training labels
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/ # train_data path
    label_file_list:
      - ./train_data/rec/val.txt # recognition evaluation labels
My own config:
Global:
  debug: false
  use_gpu: true
  epoch_num: 50
  log_smooth_window: 20
  print_batch_step: 1
  save_model_dir: ./output/rec_ppocr_v3
  save_epoch_step: 15
  eval_batch_step: [3000, 2000]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/ch_PP-OCRv4_rec_train/student.pdparams # path to the recognition training model
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv4.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
    last_pool_kernel_size: [2, 2]
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length
Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    ext_op_transform_idx: 1
    label_file_list:
      - ./train_data/rec/train.txt # recognition training labels
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - RecConAug:
          prob: 0.5
          ext_data_num: 2
          image_shape: [48, 320, 3]
          max_text_length: *max_text_length
      - RecAug:
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 16
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_list:
      - ./train_data/rec/val.txt # recognition evaluation labels
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - MultiLabelEncode:
      - RecResizeImg:
          image_shape: [3, 48, 320]
      - KeepKeys:
          keep_keys:
            - image
            - label_ctc
            - label_sar
            - length
            - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 16
    num_workers: 8
2. Train the model
Open an Anaconda prompt, activate the environment, cd into the PaddleOCR-release-2.7 root directory, and run the following command to start training.
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml
You may hit this error:
Out of memory error on GPU 0. Cannot allocate 120.000000MB memory on GPU 0, 5.726685GB memory has been allocated and available memory is only 41.250000MB.
Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:86)
Fix:
Lower batch_size_per_card in the Train loader.
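For example, halving the training loader's batch size in ch_PP-OCRv3_rec.yml (the value is per GPU; pick whatever fits your memory):
Train:
  loader:
    batch_size_per_card: 8 # was 16; lower it further if the OOM persists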
With that, recognition model training completes.
3. Test the model
Run the following in the Anaconda prompt, where Global.pretrained_model is the trained model to test and Global.infer_img is the path of the image to recognize.
python tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o Global.pretrained_model=output/rec_ppocr_v3/best_accuracy.pdparams Global.infer_img="./test_images/1.jpg"
Recognition model testing is done.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
V. Converting to an Inference Model
Run the export commands in the Anaconda prompt.
Global.pretrained_model is the trained model to export and Global.save_inference_dir is where the inference model will be saved. An inference model can be called directly for detection and recognition. Export the trained text detection model and text recognition model separately.
python tools/export_model.py -c "./configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml" -o Global.pretrained_model="./output/ch_db_res18/latest.pdparams" Global.save_inference_dir="./inference_model/det/"
The console reports: inference model is saved to ./inference_model/det/inference
python tools/export_model.py -c "./configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml" -o Global.pretrained_model="./output/rec_ppocr_v3/latest.pdparams" Global.save_inference_dir="./inference_model/rec/"
The console reports: inference model is saved to ./inference_model/rec/inference
det and rec are the saved inference models.
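Each export directory should contain the static-graph files produced by export_model.py: inference.pdmodel (the graph), inference.pdiparams (the weights), and inference.pdiparams.info.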
Verify them with predict_system.py. Open an Anaconda prompt and run:
python tools/infer/predict_system.py --image_dir="./test_images/1.jpg" --det_model_dir="./inference_model/det/" --rec_model_dir="./inference_model/rec"
After running the command, the result looks like this:
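The exported models can also be loaded through the paddleocr wheel instead of predict_system.py. A minimal sketch, assuming the wheel is installed and using the paths exported above:
from paddleocr import PaddleOCR

ocr = PaddleOCR(
    det_model_dir="./inference_model/det/",  # our exported detection model
    rec_model_dir="./inference_model/rec/",  # our exported recognition model
    use_angle_cls=False,                     # no angle classifier was trained here
)
print(ocr.ocr("./test_images/1.jpg", cls=False))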
Table recognition is covered in the next article: PP-Structure Document Analysis (CSDN blog).