[paddleocr]PP-ChatOCRv4 产线使用教程

最新推荐文章于 2025-07-04 16:07:02 发布

代码终究输给规则

最新推荐文章于 2025-07-04 16:07:02 发布

阅读量630

点赞数 3

CC 4.0 BY-SA版权

分类专栏： OCR 文章标签： java 前端 javascript

本文链接：https://blog.youkuaiyun.com/futureflsl/article/details/148268767

OCR 专栏收录该内容

44 篇文章

订阅专栏


# 此脚本只展示了图片的用例，其他文件类型的调用请查看API参考来调整

import base64
import pprint
import sys
import requests


API_BASE_URL = "http://0.0.0.0:8080"

image_path = "./demo.jpg"
keys = ["姓名"]

with open(image_path, "rb") as file:
    image_bytes = file.read()
    image_data = base64.b64encode(image_bytes).decode("ascii")

payload = {
    "file": image_data,
    "fileType": 1,
}

resp_visual = requests.post(url=f"{API_BASE_URL}/chatocr-visual", json=payload)
if resp_visual.status_code != 200:
    print(
        f"Request to chatocr-visual failed with status code {resp_visual.status_code}."
    )
    pprint.pp(resp_visual.json())
    sys.exit(1)
result_visual = resp_visual.json()["result"]

for i, res in enumerate(result_visual["layoutParsingResults"]):
    print(res["prunedResult"])
    for img_name, img in res["outputImages"].items():
        img_path = f"{img_name}_{i}.jpg"
        with open(img_path, "wb") as f:
            f.write(base64.b64decode(img))
        print(f"Output image saved at {img_path}")

payload = {
    "visualInfo": result_visual["visualInfo"],
}
resp_vector = requests.post(url=f"{API_BASE_URL}/chatocr-vector", json=payload)
if resp_vector.status_code != 200:
    print(
        f"Request to chatocr-vector failed with status code {resp_vector.status_code}."
    )
    pprint.pp(resp_vector.json())
    sys.exit(1)
result_vector = resp_vector.json()["result"]

payload = {
    "image": image_data,
    "keyList": keys,
}
resp_mllm = requests.post(url=f"{API_BASE_URL}/chatocr-mllm", json=payload)
if resp_mllm.status_code != 200:
    print(
        f"Request to chatocr-mllm failed with status code {resp_mllm.status_code}."
    )
    pprint.pp(resp_mllm.json())
    sys.exit(1)
result_mllm = resp_mllm.json()["result"]

payload = {
    "keyList": keys,
    "visualInfo": result_visual["visualInfo"],
    "useVectorRetrieval": True,
    "vectorInfo": result_vector["vectorInfo"],
    "mllmPredictInfo": result_mllm["mllmPredictInfo"],
}
resp_chat = requests.post(url=f"{API_BASE_URL}/chatocr-chat", json=payload)
if resp_chat.status_code != 200:
    print(
        f"Request to chatocr-chat failed with status code {resp_chat.status_code}."
    )
    pprint.pp(resp_chat.json())
    sys.exit(1)
result_chat = resp_chat.json()["result"]
print("Final result:")
print(result_chat["chatResult"])

4. 二次开发¶

如果 PP-ChatOCRv4 产线提供的默认模型权重在您的场景中，精度或速度不满意，您可以尝试利用您自己拥有的特定领域或应用场景的数据对现有模型进行进一步的微调，以提升在您的场景中的识别效果。

4.1 模型微调¶

由于 PP-ChatOCRv4 产线包含若干模块，模型产线的效果如果不及预期，可能来自于其中任何一个模块。您可以对提取效果差的 case 进行分析，通过可视化图像，确定是哪个模块存在问题，并参考以下表格中对应的微调教程链接进行模型微调。

情形	微调模块	微调参考链接
版面区域检测不准，如印章、表格未检出等	版面区域检测模块	链接
表格结构识别不准	表格结构识别	链接
印章文本存在漏检	印章文本检测模块	链接
文本存在漏检	文本检测模块	链接
文本内容都不准	文本识别模块	链接
垂直或者旋转文本行矫正不准	文本行方向分类模块	链接
整图旋转矫正不准	文档图像方向分类模块	链接
图像扭曲矫正不准	文本图像矫正模块	暂不支持微调

4.2 模型应用¶

当您使用私有数据集完成微调训练后，可获得本地模型权重文件，然后可以通过自定义产线配置文件的方式，使用微调后的模型权重。

获取产线配置文件

可调用 PaddleOCR 中 PPChatOCRv4 产线对象的 export_paddlex_config_to_yaml 方法，将当前产线配置导出为 YAML 文件：

from paddleocr import PPChatOCRv4

pipeline = PPChatOCRv4()
pipeline.export_paddlex_config_to_yaml("PP-ChatOCRv4.yaml")

修改配置文件

在得到默认的产线配置文件后，将微调后模型权重的本地路径替换至产线配置文件中的对应位置即可。例如

......
SubModules:
    TextDetection:
    module_name: text_detection
    model_name: PP-OCRv5_server_det
    model_dir: null # 替换为微调后的文本检测模型权重路径
    limit_side_len: 960
    limit_type: max
    thresh: 0.3
    box_thresh: 0.6
    unclip_ratio: 1.5

    TextRecognition:
    module_name: text_recognition
    model_name: PP-OCRv5_server_rec
    model_dir: null # 替换为微调后的文本检测模型权重路径
    batch_size: 1
            score_thresh: 0
......

在产线配置文件中，不仅包含 PaddleOCR CLI 和 Python API 支持的参数，还可进行更多高级配置，具体信息可在 PaddleX模型产线使用概览中找到对应的产线使用教程，参考其中的详细说明，根据需求调整各项配置。

在 CLI 中加载产线配置文件

在修改完成配置文件后，通过命令行的 --paddlex_config 参数指定修改后的产线配置文件的路径，PaddleOCR 会读取其中的内容作为产线配置。示例如下：

paddleocr pp_chatocrv4_doc --paddlex_config PP-ChatOCRv4.yaml ...

在 Python API 中加载产线配置文件

初始化产线对象时，可通过 paddlex_config 参数传入 PaddleX 产线配置文件路径或配置字典，PaddleOCR 会读取其中的内容作为产线配置。示例如下：

from paddleocr import PPChatOCRv4

pipeline = PPChatOCRv4(paddlex_config="PP-ChatOCRv4.yaml")