BetterOCR Open-Source Project Tutorial



BetterOCR: 🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM. Project repository: https://gitcode.com/gh_mirrors/be/BetterOCR

1. Directory Structure and Overview

The BetterOCR project is laid out as follows:

BetterOCR/
├── examples/
│   └── detect_boxes.py
├── src/
│   ├── __init__.py
│   ├── ocr_engines/
│   │   ├── easyocr.py
│   │   ├── tesseract.py
│   │   └── pororo.py
│   └── utils/
│       ├── config.py
│       └── helpers.py
├── tests/
│   └── test_ocr.py
├── README.md
├── setup.py
└── requirements.txt

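Directory overview

examples/: example scripts, such as detect_boxes.py, that show how to use BetterOCR for text detection.
src/: the project's source code.
- __init__.py: initialization file that makes the src directory a Python package.
- ocr_engines/: one module per OCR engine: easyocr.py, tesseract.py, and pororo.py.
- utils/: utility functions and configuration, in helpers.py and config.py.
tests/: test scripts, such as test_ocr.py, that check the OCR functionality.
README.md: the project's documentation.
setup.py: the installation script.
requirements.txt: the list of Python packages the project depends on.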

2. The Entry Script

The project's entry script is examples/detect_boxes.py. It shows how to use BetterOCR for text detection.

Entry script contents

import betterocr

# Text detection example
text = betterocr.detect_text(
    "demo.png",  # input image file
    ["ko", "en"],  # language codes
    context="",  # optional context
    tesseract={
        "config": "--tessdata-dir /tessdata"
    },
    openai={
        "API_KEY": "sk-xxxxxxx"
    }
)

print(text)

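About the entry script

betterocr.detect_text: runs BetterOCR's text detection and returns the detected text.
"demo.png": path to the input image file.
["ko", "en"]: language codes for the text to detect (here Korean and English).
context="": optional context information; left empty here.
tesseract and openai: option dictionaries for Tesseract and OpenAI; here they set the Tesseract data directory and the OpenAI API key.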
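
The entry script hard-codes the API key as a placeholder. As a minimal sketch, assuming the key is kept in an OPENAI_API_KEY environment variable, the same keyword arguments can be assembled at runtime. build_detect_kwargs and run_detection are hypothetical helpers written for this tutorial, not part of BetterOCR; only the argument names are taken from the entry script above.

```python
import os


def build_detect_kwargs(tessdata_dir="/tessdata", context="", env=None):
    """Assemble the keyword arguments used by the entry script.

    Hypothetical helper for illustration; not part of BetterOCR.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail early instead of sending a placeholder key to the API.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "context": context,
        "tesseract": {"config": f"--tessdata-dir {tessdata_dir}"},
        "openai": {"API_KEY": api_key},
    }


def run_detection(image_path, languages):
    """Call detect_text with the assembled options (needs BetterOCR installed)."""
    import betterocr  # imported lazily so the helper above works without it

    return betterocr.detect_text(image_path, languages, **build_detect_kwargs())
```

With BetterOCR installed and the variable exported, run_detection("demo.png", ["ko", "en"]) makes the same call as the entry script without storing the key in the source file.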

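3. The Configuration File

The project's configuration lives mainly in src/utils/config.py, which defines the project's configuration options.

Configuration file contents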

# config.py

import os

class Config:
    TESSERACT_CONFIG = "--tessdata-dir /tessdata"
    OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "sk-xxxxxxx")
    LANGUAGES = ["ko", "en"]
    CONTEXT = ""

config = Config()
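
The Config class above reads the environment once at import time and falls back to a placeholder key. The sketch below is one possible variant, not the project's actual code: it mirrors the same four settings in a dataclass, reads the key when the object is created, and maps the settings onto the arguments of the detect_text call from section 2. The lowercase field names and the from_env and detect_args methods are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Instance-based variant of the tutorial's Config (illustrative only)."""

    tesseract_config: str = "--tessdata-dir /tessdata"
    openai_api_key: str = ""
    languages: tuple = ("ko", "en")
    context: str = ""

    @classmethod
    def from_env(cls, env=None):
        # Read the key when the config is created, not at import time.
        env = os.environ if env is None else env
        return cls(openai_api_key=env.get("OPENAI_API_KEY", ""))

    def detect_args(self, image_path):
        """Return (args, kwargs) shaped like the detect_text call in section 2."""
        if not self.openai_api_key:
            raise ValueError("OPENAI_API_KEY is not set")
        args = (image_path, list(self.languages))
        kwargs = {
            "context": self.context,
            "tesseract": {"config": self.tesseract_config},
            "openai": {"API_KEY": self.openai_api_key},
        }
        return args, kwargs


# A dict stands in for os.environ here; the key is a placeholder.
config = Config.from_env({"OPENAI_API_KEY": "sk-xxxxxxx"})
args, kwargs = config.detect_args("demo.png")
print(args)
print(kwargs["tesseract"])
```

With BetterOCR installed, the result can be passed on as betterocr.detect_text(*args, **kwargs).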