Umi-OCR HTTP API Explained: A RESTful API Integration Tutorial - CSDN Blog

Umi-OCR is a free, open-source offline OCR application for Windows that supports screenshot OCR, batch OCR, and QR code recognition. Project page: https://gitcode.com/GitHub_Trending/um/Umi-OCR

Overview

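Umi-OCR is a free, open-source offline OCR application. It includes an HTTP service that lets developers integrate its OCR capabilities into their own applications through RESTful API calls. This article walks through Umi-OCR's HTTP endpoints and provides a complete integration guide.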

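Basic configuration

Enabling the HTTP service

Before using the HTTP API, make sure the HTTP service is enabled in Umi-OCR's global settings:

Open Umi-OCR
Go to the "Global Settings" page
Check "Advanced" to show all settings
Make sure the HTTP service is enabled (it is on by default)
Optionally, select "Any available address" to allow access from the local network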

Default configuration

Default port: 1224
Default address: 127.0.0.1
Protocol: HTTP
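With these defaults, a simple way to confirm that the service is reachable is to request the option list from `GET /api/ocr/get_options` (described in the next section). The following is a minimal sketch using only the Python standard library; the helper name `is_umi_ocr_running` and the 3-second timeout are illustrative choices, not part of Umi-OCR.

```python
import http.client
import json
import urllib.request


def is_umi_ocr_running(host="127.0.0.1", port=1224, timeout=3.0):
    """Return True if an Umi-OCR HTTP service answers GET /api/ocr/get_options (hypothetical helper)."""
    url = f"http://{host}:{port}/api/ocr/get_options"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # A running service answers with a JSON object describing the OCR options
            return isinstance(json.load(response), dict)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts and HTTP error statuses
        # (URLError/HTTPError); ValueError covers a reply that is not valid JSON;
        # HTTPException covers a peer that does not speak HTTP at all
        return False
```

Calling `is_umi_ocr_running()` before sending OCR requests turns an opaque "connection refused" error into an explicit check.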

Core endpoints

1. Image OCR

1.1 Querying OCR options

GET /api/ocr/get_options

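This endpoint returns the definitions of the OCR parameters as JSON. Each key is a parameter name; its value gives the parameter's title, type, and default value, plus the allowed values (`optionsList`) for enum parameters.

Example response: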

{
    "ocr.language": {
        "title": "Language/model library",
        "optionsList": [
            ["models/config_chinese.txt","Simplified Chinese"],
            ["models/config_en.txt","English"]
        ],
        "type": "enum",
        "default": "models/config_chinese.txt"
    },
    "ocr.cls": {
        "title": "Correct text orientation",
        "default": false,
        "type": "boolean"
    }
}
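Because every parameter carries its `default`, and enum parameters also carry an `optionsList` of `[value, display name]` pairs, the response can be reduced to a table of defaults and allowed values. A minimal sketch, assuming the response has the shape shown above; the helper name `summarize_options` is hypothetical.

```python
def summarize_options(options):
    """Map each parameter name to (default value, list of allowed values or None)."""
    summary = {}
    for name, spec in options.items():
        allowed = None
        if spec.get("type") == "enum":
            # optionsList holds [value, display name] pairs; keep only the values
            allowed = [value for value, _label in spec.get("optionsList", [])]
        summary[name] = (spec.get("default"), allowed)
    return summary
```

For the example response above, `summarize_options(...)["ocr.cls"]` is `(False, None)`, since boolean parameters have no option list.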

1.2 OCR on a Base64-encoded image

POST /api/ocr
Content-Type: application/json

Request body:

{
    "base64": "iVBORw0KGgoAAAAN...",
    "options": {
        "data.format": "text",
        "ocr.language": "models/config_chinese.txt"
    }
}

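Response (`code` 100 means success, 101 means no text was found, and any other code indicates a failure):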

{
    "code": 100,
    "data": "recognized text",
    "time": 0.5,
    "timestamp": 1711521012.625574
}
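A client therefore has to tell three outcomes apart: success (100), no text (101), and failure (any other code, with the error message in `data`, as the JavaScript example later in this article also assumes). A minimal sketch of building the request body and interpreting the response; the helper names are hypothetical, and `text_from_response` expects the `"data.format": "text"` output.

```python
import base64


def build_ocr_request(image_bytes, options=None):
    """Build the JSON body for POST /api/ocr from raw image bytes."""
    return {
        "base64": base64.b64encode(image_bytes).decode("ascii"),
        # Ask for a plain string unless the caller overrides data.format
        "options": {"data.format": "text", **(options or {})},
    }


def text_from_response(response):
    """Return the recognized text, "" when no text was found, or raise on any other code."""
    code = response.get("code")
    if code == 100:
        return response["data"]
    if code == 101:
        return ""
    raise RuntimeError(f"Umi-OCR error {code}: {response.get('data')}")
```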

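2. Document OCR (PDF, EPUB, etc.)

Document recognition is a multi-step, asynchronous process that also supports batch processing of document files.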

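Recognition flow:

Upload the document to `/api/doc/upload` and receive a task ID
Poll `/api/doc/result` until `is_done` is true
Call `/api/doc/download` with the desired file types to generate the output file and get its download URL
Download the file from that URL
Clear the task with `/api/doc/clear/<task ID>`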

2.1 Uploading a document

POST /api/doc/upload
Content-Type: multipart/form-data

Form fields:

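file: the document file
json: recognition options, serialized as a JSON string (for example {"doc.extractionMode": "mixed"})
On success, the `data` field of the response contains the task ID.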
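The `json` field is an ordinary form field whose value is the options object serialized to a string. To show what that looks like on the wire, here is a minimal sketch that encodes both fields as `multipart/form-data` using only the standard library; the helper name `build_upload_body` and the random-boundary scheme are illustrative, and HTTP libraries such as `requests` (used in the Python examples later in this article) perform this encoding for you.

```python
import json
import uuid


def build_upload_body(filename, file_bytes, options):
    """Encode the `file` and `json` form fields; returns (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    parts = [
        # Field 1: the document itself
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n",
        # Field 2: the recognition options as a JSON string
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="json"\r\n\r\n'
            f"{json.dumps(options)}\r\n"
        ).encode("utf-8"),
        # Closing boundary
        f"--{boundary}--\r\n".encode("utf-8"),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The result can be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": content_type})`.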

2.2 Querying task status and results

POST /api/doc/result
Content-Type: application/json

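Request body (`is_data` controls whether the recognized content is returned or only the task status, and `format` sets the format of that content; the `is_done` field of the response becomes true once the task has finished):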

{
    "id": "task ID",
    "is_data": true,
    "format": "text"
}
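Because `is_done` only turns true when the task has finished, clients poll this endpoint. A minimal polling sketch with an overall deadline; to keep it independent of any HTTP library, the status request is passed in as `fetch_status`, a hypothetical callable that posts `{"id": ..., "is_data": false}` and returns the parsed JSON. The 2-second interval and 600-second limit are arbitrary defaults.

```python
import time


def wait_until_done(fetch_status, interval=2.0, timeout=600.0):
    """Call fetch_status() until the task reports is_done, then return that status."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("code") != 100:
            raise RuntimeError(f"Status query failed: {status}")
        if status.get("is_done"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task not finished after {timeout} seconds")
        time.sleep(interval)
```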

3. QR codes

3.1 Recognizing QR codes

POST /api/qrcode
Content-Type: application/json

Request body:

{
    "base64": "Base64-encoded image",
    "options": {
        "preprocessing.sharpness_factor": 1.0
    }
}
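The response format is not shown above. A minimal sketch, assuming that a successful response (`code` 100) returns in `data` a list of results that each have a `text` field, and that 101 means no code was found (mirroring the OCR endpoint); verify both against the actual responses of your Umi-OCR version. The helper names are hypothetical.

```python
import base64


def build_qrcode_request(image_bytes, sharpness_factor=1.0):
    """Build the JSON body for QR code recognition via POST /api/qrcode."""
    return {
        "base64": base64.b64encode(image_bytes).decode("ascii"),
        "options": {"preprocessing.sharpness_factor": sharpness_factor},
    }


def decoded_texts(response):
    """Return the decoded strings from a recognition response."""
    if response.get("code") == 100:
        # Assumption: data is a list of dicts, each with a "text" key
        return [item["text"] for item in response["data"]]
    if response.get("code") == 101:
        return []  # Assumption: 101 means no code was found in the image
    raise RuntimeError(f"QR code recognition failed: {response}")
```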

3.2 Generating QR codes

POST /api/qrcode
Content-Type: application/json

Request body (generation uses the same endpoint; a `text` field takes the place of `base64`):

{
    "text": "text to encode",
    "options": {
        "format": "QRCode",
        "w": 200,
        "h": 200
    }
}
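The generated image comes back Base64-encoded in `data` (the JavaScript example later in this article makes the same assumption). A minimal sketch that builds the request and decodes the image bytes; the helper names are hypothetical, and stripping a `data:` URL prefix is purely defensive in case a version returns one.

```python
import base64


def build_qrcode_generation_request(text, size=200, code_format="QRCode"):
    """Build the JSON body that asks POST /api/qrcode to generate a code image."""
    return {"text": text, "options": {"format": code_format, "w": size, "h": size}}


def decode_generated_code(response):
    """Return the image bytes from a generation response, raising on failure."""
    if response.get("code") != 100:
        raise RuntimeError(f"QR code generation failed: {response}")
    data = response["data"]
    # Defensive: strip a data-URL prefix ("data:image/...;base64,") if one is present
    if data.startswith("data:"):
        data = data.split(",", 1)[1]
    return base64.b64decode(data)
```

The bytes can then be written with `open("qrcode.png", "wb").write(...)`; the `.png` extension is an assumption, so check the actual image format first.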

Parameter reference

OCR parameters

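Parameter	Type	Default	Description
`ocr.language`	enum	"models/config_chinese.txt"	Language/model library
`ocr.cls`	boolean	false	Correct text orientation
`ocr.limit_side_len`	enum	960	Maximum image side length; larger images are scaled down
`tbpu.parser`	enum	"multi_para"	Layout parsing scheme
`data.format`	enum	"dict"	Format of the returned data ("dict" or "text")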
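Since a misspelled option name (for example `data_format` instead of `data.format`) is easy to miss, options can be checked on the client against the definitions returned by `GET /api/ocr/get_options`. A minimal sketch, assuming definitions shaped like the example response in section 1.1; the helper name `validate_options` is hypothetical and this check is not something Umi-OCR performs itself.

```python
def validate_options(options, definitions):
    """Return a list of problems found in `options`, checked against get_options definitions."""
    problems = []
    for name, value in options.items():
        spec = definitions.get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
        elif spec.get("type") == "boolean" and not isinstance(value, bool):
            problems.append(f"{name} must be true or false")
        elif spec.get("type") == "enum":
            # optionsList holds [value, display name] pairs
            allowed = [v for v, _label in spec.get("optionsList", [])]
            if value not in allowed:
                problems.append(f"{name} must be one of {allowed}")
    return problems
```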

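Layout parsing options

The available values of `tbpu.parser` (default `multi_para`) are listed in its `optionsList` in the response of `GET /api/ocr/get_options`.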

Code examples

Python integration examples

1. Image OCR

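import base64

import requests


class UmiOCRClient:
    def __init__(self, host="127.0.0.1", port=1224, timeout=60):
        self.base_url = f"http://{host}:{port}"
        self.timeout = timeout  # seconds to wait for the server's response

    def image_to_text(self, image_path, options=None):
        """Run OCR on a local image file and return the parsed JSON response."""
        # Read the image and encode it as Base64
        with open(image_path, "rb") as image_file:
            image_data = base64.b64encode(image_file.read()).decode("utf-8")

        # Build the request body; by default ask for plain-text output
        data = {
            "base64": image_data,
            "options": options or {"data.format": "text"},
        }

        # Send the request (json= also sets the Content-Type header)
        response = requests.post(
            f"{self.base_url}/api/ocr",
            json=data,
            timeout=self.timeout,
        )
        response.raise_for_status()
        return response.json()


# Usage example
client = UmiOCRClient()
result = client.image_to_text("test.png")
if result["code"] == 100:
    print(result["data"])
elif result["code"] == 101:
    print("No text found in the image")
else:
    print("OCR failed:", result["data"])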
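With the default `"data.format": "dict"`, `data` is a list of text blocks rather than a single string. A minimal sketch for flattening it, assuming each block is a dict with a `text` key and an optional `end` key holding the separator that follows it; verify these field names against the actual response of your Umi-OCR version. The helper name `join_text_blocks` is hypothetical.

```python
def join_text_blocks(data):
    """Join the text blocks of a "data.format": "dict" result into one string."""
    # Assumption: each block has a "text" key and optionally an "end" separator;
    # fall back to a newline when "end" is missing
    joined = "".join(block["text"] + block.get("end", "\n") for block in data)
    return joined.rstrip("\n")
```

For example, `join_text_blocks(client.image_to_text("test.png", {"data.format": "dict"})["data"])` yields a single string when the call succeeds.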

2. Batch document processing

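import json
import os
import time
from urllib.parse import unquote, urlparse

import requests

REQUEST_TIMEOUT = 120  # seconds to wait for each HTTP response


def process_document(self, file_path, output_format="pdfLayered", output_dir="."):
    """Recognize a document file and download the result; returns the output path."""
    # 1. Upload the file and obtain a task ID
    with open(file_path, "rb") as file:
        files = {"file": file}
        data = {"json": json.dumps({"doc.extractionMode": "mixed"})}
        upload = requests.post(
            f"{self.base_url}/api/doc/upload",
            files=files,
            data=data,
            timeout=REQUEST_TIMEOUT,
        ).json()
    if upload["code"] != 100:
        raise RuntimeError(f"Upload failed: {upload}")
    task_id = upload["data"]

    try:
        # 2. Poll the task status until recognition has finished
        while True:
            status = requests.post(
                f"{self.base_url}/api/doc/result",
                json={"id": task_id, "is_data": False},
                timeout=REQUEST_TIMEOUT,
            ).json()
            if status["code"] != 100:
                raise RuntimeError(f"Status query failed: {status}")
            if status["is_done"]:
                break
            time.sleep(2)

        # 3. Generate the target file and get its download URL
        download = requests.post(
            f"{self.base_url}/api/doc/download",
            json={"id": task_id, "file_types": [output_format]},
            timeout=REQUEST_TIMEOUT,
        ).json()
        if download["code"] != 100:
            raise RuntimeError(f"Generating the output failed: {download}")
        download_url = download["data"]

        # 4. Stream the file to disk, named after the last segment of the URL
        #    (so non-PDF output is not saved under a hard-coded .pdf name)
        file_name = unquote(os.path.basename(urlparse(download_url).path))
        output_path = os.path.join(output_dir, file_name)
        with requests.get(download_url, stream=True, timeout=REQUEST_TIMEOUT) as file_response:
            file_response.raise_for_status()
            with open(output_path, "wb") as f:
                for chunk in file_response.iter_content(chunk_size=8192):
                    f.write(chunk)
    finally:
        # 5. Clear the task on the server, even if an earlier step failed
        requests.get(f"{self.base_url}/api/doc/clear/{task_id}", timeout=REQUEST_TIMEOUT)

    return output_path


# Attach the function as a method of the UmiOCRClient class from the previous example
UmiOCRClient.process_document = process_document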

JavaScript integration examples

Front-end OCR integration

class UmiOCRWebClient {
    constructor(baseUrl = 'http://127.0.0.1:1224') {
        this.baseUrl = baseUrl;
    }

    // Convert an image File/Blob to a Base64 string
    async fileToBase64(file) {
        return new Promise((resolve, reject) => {
            const reader = new FileReader();
            reader.onload = () => {
                // Strip the "data:image/...;base64," prefix
                const base64 = reader.result.split(',')[1];
                resolve(base64);
            };
            reader.onerror = reject;
            reader.readAsDataURL(file);
        });
    }

    // Run OCR and return the recognized text ('' if none was found)
    async recognizeImage(file, options = {}) {
        try {
            const base64 = await this.fileToBase64(file);

            const response = await fetch(`${this.baseUrl}/api/ocr`, {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({
                    base64: base64,
                    options: {
                        'data.format': 'text', // the option name contains a dot
                        ...options
                    }
                })
            });

            const result = await response.json();

            if (result.code === 100) {
                return result.data;
            }
            if (result.code === 101) {
                return ''; // the image contains no text
            }
            throw new Error(`OCR failed: ${result.data}`);
        } catch (error) {
            console.error('OCR error:', error);
            throw error;
        }
    }

    // Generate a QR code
    async generateQRCode(text, options = {}) {
        const response = await fetch(`${this.baseUrl}/api/qrcode`, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                text: text,
                options: {
                    format: 'QRCode',
                    ...options
                }
            })
        });

        const result = await response.json();
        if (result.code !== 100) {
            throw new Error(`QR code generation failed: ${result.data}`);
        }
        return result.data; // QR code image encoded as Base64
    }
}

// Usage example
const ocrClient = new UmiOCRWebClient();

// Handle file selection
document.getElementById('fileInput').addEventListener('change', async (event) => {
    const file = event.target.files[0];
    if (file) {
        try {
            const text = await ocrClient.recognizeImage(file);
            document.getElementById('result').textContent = text;
        } catch (error) {
            alert('Recognition failed: ' + error.message);
        }
    }
});

Error handling and best practices

Error codes

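Code	Meaning	Suggested action
100	Success	-
101	No text	Check that the image actually contains text
200-299	QR code errors	Check the image quality and the code format
900-999	System errors	Restart Umi-OCR or check its logs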
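In client code, the table above can be mirrored by a small lookup that turns a response code into the suggested action. A minimal sketch; the code ranges are taken from this table, and the helper name `describe_code` is hypothetical.

```python
def describe_code(code):
    """Map an Umi-OCR response code to the suggested action from the error-code table."""
    if code == 100:
        return "success"
    if code == 101:
        return "no text: check that the image actually contains text"
    if 200 <= code <= 299:
        return "QR code error: check the image quality and the code format"
    if 900 <= code <= 999:
        return "system error: restart Umi-OCR or check its logs"
    return "unknown error: inspect the message in the response's data field"
```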

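Performance tips

Image preprocessing: for large images, use the ocr.limit_side_len parameter to scale them down
Batch processing: use the document endpoints for multi-page files to reduce the number of HTTP requests
Connection reuse: keep HTTP connections alive (for example with a requests.Session) instead of opening a new one per request
Asynchronous processing: for long-running jobs, submit the task and poll its status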

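Security notes

By default the HTTP service listens only on 127.0.0.1. Selecting "Any available address" makes it reachable from other machines on the local network, so enable that option only on trusted networks.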

Advanced use cases

1. Automated document processing pipeline

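import asyncio
import os


class DocumentProcessingPipeline:
    def __init__(self, ocr_client):
        self.client = ocr_client  # a UmiOCRClient that provides process_document()

    async def process_folder(self, folder_path, output_format="txt"):
        """Process every PDF, EPUB and XPS document in a folder."""
        results = []

        for filename in sorted(os.listdir(folder_path)):
            if filename.lower().endswith((".pdf", ".epub", ".xps")):
                file_path = os.path.join(folder_path, filename)
                try:
                    # process_document() blocks, so run it in a worker thread
                    # (asyncio.to_thread needs Python 3.9+); files are handled one at a time
                    result_file = await asyncio.to_thread(
                        self.client.process_document,
                        file_path,
                        output_format,
                    )
                    results.append({"filename": filename, "status": "success", "output": result_file})
                except Exception as e:
                    results.append({"filename": filename, "status": "error", "error": str(e)})

        return results


# Usage example
# results = asyncio.run(DocumentProcessingPipeline(UmiOCRClient()).process_folder("docs"))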

2. Real-time OCR integration

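// Real-time OCR service for screenshots in the clipboard
class RealTimeOCRService {
    constructor() {
        this.umiClient = new UmiOCRWebClient();
        this.isProcessing = false;
    }

    // Read an image from the clipboard and recognize it
    // (the async Clipboard API needs a secure context and the user's permission)
    async recognizeFromClipboard() {
        try {
            const items = await navigator.clipboard.read();
            for (const item of items) {
                for (const type of item.types) {
                    if (type.startsWith('image/')) {
                        const blob = await item.getType(type);
                        return await this.umiClient.recognizeImage(blob);
                    }
                }
            }
            throw new Error('No image in the clipboard');
        } catch (error) {
            console.error('Clipboard access error:', error);
            throw error;
        }
    }

    // Recognize images pasted into the page (Ctrl+V)
    startClipboardMonitoring() {
        document.addEventListener('paste', async (event) => {
            if (this.isProcessing) return;

            // Take the image straight from the paste event instead of re-reading the clipboard
            const imageItem = Array.from(event.clipboardData?.items || [])
                .find((item) => item.type.startsWith('image/'));
            if (!imageItem) return;

            this.isProcessing = true;
            try {
                const text = await this.umiClient.recognizeImage(imageItem.getAsFile());
                this.onTextRecognized(text);
            } catch (error) {
                console.error('OCR error:', error);
            } finally {
                this.isProcessing = false;
            }
        });
    }

    onTextRecognized(text) {
        // Handle the recognition result
        console.log('Recognized text:', text);
        // e.g. fill an input field, translate it, or save it
    }
}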

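FAQ

Q1: API calls fail with a "connection refused" error

A: Check that Umi-OCR is running, that its HTTP service is enabled, and that no firewall is blocking the connection.

Q2: The recognized text is garbled

A: Make sure the correct language model is selected, check the image quality, and try adjusting the ocr.limit_side_len parameter.

Q3: Requests time out on large files

A: Increase the client's timeout, or use the document endpoints, which process files asynchronously (upload, then poll for the result).

Q4: How can I improve recognition accuracy?

Use high-quality input images
Choose the appropriate language model
Tune the image preprocessing parameters
Use layout parsing (tbpu.parser) to improve the structure of the result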

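Summary

Umi-OCR's HTTP API offers flexible OCR integration covering image recognition, document processing, and QR code recognition and generation. With the explanations and code examples in this article, developers can quickly add OCR to their own applications.

Key advantages:

✅ Fully offline, so data stays on your machine
✅ Supports multiple file formats
✅ Flexible parameter configuration
✅ Explicit status codes for error handling
✅ Callable over HTTP from any language or platform

Whether you are building a desktop application, a web service, or a mobile app, Umi-OCR's HTTP API can add OCR capabilities to your project.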

Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.