Image Text Recognition (OCR) in Java

Author: 种棵二叉树

Published 2025-07-01 15:45:07


License: CC 4.0 BY-SA

Column: Utility Methods · Tags: java, paddle, ocr

Original link: https://blog.youkuaiyun.com/qq_57748030/article/details/149039809


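Problem: Many backend server applications are written in Java, and they regularly need OCR (text recognition) features.

On a recent project, I ran into a business feature built on recognizing [ID card] images.

Here are the steps I took to solve it…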

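I. Requirements Analysis

> A monolithic project on JDK 1.8 and Spring Boot 2.6.13: after a user uploads a photo of an ID card, the backend must recognize basic information such as the name and ID number and return it to the frontend. Requirements: fast recognition, high text accuracy, and low cost.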
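Whichever engine is chosen, the ID number still has to be pulled out of the raw recognized text. Below is a minimal sketch, assuming the OCR result is available as a plain `String`; the class name `IdNumberExtractor` is hypothetical. It applies the check-character rule for 18-character mainland ID numbers (GB 11643-1999, ISO 7064 MOD 11-2), which detects any single misread digit. Name extraction depends on the card layout and is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds 18-character ID numbers in raw OCR text and keeps
 * only those whose last character is a valid ISO 7064 MOD 11-2 check character.
 */
public final class IdNumberExtractor {

    // 17 digits followed by a digit or X/x; lookarounds avoid matching inside a longer digit run
    private static final Pattern CANDIDATE = Pattern.compile("(?<!\\d)(\\d{17}[\\dXx])(?!\\d)");

    // Weights for positions 1..17; check character is CHECK_CODES[sum % 11]
    private static final int[] WEIGHTS = {7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2};
    private static final String CHECK_CODES = "10X98765432";

    private IdNumberExtractor() {
    }

    /** Returns true if the 18-character number has a correct check character. */
    public static boolean isValid(String id) {
        if (id == null || id.length() != 18) {
            return false;
        }
        int sum = 0;
        for (int i = 0; i < 17; i++) {
            char c = id.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
            sum += (c - '0') * WEIGHTS[i];
        }
        char expected = CHECK_CODES.charAt(sum % 11);
        return Character.toUpperCase(id.charAt(17)) == expected;
    }

    /** Returns every checksum-valid ID number found in the OCR output. */
    public static List<String> extract(String ocrText) {
        List<String> result = new ArrayList<>();
        // OCR often splits long numbers with spaces, so strip all whitespace first
        String compact = ocrText.replaceAll("\\s+", "");
        Matcher m = CANDIDATE.matcher(compact);
        while (m.find()) {
            String id = m.group(1).toUpperCase();
            if (isValid(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

A number that fails the check is a strong hint to retry recognition (for example, at another rotation angle) rather than return it to the frontend.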

II. Solutions

> Through web searches and asking AI, I came up with the following options.

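1. OCR cloud services

Characteristics:

Based on cloud APIs such as Alibaba Cloud OCR, Baidu OCR, and Tencent OCR; supports high-accuracy recognition across many languages, tables, handwriting, and other scenarios.
Requires network requests and is billed by the provider (per call or monthly plan); suited to enterprise applications.
Java integration usually means calling a RESTful API over HTTP and handling the JSON response.

Pros: high accuracy, rich features (e.g., dedicated ID card recognition), low maintenance cost.
Cons: network latency, data privacy risk, and higher cost over long-term use.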
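The "call a RESTful API and handle JSON" pattern can be sketched with plain JDK 8 classes. This is a generic illustration, not any provider's real API: the endpoint URL, the `image` JSON field, and the bearer-token header are all assumptions, and each vendor defines its own request format, signing, and response schema. The class name `CloudOcrClient` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical client for a generic cloud OCR REST endpoint (JDK 8 compatible).
 * Endpoint, JSON field name and auth header are assumptions, not a real provider's API.
 */
public final class CloudOcrClient {

    private final String endpoint;
    private final String token;

    public CloudOcrClient(String endpoint, String token) {
        this.endpoint = endpoint;
        this.token = token;
    }

    /** Builds a JSON body carrying the image as Base64 (Base64 output needs no JSON escaping). */
    public static String buildRequestBody(byte[] imageBytes) {
        String b64 = Base64.getEncoder().encodeToString(imageBytes);
        return "{\"image\":\"" + b64 + "\"}";
    }

    /** POSTs the image and returns the raw JSON response for a JSON library to parse. */
    public String recognize(byte[] imageBytes) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(15000);
            conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            conn.setRequestProperty("Authorization", "Bearer " + token);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(buildRequestBody(imageBytes).getBytes(StandardCharsets.UTF_8));
            }
            int status = conn.getResponseCode();
            InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
            String body = in == null ? "" : readAll(in);
            if (status >= 400) {
                throw new IOException("OCR request failed, HTTP " + status + ": " + body);
            }
            return body;
        } finally {
            conn.disconnect();
        }
    }

    private static String readAll(InputStream in) throws IOException {
        try (InputStream src = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = src.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Usage would look like `new CloudOcrClient(url, token).recognize(Files.readAllBytes(path))`; in practice, the vendor's official Java SDK usually handles request signing for you.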

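2. Large AI models (e.g., GPT-4, Claude)

Characteristics:

Multimodal models can parse image content directly, without a traditional OCR step; suited to complex scenarios (e.g., understanding the context of an image).
Requires calling an API (e.g., OpenAI); Java interacts via HTTP requests, and the response is structured text.
Costly, and may be weaker than dedicated OCR at fine image details (such as small text).

Pros: strong semantic understanding; can handle unstructured content.
Cons: slow responses and high prices; suited to specific needs.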

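3. Tesseract (open-source OCR engine)

Characteristics:

A locally run open-source engine; Java calls it through JNA or the Tess4J wrapper. Supports training for multiple languages.
You must handle preprocessing yourself (binarization, denoising) to improve accuracy; suited to embedded or offline scenarios.
Free, but recognition of complex layouts (e.g., tables) is poor.

Pros: no network required, good data privacy, flexible customization.
Cons: complex configuration, accuracy depends on tuning, high maintenance cost.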

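4. PaddleOCR (Baidu's open-source framework)

Characteristics:

A lightweight open-source solution with high-accuracy Chinese and English recognition; provides a Java inference interface (e.g., Paddle Inference).
Deployed locally and usable offline; combined with PP-Structure it can process tables and documents.
Requires loading models (a few tens of MB); performs better than Tesseract.

Pros: balances accuracy and speed; active community with continuous updates.
Cons: relatively high local resource usage; initial deployment is fairly complex.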

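Option	Accuracy	Speed	Cost	Use case
OCR cloud service	★★★★★	★★★☆	Pay per use	Enterprise, multi-format needs
Large AI model	★★★★☆	★★☆	Expensive	Semantic understanding, unstructured content
Tesseract	★★★☆	★★★☆	Free	Offline, simple documents
PaddleOCR	★★★★☆	★★★★	Free	Local deployment, mixed Chinese/English text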

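Recommendations:

For a fast launch, choose an OCR cloud service;
For privacy or offline needs, choose PaddleOCR;
For simple needs, choose Tesseract;
For complex parsing, choose a large AI model.

My choice:

Third-party OCR cloud services charge per use; they are the preferred option if you are willing to pay.
AI cloud services charge per call, which costs money, and local deployment requires suitable hardware; using AI just to recognize text is overkill.
Tesseract runs locally and is open source and free; many projects choose Tesseract.
Integrating Paddle with Java is more work, but its recognition seemed better to me. Paddle is the option I ultimately chose this time.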

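III. Implementation

Preprocessing:

Preprocessing an image before recognition can improve the recognition rate to some extent, for example through denoising, binarization, and rotation correction.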
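Neither utility class below implements denoising, so here is a minimal sketch of one common choice: a 3×3 median filter on a grayscale image, which removes isolated specks while keeping stroke edges sharper than a blur would. The class name `MedianDenoise` is hypothetical, and the filter is an assumed technique, not the author's method; it expects a `TYPE_BYTE_GRAY` image.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import java.util.Arrays;

/** Hypothetical denoising helper: 3x3 median filter for TYPE_BYTE_GRAY images. */
public final class MedianDenoise {

    private MedianDenoise() {
    }

    /** Border pixels use only the neighbours that exist (4 or 6 values instead of 9). */
    public static BufferedImage median3x3(BufferedImage gray) {
        int w = gray.getWidth();
        int h = gray.getHeight();
        Raster src = gray.getRaster();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster dst = out.getRaster();
        int[] window = new int[9];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Collect the 3x3 neighbourhood that lies inside the image
                int n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int nx = x + dx;
                        int ny = y + dy;
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h) {
                            window[n++] = src.getSample(nx, ny, 0);
                        }
                    }
                }
                // The median replaces the centre pixel, so a single dark speck disappears
                Arrays.sort(window, 0, n);
                dst.setSample(x, y, 0, window[n / 2]);
            }
        }
        return out;
    }
}
```

In the Java 2D pipeline, this would sit between the grayscale and binarization steps; OpenCV users could call `Imgproc.medianBlur` instead.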

Clear images may need no special processing, but they may still need to be rotated.

Commonly used angles: ±5°, ±10°, ±90°, and 180°.
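When the correct orientation is unknown, one simple approach is to try each common angle, run recognition, and keep the best-scoring result. The sketch below assumes you supply two functions: one that rotates and recognizes at a given angle (for example, with the rotation utilities below plus an OCR engine), and one that scores the result (for example, engine confidence or the number of expected fields found). The class `AngleSearch` and its members are hypothetical.

```java
import java.util.function.DoubleFunction;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper: tries candidate angles and keeps the highest-scoring OCR result. */
public final class AngleSearch {

    /** Common correction angles in degrees; 0 comes first so an upright image wins ties. */
    public static final double[] COMMON_ANGLES = {0, 5, -5, 10, -10, 90, -90, 180};

    private AngleSearch() {
    }

    /** The winning angle, the recognizer's output for it, and its score. */
    public static final class Best<T> {
        public final double angle;
        public final T result;
        public final double score;

        Best(double angle, T result, double score) {
            this.angle = angle;
            this.result = result;
            this.score = score;
        }
    }

    /** Returns null if angles is empty. */
    public static <T> Best<T> findBest(double[] angles,
                                       DoubleFunction<T> recognizeAtAngle,
                                       ToDoubleFunction<T> score) {
        Best<T> best = null;
        for (double angle : angles) {
            T result = recognizeAtAngle.apply(angle);
            double s = score.applyAsDouble(result);
            // Strict '>' keeps the earlier angle when scores tie
            if (best == null || s > best.score) {
                best = new Best<>(angle, result, s);
            }
        }
        return best;
    }
}
```

Each candidate costs one full OCR pass, so it can pay to stop early once a score is clearly good enough, or to try 0° alone first.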

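1. OpenCV

Follow the blog post [Using OpenCV in Java] to download and configure OpenCV; I used version 4.0.0.

An OpenCV image utility class for rotation, grayscale conversion, and binarization; add other processing as needed.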

import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.springframework.lang.NonNull;

import java.io.FileNotFoundException;

/** OpenCV utility class: image rotation and basic OCR preprocessing. */
public class OpencvUtil {
    static {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
    }

    /**
     * Rotates an image and saves it.
     *
     * @param inputPath  input image path
     * @param outputPath output image path
     * @param angle      rotation angle in degrees (any angle; positive = clockwise)
     */
    public static void rotateAndSaveImage(@NonNull String inputPath, @NonNull String outputPath, double angle) {
        // Read the image
        Mat src = Imgcodecs.imread(inputPath);
        if (src.empty()) {
            System.err.println("Failed to load image: " + inputPath);
            return;
        }

        // Rotate the image (rotateImage releases src itself)
        Mat rotated = rotateImage(src, angle);

        // Save the image
        boolean success = Imgcodecs.imwrite(outputPath, rotated);
        if (success) {
            System.out.println("Image saved to: " + outputPath);
        } else {
            System.err.println("Failed to save image");
        }

        // Release native memory
        rotated.release();
    }

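    /**
     * Reads the image at the given path and rotates it by the given angle.
     *
     * @param inputPath path of the input image
     * @param angle     rotation angle in degrees (any angle; positive = clockwise)
     * @return the rotated image as a Mat; the caller must release() it
     * @throws FileNotFoundException if the image cannot be read
     */
    public static Mat getRotateImageMat(@NonNull String inputPath, double angle) throws FileNotFoundException {
        // Read the image
        Mat src = Imgcodecs.imread(inputPath);
        if (src.empty()) {
            throw new FileNotFoundException("Cannot read image at path: " + inputPath);
        }
        // Rotate the image (src is released inside rotateImage)
        return rotateImage(src, angle);
    }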

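    /**
     * OCR preprocessing: converts to grayscale and binarizes to black text on a white background.
     * Expects a 3-channel BGR image, which is what Imgcodecs.imread returns by default.
     */
    public static Mat preprocessForOCR(@NonNull Mat src) {
        // 1. Convert to grayscale
        Mat gray = new Mat();
        Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY);

        // 2. Adaptive (Gaussian-weighted local) thresholding: block size 61, constant C = 9
        Mat binary = new Mat();
        Imgproc.adaptiveThreshold(gray, binary, 255,
                Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C,
                Imgproc.THRESH_BINARY, 61, 9); // THRESH_BINARY gives black text on a white background

        // Release the intermediate matrix
        gray.release();

        return binary;
    }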


    /**
     * Rotation logic; positive angles rotate clockwise.
     * Note: this method releases src, so callers must not use src afterwards.
     */
    private static Mat rotateImage(Mat src, double angle) {
        try {
            // Multiples of 90 degrees: Core.rotate is fast and lossless
            if (angle == 90) {
                Mat rotated = new Mat();
                Core.rotate(src, rotated, Core.ROTATE_90_CLOCKWISE);
                return rotated;
            } else if (angle == -90 || angle == 270) {
                Mat rotated = new Mat();
                Core.rotate(src, rotated, Core.ROTATE_90_COUNTERCLOCKWISE);
                return rotated;
            } else if (angle == 180 || angle == -180) {
                Mat rotated = new Mat();
                Core.rotate(src, rotated, Core.ROTATE_180);
                return rotated;
            }

            // getRotationMatrix2D treats positive angles as counterclockwise; negate so positive = clockwise
            angle = -angle;
            // Arbitrary angle: affine transform about the image centre
            Point center = new Point(src.cols() / 2.0, src.rows() / 2.0);
            Mat rotationMatrix = Imgproc.getRotationMatrix2D(center, angle, 1.0);

            // Bounding box of the rotated image
            double radians = Math.toRadians(angle);
            double sin = Math.abs(Math.sin(radians));
            double cos = Math.abs(Math.cos(radians));
            int newWidth = (int) (src.cols() * cos + src.rows() * sin);
            int newHeight = (int) (src.cols() * sin + src.rows() * cos);

            // Add a translation so the rotated image is centred on the enlarged canvas
            rotationMatrix.put(0, 2, rotationMatrix.get(0, 2)[0] + (newWidth - src.cols()) / 2.0);
            rotationMatrix.put(1, 2, rotationMatrix.get(1, 2)[0] + (newHeight - src.rows()) / 2.0);

            Mat rotated = new Mat();
            Imgproc.warpAffine(
                    src, rotated, rotationMatrix,
                    new Size(newWidth, newHeight),
                    Imgproc.INTER_LINEAR,
                    Core.BORDER_CONSTANT,
                    new Scalar(255, 255, 255) // fill uncovered corners with white
            );

            rotationMatrix.release();
            return rotated;
        } finally {
            src.release();
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        String fileUrl = "C:\\Users\\28433\\Desktop\\id_card_demo.png";
        String fileOutUrl = "C:\\Users\\28433\\Desktop\\id_card_demo_out.png";

        // Read the image and rotate it by 30 degrees
        Mat rotateImage = OpencvUtil.getRotateImageMat(fileUrl, 30);

        // Optional preprocessing
        Mat preprocessedMat = OpencvUtil.preprocessForOCR(rotateImage);

        // Save the processed image, then release native memory
        Imgcodecs.imwrite(fileOutUrl, preprocessedMat);
        rotateImage.release();
        preprocessedMat.release();
    }

}
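The canvas size computed in `rotateImage` (and in `getRotateBufferedImage` in the Java 2D version) comes from rotating the four corners of a $w \times h$ image about its centre by $\theta$ and taking the extreme coordinates; the two `rotationMatrix.put(...)` calls then shift the result onto the enlarged canvas. A short derivation, using a hypothetical 856 × 540 image as the worked example:

```latex
\begin{aligned}
&\text{corner offsets from the centre: } (x, y) = \left(\pm\tfrac{w}{2},\ \pm\tfrac{h}{2}\right) \\
&x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta \\
&W' = 2\max|x'| = w\,|\cos\theta| + h\,|\sin\theta|, \qquad
 H' = 2\max|y'| = w\,|\sin\theta| + h\,|\cos\theta| \\
&t_x \leftarrow t_x + \tfrac{W' - w}{2}, \qquad t_y \leftarrow t_y + \tfrac{H' - h}{2} \\
&\text{e.g. } w = 856,\ h = 540,\ \theta = 30^\circ:\quad
 W' = 856(0.8660) + 540(0.5) \approx 1011.3, \quad H' = 856(0.5) + 540(0.8660) \approx 895.7
\end{aligned}
```

Because of the absolute values, the canvas size does not depend on the rotation direction, which is why negating `angle` before computing the bounds is harmless. The `(int)` cast truncates the example to a 1011 × 895 canvas.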

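2. The JDK's built-in Java 2D API

An image utility class that uses only the JDK, with no extra dependencies, providing rotation, grayscale conversion, and binarization.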

import javax.imageio.ImageIO;
import java.awt.*;
import java.awt.geom.AffineTransform;
import java.awt.image.AffineTransformOp;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

/**
 * Image utility class built on the JDK only (Java 2D): rotation around the image centre
 * by an arbitrary angle, grayscale conversion, and adaptive binarization.
 */
public class ImageUtil {

    /**
     * Rotates the image at the given path and returns it as a byte stream.
     *
     * @param inFilePath input image path
     * @param angle      rotation angle in degrees (any angle; positive = clockwise)
     * @return byte output stream containing the processed image
     * @throws IOException if the image cannot be read or encoded
     */
    public static ByteArrayOutputStream rotateImageAndGetStream(String inFilePath, double angle) throws IOException {
        // Read the image
        BufferedImage originalImage = getBufferedImageFromFilePath(inFilePath);

        // Rotate it
        BufferedImage rotatedImage = getRotateBufferedImage(originalImage, angle);

        // Encode it in the same format as the input file
        return getByteArrayOutputStreamFromImage(inFilePath, rotatedImage);
    }


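    /**
     * Saves an image to the given path.
     *
     * @param outFilePath  output file path
     * @param rotatedImage image to save
     * @throws IOException if the image cannot be written
     */
    private static void saveImageToOutFilePath(String outFilePath, BufferedImage rotatedImage) throws IOException {
        String formatName = getFormatName(outFilePath); // format taken from the output file extension
        File outputFile = new File(outFilePath);
        boolean success = ImageIO.write(rotatedImage, formatName, outputFile);
        if (!success) {
            // ImageIO.write returns false when no registered writer can handle this format and image type
            throw new IOException("Failed to save image: no suitable writer for format " + formatName);
        }
    }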

    private static BufferedImage getBufferedImageFromFilePath(String inFilePath) throws IOException {
        // Read the image; ImageIO.read returns null if no reader recognizes the file
        File inputFile = new File(inFilePath);
        BufferedImage originalImage = ImageIO.read(inputFile);
        if (originalImage == null) {
            throw new IOException("Failed to load image: " + inFilePath);
        }
        return originalImage;
    }

    /**
     * Rotates an image by an arbitrary angle using AffineTransform.
     * The canvas is enlarged so nothing is cropped; uncovered corners stay transparent.
     *
     * @param bufferedImage original image
     * @param angle         rotation angle in degrees (positive = clockwise, since the y axis points down)
     * @return rotated image (TYPE_INT_ARGB)
     */
    private static BufferedImage getRotateBufferedImage(BufferedImage bufferedImage, double angle) {
        double radians = Math.toRadians(angle);

        int width = bufferedImage.getWidth();
        int height = bufferedImage.getHeight();

        double sin = Math.abs(Math.sin(radians));
        double cos = Math.abs(Math.cos(radians));

        int newWidth = (int) Math.floor(width * cos + height * sin);
        int newHeight = (int) Math.floor(width * sin + height * cos);

        // Guard against invalid dimensions
        if (newWidth <= 0 || newHeight <= 0) {
            throw new IllegalArgumentException("Invalid rotation dimensions: " + newWidth + "x" + newHeight);
        }

        // Copy into a standard image type first (avoids TYPE_CUSTOM)
        BufferedImage sourceImage = new BufferedImage(
                width,
                height,
                BufferedImage.TYPE_INT_ARGB
        );
        Graphics2D g2 = sourceImage.createGraphics();
        g2.drawImage(bufferedImage, 0, 0, null);
        g2.dispose();

        // Build the transform: shift to the centre of the new canvas, then rotate about the original centre
        AffineTransform transform = new AffineTransform();
        transform.translate((newWidth - width) / 2.0, (newHeight - height) / 2.0);
        transform.rotate(radians, width / 2.0, height / 2.0);

        // Create the destination image
        BufferedImage rotatedImage = new BufferedImage(
                newWidth,
                newHeight,
                BufferedImage.TYPE_INT_ARGB
        );

        // Perform the rotation
        AffineTransformOp op = new AffineTransformOp(transform, AffineTransformOp.TYPE_BILINEAR);
        return op.filter(sourceImage, rotatedImage);
    }

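    /**
     * Converts an image to grayscale.
     *
     * @param originalImage original image
     * @return grayscale image (TYPE_BYTE_GRAY)
     */
    private static BufferedImage getGrayscaleBufferedImage(BufferedImage originalImage) {
        BufferedImage grayscaleImage = new BufferedImage(
                originalImage.getWidth(),
                originalImage.getHeight(),
                BufferedImage.TYPE_BYTE_GRAY
        );
        Graphics2D g2d = grayscaleImage.createGraphics();
        // Paint a white background first so transparent corners left by rotation become white, not black
        g2d.setColor(Color.WHITE);
        g2d.fillRect(0, 0, grayscaleImage.getWidth(), grayscaleImage.getHeight());
        g2d.drawImage(originalImage, 0, 0, null);
        g2d.dispose();
        return grayscaleImage;
    }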

    /**
     * Adaptive binarization (local mean method).
     * A pixel becomes black if it is darker than (mean of its blockSize x blockSize neighbourhood) - C.
     * The larger blockSize is, the coarser (blurrier) the local mean; C is an offset (suggested range 5 to 15).
     * Example: BufferedImage binaryImage = adaptiveBinarize(grayImage, 15, 10);
     */
    private static BufferedImage adaptiveBinarize(BufferedImage grayImage, int blockSize, int C) {
        int width = grayImage.getWidth();
        int height = grayImage.getHeight();
        BufferedImage binaryImage = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);

        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Mean brightness of the neighbourhood that lies inside the image
                int sum = 0, count = 0;
                for (int dy = -blockSize / 2; dy <= blockSize / 2; dy++) {
                    for (int dx = -blockSize / 2; dx <= blockSize / 2; dx++) {
                        int nx = x + dx;
                        int ny = y + dy;
                        if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                            int rgb = grayImage.getRGB(nx, ny);
                            int r = (rgb >> 16) & 0xFF;
                            sum += r;
                            count++;
                        }
                    }
                }
                int mean = sum / count;
                int threshold = mean - C;
                int r = (grayImage.getRGB(x, y) >> 16) & 0xFF;
                int binaryValue = (r < threshold) ? 0 : 255;
                // Pass a fully opaque ARGB value (alpha 0xFF) to setRGB
                binaryImage.setRGB(x, y, 0xFF000000 | (binaryValue << 16) | (binaryValue << 8) | binaryValue);
            }
        }

        return binaryImage;
    }

    private static ByteArrayOutputStream getByteArrayOutputStreamFromImage(String inFilePath, BufferedImage rotatedImage) throws IOException {
        // Encode the image into an in-memory byte stream
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        String formatName = getFormatName(inFilePath); // format taken from the input file extension
        if (!ImageIO.write(rotatedImage, formatName, outputStream)) {
            throw new IOException("Failed to encode image: no suitable writer for format " + formatName);
        }
        return outputStream;
    }

    /**
     * Rotates an image and saves the result to the given path.
     *
     * @param inFilePath  input image path
     * @param outFilePath output image path
     * @param angle       rotation angle in degrees (any angle; positive = clockwise)
     * @throws IOException if the image cannot be read or written
     */
    public static void rotateAndSaveImage(String inFilePath, String outFilePath, double angle) throws IOException {
        BufferedImage originalImage = getBufferedImageFromFilePath(inFilePath);

        // Rotate it
        BufferedImage rotatedImage = getRotateBufferedImage(originalImage, angle);

        // Save it
        saveImageToOutFilePath(outFilePath, rotatedImage);
    }

    /**
     * Gets the image format name from a file path's extension.
     *
     * @param path file path
     * @return format name (e.g. "png", "jpg")
     */
    private static String getFormatName(String path) {
        int dotIndex = path.lastIndexOf(".");
        if (dotIndex == -1 || dotIndex == path.length() - 1) {
            throw new IllegalArgumentException("Invalid file name: missing extension");
        }
        return path.substring(dotIndex + 1);
    }

    public static void main(String[] args) throws IOException {
        String fileUrl = "C:\\Users\\28433\\Desktop\\id_card_demo.png";
        String fileOutUrl = "C:\\Users\\28433\\Desktop\\id_card_demo_out.png";

        // Load the BufferedImage
        BufferedImage bufferedImageFromFilePath = getBufferedImageFromFilePath(fileUrl);

        // Rotate by 30 degrees
        BufferedImage rotateBufferedImage = getRotateBufferedImage(bufferedImageFromFilePath, 30);

        // Convert to grayscale
        BufferedImage grayscaleBufferedImage = getGrayscaleBufferedImage(rotateBufferedImage);

        // Binarize
        BufferedImage bufferedImage = adaptiveBinarize(grayscaleBufferedImage, 15, 10);

        // Save
        saveImageToOutFilePath(fileOutUrl, bufferedImage);
    }
}
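`adaptiveBinarize` re-sums the whole window for every pixel, so its cost grows with blockSize² and it slows down noticeably on full-resolution photos. A common remedy, swapped in here, is an integral image (summed-area table), which gives any window sum from four lookups. This is a sketch under stated assumptions: the class `FastBinarize` is hypothetical; it keeps the same rule (pixel < mean − C → black) but reads raw raster samples instead of `getRGB`'s red channel, so results may differ slightly from the original for `TYPE_BYTE_GRAY` input.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

/** Hypothetical O(width * height) variant of local-mean binarization using an integral image. */
public final class FastBinarize {

    private FastBinarize() {
    }

    public static BufferedImage adaptiveBinarize(BufferedImage gray, int blockSize, int c) {
        int w = gray.getWidth();
        int h = gray.getHeight();
        Raster src = gray.getRaster();

        // integral[y + 1][x + 1] = sum of all samples in rows 0..y and columns 0..x
        long[][] integral = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            long rowSum = 0;
            for (int x = 0; x < w; x++) {
                rowSum += src.getSample(x, y, 0);
                integral[y + 1][x + 1] = integral[y][x + 1] + rowSum;
            }
        }

        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        WritableRaster dst = out.getRaster();
        int r = blockSize / 2;
        for (int y = 0; y < h; y++) {
            int y0 = Math.max(0, y - r);
            int y1 = Math.min(h - 1, y + r);
            for (int x = 0; x < w; x++) {
                int x0 = Math.max(0, x - r);
                int x1 = Math.min(w - 1, x + r);
                // Window sum from four corners of the summed-area table
                long sum = integral[y1 + 1][x1 + 1] - integral[y0][x1 + 1]
                        - integral[y1 + 1][x0] + integral[y0][x0];
                int count = (x1 - x0 + 1) * (y1 - y0 + 1);
                int threshold = (int) (sum / count) - c;
                // TYPE_BYTE_BINARY stores palette indices: 0 = black, 1 = white
                dst.setSample(x, y, 0, src.getSample(x, y, 0) < threshold ? 0 : 1);
            }
        }
        return out;
    }
}
```

In `main`, `FastBinarize.adaptiveBinarize(grayscaleBufferedImage, 15, 10)` could replace the original call; like the original, it clips the window at the image border.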

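Recognition:

1. Tesseract

Follow [Using Tesseract OCR]. During installation, leave "Additional language data (download)" unticked: if there are [network problems], the additional language training packs cannot be downloaded and the installation fails. The default installation includes the eng trained data for English; to recognize Simplified Chinese, find the chi_sim trained data file online and put it in the tessdata directory.
Recognition quality is mediocre: some characters are misread, blurry or skewed images come out as garbled text, and better results require training your own model.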
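Because a missing `chi_sim` file only shows up as an error when the engine initializes, a small pre-flight check can report it earlier. This is a minimal sketch, assuming the usual `<lang>.traineddata` file naming and Tesseract's `+` syntax for combining languages (e.g. `chi_sim+eng`); the class `TessdataCheck` is hypothetical. The directory to check is whichever one your wrapper is pointed at (for Tess4J, typically the path passed to `setDatapath`).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check for Tesseract language data files. */
public final class TessdataCheck {

    private TessdataCheck() {
    }

    /**
     * Returns the .traineddata file names missing from tessdataDir
     * for a language string such as "chi_sim+eng" (empty list = all present).
     */
    public static List<String> missingLanguages(File tessdataDir, String languages) {
        List<String> missing = new ArrayList<>();
        // Tesseract combines several languages with '+'
        for (String lang : languages.split("\\+")) {
            String trimmed = lang.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String fileName = trimmed + ".traineddata";
            if (!new File(tessdataDir, fileName).isFile()) {
                missing.add(fileName);
            }
        }
        return missing;
    }
}
```

Calling this at application startup turns a confusing initialization failure into a clear message about which file to download.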