ImageIO.frameWork 解析

本文详细介绍了ImageI/O框架的基础知识及其使用方法。包括如何创建和使用ImageSources及ImageDestinations,支持的图片格式,以及如何进行渐进式图片加载等。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Image I/O 基础

Image I/O framework提供不透明数据类型(opaque data types),从CGImageSourceRef获取图片数据,将图片数据写入到CGImageDestinationRef。它提供一个范围很广的图片格式,包含web格式,动态图,原始相机数据。

Image I/O的一些其它特性
  • MAC平台最快速图片编码和解码操作
  • 加载多张图片的功能
  • 支持图片元数据
  • 有效的缓存
Create image source and image destination objects方式
  • URLs,在Image I/O中,路径是通过Core Foundation 的数据类型CFURLRef。
  • Core Foundation的对象CFDataRef和CFMutableDataRef
  • 通过Quartz data consumer(CGDataConsumerRef) 和 data provider (CGDataProviderRef) objects.

如何使用Image I/O

在工程中添加Image I/O Framework。然后在需要使用的敌方
#import <ImageIO/ImageIO.h>即可。


支持的图片格式

可以通过以下方式来获取支持的图片格式


CFArrayRef mySourceTypes = CGImageSourceCopyTypeIdentifiers();
CFShow(mySourceTypes);
CFArrayRef myDestinationTypes = CGImageDestinationCopyTypeIdentifiers();
CFShow(myDestinationTypes);

JPEG, JPEG2000, RAW, TIFF, BMP,GIF,PICT and PNG等都支持。


创建和使用Image Sources

一个Image Sources抽象出来了图片数据,通过raw memory buffer减轻开发人员对数据的处理。Image Sources包含不止一个图像,缩略图,各个图像的特征和图片文件。通过CGImageSource实现。

从Image Sources中创建一个图像

当从Image Sources中创建图片时,可以提供一个index和dictionary(利用键值对)来创建一个缩略图或者是允许缓存。
在创建图片的时候,也需提供一个index值来索引图片,因为Image Sources中可能是多张图片,如果参数时0,那么只有一个图片。可以通过CGImageSourceGetCount来获得图片在Image Sources中的数量。下面是两个例子。

下面是设置了缓存,并使用float-point的方式来编译


CGImageRef MyCreateCGImageFromFile(NSString *path)
{
    NSURL *url = [NSURL URLWithString:path];

    CGImageRef image;
    CGImageSourceRef imageSource;

    CFDictionaryRef imageOptions;
    CFStringRef imageKeys[2];
    CFTypeRef imageValues[2];
    //缓存键值对
    imageKeys[0] = kCGImageSourceShouldCache;
    imageValues[0] = (CFTypeRef)kCFBooleanTrue;
    //float-point键值对
    imageKeys[1] = kCGImageSourceShouldAllowFloat;
    imageValues[1] = (CFTypeRef)kCFBooleanTrue;
    //获取Dictionary,用来创建资源
    imageOptions = CFDictionaryCreate(NULL, (const void **) imageKeys,
                                      (const void **) imageValues, 2,
                                      &kCFTypeDictionaryKeyCallBacks,
                                      &kCFTypeDictionaryValueCallBacks);
    //资源创建
    imageSource = CGImageSourceCreateWithURL((CFURLRef)url, imageOptions);
    CFRelease(imageOptions);

    if (imageSource == NULL) {
        return NULL;
    }
    //图片获取,index=0
    image = CGImageSourceCreateImageAtIndex(imageSource, 0, NULL);

    return image;
}

接下来的例子是设置了缩略图


CGImageRef MyCreateThumbnailCGImageFromURL(NSURL * url, int imageSize)
{
    CGImageRef thumbnailImage;
    CGImageSourceRef imageSource;

    CFDictionaryRef imageOptions;
    CFStringRef imageKeys[3];
    CFTypeRef imageValues[3];

    CFNumberRef thumbnailSize;
    //先判断数据是否存在
    imageSource = CGImageSourceCreateWithURL((CFURLRef)url, NULL);

    if (imageSource == NULL) {
        fprintf(stderr, "Image source is NULL.");
        return  NULL;
    }
    //创建缩略图等比缩放大小,会根据长宽值比较大的作为imageSize进行缩放
    thumbnailSize = CFNumberCreate(NULL, kCFNumberIntType, &imageSize);

    imageKeys[0] = kCGImageSourceCreateThumbnailWithTransform;
    imageValues[0] = (CFTypeRef)kCFBooleanTrue;

    imageKeys[1] = kCGImageSourceCreateThumbnailFromImageIfAbsent;
    imageValues[1] = (CFTypeRef)kCFBooleanTrue;
    //缩放键值对
    imageKeys[2] = kCGImageSourceThumbnailMaxPixelSize;
    imageValues[2] = (CFTypeRef)thumbnailSize;

    imageOptions = CFDictionaryCreate(NULL, (const void **) imageKeys,
                                      (const void **) imageValues, 3,
                                      &kCFTypeDictionaryKeyCallBacks,
                                      &kCFTypeDictionaryValueCallBacks);
    //获取缩略图
    thumbnailImage = CGImageSourceCreateThumbnailAtIndex(imageSource, 0, imageOptions)
    CFRelease(imageOptions);
    CFRelease(thumbnailSize);
    CFRelease(imageSource);  
    if (thumbnailImage == NULL) {
        return NULL;

    return thumbnailImage;
}


渐进式图片加载

当图片从网络中获取的时候,可能由于过大,数据缓慢,这时候就需要渐进式加载图片来显示。主要通过CFData对象来实现:

  1. 创建一个CFData去添加image data.
  2. 创建一个渐进式图片资源,通过 CGImageSourceCreateIncremental
  3. 获取图片数据到CFData中
  4. 调用CGImageSourceUpdateData函数,传递CFData和一个bool值,去描述这个数据是否包含全部图片数据或者只是部分数据。无论什么情况,这个data包含已经积累的全部图片文件。
  5. 如果已经有足够的图片数据,可以通过函数绘制CGImageSourceCreateImageAtIndex部分图片,然后记得要Release掉它。
  6. 检查是否已经有全部的图片数据通过使用CGImageSourceGetStatusAtIndex函数。如果图片是完整的,函数返回值为kCGImageStatusComplete。否则继续3,4步骤,直到获得全部数据。
  7. Release掉渐进式增长的image source。

Image Destinations的使用

设置属性

一个图形目标(Image Destinations)抽象出来了数据写入,并且消除了用户通过原始数据流来处理数据的需求。一个 image destination相当于一张图片或者一组图片,对于每张图片它可以包含缩略图和属性,在通过(URL, CFData,Quartz data consumer)创建CGImageDestination对象之后.可以向里面添加图片数据和设置图片属性。当设置完成之后,调用CGImageDestinationFinalize函数来结束。
给目标图片设置属性


float compression = 1.0; //设置压缩比
int orientation = 4; // 设置朝向bottom, left.
CFStringRef myKeys[3];
CFTypeRef   myValues[3];
CFDictionaryRef myOptions = NULL;
myKeys[0] = kCGImagePropertyOrientation;
myValues[0] = CFNumberCreate(NULL, kCFNumberIntType, &orientation);
myKeys[1] = kCGImagePropertyHasAlpha;
myValues[1] = kCFBooleanTrue;
myKeys[2] = kCGImageDestinationLossyCompressionQuality;
myValues[2] = CFNumberCreate(NULL, kCFNumberFloatType, &compression);
myOptions = CFDictionaryCreate( NULL, (const void **)myKeys, (const void **)myValues, 3,
                      &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
// 记得Release不需要的变量
将图片数据写入Image Destination

首先需要创建一个Image Destination,可以通过三种方式CGImageDestinationCreateWithURL, CGImageDestinationCreateWithData, or CGImageDestinationCreateWithDataConsumer(目前不知道URL路径怎么设置才能存入...orz)


- (void) writeCGImage: (CGImageRef) image toURL: (NSURL*) url withType: (CFStringRef) imageType andOptions: (CFDictionaryRef) options
{
   CGImageDestinationRef myImageDest = CGImageDestinationCreateWithURL((CFURLRef)url, imageType, 1, nil);
   CGImageDestinationAddImage(myImageDest, image, options);//添加数据和图片
   CGImageDestinationFinalize(myImageDest);//最后调用,完成数据写入
   CFRelease(myImageDest);
}


文/c_xiaoqiang(简书作者)
原文链接:http://www.jianshu.com/p/4dcd6e4bdbf0
著作权归作者所有,转载请联系作者获得授权,并标注“简书作者”。



public static void main(String[] args) throws TesseractException { // 识别图片的路径(修改为自己的图片路径) String path = "D:\\pdf\\Snipaste_2025-07-17_07-59-02.jpg"; // 语言库位置(修改为跟自己语言库文件夹的路径) String lagnguagePath = "D:\\maven_use\\lingxi-lhc\\lingxi-ai-extend\\lingxi-ai-comparison\\src\\main\\resources\\tessdata"; File file = new File(path); ITesseract instance = new Tesseract(); // 设置训练库的位置 instance.setDatapath(lagnguagePath); instance.setLanguage("eng"); instance.setPageSegMode(6); instance.setOcrEngineMode(1); String result = instance.doOCR(file); System.out.println(result); }C:\Java\jdk-17\bin\java.exe "-javaagent:D:\work\IntelliJ IDEA 2024.3\lib\idea_rt.jar=51929:D:\work\IntelliJ IDEA 2024.3\bin" -Dfile.encoding=UTF-8 -classpath D:\maven_use\lingxi-lhc\lingxi-ai-extend\lingxi-ai-comparison\target\classes;D:\maven_use\repository\org\springframework\boot\spring-boot-starter-web\3.4.4\spring-boot-starter-web-3.4.4.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-starter\3.4.4\spring-boot-starter-3.4.4.jar;D:\maven_use\repository\org\springframework\boot\spring-boot\3.4.4\spring-boot-3.4.4.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-autoconfigure\3.4.4\spring-boot-autoconfigure-3.4.4.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-starter-logging\3.4.4\spring-boot-starter-logging-3.4.4.jar;D:\maven_use\repository\ch\qos\logback\logback-classic\1.5.18\logback-classic-1.5.18.jar;D:\maven_use\repository\ch\qos\logback\logback-core\1.5.18\logback-core-1.5.18.jar;D:\maven_use\repository\org\apache\logging\log4j\log4j-to-slf4j\2.24.3\log4j-to-slf4j-2.24.3.jar;D:\maven_use\repository\org\apache\logging\log4j\log4j-api\2.24.3\log4j-api-2.24.3.jar;D:\maven_use\repository\org\slf4j\jul-to-slf4j\2.0.17\jul-to-slf4j-2.0.17.jar;D:\maven_use\repository\jakarta\annotation\jakarta.annotation-api\2.1.1\jakarta.annotation-api-2.1.1.jar;D:\maven_use\repository\org\springframework\spring-core\6.2.5\spring-core-6.2.5.jar;D:\maven_use\repository\org\springframework\spring-jcl\6.2.5\spring-jcl-6.2.5.jar;D:\maven_use\repository\org\yaml\snakeyaml\2.3\snakeyaml-2.3.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-starter-json\3.4.4\spring-boot-starter-json-3.4.4.jar;D:\maven_use\repository\com\fasterxml\jackson\datatype\jackson-datatype-jdk8\2.18.3\jackson-datatype-jdk8-2.18.3.jar;D:\maven_use\repository\com\fasterxml\jackson\datatype\jackson-datatype-jsr310\2.18.3\jackson-datatype-jsr310-2.18.3.jar;D:\maven_use\repository\com\fasterxml\jackson\module\jackson-module-parameter-names\2.18.3\jackson-module-parameter-names-2.18.3.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-starter-tomcat\3.4.4\spring-boot-starter-tomcat-3.4.4.jar;D:\maven_use\repository\org\apache\tomcat\embed\tomcat-embed-core\10.1.39\tomcat-embed-core-10.1.39.jar;D:\maven_use\repository\org\apache\tomcat\embed\tomcat-embed-el\10.1.39\tomcat-embed-el-10.1.39.jar;D:\maven_use\repository\org\apache\tomcat\embed\tomcat-embed-websocket\10.1.39\tomcat-embed-websocket-10.1.39.jar;D:\maven_use\repository\org\springframework\spring-web\6.2.5\spring-web-6.2.5.jar;D:\maven_use\repository\org\springframework\spring-beans\6.2.5\spring-beans-6.2.5.jar;D:\maven_use\repository\io\micrometer\micrometer-observation\1.14.5\micrometer-observation-1.14.5.jar;D:\maven_use\repository\io\micrometer\micrometer-commons\1.14.5\micrometer-commons-1.14.5.jar;D:\maven_use\repository\org\springframework\spring-webmvc\6.2.5\spring-webmvc-6.2.5.jar;D:\maven_use\repository\org\springframework\spring-aop\6.2.5\spring-aop-6.2.5.jar;D:\maven_use\repository\org\springframework\spring-context\6.2.5\spring-context-6.2.5.jar;D:\maven_use\repository\org\springframework\spring-expression\6.2.5\spring-expression-6.2.5.jar;D:\maven_use\repository\org\apache\pdfbox\pdfbox\2.0.29\pdfbox-2.0.29.jar;D:\maven_use\repository\org\apache\pdfbox\fontbox\2.0.29\fontbox-2.0.29.jar;D:\maven_use\repository\commons-logging\commons-logging\1.2\commons-logging-1.2.jar;D:\maven_use\repository\org\json\json\20231013\json-20231013.jar;D:\maven_use\repository\com\hankcs\hanlp\portable-1.8.4\hanlp-portable-1.8.4.jar;D:\maven_use\lingxi-lhc\lingxi-ai-common\lingxi-ai-common-core\target\classes;D:\maven_use\repository\org\springframework\spring-context-support\6.2.5\spring-context-support-6.2.5.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-starter-validation\3.4.4\spring-boot-starter-validation-3.4.4.jar;D:\maven_use\repository\org\hibernate\validator\hibernate-validator\8.0.2.Final\hibernate-validator-8.0.2.Final.jar;D:\maven_use\repository\jakarta\validation\jakarta.validation-api\3.0.2\jakarta.validation-api-3.0.2.jar;D:\maven_use\repository\com\fasterxml\classmate\1.7.0\classmate-1.7.0.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-starter-aop\3.4.4\spring-boot-starter-aop-3.4.4.jar;D:\maven_use\repository\org\aspectj\aspectjweaver\1.9.23\aspectjweaver-1.9.23.jar;D:\maven_use\repository\org\apache\commons\commons-lang3\3.17.0\commons-lang3-3.17.0.jar;D:\maven_use\repository\jakarta\servlet\jakarta.servlet-api\6.0.0\jakarta.servlet-api-6.0.0.jar;D:\maven_use\repository\cn\hutool\hutool-core\5.8.35\hutool-core-5.8.35.jar;D:\maven_use\repository\cn\hutool\hutool-http\5.8.35\hutool-http-5.8.35.jar;D:\maven_use\repository\cn\hutool\hutool-extra\5.8.35\hutool-extra-5.8.35.jar;D:\maven_use\repository\cn\hutool\hutool-setting\5.8.35\hutool-setting-5.8.35.jar;D:\maven_use\repository\cn\hutool\hutool-log\5.8.35\hutool-log-5.8.35.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-configuration-processor\3.4.4\spring-boot-configuration-processor-3.4.4.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-properties-migrator\3.4.4\spring-boot-properties-migrator-3.4.4.jar;D:\maven_use\repository\org\springframework\boot\spring-boot-configuration-metadata\3.4.4\spring-boot-configuration-metadata-3.4.4.jar;D:\maven_use\repository\com\vaadin\external\google\android-json\0.0.20131108.vaadin1\android-json-0.0.20131108.vaadin1.jar;D:\maven_use\repository\io\github\linpeilie\mapstruct-plus-spring-boot-starter\1.4.6\mapstruct-plus-spring-boot-starter-1.4.6.jar;D:\maven_use\repository\io\github\linpeilie\mapstruct-plus\1.4.6\mapstruct-plus-1.4.6.jar;D:\maven_use\repository\org\mapstruct\mapstruct\1.5.5.Final\mapstruct-1.5.5.Final.jar;D:\maven_use\repository\io\github\linpeilie\mapstruct-plus-object-convert\1.4.6\mapstruct-plus-object-convert-1.4.6.jar;D:\maven_use\repository\org\lionsoul\ip2region\2.7.0\ip2region-2.7.0.jar;D:\maven_use\repository\net\sourceforge\tess4j\tess4j\5.8.0\tess4j-5.8.0.jar;D:\maven_use\repository\net\java\dev\jna\jna\5.13.0\jna-5.13.0.jar;D:\maven_use\repository\com\github\jai-imageio\jai-imageio-core\1.4.0\jai-imageio-core-1.4.0.jar;D:\maven_use\repository\org\apache\pdfbox\pdfbox-tools\2.0.29\pdfbox-tools-2.0.29.jar;D:\maven_use\repository\org\apache\pdfbox\pdfbox-debugger\2.0.29\pdfbox-debugger-2.0.29.jar;D:\maven_use\repository\org\apache\pdfbox\jbig2-imageio\3.0.4\jbig2-imageio-3.0.4.jar;D:\maven_use\repository\commons-io\commons-io\2.15.0\commons-io-2.15.0.jar;D:\maven_use\repository\net\sourceforge\lept4j\lept4j\1.18.1\lept4j-1.18.1.jar;D:\maven_use\repository\org\jboss\jboss-vfs\3.2.17.Final\jboss-vfs-3.2.17.Final.jar;D:\maven_use\repository\org\jboss\logging\jboss-logging\3.6.1.Final\jboss-logging-3.6.1.Final.jar;D:\maven_use\repository\org\slf4j\slf4j-api\2.0.17\slf4j-api-2.0.17.jar;D:\maven_use\repository\org\openpnp\opencv\4.5.5-1\opencv-4.5.5-1.jar;D:\maven_use\repository\com\fasterxml\jackson\core\jackson-databind\2.18.3\jackson-databind-2.18.3.jar;D:\maven_use\repository\com\fasterxml\jackson\core\jackson-annotations\2.18.3\jackson-annotations-2.18.3.jar;D:\maven_use\repository\com\fasterxml\jackson\core\jackson-core\2.18.3\jackson-core-2.18.3.jar;D:\maven_use\repository\com\fasterxml\jackson\dataformat\jackson-dataformat-xml\2.18.3\jackson-dataformat-xml-2.18.3.jar;D:\maven_use\repository\org\codehaus\woodstox\stax2-api\4.2.2\stax2-api-4.2.2.jar;D:\maven_use\repository\com\fasterxml\woodstox\woodstox-core\7.0.0\woodstox-core-7.0.0.jar;D:\maven_use\repository\jakarta\xml\bind\jakarta.xml.bind-api\4.0.0\jakarta.xml.bind-api-4.0.0.jar;D:\maven_use\repository\jakarta\activation\jakarta.activation-api\2.1.3\jakarta.activation-api-2.1.3.jar;D:\maven_use\repository\org\glassfish\jaxb\jaxb-runtime\4.0.2\jaxb-runtime-4.0.2.jar;D:\maven_use\repository\org\glassfish\jaxb\jaxb-core\4.0.5\jaxb-core-4.0.5.jar;D:\maven_use\repository\org\eclipse\angus\angus-activation\2.0.2\angus-activation-2.0.2.jar;D:\maven_use\repository\org\glassfish\jaxb\txw2\4.0.5\txw2-4.0.5.jar;D:\maven_use\repository\com\sun\istack\istack-commons-runtime\4.1.2\istack-commons-runtime-4.1.2.jar;D:\maven_use\repository\org\ehcache\ehcache\3.10.8\ehcache-3.10.8.jar;D:\maven_use\repository\javax\cache\cache-api\1.1.1\cache-api-1.1.1.jar com.luxsan.service.bb 18:42:34.862 [main] ERROR net.sourceforge.tess4j.Tesseract -- Not a JPEG file: starts with 0x89 0x50 javax.imageio.IIOException: Not a JPEG file: starts with 0x89 0x50 at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.readImageHeader(Native Method) at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.readNativeHeader(JPEGImageReader.java:739) at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.checkTablesOnly(JPEGImageReader.java:354) at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.getNumImagesOnThread(JPEGImageReader.java:434) at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.getNumImages(JPEGImageReader.java:400) at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:236) at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:210) at com.luxsan.service.bb.main(bb.java:32) Exception in thread "main" net.sourceforge.tess4j.TesseractException: javax.imageio.IIOException: Not a JPEG file: starts with 0x89 0x50 at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:261) at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:210) at com.luxsan.service.bb.main(bb.java:32) Caused by: javax.imageio.IIOException: Not a JPEG file: starts with 0x89 0x50 at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.readImageHeader(Native Method) at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.readNativeHeader(JPEGImageReader.java:739) at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.checkTablesOnly(JPEGImageReader.java:354) at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.getNumImagesOnThread(JPEGImageReader.java:434) at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.getNumImages(JPEGImageReader.java:400) at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:236) ... 2 more报错为什么
07-18
package com.luxsan.service; import com.fasterxml.jackson.databind.JsonNode; import com.fasterxml.jackson.databind.ObjectMapper; import com.hankcs.hanlp.HanLP; import com.hankcs.hanlp.seg.common.Term; import com.luxsan.common.core.utils.MessageUtils; import com.luxsan.domain.ValidationResult; import jakarta.annotation.PostConstruct; import lombok.RequiredArgsConstructor; import net.sourceforge.tess4j.Tesseract; import org.apache.pdfbox.pdmodel.PDDocument; import org.apache.pdfbox.text.PDFTextStripper; import org.springframework.stereotype.Service; import org.springframework.web.multipart.MultipartFile; import javax.imageio.ImageIO; import java.awt.image.BufferedImage; import java.io.File; import java.io.InputStream; import java.nio.file.Files; import java.nio.file.Path; import java.nio.file.StandardCopyOption; import java.util.*; @RequiredArgsConstructor @Service public class ReadFileContentService { private final ObjectMapper objectMapper = new ObjectMapper(); private Tesseract tesseract; @PostConstruct public void initOcrEngine() { tesseract = new Tesseract(); //语言包路径和支持语言 tesseract.setDatapath("D:\\maven_use\\lingxi-lhc\\lingxi-ai-extend\\lingxi-ai-comparison\\src\\main\\resources\\tessdata"); tesseract.setLanguage("eng+chi_sim"); tesseract.setPageSegMode(6); // 自动页面分割 tesseract.setOcrEngineMode(1); // LSTM引擎 } /** * 支持PDF读取文件和图片ocr */ public String extractContent(MultipartFile file) { String contentType = file.getContentType(); String fileName = file.getOriginalFilename().toLowerCase(); if (contentType == null) { return "不支持的文件类型: " + contentType; } if (fileName.endsWith(".pdf")) { return readPdfText(file); } return extractImageText(file); } /** * 读取PDF文本内容 * * @param file * @return */ public String readPdfText(MultipartFile file) { try (PDDocument doc = PDDocument.load(file.getInputStream())) { PDFTextStripper stripper = new PDFTextStripper(); // 设置行分隔符 stripper.setLineSeparator("\n"); // 设置字符间距 stripper.setSortByPosition(true); String rawText = stripper.getText(doc); System.out.println("pdf内容" + rawText); return rawText.trim(); } catch (Exception e) { return MessageUtils.message("file.read.pdf.error"); } } /** * OCR识别图片内容 */ private String extractImageText(MultipartFile file) { try (InputStream is = file.getInputStream()) { // 将输入流直接转为BufferedImage BufferedImage image = ImageIO.read(is); if (image == null) { return MessageUtils.message("Image.parsing.failed"); } // 直接对BufferedImage进行OCR String result = tesseract.doOCR(image).replaceAll("\\s+", " ").trim(); return result; } catch (Exception e) { return MessageUtils.message("file.read.picture.error"); } } // private String getFileExtension(String filename) { // if (filename == null) return ".tmp"; // int dotIndex = filename.lastIndexOf('.'); // return (dotIndex == -1) ? ".tmp" : filename.substring(dotIndex); // } /** * 解析json */ public JsonNode parseJson(String jsonContent) throws Exception { return this.objectMapper.readTree(jsonContent); } public List<ValidationResult> compareContent(String pdfText, JsonNode jsonConfig) { List<ValidationResult> results = new ArrayList<>(); String cleanPdf = pdfText.replaceAll("\\s+", ""); // 预处理PDF文本 //处理JSON结构对象/数组 JsonNode dataNode = jsonConfig.isArray() && jsonConfig.size() > 0 ? jsonConfig.get(0) : jsonConfig; // 高效遍历JSON字段 dataNode.fields().forEachRemaining(entry -> { String key = entry.getKey(); String value = entry.getValue().asText().replaceAll("\\s+", ""); if (!value.isEmpty()) { boolean found = cleanPdf.contains(value); results.add(new ValidationResult( "FIELD", key, value, found ? "Found" : "Not Found", found )); } }); return results; } public JsonNode parsePipeSeparatedDataToJson(String inputData) throws Exception { Map<String, String> dataMap = parsePipeSeparatedData(inputData); return objectMapper.valueToTree(dataMap); } //解析分隔数据 public Map<String, String> parsePipeSeparatedData(String fileCONTENT) { //处理转义的换行符 fileCONTENT = fileCONTENT.replace("\\n", "\n").replaceAll("\\|+\"$", "").trim();; Map<String, String> dataMap = new LinkedHashMap<>(); String[] lines = fileCONTENT.split("\n"); if (lines.length >= 2) { String[] headers = lines[0].split("\\|"); String[] values = lines[1].split("\\|"); int minLength = Math.min(headers.length, values.length); for (int i = 0; i < minLength; i++) { dataMap.put(headers[i], values[i]); } } return dataMap; } //判断是不是json public boolean isPipeSeparatedData(String inputData) { return inputData.contains("|"); } // 比较和校验数据 public List<ValidationResult> compareAndValidate(String fileContent, JsonNode jsonConfig) { List<ValidationResult> results = new ArrayList<>(); Map<String, String> pipeDataMap = objectMapper.convertValue(jsonConfig, Map.class); fileContent = fileContent.replaceAll("\\s+", ""); for (Map.Entry<String, String> entry : pipeDataMap.entrySet()) { String key = entry.getKey(); String value = entry.getValue(); if (!value.isEmpty()) { value = value.replaceAll("\\s+", ""); boolean found = fileContent.contains(value); results.add(new ValidationResult( "FIELD", key, value, found ? "Found" : "Not Found", found )); } } return results; } } 哪里需要改进待优化
07-17
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值