1. Add the PDFBox dependencies to your pom.xml
2. The dependencies below cover the full PDFBox feature set; in most cases the first three are enough
<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.1</version>
    </dependency>
    <!-- fontbox, xmpbox, preflight and pdfbox-tools are released together with pdfbox, so their versions are kept in step with it -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jempbox</artifactId>
        <version>1.8.11</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>xmpbox</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>preflight</artifactId>
        <version>2.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox-tools</artifactId>
        <version>2.0.1</version>
    </dependency>
</dependencies>
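Since several PDFBox modules are listed side by side, one way to avoid mixing releases is to keep the version in a single Maven property. The fragment below is only a sketch: the property name pdfbox.version is an arbitrary choice, not something PDFBox requires, and only two of the modules are shown.

```xml
<properties>
    <!-- Assumed property name; any name works as long as the references match -->
    <pdfbox.version>2.0.1</pdfbox.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>${pdfbox.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>${pdfbox.version}</version>
    </dependency>
</dependencies>
```

With this layout, upgrading means changing one line instead of editing every dependency.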
3. Code to read a PDF file
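import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ReadPdf {
    public static void main(String[] args) throws IOException {
        // Load an existing document; try-with-resources closes it even if extraction throws
        File file = new File("D:\\test\\Attachment.pdf");
        try (PDDocument document = PDDocument.load(file)) {
            // PDFTextStripper extracts the text content of the document
            PDFTextStripper pdfStripper = new PDFTextStripper();
            // Retrieve the text from the PDF document and print it
            String text = pdfStripper.getText(document);
            System.out.println(text);
        }
    }
}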
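The snippet above assumes the hard-coded path exists and really points at a PDF. A cheap pre-check, sketched below with JDK classes only, is to look for the %PDF- marker that PDF files start with. The class PdfHeaderCheck and the method looksLikePdf are hypothetical names made up for this example, not part of PDFBox, and the check assumes the marker sits at the very first byte, which is stricter than some lenient readers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of PDFBox): a sanity check to run before loading a file.
public class PdfHeaderCheck {
    // PDF files are expected to begin with this ASCII marker.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if path is a regular file whose first bytes are "%PDF-".
    public static boolean looksLikePdf(Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(path)) {
            byte[] head = new byte[MAGIC.length];
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == MAGIC.length && Arrays.equals(head, MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // Falls back to the path used in this article when no argument is given
        Path path = Paths.get(args.length > 0 ? args[0] : "D:\\test\\Attachment.pdf");
        String verdict = looksLikePdf(path) ? " looks like a PDF" : " is missing or not a PDF";
        System.out.println(path + verdict);
    }
}
```

Calling looksLikePdf before loading the document lets the program report a clear message for a wrong path or a renamed non-PDF file instead of failing inside the parser.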
4. Just like that!
5. If you need fancier tricks, read the documentation:
https://iowiki.com/pdfbox/pdfbox_index.html
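This article shows how to load a PDF file with the PDFBox library and read its text. It provides the Maven dependency configuration and sample code to help developers get PDF text extraction working quickly.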