Java code for parsing doc, docx, and rtf files
Before parsing and reading the files, add the following JAR packages to the project:

1. doc parsing
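// WordExtractor is Apache POI's org.apache.poi.hwpf.extractor.WordExtractor; dest is the path of the .doc file
String buffer = "";
try (InputStream is = new FileInputStream(dest)) {
    WordExtractor ex = new WordExtractor(is);
    buffer = ex.getText();
}
System.out.println(buffer);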
2. docx parsing
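// XWPFDocument (org.apache.poi.xwpf.usermodel) and XWPFWordExtractor (org.apache.poi.xwpf.extractor)
// are Apache POI classes; dest is the path of the .docx file
String buffer = "";
try (FileInputStream fis = new FileInputStream(dest)) {
    XWPFDocument xdoc = new XWPFDocument(fis);
    XWPFWordExtractor extractor = new XWPFWordExtractor(xdoc);
    buffer = extractor.getText();
}
System.out.println(buffer);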
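To make it clearer what kind of data the extractor works on: a .docx file is a ZIP archive whose main body is stored as XML in the entry word/document.xml, with visible text inside w:t elements grouped under w:p paragraphs. The following is a minimal sketch of reading that text with the JDK alone. It is not how POI is implemented and is no substitute for it: it ignores headers, footers, footnotes, tabs, and line breaks. The class name DocxTextSketch is made up for illustration, and Java 9 or later is assumed for readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
final class DocxTextSketch {
    // Namespace of the w: prefix in WordprocessingML.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one line per paragraph. */
    static String extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return textOf(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found");
    }

    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the file may come from an untrusted source.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // Concatenate every text run that belongs to this paragraph.
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling DocxTextSketch.extract(new FileInputStream(dest)) would return the paragraphs of the document body separated by newlines. For real documents the POI extractor above remains the more complete choice.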
3. rtf parsing
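// RTFEditorKit is javax.swing.text.rtf.RTFEditorKit; DefaultStyledDocument and BadLocationException
// are in javax.swing.text (all part of the JDK); dest is the path of the .rtf file
String result = null;
// Open the file input stream; try-with-resources closes it afterwards
try (InputStream streamReader = new FileInputStream(dest)) {
    DefaultStyledDocument styledDoc = new DefaultStyledDocument();
    new RTFEditorKit().read(streamReader, styledDoc, 0);
    // Get the text's bytes using ISO-8859-1, then build the string by decoding those bytes as GBK
    result = new String(styledDoc.getText(0, styledDoc.getLength()).getBytes("ISO8859-1"), "GBK");
} catch (IOException e) {
    e.printStackTrace();
} catch (BadLocationException e) {
    e.printStackTrace();
}
System.out.println(result);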
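The ISO-8859-1 to GBK step above can look odd at first, so here is a small sketch that separates it from the RTF reading itself. It assumes, as the snippet above does, that the text returned by RTFEditorKit has each original byte mapped to the character with the same code (which is what ISO-8859-1 decoding does); whether that holds depends on the JDK version and on the file, so it is worth checking the plain output before applying the second step. The class and method names (RtfTextSketch, extract, redecode) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper, for illustration only.
final class RtfTextSketch {

    /** Reads an RTF stream and returns its plain text as the editor kit produced it. */
    static String extract(InputStream rtf) throws IOException, BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        new RTFEditorKit().read(rtf, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    /**
     * Undoes a wrong ISO-8859-1 decoding: turns each char back into the byte
     * it came from, then decodes those bytes with the charset the text was
     * really written in (GBK in the snippet above).
     */
    static String redecode(String text, Charset actual) {
        byte[] originalBytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, actual);
    }
}
```

With these helpers the snippet above amounts to RtfTextSketch.redecode(RtfTextSketch.extract(in), Charset.forName("GBK")). The round trip is lossless only because ISO-8859-1 maps every byte value to exactly one character; the same trick would not be safe with an intermediate charset that has unmapped bytes.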
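This article shows how to parse DOC, DOCX, and RTF documents with Java code, covering how to extract text using specific libraries. It applies to scenarios where content needs to be read from files in these formats.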
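Because each of the three formats needs a different parser, the caller has to decide which one to use, and the file extension is not always reliable. One possible approach, sketched below, is to look at the first bytes of the file: .doc files are OLE2 containers, .docx files are ZIP archives, and RTF files start with the text {\rtf. The check is only a hint, since other Office formats share the OLE2 and ZIP signatures. The names WordFormatSniffer, Format, and sniff are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class WordFormatSniffer {

    /** Container-level guess; OLE2 and ZIP are also used by other Office formats. */
    enum Format { OLE2_DOC, ZIP_DOCX, RTF, UNKNOWN }

    // Signature at the start of every OLE2 compound file (.doc, but also .xls, .ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature of a ZIP archive (.docx, but also .xlsx, .jar).
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    // RTF files begin with this ASCII text.
    private static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    /** Guesses the format from the leading bytes of a file (8 bytes are enough). */
    static Format sniff(byte[] head) {
        if (startsWith(head, OLE2)) return Format.OLE2_DOC;
        if (startsWith(head, ZIP)) return Format.ZIP_DOCX;
        if (startsWith(head, RTF)) return Format.RTF;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller could read the first 8 bytes of dest, pass them to sniff, and then choose between the WordExtractor, XWPFWordExtractor, and RTFEditorKit code shown earlier.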