The approach is simple: use the well-known Runtime.getRuntime() to run the external pdftotext tool, as follows:

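import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Extracts the text of a PDF by running the external pdftotext tool.
 *
 * @param filePath path to the PDF file
 * @return the extracted text, or an empty string if pdftotext could not be run
 */
public String getPdfContent(String filePath) {
    String execute = "pdftotext";

    // "-enc UTF-8" sets the output encoding, "-q" suppresses messages,
    // and the trailing "-" sends the text to standard output.
    String[] cmd = new String[] {execute, "-enc", "UTF-8", "-q", filePath, "-"};
    StringBuilder sb = new StringBuilder();

    try {
        Process p = Runtime.getRuntime().exec(cmd);

        // Read the tool's standard output as UTF-8; the reader is closed automatically.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Join the lines with a single space.
                sb.append(line);
                sb.append(" ");
            }
        }

        // Wait for pdftotext to exit so the process is not left behind.
        p.waitFor();
    } catch (IOException e) {
        // Covers both a failed launch and a failed read; the original
        // dereferenced a null Process when exec() threw.
        e.printStackTrace();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    return sb.toString();
}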

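PDF-to-text method
This article describes how to convert a PDF file to plain text by running an external command from Java. Runtime.getRuntime().exec(cmd) launches the pdftotext command-line tool, and the tool's output is read and decoded into a string as UTF-8.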
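
The same idea can also be sketched with ProcessBuilder, which makes it easier to check the exit code and to keep the tool's error output from filling up an unread pipe. This is only a minimal sketch, not the article's original method: the class name PdfTextExtractor and the helpers buildCommand and readAll are hypothetical names chosen for illustration, and it assumes pdftotext is installed and on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name is an assumption for this sketch.
class PdfTextExtractor {

    /** Builds the same pdftotext argument list that the article uses. */
    static List<String> buildCommand(String filePath) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8", "-q", filePath, "-");
    }

    /** Reads a stream as UTF-8 and joins its lines with a single space, as the article does. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(' ');
            }
        }
        return sb.toString();
    }

    /** Runs pdftotext on the given file and returns its text, failing loudly on errors. */
    static String getPdfContent(String filePath) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(filePath));
        // Forward the tool's stderr to this process's stderr so it is never left unread.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();

        // Blocks until pdftotext closes its standard output.
        String text = readAll(p.getInputStream());

        int exitCode = p.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with code " + exitCode);
        }
        return text;
    }
}
```

Calling PdfTextExtractor.getPdfContent("some.pdf") would then either return the text or throw, instead of silently returning an empty string; whether that is preferable depends on how the caller wants to handle a missing tool or a broken PDF.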