Parsing CSV and PDF Files

Latest recommended article published on 2025-09-07 20:32:33

weixin_30847939

License: CC 4.0 BY-SA

Tags: java

Original link: http://www.cnblogs.com/dreammyone/p/9934628.html

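This article describes a way to parse the data in a CSV file into a List collection. It then shows how to add the jar needed for PDF parsing as a Maven dependency, with sample code that reads a PDF file and extracts its text.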

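import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.springframework.web.multipart.MultipartFile;

/**
 * Parses a CSV file into a list.
 * Each cell is a String, each row is a list of cells,
 * and all rows are collected into one outer list.
 *
 * @param file the uploaded CSV file
 * @return one {@code List<String>} per row
 * @throws IOException if the uploaded file cannot be read
 */
public static List<List<String>> importCsv(MultipartFile file) throws IOException {
    List<List<String>> dataList = new ArrayList<>();
    // Compile once: optional quoted chunks, then everything up to the next comma.
    Pattern pCells = Pattern.compile("(\"[^\"]*(\"{2})*[^\"]*\")*[^,]*,");
    // try-with-resources closes the reader (and the wrapped stream) automatically.
    // Assumes UTF-8; pass another charset if the file is encoded differently (e.g. GBK).
    try (BufferedReader brReader = new BufferedReader(
            new InputStreamReader(file.getInputStream(), StandardCharsets.UTF_8))) {
        String rec; // one row
        while ((rec = brReader.readLine()) != null) {
            // Append a comma so the last cell, which has no trailing comma, is matched too.
            Matcher mCells = pCells.matcher(rec + ",");
            List<String> cells = new ArrayList<>(); // one list per row
            // Read each cell
            while (mCells.find()) {
                String str = mCells.group(); // one cell, including its trailing comma
                str = str.substring(0, str.length() - 1);
                // Strip the enclosing quotes and unescape doubled quotes ("" -> ").
                if (str.length() >= 2 && str.startsWith("\"") && str.endsWith("\"")) {
                    str = str.substring(1, str.length() - 1).replace("\"\"", "\"");
                }
                cells.add(str);
            }
            dataList.add(cells);
        }
    }
    return dataList;
}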
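
To see how the cell-matching regex behaves on a single row, the per-line step can be isolated into a plain JDK sketch that needs no Spring upload. The class name CsvLineDemo and the method parseLine are hypothetical names used only for illustration. The sketch reuses the cell pattern from importCsv, and it assumes that appending a comma to the row is acceptable so the last field is matched as well. It unquotes each cell with plain string operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvLineDemo {
    // Same cell pattern as in importCsv: optional quoted chunks, then text up to the next comma.
    private static final Pattern CELL =
            Pattern.compile("(\"[^\"]*(\"{2})*[^\"]*\")*[^,]*,");

    // Hypothetical helper: splits one CSV row (no embedded line breaks) into cells.
    static List<String> parseLine(String line) {
        List<String> cells = new ArrayList<>();
        // Assumption: append a comma so the final field also ends with a delimiter.
        Matcher m = CELL.matcher(line + ",");
        while (m.find()) {
            String cell = m.group();
            cell = cell.substring(0, cell.length() - 1); // drop the trailing comma
            // Strip enclosing quotes and turn doubled quotes back into single ones.
            if (cell.length() >= 2 && cell.startsWith("\"") && cell.endsWith("\"")) {
                cell = cell.substring(1, cell.length() - 1).replace("\"\"", "\"");
            }
            cells.add(cell);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Raw row: 1,"Smith, John","say ""hi"""
        for (String cell : parseLine("1,\"Smith, John\",\"say \"\"hi\"\"\"")) {
            System.out.println("[" + cell + "]");
        }
    }
}
```

Note that this sketch, like importCsv, works line by line, so a quoted field that contains a line break is not supported; a dedicated CSV library is probably the safer choice for such files.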

Parsing a PDF file
Add the required jar as a Maven dependency:
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.6</version>
</dependency>
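// Demo: extract the text of a PDF and print it line by line (PDFBox 2.x API)
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;
import org.apache.pdfbox.text.PDFTextStripper;

public static void main(String[] args) {
    // try-with-resources closes the document when done
    try (PDDocument document = PDDocument.load(new File("path/to/file.pdf"))) {
        if (!document.isEncrypted()) {
            PDFTextStripper tStripper = new PDFTextStripper();
            // Order the text by its position on the page rather than by stream order
            tStripper.setSortByPosition(true);
            String pdfFileInText = tStripper.getText(document);
            // Split on both Windows and Unix line endings
            String[] lines = pdfFileInText.split("\\r?\\n");
            for (String line : lines) {
                System.out.println(line);
            }
        }
    } catch (InvalidPasswordException e) {
        // The PDF requires a password that was not supplied
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}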
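
The line-splitting step of the demo does not depend on PDFBox, so it can be tried with a plain string standing in for the extracted text. The following is a minimal sketch under that assumption: the class PdfTextLines and the method nonBlankLines are hypothetical names, and trimming each line and dropping blank ones are additions that the original demo does not perform.

```java
import java.util.ArrayList;
import java.util.List;

class PdfTextLines {
    // Hypothetical helper: splits extracted text into trimmed, non-blank lines.
    static List<String> nonBlankLines(String text) {
        List<String> result = new ArrayList<>();
        // Same separator regex as the demo: handles \r\n and \n.
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by PDFTextStripper.getText(document).
        String sample = "Invoice 42\r\n\r\n  Total: 10.00  \nThank you";
        for (String line : nonBlankLines(sample)) {
            System.out.println(line);
        }
    }
}
```

In the demo, the same helper could be applied to pdfFileInText before printing, which may be convenient when the PDF layout produces many empty lines.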

Reposted from: https://www.cnblogs.com/dreammyone/p/9934628.html