Getting all HTML files in a directory

Original post, published 2017-08-22 11:26:12 · License: CC 4.0 BY-SA

Tags: #jsoup #java #file-traversal #html-files


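This post describes a Java program that walks every HTML file under a given path and parses each one with the Jsoup library. The program descends the folder tree recursively, picks out the HTML files, and reads their contents.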


I needed to parse some HTML files, so I had to walk through every directory and collect all the HTML files in it.

The method:

import java.io.File;
import java.io.IOException;
import java.util.Locale;

// Recursively walks `path` and hands every .htm/.html file to GetHtml
private static void GetFile(String path) {
    File[] tempList = new File(path).listFiles();
    // listFiles() returns null if path is not a directory or cannot be read
    if (tempList == null) {
        return;
    }
    for (File f : tempList) {
        if (f.isFile()) {
            String name = f.getName().toLowerCase(Locale.ROOT);
            // endsWith("htm") alone misses ".html" files, so check both extensions
            if (name.endsWith(".htm") || name.endsWith(".html")) {
                System.out.println("Entering file: " + f);
                try {
                    GetHtml(f.toString());
                } catch (IOException e) {
                    e.printStackTrace();
                }
                System.out.println("Leaving file: " + f);
            }
        } else if (f.isDirectory()) {
            // Descend into the subdirectory
            GetFile(f.toString());
        }
    }
}
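On Java 8 or later, the recursion can also be delegated to `java.nio.file.Files.walk`. The following is a minimal sketch of that alternative, not the author's code: the class name `HtmlFileFinder`, the helper `isHtml`, and the default root `"."` are hypothetical, and the extension check (`.htm`/`.html`, case-insensitive) is an assumption about which files count as HTML.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class: collects HTML files using java.nio instead of java.io.File
class HtmlFileFinder {

    // True for file names ending in .htm or .html, ignoring case
    static boolean isHtml(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".htm") || name.endsWith(".html");
    }

    // Recursively collects every regular .htm/.html file under root, in sorted order
    static List<Path> findHtmlFiles(Path root) throws IOException {
        // Files.walk is lazy and keeps directory handles open,
        // so the stream is closed with try-with-resources
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)     // skip directories before reading names
                    .filter(HtmlFileFinder::isHtml)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: use the current directory when no argument is given
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findHtmlFiles(root)) {
            System.out.println(p);
        }
    }
}
```

Usage: `java HtmlFileFinder /path/to/site`, then pass each printed path to `GetHtml`. `Files.walk` does not follow symbolic links unless `FileVisitOption.FOLLOW_LINKS` is passed, so a link pointing back at a parent directory cannot send it into endless recursion; the `File.isDirectory()` recursion above does follow such links.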

Next, Jsoup parses each file:

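import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Parses a single HTML file with Jsoup
private static void GetHtml(String filename) throws IOException {
    File input = new File(filename);
    // Decodes the file as ISO-8859-1; passing null instead lets Jsoup take the
    // charset from the file's <meta> tag, falling back to UTF-8
    Document doc = Jsoup.parse(input, "ISO-8859-1", "");

    // ...
    // ...
}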