Java Web Crawler: Source Code for a Web Page Email Address Collector

License: CC 4.0 BY-SA

Original link: http://blog.51cto.com/byjth/1357846

Tags: #java #crawler

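This article shares the source code of a small Java web crawler that fetches a specified website and extracts the email addresses it contains. The program opens a URL connection, reads the page content line by line, and matches each line against a regular expression, printing every address it finds.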
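Before looking at the full crawler, it may help to see the regular-expression step in isolation. The sketch below applies the same pattern the article uses to an in-memory string instead of a live page. The class name `EmailRegexDemo`, the helper `findEmails`, and the sample addresses are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original source.
class EmailRegexDemo {
    // Same pattern as the article: word characters, '@', word characters,
    // then one or more ".word" groups.
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Returns every substring of text that matches the pattern, in order.
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample input.
        String sample = "Contact: alice@example.com, bob@mail.example.org";
        System.out.println(findEmails(sample));
    }
}
```

Note that `\w` covers only letters, digits, and the underscore, so this pattern does not match dots or hyphens in the part before the `@`. For an address such as `first.last@example.com` it reports only `last@example.com`.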

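import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YouXiangHuoQu {
    public static void main(String[] args) throws Exception {
        getMail();
    }

    // Fetches the page and prints every email address found, one per line.
    public static void getMail() throws Exception {
        URL url = new URL("http://www.byjth.com"); // web page address
        URLConnection conn = url.openConnection();
        String mailreg = "\\w+@\\w+(\\.\\w+)+"; // regular expression for email addresses
        Pattern p = Pattern.compile(mailreg);
        // try-with-resources closes the reader and the underlying stream when done.
        // UTF-8 is an assumption about the page encoding; adjust it if the site differs.
        try (BufferedReader bufin = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = bufin.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        }
    }
}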
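
The crawler above prints an address every time it appears, so a page that repeats the same contact address produces duplicate output. One possible refinement, sketched here under assumptions, is to separate the matching loop from the network code and collect results into a set. The class name `MailCollector` and the method `collect` are hypothetical. The method accepts any `Reader`, so it can be tried on an in-memory string without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the article's loop, not part of the original source.
class MailCollector {
    // Same pattern as the article.
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Reads the source line by line and returns each distinct match
    // in the order it was first seen.
    static Set<String> collect(Reader source) throws IOException {
        Set<String> mails = new LinkedHashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = MAIL.matcher(line);
                while (m.find()) {
                    mails.add(m.group()); // duplicates are ignored by the set
                }
            }
        }
        return mails;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample markup standing in for a downloaded page.
        String html = "<p>a@example.com</p>\n<p>b@example.org a@example.com</p>";
        System.out.println(collect(new StringReader(html)));
    }
}
```

To use this with a live page, the `InputStreamReader` built from `conn.getInputStream()` in `getMail()` could be passed to `collect` in place of the `StringReader`.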