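Earlier we crawled web pages with Python. This time we do the same thing in Java, so the two can be compared. Personally, I find Python more concise and clean, and easier to work with. The Java version below uses the HTMLParser library (org.htmlparser).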
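import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.HtmlPage;

public class Crawler { // class name is illustrative
    public static void main(String[] args) {
        NodeList rt = getNodeList("http://www.ip138.com:8080/search.asp");
        // getNodeList returns null if parsing failed
        if (rt != null) {
            System.out.println(rt.toHtml());
        }
    }

    // Fetch the page at url and return the nodes inside its <body>
    public static NodeList getNodeList(String url) {
        try {
            Parser parser = new Parser(url);
            parser.setEncoding("UTF-8");
            // HtmlPage is a visitor that collects the title, body and tables
            HtmlPage visitor = new HtmlPage(parser);
            parser.visitAllNodesWith(visitor);
            return visitor.getBody();
        } catch (ParserException e) {
            e.printStackTrace();
            // visitor may never have been created, so don't touch it here
            return null;
        }
    }
}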
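For comparison, a rough equivalent can be put together with the JDK alone, without HTMLParser. This is a minimal sketch under a few assumptions: it needs Java 11+ (for `java.net.http.HttpClient`), the class name `BodyFetcher` and the helpers `fetch` and `extractBody` are hypothetical, and the body is cut out with a case-insensitive regular expression rather than a real HTML parser, so it can misbehave on malformed pages or on a literal `<body` inside comments or scripts.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only sketch: fetch a page and cut out the contents of its <body>.
public class BodyFetcher {

    // From the first <body ...> tag to the last </body>; DOTALL lets "." span newlines.
    private static final Pattern BODY = Pattern.compile(
            "<body\\b[^>]*>(.*)</body\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Download the page as a String, always decoding it as UTF-8
    // (the same assumption as setEncoding("UTF-8") in the HTMLParser version).
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        return response.body();
    }

    // Returns the text between <body ...> and </body>, or null if no body element is found.
    public static String extractBody(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        String html = fetch("http://www.ip138.com:8080/search.asp");
        String body = extractBody(html);
        System.out.println(body != null ? body : "(no <body> found)");
    }
}
```

Unlike `HtmlPage.getBody()`, this returns the body as a raw string instead of a `NodeList`, so pulling out links or tables would still call for a proper parser.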
Output:
