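Because I need to analyze page data and my regex skills are honestly weak, I chose jsoup as the tool for parsing pages and processing their data. The documentation first: http://www.open-open.com/jsoup/parsing-a-document.htm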
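The linked page covers turning raw HTML into a jsoup Document. A minimal sketch, assuming the HTML is already in hand as a string: the sample markup and the class name ParseDemo are made up so the example runs offline. In a real crawler the string would be the fetched page source.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseDemo {
    // Hypothetical sample markup standing in for a fetched page
    static final String SAMPLE_HTML =
            "<html><head><title>Listing</title></head>"
            + "<body><p class='location'>Beijing</p></body></html>";

    // Parse raw HTML into a jsoup Document (no network access needed)
    static Document parse(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE_HTML);
        System.out.println(doc.title());                                // Listing
        System.out.println(doc.getElementsByClass("location").text()); // Beijing
    }
}
```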
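import java.io.IOException;
import java.net.MalformedURLException;

import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// HtmlUnit 2.x package name
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;

// main() lives in the same class as url and getDetail(String), which are part of
// the rest of the program and not shown here; getDetail returns the page as a jsoup Document.
public static void main(String[] args) throws InterruptedException {
    try {
        Document document = getDetail(url);
        System.out.println("url====================" + url);
        if (document == null || document.toString().length() == 0) {
            System.out.println("Failed to fetch the page");
            return;
        }
        System.out.println("Fetched page: " + document.toString());
        // The class name in this call is missing from the original post; substitute the real one
        Elements divs = document.getElementsByClass("CLASS_NAME");
        // getElementsByClass() is meant for a single class name; a space-separated string only
        // matches in some jsoup versions and only if the attribute is exactly that string.
        // A CSS selector requiring all three classes (in any order) is more reliable.
        Elements prices = document.select(".price.g_price.g_price-highlight");
        // Exact match against the whole class attribute value
        Elements icons = document.getElementsByAttributeValue("class", "icons");
        Elements addrs2 = document.getElementsByAttributeValue("class", "location");
    } catch (FailingHttpStatusCodeException e) {
        // The server answered with an HTTP error status
        e.printStackTrace();
    } catch (MalformedURLException e) {
        // url is not a valid URL
        e.printStackTrace();
    } catch (IOException e) {
        // Network or I/O failure while fetching the page
        e.printStackTrace();
    }
}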
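The code mixes getElementsByClass and getElementsByAttributeValue("class", ...), which do not match the same elements. A small sketch on made-up markup (the class name ClassMatchDemo and the HTML are hypothetical) shows the difference, plus a compound CSS class selector for elements that carry several classes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ClassMatchDemo {
    // Hypothetical markup: class order and extra classes vary between elements
    static final String HTML = "<div>"
            + "<span class='g_price price g_price-highlight'>99</span>"
            + "<span class='price'>120</span>"
            + "<i class='icons'>a</i>"
            + "<i class='icons hot'>b</i>"
            + "</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // Compound class selector: the element must carry all three classes, in any order
        System.out.println(doc.select(".price.g_price.g_price-highlight").size()); // 1
        // getElementsByClass matches any element whose class list contains "icons"
        System.out.println(doc.getElementsByClass("icons").size()); // 2
        // getElementsByAttributeValue compares the whole class attribute value
        System.out.println(doc.getElementsByAttributeValue("class", "icons").size()); // 1
    }
}
```

In short, getElementsByAttributeValue("class", "icons") silently skips elements that have an extra class, so getElementsByClass or select() is usually the safer choice.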
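The simplest approach is to look up the class names in the page source and call document.getElementsByClass("className") to get the matching elements; if there are several, loop over them and print each one.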
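A minimal sketch of that loop, assuming the page has already been parsed into a Document. The markup, the helper textsByClass and the class name ClassLoopDemo are hypothetical; the class name "location" is borrowed from the code above.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ClassLoopDemo {
    // Collect the text of every element that carries the given class
    static List<String> textsByClass(Document doc, String className) {
        List<String> texts = new ArrayList<>();
        Elements elements = doc.getElementsByClass(className);
        for (Element el : elements) {
            texts.add(el.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical listing markup
        String html = "<ul><li class='location'>Haidian</li>"
                + "<li class='location'>Chaoyang</li><li>other</li></ul>";
        Document doc = Jsoup.parse(html);
        for (String text : textsByClass(doc, "location")) {
            System.out.println(text); // Haidian, then Chaoyang
        }
    }
}
```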