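Today I used NekoHtml to parse an HTML page into a DOM and then tried to select nodes with XPath, but it never worked: the result set was always empty. After a long search I found the cause: the tag names in the XPath expression must be written in upper case. XPath name tests are case-sensitive, and the DOM that NekoHtml built here evidently holds upper-case element names. Sample code: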
Java:
NekoHtml + XPath
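import javax.xml.transform.TransformerException;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.NodeList;

// dom is the document that NekoHtml parsed from the HTML page.
// Tag names in the expression must be in upper case.
String productsXpath = "//TABLE[@id='product-list']//DIV[1]/B/A";
NodeList products;
try {
    products = XPathAPI.selectNodeList(dom, productsXpath);
    System.out.println("found: " + products.getLength());
} catch (TransformerException e) {
    e.printStackTrace();
}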
Note this line in the code above: String productsXpath = "//TABLE[@id='product-list']//DIV[1]/B/A";
If it is changed to String productsXpath = "//table[@id='product-list']//div[1]/b/a"; the expected nodes are not found.
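The difference can be reproduced without NekoHtml. The following is only a minimal sketch under stated assumptions: it builds a tiny DOM by hand with upper-case element names as a stand-in for the parsed page, and it uses the JDK's javax.xml.xpath API instead of XPathAPI so that it runs with nothing but the JDK. The names XPathCaseDemo, buildDom and count are made up for this illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XPathCaseDemo {

    // Hypothetical stand-in for the NekoHtml result: a hand-built DOM
    // whose element names are upper case (TABLE > DIV > B > A).
    static Document buildDom() throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element table = dom.createElement("TABLE");
        table.setAttribute("id", "product-list");
        Element div = dom.createElement("DIV");
        Element b = dom.createElement("B");
        Element a = dom.createElement("A");
        a.setTextContent("product");
        b.appendChild(a);
        div.appendChild(b);
        table.appendChild(div);
        dom.appendChild(table);
        return dom;
    }

    // Evaluates the expression against the document and returns
    // how many nodes it selected.
    static int count(Document dom, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
                (NodeList) xpath.evaluate(expr, dom, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document dom = buildDom();
        // Same two expressions as in the post; only the case differs.
        System.out.println("upper: "
                + count(dom, "//TABLE[@id='product-list']//DIV[1]/B/A"));
        System.out.println("lower: "
                + count(dom, "//table[@id='product-list']//div[1]/b/a"));
    }
}
```

In this sketch only the upper-case expression selects the A element. NekoHtml's documentation also describes a property, http://cyberneko.org/html/properties/names/elems, that controls the case of element names; whether setting it changes what XPath sees may depend on the DOM implementation in use, so treat that as something to verify rather than a guaranteed fix.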
Today I wanted to scrape content from JavaEye with XPath, and it turned out I couldn't. Frustrating!