A Great HTML Parser-Html Agility Pack

HtmlAgilityPack解析库

What is exactly the Html Agility Pack (HAP)?

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).

Html Agility Pack now supports Linq to Objects (via a LINQ to Xml Like interface). Check out the new beta to play with this feature

Sample applications:

  • Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.
  • Web scanners. You can easily get to img/src or a/hrefs with a bunch XPATH queries.
  • Web scrapers. You can easily scrap any existing web page into an RSS feed for example, with just an XSLT file serving as the binding. An example of this is provided.


There is no dependency on anything else than .Net's XPATH implementation. There is no dependency on Internet Explorer's MSHTML dll or W3C's HTML tidy or ActiveX / COM object, or anything like that. There is also no adherence to XHTML or XML, although you can actually produce XML using the tool. The version posted here on CodePlex is for the .NET Framework 2.0. If you need the old version, please go to the old page or drop me a note.

Examples - Code Examples

 

The home entry of the open source project is http://htmlagilitypack.codeplex.com/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值