Extracting body text from HTML in C++: how do I get the contents of an html or shtml file with C/C++?

There should be open-source code you can borrow from for parsing HTML. I did a quick search and found the following:

Steev's HTML Parser

Steev's HTML Parser is an HTML parsing library that builds a complete hierarchy for each element and attribute in the supplied HTML file. Each element is its own C++ class, replete with child nodes, allowing for full control and processing. An 'HTML beautifier' example is included.

URL: http://freshmeat.net/projects/steevshtmlparser/

htmlcxx

htmlcxx is a simple non-validating CSS1 and HTML parser for C++. The parsing politics attempt to mimic the behavior of Mozilla Firefox, so you should expect parse trees similar to those created by Firefox. However, it does not insert nonexistent stuff in your HTML. Therefore, serializing the DOM tree gives exactly the same output as the original HTML document. Another key feature is an STL-like tree navigation API provided by the tree.hh template library.

URL: http://freshmeat.net/projects/htmlcxx/

Xport toolkit

Xport is a C++ template class library that can be included in any C++ project to enable the creation and generation of XHTML documents. Although it was developed with the idea of creating XHTML documents for reporting purposes, Xport can be used to create XHTML documents for many other uses as well. It can easily generate and parse (X)HTML documents and stylesheets. It is intuitive to use, and allows many options for parsing and generating documents.

URL: http://freshmeat.net/projects/xporttoolkit/

I found these by searching freshmeat for the keywords "html parse". All three are open source, and the last one looks quite capable. Hope this helps.
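
For a rough idea of what "getting the content" involves before pulling in one of these libraries, here is a minimal standard-library sketch. It is not how any of the three parsers above work. The names read_file and strip_tags are made up for illustration, and the approach is deliberately naive: it assumes '>' never appears inside an attribute value, does not decode entities such as &amp;, and inserts no spacing where tags are removed.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file (e.g. an .html or .shtml file) into a string.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// True if `s` contains `word` at `pos`, ignoring case. `word` must be lowercase.
bool starts_with_ci(const std::string& s, std::size_t pos, const std::string& word) {
    if (pos + word.size() > s.size()) return false;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(s[pos + i])) != word[i]) return false;
    }
    return true;
}

// Case-insensitive search for a lowercase `needle`, starting at `from`.
std::size_t find_ci(const std::string& s, const std::string& needle, std::size_t from) {
    for (std::size_t p = from; p + needle.size() <= s.size(); ++p) {
        if (starts_with_ci(s, p, needle)) return p;
    }
    return std::string::npos;
}

// Hypothetical helper: drop tags, comments, and the contents of script/style elements.
std::string strip_tags(const std::string& html) {
    static const std::string raw_elements[] = {"script", "style"};
    const std::size_t npos = std::string::npos;
    std::string out;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {            // ordinary text: keep it
            out += html[i++];
            continue;
        }
        if (html.compare(i, 4, "<!--") == 0) {   // comment: skip to the closing "-->"
            std::size_t end = html.find("-->", i + 4);
            i = (end == npos) ? html.size() : end + 3;
            continue;
        }
        for (const std::string& name : raw_elements) {
            std::size_t after = i + 1 + name.size();
            bool boundary = after >= html.size() ||
                            !std::isalnum(static_cast<unsigned char>(html[after]));
            if (boundary && starts_with_ci(html, i + 1, name)) {
                // Jump to the matching closing tag; the generic skip below consumes it.
                std::size_t close = find_ci(html, "</" + name, after);
                i = (close == npos) ? html.size() : close;
                break;
            }
        }
        std::size_t end = html.find('>', i);     // generic tag: skip up to the next '>'
        i = (end == npos) ? html.size() : end + 1;
    }
    return out;
}
```

Calling strip_tags(read_file("index.shtml")) would return the visible text of that file (the filename is only an example). Note that server-side include directives in an shtml file are written as HTML comments, so this sketch simply discards them. For anything beyond quick experiments, a real parser such as the ones listed above is the safer choice.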
