Getting Started with HtmlParser

Introduction:

HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.

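In short, it is a library for parsing HTML with two main purposes: (1) transformation (converting a page into simpler HTML) and (2) extraction (pulling resources out of web pages).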


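To use the library, add either htmllexer.jar or htmlparser.jar to your project. The former is the lower-level, lightweight parser; the latter is built on top of it and offers more.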

  • When to use htmllexer: If your application requires only modest structural knowledge of the page, and is primarily concerned with individual, isolated nodes, you should consider using the lightweight lexer.
  • When to use htmlparser: But if your application requires knowledge of the nested structure of the page, for example processing tables, you will probably want to use the full parser.
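
To make the "isolated nodes" versus "nested structure" distinction above concrete, here is a minimal sketch in plain JDK code. It does not use the HtmlParser API at all: the class `LexerVsParserSketch` and its methods `tokenize`, `parse`, and `print` are hypothetical names invented for illustration, and the toy parser assumes balanced tags with no void elements such as `<br>`. The first method gives a flat, lexer-style list of tokens; the second builds a tree, which is roughly what you need once you care about which cells belong to which table row.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration in plain JDK code. None of these names come from HtmlParser.
class LexerVsParserSketch {

    // Either a tag such as <td> or </td> (groups 1 and 2 are set), or a run of text.
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+");

    // Linear view: every tag and text run in document order, with no nesting.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String token = m.group().trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // One node of the nested view: a tag name (or text) plus its children.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Nested view: a stack tracks the innermost tag that is still open.
    // Assumption: tags are balanced and there are no void elements.
    static Node parse(String html) {
        Node root = new Node("#root");
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(2) == null) {
                // Text run: attach it to the innermost open element.
                String text = m.group().trim();
                if (!text.isEmpty()) {
                    open.peek().children.add(new Node("#text " + text));
                }
            } else if (m.group(1).isEmpty()) {
                // Opening tag: becomes a child of the current element, then the new current.
                Node element = new Node(m.group(2).toLowerCase());
                open.peek().children.add(element);
                open.push(element);
            } else if (open.size() > 1) {
                // Closing tag: step back out to the parent (never pop the root).
                open.pop();
            }
        }
        return root;
    }

    // Prints the tree with two spaces of indentation per nesting level.
    static void print(Node node, int depth) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        System.out.println(indent + node.name);
        for (Node child : node.children) {
            print(child, depth + 1);
        }
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>a</td><td>b</td></tr></table>";
        System.out.println(tokenize(html)); // flat token list
        print(parse(html), 0);              // indented tree
    }
}
```

With the flat list you can easily pick out every `<td>` token, but you cannot tell which row it belongs to without extra bookkeeping. The tree keeps that relationship, at the cost of building and holding the whole structure, which matches the trade-off described in the two bullets above.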

