Parsing HTML with HTML Agility Pack and ScrapySharp

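This article introduces two .NET libraries, HtmlAgilityPack 1.8.0 and ScrapySharp 2.6.2. HtmlAgilityPack parses HTML documents and supports XPath and XSLT. ScrapySharp builds on HtmlAgilityPack and adds element selection with CSS selectors. The sections below show how to install each library and use it to scrape pages and extract data.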


HtmlAgilityPack 1.8.0

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it). It is a .NET code library that lets you parse HTML files as they are found on the web. The parser is very tolerant of malformed "real world" HTML. The object model is very similar to the one System.Xml offers, but for HTML documents (or streams).

PM> Install-Package HtmlAgilityPack -Version 1.8.0

// html is a string holding the page markup, obtained elsewhere.
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
HtmlAgilityPack.HtmlNode rootNode = doc.DocumentNode;
// SelectSingleNode returns null when the XPath expression matches nothing.
HtmlAgilityPack.HtmlNode row = rootNode.SelectSingleNode("//*[@id='content']/div[3]/div[1]");
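
To make the XPath lookup concrete, here is a minimal, self-contained sketch. The class name XPathSketch, the helper FirstText, and the sample markup are hypothetical and exist only for illustration; the HtmlAgilityPack calls used are HtmlDocument.LoadHtml, DocumentNode, and SelectSingleNode. It assumes the HtmlAgilityPack package is installed.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathSketch
{
    // Hypothetical helper: returns the trimmed inner text of the first node
    // matching the XPath expression, or null when nothing matches.
    public static string FirstText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode yields null for "no match", so guard before use.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText.Trim();
    }

    public static void Main()
    {
        // Sample markup made up for this sketch.
        const string html =
            "<div id='content'><div>first</div><div>second</div></div>";

        // XPath positions are 1-based, so div[2] is the second child div.
        Console.WriteLine(FirstText(html, "//*[@id='content']/div[2]") ?? "(no match)");
        Console.WriteLine(FirstText(html, "//*[@id='missing']") ?? "(no match)");
    }
}
```

Returning null for a missing node keeps the caller in charge of the fallback. A positional path such as div[3]/div[1] is likely to stop matching when the page layout changes, so the null check matters in practice.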


ScrapySharp 2.6.2

A scraping framework containing:
- a web client able to simulate a web browser
- an HtmlAgilityPack extension to select elements using CSS selectors (like jQuery)


PM> Install-Package ScrapySharp -Version 2.6.2

   

// html is an HtmlAgilityPack.HtmlNode (for example doc.DocumentNode);
// CssSelect is an extension method from the ScrapySharp.Extensions namespace.
html.CssSelect("div");                       // all div elements
html.CssSelect("div.content");               // all div elements with CSS class 'content'
html.CssSelect("div.widget.monthlist");      // all div elements with both CSS classes
html.CssSelect("#postPaging");               // all elements with the id 'postPaging'
html.CssSelect("div#postPaging.testClass");  // all div elements with the id 'postPaging' and CSS class 'testClass'
html.CssSelect("div.content > p.para");      // p elements that are direct children of div elements with CSS class 'content'
html.CssSelect("input[type = text].login");  // text inputs with CSS class 'login'
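
The selectors above can be tried against an in-memory document. The following is a minimal sketch, assuming both HtmlAgilityPack and ScrapySharp are installed. The class name CssSelectSketch, the helper ParagraphTexts, and the sample markup are hypothetical; CssSelect itself is the ScrapySharp extension method on HtmlNode.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

public static class CssSelectSketch
{
    // Hypothetical helper: returns the text of every p.para element that is
    // a direct child of a div with the CSS class 'content'.
    public static string[] ParagraphTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // CssSelect returns a sequence of HtmlNode, so LINQ applies directly.
        return doc.DocumentNode
                  .CssSelect("div.content > p.para")
                  .Select(p => p.InnerText.Trim())
                  .ToArray();
    }

    public static void Main()
    {
        // Sample markup made up for this sketch.
        const string html =
            "<div class='content'>" +
            "<p class='para'>one</p>" +
            "<p>no class</p>" +
            "<p class='para'>two</p>" +
            "</div>";

        foreach (string text in ParagraphTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Because the result is an ordinary sequence, an empty match simply produces an empty array, so no null check is needed here, unlike the XPath methods of HtmlAgilityPack.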

For more on CSS selector usage, see the W3 page: CSS Selector Reference (CSS 选择器参考手册).


