Parsing HTML in C# with HtmlParser

Latest recommended article published on 2025-02-26 09:19:03

黑鸦log

Category: c#  Tags: web crawler

1. Required package

Winista.Text.HtmlParser
Install it from NuGet.

2. Usage

Load the HTML as a string, parse it, and filter for the tags you need:

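// Namespaces used by this snippet
using Winista.Text.HtmlParser;
using Winista.Text.HtmlParser.Filters;
using Winista.Text.HtmlParser.Util;

String html = "<!DOC......"; // the HTML source as a string
// Create a parser over the HTML string
Parser parser = Parser.CreateParser(html, "utf-8");
// Build a filter for the elements to find; here, every td tag
NodeFilter filter = new TagNameFilter("td");
// Run the filter over the document to get the list of matching nodes
NodeList nodes = parser.Parse(filter);
// Walk the matching nodes and read their values
// (result is assumed to be a list of pojo.Game declared elsewhere)
for (int i = 0; i < nodes.Size(); i++)
{
    INode textnode = nodes[i];
    ITag tag = getTag(textnode.FirstChild);
    // Skip cells whose first child is not a tag, to avoid a NullReferenceException
    if (tag == null)
        continue;
    String id = tag.GetAttribute("value");
    String value = textnode.ToPlainTextString();

    result.Add(new pojo.Game(id, value));
}

// Returns the node as an ITag, or null if it is null or not a tag
private static ITag getTag(INode node)
{
    return node as ITag;
}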
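
The snippet above relies on a `result` list and a `pojo.Game` type that are not shown. As a rough, self-contained sketch of how the pieces might fit together: the `Game` class and the `TdExtractor.Extract` method below are hypothetical names invented for illustration, not part of the library or the original project, and the code assumes the Winista.Text.HtmlParser package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Winista.Text.HtmlParser;
using Winista.Text.HtmlParser.Filters;
using Winista.Text.HtmlParser.Util;

// Hypothetical stand-in for the pojo.Game type used in the article.
public class Game
{
    public string Id { get; }
    public string Name { get; }

    public Game(string id, string name)
    {
        Id = id;
        Name = name;
    }
}

public static class TdExtractor
{
    // Collects one Game per <td> whose first child is a tag.
    public static List<Game> Extract(string html)
    {
        var result = new List<Game>();

        Parser parser = Parser.CreateParser(html, "utf-8");
        NodeList nodes = parser.Parse(new TagNameFilter("td"));

        for (int i = 0; i < nodes.Size(); i++)
        {
            INode cell = nodes[i];

            // FirstChild may be null or a text node; "as" yields null in both cases.
            ITag first = cell.FirstChild as ITag;
            if (first == null)
                continue;

            result.Add(new Game(first.GetAttribute("value"), cell.ToPlainTextString()));
        }

        return result;
    }
}
```

A call such as `TdExtractor.Extract(html)` then returns the collected items. Cells whose first child is plain text are skipped rather than causing an exception, which may or may not be what a given page layout needs.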

Official documentation:
http://www.netomatix.com/Products/DocumentManagement/HTMLParserDocs.aspx

References:
http://www.cnblogs.com/doll-net/archive/2007/06/29/800396.html
https://blog.youkuaiyun.com/cdefg198/article/details/8004203