Removing HTML comments: [Solved] Stripping HTML tags and comments together in C#

Latest recommended article published 2021-07-08 15:27:31

FromNowToNow

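Tags: removing HTML comments

This post describes a way to remove all tags and comments from an HTML document in C# with the HtmlAgilityPack library. The document is loaded, its nodes are traversed, and the content is filtered.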

[Problem]

In C#, I want to strip the HTML tags from a string and remove the HTML comments at the same time.

[Solution process]

1. Reference:

I tried this first:

    public string htmlRemoveTag(string html)
    {
        // LoadHtml rejects null input, so return early when there is nothing to parse
        if (string.IsNullOrEmpty(html))
        {
            return "";
        }

        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(html);

        // concatenate the text of every top-level node
        string filteredHtml = "";
        foreach (var node in htmlDoc.DocumentNode.ChildNodes)
        {
            filteredHtml += node.InnerText;
        }

        return filteredHtml;
    }

The result: all the tags were removed.

But the HTML comments, such as: Frigidaire Mini Air Conditioner Frigidaire’s FRA052XT7 5,000 BTU 115-Volt Window-Mounted Mini-Compact Air Conditioner is perfect for rooms up to 150 square feet. It quickly cools a room on hot days and quie...

were not removed.
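One way to see why the comment text survives is to list the nodes that HtmlAgilityPack produces: a comment is parsed as its own node rather than as part of a tag, so reading text node by node can still pick it up. The following is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name `NodeTypeDump`, the method `ListNodes`, and the sample markup are made up for illustration and are not from the original post.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class NodeTypeDump
{
    // Hypothetical helper: returns one "NodeType: markup" line
    // for every node below the document root.
    public static List<string> ListNodes(string html)
    {
        var htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(html);

        var lines = new List<string>();
        foreach (HtmlNode node in htmlDoc.DocumentNode.Descendants())
        {
            lines.Add(node.NodeType + ": " + node.OuterHtml);
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up sample input: one comment inside a paragraph.
        foreach (string line in ListNodes("<p>Hello<!-- hidden note -->World</p>"))
        {
            Console.WriteLine(line);
        }
    }
}
```

In the printed list the comment should show up as a separate node of type `Comment`, which is the node that has to be removed explicitly.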

2. Next, remove the comments.

Reference:

Then I used:

    public string htmlRemoveTag(string html)
    {
        // LoadHtml rejects null input, so return early when there is nothing to parse
        if (string.IsNullOrEmpty(html))
        {
            return "";
        }

        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(html);

        // 1. remove all comments
        // (1) get all comment nodes using XPath;
        //     SelectNodes returns null when nothing matches
        HtmlNodeCollection comments = htmlDoc.DocumentNode.SelectNodes("//comment()");
        if (comments != null)
        {
            foreach (HtmlNode comment in comments)
            {
                // (2) remove the comment node itself
                comment.ParentNode.RemoveChild(comment);
            }
        }

        // 2. get all content
        string filteredHtml = "";
        foreach (var node in htmlDoc.DocumentNode.ChildNodes)
        {
            filteredHtml += node.InnerText;
        }

        return filteredHtml;
    }

That achieved the goal. The result is the content of the HTML with no tags and no comments: Frigidaire Mini Air Conditioner Frigidaire’s FRA052XT7 5,000 BTU 115-Volt Window-Mounted Mini-Compact Air Conditioner is perfect for rooms up to 150 square feet. It quickly cools a room on hot days and quiet operation keeps you cool without keeping you awake. This unit features mechanical rotary controls and top, full-width, 2-way air direction control. The antimicrobial mesh filter with side, slide-out access cleans the air removing harmful bacteria. Low voltage start-up conserves energy and saves you money ...
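As a quick check of the comment-removal step, here is a self-contained sketch that can be run against a small input. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `HtmlTextExtractor`, the method name `RemoveTagsAndComments`, and the sample string are illustrative and not from the original post. It also guards against `SelectNodes` returning null, which happens when the XPath matches no nodes (for example, input without any comments).

```csharp
using System;
using HtmlAgilityPack;

public static class HtmlTextExtractor
{
    // Returns the text of an HTML fragment with tags and comments removed.
    public static string RemoveTagsAndComments(string html)
    {
        // Nothing to parse: return an empty string instead of calling LoadHtml.
        if (string.IsNullOrEmpty(html))
        {
            return "";
        }

        var htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection comments = htmlDoc.DocumentNode.SelectNodes("//comment()");
        if (comments != null)
        {
            foreach (HtmlNode comment in comments)
            {
                comment.ParentNode.RemoveChild(comment);
            }
        }

        // With the comment nodes gone, only text content remains.
        return htmlDoc.DocumentNode.InnerText;
    }

    public static void Main()
    {
        // Made-up sample input with one comment inside a paragraph.
        Console.WriteLine(RemoveTagsAndComments("<p>Cool<!-- internal note --> air</p>"));
        // prints "Cool air"
    }
}
```

Reading `InnerText` from the document node once replaces the string concatenation loop; both give the text of all remaining nodes, so which form to use is a matter of taste.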

[Summary]

To strip the HTML tags without keeping the comments, use:

    using HtmlAgilityPack;

    public string htmlRemoveTag(string html)
    {
        // LoadHtml rejects null input, so return early when there is nothing to parse
        if (string.IsNullOrEmpty(html))
        {
            return "";
        }

        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.LoadHtml(html);

        // 1. remove all comments
        // (1) get all comment nodes using XPath