I want to extract only the text from my HTML:
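```csharp
using System.Text;
using HtmlAgilityPack;

var sb = new StringBuilder();
var doc = new HtmlDocument();
doc.LoadHtml(inputHtml); // inputHtml holds the HTML string to process
foreach (var node in doc.DocumentNode.ChildNodes)
{
    // Because of the final `!= "img"` test, this condition is true
    // for every top-level node except <img>.
    if (node.Name == "strong" || node.Name == "#text"
        || node.Name == "br" || node.Name == "div"
        || node.Name == "p" || node.Name != "img")
    {
        // InnerHtml returns the node's contents including any child markup
        sb.Append(node.InnerHtml);
    }
}
```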
Right now, `node.InnerHtml` gives me this HTML:
```html
1
text
, text
text
2
text text text.
<a href="/content/essie-classics">text</a>
<img src="" alt="" title="" height="100">
<img src="http://example.com/img_8862.jpg" alt="" title="" height="100">
```
How do I remove the `img` tags and the other tags?
The `img` tag has no closing tag.
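A minimal sketch of one possible approach, not the only one: instead of filtering nodes by name, strip every tag with a regular expression and then decode HTML entities. Because the pattern matches any `<...>` run regardless of tag name, a void element such as `<img>`, which has no closing tag, is removed the same way as `<a ...>` and `</a>`. The names `HtmlText` and `StripTags` are hypothetical, and the sketch assumes simple markup like the fragment in the question, with no `>` inside attribute values, comments, or `<script>`/`<style>` content. For arbitrary HTML a real parser is safer; with HtmlAgilityPack, the `InnerText` property of `doc.DocumentNode` returns the text content without the tags.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: strips tags with a regex instead of walking the DOM.
// Assumption: simple markup, with no '>' inside attribute values, comments,
// or <script>/<style> blocks.
public static class HtmlText
{
    // Matches any tag: opening (<a href="...">), closing (</a>) or void (<img ...>).
    // [^>] also matches newlines, so a tag split across lines is still removed.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    public static string StripTags(string html)
    {
        // Remove every tag, keeping only the text between tags.
        string withoutTags = TagPattern.Replace(html, string.Empty);
        // Convert entities such as &amp; back into the characters they encode.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

For example, `HtmlText.StripTags("<a href=\"/content/essie-classics\">text</a><img src=\"\" height=\"100\">")` returns `"text"`. Matching tags generically avoids having to list every element name, which is what makes the missing closing tag on `<img>` irrelevant here.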