Finding a href in HTML with regex: a regular expression to find the 'href' value of <a> links

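This article shows how to extract the links from an HTML page in C#, first with regular expressions and then with the WebBrowser control. The technique is intended for webmasters and ASP.NET developers who scrape web pages for legitimate purposes.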


叮当猫咪

Try this:

    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    using System.Windows.Forms;

    public partial class Form1 : Form
    {
        // The HTML to search; assign your page markup here.
        private string html = "";

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            var res = Find(html);
        }

        public static List<LinkItem> Find(string file)
        {
            List<LinkItem> list = new List<LinkItem>();

            // 1. Find all <a>...</a> elements in the file.
            MatchCollection m1 = Regex.Matches(file, @"(<a.*?>.*?</a>)",
                RegexOptions.Singleline);

            // 2. Loop over each match.
            foreach (Match m in m1)
            {
                string value = m.Groups[1].Value;
                LinkItem i = new LinkItem();

                // 3. Get the href attribute (double-quoted values only).
                Match m2 = Regex.Match(value, @"href=\""(.*?)\""",
                    RegexOptions.Singleline);
                if (m2.Success)
                {
                    i.Href = m2.Groups[1].Value;
                }

                // 4. Strip the tags, leaving only the link text.
                string t = Regex.Replace(value, @"\s*<.*?>\s*", "",
                    RegexOptions.Singleline);
                i.Text = t;

                list.Add(i);
            }
            return list;
        }

        public struct LinkItem
        {
            public string Href;
            public string Text;

            public override string ToString()
            {
                return Href + "\n\t" + Text;
            }
        }
    }

Input: an html string containing two <a> links.

Result:

    [0] = {www.aaa.xx/xx.zz?id=xxxx&name=xxxx}
    [1] = {http://www.aaa.xx/xx.zz?id=xxxx&name=xxxx}

C# scraping of HTML links: scraping HTML extracts important page elements, and it has many legitimate uses for webmasters and ASP.NET developers. With the Regex type, plus WebClient to download the page, HTML screen scraping can be implemented.

Edit: another simple way is to use the WebBrowser control and read the href attribute of each a tag, for example (see my sample):

    public Form1()
    {
        InitializeComponent();
        webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
    }

    private void Form1_Load(object sender, EventArgs e)
    {
        // Load the HTML to inspect; DocumentCompleted fires once it has loaded.
        webBrowser1.DocumentText = "";
    }

    void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Collect the href attribute of every <a> element in the loaded document.
        List<string> href = new List<string>();
        foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("a"))
        {
            href.Add(el.GetAttribute("href"));
        }
    }
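The href=\"(.*?)\" pattern in the first answer only recognizes double-quoted values, and it relies on a separate regex to isolate each <a> element first. As a hedged alternative, the sketch below reads the href straight from each <a> start tag with a single pattern and accepts double-quoted, single-quoted, or unquoted values. HrefExtractor and ExtractHrefs are hypothetical names chosen for illustration. The pattern assumes reasonably well-formed markup; like any regex it is not a full HTML parser, so, for example, another attribute whose value contains the text " href=" can fool it.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts href values from <a> start tags with one regex.
// Assumption: reasonably well-formed HTML; comments, <script> bodies, etc. are not special-cased.
public static class HrefExtractor
{
    // <a\b      : an <a> tag, not <abbr> or <area>
    // [^>]*?\s  : any other attributes inside the same tag, then whitespace
    // href\s*=\s* followed by a double-quoted, single-quoted, or unquoted value,
    // each captured in its own named group.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""(?<dq>[^""]*)""|'(?<sq>[^']*)'|(?<uq>[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<string> ExtractHrefs(string html)
    {
        var result = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            // Exactly one of the three value groups participates in each match.
            string raw = m.Groups["dq"].Success ? m.Groups["dq"].Value
                       : m.Groups["sq"].Success ? m.Groups["sq"].Value
                       : m.Groups["uq"].Value;

            // Decode character references, e.g. "&amp;" becomes "&".
            result.Add(WebUtility.HtmlDecode(raw));
        }
        return result;
    }
}
```

Calling HrefExtractor.ExtractHrefs(html) returns the href values in document order. Matching the start tag alone, instead of the whole <a>...</a> element, also picks up links whose closing tag is missing.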
