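The extraction rule is as follows.
Based on the structure of URLs, design a regular expression that extracts every URL from a block of text.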
import re
# r is the requests response fetched in the test below.
# The hyphen sits last in the character class so it is a literal, not a range.
re.findall(r'(https?://[a-zA-Z0-9.?/%_-]*)', r.text)
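Inside `[...]`, a hyphen between two characters denotes a range, so `%-_` covers every ASCII character from `%` to `_`, including `'`, `<`, `>`, and `&`. The sketch below compares that ordering with a variant that places the hyphen last, where it is matched literally. The names `RANGE_PATTERN` and `LITERAL_PATTERN` and the sample text are made up for illustration.

```python
import re

# Hyphen between % and _ forms a range (ASCII 0x25 through 0x5F).
RANGE_PATTERN = re.compile(r"https?://[a-zA-Z0-9.?/%-_]*")
# Hyphen placed last in the class is a literal hyphen.
LITERAL_PATTERN = re.compile(r"https?://[a-zA-Z0-9.?/%_-]*")

# Hypothetical sample: a link wrapped in single-quoted HTML.
sample = "<a href='https://www.168seo.cn/tag/re'>re</a>"

range_matches = RANGE_PATTERN.findall(sample)
literal_matches = LITERAL_PATTERN.findall(sample)

print(range_matches)    # the match runs past the closing quote into the markup
print(literal_matches)  # the match stops at the closing quote
```

The literal variant is stricter in both directions: it also stops at `:`, `=`, and `&`, so URLs with a port or a query string would be cut short unless those characters are added to the class explicitly.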
Test:
In [13]: import re, requests
In [14]: url = "https://www.168seo.cn"
In [15]: r = requests.get(url)
In [16]: r
Out[16]: <Response [200]>
In [17]: re.findall(r'(https?://[a-zA-Z0-9.?/%_-]*)', r.text)
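To check the pattern without a network request, the same `findall` call can be run against a fixed string. This is a minimal sketch rather than the author's code: `extract_urls`, `URL_PATTERN`, and the sample HTML are hypothetical, the character class keeps its hyphen last so it matches literally, and duplicates are dropped while preserving first-seen order.

```python
import re

URL_PATTERN = re.compile(r"https?://[a-zA-Z0-9.?/%_-]*")


def extract_urls(text):
    """Return the URLs found in text, without duplicates, in first-seen order."""
    # dict.fromkeys preserves insertion order, so this de-duplicates stably.
    return list(dict.fromkeys(URL_PATTERN.findall(text)))


# Hypothetical page fragment standing in for r.text.
sample_html = (
    '<a href="https://www.168seo.cn/tag/re">re</a> '
    '<a href="http://example.com/a_b-c?page">x</a> '
    '<a href="https://www.168seo.cn/tag/re">re again</a>'
)

urls = extract_urls(sample_html)
print(urls)
```

Passing `r.text` from the session above in place of `sample_html` would apply the same extraction to the fetched page.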