Python: Batch-Saving the Hyperlink URLs Found in Web Pages (Source Code)

Latest recommended article published on 2025-01-20 10:56:33

LINZHENYU1996

License: CC 4.0 BY-SA

Category: python. Tags: python, url, source code

Article link: https://blog.youkuaiyun.com/LINZHENYU1996/article/details/63390997

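This post presents a simple Python crawler script. It uses the urllib2 library (Python 2) to fetch a range of numbered pages from a given site, uses a regular expression to pick out links that appear in a specific format, and appends those links to a local file.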
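To make the regular-expression step concrete, here is a minimal sketch of the two patterns that appear in the script: one matches URLs shown as visible text between `>` and `<`, the other matches URLs inside double quotes (such as `href` values). It is written for Python 3, and `SAMPLE_HTML`, `links_in_text`, and `links_in_quotes` are hypothetical names made up for this illustration.

```python
import re

# Hypothetical sample HTML, used only to illustrate the two patterns
SAMPLE_HTML = (
    '<ul>'
    '<li><a href="http://www.example.com/a">first</a></li>'
    '<li>https://www.example.com/b</li>'
    '</ul>'
)


def links_in_text(html):
    """Return URLs that appear as visible text between '>' and '<'."""
    return re.findall(r'>(https?://.*?)<', html)


def links_in_quotes(html):
    """Return URLs that appear inside double quotes, such as href values."""
    return re.findall(r'"(https?://.*?)"', html)


print(links_in_text(SAMPLE_HTML))    # ['https://www.example.com/b']
print(links_in_quotes(SAMPLE_HTML))  # ['http://www.example.com/a']
```

Which pattern fits depends on how the target page presents its links, so it is worth checking a page's source before choosing one.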

import urllib2
import time
import re

# Open the output file in append mode; the with-block closes it even if a request fails
with open('all2.txt', 'a') as f1:
    for page in range(5, 194):  # pages 5 through 193
        # Alternative target:
        # url = "https://www.hac-ker.net/search.php?var=That%20is%20me&page=" + str(page)
        url = "http://www.example.com/archive?page=" + str(page)
        # Connect to the URL, giving up after 10 seconds
        website = urllib2.urlopen(url, timeout=10)
        # Read the HTML source
        html = website.read()
        # Collect URLs shown as text between '>' and '<'.
        # Alternative: URLs inside double quotes (e.g. href values):
        # links = re.findall(r'"(https?://.*?)"', html)
        links = re.findall(r'>(https?://.*?)<', html)
        # Optional: write a timestamp before each page's links
        # ti = time.strftime('%y-%m-%d %H:%M:%S', time.localtime(time.time()))
        # f1.write(ti + "\n\n")
        for link in links:
            f1.write(link + "\n")
        # Report progress
        print page
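
The script above targets Python 2. In Python 3 the `urllib2` module no longer exists and `urlopen` lives in `urllib.request`, so a rough port might look like the sketch below. This is not the author's code: the names `fetch_page` and `save_links`, the injectable `fetch` parameter, the UTF-8 decoding, and the choice to skip a page when a request fails are all assumptions added for illustration.

```python
import re
import urllib.request

# Placeholder URL taken from the script above
BASE_URL = "http://www.example.com/archive?page="
# Same idea as the original pattern: URLs shown as text between '>' and '<'
LINK_PATTERN = re.compile(r'>(https?://.*?)<')


def fetch_page(url, timeout=10):
    """Download a page and decode it as text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def save_links(pages, out_path, fetch=fetch_page):
    """Append every matched link from each page to out_path; return the count."""
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for page in pages:
            try:
                html = fetch(BASE_URL + str(page))
            except OSError as err:
                # URLError and socket timeouts are both OSError subclasses
                print("page", page, "skipped:", err)
                continue
            for link in LINK_PATTERN.findall(html):
                out.write(link + "\n")
                count += 1
            print("page", page, "done")
    return count
```

Calling `save_links(range(5, 194), "all2.txt")` would cover the same page range and output file as the original script. Passing a different `fetch` function makes it possible to try the extraction on canned HTML without touching the network.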