A First Python Crawler Program

Scraping Images from Baidu Tieba with Python 3


Tags: #python #crawler #image-scraping


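This article shows one way to download the images in a given Baidu Tieba thread with Python 3. It reads the page with urllib.request, matches the image links with a regular expression from the re module, and then downloads every image that matched.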

Reposted from https://www.cnblogs.com/Axi8/p/5757270.html

The Python 2 parts of the original have been changed to Python 3. The script downloads the images posted in a Baidu Tieba thread.

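    # coding: utf-8
    import re
    import urllib.request  # Python 3 location of urlopen and urlretrieve

    def get_html(url):
        """Fetch a page and return its source as text."""
        page = urllib.request.urlopen(url)  # open the page
        html = page.read()  # read the page source (bytes in Python 3)
        return html.decode('utf-8')  # Python 3: decode the bytes to str


    reg = r'src="(.+?\.jpg)" width'  # regular expression for .jpg image links
    reg_img = re.compile(reg)  # compile the pattern once so it can be reused
    imglist = reg_img.findall(get_html('http://tieba.baidu.com/p/1753935195'))  # find all matches
    for x, img in enumerate(imglist):
        # save the images as 0.jpg, 1.jpg, ... in the current directory
        urllib.request.urlretrieve(img, '%s.jpg' % x)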
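
The matching step can be tried without touching the network. The sketch below runs the same pattern on a small, made-up HTML string (the markup and the example.com URLs are invented for illustration and are not copied from Tieba). It also compares the pattern with a slightly stricter variant, `STRICT_PATTERN`, which is my own suggestion and not part of the original post. Because `.+?` is allowed to cross quote characters, a non-jpg image sitting between two jpg images can get swallowed into one long match; `[^"]+` stops that.

```python
import re

# Pattern used in the script above.
LOOSE_PATTERN = re.compile(r'src="(.+?\.jpg)" width')

# Hypothetical stricter variant: the captured URL may not contain a quote,
# so one match can never run across several tags.
STRICT_PATTERN = re.compile(r'src="([^"]+\.jpg)" width')

# Made-up markup for illustration only: two .jpg images with a .png between them.
SAMPLE_HTML = (
    '<img class="BDE_Image" src="http://example.com/a.jpg" width="560" height="420">'
    '<img src="http://example.com/icon.png" width="16">'
    '<img class="BDE_Image" src="http://example.com/b.jpg" width="560">'
)

def extract_image_urls(html, pattern=STRICT_PATTERN):
    """Return every string captured by the pattern in the given HTML text."""
    return pattern.findall(html)

loose_urls = extract_image_urls(SAMPLE_HTML, LOOSE_PATTERN)
strict_urls = extract_image_urls(SAMPLE_HTML, STRICT_PATTERN)

print('loose pattern:')
for url in loose_urls:
    print('  ', url)

print('strict pattern:')
for url in strict_urls:
    print('  ', url)
```

On this sample the strict pattern yields the two .jpg URLs, while the second result of the loose pattern starts at the .png tag and runs on into the next one. Whether this matters on a real Tieba page depends on its markup, so it is worth printing `imglist` before downloading anything.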