Scraping Images from a Web Page with Python 3

xiaofZhang

License: CC 4.0 BY-SA

Category: Python. Tags: urllib.request, python3

Article link: https://blog.youkuaiyun.com/weixin_42932072/article/details/94654846

Tested and working!

import urllib.request
import re

# Page to scrape
req = urllib.request.urlopen('http://www.sohu.com/a/241123779_661259')
buf = req.read()
# buf holds the raw response body: <class 'bytes'>
print(len(buf), type(buf))

# Decode the bytes to text: <class 'str'>
data = buf.decode('utf-8')
print(len(data), type(data))

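# Regular expression matching image URLs that end in .jpeg.
# The non-greedy character class stops at quotes, whitespace, and angle brackets,
# so two URLs on the same line are not merged into one match.
listurl = re.findall(r'https?://[^\s"\'<>]+?\.jpeg', data)
print(len(listurl), type(listurl))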

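for i, url in enumerate(listurl):
    # Save each image as 0.jpg, 1.jpg, ... in the current directory; change the path as needed
    # The with statement closes both the HTTP response and the file, even on error
    with urllib.request.urlopen(url) as resp, open(str(i) + '.jpg', 'wb') as f:
        f.write(resp.read())  # read the image bytes and write them to disk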

# Print the current working directory, where the images were saved
import os
print(os.getcwd())
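
The URL-matching step is the most fragile part of the script above, so here is a small offline sketch of it. The helper name `extract_image_urls`, the extra `.jpg`/`.png`/`.gif` extensions, and the `example.com` sample HTML are all assumptions made for illustration and are not from the original script. The sketch compares a greedy pattern of the form `http:.+\.jpeg` with a non-greedy one, and also removes duplicate URLs.

```python
import re

# Hypothetical pattern (an assumption, not from the original post):
# stop at whitespace, quotes, and angle brackets, and accept a few common extensions.
IMG_RE = re.compile(r'https?://[^\s"\'<>]+?\.(?:jpe?g|png|gif)', re.IGNORECASE)


def extract_image_urls(html):
    """Return the image URLs found in html, de-duplicated, in first-seen order."""
    seen = set()
    urls = []
    for url in IMG_RE.findall(html):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up sample page: three <img> tags on one line, one of them repeated.
sample = (
    '<img src="http://example.com/a.jpeg">'
    '<img src="http://example.com/b.jpeg">'
    '<img src="http://example.com/a.jpeg">'
)

# Greedy ".+" runs to the last ".jpeg" on the line, so all three tags
# collapse into a single, unusable match.
greedy = re.findall(r'http:.+\.jpeg', sample)
print(len(greedy))

# The non-greedy pattern yields each URL separately.
print(extract_image_urls(sample))
```

The returned list can be passed to the download loop in place of `listurl`. De-duplicating first avoids fetching the same image twice when a page references it more than once.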