Here is one of the scraped wallpapers to open the post.
Code first, then the explanation.
import re
import os
import requests
import easygui

# Ask for the page range to crawl (start and end page numbers)
(start, end) = easygui.multenterbox(fields=['起始数', '终止数'], values=['1', '100'])
start = int(start)
end = int(end)

# Create the download folder if it does not exist, then switch into it
if os.path.exists('zhiwei'):
    os.chdir('zhiwei')
else:
    os.mkdir('zhiwei')
    os.chdir('zhiwei')

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3760.400 QQBrowser/10.5.4083.400'
}
urls = "http://pic.netbian.com"

num = 1
for i in range(start, end + 1):  # include the end page as well
    url = "http://pic.netbian.com/tupian/{}.html".format(i)
    res = requests.get(url, headers=headers)
    res.encoding = 'gbk'  # the site's pages are GBK-encoded
    html = res.text
    # Image path (without the domain) and the page title used as the file name
    image = re.findall('<img src="(.*?)" data-pic', html)
    name = re.findall('<h1>(.*?)</h1>', html)
    images = [urls + i for i in image]
    print(images)
    # Each detail page should yield one title and one image, so the nested loops pair them up
    for names in name:
        for img in images:
            file_name = names + '.jpg'
            print("===========================开始下载第{0}张壁纸================================".format(num))
            print(file_name)
            print(img)
            response = requests.get(img)
            with open(file_name, 'wb') as file:
                file.write(response.content)
            print("下载完成")
            num += 1
The code is simple; anyone who has done a bit of scraping can follow it. It's entry-level stuff.
Basic approach
- Create the download folder: check whether it already exists; if it does, just switch into it, otherwise create it first and then switch into it
if os.path.exists('zhiwei'):
    os.chdir('zhiwei')
else:
    os.mkdir('zhiwei')
    os.chdir('zhiwei')
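Side note: the check-then-create-then-chdir dance can also be collapsed into a single call with os.makedirs and its exist_ok flag; a minimal equivalent sketch, not what the script above uses:

import os

os.makedirs('zhiwei', exist_ok=True)  # create the folder if missing, do nothing if it already exists
os.chdir('zhiwei')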
- Fetch the page, locate the image, and use regexes to pull out the image path (without the domain prefix) plus the page title (why grab the title? It becomes the file name; if you are happy naming files 123456, you can skip it)
image = re.findall('<img src="(.*?)" data-pic', html)
name = re.findall('<h1>(.*?)</h1>', html)
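To see what those two patterns actually capture, here is a quick dry run against a made-up snippet shaped like the site's detail pages (the HTML and values below are illustrative only, not copied from the site):

import re

html = '<h1>sample-wallpaper-title</h1> ... <img src="/uploads/allimg/1234/abcd.jpg" data-pic="...">'
print(re.findall('<img src="(.*?)" data-pic', html))  # ['/uploads/allimg/1234/abcd.jpg']
print(re.findall('<h1>(.*?)</h1>', html))             # ['sample-wallpaper-title']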
- Join the full image address together (just prepend the site domain)
images = [urls + i for i in image]
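If you would rather not glue strings together by hand, urllib.parse.urljoin from the standard library does the same job (and also copes with paths that are already absolute URLs); a small sketch of the idea:

from urllib.parse import urljoin

base = "http://pic.netbian.com"
print(urljoin(base, "/uploads/allimg/1234/abcd.jpg"))
# http://pic.netbian.com/uploads/allimg/1234/abcd.jpg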
- With the file name in hand, loop over the titles and, for each one, request the matching image address and save the download to disk
response = requests.get(img)
with open(file_name, 'wb') as file:
    file.write(response.content)
print("下载完成")
- And that's it
How to use it
Run the script, enter the start and end pages, and wait for the images to pile up in the folder.
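Both third-party dependencies install straight from PyPI (assuming the usual package names requests and easygui):

pip install requests easygui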