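I started with the following code:
import requests
req = requests.get("https://i.meizitu.net/2019/01/13d01.jpg")
# This URL came from right-clicking the image and choosing "Copy image address"
with open("C://Users//Administrator//Desktop//girl.jpg", "wb") as f:
    f.write(req.content)
Only an empty file was created on the desktop. I added some code to find out why: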
status_code = req.status_code
print("req.status_code:%d" % status_code)
# Output: req.status_code:403
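One way to avoid silently saving a failed download is to check the response before writing anything to disk. Here is a minimal sketch; the helper name `save_response` is my own (hypothetical, not part of requests), while `raise_for_status()` is the standard requests call that raises `requests.HTTPError` on 4xx/5xx responses.

```python
import requests


def save_response(resp, path):
    """Write resp.content to path only if the request succeeded.

    `save_response` is a hypothetical helper name used for illustration.
    """
    # raise_for_status() raises requests.HTTPError for 4xx/5xx codes such as 403,
    # so the file is never created for a refused request.
    resp.raise_for_status()
    with open(path, "wb") as f:
        f.write(resp.content)
    return len(resp.content)  # number of bytes written
```

Calling `save_response(req, "girl.jpg")` on the response above would raise an error mentioning the 403 instead of leaving a useless file behind.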
A 403 means the server refused the request: the site is blocking crawlers. Let's disguise the request as a browser by sending a User-Agent header:
import requests
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
req = requests.get("https://i.meizitu.net/2019/01/13d01.jpg", headers=header)
with open("C://Users//Administrator//Desktop//girl.jpg", "wb") as f:
    f.write(req.content)
Still no good: the file on the desktop was still empty. After some searching I found that sites can also check the Referer header as an anti-crawling measure.
So let's add it to the headers:
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36', 'Referer': 'http://www.mzitu.com/', 'Host': 'i.meizitu.net'}
# The Host header can also be set
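To see what the server actually receives, the headers can be previewed without sending anything. This is a rough sketch: `preview_headers` is a hypothetical helper name of mine, built on `Session.prepare_request`, which merges the session's default headers with the ones passed in. With no custom headers, requests announces itself with a User-Agent starting with `python-requests/`, which is presumably what the site rejected in the first attempt.

```python
import requests


def preview_headers(url, headers=None):
    """Return the headers requests would send for a GET, without sending it.

    `preview_headers` is a hypothetical helper name used for illustration.
    """
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers or {})
    # prepare_request merges the session defaults with the per-request headers;
    # per-request values win when both define the same header.
    prepared = session.prepare_request(request)
    return dict(prepared.headers)
```

For example, `preview_headers("https://i.meizitu.net/2019/01/13d01.jpg", header)` should show the custom User-Agent, Referer, and Host values alongside defaults such as Accept-Encoding.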
Run it:
import requests
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36', 'Referer': 'http://www.mzitu.com/', 'Host': 'i.meizitu.net'}
req = requests.get("https://i.meizitu.net/2019/01/13d01.jpg", headers=header)
with open("C://Users//Administrator//Desktop//girl.jpg", "wb") as f:
    f.write(req.content)
Success: girl.jpg is now on the desktop.
Teaching myself is slow going; this one problem took me half a day.
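The working version could be wrapped into a small reusable function so the same headers don't have to be retyped for every image. This is only a sketch under my own assumptions: `build_headers`, `download_image`, and `DEFAULT_UA` are hypothetical names, and deriving Host from the image URL with `urllib.parse.urlsplit` is my choice rather than something the site requires.

```python
from urllib.parse import urlsplit

import requests

# Assumed default; any current browser User-Agent string should work as well.
DEFAULT_UA = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
)


def build_headers(image_url, referer, user_agent=DEFAULT_UA):
    """Build browser-like headers; Host is taken from the image URL."""
    return {
        "User-Agent": user_agent,
        "Referer": referer,
        "Host": urlsplit(image_url).netloc,
    }


def download_image(image_url, referer, path, timeout=10):
    """Fetch image_url with browser-like headers and save it to path."""
    headers = build_headers(image_url, referer)
    resp = requests.get(image_url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # stop on 403 and similar errors before writing
    with open(path, "wb") as f:
        f.write(resp.content)
```

With this, the download above becomes `download_image("https://i.meizitu.net/2019/01/13d01.jpg", "http://www.mzitu.com/", "girl.jpg")`.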