Let's keep improving together! Thanks for your support and for following along.
Requirements
Scrape NBA pictures and save them to the local disk.
Target URL: https://slide.sports.sina.com.cn/k/
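Analysis
- The page is static, so a plain GET request returns the full HTML
- Get the title of each picture
- Locate the nodes with XPath and read their content with text()
- Take the image URL and send a second request to download the image
- Storage format: each image is written as <title>.jpg under ./imags/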
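
The extraction steps above can be tried offline before touching the real site. The following is only a minimal sketch: SAMPLE_HTML is hypothetical markup that imitates the layout the script assumes (an element with id "eData" containing dl entries whose first dd holds the title and whose second dd holds the image URL), and extract_items is an illustrative helper name, not part of the original script.

```python
from lxml import etree

# Hypothetical sample markup; it only imitates the layout the script assumes.
SAMPLE_HTML = """
<div id="eData">
  <dl>
    <dt>slide 1</dt>
    <dd>Game one</dd>
    <dd>https://example.com/img/1.jpg</dd>
  </dl>
  <dl>
    <dt>slide 2</dt>
    <dd>Game two</dd>
    <dd>https://example.com/img/2.jpg</dd>
  </dl>
</div>
"""


def extract_items(html):
    """Return (title, image_url) pairs for every <dl> under #eData."""
    tree = etree.HTML(html)
    items = []
    for dl in tree.xpath('//*[@id="eData"]/dl'):
        # xpath() always returns a list, so check it before indexing.
        titles = dl.xpath("./dd[1]/text()")
        urls = dl.xpath("./dd[2]/text()")
        if titles and urls:
            items.append((titles[0].strip(), urls[0].strip()))
    return items


items = extract_items(SAMPLE_HTML)
for title, img_url in items:
    print(title, img_url)
```

Checking that both lists are non-empty avoids an IndexError when an entry is missing its title or URL, which indexing with [0] directly would raise.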
Source code
import os

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
url = 'https://slide.sports.sina.com.cn/k/'
response = requests.get(url=url, headers=headers)
response.encoding = 'gbk'  # decode the page text as GBK
page_text = response.text
tree = etree.HTML(page_text)
# Each <dl> under the element with id "eData" describes one picture
dl_list = tree.xpath('//*[@id="eData"]/dl')
os.makedirs("./imags", exist_ok=True)  # make sure the output directory exists
for dl in dl_list:
    title = dl.xpath("./dd[1]/text()")[0] + ".jpg"
    img_url = dl.xpath("./dd[2]/text()")[0]
    # Second request: download the raw image bytes (no text decoding needed)
    img = requests.get(url=img_url, headers=headers).content
    with open(f"./imags/{title}", "wb") as f:
        print(title)
        f.write(img)
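
One thing the script does not handle is a title containing characters that are not allowed in file names, such as "/" or "?". Below is a small sketch of one way to clean the title before building the path. safe_filename and target_path are hypothetical helpers introduced here for illustration, and the set of replaced characters is an assumption based on what common file systems reject.

```python
import re
from pathlib import Path


def safe_filename(title, suffix=".jpg", max_length=100):
    """Turn a scraped title into a file name that common file systems accept."""
    # Replace path separators, reserved characters, and control characters.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title).strip(" .")
    if not cleaned:
        cleaned = "untitled"  # fall back when nothing usable is left
    return cleaned[:max_length] + suffix


def target_path(directory, title):
    """Create the directory if needed and return the full path for the image."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / safe_filename(title)


print(safe_filename("Lakers/Celtics: Game 7?"))
```

In the loop above, the result of target_path("./imags", title) could be passed to open() in place of the f-string path, with title taken without the ".jpg" suffix since safe_filename appends it.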