link = i['href']
print({'title': title, 'link': link})
This is a fairly conventional approach. The scraped output looks like this:
· Method 2: requests + BeautifulSoup, extracting information with find_all
# find_all method
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'}
url = ''
Soup = BeautifulSoup(requests.get(url=url, headers=headers).text.encode("utf-8"), 'lxml')
# Collect every <em class="f14 l24"> element; each one wraps a headline link
em = Soup.find_all('em', attrs={'class': 'f14 l24'})
for i in em:
    title = i.a.get_text()
    link = i.a['href']
    print({'title': title, 'link': link})
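To try the find_all extraction without making a network request, here is a minimal sketch that runs the same lookup against an inline HTML string. The sample markup, the example.com URLs, and the helper name extract_links are all assumptions made for illustration and are not taken from the target site. It also uses the built-in 'html.parser' instead of 'lxml' so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the scraper looks for
SAMPLE_HTML = """
<ul>
  <li><em class="f14 l24"><a href="https://example.com/a">First headline</a></em></li>
  <li><em class="f14 l24"><a href="https://example.com/b">Second headline</a></em></li>
  <li><em class="other"><a href="https://example.com/c">Unrelated link</a></em></li>
</ul>
"""


def extract_links(html):
    """Return a title/link dict for every <em class="f14 l24"> that wraps a link."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for em in soup.find_all('em', attrs={'class': 'f14 l24'}):
        a = em.a  # first <a> inside the <em>, or None if there is none
        if a is None or not a.has_attr('href'):
            continue  # skip entries without a usable link
        results.append({'title': a.get_text(strip=True), 'link': a['href']})
    return results


for item in extract_links(SAMPLE_HTML):
    print(item)
```

Note that passing the full string 'f14 l24' matches the class attribute exactly, so the order of the class names matters. Searching with class_='f14' alone would match any element that carries that class, whatever other classes it has.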
Method 2 uses the same requests + BeautifulSoup combination, but extracts the information with find_all instead. The result looks like this: