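Scraping a city's weather forecast with BeautifulSoup and Scrapy
Target site: China Weather (中国天气网) http://www.weather.com.cn
We use Beijing as the example.
1. First, search for Beijing on the site and open its forecast page: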
http://www.weather.com.cn/weather/101010100.shtml?from=cityListCmp
Then inspect the page source to see how it is structured.
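Before looking at the markup, it helps to note how the page address is put together. The Beijing URL above appears to be a fixed path plus a numeric city code (101010100), followed by a query string that the scraping code does not use. The sketch below is only an illustration under that assumption: the helper names `forecast_url` and `strip_query` are hypothetical, and whether other cities follow the same pattern is not confirmed here.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed URL pattern, inferred from the Beijing page address only.
BASE = "http://www.weather.com.cn/weather/{city_code}.shtml"


def forecast_url(city_code: str) -> str:
    """Build a forecast page URL from a city code (hypothetical helper)."""
    return BASE.format(city_code=city_code)


def strip_query(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    # 101010100 is the code that appears in the Beijing URL.
    print(forecast_url("101010100"))
    print(strip_query(
        "http://www.weather.com.cn/weather/101010100.shtml?from=cityListCmp"
    ))
```

Both calls yield the same address, which is the one used in the request code.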
BeautifulSoup
from urllib import request
from bs4 import BeautifulSoup
from bs4 import UnicodeDammit

url = "http://www.weather.com.cn/weather/101010100.shtml"
try:
    # Send a browser-like User-Agent with the request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
    }
    req = request.Request(url, headers=headers)
    data = request.urlopen(req)
    data = data.read()  # fetch the full content of the page (raw bytes)
    # print(data)
    dammit = UnicodeDammit(data,["Utf-8",