Web Crawler (Part 1): Scraping City Information from a Weather Forecast Website

Copyright license: CC 4.0 BY-SA

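This post presents a Python script that uses the urllib.request module to crawl province, city, and district/county information from a weather website and organize the results into an easy-to-use dictionary. The crawl works level by level: it fetches the province codes, then the city codes within each province, then the district and county codes within each city. For each district it requests one more page and reads the code stored there, so the final dictionary maps district names to those codes.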
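Every response the script reads is split on `,` into entries, and each entry is split on `|` into a code and a name. That layout is inferred from how the script parses the pages, not from any documentation of the site. Under that assumption, a minimal parsing sketch looks like this; `parse_entries` is a hypothetical helper and the sample string is made up so the example runs offline.

```python
def parse_entries(text):
    """Split 'code|name,code|name,...' into a list of (code, name) tuples.

    Assumes the comma/pipe layout the crawler relies on; empty entries are skipped.
    """
    pairs = []
    for entry in text.strip().split(','):
        if not entry:
            continue
        # partition() splits at the first '|' only: code on the left, name on the right
        code, _, name = entry.partition('|')
        pairs.append((code, name))
    return pairs


# Made-up sample in the assumed format (not real data from the site)
sample = '01|Beijing,02|Shanghai,03|Tianjin'
print(parse_entries(sample))
# [('01', 'Beijing'), ('02', 'Shanghai'), ('03', 'Tianjin')]
```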

import urllib.request


def fetch(url):
    # Download one page and decode it as UTF-8 text
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8')


lines = []  # collects one "'name':'code'," entry per district

# Province list: comma-separated "code|name" entries
url1 = 'http://m.weather.com.cn/data5/city.xml'
provinces = fetch(url1).split(',')

# Walk the provinces
for p in provinces:
    p_code = p.split('|')[0]
    url2 = 'http://m.weather.com.cn/data3/city%s.xml' % p_code
    citys = fetch(url2).split(',')

    # Walk the cities, only the first 3 of each province
    for c in citys[:3]:
        c_code = c.split('|')[0]
        url3 = 'http://m.weather.com.cn/data3/city%s.xml' % c_code
        districts = fetch(url3).split(',')

        # Walk the districts, only the first 3 of each city
        for d in districts[:3]:
            d_code = d.split('|')[0]
            name = d.split('|')[1]
            url4 = 'http://m.weather.com.cn/data3/city%s.xml' % d_code
            content4 = fetch(url4)
            # print(content4)  # uncomment to inspect the raw response
            code = content4.split('|')[1]
            # Append instead of overwriting, otherwise only the last district survives
            lines.append("'%s':'%s',\n" % (name, code))

# Assemble the dictionary literal, print it, and optionally save it to a file
result = 'city = {\n%s}' % ''.join(lines)
print(result)
# with open('city2.py', 'a', encoding='utf-8') as f:
#     f.write(result)
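The three nested loops do the same thing at different depths, so they can also be written as one function. The sketch below is a restructuring, not the author's original code: the names `crawl_city_codes`, `entries`, the `fetch` parameter, and the `fake_pages` data are all hypothetical. Passing `fetch` in as an argument lets the logic run offline against canned responses, and returning a real `dict` lets `pprint.pformat` produce the `city = {...}` text instead of building it by hand.

```python
import pprint

ROOT = 'http://m.weather.com.cn/data5/city.xml'   # province list URL from the original script
BASE = 'http://m.weather.com.cn/data3/city%s.xml'  # per-code URL pattern from the original script


def entries(text):
    # Split 'code|name,code|name' into (code, name) pairs; layout assumed from the original
    return [tuple(e.split('|')[:2]) for e in text.split(',') if '|' in e]


def crawl_city_codes(fetch, per_level=3):
    """Map district name -> code, visiting at most `per_level` cities per province
    and `per_level` districts per city (the original hard-codes 3)."""
    city = {}
    for p_code, _ in entries(fetch(ROOT)):
        for c_code, _ in entries(fetch(BASE % p_code))[:per_level]:
            for d_code, name in entries(fetch(BASE % c_code))[:per_level]:
                # Last level assumed to answer 'district_code|code', as the original reads it
                city[name] = fetch(BASE % d_code).split('|')[1]
    return city


# Hypothetical canned responses keyed by URL, standing in for the live site
fake_pages = {
    ROOT: '01|Province A',
    BASE % '01': '0101|City A',
    BASE % '0101': '010101|District A,010102|District B',
    BASE % '010101': '010101|W1',
    BASE % '010102': '010102|W2',
}

city = crawl_city_codes(fake_pages.__getitem__)
print('city = ' + pprint.pformat(city))
# prints: city = {'District A': 'W1', 'District B': 'W2'}
```

To run it against the real site, you would pass a downloader such as `lambda url: urllib.request.urlopen(url).read().decode('utf-8')` (after `import urllib.request`), assuming the endpoints still answer in the same format. Because `pformat` uses `repr` for keys and values, names containing quotes are escaped correctly, which the hand-built `'%s':'%s'` string does not guarantee.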