Python | Scraping weather forecasts for the cities of Shandong with the scrapy framework

Latest recommended article published on 2022-12-15 02:27:45

cw11lq


Category: Python  Tags: python, scrapy, crawler

Article link: https://blog.youkuaiyun.com/shmilylqd/article/details/125740111


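Experiment content:
Install the Python extension library scrapy, then write a crawler project that scrapes the weather forecast data for the cities of Shandong from http://www.weather.com.cn/shandong/index.shtml and writes the scraped data to the local text file weather.txt.
Experiment steps: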

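1. At the command prompt, run pip install scrapy to install the Python extension library scrapy.
2. At the command prompt, run scrapy startproject sdWeatherSpider to create the crawler project.
3. Enter the project folder and run scrapy genspider everyCityinSD www.weather.com.cn to create the spider, which generates the spider file everyCityinSD.py.
4. Open http://www.weather.com.cn/shandong/index.shtml in a browser and locate the city forecast list on the page.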
5. Right-click on the page, choose "View page source", and find the part of the source that corresponds to the city forecast list ("城市预报列表").
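The purpose of this inspection can be sketched in a few lines: once the links of the city forecast list have been located in the page source, a regular expression can pull out each city's URL and name. The HTML fragment, its attribute layout, the URLs, and the helper name extract_city_links are all assumptions made for illustration. The real markup has to be read from the page source and may look different.

```python
from re import findall

# Hypothetical fragment standing in for the "city forecast list" markup.
# The real page source may differ and must be checked in the browser.
SAMPLE_HTML = '''
<div class="forecastBox">
<dl><dt><a title="Jinan forecast" href="http://www.weather.com.cn/weather/101120101.shtml" target="_blank">Jinan</a></dt></dl>
<dl><dt><a title="Yantai forecast" href="http://www.weather.com.cn/weather/101120501.shtml" target="_blank">Yantai</a></dt></dl>
</div>
'''


def extract_city_links(html):
    """Return a list of (url, city_name) tuples, one per city link in html."""
    # Assumed link layout: <a title="..." href="URL" target="_blank">NAME</a>
    pattern = r'<a title="[^"]*" href="([^"]+)" target="_blank">([^<]+)</a>'
    return findall(pattern, html)


city_links = extract_city_links(SAMPLE_HTML)
for url, name in city_links:
    print(name, url)
```

If the pattern returns an empty list on the real page, the link attributes probably differ from the assumed layout, and the pattern needs to be adjusted to match the actual source.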

6. Open the weather forecast page of any city in Shandong Province; Yantai is used as the example here.

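7. Right-click on the page, choose "View page source", and find the part of the source that corresponds to the weather forecast shown on the page.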
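To illustrate what the located forecast markup is used for, here is a minimal sketch that turns a per-day list into one line of text per day. The fragment below is a made-up stand-in: the tag names, the class attributes, the values, and the helper name parse_forecast are assumptions, not the actual structure of the Yantai page, which must be taken from the page source.

```python
from re import findall, DOTALL

# Hypothetical forecast fragment; the real city page may use other tags and classes.
SAMPLE_FORECAST = '''
<ul class="t clearfix">
<li>
<h1>12th (today)</h1>
<p title="Cloudy" class="wea">Cloudy</p>
<p class="tem"><span>31</span>/<i>24</i></p>
</li>
<li>
<h1>13th (tomorrow)</h1>
<p title="Light rain" class="wea">Light rain</p>
<p class="tem"><span>29</span>/<i>23</i></p>
</li>
</ul>
'''


def parse_forecast(html):
    """Return one 'date: sky, high/low' line per <li> entry in html."""
    # DOTALL lets '.' cross line breaks, since each entry spans several lines.
    pattern = (r'<li.*?<h1>(.*?)</h1>.*?<p title="(.*?)"'
               r'.*?<span>(.*?)</span>.*?<i>(.*?)</i>')
    days = findall(pattern, html, DOTALL)
    return "\n".join(f"{date}: {sky}, {high}/{low}"
                     for date, sky, high, low in days)


forecast_text = parse_forecast(SAMPLE_FORECAST)
print(forecast_text)
```

Inside a Scrapy spider the same fields would more commonly be selected with XPath or CSS expressions on the response. The regular expression is used here only to keep the sketch free of third-party imports.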

8. Modify the items.py file to define the content to be scraped.

import scrapy


class SdweatherspiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    city = scrapy.Field()     # name of the city
    weather = scrapy.Field()  # forecast text for that city
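Since the goal of the experiment is to write the scraped data to weather.txt, items with these two fields eventually have to reach a file. One common way to do that in Scrapy is an item pipeline. The sketch below is an assumption about how that could look, not code from this article: the class name WeatherTextPipeline and the output layout are hypothetical, and a plain dict stands in for SdweatherspiderItem, which supports the same item['city'] style of access.

```python
import os
import tempfile


class WeatherTextPipeline:
    """Hypothetical pipeline: write each item's city and forecast to a text file."""

    def __init__(self, path="weather.txt"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts; open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every scraped item; write the city, then its forecast.
        self.file.write(f"{item['city']}\n{item['weather']}\n\n")
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes; close the output file.
        self.file.close()


# Stand-alone demonstration with a dict in place of a real item and spider.
demo_path = os.path.join(tempfile.mkdtemp(), "weather.txt")
pipeline = WeatherTextPipeline(demo_path)
pipeline.open_spider(spider=None)
pipeline.process_item({"city": "Yantai", "weather": "12th: Cloudy, 31/24"}, spider=None)
pipeline.close_spider(spider=None)

with open(demo_path, encoding="utf-8") as fp:
    written = fp.read()
print(written)
```

In a real project, a pipeline class like this lives in pipelines.py and only takes effect once it is listed in the ITEM_PIPELINES setting in settings.py.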

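9. Modify the spider file everyCityinSD.py to define how the content is scraped. The extraction rules come from the page analysis in the earlier steps. If the spider does not run correctly, the page structure may have changed; in that case, go back to those steps and analyze the page source again.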

from re import findall 
from urllib.request import urlopen 
import scrapy
