Configuration files
1. /etc/scrapy.cfg or c:\scrapy\scrapy.cfg (system-wide)
2. ~/.config/scrapy.cfg ($XDG_CONFIG_HOME) and ~/.scrapy.cfg ($HOME) for global settings (user-wide)
3. scrapy.cfg inside a Scrapy project's root (project-wide)
Priority: project-wide > user-wide > system-wide
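The precedence above can be sketched with the standard-library ConfigParser: feeding the files to one parser from lowest to highest priority makes each later scope override the earlier ones. The `default` values below are hypothetical stand-ins, not real Scrapy output.

```python
# A minimal sketch of per-scope scrapy.cfg merging, using only the
# standard library. The three strings stand in for the system-wide,
# user-wide, and project-wide files (contents are illustrative).
from configparser import ConfigParser

system_cfg = "[settings]\ndefault = system_project.settings\n"
user_cfg = "[settings]\ndefault = user_project.settings\n"
project_cfg = "[settings]\ndefault = myproject.settings\n"

parser = ConfigParser()
# read_string merges into existing sections, so later reads override
# earlier ones: system -> user -> project gives project-wide top priority.
for text in (system_cfg, user_cfg, project_cfg):
    parser.read_string(text)

print(parser.get("settings", "default"))  # → myproject.settings
```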
Common commands
Some commands behave differently depending on whether they are run inside or outside a project. They fall into two groups: global commands, which work anywhere, and project-only commands, which work only inside a project.
Global commands
scrapy startproject
scrapy startproject <project_name> [project_dir]
# Create a project named project_name under the project_dir directory;
# if project_dir is omitted, it defaults to project_name
# e.g. scrapy startproject myproject
scrapy genspider
scrapy genspider [-t template] <name> <domain>
# Generate a new spider from a template
# -t: use the named template (otherwise the default template is used)
# name: the spider's name
# domain: the domain to crawl
# e.g. scrapy genspider example example.com
scrapy settings
scrapy settings [options]
# Get the value of a Scrapy setting (the project's value inside a project,
# otherwise the default)
# e.g. scrapy settings --get BOT_NAME
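For context, `--get BOT_NAME` reads values like the ones a project's settings.py defines. The fragment below is illustrative — the names and values are assumptions, not output from a real project:

```python
# Illustrative fragment of a project's settings.py. Inside this project,
# `scrapy settings --get BOT_NAME` would print the BOT_NAME value.
BOT_NAME = "myproject"      # hypothetical project name
ROBOTSTXT_OBEY = True       # honor robots.txt
DOWNLOAD_DELAY = 0.5        # seconds between requests (example value)
```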
scrapy runspider
scrapy runspider <spider_file.py>
# Run a self-contained spider file directly, without creating a project
# e.g. scrapy runspider myspider.py
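A minimal self-contained spider file that `scrapy runspider` could execute might look like the sketch below, assuming Scrapy is installed; the file name, spider name, and URL are illustrative:

```python
# myspider.py — a minimal standalone spider for `scrapy runspider`.
# No project is required; Scrapy itself must be installed.
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["http://www.example.com/some/page.html"]

    def parse(self, response):
        # Yield the page title as a scraped item.
        yield {"title": response.css("title::text").get()}
```

Run it with `scrapy runspider myspider.py`, optionally adding `-o items.json` to save the scraped items.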
scrapy shell
scrapy shell [url]
# Start the Scrapy shell, optionally fetching the given URL first
# e.g. scrapy shell http://www.example.com/some/page.html
scrapy fetch
scrapy fetch <url>
# Download the page at the given URL and write its contents to standard output
# e.g. scrapy fetch --nolog http://www.example.com/some/page.html
scrapy view
scrapy view <url>
# Open the given URL in a browser, showing the page as Scrapy sees it
# e.g. scrapy view http://www.example.com/some/page.html
scrapy version
scrapy version [-v]
# Print the Scrapy version; with -v, also print the Python version and other details
Project-only commands
scrapy crawl
scrapy crawl <spider>
# Run a spider from the current project, referenced by name
# e.g. scrapy crawl myspider
scrapy check
scrapy check [-l] <spider>
# Check the project's spiders for errors via contract checks;
# -l only lists the contracts without running them
# e.g. scrapy check -l
scrapy list
scrapy list
# List all spiders available in the current project
scrapy edit
scrapy edit <spider>
# Open a spider's file in the configured editor
# e.g. scrapy edit spider1
scrapy parse
scrapy parse <url> [options]
# Fetch the given URL and parse it with the spider that handles it;
# -c/--callback selects the spider method used for parsing
# e.g. scrapy parse http://www.example.com/ -c parse_item
scrapy bench
scrapy bench
# Run a quick benchmark crawl to gauge how fast Scrapy runs on this machine