Scrapy Command Reference

This article walks through the Scrapy crawling framework: creating a project, generating spiders, and the most commonly used command-line operations, from initial setup to running a working crawler.


Create a project: scrapy startproject huwaiSpiders

Create a spider: scrapy genspider test8264 8264.com    # must be run inside the project directory

Run a spider: scrapy crawl mingyan2

Source: https://www.cnblogs.com/qlshine/p/5926102.html

List all commands

scrapy -h

Show help information

scrapy --help

Show version information

(venv)ql@ql:~$ scrapy version
Scrapy 1.1.2
(venv)ql@ql:~$ 
(venv)ql@ql:~$ scrapy version -v
Scrapy    : 1.1.2
lxml      : 3.6.4.0
libxml2   : 2.9.4
Twisted   : 16.4.0
Python    : 2.7.12 (default, Jul  1 2016, 15:12:24) - [GCC 5.4.0 20160609]
pyOpenSSL : 16.1.0 (OpenSSL 1.0.2g-fips  1 Mar 2016)
Platform  : Linux-4.4.0-36-generic-x86_64-with-Ubuntu-16.04-xenial
(venv)ql@ql:~$ 

Create a new project

scrapy startproject project_name
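startproject scaffolds a project directory roughly like the following (the exact layout varies slightly across Scrapy versions):

```
project_name/
    scrapy.cfg            # deploy configuration
    project_name/         # the project's Python module
        __init__.py
        items.py          # item definitions
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # put your spiders here
            __init__.py
```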

Generate a spider: genspider (generate spider)

A project can contain multiple spiders, but each spider's name must be unique.

scrapy genspider name domain
# e.g.:
# scrapy genspider sohu sohu.org

List all spiders in the current project

scrapy list

view: download a page and open it in the browser (as Scrapy sees it)

scrapy view http://www.baidu.com

shell: enter the Scrapy interactive shell

# open an interactive session for the given URL
scrapy shell http://www.dmoz.org/Computers/Programming/Languages/Python/Books/

This drops you into an interactive session. The most useful object there is `response`; for example:

response.xpath()    # pass an XPath expression directly
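response.xpath() evaluates a standard XPath expression against the downloaded page. As a rough stdlib illustration of the same idea (Python's xml.etree.ElementTree supports only a limited XPath subset, unlike Scrapy's full XPath engine):

```python
import xml.etree.ElementTree as ET

# a tiny well-formed document standing in for a downloaded page
doc = ET.fromstring(
    "<html><body><h1>Books</h1>"
    "<ul><li>Python 101</li><li>Scrapy Guide</li></ul></body></html>"
)

# like response.xpath('//h1/text()').extract_first() in the Scrapy shell
title = doc.find(".//h1").text
# like response.xpath('//li/text()').extract() in the Scrapy shell
items = [li.text for li in doc.findall(".//li")]

print(title)   # → Books
print(items)   # → ['Python 101', 'Scrapy Guide']
```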

The runspider command runs a single spider file directly, without needing a whole project (it takes the path to a .py file, not a spider name):

scrapy runspider my_spider.py