scrapy.crawler.CrawlerProcess

https://doc.scrapy.org/en/latest/topics/api.html#crawler-api

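Method | Description | Other
crawl(crawler_or_spidercls, *args, **kwargs) | Starts a crawl for the given crawler, spider class, or spider name; extra arguments are passed on to the spider. |
crawlers | Attribute, not a method: the set of crawlers that have been started through crawl() and are managed by this process. |
create_crawler(crawler_or_spidercls) | Creates and returns a Crawler. A Crawler instance is returned as-is, a Spider subclass gets a new Crawler, and a string is looked up as a spider name in the project. |
join() | Returns a deferred that is fired when all managed crawlers have completed their executions. |
start(stop_after_crawl=True) | Starts the Twisted reactor and blocks; when stop_after_crawl is True, the reactor is stopped once all crawlers have finished. |
stop() | Stops all running crawl jobs at the same time; returns a deferred that is fired when they have all ended. |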
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# 'followall' is the name of one of the spiders of the project.
process.crawl('followall', domain='scrapinghub.com')
process.start() # the script will block here until the crawling is finished