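For reference only.
Adjust the concurrency-related settings in settings.py (Scrapy is asynchronous, so these control concurrent requests rather than threads):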
# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 100
# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 0
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 100
CONCURRENT_REQUESTS_PER_IP = 100
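DOWNLOAD_DELAY defaults to 0, as the comment above states. The value 3 that appears in some generated settings.py files is only a commented-out example, not the default.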
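The comments in the snippet point to the AutoThrottle settings. With DOWNLOAD_DELAY at 0 and high concurrency, AutoThrottle can serve as a safety net, because it adjusts the delay from observed latency while never going below DOWNLOAD_DELAY or above the configured concurrency limits. The fragment below is a minimal sketch; the setting names are Scrapy's, but every value is an illustrative assumption to tune for your own target site.

```python
# settings.py fragment (sketch; all values are illustrative assumptions)
CONCURRENT_REQUESTS = 100             # global cap on in-flight requests
CONCURRENT_REQUESTS_PER_DOMAIN = 100  # cap per domain
DOWNLOAD_DELAY = 0                    # AutoThrottle never goes below this delay

AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1              # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 10               # upper bound on the delay under high latency
AUTOTHROTTLE_TARGET_CONCURRENCY = 16.0    # average parallel requests aimed at each remote site
```

If the remote server slows down, AutoThrottle raises the delay toward AUTOTHROTTLE_MAX_DELAY instead of continuing to send requests at full speed.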
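The defaults differ per setting: CONCURRENT_REQUESTS is 16, CONCURRENT_REQUESTS_PER_DOMAIN is 8, and CONCURRENT_REQUESTS_PER_IP is 0 (disabled). When CONCURRENT_REQUESTS_PER_IP is non-zero, the per-domain limit is ignored and the download delay is enforced per IP. Adjust these values to suit your own task.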
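The "will honor only one of" comment in the settings snippet can be illustrated with a small sketch. The function effective_slot_limit is a hypothetical helper written for this explanation, not part of Scrapy's API; it only mirrors the documented rule that a non-zero per-IP limit takes precedence over the per-domain limit.

```python
from typing import Tuple


def effective_slot_limit(per_domain: int, per_ip: int) -> Tuple[str, int]:
    """Hypothetical helper: report which setting governs per-slot concurrency.

    Mirrors the documented rule: a non-zero CONCURRENT_REQUESTS_PER_IP
    replaces CONCURRENT_REQUESTS_PER_DOMAIN.
    """
    if per_ip:  # non-zero: limits (and the download delay) apply per IP
        return ("CONCURRENT_REQUESTS_PER_IP", per_ip)
    return ("CONCURRENT_REQUESTS_PER_DOMAIN", per_domain)


# The values used in this post: the per-IP limit is the one in effect.
print(effective_slot_limit(per_domain=100, per_ip=100))
# Scrapy's defaults: the per-IP limit is disabled, so the per-domain limit applies.
print(effective_slot_limit(per_domain=8, per_ip=0))
```

With the configuration shown in this post, setting both values to 100 means only the per-IP limit is in effect; the global CONCURRENT_REQUESTS cap still applies on top of it.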
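This article describes how to edit Scrapy's settings.py to raise the global concurrency limit (CONCURRENT_REQUESTS) and the per-domain and per-IP limits (CONCURRENT_REQUESTS_PER_DOMAIN, CONCURRENT_REQUESTS_PER_IP), and to set the download delay (DOWNLOAD_DELAY) to 0, so that the crawler runs faster. It is intended as a reference for developers who need to fetch large amounts of data quickly.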




