Scrapy: Troubleshooting [scrapy.core.engine] DEBUG: Crawled (200)

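While learning Scrapy, I ran into a flood of DEBUG and httperror log messages. After checking that the links were valid, reviewing the settings and the code, and considering network latency, I concluded that frequent requests had probably gotten my IP blocked. Adding time.sleep(1) to lower the request rate resolved the problem. A reminder to beginners: control the interval between requests so that data collection is not cut off.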

While experimenting with Scrapy and trying to "get" information from a certain website, I saw a large number of [scrapy.core.engine] DEBUG and [scrapy.spidermiddlewares.httperror] messages in the log.
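It can help to separate the two kinds of messages before debugging. A "[scrapy.core.engine] DEBUG: Crawled (200)" line is a normal success entry, while the httperror middleware logs responses with a non-2xx status that it drops before they reach the callback. The sketch below tallies status codes from log text. The sample lines and the helper names count_statuses and failed_statuses are hypothetical and only mimic the shape of Scrapy's log output; they are not taken from the original run.

```python
import re
from collections import Counter

# Hypothetical sample lines shaped like Scrapy log output (not from the original run).
SAMPLE_LOG = """\
2025-01-01 10:00:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.com/list> (referer: None)
2025-01-01 10:00:01 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://example.com/item/1> (referer: https://example.com/list)
2025-01-01 10:00:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://example.com/item/1>: HTTP status code is not handled or not allowed
"""

# Matches the status code inside "Crawled (NNN)" entries written by the engine.
CRAWLED = re.compile(r"\[scrapy\.core\.engine\] DEBUG: Crawled \((\d{3})\)")


def count_statuses(log_text):
    """Count how many crawled responses were logged per HTTP status code."""
    return Counter(int(m.group(1)) for m in CRAWLED.finditer(log_text))


def failed_statuses(log_text):
    """Keep only the non-2xx statuses, i.e. the ones worth investigating."""
    return {s: n for s, n in count_statuses(log_text).items() if not 200 <= s < 300}


if __name__ == "__main__":
    print(count_statuses(SAMPLE_LOG))
    print(failed_statuses(SAMPLE_LOG))
```

If the only entries are Crawled (200), the requests themselves succeeded and the problem is more likely in the parsing code; a growing count of 4xx or 5xx statuses on detail pages points toward blocking or rate limiting.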

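I had only watched about half an hour of Scrapy tutorial videos, so I was instantly baffled. On the principle that solving a problem is the best way to learn, I started troubleshooting.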

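Step 1: Confirm that the href links passed to the detail-page parsing callback (def parse) are valid. Result: every page opened normally in a browser.

Step 2: Check the settings and items .py files for abnormal configuration. Result: normal.

Step 3: Recheck the code for "silly" mistakes. Result: normal.

Step 4: Search online for similar cases and apply their fixes. Result: nothing matched.

Step 5: As a last resort, analyze the cause myself. My reasoning was as follows: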

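1. The network was fine, the pages opened normally, the configuration was fine, and the code was fine, so my own side and the connection in between were basically not the problem.

2. The home page (the category listing) was fetched normally, but errors appeared when fetching data from the individual pages under each category. By a rough estimate the requests were fairly frequent, so I wondered whether this resembled the time I was put in the "penalty box" while "getting" cateyes data. Back then I was just debugging the code normally, but I had written it in advance to fetch N pages; after a few runs nothing came back, and the page reported that my IP was making requests too frequently and could not access the site for the time being.

3. I added a time.sleep(1) call to see the effect. Result: data from the individual pages was fetched normally.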
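The fix above used time.sleep(1). One caveat: Scrapy runs on a single-threaded Twisted reactor, so a blocking sleep inside a callback pauses every in-flight request, not just the current one. A common alternative is to let Scrapy do the throttling through its own settings. The fragment below is a minimal sketch of that alternative, not the configuration used in the original experiment; the values are illustrative assumptions, and delay_bounds is a hypothetical helper that only shows the range Scrapy draws from when RANDOMIZE_DOWNLOAD_DELAY is enabled.

```python
# settings.py fragment (all values are illustrative assumptions)

# Wait about 1 second between requests to the same site.
DOWNLOAD_DELAY = 1

# Randomize each wait between 0.5x and 1.5x DOWNLOAD_DELAY (Scrapy's default behavior).
RANDOMIZE_DOWNLOAD_DELAY = True

# Send one request at a time per domain so the delay is not bypassed by parallelism.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Optionally let the AutoThrottle extension adapt the delay to server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 10


def delay_bounds(delay):
    """Hypothetical helper: the (min, max) wait in seconds when the delay is randomized."""
    return (0.5 * delay, 1.5 * delay)


if __name__ == "__main__":
    print(delay_bounds(DOWNLOAD_DELAY))
```

With these settings the spider code needs no sleep calls, and the pacing applies to every request the spider schedules.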

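Conclusion:

When experimenting with "getting" website data from your own machine's IP, remember to set a delay for each page. This matters especially for beginners: while verifying code step by step, it is easy to run "scrapy crawl XXXXX" again and again without thinking, and after a few runs the data simply stops coming, which is maddening and confusing.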

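Note:

The activity described above was a test carried out for study and research; the data obtained has been deleted or was not saved.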
