Scrapy -- Handling Exceptions in Downloader Middleware


License: CC 4.0 BY-SA


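The code below defines a ProcessException downloader middleware that handles the TCP timeout (TCPTimedOutError) and general timeout (TimeoutError) exceptions raised while Scrapy downloads a request. When one of these exceptions occurs, it reports the error and returns an HtmlResponse whose url is set to 'exception'. Returning a Response from process_exception() tells Scrapy to stop calling the remaining process_exception() methods and to pass this response through process_response() to the spider callback, so the callback has to recognise the placeholder.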

Code

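from twisted.internet.error import TCPTimedOutError, TimeoutError
from scrapy.http import HtmlResponse


class ProcessException:
    """Downloader middleware that turns timeout errors into a placeholder response."""

    def process_exception(self, request, exception, spider):
        if isinstance(exception, TCPTimedOutError):
            spider.logger.warning(f"Exception (TCP timeout) --> {exception}")
        elif isinstance(exception, TimeoutError):
            spider.logger.warning(f"Exception (timeout) --> {exception}")
        else:
            # Not a timeout: return None so other middlewares and Scrapy's
            # default handling deal with the exception as usual
            return None
        # Returning a Response stops the process_exception() chain and sends
        # this placeholder through process_response() to the spider callback
        return HtmlResponse(url='exception', request=request)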