2021SC@SDUSC
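Finally, to wrap up the Software Engineering Application course, I stayed up late and worked through the code for the Response part. The analysis below follows the official documentation: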
Response objects:
Class: scrapy.http.Response(*args, **kwargs) (the source code is attached at the end)
A Response object represents an HTTP response, which is usually downloaded (by the Downloader) and fed to the Spiders for processing.
Parameters
- url (str) – the URL of this response
- status (int) – the HTTP status of the response. Defaults to 200.
- headers (dict) – the headers of this response. The dict values can be strings (for single valued headers) or lists (for multi-valued headers).
- body (bytes) – the response body. To access the decoded text as a string, use response.text from an encoding-aware Response subclass, such as TextResponse.
- flags (list) – is a list containing the initial values for the Response.flags attribute. If given, the list will be shallow copied.
- request (scrapy.http.Request) – the initial value of the Response.request attribute. This represents the Request that generated this response.
- certificate (twisted.internet.ssl.Certificate) – an objec

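This article covers the Response object of the Scrapy framework in detail, including its constructor, attributes such as url, status, headers and body, and methods such as follow_all() and replace(). A Response object represents an HTTP response, which is usually downloaded by the Downloader and then handed to the spider for processing. TextResponse, HtmlResponse and XmlResponse are subclasses of Response: TextResponse adds encoding support for text content, while HtmlResponse and XmlResponse add format-specific automatic encoding detection. The follow_all() method generates multiple Request instances so that the links found on a page can be followed and crawled.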