1. Installation
Follow the installation guide in the official documentation step by step. I installed on Windows:
http://scrapy-chs.readthedocs.org/zh_CN/latest/intro/install.html#windows
2. First look
Still following the official documentation, continue with the tutorial:
http://scrapy-chs.readthedocs.org/zh_CN/latest/intro/tutorial.html
However, running the spider produced the following warning:
E:\Python Workspace\tutorial>scrapy crawl dmoz
:0: UserWarning: You do not have a working installation of the service_identity
module: 'No module named service_identity'. Please install it from <https://pyp
i.python.org/pypi/service_identity> and make sure all of its dependencies are sa
tisfied. Without the service_identity module and a recent enough pyOpenSSL to s
upport it, Twisted can perform only rudimentary TLS client hostname verification
. Many valid certificate/hostname mappings may be rejected.
2015-05-28 11:23:20+0800 [scrapy] INFO: Scrapy 0.24.6 started (bot: tutorial)
As the message suggests, download and install service_identity. Get the whl file from https://pypi.python.org/pypi/service_identity#downloads
Install it with pip: pip install service_identity-14.0.0-py2.py3-none-any.whl
Running the spider again produces another error:
raise ImportError("Error loading object '%s': %s" % (path, e))
ImportError: Error loading object 'scrapy.core.downloader.handlers.s3.S3Download
Handler': DLL load failed: 找不到指定的模块。
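The DLL load failure (the localized message reads "the specified module could not be found") means pywin32 needs to be installed. One caveat: the installer I downloaded from the official site failed during installation, while another copy I had downloaded earlier, with exactly the same version number and file size, installed without problems.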
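Before running scrapy crawl again, it can help to confirm that both dependencies actually import. The following is only a minimal sketch and not part of the Scrapy tutorial: check_modules is a hypothetical helper name, and win32api is assumed here as a representative module installed by pywin32.

```python
import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError as exc:
            # A "DLL load failed" problem on Windows is also raised as ImportError.
            print("%s: import failed (%s)" % (name, exc))
            results[name] = False
    return results


if __name__ == "__main__":
    # service_identity is the module named in the warning;
    # win32api is assumed to be provided by pywin32.
    print(check_modules(["service_identity", "win32api"]))
```

If either entry comes back False, the printed message should show whether the module is missing or whether a DLL failed to load, which points to which install to retry.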



