安装下载Centos下安装Scrapy 安装下载

本文提供了一份详细的Scrapy安装步骤说明,涵盖了从Python环境搭建到各依赖库安装的全过程,并特别指出了解决某些常见问题的方法。

最近用使开发的过程中出现了一个小问题,顺便记录一下原因和方法--安装下载

    Scrapy是一个开源的遇机twisted框架的python的单机爬虫,该爬虫实际上包括大多数网页抓取的工具包,用于爬虫下载端以及取抽端。

    安装环境:

 

centos5.4
python2.7.3

 

    安装步调:

    1.下载python2.7  http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz

[root@zxy-websgs ~]# wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt

[root@zxy-websgs opt]# tar xvf Python-2.7.3.tgz 

[root@zxy-websgs Python-2.7.3]# ./configure 

[root@zxy-websgs Python-2.7.3]# make && make install

验证python2.7安装

[root@zxy-websgs Python-2.7.3]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()

2.安装setuptools,http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz

[root@zxy-websgs ~]# wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
[root@zxy-websgs opt]# tar zxvf setuptools-0.6c11.tar.gz 
[root@zxy-websgs setuptools-0.6c11]# python2.7 setup.py  install

 

    3.安装Twisted

[root@zxy-websgs setuptools-0.6c11]# easy_install Twisted
......
Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
......
Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg

Twisted要安装zope.interface,可以从面下地址下载

    zope.interface:http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz

    twisted:http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2

    5.安装w3lib

[root@zxy-websgs setuptools-0.6c11]# easy_install -U w3lib
Searching for w3lib
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file

Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
Processing dependencies for w3lib
Finished processing dependencies for w3lib

w3lib:http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz

    6.安装libxml2或者用easy_install安装lxml

[root@zxy-websgs lxml-3.1.0]# easy_install lxml

验证lxml安装

[root@zxy-websgs lxml-3.1.0]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> exit()

也可以安装libxml2,官网上荐推安装2.6.28或者以上的本版,但在官网上没找到,我先是安装的2.6.9的本版,行运scrapy时报以下错误

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 5, in <module>
    pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
    execute()
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
    cmds = _get_commands_dict(inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
    for module in walk_modules(module_name):
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
    from scrapy.shell import Shell
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
    from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
    from scrapy.selector.libxml2sel import *
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
    from .factories import xmlDoc_from_html, xmlDoc_from_xml
  File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
    libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

升级到2.6.21本版当前决解了。

    libxml2.6.1:ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz

    7.安装pyOpenSSL(这个是可选安装的,重要为了使scrapy够能持支https)

    用easy_install pyOpenSSL安装的是pyOpenSSL-0.13本版,没安装功成,于是手动下载.011本版来停止安装。

[root@zxy-websgs opt]# wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
[root@zxy-websgs opt]# tar zxvf pyOpenSSL-0.11.tar.gz 
[root@zxy-websgs pyOpenSSL-0.11]# python2.7 setup.py install

pyOpenSSL:http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz

    8.安装scrapy

[root@zxy-websgs pyOpenSSL-0.11]# easy_install -U Scrapy

验证安装

[root@zxy-websgs pyOpenSSL-0.11]# scrapy
Scrapy 0.16.4 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

scrapy:http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz

    结总:

    pyOpenSSL独自安装的时候不功成,也可以先下载pyOpenSSL0.11停止安装,再用使easy_install -U Scrapy停止全程安装

文章结束给大家分享下程序员的一些笑话语录: 火车
一个年轻的程序员和一个项目经理登上了一列在山里行驶的火车,他们发现 列车上几乎都坐满了,只有两个在一起的空位,这个空位的对面是一个老奶 奶和一个年轻漂亮的姑娘。两个上前坐了下来。程序员和那个姑娘他们比较 暧昧地相互看对方。这时,火车进入山洞,车厢里一片漆黑。此时,只听见 一个亲嘴的声音,随后就听到一个响亮的巴掌声。很快火车出了山洞,他们 四个人都不说话。
那个老奶奶在喃喃道, “这个年轻小伙怎么这么无礼, 不过我很高兴我的孙女 扇了一个巴掌”。
项目经理在想,“没想到这个程序员居然这么大胆,敢去亲那姑娘,只可惜那 姑娘打错了人,居然给打了我。”
漂亮的姑娘想,“他亲了我真好,希望我的祖母没有打疼他”。
程序员坐在那里露出了笑容, “生活真好啊。 这一辈子能有几次机会可以在亲 一个美女的同时打项目经理一巴掌啊”

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值