Tools
Wing IDE Pro 6.0
Python 3.6
Scrapy 1.5.0
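Following a book tutorial, I tried scraping the latest headlines from a digital printing news site.
- Run scrapy startproject digiprintnews to create the project;
- Run scrapy genspider basic web to create the spider file;
- The code in items.py: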
import scrapy


class DigiprintnewsItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()
    title = scrapy.Field()
    new_urls = scrapy.Field()
    date = scrapy.Field()
    news_from = scrapy.Field()
- The code in basic.py:
import scrapy

from digiprintnews.items import DigiprintnewsItem


class BasicSpider(scrapy.Spider):
    name = 'basic'
    # 'web' is the value generated by `scrapy genspider basic web`
    allowed_domains = ['web']
    start_urls = ['https://www.chinakuaiyin.cn']

    def parse(self, response):
        item = DigiprintnewsItem()
        # .extract() converts the matched selectors into plain lists of strings
        item['title'] = response.xpath('//div[2]/div[2]/div[2]/ul/li/a/@title').extract()
        item['new_urls'] = response.xpath('//div[2]/div[2]/div[2]/ul/li/a/@href').extract()
        return item
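The moment I ran it, it failed with: No module named 'digiprintnews'. I searched all over. Some posts said PyCharm needs to run once to generate the path, others went on about absolute versus relative paths. Either way, it was clearly a path problem: Python could not find where the digiprintnews package lives.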
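To make the cause concrete, here is a minimal sketch of the same failure outside Scrapy. It builds a throwaway package in a temporary directory and tries to import it before and after that directory is added to sys.path. The package name demo_project_pkg and the helper names are hypothetical, chosen only for this illustration; they are not part of the tutorial project.

```python
import importlib
import os
import sys
import tempfile


def build_fake_project(root):
    """Create root/demo_project_pkg/{__init__.py, items.py, spiders/__init__.py}."""
    pkg_dir = os.path.join(root, "demo_project_pkg")
    spiders_dir = os.path.join(pkg_dir, "spiders")
    os.makedirs(spiders_dir)
    for directory in (pkg_dir, spiders_dir):
        open(os.path.join(directory, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "items.py"), "w") as f:
        f.write("ITEM_NAME = 'demo'\n")


def can_import(module_name):
    """Return True if module_name can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def demo():
    """Try the import before and after the project root is on sys.path."""
    with tempfile.TemporaryDirectory() as root:
        build_fake_project(root)
        before = can_import("demo_project_pkg.items")  # root not on sys.path yet
        sys.path.append(root)
        try:
            after = can_import("demo_project_pkg.items")
        finally:
            # Clean up so the demo leaves the interpreter as it found it.
            sys.path.remove(root)
            sys.modules.pop("demo_project_pkg.items", None)
            sys.modules.pop("demo_project_pkg", None)
    return before, after


print(demo())
```

The first attempt fails for the same reason the spider did: the directory that contains the package is not among the places Python searches. Once that directory is on sys.path, the identical import succeeds.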
I finally found a fix. Original source: https://blog.youkuaiyun.com/smh2208/article/details/80955126?utm_source=blogxgwz6
Add the following code to basic.py, above the line that imports from digiprintnews:
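import sys
import os

# one level above the directory that contains this file
fpath = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
# one more level up; for a spider in digiprintnews/spiders/ this is the project root
ffpath = os.path.abspath(os.path.join(fpath, ".."))
sys.path.append(ffpath)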
After that, running it again works and the problem is gone.
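To check which directory that fix actually adds, the path arithmetic can be wrapped in a small helper and tried on a sample path. This is only a sketch: the function names project_root_for and add_project_root are hypothetical, and the example location under /work is made up. It assumes the default layout created by scrapy startproject, where the spider sits in digiprintnews/digiprintnews/spiders/.

```python
import os
import sys


def project_root_for(spider_file):
    """Return the directory two levels above the folder containing spider_file."""
    spiders_dir = os.path.dirname(os.path.abspath(spider_file))
    package_dir = os.path.abspath(os.path.join(spiders_dir, ".."))
    return os.path.abspath(os.path.join(package_dir, ".."))


def add_project_root(spider_file):
    """Append the project root to sys.path if it is not already there."""
    root = project_root_for(spider_file)
    if root not in sys.path:
        sys.path.append(root)
    return root


# Hypothetical location, used only to show the path arithmetic.
example = os.path.join(
    os.sep, "work", "digiprintnews", "digiprintnews", "spiders", "basic.py"
)
print(project_root_for(example))  # on Linux/macOS: /work/digiprintnews
```

Inside basic.py the call would be add_project_root(__file__), placed before the digiprintnews import. The result is the outer digiprintnews folder, which is the one that holds the inner digiprintnews package, so the import can then be resolved.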