When crawling images with Python's Scrapy framework, the run fails with the following error:
ValueError: Missing scheme in request url: //cdn.shopify.com/s/files/1/1182/9792/products/ARD1040_3_100x.jpg?v=1527112369
2018-05-24 23:02:44 [scrapy.core.scraper] ERROR: Error processing {'imageLink': u'//cdn.shopify.com/s/files/1/1182/9792/products/ARD1038_100x.jpg?v=1523948814'}
The spider source code is as follows:
# -*- coding: utf-8 -*-
import scrapy
from sheergirl.items import SheergirlItem


class SheerSpider(scrapy.Spider):
    name = 'sheer'
    allowed_domains = ['sheergirl.com']
    offset = 1
    url = "https://www.sheergirl.com/collections/prom-dresses?page="
    start_urls = [url + str(offset)]

    def parse(self, response):
        # Each product tile is a div whose class contains "three columns".
        subSelector = response.xpath('//div[contains(@class,"three columns")]')
        for each in subSelector:
            item = SheergirlItem()
            # The src attribute is protocol-relative (it starts with "//"),
            # so the value stored here has no scheme.
            item["imageLink"] = each.xpath('.//img/@src').extract()[0]

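Summary: crawling images with Scrapy raised `ValueError: Missing scheme in request url`. Manually prepending 'http:' to the image URL extracted by the XPath expression resolved the problem. The cause is that scrapy.Request requires a complete URL including the scheme, whereas the URL extracted by XPath is protocol-relative: it starts with `//` and lacks the `http:` or `https:` part.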
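
The fix can be sketched outside Scrapy with the standard library alone. This is a minimal illustration assuming Python 3; `prepend_http` and `absolutize` are hypothetical helper names, not part of the original spider or of Scrapy. The first helper mirrors the fix described above. The second resolves the link against the page URL with `urllib.parse.urljoin`, so the image inherits the page's scheme.

```python
from urllib.parse import urljoin

# Page the spider starts from (taken from start_urls in the spider above).
PAGE_URL = "https://www.sheergirl.com/collections/prom-dresses?page=1"


def prepend_http(src):
    """Hypothetical helper: the fix from the post, applied only to '//...' URLs."""
    return "http:" + src if src.startswith("//") else src


def absolutize(src, page_url=PAGE_URL):
    """Hypothetical helper: resolve src against the page URL.

    A protocol-relative src inherits the page's scheme; an already
    absolute URL is returned unchanged.
    """
    return urljoin(page_url, src)


# The scheme-less value reported in the error log.
src = "//cdn.shopify.com/s/files/1/1182/9792/products/ARD1038_100x.jpg?v=1523948814"

print(prepend_http(src))
print(absolutize(src))
```

Inside the spider, the same resolution is available as `response.urljoin(...)`, which joins against the URL of the response being parsed. That may be preferable to hard-coding 'http:', since it keeps working if the site mixes relative and absolute image links.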



