scrapy ValueError: Missing scheme in request url://www.xxxx.com.html

Latest recommended article published 2025-10-30 11:50:52

License: CC 4.0 BY-SA

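This article describes how to handle incomplete URLs in a crawler: use response.urljoin() to turn a relative path into a complete URL so that the link can be requested correctly.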

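The URL is simply missing its scheme (the http:// or https:// part). Complete it into a full absolute URL and the error goes away.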

    urll = nodeList[i].extract()  # the extracted link has no scheme (no "http://" or "https://")
    urlll = response.urljoin(urll)  # resolve it against the page URL to get a complete URL
    print("object_url_xpath :" + urlll)  # print one of the resolved links
    yield scrapy.Request(urlll, meta={'item': item}, callback=self.parse, headers=self.headers)
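
To see what that urljoin step does, here is a minimal sketch using the standard library's urllib.parse.urljoin, which as far as I know is what response.urljoin() calls with the page URL as the base (a <base> tag in the page, if any, is ignored here). The page URL, the hrefs, and the has_scheme helper are made-up examples, not taken from the original spider.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical page URL standing in for response.url
page_url = "https://www.example.com/list/index.html"

# Hypothetical hrefs of the kinds an XPath query can return
hrefs = [
    "detail/1.html",               # relative path
    "/item/2.html",                # root-relative path
    "//cdn.example.com/a.html",    # protocol-relative: host but no scheme
    "http://other.example.org/x",  # already absolute
]


def has_scheme(url):
    """Return True if the URL carries a scheme such as http or https."""
    return bool(urlparse(url).scheme)


# Resolve every href against the page it was found on
resolved = [urljoin(page_url, href) for href in hrefs]

for href, full in zip(hrefs, resolved):
    print(f"{href!r:32} scheme={has_scheme(href)!s:5} -> {full}")
```

The first three hrefs have no scheme, so passing them straight to scrapy.Request would raise the "Missing scheme" error. An already absolute URL passes through unchanged, so it is safe to apply the join to every extracted link.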

5 comments

地推789 2021.12.25
Just here to learn.

Latity 2021.04.28
[code=python]
import scrapy
from ..items import Scrapy6Item
import response


class TestSpider(scrapy.Spider):
    name = 'test'
    # allowed_domains = ['url']
    start_urls = [
        'https://list.tmall.com/search_product.htm?q=%CA%D6%BB%FA&type=p&vmarket=&spm=875.7931836%2FB.a2227oh.d100&from=mallfp..pc_1_searchbutton']

    def parse(self, response):
        # Get the detail-page links and keep calling the detail-data method via yield (inside the parse method)
        detail_page = response.xpath('//*[@id="J_ItemList"]/div/div/div[1]/a//@href').extract()
        print(detail_page)
        title = response.xpath('//*[@id="J_ItemList"]/div/div/p[2]/a//text()').extract()
        shop = response.xpath('//*[@id="J_ItemList"]/div/div/div[2]/a//text()').extract()
        price = response.xpath('//*[@id="J_ItemList"]/div/div/p[1]/em//text()').extract()
        deal = response.xpath('//*[@id="J_ItemList"]/div/div/p[3]/span[1]//text()').extract()
        for i in range(len(detail_page)):
            yield scrapy.Request(url=detail_page[i],
[/code]
- Latity replied to Latity 2021.04.28
  I tried your approach, but it still doesn't work.
- Latity replied to Latity 2021.04.28
  detail_page = response.xpath('//*[@id="J_ItemList"]/div/div/div[1]/a//@href').extract() returns hrefs like ['//detail.tmall.com/item.htm?id=640814759553&skuId=4784617971081&user_id=1714128138&cat_id=2&is_b=1&rn=fd869eb90882a7ad3e79847db96374fe',], and then I get ERROR: Spider error processing <GET https://list.tmall.com/search_product.htm?q=%CA%D6%BB%FA&type=p&vmarket=&spm=875.7931836%2FB.a2227oh.d100&from=mallfp..pc_1_searchbutton> (referer: None) together with ValueError: Missing scheme in request url: //detail.tmall.com/item.htm?id=640814759553&skuId=4784617971081&user_id=1714128138&cat_id=2&is_b=1&rn=fd869eb90882a7ad3e79847db96374fe. How should I fix this?
- Latity replied to Latity 2021.04.28
  [code=python]
                                 meta={"title": title[i], "shop": shop[i], "price": price[i], 'deal': deal[i]})

        # inside the parse method
        next_page = response.xpath('//*[@id="content"]/div/div[8]/div/b[1]/a[3]/@href').extract_first()
        base_url = "https://list.tmall.com/search_product.htm?q=%CA%D6%BB%FA&type=p&vmarket=&spm=875.7931836%2FB.a2227oh.d100&from=mallfp..pc_1_searchbutton"
        if next_page:
            yield scrapy.Request(url=base_url + next_page, callback=self.parse)

    # method that scrapes the detail page
    def parse_detail(self, response):
        meta = response.meta
        address = response.xpath('//*[@id="J_deliveryAdd"]//text()').extract()
        address = []
        for i in address:
            address.append(i.strip())
        address = "".join(address)  # join the extracted span texts
        item = Scrapy6Item()  # create a new item object
        item['file_title'] = meta['title']
        item['file_shop'] = meta['shop']
        item['file_price'] = meta['price']
        item['file_deal'] = meta['dea
  [/code]
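
For the Tmall case described in this thread, the hrefs start with //, so they carry a host but no scheme, and the posted spider passes detail_page[i] to scrapy.Request without resolving it first. Below is a rough sketch of the resolution step using only the standard library. The shortened URLs and the next_page value are hypothetical stand-ins, and in the spider itself the same step would presumably be response.urljoin(detail_page[i]).

```python
from urllib.parse import urljoin

# Shortened stand-in for the listing page URL (response.url in the spider)
list_url = "https://list.tmall.com/search_product.htm?q=phone"

# Protocol-relative href of the kind reported in the comment (query shortened)
detail_href = "//detail.tmall.com/item.htm?id=640814759553"

# The scheme is inherited from the page the link was found on
detail_url = urljoin(list_url, detail_href)
print(detail_url)

# Hypothetical next-page href; the real value on the site may differ
next_page = "?q=phone&s=60"

# String concatenation stacks a second query string onto the first
concatenated = list_url + next_page
# urljoin replaces the query of the base URL instead
joined = urljoin(list_url, next_page)
print(concatenated)
print(joined)
```

The same reasoning suggests replacing base_url + next_page with response.urljoin(next_page), since concatenation only gives a valid URL when the href happens to be a bare suffix of the base.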