Purpose
Scrapy ships with two Item Pipelines dedicated to downloading files and images:
FilesPipeline
ImagesPipeline
This post focuses on ImagesPipeline.
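As a rough sketch of what using ImagesPipeline involves, the built-in pipeline is normally switched on in the project's settings.py. The storage directory name "images" and the priority value 1 below are assumptions chosen for illustration, not values taken from this post. ImagesPipeline also needs the Pillow library installed.

```python
# settings.py (sketch)
import os

# Hypothetical storage directory: an "images" folder under the current working directory
IMAGES_STORE = os.path.join(os.getcwd(), "images")

# Enable Scrapy's built-in ImagesPipeline; the number is its order among pipelines
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}
```

With these two settings in place, the pipeline downloads whatever URLs an item carries in its image_urls field and saves them under IMAGES_STORE.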
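Target analysis:
This time we are scraping Autohome: car.autohome.com.cn. I have taken a liking to the Geely Boyue lately, so I have been reading a lot about this car.
Let's open the image pages for the Boyue:
Downloading images the traditional Scrapy way
Implementation with Scrapy:
1. Create the Scrapy project and spider: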
$ scrapy startproject Geely
$ cd Geely
$ scrapy genspider BoYue car.autohome.com.cn
2. Write items.py:
import scrapy


class GeelyItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    # Image category
    catagory = scrapy.Field()
    # Image URLs to download (read by ImagesPipeline)
    image_urls = scrapy.Field()
    # Download results, filled in by ImagesPipeline
    images = scrapy.Field()
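By default ImagesPipeline reads the list in image_urls and writes its download results into images, so the spider only has to fill in image_urls with absolute URLs. The sketch below shows one way to normalize the src values found on a page. It uses a plain dict in place of GeelyItem so that it runs without Scrapy, and the helper name build_item, the page URL, and the src values are all hypothetical.

```python
from urllib.parse import urljoin


def build_item(page_url, category, srcs):
    """Return a plain dict shaped like GeelyItem, with absolute image URLs."""
    return {
        "catagory": category,  # field name spelled as in items.py
        # urljoin resolves protocol-relative ("//host/x.jpg") and
        # root-relative ("/x.jpg") values against the page URL
        "image_urls": [urljoin(page_url, src) for src in srcs],
    }


# Hypothetical page URL and src values, for illustration only
item = build_item(
    "https://example.com/pic/list.html",
    "exterior",
    ["//img.example.com/a.jpg", "/photos/b.jpg"],
)
print(item["image_urls"])
```

Inside a real spider callback, response.urljoin(src) does the same job using the response's own URL as the base.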
3. Write the Spider:
# -*- coding: utf-8 -*-
import scrapy
# Import CrawlSpider and Rule; the generated def parse(self, response) method must be replaced
from scrapy.spiders import CrawlSpider, Rule
# Import the link extractor
from scrapy.linkextractors import LinkExtractor
from Geely.items import GeelyItem


class BoyueSpider(CrawlSpider):
    name = 'BoYue'
    allowed_domains = ['car.autohome.com.cn']
    start_urls = ['https://car.autohome.com.cn/pic/series/37