This is my first technical blog post; from here on I'll keep working steadily at improving my skills.
To start building a solid Python foundation, I'll use the Scrapy package to crawl the names of all champions on the League of Legends homepage and save them to a local file.
Tools and Environment
1. Python 2.7
2. IDE: PyCharm
3. Scrapy
Code Implementation
1. Console
scrapy startproject lolheros
cd lolheros
scrapy genspider loler lol.qq.com
2. Project
#items
import scrapy

class LolherosItem(scrapy.Item):
    # define the fields for your item here:
    names = scrapy.Field()
#spiders
import scrapy
from lolheros.items import LolherosItem

class LolerSpider(scrapy.Spider):
    name = 'loler'
    allowed_domains = ['lol.qq.com']
    start_urls = ['http://lol.qq.com/web201310/info-heros.shtml']

    def parse(self, response):
        item = LolherosItem()
        # Extract every champion name from the hero list on the page
        item['names'] = response.xpath('//*[@id="jSearchHeroDiv"]/li/a/text()').extract()
        return [item]
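To see what that XPath expression actually selects, here is a minimal sketch outside Scrapy using only the standard library. The sample HTML and the champion names in it are made up stand-ins for the real page, which has a `ul` with id `jSearchHeroDiv` containing one `li > a` per champion:

```python
# Sketch of what //*[@id="jSearchHeroDiv"]/li/a/text() selects.
# The sample HTML below is an invented stand-in for the real page.
import xml.etree.ElementTree as ET

sample = """
<html><body>
<ul id="jSearchHeroDiv">
  <li><a>Annie</a></li>
  <li><a>Olaf</a></li>
</ul>
</body></html>
"""

root = ET.fromstring(sample)
# ElementTree's limited XPath equivalent of the spider's expression
names = [a.text for a in root.findall(".//*[@id='jSearchHeroDiv']/li/a")]
print(names)  # ['Annie', 'Olaf']
```

Scrapy's own selectors are more powerful than ElementTree's subset, but the structure of the query is the same: find the hero list by id, then take the text of each link inside it.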
#pipelines
import codecs
import os

class LolherosPipeline(object):
    def process_item(self, item, spider):
        base_dir = os.getcwd()
        filename = base_dir + '/lolheros.txt'
        # 'w' mode, not 'rb' ('rb' is read-only and binary), and an explicit
        # utf-8 encoding because the champion names are Chinese
        with codecs.open(filename, 'w', encoding='utf-8') as f:
            for name in item['names']:
                f.write(name + '\n')
        return item
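The file-writing step of the pipeline can be tried on its own without running a crawl. This is a sketch under the assumption that the item behaves like a dict with a `names` list (which `scrapy.Item` does support); the names here are placeholders:

```python
# Standalone sketch of the pipeline's file-writing step,
# using a plain dict in place of a LolherosItem.
import os

def write_names(item, filename):
    # One hero name per line, mirroring what process_item does
    with open(filename, 'w') as f:
        for name in item['names']:
            f.write(name + '\n')

item = {'names': ['Annie', 'Olaf']}
path = os.path.join(os.getcwd(), 'lolheros_demo.txt')
write_names(item, path)
```

After this runs, `lolheros_demo.txt` contains one name per line, which is the same layout the real pipeline produces in `lolheros.txt`.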
#settings
BOT_NAME = 'lolheros'
SPIDER_MODULES = ['lolheros.spiders']
NEWSPIDER_MODULE = 'lolheros.spiders'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {
    'lolheros.pipelines.LolherosPipeline': 300,
}