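First, install Scrapy. Installing it on Windows is a bit of a trap: running pip install scrapy directly fails with an error. Scrapy is built on Twisted, so Twisted has to be installed manually first. See my previous blog post for how to do that.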
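The Windows workaround itself is covered in the previous post. As a small sanity check, here is a hypothetical helper (not part of Scrapy) that reports whether Twisted and Scrapy can be found in the current Python environment. It uses importlib.util.find_spec, which locates a top-level package without importing it.

```python
import importlib.util


def check_scrapy_deps(packages=("twisted", "scrapy")):
    """Map each top-level package name to True if it can be found, else False.

    find_spec returns None for a missing top-level package instead of
    raising ImportError, so nothing is actually imported here.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    status = check_scrapy_deps()
    for name, ok in status.items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
    if not status["twisted"]:
        # Twisted must be in place before `pip install scrapy` succeeds on Windows
        print("Install Twisted first, then run: pip install scrapy")
```

If both entries report "installed", the environment is ready for the next steps.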
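Create the crawler project: scrapy startproject yangguang2
Then enter the project directory: cd yangguang2
Generate the spider: scrapy genspider ygspider url (the last argument is the domain the spider will crawl, here wz.sun0769.com)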
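After these three commands the project should contain the standard Scrapy skeleton plus the new spider file. The sketch below is a hypothetical checker (not a Scrapy tool) that lists any expected file missing under the project root. The file list follows the layout described in the Scrapy tutorial for recent versions; older releases may not generate middlewares.py.

```python
from pathlib import Path

# Files expected after `scrapy startproject yangguang2` and
# `scrapy genspider ygspider ...` (recent Scrapy versions; older
# releases may not create middlewares.py).
EXPECTED_FILES = [
    "scrapy.cfg",
    "yangguang2/__init__.py",
    "yangguang2/items.py",
    "yangguang2/middlewares.py",
    "yangguang2/pipelines.py",
    "yangguang2/settings.py",
    "yangguang2/spiders/__init__.py",
    "yangguang2/spiders/ygspider.py",
]


def missing_files(project_root):
    """Return the expected project files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the directory entered with `cd yangguang2`.
    missing = missing_files(".")
    print("layout OK" if not missing else f"missing: {missing}")
```

Note that items.py is only a stub at this point; the spider below imports Yangguang2Item from it, so that class needs its fields defined there.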
Next comes the main program, the spider in yangguang2/spiders/ygspider.py:
# -*- coding: utf-8 -*-
import scrapy
from yangguang2.items import Yangguang2Item


class YgspiderSpider(scrapy.Spider):  # handles the main listing page
    name = 'ygspider'
    allowed_domains = ['wz.sun0769.com']
    start_urls = ['http://d.wz.sun0769.com/index.php/question/huiyin']

    def parse(self, response):
        # select every row of the second table inside the listing container
        tr_list = response.xpath(
            '//div[@class="newsHead clearfix"]/table[2]/tr')
        for tr in tr_list:
            item = Yangguang2Item()
            # the title is the text of the news14 link in the third cell
            item['title'] = tr.xpath(
                './td[3]/a[@class="news14"]/text()').extract_first()
            item['href'] = tr.xpath(
                './td[3]/a[@class="news14