1. Create a project
- scrapy startproject <project name>
My project is called Neteasy_music, so the command is scrapy startproject Neteasy_music
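2. Create a spider
First change into the project directory, then generate the spider:
- cd <project name>
- scrapy genspider <spider name> <site address>
I named the spider neteasy_music and the page to crawl is music.163.com/discover/artist, so the command is scrapy genspider neteasy_music music.163.com/discover/artist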
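Note that the spider file in the next step lists only the host, music.163.com, in allowed_domains, while the full page address goes in start_urls. If the generated file ends up with the path inside allowed_domains, it is worth trimming it by hand. A minimal sketch of that split, using only the standard library; domain_for_genspider is a hypothetical helper name chosen for illustration, not a Scrapy function:

```python
from urllib.parse import urlparse


def domain_for_genspider(url: str) -> str:
    """Return only the host part of a URL (hypothetical helper, not part of Scrapy)."""
    # urlparse only fills in netloc when a scheme or a leading "//" is present,
    # so prepend "//" when the caller passes a bare "host/path" string.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlparse(url).netloc


# The address used in this tutorial, written without a scheme.
domain = domain_for_genspider("music.163.com/discover/artist")
print(domain)
```

The value printed here is what belongs in allowed_domains; the full address, scheme and path included, belongs in start_urls.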
3. Write the spider file
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request

from Neteasy_music.items import SingerItem


class NeteasyMusicSpider(scrapy.Spider):
    name = 'neteasy_music'
    allowed_domains = ['music.163.com']
    start_urls = ['https://music.163.com/discover/artist']
    base_url = 'https://music.163.com'

    def parse(self, response):