Create the project:
scrapy startproject ScrapyDemo
cd ScrapyDemo
scrapy genspider bigqcwy msearch.51job.com
Add the fields to scrape in items.py:
class ScrapydemoItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    # job title
    name = scrapy.Field()
    # salary
    salary = scrapy.Field()
    # hiring company
    company = scrapy.Field()
    # job location
    jobPlace = scrapy.Field()
    # work experience required
    jobExperience = scrapy.Field()
    # education required
    education = scrapy.Field()
    # job description (responsibilities)
    # jobContent = scrapy.Field()
    # job requirements (skills)
    jobRequirement = scrapy.Field()
Edit the spider file bigqcwy.py (the salary field gets some simple cleaning):
# -*- coding: utf-8 -*-
import scrapy
import time
from ScrapyDemo.items import ScrapydemoItem
import re

class BigqcwySpider(scrapy.Spider):
    name = 'bigqcwy'
    allowed_domains = ['msearch.51job.com']
    custom_settings = {
        "DEFAULT_REQUEST_HEADERS": {
            'Cookie': 'set your cookie here',
        },
        "AUTOTHROTTLE_ENABLED": True,
        # "DOWNLOAD_DELAY": 1,
        # "ScrapyDemo.pipelines.ScrapydemoPipeline": 300,
    }
    start_urls = ['https://msearch.51job.com/']

    def start_requests(self):
        # list of URL-encoded search category codes
        list = ['0100%2C7700%2C7200%2C7300%2C7800', '7400%2C2700%2C7900%2C7500%2C6600', '800
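The spider's salary cleaning is mentioned above, but that code is cut off in this excerpt. As a minimal sketch of what such cleaning might look like, assuming common 51job salary formats such as "1-1.5万/月" and "8-10千/月" (the function name and handled formats are assumptions, not the article's original code):

```python
import re

def clean_salary(raw):
    """Normalize a 51job salary string like '1-1.5万/月' or '8-10千/月'
    into a (min, max) tuple in yuan per month.
    Returns None for unrecognized formats (e.g. '面议' / negotiable)."""
    m = re.match(r'^([\d.]+)-([\d.]+)(万|千)/月$', raw)
    if not m:
        return None
    low, high, unit = m.groups()
    scale = 10000 if unit == '万' else 1000  # 万 = 10,000; 千 = 1,000
    return (float(low) * scale, float(high) * scale)
```

The spider's parse callback could call this on the raw salary text before assigning it to `item['salary']`.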

This article showed how to use the Scrapy framework to crawl job listings from 51job (前程无忧): defining the fields to scrape in items.py, writing the spider to fetch the pages, cleaning the salary data, configuring pipelines.py to store the results in MongoDB, and adjusting the related options in settings.py. With these steps the job postings were successfully fetched and stored.
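The pipelines.py that writes items to MongoDB does not survive in this excerpt. A minimal sketch of such a pipeline, assuming pymongo as the driver (the `MONGO_URI`/`MONGO_DATABASE` settings keys, database name, and `jobs` collection name are assumptions):

```python
class ScrapydemoPipeline:
    """Store scraped job items in a MongoDB collection (sketch)."""

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI and MONGO_DATABASE are assumed settings.py keys
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI', 'mongodb://localhost:27017'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'job51'),
        )

    def open_spider(self, spider):
        # imported here so the sketch loads even without pymongo installed
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.db['jobs'].insert_one(dict(item))
        return item
```

Enable it in settings.py with `ITEM_PIPELINES = {'ScrapyDemo.pipelines.ScrapydemoPipeline': 300}`, which matches the priority in the commented-out line of the spider's `custom_settings`.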