Preface:
I bought a paid video course on imooc, and one chapter covers crawling Lagou (lagou.com) with CrawlSpider. The video was probably recorded quite a while ago: the instructor added neither headers nor cookies, yet every job-detail page on Lagou still came back 200. When I tried it myself, every request came back 302. Searching online, the advice was to add a cookie and headers; I tried it and it really did work, so I'm sharing the approach here.
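In outline, the fix is this: Lagou 302-redirects any detail-page request that does not look like it comes from a logged-in browser, so browser-like headers and a valid login cookie both have to ride along on every request. A minimal sketch of the headers side, assuming a Scrapy spider (the exact header values below are illustrative, not the precise set Lagou checks):

```python
# Browser-like default headers for the spider; the values are illustrative.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/80.0.3987.132 Safari/537.36"),
    "Referer": "https://www.lagou.com/",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

# In a Scrapy spider these would go into custom_settings, with cookie
# handling left on so the login cookie (obtained below) is sent as well:
#   custom_settings = {
#       "DEFAULT_REQUEST_HEADERS": BROWSER_HEADERS,
#       "COOKIES_ENABLED": True,
#   }
```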
1. Getting the login cookie with Selenium:
The first step is to obtain the cookie. I use Selenium to drive a real browser through the login flow: crude but effective, though slow when debugging. Straight to the source code, which needs little explanation:
#-*- coding:utf-8 -*-
__author__ = 'GaoShuai'
from selenium import webdriver
from scrapy.selector import Selector
import time


def login_lagou():
    browser = webdriver.Chrome(executable_path="D:/chromedriver.exe")
    browser.get("https://passport.lagou.com/login/login.html")
    # Fill in the username and password.
    # (find_element_by_css_selector is the Selenium 3 API; Selenium 4
    #  replaces it with find_element(By.CSS_SELECTOR, ...).)
    browser\
        .find_element_by_css_selector("body > section > div.left_area.fl > div:nth-child(2) > form > div:nth-child(1) > input")\
        .send_keys("username")
    browser\
        .find_element_by_css_selector("body > section > div.left_area.fl > div:nth-child(2) > form > div:nth-child(2) > input")\
        .send_keys("password")
    # Click the login button
    browser\
        .find_element_by_css_selector("body > section > div.left_area.fl > div:nth-child(2) > form > div.input_item.btn_group.clearfix > input")\
        .click()
    # Wait for the login to complete, then collect the session cookies
    cookie_dict = {}
    time.sleep(3)
    cookies = browser.get_cookies()
    for cookie in cookies:
        cookie_dict[cookie['name']] = cookie['value']
    # browser.quit()
    return cookie_dict
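The dict returned by login_lagou can be handed straight to a Scrapy request through its cookies= argument (scrapy.Request(url, cookies=cookie_dict)). If you ever need the cookies serialized into a single Cookie header string instead, a small helper like the following works; the helper name is mine, not part of the course code:

```python
def cookies_to_header(cookie_dict):
    """Join a {name: value} cookie dict into one Cookie header string,
    e.g. {"JSESSIONID": "abc"} -> "JSESSIONID=abc"."""
    return "; ".join(f"{name}={value}" for name, value in cookie_dict.items())
```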
def login_zhihu():
    browser = webdriver.Chrome(executable_path="D:/chromedriver.exe")
    browser.get("https://www.zhihu.com/signup?next=%2F")
    #