# Crawling the ** website
```python
import requests
from bs4 import BeautifulSoup

# Request headers: a browser User-Agent plus the Cookie copied after logging in.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
    'Cookie': '<the page Cookie obtained after logging in with your account>',
}

res = requests.get('<URL of the page to crawl>', headers=headers)  # fetch the page as the logged-in user
soup = BeautifulSoup(res.text, 'html.parser')                      # parse the returned HTML
print(soup)
```
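Printing the whole soup is mainly a sanity check. A minimal sketch of pulling out specific elements instead; the URL, and the assumption that the page has a `<title>` and `<a>` tags, are illustrative and not from the original:

```python
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': '<the page Cookie obtained after logging in with your account>',
}
res = requests.get('https://example.com/', headers=headers)  # hypothetical URL
soup = BeautifulSoup(res.text, 'html.parser')

if soup.title:                               # the page <title>, if present
    print(soup.title.string)
for a in soup.find_all('a', href=True):      # every hyperlink on the page
    print(a['href'])
```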
Test environment: Sogou browser.
Follow these steps to obtain the page Cookie after logging in with your account:
step1: log in to your account;
step2: press F12 and open the Network -> Doc tab;
step3: press Ctrl+R (or F5) to refresh, then look at the table with the Name, Status and Domain columns; the first row is usually the one you want. Single-click it (not double-click), select everything under the cookie field, copy it, and paste it where the code above says "<the page Cookie obtained after logging in with your account>"; likewise, enter the URL the Cookie belongs to where it says "<URL of the page to crawl>";
step4: run the script (if you would rather not paste the raw header string, see the sketch right after these steps).
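As an alternative to pasting the copied string into the Cookie header, you can split it into a dict and hand it to requests via the `cookies=` parameter. A minimal sketch; the cookie string and URL below are made-up placeholders:

```python
import requests

# Paste your own copied "name=value; name2=value2" string here (placeholder shown).
raw_cookie = 'name1=value1; name2=value2'

# Split on the first '=' only, since cookie values may themselves contain '='.
cookies = dict(pair.split('=', 1) for pair in raw_cookie.split('; '))

res = requests.get('https://example.com/', cookies=cookies, timeout=10)  # hypothetical URL
print(res.status_code)
```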
Below is a web crawler program, attached for learning and reference only (ps: please respect the robots protocol; a quick way to check a site's robots.txt is sketched next).
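One way to honor that protocol is the standard library's robotparser; a minimal sketch, with example.com standing in for the real target site:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://example.com/robots.txt')  # hypothetical target site
rp.read()                                     # download and parse robots.txt

url = 'https://example.com/some/page'
if rp.can_fetch('*', url):                    # '*' = rules for any user agent
    print('allowed to crawl:', url)
else:
    print('robots.txt disallows:', url)
```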
```python
import requests
import re

# Request headers; the original named this dict `self`, but `headers` is the
# conventional, less confusing name.
headers={
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
'Cookie':'t=9b2e3b2d2ca9cc3ea9b377e34038612e; thw=cn; enc=FLDBgHQZTLcLCXVYOvApOOnLL%2F1Akz4Ed55sa0qt8DDjELEqawC35W1FP%2FTcpXPPcBtkuhGDla9dTXLH7