风火编程: crawler resources and utility methods for headers, UA and cookies

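This post covers three tasks. It shows how to set the User-Agent request header and how to convert headers and cookies copied from the browser's F12 developer tools into Python dicts. It also shows how to log in with Selenium and then save and reload the cookies.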


Setting the User-Agent request header

headers = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'}
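The following minimal sketch shows why the header matters. It uses only the standard library's urllib.request, which is an assumption because the post does not name an HTTP client. It prints urllib's default User-Agent and then shows how passing the headers dict to a Request replaces that default. The URL is a placeholder, and no request is actually sent.

```python
import urllib.request

# Placeholder URL; nothing is sent over the network in this snippet
url = "https://example.com/"
headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}

# Without custom headers, urllib identifies itself as "Python-urllib/<version>"
default_ua = dict(urllib.request.build_opener().addheaders)["User-agent"]
print(default_ua)

# Headers set on the Request take precedence over the opener's default
req = urllib.request.Request(url, headers=headers)
# urllib normalizes header names with str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
```

With the third-party requests library, the equivalent is `requests.get(url, headers=headers)`. Its own default User-Agent is `python-requests/<version>`.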

UA list

import random

user_agent_list = [
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)",
]
user_agent = random.choice(user_agent_list)  # pick one UA at random
headers = {'User-Agent': user_agent}
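The code above draws one UA when the script starts, so every request reuses it. If you want a different UA on each request, a small helper can build a fresh dict every time. This is a minimal sketch. `random_headers` and its `extra` parameter are hypothetical names, and the list is shortened to three entries from above.

```python
import random

# Three entries from the list above; extend as needed
USER_AGENTS = [
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
]


def random_headers(extra=None):
    """Return a new headers dict with a randomly chosen User-Agent.

    extra: optional dict of additional headers (hypothetical parameter),
    merged after the User-Agent is set.
    """
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    if extra:
        headers.update(extra)
    return headers


# Call it once per request so each request may carry a different UA
for _ in range(3):
    print(random_headers({"Referer": "https://example.com/"})["User-Agent"])
```

Keys in `extra` are applied last, so an explicit "User-Agent" passed there overrides the random choice.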

Converting headers to key-value (dict) format

This handles headers copied directly from the F12 developer tools, for example:

s="""Host: blog.youkuaiyun.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"""

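def make_headers(s):
    """
    Convert a header string copied from F12 into a dict.
    :param s: one "key: value" pair per line
    :return: {"key": "value"}
    """
    headers = {}
    for line in s.strip().splitlines():  # splitlines() also splits on "\r\n"
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(': ')  # split at the first ": " only
        headers[key] = value
    return headers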
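The following snippet is a quick check of a whitespace-tolerant parser. It includes text pasted from a Windows clipboard, where lines end in "\r\n". A plain `split('\n')` would leave a trailing "\r" on those values and would fail on the blank line. The function is repeated so the snippet runs on its own, and the `pasted` string is only an illustrative input.

```python
def make_headers(s):
    """Convert a header block copied from F12 into a dict."""
    headers = {}
    for line in s.strip().splitlines():  # splitlines() also splits on "\r\n"
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(": ")  # split at the first ": " only
        headers[key] = value
    return headers


s = """Host: blog.youkuaiyun.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"""
print(make_headers(s)["Upgrade-Insecure-Requests"])  # → 1

# Illustrative Windows-style paste: "\r\n" line endings and a trailing blank line
pasted = "Accept: */*\r\nReferer: https://example.com/\r\n\r\n"
print(make_headers(pasted))  # → {'Accept': '*/*', 'Referer': 'https://example.com/'}
```

The ":" inside "https://" is not followed by a space, so it does not split the Referer value.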

Converting cookies to key-value (dict) format

This handles a cookie string copied directly from the F12 developer tools, for example:

s="""__yadk_uid=xR1OjgKQ2CeqmfVrRH1DIkO73khKET08; Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=1788*1*PC_VC; smidV2=201807251019067decffaff952b0bb01f47a768824572e00a4a65cfd6889290; UN=weixin_42620314; ARK_ID=JS475ff4d4717c679c604192c471bb0153475f; __utma=17226283.973575484.1539159222.1539159222.1539159222.1; __utmz=17226283.1539159222.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; uuid_tt_dd=10_28867322940-1540532715013-534135; bdshare_firstime=1540540501498; dc_session_id=10_1541137469803.249467; TY_SESSION_ID=466d7b7c-6d1e-4ea0-99b5-3ec79bf848fa; Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1540975415,1541137470,1541159297,1541159379; UserName=weixin_42620314; UserInfo=uZH9oCALKs49rvxo2JZfV9eMAs6epxR1sAi3jPOHS2oocX1epaEo9HeZ1hO4oZ6y%2Brx6oW5e4yFXxOUE4WChVmwoAn1%2BpWKxdBPoPNv4C2DiS%2B21ZjEdkpaAy%2Bc3M6ra45f2OHJeOCFBcA4ThtSuXQ%3D%3D; UserNick=%E9%A3%8E%E7%81%AB%E7%BC%96%E7%A8%8B; AU=878; BT=1541163196688; UserToken=uZH9oCALKs49rvxo2JZfV9eMAs6epxR1sAi3jPOHS2oocX1epaEo9HeZ1hO4oZ6y%2Brx6oW5e4yFXxOUE4WChVmwoAn1%2BpWKxdBPoPNv4C2DiS%2B21ZjEdkpaAy%2Bc3M6rajz2U%2FLxS3fcudt0W6LpqSLuiGoGR2SxS35B4xPXy7TvibWr3xYp%2F1dTamuVKz%2FjO; aliyungf_tc=AQAAAOPWZS7v5gEADFl0cdBgnaTzsfWs; dc_tos=php6u6; Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1541383135"""

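def make_cookie(s):
    """
    Convert a cookie string copied from F12 into a dict.
    :param s: "key1=value1; key2=value2"
    :return: the cookies as a dict
    """
    cookies = {}
    for item in s.split(';'):
        item = item.strip()  # drop the space that follows each ";"
        if not item:
            continue  # skip empty pieces, e.g. from a trailing ";"
        key, _, value = item.partition('=')  # split at the first "=" only; values may contain "="
        cookies[key] = value
    return cookies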
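As a cross-check, the standard library's `http.cookies.SimpleCookie` can parse the same "k1=v1; k2=v2" format. It is a swapped-in alternative, not the method used in this post, and it is stricter than a plain split. If it meets a piece it considers malformed, such as a bare name with no "=value", it silently discards the whole string. The sample below is a short excerpt of the cookie string above. It also shows that values stay percent-encoded. `urllib.parse.unquote` decodes them for reading, but you should send them back unchanged.

```python
from http.cookies import SimpleCookie
from urllib.parse import unquote

# A short excerpt of the cookie string copied from F12 above
s = "UN=weixin_42620314; AU=878; UserNick=%E9%A3%8E%E7%81%AB%E7%BC%96%E7%A8%8B"

jar = SimpleCookie()
jar.load(s)  # parses "name=value" pairs separated by ";"
# Each entry is a Morsel; .value holds the cookie value as a string
cookies = {name: morsel.value for name, morsel in jar.items()}
print(cookies["AU"])  # → 878

# Values stay percent-encoded: decode only to read them, send them back as-is
print(unquote(cookies["UserNick"]))  # → 风火编程
```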

Logging in with Selenium and saving the cookies to cookies.json

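Notes
Read this section before using the code to avoid common pitfalls.
1. input() blocks the program. After logging in successfully, press Enter in the console to let the program continue.
2. Wait until the cookies have finished loading before calling get_cookies().
3. Before calling add_cookie(), open the login page so the browser is on the domain the cookies belong to.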

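import json
import time

from selenium import webdriver

driver = webdriver.Chrome()  # assumes Chrome; other WebDriver browsers work the same way
driver.get("https://example.com/login")  # placeholder: replace with the site's login page
# Enter the account and password in the browser window to log in
input("Press Enter here after logging in...")  # blocks until Enter is pressed
time.sleep(5)  # wait for the cookies to finish loading before get_cookies()
cookies = driver.get_cookies()
with open("cookies.json", "w") as f:
    json.dump(cookies, f)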
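A fixed `time.sleep(5)` is a guess. It is too short on a slow network and wastes time on a fast one. One alternative is to poll until a cookie that only exists after login appears. This is a minimal sketch. `wait_for_cookie` is a hypothetical helper, and it only assumes the driver has Selenium's `get_cookie(name)` method, which returns the cookie dict or None. Which cookie marks a logged-in session depends on the site. "UserName" is an assumption taken from the sample cookie string above.

```python
import time


def wait_for_cookie(driver, name, timeout=30.0, poll=0.5):
    """Poll driver.get_cookie(name) until the cookie exists (hypothetical helper).

    Returns the cookie dict, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        cookie = driver.get_cookie(name)  # Selenium returns None if the cookie is absent
        if cookie is not None:
            return cookie
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cookie {name!r} did not appear within {timeout} s")
        time.sleep(poll)


# Usage, replacing time.sleep(5) above ("UserName" is site-specific):
# wait_for_cookie(driver, "UserName")
# cookies = driver.get_cookies()
```

Selenium also ships `WebDriverWait` (in `selenium.webdriver.support.ui`), which does the same polling: `WebDriverWait(driver, 30).until(lambda d: d.get_cookie("UserName"))`.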

Reading the cookies from the JSON file and adding them to the driver

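import json

from selenium import webdriver

driver = webdriver.Chrome()
with open("cookies.json", "r") as f:
    cookies = json.load(f)
driver.get("https://example.com/login")  # placeholder login URL; the browser must be on this domain before add_cookie()
for cookie in cookies:
    driver.add_cookie({"name": cookie["name"], "value": cookie["value"]})
driver.refresh()  # reload so the page is requested with the restored cookies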
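Once the browser login is done, the saved cookies.json can also feed a plain HTTP client. `get_cookies()` returns a list of dicts with keys such as "name", "value", "domain" and "path". Flattening that list into {name: value} gives the same dict shape as `make_cookie`. This is a minimal sketch, and `load_cookie_dict` is a hypothetical helper.

```python
import json


def load_cookie_dict(path="cookies.json"):
    """Read the list saved from driver.get_cookies() and return {name: value}.

    load_cookie_dict is a hypothetical helper, not part of the original post.
    """
    with open(path, "r", encoding="utf-8") as f:
        cookies = json.load(f)  # a list of dicts with "name", "value", "domain", ...
    return {c["name"]: c["value"] for c in cookies}


# Usage with requests (assumes the requests package is installed):
# import requests
# resp = requests.get("https://example.com/", cookies=load_cookie_dict())
```

Flattening discards domain and path, so two cookies with the same name on different paths collapse into one. If that matters, set them on a requests session's cookie jar with explicit domains instead, for example `session.cookies.set(name, value, domain=..., path=...)`.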