fake-useragent: Spoofing Browser Headers

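When writing a crawler, we have to set the HTTP request headers in code, which means copying and pasting the same values over and over. We also often need to disguise our requests as different browsers so the server will serve them. I recently came across a library that generates fake request headers for you and frees us from this mindless copy-and-paste work.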
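For comparison, this is roughly what the copy-and-paste approach looks like. It is a minimal sketch that assumes the standard library's urllib.request as the HTTP client (any client works the same way). A Chrome User-Agent string is hard-coded by hand, the URL is a placeholder, and building the Request object does not send anything over the network.

```python
import urllib.request

# A Chrome User-Agent string copied by hand; this is the kind of value that
# otherwise gets pasted into every scraper script
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36"
    ),
}


def build_static_request(url: str) -> urllib.request.Request:
    """Attach the fixed headers to a request; nothing is sent until urlopen()."""
    return urllib.request.Request(url, headers=HEADERS)


req = build_static_request("https://example.com/")  # placeholder URL
# urllib stores header names via str.capitalize(), so the key is "User-agent"
print(req.get_header("User-agent"))
```

Every script built this way carries the same fixed string, and switching browsers means pasting a new one.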

Installation

pip install fake-useragent

Usage

Getting a browser's user-agent string is self-explanatory, so here is the code.

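from fake_useragent import UserAgent

# Create a UserAgent object.
# verify_ssl=False only mattered for older releases, which downloaded the UA data
# over HTTPS; newer releases bundle the data locally and no longer accept it
ua = UserAgent()

# Print a user-agent string for each specific browser
print(ua.ie)
print(ua.opera)
print(ua.chrome)
print(ua.firefox)
print(ua.safari)

# Print the user-agent string of a randomly chosen browser
print(ua.random)
print(ua.random)
print(ua.random)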

The output looks like this (the strings are picked at random, so yours will differ):

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 3.0)

Opera/9.80 (Windows NT 6.1; U; en-US) Presto/2.7.62 Version/11.01

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20130331 Firefox/21.0

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4

Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0

Mozilla/5.0 (Windows; U; Windows NT 6.0; tr-TR) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36
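To read output like this, it helps to know which token identifies each browser. The sketch below is a rough heuristic; guess_browser is a hypothetical helper, not part of fake-useragent. It checks tokens in a fixed order because Chrome strings also contain "Safari/" and newer Opera strings also contain "Chrome/". It will misclassify browsers it does not know about; for example, Edge also carries a "Chrome/" token.

```python
import re

# Ordered (label, pattern) rules: the first match wins, so the more specific
# tokens are checked before the generic ones they overlap with
_RULES = [
    ("opera", re.compile(r"\bOpera/|\bOPR/")),
    ("ie", re.compile(r"\bMSIE |Trident/")),
    ("firefox", re.compile(r"\bFirefox/")),
    ("chrome", re.compile(r"\bChrome/")),
    ("safari", re.compile(r"\bSafari/")),
]


def guess_browser(user_agent: str) -> str:
    """Return the label of the first rule whose token appears in the UA string."""
    for label, pattern in _RULES:
        if pattern.search(user_agent):
            return label
    return "unknown"


# A few of the strings from the sample output above
samples = [
    "Opera/9.80 (Windows NT 6.1; U; en-US) Presto/2.7.62 Version/11.01",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20130331 Firefox/21.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
]
for s in samples:
    print(guess_browser(s), "<-", s)
```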

How you use it from here is up to you.
