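Common request headers
1. Content-Type: the media type and character encoding of the message body
2. Host: host name and port number
3. Connection: connection type (for example keep-alive)
4. Upgrade-Insecure-Requests: asks the server to upgrade the request to HTTPS
5. User-Agent: identifies the browser (√)
6. Referer: the page the request was navigated from; used for hotlink protection (√)
7. Cookie: the cookies sent with the request (√)
8. Authorization: the credentials for a resource that requires HTTP authentication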
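To see why User-Agent is one of the headers worth setting by hand, it can help to look at what requests sends when no headers are supplied. The sketch below only inspects the library defaults through requests.utils.default_headers() and makes no network call; the variable name default_headers is just an illustration.

```python
import requests

# Headers that requests attaches to a request when none are supplied.
default_headers = requests.utils.default_headers()

for name, value in default_headers.items():
    print(f'{name}: {value}')

# The default User-Agent announces the library itself (python-requests/<version>),
# which many sites treat differently from a real browser.
```

Passing a headers dictionary with a browser-style User-Agent replaces this default for that request.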
Response headers
Set-Cookie: the cookies the server asks the client to store
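As a rough illustration of what a Set-Cookie value carries, the sketch below parses one with the standard-library http.cookies.SimpleCookie. The header value (sessionid=abc123 and its attributes) is made up for the example and does not come from any real site.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, as a server might send it.
set_cookie_value = 'sessionid=abc123; Path=/; HttpOnly'

jar = SimpleCookie()
jar.load(set_cookie_value)

morsel = jar['sessionid']
cookie_value = morsel.value    # the cookie's value
cookie_path = morsel['path']   # the Path attribute

print(cookie_value, cookie_path)
```

On later requests to the same site the client sends the name=value pair back in the Cookie request header.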
The requests module
1. What the requests module does
Sends HTTP requests and retrieves the response data
2. Sending a GET request with requests
import requests
url = 'http://www.baidu.com'
response = requests.get(url)
print(response.text)
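3. The response object
response.text
Type: str
Decoding: the requests module guesses the encoding from the HTTP headers
response.content
Type: bytes
Decoding: none applied
response.text is equivalent to response.content.decode('guessed charset')
response.encoding: set this manually to the correct encoding so that Chinese text displays properly
response.url: the URL of the response
response.status_code: the response status code
response.request.headers: the request headers that produced this response
response.headers: the response headers
response.request._cookies: the cookies of the request that produced this response, returned as a CookieJar
response.cookies: the cookies set by the response
response.json(): parses a JSON response body into a Python object (dict or list)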
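The difference between response.text and response.content comes down to which charset is used for decoding. The sketch below works offline: the bytes in raw stand in for response.content, and requests.utils.get_encoding_from_headers shows the guess the library derives from a Content-Type header. The sample text and variable names are assumptions made for the illustration.

```python
import requests

# Stand-in for response.content: UTF-8 encoded bytes of a Chinese string.
raw = '中文页面'.encode('utf-8')

# The guess requests makes from the Content-Type header alone.
guess_without_charset = requests.utils.get_encoding_from_headers(
    {'content-type': 'text/html'}
)
guess_with_charset = requests.utils.get_encoding_from_headers(
    {'content-type': 'text/html; charset=utf-8'}
)

# Decoding with the wrong guess yields garbled text;
# decoding with the real charset yields the original string.
garbled = raw.decode(guess_without_charset)
readable = raw.decode('utf-8')

print(guess_without_charset, guess_with_charset)
print(readable)
```

With a real response, the same fix is to set response.encoding = 'utf-8' before reading response.text, or to call response.content.decode('utf-8') directly.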
4. Sending a request with headers
import requests
# Send a request that carries custom headers
url = 'http://www.baidu.com'
# 1. Build the headers dictionary
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36 Edg/91.0.864.59'
}
response = requests.get(url, headers=headers)
print(response.text)
5. Sending a request with parameters
# Method 1: put the parameters directly in the URL
import requests
url = 'https://www.baidu.com/s?wd=python'
# 1. Build the headers dictionary
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36 Edg/91.0.864.59'
}
response = requests.get(url, headers=headers)
# Save the raw response bytes to a file
with open('python.html', 'wb') as f:
    f.write(response.content)
----------------------------------------
# Method 2: pass a parameter dictionary through params
import requests
url = 'https://www.baidu.com/s?'
# 1. Build the headers dictionary
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36 Edg/91.0.864.59'
}
# 2. Build the parameter dictionary
data = {
    'wd': 'python'
}
response = requests.get(url, headers=headers, params=data)
# Save the raw response bytes to a file
with open('python.html', 'wb') as f:
    f.write(response.content)
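The two methods should produce the same final URL. One way to check this without sending anything is to prepare the request and read its url attribute; the sketch below does that with requests.Request(...).prepare(). The variable names are illustrative only.

```python
import requests

# Method 1: the query string is already part of the URL.
direct = requests.Request('GET', 'https://www.baidu.com/s?wd=python').prepare()

# Method 2: requests builds the query string from the params dictionary.
via_params = requests.Request(
    'GET', 'https://www.baidu.com/s?', params={'wd': 'python'}
).prepare()

print(direct.url)
print(via_params.url)
```

The params route is usually the more convenient of the two when values contain spaces or non-ASCII characters, since requests percent-encodes them.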
6. Carrying the cookie in headers to stay logged in
When building the request headers, include a Cookie entry in the headers dictionary
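A minimal sketch of this approach follows. The cookie string is a made-up placeholder; in practice it would be copied from the browser's developer tools after logging in. The request is only prepared, not sent, so the headers can be inspected offline.

```python
import requests

url = 'http://www.baidu.com'

headers = {
    # Hypothetical cookie string; replace it with the one from a logged-in browser session.
    'Cookie': 'sessionid=abc123; theme=dark',
}

# Prepare the request without sending it, to see the headers that would go out.
prepared = requests.Request('GET', url, headers=headers).prepare()
print(prepared.headers['Cookie'])

# To actually send it: response = requests.get(url, headers=headers)
```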
7. Using the cookies parameter to keep a session
Build a cookies dictionary
When sending the request, pass the cookies dictionary as the cookies parameter
response = requests.get(url,headers=headers,cookies=cookies)
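The cookies dictionary is often built from a cookie string copied out of the browser. The sketch below shows one way to do the conversion; the cookie string is a made-up placeholder, and the request is only prepared so the result can be checked without network access.

```python
import requests

url = 'http://www.baidu.com'

# Hypothetical cookie string, in the "name=value; name=value" form browsers display.
cookie_string = 'sessionid=abc123; theme=dark'

# Split on '; ' to get the pairs, then on the first '=' to get name and value.
cookies = {}
for pair in cookie_string.split('; '):
    name, value = pair.split('=', 1)
    cookies[name] = value

# Prepare the request to confirm the cookies end up in the Cookie header.
prepared = requests.Request('GET', url, cookies=cookies).prepare()
print(cookies)
print(prepared.headers['Cookie'])
```

Splitting on the first '=' only keeps values that themselves contain '=' intact.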
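Using a proxy
import requests
url = 'http://www.baidu.com'
# Map each URL scheme to the proxy that should handle it.
# The address below is only an example; substitute a proxy you can actually reach.
proxies = {
    'http': 'http://219.95.51:9000',
    'https': 'http://219.95.51:9000'
}
response = requests.get(url, proxies=proxies)
print(response.text)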
Using the verify parameter to skip CA certificate verification
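A minimal sketch of the idea, assuming the target site presents a certificate that fails validation (self-signed, for example). The helper name get_without_cert_check is hypothetical. Passing verify=False turns certificate checking off for that request, and urllib3.disable_warnings silences the InsecureRequestWarning that would otherwise be printed.

```python
import requests
import urllib3


def get_without_cert_check(url, **kwargs):
    """Send a GET request with TLS certificate verification turned off."""
    # Suppress the warning urllib3 emits for every unverified HTTPS request.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    # verify defaults to True; False skips the CA certificate check.
    return requests.get(url, verify=False, **kwargs)


# Usage (not executed here):
# response = get_without_cert_check('https://example.com')
# print(response.status_code)
```

Skipping verification removes protection against man-in-the-middle attacks, so it is best kept to testing or to sites you already trust.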