Python web scraping: the requests library
Summary: usage of get, post, and the other request methods…
import requests
req = requests.get("https://www.baidu.com/?tn=15007414_8_dg")
req = requests.post("https://www.baidu.com/?tn=15007414_8_dg")
req = requests.put("https://www.baidu.com/?tn=15007414_8_dg")
req = requests.delete("https://www.baidu.com/?tn=15007414_8_dg")
req = requests.head("https://www.baidu.com/?tn=15007414_8_dg")
req = requests.options("https://www.baidu.com/?tn=15007414_8_dg")
GET requests
import requests
url = "https://www.baidu.com/s"
params = {"wd":'篮球'}
response = requests.get(url, params=params)  # pass a dict; requests URL-encodes it for you
print(response.url)
response.encoding = "utf-8"
html = response.text
print(html)
POST requests: the payload is a dict; JSON parameters can also be sent (via the json= argument)
import requests
from fake_useragent import UserAgent
url = "https://jwmis.ncwu.edu.cn/hsjw/cas/login.action"
headers = {
"User-Agent":UserAgent().firefox
}
formdata = {
"user":"******",
"password":"****** "
}
response = requests.post(url, data=formdata, headers=headers)
response.encoding = "utf-8"
html = response.text
print(html)
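As mentioned above, a JSON payload can be sent instead of a form by using the json= argument. A minimal sketch (no request is actually sent; prepare() just builds the request so we can inspect what requests would transmit, and the URL is only a placeholder):

```python
import requests

# Build, but do not send, a POST request with a JSON body.
req = requests.Request("POST", "https://example.com/api", json={"user": "demo"})
prepared = req.prepare()

# requests serializes the dict and sets the Content-Type header for us.
print(prepared.headers["Content-Type"])  # application/json
print(prepared.body)
```

With data= the dict would instead be form-encoded and sent as application/x-www-form-urlencoded.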
Custom request headers. Spoofing the request header is a common technique in web scraping; we can use it to disguise the crawler as a normal browser.
import requests
from fake_useragent import UserAgent
headers = {"User-Agent":UserAgent().firefox}
r = requests.get("https://www.baidu.com", headers=headers)
print(r.request.headers["User-Agent"])
Setting a timeout
import requests
requests.get("https://www.baidu.com/index.php?tn=monline_3_dg",timeout = 0.001)
Proxy access
import requests
proxies = {
    "http": "http://122.9.101.6:8888",
    "https": "https://61.157.206.174:37259",
    # "http": "http://user:password@122.9.101.6:8888"  # when the proxy requires credentials
}
requests.get("https://www.baidu.com/",proxies = proxies)
Session automatically saves cookies. A session means keeping one conversation going, e.g. continuing to operate after logging in (identity information is remembered), whereas a plain requests.get/post call carries no identity information between requests.
import requests
s = requests.Session()
# use the Session object to send a GET request that sets a cookie
print(s.get("http://httpbin.org/cookies/set/sessioncookie/123456789"))
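The cookie set by that request lives on in the session's cookie jar and is sent with every later request. The same behaviour can be sketched without the network by putting a cookie into the jar by hand:

```python
import requests

# Set the cookie locally on the session's cookie jar (no network needed);
# it will be attached automatically to every subsequent request made with s.
s = requests.Session()
s.cookies.set("sessioncookie", "123456789")
print(s.cookies.get("sessioncookie"))  # 123456789
```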
SSL verification: disabling the insecure-request warning
import requests
from fake_useragent import UserAgent
headers = {
'User-Agent':UserAgent().firefox
}
url = "https://www.baidu.com/index.php?tn=monline_3_dg"
requests.packages.urllib3.disable_warnings()  # silence the InsecureRequestWarning
response = requests.get(url, verify=False, headers=headers)
print(response)
Getting response information
"""
resp.json()             response body parsed as JSON
resp.text               response body as a string
resp.content            response body as bytes
resp.headers            response headers
resp.url                the URL that was requested
resp.encoding           the page encoding
resp.request.headers    the request headers that were sent
resp.cookies            the cookies set by the response
"""
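A quick illustration of these attributes. The Response below is built by hand purely so the example runs without the network; normally resp comes back from requests.get or requests.post:

```python
import requests

# A hand-built Response just for illustration (normally returned by requests).
resp = requests.models.Response()
resp.status_code = 200
resp.url = "https://example.com/api"
resp.encoding = "utf-8"
resp._content = b'{"ok": true}'  # the raw bytes that resp.content would hold

print(resp.text)      # the body decoded as a string: {"ok": true}
print(resp.json())    # the body parsed as JSON: {'ok': True}
print(resp.encoding)  # utf-8
```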

This article covered the requests library for Python web scraping in detail: GET and POST requests, custom request headers, timeouts, proxy access, using a Session, SSL verification, and how to read response information such as JSON content, text, response headers, URL, encoding, and request headers.