python的requests库简介_python中requests是什么库-优快云博客

本文链接：https://blog.youkuaiyun.com/geter_CS/article/details/84873295

打开网页的过程就是浏览器作为“客户端”，向服务器端发送一次请求，把服务器的文件（一般是html文件或者图像，视频等其他文件）“抓”到本地，在进行解释，展现。HTML是一种标记语言，用标记内容加以解析和区分。

requests是python的一个简单易用的爬虫框架。下面表格中是其主要的方法。

在这里插入图片描述

1. requests.get(url,params=None,**kwargs)

通过r=requests.get(url)(构造一个向服务器请求资源的request对象，返回一个包含资源的response对象)

url:获取信息页面的url连接：
URI的概念：web上的每一中可用资源如html文档，图像，视频，程序等都有一个通用资源标志符（Universal resource identifier，URI）进行定位。一般URI由三部分组成：命名机制，存放资源的主机名和资源路径。分别由//和/来分开。比如：
http://www.why.com.cn/myhtml/html123/
命名机制为http协议（表示可以通过http协议访问资源），资源在主机www.why.com.cn上，路径myhtml/html123/
URL（Uniform Resource Locator）是URI的子集，表示：统一资源定位符。由三部分组成：协议，存有资源的主机IP地址（有时也包含端口号），资源的具体地址。分别由第一部分用//分开和第二部分用/来分开。前两部分是不可或缺的第三部分可以省略。碧如：http://www.rol.cn.net/talk/talk1.htm，其计算机域名是www.rol.cn.net，talk1.htm文件在/talk下。也可以是文件，碧如：file://ftp.yoyodyne.com/pub/files/foobar.txt。
例子：

import requests
url="http://www.baidu.com"
r=requests.get(url)
print(r.text)
print(r.headers)

输出：

<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>ç™¾åº¦ä¸€ä¸‹ï¼Œä½ å°±çŸ¥é“</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=ç™¾åº¦ä¸€ä¸‹ class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>æ–°é—»</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>åœ°å›¾</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>è§†é¢‘</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>è´´å§</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>ç™»å½•</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">ç™»å½•</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">æ›´å¤šäº§å“</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>å
³äºŽç™¾åº¦</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>ä½¿ç”¨ç™¾åº¦å‰å¿
è¯»</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>æ„è§åé¦ˆ</a>&nbsp;äº¬ICPè¯030173å·&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>

{'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'Keep-Alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Fri, 07 Dec 2018 06:48:33 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:28:24 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}

2.通过，Request得到的response对象，有如下属性：

在这里插入图片描述

我们可以看见上面的例子中，输出的html，其中有乱码，我满可以通过修改reponse的encoding属性来避免乱码：

print('encoding:',r.encoding)
print('apparent_encoding:',r.apparent_encoding)

输出：

encoding: ISO-8859-1
apparent_encoding: utf-8

修改：

r.encoding="utf-8"
print(r.text)

输出：

<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>

修改reponse的encoding属性后可以看见不再有乱码。所以一般设置编码为r.encoding=r.apparent_encoding。

3. requests请求的异常，及其处理：

在这里插入图片描述

处理：我们使用try - except和r.raise_for_status()方法来解决异常。r.raise_for_status()在内部判断r.status_code是否为200，若不是会引发HTTPError。使用如下：

def gethtml_text(url):
      try:
          r=requests.get(url,timeout=20)
          r.raise_for_status()#若不为200，引发异常
          r.encoding=r.apparent_encoding
          return r.text
      except:
          return "产生异常"

4.HTTP协议对资源的操作

在这里插入图片描述

其中PATCH和PUT的区别

通常我们在访问URL时可能是想修改某一组数据UserInfo，包括UserID和UserName等20个字段。当我们要修改UserName，其他不变时，若使用PATCH，只需向URL提交UserName的更新请求。而PUT必须将所有20个字段一起提交给URL，未提交字段南被删除。所以PATCH的好处就是节省网络宽带。
其他的与Requests库一致：
在这里插入图片描述

使用方式：

在这里插入图片描述

定制Requests

有些网页需要对Requests的参数进行设置才能获取其数据，这包括了传递URL参数、定制请求头、发送POST请求、设置超时等。

传递URL参数

import requests
url="http://httpbin.org/get"
url="http://www.baidu.com"
key_dict={'Uername':'xiaoming','key2':'value2'}
r=requests.get(url,params=key_dict)
r.encoding=r.apparent_encoding
print(r.url)
print(r.text)

输出中url有了参数：

http://www.baidu.com/?Uername=xiaoming&key2=value2
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>

定制请求头：

请求头Headers提供了关于请求、响应、或其他发送实体的信息。它对爬虫至关重要。Headers信息在要请求的网页中找（单击右键使用浏览器的查看功能）。

import requests
url="http://httpbin.org/get"
url="https://www.youkuaiyun.com"
headers={
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
    'Host':'www.youkuaiyun.com'
}
r=requests.get(url,headers=headers)
r.encoding=r.apparent_encoding
print(r.url)
print(r.text)

输出：

https://www.youkuaiyun.com/
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no">
    <meta name="apple-mobile-web-app-status-bar-style" content="black">
    <meta name="referrer"content="always">
    <meta name="msvalidate.01" content="3189512127C34C46BC74BED5852D45E4" />
    <title>优快云-专业IT技术社区</title>
    <script src='/js/tingyun-rum-feed.js?v=20180509'...............

发送POST请求

有的请求需要发送一些编码为表单形式的数据，比如在登录时的请求就是POST。（这是因为若使用GET请求,密码会显示在URL中，很不安全，但是若采用POST会进行编码）：

import requests
url="http://httpbin.org/get"
url="https://www.youkuaiyun.com"
url="http://www.baidu.com"
key_dict={
    'Username':'xiaoming',
    "Keywords":1268172
}
r=requests.post(url,data=key_dict)
r.encoding=r.apparent_encoding
print(r.url)
print(r.text)

输出：

http://www.baidu.com/
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>