A Brief Look at Python's Web Request Methods (requests and urllib) - 优快云 Blog

Article link: https://blog.youkuaiyun.com/weixin_45668674/article/details/113481815

1. The urllib module

urllib.request: opens URLs, i.e. sends requests and returns the responses
urllib.error: the exceptions raised by urllib.request
urllib.parse: parses and manipulates URLs
urllib.robotparser: parses a site's robots.txt file
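The list above only names the helper modules. Here is a minimal offline sketch of urllib.parse and urllib.robotparser. The example URLs and the robots.txt rules are made up for illustration. The rules are passed to RobotFileParser.parse() directly, so nothing is fetched from the network.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlparse
from urllib.robotparser import RobotFileParser

# urllib.parse: split a URL into its components and work with query strings
parts = urlparse("https://httpbin.org/get?page=2&q=python")
print(parts.scheme, parts.netloc, parts.path)        # https httpbin.org /get
print(parse_qs(parts.query))                         # {'page': ['2'], 'q': ['python']}
print(urlencode({"q": "python urllib", "page": 3}))  # q=python+urllib&page=3
print(urljoin("https://httpbin.org/get", "post"))    # https://httpbin.org/post

# urllib.robotparser: parse hypothetical robots.txt lines instead of calling read()
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(rules)
print(rp.can_fetch("my-app", "https://example.com/private/page"))  # False
print(rp.can_fetch("my-app", "https://example.com/public/page"))   # True
```

For a real site you would call rp.set_url(".../robots.txt") and rp.read() instead of parse().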

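Send a request and get the response object
import urllib.request
response = urllib.request.urlopen(url)
# For http/https URLs the response object is an http.client.HTTPResponse. It mainly provides the methods read(), readinto(), getheader(name), getheaders() and fileno(), and the attributes msg, version, status, reason, debuglevel and closed. Once the object is assigned to response, these methods and attributes give you information about the result. For example, response.read() returns the page content and response.status returns the status code, such as 200 for a successful request. Note that urlopen raises urllib.error.HTTPError for error statuses such as 404 (page not found) instead of returning a response.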
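The following is a self-contained sketch of these response attributes, including the HTTPError behaviour for a 404. To avoid depending on a remote site, it starts a small local http.server. The handler class DemoHandler and its paths are hypothetical. The opener uses an empty ProxyHandler. Otherwise it behaves like urlopen, but it ignores any proxy environment variables that might intercept the localhost request.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: '/' returns a small HTML page, any other path returns 404."""

    def do_GET(self):
        if self.path == "/":
            body = "<h1>hello</h1>".encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):  # silence per-request logging
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

# Same default handlers as urlopen, but with proxies disabled for this local demo
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

with opener.open(base + "/") as response:
    status, reason = response.status, response.reason
    content_type = response.getheader("Content-Type")
    content = response.read()  # bytes
print(status, reason, content_type, content)

missing_code = None
try:
    opener.open(base + "/missing")
except urllib.error.HTTPError as err:
    missing_code = err.code  # the 404 arrives as an exception, not as a response
print("HTTPError", missing_code)

server.shutdown()
server.server_close()
```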

Read the response body as bytes
content = response.read()
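read() returns bytes, so you must decode them to get text. Below is a minimal sketch that takes the charset from the Content-Type header. For HTTP responses, response.headers is an http.client.HTTPMessage, which supports get_content_charset(). The sketch uses a made-up data: URL, which urlopen handles through its built-in DataHandler, so it runs offline. The utf-8 fallback is an assumption for responses that declare no charset.

```python
import urllib.request

# Hypothetical data: URL carrying "你好" percent-encoded as UTF-8
url = "data:text/plain;charset=utf-8,%E4%BD%A0%E5%A5%BD"

with urllib.request.urlopen(url) as response:
    content = response.read()  # raw bytes
    # use the declared charset, falling back to utf-8 if none is given
    charset = response.headers.get_content_charset() or "utf-8"

text = content.decode(charset)
print(content, charset, text)
```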

Get the status code
print(response.status)

Get the response headers as a list of (name, value) tuples
print(response.getheaders())

2. The requests module (third-party)

import requests

requests.get('https://github.com/timeline.json')    # GET request
requests.post('http://httpbin.org/post')            # POST request
requests.put('http://httpbin.org/put')              # PUT request
requests.delete('http://httpbin.org/delete')        # DELETE request
requests.head('http://httpbin.org/get')             # HEAD request
requests.options('http://httpbin.org/get')          # OPTIONS request
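Each helper above is a shortcut for requests.request(method, url, ...). The sketch below builds one request per method with requests.Request(...).prepare() and does not send anything, so it runs offline. The URL http://httpbin.org/anything is only an assumed example endpoint.

```python
import requests

methods = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"]

# Prepare (but do not send) one request per HTTP method
prepared = [requests.Request(m, "http://httpbin.org/anything").prepare() for m in methods]
for p in prepared:
    print(p.method, p.url)

# Sending any of them would be the equivalent of, e.g., requests.request("PUT", url)
```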

Request a URL and get a Response object
response_obj = requests.get(url)

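response_obj.encoding                       # the current encoding, inferred from the response headers
response_obj.encoding = 'utf-8'             # override the encoding
response_obj.text                           # the body as a str, decoded with response_obj.encoding (by default the charset from the response headers)
response_obj.content                        # the body as bytes; gzip and deflate compression is decoded automatically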
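Roughly, .text decodes .content with .encoding. When a text/* response has no charset in its Content-Type header, requests falls back to ISO-8859-1. That is why Chinese pages can come out garbled until you set response_obj.encoding = 'utf-8'. This plain-Python sketch reproduces the effect with hypothetical body bytes and needs no request.

```python
# Pretend these are response_obj.content for a UTF-8 page without a declared charset
content = "你好，requests".encode("utf-8")

garbled = content.decode("iso-8859-1")  # what .text gives with the ISO-8859-1 fallback
fixed = content.decode("utf-8")         # what .text gives after encoding = 'utf-8'

print(garbled)
print(fixed)
```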

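response_obj.headers                        # the response headers in a dict-like object with case-insensitive keys; headers.get(key) returns None for a missing key, while headers[key] raises KeyError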
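response_obj.headers is a requests.structures.CaseInsensitiveDict. The short sketch below builds one by hand with a made-up header, so you can see the lookup rules without sending a request.

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({"Content-Type": "application/json"})

print(headers["content-type"])     # application/json (key case does not matter)
print(headers.get("X-Not-There"))  # None for a missing key via get()
try:
    headers["X-Not-There"]         # plain indexing raises for a missing key
except KeyError:
    print("KeyError")
```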

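response_obj.status_code                     # the response status code
response_obj.raw                             # the raw response stream (a urllib3 HTTPResponse, not urllib's); read it with response_obj.raw.read(), which requires stream=True in the request
response_obj.ok                              # True if status_code is below 400, a quick way to check whether a request (e.g. a login) succeeded
# Special methods
response_obj.json()                          # decode the body with requests' built-in JSON decoder; the body must be valid JSON, otherwise an exception is raised
response_obj.raise_for_status()              # raise requests.exceptions.HTTPError for 4xx and 5xx responses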
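Here is a sketch of how ok, json() and raise_for_status() combine. To stay offline, it builds Response objects by hand in the hypothetical helper make_response, with an io.BytesIO standing in for the usual urllib3 stream in raw. fetch_json is also a hypothetical helper. It catches ValueError because requests' JSON decode errors subclass it.

```python
import io

import requests


def make_response(status, body):
    """Hypothetical helper: build a Response by hand so no network is needed."""
    r = requests.Response()
    r.status_code = status
    r.raw = io.BytesIO(body)                # stands in for the urllib3 stream
    r.url = "http://example.invalid/demo"   # hypothetical URL, only used in error messages
    return r


def fetch_json(r):
    """Hypothetical helper: parsed JSON, or None if the status is an error or the body is not JSON."""
    if not r.ok:            # False for 4xx/5xx
        return None
    try:
        return r.json()
    except ValueError:      # JSON decode errors subclass ValueError
        return None


ok_resp = make_response(200, b'{"login": "ok"}')
print(ok_resp.ok, fetch_json(ok_resp))  # True {'login': 'ok'}

bad_resp = make_response(404, b"not found")
try:
    bad_resp.raise_for_status()
except requests.exceptions.HTTPError as exc:
    print("HTTPError:", exc)
```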

Setting request information (custom headers and cookies)

import requests

header = {'user-agent': 'my-app/0.0.1'}
cookie = {'key': 'value'}
r = requests.get(url, headers=header, cookies=cookie)  # requests.post(url, ...) accepts the same headers and cookies arguments
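To check what these headers and cookies become on the wire, you can prepare the request without sending it. The following sketch does that. The URL https://httpbin.org/cookies is an assumed example endpoint.

```python
import requests

header = {"user-agent": "my-app/0.0.1"}
cookie = {"key": "value"}

# Prepare (but do not send) the request to inspect the outgoing headers
prepared = requests.Request(
    "GET", "https://httpbin.org/cookies", headers=header, cookies=cookie
).prepare()

print(prepared.headers["User-Agent"])  # my-app/0.0.1
print(prepared.headers["Cookie"])      # key=value
```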

Or:

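import json
import requests

data = {'some': 'data'}
headers = {'content-type': 'application/json',
           'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}

# The content-type header says JSON, so serialize the body as JSON (passing json=data instead sets this header automatically)
r = requests.post('https://api.github.com/some/endpoint', data=json.dumps(data), headers=headers)
print(r.text)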
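Passing a dict as data= sends a form-encoded body, while json= sends JSON and sets the Content-Type header for you. The following offline sketch prepares both variants against the example endpoint above, without sending them, so you can compare what would go out.

```python
import json

import requests

data = {"some": "data"}
url = "https://api.github.com/some/endpoint"

as_json = requests.Request("POST", url, json=data).prepare()  # JSON body
as_form = requests.Request("POST", url, data=data).prepare()  # form-encoded body

print(as_json.headers["Content-Type"], as_json.body)
print(as_form.headers["Content-Type"], as_form.body)
```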