Notes on the Urllib Library

Latest recommended article published on 2023-03-01 21:58:00


License: CC 4.0 BY-SA


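This article introduces Urllib, a library built into Python: its advantages, its main modules, and how to use the request submodule, including the urlopen function, the Request object, and proxy setup with a Handler. Urllib is less convenient than the requests library, but it remains useful when installing extra packages is not an option.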


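Advantage: Urllib is built into Python, so no extra installation is needed. It is less convenient than the requests library, though, and I do not use it in my own practice.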

Modules

1. urllib.request: request module

2. urllib.error: exception handling

3. urllib.parse: URL parsing

4. urllib.robotparser: robots.txt parsing
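As a quick illustration of two of these modules, here is a minimal sketch that runs offline. The URL and the robots.txt rules are made-up examples chosen for illustration, not taken from a real site.

```python
from urllib.parse import urlparse, parse_qs, urlencode
from urllib.robotparser import RobotFileParser

# urllib.parse: split a URL into components (example URL, chosen for illustration)
parts = urlparse('http://httpbin.org/get?word=hello&page=2')
print(parts.scheme, parts.netloc, parts.path)

# Decode the query string; every value comes back as a list
query = parse_qs(parts.query)
print(query)

# Encode a dict back into a query string
encoded = urlencode({'word': 'hello', 'page': 2})
print(encoded)

# urllib.robotparser: parse robots.txt rules supplied as lines (hypothetical rules)
robots = RobotFileParser()
robots.parse(['User-agent: *', 'Disallow: /private/'])
allowed_public = robots.can_fetch('*', 'http://example.com/index.html')
allowed_private = robots.can_fetch('*', 'http://example.com/private/data.html')
print(allowed_public, allowed_private)
```

In a real crawler the rules would normally be loaded with set_url() and read() instead of being passed in as a list.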

The request module of Urllib

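urlopen syntax: urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

The parameters are the URL, the request data, the timeout, and the certificate settings. The cafile, capath, and cadefault parameters were deprecated in Python 3.6 and removed in Python 3.13; use context instead.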
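The timeout parameter is easiest to understand together with error handling. The following is a minimal sketch, not a complete solution: fetch is a hypothetical helper name, and the 5-second default is an arbitrary choice.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=5):
    """Return the decoded body, or None if the request fails or times out.

    fetch is a hypothetical helper written for this example.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500
        print('HTTP error:', e.code, e.reason)
    except urllib.error.URLError as e:
        # Connection-level failure: DNS error, refused connection, connect timeout
        print('URL error:', e.reason)
    except socket.timeout:
        # The connection succeeded, but reading the response took too long
        print('Timed out while reading the response')
    return None
```

HTTPError is a subclass of URLError, so it has to be caught first. A call such as fetch('http://httpbin.org/get', timeout=3) would return the body text on success and None otherwise.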

# Example: a plain GET request
import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))

# Example: a POST request
import urllib.parse
import urllib.request

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read())
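The key point in the POST example is that data must be bytes, and that supplying data is what turns the request into a POST. A minimal offline sketch, which builds the requests without sending them (the form fields are illustrative):

```python
import urllib.parse
import urllib.request

# urlencode returns a str; urlopen and Request need bytes, so encode it
form = {'word': 'hello', 'lang': 'zh'}   # illustrative form fields
data = urllib.parse.urlencode(form).encode('utf-8')
print(data)

# Without data the method defaults to GET; with data it defaults to POST
get_req = urllib.request.Request('http://httpbin.org/get')
post_req = urllib.request.Request('http://httpbin.org/post', data=data)
print(get_req.get_method(), post_req.get_method())
```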

Response

import urllib.request

response = urllib.request.urlopen('https://www.python.org')
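print(type(response))       # the return type is HTTPResponse
# <class 'http.client.HTTPResponse'>

print(response.status)      # get the status code
# 200

print(response.getheaders())    # get all response headers
print(response.getheader('Server'))    # get a single response header

html = response.read()      # get the HTML content (as bytes)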

Request, which can carry request headers

Here the request is treated as an object: create the Request object first, then call urlopen to open it.

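import urllib.parse
import urllib.request

# Create the Request object first, then pass it to urlopen
req = urllib.request.Request('https://python.org')
response = urllib.request.urlopen(req)

# A Request can also carry headers, data, and method
# (the url, headers, and data values below are example values)
url = 'http://httpbin.org/post'
headers = {'User-Agent': 'Mozilla/5.0'}
data = urllib.parse.urlencode({'word': 'hello'}).encode('utf-8')
req = urllib.request.Request(url=url, data=data, headers=headers, method='POST')
response = urllib.request.urlopen(req)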
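A Request object can be inspected before it is sent, which helps when debugging headers. The sketch below runs offline; the URL, header values, and body are illustrative assumptions.

```python
import urllib.request

# Example header value; any browser-like User-Agent string would do
headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib.request.Request('http://httpbin.org/put', data=b'x=1',
                             headers=headers, method='PUT')

# Headers can also be added after the object is created
req.add_header('Accept', 'application/json')

# Request stores header names in capitalized form ('User-agent'),
# so lookups must use that spelling
print(req.get_header('User-agent'))
print(req.has_header('User-Agent'))

# An explicit method overrides the GET/POST default
print(req.get_method())
print(req.full_url)
```

The capitalized storage is easy to trip over: has_header('User-Agent') does not find a header that was added as 'User-Agent', while has_header('User-agent') does.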

Handler

Using a proxy

import urllib.request

proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743'
})
opener = urllib.request.build_opener(proxy_handler)
response = opener.open('http://httpbin.org/get')
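If every request should go through the proxy, the opener can be installed globally so that plain urlopen calls use it too. A minimal sketch that sends nothing over the network; the proxy addresses are the same placeholders as above and the User-Agent value is an assumption:

```python
import urllib.request

# Proxy addresses are placeholders; replace them with a proxy you actually run
proxy_handler = urllib.request.ProxyHandler({
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743',
})
opener = urllib.request.build_opener(proxy_handler)

# Headers set here are sent with every request made through this opener
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

# After install_opener, plain urllib.request.urlopen() also uses this opener
urllib.request.install_opener(opener)

# The custom ProxyHandler replaces the default one in the handler chain
proxy_handlers = [h for h in opener.handlers
                  if isinstance(h, urllib.request.ProxyHandler)]
print(len(proxy_handlers), proxy_handlers[0].proxies['http'])
```

Calling opener.open(...) directly, as in the example above, keeps the proxy local to that opener; install_opener changes the behavior of urlopen for the whole process.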