- Main program entry point
if __name__ == "__main__":
    print('hello')
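A minimal sketch of why this guard is useful: putting the work in a function keeps it from running when the file is imported by another module. The function name `main` is an assumption for illustration, not something from the original notes.

```python
def main():
    """Hypothetical entry function: prints a greeting and returns it."""
    message = 'hello'
    print(message)
    return message


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```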
- urllib
- GET request
import urllib.request
# GET request
response = urllib.request.urlopen("http://www.baidu.com")
print(response)
Output:
<http.client.HTTPResponse object at 0x000002A5EE9B8208>
This is because urlopen returns a response object (here an http.client.HTTPResponse). Call its .read() method to read the body:
print(response.read())
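read() returns bytes, and a response body can be read only once. To avoid garbled text, convert the bytes to a string with decode: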
print(response.read().decode('utf-8'))
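A slightly more defensive variant of the GET example might look like the sketch below. The helper name `fetch_text` and its `default_charset` parameter are assumptions for illustration. It closes the connection with a `with` block, reads the body once, and prefers the charset declared in the Content-Type header over a hard-coded 'utf-8'.

```python
import urllib.request


def fetch_text(url, default_charset='utf-8'):
    """Fetch url with a GET request and return the body as a str."""
    # The with-statement closes the connection when the block exits.
    with urllib.request.urlopen(url) as response:
        # read() consumes the body, so the result is stored before decoding.
        raw = response.read()
        # Use the charset from the Content-Type header if the server sent one.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)
```

For example, `fetch_text('http://www.baidu.com')` would return the page as a string, assuming the site is reachable.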
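- POST request
The following site is used for testing:
http://httpbin.org/
Running the code below raises an error. Without a data argument, urlopen sends a GET request, which the /post endpoint rejects (HTTP 405 Method Not Allowed):
# Request the POST endpoint without any data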
response = urllib.request.urlopen('http://httpbin.org/post')
print(response.read())
The correct approach is to pass the form data explicitly:
import urllib.parse
data = bytes(urllib.parse.urlencode({'hello': 'world'}), encoding='utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read().decode('utf-8'))
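Explanation: urlencode turns the dictionary into a URL-encoded string ('hello=world'), bytes() encodes that string as UTF-8 bytes, and the result is passed to urlopen as the data argument. Supplying data makes urlopen send a POST request.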
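The encoding step, and its effect on the request method, can be checked without touching the network. This is a small sketch: `build_form_body` is a hypothetical helper name, and no request is actually sent.

```python
import urllib.parse
import urllib.request


def build_form_body(fields):
    """Encode a dict as application/x-www-form-urlencoded bytes."""
    return urllib.parse.urlencode(fields).encode('utf-8')


body = build_form_body({'hello': 'world'})
print(body)  # b'hello=world'

# A Request defaults to GET, and switches to POST once data is attached.
without_data = urllib.request.Request('http://httpbin.org/post')
with_data = urllib.request.Request('http://httpbin.org/post', data=body)
print(without_data.get_method())  # GET
print(with_data.get_method())     # POST
```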
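- Timeout handling
A request like the one below will time out, because 0.01 seconds is too short for the server to respond: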
response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.01)
print(response.read().decode('utf-8'))
The error raised is:
urllib.error.URLError: <urlopen error timed out>
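Use try/except to handle the error instead of letting the program crash:
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.01)
    print(response.read().decode('utf-8'))
except urllib.error.URLError as e:
    # URLError also covers other failures (e.g. DNS errors); e.reason holds the cause.
    print('time out')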
Output:
time out
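The pattern above can be wrapped in a reusable function. The sketch below makes two assumptions beyond the notes: the helper name `fetch_or_none` is hypothetical, and it also catches `socket.timeout`, because a timeout that happens while the body is being read is raised directly rather than wrapped in URLError.

```python
import socket
import urllib.error
import urllib.request


def fetch_or_none(url, timeout=5.0):
    """Return the decoded body, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except urllib.error.URLError as e:
        # A connect timeout arrives wrapped in URLError, with the cause in e.reason.
        if isinstance(e.reason, socket.timeout):
            print('time out')
        else:
            print('request failed:', e.reason)
    except socket.timeout:
        # A timeout while reading the body is not wrapped in URLError.
        print('time out while reading')
    return None
```

A caller can then write `text = fetch_or_none('http://httpbin.org/get', timeout=0.01)` and check for `None` instead of handling exceptions at every call site.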
- Status code (response.status) and response headers (response.getheaders())
response = urllib.request.urlopen('http://httpbin.org/get')
print(response.status)
Output:
200
The response headers can be retrieved with getheaders():
response = urllib.request.urlopen('http://httpbin.org/get')
print(response.getheaders())
Partial output:
[('Date', 'Fri, 18 Sep 2020 01:43:30 GMT'),
('Content-Type', 'application/json'), ('Content-Length', '272')]
To read a single response header, use getheader():
response = urllib.request.urlopen('http://httpbin.org/get')
print(response.getheader('Content-Type'))
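The three snippets above each open a new connection. As a sketch, the same information can be collected from a single response through its `status` and `headers` attributes. The helper name `describe` is an assumption for illustration.

```python
import urllib.request


def describe(response):
    """Collect the status code and headers of an open response into a dict."""
    return {
        'status': response.status,
        # response.headers is a dict-like object with case-insensitive lookup.
        'content_type': response.headers.get('Content-Type'),
        'headers': dict(response.headers.items()),
    }
```

It could be used as `describe(urllib.request.urlopen('http://httpbin.org/get'))`, which makes one request instead of three.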
- Simulating browser access
Wrap the URL, form data, and headers in a Request object, then pass that object to urlopen:
url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'
}
data = bytes(urllib.parse.urlencode({'hello': 15}), encoding='utf-8')
req = urllib.request.Request(url=url, data=data, headers=headers, method='POST')
response = urllib.request.urlopen(req)
print(response.read().decode('utf-8'))
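Before sending, a Request object can be inspected to confirm what will go over the wire. The sketch below sends nothing. `build_post` is a hypothetical helper, and the short User-Agent string is a placeholder. Note that Request stores header names capitalized, so the lookup key is 'User-agent'.

```python
import urllib.parse
import urllib.request


def build_post(url, fields, user_agent):
    """Build (but do not send) a POST request carrying a custom User-Agent."""
    data = urllib.parse.urlencode(fields).encode('utf-8')
    headers = {'User-Agent': user_agent}
    return urllib.request.Request(url=url, data=data, headers=headers, method='POST')


req = build_post('http://httpbin.org/post', {'hello': 15}, 'Mozilla/5.0 (example)')
print(req.get_method())              # POST
print(req.data)                      # b'hello=15'
print(req.get_header('User-agent'))  # Mozilla/5.0 (example)
```

Without an explicit header, urllib identifies itself with a `Python-urllib/x.y` User-Agent, which is why a browser-like value is set here.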
Next, visit Douban:
url = 'https://www.douban.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'
}
req = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(req)
print(response.read().decode('utf-8'))
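When several pages are fetched with the same browser-like headers, one option is an opener that attaches them to every request. This is a sketch rather than part of the original notes: `make_browser_opener` is a hypothetical helper name, while `build_opener` and the `addheaders` attribute belong to urllib.request.

```python
import urllib.request

USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'
)


def make_browser_opener(user_agent=USER_AGENT):
    """Return an opener that sends user_agent with every request."""
    opener = urllib.request.build_opener()
    # Replaces the default ('User-agent', 'Python-urllib/x.y') entry.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener
```

With this in place, `make_browser_opener().open('https://www.douban.com')` would behave like the urlopen call above without repeating the headers dictionary.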