July 10, 2018 (day 281 in a row)
Title: Reviewing the basic library urllib
Content:
A quick review of the basic libraries:
1.urllib:
a. urllib.request.urlopen()
data parameter: optional. When it is supplied, the request method becomes POST.
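Test:
import urllib.parse
import urllib.request

# urlencode builds the form string; data must be bytes, so encode it
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
# passing data switches the request from GET to POST
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read())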
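The effect of data on the request method can also be checked without touching the network. A minimal sketch, assuming it is enough to inspect a Request object (the names get_req and post_req are only for illustration):

```python
import urllib.parse
import urllib.request

# urlencode returns a str, so it has to be encoded to bytes before use as data
data = urllib.parse.urlencode({'word': 'hello'}).encode('utf-8')

# Without data the method is GET; with data it becomes POST
get_req = urllib.request.Request('http://httpbin.org/get')
post_req = urllib.request.Request('http://httpbin.org/post', data=data)

print(data)
print(get_req.get_method())
print(post_req.get_method())
```

Nothing is sent here; get_method() only reports which method urlopen() would use.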
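timeout parameter (in seconds). Test:
import socket
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
    print(response.read())
except urllib.error.URLError as e:
    # e.reason holds the underlying error, here a socket timeout
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')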
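One thing worth noting: a timeout does not always arrive wrapped in URLError. It can also surface as a bare socket.timeout, for example while the response body is being read. A rough sketch that handles both cases; is_timeout and fetch are hypothetical helper names, not part of urllib:

```python
import socket
import urllib.error
import urllib.request


def is_timeout(exc):
    """Return True if exc is a timeout, raised directly or wrapped in URLError."""
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, socket.timeout)
    return isinstance(exc, socket.timeout)


def fetch(url, timeout=5.0):
    """Hypothetical helper: return the body, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as e:
        if is_timeout(e):
            return None
        raise  # anything that is not a timeout is passed on unchanged
```

With this, fetch('http://httpbin.org/get', timeout=0.1) would return None on a timeout instead of raising.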
b. Request: wrap the URL in a Request object and pass it to urlopen():
request = urllib.request.Request('https://python.org')
response = urllib.request.urlopen(request)
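The point of a Request object is that it can carry more than the URL. A small sketch, assuming a made-up User-Agent string and form field, that sets data, headers and the method explicitly:

```python
import urllib.parse
import urllib.request

url = 'http://httpbin.org/post'
headers = {'User-Agent': 'Mozilla/5.0 (compatible; review-notes)'}  # made-up value
data = urllib.parse.urlencode({'name': 'demo'}).encode('utf-8')     # made-up field

request = urllib.request.Request(url, data=data, headers=headers, method='POST')
# Headers can also be added after the object is created
request.add_header('Accept', 'application/json')

print(request.full_url)
print(request.get_method())
# Header names are stored capitalized, so the lookup key is 'User-agent'
print(request.get_header('User-agent'))
```

The request is only built here; passing it to urllib.request.urlopen(request) would actually send it.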
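c. BaseHandler: the parent class of all Handler classes.
OpenerDirector: the opener class that chains Handlers together.
Advanced use of Handlers and openers covers password authentication, proxies, cookies, and so on.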
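As a reminder of how these pieces fit together, here is a sketch that builds one opener with handlers for all three cases. The URL, credentials and proxy address are placeholders I made up, and nothing is requested:

```python
import http.cookiejar
import urllib.request

# Password authentication (hypothetical URL and credentials)
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://localhost:5000/', 'username', 'password')
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# Proxy (hypothetical local proxy address)
proxy_handler = urllib.request.ProxyHandler({'http': 'http://127.0.0.1:9743'})

# Cookies: the jar is filled as responses come back through the opener
cookie_jar = http.cookiejar.CookieJar()
cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)

# build_opener returns an OpenerDirector with these handlers installed
opener = urllib.request.build_opener(auth_handler, proxy_handler, cookie_handler)
print(type(opener))
```

From here, opener.open(url) would be used in place of urlopen(url).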
d. urlparse(): identifies a URL and splits it into its components;
from urllib.parse import urlparse
result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(type(result), result)
Output:
<class 'urllib.parse.ParseResult'> ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
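It takes three parameters:
urlstring: the URL to parse (required)
scheme: the default scheme, used only when the URL itself has none
allow_fragments: whether to parse the fragment as its own component; defaults to True. When False, the fragment is left as part of the path, params, or query.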
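The last two parameters are easier to remember with a quick experiment. A short sketch reusing the same Baidu URL (the variable names are only for illustration):

```python
from urllib.parse import urlparse

# scheme is only a fallback: it applies when the URL has no scheme of its own.
# Without '//' the host is not recognized as netloc and ends up in path.
no_scheme = urlparse('www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(no_scheme)

# A scheme already present in the URL wins over the default
has_scheme = urlparse('http://www.baidu.com/index.html;user?id=5#comment', scheme='https')
print(has_scheme.scheme)

# allow_fragments=False: the fragment stays attached to the query
no_frag = urlparse('http://www.baidu.com/index.html;user?id=5#comment', allow_fragments=False)
print(no_frag.query, repr(no_frag.fragment))

# With no query either, the fragment stays attached to the path
in_path = urlparse('http://www.baidu.com/index.html#comment', allow_fragments=False)
print(in_path.path)
```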
e. robotparser (urllib.robotparser): parses robots.txt
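A small sketch of how the parser is used. To keep it offline, the rules are fed in directly with parse() instead of being downloaded; the rules and the example.com URLs are made up for the test:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; normally set_url() and read() would fetch the real file
rules = """\
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# can_fetch(useragent, url) reports whether that agent may crawl the URL
blocked = rp.can_fetch('*', 'http://example.com/private/data.html')
allowed = rp.can_fetch('*', 'http://example.com/index.html')
print(blocked, allowed)
```

Rules are checked in order and the first match decides, which is why the Disallow line comes before the catch-all Allow.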