1. urlopen()
import urllib.request
response = urllib.request.urlopen("http://www.baidu.com")
urlopen(url) returns an object of type HTTPResponse:
print(type(response))
<class 'http.client.HTTPResponse'>
We can call read() to get the page source. It returns bytes, so we decode it to text:
print(response.read().decode('utf-8'))
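The read-and-decode step above can also be sketched with a `with` block, so the connection is closed once the body has been read. This is a minimal sketch and not the only way to do it. `DemoHandler` and `fetch_text` are hypothetical names introduced for illustration, and the small local server only stands in for a real site so the example runs without Internet access. The charset is taken from the Content-Type header when the server declares one, with UTF-8 assumed as the fallback.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that stands in for a real website."""

    def do_GET(self):
        body = "<html><body>你好, urllib</body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the demo quiet


def fetch_text(url, timeout=5):
    """Fetch url and decode the body with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    page = fetch_text(f"http://127.0.0.1:{server.server_address[1]}/")
finally:
    server.shutdown()
    server.server_close()

print(page)
```

Against a real site, only the `fetch_text(url)` call is needed. Note that read() consumes the body, so a second read() on the same response returns empty bytes.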
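The response also exposes the status code and the headers. getheader(key) returns the value of the header named key. For example, the third line of code below passes 'Server' and gets back BWS/1.1. Incidentally, BWS/1.1 is a web server developed in-house by Baidu.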
print(response.status)
print(response.getheaders())
print(response.getheader('Server'))
200
[('Bdpagetype', '1'), ('Bdqid', '0x86df3e810007cff5'), ('Cache-Control', 'private'), ('Content-Type', 'text/html'), ('Cxy_all', 'baidu+a3b3120deaa50f29fcbabca556087115'), ('Date', 'Sun, 23 Sep 2018 12:57:25 GMT'), ('Expires', 'Sun, 23 Sep 2018 12:56:31 GMT'), ('P3p', 'CP=" OTI DSP COR IVA OUR IND COM "'), ('Server', 'BWS/1.1'), ('Set-Cookie', 'BAIDUID=67D8FDD012F5DDD1D66D4CF65F0097DE:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'BIDUPSID=67D8FDD012F5DDD1D66D4CF65F0097DE; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'PSTM=1537707445; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com'), ('Set-Cookie', 'delPer=0; expires=Tue, 15-Sep-2048 12:56:31 GMT'), ('Set-Cookie', 'BDSVRTM=0; path=/'), ('Set-Cookie', 'BD_HOME=0; path=/'), ('Set-Cookie', 'H_PS_PSSID=1429_27213_21093_22158; path=/; domain=.baidu.com'), ('Vary', 'Accept-Encoding'), ('X-Ua-Compatible', 'IE=Edge,chrome=1'), ('Connection', 'close'), ('Transfer-Encoding', 'chunked')]
BWS/1.1
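The output above contains several Set-Cookie entries. getheader(name) joins repeated headers into a single string separated by ', ', which is awkward for cookies because their expires dates also contain commas. One possible workaround, sketched here, is to filter the list returned by getheaders() directly. `header_values` is a hypothetical helper, not part of urllib, and the sample list is a shortened copy of the pairs shown above so the example runs without a network connection.

```python
# A few (name, value) pairs copied from the getheaders() output above.
headers = [
    ("Content-Type", "text/html"),
    ("Server", "BWS/1.1"),
    ("Set-Cookie", "BDSVRTM=0; path=/"),
    ("Set-Cookie", "BD_HOME=0; path=/"),
    ("Connection", "close"),
]


def header_values(pairs, name):
    """Return every value whose header name matches, ignoring case."""
    wanted = name.lower()
    return [value for key, value in pairs if key.lower() == wanted]


cookies = header_values(headers, "set-cookie")
server = header_values(headers, "Server")

print(cookies)  # ['BDSVRTM=0; path=/', 'BD_HOME=0; path=/']
print(server)   # ['BWS/1.1']
```

With a live response, the same helper could be called as header_values(response.getheaders(), 'Set-Cookie').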