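2. The Python 3 third-party library requests
2.1 Installing requests
Because requests is a third-party library, it must be installed before it can be used. Here it is installed from the terminal in VS Code.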
PS D:\> cd '.\Program Files (x86)\Python\Python37-32\Scripts\'
PS D:\Program Files (x86)\Python\Python37-32\Scripts> ls
Directory: D:\Program Files (x86)\Python\Python37-32\Scripts
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 2018/12/21 16:22 93065 easy_install-3.7.exe
-a---- 2018/12/21 16:22 93065 easy_install.exe
-a---- 2019/1/19 18:37 93051 f2py.exe
-a---- 2018/12/21 16:22 93047 pip.exe
-a---- 2018/12/21 16:22 93047 pip3.7.exe
-a---- 2018/12/21 16:22 93047 pip3.exe
PS D:\Program Files (x86)\Python\Python37-32\Scripts> pip install requests
Collecting requests
Downloading https://files.pythonhosted.org/packages/7d/e3/20f3d364d6c8e5d2353c72a67778eb189176f08e873c9900e10c0287b84b/requests-2.21.0-py2.py3-none-any.whl (57kB)
100% |████████████████████████████████| 61kB 26kB/s
Collecting idna<2.9,>=2.5 (from requests)
Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)
100% |████████████████████████████████| 61kB 13kB/s
Collecting urllib3<1.25,>=1.21.1 (from requests)
Downloading https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl (118kB)
100% |████████████████████████████████| 122kB 65kB/s
Collecting certifi>=2017.4.17 (from requests)
Downloading https://files.pythonhosted.org/packages/9f/e0/accfc1b56b57e9750eba272e24c4dddeac86852c2bebd1236674d7887e8a/certifi-2018.11.29-py2.py3-none-any.whl (154kB)
100% |████████████████████████████████| 163kB 110kB/s
Collecting chardet<3.1.0,>=3.0.2 (from requests)
Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
100% |████████████████████████████████| 143kB 79kB/s
Installing collected packages: idna, urllib3, certifi, chardet, requests
The script chardetect.exe is installed in 'd:\program files (x86)\python\python37-32\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed certifi-2018.11.29 chardet-3.0.4 idna-2.8 requests-2.21.0 urllib3-1.24.1
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
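To confirm that the installation worked, one option is to ask Python whether the package can be found. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of requests or pip.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be located for import."""
    return importlib.util.find_spec(package_name) is not None


# Expected to report True once 'pip install requests' has succeeded
# for the interpreter that runs this script.
print("requests installed:", is_installed("requests"))
```

If the Scripts directory is not on PATH, as the warning in the log above notes, running pip as a module (python -m pip install requests) avoids having to change into that directory first.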
2.2 Commonly used functions in requests
(1) def get(url, params=None, **kwargs): sends a GET request;
(2) def post(url, data=None, json=None, **kwargs): sends a POST request;
(3) def put(url, data=None, **kwargs): sends a PUT request;
(4) def delete(url, **kwargs): sends a DELETE request;
(5) def head(url, **kwargs): sends a HEAD request;
(6) def options(url, **kwargs): sends an OPTIONS request.
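Each of these helpers sends a request with the HTTP method of the same name. As a rough illustration of that correspondence, the sketch below builds (but does not send) one request per method with requests.Request and prints the method each would use. The URL http://example.com/resource is a placeholder chosen for this example, not something from the original text.

```python
import requests

# Placeholder URL for illustration only; nothing is sent over the network.
URL = "http://example.com/resource"

methods = ["get", "post", "put", "delete", "head", "options"]

prepared_methods = []
for name in methods:
    # Request(...).prepare() builds the request object without sending it.
    prepared = requests.Request(name, URL).prepare()
    prepared_methods.append(prepared.method)
    print(name, "->", prepared.method)
```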
2.3 Writing code
2.3.1 GET requests
Program 2-1: a GET request without the params argument, i.e. with the parameters written directly in the URL
import requests
url = 'http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=0&rsv_idx=1&tn=baidu&wd=python&rsv_pq=fa43b12200000164&rsv_t=f170%2BqL27EHcWq0l5BhkvhF0xZLUYJgY2uSZcFmdBQ1ckOGJrpW8Q9%2Bq5s8&rqlang=cn&rsv_enter=1&rsv_sug3=2&rsv_sug1=2&rsv_sug7=101&rsv_sug2=0&inputT=2747&rsv_sug4=3882&rsv_sug=2'
response = requests.get(url)
print(response)
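print(response.status_code)  # print the status code
print(response.url)  # print the request URL
print(response.headers)  # print the response headers
print(response.cookies)  # print the cookies
print(len(response.text))  # print the length of the page source as text (str)
print(len(response.content))  # print the length of the page source as bytes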
Output:
<Response [200]>
200
http://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=0&rsv_idx=1&tn=baidu&wd=python&rsv_pq=fa43b12200000164&rsv_t=f170%2BqL27EHcWq0l5BhkvhF0xZLUYJgY2uSZcFmdBQ1ckOGJrpW8Q9%2Bq5s8&rqlang=cn&rsv_enter=1&rsv_sug3=2&rsv_sug1=2&rsv_sug7=101&rsv_sug2=0&inputT=2747&rsv_sug4=3882&rsv_sug=2
{'Server': '', 'Date': 'Tue, 22 Jan 2019 17:42:37 GMT', 'Content-Type': 'text/html;charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Bdpagetype': '3', 'Bdqid': '0xbd4c225100000a7f', 'Cxy_all': 'baidu+f994cc8c061baa454dfb53081e83ad67', 'Cxy_ex': '1548178957+3343058212+d41d8cd98f00b204e9800998ecf8427e', 'P3p': 'CP=" OTI DSP COR IVA OUR IND COM "', 'Set-Cookie': 'BAIDUID=88BA57D469EBBB843189406B0343DA31:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BIDUPSID=88BA57D469EBBB843189406B0343DA31; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, PSTM=1548178957; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, delPer=0; path=/; domain=.baidu.com, BD_CK_SAM=1;path=/, PSINO=5; domain=.baidu.com; path=/, BDSVRTM=367; path=/, H_PS_PSSID=1458_21117_28329_28131_26350_28415_22158; path=/; domain=.baidu.com', 'Vary': 'Accept-Encoding', 'X-Ua-Compatible': 'IE=Edge,chrome=1'}
<RequestsCookieJar[<Cookie BAIDUID=88BA57D469EBBB843189406B0343DA31:FG=1 for .baidu.com/>, <Cookie BIDUPSID=88BA57D469EBBB843189406B0343DA31 for .baidu.com/>, <Cookie H_PS_PSSID=1458_21117_28329_28131_26350_28415_22158 for .baidu.com/>, <Cookie PSINO=5 for .baidu.com/>, <Cookie PSTM=1548178957 for .baidu.com/>, <Cookie delPer=0 for .baidu.com/>, <Cookie BDSVRTM=367 for www.baidu.com/>, <Cookie BD_CK_SAM=1 for www.baidu.com/>]>
430754
437553
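The last two numbers differ because response.text is a decoded string while response.content holds the raw bytes, and in UTF-8 (the charset reported in the Content-Type header above) a Chinese character takes up several bytes. A small stand-alone illustration follows; it uses a made-up sample string rather than the actual page.

```python
# Sample string (an assumption for illustration, not the Baidu page itself).
sample_text = "百度 python"

# Encoding to UTF-8 gives the kind of byte string that response.content holds.
sample_bytes = sample_text.encode("utf-8")

print(len(sample_text))   # number of characters
print(len(sample_bytes))  # number of bytes; larger, since each Chinese character needs 3 bytes
```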
Program 2-2: a GET request with the params argument
import requests
url = 'http://www.baidu.com/s?'
params = {
    'wd': 'python'
}
response = requests.get(url, params=params)
print(response)
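print(response.status_code)  # print the status code
print(response.url)  # print the request URL
print(response.headers)  # print the response headers
print(response.cookies)  # print the cookies
print(len(response.text))  # print the length of the page source as text (str)
print(len(response.content))  # print the length of the page source as bytes
Output: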
<Response [200]>
200
http://www.baidu.com/s?wd=python
{'Server': '', 'Date': 'Tue, 22 Jan 2019 17:48:30 GMT', 'Content-Type': 'text/html;charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Bdpagetype': '3', 'Bdqid': '0xf0856d7f000016df', 'Cxy_all': 'baidu+1c4aa8621260b97cff125fd24a7e8dc4', 'Cxy_ex': '1548179310+3343058212+d41d8cd98f00b204e9800998ecf8427e', 'P3p': 'CP=" OTI DSP COR IVA OUR IND COM "', 'Set-Cookie': 'BAIDUID=F885F86D3499A74A325A4A8152F4DF77:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, BIDUPSID=F885F86D3499A74A325A4A8152F4DF77; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, PSTM=1548179310; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com, delPer=0; path=/; domain=.baidu.com, BD_CK_SAM=1;path=/, PSINO=5; domain=.baidu.com; path=/, BDSVRTM=522; path=/, H_PS_PSSID=1444_21088_28328_28132_26350_27244; path=/; domain=.baidu.com', 'Vary': 'Accept-Encoding', 'X-Ua-Compatible': 'IE=Edge,chrome=1'}
<RequestsCookieJar[<Cookie BAIDUID=F885F86D3499A74A325A4A8152F4DF77:FG=1 for .baidu.com/>, <Cookie BIDUPSID=F885F86D3499A74A325A4A8152F4DF77 for .baidu.com/>, <Cookie H_PS_PSSID=1444_21088_28328_28132_26350_27244 for .baidu.com/>, <Cookie PSINO=5 for .baidu.com/>, <Cookie PSTM=1548179310 for .baidu.com/>, <Cookie delPer=0 for .baidu.com/>, <Cookie BDSVRTM=522 for www.baidu.com/>, <Cookie BD_CK_SAM=1 for www.baidu.com/>]>
420101
427290
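The string after the question mark (?) in a URL is the query string, which carries the parameters, so when the parameters are passed through params everything after the question mark can be removed from the URL. The results page is reached by typing 'python' into the Baidu search box, so the only parameter that has to be supplied is the search term; in the original URL it corresponds to the variable wd (you can also press F12 and inspect the request under Network-XHR).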
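To see which parameter names a URL carries, the query string can be split with the standard library's urllib.parse. The sketch below uses a shortened form of the URL from Program 2-1 (only the ie and wd parameters are kept, for brevity) and then rebuilds a query string from a dict, which is comparable to what passing params produces.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Shortened version of the URL from Program 2-1 (most parameters omitted).
url = "http://www.baidu.com/s?ie=utf-8&wd=python"

parts = urlsplit(url)
query = parse_qs(parts.query)  # maps each parameter name to a list of values
print(parts.path)
print(query)

# Going the other way: build a query string from a dict of parameters.
rebuilt = "http://www.baidu.com/s?" + urlencode({"wd": "python"})
print(rebuilt)
```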
Program 2-3: sending request headers
import requests
url = 'http://www.baidu.com/s?'
params = {
    'wd': 'python'
}
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; WOW64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/63.0.3239.132 Safari/537.36'),
}
response = requests.get(url, params=params, headers=headers)
print(response)
print(len(response.text))  # print the length of the page source as text (str)
Output:
<Response [200]>
430857
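In the get function, the request headers must be passed with the keyword argument name headers; the name of the variable that holds the dict does not matter.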
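When no headers argument is given, requests identifies itself with its own default User-Agent, which some sites may treat differently from a browser. The sketch below builds the request from Program 2-3 without sending it and compares the default User-Agent with the one that would be sent. Session.prepare_request is used here only as a way to inspect the merged headers; it is not what the original program calls.

```python
import requests

# Same User-Agent as Program 2-3, written with adjacent string literals
# so that no line-continuation whitespace can end up inside the value.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/63.0.3239.132 Safari/537.36"
    ),
}

# The User-Agent that requests uses when none is supplied.
default_agent = requests.utils.default_user_agent()

session = requests.Session()
request = requests.Request(
    "GET", "http://www.baidu.com/s", params={"wd": "python"}, headers=headers
)
# prepare_request merges the session's default headers with ours; nothing is sent.
prepared = session.prepare_request(request)

print("default:", default_agent)
print("sent:   ", prepared.headers["User-Agent"])
```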
2.3.2 POST requests
# Program 2-4
import requests
url = ('https://www.lagou.com/jobs/list_python?'
       'city=%E5%85%A8%E5%9B%BD&cl=false&fromSearch=true&labelWords=&suginput=')
heads = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; WOW64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/63.0.3239.132 Safari/537.36')
}
data = {
    'first': 'true',
    'pn': '1',
    'kd': 'python'
}
response = requests.post(url, data=data, headers=heads)
print(response)
print(len(response.text))  # print the length of the page source as text (str)
Output:
<Response [200]>
92679
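The post function accepts the body either as data or as json (see the signature in 2.2). As a sketch of the difference, the following prepares both variants without sending them and prints the resulting Content-Type and body. The URL http://example.com/search is a placeholder for this example, and whether a given server expects form data or JSON depends on the site.

```python
import requests

# Placeholder endpoint for illustration only; nothing is sent.
URL = "http://example.com/search"

payload = {"first": "true", "pn": "1", "kd": "python"}

# data=dict produces a form-encoded body.
form_request = requests.Request("POST", URL, data=payload).prepare()
# json=dict produces a JSON body.
json_request = requests.Request("POST", URL, json=payload).prepare()

print(form_request.headers["Content-Type"], form_request.body)
print(json_request.headers["Content-Type"], json_request.body)
```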
2.3.3 Proxy IP
# Program 2-5
import requests
url = 'http://www.baidu.com/s?'
params = {
    'wd': 'python'
}
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; WOW64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/63.0.3239.132 Safari/537.36'),
}
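# The key must match the scheme of the target URL (http here);
# otherwise the proxy is not applied to this request.
proxy = {
    'http': 'http://61.129.70.109:1080'
}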
response = requests.get(url=url, params=params, headers=headers, proxies=proxy)
print(response)
print(len(response.text))  # print the length of the page source as text (str)
In the get function, the proxy must be passed with the keyword argument name proxies.
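requests chooses a proxy from the proxies dict by matching the scheme of the target URL against the dict keys. The sketch below uses requests.utils.select_proxy to show which entry would be chosen, without opening a connection. The proxy address is the one from Program 2-5 and may no longer be reachable; the "http" key and the http:// prefix are choices made for this illustration.

```python
import requests

proxies = {
    "http": "http://61.129.70.109:1080",
}

# An http:// URL matches the "http" key...
http_choice = requests.utils.select_proxy("http://www.baidu.com/s", proxies)
# ...but an https:// URL finds no matching key, so no proxy would be used for it.
https_choice = requests.utils.select_proxy("https://www.baidu.com/s", proxies)

print(http_choice)
print(https_choice)
```

To route both kinds of URL through a proxy, the dict would need an entry for each scheme.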