Python Crawler Techniques and Applications: Native Web Crawler Development

License: CC 4.0 BY-SA

3.1 The requests Library in Detail

3.1.1 requests Syntax

1. Installing the requests package

pip install requests

2. GET requests

A basic GET request looks like this:

import requests
r=requests.get('http://httpbin.org/get')
print(r.text)
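
A request can fail or hang, so a slightly more defensive variant is often useful in a crawler. The following is a minimal sketch rather than part of the original example: the helper name fetch_text and the 10-second timeout are assumptions chosen for illustration.

```python
import requests

def fetch_text(url, timeout=10):
    """Return the response body as text, or None if the request fails.

    fetch_text is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
        return r.text
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, malformed URLs, and HTTP errors
        print(f'request failed: {exc}')
        return None

# Example call (needs network access):
# print(fetch_text('http://httpbin.org/get'))
```

Returning None on failure keeps the calling loop simple; a crawler that needs to distinguish error types could re-raise instead.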

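A GET request with query parameters can either embed them in the URL or pass them through the params argument:

import requests
# Option 1: write the query string directly into the URL
r = requests.get('http://httpbin.org/get?name=williams_z&age=21')
# Option 2: pass the parameters as a dict
param = {'name': 'williams_z', 'age': 21}
r = requests.get('http://httpbin.org/get', params=param)  # params is a keyword argument, not a function
print(r.text)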
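
To see how a params dict ends up in the URL, the request can be built without being sent. This is a small sketch using requests.Request and prepare(), which are not used in the original text; it runs without network access.

```python
import requests

# Build the request without sending it, so the final URL can be inspected offline
req = requests.Request('GET', 'http://httpbin.org/get',
                       params={'name': 'williams_z', 'age': 21})
prepared = req.prepare()
print(prepared.url)  # http://httpbin.org/get?name=williams_z&age=21
```

The dict form produces the same URL as the hand-written query string, and it also takes care of URL-encoding values that contain special characters.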

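If the response body is JSON, the json() method parses it into Python objects. JSON is a text-based format that is easy for people to read and for machines to parse and generate. The code is as follows:

import requests
r = requests.get('http://httpbin.org/get')
print(r.json())  # parsed into a dict; no separate "import json" is needed
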
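
Not every response is valid JSON (an error page is often HTML), and json() raises an exception in that case. The sketch below shows one way to guard against this; parse_json is a hypothetical helper name, and returning None on failure is an assumption about what the caller wants.

```python
import requests

def parse_json(response):
    """Return the decoded JSON body, or None if the body is not valid JSON.

    parse_json is a hypothetical helper name used only for this sketch.
    """
    try:
        return response.json()
    except ValueError:
        # json() raises a ValueError subclass when decoding fails
        return None

# Example call (needs network access):
# data = parse_json(requests.get('http://httpbin.org/get', timeout=10))
# if data is not None:
#     print(data['url'])
```
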
To get the raw binary data of a response, which is mainly useful for images, video, and similar content, read r.content:

    import requests
    r=requests.get('http://httpbin.org/get')
    print(r.content)

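Saving binary data to a file looks like this:

import requests
r = requests.get('https://github.com/favicon.ico')
# 'wb' opens the file for writing ('w') in binary mode ('b')
with open('favicon.ico', 'wb') as f:  # f is the file object
    f.write(r.content)
# No f.close() is needed: the with block closes the file on exit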
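
Reading r.content loads the whole body into memory, which is fine for a favicon but less so for large images or video. A common alternative is to stream the body in chunks. The following is a sketch under assumptions: save_stream and download are hypothetical helper names, and the 8192-byte chunk size and 10-second timeout are arbitrary choices.

```python
import requests

def save_stream(response, path, chunk_size=8192):
    """Write a response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written

def download(url, path):
    """Stream url to path without loading the whole body into memory."""
    # stream=True defers the body download until iter_content is called
    with requests.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        return save_stream(r, path)

# Example call (needs network access):
# download('https://github.com/favicon.ico', 'favicon.ico')
```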

Adding request headers looks like this:

import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36 Edg/93.0.961.52'
}
r = requests.get('https://www.zhihu.com/explore', headers=headers)
print(r.text)
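
When many requests share the same headers, one option is to set them once on a requests.Session instead of passing headers= every time. Session is not covered in the original text, so treat this as an illustrative sketch; the User-Agent and Accept-Language values are placeholders. The request is only prepared, not sent, so the merged headers can be inspected offline.

```python
import requests

session = requests.Session()
# Headers set here are merged into every request made through this session
session.headers.update({'User-Agent': 'my-crawler/0.1'})  # placeholder value

# Prepare a request without sending it to inspect the merged headers
req = requests.Request('GET', 'https://www.zhihu.com/explore',
                       headers={'Accept-Language': 'zh-CN'})
prepared = session.prepare_request(req)
print(prepared.headers['User-Agent'])       # my-crawler/0.1
print(prepared.headers['Accept-Language'])  # zh-CN
```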

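3. Advanced operations

Uploading a file looks like this:

import requests
# The dict key 'file' is the form field name; open the file in binary mode
with open('favicon.ico', 'rb') as f:
    files = {'file': f}
    r = requests.post('http://httpbin.org/post', files=files)
print(r.text)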
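
The files argument also accepts a tuple of (filename, file object, content type), which gives control over the name and MIME type sent to the server. The sketch below uses an in-memory io.BytesIO object with placeholder bytes instead of a real favicon.ico, and prepares the request without sending it so the multipart encoding can be inspected offline.

```python
import io
import requests

# In-memory stand-in for a real file; the bytes are placeholder data
fake_icon = io.BytesIO(b'\x00\x00\x01\x00')

# Tuple form: (filename, file object, content type)
files = {'file': ('favicon.ico', fake_icon, 'image/x-icon')}

# Prepare the request without sending it to inspect the encoded body
prepared = requests.Request('POST', 'http://httpbin.org/post',
                            files=files).prepare()
print(prepared.headers['Content-Type'])  # multipart/form-data with a boundary
```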

Reading the cookies set by a response looks like this:

import requests
r=requests.get('http://www.baidu.com')
print(r.cookies)
for key,value in r.cookies.items():
    print(key+ '=' +value)
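
Cookies can also be sent with a request through the cookies argument. The sketch below prepares a request without sending it, to show the Cookie header that requests would generate; the cookie name and value are placeholders.

```python
import requests

# Cookie name and value are placeholders for illustration
req = requests.Request('GET', 'http://httpbin.org/cookies',
                       cookies={'session_id': 'abc123'})
prepared = req.prepare()
print(prepared.headers['Cookie'])  # session_id=abc123
```

Passing the same dict as requests.get(url, cookies=...) sends this header over the network.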

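Certificate verification. The following code skips verification with verify=False and silences the resulting warning:

import requests
from requests.packages import urllib3
urllib3.disable_warnings()  # the import and this call suppress the warning raised when verification is off
r = requests.get('https://www.12306.cn', verify=False)
print(r.status_code)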
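
Turning verification off also removes the protection that HTTPS provides against tampered connections. When a site uses a certificate from a private authority, an alternative is to point verify at a file of trusted CA certificates. This is a sketch under assumptions: make_session is a hypothetical helper and the bundle path is a placeholder.

```python
import requests

def make_session(ca_bundle=None):
    """Return a Session that verifies certificates.

    ca_bundle is an optional path to a PEM file of trusted CA certificates;
    make_session and the path used below are hypothetical.
    """
    session = requests.Session()
    if ca_bundle is not None:
        # Verify against this bundle instead of the default one
        session.verify = ca_bundle
    return session

# Example call (needs network access and a real bundle file):
# s = make_session('/path/to/ca-bundle.pem')
# try:
#     print(s.get('https://www.12306.cn', timeout=10).status_code)
# except requests.exceptions.SSLError as exc:
#     print(f'certificate check failed: {exc}')
```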

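Proxy settings look like this:

import requests
# One entry per URL scheme; the local addresses and ports are placeholders
proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9744',
}
r = requests.get('https://www.taobao.com', proxies=proxies)
print(r.status_code)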
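
The proxies dict is keyed by URL scheme, so an https:// URL is only proxied when an 'https' entry exists. The sketch below uses select_proxy from requests.utils purely to inspect which proxy would be chosen for a URL; nothing is sent, and the proxy addresses are the same local placeholders.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9744',
}

# select_proxy returns the proxy requests would pick for a given URL
print(select_proxy('http://httpbin.org/get', proxies))   # http://127.0.0.1:9743
print(select_proxy('https://www.taobao.com', proxies))   # https://127.0.0.1:9744

# With only an 'http' entry, an https URL gets no proxy at all
http_only = {'http': 'http://127.0.0.1:9743'}
print(select_proxy('https://www.taobao.com', http_only))  # None
```

Note that a Python dict literal with the same key written twice silently keeps only the last value, so each scheme should appear exactly once.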