[Python Crawler] How to Use Decorators Elegantly in a Web Crawler
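Intro
When writing a crawler in Python, we usually end up with a separate function for each endpoint we request. To improve code reuse and keep things as efficient as possible, we can define a decorator that sends the request, returns the response, and handles errors in one place.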
Requirements
- Python 3
- httpx:
  pip install httpx
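Common usage
import httpx

# Define the decorator: the wrapped function only builds an httpx.Request;
# the decorator sends it and returns the httpx.Response.
def request(func):
    def wrapper(*arg, **kwarg):
        req = func(*arg, **kwarg)
        # The context manager closes the client once the response has been read.
        with httpx.Client() as client:
            ret = client.send(req)
        return ret
    return wrapper

# Define the crawler function
@request
def getUsers(method, url, params):
    return httpx.Request(method, url, params=params)

# Run the program
data = getUsers('GET', 'http://your_host:port/getUsers', {'user_query': 'username'})
print(data.text)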
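Async coroutines
When the decorator is used with coroutines, the wrapper becomes an async function; functools.wraps is applied so that the wrapper keeps the name and docstring of the function it decorates.
import httpx
import asyncio
import functools

# Decorator factory: takes the shared AsyncClient and returns the actual decorator.
def asyncRequest(client):
    def inner(func):
        @functools.wraps(func)
        async def wrapper(*arg, **kwarg):
            req = func(*arg, **kwarg)
            ret = await client.send(req)
            return ret
        return wrapper
    return inner

client = httpx.AsyncClient()

# The decorated function still only builds the request; calling it returns a coroutine.
@asyncRequest(client)
def asyncGetUsers(method, url, params):
    return httpx.Request(method, url, params=params)

async def main():
    # Close the shared client once the request has finished.
    async with client:
        return await asyncGetUsers('GET', 'http://your_host:port/getUsers', {'user_query': 'username'})

data = asyncio.run(main())
print(data.text)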
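To see what functools.wraps contributes here, the following sketch compares an async wrapper with and without it. It uses only the standard library; `plain`, `wrapped`, `build_a`, and `build_b` are hypothetical names chosen for illustration, and the builders return strings instead of real requests.

```python
import asyncio
import functools

def plain(func):
    # No functools.wraps: the wrapper hides the original name and docstring.
    async def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def wrapped(func):
    # functools.wraps copies __name__, __doc__ and related metadata onto the wrapper.
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def build_a():
    """Build request A."""
    return "A"

@wrapped
def build_b():
    """Build request B."""
    return "B"

print(build_a.__name__, build_a.__doc__)  # wrapper None
print(build_b.__name__, build_b.__doc__)  # build_b Build request B.

# Either way, the decorated function is now a coroutine function.
result = asyncio.run(build_b())
print(result)  # B
```

Both decorators behave the same at call time; the difference only shows up in logs, tracebacks, and tools that inspect the function's metadata.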
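Exception handling
Exception handling can be added inside the decorator, so every decorated function shares the same error handling instead of repeating it.
try:
    # ...
    ret = client.send(req)
    # ...
# The specific exceptions come first: both are subclasses of httpx.RequestError.
except httpx.RemoteProtocolError as err:
    print(f"HTTP request failed: Server protocol error; {err}")
except httpx.ReadTimeout as err:
    print(f"HTTP request failed: Response timeout; {err}")
except httpx.RequestError as err:
    print(f"HTTP request failed: RequestError; {err}")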