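Because of the well-known GIL, Python threads cannot exploit multiple cores for parallel computation (multiprocessing arrived later and does allow multi-process parallelism), which makes them look rather underwhelming. Since only one thread can run Python bytecode at any moment under the GIL, switching between threads is pure overhead for CPU-bound programs. I/O-bound programs, on the other hand, are exactly what coroutines are good at:
Multiple tasks run concurrently (not in parallel), and each task suspends at a suitable point (when it starts I/O) and resumes later (when the I/O completes).
Coroutines in Python have a long history, which falls roughly into three stages:
- the original generator-based form, yield/send
- the introduction of @asyncio.coroutine and yield from
- the async/await keywords introduced in Python 3.5
Starting with yield
First, an ordinary piece of code that computes the Fibonacci sequence: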
Python
def old_fib(n):
    res = [0] * n
    index = 0
    a = 0
    b = 1
    while index < n:
        res[index] = b
        a, b = b, a + b
        index += 1
    return res

print('-'*10 + 'test old fib' + '-'*10)
for fib_res in old_fib(20):
    print(fib_res)
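If all we need is the n-th Fibonacci number, or we only want to produce the sequence one item at a time, this traditional approach wastes memory, because it builds the whole list first.
This is where yield comes in.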
Python
def fib(n):
    index = 0
    a = 0
    b = 1
    while index < n:
        yield b
        a, b = b, a + b
        index += 1

print('-'*10 + 'test yield fib' + '-'*10)
for fib_res in fib(20):
    print(fib_res)
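When a function body contains a yield statement, Python automatically treats the function as a generator function. Calling fib(20) then does not execute the function body; it creates a generator object from it.
Here yield preserves the execution state of fib, pauses the computation, and hands b back to the caller. When fib(20) is used in a for…in loop, the loop obtains the generator object once and calls next() on it for every iteration, which resumes the generator and runs it to the next yield, until StopIteration is raised. The for loop catches that exception and exits.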
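To make the iteration protocol concrete, here is a minimal sketch of roughly what a for loop does with a generator. The helper name manual_for is made up for this illustration, and fib is a compact restatement of the generator above.

```python
def fib(n):
    """Yield the first n Fibonacci numbers (compact version of the generator above)."""
    a, b = 0, 1
    for _ in range(n):
        yield b
        a, b = b, a + b


def manual_for(gen_func, n):
    """Roughly what `for x in gen_func(n)` does under the hood (hypothetical helper)."""
    it = iter(gen_func(n))    # called once; a generator is its own iterator
    out = []
    while True:
        try:
            value = next(it)  # resume the generator until its next yield
        except StopIteration:
            break             # the generator body has finished
        out.append(value)
    return out


print(manual_for(fib, 5))  # [1, 1, 2, 3, 5]
```

Note that the generator object is created a single time; only next() is called repeatedly.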
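Enter send
In the program above, data only flows one way: out of fib(20), through yield, to the surrounding for loop. If we could also send data into fib(20), we would have coroutines in Python.
So generators gained a send method, and the yield expression gained a value.
Let's use this feature to simulate a slow Fibonacci computation: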
Python
import random
import time

def stupid_fib(n):
    index = 0
    a = 0
    b = 1
    while index < n:
        sleep_cnt = yield b  # the value passed to send() becomes sleep_cnt
        print('let me think {0} secs'.format(sleep_cnt))
        time.sleep(sleep_cnt)
        a, b = b, a + b
        index += 1

print('-'*10 + 'test yield send' + '-'*10)
N = 20
sfib = stupid_fib(N)
fib_res = next(sfib)
while True:
    print(fib_res)
    try:
        fib_res = sfib.send(random.uniform(0, 0.5))
    except StopIteration:
        break
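Here next(sfib) is equivalent to sfib.send(None); it runs sfib up to the first yield and returns the yielded value. Each later sfib.send(random.uniform(0, 0.5)) sends a random number of seconds into sfib, where it becomes the value of the yield expression at which the generator is currently suspended. The "main" program can therefore control how long the coroutine thinks while computing Fibonacci numbers, and the coroutine can hand its results back to the "main" program. Perfect!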
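As a second, smaller illustration of two-way communication through send, the sketch below keeps a running average. The function name running_average is invented for this example and is not part of the original code.

```python
def running_average():
    """Generator-based coroutine: receives numbers via send(), yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # pause here; send(x) makes this expression evaluate to x
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)             # prime the generator: run to the first yield (same as avg.send(None))
print(avg.send(10))   # 10.0
print(avg.send(20))   # 15.0
print(avg.send(30))   # 20.0
avg.close()           # raise GeneratorExit inside the generator to finish it
```

The priming call is required because a value can only be sent to a generator that is already suspended at a yield.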
What on earth is yield from?
yield from is used to refactor generators. In its simplest form it can be used like this:
Python
def copy_fib(n):
    print('I am copy from fib')
    yield from fib(n)
    print('Copy end')

print('-'*10 + 'test yield from' + '-'*10)
for fib_res in copy_fib(20):
    print(fib_res)
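This usage is simple, but it is far from everything yield from can do. yield from also acts as a pipe that forwards values passed to send to the inner coroutine, and it takes care of the various exceptional cases. stupid_fib can therefore be wrapped and used like this: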
Python
def copy_stupid_fib(n):
    print('I am copy from stupid fib')
    yield from stupid_fib(n)
    print('Copy end')

print('-'*10 + 'test yield from and send' + '-'*10)
N = 20
csfib = copy_stupid_fib(N)
fib_res = next(csfib)
while True:
    print(fib_res)
    try:
        fib_res = csfib.send(random.uniform(0, 0.5))
    except StopIteration:
        break
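Without yield from, copy_stupid_fib would be considerably more complicated, because it would have to forward sent values and handle the various exceptions itself.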
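One more property of yield from is worth seeing in isolation: the value returned by the inner generator becomes the value of the yield from expression. A minimal sketch, with the illustrative names inner and outer:

```python
def inner():
    received = yield 1           # a value sent by the outermost caller lands here
    yield received * 2
    return 'inner done'          # becomes the value of the `yield from` expression


def outer():
    result = yield from inner()  # forwards next()/send()/throw() to inner()
    yield result


g = outer()
first = next(g)       # 1, produced by inner()
second = g.send(21)   # 42, inner() received 21 through outer()
third = next(g)       # 'inner done', the return value of inner()
print(first, second, third)
```

The caller never touches inner() directly; everything passes through outer().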
asyncio.coroutine and yield from
yield from really came into its own in the asyncio module. Here is the sample code first:
Python
import asyncio
import random

# Python 3.4-era API: @asyncio.coroutine was deprecated in 3.8 and removed in 3.11.
@asyncio.coroutine
def smart_fib(n):
    index = 0
    a = 0
    b = 1
    while index < n:
        sleep_secs = random.uniform(0, 0.2)
        yield from asyncio.sleep(sleep_secs)
        print('Smart one think {} secs to get {}'.format(sleep_secs, b))
        a, b = b, a + b
        index += 1

@asyncio.coroutine
def stupid_fib(n):
    index = 0
    a = 0
    b = 1
    while index < n:
        sleep_secs = random.uniform(0, 0.4)
        yield from asyncio.sleep(sleep_secs)
        print('Stupid one think {} secs to get {}'.format(sleep_secs, b))
        a, b = b, a + b
        index += 1

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    tasks = [
        # Originally asyncio.async(...); `async` became a reserved word in 3.7,
        # and ensure_future() (available since 3.4.4) is its replacement.
        asyncio.ensure_future(smart_fib(10)),
        asyncio.ensure_future(stupid_fib(10)),
    ]
    loop.run_until_complete(asyncio.wait(tasks))
    print('All fib finished.')
    loop.close()
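asyncio is a module that implements asynchronous I/O on top of an event loop. With yield from, the current coroutine is suspended and control passes, through the asyncio.sleep coroutine, to the event loop; the event loop then decides when to wake asyncio.sleep, after which the code continues.
That may sound abstract. Fortunately asyncio is implemented in Python, so let's look at what asyncio.sleep did at the time of writing: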
Python
@coroutine
def sleep(delay, result=None, *, loop=None):
    """Coroutine that completes after a given time (in seconds)."""
    future = futures.Future(loop=loop)
    h = future._loop.call_later(delay,
                                future._set_result_unless_cancelled, result)
    try:
        return (yield from future)
    finally:
        h.cancel()
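First, sleep creates a Future object, which acts as the innermost awaitable and is handed to the event loop through yield from. Second, it registers a callback by calling the event loop's call_later function.
The source of the Future class shows that Future implements __iter__ as a generator method: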
Python
class Future:
    #blabla...
    def __iter__(self):
        if not self.done():
            self._blocking = True
            yield self  # This tells Task to wait for completion.
        assert self.done(), "yield from wasn't used with future"
        return self.result()  # May raise too.
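So when our coroutine executes yield from asyncio.sleep, the event loop is really being connected to the Future object. Each send(None) on the coroutine travels down the yield from chain into the generator created by Future.__iter__; while the Future is not yet done, it executes yield self, which suspends the chain until the next send wakes it.
A Task wraps a coroutine (Task itself is a subclass of Future). When the Task is created it schedules its _step method, which advances the coroutine with next() or send(); when the coroutine yields a Future, _step registers _wakeup as that Future's callback.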
Python
class Task(futures.Future):
    #blabla...
    def _step(self, value=None, exc=None):
        #blabla...
        try:
            if exc is not None:
                result = coro.throw(exc)
            elif value is not None:
                result = coro.send(value)
            else:
                result = next(coro)
        #exception handle
        else:
            if isinstance(result, futures.Future):
                # Yielded Future must come from Future.__iter__().
                if result._blocking:
                    result._blocking = False
                    result.add_done_callback(self._wakeup)
        #blabla...

    def _wakeup(self, future):
        try:
            value = future.result()
        except Exception as exc:
            # This may also be a cancellation.
            self._step(None, exc)
        else:
            self._step(value, None)
        self = None  # Needed to break cycles when an exception occurs.
Once the preset delay has elapsed, the event loop calls Future._set_result_unless_cancelled:
Python
class Future:
    #blabla...
    def _set_result_unless_cancelled(self, result):
        """Helper setting the result only if the future was not cancelled."""
        if self.cancelled():
            return
        self.set_result(result)

    def set_result(self, result):
        """Mark the future done and set its result.

        If the future is already done when this method is called, raises
        InvalidStateError.
        """
        if self._state != _PENDING:
            raise InvalidStateError('{}: {!r}'.format(self._state, self))
        self._result = result
        self._state = _FINISHED
        self._schedule_callbacks()
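This changes the state of the Future and invokes the callback registered earlier, Task._wakeup. _wakeup calls Task._step again; by now the Future is marked done, so Future.__iter__ does not yield self again, and its return statement raises StopIteration carrying the result. The enclosing yield from consumes that exception and uses its value as the result of the yield from expression, so the whole yield from chain wakes up and the coroutine continues. When the outermost coroutine itself returns, the resulting StopIteration is caught by Task._step and used to set the Task's result.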
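The interplay of Future, Task and the event loop can be reproduced in a few dozen lines. The following is a toy sketch only: ToyFuture, ToyLoop, ToyTask, toy_sleep and worker are invented names that mimic the shape of the asyncio excerpts above, and they leave out cancellation, exceptions and I/O entirely.

```python
import heapq
import time


class ToyFuture:
    """Stripped-down stand-in for asyncio's Future (illustration only)."""
    def __init__(self):
        self.done = False
        self.result = None
        self.callbacks = []

    def set_result(self, result):
        self.done = True
        self.result = result
        for callback in self.callbacks:
            callback(self)            # wake up whoever is waiting

    def __iter__(self):
        if not self.done:
            yield self                # tell the task to wait for this future
        return self.result            # becomes the value of `yield from future`


class ToyLoop:
    """Event loop that only knows about timers."""
    def __init__(self):
        self.timers = []              # heap of (deadline, sequence number, callback)
        self.seq = 0

    def call_later(self, delay, callback):
        heapq.heappush(self.timers, (time.monotonic() + delay, self.seq, callback))
        self.seq += 1

    def run(self):
        while self.timers:
            deadline, _, callback = heapq.heappop(self.timers)
            time.sleep(max(0.0, deadline - time.monotonic()))
            callback()


class ToyTask:
    """Drives a generator-based coroutine, like Task._step/_wakeup."""
    def __init__(self, coro, loop):
        self.coro = coro
        self.result = None
        loop.call_later(0, self.step)             # schedule the first step

    def step(self, future=None):
        try:
            yielded = self.coro.send(None)        # run to the next `yield self`
        except StopIteration as exc:
            self.result = exc.value               # the coroutine returned
        else:
            yielded.callbacks.append(self.step)   # resume when the future is done


def toy_sleep(loop, delay, result=None):
    future = ToyFuture()
    loop.call_later(delay, lambda: future.set_result(result))
    return (yield from future)


def worker(loop, name, delay, log):
    value = yield from toy_sleep(loop, delay, name)
    log.append(value)
    return value


loop = ToyLoop()
log = []
slow = ToyTask(worker(loop, 'slow', 0.02, log), loop)
fast = ToyTask(worker(loop, 'fast', 0.01, log), loop)
loop.run()
print(log)  # ['fast', 'slow']
```

Both workers start before either timer fires, which is why the one with the shorter delay finishes first even though it was created second.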
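async and await
Once asyncio.coroutine and yield from are clear, the async and await keywords introduced in Python 3.5 are easy to understand: think of them as drop-in replacements for asyncio.coroutine/yield from. From a language-design point of view, async/await also lets coroutines appear to exist independently of generators, hides the details inside the asyncio module, and gives a cleaner syntax.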
Python
import asyncio
import random

async def smart_fib(n):
    index = 0
    a = 0
    b = 1
    while index < n:
        sleep_secs = random.uniform(0, 0.2)
        await asyncio.sleep(sleep_secs)
        print('Smart one think {} secs to get {}'.format(sleep_secs, b))
        a, b = b, a + b
        index += 1

async def stupid_fib(n):
    index = 0
    a = 0
    b = 1
    while index < n:
        sleep_secs = random.uniform(0, 0.4)
        await asyncio.sleep(sleep_secs)
        print('Stupid one think {} secs to get {}'.format(sleep_secs, b))
        a, b = b, a + b
        index += 1

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    tasks = [
        asyncio.ensure_future(smart_fib(10)),
        asyncio.ensure_future(stupid_fib(10)),
    ]
    loop.run_until_complete(asyncio.wait(tasks))
    print('All fib finished.')
    loop.close()
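On Python 3.7 and later the same idea is usually written with asyncio.run and asyncio.gather, which manage the event loop for you. The sketch below is one possible modern spelling, not the original author's code; think_fib is an illustrative name, and it returns its results instead of printing them.

```python
import asyncio
import random


async def think_fib(name, n, max_delay):
    """Compute n Fibonacci numbers, pausing (without blocking) before each one."""
    results = []
    a, b = 0, 1
    for _ in range(n):
        await asyncio.sleep(random.uniform(0, max_delay))
        results.append(b)
        a, b = b, a + b
    return name, results


async def main():
    # gather() runs both coroutines concurrently and returns
    # their results in argument order.
    return await asyncio.gather(
        think_fib('smart', 5, 0.02),
        think_fib('stupid', 5, 0.04),
    )


if __name__ == '__main__':
    for name, results in asyncio.run(main()):  # asyncio.run needs Python 3.7+
        print(name, results)
```

asyncio.run creates a fresh event loop, runs main() to completion and closes the loop, so the explicit get_event_loop/close calls are no longer needed.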
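Summary
That concludes this introduction to coroutines in Python. The examples all use sleep as a stand-in for asynchronous I/O. In a real project, coroutines can be used to read and write the network, read and write files, render a UI and so on asynchronously, and while one coroutine is waiting the CPU is free to do other work. That is exactly what coroutines are for.
The code for this article is available on GitHub: https://github.com/yubo1911/saber/tree/master/coroutine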
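This article covers how to use async/await, which appeared in Python 3.5, and some of the purposes behind them. Corrections are welcome if anything is wrong.
Yesterday I watched a 2016 talk by David Beazley, Fear and Awaiting in Async, which gave me a lot of insight and inspiration. I wanted to sort out my own thinking, hence this article.
Python 3.5 introduced the coroutine syntax async and await; for the concept of coroutines, see what I covered in my previous article.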
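Here are the common forms a function can take in Python:
1. Ordinary function
def function():
    return 1
2. Generator function
def generator():
    yield 1
From Python 3.5 on, defining a function with async def turns it into an asynchronous function (a coroutine function); from Python 3.6 on, an async def function that contains yield is an asynchronous generator.
3. Asynchronous function (coroutine)
async def async_function():
    return 1
4. Asynchronous generator
async def async_generator():
    yield 1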
A type check confirms what each of them is:
import types
print(type(function) is types.FunctionType)
print(type(generator()) is types.GeneratorType)
print(type(async_function()) is types.CoroutineType)
print(type(async_generator()) is types.AsyncGeneratorType)
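Calling an asynchronous function directly does not return its result; it returns a coroutine object:
print(async_function())
# <coroutine object async_function at 0x102ff67d8>
A coroutine has to be driven from outside, so we can use the coroutine object's send method to send it a value:
print(async_function().send(None))
Unfortunately, the call above raises an exception:
StopIteration: 1
This is because a generator or coroutine raises StopIteration when it returns normally, and the return value is stored in the value attribute of the StopIteration object. Catching the exception gives us the coroutine's real return value:
try:
    async_function().send(None)
except StopIteration as e:
    print(e.value)
# 1
Building on that, we can write a run function to drive a coroutine:
def run(coroutine):
    try:
        coroutine.send(None)
    except StopIteration as e:
        return e.value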
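Inside a coroutine function, the await syntax suspends the current coroutine and waits for another coroutine to finish and return its result:
async def async_function():
    return 1

async def await_coroutine():
    result = await async_function()
    print(result)

run(await_coroutine())
# 1
Note that await may only appear inside a function defined with async; anywhere else it is a SyntaxError.
In addition, the object after await must be an Awaitable, that is, it must implement the corresponding protocol.
The code of the Awaitable abstract class shows that any class implementing an __await__ method produces instances that are Awaitable:
class Awaitable(metaclass=ABCMeta):
    __slots__ = ()

    @abstractmethod
    def __await__(self):
        yield

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Awaitable:
            return _check_methods(C, "__await__")
        return NotImplemented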
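To see the __await__ protocol at work without any library, here is a small sketch. YieldOnce, add and drive are names made up for this example; drive plays the role of a very naive scheduler that simply resumes the coroutine until it finishes.

```python
from collections.abc import Awaitable


class YieldOnce:
    """Hypothetical awaitable: suspends exactly once, then produces `value`."""
    def __init__(self, value):
        self.value = value

    def __await__(self):
        yield 'paused'      # suspend; whoever drives the coroutine sees 'paused'
        return self.value   # becomes the value of the `await` expression


async def add(x, y):
    a = await YieldOnce(x)
    b = await YieldOnce(y)
    return a + b


def drive(coro):
    """Keep resuming the coroutine until it finishes; collect what it yielded."""
    seen = []
    while True:
        try:
            seen.append(coro.send(None))
        except StopIteration as exc:
            return exc.value, seen


print(isinstance(YieldOnce(0), Awaitable))  # True
print(drive(add(1, 2)))                     # (3, ['paused', 'paused'])
```

A real event loop does the same thing as drive, except that it expects the yielded objects to be futures it can wait on rather than plain strings.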
We can also see that the Coroutine class inherits from Awaitable and additionally provides the send, throw and close methods. Awaiting the coroutine object returned by calling an asynchronous function is therefore legal.
class Coroutine(Awaitable):
    __slots__ = ()

    @abstractmethod
    def send(self, value):
        ...

    @abstractmethod
    def throw(self, typ, val=None, tb=None):
        ...

    def close(self):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Coroutine:
            return _check_methods(C, '__await__', 'send', 'throw', 'close')
        return NotImplemented
Next come asynchronous generators. Let's look at an example.
Suppose I go to a supermarket to buy potatoes, and the number of potatoes on the shelf is limited:
class Potato:
    @classmethod
    def make(cls, num, *args, **kws):
        potatos = []
        for i in range(num):
            potatos.append(cls.__new__(cls, *args, **kws))
        return potatos

all_potatos = Potato.make(5)
Now I want to buy 50 potatoes, taking one at a time from the shelf and putting it in my basket:
from time import sleep

def take_potatos(num):
    count = 0
    while True:
        if len(all_potatos) == 0:
            sleep(.1)
        else:
            potato = all_potatos.pop()
            yield potato
            count += 1
            if count == num:
                break

def buy_potatos():
    bucket = []
    for p in take_potatos(50):
        bucket.append(p)
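In code, this is the model of iterating over a generator. Clearly, when the shelf runs out of potatoes all we can do is wait, and in the example above no amount of waiting will ever produce a result (everything is synchronous, so nobody can restock the shelf while we sleep). Multiple processes or threads could work around this, but real life looks more like the following:
async def take_potatos(num):
    count = 0
    while True:
        if len(all_potatos) == 0:
            await ask_for_potato()
        potato = all_potatos.pop()
        yield potato
        count += 1
        if count == num:
            break
When the shelf is empty, I can ask the supermarket for more potatoes and then wait a while until the producer has finished producing them:
import asyncio
import random

async def ask_for_potato():
    await asyncio.sleep(random.random())
    all_potatos.extend(Potato.make(random.randint(1, 10)))
Once the producer has finished and returned, execution continues from the point where await suspended it, and consumption completes. The whole process is the iteration of an asynchronous generator:
async def buy_potatos():
    bucket = []
    async for p in take_potatos(50):
        bucket.append(p)
        print(f'Got potato {id(p)}...')
The async for syntax indicates that what we iterate over is an asynchronous iterator, here an asynchronous generator.
def main():
    import asyncio
    loop = asyncio.get_event_loop()
    res = loop.run_until_complete(buy_potatos())
    loop.close()
Running this code with asyncio gives output like this: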
Got potato 4338641384...
Got potato 4338641160...
Got potato 4338614736...
Got potato 4338614680...
Got potato 4338614568...
Got potato 4344861864...
Got potato 4344843456...
Got potato 4344843400...
Got potato 4338641384...
Got potato 4338641160...
...
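Since this is asynchronous, we do not have to wait idly after making a request; we can do other things in the meantime. Say that besides potatoes I also want tomatoes: just add another coroutine (buy_tomatos, written the same way as buy_potatos) to the event loop:
def main():
    import asyncio
    loop = asyncio.get_event_loop()
    # asyncio.wait() no longer accepts bare coroutines (removed in Python 3.11),
    # so each one is wrapped in a Task first.
    tasks = [loop.create_task(buy_potatos()), loop.create_task(buy_tomatos())]
    res = loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
Running the code again: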
Got potato 4423119312...
Got tomato 4423119368...
Got potato 4429291024...
Got potato 4421640768...
Got tomato 4429331704...
Got tomato 4429331760...
Got tomato 4423119368...
Got potato 4429331760...
Got potato 4429331704...
Got potato 4429346688...
Got potato 4429346072...
Got tomato 4429347360...
...
Now look at the definition of AsyncGenerator. It needs the two core methods __aiter__ and __anext__, plus asend, athrow and aclose.
class AsyncGenerator(AsyncIterator):
    __slots__ = ()

    async def __anext__(self):
        ...

    @abstractmethod
    async def asend(self, value):
        ...

    @abstractmethod
    async def athrow(self, typ, val=None, tb=None):
        ...

    async def aclose(self):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        if cls is AsyncGenerator:
            return _check_methods(C, '__aiter__', '__anext__',
                                  'asend', 'athrow', 'aclose')
        return NotImplemented
Asynchronous generators are available from Python 3.6 on, as are asynchronous comprehensions, so the loop in the example above could also be written as:
bucket = [p async for p in take_potatos(50)]
Similarly, await expressions can be used inside comprehensions:
result = [await fun() for fun in funcs if await condition()]
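Both forms can be tried in a few lines. This is a self-contained sketch with made-up helpers (ticker and double); it needs Python 3.7+ because of asyncio.run.

```python
import asyncio


async def ticker(n):
    """Async generator: yields 0..n-1, awaiting between items."""
    for i in range(n):
        await asyncio.sleep(0)  # give other tasks a chance to run
        yield i


async def double(x):
    await asyncio.sleep(0)
    return x * 2


async def main():
    evens = [i async for i in ticker(6) if i % 2 == 0]  # asynchronous comprehension
    doubled = [await double(i) for i in evens]          # await inside a comprehension
    return evens, doubled


print(asyncio.run(main()))  # ([0, 2, 4], [0, 4, 8])
```

Like await itself, both kinds of comprehension are only valid inside an async def function.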
Besides plain functions, ordinary instance methods of a class can also be defined with async:
class ThreeTwoOne:
    async def begin(self):
        print(3)
        await asyncio.sleep(1)
        print(2)
        await asyncio.sleep(1)
        print(1)
        await asyncio.sleep(1)
        return

async def game():
    t = ThreeTwoOne()
    await t.begin()
    print('start')
Calling the instance method likewise returns a coroutine:
function = ThreeTwoOne.begin
method = function.__get__(ThreeTwoOne(), ThreeTwoOne)  # __get__(instance, owner)
import inspect
assert inspect.isfunction(function)
assert inspect.ismethod(method)
assert inspect.iscoroutine(method())
The same goes for class methods:
class ThreeTwoOne:
    @classmethod
    async def begin(cls):
        print(3)
        await asyncio.sleep(1)
        print(2)
        await asyncio.sleep(1)
        print(1)
        await asyncio.sleep(1)
        return

async def game():
    await ThreeTwoOne.begin()
    print('start')
According to PEP 492, async can also be applied to context managers; __aenter__ and __aexit__ must return an Awaitable:
class GameContext:
    async def __aenter__(self):
        print('game loading...')
        await asyncio.sleep(1)

    async def __aexit__(self, exc_type, exc, tb):
        print('game exit...')
        await asyncio.sleep(1)

async def game():
    async with GameContext():
        print('game start...')
        await asyncio.sleep(2)
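Python 3.7 added an asynccontextmanager decorator to contextlib, which turns an asynchronous generator function into a context manager that implements the asynchronous protocol:
from contextlib import asynccontextmanager

@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()  # placeholder function, not defined here
    try:
        yield conn  # the yielded value is bound by `async with ... as conn`
    finally:
        await release_db_connection(conn)  # placeholder function, not defined here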
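The snippet above relies on two undefined helpers. A runnable sketch needs stand-ins for them, so the versions of acquire_db_connection and release_db_connection below are fakes written purely for illustration (a dict plays the role of the connection, and the events list records the order of operations).

```python
import asyncio
from contextlib import asynccontextmanager

events = []


async def acquire_db_connection():
    """Fake stand-in for opening a connection (assumption for this sketch)."""
    await asyncio.sleep(0)
    events.append('acquire')
    return {'open': True}


async def release_db_connection(conn):
    """Fake stand-in for closing a connection (assumption for this sketch)."""
    await asyncio.sleep(0)
    conn['open'] = False
    events.append('release')


@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()
    try:
        yield conn
    finally:
        await release_db_connection(conn)


async def main():
    async with get_connection() as conn:
        events.append('use')
        assert conn['open']
    return conn


conn = asyncio.run(main())
print(events, conn)  # ['acquire', 'use', 'release'] {'open': False}
```

The finally clause guarantees that the release step runs even if the body of the async with block raises.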
The async modifier can also be used on the __call__ method:
import asyncio
from time import time

class GameContext:
    async def __aenter__(self):
        self._started = time()
        print('game loading...')
        await asyncio.sleep(1)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        print('game exit...')
        await asyncio.sleep(1)

    async def __call__(self, *args, **kws):
        if args[0] == 'time':
            return time() - self._started

async def game():
    async with GameContext() as ctx:
        print('game start...')
        await asyncio.sleep(2)
        print('game time: ', await ctx('time'))
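await and yield from
The yield from syntax of Python 3.3 delegates a generator's operations to another generator, so the caller of the generator can communicate directly with the subgenerator:
def sub_gen():
    yield 1
    yield 2
    yield 3

def gen():
    return (yield from sub_gen())

def main():
    for val in gen():
        print(val)
# 1
# 2
# 3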
Thanks to this property, yield from can be used to write function calls that behave like coroutines. Before 3.5, asyncio created coroutines with exactly this combination of @asyncio.coroutine and yield from:
# https://docs.python.org/3.4/library/asyncio-task.html
import asyncio

@asyncio.coroutine
def compute(x, y):
    print("Compute %s + %s ..." % (x, y))
    yield from asyncio.sleep(1.0)
    return x + y

@asyncio.coroutine
def print_sum(x, y):
    result = yield from compute(x, y)
    print("%s + %s = %s" % (x, y, result))

loop = asyncio.get_event_loop()
loop.run_until_complete(print_sum(1, 2))
loop.close()
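However, yield from makes it easy to confuse coroutines with generators and does not express intent well, so Python 3.5 introduced the newer async/await syntax for coroutines.
As a result, the following pairs are equivalent (the first form belongs in a native coroutine, the second in an old generator-based one):
async with lock:
    ...

with (yield from lock):  # pre-3.5 idiom; asyncio dropped support for it in Python 3.9
    ...
######################
def main():
    return (yield from coro())

async def main():
    return (await coro())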
So how do we turn a generator into something a coroutine can await? This is what the coroutine decorator in the types module is for (if asyncio is the driver, asyncio's own coroutine decorator also worked at the time). @types.coroutine marks a generator function as a generator-based coroutine function, so the objects it returns can be awaited:
import asyncio
import types

@types.coroutine
def compute(x, y):
    print("Compute %s + %s ..." % (x, y))
    yield from asyncio.sleep(1.0)
    return x + y

async def print_sum(x, y):
    result = await compute(x, y)
    print("%s + %s = %s" % (x, y, result))

loop = asyncio.get_event_loop()
loop.run_until_complete(print_sum(1, 2))
loop.close()
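Although the two functions use the new and the old syntax respectively, both produce coroutine objects, called a native coroutine and a generator-based coroutine, so there is no need to worry about mixing the syntaxes.
Next, consider an example with asyncio's Future:
import asyncio

future = asyncio.Future()

async def coro1():
    await asyncio.sleep(1)
    future.set_result('data')

async def coro2():
    print(await future)

loop = asyncio.get_event_loop()
# Wrapped in Tasks because asyncio.wait() stopped accepting bare coroutines in Python 3.11.
loop.run_until_complete(asyncio.wait([
    loop.create_task(coro1()),
    loop.create_task(coro2())
]))
loop.close()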
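Both coroutines run in the event loop. coro1 suspends itself at its first statement and switches to asyncio.sleep, while coro2 waits for the result of future and yields control to the event loop. When the timer expires, coro1 runs its second statement and sets the value of future; the suspended coro2 then resumes and prints the result of future, 'data'.
That future can be awaited shows that a future object is an Awaitable. In the source of the Future class there is a section showing that future implements the __await__ protocol:
class Future:
    ...
    def __iter__(self):
        if not self.done():
            self._asyncio_future_blocking = True
            yield self  # This tells Task to wait for completion.
        assert self.done(), "yield from wasn't used with future"
        return self.result()  # May raise too.

    if compat.PY35:
        __await__ = __iter__  # make compatible with 'await' expression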
When the line await future executes, this code in future runs. The future first checks whether it is already done; if not, it suspends and tells the current Task to wait for the future to complete.
When set_result is called on the future, the following code runs, storing the result and marking the future as done:
def set_result(self, result):
    ...
    if self._state != _PENDING:
        raise InvalidStateError('{}: {!r}'.format(self._state, self))
    self._result = result
    self._state = _FINISHED
    self._schedule_callbacks()
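Finally the future schedules its own callbacks, which triggers Task._step() so that the Task resumes the coroutine from the point where it was suspended. It is easy to see that the following part of Future.__iter__ then runs:
class Future:
    ...
    def __iter__(self):
        ...
        assert self.done(), "yield from wasn't used with future"
        return self.result()  # May raise too.
and the result is finally returned to the caller.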
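After so many asyncio examples, are there no coroutine libraries other than asyncio? As part of the standard library, asyncio naturally gets a lot of attention, but it can feel heavyweight, especially because it ships many complicated building blocks and protocols that are not easy to use.
Think of it this way: asyncio is a coroutine library built on the async/await syntax; async/await does not require asyncio. Besides asyncio, curio and trio are more lightweight alternatives that are also easier to use.
curio was written by David Beazley. Below is an example of a TCP server built with curio; this is said to be what dabeaz thinks an asynchronous server should look like: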
from curio import run, spawn
from curio.socket import *

async def echo_server(address):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(5)
    print('Server listening at', address)
    async with sock:
        while True:
            client, addr = await sock.accept()
            await spawn(echo_client, client, addr)

async def echo_client(client, addr):
    print('Connection from', addr)
    async with client:
        while True:
            data = await client.recv(100000)
            if not data:
                break
            await client.sendall(data)
    print('Connection closed')

if __name__ == '__main__':
    run(echo_server, ('',25000))
Whether it is asyncio, curio or some other asynchronous coroutine library, an I/O event loop is usually what makes the asynchrony work behind the scenes. The few dozen lines below show a crude event-driven echo server:
from socket import socket, AF_INET, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR
from selectors import DefaultSelector, EVENT_READ

selector = DefaultSelector()
pool = {}

def request(client_socket, addr):
    # Returns a callback that closes over this client's socket and address.
    def handle_request(key, mask):
        data = client_socket.recv(100000)
        if not data:
            # Unregister before closing so the selector still sees a valid socket.
            selector.unregister(client_socket)
            client_socket.close()
            del pool[addr]
        else:
            client_socket.sendall(data)
    return handle_request

def recv_client(key, mask):
    sock = key.fileobj
    client_socket, addr = sock.accept()
    req = request(client_socket, addr)
    pool[addr] = req
    selector.register(client_socket, EVENT_READ, req)

def echo_server(address):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(5)
    selector.register(sock, EVENT_READ, recv_client)
    try:
        while True:
            events = selector.select()
            for key, mask in events:
                callback = key.data
                callback(key, mask)
    except KeyboardInterrupt:
        sock.close()

if __name__ == '__main__':
    echo_server(('',25000))
Let's verify it:
# terminal 1
$ nc localhost 25000
hello world
hello world
# terminal 2
$ nc localhost 25000
hello world
hello world
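We now know that asynchronous code does not have to use async/await, and that code using async/await is not necessarily asynchronous. async/await is syntactic sugar for coroutines that makes calls between coroutines clearer. Calling a function defined with async returns a coroutine object; await may only be used inside a function defined with async and must be followed by a coroutine object or another Awaitable. The purpose of await is to wait for control to return from the awaited coroutine, while the operation that actually pauses and suspends a function is yield.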
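The claim that async/await alone does not make code asynchronous is easy to check. In the sketch below (function names are illustrative), three jobs that call the blocking time.sleep run one after another even under asyncio.gather, whereas three jobs that await asyncio.sleep overlap.

```python
import asyncio
import time


async def blocking_job():
    time.sleep(0.05)           # blocks the whole event loop; nothing else can run


async def cooperative_job():
    await asyncio.sleep(0.05)  # suspends this coroutine and yields to the loop


async def timed(job):
    """Run three copies of `job` concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(job(), job(), job())
    return time.perf_counter() - start


blocking = asyncio.run(timed(blocking_job))
cooperative = asyncio.run(timed(cooperative_job))
print(f'blocking: {blocking:.2f}s, cooperative: {cooperative:.2f}s')
```

On a typical machine the first figure is about three times the second, since the blocking jobs cannot overlap.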
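In my opinion, async/await and coroutines are where asynchronous programming in Python is heading, and we will see them in more and more places: coroutine libraries such as curio and trio, the web framework sanic, the database driver asyncpg, and so on. Now that Python 3 dominates, developers should embrace and adapt to these changes promptly. Coroutines based on async/await are steadily taking the stage thanks to their readability and ease of use, so if you have read this far, why not get on board?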
---------------------------------------------------------
from time import sleep, time


def demo1():
    """
    Suppose we have three washing machines and three loads of laundry,
    one load for each machine.
    """

    def washing1():
        sleep(3)  # the first machine needs 3 seconds to finish (just an analogy)
        print('washer1 finished')  # the machine beeps to tell us it is done

    def washing2():
        sleep(2)
        print('washer2 finished')

    def washing3():
        sleep(5)
        print('washer3 finished')

    washing1()
    washing2()
    washing3()

    """
    This is easy to follow: running demo1() takes 10 seconds to wash everything.
    Most of that time is spent waiting on the machines one after another.
    """
def demo2():
    """
    Now we want to avoid pointless waiting, so to be more efficient we will use async.
    washing1/2/3() used to be "ordinary functions"; with async we upgrade them to
    "asynchronous functions".

    Note: an asynchronous function has a more standard name, "coroutine".
    """

    async def washing1():
        sleep(3)
        print('washer1 finished')

    async def washing2():
        sleep(2)
        print('washer2 finished')

    async def washing3():
        sleep(5)
        print('washer3 finished')

    washing1()
    washing2()
    washing3()

    """
    Intuitively, we now have asynchronous functions but forgot to say when we may
    "leave" one washing machine to go and check another. In fact it is worse than
    that: calling an async function only creates a coroutine object, and its body
    does not run until something drives it. So in demo2() no laundry gets washed at
    all; the three calls return immediately.

    PS: demo2() does run, but Python warns you:
        RuntimeWarning: coroutine 'demo2.<locals>.washing1' was never awaited
        RuntimeWarning: coroutine 'demo2.<locals>.washing2' was never awaited
        RuntimeWarning: coroutine 'demo2.<locals>.washing3' was never awaited
    """
def demo3():
    """
    Having learned from last time, we tell ourselves that washing is "awaitable":
    once a machine has started, we can go and deal with the other machines.
    """

    async def washing1():
        await sleep(3)  # note the added await
        print('washer1 finished')

    async def washing2():
        await sleep(2)
        print('washer2 finished')

    async def washing3():
        await sleep(5)
        print('washer3 finished')

    washing1()
    washing2()
    washing3()

    """
    Run it and we get the same warnings as in demo2. Here is why; demo4 gives the
    final answer:
        1. await must be followed by an awaitable, i.e. an object that has an
           __await__ method. Believing that sleep() "can be waited for" does not
           make it awaitable. Common awaitables are:
               await asyncio.sleep(3)  # asyncio.sleep() differs from time.sleep():
               # the former only suspends the coroutine, the latter blocks the thread
               await an_async_function()  # a call to an async function is awaitable
           The following are not awaitable:
               await time.sleep(3)  # blocks, then returns None
               x = await 'hello'  # TypeError: str does not define __await__
               x = await 3 + 2  # parsed as (await 3) + 2; int does not define __await__
               x = await None  # ...
               x = await a_sync_function()  # an ordinary function's result is not awaitable
        2. To execute asynchronous functions, we cannot simply call them like this:
               washing1()
               washing2()
               washing3()
           They have to be started through asyncio's event loop (see demo4).
    """
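def demo4():
    """
    This is the implementation we actually want.
    """
    import asyncio  # bring in the asyncio library

    async def washing1():
        await asyncio.sleep(3)  # asyncio.sleep() returns an awaitable object
        print('washer1 finished')

    async def washing2():
        await asyncio.sleep(2)
        print('washer2 finished')

    async def washing3():
        await asyncio.sleep(5)
        print('washer3 finished')

    """
    Working with the event loop takes the following steps:
        1. Create an event loop
        2. Add the asynchronous functions to the event queue
        3. Run the event queue until the last event has been handled
        4. Finally, call close() on the loop to clean it up and prevent misuse
    """

    # 1. Create an event loop
    loop = asyncio.get_event_loop()

    # 2. Add the asynchronous functions to the event queue
    #    (create_task() wraps each coroutine in a Task; asyncio.wait() stopped
    #    accepting bare coroutines in Python 3.11)
    tasks = [
        loop.create_task(washing1()),
        loop.create_task(washing2()),
        loop.create_task(washing3()),
    ]

    # 3. Run the event queue until the last event has been handled
    loop.run_until_complete(asyncio.wait(tasks))

    """
    PS: to "wash several times", build a fresh list of tasks for every round.
    A coroutine object can only be run once, so calling
        loop.run_until_complete(asyncio.wait(tasks))
    again with the same, already finished tasks returns immediately without
    redoing the laundry.
    """

    # 4. If the loop is no longer needed, make a habit of closing it
    #    (much like calling close() after reading or writing a file)
    loop.close()

    """
    Final output:
        washer2 finished
        washer1 finished
        washer3 finished
        elapsed time = 5.126561641693115
        (the event loop adds a little overhead; no threads are involved)

    As an aside, some authors put bare coroutines into the list instead:
        tasks = [
            washing1(),
            washing2(),
            washing3(),
        ]
    That behaved the same on older Python versions, where asyncio.wait() wrapped
    the coroutines in Tasks itself; it was deprecated in 3.8 and removed in 3.11.
    """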
if __name__ == '__main__':
    # time the run to check that it really got shorter
    start = time()

    # demo1()  # takes 10 seconds
    # demo2()  # warns: RuntimeWarning: coroutine ... was never awaited
    # demo3()  # warns: RuntimeWarning: coroutine ... was never awaited
    demo4()  # takes a little over 5 seconds

    end = time()
    print('elapsed time = ' + str(end - start))
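For completeness, here is a sketch of the same washing-machine idea on Python 3.7+, using asyncio.run and asyncio.create_task. It also shows what asyncio.wait returns: two sets of Tasks, the finished and the unfinished ones. The wash helper and the shortened durations are assumptions chosen so the example finishes quickly.

```python
import asyncio


async def wash(name, seconds):
    await asyncio.sleep(seconds)
    return name


async def main():
    # create_task() wraps each coroutine in a Task and schedules it right away.
    tasks = [
        asyncio.create_task(wash('washer1', 0.02)),
        asyncio.create_task(wash('washer2', 0.01)),
        asyncio.create_task(wash('washer3', 0.3)),
    ]
    # wait() returns two sets of Tasks: those that finished and those that did not.
    done, pending = await asyncio.wait(tasks, timeout=0.15)
    finished = sorted(task.result() for task in done)
    for task in pending:
        task.cancel()  # give up on the slowest washer in this sketch
    return finished, len(pending)


print(asyncio.run(main()))  # (['washer1', 'washer2'], 1)
```

When every result is needed and no timeout is involved, asyncio.gather is usually the simpler choice because it returns the results directly, in argument order.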