Python multiprocessing: working with multiple processes

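This article takes a close look at Python's multiprocessing module. It explains in detail how map_async and imap differ when processing large amounts of data: how they consume the iterable, how they return results, and when each one is the better choice.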


[Placeholder, to be filled in later] Python multiprocessing

  • Basic concepts of multiprocessing

  • Using multiprocessing

  • multiprocessing Pool

  • Pickling errors

  • Instance methods of a class cannot be pickled (Python2.9, Python2.7.15)

[Updated 2018.11.12] This link is simple and easy to follow, and is worth consulting:

https://morvanzhou.github.io/tutorials/python-basic/multiprocessing/

This answer is excellent:
https://stackoverflow.com/questions/26520781/multiprocessing-pool-whats-the-difference-between-map-async-and-imap

[Q]:

I’m trying to learn how to use Python’s multiprocessing package, but I don’t understand the difference between map_async and imap. I noticed that both map_async and imap are executed asynchronously. So when should I use one over the other? And how should I retrieve the result returned by map_async?

Should I use something like this?

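import multiprocessing

def square(x):
    # Placeholder worker; the original question did not show one.
    return x * x

def test():
    pool = multiprocessing.Pool()
    result = pool.map_async(square, range(5))  # returns an AsyncResult right away
    pool.close()
    pool.join()
    return result.get()

if __name__ == "__main__":
    result = test()
    for i in result:
        print(i)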
[A]:

There are two key differences between imap/imap_unordered and map/map_async:
- The way they consume the iterable you pass to them.
- The way they return the result back to you.

  • map consumes your iterable by converting the iterable to a list (assuming it isn’t a list already), breaking it into chunks, and sending those chunks to the worker processes in the Pool. Breaking the iterable into chunks performs better than passing each item in the iterable between processes one item at a time - particularly if the iterable is large. However, turning the iterable into a list in order to chunk it can have a very high memory cost, since the entire list will need to be kept in memory.

  • imap doesn’t turn the iterable you give it into a list, nor does it break it into chunks (by default). It iterates over the iterable one element at a time and sends each element to a worker process. This means you don’t take the memory hit of converting the whole iterable to a list, but it also means performance is slower for large iterables, because of the lack of chunking. This can be mitigated, however, by passing a chunksize argument larger than the default of 1.

  • The other major difference between imap/imap_unordered and map/map_async is that with imap/imap_unordered, you can start receiving results from workers as soon as they’re ready, rather than having to wait for all of them to finish. With map_async, an AsyncResult is returned right away, but you can’t actually retrieve results from that object until all of them have been processed, at which point it returns the same list that map does (map is actually implemented internally as map_async(...).get()). There’s no way to get partial results; you either have the entire result, or nothing.

  • imap and imap_unordered both return iterables right away. With imap, the results will be yielded from the iterable as soon as they’re ready, while still preserving the ordering of the input iterable. With imap_unordered, results will be yielded as soon as they’re ready, regardless of the order of the input iterable.
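
The chunking trade-off described above can be sketched roughly as follows. This is a minimal illustration rather than a benchmark, and it assumes Python 3 (where Pool can be used as a context manager). The names square, numbers and lazy_squares are hypothetical, and the chunksize of 100 is an arbitrary choice.

```python
import multiprocessing


def square(x):
    # Hypothetical worker; any picklable top-level function would do.
    return x * x


def numbers(n):
    # A generator input: imap iterates over it directly instead of
    # first converting the whole input to a list, as map does.
    for i in range(n):
        yield i


def lazy_squares(n, chunksize=100):
    # A chunksize larger than the default of 1 sends items to the
    # workers in batches, reducing inter-process overhead.
    with multiprocessing.Pool(2) as pool:
        # imap preserves input order, so the list matches the input order.
        return list(pool.imap(square, numbers(n), chunksize=chunksize))


if __name__ == "__main__":
    print(lazy_squares(1000)[:5])
```

A suitable chunksize depends on how expensive each call is relative to the cost of passing data between processes, so it is usually worth measuring a few values.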

So, say you have this:

import multiprocessing
import time

def func(x):
    time.sleep(x)
    return x + 2

if __name__ == "__main__":    
    p = multiprocessing.Pool()
    start = time.time()
    for x in p.imap(func, [1,5,3]):
        print("{} (Time elapsed: {}s)".format(x, int(time.time() - start)))
This will output:

3 (Time elapsed: 1s)
7 (Time elapsed: 5s)
5 (Time elapsed: 5s)
If you use p.imap_unordered instead of p.imap, you'll see:

3 (Time elapsed: 1s)
5 (Time elapsed: 3s)
7 (Time elapsed: 5s)
If you use p.map or p.map_async().get(), you'll see:

3 (Time elapsed: 5s)
7 (Time elapsed: 5s)
5 (Time elapsed: 5s)
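
As for how to retrieve the result of map_async, one possible pattern is sketched below, assuming Python 3. The worker slow_add_two and the wrapper collect_with_map_async are hypothetical names that mirror func above with shorter sleeps; the 30-second timeout is an arbitrary assumption.

```python
import multiprocessing
import time


def slow_add_two(x):
    # Hypothetical worker, similar to func above but with shorter sleeps.
    time.sleep(0.1 * x)
    return x + 2


def collect_with_map_async(values):
    with multiprocessing.Pool(3) as pool:
        # map_async returns an AsyncResult immediately, without blocking.
        async_result = pool.map_async(slow_add_two, values)
        # The main process could do other work here; ready() reports
        # whether the whole batch has finished yet.
        print("finished already?", async_result.ready())
        # get() blocks until every item is done and returns the full
        # list, in input order. It raises multiprocessing.TimeoutError
        # if the timeout expires first.
        return async_result.get(timeout=30)


if __name__ == "__main__":
    print(collect_with_map_async([1, 5, 3]))
```

Calling get() inside the with block matters here, because leaving the block terminates the pool.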

So, the primary reasons to use imap/imap_unordered over map_async are:

- Your iterable is large enough that converting it to a list would cause you to run out of memory or use too much of it.
- You want to be able to start processing the results before all of them are completed.
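
Both reasons can be combined in one small sketch, again assuming Python 3. Here the input is a generator expression and each result is folded into a running sum as it arrives, so this code never builds a full list of inputs or outputs. The names square and running_total are hypothetical, and the chunksize of 64 is an arbitrary assumption.

```python
import multiprocessing


def square(x):
    # Hypothetical worker function.
    return x * x


def running_total(n):
    total = 0
    with multiprocessing.Pool(2) as pool:
        inputs = (i for i in range(n))  # generator, not a list
        # imap_unordered yields each result as soon as it is available;
        # the order does not matter for a sum.
        for value in pool.imap_unordered(square, inputs, chunksize=64):
            total += value
    return total


if __name__ == "__main__":
    print(running_total(1000))
```

If the order of results matters, imap can be swapped in for imap_unordered with the same loop.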
edited May 19 '16 at 22:21
answered Oct 23 '14 at 4:51
