One cause of errors when using multiprocessing in Python, and how to fix it

Latest recommended article published on 2025-03-08 16:38:24

SDDX_CDY


Tags: python

Article link: https://blog.youkuaiyun.com/SDDX_CDY/article/details/127320096


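Problem

Running work in multiple processes with Python's multiprocessing often produces all kinds of strange errors. When using tushare, for example, pro_api gets started over and over, access to a global variable exceeds its call limit, reading process frames fails with permission errors, and so on.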

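Analysis

One thing stands out: whatever the error message is in this situation, the cause is usually a permission conflict or a repeated call. That suggests that many tools that should only be called once are being called many times.

In the official multiprocessing documentation, I found this note: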

Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter.

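In other words, child processes must be able to import the __main__ module, so the code that creates processes should be protected by an if __name__ == "__main__" guard. In an interactive interpreter, however, the guard did not necessarily help when I tested it, which fits the note above: functions defined interactively live in a __main__ that the children cannot import.

On Stack Overflow, I found this explanation: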

On some systems, multiprocessing has to spawn a new copy of python and import your module to get to the worker code. Anything at module level is executed again... including the parent code that creates the pool. This would be an infinite recursion except python detects the problem and gives you a handy tip.

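In other words, on those systems multiprocessing starts a new Python interpreter for each child and re-imports your module so that the worker code is available, which re-executes everything at module level. If the code that creates the processes is not protected, each child would try to create its own children, which would be infinite recursion if Python did not detect it and raise an error.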
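The re-execution described above can be observed directly. The following is a minimal sketch, not code from the original post: the worker name square is hypothetical, and forcing the "spawn" start method through mp.get_context is my assumption, chosen so the behaviour is the same on every platform. Saved as a script and run, the module-level print should appear more than once, because each worker re-imports the file.

```python
import multiprocessing as mp
import os

# Module-level statement: under the "spawn" start method every child
# re-imports this file, so this line runs in the parent and in each worker.
print(f"module-level code running in pid {os.getpid()}", flush=True)


def square(x):
    """Hypothetical worker; defined at module level so children can import it."""
    return x * x


if __name__ == "__main__":
    # Assumption: force "spawn" so the demo does not depend on the
    # platform's default start method.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))
```

Anything expensive placed next to that print (an API client, a login, a file lock) would be repeated in the same way, which matches the "called many times" symptoms.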

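Solution

Putting the two points together, my working hypothesis is this: because multiprocessing may re-run the parent's module-level code in every child so that the child can work, the code that creates processes must be protected by the main guard, so that a child never reaches the call that creates the pool. It is then enough to make sure that the pool can only ever be created under the main guard. The code looks like this: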

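import multiprocessing as mp

if __name__ == "__main__":
    # Only the main process reaches this point; re-importing children skip it.
    with mp.Pool(10) as pool:
        use_function(pool, *args, **kwargs)  # use_function, args, kwargs: your own code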
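The snippet above leaves use_function and its arguments undefined. A self-contained sketch of the same pattern might look like the following; work, use_function and main here are hypothetical stand-ins I made up for illustration, and the pool size of 2 is arbitrary.

```python
import multiprocessing as mp


def work(item):
    """Hypothetical per-item task; stands in for the real workload."""
    return item * 2


def use_function(pool, items):
    """Hypothetical stand-in for the post's use_function.

    It receives an existing pool and never creates one itself.
    """
    return pool.map(work, items)


def main():
    # The pool is created in exactly one place, and main() is only
    # called from under the guard below.
    with mp.Pool(2) as pool:
        return use_function(pool, [1, 2, 3])


if __name__ == "__main__":
    print(main())
```

Passing the pool into the functions that need it, rather than letting them build their own, keeps pool creation in a single guarded spot.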

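Remarks

The errors are usually not raised by multiprocessing itself, and usually more than one place fails at once, which makes the problem quite misleading. This time I got read-permission conflicts, network access limits being exceeded, and program start-up failures all at the same time. When several unrelated errors show up together, that should already suggest that the individual steps are probably not each at fault. I did check multiprocessing first, but I still missed the cause; after going through everything else I came back to it as the most likely suspect. It was the only module I had used in a lazy way, and its process creation was a complete black box to me.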
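Since the visible errors come from other libraries, one quick check is to log which process is executing the module-level code. This is a small diagnostic sketch of my own, not part of the original troubleshooting; describe_process is a hypothetical helper, and it assumes Python 3.8 or later for multiprocessing.parent_process.

```python
import multiprocessing as mp
import os


def describe_process():
    """Return a short label saying which process is executing this code."""
    proc = mp.current_process()
    # parent_process() returns None in the main process (Python 3.8+).
    role = "parent" if mp.parent_process() is None else "child"
    return f"{role} {proc.name} (pid {os.getpid()})"


# Placed at module level on purpose: if this prints more than once per run,
# worker processes are re-importing the module, and any setup code next to
# it is being repeated as well.
print("importing module in", describe_process(), flush=True)
```

If lines labelled "child" show up, the module is being re-imported by workers, and the next thing to check is whether pool creation and other one-time setup sit under the main guard.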