Python Programming Essentials (1): Multithreading and Multiprocessing

Latest recommended article published on 2022-03-18 13:38:23

License: CC 4.0 BY-SA

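This article compares multithreading and multiprocessing in Python, describes the strengths and weaknesses of each, and uses examples to show how multiprocessing can make full use of a multi-core CPU.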

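Preface

This series of posts covers the essentials and key points of Python programming. It is not a set of beginner notes; it is intended for readers who already have some programming experience.

When training machine learning models on datasets with millions of rows, running on a single process and a single core is far too slow. I first tried multithreading for high concurrency in Python, but because of the GIL (global interpreter lock) threads cannot make full use of multiple cores, so I switched to multiprocessing. This post compares the use of multithreading and multiprocessing in Python and records the points to watch out for.

If you just want Python code that actually uses multiple cores, skip to the section on multiprocessing.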

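Multithreading

The advantage of multithreading is that all threads see the same resources, so no elaborate communication code is needed. The cost is that you have to think about race conditions between threads and resolve them with techniques such as locks. In addition, because of the GIL, Python threads cannot spread CPU-bound work across multiple cores, so a basic understanding is enough here.

Basic multithreading

A simple multithreading example: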

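#coding:utf-8

import threading
import time  # needed for time.sleep() below

def per_theard_work(i,j):
    # current_thread() replaces the deprecated currentThread() alias
    print("Current thread id:", threading.current_thread().ident)
    if i==2:
        time.sleep(1)  # delay one thread so that it finishes last

    print("I'm thread %d,%d" % (i,j))

def threads_test():
    threads=[]
    for i in range(10):
        # args are passed positionally to per_theard_work(i, j)
        thread=threading.Thread(target=per_theard_work,args=[i,i+1])
        threads.append(thread)
        thread.start()

    # wait for every worker thread to finish
    for thread in threads:
        thread.join()

if __name__ == '__main__':
    threads_test()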

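The example calls threading.Thread() in a loop to create 10 threads. Each thread runs per_theard_work(), which receives its arguments through args. Each Thread object returned by threading.Thread() is appended to the threads list and then started with thread.start().

So when does the main program exit? On its own, the main thread does not wait for anything: without join() it would run straight past the loop while the workers are still busy, and any code that depends on their results would see incomplete data. (The interpreter itself stays alive until all non-daemon threads have finished; only daemon threads are killed when the main thread exits.) A mechanism is therefore needed to wait for the worker threads. The code iterates over the threads list and calls join() on each thread, so the main thread continues only after every worker has finished.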
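To make the role of join() concrete, here is a minimal sketch. The names worker, run_and_join, and results are made up for illustration and are not part of the original code. Each thread sleeps briefly and then records its index; the list is only guaranteed to be complete once every join() has returned.

```python
import threading
import time


def worker(index, results):
    """Simulate a short task, then record that this thread finished."""
    time.sleep(0.05)
    results.append(index)


def run_and_join(n):
    """Start n worker threads, wait for all of them, and return what they recorded."""
    results = []
    threads = [threading.Thread(target=worker, args=(i, results)) for i in range(n)]
    for t in threads:
        t.start()
    # At this point some workers are usually still sleeping, so `results`
    # may be incomplete. join() blocks until the given thread has finished.
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # The order of completion is not deterministic, so sort before printing.
    print(sorted(run_and_join(5)))
```

Reading results before the join loop would be a race: the list could be empty, partly filled, or complete, depending on timing.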

Output (the run captured here shows nine threads, numbered 1 to 9):

Current thread id: 18924
I'm thread 1,2
Current thread id: 6020
Current thread id: 11028
I'm thread 3,4
Current thread id: 16268
I'm thread 4,5
Current thread id: 11272
I'm thread 5,6
Current thread id: 11976
I'm thread 6,7
Current thread id: 16280
I'm thread 7,8
Current thread id: 15344
I'm thread 8,9
Current thread id: 16912
I'm thread 9,10
I'm thread 2,3

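The code above does not deal with contention. Now introduce a global variable in which every thread records a message once it has finished its work. Without a lock or a similar mechanism, I expected the result to be a mess.

The code is as follows: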

import threading
import time

threds_msg = []  # shared list that every thread appends to
def per_theard_work(i,j):
    # current_thread() replaces the deprecated currentThread() alias
    print("Current thread id:", threading.current_thread().ident)
    time.sleep(3)
    threds_msg.append("I'm thread %d,%d" % (i,j))

def threads_test():
    threads=[]
    for i in range(1,10):
        thread=threading.Thread(target=per_theard_work,args=[i,i+1])
        threads.append(thread)
        thread.start()

    # wait for every worker thread before reading the shared list
    for thread in threads:
        thread.join()

    print(threds_msg)

if __name__ == '__main__':
    threads_test()

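Well, I miscalculated. I ran it many times, even with the thread count raised to 100, and threds_msg never came out corrupted. Is this the GIL at work? Largely, yes: in CPython a single list.append() call is effectively atomic under the GIL, so appends from different threads cannot corrupt the list, although their order is not deterministic. Compound operations, such as a read-modify-write on a shared counter, are not protected in this way and still need a lock. The actual output of one run: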

Current thread id: 2760
Current thread id: 20448
Current thread id: 30356
Current thread id: 23484
Current thread id: 6020
Current thread id: 30520
Current thread id: 30252
Current thread id: 22048
Current thread id: 26660
["I'm thread 4,5", "I'm thread 3,4", "I'm thread 2,3", "I'm thread 1,2", "I'm thread 6,7", "I'm thread 5,6", "I'm thread 9,10", "I'm thread 8,9", "I'm thread 7,8"]

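If the behavior above is not what you see and locking is needed, it is simple to add: define a global lock, then acquire and release it around the code that touches shared data.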

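lock=threading.Lock()  # define a single lock at module (global) level

# inside per_theard_work(): hold the lock only while touching shared data
with lock:  # acquires on entry, releases on exit even if an exception is raised
    threds_msg.append("I'm thread %d,%d" % (i,j))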
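The append() case happened to be safe, so it does not really show why a lock matters. The sketch below uses a shared counter instead, which is the classic case where one is needed. The names counter, counter_lock, add_many, and run are assumptions made for this illustration only.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        # `counter += 1` is a read, an add, and a write. The lock keeps the
        # three steps together so no update from another thread is lost.
        with counter_lock:
            counter += 1


def run(num_threads=4, times=10000):
    """Reset the counter, run the workers to completion, and return the total."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())
```

With the lock in place the total always equals num_threads * times. Removing the with statement may or may not lose updates on a given run, which is exactly what makes such bugs hard to reproduce.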

Using map

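def thread_pool_thread(msg):
    print("now:", msg, end='\n')

def thread_pool_test():
    # multiprocessing.dummy offers the Pool API backed by threads instead of processes
    from multiprocessing.dummy import Pool as ThreadPool
    pool = ThreadPool(10)  # pool of 10 worker threads

    msgs = [
        'hello, world.',
        'nice to meet you',
        'I\'m good boy'
    ]
    # map() calls thread_pool_thread once per message and blocks until all are done
    results = pool.map(thread_pool_thread, msgs)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker threads to exit

def multi_threads_test():
    thread_pool_test()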
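The worker above only prints, so the list returned by map() is not used. As a small variation, the sketch below has the worker return a value; map() then hands the results back in the same order as the inputs, regardless of which thread finished first. The names shout and map_in_threads are hypothetical.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


def shout(msg):
    """Hypothetical worker: return a value instead of printing it."""
    return msg.upper()


def map_in_threads(msgs, workers=4):
    """Apply shout() to every message on a thread pool and return the results."""
    # The with-block terminates the pool on exit; map() has already
    # returned every result by then.
    with ThreadPool(workers) as pool:
        return pool.map(shout, msgs)


if __name__ == "__main__":
    print(map_in_threads(["hello, world.", "nice to meet you"]))
```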

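Using a thread pool

The thread pool idea is a good one. It is close to the map and reduce approach used on clusters, and the example below illustrates that split-and-combine pattern well. A thread pool still has the same weakness, though: it cannot make full use of multiple cores.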

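import time
import threadpool  # third-party threadpool package

# NOTE: sum() here is the custom sum(start, end) defined in the multiprocessing
# section of this article, not the built-in sum().

# callback that collects every result
totalsum=0
def onresult(req,sum):
    global totalsum
    totalsum+=sum

# sum concurrently with 10 threads
def threadpoolSum():
    # build the request list: 10 chunks covering 1..10**8
    reqlist = []
    for i in range(10):
        reqlist.append(([i * 10 ** 7 + 1, 10 ** 7 * (i + 1)], None))

    start_time = time.time()
    # create the work requests
    reqs = threadpool.makeRequests(sum, reqlist, callback=onresult)
    # create a thread pool with 10 threads
    mypool = threadpool.ThreadPool(10)
    # add the requests to the thread pool
    for item in reqs:
        mypool.putRequest(item)

    # block until all requests are done
    mypool.wait()
    # print the result
    print(totalsum)

    print("Elapsed: %f s"  % (time.time()-start_time))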

CPU usage with 10 threads: most of the CPUs are idle.

Result:

[root@ceshi03 study]# python python_base.py 
5000000050000000
Elapsed: 10.739171 s
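The example above depends on the third-party threadpool package. As a rough standard-library counterpart, the sketch below uses concurrent.futures.ThreadPoolExecutor. It is not the code that produced the timing above: range_sum and threaded_sum are hypothetical names, the upper bound is reduced so it finishes quickly, and the threads are limited by the GIL in the same way.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def range_sum(start, end):
    """Sum the integers from start to end, inclusive."""
    total = 0
    for i in range(start, end + 1):
        total += i
    return total


def threaded_sum(upper, parts, workers):
    """Split 1..upper into `parts` equal chunks and sum them on a thread pool."""
    step = upper // parts  # assumes `upper` is divisible by `parts`
    starts = [i * step + 1 for i in range(parts)]
    ends = [(i + 1) * step for i in range(parts)]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # map() pairs starts[i] with ends[i] and yields the partial sums in order.
        return sum(executor.map(range_sum, starts, ends))


if __name__ == "__main__":
    begin = time.time()
    print(threaded_sum(10 ** 6, 10, 10))
    print("Elapsed: %f s" % (time.time() - begin))
```

Returning the partial sums from map() avoids the global totalsum and the callback, at the price of waiting for all chunks before combining them.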

Using 100 threads

The code differs only slightly: the main change is that the request list is now split into 100 parts.

# callback that collects every result
totalsum=0
def onresult(req,sum):
    global totalsum
    totalsum+=sum

# sum concurrently with 100 threads
def threadpoolSum():
    # build the request list: 100 chunks covering 1..10**8
    reqlist = []
    for i in range(100):
        reqlist.append(([i * 10 ** 6 + 1, 10 ** 6 * (i + 1)], None))

    start_time = time.time()
    # create the work requests
    reqs = threadpool.makeRequests(sum, reqlist, callback=onresult)
    # create a thread pool with 100 threads
    mypool = threadpool.ThreadPool(100)
    # add the requests to the thread pool
    for item in reqs:
        mypool.putRequest(item)

    # block until all requests are done
    mypool.wait()
    # print the result
    print(totalsum)

    print("Elapsed: %f s"  % (time.time()-start_time))

CPU usage with 100 threads is about the same as with 10, which shows that adding more threads no longer helps.

The run did finish slightly faster, though:

[root@ceshi03 study]# python python_base.py 
5000000050000000
Elapsed: 8.464780 s
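Both request lists above are built with hand-written index arithmetic, which has to be edited whenever the number of parts changes. A small helper can produce the same (start, end) pairs for any number of parts. This is only a sketch, and split_range is a hypothetical name that does not appear in the original code.

```python
def split_range(upper, parts):
    """Split 1..upper into `parts` contiguous (start, end) pairs, both inclusive."""
    base, extra = divmod(upper, parts)
    chunks = []
    start = 1
    for i in range(parts):
        # The first `extra` chunks take one more element to absorb the remainder.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


if __name__ == "__main__":
    # Same boundaries as the 10-part request list used earlier.
    for chunk in split_range(10 ** 8, 10):
        print(chunk)
```

For the threadpool package each pair would still need to be wrapped in the (args, kwargs) form used above, for example `(list(chunk), None)`.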

Multiprocessing

Here is the multiprocessing code:

import multiprocessing
import time  # needed for the timing in multi_processes_test()

# Sum the integers from start to end (inclusive).
# Note: this function shadows the built-in sum().
def sum(start,end):
    total=0
    for i in range(start,end+1):
        total+=i
    return total

# callback that collects every result (it runs in the parent process)
totalsum=0
def onresult2(sum):
    global totalsum
    totalsum+=sum

# Process the work with multiple processes.
# process_func() is the user-supplied function that each task runs independently.
# args_list holds the arguments for process_func(), one entry per task; if
# process_func() takes several parameters, make each entry a tuple of them.
def multi_process_aync_handle(process_func,args_list,process_nums=24):
    # size of the process pool; ideally equal to the number of CPU cores
    pool = multiprocessing.Pool(processes=process_nums)
    # Submit one task per entry of args_list. At most process_nums tasks run at
    # the same time; the rest wait in the queue until a worker becomes free.
    for args in args_list:
        # the return value of process_func(*args) is passed to the callback,
        # which merges all results
        pool.apply_async(process_func, args, callback=onresult2)
    pool.close()
    pool.join()

    print(totalsum)

def multi_processes_test():
    # build the request list: 10 chunks covering 1..10**8
    reqlist = []
    for i in range(10):
        reqlist.append((i * 10 ** 7 + 1, 10 ** 7 * (i + 1)))

    start_time = time.time()
    multi_process_aync_handle(sum, reqlist, 24)
    print("Multiprocessing elapsed: %f s" % (time.time()-start_time))

if __name__ == '__main__':
    # multiprocessing test
    multi_processes_test()

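Although the code above creates a pool of 24 processes, the request list has only 10 parts, so only 10 CPUs do any work. Those CPUs do not reach 100% because the multiprocess run is so fast that it finishes before utilization can climb that high.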

Result:

[root@ceshi03 study]# python python_base.py 
5000000050000000
Multiprocessing elapsed: 1.247409 s
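The pool size of 24 is hard-coded for the test machine. One possible variation, sketched below under stated assumptions, lets multiprocessing pick the worker count from the CPUs the operating system reports, and uses Pool.starmap so that the partial sums come back as a list instead of through a callback. The names range_sum and parallel_sum and the start_method parameter are introduced for this illustration only.

```python
import multiprocessing


def range_sum(start, end):
    """Sum the integers from start to end, inclusive."""
    total = 0
    for i in range(start, end + 1):
        total += i
    return total


def parallel_sum(chunks, workers=None, start_method=None):
    """Sum every (start, end) chunk in a process pool and add up the parts.

    workers=None lets the pool choose a size from the CPU count reported by
    the OS. start_method=None keeps the platform default; "fork", "spawn",
    or "forkserver" can be passed explicitly where the platform supports it.
    """
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=workers) as pool:
        # starmap unpacks each tuple into range_sum(start, end) and returns
        # the partial sums in the same order as `chunks`.
        partial_sums = pool.starmap(range_sum, chunks)
    return sum(partial_sums)
```

On platforms that start workers by spawning a fresh interpreter, the call to parallel_sum() should sit under an `if __name__ == '__main__':` guard, as in the article's own test code.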

To demonstrate every CPU being driven to 100%, compute the sum from 1 to 10 billion instead. Only the following code needs to change.

def multi_processes_test():
    # build the request list: 100 chunks covering 1..10**10
    reqlist = []
    for i in range(100):
        reqlist.append((i * 10 ** 8 + 1, 10 ** 8 * (i + 1)))

    start_time = time.time()
    multi_process_aync_handle(sum, reqlist, 100)
    print("Multiprocessing elapsed: %f s" % (time.time()-start_time))

This run takes far too long, so I did not wait for the result.
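Even without waiting for the run, the expected total can be checked independently with the arithmetic-series formula, sum = (first + last) * count / 2. This is only a cross-check of the numbers, not the method the article is measuring, and arithmetic_sum is a hypothetical helper name.

```python
def arithmetic_sum(start, end):
    """Sum of the integers start..end (inclusive) via the arithmetic-series formula."""
    count = end - start + 1
    # (start + end) * count is always even, so integer division is exact.
    return (start + end) * count // 2


if __name__ == "__main__":
    # Total for 1..10**8, the range used in the earlier runs.
    print(arithmetic_sum(1, 10 ** 8))
    # Total for 1..10**10, the run that was not awaited.
    print(arithmetic_sum(1, 10 ** 10))
```

For 1 to 10**8 this gives 5000000050000000, matching the totals printed by the earlier runs; for 1 to 10**10 it gives 50000000005000000000.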

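At this point we are more or less done: Python can use multiprocessing to put multiple cores to work.

Here is one more snippet for the record. It does not use a callback; instead it collects the results of all child processes into a list and then processes them with the relevant function. Its effect is the same as the code above. I still recommend the callback version; this one is kept only as a note to self.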

# Process the work with multiple processes, collecting results without a callback.
def multi_process_aync_handle(process_func,args_list,process_nums=24):
    pool = multiprocessing.Pool(processes=process_nums)
    result = []
    tmp_list = []
    for args in args_list:
        # apply_async() returns an AsyncResult; keep it to fetch the value later
        result.append(pool.apply_async(process_func, args))
    pool.close()
    pool.join()

    # every task has finished by now, so get() returns immediately
    for x in result:
        tmp_list.append(x.get())

    print(tmp_list)
    for i in tmp_list:
        onresult2(i)

    print(totalsum)

Output:

[root@ceshi03 study]# python python_base.py 
[50000005000000, 150000005000000, 250000005000000, 350000005000000, 450000005000000, 550000005000000, 650000005000000, 750000005000000, 850000005000000, 950000005000000]
5000000050000000
Multiprocessing elapsed: 1.932801 s