Processes and Threads in Python

Article link: https://blog.youkuaiyun.com/hjiaqing/article/details/134157262

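In web-crawler development, the concepts of processes and threads are very important. The following are notes compiled from the study materials I found.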

1. Multiprocessing: creating processes with the multiprocessing module

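The multiprocessing module provides a Process class that represents a process object. To create a child process, pass in the function to run and its arguments to build a Process instance, call start() to launch it, and call join() to make the caller wait until that process has finished.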

import os
from multiprocessing import Process


def run_proc(name):
    """Code executed in the child process."""
    print(f'Child process {name}: {os.getpid()}')


if __name__ == '__main__':
    print(f'Parent process {os.getpid()}')
    processes = []
    for i in range(5):
        p = Process(target=run_proc, args=(str(i),))
        print(f'Child process {i} will start')
        p.start()
        processes.append(p)
        print(f'Child process {i} is running')
    # Wait for every child, not only the last one created
    for p in processes:
        p.join()
    print('All processes ended')
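
As a small sketch of what join() does, the snippet below inspects a Process before it starts and again after join() returns. The work and describe helpers and the demo-worker name are made up for illustration; is_alive() and exitcode are standard members of Process.

```python
import os
from multiprocessing import Process


def work(name):
    """Hypothetical task: report which process runs it."""
    print(f'Child {name} runs in process {os.getpid()}')


def describe(process):
    """Summarise the state of a Process object as a short string."""
    state = 'alive' if process.is_alive() else 'not running'
    return f'{process.name}: {state}, exitcode={process.exitcode}'


if __name__ == '__main__':
    p = Process(target=work, args=('demo',), name='demo-worker')
    print(describe(p))   # not started yet, so exitcode is None
    p.start()
    p.join()             # block until the child has finished
    print(describe(p))   # finished normally, so exitcode is 0
```

Once join() has returned, the child is guaranteed to have exited, which is why its exitcode can be read safely at that point.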

2. The multiprocessing module provides a Pool class that represents a process pool

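A Pool offers a fixed number of worker processes for the caller to use. Its default size is the number of CPU cores, and a different size can be specified. When a task is submitted and a worker in the pool is idle, that worker executes it; if all workers are busy, the task waits until one of them becomes free.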

import os
import random
import time
from multiprocessing import Pool


def run_task(name):
    """Simulate a task that takes up to three seconds."""
    print(f'Task {name} (pid {os.getpid()}) is running...')
    time.sleep(random.random() * 3)
    print(f'Task {name} end.')


if __name__ == '__main__':
    print(f'The main process is {os.getpid()}')
    run_pool = Pool(processes=2)  # two worker processes
    for i in range(5):
        run_pool.apply_async(run_task, args=(i,))
        print(f'Task {i} submitted to the pool')

    run_pool.close()  # stop accepting new tasks
    run_pool.join()   # wait for all submitted tasks to finish

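Note: calling join() on a Pool object waits for all worker processes to finish. close() must be called before join(), and once close() has been called no new tasks can be submitted to the pool.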
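
To illustrate the close()/join() ordering and how return values come back from the pool, here is a minimal sketch. The square task and the squares_with_pool helper are hypothetical names; it assumes that a submission made after close() is rejected with a ValueError, which is how current CPython versions behave.

```python
from multiprocessing import Pool


def square(n):
    """Hypothetical CPU task: return n squared."""
    return n * n


def squares_with_pool(numbers, workers=2):
    """Run square() in a pool and gather the results in input order."""
    pool = Pool(processes=workers)
    pending = [pool.apply_async(square, args=(n,)) for n in numbers]
    pool.close()          # no more tasks may be submitted after this
    try:
        pool.apply_async(square, args=(0,))
    except ValueError as exc:
        print(f'Rejected after close(): {exc}')
    pool.join()           # wait for the workers to finish and exit
    return [result.get() for result in pending]


if __name__ == '__main__':
    print(squares_with_pool(range(5)))  # [0, 1, 4, 9, 16]
```

Each apply_async call returns a result object, and get() on it yields the value the task returned in the worker process.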

3. Inter-process communication: Queue and Pipe

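Queue is a queue that is safe to share between processes. Its two main methods are put, which inserts an item into the queue, and get, which reads an item from the queue and removes it.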

from multiprocessing import Process, Queue
import os
import random
import time


def process_write(q, urls):
    """Write URLs into the queue."""
    print(f'Writer process id is {os.getpid()}')
    for url in urls:
        q.put(url)
        print(f'Put {url} into the queue')
        time.sleep(random.random())


def process_read(q):
    """Read URLs from the queue forever."""
    print(f'Reader process id is {os.getpid()}')
    while True:
        url = q.get(True)  # block until an item is available
        print(f'Got {url} from the queue')


if __name__ == '__main__':
    q = Queue()
    write_process1 = Process(target=process_write, args=(q, ['url1', 'url2', 'url3']))
    write_process2 = Process(target=process_write, args=(q, ['url4', 'url5', 'url6']))
    read_process = Process(target=process_read, args=(q,))
    write_process1.start()
    write_process2.start()
    read_process.start()
    write_process1.join()
    write_process2.join()
    # The reader loops forever, so stop it once both writers are done
    read_process.terminate()
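
The reader above never exits on its own and has to be stopped with terminate(). One common alternative, sketched here under the assumption that None never appears as real data, is to put a sentinel on the queue so the reader knows when to stop. STOP, produce and consume are hypothetical names.

```python
from multiprocessing import Process, Queue

STOP = None  # hypothetical sentinel meaning "no more items"


def produce(q, urls):
    """Put every URL on the queue, then the sentinel."""
    for url in urls:
        q.put(url)
    q.put(STOP)


def consume(q):
    """Read until the sentinel arrives and return what was read."""
    received = []
    while True:
        item = q.get()    # blocks until an item is available
        if item is STOP:
            break
        received.append(item)
    return received


if __name__ == '__main__':
    q = Queue()
    writer = Process(target=produce, args=(q, ['url1', 'url2', 'url3']))
    writer.start()
    print(consume(q))     # the parent acts as the reader here
    writer.join()
```

With several writers, each one would send its own sentinel and the reader would count them before stopping.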

4. The Pipe communication mechanism

A Pipe is typically used for communication between two processes, one at each end of the pipe.

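Pipe() returns (conn1, conn2), which represent the two ends of a pipe. It takes a duplex parameter: if duplex is True (the default), the pipe is full-duplex and both ends can send and receive. If it is False, conn1 can only receive and conn2 can only send. The send and recv methods send and receive, respectively.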

import multiprocessing
import os
import random
import time


def proc_send(conn, urls):
    """Send every URL through the pipe, then a None sentinel."""
    for url in urls:
        conn.send(url)
        print(f'Process {os.getpid()} sent {url}')
        time.sleep(random.random())
    conn.send(None)  # tell the receiver that nothing more is coming


def proc_recv(conn):
    """Receive URLs until the None sentinel arrives."""
    while True:
        url = conn.recv()
        if url is None:
            break  # without this the receiver would block forever
        print(f'Process {os.getpid()} received {url}')
        time.sleep(random.random())


if __name__ == '__main__':
    print(f'The main process is {os.getpid()}')
    conn1, conn2 = multiprocessing.Pipe()
    process_send = multiprocessing.Process(
        target=proc_send, args=(conn1, ['url_' + str(i) for i in range(10)]))
    process_recv = multiprocessing.Process(target=proc_recv, args=(conn2,))
    process_send.start()
    process_recv.start()
    process_send.join()
    process_recv.join()
    print('The main process is over')
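
The effect of the duplex parameter can be checked inside a single process, without starting any children. This is only a sketch: the variable names are made up, and it assumes that using a one-way end in the wrong direction raises OSError, as current CPython versions do.

```python
from multiprocessing import Pipe

# duplex=False gives a one-way pipe: the first end receives, the second sends.
recv_end, send_end = Pipe(duplex=False)

send_end.send('url_0')
print(recv_end.recv())            # the message comes out of the receiving end

try:
    recv_end.send('url_1')        # wrong direction on a one-way pipe
except OSError as exc:
    print(f'Cannot send on the receiving end: {exc}')

# With the default duplex=True both ends can send and receive.
left, right = Pipe()
left.send('ping')
print(right.recv())
right.send('pong')
print(left.recv())
```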

5. Multithreading

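Use cases: running long tasks in the background, and handling tasks that spend most of their time waiting, such as sending and receiving data over the network.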
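
To see why threads suit waiting-heavy tasks, the sketch below runs several simulated downloads at once and times them. fake_download and download_all are hypothetical helpers, and time.sleep stands in for waiting on the network.

```python
import threading
import time


def fake_download(url, delay, results):
    """Hypothetical I/O task: sleep stands in for waiting on the network."""
    time.sleep(delay)
    results[url] = f'content of {url}'


def download_all(urls, delay=0.2):
    """Start one thread per URL and return (results, elapsed seconds)."""
    results = {}
    threads = [threading.Thread(target=fake_download, args=(url, delay, results))
               for url in urls]
    started = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - started


results, elapsed = download_all(['url_1', 'url_2', 'url_3'])
print(sorted(results), f'{elapsed:.2f}s')
```

Because the waits overlap, the total time should stay close to a single delay instead of the sum of all three.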

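There are two ways to create a thread. The first is to pass a function in when creating a Thread instance and then call start().

The second is to subclass threading.Thread directly and override its __init__ and run methods.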

import os
import random
import threading
import time


def threading_run(urls):
    """Process each URL in turn, pausing briefly between them."""
    name = threading.current_thread().name
    print(f'Thread {name} started in process {os.getpid()}')
    for url in urls:
        print(f'Thread {name} is handling {url}')
        time.sleep(random.random())
    print(f'Thread {name} ended')


t1 = threading.Thread(target=threading_run, name='t1', args=(['url_1', 'url_2', 'url_3'],))
t2 = threading.Thread(target=threading_run, name='t2', args=(['url_4', 'url_5', 'url_6'],))
t1.start()
t2.start()
t1.join()
t2.join()

Second approach: subclassing

import random
import threading
import time


class MyThread(threading.Thread):
    def __init__(self, name, urls) -> None:
        super().__init__(name=name)
        self.urls = urls

    def run(self):
        """Called in the new thread once start() is invoked."""
        print(f'Thread {self.name} started')
        for url in self.urls:
            print(f'Thread {self.name} is handling {url}')
            time.sleep(random.random())
        print(f'Thread {self.name} ended')


t1 = MyThread(name='t1', urls=['url_1', 'url_2', 'url_3'])
t2 = MyThread(name='t2', urls=['url_4', 'url_5', 'url_6'])
t1.start()
t2.start()
t1.join()
t2.join()

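References:

1. Python's Gevent: a high-performance Python concurrency framework (优快云 blog)

2. Advanced Python programming: concurrency and multithreading (part 3), by 大数据老司机 (优快云 blog)

3. Back-end programming with Python 3: multiprocessing and multithreading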