Python Series: Single-Process, Multi-Process, and Multi-Thread Examples


To get a more intuitive picture of how a single process, multiple processes, and multiple threads perform on an IO-bound task, let's compare how long the same job takes in each setup.

Timing decorator

To review an earlier topic along the way, we first write a timing decorator that will measure how long each variant of the function takes to run:

import time

def timer(is_timing: bool = True):
    """Timing decorator"""
    def decorator(fn):
        from functools import wraps

        @wraps(fn)
        def wrapper(*args, **kwargs):
            start_time = time.time()
            result = fn(*args, **kwargs)
            end_time = time.time()
            if is_timing:
                print(f"Elapsed: {end_time - start_time}")
            return result
        return wrapper
    return decorator
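As a quick sanity check, here is a minimal sketch applying the decorator to a hypothetical slow_add function (the name is illustrative; it just sleeps to simulate work):

```python
import time
from functools import wraps

def timer(is_timing: bool = True):
    """Timing decorator, same as defined above."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start_time = time.time()
            result = fn(*args, **kwargs)
            end_time = time.time()
            if is_timing:
                print(f"Elapsed: {end_time - start_time}")
            return result
        return wrapper
    return decorator

@timer(is_timing=True)
def slow_add(a, b):
    time.sleep(0.1)  # simulate a slow operation
    return a + b

print(slow_add(1, 2))  # prints the elapsed time, then 3
```

Thanks to functools.wraps, the wrapped function keeps its original name and docstring.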

Worker functions

Two functions:
open_url(url): sends a request to url and returns the response body;
save_page(title, page_text): saves the response body to a local file.

def open_url(url):
    import requests
    try:
        headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
        }
        resp = requests.get(url, headers=headers)
        print(url, resp.status_code, resp.apparent_encoding)
        resp.encoding = resp.apparent_encoding
        return resp.text
    except Exception as e:
        print(url, "request failed\n", e)
        return None

def save_page(title, page_text):
    if page_text is None:
        return
    try:
        with open(f"{title}.txt", "w", encoding="utf-8") as w:
            w.write(page_text)
        print(f"{title}: download succeeded")
    except Exception as e:
        print(f"{title}: download failed", e)
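To see save_page in action without touching the network, a small sketch (the scratch-directory setup and the sample HTML string are mine, not part of the crawler):

```python
import os
import tempfile

def save_page(title, page_text):
    # Same behavior as above: skip failed downloads, otherwise write <title>.txt
    if page_text is None:
        return
    try:
        with open(f"{title}.txt", "w", encoding="utf-8") as w:
            w.write(page_text)
        print(f"{title}: download succeeded")
    except Exception as e:
        print(f"{title}: download failed", e)

os.chdir(tempfile.mkdtemp())        # write into a scratch directory
save_page(0, "<html>demo</html>")   # normal response body
save_page(1, None)                  # a failed download is skipped silently
print(os.listdir("."))  # ['0.txt']
```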

urls is a list of URLs for different websites.

Single-process run

Run the functions above in a single process and see how long it takes:
machine performance, network speed, and the response time of the target hosts all affect the final timing.

@timer(is_timing=True)
def run_url():
    """Single-process run"""
    urls = [
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/"
    ]
    for i, url in enumerate(urls):
        page_text = open_url(url)
        save_page(i, page_text)

if __name__ == '__main__':
    run_url()
# Elapsed: 23.446341037750244
# what a slow machine

Multi-process run

There are many ways to start multiple processes; here we use two of them:

Method 1: Pool
Method 2: ProcessPoolExecutor

Sample code:

from multiprocessing import Pool
from concurrent.futures import ProcessPoolExecutor, as_completed

@timer(is_timing=True)
def run_url_pro():
    """Multi-process run"""
    urls = [
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/"
    ]
    # # Method 1: Pool
    # with Pool() as p:
    #     # Pool().apply_async()
    #     # futures = [p.apply_async(func=open_url, args=(url,)) for url in urls]
    #     # for i, future in enumerate(futures):
    #     #     p.apply_async(func=save_page, args=(i, future.get(),)).get()

    #     # Pool().map() only accepts an iterable of single arguments
    #     futures = p.map(open_url, urls)
    #     # Pool().starmap() accepts multiple arguments per call, shaped like
    #     # [(param1, param2), (param1, param2), (param1, param2)]
    #     p.starmap(save_page, zip(range(len(futures)), futures))

    # Method 2: ProcessPoolExecutor
    with ProcessPoolExecutor() as executor:
        # Executor.submit
        futures = [executor.submit(open_url, url) for url in urls]
        for i, future in enumerate(as_completed(futures)):
            executor.submit(save_page, i, future.result())
        # Executor.map
        # futures = executor.map(open_url, urls)
        # executor.map(save_page, range(len(urls)), futures)

if __name__ == '__main__':
    run_url_pro()

# Elapsed: 10.064575433731079

Much better than the single-process run.
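The map()/starmap() argument shapes mentioned in the commented-out Pool code can be illustrated without spawning real processes: multiprocessing.dummy exposes the same Pool API backed by threads, so this sketch runs anywhere (square and add are illustrative helpers, not from the article):

```python
# multiprocessing.dummy: same Pool API as multiprocessing, but thread-backed,
# which sidesteps pickling so the sketch works even in a REPL.
from multiprocessing.dummy import Pool

def square(x):
    return x * x

def add(a, b):
    return a + b

with Pool(4) as p:
    # map(): each item of the iterable becomes ONE positional argument
    squares = p.map(square, [1, 2, 3])
    # starmap(): each item is a tuple that is unpacked into MULTIPLE arguments
    sums = p.starmap(add, [(1, 10), (2, 20), (3, 30)])

print(squares)  # [1, 4, 9]
print(sums)     # [11, 22, 33]
```

This is exactly why the article zips an index together with each page before handing the pairs to starmap(save_page, ...).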

Multi-thread run

Threads are started with:

ThreadPoolExecutor

Sample code:

from concurrent.futures import ThreadPoolExecutor, as_completed

@timer(is_timing=True)
def run_url_thread():
    """Multi-thread run"""
    urls = [
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/",
        "https://www.***.com/"
    ]
    # ThreadPoolExecutor
    with ThreadPoolExecutor() as executor:
        # # Executor.submit
        # futures = [executor.submit(open_url, url) for url in urls]
        # for i, future in enumerate(as_completed(futures)):
        #     executor.submit(save_page, i, future.result())
        # Executor.map
        futures = executor.map(open_url, urls)
        executor.map(save_page, range(len(urls)), futures)

if __name__ == '__main__':
    run_url_thread()
# Elapsed: 15.902909755706787

Why does the multi-thread version take longer than the multi-process one?

Results

Single process  Elapsed: 23.446341037750244
Multi-process   Elapsed: 10.064575433731079
Multi-thread    Elapsed: 15.902909755706787

We fetched 10 different URLs and saved each response to disk.
Why is multithreading slower than multiprocessing in my results?
Is something in the code written in an unreasonable way?
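One caveat before blaming the code: a single timed run over the network is very noisy, and for IO-bound work threads normally do overlap their waits, because the GIL is released while a thread blocks on a socket. A controlled sketch with time.sleep standing in for network latency shows this overlap (fake_request is a stand-in, not part of the article's code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    # time.sleep releases the GIL, just like waiting on a socket does
    time.sleep(0.2)
    return "page"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    pages = list(executor.map(fake_request, range(10)))
elapsed = time.perf_counter() - start

# ten 0.2 s waits overlap, so the total is close to 0.2 s, not 2 s
print(f"{elapsed:.2f}s for 10 simulated requests")
```

So with the network taken out of the picture, threads handle 10 concurrent waits in roughly the time of one; the process-vs-thread gap in the measurements above is more likely run-to-run network variance than a flaw in the threading code.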

If anything here is wrong, corrections are welcome.
