1. Processes vs. Threads: Dissecting the Core Differences
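| Feature | Process | Thread |
| --- | --- | --- |
| Memory space | Separate memory per process; no interference | Shares the parent process's memory; locks are needed to prevent conflicts |
| GIL impact | Each process has its own interpreter and GIL, so work can run in parallel across cores | Constrained by the global interpreter lock (GIL); poor concurrency for CPU-bound work |
| Creation overhead | Heavy on resources, slow to start | Lightweight; fast to create and switch |
| Data sharing | Requires IPC (pipes/queues); more complex | Can read and write global variables directly; high risk |
| Best suited for | CPU-bound tasks (computation, data processing) | I/O-bound tasks (network requests, file reads and writes) |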
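To make the "shared memory, locks needed" row concrete, here is a minimal sketch of several threads updating one shared counter. The names `Counter` and `run_workers` are hypothetical, picked only for this illustration.

```python
import threading


class Counter:
    """A shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self.value += 1` is a read-modify-write sequence; the lock
        # keeps two threads from interleaving inside it.
        with self._lock:
            self.value += 1


def run_workers(num_threads=4, increments=10_000):
    """Start num_threads threads that all increment the same counter."""
    counter = Counter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())  # 4 threads x 10,000 increments each
```

All threads see the same `Counter` object because they live in one address space. A child process would work on its own copy instead, which is why processes need explicit IPC.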
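The truth about the GIL: in standard CPython builds, the global interpreter lock (GIL) lets only one thread execute Python bytecode at any given moment, so multiple threads cannot truly run CPU-bound computation in parallel.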
2. The Key Showdown: CPU-Bound Tasks in Practice
Scenario: count the primes below one million (a workload that squeezes the CPU)
import multiprocessing
import threading
import time

def is_prime(n):
    # Trial division up to the square root of n
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes(start, end):
    # Count the primes in the half-open range [start, end)
    return sum(1 for i in range(start, end) if is_prime(i))

# Multiprocessing approach
def process_demo():
    with multiprocessing.Pool(4) as pool:
        results = pool.starmap(count_primes, [(1, 250000), (250000, 500000), (500000, 750000), (750000, 1000000)])
    print(f"Process total primes: {sum(results)}")

# Multithreading approach
def thread_demo():
    results = [0] * 4
    threads = []
    ranges = [(1, 250000), (250000, 500000), (500000, 750000), (750000, 1000000)]

    def worker(idx, start, end):
        # Each thread writes only to its own slot, so no lock is needed here
        results[idx] = count_primes(start, end)

    for idx, (start, end) in enumerate(ranges):
        t = threading.Thread(target=worker, args=(idx, start, end))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    print(f"Thread total primes: {sum(results)}")

# Performance comparison
if __name__ == "__main__":
    start = time.perf_counter()
    process_demo()
    print(f"Process time: {time.perf_counter()-start:.2f}s")
    start = time.perf_counter()
    thread_demo()
    print(f"Thread time: {time.perf_counter()-start:.2f}s")
Sample output:
Process total primes: 78498
Process time: 3.21s
Thread total primes: 78498
Thread time: 12.87s # the GIL keeps the threads from computing in parallel!
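The four sub-ranges in the demo are hard-coded. As a sketch of how they could be derived from the number of workers instead, the hypothetical helper `make_ranges` below (not part of the original code) splits a half-open range into contiguous chunks.

```python
import os


def make_ranges(start, end, parts):
    """Split the half-open range [start, end) into `parts` contiguous chunks."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    span = end - start
    # Integer arithmetic keeps the boundaries exact and the chunks gap-free
    bounds = [start + span * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    workers = os.cpu_count() or 1  # os.cpu_count() can return None
    for lo, hi in make_ranges(1, 1_000_000, workers):
        print(lo, hi)
```

The returned list can be passed straight to `pool.starmap(count_primes, ...)`. Equal-width chunks are not equal-cost here, because trial division takes longer for larger numbers, so using more, smaller chunks than workers tends to balance the load better.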
3. I/O-Bound Tasks: Threads Strike Back
Scenario: download several web pages quickly (blocking I/O dominates)
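import multiprocessing
from concurrent.futures import ThreadPoolExecutor

import requests

# 10 URLs in the test; the remaining entries are elided (replace the ... before running)
urls = ["https://www.example.com", "https://www.python.org", ...]

def download(url):
    # The timeout keeps a stalled server from blocking a worker indefinitely
    response = requests.get(url, timeout=10)
    return len(response.content)

# Multithreaded download
def thread_io_demo():
    with ThreadPoolExecutor(max_workers=5) as executor:
        return list(executor.map(download, urls))

# Multiprocess download
def process_io_demo():
    with multiprocessing.Pool(5) as pool:
        return pool.map(download, urls)

# Test results (typical):
# Thread time: 1.8s | Process time: 2.5s (process creation overhead takes its toll)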
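The download demo needs network access, so here is an offline sketch of the same pattern. `fake_download` is a hypothetical stand-in that sleeps instead of calling `requests.get`, and the demo URLs are made-up placeholders. The sketch also shows one way to collect results per URL with `as_completed`, so that a single failure does not abort the batch.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(url, delay=0.05):
    """Hypothetical stand-in for a network call: sleep, then return a size."""
    time.sleep(delay)  # sleeping releases the GIL, much like a blocking socket read
    return len(url)


def fetch_all(urls, max_workers=5):
    """Run fake_download for every URL and map each URL to its result."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fake_download, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results


if __name__ == "__main__":
    demo_urls = [f"https://www.example.com/page/{i}" for i in range(10)]
    started = time.perf_counter()
    sizes = fetch_all(demo_urls)
    print(f"{len(sizes)} results in {time.perf_counter() - started:.2f}s")
```

While one thread waits, the others keep running, so the total time is roughly the number of batches times the delay, not the sum of all delays.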
4. Decision Flowchart: Pick a Concurrency Approach in Seconds
graph TD
A[What type of task is it?] --> B{CPU-bound?}
B --> |Yes| C[Use multiple processes<br>multiprocessing]
B --> |No| D{Involves blocking I/O?}
D --> |Yes| E[Use multiple threads<br>threading/concurrent.futures]
D --> |No| F[Consider coroutines<br>asyncio]
C --> G{Need to share data?}
G --> |Yes| H[Use Manager/Queue]
G --> |No| I[Start directly]
E --> J{Need thread safety?}
J --> |Yes| K[Add Lock/RLock]
J --> |No| L[Run directly]
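Pitfall guide:
- Inter-process communication: prefer multiprocessing.Queue or Pipe; avoid files and shared memory
- Thread safety: shared variables must be protected with a lock (threading.Lock())
- Coroutine advantage: for very high-volume I/O (such as tens of thousands of network requests), choose asyncio, which is lighter than threads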
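As a rough illustration of the asyncio point, the sketch below runs many simulated requests on a single thread. `fake_request` and `run_requests` are hypothetical names, `asyncio.sleep` stands in for real network waits, and the semaphore limit of 100 is an arbitrary assumption.

```python
import asyncio


async def fake_request(i, limiter, delay=0.01):
    """Hypothetical stand-in for one network request."""
    async with limiter:  # cap the number of requests in flight
        await asyncio.sleep(delay)  # yields to the event loop while "waiting"
        return i * 2


async def run_requests(count, max_in_flight=100):
    """Launch `count` simulated requests and collect their results."""
    limiter = asyncio.Semaphore(max_in_flight)
    tasks = [fake_request(i, limiter) for i in range(count)]
    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(run_requests(1000))
    print(len(results))
```

Each coroutine is a small object scheduled by the event loop, not an OS thread, which is what makes thousands of concurrent waits affordable.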
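Final takeaways:
- CPU killer → embrace multiprocessing and unleash multi-core performance
- I/O pro → prefer multithreading; lightweight switching, fast as the wind
- Data sharing → queues for processes, locks for threads
- Peak performance → mix processes + threads + coroutines (advanced mode)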
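For the "queues for processes" point, here is a minimal sketch of passing work to a child process and reading results back through `multiprocessing.Queue`. The names `square_worker`, `run_with_process` and `SENTINEL` are hypothetical, and using `None` as a stop marker is just one common convention.

```python
import multiprocessing

SENTINEL = None  # hypothetical marker meaning "no more tasks"


def square_worker(task_q, result_q):
    """Read numbers from task_q until the sentinel; put their squares on result_q."""
    while True:
        item = task_q.get()
        if item is SENTINEL:
            break
        result_q.put(item * item)


def run_with_process(numbers, timeout=10):
    """Send numbers to one child process and collect the squares in order."""
    task_q = multiprocessing.Queue()
    result_q = multiprocessing.Queue()
    proc = multiprocessing.Process(target=square_worker, args=(task_q, result_q))
    proc.start()
    for n in numbers:
        task_q.put(n)
    task_q.put(SENTINEL)
    # Drain the results before join() so the child is not stuck flushing its queue
    results = [result_q.get(timeout=timeout) for _ in numbers]
    proc.join()
    return results


if __name__ == "__main__":
    print(run_with_process([1, 2, 3]))
```

Items are pickled when they cross the process boundary, so only picklable objects can be sent, and the parent and child never touch the same Python object.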
Master this battlefield and watch the performance of your Python concurrency code soar!




