References:
https://blog.youkuaiyun.com/moxiaomomo/article/details/11470157
https://stackoverflow.com/questions/25557686/python-sharing-a-lock-between-processes
Scenario:
While parallelizing a program with Python multiprocessing, I used the Process API first and the Pool API later. Strictly speaking, I used PyTorch's torch.multiprocessing, which is almost identical to the standard library's multiprocessing; it just makes it more convenient to put tensors into shared memory.
The problem: the child processes created with Process ran normally, but after I reworked the code to use Pool, the pooled child processes never executed.
Cause:
An exception was raised while the tasks were being dispatched, so the child processes never ran.
The exception comes from the fact that Process and Pool share variables differently: with Process you can create shared variables with multiprocessing.Value, but with Pool you must create them through a manager, i.e. multiprocessing.Manager().Value.
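To see the difference in isolation, here is a minimal, self-contained sketch (my own illustration, not code from the post): a raw multiprocessing.Value is inherited fine by a Process, but refuses to be pickled into a Pool's task queue, while a Manager proxy goes through.

import multiprocessing as mp

def show(v):
    return v.value

if __name__ == '__main__':
    raw = mp.Value('i', 42)
    # Inheritance works: the raw Value is handed to the child at creation time.
    p = mp.Process(target=show, args=(raw,))
    p.start()
    p.join()

    manager = mp.Manager()
    managed = manager.Value('i', 42)
    with mp.Pool(2) as pool:
        try:
            pool.apply_async(show, (raw,)).get(timeout=5)
        except Exception as e:
            print('raw Value rejected:', e)  # "...shared between processes through inheritance"
        # The Manager proxy is picklable, so the pooled task runs normally.
        print('managed Value:', pool.apply_async(show, (managed,)).get(timeout=5))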
Implementation code and results
1. multiprocessing.Process:
Shared variables are passed in the form mp.Value('i', 0), and the semaphores are created with mp.Semaphore(value). This version runs normally.
import torch
import torch.nn.functional as F
import torch.multiprocessing as mp

def main():
    tensorSize = 6
    kernel_size = 5
    padding_size = (kernel_size - 1) // 2
    tensor = torch.ones((tensorSize, tensorSize))
    tensor = F.pad(tensor, (padding_size, padding_size, padding_size, padding_size))
    tensor.share_memory_()  # move the tensor into shared memory
    # Initialize the semaphores that chain consecutive rows together
    semList = []
    for i in range(tensorSize - 1):
        semList.append(mp.Semaphore(0))
    # Initialize the processes, one per row
    progressList = []
    p1 = mp.Process(target=convFirtRow, args=(mp.Value('i', kernel_size), mp.Value('i', 0),
                                              tensor, semList[0]))
    progressList.append(p1)
    for i in range(tensorSize - 2):
        p = mp.Process(target=convOneRow, args=(mp.Value('i', kernel_size), mp.Value('i', i),
                                                tensor, semList[i], semList[i + 1]))
        progressList.append(p)
    p_last = mp.Process(target=convLastRow, args=(mp.Value('i', kernel_size), mp.Value('i', tensorSize - 1),
                                                  tensor, semList[-1]))
    progressList.append(p_last)
    for p in progressList:
        p.start()
    for p in progressList:
        p.join()
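The worker functions convFirtRow, convOneRow, and convLastRow are not shown in the post. For context, here is a hypothetical sketch of what the middle-row worker could look like; only the signature is taken from the calls above, the body is my assumption:

def convOneRow(kernel_size, row, tensor, semPrev, semNext):
    # Hypothetical body. Unwrap mp.Value arguments if present (the Process
    # version passes Values, the final Pool version passes plain ints).
    k = kernel_size.value if hasattr(kernel_size, 'value') else kernel_size
    r = row.value if hasattr(row, 'value') else row
    semPrev.acquire()                          # wait for the previous row's worker
    tensor[r] = tensor[r:r + k].sum(dim=0)     # stand-in for the real row convolution
    semNext.release()                          # unblock the next row's worker

Because Process children inherit their arguments at creation time, the raw mp.Value and mp.Semaphore objects travel to the workers without ever being pickled, which is why this version runs.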
2. multiprocessing.Pool:
After switching from Process to Pool, if the semaphores and shared variables are still created with multiprocessing.Semaphore() and multiprocessing.Value(), an exception is raised while the pool dispatches the tasks, and the child processes never execute.
import traceback

import torch
import torch.nn.functional as F
import torch.multiprocessing as mp

tensorSize = 6
kernel_size = 5
padding_size = (kernel_size - 1) // 2
maxProcessingNum = 3
tensor = torch.ones((tensorSize, tensorSize))
tensor = F.pad(tensor, (padding_size, padding_size, padding_size, padding_size))
tensor.share_memory_()
# Initialize the semaphores -- still the plain multiprocessing kind, which is the problem
semList = []
for i in range(tensorSize - 1):
    semList.append(mp.Semaphore(0))
# Create the process pool
with mp.Pool(processes=maxProcessingNum) as pool:
    try:
        result = pool.apply_async(convFirtRow, (mp.Value('i', kernel_size), mp.Value('i', 0),
                                                tensor, semList[0]))
        print(result.get())
        for i in range(1, tensorSize - 1):
            pool.apply_async(convOneRow, (mp.Value('i', kernel_size), mp.Value('i', i),
                                          tensor, semList[i - 1], semList[i]))
        pool.apply_async(convLastRow, (mp.Value('i', kernel_size), mp.Value('i', tensorSize - 1),
                                       tensor, semList[-1]))
        pool.close()
        pool.join()
    except Exception:
        traceback.print_exc()
print(tensor)
(The original post shows a screenshot of the resulting traceback here: a RuntimeError stating that the objects should only be shared between processes through inheritance.)
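The root cause is that Pool hands task arguments to its workers through a pickled task queue, and the raw synchronization objects deliberately refuse to be pickled. A quick standalone check (my own illustration, not from the post):

import pickle
import multiprocessing as mp

for obj in (mp.Value('i', 0), mp.Semaphore(0)):
    try:
        pickle.dumps(obj)
    except Exception as e:
        print(type(obj).__name__, '->', e)
# On CPython this prints RuntimeErrors of the form
# "... objects should only be shared between processes through inheritance"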
Solution
Replace multiprocessing.Semaphore() and multiprocessing.Value() with multiprocessing.Manager().Semaphore() and multiprocessing.Manager().Value(). In fact, plain constants, strings, and other picklable values can simply be passed directly, as in the code below:
import traceback

import torch
import torch.nn.functional as F
import torch.multiprocessing as mp

tensorSize = 6
kernel_size = 5
padding_size = (kernel_size - 1) // 2
maxProcessingNum = 3
tensor = torch.ones((tensorSize, tensorSize))
tensor = F.pad(tensor, (padding_size, padding_size, padding_size, padding_size))
tensor.share_memory_()
# Initialize the semaphores through a single Manager: its proxy objects can be
# pickled into the pool's task queue. (Creating a new Manager() inside the loop
# would spawn one server process per semaphore, so keep one Manager for all.)
manager = mp.Manager()
semList = []
for i in range(tensorSize - 1):
    semList.append(manager.Semaphore(0))
# Create the process pool
with mp.Pool(processes=maxProcessingNum) as pool:
    try:
        result = pool.apply_async(convFirtRow, (kernel_size, 0,
                                                tensor, semList[0]))
        print(result.get())
        for i in range(1, tensorSize - 1):
            pool.apply_async(convOneRow, (kernel_size, i,
                                          tensor, semList[i - 1], semList[i]))
        pool.apply_async(convLastRow, (kernel_size, tensorSize - 1,
                                       tensor, semList[-1]))
        pool.close()
        pool.join()
    except Exception:
        traceback.print_exc()
print(tensor)
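Note that in this final version the kernel size and row index are passed as plain ints: anything picklable can be a Pool argument directly. If a worker also needs to write a shared scalar back, a Manager-backed Value passes through the pool the same way. A minimal sketch with a hypothetical increment worker (unlike mp.Value, a Manager().Value proxy has no built-in lock, so take a Manager lock alongside it):

import multiprocessing as mp

def bump(counter, lock):
    # Both proxies pickle fine, so they can be pool-task arguments.
    with lock:
        counter.value += 1

if __name__ == '__main__':
    manager = mp.Manager()
    counter = manager.Value('i', 0)
    lock = manager.Lock()
    with mp.Pool(3) as pool:
        results = [pool.apply_async(bump, (counter, lock)) for _ in range(10)]
        for r in results:
            r.get(timeout=5)
    print(counter.value)  # 10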