CUDA Stream Optimization Notes


Last modified: 2024-06-13 21:13:22


License: CC 4.0 BY-SA


First published: 2020-05-06 23:03:34

How the Multi-Process Service (MPS) works:

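A GPU can execute work from only one context at a time. When several processes submit work to the same GPU, each process has its own context, so only one process's work is running at any given moment and work from different processes cannot run in parallel.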

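The MPS service: work from multiple processes is first submitted to the MPS server process. The server then submits all of that work to the GPU using a single shared context but different streams, which allows the work to run in parallel.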

Drawback: submission latency increases, because every task has to pass through the MPS server process acting as a "proxy".
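The effect MPS aims for (one context, several streams, work that may overlap) can be imitated inside a single process. The following is only a minimal sketch of that end result, not of MPS itself: the kernel `scale`, the helper `run_streams_demo`, the stream count, and the buffer size are all hypothetical names and values chosen for illustration. It assumes a machine with at least one CUDA device. Whether the kernels actually overlap depends on the hardware and on how many resources each kernel uses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element of a buffer by `factor`.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches one kernel per stream inside a single context.
// Returns 0 on success and 1 on any CUDA error.
int run_streams_demo() {
    const int kStreams = 4;   // illustrative value
    const int n = 1 << 16;    // illustrative value
    cudaStream_t streams[kStreams];
    float* bufs[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        if (cudaStreamCreate(&streams[s]) != cudaSuccess) return 1;
        if (cudaMalloc(&bufs[s], n * sizeof(float)) != cudaSuccess) return 1;
        // Initialize the buffer asynchronously on the same stream as the kernel.
        if (cudaMemsetAsync(bufs[s], 0, n * sizeof(float), streams[s]) != cudaSuccess) return 1;
    }

    // Work in different streams has no ordering dependency,
    // so the GPU is free to run these kernels concurrently.
    for (int s = 0; s < kStreams; ++s) {
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(bufs[s], n, 2.0f);
    }
    if (cudaGetLastError() != cudaSuccess) return 1;

    // Wait for all streams to finish before releasing resources.
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    for (int s = 0; s < kStreams; ++s) {
        cudaFree(bufs[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}

int main() {
    int rc = run_streams_demo();
    printf("run_streams_demo returned %d\n", rc);
    return rc;
}
```

With MPS, the client processes need no code changes; the daemon is typically started with `nvidia-cuda-mps-control -d` before the client processes are launched.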

What a CUDA context contains:

The CUDA API exposes the features of a stateful library: two consecutive calls are related to one another. In short, the context is its state.
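To make "the context is its state" concrete, the sketch below asks the driver API which context is current on the calling thread, first before any runtime call and then after one. It assumes a machine with at least one CUDA device and a build that links the driver library (for example `nvcc demo.cu -lcuda`). The helper name `check_context_state` is hypothetical.

```cuda
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

// Shows that a runtime API call changes hidden per-thread state:
// the current context. Returns 0 if a context is current afterwards.
int check_context_state() {
    CUcontext before = nullptr;
    CUcontext after = nullptr;

    // The driver API must be initialized before any other driver call.
    if (cuInit(0) != CUDA_SUCCESS) return 1;

    // No runtime call has been made on this thread yet.
    if (cuCtxGetCurrent(&before) != CUDA_SUCCESS) return 1;

    // cudaFree(0) is a common idiom to force runtime initialization,
    // which creates a context and makes it current on this thread.
    if (cudaFree(0) != cudaSuccess) return 1;

    if (cuCtxGetCurrent(&after) != CUDA_SUCCESS) return 1;

    printf("current context before runtime call: %p\n", (void*)before);
    printf("current context after runtime call:  %p\n", (void*)after);

    return after != nullptr ? 0 : 2;
}

int main() {
    return check_context_state();
}
```

Later runtime calls on the same thread, such as `cudaMalloc` or a kernel launch, implicitly operate on whichever context is current, which is why consecutive calls depend on one another.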

The runtime API is a wrapper/helper of the driver API. You can see in the driver API that the context is explicitly made available, and you can have a stack of contexts for convenience. There is one specific context which is shared between driver and runtime API (See