Multi-Process Service(MPS)原理:
一个GPU卡上同时只能执行一个context;因此多进程同时往一个GPU卡上提交任务时,同时只能有一个任务跑起来,没法多任务并行;
MPS服务:多进程提交的任务先提交至MPS服务进程,该进程会把所有任务使用同一个context但不同的stream, 提交给该块GPU卡,使得可以多任务并行;
缺点:增大了任务提交的延迟,因为要多经过MPS服务进程这个“代理”;
CUDA Context里面有什么:
The cuda API exposes features of a stateful library: two consecutive calls relate one-another. In short, the context is its state.
The runtime API is a wrapper/helper of the driver API. You can see in the driver API that the context is explicitly made available, and you can have a stack of contexts for convenience. There is one specific context which is shared between driver and runtime API (See

最低0.47元/天 解锁文章

451





