nvidia-cuda-mps-control: Introduction to the MPS Parameters

This article covers NVIDIA CUDA MPS (Multi-Process Service) in detail: how it works, how to start it, including launching the control daemon and the front-end management user interface from the command line, plus the relevant environment variables and files, and key operations such as starting a server and listing connected clients. MPS is an important tool in CUDA programming for transparently sharing a single GPU among concurrent processes.


1. Name

nvidia-cuda-mps-control - NVIDIA CUDA Multi Process Service management program
2. Synopsis

nvidia-cuda-mps-control -d
3. Description

       MPS is a runtime service designed to let multiple MPI processes using CUDA run
       concurrently on a single GPU in a way that's transparent to the MPI program. A CUDA
       program runs in MPS mode if the MPS control daemon is running on the system.

       When CUDA is first initialized in a program, the CUDA driver attempts to connect to the
       MPS control daemon. If the connection attempt fails, the program continues to run as it
       normally would without MPS. If, however, the connection attempt to the control daemon
       succeeds, the CUDA driver then requests the daemon to start an MPS server on its behalf.
       If there's an MPS server already running, and the user id of that server process matches
       that of the requesting client process, the control daemon simply notifies the client
       process of it, which then proceeds to connect to the server. If there's no MPS server
       already running on the system, the control daemon launches an MPS server with the same
       user id (UID) as that of the requesting client process. If there's an MPS server already
       running, but with a different user id than that of the client process, the control daemon
       requests the existing server to shut down as soon as all its clients are done. Once the
       existing server has terminated, the control daemon launches a new server with the same
       user id as that of the queued client process.

       The MPS server creates the shared GPU context, manages its clients, and issues work to the
       GPU on behalf of its clients. An MPS server can support up to 16 client CUDA contexts at a
       time. MPS is transparent to CUDA programs, with all the complexity of communication
       between the client process, the server and the control daemon hidden within the driver
       binaries.

       Currently, CUDA MPS is available on 64-bit Linux only; it requires a device that supports
       Unified Virtual Addressing (UVA) and has compute capability SM 3.5 or higher. Applications
       requiring pre-CUDA 4.0 APIs are not supported under CUDA MPS. MPS is also not supported
       on multi-GPU configurations. Please use CUDA_VISIBLE_DEVICES when starting the control
       daemon to limit visibility to a single device.
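Following the single-device note above, a minimal startup sketch (device index 0 is an assumption; adjust for your system, and note that both steps typically require root):

```shell
# MPS does not support multi-GPU configurations:
# expose exactly one device to the control daemon.
export CUDA_VISIBLE_DEVICES=0

# Optional but common: put the device in exclusive-process compute mode,
# so all client CUDA work is routed through the single MPS server.
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Start the MPS control daemon in the background.
nvidia-cuda-mps-control -d
```

CUDA programs launched afterwards with the same `CUDA_VISIBLE_DEVICES` will then connect to the daemon automatically, as the description above explains.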
4. Options

   -d
        Start the MPS control daemon, assuming the user has enough privilege (e.g. root).

   -h, --help
       Print a help message.

   <no arguments>
       Start the front-end management user interface to the MPS control daemon, which needs to be
       started first. The front-end UI keeps reading commands from stdin until EOF.  Commands are
       separated by the newline character. If an invalid command is issued and rejected, an error
       message will be printed to stdout. The  exit  status  of  the  front-end  UI  is  zero  if
       communication with the daemon is successful. A non-zero value is returned if the daemon is
       not found or connection to the daemon is broken unexpectedly. See the "quit" command below
       for more information about the exit status.
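Because the front-end UI reads newline-separated commands from stdin until EOF, it can also be driven non-interactively from scripts; a small sketch, assuming the control daemon is already running:

```shell
# Pipe a single command to the front-end UI and check its exit status.
echo get_server_list | nvidia-cuda-mps-control
if [ $? -ne 0 ]; then
    echo "MPS control daemon not reachable" >&2
fi
```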

       Commands supported by the MPS control daemon:

       get_server_list
              Print out a list of PIDs of all MPS servers.

       start_server -uid UID
              Start a new MPS server for the specified user (UID).

       shutdown_server PID [-f]
              Shutdown  the  MPS  server  with  given PID. The MPS server will not accept any new
              client connections and it exits when all current clients disconnect. -f  is  forced
              immediate  shutdown.  If  a  client  launches  a faulty kernel that runs forever, a
              forced shutdown of the MPS server may be required, since the MPS server creates and
              issues GPU work on behalf of its clients.

       get_client_list PID
              Print out a list of PIDs of all clients connected to the MPS server with given PID.

       quit [-t TIMEOUT]
              Shutdown the MPS control daemon process and all MPS servers. The MPS control daemon
              stops accepting new clients while waiting for current MPS servers and  MPS  clients
              to  finish. If TIMEOUT is specified (in seconds), the daemon will force MPS servers
              to shutdown if they are still running after TIMEOUT seconds.

              This command is synchronous. The front-end UI waits for  the  daemon  to  shutdown,
              then  returns the daemon's exit status. The exit status is zero iff all MPS servers
              have exited gracefully.
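Putting the commands above together, a hypothetical management session driven over stdin (the UID 1000 and the server PID 12345 are placeholders, not real values):

```shell
# Start a server for a user and list running servers,
# feeding several commands at once via a here-document.
nvidia-cuda-mps-control <<'EOF'
start_server -uid 1000
get_server_list
EOF

# Inspect the clients of one server (12345: a PID from get_server_list).
echo "get_client_list 12345" | nvidia-cuda-mps-control

# Shut everything down, forcing servers to exit after 30 seconds.
echo "quit -t 30" | nvidia-cuda-mps-control
```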
5. Environment

       CUDA_MPS_PIPE_DIRECTORY
              Specify the directory that contains the named pipes used for communication among
              MPS control, MPS server, and MPS clients. The value of this environment variable
              should be consistent in the MPS control daemon and all MPS client processes.
              The default directory is /tmp/nvidia-mps.

       CUDA_MPS_LOG_DIRECTORY
              Specify the directory that contains the MPS log files. This variable is used by
              the MPS control daemon only. The default directory is /var/log/nvidia-mps.
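Since the pipe directory must be consistent between the daemon and every client, a sketch that relocates both directories (the per-user paths and the `my_cuda_app` binary are illustrative, not defaults):

```shell
# Same pipe directory for the control daemon and all client processes.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps-$UID/pipe
# Log directory; read by the control daemon only.
export CUDA_MPS_LOG_DIRECTORY=/tmp/mps-$UID/log

mkdir -p "$CUDA_MPS_PIPE_DIRECTORY" "$CUDA_MPS_LOG_DIRECTORY"
nvidia-cuda-mps-control -d

# Clients must see the same pipe directory to find the daemon:
CUDA_MPS_PIPE_DIRECTORY=/tmp/mps-$UID/pipe ./my_cuda_app
```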
6. Files

Log files created by the MPS control daemon in the specified directory

       control.log
              Record startup and shutdown of MPS control daemon, user commands issued with  their
              results, and status of MPS servers.

       server.log
              Record startup and shutdown of MPS servers, and status of MPS clients.
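When debugging, the two logs can be inspected directly; the paths below assume the default CUDA_MPS_LOG_DIRECTORY is unchanged:

```shell
# Daemon startup/shutdown, user commands and their results, server status.
tail -n 50 /var/log/nvidia-mps/control.log

# Server startup/shutdown and client status.
tail -n 50 /var/log/nvidia-mps/server.log
```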

 
