A Deep Dive into the Core Components and Workflow of NVIDIA Container Toolkit

1. Introduction

On a node where NVIDIA Container Toolkit is installed, let's first start an ordinary container, exec into it, and run nvidia-smi; the command cannot be found:

# docker run -d --name normal-container nginx:latest
992ed0b4cb7134b7cb528124b4ebed193215f0987ed288a582fb088486a9b67a

# docker exec -ti normal-container bash
root@992ed0b4cb71:/# nvidia-smi
bash: nvidia-smi: command not found

Now create another container with the same command plus the --gpus all flag, exec into it, and run nvidia-smi again; this time it works:

# docker run -d --name nvidia-container --gpus all nginx:latest
81281dc9dc0a7d3c9de5e90dffdfa593975976c5a2a07c7a5ebddfd4e704bbe3

# docker exec -ti nvidia-container bash
root@81281dc9dc0a:/# nvidia-smi
Sun May 18 12:49:09 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.02              Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   43C    P8              8W /  165W |     835MiB /  16380MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        24      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

Why does simply adding --gpus all to the run command make nvidia-smi appear inside nvidia-container? Where does that executable actually come from, and which NVIDIA Container Toolkit components are involved? With these questions in mind, this article tries to build a deeper understanding of NVIDIA Container Toolkit.

2. Components of NVIDIA Container Toolkit

Still on the node with NVIDIA Container Toolkit installed, type nvidia-c and press Tab to complete; you get the following output:

# nvidia-c
nvidia-cdi-hook                nvidia-container-cli           nvidia-container-runtime       nvidia-container-runtime-hook  nvidia-container-toolkit       nvidia-ctk

These are the executable components of NVIDIA Container Toolkit. Their source code lives in two repositories:

  • https://github.com/NVIDIA/nvidia-container-toolkit: written in Go
  • https://github.com/NVIDIA/libnvidia-container: written in C

To get familiar with an open-source GitHub project more quickly, besides reading the project README and the official documentation, you can also try the AI tool DeepWiki. The two repositories above correspond to:

  • https://deepwiki.com/NVIDIA/nvidia-container-toolkit
  • https://deepwiki.com/NVIDIA/libnvidia-container

The individual components and their basic functions:

  • nvidia-cdi-hook: a Go executable; the code lives in https://github.com/NVIDIA/nvidia-container-toolkit, main entry point: https://github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/main.go. It implements the hooks used in CDI (Container Device Interface) environments; if the environment does not support CDI, nvidia-container-runtime-hook is used instead.
  • nvidia-container-cli: a C executable; the code lives in https://github.com/NVIDIA/libnvidia-container, main entry point: https://github.com/NVIDIA/libnvidia-container/src/cli/main.c. The core command-line tool, responsible for injecting (mounting) the device driver libraries, the nvidia-smi binary, and related files into the container.
  • nvidia-container-runtime: a Go executable; the code lives in https://github.com/NVIDIA/nvidia-container-toolkit, main entry point: https://github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime. A wrapper around the container runtime: it intercepts and extends the container-creation flow, injects a prestart hook (for example nvidia-container-runtime-hook) into the OCI spec, and then calls the underlying runtime such as runc to create the container.
  • nvidia-container-runtime-hook: a Go executable; the code lives in https://github.com/NVIDIA/nvidia-container-toolkit, main entry point: https://github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime-hook. The container's prestart hook, executed before the container starts; it mainly assembles arguments and calls nvidia-container-cli.
  • nvidia-container-toolkit: a Go executable; the code lives in https://github.com/NVIDIA/nvidia-container-toolkit, main entry point: https://github.com/NVIDIA/nvidia-container-toolkit/tools/container/nvidia-toolkit/run.go. A helper for installing nvidia-container-runtime (for example, writing the Docker configuration file and restarting Docker).
  • nvidia-ctk: a Go executable; the code lives in https://github.com/NVIDIA/nvidia-container-toolkit, main entry point: https://github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk. Provides hook, runtime, cdi, config and other subcommands, used in CDI, CSV, graphics and similar scenarios.

2.1 Data flow diagrams

In a docker + runc environment, taking the earlier command docker run -d --name normal-container nginx:latest as an example, the data flow when creating the container looks like this:

[Figure: data flow when creating an ordinary container]

My environment does not support CDI, so when --gpus all is added and a GPU container is created, only three NVIDIA Container Toolkit components are involved: nvidia-container-runtime, nvidia-container-runtime-hook, and nvidia-container-cli. The data flow becomes:

[Figure: data flow when creating a GPU container with --gpus all]

3. Source code analysis of the key components

With the overall flow in mind, let's now verify it from the source code.

The code below is based on https://github.com/NVIDIA/nvidia-container-toolkit@1.17.4.

3.1 nvidia-container-runtime

  • main entry point
// nvidia-container-toolkit/cmd/nvidia-container-runtime/main.go
func main() {
    r := runtime.New()
    err := r.Run(os.Args)
    if err != nil {
        os.Exit(1)
    }
}
  • The Run function

Run does not contain much logic itself: it parses the configuration file, resolves the nvidia-ctk and nvidia-container-runtime-hook settings, builds the runtime object, and calls its Exec method:

// nvidia-container-toolkit/internal/runtime/runtime.go
func (r rt) Run(argv []string) (rerr error) {
    ...
    // load the configuration file
    cfg, err := config.GetConfig()
    ...
    // resolve the nvidia-container-runtime-hook path
    cfg.NVIDIAContainerRuntimeHookConfig.Path = config.ResolveNVIDIAContainerRuntimeHookPath(&logger.NullLogger{}, cfg.NVIDIAContainerRuntimeHookConfig.Path)
    ...
    driver := root.New(
        root.WithLogger(r.logger),
        root.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root),
    )

    r.logger.Tracef("Command line arguments: %v", argv)
    runtime, err := newNVIDIAContainerRuntime(r.logger, cfg, argv, driver)
    if err != nil {
        return fmt.Errorf("failed to create NVIDIA Container Runtime: %v", err)
    }

    if printVersion {
        fmt.Print("\n")
    }
    return runtime.Exec(argv)
}
  • Parsing the configuration file

The configuration path is resolved by first reading the XDG_CONFIG_HOME environment variable: if it is non-empty, the configuration file is $XDG_CONFIG_HOME/nvidia-container-runtime/config.toml; otherwise it defaults to /etc/nvidia-container-runtime/config.toml:

// nvidia-container-toolkit/internal/config/config.go
func GetConfig() (*Config, error) {
    cfg, err := New(
        WithConfigFile(GetConfigFilePath()),
    )
    if err != nil {
        return nil, err
    }

    return cfg.Config()
}

// nvidia-container-toolkit/internal/config/config.go
func GetConfigFilePath() string {
    // configOverride = XDG_CONFIG_HOME
    if XDGConfigDir := os.Getenv(configOverride); len(XDGConfigDir) != 0 {
        return filepath.Join(XDGConfigDir, configFilePath) // configFilePath = nvidia-container-runtime/config.toml
    }

    return filepath.Join("/etc", configFilePath)
}

// nvidia-container-toolkit/internal/config/toml.go
func (t *Toml) Config() (*Config, error) {
    cfg, err := t.configNoOverrides()
    if err != nil {
        return nil, err
    }
    if err := cfg.assertValid(); err != nil {
        return nil, err
    }
    return cfg, nil
}

// nvidia-container-toolkit/internal/config/toml.go
func (t *Toml) configNoOverrides() (*Config, error) {
    cfg, err := GetDefault()
    if err != nil {
        return nil, err
    }
    if t == nil {
        return cfg, nil
    }
    if err := t.Unmarshal(cfg); err != nil {
        return nil, fmt.Errorf("failed to unmarshal config: %v", err)
    }
    return cfg, nil
}

// nvidia-container-toolkit/internal/config/config.go
func GetDefault() (*Config, error) {
    d := Config{
        AcceptEnvvarUnprivileged:    true,
        SupportedDriverCapabilities: image.SupportedDriverCapabilities.String(),
        NVIDIAContainerCLIConfig: ContainerCLIConfig{
            LoadKmods: true,
            Ldconfig:  getLdConfigPath(),
            User:      getUserGroup(),
        },
        NVIDIACTKConfig: CTKConfig{
            Path: nvidiaCTKExecutable, // nvidiaCTKExecutable = nvidia-ctk
        },
        NVIDIAContainerRuntimeConfig: RuntimeConfig{
            DebugFilePath: "/dev/null",
            LogLevel:      "info",
            Runtimes:      []string{"docker-runc", "runc", "crun"},
            Mode:          "auto",
            Modes: modesConfig{
                CSV: csvModeConfig{
                    MountSpecPath: "/etc/nvidia-container-runtime/host-files-for-container.d",
                },
                CDI: cdiModeConfig{
                    DefaultKind:        "nvidia.com/gpu",
                    AnnotationPrefixes: []string{cdi.AnnotationPrefix}, // cdi.AnnotationPrefix = cdi.k8s.io/
                    SpecDirs:           cdi.DefaultSpecDirs,
                },
            },
        },
        NVIDIAContainerRuntimeHookConfig: RuntimeHookConfig{
            Path: NVIDIAContainerRuntimeHookExecutable,
        },
    }
    return &d, nil
}
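
The pattern in configNoOverrides is "defaults first, then overlay the file": GetDefault populates every field, and Unmarshal only overwrites keys that actually appear in the TOML file. As a small stand-alone illustration of that pattern (using github.com/pelletier/go-toml/v2 purely for demonstration; the toolkit wraps its own Toml type):

package main

import (
    "fmt"

    "github.com/pelletier/go-toml/v2"
)

type cliConfig struct {
    LoadKmods bool   `toml:"load-kmods"`
    Ldconfig  string `toml:"ldconfig"`
}

func main() {
    // Start from defaults, as GetDefault does.
    cfg := cliConfig{LoadKmods: true, Ldconfig: "@/sbin/ldconfig"}

    // Overlay the file contents; only keys present in the TOML override the defaults.
    data := []byte(`ldconfig = "@/sbin/ldconfig.real"`)
    if err := toml.Unmarshal(data, &cfg); err != nil {
        panic(err)
    }

    fmt.Printf("%+v\n", cfg) // {LoadKmods:true Ldconfig:@/sbin/ldconfig.real}
}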

The default configuration file looks like this:

$ cat /etc/nvidia-container-runtime/config.toml
#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig.real"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]

[nvidia-container-runtime.modes]

[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false

[nvidia-ctk]
path = "nvidia-ctk"
  • Resolving nvidia-container-runtime-hook

The hook binary is first looked up under $PATH, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin and /bin; if it cannot be found there, the default /usr/bin/nvidia-container-runtime-hook is used:

func ResolveNVIDIAContainerRuntimeHookPath(logger logger.Interface, nvidiaContainerRuntimeHookPath string) string {
    return resolveWithDefault(
        logger,
        "NVIDIA Container Runtime Hook",
        nvidiaContainerRuntimeHookPath, // read from the config file; defaults to nvidia-container-runtime-hook
        nvidiaContainerRuntimeHookDefaultPath, // nvidiaContainerRuntimeHookDefaultPath = /usr/bin/nvidia-container-runtime-hook
    )
}
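
The lookup itself happens inside resolveWithDefault, which is not shown above. As a rough, simplified sketch of the idea (resolveExecutable below is a hypothetical stand-in, not the toolkit's implementation):

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

// resolveExecutable tries the configured name against $PATH first, then a
// list of well-known system directories, and finally falls back to a
// hard-coded default path.
func resolveExecutable(name, defaultPath string) string {
    // exec.LookPath covers the $PATH case (and absolute paths).
    if p, err := exec.LookPath(name); err == nil {
        return p
    }
    // Probe the standard system directories next.
    for _, dir := range []string{"/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"} {
        candidate := filepath.Join(dir, name)
        if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
            return candidate
        }
    }
    return defaultPath
}

func main() {
    fmt.Println(resolveExecutable("nvidia-container-runtime-hook",
        "/usr/bin/nvidia-container-runtime-hook"))
}
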
  • Initializing the driver and the runtime

The driver object is created with cfg.NVIDIAContainerCLIConfig.Root, which defaults to "". The key function for building the runtime, newNVIDIAContainerRuntime, first looks up a lower-level runtime (my machine only has runc, so that is what it finds). It then checks whether the command line contains a create subcommand: if not, the invocation is passed straight through to the underlying runc; otherwise a wrapper is built with NewModifyingRuntimeWrapper and that wrapper's Exec method runs the command.

// nvidia-container-toolkit/internal/runtime/runtime.go
func (r rt) Run(argv []string) (rerr error) {
    ...
    driver := root.New(
        root.WithLogger(r.logger),
        root.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root), // read from the config file; the entry is commented out by default, so this is ""
    )
    ...
    runtime, err := newNVIDIAContainerRuntime(r.logger, cfg, argv, driver)
    ...
}

// nvidia-container-toolkit/internal/runtime/runtime_factory.go
func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv []string, driver *root.Driver) (oci.Runtime, error) {
    // Look up the lower-level runtime: search $PATH, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin and /bin for "docker-runc", "runc", "crun".
    // If one is found, it is wrapped and returned as a pathRuntime object.
    lowLevelRuntime, err := oci.NewLowLevelRuntime(logger, cfg.NVIDIAContainerRuntimeConfig.Runtimes) // read from the config file; defaults to ["docker-runc", "runc", "crun"]
    if err != nil {
        return nil, fmt.Errorf("error constructing low-level runtime: %v", err)
    }

    logger.Tracef("Using low-level runtime %v", lowLevelRuntime.String())
    // Check for a create subcommand: an argument equal to "create" counts only if it is not the value of a preceding -b/--bundle flag.
    if !oci.HasCreateSubcommand(argv) {
        logger.Tracef("Skipping modifier for non-create subcommand")
        return lowLevelRuntime, nil
    }

    ociSpec, err := oci.NewSpec(logger, argv)
    if err != nil {
        return nil, fmt.Errorf("error constructing OCI specification: %v", err)
    }

    specModifier, err := newSpecModifier(logger, cfg, ociSpec, driver)
    if err != nil {
        return nil, fmt.Errorf("failed to construct OCI spec modifier: %v", err)
    }

    // Create the wrapping runtime with the specified modifier.
    r := oci.NewModifyingRuntimeWrapper(
        logger,
        lowLevelRuntime,
        ociSpec,
        specModifier,
    )

    return r, nil
}
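
For reference, the create-subcommand check described above boils down to scanning the arguments while skipping the value of any -b/--bundle flag. A minimal sketch (not the verbatim upstream code):

package main

import "fmt"

// hasCreateSubcommand reports whether the argument list contains a "create"
// subcommand. An argument equal to "create" does not count when it is the
// value of a -b/--bundle flag that immediately precedes it.
func hasCreateSubcommand(args []string) bool {
    previousWasBundle := false
    for _, arg := range args {
        if !previousWasBundle && (arg == "-b" || arg == "--bundle") {
            previousWasBundle = true
            continue
        }
        if !previousWasBundle && arg == "create" {
            return true
        }
        previousWasBundle = false
    }
    return false
}

func main() {
    fmt.Println(hasCreateSubcommand([]string{"--root", "/run/runc", "create", "--bundle", "/tmp/bundle", "id"})) // true
    fmt.Println(hasCreateSubcommand([]string{"state", "id"}))                                                    // false
}
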
  • The wrapper's Exec

At this point the wrapper's Exec first applies the modifications to the OCI spec and only then forwards the create command to the underlying runc.

// nvidia-container-toolkit/internal/oci/runtime_modifier.go
func (r *modifyingRuntimeWrapper) Exec(args []string) error {
    if HasCreateSubcommand(args) {
        r.logger.Debugf("Create command detected; applying OCI specification modifications")
        err := r.modify()
        if err != nil {
            return fmt.Errorf("could not apply required modification to OCI specification: %w", err)
        }
        r.logger.Debugf("Applied required modification to OCI specification")
    }

    r.logger.Debugf("Forwarding command to runtime %v", r.runtime.String())
    return r.runtime.Exec(args)
}

// nvidia-container-toolkit/internal/oci/runtime_modifier.go
func (r *modifyingRuntimeWrapper) modify() error {
    _, err := r.ociSpec.Load()
    if err != nil {
        return fmt.Errorf("error loading OCI specification for modification: %v", err)
    }

    err = r.ociSpec.Modify(r.modifier)
    if err != nil {
        return fmt.Errorf("error modifying OCI spec: %v", err)
    }

    err = r.ociSpec.Flush()
    if err != nil {
        return fmt.Errorf("error writing modified OCI specification: %v", err)
    }
    return nil
}
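
In the legacy (non-CDI) path, the net effect of this modification is what the component overview already described: a prestart hook pointing at nvidia-container-runtime-hook is appended to the container's config.json. A hedged illustration of that resulting spec change using the runtime-spec Go types (this is not the toolkit's own modifier code):

package main

import (
    "encoding/json"
    "fmt"

    specs "github.com/opencontainers/runtime-spec/specs-go"
)

// addPrestartHook appends a prestart hook entry to an OCI spec, which is
// roughly the change the legacy modifier makes to config.json.
func addPrestartHook(spec *specs.Spec, hookPath string) {
    if spec.Hooks == nil {
        spec.Hooks = &specs.Hooks{}
    }
    spec.Hooks.Prestart = append(spec.Hooks.Prestart, specs.Hook{
        Path: hookPath,
        Args: []string{hookPath, "prestart"},
    })
}

func main() {
    spec := &specs.Spec{Version: specs.Version}
    addPrestartHook(spec, "/usr/bin/nvidia-container-runtime-hook")

    out, _ := json.MarshalIndent(spec.Hooks, "", "  ")
    fmt.Println(string(out))
}
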
  • OCI Spec Modify

Modify first builds a CUDA image object via NewCUDAImageFromSpec; the WithEnv and WithMounts options used there show that the injected information is derived only from environment variables and mount entries. newModeModifier is then called with the mode, cfg, ociSpec and image to build the mode-specific modifier, and depending on what the environment supports several modifiers may be returned. My environment ends up with three: the "mode" modifier (here the StableRuntimeModifier), the "graphics" modifier (GraphicsModifier) and the "feature-gated" modifier (FeatureGatedModifier), so the spec passes through all three Modify calls.
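
Conceptually, applying several modifiers just means running their Modify methods in sequence over the same spec. A small self-contained sketch of that idea (all types below are hypothetical stand-ins, not the toolkit's own):

package main

import "fmt"

// Spec stands in for the OCI runtime spec; the real code works on the
// runtime-spec types.
type Spec struct {
    Hooks []string
}

// SpecModifier mirrors the idea behind oci.SpecModifier: something that
// edits the spec in place.
type SpecModifier interface {
    Modify(*Spec) error
}

// merge applies each modifier in order, which is how the mode modifier, the
// graphics modifier and the feature-gated modifier can all act on the same
// spec during a single create call.
type merge []SpecModifier

func (m merge) Modify(s *Spec) error {
    for _, mod := range m {
        if err := mod.Modify(s); err != nil {
            return err
        }
    }
    return nil
}

// hookModifier is a toy modifier that just records a hook name.
type hookModifier string

func (h hookModifier) Modify(s *Spec) error {
    s.Hooks = append(s.Hooks, string(h))
    return nil
}

func main() {
    spec := &Spec{}
    modifiers := merge{hookModifier("mode"), hookModifier("graphics"), hookModifier("feature-gated")}
    if err := modifiers.Modify(spec); err != nil {
        panic(err)
    }
    fmt.Println(spec.Hooks) // [mode graphics feature-gated]
}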

// nvidia-container-toolkit/internal/runtime/runtime_factory.go
func newSpecModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spec, driver *root.Driver) (oci.SpecModifier, error) {
    ...
}