1. Introduction
On a node with the NVIDIA Container Toolkit installed, we first start an ordinary container, enter it, and try to run nvidia-smi. The command is not found:
# docker run -d --name normal-container nginx:latest
992ed0b4cb7134b7cb528124b4ebed193215f0987ed288a582fb088486a9b67a
# docker exec -ti normal-container bash
root@992ed0b4cb71:/# nvidia-smi
bash: nvidia-smi: command not found
Adding the --gpus all flag to the same command creates another container in which nvidia-smi runs normally:
# docker run -d --name nvidia-container --gpus all nginx:latest
81281dc9dc0a7d3c9de5e90dffdfa593975976c5a2a07c7a5ebddfd4e704bbe3
# docker exec -ti nvidia-container bash
root@81281dc9dc0a:/# nvidia-smi
Sun May 18 12:49:09 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.02 Driver Version: 560.94 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 Ti On | 00000000:01:00.0 On | N/A |
| 0% 43C P8 8W / 165W | 835MiB / 16380MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 24 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
Why does adding a single --gpus all flag to the create command make nvidia-smi available inside nvidia-container? Where does this executable come from, and which NVIDIA Container Toolkit components are involved? With these questions in mind, this article tries to build a deeper understanding of the NVIDIA Container Toolkit.
2. Components of the NVIDIA Container Toolkit
Still on the node with the NVIDIA Container Toolkit installed, type nvidia-c and press Tab to complete. The output is:
# nvidia-c
nvidia-cdi-hook nvidia-container-cli nvidia-container-runtime nvidia-container-runtime-hook nvidia-container-toolkit nvidia-ctk
These are the executable components of the NVIDIA Container Toolkit. Their source code lives in two repositories:
- https://github.com/NVIDIA/nvidia-container-toolkit (written in Go)
- https://github.com/NVIDIA/libnvidia-container (written in C)
To get familiar with a GitHub project quickly, besides reading its README and official documentation, you can also try the AI tool DeepWiki. The two repositories above correspond to:
- https://deepwiki.com/NVIDIA/nvidia-container-toolkit
- https://deepwiki.com/NVIDIA/libnvidia-container
The individual components and their basic functions are:
- nvidia-cdi-hook: Go executable. Code lives in https://github.com/NVIDIA/nvidia-container-toolkit; main entry: https://github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/main.go. The hook used in CDI (Container Device Interface) environments; if the environment does not support CDI, nvidia-container-runtime-hook is used instead.
- nvidia-container-cli: C executable. Code lives in https://github.com/NVIDIA/libnvidia-container; main entry: https://github.com/NVIDIA/libnvidia-container/src/cli/main.c. The core command-line tool, responsible for injecting (mounting) the device driver libraries, nvidia-smi, and related binaries into the container.
- nvidia-container-runtime: Go executable. Code lives in https://github.com/NVIDIA/nvidia-container-toolkit; main entry: https://github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime. A wrapper around the container runtime that intercepts and extends container creation: it injects a prestart hook (for example nvidia-container-runtime-hook) into the OCI spec, then calls the underlying runtime such as runc to create the container.
- nvidia-container-runtime-hook: Go executable. Code lives in https://github.com/NVIDIA/nvidia-container-toolkit; main entry: https://github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime-hook. The container's prestart hook, executed before the container starts; it mainly assembles arguments and invokes nvidia-container-cli.
- nvidia-container-toolkit: Go executable. Code lives in https://github.com/NVIDIA/nvidia-container-toolkit; main entry: https://github.com/NVIDIA/nvidia-container-toolkit/tools/container/nvidia-toolkit/run.go. A helper for installing nvidia-container-runtime, for example updating the Docker configuration file and restarting Docker.
- nvidia-ctk: Go executable. Code lives in https://github.com/NVIDIA/nvidia-container-toolkit; main entry: https://github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk. Provides subcommands such as hook, runtime, cdi, and config; it is used in CDI, CSV, and graphics scenarios, and its runtime subcommand is shown in the example after this list.
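For instance, nvidia-ctk's runtime subcommand registers nvidia-container-runtime with Docker. Assuming Docker's default configuration path (/etc/docker/daemon.json), the steps look like this, and the resulting file content is roughly as shown (exact formatting may vary by toolkit version):
# nvidia-ctk runtime configure --runtime=docker
# systemctl restart docker
# cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}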
2.1.2 Data-flow diagrams
In a docker + runc environment, taking the earlier command docker run -d --name normal-container nginx:latest as an example, the data flow for creating an ordinary container is as follows:

My environment does not support CDI, so when --gpus all is added and a GPU container is created, only three NVIDIA Container Toolkit components take part: nvidia-container-runtime, nvidia-container-runtime-hook, and nvidia-container-cli. The data flow becomes:

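One way to see the result of this flow is to look at the OCI config.json that Docker generates for the GPU container: in legacy (non-CDI) mode it should contain a prestart hook pointing at nvidia-container-runtime-hook. The bundle path below is where Docker's containerd shim keeps it on my setup and may differ on yours; the output is approximate:
# jq .hooks.prestart /run/containerd/io.containerd.runtime.v2.task/moby/<container-id>/config.json
[
  {
    "path": "/usr/bin/nvidia-container-runtime-hook",
    "args": [
      "nvidia-container-runtime-hook",
      "prestart"
    ]
  }
]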
3. Source-code analysis of the key components
With the flow above in mind, let's verify it from the source code.
The code below is based on https://github.com/NVIDIA/nvidia-container-toolkit@1.17.4.
3.1 nvidia-container-runtime
- main entry point
// nvidia-container-toolkit/cmd/nvidia-container-runtime/main.go
func main() {
    r := runtime.New()
    err := r.Run(os.Args)
    if err != nil {
        os.Exit(1)
    }
}
- The Run function
The Run function itself does little: it parses the configuration file, initializes the nvidia-ctk and nvidia-container-runtime-hook configuration, builds the runtime object, and calls its Exec method:
// nvidia-container-toolkit/internal/runtime/runtime.go
func (r rt) Run(argv []string) (rerr error) {
    ...
    // load the configuration file
    cfg, err := config.GetConfig()
    ...
    // resolve the nvidia-container-runtime-hook path
    cfg.NVIDIAContainerRuntimeHookConfig.Path = config.ResolveNVIDIAContainerRuntimeHookPath(&logger.NullLogger{}, cfg.NVIDIAContainerRuntimeHookConfig.Path)
    ...
    driver := root.New(
        root.WithLogger(r.logger),
        root.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root),
    )
    r.logger.Tracef("Command line arguments: %v", argv)
    runtime, err := newNVIDIAContainerRuntime(r.logger, cfg, argv, driver)
    if err != nil {
        return fmt.Errorf("failed to create NVIDIA Container Runtime: %v", err)
    }
    if printVersion {
        fmt.Print("\n")
    }
    return runtime.Exec(argv)
}
- Parsing the configuration file
The loader first reads the XDG_CONFIG_HOME environment variable. If it is non-empty, the configuration file is $XDG_CONFIG_HOME/nvidia-container-runtime/config.toml; otherwise it defaults to /etc/nvidia-container-runtime/config.toml:
// nvidia-container-toolkit/internal/config/config.go
func GetConfig() (*Config, error) {
    cfg, err := New(
        WithConfigFile(GetConfigFilePath()),
    )
    if err != nil {
        return nil, err
    }
    return cfg.Config()
}
// nvidia-container-toolkit/internal/config/config.go
func GetConfigFilePath() string {
    // configOverride = XDG_CONFIG_HOME
    if XDGConfigDir := os.Getenv(configOverride); len(XDGConfigDir) != 0 {
        return filepath.Join(XDGConfigDir, configFilePath) // configFilePath = nvidia-container-runtime/config.toml
    }
    return filepath.Join("/etc", configFilePath)
}
// nvidia-container-toolkit/internal/config/toml.go
func (t *Toml) Config() (*Config, error) {
    cfg, err := t.configNoOverrides()
    if err != nil {
        return nil, err
    }
    if err := cfg.assertValid(); err != nil {
        return nil, err
    }
    return cfg, nil
}
// nvidia-container-toolkit/internal/config/toml.go
func (t *Toml) configNoOverrides() (*Config, error) {
    cfg, err := GetDefault()
    if err != nil {
        return nil, err
    }
    if t == nil {
        return cfg, nil
    }
    if err := t.Unmarshal(cfg); err != nil {
        return nil, fmt.Errorf("failed to unmarshal config: %v", err)
    }
    return cfg, nil
}
// nvidia-container-toolkit/internal/config/config.go
func GetDefault() (*Config, error) {
    d := Config{
        AcceptEnvvarUnprivileged:    true,
        SupportedDriverCapabilities: image.SupportedDriverCapabilities.String(),
        NVIDIAContainerCLIConfig: ContainerCLIConfig{
            LoadKmods: true,
            Ldconfig:  getLdConfigPath(),
            User:      getUserGroup(),
        },
        NVIDIACTKConfig: CTKConfig{
            Path: nvidiaCTKExecutable, // nvidiaCTKExecutable = nvidia-ctk
        },
        NVIDIAContainerRuntimeConfig: RuntimeConfig{
            DebugFilePath: "/dev/null",
            LogLevel:      "info",
            Runtimes:      []string{"docker-runc", "runc", "crun"},
            Mode:          "auto",
            Modes: modesConfig{
                CSV: csvModeConfig{
                    MountSpecPath: "/etc/nvidia-container-runtime/host-files-for-container.d",
                },
                CDI: cdiModeConfig{
                    DefaultKind:        "nvidia.com/gpu",
                    AnnotationPrefixes: []string{cdi.AnnotationPrefix}, // cdi.AnnotationPrefix = cdi.k8s.io/
                    SpecDirs:           cdi.DefaultSpecDirs,
                },
            },
        },
        NVIDIAContainerRuntimeHookConfig: RuntimeHookConfig{
            Path: NVIDIAContainerRuntimeHookExecutable,
        },
    }
    return &d, nil
}
The default configuration file looks like this:
$ cat /etc/nvidia-container-runtime/config.toml
#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"
[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig.real"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false
[nvidia-ctk]
path = "nvidia-ctk"
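If you want to watch the flow analyzed below on your own machine, one low-risk tweak is to enable the debug entries that ship commented out in this file (the paths are the defaults shown in the comments above), and optionally raise log-level:
[nvidia-container-cli]
debug = "/var/log/nvidia-container-toolkit.log"
[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"
log-level = "debug"
With these set, each GPU container creation should append to those log files, which makes the call chain analyzed below easy to observe.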
- Resolving the nvidia-container-runtime-hook path
It first looks for nvidia-container-runtime-hook in $PATH and then in /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin; if nothing is found, it falls back to /usr/bin/nvidia-container-runtime-hook:
func ResolveNVIDIAContainerRuntimeHookPath(logger logger.Interface, nvidiaContainerRuntimeHookPath string) string {
    return resolveWithDefault(
        logger,
        "NVIDIA Container Runtime Hook",
        nvidiaContainerRuntimeHookPath,        // read from the config file; defaults to nvidia-container-runtime-hook
        nvidiaContainerRuntimeHookDefaultPath, // nvidiaContainerRuntimeHookDefaultPath = /usr/bin/nvidia-container-runtime-hook
    )
}
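The behaviour of resolveWithDefault can be approximated with the standard library. The sketch below is illustrative only: the real implementation uses the toolkit's internal lookup package and also searches the fixed system directories listed above, not just $PATH.
package main

import (
    "fmt"
    "os/exec"
    "path/filepath"
)

// resolveWithDefaultSketch is a simplified stand-in for resolveWithDefault:
// absolute paths are used as-is, otherwise the candidate is searched on $PATH,
// and the well-known default path is returned as a last resort.
func resolveWithDefaultSketch(candidate, fallback string) string {
    if filepath.IsAbs(candidate) {
        return candidate
    }
    if found, err := exec.LookPath(candidate); err == nil {
        return found
    }
    return fallback
}

func main() {
    fmt.Println(resolveWithDefaultSketch(
        "nvidia-container-runtime-hook",
        "/usr/bin/nvidia-container-runtime-hook",
    ))
}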
- Initializing the driver and runtime
When the driver object is built, cfg.NVIDIAContainerCLIConfig.Root is passed in, which defaults to "". The key function for building the runtime is newNVIDIAContainerRuntime. It first locates a lower-level runtime (my environment only has runc, so runc is found). It then checks whether the command line contains a create subcommand; if not, the call is passed straight through to the lower-level runc. Otherwise it builds a wrapper with NewModifyingRuntimeWrapper, and it is that wrapper's Exec method that ends up being called.
// nvidia-container-toolkit/internal/runtime/runtime.go
func (r rt) Run(argv []string) (rerr error) {
    ...
    driver := root.New(
        root.WithLogger(r.logger),
        root.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root), // read from the config file; "" by default (the entry is commented out)
    )
    ...
    runtime, err := newNVIDIAContainerRuntime(r.logger, cfg, argv, driver)
    ...
}
// nvidia-container-toolkit/internal/runtime/runtime_factory.go
func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv []string, driver *root.Driver) (oci.Runtime, error) {
    // Locate the lower-level runtime: search $PATH plus /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin
    // for "docker-runc", "runc", "crun"; if one is found, it is wrapped in and returned as a pathRuntime object.
    lowLevelRuntime, err := oci.NewLowLevelRuntime(logger, cfg.NVIDIAContainerRuntimeConfig.Runtimes) // read from the config file; defaults to ["docker-runc", "runc", "crun"]
    if err != nil {
        return nil, fmt.Errorf("error constructing low-level runtime: %v", err)
    }
    logger.Tracef("Using low-level runtime %v", lowLevelRuntime.String())
    // Check for a create subcommand: an argument equal to "create" that is not the value of a -b/--bundle flag.
    if !oci.HasCreateSubcommand(argv) {
        logger.Tracef("Skipping modifier for non-create subcommand")
        return lowLevelRuntime, nil
    }
    ociSpec, err := oci.NewSpec(logger, argv)
    if err != nil {
        return nil, fmt.Errorf("error constructing OCI specification: %v", err)
    }
    specModifier, err := newSpecModifier(logger, cfg, ociSpec, driver)
    if err != nil {
        return nil, fmt.Errorf("failed to construct OCI spec modifier: %v", err)
    }
    // Create the wrapping runtime with the specified modifier.
    r := oci.NewModifyingRuntimeWrapper(
        logger,
        lowLevelRuntime,
        ociSpec,
        specModifier,
    )
    return r, nil
}
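The create-subcommand check used above can be sketched as follows. This is a simplified illustration of the rule described in the comment, not the toolkit's exact code:
// hasCreateSubcommandSketch reports whether the runtime arguments contain a
// "create" subcommand, ignoring a "create" that is merely the value of a
// -b/--bundle flag (simplified sketch).
func hasCreateSubcommandSketch(args []string) bool {
    prevWasBundleFlag := false
    for _, a := range args {
        if a == "-b" || a == "--bundle" {
            prevWasBundleFlag = true
            continue
        }
        if !prevWasBundleFlag && a == "create" {
            return true
        }
        prevWasBundleFlag = false
    }
    return false
}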
- wrapper Exec
At this point the wrapper's Exec first applies the modification to the OCI spec, and only then calls the lower-level runc to run the create command.
// nvidia-container-toolkit/internal/oci/runtime_modifier.go
func (r *modifyingRuntimeWrapper) Exec(args []string) error {
    if HasCreateSubcommand(args) {
        r.logger.Debugf("Create command detected; applying OCI specification modifications")
        err := r.modify()
        if err != nil {
            return fmt.Errorf("could not apply required modification to OCI specification: %w", err)
        }
        r.logger.Debugf("Applied required modification to OCI specification")
    }
    r.logger.Debugf("Forwarding command to runtime %v", r.runtime.String())
    return r.runtime.Exec(args)
}
// nvidia-container-toolkit/internal/oci/runtime_modifier.go
func (r *modifyingRuntimeWrapper) modify() error {
    _, err := r.ociSpec.Load()
    if err != nil {
        return fmt.Errorf("error loading OCI specification for modification: %v", err)
    }
    err = r.ociSpec.Modify(r.modifier)
    if err != nil {
        return fmt.Errorf("error modifying OCI spec: %v", err)
    }
    err = r.ociSpec.Flush()
    if err != nil {
        return fmt.Errorf("error writing modified OCI specification: %v", err)
    }
    return nil
}
- OCI Spec Modify
The Modify step first builds a CUDA image object via NewCUDAImageFromSpec; from that function's WithEnv and WithMounts options you can tell that the injected information depends only on the container's environment variables and mounts. It then calls newModeModifier with the mode, cfg, ociSpec, and image to build the mode modifier, and depending on which modes the environment supports, more than one modifier may be returned. My local environment gets three: the mode modifier (StableRuntimeModifier in my case), "graphics" (GraphicsModifier), and "feature-gated" (FeatureGatedModifier), so the spec passes through all three modifiers' Modify calls.
// nvidia-container-toolkit/internal/runtime/runtime_factory.go
func newSpecModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spec, driver *root.Driver) (oci.SpecModifier, error) {
    ...
}
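The body of newSpecModifier is not reproduced here, but the chaining idea described above, several spec modifiers applied one after another, can be sketched as follows. This is an illustration only, not the toolkit's implementation: modifierList, envModifier, and the example environment variables are assumptions made for the sketch (only the oci.SpecModifier role they mimic comes from the source above).
package main

import (
    "fmt"

    specs "github.com/opencontainers/runtime-spec/specs-go"
)

// specModifier mirrors the role of the toolkit's oci.SpecModifier:
// something that edits an OCI runtime spec in place.
type specModifier interface {
    Modify(spec *specs.Spec) error
}

// modifierList applies each contained modifier in order, stopping at the
// first error; this is the chaining behaviour described above.
type modifierList []specModifier

func (m modifierList) Modify(spec *specs.Spec) error {
    for _, mod := range m {
        if mod == nil {
            continue
        }
        if err := mod.Modify(spec); err != nil {
            return err
        }
    }
    return nil
}

// envModifier is a toy modifier that injects one environment variable,
// standing in for the runtime/graphics/feature-gated modifiers.
type envModifier struct{ kv string }

func (e envModifier) Modify(spec *specs.Spec) error {
    if spec.Process == nil {
        spec.Process = &specs.Process{}
    }
    spec.Process.Env = append(spec.Process.Env, e.kv)
    return nil
}

func main() {
    spec := &specs.Spec{}
    mods := modifierList{
        envModifier{kv: "NVIDIA_VISIBLE_DEVICES=all"},
        envModifier{kv: "NVIDIA_DRIVER_CAPABILITIES=compute,utility"},
    }
    if err := mods.Modify(spec); err != nil {
        panic(err)
    }
    fmt.Println(spec.Process.Env)
}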