GPU passthrough setup for Nvidia V100 (Part I)

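This article describes how to configure GPU passthrough on SUSE Linux Enterprise Server, covering environment verification, enabling the IOMMU, driver setup, VFIO configuration, and GPU isolation.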

This guide is based on the V100 and covers GPU compute use only. It has two parts, host setup and guest setup:

Part one: Host

Part two: Guest

Please make sure you are using an Nvidia Tesla product, which means Maxwell, Pascal, or Volta. We do not have a hardware support matrix from Nvidia yet.

Please also make sure the host has an extra display card, or at least SSH access.

 

1. Host environment verification

1.1 Make sure your host runs SLES 12 SP3 or later:

baird:~/:[0]# cat /etc/issue

Welcome to SUSE Linux Enterprise Server 15  (x86_64) - Kernel \r (\l).

1.2 Make sure your host supports VT-d and that it is enabled in the BIOS:

baird:~/:[0]# dmesg | grep -e "Directed I/O"

[   12.819760] DMAR: Intel(R) Virtualization Technology for Directed I/O

1.3 Make sure you have an extra GPU or VGA card:

baird:~/:[0]# lspci | grep -i "vga"

07:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05)

baird:~/:[0]# lspci | grep -i nvidia

03:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1)

 

2. Enable IOMMU

vim /etc/default/grub

# Make this line look like this

GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt rd.driver.pre=vfio-pci"

grub2-mkconfig -o /boot/grub2/grub.cfg

After a reboot, you can verify that the IOMMU is enabled with:

dmesg | grep -e DMAR -e IOMMU
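The dmesg output can be noisy, so it may also help to confirm that the running kernel actually received the new parameter. The following is a minimal sketch; cmdline_has_iommu is a hypothetical helper name introduced here for illustration, and it only looks for the exact token intel_iommu=on.

```shell
# Return 0 if the given kernel command line contains intel_iommu=on.
# cmdline_has_iommu is a hypothetical helper, not part of any SUSE tool.
cmdline_has_iommu() {
    # Split the command line on whitespace and compare each token.
    for param in $1; do
        if [ "$param" = "intel_iommu=on" ]; then
            return 0
        fi
    done
    return 1
}

# Typical use on the host after the reboot:
#   cmdline_has_iommu "$(cat /proc/cmdline)" && echo "intel_iommu=on is active"
```

If the check fails after a reboot, the usual suspects are an edit to the wrong GRUB variable or a grub.cfg that was not regenerated.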

 

3. Add nouveau to the blacklist

baird:~/:[0]# vim /etc/modprobe.d/50-blacklist.conf

Add the following line to the file:

blacklist nouveau
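If you prefer to script this step instead of editing the file by hand, a small helper can append the line only when it is missing, so that running it twice does not create duplicates. This is a sketch under the assumption that you run it as root; ensure_line is a hypothetical name chosen for illustration.

```shell
# Append a line to a file only if that exact line is not already present.
# $1: file path, $2: line to add
# ensure_line is a hypothetical helper name.
ensure_line() {
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# On the host (as root):
#   ensure_line /etc/modprobe.d/50-blacklist.conf "blacklist nouveau"
```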

 

4. Set up VFIO and isolate the GPU used for passthrough

Add a file named vfio.conf under /etc/modprobe.d:

baird:~/:[0]# cat /etc/modprobe.d/vfio.conf

options vfio-pci ids=10de:1db4

10de:1db4 is the vendor ID and device ID of the GPU; lspci -nn will give you these values:

baird:~/:[0]# lspci -nn | grep 03:00.0

03:00.0 3D controller [0302]: NVIDIA Corporation GV100 [Tesla V100 PCIe] [10de:1db4] (rev a1)
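Because the lspci -nn line contains several bracketed fields (the class code [0302], the model name, and the ID pair), it is easy to copy the wrong one. The sketch below pulls out only the vendor:device pair; pci_id_from_lspci is a hypothetical helper name, and it assumes the ID appears in lowercase hex as in the output above.

```shell
# Print the vendor:device pair from one line of `lspci -nn` output.
# pci_id_from_lspci is a hypothetical helper name.
pci_id_from_lspci() {
    # Match only the bracketed xxxx:xxxx form, skipping the class code
    # ([0302]) and the bracketed model name.
    printf '%s\n' "$1" | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Typical use on the host:
#   pci_id_from_lspci "$(lspci -nn -s 03:00.0)"
```

The printed value is what goes after ids= in /etc/modprobe.d/vfio.conf.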

5. Load the VFIO driver

baird:~/:[0]# modprobe vfio-pci

Alternatively, add the modules to your initrd so they are loaded at boot:

baird:~/:[0]# cat /etc/dracut.conf.d/gpu-passthrough.conf

add_drivers+=" vfio vfio_iommu_type1 vfio_pci vfio_virqfd "

dracut --force /boot/initrd $(uname -r)

 

6. Reboot the host and check that the GPU is isolated in its own IOMMU group and that the vfio-pci driver is in use

find /sys/kernel/iommu_groups/*/devices/*

/sys/kernel/iommu_groups/47/devices/0000:03:00.0

/sys/kernel/iommu_groups/49/devices/0000:07:00.0
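On a host with many devices the find output gets long, so it can be easier to ask directly which devices share a group with the GPU. The sketch below follows the iommu_group symlink in sysfs; iommu_group_members is a hypothetical helper name, and the optional second argument (an alternate sysfs root) is an assumption added only to allow dry runs.

```shell
# List every PCI device that shares an IOMMU group with the given device.
# $1: PCI address such as 0000:03:00.0
# $2: sysfs root, defaults to /sys (hypothetical override for dry runs)
iommu_group_members() {
    root="${2:-/sys}"
    group_link="$root/bus/pci/devices/$1/iommu_group"
    # No symlink means the device has no IOMMU group (IOMMU likely disabled).
    [ -L "$group_link" ] || return 1
    group="$(basename "$(readlink "$group_link")")"
    ls "$root/kernel/iommu_groups/$group/devices"
}

# Typical use on the host:
#   iommu_group_members 0000:03:00.0
```

Ideally the GPU is the only device listed. If other devices show up, VFIO generally expects them to be bound to vfio-pci (or left without a driver) as well.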

lspci -k

03:00.0 3D controller: NVIDIA Corporation GV100 [Tesla V100 PCIe] (rev a1)

        Subsystem: NVIDIA Corporation Device 1214

        Kernel driver in use: vfio-pci

        Kernel modules: nouveau
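The same driver check can be scripted by reading the driver symlink in sysfs rather than parsing lspci -k. This is a minimal sketch; pci_driver_in_use is a hypothetical helper name, and the optional second argument (an alternate sysfs root) is an assumption added only to allow dry runs.

```shell
# Print the kernel driver bound to a PCI device, read from sysfs.
# $1: PCI address such as 0000:03:00.0
# $2: sysfs root, defaults to /sys (hypothetical override for dry runs)
pci_driver_in_use() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver directory, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Typical use on the host:
#   [ "$(pci_driver_in_use 0000:03:00.0)" = "vfio-pci" ] && echo "GPU is isolated"
```

A result of nouveau or none would suggest that the blacklist or the vfio.conf entry did not take effect at boot.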
