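GPU Operator in Practice on KubeVirt (GPU Passthrough)
Based on the guide "Using GPU Operator in KubeVirt" (《在 KubeVirt 中使用 GPU Operator》), this article records the process of putting GPU Operator into practice on KubeVirt, including configuring GPU passthrough for virtual machines and mounting GPU devices in containers.
KubeVirt provides a mechanism for assigning host devices to virtual machines. The mechanism is generic: it supports assigning various types of PCI devices, such as accelerators (including GPUs) or any other device attached to the PCI bus.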
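To make the host-device mechanism more concrete, the sketch below renders the part of the KubeVirt custom resource that permits a PCI device for assignment. The fields `permittedHostDevices`, `pciHostDevices`, `pciVendorSelector`, `resourceName`, and `externalResourceProvider` belong to the KubeVirt API; the helper name `render_permitted_host_device` and the argument values in the usage line are hypothetical and only illustrate the shape of the configuration.

```shell
# Sketch: print a KubeVirt CR fragment that permits one PCI device for passthrough.
# Arguments (all illustrative, substitute values from your own cluster):
#   $1  PCI vendor:device ID, for example as reported by `lspci -nn`
#   $2  resource name that virtual machines will request
#   $3  "true" if an external device plugin advertises the resource (default: false)
render_permitted_host_device() {
  cat <<EOF
spec:
  configuration:
    permittedHostDevices:
      pciHostDevices:
      - pciVendorSelector: "$1"
        resourceName: "$2"
        externalResourceProvider: ${3:-false}
EOF
}

# Hypothetical usage; the ID and resource name are placeholders, not values from this cluster.
# render_permitted_host_device "10DE:XXXX" "nvidia.com/EXAMPLE_GPU" true
```

A virtual machine instance then requests the device by listing the same resource name as `deviceName` under `spec.domain.devices.gpus` (or `hostDevices`). Compare the rendered fragment with the existing KubeVirt CR before merging it, since the list of permitted devices may already contain entries.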
1. Preparation
Hardware information
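Three servers are used. The main hardware configuration of each server is as follows:
- Model: H3C UniServer R5200 G3
- CPU: Xeon® Gold 5118 CPU @ 2.30GHz, 2x12x2
- Memory: 512GB
- Disk: 1.75GB x2, one used as the system disk and one as the Ceph data disk
- GPU: GV100GL_TESLA_V100_PCIE_16GB x4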
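KubeVirt selects PCI devices for assignment by their vendor:device ID, so it can help to confirm which NVIDIA devices each server exposes. The following is a minimal sketch, assuming `lspci` (from pciutils) is installed on the host; `nvidia_pci_ids` is a hypothetical helper name. It filters `lspci -nn` output for NVIDIA's PCI vendor ID `10de` and prints each distinct vendor:device pair in upper case.

```shell
# Reads `lspci -nn` output on stdin and prints the unique vendor:device IDs
# of NVIDIA devices (PCI vendor ID 10de), upper-cased, one per line.
nvidia_pci_ids() {
  grep -oiE '\[10de:[0-9a-f]{4}\]' | tr -d '[]' | tr '[:lower:]' '[:upper:]' | sort -u
}

# Typical use on a GPU host:
# lspci -nn | nvidia_pci_ids
```

Servers with several identical GPUs yield a single line, because duplicates are collapsed by `sort -u`.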
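Kubernetes cluster environment
The k8s environment is shown below. It runs v1.27.6, and every node serves as both a control-plane node and a worker node.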
# kubectl get node
NAME    STATUS   ROLES                  AGE     VERSION
node1   Ready    control-plane,master   7d18h   v1.27.6
node2   Ready    control-plane,master   7d18h   v1.27.6
node3   Ready    control-plane,master   7d18h   v1.27.6
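The node listing can also be checked mechanically. As a sketch (the helper names `not_ready_nodes` and `node_versions` are hypothetical), the functions below read the default table output of `kubectl get node` and report nodes that are not `Ready` as well as the distinct versions present in the cluster.

```shell
# Both helpers read the default table output of `kubectl get node` on stdin.

# Print the names of nodes whose STATUS column is not exactly "Ready"
# (a cordoned node, shown as "Ready,SchedulingDisabled", is reported too).
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Print each distinct value of the VERSION column (the last field).
node_versions() {
  awk 'NR > 1 { print $NF }' | sort -u
}

# Typical use:
# kubectl get node | not_ready_nodes
# kubectl get node | node_versions
```

For the three nodes shown here, the first helper prints nothing and the second prints the single version v1.27.6.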
rook-ceph installation
Version v1.13.5 is used. The components are as follows:
# kubectl get pod -n rook-ceph
NAME                                                READY   STATUS      RESTARTS        AGE
csi-cephfsplugin-5k87k                              2/2     Running     4 (5d22h ago)   12d
csi-cephfsplugin-88rmz                              2/2     Running     0               5d22h
csi-cephfsplugin-np9lb                              2/2     Running     4 (5d22h ago)   12d
csi-cephfsplugin-provisioner-5556f68f89-bsphz       5/5     Running     0               5d22h
csi-cephfsplugin-provisioner-5556f68f89-jswvj       5/5     Running     0               5d22h
csi-rbdplugin-5v8dm                                 2/2     Running     0               5d22h
csi-rbdplugin-provisioner-76f966fdd8-6jwdk          5/5     Running     0               5d22h
csi-rbdplugin-provisioner-76f966fdd8-sjf6c          5/5     Running     3 (4d21h ago)   5d22h
csi-rbdplugin-s8k4x                                 2/2     Running     4 (5d22h ago)   12d
csi-rbdplugin-s97mc                                 2/2     Running     4 (5d22h ago)   12d
os-rook-set-cronjob-28865084-h5jdf                  0/1     Completed   0               118s
rook-ceph-agent-7bdf69b4f7-wzl7r                    1/1     Running     0               5d22h
rook-ceph-crashcollector-node1-c9bc54894-s7lps      1/1     Running     0               5d22h
rook-ceph-crashcollector-node2-6448cdd8f9-7dlgs     1/1     Running     0               5d22h
rook-ceph-crashcollector-node3-56d876f9c6-6bjlr     1/1     Running     0               5d22h
rook-ceph-exporter-node1-7c7d659d96-6k55c           1/1     Running     0               5d22h
rook-ceph-exporter-node2-7bc85dfdf-xdt8g            1/1     Running     0               5d22h
rook-ceph-exporter-node3-f45f5db9d-8jrgs            1/1     Running     0               5d22h
rook-ceph-mds-ceph-filesystem-a-5bbf4d5d79-qwsd2    2/2     Running     0               5d22h
rook-ceph-mds-ceph-filesystem-b-69b5fc4f7-fhlth     2/2     Running     0               5d22h
rook-ceph-mgr-a-5f5768988-6dsm2                     3/3     Running     0               5d22h
rook-ceph-mgr-b-5c96dcf465-7gcrk                    3/3     Running     0               5d22h
rook-ceph-mon-a-5dcb9b69c5-p5mh9                    2/2     Running     0               5d22h
rook-ceph-mon-b-6575d4f46b-gsthp                    2/2     Running     0               5d22h
rook-ceph-mon-c-7ff969d568-gqzr4                    2/2     Running     0               5d22h
rook-ceph-operator-86d5cb7c46-nx4jc                 1/1     Running     0               5d22h
rook-ceph-osd-0-69c8c7fb45-nvvsx                    2/2     Running     0               5d22h
rook-ceph-osd-1-5fcdbc57bf-dh8cf                    2/2     Running     0               5d22h
rook-ceph-osd-2-7445bdc885-sxqbc                    2/2     Running     0               5d22h
rook-ceph-rgw-ceph-objectstore-a-795c4c64cf-xbhkl   2/2     Running     0               5d22h
rook-ceph-tools-5877f9f669-tndwc                    1/1     Running     0               5d22h
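With this many pods, scanning the listing by eye is error-prone. The following is a minimal sketch, with `unhealthy_pods` as a hypothetical helper name, that reads the default table output of `kubectl get pod` and prints any pod that is neither `Completed` nor `Running` with all of its containers ready.

```shell
# Reads the default table output of `kubectl get pod` on stdin and prints the
# names of pods that look unhealthy:
#   - STATUS is neither "Running" nor "Completed", or
#   - STATUS is "Running" but the READY column (ready/total) is not full.
# Only the first three columns are used, so a RESTARTS value such as
# "4 (5d22h ago)" does not disturb the parsing.
unhealthy_pods() {
  awk 'NR > 1 {
    if ($3 == "Completed") next          # finished job pods are fine
    split($2, ready, "/")                # e.g. "2/2" -> ready[1]=2, ready[2]=2
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Typical use:
# kubectl get pod -n rook-ceph | unhealthy_pods
```

Run against the rook-ceph listing above, it prints nothing: every pod is either fully ready and Running, or a Completed job pod.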
KubeVirt installation
Install the virtualization add-on by following the official KubeVirt documentation. This article uses version v1.2.0-amd64.
# kubectl get pod -n kubevirt
NAME                                READY   STATUS    RESTARTS   AGE
virt-api-74d58d7fc8-5v5t4           1/1     Running   0          5d21h
virt-api-74d58d7fc8-m7lhw           1/1     Running   0          5d21h
virt-controller-55d7978dc-d9tk2     1/1     Running   0          5d21h
virt-controller-55d7978dc-xsvtm     1/1     Running   0          5d21h
virt-exportproxy-795d79f86b-qgc4c   1/1     Running   0          5d21h
virt-exportproxy-795d79f86b-wxt4b   1/1     Running   0          5d21h
virt-handler-4x55q                  1/1     Running   0          5d21h
virt-handler-b9b27                  1/1     Running   0          5