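KVM (Kernel-based Virtual Machine) was originally developed by the Israeli company Qumranet. In February 2007 it was officially merged into the Linux 2.6.20 kernel and became part of the kernel source tree. On September 4, 2008, Red Hat acquired Qumranet and began replacing Xen with KVM in RHEL; the first release to include KVM was RHEL 5.4, and starting with RHEL 6, KVM became the default virtualization engine. On x86, KVM must run on a platform with Intel VT or AMD-V support. Support for the ARM architecture was added in Linux kernel 3.9. The processors that KVM virtualization supports can be looked up on the official website.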
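A quick way to check the VT-x/AMD-V requirement on a Linux host is to look for the `vmx` (Intel) or `svm` (AMD) flag in /proc/cpuinfo. The sketch below is a small POSIX shell helper; the function name `hw_virt_flag` is our own, and a missing flag can also mean the feature is disabled in the firmware setup.

```shell
# Report which hardware virtualization extension the CPU advertises:
# vmx = Intel VT-x, svm = AMD-V. The cpuinfo path is a parameter (defaulting
# to /proc/cpuinfo) so the function can also be run against a saved copy.
hw_virt_flag() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -qw vmx "$cpuinfo"; then
        echo vmx
    elif grep -qw svm "$cpuinfo"; then
        echo svm
    else
        echo none
        return 1
    fi
}

hw_virt_flag || echo "no VT-x/AMD-V flag found (it may be disabled in firmware)"
```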
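KVM consists of a loadable kernel module that provides the core low-level virtualization support, kvm.ko, together with a processor-specific module, kvm-intel.ko or kvm-amd.ko; QEMU (QEMU-KVM) serves as the upper-layer tool that controls the virtual machines. KVM can run Linux or Windows guests without modifying them.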
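To see which of these modules are loaded on a host, a sketch that reads /proc/modules, where kvm-intel.ko appears as `kvm_intel` (module names are listed with underscores). `kvm_modules` is a hypothetical helper name. If KVM is built into the kernel instead of as modules, nothing is listed even though KVM works.

```shell
# Print the KVM-related modules currently loaded.
kvm_modules() {
    modules=${1:-/proc/modules}
    awk '$1 == "kvm" || $1 == "kvm_intel" || $1 == "kvm_amd" { print $1 }' "$modules"
}

if [ -r /proc/modules ]; then
    kvm_modules
fi
# If the vendor module is missing, load it, e.g.:
#   sudo modprobe kvm_intel    # or: sudo modprobe kvm_amd
```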
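In practice, on Linux KVM is just a kernel module: user space uses QEMU to emulate the hardware presented to the virtual machine, and a virtual machine created with KVM is simply a Linux process, so managing that process is equivalent to managing the whole virtual machine.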
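This can be observed from the host: the VM and its vCPUs are file descriptors held by an ordinary process, shown under /proc/PID/fd as `anon_inode:kvm-vm` and `anon_inode:kvm-vcpu:N`. A sketch, assuming procps `pgrep` is available; `kvm_vcpu_count` is a hypothetical helper, and inspecting a root-owned lkvm process requires root.

```shell
# Succeed and print the vCPU count if PID holds a KVM VM file descriptor;
# return 1 for ordinary processes.
kvm_vcpu_count() {
    pid=$1
    is_vm=0
    vcpus=0
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            anon_inode:kvm-vm) is_vm=1 ;;
            anon_inode:kvm-vcpu:*) vcpus=$((vcpus + 1)) ;;
        esac
    done
    [ "$is_vm" -eq 1 ] || return 1
    echo "$vcpus"
}

# Inspect every running lkvm process (if it was started via the "vm" link,
# search for that name instead).
for pid in $(pgrep lkvm); do
    echo "pid $pid: $(kvm_vcpu_count "$pid") vCPUs"
done
```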
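According to the documentation in the kvmtool GitHub repository, kvmtool plays a role similar to QEMU: it is a user-space tool on the host OS for running KVM guest OSes. It is a pure virtualization tool, so the guest OS runs on it without modification. However, because KVM relies on the CPU's hardware virtualization support, kvmtool, like qemu-kvm, only supports guest OSes built for the same architecture as the host.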
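kvmtool is only 5 KLOC: a clean, lightweight virtualization tool written from scratch, and its small size makes it very approachable for anyone who wants to learn about virtualization. kvmtool is implemented as a KVM host tool that can boot a Linux image directly, with no BIOS or other related dependencies. Below we set up a kvmtool environment on Ubuntu 22.04 and use it to run another Linux guest OS in a virtual machine.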
Host environment
The host system used in this experiment is Ubuntu 22.04; see the figure below for details:

Download the source code
Download kvmtool:
$ git clone https://github.com/kvmtool/kvmtool.git
Download busybox:
$ wget https://busybox.net/downloads/busybox-1.32.0.tar.bz2
Download the Linux kernel:
$ axel -a -n 80 https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.15.18.tar.gz
The exact versions do not matter much; just pick tool and source versions released around the same time.
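The later steps refer to ../linux-5.15.18 and ../busybox-1.32.0, so the archives have to be unpacked next to the kvmtool checkout. A minimal sketch; the helper name `unpack` is hypothetical, and it assumes each tarball's top-level directory matches the archive name, as it does for these two releases.

```shell
# Extract each archive into the current directory, skipping any whose
# directory already exists. tar detects gzip/bzip2 compression on its own.
unpack() {
    for archive in "$@"; do
        dir=${archive%.tar.*}
        if [ -d "$dir" ]; then
            echo "skip $archive: $dir already exists"
            continue
        fi
        tar -xf "$archive" || return 1
    done
}

# Usage, from the directory that holds the downloads and the kvmtool checkout:
#   unpack linux-5.15.18.tar.gz busybox-1.32.0.tar.bz2
```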
Build kvmtool
The kvmtool version used in this experiment is commit e17d182ad3f797f01947fc234d95c96c050c534b. Building it is simple: just run make in the kvmtool directory:

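The resulting executable is lkvm, and a hard link to it named vm is created as well; the two are identical. If you need to debug kvmtool, you can manually modify CFLAGS in the Makefile: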


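To confirm that vm really is a hard link to lkvm rather than a copy, compare the two names by device and inode. A small sketch; `same_inode` is a hypothetical helper, and the `[ A -ef B ]` test it relies on is supported by bash and dash, though older POSIX editions do not require it.

```shell
# True when both names refer to the same device and inode (a hard link).
same_inode() {
    [ "$1" -ef "$2" ]
}

# Inside the kvmtool directory after make:
#   same_inode lkvm vm && echo "vm is a hard link to lkvm"
#   ls -li lkvm vm    # both entries show the same inode number
```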
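Build the Linux kernel
Building the kernel is straightforward; see this blog post:
"桌面PC/服务器 ubuntu18.04 Linux内核编译升级与机制分析" (Desktop PC/server Ubuntu 18.04: Linux kernel build, upgrade and mechanism analysis), papaofdoudou's blog on CSDN
Three things to note here:
- Fix the two build errors caused by missing .pem files
- Only the bzImage target needs to be built; modules are not needed
- The default menuconfig may set CONFIG_VIRTIO_NET=m; change it to CONFIG_VIRTIO_NET=y. Because no modules are built, the KVM and VIRTIO related options (such as CONFIG_VIRTIO_NET=y) must be built into the kernel, so make sure they are all enabled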
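The two .pem errors most likely come from Ubuntu's distribution config, which points CONFIG_SYSTEM_TRUSTED_KEYS and CONFIG_SYSTEM_REVOCATION_KEYS at certificate files under debian/ that do not exist in the upstream tree; this is an assumption about the author's setup. The kernel's own scripts/config tool can make these edits (`scripts/config --set-str SYSTEM_TRUSTED_KEYS ""`, `scripts/config --enable VIRTIO_NET`). The sketch below does the same with sed so the effect on .config is visible; both helper names are hypothetical.

```shell
# Hypothetical helpers (not part of the kernel tree) that edit a .config in place.
# kconfig_builtin FILE SYM: turn "SYM=m" or "# SYM is not set" into "SYM=y".
kconfig_builtin() {
    file=$1 sym=$2
    sed -i.bak -e "s/^$sym=m\$/$sym=y/" \
               -e "s/^# $sym is not set\$/$sym=y/" "$file" && rm -f "$file.bak"
}

# kconfig_empty_str FILE SYM: set a string option to "" (e.g. the .pem paths).
kconfig_empty_str() {
    file=$1 sym=$2
    sed -i.bak -e "s/^$sym=.*/$sym=\"\"/" "$file" && rm -f "$file.bak"
}

# Usage inside linux-5.15.18/, assuming .config was copied from the Ubuntu host:
#   kconfig_empty_str .config CONFIG_SYSTEM_TRUSTED_KEYS
#   kconfig_empty_str .config CONFIG_SYSTEM_REVOCATION_KEYS
#   kconfig_builtin   .config CONFIG_VIRTIO_NET
#   make olddefconfig && make -j"$(nproc)" bzImage
```

Running `make olddefconfig` afterwards lets Kconfig re-resolve the dependencies of the changed options before the build.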
The build finally produces the bzImage file:

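An x86 bzImage begins with the boot-protocol setup header, whose signature "HdrS" sits at offset 0x202 (514). Checking it is a cheap way to make sure lkvm is given arch/x86/boot/bzImage rather than an image for another architecture, which KVM on an x86 host cannot run. A sketch; `is_bzimage` is a hypothetical helper.

```shell
# True if FILE carries the x86 boot-protocol signature "HdrS" at offset 0x202.
is_bzimage() {
    sig=$(dd if="$1" bs=1 skip=514 count=4 2>/dev/null)
    [ "$sig" = "HdrS" ]
}

# Usage, from linux-5.15.18/:
#   is_bzimage arch/x86/boot/bzImage && echo "looks like an x86 bzImage"
```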
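Build busybox
Use busybox to create the root file system and build its directory structure; see this blog post:
"linux4.15 arm qemu @ubuntu18.04环境搭建与bootgraph启动优化" (Linux 4.15 ARM QEMU environment setup on Ubuntu 18.04 and bootgraph boot optimization), papaofdoudou's blog on CSDN
Note that after completing the steps in that post, you must rename the linuxrc file in the top-level directory to init (this step is mandatory; otherwise you will not get a console). The reason is the order in which the kernel chooses the program to run as process 1, shown in the figure below:
So linuxrc fails to start for a simple reason: the default ramdisk_execute_command is /init. If ramdisk_execute_command is changed to /linuxrc (for example with the rdinit=/linuxrc kernel parameter), the system also boots normally without renaming the file.
Then pack the rootfs directory into a cpio archive, running the command inside the rootfs directory (busybox-1.32.0/_install):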
$ find . ! -name root_fs.cpio | cpio -o --format=newc > root_fs.cpio
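The rename and the packing can be combined into one step. A sketch with hypothetical helper names, assuming the busybox _install layout, where linuxrc is a symlink to bin/busybox. It writes the archive outside the rootfs tree so the cpio file is never archived into itself; pass the resulting path to lkvm's -i option. Alternatively, linuxrc can stay in place and the kernel can be told to run it with the rdinit=/linuxrc boot parameter, which kvmtool can append through its -p/--params option.

```shell
# Rename linuxrc to init unless an init already exists; succeed if /init is present.
ensure_init() {
    root=$1
    if [ ! -e "$root/init" ] && [ ! -L "$root/init" ]; then
        if [ -e "$root/linuxrc" ] || [ -L "$root/linuxrc" ]; then
            mv "$root/linuxrc" "$root/init"
        fi
    fi
    [ -e "$root/init" ] || [ -L "$root/init" ]
}

# Pack ROOT into a newc cpio archive written to OUT (a path outside ROOT).
pack_initramfs() {
    root=$1 out=$2
    ensure_init "$root" || return 1
    (cd "$root" && find . | cpio -o --format=newc) > "$out"
}

# Usage, from busybox-1.32.0/ (then boot with -i ../busybox-1.32.0/root_fs.cpio):
#   pack_initramfs _install root_fs.cpio
```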

The resulting directory layout is:

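Once these three steps are done, the virtual machine is ready to run.
Run the virtual machine
Before running it, confirm that the /dev/kvm device node exists on the host: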

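This check can be scripted: /dev/kvm must be a character device that the user running lkvm (root here, via sudo) can read and write. A sketch with a hypothetical `check_kvm_dev` helper; on Ubuntu the node is typically owned by the kvm group, so adding a user to that group is the usual way to run lkvm without sudo.

```shell
# Verify that the KVM device node exists and is usable by the current user.
check_kvm_dev() {
    dev=${1:-/dev/kvm}
    if [ ! -c "$dev" ]; then
        echo "$dev missing: is kvm_intel/kvm_amd loaded and VT-x/AMD-V enabled?"
        return 1
    fi
    if [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
        echo "$dev exists but is not read/write for $(id -un)"
        return 1
    fi
    echo "$dev OK"
}

# Usage:
#   check_kvm_dev
```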
Run the virtual machine with the following command:
$ sudo ./lkvm run -k ../linux-5.15.18/arch/x86/boot/bzImage -i ../busybox-1.32.0/_install/root_fs.cpio
zlcao@zlcao-RedmiBook-14:~/kvm/kvmtool$ sudo ./lkvm run -k ../linux-5.15.18/arch/x86/boot/bzImage -i ../busybox-1.32.0/_install/root_fs.cpio
# lkvm run -k ../linux-5.15.18/arch/x86/boot/bzImage -m 704 -c 8 --name guest-100110
[ 0.000000] Linux version 5.15.18 (zlcao@zlcao-RedmiBook-14) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1 SMP Fri Jan 27 12:27:51 CST 2023
[ 0.000000] Command line: noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 earlyprintk=serial i8042.noaux=1 console=ttyS0 root=/dev/vda rw
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Hygon HygonGenuine
[ 0.000000] Centaur CentaurHauls
[ 0.000000] zhaoxin Shanghai
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
[ 0.000000] signal: max sigframe size: 2032
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000ffffe] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000002bffffff] usable
[ 0.000000] printk: bootconsole [earlyser0] enabled
[ 0.000000] ERROR: earlyprintk= earlyser already used
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 11c01001, primary cpu clock
[ 0.000004] kvm-clock: using sched offset of 198180346 cycles
[ 0.000522] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.002007] tsc: Detected 1992.002 MHz processor
[ 0.002444] last_pfn = 0x2c000 max_arch_pfn = 0x400000000
[ 0.002986] Disabled
[ 0.003182] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.003765] CPU MTRRs all blank - virtualized system.
[ 0.004236] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
Memory KASLR using RDRAND RDTSC...
[ 0.005590] found SMP MP-table at [mem 0x000f03b0-0x000f03bf]
[ 0.006456] Using GB pages for direct mapping
[ 0.007160] RAMDISK: [mem 0x2bd00000-0x2bf83fff]
[ 0.007640] ACPI: Early table checksum verification disabled
[ 0.008311] ACPI BIOS Error (bug): A valid RSDP was not found (20210730/tbxfroot-210)
[ 0.009234] No NUMA configuration found
[ 0.009526] Faking a node at [mem 0x0000000000000000-0x000000002bffffff]
[ 0.010001] NODE_DATA(0) allocated [mem 0x2bfd6000-0x2bffffff]
[ 0.010937] Zone ranges:
[ 0.011122] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.011581] DMA32 [mem 0x0000000001000000-0x000000002bffffff]
[ 0.012074] Normal empty
[ 0.012351] Device empty
[ 0.012626] Movable zone start for each node
[ 0.012971] Early memory node ranges
[ 0.013292] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.013732] node 0: [mem 0x0000000000100000-0x000000002bffffff]
[ 0.014192] Initmem setup node 0 [mem 0x0000000000001000-0x000000002bffffff]
[ 0.014710] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.014878] On node 0, zone DMA: 97 pages in unavailable ranges
[ 0.022910] On node 0, zone DMA32: 16384 pages in unavailable ranges
[ 0.023633] Intel MultiProcessor Specification v1.4
[ 0.024453] MPTABLE: OEM ID: KVMCPU00
[ 0.024719] MPTABLE: Product ID: 0.1
[ 0.025000] MPTABLE: APIC at: 0xFEE00000
[ 0.025279] Processor #0 (Bootup-CPU)
[ 0.025527] Processor #1
[ 0.025698] Processor #2
[ 0.025861] Processor #3
[ 0.026025] Processor #4
[ 0.026191] Processor #5
[ 0.026356] Processor #6
[ 0.026521] Processor #7
[ 0.026715] IOAPIC[0]: apic_id 9, version 17, address 0xfec00000, GSI 0-23
[ 0.027163] Processors: 8
[ 0.027344] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[ 0.027735] kvm-guest: KVM setup pv remote TLB flush
[ 0.028059] kvm-guest: setup PV sched yield
[ 0.028372] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.028859] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.029349] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000effff]
[ 0.029843] PM: hibernation: Registered nosave memory: [mem 0x000f0000-0x000fefff]
[ 0.030330] PM: hibernation: Registered nosave memory: [mem 0x000ff000-0x000fffff]
[ 0.030820] [mem 0x2c000000-0xffffffff] available for PCI devices
[ 0.031217] Booting paravirtualized kernel on KVM
[ 0.031546] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.032234] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1
[ 0.034042] percpu: Embedded 61 pages/cpu s212992 r8192 d28672 u262144
[ 0.034524] kvm-guest: setup async PF for cpu 0
[ 0.034866] kvm-guest: stealtime: cpu 0, msr 2ae33080
[ 0.035203] kvm-guest: PV spinlocks enabled
[ 0.035483] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes, linear)
[ 0.035994] Built 1 zonelists, mobility grouping on. Total pages: 177152
[ 0.036454] Policy zone: DMA32
[ 0.036658] Kernel command line: noapic noacpi pci=conf1 reboot=k panic=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 earlyprintk=serial i8042.noaux=1 console=ttyS0 root=/dev/vda rw
[ 0.037994] Unknown kernel command line parameters "noacpi", will be passed to user space.
[ 0.039146] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[ 0.039968] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[ 0.040621] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 0.045493] Memory: 657968K/720504K available (16393K kernel code, 4387K rwdata, 10492K rodata, 2932K init, 4816K bss, 62276K reserved, 0K cma-reserved)
[ 0.046448] random: get_random_u64 called from __kmem_cache_create+0x2f/0x520 with crng_init=0
[ 0.046702] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[ 0.047702] ftrace: allocating 47928 entries in 188 pages
[ 0.064484] ftrace: allocated 188 pages with 5 groups
[ 0.065149] rcu: Hierarchical RCU implementation.
[ 0.065448] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=8.
[ 0.065873] Rude variant of Tasks RCU enabled.
[ 0.066157] Tracing variant of Tasks RCU enabled.
[ 0.066456] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.066930] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[ 0.070850] NR_IRQS: 524544, nr_irqs: 488, preallocated irqs: 16
[ 0.071549] random: crng done (trusting CPU's manufacturer)
[ 0.071979] Console: colour *CGA 80x25
[ 0.072283] printk: console [ttyS0] enabled
[ 0.072283] printk: console [ttyS0] enabled
[ 0.072969] printk: bootconsole [earlyser0] disabled
[ 0.072969] printk: bootconsole [earlyser0] disabled
[ 0.073921] APIC: Switch to symmetric I/O mode setup
[ 0.074351] Not enabling interrupt remapping due to skipped IO-APIC setup
[ 0.075319] kvm-guest: setup PV IPIs
[ 0.075970] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x396d566cf43, max_idle_ns: 881590760263 ns
[ 0.076947] Calibrating delay loop (skipped) preset value.. 3984.00 BogoMIPS (lpj=7968008)
[ 0.077665] pid_max: default: 32768 minimum: 301
[ 0.081003] LSM: Security Framework initializing
[ 0.081417] landlock: Up and running.
[ 0.081733] Yama: becoming mindful.
[ 0.082087] AppArmor: AppArmor initialized
[ 0.082481] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.083131] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Poking KASLR using RDRAND RDTSC...
[ 0.085044] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[ 0.085971] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[ 0.086434] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[ 0.086991] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.088961] Spectre V2 : Mitigation: Full generic retpoline
[ 0.089424] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.090121] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.090713] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.091429] Spectre V2 : User space: Mitigation: STIBP via seccomp and prctl
[ 0.092018] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.092950] SRBDS: Unknown: Dependent on hypervisor status
[ 0.093435] MDS: Mitigation: Clear CPU buffers
[ 0.101335] Freeing SMP alternatives memory: 40K
[ 0.318017] smpboot: CPU0: Intel 06/8e (family: 0x6, model: 0x8e, stepping: 0xb)
[ 0.319105] Performance Events: Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
[ 0.321782] ... version: 2
[ 0.322127] ... bit width: 48
[ 0.322481] ... generic registers: 4
[ 0.322819] ... value mask: 0000ffffffffffff
[ 0.323267] ... max period: 00007fffffffffff
[ 0.323719] ... fixed-purpose events: 3
[ 0.324941] ... event mask: 000000070000000f
[ 0.325598] rcu: Hierarchical SRCU implementation.
[ 0.327121] smp: Bringing up secondary CPUs ...
[ 0.327742] x86: Booting SMP configuration:
[ 0.328094] .... node #0, CPUs: #1
[ 0.009568] kvm-clock: cpu 1, msr 11c01041, secondary cpu clock
[ 0.329211] kvm-guest: setup async PF for cpu 1
[ 0.329667] kvm-guest: stealtime: cpu 1, msr 2ae73080
[ 0.330021] #2
[ 0.009568] kvm-clock: cpu 2, msr 11c01081, secondary cpu clock
[ 0.009568] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 2 APIC: 7
[ 0.331227] kvm-guest: setup async PF for cpu 2
[ 0.331227] kvm-guest: stealtime: cpu 2, msr 2aeb3080
[ 0.333172] #3
[ 0.009568] kvm-clock: cpu 3, msr 11c010c1, secondary cpu clock
[ 0.009568] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 3 APIC: 7
[ 0.334905] kvm-guest: setup async PF for cpu 3
[ 0.334905] kvm-guest: stealtime: cpu 3, msr 2aef3080
[ 0.334905] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 0.337190] #4
[ 0.009568] kvm-clock: cpu 4, msr 11c01101, secondary cpu clock
[ 0.009568] [Firmware Bug]: CPU4: APIC id mismatch. Firmware: 4 APIC: 1
[ 0.339458] kvm-guest: setup async PF for cpu 4
[ 0.339458] kvm-guest: stealtime: cpu 4, msr 2af33080
[ 0.341165] #5
[ 0.009568] kvm-clock: cpu 5, msr 11c01141, secondary cpu clock
[ 0.009568] [Firmware Bug]: CPU5: APIC id mismatch. Firmware: 5 APIC: 0
[ 0.343159] kvm-gu