vhost in qemu

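This article walks through the vhost architecture: the vhost userspace interface, the memory write log mechanism, initialization and cleanup of vhost_dev, the details of vhost_virtqueue initialization, how guest_notifier and host_notifier work, and the vhost_net initialization flow.

The userspace interface to vhost is defined in /usr/include/linux/vhost.h.

In the QEMU version described here, vhost supports only the tap network backend.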

vhost.h/vhost.c

------------------------------

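vhost memory write log mechanism

vhost uses a bitmap to record changes to guest physical memory. Each VHOST_LOG_PAGE (4K) bytes of memory is tracked by one bit: 0 means the page has not been modified, 1 means it has.

vhost_dev.log is an array of vhost_log_chunk_t (unsigned long), so one element holds VHOST_LOG_BITS (8 * sizeof(vhost_log_chunk_t)) bits and covers VHOST_LOG_CHUNK (VHOST_LOG_PAGE * VHOST_LOG_BITS) bytes of memory.

Therefore, logging a memory range of n bytes takes n / VHOST_LOG_CHUNK + 1 elements (integer division). Because the log is indexed by guest physical address, n is effectively the highest address that has to be covered.

vhost_get_log_size computes the required log size. As an implementation note, it could first find the largest address and then simply return (max_addr / VHOST_LOG_CHUNK + 1), which avoids a division on every loop iteration.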
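The single-division variant suggested above can be sketched as follows. This is a minimal illustration, not the QEMU function: struct mem_region and log_size_for are hypothetical names, and the macros are redefined locally from the definitions quoted in the text.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned long vhost_log_chunk_t;

#define VHOST_LOG_PAGE  0x1000
#define VHOST_LOG_BITS  (8 * sizeof(vhost_log_chunk_t))
#define VHOST_LOG_CHUNK (VHOST_LOG_PAGE * VHOST_LOG_BITS)

/* Hypothetical stand-in for one guest memory region. */
struct mem_region {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
};

/* Number of vhost_log_chunk_t elements needed to cover all regions. */
static uint64_t log_size_for(const struct mem_region *regions, size_t n)
{
    uint64_t max_last = 0;
    int any = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (regions[i].memory_size == 0) {
            continue; /* an empty region needs no log space */
        }
        /* Address of the last byte in this region. */
        uint64_t last = regions[i].guest_phys_addr +
                        regions[i].memory_size - 1;
        if (last > max_last) {
            max_last = last;
        }
        any = 1;
    }
    /* One division at the end instead of one per region. */
    return any ? max_last / VHOST_LOG_CHUNK + 1 : 0;
}
```

Note that the sketch divides the address of the last byte, so a single region that starts at 0 and is exactly VHOST_LOG_CHUNK bytes long needs one element, not two.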

------------------------------

public interface:

vhost_dev_init: initialize a vhost device

vhost_dev_cleanup: clean up a vhost device

vhost_dev_query

vhost_dev_start: start a vhost device

vhost_dev_stop: stop a vhost device

vhost_virtqueue_init (called by vhost_dev_start):

1. ioctl VHOST_SET_VRING_NUM sets the vring size
2. ioctl VHOST_SET_VRING_BASE sets the base index (VirtQueue.last_avail_idx)
3. Fill in the ring-related members of vhost_virtqueue (desc, avail, used_size, used_phys, used, ring_size, ring_phys, ring)
4. Call vhost_virtqueue_set_addr to set the ring addresses
5. ioctl VHOST_SET_VRING_KICK sets the kick fd (guest -> vhost) (VirtQueue.host_notifier.fd)
6. ioctl VHOST_SET_VRING_CALL sets the call fd (vhost -> guest) (VirtQueue.guest_notifier.fd)
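The ioctl sequence above can be sketched against the kernel header as shown below. The ioctl numbers and the vhost_vring_* structures come from linux/vhost.h; struct vq_setup and vq_setup_apply are hypothetical helpers, and the sketch assumes that VHOST_SET_VRING_ADDR is the ioctl issued in step 4. A real caller would also have to open the vhost device and set the owner, features and memory table first.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Hypothetical bundle of per-queue parameters; not a QEMU type. */
struct vq_setup {
    unsigned int index;      /* virtqueue index within the device */
    unsigned int num;        /* vring size (number of descriptors) */
    unsigned int last_avail; /* VirtQueue.last_avail_idx */
    void *desc;              /* userspace address of the descriptor table */
    void *avail;             /* userspace address of the avail ring */
    void *used;              /* userspace address of the used ring */
    int kick_fd;             /* host notifier eventfd (guest -> vhost) */
    int call_fd;             /* guest notifier eventfd (vhost -> guest) */
};

/* Returns 0 on success or a negative errno value from the failing ioctl. */
static int vq_setup_apply(int vhost_fd, const struct vq_setup *s)
{
    struct vhost_vring_state state = { .index = s->index, .num = s->num };
    if (ioctl(vhost_fd, VHOST_SET_VRING_NUM, &state) < 0) {
        return -errno;
    }

    state.num = s->last_avail;
    if (ioctl(vhost_fd, VHOST_SET_VRING_BASE, &state) < 0) {
        return -errno;
    }

    struct vhost_vring_addr addr = {
        .index = s->index,
        .desc_user_addr = (uint64_t)(uintptr_t)s->desc,
        .avail_user_addr = (uint64_t)(uintptr_t)s->avail,
        .used_user_addr = (uint64_t)(uintptr_t)s->used,
    };
    if (ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr) < 0) {
        return -errno;
    }

    struct vhost_vring_file file = { .index = s->index, .fd = s->kick_fd };
    if (ioctl(vhost_fd, VHOST_SET_VRING_KICK, &file) < 0) {
        return -errno;
    }

    file.fd = s->call_fd;
    if (ioctl(vhost_fd, VHOST_SET_VRING_CALL, &file) < 0) {
        return -errno;
    }
    return 0;
}
```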

Initialization of guest_notifier:

1. At the start of vhost_dev_start, vdev->binding->set_guest_notifiers(vdev->binding_opaque, true) is called. For a virtio PCI device this function is virtio_pci_set_guest_notifiers (virtio-pci.c).
2. virtio_pci_set_guest_notifiers calls virtio_pci_set_guest_notifier for every possible vq.
3. virtio_pci_set_guest_notifier first creates an eventfd with event_notifier_init, then calls qemu_set_fd_handler to add that eventfd to QEMU's list of selectable fds, with virtio_pci_guest_notifier_read as its read-poll handler.
4. msix_set_mask_notifier is called, with msix_mask_notifier_func set to virtio_pci_mask_notifier.
5. For every MSI-X entry of the device (0 - msix_entries_nr), msix_set_mask_notifier_for_vector is called.
6. If the vector is not masked, dev->msix_mask_notifier (that is, virtio_pci_mask_notifier) is called.
7. virtio_pci_mask_vq is called for every current VirtQueue.
8. kvm_set_irqfd is called to make the guest notifier fd the irqfd of that VirtQueue. For the KVM_IRQFD documentation see http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/78601
9. If that succeeds, qemu_set_fd_handler(event_notifier_get_fd(notifier), NULL, NULL, NULL) is called to stop selecting on the guest notifier fd.

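Using guest_notifier:

1. After vhost has processed a request and placed the buffer on the used ring, it writes to the call fd.
2. If the irqfd was set up successfully, KVM interrupts the guest directly. Otherwise the following path is taken:
3. QEMU detects the event through select (the vhost call fd is the guest_notifier of the corresponding vq in QEMU, which has already been added to the selectable fd list).
4. virtio_pci_guest_notifier_read is called to notify the guest.
5. The guest fetches the data from the used ring.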
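Both notifiers are eventfds, so the signal-then-read pattern in the fallback path can be illustrated from userspace. The sketch below is only an analogy under that assumption: notifier_signal stands in for vhost writing to the call fd, notifier_drain stands in for the read handler, and both names are hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Producer side: add 1 to the eventfd counter, which wakes up select(). */
static int notifier_signal(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consumer side: read and reset the counter.
 * Returns the number of signals that were pending, or 0 if there were
 * none (the fd must be non-blocking for the "none" case to return).
 */
static uint64_t notifier_drain(int fd)
{
    uint64_t count = 0;

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

With an fd created by eventfd(0, EFD_NONBLOCK), several signals issued before the consumer runs are coalesced into one read, which is why the handler has to inspect the used ring rather than count wakeups.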

Initialization of host_notifier:

1. vhost_virtqueue_init calls vdev->binding->set_host_notifier(vdev->binding_opaque, idx, true). For a virtio PCI device this function is virtio_pci_set_host_notifier (virtio-pci.c).
2. virtio_pci_set_host_notifier calls virtio_pci_set_host_notifier_internal.
3. virtio_pci_set_host_notifier_internal first creates an eventfd with event_notifier_init, then calls kvm_set_ioeventfd_pio_word.
4. kvm_set_ioeventfd_pio_word sets the kick fd by calling kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick). For the KVM_IOEVENTFD documentation see http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/73076
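The kick argument passed to KVM_IOEVENTFD can be sketched as below. struct kvm_ioeventfd and the flag names come from linux/kvm.h. The helper make_pio_word_kick and its parameter names are hypothetical. The sketch assumes, from the function name, a 2-byte port I/O write matched against the queue index.

```c
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

/*
 * Build the argument for ioctl(vm_fd, KVM_IOEVENTFD, &kick).
 * port, queue_index and assign are illustrative parameter names.
 */
static struct kvm_ioeventfd make_pio_word_kick(uint16_t port,
                                               uint16_t queue_index,
                                               int fd, int assign)
{
    struct kvm_ioeventfd kick;

    memset(&kick, 0, sizeof(kick));
    kick.datamatch = queue_index; /* fire only when this value is written */
    kick.addr = port;             /* I/O port the guest writes to */
    kick.len = 2;                 /* "word": a 2-byte write */
    kick.fd = fd;                 /* eventfd to signal (the kick fd) */
    kick.flags = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH;
    if (!assign) {
        kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; /* remove the binding */
    }
    return kick;
}
```

Once such a binding is in place, a matching guest write signals the eventfd inside the kernel, so the kick can reach vhost without a round trip through QEMU userspace.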

vhost_dev_init calls cpu_register_phys_memory_client to register a physical memory client whose three hooks are vhost_client_set_memory, vhost_client_sync_dirty_bitmap and vhost_client_migration_log.

The CPUPhysMemoryClient structure is used to get notified when guest physical memory changes:

typedef struct CPUPhysMemoryClient CPUPhysMemoryClient;

struct CPUPhysMemoryClient {
    void (*set_memory)(struct CPUPhysMemoryClient *client,
                       target_phys_addr_t start_addr,
                       ram_addr_t size,
                       ram_addr_t phys_offset);
    int (*sync_dirty_bitmap)(struct CPUPhysMemoryClient *client,
                             target_phys_addr_t start_addr,
                             target_phys_addr_t end_addr);
    int (*migration_log)(struct CPUPhysMemoryClient *client,
                         int enable);
    QLIST_ENTRY(CPUPhysMemoryClient) list;
};

set_memory is called when a guest physical memory region is added.

sync_dirty_bitmap syncs back the dirty bitmap for the given memory range.

migration_log enables or disables the migration log.
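The callback pattern can be tried out in isolation with the simplified sketch below. It is not QEMU code: the address types are assumed to be 64-bit, a plain next pointer replaces QLIST_ENTRY, and every demo_* name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for QEMU's address types (assumed 64-bit here). */
typedef uint64_t target_phys_addr_t;
typedef uint64_t ram_addr_t;

struct CPUPhysMemoryClient {
    void (*set_memory)(struct CPUPhysMemoryClient *client,
                       target_phys_addr_t start_addr,
                       ram_addr_t size,
                       ram_addr_t phys_offset);
    int (*sync_dirty_bitmap)(struct CPUPhysMemoryClient *client,
                             target_phys_addr_t start_addr,
                             target_phys_addr_t end_addr);
    int (*migration_log)(struct CPUPhysMemoryClient *client, int enable);
    struct CPUPhysMemoryClient *next; /* stand-in for QLIST_ENTRY */
};

static struct CPUPhysMemoryClient *demo_clients;

/* Hypothetical registration helper: push the client onto a global list. */
static void demo_register_client(struct CPUPhysMemoryClient *c)
{
    c->next = demo_clients;
    demo_clients = c;
}

/* Hypothetical notifier: tell every client that a region was added. */
static void demo_notify_set_memory(target_phys_addr_t start, ram_addr_t size,
                                   ram_addr_t phys_offset)
{
    struct CPUPhysMemoryClient *c;

    for (c = demo_clients; c != NULL; c = c->next) {
        c->set_memory(c, start, size, phys_offset);
    }
}

/* A device embeds the client so a hook can recover its own state. */
struct demo_dev {
    struct CPUPhysMemoryClient client; /* must stay the first member */
    int regions_seen;
    ram_addr_t bytes_seen;
};

static void demo_set_memory(struct CPUPhysMemoryClient *client,
                            target_phys_addr_t start_addr,
                            ram_addr_t size, ram_addr_t phys_offset)
{
    struct demo_dev *dev = (struct demo_dev *)client;

    (void)start_addr;
    (void)phys_offset;
    dev->regions_seen++;
    dev->bytes_seen += size;
}

static int demo_sync_dirty_bitmap(struct CPUPhysMemoryClient *client,
                                  target_phys_addr_t start_addr,
                                  target_phys_addr_t end_addr)
{
    (void)client;
    (void)start_addr;
    (void)end_addr;
    return 0; /* nothing to sync in this sketch */
}

static int demo_migration_log(struct CPUPhysMemoryClient *client, int enable)
{
    (void)client;
    (void)enable;
    return 0;
}
```

Embedding the client structure inside the device structure is what lets each hook get from the generic client pointer back to the device-specific state.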

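vhost_client_set_memory:

1. Call qemu_realloc to grow the vhost_memory region array by one entry.
2. Call vhost_dev_unassign_memory to make sure the current region does not overlap any existing region.
3. If the region is RAM, call vhost_dev_assign_memory to add it.
4. If vhost has not been started yet, return immediately.
5. Call vhost_verify_ring_mappings to verify that the change does not disturb the VirtQueue rings: any ring that overlaps the changed range must still be mapped at the same address.
6. If the memory write log is disabled, call ioctl VHOST_SET_MEM_TABLE to set the memory regions and return.
7. If the log size grows, first call vhost_dev_log_resize to set the log size, then call ioctl VHOST_SET_MEM_TABLE to set the memory regions.
8. If the log size shrinks, first call ioctl VHOST_SET_MEM_TABLE to set the memory regions, then call vhost_dev_log_resize to set the log size.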
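The ordering in steps 6 to 8 keeps the log at least as large as the memory table currently in force requires. The sketch below only records which operation would run first. enum op, struct op_trace and update_mem_table are hypothetical names, and the real ioctl and resize calls are replaced by trace entries.

```c
#include <stdint.h>

enum op { OP_LOG_RESIZE, OP_SET_MEM_TABLE };

/* Records the operations in the order they would be issued. */
struct op_trace {
    enum op ops[2];
    int n;
};

static void update_mem_table(int log_enabled, uint64_t old_log_size,
                             uint64_t new_log_size, struct op_trace *t)
{
    t->n = 0;
    if (!log_enabled) {
        /* No log to keep consistent: just install the new table. */
        t->ops[t->n++] = OP_SET_MEM_TABLE;
        return;
    }
    if (new_log_size > old_log_size) {
        /* To log more, grow the log before the table takes effect. */
        t->ops[t->n++] = OP_LOG_RESIZE;
    }
    t->ops[t->n++] = OP_SET_MEM_TABLE;
    if (new_log_size < old_log_size) {
        /* To log less, shrink only after the smaller table is in use. */
        t->ops[t->n++] = OP_LOG_RESIZE;
    }
}
```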

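vhost_client_sync_dirty_bitmap (syncs the dirty bitmap for the given memory range):

1. If the memory write log is disabled or vhost has not been started, return immediately.
2. Call vhost_dev_sync_region for every memory region of the vhost_dev.
3. Call vhost_dev_sync_region for the memory backing the used ring of every vhost_virtqueue of the vhost_dev.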
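What syncing one range of the log might look like is sketched below, under the bitmap layout described earlier (one bit per VHOST_LOG_PAGE, one array element per VHOST_LOG_CHUNK). It is an illustration rather than the body of vhost_dev_sync_region: log_sync_range, dirty_fn, struct page_list and collect_page are hypothetical, and the fetch-and-clear is not atomic here.

```c
#include <stdint.h>

typedef unsigned long vhost_log_chunk_t;

#define VHOST_LOG_PAGE  0x1000
#define VHOST_LOG_BITS  (8 * sizeof(vhost_log_chunk_t))
#define VHOST_LOG_CHUNK (VHOST_LOG_PAGE * VHOST_LOG_BITS)

typedef void (*dirty_fn)(uint64_t page_addr, void *opaque);

/*
 * Walk the log elements covering [first, last], report every dirty page
 * to mark_dirty, and clear the bits that were consumed.
 */
static void log_sync_range(vhost_log_chunk_t *log, uint64_t first,
                           uint64_t last, dirty_fn mark_dirty, void *opaque)
{
    uint64_t i;

    for (i = first / VHOST_LOG_CHUNK; i <= last / VHOST_LOG_CHUNK; ++i) {
        vhost_log_chunk_t bits = log[i];
        uint64_t base = i * VHOST_LOG_CHUNK;
        unsigned int bit;

        /* A real implementation would fetch and clear atomically,
         * because the kernel may set bits concurrently. */
        log[i] = 0;
        for (bit = 0; bits != 0; ++bit, bits >>= 1) {
            if (bits & 1) {
                mark_dirty(base + (uint64_t)bit * VHOST_LOG_PAGE, opaque);
            }
        }
    }
}

/* Small helper for trying the function out: collects reported pages. */
struct page_list {
    uint64_t addr[8];
    int n;
};

static void collect_page(uint64_t page_addr, void *opaque)
{
    struct page_list *l = opaque;

    if (l->n < 8) {
        l->addr[l->n++] = page_addr;
    }
}
```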

-----------------------------------------------------------------------------------------------

vhost_net.h/vhost_net.c

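Every virtual NIC has a corresponding vhost_net, and every vhost_net contains a vhost_dev.

The code is simple: it mostly wraps the vhost_dev operations and adds some control over the tap device and over features.

vhost_net is enabled with the vhost=on option of -netdev tap,... on the command line. Its initialization flow is:

1. As described in the article "Initialization of the qemu network backend", that process ends by calling net_client_types[i].init. For tap, this function is net_init_tap.
2. net_init_tap checks whether vhost=on was given. If so, it calls vhost_net_init and stores the returned vhost_net in TAPState.vhost_net.
3. When the guest initializes the device and changes the device status, virtio_net_set_status in virtio-net.c is called. If device initialization succeeded ((status & VIRTIO_CONFIG_S_DRIVER_OK) != 0), it calls vhost_net_start to start vhost_net.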
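The DRIVER_OK test in step 3 can be sketched as below. The parentheses around the mask matter: in C, != binds more tightly than &, so without them the status would be ANDed with the result of the comparison instead of with the flag. The value 4 for VIRTIO_CONFIG_S_DRIVER_OK is the one defined in linux/virtio_config.h and is redefined locally to keep the sketch self-contained. driver_ready is a hypothetical helper name.

```c
#include <stdint.h>

/* Device status bit set by the guest driver once it is fully set up. */
#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Returns 1 when the guest driver has finished initializing the device. */
static int driver_ready(uint8_t status)
{
    return (status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}
```

For example, a status of 0x03 (ACKNOWLEDGE | DRIVER) is not ready yet, while 0x07 (with DRIVER_OK added) is.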

References:

http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html
