Inter-VM shared memory PCI device

This post describes an inter-VM shared memory device: shared memory is mapped into the guest as a PCI device, and interrupts between guests are supported by communicating over a Unix domain socket.  It also covers the device's configuration parameters, including the shared memory size and the interrupt mechanisms.

Support an inter-VM shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guests by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

Changes in this version include use of the qdev format and optional support
for MSI and ioeventfd/irqfd.

The non-interrupt version is supported by passing the shm parameter

    -device ivshmem,size=<size in MB>[,shm=<shm_name>]

which will simply map the shm object into a BAR.
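
For illustration only, a Linux guest application could reach the mapped BAR
through the PCI sysfs resource files.  The sketch below is hypothetical: the
PCI address, the resource index and the size are placeholders and depend on
the actual guest configuration.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical path: which slot and which BAR hold the shared memory
     * depend on the guest's PCI layout and the device configuration. */
    const char *bar = "/sys/bus/pci/devices/0000:00:04.0/resource2";
    size_t size = 1UL << 20;            /* must match the size= parameter */

    int fd = open(bar, O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    strcpy(mem, "hello from this guest");   /* visible to the other guests */

    munmap(mem, size);
    close(fd);
    return 0;
}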

Interrupts are supported between multiple VMs by using a shared memory server
that the guests connect to via a socket character device

    -device ivshmem,size=<size in MB>[,chardev=<chardev name>][,irqfd=on]
            [,msi=on][,nvectors=n]
    -chardev socket,path=<path>,id=<chardev name>
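
A concrete invocation might look like the following (the size, socket path,
chardev id and vector count are only examples):

    -device ivshmem,size=16,chardev=ivshm_chr,msi=on,nvectors=4
    -chardev socket,path=/tmp/ivshmem_socket,id=ivshm_chr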

The server passes file descriptors for the shared memory object and eventfds (our
interrupt mechanism) to the respective qemu instances.

When using interrupts, VMs communicate with a shared memory server that passes
the shared memory object file descriptor using SCM_RIGHTS.  The server assigns
each VM an ID number and sends this ID number to the Qemu process along with a
series of eventfd file descriptors, one per guest using the shared memory
server.  These eventfds will be used to send interrupts between guests.  Each
guest listens on the eventfd corresponding to its ID and may use the others
for sending interrupts to other guests.
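
For reference, the fd-passing side of this handshake uses standard SCM_RIGHTS
ancillary data.  A minimal sketch of receiving a single file descriptor over
the unix domain socket (not the actual server or qemu code) might look like:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receive one file descriptor (plus a small data payload, e.g. the VM ID)
 * over a connected unix domain socket.  Returns the fd, or -1 on error. */
static int recv_one_fd(int sock, void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}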

enum ivshmem_registers {
    IntrMask = 0,
    IntrStatus = 4,
    IVPosition = 8,
    Doorbell = 12
};

The first two registers are the interrupt mask and status registers.  Mask and
status are only used with pin-based interrupts.  They are unused with MSI
interrupts.  The IVPosition register is read-only and reports the guest's ID
number.  Interrupts are triggered when a message is received on the guest's
eventfd from another VM.  To trigger an event, a guest must write to another
guest's Doorbell.  The "Doorbells" begin at offset 12.  A particular guest's
doorbell offset in the MMIO region is equal to

guest_id * 32 + Doorbell

The doorbell register for each guest is 32 bits wide.  The doorbell-per-guest
design was motivated by its use with ioeventfd.
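
Assuming the register BAR has already been mapped into the guest at regs,
ringing another guest's doorbell is a single 32-bit write at that offset.
This is only a sketch; how the mapping and the peer ID are obtained is left
out:

#include <stddef.h>
#include <stdint.h>

enum { Doorbell = 12 };                      /* as in the register enum above */

/* regs points at the mmapped register BAR of the ivshmem device. */
static void ring_doorbell(volatile uint32_t *regs, int peer_id, uint32_t value)
{
    size_t offset = peer_id * 32 + Doorbell; /* each doorbell is 32 bits */

    /* With pin-based interrupts 'value' becomes the peer's status;
     * with MSI it selects the vector to raise (see below). */
    regs[offset / sizeof(uint32_t)] = value;
}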

The semantics of the value written to the doorbell depends on whether the
device is using MSI or a regular pin-based interrupt.

Regular Interrupts
------------------

If regular interrupts are used (either because the guest does not support MSI
or because the user disabled MSI on the command line), then the value written
to a guest's doorbell is the value that guest's status register will be set to.

A status of (2^32 - 1) indicates that a new guest has joined.  Guests
should not send a message with this value for any other reason.
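
A guest-side handler for the pin-based case therefore just reads the status
register and checks for the new-guest value.  A rough sketch, assuming the
register BAR is mapped at regs and that reading IntrStatus acknowledges the
interrupt:

#include <stdint.h>

#define IVSHMEM_NEW_GUEST 0xFFFFFFFFu   /* (2^32 - 1): a new guest joined */

static void ivshmem_pin_irq(volatile uint32_t *regs)
{
    uint32_t status = regs[1];          /* IntrStatus lives at offset 4 */

    if (status == 0)
        return;                         /* spurious / not for us */

    if (status == IVSHMEM_NEW_GUEST) {
        /* handle a newly joined guest */
    } else {
        /* 'status' is whatever the peer wrote to our doorbell */
    }
}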

Message Signalled Interrupts
----------------------------

The important thing to remember with MSI is that it is only a signal; no
status is set (since MSI interrupts are not shared).  All information other
than the interrupt itself should be communicated via the shared memory region.
MSI is on by default.  It can be turned off by passing msi=off to the device
parameter.

If the device uses MSI then the value written to the doorbell is the MSI vector
that will be raised.  Vector 0 is used to notify that a new guest has joined.
Vector 0 cannot be triggered by another guest since a value of 0 does not
trigger an eventfd.

ioeventfd/irqfd
---------------

ioeventfd/irqfd is turned on by passing irqfd=on to the device parameter (it
is off by default).  When using ioeventfd/irqfd, the only interrupt value that
can be passed to another guest is 1, regardless of what value is written to
the guest's Doorbell.

Sample programs, init scripts and the shared memory server are available in a
git repo here:

    www.gitorious.org/nahanni

Cam Macdonell (2):
  Support adding a file to qemu's ram allocation
  Inter-VM shared memory PCI device

 Makefile.target |    3 +
 cpu-common.h    |    1 +
 exec.c          |   33 +++
 hw/ivshmem.c    |  622 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-char.c     |    6 +
 qemu-char.h     |    3 +
 6 files changed, 668 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c
