Identifying ivshmem Devices Inside a QEMU Virtual Machine

ivshmem (Inter-VM shared memory device) is a special QEMU device for sharing memory between the host and a guest, or among multiple guests. It comes in two forms:

  • ivshmem-plain: a plain shared memory region
  • ivshmem-doorbell: shared memory plus an interrupt-based signaling mechanism

Inside the guest, the device appears as a PCI device, and the shared memory region is exposed as a PCI BAR. The ivshmem PCI device provides three BARs:

  • BAR0: device registers
  • BAR1: MSI-X table
  • BAR2: shared memory region

For plain shared memory, BAR2 alone is sufficient. Interrupt-based signaling additionally requires BAR0 and BAR1; this typically means writing a kernel driver in the guest to handle the interrupts, and on the host an ivshmem server must be started before the QEMU process, which then connects to the server's unix socket.

See the official documentation for details.

This article covers only the ivshmem-plain mode. After an ivshmem device is added on the host, how does an application inside the guest locate the corresponding device?

On Linux, the /sys/bus/pci/devices/ directory lists all PCI devices, ivshmem devices included. Every PCI device carries two identifiers: a vendor ID for the manufacturer and a device ID for the device type within that vendor. The ivshmem device has vendor ID 0x1af4 and device ID 0x1110; vendor and device IDs can be looked up in the PCI ID database.

An application in the guest can identify ivshmem devices by iterating over the entries in that directory and reading each device's vendor and device files.

But if two applications each need their own dedicated ivshmem device, how does a guest application tell which device it should use?

Since every PCI device is uniquely identified by its BDF (Bus, Device, Function), a simple approach is to reserve a fixed BDF address for each application. In a BDF address, the bus number occupies 8 bits, the device number 5 bits, and the function number 3 bits. For example, reserve the last two device addresses on bus pci0: 0000:00:1e.0 and 0000:00:1f.0.

Sometimes addresses cannot be reserved, and the ivshmem address may differ from VM to VM. In that case, the guest application can agree with the host-side application on a fixed signature to be written at the head of the shared memory region; the guest application then identifies the right device by reading the signature from the head of each region.

A previous article, <<QEMU monitor机制实例介绍>>, introduced QEMU's monitor mechanism. We can use it to add ivshmem devices dynamically.

First, list the VM's current PCI devices:

[root@controller ~]# virsh qemu-monitor-command --hmp 4 info pci
Bus 0, device 0, function 0:
Host bridge: PCI device 8086:1237
id ""
Bus 0, device 1, function 0:
ISA bridge: PCI device 8086:7000
id ""
Bus 0, device 1, function 1:
IDE controller: PCI device 8086:7010
BAR4: I/O at 0xc0a0 [0xc0af].
id ""
Bus 0, device 1, function 2:
USB controller: PCI device 8086:7020
IRQ 11.
BAR4: I/O at 0xc040 [0xc05f].
id "usb"
Bus 0, device 1, function 3:
Bridge: PCI device 8086:7113
IRQ 9.
id ""
Bus 0, device 2, function 0:
VGA controller: PCI device 1013:00b8
BAR0: 32 bit prefetchable memory at 0xfc000000 [0xfdffffff].
BAR1: 32 bit memory at 0xfebd0000 [0xfebd0fff].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0000fffe].
id "video0"
Bus 0, device 3, function 0:
Ethernet controller: PCI device 1af4:1000
IRQ 11.
BAR0: I/O at 0xc060 [0xc07f].
BAR1: 32 bit memory at 0xfebd1000 [0xfebd1fff].
BAR4: 64 bit prefetchable memory at 0xfe000000 [0xfe003fff].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
id "net0"
Bus 0, device 4, function 0:
SCSI controller: PCI device 1af4:1001
IRQ 11.
BAR0: I/O at 0xc000 [0xc03f].
BAR1: 32 bit memory at 0xfebd2000 [0xfebd2fff].
BAR4: 64 bit prefetchable memory at 0xfe004000 [0xfe007fff].
id "virtio-disk0"
Bus 0, device 5, function 0:
Class 0255: PCI device 1af4:1002
IRQ 10.
BAR0: I/O at 0xc080 [0xc09f].
BAR4: 64 bit prefetchable memory at 0xfe008000 [0xfe00bfff].
id "balloon0"

PCI addresses are in use up to 0000:00:05.0. We add two ivshmem devices at unused PCI addresses: 0000:00:10.0 with a size of 16M, and 0000:00:11.0 with a size of 8M:

virsh qemu-monitor-command --hmp 4 "object_add memory-backend-file,size=16M,share,mem-path=/dev/shm/shm1,id=shm1"
virsh qemu-monitor-command --hmp 4 "device_add ivshmem-plain,memdev=shm1,bus=pci.0,addr=0x10,master=on"
virsh qemu-monitor-command --hmp 4 "object_add memory-backend-file,size=8M,share,mem-path=/dev/shm/shm2,id=shm2"
virsh qemu-monitor-command --hmp 4 "device_add ivshmem-plain,memdev=shm2,bus=pci.0,addr=0x11,master=on"

Log in to the guest and list the PCI devices:

[root@host-172-16-0-29 ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:10.0 RAM memory: Red Hat, Inc. Inter-VM shared memory (rev 01)
00:11.0 RAM memory: Red Hat, Inc. Inter-VM shared memory (rev 01)

The devices also show up under /sys/bus/pci/devices/:

[root@host-172-16-0-29 ~]# ls -l /sys/bus/pci/devices/
total 0
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:01.0 -> ../../../devices/pci0000:00/0000:00:01.0
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:01.1 -> ../../../devices/pci0000:00/0000:00:01.1
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:01.2 -> ../../../devices/pci0000:00/0000:00:01.2
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:01.3 -> ../../../devices/pci0000:00/0000:00:01.3
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:02.0 -> ../../../devices/pci0000:00/0000:00:02.0
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:03.0 -> ../../../devices/pci0000:00/0000:00:03.0
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:04.0 -> ../../../devices/pci0000:00/0000:00:04.0
lrwxrwxrwx. 1 root root 0 Sep 12 08:39 0000:00:05.0 -> ../../../devices/pci0000:00/0000:00:05.0
lrwxrwxrwx. 1 root root 0 Sep 12 08:47 0000:00:10.0 -> ../../../devices/pci0000:00/0000:00:10.0
lrwxrwxrwx. 1 root root 0 Sep 12 08:47 0000:00:11.0 -> ../../../devices/pci0000:00/0000:00:11.0

Reading the vendor and device files under the two ivshmem device directories shows that both have vendor 0x1af4 and device 0x1110:

[root@host-172-16-0-29 ~]# cat /sys/bus/pci/devices/0000\:00\:10.0/vendor
0x1af4
[root@host-172-16-0-29 ~]# cat /sys/bus/pci/devices/0000\:00\:10.0/device
0x1110
[root@host-172-16-0-29 ~]# cat /sys/bus/pci/devices/0000\:00\:11.0/vendor
0x1af4
[root@host-172-16-0-29 ~]# cat /sys/bus/pci/devices/0000\:00\:11.0/device
0x1110

For comparison, another PCI device, 0000:00:05.0, has a device ID of 0x1002, not 0x1110:

[root@host-172-16-0-29 ~]# cat /sys/bus/pci/devices/0000\:00\:05.0/device
0x1002

The shared memory of an ivshmem device is exposed as the resource2 file in its device directory, which a guest application can mmap to read and write the region. Listing the two resource2 files confirms that 0000:00:10.0 is 16M and 0000:00:11.0 is 8M:

[root@host-172-16-0-29 ~]# ls -l /sys/bus/pci/devices/0000\:00\:10.0/resource2
-rw-------. 1 root root 16777216 9月 12 09:23 /sys/bus/pci/devices/0000:00:10.0/resource2
[root@host-172-16-0-29 ~]# ls -l /sys/bus/pci/devices/0000\:00\:11.0/resource2
-rw-------. 1 root root 8388608 9月 12 11:11 /sys/bus/pci/devices/0000:00:11.0/resource2

On the host, write a different 8-byte marker (7 characters plus the trailing newline) into each of the two shared memory regions:

echo "SIGN_01" > /dev/shm/shm1
echo "SIGN_02" > /dev/shm/shm2

In the guest, write a small program to read the first 8 bytes of the first ivshmem device. The C code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <assert.h>

#define SHM_SIZE (16 * 1024 * 1024)

int main(int argc, char **argv) {
    char *p;
    int fd;
    int i;

    fd = open("/sys/bus/pci/devices/0000:00:10.0/resource2", O_RDWR);
    assert(fd != -1);

    /* mmap returns MAP_FAILED, not NULL, on error */
    p = mmap(0, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    assert(p != MAP_FAILED);

    for (i = 0; i < 8; i++) {
        printf("%c", p[i]);
    }

    munmap(p, SHM_SIZE);
    close(fd);

    return 0;
}

Compile and run it in the guest; the marker written on the host appears:

[root@host-172-16-0-29 ~]# gcc t.c
[root@host-172-16-0-29 ~]# ./a.out
SIGN_01

In real production use, shared memory is rarely used this simply; considerably more elaborate data structures are built on top of it. For example, an offset-based ring queue can be laid out in the shared memory for bidirectional message passing.

Interrupt-based communication will be covered in a dedicated follow-up article.

References: