* Re: [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK
2020-08-14 19:16 [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK Chenbo Xia
@ 2020-08-14 15:00 ` Stephen Hemminger
2020-08-17 2:58 ` Xia, Chenbo
2020-08-14 19:16 ` [dpdk-dev] [RFC v1 1/2] vfio_user: Add library for vfio over socket Chenbo Xia
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2020-08-14 15:00 UTC
To: Chenbo Xia
Cc: dev, thomas, xuan.ding, xiuchun.lu, cunming.liang, changpeng.liu,
zhihong.wang
On Fri, 14 Aug 2020 19:16:04 +0000
Chenbo Xia <chenbo.xia@intel.com> wrote:
> This series enables DPDK to serve as an alternative I/O device emulation library
> for building virtualized devices in separate processes outside QEMU. It introduces
> a new library (librte_vfio_user), a new device class (emudev) and one pilot
> device provider (avf_emudev) with its ethdev PMD backend (avfbe_ethdev).
>
> *librte_vfio_user* is a server implementation of VFIO-over-socket[1] (also
> known as vfio-user), a protocol that allows a device to be virtualized
> in a separate process outside of QEMU.
>
> *emudev* is a device type for emulated devices. It is up to the device provider
> to choose the transport. In the avf_emudev case, vfio-user is the transport used
> to communicate with its client (e.g., QEMU).
>
> *avf_emudev* is the emudev provider of AVF, a device specification for
> Intel Virtual Functions across generations. It is implemented by an AVF emudev
> driver, which offers a few APIs for avfbe_ethdev or application logic to call.
>
> *avfbe_ethdev* is a normal ethdev PMD that supplies the basic I/O as the backend
> data path of avf_emudev. One simple use of avfbe_ethdev is as a para-virtualized
> backend connected to network application logic.
>
> Background & Motivation
> -----------------------
> To reduce the attack surface, the QEMU community is disaggregating QEMU by
> moving part of device emulation out of it. The disaggregated/multi-process QEMU
> uses VFIO-over-socket/vfio-user as the main transport mechanism to disaggregate
> I/O services from QEMU[2]. Vfio-user essentially implements the VFIO device model
> presented to the user process as a set of messages over a unix-domain socket. The
> main difference between an application using vfio-user and one using the vfio
> kernel module is that device manipulation is based on socket messages for vfio-user
> but on system calls for the vfio kernel module. A vfio-user device consists of a generic
> VFIO device type, living in QEMU, which is called the client[3], and the core device
> implementation (emulated device), living outside of QEMU, which is called the server.
>
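A minimal sketch of that difference, assuming a heavily simplified message
layout: where a user of the vfio kernel module would issue
ioctl(VFIO_DEVICE_GET_INFO), a vfio-user client sends a request over the
socket and the server answers with a reply message. The demo_* names and
struct demo_msg are hypothetical stand-ins loosely modeled on VFIO_USER_MSG
in patch 1/2; this is not the vfio-user wire format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins modeled on the RFC's VFIO_USER_MSG header. */
#define DEMO_CMD_DEVICE_GET_INFO 4u   /* same value as VFIO_USER_DEVICE_GET_INFO */
#define DEMO_REPLY_MASK (0x1u << 0)   /* same bit as VFIO_USER_REPLY_MASK */

struct demo_dev_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t num_regions;
	uint32_t num_irqs;
};

struct demo_msg {
	uint16_t dev_id;
	uint16_t msg_id;
	uint32_t cmd;
	uint32_t size;
	uint32_t flags;
	struct demo_dev_info dev_info;
};

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int demo_read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int demo_write_full(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Client: send a GET_INFO request instead of calling ioctl(). */
static int demo_client_send_get_info(int fd, uint16_t msg_id)
{
	struct demo_msg msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_id = msg_id;
	msg.cmd = DEMO_CMD_DEVICE_GET_INFO;
	msg.size = sizeof(msg);
	return demo_write_full(fd, &msg, sizeof(msg));
}

/* Server: read one request, fill in the device info, send the reply. */
static int demo_server_handle_one(int fd, const struct demo_dev_info *info)
{
	struct demo_msg msg;

	assert(info != NULL);
	if (demo_read_full(fd, &msg, sizeof(msg)) < 0)
		return -1;
	if (msg.cmd != DEMO_CMD_DEVICE_GET_INFO)
		return -1;
	msg.flags |= DEMO_REPLY_MASK;
	msg.dev_info = *info;
	return demo_write_full(fd, &msg, sizeof(msg));
}

/* Client: read the reply and check that it answers our request. */
static int demo_client_recv_info(int fd, uint16_t msg_id,
				 struct demo_dev_info *out)
{
	struct demo_msg msg;

	assert(out != NULL);
	if (demo_read_full(fd, &msg, sizeof(msg)) < 0)
		return -1;
	if (msg.msg_id != msg_id || !(msg.flags & DEMO_REPLY_MASK))
		return -1;
	*out = msg.dev_info;
	return 0;
}
```

With a socketpair(AF_UNIX, SOCK_STREAM, 0, sv), the client end can send the
request, the server end can handle it, and the client then reads back the
region and IRQ counts without any device ioctl being involved.
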
> With the introduction of vfio-user, QEMU is explicitly adding support for
> external emulated devices and data paths. We want to leverage that by
> introducing vfio-user support in DPDK. By doing so, DPDK can serve as an
> alternative I/O device emulation library for building virtualized devices along with
> a high-performance data path in separate processes outside QEMU. Hardware vendors
> will be able to provide virtualized solutions for their hardware devices by
> implementing emulated devices in DPDK.
>
> Besides introducing vfio-user in DPDK, this series also introduces the first
> emulated device implementation: an emulated AVF device (avf_emudev) implemented
> by the AVF emulation driver (avf_emudev driver). The emulated AVF device demonstrates
> how an emulated device could be implemented in DPDK. SPDK is also investigating a
> use case for NVMe.
>
> Design overview
> ---------------
>
>                     +------------------------------------------------------+
>                     |   +---------------+      +---------------+           |
>                     |   |  avf_emudev   |      |  avfbe_ethdev |           |
>                     |   |    driver     |      |     driver    |           |
>                     |   +---------------+      +---------------+           |
>                     |           |                      |                   |
>                     | ------------------------------------------- VDEV BUS |
>                     |           |                      |                   |
>                     |   +---------------+       +--------------+           |
> +--------------+    |   | vdev:         |       | vdev:        |           |
> | +----------+ |    |   | /path/to/vfio |       | avf_emudev_# |           |
> | | Generic  | |    |   +---------------+       +--------------+           |
> | | vfio-dev | |    |           |                                          |
> | +----------+ |    |           |                                          |
> | +----------+ |    |      +----------+                                    |
> | | vfio-user| |    |      | vfio-user|                                    |
> | | client   | |<---|----->| server   |                                    |
> | +----------+ |    |      +----------+                                    |
> | QEMU         |    | DPDK                                                 |
> +--------------+    +------------------------------------------------------+
>
> - vfio-user. Vfio-user in DPDK refers to the vfio-user protocol implementation
> playing the server role. It provides the transport between the emulated device and
> the generic VFIO device in QEMU. The emulated device in DPDK and the generic VFIO
> device in QEMU work together to present the VFIO device model to the VM. This series
> introduces the vfio-user implementation as a library called librte_vfio_user, which
> lives under lib/librte_vfio_user.
>
> - vdev:/path/to/vfio. It defines the emudev device and binds to the vdev bus driver. The
> emudev device is defined by DPDK applications on the command line as
> '--vdev=emu_iavf,path=/path/to/socket,id=#' in the avf_emudev case. The parameters are the
> device name (emu_iavf), which identifies the corresponding driver (in this case, the
> avf_emudev driver, which implements the emudev device of AVF); path=/path/to/socket, which
> is used to open the transport interface to the vfio-user client in QEMU; and id, which is
> the index of the emudev device.
>
> - avf_emudev driver. It implements the emulated AVF device, which is the emudev provider of
> AVF. The avf_emudev driver implements the emudev device APIs that avfbe_ethdev_pmd or
> application logic call to operate the device. These APIs are described in
> lib/librte_emudev/rte_emudev.h.
>
> - vdev: avf_emudev_#. The vdev device is defined by the DPDK application on the command line
> as '--vdev=net_avfbe,id=#,avf_emu_id=#'. It is associated with the emudev provider of AVF by
> 'avf_emu_id=#'.
>
> - avfbe_ethdev driver. It is a normal ethdev PMD that supplies the basic I/O as the backend
> data path of avf_emudev.
>
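A minimal sketch of how the emu_iavf arguments might be parsed, assuming the
driver receives the part after the device name as a comma-separated key=value
string with no spaces (for example "path=/tmp/vfio.sock,id=0"). The
demo_emu_args struct and demo_parse_emu_args() are hypothetical; a real DPDK
driver would more likely rely on the rte_kvargs library than on hand-written
parsing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two emu_iavf parameters. */
struct demo_emu_args {
	char path[108];	/* sized like sun_path of a Unix domain socket on Linux */
	int id;		/* index of the emudev device */
};

/* Parse "path=<socket>,id=<n>"; returns 0 on success, -1 on malformed input. */
static int demo_parse_emu_args(const char *args, struct demo_emu_args *out)
{
	char buf[256];
	char *tok;
	int have_path = 0, have_id = 0;

	assert(out != NULL);
	if (args == NULL || strlen(args) >= sizeof(buf))
		return -1;
	strcpy(buf, args);

	tok = buf;
	while (tok != NULL) {
		/* Split off the next comma-separated token. */
		char *next = strchr(tok, ',');
		char *eq;

		if (next != NULL)
			*next++ = '\0';

		/* Split the token into key and value. */
		eq = strchr(tok, '=');
		if (eq == NULL)
			return -1;
		*eq++ = '\0';

		if (strcmp(tok, "path") == 0) {
			if (*eq == '\0' || strlen(eq) >= sizeof(out->path))
				return -1;
			strcpy(out->path, eq);
			have_path = 1;
		} else if (strcmp(tok, "id") == 0) {
			char *end;
			long v = strtol(eq, &end, 10);

			if (*eq == '\0' || *end != '\0' || v < 0 || v > 65535)
				return -1;
			out->id = (int)v;
			have_id = 1;
		} else {
			return -1;	/* unknown key */
		}
		tok = next;
	}
	/* Both parameters are treated as mandatory in this sketch. */
	return (have_path && have_id) ? 0 : -1;
}
```

Rejecting unknown keys and requiring both parameters is a choice made for the
sketch; the cover letter does not say which parameters are optional.
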
> Why not rawdev for emulated device
> ----------------------------------
> Instead of introducing the new emudev class, an emulated device could be presented as a
> rawdev. However, the existing rawdev APIs cannot meet the requirements of an emulated
> device. There are three API categories for emudev: device lifecycle management,
> backend-facing APIs, and device-provider-facing APIs. The existing rawdev APIs
> cover only lifecycle management and some of the backend-facing APIs. The other APIs,
> even if added to rawdev, would not be needed by other rawdev applications.
>
> References
> ----------
> [1]: https://patchew.org/QEMU/1594913503-52271-1-git-send-email-thanos.makatos@nutanix.com/
> [2]: https://wiki.qemu.org/Features/MultiProcessQEMU
> [3]: https://github.com/elmarco/qemu/blob/wip/vfio-user/hw/vfio/libvfio-user.c
>
> Chenbo Xia (2):
> vfio_user: Add library for vfio over socket
> emudev: Add library for emulated device
>
> lib/librte_emudev/rte_emudev.h | 315 +++++++++++++++++++++++++
> lib/librte_vfio_user/rte_vfio_user.h | 335 +++++++++++++++++++++++++++
> 2 files changed, 650 insertions(+)
> create mode 100644 lib/librte_emudev/rte_emudev.h
> create mode 100644 lib/librte_vfio_user/rte_vfio_user.h
>
This looks good, but it would be good to have an example or a way to integrate
it into a test framework. One of the agreed-upon principles of the tech board
is "all new features should have test cases". There have been a lot of exceptions though.
* Re: [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK
From: Xia, Chenbo @ 2020-08-17 2:58 UTC
To: Stephen Hemminger
Cc: dev, thomas, Ding, Xuan, Lu, Xiuchun, Liang, Cunming, Liu,
Changpeng, Wang, Zhihong
Hi Stephen,
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, August 14, 2020 11:00 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; thomas@monjalon.net; Ding, Xuan <xuan.ding@intel.com>;
> Lu, Xiuchun <xiuchun.lu@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>; Liu, Changpeng <changpeng.liu@intel.com>; Wang,
> Zhihong <zhihong.wang@intel.com>
> Subject: Re: [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK
>
> This looks good, but it would be good to have an example or a way to integrate
> it into a test framework. One of the agreed-upon principles of the tech board
> is "all new features should have test cases". There have been a lot of
> exceptions though.
Thanks a lot for spending time on the review!
Since our example is an ethdev PMD driving an emulated vfio-user device, it should be
easy for an application like testpmd to test it once a vfio-user client is ready (in QEMU
or elsewhere). And yes, we will make sure the libs are integrated into a test framework 😊.
Thanks!
Chenbo
* [dpdk-dev] [RFC v1 1/2] vfio_user: Add library for vfio over socket
2020-08-14 19:16 [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK Chenbo Xia
2020-08-14 15:00 ` Stephen Hemminger
@ 2020-08-14 19:16 ` Chenbo Xia
2020-08-14 19:16 ` [dpdk-dev] [RFC v1 2/2] emudev: Add library for emulated device Chenbo Xia
2020-09-02 21:10 ` [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK Thomas Monjalon
3 siblings, 0 replies; 7+ messages in thread
From: Chenbo Xia @ 2020-08-14 19:16 UTC
To: dev, thomas, xuan.ding, xiuchun.lu, cunming.liang, changpeng.liu
Cc: zhihong.wang
Vfio-over-socket, also known as vfio-user, is a protocol for
emulating devices in a separate process outside of QEMU. The
main difference between an application using vfio-user and one
using the vfio kernel module is that device manipulation is based
on socket messages for vfio-user but on system calls for the vfio
kernel module.
This protocol has a server/client model, and for now QEMU plays
the role of the client. This patch implements the vfio-user server
side of the protocol in DPDK.
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Signed-off-by: Xiuchun Lu <xiuchun.lu@intel.com>
---
lib/librte_vfio_user/rte_vfio_user.h | 335 +++++++++++++++++++++++++++
1 file changed, 335 insertions(+)
create mode 100644 lib/librte_vfio_user/rte_vfio_user.h
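A minimal sketch of how a device provider might drive the API declared below.
The declarations are copied from this patch; the function bodies are stub
stand-ins written only so the sketch compiles on its own, since the RFC adds
the header but no implementation. The call order (register, set device info,
start, then stop and unregister) is an assumption and is not stated by the
RFC. The demo_* names and the region and IRQ counts are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Declarations copied from the RFC header (subset). */
struct vfio_user_dev_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t num_regions;
	uint32_t num_irqs;
};

struct vfio_user_notify_ops {
	int (*new_device)(int dev_id);
	void (*destroy_device)(int dev_id);
	int (*update_status)(int dev_id);
};

typedef void (*vfio_user_log)(const char *format, ...);

/* Stub stand-ins: a one-slot fake that only tracks state. */
static struct {
	char sock_addr[108];
	const struct vfio_user_notify_ops *ops;
	struct vfio_user_dev_info info;
	int registered, has_info, started;
} demo_slot;

static int demo_match(const char *sock_addr)
{
	return demo_slot.registered && sock_addr != NULL &&
	       strcmp(demo_slot.sock_addr, sock_addr) == 0;
}

static int rte_vfio_user_register(const char *sock_addr,
	const struct vfio_user_notify_ops *ops, vfio_user_log log)
{
	(void)log;
	if (demo_slot.registered || sock_addr == NULL || ops == NULL ||
	    strlen(sock_addr) >= sizeof(demo_slot.sock_addr))
		return -1;
	strcpy(demo_slot.sock_addr, sock_addr);
	demo_slot.ops = ops;
	demo_slot.registered = 1;
	return 0;
}

static int rte_vfio_user_set_dev_info(const char *sock_addr,
	struct vfio_user_dev_info *dev_info)
{
	if (!demo_match(sock_addr) || dev_info == NULL)
		return -1;
	demo_slot.info = *dev_info;
	demo_slot.has_info = 1;
	return 0;
}

static int rte_vfio_user_start(const char *sock_addr)
{
	if (!demo_match(sock_addr) || !demo_slot.has_info || demo_slot.started)
		return -1;
	demo_slot.started = 1;
	return 0;
}

static int rte_vfio_user_stop(const char *sock_addr)
{
	if (!demo_match(sock_addr) || !demo_slot.started)
		return -1;
	demo_slot.started = 0;
	return 0;
}

static int rte_vfio_user_unregister(const char *sock_addr)
{
	if (!demo_match(sock_addr) || demo_slot.started)
		return -1;
	memset(&demo_slot, 0, sizeof(demo_slot));
	return 0;
}

/* Provider-side flow (hypothetical). */
static int demo_new_device(int dev_id) { (void)dev_id; return 0; }
static void demo_destroy_device(int dev_id) { (void)dev_id; }
static int demo_update_status(int dev_id) { (void)dev_id; return 0; }

static const struct vfio_user_notify_ops demo_ops = {
	.new_device = demo_new_device,
	.destroy_device = demo_destroy_device,
	.update_status = demo_update_status,
};

static int demo_bring_up(const char *sock_addr)
{
	struct vfio_user_dev_info info = {
		.argsz = sizeof(info), .num_regions = 9, .num_irqs = 5,
	};

	if (rte_vfio_user_register(sock_addr, &demo_ops, NULL) < 0)
		return -1;
	if (rte_vfio_user_set_dev_info(sock_addr, &info) < 0 ||
	    rte_vfio_user_start(sock_addr) < 0) {
		rte_vfio_user_unregister(sock_addr);	/* undo on failure */
		return -1;
	}
	return 0;
}

static int demo_tear_down(const char *sock_addr)
{
	if (rte_vfio_user_stop(sock_addr) < 0)
		return -1;
	return rte_vfio_user_unregister(sock_addr);
}
```

Because every call is keyed by the socket path, the provider in this sketch
never needs to hold a device handle; whether the real library enforces the
same ordering is left open by the RFC.
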
diff --git a/lib/librte_vfio_user/rte_vfio_user.h b/lib/librte_vfio_user/rte_vfio_user.h
new file mode 100644
index 000000000..d36516084
--- /dev/null
+++ b/lib/librte_vfio_user/rte_vfio_user.h
@@ -0,0 +1,335 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#ifndef _VFIO_USER_H
+#define _VFIO_USER_H
+
+#include <stdint.h>
+#include <stddef.h>
+#include <linux/vfio.h>
+#include <net/if.h>
+#include <sys/queue.h>
+#include <sys/un.h>
+
+#define VFIO_USER_MSG_MAX_NREGIONS 8
+#define VFIO_USER_MAX_MEM_REGIONS 256
+#define VFIO_MAX_RW_DATA 256
+#define VFIO_USER_MAX_FD 64
+#define VFIO_USER_IRQ_MAX_DATA 64
+#define VFIO_USER_MAX_IRQ_FD 64
+
+typedef enum VFIO_USER_CMD_TYPE {
+ VFIO_USER_NONE = 0,
+ VFIO_USER_VERSION = 1,
+ VFIO_USER_DMA_MAP = 2,
+ VFIO_USER_DMA_UNMAP = 3,
+ VFIO_USER_DEVICE_GET_INFO = 4,
+ VFIO_USER_DEVICE_GET_REGION_INFO = 5,
+ VFIO_USER_DEVICE_GET_IRQ_INFO = 6,
+ VFIO_USER_DEVICE_SET_IRQS = 7,
+ VFIO_USER_REGION_READ = 8,
+ VFIO_USER_REGION_WRITE = 9,
+ VFIO_USER_DMA_READ = 10,
+ VFIO_USER_DMA_WRITE = 11,
+ VFIO_USER_VM_INTERRUPT = 12,
+ VFIO_USER_DEVICE_RESET = 13,
+ VFIO_USER_MAX = 14,
+} VFIO_USER_CMD_TYPE;
+
+struct vfio_user_mem_reg {
+ uint64_t gpa;
+ uint64_t size;
+ uint64_t fd_offset;
+ uint32_t protection; /* attributes in <sys/mman.h> */
+#define VFIO_USER_MEM_MAPPABLE (0x1 << 0)
+ uint32_t flags;
+};
+
+struct vfio_user_dev_info {
+ uint32_t argsz; /* Reserved in vfio-user */
+ uint32_t flags;
+ uint32_t num_regions;
+ uint32_t num_irqs;
+};
+
+struct vfio_user_reg_rw {
+ uint64_t reg_offset;
+ uint32_t reg_idx;
+ uint32_t size;
+ char data[VFIO_MAX_RW_DATA];
+};
+
+struct vfio_user_dma_rw {
+ uint64_t addr;
+ uint32_t size;
+ char data[VFIO_MAX_RW_DATA];
+};
+
+struct vfio_user_intr {
+ uint32_t type;
+ uint32_t vector;
+};
+
+typedef struct vfio_user_msg {
+ uint16_t dev_id;
+ uint16_t msg_id;
+ uint32_t cmd;
+ uint32_t size;
+#define VFIO_USER_REPLY_MASK (0x1 << 0)
+#define VFIO_USER_NEED_NO_RP (0x1 << 1)
+ uint32_t flags;
+ union {
+ struct vfio_user_mem_reg memory[VFIO_USER_MSG_MAX_NREGIONS];
+ struct vfio_user_dev_info dev_info;
+ struct vfio_region_info reg_info;
+ struct vfio_irq_info irq_info;
+ struct vfio_irq_set irq_set;
+ struct vfio_user_reg_rw reg_rw;
+ struct vfio_user_dma_rw dma_rw;
+ struct vfio_user_intr intr;
+ } payload;
+ int fds[VFIO_USER_MAX_FD];
+ int fd_num;
+} __attribute((packed)) VFIO_USER_MSG;
+
+#define VFIO_USER_MSG_HDR_SIZE offsetof(VFIO_USER_MSG, payload.dev_info)
+
+enum vfio_user_msg_handle_result {
+ VFIO_USER_MSG_HANDLE_ERR = -1,
+ VFIO_USER_MSG_HANDLE_OK = 0,
+ VFIO_USER_MSG_HANDLE_REPLY = 1,
+};
+
+struct vfio_user_mem_table_entry {
+ struct vfio_user_mem_reg region;
+ uint64_t host_user_addr;
+ void *mmap_addr;
+ uint64_t mmap_size;
+ int fd;
+};
+
+struct vfio_user_mem {
+ uint32_t entry_num;
+ struct vfio_user_mem_table_entry entry[VFIO_USER_MAX_MEM_REGIONS];
+};
+
+struct vfio_user_regions {
+ uint32_t reg_num;
+ struct vfio_region_info **reg_info;
+};
+
+struct vfio_user_irq_info {
+ uint32_t irq_num;
+ struct vfio_irq_info *irq_info;
+};
+
+struct vfio_user_irq_set {
+ uint32_t set_num;
+ struct vfio_irq_set **irq;
+ int fds[VFIO_USER_MAX_IRQ_FD];
+};
+
+struct vfio_user_irqs {
+ struct vfio_user_irq_info *info;
+ struct vfio_user_irq_set *set;
+};
+
+struct vfio_user_region_resource {
+ void *base;
+ uint32_t size;
+ int fd;
+};
+
+struct vfio_user_resource {
+ uint16_t resource_num;
+ struct vfio_user_region_resource res[];
+};
+
+struct vfio_user {
+ int dev_id;
+ int is_ready;
+#define IF_NAME_SZ (IFNAMSIZ > PATH_MAX ? IFNAMSIZ : PATH_MAX)
+ char sock_addr[IF_NAME_SZ];
+ const struct vfio_user_notify_ops *ops;
+ struct vfio_user_mem *mem;
+ struct vfio_user_dev_info *dev_info;
+ struct vfio_user_regions *reg;
+ struct vfio_user_irqs *irq;
+ struct vfio_user_resource *res;
+};
+
+struct vfio_user_notify_ops {
+ int (*new_device)(int dev_id); /* Add device */
+ void (*destroy_device)(int dev_id); /* Remove device */
+ int (*update_status)(int dev_id); /* Update device status */
+};
+
+typedef void (*vfio_user_log)(const char *format, ...);
+
+typedef int (*event_handler)(int fd, void *data);
+
+typedef struct listen_fd_info {
+ int fd;
+ uint32_t event;
+ event_handler ev_handle;
+ void *data;
+} FD_INFO;
+
+struct vfio_user_epoll {
+ int epfd;
+ FD_INFO fdinfo[VFIO_USER_MAX_FD];
+ uint32_t fd_num; /* Current num of listen_fd */
+ struct epoll_event *events;
+ pthread_mutex_t fd_mutex;
+};
+
+struct vfio_user_socket {
+ char *sock_addr;
+ struct sockaddr_un un;
+ int sock_fd;
+ int dev_id;
+};
+
+struct vfio_user_ep_sock {
+ struct vfio_user_epoll ep;
+ struct vfio_user_socket *sock[VFIO_USER_MAX_FD];
+ uint32_t sock_num;
+ pthread_mutex_t mutex;
+};
+
+/**
+ * Register a vfio-user device.
+ *
+ * @param sock_addr
+ * Unix domain socket address
+ * @param ops
+ * Notify ops for the device
+ * @param log
+ * Log callback for the device
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_user_register(const char *sock_addr,
+ const struct vfio_user_notify_ops *ops,
+ vfio_user_log log);
+
+/**
+ * Unregister a vfio-user device.
+ *
+ * @param sock_addr
+ * Unix domain socket address
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_user_unregister(const char *sock_addr);
+
+/**
+ * Start vfio-user handling for the device.
+ *
+ * This function triggers vfio-user message handling.
+ * @param sock_addr
+ * Unix domain socket address
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_user_start(const char *sock_addr);
+
+/**
+ * Stop vfio-user handling for the device.
+ *
+ * This function stops vfio-user message handling.
+ * @param sock_addr
+ * Unix domain socket address
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_user_stop(const char *sock_addr);
+
+/**
+ * Get the socket address for a vfio-user device.
+ *
+ * @param dev_id
+ * Vfio-user device ID
+ * @param buf
+ * Buffer to store socket address
+ * @param len
+ * Length of buf in bytes
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_get_sock_addr(int dev_id, char *buf, size_t len);
+
+/**
+ * Get the memory table of a vfio-user device.
+ *
+ * @param dev_id
+ * Vfio-user device ID
+ * @return
+ * Pointer to memory table on success, NULL on failure
+ */
+struct vfio_user_mem *rte_vfio_user_get_mem_table(int dev_id);
+
+/**
+ * Get the irq set of a vfio-user device.
+ *
+ * @param dev_id
+ * Vfio-user device ID
+ * @return
+ * Pointer to irq set on success, NULL on failure
+ */
+struct vfio_user_irq_set *rte_vfio_user_get_irq(int dev_id);
+
+/**
+ * Set the device info for a vfio-user device.
+ *
+ * @param sock_addr
+ * Unix domain socket address
+ * @param dev_info
+ * Device info for the vfio-user device
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_user_set_dev_info(const char *sock_addr,
+ struct vfio_user_dev_info *dev_info);
+
+/**
+ * Set the region info for a vfio-user device.
+ *
+ * @param sock_addr
+ * Unix domain socket address
+ * @param reg
+ * Region info for the vfio-user device
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_user_set_reg_info(const char *sock_addr,
+ struct vfio_user_regions *reg);
+
+/**
+ * Set the irq info for a vfio-user device.
+ *
+ * @param sock_addr
+ * Unix domain socket address
+ * @param irq
+ * IRQ info for the vfio-user device
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_user_set_irq_info(const char *sock_addr,
+ struct vfio_user_irq_info *irq);
+
+/**
+ * Set the device resource for a vfio-user device.
+ *
+ * @param sock_addr
+ * Unix domain socket address
+ * @param res
+ * Resource info for the vfio-user device
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_vfio_user_set_resource(const char *sock_addr,
+ struct vfio_user_resource *res);
+
+#endif
--
2.17.1
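
A minimal sketch of how a back-end might use the memory table returned by
rte_vfio_user_get_mem_table() to translate a guest physical address into a
host pointer. The struct layouts are copied from the patch above. The
interpretation that host_user_addr is the host virtual address backing
region.gpa is an assumption (the RFC does not document the field), and
demo_gpa_to_hva() is a hypothetical helper, not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VFIO_USER_MAX_MEM_REGIONS 256

/* Layouts copied from the RFC header. */
struct vfio_user_mem_reg {
	uint64_t gpa;
	uint64_t size;
	uint64_t fd_offset;
	uint32_t protection;
	uint32_t flags;
};

struct vfio_user_mem_table_entry {
	struct vfio_user_mem_reg region;
	uint64_t host_user_addr;
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;
};

struct vfio_user_mem {
	uint32_t entry_num;
	struct vfio_user_mem_table_entry entry[VFIO_USER_MAX_MEM_REGIONS];
};

/*
 * Translate [gpa, gpa + len) to a host pointer.
 * Returns NULL if the range is empty or not fully inside one region.
 */
static void *demo_gpa_to_hva(const struct vfio_user_mem *mem,
			     uint64_t gpa, uint64_t len)
{
	uint32_t i;

	if (mem == NULL || len == 0)
		return NULL;
	assert(mem->entry_num <= VFIO_USER_MAX_MEM_REGIONS);

	for (i = 0; i < mem->entry_num; i++) {
		const struct vfio_user_mem_table_entry *e = &mem->entry[i];
		uint64_t off;

		if (gpa < e->region.gpa)
			continue;
		off = gpa - e->region.gpa;
		/* Compare against the remaining size to avoid gpa + len overflow. */
		if (off < e->region.size && len <= e->region.size - off)
			return (void *)(uintptr_t)(e->host_user_addr + off);
	}
	return NULL;
}
```

A linear scan is enough for a sketch with at most 256 regions; a data path
that translates per descriptor would probably want a sorted table or a cache.
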
* [dpdk-dev] [RFC v1 2/2] emudev: Add library for emulated device
From: Chenbo Xia @ 2020-08-14 19:16 UTC
To: dev, thomas, xuan.ding, xiuchun.lu, cunming.liang, changpeng.liu
Cc: zhihong.wang
To enable DPDK to be an alternative I/O device emulation library
for building virtualized devices in separate processes outside QEMU,
a new device class named emudev is introduced in this patch. Emudev
is a device type for emulated devices. Providers, which are specific
emudev drivers, can choose the transport to QEMU. One transport
option is vfio-over-socket (also called vfio-user), which is
defined by a standard protocol in QEMU.
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Signed-off-by: Xiuchun Lu <xiuchun.lu@intel.com>
---
lib/librte_emudev/rte_emudev.h | 315 +++++++++++++++++++++++++++++++++
1 file changed, 315 insertions(+)
create mode 100644 lib/librte_emudev/rte_emudev.h
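A minimal sketch of how a back-end might consume the emu_dev_db_info
structure declared below, which describes a doorbell either by file
descriptor (EMU_DEV_DB_FD) or by memory range (EMU_DEV_DB_MEM). The struct is
copied from this patch. Everything else is an assumption: that the fd is a
Linux eventfd, that a memory doorbell is a 32-bit register at data.mem.base,
and the demo_* helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Copied from the RFC's rte_emudev.h. */
struct emu_dev_db_info {
	uint32_t id;
	uint32_t flag;
#define EMU_DEV_DB_FD	(0x1 << 0)
#define EMU_DEV_DB_MEM	(0x1 << 1)
	union {
		int fd;
		struct {
			uint64_t base;
			uint64_t size;
		} mem;
	} data;
	void *priv;
};

/* Hypothetical: describe a doorbell backed by a non-blocking eventfd. */
static int demo_db_init_fd(struct emu_dev_db_info *db, uint32_t id)
{
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return -1;
	memset(db, 0, sizeof(*db));
	db->id = id;
	db->flag = EMU_DEV_DB_FD;
	db->data.fd = fd;
	return 0;
}

/* Hypothetical: describe a doorbell backed by a 32-bit register in memory. */
static void demo_db_init_mem(struct emu_dev_db_info *db, uint32_t id,
			     volatile uint32_t *reg)
{
	memset(db, 0, sizeof(*db));
	db->id = id;
	db->flag = EMU_DEV_DB_MEM;
	db->data.mem.base = (uint64_t)(uintptr_t)reg;
	db->data.mem.size = sizeof(*reg);
}

/*
 * Read the doorbell once.
 * Returns 1 with *val set, 0 if an fd doorbell has not been rung,
 * -1 on error or unknown flavor.
 */
static int demo_db_read(const struct emu_dev_db_info *db, uint64_t *val)
{
	assert(db != NULL && val != NULL);

	if (db->flag & EMU_DEV_DB_FD) {
		uint64_t cnt;
		ssize_t n = read(db->data.fd, &cnt, sizeof(cnt));

		if (n == (ssize_t)sizeof(cnt)) {
			*val = cnt;	/* eventfd counter, reset by the read */
			return 1;
		}
		return (n < 0 && errno == EAGAIN) ? 0 : -1;
	}
	if (db->flag & EMU_DEV_DB_MEM) {
		if (db->data.mem.size < sizeof(uint32_t))
			return -1;
		*val = *(volatile uint32_t *)(uintptr_t)db->data.mem.base;
		return 1;
	}
	return -1;
}
```

The flag selects which union member is valid, so a consumer has to test it
before touching data.fd or data.mem, as demo_db_read() does.
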
diff --git a/lib/librte_emudev/rte_emudev.h b/lib/librte_emudev/rte_emudev.h
new file mode 100644
index 000000000..2ffc4dbe0
--- /dev/null
+++ b/lib/librte_emudev/rte_emudev.h
@@ -0,0 +1,315 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#ifndef _RTE_EMUDEV_H_
+#define _RTE_EMUDEV_H_
+
+#include <rte_config.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+
+typedef void *rte_emudev_conf_t;
+typedef void *rte_emudev_attr_t;
+typedef void *rte_emudev_mem_table_t;
+typedef char *emu_dev_type_t;
+
+struct rte_emu_dev;
+
+struct emu_dev_info {
+ emu_dev_type_t dev_type;
+ uint32_t max_qp_num;
+ uint32_t max_event_num;
+};
+
+struct emu_dev_q_info {
+ uint64_t base;
+ uint64_t size;
+ uint32_t doorbell_id;
+ uint32_t irq_vector;
+ void *priv;
+};
+
+struct emu_dev_irq_info {
+ uint32_t vector;
+ int fd;
+ void *priv;
+};
+
+struct emu_dev_db_info {
+ uint32_t id;
+ uint32_t flag;
+#define EMU_DEV_DB_FD (0x1 << 0)
+#define EMU_DEV_DB_MEM (0x1 << 1)
+ union {
+ int fd;
+ struct {
+ uint64_t base;
+ uint64_t size;
+ } mem;
+ } data;
+ void *priv;
+};
+
+/**
+ * The back-end driver and the emulated device provider should share
+ * the same definition of events and event messages.
+ */
+struct emu_dev_event_channel {
+ int fd;
+ struct rte_ring *queue;
+};
+
+struct emu_dev_attr_info {
+ const char *attr_name;
+ rte_emudev_attr_t attr;
+};
+
+struct emu_dev_ops {
+ int (*dev_start)(struct rte_emu_dev *dev);
+ void (*dev_stop)(struct rte_emu_dev *dev);
+ int (*dev_configure)(struct rte_emu_dev *dev,
+ rte_emudev_conf_t dev_conf);
+ int (*dev_close)(struct rte_emu_dev *dev);
+ struct emu_dev_info *(*get_dev_info)(struct rte_emu_dev *dev);
+ int (*subscribe_event)(struct rte_emu_dev *dev,
+ const struct emu_dev_event_channel *ev_chnl);
+ int (*unsubscribe_event)(struct rte_emu_dev *dev,
+ const struct emu_dev_event_channel *ev_chnl);
+ rte_emudev_mem_table_t (*get_mem_table)(struct rte_emu_dev *dev);
+ struct emu_dev_q_info *(*get_queue_info)(struct rte_emu_dev *dev,
+ uint32_t queue);
+ struct emu_dev_irq_info *(*get_irq_info)(struct rte_emu_dev *dev,
+ uint32_t vector);
+ struct emu_dev_db_info *(*get_db_info)(struct rte_emu_dev *dev,
+ uint32_t doorbell);
+ rte_emudev_attr_t (*get_attr)(struct rte_emu_dev *dev,
+ const char *attr_name);
+ int (*set_attr)(struct rte_emu_dev *dev, const char *attr_name,
+ rte_emudev_attr_t attr);
+ int (*region_map)(struct rte_emu_dev *dev, const char *region_name,
+ uint16_t region_size, uint64_t *base_addr);
+};
+
+struct rte_emu_dev {
+ struct rte_device *device;
+ const struct emu_dev_ops *dev_ops;
+ const struct emu_dev_event_channel *ev_chnl;
+ struct emu_dev_info *dev_info;
+ uint16_t num_attr;
+ struct emu_dev_attr_info **attr;
+ void *priv_data;
+} __rte_cache_aligned;
+
+/**
+ * Note that 'rte_emu_dev_allocate', 'rte_emu_dev_release' and
+ * 'rte_emu_dev_allocated' should be called by the emulated device
+ * provider.
+ */
+
+/**
+ * Allocate a new emudev for an emulation device and return the pointer
+ * to the emudev.
+ *
+ * @param name
+ * Name of the emudev
+ * @return
+ * Pointer to rte_emu_dev on success, NULL on failure
+ */
+struct rte_emu_dev *
+rte_emu_dev_allocate(const char *name);
+
+/**
+ * Release the emudev.
+ *
+ * @param dev
+ * The emulated device
+ * @return
+ * 0 on success, -1 on failure
+ */
+int
+rte_emu_dev_release(struct rte_emu_dev *dev);
+
+/**
+ * Find an emudev using name.
+ *
+ * @param name
+ * Name of the emudev
+ * @return
+ * Pointer to rte_emu_dev on success, NULL on failure
+ */
+struct rte_emu_dev *
+rte_emu_dev_allocated(const char *name);
+
+/**
+ * Start an emulation device.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_emu_dev_start(uint16_t dev_id);
+
+/**
+ * Stop an emulation device.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ */
+void rte_emu_dev_stop(uint16_t dev_id);
+
+/**
+ * Configure an emulation device.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @param dev_conf
+ * Device configure info
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_emu_dev_configure(uint16_t dev_id, rte_emudev_conf_t dev_conf);
+
+/**
+ * Close an emulation device.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ */
+void rte_emu_dev_close(uint16_t dev_id);
+
+/* Note that below APIs should only be called by back-end driver */
+
+/**
+ * Back-end driver subscribes to events of the emulated device.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @param ev_chnl
+ * Event channel that events should be passed to
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_emu_subscribe_event(uint16_t dev_id,
+ const struct emu_dev_event_channel *ev_chnl);
+
+/**
+ * Back-end driver unsubscribes from events of the emulated device.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @param ev_chnl
+ * Event channel that events should no longer be passed to
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_emu_unsubscribe_event(uint16_t dev_id,
+ const struct emu_dev_event_channel *ev_chnl);
+
+/**
+ * Back-end driver gets the device info of the emulated device.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @return
+ * Pointer to dev info on success, NULL on failure
+ */
+struct emu_dev_info *rte_emu_get_dev_info(uint16_t dev_id);
+
+/**
+ * Get the memory table content and operations of the emulated device.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @return
+ * Pointer to memory table on success, NULL on failure
+ */
+rte_emudev_mem_table_t rte_emu_get_mem_table(uint16_t dev_id);
+
+/**
+ * Get queue info of the emudev.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @param queue
+ * Queue ID of emudev
+ * @return
+ * Pointer to queue info on success, NULL on failure
+ */
+struct emu_dev_q_info *rte_emu_get_queue_info(uint16_t dev_id,
+ uint32_t queue);
+
+/**
+ * Get irq info of the emudev.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @param vector
+ * Interrupt vector
+ * @return
+ * Pointer to irq info on success, NULL on failure
+ */
+struct emu_dev_irq_info *rte_emu_get_irq_info(uint16_t dev_id,
+ uint32_t vector);
+
+/**
+ * Get doorbell info of the emudev.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @param doorbell
+ * Doorbell ID
+ * @return
+ * Pointer to doorbell info on success, NULL on failure
+ */
+struct emu_dev_db_info *rte_emu_get_db_info(uint16_t dev_id,
+ uint32_t doorbell);
+
+/**
+ * Set attribute of the emudev.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @param attr_name
+ * Opaque object representing an attribute in implementation.
+ * @param attr
+ * Pointer to attribute
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_emu_set_attr(uint16_t dev_id, const char *attr_name,
+ rte_emudev_attr_t attr);
+
+/**
+ * Get attribute of the emudev.
+ *
+ * @param dev_id
+ * Device ID of emudev
+ * @param attr_name
+ * Opaque object representing an attribute in implementation.
+ * @return
+ * Corresponding attr on success, NULL on failure
+ */
+rte_emudev_attr_t rte_emu_get_attr(uint16_t dev_id, const char *attr_name);
+
+/**
+ * Back-end driver maps a region to the emulated device.
+ * Region name identifies the meaning of the region and the emulated
+ * device and the back-end driver should have the same definition of
+ * region name and its meaning.
+ *
+ * @param dev_id
+ * Device ID of emudev
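+ * @param region_name
+ * Name of the region; the emulated device and the back-end driver
+ * must share the same definition of this name
+ * @param region_size
+ * Size of the region
+ * @param base_addr
+ * Pointer to the base address of the region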
+ * @return
+ * 0 on success, -1 on failure
+ */
+int rte_emu_region_map(uint16_t dev_id, const char *region_name,
+ uint16_t region_size, uint64_t *base_addr);
+
+extern struct rte_emu_dev rte_emu_devices[];
+#endif /* _RTE_EMUDEV_H_ */
--
2.17.1
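For readers following the header above, here is a minimal sketch of the call
order the prototypes suggest (configure, start, stop, close). It is not part of
the patch. The RFC only declares these functions, so the bodies below are
hypothetical stubs, and the state tracking, the EMU_MAX_DEVS limit, the
"configure before start" rule, the void * typedef for rte_emudev_conf_t and the
helper emu_lifecycle_demo() are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the RFC treats the configuration as an opaque pointer. */
typedef void *rte_emudev_conf_t;

/* Hypothetical bookkeeping so the stubs can enforce an ordering. */
#define EMU_MAX_DEVS 4
enum emu_state { EMU_CLOSED, EMU_CONFIGURED, EMU_STARTED };
static enum emu_state emu_states[EMU_MAX_DEVS];

/* Stub: accept a non-NULL configuration for a valid device ID. */
int rte_emu_dev_configure(uint16_t dev_id, rte_emudev_conf_t dev_conf)
{
	if (dev_id >= EMU_MAX_DEVS || dev_conf == NULL)
		return -1;
	emu_states[dev_id] = EMU_CONFIGURED;
	return 0;
}

/* Stub: starting is only allowed after a successful configure. */
int rte_emu_dev_start(uint16_t dev_id)
{
	if (dev_id >= EMU_MAX_DEVS || emu_states[dev_id] != EMU_CONFIGURED)
		return -1;
	emu_states[dev_id] = EMU_STARTED;
	return 0;
}

/* Stub: stopping returns a started device to the configured state. */
void rte_emu_dev_stop(uint16_t dev_id)
{
	if (dev_id < EMU_MAX_DEVS && emu_states[dev_id] == EMU_STARTED)
		emu_states[dev_id] = EMU_CONFIGURED;
}

/* Stub: closing resets the device regardless of its current state. */
void rte_emu_dev_close(uint16_t dev_id)
{
	if (dev_id < EMU_MAX_DEVS)
		emu_states[dev_id] = EMU_CLOSED;
}

/* Application-side sequence: configure, start, then tear down. */
int emu_lifecycle_demo(uint16_t dev_id, rte_emudev_conf_t conf)
{
	if (rte_emu_dev_configure(dev_id, conf) != 0)
		return -1;
	if (rte_emu_dev_start(dev_id) != 0) {
		rte_emu_dev_close(dev_id);
		return -1;
	}
	/* A back-end driver would subscribe to events and do I/O here. */
	rte_emu_dev_stop(dev_id);
	rte_emu_dev_close(dev_id);
	return 0;
}
```

In this sketch stop and close return void, matching the prototypes in the
header, so only configure and start can report failure to the caller.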
* Re: [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK
2020-08-14 19:16 [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK Chenbo Xia
` (2 preceding siblings ...)
2020-08-14 19:16 ` [dpdk-dev] [RFC v1 2/2] emudev: Add library for emulated device Chenbo Xia
@ 2020-09-02 21:10 ` Thomas Monjalon
2020-09-03 6:29 ` Xia, Chenbo
3 siblings, 1 reply; 7+ messages in thread
From: Thomas Monjalon @ 2020-09-02 21:10 UTC (permalink / raw)
To: Chenbo Xia
Cc: dev, xuan.ding, xiuchun.lu, cunming.liang, changpeng.liu,
zhihong.wang, Stephen Hemminger, bruce.richardson,
anatoly.burakov, david.marchand, maxime.coquelin, matan,
Adrian Moreno
14/08/2020 21:16, Chenbo Xia:
> Background & Motivation
> -----------------------
> In order to reduce the attack surface, QEMU community is disaggregating QEMU by
> removing part of device emulation from it. The disaggregated/multi-process QEMU
> is using VFIO-over-socket/vfio-user as the main transport mechanism to disaggregate
> I/O services from QEMU[2]. Vfio-user essentially implements the VFIO device model
> presented to the user process by a set of messages over a unix-domain socket. The
> main difference between application using vfio-user and application using vfio
> kernel module is that device manipulation is based on socket messages for vfio-user
> but system calls for vfio kernel module. The vfio-user devices consist of a generic
> VFIO device type, living in QEMU, which is called the client[3], and the core device
> implementation (emulated device), living outside of QEMU, which is called the server.
>
> With the introduction and support of vfio-user in QEMU, QEMU is explicitly adding
> support for external emulated device and data path. We are trying to leverage that
> and introducing vfio-user support in DPDK. By doing so, DPDK is enabled to be an
> alternative I/O device emulation library of building virtualized devices along with
> high-performance data path in separate processes outside QEMU. It will be easy for
> hardware vendors to provide virtualized solutions of their hardware devices by
> implementing emulated device in DPDK.
>
> Except for vfio-user introduced in DPDK, this series also introduces the first
> emulated device implementation. That is emulated AVF device (avf_emudev) implemented
> by AVF emulation driver (avf_emudev driver). Emulated AVF device demos how emulated
> device could be implemented in DPDK. SPDK is also investigating to implement use case
> for NVMe.
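As an illustration of the point quoted above (device manipulation through
socket messages for vfio-user versus system calls for the kernel vfio module),
here is a minimal sketch of a register read carried over a unix-domain socket.
The message layout struct demo_msg, the command DEMO_REGION_READ, the register
values and both helper functions are hypothetical; this is not the vfio-user
wire format, only the general request/reply shape.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical message layout; NOT the vfio-user wire format. */
struct demo_msg {
	uint16_t cmd;    /* request type, DEMO_REGION_READ below */
	uint16_t region; /* region index */
	uint32_t offset; /* byte offset into the region */
	uint32_t value;  /* filled in by the server in its reply */
};

#define DEMO_REGION_READ 1
#define DEMO_NUM_REGS 4

/* Server side: an "emulated device" backed by a small register array. */
static uint32_t demo_regs[DEMO_NUM_REGS] = { 0x11, 0x22, 0x33, 0x44 };

static int demo_server_handle(int fd)
{
	struct demo_msg m;

	if (read(fd, &m, sizeof(m)) != (ssize_t)sizeof(m))
		return -1;
	if (m.cmd != DEMO_REGION_READ || m.offset / 4 >= DEMO_NUM_REGS)
		return -1;
	m.value = demo_regs[m.offset / 4];
	return write(fd, &m, sizeof(m)) == (ssize_t)sizeof(m) ? 0 : -1;
}

/*
 * Client side: where a user of the kernel vfio module would issue a
 * system call on the device fd, this client sends a request message
 * and waits for the reply on the socket.
 */
int demo_region_read(uint32_t offset, uint32_t *value)
{
	struct demo_msg m = { DEMO_REGION_READ, 0, offset, 0 };
	int sv[2];
	int rc = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	if (write(sv[0], &m, sizeof(m)) == (ssize_t)sizeof(m) &&
	    demo_server_handle(sv[1]) == 0 &&
	    read(sv[0], &m, sizeof(m)) == (ssize_t)sizeof(m)) {
		*value = m.value;
		rc = 0;
	}
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

Client and server run in one process here only to keep the sketch short; in
the model described above they would be separate processes (QEMU as client,
the emulated device as server).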
I am completely unaware of this change in QEMU.
I've found this presentation about Multi-process QEMU by Oracle:
https://static.sched.com/hosted_files/kvmforum2019/d2/kvm-mpqemu.pdf
and there is the wiki page you already referenced:
https://wiki.qemu.org/Features/MultiProcessQEMU
I guess virtio stays inside QEMU?
What is really moving out? e1000, ne2000 and vmxnet3?
Why is emulated AVF needed, compared to a simple VFIO passthrough?
* Re: [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK
2020-09-02 21:10 ` [dpdk-dev] [RFC v1 0/2] Add device emulation support in DPDK Thomas Monjalon
@ 2020-09-03 6:29 ` Xia, Chenbo
0 siblings, 0 replies; 7+ messages in thread
From: Xia, Chenbo @ 2020-09-03 6:29 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Ding, Xuan, Lu, Xiuchun, Liang, Cunming, Liu, Changpeng,
Wang, Zhihong, Stephen Hemminger, Richardson, Bruce, Burakov,
Anatoly, david.marchand, maxime.coquelin, matan, Adrian Moreno
Hi Thomas,
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, September 3, 2020 5:11 AM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Ding, Xuan <xuan.ding@intel.com>; Lu, Xiuchun
> <xiuchun.lu@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Liu,
> Changpeng <changpeng.liu@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Stephen Hemminger <stephen@networkplumber.org>;
> Richardson, Bruce <bruce.richardson@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>; david.marchand@redhat.com;
> maxime.coquelin@redhat.com; matan@nvidia.com; Adrian Moreno
> <amorenoz@redhat.com>
> Subject: Re: [RFC v1 0/2] Add device emulation support in DPDK
>
> 14/08/2020 21:16, Chenbo Xia:
> > Background & Motivation
> > -----------------------
> > In order to reduce the attack surface, QEMU community is disaggregating QEMU by
> > removing part of device emulation from it. The disaggregated/multi-process QEMU
> > is using VFIO-over-socket/vfio-user as the main transport mechanism to disaggregate
> > I/O services from QEMU[2]. Vfio-user essentially implements the VFIO device model
> > presented to the user process by a set of messages over a unix-domain socket. The
> > main difference between application using vfio-user and application using vfio
> > kernel module is that device manipulation is based on socket messages for vfio-user
> > but system calls for vfio kernel module. The vfio-user devices consist of a generic
> > VFIO device type, living in QEMU, which is called the client[3], and the core device
> > implementation (emulated device), living outside of QEMU, which is called the server.
> >
> > With the introduction and support of vfio-user in QEMU, QEMU is explicitly adding
> > support for external emulated device and data path. We are trying to leverage that
> > and introducing vfio-user support in DPDK. By doing so, DPDK is enabled to be an
> > alternative I/O device emulation library of building virtualized devices along with
> > high-performance data path in separate processes outside QEMU. It will be easy for
> > hardware vendors to provide virtualized solutions of their hardware devices by
> > implementing emulated device in DPDK.
> >
> > Except for vfio-user introduced in DPDK, this series also introduces the first
> > emulated device implementation. That is emulated AVF device (avf_emudev) implemented
> > by AVF emulation driver (avf_emudev driver). Emulated AVF device demos how emulated
> > device could be implemented in DPDK. SPDK is also investigating to implement use case
> > for NVMe.
>
> I am completely unaware of this change in QEMU.
> I've found this presentation about Multi-process QEMU by Oracle:
> https://static.sched.com/hosted_files/kvmforum2019/d2/kvm-mpqemu.pdf
> and there is the wiki page you already referenced:
> https://wiki.qemu.org/Features/MultiProcessQEMU
>
> I guess virtio stays inside QEMU?
> What is really moving out? e1000, ne2000 and vmxnet3?
Yes, and it has not shown any impact on the emulation of most existing devices.
AFAIK, one of the starting points is NVMe.
> Why is emulated AVF needed, compared to a simple VFIO passthrough?
Emulated AVF is a showcase of a specific virtual device; it is for para-virtualization,
not for a HW device. A similar idea can apply to other virtual devices (e.g., memif).
Thanks!
Chenbo
>
>