DPDK patches and discussions
* [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport
@ 2019-06-19 15:14 Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure Nikos Dragazis
                   ` (29 more replies)
  0 siblings, 30 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Hi everyone,

This patch series introduces the virtio-vhost-user transport. It is a
revised version of an earlier RFC implementation proposed by Stefan
Hajnoczi [1]. Although this is a great feature, work on it seems to
have stalled, so I’d like to restart the conversation and hopefully get
it merged with your help. Let me give you an overview.

The virtio-vhost-user transport is a vhost-user transport implementation
that is based on the virtio-vhost-user device. Its key difference from
the existing transport is that it allows deploying vhost-user targets
inside dedicated Storage Appliance VMs instead of host user space. In
other words, it allows having guests that act as vhost-user backends for
other guests.

The virtio-vhost-user device implements the vhost-user control plane
(master-slave communication) as follows:

1. it parses the vhost-user messages from the vhost-user unix domain
   socket and forwards them to the slave guest through virtqueues

2. it maps the vhost memory regions in QEMU’s process address space and
   exposes them to the slave guest as a RAM-backed PCI MMIO region

3. it hooks up doorbells to the callfds. The slave guest can use these
   doorbells to interrupt the master guest driver

The device code has not yet been merged into upstream QEMU, but this is
definitely the end goal. The current state is that we are awaiting
approval of the virtio spec.

I have CCed Darek from the SPDK community, who has helped me a lot by
reviewing this series. Note that any device type could be implemented
over this new transport. So, adding the virtio-vhost-user transport in
DPDK would allow using it from SPDK as well.

Getting into the code internals, this patch series makes the following
changes:

1. introduce a generic interface for the transport-specific operations.
   Each of the two available transports, the pre-existing AF_UNIX
   transport and the virtio-vhost-user transport, is going to implement
   this interface. The AF_UNIX-specific code has been extracted from the
   core vhost-user code and is now part of the AF_UNIX transport
   implementation in trans_af_unix.c.

2. introduce the virtio-vhost-user transport. The virtio-vhost-user
   transport requires a driver for the virtio-vhost-user devices. The
   driver along with the transport implementation have been packed into
   a separate library in `drivers/virtio_vhost_user/`. The necessary
   virtio-pci code has been copied from `drivers/net/virtio/`. Some
   additional changes have been made so that the driver can utilize the
   additional resources of the virtio-vhost-user device.

3. update librte_vhost public API to enable choosing transport for each
   new vhost device. Extend the vhost net driver and vhost-scsi example
   application to export this new API to the end user.
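
To make the idea behind change 1 concrete, here is a minimal,
self-contained sketch of a per-transport operations table that core code
dispatches through. It is not the code from this series: every `demo_`
name, the `send_msg` hook, and the `last_wire` field are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_dev;

/* Hypothetical operations table; each transport provides one instance. */
struct demo_transport_ops {
	const char *name;
	/* Deliver a message to the peer; 0 on success, -1 on failure. */
	int (*send_msg)(struct demo_dev *dev, const char *msg);
};

/* Hypothetical stand-in for a vhost device (not struct virtio_net). */
struct demo_dev {
	const struct demo_transport_ops *trans_ops;
	char last_wire[64];	/* records what the transport "sent" */
};

/* Shared helper: format "<prefix>:<msg>" and reject truncated output. */
static int
demo_record(struct demo_dev *dev, const char *prefix, const char *msg)
{
	int n = snprintf(dev->last_wire, sizeof(dev->last_wire),
			 "%s:%s", prefix, msg);

	return (n < 0 || (size_t)n >= sizeof(dev->last_wire)) ? -1 : 0;
}

static int
demo_af_unix_send(struct demo_dev *dev, const char *msg)
{
	return demo_record(dev, "socket", msg);
}

static int
demo_vvu_send(struct demo_dev *dev, const char *msg)
{
	return demo_record(dev, "virtqueue", msg);
}

static const struct demo_transport_ops demo_af_unix_ops = {
	.name = "af_unix",
	.send_msg = demo_af_unix_send,
};

static const struct demo_transport_ops demo_vvu_ops = {
	.name = "virtio-vhost-user",
	.send_msg = demo_vvu_send,
};

/* Core code stays transport-agnostic and only calls through the table. */
static int
demo_core_send(struct demo_dev *dev, const char *msg)
{
	assert(dev != NULL && dev->trans_ops != NULL);
	return dev->trans_ops->send_msg(dev, msg);
}
```

Pointing `trans_ops` at `demo_af_unix_ops` or `demo_vvu_ops` changes the
behaviour of `demo_core_send()` without touching the core function,
which is the property the refactoring in this series aims for.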

The primary changes I made to Stefan’s RFC implementation are the
following:

1. moved postcopy live migration code into trans_af_unix.c. Postcopy
   live migration relies on the userfault fd mechanism, which cannot be
   supported by virtio-vhost-user.

2. moved setup of the log memory region into trans_af_unix.c. Setting up
   the log memory region involves mapping/unmapping guest memory. This
   is an AF_UNIX transport-specific operation.

3. introduced a vhost transport operation for
   process_slave_message_reply()

4. moved the virtio-vhost-user transport/driver into a separate library
   in `drivers/virtio_vhost_user/`. This required making vhost.h and
   vhost_user.h part of librte_vhost public API and exporting some
   private symbols via the version script. This looks better to me than
   just moving the entire librte_vhost into `drivers/`. I am not sure if
   this is the most appropriate solution. I am looking forward to your
   suggestions on this.

5. made use of the virtio PCI capabilities for the additional device
   resources (doorbells, shared memory). This required changes in
   virtio_pci.c and trans_virtio_vhost_user.c.

6. [minor] changed some commit headlines to comply with
   check-git-log.sh.

Please have a look and let me know your thoughts. Any
reviews/pointers/suggestions are welcome.

Best regards,
Nikos

[1] http://mails.dpdk.org/archives/dev/2018-January/088155.html


Nikos Dragazis (23):
  vhost: introduce vhost transport operations structure
  vhost: move socket management code
  vhost: move socket fd and un sockaddr
  vhost: move vhost-user connection
  vhost: move vhost-user reconnection
  vhost: move vhost-user fdset
  vhost: propagate vhost transport operations
  vhost: use a single structure for the device state
  vhost: extract socket I/O into transport
  vhost: move slave request fd and lock
  vhost: move mmap/munmap
  vhost: move setup of the log memory region
  vhost: remove main fd parameter from msg handlers
  vhost: move postcopy live migration code
  vhost: support registering additional vhost-user transports
  drivers/virtio_vhost_user: add virtio PCI framework
  drivers: add virtio-vhost-user transport
  drivers/virtio_vhost_user: use additional device resources
  vhost: add flag for choosing vhost-user transport
  net/vhost: add virtio-vhost-user support
  mk: link apps with virtio-vhost-user driver
  config: add option for the virtio-vhost-user transport
  usertools: add virtio-vhost-user devices to dpdk-devbind.py

Stefan Hajnoczi (5):
  vhost: allocate per-socket transport state
  vhost: move start server/client calls
  vhost: add index field in vhost virtqueues
  examples/vhost_scsi: add --socket-file argument
  examples/vhost_scsi: add virtio-vhost-user support

 config/common_base                                 |    6 +
 config/common_linux                                |    1 +
 drivers/Makefile                                   |    5 +
 drivers/net/vhost/rte_eth_vhost.c                  |   13 +
 drivers/virtio_vhost_user/Makefile                 |   27 +
 .../rte_virtio_vhost_user_version.map              |    4 +
 .../virtio_vhost_user/trans_virtio_vhost_user.c    | 1077 +++++++++++++++++++
 drivers/virtio_vhost_user/virtio_pci.c             |  520 ++++++++++
 drivers/virtio_vhost_user/virtio_pci.h             |  289 ++++++
 drivers/virtio_vhost_user/virtio_vhost_user.h      |   18 +
 drivers/virtio_vhost_user/virtqueue.h              |  181 ++++
 examples/vhost_scsi/vhost_scsi.c                   |  103 +-
 lib/librte_vhost/Makefile                          |    4 +-
 lib/librte_vhost/rte_vhost.h                       |    1 +
 lib/librte_vhost/rte_vhost_version.map             |   11 +
 lib/librte_vhost/socket.c                          |  685 +-----------
 lib/librte_vhost/trans_af_unix.c                   | 1094 ++++++++++++++++++++
 lib/librte_vhost/vhost.c                           |   22 +-
 lib/librte_vhost/vhost.h                           |  298 +++++-
 lib/librte_vhost/vhost_user.c                      |  474 ++-------
 lib/librte_vhost/vhost_user.h                      |   10 +-
 mk/rte.app.mk                                      |    6 +
 usertools/dpdk-devbind.py                          |    7 +
 23 files changed, 3764 insertions(+), 1092 deletions(-)
 create mode 100644 drivers/virtio_vhost_user/Makefile
 create mode 100644 drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map
 create mode 100644 drivers/virtio_vhost_user/trans_virtio_vhost_user.c
 create mode 100644 drivers/virtio_vhost_user/virtio_pci.c
 create mode 100644 drivers/virtio_vhost_user/virtio_pci.h
 create mode 100644 drivers/virtio_vhost_user/virtio_vhost_user.h
 create mode 100644 drivers/virtio_vhost_user/virtqueue.h
 create mode 100644 lib/librte_vhost/trans_af_unix.c

-- 
2.7.4



* [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 20:14   ` Aaron Conole
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 02/28] vhost: move socket management code Nikos Dragazis
                   ` (28 subsequent siblings)
  29 siblings, 1 reply; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

This is the first of a series of patches, whose purpose is to add
support for the virtio-vhost-user transport. This is a vhost-user
transport implementation that is different from the default AF_UNIX
transport. It uses the virtio-vhost-user PCI device in order to tunnel
vhost-user protocol messages over virtio. This lets guests act as vhost
device backends for other guests.

File descriptor passing is specific to the AF_UNIX vhost-user protocol
transport.  In order to add support for additional transports, it is
necessary to extract transport-specific code from the main vhost-user
code.
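
For readers unfamiliar with why file descriptor passing ties the
protocol to AF_UNIX, the following sketch sends one descriptor across a
socketpair with an SCM_RIGHTS control message and uses it on the other
side. It is an illustration under simplifying assumptions (exactly one
fd, one payload byte), not the read_fd_message()/send_fd_message() code
from librte_vhost; all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one fd, aligned for struct cmsghdr. */
union demo_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one payload byte plus one fd as ancillary data. */
static int
demo_send_fd(int sock, int fd)
{
	char byte = 'F';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_ctrl ctrl;
	struct msghdr msgh;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = ctrl.buf;
	msgh.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msgh);
	assert(cmsg != NULL);
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msgh, 0) == 1 ? 0 : -1;
}

/* Receive the byte and return the fd that came with it, or -1. */
static int
demo_recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_ctrl ctrl;
	struct msghdr msgh;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = ctrl.buf;
	msgh.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msgh, 0) != 1)
		return -1;
	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS) {
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}
	return fd;
}

/* Pass a pipe's write end over a socketpair, then write through it. */
static int
demo_fd_round_trip(void)
{
	int sv[2], pfd[2], rfd, ret = -1;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(pfd) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (demo_send_fd(sv[0], pfd[1]) == 0) {
		rfd = demo_recv_fd(sv[1]);
		if (rfd >= 0) {
			if (write(rfd, "x", 1) == 1 &&
			    read(pfd[0], &c, 1) == 1)
				ret = c;
			close(rfd);
		}
	}
	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The kernel duplicates the descriptor into the receiver's fd table, a
service that only a Unix domain socket provides. This is why core
vhost-user code cannot assume it when other transports are in play.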

This patch introduces struct vhost_transport_ops and associates each
device with a transport.  Core vhost-user code calls into
vhost_transport_ops to perform transport-specific operations.

Notifying callfd is a transport-specific operation, so it belongs to
trans_af_unix.c.
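
As a rough illustration of what "notifying callfd" means for the AF_UNIX
transport, the sketch below signals an eventfd and reads the accumulated
count back. The `demo_` names are hypothetical, and unlike the function
added by this patch, the sketch propagates the eventfd_write() result
instead of always returning 0.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for struct vhost_virtqueue: only the callfd. */
struct demo_vq {
	int callfd;
};

/* AF_UNIX-style notification: signal the eventfd if one was set up. */
static int
demo_af_unix_vring_call(struct demo_vq *vq)
{
	if (vq->callfd < 0)
		return 0;	/* nothing to notify */
	return eventfd_write(vq->callfd, (eventfd_t)1);
}

/* Kick the queue several times and return the eventfd counter value. */
static uint64_t
demo_kick_and_count(int kicks)
{
	struct demo_vq vq = { .callfd = eventfd(0, EFD_NONBLOCK) };
	eventfd_t count = 0;
	int i;

	assert(vq.callfd >= 0);
	for (i = 0; i < kicks; i++)
		demo_af_unix_vring_call(&vq);

	/* A non-semaphore eventfd returns the summed writes in one read. */
	eventfd_read(vq.callfd, &count);
	close(vq.callfd);
	return count;
}
```

A transport without eventfds, such as one that rings a device doorbell,
would supply a different function behind the same `vring_call` hook.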

Several more patches follow this one to complete the task of moving
AF_UNIX transport code out of core vhost-user code.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/Makefile        |  2 +-
 lib/librte_vhost/trans_af_unix.c | 20 ++++++++++++++++++++
 lib/librte_vhost/vhost.c         |  1 +
 lib/librte_vhost/vhost.h         | 34 +++++++++++++++++++++++++++++-----
 4 files changed, 51 insertions(+), 6 deletions(-)
 create mode 100644 lib/librte_vhost/trans_af_unix.c

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 8623e91..5ff5fb2 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -23,7 +23,7 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
-					vhost_user.c virtio_net.c vdpa.c
+					vhost_user.c virtio_net.c vdpa.c trans_af_unix.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
new file mode 100644
index 0000000..3f0c308
--- /dev/null
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2018 Intel Corporation
+ * Copyright(c) 2017 Red Hat, Inc.
+ * Copyright(c) 2019 Arrikto Inc.
+ */
+
+#include "vhost.h"
+
+static int
+af_unix_vring_call(struct virtio_net *dev __rte_unused,
+		   struct vhost_virtqueue *vq)
+{
+	if (vq->callfd >= 0)
+		eventfd_write(vq->callfd, (eventfd_t)1);
+	return 0;
+}
+
+const struct vhost_transport_ops af_unix_trans_ops = {
+	.vring_call = af_unix_vring_call,
+};
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 981837b..a36bc01 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -507,6 +507,7 @@ vhost_new_device(void)
 	dev->vid = i;
 	dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 	dev->slave_req_fd = -1;
+	dev->trans_ops = &af_unix_trans_ops;
 	dev->vdpa_dev_id = -1;
 	dev->postcopy_ufd = -1;
 	rte_spinlock_init(&dev->slave_req_lock);
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 884befa..077f213 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -286,6 +286,30 @@ struct guest_page {
 	uint64_t size;
 };
 
+struct virtio_net;
+
+/**
+ * A structure containing function pointers for transport-specific operations.
+ */
+struct vhost_transport_ops {
+	/**
+	 * Notify the guest that used descriptors have been added to the vring.
+	 * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked
+	 * so this function just needs to perform the notification.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param vq
+	 *  vhost virtqueue
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*vring_call)(struct virtio_net *dev, struct vhost_virtqueue *vq);
+};
+
+/** The traditional AF_UNIX vhost-user protocol transport. */
+extern const struct vhost_transport_ops af_unix_trans_ops;
+
 /**
  * Device structure contains all configuration information relating
  * to the device.
@@ -312,6 +336,7 @@ struct virtio_net {
 	uint16_t		mtu;
 
 	struct vhost_device_ops const *notify_ops;
+	struct vhost_transport_ops const *trans_ops;
 
 	uint32_t		nr_guest_pages;
 	uint32_t		max_guest_pages;
@@ -544,12 +569,11 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		if ((vhost_need_event(vhost_used_event(vq), new, old) &&
 					(vq->callfd >= 0)) ||
 				unlikely(!signalled_used_valid))
-			eventfd_write(vq->callfd, (eventfd_t) 1);
+			dev->trans_ops->vring_call(dev, vq);
 	} else {
 		/* Kick the guest if necessary. */
-		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-				&& (vq->callfd >= 0))
-			eventfd_write(vq->callfd, (eventfd_t)1);
+		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
+			dev->trans_ops->vring_call(dev, vq);
 	}
 }
 
@@ -601,7 +625,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		kick = true;
 kick:
 	if (kick)
-		eventfd_write(vq->callfd, (eventfd_t)1);
+		dev->trans_ops->vring_call(dev, vq);
 }
 
 static __rte_always_inline void
-- 
2.7.4



* [dpdk-dev] [PATCH 02/28] vhost: move socket management code
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 03/28] vhost: allocate per-socket transport state Nikos Dragazis
                   ` (27 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The socket.c file serves two purposes:
1. librte_vhost public API entry points, e.g. rte_vhost_driver_register().
2. AF_UNIX socket management.

Move AF_UNIX socket code into trans_af_unix.c so that socket.c only
handles the librte_vhost public API entry points.  This will make it
possible to support other transports besides AF_UNIX.

This patch is a preparatory step that simply moves code from socket.c to
trans_af_unix.c unmodified, besides dropping 'static' qualifiers where
necessary because socket.c now calls into trans_af_unix.c.

A lot of socket.c state is exposed in vhost.h but this is a temporary
measure and will be cleaned up in later patches.  By simply moving code
unmodified in this patch it will be easier to review the actual
refactoring that follows.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/socket.c        | 549 +--------------------------------------
 lib/librte_vhost/trans_af_unix.c | 485 ++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h         |  76 ++++++
 3 files changed, 562 insertions(+), 548 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 274988c..a993b67 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -4,16 +4,10 @@
 
 #include <stdint.h>
 #include <stdio.h>
-#include <limits.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <string.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <sys/un.h>
 #include <sys/queue.h>
-#include <errno.h>
-#include <fcntl.h>
 #include <pthread.h>
 
 #include <rte_log.h>
@@ -22,71 +16,7 @@
 #include "vhost.h"
 #include "vhost_user.h"
 
-
-TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
-
-/*
- * Every time rte_vhost_driver_register() is invoked, an associated
- * vhost_user_socket struct will be created.
- */
-struct vhost_user_socket {
-	struct vhost_user_connection_list conn_list;
-	pthread_mutex_t conn_mutex;
-	char *path;
-	int socket_fd;
-	struct sockaddr_un un;
-	bool is_server;
-	bool reconnect;
-	bool dequeue_zero_copy;
-	bool iommu_support;
-	bool use_builtin_virtio_net;
-
-	/*
-	 * The "supported_features" indicates the feature bits the
-	 * vhost driver supports. The "features" indicates the feature
-	 * bits after the rte_vhost_driver_features_disable/enable().
-	 * It is also the final feature bits used for vhost-user
-	 * features negotiation.
-	 */
-	uint64_t supported_features;
-	uint64_t features;
-
-	uint64_t protocol_features;
-
-	/*
-	 * Device id to identify a specific backend device.
-	 * It's set to -1 for the default software implementation.
-	 * If valid, one socket can have 1 connection only.
-	 */
-	int vdpa_dev_id;
-
-	struct vhost_device_ops const *notify_ops;
-};
-
-struct vhost_user_connection {
-	struct vhost_user_socket *vsocket;
-	int connfd;
-	int vid;
-
-	TAILQ_ENTRY(vhost_user_connection) next;
-};
-
-#define MAX_VHOST_SOCKET 1024
-struct vhost_user {
-	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
-	struct fdset fdset;
-	int vsocket_cnt;
-	pthread_mutex_t mutex;
-};
-
-#define MAX_VIRTIO_BACKLOG 128
-
-static void vhost_user_server_new_connection(int fd, void *data, int *remove);
-static void vhost_user_read_cb(int fd, void *dat, int *remove);
-static int create_unix_socket(struct vhost_user_socket *vsocket);
-static int vhost_user_start_client(struct vhost_user_socket *vsocket);
-
-static struct vhost_user vhost_user = {
+struct vhost_user vhost_user = {
 	.fdset = {
 		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
 		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
@@ -97,459 +27,6 @@ static struct vhost_user vhost_user = {
 	.mutex = PTHREAD_MUTEX_INITIALIZER,
 };
 
-/*
- * return bytes# of read on success or negative val on failure. Update fdnum
- * with number of fds read.
- */
-int
-read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
-		int *fd_num)
-{
-	struct iovec iov;
-	struct msghdr msgh;
-	char control[CMSG_SPACE(max_fds * sizeof(int))];
-	struct cmsghdr *cmsg;
-	int got_fds = 0;
-	int ret;
-
-	*fd_num = 0;
-
-	memset(&msgh, 0, sizeof(msgh));
-	iov.iov_base = buf;
-	iov.iov_len  = buflen;
-
-	msgh.msg_iov = &iov;
-	msgh.msg_iovlen = 1;
-	msgh.msg_control = control;
-	msgh.msg_controllen = sizeof(control);
-
-	ret = recvmsg(sockfd, &msgh, 0);
-	if (ret <= 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "recvmsg failed\n");
-		return ret;
-	}
-
-	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) {
-		RTE_LOG(ERR, VHOST_CONFIG, "truncted msg\n");
-		return -1;
-	}
-
-	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
-		cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
-		if ((cmsg->cmsg_level == SOL_SOCKET) &&
-			(cmsg->cmsg_type == SCM_RIGHTS)) {
-			got_fds = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
-			*fd_num = got_fds;
-			memcpy(fds, CMSG_DATA(cmsg), got_fds * sizeof(int));
-			break;
-		}
-	}
-
-	/* Clear out unused file descriptors */
-	while (got_fds < max_fds)
-		fds[got_fds++] = -1;
-
-	return ret;
-}
-
-int
-send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
-{
-
-	struct iovec iov;
-	struct msghdr msgh;
-	size_t fdsize = fd_num * sizeof(int);
-	char control[CMSG_SPACE(fdsize)];
-	struct cmsghdr *cmsg;
-	int ret;
-
-	memset(&msgh, 0, sizeof(msgh));
-	iov.iov_base = buf;
-	iov.iov_len = buflen;
-
-	msgh.msg_iov = &iov;
-	msgh.msg_iovlen = 1;
-
-	if (fds && fd_num > 0) {
-		msgh.msg_control = control;
-		msgh.msg_controllen = sizeof(control);
-		cmsg = CMSG_FIRSTHDR(&msgh);
-		if (cmsg == NULL) {
-			RTE_LOG(ERR, VHOST_CONFIG, "cmsg == NULL\n");
-			errno = EINVAL;
-			return -1;
-		}
-		cmsg->cmsg_len = CMSG_LEN(fdsize);
-		cmsg->cmsg_level = SOL_SOCKET;
-		cmsg->cmsg_type = SCM_RIGHTS;
-		memcpy(CMSG_DATA(cmsg), fds, fdsize);
-	} else {
-		msgh.msg_control = NULL;
-		msgh.msg_controllen = 0;
-	}
-
-	do {
-		ret = sendmsg(sockfd, &msgh, MSG_NOSIGNAL);
-	} while (ret < 0 && errno == EINTR);
-
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,  "sendmsg error\n");
-		return ret;
-	}
-
-	return ret;
-}
-
-static void
-vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
-{
-	int vid;
-	size_t size;
-	struct vhost_user_connection *conn;
-	int ret;
-
-	if (vsocket == NULL)
-		return;
-
-	conn = malloc(sizeof(*conn));
-	if (conn == NULL) {
-		close(fd);
-		return;
-	}
-
-	vid = vhost_new_device();
-	if (vid == -1) {
-		goto err;
-	}
-
-	size = strnlen(vsocket->path, PATH_MAX);
-	vhost_set_ifname(vid, vsocket->path, size);
-
-	vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
-
-	vhost_attach_vdpa_device(vid, vsocket->vdpa_dev_id);
-
-	if (vsocket->dequeue_zero_copy)
-		vhost_enable_dequeue_zero_copy(vid);
-
-	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
-
-	if (vsocket->notify_ops->new_connection) {
-		ret = vsocket->notify_ops->new_connection(vid);
-		if (ret < 0) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"failed to add vhost user connection with fd %d\n",
-				fd);
-			goto err_cleanup;
-		}
-	}
-
-	conn->connfd = fd;
-	conn->vsocket = vsocket;
-	conn->vid = vid;
-	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb,
-			NULL, conn);
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"failed to add fd %d into vhost server fdset\n",
-			fd);
-
-		if (vsocket->notify_ops->destroy_connection)
-			vsocket->notify_ops->destroy_connection(conn->vid);
-
-		goto err_cleanup;
-	}
-
-	pthread_mutex_lock(&vsocket->conn_mutex);
-	TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next);
-	pthread_mutex_unlock(&vsocket->conn_mutex);
-
-	fdset_pipe_notify(&vhost_user.fdset);
-	return;
-
-err_cleanup:
-	vhost_destroy_device(vid);
-err:
-	free(conn);
-	close(fd);
-}
-
-/* call back when there is new vhost-user connection from client  */
-static void
-vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
-{
-	struct vhost_user_socket *vsocket = dat;
-
-	fd = accept(fd, NULL, NULL);
-	if (fd < 0)
-		return;
-
-	RTE_LOG(INFO, VHOST_CONFIG, "new vhost user connection is %d\n", fd);
-	vhost_user_add_connection(fd, vsocket);
-}
-
-static void
-vhost_user_read_cb(int connfd, void *dat, int *remove)
-{
-	struct vhost_user_connection *conn = dat;
-	struct vhost_user_socket *vsocket = conn->vsocket;
-	int ret;
-
-	ret = vhost_user_msg_handler(conn->vid, connfd);
-	if (ret < 0) {
-		struct virtio_net *dev = get_device(conn->vid);
-
-		close(connfd);
-		*remove = 1;
-
-		if (dev)
-			vhost_destroy_device_notify(dev);
-
-		if (vsocket->notify_ops->destroy_connection)
-			vsocket->notify_ops->destroy_connection(conn->vid);
-
-		vhost_destroy_device(conn->vid);
-
-		pthread_mutex_lock(&vsocket->conn_mutex);
-		TAILQ_REMOVE(&vsocket->conn_list, conn, next);
-		pthread_mutex_unlock(&vsocket->conn_mutex);
-
-		free(conn);
-
-		if (vsocket->reconnect) {
-			create_unix_socket(vsocket);
-			vhost_user_start_client(vsocket);
-		}
-	}
-}
-
-static int
-create_unix_socket(struct vhost_user_socket *vsocket)
-{
-	int fd;
-	struct sockaddr_un *un = &vsocket->un;
-
-	fd = socket(AF_UNIX, SOCK_STREAM, 0);
-	if (fd < 0)
-		return -1;
-	RTE_LOG(INFO, VHOST_CONFIG, "vhost-user %s: socket created, fd: %d\n",
-		vsocket->is_server ? "server" : "client", fd);
-
-	if (!vsocket->is_server && fcntl(fd, F_SETFL, O_NONBLOCK)) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"vhost-user: can't set nonblocking mode for socket, fd: "
-			"%d (%s)\n", fd, strerror(errno));
-		close(fd);
-		return -1;
-	}
-
-	memset(un, 0, sizeof(*un));
-	un->sun_family = AF_UNIX;
-	strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path));
-	un->sun_path[sizeof(un->sun_path) - 1] = '\0';
-
-	vsocket->socket_fd = fd;
-	return 0;
-}
-
-static int
-vhost_user_start_server(struct vhost_user_socket *vsocket)
-{
-	int ret;
-	int fd = vsocket->socket_fd;
-	const char *path = vsocket->path;
-
-	/*
-	 * bind () may fail if the socket file with the same name already
-	 * exists. But the library obviously should not delete the file
-	 * provided by the user, since we can not be sure that it is not
-	 * being used by other applications. Moreover, many applications form
-	 * socket names based on user input, which is prone to errors.
-	 *
-	 * The user must ensure that the socket does not exist before
-	 * registering the vhost driver in server mode.
-	 */
-	ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un));
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"failed to bind to %s: %s; remove it and try again\n",
-			path, strerror(errno));
-		goto err;
-	}
-	RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path);
-
-	ret = listen(fd, MAX_VIRTIO_BACKLOG);
-	if (ret < 0)
-		goto err;
-
-	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
-		  NULL, vsocket);
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"failed to add listen fd %d to vhost server fdset\n",
-			fd);
-		goto err;
-	}
-
-	return 0;
-
-err:
-	close(fd);
-	return -1;
-}
-
-struct vhost_user_reconnect {
-	struct sockaddr_un un;
-	int fd;
-	struct vhost_user_socket *vsocket;
-
-	TAILQ_ENTRY(vhost_user_reconnect) next;
-};
-
-TAILQ_HEAD(vhost_user_reconnect_tailq_list, vhost_user_reconnect);
-struct vhost_user_reconnect_list {
-	struct vhost_user_reconnect_tailq_list head;
-	pthread_mutex_t mutex;
-};
-
-static struct vhost_user_reconnect_list reconn_list;
-static pthread_t reconn_tid;
-
-static int
-vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
-{
-	int ret, flags;
-
-	ret = connect(fd, un, sz);
-	if (ret < 0 && errno != EISCONN)
-		return -1;
-
-	flags = fcntl(fd, F_GETFL, 0);
-	if (flags < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"can't get flags for connfd %d\n", fd);
-		return -2;
-	}
-	if ((flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags & ~O_NONBLOCK)) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-				"can't disable nonblocking on fd %d\n", fd);
-		return -2;
-	}
-	return 0;
-}
-
-static void *
-vhost_user_client_reconnect(void *arg __rte_unused)
-{
-	int ret;
-	struct vhost_user_reconnect *reconn, *next;
-
-	while (1) {
-		pthread_mutex_lock(&reconn_list.mutex);
-
-		/*
-		 * An equal implementation of TAILQ_FOREACH_SAFE,
-		 * which does not exist on all platforms.
-		 */
-		for (reconn = TAILQ_FIRST(&reconn_list.head);
-		     reconn != NULL; reconn = next) {
-			next = TAILQ_NEXT(reconn, next);
-
-			ret = vhost_user_connect_nonblock(reconn->fd,
-						(struct sockaddr *)&reconn->un,
-						sizeof(reconn->un));
-			if (ret == -2) {
-				close(reconn->fd);
-				RTE_LOG(ERR, VHOST_CONFIG,
-					"reconnection for fd %d failed\n",
-					reconn->fd);
-				goto remove_fd;
-			}
-			if (ret == -1)
-				continue;
-
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"%s: connected\n", reconn->vsocket->path);
-			vhost_user_add_connection(reconn->fd, reconn->vsocket);
-remove_fd:
-			TAILQ_REMOVE(&reconn_list.head, reconn, next);
-			free(reconn);
-		}
-
-		pthread_mutex_unlock(&reconn_list.mutex);
-		sleep(1);
-	}
-
-	return NULL;
-}
-
-static int
-vhost_user_reconnect_init(void)
-{
-	int ret;
-
-	ret = pthread_mutex_init(&reconn_list.mutex, NULL);
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "failed to initialize mutex");
-		return ret;
-	}
-	TAILQ_INIT(&reconn_list.head);
-
-	ret = rte_ctrl_thread_create(&reconn_tid, "vhost_reconn", NULL,
-			     vhost_user_client_reconnect, NULL);
-	if (ret != 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "failed to create reconnect thread");
-		if (pthread_mutex_destroy(&reconn_list.mutex)) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"failed to destroy reconnect mutex");
-		}
-	}
-
-	return ret;
-}
-
-static int
-vhost_user_start_client(struct vhost_user_socket *vsocket)
-{
-	int ret;
-	int fd = vsocket->socket_fd;
-	const char *path = vsocket->path;
-	struct vhost_user_reconnect *reconn;
-
-	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un,
-					  sizeof(vsocket->un));
-	if (ret == 0) {
-		vhost_user_add_connection(fd, vsocket);
-		return 0;
-	}
-
-	RTE_LOG(WARNING, VHOST_CONFIG,
-		"failed to connect to %s: %s\n",
-		path, strerror(errno));
-
-	if (ret == -2 || !vsocket->reconnect) {
-		close(fd);
-		return -1;
-	}
-
-	RTE_LOG(INFO, VHOST_CONFIG, "%s: reconnecting...\n", path);
-	reconn = malloc(sizeof(*reconn));
-	if (reconn == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"failed to allocate memory for reconnect\n");
-		close(fd);
-		return -1;
-	}
-	reconn->un = vsocket->un;
-	reconn->fd = fd;
-	reconn->vsocket = vsocket;
-	pthread_mutex_lock(&reconn_list.mutex);
-	TAILQ_INSERT_TAIL(&reconn_list.head, reconn, next);
-	pthread_mutex_unlock(&reconn_list.mutex);
-
-	return 0;
-}
-
 static struct vhost_user_socket *
 find_vhost_user_socket(const char *path)
 {
@@ -952,30 +429,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	return ret;
 }
 
-static bool
-vhost_user_remove_reconnect(struct vhost_user_socket *vsocket)
-{
-	int found = false;
-	struct vhost_user_reconnect *reconn, *next;
-
-	pthread_mutex_lock(&reconn_list.mutex);
-
-	for (reconn = TAILQ_FIRST(&reconn_list.head);
-	     reconn != NULL; reconn = next) {
-		next = TAILQ_NEXT(reconn, next);
-
-		if (reconn->vsocket == vsocket) {
-			TAILQ_REMOVE(&reconn_list.head, reconn, next);
-			close(reconn->fd);
-			free(reconn);
-			found = true;
-			break;
-		}
-	}
-	pthread_mutex_unlock(&reconn_list.mutex);
-	return found;
-}
-
 /**
  * Unregister the specified vhost socket
  */
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 3f0c308..89a5b7d 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -4,7 +4,492 @@
  * Copyright(c) 2019 Arrikto Inc.
  */
 
+#include <fcntl.h>
+
+#include <rte_log.h>
+
 #include "vhost.h"
+#include "vhost_user.h"
+
+#define MAX_VIRTIO_BACKLOG 128
+
+static void vhost_user_read_cb(int connfd, void *dat, int *remove);
+
+/*
+ * return bytes# of read on success or negative val on failure. Update fdnum
+ * with number of fds read.
+ */
+int
+read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
+		int *fd_num)
+{
+	struct iovec iov;
+	struct msghdr msgh;
+	char control[CMSG_SPACE(max_fds * sizeof(int))];
+	struct cmsghdr *cmsg;
+	int got_fds = 0;
+	int ret;
+
+	*fd_num = 0;
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = buf;
+	iov.iov_len  = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+	msgh.msg_control = control;
+	msgh.msg_controllen = sizeof(control);
+
+	ret = recvmsg(sockfd, &msgh, 0);
+	if (ret <= 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "recvmsg failed\n");
+		return ret;
+	}
+
+	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "truncted msg\n");
+		return -1;
+	}
+
+	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+		cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+		if ((cmsg->cmsg_level == SOL_SOCKET) &&
+			(cmsg->cmsg_type == SCM_RIGHTS)) {
+			got_fds = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
+			*fd_num = got_fds;
+			memcpy(fds, CMSG_DATA(cmsg), got_fds * sizeof(int));
+			break;
+		}
+	}
+
+	/* Clear out unused file descriptors */
+	while (got_fds < max_fds)
+		fds[got_fds++] = -1;
+
+	return ret;
+}
+
+int
+send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+{
+	struct iovec iov;
+	struct msghdr msgh;
+	size_t fdsize = fd_num * sizeof(int);
+	char control[CMSG_SPACE(fdsize)];
+	struct cmsghdr *cmsg;
+	int ret;
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = buf;
+	iov.iov_len = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+
+	if (fds && fd_num > 0) {
+		msgh.msg_control = control;
+		msgh.msg_controllen = sizeof(control);
+		cmsg = CMSG_FIRSTHDR(&msgh);
+		if (cmsg == NULL) {
+			RTE_LOG(ERR, VHOST_CONFIG, "cmsg == NULL\n");
+			errno = EINVAL;
+			return -1;
+		}
+		cmsg->cmsg_len = CMSG_LEN(fdsize);
+		cmsg->cmsg_level = SOL_SOCKET;
+		cmsg->cmsg_type = SCM_RIGHTS;
+		memcpy(CMSG_DATA(cmsg), fds, fdsize);
+	} else {
+		msgh.msg_control = NULL;
+		msgh.msg_controllen = 0;
+	}
+
+	do {
+		ret = sendmsg(sockfd, &msgh, MSG_NOSIGNAL);
+	} while (ret < 0 && errno == EINTR);
+
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,  "sendmsg error\n");
+		return ret;
+	}
+
+	return ret;
+}
+
+static void
+vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
+{
+	int vid;
+	size_t size;
+	struct vhost_user_connection *conn;
+	int ret;
+
+	if (vsocket == NULL)
+		return;
+
+	conn = malloc(sizeof(*conn));
+	if (conn == NULL) {
+		close(fd);
+		return;
+	}
+
+	vid = vhost_new_device();
+	if (vid == -1) {
+		goto err;
+	}
+
+	size = strnlen(vsocket->path, PATH_MAX);
+	vhost_set_ifname(vid, vsocket->path, size);
+
+	vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
+
+	vhost_attach_vdpa_device(vid, vsocket->vdpa_dev_id);
+
+	if (vsocket->dequeue_zero_copy)
+		vhost_enable_dequeue_zero_copy(vid);
+
+	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
+
+	if (vsocket->notify_ops->new_connection) {
+		ret = vsocket->notify_ops->new_connection(vid);
+		if (ret < 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to add vhost user connection with fd %d\n",
+				fd);
+			goto err_cleanup;
+		}
+	}
+
+	conn->connfd = fd;
+	conn->vsocket = vsocket;
+	conn->vid = vid;
+	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb,
+			NULL, conn);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to add fd %d into vhost server fdset\n",
+			fd);
+
+		if (vsocket->notify_ops->destroy_connection)
+			vsocket->notify_ops->destroy_connection(conn->vid);
+
+		goto err_cleanup;
+	}
+
+	pthread_mutex_lock(&vsocket->conn_mutex);
+	TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next);
+	pthread_mutex_unlock(&vsocket->conn_mutex);
+
+	fdset_pipe_notify(&vhost_user.fdset);
+	return;
+
+err_cleanup:
+	vhost_destroy_device(vid);
+err:
+	free(conn);
+	close(fd);
+}
+
+/* Callback invoked when a new vhost-user client connects. */
+static void
+vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
+{
+	struct vhost_user_socket *vsocket = dat;
+
+	fd = accept(fd, NULL, NULL);
+	if (fd < 0)
+		return;
+
+	RTE_LOG(INFO, VHOST_CONFIG, "new vhost user connection is %d\n", fd);
+	vhost_user_add_connection(fd, vsocket);
+}
+
+static void
+vhost_user_read_cb(int connfd, void *dat, int *remove)
+{
+	struct vhost_user_connection *conn = dat;
+	struct vhost_user_socket *vsocket = conn->vsocket;
+	int ret;
+
+	ret = vhost_user_msg_handler(conn->vid, connfd);
+	if (ret < 0) {
+		struct virtio_net *dev = get_device(conn->vid);
+
+		close(connfd);
+		*remove = 1;
+
+		if (dev)
+			vhost_destroy_device_notify(dev);
+
+		if (vsocket->notify_ops->destroy_connection)
+			vsocket->notify_ops->destroy_connection(conn->vid);
+
+		vhost_destroy_device(conn->vid);
+
+		pthread_mutex_lock(&vsocket->conn_mutex);
+		TAILQ_REMOVE(&vsocket->conn_list, conn, next);
+		pthread_mutex_unlock(&vsocket->conn_mutex);
+
+		free(conn);
+
+		if (vsocket->reconnect) {
+			create_unix_socket(vsocket);
+			vhost_user_start_client(vsocket);
+		}
+	}
+}
+
+int
+create_unix_socket(struct vhost_user_socket *vsocket)
+{
+	int fd;
+	struct sockaddr_un *un = &vsocket->un;
+
+	fd = socket(AF_UNIX, SOCK_STREAM, 0);
+	if (fd < 0)
+		return -1;
+	RTE_LOG(INFO, VHOST_CONFIG, "vhost-user %s: socket created, fd: %d\n",
+		vsocket->is_server ? "server" : "client", fd);
+
+	if (!vsocket->is_server && fcntl(fd, F_SETFL, O_NONBLOCK)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"vhost-user: can't set nonblocking mode for socket, fd: "
+			"%d (%s)\n", fd, strerror(errno));
+		close(fd);
+		return -1;
+	}
+
+	memset(un, 0, sizeof(*un));
+	un->sun_family = AF_UNIX;
+	strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path));
+	un->sun_path[sizeof(un->sun_path) - 1] = '\0';
+
+	vsocket->socket_fd = fd;
+	return 0;
+}
+
+int
+vhost_user_start_server(struct vhost_user_socket *vsocket)
+{
+	int ret;
+	int fd = vsocket->socket_fd;
+	const char *path = vsocket->path;
+
+	/*
+	 * bind() may fail if a socket file with the same name already
+	 * exists. But the library obviously should not delete the file
+	 * provided by the user, since we cannot be sure that it is not
+	 * being used by other applications. Moreover, many applications form
+	 * socket names based on user input, which is prone to errors.
+	 *
+	 * The user must ensure that the socket does not exist before
+	 * registering the vhost driver in server mode.
+	 */
+	ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un));
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to bind to %s: %s; remove it and try again\n",
+			path, strerror(errno));
+		goto err;
+	}
+	RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path);
+
+	ret = listen(fd, MAX_VIRTIO_BACKLOG);
+	if (ret < 0)
+		goto err;
+
+	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
+		  NULL, vsocket);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to add listen fd %d to vhost server fdset\n",
+			fd);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	close(fd);
+	return -1;
+}
+
+struct vhost_user_reconnect {
+	struct sockaddr_un un;
+	int fd;
+	struct vhost_user_socket *vsocket;
+
+	TAILQ_ENTRY(vhost_user_reconnect) next;
+};
+
+TAILQ_HEAD(vhost_user_reconnect_tailq_list, vhost_user_reconnect);
+struct vhost_user_reconnect_list {
+	struct vhost_user_reconnect_tailq_list head;
+	pthread_mutex_t mutex;
+};
+
+static struct vhost_user_reconnect_list reconn_list;
+pthread_t reconn_tid;
+
+static int
+vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
+{
+	int ret, flags;
+
+	ret = connect(fd, un, sz);
+	if (ret < 0 && errno != EISCONN)
+		return -1;
+
+	flags = fcntl(fd, F_GETFL, 0);
+	if (flags < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"can't get flags for connfd %d\n", fd);
+		return -2;
+	}
+	if ((flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags & ~O_NONBLOCK)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"can't disable nonblocking on fd %d\n", fd);
+		return -2;
+	}
+	return 0;
+}
+
+static void *
+vhost_user_client_reconnect(void *arg __rte_unused)
+{
+	int ret;
+	struct vhost_user_reconnect *reconn, *next;
+
+	while (1) {
+		pthread_mutex_lock(&reconn_list.mutex);
+
+		/*
+		 * Open-coded equivalent of TAILQ_FOREACH_SAFE,
+		 * which is not available on all platforms.
+		 */
+		for (reconn = TAILQ_FIRST(&reconn_list.head);
+		     reconn != NULL; reconn = next) {
+			next = TAILQ_NEXT(reconn, next);
+
+			ret = vhost_user_connect_nonblock(reconn->fd,
+						(struct sockaddr *)&reconn->un,
+						sizeof(reconn->un));
+			if (ret == -2) {
+				close(reconn->fd);
+				RTE_LOG(ERR, VHOST_CONFIG,
+					"reconnection for fd %d failed\n",
+					reconn->fd);
+				goto remove_fd;
+			}
+			if (ret == -1)
+				continue;
+
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"%s: connected\n", reconn->vsocket->path);
+			vhost_user_add_connection(reconn->fd, reconn->vsocket);
+remove_fd:
+			TAILQ_REMOVE(&reconn_list.head, reconn, next);
+			free(reconn);
+		}
+
+		pthread_mutex_unlock(&reconn_list.mutex);
+		sleep(1);
+	}
+
+	return NULL;
+}
+
+int
+vhost_user_reconnect_init(void)
+{
+	int ret;
+
+	ret = pthread_mutex_init(&reconn_list.mutex, NULL);
+	if (ret != 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "failed to initialize mutex\n");
+		return ret;
+	}
+	TAILQ_INIT(&reconn_list.head);
+
+	ret = rte_ctrl_thread_create(&reconn_tid, "vhost_reconn", NULL,
+			     vhost_user_client_reconnect, NULL);
+	if (ret != 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "failed to create reconnect thread\n");
+		if (pthread_mutex_destroy(&reconn_list.mutex)) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to destroy reconnect mutex\n");
+		}
+	}
+
+	return ret;
+}
+
+int
+vhost_user_start_client(struct vhost_user_socket *vsocket)
+{
+	int ret;
+	int fd = vsocket->socket_fd;
+	const char *path = vsocket->path;
+	struct vhost_user_reconnect *reconn;
+
+	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un,
+					  sizeof(vsocket->un));
+	if (ret == 0) {
+		vhost_user_add_connection(fd, vsocket);
+		return 0;
+	}
+
+	RTE_LOG(WARNING, VHOST_CONFIG,
+		"failed to connect to %s: %s\n",
+		path, strerror(errno));
+
+	if (ret == -2 || !vsocket->reconnect) {
+		close(fd);
+		return -1;
+	}
+
+	RTE_LOG(INFO, VHOST_CONFIG, "%s: reconnecting...\n", path);
+	reconn = malloc(sizeof(*reconn));
+	if (reconn == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to allocate memory for reconnect\n");
+		close(fd);
+		return -1;
+	}
+	reconn->un = vsocket->un;
+	reconn->fd = fd;
+	reconn->vsocket = vsocket;
+	pthread_mutex_lock(&reconn_list.mutex);
+	TAILQ_INSERT_TAIL(&reconn_list.head, reconn, next);
+	pthread_mutex_unlock(&reconn_list.mutex);
+
+	return 0;
+}
+
+bool
+vhost_user_remove_reconnect(struct vhost_user_socket *vsocket)
+{
+	int found = false;
+	struct vhost_user_reconnect *reconn, *next;
+
+	pthread_mutex_lock(&reconn_list.mutex);
+
+	for (reconn = TAILQ_FIRST(&reconn_list.head);
+	     reconn != NULL; reconn = next) {
+		next = TAILQ_NEXT(reconn, next);
+
+		if (reconn->vsocket == vsocket) {
+			TAILQ_REMOVE(&reconn_list.head, reconn, next);
+			close(reconn->fd);
+			free(reconn);
+			found = true;
+			break;
+		}
+	}
+	pthread_mutex_unlock(&reconn_list.mutex);
+	return found;
+}
 
 static int
 af_unix_vring_call(struct virtio_net *dev __rte_unused,
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 077f213..c363369 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -5,6 +5,7 @@
 #ifndef _VHOST_NET_CDEV_H_
 #define _VHOST_NET_CDEV_H_
 #include <stdint.h>
+#include <stdbool.h>
 #include <stdio.h>
 #include <stdbool.h>
 #include <sys/types.h>
@@ -13,13 +14,16 @@
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
 #include <sys/socket.h>
+#include <sys/un.h> /* TODO remove when trans_af_unix.c refactoring is done */
 #include <linux/if.h>
+#include <pthread.h>
 
 #include <rte_log.h>
 #include <rte_ether.h>
 #include <rte_rwlock.h>
 #include <rte_malloc.h>
 
+#include "fd_man.h"
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
 
@@ -360,6 +364,78 @@ struct virtio_net {
 	struct rte_vhost_user_extern_ops extern_ops;
 } __rte_cache_aligned;
 
+/* The vhost_user, vhost_user_socket, vhost_user_connection, and reconnect
+ * declarations are temporary measures for moving AF_UNIX code into
+ * trans_af_unix.c.  They will be cleaned up as socket.c is untangled from
+ * trans_af_unix.c.
+ */
+TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
+
+/*
+ * Every time rte_vhost_driver_register() is invoked, an associated
+ * vhost_user_socket struct will be created.
+ */
+struct vhost_user_socket {
+	struct vhost_user_connection_list conn_list;
+	pthread_mutex_t conn_mutex;
+	char *path;
+	int socket_fd;
+	struct sockaddr_un un;
+	bool is_server;
+	bool reconnect;
+	bool dequeue_zero_copy;
+	bool iommu_support;
+	bool use_builtin_virtio_net;
+
+	/*
+	 * The "supported_features" indicates the feature bits the
+	 * vhost driver supports. The "features" indicates the feature
+	 * bits after the rte_vhost_driver_features_disable/enable().
+	 * It is also the final feature bits used for vhost-user
+	 * features negotiation.
+	 */
+	uint64_t supported_features;
+	uint64_t features;
+
+	uint64_t protocol_features;
+
+	/*
+	 * Device id to identify a specific backend device.
+	 * It's set to -1 for the default software implementation.
+	 * If valid, one socket can have 1 connection only.
+	 */
+	int vdpa_dev_id;
+
+	struct vhost_device_ops const *notify_ops;
+};
+
+struct vhost_user_connection {
+	struct vhost_user_socket *vsocket;
+	int connfd;
+	int vid;
+
+	TAILQ_ENTRY(vhost_user_connection) next;
+};
+
+#define MAX_VHOST_SOCKET 1024
+struct vhost_user {
+	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
+	struct fdset fdset;
+	int vsocket_cnt;
+	pthread_mutex_t mutex;
+};
+
+extern struct vhost_user vhost_user;
+
+int create_unix_socket(struct vhost_user_socket *vsocket);
+int vhost_user_start_server(struct vhost_user_socket *vsocket);
+int vhost_user_start_client(struct vhost_user_socket *vsocket);
+
+extern pthread_t reconn_tid;
+
+int vhost_user_reconnect_init(void);
+bool vhost_user_remove_reconnect(struct vhost_user_socket *vsocket);
+
 static __rte_always_inline bool
 vq_is_packed(struct virtio_net *dev)
 {
-- 
2.7.4



* [dpdk-dev] [PATCH 03/28] vhost: allocate per-socket transport state
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

From: Stefan Hajnoczi <stefanha@redhat.com>

vhost-user transports have per-socket state (like file descriptors).
Make it possible for transports to keep state beyond what is included in
struct vhost_user_socket.

This patch makes it possible to move AF_UNIX-specific fields from struct
vhost_user_socket into trans_af_unix.c in later patches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/socket.c        | 6 ++++--
 lib/librte_vhost/trans_af_unix.c | 5 +++++
 lib/librte_vhost/vhost.h         | 9 +++++++++
 3 files changed, 18 insertions(+), 2 deletions(-)
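
Note (illustration only, not part of the patch): the per-socket state
scheme described in the commit message can be sketched roughly as
below. All demo_* names are hypothetical stand-ins and do not exist in
librte_vhost; the sketch only assumes that the generic struct is the
first member of the transport-specific struct and that the generic code
allocates socket_size bytes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vhost_transport_ops. */
struct demo_transport_ops {
	size_t socket_size; /* size of the transport-specific struct */
};

/* Hypothetical stand-in for struct vhost_user_socket (generic state). */
struct demo_socket {
	const struct demo_transport_ops *trans_ops;
	char *path;
};

/* Transport-specific struct that embeds the generic one. */
struct demo_af_unix_socket {
	struct demo_socket socket; /* must be the first field */
	int socket_fd;             /* transport-private state */
};

static const struct demo_transport_ops demo_af_unix_ops = {
	.socket_size = sizeof(struct demo_af_unix_socket),
};

/*
 * Generic code allocates the larger, zeroed struct but only touches the
 * embedded base. Because the base is the first member, both pointers
 * refer to the same address.
 */
static struct demo_socket *
demo_socket_alloc(const struct demo_transport_ops *ops)
{
	struct demo_socket *s = calloc(1, ops->socket_size);

	if (s == NULL)
		return NULL;
	s->trans_ops = ops;
	return s;
}
```

A caller would do demo_socket_alloc(&demo_af_unix_ops) and the transport
can then cast the result back to struct demo_af_unix_socket to reach
socket_fd. The generic code never needs to know the derived layout.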

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index a993b67..60d3546 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -316,6 +316,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 {
 	int ret = -1;
 	struct vhost_user_socket *vsocket;
+	const struct vhost_transport_ops *trans_ops = &af_unix_trans_ops;
 
 	if (!path)
 		return -1;
@@ -328,10 +329,11 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		goto out;
 	}
 
-	vsocket = malloc(sizeof(struct vhost_user_socket));
+	vsocket = malloc(trans_ops->socket_size);
 	if (!vsocket)
 		goto out;
-	memset(vsocket, 0, sizeof(struct vhost_user_socket));
+	memset(vsocket, 0, trans_ops->socket_size);
+	vsocket->trans_ops = trans_ops;
 	vsocket->path = strdup(path);
 	if (vsocket->path == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 89a5b7d..4de2579 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -13,6 +13,10 @@
 
 #define MAX_VIRTIO_BACKLOG 128
 
+struct af_unix_socket {
+	struct vhost_user_socket socket; /* must be the first field! */
+};
+
 static void vhost_user_read_cb(int connfd, void *dat, int *remove);
 
 /*
@@ -501,5 +505,6 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused,
 }
 
 const struct vhost_transport_ops af_unix_trans_ops = {
+	.socket_size = sizeof(struct af_unix_socket),
 	.vring_call = af_unix_vring_call,
 };
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index c363369..9615392 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -296,6 +296,9 @@ struct virtio_net;
  * A structure containing function pointers for transport-specific operations.
  */
 struct vhost_transport_ops {
+	/** Size of struct vhost_user_socket-derived per-socket state */
+	size_t socket_size;
+
 	/**
 	 * Notify the guest that used descriptors have been added to the vring.
 	 * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked
@@ -374,6 +377,11 @@ TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
  * vhost_user_socket struct will be created.
+ *
+ * Transport-specific per-socket state can be kept by embedding this struct at
+ * the beginning of a transport-specific struct.  Set
+ * vhost_transport_ops->socket_size to the size of the transport-specific
+ * struct.
  */
 struct vhost_user_socket {
 	struct vhost_user_connection_list conn_list;
@@ -407,6 +415,7 @@ struct vhost_user_socket {
 	int vdpa_dev_id;
 
 	struct vhost_device_ops const *notify_ops;
+	struct vhost_transport_ops const *trans_ops;
 };
 
 struct vhost_user_connection {
-- 
2.7.4



* [dpdk-dev] [PATCH 04/28] vhost: move socket fd and un sockaddr
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The socket file descriptor and AF_UNIX sockaddr are specific to the
AF_UNIX transport, so move them into trans_af_unix.c.

In order to do this, we need to begin defining the vhost_transport_ops
interface that will allow librte_vhost to support multiple transports.
This patch adds socket_init() and socket_cleanup() to
vhost_transport_ops.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/socket.c        | 11 ++------
 lib/librte_vhost/trans_af_unix.c | 55 ++++++++++++++++++++++++++++++++--------
 lib/librte_vhost/vhost.h         | 30 ++++++++++++++++++----
 3 files changed, 72 insertions(+), 24 deletions(-)
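
Note (illustration only, not part of the patch): a minimal sketch of
how socket_init()/socket_cleanup() hooks can reach transport-private
fields through a container_of-style macro. The demo_* names and the
placeholder fd value are assumptions made for this example; the real
code uses DPDK's container_of() and actually calls socket()/close().

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified container_of; hypothetical helper for this sketch only. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_socket;

/* Hypothetical subset of the transport operations. */
struct demo_ops {
	int (*socket_init)(struct demo_socket *s, uint64_t flags);
	void (*socket_cleanup)(struct demo_socket *s);
};

/* Generic per-socket state. */
struct demo_socket {
	const struct demo_ops *ops;
};

/* Transport-specific state wrapping the generic struct. */
struct demo_af_unix_socket {
	struct demo_socket socket; /* must be the first field */
	int socket_fd;
};

static int
demo_af_unix_init(struct demo_socket *s, uint64_t flags)
{
	struct demo_af_unix_socket *af =
		demo_container_of(s, struct demo_af_unix_socket, socket);

	(void)flags;        /* unused here, as in the AF_UNIX transport */
	af->socket_fd = 42; /* placeholder for the fd returned by socket() */
	return 0;
}

static void
demo_af_unix_cleanup(struct demo_socket *s)
{
	struct demo_af_unix_socket *af =
		demo_container_of(s, struct demo_af_unix_socket, socket);

	af->socket_fd = -1; /* placeholder for close() on the fd */
}

static const struct demo_ops demo_af_unix_ops = {
	.socket_init = demo_af_unix_init,
	.socket_cleanup = demo_af_unix_cleanup,
};
```

The generic side only ever calls s->ops->socket_init(s, flags) and
s->ops->socket_cleanup(s), which is the same shape as the socket.c
changes in this patch.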

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 60d3546..3b5608c 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -408,7 +408,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	} else {
 		vsocket->is_server = true;
 	}
-	ret = create_unix_socket(vsocket);
+	ret = trans_ops->socket_init(vsocket, flags);
 	if (ret < 0) {
 		goto out_mutex;
 	}
@@ -480,14 +480,7 @@ rte_vhost_driver_unregister(const char *path)
 			}
 			pthread_mutex_unlock(&vsocket->conn_mutex);
 
-			if (vsocket->is_server) {
-				fdset_del(&vhost_user.fdset,
-						vsocket->socket_fd);
-				close(vsocket->socket_fd);
-				unlink(path);
-			} else if (vsocket->reconnect) {
-				vhost_user_remove_reconnect(vsocket);
-			}
+			vsocket->trans_ops->socket_cleanup(vsocket);
 
 			pthread_mutex_destroy(&vsocket->conn_mutex);
 			vhost_user_socket_mem_free(vsocket);
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 4de2579..f23bb9c 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -4,6 +4,8 @@
  * Copyright(c) 2019 Arrikto Inc.
  */
 
+#include <sys/socket.h>
+#include <sys/un.h>
 #include <fcntl.h>
 
 #include <rte_log.h>
@@ -15,8 +17,11 @@
 
 struct af_unix_socket {
 	struct vhost_user_socket socket; /* must be the first field! */
+	int socket_fd;
+	struct sockaddr_un un;
 };
 
+static int create_unix_socket(struct vhost_user_socket *vsocket);
 static void vhost_user_read_cb(int connfd, void *dat, int *remove);
 
 /*
@@ -244,11 +249,13 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 	}
 }
 
-int
+static int
 create_unix_socket(struct vhost_user_socket *vsocket)
 {
+	struct af_unix_socket *af_vsocket =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int fd;
-	struct sockaddr_un *un = &vsocket->un;
+	struct sockaddr_un *un = &af_vsocket->un;
 
 	fd = socket(AF_UNIX, SOCK_STREAM, 0);
 	if (fd < 0)
@@ -269,15 +276,17 @@ create_unix_socket(struct vhost_user_socket *vsocket)
 	strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path));
 	un->sun_path[sizeof(un->sun_path) - 1] = '\0';
 
-	vsocket->socket_fd = fd;
+	af_vsocket->socket_fd = fd;
 	return 0;
 }
 
 int
 vhost_user_start_server(struct vhost_user_socket *vsocket)
 {
+	struct af_unix_socket *af_vsocket =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
-	int fd = vsocket->socket_fd;
+	int fd = af_vsocket->socket_fd;
 	const char *path = vsocket->path;
 
 	/*
@@ -290,7 +299,7 @@ vhost_user_start_server(struct vhost_user_socket *vsocket)
 	 * The user must ensure that the socket does not exist before
 	 * registering the vhost driver in server mode.
 	 */
-	ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un));
+	ret = bind(fd, (struct sockaddr *)&af_vsocket->un, sizeof(af_vsocket->un));
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"failed to bind to %s: %s; remove it and try again\n",
@@ -432,13 +441,15 @@ vhost_user_reconnect_init(void)
 int
 vhost_user_start_client(struct vhost_user_socket *vsocket)
 {
+	struct af_unix_socket *af_vsocket =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
-	int fd = vsocket->socket_fd;
+	int fd = af_vsocket->socket_fd;
 	const char *path = vsocket->path;
 	struct vhost_user_reconnect *reconn;
 
-	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un,
-					  sizeof(vsocket->un));
+	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&af_vsocket->un,
+					  sizeof(af_vsocket->un));
 	if (ret == 0) {
 		vhost_user_add_connection(fd, vsocket);
 		return 0;
@@ -461,7 +472,7 @@ vhost_user_start_client(struct vhost_user_socket *vsocket)
 		close(fd);
 		return -1;
 	}
-	reconn->un = vsocket->un;
+	reconn->un = af_vsocket->un;
 	reconn->fd = fd;
 	reconn->vsocket = vsocket;
 	pthread_mutex_lock(&reconn_list.mutex);
@@ -471,7 +482,7 @@ vhost_user_start_client(struct vhost_user_socket *vsocket)
 	return 0;
 }
 
-bool
+static bool
 vhost_user_remove_reconnect(struct vhost_user_socket *vsocket)
 {
 	int found = false;
@@ -496,6 +507,28 @@ vhost_user_remove_reconnect(struct vhost_user_socket *vsocket)
 }
 
 static int
+af_unix_socket_init(struct vhost_user_socket *vsocket,
+		    uint64_t flags __rte_unused)
+{
+	return create_unix_socket(vsocket);
+}
+
+static void
+af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
+{
+	struct af_unix_socket *af_vsocket =
+		container_of(vsocket, struct af_unix_socket, socket);
+
+	if (vsocket->is_server) {
+		fdset_del(&vhost_user.fdset, af_vsocket->socket_fd);
+		close(af_vsocket->socket_fd);
+		unlink(vsocket->path);
+	} else if (vsocket->reconnect) {
+		vhost_user_remove_reconnect(vsocket);
+	}
+}
+
+static int
 af_unix_vring_call(struct virtio_net *dev __rte_unused,
 		   struct vhost_virtqueue *vq)
 {
@@ -506,5 +539,7 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused,
 
 const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
+	.socket_init = af_unix_socket_init,
+	.socket_cleanup = af_unix_socket_cleanup,
 	.vring_call = af_unix_vring_call,
 };
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 9615392..40b5c25 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -14,7 +14,6 @@
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
 #include <sys/socket.h>
-#include <sys/un.h> /* TODO remove when trans_af_unix.c refactoring is done */
 #include <linux/if.h>
 #include <pthread.h>
 
@@ -291,6 +290,7 @@ struct guest_page {
 };
 
 struct virtio_net;
+struct vhost_user_socket;
 
 /**
  * A structure containing function pointers for transport-specific operations.
@@ -300,6 +300,30 @@ struct vhost_transport_ops {
 	size_t socket_size;
 
 	/**
+	 * Initialize a vhost-user socket that is being created by
+	 * rte_vhost_driver_register().  This function checks that the flags
+	 * are valid but does not establish a vhost-user connection.
+	 *
+	 * @param vsocket
+	 *  new socket
+	 * @param flags
+	 *  flags argument from rte_vhost_driver_register()
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*socket_init)(struct vhost_user_socket *vsocket, uint64_t flags);
+
+	/**
+	 * Free resources associated with a socket, including any established
+	 * connections.  This function calls vhost_destroy_device() to destroy
+	 * established connections for this socket.
+	 *
+	 * @param vsocket
+	 *  vhost socket
+	 */
+	void (*socket_cleanup)(struct vhost_user_socket *vsocket);
+
+	/**
 	 * Notify the guest that used descriptors have been added to the vring.
 	 * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked
 	 * so this function just needs to perform the notification.
@@ -387,8 +411,6 @@ struct vhost_user_socket {
 	struct vhost_user_connection_list conn_list;
 	pthread_mutex_t conn_mutex;
 	char *path;
-	int socket_fd;
-	struct sockaddr_un un;
 	bool is_server;
 	bool reconnect;
 	bool dequeue_zero_copy;
@@ -436,14 +458,12 @@ struct vhost_user {
 
 extern struct vhost_user vhost_user;
 
-int create_unix_socket(struct vhost_user_socket *vsocket);
 int vhost_user_start_server(struct vhost_user_socket *vsocket);
 int vhost_user_start_client(struct vhost_user_socket *vsocket);
 
 extern pthread_t reconn_tid;
 
 int vhost_user_reconnect_init(void);
-bool vhost_user_remove_reconnect(struct vhost_user_socket *vsocket);
 
 static __rte_always_inline bool
 vq_is_packed(struct virtio_net *dev)
-- 
2.7.4



* [dpdk-dev] [PATCH 05/28] vhost: move start server/client calls
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

From: Stefan Hajnoczi <stefanha@redhat.com>

Introduce a vhost_transport_ops->socket_start() interface so the
transport can begin establishing vhost-user connections.  This is part
of the AF_UNIX transport refactoring and removes AF_UNIX code from
vhost.h and socket.c.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/socket.c        |  5 +----
 lib/librte_vhost/trans_af_unix.c | 16 ++++++++++++++--
 lib/librte_vhost/vhost.h         | 16 +++++++++++++---
 3 files changed, 28 insertions(+), 9 deletions(-)
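
Note (illustration only, not part of the patch): the effect of the
socket_start() hook is that the server/client decision moves out of
the generic driver code and into the transport. A rough sketch with
hypothetical demo_* names, where the started_as field only exists so
the example has something observable:

```c
#include <stdbool.h>

struct demo_socket;

/* Hypothetical subset of the transport operations. */
struct demo_ops {
	int (*socket_start)(struct demo_socket *s);
};

enum demo_role { DEMO_NONE, DEMO_SERVER, DEMO_CLIENT };

struct demo_socket {
	const struct demo_ops *ops;
	bool is_server;
	enum demo_role started_as; /* bookkeeping for this sketch only */
};

/* Stand-in for bind()/listen() and registering the listen fd. */
static int
demo_start_server(struct demo_socket *s)
{
	s->started_as = DEMO_SERVER;
	return 0;
}

/* Stand-in for connect() and the optional reconnect queueing. */
static int
demo_start_client(struct demo_socket *s)
{
	s->started_as = DEMO_CLIENT;
	return 0;
}

/* Transport hook: the only place that knows about server vs. client. */
static int
demo_af_unix_start(struct demo_socket *s)
{
	if (s->is_server)
		return demo_start_server(s);
	return demo_start_client(s);
}

static const struct demo_ops demo_af_unix_ops = {
	.socket_start = demo_af_unix_start,
};

/* Generic driver entry point: no transport knowledge left here. */
static int
demo_driver_start(struct demo_socket *s)
{
	return s->ops->socket_start(s);
}
```

A transport without a client/server split would simply supply a
different socket_start() and demo_driver_start() would stay unchanged.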

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 3b5608c..df6d707 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -564,8 +564,5 @@ rte_vhost_driver_start(const char *path)
 		}
 	}
 
-	if (vsocket->is_server)
-		return vhost_user_start_server(vsocket);
-	else
-		return vhost_user_start_client(vsocket);
+	return vsocket->trans_ops->socket_start(vsocket);
 }
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index f23bb9c..93d11f7 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -22,6 +22,8 @@ struct af_unix_socket {
 };
 
 static int create_unix_socket(struct vhost_user_socket *vsocket);
+static int vhost_user_start_server(struct vhost_user_socket *vsocket);
+static int vhost_user_start_client(struct vhost_user_socket *vsocket);
 static void vhost_user_read_cb(int connfd, void *dat, int *remove);
 
 /*
@@ -280,7 +282,7 @@ create_unix_socket(struct vhost_user_socket *vsocket)
 	return 0;
 }
 
-int
+static int
 vhost_user_start_server(struct vhost_user_socket *vsocket)
 {
 	struct af_unix_socket *af_vsocket =
@@ -438,7 +440,7 @@ vhost_user_reconnect_init(void)
 	return ret;
 }
 
-int
+static int
 vhost_user_start_client(struct vhost_user_socket *vsocket)
 {
 	struct af_unix_socket *af_vsocket =
@@ -529,6 +531,15 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 }
 
 static int
+af_unix_socket_start(struct vhost_user_socket *vsocket)
+{
+	if (vsocket->is_server)
+		return vhost_user_start_server(vsocket);
+	else
+		return vhost_user_start_client(vsocket);
+}
+
+static int
 af_unix_vring_call(struct virtio_net *dev __rte_unused,
 		   struct vhost_virtqueue *vq)
 {
@@ -541,5 +552,6 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
 	.socket_init = af_unix_socket_init,
 	.socket_cleanup = af_unix_socket_cleanup,
+	.socket_start = af_unix_socket_start,
 	.vring_call = af_unix_vring_call,
 };
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 40b5c25..c74753b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -324,6 +324,19 @@ struct vhost_transport_ops {
 	void (*socket_cleanup)(struct vhost_user_socket *vsocket);
 
 	/**
+	 * Start establishing vhost-user connections.  This function is
+	 * asynchronous and connections may be established after it has
+	 * returned.  Call vhost_user_add_connection() to register new
+	 * connections.
+	 *
+	 * @param vsocket
+	 *  vhost socket
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*socket_start)(struct vhost_user_socket *vsocket);
+
+	/**
 	 * Notify the guest that used descriptors have been added to the vring.
 	 * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked
 	 * so this function just needs to perform the notification.
@@ -458,9 +471,6 @@ struct vhost_user {
 
 extern struct vhost_user vhost_user;
 
-int vhost_user_start_server(struct vhost_user_socket *vsocket);
-int vhost_user_start_client(struct vhost_user_socket *vsocket);
-
 extern pthread_t reconn_tid;
 
 int vhost_user_reconnect_init(void);
-- 
2.7.4



* [dpdk-dev] [PATCH 06/28] vhost: move vhost-user connection
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The AF_UNIX transport can accept multiple client connections on a server
socket.  Each connection instantiates a separate vhost-user device,
which is stored as a vhost_user_connection.  This behavior is specific
to AF_UNIX and other transports may not support N connections per
socket endpoint.

Move struct vhost_user_connection to trans_af_unix.c and
conn_list/conn_mutex into struct af_unix_socket.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/socket.c        | 54 +++---------------------------
 lib/librte_vhost/trans_af_unix.c | 72 ++++++++++++++++++++++++++++++++++++----
 lib/librte_vhost/vhost.h         | 19 ++---------
 3 files changed, 74 insertions(+), 71 deletions(-)
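
Note (illustration only, not part of the patch): once conn_list and
conn_mutex live in the transport-specific struct, the transport owns
both the list initialization and the locking. A minimal sketch of that
ownership, using hypothetical demo_* names and plain <sys/queue.h>
macros; it tracks only a connection fd, whereas the real
vhost_user_connection also carries the vid and a back-pointer to the
socket.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical per-connection record. */
struct demo_conn {
	int connfd;
	TAILQ_ENTRY(demo_conn) next;
};

TAILQ_HEAD(demo_conn_list, demo_conn);

/* Transport-private state: the list and its lock live together. */
struct demo_af_unix_socket {
	struct demo_conn_list conn_list;
	pthread_mutex_t conn_mutex;
};

/* Returns 0 on success or a positive error number, like pthread calls. */
static int
demo_socket_init(struct demo_af_unix_socket *s)
{
	TAILQ_INIT(&s->conn_list);
	return pthread_mutex_init(&s->conn_mutex, NULL);
}

/* Append one connection under the lock; returns 0 or -1 on ENOMEM. */
static int
demo_add_conn(struct demo_af_unix_socket *s, int connfd)
{
	struct demo_conn *c = malloc(sizeof(*c));

	if (c == NULL)
		return -1;
	c->connfd = connfd;

	pthread_mutex_lock(&s->conn_mutex);
	TAILQ_INSERT_TAIL(&s->conn_list, c, next);
	pthread_mutex_unlock(&s->conn_mutex);
	return 0;
}

/* Remove and free every connection; returns how many were dropped. */
static int
demo_drop_all(struct demo_af_unix_socket *s)
{
	struct demo_conn *c;
	int n = 0;

	pthread_mutex_lock(&s->conn_mutex);
	while ((c = TAILQ_FIRST(&s->conn_list)) != NULL) {
		TAILQ_REMOVE(&s->conn_list, c, next);
		free(c);
		n++;
	}
	pthread_mutex_unlock(&s->conn_mutex);
	return n;
}
```

A transport that supports only one connection per endpoint could drop
the list entirely, which is the flexibility this patch is after.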

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index df6d707..976343c 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -341,13 +341,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		vhost_user_socket_mem_free(vsocket);
 		goto out;
 	}
-	TAILQ_INIT(&vsocket->conn_list);
-	ret = pthread_mutex_init(&vsocket->conn_mutex, NULL);
-	if (ret) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"error: failed to init connection mutex\n");
-		goto out_free;
-	}
 	vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
 
 	/*
@@ -395,7 +388,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"Postcopy requested but not compiled\n");
 		ret = -1;
-		goto out_mutex;
+		goto out_free;
 #endif
 	}
 
@@ -403,14 +396,14 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
 		if (vsocket->reconnect && reconn_tid == 0) {
 			if (vhost_user_reconnect_init() != 0)
-				goto out_mutex;
+				goto out_free;
 		}
 	} else {
 		vsocket->is_server = true;
 	}
 	ret = trans_ops->socket_init(vsocket, flags);
 	if (ret < 0) {
-		goto out_mutex;
+		goto out_free;
 	}
 
 	vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket;
@@ -418,11 +411,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	pthread_mutex_unlock(&vhost_user.mutex);
 	return ret;
 
-out_mutex:
-	if (pthread_mutex_destroy(&vsocket->conn_mutex)) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"error: failed to destroy connection mutex\n");
-	}
 out_free:
 	vhost_user_socket_mem_free(vsocket);
 out:
@@ -439,51 +427,19 @@ rte_vhost_driver_unregister(const char *path)
 {
 	int i;
 	int count;
-	struct vhost_user_connection *conn, *next;
 
 	if (path == NULL)
 		return -1;
 
-again:
 	pthread_mutex_lock(&vhost_user.mutex);
 
 	for (i = 0; i < vhost_user.vsocket_cnt; i++) {
 		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
 
 		if (!strcmp(vsocket->path, path)) {
-			pthread_mutex_lock(&vsocket->conn_mutex);
-			for (conn = TAILQ_FIRST(&vsocket->conn_list);
-			     conn != NULL;
-			     conn = next) {
-				next = TAILQ_NEXT(conn, next);
-
-				/*
-				 * If r/wcb is executing, release the
-				 * conn_mutex lock, and try again since
-				 * the r/wcb may use the conn_mutex lock.
-				 */
-				if (fdset_try_del(&vhost_user.fdset,
-						  conn->connfd) == -1) {
-					pthread_mutex_unlock(
-							&vsocket->conn_mutex);
-					pthread_mutex_unlock(&vhost_user.mutex);
-					goto again;
-				}
-
-				RTE_LOG(INFO, VHOST_CONFIG,
-					"free connfd = %d for device '%s'\n",
-					conn->connfd, path);
-				close(conn->connfd);
-				vhost_destroy_device(conn->vid);
-				TAILQ_REMOVE(&vsocket->conn_list, conn, next);
-				free(conn);
-			}
-			pthread_mutex_unlock(&vsocket->conn_mutex);
-
 			vsocket->trans_ops->socket_cleanup(vsocket);
-
-			pthread_mutex_destroy(&vsocket->conn_mutex);
-			vhost_user_socket_mem_free(vsocket);
+			free(vsocket->path);
+			free(vsocket);
 
 			count = --vhost_user.vsocket_cnt;
 			vhost_user.vsockets[i] = vhost_user.vsockets[count];
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 93d11f7..58fc9e2 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -15,8 +15,20 @@
 
 #define MAX_VIRTIO_BACKLOG 128
 
+TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
+
+struct vhost_user_connection {
+	struct vhost_user_socket *vsocket;
+	int connfd;
+	int vid;
+
+	TAILQ_ENTRY(vhost_user_connection) next;
+};
+
 struct af_unix_socket {
 	struct vhost_user_socket socket; /* must be the first field! */
+	struct vhost_user_connection_list conn_list;
+	pthread_mutex_t conn_mutex;
 	int socket_fd;
 	struct sockaddr_un un;
 };
@@ -131,6 +143,8 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
 static void
 vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 {
+	struct af_unix_socket *af_vsocket =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int vid;
 	size_t size;
 	struct vhost_user_connection *conn;
@@ -188,9 +202,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 		goto err_cleanup;
 	}
 
-	pthread_mutex_lock(&vsocket->conn_mutex);
-	TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next);
-	pthread_mutex_unlock(&vsocket->conn_mutex);
+	pthread_mutex_lock(&af_vsocket->conn_mutex);
+	TAILQ_INSERT_TAIL(&af_vsocket->conn_list, conn, next);
+	pthread_mutex_unlock(&af_vsocket->conn_mutex);
 
 	fdset_pipe_notify(&vhost_user.fdset);
 	return;
@@ -221,6 +235,8 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 {
 	struct vhost_user_connection *conn = dat;
 	struct vhost_user_socket *vsocket = conn->vsocket;
+	struct af_unix_socket *af_vsocket =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
 
 	ret = vhost_user_msg_handler(conn->vid, connfd);
@@ -238,9 +254,9 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 
 		vhost_destroy_device(conn->vid);
 
-		pthread_mutex_lock(&vsocket->conn_mutex);
-		TAILQ_REMOVE(&vsocket->conn_list, conn, next);
-		pthread_mutex_unlock(&vsocket->conn_mutex);
+		pthread_mutex_lock(&af_vsocket->conn_mutex);
+		TAILQ_REMOVE(&af_vsocket->conn_list, conn, next);
+		pthread_mutex_unlock(&af_vsocket->conn_mutex);
 
 		free(conn);
 
@@ -512,6 +528,18 @@ static int
 af_unix_socket_init(struct vhost_user_socket *vsocket,
 		    uint64_t flags __rte_unused)
 {
+	struct af_unix_socket *af_vsocket =
+		container_of(vsocket, struct af_unix_socket, socket);
+	int ret;
+
+	TAILQ_INIT(&af_vsocket->conn_list);
+	ret = pthread_mutex_init(&af_vsocket->conn_mutex, NULL);
+	if (ret) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"error: failed to init connection mutex\n");
+		return -1;
+	}
+
 	return create_unix_socket(vsocket);
 }
 
@@ -520,6 +548,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 {
 	struct af_unix_socket *af_vsocket =
 		container_of(vsocket, struct af_unix_socket, socket);
+	struct vhost_user_connection *conn, *next;
 
 	if (vsocket->is_server) {
 		fdset_del(&vhost_user.fdset, af_vsocket->socket_fd);
@@ -528,6 +557,37 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 	} else if (vsocket->reconnect) {
 		vhost_user_remove_reconnect(vsocket);
 	}
+
+again:
+	pthread_mutex_lock(&af_vsocket->conn_mutex);
+	for (conn = TAILQ_FIRST(&af_vsocket->conn_list);
+	     conn != NULL;
+	     conn = next) {
+		next = TAILQ_NEXT(conn, next);
+
+		/*
+		 * If r/wcb is executing, release the
+		 * conn_mutex lock, and try again since
+		 * the r/wcb may use the conn_mutex lock.
+		 */
+		if (fdset_try_del(&vhost_user.fdset,
+				  conn->connfd) == -1) {
+			pthread_mutex_unlock(
+					&af_vsocket->conn_mutex);
+			goto again;
+		}
+
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"free connfd = %d for device '%s'\n",
+			conn->connfd, vsocket->path);
+		close(conn->connfd);
+		vhost_destroy_device(conn->vid);
+		TAILQ_REMOVE(&af_vsocket->conn_list, conn, next);
+		free(conn);
+	}
+	pthread_mutex_unlock(&af_vsocket->conn_mutex);
+
+	pthread_mutex_destroy(&af_vsocket->conn_mutex);
 }
 
 static int
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index c74753b..5c3987d 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -404,13 +404,10 @@ struct virtio_net {
 	struct rte_vhost_user_extern_ops extern_ops;
 } __rte_cache_aligned;
 
-/* The vhost_user, vhost_user_socket, vhost_user_connection, and reconnect
- * declarations are temporary measures for moving AF_UNIX code into
- * trans_af_unix.c.  They will be cleaned up as socket.c is untangled from
- * trans_af_unix.c.
+/* The vhost_user, vhost_user_socket, and reconnect declarations are temporary
+ * measures for moving AF_UNIX code into trans_af_unix.c.  They will be cleaned
+ * up as socket.c is untangled from trans_af_unix.c.
  */
-TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
-
 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
  * vhost_user_socket struct will be created.
@@ -421,8 +418,6 @@ TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
  * struct.
  */
 struct vhost_user_socket {
-	struct vhost_user_connection_list conn_list;
-	pthread_mutex_t conn_mutex;
 	char *path;
 	bool is_server;
 	bool reconnect;
@@ -453,14 +448,6 @@ struct vhost_user_socket {
 	struct vhost_transport_ops const *trans_ops;
 };
 
-struct vhost_user_connection {
-	struct vhost_user_socket *vsocket;
-	int connfd;
-	int vid;
-
-	TAILQ_ENTRY(vhost_user_connection) next;
-};
-
 #define MAX_VHOST_SOCKET 1024
 struct vhost_user {
 	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
-- 
2.7.4



* [dpdk-dev] [PATCH 07/28] vhost: move vhost-user reconnection
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (5 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 06/28] vhost: move vhost-user connection Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 08/28] vhost: move vhost-user fdset Nikos Dragazis
                   ` (22 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The socket reconnection code is highly specific to AF_UNIX, so move the
remaining pieces of it into trans_af_unix.c.
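
The effect of the move can be sketched as follows, with hypothetical
names (xport_socket_init, reconnect_init, reconnect_started): the
reconnect state becomes file-local and the transport's own init hook
starts it lazily, so generic code no longer needs to see it.  This only
illustrates the pattern; where the real code starts a thread, the
sketch just records the call.

```c
#include <assert.h>
#include <stdbool.h>

/* File-local: nothing outside this translation unit can reference these. */
static bool reconnect_started;
static int reconnect_init_calls;

/* Stand-in for starting the reconnect machinery; only records the call. */
static int
reconnect_init(void)
{
	reconnect_init_calls++;
	reconnect_started = true;
	return 0;
}

/* Transport init hook: starts reconnection support on first use only. */
static int
xport_socket_init(bool want_reconnect)
{
	if (want_reconnect && !reconnect_started) {
		if (reconnect_init() != 0)
			return -1;
	}
	return 0;
}
```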

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/socket.c        |  4 ----
 lib/librte_vhost/trans_af_unix.c |  9 +++++++--
 lib/librte_vhost/vhost.h         | 10 +++-------
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 976343c..373c01d 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -394,10 +394,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 
 	if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
 		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
-		if (vsocket->reconnect && reconn_tid == 0) {
-			if (vhost_user_reconnect_init() != 0)
-				goto out_free;
-		}
 	} else {
 		vsocket->is_server = true;
 	}
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 58fc9e2..00d5366 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -361,7 +361,7 @@ struct vhost_user_reconnect_list {
 };
 
 static struct vhost_user_reconnect_list reconn_list;
-pthread_t reconn_tid;
+static pthread_t reconn_tid;
 
 static int
 vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
@@ -431,7 +431,7 @@ vhost_user_client_reconnect(void *arg __rte_unused)
 	return NULL;
 }
 
-int
+static int
 vhost_user_reconnect_init(void)
 {
 	int ret;
@@ -532,6 +532,11 @@ af_unix_socket_init(struct vhost_user_socket *vsocket,
 		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
 
+	if (vsocket->reconnect && reconn_tid == 0) {
+		if (vhost_user_reconnect_init() != 0)
+			return -1;
+	}
+
 	TAILQ_INIT(&af_vsocket->conn_list);
 	ret = pthread_mutex_init(&af_vsocket->conn_mutex, NULL);
 	if (ret) {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5c3987d..d8b5ec2 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -404,9 +404,9 @@ struct virtio_net {
 	struct rte_vhost_user_extern_ops extern_ops;
 } __rte_cache_aligned;
 
-/* The vhost_user, vhost_user_socket, and reconnect declarations are temporary
- * measures for moving AF_UNIX code into trans_af_unix.c.  They will be cleaned
- * up as socket.c is untangled from trans_af_unix.c.
+/* The vhost_user and vhost_user_socket declarations are temporary measures for
+ * moving AF_UNIX code into trans_af_unix.c.  They will be cleaned up as
+ * socket.c is untangled from trans_af_unix.c.
  */
 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
@@ -458,10 +458,6 @@ struct vhost_user {
 
 extern struct vhost_user vhost_user;
 
-extern pthread_t reconn_tid;
-
-int vhost_user_reconnect_init(void);
-
 static __rte_always_inline bool
 vq_is_packed(struct virtio_net *dev)
 {
-- 
2.7.4



* [dpdk-dev] [PATCH 08/28] vhost: move vhost-user fdset
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (6 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 07/28] vhost: move vhost-user reconnection Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 09/28] vhost: propagate vhost transport operations Nikos Dragazis
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The fdset is used by the AF_UNIX transport code but other transports may
not need it.  Move it to trans_af_unix.c and then make struct vhost_user
private again since nothing outside socket.c needs it.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/socket.c        | 37 +++++++---------------------------
 lib/librte_vhost/trans_af_unix.c | 43 +++++++++++++++++++++++++++++++++++-----
 lib/librte_vhost/vhost.h         | 15 --------------
 3 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 373c01d..fc78b63 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -16,13 +16,14 @@
 #include "vhost.h"
 #include "vhost_user.h"
 
+#define MAX_VHOST_SOCKET 1024
+struct vhost_user {
+	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
+	int vsocket_cnt;
+	pthread_mutex_t mutex;
+};
+
 struct vhost_user vhost_user = {
-	.fdset = {
-		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
-		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
-		.fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
-		.num = 0
-	},
 	.vsocket_cnt = 0,
 	.mutex = PTHREAD_MUTEX_INITIALIZER,
 };
@@ -484,7 +485,6 @@ int
 rte_vhost_driver_start(const char *path)
 {
 	struct vhost_user_socket *vsocket;
-	static pthread_t fdset_tid;
 
 	pthread_mutex_lock(&vhost_user.mutex);
 	vsocket = find_vhost_user_socket(path);
@@ -493,28 +493,5 @@ rte_vhost_driver_start(const char *path)
 	if (!vsocket)
 		return -1;
 
-	if (fdset_tid == 0) {
-		/**
-		 * create a pipe which will be waited by poll and notified to
-		 * rebuild the wait list of poll.
-		 */
-		if (fdset_pipe_init(&vhost_user.fdset) < 0) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"failed to create pipe for vhost fdset\n");
-			return -1;
-		}
-
-		int ret = rte_ctrl_thread_create(&fdset_tid,
-			"vhost-events", NULL, fdset_event_dispatch,
-			&vhost_user.fdset);
-		if (ret != 0) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"failed to create fdset handling thread");
-
-			fdset_pipe_uninit(&vhost_user.fdset);
-			return -1;
-		}
-	}
-
 	return vsocket->trans_ops->socket_start(vsocket);
 }
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 00d5366..e8a4ef2 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -10,11 +10,19 @@
 
 #include <rte_log.h>
 
+#include "fd_man.h"
 #include "vhost.h"
 #include "vhost_user.h"
 
 #define MAX_VIRTIO_BACKLOG 128
 
+static struct fdset af_unix_fdset = {
+	.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
+	.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
+	.fd_pooling_mutex = PTHREAD_MUTEX_INITIALIZER,
+	.num = 0
+};
+
 TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
 
 struct vhost_user_connection {
@@ -189,7 +197,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	conn->connfd = fd;
 	conn->vsocket = vsocket;
 	conn->vid = vid;
-	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb,
+	ret = fdset_add(&af_unix_fdset, fd, vhost_user_read_cb,
 			NULL, conn);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
@@ -206,7 +214,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	TAILQ_INSERT_TAIL(&af_vsocket->conn_list, conn, next);
 	pthread_mutex_unlock(&af_vsocket->conn_mutex);
 
-	fdset_pipe_notify(&vhost_user.fdset);
+	fdset_pipe_notify(&af_unix_fdset);
 	return;
 
 err_cleanup:
@@ -330,7 +338,7 @@ vhost_user_start_server(struct vhost_user_socket *vsocket)
 	if (ret < 0)
 		goto err;
 
-	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
+	ret = fdset_add(&af_unix_fdset, fd, vhost_user_server_new_connection,
 		  NULL, vsocket);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
@@ -556,7 +564,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 	struct vhost_user_connection *conn, *next;
 
 	if (vsocket->is_server) {
-		fdset_del(&vhost_user.fdset, af_vsocket->socket_fd);
+		fdset_del(&af_unix_fdset, af_vsocket->socket_fd);
 		close(af_vsocket->socket_fd);
 		unlink(vsocket->path);
 	} else if (vsocket->reconnect) {
@@ -575,7 +583,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 		 * conn_mutex lock, and try again since
 		 * the r/wcb may use the conn_mutex lock.
 		 */
-		if (fdset_try_del(&vhost_user.fdset,
+		if (fdset_try_del(&af_unix_fdset,
 				  conn->connfd) == -1) {
 			pthread_mutex_unlock(
 					&af_vsocket->conn_mutex);
@@ -598,6 +606,31 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 static int
 af_unix_socket_start(struct vhost_user_socket *vsocket)
 {
+	static pthread_t fdset_tid;
+
+	if (fdset_tid == 0) {
+		/**
+		 * create a pipe which will be waited by poll and notified to
+		 * rebuild the wait list of poll.
+		 */
+		if (fdset_pipe_init(&af_unix_fdset) < 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to create pipe for vhost fdset\n");
+			return -1;
+		}
+
+		int ret = rte_ctrl_thread_create(&fdset_tid,
+			"vhost-events", NULL, fdset_event_dispatch,
+			&af_unix_fdset);
+		if (ret != 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to create fdset handling thread");
+
+			fdset_pipe_uninit(&af_unix_fdset);
+			return -1;
+		}
+	}
+
 	if (vsocket->is_server)
 		return vhost_user_start_server(vsocket);
 	else
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index d8b5ec2..64b7f77 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -22,7 +22,6 @@
 #include <rte_rwlock.h>
 #include <rte_malloc.h>
 
-#include "fd_man.h"
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
 
@@ -404,10 +403,6 @@ struct virtio_net {
 	struct rte_vhost_user_extern_ops extern_ops;
 } __rte_cache_aligned;
 
-/* The vhost_user and vhost_user_socket declarations are temporary measures for
- * moving AF_UNIX code into trans_af_unix.c.  They will be cleaned up as
- * socket.c is untangled from trans_af_unix.c.
- */
 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
  * vhost_user_socket struct will be created.
@@ -448,16 +443,6 @@ struct vhost_user_socket {
 	struct vhost_transport_ops const *trans_ops;
 };
 
-#define MAX_VHOST_SOCKET 1024
-struct vhost_user {
-	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
-	struct fdset fdset;
-	int vsocket_cnt;
-	pthread_mutex_t mutex;
-};
-
-extern struct vhost_user vhost_user;
-
 static __rte_always_inline bool
 vq_is_packed(struct virtio_net *dev)
 {
-- 
2.7.4



* [dpdk-dev] [PATCH 09/28] vhost: propagate vhost transport operations
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (7 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 08/28] vhost: move vhost-user fdset Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 10/28] vhost: use a single structure for the device state Nikos Dragazis
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

This patch propagates struct vhost_user_socket's vhost_transport_ops
into the newly created vhost device, instead of hardcoding
af_unix_trans_ops in vhost_new_device().

This patch completes the initial refactoring of socket.c, with the
AF_UNIX-specific code now in trans_af_unix.c and the librte_vhost API
entrypoints in socket.c.

Now it is time to turn towards vhost_user.c and its mixture of
vhost-user protocol processing and socket I/O.  The socket I/O will be
moved into trans_af_unix.c so that other transports can be added that
don't use file descriptors.
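
A minimal sketch of what propagating the operations means, using
hypothetical names (struct xport_ops, struct sock, struct device,
device_new): the constructor receives the ops table from the socket
that accepted the connection rather than hardcoding one transport, so
later calls on the device dispatch to the right transport.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct device;

/* Hypothetical, much-reduced transport operations table. */
struct xport_ops {
	const char *(*name)(const struct device *dev);
};

/* Per-device state: remembers which transport created it. */
struct device {
	const struct xport_ops *ops; /* set once at creation time */
};

/* Per-socket state: each socket is bound to one transport. */
struct sock {
	const struct xport_ops *ops;
};

static const char *
unix_name(const struct device *dev)
{
	(void)dev;
	return "af_unix";
}

static const char *
other_name(const struct device *dev)
{
	(void)dev;
	return "other";
}

static const struct xport_ops unix_ops = { .name = unix_name };
static const struct xport_ops other_ops = { .name = other_name };

/* The caller passes the ops in; the constructor no longer picks one. */
static struct device *
device_new(const struct xport_ops *ops)
{
	struct device *dev = calloc(1, sizeof(*dev));

	if (dev != NULL)
		dev->ops = ops;
	return dev;
}
```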

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/trans_af_unix.c | 2 +-
 lib/librte_vhost/vhost.c         | 4 ++--
 lib/librte_vhost/vhost.h         | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index e8a4ef2..865d862 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -167,7 +167,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 		return;
 	}
 
-	vid = vhost_new_device();
+	vid = vhost_new_device(vsocket->trans_ops);
 	if (vid == -1) {
 		goto err;
 	}
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index a36bc01..a72edf3 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -480,7 +480,7 @@ reset_device(struct virtio_net *dev)
  * there is a new virtio device being attached).
  */
 int
-vhost_new_device(void)
+vhost_new_device(const struct vhost_transport_ops *trans_ops)
 {
 	struct virtio_net *dev;
 	int i;
@@ -507,7 +507,7 @@ vhost_new_device(void)
 	dev->vid = i;
 	dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 	dev->slave_req_fd = -1;
-	dev->trans_ops = &af_unix_trans_ops;
+	dev->trans_ops = trans_ops;
 	dev->vdpa_dev_id = -1;
 	dev->postcopy_ufd = -1;
 	rte_spinlock_init(&dev->slave_req_lock);
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 64b7f77..0831b27 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -568,7 +568,7 @@ get_device(int vid)
 	return dev;
 }
 
-int vhost_new_device(void);
+int vhost_new_device(const struct vhost_transport_ops *trans_ops);
 void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
-- 
2.7.4



* [dpdk-dev] [PATCH 10/28] vhost: use a single structure for the device state
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (8 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 09/28] vhost: propagate vhost transport operations Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 11/28] vhost: extract socket I/O into transport Nikos Dragazis
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

There is a 1:1 relationship between struct virtio_net and struct
vhost_user_connection.  They share the same lifetime.  struct virtio_net
is the per-device state that is part of the vhost.h API.  struct
vhost_user_connection is the AF_UNIX-specific per-device state and is
private to trans_af_unix.c.  Transport code will need to convert
between these two structs.

This patch embeds struct virtio_net within struct vhost_user_connection
so that AF_UNIX transport code can convert a struct virtio_net pointer
into a struct vhost_user_connection pointer.  There is now just a single
malloc/free for both of these structs together.
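
The embedding can be sketched in isolation as below.  This is an
illustration under assumed names (struct device, struct unix_conn,
device_new, unix_conn_from_device, and a local container_of macro),
not the librte_vhost code: the ops table advertises the size of the
derived struct, the generic constructor allocates that many bytes, and
the transport converts the generic pointer back to its own struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in for a container_of helper (hypothetical, for this sketch). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct xport_ops {
	size_t device_size; /* size of the transport's device-derived struct */
};

/* Generic per-device state. */
struct device {
	int vid;
	const struct xport_ops *ops;
};

/* Transport-specific per-device state wrapping the generic one. */
struct unix_conn {
	struct device device; /* must be the first field */
	int connfd;
};

static const struct xport_ops unix_ops = {
	.device_size = sizeof(struct unix_conn),
};

/* One zeroed allocation covers both the generic and the derived struct. */
static struct device *
device_new(const struct xport_ops *ops, int vid)
{
	struct device *dev;

	assert(ops->device_size >= sizeof(struct device));
	dev = calloc(1, ops->device_size);
	if (dev == NULL)
		return NULL;
	dev->vid = vid;
	dev->ops = ops;
	return dev;
}

/* Transport code converts the generic pointer back to its own struct. */
static struct unix_conn *
unix_conn_from_device(struct device *dev)
{
	return container_of(dev, struct unix_conn, device);
}
```

Because the generic struct is the first field, both pointers refer to
the same allocation and a single free() of the device releases both.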

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/trans_af_unix.c | 60 +++++++++++++++-------------------------
 lib/librte_vhost/vhost.c         | 12 ++++----
 lib/librte_vhost/vhost.h         | 11 +++++++-
 3 files changed, 40 insertions(+), 43 deletions(-)

diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 865d862..7e119b4 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -26,9 +26,9 @@ static struct fdset af_unix_fdset = {
 TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
 
 struct vhost_user_connection {
+	struct virtio_net device; /* must be the first field! */
 	struct vhost_user_socket *vsocket;
 	int connfd;
-	int vid;
 
 	TAILQ_ENTRY(vhost_user_connection) next;
 };
@@ -153,7 +153,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 {
 	struct af_unix_socket *af_vsocket =
 		container_of(vsocket, struct af_unix_socket, socket);
-	int vid;
+	struct virtio_net *dev;
 	size_t size;
 	struct vhost_user_connection *conn;
 	int ret;
@@ -161,42 +161,37 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	if (vsocket == NULL)
 		return;
 
-	conn = malloc(sizeof(*conn));
-	if (conn == NULL) {
-		close(fd);
+	dev = vhost_new_device(vsocket->trans_ops);
+	if (!dev) {
 		return;
 	}
 
-	vid = vhost_new_device(vsocket->trans_ops);
-	if (vid == -1) {
-		goto err;
-	}
+	conn = container_of(dev, struct vhost_user_connection, device);
+	conn->connfd = fd;
+	conn->vsocket = vsocket;
 
 	size = strnlen(vsocket->path, PATH_MAX);
-	vhost_set_ifname(vid, vsocket->path, size);
+	vhost_set_ifname(dev->vid, vsocket->path, size);
 
-	vhost_set_builtin_virtio_net(vid, vsocket->use_builtin_virtio_net);
+	vhost_set_builtin_virtio_net(dev->vid, vsocket->use_builtin_virtio_net);
 
-	vhost_attach_vdpa_device(vid, vsocket->vdpa_dev_id);
+	vhost_attach_vdpa_device(dev->vid, vsocket->vdpa_dev_id);
 
 	if (vsocket->dequeue_zero_copy)
-		vhost_enable_dequeue_zero_copy(vid);
+		vhost_enable_dequeue_zero_copy(dev->vid);
 
-	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
+	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", dev->vid);
 
 	if (vsocket->notify_ops->new_connection) {
-		ret = vsocket->notify_ops->new_connection(vid);
+		ret = vsocket->notify_ops->new_connection(dev->vid);
 		if (ret < 0) {
 			RTE_LOG(ERR, VHOST_CONFIG,
 				"failed to add vhost user connection with fd %d\n",
 				fd);
-			goto err_cleanup;
+			goto err;
 		}
 	}
 
-	conn->connfd = fd;
-	conn->vsocket = vsocket;
-	conn->vid = vid;
 	ret = fdset_add(&af_unix_fdset, fd, vhost_user_read_cb,
 			NULL, conn);
 	if (ret < 0) {
@@ -205,9 +200,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 			fd);
 
 		if (vsocket->notify_ops->destroy_connection)
-			vsocket->notify_ops->destroy_connection(conn->vid);
+			vsocket->notify_ops->destroy_connection(dev->vid);
 
-		goto err_cleanup;
+		goto err;
 	}
 
 	pthread_mutex_lock(&af_vsocket->conn_mutex);
@@ -217,11 +212,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	fdset_pipe_notify(&af_unix_fdset);
 	return;
 
-err_cleanup:
-	vhost_destroy_device(vid);
 err:
-	free(conn);
-	close(fd);
+	close(conn->connfd);
+	vhost_destroy_device(dev->vid);
 }
 
 /* call back when there is new vhost-user connection from client  */
@@ -247,26 +240,19 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
 
-	ret = vhost_user_msg_handler(conn->vid, connfd);
+	ret = vhost_user_msg_handler(conn->device.vid, connfd);
 	if (ret < 0) {
-		struct virtio_net *dev = get_device(conn->vid);
-
 		close(connfd);
 		*remove = 1;
 
-		if (dev)
-			vhost_destroy_device_notify(dev);
-
 		if (vsocket->notify_ops->destroy_connection)
-			vsocket->notify_ops->destroy_connection(conn->vid);
-
-		vhost_destroy_device(conn->vid);
+			vsocket->notify_ops->destroy_connection(conn->device.vid);
 
 		pthread_mutex_lock(&af_vsocket->conn_mutex);
 		TAILQ_REMOVE(&af_vsocket->conn_list, conn, next);
 		pthread_mutex_unlock(&af_vsocket->conn_mutex);
 
-		free(conn);
+		vhost_destroy_device(conn->device.vid);
 
 		if (vsocket->reconnect) {
 			create_unix_socket(vsocket);
@@ -594,9 +580,8 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 			"free connfd = %d for device '%s'\n",
 			conn->connfd, vsocket->path);
 		close(conn->connfd);
-		vhost_destroy_device(conn->vid);
 		TAILQ_REMOVE(&af_vsocket->conn_list, conn, next);
-		free(conn);
+		vhost_destroy_device(conn->device.vid);
 	}
 	pthread_mutex_unlock(&af_vsocket->conn_mutex);
 
@@ -648,6 +633,7 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused,
 
 const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
+	.device_size = sizeof(struct vhost_user_connection),
 	.socket_init = af_unix_socket_init,
 	.socket_cleanup = af_unix_socket_cleanup,
 	.socket_start = af_unix_socket_start,
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index a72edf3..0fdc54f 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -7,6 +7,7 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <assert.h>
 #ifdef RTE_LIBRTE_VHOST_NUMA
 #include <numa.h>
 #include <numaif.h>
@@ -479,7 +480,7 @@ reset_device(struct virtio_net *dev)
  * Invoked when there is a new vhost-user connection established (when
  * there is a new virtio device being attached).
  */
-int
+struct virtio_net *
 vhost_new_device(const struct vhost_transport_ops *trans_ops)
 {
 	struct virtio_net *dev;
@@ -493,14 +494,15 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops)
 	if (i == MAX_VHOST_DEVICE) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"Failed to find a free slot for new device.\n");
-		return -1;
+		return NULL;
 	}
 
-	dev = rte_zmalloc(NULL, sizeof(struct virtio_net), 0);
+	assert(trans_ops->device_size >= sizeof(struct virtio_net));
+	dev = rte_zmalloc(NULL, trans_ops->device_size, 0);
 	if (dev == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"Failed to allocate memory for new dev.\n");
-		return -1;
+		return NULL;
 	}
 
 	vhost_devices[i] = dev;
@@ -512,7 +514,7 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops)
 	dev->postcopy_ufd = -1;
 	rte_spinlock_init(&dev->slave_req_lock);
 
-	return i;
+	return dev;
 }
 
 void
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 0831b27..b9e4df1 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -298,6 +298,9 @@ struct vhost_transport_ops {
 	/** Size of struct vhost_user_socket-derived per-socket state */
 	size_t socket_size;
 
+	/** Size of struct virtio_net-derived per-device state */
+	size_t device_size;
+
 	/**
 	 * Initialize a vhost-user socket that is being created by
 	 * rte_vhost_driver_register().  This function checks that the flags
@@ -356,6 +359,11 @@ extern const struct vhost_transport_ops af_unix_trans_ops;
 /**
  * Device structure contains all configuration information relating
  * to the device.
+ *
+ * Transport-specific per-device state can be kept by embedding this struct at
+ * the beginning of a transport-specific struct.  Set
+ * vhost_transport_ops->device_size to the size of the transport-specific
+ * struct.
  */
 struct virtio_net {
 	/* Frontend (QEMU) memory and memory region information */
@@ -568,7 +576,8 @@ get_device(int vid)
 	return dev;
 }
 
-int vhost_new_device(const struct vhost_transport_ops *trans_ops);
+struct virtio_net *
+vhost_new_device(const struct vhost_transport_ops *trans_ops);
 void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
-- 
2.7.4



* [dpdk-dev] [PATCH 11/28] vhost: extract socket I/O into transport
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (9 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 10/28] vhost: use a single structure for the device state Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 12/28] vhost: move slave request fd and lock Nikos Dragazis
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The core vhost-user protocol code should not do socket I/O, because the
details are transport-specific.  Move code to send and receive
vhost-user messages into trans_af_unix.c.
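
As a rough illustration of this split, the sketch below shows core code
that only prepares a reply and then defers the actual I/O to a
transport callback. All demo_* names and DEMO_REPLY_MASK are
hypothetical, simplified stand-ins and are not the real DPDK types; the
"send" merely records which fd would have been used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real DPDK structures. */
struct demo_msg {
	unsigned int flags;
	unsigned int size;
};

struct demo_dev;

struct demo_transport_ops {
	/* Send a reply to the master; 0 on success, -1 on failure. */
	int (*send_reply)(struct demo_dev *dev, struct demo_msg *reply);
};

struct demo_dev {
	const struct demo_transport_ops *trans_ops;
};

/* Transport-private state with the generic device embedded first. */
struct demo_conn {
	struct demo_dev device; /* must be the first field */
	int connfd;
	int last_sent_fd; /* records the fd a real send would use */
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_REPLY_MASK (1u << 2) /* assumed flag value, illustration only */

/* Transport side: recover the private state and perform the "I/O". */
static int
demo_af_unix_send_reply(struct demo_dev *dev, struct demo_msg *reply)
{
	struct demo_conn *conn =
		demo_container_of(dev, struct demo_conn, device);

	(void)reply;
	conn->last_sent_fd = conn->connfd;
	return 0;
}

static const struct demo_transport_ops demo_af_unix_ops = {
	.send_reply = demo_af_unix_send_reply,
};

/* Core side: mark the message as a reply, then call into the transport. */
static int
demo_send_vhost_reply(struct demo_dev *dev, struct demo_msg *msg)
{
	if (!msg)
		return 0;

	msg->flags |= DEMO_REPLY_MASK;
	return dev->trans_ops->send_reply(dev, msg);
}
```

The core function never sees a file descriptor; a transport without
sockets would only need to supply a different send_reply callback.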

The connection fd is a transport-specific feature. Therefore, it should
be, and eventually will be, removed from the core vhost-user code, that
is, from vhost_user_msg_handler() and the message handlers. We keep it
for now because vhost_user_set_mem_table() needs it. A later commit
refactors the map/unmap functionality, after which the connection fds
can be removed from the core vhost-user code.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/trans_af_unix.c | 70 +++++++++++++++++++++++++++++++++---
 lib/librte_vhost/vhost.h         | 26 ++++++++++++++
 lib/librte_vhost/vhost_user.c    | 78 ++++++++--------------------------------
 lib/librte_vhost/vhost_user.h    |  7 +---
 4 files changed, 108 insertions(+), 73 deletions(-)

diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 7e119b4..c0ba8df 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -50,7 +50,7 @@ static void vhost_user_read_cb(int connfd, void *dat, int *remove);
  * return bytes# of read on success or negative val on failure. Update fdnum
  * with number of fds read.
  */
-int
+static int
 read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
 		int *fd_num)
 {
@@ -101,8 +101,8 @@ read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
 	return ret;
 }
 
-int
-send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+static int
+send_fd_message(int sockfd, void *buf, int buflen, int *fds, int fd_num)
 {
 	struct iovec iov;
 	struct msghdr msgh;
@@ -148,6 +148,23 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
 	return ret;
 }
 
+static int
+af_unix_send_reply(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+
+	return send_fd_message(conn->connfd, msg,
+			       VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num);
+}
+
+static int
+af_unix_send_slave_req(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	return send_fd_message(dev->slave_req_fd, msg,
+			       VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num);
+}
+
 static void
 vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 {
@@ -231,6 +248,36 @@ vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
 	vhost_user_add_connection(fd, vsocket);
 }
 
+/* return bytes# of read on success or negative val on failure. */
+int
+read_vhost_message(int sockfd, struct VhostUserMsg *msg)
+{
+	int ret;
+
+	ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
+		msg->fds, VHOST_MEMORY_MAX_NREGIONS, &msg->fd_num);
+	if (ret <= 0)
+		return ret;
+
+	if (msg->size) {
+		if (msg->size > sizeof(msg->payload)) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"invalid msg size: %d\n", msg->size);
+			return -1;
+		}
+		ret = read(sockfd, &msg->payload, msg->size);
+		if (ret <= 0)
+			return ret;
+		if (ret != (int)msg->size) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"read control message failed\n");
+			return -1;
+		}
+	}
+
+	return ret;
+}
+
 static void
 vhost_user_read_cb(int connfd, void *dat, int *remove)
 {
@@ -238,10 +285,23 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 	struct vhost_user_socket *vsocket = conn->vsocket;
 	struct af_unix_socket *af_vsocket =
 		container_of(vsocket, struct af_unix_socket, socket);
+	struct VhostUserMsg msg;
 	int ret;
 
-	ret = vhost_user_msg_handler(conn->device.vid, connfd);
+	ret = read_vhost_message(connfd, &msg);
+	if (ret <= 0) {
+		if (ret < 0)
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"vhost read message failed\n");
+		else if (ret == 0)
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"vhost peer closed\n");
+		goto err;
+	}
+
+	ret = vhost_user_msg_handler(conn->device.vid, connfd, &msg);
 	if (ret < 0) {
+err:
 		close(connfd);
 		*remove = 1;
 
@@ -638,4 +698,6 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_cleanup = af_unix_socket_cleanup,
 	.socket_start = af_unix_socket_start,
 	.vring_call = af_unix_vring_call,
+	.send_reply = af_unix_send_reply,
+	.send_slave_req = af_unix_send_slave_req,
 };
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index b9e4df1..b20773c 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -290,6 +290,7 @@ struct guest_page {
 
 struct virtio_net;
 struct vhost_user_socket;
+struct VhostUserMsg;
 
 /**
  * A structure containing function pointers for transport-specific operations.
@@ -351,6 +352,31 @@ struct vhost_transport_ops {
 	 *  0 on success, -1 on failure
 	 */
 	int (*vring_call)(struct virtio_net *dev, struct vhost_virtqueue *vq);
+
+	/**
+	 * Send a reply to the master.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param reply
+	 *  reply message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*send_reply)(struct virtio_net *dev, struct VhostUserMsg *reply);
+
+	/**
+	 * Send a slave request to the master.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param req
+	 *  request message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*send_slave_req)(struct virtio_net *dev,
+			      struct VhostUserMsg *req);
 };
 
 /** The traditional AF_UNIX vhost-user protocol transport. */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index c9e29ec..5c12435 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -80,8 +80,8 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_POSTCOPY_END]  = "VHOST_USER_POSTCOPY_END",
 };
 
-static int send_vhost_reply(int sockfd, struct VhostUserMsg *msg);
-static int read_vhost_message(int sockfd, struct VhostUserMsg *msg);
+static int send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg);
+int read_vhost_message(int sockfd, struct VhostUserMsg *msg);
 
 static uint64_t
 get_blk_size(int fd)
@@ -1042,7 +1042,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 	if (dev->postcopy_listening) {
 		/* Send the addresses back to qemu */
 		msg->fd_num = 0;
-		send_vhost_reply(main_fd, msg);
+		send_vhost_reply(dev, msg);
 
 		/* Wait for qemu to acknolwedge it's got the addresses
 		 * we've got to wait before we're allowed to generate faults.
@@ -1764,49 +1764,8 @@ static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
 	[VHOST_USER_POSTCOPY_END] = vhost_user_postcopy_end,
 };
 
-
-/* return bytes# of read on success or negative val on failure. */
 static int
-read_vhost_message(int sockfd, struct VhostUserMsg *msg)
-{
-	int ret;
-
-	ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
-		msg->fds, VHOST_MEMORY_MAX_NREGIONS, &msg->fd_num);
-	if (ret <= 0)
-		return ret;
-
-	if (msg->size) {
-		if (msg->size > sizeof(msg->payload)) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"invalid msg size: %d\n", msg->size);
-			return -1;
-		}
-		ret = read(sockfd, &msg->payload, msg->size);
-		if (ret <= 0)
-			return ret;
-		if (ret != (int)msg->size) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"read control message failed\n");
-			return -1;
-		}
-	}
-
-	return ret;
-}
-
-static int
-send_vhost_message(int sockfd, struct VhostUserMsg *msg)
-{
-	if (!msg)
-		return 0;
-
-	return send_fd_message(sockfd, (char *)msg,
-		VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num);
-}
-
-static int
-send_vhost_reply(int sockfd, struct VhostUserMsg *msg)
+send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg)
 {
 	if (!msg)
 		return 0;
@@ -1816,7 +1775,7 @@ send_vhost_reply(int sockfd, struct VhostUserMsg *msg)
 	msg->flags |= VHOST_USER_VERSION;
 	msg->flags |= VHOST_USER_REPLY_MASK;
 
-	return send_vhost_message(sockfd, msg);
+	return dev->trans_ops->send_reply(dev, msg);
 }
 
 static int
@@ -1827,7 +1786,7 @@ send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg)
 	if (msg->flags & VHOST_USER_NEED_REPLY)
 		rte_spinlock_lock(&dev->slave_req_lock);
 
-	ret = send_vhost_message(dev->slave_req_fd, msg);
+	ret = dev->trans_ops->send_slave_req(dev, msg);
 	if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY))
 		rte_spinlock_unlock(&dev->slave_req_lock);
 
@@ -1908,10 +1867,10 @@ vhost_user_unlock_all_queue_pairs(struct virtio_net *dev)
 }
 
 int
-vhost_user_msg_handler(int vid, int fd)
+vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg_)
 {
+	struct VhostUserMsg msg = *msg_; /* copy so we can build the reply */
 	struct virtio_net *dev;
-	struct VhostUserMsg msg;
 	struct rte_vdpa_device *vdpa_dev;
 	int did = -1;
 	int ret;
@@ -1933,15 +1892,8 @@ vhost_user_msg_handler(int vid, int fd)
 		}
 	}
 
-	ret = read_vhost_message(fd, &msg);
-	if (ret <= 0) {
-		if (ret < 0)
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"vhost read message failed\n");
-		else
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"vhost peer closed\n");
-
+	if (msg.request.master >= VHOST_USER_MAX) {
+		RTE_LOG(ERR, VHOST_CONFIG, "vhost read incorrect message\n");
 		return -1;
 	}
 
@@ -2004,7 +1956,7 @@ vhost_user_msg_handler(int vid, int fd)
 				(void *)&msg);
 		switch (ret) {
 		case RTE_VHOST_MSG_RESULT_REPLY:
-			send_vhost_reply(fd, &msg);
+			send_vhost_reply(dev, &msg);
 			/* Fall-through */
 		case RTE_VHOST_MSG_RESULT_ERR:
 		case RTE_VHOST_MSG_RESULT_OK:
@@ -2038,7 +1990,7 @@ vhost_user_msg_handler(int vid, int fd)
 			RTE_LOG(DEBUG, VHOST_CONFIG,
 				"Processing %s succeeded and needs reply.\n",
 				vhost_message_str[request]);
-			send_vhost_reply(fd, &msg);
+			send_vhost_reply(dev, &msg);
 			handled = true;
 			break;
 		default:
@@ -2053,7 +2005,7 @@ vhost_user_msg_handler(int vid, int fd)
 				(void *)&msg);
 		switch (ret) {
 		case RTE_VHOST_MSG_RESULT_REPLY:
-			send_vhost_reply(fd, &msg);
+			send_vhost_reply(dev, &msg);
 			/* Fall-through */
 		case RTE_VHOST_MSG_RESULT_ERR:
 		case RTE_VHOST_MSG_RESULT_OK:
@@ -2083,7 +2035,7 @@ vhost_user_msg_handler(int vid, int fd)
 		msg.payload.u64 = ret == RTE_VHOST_MSG_RESULT_ERR;
 		msg.size = sizeof(msg.payload.u64);
 		msg.fd_num = 0;
-		send_vhost_reply(fd, &msg);
+		send_vhost_reply(dev, &msg);
 	} else if (ret == RTE_VHOST_MSG_RESULT_ERR) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"vhost message handling failed.\n");
@@ -2161,7 +2113,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 		},
 	};
 
-	ret = send_vhost_message(dev->slave_req_fd, &msg);
+	ret = send_vhost_slave_req(dev, &msg);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 				"Failed to send IOTLB miss message (%d)\n",
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 2a650fe..0169bd2 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -146,12 +146,7 @@ typedef struct VhostUserMsg {
 
 
 /* vhost_user.c */
-int vhost_user_msg_handler(int vid, int fd);
+int vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg);
 int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
-/* socket.c */
-int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int max_fds,
-		int *fd_num);
-int send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
-
 #endif
-- 
2.7.4



* [dpdk-dev] [PATCH 12/28] vhost: move slave request fd and lock
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (10 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 11/28] vhost: extract socket I/O into transport Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 13/28] vhost: move mmap/munmap Nikos Dragazis
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The slave request file descriptor is specific to the AF_UNIX transport.
Move this field, along with its spinlock, out of struct virtio_net and
into struct vhost_user_connection, which is private to trans_af_unix.c.
As a result, the associated functions send_vhost_slave_message() and
process_slave_message_reply() also move out of vhost_user.c and into
trans_af_unix.c, and the spinlock initialization moves out of
vhost_new_device() and into vhost_user_add_connection().

This change will allow future transports to implement the slave request
fd without relying on socket I/O.
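
The locking protocol that moves with these functions can be sketched as
follows. This is a minimal model, not the DPDK code: a bool stands in
for rte_spinlock_t, the send result and the master's reply payload are
injected through struct fields, and all demo_* names and
DEMO_NEED_REPLY are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NEED_REPLY (1u << 3) /* assumed flag value, illustration only */

struct demo_slave_chan {
	bool locked;                      /* stand-in for rte_spinlock_t */
	int send_result;                  /* injected result of the send */
	unsigned long long reply_payload; /* injected reply; non-zero = error */
};

static void demo_lock(struct demo_slave_chan *c)   { c->locked = true; }
static void demo_unlock(struct demo_slave_chan *c) { c->locked = false; }

/*
 * Take the lock only when a reply is expected, and drop it again
 * right away if the send itself fails.
 */
static int
demo_send_slave_req(struct demo_slave_chan *c, unsigned int flags)
{
	int ret;

	if (flags & DEMO_NEED_REPLY)
		demo_lock(c);

	ret = c->send_result;

	if (ret < 0 && (flags & DEMO_NEED_REPLY))
		demo_unlock(c);

	return ret;
}

/* Consume the reply and release the lock taken by the send. */
static int
demo_process_slave_reply(struct demo_slave_chan *c, unsigned int flags)
{
	int ret;

	if ((flags & DEMO_NEED_REPLY) == 0)
		return 0;

	ret = c->reply_payload ? -1 : 0;
	demo_unlock(c);
	return ret;
}
```

Because the lock is held from a successful send until the reply has
been processed, both halves have to live in the same place, which is
why they move into the transport together.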

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/trans_af_unix.c | 87 +++++++++++++++++++++++++++++++++++++++-
 lib/librte_vhost/vhost.c         |  4 +-
 lib/librte_vhost/vhost.h         | 41 +++++++++++++++++--
 lib/librte_vhost/vhost_user.c    | 67 ++++---------------------------
 4 files changed, 132 insertions(+), 67 deletions(-)

diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index c0ba8df..5f9ef5a 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -29,6 +29,8 @@ struct vhost_user_connection {
 	struct virtio_net device; /* must be the first field! */
 	struct vhost_user_socket *vsocket;
 	int connfd;
+	int slave_req_fd;
+	rte_spinlock_t slave_req_lock;
 
 	TAILQ_ENTRY(vhost_user_connection) next;
 };
@@ -41,6 +43,7 @@ struct af_unix_socket {
 	struct sockaddr_un un;
 };
 
+int read_vhost_message(int sockfd, struct VhostUserMsg *msg);
 static int create_unix_socket(struct vhost_user_socket *vsocket);
 static int vhost_user_start_server(struct vhost_user_socket *vsocket);
 static int vhost_user_start_client(struct vhost_user_socket *vsocket);
@@ -161,8 +164,71 @@ af_unix_send_reply(struct virtio_net *dev, struct VhostUserMsg *msg)
 static int
 af_unix_send_slave_req(struct virtio_net *dev, struct VhostUserMsg *msg)
 {
-	return send_fd_message(dev->slave_req_fd, msg,
-			       VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num);
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+	int ret;
+
+	if (msg->flags & VHOST_USER_NEED_REPLY)
+		rte_spinlock_lock(&conn->slave_req_lock);
+
+	ret = send_fd_message(conn->slave_req_fd, msg,
+			VHOST_USER_HDR_SIZE + msg->size, msg->fds, msg->fd_num);
+
+	if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY))
+		rte_spinlock_unlock(&conn->slave_req_lock);
+
+	return ret;
+}
+
+static int
+af_unix_process_slave_message_reply(struct virtio_net *dev,
+				    const struct VhostUserMsg *msg)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+	struct VhostUserMsg msg_reply;
+	int ret;
+
+	if ((msg->flags & VHOST_USER_NEED_REPLY) == 0)
+		return 0;
+
+	if (read_vhost_message(conn->slave_req_fd, &msg_reply) < 0) {
+		ret = -1;
+		goto out;
+	}
+
+	if (msg_reply.request.slave != msg->request.slave) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Received unexpected msg type (%u), expected %u\n",
+			msg_reply.request.slave, msg->request.slave);
+		ret = -1;
+		goto out;
+	}
+
+	ret = msg_reply.payload.u64 ? -1 : 0;
+
+out:
+	rte_spinlock_unlock(&conn->slave_req_lock);
+	return ret;
+}
+
+static int
+af_unix_set_slave_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+	int fd = msg->fds[0];
+
+	if (fd < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Invalid file descriptor for slave channel (%d)\n",
+				fd);
+		return -1;
+	}
+
+	conn->slave_req_fd = fd;
+
+	return 0;
 }
 
 static void
@@ -185,7 +251,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 
 	conn = container_of(dev, struct vhost_user_connection, device);
 	conn->connfd = fd;
+	conn->slave_req_fd = -1;
 	conn->vsocket = vsocket;
+	rte_spinlock_init(&conn->slave_req_lock);
 
 	size = strnlen(vsocket->path, PATH_MAX);
 	vhost_set_ifname(dev->vid, vsocket->path, size);
@@ -682,6 +750,18 @@ af_unix_socket_start(struct vhost_user_socket *vsocket)
 		return vhost_user_start_client(vsocket);
 }
 
+static void
+af_unix_cleanup_device(struct virtio_net *dev, int destroy __rte_unused)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+
+	if (conn->slave_req_fd >= 0) {
+		close(conn->slave_req_fd);
+		conn->slave_req_fd = -1;
+	}
+}
+
 static int
 af_unix_vring_call(struct virtio_net *dev __rte_unused,
 		   struct vhost_virtqueue *vq)
@@ -697,7 +777,10 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_init = af_unix_socket_init,
 	.socket_cleanup = af_unix_socket_cleanup,
 	.socket_start = af_unix_socket_start,
+	.cleanup_device = af_unix_cleanup_device,
 	.vring_call = af_unix_vring_call,
 	.send_reply = af_unix_send_reply,
 	.send_slave_req = af_unix_send_slave_req,
+	.process_slave_message_reply = af_unix_process_slave_message_reply,
+	.set_slave_req_fd = af_unix_set_slave_req_fd,
 };
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 0fdc54f..5b16390 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -256,6 +256,8 @@ cleanup_device(struct virtio_net *dev, int destroy)
 
 	for (i = 0; i < dev->nr_vring; i++)
 		cleanup_vq(dev->virtqueue[i], destroy);
+
+	dev->trans_ops->cleanup_device(dev, destroy);
 }
 
 void
@@ -508,11 +510,9 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops)
 	vhost_devices[i] = dev;
 	dev->vid = i;
 	dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
-	dev->slave_req_fd = -1;
 	dev->trans_ops = trans_ops;
 	dev->vdpa_dev_id = -1;
 	dev->postcopy_ufd = -1;
-	rte_spinlock_init(&dev->slave_req_lock);
 
 	return dev;
 }
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index b20773c..2213fbe 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -340,6 +340,16 @@ struct vhost_transport_ops {
 	int (*socket_start)(struct vhost_user_socket *vsocket);
 
 	/**
+	* Free resources associated with this device.
+	*
+	* @param dev
+	*  vhost device
+	* @param destroy
+	*  0 on device reset, 1 on full cleanup.
+	*/
+	void (*cleanup_device)(struct virtio_net *dev, int destroy);
+
+	/**
 	 * Notify the guest that used descriptors have been added to the vring.
 	 * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked
 	 * so this function just needs to perform the notification.
@@ -377,6 +387,34 @@ struct vhost_transport_ops {
 	 */
 	int (*send_slave_req)(struct virtio_net *dev,
 			      struct VhostUserMsg *req);
+
+	/**
+	 * Process the master's reply on a slave request.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param msg
+	 *  slave request message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*process_slave_message_reply)(struct virtio_net *dev,
+					   const struct VhostUserMsg *msg);
+
+	/**
+	 * Process VHOST_USER_SET_SLAVE_REQ_FD message.  After this function
+	 * succeeds send_slave_req() may be called to submit requests to the
+	 * master.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param msg
+	 *  message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*set_slave_req_fd)(struct virtio_net *dev,
+				struct VhostUserMsg *msg);
 };
 
 /** The traditional AF_UNIX vhost-user protocol transport. */
@@ -419,9 +457,6 @@ struct virtio_net {
 	uint32_t		max_guest_pages;
 	struct guest_page       *guest_pages;
 
-	int			slave_req_fd;
-	rte_spinlock_t		slave_req_lock;
-
 	int			postcopy_ufd;
 	int			postcopy_listening;
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 5c12435..a4dcba1 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -160,11 +160,6 @@ vhost_backend_cleanup(struct virtio_net *dev)
 		dev->log_addr = 0;
 	}
 
-	if (dev->slave_req_fd >= 0) {
-		close(dev->slave_req_fd);
-		dev->slave_req_fd = -1;
-	}
-
 	if (dev->postcopy_ufd >= 0) {
 		close(dev->postcopy_ufd);
 		dev->postcopy_ufd = -1;
@@ -1549,17 +1544,13 @@ static int
 vhost_user_set_req_fd(struct virtio_net **pdev, struct VhostUserMsg *msg,
 			int main_fd __rte_unused)
 {
+	int ret;
 	struct virtio_net *dev = *pdev;
-	int fd = msg->fds[0];
 
-	if (fd < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-				"Invalid file descriptor for slave channel (%d)\n",
-				fd);
-		return RTE_VHOST_MSG_RESULT_ERR;
-	}
+	ret = dev->trans_ops->set_slave_req_fd(dev, msg);
 
-	dev->slave_req_fd = fd;
+	if (ret < 0)
+		return RTE_VHOST_MSG_RESULT_ERR;
 
 	return RTE_VHOST_MSG_RESULT_OK;
 }
@@ -1778,21 +1769,6 @@ send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg)
 	return dev->trans_ops->send_reply(dev, msg);
 }
 
-static int
-send_vhost_slave_message(struct virtio_net *dev, struct VhostUserMsg *msg)
-{
-	int ret;
-
-	if (msg->flags & VHOST_USER_NEED_REPLY)
-		rte_spinlock_lock(&dev->slave_req_lock);
-
-	ret = dev->trans_ops->send_slave_req(dev, msg);
-	if (ret < 0 && (msg->flags & VHOST_USER_NEED_REPLY))
-		rte_spinlock_unlock(&dev->slave_req_lock);
-
-	return ret;
-}
-
 /*
  * Allocate a queue pair if it hasn't been allocated yet
  */
@@ -2069,35 +2045,6 @@ vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg_)
 	return 0;
 }
 
-static int process_slave_message_reply(struct virtio_net *dev,
-				       const struct VhostUserMsg *msg)
-{
-	struct VhostUserMsg msg_reply;
-	int ret;
-
-	if ((msg->flags & VHOST_USER_NEED_REPLY) == 0)
-		return 0;
-
-	if (read_vhost_message(dev->slave_req_fd, &msg_reply) < 0) {
-		ret = -1;
-		goto out;
-	}
-
-	if (msg_reply.request.slave != msg->request.slave) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"Received unexpected msg type (%u), expected %u\n",
-			msg_reply.request.slave, msg->request.slave);
-		ret = -1;
-		goto out;
-	}
-
-	ret = msg_reply.payload.u64 ? -1 : 0;
-
-out:
-	rte_spinlock_unlock(&dev->slave_req_lock);
-	return ret;
-}
-
 int
 vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 {
@@ -2113,7 +2060,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 		},
 	};
 
-	ret = send_vhost_slave_req(dev, &msg);
+	ret = dev->trans_ops->send_slave_req(dev, &msg);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 				"Failed to send IOTLB miss message (%d)\n",
@@ -2148,14 +2095,14 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 		msg.fd_num = 1;
 	}
 
-	ret = send_vhost_slave_message(dev, &msg);
+	ret = dev->trans_ops->send_slave_req(dev, &msg);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"Failed to set host notifier (%d)\n", ret);
 		return ret;
 	}
 
-	return process_slave_message_reply(dev, &msg);
+	return dev->trans_ops->process_slave_message_reply(dev, &msg);
 }
 
 int rte_vhost_host_notifier_ctrl(int vid, bool enable)
-- 
2.7.4



* [dpdk-dev] [PATCH 13/28] vhost: move mmap/munmap
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (11 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 12/28] vhost: move slave request fd and lock Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 14/28] vhost: move setup of the log memory region Nikos Dragazis
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Mapping the vhost memory regions is a transport-specific operation, so
this patch moves the relevant code into trans_af_unix.c.  The new
.map_mem_regions()/.unmap_mem_regions() interfaces allow transports to
perform the mapping and unmapping.
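
The per-region address arithmetic that the AF_UNIX mapping code performs
can be sketched as below. This is only a model under stated
assumptions: no mmap() call is made, the base address is passed in by
the caller, the block size is assumed to be a power of two, and the
demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round v up to a multiple of align. This helper assumes align is a
 * power of two; it is written in the spirit of RTE_ALIGN_CEIL.
 */
static uint64_t
demo_align_ceil(uint64_t v, uint64_t align)
{
	return (v + align - 1) & ~(align - 1);
}

struct demo_region {
	uint64_t size;           /* size of the guest memory region */
	uint64_t mmap_size;      /* region size plus offset into the file */
	uint64_t mmap_addr;      /* start of the mapping */
	uint64_t host_user_addr; /* start of the region inside the mapping */
};

/*
 * Compute the mapping layout for one region, given the address at
 * which the mapping was placed and the block size of the backing file.
 */
static void
demo_place_region(struct demo_region *reg, uint64_t base, uint64_t blk_size)
{
	/* The offset must be taken before the size is rounded up. */
	uint64_t mmap_offset = reg->mmap_size - reg->size;

	reg->mmap_size = demo_align_ceil(reg->mmap_size, blk_size);
	reg->mmap_addr = base;
	reg->host_user_addr = base + mmap_offset;
}
```

For example, a 0x1000-byte region at file offset 0x800 on 2 MiB
hugepages yields a 0x200000-byte mapping, with the region starting
0x800 bytes into it.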

In addition, the function vhost_user_set_mem_table(), which performs the
mapping, contains some code for postcopy live migration. However,
postcopy live migration is bound to the AF_UNIX transport because it
relies on the userfaultfd mechanism.  The virtio-vhost-user transport,
which will be added in later patches, cannot support it. Therefore, we
move this code into trans_af_unix.c as well.

The vhost_user_set_mem_table() debug logs have also been moved into
.map_mem_regions(). Every new .map_mem_regions() implementation has to
emit these debug logs itself. This is necessary in order to preserve the
ordering of the log messages in case of postcopy live migration.

Last but not least, after refactoring vhost_user_set_mem_table(),
read_vhost_message() is no longer being used in vhost_user.c. So, mark
it as static in trans_af_unix.c.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/trans_af_unix.c | 185 ++++++++++++++++++++++++++++++++++++++-
 lib/librte_vhost/vhost.h         |  22 +++++
 lib/librte_vhost/vhost_user.c    | 171 ++++--------------------------------
 lib/librte_vhost/vhost_user.h    |   3 +
 4 files changed, 225 insertions(+), 156 deletions(-)

diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 5f9ef5a..522823f 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -5,7 +5,14 @@
  */
 
 #include <sys/socket.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
 #include <sys/un.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+#include <linux/userfaultfd.h>
+#endif
 #include <fcntl.h>
 
 #include <rte_log.h>
@@ -43,7 +50,7 @@ struct af_unix_socket {
 	struct sockaddr_un un;
 };
 
-int read_vhost_message(int sockfd, struct VhostUserMsg *msg);
+static int read_vhost_message(int sockfd, struct VhostUserMsg *msg);
 static int create_unix_socket(struct vhost_user_socket *vsocket);
 static int vhost_user_start_server(struct vhost_user_socket *vsocket);
 static int vhost_user_start_client(struct vhost_user_socket *vsocket);
@@ -317,7 +324,7 @@ vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
 }
 
 /* return bytes# of read on success or negative val on failure. */
-int
+static int
 read_vhost_message(int sockfd, struct VhostUserMsg *msg)
 {
 	int ret;
@@ -771,6 +778,178 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused,
 	return 0;
 }
 
+static uint64_t
+get_blk_size(int fd)
+{
+	struct stat stat;
+	int ret;
+
+	ret = fstat(fd, &stat);
+	return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
+}
+
+static int
+af_unix_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	uint32_t i;
+	struct VhostUserMemory *memory = &msg->payload.memory;
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		struct rte_vhost_mem_region *reg = &dev->mem->regions[i];
+		uint64_t mmap_size = reg->mmap_size;
+		uint64_t mmap_offset = mmap_size - reg->size;
+		uint64_t alignment;
+		void *mmap_addr;
+		int populate;
+
+		/* mmap() without flag of MAP_ANONYMOUS, should be called
+		 * with length argument aligned with hugepagesz at older
+		 * longterm version Linux, like 2.6.32 and 3.2.72, or
+		 * mmap() will fail with EINVAL.
+		 *
+		 * to avoid failure, make sure in caller to keep length
+		 * aligned.
+		 */
+		alignment = get_blk_size(reg->fd);
+		if (alignment == (uint64_t)-1) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"couldn't get hugepage size through fstat\n");
+			return -1;
+		}
+		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
+
+		populate = (dev->dequeue_zero_copy) ? MAP_POPULATE : 0;
+		mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
+				 MAP_SHARED | populate, reg->fd, 0);
+
+		if (mmap_addr == MAP_FAILED) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"mmap region %u failed.\n", i);
+			return -1;
+		}
+
+		reg->mmap_addr = mmap_addr;
+		reg->mmap_size = mmap_size;
+		reg->host_user_addr = (uint64_t)(uintptr_t)reg->mmap_addr +
+				      mmap_offset;
+
+		if (dev->dequeue_zero_copy)
+			if (add_guest_pages(dev, reg, alignment) < 0) {
+				RTE_LOG(ERR, VHOST_CONFIG,
+					"adding guest pages to region %u failed.\n",
+					i);
+				return -1;
+			}
+
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"guest memory region %u, size: 0x%" PRIx64 "\n"
+			"\t guest physical addr: 0x%" PRIx64 "\n"
+			"\t guest virtual  addr: 0x%" PRIx64 "\n"
+			"\t host  virtual  addr: 0x%" PRIx64 "\n"
+			"\t mmap addr : 0x%" PRIx64 "\n"
+			"\t mmap size : 0x%" PRIx64 "\n"
+			"\t mmap align: 0x%" PRIx64 "\n"
+			"\t mmap off  : 0x%" PRIx64 "\n",
+			i, reg->size,
+			reg->guest_phys_addr,
+			reg->guest_user_addr,
+			reg->host_user_addr,
+			(uint64_t)(uintptr_t)reg->mmap_addr,
+			reg->mmap_size,
+			alignment,
+			mmap_offset);
+
+		if (dev->postcopy_listening) {
+			/*
+			 * We haven't a better way right now than sharing
+			 * DPDK's virtual address with Qemu, so that Qemu can
+			 * retrieve the region offset when handling userfaults.
+			 */
+			memory->regions[i].userspace_addr =
+				reg->host_user_addr;
+		}
+	}
+
+	if (dev->postcopy_listening) {
+		/* Send the addresses back to qemu */
+		msg->fd_num = 0;
+		/* Send reply */
+		msg->flags &= ~VHOST_USER_VERSION_MASK;
+		msg->flags &= ~VHOST_USER_NEED_REPLY;
+		msg->flags |= VHOST_USER_VERSION;
+		msg->flags |= VHOST_USER_REPLY_MASK;
+		af_unix_send_reply(dev, msg);
+
+		/* Wait for qemu to acknolwedge it's got the addresses
+		 * we've got to wait before we're allowed to generate faults.
+		 */
+		VhostUserMsg ack_msg;
+		if (read_vhost_message(conn->connfd, &ack_msg) <= 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed to read qemu ack on postcopy set-mem-table\n");
+			return -1;
+		}
+		if (ack_msg.request.master != VHOST_USER_SET_MEM_TABLE) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"Bad qemu ack on postcopy set-mem-table (%d)\n",
+				ack_msg.request.master);
+			return -1;
+		}
+
+		/* Now userfault register and we can use the memory */
+		for (i = 0; i < memory->nregions; i++) {
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+			struct rte_vhost_mem_region *reg = &dev->mem->regions[i];
+			struct uffdio_register reg_struct;
+
+			/*
+			 * Let's register all the mmap'ed area to ensure
+			 * alignment on page boundary.
+			 */
+			reg_struct.range.start =
+				(uint64_t)(uintptr_t)reg->mmap_addr;
+			reg_struct.range.len = reg->mmap_size;
+			reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+			if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER,
+						&reg_struct)) {
+				RTE_LOG(ERR, VHOST_CONFIG,
+					"Failed to register ufd for region %d: (ufd = %d) %s\n",
+					i, dev->postcopy_ufd,
+					strerror(errno));
+				return -1;
+			}
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"\t userfaultfd registered for range : %llx - %llx\n",
+				reg_struct.range.start,
+				reg_struct.range.start +
+				reg_struct.range.len - 1);
+#else
+			return -1;
+#endif
+		}
+	}
+
+	return 0;
+}
+
+static void
+af_unix_unmap_mem_regions(struct virtio_net *dev)
+{
+	uint32_t i;
+	struct rte_vhost_mem_region *reg;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (reg->host_user_addr) {
+			munmap(reg->mmap_addr, reg->mmap_size);
+			close(reg->fd);
+		}
+	}
+}
+
 const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
 	.device_size = sizeof(struct vhost_user_connection),
@@ -783,4 +962,6 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.send_slave_req = af_unix_send_slave_req,
 	.process_slave_message_reply = af_unix_process_slave_message_reply,
 	.set_slave_req_fd = af_unix_set_slave_req_fd,
+	.map_mem_regions = af_unix_map_mem_regions,
+	.unmap_mem_regions = af_unix_unmap_mem_regions,
 };
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 2213fbe..28038c6 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -415,6 +415,28 @@ struct vhost_transport_ops {
 	 */
 	int (*set_slave_req_fd)(struct virtio_net *dev,
 				struct VhostUserMsg *msg);
+
+	/**
+	 * Map memory table regions in dev->mem->regions[].
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param msg
+	 *  message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*map_mem_regions)(struct virtio_net *dev,
+				struct VhostUserMsg *msg);
+
+	/**
+	 * Unmap memory table regions in dev->mem->regions[] and free any
+	 * resources, such as file descriptors.
+	 *
+	 * @param dev
+	 *  vhost device
+	 */
+	void (*unmap_mem_regions)(struct virtio_net *dev);
 };
 
 /** The traditional AF_UNIX vhost-user protocol transport. */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index a4dcba1..ed8dbd8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -81,17 +81,6 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 };
 
 static int send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg);
-int read_vhost_message(int sockfd, struct VhostUserMsg *msg);
-
-static uint64_t
-get_blk_size(int fd)
-{
-	struct stat stat;
-	int ret;
-
-	ret = fstat(fd, &stat);
-	return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
-}
 
 /*
  * Reclaim all the outstanding zmbufs for a virtqueue.
@@ -120,7 +109,6 @@ static void
 free_mem_region(struct virtio_net *dev)
 {
 	uint32_t i;
-	struct rte_vhost_mem_region *reg;
 	struct vhost_virtqueue *vq;
 
 	if (!dev || !dev->mem)
@@ -134,13 +122,7 @@ free_mem_region(struct virtio_net *dev)
 		}
 	}
 
-	for (i = 0; i < dev->mem->nregions; i++) {
-		reg = &dev->mem->regions[i];
-		if (reg->host_user_addr) {
-			munmap(reg->mmap_addr, reg->mmap_size);
-			close(reg->fd);
-		}
-	}
+	dev->trans_ops->unmap_mem_regions(dev);
 }
 
 void
@@ -792,7 +774,7 @@ add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
 	return 0;
 }
 
-static int
+int
 add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg,
 		uint64_t page_size)
 {
@@ -881,18 +863,13 @@ vhost_memory_changed(struct VhostUserMemory *new,
 
 static int
 vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd)
+			int main_fd __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	struct VhostUserMemory *memory = &msg->payload.memory;
 	struct rte_vhost_mem_region *reg;
-	void *mmap_addr;
-	uint64_t mmap_size;
 	uint64_t mmap_offset;
-	uint64_t alignment;
 	uint32_t i;
-	int populate;
-	int fd;
 
 	if (memory->nregions > VHOST_MEMORY_MAX_NREGIONS) {
 		RTE_LOG(ERR, VHOST_CONFIG,
@@ -904,8 +881,11 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 		RTE_LOG(INFO, VHOST_CONFIG,
 			"(%d) memory regions not changed\n", dev->vid);
 
-		for (i = 0; i < memory->nregions; i++)
-			close(msg->fds[i]);
+		for (i = 0; i < memory->nregions; i++) {
+			if (msg->fds[i] >= 0) {
+				close(msg->fds[i]);
+			}
+		}
 
 		return RTE_VHOST_MSG_RESULT_OK;
 	}
@@ -946,13 +926,15 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 	dev->mem->nregions = memory->nregions;
 
 	for (i = 0; i < memory->nregions; i++) {
-		fd  = msg->fds[i];
 		reg = &dev->mem->regions[i];
 
 		reg->guest_phys_addr = memory->regions[i].guest_phys_addr;
 		reg->guest_user_addr = memory->regions[i].userspace_addr;
 		reg->size            = memory->regions[i].memory_size;
-		reg->fd              = fd;
+		reg->mmap_size       = reg->size + memory->regions[i].mmap_offset;
+		reg->mmap_addr       = NULL;
+		reg->host_user_addr  = 0;
+		reg->fd              = msg->fds[i];
 
 		mmap_offset = memory->regions[i].mmap_offset;
 
@@ -962,132 +944,13 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 				"mmap_offset (%#"PRIx64") and memory_size "
 				"(%#"PRIx64") overflow\n",
 				mmap_offset, reg->size);
-			goto err_mmap;
-		}
-
-		mmap_size = reg->size + mmap_offset;
-
-		/* mmap() without flag of MAP_ANONYMOUS, should be called
-		 * with length argument aligned with hugepagesz at older
-		 * longterm version Linux, like 2.6.32 and 3.2.72, or
-		 * mmap() will fail with EINVAL.
-		 *
-		 * to avoid failure, make sure in caller to keep length
-		 * aligned.
-		 */
-		alignment = get_blk_size(fd);
-		if (alignment == (uint64_t)-1) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"couldn't get hugepage size through fstat\n");
-			goto err_mmap;
-		}
-		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
-
-		populate = (dev->dequeue_zero_copy) ? MAP_POPULATE : 0;
-		mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
-				 MAP_SHARED | populate, fd, 0);
-
-		if (mmap_addr == MAP_FAILED) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"mmap region %u failed.\n", i);
-			goto err_mmap;
+			goto err;
 		}
 
-		reg->mmap_addr = mmap_addr;
-		reg->mmap_size = mmap_size;
-		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
-				      mmap_offset;
-
-		if (dev->dequeue_zero_copy)
-			if (add_guest_pages(dev, reg, alignment) < 0) {
-				RTE_LOG(ERR, VHOST_CONFIG,
-					"adding guest pages to region %u failed.\n",
-					i);
-				goto err_mmap;
-			}
-
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"guest memory region %u, size: 0x%" PRIx64 "\n"
-			"\t guest physical addr: 0x%" PRIx64 "\n"
-			"\t guest virtual  addr: 0x%" PRIx64 "\n"
-			"\t host  virtual  addr: 0x%" PRIx64 "\n"
-			"\t mmap addr : 0x%" PRIx64 "\n"
-			"\t mmap size : 0x%" PRIx64 "\n"
-			"\t mmap align: 0x%" PRIx64 "\n"
-			"\t mmap off  : 0x%" PRIx64 "\n",
-			i, reg->size,
-			reg->guest_phys_addr,
-			reg->guest_user_addr,
-			reg->host_user_addr,
-			(uint64_t)(uintptr_t)mmap_addr,
-			mmap_size,
-			alignment,
-			mmap_offset);
-
-		if (dev->postcopy_listening) {
-			/*
-			 * We haven't a better way right now than sharing
-			 * DPDK's virtual address with Qemu, so that Qemu can
-			 * retrieve the region offset when handling userfaults.
-			 */
-			memory->regions[i].userspace_addr =
-				reg->host_user_addr;
-		}
 	}
-	if (dev->postcopy_listening) {
-		/* Send the addresses back to qemu */
-		msg->fd_num = 0;
-		send_vhost_reply(dev, msg);
-
-		/* Wait for qemu to acknolwedge it's got the addresses
-		 * we've got to wait before we're allowed to generate faults.
-		 */
-		VhostUserMsg ack_msg;
-		if (read_vhost_message(main_fd, &ack_msg) <= 0) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"Failed to read qemu ack on postcopy set-mem-table\n");
-			goto err_mmap;
-		}
-		if (ack_msg.request.master != VHOST_USER_SET_MEM_TABLE) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"Bad qemu ack on postcopy set-mem-table (%d)\n",
-				ack_msg.request.master);
-			goto err_mmap;
-		}
-
-		/* Now userfault register and we can use the memory */
-		for (i = 0; i < memory->nregions; i++) {
-#ifdef RTE_LIBRTE_VHOST_POSTCOPY
-			reg = &dev->mem->regions[i];
-			struct uffdio_register reg_struct;
 
-			/*
-			 * Let's register all the mmap'ed area to ensure
-			 * alignment on page boundary.
-			 */
-			reg_struct.range.start =
-				(uint64_t)(uintptr_t)reg->mmap_addr;
-			reg_struct.range.len = reg->mmap_size;
-			reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
-
-			if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER,
-						&reg_struct)) {
-				RTE_LOG(ERR, VHOST_CONFIG,
-					"Failed to register ufd for region %d: (ufd = %d) %s\n",
-					i, dev->postcopy_ufd,
-					strerror(errno));
-				goto err_mmap;
-			}
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"\t userfaultfd registered for range : %llx - %llx\n",
-				reg_struct.range.start,
-				reg_struct.range.start +
-				reg_struct.range.len - 1);
-#else
-			goto err_mmap;
-#endif
-		}
-	}
+	if (dev->trans_ops->map_mem_regions(dev, msg) < 0)
+		goto err;
 
 	for (i = 0; i < dev->nr_vring; i++) {
 		struct vhost_virtqueue *vq = dev->virtqueue[i];
@@ -1103,7 +966,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 			dev = translate_ring_addresses(dev, i);
 			if (!dev) {
 				dev = *pdev;
-				goto err_mmap;
+				goto err;
 			}
 
 			*pdev = dev;
@@ -1114,7 +977,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
 
 	return RTE_VHOST_MSG_RESULT_OK;
 
-err_mmap:
+err:
 	free_mem_region(dev);
 	rte_free(dev->mem);
 	dev->mem = NULL;
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 0169bd2..200e47b 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -147,6 +147,9 @@ typedef struct VhostUserMsg {
 
 /* vhost_user.c */
 int vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg);
+int add_guest_pages(struct virtio_net *dev,
+		   struct rte_vhost_mem_region *reg,
+		   uint64_t page_size);
 int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
 #endif
-- 
2.7.4



* [dpdk-dev] [PATCH 14/28] vhost: move setup of the log memory region
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (12 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 13/28] vhost: move mmap/munmap Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 15/28] vhost: remove main fd parameter from msg handlers Nikos Dragazis
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Setting up the log memory region involves mapping and unmapping guest
memory, which is a transport-specific operation: other transports may
access the guest memory log by other means. Therefore, the mmap/munmap
calls for the memory log are moved to trans_af_unix.c and exposed
through a new set_log_base() transport operation. The unmapping done on
device cleanup moves to af_unix_cleanup_device().

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
---
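Note: a minimal, standalone sketch of the split described above, for
readers who want to try the idea outside of DPDK. All demo_* names are
hypothetical stand-ins (not librte_vhost API), and the log_map_len
field is an addition of this sketch so that munmap() covers the whole
mapping. The core only calls through the ops table; the fd-based
transport maps from file offset 0 and adds the offset afterwards.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

struct demo_dev;

/* Hypothetical, cut-down stand-in for struct vhost_transport_ops. */
struct demo_transport_ops {
	int (*set_log_base)(struct demo_dev *dev, int fd,
			    uint64_t size, uint64_t off);
};

/* Hypothetical stand-in for struct virtio_net (log fields only). */
struct demo_dev {
	const struct demo_transport_ops *trans_ops;
	uint64_t log_addr;	/* start of the mapping */
	uint64_t log_base;	/* log_addr + offset */
	uint64_t log_size;	/* size requested by the master */
	uint64_t log_map_len;	/* sketch-only: full mapped length */
};

/* Transport side: map from offset 0, then apply the offset by hand. */
static int
demo_fd_set_log_base(struct demo_dev *dev, int fd, uint64_t size, uint64_t off)
{
	void *addr = mmap(NULL, size + off, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, 0);

	close(fd);		/* the mapping outlives the descriptor */
	if (addr == MAP_FAILED)
		return -1;

	/* Drop the mapping left over from an earlier request, if any. */
	if (dev->log_addr)
		munmap((void *)(uintptr_t)dev->log_addr, dev->log_map_len);

	dev->log_addr = (uint64_t)(uintptr_t)addr;
	dev->log_base = dev->log_addr + off;
	dev->log_size = size;
	dev->log_map_len = size + off;
	return 0;
}

static const struct demo_transport_ops demo_fd_ops = {
	.set_log_base = demo_fd_set_log_base,
};

/* Core side: no mmap() here, only a call through the ops table. */
static int
demo_core_set_log_base(struct demo_dev *dev, int fd, uint64_t size, uint64_t off)
{
	assert(dev->trans_ops != NULL && dev->trans_ops->set_log_base != NULL);
	return dev->trans_ops->set_log_base(dev, fd, size, off) < 0 ? -1 : 0;
}

/* Helper for trying the sketch: an unlinked temporary file of len bytes. */
static int
demo_make_log_fd(uint64_t len)
{
	FILE *f = tmpfile();
	int fd;

	if (f == NULL)
		return -1;
	fd = dup(fileno(f));	/* keep a descriptor after fclose() */
	fclose(f);
	if (fd >= 0 && ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A transport without file descriptors would plug a different
set_log_base() into the same table and leave the core untouched.
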
 lib/librte_vhost/trans_af_unix.c | 41 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h         | 13 +++++++++++++
 lib/librte_vhost/vhost_user.c    | 27 +-------------------------
 3 files changed, 55 insertions(+), 26 deletions(-)

diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 522823f..35b1c45 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -763,6 +763,11 @@ af_unix_cleanup_device(struct virtio_net *dev, int destroy __rte_unused)
 	struct vhost_user_connection *conn =
 		container_of(dev, struct vhost_user_connection, device);
 
+	if (dev->log_addr) {
+		munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
+		dev->log_addr = 0;
+	}
+
 	if (conn->slave_req_fd >= 0) {
 		close(conn->slave_req_fd);
 		conn->slave_req_fd = -1;
@@ -950,6 +955,41 @@ af_unix_unmap_mem_regions(struct virtio_net *dev)
 	}
 }
 
+static int
+af_unix_set_log_base(struct virtio_net *dev, const struct VhostUserMsg *msg)
+{
+	int fd = msg->fds[0];
+	uint64_t size, off;
+	void *addr;
+
+	size = msg->payload.log.mmap_size;
+	off  = msg->payload.log.mmap_offset;
+
+	/*
+	 * mmap from 0 to work around a hugepage mmap bug: mmap will
+	 * fail when offset is not page-size aligned.
+	 */
+	addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	close(fd);
+	if (addr == MAP_FAILED) {
+		RTE_LOG(ERR, VHOST_CONFIG, "mmap log base failed!\n");
+		return -1;
+	}
+
+	/*
+	 * Free the previously mapped log memory in case
+	 * VHOST_USER_SET_LOG_BASE is received more than once.
+	 */
+	if (dev->log_addr) {
+		munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
+	}
+	dev->log_addr = (uint64_t)(uintptr_t)addr;
+	dev->log_base = dev->log_addr + off;
+	dev->log_size = size;
+
+	return 0;
+}
+
 const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
 	.device_size = sizeof(struct vhost_user_connection),
@@ -964,4 +1004,5 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.set_slave_req_fd = af_unix_set_slave_req_fd,
 	.map_mem_regions = af_unix_map_mem_regions,
 	.unmap_mem_regions = af_unix_unmap_mem_regions,
+	.set_log_base = af_unix_set_log_base,
 };
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 28038c6..b15d223 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -437,6 +437,19 @@ struct vhost_transport_ops {
 	 *  vhost device
 	 */
 	void (*unmap_mem_regions)(struct virtio_net *dev);
+
+	/**
+	 * Setup the log memory region.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param msg
+	 *  message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*set_log_base)(struct virtio_net *dev,
+			    const struct VhostUserMsg *msg);
 };
 
 /** The traditional AF_UNIX vhost-user protocol transport. */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ed8dbd8..acb1135 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -137,11 +137,6 @@ vhost_backend_cleanup(struct virtio_net *dev)
 	free(dev->guest_pages);
 	dev->guest_pages = NULL;
 
-	if (dev->log_addr) {
-		munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
-		dev->log_addr = 0;
-	}
-
 	if (dev->postcopy_ufd >= 0) {
 		close(dev->postcopy_ufd);
 		dev->postcopy_ufd = -1;
@@ -1275,7 +1270,6 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
 	struct virtio_net *dev = *pdev;
 	int fd = msg->fds[0];
 	uint64_t size, off;
-	void *addr;
 
 	if (fd < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG, "invalid log fd: %d\n", fd);
@@ -1304,27 +1298,8 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
 		"log mmap size: %"PRId64", offset: %"PRId64"\n",
 		size, off);
 
-	/*
-	 * mmap from 0 to workaround a hugepage mmap bug: mmap will
-	 * fail when offset is not page size aligned.
-	 */
-	addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	close(fd);
-	if (addr == MAP_FAILED) {
-		RTE_LOG(ERR, VHOST_CONFIG, "mmap log base failed!\n");
+	if (dev->trans_ops->set_log_base(dev, msg) < 0)
 		return RTE_VHOST_MSG_RESULT_ERR;
-	}
-
-	/*
-	 * Free previously mapped log memory on occasionally
-	 * multiple VHOST_USER_SET_LOG_BASE.
-	 */
-	if (dev->log_addr) {
-		munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
-	}
-	dev->log_addr = (uint64_t)(uintptr_t)addr;
-	dev->log_base = dev->log_addr + off;
-	dev->log_size = size;
 
 	/*
 	 * The spec is not clear about it (yet), but QEMU doesn't expect
-- 
2.7.4



* [dpdk-dev] [PATCH 15/28] vhost: remove main fd parameter from msg handlers
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (13 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 14/28] vhost: move setup of the log memory region Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 16/28] vhost: move postcopy live migration code Nikos Dragazis
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

After the socket I/O and the vhost-user map/unmap operations were
refactored in previous patches, the core vhost-user code no longer
needs the connection fds. This patch removes the main_fd parameter from
the message handlers and from vhost_user_msg_handler().

Connection fds are used for socket I/O between master and slave, which
is a transport-specific mechanism; other transports may use other
mechanisms for master/slave communication. The connection fds therefore
now live only in the AF_UNIX transport code.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
---
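Note: a small standalone sketch of why the handlers can drop the fd
argument. It assumes, as in trans_af_unix.c, that the device is
embedded in the per-connection structure, so the transport can get
back to its descriptor from the device pointer alone. The demo_* names
are hypothetical and only mirror the container_of() usage in the
series.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as container_of(): step back from a member to its parent. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct virtio_net. */
struct demo_dev {
	int vid;
};

/* Hypothetical stand-in for struct vhost_user_connection. */
struct demo_connection {
	int connfd;		/* transport-private socket descriptor */
	struct demo_dev device;	/* embedded, not a pointer */
};

/* Transport code: recover the descriptor when it is actually needed. */
static int
demo_conn_fd(struct demo_dev *dev)
{
	struct demo_connection *conn;

	assert(dev != NULL);
	conn = demo_container_of(dev, struct demo_connection, device);
	return conn->connfd;
}

/* Core handlers take only the device; no descriptor is passed along. */
typedef int (*demo_handler_t)(struct demo_dev *dev);

static int
demo_core_handler(struct demo_dev *dev)
{
	return dev->vid;
}

/* Read callback: hands the embedded device to the core handler. */
static int
demo_dispatch(struct demo_connection *conn, demo_handler_t handler)
{
	return handler(&conn->device);
}
```
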
 lib/librte_vhost/trans_af_unix.c |  2 +-
 lib/librte_vhost/vhost_user.c    | 82 ++++++++++++++--------------------------
 lib/librte_vhost/vhost_user.h    |  2 +-
 3 files changed, 30 insertions(+), 56 deletions(-)

diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 35b1c45..a451880 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -374,7 +374,7 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 		goto err;
 	}
 
-	ret = vhost_user_msg_handler(conn->device.vid, connfd, &msg);
+	ret = vhost_user_msg_handler(conn->device.vid, &msg);
 	if (ret < 0) {
 err:
 		close(connfd);
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index acb1135..d3c9c5f 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -151,16 +151,14 @@ vhost_backend_cleanup(struct virtio_net *dev)
  */
 static int
 vhost_user_set_owner(struct virtio_net **pdev __rte_unused,
-			struct VhostUserMsg *msg __rte_unused,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg __rte_unused)
 {
 	return RTE_VHOST_MSG_RESULT_OK;
 }
 
 static int
 vhost_user_reset_owner(struct virtio_net **pdev,
-			struct VhostUserMsg *msg __rte_unused,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 	vhost_destroy_device_notify(dev);
@@ -174,8 +172,7 @@ vhost_user_reset_owner(struct virtio_net **pdev,
  * The features that we support are requested.
  */
 static int
-vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	uint64_t features = 0;
@@ -193,8 +190,7 @@ vhost_user_get_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
  * The queue number that we support are requested.
  */
 static int
-vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	uint32_t queue_num = 0;
@@ -212,8 +208,7 @@ vhost_user_get_queue_num(struct virtio_net **pdev, struct VhostUserMsg *msg,
  * We receive the negotiated features supported by us and the virtio device.
  */
 static int
-vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	uint64_t features = msg->payload.u64;
@@ -295,8 +290,7 @@ vhost_user_set_features(struct virtio_net **pdev, struct VhostUserMsg *msg,
  */
 static int
 vhost_user_set_vring_num(struct virtio_net **pdev,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index];
@@ -665,8 +659,7 @@ translate_ring_addresses(struct virtio_net *dev, int vq_index)
  * This function then converts these to our address space.
  */
 static int
-vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_virtqueue *vq;
@@ -703,8 +696,7 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, struct VhostUserMsg *msg,
  */
 static int
 vhost_user_set_vring_base(struct virtio_net **pdev,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index];
@@ -857,8 +849,7 @@ vhost_memory_changed(struct VhostUserMemory *new,
 }
 
 static int
-vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	struct VhostUserMemory *memory = &msg->payload.memory;
@@ -1019,8 +1010,7 @@ virtio_is_ready(struct virtio_net *dev)
 }
 
 static int
-vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_vring_file file;
@@ -1044,8 +1034,7 @@ vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg,
 }
 
 static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	if (!(msg->payload.u64 & VHOST_USER_VRING_NOFD_MASK))
 		close(msg->fds[0]);
@@ -1055,8 +1044,7 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev __rte_unused,
 }
 
 static int
-vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_vring_file file;
@@ -1111,8 +1099,7 @@ free_zmbufs(struct vhost_virtqueue *vq)
  */
 static int
 vhost_user_get_vring_base(struct virtio_net **pdev,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_virtqueue *vq = dev->virtqueue[msg->payload.state.index];
@@ -1182,8 +1169,7 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
  */
 static int
 vhost_user_set_vring_enable(struct virtio_net **pdev,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	int enable = (int)msg->payload.state.num;
@@ -1215,8 +1201,7 @@ vhost_user_set_vring_enable(struct virtio_net **pdev,
 
 static int
 vhost_user_get_protocol_features(struct virtio_net **pdev,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	uint64_t features, protocol_features;
@@ -1242,8 +1227,7 @@ vhost_user_get_protocol_features(struct virtio_net **pdev,
 
 static int
 vhost_user_set_protocol_features(struct virtio_net **pdev,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	uint64_t protocol_features = msg->payload.u64;
@@ -1264,8 +1248,7 @@ vhost_user_set_protocol_features(struct virtio_net **pdev,
 }
 
 static int
-vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	int fd = msg->fds[0];
@@ -1312,8 +1295,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
 }
 
 static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	close(msg->fds[0]);
 	RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
@@ -1330,8 +1312,7 @@ static int vhost_user_set_log_fd(struct virtio_net **pdev __rte_unused,
  * a flag 'broadcast_rarp' to let rte_vhost_dequeue_burst() inject it.
  */
 static int
-vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	uint8_t *mac = (uint8_t *)&msg->payload.u64;
@@ -1361,8 +1342,7 @@ vhost_user_send_rarp(struct virtio_net **pdev, struct VhostUserMsg *msg,
 }
 
 static int
-vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	if (msg->payload.u64 < VIRTIO_MIN_MTU ||
@@ -1379,8 +1359,7 @@ vhost_user_net_set_mtu(struct virtio_net **pdev, struct VhostUserMsg *msg,
 }
 
 static int
-vhost_user_set_req_fd(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_set_req_fd(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	int ret;
 	struct virtio_net *dev = *pdev;
@@ -1443,8 +1422,7 @@ is_vring_iotlb_invalidate(struct vhost_virtqueue *vq,
 }
 
 static int
-vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 	struct vhost_iotlb_msg *imsg = &msg->payload.iotlb;
@@ -1490,8 +1468,7 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg,
 
 static int
 vhost_user_set_postcopy_advise(struct virtio_net **pdev,
-			struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 #ifdef RTE_LIBRTE_VHOST_POSTCOPY
@@ -1527,8 +1504,7 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev,
 
 static int
 vhost_user_set_postcopy_listen(struct virtio_net **pdev,
-			struct VhostUserMsg *msg __rte_unused,
-			int main_fd __rte_unused)
+			struct VhostUserMsg *msg __rte_unused)
 {
 	struct virtio_net *dev = *pdev;
 
@@ -1543,8 +1519,7 @@ vhost_user_set_postcopy_listen(struct virtio_net **pdev,
 }
 
 static int
-vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg,
-			int main_fd __rte_unused)
+vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 
@@ -1562,8 +1537,7 @@ vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg,
 }
 
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
-					struct VhostUserMsg *msg,
-					int main_fd);
+					struct VhostUserMsg *msg);
 static vhost_message_handler_t vhost_message_handlers[VHOST_USER_MAX] = {
 	[VHOST_USER_NONE] = NULL,
 	[VHOST_USER_GET_FEATURES] = vhost_user_get_features,
@@ -1681,7 +1655,7 @@ vhost_user_unlock_all_queue_pairs(struct virtio_net *dev)
 }
 
 int
-vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg_)
+vhost_user_msg_handler(int vid, const struct VhostUserMsg *msg_)
 {
 	struct VhostUserMsg msg = *msg_; /* copy so we can build the reply */
 	struct virtio_net *dev;
@@ -1785,7 +1759,7 @@ vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg_)
 	if (request > VHOST_USER_NONE && request < VHOST_USER_MAX) {
 		if (!vhost_message_handlers[request])
 			goto skip_to_post_handle;
-		ret = vhost_message_handlers[request](&dev, &msg, fd);
+		ret = vhost_message_handlers[request](&dev, &msg);
 
 		switch (ret) {
 		case RTE_VHOST_MSG_RESULT_ERR:
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 200e47b..4cc912d 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -146,7 +146,7 @@ typedef struct VhostUserMsg {
 
 
 /* vhost_user.c */
-int vhost_user_msg_handler(int vid, int fd, const struct VhostUserMsg *msg);
+int vhost_user_msg_handler(int vid, const struct VhostUserMsg *msg);
 int add_guest_pages(struct virtio_net *dev,
 		   struct rte_vhost_mem_region *reg,
 		   uint64_t page_size);
-- 
2.7.4



* [dpdk-dev] [PATCH 16/28] vhost: move postcopy live migration code
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (14 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 15/28] vhost: remove main fd parameter from msg handlers Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 17/28] vhost: support registering additional vhost-user transports Nikos Dragazis
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Postcopy live migration is bound to the AF_UNIX transport because it
relies on userfaultfd: the slave creates a userfaultfd descriptor,
passes it to the master over the socket and registers the mmap'ed
regions with it. Therefore, this patch moves the relevant code from
vhost_user.c to trans_af_unix.c, keeps the postcopy state in struct
vhost_user_connection and exposes the functionality via
transport-specific functions. Any other vhost-user transport could
potentially support this feature by implementing these functions.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
---
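Note: a standalone sketch of the per-connection postcopy state that
this patch moves into the transport. It only models the bookkeeping
(advise, listen, end); the descriptor is handed in by the caller
instead of being created with the userfaultfd syscall, and all demo_*
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

enum { DEMO_OK = 0, DEMO_ERR = -1 };

/* Hypothetical stand-in for the postcopy part of the connection. */
struct demo_conn {
	int postcopy_ufd;	/* -1 when no descriptor is held */
	int postcopy_listening;
	uint32_t nregions;	/* stand-in for dev->mem->nregions */
};

static void
demo_conn_init(struct demo_conn *conn)
{
	conn->postcopy_ufd = -1;
	conn->postcopy_listening = 0;
	conn->nregions = 0;
}

/* Advise: remember the descriptor that will be sent to the master. */
static int
demo_postcopy_advise(struct demo_conn *conn, int ufd)
{
	if (ufd < 0)
		return DEMO_ERR;
	conn->postcopy_ufd = ufd;
	return DEMO_OK;
}

/* Listen: only valid before any memory region has been registered. */
static int
demo_postcopy_listen(struct demo_conn *conn)
{
	if (conn->nregions)
		return DEMO_ERR;
	conn->postcopy_listening = 1;
	return DEMO_OK;
}

/* End: stop listening and release the descriptor. */
static int
demo_postcopy_end(struct demo_conn *conn)
{
	conn->postcopy_listening = 0;
	if (conn->postcopy_ufd >= 0) {
		close(conn->postcopy_ufd);
		conn->postcopy_ufd = -1;
	}
	return DEMO_OK;
}
```
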
 lib/librte_vhost/trans_af_unix.c | 94 ++++++++++++++++++++++++++++++++++++++--
 lib/librte_vhost/vhost.c         |  1 -
 lib/librte_vhost/vhost.h         | 41 ++++++++++++++++--
 lib/librte_vhost/vhost_user.c    | 61 ++------------------------
 4 files changed, 131 insertions(+), 66 deletions(-)

diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index a451880..4ccf9a7 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -10,6 +10,7 @@
 #include <sys/un.h>
 #include <sys/types.h>
 #include <sys/ioctl.h>
+#include <sys/syscall.h>
 #ifdef RTE_LIBRTE_VHOST_POSTCOPY
 #include <linux/userfaultfd.h>
 #endif
@@ -39,6 +40,9 @@ struct vhost_user_connection {
 	int slave_req_fd;
 	rte_spinlock_t slave_req_lock;
 
+	int postcopy_ufd;
+	int postcopy_listening;
+
 	TAILQ_ENTRY(vhost_user_connection) next;
 };
 
@@ -261,6 +265,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	conn->slave_req_fd = -1;
 	conn->vsocket = vsocket;
 	rte_spinlock_init(&conn->slave_req_lock);
+	conn->postcopy_ufd = -1;
 
 	size = strnlen(vsocket->path, PATH_MAX);
 	vhost_set_ifname(dev->vid, vsocket->path, size);
@@ -772,6 +777,13 @@ af_unix_cleanup_device(struct virtio_net *dev, int destroy __rte_unused)
 		close(conn->slave_req_fd);
 		conn->slave_req_fd = -1;
 	}
+
+	if (conn->postcopy_ufd >= 0) {
+		close(conn->postcopy_ufd);
+		conn->postcopy_ufd = -1;
+	}
+
+	conn->postcopy_listening = 0;
 }
 
 static int
@@ -866,7 +878,7 @@ af_unix_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg)
 			alignment,
 			mmap_offset);
 
-		if (dev->postcopy_listening) {
+		if (conn->postcopy_listening) {
 			/*
 			 * We haven't a better way right now than sharing
 			 * DPDK's virtual address with Qemu, so that Qemu can
@@ -877,7 +889,7 @@ af_unix_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg)
 		}
 	}
 
-	if (dev->postcopy_listening) {
+	if (conn->postcopy_listening) {
 		/* Send the addresses back to qemu */
 		msg->fd_num = 0;
 		/* Send reply */
@@ -918,11 +930,11 @@ af_unix_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg)
 			reg_struct.range.len = reg->mmap_size;
 			reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
 
-			if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER,
+			if (ioctl(conn->postcopy_ufd, UFFDIO_REGISTER,
 						&reg_struct)) {
 				RTE_LOG(ERR, VHOST_CONFIG,
 					"Failed to register ufd for region %d: (ufd = %d) %s\n",
-					i, dev->postcopy_ufd,
+					i, conn->postcopy_ufd,
 					strerror(errno));
 				return -1;
 			}
@@ -990,6 +1002,77 @@ af_unix_set_log_base(struct virtio_net *dev, const struct VhostUserMsg *msg)
 	return 0;
 }
 
+static int
+af_unix_set_postcopy_advise(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+#ifdef RTE_LIBRTE_VHOST_POSTCOPY
+	struct uffdio_api api_struct;
+
+	conn->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+
+	if (conn->postcopy_ufd == -1) {
+		RTE_LOG(ERR, VHOST_CONFIG, "Userfaultfd not available: %s\n",
+			strerror(errno));
+		return RTE_VHOST_MSG_RESULT_ERR;
+	}
+	api_struct.api = UFFD_API;
+	api_struct.features = 0;
+	if (ioctl(conn->postcopy_ufd, UFFDIO_API, &api_struct)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "UFFDIO_API ioctl failure: %s\n",
+			strerror(errno));
+		close(conn->postcopy_ufd);
+		conn->postcopy_ufd = -1;
+		return RTE_VHOST_MSG_RESULT_ERR;
+	}
+	msg->fds[0] = conn->postcopy_ufd;
+	msg->fd_num = 1;
+
+	return RTE_VHOST_MSG_RESULT_REPLY;
+#else
+	conn->postcopy_ufd = -1;
+	msg->fd_num = 0;
+
+	return RTE_VHOST_MSG_RESULT_ERR;
+#endif
+}
+
+static int
+af_unix_set_postcopy_listen(struct virtio_net *dev)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+
+	if (dev->mem && dev->mem->nregions) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Regions already registered at postcopy-listen\n");
+		return RTE_VHOST_MSG_RESULT_ERR;
+	}
+	conn->postcopy_listening = 1;
+
+	return RTE_VHOST_MSG_RESULT_OK;
+}
+
+static int
+af_unix_set_postcopy_end(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+
+	conn->postcopy_listening = 0;
+	if (conn->postcopy_ufd >= 0) {
+		close(conn->postcopy_ufd);
+		conn->postcopy_ufd = -1;
+	}
+
+	msg->payload.u64 = 0;
+	msg->size = sizeof(msg->payload.u64);
+	msg->fd_num = 0;
+
+	return RTE_VHOST_MSG_RESULT_REPLY;
+}
+
 const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
 	.device_size = sizeof(struct vhost_user_connection),
@@ -1005,4 +1088,7 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.map_mem_regions = af_unix_map_mem_regions,
 	.unmap_mem_regions = af_unix_unmap_mem_regions,
 	.set_log_base = af_unix_set_log_base,
+	.set_postcopy_advise = af_unix_set_postcopy_advise,
+	.set_postcopy_listen = af_unix_set_postcopy_listen,
+	.set_postcopy_end = af_unix_set_postcopy_end,
 };
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 5b16390..91a286d 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -512,7 +512,6 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops)
 	dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
 	dev->trans_ops = trans_ops;
 	dev->vdpa_dev_id = -1;
-	dev->postcopy_ufd = -1;
 
 	return dev;
 }
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index b15d223..f5d6dc8 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -450,6 +450,44 @@ struct vhost_transport_ops {
 	 */
 	int (*set_log_base)(struct virtio_net *dev,
 			    const struct VhostUserMsg *msg);
+
+	/**
+	 * Register a userfault fd and send it to master.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param msg
+	 *  message
+	 * @return
+	 *  RTE_VHOST_MSG_RESULT_REPLY on success,
+	 *  RTE_VHOST_MSG_RESULT_ERR on failure
+	 */
+	int (*set_postcopy_advise)(struct virtio_net *dev,
+				   struct VhostUserMsg *msg);
+
+	/**
+	 * Change live migration mode (entering postcopy mode).
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @return
+	 *  RTE_VHOST_MSG_RESULT_OK on success,
+	 *  RTE_VHOST_MSG_RESULT_ERR on failure
+	 */
+	int (*set_postcopy_listen)(struct virtio_net *dev);
+
+	/**
+	 * Register completion of postcopy live migration.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param msg
+	 *  message
+	 * @return
+	 *  RTE_VHOST_MSG_RESULT_REPLY
+	 */
+	int (*set_postcopy_end)(struct virtio_net *dev,
+				struct VhostUserMsg *msg);
 };
 
 /** The traditional AF_UNIX vhost-user protocol transport. */
@@ -492,9 +530,6 @@ struct virtio_net {
 	uint32_t		max_guest_pages;
 	struct guest_page       *guest_pages;
 
-	int			postcopy_ufd;
-	int			postcopy_listening;
-
 	/*
 	 * Device id to identify a specific backend device.
 	 * It's set to -1 for the default software implementation.
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index d3c9c5f..29c99e7 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -29,14 +29,10 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
-#include <sys/syscall.h>
 #include <assert.h>
 #ifdef RTE_LIBRTE_VHOST_NUMA
 #include <numaif.h>
 #endif
-#ifdef RTE_LIBRTE_VHOST_POSTCOPY
-#include <linux/userfaultfd.h>
-#endif
 
 #include <rte_common.h>
 #include <rte_malloc.h>
@@ -136,13 +132,6 @@ vhost_backend_cleanup(struct virtio_net *dev)
 
 	free(dev->guest_pages);
 	dev->guest_pages = NULL;
-
-	if (dev->postcopy_ufd >= 0) {
-		close(dev->postcopy_ufd);
-		dev->postcopy_ufd = -1;
-	}
-
-	dev->postcopy_listening = 0;
 }
 
 /*
@@ -1471,35 +1460,8 @@ vhost_user_set_postcopy_advise(struct virtio_net **pdev,
 			struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
-#ifdef RTE_LIBRTE_VHOST_POSTCOPY
-	struct uffdio_api api_struct;
-
-	dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
-
-	if (dev->postcopy_ufd == -1) {
-		RTE_LOG(ERR, VHOST_CONFIG, "Userfaultfd not available: %s\n",
-			strerror(errno));
-		return RTE_VHOST_MSG_RESULT_ERR;
-	}
-	api_struct.api = UFFD_API;
-	api_struct.features = 0;
-	if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) {
-		RTE_LOG(ERR, VHOST_CONFIG, "UFFDIO_API ioctl failure: %s\n",
-			strerror(errno));
-		close(dev->postcopy_ufd);
-		dev->postcopy_ufd = -1;
-		return RTE_VHOST_MSG_RESULT_ERR;
-	}
-	msg->fds[0] = dev->postcopy_ufd;
-	msg->fd_num = 1;
-
-	return RTE_VHOST_MSG_RESULT_REPLY;
-#else
-	dev->postcopy_ufd = -1;
-	msg->fd_num = 0;
 
-	return RTE_VHOST_MSG_RESULT_ERR;
-#endif
+	return dev->trans_ops->set_postcopy_advise(dev, msg);
 }
 
 static int
@@ -1508,14 +1470,7 @@ vhost_user_set_postcopy_listen(struct virtio_net **pdev,
 {
 	struct virtio_net *dev = *pdev;
 
-	if (dev->mem && dev->mem->nregions) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"Regions already registered at postcopy-listen\n");
-		return RTE_VHOST_MSG_RESULT_ERR;
-	}
-	dev->postcopy_listening = 1;
-
-	return RTE_VHOST_MSG_RESULT_OK;
+	return dev->trans_ops->set_postcopy_listen(dev);
 }
 
 static int
@@ -1523,17 +1478,7 @@ vhost_user_postcopy_end(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
 
-	dev->postcopy_listening = 0;
-	if (dev->postcopy_ufd >= 0) {
-		close(dev->postcopy_ufd);
-		dev->postcopy_ufd = -1;
-	}
-
-	msg->payload.u64 = 0;
-	msg->size = sizeof(msg->payload.u64);
-	msg->fd_num = 0;
-
-	return RTE_VHOST_MSG_RESULT_REPLY;
+	return dev->trans_ops->set_postcopy_end(dev, msg);
 }
 
 typedef int (*vhost_message_handler_t)(struct virtio_net **pdev,
-- 
2.7.4


* [dpdk-dev] [PATCH 17/28] vhost: support registering additional vhost-user transports
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (15 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 16/28] vhost: move postcopy live migration code Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework Nikos Dragazis
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

This patch introduces a global transport map that holds pointers to the
transport-specific operations of every available transport. The AF_UNIX
transport is supported by default. Additional transports can be hooked
up by implementing struct vhost_transport_ops and registering that
structure in the global transport map. A new transport is registered
with rte_vhost_register_transport(), which is part of the librte_vhost
public API.

This patch also exports vhost.h, vhost_user.h and some previously
private functions as part of the librte_vhost public API. This allows
implementing vhost-user transports outside of lib/librte_vhost/.
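
As a rough illustration of the registration scheme described above, here
is a minimal standalone sketch. It does not use the real librte_vhost
types: struct transport_ops, enum transport_id, TRANSPORT_VVU,
register_transport() and lookup_transport() are hypothetical stand-ins
for struct vhost_transport_ops, VhostUserTransport,
rte_vhost_register_transport() and a direct g_transport_map[] access.
The NULL check on the ops pointer is an extra assumption that the patch
itself does not make.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct vhost_transport_ops. */
struct transport_ops {
	const char *name;
	int (*socket_start)(void);
};

/* Hypothetical stand-in for VhostUserTransport. */
enum transport_id {
	TRANSPORT_UNIX = 0,
	TRANSPORT_VVU,	/* assumed second transport, not part of this patch */
	TRANSPORT_MAX
};

/* One slot per transport; NULL means "not registered". */
static const struct transport_ops *transport_map[TRANSPORT_MAX];

/* Store the ops pointer in the map after validating the index. */
static int
register_transport(enum transport_id id, const struct transport_ops *ops)
{
	if ((unsigned int)id >= TRANSPORT_MAX || ops == NULL) {
		fprintf(stderr, "Invalid transport %d\n", (int)id);
		return -1;
	}

	transport_map[id] = ops;
	return 0;
}

/* Return the registered ops, or NULL if the slot is empty or invalid. */
static const struct transport_ops *
lookup_transport(enum transport_id id)
{
	if ((unsigned int)id >= TRANSPORT_MAX)
		return NULL;

	return transport_map[id];
}

static int
unix_start(void)
{
	return 0;
}

static const struct transport_ops unix_ops = { "af_unix", unix_start };
```

A caller would register unix_ops once and then dispatch through the
pointer returned by lookup_transport(), much like
rte_vhost_driver_register() picks trans_ops from the map in this patch.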

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
---
 lib/librte_vhost/Makefile              |  2 +-
 lib/librte_vhost/rte_vhost_version.map | 11 +++++++++++
 lib/librte_vhost/socket.c              | 26 +++++++++++++++++++++++++-
 lib/librte_vhost/vhost.h               | 22 ++++++++++++++++++++++
 4 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 5ff5fb2..4f867ec 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -26,7 +26,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
 					vhost_user.c virtio_net.c vdpa.c trans_af_unix.c
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += vhost.h vhost_user.h rte_vhost.h rte_vdpa.h
 
 # only compile vhost crypto when cryptodev is enabled
 ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 5f1d4a7..9eda81f 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -60,6 +60,17 @@ DPDK_18.02 {
 
 } DPDK_17.08;
 
+DPDK_19.05 {
+	global:
+
+	rte_vhost_register_transport;
+	vhost_destroy_device;
+	vhost_new_device;
+	vhost_set_ifname;
+	vhost_user_msg_handler;
+
+} DPDK_18.02;
+
 EXPERIMENTAL {
 	global:
 
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index fc78b63..fe1c78d 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -317,7 +317,17 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 {
 	int ret = -1;
 	struct vhost_user_socket *vsocket;
-	const struct vhost_transport_ops *trans_ops = &af_unix_trans_ops;
+	const struct vhost_transport_ops *trans_ops;
+
+	/* Register the AF_UNIX vhost-user transport in the transport map.
+	 * The AF_UNIX transport is supported by default.
+	 */
+	if (g_transport_map[VHOST_TRANSPORT_UNIX] == NULL) {
+		if (rte_vhost_register_transport(VHOST_TRANSPORT_UNIX, &af_unix_trans_ops) < 0)
+			goto out;
+	}
+
+	trans_ops = g_transport_map[VHOST_TRANSPORT_UNIX];
 
 	if (!path)
 		return -1;
@@ -495,3 +505,17 @@ rte_vhost_driver_start(const char *path)
 
 	return vsocket->trans_ops->socket_start(vsocket);
 }
+
+int
+rte_vhost_register_transport(VhostUserTransport trans,
+		const struct vhost_transport_ops *trans_ops)
+{
+	if (trans >= VHOST_TRANSPORT_MAX) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Invalid vhost-user transport %d\n", trans);
+		return -1;
+	}
+
+	g_transport_map[trans] = trans_ops;
+	return 0;
+}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index f5d6dc8..aba8d9b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -493,6 +493,28 @@ struct vhost_transport_ops {
 /** The traditional AF_UNIX vhost-user protocol transport. */
 extern const struct vhost_transport_ops af_unix_trans_ops;
 
+typedef enum VhostUserTransport {
+	VHOST_TRANSPORT_UNIX = 0,
+	VHOST_TRANSPORT_MAX = 1
+} VhostUserTransport;
+
+/* A list with all the available vhost-user transports. */
+const struct vhost_transport_ops *g_transport_map[VHOST_TRANSPORT_MAX];
+
+/**
+ * Register a new vhost-user transport in the transport map.
+ *
+ * @param trans
+ *  the transport that is going to be registered
+ * @param trans_ops
+ *  the transport operations supported by this transport
+ * @return
+ *  0 on success, -1 on failure
+ */
+int
+rte_vhost_register_transport(VhostUserTransport trans,
+		const struct vhost_transport_ops *trans_ops);
+
 /**
  * Device structure contains all configuration information relating
  * to the device.
-- 
2.7.4


* [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (16 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 17/28] vhost: support registering additional vhost-user transports Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-09-05 16:34   ` Maxime Coquelin
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 19/28] vhost: add index field in vhost virtqueues Nikos Dragazis
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The virtio-vhost-user transport requires a driver for the
virtio-vhost-user PCI device, hence it needs a virtio-pci driver.  There
is currently no librte_virtio API that we can use.

This commit is a hack that duplicates the virtio pci code from
drivers/net/ into drivers/virtio_vhost_user/.  A better solution would
be to extract the code cleanly from drivers/net/ and share it.  Or
perhaps we could backport SPDK's lib/virtio/.

drivers/virtio_vhost_user/ will host the virtio-vhost-user transport
implementation in the upcoming patches.
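
For readers unfamiliar with what the duplicated virtio pci code does at
probe time, the core of it is a walk over the PCI capability list that
picks out the vendor-specific virtio capabilities. The sketch below
mimics that walk on an in-memory copy of the config space instead of
calling rte_pci_read_config(). The function name scan_vendor_caps(), the
256-byte image and the iteration guard are assumptions made for
illustration and are not part of this patch.

```c
#include <stdint.h>
#include <string.h>

#define CAP_LIST_PTR	0x34	/* same value as PCI_CAPABILITY_LIST */
#define CAP_ID_VNDR	0x09	/* same value as PCI_CAP_ID_VNDR */
#define CFG_SPACE_SIZE	256

/* Same field layout as struct virtio_pci_cap in virtio_pci.h. */
struct pci_cap {
	uint8_t cap_vndr;
	uint8_t cap_next;
	uint8_t cap_len;
	uint8_t cfg_type;
	uint8_t bar;
	uint8_t padding[3];
	uint32_t offset;
	uint32_t length;
};

/*
 * Walk the capability list of a config space image and return a bitmask
 * with bit n set for every vendor-specific capability of cfg_type n.
 */
static uint32_t
scan_vendor_caps(const uint8_t cfg[CFG_SPACE_SIZE])
{
	uint32_t found = 0;
	uint8_t pos = cfg[CAP_LIST_PTR];
	int guard = 48;	/* assumed bound so a cyclic list still terminates */

	while (pos != 0 && guard-- > 0) {
		struct pci_cap cap;

		/* Stop if the header would run past the end of the image. */
		if ((size_t)pos + sizeof(cap) > CFG_SPACE_SIZE)
			break;

		memcpy(&cap, &cfg[pos], sizeof(cap));

		/* Non-vendor capabilities (e.g. MSI-X) are skipped here. */
		if (cap.cap_vndr == CAP_ID_VNDR && cap.cfg_type < 32)
			found |= UINT32_C(1) << cap.cfg_type;

		pos = cap.cap_next;
	}

	return found;
}
```

In the real code each matching capability is then turned into a mapped
address via the BAR, offset and length fields, and the device is treated
as modern only if the common, notify, device and ISR structures are all
found.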

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/virtio_vhost_user/virtio_pci.c | 504 +++++++++++++++++++++++++++++++++
 drivers/virtio_vhost_user/virtio_pci.h | 270 ++++++++++++++++++
 drivers/virtio_vhost_user/virtqueue.h  | 181 ++++++++++++
 3 files changed, 955 insertions(+)
 create mode 100644 drivers/virtio_vhost_user/virtio_pci.c
 create mode 100644 drivers/virtio_vhost_user/virtio_pci.h
 create mode 100644 drivers/virtio_vhost_user/virtqueue.h

diff --git a/drivers/virtio_vhost_user/virtio_pci.c b/drivers/virtio_vhost_user/virtio_pci.c
new file mode 100644
index 0000000..9c2c981
--- /dev/null
+++ b/drivers/virtio_vhost_user/virtio_pci.c
@@ -0,0 +1,504 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+#include <stdint.h>
+
+/* XXX This file is based on drivers/net/virtio/virtio_pci.c.  It would be
+ * better to create a shared rte_virtio library instead of duplicating this
+ * code.
+ */
+
+#ifdef RTE_EXEC_ENV_LINUX
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* The virtio PCI device VIRTIO_PCI_QUEUE_PFN register is 32 bit
+	 * and only accepts a 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "vring address shouldn't be above 16TB!\n");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_hw *hw, struct virtqueue *vq, uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "queue %u addresses:\n", vq->vq_queue_index);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t desc_addr: %" PRIx64 "\n", desc_addr);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t avail_addr: %" PRIx64 "\n", avail_addr);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t used_addr: %" PRIx64 "\n", used_addr);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t notify addr: %p (notify offset: %u)\n",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_pci_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+
+void
+virtio_pci_read_dev_config(struct virtio_hw *hw, size_t offset,
+		      void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+virtio_pci_write_dev_config(struct virtio_hw *hw, size_t offset,
+		       const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+virtio_pci_negotiate_features(struct virtio_hw *hw, uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+virtio_pci_reset(struct virtio_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+virtio_pci_reinit_complete(struct virtio_hw *hw)
+{
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+virtio_pci_set_status(struct virtio_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+virtio_pci_get_status(struct virtio_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+virtio_pci_isr(struct virtio_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "invalid bar: %u\n", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "offset(%u) + length(%u) overflows\n",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG,
+			"invalid cap: overflows bar space: %u > %" PRIu64 "\n",
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "bar %u base addr is NULL\n", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to map pci device!\n");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to read pci capability list\n");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			RTE_LOG(ERR, VIRTIO_PCI_CONFIG,
+				"failed to read pci cap at pos: %x\n", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG,
+				"[%2x] skipping non VNDR cap id: %02x\n",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG,
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u\n",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "no modern virtio pci device found.\n");
+		return -1;
+	}
+
+	RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "found modern virtio pci device.\n");
+
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "common cfg mapped at: %p\n", hw->common_cfg);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "device cfg mapped at: %p\n", hw->dev_cfg);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "isr cfg mapped at: %p\n", hw->isr);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "notify base: %p, notify off multiplier: %u\n",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+struct virtio_hw_internal virtio_pci_hw_internal[8];
+
+/*
+ * Return -1:
+ *   if there is error mapping with VFIO/UIO.
+ *   if port map error when driver type is KDRV_NONE.
+ *   if whitelisted but driver type is KDRV_UNKNOWN.
+ * Return 1 if kernel driver is managing the device.
+ * Return 0 on success.
+ */
+int
+virtio_pci_init(struct rte_pci_device *dev, struct virtio_hw *hw)
+{
+	static size_t internal_id;
+
+	if (internal_id >=
+	    sizeof(virtio_pci_hw_internal) / sizeof(*virtio_pci_hw_internal)) {
+		RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "too many virtio pci devices.\n");
+		return -1;
+	}
+
+	/*
+	 * Try to read the virtio pci caps, which exist only on modern
+	 * pci devices.
+	 */
+	if (virtio_read_caps(dev, hw) != 0) {
+		RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "legacy virtio pci is not supported.\n");
+		return -1;
+	}
+
+	RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "modern virtio pci detected.\n");
+	hw->internal_id = internal_id++;
+	virtio_pci_hw_internal[hw->internal_id].vtpci_ops =
+		&virtio_pci_modern_ops;
+	return 0;
+}
+
+enum virtio_msix_status
+virtio_pci_msix_detect(struct rte_pci_device *dev)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to read pci capability list\n");
+		return VIRTIO_MSIX_NONE;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			RTE_LOG(ERR, VIRTIO_PCI_CONFIG,
+				"failed to read pci cap at pos: %x\n", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				return VIRTIO_MSIX_ENABLED;
+			else
+				return VIRTIO_MSIX_DISABLED;
+		}
+
+		pos = cap.cap_next;
+	}
+
+	return VIRTIO_MSIX_NONE;
+}
diff --git a/drivers/virtio_vhost_user/virtio_pci.h b/drivers/virtio_vhost_user/virtio_pci.h
new file mode 100644
index 0000000..018e0b7
--- /dev/null
+++ b/drivers/virtio_vhost_user/virtio_pci.h
@@ -0,0 +1,270 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+/* XXX This file is based on drivers/net/virtio/virtio_pci.h.  It would be
+ * better to create a shared rte_virtio library instead of duplicating this
+ * code.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <stdint.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_spinlock.h>
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_VIRTIO_PCI_CONFIG RTE_LOGTYPE_USER2
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_PCI_VENDORID     0x1AF4
+#define VIRTIO_PCI_LEGACY_DEVICEID_VHOST_USER 0x1017
+#define VIRTIO_PCI_MODERN_DEVICEID_VHOST_USER 0x1058
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR		  19 /* interrupt status register, reading
+				      * also clears the register (8, RO) */
+/* Only if MSIX is enabled: */
+#define VIRTIO_MSI_CONFIG_VECTOR  20 /* configuration change vector (16, RW) */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22 /* vector for selected VQ notifications
+				      (16, RW) */
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* VirtIO device IDs. */
+#define VIRTIO_ID_VHOST_USER  0x18
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them? */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/*
+ * Some VirtIO feature bits (currently bits 28 through 31) are
+ * reserved for the transport being used (eg. virtio_ring), the
+ * rest are per-device feature bits.
+ */
+#define VIRTIO_TRANSPORT_F_START 28
+
+#ifndef VIRTIO_TRANSPORT_F_END
+#define VIRTIO_TRANSPORT_F_END   34
+#endif
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field. */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field. */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;		/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;		/* Generic PCI field: next ptr. */
+	uint8_t cap_len;		/* Generic PCI field: capability length */
+	uint8_t cfg_type;		/* Identifies the structure. */
+	uint8_t bar;			/* Where to find it. */
+	uint8_t padding[3];		/* Pad to full dword. */
+	uint32_t offset;		/* Offset within bar. */
+	uint32_t length;		/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_hw *hw);
+	void    (*set_status)(struct virtio_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_hw *hw);
+	void     (*set_features)(struct virtio_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_hw *hw, struct virtqueue *vq,
+			uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_hw *hw, uint16_t queue_id);
+	int (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_hw {
+	uint64_t    guest_features;
+	uint32_t    max_queue_pairs;
+	uint16_t    started;
+	uint8_t	    use_msix;
+	uint16_t    internal_id;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	void	    *dev_cfg;
+	/*
+	 * The app management thread and the virtio interrupt handler thread
+	 * can both change device state; this lock serializes those
+	 * updates.
+	 */
+	rte_spinlock_t state_lock;
+
+	struct virtqueue **vqs;
+};
+
+/*
+ * While virtio_hw is stored in shared memory, this structure holds
+ * process-local data that may differ between processes in the
+ * multi-process model, for example the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+};
+
+#define VTPCI_OPS(hw)	(virtio_pci_hw_internal[(hw)->internal_id].vtpci_ops)
+
+extern struct virtio_hw_internal virtio_pci_hw_internal[8];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+virtio_pci_with_feature(struct virtio_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declaration from virtio_pci.c
+ */
+int virtio_pci_init(struct rte_pci_device *dev, struct virtio_hw *hw);
+void virtio_pci_reset(struct virtio_hw *);
+
+void virtio_pci_reinit_complete(struct virtio_hw *);
+
+uint8_t virtio_pci_get_status(struct virtio_hw *);
+void virtio_pci_set_status(struct virtio_hw *, uint8_t);
+
+uint64_t virtio_pci_negotiate_features(struct virtio_hw *, uint64_t);
+
+void virtio_pci_write_dev_config(struct virtio_hw *, size_t, const void *, int);
+
+void virtio_pci_read_dev_config(struct virtio_hw *, size_t, void *, int);
+
+uint8_t virtio_pci_isr(struct virtio_hw *);
+
+enum virtio_msix_status virtio_pci_msix_detect(struct rte_pci_device *dev);
+
+extern const struct virtio_pci_ops virtio_pci_modern_ops;
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/virtio_vhost_user/virtqueue.h b/drivers/virtio_vhost_user/virtqueue.h
new file mode 100644
index 0000000..e2ac78e
--- /dev/null
+++ b/drivers/virtio_vhost_user/virtqueue.h
@@ -0,0 +1,181 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+/* XXX This file is based on drivers/net/virtio/virtqueue.h.  It would be
+ * better to create a shared rte_virtio library instead of duplicating this
+ * code.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <stdint.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	struct virtio_hw  *hw; /**< virtio_hw structure pointer. */
+	struct vring vq_ring;  /**< vring keeping desc, used and avail */
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_nentries;  /**< vring desc numbers */
+	uint16_t vq_free_cnt;  /**< num of desc available */
+	uint16_t vq_avail_idx; /**< sync until needed */
+	uint16_t vq_free_thresh; /**< free threshold */
+
+	void *vq_ring_virt_mem;  /**< linear address of vring */
+	unsigned int vq_ring_size;
+
+	rte_iova_t vq_ring_mem; /**< physical address of vring */
+
+	const struct rte_memzone *mz; /**< memzone backing vring */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	uint16_t  vq_queue_index;   /**< PCI queue index */
+	uint16_t  *notify_addr;
+	struct vq_desc_extra vq_descx[0];
+};
+
+/* Chain all the descriptors in the ring with an END */
+static inline void
+vring_desc_init(struct vring_desc *dp, uint16_t n)
+{
+	uint16_t i;
+
+	for (i = 0; i < n - 1; i++)
+		dp[i].next = (uint16_t)(i + 1);
+	dp[i].next = VQ_RING_DESC_CHAIN_END;
+}
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+static inline void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+/**
+ * Tell the backend to interrupt us.
+ */
+static inline void
+virtqueue_enable_intr(struct virtqueue *vq)
+{
+	vq->vq_ring.avail->flags &= (~VRING_AVAIL_F_NO_INTERRUPT);
+}
+
+/**
+ *  Dump virtqueue internal structures, for debug purpose only.
+ */
+void virtqueue_dump(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notification is through an I/O port operation
+	 * which is itself a serializing instruction.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, \
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x\n", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+#else
+#define VIRTQUEUE_DUMP(vq) do { } while (0)
+#endif
+
+#endif /* _VIRTQUEUE_H_ */
-- 
2.7.4



* [dpdk-dev] [PATCH 19/28] vhost: add index field in vhost virtqueues
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (17 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 20/28] drivers: add virtio-vhost-user transport Nikos Dragazis
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

From: Stefan Hajnoczi <stefanha@redhat.com>

Currently, the only way of determining a struct vhost_virtqueue's index
is to search struct virtio_net->virtqueue[] for its address.  Stash the
index in struct vhost_virtqueue so we won't have to search the array.

This new field will be used by virtio-vhost-user.
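
As a rough illustration of the difference (not code from this patch), the
sketch below contrasts the address search with reading a stored index.
The sketch_* types, MAX_VRINGS and nr_vring handling are simplified,
hypothetical stand-ins for struct vhost_virtqueue and struct virtio_net;
only the vring_idx field name is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VRINGS 8 /* hypothetical bound, for illustration only */

/* Simplified stand-ins for struct vhost_virtqueue and struct virtio_net. */
struct sketch_vq {
	uint32_t vring_idx; /* the field this patch adds */
};

struct sketch_dev {
	struct sketch_vq *virtqueue[MAX_VRINGS];
	uint32_t nr_vring;
};

/* Without the field: scan the array for the queue's address, O(n). */
static int
sketch_vq_index_by_search(const struct sketch_dev *dev,
			  const struct sketch_vq *vq)
{
	uint32_t i;

	for (i = 0; i < dev->nr_vring; i++)
		if (dev->virtqueue[i] == vq)
			return (int)i;
	return -1; /* not found */
}

/* With the field: record the index once at init time, read it in O(1). */
static void
sketch_init_vring_queue(struct sketch_dev *dev, struct sketch_vq *vq,
			uint32_t vring_idx)
{
	assert(vring_idx < MAX_VRINGS);

	vq->vring_idx = vring_idx;
	dev->virtqueue[vring_idx] = vq;
	if (dev->nr_vring < vring_idx + 1)
		dev->nr_vring = vring_idx + 1;
}
```

A caller that only holds a queue pointer can then use vq->vring_idx
directly, which is what a per-queue doorbell write needs.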

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.c | 2 ++
 lib/librte_vhost/vhost.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 91a286d..d083d7e 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -407,6 +407,8 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 
 	memset(vq, 0, sizeof(struct vhost_virtqueue));
 
+	vq->vring_idx = vring_idx;
+
 	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index aba8d9b..2e7eabe 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -107,6 +107,7 @@ struct vhost_virtqueue {
 		struct vring_packed_desc_event *device_event;
 	};
 	uint32_t		size;
+	uint32_t		vring_idx;
 
 	uint16_t		last_avail_idx;
 	uint16_t		last_used_idx;
-- 
2.7.4



* [dpdk-dev] [PATCH 20/28] drivers: add virtio-vhost-user transport
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (18 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 19/28] vhost: add index field in vhost virtqueues Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 21/28] drivers/virtio_vhost_user: use additional device resources Nikos Dragazis
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

This patch introduces the virtio-vhost-user transport. This transport is
based on the virtio-vhost-user device. This device replaces the AF_UNIX
socket used by the vhost-user protocol with a virtio device that tunnels
vhost-user protocol messages.  This allows a guest to act as a vhost
device backend for other guests.
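
To give a feel for the tunnelling, here is a minimal sketch of how a reply
could be staged into a fixed-size transmit slot before it is placed on a
virtqueue. It assumes in-order completion so that a single wrapping index
selects the next free slot. All sketch_* names and SKETCH_NENTRIES are
hypothetical, the header struct is a simplified stand-in for the
vhost-user message header, and the length check is an addition of this
sketch rather than something taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUF_SIZE 1024 /* mirrors VVU_TXBUF_SIZE in this patch */
#define SKETCH_NENTRIES 4    /* hypothetical ring size, power of 2 */

/* Simplified stand-in for the vhost-user message header. */
struct sketch_hdr {
	uint32_t request;
	uint32_t flags;
	uint32_t size; /* payload bytes following the header */
};

struct sketch_txq {
	uint8_t bufs[SKETCH_NENTRIES][SKETCH_BUF_SIZE];
	unsigned int next;     /* wrapping index of the next free slot */
	unsigned int free_cnt; /* slots not yet handed to the device */
};

/*
 * Copy header plus payload into the next slot and return the slot index,
 * or -1 if the ring is full or the message does not fit in one slot.
 */
static int
sketch_tx_enqueue(struct sketch_txq *q, const struct sketch_hdr *hdr,
		  const void *payload)
{
	unsigned int i = q->next;

	assert(i < SKETCH_NENTRIES);

	if (q->free_cnt == 0 ||
	    hdr->size > SKETCH_BUF_SIZE - sizeof(*hdr))
		return -1;

	memcpy(q->bufs[i], hdr, sizeof(*hdr));
	if (hdr->size)
		memcpy(q->bufs[i] + sizeof(*hdr), payload, hdr->size);

	q->free_cnt--;
	q->next = (i + 1) & (SKETCH_NENTRIES - 1);
	return (int)i;
}
```

In a real driver the returned slot index would be used to fill in a
descriptor and publish it on the avail ring; that part is omitted here.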

For more information on virtio-vhost-user, see
https://wiki.qemu.org/Features/VirtioVhostUser.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/Makefile                                   |    2 +
 drivers/virtio_vhost_user/Makefile                 |   27 +
 .../rte_virtio_vhost_user_version.map              |    4 +
 .../virtio_vhost_user/trans_virtio_vhost_user.c    | 1067 ++++++++++++++++++++
 drivers/virtio_vhost_user/virtio_vhost_user.h      |   18 +
 5 files changed, 1118 insertions(+)
 create mode 100644 drivers/virtio_vhost_user/Makefile
 create mode 100644 drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map
 create mode 100644 drivers/virtio_vhost_user/trans_virtio_vhost_user.c
 create mode 100644 drivers/virtio_vhost_user/virtio_vhost_user.h

diff --git a/drivers/Makefile b/drivers/Makefile
index 7d5da5d..72e2579 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -22,5 +22,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event
 DEPDIRS-event := common bus mempool net
 DIRS-$(CONFIG_RTE_LIBRTE_RAWDEV) += raw
 DEPDIRS-raw := common bus mempool net event
+DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio_vhost_user
+DEPDIRS-virtio_vhost_user := bus
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/virtio_vhost_user/Makefile b/drivers/virtio_vhost_user/Makefile
new file mode 100644
index 0000000..61a77b6
--- /dev/null
+++ b/drivers/virtio_vhost_user/Makefile
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Arrikto Inc.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_virtio_vhost_user.a
+
+EXPORT_MAP := rte_virtio_vhost_user_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+CFLAGS += $(WERROR_FLAGS) -O3
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_bus_pci
+
+ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
+LDLIBS += -lrte_vhost
+endif
+
+# all sources are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := trans_virtio_vhost_user.c \
+				   virtio_pci.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map b/drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map
new file mode 100644
index 0000000..4b2e621
--- /dev/null
+++ b/drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map
@@ -0,0 +1,4 @@
+DPDK_19.05 {
+
+        local: *;
+};
diff --git a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c
new file mode 100644
index 0000000..72018a4
--- /dev/null
+++ b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c
@@ -0,0 +1,1067 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Red Hat, Inc.
+ * Copyright(c) 2019 Arrikto, Inc.
+ */
+
+/*
+ * @file
+ * virtio-vhost-user PCI transport driver
+ *
+ * This vhost-user transport communicates with the vhost-user master process
+ * over the virtio-vhost-user PCI device.
+ *
+ * Interrupts are used since this is the control path, not the data path.  This
+ * way the vhost-user command processing doesn't interfere with packet
+ * processing.  This is similar to the AF_UNIX transport's fdman thread that
+ * processes socket I/O separately.
+ *
+ * This transport replaces the usual vhost-user file descriptor passing with a
+ * PCI BAR that contains doorbell registers for callfd and logfd, and shared
+ * memory for the memory table regions.
+ *
+ * VIRTIO device specification:
+ * https://stefanha.github.io/virtio/vhost-user-slave.html#x1-2830007
+ */
+
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_bus_pci.h>
+#include <rte_io.h>
+
+#include "vhost.h"
+#include "virtio_pci.h"
+#include "virtqueue.h"
+#include "virtio_vhost_user.h"
+#include "vhost_user.h"
+
+/*
+ * Data structures:
+ *
+ * Successfully probed virtio-vhost-user PCI adapters are added to
+ * vvu_pci_device_list as struct vvu_pci_device elements.
+ *
+ * When rte_vhost_driver_register() is called, a struct vvu_socket is created
+ * as the endpoint for future vhost-user connections.  The struct vvu_socket is
+ * associated with the struct vvu_pci_device that will be used for
+ * communication.
+ *
+ * When a vhost-user protocol connection is established, a struct
+ * vvu_connection is created and the application's new_device(int vid) callback
+ * is invoked.
+ */
+
+/** Probed PCI devices for lookup by rte_vhost_driver_register() */
+TAILQ_HEAD(, vvu_pci_device) vvu_pci_device_list =
+	TAILQ_HEAD_INITIALIZER(vvu_pci_device_list);
+
+struct vvu_socket;
+struct vvu_connection;
+
+/** A virtio-vhost-user PCI adapter */
+struct vvu_pci_device {
+	struct virtio_hw hw;
+	struct rte_pci_device *pci_dev;
+	struct vvu_socket *vvu_socket;
+	TAILQ_ENTRY(vvu_pci_device) next;
+};
+
+/** A vhost-user endpoint (aka per-path state) */
+struct vvu_socket {
+	struct vhost_user_socket socket; /* must be first field! */
+	struct vvu_pci_device *pdev;
+	struct vvu_connection *conn;
+
+	/** Doorbell registers */
+	uint16_t *doorbells;
+
+	/** This struct virtio_vhost_user_config field determines the number of
+	 * doorbells available so we keep it saved.
+	 */
+	uint32_t max_vhost_queues;
+
+	/** Receive buffers */
+	const struct rte_memzone *rxbuf_mz;
+
+	/** Transmit buffers.  It is assumed that the device completes them
+	 * in-order so a single wrapping index can be used to select the next
+	 * free buffer.
+	 */
+	const struct rte_memzone *txbuf_mz;
+	unsigned int txbuf_idx;
+};
+
+/** A vhost-user protocol session (aka per-vid state) */
+struct vvu_connection {
+	struct virtio_net device; /* must be first field! */
+	struct vvu_socket *vvu_socket;
+};
+
+/** Virtio feature bits that we support */
+#define VVU_VIRTIO_FEATURES ((1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
+			     (1ULL << VIRTIO_F_ANY_LAYOUT) | \
+			     (1ULL << VIRTIO_F_VERSION_1) | \
+			     (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+
+/** Virtqueue indices */
+enum {
+	VVU_VQ_RX,
+	VVU_VQ_TX,
+	VVU_VQ_MAX,
+};
+
+enum {
+	/** Receive buffer size, in bytes */
+	VVU_RXBUF_SIZE = 1024,
+
+	/** Transmit buffer size, in bytes */
+	VVU_TXBUF_SIZE = 1024,
+};
+
+/** Look up a struct vvu_pci_device from a DomBDF string */
+static struct vvu_pci_device *
+vvu_pci_by_name(const char *name)
+{
+	struct vvu_pci_device *pdev;
+
+	TAILQ_FOREACH(pdev, &vvu_pci_device_list, next) {
+		if (!strcmp(pdev->pci_dev->device.name, name))
+			return pdev;
+	}
+	return NULL;
+}
+
+/** Start connection establishment */
+static void
+vvu_connect(struct vvu_socket *vvu_socket)
+{
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	uint32_t status;
+
+	virtio_pci_read_dev_config(hw,
+			offsetof(struct virtio_vhost_user_config, status),
+			&status, sizeof(status));
+	status |= RTE_LE32(1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
+	virtio_pci_write_dev_config(hw,
+			offsetof(struct virtio_vhost_user_config, status),
+			&status, sizeof(status));
+}
+
+static void
+vvu_disconnect(struct vvu_socket *vvu_socket)
+{
+	struct vhost_user_socket *vsocket = &vvu_socket->socket;
+	struct vvu_connection *conn = vvu_socket->conn;
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	uint32_t status;
+
+	if (conn) {
+		if (vsocket->notify_ops->destroy_connection)
+			vsocket->notify_ops->destroy_connection(conn->device.vid);
+
+		vhost_destroy_device(conn->device.vid);
+	}
+
+	/* Make sure we're disconnected */
+	virtio_pci_read_dev_config(hw,
+			offsetof(struct virtio_vhost_user_config, status),
+			&status, sizeof(status));
+	status &= ~RTE_LE32(1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
+	virtio_pci_write_dev_config(hw,
+			offsetof(struct virtio_vhost_user_config, status),
+			&status, sizeof(status));
+}
+
+static void
+vvu_reconnect(struct vvu_socket *vvu_socket)
+{
+	vvu_disconnect(vvu_socket);
+	vvu_connect(vvu_socket);
+}
+
+static void vvu_process_rxq(struct vvu_socket *vvu_socket);
+
+static void
+vvu_cleanup_device(struct virtio_net *dev, int destroy __rte_unused)
+{
+	struct vvu_connection *conn =
+		container_of(dev, struct vvu_connection, device);
+	struct vvu_socket *vvu_socket = conn->vvu_socket;
+
+	vvu_socket->conn = NULL;
+	vvu_process_rxq(vvu_socket); /* discard old replies from master */
+	vvu_reconnect(vvu_socket);
+}
+
+static int
+vvu_vring_call(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	struct vvu_connection *conn =
+		container_of(dev, struct vvu_connection, device);
+	struct vvu_socket *vvu_socket = conn->vvu_socket;
+	uint16_t vq_idx = vq->vring_idx;
+
+	RTE_LOG(DEBUG, VHOST_CONFIG, "%s vq_idx %u\n", __func__, vq_idx);
+
+	rte_write16(rte_cpu_to_le_16(vq_idx), &vvu_socket->doorbells[vq_idx]);
+	return 0;
+}
+
+static int
+vvu_send_reply(struct virtio_net *dev, struct VhostUserMsg *reply)
+{
+	struct vvu_connection *conn =
+		container_of(dev, struct vvu_connection, device);
+	struct vvu_socket *vvu_socket = conn->vvu_socket;
+	struct virtqueue *vq = vvu_socket->pdev->hw.vqs[VVU_VQ_TX];
+	struct vring_desc *desc;
+	struct vq_desc_extra *descx;
+	unsigned int i;
+	void *buf;
+	size_t len;
+
+	RTE_LOG(DEBUG, VHOST_CONFIG,
+		"%s request %u flags %#x size %u\n",
+		__func__, reply->request.master,
+		reply->flags, reply->size);
+
+	/* TODO convert reply to little-endian */
+
+	if (virtqueue_full(vq)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "Out of tx buffers\n");
+		return -1;
+	}
+
+	i = vvu_socket->txbuf_idx;
+	len = VHOST_USER_HDR_SIZE + reply->size;
+	buf = (uint8_t *)vvu_socket->txbuf_mz->addr + i * VVU_TXBUF_SIZE;
+
+	memcpy(buf, reply, len);
+
+	desc = &vq->vq_ring.desc[i];
+	descx = &vq->vq_descx[i];
+
+	desc->addr = rte_cpu_to_le_64(vvu_socket->txbuf_mz->iova + i * VVU_TXBUF_SIZE);
+	desc->len = rte_cpu_to_le_32(len);
+	desc->flags = 0;
+
+	descx->cookie = buf;
+	descx->ndescs = 1;
+
+	vq->vq_free_cnt--;
+	vvu_socket->txbuf_idx = (vvu_socket->txbuf_idx + 1) & (vq->vq_nentries - 1);
+
+	vq_update_avail_ring(vq, i);
+	vq_update_avail_idx(vq);
+
+	if (virtqueue_kick_prepare(vq))
+		virtqueue_notify(vq);
+
+	return 0;
+}
+
+static int
+vvu_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg __rte_unused)
+{
+	struct vvu_connection *conn =
+		container_of(dev, struct vvu_connection, device);
+	struct vvu_socket *vvu_socket = conn->vvu_socket;
+	struct rte_pci_device *pci_dev = vvu_socket->pdev->pci_dev;
+	uint8_t *mmap_addr;
+	uint32_t i;
+
+	/* Memory regions start after the doorbell registers */
+	mmap_addr = (uint8_t *)pci_dev->mem_resource[2].addr +
+		    RTE_ALIGN_CEIL((vvu_socket->max_vhost_queues + 1 /* log fd */) *
+				   sizeof(uint16_t), 4096);
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		struct rte_vhost_mem_region *reg = &dev->mem->regions[i];
+
+		reg->mmap_addr = mmap_addr;
+		reg->host_user_addr = (uint64_t)(uintptr_t)reg->mmap_addr +
+				      reg->mmap_size - reg->size;
+
+		mmap_addr += reg->mmap_size;
+
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"guest memory region %u, size: 0x%" PRIx64 "\n"
+			"\t guest physical addr: 0x%" PRIx64 "\n"
+			"\t guest virtual  addr: 0x%" PRIx64 "\n"
+			"\t host  virtual  addr: 0x%" PRIx64 "\n"
+			"\t mmap addr : 0x%" PRIx64 "\n"
+			"\t mmap size : 0x%" PRIx64 "\n"
+			"\t mmap off  : 0x%" PRIx64 "\n",
+			i, reg->size,
+			reg->guest_phys_addr,
+			reg->guest_user_addr,
+			reg->host_user_addr,
+			(uint64_t)(uintptr_t)reg->mmap_addr,
+			reg->mmap_size,
+			reg->mmap_size - reg->size);
+	}
+
+	return 0;
+}
+
+static void
+vvu_unmap_mem_regions(struct virtio_net *dev)
+{
+	uint32_t i;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		struct rte_vhost_mem_region *reg = &dev->mem->regions[i];
+
+		/* Just clear the pointers, the PCI BAR stays there */
+		reg->mmap_addr = NULL;
+		reg->host_user_addr = 0;
+	}
+}
+
+static void vvu_process_new_connection(struct vvu_socket *vvu_socket)
+{
+	struct vhost_user_socket *vsocket = &vvu_socket->socket;
+	struct vvu_connection *conn;
+	struct virtio_net *dev;
+	size_t size;
+
+	dev = vhost_new_device(vsocket->trans_ops);
+	if (!dev) {
+		vvu_reconnect(vvu_socket);
+		return;
+	}
+
+	conn = container_of(dev, struct vvu_connection, device);
+	conn->vvu_socket = vvu_socket;
+
+	size = strnlen(vsocket->path, PATH_MAX);
+	vhost_set_ifname(dev->vid, vsocket->path, size);
+
+	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", dev->vid);
+
+	if (vsocket->notify_ops->new_connection) {
+		int ret = vsocket->notify_ops->new_connection(dev->vid);
+		if (ret < 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to add vhost user connection\n");
+			vhost_destroy_device(dev->vid);
+			vvu_reconnect(vvu_socket);
+			return;
+		}
+	}
+
+	vvu_socket->conn = conn;
+	return;
+}
+
+static void vvu_process_status_change(struct vvu_socket *vvu_socket, bool slave_up,
+				      bool master_up)
+{
+	RTE_LOG(DEBUG, VHOST_CONFIG, "%s slave_up %d master_up %d\n",
+		__func__, slave_up, master_up);
+
+	/* Disconnected from the master, try reconnecting */
+	if (!slave_up) {
+		vvu_reconnect(vvu_socket);
+		return;
+	}
+
+	if (master_up && !vvu_socket->conn) {
+		vvu_process_new_connection(vvu_socket);
+		return;
+	}
+}
+
+static void
+vvu_process_txq(struct vvu_socket *vvu_socket)
+{
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	struct virtqueue *vq = hw->vqs[VVU_VQ_TX];
+	uint16_t n = VIRTQUEUE_NUSED(vq);
+
+	virtio_rmb();
+
+	/* Just mark the buffers complete */
+	vq->vq_used_cons_idx += n;
+	vq->vq_free_cnt += n;
+}
+
+static void
+vvu_process_rxq(struct vvu_socket *vvu_socket)
+{
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	struct virtqueue *vq = hw->vqs[VVU_VQ_RX];
+	bool refilled = false;
+
+	while (VIRTQUEUE_NUSED(vq)) {
+		struct vring_used_elem *uep;
+		VhostUserMsg *msg;
+		uint32_t len;
+		uint32_t desc_idx;
+		uint16_t used_idx;
+		size_t i;
+
+		virtio_rmb();
+
+		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
+		uep = &vq->vq_ring.used->ring[used_idx];
+		desc_idx = rte_le_to_cpu_32(uep->id);
+
+		msg = vq->vq_descx[desc_idx].cookie;
+		len = rte_le_to_cpu_32(uep->len);
+
+		if (msg->size > sizeof(VhostUserMsg) ||
+		    len != VHOST_USER_HDR_SIZE + msg->size) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"Invalid vhost-user message size %u, got %u bytes\n",
+				msg->size, len);
+			/* TODO reconnect */
+			abort();
+		}
+
+		RTE_LOG(DEBUG, VHOST_CONFIG,
+			"%s request %u flags %#x size %u\n",
+			__func__, msg->request.master,
+			msg->flags, msg->size);
+
+		/* Mark file descriptors invalid */
+		for (i = 0; i < RTE_DIM(msg->fds); i++)
+			msg->fds[i] = VIRTIO_INVALID_EVENTFD;
+
+		/* Only process messages while connected */
+		if (vvu_socket->conn) {
+			if (vhost_user_msg_handler(vvu_socket->conn->device.vid,
+						   msg) < 0) {
+				/* TODO reconnect */
+				abort();
+			}
+		}
+
+		vq->vq_used_cons_idx++;
+
+		/* Refill rxq */
+		vq_update_avail_ring(vq, desc_idx);
+		vq_update_avail_idx(vq);
+		refilled = true;
+	}
+
+	if (!refilled)
+		return;
+	if (virtqueue_kick_prepare(vq))
+		virtqueue_notify(vq);
+}
+
+/* TODO Audit thread safety.  There are 3 threads involved:
+ * 1. The main process thread that calls librte_vhost APIs during startup.
+ * 2. The interrupt thread that calls vvu_interrupt_handler().
+ * 3. Packet processing threads (lcores) calling librte_vhost APIs.
+ *
+ * It may be necessary to use locks if any of these code paths can race.  The
+ * librte_vhost API entry points already do some locking but this needs to be
+ * checked.
+ */
+static void
+vvu_interrupt_handler(void *cb_arg)
+{
+	struct vvu_socket *vvu_socket = cb_arg;
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	struct rte_intr_handle *intr_handle = &vvu_socket->pdev->pci_dev->intr_handle;
+	uint8_t isr;
+
+	/* Read Interrupt Status Register (which also clears it) */
+	isr = VTPCI_OPS(hw)->get_isr(hw);
+
+	if (isr & VIRTIO_PCI_ISR_CONFIG) {
+		uint32_t status;
+		bool slave_up;
+		bool master_up;
+
+		virtio_pci_read_dev_config(hw,
+				offsetof(struct virtio_vhost_user_config, status),
+				&status, sizeof(status));
+		status = rte_le_to_cpu_32(status);
+
+		RTE_LOG(DEBUG, VHOST_CONFIG, "%s isr %#x status %#x\n", __func__, isr, status);
+
+		slave_up = status & (1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
+		master_up = status & (1u << VIRTIO_VHOST_USER_STATUS_MASTER_UP);
+		vvu_process_status_change(vvu_socket, slave_up, master_up);
+	} else
+		RTE_LOG(DEBUG, VHOST_CONFIG, "%s isr %#x\n", __func__, isr);
+
+	/* Re-arm before processing virtqueues so no interrupts are lost */
+	rte_intr_enable(intr_handle);
+
+	vvu_process_txq(vvu_socket);
+	vvu_process_rxq(vvu_socket);
+}
+
+static int
+vvu_virtio_pci_init_rxq(struct vvu_socket *vvu_socket)
+{
+	char name[sizeof("0000:00:00.00 vq 0 rxbufs")];
+	struct virtqueue *vq;
+	size_t size;
+	size_t align;
+	int i;
+
+	vq = vvu_socket->pdev->hw.vqs[VVU_VQ_RX];
+
+	snprintf(name, sizeof(name), "%s vq %u rxbufs",
+		 vvu_socket->pdev->pci_dev->device.name, VVU_VQ_RX);
+
+	/* Allocate more than sizeof(VhostUserMsg) so there is room to grow */
+	size = vq->vq_nentries * VVU_RXBUF_SIZE;
+	align = 1024;
+	vvu_socket->rxbuf_mz = rte_memzone_reserve_aligned(name, size, SOCKET_ID_ANY,
+							   0, align);
+	if (!vvu_socket->rxbuf_mz) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate rxbuf memzone\n");
+		return -1;
+	}
+
+	for (i = 0; i < vq->vq_nentries; i++) {
+		struct vring_desc *desc = &vq->vq_ring.desc[i];
+		struct vq_desc_extra *descx = &vq->vq_descx[i];
+
+		desc->addr = rte_cpu_to_le_64(vvu_socket->rxbuf_mz->iova +
+				              i * VVU_RXBUF_SIZE);
+		desc->len = RTE_LE32(VVU_RXBUF_SIZE);
+		desc->flags = RTE_LE16(VRING_DESC_F_WRITE);
+
+		descx->cookie = (uint8_t *)vvu_socket->rxbuf_mz->addr + i * VVU_RXBUF_SIZE;
+		descx->ndescs = 1;
+
+		vq_update_avail_ring(vq, i);
+		vq->vq_free_cnt--;
+	}
+
+	vq_update_avail_idx(vq);
+	virtqueue_notify(vq);
+	return 0;
+}
+
+static int
+vvu_virtio_pci_init_txq(struct vvu_socket *vvu_socket)
+{
+	char name[sizeof("0000:00:00.00 vq 0 txbufs")];
+	struct virtqueue *vq;
+	size_t size;
+	size_t align;
+
+	vq = vvu_socket->pdev->hw.vqs[VVU_VQ_TX];
+
+	snprintf(name, sizeof(name), "%s vq %u txbufs",
+		 vvu_socket->pdev->pci_dev->device.name, VVU_VQ_TX);
+
+	/* Allocate more than sizeof(VhostUserMsg) so there is room to grow */
+	size = vq->vq_nentries * VVU_TXBUF_SIZE;
+	align = 1024;
+	vvu_socket->txbuf_mz = rte_memzone_reserve_aligned(name, size, SOCKET_ID_ANY,
+							   0, align);
+	if (!vvu_socket->txbuf_mz) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate txbuf memzone\n");
+		return -1;
+	}
+
+	vvu_socket->txbuf_idx = 0;
+	return 0;
+}
+
+static void
+virtio_init_vring(struct virtqueue *vq)
+{
+	int size = vq->vq_nentries;
+	struct vring *vr = &vq->vq_ring;
+	uint8_t *ring_mem = vq->vq_ring_virt_mem;
+
+	memset(ring_mem, 0, vq->vq_ring_size);
+	vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
+	vq->vq_used_cons_idx = 0;
+	vq->vq_desc_head_idx = 0;
+	vq->vq_avail_idx = 0;
+	vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
+	vq->vq_free_cnt = vq->vq_nentries;
+	memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
+
+	vring_desc_init(vr->desc, size);
+	virtqueue_enable_intr(vq);
+}
+
+static int
+vvu_virtio_pci_init_vq(struct vvu_socket *vvu_socket, int vq_idx)
+{
+	char vq_name[sizeof("0000:00:00.00 vq 0")];
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	const struct rte_memzone *mz;
+	struct virtqueue *vq;
+	uint16_t q_num;
+	size_t size;
+
+	q_num = VTPCI_OPS(hw)->get_queue_num(hw, vq_idx);
+	RTE_LOG(DEBUG, VHOST_CONFIG, "vq %d q_num: %u\n", vq_idx, q_num);
+	if (q_num == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "virtqueue %d does not exist\n",
+			vq_idx);
+		return -1;
+	}
+
+	if (!rte_is_power_of_2(q_num)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"virtqueue %d has non-power of 2 size (%u)\n",
+			vq_idx, q_num);
+		return -1;
+	}
+
+	snprintf(vq_name, sizeof(vq_name), "%s vq %u",
+		 vvu_socket->pdev->pci_dev->device.name, vq_idx);
+
+	size = RTE_ALIGN_CEIL(sizeof(*vq) +
+			      q_num * sizeof(struct vq_desc_extra),
+			      RTE_CACHE_LINE_SIZE);
+	vq = rte_zmalloc(vq_name, size, RTE_CACHE_LINE_SIZE);
+	if (!vq) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate virtqueue %d\n", vq_idx);
+		return -1;
+	}
+	hw->vqs[vq_idx] = vq;
+
+	vq->hw = hw;
+	vq->vq_queue_index = vq_idx;
+	vq->vq_nentries = q_num;
+
+	size = vring_size(q_num, VIRTIO_PCI_VRING_ALIGN);
+	vq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN);
+
+	mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size,
+					 SOCKET_ID_ANY, 0,
+					 VIRTIO_PCI_VRING_ALIGN);
+	if (mz == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to reserve memzone for virtqueue %d\n",
+			vq_idx);
+		goto err_vq;
+	}
+
+	memset(mz->addr, 0, mz->len);
+
+	vq->mz = mz;
+	vq->vq_ring_mem = mz->iova;
+	vq->vq_ring_virt_mem = mz->addr;
+	virtio_init_vring(vq);
+
+	if (VTPCI_OPS(hw)->setup_queue(hw, vq) < 0)
+		goto err_mz;
+
+	return 0;
+
+err_mz:
+	rte_memzone_free(mz);
+
+err_vq:
+	hw->vqs[vq_idx] = NULL;
+	rte_free(vq);
+	return -1;
+}
+
+static void
+vvu_virtio_pci_free_virtqueues(struct vvu_socket *vvu_socket)
+{
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	int i;
+
+	if (vvu_socket->rxbuf_mz) {
+		rte_memzone_free(vvu_socket->rxbuf_mz);
+		vvu_socket->rxbuf_mz = NULL;
+	}
+	if (vvu_socket->txbuf_mz) {
+		rte_memzone_free(vvu_socket->txbuf_mz);
+		vvu_socket->txbuf_mz = NULL;
+	}
+
+	for (i = 0; i < VVU_VQ_MAX; i++) {
+		struct virtqueue *vq = hw->vqs[i];
+
+		if (!vq)
+			continue;
+
+		rte_memzone_free(vq->mz);
+		rte_free(vq);
+		hw->vqs[i] = NULL;
+	}
+
+	rte_free(hw->vqs);
+	hw->vqs = NULL;
+}
+
+static void
+vvu_virtio_pci_intr_cleanup(struct vvu_socket *vvu_socket)
+{
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	struct rte_intr_handle *intr_handle = &vvu_socket->pdev->pci_dev->intr_handle;
+	int i;
+
+	for (i = 0; i < VVU_VQ_MAX; i++)
+		VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i],
+					     VIRTIO_MSI_NO_VECTOR);
+	VTPCI_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR);
+	rte_intr_disable(intr_handle);
+	rte_intr_callback_unregister(intr_handle, vvu_interrupt_handler, vvu_socket);
+	rte_intr_efd_disable(intr_handle);
+}
+
+static int
+vvu_virtio_pci_init_intr(struct vvu_socket *vvu_socket)
+{
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	struct rte_intr_handle *intr_handle = &vvu_socket->pdev->pci_dev->intr_handle;
+	int i;
+
+	if (!rte_intr_cap_multiple(intr_handle)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Multiple intr vector not supported\n");
+		return -1;
+	}
+
+	if (rte_intr_efd_enable(intr_handle, VVU_VQ_MAX) < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to create eventfds\n");
+		return -1;
+	}
+
+	if (rte_intr_callback_register(intr_handle, vvu_interrupt_handler, vvu_socket) < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to register interrupt callback\n");
+		goto err_efd;
+	}
+
+	if (rte_intr_enable(intr_handle) < 0)
+		goto err_callback;
+
+	if (VTPCI_OPS(hw)->set_config_irq(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to set config MSI-X vector\n");
+		goto err_enable;
+	}
+
+	/* TODO use separate vectors and interrupt handler functions.  It seems
+	 * <rte_interrupts.h> doesn't allow efds to have interrupt_handler
+	 * functions and it just clears efds when they are raised.  As a
+	 * workaround we use the configuration change interrupt for virtqueue
+	 * interrupts!
+	 */
+	for (i = 0; i < VVU_VQ_MAX; i++) {
+		if (VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i], 0) ==
+				VIRTIO_MSI_NO_VECTOR) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed to set virtqueue MSI-X vector\n");
+			goto err_vq;
+		}
+	}
+
+	return 0;
+
+err_vq:
+	for (i = 0; i < VVU_VQ_MAX; i++)
+		VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i],
+					     VIRTIO_MSI_NO_VECTOR);
+	VTPCI_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR);
+err_enable:
+	rte_intr_disable(intr_handle);
+err_callback:
+	rte_intr_callback_unregister(intr_handle, vvu_interrupt_handler, vvu_socket);
+err_efd:
+	rte_intr_efd_disable(intr_handle);
+	return -1;
+}
+
+static int
+vvu_virtio_pci_init_bar(struct vvu_socket *vvu_socket)
+{
+	struct rte_pci_device *pci_dev = vvu_socket->pdev->pci_dev;
+	struct virtio_net *dev = NULL; /* just for sizeof() */
+
+	vvu_socket->doorbells = pci_dev->mem_resource[2].addr;
+	if (!vvu_socket->doorbells) {
+		RTE_LOG(ERR, VHOST_CONFIG, "BAR 2 not available\n");
+		return -1;
+	}
+
+	/* The number of doorbells is max_vhost_queues + 1 */
+	virtio_pci_read_dev_config(&vvu_socket->pdev->hw,
+			offsetof(struct virtio_vhost_user_config,
+				 max_vhost_queues),
+			&vvu_socket->max_vhost_queues,
+			sizeof(vvu_socket->max_vhost_queues));
+	vvu_socket->max_vhost_queues = rte_le_to_cpu_32(vvu_socket->max_vhost_queues);
+	if (vvu_socket->max_vhost_queues < RTE_DIM(dev->virtqueue)) {
+		/* We could support devices with a smaller max number of
+		 * virtqueues than dev->virtqueue[] in the future.  Fail early
+		 * for now since the current assumption is that all of
+		 * dev->virtqueue[] can be used.
+		 */
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Device supports fewer virtqueues than driver!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+vvu_virtio_pci_init(struct vvu_socket *vvu_socket)
+{
+	uint64_t host_features;
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
+	int i;
+
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
+
+	hw->guest_features = VVU_VIRTIO_FEATURES;
+	host_features = VTPCI_OPS(hw)->get_features(hw);
+	hw->guest_features = virtio_pci_negotiate_features(hw, host_features);
+
+	if (!virtio_pci_with_feature(hw, VIRTIO_F_VERSION_1)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "Missing VIRTIO 1 feature bit\n");
+		goto err;
+	}
+
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_FEATURES_OK);
+	if (!(virtio_pci_get_status(hw) & VIRTIO_CONFIG_STATUS_FEATURES_OK)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "Failed to set FEATURES_OK\n");
+		goto err;
+	}
+
+	if (vvu_virtio_pci_init_bar(vvu_socket) < 0)
+		goto err;
+
+	hw->vqs = rte_zmalloc(NULL, sizeof(struct virtqueue *) * VVU_VQ_MAX, 0);
+	if (!hw->vqs)
+		goto err;
+
+	for (i = 0; i < VVU_VQ_MAX; i++) {
+		if (vvu_virtio_pci_init_vq(vvu_socket, i) < 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"virtqueue %u init failed\n", i);
+			goto err_init_vq;
+		}
+	}
+
+	if (vvu_virtio_pci_init_rxq(vvu_socket) < 0)
+		goto err_init_vq;
+
+	if (vvu_virtio_pci_init_txq(vvu_socket) < 0)
+		goto err_init_vq;
+
+	if (vvu_virtio_pci_init_intr(vvu_socket) < 0)
+		goto err_init_vq;
+
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+
+	return 0;
+
+err_init_vq:
+	vvu_virtio_pci_free_virtqueues(vvu_socket);
+
+err:
+	virtio_pci_reset(hw);
+	RTE_LOG(DEBUG, VHOST_CONFIG, "%s failed\n", __func__);
+	return -1;
+}
+
+static int
+vvu_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+	      struct rte_pci_device *pci_dev)
+{
+	struct vvu_pci_device *pdev;
+
+	/* TODO support multi-process applications */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"virtio-vhost-user does not support multi-process "
+			"applications\n");
+		return -1;
+	}
+
+	pdev = rte_zmalloc_socket(pci_dev->device.name, sizeof(*pdev),
+				  RTE_CACHE_LINE_SIZE,
+				  pci_dev->device.numa_node);
+	if (!pdev)
+		return -1;
+
+	pdev->pci_dev = pci_dev;
+
+	if (virtio_pci_init(pci_dev, &pdev->hw) != 0) {
+		rte_free(pdev);
+		return -1;
+	}
+
+	/* Reset the device now, the rest is done in vvu_socket_init() */
+	virtio_pci_reset(&pdev->hw);
+
+	if (pdev->hw.use_msix == VIRTIO_MSIX_NONE) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"MSI-X is required for PCI device at %s\n",
+			pci_dev->device.name);
+		rte_free(pdev);
+		rte_pci_unmap_device(pci_dev);
+		return -1;
+	}
+
+	TAILQ_INSERT_TAIL(&vvu_pci_device_list, pdev, next);
+
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"Added virtio-vhost-user device at %s\n",
+		pci_dev->device.name);
+
+	return 0;
+}
+
+static int
+vvu_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct vvu_pci_device *pdev;
+
+	TAILQ_FOREACH(pdev, &vvu_pci_device_list, next)
+		if (pdev->pci_dev == pci_dev)
+			break;
+	if (!pdev)
+		return -1;
+
+	if (pdev->vvu_socket) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Cannot remove PCI device at %s with vhost still attached\n",
+			pci_dev->device.name);
+		return -1;
+	}
+
+	TAILQ_REMOVE(&vvu_pci_device_list, pdev, next);
+	rte_free(pdev);
+	rte_pci_unmap_device(pci_dev);
+	return 0;
+}
+
+static const struct rte_pci_id pci_id_vvu_map[] = {
+	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID,
+			 VIRTIO_PCI_LEGACY_DEVICEID_VHOST_USER) },
+	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID,
+			 VIRTIO_PCI_MODERN_DEVICEID_VHOST_USER) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static struct rte_pci_driver vvu_pci_driver = {
+	.driver = {
+		.name = "virtio_vhost_user",
+	},
+	.id_table = pci_id_vvu_map,
+	.drv_flags = 0,
+	.probe = vvu_pci_probe,
+	.remove = vvu_pci_remove,
+};
+
+RTE_INIT(vvu_pci_init);
+static void
+vvu_pci_init(void)
+{
+	if (rte_eal_iopl_init() != 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"IOPL call failed - cannot use virtio-vhost-user\n");
+		return;
+	}
+
+	rte_pci_register(&vvu_pci_driver);
+}
+
+static int
+vvu_socket_init(struct vhost_user_socket *vsocket, uint64_t flags)
+{
+	struct vvu_socket *vvu_socket =
+		container_of(vsocket, struct vvu_socket, socket);
+	struct vvu_pci_device *pdev;
+
+	if (flags & RTE_VHOST_USER_NO_RECONNECT) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"error: reconnect cannot be disabled for virtio-vhost-user\n");
+		return -1;
+	}
+	if (flags & RTE_VHOST_USER_CLIENT) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"error: virtio-vhost-user does not support client mode\n");
+		return -1;
+	}
+	if (flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"error: virtio-vhost-user does not support dequeue-zero-copy\n");
+		return -1;
+	}
+
+	pdev = vvu_pci_by_name(vsocket->path);
+	if (!pdev) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Cannot find virtio-vhost-user PCI device at %s\n",
+			vsocket->path);
+		return -1;
+	}
+
+	if (pdev->vvu_socket) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Device at %s is already in use\n",
+			vsocket->path);
+		return -1;
+	}
+
+	vvu_socket->pdev = pdev;
+	pdev->vvu_socket = vvu_socket;
+
+	if (vvu_virtio_pci_init(vvu_socket) < 0) {
+		vvu_socket->pdev = NULL;
+		pdev->vvu_socket = NULL;
+		return -1;
+	}
+
+	RTE_LOG(INFO, VHOST_CONFIG, "%s at %s\n", __func__, vsocket->path);
+	return 0;
+}
+
+static void
+vvu_socket_cleanup(struct vhost_user_socket *vsocket)
+{
+	struct vvu_socket *vvu_socket =
+		container_of(vsocket, struct vvu_socket, socket);
+
+	if (vvu_socket->conn)
+		vhost_destroy_device(vvu_socket->conn->device.vid);
+
+	vvu_virtio_pci_intr_cleanup(vvu_socket);
+	virtio_pci_reset(&vvu_socket->pdev->hw);
+	vvu_virtio_pci_free_virtqueues(vvu_socket);
+
+	vvu_socket->pdev->vvu_socket = NULL;
+	vvu_socket->pdev = NULL;
+}
+
+static int
+vvu_socket_start(struct vhost_user_socket *vsocket)
+{
+	struct vvu_socket *vvu_socket =
+		container_of(vsocket, struct vvu_socket, socket);
+
+	vvu_connect(vvu_socket);
+	return 0;
+}
+
+const struct vhost_transport_ops virtio_vhost_user_trans_ops = {
+	.socket_size = sizeof(struct vvu_socket),
+	.device_size = sizeof(struct vvu_connection),
+	.socket_init = vvu_socket_init,
+	.socket_cleanup = vvu_socket_cleanup,
+	.socket_start = vvu_socket_start,
+	.cleanup_device = vvu_cleanup_device,
+	.vring_call = vvu_vring_call,
+	.send_reply = vvu_send_reply,
+	.map_mem_regions = vvu_map_mem_regions,
+	.unmap_mem_regions = vvu_unmap_mem_regions,
+};
diff --git a/drivers/virtio_vhost_user/virtio_vhost_user.h b/drivers/virtio_vhost_user/virtio_vhost_user.h
new file mode 100644
index 0000000..baeaa74
--- /dev/null
+++ b/drivers/virtio_vhost_user/virtio_vhost_user.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C) 2018 Red Hat, Inc.
+ */
+
+#ifndef _LINUX_VIRTIO_VHOST_USER_H
+#define _LINUX_VIRTIO_VHOST_USER_H
+
+#include <stdint.h>
+
+struct virtio_vhost_user_config {
+    uint32_t status;
+#define VIRTIO_VHOST_USER_STATUS_SLAVE_UP 0
+#define VIRTIO_VHOST_USER_STATUS_MASTER_UP 1
+    uint32_t max_vhost_queues;
+    uint8_t uuid[16];
+};
+
+#endif /* _LINUX_VIRTIO_VHOST_USER_H */
-- 
2.7.4



* [dpdk-dev] [PATCH 21/28] drivers/virtio_vhost_user: use additional device resources
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (19 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 20/28] drivers: add virtio-vhost-user transport Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 22/28] vhost: add flag for choosing vhost-user transport Nikos Dragazis
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Enhance the virtio-vhost-user device driver so that it uses the
device's additional resource capabilities. Specifically, this patch adds
support for the doorbell and shared_memory capabilities. The former is
used to locate the device doorbells. The latter is used to locate the
vhost memory regions in the device's memory address space. Support is
also added for the notification capability, although the
virtio-vhost-user driver does not currently use this configuration
structure because of DPDK's poll-mode nature.
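
As a rough illustration of the doorbell addressing described above (a
standalone sketch, not part of this patch): the doorbell for vhost
virtqueue N is assumed to live at doorbell_base + N *
doorbell_off_multiplier, which is the arithmetic vvu_vring_call()
performs. The helper names vvu_doorbell_addr() and vvu_ring_doorbell()
are hypothetical, and the caller supplies whatever memory stands in for
the mapped doorbell region:

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a host-endian 16-bit value to little-endian (virtio byte order). */
static inline uint16_t
to_le16(uint16_t v)
{
	const union { uint16_t u; uint8_t b[2]; } probe = { .u = 1 };

	/* probe.b[0] is 1 on a little-endian host, 0 on a big-endian one */
	return probe.b[0] ? v : (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical helper: address of the doorbell for vhost virtqueue vq_idx. */
static inline volatile uint16_t *
vvu_doorbell_addr(void *doorbell_base, uint16_t vq_idx, uint32_t off_multiplier)
{
	return (volatile uint16_t *)((uint8_t *)doorbell_base +
				     (size_t)vq_idx * off_multiplier);
}

/* Hypothetical helper: kick the doorbell by writing the queue index to it. */
static inline void
vvu_ring_doorbell(void *doorbell_base, uint16_t vq_idx, uint32_t off_multiplier)
{
	*vvu_doorbell_addr(doorbell_base, vq_idx, off_multiplier) =
		to_le16(vq_idx);
}
```

In the driver itself the store goes through rte_write16() rather than a
plain assignment.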

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
---
 .../virtio_vhost_user/trans_virtio_vhost_user.c    | 22 ++++++++++++++--------
 drivers/virtio_vhost_user/virtio_pci.c             | 16 ++++++++++++++++
 drivers/virtio_vhost_user/virtio_pci.h             | 19 +++++++++++++++++++
 3 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c
index 72018a4..45863bd 100644
--- a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c
+++ b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c
@@ -198,11 +198,14 @@ vvu_vring_call(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	struct vvu_connection *conn =
 		container_of(dev, struct vvu_connection, device);
 	struct vvu_socket *vvu_socket = conn->vvu_socket;
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
 	uint16_t vq_idx = vq->vring_idx;
+	uint16_t *notify_addr = (void *)((uint8_t *)vvu_socket->doorbells +
+				vq_idx * hw->doorbell_off_multiplier);
 
 	RTE_LOG(DEBUG, VHOST_CONFIG, "%s vq_idx %u\n", __func__, vq_idx);
 
-	rte_write16(rte_cpu_to_le_16(vq_idx), &vvu_socket->doorbells[vq_idx]);
+	rte_write16(rte_cpu_to_le_16(vq_idx), notify_addr);
 	return 0;
 }
 
@@ -265,14 +268,14 @@ vvu_map_mem_regions(struct virtio_net *dev, struct VhostUserMsg *msg __rte_unuse
 	struct vvu_connection *conn =
 		container_of(dev, struct vvu_connection, device);
 	struct vvu_socket *vvu_socket = conn->vvu_socket;
-	struct rte_pci_device *pci_dev = vvu_socket->pdev->pci_dev;
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
 	uint8_t *mmap_addr;
 	uint32_t i;
 
-	/* Memory regions start after the doorbell registers */
-	mmap_addr = (uint8_t *)pci_dev->mem_resource[2].addr +
-		    RTE_ALIGN_CEIL((vvu_socket->max_vhost_queues + 1 /* log fd */) *
-				   sizeof(uint16_t), 4096);
+	/* Get the starting address of vhost memory regions from
+	 * the shared memory virtio PCI capability
+	 */
+	mmap_addr = hw->shared_memory_cfg;
 
 	for (i = 0; i < dev->mem->nregions; i++) {
 		struct rte_vhost_mem_region *reg = &dev->mem->regions[i];
@@ -780,10 +783,13 @@ vvu_virtio_pci_init_intr(struct vvu_socket *vvu_socket)
 static int
 vvu_virtio_pci_init_bar(struct vvu_socket *vvu_socket)
 {
-	struct rte_pci_device *pci_dev = vvu_socket->pdev->pci_dev;
+	struct virtio_hw *hw = &vvu_socket->pdev->hw;
 	struct virtio_net *dev = NULL; /* just for sizeof() */
 
-	vvu_socket->doorbells = pci_dev->mem_resource[2].addr;
+	/* Get the starting address of the doorbells from
+	 * the doorbell virtio PCI capability
+	 */
+	vvu_socket->doorbells = hw->doorbell_base;
 	if (!vvu_socket->doorbells) {
 		RTE_LOG(ERR, VHOST_CONFIG, "BAR 2 not available\n");
 		return -1;
diff --git a/drivers/virtio_vhost_user/virtio_pci.c b/drivers/virtio_vhost_user/virtio_pci.c
index 9c2c981..7996729 100644
--- a/drivers/virtio_vhost_user/virtio_pci.c
+++ b/drivers/virtio_vhost_user/virtio_pci.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arrikto Inc.
  */
 #include <stdint.h>
 
@@ -407,6 +408,18 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
 		case VIRTIO_PCI_CAP_ISR_CFG:
 			hw->isr = get_cfg_addr(dev, &cap);
 			break;
+		case VIRTIO_PCI_CAP_DOORBELL_CFG:
+			rte_pci_read_config(dev, &hw->doorbell_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->doorbell_base = get_cfg_addr(dev, &cap);
+			hw->doorbell_length = cap.length;
+			break;
+		case VIRTIO_PCI_CAP_NOTIFICATION_CFG:
+			hw->notify_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_SHARED_MEMORY_CFG:
+			hw->shared_memory_cfg = get_cfg_addr(dev, &cap);
+			break;
 		}
 
 next:
@@ -426,6 +439,9 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
 	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "isr cfg mapped at: %p\n", hw->isr);
 	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "notify base: %p, notify off multiplier: %u\n",
 		hw->notify_base, hw->notify_off_multiplier);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "doorbell base: %p, doorbell off multiplier: %u\n", hw->doorbell_base, hw->doorbell_off_multiplier);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "notification cfg mapped at: %p\n", hw->notify_cfg);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "shared memory region mapped at: %p\n", hw->shared_memory_cfg);
 
 	return 0;
 }
diff --git a/drivers/virtio_vhost_user/virtio_pci.h b/drivers/virtio_vhost_user/virtio_pci.h
index 018e0b7..12373d1 100644
--- a/drivers/virtio_vhost_user/virtio_pci.h
+++ b/drivers/virtio_vhost_user/virtio_pci.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arrikto Inc.
  */
 
 /* XXX This file is based on drivers/net/virtio/virtio_pci.h.  It would be
@@ -117,6 +118,10 @@ struct virtqueue;
 #define VIRTIO_PCI_CAP_DEVICE_CFG	4
 /* PCI configuration access */
 #define VIRTIO_PCI_CAP_PCI_CFG		5
+/* Additional capabilities for the virtio-vhost-user device */
+#define VIRTIO_PCI_CAP_DOORBELL_CFG    6
+#define VIRTIO_PCI_CAP_NOTIFICATION_CFG 7
+#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
 
 /* This is the PCI capability header: */
 struct virtio_pci_cap {
@@ -161,6 +166,12 @@ struct virtio_pci_common_cfg {
 	uint32_t queue_used_hi;		/* read-write */
 };
 
+/* Fields in VIRTIO_PCI_CAP_NOTIFICATION_CFG */
+struct virtio_pci_notification_cfg {
+	uint16_t notification_select;              /* read-write */
+	uint16_t notification_msix_vector;         /* read-write */
+};
+
 struct virtio_hw;
 
 struct virtio_pci_ops {
@@ -200,6 +211,14 @@ struct virtio_hw {
 	uint16_t    *notify_base;
 	struct virtio_pci_common_cfg *common_cfg;
 	void	    *dev_cfg;
+	/* virtio-vhost-user additional device resource capabilities
+	 * https://stefanha.github.io/virtio/vhost-user-slave.html#x1-2830007
+	 */
+	uint32_t    doorbell_off_multiplier;
+	uint16_t    *doorbell_base;
+	uint32_t     doorbell_length;
+	struct virtio_pci_notification_cfg *notify_cfg;
+	uint8_t     *shared_memory_cfg;
 	/*
 	 * App management thread and virtio interrupt handler thread
 	 * both can change device state, this lock is meant to avoid
-- 
2.7.4



* [dpdk-dev] [PATCH 22/28] vhost: add flag for choosing vhost-user transport
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (20 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 21/28] drivers/virtio_vhost_user: use additional device resources Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 23/28] net/vhost: add virtio-vhost-user support Nikos Dragazis
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Extend the <rte_vhost.h> API to support the virtio-vhost-user transport
as an alternative to the AF_UNIX transport.  The caller provides a PCI
DomBDF address:

  rte_vhost_driver_register("0000:00:04.0",
                            RTE_VHOST_USER_VIRTIO_TRANSPORT);
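
To show how the flag is meant to pick a transport, here is a minimal
standalone model of the lookup that this patch adds to
rte_vhost_driver_register(). It does not use the DPDK headers; the names
transport_map, register_transport() and select_transport() are
illustrative stand-ins, and only the flag value (1ULL << 5) and the enum
layout are taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit as RTE_VHOST_USER_VIRTIO_TRANSPORT in this patch. */
#define VIRTIO_TRANSPORT_FLAG (1ULL << 5)

enum transport { TRANSPORT_UNIX = 0, TRANSPORT_VVU = 1, TRANSPORT_MAX = 2 };

/* Stand-in for struct vhost_transport_ops; only a name for illustration. */
struct transport_ops {
	const char *name;
};

/* One slot per transport; NULL means "not registered". */
static const struct transport_ops *transport_map[TRANSPORT_MAX];

/* Hypothetical registration hook; rejects bad ids and double registration. */
static int
register_transport(enum transport t, const struct transport_ops *ops)
{
	if ((unsigned int)t >= TRANSPORT_MAX || ops == NULL ||
	    transport_map[t] != NULL)
		return -1;
	transport_map[t] = ops;
	return 0;
}

/* Pick the transport from the flags; NULL if it was never registered. */
static const struct transport_ops *
select_transport(uint64_t flags)
{
	enum transport t = (flags & VIRTIO_TRANSPORT_FLAG) ?
			   TRANSPORT_VVU : TRANSPORT_UNIX;

	return transport_map[t];
}
```

A NULL result corresponds to the "virtio-vhost-user transport is not
supported" error path, which presumably means the driver never
registered itself.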

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/virtio_vhost_user/trans_virtio_vhost_user.c |  4 ++++
 lib/librte_vhost/rte_vhost.h                        |  1 +
 lib/librte_vhost/socket.c                           | 19 ++++++++++++++++++-
 lib/librte_vhost/vhost.h                            |  6 +++++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c
index 45863bd..04dbbb1 100644
--- a/drivers/virtio_vhost_user/trans_virtio_vhost_user.c
+++ b/drivers/virtio_vhost_user/trans_virtio_vhost_user.c
@@ -979,6 +979,10 @@ vvu_pci_init(void)
 	}
 
 	rte_pci_register(&vvu_pci_driver);
+	if (rte_vhost_register_transport(VHOST_TRANSPORT_VVU, &virtio_vhost_user_trans_ops) < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Registration of vhost-user transport (%d) failed\n", VHOST_TRANSPORT_VVU);
+	}
 }
 
 static int
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 0226b3e..0573238 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -29,6 +29,7 @@ extern "C" {
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
 #define RTE_VHOST_USER_IOMMU_SUPPORT	(1ULL << 3)
 #define RTE_VHOST_USER_POSTCOPY_SUPPORT		(1ULL << 4)
+#define RTE_VHOST_USER_VIRTIO_TRANSPORT	(1ULL << 5)
 
 /** Protocol features. */
 #ifndef VHOST_USER_PROTOCOL_F_MQ
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index fe1c78d..1295fdd 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -327,7 +327,16 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 			goto out;
 	}
 
-	trans_ops = g_transport_map[VHOST_TRANSPORT_UNIX];
+	if (flags & RTE_VHOST_USER_VIRTIO_TRANSPORT) {
+		trans_ops = g_transport_map[VHOST_TRANSPORT_VVU];
+		if (trans_ops == NULL) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+					"virtio-vhost-user transport is not supported\n");
+			goto out;
+		}
+	} else {
+		trans_ops = g_transport_map[VHOST_TRANSPORT_UNIX];
+	}
 
 	if (!path)
 		return -1;
@@ -400,6 +409,14 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 			"Postcopy requested but not compiled\n");
 		ret = -1;
 		goto out_free;
+#else
+		if (flags & RTE_VHOST_USER_VIRTIO_TRANSPORT) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"Postcopy requested but not supported "
+				"by the virtio-vhost-user transport\n");
+			ret = -1;
+			goto out_free;
+		}
 #endif
 	}
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 2e7eabe..a6131da 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -494,9 +494,13 @@ struct vhost_transport_ops {
 /** The traditional AF_UNIX vhost-user protocol transport. */
 extern const struct vhost_transport_ops af_unix_trans_ops;
 
+/** The virtio-vhost-user PCI vhost-user protocol transport. */
+extern const struct vhost_transport_ops virtio_vhost_user_trans_ops;
+
 typedef enum VhostUserTransport {
 	VHOST_TRANSPORT_UNIX = 0,
-	VHOST_TRANSPORT_MAX = 1
+	VHOST_TRANSPORT_VVU = 1,
+	VHOST_TRANSPORT_MAX = 2
 } VhostUserTransport;
 
 /* A list with all the available vhost-user transports. */
-- 
2.7.4



* [dpdk-dev] [PATCH 23/28] net/vhost: add virtio-vhost-user support
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (21 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 22/28] vhost: add flag for choosing vhost-user transport Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 24/28] examples/vhost_scsi: add --socket-file argument Nikos Dragazis
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The new virtio-transport=0|1 argument enables virtio-vhost-user support:

  testpmd ... --pci-whitelist 0000:00:04.0 \
              --vdev vhost,iface=0000:00:04.0,virtio-transport=1
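
The mapping from the devargs string to the vhost flag can be sketched in
plain C. This is a simplified stand-in for the rte_kvargs handling in
rte_pmd_vhost_probe(), not the actual parser; kv_get_uint() and
vhost_flags_from_args() are hypothetical helpers, and the flag value
mirrors RTE_VHOST_USER_VIRTIO_TRANSPORT from the previous patch:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same bit as RTE_VHOST_USER_VIRTIO_TRANSPORT (1ULL << 5). */
#define VIRTIO_TRANSPORT_FLAG (1ULL << 5)

/* Look up `key` in a comma-separated "k=v,k=v" string.
 * Returns 0 and stores the numeric value, or -1 if the key is absent
 * or its value is not a number.
 */
static int
kv_get_uint(const char *args, const char *key, unsigned long *out)
{
	size_t klen = strlen(key);
	const char *p = args;

	while (p != NULL && *p != '\0') {
		/* p always points at the start of a "key=value" token */
		if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
			const char *val = p + klen + 1;
			char *end;
			unsigned long v = strtoul(val, &end, 0);

			if (end == val || (*end != ',' && *end != '\0'))
				return -1;
			*out = v;
			return 0;
		}
		p = strchr(p, ',');
		if (p != NULL)
			p++;
	}
	return -1;
}

/* Translate "virtio-transport=0|1" into the vhost registration flag. */
static uint64_t
vhost_flags_from_args(const char *args)
{
	unsigned long v = 0;
	uint64_t flags = 0;

	if (kv_get_uint(args, "virtio-transport", &v) == 0 && v != 0)
		flags |= VIRTIO_TRANSPORT_FLAG;
	return flags;
}
```

With virtio-transport omitted or set to 0 the flag stays clear, so the
AF_UNIX transport remains the default.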

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 0b61e37..c0d087f 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -31,6 +31,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 #define ETH_VHOST_DEQUEUE_ZERO_COPY	"dequeue-zero-copy"
 #define ETH_VHOST_IOMMU_SUPPORT		"iommu-support"
 #define ETH_VHOST_POSTCOPY_SUPPORT	"postcopy-support"
+#define ETH_VHOST_VIRTIO_TRANSPORT	"virtio-transport"
 #define VHOST_MAX_PKT_BURST 32
 
 static const char *valid_arguments[] = {
@@ -40,6 +41,7 @@ static const char *valid_arguments[] = {
 	ETH_VHOST_DEQUEUE_ZERO_COPY,
 	ETH_VHOST_IOMMU_SUPPORT,
 	ETH_VHOST_POSTCOPY_SUPPORT,
+	ETH_VHOST_VIRTIO_TRANSPORT,
 	NULL
 };
 
@@ -1341,6 +1343,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 	int dequeue_zero_copy = 0;
 	int iommu_support = 0;
 	int postcopy_support = 0;
+	uint16_t virtio_transport = 0;
 	struct rte_eth_dev *eth_dev;
 	const char *name = rte_vdev_device_name(dev);
 
@@ -1422,6 +1425,16 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 			flags |= RTE_VHOST_USER_POSTCOPY_SUPPORT;
 	}
 
+	if (rte_kvargs_count(kvlist, ETH_VHOST_VIRTIO_TRANSPORT) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_VIRTIO_TRANSPORT,
+					 &open_int, &virtio_transport);
+		if (ret < 0)
+			goto out_free;
+
+		if (virtio_transport)
+			flags |= RTE_VHOST_USER_VIRTIO_TRANSPORT;
+	}
+
 	if (dev->device.numa_node == SOCKET_ID_ANY)
 		dev->device.numa_node = rte_socket_id();
 
-- 
2.7.4



* [dpdk-dev] [PATCH 24/28] examples/vhost_scsi: add --socket-file argument
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (22 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 23/28] net/vhost: add virtio-vhost-user support Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 25/28] examples/vhost_scsi: add virtio-vhost-user support Nikos Dragazis
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

From: Stefan Hajnoczi <stefanha@redhat.com>

The default filename built into examples/vhost_scsi may not be
convenient.  Allow the user to specify the full UNIX domain socket path
on the command-line.
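
A minimal sketch of this kind of option handling, assuming a POSIX
system with getopt_long(). The function parse_socket_file() is a
hypothetical condensation of the set_dev_pathname() and
vhost_scsi_parse_args() logic in this patch; it returns an error code
instead of calling rte_exit() so that it can be exercised on its own:

```c
#include <getopt.h>
#include <stdio.h>
#include <string.h>

/* Parse "--socket-file PATH" from argv into path (path_len must be >= 1).
 * Returns 0 on success (path is empty if the option was not given) and
 * -1 on an unknown option, a repeated option, or a path that is too long.
 */
static int
parse_socket_file(int argc, char **argv, char *path, size_t path_len)
{
	static const struct option long_opts[] = {
		{"socket-file", required_argument, NULL, 0},
		{NULL, 0, NULL, 0},
	};
	int seen = 0;
	int opt;
	int idx;

	path[0] = '\0';
	optind = 1;	/* restart scanning in case getopt was used before */
	opterr = 0;	/* report errors through the return value only */

	while ((opt = getopt_long(argc, argv, "", long_opts, &idx)) != -1) {
		int n;

		if (opt != 0 || strcmp(long_opts[idx].name, "socket-file") != 0)
			return -1;	/* unknown option or missing argument */
		if (seen)
			return -1;	/* --socket-file given more than once */
		n = snprintf(path, path_len, "%s", optarg);
		if (n < 0 || (size_t)n >= path_len)
			return -1;	/* path does not fit in the buffer */
		seen = 1;
	}
	return 0;
}
```

The patch itself falls back to <cwd>/vhost.socket when no --socket-file
is given.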

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 examples/vhost_scsi/vhost_scsi.c | 93 ++++++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 18 deletions(-)

diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c
index 513af0c..d2d02fd 100644
--- a/examples/vhost_scsi/vhost_scsi.c
+++ b/examples/vhost_scsi/vhost_scsi.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2017 Intel Corporation
  */
 
+#include <getopt.h>
 #include <stdint.h>
 #include <unistd.h>
 #include <stdbool.h>
@@ -402,26 +403,10 @@ static const struct vhost_device_ops vhost_scsi_device_ops = {
 };
 
 static struct vhost_scsi_ctrlr *
-vhost_scsi_ctrlr_construct(const char *ctrlr_name)
+vhost_scsi_ctrlr_construct(void)
 {
 	int ret;
 	struct vhost_scsi_ctrlr *ctrlr;
-	char *path;
-	char cwd[PATH_MAX];
-
-	/* always use current directory */
-	path = getcwd(cwd, PATH_MAX);
-	if (!path) {
-		fprintf(stderr, "Cannot get current working directory\n");
-		return NULL;
-	}
-	snprintf(dev_pathname, sizeof(dev_pathname), "%s/%s", path, ctrlr_name);
-
-	if (access(dev_pathname, F_OK) != -1) {
-		if (unlink(dev_pathname) != 0)
-			rte_exit(EXIT_FAILURE, "Cannot remove %s.\n",
-				 dev_pathname);
-	}
 
 	if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
 		fprintf(stderr, "socket %s already exists\n", dev_pathname);
@@ -455,6 +440,71 @@ signal_handler(__rte_unused int signum)
 	exit(0);
 }
 
+static void
+set_dev_pathname(const char *path)
+{
+	if (dev_pathname[0])
+		rte_exit(EXIT_FAILURE, "--socket-file can only be given once.\n");
+
+	snprintf(dev_pathname, sizeof(dev_pathname), "%s", path);
+}
+
+static void
+vhost_scsi_usage(const char *prgname)
+{
+	fprintf(stderr, "%s [EAL options] --\n"
+	"    --socket-file PATH: The path of the UNIX domain socket\n",
+		prgname);
+}
+
+static void
+vhost_scsi_parse_args(int argc, char **argv)
+{
+	int opt;
+	int option_index;
+	const char *prgname = argv[0];
+	static struct option long_option[] = {
+		{"socket-file", required_argument, NULL, 0},
+		{NULL, 0, 0, 0},
+	};
+
+	while ((opt = getopt_long(argc, argv, "", long_option,
+				  &option_index)) != -1) {
+		switch (opt) {
+		case 0:
+			if (!strcmp(long_option[option_index].name,
+				    "socket-file")) {
+				set_dev_pathname(optarg);
+			}
+			break;
+		default:
+			vhost_scsi_usage(prgname);
+			rte_exit(EXIT_FAILURE, "Invalid argument\n");
+		}
+	}
+}
+
+static void
+vhost_scsi_set_default_dev_pathname(void)
+{
+	char *path;
+	char cwd[PATH_MAX];
+
+	/* always use current directory */
+	path = getcwd(cwd, PATH_MAX);
+	if (!path) {
+		rte_exit(EXIT_FAILURE,
+			 "Cannot get current working directory\n");
+	}
+	snprintf(dev_pathname, sizeof(dev_pathname), "%s/vhost.socket", path);
+
+	if (access(dev_pathname, F_OK) != -1) {
+		if (unlink(dev_pathname) != 0)
+			rte_exit(EXIT_FAILURE, "Cannot remove %s.\n",
+				 dev_pathname);
+	}
+}
+
 int main(int argc, char *argv[])
 {
 	int ret;
@@ -465,8 +515,15 @@ int main(int argc, char *argv[])
 	ret = rte_eal_init(argc, argv);
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+	argc -= ret;
+	argv += ret;
+
+	vhost_scsi_parse_args(argc, argv);
+
+	if (!dev_pathname[0])
+		vhost_scsi_set_default_dev_pathname();
 
-	g_vhost_ctrlr = vhost_scsi_ctrlr_construct("vhost.socket");
+	g_vhost_ctrlr = vhost_scsi_ctrlr_construct();
 	if (g_vhost_ctrlr == NULL) {
 		fprintf(stderr, "Construct vhost scsi controller failed\n");
 		return 0;
-- 
2.7.4



* [dpdk-dev] [PATCH 25/28] examples/vhost_scsi: add virtio-vhost-user support
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (23 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 24/28] examples/vhost_scsi: add --socket-file argument Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 26/28] mk: link apps with virtio-vhost-user driver Nikos Dragazis
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

From: Stefan Hajnoczi <stefanha@redhat.com>

The new --virtio-vhost-user-pci command-line argument selects the
virtio-vhost-user transport instead of the default AF_UNIX transport.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 examples/vhost_scsi/vhost_scsi.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c
index d2d02fd..020f4a0 100644
--- a/examples/vhost_scsi/vhost_scsi.c
+++ b/examples/vhost_scsi/vhost_scsi.c
@@ -27,6 +27,7 @@
 
 /* Path to folder where character device will be created. Can be set by user. */
 static char dev_pathname[PATH_MAX] = "";
+static uint64_t dev_flags; /* for rte_vhost_driver_register() */
 
 static struct vhost_scsi_ctrlr *g_vhost_ctrlr;
 static int g_should_stop;
@@ -408,7 +409,7 @@ vhost_scsi_ctrlr_construct(void)
 	int ret;
 	struct vhost_scsi_ctrlr *ctrlr;
 
-	if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
+	if (rte_vhost_driver_register(dev_pathname, dev_flags) != 0) {
 		fprintf(stderr, "socket %s already exists\n", dev_pathname);
 		return NULL;
 	}
@@ -444,7 +445,8 @@ static void
 set_dev_pathname(const char *path)
 {
 	if (dev_pathname[0])
-		rte_exit(EXIT_FAILURE, "--socket-file can only be given once.\n");
+		rte_exit(EXIT_FAILURE, "Only one of --socket-file or "
+			 "--virtio-vhost-user-pci can be given.\n");
 
 	snprintf(dev_pathname, sizeof(dev_pathname), "%s", path);
 }
@@ -453,7 +455,8 @@ static void
 vhost_scsi_usage(const char *prgname)
 {
 	fprintf(stderr, "%s [EAL options] --\n"
-	"    --socket-file PATH: The path of the UNIX domain socket\n",
+	"    --socket-file PATH: The path of the UNIX domain socket\n"
+	"    --virtio-vhost-user-pci DomBDF: PCI adapter address\n",
 		prgname);
 }
 
@@ -465,6 +468,7 @@ vhost_scsi_parse_args(int argc, char **argv)
 	const char *prgname = argv[0];
 	static struct option long_option[] = {
 		{"socket-file", required_argument, NULL, 0},
+		{"virtio-vhost-user-pci", required_argument, NULL, 0},
 		{NULL, 0, 0, 0},
 	};
 
@@ -475,6 +479,10 @@ vhost_scsi_parse_args(int argc, char **argv)
 			if (!strcmp(long_option[option_index].name,
 				    "socket-file")) {
 				set_dev_pathname(optarg);
+			} else if (!strcmp(long_option[option_index].name,
+				   "virtio-vhost-user-pci")) {
+				set_dev_pathname(optarg);
+				dev_flags = RTE_VHOST_USER_VIRTIO_TRANSPORT;
 			}
 			break;
 		default:
-- 
2.7.4
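
The option handling that patches 24/28 and 25/28 add can be tried outside
DPDK. What follows is a minimal sketch, not the example application's code:
`sketch_cfg`, `sketch_parse_args()` and `SKETCH_VIRTIO_TRANSPORT` are
hypothetical stand-ins for `dev_pathname`/`dev_flags`,
`vhost_scsi_parse_args()` and `RTE_VHOST_USER_VIRTIO_TRANSPORT`, and errors
are returned to the caller instead of going through rte_exit().

```c
#include <getopt.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for RTE_VHOST_USER_VIRTIO_TRANSPORT. */
#define SKETCH_VIRTIO_TRANSPORT (1ULL << 0)

/* Hypothetical container for what the patch keeps in two globals. */
struct sketch_cfg {
	char path[4096];  /* socket path or PCI address (DomBDF) */
	uint64_t flags;   /* would be passed to the register call */
};

/* Returns 0 on success, -1 on a bad option or when a path is given twice. */
static int
sketch_parse_args(int argc, char **argv, struct sketch_cfg *cfg)
{
	static const struct option long_option[] = {
		{"socket-file", required_argument, NULL, 0},
		{"virtio-vhost-user-pci", required_argument, NULL, 0},
		{NULL, 0, NULL, 0},
	};
	int opt;
	int option_index = 0;

	optind = 1; /* restart scanning so the function can be called again */
	while ((opt = getopt_long(argc, argv, "", long_option,
				  &option_index)) != -1) {
		if (opt != 0)
			return -1; /* unknown option or missing argument */
		if (cfg->path[0] != '\0')
			return -1; /* only one of the two options is allowed */

		snprintf(cfg->path, sizeof(cfg->path), "%s", optarg);
		if (strcmp(long_option[option_index].name,
			   "virtio-vhost-user-pci") == 0)
			cfg->flags = SKETCH_VIRTIO_TRANSPORT;
	}
	return 0;
}
```

Returning an error code rather than exiting keeps the "only one of the two
options" rule easy to check from a unit test.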



* [dpdk-dev] [PATCH 26/28] mk: link apps with virtio-vhost-user driver
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (24 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 25/28] examples/vhost_scsi: add virtio-vhost-user support Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 27/28] config: add option for the virtio-vhost-user transport Nikos Dragazis
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Export the virtio-vhost-user transport to all the apps. Support using
the virtio-vhost-user transport with shared libraries by unconditionally
linking librte_virtio_vhost_user.so with the apps.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
---
 mk/rte.app.mk | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b5..77e02d1 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -132,6 +132,12 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS)      += -lrte_bus_fslmc
 endif
 
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+_LDLIBS-y += --no-as-needed
+_LDLIBS-y += -lrte_virtio_vhost_user
+_LDLIBS-y += --as-needed
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
 
-- 
2.7.4



* [dpdk-dev] [PATCH 27/28] config: add option for the virtio-vhost-user transport
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (25 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 26/28] mk: link apps with virtio-vhost-user driver Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 28/28] usertools: add virtio-vhost-user devices to dpdk-devbind.py Nikos Dragazis
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Add a configuration option for compiling and linking with the
virtio-vhost-user library.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
---
 config/common_base  | 6 ++++++
 config/common_linux | 1 +
 drivers/Makefile    | 5 ++++-
 mk/rte.app.mk       | 2 +-
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/config/common_base b/config/common_base
index 6f19ad5..2559d69 100644
--- a/config/common_base
+++ b/config/common_base
@@ -963,6 +963,12 @@ CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile virtio-vhost-user library
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_VIRTIO_VHOST_USER=n
+
+#
 # Compile IFC driver
 # To compile, CONFIG_RTE_LIBRTE_VHOST and CONFIG_RTE_EAL_VFIO
 # should be enabled.
diff --git a/config/common_linux b/config/common_linux
index 7533427..7e4279f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -17,6 +17,7 @@ CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=y
 CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
+CONFIG_RTE_LIBRTE_VIRTIO_VHOST_USER=y
 CONFIG_RTE_LIBRTE_IFC_PMD=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
diff --git a/drivers/Makefile b/drivers/Makefile
index 72e2579..971dc6c 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -22,7 +22,10 @@ DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event
 DEPDIRS-event := common bus mempool net
 DIRS-$(CONFIG_RTE_LIBRTE_RAWDEV) += raw
 DEPDIRS-raw := common bus mempool net event
-DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio_vhost_user
+
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST)$(CONFIG_RTE_LIBRTE_VIRTIO_VHOST_USER),yy)
+DIRS-y += virtio_vhost_user
 DEPDIRS-virtio_vhost_user := bus
+endif
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 77e02d1..8dd2922 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -132,7 +132,7 @@ ifeq ($(CONFIG_RTE_EAL_VFIO),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS)      += -lrte_bus_fslmc
 endif
 
-ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+ifeq ($(CONFIG_RTE_LIBRTE_VHOST)$(CONFIG_RTE_LIBRTE_VIRTIO_VHOST_USER),yy)
 _LDLIBS-y += --no-as-needed
 _LDLIBS-y += -lrte_virtio_vhost_user
 _LDLIBS-y += --as-needed
-- 
2.7.4



* [dpdk-dev] [PATCH 28/28] usertools: add virtio-vhost-user devices to dpdk-devbind.py
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (26 preceding siblings ...)
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 27/28] config: add option for the virtio-vhost-user transport Nikos Dragazis
@ 2019-06-19 15:14 ` Nikos Dragazis
       [not found] ` <CGME20190620113240eucas1p22ca4faa64a36bbb7aec38a81298ade56@eucas1p2.samsung.com>
  2019-06-20 11:35 ` Maxime Coquelin
  29 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-19 15:14 UTC (permalink / raw)
  To: dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

The virtio-vhost-user PCI adapter is not detected in any existing group
of devices supported by dpdk-devbind.py.  Add a new "Other" group for
miscellaneous devices like this one.

Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 usertools/dpdk-devbind.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 9e79f0d..642b182 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -30,6 +30,8 @@
               'SVendor': None, 'SDevice': None}
 avp_vnic = {'Class': '05', 'Vendor': '1af4', 'Device': '1110',
               'SVendor': None, 'SDevice': None}
+virtio_vhost_user = {'Class': '00', 'Vendor': '1af4', 'Device': '1017,1058',
+                     'SVendor': None, 'SDevice': None}
 
 octeontx2_sso = {'Class': '08', 'Vendor': '177d', 'Device': 'a0f9,a0fa',
               'SVendor': None, 'SDevice': None}
@@ -41,6 +43,7 @@
 eventdev_devices = [cavium_sso, cavium_tim, octeontx2_sso]
 mempool_devices = [cavium_fpa, octeontx2_npa]
 compress_devices = [cavium_zip]
+other_devices = [virtio_vhost_user]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
@@ -595,6 +598,8 @@ def show_status():
     if status_dev == "compress" or status_dev == "all":
         show_device_status(compress_devices , "Compress")
 
+    if status_dev == 'other' or status_dev == 'all':
+        show_device_status(other_devices, "Other")
 
 def parse_args():
     '''Parses the command-line arguments given by the user and takes the
@@ -670,6 +675,7 @@ def do_arg_actions():
             get_device_details(eventdev_devices)
             get_device_details(mempool_devices)
             get_device_details(compress_devices)
+            get_device_details(other_devices)
         show_status()
 
 
@@ -690,6 +696,7 @@ def main():
     get_device_details(eventdev_devices)
     get_device_details(mempool_devices)
     get_device_details(compress_devices)
+    get_device_details(other_devices)
     do_arg_actions()
 
 if __name__ == "__main__":
-- 
2.7.4



* Re: [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure Nikos Dragazis
@ 2019-06-19 20:14   ` Aaron Conole
  2019-06-20 10:30     ` Bruce Richardson
  2019-06-20 18:19     ` Nikos Dragazis
  0 siblings, 2 replies; 40+ messages in thread
From: Aaron Conole @ 2019-06-19 20:14 UTC (permalink / raw)
  To: Nikos Dragazis
  Cc: dev, Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

Nikos Dragazis <ndragazis@arrikto.com> writes:

> This is the first of a series of patches, whose purpose is to add
> support for the virtio-vhost-user transport. This is a vhost-user
> transport implementation that is different from the default AF_UNIX
> transport. It uses the virtio-vhost-user PCI device in order to tunnel
> vhost-user protocol messages over virtio. This lets guests act as vhost
> device backends for other guests.
>
> File descriptor passing is specific to the AF_UNIX vhost-user protocol
> transport.  In order to add support for additional transports, it is
> necessary to extract transport-specific code from the main vhost-user
> code.
>
> This patch introduces struct vhost_transport_ops and associates each
> device with a transport.  Core vhost-user code calls into
> vhost_transport_ops to perform transport-specific operations.
>
> Notifying callfd is a transport-specific operation, so it belongs to
> trans_af_unix.c.
>
> Several more patches follow this one to complete the task of moving
> AF_UNIX transport code out of core vhost-user code.
>
> Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---

You'll need to also accommodate the meson build - probably with
something like:

diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
index 3090bbe08..81b70683b 100644
--- a/lib/librte_vhost/meson.build
+++ b/lib/librte_vhost/meson.build
@@ -14,6 +14,6 @@ allow_experimental_apis = true
 cflags += '-fno-strict-aliasing'
 sources = files('fd_man.c', 'iotlb.c', 'socket.c', 'vdpa.c',
                'vhost.c', 'vhost_user.c',
-               'virtio_net.c', 'vhost_crypto.c')
+               'virtio_net.c', 'vhost_crypto.c', 'trans_af_unix.c')
 headers = files('rte_vhost.h', 'rte_vdpa.h', 'rte_vhost_crypto.h')
 deps += ['ethdev', 'cryptodev', 'hash', 'pci']


>  lib/librte_vhost/Makefile        |  2 +-
>  lib/librte_vhost/trans_af_unix.c | 20 ++++++++++++++++++++
>  lib/librte_vhost/vhost.c         |  1 +
>  lib/librte_vhost/vhost.h         | 34 +++++++++++++++++++++++++++++-----
>  4 files changed, 51 insertions(+), 6 deletions(-)
>  create mode 100644 lib/librte_vhost/trans_af_unix.c
>
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index 8623e91..5ff5fb2 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -23,7 +23,7 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net
>  
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
> -					vhost_user.c virtio_net.c vdpa.c
> +					vhost_user.c virtio_net.c vdpa.c trans_af_unix.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
> diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
> new file mode 100644
> index 0000000..3f0c308
> --- /dev/null
> +++ b/lib/librte_vhost/trans_af_unix.c
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2018 Intel Corporation
> + * Copyright(c) 2017 Red Hat, Inc.
> + * Copyright(c) 2019 Arrikto Inc.
> + */
> +
> +#include "vhost.h"
> +
> +static int
> +af_unix_vring_call(struct virtio_net *dev __rte_unused,
> +		   struct vhost_virtqueue *vq)
> +{
> +	if (vq->callfd >= 0)
> +		eventfd_write(vq->callfd, (eventfd_t)1);
> +	return 0;
> +}
> +
> +const struct vhost_transport_ops af_unix_trans_ops = {
> +	.vring_call = af_unix_vring_call,
> +};
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 981837b..a36bc01 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -507,6 +507,7 @@ vhost_new_device(void)
>  	dev->vid = i;
>  	dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
>  	dev->slave_req_fd = -1;
> +	dev->trans_ops = &af_unix_trans_ops;
>  	dev->vdpa_dev_id = -1;
>  	dev->postcopy_ufd = -1;
>  	rte_spinlock_init(&dev->slave_req_lock);
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 884befa..077f213 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -286,6 +286,30 @@ struct guest_page {
>  	uint64_t size;
>  };
>  
> +struct virtio_net;
> +
> +/**
> + * A structure containing function pointers for transport-specific operations.
> + */
> +struct vhost_transport_ops {
> +	/**
> +	 * Notify the guest that used descriptors have been added to the vring.
> +	 * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked
> +	 * so this function just needs to perform the notification.
> +	 *
> +	 * @param dev
> +	 *  vhost device
> +	 * @param vq
> +	 *  vhost virtqueue
> +	 * @return
> +	 *  0 on success, -1 on failure
> +	 */
> +	int (*vring_call)(struct virtio_net *dev, struct vhost_virtqueue *vq);
> +};
> +
> +/** The traditional AF_UNIX vhost-user protocol transport. */
> +extern const struct vhost_transport_ops af_unix_trans_ops;
> +
>  /**
>   * Device structure contains all configuration information relating
>   * to the device.
> @@ -312,6 +336,7 @@ struct virtio_net {
>  	uint16_t		mtu;
>  
>  	struct vhost_device_ops const *notify_ops;
> +	struct vhost_transport_ops const *trans_ops;
>  
>  	uint32_t		nr_guest_pages;
>  	uint32_t		max_guest_pages;
> @@ -544,12 +569,11 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
>  		if ((vhost_need_event(vhost_used_event(vq), new, old) &&
>  					(vq->callfd >= 0)) ||
>  				unlikely(!signalled_used_valid))
> -			eventfd_write(vq->callfd, (eventfd_t) 1);
> +			dev->trans_ops->vring_call(dev, vq);
>  	} else {
>  		/* Kick the guest if necessary. */
> -		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
> -				&& (vq->callfd >= 0))
> -			eventfd_write(vq->callfd, (eventfd_t)1);
> +		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> +			dev->trans_ops->vring_call(dev, vq);
>  	}
>  }
>  
> @@ -601,7 +625,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
>  		kick = true;
>  kick:
>  	if (kick)
> -		eventfd_write(vq->callfd, (eventfd_t)1);
> +		dev->trans_ops->vring_call(dev, vq);
>  }
>  
>  static __rte_always_inline void
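
The indirection in the quoted patch can be looked at in isolation. This is
only a sketch of the pattern under simplifying assumptions: every `sketch_`
name is hypothetical, the virtqueue is reduced to a callfd and a counter, and
the counter stands in for the eventfd_write() call so that nothing touches a
real file descriptor.

```c
struct sketch_dev;

/* Reduced virtqueue: just enough state to observe a notification. */
struct sketch_vq {
	int callfd;         /* >= 0 when a call eventfd has been set up */
	unsigned int kicks; /* stands in for eventfd_write(callfd, 1) */
};

/* Transport-specific operations, mirroring the shape of the patch. */
struct sketch_transport_ops {
	int (*vring_call)(struct sketch_dev *dev, struct sketch_vq *vq);
};

struct sketch_dev {
	const struct sketch_transport_ops *trans_ops;
};

/* AF_UNIX-like transport: the callfd check lives in the transport. */
static int
sketch_af_unix_vring_call(struct sketch_dev *dev, struct sketch_vq *vq)
{
	(void)dev;
	if (vq->callfd >= 0)
		vq->kicks++;
	return 0;
}

static const struct sketch_transport_ops sketch_af_unix_ops = {
	.vring_call = sketch_af_unix_vring_call,
};

/* Core code decides whether to notify; the transport decides how. */
static int
sketch_vring_call(struct sketch_dev *dev, struct sketch_vq *vq,
		  int no_interrupt)
{
	if (no_interrupt)
		return 0;
	return dev->trans_ops->vring_call(dev, vq);
}
```

A second transport would only need to supply its own vring_call function and
ops table; the core function above would stay unchanged.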


* Re: [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure
  2019-06-19 20:14   ` Aaron Conole
@ 2019-06-20 10:30     ` Bruce Richardson
  2019-06-20 18:24       ` Nikos Dragazis
  2019-06-20 18:19     ` Nikos Dragazis
  1 sibling, 1 reply; 40+ messages in thread
From: Bruce Richardson @ 2019-06-20 10:30 UTC (permalink / raw)
  To: Aaron Conole
  Cc: Nikos Dragazis, dev, Maxime Coquelin, Tiwei Bie, Zhihong Wang,
	Stefan Hajnoczi, Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

On Wed, Jun 19, 2019 at 04:14:09PM -0400, Aaron Conole wrote:
> Nikos Dragazis <ndragazis@arrikto.com> writes:
> 
> > This is the first of a series of patches, whose purpose is to add
> > support for the virtio-vhost-user transport. This is a vhost-user
> > transport implementation that is different from the default AF_UNIX
> > transport. It uses the virtio-vhost-user PCI device in order to tunnel
> > vhost-user protocol messages over virtio. This lets guests act as vhost
> > device backends for other guests.
> >
> > File descriptor passing is specific to the AF_UNIX vhost-user protocol
> > transport.  In order to add support for additional transports, it is
> > necessary to extract transport-specific code from the main vhost-user
> > code.
> >
> > This patch introduces struct vhost_transport_ops and associates each
> > device with a transport.  Core vhost-user code calls into
> > vhost_transport_ops to perform transport-specific operations.
> >
> > Notifying callfd is a transport-specific operation, so it belongs to
> > trans_af_unix.c.
> >
> > Several more patches follow this one to complete the task of moving
> > AF_UNIX transport code out of core vhost-user code.
> >
> > Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> 
> You'll need to also accommodate the meson build - probably with
> something like:
> 
> diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> index 3090bbe08..81b70683b 100644
> --- a/lib/librte_vhost/meson.build
> +++ b/lib/librte_vhost/meson.build
> @@ -14,6 +14,6 @@ allow_experimental_apis = true
>  cflags += '-fno-strict-aliasing'
>  sources = files('fd_man.c', 'iotlb.c', 'socket.c', 'vdpa.c',
>                 'vhost.c', 'vhost_user.c',
> -               'virtio_net.c', 'vhost_crypto.c')
> +               'virtio_net.c', 'vhost_crypto.c', 'trans_af_unix.c')
>  headers = files('rte_vhost.h', 'rte_vdpa.h', 'rte_vhost_crypto.h')
>  deps += ['ethdev', 'cryptodev', 'hash', 'pci']
> 
> 

Yep, except I think we should try and keep the files in alphabetical order,
with only a couple of entries per line [so place trans_af_unix.c on a new
line with vdpa.c].


* Re: [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport
       [not found] ` <CGME20190620113240eucas1p22ca4faa64a36bbb7aec38a81298ade56@eucas1p2.samsung.com>
@ 2019-06-20 11:32   ` Ilya Maximets
  2019-06-20 23:44     ` Nikos Dragazis
  0 siblings, 1 reply; 40+ messages in thread
From: Ilya Maximets @ 2019-06-20 11:32 UTC (permalink / raw)
  To: Nikos Dragazis, dev
  Cc: Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

On 19.06.2019 18:14, Nikos Dragazis wrote:
> Hi everyone,

Hi. I didn't look at the code, just a few comments inline.

> 
> this patch series introduces the concept of the virtio-vhost-user
> transport. This is actually a revised version of an earlier RFC
> implementation that has been proposed by Stefan Hajnoczi [1]. Though
> this is a great feature, it seems to have been stalled, so I’d like to
> restart the conversation on this and hopefully get it merged with your
> help. Let me give you an overview.
> 
> The virtio-vhost-user transport is a vhost-user transport implementation
> that is based on the virtio-vhost-user device. Its key difference with
> the existing transport is that it allows deploying vhost-user targets
> inside dedicated Storage Appliance VMs instead of host user space. In
> other words, it allows having guests that act as vhost-user backends for
> other guests.
> 
> The virtio-vhost-user device implements the vhost-user control plane
> (master-slave communication) as follows:
> 
> 1. it parses the vhost-user messages from the vhost-user unix domain
>    socket and forwards them to the slave guest through virtqueues
> 
> 2. it maps the vhost memory regions in QEMU’s process address space and
>    exposes them to the slave guest as a RAM-backed PCI MMIO region
> 
> 3. it hooks up doorbells to the callfds. The slave guest can use these
>    doorbells to interrupt the master guest driver
> 
> The device code has not yet been merged into upstream QEMU, but this is
> definitely the end goal. The current state is that we are awaiting for
> the approval of the virtio spec.
> 
> I have Cced Darek from the SPDK community who has helped me a lot by
> reviewing this series. Note that any device type could be implemented
> over this new transport. So, adding the virtio-vhost-user transport in
> DPDK would allow using it from SPDK as well.
> 
> Getting into the code internals, this patch series makes the following
> changes:
> 
> 1. introduce a generic interface for the transport-specific operations.
>    Each of the two available transports, the pre-existing AF_UNIX
>    transport and the virtio-vhost-user transport, is going to implement
>    this interface. The AF_UNIX-specific code has been extracted from the
>    core vhost-user code and is now part of the AF_UNIX transport
>    implementation in trans_af_unix.c.
> 
> 2. introduce the virtio-vhost-user transport. The virtio-vhost-user
>    transport requires a driver for the virtio-vhost-user devices. The
>    driver along with the transport implementation have been packed into
>    a separate library in `drivers/virtio_vhost_user/`. The necessary
>    virtio-pci code has been copied from `drivers/net/virtio/`. Some
>    additional changes have been made so that the driver can utilize the
>    additional resources of the virtio-vhost-user device.
> 
> 3. update librte_vhost public API to enable choosing transport for each
>    new vhost device. Extend the vhost net driver and vhost-scsi example
>    application to export this new API to the end user.
> 
> The primary changes I did to Stefan’s RFC implementation are the
> following:
> 
> 1. moved postcopy live migration code into trans_af_unix.c. Postcopy
>    live migration relies on the userfault fd mechanism, which cannot be
>    supported by virtio-vhost-user.
> 
> 2. moved setup of the log memory region into trans_af_unix.c. Setting up
>    the log memory region involves mapping/unmapping guest memory. This
>    is an AF_UNIX transport-specific operation.

Dirty page logging is the core mechanism behind live migration support. Does
this mean that live migration is not supported for virtio-vhost-user at all?
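
For readers less familiar with the mechanism: the dirty log is, in essence, a
bitmap in shared memory with one bit per guest page that the backend sets
whenever it writes to guest memory. A minimal sketch, assuming 4 KiB pages
and using hypothetical names (`sketch_log_write`, `SKETCH_LOG_PAGE`); it is
not code from this series or from librte_vhost:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed logging granularity: one bit per 4 KiB page. */
#define SKETCH_LOG_PAGE 4096ULL

/*
 * Mark every page touched by a write of len bytes at guest address addr.
 * Pages that fall beyond the end of the bitmap are silently ignored.
 */
static void
sketch_log_write(uint8_t *bitmap, size_t bitmap_size,
		 uint64_t addr, uint64_t len)
{
	uint64_t page;
	uint64_t last;

	if (len == 0)
		return;

	page = addr / SKETCH_LOG_PAGE;
	last = (addr + len - 1) / SKETCH_LOG_PAGE;
	for (; page <= last; page++) {
		if (page / 8 >= bitmap_size)
			break;
		bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
	}
}
```

The part that depends on the transport is how the backend obtains access to
that bitmap, which is why the setup of the log region is the piece under
discussion here.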

> 
> 3. introduced a vhost transport operation for
>    process_slave_message_reply()
> 
> 4. moved the virtio-vhost-user transport/driver into a separate library
>    in `drivers/virtio_vhost_user/`. This required making vhost.h and
>    vhost_user.h part of librte_vhost public API and exporting some
>    private symbols via the version script. This looks better to me that
>    just moving the entire librte_vhost into `drivers/`. I am not sure if
>    this is the most appropriate solution. I am looking forward to your
>    suggestions on this.

Moving the virtio-vhost-user code to a separate driver looks strange to me.
What is the purpose?

Exporting a lot of vhost internal structures will lead to frequent API/ABI
breakages and will slow down accepting changes to releases in general.

It looks inconsistent to have 'trans_af_unix.c' in 'lib/librte_vhost/' and
'trans_virtio_vhost_user.c' in 'drivers/virtio_vhost_user/' because these
files should be similar in provided functionality, hence, should be located
in similar places.

> 
> 5. made use of the virtio PCI capabilities for the additional device
>    resources (doorbells, shared memory). This required changes in
>    virtio_pci.c and trans_virtio_vhost_user.c.
> 
> 6. [minor] changed some commit headlines to comply with
>    check-git-log.sh.
> 
> Please, have a look and let me know about your thoughts. Any
> reviews/pointers/suggestions are welcome.
> 
> Best regards,
> Nikos
> 
> [1] http://mails.dpdk.org/archives/dev/2018-January/088155.html
> 
> 
> Nikos Dragazis (23):
>   vhost: introduce vhost transport operations structure
>   vhost: move socket management code
>   vhost: move socket fd and un sockaddr
>   vhost: move vhost-user connection
>   vhost: move vhost-user reconnection
>   vhost: move vhost-user fdset
>   vhost: propagate vhost transport operations
>   vhost: use a single structure for the device state
>   vhost: extract socket I/O into transport
>   vhost: move slave request fd and lock
>   vhost: move mmap/munmap
>   vhost: move setup of the log memory region
>   vhost: remove main fd parameter from msg handlers
>   vhost: move postcopy live migration code
>   vhost: support registering additional vhost-user transports
>   drivers/virtio_vhost_user: add virtio PCI framework
>   drivers: add virtio-vhost-user transport
>   drivers/virtio_vhost_user: use additional device resources
>   vhost: add flag for choosing vhost-user transport
>   net/vhost: add virtio-vhost-user support
>   mk: link apps with virtio-vhost-user driver
>   config: add option for the virtio-vhost-user transport
>   usertools: add virtio-vhost-user devices to dpdk-devbind.py
> 
> Stefan Hajnoczi (5):
>   vhost: allocate per-socket transport state
>   vhost: move start server/client calls
>   vhost: add index field in vhost virtqueues
>   examples/vhost_scsi: add --socket-file argument
>   examples/vhost_scsi: add virtio-vhost-user support
> 
>  config/common_base                                 |    6 +
>  config/common_linux                                |    1 +
>  drivers/Makefile                                   |    5 +
>  drivers/net/vhost/rte_eth_vhost.c                  |   13 +
>  drivers/virtio_vhost_user/Makefile                 |   27 +
>  .../rte_virtio_vhost_user_version.map              |    4 +
>  .../virtio_vhost_user/trans_virtio_vhost_user.c    | 1077 +++++++++++++++++++
>  drivers/virtio_vhost_user/virtio_pci.c             |  520 ++++++++++
>  drivers/virtio_vhost_user/virtio_pci.h             |  289 ++++++
>  drivers/virtio_vhost_user/virtio_vhost_user.h      |   18 +
>  drivers/virtio_vhost_user/virtqueue.h              |  181 ++++
>  examples/vhost_scsi/vhost_scsi.c                   |  103 +-
>  lib/librte_vhost/Makefile                          |    4 +-
>  lib/librte_vhost/rte_vhost.h                       |    1 +
>  lib/librte_vhost/rte_vhost_version.map             |   11 +
>  lib/librte_vhost/socket.c                          |  685 +-----------
>  lib/librte_vhost/trans_af_unix.c                   | 1094 ++++++++++++++++++++
>  lib/librte_vhost/vhost.c                           |   22 +-
>  lib/librte_vhost/vhost.h                           |  298 +++++-
>  lib/librte_vhost/vhost_user.c                      |  474 ++-------
>  lib/librte_vhost/vhost_user.h                      |   10 +-
>  mk/rte.app.mk                                      |    6 +
>  usertools/dpdk-devbind.py                          |    7 +
>  23 files changed, 3764 insertions(+), 1092 deletions(-)
>  create mode 100644 drivers/virtio_vhost_user/Makefile
>  create mode 100644 drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map
>  create mode 100644 drivers/virtio_vhost_user/trans_virtio_vhost_user.c
>  create mode 100644 drivers/virtio_vhost_user/virtio_pci.c
>  create mode 100644 drivers/virtio_vhost_user/virtio_pci.h
>  create mode 100644 drivers/virtio_vhost_user/virtio_vhost_user.h
>  create mode 100644 drivers/virtio_vhost_user/virtqueue.h
>  create mode 100644 lib/librte_vhost/trans_af_unix.c
> 


* Re: [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport
  2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
                   ` (28 preceding siblings ...)
       [not found] ` <CGME20190620113240eucas1p22ca4faa64a36bbb7aec38a81298ade56@eucas1p2.samsung.com>
@ 2019-06-20 11:35 ` Maxime Coquelin
  2019-06-22 20:26   ` Nikos Dragazis
  29 siblings, 1 reply; 40+ messages in thread
From: Maxime Coquelin @ 2019-06-20 11:35 UTC (permalink / raw)
  To: Nikos Dragazis, dev
  Cc: Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang,
	Stojaczyk Dariusz, Vangelis Koukis

Hi Nikos,

On 6/19/19 5:14 PM, Nikos Dragazis wrote:
> Hi everyone,
> 
> this patch series introduces the concept of the virtio-vhost-user
> transport. This is actually a revised version of an earlier RFC
> implementation that has been proposed by Stefan Hajnoczi [1]. Though
> this is a great feature, it seems to have been stalled, so I’d like to
> restart the conversation on this and hopefully get it merged with your
> help. Let me give you an overview.

Thanks for taking over the series!

I think you are already aware of that, but it arrives too late to be
considered for v19.08, as the proposal deadline passed almost 3 weeks
ago.

That said, it is good that you sent it early, so that we can work to
make it in for v19.11.

> The virtio-vhost-user transport is a vhost-user transport implementation
> that is based on the virtio-vhost-user device. Its key difference with
> the existing transport is that it allows deploying vhost-user targets
> inside dedicated Storage Appliance VMs instead of host user space. In
> other words, it allows having guests that act as vhost-user backends for
> other guests.
> 
> The virtio-vhost-user device implements the vhost-user control plane
> (master-slave communication) as follows:
> 
> 1. it parses the vhost-user messages from the vhost-user unix domain
>     socket and forwards them to the slave guest through virtqueues
> 
> 2. it maps the vhost memory regions in QEMU’s process address space and
>     exposes them to the slave guest as a RAM-backed PCI MMIO region
> 
> 3. it hooks up doorbells to the callfds. The slave guest can use these
>     doorbells to interrupt the master guest driver
> 
> The device code has not yet been merged into upstream QEMU, but this is
> definitely the end goal.

Could you provide a pointer to the QEMU series, and instructions to test
this new device?

> The current state is that we are awaiting for
> the approval of the virtio spec.

Ditto, a link to the spec patches would be useful.

> I have Cced Darek from the SPDK community who has helped me a lot by
> reviewing this series. Note that any device type could be implemented
> over this new transport. So, adding the virtio-vhost-user transport in
> DPDK would allow using it from SPDK as well.
> 
> Getting into the code internals, this patch series makes the following
> changes:
> 
> 1. introduce a generic interface for the transport-specific operations.
>     Each of the two available transports, the pre-existing AF_UNIX
>     transport and the virtio-vhost-user transport, is going to implement
>     this interface. The AF_UNIX-specific code has been extracted from the
>     core vhost-user code and is now part of the AF_UNIX transport
>     implementation in trans_af_unix.c.
> 
> 2. introduce the virtio-vhost-user transport. The virtio-vhost-user
>     transport requires a driver for the virtio-vhost-user devices. The
>     driver along with the transport implementation have been packed into
>     a separate library in `drivers/virtio_vhost_user/`. The necessary
>     virtio-pci code has been copied from `drivers/net/virtio/`. Some
>     additional changes have been made so that the driver can utilize the
>     additional resources of the virtio-vhost-user device.
> 
> 3. update librte_vhost public API to enable choosing transport for each
>     new vhost device. Extend the vhost net driver and vhost-scsi example
>     application to export this new API to the end user.
> 
> The primary changes I did to Stefan’s RFC implementation are the
> following:
> 
> 1. moved postcopy live migration code into trans_af_unix.c. Postcopy
>     live migration relies on the userfault fd mechanism, which cannot be
>     supported by virtio-vhost-user.
> 
> 2. moved setup of the log memory region into trans_af_unix.c. Setting up
>     the log memory region involves mapping/unmapping guest memory. This
>     is an AF_UNIX transport-specific operation.
> 
> 3. introduced a vhost transport operation for
>     process_slave_message_reply()
> 
> 4. moved the virtio-vhost-user transport/driver into a separate library
>     in `drivers/virtio_vhost_user/`. This required making vhost.h and
>     vhost_user.h part of librte_vhost public API and exporting some
>     private symbols via the version script. This looks better to me that
>     just moving the entire librte_vhost into `drivers/`. I am not sure if
>     this is the most appropriate solution. I am looking forward to your
>     suggestions on this.

I'm not sure this is the right place to put it.

> 5. made use of the virtio PCI capabilities for the additional device
>     resources (doorbells, shared memory). This required changes in
>     virtio_pci.c and trans_virtio_vhost_user.c.
> 
> 6. [minor] changed some commit headlines to comply with
>     check-git-log.sh.
> 
> Please, have a look and let me know about your thoughts. Any
> reviews/pointers/suggestions are welcome.

Maxime

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure
  2019-06-19 20:14   ` Aaron Conole
  2019-06-20 10:30     ` Bruce Richardson
@ 2019-06-20 18:19     ` Nikos Dragazis
  1 sibling, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-20 18:19 UTC (permalink / raw)
  To: Aaron Conole
  Cc: dev, Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

On 19/6/19 11:14 PM, Aaron Conole wrote:
> Nikos Dragazis <ndragazis@arrikto.com> writes:
>
>> This is the first of a series of patches, whose purpose is to add
>> support for the virtio-vhost-user transport. This is a vhost-user
>> transport implementation that is different from the default AF_UNIX
>> transport. It uses the virtio-vhost-user PCI device in order to tunnel
>> vhost-user protocol messages over virtio. This lets guests act as vhost
>> device backends for other guests.
>>
>> File descriptor passing is specific to the AF_UNIX vhost-user protocol
>> transport.  In order to add support for additional transports, it is
>> necessary to extract transport-specific code from the main vhost-user
>> code.
>>
>> This patch introduces struct vhost_transport_ops and associates each
>> device with a transport.  Core vhost-user code calls into
>> vhost_transport_ops to perform transport-specific operations.
>>
>> Notifying callfd is a transport-specific operation, so it belongs to
>> trans_af_unix.c.
>>
>> Several more patches follow this one to complete the task of moving
>> AF_UNIX transport code out of core vhost-user code.
>>
>> Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
> You'll need to also accommodate the meson build - probably with
> something like:
>
> diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> index 3090bbe08..81b70683b 100644
> --- a/lib/librte_vhost/meson.build
> +++ b/lib/librte_vhost/meson.build
> @@ -14,6 +14,6 @@ allow_experimental_apis = true
>  cflags += '-fno-strict-aliasing'
>  sources = files('fd_man.c', 'iotlb.c', 'socket.c', 'vdpa.c',
>                 'vhost.c', 'vhost_user.c',
> -               'virtio_net.c', 'vhost_crypto.c')
> +               'virtio_net.c', 'vhost_crypto.c', 'trans_af_unix.c')
>  headers = files('rte_vhost.h', 'rte_vdpa.h', 'rte_vhost_crypto.h')
>  deps += ['ethdev', 'cryptodev', 'hash', 'pci']

Thanks for the pointer. I will incorporate the change in the second
version of the patch series along with any other potential comments.

>
>
>>  lib/librte_vhost/Makefile        |  2 +-
>>  lib/librte_vhost/trans_af_unix.c | 20 ++++++++++++++++++++
>>  lib/librte_vhost/vhost.c         |  1 +
>>  lib/librte_vhost/vhost.h         | 34 +++++++++++++++++++++++++++++-----
>>  4 files changed, 51 insertions(+), 6 deletions(-)
>>  create mode 100644 lib/librte_vhost/trans_af_unix.c
>>
>> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
>> index 8623e91..5ff5fb2 100644
>> --- a/lib/librte_vhost/Makefile
>> +++ b/lib/librte_vhost/Makefile
>> @@ -23,7 +23,7 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net
>>  
>>  # all source are stored in SRCS-y
>>  SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
>> -					vhost_user.c virtio_net.c vdpa.c
>> +					vhost_user.c virtio_net.c vdpa.c trans_af_unix.c
>>  
>>  # install includes
>>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
>> diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
>> new file mode 100644
>> index 0000000..3f0c308
>> --- /dev/null
>> +++ b/lib/librte_vhost/trans_af_unix.c
>> @@ -0,0 +1,20 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2010-2018 Intel Corporation
>> + * Copyright(c) 2017 Red Hat, Inc.
>> + * Copyright(c) 2019 Arrikto Inc.
>> + */
>> +
>> +#include "vhost.h"
>> +
>> +static int
>> +af_unix_vring_call(struct virtio_net *dev __rte_unused,
>> +		   struct vhost_virtqueue *vq)
>> +{
>> +	if (vq->callfd >= 0)
>> +		eventfd_write(vq->callfd, (eventfd_t)1);
>> +	return 0;
>> +}
>> +
>> +const struct vhost_transport_ops af_unix_trans_ops = {
>> +	.vring_call = af_unix_vring_call,
>> +};
>> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
>> index 981837b..a36bc01 100644
>> --- a/lib/librte_vhost/vhost.c
>> +++ b/lib/librte_vhost/vhost.c
>> @@ -507,6 +507,7 @@ vhost_new_device(void)
>>  	dev->vid = i;
>>  	dev->flags = VIRTIO_DEV_BUILTIN_VIRTIO_NET;
>>  	dev->slave_req_fd = -1;
>> +	dev->trans_ops = &af_unix_trans_ops;
>>  	dev->vdpa_dev_id = -1;
>>  	dev->postcopy_ufd = -1;
>>  	rte_spinlock_init(&dev->slave_req_lock);
>> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
>> index 884befa..077f213 100644
>> --- a/lib/librte_vhost/vhost.h
>> +++ b/lib/librte_vhost/vhost.h
>> @@ -286,6 +286,30 @@ struct guest_page {
>>  	uint64_t size;
>>  };
>>  
>> +struct virtio_net;
>> +
>> +/**
>> + * A structure containing function pointers for transport-specific operations.
>> + */
>> +struct vhost_transport_ops {
>> +	/**
>> +	 * Notify the guest that used descriptors have been added to the vring.
>> +	 * The VRING_AVAIL_F_NO_INTERRUPT flag and event idx have already been checked
>> +	 * so this function just needs to perform the notification.
>> +	 *
>> +	 * @param dev
>> +	 *  vhost device
>> +	 * @param vq
>> +	 *  vhost virtqueue
>> +	 * @return
>> +	 *  0 on success, -1 on failure
>> +	 */
>> +	int (*vring_call)(struct virtio_net *dev, struct vhost_virtqueue *vq);
>> +};
>> +
>> +/** The traditional AF_UNIX vhost-user protocol transport. */
>> +extern const struct vhost_transport_ops af_unix_trans_ops;
>> +
>>  /**
>>   * Device structure contains all configuration information relating
>>   * to the device.
>> @@ -312,6 +336,7 @@ struct virtio_net {
>>  	uint16_t		mtu;
>>  
>>  	struct vhost_device_ops const *notify_ops;
>> +	struct vhost_transport_ops const *trans_ops;
>>  
>>  	uint32_t		nr_guest_pages;
>>  	uint32_t		max_guest_pages;
>> @@ -544,12 +569,11 @@ vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
>>  		if ((vhost_need_event(vhost_used_event(vq), new, old) &&
>>  					(vq->callfd >= 0)) ||
>>  				unlikely(!signalled_used_valid))
>> -			eventfd_write(vq->callfd, (eventfd_t) 1);
>> +			dev->trans_ops->vring_call(dev, vq);
>>  	} else {
>>  		/* Kick the guest if necessary. */
>> -		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
>> -				&& (vq->callfd >= 0))
>> -			eventfd_write(vq->callfd, (eventfd_t)1);
>> +		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>> +			dev->trans_ops->vring_call(dev, vq);
>>  	}
>>  }
>>  
>> @@ -601,7 +625,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
>>  		kick = true;
>>  kick:
>>  	if (kick)
>> -		eventfd_write(vq->callfd, (eventfd_t)1);
>> +		dev->trans_ops->vring_call(dev, vq);
>>  }
>>  
>>  static __rte_always_inline void
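
For readers following along, the dispatch pattern that this patch
introduces can be sketched in isolation as below. This is only a
minimal illustration and not the code from the series: every `demo_*`
name is hypothetical, the structures are simplified stand-ins for
`struct virtio_net` and `struct vhost_virtqueue`, the AF_UNIX callback
counts kicks instead of calling eventfd_write(), and writing the vring
index to a doorbell in the second transport is an assumption about how
a virtio-vhost-user callback could look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_dev;

/* Simplified stand-in for struct vhost_virtqueue (hypothetical). */
struct demo_vq {
	int callfd;		/* eventfd used by the AF_UNIX transport */
	uint16_t index;		/* vring index */
	unsigned int kicks;	/* counts notifications, for the demo only */
};

/* Mirrors the shape of struct vhost_transport_ops from the patch. */
struct demo_transport_ops {
	int (*vring_call)(struct demo_dev *dev, struct demo_vq *vq);
};

/* Simplified stand-in for struct virtio_net (hypothetical). */
struct demo_dev {
	const struct demo_transport_ops *trans_ops;
	volatile uint16_t *doorbell;	/* assumed doorbell register */
};

/* AF_UNIX: the callfd check lives in the transport, as in the patch. */
static int
demo_af_unix_vring_call(struct demo_dev *dev, struct demo_vq *vq)
{
	(void)dev;
	if (vq->callfd >= 0)
		vq->kicks++;	/* real code: eventfd_write(vq->callfd, 1) */
	return 0;
}

/* Assumed second transport: notify by writing the vring index. */
static int
demo_vvu_vring_call(struct demo_dev *dev, struct demo_vq *vq)
{
	if (dev->doorbell == NULL)
		return -1;
	*dev->doorbell = vq->index;
	vq->kicks++;
	return 0;
}

static const struct demo_transport_ops demo_af_unix_ops = {
	.vring_call = demo_af_unix_vring_call,
};

static const struct demo_transport_ops demo_vvu_ops = {
	.vring_call = demo_vvu_vring_call,
};

/* Core code no longer knows which transport it is talking to. */
static int
demo_kick(struct demo_dev *dev, struct demo_vq *vq)
{
	return dev->trans_ops->vring_call(dev, vq);
}
```

The point of the indirection is that the core kick path only decides
whether a notification is needed, while the transport decides how it is
delivered.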


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure
  2019-06-20 10:30     ` Bruce Richardson
@ 2019-06-20 18:24       ` Nikos Dragazis
  0 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-20 18:24 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Aaron Conole, dev, Maxime Coquelin, Tiwei Bie, Zhihong Wang,
	Stefan Hajnoczi, Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

On 20/6/19 1:30 PM, Bruce Richardson wrote:
> On Wed, Jun 19, 2019 at 04:14:09PM -0400, Aaron Conole wrote:
>> Nikos Dragazis <ndragazis@arrikto.com> writes:
>>
>>> This is the first of a series of patches, whose purpose is to add
>>> support for the virtio-vhost-user transport. This is a vhost-user
>>> transport implementation that is different from the default AF_UNIX
>>> transport. It uses the virtio-vhost-user PCI device in order to tunnel
>>> vhost-user protocol messages over virtio. This lets guests act as vhost
>>> device backends for other guests.
>>>
>>> File descriptor passing is specific to the AF_UNIX vhost-user protocol
>>> transport.  In order to add support for additional transports, it is
>>> necessary to extract transport-specific code from the main vhost-user
>>> code.
>>>
>>> This patch introduces struct vhost_transport_ops and associates each
>>> device with a transport.  Core vhost-user code calls into
>>> vhost_transport_ops to perform transport-specific operations.
>>>
>>> Notifying callfd is a transport-specific operation, so it belongs to
>>> trans_af_unix.c.
>>>
>>> Several more patches follow this one to complete the task of moving
>>> AF_UNIX transport code out of core vhost-user code.
>>>
>>> Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>> You'll need to also accommodate the meson build - probably with
>> something like:
>>
>> diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
>> index 3090bbe08..81b70683b 100644
>> --- a/lib/librte_vhost/meson.build
>> +++ b/lib/librte_vhost/meson.build
>> @@ -14,6 +14,6 @@ allow_experimental_apis = true
>>  cflags += '-fno-strict-aliasing'
>>  sources = files('fd_man.c', 'iotlb.c', 'socket.c', 'vdpa.c',
>>                 'vhost.c', 'vhost_user.c',
>> -               'virtio_net.c', 'vhost_crypto.c')
>> +               'virtio_net.c', 'vhost_crypto.c', 'trans_af_unix.c')
>>  headers = files('rte_vhost.h', 'rte_vdpa.h', 'rte_vhost_crypto.h')
>>  deps += ['ethdev', 'cryptodev', 'hash', 'pci']
>>
>>
> Yep, except I think we should try and keep the files in alphabetical order,
> with only a couple of entries per line [so place trans_af_unix.c on a new
> line with vdpa.c].

Ack

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport
  2019-06-20 11:32   ` [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Ilya Maximets
@ 2019-06-20 23:44     ` Nikos Dragazis
  0 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-20 23:44 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: dev, Maxime Coquelin, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi,
	Wei Wang, Stojaczyk Dariusz, Vangelis Koukis

On 20/6/19 2:32 PM, Ilya Maximets wrote:
> On 19.06.2019 18:14, Nikos Dragazis wrote:
>> Hi everyone,
> Hi. I didn't look at the code, just a few comments inline.
>
>> this patch series introduces the concept of the virtio-vhost-user
>> transport. This is actually a revised version of an earlier RFC
>> implementation that has been proposed by Stefan Hajnoczi [1]. Though
>> this is a great feature, it seems to have been stalled, so I’d like to
>> restart the conversation on this and hopefully get it merged with your
>> help. Let me give you an overview.
>>
>> The virtio-vhost-user transport is a vhost-user transport implementation
>> that is based on the virtio-vhost-user device. Its key difference with
>> the existing transport is that it allows deploying vhost-user targets
>> inside dedicated Storage Appliance VMs instead of host user space. In
>> other words, it allows having guests that act as vhost-user backends for
>> other guests.
>>
>> The virtio-vhost-user device implements the vhost-user control plane
>> (master-slave communication) as follows:
>>
>> 1. it parses the vhost-user messages from the vhost-user unix domain
>>    socket and forwards them to the slave guest through virtqueues
>>
>> 2. it maps the vhost memory regions in QEMU’s process address space and
>>    exposes them to the slave guest as a RAM-backed PCI MMIO region
>>
>> 3. it hooks up doorbells to the callfds. The slave guest can use these
>>    doorbells to interrupt the master guest driver
>>
>> The device code has not yet been merged into upstream QEMU, but this is
>> definitely the end goal. The current state is that we are awaiting for
>> the approval of the virtio spec.
>>
>> I have Cced Darek from the SPDK community who has helped me a lot by
>> reviewing this series. Note that any device type could be implemented
>> over this new transport. So, adding the virtio-vhost-user transport in
>> DPDK would allow using it from SPDK as well.
>>
>> Getting into the code internals, this patch series makes the following
>> changes:
>>
>> 1. introduce a generic interface for the transport-specific operations.
>>    Each of the two available transports, the pre-existing AF_UNIX
>>    transport and the virtio-vhost-user transport, is going to implement
>>    this interface. The AF_UNIX-specific code has been extracted from the
>>    core vhost-user code and is now part of the AF_UNIX transport
>>    implementation in trans_af_unix.c.
>>
>> 2. introduce the virtio-vhost-user transport. The virtio-vhost-user
>>    transport requires a driver for the virtio-vhost-user devices. The
>>    driver along with the transport implementation have been packed into
>>    a separate library in `drivers/virtio_vhost_user/`. The necessary
>>    virtio-pci code has been copied from `drivers/net/virtio/`. Some
>>    additional changes have been made so that the driver can utilize the
>>    additional resources of the virtio-vhost-user device.
>>
>> 3. update librte_vhost public API to enable choosing transport for each
>>    new vhost device. Extend the vhost net driver and vhost-scsi example
>>    application to export this new API to the end user.
>>
>> The primary changes I did to Stefan’s RFC implementation are the
>> following:
>>
>> 1. moved postcopy live migration code into trans_af_unix.c. Postcopy
>>    live migration relies on the userfault fd mechanism, which cannot be
>>    supported by virtio-vhost-user.
>>
>> 2. moved setup of the log memory region into trans_af_unix.c. Setting up
>>    the log memory region involves mapping/unmapping guest memory. This
>>    is an AF_UNIX transport-specific operation.
> Logging dirty pages is the main concept of live migration support. Does it
> mean that the live migration is not supported for virtio-vhost-user at all?

No, it is supported. To be more precise, it can be supported, since it
is part of the ongoing virtio device specification:

https://lists.oasis-open.org/archives/virtio-dev/201905/msg00022.html

and it is on my TODO list for the device code. Here is how it works:

The log memory region, just like the other vhost memory regions, is a
portion of the master guest memory. In the case of the AF_UNIX
transport, the master sends a VHOST_USER_SET_LOG_BASE message and the
vhost target mmaps the log memory region into its own virtual address
space. In the case of the virtio-vhost-user transport, the
virtio-vhost-user device parses the VHOST_USER_SET_LOG_BASE message and
maps the log memory region into QEMU's process address space. It then
exposes the log memory region to the slave guest as a RAM-backed PCI
MMIO region.

So, from the vhost target's viewpoint, the only difference is the means
of accessing the log memory region. In the case of the AF_UNIX
transport, the vhost target receives a file descriptor and mmaps this
file. In the case of the virtio-vhost-user transport, the device exports
the log memory region to the vhost target (running in slave guest user
space) via an MMIO BAR.

To recap, just to make sure it is clear, the virtio-vhost-user transport
does support precopy live migration, but it doesn't support postcopy
live migration.
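
To illustrate why the dirty logging itself does not care about the
transport, here is a rough sketch of marking pages dirty through a plain
pointer to the log region. It assumes the bitmap layout described in the
vhost-user protocol (one bit per 4096-byte page). The name
`demo_log_dirty` and its parameters are hypothetical and not part of
librte_vhost, and the real code would need atomic updates, which are
left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOG_PAGE_SIZE 4096u	/* one log bit per 4 KiB page */

/*
 * Mark the pages covering [gpa, gpa + len) as dirty.
 * log_base may come from mmap() of a received fd (AF_UNIX) or from a
 * mapped MMIO BAR (virtio-vhost-user); the helper only sees a pointer.
 * Hypothetical helper: not atomic, unlike a real implementation.
 */
static void
demo_log_dirty(uint8_t *log_base, size_t log_size, uint64_t gpa, uint64_t len)
{
	uint64_t page, last;

	assert(log_base != NULL);
	if (len == 0)
		return;

	last = (gpa + len - 1) / DEMO_LOG_PAGE_SIZE;
	for (page = gpa / DEMO_LOG_PAGE_SIZE; page <= last; page++) {
		if (page / 8 >= log_size)
			break;	/* range falls outside the log region */
		log_base[page / 8] |= (uint8_t)(1u << (page % 8));
	}
}
```

Only the code that obtains `log_base` would differ between the two
transports, which is why the setup of the log region is what moved into
trans_af_unix.c.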

>
>> 3. introduced a vhost transport operation for
>>    process_slave_message_reply()
>>
>> 4. moved the virtio-vhost-user transport/driver into a separate library
>>    in `drivers/virtio_vhost_user/`. This required making vhost.h and
>>    vhost_user.h part of librte_vhost public API and exporting some
>>    private symbols via the version script. This looks better to me that
>>    just moving the entire librte_vhost into `drivers/`. I am not sure if
>>    this is the most appropriate solution. I am looking forward to your
>>    suggestions on this.
> Moving the virtio-vhost-user code to a separate driver looks strange for me.
> What is the purpose?

The virtio-vhost-user transport is based on the virtio-vhost-user
device. This means that we need a user space driver for this device
type.  Currently, the driver and the transport implementation are packed
together into `trans_virtio_vhost_user.c`. There are 2 reasons why I
moved all the virtio-vhost-user code (driver + transport implementation)
into `drivers/`:

1. my understanding is that all drivers should be located in `drivers/`.

2. the driver depends on `rte_bus_pci.h`, which gets exported when
   `drivers/` is built, whereas librte_vhost is built before `drivers/`.
   However, I think this could be overcome by just adding this line:

   CFLAGS += -I$(RTE_SDK)/drivers/bus/pci

   in the librte_vhost Makefile.

>
> Exporting a lot of vhost internal structures will lead to a frequent API/ABI
> breakages and will slow down accepting changes to releases in general.

I understand that there are many parameters that we have to consider. My
implementation is just a proposal.  I look forward to hearing any other
suggestions on this from you and the community.

>
> It looks inconsistent to have 'trans_af_unix.c' in 'lib/librte_vhost/' and
> 'trans_virtio_vhost_user.c' in 'drivers/virtio_vhost_user/' because these
> files should be similar in provided functionality, hence, should be located
> in similar places.

I agree. Actually, I was thinking about separating the driver from the
transport implementation and moving just the driver into `drivers/`. But
this would require exporting some low-level virtio PCI functions. And
this didn't look appropriate to me.

>
>> 5. made use of the virtio PCI capabilities for the additional device
>>    resources (doorbells, shared memory). This required changes in
>>    virtio_pci.c and trans_virtio_vhost_user.c.
>>
>> 6. [minor] changed some commit headlines to comply with
>>    check-git-log.sh.
>>
>> Please, have a look and let me know about your thoughts. Any
>> reviews/pointers/suggestions are welcome.
>>
>> Best regards,
>> Nikos
>>
>> [1] http://mails.dpdk.org/archives/dev/2018-January/088155.html
>>
>>
>> Nikos Dragazis (23):
>>   vhost: introduce vhost transport operations structure
>>   vhost: move socket management code
>>   vhost: move socket fd and un sockaddr
>>   vhost: move vhost-user connection
>>   vhost: move vhost-user reconnection
>>   vhost: move vhost-user fdset
>>   vhost: propagate vhost transport operations
>>   vhost: use a single structure for the device state
>>   vhost: extract socket I/O into transport
>>   vhost: move slave request fd and lock
>>   vhost: move mmap/munmap
>>   vhost: move setup of the log memory region
>>   vhost: remove main fd parameter from msg handlers
>>   vhost: move postcopy live migration code
>>   vhost: support registering additional vhost-user transports
>>   drivers/virtio_vhost_user: add virtio PCI framework
>>   drivers: add virtio-vhost-user transport
>>   drivers/virtio_vhost_user: use additional device resources
>>   vhost: add flag for choosing vhost-user transport
>>   net/vhost: add virtio-vhost-user support
>>   mk: link apps with virtio-vhost-user driver
>>   config: add option for the virtio-vhost-user transport
>>   usertools: add virtio-vhost-user devices to dpdk-devbind.py
>>
>> Stefan Hajnoczi (5):
>>   vhost: allocate per-socket transport state
>>   vhost: move start server/client calls
>>   vhost: add index field in vhost virtqueues
>>   examples/vhost_scsi: add --socket-file argument
>>   examples/vhost_scsi: add virtio-vhost-user support
>>
>>  config/common_base                                 |    6 +
>>  config/common_linux                                |    1 +
>>  drivers/Makefile                                   |    5 +
>>  drivers/net/vhost/rte_eth_vhost.c                  |   13 +
>>  drivers/virtio_vhost_user/Makefile                 |   27 +
>>  .../rte_virtio_vhost_user_version.map              |    4 +
>>  .../virtio_vhost_user/trans_virtio_vhost_user.c    | 1077 +++++++++++++++++++
>>  drivers/virtio_vhost_user/virtio_pci.c             |  520 ++++++++++
>>  drivers/virtio_vhost_user/virtio_pci.h             |  289 ++++++
>>  drivers/virtio_vhost_user/virtio_vhost_user.h      |   18 +
>>  drivers/virtio_vhost_user/virtqueue.h              |  181 ++++
>>  examples/vhost_scsi/vhost_scsi.c                   |  103 +-
>>  lib/librte_vhost/Makefile                          |    4 +-
>>  lib/librte_vhost/rte_vhost.h                       |    1 +
>>  lib/librte_vhost/rte_vhost_version.map             |   11 +
>>  lib/librte_vhost/socket.c                          |  685 +-----------
>>  lib/librte_vhost/trans_af_unix.c                   | 1094 ++++++++++++++++++++
>>  lib/librte_vhost/vhost.c                           |   22 +-
>>  lib/librte_vhost/vhost.h                           |  298 +++++-
>>  lib/librte_vhost/vhost_user.c                      |  474 ++-------
>>  lib/librte_vhost/vhost_user.h                      |   10 +-
>>  mk/rte.app.mk                                      |    6 +
>>  usertools/dpdk-devbind.py                          |    7 +
>>  23 files changed, 3764 insertions(+), 1092 deletions(-)
>>  create mode 100644 drivers/virtio_vhost_user/Makefile
>>  create mode 100644 drivers/virtio_vhost_user/rte_virtio_vhost_user_version.map
>>  create mode 100644 drivers/virtio_vhost_user/trans_virtio_vhost_user.c
>>  create mode 100644 drivers/virtio_vhost_user/virtio_pci.c
>>  create mode 100644 drivers/virtio_vhost_user/virtio_pci.h
>>  create mode 100644 drivers/virtio_vhost_user/virtio_vhost_user.h
>>  create mode 100644 drivers/virtio_vhost_user/virtqueue.h
>>  create mode 100644 lib/librte_vhost/trans_af_unix.c
>>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport
  2019-06-20 11:35 ` Maxime Coquelin
@ 2019-06-22 20:26   ` Nikos Dragazis
  0 siblings, 0 replies; 40+ messages in thread
From: Nikos Dragazis @ 2019-06-22 20:26 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang,
	Stojaczyk Dariusz, Vangelis Koukis

On 20/6/19 2:35 PM, Maxime Coquelin wrote:
> Hi Nikos,
>
> On 6/19/19 5:14 PM, Nikos Dragazis wrote:
>> Hi everyone,
>>
>> this patch series introduces the concept of the virtio-vhost-user
>> transport. This is actually a revised version of an earlier RFC
>> implementation that has been proposed by Stefan Hajnoczi [1]. Though
>> this is a great feature, it seems to have been stalled, so I’d like to
>> restart the conversation on this and hopefully get it merged with your
>> help. Let me give you an overview.
>
> Thanks for taking over the series!
>
> I think you are already aware of that, but it arrives too late to
> consider it for v19.08, as the proposal deadline is over by almost 3
> weeks.
>
> That said, it is good that you sent it early, so that we can work to
> make it in for v19.11.

That's totally fine.

>
>> The virtio-vhost-user transport is a vhost-user transport implementation
>> that is based on the virtio-vhost-user device. Its key difference with
>> the existing transport is that it allows deploying vhost-user targets
>> inside dedicated Storage Appliance VMs instead of host user space. In
>> other words, it allows having guests that act as vhost-user backends for
>> other guests.
>>
>> The virtio-vhost-user device implements the vhost-user control plane
>> (master-slave communication) as follows:
>>
>> 1. it parses the vhost-user messages from the vhost-user unix domain
>>     socket and forwards them to the slave guest through virtqueues
>>
>> 2. it maps the vhost memory regions in QEMU’s process address space and
>>     exposes them to the slave guest as a RAM-backed PCI MMIO region
>>
>> 3. it hooks up doorbells to the callfds. The slave guest can use these
>>     doorbells to interrupt the master guest driver
>>
>> The device code has not yet been merged into upstream QEMU, but this is
>> definitely the end goal.
>
> Could you provide a pointer to the QEMU series, and instructions to test
> this new device?

Of course. Please have a look at the following step-by-step guide:

https://github.com/ndragazis/ndragazis.github.io/blob/master/dpdk-vhost-vvu-demo.md

If you face any problems or find that something breaks (I hope not :) ),
please let me know. I haven't done thorough testing, so I may have
missed something.  Please also bear in mind that the device code is not
finished. There are still some things that need to be done.

>
>> The current state is that we are awaiting for
>> the approval of the virtio spec.
>
> Ditto, a link to the spec patches would be useful.

You can find the patches here:

https://lists.oasis-open.org/archives/virtio-dev/201905/msg00022.html

and an HTML version is available here:

https://ndragazis.github.io/virtio-v1.1-wd02.html#x1-41000011

>
>> I have Cced Darek from the SPDK community who has helped me a lot by
>> reviewing this series. Note that any device type could be implemented
>> over this new transport. So, adding the virtio-vhost-user transport in
>> DPDK would allow using it from SPDK as well.
>>
>> Getting into the code internals, this patch series makes the following
>> changes:
>>
>> 1. introduce a generic interface for the transport-specific operations.
>>     Each of the two available transports, the pre-existing AF_UNIX
>>     transport and the virtio-vhost-user transport, is going to implement
>>     this interface. The AF_UNIX-specific code has been extracted from the
>>     core vhost-user code and is now part of the AF_UNIX transport
>>     implementation in trans_af_unix.c.
>>
>> 2. introduce the virtio-vhost-user transport. The virtio-vhost-user
>>     transport requires a driver for the virtio-vhost-user devices. The
>>     driver along with the transport implementation have been packed into
>>     a separate library in `drivers/virtio_vhost_user/`. The necessary
>>     virtio-pci code has been copied from `drivers/net/virtio/`. Some
>>     additional changes have been made so that the driver can utilize the
>>     additional resources of the virtio-vhost-user device.
>>
>> 3. update librte_vhost public API to enable choosing transport for each
>>     new vhost device. Extend the vhost net driver and vhost-scsi example
>>     application to export this new API to the end user.
>>
>> The primary changes I did to Stefan’s RFC implementation are the
>> following:
>>
>> 1. moved postcopy live migration code into trans_af_unix.c. Postcopy
>>     live migration relies on the userfault fd mechanism, which cannot be
>>     supported by virtio-vhost-user.
>>
>> 2. moved setup of the log memory region into trans_af_unix.c. Setting up
>>     the log memory region involves mapping/unmapping guest memory. This
>>     is an AF_UNIX transport-specific operation.
>>
>> 3. introduced a vhost transport operation for
>>     process_slave_message_reply()
>>
>> 4. moved the virtio-vhost-user transport/driver into a separate library
>>     in `drivers/virtio_vhost_user/`. This required making vhost.h and
>>     vhost_user.h part of librte_vhost public API and exporting some
>>     private symbols via the version script. This looks better to me that
>>     just moving the entire librte_vhost into `drivers/`. I am not sure if
>>     this is the most appropriate solution. I am looking forward to your
>>     suggestions on this.
>
> I'm not sure this is the right place to put it.

Okay. Is there something specific that you think doesn't fit nicely? Do
you have something else in mind?

>
>> 5. made use of the virtio PCI capabilities for the additional device
>>     resources (doorbells, shared memory). This required changes in
>>     virtio_pci.c and trans_virtio_vhost_user.c.
>>
>> 6. [minor] changed some commit headlines to comply with
>>     check-git-log.sh.
>>
>> Please have a look and let me know your thoughts. Any
>> reviews/pointers/suggestions are welcome.
>
> Maxime
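
To make items 1-3 of the quoted change list more concrete, here is a
minimal sketch of a per-transport operations table. It is not the code
from the series: the struct layout, every name other than
process_slave_message_reply, and the stub bodies are assumptions made
purely for illustration. The idea is that generic vhost code dispatches
through the table, and a transport that cannot support a feature
(postcopy, in the virtio-vhost-user case) leaves the hook NULL.

```c
#include <assert.h>
#include <stddef.h>

struct vhost_dev;

/* Hypothetical operations table; field names are illustrative only. */
struct vhost_transport_ops {
	const char *name;
	/* Consume the master's reply to a slave request. */
	int (*process_slave_message_reply)(struct vhost_dev *dev, int request_id);
	/* Optional hook: NULL when the transport cannot do postcopy. */
	int (*postcopy_advise)(struct vhost_dev *dev);
};

struct vhost_dev {
	const struct vhost_transport_ops *ops; /* chosen per device */
	int last_acked_request;
};

/* AF_UNIX stub: a real transport would read the reply from the socket. */
static int af_unix_reply(struct vhost_dev *dev, int request_id)
{
	dev->last_acked_request = request_id;
	return 0;
}

/* AF_UNIX stub: a real transport would set up userfaultfd here. */
static int af_unix_postcopy_advise(struct vhost_dev *dev)
{
	(void)dev;
	return 0;
}

/* virtio-vhost-user stub: a real transport would use a virtqueue. */
static int vvu_reply(struct vhost_dev *dev, int request_id)
{
	dev->last_acked_request = request_id;
	return 0;
}

const struct vhost_transport_ops af_unix_ops = {
	.name = "af_unix",
	.process_slave_message_reply = af_unix_reply,
	.postcopy_advise = af_unix_postcopy_advise,
};

const struct vhost_transport_ops vvu_ops = {
	.name = "virtio-vhost-user",
	.process_slave_message_reply = vvu_reply,
	.postcopy_advise = NULL, /* no userfaultfd on this transport */
};

/* Transport-agnostic entry points used by the generic vhost code. */
int vhost_process_slave_reply(struct vhost_dev *dev, int request_id)
{
	assert(dev != NULL && dev->ops != NULL);
	return dev->ops->process_slave_message_reply(dev, request_id);
}

int vhost_postcopy_advise(struct vhost_dev *dev)
{
	assert(dev != NULL && dev->ops != NULL);
	if (dev->ops->postcopy_advise == NULL)
		return -1; /* feature not supported by this transport */
	return dev->ops->postcopy_advise(dev);
}
```

A caller would point dev->ops at one of the two tables when the device
is created and then call vhost_process_slave_reply() without caring
which transport sits behind it.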



* Re: [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework
  2019-06-19 15:14 ` [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework Nikos Dragazis
@ 2019-09-05 16:34   ` Maxime Coquelin
  2019-09-09  8:42     ` Nikos Dragazis
  0 siblings, 1 reply; 40+ messages in thread
From: Maxime Coquelin @ 2019-09-05 16:34 UTC (permalink / raw)
  To: Nikos Dragazis, dev
  Cc: Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang,
	Stojaczyk Dariusz, Vangelis Koukis

Hi Nikos,

On 6/19/19 5:14 PM, Nikos Dragazis wrote:
> The virtio-vhost-user transport requires a driver for the
> virtio-vhost-user PCI device, hence it needs a virtio-pci driver.  There
> is currently no librte_virtio API that we can use.
> 
> This commit is a hack that duplicates the virtio pci code from
> drivers/net/ into drivers/virtio_vhost_user/.  A better solution would
> be to extract the code cleanly from drivers/net/ and share it.  Or
> perhaps we could backport SPDK's lib/virtio/.

I think it would make sense to have a Virtio library, that could be re-
used by net, crypto and virtio-vhost-user.

I didn't know about SPDK's lib. Maybe it is better to start from virtio-
net PMD codebase and then convert crypto and SPDK to use it.

What do you think?

> drivers/virtio_vhost_user/ will host the virtio-vhost-user transport
> implementation in the upcoming patches.

Thanks,
Maxime



* Re: [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework
  2019-09-05 16:34   ` Maxime Coquelin
@ 2019-09-09  8:42     ` Nikos Dragazis
  2019-09-09  8:44       ` Maxime Coquelin
  0 siblings, 1 reply; 40+ messages in thread
From: Nikos Dragazis @ 2019-09-09  8:42 UTC (permalink / raw)
  To: Maxime Coquelin, dev
  Cc: Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang,
	Stojaczyk Dariusz, Vangelis Koukis

On 5/9/19 7:34 PM, Maxime Coquelin wrote:
> Hi Nikos,
>
> On 6/19/19 5:14 PM, Nikos Dragazis wrote:
>> The virtio-vhost-user transport requires a driver for the
>> virtio-vhost-user PCI device, hence it needs a virtio-pci driver.  There
>> is currently no librte_virtio API that we can use.
>>
>> This commit is a hack that duplicates the virtio pci code from
>> drivers/net/ into drivers/virtio_vhost_user/.  A better solution would
>> be to extract the code cleanly from drivers/net/ and share it.  Or
>> perhaps we could backport SPDK's lib/virtio/.
> I think it would make sense to have a Virtio library, that could be re-
> used by net, crypto and virtio-vhost-user.
>
> I didn't know about SPDK's lib. Maybe it is better to start from virtio-
> net PMD codebase and then convert crypto and SPDK to use it.
>
> What do you think?
>
>> drivers/virtio_vhost_user/ will host the virtio-vhost-user transport
>> implementation in the upcoming patches.
> Thanks,
> Maxime

Hi Maxime,

thanks for your comments. I agree with you. There is no point in having
duplicated code here and there. A standalone virtio library sounds like
a better approach. I will come back with a proper patchset for this
purpose.

Best regards,
Nikos
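
As an illustration of the kind of helper a standalone virtio library
could share between net, crypto and virtio-vhost-user, here is a
minimal sketch that walks the PCI capability list and picks out a
virtio vendor-specific capability by its cfg_type. It operates on an
in-memory copy of the 256-byte config space rather than on any DPDK
API; the function and struct names are assumptions, and the cfg_type
values for the virtio-vhost-user doorbells and shared memory are not
given in this thread, so the caller passes the type in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS           0x06 /* status register offset */
#define PCI_STATUS_CAP_LIST  0x10 /* capability list present */
#define PCI_CAPABILITY_LIST  0x34 /* pointer to first capability */
#define PCI_CAP_ID_VNDR      0x09 /* vendor-specific capability ID */
#define VIRTIO_PCI_CAP_LEN   16   /* size of struct virtio_pci_cap */

/* Hypothetical result type: where the resource lives. */
struct virtio_cap_info {
	uint8_t bar;
	uint32_t offset;
	uint32_t length;
};

static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * cfg points at a 256-byte snapshot of PCI config space.
 * Returns 0 and fills *out on success, -1 if no capability matches.
 */
int virtio_find_cap(const uint8_t *cfg, uint8_t cfg_type,
		    struct virtio_cap_info *out)
{
	int guard = 48; /* bound the walk in case the list loops */
	uint8_t pos;

	assert(cfg != NULL && out != NULL);
	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return -1;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && guard-- > 0) {
		/* virtio_pci_cap layout: vndr, next, len, cfg_type, bar,
		 * 3 bytes padding, le32 offset, le32 length */
		if (cfg[pos] == PCI_CAP_ID_VNDR &&
		    pos <= 256 - VIRTIO_PCI_CAP_LEN &&
		    cfg[pos + 3] == cfg_type) {
			out->bar = cfg[pos + 4];
			out->offset = read_le32(&cfg[pos + 8]);
			out->length = read_le32(&cfg[pos + 12]);
			return 0;
		}
		pos = cfg[pos + 1] & 0xfc; /* follow the next pointer */
	}
	return -1;
}
```

A driver would call this once per resource it needs and then map the
returned BAR range; the same helper would serve any virtio PCI device,
which is the point of not keeping a copy in each driver.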


* Re: [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework
  2019-09-09  8:42     ` Nikos Dragazis
@ 2019-09-09  8:44       ` Maxime Coquelin
  0 siblings, 0 replies; 40+ messages in thread
From: Maxime Coquelin @ 2019-09-09  8:44 UTC (permalink / raw)
  To: Nikos Dragazis, dev
  Cc: Tiwei Bie, Zhihong Wang, Stefan Hajnoczi, Wei Wang,
	Stojaczyk Dariusz, Vangelis Koukis

Hi Nikos,

On 9/9/19 10:42 AM, Nikos Dragazis wrote:
> On 5/9/19 7:34 PM, Maxime Coquelin wrote:
>> Hi Nikos,
>>
>> On 6/19/19 5:14 PM, Nikos Dragazis wrote:
>>> The virtio-vhost-user transport requires a driver for the
>>> virtio-vhost-user PCI device, hence it needs a virtio-pci driver.  There
>>> is currently no librte_virtio API that we can use.
>>>
>>> This commit is a hack that duplicates the virtio pci code from
>>> drivers/net/ into drivers/virtio_vhost_user/.  A better solution would
>>> be to extract the code cleanly from drivers/net/ and share it.  Or
>>> perhaps we could backport SPDK's lib/virtio/.
>> I think it would make sense to have a Virtio library, that could be re-
>> used by net, crypto and virtio-vhost-user.
>>
>> I didn't know about SPDK's lib. Maybe it is better to start from virtio-
>> net PMD codebase and then convert crypto and SPDK to use it.
>>
>> What do you think?
>>
>>> drivers/virtio_vhost_user/ will host the virtio-vhost-user transport
>>> implementation in the upcoming patches.
>> Thanks,
>> Maxime
> 
> Hi Maxime,
> 
> thanks for your comments. I agree with you. There is no point in having
> duplicated code here and there. A standalone virtio library sounds like
> a better approach. I will come back with a proper patchset for this
> purpose.

Great, I really appreciate the effort.

As you suggest, I agree it is better to have it done in a dedicated
patch set, the virtio-vhost-user series being already quite big :)

Regards,
Maxime

> Best regards,
> Nikos
> 


Thread overview: 40+ messages
2019-06-19 15:14 [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 01/28] vhost: introduce vhost transport operations structure Nikos Dragazis
2019-06-19 20:14   ` Aaron Conole
2019-06-20 10:30     ` Bruce Richardson
2019-06-20 18:24       ` Nikos Dragazis
2019-06-20 18:19     ` Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 02/28] vhost: move socket management code Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 03/28] vhost: allocate per-socket transport state Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 04/28] vhost: move socket fd and un sockaddr Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 05/28] vhost: move start server/client calls Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 06/28] vhost: move vhost-user connection Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 07/28] vhost: move vhost-user reconnection Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 08/28] vhost: move vhost-user fdset Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 09/28] vhost: propagate vhost transport operations Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 10/28] vhost: use a single structure for the device state Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 11/28] vhost: extract socket I/O into transport Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 12/28] vhost: move slave request fd and lock Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 13/28] vhost: move mmap/munmap Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 14/28] vhost: move setup of the log memory region Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 15/28] vhost: remove main fd parameter from msg handlers Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 16/28] vhost: move postcopy live migration code Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 17/28] vhost: support registering additional vhost-user transports Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 18/28] drivers/virtio_vhost_user: add virtio PCI framework Nikos Dragazis
2019-09-05 16:34   ` Maxime Coquelin
2019-09-09  8:42     ` Nikos Dragazis
2019-09-09  8:44       ` Maxime Coquelin
2019-06-19 15:14 ` [dpdk-dev] [PATCH 19/28] vhost: add index field in vhost virtqueues Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 20/28] drivers: add virtio-vhost-user transport Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 21/28] drivers/virtio_vhost_user: use additional device resources Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 22/28] vhost: add flag for choosing vhost-user transport Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 23/28] net/vhost: add virtio-vhost-user support Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 24/28] examples/vhost_scsi: add --socket-file argument Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 25/28] examples/vhost_scsi: add virtio-vhost-user support Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 26/28] mk: link apps with virtio-vhost-user driver Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 27/28] config: add option for the virtio-vhost-user transport Nikos Dragazis
2019-06-19 15:14 ` [dpdk-dev] [PATCH 28/28] usertools: add virtio-vhost-user devices to dpdk-devbind.py Nikos Dragazis
2019-06-20 11:32   ` [dpdk-dev] [PATCH 00/28] vhost: add virtio-vhost-user transport Ilya Maximets
2019-06-20 23:44     ` Nikos Dragazis
2019-06-20 11:35 ` Maxime Coquelin
2019-06-22 20:26   ` Nikos Dragazis
