DPDK patches and discussions
* [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport
@ 2018-01-19 13:44 Stefan Hajnoczi
  2018-01-19 13:44 ` [dpdk-dev] [RFC 01/24] vhost: move vring_call() into trans_af_unix.c Stefan Hajnoczi
                   ` (25 more replies)
  0 siblings, 26 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

This patch series implements the virtio-vhost-user device, which tunnels
vhost-user protocol messages over virtio.  This lets guests act as vhost device
backends for other guests.
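For orientation, every vhost-user message that virtio-vhost-user tunnels starts with a fixed 12-byte header, per the vhost-user protocol specification. The header holds a 32-bit request code, 32-bit flags and a 32-bit payload size. In the flags, the low two bits carry the protocol version, bit 2 marks a reply and bit 3 asks for one. The sketch below is illustrative only and is not taken from this series. `struct vu_hdr` and its helpers are hypothetical names. Fields are copied in machine-native byte order, as the vhost-user specification states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag bits in the vhost-user "flags" header field. */
#define VHOST_USER_VERSION_MASK 0x3u
#define VHOST_USER_VERSION      0x1u
#define VHOST_USER_REPLY_MASK   (0x1u << 2)
#define VHOST_USER_NEED_REPLY   (0x1u << 3)

/* Request codes (only the one used here). */
#define VHOST_USER_GET_FEATURES 1u

#define VU_HDR_SIZE 12

/* Hypothetical model of the fixed header preceding every payload. */
struct vu_hdr {
	uint32_t request; /* message type, e.g. VHOST_USER_GET_FEATURES */
	uint32_t flags;   /* protocol version plus reply bits */
	uint32_t size;    /* number of payload bytes that follow */
};

/* Serialize a header field by field in native byte order. */
static void vu_hdr_encode(const struct vu_hdr *h, uint8_t out[VU_HDR_SIZE])
{
	memcpy(out, &h->request, 4);
	memcpy(out + 4, &h->flags, 4);
	memcpy(out + 8, &h->size, 4);
}

/* Parse a header; returns -1 if the protocol version is not 1. */
static int vu_hdr_decode(const uint8_t in[VU_HDR_SIZE], struct vu_hdr *h)
{
	memcpy(&h->request, in, 4);
	memcpy(&h->flags, in + 4, 4);
	memcpy(&h->size, in + 8, 4);
	if ((h->flags & VHOST_USER_VERSION_MASK) != VHOST_USER_VERSION)
		return -1;
	return 0;
}

/* Build the header of a reply: same request code, reply bit set. */
static struct vu_hdr vu_make_reply(const struct vu_hdr *req,
				   uint32_t payload_size)
{
	struct vu_hdr r = {
		req->request,
		VHOST_USER_VERSION | VHOST_USER_REPLY_MASK,
		payload_size,
	};
	return r;
}
```

A GET_FEATURES reply, for example, would carry an 8-byte payload holding the 64-bit feature mask.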

The virtio-vhost-user device grew out of discussions about Wei Wang and
Zhiyong Yang's vhost-pci device.  This patch series demonstrates that vhost
device backends, such as the vhost vdev driver, can work over both AF_UNIX and
virtio-vhost-user without significant modifications.  This allows a lot of code
to be shared between traditional AF_UNIX vhost-user and virtio-vhost-user.  The
vhost-pci patches duplicated the vhost net device backend and didn't reuse
librte_vhost:

  http://dpdk.org/ml/archives/dev/2017-November/082615.html

User-visible changes
--------------------
The vhost vdev can now be used when DPDK runs inside a guest with a
virtio-vhost-user PCI device:

  --vdev net_vhost0,iface="0000:00:04.0",virtio-transport=1

The vhost-scsi example has also been extended to support virtio-vhost-user:

  ./vhost-scsi ... -- --virtio-vhost-user-pci "0000:00:04.0"

For more information (including instructions for running the code), see
https://wiki.qemu.org/Features/VirtioVhostUser

Virtio device design
--------------------
The virtio-vhost-user device is a new virtio device type.  It acts as a
vhost-user transport and is an alternative to the traditional AF_UNIX
transport.

You can find the virtio-vhost-user VIRTIO device specification here:
https://stefanha.github.io/virtio/vhost-user-slave.html#x1-2830007

librte_vhost API changes
------------------------
This patch series extends librte_vhost so that it now accepts:

  rte_vhost_driver_register("0000:00:04.0",
                            RTE_VHOST_USER_VIRTIO_TRANSPORT);

All other librte_vhost API usage remains unchanged, except that the file
descriptors exposed in some <rte_vhost.h> structs are -1 because
virtio-vhost-user does not pass file descriptors.
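Backends that previously used these descriptors directly therefore need a guard. The sketch below is minimal and assumes a backend that signals the guest by writing `callfd` itself. `struct vring_fds` is a hypothetical stand-in for the `kickfd`/`callfd` fields of `struct rte_vhost_vring`. `notify_guest_if_fd()` is not a librte_vhost function.

```c
#include <assert.h>
#include <sys/eventfd.h>

/* Hypothetical stand-in for the fd fields of struct rte_vhost_vring. */
struct vring_fds {
	int kickfd; /* -1 when the transport does not pass fds */
	int callfd; /* -1 when the transport does not pass fds */
};

/*
 * Signal the guest through callfd only if one exists.
 * Returns 1 if an eventfd write happened, 0 if there is no fd
 * (the caller must use a transport-neutral path instead), -1 on error.
 */
static int notify_guest_if_fd(const struct vring_fds *v)
{
	if (v->callfd < 0)
		return 0;
	return eventfd_write(v->callfd, (eventfd_t)1) == 0 ? 1 : -1;
}
```

Returning 0 instead of failing lets the same backend code run over both transports. How the guest is then notified depends on the transport.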

This makes it extremely easy to support virtio-vhost-user in existing vhost
device backends!  I have extended the vhost vdev driver and the vhost-scsi
example application in this patch series.

Patch series overview
---------------------
This series is based on commit 814339ba7eea13d132508af2cccec2f73568e2d0 from
dpdk-next-virtio/master.  You can also get my git branch here:

  https://github.com/stefanha/dpdk/tree/virtio-vhost-user

The initial patches refactor librte_vhost so that AF_UNIX-specific code is moved
to a new trans_af_unix.c file.  This also introduces a struct
vhost_transport_ops interface that all transports will implement:

  adf412c3c vhost: move vring_call() into trans_af_unix.c
  04079a077 vhost: move AF_UNIX code from socket.c to trans_af_unix.c
  885642091 vhost: allocate per-socket transport state
  9c48377df vhost: move socket_fd and un sockaddr into trans_af_unix.c
  c646e6292 vhost: move start_server/client() calls to trans_af_unix.c
  db07ef7a8 vhost: move vhost_user_connection to trans_af_unix.c
  d10b80163 vhost: move vhost_user_reconnect_init() into trans_af_unix.c
  1258bcd68 vhost: move vhost_user.fdset to trans_af_unix.c
  bc7f6d7ab vhost: pass vhost_transport_ops through vhost_new_device()
  b82187a29 vhost: embed struct virtio_net inside struct vhost_user_connection
  abbd544f2 vhost: extract vhost_user.c socket I/O into transport
  0658d711b vhost: move slave_req_fd field to AF_UNIX transport
  e2ecf78ed vhost: move mmap/munmap to AF_UNIX transport

I was about to add the virtio-vhost-user PCI driver when I realized that
lib/librte_vhost/ cannot have a dependency on rte_bus_pci.h (it lives in
drivers/).  The solution I chose is to move all of librte_vhost to
drivers/librte_vhost/, but I'm open to suggestions if this is undesirable.

  8615c2140 vhost: move librte_vhost to drivers/

Since virtio-vhost-user is a virtio device, the driver must perform the
virtio device initialization sequence and set up virtqueues.  I didn't find a
virtio API in DPDK and didn't have time to create one myself.  I copied the
virtio code from drivers/net/virtio/ as a quick hack, but really there needs to
be a shared librte_virtio.

  c26937c66 vhost: add virtio pci framework
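The initialization steps this framework performs follow the driver sequence in the Device Initialization section of the VIRTIO 1.0 specification. The driver resets the device, sets ACKNOWLEDGE and then DRIVER, and negotiates features. It then sets FEATURES_OK and re-reads the status to confirm it, performs device-specific setup such as virtqueues, and finally sets DRIVER_OK. The sketch runs that sequence against a hypothetical in-memory device model (`struct fake_virtio_dev`) instead of PCI registers. It therefore shows only the ordering and the status-bit values, not the code copied from drivers/net/virtio/.

```c
#include <assert.h>
#include <stdint.h>

/* Device status bits from the VIRTIO 1.0 specification. */
#define VIRTIO_STATUS_ACKNOWLEDGE 1u
#define VIRTIO_STATUS_DRIVER      2u
#define VIRTIO_STATUS_DRIVER_OK   4u
#define VIRTIO_STATUS_FEATURES_OK 8u
#define VIRTIO_STATUS_FAILED      128u

/* Hypothetical in-memory model of a device's common configuration. */
struct fake_virtio_dev {
	uint8_t status;
	uint64_t device_features; /* offered by the device */
	uint64_t driver_features; /* accepted by the driver */
	int nr_vqs_ready;
};

/* Device side: writing 0 resets; FEATURES_OK is refused if the driver
 * accepted a feature the device never offered. */
static void dev_set_status(struct fake_virtio_dev *d, uint8_t s)
{
	if (s == 0) {
		d->driver_features = 0;
		d->nr_vqs_ready = 0;
	}
	if ((s & VIRTIO_STATUS_FEATURES_OK) &&
	    (d->driver_features & ~d->device_features))
		s &= (uint8_t)~VIRTIO_STATUS_FEATURES_OK;
	d->status = s;
}

/* Driver side: the initialization sequence. Returns 0 on success. */
static int virtio_init(struct fake_virtio_dev *d, uint64_t wanted, int nr_vqs)
{
	dev_set_status(d, 0);                                      /* reset */
	dev_set_status(d, VIRTIO_STATUS_ACKNOWLEDGE);              /* seen it */
	dev_set_status(d, d->status | VIRTIO_STATUS_DRIVER);       /* can drive it */
	d->driver_features = d->device_features & wanted;          /* negotiate */
	dev_set_status(d, d->status | VIRTIO_STATUS_FEATURES_OK);
	if (!(d->status & VIRTIO_STATUS_FEATURES_OK)) {            /* re-read */
		dev_set_status(d, d->status | VIRTIO_STATUS_FAILED);
		return -1;
	}
	d->nr_vqs_ready = nr_vqs;              /* virtqueue setup (stubbed) */
	dev_set_status(d, d->status | VIRTIO_STATUS_DRIVER_OK);   /* go live */
	return 0;
}
```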

Next the virtio-vhost-user transport is added along with the
RTE_VHOST_USER_VIRTIO_TRANSPORT flag:

  7fad5adc4 vhost: remember a vhost_virtqueue's queue index
  d26922892 vhost: add virtio-vhost-user transport
  f562a70ab vhost: add RTE_VHOST_USER_VIRTIO_TRANSPORT flag

Then I extended the vhost vdev and vhost-scsi example application to support
virtio-vhost-user:

  47434f2a2 net/vhost: add virtio-vhost-user support
  22ca05d1b examples/vhost_scsi: add --socket-file argument
  db3b391dc examples/vhost_scsi: add virtio-vhost-user support

It was also necessary to tweak dpdk-devbind.py to support virtio-vhost-user:

  4aee6f653 usertools: add virtio-vhost-user devices to dpdk-devbind.py

Finally, vhost-scsi appears to be broken, so two workarounds were needed to
test it again:

  b9d17bfaf WORKAROUND revert virtio-net mq vring deletion
  cadb25e7d WORKAROUND examples/vhost_scsi: avoid broken EVENT_IDX

Stefan Hajnoczi (24):
  vhost: move vring_call() into trans_af_unix.c
  vhost: move AF_UNIX code from socket.c to trans_af_unix.c
  vhost: allocate per-socket transport state
  vhost: move socket_fd and un sockaddr into trans_af_unix.c
  vhost: move start_server/client() calls to trans_af_unix.c
  vhost: move vhost_user_connection to trans_af_unix.c
  vhost: move vhost_user_reconnect_init() into trans_af_unix.c
  vhost: move vhost_user.fdset to trans_af_unix.c
  vhost: pass vhost_transport_ops through vhost_new_device()
  vhost: embed struct virtio_net inside struct vhost_user_connection
  vhost: extract vhost_user.c socket I/O into transport
  vhost: move slave_req_fd field to AF_UNIX transport
  vhost: move mmap/munmap to AF_UNIX transport
  vhost: move librte_vhost to drivers/
  vhost: add virtio pci framework
  vhost: remember a vhost_virtqueue's queue index
  vhost: add virtio-vhost-user transport
  vhost: add RTE_VHOST_USER_VIRTIO_TRANSPORT flag
  net/vhost: add virtio-vhost-user support
  examples/vhost_scsi: add --socket-file argument
  examples/vhost_scsi: add virtio-vhost-user support
  usertools: add virtio-vhost-user devices to dpdk-devbind.py
  WORKAROUND revert virtio-net mq vring deletion
  WORKAROUND examples/vhost_scsi: avoid broken EVENT_IDX

 drivers/Makefile                                   |    2 +
 {lib => drivers}/librte_vhost/Makefile             |    5 +-
 lib/Makefile                                       |    3 -
 {lib => drivers}/librte_vhost/fd_man.h             |    0
 {lib => drivers}/librte_vhost/iotlb.h              |    0
 {lib => drivers}/librte_vhost/rte_vhost.h          |    1 +
 {lib => drivers}/librte_vhost/vhost.h              |  193 +++-
 {lib => drivers}/librte_vhost/vhost_user.h         |    9 +-
 drivers/librte_vhost/virtio_pci.h                  |  267 +++++
 drivers/librte_vhost/virtio_vhost_user.h           |   18 +
 drivers/librte_vhost/virtqueue.h                   |  181 ++++
 {lib => drivers}/librte_vhost/fd_man.c             |    0
 {lib => drivers}/librte_vhost/iotlb.c              |    0
 drivers/librte_vhost/socket.c                      |  282 ++++++
 drivers/librte_vhost/trans_af_unix.c               |  795 +++++++++++++++
 drivers/librte_vhost/trans_virtio_vhost_user.c     | 1050 ++++++++++++++++++++
 {lib => drivers}/librte_vhost/vhost.c              |   18 +-
 {lib => drivers}/librte_vhost/vhost_user.c         |  200 +---
 {lib => drivers}/librte_vhost/virtio_net.c         |    0
 drivers/librte_vhost/virtio_pci.c                  |  504 ++++++++++
 drivers/net/vhost/rte_eth_vhost.c                  |   13 +
 examples/vhost_scsi/vhost_scsi.c                   |  104 +-
 lib/librte_vhost/socket.c                          |  828 ---------------
 .../librte_vhost/rte_vhost_version.map             |    0
 usertools/dpdk-devbind.py                          |    8 +
 25 files changed, 3455 insertions(+), 1026 deletions(-)
 rename {lib => drivers}/librte_vhost/Makefile (86%)
 rename {lib => drivers}/librte_vhost/fd_man.h (100%)
 rename {lib => drivers}/librte_vhost/iotlb.h (100%)
 rename {lib => drivers}/librte_vhost/rte_vhost.h (99%)
 rename {lib => drivers}/librte_vhost/vhost.h (68%)
 rename {lib => drivers}/librte_vhost/vhost_user.h (93%)
 create mode 100644 drivers/librte_vhost/virtio_pci.h
 create mode 100644 drivers/librte_vhost/virtio_vhost_user.h
 create mode 100644 drivers/librte_vhost/virtqueue.h
 rename {lib => drivers}/librte_vhost/fd_man.c (100%)
 rename {lib => drivers}/librte_vhost/iotlb.c (100%)
 create mode 100644 drivers/librte_vhost/socket.c
 create mode 100644 drivers/librte_vhost/trans_af_unix.c
 create mode 100644 drivers/librte_vhost/trans_virtio_vhost_user.c
 rename {lib => drivers}/librte_vhost/vhost.c (97%)
 rename {lib => drivers}/librte_vhost/vhost_user.c (89%)
 rename {lib => drivers}/librte_vhost/virtio_net.c (100%)
 create mode 100644 drivers/librte_vhost/virtio_pci.c
 delete mode 100644 lib/librte_vhost/socket.c
 rename {lib => drivers}/librte_vhost/rte_vhost_version.map (100%)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [dpdk-dev] [RFC 01/24] vhost: move vring_call() into trans_af_unix.c
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

File descriptor passing is specific to the AF_UNIX vhost-user protocol
transport.  In order to add support for additional transports it is
necessary to extract transport-specific code from the main vhost-user
code.
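File descriptor passing relies on `SCM_RIGHTS` ancillary data on an AF_UNIX socket. This is what `read_fd_message()`/`send_fd_message()` in socket.c do, and a virtio transport cannot offer it. The self-contained sketch below shows the mechanism. It uses hypothetical helper names `send_fd()`/`recv_fd()` that pass a single descriptor alongside one payload byte.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one payload byte plus one file descriptor over an AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union { /* union guarantees cmsghdr alignment of the buffer */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} control;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&control, 0, sizeof(control));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;          /* ancillary data carries fds */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the payload byte and the passed fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} control;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
		return -1;
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS) {
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file. This is what lets vhost-user hand eventfds and guest memory fds between processes over AF_UNIX.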

This patch introduces struct vhost_transport_ops and associates each
device with a transport.  Core vhost-user code calls into
vhost_transport_ops to perform transport-specific operations.

Notifying callfd is a transport-specific operation, so it belongs in
trans_af_unix.c.
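The split between core code and transport can be sketched as follows. Core code keeps the EVENT_IDX decision of whether to notify, using the same `vhost_need_event()` comparison as vhost.h. Only the final notification goes through the ops table. Everything here is a simplified model. `struct dev`, `struct transport_ops` and both stub transports are hypothetical. The stubs bump a counter where real transports would write an eventfd or signal through the virtio-vhost-user device.

```c
#include <assert.h>
#include <stdint.h>

struct dev;

/* Transport-specific hooks, modeled loosely on vhost_transport_ops. */
struct transport_ops {
	int (*vring_call)(struct dev *d);
};

/* Simplified device: only the fields this sketch needs. */
struct dev {
	const struct transport_ops *trans_ops;
	int calls; /* counts notifications for demonstration */
};

/* EVENT_IDX check: notify only if the guest's used_event index lies in
 * the window [old, new_idx) of newly used entries, modulo 2^16. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/* Stub transports; real ones would eventfd_write(callfd) or use the
 * virtio-vhost-user PCI device instead of counting. */
static int stub_af_unix_call(struct dev *d) { d->calls += 1; return 0; }
static int stub_vvu_call(struct dev *d) { d->calls += 100; return 0; }

static const struct transport_ops af_unix_ops = { .vring_call = stub_af_unix_call };
static const struct transport_ops vvu_ops = { .vring_call = stub_vvu_call };

/* Core code decides whether to notify; the transport decides how. */
static void core_vring_call(struct dev *d, uint16_t used_event,
			    uint16_t old, uint16_t new_idx)
{
	if (need_event(used_event, new_idx, old))
		d->trans_ops->vring_call(d);
}
```

Because the EVENT_IDX logic stays in core code, a new transport only has to supply the notification primitive itself.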

Several more patches follow this one to complete the task of moving
AF_UNIX transport code out of core vhost-user code.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/Makefile        |  2 +-
 lib/librte_vhost/vhost.h         | 35 +++++++++++++++++++++++-----
 lib/librte_vhost/trans_af_unix.c | 49 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.c         |  1 +
 4 files changed, 80 insertions(+), 7 deletions(-)
 create mode 100644 lib/librte_vhost/trans_af_unix.c

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 065d5c469..ccbbce3af 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -21,7 +21,7 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
-					vhost_user.c virtio_net.c
+					vhost_user.c virtio_net.c trans_af_unix.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index b2bf0e833..53811a8b1 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -200,6 +200,30 @@ struct guest_page {
 	uint64_t size;
 };
 
+struct virtio_net;
+
+/**
+ * A structure containing function pointers for transport-specific operations.
+ */
+struct vhost_transport_ops {
+	/**
+	 * Notify the guest that used descriptors have been added to the vring.
+	 * The VRING_AVAIL_F_NO_INTERRUPT flag has already been checked so this
+	 * function just needs to perform the notification.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param vq
+	 *  vhost virtqueue
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*vring_call)(struct virtio_net *dev, struct vhost_virtqueue *vq);
+};
+
+/** The traditional AF_UNIX vhost-user protocol transport. */
+extern const struct vhost_transport_ops af_unix_trans_ops;
+
 /**
  * Device structure contains all configuration information relating
  * to the device.
@@ -226,6 +250,7 @@ struct virtio_net {
 	uint16_t		mtu;
 
 	struct vhost_device_ops const *notify_ops;
+	struct vhost_transport_ops const *trans_ops;
 
 	uint32_t		nr_guest_pages;
 	uint32_t		max_guest_pages;
@@ -405,16 +430,14 @@ vhost_vring_call(struct virtio_net *dev, struct vhost_virtqueue *vq)
 			__func__,
 			vhost_used_event(vq),
 			old, new);
-		if (vhost_need_event(vhost_used_event(vq), new, old)
-			&& (vq->callfd >= 0)) {
+		if (vhost_need_event(vhost_used_event(vq), new, old)) {
 			vq->signalled_used = vq->last_used_idx;
-			eventfd_write(vq->callfd, (eventfd_t) 1);
+			dev->trans_ops->vring_call(dev, vq);
 		}
 	} else {
 		/* Kick the guest if necessary. */
-		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-				&& (vq->callfd >= 0))
-			eventfd_write(vq->callfd, (eventfd_t)1);
+		if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
+			dev->trans_ops->vring_call(dev, vq);
 	}
 }
 
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
new file mode 100644
index 000000000..9ed04b7eb
--- /dev/null
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Copyright (C) 2017 Red Hat, Inc.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "vhost.h"
+
+static int
+af_unix_vring_call(struct virtio_net *dev __rte_unused,
+		   struct vhost_virtqueue *vq)
+{
+	if (vq->callfd >= 0)
+		eventfd_write(vq->callfd, (eventfd_t)1);
+	return 0;
+}
+
+const struct vhost_transport_ops af_unix_trans_ops = {
+	.vring_call = af_unix_vring_call,
+};
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 6789ccce5..630dbd014 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -287,6 +287,7 @@ vhost_new_device(void)
 	vhost_devices[i] = dev;
 	dev->vid = i;
 	dev->slave_req_fd = -1;
+	dev->trans_ops = &af_unix_trans_ops;
 
 	return i;
 }
-- 
2.14.3


* [dpdk-dev] [RFC 02/24] vhost: move AF_UNIX code from socket.c to trans_af_unix.c
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The socket.c file serves two purposes:
1. librte_vhost public API entry points, e.g. rte_vhost_driver_register().
2. AF_UNIX socket management.

Move AF_UNIX socket code into trans_af_unix.c so that socket.c only
handles the librte_vhost public API entry points.  This will make it
possible to support other transports besides AF_UNIX.

This patch is a preparatory step that moves code from socket.c to
trans_af_unix.c unmodified, apart from dropping 'static' qualifiers where
necessary because socket.c now calls into trans_af_unix.c.

A lot of socket.c state is exposed in vhost.h, but this is a temporary
measure that later patches clean up.  Moving the code unmodified in this
patch makes the actual refactoring that follows easier to review.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         |  65 +++++
 lib/librte_vhost/socket.c        | 501 +--------------------------------------
 lib/librte_vhost/trans_af_unix.c | 451 +++++++++++++++++++++++++++++++++++
 3 files changed, 517 insertions(+), 500 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 53811a8b1..8c6d6e524 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -5,6 +5,7 @@
 #ifndef _VHOST_NET_CDEV_H_
 #define _VHOST_NET_CDEV_H_
 #include <stdint.h>
+#include <stdbool.h>
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/queue.h>
@@ -12,12 +13,15 @@
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
 #include <sys/socket.h>
+#include <sys/un.h> /* TODO remove when trans_af_unix.c refactoring is done */
 #include <linux/if.h>
+#include <pthread.h>
 
 #include <rte_log.h>
 #include <rte_ether.h>
 #include <rte_rwlock.h>
 
+#include "fd_man.h"
 #include "rte_vhost.h"
 
 /* Used to indicate that the device is running on a data core */
@@ -259,6 +263,67 @@ struct virtio_net {
 	int			slave_req_fd;
 } __rte_cache_aligned;
 
+/* The vhost_user, vhost_user_socket, vhost_user_connection, and reconnect
+ * declarations are temporary measures for moving AF_UNIX code into
+ * trans_af_unix.c.  They will be cleaned up as socket.c is untangled from
+ * trans_af_unix.c.
+ */
+TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
+
+/*
+ * Every time rte_vhost_driver_register() is invoked, an associated
+ * vhost_user_socket struct will be created.
+ */
+struct vhost_user_socket {
+	struct vhost_user_connection_list conn_list;
+	pthread_mutex_t conn_mutex;
+	char *path;
+	int socket_fd;
+	struct sockaddr_un un;
+	bool is_server;
+	bool reconnect;
+	bool dequeue_zero_copy;
+	bool iommu_support;
+
+	/*
+	 * The "supported_features" indicates the feature bits the
+	 * vhost driver supports. The "features" indicates the feature
+	 * bits after the rte_vhost_driver_features_disable/enable().
+	 * It is also the final feature bits used for vhost-user
+	 * features negotiation.
+	 */
+	uint64_t supported_features;
+	uint64_t features;
+
+	struct vhost_device_ops const *notify_ops;
+};
+
+struct vhost_user_connection {
+	struct vhost_user_socket *vsocket;
+	int connfd;
+	int vid;
+
+	TAILQ_ENTRY(vhost_user_connection) next;
+};
+
+#define MAX_VHOST_SOCKET 1024
+struct vhost_user {
+	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
+	struct fdset fdset;
+	int vsocket_cnt;
+	pthread_mutex_t mutex;
+};
+
+extern struct vhost_user vhost_user;
+
+int create_unix_socket(struct vhost_user_socket *vsocket);
+int vhost_user_start_server(struct vhost_user_socket *vsocket);
+int vhost_user_start_client(struct vhost_user_socket *vsocket);
+
+extern pthread_t reconn_tid;
+
+int vhost_user_reconnect_init(void);
+bool vhost_user_remove_reconnect(struct vhost_user_socket *vsocket);
 
 #define VHOST_LOG_PAGE	4096
 
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 6e3857e7a..d681f9cae 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -4,17 +4,14 @@
 
 #include <stdint.h>
 #include <stdio.h>
-#include <stdbool.h>
 #include <limits.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <string.h>
 #include <sys/types.h>
 #include <sys/socket.h>
-#include <sys/un.h>
 #include <sys/queue.h>
 #include <errno.h>
-#include <fcntl.h>
 #include <pthread.h>
 
 #include <rte_log.h>
@@ -23,61 +20,7 @@
 #include "vhost.h"
 #include "vhost_user.h"
 
-
-TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
-
-/*
- * Every time rte_vhost_driver_register() is invoked, an associated
- * vhost_user_socket struct will be created.
- */
-struct vhost_user_socket {
-	struct vhost_user_connection_list conn_list;
-	pthread_mutex_t conn_mutex;
-	char *path;
-	int socket_fd;
-	struct sockaddr_un un;
-	bool is_server;
-	bool reconnect;
-	bool dequeue_zero_copy;
-	bool iommu_support;
-
-	/*
-	 * The "supported_features" indicates the feature bits the
-	 * vhost driver supports. The "features" indicates the feature
-	 * bits after the rte_vhost_driver_features_disable/enable().
-	 * It is also the final feature bits used for vhost-user
-	 * features negotiation.
-	 */
-	uint64_t supported_features;
-	uint64_t features;
-
-	struct vhost_device_ops const *notify_ops;
-};
-
-struct vhost_user_connection {
-	struct vhost_user_socket *vsocket;
-	int connfd;
-	int vid;
-
-	TAILQ_ENTRY(vhost_user_connection) next;
-};
-
-#define MAX_VHOST_SOCKET 1024
-struct vhost_user {
-	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
-	struct fdset fdset;
-	int vsocket_cnt;
-	pthread_mutex_t mutex;
-};
-
-#define MAX_VIRTIO_BACKLOG 128
-
-static void vhost_user_server_new_connection(int fd, void *data, int *remove);
-static void vhost_user_read_cb(int fd, void *dat, int *remove);
-static int create_unix_socket(struct vhost_user_socket *vsocket);
-static int vhost_user_start_client(struct vhost_user_socket *vsocket);
-
-static struct vhost_user vhost_user = {
+struct vhost_user vhost_user = {
 	.fdset = {
 		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
 		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
@@ -87,424 +30,6 @@ static struct vhost_user vhost_user = {
 	.mutex = PTHREAD_MUTEX_INITIALIZER,
 };
 
-/* return bytes# of read on success or negative val on failure. */
-int
-read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
-{
-	struct iovec iov;
-	struct msghdr msgh;
-	size_t fdsize = fd_num * sizeof(int);
-	char control[CMSG_SPACE(fdsize)];
-	struct cmsghdr *cmsg;
-	int ret;
-
-	memset(&msgh, 0, sizeof(msgh));
-	iov.iov_base = buf;
-	iov.iov_len  = buflen;
-
-	msgh.msg_iov = &iov;
-	msgh.msg_iovlen = 1;
-	msgh.msg_control = control;
-	msgh.msg_controllen = sizeof(control);
-
-	ret = recvmsg(sockfd, &msgh, 0);
-	if (ret <= 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "recvmsg failed\n");
-		return ret;
-	}
-
-	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) {
-		RTE_LOG(ERR, VHOST_CONFIG, "truncted msg\n");
-		return -1;
-	}
-
-	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
-		cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
-		if ((cmsg->cmsg_level == SOL_SOCKET) &&
-			(cmsg->cmsg_type == SCM_RIGHTS)) {
-			memcpy(fds, CMSG_DATA(cmsg), fdsize);
-			break;
-		}
-	}
-
-	return ret;
-}
-
-int
-send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
-{
-
-	struct iovec iov;
-	struct msghdr msgh;
-	size_t fdsize = fd_num * sizeof(int);
-	char control[CMSG_SPACE(fdsize)];
-	struct cmsghdr *cmsg;
-	int ret;
-
-	memset(&msgh, 0, sizeof(msgh));
-	iov.iov_base = buf;
-	iov.iov_len = buflen;
-
-	msgh.msg_iov = &iov;
-	msgh.msg_iovlen = 1;
-
-	if (fds && fd_num > 0) {
-		msgh.msg_control = control;
-		msgh.msg_controllen = sizeof(control);
-		cmsg = CMSG_FIRSTHDR(&msgh);
-		cmsg->cmsg_len = CMSG_LEN(fdsize);
-		cmsg->cmsg_level = SOL_SOCKET;
-		cmsg->cmsg_type = SCM_RIGHTS;
-		memcpy(CMSG_DATA(cmsg), fds, fdsize);
-	} else {
-		msgh.msg_control = NULL;
-		msgh.msg_controllen = 0;
-	}
-
-	do {
-		ret = sendmsg(sockfd, &msgh, 0);
-	} while (ret < 0 && errno == EINTR);
-
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,  "sendmsg error\n");
-		return ret;
-	}
-
-	return ret;
-}
-
-static void
-vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
-{
-	int vid;
-	size_t size;
-	struct vhost_user_connection *conn;
-	int ret;
-
-	conn = malloc(sizeof(*conn));
-	if (conn == NULL) {
-		close(fd);
-		return;
-	}
-
-	vid = vhost_new_device();
-	if (vid == -1) {
-		goto err;
-	}
-
-	size = strnlen(vsocket->path, PATH_MAX);
-	vhost_set_ifname(vid, vsocket->path, size);
-
-	if (vsocket->dequeue_zero_copy)
-		vhost_enable_dequeue_zero_copy(vid);
-
-	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
-
-	if (vsocket->notify_ops->new_connection) {
-		ret = vsocket->notify_ops->new_connection(vid);
-		if (ret < 0) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"failed to add vhost user connection with fd %d\n",
-				fd);
-			goto err;
-		}
-	}
-
-	conn->connfd = fd;
-	conn->vsocket = vsocket;
-	conn->vid = vid;
-	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb,
-			NULL, conn);
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"failed to add fd %d into vhost server fdset\n",
-			fd);
-
-		if (vsocket->notify_ops->destroy_connection)
-			vsocket->notify_ops->destroy_connection(conn->vid);
-
-		goto err;
-	}
-
-	pthread_mutex_lock(&vsocket->conn_mutex);
-	TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next);
-	pthread_mutex_unlock(&vsocket->conn_mutex);
-	return;
-
-err:
-	free(conn);
-	close(fd);
-}
-
-/* call back when there is new vhost-user connection from client  */
-static void
-vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
-{
-	struct vhost_user_socket *vsocket = dat;
-
-	fd = accept(fd, NULL, NULL);
-	if (fd < 0)
-		return;
-
-	RTE_LOG(INFO, VHOST_CONFIG, "new vhost user connection is %d\n", fd);
-	vhost_user_add_connection(fd, vsocket);
-}
-
-static void
-vhost_user_read_cb(int connfd, void *dat, int *remove)
-{
-	struct vhost_user_connection *conn = dat;
-	struct vhost_user_socket *vsocket = conn->vsocket;
-	int ret;
-
-	ret = vhost_user_msg_handler(conn->vid, connfd);
-	if (ret < 0) {
-		close(connfd);
-		*remove = 1;
-		vhost_destroy_device(conn->vid);
-
-		if (vsocket->notify_ops->destroy_connection)
-			vsocket->notify_ops->destroy_connection(conn->vid);
-
-		pthread_mutex_lock(&vsocket->conn_mutex);
-		TAILQ_REMOVE(&vsocket->conn_list, conn, next);
-		pthread_mutex_unlock(&vsocket->conn_mutex);
-
-		free(conn);
-
-		if (vsocket->reconnect) {
-			create_unix_socket(vsocket);
-			vhost_user_start_client(vsocket);
-		}
-	}
-}
-
-static int
-create_unix_socket(struct vhost_user_socket *vsocket)
-{
-	int fd;
-	struct sockaddr_un *un = &vsocket->un;
-
-	fd = socket(AF_UNIX, SOCK_STREAM, 0);
-	if (fd < 0)
-		return -1;
-	RTE_LOG(INFO, VHOST_CONFIG, "vhost-user %s: socket created, fd: %d\n",
-		vsocket->is_server ? "server" : "client", fd);
-
-	if (!vsocket->is_server && fcntl(fd, F_SETFL, O_NONBLOCK)) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"vhost-user: can't set nonblocking mode for socket, fd: "
-			"%d (%s)\n", fd, strerror(errno));
-		close(fd);
-		return -1;
-	}
-
-	memset(un, 0, sizeof(*un));
-	un->sun_family = AF_UNIX;
-	strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path));
-	un->sun_path[sizeof(un->sun_path) - 1] = '\0';
-
-	vsocket->socket_fd = fd;
-	return 0;
-}
-
-static int
-vhost_user_start_server(struct vhost_user_socket *vsocket)
-{
-	int ret;
-	int fd = vsocket->socket_fd;
-	const char *path = vsocket->path;
-
-	ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un));
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"failed to bind to %s: %s; remove it and try again\n",
-			path, strerror(errno));
-		goto err;
-	}
-	RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path);
-
-	ret = listen(fd, MAX_VIRTIO_BACKLOG);
-	if (ret < 0)
-		goto err;
-
-	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
-		  NULL, vsocket);
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"failed to add listen fd %d to vhost server fdset\n",
-			fd);
-		goto err;
-	}
-
-	return 0;
-
-err:
-	close(fd);
-	return -1;
-}
-
-struct vhost_user_reconnect {
-	struct sockaddr_un un;
-	int fd;
-	struct vhost_user_socket *vsocket;
-
-	TAILQ_ENTRY(vhost_user_reconnect) next;
-};
-
-TAILQ_HEAD(vhost_user_reconnect_tailq_list, vhost_user_reconnect);
-struct vhost_user_reconnect_list {
-	struct vhost_user_reconnect_tailq_list head;
-	pthread_mutex_t mutex;
-};
-
-static struct vhost_user_reconnect_list reconn_list;
-static pthread_t reconn_tid;
-
-static int
-vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
-{
-	int ret, flags;
-
-	ret = connect(fd, un, sz);
-	if (ret < 0 && errno != EISCONN)
-		return -1;
-
-	flags = fcntl(fd, F_GETFL, 0);
-	if (flags < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"can't get flags for connfd %d\n", fd);
-		return -2;
-	}
-	if ((flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags & ~O_NONBLOCK)) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-				"can't disable nonblocking on fd %d\n", fd);
-		return -2;
-	}
-	return 0;
-}
-
-static void *
-vhost_user_client_reconnect(void *arg __rte_unused)
-{
-	int ret;
-	struct vhost_user_reconnect *reconn, *next;
-
-	while (1) {
-		pthread_mutex_lock(&reconn_list.mutex);
-
-		/*
-		 * An equal implementation of TAILQ_FOREACH_SAFE,
-		 * which does not exist on all platforms.
-		 */
-		for (reconn = TAILQ_FIRST(&reconn_list.head);
-		     reconn != NULL; reconn = next) {
-			next = TAILQ_NEXT(reconn, next);
-
-			ret = vhost_user_connect_nonblock(reconn->fd,
-						(struct sockaddr *)&reconn->un,
-						sizeof(reconn->un));
-			if (ret == -2) {
-				close(reconn->fd);
-				RTE_LOG(ERR, VHOST_CONFIG,
-					"reconnection for fd %d failed\n",
-					reconn->fd);
-				goto remove_fd;
-			}
-			if (ret == -1)
-				continue;
-
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"%s: connected\n", reconn->vsocket->path);
-			vhost_user_add_connection(reconn->fd, reconn->vsocket);
-remove_fd:
-			TAILQ_REMOVE(&reconn_list.head, reconn, next);
-			free(reconn);
-		}
-
-		pthread_mutex_unlock(&reconn_list.mutex);
-		sleep(1);
-	}
-
-	return NULL;
-}
-
-static int
-vhost_user_reconnect_init(void)
-{
-	int ret;
-	char thread_name[RTE_MAX_THREAD_NAME_LEN];
-
-	ret = pthread_mutex_init(&reconn_list.mutex, NULL);
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "failed to initialize mutex");
-		return ret;
-	}
-	TAILQ_INIT(&reconn_list.head);
-
-	ret = pthread_create(&reconn_tid, NULL,
-			     vhost_user_client_reconnect, NULL);
-	if (ret != 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "failed to create reconnect thread");
-		if (pthread_mutex_destroy(&reconn_list.mutex)) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"failed to destroy reconnect mutex");
-		}
-	} else {
-		snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
-			 "vhost-reconn");
-
-		if (rte_thread_setname(reconn_tid, thread_name)) {
-			RTE_LOG(DEBUG, VHOST_CONFIG,
-				"failed to set reconnect thread name");
-		}
-	}
-
-	return ret;
-}
-
-static int
-vhost_user_start_client(struct vhost_user_socket *vsocket)
-{
-	int ret;
-	int fd = vsocket->socket_fd;
-	const char *path = vsocket->path;
-	struct vhost_user_reconnect *reconn;
-
-	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un,
-					  sizeof(vsocket->un));
-	if (ret == 0) {
-		vhost_user_add_connection(fd, vsocket);
-		return 0;
-	}
-
-	RTE_LOG(WARNING, VHOST_CONFIG,
-		"failed to connect to %s: %s\n",
-		path, strerror(errno));
-
-	if (ret == -2 || !vsocket->reconnect) {
-		close(fd);
-		return -1;
-	}
-
-	RTE_LOG(INFO, VHOST_CONFIG, "%s: reconnecting...\n", path);
-	reconn = malloc(sizeof(*reconn));
-	if (reconn == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"failed to allocate memory for reconnect\n");
-		close(fd);
-		return -1;
-	}
-	reconn->un = vsocket->un;
-	reconn->fd = fd;
-	reconn->vsocket = vsocket;
-	pthread_mutex_lock(&reconn_list.mutex);
-	TAILQ_INSERT_TAIL(&reconn_list.head, reconn, next);
-	pthread_mutex_unlock(&reconn_list.mutex);
-
-	return 0;
-}
-
 static struct vhost_user_socket *
 find_vhost_user_socket(const char *path)
 {
@@ -688,30 +213,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	return ret;
 }
 
-static bool
-vhost_user_remove_reconnect(struct vhost_user_socket *vsocket)
-{
-	int found = false;
-	struct vhost_user_reconnect *reconn, *next;
-
-	pthread_mutex_lock(&reconn_list.mutex);
-
-	for (reconn = TAILQ_FIRST(&reconn_list.head);
-	     reconn != NULL; reconn = next) {
-		next = TAILQ_NEXT(reconn, next);
-
-		if (reconn->vsocket == vsocket) {
-			TAILQ_REMOVE(&reconn_list.head, reconn, next);
-			close(reconn->fd);
-			free(reconn);
-			found = true;
-			break;
-		}
-	}
-	pthread_mutex_unlock(&reconn_list.mutex);
-	return found;
-}
-
 /**
  * Unregister the specified vhost socket
  */
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 9ed04b7eb..636f69916 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -33,7 +33,458 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include <fcntl.h>
+
+#include <rte_log.h>
+
 #include "vhost.h"
+#include "vhost_user.h"
+
+#define MAX_VIRTIO_BACKLOG 128
+
+static void vhost_user_read_cb(int connfd, void *dat, int *remove);
+
+/* return bytes# of read on success or negative val on failure. */
+int
+read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+{
+	struct iovec iov;
+	struct msghdr msgh;
+	size_t fdsize = fd_num * sizeof(int);
+	char control[CMSG_SPACE(fdsize)];
+	struct cmsghdr *cmsg;
+	int ret;
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = buf;
+	iov.iov_len  = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+	msgh.msg_control = control;
+	msgh.msg_controllen = sizeof(control);
+
+	ret = recvmsg(sockfd, &msgh, 0);
+	if (ret <= 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "recvmsg failed\n");
+		return ret;
+	}
+
+	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "truncted msg\n");
+		return -1;
+	}
+
+	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+		cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+		if ((cmsg->cmsg_level == SOL_SOCKET) &&
+			(cmsg->cmsg_type == SCM_RIGHTS)) {
+			memcpy(fds, CMSG_DATA(cmsg), fdsize);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+int
+send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+{
+
+	struct iovec iov;
+	struct msghdr msgh;
+	size_t fdsize = fd_num * sizeof(int);
+	char control[CMSG_SPACE(fdsize)];
+	struct cmsghdr *cmsg;
+	int ret;
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = buf;
+	iov.iov_len = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+
+	if (fds && fd_num > 0) {
+		msgh.msg_control = control;
+		msgh.msg_controllen = sizeof(control);
+		cmsg = CMSG_FIRSTHDR(&msgh);
+		cmsg->cmsg_len = CMSG_LEN(fdsize);
+		cmsg->cmsg_level = SOL_SOCKET;
+		cmsg->cmsg_type = SCM_RIGHTS;
+		memcpy(CMSG_DATA(cmsg), fds, fdsize);
+	} else {
+		msgh.msg_control = NULL;
+		msgh.msg_controllen = 0;
+	}
+
+	do {
+		ret = sendmsg(sockfd, &msgh, 0);
+	} while (ret < 0 && errno == EINTR);
+
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,  "sendmsg error\n");
+		return ret;
+	}
+
+	return ret;
+}
+
+static void
+vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
+{
+	int vid;
+	size_t size;
+	struct vhost_user_connection *conn;
+	int ret;
+
+	conn = malloc(sizeof(*conn));
+	if (conn == NULL) {
+		close(fd);
+		return;
+	}
+
+	vid = vhost_new_device();
+	if (vid == -1) {
+		goto err;
+	}
+
+	size = strnlen(vsocket->path, PATH_MAX);
+	vhost_set_ifname(vid, vsocket->path, size);
+
+	if (vsocket->dequeue_zero_copy)
+		vhost_enable_dequeue_zero_copy(vid);
+
+	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
+
+	if (vsocket->notify_ops->new_connection) {
+		ret = vsocket->notify_ops->new_connection(vid);
+		if (ret < 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to add vhost user connection with fd %d\n",
+				fd);
+			goto err;
+		}
+	}
+
+	conn->connfd = fd;
+	conn->vsocket = vsocket;
+	conn->vid = vid;
+	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb,
+			NULL, conn);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to add fd %d into vhost server fdset\n",
+			fd);
+
+		if (vsocket->notify_ops->destroy_connection)
+			vsocket->notify_ops->destroy_connection(conn->vid);
+
+		goto err;
+	}
+
+	pthread_mutex_lock(&vsocket->conn_mutex);
+	TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next);
+	pthread_mutex_unlock(&vsocket->conn_mutex);
+	return;
+
+err:
+	free(conn);
+	close(fd);
+}
+
+/* call back when there is new vhost-user connection from client  */
+static void
+vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
+{
+	struct vhost_user_socket *vsocket = dat;
+
+	fd = accept(fd, NULL, NULL);
+	if (fd < 0)
+		return;
+
+	RTE_LOG(INFO, VHOST_CONFIG, "new vhost user connection is %d\n", fd);
+	vhost_user_add_connection(fd, vsocket);
+}
+
+static void
+vhost_user_read_cb(int connfd, void *dat, int *remove)
+{
+	struct vhost_user_connection *conn = dat;
+	struct vhost_user_socket *vsocket = conn->vsocket;
+	int ret;
+
+	ret = vhost_user_msg_handler(conn->vid, connfd);
+	if (ret < 0) {
+		close(connfd);
+		*remove = 1;
+		vhost_destroy_device(conn->vid);
+
+		if (vsocket->notify_ops->destroy_connection)
+			vsocket->notify_ops->destroy_connection(conn->vid);
+
+		pthread_mutex_lock(&vsocket->conn_mutex);
+		TAILQ_REMOVE(&vsocket->conn_list, conn, next);
+		pthread_mutex_unlock(&vsocket->conn_mutex);
+
+		free(conn);
+
+		if (vsocket->reconnect) {
+			create_unix_socket(vsocket);
+			vhost_user_start_client(vsocket);
+		}
+	}
+}
+
+int
+create_unix_socket(struct vhost_user_socket *vsocket)
+{
+	int fd;
+	struct sockaddr_un *un = &vsocket->un;
+
+	fd = socket(AF_UNIX, SOCK_STREAM, 0);
+	if (fd < 0)
+		return -1;
+	RTE_LOG(INFO, VHOST_CONFIG, "vhost-user %s: socket created, fd: %d\n",
+		vsocket->is_server ? "server" : "client", fd);
+
+	if (!vsocket->is_server && fcntl(fd, F_SETFL, O_NONBLOCK)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"vhost-user: can't set nonblocking mode for socket, fd: "
+			"%d (%s)\n", fd, strerror(errno));
+		close(fd);
+		return -1;
+	}
+
+	memset(un, 0, sizeof(*un));
+	un->sun_family = AF_UNIX;
+	strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path));
+	un->sun_path[sizeof(un->sun_path) - 1] = '\0';
+
+	vsocket->socket_fd = fd;
+	return 0;
+}
+
+int
+vhost_user_start_server(struct vhost_user_socket *vsocket)
+{
+	int ret;
+	int fd = vsocket->socket_fd;
+	const char *path = vsocket->path;
+
+	ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un));
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to bind to %s: %s; remove it and try again\n",
+			path, strerror(errno));
+		goto err;
+	}
+	RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path);
+
+	ret = listen(fd, MAX_VIRTIO_BACKLOG);
+	if (ret < 0)
+		goto err;
+
+	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
+		  NULL, vsocket);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to add listen fd %d to vhost server fdset\n",
+			fd);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	close(fd);
+	return -1;
+}
+
+struct vhost_user_reconnect {
+	struct sockaddr_un un;
+	int fd;
+	struct vhost_user_socket *vsocket;
+
+	TAILQ_ENTRY(vhost_user_reconnect) next;
+};
+
+TAILQ_HEAD(vhost_user_reconnect_tailq_list, vhost_user_reconnect);
+struct vhost_user_reconnect_list {
+	struct vhost_user_reconnect_tailq_list head;
+	pthread_mutex_t mutex;
+};
+
+static struct vhost_user_reconnect_list reconn_list;
+pthread_t reconn_tid;
+
+static int
+vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
+{
+	int ret, flags;
+
+	ret = connect(fd, un, sz);
+	if (ret < 0 && errno != EISCONN)
+		return -1;
+
+	flags = fcntl(fd, F_GETFL, 0);
+	if (flags < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"can't get flags for connfd %d\n", fd);
+		return -2;
+	}
+	if ((flags & O_NONBLOCK) && fcntl(fd, F_SETFL, flags & ~O_NONBLOCK)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"can't disable nonblocking on fd %d\n", fd);
+		return -2;
+	}
+	return 0;
+}
+
+static void *
+vhost_user_client_reconnect(void *arg __rte_unused)
+{
+	int ret;
+	struct vhost_user_reconnect *reconn, *next;
+
+	while (1) {
+		pthread_mutex_lock(&reconn_list.mutex);
+
+		/*
+		 * An equal implementation of TAILQ_FOREACH_SAFE,
+		 * which does not exist on all platforms.
+		 */
+		for (reconn = TAILQ_FIRST(&reconn_list.head);
+		     reconn != NULL; reconn = next) {
+			next = TAILQ_NEXT(reconn, next);
+
+			ret = vhost_user_connect_nonblock(reconn->fd,
+						(struct sockaddr *)&reconn->un,
+						sizeof(reconn->un));
+			if (ret == -2) {
+				close(reconn->fd);
+				RTE_LOG(ERR, VHOST_CONFIG,
+					"reconnection for fd %d failed\n",
+					reconn->fd);
+				goto remove_fd;
+			}
+			if (ret == -1)
+				continue;
+
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"%s: connected\n", reconn->vsocket->path);
+			vhost_user_add_connection(reconn->fd, reconn->vsocket);
+remove_fd:
+			TAILQ_REMOVE(&reconn_list.head, reconn, next);
+			free(reconn);
+		}
+
+		pthread_mutex_unlock(&reconn_list.mutex);
+		sleep(1);
+	}
+
+	return NULL;
+}
+
+int
+vhost_user_reconnect_init(void)
+{
+	int ret;
+	char thread_name[RTE_MAX_THREAD_NAME_LEN];
+
+	ret = pthread_mutex_init(&reconn_list.mutex, NULL);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "failed to initialize mutex");
+		return ret;
+	}
+	TAILQ_INIT(&reconn_list.head);
+
+	ret = pthread_create(&reconn_tid, NULL,
+			     vhost_user_client_reconnect, NULL);
+	if (ret != 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "failed to create reconnect thread");
+		if (pthread_mutex_destroy(&reconn_list.mutex)) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to destroy reconnect mutex");
+		}
+	} else {
+		snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
+			 "vhost-reconn");
+
+		if (rte_thread_setname(reconn_tid, thread_name)) {
+			RTE_LOG(DEBUG, VHOST_CONFIG,
+				"failed to set reconnect thread name");
+		}
+	}
+
+	return ret;
+}
+
+int
+vhost_user_start_client(struct vhost_user_socket *vsocket)
+{
+	int ret;
+	int fd = vsocket->socket_fd;
+	const char *path = vsocket->path;
+	struct vhost_user_reconnect *reconn;
+
+	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un,
+					  sizeof(vsocket->un));
+	if (ret == 0) {
+		vhost_user_add_connection(fd, vsocket);
+		return 0;
+	}
+
+	RTE_LOG(WARNING, VHOST_CONFIG,
+		"failed to connect to %s: %s\n",
+		path, strerror(errno));
+
+	if (ret == -2 || !vsocket->reconnect) {
+		close(fd);
+		return -1;
+	}
+
+	RTE_LOG(INFO, VHOST_CONFIG, "%s: reconnecting...\n", path);
+	reconn = malloc(sizeof(*reconn));
+	if (reconn == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to allocate memory for reconnect\n");
+		close(fd);
+		return -1;
+	}
+	reconn->un = vsocket->un;
+	reconn->fd = fd;
+	reconn->vsocket = vsocket;
+	pthread_mutex_lock(&reconn_list.mutex);
+	TAILQ_INSERT_TAIL(&reconn_list.head, reconn, next);
+	pthread_mutex_unlock(&reconn_list.mutex);
+
+	return 0;
+}
+
+bool
+vhost_user_remove_reconnect(struct vhost_user_socket *vsocket)
+{
+	int found = false;
+	struct vhost_user_reconnect *reconn, *next;
+
+	pthread_mutex_lock(&reconn_list.mutex);
+
+	for (reconn = TAILQ_FIRST(&reconn_list.head);
+	     reconn != NULL; reconn = next) {
+		next = TAILQ_NEXT(reconn, next);
+
+		if (reconn->vsocket == vsocket) {
+			TAILQ_REMOVE(&reconn_list.head, reconn, next);
+			close(reconn->fd);
+			free(reconn);
+			found = true;
+			break;
+		}
+	}
+	pthread_mutex_unlock(&reconn_list.mutex);
+	return found;
+}
 
 static int
 af_unix_vring_call(struct virtio_net *dev __rte_unused,
-- 
2.14.3
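
The read_fd_message() and send_fd_message() helpers moved above carry file descriptors as SCM_RIGHTS ancillary data alongside the vhost-user message payload. Below is a minimal standalone sketch of that mechanism. It assumes Linux and uses socketpair() in place of a listening socket. send_one_fd() and recv_one_fd() are hypothetical names, not librte_vhost functions. The control buffer is declared as a union with struct cmsghdr, as the cmsg(3) man page recommends, so that it is suitably aligned.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes of buf plus one fd via SCM_RIGHTS. */
static ssize_t send_one_fd(int sock, const void *buf, size_t len, int fd)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	union {
		char raw[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces cmsghdr alignment */
	} control;
	struct msghdr msgh;
	struct cmsghdr *cmsg;

	memset(&msgh, 0, sizeof(msgh));
	memset(&control, 0, sizeof(control));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = control.raw;
	msgh.msg_controllen = sizeof(control.raw);

	cmsg = CMSG_FIRSTHDR(&msgh);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msgh, 0);
}

/* Hypothetical helper: receive into buf; *fd is -1 if no fd arrived. */
static ssize_t recv_one_fd(int sock, void *buf, size_t len, int *fd)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	union {
		char raw[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} control;
	struct msghdr msgh;
	struct cmsghdr *cmsg;
	ssize_t ret;

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = control.raw;
	msgh.msg_controllen = sizeof(control.raw);

	*fd = -1;
	ret = recvmsg(sock, &msgh, 0);
	if (ret <= 0)
		return ret;
	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return -1; /* payload or ancillary data did not fit */

	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS) {
			memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}
	return ret;
}
```

The receiver's control buffer here only has room for one descriptor. If the sender attached more, the kernel would set MSG_CTRUNC. This is why read_fd_message() sizes its buffer from fd_num.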

* [dpdk-dev] [RFC 03/24] vhost: allocate per-socket transport state
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

vhost-user transports have per-socket state (like file descriptors).
Make it possible for transports to keep state beyond what is included in
struct vhost_user_socket.

This patch makes it possible to move AF_UNIX-specific fields from struct
vhost_user_socket into trans_af_unix.c in later patches.
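
The embedding rule works because generic code only ever sees a pointer to the leading struct vhost_user_socket. The transport recovers its own wrapper with container_of(), which is what the AF_UNIX code does in later patches. Below is a minimal sketch with cut-down, hypothetical structs (generic_socket, unix_socket). It defines a local container_of() via offsetof() so it builds without DPDK headers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in for container_of(); DPDK provides its own definition. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down analogue of struct vhost_user_socket. */
struct generic_socket {
	const char *path;
};

/* Hypothetical transport-private state wrapping the generic struct. */
struct unix_socket {
	struct generic_socket socket; /* must be the first field */
	int socket_fd;
};

/*
 * Generic code: allocate socket_size zeroed bytes and hand back a pointer
 * to the generic part.  This is only valid because the generic struct sits
 * at offset 0 of every transport struct.
 */
static struct generic_socket *generic_socket_new(size_t socket_size,
						 const char *path)
{
	struct generic_socket *s;

	if (socket_size < sizeof(*s))
		return NULL;
	s = calloc(1, socket_size);
	if (s == NULL)
		return NULL;
	s->path = path;
	return s;
}

/* Transport code: recover the private struct from the generic pointer. */
static struct unix_socket *to_unix_socket(struct generic_socket *s)
{
	return container_of(s, struct unix_socket, socket);
}
```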

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 9 +++++++++
 lib/librte_vhost/socket.c        | 6 ++++--
 lib/librte_vhost/trans_af_unix.c | 5 +++++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 8c6d6e524..e5279a572 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -210,6 +210,9 @@ struct virtio_net;
  * A structure containing function pointers for transport-specific operations.
  */
 struct vhost_transport_ops {
+	/** Size of struct vhost_user_socket-derived per-socket state */
+	size_t socket_size;
+
 	/**
 	 * Notify the guest that used descriptors have been added to the vring.
 	 * The VRING_AVAIL_F_NO_INTERRUPT flag has already been checked so this
@@ -273,6 +276,11 @@ TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
  * vhost_user_socket struct will be created.
+ *
+ * Transport-specific per-socket state can be kept by embedding this struct at
+ * the beginning of a transport-specific struct.  Set
+ * vhost_transport_ops->socket_size to the size of the transport-specific
+ * struct.
  */
 struct vhost_user_socket {
 	struct vhost_user_connection_list conn_list;
@@ -296,6 +304,7 @@ struct vhost_user_socket {
 	uint64_t features;
 
 	struct vhost_device_ops const *notify_ops;
+	struct vhost_transport_ops const *trans_ops;
 };
 
 struct vhost_user_connection {
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index d681f9cae..fffffc663 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -128,6 +128,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 {
 	int ret = -1;
 	struct vhost_user_socket *vsocket;
+	const struct vhost_transport_ops *trans_ops = &af_unix_trans_ops;
 
 	if (!path)
 		return -1;
@@ -140,10 +141,11 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		goto out;
 	}
 
-	vsocket = malloc(sizeof(struct vhost_user_socket));
+	vsocket = malloc(trans_ops->socket_size);
 	if (!vsocket)
 		goto out;
-	memset(vsocket, 0, sizeof(struct vhost_user_socket));
+	memset(vsocket, 0, trans_ops->socket_size);
+	vsocket->trans_ops = trans_ops;
 	vsocket->path = strdup(path);
 	if (vsocket->path == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 636f69916..5e3c5ab2a 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -42,6 +42,10 @@
 
 #define MAX_VIRTIO_BACKLOG 128
 
+struct af_unix_socket {
+	struct vhost_user_socket socket; /* must be the first field! */
+};
+
 static void vhost_user_read_cb(int connfd, void *dat, int *remove);
 
 /* return bytes# of read on success or negative val on failure. */
@@ -496,5 +500,6 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused,
 }
 
 const struct vhost_transport_ops af_unix_trans_ops = {
+	.socket_size = sizeof(struct af_unix_socket),
 	.vring_call = af_unix_vring_call,
 };
-- 
2.14.3

* [dpdk-dev] [RFC 04/24] vhost: move socket_fd and un sockaddr into trans_af_unix.c
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The socket file descriptor and AF_UNIX sockaddr are specific to the
AF_UNIX transport, so move them into trans_af_unix.c.

To do this, we need to start defining the vhost_transport_ops
interface that will allow librte_vhost to support multiple transports.
This patch adds socket_init() and socket_cleanup() to
vhost_transport_ops.
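
socket_init() and socket_cleanup() make struct vhost_transport_ops a function-pointer table. socket.c dispatches through this table without knowing which transport it is using. The sketch below mimics that flow with hypothetical demo_* types and a toy transport that rejects one flag bit. It is not the librte_vhost API, only an illustration of the pattern.

```c
#include <stdint.h>
#include <stdlib.h>

struct demo_socket;

/* Hypothetical, cut-down analogue of struct vhost_transport_ops. */
struct demo_transport_ops {
	size_t socket_size;
	int (*socket_init)(struct demo_socket *s, uint64_t flags);
	void (*socket_cleanup)(struct demo_socket *s);
};

struct demo_socket {
	const struct demo_transport_ops *trans_ops;
	uint64_t flags;
};

/* Generic layer: allocate, record the ops, let the transport check flags. */
static struct demo_socket *demo_register(const struct demo_transport_ops *ops,
					 uint64_t flags)
{
	struct demo_socket *s = calloc(1, ops->socket_size);

	if (s == NULL)
		return NULL;
	s->trans_ops = ops;
	s->flags = flags;
	if (ops->socket_init(s, flags) < 0) {
		free(s);
		return NULL;
	}
	return s;
}

/* Generic layer: the transport releases its resources, then memory is freed. */
static void demo_unregister(struct demo_socket *s)
{
	s->trans_ops->socket_cleanup(s);
	free(s);
}

/* Toy transport: treats bit 0 as an unsupported flag. */
#define DEMO_UNSUPPORTED_FLAG (1ULL << 0)

static int toy_cleanups;

static int toy_socket_init(struct demo_socket *s, uint64_t flags)
{
	(void)s;
	return (flags & DEMO_UNSUPPORTED_FLAG) ? -1 : 0;
}

static void toy_socket_cleanup(struct demo_socket *s)
{
	(void)s;
	toy_cleanups++;
}

static const struct demo_transport_ops toy_ops = {
	.socket_size = sizeof(struct demo_socket),
	.socket_init = toy_socket_init,
	.socket_cleanup = toy_socket_cleanup,
};
```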

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 31 +++++++++++++++++-----
 lib/librte_vhost/socket.c        | 10 ++------
 lib/librte_vhost/trans_af_unix.c | 55 ++++++++++++++++++++++++++++++++--------
 3 files changed, 72 insertions(+), 24 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index e5279a572..3aefe6597 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -12,8 +12,6 @@
 #include <unistd.h>
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
-#include <sys/socket.h>
-#include <sys/un.h> /* TODO remove when trans_af_unix.c refactoring is done */
 #include <linux/if.h>
 #include <pthread.h>
 
@@ -205,6 +203,7 @@ struct guest_page {
 };
 
 struct virtio_net;
+struct vhost_user_socket;
 
 /**
  * A structure containing function pointers for transport-specific operations.
@@ -213,6 +212,30 @@ struct vhost_transport_ops {
 	/** Size of struct vhost_user_socket-derived per-socket state */
 	size_t socket_size;
 
+	/**
+	 * Initialize a vhost-user socket that is being created by
+	 * rte_vhost_driver_register().  This function checks that the flags
+	 * are valid but does not establish a vhost-user connection.
+	 *
+	 * @param vsocket
+	 *  new socket
+	 * @param flags
+	 *  flags argument from rte_vhost_driver_register()
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*socket_init)(struct vhost_user_socket *vsocket, uint64_t flags);
+
+	/**
+	 * Free resources associated with a socket, including any established
+	 * connections.  This function calls vhost_destroy_device() to destroy
+	 * established connections for this socket.
+	 *
+	 * @param vsocket
+	 *  vhost socket
+	 */
+	void (*socket_cleanup)(struct vhost_user_socket *vsocket);
+
 	/**
 	 * Notify the guest that used descriptors have been added to the vring.
 	 * The VRING_AVAIL_F_NO_INTERRUPT flag has already been checked so this
@@ -286,8 +309,6 @@ struct vhost_user_socket {
 	struct vhost_user_connection_list conn_list;
 	pthread_mutex_t conn_mutex;
 	char *path;
-	int socket_fd;
-	struct sockaddr_un un;
 	bool is_server;
 	bool reconnect;
 	bool dequeue_zero_copy;
@@ -325,14 +346,12 @@ struct vhost_user {
 
 extern struct vhost_user vhost_user;
 
-int create_unix_socket(struct vhost_user_socket *vsocket);
 int vhost_user_start_server(struct vhost_user_socket *vsocket);
 int vhost_user_start_client(struct vhost_user_socket *vsocket);
 
 extern pthread_t reconn_tid;
 
 int vhost_user_reconnect_init(void);
-bool vhost_user_remove_reconnect(struct vhost_user_socket *vsocket);
 
 #define VHOST_LOG_PAGE	4096
 
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index fffffc663..78f847ccc 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -191,7 +191,7 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	} else {
 		vsocket->is_server = true;
 	}
-	ret = create_unix_socket(vsocket);
+	ret = trans_ops->socket_init(vsocket, flags);
 	if (ret < 0) {
 		goto out_mutex;
 	}
@@ -231,13 +231,7 @@ rte_vhost_driver_unregister(const char *path)
 		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
 
 		if (!strcmp(vsocket->path, path)) {
-			if (vsocket->is_server) {
-				fdset_del(&vhost_user.fdset, vsocket->socket_fd);
-				close(vsocket->socket_fd);
-				unlink(path);
-			} else if (vsocket->reconnect) {
-				vhost_user_remove_reconnect(vsocket);
-			}
+			vsocket->trans_ops->socket_cleanup(vsocket);
 
 			pthread_mutex_lock(&vsocket->conn_mutex);
 			for (conn = TAILQ_FIRST(&vsocket->conn_list);
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 5e3c5ab2a..cc8d7ccdc 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -33,6 +33,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include <sys/socket.h>
+#include <sys/un.h>
 #include <fcntl.h>
 
 #include <rte_log.h>
@@ -44,8 +46,11 @@
 
 struct af_unix_socket {
 	struct vhost_user_socket socket; /* must be the first field! */
+	int socket_fd;
+	struct sockaddr_un un;
 };
 
+static int create_unix_socket(struct vhost_user_socket *vsocket);
 static void vhost_user_read_cb(int connfd, void *dat, int *remove);
 
 /* return bytes# of read on success or negative val on failure. */
@@ -240,11 +245,13 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 	}
 }
 
-int
+static int
 create_unix_socket(struct vhost_user_socket *vsocket)
 {
+	struct af_unix_socket *s =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int fd;
-	struct sockaddr_un *un = &vsocket->un;
+	struct sockaddr_un *un = &s->un;
 
 	fd = socket(AF_UNIX, SOCK_STREAM, 0);
 	if (fd < 0)
@@ -265,18 +272,20 @@ create_unix_socket(struct vhost_user_socket *vsocket)
 	strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path));
 	un->sun_path[sizeof(un->sun_path) - 1] = '\0';
 
-	vsocket->socket_fd = fd;
+	s->socket_fd = fd;
 	return 0;
 }
 
 int
 vhost_user_start_server(struct vhost_user_socket *vsocket)
 {
+	struct af_unix_socket *s =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
-	int fd = vsocket->socket_fd;
+	int fd = s->socket_fd;
 	const char *path = vsocket->path;
 
-	ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un));
+	ret = bind(fd, (struct sockaddr *)&s->un, sizeof(s->un));
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"failed to bind to %s: %s; remove it and try again\n",
@@ -427,13 +436,15 @@ vhost_user_reconnect_init(void)
 int
 vhost_user_start_client(struct vhost_user_socket *vsocket)
 {
+	struct af_unix_socket *s =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
-	int fd = vsocket->socket_fd;
+	int fd = s->socket_fd;
 	const char *path = vsocket->path;
 	struct vhost_user_reconnect *reconn;
 
-	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un,
-					  sizeof(vsocket->un));
+	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&s->un,
+					  sizeof(s->un));
 	if (ret == 0) {
 		vhost_user_add_connection(fd, vsocket);
 		return 0;
@@ -456,7 +467,7 @@ vhost_user_start_client(struct vhost_user_socket *vsocket)
 		close(fd);
 		return -1;
 	}
-	reconn->un = vsocket->un;
+	reconn->un = s->un;
 	reconn->fd = fd;
 	reconn->vsocket = vsocket;
 	pthread_mutex_lock(&reconn_list.mutex);
@@ -466,7 +477,7 @@ vhost_user_start_client(struct vhost_user_socket *vsocket)
 	return 0;
 }
 
-bool
+static bool
 vhost_user_remove_reconnect(struct vhost_user_socket *vsocket)
 {
 	int found = false;
@@ -490,6 +501,28 @@ vhost_user_remove_reconnect(struct vhost_user_socket *vsocket)
 	return found;
 }
 
+static int
+af_unix_socket_init(struct vhost_user_socket *vsocket,
+		    uint64_t flags __rte_unused)
+{
+	return create_unix_socket(vsocket);
+}
+
+static void
+af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
+{
+	struct af_unix_socket *s =
+		container_of(vsocket, struct af_unix_socket, socket);
+
+	if (vsocket->is_server) {
+		fdset_del(&vhost_user.fdset, s->socket_fd);
+		close(s->socket_fd);
+		unlink(vsocket->path);
+	} else if (vsocket->reconnect) {
+		vhost_user_remove_reconnect(vsocket);
+	}
+}
+
 static int
 af_unix_vring_call(struct virtio_net *dev __rte_unused,
 		   struct vhost_virtqueue *vq)
@@ -501,5 +534,7 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused,
 
 const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
+	.socket_init = af_unix_socket_init,
+	.socket_cleanup = af_unix_socket_cleanup,
 	.vring_call = af_unix_vring_call,
 };
-- 
2.14.3

* [dpdk-dev] [RFC 05/24] vhost: move start_server/client() calls to trans_af_unix.c
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

Introduce a vhost_transport_ops->socket_start() interface so the
transport can begin establishing vhost-user connections.  This is part
of the AF_UNIX transport refactoring and removes AF_UNIX code from
vhost.h and socket.c.
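
For AF_UNIX, socket_start() either binds and listens (server mode) or attempts a connection (client mode). The sketch below mirrors that split under the following assumptions: Linux, a socket path under /tmp, and hypothetical function names. Like vhost_user_connect_nonblock(), the client connects on a non-blocking socket and then switches the descriptor back to blocking mode. Unlike the real code, it closes the socket on failure rather than queuing it for the reconnect thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in an AF_UNIX address, always NUL-terminating sun_path. */
static void fill_un(struct sockaddr_un *un, const char *path)
{
	memset(un, 0, sizeof(*un));
	un->sun_family = AF_UNIX;
	strncpy(un->sun_path, path, sizeof(un->sun_path) - 1);
}

/* Server side: bind + listen; returns the listening fd or -1. */
static int demo_start_server(const char *path)
{
	struct sockaddr_un un;
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	fill_un(&un, path);
	if (bind(fd, (struct sockaddr *)&un, sizeof(un)) < 0 ||
	    listen(fd, 128) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Client side: connect once on a non-blocking socket, then restore
 * blocking mode.  Returns the connected fd, or -1 if nobody is listening
 * yet (a caller could retry later, as the reconnect thread does).
 */
static int demo_start_client(const char *path)
{
	struct sockaddr_un un;
	int flags;
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
		goto err;
	fill_un(&un, path);
	if (connect(fd, (struct sockaddr *)&un, sizeof(un)) < 0 &&
	    errno != EISCONN)
		goto err;
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0)
		goto err;
	return fd;
err:
	close(fd);
	return -1;
}
```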

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 16 +++++++++++++---
 lib/librte_vhost/socket.c        |  5 +----
 lib/librte_vhost/trans_af_unix.c | 16 ++++++++++++++--
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 3aefe6597..7cbef04ab 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -236,6 +236,19 @@ struct vhost_transport_ops {
 	 */
 	void (*socket_cleanup)(struct vhost_user_socket *vsocket);
 
+	/**
+	 * Start establishing vhost-user connections.  This function is
+	 * asynchronous and connections may be established after it has
+	 * returned.  Call vhost_user_add_connection() to register new
+	 * connections.
+	 *
+	 * @param vsocket
+	 *  vhost socket
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*socket_start)(struct vhost_user_socket *vsocket);
+
 	/**
 	 * Notify the guest that used descriptors have been added to the vring.
 	 * The VRING_AVAIL_F_NO_INTERRUPT flag has already been checked so this
@@ -346,9 +359,6 @@ struct vhost_user {
 
 extern struct vhost_user vhost_user;
 
-int vhost_user_start_server(struct vhost_user_socket *vsocket);
-int vhost_user_start_client(struct vhost_user_socket *vsocket);
-
 extern pthread_t reconn_tid;
 
 int vhost_user_reconnect_init(void);
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 78f847ccc..f8a96ab5f 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -318,8 +318,5 @@ rte_vhost_driver_start(const char *path)
 				"failed to create fdset handling thread");
 	}
 
-	if (vsocket->is_server)
-		return vhost_user_start_server(vsocket);
-	else
-		return vhost_user_start_client(vsocket);
+	return vsocket->trans_ops->socket_start(vsocket);
 }
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index cc8d7ccdc..6c22093a4 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -51,6 +51,8 @@ struct af_unix_socket {
 };
 
 static int create_unix_socket(struct vhost_user_socket *vsocket);
+static int vhost_user_start_server(struct vhost_user_socket *vsocket);
+static int vhost_user_start_client(struct vhost_user_socket *vsocket);
 static void vhost_user_read_cb(int connfd, void *dat, int *remove);
 
 /* return bytes# of read on success or negative val on failure. */
@@ -276,7 +278,7 @@ create_unix_socket(struct vhost_user_socket *vsocket)
 	return 0;
 }
 
-int
+static int
 vhost_user_start_server(struct vhost_user_socket *vsocket)
 {
 	struct af_unix_socket *s =
@@ -433,7 +435,7 @@ vhost_user_reconnect_init(void)
 	return ret;
 }
 
-int
+static int
 vhost_user_start_client(struct vhost_user_socket *vsocket)
 {
 	struct af_unix_socket *s =
@@ -523,6 +525,15 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 	}
 }
 
+static int
+af_unix_socket_start(struct vhost_user_socket *vsocket)
+{
+	if (vsocket->is_server)
+		return vhost_user_start_server(vsocket);
+	else
+		return vhost_user_start_client(vsocket);
+}
+
 static int
 af_unix_vring_call(struct virtio_net *dev __rte_unused,
 		   struct vhost_virtqueue *vq)
@@ -536,5 +547,6 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
 	.socket_init = af_unix_socket_init,
 	.socket_cleanup = af_unix_socket_cleanup,
+	.socket_start = af_unix_socket_start,
 	.vring_call = af_unix_vring_call,
 };
-- 
2.14.3

* [dpdk-dev] [RFC 06/24] vhost: move vhost_user_connection to trans_af_unix.c
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The AF_UNIX transport can accept multiple client connections on a server
socket.  Each connection instantiates a separate vhost-user device,
which is stored as a vhost_user_connection.  This behavior is specific
to AF_UNIX; other transports may not support multiple connections per
socket endpoint.

Move struct vhost_user_connection to trans_af_unix.c and
conn_list/conn_mutex into struct af_unix_socket.
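
After this patch, the per-socket connection list is private to the transport. Below is a minimal sketch of that layout using the same <sys/queue.h> TAILQ macros. All demo_* names are hypothetical. Each entry stands for one accepted connection and its vhost device id, and all list access is guarded by the per-socket mutex.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

struct demo_af_unix_socket;

/* One accepted connection, i.e. one vhost-user device (vid). */
struct demo_connection {
	struct demo_af_unix_socket *socket;
	int connfd;
	int vid;
	TAILQ_ENTRY(demo_connection) next;
};

TAILQ_HEAD(demo_connection_list, demo_connection);

/* Transport-private state: generic code never touches conn_list. */
struct demo_af_unix_socket {
	struct demo_connection_list conn_list;
	pthread_mutex_t conn_mutex;
};

static int demo_socket_init(struct demo_af_unix_socket *s)
{
	TAILQ_INIT(&s->conn_list);
	return pthread_mutex_init(&s->conn_mutex, NULL);
}

static struct demo_connection *
demo_add_connection(struct demo_af_unix_socket *s, int connfd, int vid)
{
	struct demo_connection *conn = malloc(sizeof(*conn));

	if (conn == NULL)
		return NULL;
	conn->socket = s;
	conn->connfd = connfd;
	conn->vid = vid;
	pthread_mutex_lock(&s->conn_mutex);
	TAILQ_INSERT_TAIL(&s->conn_list, conn, next);
	pthread_mutex_unlock(&s->conn_mutex);
	return conn;
}

static void demo_remove_connection(struct demo_connection *conn)
{
	struct demo_af_unix_socket *s = conn->socket;

	pthread_mutex_lock(&s->conn_mutex);
	TAILQ_REMOVE(&s->conn_list, conn, next);
	pthread_mutex_unlock(&s->conn_mutex);
	free(conn);
}

static int demo_count_connections(struct demo_af_unix_socket *s)
{
	struct demo_connection *conn;
	int n = 0;

	pthread_mutex_lock(&s->conn_mutex);
	TAILQ_FOREACH(conn, &s->conn_list, next)
		n++;
	pthread_mutex_unlock(&s->conn_mutex);
	return n;
}
```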

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 19 ++-----------
 lib/librte_vhost/socket.c        | 36 ++----------------------
 lib/librte_vhost/trans_af_unix.c | 60 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 59 insertions(+), 56 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 7cbef04ab..734a8721d 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -302,13 +302,10 @@ struct virtio_net {
 	int			slave_req_fd;
 } __rte_cache_aligned;
 
-/* The vhost_user, vhost_user_socket, vhost_user_connection, and reconnect
- * declarations are temporary measures for moving AF_UNIX code into
- * trans_af_unix.c.  They will be cleaned up as socket.c is untangled from
- * trans_af_unix.c.
+/* The vhost_user, vhost_user_socket, and reconnect declarations are temporary
+ * measures for moving AF_UNIX code into trans_af_unix.c.  They will be cleaned
+ * up as socket.c is untangled from trans_af_unix.c.
  */
-TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
-
 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
  * vhost_user_socket struct will be created.
@@ -319,8 +316,6 @@ TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
  * struct.
  */
 struct vhost_user_socket {
-	struct vhost_user_connection_list conn_list;
-	pthread_mutex_t conn_mutex;
 	char *path;
 	bool is_server;
 	bool reconnect;
@@ -341,14 +336,6 @@ struct vhost_user_socket {
 	struct vhost_transport_ops const *trans_ops;
 };
 
-struct vhost_user_connection {
-	struct vhost_user_socket *vsocket;
-	int connfd;
-	int vid;
-
-	TAILQ_ENTRY(vhost_user_connection) next;
-};
-
 #define MAX_VHOST_SOCKET 1024
 struct vhost_user {
 	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index f8a96ab5f..4fd86fd5b 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -153,13 +153,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		free(vsocket);
 		goto out;
 	}
-	TAILQ_INIT(&vsocket->conn_list);
-	ret = pthread_mutex_init(&vsocket->conn_mutex, NULL);
-	if (ret) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"error: failed to init connection mutex\n");
-		goto out_free;
-	}
 	vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
 
 	/*
@@ -186,14 +179,14 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
 		if (vsocket->reconnect && reconn_tid == 0) {
 			if (vhost_user_reconnect_init() != 0)
-				goto out_mutex;
+				goto out_free;
 		}
 	} else {
 		vsocket->is_server = true;
 	}
 	ret = trans_ops->socket_init(vsocket, flags);
 	if (ret < 0) {
-		goto out_mutex;
+		goto out_free;
 	}
 
 	vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket;
@@ -201,11 +194,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	pthread_mutex_unlock(&vhost_user.mutex);
 	return ret;
 
-out_mutex:
-	if (pthread_mutex_destroy(&vsocket->conn_mutex)) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"error: failed to destroy connection mutex\n");
-	}
 out_free:
 	free(vsocket->path);
 	free(vsocket);
@@ -223,7 +211,6 @@ rte_vhost_driver_unregister(const char *path)
 {
 	int i;
 	int count;
-	struct vhost_user_connection *conn, *next;
 
 	pthread_mutex_lock(&vhost_user.mutex);
 
@@ -232,25 +219,6 @@ rte_vhost_driver_unregister(const char *path)
 
 		if (!strcmp(vsocket->path, path)) {
 			vsocket->trans_ops->socket_cleanup(vsocket);
-
-			pthread_mutex_lock(&vsocket->conn_mutex);
-			for (conn = TAILQ_FIRST(&vsocket->conn_list);
-			     conn != NULL;
-			     conn = next) {
-				next = TAILQ_NEXT(conn, next);
-
-				fdset_del(&vhost_user.fdset, conn->connfd);
-				RTE_LOG(INFO, VHOST_CONFIG,
-					"free connfd = %d for device '%s'\n",
-					conn->connfd, path);
-				close(conn->connfd);
-				vhost_destroy_device(conn->vid);
-				TAILQ_REMOVE(&vsocket->conn_list, conn, next);
-				free(conn);
-			}
-			pthread_mutex_unlock(&vsocket->conn_mutex);
-
-			pthread_mutex_destroy(&vsocket->conn_mutex);
 			free(vsocket->path);
 			free(vsocket);
 
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 6c22093a4..747fd9690 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -44,8 +44,20 @@
 
 #define MAX_VIRTIO_BACKLOG 128
 
+TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
+
+struct vhost_user_connection {
+	struct vhost_user_socket *vsocket;
+	int connfd;
+	int vid;
+
+	TAILQ_ENTRY(vhost_user_connection) next;
+};
+
 struct af_unix_socket {
 	struct vhost_user_socket socket; /* must be the first field! */
+	struct vhost_user_connection_list conn_list;
+	pthread_mutex_t conn_mutex;
 	int socket_fd;
 	struct sockaddr_un un;
 };
@@ -144,6 +156,8 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
 static void
 vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 {
+	struct af_unix_socket *s =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int vid;
 	size_t size;
 	struct vhost_user_connection *conn;
@@ -194,9 +208,9 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 		goto err;
 	}
 
-	pthread_mutex_lock(&vsocket->conn_mutex);
-	TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next);
-	pthread_mutex_unlock(&vsocket->conn_mutex);
+	pthread_mutex_lock(&s->conn_mutex);
+	TAILQ_INSERT_TAIL(&s->conn_list, conn, next);
+	pthread_mutex_unlock(&s->conn_mutex);
 	return;
 
 err:
@@ -223,6 +237,8 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 {
 	struct vhost_user_connection *conn = dat;
 	struct vhost_user_socket *vsocket = conn->vsocket;
+	struct af_unix_socket *s =
+		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
 
 	ret = vhost_user_msg_handler(conn->vid, connfd);
@@ -234,9 +250,9 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 		if (vsocket->notify_ops->destroy_connection)
 			vsocket->notify_ops->destroy_connection(conn->vid);
 
-		pthread_mutex_lock(&vsocket->conn_mutex);
-		TAILQ_REMOVE(&vsocket->conn_list, conn, next);
-		pthread_mutex_unlock(&vsocket->conn_mutex);
+		pthread_mutex_lock(&s->conn_mutex);
+		TAILQ_REMOVE(&s->conn_list, conn, next);
+		pthread_mutex_unlock(&s->conn_mutex);
 
 		free(conn);
 
@@ -507,6 +523,18 @@ static int
 af_unix_socket_init(struct vhost_user_socket *vsocket,
 		    uint64_t flags __rte_unused)
 {
+	struct af_unix_socket *s =
+		container_of(vsocket, struct af_unix_socket, socket);
+	int ret;
+
+	TAILQ_INIT(&s->conn_list);
+	ret = pthread_mutex_init(&s->conn_mutex, NULL);
+	if (ret) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"error: failed to init connection mutex\n");
+		return -1;
+	}
+
 	return create_unix_socket(vsocket);
 }
 
@@ -515,6 +543,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 {
 	struct af_unix_socket *s =
 		container_of(vsocket, struct af_unix_socket, socket);
+	struct vhost_user_connection *conn, *next;
 
 	if (vsocket->is_server) {
 		fdset_del(&vhost_user.fdset, s->socket_fd);
@@ -523,6 +552,25 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 	} else if (vsocket->reconnect) {
 		vhost_user_remove_reconnect(vsocket);
 	}
+
+	pthread_mutex_lock(&s->conn_mutex);
+	for (conn = TAILQ_FIRST(&s->conn_list);
+	     conn != NULL;
+	     conn = next) {
+		next = TAILQ_NEXT(conn, next);
+
+		fdset_del(&vhost_user.fdset, conn->connfd);
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"free connfd = %d for device '%s'\n",
+			conn->connfd, vsocket->path);
+		close(conn->connfd);
+		vhost_destroy_device(conn->vid);
+		TAILQ_REMOVE(&s->conn_list, conn, next);
+		free(conn);
+	}
+	pthread_mutex_unlock(&s->conn_mutex);
+
+	pthread_mutex_destroy(&s->conn_mutex);
 }
 
 static int
-- 
2.14.3

* [dpdk-dev] [RFC 07/24] vhost: move vhost_user_reconnect_init() into trans_af_unix.c
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The socket reconnection code is highly specific to AF_UNIX, so move the
remaining pieces of it into trans_af_unix.c.
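
Below is a minimal sketch of keeping one-time reconnect start-up private to the transport, using a stand-in init routine. The names sketch_reconnect_init(), sketch_socket_init() and reconn_init_runs are hypothetical. The sketch also swaps the patch's `reconn_tid == 0` test for pthread_once(), which is a different technique: it avoids comparing an opaque pthread_t against 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* File-private state: nothing here needs to be declared in vhost.h. */
static pthread_once_t reconn_once = PTHREAD_ONCE_INIT;
static int reconn_init_runs; /* how often the one-time init ran */
static int reconn_init_ret = -1;

/* Hypothetical stand-in for vhost_user_reconnect_init(); the real
 * function starts the reconnect thread. */
static void sketch_reconnect_init(void)
{
	reconn_init_runs++;
	reconn_init_ret = 0;
}

/* Transport socket init: only sockets with reconnect enabled need the
 * reconnect machinery, so the check lives here instead of in socket.c. */
static int sketch_socket_init(bool reconnect)
{
	if (reconnect) {
		pthread_once(&reconn_once, sketch_reconnect_init);
		if (reconn_init_ret != 0)
			return -1;
	}
	return 0;
}
```

One trade-off: pthread_once() never re-runs the init routine, so a failed start stays failed. The patch's check may instead try again on a later registration.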

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 10 +++-------
 lib/librte_vhost/socket.c        |  4 ----
 lib/librte_vhost/trans_af_unix.c |  9 +++++++--
 3 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 734a8721d..8f18c1ed0 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -302,9 +302,9 @@ struct virtio_net {
 	int			slave_req_fd;
 } __rte_cache_aligned;
 
-/* The vhost_user, vhost_user_socket, and reconnect declarations are temporary
- * measures for moving AF_UNIX code into trans_af_unix.c.  They will be cleaned
- * up as socket.c is untangled from trans_af_unix.c.
+/* The vhost_user and vhost_user_socket declarations are temporary measures for
+ * moving AF_UNIX code into trans_af_unix.c.  They will be cleaned up as
+ * socket.c is untangled from trans_af_unix.c.
  */
 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
@@ -346,10 +346,6 @@ struct vhost_user {
 
 extern struct vhost_user vhost_user;
 
-extern pthread_t reconn_tid;
-
-int vhost_user_reconnect_init(void);
-
 #define VHOST_LOG_PAGE	4096
 
 /*
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 4fd86fd5b..f9069fcb1 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -177,10 +177,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 
 	if ((flags & RTE_VHOST_USER_CLIENT) != 0) {
 		vsocket->reconnect = !(flags & RTE_VHOST_USER_NO_RECONNECT);
-		if (vsocket->reconnect && reconn_tid == 0) {
-			if (vhost_user_reconnect_init() != 0)
-				goto out_free;
-		}
 	} else {
 		vsocket->is_server = true;
 	}
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 747fd9690..ad07a8e24 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -347,7 +347,7 @@ struct vhost_user_reconnect_list {
 };
 
 static struct vhost_user_reconnect_list reconn_list;
-pthread_t reconn_tid;
+static pthread_t reconn_tid;
 
 static int
 vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
@@ -417,7 +417,7 @@ vhost_user_client_reconnect(void *arg __rte_unused)
 	return NULL;
 }
 
-int
+static int
 vhost_user_reconnect_init(void)
 {
 	int ret;
@@ -527,6 +527,11 @@ af_unix_socket_init(struct vhost_user_socket *vsocket,
 		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
 
+	if (vsocket->reconnect && reconn_tid == 0) {
+		if (vhost_user_reconnect_init() != 0)
+			return -1;
+	}
+
 	TAILQ_INIT(&s->conn_list);
 	ret = pthread_mutex_init(&s->conn_mutex, NULL);
 	if (ret) {
-- 
2.14.3

* [dpdk-dev] [RFC 08/24] vhost: move vhost_user.fdset to trans_af_unix.c
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The fdset is used by the AF_UNIX transport code, but other transports may
not need it.  Move it to trans_af_unix.c and then make struct vhost_user
private again, since nothing outside socket.c needs it.
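
Here is a minimal sketch of a transport-private fd set with a much smaller table than fd_man.c's. struct sketch_fdset and the sketch_fdset_* helpers are hypothetical, not the fd_man.h API. Declaring the instance static in the transport's own source file is what keeps it out of vhost.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_FDS 8

typedef void (*fd_cb)(int fd, void *dat, int *remove);

struct sketch_fd_entry {
	int fd;
	fd_cb rcb;  /* read callback */
	void *dat;  /* opaque callback argument */
};

struct sketch_fdset {
	struct sketch_fd_entry fd[SKETCH_MAX_FDS];
	pthread_mutex_t fd_mutex;
	int num; /* entries [0, num) are in use */
};

/* 'static' keeps the set private to this transport's source file. */
static struct sketch_fdset af_unix_fdset = {
	.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
	.num = 0,
};

static int sketch_fdset_add(struct sketch_fdset *set, int fd, fd_cb rcb,
			    void *dat)
{
	int ret = -1;

	pthread_mutex_lock(&set->fd_mutex);
	if (set->num < SKETCH_MAX_FDS) {
		set->fd[set->num].fd = fd;
		set->fd[set->num].rcb = rcb;
		set->fd[set->num].dat = dat;
		set->num++;
		ret = 0;
	}
	pthread_mutex_unlock(&set->fd_mutex);
	return ret;
}

/* Remove fd and return its callback argument, or NULL if not found. */
static void *sketch_fdset_del(struct sketch_fdset *set, int fd)
{
	void *dat = NULL;
	int i;

	pthread_mutex_lock(&set->fd_mutex);
	for (i = 0; i < set->num; i++) {
		if (set->fd[i].fd == fd) {
			dat = set->fd[i].dat;
			/* Keep entries dense: move the last one into the hole. */
			set->fd[i] = set->fd[set->num - 1];
			set->num--;
			break;
		}
	}
	pthread_mutex_unlock(&set->fd_mutex);
	return dat;
}
```

Moving the last entry into a freed slot is just one possible design. It keeps iteration simple at the cost of reordering entries.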

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 15 ---------------
 lib/librte_vhost/socket.c        | 21 +++++++--------------
 lib/librte_vhost/trans_af_unix.c | 25 +++++++++++++++++++++----
 3 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 8f18c1ed0..45167b6a7 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -19,7 +19,6 @@
 #include <rte_ether.h>
 #include <rte_rwlock.h>
 
-#include "fd_man.h"
 #include "rte_vhost.h"
 
 /* Used to indicate that the device is running on a data core */
@@ -302,10 +301,6 @@ struct virtio_net {
 	int			slave_req_fd;
 } __rte_cache_aligned;
 
-/* The vhost_user and vhost_user_socket declarations are temporary measures for
- * moving AF_UNIX code into trans_af_unix.c.  They will be cleaned up as
- * socket.c is untangled from trans_af_unix.c.
- */
 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
  * vhost_user_socket struct will be created.
@@ -336,16 +331,6 @@ struct vhost_user_socket {
 	struct vhost_transport_ops const *trans_ops;
 };
 
-#define MAX_VHOST_SOCKET 1024
-struct vhost_user {
-	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
-	struct fdset fdset;
-	int vsocket_cnt;
-	pthread_mutex_t mutex;
-};
-
-extern struct vhost_user vhost_user;
-
 #define VHOST_LOG_PAGE	4096
 
 /*
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index f9069fcb1..c46328950 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -20,12 +20,14 @@
 #include "vhost.h"
 #include "vhost_user.h"
 
+#define MAX_VHOST_SOCKET 1024
+struct vhost_user {
+	struct vhost_user_socket *vsockets[MAX_VHOST_SOCKET];
+	int vsocket_cnt;
+	pthread_mutex_t mutex;
+};
+
 struct vhost_user vhost_user = {
-	.fdset = {
-		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
-		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
-		.num = 0
-	},
 	.vsocket_cnt = 0,
 	.mutex = PTHREAD_MUTEX_INITIALIZER,
 };
@@ -265,7 +267,6 @@ int
 rte_vhost_driver_start(const char *path)
 {
 	struct vhost_user_socket *vsocket;
-	static pthread_t fdset_tid;
 
 	pthread_mutex_lock(&vhost_user.mutex);
 	vsocket = find_vhost_user_socket(path);
@@ -274,13 +275,5 @@ rte_vhost_driver_start(const char *path)
 	if (!vsocket)
 		return -1;
 
-	if (fdset_tid == 0) {
-		int ret = pthread_create(&fdset_tid, NULL, fdset_event_dispatch,
-				     &vhost_user.fdset);
-		if (ret != 0)
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"failed to create fdset handling thread");
-	}
-
 	return vsocket->trans_ops->socket_start(vsocket);
 }
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index ad07a8e24..54c1b2db3 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -39,11 +39,18 @@
 
 #include <rte_log.h>
 
+#include "fd_man.h"
 #include "vhost.h"
 #include "vhost_user.h"
 
 #define MAX_VIRTIO_BACKLOG 128
 
+static struct fdset af_unix_fdset = {
+	.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
+	.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
+	.num = 0
+};
+
 TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
 
 struct vhost_user_connection {
@@ -195,7 +202,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	conn->connfd = fd;
 	conn->vsocket = vsocket;
 	conn->vid = vid;
-	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb,
+	ret = fdset_add(&af_unix_fdset, fd, vhost_user_read_cb,
 			NULL, conn);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
@@ -316,7 +323,7 @@ vhost_user_start_server(struct vhost_user_socket *vsocket)
 	if (ret < 0)
 		goto err;
 
-	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
+	ret = fdset_add(&af_unix_fdset, fd, vhost_user_server_new_connection,
 		  NULL, vsocket);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
@@ -551,7 +558,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 	struct vhost_user_connection *conn, *next;
 
 	if (vsocket->is_server) {
-		fdset_del(&vhost_user.fdset, s->socket_fd);
+		fdset_del(&af_unix_fdset, s->socket_fd);
 		close(s->socket_fd);
 		unlink(vsocket->path);
 	} else if (vsocket->reconnect) {
@@ -564,7 +571,7 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 	     conn = next) {
 		next = TAILQ_NEXT(conn, next);
 
-		fdset_del(&vhost_user.fdset, conn->connfd);
+		fdset_del(&af_unix_fdset, conn->connfd);
 		RTE_LOG(INFO, VHOST_CONFIG,
 			"free connfd = %d for device '%s'\n",
 			conn->connfd, vsocket->path);
@@ -581,6 +588,16 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 static int
 af_unix_socket_start(struct vhost_user_socket *vsocket)
 {
+	static pthread_t fdset_tid;
+
+	if (fdset_tid == 0) {
+		int ret = pthread_create(&fdset_tid, NULL, fdset_event_dispatch,
+				     &af_unix_fdset);
+		if (ret != 0)
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to create fdset handling thread");
+	}
+
 	if (vsocket->is_server)
 		return vhost_user_start_server(vsocket);
 	else
-- 
2.14.3

* [dpdk-dev] [RFC 09/24] vhost: pass vhost_transport_ops through vhost_new_device()
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

This patch propagates struct vhost_user_socket's vhost_transport_ops
into the newly created vhost device.

This patch completes the initial refactoring of socket.c: the
AF_UNIX-specific code now lives in trans_af_unix.c and the librte_vhost API
entry points in socket.c.

Now it is time to turn to vhost_user.c and its mixture of vhost-user
protocol processing and socket I/O.  The socket I/O will be moved into
trans_af_unix.c so that transports that don't use file descriptors can be
added.
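
Here is a minimal sketch of why the ops pointer is threaded through device creation, using simplified stand-in types. struct sketch_device, sketch_new_device(), the sketch_other_ops table and its callbacks are hypothetical. Once a device records the ops table it was created with, calls such as vring_call dispatch to the right transport instead of to a hard-coded AF_UNIX table.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_device;

struct sketch_transport_ops {
	const char *name;
	int (*vring_call)(struct sketch_device *dev);
};

struct sketch_device {
	int vid;
	const struct sketch_transport_ops *trans_ops;
	int calls; /* lets the example observe which callback ran */
};

static int unix_vring_call(struct sketch_device *dev)
{
	dev->calls += 1;
	return 0;
}

/* Hypothetical second transport. */
static int other_vring_call(struct sketch_device *dev)
{
	dev->calls += 100;
	return 0;
}

static const struct sketch_transport_ops sketch_unix_ops = {
	.name = "af_unix",
	.vring_call = unix_vring_call,
};

static const struct sketch_transport_ops sketch_other_ops = {
	.name = "other",
	.vring_call = other_vring_call,
};

/* The caller (the transport) passes its own ops table in; before the
 * patch the equivalent assignment was hard-coded to the AF_UNIX ops. */
static struct sketch_device *
sketch_new_device(const struct sketch_transport_ops *trans_ops, int vid)
{
	struct sketch_device *dev = calloc(1, sizeof(*dev));

	if (dev == NULL)
		return NULL;
	dev->vid = vid;
	dev->trans_ops = trans_ops;
	return dev;
}
```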

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 2 +-
 lib/librte_vhost/trans_af_unix.c | 2 +-
 lib/librte_vhost/vhost.c         | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 45167b6a7..ca7507284 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -434,7 +434,7 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
 
 struct virtio_net *get_device(int vid);
 
-int vhost_new_device(void);
+int vhost_new_device(const struct vhost_transport_ops *trans_ops);
 void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 54c1b2db3..c85a162e8 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -176,7 +176,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 		return;
 	}
 
-	vid = vhost_new_device();
+	vid = vhost_new_device(vsocket->trans_ops);
 	if (vid == -1) {
 		goto err;
 	}
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 630dbd014..f8754e261 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -261,7 +261,7 @@ reset_device(struct virtio_net *dev)
  * there is a new virtio device being attached).
  */
 int
-vhost_new_device(void)
+vhost_new_device(const struct vhost_transport_ops *trans_ops)
 {
 	struct virtio_net *dev;
 	int i;
@@ -287,7 +287,7 @@ vhost_new_device(void)
 	vhost_devices[i] = dev;
 	dev->vid = i;
 	dev->slave_req_fd = -1;
-	dev->trans_ops = &af_unix_trans_ops;
+	dev->trans_ops = trans_ops;
 
 	return i;
 }
-- 
2.14.3

* [dpdk-dev] [RFC 10/24] vhost: embed struct virtio_net inside struct vhost_user_connection
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

There is a 1:1 relationship between struct virtio_net and struct
vhost_user_connection.  They share the same lifetime.  struct virtio_net
is the per-device state that is part of the vhost.h API.  struct
vhost_user_connection is the AF_UNIX-specific per-device state and is
private to trans_af_unix.c.  The transport code needs to convert between
these two structs.

This patch embeds struct virtio_net within struct vhost_user_connection
so that AF_UNIX transport code can convert a struct virtio_net pointer
into a struct vhost_user_connection pointer.  There is now just a single
malloc/free for both of these structs together.
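
Here is a minimal sketch of the single-allocation layout, using a cut-down struct virtio_net and ops table. sketch_new_device(), sketch_destroy_device() and the reduced fields are hypothetical. The core allocates device_size bytes and fills in the generic part. The transport then recovers its wrapper with container_of().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in for DPDK's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct virtio_net {
	int vid; /* reduced to one field for the sketch */
};

struct sketch_transport_ops {
	size_t device_size; /* size of the transport's derived struct */
};

struct vhost_user_connection {
	struct virtio_net device; /* must be the first field */
	int connfd;
};

static const struct sketch_transport_ops sketch_af_unix_ops = {
	.device_size = sizeof(struct vhost_user_connection),
};

/* Allocate the transport's larger struct but hand back the generic part. */
static struct virtio_net *
sketch_new_device(const struct sketch_transport_ops *ops, int vid)
{
	struct virtio_net *dev = calloc(1, ops->device_size);

	if (dev == NULL)
		return NULL;
	dev->vid = vid;
	return dev;
}

/* The generic part is at offset 0, so one free releases both structs. */
static void sketch_destroy_device(struct virtio_net *dev)
{
	free(dev);
}
```

Because the generic struct sits at offset 0, freeing the struct virtio_net pointer releases the whole allocation. That is presumably why the patch replaces free(conn) with vhost_destroy_device().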

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 11 +++++++++-
 lib/librte_vhost/trans_af_unix.c | 44 +++++++++++++++++-----------------------
 lib/librte_vhost/vhost.c         | 10 ++++-----
 3 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index ca7507284..757e18391 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -211,6 +211,9 @@ struct vhost_transport_ops {
 	/** Size of struct vhost_user_socket-derived per-socket state */
 	size_t socket_size;
 
+	/** Size of struct virtio_net-derived per-device state */
+	size_t device_size;
+
 	/**
 	 * Initialize a vhost-user socket that is being created by
 	 * rte_vhost_driver_register().  This function checks that the flags
@@ -269,6 +272,11 @@ extern const struct vhost_transport_ops af_unix_trans_ops;
 /**
  * Device structure contains all configuration information relating
  * to the device.
+ *
+ * Transport-specific per-device state can be kept by embedding this struct at
+ * the beginning of a transport-specific struct.  Set
+ * vhost_transport_ops->device_size to the size of the transport-specific
+ * struct.
  */
 struct virtio_net {
 	/* Frontend (QEMU) memory and memory region information */
@@ -434,7 +442,8 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
 
 struct virtio_net *get_device(int vid);
 
-int vhost_new_device(const struct vhost_transport_ops *trans_ops);
+struct virtio_net *
+vhost_new_device(const struct vhost_transport_ops *trans_ops);
 void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index c85a162e8..dde3e76cd 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -54,9 +54,9 @@ static struct fdset af_unix_fdset = {
 TAILQ_HEAD(vhost_user_connection_list, vhost_user_connection);
 
 struct vhost_user_connection {
+	struct virtio_net device; /* must be the first field! */
 	struct vhost_user_socket *vsocket;
 	int connfd;
-	int vid;
 
 	TAILQ_ENTRY(vhost_user_connection) next;
 };
@@ -165,32 +165,30 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 {
 	struct af_unix_socket *s =
 		container_of(vsocket, struct af_unix_socket, socket);
-	int vid;
 	size_t size;
+	struct virtio_net *dev;
 	struct vhost_user_connection *conn;
 	int ret;
 
-	conn = malloc(sizeof(*conn));
-	if (conn == NULL) {
-		close(fd);
+	dev = vhost_new_device(vsocket->trans_ops);
+	if (!dev) {
 		return;
 	}
 
-	vid = vhost_new_device(vsocket->trans_ops);
-	if (vid == -1) {
-		goto err;
-	}
+	conn = container_of(dev, struct vhost_user_connection, device);
+	conn->connfd = fd;
+	conn->vsocket = vsocket;
 
 	size = strnlen(vsocket->path, PATH_MAX);
-	vhost_set_ifname(vid, vsocket->path, size);
+	vhost_set_ifname(dev->vid, vsocket->path, size);
 
 	if (vsocket->dequeue_zero_copy)
-		vhost_enable_dequeue_zero_copy(vid);
+		vhost_enable_dequeue_zero_copy(dev->vid);
 
-	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid);
+	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", dev->vid);
 
 	if (vsocket->notify_ops->new_connection) {
-		ret = vsocket->notify_ops->new_connection(vid);
+		ret = vsocket->notify_ops->new_connection(dev->vid);
 		if (ret < 0) {
 			RTE_LOG(ERR, VHOST_CONFIG,
 				"failed to add vhost user connection with fd %d\n",
@@ -199,9 +197,6 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 		}
 	}
 
-	conn->connfd = fd;
-	conn->vsocket = vsocket;
-	conn->vid = vid;
 	ret = fdset_add(&af_unix_fdset, fd, vhost_user_read_cb,
 			NULL, conn);
 	if (ret < 0) {
@@ -210,7 +205,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 			fd);
 
 		if (vsocket->notify_ops->destroy_connection)
-			vsocket->notify_ops->destroy_connection(conn->vid);
+			vsocket->notify_ops->destroy_connection(dev->vid);
 
 		goto err;
 	}
@@ -221,8 +216,8 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	return;
 
 err:
-	free(conn);
-	close(fd);
+	close(conn->connfd);
+	vhost_destroy_device(dev->vid);
 }
 
 /* call back when there is new vhost-user connection from client  */
@@ -248,20 +243,19 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 		container_of(vsocket, struct af_unix_socket, socket);
 	int ret;
 
-	ret = vhost_user_msg_handler(conn->vid, connfd);
+	ret = vhost_user_msg_handler(conn->device.vid, connfd);
 	if (ret < 0) {
 		close(connfd);
 		*remove = 1;
-		vhost_destroy_device(conn->vid);
 
 		if (vsocket->notify_ops->destroy_connection)
-			vsocket->notify_ops->destroy_connection(conn->vid);
+			vsocket->notify_ops->destroy_connection(conn->device.vid);
 
 		pthread_mutex_lock(&s->conn_mutex);
 		TAILQ_REMOVE(&s->conn_list, conn, next);
 		pthread_mutex_unlock(&s->conn_mutex);
 
-		free(conn);
+		vhost_destroy_device(conn->device.vid);
 
 		if (vsocket->reconnect) {
 			create_unix_socket(vsocket);
@@ -576,9 +570,8 @@ af_unix_socket_cleanup(struct vhost_user_socket *vsocket)
 			"free connfd = %d for device '%s'\n",
 			conn->connfd, vsocket->path);
 		close(conn->connfd);
-		vhost_destroy_device(conn->vid);
 		TAILQ_REMOVE(&s->conn_list, conn, next);
-		free(conn);
+		vhost_destroy_device(conn->device.vid);
 	}
 	pthread_mutex_unlock(&s->conn_mutex);
 
@@ -615,6 +608,7 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused,
 
 const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
+	.device_size = sizeof(struct vhost_user_connection),
 	.socket_init = af_unix_socket_init,
 	.socket_cleanup = af_unix_socket_cleanup,
 	.socket_start = af_unix_socket_start,
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index f8754e261..1168e137e 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -260,17 +260,17 @@ reset_device(struct virtio_net *dev)
  * Invoked when there is a new vhost-user connection established (when
  * there is a new virtio device being attached).
  */
-int
+struct virtio_net *
 vhost_new_device(const struct vhost_transport_ops *trans_ops)
 {
 	struct virtio_net *dev;
 	int i;
 
-	dev = rte_zmalloc(NULL, sizeof(struct virtio_net), 0);
+	dev = rte_zmalloc(NULL, trans_ops->device_size, 0);
 	if (dev == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"Failed to allocate memory for new dev.\n");
-		return -1;
+		return NULL;
 	}
 
 	for (i = 0; i < MAX_VHOST_DEVICE; i++) {
@@ -281,7 +281,7 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops)
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"Failed to find a free slot for new device.\n");
 		rte_free(dev);
-		return -1;
+		return NULL;
 	}
 
 	vhost_devices[i] = dev;
@@ -289,7 +289,7 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops)
 	dev->slave_req_fd = -1;
 	dev->trans_ops = trans_ops;
 
-	return i;
+	return dev;
 }
 
 /*
-- 
2.14.3

* [dpdk-dev] [RFC 11/24] vhost: extract vhost_user.c socket I/O into transport
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The core vhost-user protocol code should not do socket I/O because the
details are transport-specific.  Move code to send and receive
vhost-user messages into trans_af_unix.c.
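
Here is a minimal sketch of the framing that a transport's send and receive callbacks deal with. It assumes a simplified message layout: a 12-byte header (request, flags, size) and a 64-byte payload. struct sketch_msg and the sketch_* functions are hypothetical, and file-descriptor passing via SCM_RIGHTS is omitted. Only the header plus msg->size payload bytes go on the wire. As in read_vhost_message(), the receiver validates size before reading the payload.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Simplified message: header fields followed by a bounded payload. */
struct sketch_msg {
	uint32_t request;
	uint32_t flags;
	uint32_t size; /* number of payload bytes following the header */
	uint8_t payload[64];
};

#define SKETCH_HDR_SIZE offsetof(struct sketch_msg, payload)

/* Read exactly len bytes; returns len, 0 on EOF, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = read(fd, (char *)buf + done, len - done);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return n;
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* Send side: header plus only the used part of the payload. */
static int sketch_send_msg(int fd, const struct sketch_msg *msg)
{
	size_t len = SKETCH_HDR_SIZE + msg->size;

	return write(fd, msg, len) == (ssize_t)len ? 0 : -1;
}

/* Receive side: header first, then a size-checked payload. */
static int sketch_read_msg(int fd, struct sketch_msg *msg)
{
	if (read_full(fd, msg, SKETCH_HDR_SIZE) != (ssize_t)SKETCH_HDR_SIZE)
		return -1;
	if (msg->size > sizeof(msg->payload))
		return -1; /* reject before touching the payload buffer */
	if (msg->size > 0 &&
	    read_full(fd, msg->payload, msg->size) != (ssize_t)msg->size)
		return -1;
	return 0;
}
```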

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 26 ++++++++++++
 lib/librte_vhost/vhost_user.h    |  6 +--
 lib/librte_vhost/trans_af_unix.c | 70 +++++++++++++++++++++++++++++++--
 lib/librte_vhost/vhost_user.c    | 85 +++++++++++-----------------------------
 4 files changed, 115 insertions(+), 72 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 757e18391..ac9ceefb9 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -203,6 +203,7 @@ struct guest_page {
 
 struct virtio_net;
 struct vhost_user_socket;
+struct VhostUserMsg;
 
 /**
  * A structure containing function pointers for transport-specific operations.
@@ -264,6 +265,31 @@ struct vhost_transport_ops {
 	 *  0 on success, -1 on failure
 	 */
 	int (*vring_call)(struct virtio_net *dev, struct vhost_virtqueue *vq);
+
+	/**
+	 * Send a reply to the master.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param reply
+	 *  reply message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*send_reply)(struct virtio_net *dev, struct VhostUserMsg *reply);
+
+	/**
+	 * Send a slave request to the master.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param req
+	 *  request message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*send_slave_req)(struct virtio_net *dev,
+			      struct VhostUserMsg *req);
 };
 
 /** The traditional AF_UNIX vhost-user protocol transport. */
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index d4bd604b9..dec658dff 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -110,11 +110,7 @@ typedef struct VhostUserMsg {
 
 
 /* vhost_user.c */
-int vhost_user_msg_handler(int vid, int fd);
+int vhost_user_msg_handler(int vid, const struct VhostUserMsg *msg);
 int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
-/* socket.c */
-int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
-int send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
-
 #endif
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index dde3e76cd..9e5a5c127 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -75,7 +75,7 @@ static int vhost_user_start_client(struct vhost_user_socket *vsocket);
 static void vhost_user_read_cb(int connfd, void *dat, int *remove);
 
 /* return bytes# of read on success or negative val on failure. */
-int
+static int
 read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
 {
 	struct iovec iov;
@@ -117,8 +117,8 @@ read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
 	return ret;
 }
 
-int
-send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+static int
+send_fd_message(int sockfd, void *buf, int buflen, int *fds, int fd_num)
 {
 
 	struct iovec iov;
@@ -160,6 +160,23 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
 	return ret;
 }
 
+static int
+af_unix_send_reply(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+
+	return send_fd_message(conn->connfd, msg,
+			       VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
+}
+
+static int
+af_unix_send_slave_req(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	return send_fd_message(dev->slave_req_fd, msg,
+			       VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
+}
+
 static void
 vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 {
@@ -234,6 +251,36 @@ vhost_user_server_new_connection(int fd, void *dat, int *remove __rte_unused)
 	vhost_user_add_connection(fd, vsocket);
 }
 
+/* Return number of bytes read on success, negative value on failure. */
+static int
+read_vhost_message(int sockfd, struct VhostUserMsg *msg)
+{
+	int ret;
+
+	ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
+		msg->fds, VHOST_MEMORY_MAX_NREGIONS);
+	if (ret <= 0)
+		return ret;
+
+	if (msg && msg->size) {
+		if (msg->size > sizeof(msg->payload)) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"invalid msg size: %d\n", msg->size);
+			return -1;
+		}
+		ret = read(sockfd, &msg->payload, msg->size);
+		if (ret <= 0)
+			return ret;
+		if (ret != (int)msg->size) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"read control message failed\n");
+			return -1;
+		}
+	}
+
+	return ret;
+}
+
 static void
 vhost_user_read_cb(int connfd, void *dat, int *remove)
 {
@@ -241,10 +288,23 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
 	struct vhost_user_socket *vsocket = conn->vsocket;
 	struct af_unix_socket *s =
 		container_of(vsocket, struct af_unix_socket, socket);
+	struct VhostUserMsg msg;
 	int ret;
 
-	ret = vhost_user_msg_handler(conn->device.vid, connfd);
+	ret = read_vhost_message(connfd, &msg);
+	if (ret <= 0) {
+		if (ret < 0)
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"vhost read message failed\n");
+		else if (ret == 0)
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"vhost peer closed\n");
+		goto err;
+	}
+
+	ret = vhost_user_msg_handler(conn->device.vid, &msg);
 	if (ret < 0) {
+err:
 		close(connfd);
 		*remove = 1;
 
@@ -613,4 +673,6 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_cleanup = af_unix_socket_cleanup,
 	.socket_start = af_unix_socket_start,
 	.vring_call = af_unix_vring_call,
+	.send_reply = af_unix_send_reply,
+	.send_slave_req = af_unix_send_slave_req,
 };
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index e54795a41..5f89453bc 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1137,48 +1137,8 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
 	return 0;
 }
 
-/* return bytes# of read on success or negative val on failure. */
 static int
-read_vhost_message(int sockfd, struct VhostUserMsg *msg)
-{
-	int ret;
-
-	ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
-		msg->fds, VHOST_MEMORY_MAX_NREGIONS);
-	if (ret <= 0)
-		return ret;
-
-	if (msg && msg->size) {
-		if (msg->size > sizeof(msg->payload)) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"invalid msg size: %d\n", msg->size);
-			return -1;
-		}
-		ret = read(sockfd, &msg->payload, msg->size);
-		if (ret <= 0)
-			return ret;
-		if (ret != (int)msg->size) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"read control message failed\n");
-			return -1;
-		}
-	}
-
-	return ret;
-}
-
-static int
-send_vhost_message(int sockfd, struct VhostUserMsg *msg)
-{
-	if (!msg)
-		return 0;
-
-	return send_fd_message(sockfd, (char *)msg,
-		VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
-}
-
-static int
-send_vhost_reply(int sockfd, struct VhostUserMsg *msg)
+send_vhost_reply(struct virtio_net *dev, struct VhostUserMsg *msg)
 {
 	if (!msg)
 		return 0;
@@ -1188,7 +1148,16 @@ send_vhost_reply(int sockfd, struct VhostUserMsg *msg)
 	msg->flags |= VHOST_USER_VERSION;
 	msg->flags |= VHOST_USER_REPLY_MASK;
 
-	return send_vhost_message(sockfd, msg);
+	return dev->trans_ops->send_reply(dev, msg);
+}
+
+static int
+send_vhost_slave_req(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	if (!msg)
+		return 0;
+
+	return dev->trans_ops->send_slave_req(dev, msg);
 }
 
 /*
@@ -1230,10 +1199,10 @@ vhost_user_check_and_alloc_queue_pair(struct virtio_net *dev, VhostUserMsg *msg)
 }
 
 int
-vhost_user_msg_handler(int vid, int fd)
+vhost_user_msg_handler(int vid, const struct VhostUserMsg *msg_)
 {
+	struct VhostUserMsg msg = *msg_; /* copy so we can build the reply */
 	struct virtio_net *dev;
-	struct VhostUserMsg msg;
 	int ret;
 
 	dev = get_device(vid);
@@ -1250,18 +1219,8 @@ vhost_user_msg_handler(int vid, int fd)
 		}
 	}
 
-	ret = read_vhost_message(fd, &msg);
-	if (ret <= 0 || msg.request.master >= VHOST_USER_MAX) {
-		if (ret < 0)
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"vhost read message failed\n");
-		else if (ret == 0)
-			RTE_LOG(INFO, VHOST_CONFIG,
-				"vhost peer closed\n");
-		else
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"vhost read incorrect message\n");
-
+	if (msg.request.master >= VHOST_USER_MAX) {
+		RTE_LOG(ERR, VHOST_CONFIG, "vhost read incorrect message\n");
 		return -1;
 	}
 
@@ -1284,7 +1243,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_FEATURES:
 		msg.payload.u64 = vhost_user_get_features(dev);
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_reply(fd, &msg);
+		send_vhost_reply(dev, &msg);
 		break;
 	case VHOST_USER_SET_FEATURES:
 		ret = vhost_user_set_features(dev, msg.payload.u64);
@@ -1294,7 +1253,7 @@ vhost_user_msg_handler(int vid, int fd)
 
 	case VHOST_USER_GET_PROTOCOL_FEATURES:
 		vhost_user_get_protocol_features(dev, &msg);
-		send_vhost_reply(fd, &msg);
+		send_vhost_reply(dev, &msg);
 		break;
 	case VHOST_USER_SET_PROTOCOL_FEATURES:
 		vhost_user_set_protocol_features(dev, msg.payload.u64);
@@ -1316,7 +1275,7 @@ vhost_user_msg_handler(int vid, int fd)
 
 		/* it needs a reply */
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_reply(fd, &msg);
+		send_vhost_reply(dev, &msg);
 		break;
 	case VHOST_USER_SET_LOG_FD:
 		close(msg.fds[0]);
@@ -1336,7 +1295,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_VRING_BASE:
 		vhost_user_get_vring_base(dev, &msg);
 		msg.size = sizeof(msg.payload.state);
-		send_vhost_reply(fd, &msg);
+		send_vhost_reply(dev, &msg);
 		break;
 
 	case VHOST_USER_SET_VRING_KICK:
@@ -1355,7 +1314,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_QUEUE_NUM:
 		msg.payload.u64 = VHOST_MAX_QUEUE_PAIRS;
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_reply(fd, &msg);
+		send_vhost_reply(dev, &msg);
 		break;
 
 	case VHOST_USER_SET_VRING_ENABLE:
@@ -1386,7 +1345,7 @@ vhost_user_msg_handler(int vid, int fd)
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
 		msg.payload.u64 = !!ret;
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_reply(fd, &msg);
+		send_vhost_reply(dev, &msg);
 	}
 
 	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
@@ -1421,7 +1380,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
 		},
 	};
 
-	ret = send_vhost_message(dev->slave_req_fd, &msg);
+	ret = send_vhost_slave_req(dev, &msg);
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 				"Failed to send IOTLB miss message (%d)\n",
-- 
2.14.3


* [dpdk-dev] [RFC 12/24] vhost: move slave_req_fd field to AF_UNIX transport
  2018-01-19 13:44 [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
                   ` (10 preceding siblings ...)
  2018-01-19 13:44 ` [dpdk-dev] [RFC 11/24] vhost: extract vhost_user.c socket I/O into transport Stefan Hajnoczi
@ 2018-01-19 13:44 ` Stefan Hajnoczi
  2018-01-19 13:44 ` [dpdk-dev] [RFC 13/24] vhost: move mmap/munmap " Stefan Hajnoczi
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The slave request file descriptor is specific to the AF_UNIX transport.
Move the field out of struct virtio_net and into struct
vhost_user_connection, which is private to trans_af_unix.c.

This change will allow future transports to implement the slave request
channel without relying on socket I/O.
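The pattern this relies on can be sketched as follows. This is a minimal illustration using simplified stand-in structs rather than the real librte_vhost definitions, and the sketch_ helpers are hypothetical: struct virtio_net is embedded in the transport's private connection struct, and container_of() (defined locally here) recovers the connection, and with it slave_req_fd, from the generic device pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Simplified stand-in: only the fields needed for this sketch. */
struct virtio_net {
	int vid;
};

/* Transport-private per-connection state.  The generic device is embedded
 * so transport callbacks can get back to the connection from it. */
struct vhost_user_connection {
	struct virtio_net device;
	int connfd;
	int slave_req_fd;
};

/* Local definition for the sketch; DPDK provides its own. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Record the slave channel fd received with VHOST_USER_SET_SLAVE_REQ_FD. */
static int
sketch_set_slave_req_fd(struct virtio_net *dev, int fd)
{
	struct vhost_user_connection *conn;

	assert(dev != NULL);
	conn = container_of(dev, struct vhost_user_connection, device);
	if (fd < 0)
		return -1;
	conn->slave_req_fd = fd;
	return 0;
}

/* Close the slave channel fd, if one was set, when the device is cleaned up. */
static void
sketch_cleanup_device(struct virtio_net *dev)
{
	struct vhost_user_connection *conn =
		container_of(dev, struct vhost_user_connection, device);

	if (conn->slave_req_fd >= 0) {
		close(conn->slave_req_fd);
		conn->slave_req_fd = -1;
	}
}
```

Because vhost_user.c only ever sees struct virtio_net and calls through the transport ops, it never touches the fd directly; a transport without sockets can keep whatever state it needs in its own private struct instead.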

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 27 ++++++++++++++++++++++++--
 lib/librte_vhost/trans_af_unix.c | 41 +++++++++++++++++++++++++++++++++++++++-
 lib/librte_vhost/vhost.c         |  3 ++-
 lib/librte_vhost/vhost_user.c    | 18 +-----------------
 4 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index ac9ceefb9..60e4d10bd 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -252,6 +252,16 @@ struct vhost_transport_ops {
 	 */
 	int (*socket_start)(struct vhost_user_socket *vsocket);
 
+	/**
+	 * Free resources associated with this device.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param destroy
+	 *  0 on device reset, 1 on full cleanup.
+	 */
+	void (*cleanup_device)(struct virtio_net *dev, int destroy);
+
 	/**
 	 * Notify the guest that used descriptors have been added to the vring.
 	 * The VRING_AVAIL_F_NO_INTERRUPT flag has already been checked so this
@@ -290,6 +300,21 @@ struct vhost_transport_ops {
 	 */
 	int (*send_slave_req)(struct virtio_net *dev,
 			      struct VhostUserMsg *req);
+
+	/**
+	 * Process VHOST_USER_SET_SLAVE_REQ_FD message.  After this function
+	 * succeeds send_slave_req() may be called to submit requests to the
+	 * master.
+	 *
+	 * @param dev
+	 *  vhost device
+	 * @param msg
+	 *  message
+	 * @return
+	 *  0 on success, -1 on failure
+	 */
+	int (*set_slave_req_fd)(struct virtio_net *dev,
+				struct VhostUserMsg *msg);
 };
 
 /** The traditional AF_UNIX vhost-user protocol transport. */
@@ -331,8 +356,6 @@ struct virtio_net {
 	uint32_t		nr_guest_pages;
 	uint32_t		max_guest_pages;
 	struct guest_page       *guest_pages;
-
-	int			slave_req_fd;
 } __rte_cache_aligned;
 
 /*
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 9e5a5c127..7128e121e 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -57,6 +57,7 @@ struct vhost_user_connection {
 	struct virtio_net device; /* must be the first field! */
 	struct vhost_user_socket *vsocket;
 	int connfd;
+	int slave_req_fd;
 
 	TAILQ_ENTRY(vhost_user_connection) next;
 };
@@ -173,10 +174,32 @@ af_unix_send_reply(struct virtio_net *dev, struct VhostUserMsg *msg)
 static int
 af_unix_send_slave_req(struct virtio_net *dev, struct VhostUserMsg *msg)
 {
-	return send_fd_message(dev->slave_req_fd, msg,
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+
+	return send_fd_message(conn->slave_req_fd, msg,
 			       VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
 }
 
+static int
+af_unix_set_slave_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+	int fd = msg->fds[0];
+
+	if (fd < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Invalid file descriptor for slave channel (%d)\n",
+				fd);
+		return -1;
+	}
+
+	conn->slave_req_fd = fd;
+
+	return 0;
+}
+
 static void
 vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 {
@@ -194,6 +217,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 
 	conn = container_of(dev, struct vhost_user_connection, device);
 	conn->connfd = fd;
+	conn->slave_req_fd = -1;
 	conn->vsocket = vsocket;
 
 	size = strnlen(vsocket->path, PATH_MAX);
@@ -657,6 +681,19 @@ af_unix_socket_start(struct vhost_user_socket *vsocket)
 		return vhost_user_start_client(vsocket);
 }
 
+static void
+af_unix_cleanup_device(struct virtio_net *dev,
+		       int destroy __rte_unused)
+{
+	struct vhost_user_connection *conn =
+		container_of(dev, struct vhost_user_connection, device);
+
+	if (conn->slave_req_fd >= 0) {
+		close(conn->slave_req_fd);
+		conn->slave_req_fd = -1;
+	}
+}
+
 static int
 af_unix_vring_call(struct virtio_net *dev __rte_unused,
 		   struct vhost_virtqueue *vq)
@@ -672,7 +709,9 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_init = af_unix_socket_init,
 	.socket_cleanup = af_unix_socket_cleanup,
 	.socket_start = af_unix_socket_start,
+	.cleanup_device = af_unix_cleanup_device,
 	.vring_call = af_unix_vring_call,
 	.send_reply = af_unix_send_reply,
 	.send_slave_req = af_unix_send_slave_req,
+	.set_slave_req_fd = af_unix_set_slave_req_fd,
 };
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 1168e137e..0d95a4b3a 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -96,6 +96,8 @@ cleanup_device(struct virtio_net *dev, int destroy)
 
 	for (i = 0; i < dev->nr_vring; i++)
 		cleanup_vq(dev->virtqueue[i], destroy);
+
+	dev->trans_ops->cleanup_device(dev, destroy);
 }
 
 void
@@ -286,7 +288,6 @@ vhost_new_device(const struct vhost_transport_ops *trans_ops)
 
 	vhost_devices[i] = dev;
 	dev->vid = i;
-	dev->slave_req_fd = -1;
 	dev->trans_ops = trans_ops;
 
 	return dev;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 5f89453bc..ee1b0a1a2 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -96,11 +96,6 @@ vhost_backend_cleanup(struct virtio_net *dev)
 		munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
 		dev->log_addr = 0;
 	}
-
-	if (dev->slave_req_fd >= 0) {
-		close(dev->slave_req_fd);
-		dev->slave_req_fd = -1;
-	}
 }
 
 /*
@@ -1030,18 +1025,7 @@ vhost_user_net_set_mtu(struct virtio_net *dev, struct VhostUserMsg *msg)
 static int
 vhost_user_set_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
 {
-	int fd = msg->fds[0];
-
-	if (fd < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-				"Invalid file descriptor for slave channel (%d)\n",
-				fd);
-		return -1;
-	}
-
-	dev->slave_req_fd = fd;
-
-	return 0;
+	return dev->trans_ops->set_slave_req_fd(dev, msg);
 }
 
 static int
-- 
2.14.3


* [dpdk-dev] [RFC 13/24] vhost: move mmap/munmap to AF_UNIX transport
  2018-01-19 13:44 [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
                   ` (11 preceding siblings ...)
  2018-01-19 13:44 ` [dpdk-dev] [RFC 12/24] vhost: move slave_req_fd field to AF_UNIX transport Stefan Hajnoczi
@ 2018-01-19 13:44 ` Stefan Hajnoczi
  2018-01-19 13:44 ` [dpdk-dev] [RFC 14/24] vhost: move librte_vhost to drivers/ Stefan Hajnoczi
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

How mem table regions are mapped is transport-specific, so move the mmap
code into trans_af_unix.c.  The new .map_mem_regions()/.unmap_mem_regions()
interfaces allow transports to perform the mapping and unmapping.

Drop the "mmap align:" debug output because the alignment is no longer
available from vhost_user.c.
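The split between transport-independent and transport-specific work can be sketched as follows. This is a minimal illustration with simplified stand-in types; the sketch_ names and the heap-backed transport are hypothetical, not part of librte_vhost. The core fills in each region's size and mmap_size (size + mmap_offset) and zeroes host_user_addr, calls the transport's map_mem_regions() callback, and calls unmap_mem_regions() if mapping fails.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_REGIONS 8

/* Simplified stand-in for struct rte_vhost_mem_region. */
struct sketch_region {
	uint64_t size;           /* guest memory size in bytes */
	uint64_t mmap_size;      /* size + mmap_offset, filled in by the core */
	void *mmap_addr;         /* filled in by the transport */
	uint64_t host_user_addr; /* filled in by the transport, 0 = unmapped */
};

struct sketch_dev;

/* Models only the two callbacks added by this patch. */
struct sketch_trans_ops {
	int (*map_mem_regions)(struct sketch_dev *dev);
	void (*unmap_mem_regions)(struct sketch_dev *dev);
};

struct sketch_dev {
	const struct sketch_trans_ops *trans_ops;
	uint32_t nregions;
	struct sketch_region regions[SKETCH_MAX_REGIONS];
};

/* Hypothetical transport that backs regions with heap memory rather than
 * mmap()ing file descriptors received over AF_UNIX. */
static int
heap_map_mem_regions(struct sketch_dev *dev)
{
	for (uint32_t i = 0; i < dev->nregions; i++) {
		struct sketch_region *reg = &dev->regions[i];
		uint64_t mmap_offset = reg->mmap_size - reg->size;

		reg->mmap_addr = calloc(1, reg->mmap_size);
		if (reg->mmap_addr == NULL)
			return -1;
		reg->host_user_addr =
			(uint64_t)(uintptr_t)reg->mmap_addr + mmap_offset;
	}
	return 0;
}

/* Release only the regions that were actually mapped. */
static void
heap_unmap_mem_regions(struct sketch_dev *dev)
{
	for (uint32_t i = 0; i < dev->nregions; i++) {
		struct sketch_region *reg = &dev->regions[i];

		if (reg->host_user_addr) {
			free(reg->mmap_addr);
			reg->mmap_addr = NULL;
			reg->host_user_addr = 0;
		}
	}
}

static const struct sketch_trans_ops heap_trans_ops = {
	.map_mem_regions = heap_map_mem_regions,
	.unmap_mem_regions = heap_unmap_mem_regions,
};

/* Transport-independent part: describe the regions, then delegate mapping. */
static int
sketch_set_mem_table(struct sketch_dev *dev, uint32_t n,
		     const uint64_t *sizes, const uint64_t *mmap_offsets)
{
	assert(dev->trans_ops != NULL);
	if (n > SKETCH_MAX_REGIONS)
		return -1;

	dev->nregions = n;
	for (uint32_t i = 0; i < n; i++) {
		struct sketch_region *reg = &dev->regions[i];

		reg->size = sizes[i];
		reg->mmap_size = sizes[i] + mmap_offsets[i];
		reg->mmap_addr = NULL;
		reg->host_user_addr = 0;
	}

	if (dev->trans_ops->map_mem_regions(dev) < 0) {
		dev->trans_ops->unmap_mem_regions(dev);
		return -1;
	}
	return 0;
}
```

Because the core zeroes host_user_addr before calling map_mem_regions(), the unmap callback can run safely after a partial failure and release only the regions that were mapped; this is presumably why the patch now initializes mmap_addr and host_user_addr in vhost_user_set_mem_table().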

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 lib/librte_vhost/vhost.h         | 17 +++++++
 lib/librte_vhost/vhost_user.h    |  3 ++
 lib/librte_vhost/trans_af_unix.c | 78 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user.c    | 95 ++++++++++------------------------------
 4 files changed, 121 insertions(+), 72 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 60e4d10bd..a50b802e7 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -315,6 +315,23 @@ struct vhost_transport_ops {
 	 */
 	int (*set_slave_req_fd)(struct virtio_net *dev,
 				struct VhostUserMsg *msg);
+
+	/**
+	 * Map memory table regions in dev->mem->regions[].
+	 *
+	 * @param dev
+	 *  vhost device
+	 */
+	int (*map_mem_regions)(struct virtio_net *dev);
+
+	/**
+	 * Unmap memory table regions in dev->mem->regions[] and free any
+	 * resources, such as file descriptors.
+	 *
+	 * @param dev
+	 *  vhost device
+	 */
+	void (*unmap_mem_regions)(struct virtio_net *dev);
 };
 
 /** The traditional AF_UNIX vhost-user protocol transport. */
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index dec658dff..4181f34c9 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -110,6 +110,9 @@ typedef struct VhostUserMsg {
 
 
 /* vhost_user.c */
+void vhost_add_guest_pages(struct virtio_net *dev,
+			   struct rte_vhost_mem_region *reg,
+			   uint64_t page_size);
 int vhost_user_msg_handler(int vid, const struct VhostUserMsg *msg);
 int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
diff --git a/lib/librte_vhost/trans_af_unix.c b/lib/librte_vhost/trans_af_unix.c
index 7128e121e..d3a5519b7 100644
--- a/lib/librte_vhost/trans_af_unix.c
+++ b/lib/librte_vhost/trans_af_unix.c
@@ -34,6 +34,8 @@
  */
 
 #include <sys/socket.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
 #include <sys/un.h>
 #include <fcntl.h>
 
@@ -703,6 +705,80 @@ af_unix_vring_call(struct virtio_net *dev __rte_unused,
 	return 0;
 }
 
+static uint64_t
+get_blk_size(int fd)
+{
+	struct stat stat;
+	int ret;
+
+	ret = fstat(fd, &stat);
+	return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
+}
+
+static int
+af_unix_map_mem_regions(struct virtio_net *dev)
+{
+	uint32_t i;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		struct rte_vhost_mem_region *reg = &dev->mem->regions[i];
+		uint64_t mmap_size = reg->mmap_size;
+		uint64_t mmap_offset = mmap_size - reg->size;
+		uint64_t alignment;
+		void *mmap_addr;
+
+		/* On older longterm Linux kernels, such as 2.6.32 and
+		 * 3.2.72, mmap() without MAP_ANONYMOUS must be called
+		 * with a length aligned to the hugepage size, otherwise
+		 * it fails with EINVAL.
+		 *
+		 * To avoid this failure, the caller must keep the length
+		 * aligned.
+		 */
+		alignment = get_blk_size(reg->fd);
+		if (alignment == (uint64_t)-1) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"couldn't get hugepage size through fstat\n");
+			return -1;
+		}
+		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
+
+		mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
+				 MAP_SHARED | MAP_POPULATE, reg->fd, 0);
+
+		if (mmap_addr == MAP_FAILED) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"mmap region %u failed.\n", i);
+			return -1;
+		}
+
+		reg->mmap_addr = mmap_addr;
+		reg->mmap_size = mmap_size;
+		reg->host_user_addr = (uint64_t)(uintptr_t)reg->mmap_addr +
+				      mmap_offset;
+
+		if (dev->dequeue_zero_copy)
+			vhost_add_guest_pages(dev, reg, alignment);
+	}
+
+	return 0;
+}
+
+static void
+af_unix_unmap_mem_regions(struct virtio_net *dev)
+{
+	uint32_t i;
+	struct rte_vhost_mem_region *reg;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		reg = &dev->mem->regions[i];
+		if (reg->host_user_addr) {
+			munmap(reg->mmap_addr, reg->mmap_size);
+			close(reg->fd);
+		}
+	}
+}
+
 const struct vhost_transport_ops af_unix_trans_ops = {
 	.socket_size = sizeof(struct af_unix_socket),
 	.device_size = sizeof(struct vhost_user_connection),
@@ -714,4 +790,6 @@ const struct vhost_transport_ops af_unix_trans_ops = {
 	.send_reply = af_unix_send_reply,
 	.send_slave_req = af_unix_send_slave_req,
 	.set_slave_req_fd = af_unix_set_slave_req_fd,
+	.map_mem_regions = af_unix_map_mem_regions,
+	.unmap_mem_regions = af_unix_unmap_mem_regions,
 };
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ee1b0a1a2..a819684b4 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -52,32 +52,13 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
 };
 
-static uint64_t
-get_blk_size(int fd)
-{
-	struct stat stat;
-	int ret;
-
-	ret = fstat(fd, &stat);
-	return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
-}
-
 static void
 free_mem_region(struct virtio_net *dev)
 {
-	uint32_t i;
-	struct rte_vhost_mem_region *reg;
-
 	if (!dev || !dev->mem)
 		return;
 
-	for (i = 0; i < dev->mem->nregions; i++) {
-		reg = &dev->mem->regions[i];
-		if (reg->host_user_addr) {
-			munmap(reg->mmap_addr, reg->mmap_size);
-			close(reg->fd);
-		}
-	}
+	dev->trans_ops->unmap_mem_regions(dev);
 }
 
 void
@@ -516,9 +497,9 @@ add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
 	page->size = size;
 }
 
-static void
-add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg,
-		uint64_t page_size)
+void
+vhost_add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg,
+		      uint64_t page_size)
 {
 	uint64_t reg_size = reg->size;
 	uint64_t host_user_addr  = reg->host_user_addr;
@@ -602,19 +583,17 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 {
 	struct VhostUserMemory memory = pmsg->payload.memory;
 	struct rte_vhost_mem_region *reg;
-	void *mmap_addr;
-	uint64_t mmap_size;
-	uint64_t mmap_offset;
-	uint64_t alignment;
 	uint32_t i;
-	int fd;
 
 	if (dev->mem && !vhost_memory_changed(&memory, dev->mem)) {
 		RTE_LOG(INFO, VHOST_CONFIG,
 			"(%d) memory regions not changed\n", dev->vid);
 
-		for (i = 0; i < memory.nregions; i++)
-			close(pmsg->fds[i]);
+		for (i = 0; i < memory.nregions; i++) {
+			if (pmsg->fds[i] >= 0) {
+				close(pmsg->fds[i]);
+			}
+		}
 
 		return 0;
 	}
@@ -649,50 +628,24 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	}
 	dev->mem->nregions = memory.nregions;
 
+	/* Fill in dev->mem->regions[] */
 	for (i = 0; i < memory.nregions; i++) {
-		fd  = pmsg->fds[i];
 		reg = &dev->mem->regions[i];
 
 		reg->guest_phys_addr = memory.regions[i].guest_phys_addr;
 		reg->guest_user_addr = memory.regions[i].userspace_addr;
 		reg->size            = memory.regions[i].memory_size;
-		reg->fd              = fd;
+		reg->mmap_size       = reg->size + memory.regions[i].mmap_offset;
+		reg->mmap_addr       = NULL;
+		reg->host_user_addr  = 0;
+		reg->fd              = pmsg->fds[i];
+	}
 
-		mmap_offset = memory.regions[i].mmap_offset;
-		mmap_size   = reg->size + mmap_offset;
+	if (dev->trans_ops->map_mem_regions(dev) < 0)
+		goto err;
 
-		/* mmap() without flag of MAP_ANONYMOUS, should be called
-		 * with length argument aligned with hugepagesz at older
-		 * longterm version Linux, like 2.6.32 and 3.2.72, or
-		 * mmap() will fail with EINVAL.
-		 *
-		 * to avoid failure, make sure in caller to keep length
-		 * aligned.
-		 */
-		alignment = get_blk_size(fd);
-		if (alignment == (uint64_t)-1) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"couldn't get hugepage size through fstat\n");
-			goto err_mmap;
-		}
-		mmap_size = RTE_ALIGN_CEIL(mmap_size, alignment);
-
-		mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
-				 MAP_SHARED | MAP_POPULATE, fd, 0);
-
-		if (mmap_addr == MAP_FAILED) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"mmap region %u failed.\n", i);
-			goto err_mmap;
-		}
-
-		reg->mmap_addr = mmap_addr;
-		reg->mmap_size = mmap_size;
-		reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
-				      mmap_offset;
-
-		if (dev->dequeue_zero_copy)
-			add_guest_pages(dev, reg, alignment);
+	for (i = 0; i < memory.nregions; i++) {
+		reg = &dev->mem->regions[i];
 
 		RTE_LOG(INFO, VHOST_CONFIG,
 			"guest memory region %u, size: 0x%" PRIx64 "\n"
@@ -701,23 +654,21 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 			"\t host  virtual  addr: 0x%" PRIx64 "\n"
 			"\t mmap addr : 0x%" PRIx64 "\n"
 			"\t mmap size : 0x%" PRIx64 "\n"
-			"\t mmap align: 0x%" PRIx64 "\n"
 			"\t mmap off  : 0x%" PRIx64 "\n",
 			i, reg->size,
 			reg->guest_phys_addr,
 			reg->guest_user_addr,
 			reg->host_user_addr,
-			(uint64_t)(uintptr_t)mmap_addr,
-			mmap_size,
-			alignment,
-			mmap_offset);
+			(uint64_t)(uintptr_t)reg->mmap_addr,
+			reg->mmap_size,
+			memory.regions[i].mmap_offset);
 	}
 
 	dump_guest_pages(dev);
 
 	return 0;
 
-err_mmap:
+err:
 	free_mem_region(dev);
 	rte_free(dev->mem);
 	dev->mem = NULL;
-- 
2.14.3


* [dpdk-dev] [RFC 14/24] vhost: move librte_vhost to drivers/
  2018-01-19 13:44 [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
                   ` (12 preceding siblings ...)
  2018-01-19 13:44 ` [dpdk-dev] [RFC 13/24] vhost: move mmap/munmap " Stefan Hajnoczi
@ 2018-01-19 13:44 ` Stefan Hajnoczi
  2018-01-19 13:44 ` [dpdk-dev] [RFC 15/24] vhost: add virtio pci framework Stefan Hajnoczi
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

PCI drivers depend on the rte_bus_pci.h header file, which is only
available when drivers/ is built.  Code in lib/ cannot depend on code in
drivers/ because lib/ is built first during make install.

This patch moves librte_vhost into drivers/ so that later patches can
add a virtio-vhost-pci driver to librte_vhost without exporting all
private vhost.h symbols.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
Is there a better way of giving a PCI driver access to the librte_vhost
transport interface (which is not a public API that apps should use)?
---
 drivers/Makefile                                    | 2 ++
 {lib => drivers}/librte_vhost/Makefile              | 0
 lib/Makefile                                        | 3 ---
 {lib => drivers}/librte_vhost/fd_man.h              | 0
 {lib => drivers}/librte_vhost/iotlb.h               | 0
 {lib => drivers}/librte_vhost/rte_vhost.h           | 0
 {lib => drivers}/librte_vhost/vhost.h               | 0
 {lib => drivers}/librte_vhost/vhost_user.h          | 0
 {lib => drivers}/librte_vhost/fd_man.c              | 0
 {lib => drivers}/librte_vhost/iotlb.c               | 0
 {lib => drivers}/librte_vhost/socket.c              | 0
 {lib => drivers}/librte_vhost/trans_af_unix.c       | 0
 {lib => drivers}/librte_vhost/vhost.c               | 0
 {lib => drivers}/librte_vhost/vhost_user.c          | 0
 {lib => drivers}/librte_vhost/virtio_net.c          | 0
 {lib => drivers}/librte_vhost/rte_vhost_version.map | 0
 16 files changed, 2 insertions(+), 3 deletions(-)
 rename {lib => drivers}/librte_vhost/Makefile (100%)
 rename {lib => drivers}/librte_vhost/fd_man.h (100%)
 rename {lib => drivers}/librte_vhost/iotlb.h (100%)
 rename {lib => drivers}/librte_vhost/rte_vhost.h (100%)
 rename {lib => drivers}/librte_vhost/vhost.h (100%)
 rename {lib => drivers}/librte_vhost/vhost_user.h (100%)
 rename {lib => drivers}/librte_vhost/fd_man.c (100%)
 rename {lib => drivers}/librte_vhost/iotlb.c (100%)
 rename {lib => drivers}/librte_vhost/socket.c (100%)
 rename {lib => drivers}/librte_vhost/trans_af_unix.c (100%)
 rename {lib => drivers}/librte_vhost/vhost.c (100%)
 rename {lib => drivers}/librte_vhost/vhost_user.c (100%)
 rename {lib => drivers}/librte_vhost/virtio_net.c (100%)
 rename {lib => drivers}/librte_vhost/rte_vhost_version.map (100%)

diff --git a/drivers/Makefile b/drivers/Makefile
index 57e1a48a8..67f110c25 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -12,5 +12,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += crypto
 DEPDIRS-crypto := bus mempool
 DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += event
 DEPDIRS-event := bus mempool net
+DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
+DEPDIRS-librte_vhost := bus
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/lib/librte_vhost/Makefile b/drivers/librte_vhost/Makefile
similarity index 100%
rename from lib/librte_vhost/Makefile
rename to drivers/librte_vhost/Makefile
diff --git a/lib/Makefile b/lib/Makefile
index 679912a28..dd916901b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -30,9 +30,6 @@ DEPDIRS-librte_security += librte_ether
 DEPDIRS-librte_security += librte_cryptodev
 DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += librte_eventdev
 DEPDIRS-librte_eventdev := librte_eal librte_ring librte_ether librte_hash
-DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
-DEPDIRS-librte_vhost := librte_eal librte_mempool librte_mbuf librte_ether \
-			librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash
 DEPDIRS-librte_hash := librte_eal librte_ring
 DIRS-$(CONFIG_RTE_LIBRTE_EFD) += librte_efd
diff --git a/lib/librte_vhost/fd_man.h b/drivers/librte_vhost/fd_man.h
similarity index 100%
rename from lib/librte_vhost/fd_man.h
rename to drivers/librte_vhost/fd_man.h
diff --git a/lib/librte_vhost/iotlb.h b/drivers/librte_vhost/iotlb.h
similarity index 100%
rename from lib/librte_vhost/iotlb.h
rename to drivers/librte_vhost/iotlb.h
diff --git a/lib/librte_vhost/rte_vhost.h b/drivers/librte_vhost/rte_vhost.h
similarity index 100%
rename from lib/librte_vhost/rte_vhost.h
rename to drivers/librte_vhost/rte_vhost.h
diff --git a/lib/librte_vhost/vhost.h b/drivers/librte_vhost/vhost.h
similarity index 100%
rename from lib/librte_vhost/vhost.h
rename to drivers/librte_vhost/vhost.h
diff --git a/lib/librte_vhost/vhost_user.h b/drivers/librte_vhost/vhost_user.h
similarity index 100%
rename from lib/librte_vhost/vhost_user.h
rename to drivers/librte_vhost/vhost_user.h
diff --git a/lib/librte_vhost/fd_man.c b/drivers/librte_vhost/fd_man.c
similarity index 100%
rename from lib/librte_vhost/fd_man.c
rename to drivers/librte_vhost/fd_man.c
diff --git a/lib/librte_vhost/iotlb.c b/drivers/librte_vhost/iotlb.c
similarity index 100%
rename from lib/librte_vhost/iotlb.c
rename to drivers/librte_vhost/iotlb.c
diff --git a/lib/librte_vhost/socket.c b/drivers/librte_vhost/socket.c
similarity index 100%
rename from lib/librte_vhost/socket.c
rename to drivers/librte_vhost/socket.c
diff --git a/lib/librte_vhost/trans_af_unix.c b/drivers/librte_vhost/trans_af_unix.c
similarity index 100%
rename from lib/librte_vhost/trans_af_unix.c
rename to drivers/librte_vhost/trans_af_unix.c
diff --git a/lib/librte_vhost/vhost.c b/drivers/librte_vhost/vhost.c
similarity index 100%
rename from lib/librte_vhost/vhost.c
rename to drivers/librte_vhost/vhost.c
diff --git a/lib/librte_vhost/vhost_user.c b/drivers/librte_vhost/vhost_user.c
similarity index 100%
rename from lib/librte_vhost/vhost_user.c
rename to drivers/librte_vhost/vhost_user.c
diff --git a/lib/librte_vhost/virtio_net.c b/drivers/librte_vhost/virtio_net.c
similarity index 100%
rename from lib/librte_vhost/virtio_net.c
rename to drivers/librte_vhost/virtio_net.c
diff --git a/lib/librte_vhost/rte_vhost_version.map b/drivers/librte_vhost/rte_vhost_version.map
similarity index 100%
rename from lib/librte_vhost/rte_vhost_version.map
rename to drivers/librte_vhost/rte_vhost_version.map
-- 
2.14.3


* [dpdk-dev] [RFC 15/24] vhost: add virtio pci framework
  2018-01-19 13:44 [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
                   ` (13 preceding siblings ...)
  2018-01-19 13:44 ` [dpdk-dev] [RFC 14/24] vhost: move librte_vhost to drivers/ Stefan Hajnoczi
@ 2018-01-19 13:44 ` Stefan Hajnoczi
  2018-01-19 13:44 ` [dpdk-dev] [RFC 16/24] vhost: remember a vhost_virtqueue's queue index Stefan Hajnoczi
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The virtio-vhost-user transport will involve a virtio PCI device driver.
There is currently no librte_virtio API that we can reuse.

This commit is a hack that duplicates the virtio PCI code from
drivers/net/.  A better solution would be to extract the code cleanly
from drivers/net/ and share it, or perhaps to backport SPDK's
lib/virtio.  I don't have time to do either right now, so I've just
copied the code, removed the virtio-net and ethdev parts, and renamed
symbols to avoid link errors.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/librte_vhost/Makefile     |   4 +-
 drivers/librte_vhost/virtio_pci.h | 267 ++++++++++++++++++++
 drivers/librte_vhost/virtqueue.h  | 181 ++++++++++++++
 drivers/librte_vhost/virtio_pci.c | 504 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 955 insertions(+), 1 deletion(-)
 create mode 100644 drivers/librte_vhost/virtio_pci.h
 create mode 100644 drivers/librte_vhost/virtqueue.h
 create mode 100644 drivers/librte_vhost/virtio_pci.c

diff --git a/drivers/librte_vhost/Makefile b/drivers/librte_vhost/Makefile
index ccbbce3af..8a56c32af 100644
--- a/drivers/librte_vhost/Makefile
+++ b/drivers/librte_vhost/Makefile
@@ -21,7 +21,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
-					vhost_user.c virtio_net.c trans_af_unix.c
+					vhost_user.c virtio_net.c \
+					trans_af_unix.c \
+					virtio_pci.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
diff --git a/drivers/librte_vhost/virtio_pci.h b/drivers/librte_vhost/virtio_pci.h
new file mode 100644
index 000000000..7afc24853
--- /dev/null
+++ b/drivers/librte_vhost/virtio_pci.h
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+/* XXX This file is based on drivers/net/virtio/virtio_pci.h.  It would be
+ * better to create a shared rte_virtio library instead of duplicating this
+ * code.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <stdint.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_spinlock.h>
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_VIRTIO_PCI_CONFIG RTE_LOGTYPE_USER2
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_PCI_VENDORID     0x1AF4
+#define VIRTIO_PCI_LEGACY_DEVICEID_VHOST_USER 0x1017
+#define VIRTIO_PCI_MODERN_DEVICEID_VHOST_USER 0x1058
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR		  19 /* interrupt status register, reading
+				      * also clears the register (8, RO) */
+/* Only if MSIX is enabled: */
+#define VIRTIO_MSI_CONFIG_VECTOR  20 /* configuration change vector (16, RW) */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22 /* vector for selected VQ notifications
+				      (16, RW) */
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* VirtIO device IDs. */
+#define VIRTIO_ID_VHOST_USER  0x18
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them? */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/*
+ * Some VirtIO feature bits (currently bits 28 through 31) are
+ * reserved for the transport being used (eg. virtio_ring), the
+ * rest are per-device feature bits.
+ */
+#define VIRTIO_TRANSPORT_F_START 28
+#define VIRTIO_TRANSPORT_F_END   34
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field. */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field. */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;		/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;		/* Generic PCI field: next ptr. */
+	uint8_t cap_len;		/* Generic PCI field: capability length */
+	uint8_t cfg_type;		/* Identifies the structure. */
+	uint8_t bar;			/* Where to find it. */
+	uint8_t padding[3];		/* Pad to full dword. */
+	uint32_t offset;		/* Offset within bar. */
+	uint32_t length;		/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_hw *hw);
+	void    (*set_status)(struct virtio_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_hw *hw);
+	void     (*set_features)(struct virtio_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_hw *hw, struct virtqueue *vq,
+			uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_hw *hw, uint16_t queue_id);
+	int (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_hw {
+	uint64_t    guest_features;
+	uint32_t    max_queue_pairs;
+	uint16_t    started;
+	uint8_t	    use_msix;
+	uint16_t    internal_id;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	void	    *dev_cfg;
+	/*
+	 * Both the app management thread and the virtio interrupt handler
+	 * thread can change the device state; this lock serializes those
+	 * updates.
+	 */
+	rte_spinlock_t state_lock;
+
+	struct virtqueue **vqs;
+};
+
+/*
+ * While virtio_hw is stored in shared memory, this structure stores
+ * information that may differ between processes in the multi-process
+ * model, such as the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+};
+
+#define VTPCI_OPS(hw)	(virtio_pci_hw_internal[(hw)->internal_id].vtpci_ops)
+
+extern struct virtio_hw_internal virtio_pci_hw_internal[8];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+virtio_pci_with_feature(struct virtio_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declarations from virtio_pci.c
+ */
+int virtio_pci_init(struct rte_pci_device *dev, struct virtio_hw *hw);
+void virtio_pci_reset(struct virtio_hw *);
+
+void virtio_pci_reinit_complete(struct virtio_hw *);
+
+uint8_t virtio_pci_get_status(struct virtio_hw *);
+void virtio_pci_set_status(struct virtio_hw *, uint8_t);
+
+uint64_t virtio_pci_negotiate_features(struct virtio_hw *, uint64_t);
+
+void virtio_pci_write_dev_config(struct virtio_hw *, size_t, const void *, int);
+
+void virtio_pci_read_dev_config(struct virtio_hw *, size_t, void *, int);
+
+uint8_t virtio_pci_isr(struct virtio_hw *);
+
+enum virtio_msix_status virtio_pci_msix_detect(struct rte_pci_device *dev);
+
+extern const struct virtio_pci_ops virtio_pci_modern_ops;
+
+#endif /* _VIRTIO_PCI_H_ */
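The header above defines 64-bit feature bit numbers (VIRTIO_F_VERSION_1 is bit 32) and virtio_pci_with_feature(), while virtio_pci.c negotiates features by intersecting the device and driver masks and writing the result through the 32-bit guest_feature_select/guest_feature register pair. The following is a minimal standalone sketch of that logic, assuming a hypothetical in-memory struct fake_common_cfg in place of the real MMIO registers and rte_write32(); it is illustrative only and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers, copied from virtio_pci.h above. */
#define VIRTIO_RING_F_INDIRECT_DESC 28
#define VIRTIO_RING_F_EVENT_IDX     29
#define VIRTIO_F_VERSION_1          32
#define VIRTIO_F_IOMMU_PLATFORM     33

/* Hypothetical stand-in for the guest_feature_select/guest_feature
 * pair in struct virtio_pci_common_cfg: the driver selects a 32-bit
 * window (0 = bits 0-31, 1 = bits 32-63) and then writes it. */
struct fake_common_cfg {
	uint32_t guest_feature_select;
	uint32_t guest_feature_window[2];
};

static void
fake_write_guest_feature(struct fake_common_cfg *cfg, uint32_t val)
{
	assert(cfg->guest_feature_select < 2);
	cfg->guest_feature_window[cfg->guest_feature_select] = val;
}

/* Same test as virtio_pci_with_feature(); 1ULL is required because
 * bits 32 and 33 cannot be reached by shifting a 32-bit int. */
static int
with_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}

/* Mirrors virtio_pci_negotiate_features() + modern_set_features():
 * keep only the bits both sides support, then write two halves. */
static uint64_t
negotiate(struct fake_common_cfg *cfg, uint64_t device_features,
	  uint64_t driver_features)
{
	uint64_t features = device_features & driver_features;

	cfg->guest_feature_select = 0;
	fake_write_guest_feature(cfg, (uint32_t)(features & 0xffffffffULL));
	cfg->guest_feature_select = 1;
	fake_write_guest_feature(cfg, (uint32_t)(features >> 32));
	return features;
}
```

For example, a device offering VERSION_1, EVENT_IDX and INDIRECT_DESC to a driver that asks for VERSION_1 and IOMMU_PLATFORM ends up with only bit 32 set, so the low window is written as 0 and the high window as 1.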
diff --git a/drivers/librte_vhost/virtqueue.h b/drivers/librte_vhost/virtqueue.h
new file mode 100644
index 000000000..e2ac78eef
--- /dev/null
+++ b/drivers/librte_vhost/virtqueue.h
@@ -0,0 +1,181 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+/* XXX This file is based on drivers/net/virtio/virtqueue.h.  It would be
+ * better to create a shared rte_virtio library instead of duplicating this
+ * code.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <stdint.h>
+#include <linux/virtio_ring.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	struct virtio_hw  *hw; /**< virtio_hw structure pointer. */
+	struct vring vq_ring;  /**< vring keeping desc, used and avail */
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_nentries;  /**< vring desc numbers */
+	uint16_t vq_free_cnt;  /**< num of desc available */
+	uint16_t vq_avail_idx; /**< sync until needed */
+	uint16_t vq_free_thresh; /**< free threshold */
+
+	void *vq_ring_virt_mem;  /**< linear address of vring*/
+	unsigned int vq_ring_size;
+
+	rte_iova_t vq_ring_mem; /**< physical address of vring */
+
+	const struct rte_memzone *mz; /**< memzone backing vring */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	uint16_t  vq_queue_index;   /**< PCI queue index */
+	uint16_t  *notify_addr;
+	struct vq_desc_extra vq_descx[0];
+};
+
+/* Chain all the descriptors in the ring with an END */
+static inline void
+vring_desc_init(struct vring_desc *dp, uint16_t n)
+{
+	uint16_t i;
+
+	for (i = 0; i < n - 1; i++)
+		dp[i].next = (uint16_t)(i + 1);
+	dp[i].next = VQ_RING_DESC_CHAIN_END;
+}
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+static inline void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+/**
+ * Tell the backend to interrupt us.
+ */
+static inline void
+virtqueue_enable_intr(struct virtqueue *vq)
+{
+	vq->vq_ring.avail->flags &= (~VRING_AVAIL_F_NO_INTERRUPT);
+}
+
+/**
+ * Dump virtqueue internal structures, for debugging purposes only.
+ */
+void virtqueue_dump(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notification is an I/O port operation,
+	 * which is itself a serializing instruction.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+#ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, \
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x\n", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+#else
+#define VIRTQUEUE_DUMP(vq) do { } while (0)
+#endif
+
+#endif /* _VIRTQUEUE_H_ */
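virtqueue.h keeps free descriptors on a singly linked chain terminated by VQ_RING_DESC_CHAIN_END, and VIRTQUEUE_NUSED() relies on 16-bit wraparound of free-running ring indices. This standalone sketch reproduces both pieces of arithmetic using a hypothetical struct demo_desc with the same layout as struct vring_desc, so it compiles without <linux/virtio_ring.h>.

```c
#include <assert.h>
#include <stdint.h>

#define VQ_RING_DESC_CHAIN_END 32768 /* copied from virtqueue.h */

/* Hypothetical stand-in for struct vring_desc (16 bytes); only
 * 'next' matters for the free chain. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Same logic as vring_desc_init(): link each descriptor to its
 * successor and terminate the list with VQ_RING_DESC_CHAIN_END. */
static void
demo_desc_init(struct demo_desc *dp, uint16_t n)
{
	uint16_t i;

	/* With n == 0 the loop is skipped and dp[0] would be written
	 * past the end of an empty table. */
	assert(n > 0);
	for (i = 0; i < n - 1; i++)
		dp[i].next = (uint16_t)(i + 1);
	dp[i].next = VQ_RING_DESC_CHAIN_END;
}

/* Walk the free chain from 'head' and count its descriptors. */
static unsigned int
demo_chain_len(const struct demo_desc *dp, uint16_t head)
{
	unsigned int len = 0;
	uint16_t idx;

	for (idx = head; idx != VQ_RING_DESC_CHAIN_END; idx = dp[idx].next)
		len++;
	return len;
}

/* Same arithmetic as VIRTQUEUE_NUSED(): both indices are free-running
 * 16-bit counters, so the cast makes the subtraction wrap correctly. */
static uint16_t
demo_nused(uint16_t used_idx, uint16_t used_cons_idx)
{
	return (uint16_t)(used_idx - used_cons_idx);
}
```

If used->idx has wrapped to 5 while vq_used_cons_idx is still 65533, the cast yields 8, the number of entries the device has returned but the driver has not yet consumed.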
diff --git a/drivers/librte_vhost/virtio_pci.c b/drivers/librte_vhost/virtio_pci.c
new file mode 100644
index 000000000..f1a23bbbf
--- /dev/null
+++ b/drivers/librte_vhost/virtio_pci.c
@@ -0,0 +1,504 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+#include <stdint.h>
+
+/* XXX This file is based on drivers/net/virtio/virtio_pci.c.  It would be
+ * better to create a shared rte_virtio library instead of duplicating this
+ * code.
+ */
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* The virtio PCI VIRTIO_PCI_QUEUE_PFN register is 32 bits wide
+	 * and only accepts a 32-bit page frame number, so check that
+	 * the allocated vring memory does not extend above 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "vring address shouldn't be above 16TB!\n");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_hw *hw, struct virtqueue *vq, uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "queue %u addresses:\n", vq->vq_queue_index);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t desc_addr: %" PRIx64 "\n", desc_addr);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t avail_addr: %" PRIx64 "\n", avail_addr);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t used_addr: %" PRIx64 "\n", used_addr);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "\t notify addr: %p (notify offset: %u)\n",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_pci_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+
+void
+virtio_pci_read_dev_config(struct virtio_hw *hw, size_t offset,
+		      void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+virtio_pci_write_dev_config(struct virtio_hw *hw, size_t offset,
+		       const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+virtio_pci_negotiate_features(struct virtio_hw *hw, uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+virtio_pci_reset(struct virtio_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+virtio_pci_reinit_complete(struct virtio_hw *hw)
+{
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+virtio_pci_set_status(struct virtio_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+virtio_pci_get_status(struct virtio_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+virtio_pci_isr(struct virtio_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "invalid bar: %u\n", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "offset(%u) + length(%u) overflows\n",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG,
+			"invalid cap: overflows bar space: %u > %" PRIu64 "\n",
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		RTE_LOG(ERR, VIRTIO_PCI_CONFIG, "bar %u base addr is NULL\n", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to map pci device!\n");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to read pci capability list\n");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			RTE_LOG(ERR, VIRTIO_PCI_CONFIG,
+				"failed to read pci cap at pos: %x\n", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG,
+				"[%2x] skipping non VNDR cap id: %02x\n",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG,
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u\n",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "no modern virtio pci device found.\n");
+		return -1;
+	}
+
+	RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "found modern virtio pci device.\n");
+
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "common cfg mapped at: %p\n", hw->common_cfg);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "device cfg mapped at: %p\n", hw->dev_cfg);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "isr cfg mapped at: %p\n", hw->isr);
+	RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "notify base: %p, notify off multiplier: %u\n",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+struct virtio_hw_internal virtio_pci_hw_internal[8];
+
+/*
+ * Return -1:
+ *   if too many virtio pci devices have already been initialized.
+ *   if there is an error mapping the device with VFIO/UIO.
+ *   if the device has no modern virtio pci capabilities
+ *   (legacy virtio pci devices are not supported).
+ * Return 0 on success.
+ */
+int
+virtio_pci_init(struct rte_pci_device *dev, struct virtio_hw *hw)
+{
+	static size_t internal_id;
+
+	if (internal_id >=
+	    sizeof(virtio_pci_hw_internal) / sizeof(*virtio_pci_hw_internal)) {
+		RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "too many virtio pci devices.\n");
+		return -1;
+	}
+
+	/*
+	 * Try to read the virtio pci capabilities, which exist only
+	 * on modern pci devices.
+	 */
+	if (virtio_read_caps(dev, hw) != 0) {
+		RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "legacy virtio pci is not supported.\n");
+		return -1;
+	}
+
+	RTE_LOG(INFO, VIRTIO_PCI_CONFIG, "modern virtio pci detected.\n");
+	hw->internal_id = internal_id++;
+	virtio_pci_hw_internal[hw->internal_id].vtpci_ops =
+		&virtio_pci_modern_ops;
+	return 0;
+}
+
+enum virtio_msix_status
+virtio_pci_msix_detect(struct rte_pci_device *dev)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, VIRTIO_PCI_CONFIG, "failed to read pci capability list\n");
+		return VIRTIO_MSIX_NONE;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			RTE_LOG(ERR, VIRTIO_PCI_CONFIG,
+				"failed to read pci cap at pos: %x\n", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				return VIRTIO_MSIX_ENABLED;
+			else
+				return VIRTIO_MSIX_DISABLED;
+		}
+
+		pos = cap.cap_next;
+	}
+
+	return VIRTIO_MSIX_NONE;
+}
-- 
2.14.3
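modern_setup_queue() derives the three split-ring addresses from one contiguous allocation and computes a per-queue notify (doorbell) address from queue_notify_off and notify_off_multiplier. The sketch below redoes that arithmetic on plain integers, assuming the 16-byte struct vring_desc noted in virtio_pci.h and a 4-byte avail ring header (flags, idx); the helper names are hypothetical and no device access is performed.

```c
#include <assert.h>
#include <stdint.h>

#define VRING_ALIGN 4096 /* VIRTIO_PCI_VRING_ALIGN in virtio_pci.h */

/* Round x up to a multiple of the power-of-two 'align', like
 * RTE_ALIGN_CEIL(). */
static uint64_t
align_ceil(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

struct ring_layout {
	uint64_t desc;
	uint64_t avail;
	uint64_t used;
};

/* Split vring layout as in modern_setup_queue(): the descriptor table
 * (16 bytes per entry), then the avail ring (flags, idx, one uint16_t
 * per entry), then the used ring at the next 4096-byte boundary. */
static struct ring_layout
split_ring_layout(uint64_t ring_mem, uint16_t num)
{
	struct ring_layout l;

	assert(num != 0 && (num & (num - 1)) == 0); /* power of 2 */
	l.desc = ring_mem;
	l.avail = l.desc + (uint64_t)num * 16;
	l.used = align_ceil(l.avail + 4 + 2 * (uint64_t)num, VRING_ALIGN);
	return l;
}

/* Per-queue doorbell, as in modern_setup_queue(): queue_notify_off is
 * scaled by notify_off_multiplier (bytes) and added to notify_base. */
static uintptr_t
notify_addr(uintptr_t notify_base, uint16_t queue_notify_off,
	    uint32_t notify_off_multiplier)
{
	return notify_base +
	       (uintptr_t)queue_notify_off * notify_off_multiplier;
}
```

For a 256-entry queue at 0x100000, the descriptor table fills exactly one 4096-byte page, the avail ring starts at 0x101000 and ends at 0x101204, so the used ring is pushed to the next page at 0x102000.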

* [dpdk-dev] [RFC 16/24] vhost: remember a vhost_virtqueue's queue index
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

Currently the only way of determining a struct vhost_virtqueue's index
is to search struct virtio_net->virtqueue[] for its address.  Stash the
index in struct vhost_virtqueue so we won't have to search the array.

This new field will be used by virtio-vhost-user.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/librte_vhost/vhost.h | 1 +
 drivers/librte_vhost/vhost.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/librte_vhost/vhost.h b/drivers/librte_vhost/vhost.h
index a50b802e7..08ad874ef 100644
--- a/drivers/librte_vhost/vhost.h
+++ b/drivers/librte_vhost/vhost.h
@@ -72,6 +72,7 @@ struct vhost_virtqueue {
 	struct vring_avail	*avail;
 	struct vring_used	*used;
 	uint32_t		size;
+	uint32_t		vring_idx;
 
 	uint16_t		last_avail_idx;
 	uint16_t		last_used_idx;
diff --git a/drivers/librte_vhost/vhost.c b/drivers/librte_vhost/vhost.c
index 0d95a4b3a..886444683 100644
--- a/drivers/librte_vhost/vhost.c
+++ b/drivers/librte_vhost/vhost.c
@@ -191,6 +191,8 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 
 	memset(vq, 0, sizeof(struct vhost_virtqueue));
 
+	vq->vring_idx = vring_idx;
+
 	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
 
-- 
2.14.3
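The motivation for the new vring_idx field can be illustrated with a cut-down model: without it, the index has to be recovered by scanning virtio_net->virtqueue[] for the pointer; with it, the index is recorded once when the queue is initialized. The struct and function names below are hypothetical simplifications, not librte_vhost's real definitions, and DEMO_MAX_VRING is an arbitrary limit chosen for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_VRING 8 /* hypothetical limit for this sketch */

/* Hypothetical, cut-down versions of struct vhost_virtqueue and
 * struct virtio_net with only the fields relevant here. */
struct demo_vq {
	uint32_t vring_idx; /* the field added by this patch */
};

struct demo_dev {
	struct demo_vq *virtqueue[DEMO_MAX_VRING];
};

/* Before the patch: recover the index by scanning the array for the
 * pointer; O(number of queues), and -1 if vq is not registered. */
static int
vq_index_by_search(const struct demo_dev *dev, const struct demo_vq *vq)
{
	int i;

	for (i = 0; i < DEMO_MAX_VRING; i++)
		if (dev->virtqueue[i] == vq)
			return i;
	return -1;
}

/* After the patch: the index is stored when the queue is set up,
 * as the patched init_vring_queue() does, so lookups are O(1). */
static void
demo_init_vring_queue(struct demo_dev *dev, struct demo_vq *vq,
		      uint32_t vring_idx)
{
	assert(vring_idx < DEMO_MAX_VRING);
	vq->vring_idx = vring_idx;
	dev->virtqueue[vring_idx] = vq;
}
```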

* [dpdk-dev] [RFC 17/24] vhost: add virtio-vhost-user transport
  2018-01-19 13:44 [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
                   ` (15 preceding siblings ...)
  2018-01-19 13:44 ` [dpdk-dev] [RFC 16/24] vhost: remember a vhost_virtqueue's queue index Stefan Hajnoczi
@ 2018-01-19 13:44 ` Stefan Hajnoczi
  2018-01-19 13:44 ` [dpdk-dev] [RFC 18/24] vhost: add RTE_VHOST_USER_VIRTIO_TRANSPORT flag Stefan Hajnoczi
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

This patch adds a new transport to librte_vhost for the
virtio-vhost-user device.  This device replaces the AF_UNIX socket used
by traditional vhost-user with a virtio device that tunnels vhost-user
protocol messages.  This allows a guest to act as a vhost device backend
for other guests.

The intended use case is for running DPDK inside a guest.  Other guests
can communicate via DPDK's "vhost" vdev driver.

For more information on virtio-vhost-user, see
https://wiki.qemu.org/Features/VirtioVhostUser.
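The tunnelled messages keep the vhost-user framing: a 12-byte header (request, flags, payload size) followed by `size` payload bytes, carried in one fixed-size virtqueue buffer. The sketch below illustrates that framing and the length check that `vvu_process_rxq()` applies to received buffers. The `demo_*` names are hypothetical, the 1024-byte buffer mirrors `VVU_RXBUF_SIZE`/`VVU_TXBUF_SIZE`, and fields stay in host byte order (the patch itself still has a TODO for little-endian conversion).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* vhost-user message header: request, flags, payload size (all u32) */
struct demo_msg_hdr {
	uint32_t request;
	uint32_t flags;
	uint32_t size; /* payload bytes that follow the header */
};

#define DEMO_HDR_SIZE sizeof(struct demo_msg_hdr) /* 12 bytes on common ABIs */
#define DEMO_BUF_SIZE 1024 /* mirrors VVU_RXBUF_SIZE / VVU_TXBUF_SIZE */

/* Sender side: copy header + payload into a fixed-size buffer.
 * Returns the descriptor length, or 0 if the message does not fit. */
static size_t
demo_frame(uint8_t buf[DEMO_BUF_SIZE], const struct demo_msg_hdr *hdr,
	   const void *payload)
{
	if (hdr->size > DEMO_BUF_SIZE - DEMO_HDR_SIZE)
		return 0;
	memcpy(buf, hdr, DEMO_HDR_SIZE);
	memcpy(buf + DEMO_HDR_SIZE, payload, hdr->size);
	return DEMO_HDR_SIZE + hdr->size;
}

/* Receiver side: the used-ring length must equal header + payload size */
static bool
demo_validate(const uint8_t *buf, uint32_t used_len)
{
	struct demo_msg_hdr hdr;

	if (used_len < DEMO_HDR_SIZE)
		return false;
	memcpy(&hdr, buf, DEMO_HDR_SIZE);
	return hdr.size <= DEMO_BUF_SIZE - DEMO_HDR_SIZE &&
	       used_len == DEMO_HDR_SIZE + hdr.size;
}
```

Unlike `vvu_send_reply()` in this RFC, which copies `VHOST_USER_HDR_SIZE + reply->size` bytes without comparing against `VVU_TXBUF_SIZE`, the sketch refuses a message that would overflow its buffer.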

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/librte_vhost/Makefile                  |    1 +
 drivers/librte_vhost/vhost.h                   |    3 +
 drivers/librte_vhost/virtio_vhost_user.h       |   18 +
 drivers/librte_vhost/trans_virtio_vhost_user.c | 1050 ++++++++++++++++++++++++
 4 files changed, 1072 insertions(+)
 create mode 100644 drivers/librte_vhost/virtio_vhost_user.h
 create mode 100644 drivers/librte_vhost/trans_virtio_vhost_user.c

diff --git a/drivers/librte_vhost/Makefile b/drivers/librte_vhost/Makefile
index 8a56c32af..c8cce6fac 100644
--- a/drivers/librte_vhost/Makefile
+++ b/drivers/librte_vhost/Makefile
@@ -23,6 +23,7 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
 					vhost_user.c virtio_net.c \
 					trans_af_unix.c \
+					trans_virtio_vhost_user.c \
 					virtio_pci.c
 
 # install includes
diff --git a/drivers/librte_vhost/vhost.h b/drivers/librte_vhost/vhost.h
index 08ad874ef..1d4d1d139 100644
--- a/drivers/librte_vhost/vhost.h
+++ b/drivers/librte_vhost/vhost.h
@@ -338,6 +338,9 @@ struct vhost_transport_ops {
 /** The traditional AF_UNIX vhost-user protocol transport. */
 extern const struct vhost_transport_ops af_unix_trans_ops;
 
+/** The virtio-vhost-user PCI vhost-user protocol transport. */
+extern const struct vhost_transport_ops virtio_vhost_user_trans_ops;
+
 /**
  * Device structure contains all configuration information relating
  * to the device.
diff --git a/drivers/librte_vhost/virtio_vhost_user.h b/drivers/librte_vhost/virtio_vhost_user.h
new file mode 100644
index 000000000..baeaa7494
--- /dev/null
+++ b/drivers/librte_vhost/virtio_vhost_user.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C) 2018 Red Hat, Inc.
+ */
+
+#ifndef _LINUX_VIRTIO_VHOST_USER_H
+#define _LINUX_VIRTIO_VHOST_USER_H
+
+#include <stdint.h>
+
+struct virtio_vhost_user_config {
+    uint32_t status;
+#define VIRTIO_VHOST_USER_STATUS_SLAVE_UP 0
+#define VIRTIO_VHOST_USER_STATUS_MASTER_UP 1
+    uint32_t max_vhost_queues;
+    uint8_t uuid[16];
+};
+
+#endif /* _LINUX_VIRTIO_VHOST_USER_H */
diff --git a/drivers/librte_vhost/trans_virtio_vhost_user.c b/drivers/librte_vhost/trans_virtio_vhost_user.c
new file mode 100644
index 000000000..df654eb71
--- /dev/null
+++ b/drivers/librte_vhost/trans_virtio_vhost_user.c
@@ -0,0 +1,1050 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C) 2018 Red Hat, Inc.
+ */
+
+/*
+ * @file
+ * virtio-vhost-user PCI transport driver
+ *
+ * This vhost-user transport communicates with the vhost-user master process
+ * over the virtio-vhost-user PCI device.
+ *
+ * Interrupts are used since this is the control path, not the data path.  This
+ * way the vhost-user command processing doesn't interfere with packet
+ * processing.  This is similar to the AF_UNIX transport's fdman thread that
+ * processes socket I/O separately.
+ *
+ * This transport replaces the usual vhost-user file descriptor passing with a
+ * PCI BAR that contains doorbell registers for callfd and logfd, and shared
+ * memory for the memory table regions.
+ *
+ * VIRTIO device specification:
+ * https://stefanha.github.io/virtio/vhost-user-slave.html#x1-2830007
+ */
+
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_bus_pci.h>
+#include <rte_io.h>
+
+#include "vhost.h"
+#include "virtio_pci.h"
+#include "virtqueue.h"
+#include "virtio_vhost_user.h"
+#include "vhost_user.h"
+
+/*
+ * Data structures:
+ *
+ * Successfully probed virtio-vhost-user PCI adapters are added to
+ * vvu_pci_device_list as struct vvu_pci_device elements.
+ *
+ * When rte_vhost_driver_register() is called, a struct vvu_socket is created
+ * as the endpoint for future vhost-user connections.  The struct vvu_socket is
+ * associated with the struct vvu_pci_device that will be used for
+ * communication.
+ *
+ * When a vhost-user protocol connection is established, a struct
+ * vvu_connection is created and the application's new_device(int vid) callback
+ * is invoked.
+ */
+
+/** Probed PCI devices for lookup by rte_vhost_driver_register() */
+TAILQ_HEAD(, vvu_pci_device) vvu_pci_device_list =
+	TAILQ_HEAD_INITIALIZER(vvu_pci_device_list);
+
+struct vvu_socket;
+struct vvu_connection;
+
+/** A virtio-vhost-user PCI adapter */
+struct vvu_pci_device {
+	struct virtio_hw hw;
+	struct rte_pci_device *pci_dev;
+	struct vvu_socket *s;
+	TAILQ_ENTRY(vvu_pci_device) next;
+};
+
+/** A vhost-user endpoint (aka per-path state) */
+struct vvu_socket {
+	struct vhost_user_socket socket; /* must be first field! */
+	struct vvu_pci_device *pdev;
+	struct vvu_connection *conn;
+
+	/** Doorbell registers */
+	uint16_t *doorbells;
+
+	/** This struct virtio_vhost_user_config field determines the number of
+	 * doorbells available so we keep it saved.
+	 */
+	uint32_t max_vhost_queues;
+
+	/** Receive buffers */
+	const struct rte_memzone *rxbuf_mz;
+
+	/** Transmit buffers.  It is assumed that the device completes them
+	 * in-order so a single wrapping index can be used to select the next
+	 * free buffer.
+	 */
+	const struct rte_memzone *txbuf_mz;
+	unsigned int txbuf_idx;
+};
+
+/** A vhost-user protocol session (aka per-vid state) */
+struct vvu_connection {
+	struct virtio_net device; /* must be first field! */
+	struct vvu_socket *s;
+};
+
+/** Virtio feature bits that we support */
+#define VVU_VIRTIO_FEATURES ((1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
+			     (1ULL << VIRTIO_F_ANY_LAYOUT) | \
+			     (1ULL << VIRTIO_F_VERSION_1) | \
+			     (1ULL << VIRTIO_F_IOMMU_PLATFORM))
+
+/** Virtqueue indices */
+enum {
+	VVU_VQ_RX,
+	VVU_VQ_TX,
+	VVU_VQ_MAX,
+};
+
+enum {
+	/** Receive buffer size, in bytes */
+	VVU_RXBUF_SIZE = 1024,
+
+	/** Transmit buffer size, in bytes */
+	VVU_TXBUF_SIZE = 1024,
+};
+
+/** Look up a struct vvu_pci_device from a DomBDF string */
+static struct vvu_pci_device *
+vvu_pci_by_name(const char *name)
+{
+	struct vvu_pci_device *pdev;
+
+	TAILQ_FOREACH(pdev, &vvu_pci_device_list, next) {
+		if (!strcmp(pdev->pci_dev->device.name, name))
+			return pdev;
+	}
+	return NULL;
+}
+
+/** Start connection establishment */
+static void
+vvu_connect(struct vvu_socket *s)
+{
+	struct virtio_hw *hw = &s->pdev->hw;
+	uint32_t status;
+
+	virtio_pci_read_dev_config(hw,
+			offsetof(struct virtio_vhost_user_config, status),
+			&status, sizeof(status));
+	status |= RTE_LE32(1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
+	virtio_pci_write_dev_config(hw,
+			offsetof(struct virtio_vhost_user_config, status),
+			&status, sizeof(status));
+}
+
+static void
+vvu_disconnect(struct vvu_socket *s)
+{
+	struct vhost_user_socket *vsocket = &s->socket;
+	struct vvu_connection *conn = s->conn;
+	struct virtio_hw *hw = &s->pdev->hw;
+	uint32_t status;
+
+	if (conn) {
+		if (vsocket->notify_ops->destroy_connection)
+			vsocket->notify_ops->destroy_connection(conn->device.vid);
+
+		vhost_destroy_device(conn->device.vid);
+	}
+
+	/* Make sure we're disconnected */
+	virtio_pci_read_dev_config(hw,
+			offsetof(struct virtio_vhost_user_config, status),
+			&status, sizeof(status));
+	status &= ~RTE_LE32(1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
+	virtio_pci_write_dev_config(hw,
+			offsetof(struct virtio_vhost_user_config, status),
+			&status, sizeof(status));
+}
+
+static void
+vvu_reconnect(struct vvu_socket *s)
+{
+	vvu_disconnect(s);
+	vvu_connect(s);
+}
+
+static void vvu_process_rxq(struct vvu_socket *s);
+
+static void
+vvu_cleanup_device(struct virtio_net *dev, int destroy __rte_unused)
+{
+	struct vvu_connection *conn =
+		container_of(dev, struct vvu_connection, device);
+	struct vvu_socket *s = conn->s;
+
+	s->conn = NULL;
+	vvu_process_rxq(s); /* discard old replies from master */
+	vvu_reconnect(s);
+}
+
+static int
+vvu_vring_call(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	struct vvu_connection *conn =
+		container_of(dev, struct vvu_connection, device);
+	struct vvu_socket *s = conn->s;
+	uint16_t vq_idx = vq->vring_idx;
+
+	RTE_LOG(DEBUG, VHOST_CONFIG, "%s vq_idx %u\n", __func__, vq_idx);
+
+	rte_write16(rte_cpu_to_le_16(vq_idx), &s->doorbells[vq_idx]);
+	return 0;
+}
+
+static int
+vvu_send_reply(struct virtio_net *dev, struct VhostUserMsg *reply)
+{
+	struct vvu_connection *conn =
+		container_of(dev, struct vvu_connection, device);
+	struct vvu_socket *s = conn->s;
+	struct virtqueue *vq = s->pdev->hw.vqs[VVU_VQ_TX];
+	struct vring_desc *desc;
+	struct vq_desc_extra *descx;
+	unsigned int i;
+	void *buf;
+	size_t len;
+
+	RTE_LOG(DEBUG, VHOST_CONFIG,
+		"%s request %u flags %#x size %u\n",
+		__func__, reply->request.master,
+		reply->flags, reply->size);
+
+	/* TODO convert reply to little-endian */
+
+	if (virtqueue_full(vq)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "Out of tx buffers\n");
+		return -1;
+	}
+
+	i = s->txbuf_idx;
+	len = VHOST_USER_HDR_SIZE + reply->size;
+	buf = (uint8_t *)s->txbuf_mz->addr + i * VVU_TXBUF_SIZE;
+
+	memcpy(buf, reply, len);
+
+	desc = &vq->vq_ring.desc[i];
+	descx = &vq->vq_descx[i];
+
+	desc->addr = rte_cpu_to_le_64(s->txbuf_mz->iova + i * VVU_TXBUF_SIZE);
+	desc->len = rte_cpu_to_le_32(len);
+	desc->flags = 0;
+
+	descx->cookie = buf;
+	descx->ndescs = 1;
+
+	vq->vq_free_cnt--;
+	s->txbuf_idx = (s->txbuf_idx + 1) & (vq->vq_nentries - 1);
+
+	vq_update_avail_ring(vq, i);
+	vq_update_avail_idx(vq);
+
+	if (virtqueue_kick_prepare(vq))
+		virtqueue_notify(vq);
+
+	return 0;
+}
+
+static int
+vvu_map_mem_regions(struct virtio_net *dev)
+{
+	struct vvu_connection *conn =
+		container_of(dev, struct vvu_connection, device);
+	struct vvu_socket *s = conn->s;
+	struct rte_pci_device *pci_dev = s->pdev->pci_dev;
+	uint8_t *mmap_addr;
+	uint32_t i;
+
+	/* Memory regions start after the doorbell registers */
+	mmap_addr = (uint8_t *)pci_dev->mem_resource[2].addr +
+		    RTE_ALIGN_CEIL((s->max_vhost_queues + 1 /* log fd */) *
+				   sizeof(uint16_t), 4096);
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		struct rte_vhost_mem_region *reg = &dev->mem->regions[i];
+
+		reg->mmap_addr = mmap_addr;
+		reg->host_user_addr = (uint64_t)(uintptr_t)reg->mmap_addr +
+				      reg->mmap_size - reg->size;
+
+		mmap_addr += reg->mmap_size;
+	}
+
+	return 0;
+}
+
+static void
+vvu_unmap_mem_regions(struct virtio_net *dev)
+{
+	uint32_t i;
+
+	for (i = 0; i < dev->mem->nregions; i++) {
+		struct rte_vhost_mem_region *reg = &dev->mem->regions[i];
+
+		/* Just clear the pointers, the PCI BAR stays there */
+		reg->mmap_addr = NULL;
+		reg->host_user_addr = 0;
+	}
+}
+
+static void vvu_process_new_connection(struct vvu_socket *s)
+{
+	struct vhost_user_socket *vsocket = &s->socket;
+	struct vvu_connection *conn;
+	struct virtio_net *dev;
+	size_t size;
+
+	dev = vhost_new_device(vsocket->trans_ops);
+	if (!dev) {
+		vvu_reconnect(s);
+		return;
+	}
+
+	conn = container_of(dev, struct vvu_connection, device);
+	conn->s = s;
+
+	size = strnlen(vsocket->path, PATH_MAX);
+	vhost_set_ifname(dev->vid, vsocket->path, size);
+
+	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", dev->vid);
+
+	if (vsocket->notify_ops->new_connection) {
+		int ret = vsocket->notify_ops->new_connection(dev->vid);
+		if (ret < 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to add vhost user connection\n");
+			vhost_destroy_device(dev->vid);
+			vvu_reconnect(s);
+			return;
+		}
+	}
+
+	s->conn = conn;
+	return;
+}
+
+static void vvu_process_status_change(struct vvu_socket *s, bool slave_up,
+				      bool master_up)
+{
+	RTE_LOG(DEBUG, VHOST_CONFIG, "%s slave_up %d master_up %d\n",
+		__func__, slave_up, master_up);
+
+	/* Disconnected from the master, try reconnecting */
+	if (!slave_up) {
+		vvu_reconnect(s);
+		return;
+	}
+
+	if (master_up && !s->conn) {
+		vvu_process_new_connection(s);
+		return;
+	}
+}
+
+static void
+vvu_process_txq(struct vvu_socket *s)
+{
+	struct virtio_hw *hw = &s->pdev->hw;
+	struct virtqueue *vq = hw->vqs[VVU_VQ_TX];
+	uint16_t n = VIRTQUEUE_NUSED(vq);
+
+	virtio_rmb();
+
+	/* Just mark the buffers complete */
+	vq->vq_used_cons_idx += n;
+	vq->vq_free_cnt += n;
+}
+
+static void
+vvu_process_rxq(struct vvu_socket *s)
+{
+	struct virtio_hw *hw = &s->pdev->hw;
+	struct virtqueue *vq = hw->vqs[VVU_VQ_RX];
+	bool refilled = false;
+
+	while (VIRTQUEUE_NUSED(vq)) {
+		struct vring_used_elem *uep;
+		VhostUserMsg *msg;
+		uint32_t len;
+		uint32_t desc_idx;
+		uint16_t used_idx;
+		size_t i;
+
+		virtio_rmb();
+
+		used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
+		uep = &vq->vq_ring.used->ring[used_idx];
+		desc_idx = rte_le_to_cpu_32(uep->id);
+
+		msg = vq->vq_descx[desc_idx].cookie;
+		len = rte_le_to_cpu_32(uep->len);
+
+		if (msg->size > sizeof(VhostUserMsg) ||
+		    len != VHOST_USER_HDR_SIZE + msg->size) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"Invalid vhost-user message size %u, got %u bytes\n",
+				msg->size, len);
+			/* TODO reconnect */
+			abort();
+		}
+
+		RTE_LOG(DEBUG, VHOST_CONFIG,
+			"%s request %u flags %#x size %u\n",
+			__func__, msg->request.master,
+			msg->flags, msg->size);
+
+		/* Mark file descriptors invalid */
+		for (i = 0; i < RTE_DIM(msg->fds); i++)
+			msg->fds[i] = VIRTIO_INVALID_EVENTFD;
+
+		/* Only process messages while connected */
+		if (s->conn) {
+			if (vhost_user_msg_handler(s->conn->device.vid,
+						   msg) < 0) {
+				/* TODO reconnect */
+				abort();
+			}
+		}
+
+		vq->vq_used_cons_idx++;
+
+		/* Refill rxq */
+		vq_update_avail_ring(vq, desc_idx);
+		vq_update_avail_idx(vq);
+		refilled = true;
+	}
+
+	if (!refilled)
+		return;
+	if (virtqueue_kick_prepare(vq))
+		virtqueue_notify(vq);
+}
+
+/* TODO Audit thread safety.  There are 3 threads involved:
+ * 1. The main process thread that calls librte_vhost APIs during startup.
+ * 2. The interrupt thread that calls vvu_interrupt_handler().
+ * 3. Packet processing threads (lcores) calling librte_vhost APIs.
+ *
+ * It may be necessary to use locks if any of these code paths can race.  The
+ * librte_vhost API entry points already do some locking but this needs to be
+ * checked.
+ */
+static void
+vvu_interrupt_handler(void *cb_arg)
+{
+	struct vvu_socket *s = cb_arg;
+	struct virtio_hw *hw = &s->pdev->hw;
+	struct rte_intr_handle *intr_handle = &s->pdev->pci_dev->intr_handle;
+	uint8_t isr;
+
+	/* Read Interrupt Status Register (which also clears it) */
+	isr = VTPCI_OPS(hw)->get_isr(hw);
+
+	if (isr & VIRTIO_PCI_ISR_CONFIG) {
+		uint32_t status;
+		bool slave_up;
+		bool master_up;
+
+		virtio_pci_read_dev_config(hw,
+				offsetof(struct virtio_vhost_user_config, status),
+				&status, sizeof(status));
+		status = rte_le_to_cpu_32(status);
+
+		RTE_LOG(DEBUG, VHOST_CONFIG, "%s isr %#x status %#x\n", __func__, isr, status);
+
+		slave_up = status & (1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
+		master_up = status & (1u << VIRTIO_VHOST_USER_STATUS_MASTER_UP);
+		vvu_process_status_change(s, slave_up, master_up);
+	} else
+		RTE_LOG(DEBUG, VHOST_CONFIG, "%s isr %#x\n", __func__, isr);
+
+	/* Re-arm before processing virtqueues so no interrupts are lost */
+	rte_intr_enable(intr_handle);
+
+	vvu_process_txq(s);
+	vvu_process_rxq(s);
+}
+
+static int
+vvu_virtio_pci_init_rxq(struct vvu_socket *s)
+{
+	char name[sizeof("0000:00:00.00 vq 0 rxbufs")];
+	struct virtqueue *vq;
+	size_t size;
+	size_t align;
+	int i;
+
+	vq = s->pdev->hw.vqs[VVU_VQ_RX];
+
+	snprintf(name, sizeof(name), "%s vq %u rxbufs",
+		 s->pdev->pci_dev->device.name, VVU_VQ_RX);
+
+	/* Allocate more than sizeof(VhostUserMsg) so there is room to grow */
+	size = vq->vq_nentries * VVU_RXBUF_SIZE;
+	align = 1024;
+	s->rxbuf_mz = rte_memzone_reserve_aligned(name, size, SOCKET_ID_ANY,
+						  0, align);
+	if (!s->rxbuf_mz) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate rxbuf memzone\n");
+		return -1;
+	}
+
+	for (i = 0; i < vq->vq_nentries; i++) {
+		struct vring_desc *desc = &vq->vq_ring.desc[i];
+		struct vq_desc_extra *descx = &vq->vq_descx[i];
+
+		desc->addr = rte_cpu_to_le_64(s->rxbuf_mz->iova +
+				              i * VVU_RXBUF_SIZE);
+		desc->len = RTE_LE32(VVU_RXBUF_SIZE);
+		desc->flags = RTE_LE16(VRING_DESC_F_WRITE);
+
+		descx->cookie = (uint8_t *)s->rxbuf_mz->addr + i * VVU_RXBUF_SIZE;
+		descx->ndescs = 1;
+
+		vq_update_avail_ring(vq, i);
+		vq->vq_free_cnt--;
+	}
+
+	vq_update_avail_idx(vq);
+	virtqueue_notify(vq);
+	return 0;
+}
+
+static int
+vvu_virtio_pci_init_txq(struct vvu_socket *s)
+{
+	char name[sizeof("0000:00:00.00 vq 0 txbufs")];
+	struct virtqueue *vq;
+	size_t size;
+	size_t align;
+
+	vq = s->pdev->hw.vqs[VVU_VQ_TX];
+
+	snprintf(name, sizeof(name), "%s vq %u txbufs",
+		 s->pdev->pci_dev->device.name, VVU_VQ_TX);
+
+	/* Allocate more than sizeof(VhostUserMsg) so there is room to grow */
+	size = vq->vq_nentries * VVU_TXBUF_SIZE;
+	align = 1024;
+	s->txbuf_mz = rte_memzone_reserve_aligned(name, size, SOCKET_ID_ANY,
+						  0, align);
+	if (!s->txbuf_mz) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate txbuf memzone\n");
+		return -1;
+	}
+
+	s->txbuf_idx = 0;
+	return 0;
+}
+
+static void
+virtio_init_vring(struct virtqueue *vq)
+{
+	int size = vq->vq_nentries;
+	struct vring *vr = &vq->vq_ring;
+	uint8_t *ring_mem = vq->vq_ring_virt_mem;
+
+	memset(ring_mem, 0, vq->vq_ring_size);
+	vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
+	vq->vq_used_cons_idx = 0;
+	vq->vq_desc_head_idx = 0;
+	vq->vq_avail_idx = 0;
+	vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
+	vq->vq_free_cnt = vq->vq_nentries;
+	memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
+
+	vring_desc_init(vr->desc, size);
+	virtqueue_enable_intr(vq);
+}
+
+static int
+vvu_virtio_pci_init_vq(struct vvu_socket *s, int vq_idx)
+{
+	char vq_name[sizeof("0000:00:00.00 vq 0")];
+	struct virtio_hw *hw = &s->pdev->hw;
+	const struct rte_memzone *mz;
+	struct virtqueue *vq;
+	uint16_t q_num;
+	size_t size;
+
+	q_num = VTPCI_OPS(hw)->get_queue_num(hw, vq_idx);
+	RTE_LOG(DEBUG, VHOST_CONFIG, "vq %d q_num: %u\n", vq_idx, q_num);
+	if (q_num == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "virtqueue %d does not exist\n",
+			vq_idx);
+		return -1;
+	}
+
+	if (!rte_is_power_of_2(q_num)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"virtqueue %d has non-power of 2 size (%u)\n",
+			vq_idx, q_num);
+		return -1;
+	}
+
+	snprintf(vq_name, sizeof(vq_name), "%s vq %u",
+		 s->pdev->pci_dev->device.name, vq_idx);
+
+	size = RTE_ALIGN_CEIL(sizeof(*vq) +
+			      q_num * sizeof(struct vq_desc_extra),
+			      RTE_CACHE_LINE_SIZE);
+	vq = rte_zmalloc(vq_name, size, RTE_CACHE_LINE_SIZE);
+	if (!vq) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate virtqueue %d\n", vq_idx);
+		return -1;
+	}
+	hw->vqs[vq_idx] = vq;
+
+	vq->hw = hw;
+	vq->vq_queue_index = vq_idx;
+	vq->vq_nentries = q_num;
+
+	size = vring_size(q_num, VIRTIO_PCI_VRING_ALIGN);
+	vq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN);
+
+	mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size,
+					 SOCKET_ID_ANY, 0,
+					 VIRTIO_PCI_VRING_ALIGN);
+	if (mz == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to reserve memzone for virtqueue %d\n",
+			vq_idx);
+		goto err_vq;
+	}
+
+	memset(mz->addr, 0, mz->len);
+
+	vq->mz = mz;
+	vq->vq_ring_mem = mz->iova;
+	vq->vq_ring_virt_mem = mz->addr;
+	virtio_init_vring(vq);
+
+	if (VTPCI_OPS(hw)->setup_queue(hw, vq) < 0)
+		goto err_mz;
+
+	return 0;
+
+err_mz:
+	rte_memzone_free(mz);
+
+err_vq:
+	hw->vqs[vq_idx] = NULL;
+	rte_free(vq);
+	return -1;
+}
+
+static void
+vvu_virtio_pci_free_virtqueues(struct vvu_socket *s)
+{
+	struct virtio_hw *hw = &s->pdev->hw;
+	int i;
+
+	if (s->rxbuf_mz) {
+		rte_memzone_free(s->rxbuf_mz);
+		s->rxbuf_mz = NULL;
+	}
+	if (s->txbuf_mz) {
+		rte_memzone_free(s->txbuf_mz);
+		s->txbuf_mz = NULL;
+	}
+
+	for (i = 0; i < VVU_VQ_MAX; i++) {
+		struct virtqueue *vq = hw->vqs[i];
+
+		if (!vq)
+			continue;
+
+		rte_memzone_free(vq->mz);
+		rte_free(vq);
+		hw->vqs[i] = NULL;
+	}
+
+	rte_free(hw->vqs);
+	hw->vqs = NULL;
+}
+
+static void
+vvu_virtio_pci_intr_cleanup(struct vvu_socket *s)
+{
+	struct virtio_hw *hw = &s->pdev->hw;
+	struct rte_intr_handle *intr_handle = &s->pdev->pci_dev->intr_handle;
+	int i;
+
+	for (i = 0; i < VVU_VQ_MAX; i++)
+		VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i],
+					     VIRTIO_MSI_NO_VECTOR);
+	VTPCI_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR);
+	rte_intr_disable(intr_handle);
+	rte_intr_callback_unregister(intr_handle, vvu_interrupt_handler, s);
+	rte_intr_efd_disable(intr_handle);
+}
+
+static int
+vvu_virtio_pci_init_intr(struct vvu_socket *s)
+{
+	struct virtio_hw *hw = &s->pdev->hw;
+	struct rte_intr_handle *intr_handle = &s->pdev->pci_dev->intr_handle;
+	int i;
+
+	if (!rte_intr_cap_multiple(intr_handle)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Multiple intr vector not supported\n");
+		return -1;
+	}
+
+	if (rte_intr_efd_enable(intr_handle, VVU_VQ_MAX) < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to create eventfds\n");
+		return -1;
+	}
+
+	if (rte_intr_callback_register(intr_handle, vvu_interrupt_handler, s) < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to register interrupt callback\n");
+		goto err_efd;
+	}
+
+	if (rte_intr_enable(intr_handle) < 0)
+		goto err_callback;
+
+	if (VTPCI_OPS(hw)->set_config_irq(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to set config MSI-X vector\n");
+		goto err_enable;
+	}
+
+	/* TODO use separate vectors and interrupt handler functions.  It seems
+	 * <rte_interrupts.h> doesn't allow efds to have interrupt_handler
+	 * functions and it just clears efds when they are raised.  As a
+	 * workaround we use the configuration change interrupt for virtqueue
+	 * interrupts!
+	 */
+	for (i = 0; i < VVU_VQ_MAX; i++) {
+		if (VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i], 0) ==
+				VIRTIO_MSI_NO_VECTOR) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed to set virtqueue MSI-X vector\n");
+			goto err_vq;
+		}
+	}
+
+	return 0;
+
+err_vq:
+	for (i = 0; i < VVU_VQ_MAX; i++)
+		VTPCI_OPS(hw)->set_queue_irq(hw, hw->vqs[i],
+					     VIRTIO_MSI_NO_VECTOR);
+	VTPCI_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR);
+err_enable:
+	rte_intr_disable(intr_handle);
+err_callback:
+	rte_intr_callback_unregister(intr_handle, vvu_interrupt_handler, s);
+err_efd:
+	rte_intr_efd_disable(intr_handle);
+	return -1;
+}
+
+static int
+vvu_virtio_pci_init_bar(struct vvu_socket *s)
+{
+	struct rte_pci_device *pci_dev = s->pdev->pci_dev;
+	struct virtio_net *dev = NULL; /* just for sizeof() */
+
+	s->doorbells = pci_dev->mem_resource[2].addr;
+	if (!s->doorbells) {
+		RTE_LOG(ERR, VHOST_CONFIG, "BAR 2 not available\n");
+		return -1;
+	}
+
+	/* The number of doorbells is max_vhost_queues + 1 */
+	virtio_pci_read_dev_config(&s->pdev->hw,
+			offsetof(struct virtio_vhost_user_config,
+				 max_vhost_queues),
+			&s->max_vhost_queues,
+			sizeof(s->max_vhost_queues));
+	s->max_vhost_queues = rte_le_to_cpu_32(s->max_vhost_queues);
+	if (s->max_vhost_queues < RTE_DIM(dev->virtqueue)) {
+		/* We could support devices with a smaller max number of
+		 * virtqueues than dev->virtqueue[] in the future.  Fail early
+		 * for now since the current assumption is that all of
+		 * dev->virtqueue[] can be used.
+		 */
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Device supports fewer virtqueues than driver!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+vvu_virtio_pci_init(struct vvu_socket *s)
+{
+	uint64_t host_features;
+	struct virtio_hw *hw = &s->pdev->hw;
+	int i;
+
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
+
+	hw->guest_features = VVU_VIRTIO_FEATURES;
+	host_features = VTPCI_OPS(hw)->get_features(hw);
+	hw->guest_features = virtio_pci_negotiate_features(hw, host_features);
+
+	if (!virtio_pci_with_feature(hw, VIRTIO_F_VERSION_1)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "Missing VIRTIO 1 feature bit\n");
+		goto err;
+	}
+
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_FEATURES_OK);
+	if (!(virtio_pci_get_status(hw) & VIRTIO_CONFIG_STATUS_FEATURES_OK)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "Failed to set FEATURES_OK\n");
+		goto err;
+	}
+
+	if (vvu_virtio_pci_init_bar(s) < 0)
+		goto err;
+
+	hw->vqs = rte_zmalloc(NULL, sizeof(struct virtqueue *) * VVU_VQ_MAX, 0);
+	if (!hw->vqs)
+		goto err;
+
+	for (i = 0; i < VVU_VQ_MAX; i++) {
+		if (vvu_virtio_pci_init_vq(s, i) < 0) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"virtqueue %u init failed\n", i);
+			goto err_init_vq;
+		}
+	}
+
+	if (vvu_virtio_pci_init_rxq(s) < 0)
+		goto err_init_vq;
+
+	if (vvu_virtio_pci_init_txq(s) < 0)
+		goto err_init_vq;
+
+	if (vvu_virtio_pci_init_intr(s) < 0)
+		goto err_init_vq;
+
+	virtio_pci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+
+	return 0;
+
+err_init_vq:
+	vvu_virtio_pci_free_virtqueues(s);
+
+err:
+	virtio_pci_reset(hw);
+	RTE_LOG(DEBUG, VHOST_CONFIG, "%s failed\n", __func__);
+	return -1;
+}
+
+static int
+vvu_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
+	      struct rte_pci_device *pci_dev)
+{
+	struct vvu_pci_device *pdev;
+
+	/* TODO support multi-process applications */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"virtio-vhost-user does not support multi-process "
+			"applications\n");
+		return -1;
+	}
+
+	pdev = rte_zmalloc_socket(pci_dev->device.name, sizeof(*pdev),
+				  RTE_CACHE_LINE_SIZE,
+				  pci_dev->device.numa_node);
+	if (!pdev)
+		return -1;
+
+	pdev->pci_dev = pci_dev;
+
+	if (virtio_pci_init(pci_dev, &pdev->hw) != 0) {
+		rte_free(pdev);
+		return -1;
+	}
+
+	/* Reset the device now, the rest is done in vvu_socket_init() */
+	virtio_pci_reset(&pdev->hw);
+
+	if (pdev->hw.use_msix == VIRTIO_MSIX_NONE) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"MSI-X is required for PCI device at %s\n",
+			pci_dev->device.name);
+		rte_free(pdev);
+		rte_pci_unmap_device(pci_dev);
+		return -1;
+	}
+
+	TAILQ_INSERT_TAIL(&vvu_pci_device_list, pdev, next);
+
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"Added virtio-vhost-user device at %s\n",
+		pci_dev->device.name);
+
+	return 0;
+}
+
+static int
+vvu_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct vvu_pci_device *pdev;
+
+	TAILQ_FOREACH(pdev, &vvu_pci_device_list, next)
+		if (pdev->pci_dev == pci_dev)
+			break;
+	if (!pdev)
+		return -1;
+
+	if (pdev->s) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Cannot remove PCI device at %s with vhost still attached\n",
+			pci_dev->device.name);
+		return -1;
+	}
+
+	TAILQ_REMOVE(&vvu_pci_device_list, pdev, next);
+	rte_free(pdev);
+	rte_pci_unmap_device(pci_dev);
+	return 0;
+}
+
+static const struct rte_pci_id pci_id_vvu_map[] = {
+	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID,
+			 VIRTIO_PCI_LEGACY_DEVICEID_VHOST_USER) },
+	{ RTE_PCI_DEVICE(VIRTIO_PCI_VENDORID,
+			 VIRTIO_PCI_MODERN_DEVICEID_VHOST_USER) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static struct rte_pci_driver vvu_pci_driver = {
+	.driver = {
+		.name = "virtio_vhost_user",
+	},
+	.id_table = pci_id_vvu_map,
+	.drv_flags = 0,
+	.probe = vvu_pci_probe,
+	.remove = vvu_pci_remove,
+};
+
+RTE_INIT(vvu_pci_init);
+static void
+vvu_pci_init(void)
+{
+	if (rte_eal_iopl_init() != 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"IOPL call failed - cannot use virtio-vhost-user\n");
+		return;
+	}
+
+	rte_pci_register(&vvu_pci_driver);
+}
+
+static int
+vvu_socket_init(struct vhost_user_socket *vsocket, uint64_t flags)
+{
+	struct vvu_socket *s =
+		container_of(vsocket, struct vvu_socket, socket);
+	struct vvu_pci_device *pdev;
+
+	if (flags & RTE_VHOST_USER_NO_RECONNECT) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"error: reconnect cannot be disabled for virtio-vhost-user\n");
+		return -1;
+	}
+	if (flags & RTE_VHOST_USER_CLIENT) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"error: virtio-vhost-user does not support client mode\n");
+		return -1;
+	}
+	if (flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"error: virtio-vhost-user does not support dequeue-zero-copy\n");
+		return -1;
+	}
+
+	pdev = vvu_pci_by_name(vsocket->path);
+	if (!pdev) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Cannot find virtio-vhost-user PCI device at %s\n",
+			vsocket->path);
+		return -1;
+	}
+
+	if (pdev->s) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Device at %s is already in use\n",
+			vsocket->path);
+		return -1;
+	}
+
+	s->pdev = pdev;
+	pdev->s = s;
+
+	if (vvu_virtio_pci_init(s) < 0) {
+		s->pdev = NULL;
+		pdev->s = NULL;
+		return -1;
+	}
+
+	RTE_LOG(INFO, VHOST_CONFIG, "%s at %s\n", __func__, vsocket->path);
+	return 0;
+}
+
+static void
+vvu_socket_cleanup(struct vhost_user_socket *vsocket)
+{
+	struct vvu_socket *s =
+		container_of(vsocket, struct vvu_socket, socket);
+
+	if (s->conn)
+		vhost_destroy_device(s->conn->device.vid);
+
+	vvu_virtio_pci_intr_cleanup(s);
+	virtio_pci_reset(&s->pdev->hw);
+	vvu_virtio_pci_free_virtqueues(s);
+
+	s->pdev->s = NULL;
+	s->pdev = NULL;
+}
+
+static int
+vvu_socket_start(struct vhost_user_socket *vsocket)
+{
+	struct vvu_socket *s =
+		container_of(vsocket, struct vvu_socket, socket);
+
+	vvu_connect(s);
+	return 0;
+}
+
+const struct vhost_transport_ops virtio_vhost_user_trans_ops = {
+	.socket_size = sizeof(struct vvu_socket),
+	.device_size = sizeof(struct vvu_connection),
+	.socket_init = vvu_socket_init,
+	.socket_cleanup = vvu_socket_cleanup,
+	.socket_start = vvu_socket_start,
+	.cleanup_device = vvu_cleanup_device,
+	.vring_call = vvu_vring_call,
+	.send_reply = vvu_send_reply,
+	.map_mem_regions = vvu_map_mem_regions,
+	.unmap_mem_regions = vvu_unmap_mem_regions,
+};
-- 
2.14.3
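`vvu_map_mem_regions()` in the patch above assumes a fixed BAR 2 layout: one 16-bit doorbell per vhost queue plus one for the log, rounded up to 4096 bytes, followed by the memory regions back to back, with each region's guest memory at the end of its mapping (`mmap_addr + mmap_size - size`). For example, with `max_vhost_queues` = 1024 the doorbells take 1025 × 2 = 2050 bytes, so the first region starts at offset 4096. The arithmetic sketch below reproduces that computation outside DPDK; `struct demo_region` and the `demo_*` helpers are hypothetical and work with offsets rather than mapped pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of the power-of-two a, like RTE_ALIGN_CEIL() */
static uint64_t
demo_align_ceil(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Offset of the first memory region in BAR 2: (max_vhost_queues + 1)
 * 16-bit doorbells (the extra one is for the log), rounded up to 4 KiB. */
static uint64_t
demo_first_region_offset(uint32_t max_vhost_queues)
{
	return demo_align_ceil(((uint64_t)max_vhost_queues + 1) *
			       sizeof(uint16_t), 4096);
}

/* Hypothetical mirror of the rte_vhost_mem_region fields the patch uses */
struct demo_region {
	uint64_t size;      /* guest memory size */
	uint64_t mmap_size; /* size of the mapping inside the BAR */
	uint64_t mmap_off;  /* computed: mapping offset in BAR 2 */
	uint64_t user_off;  /* computed: guest memory offset in BAR 2 */
};

/* Lay regions out back to back after the doorbells, placing guest memory
 * at the end of each mapping as vvu_map_mem_regions() does. */
static void
demo_layout(struct demo_region *regs, uint32_t n, uint32_t max_vhost_queues)
{
	uint64_t off = demo_first_region_offset(max_vhost_queues);

	for (uint32_t i = 0; i < n; i++) {
		regs[i].mmap_off = off;
		regs[i].user_off = off + regs[i].mmap_size - regs[i].size;
		off += regs[i].mmap_size;
	}
}
```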

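The connection handshake in that transport is driven by two bits of the `status` field in `struct virtio_vhost_user_config`: `vvu_connect()` sets `VIRTIO_VHOST_USER_STATUS_SLAVE_UP`, and on each configuration-change interrupt `vvu_interrupt_handler()` rereads the field and hands both bits to `vvu_process_status_change()`. The sketch models only that decision logic as a pure function so it can be exercised without a device. `enum demo_action` and the `demo_*` functions are hypothetical, and the status value is assumed to be in CPU byte order already, as the handler ensures with `rte_le_to_cpu_32()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers copied from virtio_vhost_user.h in this patch */
#define VIRTIO_VHOST_USER_STATUS_SLAVE_UP 0
#define VIRTIO_VHOST_USER_STATUS_MASTER_UP 1

/* Hypothetical outcome of handling a config-change interrupt */
enum demo_action {
	DEMO_NOTHING,
	DEMO_RECONNECT,      /* slave bit is clear: disconnect, set it again */
	DEMO_NEW_CONNECTION, /* master is up and no session exists yet */
};

/* Same decisions as vvu_process_status_change() */
static enum demo_action
demo_status_change(uint32_t status, bool have_conn)
{
	bool slave_up = status & (1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
	bool master_up = status & (1u << VIRTIO_VHOST_USER_STATUS_MASTER_UP);

	if (!slave_up)
		return DEMO_RECONNECT;
	if (master_up && !have_conn)
		return DEMO_NEW_CONNECTION;
	return DEMO_NOTHING;
}

/* The value vvu_connect() writes back: old status with SLAVE_UP set */
static uint32_t
demo_connect(uint32_t status)
{
	return status | (1u << VIRTIO_VHOST_USER_STATUS_SLAVE_UP);
}
```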

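`vvu_send_reply()` relies on the assumption documented in `struct vvu_socket` that the device completes transmit buffers in order, so one wrapping index is enough to choose the next free buffer, and `vvu_process_txq()` only needs to add the used count back to the free count. The minimal model below, with hypothetical names, also shows why `vvu_virtio_pci_init_vq()` insists on a power-of-two ring size: the index wraps with a mask instead of a modulo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the tx bookkeeping in vvu_send_reply() and
 * vvu_process_txq(). nentries must be a power of two. */
struct demo_txring {
	uint16_t nentries;
	uint16_t free_cnt;
	uint16_t next; /* plays the role of s->txbuf_idx */
};

static void
demo_txring_init(struct demo_txring *r, uint16_t nentries)
{
	assert(nentries != 0 && (nentries & (nentries - 1)) == 0);
	r->nentries = nentries;
	r->free_cnt = nentries;
	r->next = 0;
}

/* Claim the next buffer; returns false when the ring is full */
static bool
demo_txring_get(struct demo_txring *r, uint16_t *idx)
{
	if (r->free_cnt == 0)
		return false;
	*idx = r->next;
	r->next = (r->next + 1) & (r->nentries - 1); /* wrap with a mask */
	r->free_cnt--;
	return true;
}

/* The device reports n used buffers; with in-order completion they are
 * exactly the n oldest outstanding ones, so they just return to the pool. */
static void
demo_txring_complete(struct demo_txring *r, uint16_t n)
{
	assert(r->free_cnt + n <= r->nentries);
	r->free_cnt += n;
}
```

If a device completed buffers out of order, this scheme could hand out a buffer that is still in flight; tracking free buffers by the used ring's `id` would be needed instead.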
* [dpdk-dev] [RFC 18/24] vhost: add RTE_VHOST_USER_VIRTIO_TRANSPORT flag
  2018-01-19 13:44 [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
                   ` (16 preceding siblings ...)
  2018-01-19 13:44 ` [dpdk-dev] [RFC 17/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
@ 2018-01-19 13:44 ` Stefan Hajnoczi
  2018-01-19 13:44 ` [dpdk-dev] [RFC 19/24] net/vhost: add virtio-vhost-user support Stefan Hajnoczi
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC (permalink / raw)
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

Extend the <rte_vhost.h> API to support the virtio-vhost-user transport
as an alternative to the AF_UNIX transport.  The caller provides a PCI
DomBDF address:

  rte_vhost_driver_register("0000:00:04.0",
                            RTE_VHOST_USER_VIRTIO_TRANSPORT);
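The transport choice is a single flag test at registration time. The self-contained sketch below mirrors the `socket.c` hunk (AF_UNIX unless the new flag is set) and also folds in the flag checks that `vvu_socket_init()` performs later, since client mode, disabling reconnect, and dequeue zero-copy are rejected for virtio-vhost-user. `struct demo_transport_ops` and `demo_select_transport()` are hypothetical; the flag values are copied from `<rte_vhost.h>` as extended by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTE_VHOST_USER_CLIENT			(1ULL << 0)
#define RTE_VHOST_USER_NO_RECONNECT		(1ULL << 1)
#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
#define RTE_VHOST_USER_VIRTIO_TRANSPORT		(1ULL << 4)

/* Hypothetical, trimmed-down stand-in for struct vhost_transport_ops */
struct demo_transport_ops {
	const char *name;
};

static const struct demo_transport_ops demo_af_unix_ops = { "af_unix" };
static const struct demo_transport_ops demo_vvu_ops = { "virtio-vhost-user" };

/* Returns the transport for these flags, or NULL if the combination is
 * rejected. With the virtio flag, the path is a PCI address. */
static const struct demo_transport_ops *
demo_select_transport(uint64_t flags)
{
	const uint64_t unsupported = RTE_VHOST_USER_CLIENT |
				     RTE_VHOST_USER_NO_RECONNECT |
				     RTE_VHOST_USER_DEQUEUE_ZERO_COPY;

	if (!(flags & RTE_VHOST_USER_VIRTIO_TRANSPORT))
		return &demo_af_unix_ops;
	/* vvu_socket_init() rejects these flags for virtio-vhost-user */
	if (flags & unsupported)
		return NULL;
	return &demo_vvu_ops;
}
```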

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/librte_vhost/rte_vhost.h | 1 +
 drivers/librte_vhost/socket.c    | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/librte_vhost/rte_vhost.h b/drivers/librte_vhost/rte_vhost.h
index d33206997..d91d8d992 100644
--- a/drivers/librte_vhost/rte_vhost.h
+++ b/drivers/librte_vhost/rte_vhost.h
@@ -28,6 +28,7 @@ extern "C" {
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
 #define RTE_VHOST_USER_IOMMU_SUPPORT	(1ULL << 3)
+#define RTE_VHOST_USER_VIRTIO_TRANSPORT	(1ULL << 4)
 
 /**
  * Information relating to memory regions including offsets to
diff --git a/drivers/librte_vhost/socket.c b/drivers/librte_vhost/socket.c
index c46328950..265a014cf 100644
--- a/drivers/librte_vhost/socket.c
+++ b/drivers/librte_vhost/socket.c
@@ -132,6 +132,9 @@ rte_vhost_driver_register(const char *path, uint64_t flags)
 	struct vhost_user_socket *vsocket;
 	const struct vhost_transport_ops *trans_ops = &af_unix_trans_ops;
 
+	if (flags & RTE_VHOST_USER_VIRTIO_TRANSPORT)
+		trans_ops = &virtio_vhost_user_trans_ops;
+
 	if (!path)
 		return -1;
 
-- 
2.14.3

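A minimal, self-contained model of the transport selection that this patch adds to rte_vhost_driver_register(). The struct transport_ops type and the two demo_*_start() functions are hypothetical, heavily reduced stand-ins for struct vhost_transport_ops and the real AF_UNIX and virtio-vhost-user implementations; only the flag values come from <rte_vhost.h>.

```c
#include <stdint.h>
#include <stdio.h>

/* Flag values as defined in <rte_vhost.h>; bit 4 is the one added here. */
#define RTE_VHOST_USER_CLIENT           (1ULL << 0)
#define RTE_VHOST_USER_VIRTIO_TRANSPORT (1ULL << 4)

/* Hypothetical, reduced stand-in for struct vhost_transport_ops. */
struct transport_ops {
	const char *name;
	int (*socket_start)(const char *path);
};

static int demo_af_unix_start(const char *path)
{
	printf("AF_UNIX: using socket %s\n", path);
	return 0;
}

static int demo_vvu_start(const char *path)
{
	printf("virtio-vhost-user: using PCI device %s\n", path);
	return 0;
}

static const struct transport_ops af_unix_ops = {
	.name = "af_unix",
	.socket_start = demo_af_unix_start,
};

static const struct transport_ops vvu_ops = {
	.name = "virtio-vhost-user",
	.socket_start = demo_vvu_start,
};

/* Same decision as the patched rte_vhost_driver_register(): AF_UNIX is
 * the default and the new flag switches to virtio-vhost-user. */
static const struct transport_ops *select_transport(uint64_t flags)
{
	if (flags & RTE_VHOST_USER_VIRTIO_TRANSPORT)
		return &vvu_ops;
	return &af_unix_ops;
}
```

Because the choice is made once per registered path, the rest of the library can call through the ops table without knowing which transport is in use; other flags such as RTE_VHOST_USER_CLIENT do not affect the selection.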

* [dpdk-dev] [RFC 19/24] net/vhost: add virtio-vhost-user support
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The new virtio-transport=0|1 argument enables virtio-vhost-user support
(the default, 0, keeps the AF_UNIX transport):

  testpmd ... --pci-whitelist 0000:00:04.0 \
              --vdev net_vhost0,iface=0000:00:04.0,virtio-transport=1

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 2536ee4a2..a136ce89f 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -53,6 +53,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 #define ETH_VHOST_CLIENT_ARG		"client"
 #define ETH_VHOST_DEQUEUE_ZERO_COPY	"dequeue-zero-copy"
 #define ETH_VHOST_IOMMU_SUPPORT		"iommu-support"
+#define ETH_VHOST_VIRTIO_TRANSPORT	"virtio-transport"
 #define VHOST_MAX_PKT_BURST 32
 
 static const char *valid_arguments[] = {
@@ -61,6 +62,7 @@ static const char *valid_arguments[] = {
 	ETH_VHOST_CLIENT_ARG,
 	ETH_VHOST_DEQUEUE_ZERO_COPY,
 	ETH_VHOST_IOMMU_SUPPORT,
+	ETH_VHOST_VIRTIO_TRANSPORT,
 	NULL
 };
 
@@ -1167,6 +1169,7 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 	int client_mode = 0;
 	int dequeue_zero_copy = 0;
 	int iommu_support = 0;
+	uint16_t virtio_transport = 0;
 
 	RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n",
 		rte_vdev_device_name(dev));
@@ -1224,6 +1227,16 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev)
 			flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
 	}
 
+	if (rte_kvargs_count(kvlist, ETH_VHOST_VIRTIO_TRANSPORT) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_VHOST_VIRTIO_TRANSPORT,
+					 &open_int, &virtio_transport);
+		if (ret < 0)
+			goto out_free;
+
+		if (virtio_transport)
+			flags |= RTE_VHOST_USER_VIRTIO_TRANSPORT;
+	}
+
 	if (dev->device.numa_node == SOCKET_ID_ANY)
 		dev->device.numa_node = rte_socket_id();
 
-- 
2.14.3

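A rough sketch of how the new devargs key maps onto the registration flags. The devargs_to_flags() helper is hypothetical and uses plain string scanning instead of the rte_kvargs API that the patch calls. Like the patch, it treats any non-zero value as selecting virtio-vhost-user; unlike the patch, which only looks at the key when it appears exactly once, it honours every occurrence.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RTE_VHOST_USER_VIRTIO_TRANSPORT (1ULL << 4)

/* Hypothetical helper: derive vhost registration flags from a devargs
 * string such as "iface=0000:00:04.0,virtio-transport=1". */
static uint64_t devargs_to_flags(const char *devargs)
{
	static const char key[] = "virtio-transport=";
	const size_t key_len = sizeof(key) - 1;
	uint64_t flags = 0;
	const char *p = devargs;

	while (p != NULL && *p != '\0') {
		/* p points at the start of a "key=value" item. strtoul()
		 * stops at the next ',' so only this item's value is read. */
		if (strncmp(p, key, key_len) == 0 &&
		    strtoul(p + key_len, NULL, 0) != 0)
			flags |= RTE_VHOST_USER_VIRTIO_TRANSPORT;

		p = strchr(p, ',');
		if (p != NULL)
			p++;	/* skip the comma */
	}
	return flags;
}
```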

* [dpdk-dev] [RFC 20/24] examples/vhost_scsi: add --socket-file argument
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The socket path built into examples/vhost_scsi (vhost.socket in the
current working directory) may not be convenient.  Allow the user to
specify the full UNIX domain socket path on the command line with
--socket-file.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 examples/vhost_scsi/vhost_scsi.c | 93 ++++++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 18 deletions(-)

diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c
index da01ad378..0a8302e02 100644
--- a/examples/vhost_scsi/vhost_scsi.c
+++ b/examples/vhost_scsi/vhost_scsi.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2017 Intel Corporation
  */
 
+#include <getopt.h>
 #include <stdint.h>
 #include <unistd.h>
 #include <stdbool.h>
@@ -359,26 +360,10 @@ static const struct vhost_device_ops vhost_scsi_device_ops = {
 };
 
 static struct vhost_scsi_ctrlr *
-vhost_scsi_ctrlr_construct(const char *ctrlr_name)
+vhost_scsi_ctrlr_construct(void)
 {
 	int ret;
 	struct vhost_scsi_ctrlr *ctrlr;
-	char *path;
-	char cwd[PATH_MAX];
-
-	/* always use current directory */
-	path = getcwd(cwd, PATH_MAX);
-	if (!path) {
-		fprintf(stderr, "Cannot get current working directory\n");
-		return NULL;
-	}
-	snprintf(dev_pathname, sizeof(dev_pathname), "%s/%s", path, ctrlr_name);
-
-	if (access(dev_pathname, F_OK) != -1) {
-		if (unlink(dev_pathname) != 0)
-			rte_exit(EXIT_FAILURE, "Cannot remove %s.\n",
-				 dev_pathname);
-	}
 
 	if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
 		fprintf(stderr, "socket %s already exists\n", dev_pathname);
@@ -412,6 +397,71 @@ signal_handler(__rte_unused int signum)
 	exit(0);
 }
 
+static void
+set_dev_pathname(const char *path)
+{
+	if (dev_pathname[0])
+		rte_exit(EXIT_FAILURE, "--socket-file can only be given once.\n");
+
+	snprintf(dev_pathname, sizeof(dev_pathname), "%s", path);
+}
+
+static void
+vhost_scsi_usage(const char *prgname)
+{
+	fprintf(stderr, "%s [EAL options] --\n"
+	"    --socket-file PATH: The path of the UNIX domain socket\n",
+		prgname);
+}
+
+static void
+vhost_scsi_parse_args(int argc, char **argv)
+{
+	int opt;
+	int option_index;
+	const char *prgname = argv[0];
+	static struct option long_option[] = {
+		{"socket-file", required_argument, NULL, 0},
+		{NULL, 0, 0, 0},
+	};
+
+	while ((opt = getopt_long(argc, argv, "", long_option,
+				  &option_index)) != -1) {
+		switch (opt) {
+		case 0:
+			if (!strcmp(long_option[option_index].name,
+				    "socket-file")) {
+				set_dev_pathname(optarg);
+			}
+			break;
+		default:
+			vhost_scsi_usage(prgname);
+			rte_exit(EXIT_FAILURE, "Invalid argument\n");
+		}
+	}
+}
+
+static void
+vhost_scsi_set_default_dev_pathname(void)
+{
+	char *path;
+	char cwd[PATH_MAX];
+
+	/* always use current directory */
+	path = getcwd(cwd, PATH_MAX);
+	if (!path) {
+		rte_exit(EXIT_FAILURE,
+			 "Cannot get current working directory\n");
+	}
+	snprintf(dev_pathname, sizeof(dev_pathname), "%s/vhost.socket", path);
+
+	if (access(dev_pathname, F_OK) != -1) {
+		if (unlink(dev_pathname) != 0)
+			rte_exit(EXIT_FAILURE, "Cannot remove %s.\n",
+				 dev_pathname);
+	}
+}
+
 int main(int argc, char *argv[])
 {
 	int ret;
@@ -422,8 +472,15 @@ int main(int argc, char *argv[])
 	ret = rte_eal_init(argc, argv);
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+	argc -= ret;
+	argv += ret;
 
-	g_vhost_ctrlr = vhost_scsi_ctrlr_construct("vhost.socket");
+	vhost_scsi_parse_args(argc, argv);
+
+	if (!dev_pathname[0])
+		vhost_scsi_set_default_dev_pathname();
+
+	g_vhost_ctrlr = vhost_scsi_ctrlr_construct();
 	if (g_vhost_ctrlr == NULL) {
 		fprintf(stderr, "Construct vhost scsi controller failed\n");
 		return 0;
-- 
2.14.3


* [dpdk-dev] [RFC 21/24] examples/vhost_scsi: add virtio-vhost-user support
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The new --virtio-vhost-user-pci command-line argument uses
virtio-vhost-user instead of the default AF_UNIX transport.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 examples/vhost_scsi/vhost_scsi.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c
index 0a8302e02..61001cadb 100644
--- a/examples/vhost_scsi/vhost_scsi.c
+++ b/examples/vhost_scsi/vhost_scsi.c
@@ -28,6 +28,7 @@
 
 /* Path to folder where character device will be created. Can be set by user. */
 static char dev_pathname[PATH_MAX] = "";
+static uint64_t dev_flags; /* for rte_vhost_driver_register() */
 
 static struct vhost_scsi_ctrlr *g_vhost_ctrlr;
 static int g_should_stop;
@@ -365,7 +366,7 @@ vhost_scsi_ctrlr_construct(void)
 	int ret;
 	struct vhost_scsi_ctrlr *ctrlr;
 
-	if (rte_vhost_driver_register(dev_pathname, 0) != 0) {
+	if (rte_vhost_driver_register(dev_pathname, dev_flags) != 0) {
 		fprintf(stderr, "socket %s already exists\n", dev_pathname);
 		return NULL;
 	}
@@ -401,7 +402,8 @@ static void
 set_dev_pathname(const char *path)
 {
 	if (dev_pathname[0])
-		rte_exit(EXIT_FAILURE, "--socket-file can only be given once.\n");
+		rte_exit(EXIT_FAILURE, "Only one of --socket-file or "
+			 "--virtio-vhost-user-pci can be given.\n");
 
 	snprintf(dev_pathname, sizeof(dev_pathname), "%s", path);
 }
@@ -410,7 +412,8 @@ static void
 vhost_scsi_usage(const char *prgname)
 {
 	fprintf(stderr, "%s [EAL options] --\n"
-	"    --socket-file PATH: The path of the UNIX domain socket\n",
+	"    --socket-file PATH: The path of the UNIX domain socket\n"
+	"    --virtio-vhost-user-pci DomBDF: PCI adapter address\n",
 		prgname);
 }
 
@@ -422,6 +425,7 @@ vhost_scsi_parse_args(int argc, char **argv)
 	const char *prgname = argv[0];
 	static struct option long_option[] = {
 		{"socket-file", required_argument, NULL, 0},
+		{"virtio-vhost-user-pci", required_argument, NULL, 0},
 		{NULL, 0, 0, 0},
 	};
 
@@ -432,6 +436,10 @@ vhost_scsi_parse_args(int argc, char **argv)
 			if (!strcmp(long_option[option_index].name,
 				    "socket-file")) {
 				set_dev_pathname(optarg);
+			} else if (!strcmp(long_option[option_index].name,
+				   "virtio-vhost-user-pci")) {
+				set_dev_pathname(optarg);
+				dev_flags = RTE_VHOST_USER_VIRTIO_TRANSPORT;
 			}
 			break;
 		default:
-- 
2.14.3

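For illustration, the option handling from this patch and the previous one can be condensed into a side-effect-free function. parse_vhost_scsi_args() is a hypothetical name; it reports errors through its return value instead of calling rte_exit(), and it dispatches on getopt_long() return values ('s'/'p') rather than comparing long_option[option_index].name as the patch does.

```c
#include <getopt.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RTE_VHOST_USER_VIRTIO_TRANSPORT (1ULL << 4)

/* Hypothetical condensed form of vhost_scsi_parse_args(): fills in the
 * socket path or PCI address plus the registration flags, and returns -1
 * on an unknown option or when both options are given. */
static int parse_vhost_scsi_args(int argc, char **argv, char *path,
				 size_t path_len, uint64_t *flags)
{
	static const struct option long_options[] = {
		{"socket-file", required_argument, NULL, 's'},
		{"virtio-vhost-user-pci", required_argument, NULL, 'p'},
		{NULL, 0, NULL, 0},
	};
	int opt;

	path[0] = '\0';
	*flags = 0;
	optind = 0;	/* full getopt reset on glibc, musl and the BSDs */

	while ((opt = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
		if (opt != 's' && opt != 'p')
			return -1;	/* unknown option or missing argument */
		if (path[0] != '\0')
			return -1;	/* only one of the two options is allowed */
		snprintf(path, path_len, "%s", optarg);
		if (opt == 'p')
			*flags = RTE_VHOST_USER_VIRTIO_TRANSPORT;
	}
	return 0;
}
```

When path is left empty, the caller would still fall back to the default socket in the current working directory, as main() does in the previous patch.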

* [dpdk-dev] [RFC 22/24] usertools: add virtio-vhost-user devices to dpdk-devbind.py
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The virtio-vhost-user PCI adapter does not fall into any of the device
groups that dpdk-devbind.py already knows about.  Add a new "Other" group
for miscellaneous devices like this one.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 usertools/dpdk-devbind.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 894b51969..35e1883b9 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -22,11 +22,14 @@
               'SVendor': None, 'SDevice': None}
 cavium_pkx = {'Class': '08', 'Vendor': '177d', 'Device': 'a0dd,a049',
               'SVendor': None, 'SDevice': None}
+virtio_vhost_user = {'Class': '00', 'Vendor': '1af4', 'Device': '1017,1058',
+                     'SVendor': None, 'SDevice': None}
 
 network_devices = [network_class, cavium_pkx]
 crypto_devices = [encryption_class, intel_processor_class]
 eventdev_devices = [cavium_sso]
 mempool_devices = [cavium_fpa]
+other_devices = [virtio_vhost_user]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
@@ -596,6 +599,9 @@ def show_status():
     if status_dev == "mempool" or status_dev == "all":
         show_device_status(mempool_devices, "Mempool")
 
+    if status_dev == 'other' or status_dev == 'all':
+        show_device_status(other_devices, "Other")
+
 def parse_args():
     '''Parses the command-line arguments given by the user and takes the
     appropriate action for each'''
@@ -669,6 +675,7 @@ def do_arg_actions():
             get_device_details(crypto_devices)
             get_device_details(eventdev_devices)
             get_device_details(mempool_devices)
+            get_device_details(other_devices)
         show_status()
 
 
@@ -681,6 +688,7 @@ def main():
     get_device_details(crypto_devices)
     get_device_details(eventdev_devices)
     get_device_details(mempool_devices)
+    get_device_details(other_devices)
     do_arg_actions()
 
 if __name__ == "__main__":
-- 
2.14.3

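The new group is matched in the same way as the existing ones. Below is a simplified C model of that per-group check. It is an assumption that this mirrors dpdk-devbind.py closely enough; it leaves out the script's handling of None wildcards and surrounding whitespace. It shows why a device whose class code starts with 00, with vendor 1af4 and device ID 1017 or 1058, ends up in the "Other" group.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one dpdk-devbind.py device group. */
struct dev_group {
	const char *class_prefix;	/* first two hex digits, e.g. "00" */
	const char *vendor;		/* e.g. "1af4" */
	const char *devices;		/* comma-separated, e.g. "1017,1058" */
};

/* True if id equals one of the comma-separated entries in list. */
static bool id_in_list(const char *id, const char *list)
{
	size_t len = strlen(id);
	const char *p = list;

	while (*p != '\0') {
		const char *comma = strchr(p, ',');
		size_t item_len = comma ? (size_t)(comma - p) : strlen(p);

		if (item_len == len && strncmp(p, id, len) == 0)
			return true;
		if (comma == NULL)
			break;
		p = comma + 1;
	}
	return false;
}

/* Class prefix, vendor and device ID must all match for membership. */
static bool dev_matches(const struct dev_group *g, const char *class_code,
			const char *vendor, const char *device)
{
	return strncmp(class_code, g->class_prefix, 2) == 0 &&
	       strcmp(vendor, g->vendor) == 0 &&
	       id_in_list(device, g->devices);
}
```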

* [dpdk-dev] [RFC 23/24] WORKAROUND revert virtio-net mq vring deletion
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The virtio-net mq vring deletion code should be in virtio_net.c, not in
the generic vhost_user.c code where it breaks non-virtio-net devices.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/librte_vhost/vhost_user.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/librte_vhost/vhost_user.c b/drivers/librte_vhost/vhost_user.c
index a819684b4..08fab933b 100644
--- a/drivers/librte_vhost/vhost_user.c
+++ b/drivers/librte_vhost/vhost_user.c
@@ -163,6 +163,7 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t features)
 		(dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ? "on" : "off",
 		(dev->features & (1ULL << VIRTIO_F_VERSION_1)) ? "on" : "off");
 
+#if 0
 	if (!(dev->features & (1ULL << VIRTIO_NET_F_MQ))) {
 		/*
 		 * Remove all but first queue pair if MQ hasn't been
@@ -181,6 +182,7 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t features)
 			free_vq(vq);
 		}
 	}
+#endif
 
 	return 0;
 }
-- 
2.14.3


* [dpdk-dev] [RFC 24/24] WORKAROUND examples/vhost_scsi: avoid broken EVENT_IDX
From: Stefan Hajnoczi @ 2018-01-19 13:44 UTC
  To: dev
  Cc: maxime.coquelin, Yuanhan Liu, wei.w.wang, mst, zhiyong.yang,
	jasowang, Stefan Hajnoczi

The EVENT_IDX code in DPDK is broken.  It's missing the
signalled_used_valid flag that handles the corner cases (startup and
wrapping).  Disable it for now.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 examples/vhost_scsi/vhost_scsi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c
index 61001cadb..7106dc6d2 100644
--- a/examples/vhost_scsi/vhost_scsi.c
+++ b/examples/vhost_scsi/vhost_scsi.c
@@ -22,7 +22,6 @@
 #include "scsi_spec.h"
 
 #define VIRTIO_SCSI_FEATURES ((1 << VIRTIO_F_NOTIFY_ON_EMPTY) |\
-			      (1 << VIRTIO_RING_F_EVENT_IDX) |\
 			      (1 << VIRTIO_SCSI_F_INOUT) |\
 			      (1 << VIRTIO_SCSI_F_CHANGE))
 
-- 
2.14.3

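For context on the EVENT_IDX corner cases, here is a sketch of the used-ring notification check. vring_need_event() uses the same formula as the helper of that name in <linux/virtio_ring.h>. The signalled_used/signalled_used_valid bookkeeping is modeled on the Linux vhost driver and is an assumption about how a backend might handle startup, not DPDK code: before the first notification there is no trustworthy "old" index, so the sketch notifies unconditionally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Notify if event_idx lies in the window [old, new_idx), computed modulo
 * 2^16 so that index wrap-around is handled. */
static inline bool vring_need_event(uint16_t event_idx, uint16_t new_idx,
				    uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/* Hypothetical per-virtqueue state for this sketch. */
struct vq_notify_state {
	uint16_t signalled_used;	/* used->idx at the last notification */
	bool signalled_used_valid;	/* false at startup or after reset */
};

/* Decide whether to interrupt the driver after publishing used entries up
 * to new_used_idx; used_event is the value the driver wrote. */
static bool should_notify(struct vq_notify_state *s, uint16_t used_event,
			  uint16_t new_used_idx)
{
	uint16_t old = s->signalled_used;
	bool valid = s->signalled_used_valid;

	s->signalled_used = new_used_idx;
	s->signalled_used_valid = true;

	/* No previous notification: the old index is meaningless. */
	if (!valid)
		return true;
	return vring_need_event(used_event, new_used_idx, old);
}
```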

* Re: [dpdk-dev] [RFC 24/24] WORKAROUND examples/vhost_scsi: avoid broken EVENT_IDX
From: Michael S. Tsirkin @ 2018-01-19 19:31 UTC
  To: Stefan Hajnoczi
  Cc: dev, maxime.coquelin, Yuanhan Liu, wei.w.wang, zhiyong.yang, jasowang

On Fri, Jan 19, 2018 at 01:44:44PM +0000, Stefan Hajnoczi wrote:
> The EVENT_IDX code in DPDK is broken.  It's missing the
> signalled_used_valid flag that handles the corner cases (startup and
> wrapping).  Disable it for now.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

FYI, signalled_used_valid isn't strictly required;
there are ways to handle event idx without it,
e.g. the way the virtio driver within the guest does.

> ---
>  examples/vhost_scsi/vhost_scsi.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/examples/vhost_scsi/vhost_scsi.c b/examples/vhost_scsi/vhost_scsi.c
> index 61001cadb..7106dc6d2 100644
> --- a/examples/vhost_scsi/vhost_scsi.c
> +++ b/examples/vhost_scsi/vhost_scsi.c
> @@ -22,7 +22,6 @@
>  #include "scsi_spec.h"
>  
>  #define VIRTIO_SCSI_FEATURES ((1 << VIRTIO_F_NOTIFY_ON_EMPTY) |\
> -			      (1 << VIRTIO_RING_F_EVENT_IDX) |\
>  			      (1 << VIRTIO_SCSI_F_INOUT) |\
>  			      (1 << VIRTIO_SCSI_F_CHANGE))
>  
> -- 
> 2.14.3


* Re: [dpdk-dev] [RFC 23/24] WORKAROUND revert virtio-net mq vring deletion
From: Maxime Coquelin @ 2018-01-30 17:52 UTC
  To: Stefan Hajnoczi, dev; +Cc: Yuanhan Liu, wei.w.wang, mst, zhiyong.yang, jasowang

Hi Stefan,

On 01/19/2018 02:44 PM, Stefan Hajnoczi wrote:
> The virtio-net mq vring deletion code should be in virtio_net.c, not in
> the generic vhost_user.c code where it breaks non-virtio-net devices.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   drivers/librte_vhost/vhost_user.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/librte_vhost/vhost_user.c b/drivers/librte_vhost/vhost_user.c
> index a819684b4..08fab933b 100644
> --- a/drivers/librte_vhost/vhost_user.c
> +++ b/drivers/librte_vhost/vhost_user.c
> @@ -163,6 +163,7 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t features)
>   		(dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ? "on" : "off",
>   		(dev->features & (1ULL << VIRTIO_F_VERSION_1)) ? "on" : "off");
>   
> +#if 0
>   	if (!(dev->features & (1ULL << VIRTIO_NET_F_MQ))) {
>   		/*
>   		 * Remove all but first queue pair if MQ hasn't been
> @@ -181,6 +182,7 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t features)
>   			free_vq(vq);
>   		}
>   	}
> +#endif
>   
>   	return 0;
>   }
> 

Thanks for reporting the issue.
It seems difficult to move this check into virtio_net.c without a deep
rework.

But I think we can work around it by only removing the extra queue pairs
when the backend supports VIRTIO_NET_F_MQ but it has not been negotiated.
Something like:

if ((vhost_features & (1ULL << VIRTIO_NET_F_MQ)) &&
	!(dev->features & (1ULL << VIRTIO_NET_F_MQ))) {
	...
}

Any thoughts?

Thanks,
Maxime

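The condition suggested above, written as a standalone predicate. The function name is hypothetical and both feature masks are passed in explicitly; VIRTIO_NET_F_MQ is bit 22 in the virtio-net specification. A backend such as vhost_scsi, whose VIRTIO_SCSI_FEATURES do not include bit 22, never triggers the queue-pair removal under this check.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_MQ 22	/* virtio-net multiqueue feature bit */

/* True only when the backend offered VIRTIO_NET_F_MQ but the driver did
 * not accept it, i.e. the case the queue-trimming code is meant for. */
static bool should_trim_extra_queue_pairs(uint64_t backend_features,
					  uint64_t negotiated_features)
{
	const uint64_t mq = 1ULL << VIRTIO_NET_F_MQ;

	return (backend_features & mq) && !(negotiated_features & mq);
}
```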

* Re: [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport
From: Maxime Coquelin @ 2018-01-31 10:02 UTC
  To: Stefan Hajnoczi, dev; +Cc: Yuanhan Liu, wei.w.wang, mst, zhiyong.yang, jasowang

Hi Stefan,

I went through the series, and I really like how it fits with the existing
vhost-user over AF_UNIX socket implementation.

I will post the change I suggested to fix the regression encountered with
SPDK and multiqueue.

Thanks,
Maxime

On 01/19/2018 02:44 PM, Stefan Hajnoczi wrote:
> This patch series implements the virtio-vhost-user device, which tunnels
> vhost-user protocol messages over virtio.  This lets guests act as vhost device
> backends for other guests.
> 
> The virtio-vhost-user device is the result of discussion about Wei Wang and
> Zhiyong Yang's vhost-pci device.  This patch series demonstrates that vhost
> device backends, such as the vhost vdev driver, can work over both AF_UNIX and
> virtio-vhost-user without significant modifications.  This allows a lot of code
> to be shared between traditional AF_UNIX vhost-user and virtio-vhost-user.  The
> vhost-pci patches duplicated the vhost net device backend and didn't reuse
> librte_vhost:
> 
>    http://dpdk.org/ml/archives/dev/2017-November/082615.html
> 
> User-visible changes
> --------------------
> The vhost vdev can now be used when DPDK runs inside a guest with a
> virtio-vhost-user PCI device:
> 
>    --vdev net_vhost0,iface="0000:00:04.0",virtio-transport=1
> 
> The vhost-scsi example has also been extended to support virtio-vhost-user:
> 
>    ./vhost-scsi ... -- --virtio-vhost-user-pci "0000:00:04.0"
> 
> For more information (including instructions for running the code), see
> https://wiki.qemu.org/Features/VirtioVhostUser
> 
> Virtio device design
> --------------------
> The virtio-vhost-user device is a new virtio device type.  It acts as a
> vhost-user transport and is an alternative for the traditional AF_UNIX
> transport.
> 
> You can find the virtio-vhost-user VIRTIO device specification here:
> https://stefanha.github.io/virtio/vhost-user-slave.html#x1-2830007
> 
> librte_vhost API changes
> ------------------------
> This patch series extends librte_vhost so that it now accepts:
> 
>    rte_vhost_driver_register("0000:00:04.0",
>                              RTE_VHOST_USER_VIRTIO_TRANSPORT);
> 
> All other librte_vhost API usage remains unchanged, except that the file
> descriptors exposed in some <rte_vhost.h> structs will be -1 since there is no
> file descriptor passing involved.
> 
> This makes it extremely easy to support virtio-vhost-user in existing vhost
> device backends!  I have extended the vhost vdev driver and the vhost-scsi
> example application in this patch series.
> 
> Patch series overview
> ---------------------
> This series is based on commit 814339ba7eea13d132508af2cccec2f73568e2d0 from
> dpdk-next-virtio/master.  You can also get my git branch here:
> 
>    https://github.com/stefanha/dpdk/tree/virtio-vhost-user
> 
> The initial patches refactor librte_vhost so that AF_UNIX-specific code is moved
> to a new trans_af_unix.c file.  This also introduces a struct
> vhost_transport_ops interface that all transports will implement:
> 
>    adf412c3c vhost: move vring_call() into trans_af_unix.c
>    04079a077 vhost: move AF_UNIX code from socket.c to trans_af_unix.c
>    885642091 vhost: allocate per-socket transport state
>    9c48377df vhost: move socket_fd and un sockaddr into trans_af_unix.c
>    c646e6292 vhost: move start_server/client() calls to trans_af_unix.c
>    db07ef7a8 vhost: move vhost_user_connection to trans_af_unix.c
>    d10b80163 vhost: move vhost_user_reconnect_init() into trans_af_unix.c
>    1258bcd68 vhost: move vhost_user.fdset to trans_af_unix.c
>    bc7f6d7ab vhost: pass vhost_transport_ops through vhost_new_device()
>    b82187a29 vhost: embed struct virtio_net inside struct vhost_user_connection
>    abbd544f2 vhost: extract vhost_user.c socket I/O into transport
>    0658d711b vhost: move slave_req_fd field to AF_UNIX transport
>    e2ecf78ed vhost: move mmap/munmap to AF_UNIX transport
> 
> I was about to add the virtio-vhost-user PCI driver when I realized that
> lib/librte_vhost/ cannot have a dependency on rte_bus_pci.h (it lives in
> drivers/).  The solution I chose is to move all of librte_vhost to
> drivers/librte_vhost/, but I'm open to suggestions if this is undesirable.
> 
>    8615c2140 vhost: move librte_vhost to drivers/
> 
> Since virtio-vhost-user is a virtio device it's necessary to perform a sequence
> of device initialization steps and set up virtqueues.  I didn't see a virtio
> API in DPDK and didn't have time to create one myself.  I copied the virtio
> code from drivers/net/virtio/ as a quick hack, but really there needs to be a
> librte_virtio that is shared.
> 
>    c26937c66 vhost: add virtio pci framework
> 
> Next the virtio-vhost-user transport is added along with the
> RTE_VHOST_USER_VIRTIO_TRANSPORT flag:
> 
>    7fad5adc4 vhost: remember a vhost_virtqueue's queue index
>    d26922892 vhost: add virtio-vhost-user transport
>    f562a70ab vhost: add RTE_VHOST_USER_VIRTIO_TRANSPORT flag
> 
> Then I extended the vhost vdev and vhost-scsi example application to support
> virtio-vhost-user:
> 
>    47434f2a2 net/vhost: add virtio-vhost-user support
>    22ca05d1b examples/vhost_scsi: add --socket-file argument
>    db3b391dc examples/vhost_scsi: add virtio-vhost-user support
> 
> It was also necessary to tweak dpdk-devbind.py to support virtio-vhost-user:
> 
>    4aee6f653 usertools: add virtio-vhost-user devices to dpdk-devbind.py
> 
> Finally, vhost-scsi seems broken to me so two workarounds were needed so it can
> be tested again:
> 
>    b9d17bfaf WORKAROUND revert virtio-net mq vring deletion
>    cadb25e7d WORKAROUND examples/vhost_scsi: avoid broken EVENT_IDX
> 
> Stefan Hajnoczi (24):
>    vhost: move vring_call() into trans_af_unix.c
>    vhost: move AF_UNIX code from socket.c to trans_af_unix.c
>    vhost: allocate per-socket transport state
>    vhost: move socket_fd and un sockaddr into trans_af_unix.c
>    vhost: move start_server/client() calls to trans_af_unix.c
>    vhost: move vhost_user_connection to trans_af_unix.c
>    vhost: move vhost_user_reconnect_init() into trans_af_unix.c
>    vhost: move vhost_user.fdset to trans_af_unix.c
>    vhost: pass vhost_transport_ops through vhost_new_device()
>    vhost: embed struct virtio_net inside struct vhost_user_connection
>    vhost: extract vhost_user.c socket I/O into transport
>    vhost: move slave_req_fd field to AF_UNIX transport
>    vhost: move mmap/munmap to AF_UNIX transport
>    vhost: move librte_vhost to drivers/
>    vhost: add virtio pci framework
>    vhost: remember a vhost_virtqueue's queue index
>    vhost: add virtio-vhost-user transport
>    vhost: add RTE_VHOST_USER_VIRTIO_TRANSPORT flag
>    net/vhost: add virtio-vhost-user support
>    examples/vhost_scsi: add --socket-file argument
>    examples/vhost_scsi: add virtio-vhost-user support
>    usertools: add virtio-vhost-user devices to dpdk-devbind.py
>    WORKAROUND revert virtio-net mq vring deletion
>    WORKAROUND examples/vhost_scsi: avoid broken EVENT_IDX
> 
>   drivers/Makefile                                   |    2 +
>   {lib => drivers}/librte_vhost/Makefile             |    5 +-
>   lib/Makefile                                       |    3 -
>   {lib => drivers}/librte_vhost/fd_man.h             |    0
>   {lib => drivers}/librte_vhost/iotlb.h              |    0
>   {lib => drivers}/librte_vhost/rte_vhost.h          |    1 +
>   {lib => drivers}/librte_vhost/vhost.h              |  193 +++-
>   {lib => drivers}/librte_vhost/vhost_user.h         |    9 +-
>   drivers/librte_vhost/virtio_pci.h                  |  267 +++++
>   drivers/librte_vhost/virtio_vhost_user.h           |   18 +
>   drivers/librte_vhost/virtqueue.h                   |  181 ++++
>   {lib => drivers}/librte_vhost/fd_man.c             |    0
>   {lib => drivers}/librte_vhost/iotlb.c              |    0
>   drivers/librte_vhost/socket.c                      |  282 ++++++
>   drivers/librte_vhost/trans_af_unix.c               |  795 +++++++++++++++
>   drivers/librte_vhost/trans_virtio_vhost_user.c     | 1050 ++++++++++++++++++++
>   {lib => drivers}/librte_vhost/vhost.c              |   18 +-
>   {lib => drivers}/librte_vhost/vhost_user.c         |  200 +---
>   {lib => drivers}/librte_vhost/virtio_net.c         |    0
>   drivers/librte_vhost/virtio_pci.c                  |  504 ++++++++++
>   drivers/net/vhost/rte_eth_vhost.c                  |   13 +
>   examples/vhost_scsi/vhost_scsi.c                   |  104 +-
>   lib/librte_vhost/socket.c                          |  828 ---------------
>   .../librte_vhost/rte_vhost_version.map             |    0
>   usertools/dpdk-devbind.py                          |    8 +
>   25 files changed, 3455 insertions(+), 1026 deletions(-)
>   rename {lib => drivers}/librte_vhost/Makefile (86%)
>   rename {lib => drivers}/librte_vhost/fd_man.h (100%)
>   rename {lib => drivers}/librte_vhost/iotlb.h (100%)
>   rename {lib => drivers}/librte_vhost/rte_vhost.h (99%)
>   rename {lib => drivers}/librte_vhost/vhost.h (68%)
>   rename {lib => drivers}/librte_vhost/vhost_user.h (93%)
>   create mode 100644 drivers/librte_vhost/virtio_pci.h
>   create mode 100644 drivers/librte_vhost/virtio_vhost_user.h
>   create mode 100644 drivers/librte_vhost/virtqueue.h
>   rename {lib => drivers}/librte_vhost/fd_man.c (100%)
>   rename {lib => drivers}/librte_vhost/iotlb.c (100%)
>   create mode 100644 drivers/librte_vhost/socket.c
>   create mode 100644 drivers/librte_vhost/trans_af_unix.c
>   create mode 100644 drivers/librte_vhost/trans_virtio_vhost_user.c
>   rename {lib => drivers}/librte_vhost/vhost.c (97%)
>   rename {lib => drivers}/librte_vhost/vhost_user.c (89%)
>   rename {lib => drivers}/librte_vhost/virtio_net.c (100%)
>   create mode 100644 drivers/librte_vhost/virtio_pci.c
>   delete mode 100644 lib/librte_vhost/socket.c
>   rename {lib => drivers}/librte_vhost/rte_vhost_version.map (100%)
> 

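As the quoted cover letter notes, no file descriptors are passed over virtio-vhost-user, so backend code that touches the fds exposed by <rte_vhost.h> (for example the callfd/kickfd fields of struct rte_vhost_vring) needs to tolerate -1. A minimal sketch of that guard, using a hypothetical struct vring_fds in place of the real structure (Linux-only because of eventfd):

```c
#include <sys/eventfd.h>

/* Hypothetical stand-in for the fd fields exposed by <rte_vhost.h>;
 * not the real struct layout. */
struct vring_fds {
	int callfd;
	int kickfd;
};

/* Signal the guest through callfd only when the transport provided one.
 * Returns 1 if the eventfd was written, 0 if there is no fd (as with
 * virtio-vhost-user) and -1 if the write failed. */
static int call_guest_if_fd(const struct vring_fds *v)
{
	if (v->callfd < 0)
		return 0;
	return eventfd_write(v->callfd, 1) == 0 ? 1 : -1;
}
```

In this series, notifications for virtio-vhost-user go through the transport's vring_call op (vvu_vring_call()) rather than an eventfd, so the guard only prevents the backend from using an fd that does not exist.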

* Re: [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport
From: Wang, Wei W @ 2018-04-10 14:56 UTC
  To: Stefan Hajnoczi, dev; +Cc: maxime.coquelin, Yuanhan Liu, Yang, Zhiyong

On Tuesday, April 10, 2018 5:39 PM, Stefan Hajnoczi wrote:
> Wei: Are you still interested in this?
> 
> In order to make further progress a review of this patch series is required.
> 
> If it's not a priority for you, that's fine too.  We can leave it and 
> pick it up (or replace it with something better) in the future.

Hi Stefan,

Yes, I'm interested in it, but I apologize that this task has been postponed because of some higher-priority tasks on my work list. I think I will be able to allocate some bandwidth to this project once the live-migration-related patches get merged.

When I come back to it, I plan to discuss the QEMU patches with you and finalize the device part first. Then we will need help from the DPDK developers on the driver patches.

Thanks a lot for your hard work on the new prototype.

Best,
Wei

Thread overview: 29+ messages
2018-01-19 13:44 [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 01/24] vhost: move vring_call() into trans_af_unix.c Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 02/24] vhost: move AF_UNIX code from socket.c to trans_af_unix.c Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 03/24] vhost: allocate per-socket transport state Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 04/24] vhost: move socket_fd and un sockaddr into trans_af_unix.c Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 05/24] vhost: move start_server/client() calls to trans_af_unix.c Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 06/24] vhost: move vhost_user_connection " Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 07/24] vhost: move vhost_user_reconnect_init() into trans_af_unix.c Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 08/24] vhost: move vhost_user.fdset to trans_af_unix.c Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 09/24] vhost: pass vhost_transport_ops through vhost_new_device() Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 10/24] vhost: embed struct virtio_net inside struct vhost_user_connection Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 11/24] vhost: extract vhost_user.c socket I/O into transport Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 12/24] vhost: move slave_req_fd field to AF_UNIX transport Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 13/24] vhost: move mmap/munmap " Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 14/24] vhost: move librte_vhost to drivers/ Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 15/24] vhost: add virtio pci framework Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 16/24] vhost: remember a vhost_virtqueue's queue index Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 17/24] vhost: add virtio-vhost-user transport Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 18/24] vhost: add RTE_VHOST_USER_VIRTIO_TRANSPORT flag Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 19/24] net/vhost: add virtio-vhost-user support Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 20/24] examples/vhost_scsi: add --socket-file argument Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 21/24] examples/vhost_scsi: add virtio-vhost-user support Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 22/24] usertools: add virtio-vhost-user devices to dpdk-devbind.py Stefan Hajnoczi
2018-01-19 13:44 ` [dpdk-dev] [RFC 23/24] WORKAROUND revert virtio-net mq vring deletion Stefan Hajnoczi
2018-01-30 17:52   ` Maxime Coquelin
2018-01-19 13:44 ` [dpdk-dev] [RFC 24/24] WORKAROUND examples/vhost_scsi: avoid broken EVENT_IDX Stefan Hajnoczi
2018-01-19 19:31   ` Michael S. Tsirkin
2018-01-31 10:02 ` [dpdk-dev] [RFC 00/24] vhost: add virtio-vhost-user transport Maxime Coquelin
     [not found] ` <20180410093847.GA22081@stefanha-x1.localdomain>
2018-04-10 14:56   ` Wang, Wei W