DPDK patches and discussions
* [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension
@ 2014-11-06 11:14 Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 1/7] lib/librte_vhost: Fix host_memory_map() to handle various memory regions Tetsuya Mukawa
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-06 11:14 UTC (permalink / raw)
  To: dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Hi Xie,

Here are RFC patches to add a vhost-user extension to librte_vhost.

It seems you are now merging a patch that fixes the coding style of
librte_vhost.
Unfortunately my patches are based on the latest tree, so I will
resubmit them after your patch is acked.
Because of this, I haven't checked coding style strictly. When I
rebase onto your new patch, I will check coding style too.

Anyway, could you please check the patches?

Thanks,
Tetsuya

Tetsuya Mukawa (7):
  lib/librte_vhost: Fix host_memory_map() to handle various memory
    regions
  lib/librte_vhost: Add an abstraction layer for vhost backends
  lib/librte_vhost: Add an abstraction layer to interpret messages
  lib/librte_vhost: Move vhost vhost-cuse device list and accessor
    functions
  lib/librte_vhost: Add a vhost session abstraction
  lib/librte_vhost: Add vhost-cuse/user specific initialization
  lib/librte_vhost: Add vhost-user implementation

 lib/librte_vhost/Makefile          |   2 +-
 lib/librte_vhost/rte_virtio_net.h  |  49 ++-
 lib/librte_vhost/vhost-net-cdev.c  |  29 +-
 lib/librte_vhost/vhost-net-cdev.h  | 113 -------
 lib/librte_vhost/vhost-net-user.c  | 541 ++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost-net.c       | 132 ++++++++
 lib/librte_vhost/vhost-net.h       | 127 +++++++
 lib/librte_vhost/vhost_rxtx.c      |   2 +-
 lib/librte_vhost/virtio-net-cdev.c | 624 ++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio-net-user.c | 410 +++++++++++++++++++++++
 lib/librte_vhost/virtio-net.c      | 669 ++++++++-----------------------------
 11 files changed, 2032 insertions(+), 666 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net-user.c
 create mode 100644 lib/librte_vhost/vhost-net.c
 create mode 100644 lib/librte_vhost/vhost-net.h
 create mode 100644 lib/librte_vhost/virtio-net-cdev.c
 create mode 100644 lib/librte_vhost/virtio-net-user.c

-- 
1.9.1


* [dpdk-dev] [RFC PATCH 1/7] lib/librte_vhost: Fix host_memory_map() to handle various memory regions
  2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
@ 2014-11-06 11:14 ` Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 2/7] lib/librte_vhost: Add an abstraction layer for vhost backends Tetsuya Mukawa
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-06 11:14 UTC (permalink / raw)
  To: dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Without this patch, host_memory_map() can only handle a region that
starts at the head of guest physical memory. This patch fixes
host_memory_map() so that it also handles regions located in the
middle of guest physical memory.
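
The range check can be sketched in isolation as follows. This is only a
minimal illustration, not code from the patch: maps_line_contains() is a
hypothetical helper, and it parses just the leading "<start>-<end>" field
of a /proc/<pid>/maps line with sscanf() instead of the strtok_r() loop
used in librte_vhost. It treats the end address as exclusive, which is how
the maps file reports it.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of librte_vhost: returns 1 if addr lies
 * inside the mapping described by one /proc/<pid>/maps line, 0 if it does
 * not, and -1 if the line cannot be parsed. On a match, the mapping size
 * is stored in *len when len is not NULL.
 */
int
maps_line_contains(const char *line, uint64_t addr, uint64_t *len)
{
	uint64_t va_start, va_end;

	assert(line != NULL);

	/* Each maps line begins with "<start>-<end>" in hexadecimal. */
	if (sscanf(line, "%" SCNx64 "-%" SCNx64, &va_start, &va_end) != 2)
		return -1;

	/* The end address reported in the maps file is exclusive. */
	if (addr < va_start || addr >= va_end)
		return 0;

	if (len != NULL)
		*len = va_end - va_start;
	return 1;
}
```

With a check like this, an address anywhere inside a mapping selects that
mapping, not only an address equal to its start.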

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/virtio-net.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 8015dd8..9155a68 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -83,6 +83,7 @@ const uint32_t BUFSIZE = PATH_MAX;
 /* Structure containing information gathered from maps file. */
 struct procmap {
 	uint64_t	va_start;	/* Start virtual address in file. */
+	uint64_t	va_end;		/* End virtual address in file. */
 	uint64_t	len;		/* Size of file. */
 	uint64_t	pgoff;		/* Not used. */
 	uint32_t	maj;		/* Not used. */
@@ -176,7 +177,7 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 			return -1;
 		}
 
-		procmap.len = strtoull(in[1], &end, 16);
+		procmap.va_end = strtoull(in[1], &end, 16);
 		if ((in[1] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
 			fclose(fmap);
 			return -1;
@@ -209,8 +210,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 		memcpy(&procmap.prot, in[2], PROT_SZ);
 		memcpy(&procmap.fname, in[7], PATH_MAX);
 
-		if (procmap.va_start == addr) {
-			procmap.len = procmap.len - procmap.va_start;
+		if ((procmap.va_start <= addr) && (procmap.va_end > addr)) {
+			procmap.len = procmap.va_end - procmap.va_start;
 			found = 1;
 			break;
 		}
-- 
1.9.1


* [dpdk-dev] [RFC PATCH 2/7] lib/librte_vhost: Add an abstraction layer for vhost backends
  2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 1/7] lib/librte_vhost: Fix host_memory_map() to handle various memory regions Tetsuya Mukawa
@ 2014-11-06 11:14 ` Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer to interpret messages Tetsuya Mukawa
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-06 11:14 UTC (permalink / raw)
  To: dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

This patch adds an abstraction layer for vhost backends.
So far CUSE has been the only vhost backend, but QEMU 2.1 adds one
more backend called vhost-user. To handle both backends, this kind of
layer is needed.
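
The idea of such a layer can be sketched as a small dispatch table. All
names below (backend_type, struct backend, backend_register() and the two
stub functions) are hypothetical and chosen for illustration; they are not
the librte_vhost symbols. Rejecting an unknown type is a choice made in
this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend identifiers, one per supported vhost backend. */
enum backend_type {
	BACKEND_CUSE,
	BACKEND_USER,
	BACKEND_NUM	/* number of backend types */
};

/* Hypothetical per-driver state shared by all backends. */
struct backend {
	enum backend_type type;
	const char *dev_name;
};

/*
 * Backend-specific registration stubs. Real code would set up a CUSE
 * session or a vhost-user socket here.
 */
static int cuse_register(struct backend *b) { (void)b; return 0; }
static int user_register(struct backend *b) { (void)b; return 0; }

/* One entry per backend type, indexed by the enum value. */
static int (*const register_fn[BACKEND_NUM])(struct backend *) = {
	[BACKEND_CUSE] = cuse_register,
	[BACKEND_USER] = user_register,
};

/* Common entry point: validate the arguments, then dispatch by type. */
int
backend_register(struct backend *b, const char *dev_name,
		enum backend_type type)
{
	assert(b != NULL);

	if (dev_name == NULL || (unsigned int)type >= BACKEND_NUM)
		return -1;

	b->type = type;
	b->dev_name = dev_name;
	return register_fn[type](b);
}
```

The caller only ever sees backend_register(); adding a backend means adding
one enum value and one table entry.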

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/Makefile         |   2 +-
 lib/librte_vhost/rte_virtio_net.h |  30 ++++++++--
 lib/librte_vhost/vhost-net-cdev.c |  24 ++++----
 lib/librte_vhost/vhost-net-cdev.h | 113 --------------------------------------
 lib/librte_vhost/vhost-net.c      |  97 ++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost-net.h      | 113 ++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_rxtx.c     |   2 +-
 lib/librte_vhost/virtio-net.c     |   2 +-
 8 files changed, 252 insertions(+), 131 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net.c
 create mode 100644 lib/librte_vhost/vhost-net.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index c008d64..0d4aa98 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -37,7 +37,7 @@ LIB = librte_vhost.a
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net.c virtio-net.c vhost_rxtx.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index b6548a1..a36c0e3 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -31,8 +31,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#ifndef _VIRTIO_NET_H_
-#define _VIRTIO_NET_H_
+#ifndef _RTE_VIRTIO_NET_H_
+#define _RTE_VIRTIO_NET_H_
 
 /**
  * @file
@@ -71,6 +71,25 @@ struct buf_vector {
 };
 
 /**
+ * Enum for vhost driver types.
+ */
+typedef enum {
+	VHOST_DRV_CUSE, /* cuse driver */
+	VHOST_DRV_NUM	/* the number of vhost driver types */
+} vhost_driver_type_t;
+
+/**
+ * Structure containing information related to the vhost driver.
+ */
+struct vhost_driver {
+	vhost_driver_type_t	type;		/**< driver type. */
+	const char		*dev_name;	/**< accessing device name. */
+	union {
+		struct fuse_session *session;	/**< fuse session. */
+	};
+};
+
+/**
  * Structure contains variables relevant to RX/TX virtqueues.
  */
 struct vhost_virtqueue {
@@ -176,12 +195,13 @@ uint64_t rte_vhost_feature_get(void);
 int rte_vhost_enable_guest_notification(struct virtio_net *dev, uint16_t queue_id, int enable);
 
 /* Register vhost driver. dev_name could be different for multiple instance support. */
-int rte_vhost_driver_register(const char *dev_name);
+struct vhost_driver *rte_vhost_driver_register(
+		const char *dev_name, vhost_driver_type_t type);
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
 /* Start vhost driver session blocking loop. */
-int rte_vhost_driver_session_start(void);
+int rte_vhost_driver_session_start(struct vhost_driver *drv);
 
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
@@ -210,4 +230,4 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id,
 uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
 
-#endif /* _VIRTIO_NET_H_ */
+#endif /* _RTE_VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
index 91ff0d8..83e1d14 100644
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-net-cdev.c
@@ -2,7 +2,6 @@
  *   BSD LICENSE
  *
  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -44,7 +43,7 @@
 #include <rte_string_fns.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 
 #define FUSE_OPT_DUMMY		"\0\0"
 #define FUSE_OPT_FORE		"-f\0\0"
@@ -55,7 +54,6 @@ static const uint32_t	default_minor = 1;
 static const char	cuse_device_name[]	= "/dev/cuse";
 static const char	default_cdev[] = "vhost-net";
 
-static struct fuse_session			*session;
 static struct vhost_net_device_ops	const *ops;
 
 /*
@@ -300,9 +298,10 @@ static const struct cuse_lowlevel_ops vhost_net_ops = {
  * cuse_info is populated and used to register the cuse device. vhost_net_device_ops are
  * also passed when the device is registered in main.c.
  */
-int
-rte_vhost_driver_register(const char *dev_name)
+static int
+vhost_cuse_driver_register(struct vhost_driver *drv)
 {
+	const char *dev_name;
 	struct cuse_info cuse_info;
 	char device_name[PATH_MAX] = "";
 	char char_device_name[PATH_MAX] = "";
@@ -318,6 +317,11 @@ rte_vhost_driver_register(const char *dev_name)
 		return -1;
 	}
 
+	if (drv == NULL)
+		return -1;
+
+	dev_name = drv->dev_name;
+
 	/*
 	 * The device name is created. This is passed to QEMU so that it can register
 	 * the device with our application.
@@ -340,9 +344,9 @@ rte_vhost_driver_register(const char *dev_name)
 
 	ops = get_virtio_net_callbacks();
 
-	session = cuse_lowlevel_setup(3, fuse_argv,
+	drv->session = cuse_lowlevel_setup(3, fuse_argv,
 				&cuse_info, &vhost_net_ops, 0, NULL);
-	if (session == NULL)
+	if (drv->session == NULL)
 		return -1;
 
 	return 0;
@@ -351,10 +355,10 @@ rte_vhost_driver_register(const char *dev_name)
 /**
  * The CUSE session is launched allowing the application to receive open, release and ioctl calls.
  */
-int
-rte_vhost_driver_session_start(void)
+static int
+vhost_cuse_driver_session_start(struct vhost_driver *drv)
 {
-	fuse_session_loop(session);
+	fuse_session_loop(drv->session);
 
 	return 0;
 }
diff --git a/lib/librte_vhost/vhost-net-cdev.h b/lib/librte_vhost/vhost-net-cdev.h
deleted file mode 100644
index 03a5c57..0000000
--- a/lib/librte_vhost/vhost-net-cdev.h
+++ /dev/null
@@ -1,113 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VHOST_NET_CDEV_H_
-#define _VHOST_NET_CDEV_H_
-#include <stdint.h>
-#include <stdio.h>
-#include <sys/types.h>
-#include <unistd.h>
-#include <linux/vhost.h>
-
-#include <rte_log.h>
-
-/* Macros for printing using RTE_LOG */
-#define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
-#define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
-
-#ifdef RTE_LIBRTE_VHOST_DEBUG
-#define VHOST_MAX_PRINT_BUFF 6072
-#define LOG_LEVEL RTE_LOG_DEBUG
-#define LOG_DEBUG(log_type, fmt, args...) RTE_LOG(DEBUG, log_type, fmt, ##args)
-#define PRINT_PACKET(device, addr, size, header) do { \
-	char *pkt_addr = (char *)(addr); \
-	unsigned int index; \
-	char packet[VHOST_MAX_PRINT_BUFF]; \
-	\
-	if ((header)) \
-		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Header size %d: ", (device->device_fh), (size)); \
-	else \
-		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Packet size %d: ", (device->device_fh), (size)); \
-	for (index = 0; index < (size); index++) { \
-		snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), \
-			"%02hhx ", pkt_addr[index]); \
-	} \
-	snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
-	\
-	LOG_DEBUG(VHOST_DATA, "%s", packet); \
-} while (0)
-#else
-#define LOG_LEVEL RTE_LOG_INFO
-#define LOG_DEBUG(log_type, fmt, args...) do {} while (0)
-#define PRINT_PACKET(device, addr, size, header) do {} while (0)
-#endif
-
-
-/*
- * Structure used to identify device context.
- */
-struct vhost_device_ctx {
-	pid_t		pid;	/* PID of process calling the IOCTL. */
-	uint64_t	fh;	/* Populated with fi->fh to track the device index. */
-};
-
-/*
- * Structure contains function pointers to be defined in virtio-net.c. These
- * functions are called in CUSE context and are used to configure devices.
- */
-struct vhost_net_device_ops {
-	int (*new_device)(struct vhost_device_ctx);
-	void (*destroy_device)(struct vhost_device_ctx);
-
-	int (*get_features)(struct vhost_device_ctx, uint64_t *);
-	int (*set_features)(struct vhost_device_ctx, uint64_t *);
-
-	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
-
-	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
-	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *);
-	int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *);
-	int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct vhost_vring_state *);
-
-	int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file *);
-	int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file *);
-
-	int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
-
-	int (*set_owner)(struct vhost_device_ctx);
-	int (*reset_owner)(struct vhost_device_ctx);
-};
-
-
-struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
-#endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost-net.c b/lib/librte_vhost/vhost-net.c
new file mode 100644
index 0000000..b0de5fd
--- /dev/null
+++ b/lib/librte_vhost/vhost-net.c
@@ -0,0 +1,97 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2014 IGEL Co.,Ltd. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <errno.h>
+#include <rte_malloc.h>
+#include <rte_virtio_net.h>
+
+#include "vhost-net.h"
+
+/**
+ * Include CUSE-dependent functions and definitions
+ */
+#include "vhost-net-cdev.c"
+
+/**
+ * This function abstracts cuse and vhost-user driver registration.
+ */
+struct vhost_driver *
+rte_vhost_driver_register(const char *dev_name, vhost_driver_type_t type)
+{
+	int ret;
+	struct vhost_driver *drv;
+
+	drv = rte_zmalloc(dev_name, sizeof(struct vhost_driver),
+			CACHE_LINE_SIZE);
+	if (drv == NULL)
+		return NULL;
+
+	drv->dev_name = dev_name;
+	drv->type = type;
+
+	switch (type) {
+	case VHOST_DRV_CUSE:
+		ret = vhost_cuse_driver_register(drv);
+		if (ret != 0)
+			goto err;
+		break;
+	default:
+		break;
+	}
+
+	return drv;
+err:
+	rte_free(drv);
+	return NULL;
+}
+
+/**
+ * This function abstracts cuse and vhost-user session start.
+ */
+int
+rte_vhost_driver_session_start(struct vhost_driver *drv)
+{
+	if (drv == NULL)
+		return -ENODEV;
+
+	switch (drv->type) {
+	case VHOST_DRV_CUSE:
+		vhost_cuse_driver_session_start(drv);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
new file mode 100644
index 0000000..03a5c57
--- /dev/null
+++ b/lib/librte_vhost/vhost-net.h
@@ -0,0 +1,113 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOST_NET_CDEV_H_
+#define _VHOST_NET_CDEV_H_
+#include <stdint.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <linux/vhost.h>
+
+#include <rte_log.h>
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
+#define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
+
+#ifdef RTE_LIBRTE_VHOST_DEBUG
+#define VHOST_MAX_PRINT_BUFF 6072
+#define LOG_LEVEL RTE_LOG_DEBUG
+#define LOG_DEBUG(log_type, fmt, args...) RTE_LOG(DEBUG, log_type, fmt, ##args)
+#define PRINT_PACKET(device, addr, size, header) do { \
+	char *pkt_addr = (char *)(addr); \
+	unsigned int index; \
+	char packet[VHOST_MAX_PRINT_BUFF]; \
+	\
+	if ((header)) \
+		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Header size %d: ", (device->device_fh), (size)); \
+	else \
+		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Packet size %d: ", (device->device_fh), (size)); \
+	for (index = 0; index < (size); index++) { \
+		snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), \
+			"%02hhx ", pkt_addr[index]); \
+	} \
+	snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
+	\
+	LOG_DEBUG(VHOST_DATA, "%s", packet); \
+} while (0)
+#else
+#define LOG_LEVEL RTE_LOG_INFO
+#define LOG_DEBUG(log_type, fmt, args...) do {} while (0)
+#define PRINT_PACKET(device, addr, size, header) do {} while (0)
+#endif
+
+
+/*
+ * Structure used to identify device context.
+ */
+struct vhost_device_ctx {
+	pid_t		pid;	/* PID of process calling the IOCTL. */
+	uint64_t	fh;	/* Populated with fi->fh to track the device index. */
+};
+
+/*
+ * Structure contains function pointers to be defined in virtio-net.c. These
+ * functions are called in CUSE context and are used to configure devices.
+ */
+struct vhost_net_device_ops {
+	int (*new_device)(struct vhost_device_ctx);
+	void (*destroy_device)(struct vhost_device_ctx);
+
+	int (*get_features)(struct vhost_device_ctx, uint64_t *);
+	int (*set_features)(struct vhost_device_ctx, uint64_t *);
+
+	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
+
+	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
+	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *);
+	int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *);
+	int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct vhost_vring_state *);
+
+	int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file *);
+	int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file *);
+
+	int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
+
+	int (*set_owner)(struct vhost_device_ctx);
+	int (*reset_owner)(struct vhost_device_ctx);
+};
+
+
+struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
+#endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 84ec0e8..dad9db9 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -38,7 +38,7 @@
 #include <rte_memcpy.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 
 #define MAX_PKT_BURST 32
 
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 9155a68..1dee1d8 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -49,7 +49,7 @@
 #include <rte_memory.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 #include "eventfd_link/eventfd_link.h"
 
 /**
-- 
1.9.1


* [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer to interpret messages
  2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 1/7] lib/librte_vhost: Fix host_memory_map() to handle various memory regions Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 2/7] lib/librte_vhost: Add an abstraction layer for vhost backends Tetsuya Mukawa
@ 2014-11-06 11:14 ` Tetsuya Mukawa
  2014-11-07 20:43   ` Xie, Huawei
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 4/7] lib/librte_vhost: Move vhost vhost-cuse device list and accessor functions Tetsuya Mukawa
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-06 11:14 UTC (permalink / raw)
  To: dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

This patch adds an abstraction layer to interpret messages from QEMU.
This abstraction layer is needed because there are differences in
message formats between vhost-cuse and vhost-user.
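
As a rough sketch of what such a layer does, the code below selects a
per-backend parser that converts a raw message into one common structure.
Everything here is an assumption made for illustration: the names
(msg_source, struct msg_ops, get_msg_ops()) are hypothetical, and the two
message layouts are invented and simplified. They are not the real
vhost-cuse or vhost-user formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message sources, one per backend. */
enum msg_source { MSG_FROM_CUSE, MSG_FROM_USER, MSG_SOURCE_NUM };

/* Common, backend-independent form of a "set vring size" request. */
struct vring_num {
	uint32_t index;
	uint32_t num;
};

/* Per-backend interpreter: turns a raw message into the common form. */
struct msg_ops {
	int (*parse_vring_num)(const void *msg, size_t len,
			struct vring_num *out);
};

/* Invented layout A: the message is the payload structure itself. */
static int
parse_plain(const void *msg, size_t len, struct vring_num *out)
{
	assert(msg != NULL && out != NULL);

	if (len != sizeof(*out))
		return -1;
	memcpy(out, msg, sizeof(*out));
	return 0;
}

/* Invented layout B: a 32-bit request code precedes the payload. */
static int
parse_framed(const void *msg, size_t len, struct vring_num *out)
{
	assert(msg != NULL && out != NULL);

	if (len != sizeof(uint32_t) + sizeof(*out))
		return -1;
	memcpy(out, (const uint8_t *)msg + sizeof(uint32_t), sizeof(*out));
	return 0;
}

static const struct msg_ops ops_table[MSG_SOURCE_NUM] = {
	[MSG_FROM_CUSE] = { .parse_vring_num = parse_plain },
	[MSG_FROM_USER] = { .parse_vring_num = parse_framed },
};

/* Return the interpreter for a backend, or NULL for an unknown one. */
const struct msg_ops *
get_msg_ops(enum msg_source src)
{
	if ((unsigned int)src >= MSG_SOURCE_NUM)
		return NULL;
	return &ops_table[src];
}
```

Code above this layer works only with struct vring_num and never needs to
know which backend delivered the message.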

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
 lib/librte_vhost/vhost-net-cdev.c  |   2 +-
 lib/librte_vhost/vhost-net.h       |   3 +-
 lib/librte_vhost/virtio-net-cdev.c | 492 +++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/virtio-net.c      | 484 ++----------------------------------
 4 files changed, 517 insertions(+), 464 deletions(-)
 create mode 100644 lib/librte_vhost/virtio-net-cdev.c

diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
index 83e1d14..12d0f68 100644
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-net-cdev.c
@@ -342,7 +342,7 @@ vhost_cuse_driver_register(struct vhost_driver *drv)
 	cuse_info.dev_info_argv = device_argv;
 	cuse_info.flags = CUSE_UNRESTRICTED_IOCTL;
 
-	ops = get_virtio_net_callbacks();
+	ops = get_virtio_net_callbacks(drv->type);
 
 	drv->session = cuse_lowlevel_setup(3, fuse_argv,
 				&cuse_info, &vhost_net_ops, 0, NULL);
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 03a5c57..09a99ce 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -109,5 +109,6 @@ struct vhost_net_device_ops {
 };
 
 
-struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
+struct vhost_net_device_ops const *get_virtio_net_callbacks(
+		vhost_driver_type_t type);
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/virtio-net-cdev.c b/lib/librte_vhost/virtio-net-cdev.c
new file mode 100644
index 0000000..f225bf5
--- /dev/null
+++ b/lib/librte_vhost/virtio-net-cdev.c
@@ -0,0 +1,492 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2014 IGEL Co.,Ltd. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <dirent.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include <rte_virtio_net.h>
+
+#include "vhost-net.h"
+#include "eventfd_link/eventfd_link.h"
+
+const char eventfd_cdev[] = "/dev/eventfd-link";
+
+/* Line size for reading maps file. */
+const uint32_t BUFSIZE = PATH_MAX;
+
+/* Size of prot char array in procmap. */
+#define PROT_SZ 5
+
+/* Number of elements in procmap struct. */
+#define PROCMAP_SZ 8
+
+/* Structure containing information gathered from maps file. */
+struct procmap {
+	uint64_t	va_start;	/* Start virtual address in file. */
+	uint64_t	va_end;		/* End virtual address in file. */
+	uint64_t	len;		/* Size of file. */
+	uint64_t	pgoff;		/* Not used. */
+	uint32_t	maj;		/* Not used. */
+	uint32_t	min;		/* Not used. */
+	uint32_t	ino;		/* Not used. */
+	char		prot[PROT_SZ];	/* Not used. */
+	char		fname[PATH_MAX];/* File name. */
+};
+
+/*
+ * Locate the file containing QEMU's memory space and map it to our address space.
+ */
+static int
+host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
+	pid_t pid, uint64_t addr)
+{
+	struct dirent *dptr = NULL;
+	struct procmap procmap;
+	DIR *dp = NULL;
+	int fd;
+	int i;
+	char memfile[PATH_MAX];
+	char mapfile[PATH_MAX];
+	char procdir[PATH_MAX];
+	char resolved_path[PATH_MAX];
+	char *path = NULL;
+	FILE		*fmap;
+	void		*map;
+	uint8_t		found = 0;
+	char		line[BUFSIZE];
+	char dlm[] = "-   :   ";
+	char *str, *sp, *in[PROCMAP_SZ];
+	char *end = NULL;
+
+	/* Path where mem files are located. */
+	snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid);
+	/* Maps file used to locate mem file. */
+	snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid);
+
+	fmap = fopen(mapfile, "r");
+	if (fmap == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to open maps file for pid %d\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+
+	/* Read through maps file until we find out base_address. */
+	while (fgets(line, BUFSIZE, fmap) != 0) {
+		str = line;
+		errno = 0;
+		/* Split line in to fields. */
+		for (i = 0; i < PROCMAP_SZ; i++) {
+			in[i] = strtok_r(str, &dlm[i], &sp);
+			if ((in[i] == NULL) || (errno != 0)) {
+				fclose(fmap);
+				return -1;
+			}
+			str = NULL;
+		}
+
+		/* Convert/Copy each field as needed. */
+		procmap.va_start = strtoull(in[0], &end, 16);
+		if ((in[0] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.va_end = strtoull(in[1], &end, 16);
+		if ((in[1] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.pgoff = strtoull(in[3], &end, 16);
+		if ((in[3] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.maj = strtoul(in[4], &end, 16);
+		if ((*in[4] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.min = strtoul(in[5], &end, 16);
+		if ((*in[5] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.ino = strtoul(in[6], &end, 16);
+		if ((*in[6] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		memcpy(&procmap.prot, in[2], PROT_SZ);
+		memcpy(&procmap.fname, in[7], PATH_MAX);
+
+		if ((procmap.va_start <= addr) && (procmap.va_end >= addr)) {
+			procmap.len = procmap.va_end - procmap.va_start;
+			found = 1;
+			break;
+		}
+	}
+	fclose(fmap);
+
+	if (!found) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find memory file in pid %d maps file\n", dev->device_fh, pid);
+		return -1;
+	}
+
+	/* Find the guest memory file among the process fds. */
+	dp = opendir(procdir);
+	if (dp == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Cannot open pid %d process directory\n", dev->device_fh, pid);
+		return -1;
+
+	}
+
+	found = 0;
+
+	/* Read the fd directory contents. */
+	while (NULL != (dptr = readdir(dp))) {
+		snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s",
+				pid, dptr->d_name);
+		path = realpath(memfile, resolved_path);
+		if (path == NULL) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"(%"PRIu64") Failed to resolve fd directory\n",
+				dev->device_fh);
+			closedir(dp);
+			return -1;
+		}
+		if (strncmp(resolved_path, procmap.fname,
+			strnlen(procmap.fname, PATH_MAX)) == 0) {
+			found = 1;
+			break;
+		}
+	}
+
+	closedir(dp);
+
+	if (found == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find memory file for pid %d\n", dev->device_fh, pid);
+		return -1;
+	}
+	/* Open the shared memory file and map the memory into this process. */
+	fd = open(memfile, O_RDWR);
+
+	if (fd == -1) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to open %s for pid %d\n", dev->device_fh, memfile, pid);
+		return -1;
+	}
+
+	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE ,
+			MAP_POPULATE|MAP_SHARED, fd, 0);
+	close(fd);
+
+	if (map == MAP_FAILED) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Error mapping the file %s for pid %d\n",  dev->device_fh, memfile, pid);
+		return -1;
+	}
+
+	/* Store the memory address and size in the device data structure */
+	mem->mapped_address = (uint64_t)(uintptr_t)map;
+	mem->mapped_size = procmap.len;
+
+	LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n", dev->device_fh,
+		memfile, resolved_path, (long long unsigned)mem->mapped_size, map);
+
+	return 0;
+}
+
+/*
+ * Called from CUSE IOCTL: VHOST_SET_MEM_TABLE
+ * This function creates and populates the memory structure for the device. This includes
+ * storing offsets used to translate buffer addresses.
+ */
+static int
+cuse_set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr,
+	uint32_t nregions)
+{
+	struct virtio_net *dev;
+	struct vhost_memory_region *mem_regions;
+	struct virtio_memory *mem;
+	uint64_t size = offsetof(struct vhost_memory, regions);
+	uint32_t regionidx, valid_regions;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	if (dev->mem) {
+		munmap((void *)(uintptr_t)dev->mem->mapped_address,
+			(size_t)dev->mem->mapped_size);
+		free(dev->mem);
+	}
+
+	/* Malloc the memory structure depending on the number of regions. */
+	mem = calloc(1, sizeof(struct virtio_memory) +
+		(sizeof(struct virtio_memory_regions) * nregions));
+	if (mem == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to allocate memory for dev->mem.\n", dev->device_fh);
+		return -1;
+	}
+
+	mem->nregions = nregions;
+
+	mem_regions = (void *)(uintptr_t)
+			((uint64_t)(uintptr_t)mem_regions_addr + size);
+
+	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
+		/* Populate the region structure for each region. */
+		mem->regions[regionidx].guest_phys_address =
+			mem_regions[regionidx].guest_phys_addr;
+		mem->regions[regionidx].guest_phys_address_end =
+			mem->regions[regionidx].guest_phys_address +
+			mem_regions[regionidx].memory_size;
+		mem->regions[regionidx].memory_size =
+			mem_regions[regionidx].memory_size;
+		mem->regions[regionidx].userspace_address =
+			mem_regions[regionidx].userspace_addr;
+
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh,
+				regionidx, (void *)(uintptr_t)mem->regions[regionidx].guest_phys_address,
+				(void *)(uintptr_t)mem->regions[regionidx].userspace_address,
+				mem->regions[regionidx].memory_size);
+
+		/*set the base address mapping*/
+		if (mem->regions[regionidx].guest_phys_address == 0x0) {
+			mem->base_address = mem->regions[regionidx].userspace_address;
+			/* Map VM memory file */
+			if (host_memory_map(dev, mem, ctx.pid, mem->base_address) != 0) {
+				free(mem);
+				return -1;
+			}
+		}
+	}
+
+	/* Check that we have a valid base address. */
+	if (mem->base_address == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh);
+		free(mem);
+		return -1;
+	}
+
+	/* Check if all of our regions have valid mappings. Usually one does not exist in the QEMU memory file. */
+	valid_regions = mem->nregions;
+	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
+		if ((mem->regions[regionidx].userspace_address < mem->base_address) ||
+			(mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size)))
+				valid_regions--;
+	}
+
+	/* If a region does not have a valid mapping we rebuild our memory struct to contain only valid entries. */
+	if (valid_regions != mem->nregions) {
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n",
+			dev->device_fh);
+
+		/* Re-populate the memory structure with only valid regions. Invalid regions are over-written with memmove. */
+		valid_regions = 0;
+
+		for (regionidx = mem->nregions; 0 != regionidx--;) {
+			if ((mem->regions[regionidx].userspace_address < mem->base_address) ||
+					(mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size))) {
+				memmove(&mem->regions[regionidx], &mem->regions[regionidx + 1],
+					sizeof(struct virtio_memory_regions) * valid_regions);
+			} else {
+				valid_regions++;
+			}
+		}
+	}
+	mem->nregions = valid_regions;
+	dev->mem = mem;
+
+	/*
+	 * Calculate the address offset for each region. This offset is used to identify the vhost virtual address
+	 * corresponding to a QEMU guest physical address.
+	 */
+	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
+		dev->mem->regions[regionidx].address_offset = dev->mem->regions[regionidx].userspace_address - dev->mem->base_address
+			+ dev->mem->mapped_address - dev->mem->regions[regionidx].guest_phys_address;
+
+	}
+	return 0;
+}
+
+/*
+ * Called from CUSE IOCTL: VHOST_GET_VRING_BASE
+ * We send the virtio device our available ring last used index.
+ */
+static int
+cuse_get_vring_base(struct vhost_device_ctx ctx, uint32_t index,
+	struct vhost_vring_state *state)
+{
+	struct virtio_net *dev;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	state->index = index;
+	/* State->index refers to the queue index. The TX queue is 1, RX queue is 0. */
+	state->num = dev->virtqueue[state->index]->last_used_idx;
+
+	return 0;
+}
+
+/*
+ * This function uses the eventfd_link kernel module to copy an eventfd file descriptor
+ * provided by QEMU in to our process space.
+ */
+static int
+eventfd_copy(struct virtio_net *dev, struct eventfd_copy *eventfd_copy)
+{
+	int eventfd_link, ret;
+
+	/* Open the character device to the kernel module. */
+	eventfd_link = open(eventfd_cdev, O_RDWR);
+	if (eventfd_link < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") eventfd_link module is not loaded\n",  dev->device_fh);
+		return -1;
+	}
+
+	/* Call the IOCTL to copy the eventfd. */
+	ret = ioctl(eventfd_link, EVENTFD_COPY, eventfd_copy);
+	close(eventfd_link);
+
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") EVENTFD_COPY ioctl failed\n",  dev->device_fh);
+		return -1;
+	}
+
+
+	return 0;
+}
+
+/*
+ * Called from CUSE IOCTL: VHOST_SET_VRING_CALL
+ * The virtio device sends an eventfd to interrupt the guest. This fd gets copied in
+ * to our process space.
+ */
+static int
+cuse_set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
+{
+	struct virtio_net *dev;
+	struct eventfd_copy     eventfd_kick;
+	struct vhost_virtqueue *vq;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	/* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */
+	vq = dev->virtqueue[file->index];
+
+	if (vq->kickfd)
+		close((int)vq->kickfd);
+
+	/* Populate the eventfd_copy structure and call eventfd_copy. */
+	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+	eventfd_kick.source_fd = vq->kickfd;
+	eventfd_kick.target_fd = file->fd;
+	eventfd_kick.target_pid = ctx.pid;
+
+	if (eventfd_copy(dev, &eventfd_kick))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * Called from CUSE IOCTL: VHOST_SET_VRING_KICK
+ * The virtio device sends an eventfd that it can use to notify us. This fd gets copied in
+ * to our process space.
+ */
+static int
+cuse_set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
+{
+	struct virtio_net *dev;
+	struct eventfd_copy eventfd_call;
+	struct vhost_virtqueue *vq;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	/* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */
+	vq = dev->virtqueue[file->index];
+
+	if (vq->callfd)
+		close((int)vq->callfd);
+
+	/* Populate the eventfd_copy structure and call eventfd_copy. */
+	vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+	eventfd_call.source_fd = vq->callfd;
+	eventfd_call.target_fd = file->fd;
+	eventfd_call.target_pid = ctx.pid;
+
+	if (eventfd_copy(dev, &eventfd_call))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * Function pointers are set for the device operations to allow CUSE to call functions
+ * when an IOCTL, device_add or device_release is received.
+ */
+static const struct vhost_net_device_ops vhost_cuse_device_ops = {
+	.new_device = new_device,
+	.destroy_device = destroy_device,
+
+	.get_features = get_features,
+	.set_features = set_features,
+
+	.set_mem_table = cuse_set_mem_table,
+
+	.set_vring_num = set_vring_num,
+	.set_vring_addr = set_vring_addr,
+	.set_vring_base = set_vring_base,
+	.get_vring_base = cuse_get_vring_base,
+
+	.set_vring_kick = cuse_set_vring_kick,
+	.set_vring_call = cuse_set_vring_call,
+
+	.set_backend = set_backend,
+
+	.set_owner = set_owner,
+	.reset_owner = reset_owner,
+};
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 1dee1d8..985c66b 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -50,7 +50,6 @@
 #include <rte_virtio_net.h>
 
 #include "vhost-net.h"
-#include "eventfd_link/eventfd_link.h"
 
 /**
  * Device linked list structure for configuration.
@@ -60,8 +59,6 @@ struct virtio_net_config_ll {
 	struct virtio_net_config_ll	*next;	/* Next entry on linked list.*/
 };
 
-const char eventfd_cdev[] = "/dev/eventfd-link";
-
 /* device ops to add/remove device to data core. */
 static struct virtio_net_device_ops const *notify_ops;
 /* Root address of the linked list in the configuration core. */
@@ -71,28 +68,6 @@ static struct virtio_net_config_ll	*ll_root;
 #define VHOST_SUPPORTED_FEATURES (1ULL << VIRTIO_NET_F_MRG_RXBUF)
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
-/* Line size for reading maps file. */
-const uint32_t BUFSIZE = PATH_MAX;
-
-/* Size of prot char array in procmap. */
-#define PROT_SZ 5
-
-/* Number of elements in procmap struct. */
-#define PROCMAP_SZ 8
-
-/* Structure containing information gathered from maps file. */
-struct procmap {
-	uint64_t	va_start;	/* Start virtual address in file. */
-	uint64_t	va_end;		/* End virtual address in file. */
-	uint64_t	len;		/* Size of file. */
-	uint64_t	pgoff;		/* Not used. */
-	uint32_t	maj;		/* Not used. */
-	uint32_t	min;		/* Not used. */
-	uint32_t	ino;		/* Not used. */
-	char		prot[PROT_SZ];	/* Not used. */
-	char		fname[PATH_MAX];/* File name. */
-};
-
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is used
  * to convert the ring addresses to our address space.
@@ -119,173 +94,6 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
 }
 
 /*
- * Locate the file containing QEMU's memory space and map it to our address space.
- */
-static int
-host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
-	pid_t pid, uint64_t addr)
-{
-	struct dirent *dptr = NULL;
-	struct procmap procmap;
-	DIR *dp = NULL;
-	int fd;
-	int i;
-	char memfile[PATH_MAX];
-	char mapfile[PATH_MAX];
-	char procdir[PATH_MAX];
-	char resolved_path[PATH_MAX];
-	char *path = NULL;
-	FILE		*fmap;
-	void		*map;
-	uint8_t		found = 0;
-	char		line[BUFSIZE];
-	char dlm[] = "-   :   ";
-	char *str, *sp, *in[PROCMAP_SZ];
-	char *end = NULL;
-
-	/* Path where mem files are located. */
-	snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid);
-	/* Maps file used to locate mem file. */
-	snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid);
-
-	fmap = fopen(mapfile, "r");
-	if (fmap == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open maps file for pid %d\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-
-	/* Read through maps file until we find out base_address. */
-	while (fgets(line, BUFSIZE, fmap) != 0) {
-		str = line;
-		errno = 0;
-		/* Split line in to fields. */
-		for (i = 0; i < PROCMAP_SZ; i++) {
-			in[i] = strtok_r(str, &dlm[i], &sp);
-			if ((in[i] == NULL) || (errno != 0)) {
-				fclose(fmap);
-				return -1;
-			}
-			str = NULL;
-		}
-
-		/* Convert/Copy each field as needed. */
-		procmap.va_start = strtoull(in[0], &end, 16);
-		if ((in[0] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.va_end = strtoull(in[1], &end, 16);
-		if ((in[1] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.pgoff = strtoull(in[3], &end, 16);
-		if ((in[3] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.maj = strtoul(in[4], &end, 16);
-		if ((in[4] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.min = strtoul(in[5], &end, 16);
-		if ((in[5] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.ino = strtoul(in[6], &end, 16);
-		if ((in[6] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		memcpy(&procmap.prot, in[2], PROT_SZ);
-		memcpy(&procmap.fname, in[7], PATH_MAX);
-
-		if ((procmap.va_start <= addr) && (procmap.va_end >= addr)) {
-			procmap.len = procmap.va_end - procmap.va_start;
-			found = 1;
-			break;
-		}
-	}
-	fclose(fmap);
-
-	if (!found) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find memory file in pid %d maps file\n", dev->device_fh, pid);
-		return -1;
-	}
-
-	/* Find the guest memory file among the process fds. */
-	dp = opendir(procdir);
-	if (dp == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Cannot open pid %d process directory\n", dev->device_fh, pid);
-		return -1;
-
-	}
-
-	found = 0;
-
-	/* Read the fd directory contents. */
-	while (NULL != (dptr = readdir(dp))) {
-		snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s",
-				pid, dptr->d_name);
-		path = realpath(memfile, resolved_path);
-		if (path == NULL) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"(%"PRIu64") Failed to resolve fd directory\n",
-				dev->device_fh);
-			closedir(dp);
-			return -1;
-		}
-		if (strncmp(resolved_path, procmap.fname,
-			strnlen(procmap.fname, PATH_MAX)) == 0) {
-			found = 1;
-			break;
-		}
-	}
-
-	closedir(dp);
-
-	if (found == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find memory file for pid %d\n", dev->device_fh, pid);
-		return -1;
-	}
-	/* Open the shared memory file and map the memory into this process. */
-	fd = open(memfile, O_RDWR);
-
-	if (fd == -1) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to open %s for pid %d\n", dev->device_fh, memfile, pid);
-		return -1;
-	}
-
-	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE ,
-			MAP_POPULATE|MAP_SHARED, fd, 0);
-	close(fd);
-
-	if (map == MAP_FAILED) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Error mapping the file %s for pid %d\n",  dev->device_fh, memfile, pid);
-		return -1;
-	}
-
-	/* Store the memory address and size in the device data structure */
-	mem->mapped_address = (uint64_t)(uintptr_t)map;
-	mem->mapped_size = procmap.len;
-
-	LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n", dev->device_fh,
-		memfile, resolved_path, (long long unsigned)mem->mapped_size, map);
-
-	return 0;
-}
-
-/*
  * Retrieves an entry from the devices configuration linked list.
  */
 static struct virtio_net_config_ll *
@@ -439,7 +247,7 @@ init_device(struct virtio_net *dev)
 }
 
 /*
- * Function is called from the CUSE open function. The device structure is
+ * Function is called from the open function. The device structure is
  * initialised and a new entry is added to the device configuration linked
  * list.
  */
@@ -492,7 +300,7 @@ new_device(struct vhost_device_ctx ctx)
 }
 
 /*
- * Function is called from the CUSE release function. This function will cleanup
+ * Function is called from the release function. This function will cleanup
  * the device and remove it from device configuration linked list.
  */
 static void
@@ -521,7 +329,7 @@ destroy_device(struct vhost_device_ctx ctx)
 }
 
 /*
- * Called from CUSE IOCTL: VHOST_SET_OWNER
+ * Called from IOCTL: VHOST_SET_OWNER
  * This function just returns success at the moment unless the device hasn't been initialised.
  */
 static int
@@ -537,7 +345,7 @@ set_owner(struct vhost_device_ctx ctx)
 }
 
 /*
- * Called from CUSE IOCTL: VHOST_RESET_OWNER
+ * Called from IOCTL: VHOST_RESET_OWNER
  */
 static int
 reset_owner(struct vhost_device_ctx ctx)
@@ -553,7 +361,7 @@ reset_owner(struct vhost_device_ctx ctx)
 }
 
 /*
- * Called from CUSE IOCTL: VHOST_GET_FEATURES
+ * Called from IOCTL: VHOST_GET_FEATURES
  * The features that we support are requested.
  */
 static int
@@ -571,7 +379,7 @@ get_features(struct vhost_device_ctx ctx, uint64_t *pu)
 }
 
 /*
- * Called from CUSE IOCTL: VHOST_SET_FEATURES
+ * Called from IOCTL: VHOST_SET_FEATURES
  * We receive the negotiated set of features supported by us and the virtio device.
  */
 static int
@@ -605,123 +413,8 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 	return 0;
 }
 
-
-/*
- * Called from CUSE IOCTL: VHOST_SET_MEM_TABLE
- * This function creates and populates the memory structure for the device. This includes
- * storing offsets used to translate buffer addresses.
- */
-static int
-set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr,
-	uint32_t nregions)
-{
-	struct virtio_net *dev;
-	struct vhost_memory_region *mem_regions;
-	struct virtio_memory *mem;
-	uint64_t size = offsetof(struct vhost_memory, regions);
-	uint32_t regionidx, valid_regions;
-
-	dev = get_device(ctx);
-	if (dev == NULL)
-		return -1;
-
-	if (dev->mem) {
-		munmap((void *)(uintptr_t)dev->mem->mapped_address,
-			(size_t)dev->mem->mapped_size);
-		free(dev->mem);
-	}
-
-	/* Malloc the memory structure depending on the number of regions. */
-	mem = calloc(1, sizeof(struct virtio_memory) +
-		(sizeof(struct virtio_memory_regions) * nregions));
-	if (mem == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to allocate memory for dev->mem.\n", dev->device_fh);
-		return -1;
-	}
-
-	mem->nregions = nregions;
-
-	mem_regions = (void *)(uintptr_t)
-			((uint64_t)(uintptr_t)mem_regions_addr + size);
-
-	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
-		/* Populate the region structure for each region. */
-		mem->regions[regionidx].guest_phys_address =
-			mem_regions[regionidx].guest_phys_addr;
-		mem->regions[regionidx].guest_phys_address_end =
-			mem->regions[regionidx].guest_phys_address +
-			mem_regions[regionidx].memory_size;
-		mem->regions[regionidx].memory_size =
-			mem_regions[regionidx].memory_size;
-		mem->regions[regionidx].userspace_address =
-			mem_regions[regionidx].userspace_addr;
-
-		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh,
-				regionidx, (void *)(uintptr_t)mem->regions[regionidx].guest_phys_address,
-				(void *)(uintptr_t)mem->regions[regionidx].userspace_address,
-				mem->regions[regionidx].memory_size);
-
-		/*set the base address mapping*/
-		if (mem->regions[regionidx].guest_phys_address == 0x0) {
-			mem->base_address = mem->regions[regionidx].userspace_address;
-			/* Map VM memory file */
-			if (host_memory_map(dev, mem, ctx.pid, mem->base_address) != 0) {
-				free(mem);
-				return -1;
-			}
-		}
-	}
-
-	/* Check that we have a valid base address. */
-	if (mem->base_address == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh);
-		free(mem);
-		return -1;
-	}
-
-	/* Check if all of our regions have valid mappings. Usually one does not exist in the QEMU memory file. */
-	valid_regions = mem->nregions;
-	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
-		if ((mem->regions[regionidx].userspace_address < mem->base_address) ||
-			(mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size)))
-				valid_regions--;
-	}
-
-	/* If a region does not have a valid mapping we rebuild our memory struct to contain only valid entries. */
-	if (valid_regions != mem->nregions) {
-		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n",
-			dev->device_fh);
-
-		/* Re-populate the memory structure with only valid regions. Invalid regions are over-written with memmove. */
-		valid_regions = 0;
-
-		for (regionidx = mem->nregions; 0 != regionidx--;) {
-			if ((mem->regions[regionidx].userspace_address < mem->base_address) ||
-					(mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size))) {
-				memmove(&mem->regions[regionidx], &mem->regions[regionidx + 1],
-					sizeof(struct virtio_memory_regions) * valid_regions);
-			} else {
-				valid_regions++;
-			}
-		}
-	}
-	mem->nregions = valid_regions;
-	dev->mem = mem;
-
-	/*
-	 * Calculate the address offset for each region. This offset is used to identify the vhost virtual address
-	 * corresponding to a QEMU guest physical address.
-	 */
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		dev->mem->regions[regionidx].address_offset = dev->mem->regions[regionidx].userspace_address - dev->mem->base_address
-			+ dev->mem->mapped_address - dev->mem->regions[regionidx].guest_phys_address;
-
-	}
-	return 0;
-}
-
 /*
- * Called from CUSE IOCTL: VHOST_SET_VRING_NUM
+ * Called from IOCTL: VHOST_SET_VRING_NUM
  * The virtio device sends us the size of the descriptor ring.
  */
 static int
@@ -740,7 +433,7 @@ set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state *state)
 }
 
 /*
- * Called from CUSE IOCTL: VHOST_SET_VRING_ADDR
+ * Called from IOCTL: VHOST_SET_VRING_ADDR
  * The virtio device sends us the desc, used and avail ring addresses. This function
  * then converts these to our address space.
  */
@@ -784,7 +477,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr)
 }
 
 /*
- * Called from CUSE IOCTL: VHOST_SET_VRING_BASE
+ * Called from IOCTL: VHOST_SET_VRING_BASE
  * The virtio device sends us the available ring last used index.
  */
 static int
@@ -804,125 +497,7 @@ set_vring_base(struct vhost_device_ctx ctx, struct vhost_vring_state *state)
 }
 
 /*
- * Called from CUSE IOCTL: VHOST_GET_VRING_BASE
- * We send the virtio device our available ring last used index.
- */
-static int
-get_vring_base(struct vhost_device_ctx ctx, uint32_t index,
-	struct vhost_vring_state *state)
-{
-	struct virtio_net *dev;
-
-	dev = get_device(ctx);
-	if (dev == NULL)
-		return -1;
-
-	state->index = index;
-	/* State->index refers to the queue index. The TX queue is 1, RX queue is 0. */
-	state->num = dev->virtqueue[state->index]->last_used_idx;
-
-	return 0;
-}
-
-/*
- * This function uses the eventfd_link kernel module to copy an eventfd file descriptor
- * provided by QEMU in to our process space.
- */
-static int
-eventfd_copy(struct virtio_net *dev, struct eventfd_copy *eventfd_copy)
-{
-	int eventfd_link, ret;
-
-	/* Open the character device to the kernel module. */
-	eventfd_link = open(eventfd_cdev, O_RDWR);
-	if (eventfd_link < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") eventfd_link module is not loaded\n",  dev->device_fh);
-		return -1;
-	}
-
-	/* Call the IOCTL to copy the eventfd. */
-	ret = ioctl(eventfd_link, EVENTFD_COPY, eventfd_copy);
-	close(eventfd_link);
-
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") EVENTFD_COPY ioctl failed\n",  dev->device_fh);
-		return -1;
-	}
-
-
-	return 0;
-}
-
-/*
- * Called from CUSE IOCTL: VHOST_SET_VRING_CALL
- * The virtio device sends an eventfd to interrupt the guest. This fd gets copied in
- * to our process space.
- */
-static int
-set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
-{
-	struct virtio_net *dev;
-	struct eventfd_copy	eventfd_kick;
-	struct vhost_virtqueue *vq;
-
-	dev = get_device(ctx);
-	if (dev == NULL)
-		return -1;
-
-	/* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */
-	vq = dev->virtqueue[file->index];
-
-	if (vq->kickfd)
-		close((int)vq->kickfd);
-
-	/* Populate the eventfd_copy structure and call eventfd_copy. */
-	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-	eventfd_kick.source_fd = vq->kickfd;
-	eventfd_kick.target_fd = file->fd;
-	eventfd_kick.target_pid = ctx.pid;
-
-	if (eventfd_copy(dev, &eventfd_kick))
-		return -1;
-
-	return 0;
-}
-
-/*
- * Called from CUSE IOCTL: VHOST_SET_VRING_KICK
- * The virtio device sends an eventfd that it can use to notify us. This fd gets copied in
- * to our process space.
- */
-static int
-set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
-{
-	struct virtio_net *dev;
-	struct eventfd_copy eventfd_call;
-	struct vhost_virtqueue *vq;
-
-	dev = get_device(ctx);
-	if (dev == NULL)
-		return -1;
-
-	/* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */
-	vq = dev->virtqueue[file->index];
-
-	if (vq->callfd)
-		close((int)vq->callfd);
-
-	/* Populate the eventfd_copy structure and call eventfd_copy. */
-	vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-	eventfd_call.source_fd = vq->callfd;
-	eventfd_call.target_fd = file->fd;
-	eventfd_call.target_pid = ctx.pid;
-
-	if (eventfd_copy(dev, &eventfd_call))
-		return -1;
-
-	return 0;
-}
-
-/*
- * Called from CUSE IOCTL: VHOST_NET_SET_BACKEND
+ * Called from IOCTL: VHOST_NET_SET_BACKEND
  * To complete device initialisation when the virtio driver is loaded we are provided with a
  * valid fd for a tap device (not used by us). If this happens then we can add the device to a
  * data core. When the virtio driver is removed we get fd=-1. At that point we remove the device
@@ -953,39 +528,24 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 }
 
 /*
- * Function pointers are set for the device operations to allow CUSE to call functions
- * when an IOCTL, device_add or device_release is received.
+ * Include CUSE-dependent functions and definitions.
  */
-static const struct vhost_net_device_ops vhost_device_ops = {
-	.new_device = new_device,
-	.destroy_device = destroy_device,
-
-	.get_features = get_features,
-	.set_features = set_features,
-
-	.set_mem_table = set_mem_table,
-
-	.set_vring_num = set_vring_num,
-	.set_vring_addr = set_vring_addr,
-	.set_vring_base = set_vring_base,
-	.get_vring_base = get_vring_base,
-
-	.set_vring_kick = set_vring_kick,
-	.set_vring_call = set_vring_call,
-
-	.set_backend = set_backend,
-
-	.set_owner = set_owner,
-	.reset_owner = reset_owner,
-};
+#include "virtio-net-cdev.c"
 
 /*
- * Called by main to setup callbacks when registering CUSE device.
+ * Called by main to set up callbacks when registering a device.
  */
 struct vhost_net_device_ops const *
-get_virtio_net_callbacks(void)
+get_virtio_net_callbacks(vhost_driver_type_t type)
 {
-	return &vhost_device_ops;
+	switch (type) {
+	case VHOST_DRV_CUSE:
+		return &vhost_cuse_device_ops;
+	default:
+		break;
+	}
+
+	return NULL;
 }
 
 int rte_vhost_enable_guest_notification(struct virtio_net *dev,
-- 
1.9.1

* [dpdk-dev] [RFC PATCH 4/7] lib/librte_vhost: Move vhost vhost-cuse device list and accessor functions
  2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
                   ` (2 preceding siblings ...)
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages Tetsuya Mukawa
@ 2014-11-06 11:14 ` Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 5/7] lib/librte_vhost: Add a vhost session abstraction Tetsuya Mukawa
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-06 11:14 UTC (permalink / raw)
  To: dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

vhost-cuse and vhost-user should each have an independent device list.
This patch moves vhost-cuse device list and list accessor functions
to 'virtio-net-cdev.c'.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
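Note for readers: the idea of one device list per backend, selected by
the new ctx.type field, can be sketched roughly as below. This is only
an illustration under simplifying assumptions, not the code in this
patch. The names drv_type_t, dev_ctx, dev_ll, ll_root, add_entry and
get_entry are hypothetical stand-ins; the patch itself keeps a separate
static root (cdev_ll_root) in virtio-net-cdev.c rather than an array,
and the sketch prepends new entries for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the types in vhost-net.h. */
typedef enum { DRV_CUSE = 0, DRV_USER, DRV_NUM } drv_type_t;

struct dev_ctx {
	drv_type_t type;	/* Backend that issued the request. */
	uint64_t fh;		/* Device index within that backend. */
};

struct dev_ll {
	uint64_t device_fh;
	struct dev_ll *next;
};

/* One independent list root per backend (assumption: array layout). */
static struct dev_ll *ll_root[DRV_NUM];

/* Prepend an entry to the list owned by the given backend. */
static void
add_entry(drv_type_t type, struct dev_ll *entry)
{
	entry->next = ll_root[type];
	ll_root[type] = entry;
}

/* Walk only the list that belongs to ctx.type. */
static struct dev_ll *
get_entry(struct dev_ctx ctx)
{
	struct dev_ll *ll;

	if ((unsigned int)ctx.type >= DRV_NUM)
		return NULL;

	for (ll = ll_root[ctx.type]; ll != NULL; ll = ll->next) {
		if (ll->device_fh == ctx.fh)
			return ll;
	}

	return NULL;
}
```

With this layout the same fh value can exist in both lists without
clashing, which is the property the commit message asks for.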
 lib/librte_vhost/vhost-net-cdev.c  |   1 +
 lib/librte_vhost/vhost-net.h       |   5 +-
 lib/librte_vhost/virtio-net-cdev.c | 133 +++++++++++++++++++++++++++++++++++--
 lib/librte_vhost/virtio-net.c      | 121 ++++++++++++++-------------------
 4 files changed, 181 insertions(+), 79 deletions(-)

diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
index 12d0f68..090c6fc 100644
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-net-cdev.c
@@ -66,6 +66,7 @@ fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi)
 	struct vhost_device_ctx ctx;
 	struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);
 
+	ctx.type = VHOST_DRV_CUSE;
 	ctx.pid = req_ctx->pid;
 	ctx.fh = fi->fh;
 
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 09a99ce..64873d0 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -77,8 +77,9 @@
  * Structure used to identify device context.
  */
 struct vhost_device_ctx {
-	pid_t		pid;	/* PID of process calling the IOCTL. */
-	uint64_t	fh;	/* Populated with fi->fh to track the device index. */
+	vhost_driver_type_t	type;	/* driver type. */
+	pid_t			pid;	/* PID of process calling the IOCTL. */
+	uint64_t		fh;	/* Populated with fi->fh to track the device index. */
 };
 
 /*
diff --git a/lib/librte_vhost/virtio-net-cdev.c b/lib/librte_vhost/virtio-net-cdev.c
index f225bf5..70bc578 100644
--- a/lib/librte_vhost/virtio-net-cdev.c
+++ b/lib/librte_vhost/virtio-net-cdev.c
@@ -41,6 +41,24 @@
 #include "vhost-net.h"
 #include "eventfd_link/eventfd_link.h"
 
+/* Functions defined in virtio-net.c */
+static void init_device(struct virtio_net *dev);
+static void cleanup_device(struct virtio_net *dev);
+static void free_device(struct virtio_net_config_ll *ll_dev);
+static int new_device(struct vhost_device_ctx ctx);
+static void destroy_device(struct vhost_device_ctx ctx);
+static int set_owner(struct vhost_device_ctx ctx);
+static int reset_owner(struct vhost_device_ctx ctx);
+static int get_features(struct vhost_device_ctx ctx, uint64_t *pu);
+static int set_features(struct vhost_device_ctx ctx, uint64_t *pu);
+static int set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state *state);
+static int set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr);
+static int set_vring_base(struct vhost_device_ctx ctx, struct vhost_vring_state *state);
+static int set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file);
+
+/* Root address of the linked list in the configuration core. */
+static struct virtio_net_config_ll *cdev_ll_root;
+
 const char eventfd_cdev[] = "/dev/eventfd-link";
 
 /* Line size for reading maps file. */
@@ -65,11 +83,114 @@ struct procmap {
 	char		fname[PATH_MAX];/* File name. */
 };
 
+/**
+ * Retrieves an entry from the devices configuration linked list.
+ */
+static struct virtio_net_config_ll *
+cdev_get_config_ll_entry(struct vhost_device_ctx ctx)
+{
+	struct virtio_net_config_ll *ll_dev = cdev_ll_root;
+
+	/* Loop through linked list until the device_fh is found. */
+	while (ll_dev != NULL) {
+		if (ll_dev->dev.device_fh == ctx.fh)
+			return ll_dev;
+		ll_dev = ll_dev->next;
+	}
+
+	return NULL;
+}
+
+/**
+ * Searches the configuration core linked list and retrieves the device if it exists.
+ */
+static struct virtio_net *
+cdev_get_device(struct vhost_device_ctx ctx)
+{
+	struct virtio_net_config_ll *ll_dev;
+
+	ll_dev = cdev_get_config_ll_entry(ctx);
+
+	/* If a matching entry is found in the linked list, return the device in that entry. */
+	if (ll_dev)
+		return &ll_dev->dev;
+
+	RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Device not found in linked list.\n", ctx.fh);
+	return NULL;
+}
+
+/**
+ * Add entry containing a device to the device configuration linked list.
+ */
+static void
+cdev_add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
+{
+	struct virtio_net_config_ll *ll_dev = cdev_ll_root;
+
+	/* If ll_dev == NULL then this is the first device so go to else */
+	if (ll_dev) {
+		/* If the 1st device_fh != 0 then we insert our device here. */
+		if (ll_dev->dev.device_fh != 0) {
+			new_ll_dev->dev.device_fh = 0;
+			new_ll_dev->next = ll_dev;
+			cdev_ll_root = new_ll_dev;
+		} else {
+			/* Increment through the ll until we find an unused device_fh. Insert the device at that entry. */
+			while ((ll_dev->next != NULL) && (ll_dev->dev.device_fh == (ll_dev->next->dev.device_fh - 1)))
+				ll_dev = ll_dev->next;
+
+			new_ll_dev->dev.device_fh = ll_dev->dev.device_fh + 1;
+			new_ll_dev->next = ll_dev->next;
+			ll_dev->next = new_ll_dev;
+		}
+	} else {
+		cdev_ll_root = new_ll_dev;
+		cdev_ll_root->dev.device_fh = 0;
+	}
+
+}
+
+/**
+ * Remove an entry from the device configuration linked list.
+ */
+static struct virtio_net_config_ll *
+cdev_rm_config_ll_entry(struct virtio_net_config_ll *ll_dev, struct virtio_net_config_ll *ll_dev_last)
+{
+	/* First remove the device and then clean it up. */
+	if (ll_dev == cdev_ll_root) {
+		cdev_ll_root = ll_dev->next;
+		cleanup_device(&ll_dev->dev);
+		free_device(ll_dev);
+		return cdev_ll_root;
+	} else {
+		if (likely(ll_dev_last != NULL)) {
+			ll_dev_last->next = ll_dev->next;
+			cleanup_device(&ll_dev->dev);
+			free_device(ll_dev);
+			return ll_dev_last->next;
+		} else {
+			cleanup_device(&ll_dev->dev);
+			free_device(ll_dev);
+			RTE_LOG(ERR, VHOST_CONFIG, "Remove entry from config_ll failed\n");
+			return NULL;
+		}
+	}
+}
+
+/**
+ * Returns the root entry of linked list
+ */
+static struct virtio_net_config_ll *
+cdev_get_config_ll_root(void)
+{
+	return cdev_ll_root;
+}
+
 /*
  * Locate the file containing QEMU's memory space and map it to our address space.
  */
 static int
-host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
+cdev_host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	pid_t pid, uint64_t addr)
 {
 	struct dirent *dptr = NULL;
@@ -247,7 +368,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr,
 	uint64_t size = offsetof(struct vhost_memory, regions);
 	uint32_t regionidx, valid_regions;
 
-	dev = get_device(ctx);
+	dev = cdev_get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
@@ -291,7 +412,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr,
 		if (mem->regions[regionidx].guest_phys_address == 0x0) {
 			mem->base_address = mem->regions[regionidx].userspace_address;
 			/* Map VM memory file */
-			if (host_memory_map(dev, mem, ctx.pid, mem->base_address) != 0) {
+			if (cdev_host_memory_map(dev, mem, ctx.pid, mem->base_address) != 0) {
 				free(mem);
 				return -1;
 			}
@@ -356,7 +477,7 @@ cuse_get_vring_base(struct vhost_device_ctx ctx, uint32_t index,
 {
 	struct virtio_net *dev;
 
-	dev = get_device(ctx);
+	dev = cdev_get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
@@ -408,7 +529,7 @@ cuse_set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	struct eventfd_copy     eventfd_kick;
 	struct vhost_virtqueue *vq;
 
-	dev = get_device(ctx);
+	dev = cdev_get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
@@ -442,7 +563,7 @@ cuse_set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	struct eventfd_copy eventfd_call;
 	struct vhost_virtqueue *vq;
 
-	dev = get_device(ctx);
+	dev = cdev_get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 985c66b..603bb09 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -61,8 +61,6 @@ struct virtio_net_config_ll {
 
 /* device ops to add/remove device to data core. */
 static struct virtio_net_device_ops const *notify_ops;
-/* Root address of the linked list in the configuration core. */
-static struct virtio_net_config_ll	*ll_root;
 
 /* Features supported by this application. RX merge buffers are enabled by default. */
 #define VHOST_SUPPORTED_FEATURES (1ULL << VIRTIO_NET_F_MRG_RXBUF)
@@ -93,21 +91,23 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
 	return vhost_va;
 }
 
+/**
+ * Include cuse-dependent functions and definitions.
+ */
+#include "virtio-net-cdev.c"
+
 /*
  * Retrieves an entry from the devices configuration linked list.
  */
 static struct virtio_net_config_ll *
 get_config_ll_entry(struct vhost_device_ctx ctx)
 {
-	struct virtio_net_config_ll *ll_dev = ll_root;
-
-	/* Loop through linked list until the device_fh is found. */
-	while (ll_dev != NULL) {
-		if (ll_dev->dev.device_fh == ctx.fh)
-			return ll_dev;
-		ll_dev = ll_dev->next;
+	switch (ctx.type) {
+	case VHOST_DRV_CUSE:
+		return cdev_get_config_ll_entry(ctx);
+	default:
+		break;
 	}
-
 	return NULL;
 }
 
@@ -117,15 +117,12 @@ get_config_ll_entry(struct vhost_device_ctx ctx)
 static struct virtio_net *
 get_device(struct vhost_device_ctx ctx)
 {
-	struct virtio_net_config_ll *ll_dev;
-
-	ll_dev = get_config_ll_entry(ctx);
-
-	/* If a matching entry is found in the linked list, return the device in that entry. */
-	if (ll_dev)
-		return &ll_dev->dev;
-
-	RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Device not found in linked list.\n", ctx.fh);
+	switch (ctx.type) {
+	case VHOST_DRV_CUSE:
+		return cdev_get_device(ctx);
+	default:
+		break;
+	}
 	return NULL;
 }
 
@@ -133,31 +130,15 @@ get_device(struct vhost_device_ctx ctx)
  * Add entry containing a device to the device configuration linked list.
  */
 static void
-add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
+add_config_ll_entry(vhost_driver_type_t type,
+		struct virtio_net_config_ll *new_ll_dev)
 {
-	struct virtio_net_config_ll *ll_dev = ll_root;
-
-	/* If ll_dev == NULL then this is the first device so go to else */
-	if (ll_dev) {
-		/* If the 1st device_fh != 0 then we insert our device here. */
-		if (ll_dev->dev.device_fh != 0)	{
-			new_ll_dev->dev.device_fh = 0;
-			new_ll_dev->next = ll_dev;
-			ll_root = new_ll_dev;
-		} else {
-			/* Increment through the ll until we find un unused device_fh. Insert the device at that entry*/
-			while ((ll_dev->next != NULL) && (ll_dev->dev.device_fh == (ll_dev->next->dev.device_fh - 1)))
-				ll_dev = ll_dev->next;
-
-			new_ll_dev->dev.device_fh = ll_dev->dev.device_fh + 1;
-			new_ll_dev->next = ll_dev->next;
-			ll_dev->next = new_ll_dev;
-		}
-	} else {
-		ll_root = new_ll_dev;
-		ll_root->dev.device_fh = 0;
+	switch (type) {
+	case VHOST_DRV_CUSE:
+		return cdev_add_config_ll_entry(new_ll_dev);
+	default:
+		break;
 	}
-
 }
 
 /*
@@ -199,29 +180,32 @@ free_device(struct virtio_net_config_ll *ll_dev)
  * Remove an entry from the device configuration linked list.
  */
 static struct virtio_net_config_ll *
-rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
-	struct virtio_net_config_ll *ll_dev_last)
+rm_config_ll_entry(vhost_driver_type_t type,
+		struct virtio_net_config_ll *ll_dev,
+		struct virtio_net_config_ll *ll_dev_last)
 {
-	/* First remove the device and then clean it up. */
-	if (ll_dev == ll_root) {
-		ll_root = ll_dev->next;
-		cleanup_device(&ll_dev->dev);
-		free_device(ll_dev);
-		return ll_root;
-	} else {
-		if (likely(ll_dev_last != NULL)) {
-			ll_dev_last->next = ll_dev->next;
-			cleanup_device(&ll_dev->dev);
-			free_device(ll_dev);
-			return ll_dev_last->next;
-		} else {
-			cleanup_device(&ll_dev->dev);
-			free_device(ll_dev);
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"Remove entry from config_ll failed\n");
-			return NULL;
-		}
+	switch (type) {
+	case VHOST_DRV_CUSE:
+		return cdev_rm_config_ll_entry(ll_dev, ll_dev_last);
+	default:
+		break;
+	}
+	return NULL;
+}
+
+/**
+ * Get a root entry of linked list.
+ */
+static struct virtio_net_config_ll *
+get_config_ll_root(struct vhost_device_ctx ctx)
+{
+	switch (ctx.type) {
+	case VHOST_DRV_CUSE:
+		return cdev_get_config_ll_root();
+	default:
+		break;
 	}
+	return NULL;
 }
 
 /*
@@ -294,7 +278,7 @@ new_device(struct vhost_device_ctx ctx)
 	new_ll_dev->next = NULL;
 
 	/* Add entry to device configuration linked list. */
-	add_config_ll_entry(new_ll_dev);
+	add_config_ll_entry(ctx.type, new_ll_dev);
 
 	return new_ll_dev->dev.device_fh;
 }
@@ -307,7 +291,7 @@ static void
 destroy_device(struct vhost_device_ctx ctx)
 {
 	struct virtio_net_config_ll *ll_dev_cur_ctx, *ll_dev_last = NULL;
-	struct virtio_net_config_ll *ll_dev_cur = ll_root;
+	struct virtio_net_config_ll *ll_dev_cur = get_config_ll_root(ctx);
 
 	/* Find the linked list entry for the device to be removed. */
 	ll_dev_cur_ctx = get_config_ll_entry(ctx);
@@ -320,7 +304,7 @@ destroy_device(struct vhost_device_ctx ctx)
 			 */
 			if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING))
 				notify_ops->destroy_device(&(ll_dev_cur->dev));
-			ll_dev_cur = rm_config_ll_entry(ll_dev_cur, ll_dev_last);
+			ll_dev_cur = rm_config_ll_entry(ctx.type, ll_dev_cur, ll_dev_last);
 		} else {
 			ll_dev_last = ll_dev_cur;
 			ll_dev_cur = ll_dev_cur->next;
@@ -528,11 +512,6 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 }
 
 /*
- * Include cuse depend functions and definitions.
- */
-#include "virtio-net-cdev.c"
-
-/*
  * Called by main to setup callbacks when registering device.
  */
 struct vhost_net_device_ops const *
-- 
1.9.1

* [dpdk-dev] [RFC PATCH 5/7] lib/librte_vhost: Add a vhost session abstraction
  2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
                   ` (3 preceding siblings ...)
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 4/7] lib/librte_vhost: Move vhost vhost-cuse device list and accessor functions Tetsuya Mukawa
@ 2014-11-06 11:14 ` Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 6/7] lib/librte_vhost: Add vhost-cuse/user specific initialization Tetsuya Mukawa
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-06 11:14 UTC (permalink / raw)
  To: dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

A vhost session relates the vhost communication layer to the virtio-net
device layer. Because vhost-cuse and vhost-user keep different session
information, this patch moves the backend-specific part into its own
structure.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
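Note: the layout this patch moves towards can be sketched as a context
tagged by driver type, with a union holding the backend-specific part.
This is only a minimal sketch; every demo_* name is hypothetical, and the
vhost-user member is an assumption about what a second backend might
store, not what the series defines.

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical names (demo_*); they only mirror the shape used in this patch. */
typedef enum {
	DEMO_DRV_CUSE,
	DEMO_DRV_USER	/* assumed second backend */
} demo_driver_type_t;

struct demo_cuse_ctx {
	pid_t pid;	/* PID of the process that issued the ioctl */
};

struct demo_user_ctx {
	int socketfd;	/* assumed per-session field, for illustration only */
};

struct demo_device_ctx {
	demo_driver_type_t type;	/* selects the valid union member */
	uint64_t fh;			/* device index, common to all backends */
	union {
		struct demo_cuse_ctx cdev;
		struct demo_user_ctx user;
	};
};

/* Return the peer PID for CUSE contexts, -1 when the backend has none. */
static pid_t
demo_ctx_pid(struct demo_device_ctx ctx)
{
	if (ctx.type == DEMO_DRV_CUSE)
		return ctx.cdev.pid;
	return -1;
}
```

Callers that need backend data check the type first, so common code never
reads a union member that was not written.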
 lib/librte_vhost/rte_virtio_net.h  | 2 +-
 lib/librte_vhost/vhost-net-cdev.c  | 8 ++++----
 lib/librte_vhost/vhost-net.h       | 7 ++++++-
 lib/librte_vhost/virtio-net-cdev.c | 7 ++++---
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index a36c0e3..a9e20ea 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -85,7 +85,7 @@ struct vhost_driver {
 	vhost_driver_type_t	type;		/**< driver type. */
 	const char		*dev_name;	/**< accessing device name. */
 	union {
-		struct fuse_session *session;	/**< fuse session. */
+		struct fuse_session *cuse_session;	/**< fuse session. */
 	};
 };
 
diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
index 090c6fc..6754548 100644
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost-net-cdev.c
@@ -67,7 +67,7 @@ fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi)
 	struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);
 
 	ctx.type = VHOST_DRV_CUSE;
-	ctx.pid = req_ctx->pid;
+	ctx.cdev.pid = req_ctx->pid;
 	ctx.fh = fi->fh;
 
 	return ctx;
@@ -345,9 +345,9 @@ vhost_cuse_driver_register(struct vhost_driver *drv)
 
 	ops = get_virtio_net_callbacks(drv->type);
 
-	drv->session = cuse_lowlevel_setup(3, fuse_argv,
+	drv->cuse_session = cuse_lowlevel_setup(3, fuse_argv,
 				&cuse_info, &vhost_net_ops, 0, NULL);
-	if (drv->session == NULL)
+	if (drv->cuse_session == NULL)
 		return -1;
 
 	return 0;
@@ -359,7 +359,7 @@ vhost_cuse_driver_register(struct vhost_driver *drv)
 static int
 vhost_cuse_driver_session_start(struct vhost_driver *drv)
 {
-	fuse_session_loop(drv->session);
+	fuse_session_loop(drv->cuse_session);
 
 	return 0;
 }
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 64873d0..ef04832 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -72,14 +72,19 @@
 #define PRINT_PACKET(device, addr, size, header) do {} while (0)
 #endif
 
+struct vhost_device_cuse_ctx {
+	pid_t   pid;	/* PID of process calling the IOCTL. */
+};
 
 /*
  * Structure used to identify device context.
  */
 struct vhost_device_ctx {
 	vhost_driver_type_t	type;	/* driver type. */
-	pid_t			pid;	/* PID of process calling the IOCTL. */
 	uint64_t		fh;	/* Populated with fi->fh to track the device index. */
+	union {
+		struct vhost_device_cuse_ctx cdev;
+	};
 };
 
 /*
diff --git a/lib/librte_vhost/virtio-net-cdev.c b/lib/librte_vhost/virtio-net-cdev.c
index 70bc578..ac97551 100644
--- a/lib/librte_vhost/virtio-net-cdev.c
+++ b/lib/librte_vhost/virtio-net-cdev.c
@@ -412,7 +412,8 @@ cuse_set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr,
 		if (mem->regions[regionidx].guest_phys_address == 0x0) {
 			mem->base_address = mem->regions[regionidx].userspace_address;
 			/* Map VM memory file */
-			if (cdev_host_memory_map(dev, mem, ctx.pid, mem->base_address) != 0) {
+			if (cdev_host_memory_map(dev, mem, ctx.cdev.pid,
+						mem->base_address) != 0) {
 				free(mem);
 				return -1;
 			}
@@ -543,7 +544,7 @@ cuse_set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
 	eventfd_kick.source_fd = vq->kickfd;
 	eventfd_kick.target_fd = file->fd;
-	eventfd_kick.target_pid = ctx.pid;
+	eventfd_kick.target_pid = ctx.cdev.pid;
 
 	if (eventfd_copy(dev, &eventfd_kick))
 		return -1;
@@ -577,7 +578,7 @@ cuse_set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
 	eventfd_call.source_fd = vq->callfd;
 	eventfd_call.target_fd = file->fd;
-	eventfd_call.target_pid = ctx.pid;
+	eventfd_call.target_pid = ctx.cdev.pid;
 
 	if (eventfd_copy(dev, &eventfd_call))
 		return -1;
-- 
1.9.1

* [dpdk-dev] [RFC PATCH 6/7] lib/librte_vhost: Add vhost-cuse/user specific initialization
  2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
                   ` (4 preceding siblings ...)
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 5/7] lib/librte_vhost: Add a vhost session abstraction Tetsuya Mukawa
@ 2014-11-06 11:14 ` Tetsuya Mukawa
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation Tetsuya Mukawa
  2014-11-07  3:33 ` [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Xie, Huawei
  7 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-06 11:14 UTC (permalink / raw)
  To: dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Initialization of vhost-cuse and vhost-user differs.
This patch lets init_device() call the backend-specific initialization.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
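Note: the dispatch added here can be sketched as common initialization
followed by a per-backend hook chosen by driver type. This is a minimal
sketch under assumptions: the demo_* names and the fields of demo_dev are
hypothetical and stand in for struct virtio_net and the cdev_/user hooks.

```c
/* Hypothetical names (demo_*), not the identifiers used by librte_vhost. */
typedef enum {
	DEMO_DRV_CUSE,
	DEMO_DRV_USER,
	DEMO_DRV_NUM
} demo_driver_type_t;

struct demo_dev {
	int backend;	/* common state, reset for every backend */
	int cuse_ready;	/* set only by the CUSE hook */
	int user_ready;	/* set only by the vhost-user hook */
};

static void demo_cuse_init(struct demo_dev *dev) { dev->cuse_ready = 1; }
static void demo_user_init(struct demo_dev *dev) { dev->user_ready = 1; }

/* Run the common part first, then the hook selected by the driver type. */
static int
demo_init_device(demo_driver_type_t type, struct demo_dev *dev)
{
	dev->backend = -1;
	dev->cuse_ready = 0;
	dev->user_ready = 0;

	switch (type) {
	case DEMO_DRV_CUSE:
		demo_cuse_init(dev);
		return 0;
	case DEMO_DRV_USER:
		demo_user_init(dev);
		return 0;
	default:
		return -1;	/* unknown backend */
	}
}
```

The sketch calls the hook and then returns a status, rather than writing
"return hook();" inside a void function, which strict ISO C rejects even
though GCC accepts it.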
 lib/librte_vhost/virtio-net-cdev.c | 12 +++++++++++-
 lib/librte_vhost/virtio-net.c      | 13 ++++++++++---
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/virtio-net-cdev.c b/lib/librte_vhost/virtio-net-cdev.c
index ac97551..a1ba1f9 100644
--- a/lib/librte_vhost/virtio-net-cdev.c
+++ b/lib/librte_vhost/virtio-net-cdev.c
@@ -42,7 +42,7 @@
 #include "eventfd_link/eventfd_link.h"
 
 /* Functions defined in virtio_net.c */
-static void init_device(struct virtio_net *dev);
+static void init_device(struct vhost_device_ctx ctx, struct virtio_net *dev);
 static void cleanup_device(struct virtio_net *dev);
 static void free_device(struct virtio_net_config_ll *ll_dev);
 static int new_device(struct vhost_device_ctx ctx);
@@ -186,6 +186,16 @@ cdev_get_config_ll_root(void)
 	return cdev_ll_root;
 }
 
+
+/**
+ * CUSE specific device initialization.
+ */
+static void
+cdev_init_device(struct vhost_device_ctx ctx __rte_unused,
+		struct virtio_net *dev __rte_unused)
+{
+}
+
 /*
  * Locate the file containing QEMU's memory space and map it to our address space.
  */
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 603bb09..13fbb6f 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -212,7 +212,7 @@ get_config_ll_root(struct vhost_device_ctx ctx)
  *  Initialise all variables in device structure.
  */
 static void
-init_device(struct virtio_net *dev)
+init_device(struct vhost_device_ctx ctx, struct virtio_net *dev)
 {
 	uint64_t vq_offset;
 
@@ -228,6 +228,13 @@ init_device(struct virtio_net *dev)
 	/* Backends are set to -1 indicating an inactive device. */
 	dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
 	dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
+
+	switch (ctx.type) {
+	case VHOST_DRV_CUSE:
+		return cdev_init_device(ctx, dev);
+	default:
+		break;
+	}
 }
 
 /*
@@ -273,7 +280,7 @@ new_device(struct vhost_device_ctx ctx)
 	new_ll_dev->dev.virtqueue[VIRTIO_TXQ] = virtqueue_tx;
 
 	/* Initialise device and virtqueues. */
-	init_device(&new_ll_dev->dev);
+	init_device(ctx, &new_ll_dev->dev);
 
 	new_ll_dev->next = NULL;
 
@@ -339,7 +346,7 @@ reset_owner(struct vhost_device_ctx ctx)
 	ll_dev = get_config_ll_entry(ctx);
 
 	cleanup_device(&ll_dev->dev);
-	init_device(&ll_dev->dev);
+	init_device(ctx, &ll_dev->dev);
 
 	return 0;
 }
-- 
1.9.1

* [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
  2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
                   ` (5 preceding siblings ...)
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 6/7] lib/librte_vhost: Add vhost-cuse/user specific initialization Tetsuya Mukawa
@ 2014-11-06 11:14 ` Tetsuya Mukawa
  2014-11-07 21:25   ` Xie, Huawei
  2014-11-14  0:07   ` Xie, Huawei
  2014-11-07  3:33 ` [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Xie, Huawei
  7 siblings, 2 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-06 11:14 UTC (permalink / raw)
  To: dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

This patch adds a vhost-user implementation to librte_vhost.
To communicate with the vhost-user backend of QEMU, specify VHOST_DRV_USER
as the vhost_driver_type_t argument of rte_vhost_driver_register().

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
---
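Note: vhost_user_write() and vhost_user_read() in this patch pass file
descriptors as SCM_RIGHTS ancillary data on a unix domain socket. A
minimal standalone sketch of that mechanism follows. The demo_* helpers
are hypothetical and carry a single descriptor with a one-byte payload;
they are not the functions added by the patch.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union demo_control {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one byte plus one file descriptor over a unix domain socket. */
static int
demo_send_fd(int sock, int fd_to_send)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_control control;
	struct msghdr mh;
	struct cmsghdr *cmsg;

	memset(&mh, 0, sizeof(mh));
	memset(&control, 0, sizeof(control));
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = control.buf;
	mh.msg_controllen = sizeof(control.buf);

	cmsg = CMSG_FIRSTHDR(&mh);
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

	return sendmsg(sock, &mh, 0) == 1 ? 0 : -1;
}

/* Receive one byte and return the attached descriptor, or -1 on error. */
static int
demo_recv_fd(int sock)
{
	char byte;
	int fd = -1;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_control control;
	struct msghdr mh;
	struct cmsghdr *cmsg;

	memset(&mh, 0, sizeof(mh));
	memset(&control, 0, sizeof(control));
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = control.buf;
	mh.msg_controllen = sizeof(control.buf);

	if (recvmsg(sock, &mh, 0) != 1)
		return -1;
	if (mh.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return -1;

	cmsg = CMSG_FIRSTHDR(&mh);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
			cmsg->cmsg_type != SCM_RIGHTS ||
			cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor is a new number in the receiving process that
refers to the same open file, which is what allows a memory-backing file
or an eventfd to be shared across the socket.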
 lib/librte_vhost/rte_virtio_net.h  |  19 +-
 lib/librte_vhost/vhost-net-user.c  | 541 +++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost-net.c       |  39 ++-
 lib/librte_vhost/vhost-net.h       |   7 +
 lib/librte_vhost/virtio-net-user.c | 410 ++++++++++++++++++++++++++++
 lib/librte_vhost/virtio-net.c      |  64 ++++-
 6 files changed, 1073 insertions(+), 7 deletions(-)
 create mode 100644 lib/librte_vhost/vhost-net-user.c
 create mode 100644 lib/librte_vhost/virtio-net-user.c
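
Note: the message framing used in vhost-net-user.c is a fixed header
(request, flags, size) followed by "size" bytes of payload. Below is a
minimal sketch of computing the header size and bounding the payload
before it is read. The demo_* names are hypothetical, and a 32-bit
request field is an assumption about how the enum is laid out.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 8

struct demo_region {
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
	uint64_t mmap_offset;
};

/* Hypothetical mirror of the wire message; request assumed to be 32 bits. */
struct demo_msg {
	uint32_t request;
	uint32_t flags;
	uint32_t size;	/* payload bytes that follow the header */
	union {
		uint64_t u64;
		struct {
			uint32_t nregions;
			uint32_t padding;
			struct demo_region regions[DEMO_MAX_REGIONS];
		} memory;
	} payload;
} __attribute__((packed));

/* Header ends where the payload union starts. */
#define DEMO_HDR_SIZE		offsetof(struct demo_msg, payload)
#define DEMO_MAX_PAYLOAD	(sizeof(struct demo_msg) - DEMO_HDR_SIZE)

/* Return 1 when the advertised payload fits in the message buffer. */
static int
demo_payload_size_ok(const struct demo_msg *msg)
{
	return msg->size <= DEMO_MAX_PAYLOAD;
}
```

Checking the size after receiving the header and before reading the
payload keeps a malformed message from writing past the end of the
buffer.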

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index a9e20ea..af07900 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -75,17 +75,32 @@ struct buf_vector {
  */
 typedef enum {
 	VHOST_DRV_CUSE, /* cuse driver */
+	VHOST_DRV_USER, /* vhost-user driver */
 	VHOST_DRV_NUM	/* the number of vhost driver types */
 } vhost_driver_type_t;
 
+
+/**
+ * Structure contains vhost-user session specific information
+ */
+struct vhost_user_session {
+	int		fh;		/**< session identifier */
+	pthread_t	tid;		/**< thread id of session handler */
+	int		socketfd;	/**< fd of socket */
+	int		interval;	/**< reconnection interval of session */
+};
+
 /**
  * Structure contains information relating vhost driver.
  */
 struct vhost_driver {
 	vhost_driver_type_t	type;		/**< driver type. */
 	const char		*dev_name;	/**< accessing device name. */
+	void			*priv;		/**< private data. */
 	union {
 		struct fuse_session *cuse_session;	/**< fuse session. */
+		struct vhost_user_session *user_session;
+						/**< vhost-user session. */
 	};
 };
 
@@ -199,9 +214,11 @@ struct vhost_driver *rte_vhost_driver_register(
 		const char *dev_name, vhost_driver_type_t type);
 
 /* Register callbacks. */
-int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+int rte_vhost_driver_callback_register(struct vhost_driver *drv,
+			struct virtio_net_device_ops const * const, void *priv);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(struct vhost_driver *drv);
+void rte_vhost_driver_session_stop(struct vhost_driver *drv);
 
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
diff --git a/lib/librte_vhost/vhost-net-user.c b/lib/librte_vhost/vhost-net-user.c
new file mode 100644
index 0000000..434f20f
--- /dev/null
+++ b/lib/librte_vhost/vhost-net-user.c
@@ -0,0 +1,541 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2014 IGEL Co.,Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <linux/un.h>
+
+#define VHOST_USER_MAX_DEVICE		(32)
+#define VHOST_USER_MAX_FD_NUM		(3)
+
+/* start id of vhost user device */
+rte_atomic16_t vhost_user_device_id;
+
+static struct vhost_net_device_ops const *ops;
+
+typedef enum VhostUserRequest {
+	VHOST_USER_NONE = 0,
+	VHOST_USER_GET_FEATURES = 1,
+	VHOST_USER_SET_FEATURES = 2,
+	VHOST_USER_SET_OWNER = 3,
+	VHOST_USER_RESET_OWNER = 4,
+	VHOST_USER_SET_MEM_TABLE = 5,
+	VHOST_USER_SET_LOG_BASE = 6,
+	VHOST_USER_SET_LOG_FD = 7,
+	VHOST_USER_SET_VRING_NUM = 8,
+	VHOST_USER_SET_VRING_ADDR = 9,
+	VHOST_USER_SET_VRING_BASE = 10,
+	VHOST_USER_GET_VRING_BASE = 11,
+	VHOST_USER_SET_VRING_KICK = 12,
+	VHOST_USER_SET_VRING_CALL = 13,
+	VHOST_USER_SET_VRING_ERR = 14,
+	VHOST_USER_MAX
+} VhostUserRequest;
+
+#define VHOST_MEMORY_MAX_NREGIONS	8
+
+typedef struct VhostUserMemoryRegion {
+	uint64_t guest_phys_addr;
+	uint64_t memory_size;
+	uint64_t userspace_addr;
+	uint64_t mmap_offset;
+} VhostUserMemoryRegion;
+
+typedef struct VhostUserMemory {
+	uint32_t nregions;
+	uint32_t padding;
+	VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+} VhostUserMemory;
+
+typedef struct VhostUserMsg {
+	VhostUserRequest request;
+
+#define VHOST_USER_VERSION_MASK		(0x3)
+#define VHOST_USER_REPLY_MASK		(0x1<<2)
+	uint32_t flags;
+	uint32_t size; /* the following payload size */
+	union {
+#define VHOST_USER_VRING_IDX_MASK	(0xff)
+#define VHOST_USER_VRING_NOFD_MASK	(0x1<<8)
+		uint64_t u64;
+		struct vhost_vring_state state;
+		struct vhost_vring_addr addr;
+		VhostUserMemory memory;
+	};
+} __attribute__((packed)) VhostUserMsg;
+
+static VhostUserMsg m __attribute__ ((unused));
+#define VHOST_USER_HDR_SIZE	(sizeof(m.request) \
+		+ sizeof(m.flags) + sizeof(m.size))
+
+/* The version of the protocol we support */
+#define VHOST_USER_VERSION		(0x1)
+
+static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
+	-1,			/* VHOST_USER_NONE */
+	VHOST_GET_FEATURES,	/* VHOST_USER_GET_FEATURES */
+	VHOST_SET_FEATURES,	/* VHOST_USER_SET_FEATURES */
+	VHOST_SET_OWNER,	/* VHOST_USER_SET_OWNER */
+	VHOST_RESET_OWNER,	/* VHOST_USER_RESET_OWNER */
+	VHOST_SET_MEM_TABLE,	/* VHOST_USER_SET_MEM_TABLE */
+	VHOST_SET_LOG_BASE,	/* VHOST_USER_SET_LOG_BASE */
+	VHOST_SET_LOG_FD,	/* VHOST_USER_SET_LOG_FD */
+	VHOST_SET_VRING_NUM,	/* VHOST_USER_SET_VRING_NUM */
+	VHOST_SET_VRING_ADDR,	/* VHOST_USER_SET_VRING_ADDR */
+	VHOST_SET_VRING_BASE,	/* VHOST_USER_SET_VRING_BASE */
+	VHOST_GET_VRING_BASE,	/* VHOST_USER_GET_VRING_BASE */
+	VHOST_SET_VRING_KICK,	/* VHOST_USER_SET_VRING_KICK */
+	VHOST_SET_VRING_CALL,	/* VHOST_USER_SET_VRING_CALL */
+	VHOST_SET_VRING_ERR	/* VHOST_USER_SET_VRING_ERR */
+};
+
+/**
+ * Returns vhost_device_ctx for the given vhost_driver. The device index is taken
+ * from the vhost-user session attached to the driver.
+ */
+static struct vhost_device_ctx
+vhost_driver_to_vhost_ctx(struct vhost_driver *drv)
+{
+	struct vhost_device_ctx ctx;
+	int device_id = drv->user_session->fh;
+
+	ctx.type = VHOST_DRV_USER;
+	ctx.fh = device_id;
+	ctx.user.drv = drv;
+
+	return ctx;
+}
+
+/**
+ * When the device is created in QEMU it gets initialised here and added to the device linked list.
+ */
+static int
+vhost_user_open(struct vhost_driver *drv)
+{
+	struct vhost_device_ctx ctx = vhost_driver_to_vhost_ctx(drv);
+
+	int ret;
+
+	ret = ops->new_device(ctx);
+	if (ret == -1)
+		return -1;
+
+	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device configuration started\n", ctx.fh);
+
+	return 0;
+}
+
+/**
+ * When QEMU is shut down or killed the device gets released.
+ */
+static void
+vhost_user_release(struct vhost_driver *drv)
+{
+	struct vhost_device_ctx ctx = vhost_driver_to_vhost_ctx(drv);
+
+	ops->destroy_device(ctx);
+	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device released\n", ctx.fh);
+}
+
+/**
+ * Send a message to the vhost-user device in QEMU.
+ */
+static int
+vhost_user_write(struct vhost_driver *drv, VhostUserMsg *msg,
+		int *fds, size_t fd_num)
+{
+	int fd, len;
+	size_t fd_size = fd_num * sizeof(int);
+	char control[CMSG_SPACE(fd_size)];
+	struct msghdr msg_header;
+	struct iovec iov[1];
+	struct cmsghdr *cmsg_header;
+	struct vhost_device_ctx ctx;
+
+	if ((drv == NULL) || (msg == NULL))
+		return -EINVAL;
+	ctx = vhost_driver_to_vhost_ctx(drv);
+	fd = drv->user_session->socketfd;
+
+	memset(&msg_header, 0, sizeof(msg_header));
+	memset(control, 0, sizeof(control));
+
+	/* set the payload */
+	iov[0].iov_base = (void *)msg;
+	iov[0].iov_len = VHOST_USER_HDR_SIZE + msg->size;
+
+	msg_header.msg_iov = iov;
+	msg_header.msg_iovlen = 1;
+
+	if (fd_num) {
+		msg_header.msg_control = control;
+		msg_header.msg_controllen = sizeof(control);
+		cmsg_header = CMSG_FIRSTHDR(&msg_header);
+		cmsg_header->cmsg_len = CMSG_LEN(fd_size);
+		cmsg_header->cmsg_level = SOL_SOCKET;
+		cmsg_header->cmsg_type = SCM_RIGHTS;
+		memcpy(CMSG_DATA(cmsg_header), fds, fd_size);
+	} else {
+		msg_header.msg_control = 0;
+		msg_header.msg_controllen = 0;
+	}
+
+	do {
+		len = sendmsg(fd, &msg_header, 0);
+	} while (len < 0 && errno == EINTR);
+
+	if (len < 0)
+		goto error;
+
+	return 0;
+
+error:
+	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device cannot send message\n", ctx.fh);
+	return -EFAULT;
+}
+
+/**
+ * Receive a message from the vhost-user device in QEMU.
+ */
+static int
+vhost_user_read(struct vhost_driver *drv, VhostUserMsg *msg,
+		int *fds, size_t *fd_num)
+{
+	int fd, len;
+	size_t fd_size = (*fd_num) * sizeof(int);
+	char control[CMSG_SPACE(fd_size)];
+	struct msghdr msg_header;
+	struct iovec iov[1];
+	struct cmsghdr *cmsg_header;
+	struct vhost_device_ctx ctx;
+
+	if ((drv == NULL) || (msg == NULL))
+		return -EINVAL;
+	ctx = vhost_driver_to_vhost_ctx(drv);
+	fd = drv->user_session->socketfd;
+
+	memset(&msg_header, 0, sizeof(msg_header));
+	memset(control, 0, sizeof(control));
+	*fd_num = 0;
+
+	/* set the payload */
+	iov[0].iov_base = (void *)msg;
+	iov[0].iov_len = VHOST_USER_HDR_SIZE;
+
+	msg_header.msg_iov = iov;
+	msg_header.msg_iovlen = 1;
+	msg_header.msg_control = control;
+	msg_header.msg_controllen = sizeof(control);
+
+	if ((len = recvmsg(fd, &msg_header, 0)) <= 0)
+		goto error;
+
+	if (msg_header.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
+		goto error;
+
+	cmsg_header = CMSG_FIRSTHDR(&msg_header);
+	if (cmsg_header && cmsg_header->cmsg_len > 0 &&
+			cmsg_header->cmsg_level == SOL_SOCKET &&
+			cmsg_header->cmsg_type == SCM_RIGHTS) {
+		if (fd_size >= cmsg_header->cmsg_len - CMSG_LEN(0)) {
+			fd_size = cmsg_header->cmsg_len - CMSG_LEN(0);
+			memcpy(fds, CMSG_DATA(cmsg_header), fd_size);
+			*fd_num = fd_size / sizeof(int);
+		}
+	}
+
+	if (read(fd, ((char *)msg) + len, msg->size) < 0)
+		goto error;
+
+	return 0;
+
+error:
+	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device cannot receive message\n", ctx.fh);
+	return -EFAULT;
+}
+
+/*
+ * Boilerplate code for vhost-user IOCTL
+ * Implicit arguments: ctx, result.
+ */
+#define VHOST_USER_IOCTL(func) do {	\
+	result = (func)(ctx);		\
+} while (0)
+
+/*
+ * Boilerplate code for vhost-user Read IOCTL
+ * Implicit arguments: ctx, result.
+ */
+#define VHOST_USER_IOCTL_R(type, var, func) do {\
+	result = func(ctx, &(var));		\
+} while (0)
+
+/*
+ * Boilerplate code for vhost-user Write IOCTL
+ * Implicit arguments: ctx, result, msg, drv.
+ */
+#define	VHOST_USER_IOCTL_W(type, var, func) do {\
+	result = (func)(ctx, &(var));		\
+	msg->flags |= VHOST_USER_REPLY_MASK;	\
+	msg->size = sizeof(type);		\
+	vhost_user_write(drv, msg, NULL, 0);	\
+} while (0)
+
+/*
+ * Boilerplate code for vhost-user Read/Write IOCTL
+ * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
+ */
+#define VHOST_USER_IOCTL_RW(type1, var1, type2, var2, func) do {\
+	result = (func)(ctx, (var1), &(var2));			\
+	msg->flags |= VHOST_USER_REPLY_MASK;			\
+	msg->size = sizeof(type2);				\
+	vhost_user_write(drv, msg, NULL, 0);			\
+} while (0)
+
+/**
+ * The IOCTLs are handled using a unix domain socket in userspace.
+ */
+static int
+vhost_user_ioctl(struct vhost_driver *drv, VhostUserMsg *msg,
+		int *fds, int fd_num)
+{
+	struct vhost_device_ctx ctx = vhost_driver_to_vhost_ctx(drv);
+	struct vhost_vring_file file;
+	int result = 0;
+
+	switch (ioctl_to_vhost_user_request[msg->request]) {
+	case VHOST_GET_FEATURES:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_GET_FEATURES\n", ctx.fh);
+		VHOST_USER_IOCTL_W(uint64_t, msg->u64, ops->get_features);
+		break;
+
+	case VHOST_SET_FEATURES:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_FEATURES\n", ctx.fh);
+		VHOST_USER_IOCTL_R(uint64_t, msg->u64, ops->set_features);
+		break;
+
+	case VHOST_RESET_OWNER:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_RESET_OWNER\n", ctx.fh);
+		VHOST_USER_IOCTL(ops->reset_owner);
+		break;
+
+	case VHOST_SET_OWNER:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_OWNER\n", ctx.fh);
+		VHOST_USER_IOCTL(ops->set_owner);
+		break;
+
+	case VHOST_SET_MEM_TABLE:
+		/*TODO fix race condition.*/
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_MEM_TABLE\n", ctx.fh);
+		/* All fds should be the same, because guest physical memory consists of a single file. */
+		ctx.user.fds = fds;
+		ctx.user.fd_num = fd_num;
+		result = ops->set_mem_table(ctx, &msg->memory, msg->memory.nregions);
+		break;
+
+	case VHOST_SET_VRING_NUM:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_VRING_NUM\n", ctx.fh);
+		VHOST_USER_IOCTL_R(struct vhost_vring_state, msg->state, ops->set_vring_num);
+		break;
+
+	case VHOST_SET_VRING_BASE:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_VRING_BASE\n", ctx.fh);
+		VHOST_USER_IOCTL_R(struct vhost_vring_state, msg->state, ops->set_vring_base);
+		break;
+
+	case VHOST_GET_VRING_BASE:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_GET_VRING_BASE\n", ctx.fh);
+		VHOST_USER_IOCTL_RW(uint32_t, msg->addr.index, struct vhost_vring_state, msg->state, ops->get_vring_base);
+		break;
+
+	case VHOST_SET_VRING_ADDR:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_VRING_ADDR\n", ctx.fh);
+		VHOST_USER_IOCTL_R(struct vhost_vring_addr, msg->addr, ops->set_vring_addr);
+		break;
+
+	case VHOST_SET_VRING_KICK:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n", ctx.fh);
+		ctx.user.fds = fds;
+		ctx.user.fd_num = fd_num;
+		file.index = msg->u64;
+		VHOST_USER_IOCTL_R(struct vhost_vring_file, file, ops->set_vring_kick);
+		break;
+
+	case VHOST_SET_VRING_CALL:
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n", ctx.fh);
+		ctx.user.fds = fds;
+		ctx.user.fd_num = fd_num;
+		file.index = msg->u64;
+		VHOST_USER_IOCTL_R(struct vhost_vring_file, file, ops->set_vring_call);
+		break;
+
+	case VHOST_NET_SET_BACKEND:
+	default:
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") IOCTL: DOES NOT EXIST\n", ctx.fh);
+		result = -1;
+	}
+
+	if (result < 0)
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: FAIL\n", ctx.fh);
+	else
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: SUCCESS\n", ctx.fh);
+
+	return result;
+}
+
+/**
+ * vhost-user specific registration.
+ */
+static int
+vhost_user_driver_register(struct vhost_driver *drv)
+{
+	if ((drv == NULL) || (drv->dev_name == NULL) ||
+			(strlen(drv->dev_name) > UNIX_PATH_MAX - 1))
+		return -1;
+
+	ops = get_virtio_net_callbacks(drv->type);
+
+	drv->user_session = rte_malloc(NULL, sizeof(struct vhost_user_session), CACHE_LINE_SIZE);
+	if (drv->user_session == NULL)
+		return -1;
+
+	drv->user_session->fh =
+		rte_atomic16_add_return(&vhost_user_device_id, 1) - 1; /* fh of first device is zero */
+	drv->user_session->interval = 1;
+
+	return 0;
+}
+
+/**
+ * When the vhost-user driver starts, the session handler communicates with the
+ * vhost-user device in QEMU over a unix domain socket.
+ */
+static void *
+vhost_user_session_handler(void *data)
+{
+	struct vhost_driver *drv = data;
+	int ret;
+	struct sockaddr_un caddr;
+	VhostUserMsg msg;
+	int fds[VHOST_USER_MAX_FD_NUM];
+	int socketfd;
+	int interval;
+	size_t fd_num;
+
+	if ((drv == NULL) || (drv->dev_name == NULL))
+		return NULL;
+
+	bzero(&caddr, sizeof(caddr));
+	caddr.sun_family = AF_LOCAL;
+	strncpy(caddr.sun_path, drv->dev_name, sizeof(caddr.sun_path) - 1);
+
+reconnect:
+	drv->user_session->socketfd = socket(AF_UNIX, SOCK_STREAM, 0);
+	if (drv->user_session->socketfd < 0)
+		return NULL;
+
+	socketfd = drv->user_session->socketfd;
+	interval = drv->user_session->interval;
+	while (1) {
+		ret = connect(socketfd, (struct sockaddr *)&caddr, sizeof(caddr));
+		if (ret == 0)
+			break; /* success */
+		sleep(interval);
+	}
+
+	ret = vhost_user_open(drv);
+	if (ret != 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(Socket %s) open failure\n", drv->dev_name);
+		return NULL;
+	}
+
+	for (;;) {
+		fd_num = VHOST_USER_MAX_FD_NUM;
+		bzero(&msg, sizeof(VhostUserMsg));
+		ret = vhost_user_read(drv, &msg, fds, &fd_num);
+		if (ret != 0) {
+			RTE_LOG(ERR, VHOST_CONFIG, "(Socket %s) read failure\n", drv->dev_name);
+			vhost_user_release(drv);
+			goto reconnect;
+		}
+
+		ret = vhost_user_ioctl(drv, &msg, fds, fd_num);
+		if (ret != 0) {
+			RTE_LOG(ERR, VHOST_CONFIG, "(Socket %s) request failure\n", drv->dev_name);
+			vhost_user_release(drv);
+			goto reconnect;
+		}
+	}
+
+	return NULL;
+}
+
+/**
+ * Create session handler
+ */
+static int
+vhost_user_driver_start(struct vhost_driver *drv)
+{
+	if (pthread_create(&drv->user_session->tid, NULL, vhost_user_session_handler, drv)) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"(Socket %s) starting event handler failure\n", drv->dev_name);
+		return -1;
+	}
+
+	/* TODO: The event handler thread may need to run on a user-specified core. */
+
+	return 0;
+}
+
+/**
+ * Destroy session handler
+ */
+static void
+vhost_user_driver_stop(struct vhost_driver *drv)
+{
+	pthread_t *tid = &drv->user_session->tid;
+
+	/*
+	 * Do not create a new thread here: the handler thread was already
+	 * started by vhost_user_driver_start(). Stopping only needs to
+	 * cancel and join it.
+	 */
+
+	/* stop event thread and wait until connection is closed */
+	if (*tid) {
+		pthread_cancel(*tid);
+		pthread_join(*tid, NULL);
+	}
+
+	vhost_user_release(drv);
+}
diff --git a/lib/librte_vhost/vhost-net.c b/lib/librte_vhost/vhost-net.c
index b0de5fd..10f41e9 100644
--- a/lib/librte_vhost/vhost-net.c
+++ b/lib/librte_vhost/vhost-net.c
@@ -42,6 +42,11 @@
  */
 #include "vhost-net-cdev.c"
 
+/*
+ * Include vhost-user dependent functions and definitions
+ */
+#include "vhost-net-user.c"
+
 /**
  * This function abstracts cuse and vhost-user driver registration.
  */
@@ -65,10 +70,17 @@ rte_vhost_driver_register(const char *dev_name, vhost_driver_type_t type)
 		if (ret != 0)
 			goto err;
 		break;
+	case VHOST_DRV_USER:
+		ret = vhost_user_driver_register(drv);
+		break;
 	default:
+		ret = -EINVAL;
 		break;
 	}
 
+	if (ret != 0)
+		goto err;
+
 	return drv;
 err:
 	free(drv);
@@ -81,17 +93,40 @@ err:
 int
 rte_vhost_driver_session_start(struct vhost_driver *drv)
 {
+	int ret;
+
 	if (drv == NULL)
 		return -ENODEV;
 
 	switch (drv->type) {
 	case VHOST_DRV_CUSE:
-		vhost_cuse_driver_session_start(drv);
+		ret = vhost_cuse_driver_session_start(drv);
+		break;
+	case VHOST_DRV_USER:
+		ret = vhost_user_driver_start(drv);
 		break;
 	default:
+		ret = -EINVAL;
 		break;
 	}
 
-	return 0;
+	return ret;
 }
 
+/**
+ * Close the vhost session. Only supported for vhost-user.
+ */
+void
+rte_vhost_driver_session_stop(struct vhost_driver *drv)
+{
+	if (drv == NULL)
+		return;
+
+	switch (drv->type) {
+	case VHOST_DRV_USER:
+		vhost_user_driver_stop(drv);
+		break;
+	default:
+		break;
+	}
+}
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index ef04832..0e36ba0 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -76,6 +76,12 @@ struct vhost_device_cuse_ctx {
 	pid_t   pid;	/* PID of process calling the IOCTL. */
 };
 
+struct vhost_device_user_ctx {
+	int			*fds;
+	int			fd_num;
+	struct vhost_driver	*drv;
+};
+
 /*
  * Structure used to identify device context.
  */
@@ -83,6 +89,7 @@ struct vhost_device_ctx {
 	vhost_driver_type_t	type;	/* driver type. */
 	uint64_t		fh;	/* Populated with fi->fh to track the device index. */
 	union {
+		struct vhost_device_user_ctx user;
 		struct vhost_device_cuse_ctx cdev;
 	};
 };
diff --git a/lib/librte_vhost/virtio-net-user.c b/lib/librte_vhost/virtio-net-user.c
new file mode 100644
index 0000000..1e78f98
--- /dev/null
+++ b/lib/librte_vhost/virtio-net-user.c
@@ -0,0 +1,410 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2014 IGEL Co.,Ltd.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of IGEL nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <stdlib.h>
+
+/* Functions defined in virtio-net.c */
+static void init_device(struct vhost_device_ctx ctx, struct virtio_net *dev);
+static void cleanup_device(struct virtio_net *dev);
+static void free_device(struct virtio_net_config_ll *ll_dev);
+static int new_device(struct vhost_device_ctx ctx);
+static void destroy_device(struct vhost_device_ctx ctx);
+static int set_owner(struct vhost_device_ctx ctx);
+static int reset_owner(struct vhost_device_ctx ctx);
+static int get_features(struct vhost_device_ctx ctx, uint64_t *pu);
+static int set_features(struct vhost_device_ctx ctx, uint64_t *pu);
+static int set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state *state);
+static int set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr);
+static int set_vring_base(struct vhost_device_ctx ctx, struct vhost_vring_state *state);
+static int set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file);
+
+/* Root address of the linked list in the configuration core. */
+static struct virtio_net_config_ll *user_ll_root;
+
+/**
+ * Retrieves an entry from the devices configuration linked list.
+ */
+static struct virtio_net_config_ll *
+user_get_config_ll_entry(struct vhost_device_ctx ctx)
+{
+	struct virtio_net_config_ll *ll_dev = user_ll_root;
+
+	/* Loop through linked list until the device_fh is found. */
+	while (ll_dev != NULL) {
+		if (ll_dev->dev.device_fh == ctx.fh)
+			return ll_dev;
+		ll_dev = ll_dev->next;
+	}
+
+	return NULL;
+}
+
+/**
+ * Searches the configuration core linked list and retrieves the device if it exists.
+ */
+static struct virtio_net *
+user_get_device(struct vhost_device_ctx ctx)
+{
+	struct virtio_net_config_ll *ll_dev;
+
+	ll_dev = user_get_config_ll_entry(ctx);
+
+	/* If a matching entry is found in the linked list, return the device in that entry. */
+	if (ll_dev)
+		return &ll_dev->dev;
+
+	RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Device not found in linked list.\n", ctx.fh);
+	return NULL;
+}
+
+/**
+ * Add entry containing a device to the device configuration linked list.
+ */
+static void
+user_add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
+{
+	struct virtio_net_config_ll *ll_dev = user_ll_root;
+
+	/* If ll_dev == NULL then this is the first device so go to else */
+	if (ll_dev) {
+		/* If the 1st device_fh != 0 then we insert our device here. */
+		if (ll_dev->dev.device_fh != 0)	{
+			new_ll_dev->dev.device_fh = 0;
+			new_ll_dev->next = ll_dev;
+			user_ll_root = new_ll_dev;
+		} else {
+			/* Increment through the ll until we find an unused device_fh. Insert the device at that entry. */
+			while ((ll_dev->next != NULL) && (ll_dev->dev.device_fh == (ll_dev->next->dev.device_fh - 1)))
+				ll_dev = ll_dev->next;
+
+			new_ll_dev->dev.device_fh = ll_dev->dev.device_fh + 1;
+			new_ll_dev->next = ll_dev->next;
+			ll_dev->next = new_ll_dev;
+		}
+	} else {
+		user_ll_root = new_ll_dev;
+		user_ll_root->dev.device_fh = 0;
+	}
+
+}
+
+/**
+ * Remove an entry from the device configuration linked list.
+ */
+static struct virtio_net_config_ll *
+user_rm_config_ll_entry(struct virtio_net_config_ll *ll_dev, struct virtio_net_config_ll *ll_dev_last)
+{
+	/* First remove the device and then clean it up. */
+	if (ll_dev == user_ll_root) {
+		user_ll_root = ll_dev->next;
+		cleanup_device(&ll_dev->dev);
+		free_device(ll_dev);
+		return user_ll_root;
+	} else {
+		if (likely(ll_dev_last != NULL)) {
+			ll_dev_last->next = ll_dev->next;
+			cleanup_device(&ll_dev->dev);
+			free_device(ll_dev);
+			return ll_dev_last->next;
+		} else {
+			cleanup_device(&ll_dev->dev);
+			free_device(ll_dev);
+			RTE_LOG(ERR, VHOST_CONFIG, "Remove entry from config_ll failed\n");
+			return NULL;
+		}
+	}
+}
+
+/**
+ * Returns the root entry of linked list
+ */
+static struct virtio_net_config_ll *
+user_get_config_ll_root(void)
+{
+	return user_ll_root;
+}
+
+/**
+ * vhost-user specific device initialization.
+ */
+static void
+user_init_device(struct vhost_device_ctx ctx, struct virtio_net *dev)
+{
+	dev->priv = ctx.user.drv->priv;
+}
+
+/**
+ * Locate the file containing QEMU's memory space and map it to our address space.
+ */
+static int
+user_host_memory_map(struct virtio_net *dev, struct virtio_memory *mem, int fd, size_t size)
+{
+	void *map;
+
+	map = mmap(0, size, PROT_READ|PROT_WRITE , MAP_POPULATE|MAP_SHARED, fd, 0);
+	close(fd);
+
+	if (map == MAP_FAILED) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Error mapping the file fd %d\n",  dev->device_fh, fd);
+		return -1;
+	}
+
+	/* Store the memory address and size in the device data structure */
+	mem->mapped_address = (uint64_t)(uintptr_t)map;
+	mem->mapped_size = size;
+
+	LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Mem File: fd: %d - Size: %llu - VA: %p\n", dev->device_fh,
+			fd, (long long unsigned)mem->mapped_size, map);
+
+	return 0;
+}
+
+/*
+ * Called from IOCTL: VHOST_SET_MEM_TABLE
+ * This function creates and populates the memory structure for the device. This includes
+ * storing offsets used to translate buffer addresses.
+ */
+static int
+user_set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr, uint32_t nregions)
+{
+	struct virtio_net *dev;
+	struct vhost_memory_region *mem_regions;
+	struct virtio_memory *mem;
+	uint64_t size = offsetof(struct vhost_memory, regions);
+	uint32_t regionidx, valid_regions;
+	size_t guest_memory_size = 0;
+
+	dev = user_get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	if (dev->mem) {
+		munmap((void *)(uintptr_t)dev->mem->mapped_address, (size_t)dev->mem->mapped_size);
+		free(dev->mem);
+	}
+
+	/* Malloc the memory structure depending on the number of regions. */
+	mem = calloc(1, sizeof(struct virtio_memory) + (sizeof(struct virtio_memory_regions) * nregions));
+	if (mem == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to allocate memory for dev->mem.\n", dev->device_fh);
+		return -1;
+	}
+
+	mem->nregions = nregions;
+
+	mem_regions = (void*)(uintptr_t)((uint64_t)(uintptr_t)mem_regions_addr + size);
+
+	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
+		/* Populate the region structure for each region. */
+		mem->regions[regionidx].guest_phys_address = mem_regions[regionidx].guest_phys_addr;
+		mem->regions[regionidx].guest_phys_address_end = mem->regions[regionidx].guest_phys_address +
+			mem_regions[regionidx].memory_size;
+		mem->regions[regionidx].memory_size = mem_regions[regionidx].memory_size;
+		mem->regions[regionidx].userspace_address = mem_regions[regionidx].userspace_addr;
+		guest_memory_size += mem_regions[regionidx].memory_size;
+
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh,
+				regionidx, (void*)(uintptr_t)mem->regions[regionidx].guest_phys_address,
+				(void*)(uintptr_t)mem->regions[regionidx].userspace_address,
+				mem->regions[regionidx].memory_size);
+	}
+
+	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
+		/*set the base address mapping*/
+		if (mem->regions[regionidx].guest_phys_address == 0x0) {
+			mem->base_address = mem->regions[regionidx].userspace_address;
+			/* Map VM memory file */
+			if (user_host_memory_map(dev, mem, ctx.user.fds[regionidx], guest_memory_size) != 0) {
+				free(mem);
+				return -1;
+			}
+		} else
+			close(ctx.user.fds[regionidx]);
+	}
+
+	/* Check that we have a valid base address. */
+	if (mem->base_address == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh);
+		free(mem);
+		return -1;
+	}
+
+	/* Check if all of our regions have valid mappings. Usually one does not exist in the QEMU memory file. */
+	valid_regions = mem->nregions;
+	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
+		if ((mem->regions[regionidx].userspace_address < mem->base_address) ||
+				(mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size)))
+			valid_regions--;
+	}
+
+	/* If a region does not have a valid mapping we rebuild our memory struct to contain only valid entries. */
+	if (valid_regions != mem->nregions) {
+		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n",
+				dev->device_fh);
+
+		/* Re-populate the memory structure with only valid regions. Invalid regions are over-written with memmove. */
+		valid_regions = 0;
+
+		for (regionidx = mem->nregions; 0 != regionidx--;) {
+			if ((mem->regions[regionidx].userspace_address < mem->base_address) ||
+					(mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size))) {
+				memmove(&mem->regions[regionidx], &mem->regions[regionidx + 1],
+						sizeof(struct virtio_memory_regions) * valid_regions);
+			} else {
+				valid_regions++;
+			}
+		}
+	}
+	mem->nregions = valid_regions;
+	dev->mem = mem;
+
+	/*
+	 * Calculate the address offset for each region. This offset is used to identify the vhost virtual address
+	 * corresponding to a QEMU guest physical address.
+	 */
+	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++)
+		dev->mem->regions[regionidx].address_offset = dev->mem->regions[regionidx].userspace_address - dev->mem->base_address
+			+ dev->mem->mapped_address - dev->mem->regions[regionidx].guest_phys_address;
+
+	return 0;
+}
+
+/**
+ * Called from IOCTL: VHOST_GET_VRING_BASE
+ * We send the virtio device our available ring last used index.
+ */
+static int
+user_get_vring_base(struct vhost_device_ctx ctx, uint32_t index, struct vhost_vring_state *state)
+{
+	struct virtio_net *dev;
+
+	dev = user_get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	state->index = index;
+	/* State->index refers to the queue index. The TX queue is 1, RX queue is 0. */
+	state->num = dev->virtqueue[state->index]->last_used_idx;
+
+	return 0;
+}
+
+/**
+ * Called from IOCTL: VHOST_SET_VRING_CALL
+ * The virtio device sends an eventfd to interrupt the guest. This fd gets copied in
+ * to our process space.
+ * This message is also sent when the virtio-net device is reset by the device driver in QEMU.
+ */
+static int
+user_set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
+{
+	struct virtio_net *dev;
+	struct vhost_virtqueue *vq;
+
+	dev = user_get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	/* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */
+	vq = dev->virtqueue[file->index];
+
+	if (vq->kickfd)
+		close((int)vq->kickfd);
+
+	/* Use the eventfd that QEMU passed along with this message. */
+	vq->kickfd = ctx.user.fds[0];
+
+	return 0;
+}
+
+/**
+ * Called from IOCTL: VHOST_SET_VRING_KICK
+ * The virtio device sends an eventfd that it can use to notify us. This fd gets copied in
+ * to our process space.
+ */
+static int
+user_set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
+{
+	struct virtio_net *dev;
+	struct vhost_virtqueue *vq;
+
+	dev = user_get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	/* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */
+	vq = dev->virtqueue[file->index];
+
+	if (vq->callfd)
+		close((int)vq->callfd);
+
+	/* Use the eventfd that QEMU passed along with this message. */
+	vq->callfd = ctx.user.fds[0];
+
+	if ((dev->virtqueue[VIRTIO_RXQ] != NULL) && (dev->virtqueue[VIRTIO_TXQ]) != NULL)
+		return set_backend(ctx, file);
+
+	return 0;
+}
+
+/**
+ * Function pointers are set for the device operations so that the matching
+ * function is called when an IOCTL, device_add or device_release is received.
+ */
+static const struct vhost_net_device_ops vhost_user_device_ops = {
+	.new_device = new_device,
+	.destroy_device = destroy_device,
+
+	.get_features = get_features,
+	.set_features = set_features,
+
+	.set_mem_table = user_set_mem_table,
+
+	.set_vring_num = set_vring_num,
+	.set_vring_addr = set_vring_addr,
+	.set_vring_base = set_vring_base,
+	.get_vring_base = user_get_vring_base,
+
+	.set_vring_kick = user_set_vring_kick,
+	.set_vring_call = user_set_vring_call,
+
+	.set_backend = set_backend,
+
+	.set_owner = set_owner,
+	.reset_owner = reset_owner,
+};
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 13fbb6f..db810e7 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -96,6 +96,12 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
  */
 #include "virtio-net-cdev.c"
 
+/**
+ * Include vhost-user dependent functions and definitions.
+ */
+#include "virtio-net-user.c"
+
+
 /*
  * Retrieves an entry from the devices configuration linked list.
  */
@@ -105,6 +111,8 @@ get_config_ll_entry(struct vhost_device_ctx ctx)
 	switch (ctx.type) {
 	case VHOST_DRV_CUSE:
 		return cdev_get_config_ll_entry(ctx);
+	case VHOST_DRV_USER:
+		return user_get_config_ll_entry(ctx);
 	default:
 		break;
 	}
@@ -120,6 +128,8 @@ get_device(struct vhost_device_ctx ctx)
 	switch (ctx.type) {
 	case VHOST_DRV_CUSE:
 		return cdev_get_device(ctx);
+	case VHOST_DRV_USER:
+		return user_get_device(ctx);
 	default:
 		break;
 	}
@@ -136,6 +146,8 @@ add_config_ll_entry(vhost_driver_type_t type,
 	switch (type) {
 	case VHOST_DRV_CUSE:
 		return cdev_add_config_ll_entry(new_ll_dev);
+	case VHOST_DRV_USER:
+		return user_add_config_ll_entry(new_ll_dev);
 	default:
 		break;
 	}
@@ -149,8 +161,39 @@ cleanup_device(struct virtio_net *dev)
 {
 	/* Unmap QEMU memory file if mapped. */
 	if (dev->mem) {
-		munmap((void *)(uintptr_t)dev->mem->mapped_address,
-			(size_t)dev->mem->mapped_size);
+		{
+			/*
+			 * munmap() fails when mapped_size is not aligned
+			 * to the hugepage size.
+			 * The file backing QEMU physical memory is usually
+			 * hugepage-aligned, so with CUSE this is not a
+			 * problem. With vhost-user, however, there is no
+			 * way to get the physical memory size.
+			 *
+			 * Assume the hugepage size is 2MB or 1GB here.
+			 * Note that mmap() rounds its size parameter up
+			 * to be aligned automatically, but munmap() does not.
+			 */
+			int ret = 0;
+			size_t hugepagesize, size = dev->mem->mapped_size;
+
+			/* assume hugepage size is 2MB */
+			hugepagesize = 2 * 1024 * 1024;
+			size = (size + hugepagesize - 1) /
+						hugepagesize * hugepagesize;
+			ret = munmap((void *)(uintptr_t)
+						dev->mem->mapped_address,
+						size);
+			if (ret) {
+				/* assume hugepage size is 1GB, try again */
+				hugepagesize = 1024 * 1024 * 1024;
+				size = (size + hugepagesize - 1) /
+						hugepagesize * hugepagesize;
+				munmap((void *)(uintptr_t)
+						dev->mem->mapped_address,
+						size);
+			}
+		}
 		free(dev->mem);
 	}
 
@@ -187,6 +230,8 @@ rm_config_ll_entry(vhost_driver_type_t type,
 	switch (type) {
 	case VHOST_DRV_CUSE:
 		return cdev_rm_config_ll_entry(ll_dev, ll_dev_last);
+	case VHOST_DRV_USER:
+		return user_rm_config_ll_entry(ll_dev, ll_dev_last);
 	default:
 		break;
 	}
@@ -201,7 +246,9 @@ get_config_ll_root(struct vhost_device_ctx ctx)
 {
 	switch (ctx.type) {
 	case VHOST_DRV_CUSE:
-		return cdev_get_config_ll_root(ctx);
+		return cdev_get_config_ll_root();
+	case VHOST_DRV_USER:
+		return user_get_config_ll_root();
 	default:
 		break;
 	}
@@ -232,6 +279,8 @@ init_device(struct vhost_device_ctx ctx, struct virtio_net *dev)
 	switch (ctx.type) {
 	case VHOST_DRV_CUSE:
 		return cdev_init_device(ctx, dev);
+	case VHOST_DRV_USER:
+		return user_init_device(ctx, dev);
 	default:
 		break;
 	}
@@ -527,6 +576,8 @@ get_virtio_net_callbacks(vhost_driver_type_t type)
 	switch (type) {
 	case VHOST_DRV_CUSE:
 		return &vhost_cuse_device_ops;
+	case VHOST_DRV_USER:
+		return &vhost_user_device_ops;
 	default:
 		break;
 	}
@@ -570,9 +621,14 @@ int rte_vhost_feature_enable(uint64_t feature_mask)
  * Register ops so that we can add/remove device to data core.
  */
 int
-rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const ops)
+rte_vhost_driver_callback_register(struct vhost_driver *drv,
+		struct virtio_net_device_ops const * const ops, void *priv)
 {
+	if (drv == NULL || ops == NULL)
+		return -1;
+
 	notify_ops = ops;
+	drv->priv = priv;
 
 	return 0;
 }
-- 
1.9.1
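
A side note on the cleanup_device() change in this patch: the size passed to munmap() is rounded up to a multiple of an assumed hugepage size (2MB first, then 1GB). A minimal, self-contained sketch of that rounding is below. The helper name round_up_to() is hypothetical and is not part of librte_vhost; it only illustrates the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not in the patch): round 'size' up to the next
 * multiple of 'align'. This is the same arithmetic cleanup_device() uses
 * before calling munmap().
 */
static size_t
round_up_to(size_t size, size_t align)
{
	assert(align != 0);	/* a zero alignment would divide by zero */
	return (size + align - 1) / align * align;
}
```

For example, a 3MB mapping rounded to a 2MB hugepage size becomes 4MB, while a size that is already aligned is returned unchanged.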


* Re: [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension
  2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
                   ` (6 preceding siblings ...)
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation Tetsuya Mukawa
@ 2014-11-07  3:33 ` Xie, Huawei
  2014-11-07  5:09   ` Tetsuya Mukawa
  7 siblings, 1 reply; 21+ messages in thread
From: Xie, Huawei @ 2014-11-07  3:33 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Tetsuya:
Will do careful review. 
You send all the patches including vhost-user implementation, seems I don't have to send mine, :).
When do you plan to send formal patch?

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tetsuya Mukawa
> Sent: Thursday, November 06, 2014 4:14 AM
> To: dev@dpdk.org
> Cc: nakajima.yoshihiro@lab.ntt.co.jp; masutani.hitoshi@lab.ntt.co.jp
> Subject: [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension
> 
> Hi Xie,
> 
> Here are RFC patches to add vhost-user extension to librte_vhost.
> 
> It seems now you are merging a patch that fixes coding style of
> librte_vhost.
> Unfortunately my patches based on latest tree, so I will submit
> again after your patch is acked.
> Because of this, I haven't check coding style strictly. When I
> rebase on your new patch, I will check coding style too.
> 
> Anyway, could you please check patches?
> 
> Thanks,
> Tetsuya
> 
> Tetsuya Mukawa (7):
>   lib/librte_vhost: Fix host_memory_map() to handle various memory
>     regions
>   lib/librte_vhost: Add an abstraction layer for vhost backends
>   lib/librte_vhost: Add an abstraction layer tointerpret messages
>   lib/librte_vhost: Move vhost vhost-cuse device list and accessor
>     functions
>   lib/librte_vhost: Add a vhost session abstraction
>   lib/librte_vhost: Add vhost-cuse/user specific initialization
>   lib/librte_vhost: Add vhost-user implementation
> 
>  lib/librte_vhost/Makefile          |   2 +-
>  lib/librte_vhost/rte_virtio_net.h  |  49 ++-
>  lib/librte_vhost/vhost-net-cdev.c  |  29 +-
>  lib/librte_vhost/vhost-net-cdev.h  | 113 -------
>  lib/librte_vhost/vhost-net-user.c  | 541 ++++++++++++++++++++++++++++++
>  lib/librte_vhost/vhost-net.c       | 132 ++++++++
>  lib/librte_vhost/vhost-net.h       | 127 +++++++
>  lib/librte_vhost/vhost_rxtx.c      |   2 +-
>  lib/librte_vhost/virtio-net-cdev.c | 624
> ++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/virtio-net-user.c | 410 +++++++++++++++++++++++
>  lib/librte_vhost/virtio-net.c      | 669 ++++++++-----------------------------
>  11 files changed, 2032 insertions(+), 666 deletions(-)
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost-net.c
>  create mode 100644 lib/librte_vhost/vhost-net.h
>  create mode 100644 lib/librte_vhost/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/virtio-net-user.c
> 
> --
> 1.9.1


* Re: [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension
  2014-11-07  3:33 ` [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Xie, Huawei
@ 2014-11-07  5:09   ` Tetsuya Mukawa
       [not found]     ` <C37D651A908B024F974696C65296B57B0F2E3C93@SHSMSX101.ccr.corp.intel.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-07  5:09 UTC (permalink / raw)
  To: Xie, Huawei, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Hi Xie,

(2014/11/07 12:33), Xie, Huawei wrote:
> Tetsuya:
> Will do careful review. 
> You send all the patches including vhost-user implementation, seems I don't have to send mine, :).
> When do you plan to send formal patch?
>
Sorry, I just thought you also wanted to check vhost-user implementation.

Now I am rebasing on your new patch.
Probably next Monday, I will submit only the patches that create the
abstraction layer hiding the vhost-user and vhost-cuse differences from
DPDK apps.

While rebasing, I am still facing several warnings from checkpatch.pl.
I guess some of them were left unfixed intentionally. So could I ignore those?
Of course, I will remove all warnings that come from code I added.

Thanks,
Tetsuya


* Re: [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension
       [not found]     ` <C37D651A908B024F974696C65296B57B0F2E3C93@SHSMSX101.ccr.corp.intel.com>
@ 2014-11-07  6:16       ` Tetsuya Mukawa
  0 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-07  6:16 UTC (permalink / raw)
  To: Xie, Huawei, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Hi Xie,

(2014/11/07 14:24), Xie, Huawei wrote:
> What about vhost-user implementation? 
It seems I misunderstood your comment. I will send all vhost-user files.

> We try to fix all of them if possible, but it is ok to leave some warnings, for example,
>  lengthy lines.
Thanks, I will.

Tetsuya


* Re: [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages Tetsuya Mukawa
@ 2014-11-07 20:43   ` Xie, Huawei
  2014-11-10  5:12     ` Tetsuya Mukawa
  0 siblings, 1 reply; 21+ messages in thread
From: Xie, Huawei @ 2014-11-07 20:43 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

> -struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
> +struct vhost_net_device_ops const *get_virtio_net_callbacks(
> +		vhost_driver_type_t type);

Tetsuya:
I feel that for now it is better to keep the common get_virtio_net_callbacks().
The message flow is: control layer 1 (cuse ioctl or user socket message recv/xmit) -> cuse/user local message handling layer 2 -> common virtio message handling layer 3.
Layer 1 and layer 2 belong to one module. It is that module's choice whether to implement callbacks between its internal layer 1 and layer 2; we don't need to force that.
Besides, even if that module wants to define ops between layer 1 and layer 2, the interface could differ between cuse and user.
Refer to the following code for user:

vhost-user-server.c:
case VHOST_USER_SET_MEM_TABLE:
	user_set_mem_table(ctx, &msg)

virtio-net-user.c:
user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
{

....

	ops->set_mem_table(ctx, regions, memory.nregions);
}
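
A minimal sketch of the three-layer flow described above, for illustration only. All demo_* names are hypothetical and are not part of librte_vhost; only the vhost-user side is shown, on the assumption that a cuse module would reach the same common accessor through its own internal interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 8

struct demo_ctx { uint64_t fh; };	/* hypothetical device context */
struct demo_region { uint64_t guest_phys_addr; uint64_t memory_size; };

/* Layer 3: common virtio message handling, shared by cuse and user. */
struct demo_net_ops {
	int (*set_mem_table)(struct demo_ctx ctx,
			const struct demo_region *regions, uint32_t nregions);
};

static uint32_t demo_last_nregions;	/* records the last accepted table */

static int
demo_set_mem_table(struct demo_ctx ctx,
		const struct demo_region *regions, uint32_t nregions)
{
	(void)ctx;
	if (regions == NULL || nregions == 0)
		return -1;
	demo_last_nregions = nregions;
	return 0;
}

static const struct demo_net_ops demo_ops = {
	.set_mem_table = demo_set_mem_table,
};

/* Common accessor: takes no driver type argument. */
static const struct demo_net_ops *
demo_get_callbacks(void)
{
	return &demo_ops;
}

/* Layer 2 (user): unpack the socket message, then call the common ops. */
struct demo_user_msg {
	uint32_t nregions;
	struct demo_region regions[DEMO_MAX_REGIONS];
};

static int
demo_user_set_mem_table(struct demo_ctx ctx, const struct demo_user_msg *msg)
{
	const struct demo_net_ops *ops = demo_get_callbacks();

	assert(ops != NULL && ops->set_mem_table != NULL);
	if (msg == NULL || msg->nregions > DEMO_MAX_REGIONS)
		return -1;
	return ops->set_mem_table(ctx, msg->regions, msg->nregions);
}

/* Layer 1 (user): dispatch on the request id read from the socket. */
enum { DEMO_USER_SET_MEM_TABLE = 5 };

static int
demo_user_dispatch(struct demo_ctx ctx, int request,
		const struct demo_user_msg *msg)
{
	switch (request) {
	case DEMO_USER_SET_MEM_TABLE:
		return demo_user_set_mem_table(ctx, msg);
	default:
		return -1;	/* unknown request */
	}
}
```

In this sketch the call between layer 1 and layer 2 is a plain function call inside one module, so no ops table is needed there; only layer 3 is reached through the shared accessor.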

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation Tetsuya Mukawa
@ 2014-11-07 21:25   ` Xie, Huawei
  2014-11-10  5:11     ` Tetsuya Mukawa
  2014-11-14  0:07   ` Xie, Huawei
  1 sibling, 1 reply; 21+ messages in thread
From: Xie, Huawei @ 2014-11-07 21:25 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

How about using a client/server model and a select/poll event handling mechanism rather than polling?
Polling could cause periodic jitter.
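
A rough sketch of what event-driven waiting with poll(2) could look like, assuming a connected unix domain stream socket. The demo_* helpers are hypothetical and are not taken from the patch; they only illustrate sleeping in the kernel until data arrives instead of waking up on a fixed interval.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Wait until fd is readable or timeout_ms elapses (-1 waits forever).
 * Returns 1 if readable or the peer hung up, 0 on timeout, -1 on error.
 * Note: a retry after EINTR restarts the full timeout.
 */
static int
demo_wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret;
	if (pfd.revents & POLLNVAL)
		return -1;
	return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}

/*
 * Read exactly len bytes, blocking in poll() between partial reads.
 * Returns 0 on success, -1 on error or if the peer closed the socket.
 */
static int
demo_read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL);
	while (done < len) {
		ssize_t n;

		if (demo_wait_readable(fd, -1) != 1)
			return -1;
		n = recv(fd, p + done, len - done, 0);
		if (n == 0)
			return -1;	/* peer closed the connection */
		if (n < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return -1;
		}
		done += (size_t)n;
	}
	return 0;
}
```

A session handler built this way would call demo_read_full() once for the fixed-size header and once for the payload, and would only wake up when the peer actually sends something.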

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tetsuya Mukawa
> Sent: Thursday, November 06, 2014 4:15 AM
> To: dev@dpdk.org
> Cc: nakajima.yoshihiro@lab.ntt.co.jp; masutani.hitoshi@lab.ntt.co.jp
> Subject: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user
> implementation
> 
> This patch adds vhost-user implementation to librte_vhost.
> To communicate with vhost-user of QEMU, speficy VHOST_DRV_USER as
> a vhost_driver_type_t variable in rte_vhost_driver_register().
> 
> Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
> ---
>  lib/librte_vhost/rte_virtio_net.h  |  19 +-
>  lib/librte_vhost/vhost-net-user.c  | 541
> +++++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/vhost-net.c       |  39 ++-
>  lib/librte_vhost/vhost-net.h       |   7 +
>  lib/librte_vhost/virtio-net-user.c | 410 ++++++++++++++++++++++++++++
>  lib/librte_vhost/virtio-net.c      |  64 ++++-
>  6 files changed, 1073 insertions(+), 7 deletions(-)
>  create mode 100644 lib/librte_vhost/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/virtio-net-user.c
> 
> diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
> index a9e20ea..af07900 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -75,17 +75,32 @@ struct buf_vector {
>   */
>  typedef enum {
>  	VHOST_DRV_CUSE, /* cuse driver */
> +	VHOST_DRV_USER, /* vhost-user driver */
>  	VHOST_DRV_NUM	/* the number of vhost driver types */
>  } vhost_driver_type_t;
> 
> +
> +/**
> + * Structure contains vhost-user session specific information
> + */
> +struct vhost_user_session {
> +	int		fh;		/**< session identifier */
> +	pthread_t	tid;		/**< thread id of session handler */
> +	int		socketfd;	/**< fd of socket */
> +	int		interval;	/**< reconnection interval of session
> */
> +};
> +
>  /**
>   * Structure contains information relating vhost driver.
>   */
>  struct vhost_driver {
>  	vhost_driver_type_t	type;		/**< driver type. */
>  	const char		*dev_name;	/**< accessing device name. */
> +	void			*priv;		/**< private data. */
>  	union {
>  		struct fuse_session *cuse_session;	/**< fuse session. */
> +		struct vhost_user_session *user_session;
> +						/**< vhost-user session. */
>  	};
>  };
> 
> @@ -199,9 +214,11 @@ struct vhost_driver *rte_vhost_driver_register(
>  		const char *dev_name, vhost_driver_type_t type);
> 
>  /* Register callbacks. */
> -int rte_vhost_driver_callback_register(struct virtio_net_device_ops const *
> const);
> +int rte_vhost_driver_callback_register(struct vhost_driver *drv,
> +			struct virtio_net_device_ops const * const, void *priv);
>  /* Start vhost driver session blocking loop. */
>  int rte_vhost_driver_session_start(struct vhost_driver *drv);
> +void rte_vhost_driver_session_stop(struct vhost_driver *drv);
> 
>  /**
>   * This function adds buffers to the virtio devices RX virtqueue. Buffers can
> diff --git a/lib/librte_vhost/vhost-net-user.c b/lib/librte_vhost/vhost-net-user.c
> new file mode 100644
> index 0000000..434f20f
> --- /dev/null
> +++ b/lib/librte_vhost/vhost-net-user.c
> @@ -0,0 +1,541 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (c) 2014 IGEL Co/.Ltd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IGEL nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <linux/un.h>
> +
> +#define VHOST_USER_MAX_DEVICE		(32)
> +#define VHOST_USER_MAX_FD_NUM		(3)
> +
> +/* start id of vhost user device */
> +rte_atomic16_t vhost_user_device_id;
> +
> +static struct vhost_net_device_ops const *ops;
> +
> +typedef enum VhostUserRequest {
> +	VHOST_USER_NONE = 0,
> +	VHOST_USER_GET_FEATURES = 1,
> +	VHOST_USER_SET_FEATURES = 2,
> +	VHOST_USER_SET_OWNER = 3,
> +	VHOST_USER_RESET_OWNER = 4,
> +	VHOST_USER_SET_MEM_TABLE = 5,
> +	VHOST_USER_SET_LOG_BASE = 6,
> +	VHOST_USER_SET_LOG_FD = 7,
> +	VHOST_USER_SET_VRING_NUM = 8,
> +	VHOST_USER_SET_VRING_ADDR = 9,
> +	VHOST_USER_SET_VRING_BASE = 10,
> +	VHOST_USER_GET_VRING_BASE = 11,
> +	VHOST_USER_SET_VRING_KICK = 12,
> +	VHOST_USER_SET_VRING_CALL = 13,
> +	VHOST_USER_SET_VRING_ERR = 14,
> +	VHOST_USER_MAX
> +} VhostUserRequest;
> +
> +#define VHOST_MEMORY_MAX_NREGIONS	8
> +
> +typedef struct VhostUserMemoryRegion {
> +	uint64_t guest_phys_addr;
> +	uint64_t memory_size;
> +	uint64_t userspace_addr;
> +	uint64_t mmap_offset;
> +} VhostUserMemoryRegion;
> +
> +typedef struct VhostUserMemory {
> +	uint32_t nregions;
> +	uint32_t padding;
> +	VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> +} VhostUserMemory;
> +
> +typedef struct VhostUserMsg {
> +	VhostUserRequest request;
> +
> +#define VHOST_USER_VERSION_MASK		(0x3)
> +#define VHOST_USER_REPLY_MASK		(0x1<<2)
> +	uint32_t flags;
> +	uint32_t size; /* the following payload size */
> +	union {
> +#define VHOST_USER_VRING_IDX_MASK	(0xff)
> +#define VHOST_USER_VRING_NOFD_MASK	(0x1<<8)
> +		uint64_t u64;
> +		struct vhost_vring_state state;
> +		struct vhost_vring_addr addr;
> +		VhostUserMemory memory;
> +	};
> +} __attribute__((packed)) VhostUserMsg;
> +
> +static VhostUserMsg m __attribute__ ((unused));
> +#define VHOST_USER_HDR_SIZE	(sizeof(m.request) \
> +		+ sizeof(m.flags) + sizeof(m.size))
> +
> +/* The version of the protocol we support */
> +#define VHOST_USER_VERSION		(0x1)
> +
> +static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
> +	-1,			/* VHOST_USER_NONE */
> +	VHOST_GET_FEATURES,	/* VHOST_USER_GET_FEATURES */
> +	VHOST_SET_FEATURES,	/* VHOST_USER_SET_FEATURES */
> +	VHOST_SET_OWNER,	/* VHOST_USER_SET_OWNER */
> +	VHOST_RESET_OWNER,	/* VHOST_USER_RESET_OWNER */
> +	VHOST_SET_MEM_TABLE,	/* VHOST_USER_SET_MEM_TABLE */
> +	VHOST_SET_LOG_BASE,	/* VHOST_USER_SET_LOG_BASE */
> +	VHOST_SET_LOG_FD,	/* VHOST_USER_SET_LOG_FD */
> +	VHOST_SET_VRING_NUM,	/* VHOST_USER_SET_VRING_NUM */
> +	VHOST_SET_VRING_ADDR,	/* VHOST_USER_SET_VRING_ADDR */
> +	VHOST_SET_VRING_BASE,	/* VHOST_USER_SET_VRING_BASE */
> +	VHOST_GET_VRING_BASE,	/* VHOST_USER_GET_VRING_BASE */
> +	VHOST_SET_VRING_KICK,	/* VHOST_USER_SET_VRING_KICK */
> +	VHOST_SET_VRING_CALL,	/* VHOST_USER_SET_VRING_CALL */
> +	VHOST_SET_VRING_ERR	/* VHOST_USER_SET_VRING_ERR */
> +};
> +
> +/**
> + * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
> when
> + * the device is added to the device linked list.
> + */
> +static struct vhost_device_ctx
> +vhost_driver_to_vhost_ctx(struct vhost_driver *drv)
> +{
> +	struct vhost_device_ctx ctx;
> +	int device_id = drv->user_session->fh;
> +
> +	ctx.type = VHOST_DRV_USER;
> +	ctx.fh = device_id;
> +	ctx.user.drv = drv;
> +
> +	return ctx;
> +}
> +
> +/**
> + * When the device is created in QEMU it gets initialised here and added to the
> device linked list.
> + */
> +static int
> +vhost_user_open(struct vhost_driver *drv)
> +{
> +	struct vhost_device_ctx ctx = vhost_driver_to_vhost_ctx(drv);
> +
> +	int ret;
> +
> +	ret = ops->new_device(ctx);
> +	if (ret == -1)
> +		return -1;
> +
> +	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device configuration
> started\n", ctx.fh);
> +
> +	return 0;
> +}
> +
> +/**
> + * When QEMU is shutdown or killed the device gets released.
> + */
> +static void
> +vhost_user_release(struct vhost_driver *drv)
> +{
> +	struct vhost_device_ctx ctx = vhost_driver_to_vhost_ctx(drv);
> +
> +	ops->destroy_device(ctx);
> +	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device released\n",
> ctx.fh);
> +}
> +
> +/**
> + * Send data to vhost-user device on a QEMU.
> + */
> +static int
> +vhost_user_write(struct vhost_driver *drv, VhostUserMsg *msg,
> +		int *fds, size_t fd_num)
> +{
> +	int fd, len;
> +	size_t fd_size = fd_num * sizeof(int);
> +	char control[CMSG_SPACE(fd_size)];
> +	struct msghdr msg_header;
> +	struct iovec iov[1];
> +	struct cmsghdr *cmsg_header;
> +	struct vhost_device_ctx ctx = vhost_driver_to_vhost_ctx(drv);
> +
> +	if ((drv == NULL) || (msg == NULL))
> +		return -EINVAL;
> +
> +	fd = drv->user_session->socketfd;
> +
> +	memset(&msg_header, 0, sizeof(msg_header));
> +	memset(control, 0, sizeof(control));
> +
> +	/* set the payload */
> +	iov[0].iov_base = (void *)msg;
> +	iov[0].iov_len = VHOST_USER_HDR_SIZE + msg->size;
> +
> +	msg_header.msg_iov = iov;
> +	msg_header.msg_iovlen = 1;
> +
> +	if (fd_num) {
> +		msg_header.msg_control = control;
> +		msg_header.msg_controllen = sizeof(control);
> +		cmsg_header = CMSG_FIRSTHDR(&msg_header);
> +		cmsg_header->cmsg_len = CMSG_LEN(fd_size);
> +		cmsg_header->cmsg_level = SOL_SOCKET;
> +		cmsg_header->cmsg_type = SCM_RIGHTS;
> +		memcpy(CMSG_DATA(cmsg_header), fds, fd_size);
> +	} else {
> +		msg_header.msg_control = 0;
> +		msg_header.msg_controllen = 0;
> +	}
> +
> +	do {
> +		len = sendmsg(fd, &msg_header, 0);
> +	} while (len < 0 && errno == EINTR);
> +
> +	if (len < 0)
> +		goto error;
> +
> +	return 0;
> +
> +error:
> +	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device cannot send
> message\n", ctx.fh);
> +	return -EFAULT;
> +}
> +
> +/**
> + * Receive data from vhost-user device on a QEMU.
> + */
> +static int
> +vhost_user_read(struct vhost_driver *drv, VhostUserMsg *msg,
> +		int *fds, size_t *fd_num)
> +{
> +	int fd, len;
> +	size_t fd_size = (*fd_num) * sizeof(int);
> +	char control[CMSG_SPACE(fd_size)];
> +	struct msghdr msg_header;
> +	struct iovec iov[1];
> +	struct cmsghdr *cmsg_header;
> +	struct vhost_device_ctx ctx = vhost_driver_to_vhost_ctx(drv);
> +
> +	if ((drv == NULL) || (msg == NULL))
> +		return -EINVAL;
> +
> +	fd = drv->user_session->socketfd;
> +
> +	memset(&msg_header, 0, sizeof(msg_header));
> +	memset(control, 0, sizeof(control));
> +	*fd_num = 0;
> +
> +	/* set the payload */
> +	iov[0].iov_base = (void *)msg;
> +	iov[0].iov_len = VHOST_USER_HDR_SIZE;
> +
> +	msg_header.msg_iov = iov;
> +	msg_header.msg_iovlen = 1;
> +	msg_header.msg_control = control;
> +	msg_header.msg_controllen = sizeof(control);
> +
> +	if ((len = recvmsg(fd, &msg_header, 0)) <= 0)
> +		goto error;
> +
> +	if (msg_header.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
> +		goto error;
> +
> +	cmsg_header = CMSG_FIRSTHDR(&msg_header);
> +	if (cmsg_header && cmsg_header->cmsg_len > 0 &&
> +			cmsg_header->cmsg_level == SOL_SOCKET &&
> +			cmsg_header->cmsg_type == SCM_RIGHTS) {
> +		if (fd_size >= cmsg_header->cmsg_len - CMSG_LEN(0)) {
> +			fd_size = cmsg_header->cmsg_len - CMSG_LEN(0);
> +			memcpy(fds, CMSG_DATA(cmsg_header), fd_size);
> +			*fd_num = fd_size / sizeof(int);
> +		}
> +	}
> +
> +	if (read(fd, ((char *)msg) + len, msg->size) < 0)
> +		goto error;
> +
> +	return 0;
> +
> +error:
> +	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device cannot receive
> message\n", ctx.fh);
> +	return -EFAULT;
> +}
> +
> +/*
> + * Boilerplate code for vhost-user IOCTL
> + * Implicit arguments: ctx, req, result.
> + */
> +#define VHOST_USER_IOCTL(func) do {	\
> +	result = (func)(ctx);		\
> +} while (0)
> +
> +/*
> + * Boilerplate code for vhost-user Read IOCTL
> + * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
> + */
> +#define VHOST_USER_IOCTL_R(type, var, func) do {\
> +	result = func(ctx, &(var));		\
> +} while (0)
> +
> +/*
> + * Boilerplate code for vhost-user Write IOCTL
> + * Implicit arguments: ctx, req, result, out_bufsz.
> + */
> +#define	VHOST_USER_IOCTL_W(type, var, func) do {\
> +	result = (func)(ctx, &(var));		\
> +	msg->flags |= VHOST_USER_REPLY_MASK;	\
> +	msg->size = sizeof(type);		\
> +	vhost_user_write(drv, msg, NULL, 0);	\
> +} while (0)
> +
> +/*
> + * Boilerplate code for vhost-user Read/Write IOCTL
> + * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
> + */
> +#define VHOST_USER_IOCTL_RW(type1, var1, type2, var2, func) do {\
> +	result = (func)(ctx, (var1), &(var2));			\
> +	msg->flags |= VHOST_USER_REPLY_MASK;			\
> +	msg->size = sizeof(type2);				\
> +	vhost_user_write(drv, msg, NULL, 0);			\
> +} while (0)
> +
> +/**
> + * The IOCTLs are handled using unix domain socket in userspace.
> + */
> +	static int
> +vhost_user_ioctl(struct vhost_driver *drv, VhostUserMsg *msg,
> +		int *fds, int fd_num)
> +{
> +	struct vhost_device_ctx ctx = vhost_driver_to_vhost_ctx(drv);
> +	struct vhost_vring_file file;
> +	int result = 0;
> +
> +	switch (ioctl_to_vhost_user_request[msg->request]) {
> +	case VHOST_GET_FEATURES:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_GET_FEATURES\n", ctx.fh);
> +		VHOST_USER_IOCTL_W(uint64_t, msg->u64, ops->get_features);
> +		break;
> +
> +	case VHOST_SET_FEATURES:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_SET_FEATURES\n", ctx.fh);
> +		VHOST_USER_IOCTL_R(uint64_t, msg->u64, ops->set_features);
> +		break;
> +
> +	case VHOST_RESET_OWNER:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_RESET_OWNER\n", ctx.fh);
> +		VHOST_USER_IOCTL(ops->reset_owner);
> +		break;
> +
> +	case VHOST_SET_OWNER:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_SET_OWNER\n", ctx.fh);
> +		VHOST_USER_IOCTL(ops->set_owner);
> +		break;
> +
> +	case VHOST_SET_MEM_TABLE:
> +		/*TODO fix race condition.*/
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_SET_MEM_TABLE\n", ctx.fh);
> +		/* all fds should be same, because physical memory consist of
> an one file */
> +		ctx.user.fds = fds;
> +		ctx.user.fd_num = fd_num;
> +		result = ops->set_mem_table(ctx, &msg->memory, msg-
> >memory.nregions);
> +		break;
> +
> +	case VHOST_SET_VRING_NUM:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_SET_VRING_NUM\n", ctx.fh);
> +		VHOST_USER_IOCTL_R(struct vhost_vring_state, msg->state,
> ops->set_vring_num);
> +		break;
> +
> +	case VHOST_SET_VRING_BASE:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_SET_VRING_BASE\n", ctx.fh);
> +		VHOST_USER_IOCTL_R(struct vhost_vring_state, msg->state,
> ops->set_vring_base);
> +		break;
> +
> +	case VHOST_GET_VRING_BASE:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_GET_VRING_BASE\n", ctx.fh);
> +		VHOST_USER_IOCTL_RW(uint32_t, msg->addr.index, struct
> vhost_vring_state, msg->state, ops->get_vring_base);
> +		break;
> +
> +	case VHOST_SET_VRING_ADDR:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_SET_VRING_ADDR\n", ctx.fh);
> +		VHOST_USER_IOCTL_R(struct vhost_vring_addr, msg->addr,
> ops->set_vring_addr);
> +		break;
> +
> +	case VHOST_SET_VRING_KICK:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_SET_VRING_KICK\n", ctx.fh);
> +		ctx.user.fds = fds;
> +		ctx.user.fd_num = fd_num;
> +		file.index = msg->u64;
> +		VHOST_USER_IOCTL_R(struct vhost_vring_file, file, ops-
> >set_vring_kick);
> +		break;
> +
> +	case VHOST_SET_VRING_CALL:
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL:
> VHOST_SET_VRING_CALL\n", ctx.fh);
> +		ctx.user.fds = fds;
> +		ctx.user.fd_num = fd_num;
> +		file.index = msg->u64;
> +		VHOST_USER_IOCTL_R(struct vhost_vring_file, file, ops-
> >set_vring_call);
> +		break;
> +
> +	case VHOST_NET_SET_BACKEND:
> +	default:
> +		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") IOCTL: DOESN
> NOT EXIST\n", ctx.fh);
> +		result = -1;
> +	}
> +
> +	if (result < 0)
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: FAIL\n",
> ctx.fh);
> +	else
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") IOCTL: SUCCESS\n",
> ctx.fh);
> +
> +	return result;
> +}
> +
> +/**
> + * vhost-user specific registration.
> + */
> +static int
> +vhost_user_driver_register(struct vhost_driver *drv)
> +{
> +	if ((drv == NULL) || (drv->dev_name == NULL) ||
> +			(strlen(drv->dev_name) > UNIX_PATH_MAX - 1))
> +		return -1;
> +
> +	ops = get_virtio_net_callbacks(drv->type);
> +
> +	drv->user_session = rte_malloc(NULL, sizeof(struct vhost_user_session),
> CACHE_LINE_SIZE);
> +	if (drv->user_session == NULL)
> +		return -1;
> +
> +	drv->user_session->fh =
> +		rte_atomic16_add_return(&vhost_user_device_id, 1) - 1; /* fh
> of first device is zero */
> +	drv->user_session->interval = 1;
> +
> +	return 0;
> +}
> +
> +/**
> + * When vhost-user driver starts, the session handler communicates with vhost-
> user
> + * device on a QEMU using a unix domain sokcet.
> + */
> +static void *
> +vhost_user_session_handler(void *data)
> +{
> +	struct vhost_driver *drv = data;
> +	int ret;
> +	struct sockaddr_un caddr;
> +	VhostUserMsg msg;
> +	int fds[VHOST_USER_MAX_FD_NUM];
> +	int socketfd;
> +	int interval;
> +	size_t fd_num;
> +
> +	if ((drv == NULL) || (drv->dev_name == NULL))
> +		return NULL;
> +
> +	bzero(&caddr, sizeof(caddr));
> +	caddr.sun_family = AF_LOCAL;
> +	strncpy((char *)&caddr.sun_path, drv->dev_name, strlen(drv-
> >dev_name));
> +
> +reconnect:
> +	drv->user_session->socketfd = socket(AF_UNIX, SOCK_STREAM, 0);
> +	if (drv->user_session->socketfd < 0)
> +		return NULL;
> +
> +	socketfd = drv->user_session->socketfd;
> +	interval = drv->user_session->interval;
> +	while (1) {
> +		ret = connect(socketfd, (struct sockaddr *)&caddr, sizeof(caddr));
> +		if (ret == 0)
> +			break; /* success */
> +		sleep(interval);
> +	}
> +
> +	ret = vhost_user_open(drv);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, VHOST_CONFIG, "(Socket %s) open failuer\n",
> drv->dev_name);
> +		return NULL;
> +	}
> +
> +	for (;;) {
> +		fd_num = VHOST_USER_MAX_FD_NUM;
> +		bzero(&msg, sizeof(VhostUserMsg));
> +		ret = vhost_user_read(drv, &msg, fds, &fd_num);
> +		if (ret != 0) {
> +			RTE_LOG(ERR, VHOST_CONFIG, "(Socket %s) read
> failuer\n", drv->dev_name);
> +			vhost_user_release(drv);
> +			goto reconnect;
> +		}
> +
> +		ret = vhost_user_ioctl(drv, &msg, fds, fd_num);
> +		if (ret != 0) {
> +			RTE_LOG(ERR, VHOST_CONFIG, "(Socket %s) request
> failuer\n", drv->dev_name);
> +			vhost_user_release(drv);
> +			goto reconnect;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * Create session handler
> + */
> +static int
> +vhost_user_driver_start(struct vhost_driver *drv)
> +{
> +	if (pthread_create(&drv->user_session->tid, NULL,
> vhost_user_session_handler, drv)) {
> +		RTE_LOG(ERR, VHOST_CONFIG,
> +				"(Socket %s) starting event handler failuer\n",
> drv->dev_name);
> +		return -1;
> +	}
> +
> +	/* TODO: The event handler thread may need to run on a core user
> speficied. */
> +
> +	return 0;
> +}
> +
> +/**
> + * Destroy session handler
> + */
> +static void
> +vhost_user_driver_stop(struct vhost_driver *drv)
> +{
> +	pthread_t *tid = &drv->user_session->tid;
> +
> +	if (pthread_create(tid, NULL, vhost_user_session_handler, drv)) {
> +		RTE_LOG(ERR, VHOST_CONFIG,
> +				"(Socket %s) starting event handler failuer\n",
> drv->dev_name);
> +		return;
> +	}
> +
> +	/* stop event thread and wait until connection is closed */
> +	if (*tid) {
> +		pthread_cancel(*tid);
> +		pthread_join(*tid, NULL);
> +	}
> +
> +	vhost_user_release(drv);
> +}
> diff --git a/lib/librte_vhost/vhost-net.c b/lib/librte_vhost/vhost-net.c
> index b0de5fd..10f41e9 100644
> --- a/lib/librte_vhost/vhost-net.c
> +++ b/lib/librte_vhost/vhost-net.c
> @@ -42,6 +42,11 @@
>   */
>  #include "vhost-net-cdev.c"
> 
> +/*
> + * Include vhost-user depend functions and definitions
> + */
> +#include "vhost-net-user.c"
> +
>  /**
>   * This function abstracts cuse and vhost-user driver registration.
>   */
> @@ -65,10 +70,17 @@ rte_vhost_driver_register(const char *dev_name,
> vhost_driver_type_t type)
>  		if (ret != 0)
>  			goto err;
>  		break;
> +	case VHOST_DRV_USER:
> +		ret = vhost_user_driver_register(drv);
> +		break;
>  	default:
> +		ret = -EINVAL;
>  		break;
>  	}
> 
> +	if (ret != 0)
> +		goto err;
> +
>  	return drv;
>  err:
>  	free(drv);
> @@ -81,17 +93,40 @@ err:
>  int
>  rte_vhost_driver_session_start(struct vhost_driver *drv)
>  {
> +	int ret;
> +
>  	if (drv == NULL)
>  		return -ENODEV;
> 
>  	switch (drv->type) {
>  	case VHOST_DRV_CUSE:
> -		vhost_cuse_driver_session_start(drv);
> +		ret = vhost_cuse_driver_session_start(drv);
> +		break;
> +	case VHOST_DRV_USER:
> +		ret = vhost_user_driver_start(drv);
>  		break;
>  	default:
> +		ret = -EINVAL;
>  		break;
>  	}
> 
> -	return 0;
> +	return ret;
>  }
> 
> +/**
> + * The vhost session is closed, only allow for vhost-user.
> + */
> +void
> +rte_vhost_driver_session_stop(struct vhost_driver *drv)
> +{
> +	if (drv == NULL)
> +		return;
> +
> +	switch (drv->type) {
> +	case VHOST_DRV_USER:
> +		vhost_user_driver_stop(drv);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
> index ef04832..0e36ba0 100644
> --- a/lib/librte_vhost/vhost-net.h
> +++ b/lib/librte_vhost/vhost-net.h
> @@ -76,6 +76,12 @@ struct vhost_device_cuse_ctx {
>  	pid_t   pid;	/* PID of process calling the IOCTL. */
>  };
> 
> +struct vhost_device_user_ctx {
> +	int			*fds;
> +	int			fd_num;
> +	struct vhost_driver	*drv;
> +};
> +
>  /*
>   * Structure used to identify device context.
>   */
> @@ -83,6 +89,7 @@ struct vhost_device_ctx {
>  	vhost_driver_type_t	type;	/* driver type. */
>  	uint64_t		fh;	/* Populated with fi->fh to track the
> device index. */
>  	union {
> +		struct vhost_device_user_ctx user;
>  		struct vhost_device_cuse_ctx cdev;
>  	};
>  };
> diff --git a/lib/librte_vhost/virtio-net-user.c b/lib/librte_vhost/virtio-net-user.c
> new file mode 100644
> index 0000000..1e78f98
> --- /dev/null
> +++ b/lib/librte_vhost/virtio-net-user.c
> @@ -0,0 +1,410 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (c) 2014 IGEL Co.,Ltd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IGEL nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <stdlib.h>
> +
> +/* Functions defined in virtio_net.c */
> +static void init_device(struct vhost_device_ctx ctx, struct virtio_net *dev);
> +static void cleanup_device(struct virtio_net *dev);
> +static void free_device(struct virtio_net_config_ll *ll_dev);
> +static int new_device(struct vhost_device_ctx ctx);
> +static void destroy_device(struct vhost_device_ctx ctx);
> +static int set_owner(struct vhost_device_ctx ctx);
> +static int reset_owner(struct vhost_device_ctx ctx);
> +static int get_features(struct vhost_device_ctx ctx, uint64_t *pu);
> +static int set_features(struct vhost_device_ctx ctx, uint64_t *pu);
> +static int set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state
> *state);
> +static int set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr
> *addr);
> +static int set_vring_base(struct vhost_device_ctx ctx, struct vhost_vring_state
> *state);
> +static int set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file);
> +
> +/* Root address of the linked list in the configuration core. */
> +static struct virtio_net_config_ll *user_ll_root;
> +
> +/**
> + * Retrieves an entry from the devices configuration linked list.
> + */
> +static struct virtio_net_config_ll *
> +user_get_config_ll_entry(struct vhost_device_ctx ctx)
> +{
> +	struct virtio_net_config_ll *ll_dev = user_ll_root;
> +
> +	/* Loop through linked list until the device_fh is found. */
> +	while (ll_dev != NULL) {
> +		if (ll_dev->dev.device_fh == ctx.fh)
> +			return ll_dev;
> +		ll_dev = ll_dev->next;
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * Searches the configuration core linked list and retrieves the device if it exists.
> + */
> +static struct virtio_net *
> +user_get_device(struct vhost_device_ctx ctx)
> +{
> +	struct virtio_net_config_ll *ll_dev;
> +
> +	ll_dev = user_get_config_ll_entry(ctx);
> +
> +	/* If a matching entry is found in the linked list, return the device in that
> entry. */
> +	if (ll_dev)
> +		return &ll_dev->dev;
> +
> +	RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Device not found in linked
> list.\n", ctx.fh);
> +	return NULL;
> +}
> +
> +/**
> + * Add entry containing a device to the device configuration linked list.
> + */
> +static void
> +user_add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
> +{
> +	struct virtio_net_config_ll *ll_dev = user_ll_root;
> +
> +	/* If ll_dev == NULL then this is the first device so go to else */
> +	if (ll_dev) {
> +		/* If the 1st device_fh != 0 then we insert our device here. */
> +		if (ll_dev->dev.device_fh != 0)	{
> +			new_ll_dev->dev.device_fh = 0;
> +			new_ll_dev->next = ll_dev;
> +			user_ll_root = new_ll_dev;
> +		} else {
> +			/* Increment through the ll until we find un unused
> device_fh. Insert the device at that entry*/
> +			while ((ll_dev->next != NULL) && (ll_dev->dev.device_fh
> == (ll_dev->next->dev.device_fh - 1)))
> +				ll_dev = ll_dev->next;
> +
> +			new_ll_dev->dev.device_fh = ll_dev->dev.device_fh + 1;
> +			new_ll_dev->next = ll_dev->next;
> +			ll_dev->next = new_ll_dev;
> +		}
> +	} else {
> +		user_ll_root = new_ll_dev;
> +		user_ll_root->dev.device_fh = 0;
> +	}
> +
> +}
> +
> +/**
> + * Remove an entry from the device configuration linked list.
> + */
> +static struct virtio_net_config_ll *
> +user_rm_config_ll_entry(struct virtio_net_config_ll *ll_dev, struct
> virtio_net_config_ll *ll_dev_last)
> +{
> +	/* First remove the device and then clean it up. */
> +	if (ll_dev == user_ll_root) {
> +		user_ll_root = ll_dev->next;
> +		cleanup_device(&ll_dev->dev);
> +		free_device(ll_dev);
> +		return user_ll_root;
> +	} else {
> +		if (likely(ll_dev_last != NULL)) {
> +			ll_dev_last->next = ll_dev->next;
> +			cleanup_device(&ll_dev->dev);
> +			free_device(ll_dev);
> +			return ll_dev_last->next;
> +		} else {
> +			cleanup_device(&ll_dev->dev);
> +			free_device(ll_dev);
> +			RTE_LOG(ERR, VHOST_CONFIG, "Remove entry from
> config_ll failed\n");
> +			return NULL;
> +		}
> +	}
> +}
> +
> +/**
> + * Returns the root entry of linked list
> + */
> +static struct virtio_net_config_ll *
> +user_get_config_ll_root(void)
> +{
> +	return user_ll_root;
> +}
> +
> +/**
> + * vhost-user specific device initialization.
> + */
> +static void
> +user_init_device(struct vhost_device_ctx ctx, struct virtio_net *dev)
> +{
> +	dev->priv = ctx.user.drv->priv;
> +}
> +
> +/**
> + * Locate the file containing QEMU's memory space and map it to our address
> space.
> + */
> +static int
> +user_host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
> int fd, size_t size)
> +{
> +	void *map;
> +
> +	map = mmap(0, size, PROT_READ|PROT_WRITE ,
> MAP_POPULATE|MAP_SHARED, fd, 0);
> +	close(fd);
> +
> +	if (map == MAP_FAILED) {
> +		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Error mapping the
> file fd %d\n",  dev->device_fh, fd);
> +		return -1;
> +	}
> +
> +	/* Store the memory address and size in the device data structure */
> +	mem->mapped_address = (uint64_t)(uintptr_t)map;
> +	mem->mapped_size = size;
> +
> +	LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Mem File: fd: %d - Size: %llu
> - VA: %p\n", dev->device_fh,
> +			fd, (long long unsigned)mem->mapped_size, map);
> +
> +	return 0;
> +}
> +
> +/*
> + * Called from IOCTL: VHOST_SET_MEM_TABLE
> + * This function creates and populates the memory structure for the device. This includes
> + * storing offsets used to translate buffer addresses.
> + */
> +static int
> +user_set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr, uint32_t nregions)
> +{
> +	struct virtio_net *dev;
> +	struct vhost_memory_region *mem_regions;
> +	struct virtio_memory *mem;
> +	uint64_t size = offsetof(struct vhost_memory, regions);
> +	uint32_t regionidx, valid_regions;
> +	size_t guest_memory_size = 0;
> +
> +	dev = user_get_device(ctx);
> +	if (dev == NULL)
> +		return -1;
> +
> +	if (dev->mem) {
> +		munmap((void *)(uintptr_t)dev->mem->mapped_address, (size_t)dev->mem->mapped_size);
> +		free(dev->mem);
> +	}
> +
> +	/* Malloc the memory structure depending on the number of regions. */
> +	mem = calloc(1, sizeof(struct virtio_memory) + (sizeof(struct virtio_memory_regions) * nregions));
> +	if (mem == NULL) {
> +		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to allocate memory for dev->mem.\n", dev->device_fh);
> +		return -1;
> +	}
> +
> +	mem->nregions = nregions;
> +
> +	mem_regions = (void*)(uintptr_t)((uint64_t)(uintptr_t)mem_regions_addr + size);
> +
> +	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
> +		/* Populate the region structure for each region. */
> +		mem->regions[regionidx].guest_phys_address = mem_regions[regionidx].guest_phys_addr;
> +		mem->regions[regionidx].guest_phys_address_end = mem->regions[regionidx].guest_phys_address +
> +			mem_regions[regionidx].memory_size;
> +		mem->regions[regionidx].memory_size = mem_regions[regionidx].memory_size;
> +		mem->regions[regionidx].userspace_address = mem_regions[regionidx].userspace_addr;
> +		guest_memory_size += mem_regions[regionidx].memory_size;
> +
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh,
> +				regionidx, (void*)(uintptr_t)mem->regions[regionidx].guest_phys_address,
> +				(void*)(uintptr_t)mem->regions[regionidx].userspace_address,
> +				mem->regions[regionidx].memory_size);
> +	}
> +
> +	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
> +		/*set the base address mapping*/
> +		if (mem->regions[regionidx].guest_phys_address == 0x0) {
> +			mem->base_address = mem->regions[regionidx].userspace_address;
> +			/* Map VM memory file */
> +			if (user_host_memory_map(dev, mem, ctx.user.fds[regionidx], guest_memory_size) != 0) {
> +				free(mem);
> +				return -1;
> +			}
> +		} else
> +			close(ctx.user.fds[regionidx]);
> +	}
> +
> +	/* Check that we have a valid base address. */
> +	if (mem->base_address == 0) {
> +		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh);
> +		free(mem);
> +		return -1;
> +	}
> +
> +	/* Check if all of our regions have valid mappings. Usually one does not exist in the QEMU memory file. */
> +	valid_regions = mem->nregions;
> +	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
> +		if ((mem->regions[regionidx].userspace_address < mem->base_address) ||
> +				(mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size)))
> +			valid_regions--;
> +	}
> +
> +	/* If a region does not have a valid mapping we rebuild our memory struct to contain only valid entries. */
> +	if (valid_regions != mem->nregions) {
> +		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n",
> +				dev->device_fh);
> +
> +		/* Re-populate the memory structure with only valid regions. Invalid regions are over-written with memmove. */
> +		valid_regions = 0;
> +
> +		for (regionidx = mem->nregions; 0 != regionidx--;) {
> +			if ((mem->regions[regionidx].userspace_address < mem->base_address) ||
> +					(mem->regions[regionidx].userspace_address > (mem->base_address + mem->mapped_size))) {
> +				memmove(&mem->regions[regionidx], &mem->regions[regionidx + 1],
> +						sizeof(struct virtio_memory_regions) * valid_regions);
> +			} else {
> +				valid_regions++;
> +			}
> +		}
> +	}
> +	mem->nregions = valid_regions;
> +	dev->mem = mem;
> +
> +	/*
> +	 * Calculate the address offset for each region. This offset is used to identify the vhost virtual address
> +	 * corresponding to a QEMU guest physical address.
> +	 */
> +	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++)
> +		dev->mem->regions[regionidx].address_offset = dev->mem->regions[regionidx].userspace_address - dev->mem->base_address
> +			+ dev->mem->mapped_address - dev->mem->regions[regionidx].guest_phys_address;
> +
> +	return 0;
> +}
> +
> +/**
> + * Called from IOCTL: VHOST_GET_VRING_BASE
> + * We send the virtio device our available ring last used index.
> + */
> +static int
> +user_get_vring_base(struct vhost_device_ctx ctx, uint32_t index, struct vhost_vring_state *state)
> +{
> +	struct virtio_net *dev;
> +
> +	dev = user_get_device(ctx);
> +	if (dev == NULL)
> +		return -1;
> +
> +	state->index = index;
> +	/* State->index refers to the queue index. The TX queue is 1, RX queue is 0. */
> +	state->num = dev->virtqueue[state->index]->last_used_idx;
> +
> +	return 0;
> +}
> +
> +/**
> + * Called from IOCTL: VHOST_SET_VRING_CALL
> + * The virtio device sends an eventfd to interrupt the guest. This fd gets copied in
> + * to our process space.
> + * Also this message is sent when virtio-net device is reset by device driver on QEMU.
> + */
> +static int
> +user_set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> +{
> +	struct virtio_net *dev;
> +	struct vhost_virtqueue *vq;
> +
> +	dev = user_get_device(ctx);
> +	if (dev == NULL)
> +		return -1;
> +
> +	/* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */
> +	vq = dev->virtqueue[file->index];
> +
> +	if (vq->kickfd)
> +		close((int)vq->kickfd);
> +
> +	/* Populate the eventfd_copy structure and call eventfd_copy. */
> +	vq->kickfd = ctx.user.fds[0];
> +
> +	return 0;
> +}
> +
> +/**
> + * Called from IOCTL: VHOST_SET_VRING_KICK
> + * The virtio device sends an eventfd that it can use to notify us. This fd gets copied in
> + * to our process space.
> + */
> +static int
> +user_set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> +{
> +	struct virtio_net *dev;
> +	struct vhost_virtqueue *vq;
> +
> +	dev = user_get_device(ctx);
> +	if (dev == NULL)
> +		return -1;
> +
> +	/* file->index refers to the queue index. The TX queue is 1, RX queue is 0. */
> +	vq = dev->virtqueue[file->index];
> +
> +	if (vq->callfd)
> +		close((int)vq->callfd);
> +
> +	/* Populate the eventfd_copy structure and call eventfd_copy. */
> +	vq->callfd = ctx.user.fds[0];
> +
> +	if ((dev->virtqueue[VIRTIO_RXQ] != NULL) && (dev->virtqueue[VIRTIO_TXQ]) != NULL)
> +		return set_backend(ctx, file);
> +
> +	return 0;
> +}
> +
> +/**
> + * Function pointers are set for the device operations to allow to call functions
> + * when an IOCTL, device_add or device_release is received.
> + */
> +static const struct vhost_net_device_ops vhost_user_device_ops = {
> +	.new_device = new_device,
> +	.destroy_device = destroy_device,
> +
> +	.get_features = get_features,
> +	.set_features = set_features,
> +
> +	.set_mem_table = user_set_mem_table,
> +
> +	.set_vring_num = set_vring_num,
> +	.set_vring_addr = set_vring_addr,
> +	.set_vring_base = set_vring_base,
> +	.get_vring_base = user_get_vring_base,
> +
> +	.set_vring_kick = user_set_vring_kick,
> +	.set_vring_call = user_set_vring_call,
> +
> +	.set_backend = set_backend,
> +
> +	.set_owner = set_owner,
> +	.reset_owner = reset_owner,
> +};
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index 13fbb6f..db810e7 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -96,6 +96,12 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
>   */
>  #include "virtio-net-cdev.c"
> 
> +/**
> + * Include vhost-user depend functions and definitions.
> + */
> +#include "virtio-net-user.c"
> +
> +
>  /*
>   * Retrieves an entry from the devices configuration linked list.
>   */
> @@ -105,6 +111,8 @@ get_config_ll_entry(struct vhost_device_ctx ctx)
>  	switch (ctx.type) {
>  	case VHOST_DRV_CUSE:
>  		return cdev_get_config_ll_entry(ctx);
> +	case VHOST_DRV_USER:
> +		return user_get_config_ll_entry(ctx);
>  	default:
>  		break;
>  	}
> @@ -120,6 +128,8 @@ get_device(struct vhost_device_ctx ctx)
>  	switch (ctx.type) {
>  	case VHOST_DRV_CUSE:
>  		return cdev_get_device(ctx);
> +	case VHOST_DRV_USER:
> +		return user_get_device(ctx);
>  	default:
>  		break;
>  	}
> @@ -136,6 +146,8 @@ add_config_ll_entry(vhost_driver_type_t type,
>  	switch (type) {
>  	case VHOST_DRV_CUSE:
>  		return cdev_add_config_ll_entry(new_ll_dev);
> +	case VHOST_DRV_USER:
> +		return user_add_config_ll_entry(new_ll_dev);
>  	default:
>  		break;
>  	}
> @@ -149,8 +161,39 @@ cleanup_device(struct virtio_net *dev)
>  {
>  	/* Unmap QEMU memory file if mapped. */
>  	if (dev->mem) {
> -		munmap((void *)(uintptr_t)dev->mem->mapped_address,
> -			(size_t)dev->mem->mapped_size);
> +		{
> +			/*
> +			 * 'munmap()' will be failed when mapped_size isn't
> +			 * aligned with hugepage size.
> +			 * Usually a file size of QEMU physical memory is
> +			 * aligned by hugepage size. So In a case of CUSE,
> +			 * there is no problem. But with vhost-user, there is
> +			 * no way to get physical memory size.
> +			 *
> +			 * Let's assume hugepage size is 2MB or 1GB here.
> +			 * BTW, 'mmap()' automatically fixed size parameter
> +			 * to be aligned. Why does 'munmap()' do like so?
> +			 */
> +			int ret = 0;
> +			size_t hugepagesize, size = dev->mem->mapped_size;
> +
> +			/* assume hugepage size is 2MB */
> +			hugepagesize = 2 * 1024 * 1024;
> +			size = (size + hugepagesize - 1) /
> +						hugepagesize * hugepagesize;
> +			ret = munmap((void *)(uintptr_t)
> +						dev->mem->mapped_address,
> +						size);
> +			if (ret) {
> +				/* assume hugepage size is 1GB, try again */
> +				hugepagesize = 1024 * 1024 * 1024;
> +				size = (size + hugepagesize - 1) /
> +						hugepagesize * hugepagesize;
> +				munmap((void *)(uintptr_t)
> +						dev->mem->mapped_address,
> +						size);
> +			}
> +		}
>  		free(dev->mem);
>  	}
> 
> @@ -187,6 +230,8 @@ rm_config_ll_entry(vhost_driver_type_t type,
>  	switch (type) {
>  	case VHOST_DRV_CUSE:
>  		return cdev_rm_config_ll_entry(ll_dev, ll_dev_last);
> +	case VHOST_DRV_USER:
> +		return user_rm_config_ll_entry(ll_dev, ll_dev_last);
>  	default:
>  		break;
>  	}
> @@ -201,7 +246,9 @@ get_config_ll_root(struct vhost_device_ctx ctx)
>  {
>  	switch (ctx.type) {
>  	case VHOST_DRV_CUSE:
> -		return cdev_get_config_ll_root(ctx);
> +		return cdev_get_config_ll_root();
> +	case VHOST_DRV_USER:
> +		return user_get_config_ll_root();
>  	default:
>  		break;
>  	}
> @@ -232,6 +279,8 @@ init_device(struct vhost_device_ctx ctx, struct virtio_net *dev)
>  	switch (ctx.type) {
>  	case VHOST_DRV_CUSE:
>  		return cdev_init_device(ctx, dev);
> +	case VHOST_DRV_USER:
> +		return user_init_device(ctx, dev);
>  	default:
>  		break;
>  	}
> @@ -527,6 +576,8 @@ get_virtio_net_callbacks(vhost_driver_type_t type)
>  	switch (type) {
>  	case VHOST_DRV_CUSE:
>  		return &vhost_cuse_device_ops;
> +	case VHOST_DRV_USER:
> +		return &vhost_user_device_ops;
>  	default:
>  		break;
>  	}
> @@ -570,9 +621,14 @@ int rte_vhost_feature_enable(uint64_t feature_mask)
>   * Register ops so that we can add/remove device to data core.
>   */
>  int
> -rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const ops)
> +rte_vhost_driver_callback_register(struct vhost_driver *drv,
> +		struct virtio_net_device_ops const * const ops, void *priv)
>  {
> +	if (drv == NULL || ops == NULL)
> +		return -1;
> +
>  	notify_ops = ops;
> +	drv->priv = priv;
> 
>  	return 0;
>  }
> --
> 1.9.1

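On the munmap() alignment comment in cleanup_device() above: instead of
guessing 2MB and then 1GB, one option might be to ask the file descriptor for
its block size before it is closed and round the mapping size up to that.
The sketch below is only an illustration. fd_block_size() and align_up() are
hypothetical helper names, and the idea that st_blksize reports the hugepage
size for a hugetlbfs-backed file is an assumption that should be checked on
the target kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the block size reported for fd, or 0 on error.
 * Assumption: for a hugetlbfs-backed file this is the hugepage size.
 */
size_t fd_block_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return 0;
	return (size_t)st.st_blksize;
}

/* Round size up to the next multiple of alignment (same arithmetic as the patch). */
size_t align_up(size_t size, size_t alignment)
{
	assert(alignment != 0);
	return (size + alignment - 1) / alignment * alignment;
}
```

With this, the rounded size could be stored in mem->mapped_size at mmap()
time, so that munmap() later receives a length that is already aligned.
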
^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
  2014-11-07 21:25   ` Xie, Huawei
@ 2014-11-10  5:11     ` Tetsuya Mukawa
  2014-11-10  8:18       ` Xie, Huawei
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-10  5:11 UTC (permalink / raw)
  To: Xie, Huawei, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Hi Xie,

(2014/11/08 6:25), Xie, Huawei wrote:
> How about using a client/server model and a select/poll event handling mechanism rather than polling?
> The polling could cause periodic jitter.
>
Sounds good. I will change it as you suggest.
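
To make sure I understand the suggestion, here is a minimal sketch of an
event-driven loop step: it blocks in poll() until one of the connection fds
is readable and then calls a handler, instead of checking the sockets
periodically. The names wait_and_dispatch(), conn_handler_t, count_handler()
and MAX_CONNS are hypothetical and are not part of the RFC.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CONNS 16

/* Hypothetical per-connection callback. */
typedef void (*conn_handler_t)(int fd, void *arg);

/*
 * Wait up to timeout_ms for any fd to become readable and call handler for
 * each ready fd. Returns the number of fds handled, 0 on timeout, or a
 * negative value if poll() fails.
 */
int wait_and_dispatch(const int *fds, int nfds, int timeout_ms,
		conn_handler_t handler, void *arg)
{
	struct pollfd pfds[MAX_CONNS];
	int i, ready, handled = 0;

	assert(nfds >= 0 && nfds <= MAX_CONNS);
	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}

	ready = poll(pfds, (nfds_t)nfds, timeout_ms);
	if (ready <= 0)
		return ready;

	for (i = 0; i < nfds; i++) {
		if (pfds[i].revents & (POLLIN | POLLHUP)) {
			handler(pfds[i].fd, arg);
			handled++;
		}
	}
	return handled;
}

/* Example handler: consume one byte and count how many were received. */
void count_handler(int fd, void *arg)
{
	char byte;

	if (read(fd, &byte, 1) == 1)
		(*(int *)arg)++;
}
```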

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages
  2014-11-07 20:43   ` Xie, Huawei
@ 2014-11-10  5:12     ` Tetsuya Mukawa
  2014-11-10  8:07       ` Xie, Huawei
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-10  5:12 UTC (permalink / raw)
  To: Xie, Huawei, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Hi Xie,

(2014/11/08 5:43), Xie, Huawei wrote:
>> -struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
>> +struct vhost_net_device_ops const *get_virtio_net_callbacks(
>> +		vhost_driver_type_t type);
> Tetsuya:
> I feel that, for now, it is better to keep the common get_virtio_net_callbacks().
> The message flow is: control layer 1 (cuse ioctl or user socket message recv/xmit) -> cuse/user local message handling layer 2 -> common virtio message handling layer 3.
> Layer 1 and layer 2 belong to one module. It is that module's choice whether to implement callbacks between its internal layer 1 and layer 2. We don't need to force that.
> Besides, even if that module wants to define ops between layer 1 and layer 2, the interface could differ between cuse and user.
> Refer to the following code for user:
>
> vhost-user-server.c:
> case VHOST_USER_SET_MEM_TABLE:
> 	user_set_mem_table(ctx, &msg)
>
> virtio-net-user.c:
> user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
> {
>
> ....
>
> 	ops->set_mem_table(ctx, regions, memory.nregions);
> }
>
>
I may have misunderstood you; please let me know if so.
I think it is difficult to remove 'vhost_driver_type_t' from
'get_virtio_net_callbacks()'.
In the original vhost example code, there are two layers related to
initialization, as you mentioned.
  + Layer1: cuse ioctl handling layer.
  + Layer2: vhost-cuse( = vhost-net) message handling layer.

Layer1 needs function pointers to call Layer2 functions.
'get_virtio_net_callbacks()' is used for that purpose.

My RFC is based on the above, but Layer1 and Layer2 are abstracted to hide
vhost-cuse and vhost-user.
 + Layer1: device control abstraction layer.
 -- Layer1-a: cuse ioctl handling layer.
 -- Layer1-b: unix domain socket handling layer.
 + Layer2: message handling abstraction layer.
 -- Layer2-a: vhost-cuse(vhost-net) message handling layer.
 -- Layer2-b: vhost-user message handling layer.

Layer1 still needs function pointers to Layer2.
So, in any case, we still need to implement 'get_virtio_net_callbacks()'.

Also, as you mentioned, function definitions and behavior differ
between Layer2-a and Layer2-b, as with 'user_set_mem_table()'.
Because of this, 'get_virtio_net_callbacks()' needs to return the correct
function pointers to Layer1.
So I think 'get_virtio_net_callbacks()' needs 'vhost_driver_type_t' to
know which function pointers Layer1 needs.
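
As a rough illustration of what I mean, here is a self-contained sketch.
All names in it (driver_type_t, struct device_ops, get_callbacks(),
layer1_set_mem_table() and the two handlers) are simplified stand-ins that I
made up for this mail, not the real librte_vhost definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for vhost_driver_type_t. */
typedef enum { DRV_CUSE = 1, DRV_USER = 2 } driver_type_t;

/* Simplified stand-in for struct vhost_net_device_ops. */
struct device_ops {
	int (*set_mem_table)(uint32_t nregions);
};

/* Layer2-a and Layer2-b: same prototype, different behavior. */
int cuse_set_mem_table(uint32_t nregions) { (void)nregions; return DRV_CUSE; }
int user_set_mem_table(uint32_t nregions) { (void)nregions; return DRV_USER; }

static const struct device_ops cuse_ops = { .set_mem_table = cuse_set_mem_table };
static const struct device_ops user_ops = { .set_mem_table = user_set_mem_table };

/* The driver type selects which Layer2 table Layer1 receives. */
const struct device_ops *get_callbacks(driver_type_t type)
{
	switch (type) {
	case DRV_CUSE:
		return &cuse_ops;
	case DRV_USER:
		return &user_ops;
	default:
		return NULL;
	}
}

/* Layer1 only sees the ops table, never a backend-specific function name. */
int layer1_set_mem_table(driver_type_t type, uint32_t nregions)
{
	const struct device_ops *ops = get_callbacks(type);

	if (ops == NULL)
		return -1;
	assert(ops->set_mem_table != NULL);
	return ops->set_mem_table(nregions);
}
```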

If someone wants to implement a new vhost backend, of course they can
implement Layer2 and Layer1 together.
In that case, they don't need to call 'get_virtio_net_callbacks()'.
They can also reuse an existing Layer2 implementation by calling
'get_virtio_net_callbacks()' with an existing driver type, or they can
implement a new Layer2 for the new vhost backend.

BTW, the name of 'vhost_driver_type_t' is redundant, I will change the name.

Tetsuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages
  2014-11-10  5:12     ` Tetsuya Mukawa
@ 2014-11-10  8:07       ` Xie, Huawei
  2014-11-10  8:44         ` Tetsuya Mukawa
  0 siblings, 1 reply; 21+ messages in thread
From: Xie, Huawei @ 2014-11-10  8:07 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi



> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Sunday, November 09, 2014 10:13 PM
> To: Xie, Huawei; dev@dpdk.org
> Cc: nakajima.yoshihiro@lab.ntt.co.jp; masutani.hitoshi@lab.ntt.co.jp
> Subject: Re: [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction
> layer tointerpret messages
> 
> Hi Xie,
> 
> (2014/11/08 5:43), Xie, Huawei wrote:
> >> -struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
> >> +struct vhost_net_device_ops const *get_virtio_net_callbacks(
> >> +		vhost_driver_type_t type);
> > Tetsuya:
> > I feel that, for now, it is better to keep the common get_virtio_net_callbacks().
> > The message flow is: control layer 1 (cuse ioctl or user socket message recv/xmit) -> cuse/user local message handling layer 2 -> common virtio message handling layer 3.
> > Layer 1 and layer 2 belong to one module. It is that module's choice whether to implement callbacks between its internal layer 1 and layer 2. We don't need to force that.
> > Besides, even if that module wants to define ops between layer 1 and layer 2, the interface could differ between cuse and user.
> > Refer to the following code for user:
> >
> > vhost-user-server.c:
> > case VHOST_USER_SET_MEM_TABLE:
> > 	user_set_mem_table(ctx, &msg)
> >
> > virtio-net-user.c:
> > user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
> > {
> >
> > ....
> >
> > 	ops->set_mem_table(ctx, regions, memory.nregions);
> > }
> >
> >
> I may have misunderstood you; please let me know if so.
> I think it is difficult to remove 'vhost_driver_type_t' from
> 'get_virtio_net_callbacks()'.
> In the original vhost example code, there are two layers related to
> initialization, as you mentioned.
>   + Layer1: cuse ioctl handling layer.
>   + Layer2: vhost-cuse( = vhost-net) message handling layer.
> 
> Layer1 needs function pointers to call Layer2 functions.
> 'get_virtio_net_callbacks()' is used for that purpose.
> 
> My RFC is based on the above, but Layer1 and Layer2 are abstracted to hide
> vhost-cuse and vhost-user.
>  + Layer1: device control abstraction layer.
>  -- Layer1-a: cuse ioctl handling layer.
>  -- Layer1-b: unix domain socket handling layer.
>  + Layer2: message handling abstraction layer.
>  -- Layer2-a: vhost-cuse(vhost-net) message handling layer.
>  -- Layer2-b: vhost-user message handling layer.
> 
> Layer1 still needs function pointers to Layer2.
> So, in any case, we still need to implement 'get_virtio_net_callbacks()'.
> 
> Also, as you mentioned, function definitions and behavior differ
> between Layer2-a and Layer2-b, as with 'user_set_mem_table()'.
> Because of this, 'get_virtio_net_callbacks()' needs to return the correct
> function pointers to Layer1.
> So I think 'get_virtio_net_callbacks()' needs 'vhost_driver_type_t' to
> know which function pointers Layer1 needs.

Here all layer 2 implementations are required to return the same type of
vhost_net_device_ops function pointers to layer 1, so layer 1 needs to
preprocess its message, or wrap some private message context in something
like vhost_device_ctx, and then pass the message to layer 2.
But since we have a more common layer 3, the virtio-net layer, how about
putting the common message handlers in the virtio-net layer as much as
possible, so that each layer 2 only does its local message preprocessing and
then passes a common message format to layer 3?
I think we at least need to define function pointers between layer 2 and
layer 3.
Layer 1 and layer 2 are actually sub-layers of the same layer. It is that
layer's (cuse/user) implementation choice whether to provide an interface
between them, and the interface could differ in its function prototypes.
Say we are to implement a new vhost: I only care about the common interface
provided by layer 3. I don't want to register another set of callbacks for
my driver that only I use.
Let us think more about this.
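
A rough sketch of the split described above, only to show the shape of it:
the transport-specific layer 2 validates and converts its private message,
then hands a common structure to a single layer 3 handler. Every name here
(struct common_mem_table, struct user_msg, virtio_set_mem_table(),
user_handle_set_mem_table(), MAX_REGIONS) is hypothetical and heavily
simplified; it is not the actual vhost-user message layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8

/* Common message format understood by layer 3 (hypothetical). */
struct common_mem_table {
	uint32_t nregions;
	uint64_t guest_phys_addr[MAX_REGIONS];
};

/* Layer 3: one handler in the virtio-net layer, shared by every backend. */
int virtio_set_mem_table(const struct common_mem_table *tbl)
{
	/* Layer 2 is expected to validate its input before calling in. */
	assert(tbl != NULL);
	if (tbl->nregions == 0 || tbl->nregions > MAX_REGIONS)
		return -1;
	return (int)tbl->nregions;
}

/* Layer 2 (user flavour): private wire format, simplified. */
struct user_msg {
	uint32_t request;
	uint32_t nregions;
	uint64_t gpa[MAX_REGIONS];
};

/* Layer 2 only preprocesses, then passes the common format to layer 3. */
int user_handle_set_mem_table(const struct user_msg *msg)
{
	struct common_mem_table tbl = { 0 };
	uint32_t i;

	if (msg == NULL || msg->nregions > MAX_REGIONS)
		return -1;

	tbl.nregions = msg->nregions;
	for (i = 0; i < msg->nregions; i++)
		tbl.guest_phys_addr[i] = msg->gpa[i];

	return virtio_set_mem_table(&tbl);
}
```

A cuse layer 2 would do the same conversion from its ioctl argument, so a new
backend would only need to know the layer 3 prototype.
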
> 
> If someone wants to implement a new vhost backend, of course they can
> implement Layer2 and Layer1 together.
> In that case, they don't need to call 'get_virtio_net_callbacks()'.
> They can also reuse an existing Layer2 implementation by calling
> 'get_virtio_net_callbacks()' with an existing driver type, or they can
> implement a new Layer2 for the new vhost backend.
> 
> BTW, the name of 'vhost_driver_type_t' is redundant, I will change the name.
> 
> Tetsuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
  2014-11-10  5:11     ` Tetsuya Mukawa
@ 2014-11-10  8:18       ` Xie, Huawei
  2014-11-10  8:55         ` Tetsuya Mukawa
  0 siblings, 1 reply; 21+ messages in thread
From: Xie, Huawei @ 2014-11-10  8:18 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Tetsuya:
I already did this, :), and will publish the code for review after I do some cleanup next week.

> -----Original Message-----
> From: Tetsuya Mukawa [mailto:mukawa@igel.co.jp]
> Sent: Sunday, November 09, 2014 10:11 PM
> To: Xie, Huawei; dev@dpdk.org
> Cc: nakajima.yoshihiro@lab.ntt.co.jp; masutani.hitoshi@lab.ntt.co.jp
> Subject: Re: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user
> implementation
> 
> Hi Xie,
> 
> (2014/11/08 6:25), Xie, Huawei wrote:
> > How about using a client/server model and a select/poll event handling mechanism rather than polling?
> > The polling could cause periodic jitter.
> >
> Sounds good. I will change it as you suggest.
> 
> Thanks,
> Tetsuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages
  2014-11-10  8:07       ` Xie, Huawei
@ 2014-11-10  8:44         ` Tetsuya Mukawa
  0 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-10  8:44 UTC (permalink / raw)
  To: Xie, Huawei, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Hi Xie,

(2014/11/10 17:07), Xie, Huawei wrote:
> Here all layer 2 implementations are required to return the same type of
> vhost_net_device_ops function pointers to layer 1, so layer 1 needs to
> preprocess its message, or wrap some private message context in something
> like vhost_device_ctx, and then pass the message to layer 2.
> But since we have a more common layer 3, the virtio-net layer, how about
> putting the common message handlers in the virtio-net layer as much as
> possible, so that each layer 2 only does its local message preprocessing and
> then passes a common message format to layer 3?
> I think we at least need to define function pointers between layer 2 and
> layer 3.
> Layer 1 and layer 2 are actually sub-layers of the same layer. It is that
> layer's (cuse/user) implementation choice whether to provide an interface
> between them, and the interface could differ in its function prototypes.
> Say we are to implement a new vhost: I only care about the common interface
> provided by layer 3. I don't want to register another set of callbacks for
> my driver that only I use.
> Let us think more about this.
With my RFC implementation, Layer1 sometimes calls Layer2-a or Layer2-b
functions directly.
That may be slightly faster, but it is not well abstracted, because Layer1
sometimes bypasses the virtio common layer.

Anyway, I think it is good to change the implementation as you suggest.
Speed does not matter during initialization, so let's go with the
well-abstracted implementation.

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
  2014-11-10  8:18       ` Xie, Huawei
@ 2014-11-10  8:55         ` Tetsuya Mukawa
  0 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-10  8:55 UTC (permalink / raw)
  To: Xie, Huawei, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

Hi Xie,

(2014/11/10 17:18), Xie, Huawei wrote:
> Tetsuya:
> I already did this, :), and will publish the code for review after I do some cleanup next week.

I appreciate it.
I guess your implementation assumes that all the vhost-user functions you
implemented are called by the virtio common layer. Is that right?

If so, I will change the abstraction layer implementation this week or early next week.
(Please also check the email about 'get_virtio_net_callbacks()'.)

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
  2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation Tetsuya Mukawa
  2014-11-07 21:25   ` Xie, Huawei
@ 2014-11-14  0:07   ` Xie, Huawei
  2014-11-14  4:41     ` Tetsuya Mukawa
  1 sibling, 1 reply; 21+ messages in thread
From: Xie, Huawei @ 2014-11-14  0:07 UTC (permalink / raw)
  To: Tetsuya Mukawa, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

> +struct vhost_device_user_ctx {
> +	int			*fds;
> +	int			fd_num;
> +	struct vhost_driver	*drv;
> +};
> +
>  /*
>   * Structure used to identify device context.
>   */
> @@ -83,6 +89,7 @@ struct vhost_device_ctx {
>  	vhost_driver_type_t	type;	/* driver type. */
>  	uint64_t		fh;	/* Populated with fi->fh to track the
> device index. */
>  	union {
> +		struct vhost_device_user_ctx user;
>  		struct vhost_device_cuse_ctx cdev;
>  	};
>  };

Tetsuya:
It is OK that we define the enum ctx, but so far I don't see an absolute need for a user ctx.
I will send out an RFC patch of my implementation today or tomorrow to make this clearer.

I don't understand why we keep two device lists.
  * In practice, will we allow two drivers to be registered?
     Besides, we have the open question of whether we still need to keep the DPDK cuse driver. It requires maintenance effort
and an extra kernel module.
     Btw, your framework for dynamically registering different vhost drivers is nice!
  * If two drivers access the device list simultaneously, we could add a lock.
 
> +user_get_device(struct vhost_device_ctx ctx)
> +user_add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
> +user_rm_config_ll_entry(struct virtio_net_config_ll *ll_dev, struct
> +user_get_config_ll_root(void)
> +user_init_device(struct vhost_device_ctx ctx, struct virtio_net *dev)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation
  2014-11-14  0:07   ` Xie, Huawei
@ 2014-11-14  4:41     ` Tetsuya Mukawa
  0 siblings, 0 replies; 21+ messages in thread
From: Tetsuya Mukawa @ 2014-11-14  4:41 UTC (permalink / raw)
  To: Xie, Huawei, dev; +Cc: nakajima.yoshihiro, masutani.hitoshi

(2014/11/14 9:07), Xie, Huawei wrote:
>> +struct vhost_device_user_ctx {
>> +	int			*fds;
>> +	int			fd_num;
>> +	struct vhost_driver	*drv;
>> +};
>> +
>>  /*
>>   * Structure used to identify device context.
>>   */
>> @@ -83,6 +89,7 @@ struct vhost_device_ctx {
>>  	vhost_driver_type_t	type;	/* driver type. */
>>  	uint64_t		fh;	/* Populated with fi->fh to track the
>> device index. */
>>  	union {
>> +		struct vhost_device_user_ctx user;
>>  		struct vhost_device_cuse_ctx cdev;
>>  	};
>>  };
> Tetsuya:
> It is OK that we define the enum ctx, but so far I don't see an absolute need for a user ctx.
> I will send out an RFC patch of my implementation today or tomorrow to make this clearer.
Thanks, let's make the implementation as simple as possible.
>
> I don't understand why we keep two device lists.
>   * In practice, will we allow two drivers to be registered?
>      Besides, we have the open question of whether we still need to keep the DPDK cuse driver. It requires maintenance effort
> and an extra kernel module.
>      Btw, your framework for dynamically registering different vhost drivers is nice!
I assume some customers still need to use QEMU versions older than 2.0.
But it is okay for me to remove the vhost-cuse implementation.
What is your and Intel's plan?

>   * If two drivers access the device list simultaneously, we could add a lock.
We may also need to remove some global variables that cannot be shared
between drivers.
To remove these global variables, we may be able to store them in ctx.
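
For example, a single device list could carry its own lock instead of
relying on per-driver globals such as user_ll_root. The sketch below is just
an assumption about how that might look; struct dev_list, struct dev_node
and the dev_list_*() functions are hypothetical names, and a pthread mutex
is used here only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; a real one would embed struct virtio_net. */
struct dev_node {
	uint64_t fh;
	struct dev_node *next;
};

/* One list shared by all drivers, with its own lock. */
struct dev_list {
	pthread_mutex_t lock;
	struct dev_node *root;
};

void dev_list_init(struct dev_list *list)
{
	assert(list != NULL);
	pthread_mutex_init(&list->lock, NULL);
	list->root = NULL;
}

/* Insert at the head of the list. */
void dev_list_add(struct dev_list *list, struct dev_node *node)
{
	pthread_mutex_lock(&list->lock);
	node->next = list->root;
	list->root = node;
	pthread_mutex_unlock(&list->lock);
}

/*
 * Unlink and return the node with the given fh, or NULL if it is not found.
 * Walking a pointer to the link avoids the separate root and "last" cases.
 * The caller owns the returned node and cleans it up outside the lock.
 */
struct dev_node *dev_list_remove(struct dev_list *list, uint64_t fh)
{
	struct dev_node **link, *found = NULL;

	pthread_mutex_lock(&list->lock);
	for (link = &list->root; *link != NULL; link = &(*link)->next) {
		if ((*link)->fh == fh) {
			found = *link;
			*link = found->next;
			found->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&list->lock);
	return found;
}
```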

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-11-14  4:31 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-06 11:14 [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Tetsuya Mukawa
2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 1/7] lib/librte_vhost: Fix host_memory_map() to handle various memory regions Tetsuya Mukawa
2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 2/7] lib/librte_vhost: Add an abstraction layer for vhost backends Tetsuya Mukawa
2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 3/7] lib/librte_vhost: Add an abstraction layer tointerpret messages Tetsuya Mukawa
2014-11-07 20:43   ` Xie, Huawei
2014-11-10  5:12     ` Tetsuya Mukawa
2014-11-10  8:07       ` Xie, Huawei
2014-11-10  8:44         ` Tetsuya Mukawa
2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 4/7] lib/librte_vhost: Move vhost vhost-cuse device list and accessor functions Tetsuya Mukawa
2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 5/7] lib/librte_vhost: Add a vhost session abstraction Tetsuya Mukawa
2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 6/7] lib/librte_vhost: Add vhost-cuse/user specific initialization Tetsuya Mukawa
2014-11-06 11:14 ` [dpdk-dev] [RFC PATCH 7/7] lib/librte_vhost: Add vhost-user implementation Tetsuya Mukawa
2014-11-07 21:25   ` Xie, Huawei
2014-11-10  5:11     ` Tetsuya Mukawa
2014-11-10  8:18       ` Xie, Huawei
2014-11-10  8:55         ` Tetsuya Mukawa
2014-11-14  0:07   ` Xie, Huawei
2014-11-14  4:41     ` Tetsuya Mukawa
2014-11-07  3:33 ` [dpdk-dev] [RFC PATCH 0/7] lib/librte_vhost: Add vhost-user extension Xie, Huawei
2014-11-07  5:09   ` Tetsuya Mukawa
     [not found]     ` <C37D651A908B024F974696C65296B57B0F2E3C93@SHSMSX101.ccr.corp.intel.com>
2014-11-07  6:16       ` Tetsuya Mukawa
