DPDK patches and discussions
* [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support
@ 2015-02-12  5:07 Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Huawei Xie
                   ` (14 more replies)
  0 siblings, 15 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

vhost-user supports passing vring information to a separate vhost-enabled
user space process, normally a user space vSwitch, through a unix domain socket.
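
For readers unfamiliar with the mechanism, here is a minimal sketch of how a
file descriptor (for example an eventfd) can be handed to another process over
a unix domain socket using SCM_RIGHTS ancillary data. It is not the code from
this series: the helper names send_fd() and recv_fd() and the one-byte payload
are assumptions made purely for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor along with a one-byte payload. */
static int
send_fd(int sock, int fd)
{
	char payload = 'F';
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	/* The descriptor travels in the control message, not in the payload. */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor, or return -1 on failure. */
static int
recv_fd(int sock)
{
	char payload;
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	/* The kernel has installed a new descriptor in this process. */
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor is a new number in the receiving process that refers
to the same open file, so no kernel module is needed to copy it across.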

In previous DPDK versions, we implemented a user space character device driver,
vhost-cuse, in the user space DPDK process. vring information is passed to the
cuse driver through ioctl calls, including eventfds for interrupt injection and
host notification. A kernel module was developed to copy these fds from the
qemu process into our process. We also need some tricks to map guest memory.
(TODO: kickfd/callfd is reversed which causes confusion)

Known issue in the vhost-user implementation in QEMU, reported by haifeng.lin@huawei.com:
* QEMU doesn't send correct memory region information in a configuration with multiple NUMA nodes
        http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01454.html

Thanks to Tetsuya for reporting the issue that "FD_ISSET would crash when receive -1
as fd on Ubuntu 14.04".
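
As an illustration of the kind of guard that avoids this class of problem
(POSIX leaves the FD_* macros undefined for a descriptor that is negative or
not below FD_SETSIZE), here is a small sketch. The wrapper names are
hypothetical and are not the functions added in fd_man.c.

```c
#include <stdbool.h>
#include <sys/select.h>

/* True only for descriptors the FD_* macros are defined for. */
static bool
fd_in_range(int fd)
{
	return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical wrapper: add fd to the set, refusing out-of-range values. */
static bool
safe_fd_set(int fd, fd_set *set)
{
	if (!fd_in_range(fd))
		return false;
	FD_SET(fd, set);
	return true;
}

/* Hypothetical wrapper: test membership without touching invalid slots. */
static bool
safe_fd_isset(int fd, fd_set *set)
{
	return fd_in_range(fd) && FD_ISSET(fd, set);
}
```

A caller that keeps -1 in unused slots of its fd table can then pass every
slot through safe_fd_isset() after select() returns.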

Huawei Xie (11):
 enable VIRTIO_NET_F_CTRL_RX
 create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
 rename vhost-net-cdev.h to vhost-net.h
 move fd copying(from qemu process into vhost process) to eventfd_copy.c
 copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
 make host_memory_map a more generic function.
 implement cuse_set_memory_table in virtio-net-cdev.c
 add select based event driven processing
 vhost user support
 support dev->ifname
 support calling rte_vhost_driver_register after rte_vhost_driver_session_start

 lib/librte_vhost/Makefile                     |   8 +-
 lib/librte_vhost/rte_virtio_net.h             |   5 +-
 lib/librte_vhost/vhost-net-cdev.c             | 389 --------------------
 lib/librte_vhost/vhost-net-cdev.h             | 113 ------
 lib/librte_vhost/vhost-net.h                  | 118 +++++++
 lib/librte_vhost/vhost_cuse/eventfd_copy.c    |  88 +++++
 lib/librte_vhost/vhost_cuse/eventfd_copy.h    |  39 ++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 417 ++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 423 ++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  48 +++
 lib/librte_vhost/vhost_rxtx.c                 |   2 +-
 lib/librte_vhost/vhost_user/fd_man.c          | 258 ++++++++++++++
 lib/librte_vhost/vhost_user/fd_man.h          |  67 ++++
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 472 +++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
 lib/librte_vhost/virtio-net.c                 | 491 ++------------------------
 lib/librte_vhost/virtio-net.h                 |  43 +++
 19 files changed, 2491 insertions(+), 959 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net.h
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
 create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
 create mode 100644 lib/librte_vhost/virtio-net.h

-- 
1.8.1.4


* [dpdk-dev] [PATCH v2 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled.
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-16  8:15   ` Tetsuya Mukawa
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 02/11] lib/librte_vhost: create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse Huawei Xie
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

In virtnet_send_command:

	/* Caller should know better */
	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
		(out + in > VIRTNET_SEND_COMMAND_SG_MAX));
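
The dependency can also be expressed as a small consistency check on a feature
mask. This is only a sketch and is not part of the patch: the function name is
hypothetical, and the bit numbers are the ones the virtio-net specification
assigns to these features.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as assigned by the virtio-net specification. */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_NET_F_CTRL_VQ   17
#define VIRTIO_NET_F_CTRL_RX   18

/*
 * Hypothetical check: CTRL_RX commands are sent over the control
 * virtqueue, so offering CTRL_RX without CTRL_VQ is inconsistent.
 */
static bool
ctrl_features_consistent(uint64_t features)
{
	bool has_vq = (features & (1ULL << VIRTIO_NET_F_CTRL_VQ)) != 0;
	bool has_rx = (features & (1ULL << VIRTIO_NET_F_CTRL_RX)) != 0;

	return !has_rx || has_vq;
}
```

With the old VHOST_SUPPORTED_FEATURES value (MRG_RXBUF and CTRL_RX only) this
check returns false; with CTRL_VQ added it returns true.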

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/virtio-net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index b041849..52b4957 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -73,7 +73,8 @@ static struct virtio_net_config_ll *ll_root;
 
 /* Features supported by this lib. */
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
-				  (1ULL << VIRTIO_NET_F_CTRL_RX))
+				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
+				(1ULL << VIRTIO_NET_F_CTRL_RX))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 /* Line size for reading maps file. */
-- 
1.8.1.4


* [dpdk-dev] [PATCH v2 02/11] lib/librte_vhost: create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 03/11] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h Huawei Xie
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

The vhost-cuse driver will be divided into two parts: cuse-driver-specific
message handling (in the vhost_cuse directory) and common message handling
(in virtio-net.c).

A vhost ioctl message is pre-processed in cuse and then sent to virtio-net
if it is not terminated there.

virtio-net.c provides common message handling for both vhost-cuse and vhost-user.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/Makefile                    |   4 +-
 lib/librte_vhost/vhost-net-cdev.c            | 389 ---------------------------
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 389 +++++++++++++++++++++++++++
 3 files changed, 391 insertions(+), 391 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 369c25a..49ae7ae 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -38,10 +38,10 @@ EXPORT_MAP := rte_vhost_version.map
 
 LIBABIVER := 1
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c virtio-net.c vhost_rxtx.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
deleted file mode 100644
index 57c76cb..0000000
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ /dev/null
@@ -1,389 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <errno.h>
-#include <fuse/cuse_lowlevel.h>
-#include <linux/limits.h>
-#include <linux/vhost.h>
-#include <stdint.h>
-#include <string.h>
-#include <unistd.h>
-
-#include <rte_ethdev.h>
-#include <rte_log.h>
-#include <rte_string_fns.h>
-#include <rte_virtio_net.h>
-
-#include "vhost-net-cdev.h"
-
-#define FUSE_OPT_DUMMY "\0\0"
-#define FUSE_OPT_FORE  "-f\0\0"
-#define FUSE_OPT_NOMULTI "-s\0\0"
-
-static const uint32_t default_major = 231;
-static const uint32_t default_minor = 1;
-static const char cuse_device_name[] = "/dev/cuse";
-static const char default_cdev[] = "vhost-net";
-
-static struct fuse_session *session;
-static struct vhost_net_device_ops const *ops;
-
-/*
- * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
- * when the device is added to the device linked list.
- */
-static struct vhost_device_ctx
-fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi)
-{
-	struct vhost_device_ctx ctx;
-	struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);
-
-	ctx.pid = req_ctx->pid;
-	ctx.fh = fi->fh;
-
-	return ctx;
-}
-
-/*
- * When the device is created in QEMU it gets initialised here and
- * added to the device linked list.
- */
-static void
-vhost_net_open(fuse_req_t req, struct fuse_file_info *fi)
-{
-	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
-	int err = 0;
-
-	err = ops->new_device(ctx);
-	if (err == -1) {
-		fuse_reply_err(req, EPERM);
-		return;
-	}
-
-	fi->fh = err;
-
-	RTE_LOG(INFO, VHOST_CONFIG,
-		"(%"PRIu64") Device configuration started\n", fi->fh);
-	fuse_reply_open(req, fi);
-}
-
-/*
- * When QEMU is shutdown or killed the device gets released.
- */
-static void
-vhost_net_release(fuse_req_t req, struct fuse_file_info *fi)
-{
-	int err = 0;
-	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
-
-	ops->destroy_device(ctx);
-	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device released\n", ctx.fh);
-	fuse_reply_err(req, err);
-}
-
-/*
- * Boilerplate code for CUSE IOCTL
- * Implicit arguments: ctx, req, result.
- */
-#define VHOST_IOCTL(func) do {	\
-	result = (func)(ctx);	\
-	fuse_reply_ioctl(req, result, NULL, 0);	\
-} while (0)
-
-/*
- * Boilerplate IOCTL RETRY
- * Implicit arguments: req.
- */
-#define VHOST_IOCTL_RETRY(size_r, size_w) do {	\
-	struct iovec iov_r = { arg, (size_r) };	\
-	struct iovec iov_w = { arg, (size_w) };	\
-	fuse_reply_ioctl_retry(req, &iov_r,	\
-		(size_r) ? 1 : 0, &iov_w, (size_w) ? 1 : 0);\
-} while (0)
-
-/*
- * Boilerplate code for CUSE Read IOCTL
- * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
- */
-#define VHOST_IOCTL_R(type, var, func) do {	\
-	if (!in_bufsz) {	\
-		VHOST_IOCTL_RETRY(sizeof(type), 0);\
-	} else {	\
-		(var) = *(const type*)in_buf;	\
-		result = func(ctx, &(var));	\
-		fuse_reply_ioctl(req, result, NULL, 0);\
-	}	\
-} while (0)
-
-/*
- * Boilerplate code for CUSE Write IOCTL
- * Implicit arguments: ctx, req, result, out_bufsz.
- */
-#define VHOST_IOCTL_W(type, var, func) do {	\
-	if (!out_bufsz) {	\
-		VHOST_IOCTL_RETRY(0, sizeof(type));\
-	} else {	\
-		result = (func)(ctx, &(var));\
-		fuse_reply_ioctl(req, result, &(var), sizeof(type));\
-	} \
-} while (0)
-
-/*
- * Boilerplate code for CUSE Read/Write IOCTL
- * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
- */
-#define VHOST_IOCTL_RW(type1, var1, type2, var2, func) do {	\
-	if (!in_bufsz) {	\
-		VHOST_IOCTL_RETRY(sizeof(type1), sizeof(type2));\
-	} else {	\
-		(var1) = *(const type1*) (in_buf);	\
-		result = (func)(ctx, (var1), &(var2));	\
-		fuse_reply_ioctl(req, result, &(var2), sizeof(type2));\
-	}	\
-} while (0)
-
-/*
- * The IOCTLs are handled using CUSE/FUSE in userspace. Depending on the type
- * of IOCTL a buffer is requested to read or to write. This request is handled
- * by FUSE and the buffer is then given to CUSE.
- */
-static void
-vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
-		struct fuse_file_info *fi, __rte_unused unsigned flags,
-		const void *in_buf, size_t in_bufsz, size_t out_bufsz)
-{
-	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
-	struct vhost_vring_file file;
-	struct vhost_vring_state state;
-	struct vhost_vring_addr addr;
-	uint64_t features;
-	uint32_t index;
-	int result = 0;
-
-	switch (cmd) {
-	case VHOST_NET_SET_BACKEND:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend);
-		break;
-
-	case VHOST_GET_FEATURES:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_GET_FEATURES\n", ctx.fh);
-		VHOST_IOCTL_W(uint64_t, features, ops->get_features);
-		break;
-
-	case VHOST_SET_FEATURES:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_FEATURES\n", ctx.fh);
-		VHOST_IOCTL_R(uint64_t, features, ops->set_features);
-		break;
-
-	case VHOST_RESET_OWNER:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_RESET_OWNER\n", ctx.fh);
-		VHOST_IOCTL(ops->reset_owner);
-		break;
-
-	case VHOST_SET_OWNER:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_OWNER\n", ctx.fh);
-		VHOST_IOCTL(ops->set_owner);
-		break;
-
-	case VHOST_SET_MEM_TABLE:
-		/*TODO fix race condition.*/
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_MEM_TABLE\n", ctx.fh);
-		static struct vhost_memory mem_temp;
-
-		switch (in_bufsz) {
-		case 0:
-			VHOST_IOCTL_RETRY(sizeof(struct vhost_memory), 0);
-			break;
-
-		case sizeof(struct vhost_memory):
-			mem_temp = *(const struct vhost_memory *) in_buf;
-
-			if (mem_temp.nregions > 0) {
-				VHOST_IOCTL_RETRY(sizeof(struct vhost_memory) +
-					(sizeof(struct vhost_memory_region) *
-						mem_temp.nregions), 0);
-			} else {
-				result = -1;
-				fuse_reply_ioctl(req, result, NULL, 0);
-			}
-			break;
-
-		default:
-			result = ops->set_mem_table(ctx,
-					in_buf, mem_temp.nregions);
-			if (result)
-				fuse_reply_err(req, EINVAL);
-			else
-				fuse_reply_ioctl(req, result, NULL, 0);
-		}
-		break;
-
-	case VHOST_SET_VRING_NUM:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_NUM\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_state, state,
-			ops->set_vring_num);
-		break;
-
-	case VHOST_SET_VRING_BASE:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_BASE\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_state, state,
-			ops->set_vring_base);
-		break;
-
-	case VHOST_GET_VRING_BASE:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_GET_VRING_BASE\n", ctx.fh);
-		VHOST_IOCTL_RW(uint32_t, index,
-			struct vhost_vring_state, state, ops->get_vring_base);
-		break;
-
-	case VHOST_SET_VRING_ADDR:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_ADDR\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_addr, addr,
-			ops->set_vring_addr);
-		break;
-
-	case VHOST_SET_VRING_KICK:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file,
-			ops->set_vring_kick);
-		break;
-
-	case VHOST_SET_VRING_CALL:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file,
-			ops->set_vring_call);
-		break;
-
-	default:
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: DOESN NOT EXIST\n", ctx.fh);
-		result = -1;
-		fuse_reply_ioctl(req, result, NULL, 0);
-	}
-
-	if (result < 0)
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: FAIL\n", ctx.fh);
-	else
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: SUCCESS\n", ctx.fh);
-}
-
-/*
- * Structure handling open, release and ioctl function pointers is populated.
- */
-static const struct cuse_lowlevel_ops vhost_net_ops = {
-	.open		= vhost_net_open,
-	.release	= vhost_net_release,
-	.ioctl		= vhost_net_ioctl,
-};
-
-/*
- * cuse_info is populated and used to register the cuse device.
- * vhost_net_device_ops are also passed when the device is registered in app.
- */
-int
-rte_vhost_driver_register(const char *dev_name)
-{
-	struct cuse_info cuse_info;
-	char device_name[PATH_MAX] = "";
-	char char_device_name[PATH_MAX] = "";
-	const char *device_argv[] = { device_name };
-
-	char fuse_opt_dummy[] = FUSE_OPT_DUMMY;
-	char fuse_opt_fore[] = FUSE_OPT_FORE;
-	char fuse_opt_nomulti[] = FUSE_OPT_NOMULTI;
-	char *fuse_argv[] = {fuse_opt_dummy, fuse_opt_fore, fuse_opt_nomulti};
-
-	if (access(cuse_device_name, R_OK | W_OK) < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"char device %s can't be accessed, maybe not exist\n",
-			cuse_device_name);
-		return -1;
-	}
-
-	/*
-	 * The device name is created. This is passed to QEMU so that it can
-	 * register the device with our application.
-	 */
-	snprintf(device_name, PATH_MAX, "DEVNAME=%s", dev_name);
-	snprintf(char_device_name, PATH_MAX, "/dev/%s", dev_name);
-
-	/* Check if device already exists. */
-	if (access(char_device_name, F_OK) != -1) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"char device %s already exists\n", char_device_name);
-		return -1;
-	}
-
-	memset(&cuse_info, 0, sizeof(cuse_info));
-	cuse_info.dev_major = default_major;
-	cuse_info.dev_minor = default_minor;
-	cuse_info.dev_info_argc = 1;
-	cuse_info.dev_info_argv = device_argv;
-	cuse_info.flags = CUSE_UNRESTRICTED_IOCTL;
-
-	ops = get_virtio_net_callbacks();
-
-	session = cuse_lowlevel_setup(3, fuse_argv,
-			&cuse_info, &vhost_net_ops, 0, NULL);
-	if (session == NULL)
-		return -1;
-
-	return 0;
-}
-
-/**
- * The CUSE session is launched allowing the application to receive open,
- * release and ioctl calls.
- */
-int
-rte_vhost_driver_session_start(void)
-{
-	fuse_session_loop(session);
-
-	return 0;
-}
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
new file mode 100644
index 0000000..57c76cb
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -0,0 +1,389 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <errno.h>
+#include <fuse/cuse_lowlevel.h>
+#include <linux/limits.h>
+#include <linux/vhost.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_log.h>
+#include <rte_string_fns.h>
+#include <rte_virtio_net.h>
+
+#include "vhost-net-cdev.h"
+
+#define FUSE_OPT_DUMMY "\0\0"
+#define FUSE_OPT_FORE  "-f\0\0"
+#define FUSE_OPT_NOMULTI "-s\0\0"
+
+static const uint32_t default_major = 231;
+static const uint32_t default_minor = 1;
+static const char cuse_device_name[] = "/dev/cuse";
+static const char default_cdev[] = "vhost-net";
+
+static struct fuse_session *session;
+static struct vhost_net_device_ops const *ops;
+
+/*
+ * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
+ * when the device is added to the device linked list.
+ */
+static struct vhost_device_ctx
+fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi)
+{
+	struct vhost_device_ctx ctx;
+	struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);
+
+	ctx.pid = req_ctx->pid;
+	ctx.fh = fi->fh;
+
+	return ctx;
+}
+
+/*
+ * When the device is created in QEMU it gets initialised here and
+ * added to the device linked list.
+ */
+static void
+vhost_net_open(fuse_req_t req, struct fuse_file_info *fi)
+{
+	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
+	int err = 0;
+
+	err = ops->new_device(ctx);
+	if (err == -1) {
+		fuse_reply_err(req, EPERM);
+		return;
+	}
+
+	fi->fh = err;
+
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"(%"PRIu64") Device configuration started\n", fi->fh);
+	fuse_reply_open(req, fi);
+}
+
+/*
+ * When QEMU is shutdown or killed the device gets released.
+ */
+static void
+vhost_net_release(fuse_req_t req, struct fuse_file_info *fi)
+{
+	int err = 0;
+	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
+
+	ops->destroy_device(ctx);
+	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device released\n", ctx.fh);
+	fuse_reply_err(req, err);
+}
+
+/*
+ * Boilerplate code for CUSE IOCTL
+ * Implicit arguments: ctx, req, result.
+ */
+#define VHOST_IOCTL(func) do {	\
+	result = (func)(ctx);	\
+	fuse_reply_ioctl(req, result, NULL, 0);	\
+} while (0)
+
+/*
+ * Boilerplate IOCTL RETRY
+ * Implicit arguments: req.
+ */
+#define VHOST_IOCTL_RETRY(size_r, size_w) do {	\
+	struct iovec iov_r = { arg, (size_r) };	\
+	struct iovec iov_w = { arg, (size_w) };	\
+	fuse_reply_ioctl_retry(req, &iov_r,	\
+		(size_r) ? 1 : 0, &iov_w, (size_w) ? 1 : 0);\
+} while (0)
+
+/*
+ * Boilerplate code for CUSE Read IOCTL
+ * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
+ */
+#define VHOST_IOCTL_R(type, var, func) do {	\
+	if (!in_bufsz) {	\
+		VHOST_IOCTL_RETRY(sizeof(type), 0);\
+	} else {	\
+		(var) = *(const type*)in_buf;	\
+		result = func(ctx, &(var));	\
+		fuse_reply_ioctl(req, result, NULL, 0);\
+	}	\
+} while (0)
+
+/*
+ * Boilerplate code for CUSE Write IOCTL
+ * Implicit arguments: ctx, req, result, out_bufsz.
+ */
+#define VHOST_IOCTL_W(type, var, func) do {	\
+	if (!out_bufsz) {	\
+		VHOST_IOCTL_RETRY(0, sizeof(type));\
+	} else {	\
+		result = (func)(ctx, &(var));\
+		fuse_reply_ioctl(req, result, &(var), sizeof(type));\
+	} \
+} while (0)
+
+/*
+ * Boilerplate code for CUSE Read/Write IOCTL
+ * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
+ */
+#define VHOST_IOCTL_RW(type1, var1, type2, var2, func) do {	\
+	if (!in_bufsz) {	\
+		VHOST_IOCTL_RETRY(sizeof(type1), sizeof(type2));\
+	} else {	\
+		(var1) = *(const type1*) (in_buf);	\
+		result = (func)(ctx, (var1), &(var2));	\
+		fuse_reply_ioctl(req, result, &(var2), sizeof(type2));\
+	}	\
+} while (0)
+
+/*
+ * The IOCTLs are handled using CUSE/FUSE in userspace. Depending on the type
+ * of IOCTL a buffer is requested to read or to write. This request is handled
+ * by FUSE and the buffer is then given to CUSE.
+ */
+static void
+vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
+		struct fuse_file_info *fi, __rte_unused unsigned flags,
+		const void *in_buf, size_t in_bufsz, size_t out_bufsz)
+{
+	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
+	struct vhost_vring_file file;
+	struct vhost_vring_state state;
+	struct vhost_vring_addr addr;
+	uint64_t features;
+	uint32_t index;
+	int result = 0;
+
+	switch (cmd) {
+	case VHOST_NET_SET_BACKEND:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend);
+		break;
+
+	case VHOST_GET_FEATURES:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_GET_FEATURES\n", ctx.fh);
+		VHOST_IOCTL_W(uint64_t, features, ops->get_features);
+		break;
+
+	case VHOST_SET_FEATURES:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_FEATURES\n", ctx.fh);
+		VHOST_IOCTL_R(uint64_t, features, ops->set_features);
+		break;
+
+	case VHOST_RESET_OWNER:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_RESET_OWNER\n", ctx.fh);
+		VHOST_IOCTL(ops->reset_owner);
+		break;
+
+	case VHOST_SET_OWNER:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_OWNER\n", ctx.fh);
+		VHOST_IOCTL(ops->set_owner);
+		break;
+
+	case VHOST_SET_MEM_TABLE:
+		/*TODO fix race condition.*/
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_MEM_TABLE\n", ctx.fh);
+		static struct vhost_memory mem_temp;
+
+		switch (in_bufsz) {
+		case 0:
+			VHOST_IOCTL_RETRY(sizeof(struct vhost_memory), 0);
+			break;
+
+		case sizeof(struct vhost_memory):
+			mem_temp = *(const struct vhost_memory *) in_buf;
+
+			if (mem_temp.nregions > 0) {
+				VHOST_IOCTL_RETRY(sizeof(struct vhost_memory) +
+					(sizeof(struct vhost_memory_region) *
+						mem_temp.nregions), 0);
+			} else {
+				result = -1;
+				fuse_reply_ioctl(req, result, NULL, 0);
+			}
+			break;
+
+		default:
+			result = ops->set_mem_table(ctx,
+					in_buf, mem_temp.nregions);
+			if (result)
+				fuse_reply_err(req, EINVAL);
+			else
+				fuse_reply_ioctl(req, result, NULL, 0);
+		}
+		break;
+
+	case VHOST_SET_VRING_NUM:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_NUM\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_state, state,
+			ops->set_vring_num);
+		break;
+
+	case VHOST_SET_VRING_BASE:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_BASE\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_state, state,
+			ops->set_vring_base);
+		break;
+
+	case VHOST_GET_VRING_BASE:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_GET_VRING_BASE\n", ctx.fh);
+		VHOST_IOCTL_RW(uint32_t, index,
+			struct vhost_vring_state, state, ops->get_vring_base);
+		break;
+
+	case VHOST_SET_VRING_ADDR:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_ADDR\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_addr, addr,
+			ops->set_vring_addr);
+		break;
+
+	case VHOST_SET_VRING_KICK:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_file, file,
+			ops->set_vring_kick);
+		break;
+
+	case VHOST_SET_VRING_CALL:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_file, file,
+			ops->set_vring_call);
+		break;
+
+	default:
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: DOESN NOT EXIST\n", ctx.fh);
+		result = -1;
+		fuse_reply_ioctl(req, result, NULL, 0);
+	}
+
+	if (result < 0)
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: FAIL\n", ctx.fh);
+	else
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: SUCCESS\n", ctx.fh);
+}
+
+/*
+ * Structure handling open, release and ioctl function pointers is populated.
+ */
+static const struct cuse_lowlevel_ops vhost_net_ops = {
+	.open		= vhost_net_open,
+	.release	= vhost_net_release,
+	.ioctl		= vhost_net_ioctl,
+};
+
+/*
+ * cuse_info is populated and used to register the cuse device.
+ * vhost_net_device_ops are also passed when the device is registered in app.
+ */
+int
+rte_vhost_driver_register(const char *dev_name)
+{
+	struct cuse_info cuse_info;
+	char device_name[PATH_MAX] = "";
+	char char_device_name[PATH_MAX] = "";
+	const char *device_argv[] = { device_name };
+
+	char fuse_opt_dummy[] = FUSE_OPT_DUMMY;
+	char fuse_opt_fore[] = FUSE_OPT_FORE;
+	char fuse_opt_nomulti[] = FUSE_OPT_NOMULTI;
+	char *fuse_argv[] = {fuse_opt_dummy, fuse_opt_fore, fuse_opt_nomulti};
+
+	if (access(cuse_device_name, R_OK | W_OK) < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"char device %s can't be accessed, maybe not exist\n",
+			cuse_device_name);
+		return -1;
+	}
+
+	/*
+	 * The device name is created. This is passed to QEMU so that it can
+	 * register the device with our application.
+	 */
+	snprintf(device_name, PATH_MAX, "DEVNAME=%s", dev_name);
+	snprintf(char_device_name, PATH_MAX, "/dev/%s", dev_name);
+
+	/* Check if device already exists. */
+	if (access(char_device_name, F_OK) != -1) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"char device %s already exists\n", char_device_name);
+		return -1;
+	}
+
+	memset(&cuse_info, 0, sizeof(cuse_info));
+	cuse_info.dev_major = default_major;
+	cuse_info.dev_minor = default_minor;
+	cuse_info.dev_info_argc = 1;
+	cuse_info.dev_info_argv = device_argv;
+	cuse_info.flags = CUSE_UNRESTRICTED_IOCTL;
+
+	ops = get_virtio_net_callbacks();
+
+	session = cuse_lowlevel_setup(3, fuse_argv,
+			&cuse_info, &vhost_net_ops, 0, NULL);
+	if (session == NULL)
+		return -1;
+
+	return 0;
+}
+
+/**
+ * The CUSE session is launched allowing the application to receive open,
+ * release and ioctl calls.
+ */
+int
+rte_vhost_driver_session_start(void)
+{
+	fuse_session_loop(session);
+
+	return 0;
+}
-- 
1.8.1.4


* [dpdk-dev] [PATCH v2 03/11] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 02/11] lib/librte_vhost: create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 04/11] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c Huawei Xie
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

This header defines the common operations provided by virtio-net.c.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
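
The renamed header is built around struct vhost_net_device_ops, a table of
function pointers that the transport code obtains through
get_virtio_net_callbacks() and then calls into. A minimal sketch of that
dispatch pattern is given here for illustration only. Every name in it
(demo_ctx, demo_ops, get_demo_callbacks, DEMO_SUPPORTED_FEATURES) is a
hypothetical stand-in and not part of librte_vhost.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct vhost_device_ctx. */
struct demo_ctx {
	pid_t    pid;	/* caller's pid */
	uint64_t fh;	/* device index */
};

/* Hypothetical, reduced ops table in the style of vhost_net_device_ops. */
struct demo_ops {
	int (*get_features)(struct demo_ctx, uint64_t *);
	int (*set_features)(struct demo_ctx, uint64_t *);
};

/* Feature bits this demo backend claims to support (made-up values). */
#define DEMO_SUPPORTED_FEATURES 0x3ULL

static uint64_t demo_negotiated;

static int
demo_get_features(struct demo_ctx ctx, uint64_t *pu)
{
	(void)ctx;
	*pu = DEMO_SUPPORTED_FEATURES;
	return 0;
}

static int
demo_set_features(struct demo_ctx ctx, uint64_t *pu)
{
	(void)ctx;
	/* Reject any bit that was never advertised. */
	if (*pu & ~DEMO_SUPPORTED_FEATURES)
		return -1;
	demo_negotiated = *pu;
	return 0;
}

static const struct demo_ops demo_callbacks = {
	.get_features = demo_get_features,
	.set_features = demo_set_features,
};

/* The transport layer only ever sees this accessor, not the functions. */
const struct demo_ops *
get_demo_callbacks(void)
{
	return &demo_callbacks;
}
```

Because callers depend only on the table, a second transport can reuse the
same callbacks without knowing how they are implemented.
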
 lib/librte_vhost/vhost-net-cdev.h            | 113 ---------------------------
 lib/librte_vhost/vhost-net.h                 | 113 +++++++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c |   2 +-
 lib/librte_vhost/vhost_rxtx.c                |   2 +-
 lib/librte_vhost/virtio-net.c                |   2 +-
 5 files changed, 116 insertions(+), 116 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net.h

diff --git a/lib/librte_vhost/vhost-net-cdev.h b/lib/librte_vhost/vhost-net-cdev.h
deleted file mode 100644
index 03a5c57..0000000
--- a/lib/librte_vhost/vhost-net-cdev.h
+++ /dev/null
@@ -1,113 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VHOST_NET_CDEV_H_
-#define _VHOST_NET_CDEV_H_
-#include <stdint.h>
-#include <stdio.h>
-#include <sys/types.h>
-#include <unistd.h>
-#include <linux/vhost.h>
-
-#include <rte_log.h>
-
-/* Macros for printing using RTE_LOG */
-#define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
-#define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
-
-#ifdef RTE_LIBRTE_VHOST_DEBUG
-#define VHOST_MAX_PRINT_BUFF 6072
-#define LOG_LEVEL RTE_LOG_DEBUG
-#define LOG_DEBUG(log_type, fmt, args...) RTE_LOG(DEBUG, log_type, fmt, ##args)
-#define PRINT_PACKET(device, addr, size, header) do { \
-	char *pkt_addr = (char *)(addr); \
-	unsigned int index; \
-	char packet[VHOST_MAX_PRINT_BUFF]; \
-	\
-	if ((header)) \
-		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Header size %d: ", (device->device_fh), (size)); \
-	else \
-		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Packet size %d: ", (device->device_fh), (size)); \
-	for (index = 0; index < (size); index++) { \
-		snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), \
-			"%02hhx ", pkt_addr[index]); \
-	} \
-	snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
-	\
-	LOG_DEBUG(VHOST_DATA, "%s", packet); \
-} while (0)
-#else
-#define LOG_LEVEL RTE_LOG_INFO
-#define LOG_DEBUG(log_type, fmt, args...) do {} while (0)
-#define PRINT_PACKET(device, addr, size, header) do {} while (0)
-#endif
-
-
-/*
- * Structure used to identify device context.
- */
-struct vhost_device_ctx {
-	pid_t		pid;	/* PID of process calling the IOCTL. */
-	uint64_t	fh;	/* Populated with fi->fh to track the device index. */
-};
-
-/*
- * Structure contains function pointers to be defined in virtio-net.c. These
- * functions are called in CUSE context and are used to configure devices.
- */
-struct vhost_net_device_ops {
-	int (*new_device)(struct vhost_device_ctx);
-	void (*destroy_device)(struct vhost_device_ctx);
-
-	int (*get_features)(struct vhost_device_ctx, uint64_t *);
-	int (*set_features)(struct vhost_device_ctx, uint64_t *);
-
-	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
-
-	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
-	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *);
-	int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *);
-	int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct vhost_vring_state *);
-
-	int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file *);
-	int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file *);
-
-	int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
-
-	int (*set_owner)(struct vhost_device_ctx);
-	int (*reset_owner)(struct vhost_device_ctx);
-};
-
-
-struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
-#endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
new file mode 100644
index 0000000..03a5c57
--- /dev/null
+++ b/lib/librte_vhost/vhost-net.h
@@ -0,0 +1,113 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOST_NET_CDEV_H_
+#define _VHOST_NET_CDEV_H_
+#include <stdint.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <linux/vhost.h>
+
+#include <rte_log.h>
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
+#define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
+
+#ifdef RTE_LIBRTE_VHOST_DEBUG
+#define VHOST_MAX_PRINT_BUFF 6072
+#define LOG_LEVEL RTE_LOG_DEBUG
+#define LOG_DEBUG(log_type, fmt, args...) RTE_LOG(DEBUG, log_type, fmt, ##args)
+#define PRINT_PACKET(device, addr, size, header) do { \
+	char *pkt_addr = (char *)(addr); \
+	unsigned int index; \
+	char packet[VHOST_MAX_PRINT_BUFF]; \
+	\
+	if ((header)) \
+		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Header size %d: ", (device->device_fh), (size)); \
+	else \
+		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Packet size %d: ", (device->device_fh), (size)); \
+	for (index = 0; index < (size); index++) { \
+		snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), \
+			"%02hhx ", pkt_addr[index]); \
+	} \
+	snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
+	\
+	LOG_DEBUG(VHOST_DATA, "%s", packet); \
+} while (0)
+#else
+#define LOG_LEVEL RTE_LOG_INFO
+#define LOG_DEBUG(log_type, fmt, args...) do {} while (0)
+#define PRINT_PACKET(device, addr, size, header) do {} while (0)
+#endif
+
+
+/*
+ * Structure used to identify device context.
+ */
+struct vhost_device_ctx {
+	pid_t		pid;	/* PID of process calling the IOCTL. */
+	uint64_t	fh;	/* Populated with fi->fh to track the device index. */
+};
+
+/*
+ * Structure contains function pointers to be defined in virtio-net.c. These
+ * functions are called in CUSE context and are used to configure devices.
+ */
+struct vhost_net_device_ops {
+	int (*new_device)(struct vhost_device_ctx);
+	void (*destroy_device)(struct vhost_device_ctx);
+
+	int (*get_features)(struct vhost_device_ctx, uint64_t *);
+	int (*set_features)(struct vhost_device_ctx, uint64_t *);
+
+	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
+
+	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
+	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *);
+	int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *);
+	int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct vhost_vring_state *);
+
+	int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file *);
+	int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file *);
+
+	int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
+
+	int (*set_owner)(struct vhost_device_ctx);
+	int (*reset_owner)(struct vhost_device_ctx);
+};
+
+
+struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
+#endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index 57c76cb..2bb07af 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -44,7 +44,7 @@
 #include <rte_string_fns.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 
 #define FUSE_OPT_DUMMY "\0\0"
 #define FUSE_OPT_FORE  "-f\0\0"
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index ccfd82f..c7c9550 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -38,7 +38,7 @@
 #include <rte_memcpy.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 
 #define MAX_PKT_BURST 32
 
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 52b4957..6bc9d51 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -53,7 +53,7 @@
 #include <rte_memory.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 #include "eventfd_link/eventfd_link.h"
 
 /*
-- 
1.8.1.4


* [dpdk-dev] [PATCH v2 04/11] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (2 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 03/11] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 05/11] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c Huawei Xie
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

vhost-user doesn't need the eventfd kernel module to copy fds between processes.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
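
For context on why no kernel module is needed: a Unix domain socket can carry
open file descriptors between processes as SCM_RIGHTS ancillary data, and the
receiver gets its own descriptor for the same open file. The sketch below
shows that generic mechanism only. send_fd() and recv_fd() are hypothetical
helper names and are not taken from this series.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int
send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1 on error. */
static int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

With vhost-cuse there is no such channel between QEMU and the vhost process,
which is why the eventfd_link module and eventfd_copy() stay on the CUSE side.
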
 lib/librte_vhost/Makefile                    |  2 +-
 lib/librte_vhost/vhost_cuse/eventfd_copy.c   | 88 ++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/eventfd_copy.h   | 39 ++++++++++++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 41 +++++++++----
 lib/librte_vhost/virtio-net.c                | 57 +-----------------
 5 files changed, 161 insertions(+), 66 deletions(-)
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 49ae7ae..88d1295 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -41,7 +41,7 @@ LIBABIVER := 1
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost_cuse/eventfd_copy.c b/lib/librte_vhost/vhost_cuse/eventfd_copy.c
new file mode 100644
index 0000000..4d697a2
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/eventfd_copy.c
@@ -0,0 +1,88 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/eventfd.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <rte_log.h>
+
+#include "eventfd_link/eventfd_link.h"
+#include "eventfd_copy.h"
+#include "vhost-net.h"
+
+static const char eventfd_cdev[] = "/dev/eventfd-link";
+
+/*
+ * This function uses the eventfd_link kernel module to copy an eventfd file
+ * descriptor provided by QEMU into our process space.
+ */
+int
+eventfd_copy(int target_fd, int target_pid)
+{
+	int eventfd_link, ret;
+	struct eventfd_copy eventfd_copy;
+	int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+
+	if (fd == -1)
+		return -1;
+
+	/* Open the character device to the kernel module. */
+	/* TODO: check this earlier rather than failing only when the VM boots! */
+	eventfd_link = open(eventfd_cdev, O_RDWR);
+	if (eventfd_link < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"eventfd_link module is not loaded\n");
+		close(fd);
+		return -1;
+	}
+
+	eventfd_copy.source_fd = fd;
+	eventfd_copy.target_fd = target_fd;
+	eventfd_copy.target_pid = target_pid;
+	/* Call the IOCTL to copy the eventfd. */
+	ret = ioctl(eventfd_link, EVENTFD_COPY, &eventfd_copy);
+	close(eventfd_link);
+
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"EVENTFD_COPY ioctl failed\n");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
diff --git a/lib/librte_vhost/vhost_cuse/eventfd_copy.h b/lib/librte_vhost/vhost_cuse/eventfd_copy.h
new file mode 100644
index 0000000..19ae30d
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/eventfd_copy.h
@@ -0,0 +1,39 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _EVENTFD_H
+#define _EVENTFD_H
+
+int
+eventfd_copy(int target_fd, int target_pid);
+
+#endif
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index 2bb07af..e7794b0 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -45,6 +45,7 @@
 #include <rte_virtio_net.h>
 
 #include "vhost-net.h"
+#include "eventfd_copy.h"
 
 #define FUSE_OPT_DUMMY "\0\0"
 #define FUSE_OPT_FORE  "-f\0\0"
@@ -284,17 +285,37 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
 		break;
 
 	case VHOST_SET_VRING_KICK:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file,
-			ops->set_vring_kick);
-		break;
-
 	case VHOST_SET_VRING_CALL:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file,
-			ops->set_vring_call);
+		if (cmd == VHOST_SET_VRING_KICK)
+			LOG_DEBUG(VHOST_CONFIG,
+				"(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n",
+				ctx.fh);
+		else
+			LOG_DEBUG(VHOST_CONFIG,
+				"(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n",
+				ctx.fh);
+		if (!in_buf)
+			VHOST_IOCTL_RETRY(sizeof(struct vhost_vring_file), 0);
+		else {
+			int fd;
+			file = *(const struct vhost_vring_file *)in_buf;
+			LOG_DEBUG(VHOST_CONFIG,
+				"idx:%d fd:%d\n", file.index, file.fd);
+			fd = eventfd_copy(file.fd, ctx.pid);
+			if (fd < 0) {
+				fuse_reply_ioctl(req, -1, NULL, 0);
+				result = -1;
+				break;
+			}
+			file.fd = fd;
+			if (cmd == VHOST_SET_VRING_KICK) {
+				result = ops->set_vring_kick(ctx, &file);
+				fuse_reply_ioctl(req, result, NULL, 0);
+			} else {
+				result = ops->set_vring_call(ctx, &file);
+				fuse_reply_ioctl(req, result, NULL, 0);
+			}
+		}
 		break;
 
 	default:
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 6bc9d51..da9e3a6 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -38,8 +38,6 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
-#include <sys/eventfd.h>
-#include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <unistd.h>
 
@@ -54,7 +52,6 @@
 #include <rte_virtio_net.h>
 
 #include "vhost-net.h"
-#include "eventfd_link/eventfd_link.h"
 
 /*
  * Device linked list structure for configuration.
@@ -64,8 +61,6 @@ struct virtio_net_config_ll {
 	struct virtio_net_config_ll *next;	/* Next dev on linked list.*/
 };
 
-const char eventfd_cdev[] = "/dev/eventfd-link";
-
 /* device ops to add/remove device to/from data core. */
 static struct virtio_net_device_ops const *notify_ops;
 /* root address of the linked list of managed virtio devices */
@@ -904,37 +899,6 @@ get_vring_base(struct vhost_device_ctx ctx, uint32_t index,
 	return 0;
 }
 
-/*
- * This function uses the eventfd_link kernel module to copy an eventfd file
- * descriptor provided by QEMU in to our process space.
- */
-static int
-eventfd_copy(struct virtio_net *dev, struct eventfd_copy *eventfd_copy)
-{
-	int eventfd_link, ret;
-
-	/* Open the character device to the kernel module. */
-	eventfd_link = open(eventfd_cdev, O_RDWR);
-	if (eventfd_link < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") eventfd_link module is not loaded\n",
-			dev->device_fh);
-		return -1;
-	}
-
-	/* Call the IOCTL to copy the eventfd. */
-	ret = ioctl(eventfd_link, EVENTFD_COPY, eventfd_copy);
-	close(eventfd_link);
-
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") EVENTFD_COPY ioctl failed\n",
-			dev->device_fh);
-		return -1;
-	}
-
-	return 0;
-}
 
 /*
  * Called from CUSE IOCTL: VHOST_SET_VRING_CALL
@@ -945,7 +909,6 @@ static int
 set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
-	struct eventfd_copy	eventfd_kick;
 	struct vhost_virtqueue *vq;
 
 	dev = get_device(ctx);
@@ -958,14 +921,7 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (vq->kickfd)
 		close((int)vq->kickfd);
 
-	/* Populate the eventfd_copy structure and call eventfd_copy. */
-	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-	eventfd_kick.source_fd = vq->kickfd;
-	eventfd_kick.target_fd = file->fd;
-	eventfd_kick.target_pid = ctx.pid;
-
-	if (eventfd_copy(dev, &eventfd_kick))
-		return -1;
+	vq->kickfd = file->fd;
 
 	return 0;
 }
@@ -979,7 +935,6 @@ static int
 set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
-	struct eventfd_copy eventfd_call;
 	struct vhost_virtqueue *vq;
 
 	dev = get_device(ctx);
@@ -991,15 +946,7 @@ set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 
 	if (vq->callfd)
 		close((int)vq->callfd);
-
-	/* Populate the eventfd_copy structure and call eventfd_copy. */
-	vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-	eventfd_call.source_fd = vq->callfd;
-	eventfd_call.target_fd = file->fd;
-	eventfd_call.target_pid = ctx.pid;
-
-	if (eventfd_copy(dev, &eventfd_call))
-		return -1;
+	vq->callfd = file->fd;
 
 	return 0;
 }
-- 
1.8.1.4


* [dpdk-dev] [PATCH v2 05/11] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (3 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 04/11] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 06/11] lib/librte_vhost: make host_memory_map a more generic function Huawei Xie
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 257 ++++++++++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c

diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
new file mode 100644
index 0000000..baca379
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -0,0 +1,257 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <dirent.h>
+#include <linux/vhost.h>
+#include <linux/virtio_net.h>
+#include <fuse/cuse_lowlevel.h>
+#include <stddef.h>
+#include <string.h>
+#include <stdlib.h>
+#include <sys/eventfd.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <rte_log.h>
+
+#include "vhost-net.h"
+
+/* Line size for reading maps file. */
+static const uint32_t BUFSIZE = PATH_MAX;
+
+/* Size of prot char array in procmap. */
+#define PROT_SZ 5
+
+/* Number of elements in procmap struct. */
+#define PROCMAP_SZ 8
+
+/* Structure containing information gathered from maps file. */
+struct procmap {
+	uint64_t va_start;	/* Start virtual address in file. */
+	uint64_t len;		/* Size of file. */
+	uint64_t pgoff;		/* Not used. */
+	uint32_t maj;		/* Not used. */
+	uint32_t min;		/* Not used. */
+	uint32_t ino;		/* Not used. */
+	char prot[PROT_SZ];	/* Not used. */
+	char fname[PATH_MAX];	/* File name. */
+};
+
+/*
+ * Locate the file containing QEMU's memory space and
+ * map it to our address space.
+ */
+static int
+host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
+	pid_t pid, uint64_t addr)
+{
+	struct dirent *dptr = NULL;
+	struct procmap procmap;
+	DIR *dp = NULL;
+	int fd;
+	int i;
+	char memfile[PATH_MAX];
+	char mapfile[PATH_MAX];
+	char procdir[PATH_MAX];
+	char resolved_path[PATH_MAX];
+	char *path = NULL;
+	FILE *fmap;
+	void *map;
+	uint8_t found = 0;
+	char line[BUFSIZE];
+	char dlm[] = "-   :   ";
+	char *str, *sp, *in[PROCMAP_SZ];
+	char *end = NULL;
+
+	/* Path where mem files are located. */
+	snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid);
+	/* Maps file used to locate mem file. */
+	snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid);
+
+	fmap = fopen(mapfile, "r");
+	if (fmap == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to open maps file for pid %d\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+
+	/* Read through the maps file until we find our base address. */
+	while (fgets(line, BUFSIZE, fmap) != 0) {
+		str = line;
+		errno = 0;
+		/* Split line into fields. */
+		for (i = 0; i < PROCMAP_SZ; i++) {
+			in[i] = strtok_r(str, &dlm[i], &sp);
+			if ((in[i] == NULL) || (errno != 0)) {
+				fclose(fmap);
+				return -1;
+			}
+			str = NULL;
+		}
+
+		/* Convert/Copy each field as needed. */
+		procmap.va_start = strtoull(in[0], &end, 16);
+		if ((in[0] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.len = strtoull(in[1], &end, 16);
+		if ((in[1] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.pgoff = strtoull(in[3], &end, 16);
+		if ((in[3] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.maj = strtoul(in[4], &end, 16);
+		if ((in[4] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.min = strtoul(in[5], &end, 16);
+		if ((in[5] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.ino = strtoul(in[6], &end, 16);
+		if ((in[6] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		memcpy(&procmap.prot, in[2], PROT_SZ);
+		memcpy(&procmap.fname, in[7], PATH_MAX);
+
+		if (procmap.va_start == addr) {
+			procmap.len = procmap.len - procmap.va_start;
+			found = 1;
+			break;
+		}
+	}
+	fclose(fmap);
+
+	if (!found) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to find memory file in pid %d maps file\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+
+	/* Find the guest memory file among the process fds. */
+	dp = opendir(procdir);
+	if (dp == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Cannot open pid %d process directory\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+
+	found = 0;
+
+	/* Read the fd directory contents. */
+	while (NULL != (dptr = readdir(dp))) {
+		snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s",
+				pid, dptr->d_name);
+		path = realpath(memfile, resolved_path);
+		if ((path == NULL) && (strlen(resolved_path) == 0)) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"(%"PRIu64") Failed to resolve fd directory\n",
+				dev->device_fh);
+			closedir(dp);
+			return -1;
+		}
+		if (strncmp(resolved_path, procmap.fname,
+			strnlen(procmap.fname, PATH_MAX)) == 0) {
+			found = 1;
+			break;
+		}
+	}
+
+	closedir(dp);
+
+	if (found == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to find memory file for pid %d\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+	/* Open the shared memory file and map the memory into this process. */
+	fd = open(memfile, O_RDWR);
+
+	if (fd == -1) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to open %s for pid %d\n",
+			dev->device_fh, memfile, pid);
+		return -1;
+	}
+
+	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE,
+		MAP_POPULATE|MAP_SHARED, fd, 0);
+	close(fd);
+
+	if (map == MAP_FAILED) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Error mapping the file %s for pid %d\n",
+			dev->device_fh, memfile, pid);
+		return -1;
+	}
+
+	/* Store the memory address and size in the device data structure */
+	mem->mapped_address = (uint64_t)(uintptr_t)map;
+	mem->mapped_size = procmap.len;
+
+	LOG_DEBUG(VHOST_CONFIG,
+		"(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n",
+		dev->device_fh,
+		memfile, resolved_path,
+		(unsigned long long)mem->mapped_size, map);
+
+	return 0;
+}
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [dpdk-dev] [PATCH v2 06/11] lib/librte_vhost: make host_memory_map a more generic function.
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (4 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 05/11] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 07/11] lib/librte_vhost: implement cuse_set_memory_table Huawei Xie
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

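This function accepts a virtual address and the pid of the qemu process, and
maps the memory behind that address into the address space of the current
(vhost) process. The mapped address and size are returned through the
mapped_address and mapped_size output parameters instead of being stored
in struct virtio_memory.

The memory behind the virtual address must be backed by a file,
and the virtual address must be the starting address of that mapping.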
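
As an illustration of the lookup described above, here is a minimal sketch of
pulling the start address, length and backing file out of one line of
/proc/<pid>/maps. It is not the code in this patch (which splits the line
with strtok_r); parse_maps_line is a hypothetical helper, and it assumes the
backing path contains no spaces.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: parse one line of
 * /proc/<pid>/maps of the form
 *   start-end perms offset maj:min inode path
 * Returns 0 and fills start/len/path for file-backed mappings,
 * -1 for malformed lines or mappings without a path.
 */
static int
parse_maps_line(const char *line, uint64_t *start, uint64_t *len,
	char *path, size_t path_sz)
{
	uint64_t lo, hi;
	char buf[4096];
	int n;

	/* perms, offset, maj:min and inode are skipped with %*s. */
	n = sscanf(line, "%" SCNx64 "-%" SCNx64 " %*s %*s %*s %*s %4095s",
		&lo, &hi, buf);
	if (n != 3 || hi < lo)
		return -1;

	snprintf(path, path_sz, "%s", buf);
	*start = lo;
	*len = hi - lo;
	return 0;
}
```

A caller would compare *start against the address it is looking for and, on a
match, use *len as the mmap length, which mirrors what host_memory_map does
with procmap.va_start and procmap.len.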

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 42 +++++++++++++--------------
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index baca379..58ac3dd 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -75,8 +75,8 @@ struct procmap {
  * map it to our address space.
  */
 static int
-host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
-	pid_t pid, uint64_t addr)
+host_memory_map(pid_t pid, uint64_t addr,
+	uint64_t *mapped_address, uint64_t *mapped_size)
 {
 	struct dirent *dptr = NULL;
 	struct procmap procmap;
@@ -104,8 +104,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	fmap = fopen(mapfile, "r");
 	if (fmap == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open maps file for pid %d\n",
-			dev->device_fh, pid);
+			"Failed to open maps file for pid %d\n",
+			pid);
 		return -1;
 	}
 
@@ -179,8 +179,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 
 	if (!found) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file in pid %d maps file\n",
-			dev->device_fh, pid);
+			"Failed to find memory file in pid %d maps file\n",
+			pid);
 		return -1;
 	}
 
@@ -188,8 +188,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	dp = opendir(procdir);
 	if (dp == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Cannot open pid %d process directory\n",
-			dev->device_fh, pid);
+			"Cannot open pid %d process directory\n",
+			pid);
 		return -1;
 	}
 
@@ -202,8 +202,7 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 		path = realpath(memfile, resolved_path);
 		if ((path == NULL) && (strlen(resolved_path) == 0)) {
 			RTE_LOG(ERR, VHOST_CONFIG,
-				"(%"PRIu64") Failed to resolve fd directory\n",
-				dev->device_fh);
+				"Failed to resolve fd directory\n");
 			closedir(dp);
 			return -1;
 		}
@@ -218,8 +217,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 
 	if (found == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file for pid %d\n",
-			dev->device_fh, pid);
+			"Failed to find memory file for pid %d\n",
+			pid);
 		return -1;
 	}
 	/* Open the shared memory file and map the memory into this process. */
@@ -227,31 +226,30 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 
 	if (fd == -1) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open %s for pid %d\n",
-			dev->device_fh, memfile, pid);
+			"Failed to open %s for pid %d\n",
+			memfile, pid);
 		return -1;
 	}
 
 	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE,
-		MAP_POPULATE|MAP_SHARED, fd, 0);
+			MAP_POPULATE|MAP_SHARED, fd, 0);
 	close(fd);
 
 	if (map == MAP_FAILED) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Error mapping the file %s for pid %d\n",
-			dev->device_fh, memfile, pid);
+			"Error mapping the file %s for pid %d\n",
+			memfile, pid);
 		return -1;
 	}
 
 	/* Store the memory address and size in the device data structure */
-	mem->mapped_address = (uint64_t)(uintptr_t)map;
-	mem->mapped_size = procmap.len;
+	*mapped_address = (uint64_t)(uintptr_t)map;
+	*mapped_size = procmap.len;
 
 	LOG_DEBUG(VHOST_CONFIG,
-		"(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n",
-		dev->device_fh,
+		"Mem File: %s->%s - Size: %llu - VA: %p\n",
 		memfile, resolved_path,
-		(unsigned long long)mem->mapped_size, map);
+		(unsigned long long)*mapped_size, map);
 
 	return 0;
 }
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [dpdk-dev] [PATCH v2 07/11] lib/librte_vhost: implement cuse_set_memory_table
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (5 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 06/11] lib/librte_vhost: make host_memory_map a more generic function Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event driven processing Huawei Xie
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

Remove the set_mem_table op from struct vhost_net_device_ops.

vhost-cuse and vhost-user will each implement their own set_mem_table handler.

The current vhost-cuse implementation doesn't support guest NUMA memory;
it assumes that guest memory is backed by only one file.
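
To make the single-file assumption concrete, here is a minimal sketch of the
per-region offset arithmetic that cuse_set_mem_table performs. The struct and
function names below are hypothetical and simplified; only the formula
(mapped address - base address + userspace address - guest physical address)
is taken from the patch.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct virtio_memory_regions. */
struct region_sketch {
	uint64_t gpa;     /* guest physical start address */
	uint64_t size;    /* region size in bytes */
	uint64_t qva;     /* qemu virtual address of the region */
	uint64_t offset;  /* value added to a GPA to get a vhost VA */
};

/*
 * base_qva is the qemu virtual address of the region with GPA 0,
 * mapped_va is where the single backing file was mapped in vhost.
 */
static void
region_set_offset(struct region_sketch *r, uint64_t base_qva,
	uint64_t mapped_va)
{
	r->offset = mapped_va - base_qva + r->qva - r->gpa;
}

/* Translate a guest physical address; returns 0 if no region covers it. */
static uint64_t
gpa_to_vva_sketch(const struct region_sketch *r, uint32_t n, uint64_t gpa)
{
	uint32_t i;

	for (i = 0; i < n; i++) {
		if (gpa >= r[i].gpa && gpa < r[i].gpa + r[i].size)
			return gpa + r[i].offset;
	}
	return 0;
}
```

Because every region is assumed to live in the same file, one mmap and one
base address are enough; with multiple backing files each region would need
its own mapping.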

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/Makefile                     |   2 +-
 lib/librte_vhost/vhost-net.h                  |   4 +-
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  |   7 +-
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 115 +++++++++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  45 ++++
 lib/librte_vhost/virtio-net.c                 | 348 --------------------------
 lib/librte_vhost/virtio-net.h                 |  43 ++++
 7 files changed, 210 insertions(+), 354 deletions(-)
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
 create mode 100644 lib/librte_vhost/virtio-net.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 88d1295..797a806 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -41,7 +41,7 @@ LIBABIVER := 1
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 03a5c57..86b38a5 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -41,6 +41,8 @@
 
 #include <rte_log.h>
 
+#define VHOST_MEMORY_MAX_NREGIONS 8
+
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
 #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
@@ -92,8 +94,6 @@ struct vhost_net_device_ops {
 	int (*get_features)(struct vhost_device_ctx, uint64_t *);
 	int (*set_features)(struct vhost_device_ctx, uint64_t *);
 
-	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
-
 	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
 	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *);
 	int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *);
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index e7794b0..72609a3 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -44,6 +44,7 @@
 #include <rte_string_fns.h>
 #include <rte_virtio_net.h>
 
+#include "virtio-net-cdev.h"
 #include "vhost-net.h"
 #include "eventfd_copy.h"
 
@@ -57,7 +58,7 @@ static const char cuse_device_name[] = "/dev/cuse";
 static const char default_cdev[] = "vhost-net";
 
 static struct fuse_session *session;
-static struct vhost_net_device_ops const *ops;
+struct vhost_net_device_ops const *ops;
 
 /*
  * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
@@ -247,8 +248,8 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
 			break;
 
 		default:
-			result = ops->set_mem_table(ctx,
-					in_buf, mem_temp.nregions);
+			result = cuse_set_mem_table(ctx, in_buf,
+				mem_temp.nregions);
 			if (result)
 				fuse_reply_err(req, EINVAL);
 			else
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index 58ac3dd..adebb54 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -47,7 +47,10 @@
 
 #include <rte_log.h>
 
+#include "rte_virtio_net.h"
 #include "vhost-net.h"
+#include "virtio-net-cdev.h"
+#include "virtio-net.h"
 
 /* Line size for reading maps file. */
 static const uint32_t BUFSIZE = PATH_MAX;
@@ -253,3 +256,115 @@ host_memory_map(pid_t pid, uint64_t addr,
 
 	return 0;
 }
+
+int
+cuse_set_mem_table(struct vhost_device_ctx ctx,
+	const struct vhost_memory *mem_regions_addr, uint32_t nregions)
+{
+	uint64_t size = offsetof(struct vhost_memory, regions);
+	uint32_t idx, valid_regions;
+	struct virtio_memory_regions *pregion;
+	struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
+		((uint64_t)(uintptr_t)mem_regions_addr + size);
+	uint64_t base_address = 0, mapped_address, mapped_size;
+	struct virtio_net *dev;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	if (dev->mem && dev->mem->mapped_address) {
+		munmap((void *)(uintptr_t)dev->mem->mapped_address,
+			(size_t)dev->mem->mapped_size);
+		free(dev->mem);
+		dev->mem = NULL;
+	}
+
+	dev->mem = calloc(1, sizeof(struct virtio_memory) +
+		sizeof(struct virtio_memory_regions) * nregions);
+	if (dev->mem == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to allocate memory for dev->mem\n",
+			dev->device_fh);
+		return -1;
+	}
+
+	pregion = &dev->mem->regions[0];
+
+	for (idx = 0; idx < nregions; idx++) {
+		pregion[idx].guest_phys_address =
+			mem_regions[idx].guest_phys_addr;
+		pregion[idx].guest_phys_address_end =
+			pregion[idx].guest_phys_address +
+			mem_regions[idx].memory_size;
+		pregion[idx].memory_size =
+			mem_regions[idx].memory_size;
+		pregion[idx].userspace_address =
+			mem_regions[idx].userspace_addr;
+
+		LOG_DEBUG(VHOST_CONFIG,
+			"REGION: %u - GPA: %p - QVA: %p - SIZE (%"PRIu64")\n",
+			idx,
+			(void *)(uintptr_t)pregion[idx].guest_phys_address,
+			(void *)(uintptr_t)pregion[idx].userspace_address,
+			pregion[idx].memory_size);
+
+		/*set the base address mapping*/
+		if (pregion[idx].guest_phys_address == 0x0) {
+			base_address =
+				pregion[idx].userspace_address;
+			/* Map VM memory file */
+			if (host_memory_map(ctx.pid, base_address,
+				&mapped_address, &mapped_size) != 0) {
+				free(dev->mem);
+				dev->mem = NULL;
+				return -1;
+			}
+			dev->mem->mapped_address = mapped_address;
+			dev->mem->base_address = base_address;
+			dev->mem->mapped_size = mapped_size;
+		}
+	}
+
+	/* Check that we have a valid base address. */
+	if (base_address == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to find base address of qemu memory file.\n");
+		free(dev->mem);
+		dev->mem = NULL;
+		return -1;
+	}
+
+	valid_regions = nregions;
+	for (idx = 0; idx < nregions; idx++) {
+		if ((pregion[idx].userspace_address < base_address) ||
+			(pregion[idx].userspace_address >
+			(base_address + mapped_size)))
+			valid_regions--;
+	}
+
+
+	if (valid_regions != nregions) {
+		valid_regions = 0;
+		for (idx = nregions; 0 != idx--; ) {
+			if ((pregion[idx].userspace_address < base_address) ||
+			(pregion[idx].userspace_address >
+			(base_address + mapped_size))) {
+				memmove(&pregion[idx], &pregion[idx + 1],
+					sizeof(struct virtio_memory_regions) *
+					valid_regions);
+			} else
+				valid_regions++;
+		}
+	}
+
+	for (idx = 0; idx < valid_regions; idx++) {
+		pregion[idx].address_offset =
+			mapped_address - base_address +
+			pregion[idx].userspace_address -
+			pregion[idx].guest_phys_address;
+	}
+	dev->mem->nregions = valid_regions;
+
+	return 0;
+}
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
new file mode 100644
index 0000000..5ee81b1
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _VIRTIO_NET_CDEV_H
+#define _VIRTIO_NET_CDEV_H
+
+#include <stdint.h>
+#include <linux/vhost.h>
+
+#include "vhost-net.h"
+
+int
+cuse_set_mem_table(struct vhost_device_ctx ctx,
+	const struct vhost_memory *mem_regions_addr, uint32_t nregions);
+
+#endif
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index da9e3a6..c490f19 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -31,8 +31,6 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <dirent.h>
-#include <fuse/cuse_lowlevel.h>
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
 #include <stddef.h>
@@ -72,26 +70,6 @@ static struct virtio_net_config_ll *ll_root;
 				(1ULL << VIRTIO_NET_F_CTRL_RX))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
-/* Line size for reading maps file. */
-static const uint32_t BUFSIZE = PATH_MAX;
-
-/* Size of prot char array in procmap. */
-#define PROT_SZ 5
-
-/* Number of elements in procmap struct. */
-#define PROCMAP_SZ 8
-
-/* Structure containing information gathered from maps file. */
-struct procmap {
-	uint64_t va_start;	/* Start virtual address in file. */
-	uint64_t len;		/* Size of file. */
-	uint64_t pgoff;		/* Not used. */
-	uint32_t maj;		/* Not used. */
-	uint32_t min;		/* Not used. */
-	uint32_t ino;		/* Not used. */
-	char prot[PROT_SZ];	/* Not used. */
-	char fname[PATH_MAX];	/* File name. */
-};
 
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
@@ -118,191 +96,6 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
 	return vhost_va;
 }
 
-/*
- * Locate the file containing QEMU's memory space and
- * map it to our address space.
- */
-static int
-host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
-	pid_t pid, uint64_t addr)
-{
-	struct dirent *dptr = NULL;
-	struct procmap procmap;
-	DIR *dp = NULL;
-	int fd;
-	int i;
-	char memfile[PATH_MAX];
-	char mapfile[PATH_MAX];
-	char procdir[PATH_MAX];
-	char resolved_path[PATH_MAX];
-	char *path = NULL;
-	FILE *fmap;
-	void *map;
-	uint8_t found = 0;
-	char line[BUFSIZE];
-	char dlm[] = "-   :   ";
-	char *str, *sp, *in[PROCMAP_SZ];
-	char *end = NULL;
-
-	/* Path where mem files are located. */
-	snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid);
-	/* Maps file used to locate mem file. */
-	snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid);
-
-	fmap = fopen(mapfile, "r");
-	if (fmap == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open maps file for pid %d\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-
-	/* Read through maps file until we find out base_address. */
-	while (fgets(line, BUFSIZE, fmap) != 0) {
-		str = line;
-		errno = 0;
-		/* Split line into fields. */
-		for (i = 0; i < PROCMAP_SZ; i++) {
-			in[i] = strtok_r(str, &dlm[i], &sp);
-			if ((in[i] == NULL) || (errno != 0)) {
-				fclose(fmap);
-				return -1;
-			}
-			str = NULL;
-		}
-
-		/* Convert/Copy each field as needed. */
-		procmap.va_start = strtoull(in[0], &end, 16);
-		if ((in[0] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.len = strtoull(in[1], &end, 16);
-		if ((in[1] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.pgoff = strtoull(in[3], &end, 16);
-		if ((in[3] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.maj = strtoul(in[4], &end, 16);
-		if ((in[4] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.min = strtoul(in[5], &end, 16);
-		if ((in[5] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.ino = strtoul(in[6], &end, 16);
-		if ((in[6] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		memcpy(&procmap.prot, in[2], PROT_SZ);
-		memcpy(&procmap.fname, in[7], PATH_MAX);
-
-		if (procmap.va_start == addr) {
-			procmap.len = procmap.len - procmap.va_start;
-			found = 1;
-			break;
-		}
-	}
-	fclose(fmap);
-
-	if (!found) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file in pid %d maps file\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-
-	/* Find the guest memory file among the process fds. */
-	dp = opendir(procdir);
-	if (dp == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Cannot open pid %d process directory\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-
-	found = 0;
-
-	/* Read the fd directory contents. */
-	while (NULL != (dptr = readdir(dp))) {
-		snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s",
-				pid, dptr->d_name);
-		path = realpath(memfile, resolved_path);
-		if ((path == NULL) && (strlen(resolved_path) == 0)) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"(%"PRIu64") Failed to resolve fd directory\n",
-				dev->device_fh);
-			closedir(dp);
-			return -1;
-		}
-		if (strncmp(resolved_path, procmap.fname,
-			strnlen(procmap.fname, PATH_MAX)) == 0) {
-			found = 1;
-			break;
-		}
-	}
-
-	closedir(dp);
-
-	if (found == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file for pid %d\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-	/* Open the shared memory file and map the memory into this process. */
-	fd = open(memfile, O_RDWR);
-
-	if (fd == -1) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open %s for pid %d\n",
-			dev->device_fh, memfile, pid);
-		return -1;
-	}
-
-	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE,
-		MAP_POPULATE|MAP_SHARED, fd, 0);
-	close(fd);
-
-	if (map == MAP_FAILED) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Error mapping the file %s for pid %d\n",
-			dev->device_fh, memfile, pid);
-		return -1;
-	}
-
-	/* Store the memory address and size in the device data structure */
-	mem->mapped_address = (uint64_t)(uintptr_t)map;
-	mem->mapped_size = procmap.len;
-
-	LOG_DEBUG(VHOST_CONFIG,
-		"(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n",
-		dev->device_fh,
-		memfile, resolved_path,
-		(unsigned long long)mem->mapped_size, map);
-
-	return 0;
-}
 
 /*
  * Retrieves an entry from the devices configuration linked list.
@@ -644,145 +437,6 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 	return 0;
 }
 
-
-/*
- * Called from CUSE IOCTL: VHOST_SET_MEM_TABLE
- * This function creates and populates the memory structure for the device.
- * This includes storing offsets used to translate buffer addresses.
- */
-static int
-set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr,
-	uint32_t nregions)
-{
-	struct virtio_net *dev;
-	struct vhost_memory_region *mem_regions;
-	struct virtio_memory *mem;
-	uint64_t size = offsetof(struct vhost_memory, regions);
-	uint32_t regionidx, valid_regions;
-
-	dev = get_device(ctx);
-	if (dev == NULL)
-		return -1;
-
-	if (dev->mem) {
-		munmap((void *)(uintptr_t)dev->mem->mapped_address,
-			(size_t)dev->mem->mapped_size);
-		free(dev->mem);
-	}
-
-	/* Malloc the memory structure depending on the number of regions. */
-	mem = calloc(1, sizeof(struct virtio_memory) +
-		(sizeof(struct virtio_memory_regions) * nregions));
-	if (mem == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for dev->mem.\n",
-			dev->device_fh);
-		return -1;
-	}
-
-	mem->nregions = nregions;
-
-	mem_regions = (void *)(uintptr_t)
-			((uint64_t)(uintptr_t)mem_regions_addr + size);
-
-	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
-		/* Populate the region structure for each region. */
-		mem->regions[regionidx].guest_phys_address =
-			mem_regions[regionidx].guest_phys_addr;
-		mem->regions[regionidx].guest_phys_address_end =
-			mem->regions[regionidx].guest_phys_address +
-			mem_regions[regionidx].memory_size;
-		mem->regions[regionidx].memory_size =
-			mem_regions[regionidx].memory_size;
-		mem->regions[regionidx].userspace_address =
-			mem_regions[regionidx].userspace_addr;
-
-		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh,
-			regionidx,
-			(void *)(uintptr_t)mem->regions[regionidx].guest_phys_address,
-			(void *)(uintptr_t)mem->regions[regionidx].userspace_address,
-			mem->regions[regionidx].memory_size);
-
-		/*set the base address mapping*/
-		if (mem->regions[regionidx].guest_phys_address == 0x0) {
-			mem->base_address =
-				mem->regions[regionidx].userspace_address;
-			/* Map VM memory file */
-			if (host_memory_map(dev, mem, ctx.pid,
-				mem->base_address) != 0) {
-				free(mem);
-				return -1;
-			}
-		}
-	}
-
-	/* Check that we have a valid base address. */
-	if (mem->base_address == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh);
-		free(mem);
-		return -1;
-	}
-
-	/*
-	 * Check if all of our regions have valid mappings.
-	 * Usually one does not exist in the QEMU memory file.
-	 */
-	valid_regions = mem->nregions;
-	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
-		if ((mem->regions[regionidx].userspace_address <
-			mem->base_address) ||
-			(mem->regions[regionidx].userspace_address >
-			(mem->base_address + mem->mapped_size)))
-				valid_regions--;
-	}
-
-	/*
-	 * If a region does not have a valid mapping,
-	 * we rebuild our memory struct to contain only valid entries.
-	 */
-	if (valid_regions != mem->nregions) {
-		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n",
-			dev->device_fh);
-
-		/*
-		 * Re-populate the memory structure with only valid regions.
-		 * Invalid regions are over-written with memmove.
-		 */
-		valid_regions = 0;
-
-		for (regionidx = mem->nregions; 0 != regionidx--;) {
-			if ((mem->regions[regionidx].userspace_address <
-				mem->base_address) ||
-				(mem->regions[regionidx].userspace_address >
-				(mem->base_address + mem->mapped_size))) {
-				memmove(&mem->regions[regionidx],
-					&mem->regions[regionidx + 1],
-					sizeof(struct virtio_memory_regions) *
-						valid_regions);
-			} else {
-				valid_regions++;
-			}
-		}
-	}
-	mem->nregions = valid_regions;
-	dev->mem = mem;
-
-	/*
-	 * Calculate the address offset for each region.
-	 * This offset is used to identify the vhost virtual address
-	 * corresponding to a QEMU guest physical address.
-	 */
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		dev->mem->regions[regionidx].address_offset =
-			dev->mem->regions[regionidx].userspace_address -
-				dev->mem->base_address +
-				dev->mem->mapped_address -
-				dev->mem->regions[regionidx].guest_phys_address;
-
-	}
-	return 0;
-}
-
 /*
  * Called from CUSE IOCTL: VHOST_SET_VRING_NUM
  * The virtio device sends us the size of the descriptor ring.
@@ -1040,8 +694,6 @@ static const struct vhost_net_device_ops vhost_device_ops = {
 	.get_features = get_features,
 	.set_features = set_features,
 
-	.set_mem_table = set_mem_table,
-
 	.set_vring_num = set_vring_num,
 	.set_vring_addr = set_vring_addr,
 	.set_vring_base = set_vring_base,
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
new file mode 100644
index 0000000..75fb57e
--- /dev/null
+++ b/lib/librte_vhost/virtio-net.h
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_NET_H
+#define _VIRTIO_NET_H
+
+#include "vhost-net.h"
+#include "rte_virtio_net.h"
+
+struct virtio_net_device_ops const *notify_ops;
+struct virtio_net *get_device(struct vhost_device_ctx ctx);
+
+#endif
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event driven processing
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (6 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 07/11] lib/librte_vhost: implement cuse_set_memory_table Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-16 17:10   ` Ananyev, Konstantin
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support Huawei Xie
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

For more generic event-driven processing, refer to libevent:
	http://libevent.org/
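
For readers unfamiliar with the pattern, here is a minimal sketch of one
select() based dispatch pass over a table of fds with read callbacks. It is
not the fd_man API added by this patch; all sketch_* names are hypothetical.
Entries with fd -1 are skipped before FD_SET/FD_ISSET, since passing a
negative fd to those macros is undefined.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

typedef void (*sketch_cb)(int fd, void *dat);

struct sketch_entry {
	int fd;         /* -1 marks a free slot */
	sketch_cb rcb;  /* called when fd is readable */
	void *dat;      /* opaque context passed to rcb */
};

/* True if the entry may be passed to FD_SET/FD_ISSET. */
static int
sketch_usable(const struct sketch_entry *e)
{
	return e->fd >= 0 && e->fd < FD_SETSIZE && e->rcb != NULL;
}

/*
 * Run one select() pass over the table. Returns the number of read
 * callbacks invoked, or -1 if select() fails.
 */
static int
sketch_dispatch_once(struct sketch_entry *tbl, int n, long timeout_us)
{
	fd_set rfds;
	struct timeval tv;
	int i, maxfd = -1, fired = 0;

	FD_ZERO(&rfds);
	for (i = 0; i < n; i++) {
		if (!sketch_usable(&tbl[i]))
			continue;
		FD_SET(tbl[i].fd, &rfds);
		if (tbl[i].fd > maxfd)
			maxfd = tbl[i].fd;
	}

	tv.tv_sec = timeout_us / 1000000;
	tv.tv_usec = timeout_us % 1000000;
	if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
		return -1;

	for (i = 0; i < n; i++) {
		if (sketch_usable(&tbl[i]) && FD_ISSET(tbl[i].fd, &rfds)) {
			tbl[i].rcb(tbl[i].fd, tbl[i].dat);
			fired++;
		}
	}
	return fired;
}

/* Example callback: consume one byte and count it in *(int *)dat. */
static void
sketch_drain(int fd, void *dat)
{
	char c;

	if (read(fd, &c, 1) == 1)
		(*(int *)dat)++;
}
```

An event loop would simply call sketch_dispatch_once repeatedly; the timeout
gives the loop a chance to pick up fds registered by other threads between
passes.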


Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/vhost_user/fd_man.c | 207 +++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/fd_man.h |  64 +++++++++++
 2 files changed, 271 insertions(+)
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.h

diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c
new file mode 100644
index 0000000..929fbc3
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/fd_man.c
@@ -0,0 +1,207 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <rte_log.h>
+
+#include "fd_man.h"
+
+/**
+ * Returns the index in the fdset for a given fd.
+ * If fd is -1, it means to search for a free entry.
+ * @return
+ *   index for the fd, or -1 if fd isn't in the fdset.
+ */
+static int
+fdset_find_fd(struct fdset *pfdset, int fd)
+{
+	int i;
+
+	if (pfdset == NULL)
+		return -1;
+
+	for (i = 0; i < MAX_FDS && pfdset->fd[i].fd != fd; i++)
+		;
+
+	return i == MAX_FDS ? -1 : i;
+}
+
+static int
+fdset_find_free_slot(struct fdset *pfdset)
+{
+	return fdset_find_fd(pfdset, -1);
+}
+
+static void
+fdset_add_fd(struct fdset  *pfdset, int idx, int fd,
+	fd_cb rcb, fd_cb wcb, void *dat)
+{
+	struct fdentry *pfdentry;
+
+	if (pfdset == NULL || idx >= MAX_FDS)
+		return;
+
+	pfdentry = &pfdset->fd[idx];
+	pfdentry->fd = fd;
+	pfdentry->rcb = rcb;
+	pfdentry->wcb = wcb;
+	pfdentry->dat = dat;
+}
+
+/**
+ * Fill the read/write fd_set with the fds in the fdset.
+ * @return
+ *  the highest fd added to the read/write fd_set, or -1 if none was added.
+ */
+static int
+fdset_fill(fd_set *rfset, fd_set *wfset, struct fdset *pfdset)
+{
+	struct fdentry *pfdentry;
+	int i, maxfds = -1;
+	int num = MAX_FDS;
+
+	if (pfdset == NULL)
+		return -1;
+
+	for (i = 0; i < num; i++) {
+		pfdentry = &pfdset->fd[i];
+		if (pfdentry->fd != -1) {
+			int added = 0;
+			if (pfdentry->rcb && rfset) {
+				FD_SET(pfdentry->fd, rfset);
+				added = 1;
+			}
+			if (pfdentry->wcb && wfset) {
+				FD_SET(pfdentry->fd, wfset);
+				added = 1;
+			}
+			if (added)
+				maxfds = pfdentry->fd < maxfds ?
+					maxfds : pfdentry->fd;
+		}
+	}
+	return maxfds;
+}
+
+void
+fdset_init(struct fdset *pfdset)
+{
+	int i;
+
+	if (pfdset == NULL)
+		return;
+
+	for (i = 0; i < MAX_FDS; i++)
+		pfdset->fd[i].fd = -1;
+	pfdset->num = 0;
+}
+
+/**
+ * Register the fd in the fdset with read/write handler and context.
+ */
+int
+fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
+{
+	int i;
+
+	if (pfdset == NULL || fd == -1)
+		return -1;
+
+	/* Find a free slot in the list. */
+	i = fdset_find_free_slot(pfdset);
+	if (i == -1)
+		return -2;
+
+	fdset_add_fd(pfdset, i, fd, rcb, wcb, dat);
+	pfdset->num++;
+
+	return 0;
+}
+
+/**
+ *  Unregister the fd from the fdset.
+ */
+void
+fdset_del(struct fdset *pfdset, int fd)
+{
+	int i;
+
+	i = fdset_find_fd(pfdset, fd);
+	if (i != -1 && fd != -1) {
+		pfdset->fd[i].fd = -1;
+		pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL;
+		pfdset->num--;
+	}
+}
+
+/**
+ * This function runs in an infinite blocking loop until there is no fd in
+ * pfdset. It calls the corresponding r/w handler when an fd has an event.
+ */
+void
+fdset_event_dispatch(struct fdset *pfdset)
+{
+	fd_set rfds, wfds;
+	int i, maxfds;
+	struct fdentry *pfdentry;
+	int num = MAX_FDS;
+
+	if (pfdset == NULL)
+		return;
+
+	while (1) {
+		FD_ZERO(&rfds);
+		FD_ZERO(&wfds);
+		maxfds = fdset_fill(&rfds, &wfds, pfdset);
+		if (maxfds == -1)
+			return;
+
+		select(maxfds + 1, &rfds, &wfds, NULL, NULL);
+
+		for (i = 0; i < num; i++) {
+			pfdentry = &pfdset->fd[i];
+			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &rfds) && pfdentry->rcb)
+				pfdentry->rcb(pfdentry->fd, pfdentry->dat);
+			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &wfds) && pfdentry->wcb)
+				pfdentry->wcb(pfdentry->fd, pfdentry->dat);
+		}
+	}
+}
diff --git a/lib/librte_vhost/vhost_user/fd_man.h b/lib/librte_vhost/vhost_user/fd_man.h
new file mode 100644
index 0000000..26b4619
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/fd_man.h
@@ -0,0 +1,64 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _FD_MAN_H_
+#define _FD_MAN_H_
+#include <stdint.h>
+
+#define MAX_FDS 1024
+
+typedef void (*fd_cb)(int fd, void *dat);
+
+struct fdentry {
+	int fd;		/* -1 indicates this entry is empty */
+	fd_cb rcb;	/* callback when this fd is readable. */
+	fd_cb wcb;	/* callback when this fd is writeable.*/
+	void *dat;	/* fd context */
+};
+
+struct fdset {
+	struct fdentry fd[MAX_FDS];
+	int num;	/* current fd number of this fdset */
+};
+
+
+void fdset_init(struct fdset *pfdset);
+
+int fdset_add(struct fdset *pfdset, int fd,
+	fd_cb rcb, fd_cb wcb, void *dat);
+
+void fdset_del(struct fdset *pfdset, int fd);
+
+void fdset_event_dispatch(struct fdset *pfdset);
+
+#endif
-- 
1.8.1.4


* [dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (7 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event driven processing Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-12  8:26   ` Linhaifeng
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 10/11] lib/librte_vhost: support dev->ifname for vhost-user Huawei Xie
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

In rte_vhost_driver_register(), a vhost unix domain socket listener fd is
created and added to the fdset, which is polled with select().

In rte_vhost_driver_session_start(), the fds in the fdset are polled for
events. When qemu opens a new connection, the accepted connection fd is
added to the fdset, so the listener and connection fds are then polled
together. When a message arrives on a connection fd, its callback
vserver_message_handler is called to process the vhost-user message.

To identify which virtio device belongs to which guest VM, we can call
rte_vhost_driver_register() with a different socket path for each VM.
Virtio devices from the same VM connect to the VM-specific socket. The
socket path information is stored in the virtio_net structure.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/Makefile                     |   8 +-
 lib/librte_vhost/rte_virtio_net.h             |   2 +
 lib/librte_vhost/vhost-net.h                  |   4 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 457 ++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
 lib/librte_vhost/virtio-net.c                 |  20 +-
 8 files changed, 951 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 797a806..52f6575 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -38,10 +38,14 @@ EXPORT_MAP := rte_vhost_version.map
 
 LIBABIVER := 1
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64
+CFLAGS += -I vhost_cuse -lfuse
+CFLAGS += -I vhost_user
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c
+#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 0bf07c7..46c2072 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -50,6 +50,8 @@
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
 
+#define VHOST_MEMORY_MAX_NREGIONS 8
+
 /* Used to indicate that the device is running on a data core */
 #define VIRTIO_DEV_RUNNING 1
 
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 86b38a5..a56e405 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -41,7 +41,9 @@
 
 #include <rte_log.h>
 
-#define VHOST_MEMORY_MAX_NREGIONS 8
+#include "rte_virtio_net.h"
+
+extern struct vhost_net_device_ops const *ops;
 
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
new file mode 100644
index 0000000..712a82f
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -0,0 +1,457 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <errno.h>
+
+#include <rte_log.h>
+#include <rte_virtio_net.h>
+
+#include "fd_man.h"
+#include "vhost-net-user.h"
+#include "vhost-net.h"
+#include "virtio-net-user.h"
+
+#define MAX_VIRTIO_BACKLOG 128
+static void vserver_new_vq_conn(int fd, void *data);
+static void vserver_message_handler(int fd, void *dat);
+struct vhost_net_device_ops const *ops;
+
+struct connfd_ctx {
+	struct vhost_server *vserver;
+	uint32_t fh;
+};
+
+#define MAX_VHOST_SERVER 1024
+static struct {
+	struct vhost_server *server[MAX_VHOST_SERVER];
+	struct fdset fdset;	/**< The fd list this vhost server manages. */
+} g_vhost_server;
+
+static int vserver_idx;
+
+static const char *vhost_message_str[VHOST_USER_MAX] = {
+	[VHOST_USER_NONE] = "VHOST_USER_NONE",
+	[VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
+	[VHOST_USER_SET_FEATURES] = "VHOST_USER_SET_FEATURES",
+	[VHOST_USER_SET_OWNER] = "VHOST_USER_SET_OWNER",
+	[VHOST_USER_RESET_OWNER] = "VHOST_USER_RESET_OWNER",
+	[VHOST_USER_SET_MEM_TABLE] = "VHOST_USER_SET_MEM_TABLE",
+	[VHOST_USER_SET_LOG_BASE] = "VHOST_USER_SET_LOG_BASE",
+	[VHOST_USER_SET_LOG_FD] = "VHOST_USER_SET_LOG_FD",
+	[VHOST_USER_SET_VRING_NUM] = "VHOST_USER_SET_VRING_NUM",
+	[VHOST_USER_SET_VRING_ADDR] = "VHOST_USER_SET_VRING_ADDR",
+	[VHOST_USER_SET_VRING_BASE] = "VHOST_USER_SET_VRING_BASE",
+	[VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
+	[VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
+	[VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
+	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR"
+};
+
+/**
+ * Create a unix domain socket, bind to path and listen for connection.
+ * @return
+ *  socket fd or -1 on failure
+ */
+static int
+uds_socket(const char *path)
+{
+	struct sockaddr_un un;
+	int sockfd;
+	int ret;
+
+	if (path == NULL)
+		return -1;
+
+	sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
+	if (sockfd < 0)
+		return -1;
+	RTE_LOG(INFO, VHOST_CONFIG, "socket created, fd:%d\n", sockfd);
+
+	memset(&un, 0, sizeof(un));
+	un.sun_family = AF_UNIX;
+	snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);
+	ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+	if (ret == -1)
+		goto err;
+	RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path);
+
+	ret = listen(sockfd, MAX_VIRTIO_BACKLOG);
+	if (ret == -1)
+		goto err;
+
+	return sockfd;
+
+err:
+	close(sockfd);
+	return -1;
+}
+
+/* return number of bytes read on success or negative value on failure. */
+static int
+read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+{
+	struct iovec iov;
+	struct msghdr msgh;
+	size_t fdsize = fd_num * sizeof(int);
+	char control[CMSG_SPACE(fdsize)];
+	struct cmsghdr *cmsg;
+	int ret;
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = buf;
+	iov.iov_len  = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+	msgh.msg_control = control;
+	msgh.msg_controllen = sizeof(control);
+
+	ret = recvmsg(sockfd, &msgh, 0);
+	if (ret <= 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "recvmsg failed\n");
+		return ret;
+	}
+
+	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "truncated msg\n");
+		return -1;
+	}
+
+	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+		cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+		if ((cmsg->cmsg_level == SOL_SOCKET) &&
+			(cmsg->cmsg_type == SCM_RIGHTS)) {
+			memcpy(fds, CMSG_DATA(cmsg), fdsize);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/* return number of bytes read on success or negative value on failure. */
+static int
+read_vhost_message(int sockfd, struct VhostUserMsg *msg)
+{
+	int ret;
+
+	ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
+		msg->fds, VHOST_MEMORY_MAX_NREGIONS);
+	if (ret <= 0)
+		return ret;
+
+	if (msg && msg->size) {
+		if (msg->size > sizeof(msg->payload)) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"invalid msg size: %d\n", msg->size);
+			return -1;
+		}
+		ret = read(sockfd, &msg->payload, msg->size);
+		if (ret <= 0)
+			return ret;
+		if (ret != (int)msg->size) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"read control message failed\n");
+			return -1;
+		}
+	}
+
+	return ret;
+}
+
+static int
+send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+{
+
+	struct iovec iov;
+	struct msghdr msgh;
+	size_t fdsize = fd_num * sizeof(int);
+	char control[CMSG_SPACE(fdsize)];
+	struct cmsghdr *cmsg;
+	int ret;
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = buf;
+	iov.iov_len = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+
+	if (fds && fd_num > 0) {
+		msgh.msg_control = control;
+		msgh.msg_controllen = sizeof(control);
+		cmsg = CMSG_FIRSTHDR(&msgh);
+		cmsg->cmsg_len = CMSG_LEN(fdsize);
+		cmsg->cmsg_level = SOL_SOCKET;
+		cmsg->cmsg_type = SCM_RIGHTS;
+		memcpy(CMSG_DATA(cmsg), fds, fdsize);
+	} else {
+		msgh.msg_control = NULL;
+		msgh.msg_controllen = 0;
+	}
+
+	do {
+		ret = sendmsg(sockfd, &msgh, 0);
+	} while (ret < 0 && errno == EINTR);
+
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,  "sendmsg error\n");
+		return ret;
+	}
+
+	return ret;
+}
+
+static int
+send_vhost_message(int sockfd, struct VhostUserMsg *msg)
+{
+	int ret;
+
+	if (!msg)
+		return 0;
+
+	msg->flags &= ~VHOST_USER_VERSION_MASK;
+	msg->flags |= VHOST_USER_VERSION;
+	msg->flags |= VHOST_USER_REPLY_MASK;
+
+	ret = send_fd_message(sockfd, (char *)msg,
+		VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
+
+	return ret;
+}
+
+/* callback when there is a new virtio connection. */
+static void
+vserver_new_vq_conn(int fd, void *dat)
+{
+	struct vhost_server *vserver = (struct vhost_server *)dat;
+	int conn_fd;
+	struct connfd_ctx *ctx;
+	int fh;
+	struct vhost_device_ctx vdev_ctx = { 0 };
+
+	conn_fd = accept(fd, NULL, NULL);
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"new virtio connection is %d\n", conn_fd);
+	if (conn_fd < 0)
+		return;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (ctx == NULL) {
+		close(conn_fd);
+		return;
+	}
+
+	fh = ops->new_device(vdev_ctx);
+	if (fh == -1) {
+		free(ctx);
+		close(conn_fd);
+		return;
+	}
+	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", fh);
+
+	ctx->vserver = vserver;
+	ctx->fh = fh;
+	fdset_add(&g_vhost_server.fdset,
+		conn_fd, vserver_message_handler, NULL, ctx);
+}
+
+/* callback when there is message on the connfd */
+static void
+vserver_message_handler(int connfd, void *dat)
+{
+	struct vhost_device_ctx ctx;
+	struct connfd_ctx *cfd_ctx = (struct connfd_ctx *)dat;
+	struct VhostUserMsg msg;
+	uint64_t features;
+	int ret;
+
+	ctx.fh = cfd_ctx->fh;
+	ret = read_vhost_message(connfd, &msg);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"vhost read message failed\n");
+
+		close(connfd);
+		fdset_del(&g_vhost_server.fdset, connfd);
+		free(cfd_ctx);
+		user_destroy_device(ctx);
+		ops->destroy_device(ctx);
+
+		return;
+	} else if (ret == 0) {
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"vhost peer closed\n");
+
+		close(connfd);
+		fdset_del(&g_vhost_server.fdset, connfd);
+		free(cfd_ctx);
+		user_destroy_device(ctx);
+		ops->destroy_device(ctx);
+
+		return;
+	}
+	if (msg.request >= VHOST_USER_MAX) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"vhost read incorrect message\n");
+
+		close(connfd);
+		fdset_del(&g_vhost_server.fdset, connfd);
+		free(cfd_ctx);
+		user_destroy_device(ctx);
+		ops->destroy_device(ctx);
+
+		return;
+	}
+
+	RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
+		vhost_message_str[msg.request]);
+	switch (msg.request) {
+	case VHOST_USER_GET_FEATURES:
+		ret = ops->get_features(ctx, &features);
+		msg.payload.u64 = features;
+		msg.size = sizeof(msg.payload.u64);
+		send_vhost_message(connfd, &msg);
+		break;
+	case VHOST_USER_SET_FEATURES:
+		features = msg.payload.u64;
+		ops->set_features(ctx, &features);
+		break;
+
+	case VHOST_USER_SET_OWNER:
+		ops->set_owner(ctx);
+		break;
+	case VHOST_USER_RESET_OWNER:
+		ops->reset_owner(ctx);
+		break;
+
+	case VHOST_USER_SET_MEM_TABLE:
+		user_set_mem_table(ctx, &msg);
+		break;
+
+	case VHOST_USER_SET_LOG_BASE:
+		RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
+	case VHOST_USER_SET_LOG_FD:
+		close(msg.fds[0]);
+		RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
+		break;
+
+	case VHOST_USER_SET_VRING_NUM:
+		ops->set_vring_num(ctx, &msg.payload.state);
+		break;
+	case VHOST_USER_SET_VRING_ADDR:
+		ops->set_vring_addr(ctx, &msg.payload.addr);
+		break;
+	case VHOST_USER_SET_VRING_BASE:
+		ops->set_vring_base(ctx, &msg.payload.state);
+		break;
+
+	case VHOST_USER_GET_VRING_BASE:
+		ret = user_get_vring_base(ctx, &msg.payload.state);
+		msg.size = sizeof(msg.payload.state);
+		send_vhost_message(connfd, &msg);
+		break;
+
+	case VHOST_USER_SET_VRING_KICK:
+		user_set_vring_kick(ctx, &msg);
+		break;
+	case VHOST_USER_SET_VRING_CALL:
+		user_set_vring_call(ctx, &msg);
+		break;
+
+	case VHOST_USER_SET_VRING_ERR:
+		if (!(msg.payload.u64 & VHOST_USER_VRING_NOFD_MASK))
+			close(msg.fds[0]);
+		RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
+		break;
+
+	default:
+		break;
+
+	}
+}
+
+
+/**
+ * Create and initialise the vhost server.
+ */
+int
+rte_vhost_driver_register(const char *path)
+{
+	struct vhost_server *vserver;
+
+	if (vserver_idx == 0) {
+		fdset_init(&g_vhost_server.fdset);
+		ops = get_virtio_net_callbacks();
+	}
+	if (vserver_idx == MAX_VHOST_SERVER)
+		return -1;
+
+	vserver = calloc(1, sizeof(struct vhost_server));
+	if (vserver == NULL)
+		return -1;
+
+	unlink(path);
+
+	vserver->listenfd = uds_socket(path);
+	if (vserver->listenfd < 0) {
+		free(vserver);
+		return -1;
+	}
+	vserver->path = path;
+
+	fdset_add(&g_vhost_server.fdset, vserver->listenfd,
+		vserver_new_vq_conn, NULL,
+		vserver);
+
+	g_vhost_server.server[vserver_idx++] = vserver;
+
+	return 0;
+}
+
+
+int
+rte_vhost_driver_session_start(void)
+{
+	fdset_event_dispatch(&g_vhost_server.fdset);
+	return 0;
+}
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
new file mode 100644
index 0000000..1b6be6c
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -0,0 +1,106 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOST_NET_USER_H
+#define _VHOST_NET_USER_H
+
+#include <stdint.h>
+#include <linux/vhost.h>
+
+#include "rte_virtio_net.h"
+#include "fd_man.h"
+
+struct vhost_server {
+	const char *path; /**< The path the uds is bound to. */
+	int listenfd;     /**< The listener sockfd. */
+};
+
+/* refer to hw/virtio/vhost-user.c */
+
+typedef enum VhostUserRequest {
+	VHOST_USER_NONE = 0,
+	VHOST_USER_GET_FEATURES = 1,
+	VHOST_USER_SET_FEATURES = 2,
+	VHOST_USER_SET_OWNER = 3,
+	VHOST_USER_RESET_OWNER = 4,
+	VHOST_USER_SET_MEM_TABLE = 5,
+	VHOST_USER_SET_LOG_BASE = 6,
+	VHOST_USER_SET_LOG_FD = 7,
+	VHOST_USER_SET_VRING_NUM = 8,
+	VHOST_USER_SET_VRING_ADDR = 9,
+	VHOST_USER_SET_VRING_BASE = 10,
+	VHOST_USER_GET_VRING_BASE = 11,
+	VHOST_USER_SET_VRING_KICK = 12,
+	VHOST_USER_SET_VRING_CALL = 13,
+	VHOST_USER_SET_VRING_ERR = 14,
+	VHOST_USER_MAX
+} VhostUserRequest;
+
+typedef struct VhostUserMemoryRegion {
+	uint64_t guest_phys_addr;
+	uint64_t memory_size;
+	uint64_t userspace_addr;
+	uint64_t mmap_offset;
+} VhostUserMemoryRegion;
+
+typedef struct VhostUserMemory {
+	uint32_t nregions;
+	uint32_t padding;
+	VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+} VhostUserMemory;
+
+typedef struct VhostUserMsg {
+	VhostUserRequest request;
+
+#define VHOST_USER_VERSION_MASK     0x3
+#define VHOST_USER_REPLY_MASK       (0x1 << 2)
+	uint32_t flags;
+	uint32_t size; /* the following payload size */
+	union {
+#define VHOST_USER_VRING_IDX_MASK   0xff
+#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
+		uint64_t u64;
+		struct vhost_vring_state state;
+		struct vhost_vring_addr addr;
+		VhostUserMemory memory;
+	} payload;
+	int fds[VHOST_MEMORY_MAX_NREGIONS];
+} __attribute((packed)) VhostUserMsg;
+
+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
+
+/* The version of the protocol we support */
+#define VHOST_USER_VERSION    0x1
+
+/*****************************************************************************/
+#endif
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
new file mode 100644
index 0000000..97c5177
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -0,0 +1,314 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+
+#include "virtio-net.h"
+#include "virtio-net-user.h"
+#include "vhost-net-user.h"
+#include "vhost-net.h"
+
+struct orig_region_map {
+	int fd;
+	uint64_t mapped_address;
+	uint64_t mapped_size;
+	uint64_t blksz;
+};
+
+#define orig_region(ptr, nregions) \
+	((struct orig_region_map *)RTE_PTR_ADD((ptr), \
+		sizeof(struct virtio_memory) + \
+		sizeof(struct virtio_memory_regions) * (nregions)))
+
+static uint64_t
+get_blk_size(int fd)
+{
+	struct stat stat;
+
+	fstat(fd, &stat);
+	return (uint64_t)stat.st_blksize;
+}
+
+static void
+free_mem_region(struct virtio_net *dev)
+{
+	struct orig_region_map *region;
+	unsigned int idx;
+	uint64_t alignment;
+
+	if (!dev || !dev->mem)
+		return;
+
+	region = orig_region(dev->mem, dev->mem->nregions);
+	for (idx = 0; idx < dev->mem->nregions; idx++) {
+		if (region[idx].mapped_address) {
+			alignment = region[idx].blksz;
+			munmap((void *)
+				RTE_ALIGN_FLOOR(
+					region[idx].mapped_address, alignment),
+				RTE_ALIGN_CEIL(
+					region[idx].mapped_size, alignment));
+			close(region[idx].fd);
+		}
+	}
+}
+
+int
+user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
+{
+	struct VhostUserMemory memory = pmsg->payload.memory;
+	struct virtio_memory_regions *pregion;
+	uint64_t mapped_address, mapped_size;
+	struct virtio_net *dev;
+	unsigned int idx = 0;
+	struct orig_region_map *pregion_orig;
+	uint64_t alignment;
+
+	/* unmap old memory regions one by one*/
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	if (dev->mem) {
+		free_mem_region(dev);
+		free(dev->mem);
+		dev->mem = NULL;
+	}
+
+	dev->mem = calloc(1,
+		sizeof(struct virtio_memory) +
+		sizeof(struct virtio_memory_regions) * memory.nregions +
+		sizeof(struct orig_region_map) * memory.nregions);
+	if (dev->mem == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to allocate memory for dev->mem\n",
+			dev->device_fh);
+		return -1;
+	}
+	dev->mem->nregions = memory.nregions;
+
+	pregion_orig = orig_region(dev->mem, memory.nregions);
+	for (idx = 0; idx < memory.nregions; idx++) {
+		pregion = &dev->mem->regions[idx];
+		pregion->guest_phys_address =
+			memory.regions[idx].guest_phys_addr;
+		pregion->guest_phys_address_end =
+			memory.regions[idx].guest_phys_addr +
+			memory.regions[idx].memory_size;
+		pregion->memory_size =
+			memory.regions[idx].memory_size;
+		pregion->userspace_address =
+			memory.regions[idx].userspace_addr;
+
+		/* This is ugly */
+		mapped_size = memory.regions[idx].memory_size +
+			memory.regions[idx].mmap_offset;
+		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
+			mapped_size,
+			PROT_READ | PROT_WRITE, MAP_SHARED,
+			pmsg->fds[idx],
+			0);
+
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"mapped region %d fd:%d to %p sz:0x%"PRIx64" off:0x%"PRIx64"\n",
+			idx, pmsg->fds[idx], (void *)mapped_address,
+			mapped_size, memory.regions[idx].mmap_offset);
+
+		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"mmap qemu guest failed.\n");
+			goto err_mmap;
+		}
+
+		pregion_orig[idx].mapped_address = mapped_address;
+		pregion_orig[idx].mapped_size = mapped_size;
+		pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]);
+		pregion_orig[idx].fd = pmsg->fds[idx];
+
+		mapped_address +=  memory.regions[idx].mmap_offset;
+
+		pregion->address_offset = mapped_address -
+			pregion->guest_phys_address;
+
+		if (memory.regions[idx].guest_phys_addr == 0) {
+			dev->mem->base_address =
+				memory.regions[idx].userspace_addr;
+			dev->mem->mapped_address =
+				pregion->address_offset;
+		}
+
+		LOG_DEBUG(VHOST_CONFIG,
+			"REGION: %u GPA: %p QEMU VA: %p SIZE (%"PRIu64")\n",
+			idx,
+			(void *)(uintptr_t)pregion->guest_phys_address,
+			(void *)(uintptr_t)pregion->userspace_address,
+			 pregion->memory_size);
+	}
+
+	return 0;
+
+err_mmap:
+	while (idx--) {
+		alignment = pregion_orig[idx].blksz;
+		munmap((void *)RTE_ALIGN_FLOOR(
+			pregion_orig[idx].mapped_address, alignment),
+			RTE_ALIGN_CEIL(pregion_orig[idx].mapped_size,
+					alignment));
+		close(pregion_orig[idx].fd);
+	}
+	free(dev->mem);
+	dev->mem = NULL;
+	return -1;
+}
+
+static int
+virtio_is_ready(struct virtio_net *dev)
+{
+	struct vhost_virtqueue *rvq, *tvq;
+
+	/* mq support in future.*/
+	rvq = dev->virtqueue[VIRTIO_RXQ];
+	tvq = dev->virtqueue[VIRTIO_TXQ];
+	if (rvq && tvq && rvq->desc && tvq->desc &&
+		(rvq->kickfd != (eventfd_t)-1) &&
+		(rvq->callfd != (eventfd_t)-1) &&
+		(tvq->kickfd != (eventfd_t)-1) &&
+		(tvq->callfd != (eventfd_t)-1)) {
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"virtio is now ready for processing.\n");
+		return 1;
+	}
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"virtio isn't ready for processing.\n");
+	return 0;
+}
+
+void
+user_set_vring_call(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
+{
+	struct vhost_vring_file file;
+
+	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
+	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
+		file.fd = -1;
+	else
+		file.fd = pmsg->fds[0];
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"vring call idx:%d file:%d\n", file.index, file.fd);
+	ops->set_vring_call(ctx, &file);
+}
+
+
+/*
+ *  In vhost-user, when we receive a kick message, we test whether the virtio
+ *  device is ready for packet processing.
+ */
+void
+user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
+{
+	struct vhost_vring_file file;
+	struct virtio_net *dev = get_device(ctx);
+
+	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
+	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
+		file.fd = -1;
+	else
+		file.fd = pmsg->fds[0];
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"vring kick idx:%d file:%d\n", file.index, file.fd);
+	ops->set_vring_kick(ctx, &file);
+
+	if (virtio_is_ready(dev) &&
+		!(dev->flags & VIRTIO_DEV_RUNNING))
+			notify_ops->new_device(dev);
+}
+
+/*
+ * when virtio is stopped, qemu will send us the GET_VRING_BASE message.
+ */
+int
+user_get_vring_base(struct vhost_device_ctx ctx,
+	struct vhost_vring_state *state)
+{
+	struct virtio_net *dev = get_device(ctx);
+
+	/* We have to stop the queue (virtio) if it is running. */
+	if (dev->flags & VIRTIO_DEV_RUNNING)
+		notify_ops->destroy_device(dev);
+
+	/* Here we are safe to get the last used index */
+	ops->get_vring_base(ctx, state->index, state);
+
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"vring base idx:%d file:%d\n", state->index, state->num);
+	/*
+	 * Based on current qemu vhost-user implementation, this message is
+	 * sent and only sent in vhost_vring_stop.
+	 * TODO: cleanup the vring, it isn't usable since here.
+	 */
+	if (((int)dev->virtqueue[VIRTIO_RXQ]->callfd) >= 0) {
+		close(dev->virtqueue[VIRTIO_RXQ]->callfd);
+		dev->virtqueue[VIRTIO_RXQ]->callfd = (eventfd_t)-1;
+	}
+	if (((int)dev->virtqueue[VIRTIO_TXQ]->callfd) >= 0) {
+		close(dev->virtqueue[VIRTIO_TXQ]->callfd);
+		dev->virtqueue[VIRTIO_TXQ]->callfd = (eventfd_t)-1;
+	}
+
+	return 0;
+}
+
+void
+user_destroy_device(struct vhost_device_ctx ctx)
+{
+	struct virtio_net *dev = get_device(ctx);
+
+	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
+		notify_ops->destroy_device(dev);
+
+	if (dev && dev->mem) {
+		free_mem_region(dev);
+		free(dev->mem);
+		dev->mem = NULL;
+	}
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
new file mode 100644
index 0000000..df24860
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_NET_USER_H
+#define _VIRTIO_NET_USER_H
+
+#include "vhost-net.h"
+#include "vhost-net-user.h"
+
+int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);
+
+void user_set_vring_call(struct vhost_device_ctx, struct VhostUserMsg *);
+
+void user_set_vring_kick(struct vhost_device_ctx, struct VhostUserMsg *);
+
+int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
+
+void user_destroy_device(struct vhost_device_ctx);
+#endif
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index c490f19..5cb50d2 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -50,6 +50,7 @@
 #include <rte_virtio_net.h>
 
 #include "vhost-net.h"
+#include "virtio-net.h"
 
 /*
  * Device linked list structure for configuration.
@@ -60,7 +61,7 @@ struct virtio_net_config_ll {
 };
 
 /* device ops to add/remove device to/from data core. */
-static struct virtio_net_device_ops const *notify_ops;
+struct virtio_net_device_ops const *notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -88,8 +89,9 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
 		if ((qemu_va >= region->userspace_address) &&
 			(qemu_va <= region->userspace_address +
 			region->memory_size)) {
-			vhost_va = dev->mem->mapped_address + qemu_va -
-					dev->mem->base_address;
+			vhost_va = qemu_va + region->guest_phys_address +
+				region->address_offset -
+				region->userspace_address;
 			break;
 		}
 	}
@@ -119,7 +121,7 @@ get_config_ll_entry(struct vhost_device_ctx ctx)
  * Searches the configuration core linked list and
  * retrieves the device if it exists.
  */
-static struct virtio_net *
+struct virtio_net *
 get_device(struct vhost_device_ctx ctx)
 {
 	struct virtio_net_config_ll *ll_dev;
@@ -256,6 +258,11 @@ init_device(struct virtio_net *dev)
 	memset(dev->virtqueue[VIRTIO_RXQ], 0, sizeof(struct vhost_virtqueue));
 	memset(dev->virtqueue[VIRTIO_TXQ], 0, sizeof(struct vhost_virtqueue));
 
+	dev->virtqueue[VIRTIO_RXQ]->kickfd = (eventfd_t)-1;
+	dev->virtqueue[VIRTIO_RXQ]->callfd = (eventfd_t)-1;
+	dev->virtqueue[VIRTIO_TXQ]->kickfd = (eventfd_t)-1;
+	dev->virtqueue[VIRTIO_TXQ]->callfd = (eventfd_t)-1;
+
 	/* Backends are set to -1 indicating an inactive device. */
 	dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
 	dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
@@ -572,7 +579,7 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
 
-	if (vq->kickfd)
+	if ((int)vq->kickfd >= 0)
 		close((int)vq->kickfd);
 
 	vq->kickfd = file->fd;
@@ -598,8 +605,9 @@ set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
 
-	if (vq->callfd)
+	if ((int)vq->callfd >= 0)
 		close((int)vq->callfd);
+
 	vq->callfd = file->fd;
 
 	return 0;
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [dpdk-dev] [PATCH v2 10/11] lib/librte_vhost: support dev->ifname for vhost-user
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (8 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server Huawei Xie
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

For vhost-cuse, ifname is the name of the tap device.
For vhost-user, ifname is the path of the unix domain socket.
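
A minimal sketch of a bounded name copy that works for both a short tap name and a long socket path. This is an illustration only, not the code in this patch: copy_ifname is a hypothetical helper, and it assumes the caller passes the destination capacity and a source length that does not exceed the source buffer. Unlike a bare strncpy, it always leaves the destination NUL-terminated, even when the name is truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch).
 * Copy at most src_len bytes of src into dst (capacity dst_sz),
 * truncating if needed and always leaving dst NUL-terminated.
 * Returns the number of bytes copied, excluding the terminator.
 */
static size_t
copy_ifname(char *dst, size_t dst_sz, const char *src, size_t src_len)
{
	size_t len;

	assert(dst != NULL && src != NULL && dst_sz > 0);

	/* Reserve one byte of the destination for the terminator. */
	len = src_len < dst_sz - 1 ? src_len : dst_sz - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```

A caller would pass sizeof() of the destination field as dst_sz, so the same helper covers a tap name bounded by IFNAMSIZ and a socket path bounded by PATH_MAX.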

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/rte_virtio_net.h             |  3 +-
 lib/librte_vhost/vhost-net.h                  |  3 ++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  |  8 +++-
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 53 ++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  3 ++
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  7 +++
 lib/librte_vhost/virtio-net.c                 | 63 +++++++++------------------
 7 files changed, 95 insertions(+), 45 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 46c2072..611a3d4 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -100,7 +100,8 @@ struct virtio_net {
 	uint64_t		features;	/**< Negotiated feature set. */
 	uint64_t		device_fh;	/**< device identifier. */
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
-	char			ifname[IFNAMSIZ];	/**< Name of the tap device. */
+#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
+	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
 	void			*priv;		/**< private context */
 } __rte_cache_aligned;
 
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index a56e405..0f3f8dc 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -93,6 +93,9 @@ struct vhost_net_device_ops {
 	int (*new_device)(struct vhost_device_ctx);
 	void (*destroy_device)(struct vhost_device_ctx);
 
+	void (*set_ifname)(struct vhost_device_ctx,
+		const char *if_name, unsigned int if_len);
+
 	int (*get_features)(struct vhost_device_ctx, uint64_t *);
 	int (*set_features)(struct vhost_device_ctx, uint64_t *);
 
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index 72609a3..6b68abf 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -196,7 +196,13 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
 	case VHOST_NET_SET_BACKEND:
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend);
+		if (!in_buf) {
+			VHOST_IOCTL_RETRY(sizeof(file), 0);
+			break;
+		}
+		file = *(const struct vhost_vring_file *)in_buf;
+		result = cuse_set_backend(ctx, &file);
+		fuse_reply_ioctl(req, result, NULL, 0);
 		break;
 
 	case VHOST_GET_FEATURES:
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index adebb54..ae2c3fa 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -43,6 +43,10 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/if_tun.h>
+#include <linux/if.h>
 #include <errno.h>
 
 #include <rte_log.h>
@@ -51,6 +55,7 @@
 #include "vhost-net.h"
 #include "virtio-net-cdev.h"
 #include "virtio-net.h"
+#include "eventfd_copy.h"
 
 /* Line size for reading maps file. */
 static const uint32_t BUFSIZE = PATH_MAX;
@@ -368,3 +373,51 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
 
 	return 0;
 }
+
+/*
+ * Function to get the tap device name from the provided file descriptor and
+ * save it in the device structure.
+ */
+static int
+get_ifname(struct vhost_device_ctx ctx, struct virtio_net *dev, int tap_fd, int pid)
+{
+	int fd_tap;
+	struct ifreq ifr;
+	uint32_t ifr_size;
+	int ret;
+
+	fd_tap = eventfd_copy(tap_fd, pid);
+	if (fd_tap < 0)
+		return -1;
+
+	ret = ioctl(fd_tap, TUNGETIFF, &ifr);
+
+	if (close(fd_tap) < 0)
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") fd close failed\n",
+			dev->device_fh);
+
+	if (ret >= 0) {
+		ifr_size = strnlen(ifr.ifr_name, sizeof(ifr.ifr_name));
+		ops->set_ifname(ctx, ifr.ifr_name, ifr_size);
+	} else
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") TUNGETIFF ioctl failed\n",
+			dev->device_fh);
+
+	return 0;
+}
+
+int cuse_set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
+{
+	struct virtio_net *dev;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	if (!(dev->flags & VIRTIO_DEV_RUNNING) && file->fd != VIRTIO_DEV_STOPPED)
+		get_ifname(ctx, dev, file->fd, ctx.pid);
+
+	return ops->set_backend(ctx, file);
+}
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
index 5ee81b1..eb6b0ba 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
@@ -42,4 +42,7 @@ int
 cuse_set_mem_table(struct vhost_device_ctx ctx,
 	const struct vhost_memory *mem_regions_addr, uint32_t nregions);
 
+int
+cuse_set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *);
+
 #endif
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 712a82f..634a498 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -268,6 +268,7 @@ vserver_new_vq_conn(int fd, void *dat)
 	struct connfd_ctx *ctx;
 	int fh;
 	struct vhost_device_ctx vdev_ctx = { 0 };
+	unsigned int size;
 
 	conn_fd = accept(fd, NULL, NULL);
 	RTE_LOG(INFO, VHOST_CONFIG,
@@ -287,6 +288,12 @@ vserver_new_vq_conn(int fd, void *dat)
 		close(conn_fd);
 		return;
 	}
+
+	vdev_ctx.fh = fh;
+	size = strnlen(vserver->path, PATH_MAX);
+	ops->set_ifname(vdev_ctx, vserver->path,
+		size);
+
 	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", fh);
 
 	ctx->vserver = vserver;
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 5cb50d2..20567ff 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -40,8 +40,6 @@
 #include <unistd.h>
 
 #include <sys/socket.h>
-#include <linux/if_tun.h>
-#include <linux/if.h>
 
 #include <rte_ethdev.h>
 #include <rte_log.h>
@@ -354,6 +352,24 @@ destroy_device(struct vhost_device_ctx ctx)
 	}
 }
 
+static void
+set_ifname(struct vhost_device_ctx ctx,
+	const char *if_name, unsigned int if_len)
+{
+	struct virtio_net *dev;
+	unsigned int len;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return;
+
+	len = if_len > sizeof(dev->ifname) ?
+		sizeof(dev->ifname) : if_len;
+
+	strncpy(dev->ifname, if_name, len);
+}
+
+
 /*
  * Called from CUSE IOCTL: VHOST_SET_OWNER
  * This function just returns success at the moment unless
@@ -614,46 +630,6 @@ set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 }
 
 /*
- * Function to get the tap device name from the provided file descriptor and
- * save it in the device structure.
- */
-static int
-get_ifname(struct virtio_net *dev, int tap_fd, int pid)
-{
-	struct eventfd_copy fd_tap;
-	struct ifreq ifr;
-	uint32_t size, ifr_size;
-	int ret;
-
-	fd_tap.source_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-	fd_tap.target_fd = tap_fd;
-	fd_tap.target_pid = pid;
-
-	if (eventfd_copy(dev, &fd_tap))
-		return -1;
-
-	ret = ioctl(fd_tap.source_fd, TUNGETIFF, &ifr);
-
-	if (close(fd_tap.source_fd) < 0)
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") fd close failed\n",
-			dev->device_fh);
-
-	if (ret >= 0) {
-		ifr_size = strnlen(ifr.ifr_name, sizeof(ifr.ifr_name));
-		size = ifr_size > sizeof(dev->ifname) ?
-				sizeof(dev->ifname) : ifr_size;
-
-		strncpy(dev->ifname, ifr.ifr_name, size);
-	} else
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") TUNGETIFF ioctl failed\n",
-			dev->device_fh);
-
-	return 0;
-}
-
-/*
  * Called from CUSE IOCTL: VHOST_NET_SET_BACKEND
  * To complete device initialisation when the virtio driver is loaded,
  * we are provided with a valid fd for a tap device (not used by us).
@@ -681,7 +657,6 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
 		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
 			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
-			get_ifname(dev, file->fd, ctx.pid);
 			return notify_ops->new_device(dev);
 		}
 	/* Otherwise we remove it. */
@@ -699,6 +674,8 @@ static const struct vhost_net_device_ops vhost_device_ops = {
 	.new_device = new_device,
 	.destroy_device = destroy_device,
 
+	.set_ifname = set_ifname,
+
 	.get_features = get_features,
 	.set_features = set_features,
 
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (9 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 10/11] lib/librte_vhost: support dev->ifname for vhost-user Huawei Xie
@ 2015-02-12  5:07 ` Huawei Xie
  2015-02-16  8:17   ` Tetsuya Mukawa
  2015-02-16 17:11   ` Ananyev, Konstantin
  2015-02-12  5:26 ` [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Xie, Huawei
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 41+ messages in thread
From: Huawei Xie @ 2015-02-12  5:07 UTC (permalink / raw)
  To: dev

* support calling rte_vhost_driver_register after rte_vhost_driver_session_start
* add a mutex to protect fdset from concurrent access
* add a busy flag in fdentry. This flag is set before the cb is called and cleared after the cb finishes.

mutex lock scenario in vhost:

* event_dispatch (in rte_vhost_driver_session_start) runs in a separate thread, processing
vhost messages through cb (callback) in an infinite loop.
* event_dispatch acquires the mutex, gets the cb and its context, sets the busy flag,
and releases the mutex.
* the vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and adds the new fd to fdset.
* the vserver_message_handler cb frees the data context and sets the remove flag to request that
connfd (connection fd) be deleted from fdset.
* after the cb returns, event_dispatch
  1. clears the busy flag.
  2. if there is a remove request, calls fdset_del, which acquires the mutex, checks the busy flag, and
removes connfd from fdset.
* rte_vhost_driver_unregister (not implemented) runs in another thread, acquires the mutex, and
calls fdset_del to remove the fd (listenerfd) from fdset. It can then free the data context.

The above steps ensure that the fd data context isn't freed while a cb is using it.

VM(s) should have been shut down before rte_vhost_driver_unregister is called.
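
The locking steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this patch: every sketch_* name and SKETCH_MAX_FDS are hypothetical, the select() readiness check is omitted, the busy flag is cleared under the mutex, and the delete returns -1 on a busy entry instead of retrying in a loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_FDS 8	/* hypothetical; the patch uses MAX_FDS */

typedef void (*sketch_cb)(int fd, void *dat, int *remove);

struct sketch_entry {
	int fd;		/* -1 means the slot is empty */
	sketch_cb cb;
	void *dat;
	int busy;	/* 1 while cb runs outside the lock */
};

struct sketch_set {
	struct sketch_entry e[SKETCH_MAX_FDS];
	pthread_mutex_t lock;
};

static void
sketch_init(struct sketch_set *s)
{
	int i;

	pthread_mutex_init(&s->lock, NULL);
	for (i = 0; i < SKETCH_MAX_FDS; i++) {
		s->e[i].fd = -1;
		s->e[i].cb = NULL;
		s->e[i].dat = NULL;
		s->e[i].busy = 0;
	}
}

/* Returns the slot index used, or -1 if the set is full. */
static int
sketch_add(struct sketch_set *s, int fd, sketch_cb cb, void *dat)
{
	int i, ret = -1;

	pthread_mutex_lock(&s->lock);
	for (i = 0; i < SKETCH_MAX_FDS; i++) {
		if (s->e[i].fd == -1) {
			s->e[i].fd = fd;
			s->e[i].cb = cb;
			s->e[i].dat = dat;
			ret = i;
			break;
		}
	}
	pthread_mutex_unlock(&s->lock);	/* released on every path */
	return ret;
}

/* Returns 0 if removed or absent, -1 if the entry is busy. */
static int
sketch_try_del(struct sketch_set *s, int fd)
{
	int i, ret = 0;

	if (fd < 0)
		return 0;
	pthread_mutex_lock(&s->lock);
	for (i = 0; i < SKETCH_MAX_FDS; i++) {
		if (s->e[i].fd != fd)
			continue;
		if (s->e[i].busy) {
			ret = -1;	/* a cb is still using the context */
		} else {
			s->e[i].fd = -1;
			s->e[i].cb = NULL;
		}
		break;
	}
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Run the callback of one slot outside the lock, then honor its remove request. */
static void
sketch_dispatch_slot(struct sketch_set *s, int i)
{
	sketch_cb cb;
	void *dat;
	int fd, remove_req = 0;

	assert(i >= 0 && i < SKETCH_MAX_FDS);
	pthread_mutex_lock(&s->lock);
	fd = s->e[i].fd;
	cb = s->e[i].cb;
	dat = s->e[i].dat;
	if (fd >= 0)
		s->e[i].busy = 1;
	pthread_mutex_unlock(&s->lock);
	if (fd < 0)
		return;
	if (cb != NULL)
		cb(fd, dat, &remove_req);
	pthread_mutex_lock(&s->lock);
	s->e[i].busy = 0;
	pthread_mutex_unlock(&s->lock);
	if (remove_req)
		(void)sketch_try_del(s, fd);
}

/* Example callback: counts invocations and asks to be removed. */
static void
sketch_count_and_remove(int fd, void *dat, int *remove)
{
	(void)fd;
	(*(int *)dat)++;
	*remove = 1;
}
```

The callback never deletes its own entry. It only sets the remove flag, so the delete always runs after the busy flag has been cleared.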

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/vhost_user/fd_man.c         | 63 +++++++++++++++++++++++++---
 lib/librte_vhost/vhost_user/fd_man.h         |  5 ++-
 lib/librte_vhost/vhost_user/vhost-net-user.c | 34 +++++++++------
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c
index 929fbc3..63ac4df 100644
--- a/lib/librte_vhost/vhost_user/fd_man.c
+++ b/lib/librte_vhost/vhost_user/fd_man.c
@@ -40,6 +40,7 @@
 #include <sys/types.h>
 #include <unistd.h>
 
+#include <rte_common.h>
 #include <rte_log.h>
 
 #include "fd_man.h"
@@ -145,6 +146,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
 	if (pfdset == NULL || fd == -1)
 		return -1;
 
+	pthread_mutex_lock(&pfdset->fd_mutex);
+
 	/* Find a free slot in the list. */
 	i = fdset_find_free_slot(pfdset);
 	if (i == -1)
@@ -153,6 +156,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
 	fdset_add_fd(pfdset, i, fd, rcb, wcb, dat);
 	pfdset->num++;
 
+	pthread_mutex_unlock(&pfdset->fd_mutex);
+
 	return 0;
 }
 
@@ -164,17 +169,36 @@ fdset_del(struct fdset *pfdset, int fd)
 {
 	int i;
 
+	if (pfdset == NULL || fd == -1)
+		return;
+
+again:
+	pthread_mutex_lock(&pfdset->fd_mutex);
+
 	i = fdset_find_fd(pfdset, fd);
 	if (i != -1 && fd != -1) {
+		/* busy indicates r/wcb is executing! */
+		if (pfdset->fd[i].busy == 1) {
+			pthread_mutex_unlock(&pfdset->fd_mutex);
+			goto again;
+		}
+
 		pfdset->fd[i].fd = -1;
 		pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL;
 		pfdset->num--;
 	}
+
+	pthread_mutex_unlock(&pfdset->fd_mutex);
 }
 
 /**
  * This functions runs in infinite blocking loop until there is no fd in
  * pfdset. It calls corresponding r/w handler if there is event on the fd.
+ *
+ * Before the callback is called, we set the busy flag. If another thread
+ * (currently rte_vhost_driver_unregister) calls fdset_del concurrently, it
+ * will wait until the flag is reset to zero (which indicates the callback has
+ * finished); it can then free the context after fdset_del.
  */
 void
 fdset_event_dispatch(struct fdset *pfdset)
@@ -183,6 +207,10 @@ fdset_event_dispatch(struct fdset *pfdset)
 	int i, maxfds;
 	struct fdentry *pfdentry;
 	int num = MAX_FDS;
+	fd_cb rcb, wcb;
+	void *dat;
+	int fd;
+	int remove1, remove2;
 
 	if (pfdset == NULL)
 		return;
@@ -190,18 +218,41 @@ fdset_event_dispatch(struct fdset *pfdset)
 	while (1) {
 		FD_ZERO(&rfds);
 		FD_ZERO(&wfds);
+		pthread_mutex_lock(&pfdset->fd_mutex);
+
 		maxfds = fdset_fill(&rfds, &wfds, pfdset);
-		if (maxfds == -1)
-			return;
+		if (maxfds == -1) {
+			pthread_mutex_unlock(&pfdset->fd_mutex);
+			sleep(1);
+			continue;
+		}
+
+		pthread_mutex_unlock(&pfdset->fd_mutex);
 
 		select(maxfds + 1, &rfds, &wfds, NULL, NULL);
 
 		for (i = 0; i < num; i++) {
+			remove1 = remove2 = 0;
+			pthread_mutex_lock(&pfdset->fd_mutex);
 			pfdentry = &pfdset->fd[i];
-			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &rfds) && pfdentry->rcb)
-				pfdentry->rcb(pfdentry->fd, pfdentry->dat);
-			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &wfds) && pfdentry->wcb)
-				pfdentry->wcb(pfdentry->fd, pfdentry->dat);
+			fd = pfdentry->fd;
+			rcb = pfdentry->rcb;
+			wcb = pfdentry->wcb;
+			dat = pfdentry->dat;
+			pfdentry->busy = 1;
+			pthread_mutex_unlock(&pfdset->fd_mutex);
+			if (fd >= 0 && FD_ISSET(fd, &rfds) && rcb)
+				rcb(fd, dat, &remove1);
+			if (fd >= 0 && FD_ISSET(fd, &wfds) && wcb)
+				wcb(fd, dat, &remove2);
+			pfdentry->busy = 0;
+			/*
+			 * fdset_del needs to check busy flag.
+			 * We don't allow fdset_del to be called in callback
+			 * directly.
+			 */
+			if (remove1 || remove2)
+				fdset_del(pfdset, fd);
 		}
 	}
 }
diff --git a/lib/librte_vhost/vhost_user/fd_man.h b/lib/librte_vhost/vhost_user/fd_man.h
index 26b4619..74ecde2 100644
--- a/lib/librte_vhost/vhost_user/fd_man.h
+++ b/lib/librte_vhost/vhost_user/fd_man.h
@@ -34,20 +34,23 @@
 #ifndef _FD_MAN_H_
 #define _FD_MAN_H_
 #include <stdint.h>
+#include <pthread.h>
 
 #define MAX_FDS 1024
 
-typedef void (*fd_cb)(int fd, void *dat);
+typedef void (*fd_cb)(int fd, void *dat, int *remove);
 
 struct fdentry {
 	int fd;		/* -1 indicates this entry is empty */
 	fd_cb rcb;	/* callback when this fd is readable. */
 	fd_cb wcb;	/* callback when this fd is writeable.*/
 	void *dat;	/* fd context */
+	int busy;	/* whether this entry is being used in cb. */
 };
 
 struct fdset {
 	struct fdentry fd[MAX_FDS];
+	pthread_mutex_t fd_mutex;
 	int num;	/* current fd number of this fdset */
 };
 
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 634a498..3aa9436 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -41,6 +41,7 @@
 #include <sys/socket.h>
 #include <sys/un.h>
 #include <errno.h>
+#include <pthread.h>
 
 #include <rte_log.h>
 #include <rte_virtio_net.h>
@@ -51,8 +52,9 @@
 #include "virtio-net-user.h"
 
 #define MAX_VIRTIO_BACKLOG 128
-static void vserver_new_vq_conn(int fd, void *data);
-static void vserver_message_handler(int fd, void *dat);
+
+static void vserver_new_vq_conn(int fd, void *data, int *remove);
+static void vserver_message_handler(int fd, void *dat, int *remove);
 struct vhost_net_device_ops const *ops;
 
 struct connfd_ctx {
@@ -61,10 +63,18 @@ struct connfd_ctx {
 };
 
 #define MAX_VHOST_SERVER 1024
-static struct {
+struct _vhost_server {
 	struct vhost_server *server[MAX_VHOST_SERVER];
-	struct fdset fdset;	/**< The fd list this vhost server manages. */
-} g_vhost_server;
+	struct fdset fdset;
+};
+
+static struct _vhost_server g_vhost_server = {
+	.fdset = {
+		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
+		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
+		.num = 0
+	},
+};
 
 static int vserver_idx;
 
@@ -261,7 +271,7 @@ send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 
 /* call back when there is new virtio connection.  */
 static void
-vserver_new_vq_conn(int fd, void *dat)
+vserver_new_vq_conn(int fd, void *dat, __rte_unused int *remove)
 {
 	struct vhost_server *vserver = (struct vhost_server *)dat;
 	int conn_fd;
@@ -304,7 +314,7 @@ vserver_new_vq_conn(int fd, void *dat)
 
 /* callback when there is message on the connfd */
 static void
-vserver_message_handler(int connfd, void *dat)
+vserver_message_handler(int connfd, void *dat, int *remove)
 {
 	struct vhost_device_ctx ctx;
 	struct connfd_ctx *cfd_ctx = (struct connfd_ctx *)dat;
@@ -319,7 +329,7 @@ vserver_message_handler(int connfd, void *dat)
 			"vhost read message failed\n");
 
 		close(connfd);
-		fdset_del(&g_vhost_server.fdset, connfd);
+		*remove = 1;
 		free(cfd_ctx);
 		user_destroy_device(ctx);
 		ops->destroy_device(ctx);
@@ -330,7 +340,7 @@ vserver_message_handler(int connfd, void *dat)
 			"vhost peer closed\n");
 
 		close(connfd);
-		fdset_del(&g_vhost_server.fdset, connfd);
+		*remove = 1;
 		free(cfd_ctx);
 		user_destroy_device(ctx);
 		ops->destroy_device(ctx);
@@ -342,7 +352,7 @@ vserver_message_handler(int connfd, void *dat)
 			"vhost read incorrect message\n");
 
 		close(connfd);
-		fdset_del(&g_vhost_server.fdset, connfd);
+		*remove = 1;
 		free(cfd_ctx);
 		user_destroy_device(ctx);
 		ops->destroy_device(ctx);
@@ -426,10 +436,8 @@ rte_vhost_driver_register(const char *path)
 {
 	struct vhost_server *vserver;
 
-	if (vserver_idx == 0) {
-		fdset_init(&g_vhost_server.fdset);
+	if (vserver_idx == 0)
 		ops = get_virtio_net_callbacks();
-	}
 	if (vserver_idx == MAX_VHOST_SERVER)
 		return -1;
 
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (10 preceding siblings ...)
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server Huawei Xie
@ 2015-02-12  5:26 ` Xie, Huawei
  2015-02-16  8:19 ` Tetsuya Mukawa
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 41+ messages in thread
From: Xie, Huawei @ 2015-02-12  5:26 UTC (permalink / raw)
  To: dev

Sorry, I forgot to add --in-reply-to and the v2 changes.

v2 changes:
* vhost-cuse and vhost-user have their own set_mem_table message handler.
* rework and refine mutex operation in fd management to avoid race condition
* increase listen backlog
* use memset to fix initialization compiler errors reported by haifeng lin
* code style fixes

On 2/12/2015 1:08 PM, Huawei Xie wrote:
> vhost-user supports passing vring information to a seperate vhost enabled
> user space process, normally a user space vSwitch, through unix domain socket.
>
> In previous DPDK version, we implement a user space character device driver
> vhost-cuse in user space DPDK process. vring information is passed to the
> cuse driver through ioctl call, including eventfds for interrupt injection and
> host notification. A kernel module is developed to copy these fds from
> qemu process into our process. We also need some trick to map guest memory.
> (TODO: kickfd/callfd is reversed which causes confusion)
>
> known issue in vhost-user implementation in QEMU, reported by haifeng.lin@huawei.com
> * QEMU doesn't send correct memory region information with multiple numa node configuration
>         http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01454.html
>
> Thanks Tetsuya for reporting the issue that "FD_ISSET would crash when receive -1
> as fd on Ubuntu 14.04".
>
> Huawei Xie (11):
>  enable VIRTIO_NET_F_CTRL_RX
>  create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
>  rename vhost-net-cdev.h to vhost-net.h
>  move fd copying(from qemu process into vhost process) to eventfd_copy.c
>  copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
>  make host_memory_map a more generic function.
>  implement cuse_set_memory_table in virtio-net-cdev.c
>  add select based event driven processing
>  vhost user support
>  support dev->ifname
>  support calling rte_vhost_driver_register after rte_vhost_driver_session_start
>
>  lib/librte_vhost/Makefile                     |   8 +-
>  lib/librte_vhost/rte_virtio_net.h             |   5 +-
>  lib/librte_vhost/vhost-net-cdev.c             | 389 --------------------
>  lib/librte_vhost/vhost-net-cdev.h             | 113 ------
>  lib/librte_vhost/vhost-net.h                  | 118 +++++++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.c    |  88 +++++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.h    |  39 ++
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 417 ++++++++++++++++++++++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 423 ++++++++++++++++++++++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  48 +++
>  lib/librte_vhost/vhost_rxtx.c                 |   2 +-
>  lib/librte_vhost/vhost_user/fd_man.c          | 258 ++++++++++++++
>  lib/librte_vhost/vhost_user/fd_man.h          |  67 ++++
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 472 +++++++++++++++++++++++++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
>  lib/librte_vhost/virtio-net.c                 | 491 ++------------------------
>  lib/librte_vhost/virtio-net.h                 |  43 +++
>  19 files changed, 2491 insertions(+), 959 deletions(-)
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-net.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support Huawei Xie
@ 2015-02-12  8:26   ` Linhaifeng
  2015-02-12  9:28     ` Xie, Huawei
  0 siblings, 1 reply; 41+ messages in thread
From: Linhaifeng @ 2015-02-12  8:26 UTC (permalink / raw)
  To: Huawei Xie, dev



On 2015/2/12 13:07, Huawei Xie wrote:
> +
> +		/* This is ugly */
> +		mapped_size = memory.regions[idx].memory_size +
> +			memory.regions[idx].mmap_offset;
> +		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
> +			mapped_size,
> +			PROT_READ | PROT_WRITE, MAP_SHARED,
> +			pmsg->fds[idx],
> +			0);

Here is another ugly way:
We can mmap using the size of the file; then the length passed to munmap does not need to be aligned to the page size.
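
A minimal sketch of that idea, assuming the region fd refers to a file (for example on hugetlbfs) whose size is already a multiple of the page size in use. The names file_mapping, map_whole_file and unmap_whole_file are hypothetical and are not part of librte_vhost:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical bookkeeping for one mapped file. */
struct file_mapping {
	void *addr;	/* start of the mapping, NULL when not mapped */
	size_t len;	/* length given to mmap(), reused for munmap() */
};

/* Map the whole file behind fd, taking the length from fstat(). */
static struct file_mapping
map_whole_file(int fd)
{
	struct file_mapping m = { NULL, 0 };
	struct stat st;
	void *p;

	if (fstat(fd, &st) != 0 || st.st_size <= 0)
		return m;

	p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
		 MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return m;

	m.addr = p;
	m.len = (size_t)st.st_size;
	return m;
}

/* Unmap with exactly the length that was mapped. */
static int
unmap_whole_file(struct file_mapping *m)
{
	int ret;

	assert(m != NULL);
	if (m->addr == NULL)
		return 0;

	ret = munmap(m->addr, m->len);
	if (ret == 0) {
		m->addr = NULL;
		m->len = 0;
	}
	return ret;
}
```

Because the same length is used for both calls, no extra rounding of memory_size + mmap_offset is needed at unmap time.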

-- 
Regards,
Haifeng

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support
  2015-02-12  8:26   ` Linhaifeng
@ 2015-02-12  9:28     ` Xie, Huawei
  2015-02-12 10:19       ` Linhaifeng
  0 siblings, 1 reply; 41+ messages in thread
From: Xie, Huawei @ 2015-02-12  9:28 UTC (permalink / raw)
  To: Linhaifeng, dev

On 2/12/2015 4:28 PM, Linhaifeng wrote:
>
> On 2015/2/12 13:07, Huawei Xie wrote:
>> +
>> +		/* This is ugly */
>> +		mapped_size = memory.regions[idx].memory_size +
>> +			memory.regions[idx].mmap_offset;
>> +		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
>> +			mapped_size,
>> +			PROT_READ | PROT_WRITE, MAP_SHARED,
>> +			pmsg->fds[idx],
>> +			0);
> Just another ugly way:
> We can use the size of file to mmap then unmmap is not need align to the size of page.
>
Yes, this is like how cuse handles mmap.
We will add this to the to-do list: first combine all the regions,
check whether they belong to the same file, and then map each file once.
There seems to be no elegant way.

There is another to-do item for mmap. If there are multiple virtio devices,
the memory is mapped once per virtio device. Actually we only need to map it once.
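
A rough sketch of the "check if they belong to the same file" step, assuming that comparing the device and inode numbers from fstat() is an acceptable test. fds_same_file and count_distinct_files are hypothetical helpers, not existing librte_vhost functions:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True when both descriptors refer to the same underlying file. */
static bool
fds_same_file(int fd_a, int fd_b)
{
	struct stat a, b;

	if (fstat(fd_a, &a) != 0 || fstat(fd_b, &b) != 0)
		return false;

	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/*
 * Count how many distinct files a list of region fds refers to,
 * i.e. how many mmap() calls would be needed if each file were
 * mapped only once. A descriptor that fstat() rejects is counted
 * as distinct.
 */
static int
count_distinct_files(const int *fds, int nfds)
{
	int i, j, distinct = 0;

	for (i = 0; i < nfds; i++) {
		bool seen = false;

		for (j = 0; j < i && !seen; j++)
			seen = fds_same_file(fds[i], fds[j]);
		if (!seen)
			distinct++;
	}
	return distinct;
}
```

The same comparison could also be used across virtio devices, so that memory shared by several devices is mapped once.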



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support
  2015-02-12  9:28     ` Xie, Huawei
@ 2015-02-12 10:19       ` Linhaifeng
  0 siblings, 0 replies; 41+ messages in thread
From: Linhaifeng @ 2015-02-12 10:19 UTC (permalink / raw)
  To: Xie, Huawei, dev



On 2015/2/12 17:28, Xie, Huawei wrote:
> On 2/12/2015 4:28 PM, Linhaifeng wrote:
>>
>> On 2015/2/12 13:07, Huawei Xie wrote:
>>> +
>>> +		/* This is ugly */
>>> +		mapped_size = memory.regions[idx].memory_size +
>>> +			memory.regions[idx].mmap_offset;
>>> +		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
>>> +			mapped_size,
>>> +			PROT_READ | PROT_WRITE, MAP_SHARED,
>>> +			pmsg->fds[idx],
>>> +			0);
>> Just another ugly way:
>> We can use the size of file to mmap then unmmap is not need align to the size of page.
>>
> Yes, this is like how cuses handle mmap.
> We will add this into the to-do list, combine all the regions the first,
> check if they belong to the same file, and then map each file once.
> Seems there is no elegant way.
> 
> There is another to do for mmap. If there are multiple virtio devices,
> the memory are mapped for each virtio device. Actually we only need once.
> 

Great minds think alike.

The graceful way would be for qemu to send us a message saying which file and size to mmap; then
we would not need to mmap for each virtio device.

> 
> 
> 

-- 
Regards,
Haifeng

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled.
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Huawei Xie
@ 2015-02-16  8:15   ` Tetsuya Mukawa
  0 siblings, 0 replies; 41+ messages in thread
From: Tetsuya Mukawa @ 2015-02-16  8:15 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev

Hi Xie,

Could you please check the commit title?
I guess the commit title includes the first sentence of the commit log.

Thanks,
Tetsuya


On 2015/02/12 14:07, Huawei Xie wrote:
> In virtnet_send_command:
>
> 	/* Caller should know better */
> 	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
> 		(out + in > VIRTNET_SEND_COMMAND_SG_MAX));
>
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> ---
>  lib/librte_vhost/virtio-net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index b041849..52b4957 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -73,7 +73,8 @@ static struct virtio_net_config_ll *ll_root;
>  
>  /* Features supported by this lib. */
>  #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> -				  (1ULL << VIRTIO_NET_F_CTRL_RX))
> +				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> +				(1ULL << VIRTIO_NET_F_CTRL_RX))
>  static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>  
>  /* Line size for reading maps file. */
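
A small sketch of the dependency this patch is about: CTRL_RX should only be offered together with CTRL_VQ. The bit positions (15, 17, 18) are the ones from the virtio-net feature definitions; ctrl_features_consistent is a hypothetical helper and not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Guarded so the sketch also builds where the real header is included. */
#ifndef VIRTIO_NET_F_MRG_RXBUF
#define VIRTIO_NET_F_MRG_RXBUF 15
#endif
#ifndef VIRTIO_NET_F_CTRL_VQ
#define VIRTIO_NET_F_CTRL_VQ 17
#endif
#ifndef VIRTIO_NET_F_CTRL_RX
#define VIRTIO_NET_F_CTRL_RX 18
#endif

/* CTRL_RX is only meaningful when the control virtqueue is offered too. */
static bool
ctrl_features_consistent(uint64_t features)
{
	bool has_vq = (features & (1ULL << VIRTIO_NET_F_CTRL_VQ)) != 0;
	bool has_rx = (features & (1ULL << VIRTIO_NET_F_CTRL_RX)) != 0;

	return !has_rx || has_vq;
}
```

With the old mask (MRG_RXBUF | CTRL_RX) this check fails; with the mask from this patch it passes.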

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server Huawei Xie
@ 2015-02-16  8:17   ` Tetsuya Mukawa
  2015-02-16 17:11   ` Ananyev, Konstantin
  1 sibling, 0 replies; 41+ messages in thread
From: Tetsuya Mukawa @ 2015-02-16  8:17 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev

On 2015/02/12 14:07, Huawei Xie wrote:
> * support calling rte_vhost_driver_register after rte_vhost_driver_session_start
> * add mutex to protect fdset from concurrent access
> * add busy flag in fdentry. this flag is set before cb and cleared after cb is finished.
>
> mutex lock scenario in vhost:
>
> * event_dispatch(in rte_vhost_driver_session_start) runs in a separate thread, infinitely
> processing vhost messages through cb(callback).
> * event_dispatch acquires the lock, get the cb and its context, mark the busy flag,
> and releases the mutex.
> * vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and add new fd into fdset.
> * vserver_message_handler cb frees data context, marks remove flag to request to delete
> connfd(connection fd) from fdset.
> * after cb returns, event_dispatch
>   1. clears busy flag.
>   2. if there is remove request, call fdset_del, which acquires mutex, checks busy flag, and
> removes connfd from fdset.
> * rte_vhost_driver_unregister(not implemented) runs in another thread, acquires the mutex,
> calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data context.
>
> The above steps ensures fd data context isn't freed when cb is using.
>
> VM(s) should have been shutdown before rte_vhost_driver_unregister.
>
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> ---
>  lib/librte_vhost/vhost_user/fd_man.c         | 63 +++++++++++++++++++++++++---
>  lib/librte_vhost/vhost_user/fd_man.h         |  5 ++-
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 34 +++++++++------
>  3 files changed, 82 insertions(+), 20 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c
> index 929fbc3..63ac4df 100644
> --- a/lib/librte_vhost/vhost_user/fd_man.c
> +++ b/lib/librte_vhost/vhost_user/fd_man.c
> @@ -40,6 +40,7 @@
>  #include <sys/types.h>
>  #include <unistd.h>
>  
> +#include <rte_common.h>
>  #include <rte_log.h>
>  
>  #include "fd_man.h"
> @@ -145,6 +146,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
>  	if (pfdset == NULL || fd == -1)
>  		return -1;
>  
> +	pthread_mutex_lock(&pfdset->fd_mutex);
> +
>  	/* Find a free slot in the list. */
>  	i = fdset_find_free_slot(pfdset);
>  	if (i == -1)
> @@ -153,6 +156,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
>  	fdset_add_fd(pfdset, i, fd, rcb, wcb, dat);
>  	pfdset->num++;
>  
> +	pthread_mutex_unlock(&pfdset->fd_mutex);
> +
>  	return 0;
>  }
>  
> @@ -164,17 +169,36 @@ fdset_del(struct fdset *pfdset, int fd)
>  {
>  	int i;
>  
> +	if (pfdset == NULL || fd == -1)
> +		return;
> +
> +again:
> +	pthread_mutex_lock(&pfdset->fd_mutex);
> +
>  	i = fdset_find_fd(pfdset, fd);
>  	if (i != -1 && fd != -1) {
> +		/* busy indicates r/wcb is executing! */
> +		if (pfdset->fd[i].busy == 1) {
> +			pthread_mutex_unlock(&pfdset->fd_mutex);
> +			goto again;
> +		}
> +
>  		pfdset->fd[i].fd = -1;
>  		pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL;
>  		pfdset->num--;
>  	}
> +
> +	pthread_mutex_unlock(&pfdset->fd_mutex);
>  }
>  
>  /**
>   * This functions runs in infinite blocking loop until there is no fd in
>   * pfdset. It calls corresponding r/w handler if there is event on the fd.
> + *
> + * Before the callback is called, we set the flag to busy status; If other
> + * thread(now rte_vhost_driver_unregister) calls fdset_del concurrently, it
> + * will wait until the flag is reset to zero(which indicates the callback is
> + * finished), then it could free the context after fdset_del.
>   */
>  void
>  fdset_event_dispatch(struct fdset *pfdset)
> @@ -183,6 +207,10 @@ fdset_event_dispatch(struct fdset *pfdset)
>  	int i, maxfds;
>  	struct fdentry *pfdentry;
>  	int num = MAX_FDS;
> +	fd_cb rcb, wcb;
> +	void *dat;
> +	int fd;
> +	int remove1, remove2;
>  
>  	if (pfdset == NULL)
>  		return;
> @@ -190,18 +218,41 @@ fdset_event_dispatch(struct fdset *pfdset)
>  	while (1) {
>  		FD_ZERO(&rfds);
>  		FD_ZERO(&wfds);
> +		pthread_mutex_lock(&pfdset->fd_mutex);
> +
>  		maxfds = fdset_fill(&rfds, &wfds, pfdset);
> -		if (maxfds == -1)
> -			return;
> +		if (maxfds == -1) {
> +			pthread_mutex_unlock(&pfdset->fd_mutex);
> +			sleep(1);
> +			continue;
> +		}
> +
> +		pthread_mutex_unlock(&pfdset->fd_mutex);
>  
>  		select(maxfds + 1, &rfds, &wfds, NULL, NULL);
>  
>  		for (i = 0; i < num; i++) {
> +			remove1 = remove2 = 0;
> +			pthread_mutex_lock(&pfdset->fd_mutex);
>  			pfdentry = &pfdset->fd[i];
> -			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &rfds) && pfdentry->rcb)
> -				pfdentry->rcb(pfdentry->fd, pfdentry->dat);
> -			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &wfds) && pfdentry->wcb)
> -				pfdentry->wcb(pfdentry->fd, pfdentry->dat);
> +			fd = pfdentry->fd;
> +			rcb = pfdentry->rcb;
> +			wcb = pfdentry->wcb;
> +			dat = pfdentry->dat;
> +			pfdentry->busy = 1;
> +			pthread_mutex_unlock(&pfdset->fd_mutex);
> +			if (fd >= 0 && FD_ISSET(fd, &rfds) && rcb)
> +				rcb(fd, dat, &remove1);
> +			if (fd >= 0 && FD_ISSET(fd, &wfds) && wcb)
> +				wcb(fd, dat, &remove2);

Hi Xie,

Should we add pthread_mutex_lock() before accessing pfdentry->busy?


> +			pfdentry->busy = 0;

Should we add pthread_mutex_unlock()?

Thanks,
Tetsuya


> +			/*
> +			 * fdset_del needs to check busy flag.
> +			 * We don't allow fdset_del to be called in callback
> +			 * directly.
> +			 */
> +			if (remove1 || remove2)
> +				fdset_del(pfdset, fd);
>  		}
>  	}
>  }
> diff --git a/lib/librte_vhost/vhost_user/fd_man.h b/lib/librte_vhost/vhost_user/fd_man.h
> index 26b4619..74ecde2 100644
> --- a/lib/librte_vhost/vhost_user/fd_man.h
> +++ b/lib/librte_vhost/vhost_user/fd_man.h
> @@ -34,20 +34,23 @@
>  #ifndef _FD_MAN_H_
>  #define _FD_MAN_H_
>  #include <stdint.h>
> +#include <pthread.h>
>  
>  #define MAX_FDS 1024
>  
> -typedef void (*fd_cb)(int fd, void *dat);
> +typedef void (*fd_cb)(int fd, void *dat, int *remove);
>  
>  struct fdentry {
>  	int fd;		/* -1 indicates this entry is empty */
>  	fd_cb rcb;	/* callback when this fd is readable. */
>  	fd_cb wcb;	/* callback when this fd is writeable.*/
>  	void *dat;	/* fd context */
> +	int busy;	/* whether this entry is being used in cb. */
>  };
>  
>  struct fdset {
>  	struct fdentry fd[MAX_FDS];
> +	pthread_mutex_t fd_mutex;
>  	int num;	/* current fd number of this fdset */
>  };
>  
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
> index 634a498..3aa9436 100644
> --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -41,6 +41,7 @@
>  #include <sys/socket.h>
>  #include <sys/un.h>
>  #include <errno.h>
> +#include <pthread.h>
>  
>  #include <rte_log.h>
>  #include <rte_virtio_net.h>
> @@ -51,8 +52,9 @@
>  #include "virtio-net-user.h"
>  
>  #define MAX_VIRTIO_BACKLOG 128
> -static void vserver_new_vq_conn(int fd, void *data);
> -static void vserver_message_handler(int fd, void *dat);
> +
> +static void vserver_new_vq_conn(int fd, void *data, int *remove);
> +static void vserver_message_handler(int fd, void *dat, int *remove);
>  struct vhost_net_device_ops const *ops;
>  
>  struct connfd_ctx {
> @@ -61,10 +63,18 @@ struct connfd_ctx {
>  };
>  
>  #define MAX_VHOST_SERVER 1024
> -static struct {
> +struct _vhost_server {
>  	struct vhost_server *server[MAX_VHOST_SERVER];
> -	struct fdset fdset;	/**< The fd list this vhost server manages. */
> -} g_vhost_server;
> +	struct fdset fdset;
> +};
> +
> +static struct _vhost_server g_vhost_server = {
> +	.fdset = {
> +		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
> +		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
> +		.num = 0
> +	},
> +};
>  
>  static int vserver_idx;
>  
> @@ -261,7 +271,7 @@ send_vhost_message(int sockfd, struct VhostUserMsg *msg)
>  
>  /* call back when there is new virtio connection.  */
>  static void
> -vserver_new_vq_conn(int fd, void *dat)
> +vserver_new_vq_conn(int fd, void *dat, __rte_unused int *remove)
>  {
>  	struct vhost_server *vserver = (struct vhost_server *)dat;
>  	int conn_fd;
> @@ -304,7 +314,7 @@ vserver_new_vq_conn(int fd, void *dat)
>  
>  /* callback when there is message on the connfd */
>  static void
> -vserver_message_handler(int connfd, void *dat)
> +vserver_message_handler(int connfd, void *dat, int *remove)
>  {
>  	struct vhost_device_ctx ctx;
>  	struct connfd_ctx *cfd_ctx = (struct connfd_ctx *)dat;
> @@ -319,7 +329,7 @@ vserver_message_handler(int connfd, void *dat)
>  			"vhost read message failed\n");
>  
>  		close(connfd);
> -		fdset_del(&g_vhost_server.fdset, connfd);
> +		*remove = 1;
>  		free(cfd_ctx);
>  		user_destroy_device(ctx);
>  		ops->destroy_device(ctx);
> @@ -330,7 +340,7 @@ vserver_message_handler(int connfd, void *dat)
>  			"vhost peer closed\n");
>  
>  		close(connfd);
> -		fdset_del(&g_vhost_server.fdset, connfd);
> +		*remove = 1;
>  		free(cfd_ctx);
>  		user_destroy_device(ctx);
>  		ops->destroy_device(ctx);
> @@ -342,7 +352,7 @@ vserver_message_handler(int connfd, void *dat)
>  			"vhost read incorrect message\n");
>  
>  		close(connfd);
> -		fdset_del(&g_vhost_server.fdset, connfd);
> +		*remove = 1;
>  		free(cfd_ctx);
>  		user_destroy_device(ctx);
>  		ops->destroy_device(ctx);
> @@ -426,10 +436,8 @@ rte_vhost_driver_register(const char *path)
>  {
>  	struct vhost_server *vserver;
>  
> -	if (vserver_idx == 0) {
> -		fdset_init(&g_vhost_server.fdset);
> +	if (vserver_idx == 0)
>  		ops = get_virtio_net_callbacks();
> -	}
>  	if (vserver_idx == MAX_VHOST_SERVER)
>  		return -1;
>  
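
To make the two questions above concrete, here is a sketch in which every access to the busy flag happens with the mutex held. It is reduced to a single slot; sketch_set, sketch_set_busy and sketch_try_del are hypothetical names, and this is only one possible answer, not what the patch does:

```c
#include <pthread.h>

/* One fd slot plus the lock that protects it. */
struct sketch_slot {
	int fd;		/* -1 indicates this slot is empty */
	int busy;	/* non-zero while a callback is running */
};

struct sketch_set {
	pthread_mutex_t lock;
	struct sketch_slot slot;
};

/* Dispatcher side: set or clear the busy flag under the lock. */
static void
sketch_set_busy(struct sketch_set *s, int busy)
{
	pthread_mutex_lock(&s->lock);
	s->slot.busy = busy;
	pthread_mutex_unlock(&s->lock);
}

/*
 * Remover side: returns 0 when fd is no longer in the set, or -1 when
 * the slot is busy and the caller has to retry later.
 */
static int
sketch_try_del(struct sketch_set *s, int fd)
{
	int ret = -1;

	pthread_mutex_lock(&s->lock);
	if (s->slot.fd != fd) {
		ret = 0;		/* nothing to remove */
	} else if (!s->slot.busy) {
		s->slot.fd = -1;	/* safe: no callback is running */
		ret = 0;
	}
	pthread_mutex_unlock(&s->lock);

	return ret;
}
```

The dispatcher would call sketch_set_busy(s, 1) before the callback and sketch_set_busy(s, 0) after it, so a concurrent remover never reads the flag without the lock.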

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (11 preceding siblings ...)
  2015-02-12  5:26 ` [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Xie, Huawei
@ 2015-02-16  8:19 ` Tetsuya Mukawa
  2015-02-22 18:20   ` Thomas Monjalon
  2015-02-23  2:50 ` Tetsuya Mukawa
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
  14 siblings, 1 reply; 41+ messages in thread
From: Tetsuya Mukawa @ 2015-02-16  8:19 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

On 2015/02/12 14:07, Huawei Xie wrote:
> vhost-user supports passing vring information to a separate vhost enabled
> user space process, normally a user space vSwitch, through unix domain socket.
>
> In previous DPDK version, we implement a user space character device driver
> vhost-cuse in user space DPDK process. vring information is passed to the
> cuse driver through ioctl call, including eventfds for interrupt injection and
> host notification. A kernel module is developed to copy these fds from
> qemu process into our process. We also need some trick to map guest memory.
> (TODO: kickfd/callfd is reversed which causes confusion)
>
> known issue in vhost-user implementation in QEMU, reported by haifeng.lin@huawei.com
> * QEMU doesn't send correct memory region information with multiple numa node configuration
>         http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01454.html
>
> Thanks Tetsuya for reporting the issue that "FD_ISSET would crash when receive -1
> as fd on Ubuntu 14.04".
>
> Huawei Xie (11):
>  enable VIRTIO_NET_F_CTRL_RX
>  create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
>  rename vhost-net-cdev.h to vhost-net.h
>  move fd copying(from qemu process into vhost process) to eventfd_copy.c
>  copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
>  make host_memory_map a more generic function.
>  implement cuse_set_memory_table in virtio-net-cdev.c
>  add select based event driven processing
>  vhost user support
>  support dev->ifname
>  support calling rte_vhost_driver_register after rte_vhost_driver_session_start
>
>  lib/librte_vhost/Makefile                     |   8 +-
>  lib/librte_vhost/rte_virtio_net.h             |   5 +-
>  lib/librte_vhost/vhost-net-cdev.c             | 389 --------------------
>  lib/librte_vhost/vhost-net-cdev.h             | 113 ------
>  lib/librte_vhost/vhost-net.h                  | 118 +++++++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.c    |  88 +++++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.h    |  39 ++
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 417 ++++++++++++++++++++++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 423 ++++++++++++++++++++++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  48 +++
>  lib/librte_vhost/vhost_rxtx.c                 |   2 +-
>  lib/librte_vhost/vhost_user/fd_man.c          | 258 ++++++++++++++
>  lib/librte_vhost/vhost_user/fd_man.h          |  67 ++++
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 472 +++++++++++++++++++++++++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
>  lib/librte_vhost/virtio-net.c                 | 491 ++------------------------
>  lib/librte_vhost/virtio-net.h                 |  43 +++
>  19 files changed, 2491 insertions(+), 959 deletions(-)
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-net.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
>

Hi Xie,

I have 2 questions about the v2 patches.
Could you please check my other emails?

Also, checkpatch.pl reports some warnings.
I am not sure how strictly we should follow checkpatch.pl.

Here is what I did.
$ git show --format=email | checkpatch.pl --no-tree --no-signoff -q -

Is there a consensus on how to use checkpatch.pl in the DPDK community?

Thanks,
Tetsuya

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event driven processing
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event driven processing Huawei Xie
@ 2015-02-16 17:10   ` Ananyev, Konstantin
  0 siblings, 0 replies; 41+ messages in thread
From: Ananyev, Konstantin @ 2015-02-16 17:10 UTC (permalink / raw)
  To: Xie, Huawei, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, February 12, 2015 5:07 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event driven processing
> 
> for more generic event driven processing, refer to:
> 	http://libevent.org/
> 
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> ---
>  lib/librte_vhost/vhost_user/fd_man.c | 207 +++++++++++++++++++++++++++++++++++
>  lib/librte_vhost/vhost_user/fd_man.h |  64 +++++++++++
>  2 files changed, 271 insertions(+)
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
> 
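
For reference, a sketch of the fill step of a select based loop that skips empty (-1) entries, which is the case behind the FD_ISSET crash mentioned in the cover letter. sketch_entry and sketch_fill are hypothetical and much simpler than the fdset code in fd_man.c:

```c
#include <stddef.h>
#include <sys/select.h>

/* One watched descriptor; fd == -1 marks an unused entry. */
struct sketch_entry {
	int fd;
	int want_read;
	int want_write;
};

/*
 * Fill rfds/wfds from the entry table and return the highest fd added,
 * or -1 when nothing was added. Negative or out-of-range descriptors
 * are never handed to FD_SET().
 */
static int
sketch_fill(const struct sketch_entry *e, size_t n,
	    fd_set *rfds, fd_set *wfds)
{
	size_t i;
	int maxfd = -1;

	FD_ZERO(rfds);
	FD_ZERO(wfds);

	for (i = 0; i < n; i++) {
		int fd = e[i].fd;

		if (fd < 0 || fd >= FD_SETSIZE)
			continue;
		if (!e[i].want_read && !e[i].want_write)
			continue;
		if (e[i].want_read)
			FD_SET(fd, rfds);
		if (e[i].want_write)
			FD_SET(fd, wfds);
		if (fd > maxfd)
			maxfd = fd;
	}
	return maxfd;
}
```

The return value plus one is what would be passed as the first argument of select(); after select() returns, the same fd >= 0 check is needed before FD_ISSET().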

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server
  2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server Huawei Xie
  2015-02-16  8:17   ` Tetsuya Mukawa
@ 2015-02-16 17:11   ` Ananyev, Konstantin
  1 sibling, 0 replies; 41+ messages in thread
From: Ananyev, Konstantin @ 2015-02-16 17:11 UTC (permalink / raw)
  To: Xie, Huawei, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, February 12, 2015 5:07 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server
> 
> * support calling rte_vhost_driver_register after rte_vhost_driver_session_start
> * add mutex to protect fdset from concurrent access
> * add busy flag in fdentry. this flag is set before cb and cleared after cb is finished.
> 
> mutex lock scenario in vhost:
> 
> * event_dispatch(in rte_vhost_driver_session_start) runs in a separate thread, infinitely
> processing vhost messages through cb(callback).
> * event_dispatch acquires the lock, get the cb and its context, mark the busy flag,
> and releases the mutex.
> * vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and add new fd into fdset.
> * vserver_message_handler cb frees data context, marks remove flag to request to delete
> connfd(connection fd) from fdset.
> * after cb returns, event_dispatch
>   1. clears busy flag.
>   2. if there is remove request, call fdset_del, which acquires mutex, checks busy flag, and
> removes connfd from fdset.
> * rte_vhost_driver_unregister(not implemented) runs in another thread, acquires the mutex,
> calls fdset_del to remove fd(listenerfd) from fdset. Then it could free data context.
> 
> The above steps ensures fd data context isn't freed when cb is using.
> 
> VM(s) should have been shutdown before rte_vhost_driver_unregister.
> 
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> ---
>  lib/librte_vhost/vhost_user/fd_man.c         | 63 +++++++++++++++++++++++++---
>  lib/librte_vhost/vhost_user/fd_man.h         |  5 ++-
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 34 +++++++++------
>  3 files changed, 82 insertions(+), 20 deletions(-)
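
The "marks remove flag" step from the commit message can be sketched as below: the callback only reports that its fd should go away, and the caller decides when to delete it. The fd_cb signature follows the patch; everything prefixed with sketch_ is hypothetical, and skipping the write callback after a removal request is an assumption of this sketch, not behaviour taken from the patch:

```c
#include <stddef.h>

/* Same shape as the fd_cb type introduced by the patch. */
typedef void (*sketch_cb)(int fd, void *dat, int *remove);

/* Example callback that counts its calls and asks to be removed. */
static void
sketch_closing_cb(int fd, void *dat, int *remove)
{
	int *calls = dat;

	(void)fd;
	(*calls)++;
	*remove = 1;
}

/* Example callback that counts its calls and stays registered. */
static void
sketch_keep_cb(int fd, void *dat, int *remove)
{
	int *calls = dat;

	(void)fd;
	(void)remove;
	(*calls)++;
}

/*
 * Run the callbacks for one fd and return 1 when either of them asked
 * for removal. The write callback is skipped once removal has been
 * requested, because the context may already be gone by then.
 */
static int
sketch_run_callbacks(int fd, sketch_cb rcb, sketch_cb wcb, void *dat)
{
	int remove1 = 0, remove2 = 0;

	if (rcb != NULL)
		rcb(fd, dat, &remove1);
	if (wcb != NULL && !remove1)
		wcb(fd, dat, &remove2);

	return remove1 || remove2;
}
```

The caller would invoke fdset_del only after this function returns 1, that is, after the busy flag has been cleared.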
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support
  2015-02-16  8:19 ` Tetsuya Mukawa
@ 2015-02-22 18:20   ` Thomas Monjalon
  2015-02-23 13:53     ` Czesnowicz, Przemyslaw
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Monjalon @ 2015-02-22 18:20 UTC (permalink / raw)
  To: Huawei Xie; +Cc: dev

2015-02-16 17:19, Tetsuya Mukawa:
> On 2015/02/12 14:07, Huawei Xie wrote:
> > vhost-user supports passing vring information to a separate vhost enabled
> > user space process, normally a user space vSwitch, through unix domain socket.
> >
> > In previous DPDK version, we implement a user space character device driver
> > vhost-cuse in user space DPDK process. vring information is passed to the
> > cuse driver through ioctl call, including eventfds for interrupt injection and
> > host notification. A kernel module is developed to copy these fds from
> > qemu process into our process. We also need some trick to map guest memory.
> > (TODO: kickfd/callfd is reversed which causes confusion)
> >
> > known issue in vhost-user implementation in QEMU, reported by haifeng.lin@huawei.com
> > * QEMU doesn't send correct memory region information with multiple numa node configuration
> >         http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01454.html
> >
> > Thanks Tetsuya for reporting the issue that "FD_ISSET would crash when receive -1
> > as fd on Ubuntu 14.04".
> >
> > Huawei Xie (11):
> >  enable VIRTIO_NET_F_CTRL_RX
> >  create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
> >  rename vhost-net-cdev.h to vhost-net.h
> >  move fd copying(from qemu process into vhost process) to eventfd_copy.c
> >  copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
> >  make host_memory_map a more generic function.
> >  implement cuse_set_memory_table in virtio-net-cdev.c
> >  add select based event driven processing
> >  vhost user support
> >  support dev->ifname
> >  support calling rte_vhost_driver_register after rte_vhost_driver_session_start
> 
> Hi Xie,
> 
> I have 2 questions about v2 patches.
> Could you please check my other emails?

I tried to apply the patches locally while waiting for the pending comments to be closed.
But I stopped after patch 04/11, which makes the compilation fail.
I'm so sorry that we still don't have vhost-user support integrated in DPDK.
I feel it won't be ready in the next few days, in time to enter version 2.0.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (12 preceding siblings ...)
  2015-02-16  8:19 ` Tetsuya Mukawa
@ 2015-02-23  2:50 ` Tetsuya Mukawa
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
  14 siblings, 0 replies; 41+ messages in thread
From: Tetsuya Mukawa @ 2015-02-23  2:50 UTC (permalink / raw)
  To: dev

On 2015/02/12 14:07, Huawei Xie wrote:
> vhost-user supports passing vring information to a separate vhost enabled
> user space process, normally a user space vSwitch, through unix domain socket.
>
> In previous DPDK version, we implement a user space character device driver
> vhost-cuse in user space DPDK process. vring information is passed to the
> cuse driver through ioctl call, including eventfds for interrupt injection and
> host notification. A kernel module is developed to copy these fds from
> qemu process into our process. We also need some trick to map guest memory.
> (TODO: kickfd/callfd is reversed which causes confusion)
>
> known issue in vhost-user implementation in QEMU, reported by haifeng.lin@huawei.com
> * QEMU doesn't send correct memory region information with multiple numa node configuration
>         http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01454.html
>
> Thanks Tetsuya for reporting the issue that "FD_ISSET would crash when receive -1
> as fd on Ubuntu 14.04".
>
> Huawei Xie (11):
>  enable VIRTIO_NET_F_CTRL_RX
>  create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
>  rename vhost-net-cdev.h to vhost-net.h
>  move fd copying(from qemu process into vhost process) to eventfd_copy.c
>  copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
>  make host_memory_map a more generic function.
>  implement cuse_set_memory_table in virtio-net-cdev.c
>  add select based event driven processing
>  vhost user support
>  support dev->ifname
>  support calling rte_vhost_driver_register after rte_vhost_driver_session_start
>
>  lib/librte_vhost/Makefile                     |   8 +-
>  lib/librte_vhost/rte_virtio_net.h             |   5 +-
>  lib/librte_vhost/vhost-net-cdev.c             | 389 --------------------
>  lib/librte_vhost/vhost-net-cdev.h             | 113 ------
>  lib/librte_vhost/vhost-net.h                  | 118 +++++++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.c    |  88 +++++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.h    |  39 ++
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 417 ++++++++++++++++++++++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 423 ++++++++++++++++++++++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  48 +++
>  lib/librte_vhost/vhost_rxtx.c                 |   2 +-
>  lib/librte_vhost/vhost_user/fd_man.c          | 258 ++++++++++++++
>  lib/librte_vhost/vhost_user/fd_man.h          |  67 ++++
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 472 +++++++++++++++++++++++++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
>  lib/librte_vhost/virtio-net.c                 | 491 ++------------------------
>  lib/librte_vhost/virtio-net.h                 |  43 +++
>  19 files changed, 2491 insertions(+), 959 deletions(-)
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-net.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
>
Acked-by: Tetsuya Mukawa <mukawa@igel.co.jp>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support
  2015-02-22 18:20   ` Thomas Monjalon
@ 2015-02-23 13:53     ` Czesnowicz, Przemyslaw
  2015-02-23 14:08       ` Thomas Monjalon
  0 siblings, 1 reply; 41+ messages in thread
From: Czesnowicz, Przemyslaw @ 2015-02-23 13:53 UTC (permalink / raw)
  To: Thomas Monjalon, Xie, Huawei; +Cc: dev

> I tried to locally applied the patches, waiting comments are closed.
> But I stopped after patch 04/11 which makes compilation failing.
> I'm so sorry that we still don't have a vhost-user support integrated in DPDK.
> I feel it won't be ready in next days to be able to enter in 2.0 version.

Hi Thomas,

You are seeing this compile failure because Huawei was working
on an older tree and did a rebase as patch #10.
If you apply all the patches from the series they compile and work just fine. 

Unfortunately Huawei is not available at the moment.
We could squash all the patches into one and resend it to the ML.
Is that ok?

Regards
Przemek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support
  2015-02-23 13:53     ` Czesnowicz, Przemyslaw
@ 2015-02-23 14:08       ` Thomas Monjalon
  2015-02-23 14:15         ` Czesnowicz, Przemyslaw
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Monjalon @ 2015-02-23 14:08 UTC (permalink / raw)
  To: Czesnowicz, Przemyslaw; +Cc: dev

2015-02-23 13:53, Czesnowicz, Przemyslaw:
> > I tried to locally applied the patches, waiting comments are closed.
> > But I stopped after patch 04/11 which makes compilation failing.
> > I'm so sorry that we still don't have a vhost-user support integrated in DPDK.
> > I feel it won't be ready in next days to be able to enter in 2.0 version.
> 
> Hi Thomas,
> 
> You are seeing this compile failure because Huawei was working
> on an older tree and did a rebase as patch #10.
> If you apply all the patches from the series they compile and work just fine. 
> 
> Unfortunately Huawei is not available at the moment.
> We could squash all the patches into one and resend it to the ML.
> Is that ok?

Are you joking?
No, we need the patches to be well split, and each of them must compile.
Is someone able to fix the patchset properly and resubmit it?
I don't want to spend time fixing it myself.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support
  2015-02-23 14:08       ` Thomas Monjalon
@ 2015-02-23 14:15         ` Czesnowicz, Przemyslaw
  0 siblings, 0 replies; 41+ messages in thread
From: Czesnowicz, Przemyslaw @ 2015-02-23 14:15 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

> 2015-02-23 13:53, Czesnowicz, Przemyslaw:
> > > I tried to locally applied the patches, waiting comments are closed.
> > > But I stopped after patch 04/11 which makes compilation failing.
> > > I'm so sorry that we still don't have a vhost-user support integrated in DPDK.
> > > I feel it won't be ready in next days to be able to enter in 2.0 version.
> >
> > Hi Thomas,
> >
> > You are seeing this compile failure because Huawei was working on an
> > older tree and did a rebase as patch #10.
> > If you apply all the patches from the series they compile and work just fine.
> >
> > Unfortunately Huawei is not available at the moment.
> > We could squash all the patches into one and resend it to the ML.
> > Is that ok?
> 
> Are you joking?
I was expecting this answer, sorry for that.
> No, we need the patches to be well split, and each of them must compile.
> Is someone able to fix the patchset properly and resubmit it?
> I don't want to spend time fixing it myself.

I'll fix the patchset and resubmit.
Przemek

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [dpdk-dev] [PATCH v3 00/11] qemu vhost-user support
  2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
                   ` (13 preceding siblings ...)
  2015-02-23  2:50 ` Tetsuya Mukawa
@ 2015-02-23 17:36 ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Przemyslaw Czesnowicz
                     ` (12 more replies)
  14 siblings, 13 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

v3 changes:
  * move things around to make all patches compile
  

Xie, Huawei (11):
  lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is
    dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver
    in guest would crash with only CTRL_RX enabled.
  lib/librte_vhost: create vhost_cuse directory and move
    vhost-net-cdev.c into vhost_cuse
  lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h
  lib/librte_vhost: move fd copying(from qemu process into vhost
    process) to eventfd_copy.c
  lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file
    virtio-net-cdev.c
  lib/librte_vhost: make host_memory_map a more generic function.
  lib/librte_vhost: implement cuse_set_memory_table
  lib/librte_vhost: add select based event driven processing
  lib/librte_vhost: vhost user support
  lib/librte_vhost: support dev->ifname for vhost-user
  lib/librte_vhost: support dynamically registering vhost server

 lib/librte_vhost/Makefile                     |   8 +-
 lib/librte_vhost/rte_virtio_net.h             |   5 +-
 lib/librte_vhost/vhost-net-cdev.c             | 389 --------------------
 lib/librte_vhost/vhost-net-cdev.h             | 113 ------
 lib/librte_vhost/vhost-net.h                  | 118 +++++++
 lib/librte_vhost/vhost_cuse/eventfd_copy.c    |  88 +++++
 lib/librte_vhost/vhost_cuse/eventfd_copy.h    |  39 ++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 417 ++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 423 ++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  48 +++
 lib/librte_vhost/vhost_rxtx.c                 |   2 +-
 lib/librte_vhost/vhost_user/fd_man.c          | 258 ++++++++++++++
 lib/librte_vhost/vhost_user/fd_man.h          |  67 ++++
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 472 +++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
 lib/librte_vhost/virtio-net.c                 | 491 ++------------------------
 lib/librte_vhost/virtio-net.h                 |  43 +++
 19 files changed, 2491 insertions(+), 959 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net.h
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
 create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
 create mode 100644 lib/librte_vhost/virtio-net.h

-- 
1.9.3

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [dpdk-dev] [PATCH v3 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled.
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 02/11] lib/librte_vhost: create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse Przemyslaw Czesnowicz
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

In virtnet_send_command:

	/* Caller should know better */
	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
		(out + in > VIRTNET_SEND_COMMAND_SG_MAX));

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/virtio-net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index b041849..52b4957 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -73,7 +73,8 @@ static struct virtio_net_config_ll *ll_root;
 
 /* Features supported by this lib. */
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
-				  (1ULL << VIRTIO_NET_F_CTRL_RX))
+				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
+				(1ULL << VIRTIO_NET_F_CTRL_RX))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
 /* Line size for reading maps file. */
-- 
1.9.3
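
Editorial note on patch 01/11: the BUG_ON quoted in the commit message fires
when the guest driver sends a control command without VIRTIO_NET_F_CTRL_VQ
having been negotiated, which is why the patch adds CTRL_VQ next to CTRL_RX in
the supported feature mask. The following is only a minimal sketch of that
dependency check. The bit positions are the ones the virtio-net specification
assigns; the helper name ctrl_rx_dependency_ok and the two mask functions are
hypothetical and are not part of librte_vhost.

```c
#include <stdint.h>

/* Feature bit positions as assigned by the virtio-net specification. */
#define VIRTIO_NET_F_MRG_RXBUF 15
#define VIRTIO_NET_F_CTRL_VQ   17
#define VIRTIO_NET_F_CTRL_RX   18

/*
 * Hypothetical helper (not a librte_vhost function): returns 1 when the
 * mask respects "CTRL_RX requires CTRL_VQ", 0 otherwise.
 */
static int
ctrl_rx_dependency_ok(uint64_t features)
{
	const uint64_t rx = 1ULL << VIRTIO_NET_F_CTRL_RX;
	const uint64_t vq = 1ULL << VIRTIO_NET_F_CTRL_VQ;

	if ((features & rx) && !(features & vq))
		return 0;
	return 1;
}

/* Mask as it was before the patch: CTRL_RX offered without CTRL_VQ. */
static uint64_t
features_before_patch(void)
{
	return (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
	       (1ULL << VIRTIO_NET_F_CTRL_RX);
}

/* Mask as the patch defines it: CTRL_VQ is offered together with CTRL_RX. */
static uint64_t
features_after_patch(void)
{
	return (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
	       (1ULL << VIRTIO_NET_F_CTRL_VQ) |
	       (1ULL << VIRTIO_NET_F_CTRL_RX);
}
```

Calling ctrl_rx_dependency_ok() on the two masks shows the difference: the
pre-patch mask violates the dependency, the post-patch mask satisfies it.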

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [dpdk-dev] [PATCH v3 02/11] lib/librte_vhost: create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 03/11] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h Przemyslaw Czesnowicz
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

The vhost-cuse driver is divided into two parts: cuse driver specific message
handling (in the vhost_cuse directory) and common message handling (in
virtio-net.c).

A vhost ioctl message is pre-processed in cuse and then passed to virtio-net
if it is not terminated there.

virtio-net.c provides common message handling for both vhost-cuse and vhost-user.

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/Makefile                    |   4 +-
 lib/librte_vhost/vhost-net-cdev.c            | 389 ---------------------------
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 389 +++++++++++++++++++++++++++
 3 files changed, 391 insertions(+), 391 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 369c25a..49ae7ae 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -38,10 +38,10 @@ EXPORT_MAP := rte_vhost_version.map
 
 LIBABIVER := 1
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c virtio-net.c vhost_rxtx.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost-net-cdev.c b/lib/librte_vhost/vhost-net-cdev.c
deleted file mode 100644
index 57c76cb..0000000
--- a/lib/librte_vhost/vhost-net-cdev.c
+++ /dev/null
@@ -1,389 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <errno.h>
-#include <fuse/cuse_lowlevel.h>
-#include <linux/limits.h>
-#include <linux/vhost.h>
-#include <stdint.h>
-#include <string.h>
-#include <unistd.h>
-
-#include <rte_ethdev.h>
-#include <rte_log.h>
-#include <rte_string_fns.h>
-#include <rte_virtio_net.h>
-
-#include "vhost-net-cdev.h"
-
-#define FUSE_OPT_DUMMY "\0\0"
-#define FUSE_OPT_FORE  "-f\0\0"
-#define FUSE_OPT_NOMULTI "-s\0\0"
-
-static const uint32_t default_major = 231;
-static const uint32_t default_minor = 1;
-static const char cuse_device_name[] = "/dev/cuse";
-static const char default_cdev[] = "vhost-net";
-
-static struct fuse_session *session;
-static struct vhost_net_device_ops const *ops;
-
-/*
- * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
- * when the device is added to the device linked list.
- */
-static struct vhost_device_ctx
-fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi)
-{
-	struct vhost_device_ctx ctx;
-	struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);
-
-	ctx.pid = req_ctx->pid;
-	ctx.fh = fi->fh;
-
-	return ctx;
-}
-
-/*
- * When the device is created in QEMU it gets initialised here and
- * added to the device linked list.
- */
-static void
-vhost_net_open(fuse_req_t req, struct fuse_file_info *fi)
-{
-	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
-	int err = 0;
-
-	err = ops->new_device(ctx);
-	if (err == -1) {
-		fuse_reply_err(req, EPERM);
-		return;
-	}
-
-	fi->fh = err;
-
-	RTE_LOG(INFO, VHOST_CONFIG,
-		"(%"PRIu64") Device configuration started\n", fi->fh);
-	fuse_reply_open(req, fi);
-}
-
-/*
- * When QEMU is shutdown or killed the device gets released.
- */
-static void
-vhost_net_release(fuse_req_t req, struct fuse_file_info *fi)
-{
-	int err = 0;
-	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
-
-	ops->destroy_device(ctx);
-	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device released\n", ctx.fh);
-	fuse_reply_err(req, err);
-}
-
-/*
- * Boilerplate code for CUSE IOCTL
- * Implicit arguments: ctx, req, result.
- */
-#define VHOST_IOCTL(func) do {	\
-	result = (func)(ctx);	\
-	fuse_reply_ioctl(req, result, NULL, 0);	\
-} while (0)
-
-/*
- * Boilerplate IOCTL RETRY
- * Implicit arguments: req.
- */
-#define VHOST_IOCTL_RETRY(size_r, size_w) do {	\
-	struct iovec iov_r = { arg, (size_r) };	\
-	struct iovec iov_w = { arg, (size_w) };	\
-	fuse_reply_ioctl_retry(req, &iov_r,	\
-		(size_r) ? 1 : 0, &iov_w, (size_w) ? 1 : 0);\
-} while (0)
-
-/*
- * Boilerplate code for CUSE Read IOCTL
- * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
- */
-#define VHOST_IOCTL_R(type, var, func) do {	\
-	if (!in_bufsz) {	\
-		VHOST_IOCTL_RETRY(sizeof(type), 0);\
-	} else {	\
-		(var) = *(const type*)in_buf;	\
-		result = func(ctx, &(var));	\
-		fuse_reply_ioctl(req, result, NULL, 0);\
-	}	\
-} while (0)
-
-/*
- * Boilerplate code for CUSE Write IOCTL
- * Implicit arguments: ctx, req, result, out_bufsz.
- */
-#define VHOST_IOCTL_W(type, var, func) do {	\
-	if (!out_bufsz) {	\
-		VHOST_IOCTL_RETRY(0, sizeof(type));\
-	} else {	\
-		result = (func)(ctx, &(var));\
-		fuse_reply_ioctl(req, result, &(var), sizeof(type));\
-	} \
-} while (0)
-
-/*
- * Boilerplate code for CUSE Read/Write IOCTL
- * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
- */
-#define VHOST_IOCTL_RW(type1, var1, type2, var2, func) do {	\
-	if (!in_bufsz) {	\
-		VHOST_IOCTL_RETRY(sizeof(type1), sizeof(type2));\
-	} else {	\
-		(var1) = *(const type1*) (in_buf);	\
-		result = (func)(ctx, (var1), &(var2));	\
-		fuse_reply_ioctl(req, result, &(var2), sizeof(type2));\
-	}	\
-} while (0)
-
-/*
- * The IOCTLs are handled using CUSE/FUSE in userspace. Depending on the type
- * of IOCTL a buffer is requested to read or to write. This request is handled
- * by FUSE and the buffer is then given to CUSE.
- */
-static void
-vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
-		struct fuse_file_info *fi, __rte_unused unsigned flags,
-		const void *in_buf, size_t in_bufsz, size_t out_bufsz)
-{
-	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
-	struct vhost_vring_file file;
-	struct vhost_vring_state state;
-	struct vhost_vring_addr addr;
-	uint64_t features;
-	uint32_t index;
-	int result = 0;
-
-	switch (cmd) {
-	case VHOST_NET_SET_BACKEND:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend);
-		break;
-
-	case VHOST_GET_FEATURES:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_GET_FEATURES\n", ctx.fh);
-		VHOST_IOCTL_W(uint64_t, features, ops->get_features);
-		break;
-
-	case VHOST_SET_FEATURES:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_FEATURES\n", ctx.fh);
-		VHOST_IOCTL_R(uint64_t, features, ops->set_features);
-		break;
-
-	case VHOST_RESET_OWNER:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_RESET_OWNER\n", ctx.fh);
-		VHOST_IOCTL(ops->reset_owner);
-		break;
-
-	case VHOST_SET_OWNER:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_OWNER\n", ctx.fh);
-		VHOST_IOCTL(ops->set_owner);
-		break;
-
-	case VHOST_SET_MEM_TABLE:
-		/*TODO fix race condition.*/
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_MEM_TABLE\n", ctx.fh);
-		static struct vhost_memory mem_temp;
-
-		switch (in_bufsz) {
-		case 0:
-			VHOST_IOCTL_RETRY(sizeof(struct vhost_memory), 0);
-			break;
-
-		case sizeof(struct vhost_memory):
-			mem_temp = *(const struct vhost_memory *) in_buf;
-
-			if (mem_temp.nregions > 0) {
-				VHOST_IOCTL_RETRY(sizeof(struct vhost_memory) +
-					(sizeof(struct vhost_memory_region) *
-						mem_temp.nregions), 0);
-			} else {
-				result = -1;
-				fuse_reply_ioctl(req, result, NULL, 0);
-			}
-			break;
-
-		default:
-			result = ops->set_mem_table(ctx,
-					in_buf, mem_temp.nregions);
-			if (result)
-				fuse_reply_err(req, EINVAL);
-			else
-				fuse_reply_ioctl(req, result, NULL, 0);
-		}
-		break;
-
-	case VHOST_SET_VRING_NUM:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_NUM\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_state, state,
-			ops->set_vring_num);
-		break;
-
-	case VHOST_SET_VRING_BASE:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_BASE\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_state, state,
-			ops->set_vring_base);
-		break;
-
-	case VHOST_GET_VRING_BASE:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_GET_VRING_BASE\n", ctx.fh);
-		VHOST_IOCTL_RW(uint32_t, index,
-			struct vhost_vring_state, state, ops->get_vring_base);
-		break;
-
-	case VHOST_SET_VRING_ADDR:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_ADDR\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_addr, addr,
-			ops->set_vring_addr);
-		break;
-
-	case VHOST_SET_VRING_KICK:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file,
-			ops->set_vring_kick);
-		break;
-
-	case VHOST_SET_VRING_CALL:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file,
-			ops->set_vring_call);
-		break;
-
-	default:
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: DOESN NOT EXIST\n", ctx.fh);
-		result = -1;
-		fuse_reply_ioctl(req, result, NULL, 0);
-	}
-
-	if (result < 0)
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: FAIL\n", ctx.fh);
-	else
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: SUCCESS\n", ctx.fh);
-}
-
-/*
- * Structure handling open, release and ioctl function pointers is populated.
- */
-static const struct cuse_lowlevel_ops vhost_net_ops = {
-	.open		= vhost_net_open,
-	.release	= vhost_net_release,
-	.ioctl		= vhost_net_ioctl,
-};
-
-/*
- * cuse_info is populated and used to register the cuse device.
- * vhost_net_device_ops are also passed when the device is registered in app.
- */
-int
-rte_vhost_driver_register(const char *dev_name)
-{
-	struct cuse_info cuse_info;
-	char device_name[PATH_MAX] = "";
-	char char_device_name[PATH_MAX] = "";
-	const char *device_argv[] = { device_name };
-
-	char fuse_opt_dummy[] = FUSE_OPT_DUMMY;
-	char fuse_opt_fore[] = FUSE_OPT_FORE;
-	char fuse_opt_nomulti[] = FUSE_OPT_NOMULTI;
-	char *fuse_argv[] = {fuse_opt_dummy, fuse_opt_fore, fuse_opt_nomulti};
-
-	if (access(cuse_device_name, R_OK | W_OK) < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"char device %s can't be accessed, maybe not exist\n",
-			cuse_device_name);
-		return -1;
-	}
-
-	/*
-	 * The device name is created. This is passed to QEMU so that it can
-	 * register the device with our application.
-	 */
-	snprintf(device_name, PATH_MAX, "DEVNAME=%s", dev_name);
-	snprintf(char_device_name, PATH_MAX, "/dev/%s", dev_name);
-
-	/* Check if device already exists. */
-	if (access(char_device_name, F_OK) != -1) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"char device %s already exists\n", char_device_name);
-		return -1;
-	}
-
-	memset(&cuse_info, 0, sizeof(cuse_info));
-	cuse_info.dev_major = default_major;
-	cuse_info.dev_minor = default_minor;
-	cuse_info.dev_info_argc = 1;
-	cuse_info.dev_info_argv = device_argv;
-	cuse_info.flags = CUSE_UNRESTRICTED_IOCTL;
-
-	ops = get_virtio_net_callbacks();
-
-	session = cuse_lowlevel_setup(3, fuse_argv,
-			&cuse_info, &vhost_net_ops, 0, NULL);
-	if (session == NULL)
-		return -1;
-
-	return 0;
-}
-
-/**
- * The CUSE session is launched allowing the application to receive open,
- * release and ioctl calls.
- */
-int
-rte_vhost_driver_session_start(void)
-{
-	fuse_session_loop(session);
-
-	return 0;
-}
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
new file mode 100644
index 0000000..57c76cb
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -0,0 +1,389 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <errno.h>
+#include <fuse/cuse_lowlevel.h>
+#include <linux/limits.h>
+#include <linux/vhost.h>
+#include <stdint.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_log.h>
+#include <rte_string_fns.h>
+#include <rte_virtio_net.h>
+
+#include "vhost-net-cdev.h"
+
+#define FUSE_OPT_DUMMY "\0\0"
+#define FUSE_OPT_FORE  "-f\0\0"
+#define FUSE_OPT_NOMULTI "-s\0\0"
+
+static const uint32_t default_major = 231;
+static const uint32_t default_minor = 1;
+static const char cuse_device_name[] = "/dev/cuse";
+static const char default_cdev[] = "vhost-net";
+
+static struct fuse_session *session;
+static struct vhost_net_device_ops const *ops;
+
+/*
+ * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
+ * when the device is added to the device linked list.
+ */
+static struct vhost_device_ctx
+fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi)
+{
+	struct vhost_device_ctx ctx;
+	struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);
+
+	ctx.pid = req_ctx->pid;
+	ctx.fh = fi->fh;
+
+	return ctx;
+}
+
+/*
+ * When the device is created in QEMU it gets initialised here and
+ * added to the device linked list.
+ */
+static void
+vhost_net_open(fuse_req_t req, struct fuse_file_info *fi)
+{
+	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
+	int err = 0;
+
+	err = ops->new_device(ctx);
+	if (err == -1) {
+		fuse_reply_err(req, EPERM);
+		return;
+	}
+
+	fi->fh = err;
+
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"(%"PRIu64") Device configuration started\n", fi->fh);
+	fuse_reply_open(req, fi);
+}
+
+/*
+ * When QEMU is shutdown or killed the device gets released.
+ */
+static void
+vhost_net_release(fuse_req_t req, struct fuse_file_info *fi)
+{
+	int err = 0;
+	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
+
+	ops->destroy_device(ctx);
+	RTE_LOG(INFO, VHOST_CONFIG, "(%"PRIu64") Device released\n", ctx.fh);
+	fuse_reply_err(req, err);
+}
+
+/*
+ * Boilerplate code for CUSE IOCTL
+ * Implicit arguments: ctx, req, result.
+ */
+#define VHOST_IOCTL(func) do {	\
+	result = (func)(ctx);	\
+	fuse_reply_ioctl(req, result, NULL, 0);	\
+} while (0)
+
+/*
+ * Boilerplate IOCTL RETRY
+ * Implicit arguments: req.
+ */
+#define VHOST_IOCTL_RETRY(size_r, size_w) do {	\
+	struct iovec iov_r = { arg, (size_r) };	\
+	struct iovec iov_w = { arg, (size_w) };	\
+	fuse_reply_ioctl_retry(req, &iov_r,	\
+		(size_r) ? 1 : 0, &iov_w, (size_w) ? 1 : 0);\
+} while (0)
+
+/*
+ * Boilerplate code for CUSE Read IOCTL
+ * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
+ */
+#define VHOST_IOCTL_R(type, var, func) do {	\
+	if (!in_bufsz) {	\
+		VHOST_IOCTL_RETRY(sizeof(type), 0);\
+	} else {	\
+		(var) = *(const type*)in_buf;	\
+		result = func(ctx, &(var));	\
+		fuse_reply_ioctl(req, result, NULL, 0);\
+	}	\
+} while (0)
+
+/*
+ * Boilerplate code for CUSE Write IOCTL
+ * Implicit arguments: ctx, req, result, out_bufsz.
+ */
+#define VHOST_IOCTL_W(type, var, func) do {	\
+	if (!out_bufsz) {	\
+		VHOST_IOCTL_RETRY(0, sizeof(type));\
+	} else {	\
+		result = (func)(ctx, &(var));\
+		fuse_reply_ioctl(req, result, &(var), sizeof(type));\
+	} \
+} while (0)
+
+/*
+ * Boilerplate code for CUSE Read/Write IOCTL
+ * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
+ */
+#define VHOST_IOCTL_RW(type1, var1, type2, var2, func) do {	\
+	if (!in_bufsz) {	\
+		VHOST_IOCTL_RETRY(sizeof(type1), sizeof(type2));\
+	} else {	\
+		(var1) = *(const type1*) (in_buf);	\
+		result = (func)(ctx, (var1), &(var2));	\
+		fuse_reply_ioctl(req, result, &(var2), sizeof(type2));\
+	}	\
+} while (0)
+
+/*
+ * The IOCTLs are handled using CUSE/FUSE in userspace. Depending on the type
+ * of IOCTL a buffer is requested to read or to write. This request is handled
+ * by FUSE and the buffer is then given to CUSE.
+ */
+static void
+vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
+		struct fuse_file_info *fi, __rte_unused unsigned flags,
+		const void *in_buf, size_t in_bufsz, size_t out_bufsz)
+{
+	struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
+	struct vhost_vring_file file;
+	struct vhost_vring_state state;
+	struct vhost_vring_addr addr;
+	uint64_t features;
+	uint32_t index;
+	int result = 0;
+
+	switch (cmd) {
+	case VHOST_NET_SET_BACKEND:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend);
+		break;
+
+	case VHOST_GET_FEATURES:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_GET_FEATURES\n", ctx.fh);
+		VHOST_IOCTL_W(uint64_t, features, ops->get_features);
+		break;
+
+	case VHOST_SET_FEATURES:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_FEATURES\n", ctx.fh);
+		VHOST_IOCTL_R(uint64_t, features, ops->set_features);
+		break;
+
+	case VHOST_RESET_OWNER:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_RESET_OWNER\n", ctx.fh);
+		VHOST_IOCTL(ops->reset_owner);
+		break;
+
+	case VHOST_SET_OWNER:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_OWNER\n", ctx.fh);
+		VHOST_IOCTL(ops->set_owner);
+		break;
+
+	case VHOST_SET_MEM_TABLE:
+		/*TODO fix race condition.*/
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_MEM_TABLE\n", ctx.fh);
+		static struct vhost_memory mem_temp;
+
+		switch (in_bufsz) {
+		case 0:
+			VHOST_IOCTL_RETRY(sizeof(struct vhost_memory), 0);
+			break;
+
+		case sizeof(struct vhost_memory):
+			mem_temp = *(const struct vhost_memory *) in_buf;
+
+			if (mem_temp.nregions > 0) {
+				VHOST_IOCTL_RETRY(sizeof(struct vhost_memory) +
+					(sizeof(struct vhost_memory_region) *
+						mem_temp.nregions), 0);
+			} else {
+				result = -1;
+				fuse_reply_ioctl(req, result, NULL, 0);
+			}
+			break;
+
+		default:
+			result = ops->set_mem_table(ctx,
+					in_buf, mem_temp.nregions);
+			if (result)
+				fuse_reply_err(req, EINVAL);
+			else
+				fuse_reply_ioctl(req, result, NULL, 0);
+		}
+		break;
+
+	case VHOST_SET_VRING_NUM:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_NUM\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_state, state,
+			ops->set_vring_num);
+		break;
+
+	case VHOST_SET_VRING_BASE:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_BASE\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_state, state,
+			ops->set_vring_base);
+		break;
+
+	case VHOST_GET_VRING_BASE:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_GET_VRING_BASE\n", ctx.fh);
+		VHOST_IOCTL_RW(uint32_t, index,
+			struct vhost_vring_state, state, ops->get_vring_base);
+		break;
+
+	case VHOST_SET_VRING_ADDR:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_ADDR\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_addr, addr,
+			ops->set_vring_addr);
+		break;
+
+	case VHOST_SET_VRING_KICK:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_file, file,
+			ops->set_vring_kick);
+		break;
+
+	case VHOST_SET_VRING_CALL:
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n", ctx.fh);
+		VHOST_IOCTL_R(struct vhost_vring_file, file,
+			ops->set_vring_call);
+		break;
+
+	default:
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: DOESN NOT EXIST\n", ctx.fh);
+		result = -1;
+		fuse_reply_ioctl(req, result, NULL, 0);
+	}
+
+	if (result < 0)
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: FAIL\n", ctx.fh);
+	else
+		LOG_DEBUG(VHOST_CONFIG,
+			"(%"PRIu64") IOCTL: SUCCESS\n", ctx.fh);
+}
+
+/*
+ * Structure handling open, release and ioctl function pointers is populated.
+ */
+static const struct cuse_lowlevel_ops vhost_net_ops = {
+	.open		= vhost_net_open,
+	.release	= vhost_net_release,
+	.ioctl		= vhost_net_ioctl,
+};
+
+/*
+ * cuse_info is populated and used to register the cuse device.
+ * vhost_net_device_ops are also passed when the device is registered in app.
+ */
+int
+rte_vhost_driver_register(const char *dev_name)
+{
+	struct cuse_info cuse_info;
+	char device_name[PATH_MAX] = "";
+	char char_device_name[PATH_MAX] = "";
+	const char *device_argv[] = { device_name };
+
+	char fuse_opt_dummy[] = FUSE_OPT_DUMMY;
+	char fuse_opt_fore[] = FUSE_OPT_FORE;
+	char fuse_opt_nomulti[] = FUSE_OPT_NOMULTI;
+	char *fuse_argv[] = {fuse_opt_dummy, fuse_opt_fore, fuse_opt_nomulti};
+
+	if (access(cuse_device_name, R_OK | W_OK) < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"char device %s can't be accessed, maybe not exist\n",
+			cuse_device_name);
+		return -1;
+	}
+
+	/*
+	 * The device name is created. This is passed to QEMU so that it can
+	 * register the device with our application.
+	 */
+	snprintf(device_name, PATH_MAX, "DEVNAME=%s", dev_name);
+	snprintf(char_device_name, PATH_MAX, "/dev/%s", dev_name);
+
+	/* Check if device already exists. */
+	if (access(char_device_name, F_OK) != -1) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"char device %s already exists\n", char_device_name);
+		return -1;
+	}
+
+	memset(&cuse_info, 0, sizeof(cuse_info));
+	cuse_info.dev_major = default_major;
+	cuse_info.dev_minor = default_minor;
+	cuse_info.dev_info_argc = 1;
+	cuse_info.dev_info_argv = device_argv;
+	cuse_info.flags = CUSE_UNRESTRICTED_IOCTL;
+
+	ops = get_virtio_net_callbacks();
+
+	session = cuse_lowlevel_setup(3, fuse_argv,
+			&cuse_info, &vhost_net_ops, 0, NULL);
+	if (session == NULL)
+		return -1;
+
+	return 0;
+}
+
+/**
+ * The CUSE session is launched allowing the application to receive open,
+ * release and ioctl calls.
+ */
+int
+rte_vhost_driver_session_start(void)
+{
+	fuse_session_loop(session);
+
+	return 0;
+}
-- 
1.9.3


* [dpdk-dev] [PATCH v3 03/11] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 02/11] lib/librte_vhost: create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 04/11] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c Przemyslaw Czesnowicz
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

This file defines common operations provided by virtio-net(.c).

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/vhost-net-cdev.h            | 113 ---------------------------
 lib/librte_vhost/vhost-net.h                 | 113 +++++++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c |   2 +-
 lib/librte_vhost/vhost_rxtx.c                |   2 +-
 lib/librte_vhost/virtio-net.c                |   2 +-
 5 files changed, 116 insertions(+), 116 deletions(-)
 delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost-net.h

diff --git a/lib/librte_vhost/vhost-net-cdev.h b/lib/librte_vhost/vhost-net-cdev.h
deleted file mode 100644
index 03a5c57..0000000
--- a/lib/librte_vhost/vhost-net-cdev.h
+++ /dev/null
@@ -1,113 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VHOST_NET_CDEV_H_
-#define _VHOST_NET_CDEV_H_
-#include <stdint.h>
-#include <stdio.h>
-#include <sys/types.h>
-#include <unistd.h>
-#include <linux/vhost.h>
-
-#include <rte_log.h>
-
-/* Macros for printing using RTE_LOG */
-#define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
-#define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
-
-#ifdef RTE_LIBRTE_VHOST_DEBUG
-#define VHOST_MAX_PRINT_BUFF 6072
-#define LOG_LEVEL RTE_LOG_DEBUG
-#define LOG_DEBUG(log_type, fmt, args...) RTE_LOG(DEBUG, log_type, fmt, ##args)
-#define PRINT_PACKET(device, addr, size, header) do { \
-	char *pkt_addr = (char *)(addr); \
-	unsigned int index; \
-	char packet[VHOST_MAX_PRINT_BUFF]; \
-	\
-	if ((header)) \
-		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Header size %d: ", (device->device_fh), (size)); \
-	else \
-		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Packet size %d: ", (device->device_fh), (size)); \
-	for (index = 0; index < (size); index++) { \
-		snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), \
-			"%02hhx ", pkt_addr[index]); \
-	} \
-	snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
-	\
-	LOG_DEBUG(VHOST_DATA, "%s", packet); \
-} while (0)
-#else
-#define LOG_LEVEL RTE_LOG_INFO
-#define LOG_DEBUG(log_type, fmt, args...) do {} while (0)
-#define PRINT_PACKET(device, addr, size, header) do {} while (0)
-#endif
-
-
-/*
- * Structure used to identify device context.
- */
-struct vhost_device_ctx {
-	pid_t		pid;	/* PID of process calling the IOCTL. */
-	uint64_t	fh;	/* Populated with fi->fh to track the device index. */
-};
-
-/*
- * Structure contains function pointers to be defined in virtio-net.c. These
- * functions are called in CUSE context and are used to configure devices.
- */
-struct vhost_net_device_ops {
-	int (*new_device)(struct vhost_device_ctx);
-	void (*destroy_device)(struct vhost_device_ctx);
-
-	int (*get_features)(struct vhost_device_ctx, uint64_t *);
-	int (*set_features)(struct vhost_device_ctx, uint64_t *);
-
-	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
-
-	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
-	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *);
-	int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *);
-	int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct vhost_vring_state *);
-
-	int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file *);
-	int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file *);
-
-	int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
-
-	int (*set_owner)(struct vhost_device_ctx);
-	int (*reset_owner)(struct vhost_device_ctx);
-};
-
-
-struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
-#endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
new file mode 100644
index 0000000..03a5c57
--- /dev/null
+++ b/lib/librte_vhost/vhost-net.h
@@ -0,0 +1,113 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOST_NET_CDEV_H_
+#define _VHOST_NET_CDEV_H_
+#include <stdint.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <linux/vhost.h>
+
+#include <rte_log.h>
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
+#define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
+
+#ifdef RTE_LIBRTE_VHOST_DEBUG
+#define VHOST_MAX_PRINT_BUFF 6072
+#define LOG_LEVEL RTE_LOG_DEBUG
+#define LOG_DEBUG(log_type, fmt, args...) RTE_LOG(DEBUG, log_type, fmt, ##args)
+#define PRINT_PACKET(device, addr, size, header) do { \
+	char *pkt_addr = (char *)(addr); \
+	unsigned int index; \
+	char packet[VHOST_MAX_PRINT_BUFF]; \
+	\
+	if ((header)) \
+		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Header size %d: ", (device->device_fh), (size)); \
+	else \
+		snprintf(packet, VHOST_MAX_PRINT_BUFF, "(%"PRIu64") Packet size %d: ", (device->device_fh), (size)); \
+	for (index = 0; index < (size); index++) { \
+		snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), \
+			"%02hhx ", pkt_addr[index]); \
+	} \
+	snprintf(packet + strnlen(packet, VHOST_MAX_PRINT_BUFF), VHOST_MAX_PRINT_BUFF - strnlen(packet, VHOST_MAX_PRINT_BUFF), "\n"); \
+	\
+	LOG_DEBUG(VHOST_DATA, "%s", packet); \
+} while (0)
+#else
+#define LOG_LEVEL RTE_LOG_INFO
+#define LOG_DEBUG(log_type, fmt, args...) do {} while (0)
+#define PRINT_PACKET(device, addr, size, header) do {} while (0)
+#endif
+
+
+/*
+ * Structure used to identify device context.
+ */
+struct vhost_device_ctx {
+	pid_t		pid;	/* PID of process calling the IOCTL. */
+	uint64_t	fh;	/* Populated with fi->fh to track the device index. */
+};
+
+/*
+ * Structure contains function pointers to be defined in virtio-net.c. These
+ * functions are called in CUSE context and are used to configure devices.
+ */
+struct vhost_net_device_ops {
+	int (*new_device)(struct vhost_device_ctx);
+	void (*destroy_device)(struct vhost_device_ctx);
+
+	int (*get_features)(struct vhost_device_ctx, uint64_t *);
+	int (*set_features)(struct vhost_device_ctx, uint64_t *);
+
+	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
+
+	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
+	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *);
+	int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *);
+	int (*get_vring_base)(struct vhost_device_ctx, uint32_t, struct vhost_vring_state *);
+
+	int (*set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file *);
+	int (*set_vring_call)(struct vhost_device_ctx, struct vhost_vring_file *);
+
+	int (*set_backend)(struct vhost_device_ctx, struct vhost_vring_file *);
+
+	int (*set_owner)(struct vhost_device_ctx);
+	int (*reset_owner)(struct vhost_device_ctx);
+};
+
+
+struct vhost_net_device_ops const *get_virtio_net_callbacks(void);
+#endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index 57c76cb..2bb07af 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -44,7 +44,7 @@
 #include <rte_string_fns.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 
 #define FUSE_OPT_DUMMY "\0\0"
 #define FUSE_OPT_FORE  "-f\0\0"
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index ccfd82f..c7c9550 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -38,7 +38,7 @@
 #include <rte_memcpy.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 
 #define MAX_PKT_BURST 32
 
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 52b4957..6bc9d51 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -53,7 +53,7 @@
 #include <rte_memory.h>
 #include <rte_virtio_net.h>
 
-#include "vhost-net-cdev.h"
+#include "vhost-net.h"
 #include "eventfd_link/eventfd_link.h"
 
 /*
-- 
1.9.3


* [dpdk-dev] [PATCH v3 04/11] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (2 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 03/11] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 05/11] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c Przemyslaw Czesnowicz
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

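vhost-user doesn't need the eventfd_link kernel module to copy fds between
processes, so move the fd copying code out of virtio-net.c into the
CUSE-specific file vhost_cuse/eventfd_copy.c.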

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
---
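
Note: for readers wondering why vhost-user can drop the kernel module, a unix
domain socket can carry file descriptors directly as SCM_RIGHTS ancillary
data. The block below is only a minimal sketch of that mechanism, not code
from this series: send_fd(), recv_fd() and fd_passing_demo() are hypothetical
names, and the demo passes an eventfd across a socketpair inside one process
instead of between QEMU and a vhost process.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one file descriptor over a connected unix domain socket. */
static int
send_fd(int sock, int fd)
{
	char dummy = 0;	/* at least one data byte must accompany the fd */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);	/* caller must pass a valid descriptor */
	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on failure. */
static int
recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/*
 * Pass an eventfd across a socketpair, signal through the original
 * descriptor and read the event through the received copy.
 * Returns 0 on success, -1 on any failure.
 */
static int
fd_passing_demo(void)
{
	int sv[2], efd, copy, ret = -1;
	uint64_t val = 1, got = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
	if (efd < 0)
		goto out_sock;
	if (send_fd(sv[0], efd) < 0)
		goto out_efd;
	copy = recv_fd(sv[1]);
	if (copy < 0)
		goto out_efd;
	if (write(efd, &val, sizeof(val)) == (ssize_t)sizeof(val) &&
	    read(copy, &got, sizeof(got)) == (ssize_t)sizeof(got) &&
	    got == 1)
		ret = 0;
	close(copy);
out_efd:
	close(efd);
out_sock:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Calling fd_passing_demo() from a test program should return 0 on Linux. The
received descriptor refers to the same eventfd object as the original, which
is the effect the EVENTFD_COPY ioctl provides for the CUSE path.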
 lib/librte_vhost/Makefile                    |  2 +-
 lib/librte_vhost/vhost_cuse/eventfd_copy.c   | 88 ++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/eventfd_copy.h   | 39 ++++++++++++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c | 41 +++++++++----
 lib/librte_vhost/virtio-net.c                | 75 ++++--------------------
 5 files changed, 170 insertions(+), 75 deletions(-)
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
 create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 49ae7ae..88d1295 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -41,7 +41,7 @@ LIBABIVER := 1
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost_cuse/eventfd_copy.c b/lib/librte_vhost/vhost_cuse/eventfd_copy.c
new file mode 100644
index 0000000..4d697a2
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/eventfd_copy.c
@@ -0,0 +1,88 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/eventfd.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <rte_log.h>
+
+#include "eventfd_link/eventfd_link.h"
+#include "eventfd_copy.h"
+#include "vhost-net.h"
+
+static const char eventfd_cdev[] = "/dev/eventfd-link";
+
+/*
+ * This function uses the eventfd_link kernel module to copy an eventfd file
+ * descriptor provided by QEMU into our process space.
+ */
+int
+eventfd_copy(int target_fd, int target_pid)
+{
+	int eventfd_link, ret;
+	struct eventfd_copy eventfd_copy;
+	int fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
+
+	if (fd == -1)
+		return -1;
+
+	/* Open the character device to the kernel module. */
+	/* TODO: check this at startup instead of failing only once the VM boots. */
+	eventfd_link = open(eventfd_cdev, O_RDWR);
+	if (eventfd_link < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"eventfd_link module is not loaded\n");
+		close(fd);
+		return -1;
+	}
+
+	eventfd_copy.source_fd = fd;
+	eventfd_copy.target_fd = target_fd;
+	eventfd_copy.target_pid = target_pid;
+	/* Call the IOCTL to copy the eventfd. */
+	ret = ioctl(eventfd_link, EVENTFD_COPY, &eventfd_copy);
+	close(eventfd_link);
+
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"EVENTFD_COPY ioctl failed\n");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
diff --git a/lib/librte_vhost/vhost_cuse/eventfd_copy.h b/lib/librte_vhost/vhost_cuse/eventfd_copy.h
new file mode 100644
index 0000000..19ae30d
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/eventfd_copy.h
@@ -0,0 +1,39 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _EVENTFD_H
+#define _EVENTFD_H
+
+int
+eventfd_copy(int target_fd, int target_pid);
+
+#endif
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index 2bb07af..e7794b0 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -45,6 +45,7 @@
 #include <rte_virtio_net.h>
 
 #include "vhost-net.h"
+#include "eventfd_copy.h"
 
 #define FUSE_OPT_DUMMY "\0\0"
 #define FUSE_OPT_FORE  "-f\0\0"
@@ -284,17 +285,37 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
 		break;
 
 	case VHOST_SET_VRING_KICK:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file,
-			ops->set_vring_kick);
-		break;
-
 	case VHOST_SET_VRING_CALL:
-		LOG_DEBUG(VHOST_CONFIG,
-			"(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file,
-			ops->set_vring_call);
+		if (cmd == VHOST_SET_VRING_KICK)
+			LOG_DEBUG(VHOST_CONFIG,
+				"(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n",
+				ctx.fh);
+		else
+			LOG_DEBUG(VHOST_CONFIG,
+				"(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n",
+				ctx.fh);
+		if (!in_buf)
+			VHOST_IOCTL_RETRY(sizeof(struct vhost_vring_file), 0);
+		else {
+			int fd;
+			file = *(const struct vhost_vring_file *)in_buf;
+			LOG_DEBUG(VHOST_CONFIG,
+				"idx:%d fd:%d\n", file.index, file.fd);
+			fd = eventfd_copy(file.fd, ctx.pid);
+			if (fd < 0) {
+				fuse_reply_ioctl(req, -1, NULL, 0);
+				result = -1;
+				break;
+			}
+			file.fd = fd;
+			if (cmd == VHOST_SET_VRING_KICK) {
+				result = ops->set_vring_kick(ctx, &file);
+				fuse_reply_ioctl(req, result, NULL, 0);
+			} else {
+				result = ops->set_vring_call(ctx, &file);
+				fuse_reply_ioctl(req, result, NULL, 0);
+			}
+		}
 		break;
 
 	default:
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 6bc9d51..dd8d53a 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -38,10 +38,10 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
-#include <sys/eventfd.h>
-#include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <unistd.h>
+#include <sys/ioctl.h>
+
 
 #include <sys/socket.h>
 #include <linux/if_tun.h>
@@ -53,8 +53,8 @@
 #include <rte_memory.h>
 #include <rte_virtio_net.h>
 
+#include "vhost_cuse/eventfd_copy.h"
 #include "vhost-net.h"
-#include "eventfd_link/eventfd_link.h"
 
 /*
  * Device linked list structure for configuration.
@@ -64,8 +64,6 @@ struct virtio_net_config_ll {
 	struct virtio_net_config_ll *next;	/* Next dev on linked list.*/
 };
 
-const char eventfd_cdev[] = "/dev/eventfd-link";
-
 /* device ops to add/remove device to/from data core. */
 static struct virtio_net_device_ops const *notify_ops;
 /* root address of the linked list of managed virtio devices */
@@ -904,37 +902,6 @@ get_vring_base(struct vhost_device_ctx ctx, uint32_t index,
 	return 0;
 }
 
-/*
- * This function uses the eventfd_link kernel module to copy an eventfd file
- * descriptor provided by QEMU in to our process space.
- */
-static int
-eventfd_copy(struct virtio_net *dev, struct eventfd_copy *eventfd_copy)
-{
-	int eventfd_link, ret;
-
-	/* Open the character device to the kernel module. */
-	eventfd_link = open(eventfd_cdev, O_RDWR);
-	if (eventfd_link < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") eventfd_link module is not loaded\n",
-			dev->device_fh);
-		return -1;
-	}
-
-	/* Call the IOCTL to copy the eventfd. */
-	ret = ioctl(eventfd_link, EVENTFD_COPY, eventfd_copy);
-	close(eventfd_link);
-
-	if (ret < 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") EVENTFD_COPY ioctl failed\n",
-			dev->device_fh);
-		return -1;
-	}
-
-	return 0;
-}
 
 /*
  * Called from CUSE IOCTL: VHOST_SET_VRING_CALL
@@ -945,7 +912,6 @@ static int
 set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
-	struct eventfd_copy	eventfd_kick;
 	struct vhost_virtqueue *vq;
 
 	dev = get_device(ctx);
@@ -958,14 +924,7 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (vq->kickfd)
 		close((int)vq->kickfd);
 
-	/* Populate the eventfd_copy structure and call eventfd_copy. */
-	vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-	eventfd_kick.source_fd = vq->kickfd;
-	eventfd_kick.target_fd = file->fd;
-	eventfd_kick.target_pid = ctx.pid;
-
-	if (eventfd_copy(dev, &eventfd_kick))
-		return -1;
+	vq->kickfd = file->fd;
 
 	return 0;
 }
@@ -979,7 +938,6 @@ static int
 set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
-	struct eventfd_copy eventfd_call;
 	struct vhost_virtqueue *vq;
 
 	dev = get_device(ctx);
@@ -991,15 +949,7 @@ set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 
 	if (vq->callfd)
 		close((int)vq->callfd);
-
-	/* Populate the eventfd_copy structure and call eventfd_copy. */
-	vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-	eventfd_call.source_fd = vq->callfd;
-	eventfd_call.target_fd = file->fd;
-	eventfd_call.target_pid = ctx.pid;
-
-	if (eventfd_copy(dev, &eventfd_call))
-		return -1;
+	vq->callfd = file->fd;
 
 	return 0;
 }
@@ -1011,21 +961,18 @@ set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 static int
 get_ifname(struct virtio_net *dev, int tap_fd, int pid)
 {
-	struct eventfd_copy fd_tap;
+	int fd_tap;
 	struct ifreq ifr;
 	uint32_t size, ifr_size;
 	int ret;
 
-	fd_tap.source_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
-	fd_tap.target_fd = tap_fd;
-	fd_tap.target_pid = pid;
-
-	if (eventfd_copy(dev, &fd_tap))
-		return -1;
+	fd_tap = eventfd_copy(tap_fd, pid);
+	if (fd_tap < 0)
+		return -1;
 
-	ret = ioctl(fd_tap.source_fd, TUNGETIFF, &ifr);
+	ret = ioctl(fd_tap, TUNGETIFF, &ifr);
 
-	if (close(fd_tap.source_fd) < 0)
+	if (close(fd_tap) < 0)
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%"PRIu64") fd close failed\n",
 			dev->device_fh);
-- 
1.9.3


* [dpdk-dev] [PATCH v3 05/11] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (3 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 04/11] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 06/11] lib/librte_vhost: make host_memory_map a more generic function Przemyslaw Czesnowicz
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
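
Note: host_memory_map() below splits each /proc/<pid>/maps line into eight
fields with strtok_r() and converts them one at a time. As an illustration of
the line format only, here is a minimal sketch that parses one line with a
single sscanf() call. It is not part of this patch: struct maps_entry and
parse_maps_line() are hypothetical names, the inode is read as decimal (the
way the kernel prints it), and a path containing whitespace would be cut at
the first space.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed line of /proc/<pid>/maps (illustrative layout). */
struct maps_entry {
	uint64_t start;		/* first virtual address of the mapping */
	uint64_t end;		/* address just past the end of the mapping */
	uint64_t pgoff;		/* offset into the backing file */
	unsigned int maj;	/* device major number (hex in the file) */
	unsigned int min;	/* device minor number (hex in the file) */
	unsigned long ino;	/* inode number (decimal in the file) */
	char prot[5];		/* permissions, e.g. "rw-s" */
	char fname[256];	/* backing file, empty for anonymous maps */
};

/*
 * Parse a line such as
 *   2aaaaac00000-2aaaeac00000 rw-s 00000000 00:1e 12345 /dev/hugepages/x
 * Returns 0 on success, -1 if the mandatory fields are missing.
 */
static int
parse_maps_line(const char *line, struct maps_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	e->fname[0] = '\0';	/* stays empty when no path is present */
	n = sscanf(line,
		"%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %lu %255s",
		&e->start, &e->end, e->prot, &e->pgoff,
		&e->maj, &e->min, &e->ino, e->fname);

	/* Seven fields are mandatory; the path is optional. */
	return n >= 7 ? 0 : -1;
}
```

The mapping length that the patch stores in procmap.len corresponds to
e.end - e.start here, and the lookup in host_memory_map() amounts to finding
the entry whose start equals the guest base address.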
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 257 ++++++++++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c

diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
new file mode 100644
index 0000000..baca379
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -0,0 +1,257 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <dirent.h>
+#include <linux/vhost.h>
+#include <linux/virtio_net.h>
+#include <fuse/cuse_lowlevel.h>
+#include <stddef.h>
+#include <string.h>
+#include <stdlib.h>
+#include <sys/eventfd.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <rte_log.h>
+
+#include "vhost-net.h"
+
+/* Line size for reading maps file. */
+static const uint32_t BUFSIZE = PATH_MAX;
+
+/* Size of prot char array in procmap. */
+#define PROT_SZ 5
+
+/* Number of elements in procmap struct. */
+#define PROCMAP_SZ 8
+
+/* Structure containing information gathered from maps file. */
+struct procmap {
+	uint64_t va_start;	/* Start virtual address in file. */
+	uint64_t len;		/* Size of file. */
+	uint64_t pgoff;		/* Not used. */
+	uint32_t maj;		/* Not used. */
+	uint32_t min;		/* Not used. */
+	uint32_t ino;		/* Not used. */
+	char prot[PROT_SZ];	/* Not used. */
+	char fname[PATH_MAX];	/* File name. */
+};
+
+/*
+ * Locate the file containing QEMU's memory space and
+ * map it to our address space.
+ */
+static int
+host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
+	pid_t pid, uint64_t addr)
+{
+	struct dirent *dptr = NULL;
+	struct procmap procmap;
+	DIR *dp = NULL;
+	int fd;
+	int i;
+	char memfile[PATH_MAX];
+	char mapfile[PATH_MAX];
+	char procdir[PATH_MAX];
+	char resolved_path[PATH_MAX];
+	char *path = NULL;
+	FILE *fmap;
+	void *map;
+	uint8_t found = 0;
+	char line[BUFSIZE];
+	char dlm[] = "-   :   ";
+	char *str, *sp, *in[PROCMAP_SZ];
+	char *end = NULL;
+
+	/* Path where mem files are located. */
+	snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid);
+	/* Maps file used to locate mem file. */
+	snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid);
+
+	fmap = fopen(mapfile, "r");
+	if (fmap == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to open maps file for pid %d\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+
+	/* Read through maps file until we find out base_address. */
+	while (fgets(line, BUFSIZE, fmap) != 0) {
+		str = line;
+		errno = 0;
+		/* Split line into fields. */
+		for (i = 0; i < PROCMAP_SZ; i++) {
+			in[i] = strtok_r(str, &dlm[i], &sp);
+			if ((in[i] == NULL) || (errno != 0)) {
+				fclose(fmap);
+				return -1;
+			}
+			str = NULL;
+		}
+
+		/* Convert/Copy each field as needed. */
+		procmap.va_start = strtoull(in[0], &end, 16);
+		if ((in[0] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.len = strtoull(in[1], &end, 16);
+		if ((in[1] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.pgoff = strtoull(in[3], &end, 16);
+		if ((in[3] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.maj = strtoul(in[4], &end, 16);
+		if ((in[4] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.min = strtoul(in[5], &end, 16);
+		if ((in[5] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		procmap.ino = strtoul(in[6], &end, 16);
+		if ((in[6] == '\0') || (end == NULL) || (*end != '\0') ||
+			(errno != 0)) {
+			fclose(fmap);
+			return -1;
+		}
+
+		memcpy(&procmap.prot, in[2], PROT_SZ);
+		memcpy(&procmap.fname, in[7], PATH_MAX);
+
+		if (procmap.va_start == addr) {
+			procmap.len = procmap.len - procmap.va_start;
+			found = 1;
+			break;
+		}
+	}
+	fclose(fmap);
+
+	if (!found) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to find memory file in pid %d maps file\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+
+	/* Find the guest memory file among the process fds. */
+	dp = opendir(procdir);
+	if (dp == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Cannot open pid %d process directory\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+
+	found = 0;
+
+	/* Read the fd directory contents. */
+	while (NULL != (dptr = readdir(dp))) {
+		snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s",
+				pid, dptr->d_name);
+		path = realpath(memfile, resolved_path);
+		if ((path == NULL) && (strlen(resolved_path) == 0)) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"(%"PRIu64") Failed to resolve fd directory\n",
+				dev->device_fh);
+			closedir(dp);
+			return -1;
+		}
+		if (strncmp(resolved_path, procmap.fname,
+			strnlen(procmap.fname, PATH_MAX)) == 0) {
+			found = 1;
+			break;
+		}
+	}
+
+	closedir(dp);
+
+	if (found == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to find memory file for pid %d\n",
+			dev->device_fh, pid);
+		return -1;
+	}
+	/* Open the shared memory file and map the memory into this process. */
+	fd = open(memfile, O_RDWR);
+
+	if (fd == -1) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to open %s for pid %d\n",
+			dev->device_fh, memfile, pid);
+		return -1;
+	}
+
+	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE,
+		MAP_POPULATE|MAP_SHARED, fd, 0);
+	close(fd);
+
+	if (map == MAP_FAILED) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Error mapping the file %s for pid %d\n",
+			dev->device_fh, memfile, pid);
+		return -1;
+	}
+
+	/* Store the memory address and size in the device data structure */
+	mem->mapped_address = (uint64_t)(uintptr_t)map;
+	mem->mapped_size = procmap.len;
+
+	LOG_DEBUG(VHOST_CONFIG,
+		"(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n",
+		dev->device_fh,
+		memfile, resolved_path,
+		(unsigned long long)mem->mapped_size, map);
+
+	return 0;
+}
-- 
1.9.3


* [dpdk-dev] [PATCH v3 06/11] lib/librte_vhost: make host_memory_map a more generic function.
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (4 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 05/11] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 07/11] lib/librte_vhost: implement cuse_set_memory_table Przemyslaw Czesnowicz
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

This function accepts a virtual address and the pid of the qemu process, and
maps the memory behind that address into the current (vhost) process's
address space.

The memory behind the virtual address must be backed by a file, and the
virtual address must be the starting address of that mapping.
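
As an illustration of the lookup step only, here is a minimal sketch of how one
line of /proc/<pid>/maps could be parsed to find a file-backed mapping that
starts at a given address. It is not the code in this patch (which splits the
line with strtok_r); parse_maps_line, find_mapping and struct maps_entry are
hypothetical names, and the 4096-byte path buffer is an assumption.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one /proc/<pid>/maps line (not from the patch). */
struct maps_entry {
	uint64_t start;		/* first address of the mapping */
	uint64_t end;		/* one past the last address */
	char path[4096];	/* backing file; assumed buffer size */
};

/* Returns 0 on success, -1 if the line is malformed or has no backing path. */
int
parse_maps_line(const char *line, struct maps_entry *e)
{
	char perms[5];
	uint64_t pgoff;
	unsigned int maj, min;
	unsigned long ino;
	int n;

	e->path[0] = '\0';
	/* Fields: start-end perms offset major:minor inode path */
	n = sscanf(line,
		"%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %lu %4095[^\n]",
		&e->start, &e->end, perms, &pgoff, &maj, &min, &ino, e->path);
	return (n == 8) ? 0 : -1;
}

/* Returns 0 and fills *len when a file-backed mapping starts exactly at addr. */
int
find_mapping(FILE *maps, uint64_t addr, struct maps_entry *out, uint64_t *len)
{
	char line[4096 + 128];

	while (fgets(line, sizeof(line), maps) != NULL) {
		if (parse_maps_line(line, out) != 0)
			continue;	/* anonymous or unparsable line */
		if (out->start == addr) {
			*len = out->end - out->start;
			return 0;
		}
	}
	return -1;
}
```

A caller would open /proc/<pid>/maps with fopen() and pass the stream together
with the qemu virtual address. Anonymous mappings are skipped because they have
no path field, which matches the requirement that the memory be file-backed.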

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
---
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 42 +++++++++++++--------------
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index baca379..58ac3dd 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -75,8 +75,8 @@ struct procmap {
  * map it to our address space.
  */
 static int
-host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
-	pid_t pid, uint64_t addr)
+host_memory_map(pid_t pid, uint64_t addr,
+	uint64_t *mapped_address, uint64_t *mapped_size)
 {
 	struct dirent *dptr = NULL;
 	struct procmap procmap;
@@ -104,8 +104,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	fmap = fopen(mapfile, "r");
 	if (fmap == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open maps file for pid %d\n",
-			dev->device_fh, pid);
+			"Failed to open maps file for pid %d\n",
+			pid);
 		return -1;
 	}
 
@@ -179,8 +179,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 
 	if (!found) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file in pid %d maps file\n",
-			dev->device_fh, pid);
+			"Failed to find memory file in pid %d maps file\n",
+			pid);
 		return -1;
 	}
 
@@ -188,8 +188,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 	dp = opendir(procdir);
 	if (dp == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Cannot open pid %d process directory\n",
-			dev->device_fh, pid);
+			"Cannot open pid %d process directory\n",
+			pid);
 		return -1;
 	}
 
@@ -202,8 +202,7 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 		path = realpath(memfile, resolved_path);
 		if ((path == NULL) && (strlen(resolved_path) == 0)) {
 			RTE_LOG(ERR, VHOST_CONFIG,
-				"(%"PRIu64") Failed to resolve fd directory\n",
-				dev->device_fh);
+				"Failed to resolve fd directory\n");
 			closedir(dp);
 			return -1;
 		}
@@ -218,8 +217,8 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 
 	if (found == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file for pid %d\n",
-			dev->device_fh, pid);
+			"Failed to find memory file for pid %d\n",
+			pid);
 		return -1;
 	}
 	/* Open the shared memory file and map the memory into this process. */
@@ -227,31 +226,30 @@ host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
 
 	if (fd == -1) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open %s for pid %d\n",
-			dev->device_fh, memfile, pid);
+			"Failed to open %s for pid %d\n",
+			memfile, pid);
 		return -1;
 	}
 
 	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE,
-		MAP_POPULATE|MAP_SHARED, fd, 0);
+			MAP_POPULATE|MAP_SHARED, fd, 0);
 	close(fd);
 
 	if (map == MAP_FAILED) {
 		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Error mapping the file %s for pid %d\n",
-			dev->device_fh, memfile, pid);
+			"Error mapping the file %s for pid %d\n",
+			memfile, pid);
 		return -1;
 	}
 
 	/* Store the memory address and size in the device data structure */
-	mem->mapped_address = (uint64_t)(uintptr_t)map;
-	mem->mapped_size = procmap.len;
+	*mapped_address = (uint64_t)(uintptr_t)map;
+	*mapped_size = procmap.len;
 
 	LOG_DEBUG(VHOST_CONFIG,
-		"(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n",
-		dev->device_fh,
+		"Mem File: %s->%s - Size: %llu - VA: %p\n",
 		memfile, resolved_path,
-		(unsigned long long)mem->mapped_size, map);
+		(unsigned long long)*mapped_size, map);
 
 	return 0;
 }
-- 
1.9.3


* [dpdk-dev] [PATCH v3 07/11] lib/librte_vhost: implement cuse_set_memory_table
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (5 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 06/11] lib/librte_vhost: make host_memory_map a more generic function Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 08/11] lib/librte_vhost: add select based event driven processing Przemyslaw Czesnowicz
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

Remove the set_mem_table op from vhost_net_device_ops.

vhost-cuse and vhost-user will each implement their own handler for setting
the memory table (cuse_set_mem_table in the vhost-cuse case).

The current vhost-cuse implementation does not support guest NUMA memory;
it assumes that guest memory is backed by only one file.
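
To make the address arithmetic easier to follow, here is a small sketch of how
the per-region offset computed at the end of cuse_set_mem_table can be used to
translate a guest physical address into a vhost virtual address. struct region
is a simplified, hypothetical stand-in for struct virtio_memory_regions, and
region_set_offset and gpa_to_vva are illustrative names that are not defined
by this patch.

```c
#include <stdint.h>

/* Hypothetical, simplified region record; field names follow the patch. */
struct region {
	uint64_t guest_phys_address;
	uint64_t guest_phys_address_end;
	uint64_t userspace_address;	/* address in qemu's address space */
	uint64_t address_offset;	/* added to a GPA to get a vhost VA */
};

/* Same arithmetic as the final loop of cuse_set_mem_table. */
void
region_set_offset(struct region *r, uint64_t mapped_address,
	uint64_t base_address)
{
	r->address_offset = mapped_address - base_address +
		r->userspace_address - r->guest_phys_address;
}

/* Translate a guest physical address; returns 0 if no region covers it. */
uint64_t
gpa_to_vva(const struct region *regs, uint32_t nregions, uint64_t gpa)
{
	uint32_t i;

	for (i = 0; i < nregions; i++) {
		if (gpa >= regs[i].guest_phys_address &&
			gpa < regs[i].guest_phys_address_end)
			return gpa + regs[i].address_offset;
	}
	return 0;
}
```

Because every region is assumed to live inside the single mapped file, one
(mapped_address - base_address) delta is enough to rebase all of them.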

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
---
 lib/librte_vhost/Makefile                     |   2 +-
 lib/librte_vhost/vhost-net.h                  |   4 +-
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  |   7 +-
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 115 +++++++++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  45 ++++
 lib/librte_vhost/virtio-net.c                 | 353 +-------------------------
 lib/librte_vhost/virtio-net.h                 |  43 ++++
 7 files changed, 213 insertions(+), 356 deletions(-)
 create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
 create mode 100644 lib/librte_vhost/virtio-net.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 88d1295..298e4f8 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -41,7 +41,7 @@ LIBABIVER := 1
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c vhost_cuse/eventfd_copy.c virtio-net.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c vhost_rxtx.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 03a5c57..86b38a5 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -41,6 +41,8 @@
 
 #include <rte_log.h>
 
+#define VHOST_MEMORY_MAX_NREGIONS 8
+
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
 #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
@@ -92,8 +94,6 @@ struct vhost_net_device_ops {
 	int (*get_features)(struct vhost_device_ctx, uint64_t *);
 	int (*set_features)(struct vhost_device_ctx, uint64_t *);
 
-	int (*set_mem_table)(struct vhost_device_ctx, const void *, uint32_t);
-
 	int (*set_vring_num)(struct vhost_device_ctx, struct vhost_vring_state *);
 	int (*set_vring_addr)(struct vhost_device_ctx, struct vhost_vring_addr *);
 	int (*set_vring_base)(struct vhost_device_ctx, struct vhost_vring_state *);
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index e7794b0..72609a3 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -44,6 +44,7 @@
 #include <rte_string_fns.h>
 #include <rte_virtio_net.h>
 
+#include "virtio-net-cdev.h"
 #include "vhost-net.h"
 #include "eventfd_copy.h"
 
@@ -57,7 +58,7 @@ static const char cuse_device_name[] = "/dev/cuse";
 static const char default_cdev[] = "vhost-net";
 
 static struct fuse_session *session;
-static struct vhost_net_device_ops const *ops;
+struct vhost_net_device_ops const *ops;
 
 /*
  * Returns vhost_device_ctx from given fuse_req_t. The index is populated later
@@ -247,8 +248,8 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
 			break;
 
 		default:
-			result = ops->set_mem_table(ctx,
-					in_buf, mem_temp.nregions);
+			result = cuse_set_mem_table(ctx, in_buf,
+				mem_temp.nregions);
 			if (result)
 				fuse_reply_err(req, EINVAL);
 			else
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index 58ac3dd..adebb54 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -47,7 +47,10 @@
 
 #include <rte_log.h>
 
+#include "rte_virtio_net.h"
 #include "vhost-net.h"
+#include "virtio-net-cdev.h"
+#include "virtio-net.h"
 
 /* Line size for reading maps file. */
 static const uint32_t BUFSIZE = PATH_MAX;
@@ -253,3 +256,115 @@ host_memory_map(pid_t pid, uint64_t addr,
 
 	return 0;
 }
+
+int
+cuse_set_mem_table(struct vhost_device_ctx ctx,
+	const struct vhost_memory *mem_regions_addr, uint32_t nregions)
+{
+	uint64_t size = offsetof(struct vhost_memory, regions);
+	uint32_t idx, valid_regions;
+	struct virtio_memory_regions *pregion;
+	struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
+		((uint64_t)(uintptr_t)mem_regions_addr + size);
+	uint64_t base_address = 0, mapped_address, mapped_size;
+	struct virtio_net *dev;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	if (dev->mem && dev->mem->mapped_address) {
+		munmap((void *)(uintptr_t)dev->mem->mapped_address,
+			(size_t)dev->mem->mapped_size);
+		free(dev->mem);
+		dev->mem = NULL;
+	}
+
+	dev->mem = calloc(1, sizeof(struct virtio_memory) +
+		sizeof(struct virtio_memory_regions) * nregions);
+	if (dev->mem == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to allocate memory for dev->mem\n",
+			dev->device_fh);
+		return -1;
+	}
+
+	pregion = &dev->mem->regions[0];
+
+	for (idx = 0; idx < nregions; idx++) {
+		pregion[idx].guest_phys_address =
+			mem_regions[idx].guest_phys_addr;
+		pregion[idx].guest_phys_address_end =
+			pregion[idx].guest_phys_address +
+			mem_regions[idx].memory_size;
+		pregion[idx].memory_size =
+			mem_regions[idx].memory_size;
+		pregion[idx].userspace_address =
+			mem_regions[idx].userspace_addr;
+
+		LOG_DEBUG(VHOST_CONFIG,
+			"REGION: %u - GPA: %p - QVA: %p - SIZE (%"PRIu64")\n",
+			idx,
+			(void *)(uintptr_t)pregion[idx].guest_phys_address,
+			(void *)(uintptr_t)pregion[idx].userspace_address,
+			pregion[idx].memory_size);
+
+		/*set the base address mapping*/
+		if (pregion[idx].guest_phys_address == 0x0) {
+			base_address =
+				pregion[idx].userspace_address;
+			/* Map VM memory file */
+			if (host_memory_map(ctx.pid, base_address,
+				&mapped_address, &mapped_size) != 0) {
+				free(dev->mem);
+				dev->mem = NULL;
+				return -1;
+			}
+			dev->mem->mapped_address = mapped_address;
+			dev->mem->base_address = base_address;
+			dev->mem->mapped_size = mapped_size;
+		}
+	}
+
+	/* Check that we have a valid base address. */
+	if (base_address == 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to find base address of qemu memory file.\n");
+		free(dev->mem);
+		dev->mem = NULL;
+		return -1;
+	}
+
+	valid_regions = nregions;
+	for (idx = 0; idx < nregions; idx++) {
+		if ((pregion[idx].userspace_address < base_address) ||
+			(pregion[idx].userspace_address >
+			(base_address + mapped_size)))
+			valid_regions--;
+	}
+
+
+	if (valid_regions != nregions) {
+		valid_regions = 0;
+		for (idx = nregions; 0 != idx--; ) {
+			if ((pregion[idx].userspace_address < base_address) ||
+			(pregion[idx].userspace_address >
+			(base_address + mapped_size))) {
+				memmove(&pregion[idx], &pregion[idx + 1],
+					sizeof(struct virtio_memory_regions) *
+					valid_regions);
+			} else
+				valid_regions++;
+		}
+	}
+
+	for (idx = 0; idx < valid_regions; idx++) {
+		pregion[idx].address_offset =
+			mapped_address - base_address +
+			pregion[idx].userspace_address -
+			pregion[idx].guest_phys_address;
+	}
+	dev->mem->nregions = valid_regions;
+
+	return 0;
+}
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
new file mode 100644
index 0000000..5ee81b1
--- /dev/null
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _VIRTIO_NET_CDEV_H
+#define _VIRTIO_NET_CDEV_H
+
+#include <stdint.h>
+#include <linux/vhost.h>
+
+#include "vhost-net.h"
+
+int
+cuse_set_mem_table(struct vhost_device_ctx ctx,
+	const struct vhost_memory *mem_regions_addr, uint32_t nregions);
+
+#endif
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index dd8d53a..3e73a35 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -31,8 +31,6 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <dirent.h>
-#include <fuse/cuse_lowlevel.h>
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
 #include <stddef.h>
@@ -55,6 +53,7 @@
 
 #include "vhost_cuse/eventfd_copy.h"
 #include "vhost-net.h"
+#include "virtio-net.h"
 
 /*
  * Device linked list structure for configuration.
@@ -65,7 +64,7 @@ struct virtio_net_config_ll {
 };
 
 /* device ops to add/remove device to/from data core. */
-static struct virtio_net_device_ops const *notify_ops;
+struct virtio_net_device_ops const *notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;
 
@@ -75,26 +74,6 @@ static struct virtio_net_config_ll *ll_root;
 				(1ULL << VIRTIO_NET_F_CTRL_RX))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
 
-/* Line size for reading maps file. */
-static const uint32_t BUFSIZE = PATH_MAX;
-
-/* Size of prot char array in procmap. */
-#define PROT_SZ 5
-
-/* Number of elements in procmap struct. */
-#define PROCMAP_SZ 8
-
-/* Structure containing information gathered from maps file. */
-struct procmap {
-	uint64_t va_start;	/* Start virtual address in file. */
-	uint64_t len;		/* Size of file. */
-	uint64_t pgoff;		/* Not used. */
-	uint32_t maj;		/* Not used. */
-	uint32_t min;		/* Not used. */
-	uint32_t ino;		/* Not used. */
-	char prot[PROT_SZ];	/* Not used. */
-	char fname[PATH_MAX];	/* File name. */
-};
 
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
@@ -121,191 +100,6 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
 	return vhost_va;
 }
 
-/*
- * Locate the file containing QEMU's memory space and
- * map it to our address space.
- */
-static int
-host_memory_map(struct virtio_net *dev, struct virtio_memory *mem,
-	pid_t pid, uint64_t addr)
-{
-	struct dirent *dptr = NULL;
-	struct procmap procmap;
-	DIR *dp = NULL;
-	int fd;
-	int i;
-	char memfile[PATH_MAX];
-	char mapfile[PATH_MAX];
-	char procdir[PATH_MAX];
-	char resolved_path[PATH_MAX];
-	char *path = NULL;
-	FILE *fmap;
-	void *map;
-	uint8_t found = 0;
-	char line[BUFSIZE];
-	char dlm[] = "-   :   ";
-	char *str, *sp, *in[PROCMAP_SZ];
-	char *end = NULL;
-
-	/* Path where mem files are located. */
-	snprintf(procdir, PATH_MAX, "/proc/%u/fd/", pid);
-	/* Maps file used to locate mem file. */
-	snprintf(mapfile, PATH_MAX, "/proc/%u/maps", pid);
-
-	fmap = fopen(mapfile, "r");
-	if (fmap == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open maps file for pid %d\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-
-	/* Read through maps file until we find out base_address. */
-	while (fgets(line, BUFSIZE, fmap) != 0) {
-		str = line;
-		errno = 0;
-		/* Split line into fields. */
-		for (i = 0; i < PROCMAP_SZ; i++) {
-			in[i] = strtok_r(str, &dlm[i], &sp);
-			if ((in[i] == NULL) || (errno != 0)) {
-				fclose(fmap);
-				return -1;
-			}
-			str = NULL;
-		}
-
-		/* Convert/Copy each field as needed. */
-		procmap.va_start = strtoull(in[0], &end, 16);
-		if ((in[0] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.len = strtoull(in[1], &end, 16);
-		if ((in[1] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.pgoff = strtoull(in[3], &end, 16);
-		if ((in[3] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.maj = strtoul(in[4], &end, 16);
-		if ((in[4] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.min = strtoul(in[5], &end, 16);
-		if ((in[5] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		procmap.ino = strtoul(in[6], &end, 16);
-		if ((in[6] == '\0') || (end == NULL) || (*end != '\0') ||
-			(errno != 0)) {
-			fclose(fmap);
-			return -1;
-		}
-
-		memcpy(&procmap.prot, in[2], PROT_SZ);
-		memcpy(&procmap.fname, in[7], PATH_MAX);
-
-		if (procmap.va_start == addr) {
-			procmap.len = procmap.len - procmap.va_start;
-			found = 1;
-			break;
-		}
-	}
-	fclose(fmap);
-
-	if (!found) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file in pid %d maps file\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-
-	/* Find the guest memory file among the process fds. */
-	dp = opendir(procdir);
-	if (dp == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Cannot open pid %d process directory\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-
-	found = 0;
-
-	/* Read the fd directory contents. */
-	while (NULL != (dptr = readdir(dp))) {
-		snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s",
-				pid, dptr->d_name);
-		path = realpath(memfile, resolved_path);
-		if ((path == NULL) && (strlen(resolved_path) == 0)) {
-			RTE_LOG(ERR, VHOST_CONFIG,
-				"(%"PRIu64") Failed to resolve fd directory\n",
-				dev->device_fh);
-			closedir(dp);
-			return -1;
-		}
-		if (strncmp(resolved_path, procmap.fname,
-			strnlen(procmap.fname, PATH_MAX)) == 0) {
-			found = 1;
-			break;
-		}
-	}
-
-	closedir(dp);
-
-	if (found == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to find memory file for pid %d\n",
-			dev->device_fh, pid);
-		return -1;
-	}
-	/* Open the shared memory file and map the memory into this process. */
-	fd = open(memfile, O_RDWR);
-
-	if (fd == -1) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to open %s for pid %d\n",
-			dev->device_fh, memfile, pid);
-		return -1;
-	}
-
-	map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE,
-		MAP_POPULATE|MAP_SHARED, fd, 0);
-	close(fd);
-
-	if (map == MAP_FAILED) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Error mapping the file %s for pid %d\n",
-			dev->device_fh, memfile, pid);
-		return -1;
-	}
-
-	/* Store the memory address and size in the device data structure */
-	mem->mapped_address = (uint64_t)(uintptr_t)map;
-	mem->mapped_size = procmap.len;
-
-	LOG_DEBUG(VHOST_CONFIG,
-		"(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n",
-		dev->device_fh,
-		memfile, resolved_path,
-		(unsigned long long)mem->mapped_size, map);
-
-	return 0;
-}
 
 /*
  * Retrieves an entry from the devices configuration linked list.
@@ -329,7 +123,7 @@ get_config_ll_entry(struct vhost_device_ctx ctx)
  * Searches the configuration core linked list and
  * retrieves the device if it exists.
  */
-static struct virtio_net *
+struct virtio_net *
 get_device(struct vhost_device_ctx ctx)
 {
 	struct virtio_net_config_ll *ll_dev;
@@ -647,145 +441,6 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 	return 0;
 }
 
-
-/*
- * Called from CUSE IOCTL: VHOST_SET_MEM_TABLE
- * This function creates and populates the memory structure for the device.
- * This includes storing offsets used to translate buffer addresses.
- */
-static int
-set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr,
-	uint32_t nregions)
-{
-	struct virtio_net *dev;
-	struct vhost_memory_region *mem_regions;
-	struct virtio_memory *mem;
-	uint64_t size = offsetof(struct vhost_memory, regions);
-	uint32_t regionidx, valid_regions;
-
-	dev = get_device(ctx);
-	if (dev == NULL)
-		return -1;
-
-	if (dev->mem) {
-		munmap((void *)(uintptr_t)dev->mem->mapped_address,
-			(size_t)dev->mem->mapped_size);
-		free(dev->mem);
-	}
-
-	/* Malloc the memory structure depending on the number of regions. */
-	mem = calloc(1, sizeof(struct virtio_memory) +
-		(sizeof(struct virtio_memory_regions) * nregions));
-	if (mem == NULL) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for dev->mem.\n",
-			dev->device_fh);
-		return -1;
-	}
-
-	mem->nregions = nregions;
-
-	mem_regions = (void *)(uintptr_t)
-			((uint64_t)(uintptr_t)mem_regions_addr + size);
-
-	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
-		/* Populate the region structure for each region. */
-		mem->regions[regionidx].guest_phys_address =
-			mem_regions[regionidx].guest_phys_addr;
-		mem->regions[regionidx].guest_phys_address_end =
-			mem->regions[regionidx].guest_phys_address +
-			mem_regions[regionidx].memory_size;
-		mem->regions[regionidx].memory_size =
-			mem_regions[regionidx].memory_size;
-		mem->regions[regionidx].userspace_address =
-			mem_regions[regionidx].userspace_addr;
-
-		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE (%"PRIu64")\n", dev->device_fh,
-			regionidx,
-			(void *)(uintptr_t)mem->regions[regionidx].guest_phys_address,
-			(void *)(uintptr_t)mem->regions[regionidx].userspace_address,
-			mem->regions[regionidx].memory_size);
-
-		/*set the base address mapping*/
-		if (mem->regions[regionidx].guest_phys_address == 0x0) {
-			mem->base_address =
-				mem->regions[regionidx].userspace_address;
-			/* Map VM memory file */
-			if (host_memory_map(dev, mem, ctx.pid,
-				mem->base_address) != 0) {
-				free(mem);
-				return -1;
-			}
-		}
-	}
-
-	/* Check that we have a valid base address. */
-	if (mem->base_address == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") Failed to find base address of qemu memory file.\n", dev->device_fh);
-		free(mem);
-		return -1;
-	}
-
-	/*
-	 * Check if all of our regions have valid mappings.
-	 * Usually one does not exist in the QEMU memory file.
-	 */
-	valid_regions = mem->nregions;
-	for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
-		if ((mem->regions[regionidx].userspace_address <
-			mem->base_address) ||
-			(mem->regions[regionidx].userspace_address >
-			(mem->base_address + mem->mapped_size)))
-				valid_regions--;
-	}
-
-	/*
-	 * If a region does not have a valid mapping,
-	 * we rebuild our memory struct to contain only valid entries.
-	 */
-	if (valid_regions != mem->nregions) {
-		LOG_DEBUG(VHOST_CONFIG, "(%"PRIu64") Not all memory regions exist in the QEMU mem file. Re-populating mem structure\n",
-			dev->device_fh);
-
-		/*
-		 * Re-populate the memory structure with only valid regions.
-		 * Invalid regions are over-written with memmove.
-		 */
-		valid_regions = 0;
-
-		for (regionidx = mem->nregions; 0 != regionidx--;) {
-			if ((mem->regions[regionidx].userspace_address <
-				mem->base_address) ||
-				(mem->regions[regionidx].userspace_address >
-				(mem->base_address + mem->mapped_size))) {
-				memmove(&mem->regions[regionidx],
-					&mem->regions[regionidx + 1],
-					sizeof(struct virtio_memory_regions) *
-						valid_regions);
-			} else {
-				valid_regions++;
-			}
-		}
-	}
-	mem->nregions = valid_regions;
-	dev->mem = mem;
-
-	/*
-	 * Calculate the address offset for each region.
-	 * This offset is used to identify the vhost virtual address
-	 * corresponding to a QEMU guest physical address.
-	 */
-	for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-		dev->mem->regions[regionidx].address_offset =
-			dev->mem->regions[regionidx].userspace_address -
-				dev->mem->base_address +
-				dev->mem->mapped_address -
-				dev->mem->regions[regionidx].guest_phys_address;
-
-	}
-	return 0;
-}
-
 /*
  * Called from CUSE IOCTL: VHOST_SET_VRING_NUM
  * The virtio device sends us the size of the descriptor ring.
@@ -1040,8 +695,6 @@ static const struct vhost_net_device_ops vhost_device_ops = {
 	.get_features = get_features,
 	.set_features = set_features,
 
-	.set_mem_table = set_mem_table,
-
 	.set_vring_num = set_vring_num,
 	.set_vring_addr = set_vring_addr,
 	.set_vring_base = set_vring_base,
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
new file mode 100644
index 0000000..75fb57e
--- /dev/null
+++ b/lib/librte_vhost/virtio-net.h
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_NET_H
+#define _VIRTIO_NET_H
+
+#include "vhost-net.h"
+#include "rte_virtio_net.h"
+
+extern struct virtio_net_device_ops const *notify_ops;
+struct virtio_net *get_device(struct vhost_device_ctx ctx);
+
+#endif
-- 
1.9.3


* [dpdk-dev] [PATCH v3 08/11] lib/librte_vhost: add select based event driven processing
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (6 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 07/11] lib/librte_vhost: implement cuse_set_memory_table Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 09/11] lib/librte_vhost: vhost user support Przemyslaw Czesnowicz
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

Add a minimal select()-based fd manager (fd_man) for event-driven processing.
For a more generic event-driven framework, refer to:
	http://libevent.org/
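
For readers who want to see the idea in isolation, here is a minimal sketch of
select()-based dispatch with per-fd callbacks. It is not the code from this
patch: the demo_* names and DEMO_MAX_FDS are made up for illustration, only the
read side is handled, and a single dispatch pass is run instead of an infinite
loop.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

#define DEMO_MAX_FDS 8	/* hypothetical table size for this sketch */

typedef void (*demo_cb)(int fd, void *dat);

struct demo_entry {
	int fd;		/* -1 marks a free slot */
	demo_cb rcb;	/* called when fd is readable */
	void *dat;	/* caller context passed to rcb */
};

static struct demo_entry demo_tbl[DEMO_MAX_FDS];

static void
demo_init(void)
{
	int i;

	for (i = 0; i < DEMO_MAX_FDS; i++)
		demo_tbl[i].fd = -1;
}

/* Register fd in the first free slot; 0 on success, -1 if the table is full. */
static int
demo_add(int fd, demo_cb rcb, void *dat)
{
	int i;

	assert(fd >= 0 && fd < FD_SETSIZE);
	for (i = 0; i < DEMO_MAX_FDS; i++) {
		if (demo_tbl[i].fd == -1) {
			demo_tbl[i].fd = fd;
			demo_tbl[i].rcb = rcb;
			demo_tbl[i].dat = dat;
			return 0;
		}
	}
	return -1;
}

/* One blocking select() pass; returns callbacks invoked, or -1 on error. */
static int
demo_dispatch_once(void)
{
	fd_set rfds;
	int i, maxfd = -1, called = 0;

	FD_ZERO(&rfds);
	for (i = 0; i < DEMO_MAX_FDS; i++) {
		if (demo_tbl[i].fd != -1 && demo_tbl[i].rcb) {
			FD_SET(demo_tbl[i].fd, &rfds);
			if (demo_tbl[i].fd > maxfd)
				maxfd = demo_tbl[i].fd;
		}
	}
	if (maxfd == -1)
		return 0;	/* nothing registered */

	if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
		return -1;	/* rfds is undefined after a failure */

	for (i = 0; i < DEMO_MAX_FDS; i++) {
		if (demo_tbl[i].fd != -1 && demo_tbl[i].rcb &&
		    FD_ISSET(demo_tbl[i].fd, &rfds)) {
			demo_tbl[i].rcb(demo_tbl[i].fd, demo_tbl[i].dat);
			called++;
		}
	}
	return called;
}

/* Read callback: consume one byte and bump the counter in dat. */
static void
demo_on_readable(int fd, void *dat)
{
	char c;

	if (read(fd, &c, 1) == 1)
		(*(int *)dat)++;
}

/* Make a pipe readable, dispatch once, return how often the callback ran. */
static int
demo_run(void)
{
	int p[2], hits = 0;

	if (pipe(p) != 0)
		return -1;
	demo_init();
	if (demo_add(p[0], demo_on_readable, &hits) != 0 ||
	    write(p[1], "x", 1) != 1)
		hits = -1;
	else
		demo_dispatch_once();
	close(p[0]);
	close(p[1]);
	return hits;
}
```

The sketch checks the return value of select() before looking at the fd_set,
since the set contents are unspecified when select() fails (for example on
EINTR).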

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_vhost/vhost_user/fd_man.c | 207 +++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/fd_man.h |  64 +++++++++++
 2 files changed, 271 insertions(+)
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
 create mode 100644 lib/librte_vhost/vhost_user/fd_man.h

diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c
new file mode 100644
index 0000000..929fbc3
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/fd_man.c
@@ -0,0 +1,207 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <rte_log.h>
+
+#include "fd_man.h"
+
+/**
+ * Returns the index in the fdset for a given fd.
+ * If fd is -1, it means to search for a free entry.
+ * @return
+ *   index for the fd, or -1 if fd isn't in the fdset.
+ */
+static int
+fdset_find_fd(struct fdset *pfdset, int fd)
+{
+	int i;
+
+	if (pfdset == NULL)
+		return -1;
+
+	for (i = 0; i < MAX_FDS && pfdset->fd[i].fd != fd; i++)
+		;
+
+	return i == MAX_FDS ? -1 : i;
+}
+
+static int
+fdset_find_free_slot(struct fdset *pfdset)
+{
+	return fdset_find_fd(pfdset, -1);
+}
+
+static void
+fdset_add_fd(struct fdset  *pfdset, int idx, int fd,
+	fd_cb rcb, fd_cb wcb, void *dat)
+{
+	struct fdentry *pfdentry;
+
+	if (pfdset == NULL || idx >= MAX_FDS)
+		return;
+
+	pfdentry = &pfdset->fd[idx];
+	pfdentry->fd = fd;
+	pfdentry->rcb = rcb;
+	pfdentry->wcb = wcb;
+	pfdentry->dat = dat;
+}
+
+/**
+ * Fill the read/write fd_set with the fds in the fdset.
+ * @return
+ *  the highest fd added to the read/write fd_set, or -1 if none was added.
+ */
+static int
+fdset_fill(fd_set *rfset, fd_set *wfset, struct fdset *pfdset)
+{
+	struct fdentry *pfdentry;
+	int i, maxfds = -1;
+	int num = MAX_FDS;
+
+	if (pfdset == NULL)
+		return -1;
+
+	for (i = 0; i < num; i++) {
+		pfdentry = &pfdset->fd[i];
+		if (pfdentry->fd != -1) {
+			int added = 0;
+			if (pfdentry->rcb && rfset) {
+				FD_SET(pfdentry->fd, rfset);
+				added = 1;
+			}
+			if (pfdentry->wcb && wfset) {
+				FD_SET(pfdentry->fd, wfset);
+				added = 1;
+			}
+			if (added)
+				maxfds = pfdentry->fd < maxfds ?
+					maxfds : pfdentry->fd;
+		}
+	}
+	return maxfds;
+}
+
+void
+fdset_init(struct fdset *pfdset)
+{
+	int i;
+
+	if (pfdset == NULL)
+		return;
+
+	for (i = 0; i < MAX_FDS; i++)
+		pfdset->fd[i].fd = -1;
+	pfdset->num = 0;
+}
+
+/**
+ * Register the fd in the fdset with read/write handler and context.
+ */
+int
+fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
+{
+	int i;
+
+	if (pfdset == NULL || fd == -1)
+		return -1;
+
+	/* Find a free slot in the list. */
+	i = fdset_find_free_slot(pfdset);
+	if (i == -1)
+		return -2;
+
+	fdset_add_fd(pfdset, i, fd, rcb, wcb, dat);
+	pfdset->num++;
+
+	return 0;
+}
+
+/**
+ *  Unregister the fd from the fdset.
+ */
+void
+fdset_del(struct fdset *pfdset, int fd)
+{
+	int i;
+
+	i = fdset_find_fd(pfdset, fd);
+	if (i != -1 && fd != -1) {
+		pfdset->fd[i].fd = -1;
+		pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL;
+		pfdset->num--;
+	}
+}
+
+/**
+ * This function runs in an infinite blocking loop until there is no fd in
+ * pfdset. It calls the corresponding r/w handler when an fd has an event.
+ */
+void
+fdset_event_dispatch(struct fdset *pfdset)
+{
+	fd_set rfds, wfds;
+	int i, maxfds;
+	struct fdentry *pfdentry;
+	int num = MAX_FDS;
+
+	if (pfdset == NULL)
+		return;
+
+	while (1) {
+		FD_ZERO(&rfds);
+		FD_ZERO(&wfds);
+		maxfds = fdset_fill(&rfds, &wfds, pfdset);
+		if (maxfds == -1)
+			return;
+
+		select(maxfds + 1, &rfds, &wfds, NULL, NULL);
+
+		for (i = 0; i < num; i++) {
+			pfdentry = &pfdset->fd[i];
+			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &rfds) && pfdentry->rcb)
+				pfdentry->rcb(pfdentry->fd, pfdentry->dat);
+			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &wfds) && pfdentry->wcb)
+				pfdentry->wcb(pfdentry->fd, pfdentry->dat);
+		}
+	}
+}
diff --git a/lib/librte_vhost/vhost_user/fd_man.h b/lib/librte_vhost/vhost_user/fd_man.h
new file mode 100644
index 0000000..26b4619
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/fd_man.h
@@ -0,0 +1,64 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _FD_MAN_H_
+#define _FD_MAN_H_
+#include <stdint.h>
+
+#define MAX_FDS 1024
+
+typedef void (*fd_cb)(int fd, void *dat);
+
+struct fdentry {
+	int fd;		/* -1 indicates this entry is empty */
+	fd_cb rcb;	/* callback when this fd is readable. */
+	fd_cb wcb;	/* callback when this fd is writable. */
+	void *dat;	/* fd context */
+};
+
+struct fdset {
+	struct fdentry fd[MAX_FDS];
+	int num;	/* current fd number of this fdset */
+};
+
+
+void fdset_init(struct fdset *pfdset);
+
+int fdset_add(struct fdset *pfdset, int fd,
+	fd_cb rcb, fd_cb wcb, void *dat);
+
+void fdset_del(struct fdset *pfdset, int fd);
+
+void fdset_event_dispatch(struct fdset *pfdset);
+
+#endif
-- 
1.9.3


* [dpdk-dev] [PATCH v3 09/11] lib/librte_vhost: vhost user support
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (7 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 08/11] lib/librte_vhost: add select based event driven processing Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-27  9:42     ` Xie, Huawei
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 10/11] lib/librte_vhost: support dev->ifname for vhost-user Przemyslaw Czesnowicz
                     ` (3 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

In rte_vhost_driver_register(), the vhost unix domain socket listener fd is
created and added to the select()-based polled fdset.

In rte_vhost_driver_session_start(), the fds in the fdset are polled for
events. When there is a new connection from qemu, the accepted connection fd
is added to the polled fdset, so the listener and connection fds are then
checked together. When there is a message on a connection fd, its callback
vserver_message_handler is called to process the vhost-user message.
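
As a rough illustration of the message handling described above, the sketch
below reads a fixed-size header first and then the payload whose length the
header announces. The demo_hdr layout, DEMO_MAX_PAYLOAD and the demo_* helpers
are assumptions made for this example only; the real message layout is
VhostUserMsg in vhost-net-user.h, and a socketpair stands in for the qemu
connection.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire header: request id, flags, then payload length. */
struct demo_hdr {
	uint32_t request;
	uint32_t flags;
	uint32_t size;
};

#define DEMO_MAX_PAYLOAD 64	/* hypothetical upper bound on payload bytes */

/* Read exactly len bytes; 0 on success, -1 on error or early EOF. */
static int
demo_read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Read header, validate size, read payload; payload length or -1. */
static int
demo_read_msg(int fd, struct demo_hdr *hdr, char *payload)
{
	if (demo_read_full(fd, hdr, sizeof(*hdr)) != 0)
		return -1;
	if (hdr->size > DEMO_MAX_PAYLOAD)
		return -1;	/* reject before touching the buffer */
	if (hdr->size && demo_read_full(fd, payload, hdr->size) != 0)
		return -1;
	return (int)hdr->size;
}

/* Send one framed message over a socketpair and parse it back. */
static int
demo_framing(void)
{
	struct demo_hdr out = { .request = 1, .flags = 0x1, .size = 8 }, in;
	uint64_t features = 0x1234, got = 0;
	char payload[DEMO_MAX_PAYLOAD];
	int sv[2], n = -1;

	assert(sizeof(features) <= DEMO_MAX_PAYLOAD);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	if (write(sv[0], &out, sizeof(out)) == (ssize_t)sizeof(out) &&
	    write(sv[0], &features, sizeof(features)) ==
			(ssize_t)sizeof(features))
		n = demo_read_msg(sv[1], &in, payload);
	if (n == (int)sizeof(got))
		memcpy(&got, payload, sizeof(got));

	close(sv[0]);
	close(sv[1]);
	return (n == 8 && in.request == 1 && got == features) ? 0 : -1;
}
```

Validating the announced size before the second read is the important step:
the length comes from the peer and must not be trusted to fit the buffer.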

To identify which virtio device belongs to which guest VM, we can call
rte_vhost_driver_register() with a different socket path per VM. Virtio devices
from the same VM will connect to the VM-specific socket. The socket path
information is stored in the virtio_net structure.
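
Some vhost-user messages carry file descriptors (for guest memory regions and
the kick/call eventfds) as SCM_RIGHTS ancillary data on the unix domain
socket. The following sketch shows that mechanism on its own, assuming a
socketpair in place of the qemu connection and a pipe in place of an eventfd;
the demo_* function names are hypothetical and not part of this patch.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one fd. */
union demo_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one data byte with fd attached as SCM_RIGHTS; 0 on success. */
static int
demo_send_fd(int sock, int fd)
{
	char byte = 'F';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_ctl u;
	struct msghdr msgh;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&msgh, 0, sizeof(msgh));
	memset(&u, 0, sizeof(u));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = u.buf;
	msgh.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msgh);
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msgh, 0) == 1 ? 0 : -1;
}

/* Receive one data byte and return the attached fd, or -1. */
static int
demo_recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_ctl u;
	struct msghdr msgh;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = u.buf;
	msgh.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msgh, 0) <= 0)
		return -1;
	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS) {
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}
	return fd;
}

/* Pass a pipe's read end across a socketpair and read through the copy. */
static int
demo_fd_passing(void)
{
	int sv[2], p[2], fd = -1;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	if (pipe(p) != 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	if (demo_send_fd(sv[0], p[0]) == 0)
		fd = demo_recv_fd(sv[1]);
	/* The received fd is a new descriptor for the same pipe read end. */
	if (fd >= 0 && write(p[1], "k", 1) == 1 && read(fd, &c, 1) != 1)
		c = 0;

	if (fd >= 0)
		close(fd);
	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return c == 'k' ? 0 : -1;
}
```

The receiver owns the descriptor it gets from recvmsg() and has to close it
itself, including for requests it does not implement.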

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
---
 lib/librte_vhost/Makefile                     |   8 +-
 lib/librte_vhost/rte_virtio_net.h             |   2 +
 lib/librte_vhost/vhost-net.h                  |   4 +-
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 457 ++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
 lib/librte_vhost/virtio-net.c                 |  15 +-
 8 files changed, 948 insertions(+), 7 deletions(-)
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
 create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 298e4f8..cac943e 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -38,10 +38,14 @@ EXPORT_MAP := rte_vhost_version.map
 
 LIBABIVER := 1
 
-CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 -D_FILE_OFFSET_BITS=64 -lfuse
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64
+CFLAGS += -I vhost_cuse -lfuse
+CFLAGS += -I vhost_user
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c vhost_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c vhost_cuse/eventfd_copy.c
+#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 0bf07c7..46c2072 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -50,6 +50,8 @@
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
 
+#define VHOST_MEMORY_MAX_NREGIONS 8
+
 /* Used to indicate that the device is running on a data core */
 #define VIRTIO_DEV_RUNNING 1
 
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 86b38a5..a56e405 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -41,7 +41,9 @@
 
 #include <rte_log.h>
 
-#define VHOST_MEMORY_MAX_NREGIONS 8
+#include "rte_virtio_net.h"
+
+extern struct vhost_net_device_ops const *ops;
 
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
new file mode 100644
index 0000000..712a82f
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -0,0 +1,457 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <errno.h>
+
+#include <rte_log.h>
+#include <rte_virtio_net.h>
+
+#include "fd_man.h"
+#include "vhost-net-user.h"
+#include "vhost-net.h"
+#include "virtio-net-user.h"
+
+#define MAX_VIRTIO_BACKLOG 128
+static void vserver_new_vq_conn(int fd, void *data);
+static void vserver_message_handler(int fd, void *dat);
+struct vhost_net_device_ops const *ops;
+
+struct connfd_ctx {
+	struct vhost_server *vserver;
+	uint32_t fh;
+};
+
+#define MAX_VHOST_SERVER 1024
+static struct {
+	struct vhost_server *server[MAX_VHOST_SERVER];
+	struct fdset fdset;	/**< The fd list this vhost server manages. */
+} g_vhost_server;
+
+static int vserver_idx;
+
+static const char *vhost_message_str[VHOST_USER_MAX] = {
+	[VHOST_USER_NONE] = "VHOST_USER_NONE",
+	[VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
+	[VHOST_USER_SET_FEATURES] = "VHOST_USER_SET_FEATURES",
+	[VHOST_USER_SET_OWNER] = "VHOST_USER_SET_OWNER",
+	[VHOST_USER_RESET_OWNER] = "VHOST_USER_RESET_OWNER",
+	[VHOST_USER_SET_MEM_TABLE] = "VHOST_USER_SET_MEM_TABLE",
+	[VHOST_USER_SET_LOG_BASE] = "VHOST_USER_SET_LOG_BASE",
+	[VHOST_USER_SET_LOG_FD] = "VHOST_USER_SET_LOG_FD",
+	[VHOST_USER_SET_VRING_NUM] = "VHOST_USER_SET_VRING_NUM",
+	[VHOST_USER_SET_VRING_ADDR] = "VHOST_USER_SET_VRING_ADDR",
+	[VHOST_USER_SET_VRING_BASE] = "VHOST_USER_SET_VRING_BASE",
+	[VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
+	[VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
+	[VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
+	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR"
+};
+
+/**
+ * Create a unix domain socket, bind to path and listen for connection.
+ * @return
+ *  socket fd or -1 on failure
+ */
+static int
+uds_socket(const char *path)
+{
+	struct sockaddr_un un;
+	int sockfd;
+	int ret;
+
+	if (path == NULL)
+		return -1;
+
+	sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
+	if (sockfd < 0)
+		return -1;
+	RTE_LOG(INFO, VHOST_CONFIG, "socket created, fd:%d\n", sockfd);
+
+	memset(&un, 0, sizeof(un));
+	un.sun_family = AF_UNIX;
+	snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);
+	ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+	if (ret == -1)
+		goto err;
+	RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path);
+
+	ret = listen(sockfd, MAX_VIRTIO_BACKLOG);
+	if (ret == -1)
+		goto err;
+
+	return sockfd;
+
+err:
+	close(sockfd);
+	return -1;
+}
+
+/* return number of bytes read on success or negative val on failure. */
+static int
+read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+{
+	struct iovec iov;
+	struct msghdr msgh;
+	size_t fdsize = fd_num * sizeof(int);
+	char control[CMSG_SPACE(fdsize)];
+	struct cmsghdr *cmsg;
+	int ret;
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = buf;
+	iov.iov_len  = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+	msgh.msg_control = control;
+	msgh.msg_controllen = sizeof(control);
+
+	ret = recvmsg(sockfd, &msgh, 0);
+	if (ret <= 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "recvmsg failed\n");
+		return ret;
+	}
+
+	if (msgh.msg_flags & (MSG_TRUNC | MSG_CTRUNC)) {
+		RTE_LOG(ERR, VHOST_CONFIG, "truncated msg\n");
+		return -1;
+	}
+
+	for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg != NULL;
+		cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
+		if ((cmsg->cmsg_level == SOL_SOCKET) &&
+			(cmsg->cmsg_type == SCM_RIGHTS)) {
+			memcpy(fds, CMSG_DATA(cmsg), fdsize);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/* return number of bytes read on success or negative val on failure. */
+static int
+read_vhost_message(int sockfd, struct VhostUserMsg *msg)
+{
+	int ret;
+
+	ret = read_fd_message(sockfd, (char *)msg, VHOST_USER_HDR_SIZE,
+		msg->fds, VHOST_MEMORY_MAX_NREGIONS);
+	if (ret <= 0)
+		return ret;
+
+	if (msg && msg->size) {
+		if (msg->size > sizeof(msg->payload)) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"invalid msg size: %d\n", msg->size);
+			return -1;
+		}
+		ret = read(sockfd, &msg->payload, msg->size);
+		if (ret <= 0)
+			return ret;
+		if (ret != (int)msg->size) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"read control message failed\n");
+			return -1;
+		}
+	}
+
+	return ret;
+}
+
+static int
+send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num)
+{
+
+	struct iovec iov;
+	struct msghdr msgh;
+	size_t fdsize = fd_num * sizeof(int);
+	char control[CMSG_SPACE(fdsize)];
+	struct cmsghdr *cmsg;
+	int ret;
+
+	memset(&msgh, 0, sizeof(msgh));
+	iov.iov_base = buf;
+	iov.iov_len = buflen;
+
+	msgh.msg_iov = &iov;
+	msgh.msg_iovlen = 1;
+
+	if (fds && fd_num > 0) {
+		msgh.msg_control = control;
+		msgh.msg_controllen = sizeof(control);
+		cmsg = CMSG_FIRSTHDR(&msgh);
+		cmsg->cmsg_len = CMSG_LEN(fdsize);
+		cmsg->cmsg_level = SOL_SOCKET;
+		cmsg->cmsg_type = SCM_RIGHTS;
+		memcpy(CMSG_DATA(cmsg), fds, fdsize);
+	} else {
+		msgh.msg_control = NULL;
+		msgh.msg_controllen = 0;
+	}
+
+	do {
+		ret = sendmsg(sockfd, &msgh, 0);
+	} while (ret < 0 && errno == EINTR);
+
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG, "sendmsg error\n");
+		return ret;
+	}
+
+	return ret;
+}
+
+static int
+send_vhost_message(int sockfd, struct VhostUserMsg *msg)
+{
+	int ret;
+
+	if (!msg)
+		return 0;
+
+	msg->flags &= ~VHOST_USER_VERSION_MASK;
+	msg->flags |= VHOST_USER_VERSION;
+	msg->flags |= VHOST_USER_REPLY_MASK;
+
+	ret = send_fd_message(sockfd, (char *)msg,
+		VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
+
+	return ret;
+}
+
+/* callback when there is a new virtio connection. */
+static void
+vserver_new_vq_conn(int fd, void *dat)
+{
+	struct vhost_server *vserver = (struct vhost_server *)dat;
+	int conn_fd;
+	struct connfd_ctx *ctx;
+	int fh;
+	struct vhost_device_ctx vdev_ctx = { 0 };
+
+	conn_fd = accept(fd, NULL, NULL);
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"new virtio connection is %d\n", conn_fd);
+	if (conn_fd < 0)
+		return;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (ctx == NULL) {
+		close(conn_fd);
+		return;
+	}
+
+	fh = ops->new_device(vdev_ctx);
+	if (fh == -1) {
+		free(ctx);
+		close(conn_fd);
+		return;
+	}
+	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", fh);
+
+	ctx->vserver = vserver;
+	ctx->fh = fh;
+	fdset_add(&g_vhost_server.fdset,
+		conn_fd, vserver_message_handler, NULL, ctx);
+}
+
+/* callback when there is a message on the connfd */
+static void
+vserver_message_handler(int connfd, void *dat)
+{
+	struct vhost_device_ctx ctx;
+	struct connfd_ctx *cfd_ctx = (struct connfd_ctx *)dat;
+	struct VhostUserMsg msg;
+	uint64_t features;
+	int ret;
+
+	ctx.fh = cfd_ctx->fh;
+	ret = read_vhost_message(connfd, &msg);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"vhost read message failed\n");
+
+		close(connfd);
+		fdset_del(&g_vhost_server.fdset, connfd);
+		free(cfd_ctx);
+		user_destroy_device(ctx);
+		ops->destroy_device(ctx);
+
+		return;
+	} else if (ret == 0) {
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"vhost peer closed\n");
+
+		close(connfd);
+		fdset_del(&g_vhost_server.fdset, connfd);
+		free(cfd_ctx);
+		user_destroy_device(ctx);
+		ops->destroy_device(ctx);
+
+		return;
+	}
+	if (msg.request >= VHOST_USER_MAX) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"vhost read incorrect message\n");
+
+		close(connfd);
+		fdset_del(&g_vhost_server.fdset, connfd);
+		free(cfd_ctx);
+		user_destroy_device(ctx);
+		ops->destroy_device(ctx);
+
+		return;
+	}
+
+	RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
+		vhost_message_str[msg.request]);
+	switch (msg.request) {
+	case VHOST_USER_GET_FEATURES:
+		ret = ops->get_features(ctx, &features);
+		msg.payload.u64 = features;
+		msg.size = sizeof(msg.payload.u64);
+		send_vhost_message(connfd, &msg);
+		break;
+	case VHOST_USER_SET_FEATURES:
+		features = msg.payload.u64;
+		ops->set_features(ctx, &features);
+		break;
+
+	case VHOST_USER_SET_OWNER:
+		ops->set_owner(ctx);
+		break;
+	case VHOST_USER_RESET_OWNER:
+		ops->reset_owner(ctx);
+		break;
+
+	case VHOST_USER_SET_MEM_TABLE:
+		user_set_mem_table(ctx, &msg);
+		break;
+
+	case VHOST_USER_SET_LOG_BASE:
+		RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
+	case VHOST_USER_SET_LOG_FD:
+		close(msg.fds[0]);
+		RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
+		break;
+
+	case VHOST_USER_SET_VRING_NUM:
+		ops->set_vring_num(ctx, &msg.payload.state);
+		break;
+	case VHOST_USER_SET_VRING_ADDR:
+		ops->set_vring_addr(ctx, &msg.payload.addr);
+		break;
+	case VHOST_USER_SET_VRING_BASE:
+		ops->set_vring_base(ctx, &msg.payload.state);
+		break;
+
+	case VHOST_USER_GET_VRING_BASE:
+		ret = user_get_vring_base(ctx, &msg.payload.state);
+		msg.size = sizeof(msg.payload.state);
+		send_vhost_message(connfd, &msg);
+		break;
+
+	case VHOST_USER_SET_VRING_KICK:
+		user_set_vring_kick(ctx, &msg);
+		break;
+	case VHOST_USER_SET_VRING_CALL:
+		user_set_vring_call(ctx, &msg);
+		break;
+
+	case VHOST_USER_SET_VRING_ERR:
+		if (!(msg.payload.u64 & VHOST_USER_VRING_NOFD_MASK))
+			close(msg.fds[0]);
+		RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
+		break;
+
+	default:
+		break;
+
+	}
+}
+
+
+/**
+ * Create and initialise the vhost server.
+ */
+int
+rte_vhost_driver_register(const char *path)
+{
+	struct vhost_server *vserver;
+
+	if (vserver_idx == 0) {
+		fdset_init(&g_vhost_server.fdset);
+		ops = get_virtio_net_callbacks();
+	}
+	if (vserver_idx == MAX_VHOST_SERVER)
+		return -1;
+
+	vserver = calloc(1, sizeof(struct vhost_server));
+	if (vserver == NULL)
+		return -1;
+
+	unlink(path);
+
+	vserver->listenfd = uds_socket(path);
+	if (vserver->listenfd < 0) {
+		free(vserver);
+		return -1;
+	}
+	vserver->path = path;
+
+	fdset_add(&g_vhost_server.fdset, vserver->listenfd,
+		vserver_new_vq_conn, NULL,
+		vserver);
+
+	g_vhost_server.server[vserver_idx++] = vserver;
+
+	return 0;
+}
+
+
+int
+rte_vhost_driver_session_start(void)
+{
+	fdset_event_dispatch(&g_vhost_server.fdset);
+	return 0;
+}
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
new file mode 100644
index 0000000..1b6be6c
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -0,0 +1,106 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VHOST_NET_USER_H
+#define _VHOST_NET_USER_H
+
+#include <stdint.h>
+#include <linux/vhost.h>
+
+#include "rte_virtio_net.h"
+#include "fd_man.h"
+
+struct vhost_server {
+	const char *path; /**< The path the uds is bound to. */
+	int listenfd;     /**< The listener sockfd. */
+};
+
+/* refer to hw/virtio/vhost-user.c */
+
+typedef enum VhostUserRequest {
+	VHOST_USER_NONE = 0,
+	VHOST_USER_GET_FEATURES = 1,
+	VHOST_USER_SET_FEATURES = 2,
+	VHOST_USER_SET_OWNER = 3,
+	VHOST_USER_RESET_OWNER = 4,
+	VHOST_USER_SET_MEM_TABLE = 5,
+	VHOST_USER_SET_LOG_BASE = 6,
+	VHOST_USER_SET_LOG_FD = 7,
+	VHOST_USER_SET_VRING_NUM = 8,
+	VHOST_USER_SET_VRING_ADDR = 9,
+	VHOST_USER_SET_VRING_BASE = 10,
+	VHOST_USER_GET_VRING_BASE = 11,
+	VHOST_USER_SET_VRING_KICK = 12,
+	VHOST_USER_SET_VRING_CALL = 13,
+	VHOST_USER_SET_VRING_ERR = 14,
+	VHOST_USER_MAX
+} VhostUserRequest;
+
+typedef struct VhostUserMemoryRegion {
+	uint64_t guest_phys_addr;
+	uint64_t memory_size;
+	uint64_t userspace_addr;
+	uint64_t mmap_offset;
+} VhostUserMemoryRegion;
+
+typedef struct VhostUserMemory {
+	uint32_t nregions;
+	uint32_t padding;
+	VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+} VhostUserMemory;
+
+typedef struct VhostUserMsg {
+	VhostUserRequest request;
+
+#define VHOST_USER_VERSION_MASK     0x3
+#define VHOST_USER_REPLY_MASK       (0x1 << 2)
+	uint32_t flags;
+	uint32_t size; /* the following payload size */
+	union {
+#define VHOST_USER_VRING_IDX_MASK   0xff
+#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
+		uint64_t u64;
+		struct vhost_vring_state state;
+		struct vhost_vring_addr addr;
+		VhostUserMemory memory;
+	} payload;
+	int fds[VHOST_MEMORY_MAX_NREGIONS];
+} __attribute((packed)) VhostUserMsg;
+
+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
+
+/* The version of the protocol we support */
+#define VHOST_USER_VERSION    0x1
+
+/*****************************************************************************/
+#endif
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
new file mode 100644
index 0000000..97c5177
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -0,0 +1,314 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+
+#include "virtio-net.h"
+#include "virtio-net-user.h"
+#include "vhost-net-user.h"
+#include "vhost-net.h"
+
+struct orig_region_map {
+	int fd;
+	uint64_t mapped_address;
+	uint64_t mapped_size;
+	uint64_t blksz;
+};
+
+#define orig_region(ptr, nregions) \
+	((struct orig_region_map *)RTE_PTR_ADD((ptr), \
+		sizeof(struct virtio_memory) + \
+		sizeof(struct virtio_memory_regions) * (nregions)))
+
+static uint64_t
+get_blk_size(int fd)
+{
+	struct stat stat;
+
+	fstat(fd, &stat);
+	return (uint64_t)stat.st_blksize;
+}
+
+static void
+free_mem_region(struct virtio_net *dev)
+{
+	struct orig_region_map *region;
+	unsigned int idx;
+	uint64_t alignment;
+
+	if (!dev || !dev->mem)
+		return;
+
+	region = orig_region(dev->mem, dev->mem->nregions);
+	for (idx = 0; idx < dev->mem->nregions; idx++) {
+		if (region[idx].mapped_address) {
+			alignment = region[idx].blksz;
+			munmap((void *)
+				RTE_ALIGN_FLOOR(
+					region[idx].mapped_address, alignment),
+				RTE_ALIGN_CEIL(
+					region[idx].mapped_size, alignment));
+			close(region[idx].fd);
+		}
+	}
+}
+
+int
+user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
+{
+	struct VhostUserMemory memory = pmsg->payload.memory;
+	struct virtio_memory_regions *pregion;
+	uint64_t mapped_address, mapped_size;
+	struct virtio_net *dev;
+	unsigned int idx = 0;
+	struct orig_region_map *pregion_orig;
+	uint64_t alignment;
+
+	/* unmap old memory regions one by one*/
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	if (dev->mem) {
+		free_mem_region(dev);
+		free(dev->mem);
+		dev->mem = NULL;
+	}
+
+	dev->mem = calloc(1,
+		sizeof(struct virtio_memory) +
+		sizeof(struct virtio_memory_regions) * memory.nregions +
+		sizeof(struct orig_region_map) * memory.nregions);
+	if (dev->mem == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") Failed to allocate memory for dev->mem\n",
+			dev->device_fh);
+		return -1;
+	}
+	dev->mem->nregions = memory.nregions;
+
+	pregion_orig = orig_region(dev->mem, memory.nregions);
+	for (idx = 0; idx < memory.nregions; idx++) {
+		pregion = &dev->mem->regions[idx];
+		pregion->guest_phys_address =
+			memory.regions[idx].guest_phys_addr;
+		pregion->guest_phys_address_end =
+			memory.regions[idx].guest_phys_addr +
+			memory.regions[idx].memory_size;
+		pregion->memory_size =
+			memory.regions[idx].memory_size;
+		pregion->userspace_address =
+			memory.regions[idx].userspace_addr;
+
+		/* This is ugly */
+		mapped_size = memory.regions[idx].memory_size +
+			memory.regions[idx].mmap_offset;
+		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
+			mapped_size,
+			PROT_READ | PROT_WRITE, MAP_SHARED,
+			pmsg->fds[idx],
+			0);
+
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"mapped region %d fd:%d to %p sz:0x%"PRIx64" off:0x%"PRIx64"\n",
+			idx, pmsg->fds[idx], (void *)mapped_address,
+			mapped_size, memory.regions[idx].mmap_offset);
+
+		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"mmap qemu guest failed.\n");
+			goto err_mmap;
+		}
+
+		pregion_orig[idx].mapped_address = mapped_address;
+		pregion_orig[idx].mapped_size = mapped_size;
+		pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]);
+		pregion_orig[idx].fd = pmsg->fds[idx];
+
+		mapped_address +=  memory.regions[idx].mmap_offset;
+
+		pregion->address_offset = mapped_address -
+			pregion->guest_phys_address;
+
+		if (memory.regions[idx].guest_phys_addr == 0) {
+			dev->mem->base_address =
+				memory.regions[idx].userspace_addr;
+			dev->mem->mapped_address =
+				pregion->address_offset;
+		}
+
+		LOG_DEBUG(VHOST_CONFIG,
+			"REGION: %u GPA: %p QEMU VA: %p SIZE (%"PRIu64")\n",
+			idx,
+			(void *)(uintptr_t)pregion->guest_phys_address,
+			(void *)(uintptr_t)pregion->userspace_address,
+			 pregion->memory_size);
+	}
+
+	return 0;
+
+err_mmap:
+	while (idx--) {
+		alignment = pregion_orig[idx].blksz;
+		munmap((void *)RTE_ALIGN_FLOOR(
+			pregion_orig[idx].mapped_address, alignment),
+			RTE_ALIGN_CEIL(pregion_orig[idx].mapped_size,
+					alignment));
+		close(pregion_orig[idx].fd);
+	}
+	free(dev->mem);
+	dev->mem = NULL;
+	return -1;
+}
+
+static int
+virtio_is_ready(struct virtio_net *dev)
+{
+	struct vhost_virtqueue *rvq, *tvq;
+
+	/* mq support in future.*/
+	rvq = dev->virtqueue[VIRTIO_RXQ];
+	tvq = dev->virtqueue[VIRTIO_TXQ];
+	if (rvq && tvq && rvq->desc && tvq->desc &&
+		(rvq->kickfd != (eventfd_t)-1) &&
+		(rvq->callfd != (eventfd_t)-1) &&
+		(tvq->kickfd != (eventfd_t)-1) &&
+		(tvq->callfd != (eventfd_t)-1)) {
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"virtio is now ready for processing.\n");
+		return 1;
+	}
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"virtio isn't ready for processing.\n");
+	return 0;
+}
+
+void
+user_set_vring_call(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
+{
+	struct vhost_vring_file file;
+
+	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
+	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
+		file.fd = -1;
+	else
+		file.fd = pmsg->fds[0];
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"vring call idx:%d file:%d\n", file.index, file.fd);
+	ops->set_vring_call(ctx, &file);
+}
+
+
+/*
+ *  In vhost-user, when we receive a kick message, we test whether the
+ *  virtio device is ready for packet processing.
+ */
+void
+user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
+{
+	struct vhost_vring_file file;
+	struct virtio_net *dev = get_device(ctx);
+
+	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
+	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
+		file.fd = -1;
+	else
+		file.fd = pmsg->fds[0];
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"vring kick idx:%d file:%d\n", file.index, file.fd);
+	ops->set_vring_kick(ctx, &file);
+
+	if (virtio_is_ready(dev) &&
+		!(dev->flags & VIRTIO_DEV_RUNNING))
+			notify_ops->new_device(dev);
+}
+
+/*
+ * when virtio is stopped, qemu will send us the GET_VRING_BASE message.
+ */
+int
+user_get_vring_base(struct vhost_device_ctx ctx,
+	struct vhost_vring_state *state)
+{
+	struct virtio_net *dev = get_device(ctx);
+
+	/* We have to stop the queue (virtio) if it is running. */
+	if (dev->flags & VIRTIO_DEV_RUNNING)
+		notify_ops->destroy_device(dev);
+
+	/* Here we are safe to get the last used index */
+	ops->get_vring_base(ctx, state->index, state);
+
+	RTE_LOG(INFO, VHOST_CONFIG,
+		"vring base idx:%d num:%d\n", state->index, state->num);
+	/*
+	 * Based on current qemu vhost-user implementation, this message is
+	 * sent and only sent in vhost_vring_stop.
+	 * TODO: clean up the vring; it isn't usable from this point on.
+	 */
+	if (((int)dev->virtqueue[VIRTIO_RXQ]->callfd) >= 0) {
+		close(dev->virtqueue[VIRTIO_RXQ]->callfd);
+		dev->virtqueue[VIRTIO_RXQ]->callfd = (eventfd_t)-1;
+	}
+	if (((int)dev->virtqueue[VIRTIO_TXQ]->callfd) >= 0) {
+		close(dev->virtqueue[VIRTIO_TXQ]->callfd);
+		dev->virtqueue[VIRTIO_TXQ]->callfd = (eventfd_t)-1;
+	}
+
+	return 0;
+}
+
+void
+user_destroy_device(struct vhost_device_ctx ctx)
+{
+	struct virtio_net *dev = get_device(ctx);
+
+	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
+		notify_ops->destroy_device(dev);
+
+	if (dev && dev->mem) {
+		free_mem_region(dev);
+		free(dev->mem);
+		dev->mem = NULL;
+	}
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
new file mode 100644
index 0000000..df24860
--- /dev/null
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _VIRTIO_NET_USER_H
+#define _VIRTIO_NET_USER_H
+
+#include "vhost-net.h"
+#include "vhost-net-user.h"
+
+int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);
+
+void user_set_vring_call(struct vhost_device_ctx, struct VhostUserMsg *);
+
+void user_set_vring_kick(struct vhost_device_ctx, struct VhostUserMsg *);
+
+int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);
+
+void user_destroy_device(struct vhost_device_ctx);
+#endif
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 3e73a35..9dea69c 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -92,8 +92,9 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
 		if ((qemu_va >= region->userspace_address) &&
 			(qemu_va <= region->userspace_address +
 			region->memory_size)) {
-			vhost_va = dev->mem->mapped_address + qemu_va -
-					dev->mem->base_address;
+			vhost_va = qemu_va + region->guest_phys_address +
+				region->address_offset -
+				region->userspace_address;
 			break;
 		}
 	}
@@ -260,6 +261,11 @@ init_device(struct virtio_net *dev)
 	memset(dev->virtqueue[VIRTIO_RXQ], 0, sizeof(struct vhost_virtqueue));
 	memset(dev->virtqueue[VIRTIO_TXQ], 0, sizeof(struct vhost_virtqueue));
 
+	dev->virtqueue[VIRTIO_RXQ]->kickfd = (eventfd_t)-1;
+	dev->virtqueue[VIRTIO_RXQ]->callfd = (eventfd_t)-1;
+	dev->virtqueue[VIRTIO_TXQ]->kickfd = (eventfd_t)-1;
+	dev->virtqueue[VIRTIO_TXQ]->callfd = (eventfd_t)-1;
+
 	/* Backends are set to -1 indicating an inactive device. */
 	dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
 	dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
@@ -576,7 +582,7 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
 
-	if (vq->kickfd)
+	if ((int)vq->kickfd >= 0)
 		close((int)vq->kickfd);
 
 	vq->kickfd = file->fd;
@@ -602,8 +608,9 @@ set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
 
-	if (vq->callfd)
+	if ((int)vq->callfd >= 0)
 		close((int)vq->callfd);
+
 	vq->callfd = file->fd;
 
 	return 0;
-- 
1.9.3
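
A minimal sketch of the address arithmetic used by user_set_mem_table and
qva_to_vva in this patch, assuming a simplified region struct. The struct
and the helper names region_set_mapping and gpa_to_vva are hypothetical and
not part of librte_vhost; only the two formulas are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct virtio_memory_regions. */
struct region {
	uint64_t guest_phys_address;	/* GPA of the region start */
	uint64_t memory_size;		/* region length in bytes */
	uint64_t userspace_address;	/* QEMU VA of the region start */
	uint64_t address_offset;	/* vhost VA minus GPA */
};

/*
 * Mirrors the bookkeeping in user_set_mem_table: the fd is mapped from
 * file offset 0, so the region itself starts mmap_offset bytes into the
 * mapping. Unsigned wraparound is intended and harmless here.
 */
static void
region_set_mapping(struct region *r, uint64_t mapped_address,
	uint64_t mmap_offset)
{
	r->address_offset = mapped_address + mmap_offset -
		r->guest_phys_address;
}

/* Guest physical address to vhost virtual address. */
static uint64_t
gpa_to_vva(const struct region *r, uint64_t gpa)
{
	assert(gpa >= r->guest_phys_address &&
		gpa - r->guest_phys_address < r->memory_size);
	return gpa + r->address_offset;
}

/* QEMU virtual address to vhost virtual address, as in the new qva_to_vva. */
static uint64_t
qva_to_vva(const struct region *r, uint64_t qva)
{
	return qva + r->guest_phys_address + r->address_offset -
		r->userspace_address;
}
```

Because the offset is stored per region, the translation no longer depends on
a single base_address/mapped_address pair, which seems to be the point of the
qva_to_vva change when QEMU hands over several regions with separate fds.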

* [dpdk-dev] [PATCH v3 10/11] lib/librte_vhost: support dev->ifname for vhost-user
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (8 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 09/11] lib/librte_vhost: vhost user support Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 11/11] lib/librte_vhost: support dynamically registering vhost server Przemyslaw Czesnowicz
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

For vhost-cuse, ifname is the name of the tap device.
For vhost-user, ifname is the path of the unix domain socket.
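
The set_ifname callback added by this patch clamps the length to
sizeof(dev->ifname) and copies with strncpy, so the result is only
NUL-terminated if the destination was already zeroed and the name is shorter
than the buffer. A minimal sketch of a copy that always terminates, assuming
a hypothetical helper copy_ifname that is not part of librte_vhost:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most dst_sz - 1 bytes of src into dst and
 * always NUL-terminate. Returns the number of bytes copied, so the caller
 * can detect truncation by comparing against src_len.
 */
static size_t
copy_ifname(char *dst, size_t dst_sz, const char *src, size_t src_len)
{
	size_t len;

	assert(dst != NULL && src != NULL && dst_sz > 0);

	len = src_len < dst_sz - 1 ? src_len : dst_sz - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```

With a helper of this shape, a long socket path is truncated rather than left
unterminated, and a shorter name written over a longer one leaves no stale
tail visible to readers of the string.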

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
---
 lib/librte_vhost/Makefile                     |  2 +-
 lib/librte_vhost/rte_virtio_net.h             |  3 +-
 lib/librte_vhost/vhost-net.h                  |  3 ++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  |  8 +++-
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 53 ++++++++++++++++++++++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  3 ++
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  7 +++
 lib/librte_vhost/virtio-net.c                 | 63 +++++++++------------------
 8 files changed, 96 insertions(+), 46 deletions(-)

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index cac943e..52f6575 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -43,7 +43,7 @@ CFLAGS += -I vhost_cuse -lfuse
 CFLAGS += -I vhost_user
 LDFLAGS += -lfuse
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c vhost_cuse/eventfd_copy.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c
 #SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c
 
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 46c2072..611a3d4 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -100,7 +100,8 @@ struct virtio_net {
 	uint64_t		features;	/**< Negotiated feature set. */
 	uint64_t		device_fh;	/**< device identifier. */
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
-	char			ifname[IFNAMSIZ];	/**< Name of the tap device. */
+#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
+	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
 	void			*priv;		/**< private context */
 } __rte_cache_aligned;
 
diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index a56e405..0f3f8dc 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -93,6 +93,9 @@ struct vhost_net_device_ops {
 	int (*new_device)(struct vhost_device_ctx);
 	void (*destroy_device)(struct vhost_device_ctx);
 
+	void (*set_ifname)(struct vhost_device_ctx,
+		const char *if_name, unsigned int if_len);
+
 	int (*get_features)(struct vhost_device_ctx, uint64_t *);
 	int (*set_features)(struct vhost_device_ctx, uint64_t *);
 
diff --git a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
index 72609a3..6b68abf 100644
--- a/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
@@ -196,7 +196,13 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
 	case VHOST_NET_SET_BACKEND:
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n", ctx.fh);
-		VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend);
+		if (!in_buf) {
+			VHOST_IOCTL_RETRY(sizeof(file), 0);
+			break;
+		}
+		file = *(const struct vhost_vring_file *)in_buf;
+		result = cuse_set_backend(ctx, &file);
+		fuse_reply_ioctl(req, result, NULL, 0);
 		break;
 
 	case VHOST_GET_FEATURES:
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index adebb54..ae2c3fa 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -43,6 +43,10 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/if_tun.h>
+#include <linux/if.h>
 #include <errno.h>
 
 #include <rte_log.h>
@@ -51,6 +55,7 @@
 #include "vhost-net.h"
 #include "virtio-net-cdev.h"
 #include "virtio-net.h"
+#include "eventfd_copy.h"
 
 /* Line size for reading maps file. */
 static const uint32_t BUFSIZE = PATH_MAX;
@@ -368,3 +373,51 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
 
 	return 0;
 }
+
+/*
+ * Function to get the tap device name from the provided file descriptor and
+ * save it in the device structure.
+ */
+static int
+get_ifname(struct vhost_device_ctx ctx, struct virtio_net *dev, int tap_fd, int pid)
+{
+	int fd_tap;
+	struct ifreq ifr;
+	uint32_t ifr_size;
+	int ret;
+
+	fd_tap = eventfd_copy(tap_fd, pid);
+	if (fd_tap < 0)
+		return -1;
+
+	ret = ioctl(fd_tap, TUNGETIFF, &ifr);
+
+	if (close(fd_tap) < 0)
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") fd close failed\n",
+			dev->device_fh);
+
+	if (ret >= 0) {
+		ifr_size = strnlen(ifr.ifr_name, sizeof(ifr.ifr_name));
+		ops->set_ifname(ctx, ifr.ifr_name, ifr_size);
+	} else
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"(%"PRIu64") TUNGETIFF ioctl failed\n",
+			dev->device_fh);
+
+	return 0;
+}
+
+int cuse_set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
+{
+	struct virtio_net *dev;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return -1;
+
+	if (!(dev->flags & VIRTIO_DEV_RUNNING) && file->fd != VIRTIO_DEV_STOPPED)
+		get_ifname(ctx, dev, file->fd, ctx.pid);
+
+	return ops->set_backend(ctx, file);
+}
diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
index 5ee81b1..eb6b0ba 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
@@ -42,4 +42,7 @@ int
 cuse_set_mem_table(struct vhost_device_ctx ctx,
 	const struct vhost_memory *mem_regions_addr, uint32_t nregions);
 
+int
+cuse_set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *);
+
 #endif
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 712a82f..634a498 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -268,6 +268,7 @@ vserver_new_vq_conn(int fd, void *dat)
 	struct connfd_ctx *ctx;
 	int fh;
 	struct vhost_device_ctx vdev_ctx = { 0 };
+	unsigned int size;
 
 	conn_fd = accept(fd, NULL, NULL);
 	RTE_LOG(INFO, VHOST_CONFIG,
@@ -287,6 +288,12 @@ vserver_new_vq_conn(int fd, void *dat)
 		close(conn_fd);
 		return;
 	}
+
+	vdev_ctx.fh = fh;
+	size = strnlen(vserver->path, PATH_MAX);
+	ops->set_ifname(vdev_ctx, vserver->path,
+		size);
+
 	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", fh);
 
 	ctx->vserver = vserver;
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 9dea69c..20567ff 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -38,12 +38,8 @@
 #include <stdlib.h>
 #include <sys/mman.h>
 #include <unistd.h>
-#include <sys/ioctl.h>
-
 
 #include <sys/socket.h>
-#include <linux/if_tun.h>
-#include <linux/if.h>
 
 #include <rte_ethdev.h>
 #include <rte_log.h>
@@ -51,7 +47,6 @@
 #include <rte_memory.h>
 #include <rte_virtio_net.h>
 
-#include "vhost_cuse/eventfd_copy.h"
 #include "vhost-net.h"
 #include "virtio-net.h"
 
@@ -357,6 +352,24 @@ destroy_device(struct vhost_device_ctx ctx)
 	}
 }
 
+static void
+set_ifname(struct vhost_device_ctx ctx,
+	const char *if_name, unsigned int if_len)
+{
+	struct virtio_net *dev;
+	unsigned int len;
+
+	dev = get_device(ctx);
+	if (dev == NULL)
+		return;
+
+	len = if_len > sizeof(dev->ifname) ?
+		sizeof(dev->ifname) : if_len;
+
+	strncpy(dev->ifname, if_name, len);
+}
+
+
 /*
  * Called from CUSE IOCTL: VHOST_SET_OWNER
  * This function just returns success at the moment unless
@@ -617,43 +630,6 @@ set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 }
 
 /*
- * Function to get the tap device name from the provided file descriptor and
- * save it in the device structure.
- */
-static int
-get_ifname(struct virtio_net *dev, int tap_fd, int pid)
-{
-	int fd_tap;
-	struct ifreq ifr;
-	uint32_t size, ifr_size;
-	int ret;
-
-    fd_tap = eventfd_copy(tap_fd, pid);
-    if (fd_tap < 0)
-        return -1;
-
-	ret = ioctl(fd_tap, TUNGETIFF, &ifr);
-
-	if (close(fd_tap) < 0)
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") fd close failed\n",
-			dev->device_fh);
-
-	if (ret >= 0) {
-		ifr_size = strnlen(ifr.ifr_name, sizeof(ifr.ifr_name));
-		size = ifr_size > sizeof(dev->ifname) ?
-				sizeof(dev->ifname) : ifr_size;
-
-		strncpy(dev->ifname, ifr.ifr_name, size);
-	} else
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") TUNGETIFF ioctl failed\n",
-			dev->device_fh);
-
-	return 0;
-}
-
-/*
  * Called from CUSE IOCTL: VHOST_NET_SET_BACKEND
  * To complete device initialisation when the virtio driver is loaded,
  * we are provided with a valid fd for a tap device (not used by us).
@@ -681,7 +657,6 @@ set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 	if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
 		if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
 			((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
-			get_ifname(dev, file->fd, ctx.pid);
 			return notify_ops->new_device(dev);
 		}
 	/* Otherwise we remove it. */
@@ -699,6 +674,8 @@ static const struct vhost_net_device_ops vhost_device_ops = {
 	.new_device = new_device,
 	.destroy_device = destroy_device,
 
+	.set_ifname = set_ifname,
+
 	.get_features = get_features,
 	.set_features = set_features,
 
-- 
1.9.3

* [dpdk-dev] [PATCH v3 11/11] lib/librte_vhost: support dynamically registering vhost server
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (9 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 10/11] lib/librte_vhost: support dev->ifname for vhost-user Przemyslaw Czesnowicz
@ 2015-02-23 17:36   ` Przemyslaw Czesnowicz
  2015-02-24  0:41   ` [dpdk-dev] [PATCH v3 00/11] qemu vhost-user support Thomas Monjalon
  2015-02-25  5:55   ` Xie, Huawei
  12 siblings, 0 replies; 41+ messages in thread
From: Przemyslaw Czesnowicz @ 2015-02-23 17:36 UTC (permalink / raw)
  To: dev

From: "Xie, Huawei" <huawei.xie@intel.com>

* support calling rte_vhost_driver_register after rte_vhost_driver_session_start
* add a mutex to protect fdset from concurrent access
* add a busy flag in fdentry. This flag is set before the cb is called and
cleared after the cb has finished.

mutex lock scenario in vhost:

* event_dispatch (in rte_vhost_driver_session_start) runs in a separate thread,
processing vhost messages in an infinite loop through cb (callback).
* event_dispatch acquires the mutex, gets the cb and its context, sets the busy
flag, and releases the mutex.
* the vserver_new_vq_conn cb calls fdset_add, which acquires the mutex and adds
the new fd to fdset.
* the vserver_message_handler cb frees the data context and sets the remove flag
to request deletion of connfd (connection fd) from fdset.
* after the cb returns, event_dispatch
  1. clears the busy flag.
  2. if there is a remove request, calls fdset_del, which acquires the mutex,
checks the busy flag, and removes connfd from fdset.
* rte_vhost_driver_unregister (not implemented) runs in another thread, acquires
the mutex, and calls fdset_del to remove the fd (listenerfd) from fdset. It can
then free the data context.

The above steps ensure that the fd data context isn't freed while a cb is using it.

VM(s) should have been shut down before rte_vhost_driver_unregister is called.
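
The locking scheme above can be sketched roughly as follows. This is a
simplified illustration, not the code in fd_man.c: all sketch_* names and
SKETCH_MAX_FDS are hypothetical, select() is left out so every registered fd
is treated as readable, and the busy flag is cleared under the mutex, which
is an assumption of this sketch rather than what the patch does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_FDS 8	/* hypothetical; the patch uses MAX_FDS */
typedef void (*sketch_cb)(int fd, void *dat, int *remove);

struct sketch_entry {
	int fd;		/* -1 marks a free slot */
	sketch_cb rcb;	/* read callback */
	void *dat;	/* callback context */
	int busy;	/* set while rcb is running */
};

struct sketch_fdset {
	struct sketch_entry fd[SKETCH_MAX_FDS];
	pthread_mutex_t lock;
};

static void sketch_init(struct sketch_fdset *s)
{
	int i;
	pthread_mutex_init(&s->lock, NULL);
	for (i = 0; i < SKETCH_MAX_FDS; i++)
		s->fd[i] = (struct sketch_entry){ -1, NULL, NULL, 0 };
}

/* Returns 0 on success, -1 if fd is invalid or no slot is free. */
static int sketch_add(struct sketch_fdset *s, int fd, sketch_cb rcb, void *dat)
{
	int i, ret = -1;
	pthread_mutex_lock(&s->lock);
	for (i = 0; fd >= 0 && i < SKETCH_MAX_FDS; i++) {
		if (s->fd[i].fd != -1)
			continue;
		s->fd[i] = (struct sketch_entry){ fd, rcb, dat, 0 };
		ret = 0;
		break;
	}
	pthread_mutex_unlock(&s->lock);	/* single exit: lock always dropped */
	return ret;
}

/* Removes fd, retrying for as long as its callback is still running. */
static void sketch_del(struct sketch_fdset *s, int fd)
{
	int i, retry;
	do {
		retry = 0;
		pthread_mutex_lock(&s->lock);
		for (i = 0; fd >= 0 && i < SKETCH_MAX_FDS; i++) {
			if (s->fd[i].fd != fd)
				continue;
			if (s->fd[i].busy)
				retry = 1;
			else
				s->fd[i].fd = -1;
			break;
		}
		pthread_mutex_unlock(&s->lock);
	} while (retry);
}

/* One pass over all slots; callbacks run without the lock held. */
static void sketch_dispatch_once(struct sketch_fdset *s)
{
	int i;
	for (i = 0; i < SKETCH_MAX_FDS; i++) {
		struct sketch_entry e;
		int remove = 0;

		pthread_mutex_lock(&s->lock);
		e = s->fd[i];		/* snapshot cb and context */
		if (e.fd >= 0)
			s->fd[i].busy = 1;
		pthread_mutex_unlock(&s->lock);
		if (e.fd < 0)
			continue;
		if (e.rcb)
			e.rcb(e.fd, e.dat, &remove);
		pthread_mutex_lock(&s->lock);
		s->fd[i].busy = 0;
		pthread_mutex_unlock(&s->lock);
		if (remove)		/* deferred: never delete inside the cb */
			sketch_del(s, e.fd);
	}
}

/* Example callback: counts invocations and asks to be removed. */
static void sketch_count_and_remove(int fd, void *dat, int *remove)
{
	(void)fd;
	assert(dat != NULL);
	(*(int *)dat)++;
	*remove = 1;
}
```

The remove flag exists because a callback that called the delete function
directly would spin forever on its own busy flag; deferring the delete until
the flag is cleared avoids that.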

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_vhost/vhost_user/fd_man.c         | 63 +++++++++++++++++++++++++---
 lib/librte_vhost/vhost_user/fd_man.h         |  5 ++-
 lib/librte_vhost/vhost_user/vhost-net-user.c | 34 +++++++++------
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c
index 929fbc3..63ac4df 100644
--- a/lib/librte_vhost/vhost_user/fd_man.c
+++ b/lib/librte_vhost/vhost_user/fd_man.c
@@ -40,6 +40,7 @@
 #include <sys/types.h>
 #include <unistd.h>
 
+#include <rte_common.h>
 #include <rte_log.h>
 
 #include "fd_man.h"
@@ -145,6 +146,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
 	if (pfdset == NULL || fd == -1)
 		return -1;
 
+	pthread_mutex_lock(&pfdset->fd_mutex);
+
 	/* Find a free slot in the list. */
 	i = fdset_find_free_slot(pfdset);
 	if (i == -1)
@@ -153,6 +156,8 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
 	fdset_add_fd(pfdset, i, fd, rcb, wcb, dat);
 	pfdset->num++;
 
+	pthread_mutex_unlock(&pfdset->fd_mutex);
+
 	return 0;
 }
 
@@ -164,17 +169,36 @@ fdset_del(struct fdset *pfdset, int fd)
 {
 	int i;
 
+	if (pfdset == NULL || fd == -1)
+		return;
+
+again:
+	pthread_mutex_lock(&pfdset->fd_mutex);
+
 	i = fdset_find_fd(pfdset, fd);
 	if (i != -1 && fd != -1) {
+		/* busy indicates r/wcb is executing! */
+		if (pfdset->fd[i].busy == 1) {
+			pthread_mutex_unlock(&pfdset->fd_mutex);
+			goto again;
+		}
+
 		pfdset->fd[i].fd = -1;
 		pfdset->fd[i].rcb = pfdset->fd[i].wcb = NULL;
 		pfdset->num--;
 	}
+
+	pthread_mutex_unlock(&pfdset->fd_mutex);
 }
 
 /**
  * This functions runs in infinite blocking loop until there is no fd in
  * pfdset. It calls corresponding r/w handler if there is event on the fd.
+ *
+ * Before a callback is called, we set the busy flag. If another thread
+ * (currently only rte_vhost_driver_unregister) calls fdset_del concurrently,
+ * it waits until the flag is reset to zero (which indicates the callback has
+ * finished); it can then free the context after fdset_del returns.
  */
 void
 fdset_event_dispatch(struct fdset *pfdset)
@@ -183,6 +207,10 @@ fdset_event_dispatch(struct fdset *pfdset)
 	int i, maxfds;
 	struct fdentry *pfdentry;
 	int num = MAX_FDS;
+	fd_cb rcb, wcb;
+	void *dat;
+	int fd;
+	int remove1, remove2;
 
 	if (pfdset == NULL)
 		return;
@@ -190,18 +218,41 @@ fdset_event_dispatch(struct fdset *pfdset)
 	while (1) {
 		FD_ZERO(&rfds);
 		FD_ZERO(&wfds);
+		pthread_mutex_lock(&pfdset->fd_mutex);
+
 		maxfds = fdset_fill(&rfds, &wfds, pfdset);
-		if (maxfds == -1)
-			return;
+		if (maxfds == -1) {
+			pthread_mutex_unlock(&pfdset->fd_mutex);
+			sleep(1);
+			continue;
+		}
+
+		pthread_mutex_unlock(&pfdset->fd_mutex);
 
 		select(maxfds + 1, &rfds, &wfds, NULL, NULL);
 
 		for (i = 0; i < num; i++) {
+			remove1 = remove2 = 0;
+			pthread_mutex_lock(&pfdset->fd_mutex);
 			pfdentry = &pfdset->fd[i];
-			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &rfds) && pfdentry->rcb)
-				pfdentry->rcb(pfdentry->fd, pfdentry->dat);
-			if (pfdentry->fd >= 0 && FD_ISSET(pfdentry->fd, &wfds) && pfdentry->wcb)
-				pfdentry->wcb(pfdentry->fd, pfdentry->dat);
+			fd = pfdentry->fd;
+			rcb = pfdentry->rcb;
+			wcb = pfdentry->wcb;
+			dat = pfdentry->dat;
+			pfdentry->busy = 1;
+			pthread_mutex_unlock(&pfdset->fd_mutex);
+			if (fd >= 0 && FD_ISSET(fd, &rfds) && rcb)
+				rcb(fd, dat, &remove1);
+			if (fd >= 0 && FD_ISSET(fd, &wfds) && wcb)
+				wcb(fd, dat, &remove2);
+			pfdentry->busy = 0;
+			/*
+			 * fdset_del needs to check busy flag.
+			 * We don't allow fdset_del to be called in callback
+			 * directly.
+			 */
+			if (remove1 || remove2)
+				fdset_del(pfdset, fd);
 		}
 	}
 }
diff --git a/lib/librte_vhost/vhost_user/fd_man.h b/lib/librte_vhost/vhost_user/fd_man.h
index 26b4619..74ecde2 100644
--- a/lib/librte_vhost/vhost_user/fd_man.h
+++ b/lib/librte_vhost/vhost_user/fd_man.h
@@ -34,20 +34,23 @@
 #ifndef _FD_MAN_H_
 #define _FD_MAN_H_
 #include <stdint.h>
+#include <pthread.h>
 
 #define MAX_FDS 1024
 
-typedef void (*fd_cb)(int fd, void *dat);
+typedef void (*fd_cb)(int fd, void *dat, int *remove);
 
 struct fdentry {
 	int fd;		/* -1 indicates this entry is empty */
 	fd_cb rcb;	/* callback when this fd is readable. */
 	fd_cb wcb;	/* callback when this fd is writeable.*/
 	void *dat;	/* fd context */
+	int busy;	/* whether this entry is being used in cb. */
 };
 
 struct fdset {
 	struct fdentry fd[MAX_FDS];
+	pthread_mutex_t fd_mutex;
 	int num;	/* current fd number of this fdset */
 };
 
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 634a498..3aa9436 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -41,6 +41,7 @@
 #include <sys/socket.h>
 #include <sys/un.h>
 #include <errno.h>
+#include <pthread.h>
 
 #include <rte_log.h>
 #include <rte_virtio_net.h>
@@ -51,8 +52,9 @@
 #include "virtio-net-user.h"
 
 #define MAX_VIRTIO_BACKLOG 128
-static void vserver_new_vq_conn(int fd, void *data);
-static void vserver_message_handler(int fd, void *dat);
+
+static void vserver_new_vq_conn(int fd, void *data, int *remove);
+static void vserver_message_handler(int fd, void *dat, int *remove);
 struct vhost_net_device_ops const *ops;
 
 struct connfd_ctx {
@@ -61,10 +63,18 @@ struct connfd_ctx {
 };
 
 #define MAX_VHOST_SERVER 1024
-static struct {
+struct _vhost_server {
 	struct vhost_server *server[MAX_VHOST_SERVER];
-	struct fdset fdset;	/**< The fd list this vhost server manages. */
-} g_vhost_server;
+	struct fdset fdset;
+};
+
+static struct _vhost_server g_vhost_server = {
+	.fdset = {
+		.fd = { [0 ... MAX_FDS - 1] = {-1, NULL, NULL, NULL, 0} },
+		.fd_mutex = PTHREAD_MUTEX_INITIALIZER,
+		.num = 0
+	},
+};
 
 static int vserver_idx;
 
@@ -261,7 +271,7 @@ send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 
 /* call back when there is new virtio connection.  */
 static void
-vserver_new_vq_conn(int fd, void *dat)
+vserver_new_vq_conn(int fd, void *dat, __rte_unused int *remove)
 {
 	struct vhost_server *vserver = (struct vhost_server *)dat;
 	int conn_fd;
@@ -304,7 +314,7 @@ vserver_new_vq_conn(int fd, void *dat)
 
 /* callback when there is message on the connfd */
 static void
-vserver_message_handler(int connfd, void *dat)
+vserver_message_handler(int connfd, void *dat, int *remove)
 {
 	struct vhost_device_ctx ctx;
 	struct connfd_ctx *cfd_ctx = (struct connfd_ctx *)dat;
@@ -319,7 +329,7 @@ vserver_message_handler(int connfd, void *dat)
 			"vhost read message failed\n");
 
 		close(connfd);
-		fdset_del(&g_vhost_server.fdset, connfd);
+		*remove = 1;
 		free(cfd_ctx);
 		user_destroy_device(ctx);
 		ops->destroy_device(ctx);
@@ -330,7 +340,7 @@ vserver_message_handler(int connfd, void *dat)
 			"vhost peer closed\n");
 
 		close(connfd);
-		fdset_del(&g_vhost_server.fdset, connfd);
+		*remove = 1;
 		free(cfd_ctx);
 		user_destroy_device(ctx);
 		ops->destroy_device(ctx);
@@ -342,7 +352,7 @@ vserver_message_handler(int connfd, void *dat)
 			"vhost read incorrect message\n");
 
 		close(connfd);
-		fdset_del(&g_vhost_server.fdset, connfd);
+		*remove = 1;
 		free(cfd_ctx);
 		user_destroy_device(ctx);
 		ops->destroy_device(ctx);
@@ -426,10 +436,8 @@ rte_vhost_driver_register(const char *path)
 {
 	struct vhost_server *vserver;
 
-	if (vserver_idx == 0) {
-		fdset_init(&g_vhost_server.fdset);
+	if (vserver_idx == 0)
 		ops = get_virtio_net_callbacks();
-	}
 	if (vserver_idx == MAX_VHOST_SERVER)
 		return -1;
 
-- 
1.9.3

* Re: [dpdk-dev] [PATCH v3 00/11] qemu vhost-user support
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (10 preceding siblings ...)
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 11/11] lib/librte_vhost: support dynamically registering vhost server Przemyslaw Czesnowicz
@ 2015-02-24  0:41   ` Thomas Monjalon
  2015-02-25  5:55   ` Xie, Huawei
  12 siblings, 0 replies; 41+ messages in thread
From: Thomas Monjalon @ 2015-02-24  0:41 UTC (permalink / raw)
  To: Przemyslaw Czesnowicz, Xie, Huawei; +Cc: dev

2015-02-23 17:36, Przemyslaw Czesnowicz:
> v3 changes:
>   * move things around to make all patches compile
>   
> 
> Xie, Huawei (11):
>   lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is
>     dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver
>     in guest would crash with only CTRL_RX enabled.
>   lib/librte_vhost: create vhost_cuse directory and move
>     vhost-net-cdev.c into vhost_cuse
>   lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h
>   lib/librte_vhost: move fd copying(from qemu process into vhost
>     process) to eventfd_copy.c
>   lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file
>     virtio-net-cdev.c
>   lib/librte_vhost: make host_memory_map a more generic function.
>   lib/librte_vhost: implement cuse_set_memory_table
>   lib/librte_vhost: add select based event driven processing
>   lib/librte_vhost: vhost user support
>   lib/librte_vhost: support dev->ifname for vhost-user
>   lib/librte_vhost: support dynamically registering vhost server

Applied, thanks

Commit logs don't explain why the changes are made, and there is probably
a lot of cleanup to do, but I prefer having it integrated.


* Re: [dpdk-dev] [PATCH v3 00/11] qemu vhost-user support
  2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
                     ` (11 preceding siblings ...)
  2015-02-24  0:41   ` [dpdk-dev] [PATCH v3 00/11] qemu vhost-user support Thomas Monjalon
@ 2015-02-25  5:55   ` Xie, Huawei
  12 siblings, 0 replies; 41+ messages in thread
From: Xie, Huawei @ 2015-02-25  5:55 UTC (permalink / raw)
  To: Czesnowicz, Przemyslaw, dev

PC:
Thanks a lot for the effort.
During one of the rebases, I moved the eventfd copy into
eventfd_copy.c but forgot to update virtio-net.c, so the series didn't
compile until a later commit.
Sorry for the trouble. I will check that each commit compiles in the
future.

On 2/24/2015 1:36 AM, Przemyslaw Czesnowicz wrote:
> v3 changes:
>   * move things around to make all patches compile
>   
>
> Xie, Huawei (11):
>   lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is
>     dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver
>     in guest would crash with only CTRL_RX enabled.
>   lib/librte_vhost: create vhost_cuse directory and move
>     vhost-net-cdev.c into vhost_cuse
>   lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h
>   lib/librte_vhost: move fd copying(from qemu process into vhost
>     process) to eventfd_copy.c
>   lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file
>     virtio-net-cdev.c
>   lib/librte_vhost: make host_memory_map a more generic function.
>   lib/librte_vhost: implement cuse_set_memory_table
>   lib/librte_vhost: add select based event driven processing
>   lib/librte_vhost: vhost user support
>   lib/librte_vhost: support dev->ifname for vhost-user
>   lib/librte_vhost: support dynamically registering vhost server
>
>  lib/librte_vhost/Makefile                     |   8 +-
>  lib/librte_vhost/rte_virtio_net.h             |   5 +-
>  lib/librte_vhost/vhost-net-cdev.c             | 389 --------------------
>  lib/librte_vhost/vhost-net-cdev.h             | 113 ------
>  lib/librte_vhost/vhost-net.h                  | 118 +++++++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.c    |  88 +++++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.h    |  39 ++
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 417 ++++++++++++++++++++++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 423 ++++++++++++++++++++++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  48 +++
>  lib/librte_vhost/vhost_rxtx.c                 |   2 +-
>  lib/librte_vhost/vhost_user/fd_man.c          | 258 ++++++++++++++
>  lib/librte_vhost/vhost_user/fd_man.h          |  67 ++++
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 472 +++++++++++++++++++++++++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
>  lib/librte_vhost/virtio-net.c                 | 491 ++------------------------
>  lib/librte_vhost/virtio-net.h                 |  43 +++
>  19 files changed, 2491 insertions(+), 959 deletions(-)
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-net.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
>



* Re: [dpdk-dev] [PATCH v3 09/11] lib/librte_vhost: vhost user support
  2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 09/11] lib/librte_vhost: vhost user support Przemyslaw Czesnowicz
@ 2015-02-27  9:42     ` Xie, Huawei
  0 siblings, 0 replies; 41+ messages in thread
From: Xie, Huawei @ 2015-02-27  9:42 UTC (permalink / raw)
  To: Czesnowicz, Przemyslaw, dev

On 2/24/2015 1:36 AM, Czesnowicz, Przemyslaw wrote:
> From: "Xie, Huawei" <huawei.xie@intel.com>
>
> In rte_vhost_driver_register(), vhost unix domain socket listener fd is created
> and added to polled(based on select) fdset.
>
> In rte_vhost_driver_session_start(), fds in the fdset are checked for
> processing. If there is new connection from qemu, connection fd accepted is
> added to polled fdset. The listener and connection fds in the fdset are
> then both checked. When there is message on the connection fd, its
> callback vserver_message_handler is called to process vhost-user messages.
>
> To support identifying which virtio is from which guest VM, we could call
> rte_vhost_driver_register with different socket path. Virtio devices from
> same VM will connect to VM specific socket. The socket path information is
> stored in the virtio_net structure.
>
> Signed-off-by: Huawei Xie <huawei.xie@intel.com>
> Signed-off-by: Przemyslaw Czesnowicz <przemyslaw.czesnowicz@intel.com>
> ---
>  lib/librte_vhost/Makefile                     |   8 +-
>  lib/librte_vhost/rte_virtio_net.h             |   2 +
>  lib/librte_vhost/vhost-net.h                  |   4 +-
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 457 ++++++++++++++++++++++++++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 314 ++++++++++++++++++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
>  lib/librte_vhost/virtio-net.c                 |  15 +-
>  8 files changed, 948 insertions(+), 7 deletions(-)
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index 298e4f8..cac943e 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile

cut lines...

> +
> +int
> +user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
> +{
> +	struct VhostUserMemory memory = pmsg->payload.memory;
> +	struct virtio_memory_regions *pregion;
> +	uint64_t mapped_address, mapped_size;
> +	struct virtio_net *dev;
> +	unsigned int idx = 0;
> +	struct orig_region_map *pregion_orig;
> +	uint64_t alignment;
> +
> +	/* unmap old memory regions one by one*/
> +	dev = get_device(ctx);
> +	if (dev == NULL)
> +		return -1;
> +
> +	if (dev->mem) {
> +		free_mem_region(dev);
> +		free(dev->mem);
> +		dev->mem = NULL;
> +	}
> +
VHOST_SET_MEM_TABLE is sent, and only sent, to the vhost backend in
vhost_dev_start.
In FC21, we don't receive VHOST_GET_VRING_BASE to stop the vhost
device (refer to the later comment below). The change in the guest
virtio driver behind this still needs to be root-caused.
For that case, we could remove the vhost device from the data plane (if
it is there) when we receive this message.
I will submit a fix.
So far everything works fine, but we clearly need a spec that defines
when to stop and start the vhost device.
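
The fix proposed above could look roughly like the sketch below. It is only
a minimal, self-contained model, not the librte_vhost code and not the fix
that was eventually submitted: struct mem_dev, struct mem_table,
MEM_DEV_RUNNING and mem_on_set_mem_table() are hypothetical names invented
for illustration, and stop_count stands in for the call to
notify_ops->destroy_device(). The point it shows is the ordering: take the
device out of the data plane before the old memory table is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the librte_vhost types. */
#define MEM_DEV_RUNNING 0x1u

struct mem_table {
	unsigned int nregions;
};

struct mem_dev {
	unsigned int flags;
	unsigned int stop_count;   /* times the data plane was told to stop */
	struct mem_table *mem;
};

/*
 * Model of handling SET_MEM_TABLE on a device that may still be running.
 * Returns 0 on success, -1 if the new table cannot be allocated.
 */
static int
mem_on_set_mem_table(struct mem_dev *dev, unsigned int nregions)
{
	assert(dev != NULL);

	/* Stop first, so nothing in the data plane uses the old mapping. */
	if (dev->flags & MEM_DEV_RUNNING) {
		dev->flags &= ~MEM_DEV_RUNNING;   /* stands in for destroy_device() */
		dev->stop_count++;
	}

	/* Only now is it safe to drop the old table and install the new one. */
	free(dev->mem);
	dev->mem = calloc(1, sizeof(*dev->mem));
	if (dev->mem == NULL)
		return -1;
	dev->mem->nregions = nregions;
	return 0;
}
```

In this model a second SET_MEM_TABLE on an already stopped device only
replaces the table; the stop path is not taken again.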

> +	dev->mem = calloc(1,
cut lines....

> +}
> +
> +
> +/*
> + *  In vhost-user, when we receive kick message, will test whether virtio
> + *  device is ready for packet processing.
> + */
> +void
> +user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
> +{
> +	struct vhost_vring_file file;
> +	struct virtio_net *dev = get_device(ctx);
> +
> +	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
> +	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
> +		file.fd = -1;
> +	else
> +		file.fd = pmsg->fds[0];
> +	RTE_LOG(INFO, VHOST_CONFIG,
> +		"vring kick idx:%d file:%d\n", file.index, file.fd);
> +	ops->set_vring_kick(ctx, &file);
> +
> +	if (virtio_is_ready(dev) &&
> +		!(dev->flags & VIRTIO_DEV_RUNNING))
> +			notify_ops->new_device(dev);
> +}

When the vhost device is started, qemu calls
vhost_net_start_one->vhost_dev_start->vhost_virtqueue_start.
    1. vhost_dev_start sends VHOST_SET_MEM_TABLE to the vhost backend.
    2. The kick fd is sent, and only sent, to the vhost backend in
vhost_virtqueue_start for each queue, through the VHOST_SET_VRING_KICK message.
    3. vhost_net_start_one sends VHOST_NET_SET_BACKEND to the vhost
backend if the type is TAP.
For vhost-user, as we don't receive the SET_BACKEND message, it makes sense
to use VHOST_SET_VRING_KICK as the start signal for the vhost device.
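
To make the start rule concrete, here is a minimal, self-contained sketch of
"start on kick". It is a model only: struct kick_dev, struct kick_vq,
KICK_DEV_RUNNING and the kick_*() helpers are hypothetical names, and the
readiness rule (every queue has its ring set up and a kick fd >= 0) is an
assumption standing in for virtio_is_ready(). Because the kick fd arrives
once per queue, the device starts on the message that completes the last
queue, and the RUNNING flag keeps later kicks from starting it again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the librte_vhost types. */
#define KICK_NUM_QUEUES  2       /* one RX and one TX queue */
#define KICK_DEV_RUNNING 0x1u

struct kick_vq {
	bool ring_set;   /* vring addresses already received */
	int kickfd;      /* -1 until a kick message delivers an fd */
};

struct kick_dev {
	unsigned int flags;
	unsigned int start_count;   /* times the data plane was told to start */
	struct kick_vq vq[KICK_NUM_QUEUES];
};

static void
kick_dev_init(struct kick_dev *dev)
{
	dev->flags = 0;
	dev->start_count = 0;
	for (int i = 0; i < KICK_NUM_QUEUES; i++) {
		dev->vq[i].ring_set = false;
		dev->vq[i].kickfd = -1;
	}
}

/* Assumed readiness rule: every queue has its ring and a valid kick fd. */
static bool
kick_is_ready(const struct kick_dev *dev)
{
	for (int i = 0; i < KICK_NUM_QUEUES; i++)
		if (!dev->vq[i].ring_set || dev->vq[i].kickfd < 0)
			return false;
	return true;
}

/* Handle one kick message; returns true if this message started the device. */
static bool
kick_on_vring_kick(struct kick_dev *dev, unsigned int idx, int fd)
{
	assert(idx < KICK_NUM_QUEUES);
	dev->vq[idx].kickfd = fd;

	if (kick_is_ready(dev) && !(dev->flags & KICK_DEV_RUNNING)) {
		dev->flags |= KICK_DEV_RUNNING;   /* stands in for new_device() */
		dev->start_count++;
		return true;
	}
	return false;
}
```

With both rings set up, a kick for queue 0 alone does not start the device;
the kick for queue 1 does, and any further kick leaves start_count at 1.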

> +
> +/*
> + * when virtio is stopped, qemu will send us the GET_VRING_BASE message.
> + */
> +int
> +user_get_vring_base(struct vhost_device_ctx ctx,
> +	struct vhost_vring_state *state)
> +{
> +	struct virtio_net *dev = get_device(ctx);
> +
> +	/* We have to stop the queue (virtio) if it is running. */
> +	if (dev->flags & VIRTIO_DEV_RUNNING)
> +		notify_ops->destroy_device(dev);
> +
> +	/* Here we are safe to get the last used index */
> +	ops->get_vring_base(ctx, state->index, state);
> +
> +	RTE_LOG(INFO, VHOST_CONFIG,
> +		"vring base idx:%d file:%d\n", state->index, state->num);
> +	/*
> +	 * Based on current qemu vhost-user implementation, this message is
> +	 * sent and only sent in vhost_vring_stop.
> +	 * TODO: cleanup the vring, it isn't usable since here.
> +	 */
> +	if (((int)dev->virtqueue[VIRTIO_RXQ]->callfd) >= 0) {
> +		close(dev->virtqueue[VIRTIO_RXQ]->callfd);
> +		dev->virtqueue[VIRTIO_RXQ]->callfd = (eventfd_t)-1;
> +	}
> +	if (((int)dev->virtqueue[VIRTIO_TXQ]->callfd) >= 0) {
> +		close(dev->virtqueue[VIRTIO_TXQ]->callfd);
> +		dev->virtqueue[VIRTIO_TXQ]->callfd = (eventfd_t)-1;
> +	}
> +
> +	return 0;
> +}

When the vhost device is to be stopped (mostly when the virtio device status
register is written), qemu calls vhost_net_stop_one->vhost_dev_stop.
1. VHOST_NET_SET_BACKEND is sent to the vhost backend if the type is TAP.
2. VHOST_GET_VRING_BASE is sent in vhost_virtqueue_stop for each virtqueue.
It makes sense to use VHOST_GET_VRING_BASE as the stop signal for the vhost
device.
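
The stop side can be modelled the same way. The sketch below is a
simplified illustration, not the patch code: struct base_dev, struct
base_vq, BASE_DEV_RUNNING and base_on_get_vring_base() are hypothetical
names, stop_count stands in for notify_ops->destroy_device(), and the
sketch only invalidates the call fd of the queue named in the message
instead of calling close() on it. Since the message arrives once per
queue, only the first one actually stops the device; every one returns
that queue's last used index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the librte_vhost types. */
#define BASE_NUM_QUEUES  2
#define BASE_DEV_RUNNING 0x1u

struct base_vq {
	uint16_t last_used_idx;   /* value reported back to qemu */
	int callfd;               /* -1 once the queue is no longer usable */
};

struct base_dev {
	unsigned int flags;
	unsigned int stop_count;  /* times the data plane was told to stop */
	struct base_vq vq[BASE_NUM_QUEUES];
};

/* Handle one GET_VRING_BASE message; returns the queue's last used index. */
static uint16_t
base_on_get_vring_base(struct base_dev *dev, unsigned int idx)
{
	assert(idx < BASE_NUM_QUEUES);

	/* Stop the data plane before reading the index, and only once. */
	if (dev->flags & BASE_DEV_RUNNING) {
		dev->flags &= ~BASE_DEV_RUNNING;   /* stands in for destroy_device() */
		dev->stop_count++;
	}

	/* A real handler would close() the fd; the sketch only invalidates it. */
	dev->vq[idx].callfd = -1;

	return dev->vq[idx].last_used_idx;
}
```

Reading the index after the stop matters: once nothing in the data plane
is updating the ring, the value handed back to qemu cannot go stale.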

> +
> +void
> +user_destroy_device(struct vhost_device_ctx ctx)
> +{
> +	struct virtio_net *dev = get_device(ctx);
> +
> +	if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
> +		notify_ops->destroy_device(dev);
> +
> +	if (dev && dev->mem) {
> +		free_mem_region(dev);
> +		free(dev->mem);
> +		dev->mem = NULL;
> +	}
> +}
> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
> new file mode 100644
> index 0000000..df24860
> --- /dev/null
cut lines...


end of thread, other threads:[~2015-02-27  9:43 UTC | newest]

Thread overview: 41+ messages
2015-02-12  5:07 [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Huawei Xie
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Huawei Xie
2015-02-16  8:15   ` Tetsuya Mukawa
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 02/11] lib/librte_vhost: create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse Huawei Xie
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 03/11] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h Huawei Xie
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 04/11] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c Huawei Xie
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 05/11] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c Huawei Xie
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 06/11] lib/librte_vhost: make host_memory_map a more generic function Huawei Xie
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 07/11] lib/librte_vhost: implement cuse_set_memory_table Huawei Xie
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 08/11] lib/librte_vhost: add select based event driven processing Huawei Xie
2015-02-16 17:10   ` Ananyev, Konstantin
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support Huawei Xie
2015-02-12  8:26   ` Linhaifeng
2015-02-12  9:28     ` Xie, Huawei
2015-02-12 10:19       ` Linhaifeng
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 10/11] lib/librte_vhost: support dev->ifname for vhost-user Huawei Xie
2015-02-12  5:07 ` [dpdk-dev] [PATCH v2 11/11] lib/librte_vhost: support dynamically registering vhost server Huawei Xie
2015-02-16  8:17   ` Tetsuya Mukawa
2015-02-16 17:11   ` Ananyev, Konstantin
2015-02-12  5:26 ` [dpdk-dev] [PATCH v2 00/11] qemu vhost-user support Xie, Huawei
2015-02-16  8:19 ` Tetsuya Mukawa
2015-02-22 18:20   ` Thomas Monjalon
2015-02-23 13:53     ` Czesnowicz, Przemyslaw
2015-02-23 14:08       ` Thomas Monjalon
2015-02-23 14:15         ` Czesnowicz, Przemyslaw
2015-02-23  2:50 ` Tetsuya Mukawa
2015-02-23 17:36 ` [dpdk-dev] [PATCH v3 " Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 01/11] lib/librte_vhost: enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 02/11] lib/librte_vhost: create vhost_cuse directory and move vhost-net-cdev.c into vhost_cuse Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 03/11] lib/librte_vhost: rename vhost-net-cdev.h to vhost-net.h Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 04/11] lib/librte_vhost: move fd copying(from qemu process into vhost process) to eventfd_copy.c Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 05/11] lib/librte_vhost: copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 06/11] lib/librte_vhost: make host_memory_map a more generic function Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 07/11] lib/librte_vhost: implement cuse_set_memory_table Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 08/11] lib/librte_vhost: add select based event driven processing Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 09/11] lib/librte_vhost: vhost user support Przemyslaw Czesnowicz
2015-02-27  9:42     ` Xie, Huawei
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 10/11] lib/librte_vhost: support dev->ifname for vhost-user Przemyslaw Czesnowicz
2015-02-23 17:36   ` [dpdk-dev] [PATCH v3 11/11] lib/librte_vhost: support dynamically registering vhost server Przemyslaw Czesnowicz
2015-02-24  0:41   ` [dpdk-dev] [PATCH v3 00/11] qemu vhost-user support Thomas Monjalon
2015-02-25  5:55   ` Xie, Huawei
