DPDK patches and discussions
* [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table
@ 2018-09-05  4:28 Tiwei Bie
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback Tiwei Bie
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Tiwei Bie @ 2018-09-05  4:28 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov

This series consists of some fixes and improvements for virtio-user's
memory table preparation.

This series supersedes the patches below:
https://patches.dpdk.org/patch/43807/
https://patches.dpdk.org/patch/43918/

The second patch in this series depends on the patch set below:
http://patches.dpdk.org/project/dpdk/list/?series=1177

Tiwei Bie (3):
  net/virtio-user: fix deadlock in memory events callback
  net/virtio-user: avoid parsing process mappings
  net/virtio-user: fix memory hotplug support in vhost-kernel

 drivers/net/virtio/virtio_user/vhost_kernel.c |  50 +++--
 drivers/net/virtio/virtio_user/vhost_user.c   | 211 ++++++++----------
 .../net/virtio/virtio_user/virtio_user_dev.c  |  19 ++
 3 files changed, 135 insertions(+), 145 deletions(-)

-- 
2.18.0

* [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback
  2018-09-05  4:28 [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Tiwei Bie
@ 2018-09-05  4:28 ` Tiwei Bie
  2018-09-05  8:10   ` Sean Harte
                     ` (2 more replies)
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings Tiwei Bie
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 23+ messages in thread
From: Tiwei Bie @ 2018-09-05  4:28 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov, stable

Deadlock can occur when allocating memory if a vhost-kernel
based virtio-user device is in use. To fix the deadlock,
we will take memory hotplug lock explicitly in virtio-user
when necessary, and always call the _thread_unsafe memory
functions.
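
As an illustration of the ordering problem, here is a condensed sketch
(the function name is made up and the details are trimmed; the real
change is in the diff below). The memory subsystem already holds the
hotplug lock when it invokes the memory event callback, and the
callback then takes dev->mutex, so virtio_user_start_device() must
take the two locks in the same order and use only the _thread_unsafe
walks, which take no lock themselves:

static int
virtio_user_start_device_sketch(struct virtio_user_dev *dev)
{
	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;

	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock); /* 1st: hotplug lock */
	pthread_mutex_lock(&dev->mutex);                  /* 2nd: device lock */

	/* ... prepare and send the memory table here ... */

	pthread_mutex_unlock(&dev->mutex);
	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
	return 0;
}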

Bugzilla ID: 81
Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
Cc: stable@dpdk.org

Reported-by: Seán Harte <seanbh@gmail.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/net/virtio/virtio_user/vhost_kernel.c |  6 +++++-
 .../net/virtio/virtio_user/virtio_user_dev.c  | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b2444096c..897fee0af 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -115,7 +115,11 @@ prepare_vhost_memory_kernel(void)
 	wa.region_nr = 0;
 	wa.vm = vm;
 
-	if (rte_memseg_contig_walk(add_memory_region, &wa) < 0) {
+	/*
+	 * The memory lock has already been taken by memory subsystem
+	 * or virtio_user_start_device().
+	 */
+	if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
 		free(vm);
 		return NULL;
 	}
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 7df600b02..869e96f87 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -13,6 +13,8 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 
+#include <rte_eal_memconfig.h>
+
 #include "vhost.h"
 #include "virtio_user_dev.h"
 #include "../virtio_ethdev.h"
@@ -109,9 +111,24 @@ is_vhost_user_by_type(const char *path)
 int
 virtio_user_start_device(struct virtio_user_dev *dev)
 {
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	uint64_t features;
 	int ret;
 
+	/*
+	 * XXX workaround!
+	 *
+	 * We need to make sure that the locks will be
+	 * taken in the correct order to avoid deadlocks.
+	 *
+	 * Before releasing this lock, this thread should
+	 * not trigger any memory hotplug events.
+	 *
+	 * This is a temporary workaround, and should be
+	 * replaced when we get proper supports from the
+	 * memory subsystem in the future.
+	 */
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 	pthread_mutex_lock(&dev->mutex);
 
 	if (is_vhost_user_by_type(dev->path) && dev->vhostfd < 0)
@@ -152,10 +169,12 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 
 	dev->started = true;
 	pthread_mutex_unlock(&dev->mutex);
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 
 	return 0;
 error:
 	pthread_mutex_unlock(&dev->mutex);
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	/* TODO: free resource here or caller to check */
 	return -1;
 }
-- 
2.18.0

* [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-05  4:28 [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Tiwei Bie
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback Tiwei Bie
@ 2018-09-05  4:28 ` Tiwei Bie
  2018-09-07  9:39   ` Burakov, Anatoly
  2018-09-11 12:58   ` Maxime Coquelin
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel Tiwei Bie
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 23+ messages in thread
From: Tiwei Bie @ 2018-09-05  4:28 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov

Recently some memory APIs were introduced to allow users to
get the file descriptor and offset for each memory segment.
We can leverage those APIs to get rid of the /proc magic on
memory table preparation in vhost-user backend.
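
For reference, a minimal usage sketch of those APIs (the names
dump_seg_fd/dump_all_seg_fds are made up, and printf stands in for the
PMD logging macros; the patch itself uses the _thread_unsafe walk
because virtio-user already holds the memory hotplug lock):

#include <stdio.h>
#include <rte_common.h>
#include <rte_errno.h>
#include <rte_memory.h>

static int
dump_seg_fd(const struct rte_memseg_list *msl __rte_unused,
		const struct rte_memseg *ms, void *arg __rte_unused)
{
	size_t offset;
	int fd = rte_memseg_get_fd_thread_unsafe(ms);

	if (fd < 0 ||
	    rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
		printf("no fd/offset for seg %p, rte_errno=%d\n",
			ms->addr, rte_errno);
		return 0; /* keep walking */
	}

	printf("seg %p len %zu -> fd %d offset 0x%zx\n",
		ms->addr, ms->len, fd, offset);
	return 0;
}

static void
dump_all_seg_fds(void)
{
	/* rte_memseg_walk() holds the memory lock while iterating, so the
	 * _thread_unsafe getters are safe inside the callback. */
	rte_memseg_walk(dump_seg_fd, NULL);
}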

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/net/virtio/virtio_user/vhost_user.c | 211 +++++++++-----------
 1 file changed, 90 insertions(+), 121 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_user.c b/drivers/net/virtio/virtio_user/vhost_user.c
index ef6e43df8..8bd49610b 100644
--- a/drivers/net/virtio/virtio_user/vhost_user.c
+++ b/drivers/net/virtio/virtio_user/vhost_user.c
@@ -11,6 +11,9 @@
 #include <string.h>
 #include <errno.h>
 
+#include <rte_fbarray.h>
+#include <rte_eal_memconfig.h>
+
 #include "vhost.h"
 #include "virtio_user_dev.h"
 
@@ -121,133 +124,103 @@ vhost_user_read(int fd, struct vhost_user_msg *msg)
 	return -1;
 }
 
-struct hugepage_file_info {
-	uint64_t addr;            /**< virtual addr */
-	size_t   size;            /**< the file size */
-	char     path[PATH_MAX];  /**< path to backing file */
+struct walk_arg {
+	struct vhost_memory *vm;
+	int *fds;
+	int region_nr;
 };
 
-/* Two possible options:
- * 1. Match HUGEPAGE_INFO_FMT to find the file storing struct hugepage_file
- * array. This is simple but cannot be used in secondary process because
- * secondary process will close and munmap that file.
- * 2. Match HUGEFILE_FMT to find hugepage files directly.
- *
- * We choose option 2.
- */
 static int
-get_hugepage_file_info(struct hugepage_file_info huges[], int max)
+update_memory_region(const struct rte_memseg_list *msl __rte_unused,
+		const struct rte_memseg *ms, void *arg)
 {
-	int idx, k, exist;
-	FILE *f;
-	char buf[BUFSIZ], *tmp, *tail;
-	char *str_underline, *str_start;
-	int huge_index;
-	uint64_t v_start, v_end;
-	struct stat stats;
-
-	f = fopen("/proc/self/maps", "r");
-	if (!f) {
-		PMD_DRV_LOG(ERR, "cannot open /proc/self/maps");
-		return -1;
-	}
-
-	idx = 0;
-	while (fgets(buf, sizeof(buf), f) != NULL) {
-		if (sscanf(buf, "%" PRIx64 "-%" PRIx64, &v_start, &v_end) < 2) {
-			PMD_DRV_LOG(ERR, "Failed to parse address");
-			goto error;
-		}
-
-		tmp = strchr(buf, ' ') + 1; /** skip address */
-		tmp = strchr(tmp, ' ') + 1; /** skip perm */
-		tmp = strchr(tmp, ' ') + 1; /** skip offset */
-		tmp = strchr(tmp, ' ') + 1; /** skip dev */
-		tmp = strchr(tmp, ' ') + 1; /** skip inode */
-		while (*tmp == ' ')         /** skip spaces */
-			tmp++;
-		tail = strrchr(tmp, '\n');  /** remove newline if exists */
-		if (tail)
-			*tail = '\0';
-
-		/* Match HUGEFILE_FMT, aka "%s/%smap_%d",
-		 * which is defined in eal_filesystem.h
-		 */
-		str_underline = strrchr(tmp, '_');
-		if (!str_underline)
-			continue;
-
-		str_start = str_underline - strlen("map");
-		if (str_start < tmp)
-			continue;
-
-		if (sscanf(str_start, "map_%d", &huge_index) != 1)
-			continue;
-
-		/* skip duplicated file which is mapped to different regions */
-		for (k = 0, exist = -1; k < idx; ++k) {
-			if (!strcmp(huges[k].path, tmp)) {
-				exist = k;
-				break;
-			}
-		}
-		if (exist >= 0)
-			continue;
-
-		if (idx >= max) {
-			PMD_DRV_LOG(ERR, "Exceed maximum of %d", max);
-			goto error;
-		}
-
-		huges[idx].addr = v_start;
-		huges[idx].size = v_end - v_start; /* To be corrected later */
-		snprintf(huges[idx].path, PATH_MAX, "%s", tmp);
-		idx++;
-	}
-
-	/* correct the size for files who have many regions */
-	for (k = 0; k < idx; ++k) {
-		if (stat(huges[k].path, &stats) < 0) {
-			PMD_DRV_LOG(ERR, "Failed to stat %s, %s\n",
-				    huges[k].path, strerror(errno));
-			continue;
-		}
-		huges[k].size = stats.st_size;
-		PMD_DRV_LOG(INFO, "file %s, size %zx\n",
-			    huges[k].path, huges[k].size);
-	}
-
-	fclose(f);
-	return idx;
-
-error:
-	fclose(f);
-	return -1;
-}
-
-static int
-prepare_vhost_memory_user(struct vhost_user_msg *msg, int fds[])
-{
-	int i, num;
-	struct hugepage_file_info huges[VHOST_MEMORY_MAX_NREGIONS];
+	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
+	uint64_t start_addr, end_addr;
+	size_t offset;
+	int i, fd;
 
-	num = get_hugepage_file_info(huges, VHOST_MEMORY_MAX_NREGIONS);
-	if (num < 0) {
-		PMD_INIT_LOG(ERR, "Failed to prepare memory for vhost-user");
+	fd = rte_memseg_get_fd_thread_unsafe(ms);
+	if (fd < 0) {
+		PMD_DRV_LOG(ERR, "Failed to get fd, ms=%p rte_errno=%d",
+			ms, rte_errno);
 		return -1;
 	}
 
-	for (i = 0; i < num; ++i) {
-		mr = &msg->payload.memory.regions[i];
-		mr->guest_phys_addr = huges[i].addr; /* use vaddr! */
-		mr->userspace_addr = huges[i].addr;
-		mr->memory_size = huges[i].size;
-		mr->mmap_offset = 0;
-		fds[i] = open(huges[i].path, O_RDWR);
+	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
+		PMD_DRV_LOG(ERR, "Failed to get offset, ms=%p rte_errno=%d",
+			ms, rte_errno);
+		return -1;
+	}
+
+	start_addr = (uint64_t)(uintptr_t)ms->addr;
+	end_addr = start_addr + ms->len;
+
+	for (i = 0; i < wa->region_nr; i++) {
+		if (wa->fds[i] != fd)
+			continue;
+
+		mr = &wa->vm->regions[i];
+
+		if (mr->userspace_addr + mr->memory_size < end_addr)
+			mr->memory_size = end_addr - mr->userspace_addr;
+
+		if (mr->userspace_addr > start_addr) {
+			mr->userspace_addr = start_addr;
+			mr->guest_phys_addr = start_addr;
+		}
+
+		if (mr->mmap_offset > offset)
+			mr->mmap_offset = offset;
+
+		PMD_DRV_LOG(DEBUG, "index=%d fd=%d offset=0x%" PRIx64
+			" addr=0x%" PRIx64 " len=%" PRIu64, i, fd,
+			mr->mmap_offset, mr->userspace_addr,
+			mr->memory_size);
+
+		return 0;
+	}
+
+	if (i >= VHOST_MEMORY_MAX_NREGIONS) {
+		PMD_DRV_LOG(ERR, "Too many memory regions");
+		return -1;
 	}
 
-	msg->payload.memory.nregions = num;
+	mr = &wa->vm->regions[i];
+	wa->fds[i] = fd;
+
+	mr->guest_phys_addr = start_addr;
+	mr->userspace_addr = start_addr;
+	mr->memory_size = ms->len;
+	mr->mmap_offset = offset;
+
+	PMD_DRV_LOG(DEBUG, "index=%d fd=%d offset=0x%" PRIx64
+		" addr=0x%" PRIx64 " len=%" PRIu64, i, fd,
+		mr->mmap_offset, mr->userspace_addr,
+		mr->memory_size);
+
+	wa->region_nr++;
+
+	return 0;
+}
+
+static int
+prepare_vhost_memory_user(struct vhost_user_msg *msg, int fds[])
+{
+	struct walk_arg wa;
+
+	wa.region_nr = 0;
+	wa.vm = &msg->payload.memory;
+	wa.fds = fds;
+
+	/*
+	 * The memory lock has already been taken by memory subsystem
+	 * or virtio_user_start_device().
+	 */
+	if (rte_memseg_walk_thread_unsafe(update_memory_region, &wa) < 0)
+		return -1;
+
+	msg->payload.memory.nregions = wa.region_nr;
 	msg->payload.memory.padding = 0;
 
 	return 0;
@@ -280,7 +253,7 @@ vhost_user_sock(struct virtio_user_dev *dev,
 	int need_reply = 0;
 	int fds[VHOST_MEMORY_MAX_NREGIONS];
 	int fd_num = 0;
-	int i, len;
+	int len;
 	int vhostfd = dev->vhostfd;
 
 	RTE_SET_USED(m);
@@ -364,10 +337,6 @@ vhost_user_sock(struct virtio_user_dev *dev,
 		return -1;
 	}
 
-	if (req == VHOST_USER_SET_MEM_TABLE)
-		for (i = 0; i < fd_num; ++i)
-			close(fds[i]);
-
 	if (need_reply) {
 		if (vhost_user_read(vhostfd, &msg) < 0) {
 			PMD_DRV_LOG(ERR, "Received msg failed: %s",
-- 
2.18.0

* [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel
  2018-09-05  4:28 [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Tiwei Bie
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback Tiwei Bie
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings Tiwei Bie
@ 2018-09-05  4:28 ` Tiwei Bie
  2018-09-07  9:44   ` Burakov, Anatoly
  2018-09-11 13:10   ` Maxime Coquelin
  2018-09-11 13:11 ` [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Maxime Coquelin
  2018-09-20  7:35 ` Maxime Coquelin
  4 siblings, 2 replies; 23+ messages in thread
From: Tiwei Bie @ 2018-09-05  4:28 UTC (permalink / raw)
  To: maxime.coquelin, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov, stable

It's possible to have many more hugepage-backed memory regions
than vhost-kernel supports due to memory hotplug, which
may cause problems. A better solution is to have virtio-user
pass all the memory ranges reserved by DPDK to vhost-kernel.

Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
Cc: stable@dpdk.org

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/net/virtio/virtio_user/vhost_kernel.c | 38 +++++++++----------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index 897fee0af..9338166d9 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -70,41 +70,41 @@ static uint64_t vhost_req_user_to_kernel[] = {
 	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
 };
 
-struct walk_arg {
-	struct vhost_memory_kernel *vm;
-	uint32_t region_nr;
-};
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, size_t len, void *arg)
+add_memseg_list(const struct rte_memseg_list *msl, void *arg)
 {
-	struct walk_arg *wa = arg;
+	struct vhost_memory_kernel *vm = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
+	uint64_t len;
 
-	if (wa->region_nr >= max_regions)
+	if (vm->nregions >= max_regions)
 		return -1;
 
-	mr = &wa->vm->regions[wa->region_nr++];
-	start_addr = ms->addr;
+	start_addr = msl->base_va;
+	len = msl->page_sz * msl->memseg_arr.len;
+
+	mr = &vm->regions[vm->nregions++];
 
 	mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
 	mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
 	mr->memory_size = len;
-	mr->mmap_offset = 0;
+	mr->mmap_offset = 0; /* flags_padding */
+
+	PMD_DRV_LOG(DEBUG, "index=%u addr=%p len=%" PRIu64,
+			vm->nregions - 1, start_addr, len);
 
 	return 0;
 }
 
-/* By default, vhost kernel module allows 64 regions, but DPDK allows
- * 256 segments. As a relief, below function merges those virtually
- * adjacent memsegs into one region.
+/* By default, vhost kernel module allows 64 regions, but DPDK may
+ * have much more memory regions. Below function will treat each
+ * contiguous memory space reserved by DPDK as one region.
  */
 static struct vhost_memory_kernel *
 prepare_vhost_memory_kernel(void)
 {
 	struct vhost_memory_kernel *vm;
-	struct walk_arg wa;
 
 	vm = malloc(sizeof(struct vhost_memory_kernel) +
 			max_regions *
@@ -112,20 +112,18 @@ prepare_vhost_memory_kernel(void)
 	if (!vm)
 		return NULL;
 
-	wa.region_nr = 0;
-	wa.vm = vm;
+	vm->nregions = 0;
+	vm->padding = 0;
 
 	/*
 	 * The memory lock has already been taken by memory subsystem
 	 * or virtio_user_start_device().
 	 */
-	if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
+	if (rte_memseg_list_walk_thread_unsafe(add_memseg_list, vm) < 0) {
 		free(vm);
 		return NULL;
 	}
 
-	vm->nregions = wa.region_nr;
-	vm->padding = 0;
 	return vm;
 }
 
-- 
2.18.0

* Re: [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback Tiwei Bie
@ 2018-09-05  8:10   ` Sean Harte
  2018-09-07  9:30   ` Burakov, Anatoly
  2018-09-11 12:52   ` Maxime Coquelin
  2 siblings, 0 replies; 23+ messages in thread
From: Sean Harte @ 2018-09-05  8:10 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: maxime.coquelin, zhihong.wang, dev, anatoly.burakov, stable

On 5 September 2018 at 05:28, Tiwei Bie <tiwei.bie@intel.com> wrote:
> Deadlock can occur when allocating memory if a vhost-kernel
> based virtio-user device is in use. To fix the deadlock,
> we will take memory hotplug lock explicitly in virtio-user
> when necessary, and always call the _thread_unsafe memory
> functions.
>
> Bugzilla ID: 81
> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
> Cc: stable@dpdk.org
>
> Reported-by: Seán Harte <seanbh@gmail.com>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>  drivers/net/virtio/virtio_user/vhost_kernel.c |  6 +++++-
>  .../net/virtio/virtio_user/virtio_user_dev.c  | 19 +++++++++++++++++++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
> index b2444096c..897fee0af 100644
> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
> @@ -115,7 +115,11 @@ prepare_vhost_memory_kernel(void)
>         wa.region_nr = 0;
>         wa.vm = vm;
>
> -       if (rte_memseg_contig_walk(add_memory_region, &wa) < 0) {
> +       /*
> +        * The memory lock has already been taken by memory subsystem
> +        * or virtio_user_start_device().
> +        */
> +       if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
>                 free(vm);
>                 return NULL;
>         }
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index 7df600b02..869e96f87 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -13,6 +13,8 @@
>  #include <sys/types.h>
>  #include <sys/stat.h>
>
> +#include <rte_eal_memconfig.h>
> +
>  #include "vhost.h"
>  #include "virtio_user_dev.h"
>  #include "../virtio_ethdev.h"
> @@ -109,9 +111,24 @@ is_vhost_user_by_type(const char *path)
>  int
>  virtio_user_start_device(struct virtio_user_dev *dev)
>  {
> +       struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>         uint64_t features;
>         int ret;
>
> +       /*
> +        * XXX workaround!
> +        *
> +        * We need to make sure that the locks will be
> +        * taken in the correct order to avoid deadlocks.
> +        *
> +        * Before releasing this lock, this thread should
> +        * not trigger any memory hotplug events.
> +        *
> +        * This is a temporary workaround, and should be
> +        * replaced when we get proper supports from the
> +        * memory subsystem in the future.
> +        */
> +       rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
>         pthread_mutex_lock(&dev->mutex);
>
>         if (is_vhost_user_by_type(dev->path) && dev->vhostfd < 0)
> @@ -152,10 +169,12 @@ virtio_user_start_device(struct virtio_user_dev *dev)
>
>         dev->started = true;
>         pthread_mutex_unlock(&dev->mutex);
> +       rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>
>         return 0;
>  error:
>         pthread_mutex_unlock(&dev->mutex);
> +       rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>         /* TODO: free resource here or caller to check */
>         return -1;
>  }
> --
> 2.18.0
>

Tested-by: Seán Harte <seanbh@gmail.com>
Reviewed-by: Seán Harte <seanbh@gmail.com>

* Re: [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback Tiwei Bie
  2018-09-05  8:10   ` Sean Harte
@ 2018-09-07  9:30   ` Burakov, Anatoly
  2018-09-11 12:52   ` Maxime Coquelin
  2 siblings, 0 replies; 23+ messages in thread
From: Burakov, Anatoly @ 2018-09-07  9:30 UTC (permalink / raw)
  To: Tiwei Bie, maxime.coquelin, zhihong.wang, dev; +Cc: seanbh, stable

On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
> Deadlock can occur when allocating memory if a vhost-kernel
> based virtio-user device is in use. To fix the deadlock,
> we will take memory hotplug lock explicitly in virtio-user
> when necessary, and always call the _thread_unsafe memory
> functions.
> 
> Bugzilla ID: 81
> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
> Cc: stable@dpdk.org
> 
> Reported-by: Seán Harte <seanbh@gmail.com>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_user/vhost_kernel.c |  6 +++++-
>   .../net/virtio/virtio_user/virtio_user_dev.c  | 19 +++++++++++++++++++
>   2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
> index b2444096c..897fee0af 100644
> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
> @@ -115,7 +115,11 @@ prepare_vhost_memory_kernel(void)
>   	wa.region_nr = 0;
>   	wa.vm = vm;
>   
> -	if (rte_memseg_contig_walk(add_memory_region, &wa) < 0) {
> +	/*
> +	 * The memory lock has already been taken by memory subsystem
> +	 * or virtio_user_start_device().
> +	 */
> +	if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
>   		free(vm);
>   		return NULL;
>   	}
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index 7df600b02..869e96f87 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -13,6 +13,8 @@
>   #include <sys/types.h>
>   #include <sys/stat.h>
>   
> +#include <rte_eal_memconfig.h>
> +
>   #include "vhost.h"
>   #include "virtio_user_dev.h"
>   #include "../virtio_ethdev.h"
> @@ -109,9 +111,24 @@ is_vhost_user_by_type(const char *path)
>   int
>   virtio_user_start_device(struct virtio_user_dev *dev)
>   {
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>   	uint64_t features;
>   	int ret;
>   
> +	/*
> +	 * XXX workaround!
> +	 *
> +	 * We need to make sure that the locks will be
> +	 * taken in the correct order to avoid deadlocks.
> +	 *
> +	 * Before releasing this lock, this thread should
> +	 * not trigger any memory hotplug events.
> +	 *
> +	 * This is a temporary workaround, and should be
> +	 * replaced when we get proper supports from the
> +	 * memory subsystem in the future.
> +	 */
> +	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
>   	pthread_mutex_lock(&dev->mutex);
>   
>   	if (is_vhost_user_by_type(dev->path) && dev->vhostfd < 0)
> @@ -152,10 +169,12 @@ virtio_user_start_device(struct virtio_user_dev *dev)
>   
>   	dev->started = true;
>   	pthread_mutex_unlock(&dev->mutex);
> +	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>   
>   	return 0;
>   error:
>   	pthread_mutex_unlock(&dev->mutex);
> +	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>   	/* TODO: free resource here or caller to check */
>   	return -1;
>   }
> 

For the memory parts,

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings Tiwei Bie
@ 2018-09-07  9:39   ` Burakov, Anatoly
  2018-09-07 11:35     ` Tiwei Bie
  2018-09-11 12:58   ` Maxime Coquelin
  1 sibling, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2018-09-07  9:39 UTC (permalink / raw)
  To: Tiwei Bie, maxime.coquelin, zhihong.wang, dev; +Cc: seanbh

On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
> Recently some memory APIs were introduced to allow users to
> get the file descriptor and offset for each memory segment.
> We can leverage those APIs to get rid of the /proc magic on
> memory table preparation in vhost-user backend.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---

<snip>

> -	for (i = 0; i < num; ++i) {
> -		mr = &msg->payload.memory.regions[i];
> -		mr->guest_phys_addr = huges[i].addr; /* use vaddr! */
> -		mr->userspace_addr = huges[i].addr;
> -		mr->memory_size = huges[i].size;
> -		mr->mmap_offset = 0;
> -		fds[i] = open(huges[i].path, O_RDWR);
> +	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
> +		PMD_DRV_LOG(ERR, "Failed to get offset, ms=%p rte_errno=%d",
> +			ms, rte_errno);
> +		return -1;
> +	}
> +
> +	start_addr = (uint64_t)(uintptr_t)ms->addr;
> +	end_addr = start_addr + ms->len;
> +
> +	for (i = 0; i < wa->region_nr; i++) {

There has to be a better way than to run this loop on every segment.
Maybe store the last-used region, and only do a region lookup if there's
a mismatch? Generally, in single-file segments mode, you'll get lots of
segments from one memseg list one after the other, so you will only need
to do a lookup when the memseg list changes, instead of on each segment.

> +		if (wa->fds[i] != fd)
> +			continue;
> +
> +		mr = &wa->vm->regions[i];
> +
> +		if (mr->userspace_addr + mr->memory_size < end_addr)
> +			mr->memory_size = end_addr - mr->userspace_addr;
> +

<snip>

>   	int fds[VHOST_MEMORY_MAX_NREGIONS];
>   	int fd_num = 0;
> -	int i, len;
> +	int len;
>   	int vhostfd = dev->vhostfd;
>   
>   	RTE_SET_USED(m);
> @@ -364,10 +337,6 @@ vhost_user_sock(struct virtio_user_dev *dev,
>   		return -1;
>   	}
>   
> -	if (req == VHOST_USER_SET_MEM_TABLE)
> -		for (i = 0; i < fd_num; ++i)
> -			close(fds[i]);
> -

You're sharing fd's - presumably the other side of this is in a 
different process, so it's safe to close these fd's there?

>   	if (need_reply) {
>   		if (vhost_user_read(vhostfd, &msg) < 0) {
>   			PMD_DRV_LOG(ERR, "Received msg failed: %s",
> 


-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel Tiwei Bie
@ 2018-09-07  9:44   ` Burakov, Anatoly
  2018-09-07 11:37     ` Tiwei Bie
  2018-09-11 13:10   ` Maxime Coquelin
  1 sibling, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2018-09-07  9:44 UTC (permalink / raw)
  To: Tiwei Bie, maxime.coquelin, zhihong.wang, dev; +Cc: seanbh, stable

On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
> It's possible to have much more hugepage backed memory regions
> than what vhost-kernel supports due to the memory hotplug, which
> may cause problems. A better solution is to have the virtio-user
> pass all the memory ranges reserved by DPDK to vhost-kernel.
> 
> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_user/vhost_kernel.c | 38 +++++++++----------
>   1 file changed, 18 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
> index 897fee0af..9338166d9 100644
> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
> @@ -70,41 +70,41 @@ static uint64_t vhost_req_user_to_kernel[] = {
>   	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
>   };
>   
> -struct walk_arg {
> -	struct vhost_memory_kernel *vm;
> -	uint32_t region_nr;
> -};
>   static int
> -add_memory_region(const struct rte_memseg_list *msl __rte_unused,
> -		const struct rte_memseg *ms, size_t len, void *arg)
> +add_memseg_list(const struct rte_memseg_list *msl, void *arg)
>   {
> -	struct walk_arg *wa = arg;
> +	struct vhost_memory_kernel *vm = arg;
>   	struct vhost_memory_region *mr;
>   	void *start_addr;
> +	uint64_t len;
>   
> -	if (wa->region_nr >= max_regions)
> +	if (vm->nregions >= max_regions)
>   		return -1;
>   
> -	mr = &wa->vm->regions[wa->region_nr++];
> -	start_addr = ms->addr;
> +	start_addr = msl->base_va;
> +	len = msl->page_sz * msl->memseg_arr.len;
> +
> +	mr = &vm->regions[vm->nregions++];
>   
>   	mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
>   	mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
>   	mr->memory_size = len;
> -	mr->mmap_offset = 0;
> +	mr->mmap_offset = 0; /* flags_padding */
> +
> +	PMD_DRV_LOG(DEBUG, "index=%u addr=%p len=%" PRIu64,
> +			vm->nregions - 1, start_addr, len);
>   
>   	return 0;
>   }
>   
> -/* By default, vhost kernel module allows 64 regions, but DPDK allows
> - * 256 segments. As a relief, below function merges those virtually
> - * adjacent memsegs into one region.
> +/* By default, vhost kernel module allows 64 regions, but DPDK may
> + * have much more memory regions. Below function will treat each
> + * contiguous memory space reserved by DPDK as one region.
>    */
>   static struct vhost_memory_kernel *
>   prepare_vhost_memory_kernel(void)
>   {
>   	struct vhost_memory_kernel *vm;
> -	struct walk_arg wa;
>   
>   	vm = malloc(sizeof(struct vhost_memory_kernel) +
>   			max_regions *
> @@ -112,20 +112,18 @@ prepare_vhost_memory_kernel(void)
>   	if (!vm)
>   		return NULL;
>   
> -	wa.region_nr = 0;
> -	wa.vm = vm;
> +	vm->nregions = 0;
> +	vm->padding = 0;
>   
>   	/*
>   	 * The memory lock has already been taken by memory subsystem
>   	 * or virtio_user_start_device().
>   	 */
> -	if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
> +	if (rte_memseg_list_walk_thread_unsafe(add_memseg_list, vm) < 0) {
>   		free(vm);
>   		return NULL;
>   	}
>   
> -	vm->nregions = wa.region_nr;
> -	vm->padding = 0;
>   	return vm;
>   }
>   
> 

Doesn't that assume single file segments mode?

-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-07  9:39   ` Burakov, Anatoly
@ 2018-09-07 11:35     ` Tiwei Bie
  2018-09-07 12:21       ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Tiwei Bie @ 2018-09-07 11:35 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh

On Fri, Sep 07, 2018 at 10:39:16AM +0100, Burakov, Anatoly wrote:
> On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
> > Recently some memory APIs were introduced to allow users to
> > get the file descriptor and offset for each memory segment.
> > We can leverage those APIs to get rid of the /proc magic on
> > memory table preparation in vhost-user backend.
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> 
> <snip>
> 
> > -	for (i = 0; i < num; ++i) {
> > -		mr = &msg->payload.memory.regions[i];
> > -		mr->guest_phys_addr = huges[i].addr; /* use vaddr! */
> > -		mr->userspace_addr = huges[i].addr;
> > -		mr->memory_size = huges[i].size;
> > -		mr->mmap_offset = 0;
> > -		fds[i] = open(huges[i].path, O_RDWR);
> > +	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
> > +		PMD_DRV_LOG(ERR, "Failed to get offset, ms=%p rte_errno=%d",
> > +			ms, rte_errno);
> > +		return -1;
> > +	}
> > +
> > +	start_addr = (uint64_t)(uintptr_t)ms->addr;
> > +	end_addr = start_addr + ms->len;
> > +
> > +	for (i = 0; i < wa->region_nr; i++) {
> 
> There has to be a better way than to run this loop on every segment. Maybe
> store last-used region, and only do a region look up if there's a mismatch?
> Generally, in single-file segments mode, you'll get lots of segments from
> one memseg list one after the other, so you will need to do a lookup once
> memseg list changes, instead of on each segment.

We may have holes in one memseg list due to dynamic free.
Do you mean we just need to do rte_memseg_contig_walk(),
and that we can assume the fds of the contiguous memsegs will
be the same?

> 
> > +		if (wa->fds[i] != fd)
> > +			continue;
> > +
> > +		mr = &wa->vm->regions[i];
> > +
> > +		if (mr->userspace_addr + mr->memory_size < end_addr)
> > +			mr->memory_size = end_addr - mr->userspace_addr;
> > +
> 
> <snip>
> 
> >   	int fds[VHOST_MEMORY_MAX_NREGIONS];
> >   	int fd_num = 0;
> > -	int i, len;
> > +	int len;
> >   	int vhostfd = dev->vhostfd;
> >   	RTE_SET_USED(m);
> > @@ -364,10 +337,6 @@ vhost_user_sock(struct virtio_user_dev *dev,
> >   		return -1;
> >   	}
> > -	if (req == VHOST_USER_SET_MEM_TABLE)
> > -		for (i = 0; i < fd_num; ++i)
> > -			close(fds[i]);
> > -
> 
> You're sharing fd's - presumably the other side of this is in a different
> process, so it's safe to close these fd's there?

The above code belongs to virtio-user, and it will close the
fds obtained from rte_memseg_get_fd_thread_unsafe().

Below is the code that will close these fds on the other
side (i.e. the vhost-user process):

https://github.com/DPDK/dpdk/blob/3605968c2fa7/lib/librte_vhost/vhost_user.c#L805
https://github.com/DPDK/dpdk/blob/3605968c2fa7/lib/librte_vhost/vhost_user.c#L97


> 
> >   	if (need_reply) {
> >   		if (vhost_user_read(vhostfd, &msg) < 0) {
> >   			PMD_DRV_LOG(ERR, "Received msg failed: %s",
> > 
> 
> 
> -- 
> Thanks,
> Anatoly

* Re: [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel
  2018-09-07  9:44   ` Burakov, Anatoly
@ 2018-09-07 11:37     ` Tiwei Bie
  2018-09-07 12:24       ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Tiwei Bie @ 2018-09-07 11:37 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh, stable

On Fri, Sep 07, 2018 at 10:44:22AM +0100, Burakov, Anatoly wrote:
> On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
> > It's possible to have much more hugepage backed memory regions
> > than what vhost-kernel supports due to the memory hotplug, which
> > may cause problems. A better solution is to have the virtio-user
> > pass all the memory ranges reserved by DPDK to vhost-kernel.
> > 
> > Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/net/virtio/virtio_user/vhost_kernel.c | 38 +++++++++----------
> >   1 file changed, 18 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
> > index 897fee0af..9338166d9 100644
> > --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
> > +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
> > @@ -70,41 +70,41 @@ static uint64_t vhost_req_user_to_kernel[] = {
> >   	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
> >   };
> > -struct walk_arg {
> > -	struct vhost_memory_kernel *vm;
> > -	uint32_t region_nr;
> > -};
> >   static int
> > -add_memory_region(const struct rte_memseg_list *msl __rte_unused,
> > -		const struct rte_memseg *ms, size_t len, void *arg)
> > +add_memseg_list(const struct rte_memseg_list *msl, void *arg)
> >   {
> > -	struct walk_arg *wa = arg;
> > +	struct vhost_memory_kernel *vm = arg;
> >   	struct vhost_memory_region *mr;
> >   	void *start_addr;
> > +	uint64_t len;
> > -	if (wa->region_nr >= max_regions)
> > +	if (vm->nregions >= max_regions)
> >   		return -1;
> > -	mr = &wa->vm->regions[wa->region_nr++];
> > -	start_addr = ms->addr;
> > +	start_addr = msl->base_va;
> > +	len = msl->page_sz * msl->memseg_arr.len;
> > +
> > +	mr = &vm->regions[vm->nregions++];
> >   	mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
> >   	mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
> >   	mr->memory_size = len;
> > -	mr->mmap_offset = 0;
> > +	mr->mmap_offset = 0; /* flags_padding */
> > +
> > +	PMD_DRV_LOG(DEBUG, "index=%u addr=%p len=%" PRIu64,
> > +			vm->nregions - 1, start_addr, len);
> >   	return 0;
> >   }
> > -/* By default, vhost kernel module allows 64 regions, but DPDK allows
> > - * 256 segments. As a relief, below function merges those virtually
> > - * adjacent memsegs into one region.
> > +/* By default, vhost kernel module allows 64 regions, but DPDK may
> > + * have much more memory regions. Below function will treat each
> > + * contiguous memory space reserved by DPDK as one region.
> >    */
> >   static struct vhost_memory_kernel *
> >   prepare_vhost_memory_kernel(void)
> >   {
> >   	struct vhost_memory_kernel *vm;
> > -	struct walk_arg wa;
> >   	vm = malloc(sizeof(struct vhost_memory_kernel) +
> >   			max_regions *
> > @@ -112,20 +112,18 @@ prepare_vhost_memory_kernel(void)
> >   	if (!vm)
> >   		return NULL;
> > -	wa.region_nr = 0;
> > -	wa.vm = vm;
> > +	vm->nregions = 0;
> > +	vm->padding = 0;
> >   	/*
> >   	 * The memory lock has already been taken by memory subsystem
> >   	 * or virtio_user_start_device().
> >   	 */
> > -	if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
> > +	if (rte_memseg_list_walk_thread_unsafe(add_memseg_list, vm) < 0) {
> >   		free(vm);
> >   		return NULL;
> >   	}
> > -	vm->nregions = wa.region_nr;
> > -	vm->padding = 0;
> >   	return vm;
> >   }
> > 
> 
> Doesn't that assume single file segments mode?

This is to find out the VA ranges reserved by the memory subsystem.
Why does it need to assume single-file segments mode?


> 
> -- 
> Thanks,
> Anatoly

* Re: [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-07 11:35     ` Tiwei Bie
@ 2018-09-07 12:21       ` Burakov, Anatoly
  2018-09-10  3:59         ` Tiwei Bie
  0 siblings, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2018-09-07 12:21 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh

On 07-Sep-18 12:35 PM, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 10:39:16AM +0100, Burakov, Anatoly wrote:
>> On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
>>> Recently some memory APIs were introduced to allow users to
>>> get the file descriptor and offset for each memory segment.
>>> We can leverage those APIs to get rid of the /proc magic on
>>> memory table preparation in vhost-user backend.
>>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>
>> <snip>
>>
>>> -	for (i = 0; i < num; ++i) {
>>> -		mr = &msg->payload.memory.regions[i];
>>> -		mr->guest_phys_addr = huges[i].addr; /* use vaddr! */
>>> -		mr->userspace_addr = huges[i].addr;
>>> -		mr->memory_size = huges[i].size;
>>> -		mr->mmap_offset = 0;
>>> -		fds[i] = open(huges[i].path, O_RDWR);
>>> +	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
>>> +		PMD_DRV_LOG(ERR, "Failed to get offset, ms=%p rte_errno=%d",
>>> +			ms, rte_errno);
>>> +		return -1;
>>> +	}
>>> +
>>> +	start_addr = (uint64_t)(uintptr_t)ms->addr;
>>> +	end_addr = start_addr + ms->len;
>>> +
>>> +	for (i = 0; i < wa->region_nr; i++) {
>>
>> There has to be a better way than to run this loop on every segment. Maybe
>> store last-used region, and only do a region look up if there's a mismatch?
>> Generally, in single-file segments mode, you'll get lots of segments from
>> one memseg list one after the other, so you will need to do a lookup once
>> memseg list changes, instead of on each segment.
> 
> We may have holes in one memseg list due to dynamic free.
> Do you mean we just need to do rte_memseg_contig_walk()
> and we can assume that fds of the contiguous memegs will
> be the same?

No, I didn't mean that.

Whether or not you are in single-file segments mode, you still need to
scan each segment. However, you lose your state when you exit this
function, and thus have to look up the appropriate memory region (the
one that matches your fd) again, over and over. It would be good if you
could store the last-used memory region somewhere, so that the next time
you come back into this function, if the memseg has the same fd, you
will not have to look it up.

Something like this:

struct walk_arg {
	*last_used;
	<snip>
}

int walk_func() {
	<snip>
	cur_region = wa->last_used; // check if it matches
	if (cur_region->fd != fd) {
		// fd is different - we've changed the segment
		<snip>
		wa->last_used = cur_region;
	}
}

So, cache the last-used region to avoid looking it up again - chances
are, you won't have to. That way, you will still do region lookups, but
only when you have to, not on every segment.
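
For illustration, a slightly more concrete version of that idea, built
on top of the walk_arg/update_memory_region() from this patch (the
last_region field and find_region() helper are made-up names, and the
extend/append logic stays as it is in the patch):

struct walk_arg {
	struct vhost_memory *vm;
	int *fds;
	int region_nr;
	int last_region; /* region matched by the previous segment, -1 if none */
};

static int
find_region(struct walk_arg *wa, int fd)
{
	int i;

	/* fast path: consecutive segments usually share the same fd */
	if (wa->last_region >= 0 && wa->fds[wa->last_region] == fd)
		return wa->last_region;

	for (i = 0; i < wa->region_nr; i++) {
		if (wa->fds[i] == fd) {
			wa->last_region = i;
			return i;
		}
	}

	return -1; /* not found: caller appends a new region and sets last_region */
}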

> 
>>
>>> +		if (wa->fds[i] != fd)
>>> +			continue;
>>> +
>>> +		mr = &wa->vm->regions[i];
>>> +
>>> +		if (mr->userspace_addr + mr->memory_size < end_addr)
>>> +			mr->memory_size = end_addr - mr->userspace_addr;
>>> +
>>
>> <snip>
>>
>>>    	int fds[VHOST_MEMORY_MAX_NREGIONS];
>>>    	int fd_num = 0;
>>> -	int i, len;
>>> +	int len;
>>>    	int vhostfd = dev->vhostfd;
>>>    	RTE_SET_USED(m);
>>> @@ -364,10 +337,6 @@ vhost_user_sock(struct virtio_user_dev *dev,
>>>    		return -1;
>>>    	}
>>> -	if (req == VHOST_USER_SET_MEM_TABLE)
>>> -		for (i = 0; i < fd_num; ++i)
>>> -			close(fds[i]);
>>> -
>>
>> You're sharing fd's - presumably the other side of this is in a different
>> process, so it's safe to close these fd's there?
> 
> Above code belongs to virtio-user, and it will close the
> fds got from rte_memseg_get_fd_thread_unsafe().
> 
> Below is the code which will close these fds on the other
> side (i.e. the vhost-user process):
> 
> https://github.com/DPDK/dpdk/blob/3605968c2fa7/lib/librte_vhost/vhost_user.c#L805
> https://github.com/DPDK/dpdk/blob/3605968c2fa7/lib/librte_vhost/vhost_user.c#L97

OK, so not a problem then.

> 
> 
>>
>>>    	if (need_reply) {
>>>    		if (vhost_user_read(vhostfd, &msg) < 0) {
>>>    			PMD_DRV_LOG(ERR, "Received msg failed: %s",
>>>
>>
>>
>> -- 
>> Thanks,
>> Anatoly
> 


-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel
  2018-09-07 11:37     ` Tiwei Bie
@ 2018-09-07 12:24       ` Burakov, Anatoly
  2018-09-10  4:04         ` Tiwei Bie
  0 siblings, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2018-09-07 12:24 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh, stable

On 07-Sep-18 12:37 PM, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 10:44:22AM +0100, Burakov, Anatoly wrote:
>> On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
>>> It's possible to have much more hugepage backed memory regions
>>> than what vhost-kernel supports due to the memory hotplug, which
>>> may cause problems. A better solution is to have the virtio-user
>>> pass all the memory ranges reserved by DPDK to vhost-kernel.
>>>
>>> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>>    drivers/net/virtio/virtio_user/vhost_kernel.c | 38 +++++++++----------
>>>    1 file changed, 18 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
>>> index 897fee0af..9338166d9 100644
>>> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
>>> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
>>> @@ -70,41 +70,41 @@ static uint64_t vhost_req_user_to_kernel[] = {
>>>    	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
>>>    };
>>> -struct walk_arg {
>>> -	struct vhost_memory_kernel *vm;
>>> -	uint32_t region_nr;
>>> -};
>>>    static int
>>> -add_memory_region(const struct rte_memseg_list *msl __rte_unused,
>>> -		const struct rte_memseg *ms, size_t len, void *arg)
>>> +add_memseg_list(const struct rte_memseg_list *msl, void *arg)
>>>    {
>>> -	struct walk_arg *wa = arg;
>>> +	struct vhost_memory_kernel *vm = arg;
>>>    	struct vhost_memory_region *mr;
>>>    	void *start_addr;
>>> +	uint64_t len;
>>> -	if (wa->region_nr >= max_regions)
>>> +	if (vm->nregions >= max_regions)
>>>    		return -1;
>>> -	mr = &wa->vm->regions[wa->region_nr++];
>>> -	start_addr = ms->addr;
>>> +	start_addr = msl->base_va;
>>> +	len = msl->page_sz * msl->memseg_arr.len;
>>> +
>>> +	mr = &vm->regions[vm->nregions++];
>>>    	mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
>>>    	mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
>>>    	mr->memory_size = len;
>>> -	mr->mmap_offset = 0;
>>> +	mr->mmap_offset = 0; /* flags_padding */
>>> +
>>> +	PMD_DRV_LOG(DEBUG, "index=%u addr=%p len=%" PRIu64,
>>> +			vm->nregions - 1, start_addr, len);
>>>    	return 0;
>>>    }
>>> -/* By default, vhost kernel module allows 64 regions, but DPDK allows
>>> - * 256 segments. As a relief, below function merges those virtually
>>> - * adjacent memsegs into one region.
>>> +/* By default, vhost kernel module allows 64 regions, but DPDK may
>>> + * have much more memory regions. Below function will treat each
>>> + * contiguous memory space reserved by DPDK as one region.
>>>     */
>>>    static struct vhost_memory_kernel *
>>>    prepare_vhost_memory_kernel(void)
>>>    {
>>>    	struct vhost_memory_kernel *vm;
>>> -	struct walk_arg wa;
>>>    	vm = malloc(sizeof(struct vhost_memory_kernel) +
>>>    			max_regions *
>>> @@ -112,20 +112,18 @@ prepare_vhost_memory_kernel(void)
>>>    	if (!vm)
>>>    		return NULL;
>>> -	wa.region_nr = 0;
>>> -	wa.vm = vm;
>>> +	vm->nregions = 0;
>>> +	vm->padding = 0;
>>>    	/*
>>>    	 * The memory lock has already been taken by memory subsystem
>>>    	 * or virtio_user_start_device().
>>>    	 */
>>> -	if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
>>> +	if (rte_memseg_list_walk_thread_unsafe(add_memseg_list, vm) < 0) {
>>>    		free(vm);
>>>    		return NULL;
>>>    	}
>>> -	vm->nregions = wa.region_nr;
>>> -	vm->padding = 0;
>>>    	return vm;
>>>    }
>>>
>>
>> Doesn't that assume single file segments mode?
> 
> This is to find out the VA ranges reserved by memory subsystem.
> Why does it need to assume single file segments mode?

If you are not in single-file segments mode, each individual page in a 
VA-contiguous area will be behind a different fd - so it will be part of 
a different region, would it not?

> 
> 
>>
>> -- 
>> Thanks,
>> Anatoly
> 


-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-07 12:21       ` Burakov, Anatoly
@ 2018-09-10  3:59         ` Tiwei Bie
  2018-09-17 10:17           ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Tiwei Bie @ 2018-09-10  3:59 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh

On Fri, Sep 07, 2018 at 01:21:35PM +0100, Burakov, Anatoly wrote:
> On 07-Sep-18 12:35 PM, Tiwei Bie wrote:
> > On Fri, Sep 07, 2018 at 10:39:16AM +0100, Burakov, Anatoly wrote:
> > > On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
> > > > Recently some memory APIs were introduced to allow users to
> > > > get the file descriptor and offset for each memory segment.
> > > > We can leverage those APIs to get rid of the /proc magic on
> > > > memory table preparation in vhost-user backend.
> > > > 
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > ---
> > > 
> > > <snip>
> > > 
> > > > -	for (i = 0; i < num; ++i) {
> > > > -		mr = &msg->payload.memory.regions[i];
> > > > -		mr->guest_phys_addr = huges[i].addr; /* use vaddr! */
> > > > -		mr->userspace_addr = huges[i].addr;
> > > > -		mr->memory_size = huges[i].size;
> > > > -		mr->mmap_offset = 0;
> > > > -		fds[i] = open(huges[i].path, O_RDWR);
> > > > +	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
> > > > +		PMD_DRV_LOG(ERR, "Failed to get offset, ms=%p rte_errno=%d",
> > > > +			ms, rte_errno);
> > > > +		return -1;
> > > > +	}
> > > > +
> > > > +	start_addr = (uint64_t)(uintptr_t)ms->addr;
> > > > +	end_addr = start_addr + ms->len;
> > > > +
> > > > +	for (i = 0; i < wa->region_nr; i++) {
> > > 
> > > There has to be a better way than to run this loop on every segment. Maybe
> > > store last-used region, and only do a region look up if there's a mismatch?
> > > Generally, in single-file segments mode, you'll get lots of segments from
> > > one memseg list one after the other, so you will need to do a lookup once
> > > memseg list changes, instead of on each segment.
> > 
> > We may have holes in one memseg list due to dynamic free.
> > Do you mean we just need to do rte_memseg_contig_walk()
> > and we can assume that fds of the contiguous memegs will
> > be the same?
> 
> No, i didn't mean that.
> 
> Whether or not you are in single-file segments mode, you still need to scan
> each segment. However, you lose your state when you exit this function, and
> thus have to look up the appropriate memory region (that matches your fd)
> again, over and over. It would be good if you could store last-used memory
> region somewhere, so that next time you come back into this function, if the
> memseg has the same fd, you will not have to look it up.
> 
> Something like this:
> 
> struct walk_arg {
> 	*last_used;
> 	<snip>
> }
> 
> int walk_func() {
> 	<snip>
> 	cur_region = wa->last_used; // check if it matches
> 	if (cur->region->fd != fd) {
> 		// fd is different - we've changed the segment
> 		<snip>
> 		wa->last_used = cur_region
> 	}
> }

Thanks for the code. :)

> 
> So, cache last used region to not have to look it up again, because chances
> are, you won't have to. That way, you will still do region lookups, but only
> if you have to - not every time.

I can do it, but I'm not sure this optimization is really
necessary, because this loop should be quite fast: the
max number of regions permitted by vhost-user is quite
small. And we actually need to do such a loop at least once
for each packet in vhost-user's dequeue and enqueue paths,
i.e. the data path.


> 
> > 
> > > 
> > > > +		if (wa->fds[i] != fd)
> > > > +			continue;
> > > > +
> > > > +		mr = &wa->vm->regions[i];
> > > > +
> > > > +		if (mr->userspace_addr + mr->memory_size < end_addr)
> > > > +			mr->memory_size = end_addr - mr->userspace_addr;
> > > > +
> > > 
> > > <snip>
> > > 
> > > >    	int fds[VHOST_MEMORY_MAX_NREGIONS];
> > > >    	int fd_num = 0;
> > > > -	int i, len;
> > > > +	int len;
> > > >    	int vhostfd = dev->vhostfd;
> > > >    	RTE_SET_USED(m);
> > > > @@ -364,10 +337,6 @@ vhost_user_sock(struct virtio_user_dev *dev,
> > > >    		return -1;
> > > >    	}
> > > > -	if (req == VHOST_USER_SET_MEM_TABLE)
> > > > -		for (i = 0; i < fd_num; ++i)
> > > > -			close(fds[i]);
> > > > -
> > > 
> > > You're sharing fd's - presumably the other side of this is in a different
> > > process, so it's safe to close these fd's there?
> > 
> > Above code belongs to virtio-user, and it will close the
> > fds got from rte_memseg_get_fd_thread_unsafe().
> > 
> > Below is the code which will close these fds on the other
> > side (i.e. the vhost-user process):
> > 
> > https://github.com/DPDK/dpdk/blob/3605968c2fa7/lib/librte_vhost/vhost_user.c#L805
> > https://github.com/DPDK/dpdk/blob/3605968c2fa7/lib/librte_vhost/vhost_user.c#L97
> 
> OK, so not a problem then.
> 
> > 
> > 
> > > 
> > > >    	if (need_reply) {
> > > >    		if (vhost_user_read(vhostfd, &msg) < 0) {
> > > >    			PMD_DRV_LOG(ERR, "Received msg failed: %s",
> > > > 
> > > 
> > > 
> > > -- 
> > > Thanks,
> > > Anatoly
> > 
> 
> 
> -- 
> Thanks,
> Anatoly

* Re: [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel
  2018-09-07 12:24       ` Burakov, Anatoly
@ 2018-09-10  4:04         ` Tiwei Bie
  2018-09-17 10:18           ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Tiwei Bie @ 2018-09-10  4:04 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh, stable

On Fri, Sep 07, 2018 at 01:24:05PM +0100, Burakov, Anatoly wrote:
> On 07-Sep-18 12:37 PM, Tiwei Bie wrote:
> > On Fri, Sep 07, 2018 at 10:44:22AM +0100, Burakov, Anatoly wrote:
> > > On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
> > > > It's possible to have much more hugepage backed memory regions
> > > > than what vhost-kernel supports due to the memory hotplug, which
> > > > may cause problems. A better solution is to have the virtio-user
> > > > pass all the memory ranges reserved by DPDK to vhost-kernel.
> > > > 
> > > > Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
> > > > Cc: stable@dpdk.org
> > > > 
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > ---
> > > >    drivers/net/virtio/virtio_user/vhost_kernel.c | 38 +++++++++----------
> > > >    1 file changed, 18 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
> > > > index 897fee0af..9338166d9 100644
> > > > --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
> > > > +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
> > > > @@ -70,41 +70,41 @@ static uint64_t vhost_req_user_to_kernel[] = {
> > > >    	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
> > > >    };
> > > > -struct walk_arg {
> > > > -	struct vhost_memory_kernel *vm;
> > > > -	uint32_t region_nr;
> > > > -};
> > > >    static int
> > > > -add_memory_region(const struct rte_memseg_list *msl __rte_unused,
> > > > -		const struct rte_memseg *ms, size_t len, void *arg)
> > > > +add_memseg_list(const struct rte_memseg_list *msl, void *arg)
> > > >    {
> > > > -	struct walk_arg *wa = arg;
> > > > +	struct vhost_memory_kernel *vm = arg;
> > > >    	struct vhost_memory_region *mr;
> > > >    	void *start_addr;
> > > > +	uint64_t len;
> > > > -	if (wa->region_nr >= max_regions)
> > > > +	if (vm->nregions >= max_regions)
> > > >    		return -1;
> > > > -	mr = &wa->vm->regions[wa->region_nr++];
> > > > -	start_addr = ms->addr;
> > > > +	start_addr = msl->base_va;
> > > > +	len = msl->page_sz * msl->memseg_arr.len;
> > > > +
> > > > +	mr = &vm->regions[vm->nregions++];
> > > >    	mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
> > > >    	mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
> > > >    	mr->memory_size = len;
> > > > -	mr->mmap_offset = 0;
> > > > +	mr->mmap_offset = 0; /* flags_padding */
> > > > +
> > > > +	PMD_DRV_LOG(DEBUG, "index=%u addr=%p len=%" PRIu64,
> > > > +			vm->nregions - 1, start_addr, len);
> > > >    	return 0;
> > > >    }
> > > > -/* By default, vhost kernel module allows 64 regions, but DPDK allows
> > > > - * 256 segments. As a relief, below function merges those virtually
> > > > - * adjacent memsegs into one region.
> > > > +/* By default, vhost kernel module allows 64 regions, but DPDK may
> > > > + * have much more memory regions. Below function will treat each
> > > > + * contiguous memory space reserved by DPDK as one region.
> > > >     */
> > > >    static struct vhost_memory_kernel *
> > > >    prepare_vhost_memory_kernel(void)
> > > >    {
> > > >    	struct vhost_memory_kernel *vm;
> > > > -	struct walk_arg wa;
> > > >    	vm = malloc(sizeof(struct vhost_memory_kernel) +
> > > >    			max_regions *
> > > > @@ -112,20 +112,18 @@ prepare_vhost_memory_kernel(void)
> > > >    	if (!vm)
> > > >    		return NULL;
> > > > -	wa.region_nr = 0;
> > > > -	wa.vm = vm;
> > > > +	vm->nregions = 0;
> > > > +	vm->padding = 0;
> > > >    	/*
> > > >    	 * The memory lock has already been taken by memory subsystem
> > > >    	 * or virtio_user_start_device().
> > > >    	 */
> > > > -	if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
> > > > +	if (rte_memseg_list_walk_thread_unsafe(add_memseg_list, vm) < 0) {
> > > >    		free(vm);
> > > >    		return NULL;
> > > >    	}
> > > > -	vm->nregions = wa.region_nr;
> > > > -	vm->padding = 0;
> > > >    	return vm;
> > > >    }
> > > > 
> > > 
> > > Doesn't that assume single file segments mode?
> > 
> > This is to find out the VA ranges reserved by memory subsystem.
> > Why does it need to assume single file segments mode?
> 
> If you are not in single-file segments mode, each individual page in a
> VA-contiguous area will be behind a different fd - so it will be part of a
> different region, would it not?

The above code is for vhost-kernel. The kernel doesn't need
the fds to access the virtio-user process's memory; it just
needs to know the mappings between GPA (guest physical
address) and VA (virtio-user's virtual address).
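
For illustration only, a minimal sketch (not the actual driver code) of
what the kernel ends up receiving: a VHOST_SET_MEM_TABLE ioctl whose
regions simply declare GPA == VA, with no fds attached. The vhostfd
parameter is an assumption here (an already-open /dev/vhost-net fd).

#include <sys/ioctl.h>
#include <linux/vhost.h>

/*
 * Every region in 'vm' has guest_phys_addr == userspace_addr (the
 * virtio-user process's VA), which is all the kernel needs to
 * translate the addresses it finds in the vrings.
 */
static int
set_mem_table_sketch(int vhostfd, struct vhost_memory *vm)
{
	return ioctl(vhostfd, VHOST_SET_MEM_TABLE, vm);
}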


> 
> > 
> > 
> > > 
> > > -- 
> > > Thanks,
> > > Anatoly
> > 
> 
> 
> -- 
> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback Tiwei Bie
  2018-09-05  8:10   ` Sean Harte
  2018-09-07  9:30   ` Burakov, Anatoly
@ 2018-09-11 12:52   ` Maxime Coquelin
  2 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2018-09-11 12:52 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov, stable



On 09/05/2018 06:28 AM, Tiwei Bie wrote:
> Deadlock can occur when allocating memory if a vhost-kernel
> based virtio-user device is in use. To fix the deadlock,
> we will take memory hotplug lock explicitly in virtio-user
> when necessary, and always call the _thread_unsafe memory
> functions.
> 
> Bugzilla ID: 81
> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
> Cc: stable@dpdk.org
> 
> Reported-by: Seán Harte <seanbh@gmail.com>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_user/vhost_kernel.c |  6 +++++-
>   .../net/virtio/virtio_user/virtio_user_dev.c  | 19 +++++++++++++++++++
>   2 files changed, 24 insertions(+), 1 deletion(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings Tiwei Bie
  2018-09-07  9:39   ` Burakov, Anatoly
@ 2018-09-11 12:58   ` Maxime Coquelin
  1 sibling, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2018-09-11 12:58 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov



On 09/05/2018 06:28 AM, Tiwei Bie wrote:
> Recently some memory APIs were introduced to allow users to
> get the file descriptor and offset for each memory segment.
> We can leverage those APIs to get rid of the /proc magic on
> memory table preparation in vhost-user backend.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_user/vhost_user.c | 211 +++++++++-----------
>   1 file changed, 90 insertions(+), 121 deletions(-)
> 

Nice to get rid of the /proc parsing!

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel Tiwei Bie
  2018-09-07  9:44   ` Burakov, Anatoly
@ 2018-09-11 13:10   ` Maxime Coquelin
  1 sibling, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2018-09-11 13:10 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov, stable



On 09/05/2018 06:28 AM, Tiwei Bie wrote:
> It's possible to have much more hugepage backed memory regions
> than what vhost-kernel supports due to the memory hotplug, which
> may cause problems. A better solution is to have the virtio-user
> pass all the memory ranges reserved by DPDK to vhost-kernel.
> 
> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/net/virtio/virtio_user/vhost_kernel.c | 38 +++++++++----------
>   1 file changed, 18 insertions(+), 20 deletions(-)


Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table
  2018-09-05  4:28 [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Tiwei Bie
                   ` (2 preceding siblings ...)
  2018-09-05  4:28 ` [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel Tiwei Bie
@ 2018-09-11 13:11 ` Maxime Coquelin
  2018-09-20  7:35 ` Maxime Coquelin
  4 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2018-09-11 13:11 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov



On 09/05/2018 06:28 AM, Tiwei Bie wrote:
> This series consists of some fixes and improvements for virtio-user's
> memory table preparation.
> 
> This series supersedes below patches:
> https://patches.dpdk.org/patch/43807/
> https://patches.dpdk.org/patch/43918/
> 
> The second patch in this series depends on the below patch set:
> http://patches.dpdk.org/project/dpdk/list/?series=1177
> 
> Tiwei Bie (3):
>    net/virtio-user: fix deadlock in memory events callback
>    net/virtio-user: avoid parsing process mappings
>    net/virtio-user: fix memory hotplug support in vhost-kernel
> 
>   drivers/net/virtio/virtio_user/vhost_kernel.c |  50 +++--
>   drivers/net/virtio/virtio_user/vhost_user.c   | 211 ++++++++----------
>   .../net/virtio/virtio_user/virtio_user_dev.c  |  19 ++
>   3 files changed, 135 insertions(+), 145 deletions(-)
> 

That's all good to me.
I'll apply it once Anatoly's series is accepted.

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-10  3:59         ` Tiwei Bie
@ 2018-09-17 10:17           ` Burakov, Anatoly
  2018-09-17 11:57             ` Tiwei Bie
  0 siblings, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2018-09-17 10:17 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh

On 10-Sep-18 4:59 AM, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 01:21:35PM +0100, Burakov, Anatoly wrote:
>> On 07-Sep-18 12:35 PM, Tiwei Bie wrote:
>>> On Fri, Sep 07, 2018 at 10:39:16AM +0100, Burakov, Anatoly wrote:
>>>> On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
>>>>> Recently some memory APIs were introduced to allow users to
>>>>> get the file descriptor and offset for each memory segment.
>>>>> We can leverage those APIs to get rid of the /proc magic on
>>>>> memory table preparation in vhost-user backend.
>>>>>
>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>> ---
>>>>
>>>> <snip>
>>>>
>>>>> -	for (i = 0; i < num; ++i) {
>>>>> -		mr = &msg->payload.memory.regions[i];
>>>>> -		mr->guest_phys_addr = huges[i].addr; /* use vaddr! */
>>>>> -		mr->userspace_addr = huges[i].addr;
>>>>> -		mr->memory_size = huges[i].size;
>>>>> -		mr->mmap_offset = 0;
>>>>> -		fds[i] = open(huges[i].path, O_RDWR);
>>>>> +	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
>>>>> +		PMD_DRV_LOG(ERR, "Failed to get offset, ms=%p rte_errno=%d",
>>>>> +			ms, rte_errno);
>>>>> +		return -1;
>>>>> +	}
>>>>> +
>>>>> +	start_addr = (uint64_t)(uintptr_t)ms->addr;
>>>>> +	end_addr = start_addr + ms->len;
>>>>> +
>>>>> +	for (i = 0; i < wa->region_nr; i++) {
>>>>
>>>> There has to be a better way than to run this loop on every segment. Maybe
>>>> store last-used region, and only do a region look up if there's a mismatch?
>>>> Generally, in single-file segments mode, you'll get lots of segments from
>>>> one memseg list one after the other, so you will need to do a lookup once
>>>> memseg list changes, instead of on each segment.
>>>
>>> We may have holes in one memseg list due to dynamic free.
>>> Do you mean we just need to do rte_memseg_contig_walk()
> > and we can assume that fds of the contiguous memsegs will
>>> be the same?
>>
>> No, i didn't mean that.
>>
>> Whether or not you are in single-file segments mode, you still need to scan
>> each segment. However, you lose your state when you exit this function, and
>> thus have to look up the appropriate memory region (that matches your fd)
>> again, over and over. It would be good if you could store last-used memory
>> region somewhere, so that next time you come back into this function, if the
>> memseg has the same fd, you will not have to look it up.
>>
>> Something like this:
>>
>> struct walk_arg {
>> 	*last_used;
>> 	<snip>
>> }
>>
>> int walk_func() {
>> 	<snip>
>> 	cur_region = wa->last_used; // check if it matches
>> 	if (cur->region->fd != fd) {
>> 		// fd is different - we've changed the segment
>> 		<snip>
>> 		wa->last_used = cur_region
>> 	}
>> }
> 
> Thanks for the code. :)
> 
>>
>> So, cache last used region to not have to look it up again, because chances
>> are, you won't have to. That way, you will still do region lookups, but only
>> if you have to - not every time.
> 
> I can do it, but I'm not sure this optimization is really
> necessary. Because this loop should be quite fast, as the
> max number of regions permitted by vhost-user is quite
> small. And actually we need to do that loop at least once
> for each packet in vhost-user's dequeue and enqueue path,
> i.e. the data path.

The number of regions is small, but the number of segments may be in the 
thousands. Think worst case - 8K segments in the 16th region - with my 
code, you will execute only 16 iterations on the first segment and use 
"last used" for the rest of the segments, while with your code it'll be 
8K times 16 :)
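
Fleshed out, the caching idea quoted above might look roughly like the
sketch below. It is only a sketch: the struct layout, the fds[] array
and the simplified "append only VA- and offset-contiguous memsegs"
merge rule are assumptions, not the actual patch.

#include <rte_common.h>
#include <rte_memory.h>

#include "vhost.h"	/* virtio-user's local struct vhost_memory_region */

struct walk_arg {
	struct vhost_memory_region *regions;
	int *fds;
	uint32_t region_nr;
	uint32_t max_regions;
	uint32_t last_idx;	/* region used for the previous memseg */
	int last_fd;		/* fd of the previous memseg */
};

static int
add_seg_cached(const struct rte_memseg_list *msl __rte_unused,
		const struct rte_memseg *ms, void *arg)
{
	struct walk_arg *wa = arg;
	struct vhost_memory_region *mr;
	uint64_t va = (uint64_t)(uintptr_t)ms->addr;
	size_t offset;
	uint32_t i;
	int fd;

	fd = rte_memseg_get_fd_thread_unsafe(ms);
	if (fd < 0 || rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0)
		return -1;

	if (wa->region_nr > 0 && fd == wa->last_fd) {
		/* Fast path: same fd as the previous memseg. */
		i = wa->last_idx;
	} else {
		/* Slow path: at most max_regions (8) iterations. */
		for (i = 0; i < wa->region_nr; i++)
			if (wa->fds[i] == fd)
				break;
	}

	if (i < wa->region_nr) {
		mr = &wa->regions[i];
		/* Only contiguous growth is handled in this sketch. */
		if (va != mr->userspace_addr + mr->memory_size ||
		    offset != mr->mmap_offset + mr->memory_size)
			return -1;
		mr->memory_size += ms->len;
	} else {
		if (wa->region_nr >= wa->max_regions)
			return -1;
		mr = &wa->regions[wa->region_nr];
		mr->guest_phys_addr = va;	/* use vaddr as GPA */
		mr->userspace_addr = va;
		mr->memory_size = ms->len;
		mr->mmap_offset = offset;
		wa->fds[wa->region_nr++] = fd;
	}

	wa->last_idx = (uint32_t)(mr - wa->regions);
	wa->last_fd = fd;
	return 0;
}

It would be driven by one of the memseg walk functions, e.g.
rte_memseg_walk_thread_unsafe(add_seg_cached, &wa), with wa
zero-initialized.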

You'll have to clarify the "for each packet" part, not sure i follow.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel
  2018-09-10  4:04         ` Tiwei Bie
@ 2018-09-17 10:18           ` Burakov, Anatoly
  0 siblings, 0 replies; 23+ messages in thread
From: Burakov, Anatoly @ 2018-09-17 10:18 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh, stable

On 10-Sep-18 5:04 AM, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 01:24:05PM +0100, Burakov, Anatoly wrote:
>> On 07-Sep-18 12:37 PM, Tiwei Bie wrote:
>>> On Fri, Sep 07, 2018 at 10:44:22AM +0100, Burakov, Anatoly wrote:
>>>> On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
>>>>> It's possible to have much more hugepage backed memory regions
>>>>> than what vhost-kernel supports due to the memory hotplug, which
>>>>> may cause problems. A better solution is to have the virtio-user
>>>>> pass all the memory ranges reserved by DPDK to vhost-kernel.
>>>>>
>>>>> Fixes: 12ecb2f63b12 ("net/virtio-user: support memory hotplug")
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>> ---
>>>>>     drivers/net/virtio/virtio_user/vhost_kernel.c | 38 +++++++++----------
>>>>>     1 file changed, 18 insertions(+), 20 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
>>>>> index 897fee0af..9338166d9 100644
>>>>> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
>>>>> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
>>>>> @@ -70,41 +70,41 @@ static uint64_t vhost_req_user_to_kernel[] = {
>>>>>     	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
>>>>>     };
>>>>> -struct walk_arg {
>>>>> -	struct vhost_memory_kernel *vm;
>>>>> -	uint32_t region_nr;
>>>>> -};
>>>>>     static int
>>>>> -add_memory_region(const struct rte_memseg_list *msl __rte_unused,
>>>>> -		const struct rte_memseg *ms, size_t len, void *arg)
>>>>> +add_memseg_list(const struct rte_memseg_list *msl, void *arg)
>>>>>     {
>>>>> -	struct walk_arg *wa = arg;
>>>>> +	struct vhost_memory_kernel *vm = arg;
>>>>>     	struct vhost_memory_region *mr;
>>>>>     	void *start_addr;
>>>>> +	uint64_t len;
>>>>> -	if (wa->region_nr >= max_regions)
>>>>> +	if (vm->nregions >= max_regions)
>>>>>     		return -1;
>>>>> -	mr = &wa->vm->regions[wa->region_nr++];
>>>>> -	start_addr = ms->addr;
>>>>> +	start_addr = msl->base_va;
>>>>> +	len = msl->page_sz * msl->memseg_arr.len;
>>>>> +
>>>>> +	mr = &vm->regions[vm->nregions++];
>>>>>     	mr->guest_phys_addr = (uint64_t)(uintptr_t)start_addr;
>>>>>     	mr->userspace_addr = (uint64_t)(uintptr_t)start_addr;
>>>>>     	mr->memory_size = len;
>>>>> -	mr->mmap_offset = 0;
>>>>> +	mr->mmap_offset = 0; /* flags_padding */
>>>>> +
>>>>> +	PMD_DRV_LOG(DEBUG, "index=%u addr=%p len=%" PRIu64,
>>>>> +			vm->nregions - 1, start_addr, len);
>>>>>     	return 0;
>>>>>     }
>>>>> -/* By default, vhost kernel module allows 64 regions, but DPDK allows
>>>>> - * 256 segments. As a relief, below function merges those virtually
>>>>> - * adjacent memsegs into one region.
>>>>> +/* By default, vhost kernel module allows 64 regions, but DPDK may
>>>>> + * have much more memory regions. Below function will treat each
>>>>> + * contiguous memory space reserved by DPDK as one region.
>>>>>      */
>>>>>     static struct vhost_memory_kernel *
>>>>>     prepare_vhost_memory_kernel(void)
>>>>>     {
>>>>>     	struct vhost_memory_kernel *vm;
>>>>> -	struct walk_arg wa;
>>>>>     	vm = malloc(sizeof(struct vhost_memory_kernel) +
>>>>>     			max_regions *
>>>>> @@ -112,20 +112,18 @@ prepare_vhost_memory_kernel(void)
>>>>>     	if (!vm)
>>>>>     		return NULL;
>>>>> -	wa.region_nr = 0;
>>>>> -	wa.vm = vm;
>>>>> +	vm->nregions = 0;
>>>>> +	vm->padding = 0;
>>>>>     	/*
>>>>>     	 * The memory lock has already been taken by memory subsystem
>>>>>     	 * or virtio_user_start_device().
>>>>>     	 */
>>>>> -	if (rte_memseg_contig_walk_thread_unsafe(add_memory_region, &wa) < 0) {
>>>>> +	if (rte_memseg_list_walk_thread_unsafe(add_memseg_list, vm) < 0) {
>>>>>     		free(vm);
>>>>>     		return NULL;
>>>>>     	}
>>>>> -	vm->nregions = wa.region_nr;
>>>>> -	vm->padding = 0;
>>>>>     	return vm;
>>>>>     }
>>>>>
>>>>
>>>> Doesn't that assume single file segments mode?
>>>
>>> This is to find out the VA ranges reserved by memory subsystem.
>>> Why does it need to assume single file segments mode?
>>
>> If you are not in single-file segments mode, each individual page in a
>> VA-contiguous area will be behind a different fd - so it will be part of a
>> different region, would it not?
> 
> The above code is for vhost-kernel. The kernel doesn't need
> the fds to access the virtio-user process's memory; it just
> needs to know the mappings between GPA (guest physical
> address) and VA (virtio-user's virtual address).
> 

Ah OK. Thanks for clarification!

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-17 10:17           ` Burakov, Anatoly
@ 2018-09-17 11:57             ` Tiwei Bie
  2018-09-17 13:06               ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Tiwei Bie @ 2018-09-17 11:57 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh

On Mon, Sep 17, 2018 at 11:17:42AM +0100, Burakov, Anatoly wrote:
> On 10-Sep-18 4:59 AM, Tiwei Bie wrote:
> > On Fri, Sep 07, 2018 at 01:21:35PM +0100, Burakov, Anatoly wrote:
> > > On 07-Sep-18 12:35 PM, Tiwei Bie wrote:
> > > > On Fri, Sep 07, 2018 at 10:39:16AM +0100, Burakov, Anatoly wrote:
> > > > > On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
> > > > > > Recently some memory APIs were introduced to allow users to
> > > > > > get the file descriptor and offset for each memory segment.
> > > > > > We can leverage those APIs to get rid of the /proc magic on
> > > > > > memory table preparation in vhost-user backend.
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > ---
> > > > > 
> > > > > <snip>
> > > > > 
> > > > > > -	for (i = 0; i < num; ++i) {
> > > > > > -		mr = &msg->payload.memory.regions[i];
> > > > > > -		mr->guest_phys_addr = huges[i].addr; /* use vaddr! */
> > > > > > -		mr->userspace_addr = huges[i].addr;
> > > > > > -		mr->memory_size = huges[i].size;
> > > > > > -		mr->mmap_offset = 0;
> > > > > > -		fds[i] = open(huges[i].path, O_RDWR);
> > > > > > +	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
> > > > > > +		PMD_DRV_LOG(ERR, "Failed to get offset, ms=%p rte_errno=%d",
> > > > > > +			ms, rte_errno);
> > > > > > +		return -1;
> > > > > > +	}
> > > > > > +
> > > > > > +	start_addr = (uint64_t)(uintptr_t)ms->addr;
> > > > > > +	end_addr = start_addr + ms->len;
> > > > > > +
> > > > > > +	for (i = 0; i < wa->region_nr; i++) {
> > > > > 
> > > > > There has to be a better way than to run this loop on every segment. Maybe
> > > > > store last-used region, and only do a region look up if there's a mismatch?
> > > > > Generally, in single-file segments mode, you'll get lots of segments from
> > > > > one memseg list one after the other, so you will need to do a lookup once
> > > > > memseg list changes, instead of on each segment.
> > > > 
> > > > We may have holes in one memseg list due to dynamic free.
> > > > Do you mean we just need to do rte_memseg_contig_walk()
> > > > and we can assume that fds of the contiguous memsegs will
> > > > be the same?
> > > 
> > > No, i didn't mean that.
> > > 
> > > Whether or not you are in single-file segments mode, you still need to scan
> > > each segment. However, you lose your state when you exit this function, and
> > > thus have to look up the appropriate memory region (that matches your fd)
> > > again, over and over. It would be good if you could store last-used memory
> > > region somewhere, so that next time you come back into this function, if the
> > > memseg has the same fd, you will not have to look it up.
> > > 
> > > Something like this:
> > > 
> > > struct walk_arg {
> > > 	*last_used;
> > > 	<snip>
> > > }
> > > 
> > > int walk_func() {
> > > 	<snip>
> > > 	cur_region = wa->last_used; // check if it matches
> > > 	if (cur->region->fd != fd) {
> > > 		// fd is different - we've changed the segment
> > > 		<snip>
> > > 		wa->last_used = cur_region
> > > 	}
> > > }
> > 
> > Thanks for the code. :)
> > 
> > > 
> > > So, cache last used region to not have to look it up again, because chances
> > > are, you won't have to. That way, you will still do region lookups, but only
> > > if you have to - not every time.
> > 
> > I can do it, but I'm not sure this optimization is really
> > necessary. Because this loop should be quite fast, as the
> > max number of regions permitted by vhost-user is quite
> > small. And actually we need to do that loop at least once
> > for each packet in vhost-user's dequeue and enqueue path,
> > i.e. the data path.
> 
> The number of regions is small, but the number of segments may be in the
> thousands. Think worst case - 8K segments in the 16th region

The number of regions permitted by vhost-user is 8.
And most likely, we just have two regions as the
single-file-segment mode is mandatory when using
2MB pages.

> - with my code,
> you will execute only 16 iterations on first segment and use "last used" for
> the rest of the segments,

We still need to do 8K iterations on the segments.

> while with your code, it'll be 8K times 16 :)

IMO, what we really need is a way to reduce "8K",
i.e. reduce the number of segments (which could
be thousands currently) we need to parse.

And the loop should be faster than the function
call to rte_memseg_get_fd_thread_unsafe() and
rte_memseg_get_fd_offset_thread_unsafe() (which
are also called for each segment).

> 
> You'll have to clarify the "for each packet" part, not sure i follow.

Take the vhost-PMD as an example: when doing Rx burst
and Tx burst, for each mbuf (i.e. packet), we need to
do that loop at least once.
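
(For context, the per-packet translation referred to here is essentially
a scan over the (at most 8) regions along these lines. This is just an
illustrative sketch reusing the region layout from virtio-user's local
vhost.h, not the actual vhost library code.)

#include <stdint.h>

#include "vhost.h"	/* struct vhost_memory_region, for illustration */

/* Return the backend VA for a guest-physical address, or 0 if no
 * region covers it.
 */
static inline uint64_t
gpa_to_vva_sketch(const struct vhost_memory_region *regs,
		uint32_t nregions, uint64_t gpa)
{
	uint32_t i;

	for (i = 0; i < nregions; i++) {
		const struct vhost_memory_region *r = &regs[i];

		if (gpa >= r->guest_phys_addr &&
		    gpa < r->guest_phys_addr + r->memory_size)
			return gpa - r->guest_phys_addr + r->userspace_addr;
	}
	return 0;
}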

> 
> -- 
> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings
  2018-09-17 11:57             ` Tiwei Bie
@ 2018-09-17 13:06               ` Burakov, Anatoly
  0 siblings, 0 replies; 23+ messages in thread
From: Burakov, Anatoly @ 2018-09-17 13:06 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: maxime.coquelin, zhihong.wang, dev, seanbh

On 17-Sep-18 12:57 PM, Tiwei Bie wrote:
> On Mon, Sep 17, 2018 at 11:17:42AM +0100, Burakov, Anatoly wrote:
>> On 10-Sep-18 4:59 AM, Tiwei Bie wrote:
>>> On Fri, Sep 07, 2018 at 01:21:35PM +0100, Burakov, Anatoly wrote:
>>>> On 07-Sep-18 12:35 PM, Tiwei Bie wrote:
>>>>> On Fri, Sep 07, 2018 at 10:39:16AM +0100, Burakov, Anatoly wrote:
>>>>>> On 05-Sep-18 5:28 AM, Tiwei Bie wrote:
>>>>>>> Recently some memory APIs were introduced to allow users to
>>>>>>> get the file descriptor and offset for each memory segment.
>>>>>>> We can leverage those APIs to get rid of the /proc magic on
>>>>>>> memory table preparation in vhost-user backend.
>>>>>>>
>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>>>> ---
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> -	for (i = 0; i < num; ++i) {
>>>>>>> -		mr = &msg->payload.memory.regions[i];
>>>>>>> -		mr->guest_phys_addr = huges[i].addr; /* use vaddr! */
>>>>>>> -		mr->userspace_addr = huges[i].addr;
>>>>>>> -		mr->memory_size = huges[i].size;
>>>>>>> -		mr->mmap_offset = 0;
>>>>>>> -		fds[i] = open(huges[i].path, O_RDWR);
>>>>>>> +	if (rte_memseg_get_fd_offset_thread_unsafe(ms, &offset) < 0) {
>>>>>>> +		PMD_DRV_LOG(ERR, "Failed to get offset, ms=%p rte_errno=%d",
>>>>>>> +			ms, rte_errno);
>>>>>>> +		return -1;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	start_addr = (uint64_t)(uintptr_t)ms->addr;
>>>>>>> +	end_addr = start_addr + ms->len;
>>>>>>> +
>>>>>>> +	for (i = 0; i < wa->region_nr; i++) {
>>>>>>
>>>>>> There has to be a better way than to run this loop on every segment. Maybe
>>>>>> store last-used region, and only do a region look up if there's a mismatch?
>>>>>> Generally, in single-file segments mode, you'll get lots of segments from
>>>>>> one memseg list one after the other, so you will need to do a lookup once
>>>>>> memseg list changes, instead of on each segment.
>>>>>
>>>>> We may have holes in one memseg list due to dynamic free.
>>>>> Do you mean we just need to do rte_memseg_contig_walk()
>>>>> and we can assume that fds of the contiguous memsegs will
>>>>> be the same?
>>>>
>>>> No, i didn't mean that.
>>>>
>>>> Whether or not you are in single-file segments mode, you still need to scan
>>>> each segment. However, you lose your state when you exit this function, and
>>>> thus have to look up the appropriate memory region (that matches your fd)
>>>> again, over and over. It would be good if you could store last-used memory
>>>> region somewhere, so that next time you come back into this function, if the
>>>> memseg has the same fd, you will not have to look it up.
>>>>
>>>> Something like this:
>>>>
>>>> struct walk_arg {
>>>> 	*last_used;
>>>> 	<snip>
>>>> }
>>>>
>>>> int walk_func() {
>>>> 	<snip>
>>>> 	cur_region = wa->last_used; // check if it matches
>>>> 	if (cur->region->fd != fd) {
>>>> 		// fd is different - we've changed the segment
>>>> 		<snip>
>>>> 		wa->last_used = cur_region
>>>> 	}
>>>> }
>>>
>>> Thanks for the code. :)
>>>
>>>>
>>>> So, cache last used region to not have to look it up again, because chances
>>>> are, you won't have to. That way, you will still do region lookups, but only
>>>> if you have to - not every time.
>>>
>>> I can do it, but I'm not sure this optimization is really
>>> necessary. Because this loop should be quite fast, as the
>>> max number of regions permitted by vhost-user is quite
>>> small. And actually we need to do that loop at least once
>>> for each packet in vhost-user's dequeue and enqueue path,
>>> i.e. the data path.
>>
>> The number of regions is small, but the number of segments may be in the
>> thousands. Think worst case - 8K segments in the 16th region
> 
> The number of regions permitted by vhost-user is 8.
> And most likely, we just have two regions as the
> single-file-segment mode is mandatory when using
> 2MB pages.



> 
>> - with my code,
>> you will execute only 16 iterations on first segment and use "last used" for
>> the rest of the segments,
> 
> We still need to do 8K iterations on the segments.

Yes, but not 8K times (up to) 8 regions. Anyway, it's your driver, up to 
you :)

> 
>> while with your code, it'll be 8K times 16 :)
> 
> IMO, what we really need is a way to reduce "8K",
> i.e. reduce the number of segments (which could
> be thousands currently) we need to parse.
> 
> And the loop should be faster than the function
> call to rte_memseg_get_fd_thread_unsafe() and
> rte_memseg_get_fd_offset_thread_unsafe() (which
> are also called for each segment).

Unfortunately, we cannot do that, because we cannot make any assumptions 
about the underlying structure of the fds.

Technically, i could introduce an fd-centric walk function (i.e. so 
that, instead of per-memseg or per-contiguous area walk, you'd get a 
per-fd walk), but i really don't want to pollute the API with another 
walk function.
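
(Purely hypothetical, to illustrate the idea rather than propose it: a
per-fd walk, if it were ever added, might be declared along these lines.
No such API exists in DPDK.)

#include <stddef.h>

/* HYPOTHETICAL: one callback invocation per fd-backed contiguous area. */
typedef int (*rte_memseg_fd_walk_t)(int fd, void *va, size_t len,
		size_t offset, void *arg);

int rte_memseg_fd_walk_thread_unsafe(rte_memseg_fd_walk_t func, void *arg);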

> 
>>
>> You'll have to clarify the "for each packet" part, not sure i follow.
> 
> Take the vhost-PMD as an example: when doing Rx burst
> and Tx burst, for each mbuf (i.e. packet), we need to
> do that loop at least once.

Ah, OK, i get you - if it's fast enough to use on the data path, it's 
fast enough for memory mapping code :) Fair enough.

> 
>>
>> -- 
>> Thanks,
>> Anatoly
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table
  2018-09-05  4:28 [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Tiwei Bie
                   ` (3 preceding siblings ...)
  2018-09-11 13:11 ` [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Maxime Coquelin
@ 2018-09-20  7:35 ` Maxime Coquelin
  4 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2018-09-20  7:35 UTC (permalink / raw)
  To: Tiwei Bie, zhihong.wang, dev; +Cc: seanbh, anatoly.burakov



On 09/05/2018 06:28 AM, Tiwei Bie wrote:
> This series consists of some fixes and improvements for virtio-user's
> memory table preparation.
> 
> This series supersedes below patches:
> https://patches.dpdk.org/patch/43807/
> https://patches.dpdk.org/patch/43918/
> 
> The second patch in this series depends on the below patch set:
> http://patches.dpdk.org/project/dpdk/list/?series=1177
> 
> Tiwei Bie (3):
>    net/virtio-user: fix deadlock in memory events callback
>    net/virtio-user: avoid parsing process mappings
>    net/virtio-user: fix memory hotplug support in vhost-kernel
> 
>   drivers/net/virtio/virtio_user/vhost_kernel.c |  50 +++--
>   drivers/net/virtio/virtio_user/vhost_user.c   | 211 ++++++++----------
>   .../net/virtio/virtio_user/virtio_user_dev.c  |  19 ++
>   3 files changed, 135 insertions(+), 145 deletions(-)
> 

Applied to dpdk-next-virtio/master.

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-09-20  7:35 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-05  4:28 [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Tiwei Bie
2018-09-05  4:28 ` [dpdk-dev] [PATCH 1/3] net/virtio-user: fix deadlock in memory events callback Tiwei Bie
2018-09-05  8:10   ` Sean Harte
2018-09-07  9:30   ` Burakov, Anatoly
2018-09-11 12:52   ` Maxime Coquelin
2018-09-05  4:28 ` [dpdk-dev] [PATCH 2/3] net/virtio-user: avoid parsing process mappings Tiwei Bie
2018-09-07  9:39   ` Burakov, Anatoly
2018-09-07 11:35     ` Tiwei Bie
2018-09-07 12:21       ` Burakov, Anatoly
2018-09-10  3:59         ` Tiwei Bie
2018-09-17 10:17           ` Burakov, Anatoly
2018-09-17 11:57             ` Tiwei Bie
2018-09-17 13:06               ` Burakov, Anatoly
2018-09-11 12:58   ` Maxime Coquelin
2018-09-05  4:28 ` [dpdk-dev] [PATCH 3/3] net/virtio-user: fix memory hotplug support in vhost-kernel Tiwei Bie
2018-09-07  9:44   ` Burakov, Anatoly
2018-09-07 11:37     ` Tiwei Bie
2018-09-07 12:24       ` Burakov, Anatoly
2018-09-10  4:04         ` Tiwei Bie
2018-09-17 10:18           ` Burakov, Anatoly
2018-09-11 13:10   ` Maxime Coquelin
2018-09-11 13:11 ` [dpdk-dev] [PATCH 0/3] Some fixes/improvements for virtio-user memory table Maxime Coquelin
2018-09-20  7:35 ` Maxime Coquelin
