* [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation
@ 2021-06-18 14:03 Maxime Coquelin
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 1/7] vhost: fix missing memory table NUMA realloc Maxime Coquelin
` (6 more replies)
0 siblings, 7 replies; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-18 14:03 UTC (permalink / raw)
To: dev, david.marchand, chenbo.xia; +Cc: Maxime Coquelin
This patch series first fixes missing reallocations of some
virtqueue and device metadata.
Then, it improves the numa_realloc() function by using the
rte_realloc_socket() API, which takes care of the memcpy and
the freeing. The VQs' NUMA IDs are also saved in the VQ
metadata and used for every allocation, so that allocations
done before the NUMA reallocation end up on the same node as
the VQ, and later ones are directly made on the proper node.
Finally, the inflight feature metadata are converted from
calloc() to rte_zmalloc_socket() and their reallocation is
handled in numa_realloc().
Changes in v6:
==============
- Send the complete series
Changes in v5:
==============
- Do not reallocate if VQ is ready (Chenbo)
- Fix typos & cosmetics (Chenbo)
- Improve numa_realloc() comment (Chenbo)
Changes in v4:
==============
- Check Vhost device NUMA node to avoid rte_realloc_socket()
reallocating even if node/size/align are already right.
Changes in v3:
==============
- Fix copy/paste issues (David)
- Add new patch to fix multiqueue reallocation
Changes in v2:
==============
- Add missing NUMA realloc in patch 6
Maxime Coquelin (7):
vhost: fix missing memory table NUMA realloc
vhost: fix missing guest pages table NUMA realloc
vhost: fix missing cache logging NUMA realloc
vhost: fix NUMA reallocation with multiqueue
vhost: improve NUMA reallocation
vhost: allocate all data on same node as virtqueue
vhost: convert inflight data to DPDK allocation API
lib/vhost/vhost.c | 38 +++---
lib/vhost/vhost.h | 1 +
lib/vhost/vhost_user.c | 274 ++++++++++++++++++++++++++---------------
3 files changed, 196 insertions(+), 117 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 20+ messages in thread
* [dpdk-dev] [PATCH v6 1/7] vhost: fix missing memory table NUMA realloc
2021-06-18 14:03 [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation Maxime Coquelin
@ 2021-06-18 14:03 ` Maxime Coquelin
2021-06-25 2:26 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 2/7] vhost: fix missing guest pages " Maxime Coquelin
` (5 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-18 14:03 UTC (permalink / raw)
To: dev, david.marchand, chenbo.xia; +Cc: Maxime Coquelin, stable
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the
Vhost device struct and the virtqueue structs are reallocated.
However, the Vhost memory table was not reallocated, which
likely causes at least one cross-NUMA access for every burst
of packets.
This patch reallocates this table on the same NUMA node as the
other metadata.
Fixes: 552e8fd3d2b4 ("vhost: simplify memory regions handling")
Cc: stable@dpdk.org
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vhost_user.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 8f0eba6412..b5a84f3dcd 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -473,8 +473,8 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
}
/*
- * Reallocate virtio_dev and vhost_virtqueue data structure to make them on the
- * same numa node as the memory of vring descriptor.
+ * Reallocate virtio_dev, vhost_virtqueue and related data structures to
+ * make them on the same numa node as the memory of vring descriptor.
*/
#ifdef RTE_LIBRTE_VHOST_NUMA
static struct virtio_net*
@@ -557,6 +557,9 @@ numa_realloc(struct virtio_net *dev, int index)
goto out;
}
if (oldnode != newnode) {
+ struct rte_vhost_memory *old_mem;
+ ssize_t mem_size;
+
VHOST_LOG_CONFIG(INFO,
"reallocate dev from %d to %d node\n",
oldnode, newnode);
@@ -568,6 +571,18 @@ numa_realloc(struct virtio_net *dev, int index)
memcpy(dev, old_dev, sizeof(*dev));
rte_free(old_dev);
+
+ mem_size = sizeof(struct rte_vhost_memory) +
+ sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
+ old_mem = dev->mem;
+ dev->mem = rte_malloc_socket(NULL, mem_size, 0, newnode);
+ if (!dev->mem) {
+ dev->mem = old_mem;
+ goto out;
+ }
+
+ memcpy(dev->mem, old_mem, mem_size);
+ rte_free(old_mem);
}
out:
--
2.31.1
* [dpdk-dev] [PATCH v6 2/7] vhost: fix missing guest pages table NUMA realloc
2021-06-18 14:03 [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation Maxime Coquelin
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 1/7] vhost: fix missing memory table NUMA realloc Maxime Coquelin
@ 2021-06-18 14:03 ` Maxime Coquelin
2021-06-25 2:26 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging " Maxime Coquelin
` (4 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-18 14:03 UTC (permalink / raw)
To: dev, david.marchand, chenbo.xia; +Cc: Maxime Coquelin, stable
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the
Vhost device struct and the virtqueue structs are reallocated.
However, the guest pages table was not reallocated, which
likely causes at least one cross-NUMA access for every burst
of packets.
This patch reallocates this table on the same NUMA node as the
other metadata.
Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vhost_user.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index b5a84f3dcd..5fb055ea2e 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -558,7 +558,8 @@ numa_realloc(struct virtio_net *dev, int index)
}
if (oldnode != newnode) {
struct rte_vhost_memory *old_mem;
- ssize_t mem_size;
+ struct guest_page *old_gp;
+ ssize_t mem_size, gp_size;
VHOST_LOG_CONFIG(INFO,
"reallocate dev from %d to %d node\n",
@@ -583,6 +584,17 @@ numa_realloc(struct virtio_net *dev, int index)
memcpy(dev->mem, old_mem, mem_size);
rte_free(old_mem);
+
+ gp_size = dev->max_guest_pages * sizeof(*dev->guest_pages);
+ old_gp = dev->guest_pages;
+ dev->guest_pages = rte_malloc_socket(NULL, gp_size, RTE_CACHE_LINE_SIZE, newnode);
+ if (!dev->guest_pages) {
+ dev->guest_pages = old_gp;
+ goto out;
+ }
+
+ memcpy(dev->guest_pages, old_gp, gp_size);
+ rte_free(old_gp);
}
out:
--
2.31.1
* [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
2021-06-18 14:03 [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation Maxime Coquelin
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 1/7] vhost: fix missing memory table NUMA realloc Maxime Coquelin
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 2/7] vhost: fix missing guest pages " Maxime Coquelin
@ 2021-06-18 14:03 ` Maxime Coquelin
2021-06-25 2:50 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue Maxime Coquelin
` (3 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-18 14:03 UTC (permalink / raw)
To: dev, david.marchand, chenbo.xia; +Cc: Maxime Coquelin
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the
Vhost device struct and the virtqueue structs are reallocated.
However, the log cache was not reallocated on the new NUMA
node. This patch fixes this by reallocating it if it has
already been allocated, which means a live migration is
ongoing.
Fixes: 1818a63147fb ("vhost: move dirty logging cache out of virtqueue")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vhost_user.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 5fb055ea2e..82adf80fe5 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -545,6 +545,16 @@ numa_realloc(struct virtio_net *dev, int index)
vq->batch_copy_elems = new_batch_copy_elems;
}
+ if (vq->log_cache) {
+ struct log_cache_entry *log_cache;
+
+ log_cache = rte_realloc_socket(vq->log_cache,
+ sizeof(struct log_cache_entry) * VHOST_LOG_CACHE_NR,
+ 0, newnode);
+ if (log_cache)
+ vq->log_cache = log_cache;
+ }
+
rte_free(old_vq);
}
--
2.31.1
* [dpdk-dev] [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue
2021-06-18 14:03 [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation Maxime Coquelin
` (2 preceding siblings ...)
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging " Maxime Coquelin
@ 2021-06-18 14:03 ` Maxime Coquelin
2021-06-25 2:56 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 5/7] vhost: improve NUMA reallocation Maxime Coquelin
` (2 subsequent siblings)
6 siblings, 1 reply; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-18 14:03 UTC (permalink / raw)
To: dev, david.marchand, chenbo.xia; +Cc: Maxime Coquelin, stable
Since the Vhost-user device initialization was reworked to
enable the application to start using the device as soon as
the first queue pair is ready, NUMA reallocation no longer
happened on queue pairs other than the first one, because
numa_realloc() returned early if the device was running.
This patch fixes the issue by only skipping the device
metadata reallocation when the device is running. For the
virtqueues, a vring state change notification is sent to
notify the application that the ring is disabled. Since the
callback is supposed to be blocking, it is safe to reallocate
the virtqueue afterwards.
Fixes: d0fcc38f5fa4 ("vhost: improve device readiness notifications")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vhost_user.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 82adf80fe5..51b96a0716 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -488,12 +488,16 @@ numa_realloc(struct virtio_net *dev, int index)
struct batch_copy_elem *new_batch_copy_elems;
int ret;
- if (dev->flags & VIRTIO_DEV_RUNNING)
- return dev;
-
old_dev = dev;
vq = old_vq = dev->virtqueue[index];
+ /*
+ * If VQ is ready, it is too late to reallocate, it certainly already
+ * happened anyway on VHOST_USER_SET_VRING_ADDR.
+ */
+ if (vq->ready)
+ return dev;
+
ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
MPOL_F_NODE | MPOL_F_ADDR);
@@ -558,6 +562,9 @@ numa_realloc(struct virtio_net *dev, int index)
rte_free(old_vq);
}
+ if (dev->flags & VIRTIO_DEV_RUNNING)
+ goto out;
+
/* check if we need to reallocate dev */
ret = get_mempolicy(&oldnode, NULL, 0, old_dev,
MPOL_F_NODE | MPOL_F_ADDR);
--
2.31.1
* [dpdk-dev] [PATCH v6 5/7] vhost: improve NUMA reallocation
2021-06-18 14:03 [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation Maxime Coquelin
` (3 preceding siblings ...)
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue Maxime Coquelin
@ 2021-06-18 14:03 ` Maxime Coquelin
2021-06-25 7:26 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 6/7] vhost: allocate all data on same node as virtqueue Maxime Coquelin
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API Maxime Coquelin
6 siblings, 1 reply; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-18 14:03 UTC (permalink / raw)
To: dev, david.marchand, chenbo.xia; +Cc: Maxime Coquelin
This patch improves the numa_realloc() function by making use
of rte_realloc_socket(), which takes care of the memory copy
and freeing of the old data.
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vhost_user.c | 186 ++++++++++++++++++-----------------------
1 file changed, 81 insertions(+), 105 deletions(-)
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 51b96a0716..d6ec4000c3 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -480,16 +480,17 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
static struct virtio_net*
numa_realloc(struct virtio_net *dev, int index)
{
- int oldnode, newnode;
+ int node, dev_node;
struct virtio_net *old_dev;
- struct vhost_virtqueue *old_vq, *vq;
- struct vring_used_elem *new_shadow_used_split;
- struct vring_used_elem_packed *new_shadow_used_packed;
- struct batch_copy_elem *new_batch_copy_elems;
+ struct vhost_virtqueue *vq;
+ struct batch_copy_elem *bce;
+ struct guest_page *gp;
+ struct rte_vhost_memory *mem;
+ size_t mem_size;
int ret;
old_dev = dev;
- vq = old_vq = dev->virtqueue[index];
+ vq = dev->virtqueue[index];
/*
* If VQ is ready, it is too late to reallocate, it certainly already
@@ -498,128 +499,103 @@ numa_realloc(struct virtio_net *dev, int index)
if (vq->ready)
return dev;
- ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
- MPOL_F_NODE | MPOL_F_ADDR);
-
- /* check if we need to reallocate vq */
- ret |= get_mempolicy(&oldnode, NULL, 0, old_vq,
- MPOL_F_NODE | MPOL_F_ADDR);
+ ret = get_mempolicy(&node, NULL, 0, vq->desc, MPOL_F_NODE | MPOL_F_ADDR);
if (ret) {
- VHOST_LOG_CONFIG(ERR,
- "Unable to get vq numa information.\n");
+ VHOST_LOG_CONFIG(ERR, "Unable to get virtqueue %d numa information.\n", index);
return dev;
}
- if (oldnode != newnode) {
- VHOST_LOG_CONFIG(INFO,
- "reallocate vq from %d to %d node\n", oldnode, newnode);
- vq = rte_malloc_socket(NULL, sizeof(*vq), 0, newnode);
- if (!vq)
- return dev;
- memcpy(vq, old_vq, sizeof(*vq));
+ vq = rte_realloc_socket(vq, sizeof(*vq), 0, node);
+ if (!vq) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc virtqueue %d on node %d\n",
+ index, node);
+ return dev;
+ }
- if (vq_is_packed(dev)) {
- new_shadow_used_packed = rte_malloc_socket(NULL,
- vq->size *
- sizeof(struct vring_used_elem_packed),
- RTE_CACHE_LINE_SIZE,
- newnode);
- if (new_shadow_used_packed) {
- rte_free(vq->shadow_used_packed);
- vq->shadow_used_packed = new_shadow_used_packed;
- }
- } else {
- new_shadow_used_split = rte_malloc_socket(NULL,
- vq->size *
- sizeof(struct vring_used_elem),
- RTE_CACHE_LINE_SIZE,
- newnode);
- if (new_shadow_used_split) {
- rte_free(vq->shadow_used_split);
- vq->shadow_used_split = new_shadow_used_split;
- }
+ if (vq != dev->virtqueue[index]) {
+ VHOST_LOG_CONFIG(INFO, "reallocated virtqueue on node %d\n", node);
+ dev->virtqueue[index] = vq;
+ vhost_user_iotlb_init(dev, index);
+ }
+
+ if (vq_is_packed(dev)) {
+ struct vring_used_elem_packed *sup;
+
+ sup = rte_realloc_socket(vq->shadow_used_packed, vq->size * sizeof(*sup),
+ RTE_CACHE_LINE_SIZE, node);
+ if (!sup) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc shadow packed on node %d\n", node);
+ return dev;
}
+ vq->shadow_used_packed = sup;
+ } else {
+ struct vring_used_elem *sus;
- new_batch_copy_elems = rte_malloc_socket(NULL,
- vq->size * sizeof(struct batch_copy_elem),
- RTE_CACHE_LINE_SIZE,
- newnode);
- if (new_batch_copy_elems) {
- rte_free(vq->batch_copy_elems);
- vq->batch_copy_elems = new_batch_copy_elems;
+ sus = rte_realloc_socket(vq->shadow_used_split, vq->size * sizeof(*sus),
+ RTE_CACHE_LINE_SIZE, node);
+ if (!sus) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc shadow split on node %d\n", node);
+ return dev;
}
+ vq->shadow_used_split = sus;
+ }
- if (vq->log_cache) {
- struct log_cache_entry *log_cache;
+ bce = rte_realloc_socket(vq->batch_copy_elems, vq->size * sizeof(*bce),
+ RTE_CACHE_LINE_SIZE, node);
+ if (!bce) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc batch copy elem on node %d\n", node);
+ return dev;
+ }
+ vq->batch_copy_elems = bce;
- log_cache = rte_realloc_socket(vq->log_cache,
- sizeof(struct log_cache_entry) * VHOST_LOG_CACHE_NR,
- 0, newnode);
- if (log_cache)
- vq->log_cache = log_cache;
- }
+ if (vq->log_cache) {
+ struct log_cache_entry *lc;
- rte_free(old_vq);
+ lc = rte_realloc_socket(vq->log_cache, sizeof(*lc) * VHOST_LOG_CACHE_NR, 0, node);
+ if (!lc) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc log cache on node %d\n", node);
+ return dev;
+ }
+ vq->log_cache = lc;
}
if (dev->flags & VIRTIO_DEV_RUNNING)
- goto out;
+ return dev;
- /* check if we need to reallocate dev */
- ret = get_mempolicy(&oldnode, NULL, 0, old_dev,
- MPOL_F_NODE | MPOL_F_ADDR);
+ ret = get_mempolicy(&dev_node, NULL, 0, dev, MPOL_F_NODE | MPOL_F_ADDR);
if (ret) {
- VHOST_LOG_CONFIG(ERR,
- "Unable to get dev numa information.\n");
- goto out;
+ VHOST_LOG_CONFIG(ERR, "Unable to get Virtio dev %d numa information.\n", dev->vid);
+ return dev;
}
- if (oldnode != newnode) {
- struct rte_vhost_memory *old_mem;
- struct guest_page *old_gp;
- ssize_t mem_size, gp_size;
-
- VHOST_LOG_CONFIG(INFO,
- "reallocate dev from %d to %d node\n",
- oldnode, newnode);
- dev = rte_malloc_socket(NULL, sizeof(*dev), 0, newnode);
- if (!dev) {
- dev = old_dev;
- goto out;
- }
-
- memcpy(dev, old_dev, sizeof(*dev));
- rte_free(old_dev);
-
- mem_size = sizeof(struct rte_vhost_memory) +
- sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
- old_mem = dev->mem;
- dev->mem = rte_malloc_socket(NULL, mem_size, 0, newnode);
- if (!dev->mem) {
- dev->mem = old_mem;
- goto out;
- }
-
- memcpy(dev->mem, old_mem, mem_size);
- rte_free(old_mem);
- gp_size = dev->max_guest_pages * sizeof(*dev->guest_pages);
- old_gp = dev->guest_pages;
- dev->guest_pages = rte_malloc_socket(NULL, gp_size, RTE_CACHE_LINE_SIZE, newnode);
- if (!dev->guest_pages) {
- dev->guest_pages = old_gp;
- goto out;
- }
+ if (dev_node == node)
+ return dev;
- memcpy(dev->guest_pages, old_gp, gp_size);
- rte_free(old_gp);
+ dev = rte_realloc_socket(old_dev, sizeof(*dev), 0, node);
+ if (!dev) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc dev on node %d\n", node);
+ return old_dev;
}
-out:
- dev->virtqueue[index] = vq;
+ VHOST_LOG_CONFIG(INFO, "reallocated device on node %d\n", node);
vhost_devices[dev->vid] = dev;
- if (old_vq != vq)
- vhost_user_iotlb_init(dev, index);
+ mem_size = sizeof(struct rte_vhost_memory) +
+ sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
+ mem = rte_realloc_socket(dev->mem, mem_size, 0, node);
+ if (!mem) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc mem table on node %d\n", node);
+ return dev;
+ }
+ dev->mem = mem;
+
+ gp = rte_realloc_socket(dev->guest_pages, dev->max_guest_pages * sizeof(*gp),
+ RTE_CACHE_LINE_SIZE, node);
+ if (!gp) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc guest pages on node %d\n", node);
+ return dev;
+ }
+ dev->guest_pages = gp;
return dev;
}
--
2.31.1
* [dpdk-dev] [PATCH v6 6/7] vhost: allocate all data on same node as virtqueue
2021-06-18 14:03 [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation Maxime Coquelin
` (4 preceding siblings ...)
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 5/7] vhost: improve NUMA reallocation Maxime Coquelin
@ 2021-06-18 14:03 ` Maxime Coquelin
2021-06-25 7:26 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API Maxime Coquelin
6 siblings, 1 reply; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-18 14:03 UTC (permalink / raw)
To: dev, david.marchand, chenbo.xia; +Cc: Maxime Coquelin
This patch saves, at init time, the NUMA node the virtqueue
is allocated on, in order to allocate all other data on the
same node.
While most of the data are allocated before numa_realloc()
is called, and so will be reallocated properly, some data
like the log cache are most likely allocated after it.
For the virtio device metadata, we decide to allocate them
on the same node as VQ 0.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vhost.c | 34 ++++++++++++++++------------------
lib/vhost/vhost.h | 1 +
lib/vhost/vhost_user.c | 41 ++++++++++++++++++++++++++++-------------
3 files changed, 45 insertions(+), 31 deletions(-)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index c96f6335c8..0000cd3297 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -261,7 +261,7 @@ vhost_alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
uint64_t src, dst;
uint64_t len, remain = desc_len;
- idesc = rte_malloc(__func__, desc_len, 0);
+ idesc = rte_malloc_socket(__func__, desc_len, 0, vq->numa_node);
if (unlikely(!idesc))
return NULL;
@@ -549,6 +549,7 @@ static void
init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
{
struct vhost_virtqueue *vq;
+ int numa_node = SOCKET_ID_ANY;
if (vring_idx >= VHOST_MAX_VRING) {
VHOST_LOG_CONFIG(ERR,
@@ -570,6 +571,15 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
vq->notif_enable = VIRTIO_UNINITIALIZED_NOTIF;
+#ifdef RTE_LIBRTE_VHOST_NUMA
+ if (get_mempolicy(&numa_node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
+ VHOST_LOG_CONFIG(ERR, "(%d) failed to query numa node: %s\n",
+ dev->vid, rte_strerror(errno));
+ numa_node = SOCKET_ID_ANY;
+ }
+#endif
+ vq->numa_node = numa_node;
+
vhost_user_iotlb_init(dev, vring_idx);
}
@@ -1616,7 +1626,6 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
struct vhost_virtqueue *vq;
struct virtio_net *dev = get_device(vid);
struct rte_vhost_async_features f;
- int node;
if (dev == NULL || ops == NULL)
return -1;
@@ -1651,20 +1660,9 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
goto reg_out;
}
-#ifdef RTE_LIBRTE_VHOST_NUMA
- if (get_mempolicy(&node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
- VHOST_LOG_CONFIG(ERR,
- "unable to get numa information in async register. "
- "allocating async buffer memory on the caller thread node\n");
- node = SOCKET_ID_ANY;
- }
-#else
- node = SOCKET_ID_ANY;
-#endif
-
vq->async_pkts_info = rte_malloc_socket(NULL,
vq->size * sizeof(struct async_inflight_info),
- RTE_CACHE_LINE_SIZE, node);
+ RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_pkts_info) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1675,7 +1673,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
vq->it_pool = rte_malloc_socket(NULL,
VHOST_MAX_ASYNC_IT * sizeof(struct rte_vhost_iov_iter),
- RTE_CACHE_LINE_SIZE, node);
+ RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->it_pool) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1686,7 +1684,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
vq->vec_pool = rte_malloc_socket(NULL,
VHOST_MAX_ASYNC_VEC * sizeof(struct iovec),
- RTE_CACHE_LINE_SIZE, node);
+ RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->vec_pool) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1698,7 +1696,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
if (vq_is_packed(dev)) {
vq->async_buffers_packed = rte_malloc_socket(NULL,
vq->size * sizeof(struct vring_used_elem_packed),
- RTE_CACHE_LINE_SIZE, node);
+ RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_buffers_packed) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1709,7 +1707,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
} else {
vq->async_descs_split = rte_malloc_socket(NULL,
vq->size * sizeof(struct vring_used_elem),
- RTE_CACHE_LINE_SIZE, node);
+ RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_descs_split) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 8078ddff79..8ffe387556 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -164,6 +164,7 @@ struct vhost_virtqueue {
uint16_t batch_copy_nb_elems;
struct batch_copy_elem *batch_copy_elems;
+ int numa_node;
bool used_wrap_counter;
bool avail_wrap_counter;
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index d6ec4000c3..d8ec087dfc 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -433,10 +433,10 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
if (vq_is_packed(dev)) {
if (vq->shadow_used_packed)
rte_free(vq->shadow_used_packed);
- vq->shadow_used_packed = rte_malloc(NULL,
+ vq->shadow_used_packed = rte_malloc_socket(NULL,
vq->size *
sizeof(struct vring_used_elem_packed),
- RTE_CACHE_LINE_SIZE);
+ RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->shadow_used_packed) {
VHOST_LOG_CONFIG(ERR,
"failed to allocate memory for shadow used ring.\n");
@@ -447,9 +447,9 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
if (vq->shadow_used_split)
rte_free(vq->shadow_used_split);
- vq->shadow_used_split = rte_malloc(NULL,
+ vq->shadow_used_split = rte_malloc_socket(NULL,
vq->size * sizeof(struct vring_used_elem),
- RTE_CACHE_LINE_SIZE);
+ RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->shadow_used_split) {
VHOST_LOG_CONFIG(ERR,
@@ -460,9 +460,9 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
if (vq->batch_copy_elems)
rte_free(vq->batch_copy_elems);
- vq->batch_copy_elems = rte_malloc(NULL,
+ vq->batch_copy_elems = rte_malloc_socket(NULL,
vq->size * sizeof(struct batch_copy_elem),
- RTE_CACHE_LINE_SIZE);
+ RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->batch_copy_elems) {
VHOST_LOG_CONFIG(ERR,
"failed to allocate memory for batching copy.\n");
@@ -505,6 +505,9 @@ numa_realloc(struct virtio_net *dev, int index)
return dev;
}
+ if (node == vq->numa_node)
+ goto out_dev_realloc;
+
vq = rte_realloc_socket(vq, sizeof(*vq), 0, node);
if (!vq) {
VHOST_LOG_CONFIG(ERR, "Failed to realloc virtqueue %d on node %d\n",
@@ -559,6 +562,10 @@ numa_realloc(struct virtio_net *dev, int index)
vq->log_cache = lc;
}
+ vq->numa_node = node;
+
+out_dev_realloc:
+
if (dev->flags & VIRTIO_DEV_RUNNING)
return dev;
@@ -1213,7 +1220,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
struct virtio_net *dev = *pdev;
struct VhostUserMemory *memory = &msg->payload.memory;
struct rte_vhost_mem_region *reg;
-
+ int numa_node = SOCKET_ID_ANY;
uint64_t mmap_offset;
uint32_t i;
@@ -1253,13 +1260,21 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
for (i = 0; i < dev->nr_vring; i++)
vhost_user_iotlb_flush_all(dev->virtqueue[i]);
+ /*
+ * If VQ 0 has already been allocated, try to allocate on the same
+ * NUMA node. It can be reallocated later in numa_realloc().
+ */
+ if (dev->nr_vring > 0)
+ numa_node = dev->virtqueue[0]->numa_node;
+
dev->nr_guest_pages = 0;
if (dev->guest_pages == NULL) {
dev->max_guest_pages = 8;
- dev->guest_pages = rte_zmalloc(NULL,
+ dev->guest_pages = rte_zmalloc_socket(NULL,
dev->max_guest_pages *
sizeof(struct guest_page),
- RTE_CACHE_LINE_SIZE);
+ RTE_CACHE_LINE_SIZE,
+ numa_node);
if (dev->guest_pages == NULL) {
VHOST_LOG_CONFIG(ERR,
"(%d) failed to allocate memory "
@@ -1269,8 +1284,8 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
}
}
- dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct rte_vhost_memory) +
- sizeof(struct rte_vhost_mem_region) * memory->nregions, 0);
+ dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
+ sizeof(struct rte_vhost_mem_region) * memory->nregions, 0, numa_node);
if (dev->mem == NULL) {
VHOST_LOG_CONFIG(ERR,
"(%d) failed to allocate memory for dev->mem\n",
@@ -2193,9 +2208,9 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
rte_free(vq->log_cache);
vq->log_cache = NULL;
vq->log_cache_nb_elem = 0;
- vq->log_cache = rte_zmalloc("vq log cache",
+ vq->log_cache = rte_malloc_socket("vq log cache",
sizeof(struct log_cache_entry) * VHOST_LOG_CACHE_NR,
- 0);
+ 0, vq->numa_node);
/*
* If log cache alloc fail, don't fail migration, but no
* caching will be done, which will impact performance
--
2.31.1
* [dpdk-dev] [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API
2021-06-18 14:03 [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation Maxime Coquelin
` (5 preceding siblings ...)
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 6/7] vhost: allocate all data on same node as virtqueue Maxime Coquelin
@ 2021-06-18 14:03 ` Maxime Coquelin
2021-06-25 7:26 ` Xia, Chenbo
6 siblings, 1 reply; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-18 14:03 UTC (permalink / raw)
To: dev, david.marchand, chenbo.xia; +Cc: Maxime Coquelin
Inflight metadata are allocated using glibc's calloc().
This patch converts them to rte_zmalloc_socket() to take
care of the NUMA affinity.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/vhost/vhost.c | 4 +--
lib/vhost/vhost_user.c | 67 +++++++++++++++++++++++++++++++++++-------
2 files changed, 58 insertions(+), 13 deletions(-)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 0000cd3297..53a470f547 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -312,10 +312,10 @@ cleanup_vq_inflight(struct virtio_net *dev, struct vhost_virtqueue *vq)
if (vq->resubmit_inflight) {
if (vq->resubmit_inflight->resubmit_list) {
- free(vq->resubmit_inflight->resubmit_list);
+ rte_free(vq->resubmit_inflight->resubmit_list);
vq->resubmit_inflight->resubmit_list = NULL;
}
- free(vq->resubmit_inflight);
+ rte_free(vq->resubmit_inflight);
vq->resubmit_inflight = NULL;
}
}
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index d8ec087dfc..6a41071e1d 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -188,7 +188,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
dev->inflight_info->fd = -1;
}
- free(dev->inflight_info);
+ rte_free(dev->inflight_info);
dev->inflight_info = NULL;
}
@@ -562,6 +562,31 @@ numa_realloc(struct virtio_net *dev, int index)
vq->log_cache = lc;
}
+ if (vq->resubmit_inflight) {
+ struct rte_vhost_resubmit_info *ri;
+
+ ri = rte_realloc_socket(vq->resubmit_inflight, sizeof(*ri), 0, node);
+ if (!ri) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit inflight on node %d\n",
+ node);
+ return dev;
+ }
+ vq->resubmit_inflight = ri;
+
+ if (ri->resubmit_list) {
+ struct rte_vhost_resubmit_desc *rd;
+
+ rd = rte_realloc_socket(ri->resubmit_list, sizeof(*rd) * ri->resubmit_num,
+ 0, node);
+ if (!rd) {
+ VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit list on node %d\n",
+ node);
+ return dev;
+ }
+ ri->resubmit_list = rd;
+ }
+ }
+
vq->numa_node = node;
out_dev_realloc:
@@ -1491,6 +1516,7 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
uint16_t num_queues, queue_size;
struct virtio_net *dev = *pdev;
int fd, i, j;
+ int numa_node = SOCKET_ID_ANY;
void *addr;
if (msg->size != sizeof(msg->payload.inflight)) {
@@ -1500,9 +1526,16 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
return RTE_VHOST_MSG_RESULT_ERR;
}
+ /*
+ * If VQ 0 has already been allocated, try to allocate on the same
+ * NUMA node. It can be reallocated later in numa_realloc().
+ */
+ if (dev->nr_vring > 0)
+ numa_node = dev->virtqueue[0]->numa_node;
+
if (dev->inflight_info == NULL) {
- dev->inflight_info = calloc(1,
- sizeof(struct inflight_mem_info));
+ dev->inflight_info = rte_zmalloc_socket("inflight_info",
+ sizeof(struct inflight_mem_info), 0, numa_node);
if (!dev->inflight_info) {
VHOST_LOG_CONFIG(ERR,
"failed to alloc dev inflight area\n");
@@ -1585,6 +1618,7 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg,
struct vhost_virtqueue *vq;
void *addr;
int fd, i;
+ int numa_node = SOCKET_ID_ANY;
fd = msg->fds[0];
if (msg->size != sizeof(msg->payload.inflight) || fd < 0) {
@@ -1618,9 +1652,16 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg,
"set_inflight_fd pervq_inflight_size: %d\n",
pervq_inflight_size);
+ /*
+ * If VQ 0 has already been allocated, try to allocate on the same
+ * NUMA node. It can be reallocated later in numa_realloc().
+ */
+ if (dev->nr_vring > 0)
+ numa_node = dev->virtqueue[0]->numa_node;
+
if (!dev->inflight_info) {
- dev->inflight_info = calloc(1,
- sizeof(struct inflight_mem_info));
+ dev->inflight_info = rte_zmalloc_socket("inflight_info",
+ sizeof(struct inflight_mem_info), 0, numa_node);
if (dev->inflight_info == NULL) {
VHOST_LOG_CONFIG(ERR,
"failed to alloc dev inflight area\n");
@@ -1779,15 +1820,17 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
vq->last_avail_idx += resubmit_num;
if (resubmit_num) {
- resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
+ resubmit = rte_zmalloc_socket("resubmit", sizeof(struct rte_vhost_resubmit_info),
+ 0, vq->numa_node);
if (!resubmit) {
VHOST_LOG_CONFIG(ERR,
"failed to allocate memory for resubmit info.\n");
return RTE_VHOST_MSG_RESULT_ERR;
}
- resubmit->resubmit_list = calloc(resubmit_num,
- sizeof(struct rte_vhost_resubmit_desc));
+ resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
+ resubmit_num * sizeof(struct rte_vhost_resubmit_desc),
+ 0, vq->numa_node);
if (!resubmit->resubmit_list) {
VHOST_LOG_CONFIG(ERR,
"failed to allocate memory for inflight desc.\n");
@@ -1873,15 +1916,17 @@ vhost_check_queue_inflights_packed(struct virtio_net *dev,
}
if (resubmit_num) {
- resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
+ resubmit = rte_zmalloc_socket("resubmit", sizeof(struct rte_vhost_resubmit_info),
+ 0, vq->numa_node);
if (resubmit == NULL) {
VHOST_LOG_CONFIG(ERR,
"failed to allocate memory for resubmit info.\n");
return RTE_VHOST_MSG_RESULT_ERR;
}
- resubmit->resubmit_list = calloc(resubmit_num,
- sizeof(struct rte_vhost_resubmit_desc));
+ resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
+ resubmit_num * sizeof(struct rte_vhost_resubmit_desc),
+ 0, vq->numa_node);
if (resubmit->resubmit_list == NULL) {
VHOST_LOG_CONFIG(ERR,
"failed to allocate memory for resubmit desc.\n");
--
2.31.1
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH v6 1/7] vhost: fix missing memory table NUMA realloc
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 1/7] vhost: fix missing memory table NUMA realloc Maxime Coquelin
@ 2021-06-25 2:26 ` Xia, Chenbo
0 siblings, 0 replies; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-25 2:26 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand; +Cc: stable
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 18, 2021 10:04 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
> Subject: [PATCH v6 1/7] vhost: fix missing memory table NUMA realloc
>
> When the guest allocates virtqueues on a different NUMA node
> than the one the Vhost metadata are allocated on, both the Vhost
> device struct and the virtqueue structs are reallocated.
>
> However, the Vhost memory table was not being reallocated, which
> likely causes at least one cross-NUMA access for every burst
> of packets.
>
> This patch reallocates this table on the same NUMA node as the
> other metadata.
>
> Fixes: 552e8fd3d2b4 ("vhost: simplify memory regions handling")
> Cc: stable@dpdk.org
>
> Reported-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vhost_user.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 8f0eba6412..b5a84f3dcd 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -473,8 +473,8 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
> }
>
> /*
> - * Reallocate virtio_dev and vhost_virtqueue data structure to make them on the
> - * same numa node as the memory of vring descriptor.
> + * Reallocate virtio_dev, vhost_virtqueue and related data structures to
> + * make them on the same numa node as the memory of vring descriptor.
> */
> #ifdef RTE_LIBRTE_VHOST_NUMA
> static struct virtio_net*
> @@ -557,6 +557,9 @@ numa_realloc(struct virtio_net *dev, int index)
> goto out;
> }
> if (oldnode != newnode) {
> + struct rte_vhost_memory *old_mem;
> + ssize_t mem_size;
> +
> VHOST_LOG_CONFIG(INFO,
> "reallocate dev from %d to %d node\n",
> oldnode, newnode);
> @@ -568,6 +571,18 @@ numa_realloc(struct virtio_net *dev, int index)
>
> memcpy(dev, old_dev, sizeof(*dev));
> rte_free(old_dev);
> +
> + mem_size = sizeof(struct rte_vhost_memory) +
> + sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
> + old_mem = dev->mem;
> + dev->mem = rte_malloc_socket(NULL, mem_size, 0, newnode);
> + if (!dev->mem) {
> + dev->mem = old_mem;
> + goto out;
> + }
> +
> + memcpy(dev->mem, old_mem, mem_size);
> + rte_free(old_mem);
> }
>
> out:
> --
> 2.31.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
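The allocate-copy-free sequence the patch applies to dev->mem can be sketched in plain C. This is a hedged illustration only: malloc()/free() stand in for DPDK's rte_malloc_socket()/rte_free() (which additionally take a NUMA node argument), and move_to_node() is a hypothetical helper name, not a DPDK API.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of the pattern from the patch: allocate a new block (on the
 * target NUMA node in the real code), copy the contents, free the old
 * block, and keep the old allocation if the new one fails.
 * malloc()/free() stand in for rte_malloc_socket()/rte_free().
 */
static void *
move_to_node(void *old, size_t size)
{
	void *new_buf = malloc(size);	/* rte_malloc_socket(NULL, size, 0, newnode) */

	if (new_buf == NULL)
		return old;		/* allocation failed: old data stays valid */

	memcpy(new_buf, old, size);
	free(old);			/* rte_free(old) in DPDK */
	return new_buf;
}
```

On failure the helper returns the old pointer, mirroring the patch's `goto out` path that restores dev->mem so the device keeps working from the original node.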
* Re: [dpdk-dev] [PATCH v6 2/7] vhost: fix missing guest pages table NUMA realloc
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 2/7] vhost: fix missing guest pages " Maxime Coquelin
@ 2021-06-25 2:26 ` Xia, Chenbo
0 siblings, 0 replies; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-25 2:26 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand; +Cc: stable
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 18, 2021 10:04 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
> Subject: [PATCH v6 2/7] vhost: fix missing guest pages table NUMA realloc
>
> When the guest allocates virtqueues on a different NUMA node
> than the one the Vhost metadata are allocated on, both the Vhost
> device struct and the virtqueue structs are reallocated.
>
> However, the guest pages table was not being reallocated, which
> likely causes at least one cross-NUMA access for every burst
> of packets.
>
> This patch reallocates this table on the same NUMA node as the
> other metadata.
>
> Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vhost_user.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index b5a84f3dcd..5fb055ea2e 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -558,7 +558,8 @@ numa_realloc(struct virtio_net *dev, int index)
> }
> if (oldnode != newnode) {
> struct rte_vhost_memory *old_mem;
> - ssize_t mem_size;
> + struct guest_page *old_gp;
> + ssize_t mem_size, gp_size;
>
> VHOST_LOG_CONFIG(INFO,
> "reallocate dev from %d to %d node\n",
> @@ -583,6 +584,17 @@ numa_realloc(struct virtio_net *dev, int index)
>
> memcpy(dev->mem, old_mem, mem_size);
> rte_free(old_mem);
> +
> + gp_size = dev->max_guest_pages * sizeof(*dev->guest_pages);
> + old_gp = dev->guest_pages;
> + dev->guest_pages = rte_malloc_socket(NULL, gp_size, RTE_CACHE_LINE_SIZE, newnode);
> + if (!dev->guest_pages) {
> + dev->guest_pages = old_gp;
> + goto out;
> + }
> +
> + memcpy(dev->guest_pages, old_gp, gp_size);
> + rte_free(old_gp);
> }
>
> out:
> --
> 2.31.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
* Re: [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging " Maxime Coquelin
@ 2021-06-25 2:50 ` Xia, Chenbo
2021-06-29 14:38 ` Maxime Coquelin
0 siblings, 1 reply; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-25 2:50 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 18, 2021 10:04 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
>
> When the guest allocates virtqueues on a different NUMA node
> than the one the Vhost metadata are allocated on, both the Vhost
> device struct and the virtqueue structs are reallocated.
>
> However, the log cache was not being reallocated on the new
> NUMA node. This patch fixes this by reallocating it if it has
> already been allocated, which means a live migration is
> on-going.
>
> Fixes: 1818a63147fb ("vhost: move dirty logging cache out of virtqueue")
This commit is from 21.05. Although LTS maintainers don't maintain non-LTS stable
releases now, I guess it's still better to add the 'Cc: stable' tag in case anyone
volunteers to do that?
Thanks,
Chenbo
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vhost_user.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 5fb055ea2e..82adf80fe5 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -545,6 +545,16 @@ numa_realloc(struct virtio_net *dev, int index)
> vq->batch_copy_elems = new_batch_copy_elems;
> }
>
> + if (vq->log_cache) {
> + struct log_cache_entry *log_cache;
> +
> + log_cache = rte_realloc_socket(vq->log_cache,
> + sizeof(struct log_cache_entry) * VHOST_LOG_CACHE_NR,
> + 0, newnode);
> + if (log_cache)
> + vq->log_cache = log_cache;
> + }
> +
> rte_free(old_vq);
> }
>
> --
> 2.31.1
* Re: [dpdk-dev] [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue Maxime Coquelin
@ 2021-06-25 2:56 ` Xia, Chenbo
2021-06-25 11:37 ` Xia, Chenbo
0 siblings, 1 reply; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-25 2:56 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand; +Cc: stable
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 18, 2021 10:04 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
> Subject: [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue
>
> Since the Vhost-user device initialization has been reworked,
> enabling the application to start using the device as soon as
> the first queue pair is ready, NUMA reallocation no longer
> happened on queue pairs other than the first one, since
> numa_realloc() was returning early if the device was running.
>
> This patch fixes this issue by only preventing the device
> metadata from being reallocated if the device is running. For
> the virtqueues, a vring state change notification is sent to
> notify the application of their disablement. Since the callback
> is supposed to be blocking, it is safe to reallocate them
> afterwards.
>
> Fixes: d0fcc38f5fa4 ("vhost: improve device readiness notifications")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vhost_user.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 82adf80fe5..51b96a0716 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -488,12 +488,16 @@ numa_realloc(struct virtio_net *dev, int index)
> struct batch_copy_elem *new_batch_copy_elems;
> int ret;
>
> - if (dev->flags & VIRTIO_DEV_RUNNING)
> - return dev;
> -
> old_dev = dev;
> vq = old_vq = dev->virtqueue[index];
>
> + /*
> + * If VQ is ready, it is too late to reallocate, it certainly already
> + * happened anyway on VHOST_USER_SET_VRING_ADDR.
> + */
> + if (vq->ready)
> + return dev;
> +
> ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
> MPOL_F_NODE | MPOL_F_ADDR);
>
> @@ -558,6 +562,9 @@ numa_realloc(struct virtio_net *dev, int index)
> rte_free(old_vq);
> }
>
> + if (dev->flags & VIRTIO_DEV_RUNNING)
> + goto out;
> +
Since we don't realloc when the vq is ready, there is no case where the vq is not
ready but the device is still running, right?
Thanks,
Chenbo
> /* check if we need to reallocate dev */
> ret = get_mempolicy(&oldnode, NULL, 0, old_dev,
> MPOL_F_NODE | MPOL_F_ADDR);
> --
> 2.31.1
* Re: [dpdk-dev] [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API Maxime Coquelin
@ 2021-06-25 7:26 ` Xia, Chenbo
2021-06-29 14:36 ` Maxime Coquelin
0 siblings, 1 reply; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-25 7:26 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 18, 2021 10:04 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API
>
> Inflight metadata are allocated using glibc's calloc().
> This patch converts them to rte_zmalloc_socket() to take
> care of NUMA affinity.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vhost.c | 4 +--
> lib/vhost/vhost_user.c | 67 +++++++++++++++++++++++++++++++++++-------
> 2 files changed, 58 insertions(+), 13 deletions(-)
>
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index 0000cd3297..53a470f547 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
[...]
> @@ -1779,15 +1820,17 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
> vq->last_avail_idx += resubmit_num;
>
> if (resubmit_num) {
> - resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
> + resubmit = rte_zmalloc_socket("resubmit", sizeof(struct rte_vhost_resubmit_info),
> + 0, vq->numa_node);
> if (!resubmit) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for resubmit info.\n");
> return RTE_VHOST_MSG_RESULT_ERR;
> }
>
> - resubmit->resubmit_list = calloc(resubmit_num,
> - sizeof(struct rte_vhost_resubmit_desc));
> + resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
> + resubmit_num * sizeof(struct rte_vhost_resubmit_desc),
> + 0, vq->numa_node);
> if (!resubmit->resubmit_list) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for inflight desc.\n");
> @@ -1873,15 +1916,17 @@ vhost_check_queue_inflights_packed(struct virtio_net *dev,
> }
>
> if (resubmit_num) {
> - resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
> + resubmit = rte_zmalloc_socket("resubmit", sizeof(struct rte_vhost_resubmit_info),
> + 0, vq->numa_node);
There are still two 'free(resubmit)' calls in vhost_check_queue_inflights_split()
and vhost_check_queue_inflights_packed(); they should be replaced with rte_free().
Thanks,
Chenbo
> if (resubmit == NULL) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for resubmit info.\n");
> return RTE_VHOST_MSG_RESULT_ERR;
> }
>
> - resubmit->resubmit_list = calloc(resubmit_num,
> - sizeof(struct rte_vhost_resubmit_desc));
> + resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
> + resubmit_num * sizeof(struct rte_vhost_resubmit_desc),
> + 0, vq->numa_node);
> if (resubmit->resubmit_list == NULL) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for resubmit desc.\n");
> --
> 2.31.1
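The review point above generalizes to a rule: memory obtained from one allocator family must be released by the matching free on every path, including error cleanup, so calloc()'d buffers pair with free() and rte_zmalloc_socket()'d buffers pair with rte_free(). A minimal sketch of the two-level allocation with matching cleanup; libc calloc()/free() stand in for rte_zmalloc_socket()/rte_free(), and the struct and helper names are hypothetical:

```c
#include <stdlib.h>

struct resubmit_sketch {
	void *list;		/* descriptor array */
	unsigned int num;	/* number of descriptors */
};

/*
 * Allocate the info struct and its descriptor list together; if the
 * second allocation fails, release the first with the matching free.
 * calloc()/free() stand in for rte_zmalloc_socket()/rte_free().
 */
static struct resubmit_sketch *
alloc_resubmit_sketch(unsigned int num, size_t desc_size)
{
	struct resubmit_sketch *ri = calloc(1, sizeof(*ri));

	if (ri == NULL)
		return NULL;

	ri->list = calloc(num, desc_size);
	if (ri->list == NULL) {
		free(ri);	/* must match the allocator used above */
		return NULL;
	}
	ri->num = num;
	return ri;
}
```

After converting the allocations to rte_zmalloc_socket(), every `free(...)` on these objects must become `rte_free(...)`, which is exactly the leftover the review flags.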
* Re: [dpdk-dev] [PATCH v6 5/7] vhost: improve NUMA reallocation
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 5/7] vhost: improve NUMA reallocation Maxime Coquelin
@ 2021-06-25 7:26 ` Xia, Chenbo
0 siblings, 0 replies; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-25 7:26 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 18, 2021 10:04 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v6 5/7] vhost: improve NUMA reallocation
>
> This patch improves the numa_realloc() function by making use
> of rte_realloc_socket(), which takes care of the memory copy
> and freeing of the old data.
>
> Suggested-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vhost_user.c | 186 ++++++++++++++++++-----------------------
> 1 file changed, 81 insertions(+), 105 deletions(-)
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 51b96a0716..d6ec4000c3 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -480,16 +480,17 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
> static struct virtio_net*
> numa_realloc(struct virtio_net *dev, int index)
> {
> - int oldnode, newnode;
> + int node, dev_node;
> struct virtio_net *old_dev;
> - struct vhost_virtqueue *old_vq, *vq;
> - struct vring_used_elem *new_shadow_used_split;
> - struct vring_used_elem_packed *new_shadow_used_packed;
> - struct batch_copy_elem *new_batch_copy_elems;
> + struct vhost_virtqueue *vq;
> + struct batch_copy_elem *bce;
> + struct guest_page *gp;
> + struct rte_vhost_memory *mem;
> + size_t mem_size;
> int ret;
>
> old_dev = dev;
> - vq = old_vq = dev->virtqueue[index];
> + vq = dev->virtqueue[index];
>
> /*
> * If VQ is ready, it is too late to reallocate, it certainly already
> @@ -498,128 +499,103 @@ numa_realloc(struct virtio_net *dev, int index)
> if (vq->ready)
> return dev;
>
> - ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
> - MPOL_F_NODE | MPOL_F_ADDR);
> -
> - /* check if we need to reallocate vq */
> - ret |= get_mempolicy(&oldnode, NULL, 0, old_vq,
> - MPOL_F_NODE | MPOL_F_ADDR);
> + ret = get_mempolicy(&node, NULL, 0, vq->desc, MPOL_F_NODE | MPOL_F_ADDR);
> if (ret) {
> - VHOST_LOG_CONFIG(ERR,
> - "Unable to get vq numa information.\n");
> + VHOST_LOG_CONFIG(ERR, "Unable to get virtqueue %d numa information.\n", index);
> return dev;
> }
> - if (oldnode != newnode) {
> - VHOST_LOG_CONFIG(INFO,
> - "reallocate vq from %d to %d node\n", oldnode, newnode);
> - vq = rte_malloc_socket(NULL, sizeof(*vq), 0, newnode);
> - if (!vq)
> - return dev;
>
> - memcpy(vq, old_vq, sizeof(*vq));
> + vq = rte_realloc_socket(vq, sizeof(*vq), 0, node);
> + if (!vq) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc virtqueue %d on node %d\n",
> + index, node);
> + return dev;
> + }
>
> - if (vq_is_packed(dev)) {
> - new_shadow_used_packed = rte_malloc_socket(NULL,
> - vq->size *
> - sizeof(struct vring_used_elem_packed),
> - RTE_CACHE_LINE_SIZE,
> - newnode);
> - if (new_shadow_used_packed) {
> - rte_free(vq->shadow_used_packed);
> - vq->shadow_used_packed = new_shadow_used_packed;
> - }
> - } else {
> - new_shadow_used_split = rte_malloc_socket(NULL,
> - vq->size *
> - sizeof(struct vring_used_elem),
> - RTE_CACHE_LINE_SIZE,
> - newnode);
> - if (new_shadow_used_split) {
> - rte_free(vq->shadow_used_split);
> - vq->shadow_used_split = new_shadow_used_split;
> - }
> + if (vq != dev->virtqueue[index]) {
> + VHOST_LOG_CONFIG(INFO, "reallocated virtqueue on node %d\n", node);
> + dev->virtqueue[index] = vq;
> + vhost_user_iotlb_init(dev, index);
> + }
> +
> + if (vq_is_packed(dev)) {
> + struct vring_used_elem_packed *sup;
> +
> + sup = rte_realloc_socket(vq->shadow_used_packed, vq->size * sizeof(*sup),
> + RTE_CACHE_LINE_SIZE, node);
> + if (!sup) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc shadow packed on node %d\n", node);
> + return dev;
> }
> + vq->shadow_used_packed = sup;
> + } else {
> + struct vring_used_elem *sus;
>
> - new_batch_copy_elems = rte_malloc_socket(NULL,
> - vq->size * sizeof(struct batch_copy_elem),
> - RTE_CACHE_LINE_SIZE,
> - newnode);
> - if (new_batch_copy_elems) {
> - rte_free(vq->batch_copy_elems);
> - vq->batch_copy_elems = new_batch_copy_elems;
> + sus = rte_realloc_socket(vq->shadow_used_split, vq->size * sizeof(*sus),
> + RTE_CACHE_LINE_SIZE, node);
> + if (!sus) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc shadow split on node %d\n", node);
> + return dev;
> }
> + vq->shadow_used_split = sus;
> + }
>
> - if (vq->log_cache) {
> - struct log_cache_entry *log_cache;
> + bce = rte_realloc_socket(vq->batch_copy_elems, vq->size * sizeof(*bce),
> + RTE_CACHE_LINE_SIZE, node);
> + if (!bce) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc batch copy elem on node %d\n", node);
> + return dev;
> + }
> + vq->batch_copy_elems = bce;
>
> - log_cache = rte_realloc_socket(vq->log_cache,
> - sizeof(struct log_cache_entry) *
> VHOST_LOG_CACHE_NR,
> - 0, newnode);
> - if (log_cache)
> - vq->log_cache = log_cache;
> - }
> + if (vq->log_cache) {
> + struct log_cache_entry *lc;
>
> - rte_free(old_vq);
> + lc = rte_realloc_socket(vq->log_cache, sizeof(*lc) * VHOST_LOG_CACHE_NR, 0, node);
> + if (!lc) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc log cache on node %d\n", node);
> + return dev;
> + }
> + vq->log_cache = lc;
> }
>
> if (dev->flags & VIRTIO_DEV_RUNNING)
> - goto out;
> + return dev;
>
> - /* check if we need to reallocate dev */
> - ret = get_mempolicy(&oldnode, NULL, 0, old_dev,
> - MPOL_F_NODE | MPOL_F_ADDR);
> + ret = get_mempolicy(&dev_node, NULL, 0, dev, MPOL_F_NODE | MPOL_F_ADDR);
> if (ret) {
> - VHOST_LOG_CONFIG(ERR,
> - "Unable to get dev numa information.\n");
> - goto out;
> + VHOST_LOG_CONFIG(ERR, "Unable to get Virtio dev %d numa information.\n", dev->vid);
> + return dev;
> }
> - if (oldnode != newnode) {
> - struct rte_vhost_memory *old_mem;
> - struct guest_page *old_gp;
> - ssize_t mem_size, gp_size;
> -
> - VHOST_LOG_CONFIG(INFO,
> - "reallocate dev from %d to %d node\n",
> - oldnode, newnode);
> - dev = rte_malloc_socket(NULL, sizeof(*dev), 0, newnode);
> - if (!dev) {
> - dev = old_dev;
> - goto out;
> - }
> -
> - memcpy(dev, old_dev, sizeof(*dev));
> - rte_free(old_dev);
> -
> - mem_size = sizeof(struct rte_vhost_memory) +
> - sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
> - old_mem = dev->mem;
> - dev->mem = rte_malloc_socket(NULL, mem_size, 0, newnode);
> - if (!dev->mem) {
> - dev->mem = old_mem;
> - goto out;
> - }
> -
> - memcpy(dev->mem, old_mem, mem_size);
> - rte_free(old_mem);
>
> - gp_size = dev->max_guest_pages * sizeof(*dev->guest_pages);
> - old_gp = dev->guest_pages;
> - dev->guest_pages = rte_malloc_socket(NULL, gp_size,
> RTE_CACHE_LINE_SIZE, newnode);
> - if (!dev->guest_pages) {
> - dev->guest_pages = old_gp;
> - goto out;
> - }
> + if (dev_node == node)
> + return dev;
>
> - memcpy(dev->guest_pages, old_gp, gp_size);
> - rte_free(old_gp);
> + dev = rte_realloc_socket(old_dev, sizeof(*dev), 0, node);
> + if (!dev) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc dev on node %d\n", node);
> + return old_dev;
> }
>
> -out:
> - dev->virtqueue[index] = vq;
> + VHOST_LOG_CONFIG(INFO, "reallocated device on node %d\n", node);
> vhost_devices[dev->vid] = dev;
>
> - if (old_vq != vq)
> - vhost_user_iotlb_init(dev, index);
> + mem_size = sizeof(struct rte_vhost_memory) +
> + sizeof(struct rte_vhost_mem_region) * dev->mem->nregions;
> + mem = rte_realloc_socket(dev->mem, mem_size, 0, node);
> + if (!mem) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc mem table on node %d\n", node);
> + return dev;
> + }
> + dev->mem = mem;
> +
> + gp = rte_realloc_socket(dev->guest_pages, dev->max_guest_pages * sizeof(*gp),
> + RTE_CACHE_LINE_SIZE, node);
> + if (!gp) {
> + VHOST_LOG_CONFIG(ERR, "Failed to realloc guest pages on node %d\n", node);
> + return dev;
> + }
> + dev->guest_pages = gp;
>
> return dev;
> }
> --
> 2.31.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
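The simplification at the heart of this patch is that rte_realloc_socket() subsumes the manual allocate/copy/free sequence. Standard realloc() has the same contract apart from the node argument, so the before/after can be sketched as follows; grow_manual() and grow_keep_old() are hypothetical helper names, and malloc()/realloc()/free() stand in for the rte_* equivalents:

```c
#include <stdlib.h>
#include <string.h>

/* Before: the manual sequence the old numa_realloc() performed.
 * size is both the old and the new size here, since the patch moves
 * fixed-size structs between nodes rather than growing them. */
static void *
grow_manual(void *old, size_t size)
{
	void *p = malloc(size);		/* rte_malloc_socket(NULL, size, 0, node) */

	if (p == NULL)
		return old;		/* keep old buffer on failure */
	memcpy(p, old, size);
	free(old);			/* rte_free(old) */
	return p;
}

/* After: one call; realloc() stands in for rte_realloc_socket(). On
 * failure the old buffer is untouched and still valid, so we keep it. */
static void *
grow_keep_old(void *old, size_t size)
{
	void *p = realloc(old, size);	/* rte_realloc_socket(old, size, 0, node) */

	return p != NULL ? p : old;
}
```

Returning the old pointer on failure is what lets numa_realloc() simply `return dev;` in every error branch without leaking or dangling.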
* Re: [dpdk-dev] [PATCH v6 6/7] vhost: allocate all data on same node as virtqueue
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 6/7] vhost: allocate all data on same node as virtqueue Maxime Coquelin
@ 2021-06-25 7:26 ` Xia, Chenbo
0 siblings, 0 replies; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-25 7:26 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Friday, June 18, 2021 10:04 PM
> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v6 6/7] vhost: allocate all data on same node as virtqueue
>
> This patch saves the NUMA node the virtqueue is allocated
> on at init time, in order to allocate all other data on the
> same node.
>
> While most of the data are allocated before numa_realloc()
> is called and so the data will be reallocated properly, some
> data like the log cache are most likely allocated after.
>
> For the virtio device metadata, we decide to allocate them
> on the same node as VQ 0.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> lib/vhost/vhost.c | 34 ++++++++++++++++------------------
> lib/vhost/vhost.h | 1 +
> lib/vhost/vhost_user.c | 41 ++++++++++++++++++++++++++++-------------
> 3 files changed, 45 insertions(+), 31 deletions(-)
>
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index c96f6335c8..0000cd3297 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -261,7 +261,7 @@ vhost_alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq,
> uint64_t src, dst;
> uint64_t len, remain = desc_len;
>
> - idesc = rte_malloc(__func__, desc_len, 0);
> + idesc = rte_malloc_socket(__func__, desc_len, 0, vq->numa_node);
> if (unlikely(!idesc))
> return NULL;
>
> @@ -549,6 +549,7 @@ static void
> init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
> {
> struct vhost_virtqueue *vq;
> + int numa_node = SOCKET_ID_ANY;
>
> if (vring_idx >= VHOST_MAX_VRING) {
> VHOST_LOG_CONFIG(ERR,
> @@ -570,6 +571,15 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
> vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
> vq->notif_enable = VIRTIO_UNINITIALIZED_NOTIF;
>
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> + if (get_mempolicy(&numa_node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
> + VHOST_LOG_CONFIG(ERR, "(%d) failed to query numa node: %s\n",
> + dev->vid, rte_strerror(errno));
> + numa_node = SOCKET_ID_ANY;
> + }
> +#endif
> + vq->numa_node = numa_node;
> +
> vhost_user_iotlb_init(dev, vring_idx);
> }
>
> @@ -1616,7 +1626,6 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> struct vhost_virtqueue *vq;
> struct virtio_net *dev = get_device(vid);
> struct rte_vhost_async_features f;
> - int node;
>
> if (dev == NULL || ops == NULL)
> return -1;
> @@ -1651,20 +1660,9 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> goto reg_out;
> }
>
> -#ifdef RTE_LIBRTE_VHOST_NUMA
> - if (get_mempolicy(&node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
> - VHOST_LOG_CONFIG(ERR,
> - "unable to get numa information in async register. "
> - "allocating async buffer memory on the caller thread node\n");
> - node = SOCKET_ID_ANY;
> - }
> -#else
> - node = SOCKET_ID_ANY;
> -#endif
> -
> vq->async_pkts_info = rte_malloc_socket(NULL,
> vq->size * sizeof(struct async_inflight_info),
> - RTE_CACHE_LINE_SIZE, node);
> + RTE_CACHE_LINE_SIZE, vq->numa_node);
> if (!vq->async_pkts_info) {
> vhost_free_async_mem(vq);
> VHOST_LOG_CONFIG(ERR,
> @@ -1675,7 +1673,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
>
> vq->it_pool = rte_malloc_socket(NULL,
> VHOST_MAX_ASYNC_IT * sizeof(struct rte_vhost_iov_iter),
> - RTE_CACHE_LINE_SIZE, node);
> + RTE_CACHE_LINE_SIZE, vq->numa_node);
> if (!vq->it_pool) {
> vhost_free_async_mem(vq);
> VHOST_LOG_CONFIG(ERR,
> @@ -1686,7 +1684,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
>
> vq->vec_pool = rte_malloc_socket(NULL,
> VHOST_MAX_ASYNC_VEC * sizeof(struct iovec),
> - RTE_CACHE_LINE_SIZE, node);
> + RTE_CACHE_LINE_SIZE, vq->numa_node);
> if (!vq->vec_pool) {
> vhost_free_async_mem(vq);
> VHOST_LOG_CONFIG(ERR,
> @@ -1698,7 +1696,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> if (vq_is_packed(dev)) {
> vq->async_buffers_packed = rte_malloc_socket(NULL,
> vq->size * sizeof(struct vring_used_elem_packed),
> - RTE_CACHE_LINE_SIZE, node);
> + RTE_CACHE_LINE_SIZE, vq->numa_node);
> if (!vq->async_buffers_packed) {
> vhost_free_async_mem(vq);
> VHOST_LOG_CONFIG(ERR,
> @@ -1709,7 +1707,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
> } else {
> vq->async_descs_split = rte_malloc_socket(NULL,
> vq->size * sizeof(struct vring_used_elem),
> - RTE_CACHE_LINE_SIZE, node);
> + RTE_CACHE_LINE_SIZE, vq->numa_node);
> if (!vq->async_descs_split) {
> vhost_free_async_mem(vq);
> VHOST_LOG_CONFIG(ERR,
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 8078ddff79..8ffe387556 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -164,6 +164,7 @@ struct vhost_virtqueue {
>
> uint16_t batch_copy_nb_elems;
> struct batch_copy_elem *batch_copy_elems;
> + int numa_node;
> bool used_wrap_counter;
> bool avail_wrap_counter;
>
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index d6ec4000c3..d8ec087dfc 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -433,10 +433,10 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
> if (vq_is_packed(dev)) {
> if (vq->shadow_used_packed)
> rte_free(vq->shadow_used_packed);
> - vq->shadow_used_packed = rte_malloc(NULL,
> + vq->shadow_used_packed = rte_malloc_socket(NULL,
> vq->size *
> sizeof(struct vring_used_elem_packed),
> - RTE_CACHE_LINE_SIZE);
> + RTE_CACHE_LINE_SIZE, vq->numa_node);
> if (!vq->shadow_used_packed) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for shadow used ring.\n");
> @@ -447,9 +447,9 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
> if (vq->shadow_used_split)
> rte_free(vq->shadow_used_split);
>
> - vq->shadow_used_split = rte_malloc(NULL,
> + vq->shadow_used_split = rte_malloc_socket(NULL,
> vq->size * sizeof(struct vring_used_elem),
> - RTE_CACHE_LINE_SIZE);
> + RTE_CACHE_LINE_SIZE, vq->numa_node);
>
> if (!vq->shadow_used_split) {
> VHOST_LOG_CONFIG(ERR,
> @@ -460,9 +460,9 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
>
> if (vq->batch_copy_elems)
> rte_free(vq->batch_copy_elems);
> - vq->batch_copy_elems = rte_malloc(NULL,
> + vq->batch_copy_elems = rte_malloc_socket(NULL,
> vq->size * sizeof(struct batch_copy_elem),
> - RTE_CACHE_LINE_SIZE);
> + RTE_CACHE_LINE_SIZE, vq->numa_node);
> if (!vq->batch_copy_elems) {
> VHOST_LOG_CONFIG(ERR,
> "failed to allocate memory for batching copy.\n");
> @@ -505,6 +505,9 @@ numa_realloc(struct virtio_net *dev, int index)
> return dev;
> }
>
> + if (node == vq->numa_node)
> + goto out_dev_realloc;
> +
> vq = rte_realloc_socket(vq, sizeof(*vq), 0, node);
> if (!vq) {
> VHOST_LOG_CONFIG(ERR, "Failed to realloc virtqueue %d on node %d\n",
> @@ -559,6 +562,10 @@ numa_realloc(struct virtio_net *dev, int index)
> vq->log_cache = lc;
> }
>
> + vq->numa_node = node;
> +
> +out_dev_realloc:
> +
> if (dev->flags & VIRTIO_DEV_RUNNING)
> return dev;
>
> @@ -1213,7 +1220,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
> struct virtio_net *dev = *pdev;
> struct VhostUserMemory *memory = &msg->payload.memory;
> struct rte_vhost_mem_region *reg;
> -
> + int numa_node = SOCKET_ID_ANY;
> uint64_t mmap_offset;
> uint32_t i;
>
> @@ -1253,13 +1260,21 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
> for (i = 0; i < dev->nr_vring; i++)
> vhost_user_iotlb_flush_all(dev->virtqueue[i]);
>
> + /*
> + * If VQ 0 has already been allocated, try to allocate on the same
> + * NUMA node. It can be reallocated later in numa_realloc().
> + */
> + if (dev->nr_vring > 0)
> + numa_node = dev->virtqueue[0]->numa_node;
> +
> dev->nr_guest_pages = 0;
> if (dev->guest_pages == NULL) {
> dev->max_guest_pages = 8;
> - dev->guest_pages = rte_zmalloc(NULL,
> + dev->guest_pages = rte_zmalloc_socket(NULL,
> dev->max_guest_pages *
> sizeof(struct guest_page),
> - RTE_CACHE_LINE_SIZE);
> + RTE_CACHE_LINE_SIZE,
> + numa_node);
> if (dev->guest_pages == NULL) {
> VHOST_LOG_CONFIG(ERR,
> "(%d) failed to allocate memory "
> @@ -1269,8 +1284,8 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
> }
> }
>
> - dev->mem = rte_zmalloc("vhost-mem-table", sizeof(struct rte_vhost_memory) +
> - sizeof(struct rte_vhost_mem_region) * memory->nregions, 0);
> + dev->mem = rte_zmalloc_socket("vhost-mem-table", sizeof(struct rte_vhost_memory) +
> + sizeof(struct rte_vhost_mem_region) * memory->nregions, 0, numa_node);
> if (dev->mem == NULL) {
> VHOST_LOG_CONFIG(ERR,
> "(%d) failed to allocate memory for dev->mem\n",
> @@ -2193,9 +2208,9 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
> rte_free(vq->log_cache);
> vq->log_cache = NULL;
> vq->log_cache_nb_elem = 0;
> - vq->log_cache = rte_zmalloc("vq log cache",
> + vq->log_cache = rte_malloc_socket("vq log cache",
> sizeof(struct log_cache_entry) * VHOST_LOG_CACHE_NR,
> - 0);
> + 0, vq->numa_node);
> /*
> * If log cache alloc fail, don't fail migration, but no
> * caching will be done, which will impact performance
> --
> 2.31.1
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [dpdk-dev] [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue
2021-06-25 2:56 ` Xia, Chenbo
@ 2021-06-25 11:37 ` Xia, Chenbo
2021-06-29 14:35 ` Maxime Coquelin
0 siblings, 1 reply; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-25 11:37 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand; +Cc: stable
Hi Maxime,
> -----Original Message-----
> From: stable <stable-bounces@dpdk.org> On Behalf Of Xia, Chenbo
> Sent: Friday, June 25, 2021 10:56 AM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>; dev@dpdk.org;
> david.marchand@redhat.com
> Cc: stable@dpdk.org
> Subject: Re: [dpdk-stable] [PATCH v6 4/7] vhost: fix NUMA reallocation
> with multiqueue
>
> Hi Maxime,
>
> > -----Original Message-----
> > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > Sent: Friday, June 18, 2021 10:04 PM
> > To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>
> > Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
> > Subject: [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue
> >
> > Since the Vhost-user device initialization has been reworked,
> > enabling the application to start using the device as soon as
> > the first queue pair is ready, NUMA reallocation no longer
> > happened on queue pairs other than the first one since
> > numa_realloc() was returning early if the device was running.
> >
> > This patch fixes this issue by only preventing the device
> > metadata from being reallocated if the device is running. For the
> > virtqueues, a vring state change notification is sent to
> > notify the application of its disablement. Since the callback
> > is supposed to be blocking, it is safe to reallocate it
> > afterwards.
> >
> > Fixes: d0fcc38f5fa4 ("vhost: improve device readiness notifications")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > ---
> > lib/vhost/vhost_user.c | 13 ++++++++++---
> > 1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > index 82adf80fe5..51b96a0716 100644
> > --- a/lib/vhost/vhost_user.c
> > +++ b/lib/vhost/vhost_user.c
> > @@ -488,12 +488,16 @@ numa_realloc(struct virtio_net *dev, int index)
> > struct batch_copy_elem *new_batch_copy_elems;
> > int ret;
> >
> > - if (dev->flags & VIRTIO_DEV_RUNNING)
> > - return dev;
> > -
> > old_dev = dev;
> > vq = old_vq = dev->virtqueue[index];
> >
> > + /*
> > + * If VQ is ready, it is too late to reallocate, it certainly already
> > + * happened anyway on VHOST_USER_SET_VRING_ADDR.
> > + */
> > + if (vq->ready)
> > + return dev;
> > +
> > ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
> > MPOL_F_NODE | MPOL_F_ADDR);
> >
> > @@ -558,6 +562,9 @@ numa_realloc(struct virtio_net *dev, int index)
> > rte_free(old_vq);
> > }
> >
> > + if (dev->flags & VIRTIO_DEV_RUNNING)
> > + goto out;
> > +
>
> Since we don't realloc when vq is ready, there is no case that vq is not ready but device is still running, right?
Sorry, I forgot DEV_RUNNING only requires 1 qpair ready now ☹
Ignore above comments..
Thanks,
Chenbo
>
> Thanks,
> Chenbo
>
> > /* check if we need to reallocate dev */
> > ret = get_mempolicy(&oldnode, NULL, 0, old_dev,
> > MPOL_F_NODE | MPOL_F_ADDR);
> > --
> > 2.31.1
* Re: [dpdk-dev] [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue
2021-06-25 11:37 ` Xia, Chenbo
@ 2021-06-29 14:35 ` Maxime Coquelin
0 siblings, 0 replies; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-29 14:35 UTC (permalink / raw)
To: Xia, Chenbo, dev, david.marchand; +Cc: stable
Hi Chenbo,
On 6/25/21 1:37 PM, Xia, Chenbo wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: stable <stable-bounces@dpdk.org> On Behalf Of Xia, Chenbo
>> Sent: Friday, June 25, 2021 10:56 AM
>> To: Maxime Coquelin <maxime.coquelin@redhat.com>; dev@dpdk.org;
>> david.marchand@redhat.com
>> Cc: stable@dpdk.org
>> Subject: Re: [dpdk-stable] [PATCH v6 4/7] vhost: fix NUMA reallocation
>> with multiqueue
>>
>> Hi Maxime,
>>
>>> -----Original Message-----
>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> Sent: Friday, June 18, 2021 10:04 PM
>>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
>> <chenbo.xia@intel.com>
>>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; stable@dpdk.org
>>> Subject: [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue
>>>
>>> Since the Vhost-user device initialization has been reworked,
>>> enabling the application to start using the device as soon as
>>> the first queue pair is ready, NUMA reallocation no longer
>>> happened on queue pairs other than the first one since
>>> numa_realloc() was returning early if the device was running.
>>>
>>> This patch fixes this issue by only preventing the device
>>> metadata to be allocated if the device is running. For the
>>> virtqueues, a vring state change notification is sent to
>>> notify the application of its disablement. Since the callback
>>> is supposed to be blocking, it is safe to reallocate it
>>> afterwards.
>>>
>>> Fixes: d0fcc38f5fa4 ("vhost: improve device readiness notifications")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>> lib/vhost/vhost_user.c | 13 ++++++++++---
>>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
>>> index 82adf80fe5..51b96a0716 100644
>>> --- a/lib/vhost/vhost_user.c
>>> +++ b/lib/vhost/vhost_user.c
>>> @@ -488,12 +488,16 @@ numa_realloc(struct virtio_net *dev, int index)
>>> struct batch_copy_elem *new_batch_copy_elems;
>>> int ret;
>>>
>>> - if (dev->flags & VIRTIO_DEV_RUNNING)
>>> - return dev;
>>> -
>>> old_dev = dev;
>>> vq = old_vq = dev->virtqueue[index];
>>>
>>> + /*
>>> + * If VQ is ready, it is too late to reallocate, it certainly already
>>> + * happened anyway on VHOST_USER_SET_VRING_ADDR.
>>> + */
>>> + if (vq->ready)
>>> + return dev;
>>> +
>>> ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
>>> MPOL_F_NODE | MPOL_F_ADDR);
>>>
>>> @@ -558,6 +562,9 @@ numa_realloc(struct virtio_net *dev, int index)
>>> rte_free(old_vq);
>>> }
>>>
>>> + if (dev->flags & VIRTIO_DEV_RUNNING)
>>> + goto out;
>>> +
>>
>> Since we don't realloc when vq is ready, there is no case that vq is not ready but device is still running, right?
>
> Sorry, I forgot DEV_RUNNING only requires 1 qpair ready now ☹
> Ignore above comments..
No problem, thanks for the review!
> Thanks,
> Chenbo
>
>>
>> Thanks,
>> Chenbo
>>
>>> /* check if we need to reallocate dev */
>>> ret = get_mempolicy(&oldnode, NULL, 0, old_dev,
>>> MPOL_F_NODE | MPOL_F_ADDR);
>>> --
>>> 2.31.1
>
* Re: [dpdk-dev] [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API
2021-06-25 7:26 ` Xia, Chenbo
@ 2021-06-29 14:36 ` Maxime Coquelin
0 siblings, 0 replies; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-29 14:36 UTC (permalink / raw)
To: Xia, Chenbo, dev, david.marchand
Hi Chenbo,
On 6/25/21 9:26 AM, Xia, Chenbo wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, June 18, 2021 10:04 PM
>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API
>>
>> Inflight metadata are allocated using glibc's calloc.
>> This patch converts them to rte_zmalloc_socket to take
>> care of the NUMA affinity.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> lib/vhost/vhost.c | 4 +--
>> lib/vhost/vhost_user.c | 67 +++++++++++++++++++++++++++++++++++-------
>> 2 files changed, 58 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
>> index 0000cd3297..53a470f547 100644
>> --- a/lib/vhost/vhost.c
>> +++ b/lib/vhost/vhost.c
>
> [...]
>
>> @@ -1779,15 +1820,17 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
>> vq->last_avail_idx += resubmit_num;
>>
>> if (resubmit_num) {
>> - resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
>> + resubmit = rte_zmalloc_socket("resubmit", sizeof(struct rte_vhost_resubmit_info),
>> + 0, vq->numa_node);
>> if (!resubmit) {
>> VHOST_LOG_CONFIG(ERR,
>> "failed to allocate memory for resubmit info.\n");
>> return RTE_VHOST_MSG_RESULT_ERR;
>> }
>>
>> - resubmit->resubmit_list = calloc(resubmit_num,
>> - sizeof(struct rte_vhost_resubmit_desc));
>> + resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
>> + resubmit_num * sizeof(struct rte_vhost_resubmit_desc),
>> + 0, vq->numa_node);
>> if (!resubmit->resubmit_list) {
>> VHOST_LOG_CONFIG(ERR,
>> "failed to allocate memory for inflight desc.\n");
>> @@ -1873,15 +1916,17 @@ vhost_check_queue_inflights_packed(struct virtio_net *dev,
>> }
>>
>> if (resubmit_num) {
>> - resubmit = calloc(1, sizeof(struct rte_vhost_resubmit_info));
>> + resubmit = rte_zmalloc_socket("resubmit", sizeof(struct rte_vhost_resubmit_info),
>> + 0, vq->numa_node);
>
> There are still two 'free(resubmit)' in vhost_check_queue_inflights_split and
> vhost_check_queue_inflights_packed, which should be replaced with rte_free()
Good catch, I'll fix this in next revision.
Thanks,
Maxime
> Thanks,
> Chenbo
>
>> if (resubmit == NULL) {
>> VHOST_LOG_CONFIG(ERR,
>> "failed to allocate memory for resubmit info.\n");
>> return RTE_VHOST_MSG_RESULT_ERR;
>> }
>>
>> - resubmit->resubmit_list = calloc(resubmit_num,
>> - sizeof(struct rte_vhost_resubmit_desc));
>> + resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
>> + resubmit_num * sizeof(struct rte_vhost_resubmit_desc),
>> + 0, vq->numa_node);
>> if (resubmit->resubmit_list == NULL) {
>> VHOST_LOG_CONFIG(ERR,
>> "failed to allocate memory for resubmit desc.\n");
>> --
>> 2.31.1
>
* Re: [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
2021-06-25 2:50 ` Xia, Chenbo
@ 2021-06-29 14:38 ` Maxime Coquelin
2021-06-30 8:50 ` Xia, Chenbo
0 siblings, 1 reply; 20+ messages in thread
From: Maxime Coquelin @ 2021-06-29 14:38 UTC (permalink / raw)
To: Xia, Chenbo, dev, david.marchand
On 6/25/21 4:50 AM, Xia, Chenbo wrote:
> Hi Maxime,
>
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Friday, June 18, 2021 10:04 PM
>> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Subject: [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
>>
>> When the guest allocates virtqueues on a different NUMA node
>> than the one the Vhost metadata are allocated, both the Vhost
>> device struct and the virtqueues struct are reallocated.
>>
>> However, reallocating the log cache on the new NUMA node was
>> not done. This patch fixes this by reallocating it if it has
>> been allocated already, which means a live-migration is
>> on-going.
>>
>> Fixes: 1818a63147fb ("vhost: move dirty logging cache out of virtqueue")
>
> This commit is of 21.05, although LTS maintainers don't maintain non-LTS stable
> releases now, I guess it's still better to add 'cc stable tag' in case anyone
> volunteers to do that?
I don't think that's what we do usually.
If someone wants to maintain v21.05 in the future, he can just look for
the Fixes tag in the git history.
Thanks,
Maxime
> Thanks,
> Chenbo
>
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> lib/vhost/vhost_user.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
>> index 5fb055ea2e..82adf80fe5 100644
>> --- a/lib/vhost/vhost_user.c
>> +++ b/lib/vhost/vhost_user.c
>> @@ -545,6 +545,16 @@ numa_realloc(struct virtio_net *dev, int index)
>> vq->batch_copy_elems = new_batch_copy_elems;
>> }
>>
>> + if (vq->log_cache) {
>> + struct log_cache_entry *log_cache;
>> +
>> + log_cache = rte_realloc_socket(vq->log_cache,
>> + sizeof(struct log_cache_entry) * VHOST_LOG_CACHE_NR,
>> + 0, newnode);
>> + if (log_cache)
>> + vq->log_cache = log_cache;
>> + }
>> +
>> rte_free(old_vq);
>> }
>>
>> --
>> 2.31.1
>
* Re: [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
2021-06-29 14:38 ` Maxime Coquelin
@ 2021-06-30 8:50 ` Xia, Chenbo
0 siblings, 0 replies; 20+ messages in thread
From: Xia, Chenbo @ 2021-06-30 8:50 UTC (permalink / raw)
To: Maxime Coquelin, dev, david.marchand
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, June 29, 2021 10:39 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; dev@dpdk.org;
> david.marchand@redhat.com
> Subject: Re: [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
>
>
>
> On 6/25/21 4:50 AM, Xia, Chenbo wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Friday, June 18, 2021 10:04 PM
> >> To: dev@dpdk.org; david.marchand@redhat.com; Xia, Chenbo
> <chenbo.xia@intel.com>
> >> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Subject: [PATCH v6 3/7] vhost: fix missing cache logging NUMA realloc
> >>
> >> When the guest allocates virtqueues on a different NUMA node
> >> than the one the Vhost metadata are allocated, both the Vhost
> >> device struct and the virtqueues struct are reallocated.
> >>
> >> However, reallocating the log cache on the new NUMA node was
> >> not done. This patch fixes this by reallocating it if it has
> >> been allocated already, which means a live-migration is
> >> on-going.
> >>
> >> Fixes: 1818a63147fb ("vhost: move dirty logging cache out of virtqueue")
> >
> > This commit is of 21.05, although LTS maintainers don't maintain non-LTS
> > stable releases now, I guess it's still better to add 'cc stable tag' in
> > case anyone volunteers to do that?
>
>
> I don't think that's what we do usually.
> If someone wants to maintain v21.05 in the future, he can just look for
> the Fixes tag in the git history.
>
> Thanks,
> Maxime
I asked Thomas and Ferruh this question to make sure we are all aligned. Seems
they think we'd better add it in this case. Thomas's two reasons:
- we don't know in advance whether a branch will be maintained
- it helps those maintaining a private stable branch
And my understanding is adding both fix tag and stable tag makes it clearer for
stable release maintainers (They can just ignore 'only fix tag' case). And anyway
they need to check the fix commit ID.
Anyway, I could add it with some small changes David asked for if you don’t plan
a new version. Do you?
Thanks,
Chenbo
>
> > Thanks,
> > Chenbo
> >
> >>
> >> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> ---
> >> lib/vhost/vhost_user.c | 10 ++++++++++
> >> 1 file changed, 10 insertions(+)
> >>
> >> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> >> index 5fb055ea2e..82adf80fe5 100644
> >> --- a/lib/vhost/vhost_user.c
> >> +++ b/lib/vhost/vhost_user.c
> >> @@ -545,6 +545,16 @@ numa_realloc(struct virtio_net *dev, int index)
> >> vq->batch_copy_elems = new_batch_copy_elems;
> >> }
> >>
> >> + if (vq->log_cache) {
> >> + struct log_cache_entry *log_cache;
> >> +
> >> + log_cache = rte_realloc_socket(vq->log_cache,
> >> + sizeof(struct log_cache_entry) * VHOST_LOG_CACHE_NR,
> >> + 0, newnode);
> >> + if (log_cache)
> >> + vq->log_cache = log_cache;
> >> + }
> >> +
> >> rte_free(old_vq);
> >> }
> >>
> >> --
> >> 2.31.1
> >
end of thread, other threads:[~2021-06-30 8:50 UTC | newest]
Thread overview: 20+ messages
2021-06-18 14:03 [dpdk-dev] [PATCH v6 0/7] vhost: Fix and improve NUMA reallocation Maxime Coquelin
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 1/7] vhost: fix missing memory table NUMA realloc Maxime Coquelin
2021-06-25 2:26 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 2/7] vhost: fix missing guest pages " Maxime Coquelin
2021-06-25 2:26 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 3/7] vhost: fix missing cache logging " Maxime Coquelin
2021-06-25 2:50 ` Xia, Chenbo
2021-06-29 14:38 ` Maxime Coquelin
2021-06-30 8:50 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 4/7] vhost: fix NUMA reallocation with multiqueue Maxime Coquelin
2021-06-25 2:56 ` Xia, Chenbo
2021-06-25 11:37 ` Xia, Chenbo
2021-06-29 14:35 ` Maxime Coquelin
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 5/7] vhost: improve NUMA reallocation Maxime Coquelin
2021-06-25 7:26 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 6/7] vhost: allocate all data on same node as virtqueue Maxime Coquelin
2021-06-25 7:26 ` Xia, Chenbo
2021-06-18 14:03 ` [dpdk-dev] [PATCH v6 7/7] vhost: convert inflight data to DPDK allocation API Maxime Coquelin
2021-06-25 7:26 ` Xia, Chenbo
2021-06-29 14:36 ` Maxime Coquelin