* [dpdk-dev] [PATCH v4 1/3] vhost: remove unused Vhost virtqueue field
2021-03-23 9:02 [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly Maxime Coquelin
@ 2021-03-23 9:02 ` Maxime Coquelin
2021-03-23 9:02 ` [dpdk-dev] [PATCH v4 2/3] vhost: move dirty logging cache out of the virtqueue Maxime Coquelin
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Maxime Coquelin @ 2021-03-23 9:02 UTC (permalink / raw)
To: dev, chenbo.xia, amorenoz, david.marchand, olivier.matz, bnemeth
Cc: Maxime Coquelin
This patch removes the "backend" field of the
vhost_virtqueue struct, which is not used by the
library.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/librte_vhost/vhost.c | 2 --
lib/librte_vhost/vhost.h | 2 --
2 files changed, 4 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 52ab93d1ec..5a7c0c6cff 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -558,8 +558,6 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
vq->notif_enable = VIRTIO_UNINITIALIZED_NOTIF;
vhost_user_iotlb_init(dev, vring_idx);
- /* Backends are set to -1 indicating an inactive device. */
- vq->backend = -1;
}
static void
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 658f6fc287..717f410548 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -143,8 +143,6 @@ struct vhost_virtqueue {
#define VIRTIO_INVALID_EVENTFD (-1)
#define VIRTIO_UNINITIALIZED_EVENTFD (-2)
- /* Backend value to determine if device should started/stopped */
- int backend;
int enabled;
int access_ok;
int ready;
--
2.30.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [dpdk-dev] [PATCH v4 2/3] vhost: move dirty logging cache out of the virtqueue
2021-03-23 9:02 [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly Maxime Coquelin
2021-03-23 9:02 ` [dpdk-dev] [PATCH v4 1/3] vhost: remove unused Vhost virtqueue field Maxime Coquelin
@ 2021-03-23 9:02 ` Maxime Coquelin
2021-03-23 9:02 ` [dpdk-dev] [PATCH v4 3/3] vhost: optimize vhost virtqueue struct Maxime Coquelin
` (2 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Maxime Coquelin @ 2021-03-23 9:02 UTC (permalink / raw)
To: dev, chenbo.xia, amorenoz, david.marchand, olivier.matz, bnemeth
Cc: Maxime Coquelin
This patch moves the per-virtqueue dirty logging cache
out of the virtqueue struct by allocating it dynamically,
only when live-migration is enabled.
This saves 8 cachelines in the vhost_virtqueue struct.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
---
lib/librte_vhost/vhost.c | 13 +++++++++++++
lib/librte_vhost/vhost.h | 2 +-
lib/librte_vhost/vhost_user.c | 21 +++++++++++++++++++++
3 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 5a7c0c6cff..a8032e3ba1 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -145,6 +145,10 @@ __vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
if (unlikely(!dev->log_base))
return;
+ /* No cache, nothing to sync */
+ if (unlikely(!vq->log_cache))
+ return;
+
rte_atomic_thread_fence(__ATOMIC_RELEASE);
log_base = (unsigned long *)(uintptr_t)dev->log_base;
@@ -177,6 +181,14 @@ vhost_log_cache_page(struct virtio_net *dev, struct vhost_virtqueue *vq,
uint32_t offset = page / (sizeof(unsigned long) << 3);
int i;
+ if (unlikely(!vq->log_cache)) {
+ /* No logging cache allocated, write dirty log map directly */
+ rte_atomic_thread_fence(__ATOMIC_RELEASE);
+ vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
+
+ return;
+ }
+
for (i = 0; i < vq->log_cache_nb_elem; i++) {
struct log_cache_entry *elem = vq->log_cache + i;
@@ -354,6 +366,7 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
}
rte_free(vq->batch_copy_elems);
rte_mempool_free(vq->iotlb_pool);
+ rte_free(vq->log_cache);
rte_free(vq);
}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 717f410548..3a71dfeed9 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -183,7 +183,7 @@ struct vhost_virtqueue {
bool used_wrap_counter;
bool avail_wrap_counter;
- struct log_cache_entry log_cache[VHOST_LOG_CACHE_NR];
+ struct log_cache_entry *log_cache;
uint16_t log_cache_nb_elem;
rte_rwlock_t iotlb_lock;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index a60bb945ad..4d9e76e49e 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2022,6 +2022,9 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
rte_free(vq->batch_copy_elems);
vq->batch_copy_elems = NULL;
+ rte_free(vq->log_cache);
+ vq->log_cache = NULL;
+
msg->size = sizeof(msg->payload.state);
msg->fd_num = 0;
@@ -2121,6 +2124,7 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
int fd = msg->fds[0];
uint64_t size, off;
void *addr;
+ uint32_t i;
if (validate_msg_fds(msg, 1) != 0)
return RTE_VHOST_MSG_RESULT_ERR;
@@ -2174,6 +2178,23 @@ vhost_user_set_log_base(struct virtio_net **pdev, struct VhostUserMsg *msg,
dev->log_base = dev->log_addr + off;
dev->log_size = size;
+ for (i = 0; i < dev->nr_vring; i++) {
+ struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+ rte_free(vq->log_cache);
+ vq->log_cache = NULL;
+ vq->log_cache_nb_elem = 0;
+ vq->log_cache = rte_zmalloc("vq log cache",
+ sizeof(struct log_cache_entry) * VHOST_LOG_CACHE_NR,
+ 0);
+ /*
+ * If log cache alloc fail, don't fail migration, but no
+ * caching will be done, which will impact performance
+ */
+ if (!vq->log_cache)
+ VHOST_LOG_CONFIG(ERR, "Failed to allocate VQ logging cache\n");
+ }
+
/*
* The spec is not clear about it (yet), but QEMU doesn't expect
* any payload in the reply.
--
2.30.2
* [dpdk-dev] [PATCH v4 3/3] vhost: optimize vhost virtqueue struct
2021-03-23 9:02 [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly Maxime Coquelin
2021-03-23 9:02 ` [dpdk-dev] [PATCH v4 1/3] vhost: remove unused Vhost virtqueue field Maxime Coquelin
2021-03-23 9:02 ` [dpdk-dev] [PATCH v4 2/3] vhost: move dirty logging cache out of the virtqueue Maxime Coquelin
@ 2021-03-23 9:02 ` Maxime Coquelin
2021-03-25 2:30 ` Xia, Chenbo
2021-03-23 10:30 ` [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly David Marchand
2021-03-31 6:04 ` Xia, Chenbo
4 siblings, 1 reply; 8+ messages in thread
From: Maxime Coquelin @ 2021-03-23 9:02 UTC (permalink / raw)
To: dev, chenbo.xia, amorenoz, david.marchand, olivier.matz, bnemeth
Cc: Maxime Coquelin
This patch reorders the vhost_virtqueue struct fields
to both optimize packing and place hot fields on the
first cachelines.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
lib/librte_vhost/vhost.c | 6 ++--
lib/librte_vhost/vhost.h | 54 ++++++++++++++++++-----------------
lib/librte_vhost/vhost_user.c | 23 +++++++--------
lib/librte_vhost/virtio_net.c | 12 ++++----
4 files changed, 48 insertions(+), 47 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index a8032e3ba1..04d63b2f02 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -524,7 +524,7 @@ vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq)
if (log_translate(dev, vq) < 0)
return -1;
- vq->access_ok = 1;
+ vq->access_ok = true;
return 0;
}
@@ -535,7 +535,7 @@ vring_invalidate(struct virtio_net *dev, struct vhost_virtqueue *vq)
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_wr_lock(vq);
- vq->access_ok = 0;
+ vq->access_ok = false;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -1451,7 +1451,7 @@ rte_vhost_rx_queue_count(int vid, uint16_t qid)
rte_spinlock_lock(&vq->access_lock);
- if (unlikely(vq->enabled == 0 || vq->avail == NULL))
+ if (unlikely(!vq->enabled || vq->avail == NULL))
goto out;
ret = *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 3a71dfeed9..f628714c24 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -133,7 +133,7 @@ struct vhost_virtqueue {
struct vring_used *used;
struct vring_packed_desc_event *device_event;
};
- uint32_t size;
+ uint16_t size;
uint16_t last_avail_idx;
uint16_t last_used_idx;
@@ -143,29 +143,12 @@ struct vhost_virtqueue {
#define VIRTIO_INVALID_EVENTFD (-1)
#define VIRTIO_UNINITIALIZED_EVENTFD (-2)
- int enabled;
- int access_ok;
- int ready;
- int notif_enable;
-#define VIRTIO_UNINITIALIZED_NOTIF (-1)
+ bool enabled;
+ bool access_ok;
+ bool ready;
rte_spinlock_t access_lock;
- /* Used to notify the guest (trigger interrupt) */
- int callfd;
- /* Currently unused as polling mode is enabled */
- int kickfd;
-
- /* Physical address of used ring, for logging */
- uint64_t log_guest_addr;
-
- /* inflight share memory info */
- union {
- struct rte_vhost_inflight_info_split *inflight_split;
- struct rte_vhost_inflight_info_packed *inflight_packed;
- };
- struct rte_vhost_resubmit_info *resubmit_inflight;
- uint64_t global_counter;
union {
struct vring_used_elem *shadow_used_split;
@@ -176,22 +159,36 @@ struct vhost_virtqueue {
uint16_t shadow_aligned_idx;
/* Record packed ring first dequeue desc index */
uint16_t shadow_last_used_idx;
- struct vhost_vring_addr ring_addrs;
- struct batch_copy_elem *batch_copy_elems;
uint16_t batch_copy_nb_elems;
+ struct batch_copy_elem *batch_copy_elems;
bool used_wrap_counter;
bool avail_wrap_counter;
- struct log_cache_entry *log_cache;
- uint16_t log_cache_nb_elem;
+ /* Physical address of used ring, for logging */
+ uint16_t log_cache_nb_elem;
+ uint64_t log_guest_addr;
+ struct log_cache_entry *log_cache;
rte_rwlock_t iotlb_lock;
rte_rwlock_t iotlb_pending_lock;
struct rte_mempool *iotlb_pool;
TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
- int iotlb_cache_nr;
TAILQ_HEAD(, vhost_iotlb_entry) iotlb_pending_list;
+ int iotlb_cache_nr;
+
+ /* Used to notify the guest (trigger interrupt) */
+ int callfd;
+ /* Currently unused as polling mode is enabled */
+ int kickfd;
+
+ /* inflight share memory info */
+ union {
+ struct rte_vhost_inflight_info_split *inflight_split;
+ struct rte_vhost_inflight_info_packed *inflight_packed;
+ };
+ struct rte_vhost_resubmit_info *resubmit_inflight;
+ uint64_t global_counter;
/* operation callbacks for async dma */
struct rte_vhost_async_channel_ops async_ops;
@@ -212,6 +209,11 @@ struct vhost_virtqueue {
bool async_inorder;
bool async_registered;
uint16_t async_threshold;
+
+ int notif_enable;
+#define VIRTIO_UNINITIALIZED_NOTIF (-1)
+
+ struct vhost_vring_addr ring_addrs;
} __rte_cache_aligned;
/* Virtio device status as per Virtio specification */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 4d9e76e49e..2f4f89aeac 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -406,6 +406,11 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
if (validate_msg_fds(msg, 0) != 0)
return RTE_VHOST_MSG_RESULT_ERR;
+ if (msg->payload.state.num > 32768) {
+ VHOST_LOG_CONFIG(ERR, "invalid virtqueue size %u\n", msg->payload.state.num);
+ return RTE_VHOST_MSG_RESULT_ERR;
+ }
+
vq->size = msg->payload.state.num;
/* VIRTIO 1.0, 2.4 Virtqueues says:
@@ -425,12 +430,6 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
}
}
- if (vq->size > 32768) {
- VHOST_LOG_CONFIG(ERR,
- "invalid virtqueue size %u\n", vq->size);
- return RTE_VHOST_MSG_RESULT_ERR;
- }
-
if (vq_is_packed(dev)) {
if (vq->shadow_used_packed)
rte_free(vq->shadow_used_packed);
@@ -713,7 +712,7 @@ translate_ring_addresses(struct virtio_net *dev, int vq_index)
return dev;
}
- vq->access_ok = 1;
+ vq->access_ok = true;
return dev;
}
@@ -771,7 +770,7 @@ translate_ring_addresses(struct virtio_net *dev, int vq_index)
vq->last_avail_idx = vq->used->idx;
}
- vq->access_ok = 1;
+ vq->access_ok = true;
VHOST_LOG_CONFIG(DEBUG, "(%d) mapped address desc: %p\n",
dev->vid, vq->desc);
@@ -1658,7 +1657,7 @@ vhost_user_set_vring_call(struct virtio_net **pdev, struct VhostUserMsg *msg,
vq = dev->virtqueue[file.index];
if (vq->ready) {
- vq->ready = 0;
+ vq->ready = false;
vhost_user_notify_queue_state(dev, file.index, 0);
}
@@ -1918,14 +1917,14 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
* the SET_VRING_ENABLE message.
*/
if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
- vq->enabled = 1;
+ vq->enabled = true;
if (dev->notify_ops->vring_state_changed)
dev->notify_ops->vring_state_changed(
dev->vid, file.index, 1);
}
if (vq->ready) {
- vq->ready = 0;
+ vq->ready = false;
vhost_user_notify_queue_state(dev, file.index, 0);
}
@@ -2043,7 +2042,7 @@ vhost_user_set_vring_enable(struct virtio_net **pdev,
int main_fd __rte_unused)
{
struct virtio_net *dev = *pdev;
- int enable = (int)msg->payload.state.num;
+ bool enable = !!msg->payload.state.num;
int index = (int)msg->payload.state.index;
if (validate_msg_fds(msg, 0) != 0)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 583bf379c6..3d8e29df09 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1396,13 +1396,13 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
rte_spinlock_lock(&vq->access_lock);
- if (unlikely(vq->enabled == 0))
+ if (unlikely(!vq->enabled))
goto out_access_unlock;
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_lock(vq);
- if (unlikely(vq->access_ok == 0))
+ if (unlikely(!vq->access_ok))
if (unlikely(vring_translate(dev, vq) < 0))
goto out;
@@ -1753,13 +1753,13 @@ virtio_dev_rx_async_submit(struct virtio_net *dev, uint16_t queue_id,
rte_spinlock_lock(&vq->access_lock);
- if (unlikely(vq->enabled == 0 || !vq->async_registered))
+ if (unlikely(!vq->enabled || !vq->async_registered))
goto out_access_unlock;
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_lock(vq);
- if (unlikely(vq->access_ok == 0))
+ if (unlikely(!vq->access_ok))
if (unlikely(vring_translate(dev, vq) < 0))
goto out;
@@ -2518,7 +2518,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
if (unlikely(rte_spinlock_trylock(&vq->access_lock) == 0))
return 0;
- if (unlikely(vq->enabled == 0)) {
+ if (unlikely(!vq->enabled)) {
count = 0;
goto out_access_unlock;
}
@@ -2526,7 +2526,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_lock(vq);
- if (unlikely(vq->access_ok == 0))
+ if (unlikely(!vq->access_ok))
if (unlikely(vring_translate(dev, vq) < 0)) {
count = 0;
goto out;
--
2.30.2
* Re: [dpdk-dev] [PATCH v4 3/3] vhost: optimize vhost virtqueue struct
2021-03-23 9:02 ` [dpdk-dev] [PATCH v4 3/3] vhost: optimize vhost virtqueue struct Maxime Coquelin
@ 2021-03-25 2:30 ` Xia, Chenbo
0 siblings, 0 replies; 8+ messages in thread
From: Xia, Chenbo @ 2021-03-25 2:30 UTC (permalink / raw)
To: Maxime Coquelin, dev, amorenoz, david.marchand, olivier.matz, bnemeth
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, March 23, 2021 5:02 PM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>; amorenoz@redhat.com;
> david.marchand@redhat.com; olivier.matz@6wind.com; bnemeth@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v4 3/3] vhost: optimize vhost virtqueue struct
>
> This patch moves vhost_virtqueue struct fields in order
> to both optimize packing and move hot fields on the first
> cachelines.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> [snip]
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
* Re: [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly
2021-03-23 9:02 [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly Maxime Coquelin
` (2 preceding siblings ...)
2021-03-23 9:02 ` [dpdk-dev] [PATCH v4 3/3] vhost: optimize vhost virtqueue struct Maxime Coquelin
@ 2021-03-23 10:30 ` David Marchand
2021-03-29 10:52 ` Balazs Nemeth
2021-03-31 6:04 ` Xia, Chenbo
4 siblings, 1 reply; 8+ messages in thread
From: David Marchand @ 2021-03-23 10:30 UTC (permalink / raw)
To: Maxime Coquelin
Cc: dev, Xia, Chenbo, Adrian Moreno Zapata, Olivier Matz, Balazs Nemeth
On Tue, Mar 23, 2021 at 10:02 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> As done for Virtio PMD, this series improves cache utilization
> of the vhost_virtqueue struct by removing an unused field,
> making the live-migration cache dynamically allocated at
> live-migration setup time, and moving fields
> around so that hot fields are on the first cachelines.
>
> With this series, the struct vhost_virtqueue size goes
> from 832B (13 cachelines) down to 320B (5 cachelines).
>
> With this series and the virtio one, I measure a gain
> of up to 8% in an IO loop micro-benchmark with packed
> ring, and 5% with split ring.
>
> I don't have a setup at hand to run PVP testing, but
> it might be interesting to get the numbers, as I
> suspect the cache pressure is higher in this test, as
> in real use-cases.
>
> Changes in v4:
> ==============
> - Fix missing changes to boolean (Chenbo)
>
For the series,
Reviewed-by: David Marchand <david.marchand@redhat.com>
Merci !
--
David Marchand
* Re: [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly
2021-03-23 10:30 ` [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly David Marchand
@ 2021-03-29 10:52 ` Balazs Nemeth
0 siblings, 0 replies; 8+ messages in thread
From: Balazs Nemeth @ 2021-03-29 10:52 UTC (permalink / raw)
To: David Marchand, Maxime Coquelin
Cc: dev, Xia, Chenbo, Adrian Moreno Zapata, Olivier Matz
On Tue, 2021-03-23 at 11:30 +0100, David Marchand wrote:
> On Tue, Mar 23, 2021 at 10:02 AM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
> >
> > As done for Virtio PMD, this series improves cache utilization
> > of the vhost_virtqueue struct by removing unused field,
> > make the live-migration cache dynamically allocated at
> > live-migration setup time and by moving fields
> > around so that hot fields are on the first cachelines.
> >
> > With this series, The struct vhost_virtqueue size goes
> > from 832B (13 cachelines) down to 320B (5 cachelines).
> >
> > With this series and the virtio one, I measure a gain
> > of up to 8% in IO loop micro-benchmark with packed
> > ring, and 5% with split ring.
> >
> > I don't have a setup at hand to run PVP testing, but
> > it might be interresting to get the numbers as I
> > suspect the cache pressure is higher in this test as
> > in real use-cases.
> >
> > Changes in v4:
> > ==============
> > - Fix missing changes to boolean (Chenbo)
> >
>
> For the series,
> Reviewed-by: David Marchand <david.marchand@redhat.com>
>
> Merci !
>
>
Tested this in a PVP setup on ARM, giving a slight improvement in
performance. For the series:
Tested-by: Balazs Nemeth <bnemeth@redhat.com>
* Re: [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly
2021-03-23 9:02 [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly Maxime Coquelin
` (3 preceding siblings ...)
2021-03-23 10:30 ` [dpdk-dev] [PATCH v4 0/3] vhost: make virtqueue cache-friendly David Marchand
@ 2021-03-31 6:04 ` Xia, Chenbo
4 siblings, 0 replies; 8+ messages in thread
From: Xia, Chenbo @ 2021-03-31 6:04 UTC (permalink / raw)
To: Maxime Coquelin, dev, amorenoz, david.marchand, olivier.matz, bnemeth
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, March 23, 2021 5:02 PM
> To: dev@dpdk.org; Xia, Chenbo <chenbo.xia@intel.com>; amorenoz@redhat.com;
> david.marchand@redhat.com; olivier.matz@6wind.com; bnemeth@redhat.com
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Subject: [PATCH v4 0/3] vhost: make virtqueue cache-friendly
>
> As done for Virtio PMD, this series improves cache utilization
> of the vhost_virtqueue struct by removing an unused field,
> making the live-migration cache dynamically allocated at
> live-migration setup time, and moving fields
> around so that hot fields are on the first cachelines.
>
> With this series, the struct vhost_virtqueue size goes
> from 832B (13 cachelines) down to 320B (5 cachelines).
>
> With this series and the virtio one, I measure a gain
> of up to 8% in an IO loop micro-benchmark with packed
> ring, and 5% with split ring.
>
> I don't have a setup at hand to run PVP testing, but
> it might be interesting to get the numbers, as I
> suspect the cache pressure is higher in this test, as
> in real use-cases.
>
> Maxime Coquelin (3):
> vhost: remove unused Vhost virtqueue field
> vhost: move dirty logging cache out of the virtqueue
> vhost: optimize vhost virtqueue struct
>
> lib/librte_vhost/vhost.c | 21 +++++++++----
> lib/librte_vhost/vhost.h | 56 +++++++++++++++++------------------
> lib/librte_vhost/vhost_user.c | 44 +++++++++++++++++++--------
> lib/librte_vhost/virtio_net.c | 12 ++++----
> 4 files changed, 82 insertions(+), 51 deletions(-)
>
> --
> 2.30.2
Series applied to next-virtio/main, Thanks!