* [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic
@ 2020-12-21 15:50 Joyce Kong
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 1/8] examples/vhost: relax memory ordering when enqueue/dequeue Joyce Kong
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
This patchset replaces the rte_smp_* barriers in vhost with C11 atomic
built-ins.
The rte_smp_*mb APIs provide full barrier functionality. However, many
use cases do not require full barriers. To support such use cases, DPDK
will adopt C11 barrier semantics and provide wrappers using C11 atomic
built-ins.[1]
With this patchset, the PVP case (vhost-user + virtio-user) shows a 9.8%
performance uplift for the split in_order path and no performance
degradation for the packed in_order path, with 0.001% acceptable loss, on
the ThunderX2 platform.
[1] http://code.dpdk.org/dpdk/latest/source/doc/guides/rel_notes/deprecation.rst
Joyce Kong (8):
  examples/vhost: relax memory ordering when enqueue/dequeue
  examples/vhost_blk: replace smp with thread fence
  vhost: remove unnecessary smp barrier for desc flags
  vhost: remove unnecessary smp barrier for avail idx
  vhost: relax full barriers for desc flags
  vhost: relax full barriers for used idx
  vhost: replace smp with thread fence for packed vring
  vhost: replace smp with thread fence for control path
 examples/vhost/virtio_net.c    | 12 ++++--------
 examples/vhost_blk/vhost_blk.c |  8 ++++----
 lib/librte_vhost/vdpa.c        |  4 ++--
 lib/librte_vhost/vhost.c       | 18 +++++++++---------
 lib/librte_vhost/vhost.h       |  6 +++---
 lib/librte_vhost/vhost_user.c  |  2 +-
 lib/librte_vhost/virtio_net.c  | 26 +++++++++++---------------
 7 files changed, 34 insertions(+), 42 deletions(-)
-- 
2.29.2
* [dpdk-dev] [PATCH v1 1/8] examples/vhost: relax memory ordering when enqueue/dequeue
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
@ 2020-12-21 15:50 ` Joyce Kong
  2021-01-07 16:33   ` Maxime Coquelin
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 2/8] examples/vhost_blk: replace smp with thread fence Joyce Kong
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
Use C11 atomic APIs with one-way barriers to replace two-way
barriers in the enqueue/dequeue paths. Used->idx and avail->idx
are the synchronization points for the split vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 examples/vhost/virtio_net.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 8ea6b36d5..64bf3d19f 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -191,7 +191,7 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	queue = &dev->queues[queue_id];
 	vr    = &queue->vr;
 
-	avail_idx = *((volatile uint16_t *)&vr->avail->idx);
+	avail_idx = __atomic_load_n(&vr->avail->idx, __ATOMIC_ACQUIRE);
 	start_idx = queue->last_used_idx;
 	free_entries = avail_idx - start_idx;
 	count = RTE_MIN(count, free_entries);
@@ -224,9 +224,7 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			rte_prefetch0(&vr->desc[desc_indexes[i+1]]);
 	}
 
-	rte_smp_wmb();
-
-	*(volatile uint16_t *)&vr->used->idx += count;
+	__atomic_add_fetch(&vr->used->idx, count, __ATOMIC_RELEASE);
 	queue->last_used_idx += count;
 
 	rte_vhost_vring_call(dev->vid, queue_id);
@@ -374,7 +372,7 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	queue = &dev->queues[queue_id];
 	vr    = &queue->vr;
 
-	free_entries = *((volatile uint16_t *)&vr->avail->idx) -
+	free_entries = __atomic_load_n(&vr->avail->idx, __ATOMIC_ACQUIRE) -
 			queue->last_avail_idx;
 	if (free_entries == 0)
 		return 0;
@@ -429,10 +427,8 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 
 	queue->last_avail_idx += i;
 	queue->last_used_idx += i;
-	rte_smp_wmb();
-	rte_smp_rmb();
 
-	vr->used->idx += i;
+	__atomic_add_fetch(&vr->used->idx, i, __ATOMIC_ACQ_REL);
 
 	rte_vhost_vring_call(dev->vid, queue_id);
 
-- 
2.29.2
* [dpdk-dev] [PATCH v1 2/8] examples/vhost_blk: replace smp with thread fence
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 1/8] examples/vhost: relax memory ordering when enqueue/dequeue Joyce Kong
@ 2020-12-21 15:50 ` Joyce Kong
  2021-01-07 16:35   ` Maxime Coquelin
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 3/8] vhost: remove unnecessary smp barrier for desc flags Joyce Kong
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
Simply replace the rte_smp_mb barriers with SEQ_CST atomic thread fences,
if there are no load/store operations.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 examples/vhost_blk/vhost_blk.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
index bb293d492..7ea60863d 100644
--- a/examples/vhost_blk/vhost_blk.c
+++ b/examples/vhost_blk/vhost_blk.c
@@ -86,9 +86,9 @@ enqueue_task(struct vhost_blk_task *task)
 	 */
 	used->ring[used->idx & (vq->vring.size - 1)].id = task->req_idx;
 	used->ring[used->idx & (vq->vring.size - 1)].len = task->data_len;
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 	used->idx++;
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	rte_vhost_clr_inflight_desc_split(task->ctrlr->vid,
 		vq->id, used->idx, task->req_idx);
@@ -112,12 +112,12 @@ enqueue_task_packed(struct vhost_blk_task *task)
 	desc->id = task->buffer_id;
 	desc->addr = 0;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 	if (vq->used_wrap_counter)
 		desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
 	else
 		desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	rte_vhost_clr_inflight_desc_packed(task->ctrlr->vid, vq->id,
 					   task->inflight_idx);
-- 
2.29.2
* [dpdk-dev] [PATCH v1 3/8] vhost: remove unnecessary smp barrier for desc flags
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 1/8] examples/vhost: relax memory ordering when enqueue/dequeue Joyce Kong
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 2/8] examples/vhost_blk: replace smp with thread fence Joyce Kong
@ 2020-12-21 15:50 ` Joyce Kong
  2021-01-07 16:38   ` Maxime Coquelin
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 4/8] vhost: remove unnecessary smp barrier for avail idx Joyce Kong
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
As the function desc_is_avail performs a load-acquire barrier to
enforce the ordering between desc flags and desc content, it is
unnecessary to add an rte_smp_rmb barrier around the trace which
follows desc_is_avail.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/virtio_net.c | 3 ---
 1 file changed, 3 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6c5128665..ae6723766 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1281,8 +1281,6 @@ virtio_dev_rx_batch_packed(struct virtio_net *dev,
 			return -1;
 	}
 
-	rte_smp_rmb();
-
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
 
@@ -1343,7 +1341,6 @@ virtio_dev_rx_single_packed(struct virtio_net *dev,
 	struct buf_vector buf_vec[BUF_VECTOR_MAX];
 	uint16_t nr_descs = 0;
 
-	rte_smp_rmb();
 	if (unlikely(vhost_enqueue_single_packed(dev, vq, pkt, buf_vec,
 						 &nr_descs) < 0)) {
 		VHOST_LOG_DATA(DEBUG,
-- 
2.29.2
* [dpdk-dev] [PATCH v1 4/8] vhost: remove unnecessary smp barrier for avail idx
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
                   ` (2 preceding siblings ...)
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 3/8] vhost: remove unnecessary smp barrier for desc flags Joyce Kong
@ 2020-12-21 15:50 ` Joyce Kong
  2021-01-07 16:43   ` Maxime Coquelin
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 5/8] vhost: relax full barriers for desc flags Joyce Kong
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
The ordering between avail index and desc reads is already enforced
by the load-acquire for split vring, so the smp_rmb barrier after it
is not needed.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/virtio_net.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index ae6723766..c912ae354 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1494,13 +1494,10 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 	struct async_inflight_info *pkts_info = vq->async_pkts_info;
 	int n_pkts = 0;
 
-	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
-
 	/*
-	 * The ordering between avail index and
-	 * desc reads needs to be enforced.
+	 * The ordering between avail index and desc reads need to be enforced.
 	 */
-	rte_smp_rmb();
+	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
-- 
2.29.2
* [dpdk-dev] [PATCH v1 5/8] vhost: relax full barriers for desc flags
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
                   ` (3 preceding siblings ...)
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 4/8] vhost: remove unnecessary smp barrier for avail idx Joyce Kong
@ 2020-12-21 15:50 ` Joyce Kong
  2021-01-07 16:45   ` Maxime Coquelin
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 6/8] vhost: relax full barriers for used idx Joyce Kong
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
Relax the full write barrier to a one-way barrier for desc flags in
the packed vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/virtio_net.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index c912ae354..b779034dc 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -222,8 +222,9 @@ vhost_flush_dequeue_shadow_packed(struct virtio_net *dev,
 	struct vring_used_elem_packed *used_elem = &vq->shadow_used_packed[0];
 
 	vq->desc_packed[vq->shadow_last_used_idx].id = used_elem->id;
-	rte_smp_wmb();
-	vq->desc_packed[vq->shadow_last_used_idx].flags = used_elem->flags;
+	/* desc flags is the synchronization point for virtio packed vring */
+	__atomic_store_n(&vq->desc_packed[vq->shadow_last_used_idx].flags,
+			 used_elem->flags, __ATOMIC_RELEASE);
 
 	vhost_log_cache_used_vring(dev, vq, vq->shadow_last_used_idx *
 				   sizeof(struct vring_packed_desc),
-- 
2.29.2
* [dpdk-dev] [PATCH v1 6/8] vhost: relax full barriers for used idx
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
                   ` (4 preceding siblings ...)
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 5/8] vhost: relax full barriers for desc flags Joyce Kong
@ 2020-12-21 15:50 ` Joyce Kong
  2021-01-07 16:46   ` Maxime Coquelin
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 7/8] vhost: replace smp with thread fence for packed vring Joyce Kong
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
The used idx can be synchronized with a one-way barrier instead of a
full write barrier for the split vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/vdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index ae6fdd24e..99a926a77 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -217,8 +217,8 @@ rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m)
 		idx++;
 	}
 
-	rte_smp_wmb();
-	vq->used->idx = idx_m;
+	/* used idx is the synchronization point for the split vring */
+	__atomic_store_n(&vq->used->idx, idx_m, __ATOMIC_RELEASE);
 
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
 		vring_used_event(s_vring) = idx_m;
-- 
2.29.2
* [dpdk-dev] [PATCH v1 7/8] vhost: replace smp with thread fence for packed vring
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
                   ` (5 preceding siblings ...)
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 6/8] vhost: relax full barriers for used idx Joyce Kong
@ 2020-12-21 15:50 ` Joyce Kong
  2021-01-07 17:06   ` Maxime Coquelin
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 8/8] vhost: replace smp with thread fence for control path Joyce Kong
  2021-01-08  9:16 ` [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Maxime Coquelin
  8 siblings, 1 reply; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
Simply replace smp barriers with atomic thread fences for the
virtio packed vring.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/virtio_net.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index b779034dc..e145fcbc2 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -171,7 +171,8 @@ vhost_flush_enqueue_shadow_packed(struct virtio_net *dev,
 			used_idx -= vq->size;
 	}
 
-	rte_smp_wmb();
+	/* The ordering for storing desc flags needs to be enforced. */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	for (i = 0; i < vq->shadow_used_idx; i++) {
 		uint16_t flags;
@@ -254,7 +255,7 @@ vhost_flush_enqueue_batch_packed(struct virtio_net *dev,
 		vq->desc_packed[vq->last_used_idx + i].len = lens[i];
 	}
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		vq->desc_packed[vq->last_used_idx + i].flags = flags;
@@ -313,7 +314,7 @@ vhost_shadow_dequeue_batch_packed(struct virtio_net *dev,
 		vq->desc_packed[vq->last_used_idx + i].len = 0;
 	}
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 	vhost_for_each_try_unroll(i, begin, PACKED_BATCH_SIZE)
 		vq->desc_packed[vq->last_used_idx + i].flags = flags;
 
@@ -2246,7 +2247,7 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev,
 			return -1;
 	}
 
-	rte_smp_rmb();
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
-- 
2.29.2
* [dpdk-dev] [PATCH v1 8/8] vhost: replace smp with thread fence for control path
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
                   ` (6 preceding siblings ...)
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 7/8] vhost: replace smp with thread fence for packed vring Joyce Kong
@ 2020-12-21 15:50 ` Joyce Kong
  2021-01-07 17:15   ` Maxime Coquelin
  2021-01-08  9:16 ` [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Maxime Coquelin
  8 siblings, 1 reply; 18+ messages in thread
From: Joyce Kong @ 2020-12-21 15:50 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
Simply replace the smp barriers with atomic thread fences for the vhost
control path, if there are no synchronization points.
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/vhost.c      | 18 +++++++++---------
 lib/librte_vhost/vhost.h      |  6 +++---
 lib/librte_vhost/vhost_user.c |  2 +-
 lib/librte_vhost/virtio_net.c |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index b83cf639e..c69b10560 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -106,7 +106,7 @@ __vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len)
 		return;
 
 	/* To make sure guest memory updates are committed before logging */
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	page = addr / VHOST_LOG_PAGE;
 	while (page * VHOST_LOG_PAGE < addr + len) {
@@ -144,7 +144,7 @@ __vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	if (unlikely(!dev->log_base))
 		return;
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	log_base = (unsigned long *)(uintptr_t)dev->log_base;
 
@@ -163,7 +163,7 @@ __vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
 #endif
 	}
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	vq->log_cache_nb_elem = 0;
 }
@@ -190,7 +190,7 @@ vhost_log_cache_page(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		 * No more room for a new log cache entry,
 		 * so write the dirty log map directly.
 		 */
-		rte_smp_wmb();
+		rte_atomic_thread_fence(__ATOMIC_RELEASE);
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 
 		return;
@@ -1097,11 +1097,11 @@ rte_vhost_clr_inflight_desc_split(int vid, uint16_t vring_idx,
 	if (unlikely(idx >= vq->size))
 		return -1;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	vq->inflight_split->desc[idx].inflight = 0;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	vq->inflight_split->used_idx = last_used_idx;
 	return 0;
@@ -1140,11 +1140,11 @@ rte_vhost_clr_inflight_desc_packed(int vid, uint16_t vring_idx,
 	if (unlikely(head >= vq->size))
 		return -1;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	inflight_info->desc[head].inflight = 0;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	inflight_info->old_free_head = inflight_info->free_head;
 	inflight_info->old_used_idx = inflight_info->used_idx;
@@ -1330,7 +1330,7 @@ vhost_enable_notify_packed(struct virtio_net *dev,
 			vq->avail_wrap_counter << 15;
 	}
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	vq->device_event->flags = flags;
 	return 0;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 361c9f79b..23e11ff75 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -728,7 +728,7 @@ static __rte_always_inline void
 vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
 	/* Flush used->idx update before we read avail->flags. */
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	/* Don't kick guest if we don't reach index specified by guest. */
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
@@ -770,7 +770,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	bool signalled_used_valid, kick = false;
 
 	/* Flush used desc update. */
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	if (!(dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))) {
 		if (vq->driver_event->flags !=
@@ -796,7 +796,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		goto kick;
 	}
 
-	rte_smp_rmb();
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 
 	off_wrap = vq->driver_event->off_wrap;
 	off = off_wrap & ~(1 << 15);
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 45c8ac09d..6e94a9bb6 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1690,7 +1690,7 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
 
 	if (inflight_split->used_idx != used->idx) {
 		inflight_split->desc[last_io].inflight = 0;
-		rte_smp_mb();
+		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 		inflight_split->used_idx = used->idx;
 	}
 
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index e145fcbc2..fec08b262 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1663,7 +1663,7 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
 			queue_id, 0, count - vq->async_last_pkts_n);
 	n_pkts_cpl += vq->async_last_pkts_n;
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	while (likely((n_pkts_put < count) && n_inflight)) {
 		uint16_t info_idx = (start_idx + n_pkts_put) & (vq_size - 1);
-- 
2.29.2
* Re: [dpdk-dev] [PATCH v1 1/8] examples/vhost: relax memory ordering when enqueue/dequeue
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 1/8] examples/vhost: relax memory ordering when enqueue/dequeue Joyce Kong
@ 2021-01-07 16:33   ` Maxime Coquelin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-07 16:33 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Use C11 atomic APIs with one-way barriers to replace two-way
> barriers in the enqueue/dequeue paths. Used->idx and avail->idx
> are the synchronization points for the split vring.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  examples/vhost/virtio_net.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
Nice!
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v1 2/8] examples/vhost_blk: replace smp with thread fence
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 2/8] examples/vhost_blk: replace smp with thread fence Joyce Kong
@ 2021-01-07 16:35   ` Maxime Coquelin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-07 16:35 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Simply replace the rte_smp_mb barriers with SEQ_CST atomic thread fences,
> if there are no load/store operations.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  examples/vhost_blk/vhost_blk.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
> index bb293d492..7ea60863d 100644
> --- a/examples/vhost_blk/vhost_blk.c
> +++ b/examples/vhost_blk/vhost_blk.c
> @@ -86,9 +86,9 @@ enqueue_task(struct vhost_blk_task *task)
>  	 */
>  	used->ring[used->idx & (vq->vring.size - 1)].id = task->req_idx;
>  	used->ring[used->idx & (vq->vring.size - 1)].len = task->data_len;
From here
> -	rte_smp_mb();
> +	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>  	used->idx++;
> -	rte_smp_mb();
> +	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
to here, couldn't it be replaced with:
__atomic_add_fetch(&used->idx, 1, __ATOMIC_RELEASE);
?
>  	rte_vhost_clr_inflight_desc_split(task->ctrlr->vid,
>  		vq->id, used->idx, task->req_idx);
> @@ -112,12 +112,12 @@ enqueue_task_packed(struct vhost_blk_task *task)
>  	desc->id = task->buffer_id;
>  	desc->addr = 0;
>  
> -	rte_smp_mb();
> +	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>  	if (vq->used_wrap_counter)
>  		desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
>  	else
>  		desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
> -	rte_smp_mb();
> +	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>  
>  	rte_vhost_clr_inflight_desc_packed(task->ctrlr->vid, vq->id,
>  					   task->inflight_idx);
> 
* Re: [dpdk-dev] [PATCH v1 3/8] vhost: remove unnecessary smp barrier for desc flags
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 3/8] vhost: remove unnecessary smp barrier for desc flags Joyce Kong
@ 2021-01-07 16:38   ` Maxime Coquelin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-07 16:38 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> As the function desc_is_avail performs a load-acquire barrier to
> enforce the ordering between desc flags and desc content, it is
> unnecessary to add an rte_smp_rmb barrier around the trace which
> follows desc_is_avail.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_vhost/virtio_net.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v1 4/8] vhost: remove unnecessary smp barrier for avail idx
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 4/8] vhost: remove unnecessary smp barrier for avail idx Joyce Kong
@ 2021-01-07 16:43   ` Maxime Coquelin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-07 16:43 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> The ordering between avail index and desc reads is already enforced
> by the load-acquire for split vring, so the smp_rmb barrier after it
> is not needed.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_vhost/virtio_net.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v1 5/8] vhost: relax full barriers for desc flags
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 5/8] vhost: relax full barriers for desc flags Joyce Kong
@ 2021-01-07 16:45   ` Maxime Coquelin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-07 16:45 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Relax the full write barrier to a one-way barrier for desc flags in
> the packed vring.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_vhost/virtio_net.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v1 6/8] vhost: relax full barriers for used idx
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 6/8] vhost: relax full barriers for used idx Joyce Kong
@ 2021-01-07 16:46   ` Maxime Coquelin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-07 16:46 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> The used idx can be synchronized with a one-way barrier instead of a
> full write barrier for the split vring.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_vhost/vdpa.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v1 7/8] vhost: replace smp with thread fence for packed vring
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 7/8] vhost: replace smp with thread fence for packed vring Joyce Kong
@ 2021-01-07 17:06   ` Maxime Coquelin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-07 17:06 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Simply replace smp barriers with atomic thread fences for the
> virtio packed vring.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_vhost/virtio_net.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v1 8/8] vhost: replace smp with thread fence for control path
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 8/8] vhost: replace smp with thread fence for control path Joyce Kong
@ 2021-01-07 17:15   ` Maxime Coquelin
  0 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-07 17:15 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Simply replace the smp barriers with atomic thread fences for the vhost
> control path, if there are no synchronization points.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_vhost/vhost.c      | 18 +++++++++---------
>  lib/librte_vhost/vhost.h      |  6 +++---
>  lib/librte_vhost/vhost_user.c |  2 +-
>  lib/librte_vhost/virtio_net.c |  2 +-
>  4 files changed, 14 insertions(+), 14 deletions(-)
> 
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic
  2020-12-21 15:50 [dpdk-dev] [PATCH v1 0/8] replace smp barriers in vhost with C11 atomic Joyce Kong
                   ` (7 preceding siblings ...)
  2020-12-21 15:50 ` [dpdk-dev] [PATCH v1 8/8] vhost: replace smp with thread fence for control path Joyce Kong
@ 2021-01-08  9:16 ` Maxime Coquelin
  8 siblings, 0 replies; 18+ messages in thread
From: Maxime Coquelin @ 2021-01-08  9:16 UTC (permalink / raw)
  To: Joyce Kong, chenbo.xia, honnappa.nagarahalli, ruifeng.wang; +Cc: dev, nd
On 12/21/20 4:50 PM, Joyce Kong wrote:
> This patchset replaces the rte_smp_* barriers in vhost with C11 atomic
> built-ins.
> 
> The rte_smp_*mb APIs provide full barrier functionality. However, many
> use cases do not require full barriers. To support such use cases, DPDK
> will adopt C11 barrier semantics and provide wrappers using C11 atomic
> built-ins.[1]
> 
> With this patchset, the PVP case (vhost-user + virtio-user) shows a 9.8%
> performance uplift for the split in_order path and no performance
> degradation for the packed in_order path, with 0.001% acceptable loss, on
> the ThunderX2 platform.
> 
> [1] http://code.dpdk.org/dpdk/latest/source/doc/guides/rel_notes/deprecation.rst
> 
> Joyce Kong (8):
>   examples/vhost: relax memory ordering when enqueue/dequeue
>   examples/vhost_blk: replace smp with thread fence
>   vhost: remove unnecessary smp barrier for desc flags
>   vhost: remove unnecessary smp barrier for avail idx
>   vhost: relax full barriers for desc flags
>   vhost: relax full barriers for used idx
>   vhost: replace smp with thread fence for packed vring
>   vhost: replace smp with thread fence for control path
> 
>  examples/vhost/virtio_net.c    | 12 ++++--------
>  examples/vhost_blk/vhost_blk.c |  8 ++++----
>  lib/librte_vhost/vdpa.c        |  4 ++--
>  lib/librte_vhost/vhost.c       | 18 +++++++++---------
>  lib/librte_vhost/vhost.h       |  6 +++---
>  lib/librte_vhost/vhost_user.c  |  2 +-
>  lib/librte_vhost/virtio_net.c  | 26 +++++++++++---------------
>  7 files changed, 34 insertions(+), 42 deletions(-)
> 
Series applied to dpdk-next-virtio/main.
Thanks,
Maxime