This patchset replaces the rte_smp barriers in vhost with C11 atomic
built-ins.

The rte_smp_*mb APIs provide full barrier functionality. However, many
use cases do not require full barriers. To support such use cases, DPDK
will adopt C11 barrier semantics and provide wrappers using C11 atomic
built-ins.[1]

With this patchset, the PVP case (vhost-user + virtio-user) has a 9.8% perf
uplift for the split in_order path and no perf degradation for the
packed in_order path under 0.001% acceptable loss on the ThunderX2 platform.

[1] http://code.dpdk.org/dpdk/latest/source/doc/guides/rel_notes/deprecation.rst

Joyce Kong (8):
  examples/vhost: relax memory ordering when enqueue/dequeue
  examples/vhost_blk: replace smp with thread fence
  vhost: remove unnecessary smp barrier for desc flags
  vhost: remove unnecessary smp barrier for avail idx
  vhost: relax full barriers for desc flags
  vhost: relax full barriers for used idx
  vhost: replace smp with thread fence for packed vring
  vhost: replace smp with thread fence for control path

 examples/vhost/virtio_net.c    | 12 ++++--------
 examples/vhost_blk/vhost_blk.c |  8 ++++----
 lib/librte_vhost/vdpa.c        |  4 ++--
 lib/librte_vhost/vhost.c       | 18 +++++++++---------
 lib/librte_vhost/vhost.h       |  6 +++---
 lib/librte_vhost/vhost_user.c  |  2 +-
 lib/librte_vhost/virtio_net.c  | 26 +++++++++++---------------
 7 files changed, 34 insertions(+), 42 deletions(-)

-- 
2.29.2
Use C11 atomic APIs with one-way barriers to replace two-way
barriers when operating enqueue/dequeue. Used->idx and avail->idx
are the synchronization points for split vring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 examples/vhost/virtio_net.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/examples/vhost/virtio_net.c b/examples/vhost/virtio_net.c
index 8ea6b36d5..64bf3d19f 100644
--- a/examples/vhost/virtio_net.c
+++ b/examples/vhost/virtio_net.c
@@ -191,7 +191,7 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	queue = &dev->queues[queue_id];
 	vr = &queue->vr;
 
-	avail_idx = *((volatile uint16_t *)&vr->avail->idx);
+	avail_idx = __atomic_load_n(&vr->avail->idx, __ATOMIC_ACQUIRE);
 	start_idx = queue->last_used_idx;
 	free_entries = avail_idx - start_idx;
 	count = RTE_MIN(count, free_entries);
@@ -224,9 +224,7 @@ vs_enqueue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 			rte_prefetch0(&vr->desc[desc_indexes[i+1]]);
 	}
 
-	rte_smp_wmb();
-
-	*(volatile uint16_t *)&vr->used->idx += count;
+	__atomic_add_fetch(&vr->used->idx, count, __ATOMIC_RELEASE);
 	queue->last_used_idx += count;
 
 	rte_vhost_vring_call(dev->vid, queue_id);
@@ -374,7 +372,7 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	queue = &dev->queues[queue_id];
 	vr = &queue->vr;
 
-	free_entries = *((volatile uint16_t *)&vr->avail->idx) -
+	free_entries = __atomic_load_n(&vr->avail->idx, __ATOMIC_ACQUIRE) -
 			queue->last_avail_idx;
 	if (free_entries == 0)
 		return 0;
@@ -429,10 +427,8 @@ vs_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
 	queue->last_avail_idx += i;
 	queue->last_used_idx += i;
-	rte_smp_wmb();
-	rte_smp_rmb();
 
-	vr->used->idx += i;
+	__atomic_add_fetch(&vr->used->idx, i, __ATOMIC_ACQ_REL);
 
 	rte_vhost_vring_call(dev->vid, queue_id);

-- 
2.29.2
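As a rough illustration of the acquire/release pairing this patch relies on,
here is a minimal, self-contained sketch using the GCC/Clang __atomic
built-ins. The names struct toy_ring, toy_enqueue() and RING_SIZE are
hypothetical and are not part of DPDK or of this patch; the sketch assumes a
single writer per index.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size; must be a power of two */

/* Hypothetical stand-in for a split vring, not the DPDK structure. */
struct toy_ring {
	uint16_t avail_idx;           /* advanced by the driver side */
	uint16_t used_idx;            /* advanced by the device side */
	uint32_t used_len[RING_SIZE]; /* data published through used_idx */
};

/* Consume up to 'count' available entries and publish them as used. */
static uint16_t
toy_enqueue(struct toy_ring *r, uint16_t last_used, uint16_t count)
{
	uint16_t avail, free_entries, i;

	assert((RING_SIZE & (RING_SIZE - 1)) == 0);

	/* Load-acquire: reads after this point cannot move before it. */
	avail = __atomic_load_n(&r->avail_idx, __ATOMIC_ACQUIRE);
	free_entries = (uint16_t)(avail - last_used);
	if (count > free_entries)
		count = free_entries;

	for (i = 0; i < count; i++)
		r->used_len[(last_used + i) & (RING_SIZE - 1)] = 64;

	/* Add-release: the stores above are ordered before the index update. */
	__atomic_add_fetch(&r->used_idx, count, __ATOMIC_RELEASE);
	return count;
}
```

A reader of used_idx that uses a load-acquire and then reads used_len[] would
observe the entries written before the release; a plain volatile access gives
no such ordering guarantee on weakly ordered CPUs.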
Simply replace the rte_smp_mb barriers with SEQ_CST atomic thread fence,
if there are no load/store operations.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 examples/vhost_blk/vhost_blk.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
index bb293d492..7ea60863d 100644
--- a/examples/vhost_blk/vhost_blk.c
+++ b/examples/vhost_blk/vhost_blk.c
@@ -86,9 +86,9 @@ enqueue_task(struct vhost_blk_task *task)
 	 */
 	used->ring[used->idx & (vq->vring.size - 1)].id = task->req_idx;
 	used->ring[used->idx & (vq->vring.size - 1)].len = task->data_len;
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 	used->idx++;
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	rte_vhost_clr_inflight_desc_split(task->ctrlr->vid,
 		vq->id, used->idx, task->req_idx);
@@ -112,12 +112,12 @@ enqueue_task_packed(struct vhost_blk_task *task)
 	desc->id = task->buffer_id;
 	desc->addr = 0;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 	if (vq->used_wrap_counter)
 		desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
 	else
 		desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	rte_vhost_clr_inflight_desc_packed(task->ctrlr->vid, vq->id,
 					   task->inflight_idx);

-- 
2.29.2
As function desc_is_avail performs a load-acquire barrier to
enforce the ordering between desc flags and desc content, it is
unnecessary to add a rte_smp_rmb barrier around the trace which
follows desc_is_avail.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/virtio_net.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6c5128665..ae6723766 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1281,8 +1281,6 @@ virtio_dev_rx_batch_packed(struct virtio_net *dev,
 			return -1;
 	}
 
-	rte_smp_rmb();
-
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
 
@@ -1343,7 +1341,6 @@ virtio_dev_rx_single_packed(struct virtio_net *dev,
 	struct buf_vector buf_vec[BUF_VECTOR_MAX];
 	uint16_t nr_descs = 0;
 
-	rte_smp_rmb();
 	if (unlikely(vhost_enqueue_single_packed(dev, vq, pkt, buf_vec,
 						 &nr_descs) < 0)) {
 		VHOST_LOG_DATA(DEBUG,

-- 
2.29.2
The ordering between avail index and desc reads has been enforced
by load-acquire for split vring, so smp_rmb barrier is not needed
behind it.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/virtio_net.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index ae6723766..c912ae354 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1494,13 +1494,10 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 	struct async_inflight_info *pkts_info = vq->async_pkts_info;
 	int n_pkts = 0;
 
-	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
-
 	/*
-	 * The ordering between avail index and
-	 * desc reads needs to be enforced.
+	 * The ordering between avail index and desc reads need to be enforced.
 	 */
-	rte_smp_rmb();
+	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 

-- 
2.29.2
Relax the full write barrier to one-way barrier for desc flags in
packed vring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/virtio_net.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index c912ae354..b779034dc 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -222,8 +222,9 @@ vhost_flush_dequeue_shadow_packed(struct virtio_net *dev,
 	struct vring_used_elem_packed *used_elem = &vq->shadow_used_packed[0];
 
 	vq->desc_packed[vq->shadow_last_used_idx].id = used_elem->id;
-	rte_smp_wmb();
-	vq->desc_packed[vq->shadow_last_used_idx].flags = used_elem->flags;
+	/* desc flags is the synchronization point for virtio packed vring */
+	__atomic_store_n(&vq->desc_packed[vq->shadow_last_used_idx].flags,
+			 used_elem->flags, __ATOMIC_RELEASE);
 
 	vhost_log_cache_used_vring(dev, vq, vq->shadow_last_used_idx *
 				   sizeof(struct vring_packed_desc),

-- 
2.29.2
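To make the "flags as synchronization point" idea concrete, the following is
a small sketch, not the vhost code itself. The names struct toy_desc,
TOY_F_USED, toy_mark_used() and toy_read_used() are hypothetical, and the
flag value is an arbitrary placeholder rather than a virtio constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_USED ((uint16_t)0x80) /* hypothetical flag bit */

/* Hypothetical two-field descriptor, not struct vring_packed_desc. */
struct toy_desc {
	uint16_t id;
	uint16_t flags;
};

/* Writer: fill the payload, then publish it with a store-release on flags. */
static void
toy_mark_used(struct toy_desc *d, uint16_t id)
{
	assert(d != NULL);
	d->id = id;
	/* The id store above cannot be reordered after this store. */
	__atomic_store_n(&d->flags, TOY_F_USED, __ATOMIC_RELEASE);
}

/* Reader: load-acquire the flags, and read the payload only if published. */
static bool
toy_read_used(struct toy_desc *d, uint16_t *id)
{
	uint16_t flags = __atomic_load_n(&d->flags, __ATOMIC_ACQUIRE);

	if (!(flags & TOY_F_USED))
		return false;
	*id = d->id;
	return true;
}
```

The store-release replaces the separate write barrier plus plain store: only
the accesses before the flags store are ordered, which is all the reader
needs.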
Used idx can be synchronized by one-way barrier instead of full
write barrier for split vring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/vdpa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
index ae6fdd24e..99a926a77 100644
--- a/lib/librte_vhost/vdpa.c
+++ b/lib/librte_vhost/vdpa.c
@@ -217,8 +217,8 @@ rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m)
 		idx++;
 	}
 
-	rte_smp_wmb();
-	vq->used->idx = idx_m;
+	/* used idx is the synchronization point for the split vring */
+	__atomic_store_n(&vq->used->idx, idx_m, __ATOMIC_RELEASE);
 
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
 		vring_used_event(s_vring) = idx_m;

-- 
2.29.2
Simply replace smp barriers with atomic thread fence for
virtio packed vring.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/virtio_net.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index b779034dc..e145fcbc2 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -171,7 +171,8 @@ vhost_flush_enqueue_shadow_packed(struct virtio_net *dev,
 			used_idx -= vq->size;
 	}
 
-	rte_smp_wmb();
+	/* The ordering for storing desc flags needs to be enforced. */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	for (i = 0; i < vq->shadow_used_idx; i++) {
 		uint16_t flags;
@@ -254,7 +255,7 @@ vhost_flush_enqueue_batch_packed(struct virtio_net *dev,
 		vq->desc_packed[vq->last_used_idx + i].len = lens[i];
 	}
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		vq->desc_packed[vq->last_used_idx + i].flags = flags;
@@ -313,7 +314,7 @@ vhost_shadow_dequeue_batch_packed(struct virtio_net *dev,
 		vq->desc_packed[vq->last_used_idx + i].len = 0;
 	}
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	vhost_for_each_try_unroll(i, begin, PACKED_BATCH_SIZE)
 		vq->desc_packed[vq->last_used_idx + i].flags = flags;
@@ -2246,7 +2247,7 @@ vhost_reserve_avail_batch_packed(struct virtio_net *dev,
 			return -1;
 	}
 
-	rte_smp_rmb();
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;

-- 
2.29.2
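The batch paths above keep a standalone fence because one barrier covers
several flag stores or loads. A minimal sketch of that shape follows, using
the __atomic_thread_fence built-in directly instead of the DPDK wrapper. The
names struct toy_pdesc, BATCH, toy_flush_batch() and toy_reserve_batch() are
hypothetical, and the flag accesses are written as relaxed atomics here (the
patch uses plain accesses) so that the sketch is well defined under the C11
model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH 4 /* hypothetical batch size */

/* Hypothetical descriptor, not struct vring_packed_desc. */
struct toy_pdesc {
	uint32_t len;
	uint16_t flags;
};

/* Writer: fill every len, issue one release fence, then store every flag. */
static void
toy_flush_batch(struct toy_pdesc *d, const uint32_t *lens, uint16_t flags)
{
	int i;

	assert(d != NULL && lens != NULL);
	for (i = 0; i < BATCH; i++)
		d[i].len = lens[i];

	/* Orders all len stores above before all flag stores below. */
	__atomic_thread_fence(__ATOMIC_RELEASE);

	for (i = 0; i < BATCH; i++)
		__atomic_store_n(&d[i].flags, flags, __ATOMIC_RELAXED);
}

/* Reader: check every flag, issue one acquire fence, then read every len. */
static int
toy_reserve_batch(struct toy_pdesc *d, uint16_t want, uint32_t *lens)
{
	int i;

	for (i = 0; i < BATCH; i++)
		if (__atomic_load_n(&d[i].flags, __ATOMIC_RELAXED) != want)
			return -1;

	/* Orders the flag loads above before the len loads below. */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);

	for (i = 0; i < BATCH; i++)
		lens[i] = d[i].len;
	return 0;
}
```

A per-descriptor store-release would also work, but it would pay for the
ordering on every element instead of once per batch.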
Simply replace the smp barriers with atomic thread fence for vhost control
path, if there are no synchronization points.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_vhost/vhost.c      | 18 +++++++++---------
 lib/librte_vhost/vhost.h      |  6 +++---
 lib/librte_vhost/vhost_user.c |  2 +-
 lib/librte_vhost/virtio_net.c |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index b83cf639e..c69b10560 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -106,7 +106,7 @@ __vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len)
 		return;
 
 	/* To make sure guest memory updates are committed before logging */
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	page = addr / VHOST_LOG_PAGE;
 	while (page * VHOST_LOG_PAGE < addr + len) {
@@ -144,7 +144,7 @@ __vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	if (unlikely(!dev->log_base))
 		return;
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	log_base = (unsigned long *)(uintptr_t)dev->log_base;
 
@@ -163,7 +163,7 @@ __vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq)
 #endif
 	}
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	vq->log_cache_nb_elem = 0;
 }
@@ -190,7 +190,7 @@ vhost_log_cache_page(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		 * No more room for a new log cache entry,
 		 * so write the dirty log map directly.
 		 */
-		rte_smp_wmb();
+		rte_atomic_thread_fence(__ATOMIC_RELEASE);
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 
 		return;
@@ -1097,11 +1097,11 @@ rte_vhost_clr_inflight_desc_split(int vid, uint16_t vring_idx,
 	if (unlikely(idx >= vq->size))
 		return -1;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	vq->inflight_split->desc[idx].inflight = 0;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	vq->inflight_split->used_idx = last_used_idx;
 	return 0;
@@ -1140,11 +1140,11 @@ rte_vhost_clr_inflight_desc_packed(int vid, uint16_t vring_idx,
 	if (unlikely(head >= vq->size))
 		return -1;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	inflight_info->desc[head].inflight = 0;
 
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	inflight_info->old_free_head = inflight_info->free_head;
 	inflight_info->old_used_idx = inflight_info->used_idx;
@@ -1330,7 +1330,7 @@ vhost_enable_notify_packed(struct virtio_net *dev,
 			vq->avail_wrap_counter << 15;
 	}
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	vq->device_event->flags = flags;
 	return 0;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 361c9f79b..23e11ff75 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -728,7 +728,7 @@ static __rte_always_inline void
 vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
 	/* Flush used->idx update before we read avail->flags. */
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	/* Don't kick guest if we don't reach index specified by guest. */
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
@@ -770,7 +770,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	bool signalled_used_valid, kick = false;
 
 	/* Flush used desc update. */
-	rte_smp_mb();
+	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 
 	if (!(dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))) {
 		if (vq->driver_event->flags !=
@@ -796,7 +796,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq)
 			goto kick;
 	}
 
-	rte_smp_rmb();
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
 
 	off_wrap = vq->driver_event->off_wrap;
 	off = off_wrap & ~(1 << 15);
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 45c8ac09d..6e94a9bb6 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1690,7 +1690,7 @@ vhost_check_queue_inflights_split(struct virtio_net *dev,
 
 	if (inflight_split->used_idx != used->idx) {
 		inflight_split->desc[last_io].inflight = 0;
-		rte_smp_mb();
+		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
 		inflight_split->used_idx = used->idx;
 	}
 
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index e145fcbc2..fec08b262 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1663,7 +1663,7 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
 			queue_id, 0, count - vq->async_last_pkts_n);
 	n_pkts_cpl += vq->async_last_pkts_n;
 
-	rte_smp_wmb();
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
 
 	while (likely((n_pkts_put < count) && n_inflight)) {
 		uint16_t info_idx = (start_idx + n_pkts_put) & (vq_size - 1);

-- 
2.29.2
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Use C11 atomic APIs with one-way barriers to replace two-way
> barriers when operating enqueue/dequeue. Used->idx and avail->idx
> are the synchronization points for split vring.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> examples/vhost/virtio_net.c | 12 ++++--------
> 1 file changed, 4 insertions(+), 8 deletions(-)
Nice!
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Simply replace the rte_smp_mb barriers with SEQ_CST atomic thread fence,
> if there is no load/store operations.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  examples/vhost_blk/vhost_blk.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/examples/vhost_blk/vhost_blk.c b/examples/vhost_blk/vhost_blk.c
> index bb293d492..7ea60863d 100644
> --- a/examples/vhost_blk/vhost_blk.c
> +++ b/examples/vhost_blk/vhost_blk.c
> @@ -86,9 +86,9 @@ enqueue_task(struct vhost_blk_task *task)
>  	 */
>  	used->ring[used->idx & (vq->vring.size - 1)].id = task->req_idx;
>  	used->ring[used->idx & (vq->vring.size - 1)].len = task->data_len;

From here

> -	rte_smp_mb();
> +	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>  	used->idx++;
> -	rte_smp_mb();
> +	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);

to here, couldn't it be replaced with:

__atomic_add_fetch(&used->idx, 1, __ATOMIC_RELEASE);

?

>  	rte_vhost_clr_inflight_desc_split(task->ctrlr->vid,
>  		vq->id, used->idx, task->req_idx);
> @@ -112,12 +112,12 @@ enqueue_task_packed(struct vhost_blk_task *task)
>  	desc->id = task->buffer_id;
>  	desc->addr = 0;
>
> -	rte_smp_mb();
> +	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>  	if (vq->used_wrap_counter)
>  		desc->flags |= VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;
>  	else
>  		desc->flags &= ~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
> -	rte_smp_mb();
> +	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
>
>  	rte_vhost_clr_inflight_desc_packed(task->ctrlr->vid, vq->id,
>  					   task->inflight_idx);
>
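For readers comparing the two forms discussed in this review, here is a
small sketch that puts them side by side. The function names
publish_with_fences() and publish_with_release() are hypothetical, and the
GCC/Clang __atomic_thread_fence built-in stands in for the DPDK
rte_atomic_thread_fence wrapper so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shape used in the patch: a plain increment between two full fences. */
static uint16_t
publish_with_fences(uint16_t *idx)
{
	assert(idx != NULL);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	(*idx)++;
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	return *idx;
}

/* Shape suggested in the review: one release read-modify-write. */
static uint16_t
publish_with_release(uint16_t *idx)
{
	assert(idx != NULL);
	return __atomic_add_fetch(idx, 1, __ATOMIC_RELEASE);
}
```

Both forms order the earlier ring stores before the index update. They are
not strictly equivalent, though: a release operation does not by itself order
the index update against later loads, which the trailing full fence does.
Whether that extra ordering is needed at this call site is not settled in
this thread.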
On 12/21/20 4:50 PM, Joyce Kong wrote:
> As function desc_is_avail performs a load-acquire barrier to
> enforce the ordering between desc flags and desc content, it is
> unnecessary to add a rte_smp_rmb barrier around the trace which
> follows desc_is_avail.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> lib/librte_vhost/virtio_net.c | 3 ---
> 1 file changed, 3 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 12/21/20 4:50 PM, Joyce Kong wrote:
> The ordering between avail index and desc reads has been enforced
> by load-acquire for split vring, so smp_rmb barrier is not needed
> behind it.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> lib/librte_vhost/virtio_net.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Relax the full read barrier to one-way barrier for desc flags in
> packed vring.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> lib/librte_vhost/virtio_net.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Used idx can be synchronized by one-way barrier instead of full
> write barrier for split vring.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> lib/librte_vhost/vdpa.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Simply replace smp barriers with atomic thread fence for
> virtio packed vring.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> lib/librte_vhost/virtio_net.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 12/21/20 4:50 PM, Joyce Kong wrote:
> Simply replace the smp barriers with atomic thread fence for vhost control
> path, if there are no synchronization points.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> lib/librte_vhost/vhost.c | 18 +++++++++---------
> lib/librte_vhost/vhost.h | 6 +++---
> lib/librte_vhost/vhost_user.c | 2 +-
> lib/librte_vhost/virtio_net.c | 2 +-
> 4 files changed, 14 insertions(+), 14 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 12/21/20 4:50 PM, Joyce Kong wrote:
> This patchset is to replace rte smp barriers in vhost with C11 atomic
> built-ins.
>
> The rte_smp_*mb APIs provide full barrier functionality. However, many
> use cases do not require full barriers. To support such use cases, DPDK
> will adopt C11 barrier semantics and provide wrappers using C11 atomic
> built-ins.[1]
>
> With this patchset, PVP case(vhost-user + virtio-user) has 9.8% perf
> uplift for the split in_order path and no perf degradation for the
> packed in_order path under 0.001% acceptable loss on ThunderX2 platform.
>
> [1] http://code.dpdk.org/dpdk/latest/source/doc/guides/rel_notes/deprecation.rst
>
> Joyce Kong (8):
> examples/vhost: relax memory ordering when enqueue/dequeue
> examples/vhost_blk: replace smp with thread fence
> vhost: remove unnecessary smp barrier for desc flags
> vhost: remove unnecessary smp barrier for avail idx
> vhost: relax full barriers for desc flags
> vhost: relax full barriers for used idx
> vhost: replace smp with thread fence for packed vring
> vhost: replace smp with thread fence for control path
>
> examples/vhost/virtio_net.c | 12 ++++--------
> examples/vhost_blk/vhost_blk.c | 8 ++++----
> lib/librte_vhost/vdpa.c | 4 ++--
> lib/librte_vhost/vhost.c | 18 +++++++++---------
> lib/librte_vhost/vhost.h | 6 +++---
> lib/librte_vhost/vhost_user.c | 2 +-
> lib/librte_vhost/virtio_net.c | 26 +++++++++++---------------
> 7 files changed, 34 insertions(+), 42 deletions(-)
>
Series applied to dpdk-next-virtio/main.
Thanks,
Maxime