From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Wathsala Vithanage" <wathsala.vithanage@arm.com>,
<honnappa.nagarahalli@arm.com>, <konstantin.v.ananyev@yandex.ru>,
<feifei.wang2@arm.com>
Cc: <dev@dpdk.org>, <nd@arm.com>
Subject: RE: [RFC] ring: improve ring performance with C11 atomics
Date: Sat, 22 Apr 2023 15:19:35 +0200 [thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D878AE@smartserver.smartshare.dk> (raw)
In-Reply-To: <20230421191642.217011-1-wathsala.vithanage@arm.com>
> From: Wathsala Vithanage [mailto:wathsala.vithanage@arm.com]
> Sent: Friday, 21 April 2023 21.17
>
> Tail load in __rte_ring_move_cons_head and __rte_ring_move_prod_head
> can be changed to __ATOMIC_RELAXED from __ATOMIC_ACQUIRE.
> Because to calculate the addresses of the dequeue
> elements __rte_ring_dequeue_elems uses the old_head updated by the
> __atomic_compare_exchange_n intrinsic used in
> __rte_ring_move_prod_head. This results in an address dependency
> between the two operations. Therefore __rte_ring_dequeue_elems cannot
> happen before __rte_ring_move_prod_head.
> Similarly __rte_ring_enqueue_elems and __rte_ring_move_cons_head
> won't be reordered either.
These preconditions should be added as comments in the source code.
>
> Performance on Arm N1
> Gain relative to generic implementation
> +-------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 8 (Arm N1) |
> +-------------------------------------------------------------------+
> | Generic | C11 atomics | C11 atomics improved |
> +-------------------------------------------------------------------+
> | Total count: 766730 | Total count: 651686 | Total count: 812125 |
> | | Gain: -15% | Gain: 6% |
> +-------------------------------------------------------------------+
> +-------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 32 (Arm N1) |
> +-------------------------------------------------------------------+
> | Generic | C11 atomics | C11 atomics improved |
> +-------------------------------------------------------------------+
> | Total count: 816745 | Total count: 646385 | Total count: 830935 |
> | | Gain: -21% | Gain: 2% |
> +-------------------------------------------------------------------+
Big performance gain compared to pre-improved C11 atomics! Excellent.
>
> Performance on x86-64 Cascade Lake
> Gain relative to generic implementation
> +-------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 8 |
> +-------------------------------------------------------------------+
> | Generic | C11 atomics | C11 atomics improved |
> +-------------------------------------------------------------------+
> | Total count: 181640 | Total count: 181995 | Total count: 182791 |
> | | Gain: 0.2% | Gain: 0.6%
> +-------------------------------------------------------------------+
> +-------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 32 |
> +-------------------------------------------------------------------+
> | Generic | C11 atomics | C11 atomics improved |
> +-------------------------------------------------------------------+
> | Total count: 167495 | Total count: 161536 | Total count: 163190 |
> | | Gain: -3.5% | Gain: -2.6% |
> +-------------------------------------------------------------------+
I noticed that the larger size (32 objects) had a larger relative drop in performance than the smaller size (8 objects), so I am wondering what the performance numbers are for size 512, the default RTE_MEMPOOL_CACHE_MAX_SIZE? It's probably not going to change anything regarding the patch acceptance, but I'm curious about the numbers.
>
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
> ---
> .mailmap | 1 +
> lib/ring/rte_ring_c11_pvt.h | 18 +++++++++---------
> 2 files changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/.mailmap b/.mailmap
> index 4018f0fc47..367115d134 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1430,6 +1430,7 @@ Walter Heymans <walter.heymans@corigine.com>
> Wang Sheng-Hui <shhuiw@gmail.com>
> Wangyu (Eric) <seven.wangyu@huawei.com>
> Waterman Cao <waterman.cao@intel.com>
> +Wathsala Vithanage <wathsala.vithanage@arm.com>
> Weichun Chen <weichunx.chen@intel.com>
> Wei Dai <wei.dai@intel.com>
> Weifeng Li <liweifeng96@126.com>
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index f895950df4..1895f2bb0e 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -24,6 +24,13 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht,
> uint32_t old_val,
> if (!single)
> rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
>
> + /*
> + * Updating of ht->tail cannot happen before elements are added to or
> + * removed from the ring, as it could result in data races between
> + * producer and consumer threads. Therefore ht->tail should be updated
> + * with release semantics to prevent ring data copy phase from sinking
> + * below it.
> + */
I think this comment should clarified as:
Updating of ht->tail SHOULD NOT happen before elements are added to or
removed from the ring, as it WOULD result in data races between
producer and consumer threads. Therefore ht->tail MUST be updated
with release semantics to prevent ring data copy phase from sinking
below it.
Don't use capitals; I only used them to highlight my modifications.
NB: English is not my native language, so please ignore me if I am mistaken!
> __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> }
>
> @@ -69,11 +76,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int
> is_sp,
> /* Ensure the head is read before tail */
> __atomic_thread_fence(__ATOMIC_ACQUIRE);
>
> - /* load-acquire synchronize with store-release of ht->tail
> - * in update_tail.
> - */
> cons_tail = __atomic_load_n(&r->cons.tail,
> - __ATOMIC_ACQUIRE);
> + __ATOMIC_RELAXED);
The __ATOMIC_ACQUIRE had a comment describing its purpose.
Please don't just remove that comment. Please replace it with a new comment describing why __ATOMIC_RELAXED is valid here.
>
> /* The subtraction is done between two unsigned 32bits value
> * (the result is always modulo 32 bits even if we have
> @@ -145,12 +149,8 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
> /* Ensure the head is read before tail */
> __atomic_thread_fence(__ATOMIC_ACQUIRE);
>
> - /* this load-acquire synchronize with store-release of ht->tail
> - * in update_tail.
> - */
> prod_tail = __atomic_load_n(&r->prod.tail,
> - __ATOMIC_ACQUIRE);
> -
> + __ATOMIC_RELAXED);
Same comment as above: Don't just remove the old comment, please replace it with a new comment.
> /* The subtraction is done between two unsigned 32bits value
> * (the result is always modulo 32 bits even if we have
> * cons_head > prod_tail). So 'entries' is always between 0
> --
> 2.25.1
With comments regarding __ATOMIC_RELAXED added to the source code,
Acked-by: Morten Brørup <mb@smartsharesystems.com>
next prev parent reply other threads:[~2023-04-22 13:19 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-21 19:16 Wathsala Vithanage
2023-04-22 13:19 ` Morten Brørup [this message]
2023-06-08 13:21 ` Wathsala Wathawana Vithanage
-- strict thread matches above, loose matches on Subject: below --
2023-04-21 19:14 Wathsala Vithanage
2023-04-21 21:27 ` Stephen Hemminger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=98CBD80474FA8B44BF855DF32C47DC35D878AE@smartserver.smartshare.dk \
--to=mb@smartsharesystems.com \
--cc=dev@dpdk.org \
--cc=feifei.wang2@arm.com \
--cc=honnappa.nagarahalli@arm.com \
--cc=konstantin.v.ananyev@yandex.ru \
--cc=nd@arm.com \
--cc=wathsala.vithanage@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).