DPDK patches and discussions
 help / color / mirror / Atom feed
From: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"konstantin.v.ananyev@yandex.ru" <konstantin.v.ananyev@yandex.ru>,
	Feifei Wang <Feifei.Wang2@arm.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, nd <nd@arm.com>
Subject: RE: [RFC] ring: improve ring performance with C11 atomics
Date: Thu, 8 Jun 2023 13:21:22 +0000	[thread overview]
Message-ID: <AS4PR08MB7783BD2F78CF9C04ACA114789F50A@AS4PR08MB7783.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <20230421191642.217011-1-wathsala.vithanage@arm.com>

The solution presented in this RFC is not C11 compliant. 
C11 __atomic_compare_exchange_n updates "expected" only when CAS instruction fails. 
Therefore, the assumption that there is an address dependency from CAS instructions in both producer/consumer head update to the ring element accesses falls apart with respect to semantics of the C11 memory model. 
Thus, this solution may corrupt ring elements when executed on CPUs with weak memory models.
Therefore, this RFC will be retracted.

> -----Original Message-----
> From: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Sent: Friday, April 21, 2023 3:17 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> konstantin.v.ananyev@yandex.ru; Feifei Wang <Feifei.Wang2@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Wathsala Wathawana Vithanage
> <wathsala.vithanage@arm.com>
> Subject: [RFC] ring: improve ring performance with C11 atomics
> 
> Tail load in __rte_ring_move_cons_head and __rte_ring_move_prod_head can
> be changed to __ATOMIC_RELAXED from __ATOMIC_ACQUIRE.
> Because to calculate the addresses of the dequeue elements
> __rte_ring_dequeue_elems uses the old_head updated by the
> __atomic_compare_exchange_n intrinsic used in __rte_ring_move_prod_head.
> This results in an address dependency between the two operations. Therefore
> __rte_ring_dequeue_elems cannot happen before
> __rte_ring_move_prod_head.
> Similarly __rte_ring_enqueue_elems and __rte_ring_move_cons_head won't be
> reordered either.
> 
> Performance on Arm N1
> Gain relative to generic implementation
> +-------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 8 (Arm N1)                         |
> +-------------------------------------------------------------------+
> | Generic             | C11 atomics          | C11 atomics improved |
> +-------------------------------------------------------------------+
> | Total count: 766730 | Total count: 651686  | Total count: 812125  |
> |                     |        Gain: -15%    |        Gain: 6%      |
> +-------------------------------------------------------------------+
> +-------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 32 (Arm N1)                        |
> +-------------------------------------------------------------------+
> | Generic             | C11 atomics          | C11 atomics improved |
> +-------------------------------------------------------------------+
> | Total count: 816745 | Total count: 646385  | Total count: 830935  |
> |                     |        Gain: -21%    |        Gain: 2%      |
> +-------------------------------------------------------------------+
> 
> Performance on x86-64 Cascade Lake
> Gain relative to generic implementation
> +-------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 8                                  |
> +-------------------------------------------------------------------+
> | Generic             | C11 atomics          | C11 atomics improved |
> +-------------------------------------------------------------------+
> | Total count: 181640 | Total count: 181995  | Total count: 182791  |
> |                     |        Gain: 0.2%    |        Gain: 0.6%
> +-------------------------------------------------------------------+
> +-------------------------------------------------------------------+
> | Bulk enq/dequeue count on size 32                                 |
> +-------------------------------------------------------------------+
> | Generic             | C11 atomics          | C11 atomics improved |
> +-------------------------------------------------------------------+
> | Total count: 167495 | Total count: 161536  | Total count: 163190  |
> |                     |        Gain: -3.5%   |        Gain: -2.6%   |
> +-------------------------------------------------------------------+
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Feifei Wang <feifei.wang2@arm.com>
> ---
>  .mailmap                    |  1 +
>  lib/ring/rte_ring_c11_pvt.h | 18 +++++++++---------
>  2 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/.mailmap b/.mailmap
> index 4018f0fc47..367115d134 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1430,6 +1430,7 @@ Walter Heymans <walter.heymans@corigine.com>
> Wang Sheng-Hui <shhuiw@gmail.com>  Wangyu (Eric)
> <seven.wangyu@huawei.com>  Waterman Cao <waterman.cao@intel.com>
> +Wathsala Vithanage <wathsala.vithanage@arm.com>
>  Weichun Chen <weichunx.chen@intel.com>
>  Wei Dai <wei.dai@intel.com>
>  Weifeng Li <liweifeng96@126.com>
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h index
> f895950df4..1895f2bb0e 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -24,6 +24,13 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht,
> uint32_t old_val,
>  	if (!single)
>  		rte_wait_until_equal_32(&ht->tail, old_val,
> __ATOMIC_RELAXED);
> 
> +	/*
> +	 * Updating of ht->tail cannot happen before elements are added to or
> +	 * removed from the ring, as it could result in data races between
> +	 * producer and consumer threads. Therefore ht->tail should be
> updated
> +	 * with release semantics to prevent ring data copy phase from sinking
> +	 * below it.
> +	 */
>  	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);  }
> 
> @@ -69,11 +76,8 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> unsigned int is_sp,
>  		/* Ensure the head is read before tail */
>  		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> 
> -		/* load-acquire synchronize with store-release of ht->tail
> -		 * in update_tail.
> -		 */
>  		cons_tail = __atomic_load_n(&r->cons.tail,
> -					__ATOMIC_ACQUIRE);
> +					__ATOMIC_RELAXED);
> 
>  		/* The subtraction is done between two unsigned 32bits value
>  		 * (the result is always modulo 32 bits even if we have @@ -
> 145,12 +149,8 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
>  		/* Ensure the head is read before tail */
>  		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> 
> -		/* this load-acquire synchronize with store-release of ht->tail
> -		 * in update_tail.
> -		 */
>  		prod_tail = __atomic_load_n(&r->prod.tail,
> -					__ATOMIC_ACQUIRE);
> -
> +					__ATOMIC_RELAXED);
>  		/* The subtraction is done between two unsigned 32bits value
>  		 * (the result is always modulo 32 bits even if we have
>  		 * cons_head > prod_tail). So 'entries' is always between 0
> --
> 2.25.1


  parent reply	other threads:[~2023-06-08 13:21 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-21 19:16 Wathsala Vithanage
2023-04-22 13:19 ` Morten Brørup
2023-06-08 13:21 ` Wathsala Wathawana Vithanage [this message]
  -- strict thread matches above, loose matches on Subject: below --
2023-04-21 19:14 Wathsala Vithanage
2023-04-21 21:27 ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AS4PR08MB7783BD2F78CF9C04ACA114789F50A@AS4PR08MB7783.eurprd08.prod.outlook.com \
    --to=wathsala.vithanage@arm.com \
    --cc=Feifei.Wang2@arm.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.v.ananyev@yandex.ru \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).