DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>
To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "david.marchand@redhat.com" <david.marchand@redhat.com>,
	"chaozhu@linux.vnet.ibm.com" <chaozhu@linux.vnet.ibm.com>,
	nd <nd@arm.com>,
	"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [EXT] [PATCH v3 5/6] spinlock: reimplement with atomic one-way barrier builtins
Date: Thu, 27 Dec 2018 09:02:42 +0000	[thread overview]
Message-ID: <VI1PR08MB31676F557B2DBB9329DB83498FB60@VI1PR08MB3167.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <af3e583ff590ee5393b157a37f8f8f51633be4c5.camel@marvell.com>



> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Sent: Thursday, December 27, 2018 3:42 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; dev@dpdk.org
> Cc: david.marchand@redhat.com; chaozhu@linux.vnet.ibm.com; nd
> <nd@arm.com>; bruce.richardson@intel.com; thomas@monjalon.net;
> hemant.agrawal@nxp.com; stephen@networkplumber.org; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [EXT] [PATCH v3 5/6] spinlock: reimplement with atomic one-way
> barrier builtins
> 
> On Thu, 2018-12-27 at 12:13 +0800, Gavin Hu wrote:
> -------------------------------------------------------------------
> > ---
> > The __sync builtin based implementation generates full memory
> > barriers
> > ('dmb ish') on Arm platforms. Using C11 atomic builtins to generate
> > one way
> > barriers.
> >
> > Here is the assembly code of __sync_compare_and_swap builtin.
> > __sync_bool_compare_and_swap(dst, exp, src);
> >    0x000000000090f1b0 <+16>:    e0 07 40 f9 ldr x0, [sp, #8]
> >    0x000000000090f1b4 <+20>:    e1 0f 40 79 ldrh    w1, [sp, #6]
> >    0x000000000090f1b8 <+24>:    e2 0b 40 79 ldrh    w2, [sp, #4]
> >    0x000000000090f1bc <+28>:    21 3c 00 12 and w1, w1, #0xffff
> >    0x000000000090f1c0 <+32>:    03 7c 5f 48 ldxrh   w3, [x0]
> >    0x000000000090f1c4 <+36>:    7f 00 01 6b cmp w3, w1
> >    0x000000000090f1c8 <+40>:    61 00 00 54 b.ne    0x90f1d4
> > <rte_atomic16_cmpset+52>  // b.any
> >    0x000000000090f1cc <+44>:    02 fc 04 48 stlxrh  w4, w2, [x0]
> >    0x000000000090f1d0 <+48>:    84 ff ff 35 cbnz    w4, 0x90f1c0
> > <rte_atomic16_cmpset+32>
> >    0x000000000090f1d4 <+52>:    bf 3b 03 d5 dmb ish
> >    0x000000000090f1d8 <+56>:    e0 17 9f 1a cset    w0, eq  // eq =
> > none
> >
> > The benchmarking results showed 3X performance gain on Cavium
> > ThunderX2 and
> > 13% on Qualcomm Falmon and 3.7% on 4-A72 Marvell macchiatobin.
> > Here is the example test result on TX2:
> >
> > *** spinlock_autotest without this patch ***
> > Core [123] Cost Time = 639822 us
> > Core [124] Cost Time = 633253 us
> > Core [125] Cost Time = 646030 us
> > Core [126] Cost Time = 643189 us
> > Core [127] Cost Time = 647039 us
> > Total Cost Time = 95433298 us
> >
> > *** spinlock_autotest with this patch ***
> > Core [123] Cost Time = 163615 us
> > Core [124] Cost Time = 166471 us
> > Core [125] Cost Time = 189044 us
> > Core [126] Cost Time = 195745 us
> > Core [127] Cost Time = 78423 us
> > Total Cost Time = 27339656 us
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Reviewed-by: Steve Capper <Steve.Capper@arm.com>
> > ---
> >  lib/librte_eal/common/include/generic/rte_spinlock.h | 18
> > +++++++++++++-----
> >  1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h
> > b/lib/librte_eal/common/include/generic/rte_spinlock.h
> > index c4c3fc31e..87ae7a4f1 100644
> > --- a/lib/librte_eal/common/include/generic/rte_spinlock.h
> > +++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
> > @@ -61,9 +61,14 @@ rte_spinlock_lock(rte_spinlock_t *sl);
> >  static inline void
> >  rte_spinlock_lock(rte_spinlock_t *sl)
> >  {
> > -	while (__sync_lock_test_and_set(&sl->locked, 1))
> > -		while(sl->locked)
> > +	int exp = 0;
> > +
> > +	while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
> > +				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
> {
> 
> How about remove explict exp = 0 and change to
> __atomic_test_and_set(flag, __ATOMIC_ACQUIRE);
> 
> i.e
> while (_atomic_test_and_set(flag, __ATOMIC_ACQUIRE))
> 
Will do it in v4.
> 
> > +		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
> >  			rte_pause();
> > +		exp = 0;
> 
> We can remove exp = 0 with above scheme.
> 
> > +	}
> >  }
> >  #endif
> >
> > @@ -80,7 +85,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl);
> >  static inline void
> >  rte_spinlock_unlock (rte_spinlock_t *sl)
> >  {
> > -	__sync_lock_release(&sl->locked);
> > +	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
>  }
> >  #endif
> >
> > @@ -99,7 +104,10 @@ rte_spinlock_trylock (rte_spinlock_t *sl);
> >  static inline int
> >  rte_spinlock_trylock (rte_spinlock_t *sl)
> >  {
> > -	return __sync_lock_test_and_set(&sl->locked,1) == 0;
> > +	int exp = 0;
> > +	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
> > +				0, /* disallow spurious failure */
> > +				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
> 
> Here to remove explicit exp.
> 
> return (__atomic_test_and_set(flag, __ATOMIC_ACQUIRE) == 0)

Will do it in v4.

> >  }
> >  #endif
> >
> > @@ -113,7 +121,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
> >   */
> >  static inline int rte_spinlock_is_locked (rte_spinlock_t *sl)
> >  {
> > -	return sl->locked;
> > +	return __atomic_load_n(&sl->locked, __ATOMIC_ACQUIRE);
> 
> __ATOMIC_RELAXED would be enough here. Right ?
Yes, it is enough for current DPDK uses, used for testing and assertions only.

For general applicability, we set acquire as concerned about it is used for reading protected data while the lock is not taken by anybody.
In this use case, Acquire will properly see all updates from before the lock was released, but this is still dangerous, as during the course, someone else might have taken the lock and changed the data.

Anyway, I will set Relaxed in v4 as the above use scenario was not recommended and not present in DPDK. 

> 
> >  }
> >
> >  /**

  reply	other threads:[~2018-12-27  9:02 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-27  4:13 [dpdk-dev] [PATCH v3 0/6] spinlock optimization and test case enhancements Gavin Hu
2018-12-27  4:13 ` [dpdk-dev] [PATCH v3 1/6] eal: fix clang compilation error on x86 Gavin Hu
2018-12-27  6:36   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2018-12-27  4:13 ` [dpdk-dev] [PATCH v3 2/6] test/spinlock: remove 1us delay for correct benchmarking Gavin Hu
2018-12-27  7:20   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2018-12-27  4:13 ` [dpdk-dev] [PATCH v3 3/6] test/spinlock: get timestamp more precisely Gavin Hu
2018-12-27  7:27   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-01-03 18:22     ` Honnappa Nagarahalli
2018-12-27  4:13 ` [dpdk-dev] [PATCH v3 4/6] test/spinlock: amortize the cost of getting time Gavin Hu
2018-12-27  4:13 ` [dpdk-dev] [PATCH v3 5/6] spinlock: reimplement with atomic one-way barrier builtins Gavin Hu
2018-12-27  7:42   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2018-12-27  9:02     ` Gavin Hu (Arm Technology China) [this message]
2019-01-03 20:35       ` Honnappa Nagarahalli
2019-01-11 13:52     ` Gavin Hu (Arm Technology China)
2019-01-14  5:54       ` Honnappa Nagarahalli
2019-01-14  7:39         ` Jerin Jacob Kollanukkaran
2019-01-14 17:08           ` Gavin Hu (Arm Technology China)
2019-01-14  7:57         ` Gavin Hu (Arm Technology China)
2018-12-27  4:13 ` [dpdk-dev] [PATCH v3 6/6] spinlock: ticket based to improve fairness Gavin Hu
2018-12-27  6:58   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2018-12-27 10:05     ` Gavin Hu (Arm Technology China)
2018-12-27 12:08       ` Jerin Jacob Kollanukkaran
2018-12-27 23:41         ` Stephen Hemminger
2018-12-28  4:39           ` Jerin Jacob Kollanukkaran
2018-12-28 10:04             ` Gavin Hu (Arm Technology China)
2019-01-03 18:35             ` Honnappa Nagarahalli
2019-01-03 19:53               ` Stephen Hemminger
2019-01-04  7:06                 ` Honnappa Nagarahalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR08MB31676F557B2DBB9329DB83498FB60@VI1PR08MB3167.eurprd08.prod.outlook.com \
    --to=gavin.hu@arm.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=bruce.richardson@intel.com \
    --cc=chaozhu@linux.vnet.ibm.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=hemant.agrawal@nxp.com \
    --cc=jerinj@marvell.com \
    --cc=nd@arm.com \
    --cc=stephen@networkplumber.org \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).