From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: Phil Yang <Phil.Yang@arm.com>, Diogo Behrens <diogo.behrens@huawei.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory
Date: Mon, 31 Aug 2020 18:45:39 +0000 [thread overview]
Message-ID: <DBAPR08MB58148C46395B8592D9AC2B8F98510@DBAPR08MB5814.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <DB7PR08MB38654DE38D1D0C88BBE43DCEE9520@DB7PR08MB3865.eurprd08.prod.outlook.com>
Hi Diogo,
Thanks for your explanation.
As documented in https://developer.arm.com/documentation/ddi0487/fc B2.9.5 Load-Exclusive and Store-Exclusive instruction usage restrictions:
" Between the Load-Exclusive and the Store-Exclusive, there are no explicit memory accesses, preloads,
direct or indirect System register writes, address translation instructions, cache or TLB maintenance
instructions, exception generating instructions, exception returns, or indirect branches."
[Honnappa] This is a requirement on the software, not on the micro-architecture.
We are having few discussions internally, will get back soon.
So it is not allowed to insert (1) & (4) between (2, 3). The cmpxchg operation is atomic.
Thanks,
Phil Yang
> -----Original Message-----
> From: Diogo Behrens <mailto:diogo.behrens@huawei.com>
> Sent: Thursday, August 27, 2020 4:57 PM
> To: Phil Yang <mailto:Phil.Yang@arm.com>
> Cc: mailto:dev@dpdk.org; nd <mailto:nd@arm.com>; Honnappa Nagarahalli
> <mailto:Honnappa.Nagarahalli@arm.com>
> Subject: RE: [PATCH] librte_eal: fix mcslock hang on weak memory
>
> Hi Phil,
>
> thanks for taking a look at this. Indeed, the cmpxchg will have release
> semantics, but implicit barriers do not provide the same ordering guarantees
> as DMB ISH, ie, atomic_thread_fence(__ATOMIC_ACQ_REL).
>
> Let me try to explain the scenario. Here is a pseudo-ARM assembly with the
> instructions involved:
>
> STR me->locked = 1 // 1: initialize the node
> LDAXR pred = tail // 2: cmpxchg
> STLXR tail = me // 3: cmpxchg
> STR pred->next = me // 4: the problematic store
>
> Now, ARM allows instruction 1 and 2 to be reordered, and also instructions 3
> and 4 to be reordered:
>
> LDAXR pred = tail // 2: cmpxchg (acquires can be moved up)
> STR me-> locked = 1 // 1: initialize the node
> STR pred->next = me // 4: the problematic store
> STLXR tail = me // 3: cmpxchg (releases can be moved down)
>
> And since stores 1 and 4 can be reordered too, the following execution is
> possible:
>
> LDAXR pred = tail // 2: cmpxchg
> STR pred->next = me // 4: the problematic store
> STR me-> locked = 1 // 1: initialize the node
> STLXR tail = me // 3: cmpxchg
>
> In this subtle situation, the locking-thread can overwrite the store to next-
> >locked of the unlocking-thread (if it occurs between 4 and 1), and
> subsequently hang waiting for me->locked to become 0.
>
> We couldn't reproduce this with our available hardware, but we
> "reproduced" it with the RMEM model checker, which precisely models the
> ARM memory model (https://github.com/rems-project/rmem). The GenMC
> model checker (https://github.com/MPI-SWS/genmc) also finds the issue,
> but its intermediate memory model is slightly more general than the ARM
> model.
>
> The release attached to store (4) prevents (1) to reordering with it.
>
> Please, let me know if that makes sense or if I am missing something.
>
> Thanks,
> -Diogo
>
> -----Original Message-----
> From: Phil Yang [mailto:Phil.Yang@arm.com]
> Sent: Wednesday, August 26, 2020 12:17 PM
> To: Diogo Behrens <mailto:diogo.behrens@huawei.com>
> Cc: mailto:dev@dpdk.org; nd <mailto:nd@arm.com>; Honnappa Nagarahalli
> <mailto:Honnappa.Nagarahalli@arm.com>; nd <mailto:nd@arm.com>
> Subject: RE: [PATCH] librte_eal: fix mcslock hang on weak memory
>
> Diogo Behrens <mailto:diogo.behrens@huawei.com> writes:
>
> > Subject: [PATCH] librte_eal: fix mcslock hang on weak memory
> >
> > The initialization me->locked=1 in lock() must happen before
> > next->locked=0 in unlock(), otherwise a thread may hang forever,
> > waiting me->locked become 0. On weak memory systems (sHi Puch as
> ARMv8),
> > the current implementation allows me->locked=1 to be reordered with
> > announcing the node (pred->next=me) and, consequently, to be
> > reordered with next->locked=0 in unlock().
> >
> > This fix adds a release barrier to pred->next=me, forcing
> > me->locked=1 to happen before this operation.
> >
> > Signed-off-by: Diogo Behrens <mailto:diogo.behrens@huawei.com>
> > ---
> > lib/librte_eal/include/generic/rte_mcslock.h | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_eal/include/generic/rte_mcslock.h
> > b/lib/librte_eal/include/generic/rte_mcslock.h
> > index 2bef28351..ce553f547 100644
> > --- a/lib/librte_eal/include/generic/rte_mcslock.h
> > +++ b/lib/librte_eal/include/generic/rte_mcslock.h
> > @@ -68,7 +68,14 @@ rte_mcslock_lock(rte_mcslock_t **msl,
> rte_mcslock_t
> > *me)
> > */
> > return;
> > }
> > - __atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
> > + /* The store to me->next above should also complete before the
> > node is
> > + * visible to predecessor thread releasing the lock. Hence, the store
> > + * prev->next also requires release semantics. Note that, for
> > example,
> > + * on ARM, the release semantics in the exchange operation is not
> > + * strong as a release fence and is not sufficient to enforce the
> > + * desired order here.
>
>
> Hi Diogo,
>
> I didn't quite understand why the exchange operation with ACQ_REL
> memory ordering is not sufficient.
> It emits 'stlxr' on armv8.0-a and 'swpal' on armv8.1-a (with LSE support).
> Both of these two instructions contain a release semantics.
>
> Please check the URL for the detail.
> https://gcc.godbolt.org/z/Mc4373
>
> BTW, if you captured a deadlock issue on your testbed, could you please
> share your test case here?
>
> Thanks,
> Phil
>
>
> > + */
> > + __atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);
> >
> > /* The while-load of me->locked should not move above the
> previous
> > * store to prev->next. Otherwise it will cause a deadlock. Need a
> > --
> > 2.28.0
> >
next prev parent reply other threads:[~2020-08-31 18:45 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-26 9:20 Diogo Behrens
2020-08-26 10:17 ` Phil Yang
2020-08-27 8:56 ` Diogo Behrens
2020-08-28 9:19 ` Phil Yang
2020-08-31 18:45 ` Honnappa Nagarahalli [this message]
2020-10-06 21:49 ` Thomas Monjalon
2020-10-07 9:55 ` Diogo Behrens
2020-10-20 11:56 ` Thomas Monjalon
2020-10-20 21:49 ` Honnappa Nagarahalli
2020-11-22 18:07 ` Thomas Monjalon
2020-11-23 15:06 ` Honnappa Nagarahalli
2020-11-23 15:44 ` Stephen Hemminger
2020-11-23 18:16 ` Honnappa Nagarahalli
2020-11-23 18:29 ` Honnappa Nagarahalli
2020-11-23 19:36 ` Stephen Hemminger
2020-11-25 4:50 ` Honnappa Nagarahalli
2020-11-25 8:41 ` Diogo Behrens
2020-11-25 14:16 ` Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DBAPR08MB58148C46395B8592D9AC2B8F98510@DBAPR08MB5814.eurprd08.prod.outlook.com \
--to=honnappa.nagarahalli@arm.com \
--cc=Phil.Yang@arm.com \
--cc=dev@dpdk.org \
--cc=diogo.behrens@huawei.com \
--cc=nd@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).