From: Diogo Behrens
To: Phil Yang
Cc: dev@dpdk.org, nd, Honnappa Nagarahalli
Date: Thu, 27 Aug 2020 08:56:46 +0000
Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory

Hi Phil,

thanks for
taking a look at this. Indeed, the cmpxchg will have release semantics, but
implicit barriers do not provide the same ordering guarantees as DMB ISH,
i.e., atomic_thread_fence(__ATOMIC_ACQ_REL).

Let me try to explain the scenario. Here is pseudo-ARM assembly with the
instructions involved:

    STR me->locked = 1      // 1: initialize the node
    LDAXR pred = tail       // 2: cmpxchg
    STLXR tail = me         // 3: cmpxchg
    STR pred->next = me     // 4: the problematic store

Now, ARM allows instructions 1 and 2 to be reordered, and also instructions
3 and 4 to be reordered:

    LDAXR pred = tail       // 2: cmpxchg (acquires can be moved up)
    STR me->locked = 1      // 1: initialize the node
    STR pred->next = me     // 4: the problematic store
    STLXR tail = me         // 3: cmpxchg (releases can be moved down)

And since stores 1 and 4 can be reordered too, the following execution is
possible:

    LDAXR pred = tail       // 2: cmpxchg
    STR pred->next = me     // 4: the problematic store
    STR me->locked = 1      // 1: initialize the node
    STLXR tail = me         // 3: cmpxchg

In this subtle situation, the locking thread can overwrite the store to
next->locked of the unlocking thread (if it occurs between 4 and 1), and
subsequently hang waiting for me->locked to become 0.

We couldn't reproduce this with our available hardware, but we "reproduced"
it with the RMEM model checker, which precisely models the ARM memory model
(https://github.com/rems-project/rmem). The GenMC model checker
(https://github.com/MPI-SWS/genmc) also finds the issue, but its
intermediate memory model is slightly more general than the ARM model.

The release attached to store (4) prevents (1) from being reordered past it.

Please let me know if that makes sense or if I am missing something.
Thanks,
-Diogo

-----Original Message-----
From: Phil Yang [mailto:Phil.Yang@arm.com]
Sent: Wednesday, August 26, 2020 12:17 PM
To: Diogo Behrens
Cc: dev@dpdk.org; nd; Honnappa Nagarahalli; nd
Subject: RE: [PATCH] librte_eal: fix mcslock hang on weak memory

Diogo Behrens writes:

> Subject: [PATCH] librte_eal: fix mcslock hang on weak memory
>
> The initialization me->locked=1 in lock() must happen before
> next->locked=0 in unlock(), otherwise a thread may hang forever,
> waiting for me->locked to become 0. On weak memory systems (such as
> ARMv8), the current implementation allows me->locked=1 to be reordered
> with announcing the node (pred->next=me) and, consequently, to be
> reordered with next->locked=0 in unlock().
>
> This fix adds a release barrier to pred->next=me, forcing
> me->locked=1 to happen before this operation.
>
> Signed-off-by: Diogo Behrens
> ---
>  lib/librte_eal/include/generic/rte_mcslock.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/include/generic/rte_mcslock.h
> b/lib/librte_eal/include/generic/rte_mcslock.h
> index 2bef28351..ce553f547 100644
> --- a/lib/librte_eal/include/generic/rte_mcslock.h
> +++ b/lib/librte_eal/include/generic/rte_mcslock.h
> @@ -68,7 +68,14 @@ rte_mcslock_lock(rte_mcslock_t **msl, rte_mcslock_t *me)
>  	 */
>  		return;
>  	}
> -	__atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
> +	/* The store to me->next above should also complete before the node is
> +	 * visible to the predecessor thread releasing the lock. Hence, the
> +	 * store to prev->next also requires release semantics. Note that, for
> +	 * example, on ARM, the release semantics in the exchange operation is
> +	 * not as strong as a release fence and is not sufficient to enforce
> +	 * the desired order here.

Hi Diogo,

I didn't quite understand why the exchange operation with ACQ_REL memory
ordering is not sufficient.
It emits 'stlxr' on armv8.0-a and 'swpal' on armv8.1-a (with LSE support).
Both of these instructions carry release semantics.
Please check this URL for the details: https://gcc.godbolt.org/z/Mc4373

BTW, if you captured a deadlock issue on your testbed, could you please
share your test case here?

Thanks,
Phil

> +	 */
> +	__atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);
>
>  	/* The while-load of me->locked should not move above the previous
>  	 * store to prev->next. Otherwise it will cause a deadlock. Need a
> --
> 2.28.0
>