Date: Mon, 23 Nov 2020 07:44:03 -0800
From: Stephen Hemminger
To: Honnappa Nagarahalli
Cc: thomas@monjalon.net, dev@dpdk.org, nd, Diogo Behrens, david.marchand@redhat.com
Message-ID: <20201123074403.054d08aa@hermes.local>
References: <20200826092002.19395-1-diogo.behrens@huawei.com> <7423385.l6g0CaCsxP@thomas> <3356496.NAzAc6GACp@thomas>
Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory
List-Id: DPDK patches and discussions

On Mon, 23 Nov 2020 15:06:06 +0000
Honnappa Nagarahalli wrote:

> >
> > > >
> > > > 07/10/2020 11:55, Diogo Behrens:
> > > > > Hi Thomas,
> > > > >
> > > > > we are still waiting for the comments from Honnappa. In our
> > > > > understanding, the missing barrier is a bug according to the
> > > > > model.
> > > > > We reproduced the scenario in herd7, which represents the
> > > > > authoritative memory model:
> > > > > https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
> > > > >
> > > > > Here is a litmus code that shows that the XCHG (when compiled
> > > > > to LDAXR and STLR) is not atomic wrt memory updates to other
> > > > > locations:
> > > > > -----
> > > > > AArch64 XCHG-nonatomic
> > > > > {
> > > > > 0:X1=locked; 0:X3=next;
> > > > > 1:X1=locked; 1:X3=next; 1:X5=tail; }
> > > > > P0           | P1;
> > > > > LDR W0, [X3] | MOV W0, #1;
> > > > > CBZ W0, end  | STR W0, [X1]; (* init locked *)
> > > > > MOV W2, #2   | MOV W2, #0;
> > > > > STR W2, [X1] | xchg:;
> > > > > end:         | LDAXR W6, [X5];
> > > > > NOP          | STLXR W4, W0, [X5];
> > > > > NOP          | CBNZ W4, xchg;
> > > > > NOP          | STR W0, [X3]; (* set next *)
> > > > > exists
> > > > > (0:X2=2 /\ locked=1)
> > > > > -----
> > > > > (web version of herd7: http://diy.inria.fr/www/?record=aarch64)
> > > > >
> > > > > P1 is trying to acquire the lock:
> > > > > - initializes locked
> > > > > - does the xchg on the tail of the mcslock
> > > > > - sets the next
> > > > >
> > > > > P0 is releasing the lock:
> > > > > - if next is not set, just terminates
> > > > > - if next is set, stores 2 in locked
> > > > >
> > > > > The initialization of locked should never overwrite the store
> > > > > 2 to locked, but it does.
> > > > > To avoid that reordering to happen, one should make the last
> > > > > store of P1 to have a "release" barrier, ie, STLR.
> > > > >
> > > > > This is equivalent to the reordering occurring in the mcslock
> > > > > of librte_eal.
> > > > >
> > > > > Best regards,
> > > > > -Diogo
> > > > >
> > > > > -----Original Message-----
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Tuesday, October 6, 2020 11:50 PM
> > > > > To: Phil Yang; Diogo Behrens; Honnappa Nagarahalli
> > > > > Cc: dev@dpdk.org; nd
> > > > > Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang
> > > > > on weak memory
> > > > >
> > > > > 31/08/2020 20:45, Honnappa Nagarahalli:
> > > > > >
> > > > > > Hi Diogo,
> > > > > >
> > > > > > Thanks for your explanation.
> > > > > >
> > > > > > As documented in
> > > > > > https://developer.arm.com/documentation/ddi0487/fc B2.9.5
> > > > > > Load-Exclusive and Store-Exclusive instruction usage
> > > > > > restrictions:
> > > > > > " Between the Load-Exclusive and the Store-Exclusive, there
> > > > > > are no explicit memory accesses, preloads, direct or
> > > > > > indirect System register writes, address translation
> > > > > > instructions, cache or TLB maintenance instructions,
> > > > > > exception generating instructions, exception returns, or
> > > > > > indirect branches."
> > > > > > [Honnappa] This is a requirement on the software, not on the
> > > > > > micro-architecture.
> > > > > > We are having few discussions internally, will get back
> > > > > > soon.
> > > > > >
> > > > > > So it is not allowed to insert (1) & (4) between (2, 3). The
> > > > > > cmpxchg operation is atomic.
> > > > > >
> > > > >
> > > >
> > > > Please what is the conclusion?
> > > Apologies for not updating on this sooner.
> > >
> > > Unfortunately, memory ordering questions are hard topics. I have
> > > been discussing this internally with few experts and it is still
> > > ongoing, hope to conclude soon.
> > >
> > > My focus has been to replace
> > > __atomic_exchange_n(msl, me, __ATOMIC_ACQ_REL) with
> > > __atomic_exchange_n(msl, me, __ATOMIC_SEQ_CST).
> > > However, the generated code is the same in the second case as
> > > well (for load-store exclusives), which I am not sure if it is
> > > correct.
> > >
> > > I think we have 2 choices here:
> > > 1) Accept the patch - when my internal discussion concludes, I
> > > can make the change and backport according to the conclusion.
> > > 2) Wait till the discussion is over - it might take another
> > > couple of weeks
> >
> > One month passed since this last update.
> > We are keeping this issue in DPDK 20.11.0 I guess.
> >
> I can accept this patch and move forward for 20.11. It is a stronger
> barrier and I do not see any issues from the code perspective. I will
> run tests on few platforms and provide my ACK.
>
> It is work in progress with few changes for me to make sure we have
> an optimal solution for all platforms. Those changes can go into
> 21.02.

Has anyone investigated later developments in concurrency?

While researching the MCS lock, I came across this quote:

https://mfukar.github.io/2017/09/26/mcs.html

Luckily, we don't have to worry about this very much. MCS locks right
now are mostly a teaching tool, and have mostly been superseded by:

- CLH locks: Craig, Landin, and Hagersten locks replace the explicit
  queue for a logical queue
- K42 locks: On-stack information is used instead of keeping a
  thread-local queue node around. A similar idea is used by the
  stack-lock algorithm. Note: K42 locks are patented by IBM.
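
For readers unfamiliar with the CLH idea in the quote above, here is a
minimal sketch in C11 atomics. It is not DPDK code: all names
(clh_node_t, clh_lock_t, clh_handle_t, clh_init, clh_handle_init,
clh_acquire, clh_release) are hypothetical, and the memory orders are my
assumption of a sufficient choice under the C11 model. They have not been
checked in herd7 or on weak-memory hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not part of librte_eal. */
typedef struct clh_node {
    atomic_bool locked;          /* true while the owner holds or waits for the lock */
} clh_node_t;

typedef struct {
    _Atomic(clh_node_t *) tail;  /* most recently enqueued node (logical queue) */
} clh_lock_t;

typedef struct {
    clh_node_t *mine;            /* node this thread publishes on acquire */
    clh_node_t *pred;            /* predecessor's node, spun on while waiting */
} clh_handle_t;

clh_node_t *clh_node_new(void)
{
    clh_node_t *n = malloc(sizeof(*n));
    assert(n != NULL);
    atomic_init(&n->locked, false);
    return n;
}

/* The lock starts with one unlocked dummy node as its tail. */
void clh_init(clh_lock_t *l)
{
    atomic_init(&l->tail, clh_node_new());
}

/* Each thread needs its own handle. */
void clh_handle_init(clh_handle_t *h)
{
    h->mine = clh_node_new();
    h->pred = NULL;
}

void clh_acquire(clh_lock_t *l, clh_handle_t *h)
{
    /* Mark our node busy, then publish it with the exchange. The release
     * half of the exchange orders the store above before publication. */
    atomic_store_explicit(&h->mine->locked, true, memory_order_relaxed);
    h->pred = atomic_exchange_explicit(&l->tail, h->mine,
                                       memory_order_acq_rel);
    /* Spin on the predecessor's node; there is no next pointer. */
    while (atomic_load_explicit(&h->pred->locked, memory_order_acquire))
        ;
}

void clh_release(clh_handle_t *h)
{
    clh_node_t *pred = h->pred;
    /* Hand over to the successor (if any) spinning on our node. */
    atomic_store_explicit(&h->mine->locked, false, memory_order_release);
    /* Our old node may still be read by the successor, so reuse the
     * predecessor's node for the next acquire. */
    h->mine = pred;
}
```

Unlike MCS, a waiter here spins on its predecessor's node, so nothing
has to be stored into another thread's node after the exchange. The
sketch never frees nodes, which a real implementation would have to
handle.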