From: Konstantin Ananyev
To: Konstantin Ananyev, Wathsala Vithanage, Honnappa Nagarahalli
CC: dev@dpdk.org, Ola Liljedahl, Dhruv Tripathi
Subject: RE: [PATCH 1/1] ring: safe partial ordering for head/tail update
Date: Tue, 16 Sep 2025 23:08:07 +0000
Message-ID: <96a93e5a616e47ec8dbf08d97a73745e@huawei.com>
References: <20250915185451.533039-1-wathsala.vithanage@arm.com>
 <20250915185451.533039-2-wathsala.vithanage@arm.com>

> > The function __rte_ring_headtail_move_head() assumes that the barrier
> > (fence) between the load of the head and the load-acquire of the
> > opposing tail guarantees the following: if a first thread reads tail
> > and then writes head, and a second thread reads the new value of head
> > and then reads tail, then the second thread should observe the same
> > (or a later) value of tail.
> >
> > This assumption is incorrect under the C11 memory model. If the barrier
> > (fence) is intended to establish a total ordering of ring operations,
> > it fails to do so. Instead, the current implementation only enforces a
> > partial ordering, which can lead to unsafe interleavings. In particular,
> > some partial orders can cause underflows in the free-slot or
> > available-element computations, potentially resulting in data corruption.
>
> Hmm... sounds exactly like the problem from the patch we discussed
> earlier this year:
> https://patchwork.dpdk.org/project/dpdk/patch/20250521111432.207936-4-konstantin.ananyev@huawei.com/
> In short:
> "... a thread can see the 'latest' 'cons.head' value with a 'previous'
> value for 'prod.tail', or vice versa.
> In other words: the 'cons.head' value depends on 'prod.tail', so before
> making the latest 'cons.head' value visible to other threads, we need to
> ensure that the latest 'prod.tail' is also visible."
> Is that the one?
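To illustrate what I meant there, here is a minimal plain-C11 sketch of
that interleaving (hypothetical stand-ins for the ring fields and
simplified consumer bodies, not the actual rte_ring code):

#include <stdatomic.h>

/* Hypothetical stand-ins for the ring fields, not the real rte_ring layout. */
static atomic_uint prod_tail;  /* published by producers with store-release */
static atomic_uint cons_head;  /* advanced by consumers via CAS */

/* Consumer A: observes an up-to-date prod_tail, then advances cons_head. */
void consumer_a(void)
{
	unsigned int head  = atomic_load_explicit(&cons_head, memory_order_relaxed);
	unsigned int stail = atomic_load_explicit(&prod_tail, memory_order_acquire);

	if (stail != head)
		/* at least one element from A's point of view; note that this
		 * relaxed CAS does NOT publish the prod_tail value seen above */
		atomic_compare_exchange_strong_explicit(&cons_head, &head, head + 1,
				memory_order_relaxed, memory_order_relaxed);
}

/* Consumer B: C11 allows it to read A's new cons_head and still read an
 * older prod_tail, even with an acquire fence between the two loads,
 * because there is no release operation on cons_head to synchronize with. */
void consumer_b(void)
{
	unsigned int head  = atomic_load_explicit(&cons_head, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);

	unsigned int stail = atomic_load_explicit(&prod_tail, memory_order_acquire);
	unsigned int entries = stail - head;	/* can wrap to a huge value */
	(void)entries;
}

That is the outcome I would expect a herd7 litmus test of this shape to
report as allowed.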
>
> > The issue manifests when a CPU first acts as a producer and later as a
> > consumer. In this scenario, the barrier assumption may fail when another
> > core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
> > this violation. The problem has not been widely observed so far because:
> >   (a) on strong memory models (e.g., x86-64) the assumption holds, and
> >   (b) on relaxed models with RCsc semantics the ordering is still strong
> >       enough to prevent hazards.
> > The problem becomes visible only on weaker models, when load-acquire is
> > implemented with RCpc semantics (e.g. some AArch64 CPUs which support
> > the LDAPR and LDAPUR instructions).
> >
> > Three possible solutions exist:
> > 1. Strengthen ordering by upgrading release/acquire semantics to
> >    sequential consistency. This requires using seq-cst for stores,
> >    loads, and CAS operations. However, this approach introduces a
> >    significant performance penalty on relaxed-memory architectures.
> >
> > 2. Establish a safe partial order by enforcing a pair-wise
> >    happens-before relationship between threads of the same role, by
> >    converting the CAS to release and the preceding load of the head
> >    to acquire. This approach makes the original barrier assumption
> >    unnecessary and allows its removal.
>
> For the sake of clarity, can you outline what the exact code changes for
> approach #2 would be? Same as in that patch:
> https://patchwork.dpdk.org/project/dpdk/patch/20250521111432.207936-4-konstantin.ananyev@huawei.com/
> Or something different?
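E.g. roughly something like this? A plain-C11 sketch of the kind of change
I have in mind (move_head_sketch, d_head, s_tail are hypothetical names;
the real code would use DPDK's rte_atomic_* wrappers on struct
rte_ring_headtail):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint d_head;	/* head of the side doing the update */
static atomic_uint s_tail;	/* tail of the opposing side */

static unsigned int
move_head_sketch(unsigned int capacity, unsigned int n)
{
	unsigned int old_head, new_head, stail, entries;
	bool ok;

	do {
		/* acquire: pairs with the release-CAS below performed by
		 * another thread of the same role, so that thread's s_tail
		 * observation is guaranteed to be visible here as well */
		old_head = atomic_load_explicit(&d_head, memory_order_acquire);

		stail = atomic_load_explicit(&s_tail, memory_order_acquire);
		entries = capacity + stail - old_head;
		if (n > entries)
			n = entries;
		if (n == 0)
			return 0;
		new_head = old_head + n;

		/* release on success: publishes the new head, and with it
		 * (transitively) the s_tail value observed above */
		ok = atomic_compare_exchange_strong_explicit(&d_head,
				&old_head, new_head,
				memory_order_release, memory_order_relaxed);
	} while (!ok);

	return n;
}

If I read the commit message right, it is exactly that acquire/release
pairing between threads of the same role that would make the standalone
acquire fence removable.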
>
> > 3. Retain partial ordering but ensure that only safe partial orders are
> >    committed. This can be done by detecting underflow conditions
> >    (producer < consumer) and quashing the update in such cases.
> >    This approach makes the original barrier assumption unnecessary
> >    and allows its removal.
>
> > This patch implements solution (3) for performance reasons.
> >
> > Signed-off-by: Wathsala Vithanage
> > Signed-off-by: Ola Liljedahl
> > Reviewed-by: Honnappa Nagarahalli
> > Reviewed-by: Dhruv Tripathi
> > ---
> >  lib/ring/rte_ring_c11_pvt.h | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > index b9388af0da..e5ac1f6b9e 100644
> > --- a/lib/ring/rte_ring_c11_pvt.h
> > +++ b/lib/ring/rte_ring_c11_pvt.h
> > @@ -83,9 +83,6 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
> >  	/* Reset n to the initial burst count */
> >  	n = max;
> >
> > -	/* Ensure the head is read before tail */
> > -	rte_atomic_thread_fence(rte_memory_order_acquire);
> > -
> >  	/* load-acquire synchronize with store-release of ht->tail
> >  	 * in update_tail.
> >  	 */
>
> But then cons.head can be read before prod.tail (and vice versa), right?

s/before/after/

>
> > @@ -99,6 +96,13 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
> >  	 */
> >  	*entries = (capacity + stail - *old_head);
> >
> > +	/*
> > +	 * Ensure the entries calculation was not based on a stale
> > +	 * and unsafe stail observation that causes underflow.
> > +	 */
> > +	if ((int)*entries < 0)
> > +		*entries = 0;
> > +
> >  	/* check that we have enough room in ring */
> >  	if (unlikely(n > *entries))
> >  		n = (behavior == RTE_RING_QUEUE_FIXED) ?
> > --
> > 2.43.0
> >
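P.S. Just to double-check my reading of the new guard, a tiny standalone
illustration of the wraparound it is meant to catch (made-up numbers; I am
assuming the consumer path here, where capacity is effectively 0):

#include <stdio.h>

int main(void)
{
	/* hypothetical stale observation: an up-to-date head (105) paired
	 * with an old opposing tail (100) */
	unsigned int capacity = 0;
	unsigned int stale_tail = 100;
	unsigned int new_head = 105;

	unsigned int entries = capacity + stale_tail - new_head;
	printf("raw entries     = %u\n", entries);	/* wraps to 4294967291 */

	if ((int)entries < 0)	/* the guard added by the patch */
		entries = 0;
	printf("clamped entries = %u\n", entries);	/* 0 */

	return 0;
}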