From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shani Peretz
To: Wathsala Vithanage
CC: Ola Liljedahl, Honnappa Nagarahalli, Dhruv Tripathi,
	Konstantin Ananyev, dpdk stable
Subject: patch 'ring: establish safe partial order in default mode' has been
	queued to stable release 23.11.6
Date: Thu, 25 Dec 2025 11:18:30 +0200
Message-ID: <20251225091938.345892-69-shperetz@nvidia.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251225091938.345892-1-shperetz@nvidia.com>
References: <20251221145746.763179-93-shperetz@nvidia.com>
 <20251225091938.345892-1-shperetz@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
X-BeenThere: stable@dpdk.org
Precedence: list
List-Id: patches for DPDK stable branches
Errors-To: stable-bounces@dpdk.org

Hi,

FYI, your patch has been queued to stable release 23.11.6

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 12/30/25. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Queued patches are on a temporary branch at:
	https://github.com/shanipr/dpdk-stable

This queued commit can be viewed at:
	https://github.com/shanipr/dpdk-stable/commit/942163a489126848c2ae10fcb60df2fb59f3aac8

Thanks.
Shani

---
>From 942163a489126848c2ae10fcb60df2fb59f3aac8 Mon Sep 17 00:00:00 2001
From: Wathsala Vithanage
Date: Tue, 2 Dec 2025 20:39:26 +0000
Subject: [PATCH] ring: establish safe partial order in default mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[ upstream commit a4ad0eba9def1d1d071da8afe5e96eb2a2e0d71f ]

The function __rte_ring_headtail_move_head() assumes that the barrier
(fence) between the load of the head and the load-acquire of the
opposing tail guarantees the following: if a first thread reads tail
and then writes head and a second thread reads the new value of head
and then reads tail, then it should observe the same (or a later)
value of tail.

This assumption is incorrect under the C11 memory model. If the barrier
(fence) is intended to establish a total ordering of ring operations,
it fails to do so. Instead, the current implementation only enforces a
partial ordering, which can lead to unsafe interleavings. In particular,
some partial orders can cause underflows in free slot or available
element computations, potentially resulting in data corruption.

The issue manifests when a CPU first acts as a producer and later as a
consumer. In this scenario, the barrier assumption may fail when another
core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
this violation. The problem has not been widely observed so far because:
 (a) on strong memory models (e.g., x86-64) the assumption holds, and
 (b) on relaxed models with RCsc semantics the ordering is still strong
     enough to prevent hazards.
The problem becomes visible only on weaker models, when load-acquire is
implemented with RCpc semantics (e.g. some AArch64 CPUs which support
the LDAPR and LDAPUR instructions).

Three possible solutions exist:
 1. Strengthen ordering by upgrading release/acquire semantics to
    sequential consistency. This requires using seq-cst for stores,
    loads, and CAS operations. However, this approach introduces a
    significant performance penalty on relaxed-memory architectures.

 2. Establish a safe partial order by enforcing a pair-wise
    happens-before relationship between threads of the same role,
    converting the CAS and the preceding load of the head to release
    and acquire respectively. This approach makes the original
    barrier assumption unnecessary and allows its removal.

 3. Retain partial ordering but ensure only safe partial orders are
    committed. This can be done by detecting underflow conditions
    (producer < consumer) and quashing the update in such cases.
    This approach makes the original barrier assumption unnecessary
    and allows its removal.

This patch implements solution (2) to preserve the “enqueue always
succeeds” contract expected by dependent libraries (e.g., mempool).
While solution (3) offers higher performance, adopting it now would
break that assumption.

Signed-off-by: Wathsala Vithanage
Signed-off-by: Ola Liljedahl
Reviewed-by: Honnappa Nagarahalli
Reviewed-by: Dhruv Tripathi
Acked-by: Konstantin Ananyev
Tested-by: Konstantin Ananyev
---
 lib/ring/rte_ring_c11_pvt.h | 71 ++++++++++++++++++++++++++++---------
 1 file changed, 54 insertions(+), 17 deletions(-)

diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index 5c10ad88f5..fb00889fdf 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -24,7 +24,12 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 	if (!single)
 		rte_wait_until_equal_32((uint32_t *)(uintptr_t)&ht->tail, old_val,
 			rte_memory_order_relaxed);
-
+	/*
+	 * R0: Establishes a synchronizing edge with load-acquire of
+	 * cons_tail at A1 or prod_tail at A4.
+	 * Ensures that memory effects by this thread on ring elements array
+	 * is observed by a different thread of the other type.
+	 */
 	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
 }

@@ -62,16 +67,24 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 	unsigned int max = n;
 	int success;

-	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
+	/*
+	 * A0: Establishes a synchronizing edge with R1.
+	 * Ensure that this thread observes same values
+	 * to cons_tail observed by the thread that
+	 * updated r->prod.head.
+	 * If not, an unsafe partial order may ensue.
+	 */
+	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_acquire);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;

-		/* Ensure the head is read before tail */
-		__atomic_thread_fence(rte_memory_order_acquire);

-		/* load-acquire synchronize with store-release of ht->tail
-		 * in update_tail.
+		/*
+		 * A1: Establishes a synchronizing edge with R0.
+		 * Ensures that other thread's memory effects on
+		 * ring elements array is observed by the time
+		 * this thread observes its tail update.
 		 */
 		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
 					rte_memory_order_acquire);
@@ -97,10 +110,19 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 			success = 1;
 		} else
 			/* on failure, *old_head is updated */
+			/*
+			 * R1/A2.
+			 * R1: Establishes a synchronizing edge with A0 of a
+			 * different thread.
+			 * A2: Establishes a synchronizing edge with R1 of a
+			 * different thread to observe same value for
+			 * cons_tail observed by that thread on CAS failure
+			 * (to retry with an updated *old_head).
+			 */
 			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
 					old_head, *new_head,
-					rte_memory_order_relaxed,
-					rte_memory_order_relaxed);
+					rte_memory_order_release,
+					rte_memory_order_acquire);
 	} while (unlikely(success == 0));
 	return n;
 }
@@ -138,17 +160,23 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 	uint32_t prod_tail;
 	int success;

-	/* move cons.head atomically */
-	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
+	/*
+	 * A3: Establishes a synchronizing edge with R2.
+	 * Ensure that this thread observes same values
+	 * to prod_tail observed by the thread that
+	 * updated r->cons.head.
+	 * If not, an unsafe partial order may ensue.
+	 */
+	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_acquire);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;

-		/* Ensure the head is read before tail */
-		__atomic_thread_fence(rte_memory_order_acquire);
-
-		/* this load-acquire synchronize with store-release of ht->tail
-		 * in update_tail.
+		/*
+		 * A4: Establishes a synchronizing edge with R0.
+		 * Ensures that other thread's memory effects on
+		 * ring elements array is observed by the time
+		 * this thread observes its tail update.
 		 */
 		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
 					rte_memory_order_acquire);
@@ -173,10 +201,19 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 			success = 1;
 		} else
 			/* on failure, *old_head will be updated */
+			/*
+			 * R2/A5.
+			 * R2: Establishes a synchronizing edge with A3 of a
+			 * different thread.
+			 * A5: Establishes a synchronizing edge with R2 of a
+			 * different thread to observe same value for
+			 * prod_tail observed by that thread on CAS failure
+			 * (to retry with an updated *old_head).
+			 */
 			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
 					old_head, *new_head,
-					rte_memory_order_relaxed,
-					rte_memory_order_relaxed);
+					rte_memory_order_release,
+					rte_memory_order_acquire);
 	} while (unlikely(success == 0));
 	return n;
 }
--
2.43.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2025-12-25 11:16:39.805462717 +0200
+++ 0069-ring-establish-safe-partial-order-in-default-mode.patch	2025-12-25 11:16:36.069837000 +0200
@@ -0,0 +1,194 @@
+From 942163a489126848c2ae10fcb60df2fb59f3aac8 Mon Sep 17 00:00:00 2001
+From: Wathsala Vithanage
+Date: Tue, 2 Dec 2025 20:39:26 +0000
+Subject: [PATCH] ring: establish safe partial order in default mode
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+[ upstream commit a4ad0eba9def1d1d071da8afe5e96eb2a2e0d71f ]
+
+The function __rte_ring_headtail_move_head() assumes that the barrier
+(fence) between the load of the head and the load-acquire of the
+opposing tail guarantees the following: if a first thread reads tail
+and then writes head and a second thread reads the new value of head
+and then reads tail, then it should observe the same (or a later)
+value of tail.
+
+This assumption is incorrect under the C11 memory model. If the barrier
+(fence) is intended to establish a total ordering of ring operations,
+it fails to do so. Instead, the current implementation only enforces a
+partial ordering, which can lead to unsafe interleavings. In particular,
+some partial orders can cause underflows in free slot or available
+element computations, potentially resulting in data corruption.
+
+The issue manifests when a CPU first acts as a producer and later as a
+consumer. In this scenario, the barrier assumption may fail when another
+core takes the consumer role. A Herd7 litmus test in C11 can demonstrate
+this violation. The problem has not been widely observed so far because:
+ (a) on strong memory models (e.g., x86-64) the assumption holds, and
+ (b) on relaxed models with RCsc semantics the ordering is still strong
+     enough to prevent hazards.
+The problem becomes visible only on weaker models, when load-acquire is
+implemented with RCpc semantics (e.g. some AArch64 CPUs which support
+the LDAPR and LDAPUR instructions).
+
+Three possible solutions exist:
+ 1. Strengthen ordering by upgrading release/acquire semantics to
+    sequential consistency. This requires using seq-cst for stores,
+    loads, and CAS operations. However, this approach introduces a
+    significant performance penalty on relaxed-memory architectures.
+
+ 2. Establish a safe partial order by enforcing a pair-wise
+    happens-before relationship between threads of the same role,
+    converting the CAS and the preceding load of the head to
+    release and acquire respectively. This approach makes the original
+    barrier assumption unnecessary and allows its removal.
+
+ 3. Retain partial ordering but ensure only safe partial orders are
+    committed. This can be done by detecting underflow conditions
+    (producer < consumer) and quashing the update in such cases.
+    This approach makes the original barrier assumption unnecessary
+    and allows its removal.
+
+This patch implements solution (2) to preserve the “enqueue always
+succeeds” contract expected by dependent libraries (e.g., mempool).
+While solution (3) offers higher performance, adopting it now would
+break that assumption.
+
+Signed-off-by: Wathsala Vithanage
+Signed-off-by: Ola Liljedahl
+Reviewed-by: Honnappa Nagarahalli
+Reviewed-by: Dhruv Tripathi
+Acked-by: Konstantin Ananyev
+Tested-by: Konstantin Ananyev
+---
+ lib/ring/rte_ring_c11_pvt.h | 71 ++++++++++++++++++++++++++++---------
+ 1 file changed, 54 insertions(+), 17 deletions(-)
+
+diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
+index 5c10ad88f5..fb00889fdf 100644
+--- a/lib/ring/rte_ring_c11_pvt.h
++++ b/lib/ring/rte_ring_c11_pvt.h
+@@ -24,7 +24,12 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
+ 	if (!single)
+ 		rte_wait_until_equal_32((uint32_t *)(uintptr_t)&ht->tail, old_val,
+ 			rte_memory_order_relaxed);
+-
++	/*
++	 * R0: Establishes a synchronizing edge with load-acquire of
++	 * cons_tail at A1 or prod_tail at A4.
++	 * Ensures that memory effects by this thread on ring elements array
++	 * is observed by a different thread of the other type.
++	 */
+ 	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
+ }
+
+@@ -62,16 +67,24 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
+ 	unsigned int max = n;
+ 	int success;
+
+-	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
++	/*
++	 * A0: Establishes a synchronizing edge with R1.
++	 * Ensure that this thread observes same values
++	 * to cons_tail observed by the thread that
++	 * updated r->prod.head.
++	 * If not, an unsafe partial order may ensue.
++	 */
++	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_acquire);
+ 	do {
+ 		/* Reset n to the initial burst count */
+ 		n = max;
+
+-		/* Ensure the head is read before tail */
+-		__atomic_thread_fence(rte_memory_order_acquire);
+
+-		/* load-acquire synchronize with store-release of ht->tail
+-		 * in update_tail.
++		/*
++		 * A1: Establishes a synchronizing edge with R0.
++		 * Ensures that other thread's memory effects on
++		 * ring elements array is observed by the time
++		 * this thread observes its tail update.
+ 		 */
+ 		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
+ 					rte_memory_order_acquire);
+@@ -97,10 +110,19 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
+ 			success = 1;
+ 		} else
+ 			/* on failure, *old_head is updated */
++			/*
++			 * R1/A2.
++			 * R1: Establishes a synchronizing edge with A0 of a
++			 * different thread.
++			 * A2: Establishes a synchronizing edge with R1 of a
++			 * different thread to observe same value for
++			 * cons_tail observed by that thread on CAS failure
++			 * (to retry with an updated *old_head).
++			 */
+ 			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
+ 					old_head, *new_head,
+-					rte_memory_order_relaxed,
+-					rte_memory_order_relaxed);
++					rte_memory_order_release,
++					rte_memory_order_acquire);
+ 	} while (unlikely(success == 0));
+ 	return n;
+ }
+@@ -138,17 +160,23 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
+ 	uint32_t prod_tail;
+ 	int success;
+
+-	/* move cons.head atomically */
+-	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
++	/*
++	 * A3: Establishes a synchronizing edge with R2.
++	 * Ensure that this thread observes same values
++	 * to prod_tail observed by the thread that
++	 * updated r->cons.head.
++	 * If not, an unsafe partial order may ensue.
++	 */
++	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_acquire);
+ 	do {
+ 		/* Restore n as it may change every loop */
+ 		n = max;
+
+-		/* Ensure the head is read before tail */
+-		__atomic_thread_fence(rte_memory_order_acquire);
+-
+-		/* this load-acquire synchronize with store-release of ht->tail
+-		 * in update_tail.
++		/*
++		 * A4: Establishes a synchronizing edge with R0.
++		 * Ensures that other thread's memory effects on
++		 * ring elements array is observed by the time
++		 * this thread observes its tail update.
+ 		 */
+ 		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
+ 					rte_memory_order_acquire);
+@@ -173,10 +201,19 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
+ 			success = 1;
+ 		} else
+ 			/* on failure, *old_head will be updated */
++			/*
++			 * R2/A5.
++			 * R2: Establishes a synchronizing edge with A3 of a
++			 * different thread.
++			 * A5: Establishes a synchronizing edge with R2 of a
++			 * different thread to observe same value for
++			 * prod_tail observed by that thread on CAS failure
++			 * (to retry with an updated *old_head).
++			 */
+ 			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
+ 					old_head, *new_head,
+-					rte_memory_order_relaxed,
+-					rte_memory_order_relaxed);
++					rte_memory_order_release,
++					rte_memory_order_acquire);
+ 	} while (unlikely(success == 0));
+ 	return n;
+ }
+--
+2.43.0
+
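
P.S. (editorial, not part of the queued patch): for readers who want to see
the ordering scheme of solution (2) outside of DPDK, below is a minimal,
self-contained C11 sketch of the producer head-move path it describes: an
acquire load of the head (A0), an acquire load of the opposing tail (A1),
and a head CAS that is release on success and acquire on failure (R1/A2).
The toy_ring/toy_move_prod_head names, the fixed capacity, and the omission
of the consumer side and the store-release of cons.tail are illustrative
assumptions only; the real implementation is in lib/ring/rte_ring_c11_pvt.h.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct toy_headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

struct toy_ring {
	uint32_t capacity;
	struct toy_headtail prod;
	struct toy_headtail cons;
};

/* Reserve up to n slots by advancing prod.head; returns the number reserved. */
static uint32_t
toy_move_prod_head(struct toy_ring *r, uint32_t n,
		uint32_t *old_head, uint32_t *new_head)
{
	const uint32_t max = n;
	uint32_t cons_tail, free_entries;

	/* A0: acquire pairs with the release CAS (R1) of another producer. */
	*old_head = atomic_load_explicit(&r->prod.head, memory_order_acquire);
	do {
		/* Reset the request on every retry, as the real code does. */
		n = max;

		/* A1: acquire pairs with the consumer's store-release of cons.tail. */
		cons_tail = atomic_load_explicit(&r->cons.tail,
				memory_order_acquire);

		/* Indices only grow; wraparound arithmetic yields free slots. */
		free_entries = r->capacity + cons_tail - *old_head;
		if (n > free_entries)
			n = free_entries;
		if (n == 0)
			return 0;

		*new_head = *old_head + n;
		/*
		 * R1/A2: release on success publishes this reservation to the
		 * next producer's A0; acquire on failure refreshes *old_head
		 * before the retry.
		 */
	} while (!atomic_compare_exchange_strong_explicit(&r->prod.head,
			old_head, *new_head,
			memory_order_release, memory_order_acquire));

	return n;
}

int main(void)
{
	struct toy_ring r = { .capacity = 16 };
	uint32_t old_h, new_h;
	uint32_t got = toy_move_prod_head(&r, 4, &old_h, &new_h);

	printf("reserved %u slots (head %u -> %u)\n", got, old_h, new_h);
	return 0;
}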