From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id D5761424EE;
	Mon,  4 Sep 2023 10:25:25 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 6A1B0402AF;
	Mon,  4 Sep 2023 10:25:25 +0200 (CEST)
Received: from NAM02-SN1-obe.outbound.protection.outlook.com
 (mail-sn1nam02on2041.outbound.protection.outlook.com [40.107.96.41])
 by mails.dpdk.org (Postfix) with ESMTP id A0C76400EF;
 Mon,  4 Sep 2023 10:25:23 +0200 (CEST)
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=CapAvqgK9zwcTwSfAcfp1bl6bXuoOKhpYaN1twbo+euwqsewHyzRhvERmYXrWTu7lNVOA8olP+v7JdDQoeX3Pc44a15KeCNboOIG531hPB5rLEuUz9nTofS0nLd9pC/2oArcxOwx1EK5uWIqHB4pKIAIQJFjzzG/E6+DyJeWIEV1pEJlT0WoBn6wVE7Jo1E5jojLZ7eJcxlk0tKaXRF0JmFhW4cNyBkcOPQ0pRZm28ZbucJtAvxK+GfsA7+TJ23AIZ1iK6dEgTkdqB0b2kLghGT5m7wadtF8ZDlCAwm4GWZuc8Se8UH30ax0g73XPQPisdd3zc7JZqtM9Th90RGtRg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=lsrQZZcPKst8RJgvHzIJ7i8p7gzqgSZCUmCDi5xuKqk=;
 b=BdHjGiIWCK5Q+JExr8b32s2GA7VqXuMaRwDLyvibQ/MAxLL7CZIiqcbp4E7uWiivjCLxiPvRrOHdPVYWPfUNrqj98pyqSrdwY2fM52WnATDM3PvZPIufgvPavY6ORTfJ3PJCE96ziE2TNV5d9qSMzvY2ssfpGb9x3FpnROExy3EFS7a7eY9rKlwOLAI5mX9FpedD7m62vVKLKtONPW3MM1TYNSD8OvO8fExdQQWvmib4gp7laG2rWTYBs9XDBZblfMmBNUie5yDjuzAXRnQcuWXGx6xwBA470gG/iqaryYvRTEmJsPaUGNF85bUep1FapHBN0ECCHVDawtEYo1hLig==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is
 216.228.117.161) smtp.rcpttodomain=dpdk.org smtp.mailfrom=nvidia.com;
 dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com;
 dkim=none (message not signed); arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com;
 s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=lsrQZZcPKst8RJgvHzIJ7i8p7gzqgSZCUmCDi5xuKqk=;
 b=mTrcEH/GH3zpoX3krfHun5CnR0jyj3jPFmRQlCdlhQd+yJY4/MhQwwv0Lkae8tMIjMTEkT8aeVUwGzJcsu8AssxHOCrHzXFT2UDUBYg3cR3bOWEjtfOQfgyqC8geZvxZD0LM64ngSEC89zxpLBfyzxs97/Uv6gtxh+5FOcvtpq2RVHVx07ZnOcV404inqUhpRaKFWpmKBHF0Scr1+LOXHYan0MOeMRGaYMlQxwPrj5ZUGbrre4GVqpJmKO0pkuDtw6hHDb1qFktGWLcRGw5TfA1GZN2Lb5EhJ0e2rzYjxHCpU44ULT6ZjDlJgW+XHoc/P2P4wKOAWnm+9+RS5WhzFg==
Received: from SN7PR18CA0005.namprd18.prod.outlook.com (2603:10b6:806:f3::24)
 by CH3PR12MB7763.namprd12.prod.outlook.com (2603:10b6:610:145::10)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6699.35; Mon, 4 Sep
 2023 08:25:20 +0000
Received: from SA2PEPF00001509.namprd04.prod.outlook.com
 (2603:10b6:806:f3:cafe::3c) by SN7PR18CA0005.outlook.office365.com
 (2603:10b6:806:f3::24) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6745.33 via Frontend
 Transport; Mon, 4 Sep 2023 08:25:20 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161)
 smtp.mailfrom=nvidia.com;
 dkim=none (message not signed)
 header.d=none;dmarc=pass action=none header.from=nvidia.com;
Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates
 216.228.117.161 as permitted sender) receiver=protection.outlook.com;
 client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C
Received: from mail.nvidia.com (216.228.117.161) by
 SA2PEPF00001509.mail.protection.outlook.com (10.167.242.41) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.6768.25 via Frontend Transport; Mon, 4 Sep 2023 08:25:20 +0000
Received: from rnnvmail205.nvidia.com (10.129.68.10) by mail.nvidia.com
 (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.5; Mon, 4 Sep 2023
 01:25:07 -0700
Received: from rnnvmail202.nvidia.com (10.129.68.7) by rnnvmail205.nvidia.com
 (10.129.68.10) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.37; Mon, 4 Sep 2023
 01:25:07 -0700
Received: from nvidia.com (10.127.8.9) by mail.nvidia.com (10.129.68.7) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.37 via Frontend
 Transport; Mon, 4 Sep 2023 01:25:05 -0700
From: Artemy Kovalyov <artemyko@nvidia.com>
To: <dev@dpdk.org>
CC: Thomas Monjalon <thomas@monjalon.net>, Ophir Munk <ophirmu@nvidia.com>,
 <stable@dpdk.org>, Anatoly Burakov <anatoly.burakov@intel.com>,
 =?UTF-8?q?Morten=20Br=C3=B8rup?= <mb@smartsharesystems.com>, "Stephen
 Hemminger" <stephen@networkplumber.org>
Subject: [PATCH v2] eal: fix memory initialization deadlock
Date: Mon, 4 Sep 2023 11:24:54 +0300
Message-ID: <20230904082455.3864024-1-artemyko@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230830103303.2428995-1-artemyko@nvidia.com>
References: <20230830103303.2428995-1-artemyko@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-NV-OnPremToCloud: ExternallySecured
X-EOPAttributedMessage: 0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: SA2PEPF00001509:EE_|CH3PR12MB7763:EE_
X-MS-Office365-Filtering-Correlation-Id: 9fa25b35-8e32-4108-61ce-08dbad207a3c
X-LD-Processed: 43083d15-7273-40c1-b7db-39efd9ccc17a,ExtAddr
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info: TouzsrUSU1AW9Sk/V0Jcxp//8zhQV00jphnTZwGBSWm1Crcn3T2F3L0tvJ6cNDdhynQgZFwgugmbvywSpsb64OXuuqnoxrXb8ZYzWQBqWVjlOHlZRYvmc2W9EdzezQff3+h7f3yU3Xli2bJbmLLMQqqbLOAJRciylv/aVkEwszuNqDuCl47b8Bbg7Djvrb6T5ZWyZrmqCm92bU1UhaMjlM1WEiDQ2Y1WQMALSc5SuqzzXruTTQ47GQbunw2aXQnMjom+B5J/jiouNG3ReneVfwIWTV/ScXyIb1808z8Z9Ux2tb7tr20PjisLVb1BLnsPlmkKAkBcjLxpO9zfu+lsrVMmKyQH32+1orxYnMSDyEFSd0sBB3KT6cSwmXUqhWfoGQXQQdPh278ZEjhzIK+CaHTKFm/3eBEAVA8nZX4jdkNNqkc7X0DeO9lz923rcnEcTaHpp8bSU5fZ7olqA0HWvUZRs1cnhEbP5sMSZ0wkFr8xE2Wi7vw+bTPNzdfoSv1W8Rbdbx+WKQHay5uvZvmllRHMD1PA/7v6XoZ/C4h03wT30kH9XnmrELxlSDs75DTY1DD6AXs7Ipf/cihc+iMNPfGJtcRNCgK7+p7HJyDKz/QHDNeP1eKcWx0jssgLePRqevY9Hnx+IFDjVI7eqeORA34kCb8oWpEFtYb64rMau5LfybVU1DdFn7ACZGA34tTJa/HJS78YH63oSxBPc+CBF9MWkhYXUjpjbYrbNcZl0jtO2JDdG+DhJMm374QjEyyA
X-Forefront-Antispam-Report: CIP:216.228.117.161; CTRY:US; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:dc6edge2.nvidia.com; CAT:NONE;
 SFS:(13230031)(4636009)(39860400002)(396003)(376002)(136003)(346002)(82310400011)(451199024)(186009)(1800799009)(46966006)(40470700004)(36840700001)(426003)(40460700003)(316002)(41300700001)(5660300002)(336012)(6286002)(6916009)(2616005)(83380400001)(26005)(86362001)(36756003)(8676002)(55016003)(2906002)(36860700001)(47076005)(4326008)(1076003)(40480700001)(8936002)(6666004)(7636003)(356005)(82740400003)(478600001)(7696005)(54906003)(70586007)(70206006);
 DIR:OUT; SFP:1101; 
X-OriginatorOrg: Nvidia.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 04 Sep 2023 08:25:20.2987 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 9fa25b35-8e32-4108-61ce-08dbad207a3c
X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.117.161];
 Helo=[mail.nvidia.com]
X-MS-Exchange-CrossTenant-AuthSource: SA2PEPF00001509.namprd04.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH3PR12MB7763
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

The issue arose from a change in the DPDK read-write lock
implementation. That change added a new flag, RTE_RWLOCK_WAIT, designed
to prevent new read locks while a write lock is queued. As a result, a
recursive read lock, where the same execution thread acquires the lock
twice, can start a sequence of events that ends in a deadlock:

1. Process 1 takes the first read lock.
2. Process 2 attempts to take a write lock, triggering RTE_RWLOCK_WAIT
   due to the presence of a read lock. This makes process 2 enter a wait
   loop until the read lock is released.
3. Process 1 tries to take a second read lock. However, since a write
   lock is waiting (due to RTE_RWLOCK_WAIT), it also enters a wait loop
   until the write lock is acquired and then released.

Both processes end up blocked, each waiting for the other to make
progress, which is a deadlock.

Following that change, the RW lock no longer supports recursion: a
thread must not take a read lock if it already holds one. The problem
arises during initialization: rte_eal_init() acquires
memory_hotplug_lock, and later on, the sequence of calls
rte_eal_memory_init() -> eal_memalloc_init() -> rte_memseg_list_walk()
acquires it again without releasing it. This can deadlock when a
concurrent write lock is requested on the same memory_hotplug_lock. To
address this, replace the rte_memseg_list_walk() calls made while the
lock is already held with rte_memseg_list_walk_thread_unsafe().

Bugzilla ID: 1277
Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
Cc: stable@dpdk.org

Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
v2:
changed walk to thread-unsafe version in eal_dynmem_hugepage_init() 32-bit flow
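
For reviewers: the deadlock sequence from the commit message can be
reproduced in a small single-threaded toy model. This is only a sketch,
assuming a counter with a WAIT bit, a WRITE bit, and readers counted
above them. The toy_* names are hypothetical and are not DPDK's API or
implementation. The try-lock helpers return false where a blocking lock
would spin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: names and bit layout are illustrative assumptions. */
#define TOY_WAIT  0x1u	/* a writer is queued */
#define TOY_WRITE 0x2u	/* a writer holds the lock */
#define TOY_READ  0x4u	/* readers are counted in units of this value */

struct toy_rwlock {
	uint32_t cnt;
};

/* Non-blocking read attempt: refused while a writer holds or waits. */
static bool
toy_read_trylock(struct toy_rwlock *l)
{
	if (l->cnt & (TOY_WAIT | TOY_WRITE))
		return false;
	l->cnt += TOY_READ;
	return true;
}

static void
toy_read_unlock(struct toy_rwlock *l)
{
	l->cnt -= TOY_READ;
}

/*
 * Non-blocking write attempt. On failure the WAIT bit stays set, so
 * later read attempts are refused until the writer has had its turn.
 */
static bool
toy_write_trylock(struct toy_rwlock *l)
{
	if (l->cnt < TOY_WRITE) {	/* no readers and no writer */
		l->cnt = TOY_WRITE;
		return true;
	}
	l->cnt |= TOY_WAIT;
	return false;
}

/* Replays the three steps; true means neither side can make progress. */
static bool
toy_recursive_read_deadlocks(void)
{
	struct toy_rwlock l = { 0 };
	bool first = toy_read_trylock(&l);	/* process 1: first read lock */
	bool writer = toy_write_trylock(&l);	/* process 2: refused, sets WAIT */
	bool second = toy_read_trylock(&l);	/* process 1: recursive read */

	assert(first);
	return !writer && !second;
}
```

In this model toy_recursive_read_deadlocks() returns true: the writer is
refused because a reader holds the lock, and the recursive read is
refused because the writer left the WAIT bit set. Calling
toy_read_unlock() for the first read lock lets a later
toy_write_trylock() succeed.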
---
 lib/eal/common/eal_common_dynmem.c   | 5 ++++-
 lib/eal/include/generic/rte_rwlock.h | 4 ++++
 lib/eal/linux/eal_memalloc.c         | 7 +++++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
index bdbbe233a0..0d5da40096 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -251,7 +251,10 @@ eal_dynmem_hugepage_init(void)
 		 */
 		memset(&dummy, 0, sizeof(dummy));
 		dummy.hugepage_sz = hpi->hugepage_sz;
-		if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
+		/* memory_hotplug_lock is taken in rte_eal_init(), so it's
+		 * safe to call the thread-unsafe version.
+		 */
+		if (rte_memseg_list_walk_thread_unsafe(hugepage_count_walk, &dummy) < 0)
 			return -1;
 
 		for (i = 0; i < RTE_DIM(dummy.num_pages); i++) {
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 9e083bbc61..c98fc7d083 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -80,6 +80,10 @@ rte_rwlock_init(rte_rwlock_t *rwl)
 /**
  * Take a read lock. Loop until the lock is held.
  *
+ * @note The RW lock isn't recursive, so calling this function on the same
+ * lock twice without releasing it can result in a deadlock if a writer
+ * starts waiting for the lock in between.
+ *
  * @param rwl
  *   A pointer to a rwlock structure.
  */
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588cae..3705b41f5f 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -1740,7 +1740,10 @@ eal_memalloc_init(void)
 		eal_get_internal_configuration();
 
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
-		if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
+		/* memory_hotplug_lock is taken in rte_eal_init(), so it's
+		 * safe to call the thread-unsafe version.
+		 */
+		if (rte_memseg_list_walk_thread_unsafe(secondary_msl_create_walk, NULL) < 0)
 			return -1;
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
 			internal_conf->in_memory) {
@@ -1778,7 +1781,7 @@ eal_memalloc_init(void)
 	}
 
 	/* initialize all of the fd lists */
-	if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+	if (rte_memseg_list_walk_thread_unsafe(fd_list_create_walk, NULL))
 		return -1;
 	return 0;
 }
-- 
2.25.1