From mboxrd@z Thu Jan 1 00:00:00 1970
From: Artemy Kovalyov
To:
CC: Thomas Monjalon, Ophir Munk, Anatoly Burakov, Morten Brørup, Stephen Hemminger
Subject: [PATCH v3] eal: fix memory initialization deadlock
Date: Wed, 6 Sep 2023 12:52:26 +0300
Message-ID: <20230906095227.1032271-1-artemyko@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230830103303.2428995-1-artemyko@nvidia.com>
References: <20230830103303.2428995-1-artemyko@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: patches for DPDK stable branches
Errors-To: stable-bounces@dpdk.org

The issue was introduced by a change to the DPDK read-write lock
implementation. That change added a new flag, RTE_RWLOCK_WAIT, which
prevents new read locks from being taken while a write lock is queued.
As a side effect, a recursive read lock, where the same execution
thread acquires the lock twice, can now trigger a sequence of events
that ends in a deadlock:

Process 1 takes the first read lock.

Process 2 attempts to take a write lock. Because a read lock is held,
it sets RTE_RWLOCK_WAIT and enters a wait loop until the read lock is
released.

Process 1 tries to take a second read lock. Because a write lock is
waiting (RTE_RWLOCK_WAIT is set), it also enters a wait loop until the
write lock is acquired and then released.

Both processes are now blocked waiting on each other and neither can
proceed, which is a deadlock.

After that change the RW lock no longer supports recursion: a thread
must not take a read lock if it already holds one. The problem shows up
during initialization: rte_eal_init() acquires memory_hotplug_lock, and
later the call sequence
rte_eal_memory_init() -> eal_memalloc_init() -> rte_memseg_list_walk()
acquires it again without releasing it. This can deadlock if a write
lock is requested concurrently on the same memory_hotplug_lock.

Fix this by replacing rte_memseg_list_walk() with
rte_memseg_list_walk_thread_unsafe() in the initialization path.

Also add a lock annotation to rte_memseg_list_walk() so that similar
bugs are detected at compile time.

Bugzilla ID: 1277
Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
Cc: stable@dpdk.org
Signed-off-by: Artemy Kovalyov
---
v2: changed walk to thread-unsafe version in eal_dynmem_hugepage_init() 32-bit flow
v3: added lock annotation for the flow
---
 lib/eal/common/eal_common_dynmem.c     | 5 ++++-
 lib/eal/common/eal_memalloc.h          | 3 ++-
 lib/eal/common/eal_private.h           | 3 ++-
 lib/eal/include/generic/rte_rwlock.h   | 4 ++++
 lib/eal/include/rte_lock_annotations.h | 5 +++++
 lib/eal/include/rte_memory.h           | 4 +++-
 lib/eal/linux/eal_memalloc.c           | 7 +++++--
 7 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
index bdbbe233a0..95da55d9b0 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -251,7 +251,10 @@ eal_dynmem_hugepage_init(void)
 		 */
 		memset(&dummy, 0, sizeof(dummy));
 		dummy.hugepage_sz = hpi->hugepage_sz;
-		if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
+		/* memory_hotplug_lock is held during initialization, so it's
+		 * safe to call thread-unsafe version.
+		 */
+		if (rte_memseg_list_walk_thread_unsafe(hugepage_count_walk, &dummy) < 0)
 			return -1;
 
 		for (i = 0; i < RTE_DIM(dummy.num_pages); i++) {
diff --git a/lib/eal/common/eal_memalloc.h b/lib/eal/common/eal_memalloc.h
index ebc3a6f6c1..286ffb7633 100644
--- a/lib/eal/common/eal_memalloc.h
+++ b/lib/eal/common/eal_memalloc.h
@@ -91,7 +91,8 @@ int
 eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
 
 int
-eal_memalloc_init(void);
+eal_memalloc_init(void)
+	__rte_shared_locks_required(rte_mcfg_mem_get_lock());
 
 int
 eal_memalloc_cleanup(void);
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 5eadba4902..ebd496b537 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -115,7 +115,8 @@ int rte_eal_memseg_init(void);
  * @return
  *   0 on success, negative on error
  */
-int rte_eal_memory_init(void);
+int rte_eal_memory_init(void)
+	__rte_shared_locks_required(rte_mcfg_mem_get_lock());
 
 /**
  * Configure timers
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 9e083bbc61..c98fc7d083 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -80,6 +80,10 @@ rte_rwlock_init(rte_rwlock_t *rwl)
 /**
  * Take a read lock. Loop until the lock is held.
  *
+ * @note The RW lock isn't recursive, so calling this function on the same
+ * lock twice without releasing it could potentially result in a deadlock
+ * scenario when a write lock is involved.
+ *
  * @param rwl
  *   A pointer to a rwlock structure.
  */
diff --git a/lib/eal/include/rte_lock_annotations.h b/lib/eal/include/rte_lock_annotations.h
index 9fc50082d6..2456a69352 100644
--- a/lib/eal/include/rte_lock_annotations.h
+++ b/lib/eal/include/rte_lock_annotations.h
@@ -40,6 +40,9 @@ extern "C" {
 #define __rte_unlock_function(...) \
 	__attribute__((unlock_function(__VA_ARGS__)))
 
+#define __rte_locks_excluded(...) \
+	__attribute__((locks_excluded(__VA_ARGS__)))
+
 #define __rte_no_thread_safety_analysis \
 	__attribute__((no_thread_safety_analysis))
 
@@ -62,6 +65,8 @@ extern "C" {
 
 #define __rte_unlock_function(...)
 
+#define __rte_locks_excluded(...)
+
 #define __rte_no_thread_safety_analysis
 
 #endif /* RTE_ANNOTATE_LOCKS */
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 3a1c607228..842362d527 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -22,6 +22,7 @@ extern "C" {
 #include
 #include
 #include
+#include
 #include
 
 #define RTE_PGSIZE_4K (1ULL << 12)
@@ -250,7 +251,8 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  *   -1 if user function reported error
  */
 int
-rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg);
+rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
+	__rte_locks_excluded(rte_mcfg_mem_get_lock());
 
 /**
  * Walk list of all memsegs without performing any locking.
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588cae..9853ec78a2 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -1740,7 +1740,10 @@ eal_memalloc_init(void)
 		eal_get_internal_configuration();
 
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
-		if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
+		/* memory_hotplug_lock is held during initialization, so it's
+		 * safe to call thread-unsafe version.
+		 */
+		if (rte_memseg_list_walk_thread_unsafe(secondary_msl_create_walk, NULL) < 0)
 			return -1;
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
 			internal_conf->in_memory) {
@@ -1778,7 +1781,7 @@ eal_memalloc_init(void)
 	}
 
 	/* initialize all of the fd lists */
-	if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+	if (rte_memseg_list_walk_thread_unsafe(fd_list_create_walk, NULL))
 		return -1;
 	return 0;
 }
-- 
2.25.1
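
Illustration only, not part of the patch: the three-step sequence from
the commit message can be replayed with a toy, single-threaded model of
a writer-preferring read-write lock. This is a sketch under stated
assumptions. The toy_* names, the bit values and the "try once instead
of spinning" behaviour are all hypothetical and are not the DPDK
rte_rwlock definitions. Each helper makes one acquisition attempt and
reports whether it would have succeeded, so "blocked" stands in for
"would spin in the wait loop".

```c
#include <stdbool.h>

/* Toy model only: names and bit values are illustrative assumptions,
 * not the DPDK rte_rwlock implementation.
 */
#define TOY_WAIT  0x1	/* a writer is waiting for readers to drain */
#define TOY_WRITE 0x2	/* a writer holds the lock */
#define TOY_READ  0x4	/* added once per reader holding the lock */

struct toy_rwlock {
	int cnt;
};

struct toy_outcome {
	bool first_read;
	bool write;
	bool second_read;
};

/* One read-lock attempt: refused while a writer holds or waits. */
static bool
toy_read_trylock(struct toy_rwlock *l)
{
	if (l->cnt & (TOY_WAIT | TOY_WRITE))
		return false;
	l->cnt += TOY_READ;
	return true;
}

/* One write-lock attempt: succeeds only with no readers and no active
 * writer; otherwise it records that a writer is now waiting.
 */
static bool
toy_write_trylock(struct toy_rwlock *l)
{
	if (l->cnt < TOY_WRITE) {
		l->cnt = TOY_WRITE;
		return true;
	}
	l->cnt |= TOY_WAIT;
	return false;
}

/* Replay the sequence described in the commit message. */
static struct toy_outcome
toy_replay(void)
{
	struct toy_rwlock lock = { 0 };
	struct toy_outcome out;

	out.first_read = toy_read_trylock(&lock);	/* process 1 */
	out.write = toy_write_trylock(&lock);		/* process 2 */
	out.second_read = toy_read_trylock(&lock);	/* process 1 again */
	return out;
}
```

In this model toy_replay() reports the first read lock as acquired and
both the write lock and the second read lock as blocked. With real
spinning locks neither blocked party could make progress: the writer
waits for a reader that is itself waiting for the writer.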