From: Artemy Kovalyov
Cc: Thomas Monjalon, Ophir Munk
Subject: [PATCH v4 0/2] fix memory initialization deadlock
Date: Fri, 8 Sep 2023 16:17:34 +0300
Message-ID: <20230908131737.1714750-1-artemyko@nvidia.com>
In-Reply-To: <20230830103303.2428995-1-artemyko@nvidia.com>
This issue was caused by a change in the DPDK read-write lock implementation. That change added a new flag, RTE_RWLOCK_WAIT, which prevents new read locks from being taken while a writer is waiting. As a side effect, a recursive read lock (the same execution thread acquiring a read lock twice) can now trigger the following sequence, which ends in a deadlock:

1. Process 1 takes the first read lock.
2. Process 2 attempts to take a write lock. Because a read lock is held, it sets RTE_RWLOCK_WAIT and enters a wait loop until the read lock is released.
3. Process 1 tries to take a second read lock. Because a writer is waiting (RTE_RWLOCK_WAIT is set), it also enters a wait loop until the write lock has been acquired and then released.
4. Each process is waiting for the other, so neither can proceed.

After this change, the RW-lock no longer supports recursion: a thread must not take a read lock that it already holds.

The problem appears during initialization. rte_eal_init() takes a read lock on memory_hotplug_lock, and later call sequences reach rte_memseg_list_walk(), which acquires the same lock again while it is still held. If another process concurrently requests a write lock on memory_hotplug_lock, this can deadlock as described above.

To fix this, rte_memseg_list_walk() is replaced on these paths with rte_memseg_list_walk_thread_unsafe(), which does not take the lock.

In addition, a lock annotation is added to rte_memseg_list_walk() so that similar bugs can be detected at compile time.
Artemy Kovalyov (2):
  eal: fix memory initialization deadlock
  eal: annotate rte_memseg_list_walk()

 lib/eal/common/eal_common_dynmem.c     | 5 ++++-
 lib/eal/common/eal_memalloc.h          | 3 ++-
 lib/eal/common/eal_private.h           | 3 ++-
 lib/eal/include/generic/rte_rwlock.h   | 4 ++++
 lib/eal/include/rte_lock_annotations.h | 5 +++++
 lib/eal/include/rte_memory.h           | 4 +++-
 lib/eal/linux/eal_memalloc.c           | 7 +++++--
 7 files changed, 25 insertions(+), 6 deletions(-)

--
1.8.3.1