From: Maayan Kashani
CC: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad
Subject: [PATCH 24.11 v2] net/mlx5: fix device start error handling
Date: Tue, 25 Nov 2025 13:06:05 +0200
Message-ID: <20251125110607.178051-1-mkashani@nvidia.com>

[ upstream commit 860f6c63dbc1 ]

When mlx5_dev_start() fails partway through initialization, the error
cleanup code unconditionally calls the cleanup functions for all steps,
including those that were never successfully initialized. This corrupts
the device state and leads to incorrect behavior on subsequent start
attempts.

The issue manifests as follows:
1. The first start attempt fails with -ENOMEM (expected).
2. The second start attempt returns -EINVAL instead of -ENOMEM.
3. In flow isolated mode, the second attempt incorrectly succeeds,
   leading to a segfault in rte_eth_rx_burst().

Root cause: the single error label cleanup path calls functions such as
mlx5_traffic_disable() and mlx5_flow_stop_default() even when their
corresponding initialization functions (mlx5_traffic_enable() and
mlx5_flow_start_default()) were never called because of an earlier
failure.

For example, when mlx5_rxq_start() fails:
- mlx5_traffic_enable() at line 1403 never executes.
- mlx5_flow_start_default() at line 1420 never executes.
- But the cleanup path unconditionally calls:
  * mlx5_traffic_disable(), which destroys the control flows list;
  * mlx5_flow_stop_default(), which corrupts the flow metadata state.

The resulting state corruption causes subsequent start attempts to fail
with different errors or, in isolated mode, to incorrectly succeed with
an improperly initialized device.

Fix by replacing the single error label with cascading error labels
(Linux kernel style). Each label cleans up only its corresponding step,
then falls through to clean up the earlier steps.
This ensures that only successfully initialized steps are cleaned up,
keeping the device state consistent across failed start attempts.

Bugzilla ID: 1419
Fixes: 8db7e3b69822 ("net/mlx5: change operations for non-cached flows")
Cc: stable@dpdk.org

Signed-off-by: Maayan Kashani
Acked-by: Dariusz Sosnowski
---
 drivers/net/mlx5/mlx5_trigger.c | 60 +++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 485984f9b06..138563c3f66 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1135,6 +1135,11 @@ mlx5_hw_representor_port_allowed_start(struct rte_eth_dev *dev)
 
 #endif
 
+#define SAVE_RTE_ERRNO_AND_STOP(ret, dev) do { \
+	ret = rte_errno; \
+	(dev)->data->dev_started = 0; \
+} while (0)
+
 /**
  * DPDK callback to start the device.
  *
@@ -1217,19 +1222,23 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx packet pacing init failed: %s",
 			dev->data->port_id, strerror(rte_errno));
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
 		goto error;
 	}
 	if (mlx5_devx_obj_ops_en(priv->sh) &&
 	    priv->obj_ops.lb_dummy_queue_create) {
 		ret = priv->obj_ops.lb_dummy_queue_create(dev);
-		if (ret)
-			goto error;
+		if (ret) {
+			SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+			goto txpp_stop;
+		}
 	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		goto error;
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+		goto lb_dummy_queue_release;
 	}
 	if (priv->config.std_delay_drop || priv->config.hp_delay_drop) {
 		if (!priv->sh->dev_cap.vf && !priv->sh->dev_cap.sf &&
@@ -1253,7 +1262,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	if (ret) {
 		DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		goto error;
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+		goto txq_stop;
 	}
 	/*
 	 * Such step will be skipped if there is no hairpin TX queue configured
@@ -1263,7 +1273,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	if (ret) {
 		DRV_LOG(ERR, "port %u hairpin auto binding failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		goto error;
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+		goto rxq_stop;
 	}
 	/* Set started flag here for the following steps like control flow. */
 	dev->data->dev_started = 1;
@@ -1271,7 +1282,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	if (ret) {
 		DRV_LOG(ERR, "port %u Rx interrupt vector creation failed",
 			dev->data->port_id);
-		goto error;
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+		goto rxq_stop;
 	}
 	mlx5_os_stats_init(dev);
 	/*
@@ -1283,7 +1295,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 		DRV_LOG(ERR,
 			"port %u failed to attach indirect actions: %s",
 			dev->data->port_id, rte_strerror(rte_errno));
-		goto error;
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+		goto rx_intr_vec_disable;
 	}
 #ifdef HAVE_MLX5_HWS_SUPPORT
 	if (priv->sh->config.dv_flow_en == 2) {
@@ -1291,7 +1304,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 		if (ret) {
 			DRV_LOG(ERR, "port %u failed to update HWS tables",
 				dev->data->port_id);
-			goto error;
+			SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+			goto action_handle_detach;
 		}
 	}
 #endif
@@ -1299,7 +1313,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	if (ret) {
 		DRV_LOG(ERR, "port %u failed to set defaults flows",
 			dev->data->port_id);
-		goto error;
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+		goto action_handle_detach;
 	}
 	/* Set dynamic fields and flags into Rx queues. */
 	mlx5_flow_rxq_dynf_set(dev);
@@ -1316,12 +1331,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	if (ret) {
 		DRV_LOG(DEBUG, "port %u failed to start default actions: %s",
 			dev->data->port_id, strerror(rte_errno));
-		goto error;
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+		goto traffic_disable;
 	}
 	if (mlx5_dev_ctx_shared_mempool_subscribe(dev) != 0) {
 		DRV_LOG(ERR, "port %u failed to subscribe for mempool life cycle: %s",
 			dev->data->port_id, rte_strerror(rte_errno));
-		goto error;
+		SAVE_RTE_ERRNO_AND_STOP(ret, dev);
+		goto stop_default;
 	}
 	rte_wmb();
 	dev->tx_pkt_burst = mlx5_select_tx_function(dev);
@@ -1348,18 +1365,25 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 		priv->sh->port[priv->dev_port - 1].devx_ih_port_id =
 					(uint32_t)dev->data->port_id;
 	return 0;
-error:
-	ret = rte_errno; /* Save rte_errno before cleanup. */
-	/* Rollback. */
-	dev->data->dev_started = 0;
+stop_default:
 	mlx5_flow_stop_default(dev);
+traffic_disable:
 	mlx5_traffic_disable(dev);
-	mlx5_txq_stop(dev);
+action_handle_detach:
+	mlx5_action_handle_detach(dev);
+rx_intr_vec_disable:
+	mlx5_rx_intr_vec_disable(dev);
+rxq_stop:
 	mlx5_rxq_stop(dev);
+txq_stop:
+	mlx5_txq_stop(dev);
+lb_dummy_queue_release:
 	if (priv->obj_ops.lb_dummy_queue_release)
 		priv->obj_ops.lb_dummy_queue_release(dev);
-	mlx5_txpp_stop(dev); /* Stop last. */
-	rte_errno = ret; /* Restore rte_errno. */
+txpp_stop:
+	mlx5_txpp_stop(dev);
+error:
+	rte_errno = ret;
 	return -rte_errno;
 }
 
-- 
2.43.0
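The cascading-label pattern adopted by this patch can be exercised outside the driver. The sketch below is a hypothetical three-step model: `step_start()`, `step_stop()`, the `STEP_*` values, `reset_model()` and the counters are invented for illustration and are not mlx5 or DPDK APIs. `SAVE_ERRNO_AND_STOP()` is loosely modeled on the patch's `SAVE_RTE_ERRNO_AND_STOP()`, with a plain global standing in for `rte_errno`. It contrasts the pre-patch single `error:` label with the post-patch labels, assuming (as the patch's jump targets suggest) that a failing step cleans up after itself.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical start sequence; names are illustrative, not driver APIs. */
enum step { STEP_TXQ, STEP_RXQ, STEP_TRAFFIC, STEP_COUNT };

static int fail_at = -1;        /* step index to fail with ENOMEM, -1 = none */
static int started[STEP_COUNT]; /* 1 while a step is initialized */
static int bad_stops;           /* teardowns of steps that never started */
static int dev_started;         /* models dev->data->dev_started */
static int last_errno;          /* models rte_errno */

/* Loosely modeled on SAVE_RTE_ERRNO_AND_STOP() from the patch. */
#define SAVE_ERRNO_AND_STOP(ret) do { \
	(ret) = last_errno; \
	dev_started = 0; \
} while (0)

void reset_model(int fail)
{
	memset(started, 0, sizeof(started));
	bad_stops = 0;
	dev_started = 0;
	last_errno = 0;
	fail_at = fail;
}

/* A failing step records an error and leaves no state behind. */
static int step_start(enum step s)
{
	assert(s < STEP_COUNT);
	if ((int)s == fail_at) {
		last_errno = ENOMEM;
		return -1;
	}
	started[s] = 1;
	return 0;
}

static void step_stop(enum step s)
{
	if (!started[s])
		bad_stops++; /* tearing down state that was never set up */
	started[s] = 0;
}

/* Pre-patch shape: one label undoes every step unconditionally. */
int start_single_label(void)
{
	int ret = 0;
	if (step_start(STEP_TXQ))
		goto error;
	if (step_start(STEP_RXQ))
		goto error;
	dev_started = 1;
	if (step_start(STEP_TRAFFIC))
		goto error;
	return 0;
error:
	ret = last_errno;
	dev_started = 0;
	step_stop(STEP_TRAFFIC);
	step_stop(STEP_RXQ);
	step_stop(STEP_TXQ);
	last_errno = ret;
	return -ret;
}

/* Post-patch shape: each failure jumps to the label of the last good step. */
int start_cascading(void)
{
	int ret = 0;
	if (step_start(STEP_TXQ)) {
		SAVE_ERRNO_AND_STOP(ret);
		goto error;
	}
	if (step_start(STEP_RXQ)) {
		SAVE_ERRNO_AND_STOP(ret);
		goto txq_stop;
	}
	dev_started = 1;
	if (step_start(STEP_TRAFFIC)) {
		SAVE_ERRNO_AND_STOP(ret);
		goto rxq_stop;
	}
	return 0;
rxq_stop:
	step_stop(STEP_RXQ);
txq_stop:
	step_stop(STEP_TXQ);
error:
	last_errno = ret;
	return -ret;
}
```

With `reset_model(STEP_RXQ)`, both functions return `-ENOMEM`, but `start_single_label()` counts two teardowns of steps that never started (traffic and Rx), while `start_cascading()` counts none and stops only the Tx step. Placing the labels in reverse order of setup is what lets each label fall through into the cleanup of the earlier steps.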