From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xueming Li
To: Viacheslav Ovsiienko
CC: dpdk stable
Subject: patch 'net/mlx5: fix device removal event handling' has been queued to stable release 22.11.3
Date: Thu, 10 Aug 2023 07:48:08 +0800
Message-ID: <20230809234930.32424-47-xuemingl@nvidia.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230809234930.32424-1-xuemingl@nvidia.com>
References: <20230625063544.11183-1-xuemingl@nvidia.com> <20230809234930.32424-1-xuemingl@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: patches for DPDK stable branches

Hi,

FYI, your patch has been queued to stable release 22.11.3.

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 08/11/23. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch.
If there were code changes for rebasing (i.e., not only metadata diffs),
please double-check that the rebase was done correctly.

Queued patches are on a temporary branch at:
https://git.dpdk.org/dpdk-stable/log/?h=22.11-staging

This queued commit can be viewed at:
https://git.dpdk.org/dpdk-stable/commit/?h=22.11-staging&id=79310b1b61afd7556000b23303a28a4c49ed8fa2

Thanks.

Xueming Li

---
>From 79310b1b61afd7556000b23303a28a4c49ed8fa2 Mon Sep 17 00:00:00 2001
From: Viacheslav Ovsiienko
Date: Tue, 30 May 2023 18:13:28 +0300
Subject: [PATCH] net/mlx5: fix device removal event handling
Cc: Xueming Li

[ upstream commit 22dc56cfbd39692eb74fad93ff5ecc3df5fd0633 ]

On device removal, the kernel notifies the user-space application by
queueing the IBV_EVENT_DEVICE_FATAL event and signaling the corresponding
file descriptor. The Mellanox kernel driver stack emits this event twice,
from different layers (mlx5 and uverbs). The IB port index is not
applicable in the event structure and should be ignored for
IBV_EVENT_DEVICE_FATAL events.

Also, on older kernels (at least from OFED 4.9), a race condition may
cause the event queue to be closed before the application fetches the
IBV_EVENT_DEVICE_FATAL event with the ibv_get_async_event() API.

To provide reliable device removal event detection, the patch:
- ignores the IB port index for IBV_EVENT_DEVICE_FATAL events
- introduces a flag so the PMD is notified about removal only once
- acknowledges the event with ibv_ack_async_event() after actual handling
- checks for the EIO error to detect that the event queue is already closed

Fixes: 40d9f906f4e2 ("net/mlx5: fix device removal handler for multiport")

Signed-off-by: Viacheslav Ovsiienko
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 34 +++++++++++++++++--------
 drivers/net/mlx5/mlx5.h                 |  1 +
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 55801534d1..639e629fe4 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -746,6 +746,7 @@ mlx5_dev_interrupt_device_fatal(struct mlx5_dev_ctx_shared *sh)
 
 	for (i = 0; i < sh->max_port; ++i) {
 		struct rte_eth_dev *dev;
+		struct mlx5_priv *priv;
 
 		if (sh->port[i].ih_port_id >= RTE_MAX_ETHPORTS) {
 			/*
@@ -756,9 +757,14 @@ mlx5_dev_interrupt_device_fatal(struct mlx5_dev_ctx_shared *sh)
 		}
 		dev = &rte_eth_devices[sh->port[i].ih_port_id];
 		MLX5_ASSERT(dev);
-		if (dev->data->dev_conf.intr_conf.rmv)
+		priv = dev->data->dev_private;
+		MLX5_ASSERT(priv);
+		if (!priv->rmv_notified && dev->data->dev_conf.intr_conf.rmv) {
+			/* Notify driver about removal only once. */
+			priv->rmv_notified = 1;
 			rte_eth_dev_callback_process
 				(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
+		}
 	}
 }
 
@@ -830,21 +836,29 @@ mlx5_dev_interrupt_handler(void *cb_arg)
 		struct rte_eth_dev *dev;
 		uint32_t tmp;
 
-		if (mlx5_glue->get_async_event(sh->cdev->ctx, &event))
+		if (mlx5_glue->get_async_event(sh->cdev->ctx, &event)) {
+			if (errno == EIO) {
+				DRV_LOG(DEBUG,
+					"IBV async event queue closed on: %s",
+					sh->ibdev_name);
+				mlx5_dev_interrupt_device_fatal(sh);
+			}
 			break;
-		/* Retrieve and check IB port index. */
-		tmp = (uint32_t)event.element.port_num;
-		if (!tmp && event.event_type == IBV_EVENT_DEVICE_FATAL) {
+		}
+		if (event.event_type == IBV_EVENT_DEVICE_FATAL) {
 			/*
-			 * The DEVICE_FATAL event is called once for
-			 * entire device without port specifying.
-			 * We should notify all existing ports.
+			 * The DEVICE_FATAL event can be called by kernel
+			 * twice - from mlx5 and uverbs layers, and port
+			 * index is not applicable. We should notify all
+			 * existing ports.
 			 */
-			mlx5_glue->ack_async_event(&event);
 			mlx5_dev_interrupt_device_fatal(sh);
+			mlx5_glue->ack_async_event(&event);
 			continue;
 		}
-		MLX5_ASSERT(tmp && (tmp <= sh->max_port));
+		/* Retrieve and check IB port index. */
+		tmp = (uint32_t)event.element.port_num;
+		MLX5_ASSERT(tmp <= sh->max_port);
 		if (!tmp) {
 			/* Unsupported device level event. */
 			mlx5_glue->ack_async_event(&event);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9f27e1dba4..5f8361c52b 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1665,6 +1665,7 @@ struct mlx5_priv {
 	unsigned int mtr_en:1; /* Whether support meter. */
 	unsigned int mtr_reg_share:1; /* Whether support meter REG_C share. */
 	unsigned int lb_used:1; /* Loopback queue is referred to. */
+	unsigned int rmv_notified:1; /* Notified about removal event */
 	uint32_t mark_enabled:1; /* If mark action is enabled on rxqs. */
 	uint16_t domain_id; /* Switch domain identifier. */
 	uint16_t vport_id; /* Associated VF vport index (if any). */
-- 
2.25.1

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty):
---
--- -	2023-08-09 21:51:19.414550200 +0800
+++ 0046-net-mlx5-fix-device-removal-event-handling.patch	2023-08-09 21:51:18.194352000 +0800
@@ -1 +1 @@
-From 22dc56cfbd39692eb74fad93ff5ecc3df5fd0633 Mon Sep 17 00:00:00 2001
+From 79310b1b61afd7556000b23303a28a4c49ed8fa2 Mon Sep 17 00:00:00 2001
@@ -4,0 +5,3 @@
+Cc: Xueming Li
+
+[ upstream commit 22dc56cfbd39692eb74fad93ff5ecc3df5fd0633 ]
@@ -26 +28,0 @@
-Cc: stable@dpdk.org
@@ -102 +104 @@
-index fffd3c79f1..10a2f33ea0 100644
+index 9f27e1dba4..5f8361c52b 100644
@@ -105 +107 @@
-@@ -1744,6 +1744,7 @@ struct mlx5_priv {
+@@ -1665,6 +1665,7 @@ struct mlx5_priv {
@@ -111,2 +113,2 @@
- 	uint32_t num_lag_ports:4; /* Number of ports can be bonded. */
- 	uint32_t tunnel_enabled:1; /* If tunnel offloading is enabled on rxqs. */
+ 	uint16_t domain_id; /* Switch domain identifier. */
+ 	uint16_t vport_id; /* Associated VF vport index (if any). */
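
The removal-event handling that the commit message describes can be illustrated with a small standalone C simulation. This is only a sketch under stated assumptions: it does not use libibverbs or the mlx5 PMD, and every sim_* type and function is hypothetical. sim_get_event() stands in for ibv_get_async_event(), the acked counter for ibv_ack_async_event(), and rmv_callbacks for the RTE_ETH_EVENT_INTR_RMV callback. It shows the three ideas of the fix: the fatal event is handled without looking at the port index, a per-port flag makes the removal notification one-shot even when the event arrives twice, and an EIO from the event read is treated as device removal.

```c
/*
 * Standalone simulation of the removal-event handling pattern.
 * All sim_* names are hypothetical; libibverbs is not used.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types standing in for the verbs async event types. */
enum sim_event_type { SIM_EVENT_PORT_ACTIVE, SIM_EVENT_DEVICE_FATAL };

struct sim_event {
	enum sim_event_type type;
	unsigned int port_num; /* 1-based; not meaningful for DEVICE_FATAL. */
};

/* Hypothetical event queue backed by an array. */
struct sim_queue {
	const struct sim_event *events;
	size_t count;
	size_t next;
	bool closed;        /* Once drained, reads fail with EIO, not EAGAIN. */
	unsigned int acked; /* Number of events acknowledged. */
};

/* Per-port state; rmv_notified plays the role of priv->rmv_notified. */
struct sim_port {
	bool rmv_notified;
	unsigned int rmv_callbacks; /* Times the removal callback would run. */
};

/* Returns 0 and fills *ev, or -1 with errno set to EAGAIN or EIO. */
static int sim_get_event(struct sim_queue *q, struct sim_event *ev)
{
	if (q->next == q->count) {
		errno = q->closed ? EIO : EAGAIN;
		return -1;
	}
	*ev = q->events[q->next++];
	return 0;
}

/* Notify every port about removal, at most once per port. */
static void sim_device_fatal(struct sim_port *ports, size_t nports)
{
	for (size_t i = 0; i < nports; i++) {
		if (!ports[i].rmv_notified) {
			ports[i].rmv_notified = true;
			ports[i].rmv_callbacks++;
		}
	}
}

/* Drain the queue, following the order of operations in the patch. */
static void sim_interrupt_handler(struct sim_queue *q,
				  struct sim_port *ports, size_t nports)
{
	struct sim_event ev;

	for (;;) {
		if (sim_get_event(q, &ev) != 0) {
			/* EIO: queue closed, maybe before the fatal event was read. */
			if (errno == EIO)
				sim_device_fatal(ports, nports);
			break;
		}
		if (ev.type == SIM_EVENT_DEVICE_FATAL) {
			/* Ignore port_num, notify all ports, then acknowledge. */
			sim_device_fatal(ports, nports);
			q->acked++;
			continue;
		}
		/* Port-level event: 0 means device-level, else 1..nports. */
		assert(ev.port_num <= nports);
		q->acked++;
	}
}
```

Feeding this handler two DEVICE_FATAL events, as the mlx5 and uverbs layers may both emit, leaves each port with a single removal callback and two acknowledged events. A closed, empty queue still produces one notification through the EIO path, and a fatal event followed by a closed queue does not notify twice.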