From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Minggang Li(Gavin)"
To: , , , , Dariusz Sosnowski , Bing Zhao , Suanming Mou
CC: , , Rongwei Liu
Subject: [PATCH 2/7] net/mlx5: optimize device probing
Date: Mon, 23 Dec 2024 12:10:56 +0200
Message-ID: <20241223101101.677449-3-gavinl@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20241223101101.677449-1-gavinl@nvidia.com>
References: <20241223101101.677449-1-gavinl@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-BeenThere: dev@dpdk.org
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org

From: Rongwei Liu

The current DPDK probing logic is:
1. Query the IB device index and the total number of ports.
2. Query the information of each port by traversing the port indexes,
   getting the port's ifindex, name, state, etc.
3. Compare the information with devargs until a match is found.
4. For each probed device, repeat steps 2 and 3.

Step 2 communicates with the kernel via netlink, which is time-consuming.
There is no need to repeat the netlink communication for each probed
device: the PMD can traverse all ports once and save the information in
a caching structure.

Introduce device information caching in the mlx5 common device handle
and cache the port number, ibindex, and port ifindex.

Dynamic interface changes are handled as follows:
1. Creating new VFs by toggling switchdev mode requires restarting DPDK,
   because the SR-IOV configuration has changed.
2. Changing the number of VFs without toggling switchdev mode triggers
   RTM_DELLINK and RTM_NEWLINK events. All the cached information is
   cleared.
3. Creating a new SF triggers an RTM_NEWLINK event, and the message
   carries no port index information. All free entries (ifindex = 0)
   in the cache are invalidated.
4. Deleting an SF triggers an RTM_DELLINK event. The cache entries are
   traversed and the one with the same ifindex is invalidated.

Race conditions between the probing thread and the interrupt thread are
not considered.

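The caching and invalidation rules described above can be illustrated with a small standalone sketch. This is not the code from this patch: all names here (dev_cache, port_entry, cache_ifindex, cache_on_link_event, demo_query, EV_NEWLINK, EV_DELLINK) are hypothetical, the slow netlink query is replaced by a plain callback, and the port-type check that the driver performs on RTM_NEWLINK is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event codes standing in for RTM_NEWLINK / RTM_DELLINK. */
enum { EV_NEWLINK = 1, EV_DELLINK = 2 };

struct port_entry {
	uint32_t ifindex; /* 0 means no netdev is attached to this port. */
	uint8_t valid;    /* 1 once the port has been queried. */
};

struct dev_cache {
	uint32_t port_num;         /* Ports are indexed 1..port_num. */
	struct port_entry *ports;  /* port_num + 1 entries, slot 0 unused. */
	unsigned int slow_queries; /* Number of slow-path queries issued. */
};

/* Hypothetical slow query standing in for the netlink round trip. */
typedef uint32_t (*query_fn)(uint32_t pindex);

/* Demo backend: port 3 has no netdev, the others map to 100 + pindex. */
static uint32_t demo_query(uint32_t pindex)
{
	return pindex == 3 ? 0 : 100 + pindex;
}

static int cache_init(struct dev_cache *c, uint32_t port_num)
{
	memset(c, 0, sizeof(*c));
	c->ports = calloc((size_t)port_num + 1, sizeof(*c->ports));
	if (c->ports == NULL)
		return -1;
	c->port_num = port_num;
	return 0;
}

static void cache_free(struct dev_cache *c)
{
	free(c->ports);
	memset(c, 0, sizeof(*c));
}

/* Return the cached ifindex, querying the slow path only on a miss. */
static uint32_t cache_ifindex(struct dev_cache *c, uint32_t pindex, query_fn query)
{
	if (pindex == 0 || pindex > c->port_num)
		return 0;
	if (c->ports[pindex].valid)
		return c->ports[pindex].ifindex;
	c->slow_queries++;
	c->ports[pindex].ifindex = query(pindex);
	/* An unused port (ifindex 0) is still a valid cached answer. */
	c->ports[pindex].valid = 1;
	return c->ports[pindex].ifindex;
}

/* Apply a link event to the cache. */
static void cache_on_link_event(struct dev_cache *c, int event, uint32_t ifindex)
{
	uint32_t i, hit = 0;

	for (i = 1; i <= c->port_num; i++) {
		if (c->ports[i].valid && c->ports[i].ifindex == ifindex) {
			hit = i;
			break;
		}
	}
	if (event == EV_DELLINK && hit != 0) {
		/* The interface is gone: drop exactly that entry. */
		memset(&c->ports[hit], 0, sizeof(c->ports[hit]));
	} else if (event == EV_NEWLINK && hit == 0) {
		/* Unknown new interface: any free port may now be in use. */
		for (i = 1; i <= c->port_num; i++)
			if (c->ports[i].ifindex == 0)
				c->ports[i].valid = 0;
	}
}
```

In this sketch a port that was queried and found empty is remembered as valid with ifindex 0, so repeated probing does not query it again until a new-link event for an unknown interface forces a refresh of the free entries.
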
Signed-off-by: Rongwei Liu
Acked-by: Viacheslav Ovsiienko
---
 drivers/common/mlx5/linux/mlx5_common_os.h   |   6 ++
 drivers/common/mlx5/linux/mlx5_nl.c          |  94 +++++++++++++----
 drivers/common/mlx5/linux/mlx5_nl.h          |   8 +-
 drivers/common/mlx5/mlx5_common.c            |   5 +
 drivers/common/mlx5/mlx5_common.h            |  13 +++
 drivers/common/mlx5/windows/mlx5_common_os.h |   5 +
 drivers/net/mlx5/linux/mlx5_ethdev_os.c      |  54 ++++++++++
 drivers/net/mlx5/linux/mlx5_os.c             | 104 ++++++++++++++-----
 drivers/net/mlx5/linux/mlx5_os.h             |   6 --
 drivers/net/mlx5/windows/mlx5_os.h           |   5 -
 10 files changed, 242 insertions(+), 58 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.h b/drivers/common/mlx5/linux/mlx5_common_os.h
index e8aa1d46ec..2e2c54f1fa 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.h
+++ b/drivers/common/mlx5/linux/mlx5_common_os.h
@@ -22,6 +22,12 @@
 #include "mlx5_glue.h"
 #include "mlx5_malloc.h"
 
+/* verb enumerations translations to local enums. */
+enum {
+	MLX5_FS_NAME_MAX = IBV_SYSFS_NAME_MAX + 1,
+	MLX5_FS_PATH_MAX = IBV_SYSFS_PATH_MAX + 1
+};
+
 /**
  * Get device name. Given an ibv_device pointer - return a
  * pointer to the corresponding device name.
diff --git a/drivers/common/mlx5/linux/mlx5_nl.c b/drivers/common/mlx5/linux/mlx5_nl.c index a5ac4dc543..e98073aafe 100644 --- a/drivers/common/mlx5/linux/mlx5_nl.c +++ b/drivers/common/mlx5/linux/mlx5_nl.c @@ -1073,16 +1073,18 @@ mlx5_nl_port_info(int nl, uint32_t pindex, struct mlx5_nl_port_info *data) uint32_t sn = MLX5_NL_SN_GENERATE; int ret; - ret = mlx5_nl_send(nl, &req.nh, sn); - if (ret < 0) - return ret; - ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, data); - if (ret < 0) - return ret; - if (!(data->flags & MLX5_NL_CMD_GET_IB_NAME) || - !(data->flags & MLX5_NL_CMD_GET_IB_INDEX)) - goto error; - data->flags = 0; + if (data->ibindex == UINT32_MAX) { + ret = mlx5_nl_send(nl, &req.nh, sn); + if (ret < 0) + return ret; + ret = mlx5_nl_recv(nl, sn, mlx5_nl_cmdget_cb, data); + if (ret < 0) + return ret; + if (!(data->flags & MLX5_NL_CMD_GET_IB_NAME) || + !(data->flags & MLX5_NL_CMD_GET_IB_INDEX)) + goto error; + data->flags = 0; + } sn = MLX5_NL_SN_GENERATE; req.nh.nlmsg_type = RDMA_NL_GET_TYPE(RDMA_NL_NLDEV, RDMA_NLDEV_CMD_PORT_GET); @@ -1109,7 +1111,7 @@ mlx5_nl_port_info(int nl, uint32_t pindex, struct mlx5_nl_port_info *data) !(data->flags & MLX5_NL_CMD_GET_NET_INDEX) || !data->ifindex) goto error; - return 1; + return 0; error: rte_errno = ENODEV; return -rte_errno; @@ -1128,21 +1130,48 @@ mlx5_nl_port_info(int nl, uint32_t pindex, struct mlx5_nl_port_info *data) * IB device name. * @param[in] pindex * IB device port index, starting from 1 + * @param[in] dev_info + * Cached mlx5 device information. * @return * A valid (nonzero) interface index on success, 0 otherwise and rte_errno * is set. 
*/ unsigned int -mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex) +mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex, struct mlx5_dev_info *dev_info) { + int ret; + struct mlx5_nl_port_info data = { .ifindex = 0, .name = name, + .ibindex = UINT32_MAX, + .flags = 0, }; - if (mlx5_nl_port_info(nl, pindex, &data) < 0) - return 0; - return data.ifindex; + if (!strcmp(name, dev_info->ibname)) { + if (dev_info->port_info && pindex <= dev_info->port_num && + dev_info->port_info[pindex].valid) { + if (!dev_info->port_info[pindex].ifindex) + rte_errno = ENODEV; + return dev_info->port_info[pindex].ifindex; + } + if (dev_info->port_num) + data.ibindex = dev_info->ibindex; + } + + ret = mlx5_nl_port_info(nl, pindex, &data); + + if (!strcmp(dev_info->ibname, name)) { + if ((!ret || ret == -ENODEV) && dev_info->port_info && + pindex <= dev_info->port_num) { + if (!ret) + dev_info->port_info[pindex].ifindex = data.ifindex; + /* -ENODEV means the pindex is unused but still valid case */ + dev_info->port_info[pindex].valid = 1; + } + } + + return ret ? 0 : data.ifindex; } /** @@ -1157,18 +1186,23 @@ mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex) * IB device name. * @param[in] pindex * IB device port index, starting from 1 + * @param[in] dev_info + * Cached mlx5 device information. * @return * Port state (ibv_port_state) on success, negative on error * and rte_errno is set. 
*/ int -mlx5_nl_port_state(int nl, const char *name, uint32_t pindex) +mlx5_nl_port_state(int nl, const char *name, uint32_t pindex, struct mlx5_dev_info *dev_info) { struct mlx5_nl_port_info data = { .state = 0, .name = name, + .ibindex = UINT32_MAX, }; + if (dev_info && !strcmp(name, dev_info->ibname) && dev_info->port_num) + data.ibindex = dev_info->ibindex; if (mlx5_nl_port_info(nl, pindex, &data) < 0) return -rte_errno; if ((data.flags & MLX5_NL_CMD_GET_PORT_STATE) == 0) { @@ -1185,13 +1219,15 @@ mlx5_nl_port_state(int nl, const char *name, uint32_t pindex) * Netlink socket of the RDMA kind (NETLINK_RDMA). * @param[in] name * IB device name. + * @param[in] dev_info + * Cached mlx5 device info. * * @return * A valid (nonzero) number of ports on success, 0 otherwise * and rte_errno is set. */ unsigned int -mlx5_nl_portnum(int nl, const char *name) +mlx5_nl_portnum(int nl, const char *name, struct mlx5_dev_info *dev_info) { struct mlx5_nl_port_info data = { .flags = 0, @@ -1206,7 +1242,10 @@ mlx5_nl_portnum(int nl, const char *name) .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP, }; uint32_t sn = MLX5_NL_SN_GENERATE; - int ret; + int ret, size; + + if (dev_info->port_num && !strcmp(name, dev_info->ibname)) + return dev_info->port_num; ret = mlx5_nl_send(nl, &req, sn); if (ret < 0) @@ -1220,8 +1259,25 @@ mlx5_nl_portnum(int nl, const char *name) rte_errno = ENODEV; return 0; } - if (!data.portnum) + if (!data.portnum) { rte_errno = EINVAL; + return 0; + } + MLX5_ASSERT(!strlen(dev_info->ibname)); + dev_info->port_num = data.portnum; + dev_info->ibindex = data.ibindex; + snprintf(dev_info->ibname, MLX5_FS_NAME_MAX, "%s", name); + if (data.portnum > 1) { + size = (data.portnum + 1) * sizeof(struct mlx5_port_nl_info); + dev_info->port_info = mlx5_malloc(MLX5_MEM_ZERO | MLX5_MEM_RTE, size, + RTE_CACHE_LINE_SIZE, + SOCKET_ID_ANY); + if (dev_info->port_info == NULL) { + memset(dev_info, 0, sizeof(*dev_info)); + rte_errno = ENOMEM; + return 0; + } + } return 
data.portnum; } diff --git a/drivers/common/mlx5/linux/mlx5_nl.h b/drivers/common/mlx5/linux/mlx5_nl.h index 580de3b769..396ffc98ce 100644 --- a/drivers/common/mlx5/linux/mlx5_nl.h +++ b/drivers/common/mlx5/linux/mlx5_nl.h @@ -11,6 +11,7 @@ #include #include "mlx5_common.h" +#include "mlx5_common_utils.h" typedef void (mlx5_nl_event_cb)(struct nlmsghdr *hdr, void *user_data); @@ -52,11 +53,12 @@ int mlx5_nl_promisc(int nlsk_fd, unsigned int iface_idx, int enable); __rte_internal int mlx5_nl_allmulti(int nlsk_fd, unsigned int iface_idx, int enable); __rte_internal -unsigned int mlx5_nl_portnum(int nl, const char *name); +unsigned int mlx5_nl_portnum(int nl, const char *name, struct mlx5_dev_info *dev_info); __rte_internal -unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex); +unsigned int mlx5_nl_ifindex(int nl, const char *name, uint32_t pindex, + struct mlx5_dev_info *info); __rte_internal -int mlx5_nl_port_state(int nl, const char *name, uint32_t pindex); +int mlx5_nl_port_state(int nl, const char *name, uint32_t pindex, struct mlx5_dev_info *dev_info); __rte_internal int mlx5_nl_vf_mac_addr_modify(int nlsk_fd, unsigned int iface_idx, struct rte_ether_addr *mac, int vf_index); diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c index ca8543e36e..0aaae91c31 100644 --- a/drivers/common/mlx5/mlx5_common.c +++ b/drivers/common/mlx5/mlx5_common.c @@ -735,6 +735,11 @@ mlx5_common_dev_release(struct mlx5_common_device *cdev) if (TAILQ_EMPTY(&devices_list)) rte_mem_event_callback_unregister("MLX5_MEM_EVENT_CB", NULL); + if (cdev->dev_info.port_info != NULL) { + mlx5_free(cdev->dev_info.port_info); + cdev->dev_info.port_info = NULL; + } + cdev->dev_info.port_num = 0; mlx5_dev_mempool_unsubscribe(cdev); mlx5_mr_release_cache(&cdev->mr_scache); mlx5_dev_hw_global_release(cdev); diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h index 1abd1e8239..6cb40f54dd 100644 --- 
a/drivers/common/mlx5/mlx5_common.h +++ b/drivers/common/mlx5/mlx5_common.h @@ -174,6 +174,18 @@ enum mlx5_nl_phys_port_name_type { MLX5_PHYS_PORT_NAME_TYPE_UNKNOWN, /* Unrecognized. */ }; +struct mlx5_port_nl_info { + uint32_t ifindex; + uint8_t valid; +}; + +struct mlx5_dev_info { + uint32_t port_num; + uint32_t ibindex; + char ibname[MLX5_FS_NAME_MAX]; + struct mlx5_port_nl_info *port_info; +}; + /** Switch information returned by mlx5_nl_switch_info(). */ struct mlx5_switch_info { uint32_t master:1; /**< Master device. */ @@ -525,6 +537,7 @@ struct mlx5_common_device { uint32_t classes_loaded; void *ctx; /* Verbs/DV/DevX context. */ void *pd; /* Protection Domain. */ + struct mlx5_dev_info dev_info; /* Device port info queried via netlink. */ uint32_t pdn; /* Protection Domain Number. */ struct mlx5_mr_share_cache mr_scache; /* Global shared MR cache. */ struct mlx5_common_dev_config config; /* Device configuration. */ diff --git a/drivers/common/mlx5/windows/mlx5_common_os.h b/drivers/common/mlx5/windows/mlx5_common_os.h index acee0c987f..65394035de 100644 --- a/drivers/common/mlx5/windows/mlx5_common_os.h +++ b/drivers/common/mlx5/windows/mlx5_common_os.h @@ -20,6 +20,11 @@ #define MLX5_BF_OFFSET 0x800 +enum { + MLX5_FS_NAME_MAX = MLX5_DEVX_DEVICE_NAME_SIZE + 1, + MLX5_FS_PATH_MAX = MLX5_DEVX_DEVICE_PNP_SIZE + 1 +}; + /** * This API allocates aligned or non-aligned memory. The free can be on either * aligned or nonaligned memory. 
To be protected - even though there may be no diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c index 5d64984022..08ac6dd939 100644 --- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c +++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -673,6 +674,57 @@ mlx5_link_update_bond(struct rte_eth_dev *dev) ((ifr.ifr_flags & IFF_UP) && (ifr.ifr_flags & IFF_RUNNING)); } +static void +mlx5_handle_port_info_update(struct mlx5_dev_info *dev_info, uint32_t if_index, + uint16_t msg_type) +{ + struct mlx5_switch_info info = { + .master = 0, + .representor = 0, + .name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET, + .port_name = 0, + .switch_id = 0, + }; + uint32_t i; + int nl_route; + + if (dev_info->port_num <= 1 || dev_info->port_info == NULL) + return; + + for (i = 1; i <= dev_info->port_num; i++) { + if (!dev_info->port_info[i].valid) + continue; + if (dev_info->port_info[i].ifindex == if_index) + break; + } + if (msg_type == RTM_NEWLINK && i > dev_info->port_num) { + nl_route = mlx5_nl_init(NETLINK_ROUTE, 0); + if (nl_route < 0) + goto flush_all; + + if (mlx5_nl_switch_info(nl_route, if_index, &info)) { + if (mlx5_sysfs_switch_info(if_index, &info)) + goto flush_all; + } + + if (info.name_type == MLX5_PHYS_PORT_NAME_TYPE_PFSF || + info.name_type == MLX5_PHYS_PORT_NAME_TYPE_PFVF) + goto flush_all; + close(nl_route); + } else if (msg_type == RTM_DELLINK && i <= dev_info->port_num) { + memset(dev_info->port_info + i, 0, sizeof(struct mlx5_port_nl_info)); + } + + return; +flush_all: + if (nl_route >= 0) + close(nl_route); + for (i = 1; i <= dev_info->port_num; i++) { + if (!dev_info->port_info[i].ifindex) + dev_info->port_info[i].valid = 0; + } +} + static void mlx5_dev_interrupt_nl_cb(struct nlmsghdr *hdr, void *cb_arg) { @@ -682,6 +734,8 @@ mlx5_dev_interrupt_nl_cb(struct nlmsghdr *hdr, void *cb_arg) if (mlx5_nl_parse_link_status_update(hdr, &if_index) < 0) return; + 
mlx5_handle_port_info_update(&sh->cdev->dev_info, if_index, hdr->nlmsg_type); + for (i = 0; i < sh->max_port; i++) { struct mlx5_dev_shared_port *port = &sh->port[i]; struct rte_eth_dev *dev; diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c index 69a80b9ddc..8f6e584154 100644 --- a/drivers/net/mlx5/linux/mlx5_os.c +++ b/drivers/net/mlx5/linux/mlx5_os.c @@ -1268,7 +1268,7 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, /* IB doesn't allow more than 255 ports, must be Ethernet. */ err = mlx5_nl_port_state(nl_rdma, spawn->phys_dev_name, - spawn->phys_port); + spawn->phys_port, &spawn->cdev->dev_info); if (err < 0) { DRV_LOG(INFO, "Failed to get netlink port state: %s", strerror(rte_errno)); @@ -1897,6 +1897,8 @@ mlx5_dev_spawn_data_cmp(const void *a, const void *b) * Netlink RDMA group socket handle. * @param[in] owner * Representor owner PF index. + * @param[in] dev_info + * Cached mlx5 device information. * @param[out] bond_info * Pointer to bonding information. * @@ -1908,6 +1910,7 @@ static int mlx5_device_bond_pci_match(const char *ibdev_name, const struct rte_pci_addr *pci_dev, int nl_rdma, uint16_t owner, + struct mlx5_dev_info *dev_info, struct mlx5_bond_info *bond_info) { char ifname[IF_NAMESIZE + 1]; @@ -1928,7 +1931,7 @@ mlx5_device_bond_pci_match(const char *ibdev_name, return -1; if (!strstr(ibdev_name, "bond")) return -1; - np = mlx5_nl_portnum(nl_rdma, ibdev_name); + np = mlx5_nl_portnum(nl_rdma, ibdev_name, dev_info); if (!np) return -1; if (mlx5_get_device_guid(pci_dev, cur_guid, sizeof(cur_guid)) < 0) @@ -1940,7 +1943,7 @@ mlx5_device_bond_pci_match(const char *ibdev_name, */ for (i = 1; i <= np; ++i) { /* Check whether Infiniband port is populated. 
*/ - ifindex = mlx5_nl_ifindex(nl_rdma, ibdev_name, i); + ifindex = mlx5_nl_ifindex(nl_rdma, ibdev_name, i, dev_info); if (!ifindex) continue; if (!if_indextoname(ifindex, ifname)) @@ -1978,9 +1981,13 @@ mlx5_device_bond_pci_match(const char *ibdev_name, if (!file) break; info.name_type = MLX5_PHYS_PORT_NAME_TYPE_NOTSET; - if (fscanf(file, "%32s", tmp_str) == 1) + if (fscanf(file, "%32s", tmp_str) == 1) { mlx5_translate_port_name(tmp_str, &info); - fclose(file); + fclose(file); + } else { + fclose(file); + break; + } /* Only process PF ports. */ if (info.name_type != MLX5_PHYS_PORT_NAME_TYPE_LEGACY && info.name_type != MLX5_PHYS_PORT_NAME_TYPE_UPLINK) @@ -2003,8 +2010,8 @@ mlx5_device_bond_pci_match(const char *ibdev_name, if (ret != 1) break; /* Save bonding info. */ - strncpy(bond_info->ports[info.port_name].ifname, ifname, - sizeof(bond_info->ports[0].ifname)); + snprintf(bond_info->ports[info.port_name].ifname, + sizeof(bond_info->ports[0].ifname), "%s", ifname); bond_info->ports[info.port_name].pci_addr = pci_addr; bond_info->ports[info.port_name].ifindex = ifindex; bond_info->n_port++; @@ -2033,6 +2040,7 @@ mlx5_device_bond_pci_match(const char *ibdev_name, pci_addr.function == owner))) pf = info.port_name; } + fclose(bond_file); if (pf >= 0) { /* Get bond interface info */ ret = mlx5_sysfs_bond_info(ifindex, &bond_info->ifindex, @@ -2084,7 +2092,8 @@ mlx5_nl_esw_multiport_get(struct rte_pci_addr *pci_addr, int *enabled) #define SYSFS_MPESW_PARAM_MAX_LEN 16 static int -mlx5_sysfs_esw_multiport_get(struct ibv_device *ibv, struct rte_pci_addr *pci_addr, int *enabled) +mlx5_sysfs_esw_multiport_get(struct ibv_device *ibv, struct rte_pci_addr *pci_addr, int *enabled, + struct mlx5_dev_info *dev_info) { int nl_rdma; unsigned int n_ports; @@ -2096,7 +2105,7 @@ mlx5_sysfs_esw_multiport_get(struct ibv_device *ibv, struct rte_pci_addr *pci_ad nl_rdma = mlx5_nl_init(NETLINK_RDMA, 0); if (nl_rdma < 0) return nl_rdma; - n_ports = mlx5_nl_portnum(nl_rdma, ibv->name); + 
n_ports = mlx5_nl_portnum(nl_rdma, ibv->name, dev_info); if (!n_ports) { ret = -rte_errno; goto close_nl_rdma; @@ -2104,12 +2113,12 @@ mlx5_sysfs_esw_multiport_get(struct ibv_device *ibv, struct rte_pci_addr *pci_ad for (i = 1; i <= n_ports; ++i) { unsigned int ifindex; char ifname[IF_NAMESIZE + 1]; - struct rte_pci_addr if_pci_addr; + struct rte_pci_addr if_pci_addr = { 0 }; char mpesw[SYSFS_MPESW_PARAM_MAX_LEN + 1]; FILE *sysfs; int n; - ifindex = mlx5_nl_ifindex(nl_rdma, ibv->name, i); + ifindex = mlx5_nl_ifindex(nl_rdma, ibv->name, i, dev_info); if (!ifindex) continue; if (!if_indextoname(ifindex, ifname)) @@ -2151,7 +2160,8 @@ mlx5_sysfs_esw_multiport_get(struct ibv_device *ibv, struct rte_pci_addr *pci_ad } static int -mlx5_is_mpesw_enabled(struct ibv_device *ibv, struct rte_pci_addr *ibv_pci_addr, int *enabled) +mlx5_is_mpesw_enabled(struct ibv_device *ibv, struct rte_pci_addr *ibv_pci_addr, int *enabled, + struct mlx5_dev_info *dev_info) { /* * Try getting Multiport E-Switch state through netlink interface @@ -2159,7 +2169,7 @@ mlx5_is_mpesw_enabled(struct ibv_device *ibv, struct rte_pci_addr *ibv_pci_addr, * assume that Multiport E-Switch is disabled and return an error. */ if (mlx5_nl_esw_multiport_get(ibv_pci_addr, enabled) >= 0 || - mlx5_sysfs_esw_multiport_get(ibv, ibv_pci_addr, enabled) >= 0) + mlx5_sysfs_esw_multiport_get(ibv, ibv_pci_addr, enabled, dev_info) >= 0) return 0; DRV_LOG(DEBUG, "Unable to check MPESW state for IB device %s " "(PCI: " PCI_PRI_FMT ")", @@ -2173,7 +2183,7 @@ mlx5_is_mpesw_enabled(struct ibv_device *ibv, struct rte_pci_addr *ibv_pci_addr, static int mlx5_device_mpesw_pci_match(struct ibv_device *ibv, const struct rte_pci_addr *owner_pci, - int nl_rdma) + int nl_rdma, struct mlx5_dev_info *dev_info) { struct rte_pci_addr ibdev_pci_addr = { 0 }; char ifname[IF_NAMESIZE + 1] = { 0 }; @@ -2197,24 +2207,24 @@ mlx5_device_mpesw_pci_match(struct ibv_device *ibv, return -1; } /* Check if IB device has MPESW enabled. 
*/ - if (mlx5_is_mpesw_enabled(ibv, &ibdev_pci_addr, &enabled)) + if (mlx5_is_mpesw_enabled(ibv, &ibdev_pci_addr, &enabled, dev_info)) return -1; if (!enabled) return -1; /* Iterate through IB ports to find MPESW master uplink port. */ if (nl_rdma < 0) return -1; - np = mlx5_nl_portnum(nl_rdma, ibv->name); + np = mlx5_nl_portnum(nl_rdma, ibv->name, dev_info); if (!np) return -1; for (i = 1; i <= np; ++i) { - struct rte_pci_addr pci_addr; + struct rte_pci_addr pci_addr = { 0 }; FILE *file; char port_name[IF_NAMESIZE + 1]; struct mlx5_switch_info info; /* Check whether IB port has a corresponding netdev. */ - ifindex = mlx5_nl_ifindex(nl_rdma, ibv->name, i); + ifindex = mlx5_nl_ifindex(nl_rdma, ibv->name, i, dev_info); if (!ifindex) continue; if (!if_indextoname(ifindex, ifname)) @@ -2321,16 +2331,30 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev, * matching ones, gathering into the list. */ struct ibv_device *ibv_match[ret + 1]; + struct mlx5_dev_info *info, tmp_info[ret]; int nl_route = mlx5_nl_init(NETLINK_ROUTE, 0); int nl_rdma = mlx5_nl_init(NETLINK_RDMA, 0); unsigned int i; + memset(tmp_info, 0, sizeof(tmp_info)); while (ret-- > 0) { struct rte_pci_addr pci_addr; + if (cdev->dev_info.port_num) { + if (strcmp(ibv_list[ret]->name, cdev->dev_info.ibname)) { + DRV_LOG(INFO, "Unmatched caching device \"%s\" \"%s\"", + cdev->dev_info.ibname, ibv_list[ret]->name); + continue; + } + info = &cdev->dev_info; + } else { + info = &tmp_info[ret]; + } DRV_LOG(DEBUG, "Checking device \"%s\"", ibv_list[ret]->name); bd = mlx5_device_bond_pci_match(ibv_list[ret]->name, &owner_pci, - nl_rdma, owner_id, &bond_info); + nl_rdma, owner_id, + info, + &bond_info); if (bd >= 0) { /* * Bonding device detected. 
Only one match is allowed, @@ -2356,7 +2380,8 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev, ibv_match[nd++] = ibv_list[ret]; break; } - mpesw = mlx5_device_mpesw_pci_match(ibv_list[ret], &owner_pci, nl_rdma); + mpesw = mlx5_device_mpesw_pci_match(ibv_list[ret], &owner_pci, nl_rdma, + info); if (mpesw >= 0) { /* * MPESW device detected. Only one matching IB device is allowed, @@ -2380,10 +2405,18 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev, } /* Bonding or MPESW device was not found. */ if (mlx5_get_pci_addr(ibv_list[ret]->ibdev_path, - &pci_addr)) + &pci_addr)) { + if (tmp_info[ret].port_info != NULL) + mlx5_free(tmp_info[ret].port_info); + memset(&tmp_info[ret], 0, sizeof(tmp_info[0])); continue; - if (rte_pci_addr_cmp(&owner_pci, &pci_addr) != 0) + } + if (rte_pci_addr_cmp(&owner_pci, &pci_addr) != 0) { + if (tmp_info[ret].port_info != NULL) + mlx5_free(tmp_info[ret].port_info); + memset(&tmp_info[ret], 0, sizeof(tmp_info[0])); continue; + } DRV_LOG(INFO, "PCI information matches for device \"%s\"", ibv_list[ret]->name); ibv_match[nd++] = ibv_list[ret]; @@ -2401,13 +2434,21 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev, goto exit; } if (nd == 1) { + if (!cdev->dev_info.port_num) { + for (i = 0; i < RTE_DIM(tmp_info); i++) { + if (tmp_info[i].port_num) { + cdev->dev_info = tmp_info[i]; + break; + } + } + } /* * Found single matching device may have multiple ports. * Each port may be representor, we have to check the port * number and check the representors existence. 
*/ if (nl_rdma >= 0) - np = mlx5_nl_portnum(nl_rdma, ibv_match[0]->name); + np = mlx5_nl_portnum(nl_rdma, ibv_match[0]->name, &cdev->dev_info); if (!np) DRV_LOG(WARNING, "Cannot get IB device \"%s\" ports number.", @@ -2424,6 +2465,14 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev, ret = -rte_errno; goto exit; } + } else { + /* Can't handle one common device with multiple IB devices caching */ + for (i = 0; i < RTE_DIM(tmp_info); i++) { + if (tmp_info[i].port_info != NULL) + mlx5_free(tmp_info[i].port_info); + memset(&tmp_info[i], 0, sizeof(tmp_info[0])); + } + DRV_LOG(INFO, "Cannot handle multiple IB devices info caching in single common device."); } /* Now we can determine the maximal amount of devices to be spawned. */ list = mlx5_malloc(MLX5_MEM_ZERO, @@ -2457,7 +2506,7 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev, list[ns].mpesw_port = MLX5_MPESW_PORT_INVALID; list[ns].ifindex = mlx5_nl_ifindex(nl_rdma, ibv_match[0]->name, - i); + i, &cdev->dev_info); if (!list[ns].ifindex) { /* * No network interface index found for the @@ -2588,7 +2637,7 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev, list[ns].ifindex = mlx5_nl_ifindex (nl_rdma, ibv_match[i]->name, - 1); + 1, &cdev->dev_info); if (!list[ns].ifindex) { char ifname[IF_NAMESIZE]; @@ -2777,6 +2826,11 @@ mlx5_os_pci_probe_pf(struct mlx5_common_device *cdev, mlx5_free(list); MLX5_ASSERT(ibv_list); mlx5_glue->free_device_list(ibv_list); + if (ret) { + if (cdev->dev_info.port_info != NULL) + mlx5_free(cdev->dev_info.port_info); + memset(&cdev->dev_info, 0, sizeof(cdev->dev_info)); + } return ret; } diff --git a/drivers/net/mlx5/linux/mlx5_os.h b/drivers/net/mlx5/linux/mlx5_os.h index 80c70d713a..4ef0916173 100644 --- a/drivers/net/mlx5/linux/mlx5_os.h +++ b/drivers/net/mlx5/linux/mlx5_os.h @@ -8,12 +8,6 @@ #include -/* verb enumerations translations to local enums. 
*/ -enum { - MLX5_FS_NAME_MAX = IBV_SYSFS_NAME_MAX + 1, - MLX5_FS_PATH_MAX = IBV_SYSFS_PATH_MAX + 1 -}; - /* Maximal data of sendmsg message(in bytes). */ #define MLX5_SENDMSG_MAX 64 diff --git a/drivers/net/mlx5/windows/mlx5_os.h b/drivers/net/mlx5/windows/mlx5_os.h index 8b58265687..fb7198c244 100644 --- a/drivers/net/mlx5/windows/mlx5_os.h +++ b/drivers/net/mlx5/windows/mlx5_os.h @@ -7,11 +7,6 @@ #include "mlx5_win_ext.h" -enum { - MLX5_FS_NAME_MAX = MLX5_DEVX_DEVICE_NAME_SIZE + 1, - MLX5_FS_PATH_MAX = MLX5_DEVX_DEVICE_PNP_SIZE + 1 -}; - #define PCI_DRV_FLAGS 0 #define MLX5_NAMESIZE MLX5_FS_NAME_MAX -- 2.34.1