From: Viacheslav Ovsiienko
Subject: [PATCH] net/mlx5: fix read device clock in real time mode
Date: Tue, 3 Jan 2023 13:11:45 +0200
Message-ID: <20230103111145.29824-1-viacheslavo@nvidia.com>
List-Id: patches for DPDK stable branches

Real-time timestamp mode is supported since ConnectX-6 Dx. The
rte_eth_read_clock() routine queries the current timestamp value
from the PMD. The mlx5 PMD has a special infrastructure to schedule
packet sending in real-time mode, which can be engaged with the
tx_pp devarg. This infrastructure reads the timestamp from the
CQEs of a special queue directly in host memory, in user space,
without involving kernel calls.

The ConnectX-7 NIC has a hardware capability to schedule packet
sending without the special infrastructure, so the tx_pp devarg
can be omitted.
If no tx_pp devarg is specified, the mlx5 PMD uses kernel calls to
query the current timestamp value. The kernel can be completely
unaware of the engaged real-time mode, and it might use its internal
queue CQEs to get timestamps, which is neither precise nor reliable:
inconsistent values might be returned, causing send scheduling to
malfunction.

The HCA PCI BAR allows reading the real-time value directly from
hardware. This patch maps the PCI resource into the process address
space on demand and allows reading the real-time timestamp values
from the NIC directly.

Fixes: b94d93ca73803 ("net/mlx5: support reading device clock")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko
---
 drivers/common/mlx5/mlx5_common.h         |  1 +
 drivers/common/mlx5/mlx5_prm.h            |  5 +-
 drivers/common/mlx5/version.map           |  1 +
 drivers/net/mlx5/linux/mlx5_ethdev_os.c   | 68 +++++++++++++++++++++++
 drivers/net/mlx5/mlx5.c                   |  6 +-
 drivers/net/mlx5/mlx5.h                   |  4 ++
 drivers/net/mlx5/mlx5_txpp.c              | 15 ++++-
 drivers/net/mlx5/windows/mlx5_ethdev_os.c | 30 ++++++++++
 8 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index d6e91b5296..c7bd703497 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -221,6 +221,7 @@ check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
  * - 0 on success.
  * - Negative value and rte_errno is set otherwise.
  */
+__rte_internal
 int mlx5_dev_to_pci_str(const struct rte_device *dev, char *addr, size_t size);
 
 /*
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 2b5c43ee6e..91ef61a06c 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3040,6 +3040,7 @@ struct mlx5_ifc_health_buffer_bits {
 	u8 ext_synd[0x10];
 };
 
+/* HCA PCI BAR resource structure. */
 struct mlx5_ifc_initial_seg_bits {
 	u8 fw_rev_minor[0x10];
 	u8 fw_rev_major[0x10];
@@ -3067,7 +3068,9 @@ struct mlx5_ifc_initial_seg_bits {
 	u8 clear_int[0x1];
 	u8 health_syndrome[0x8];
 	u8 health_counter[0x18];
-	u8 reserved_8[0x17fc0];
+	u8 reserved_8[0x160];
+	u8 real_time[0x40];
+	u8 reserved_9[0x17e20];
 };
 
 struct mlx5_ifc_create_cq_out_bits {
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index 4f72900519..03c8ce5593 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -14,6 +14,7 @@ INTERNAL {
 
 	mlx5_dev_is_pci;
 	mlx5_dev_is_vf_pci;
+	mlx5_dev_to_pci_str;
 	mlx5_dev_mempool_unregister;
 	mlx5_dev_mempool_subscribe;
 
diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 72268c0c8a..f1ff6f49f9 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1776,3 +1777,70 @@ int mlx5_get_flag_dropless_rq(struct rte_eth_dev *dev)
 	mlx5_free(sset_info);
 	return ret;
 }
+
+/**
+ * Unmaps HCA PCI BAR from the current process address space.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev)
+{
+	struct mlx5_proc_priv *ppriv = dev->process_private;
+
+	if (ppriv && ppriv->hca_bar) {
+		rte_mem_unmap(ppriv->hca_bar, MLX5_ST_SZ_BYTES(initial_seg));
+		ppriv->hca_bar = NULL;
+	}
+}
+
+/**
+ * Maps HCA PCI BAR to the current process address space.
+ * Stores pointer in the process private structure allowing
+ * to read internal and real time counter directly from the HW.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success and not NULL pointer to mapped area in process structure.
+ *   negative otherwise and NULL pointer
+ */
+int mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev)
+{
+	struct mlx5_proc_priv *ppriv = dev->process_private;
+	char pci_addr[PCI_PRI_STR_SIZE] = { 0 };
+	void *base, *expected = NULL;
+	int fd, ret;
+
+	if (!ppriv) {
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	if (ppriv->hca_bar)
+		return 0;
+	ret = mlx5_dev_to_pci_str(dev->device, pci_addr, sizeof(pci_addr));
+	if (ret < 0)
+		return -rte_errno;
+	/* Open PCI device resource 0 - HCA initialize segment */
+	MKSTR(name, "/sys/bus/pci/devices/%s/resource0", pci_addr);
+	fd = open(name, O_RDWR | O_SYNC);
+	if (fd == -1) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	base = rte_mem_map(NULL, MLX5_ST_SZ_BYTES(initial_seg),
+			   RTE_PROT_READ, RTE_MAP_SHARED, fd, 0);
+	close(fd);
+	if (!base) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	/* Check there is no concurrent mapping in other thread. */
+	if (!__atomic_compare_exchange_n(&ppriv->hca_bar, &expected,
+					 base, false,
+					 __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+		rte_mem_unmap(base, MLX5_ST_SZ_BYTES(initial_seg));
+	return 0;
+}
+
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3ae35587b6..b8643cebdd 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1977,8 +1977,12 @@ mlx5_proc_priv_init(struct rte_eth_dev *dev)
 void
 mlx5_proc_priv_uninit(struct rte_eth_dev *dev)
 {
-	if (!dev->process_private)
+	struct mlx5_proc_priv *ppriv = dev->process_private;
+
+	if (!ppriv)
 		return;
+	if (ppriv->hca_bar)
+		mlx5_txpp_unmap_hca_bar(dev);
 	mlx5_free(dev->process_private);
 	dev->process_private = NULL;
 }
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 31982002ee..16b33e1548 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1463,6 +1463,8 @@ struct mlx5_dev_ctx_shared {
  * Caution, secondary process may rebuild the struct during port start.
  */
 struct mlx5_proc_priv {
+	void *hca_bar;
+	/* Mapped HCA PCI BAR area. */
 	size_t uar_table_sz;
 	/* Size of UAR register table. */
 	struct mlx5_uar_data uar_table[];
@@ -2163,6 +2165,8 @@ int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev,
 			       struct rte_eth_xstat_name *xstats_names,
 			       unsigned int n, unsigned int n_used);
 void mlx5_txpp_interrupt_handler(void *cb_arg);
+int mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev);
+void mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev);
 
 /* mlx5_rxtx.c */
 
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index f853a67f58..63d98dbde9 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -969,6 +969,8 @@ mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp)
 {
 	struct mlx5_priv *priv = dev->data->dev_private;
 	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_proc_priv *ppriv;
+	uint64_t ts;
 	int ret;
 
 	if (sh->txpp.refcnt) {
@@ -979,7 +981,6 @@ mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp)
 			rte_int128_t u128;
 			struct mlx5_cqe_ts cts;
 		} to;
-		uint64_t ts;
 
 		mlx5_atomic_read_cqe((rte_int128_t *)&cqe->timestamp, &to.u128);
 		if (to.cts.op_own >> 4) {
@@ -994,6 +995,18 @@ mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp)
 		*timestamp = ts;
 		return 0;
 	}
+	/* Check and try to map HCA PCI BAR to allow reading real time. */
+	ppriv = dev->process_private;
+	if (ppriv && !ppriv->hca_bar &&
+	    sh->dev_cap.rt_timestamp && mlx5_dev_is_pci(dev->device))
+		mlx5_txpp_map_hca_bar(dev);
+	/* Check if we can read timestamp directly from hardware. */
+	if (ppriv && ppriv->hca_bar) {
+		ts = MLX5_GET64(initial_seg, ppriv->hca_bar, real_time);
+		ts = mlx5_txpp_convert_rx_ts(sh, ts);
+		*timestamp = ts;
+		return 0;
+	}
 	/* Not supported in isolated mode - kernel does not see the CQEs. */
 	if (priv->isolated || rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return -ENOTSUP;
diff --git a/drivers/net/mlx5/windows/mlx5_ethdev_os.c b/drivers/net/mlx5/windows/mlx5_ethdev_os.c
index 88d8213f55..a31e1b5494 100644
--- a/drivers/net/mlx5/windows/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/windows/mlx5_ethdev_os.c
@@ -416,3 +416,33 @@ int mlx5_get_flag_dropless_rq(struct rte_eth_dev *dev)
 	RTE_SET_USED(dev);
 	return -ENOTSUP;
 }
+
+/**
+ * Unmaps HCA PCI BAR from the current process address space.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev)
+{
+	RTE_SET_USED(dev);
+}
+
+/**
+ * Maps HCA PCI BAR to the current process address space.
+ * Stores pointer in the process private structure allowing
+ * to read internal and real time counter directly from the HW.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success and not NULL pointer to mapped area in process structure.
+ *   negative otherwise and NULL pointer
+ */
+int mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev)
+{
+	RTE_SET_USED(dev);
+	rte_errno = ENOTSUP;
+	return -ENOTSUP;
+}
-- 
2.18.1
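
Illustration only, not part of the patch: the direct read path above
fetches a 64-bit big-endian field from the mapped BAR and converts it
to nanoseconds. The sketch below mimics those two steps on plain
memory. The helper names bar_read_be64() and rt_to_ns() and the byte
offset passed in are hypothetical. The split of the real-time value
into seconds (upper 32 bits) and nanoseconds (lower 32 bits) is an
assumption about the format, not something stated in this patch. The
driver itself uses MLX5_GET64() and mlx5_txpp_convert_rx_ts().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/*
 * Hypothetical helper: assemble a 64-bit big-endian value from 8 bytes
 * at byte offset `off` of a mapped area. Byte-wise access is used here
 * only for portability of the sketch; it is not atomic, so a real
 * driver would issue a single 64-bit load and byte-swap the result.
 */
static uint64_t
bar_read_be64(const volatile uint8_t *bar, size_t off)
{
	uint64_t v = 0;
	size_t i;

	assert(bar != NULL);
	for (i = 0; i < 8; i++)
		v = (v << 8) | bar[off + i];
	return v;
}

/*
 * Assumed real-time layout: seconds in the upper 32 bits,
 * nanoseconds in the lower 32 bits.
 */
static uint64_t
rt_to_ns(uint64_t rt)
{
	return (rt >> 32) * NS_PER_S + (rt & UINT32_MAX);
}
```

With a buffer standing in for the BAR, rt_to_ns(bar_read_be64(buf, off))
yields a single nanosecond counter, which is the form
rte_eth_read_clock() callers would typically compare against packet
timestamps.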