From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dariusz Sosnowski
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad, Xiaoyu Min
CC: Raslan Darawsheh
Subject: [PATCH 2/2] net/mlx5: fix counter query loop getting stuck
Date: Wed, 30 Oct 2024 17:30:46 +0100
Message-ID: <20241030163046.495982-3-dsosnowski@nvidia.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20241030163046.495982-1-dsosnowski@nvidia.com>
References: <20241030163046.495982-1-dsosnowski@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-BeenThere: dev@dpdk.org
List-Id: DPDK patches and discussions

Counter service thread, responsible for refreshing counter values
stored in host memory, is running an "infinite loop" with the following
logic:

- For each port:
    - Refresh port's counter pool - call to __mlx5_hws_cnt_svc().
    - Perform aging checks.
- Go to sleep if there is time left in the current cycle.
- Repeat.

__mlx5_hws_cnt_svc(), which performs the counter value refresh,
implemented the following logic:

1. Store number of counters waiting for reset.
2. Issue ASO WQEs to refresh all counter values.
3. Move counters from reset to reuse list.
   Number of moved counters is limited by the number stored in step 1 or
   step 4.
4. Store number of counters waiting for reset.
5. If number of counters waiting for reset > 0, go to step 2.

Now, if an application constantly creates/destroys flow rules with
counters and even a single counter is added to the reset list during
step 2, the counter service thread might end up issuing ASO WQEs
endlessly, without going to sleep or respecting the configured cycle time.

This patch fixes that by removing the loop inside __mlx5_hws_cnt_svc().
As a drawback of this fix, the application must allocate enough counters
to cover the cycle time. This number is roughly equal to the
expected counter release rate per cycle.

This patch also:

- Ensures that a proper counter related error code is returned
  when flow rule creation fails due to a counter allocation problem.
- Adds debug logging to counter service thread.
- Adds documentation for counter service thread.

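The effect of dropping the inner loop can be sketched with a small model. This is only an illustration under stated assumptions, not the PMD code: the rings are reduced to plain element counts, and `struct pool_model`, `svc_single_pass()` and the `freed_during_query` parameter are hypothetical names introduced for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each ring is reduced to the number of counters it holds. */
struct pool_model {
	uint32_t wait_reset; /* counters freed since the last refresh */
	uint32_t reuse;      /* counters ready to be handed out again */
	uint32_t queries;    /* number of query rounds issued so far */
};

/*
 * One refresh pass without the inner loop.
 * freed_during_query models counters which the application frees
 * while the query is in flight.
 */
static void
svc_single_pass(struct pool_model *p, uint32_t freed_during_query)
{
	/* Record how many counters are waiting for reset right now. */
	uint32_t recorded = p->wait_reset;

	/* Issue exactly one query round; more counters may be freed meanwhile. */
	p->queries++;
	p->wait_reset += freed_during_query;

	/* Move only the recorded counters; late arrivals wait for the next cycle. */
	assert(p->wait_reset >= recorded);
	p->wait_reset -= recorded;
	p->reuse += recorded;
}
```

Starting from 4 counters in the model's `wait_reset` and freeing 1 more during the query, a single call issues one query round, moves 4 counters to `reuse` and leaves 1 for the next cycle, so the amount of work per cycle stays bounded regardless of how fast the application frees counters.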
Fixes: 4d368e1da3a4 ("net/mlx5: support flow counter action for HWS")
Cc: jackmin@nvidia.com
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski
Acked-by: Ori Kam
---
 doc/guides/nics/mlx5.rst        | 71 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_hw.c | 17 +++++---
 drivers/net/mlx5/mlx5_hws_cnt.c | 46 ++++++++++++---------
 3 files changed, 110 insertions(+), 24 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f82e2d75de..17c8fe70fd 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2655,3 +2655,74 @@ Destroy GENEVE TLV parser for specific port::
 
 This command doesn't destroy the global list,
 For releasing options, ``flush`` command should be used.
+
+
+Notes for flow counters
+-----------------------
+
+mlx5 PMD supports the COUNT flow action, which provides the ability
+to count packets (and bytes) matched against a given flow rule.
+This section gives a high-level overview of how this support
+is implemented and of its limitations.
+
+HW Steering flow engine
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Flow counters are allocated from HW in bulks.
+A set of bulks forms a flow counter pool managed by PMD.
+When flow counters are queried from HW,
+each counter is identified by an offset in a given bulk.
+Querying HW flow counter requires sending a request to HW,
+which will request a read of counter values for given offsets.
+HW will asynchronously provide these values through a DMA write.
+
+In order to optimize HW to SW communication,
+these requests are handled in a separate counter service thread
+spawned by mlx5 PMD.
+This service thread will refresh the counter values stored in memory,
+in cycles, each spanning ``svc_cycle_time`` milliseconds.
+By default, ``svc_cycle_time`` is set to 500.
+When applications query the COUNT flow action,
+PMD returns the values stored in host memory.
+
+mlx5 PMD manages 3 global rings of allocated counter offsets:
+
+- ``free`` ring - Counters which were not used at all.
+- ``wait_reset`` ring - Counters which were used in some flow rules,
+  but were recently freed (flow rule was destroyed or an indirect action
+  was destroyed).
+  Since count value might have changed between last counter service
+  thread cycle and the moment it was freed, the value in host memory
+  might be stale.
+  During next service thread cycle, such counters will be moved
+  to ``reuse`` ring.
+- ``reuse`` ring - Counters which were used at least once and
+  can be reused in new flow rules.
+
+When counters are assigned to a flow rule (or allocated to indirect action),
+PMD first tries to fetch a counter from ``reuse`` ring.
+If it's empty, PMD fetches a counter from ``free`` ring.
+
+Counter service thread works as follows:
+
+#. Record counters stored in ``wait_reset`` ring.
+#. Read values of all counters which were used at least once
+   or are currently in use.
+#. Move recorded counters from ``wait_reset`` to ``reuse`` ring.
+#. Sleep for ``svc_cycle_time - (query time)`` milliseconds.
+#. Repeat.
+
+Because freeing a counter (by destroying a flow rule or destroying indirect
+action) does not immediately make it available for the application,
+PMD might return:
+
+- ``ENOENT`` if no counter is available in ``free``, ``reuse``
+  or ``wait_reset`` rings.
+  No counter will be available until application releases some of them.
+- ``EAGAIN`` if no counter is available in ``free`` and ``reuse`` rings,
+  but there are counters in ``wait_reset`` ring.
+  This means that after next service thread cycle new counters will be
+  available.
+
+Application has to be aware that flow rule create or indirect action create
+might need to be retried.
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 88d6859bac..cebded0b04 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3734,8 +3734,11 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			cnt_queue = mlx5_hws_cnt_get_queue(priv, &queue);
 			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &cnt_id, age_idx);
-			if (ret != 0)
+			if (ret != 0) {
+				rte_flow_error_set(error, -ret, RTE_FLOW_ERROR_TYPE_ACTION,
+						action, "Failed to allocate flow counter");
 				goto error;
+			}
 			ret = mlx5_hws_cnt_pool_get_action_offset
 				(priv->hws_cpool,
 				 cnt_id,
@@ -3980,6 +3983,7 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
 	struct mlx5dr_rule_action *rule_acts;
 	struct rte_flow_hw *flow = NULL;
 	const struct rte_flow_item *rule_items;
+	struct rte_flow_error sub_error = { 0 };
 	uint32_t flow_idx = 0;
 	uint32_t res_idx = 0;
 	int ret;
@@ -4037,7 +4041,7 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
 				      &table->ats[action_template_index],
 				      table->its[pattern_template_index]->item_flags,
 				      flow->table, actions,
-				      rule_acts, queue, error))
+				      rule_acts, queue, &sub_error))
 		goto error;
 	rule_items = flow_hw_get_rule_items(dev, table, items,
 					    pattern_template_index, &priv->hw_q[queue].pp);
@@ -4074,9 +4078,12 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
 		mlx5_ipool_free(table->resource, res_idx);
 	if (flow_idx)
 		mlx5_ipool_free(table->flow, flow_idx);
-	rte_flow_error_set(error, rte_errno,
-			   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-			   "fail to create rte flow");
+	if (sub_error.cause != RTE_FLOW_ERROR_TYPE_NONE && error != NULL)
+		*error = sub_error;
+	else
+		rte_flow_error_set(error, rte_errno,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "fail to create rte flow");
 	return NULL;
 }
 
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index def0b19deb..0197c098f6 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -56,26 +56,29 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
 	uint32_t ret __rte_unused;
 
 	reset_cnt_num = rte_ring_count(reset_list);
-	do {
-		cpool->query_gen++;
-		mlx5_aso_cnt_query(sh, cpool);
-		zcdr.n1 = 0;
-		zcdu.n1 = 0;
-		ret = rte_ring_enqueue_zc_burst_elem_start(reuse_list,
-							   sizeof(cnt_id_t),
-							   reset_cnt_num, &zcdu,
-							   NULL);
-		MLX5_ASSERT(ret == reset_cnt_num);
-		ret = rte_ring_dequeue_zc_burst_elem_start(reset_list,
-							   sizeof(cnt_id_t),
-							   reset_cnt_num, &zcdr,
-							   NULL);
-		MLX5_ASSERT(ret == reset_cnt_num);
-		__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
-		rte_ring_dequeue_zc_elem_finish(reset_list, reset_cnt_num);
-		rte_ring_enqueue_zc_elem_finish(reuse_list, reset_cnt_num);
+	cpool->query_gen++;
+	mlx5_aso_cnt_query(sh, cpool);
+	zcdr.n1 = 0;
+	zcdu.n1 = 0;
+	ret = rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+						   sizeof(cnt_id_t),
+						   reset_cnt_num, &zcdu,
+						   NULL);
+	MLX5_ASSERT(ret == reset_cnt_num);
+	ret = rte_ring_dequeue_zc_burst_elem_start(reset_list,
+						   sizeof(cnt_id_t),
+						   reset_cnt_num, &zcdr,
+						   NULL);
+	MLX5_ASSERT(ret == reset_cnt_num);
+	__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+	rte_ring_dequeue_zc_elem_finish(reset_list, reset_cnt_num);
+	rte_ring_enqueue_zc_elem_finish(reuse_list, reset_cnt_num);
+
+	if (rte_log_can_log(mlx5_logtype, RTE_LOG_DEBUG)) {
 		reset_cnt_num = rte_ring_count(reset_list);
-	} while (reset_cnt_num > 0);
+		DRV_LOG(DEBUG, "ibdev %s cpool %p wait_reset_cnt=%" PRIu32,
+			       sh->ibdev_name, (void *)cpool, reset_cnt_num);
+	}
 }
 
 /**
@@ -325,6 +328,11 @@ mlx5_hws_cnt_svc(void *opaque)
 		rte_spinlock_unlock(&sh->cpool_lock);
 		query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
 		sleep_us = interval - query_us;
+		DRV_LOG(DEBUG, "ibdev %s counter service thread: "
+			       "interval_us=%" PRIu64 " query_us=%" PRIu64 " "
+			       "sleep_us=%" PRIu64,
+			sh->ibdev_name, interval, query_us,
+			interval > query_us ? sleep_us : 0);
 		if (interval > query_us)
 			rte_delay_us_sleep(sleep_us);
 	}
-- 
2.39.5
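The documentation added by this patch notes that flow rule or indirect action creation might have to be retried when counters are only temporarily unavailable. One possible application-side retry policy is sketched below. It assumes that a failed attempt reports a positive errno-style code; `create_fn`, `create_with_retry()`, `stub_create()` and `max_retries` are hypothetical names used only for this illustration, and a real application would call its flow creation routine from the callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: returns 0 on success or a positive errno-style code. */
typedef int (*create_fn)(void *ctx);

/*
 * Retry creation while the failure is transient (EAGAIN).
 * ENOENT and any other error are returned immediately: according to the
 * documentation in this patch, ENOENT means that no counter becomes
 * available until the application releases some.
 */
static int
create_with_retry(create_fn create, void *ctx, unsigned int max_retries)
{
	int err;

	assert(create != NULL);
	err = create(ctx);
	while (err == EAGAIN && max_retries-- > 0) {
		/* A real application would wait roughly one service cycle here. */
		err = create(ctx);
	}
	return err;
}

/* Stub for illustration: fails with EAGAIN a given number of times. */
static int
stub_create(void *ctx)
{
	unsigned int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return EAGAIN;
	}
	return 0;
}
```

With the stub set to fail twice, `create_with_retry(stub_create, &fails, 5)` succeeds on the third attempt. With more pending failures than allowed retries, the last `EAGAIN` is returned to the caller, which can then decide whether to wait longer or to release counters.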