From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dariusz Sosnowski
To: Ori Kam, Suanming Mou, Matan Azrad, Viacheslav Ovsiienko, Xiaoyu Min
CC: Bing Zhao
Subject: [PATCH 22.11] net/mlx5: fix flow counter cache starvation
Date: Thu, 14 Mar 2024 14:29:05 +0100
Message-ID: <20240314132905.409294-1-dsosnowski@nvidia.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: patches for DPDK stable branches

[ upstream commit d755221b77c29549be4025e547cf907ad3a0abcf ]

The mlx5 PMD maintains a global counter pool and per-queue counter
caches, which are used to allocate COUNT flow action objects. Whenever
an empty cache is accessed, it is replenished with a predefined number
of counters.

If the number of configured counters was sufficiently small, the caches
associated with some queues could starve, because all free counters had
been fetched into the caches of other queues.

This patch fixes that by disabling the cache at runtime if the number of
configured counters is not sufficient to avoid such starvation.
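To illustrate with hypothetical numbers (not taken from any specific
configuration): with 8 flow queues and a per-queue cache of 256
counters, the caches alone can absorb 8 * 256 = 2048 counters. If only
1024 counters are configured, four busy queues can pull every free
counter into their local caches, and allocations issued on the
remaining queues fail even though free counters exist. After this
patch the cache stays enabled only when

	request_num >= q_num * cache_size

and is bypassed otherwise, so every queue allocates straight from the
global pool.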
Fixes: 4d368e1da3a4 ("net/mlx5: support flow counter action for HWS")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski
Acked-by: Ori Kam
Acked-by: Bing Zhao
---
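Note: the snippet below is a simplified, self-contained model of the
policy this patch introduces, added for illustration only; all names
(cnt_pool, cnt_get, should_enable_cache, Q_NUM, CACHE_SZ) are invented
and are not the PMD's internals.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define Q_NUM    4  /* illustrative number of flow queues */
#define CACHE_SZ 8  /* illustrative per-queue cache capacity */

struct cnt_pool {
	uint32_t free_cnt;      /* counters left in the global pool */
	bool cache_enabled;     /* false: every queue hits the pool */
	uint32_t qcache[Q_NUM]; /* counters held in each queue cache */
};

/* Same policy as mlx5_hws_cnt_should_enable_cache(): keep the caches
 * only if the configured counters can populate all of them. */
static bool
should_enable_cache(uint32_t request_num)
{
	return request_num >= (uint32_t)Q_NUM * CACHE_SZ;
}

/* Allocate one counter on behalf of a queue. */
static int
cnt_get(struct cnt_pool *p, uint32_t queue)
{
	if (!p->cache_enabled) {
		/* Cache disabled: take directly from the global pool, so
		 * no queue can strand free counters in a local cache. */
		if (p->free_cnt == 0)
			return -1;
		p->free_cnt--;
		return 0;
	}
	if (p->qcache[queue] == 0) {
		/* Empty cache: replenish with up to CACHE_SZ counters. */
		uint32_t fetch = p->free_cnt < CACHE_SZ ? p->free_cnt : CACHE_SZ;

		if (fetch == 0)
			return -1; /* pool drained into other queues' caches */
		p->qcache[queue] = fetch;
		p->free_cnt -= fetch;
	}
	p->qcache[queue]--;
	return 0;
}

int
main(void)
{
	struct cnt_pool p = { .free_cnt = 16 }; /* 16 < Q_NUM * CACHE_SZ = 32 */
	int i;

	p.cache_enabled = should_enable_cache(16);
	/* With caching forced on, queues 0 and 1 would absorb all 16
	 * counters into their caches and starve queues 2 and 3; with it
	 * disabled (as here), all 16 allocations succeed on any queue. */
	for (i = 0; i < 16; i++)
		if (cnt_get(&p, (uint32_t)i % Q_NUM) != 0)
			printf("allocation failed at %d\n", i);
	printf("cache %s, %u counters left\n",
	       p.cache_enabled ? "enabled" : "disabled", p.free_cnt);
	return 0;
}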
 drivers/net/mlx5/mlx5_flow_hw.c | 10 +++--
 drivers/net/mlx5/mlx5_hws_cnt.c | 72 ++++++++++++++++++++++++---------
 drivers/net/mlx5/mlx5_hws_cnt.h | 26 ++++++++++++
 3 files changed, 86 insertions(+), 22 deletions(-)
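The callers in mlx5_flow_hw.c follow the pattern below (condensed from
the hunks that follow): mlx5_hws_cnt_get_queue() returns the caller's
queue pointer only when per-queue caches exist, and NULL otherwise, so
the existing pool get/put helpers fall back to the global pool
transparently:

	cnt_queue = mlx5_hws_cnt_get_queue(priv, &queue);
	ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &cnt_id, age_idx);
	...
	mlx5_hws_cnt_pool_put(priv->hws_cpool, cnt_queue, &flow->cnt_id);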
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 20fa4eee0c..bb4693c2b4 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -2170,6 +2170,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 	struct mlx5_hrxq *hrxq;
 	uint32_t ct_idx;
 	cnt_id_t cnt_id;
+	uint32_t *cnt_queue;
 	uint32_t mtr_id;
 
 	action = &actions[act_data->action_src];
@@ -2321,8 +2322,8 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 			break;
 		/* Fall-through. */
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, &queue,
-						    &cnt_id, age_idx);
+			cnt_queue = mlx5_hws_cnt_get_queue(priv, &queue);
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &cnt_id, age_idx);
 			if (ret != 0)
 				return ret;
 			ret = mlx5_hws_cnt_pool_get_action_offset
@@ -2654,6 +2655,8 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 		      struct rte_flow_hw *flow,
 		      struct rte_flow_error *error)
 {
+	uint32_t *cnt_queue;
+
 	if (mlx5_hws_cnt_is_shared(priv->hws_cpool, flow->cnt_id)) {
 		if (flow->age_idx && !mlx5_hws_age_is_indirect(flow->age_idx)) {
 			/* Remove this AGE parameter from indirect counter. */
@@ -2664,8 +2667,9 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 		}
 		return;
 	}
+	cnt_queue = mlx5_hws_cnt_get_queue(priv, &queue);
 	/* Put the counter first to reduce the race risk in BG thread. */
-	mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue, &flow->cnt_id);
+	mlx5_hws_cnt_pool_put(priv->hws_cpool, cnt_queue, &flow->cnt_id);
 	flow->cnt_id = 0;
 	if (flow->age_idx) {
 		if (mlx5_hws_age_is_indirect(flow->age_idx)) {
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index 791fde4458..885effe3a1 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -333,6 +333,55 @@ mlx5_hws_cnt_svc(void *opaque)
 	return NULL;
 }
 
+static bool
+mlx5_hws_cnt_should_enable_cache(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+				 const struct mlx5_hws_cache_param *ccfg)
+{
+	/*
+	 * Enable cache if and only if there are enough counters requested
+	 * to populate all of the caches.
+	 */
+	return pcfg->request_num >= ccfg->q_num * ccfg->size;
+}
+
+static struct mlx5_hws_cnt_pool_caches *
+mlx5_hws_cnt_cache_init(const struct mlx5_hws_cnt_pool_cfg *pcfg,
+			const struct mlx5_hws_cache_param *ccfg)
+{
+	struct mlx5_hws_cnt_pool_caches *cache;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	uint32_t qidx;
+
+	/* If counter pool is big enough, setup the counter pool cache. */
+	cache = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
+			sizeof(*cache) +
+			sizeof(((struct mlx5_hws_cnt_pool_caches *)0)->qcache[0])
+				* ccfg->q_num, 0, SOCKET_ID_ANY);
+	if (cache == NULL)
+		return NULL;
+	/* Store the necessary cache parameters. */
+	cache->fetch_sz = ccfg->fetch_sz;
+	cache->preload_sz = ccfg->preload_sz;
+	cache->threshold = ccfg->threshold;
+	cache->q_num = ccfg->q_num;
+	for (qidx = 0; qidx < ccfg->q_num; qidx++) {
+		snprintf(mz_name, sizeof(mz_name), "%s_qc/%x", pcfg->name, qidx);
+		cache->qcache[qidx] = rte_ring_create(mz_name, ccfg->size,
+				SOCKET_ID_ANY,
+				RING_F_SP_ENQ | RING_F_SC_DEQ |
+				RING_F_EXACT_SZ);
+		if (cache->qcache[qidx] == NULL)
+			goto error;
+	}
+	return cache;
+
+error:
+	while (qidx--)
+		rte_ring_free(cache->qcache[qidx]);
+	mlx5_free(cache);
+	return NULL;
+}
+
 struct mlx5_hws_cnt_pool *
 mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
 		       const struct mlx5_hws_cnt_pool_cfg *pcfg,
@@ -341,7 +390,6 @@ mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	struct mlx5_hws_cnt_pool *cntp;
 	uint64_t cnt_num = 0;
-	uint32_t qidx;
 
 	MLX5_ASSERT(pcfg);
 	MLX5_ASSERT(ccfg);
@@ -351,17 +399,6 @@ mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
 		return NULL;
 
 	cntp->cfg = *pcfg;
-	cntp->cache = mlx5_malloc(MLX5_MEM_ANY | MLX5_MEM_ZERO,
-			sizeof(*cntp->cache) +
-			sizeof(((struct mlx5_hws_cnt_pool_caches *)0)->qcache[0])
-				* ccfg->q_num, 0, SOCKET_ID_ANY);
-	if (cntp->cache == NULL)
-		goto error;
-	/* store the necessary cache parameters. */
-	cntp->cache->fetch_sz = ccfg->fetch_sz;
-	cntp->cache->preload_sz = ccfg->preload_sz;
-	cntp->cache->threshold = ccfg->threshold;
-	cntp->cache->q_num = ccfg->q_num;
 	if (pcfg->request_num > sh->hws_max_nb_counters) {
 		DRV_LOG(ERR, "Counter number %u "
 			"is greater than the maximum supported (%u).",
@@ -408,13 +445,10 @@ mlx5_hws_cnt_pool_init(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(ERR, "failed to create reuse list ring");
 		goto error;
 	}
-	for (qidx = 0; qidx < ccfg->q_num; qidx++) {
-		snprintf(mz_name, sizeof(mz_name), "%s_qc/%x", pcfg->name, qidx);
-		cntp->cache->qcache[qidx] = rte_ring_create(mz_name, ccfg->size,
-				SOCKET_ID_ANY,
-				RING_F_SP_ENQ | RING_F_SC_DEQ |
-				RING_F_EXACT_SZ);
-		if (cntp->cache->qcache[qidx] == NULL)
+	/* Allocate counter cache only if needed. */
+	if (mlx5_hws_cnt_should_enable_cache(pcfg, ccfg)) {
+		cntp->cache = mlx5_hws_cnt_cache_init(pcfg, ccfg);
+		if (cntp->cache == NULL)
 			goto error;
 	}
 	/* Initialize the time for aging-out calculation. */
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index b5c19a8e2c..72751f3330 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -533,6 +533,32 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
 	return 0;
 }
 
+/**
+ * Decide if the given queue can be used to perform counter allocation/deallocation
+ * based on counter configuration.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ * @param[in] queue
+ *   Pointer to the queue index.
+ *
+ * @return
+ *   @p queue if cache related to the queue can be used. NULL otherwise.
+ */
+static __rte_always_inline uint32_t *
+mlx5_hws_cnt_get_queue(struct mlx5_priv *priv, uint32_t *queue)
+{
+	if (priv && priv->hws_cpool) {
+		/* Do not use queue cache if counter cache is disabled. */
+		if (priv->hws_cpool->cache == NULL)
+			return NULL;
+		return queue;
+	}
+	/* This case should not be reached if counter pool was successfully configured. */
+	MLX5_ASSERT(false);
+	return NULL;
+}
+
 static __rte_always_inline unsigned int
 mlx5_hws_cnt_pool_get_size(struct mlx5_hws_cnt_pool *cpool)
 {
-- 
2.39.2