From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Traynor
To: Rongwei Liu
Cc: Dariusz Sosnowski, dpdk stable
Subject: patch 'net/mlx5: fix flow aging race condition' has been queued to stable release 24.11.4
Date: Fri, 31 Oct 2025 14:33:33 +0000
Message-ID: <20251031143421.324432-91-ktraynor@redhat.com>
In-Reply-To: <20251031143421.324432-1-ktraynor@redhat.com>
References: <20251031143421.324432-1-ktraynor@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 8bit
List-Id: patches for DPDK stable branches
Errors-To: stable-bounces@dpdk.org

Hi,

FYI, your patch has been queued to stable release 24.11.4

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 11/05/25. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Queued patches are on a temporary branch at:
https://github.com/kevintraynor/dpdk-stable

This queued commit can be viewed at:
https://github.com/kevintraynor/dpdk-stable/commit/5c5f24d5279a0a124b63050f4345c0f8a834d5b4

Thanks.
Kevin

---
>From 5c5f24d5279a0a124b63050f4345c0f8a834d5b4 Mon Sep 17 00:00:00 2001
From: Rongwei Liu
Date: Thu, 9 Oct 2025 12:18:10 +0300
Subject: [PATCH] net/mlx5: fix flow aging race condition

[ upstream commit 820ca7361bb7fa40e96e53515d8392ea40a35265 ]

When aging is configured, there is a background thread which queries
all the counters in the pool. Meanwhile, per-queue flow
insertion/deletion/update changes the counter pool too. This introduces
a race condition between resetting a counter's in_used and age_idx
fields during flow deletion and reading them in the background thread.

To resolve it, all key members of the counter's struct are placed in a
single uint32_t and they are accessed atomically. To avoid the
occasional timestamp equalization with age_idx, query_gen_when_free is
moved out of the union. The total memory size is kept the same.

Fixes: 04a4de756e14 ("net/mlx5: support flow age action with HWS")

Signed-off-by: Rongwei Liu
Acked-by: Dariusz Sosnowski
---
 drivers/net/mlx5/mlx5_flow_hw.c |   5 +-
 drivers/net/mlx5/mlx5_hws_cnt.c |  10 +--
 drivers/net/mlx5/mlx5_hws_cnt.h | 135 ++++++++++++++++++++------------
 3 files changed, 91 insertions(+), 59 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index be0129a96f..052b7f2768 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3192,5 +3192,5 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
 	if (action_flags & MLX5_FLOW_ACTION_COUNT) {
 		cnt_queue = mlx5_hws_cnt_get_queue(priv, &queue);
-		if (mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &age_cnt, idx) < 0)
+		if (mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &age_cnt, idx, 0) < 0)
 			return -1;
 		flow->flags |= MLX5_FLOW_HW_FLOW_FLAG_CNT_ID;
@@ -3628,5 +3628,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			cnt_queue = mlx5_hws_cnt_get_queue(priv, &queue);
-			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &cnt_id, age_idx);
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &cnt_id,
+						    age_idx, 0);
 			if (ret != 0) {
 				rte_flow_error_set(error, -ret, RTE_FLOW_ERROR_TYPE_ACTION,
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index 7baeaedd17..3a93434ecd 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -57,6 +57,6 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
 	reset_cnt_num = rte_ring_count(reset_list);
 
-	cpool->query_gen++;
 	mlx5_aso_cnt_query(sh, cpool);
+	rte_atomic_fetch_add_explicit(&cpool->query_gen, 1, rte_memory_order_release);
 	zcdr.n1 = 0;
 	zcdu.n1 = 0;
@@ -128,12 +128,12 @@ mlx5_hws_aging_check(struct mlx5_priv *priv, struct mlx5_hws_cnt_pool *cpool)
 	uint16_t expected1 = HWS_AGE_CANDIDATE;
 	uint16_t expected2 = HWS_AGE_CANDIDATE_INSIDE_RING;
-	uint32_t i;
+	uint32_t i, age_idx, in_use;
 
 	cpool->time_of_last_age_check = curr_time;
 	for (i = 0; i < nb_alloc_cnts; ++i) {
-		uint32_t age_idx = cpool->pool[i].age_idx;
 		uint64_t hits;
 
-		if (!cpool->pool[i].in_used || age_idx == 0)
+		mlx5_hws_cnt_get_all(&cpool->pool[i], &in_use, NULL, &age_idx);
+		if (!in_use || age_idx == 0)
 			continue;
 		param = mlx5_ipool_get(age_info->ages_ipool, age_idx);
@@ -754,5 +754,5 @@ mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
 	 * to wait for query.
 	 */
-	cpool->query_gen = 1;
+	rte_atomic_store_explicit(&cpool->query_gen, 1, rte_memory_order_relaxed);
 	ret = mlx5_hws_cnt_pool_action_create(priv, cpool);
 	if (ret != 0) {
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index d8da9dfcdd..8408c571ec 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -43,29 +43,32 @@ struct mlx5_hws_cnt_dcs_mng {
 };
 
+union mlx5_hws_cnt_state {
+	alignas(RTE_CACHE_LINE_SIZE) RTE_ATOMIC(uint32_t)data;
+	struct {
+		uint32_t in_used:1;
+		/* Indicator whether this counter in used or in pool. */
+		uint32_t share:1;
+		/*
+		 * share will be set to 1 when this counter is used as
+		 * indirect action.
+		 */
+		uint32_t age_idx:24;
+		/*
+		 * When this counter uses for aging, it stores the index
+		 * of AGE parameter. Otherwise, this index is zero.
+		 */
+	};
+};
+
 struct mlx5_hws_cnt {
 	struct flow_counter_stats reset;
-	bool in_used; /* Indicator whether this counter in used or in pool. */
-	union {
-		struct {
-			uint32_t share:1;
-			/*
-			 * share will be set to 1 when this counter is used as
-			 * indirect action.
-			 */
-			uint32_t age_idx:24;
-			/*
-			 * When this counter uses for aging, it save the index
-			 * of AGE parameter. For pure counter (without aging)
-			 * this index is zero.
-			 */
-		};
-		/* This struct is only meaningful when user own this counter. */
-		uint32_t query_gen_when_free;
-		/*
-		 * When PMD own this counter (user put back counter to PMD
-		 * counter pool, i.e), this field recorded value of counter
-		 * pools query generation at time user release the counter.
-		 */
-	};
+	union mlx5_hws_cnt_state cnt_state;
+	/* This struct is only meaningful when user own this counter. */
+	alignas(RTE_CACHE_LINE_SIZE) RTE_ATOMIC(uint32_t)query_gen_when_free;
+	/*
+	 * When PMD own this counter (user put back counter to PMD
+	 * counter pool, i.e), this field recorded value of counter
+	 * pools query generation at time user release the counter.
+	 */
 };
@@ -198,4 +201,40 @@ mlx5_hws_cnt_id_valid(cnt_id_t cnt_id)
 }
 
+static __rte_always_inline void
+mlx5_hws_cnt_set_age_idx(struct mlx5_hws_cnt *cnt, uint32_t value)
+{
+	union mlx5_hws_cnt_state cnt_state;
+
+	cnt_state.data = rte_atomic_load_explicit(&cnt->cnt_state.data, rte_memory_order_acquire);
+	cnt_state.age_idx = value;
+	rte_atomic_store_explicit(&cnt->cnt_state.data, cnt_state.data, rte_memory_order_release);
+}
+
+static __rte_always_inline void
+mlx5_hws_cnt_set_all(struct mlx5_hws_cnt *cnt, uint32_t in_used, uint32_t share, uint32_t age_idx)
+{
+	union mlx5_hws_cnt_state cnt_state;
+
+	cnt_state.in_used = !!in_used;
+	cnt_state.share = !!share;
+	cnt_state.age_idx = age_idx;
+	rte_atomic_store_explicit(&cnt->cnt_state.data, cnt_state.data, rte_memory_order_relaxed);
+}
+
+static __rte_always_inline void
+mlx5_hws_cnt_get_all(struct mlx5_hws_cnt *cnt, uint32_t *in_used, uint32_t *share,
+		     uint32_t *age_idx)
+{
+	union mlx5_hws_cnt_state cnt_state;
+
+	cnt_state.data = rte_atomic_load_explicit(&cnt->cnt_state.data, rte_memory_order_acquire);
+	if (in_used != NULL)
+		*in_used = cnt_state.in_used;
+	if (share != NULL)
+		*share = cnt_state.share;
+	if (age_idx != NULL)
+		*age_idx = cnt_state.age_idx;
+}
+
 /**
  * Generate Counter id from internal index.
@@ -425,7 +464,8 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
 	hpool = mlx5_hws_cnt_host_pool(cpool);
 	iidx = mlx5_hws_cnt_iidx(hpool, *cnt_id);
-	hpool->pool[iidx].in_used = false;
-	hpool->pool[iidx].query_gen_when_free =
-		rte_atomic_load_explicit(&hpool->query_gen, rte_memory_order_relaxed);
+	mlx5_hws_cnt_set_all(&hpool->pool[iidx], 0, 0, 0);
+	rte_atomic_store_explicit(&hpool->pool[iidx].query_gen_when_free,
+		rte_atomic_load_explicit(&hpool->query_gen, rte_memory_order_relaxed),
+		rte_memory_order_relaxed);
 	if (likely(queue != NULL) && cpool->cfg.host_cpool == NULL)
 		qcache = hpool->cache->qcache[*queue];
@@ -481,5 +521,5 @@ mlx5_hws_cnt_pool_put(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
 static __rte_always_inline int
 mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
-		      cnt_id_t *cnt_id, uint32_t age_idx)
+		      cnt_id_t *cnt_id, uint32_t age_idx, uint32_t shared)
 {
 	unsigned int ret;
@@ -509,8 +549,5 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
 				     &cpool->pool[iidx].reset.hits,
 				     &cpool->pool[iidx].reset.bytes);
-		cpool->pool[iidx].share = 0;
-		MLX5_ASSERT(!cpool->pool[iidx].in_used);
-		cpool->pool[iidx].in_used = true;
-		cpool->pool[iidx].age_idx = age_idx;
+		mlx5_hws_cnt_set_all(&cpool->pool[iidx], 1, shared, age_idx);
 		return 0;
 	}
@@ -531,6 +568,8 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
 	*cnt_id = (*(cnt_id_t *)zcdc.ptr1);
 	iidx = mlx5_hws_cnt_iidx(cpool, *cnt_id);
-	query_gen = cpool->pool[iidx].query_gen_when_free;
-	if (cpool->query_gen == query_gen) { /* counter is waiting to reset. */
+	query_gen = rte_atomic_load_explicit(&cpool->pool[iidx].query_gen_when_free,
+					     rte_memory_order_relaxed);
+	/* counter is waiting to reset. */
+	if (rte_atomic_load_explicit(&cpool->query_gen, rte_memory_order_relaxed) == query_gen) {
 		rte_ring_dequeue_zc_elem_finish(qcache, 0);
 		/* write-back counter to reset list. */
@@ -550,8 +589,5 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
 			     &cpool->pool[iidx].reset.bytes);
 	rte_ring_dequeue_zc_elem_finish(qcache, 1);
-	cpool->pool[iidx].share = 0;
-	MLX5_ASSERT(!cpool->pool[iidx].in_used);
-	cpool->pool[iidx].in_used = true;
-	cpool->pool[iidx].age_idx = age_idx;
+	mlx5_hws_cnt_set_all(&cpool->pool[iidx], 1, shared, age_idx);
 	return 0;
 }
@@ -612,13 +648,6 @@ mlx5_hws_cnt_shared_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id,
 {
 	struct mlx5_hws_cnt_pool *hpool = mlx5_hws_cnt_host_pool(cpool);
-	uint32_t iidx;
-	int ret;
 
-	ret = mlx5_hws_cnt_pool_get(hpool, NULL, cnt_id, age_idx);
-	if (ret != 0)
-		return ret;
-	iidx = mlx5_hws_cnt_iidx(hpool, *cnt_id);
-	hpool->pool[iidx].share = 1;
-	return 0;
+	return mlx5_hws_cnt_pool_get(hpool, NULL, cnt_id, age_idx, 1);
 }
 
@@ -627,7 +656,5 @@ mlx5_hws_cnt_shared_put(struct mlx5_hws_cnt_pool *cpool, cnt_id_t *cnt_id)
 {
 	struct mlx5_hws_cnt_pool *hpool = mlx5_hws_cnt_host_pool(cpool);
-	uint32_t iidx = mlx5_hws_cnt_iidx(hpool, *cnt_id);
 
-	hpool->pool[iidx].share = 0;
 	mlx5_hws_cnt_pool_put(hpool, NULL, cnt_id);
 }
@@ -638,6 +665,8 @@ mlx5_hws_cnt_is_shared(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 	struct mlx5_hws_cnt_pool *hpool = mlx5_hws_cnt_host_pool(cpool);
 	uint32_t iidx = mlx5_hws_cnt_iidx(hpool, cnt_id);
+	uint32_t share;
 
-	return hpool->pool[iidx].share ? true : false;
+	mlx5_hws_cnt_get_all(&hpool->pool[iidx], NULL, &share, NULL);
+	return !!share;
 }
 
@@ -649,6 +678,6 @@ mlx5_hws_cnt_age_set(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id,
 	uint32_t iidx = mlx5_hws_cnt_iidx(hpool, cnt_id);
 
-	MLX5_ASSERT(hpool->pool[iidx].share);
-	hpool->pool[iidx].age_idx = age_idx;
+	MLX5_ASSERT(hpool->pool[iidx].cnt_state.share);
+	mlx5_hws_cnt_set_age_idx(&hpool->pool[iidx], age_idx);
 }
 
@@ -658,7 +687,9 @@ mlx5_hws_cnt_age_get(struct mlx5_hws_cnt_pool *cpool, cnt_id_t cnt_id)
 	struct mlx5_hws_cnt_pool *hpool = mlx5_hws_cnt_host_pool(cpool);
 	uint32_t iidx = mlx5_hws_cnt_iidx(hpool, cnt_id);
+	uint32_t age_idx, share;
 
-	MLX5_ASSERT(hpool->pool[iidx].share);
-	return hpool->pool[iidx].age_idx;
+	mlx5_hws_cnt_get_all(&hpool->pool[iidx], NULL, &share, &age_idx);
+	MLX5_ASSERT(share);
+	return age_idx;
 }
-- 
2.51.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty):
---
--- -	2025-10-31 13:53:55.022194996 +0000
+++ 0091-net-mlx5-fix-flow-aging-race-condition.patch	2025-10-31 13:53:52.263524077 +0000
@@ -1 +1 @@
-From 820ca7361bb7fa40e96e53515d8392ea40a35265 Mon Sep 17 00:00:00 2001
+From 5c5f24d5279a0a124b63050f4345c0f8a834d5b4 Mon Sep 17 00:00:00 2001
@@ -5,0 +6,2 @@
+[ upstream commit 820ca7361bb7fa40e96e53515d8392ea40a35265 ]
+
@@ -22 +23,0 @@
-Cc: stable@dpdk.org
@@ -33 +34 @@
-index 9a0aa1827e..491a78a0de 100644
+index be0129a96f..052b7f2768 100644
@@ -36 +37 @@
-@@ -3233,5 +3233,5 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
+@@ -3192,5 +3192,5 @@ flow_hw_shared_action_construct(struct rte_eth_dev *dev, uint32_t queue,
@@ -43 +44 @@
-@@ -3669,5 +3669,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
+@@ -3628,5 +3628,6 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
@@ -52 +53 @@
-index 5c738f38ca..fb01fce4e5 100644
+index 7baeaedd17..3a93434ecd 100644
@@ -55 +56 @@
-@@ -64,6 +64,6 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
+@@ -57,6 +57,6 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
@@ -63 +64 @@
-@@ -135,12 +135,12 @@ mlx5_hws_aging_check(struct mlx5_priv *priv, struct mlx5_hws_cnt_pool *cpool)
+@@ -128,12 +128,12 @@ mlx5_hws_aging_check(struct mlx5_priv *priv, struct mlx5_hws_cnt_pool *cpool)
@@ -79 +80 @@
-@@ -768,5 +768,5 @@ mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
+@@ -754,5 +754,5 @@ mlx5_hws_cnt_pool_create(struct rte_eth_dev *dev,
@@ -87 +88 @@
-index 38a9c19449..f5b7e8f643 100644
+index d8da9dfcdd..8408c571ec 100644