DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH 0/2] net/mlx5: fix counter object leaks
@ 2023-06-27  8:27 Dariusz Sosnowski
  2023-06-27  8:27 ` [PATCH 1/2] net/mlx5: fix queue used to deallocate counter Dariusz Sosnowski
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Dariusz Sosnowski @ 2023-06-27  8:27 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam, Suanming Mou; +Cc: dev

This patch series fixes the issue where after a long cycle of
inserting/deleting flow rules using COUNT actions,
mlx5_hws_cnt_pool_get() started returning either -EAGAIN or -ENOENT.
Root cause of the issue was the fact that some counter
objects were not properly returned to the shared counter pool.

Dariusz Sosnowski (2):
  net/mlx5: fix queue used to deallocate counter
  net/mlx5: fix counter allocation from shared pool

 drivers/net/mlx5/mlx5_flow_hw.c | 14 +++++++++-----
 drivers/net/mlx5/mlx5_hws_cnt.h | 17 +++++++++++++++++
 2 files changed, 26 insertions(+), 5 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] net/mlx5: fix queue used to deallocate counter
  2023-06-27  8:27 [PATCH 0/2] net/mlx5: fix counter object leaks Dariusz Sosnowski
@ 2023-06-27  8:27 ` Dariusz Sosnowski
  2023-06-27  8:27 ` [PATCH 2/2] net/mlx5: fix counter allocation from shared pool Dariusz Sosnowski
  2023-07-02 14:41 ` [PATCH 0/2] net/mlx5: fix counter object leaks Raslan Darawsheh
  2 siblings, 0 replies; 4+ messages in thread
From: Dariusz Sosnowski @ 2023-06-27  8:27 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam, Suanming Mou; +Cc: dev

When ports are configured to share flow engine resources
(configured through rte_flow_configure()),
counter objects used during flow rule creation are pulled
directly from the shared counter pool.
This is done by calling mlx5_hws_cnt_pool_get() with NULL pointer
passed as queue pointer.

When a flow rule was destroyed on the host port,
counter object was not returned to the pool, because
mlx5_hws_cnt_pool_put() was always called with a pointer
to the currently used queue.
This forced placing the counter in the local queue cache.
As a result, during the application runtime, some counter objects
became inaccessible, since this queue is never fully flushed.

This patch fixes that behavior, by changing the flow rule destroying
logic. If port is configured to use shared resources,
mlx5_hws_cnt_pool_put() is called with NULL pointer
passed as queue pointer.

Fixes: 13ea6bdcc7ee ("net/mlx5: support counters in cross port shared mode")
Cc: viacheslavo@nvidia.com

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
 drivers/net/mlx5/mlx5_flow_hw.c | 14 +++++++++-----
 drivers/net/mlx5/mlx5_hws_cnt.h | 16 ++++++++++++++++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index b5137a822a..a0c7956626 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -2273,6 +2273,7 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		struct mlx5_hrxq *hrxq;
 		uint32_t ct_idx;
 		cnt_id_t cnt_id;
+		uint32_t *cnt_queue;
 		uint32_t mtr_id;
 
 		action = &actions[act_data->action_src];
@@ -2429,10 +2430,9 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 				break;
 			/* Fall-through. */
 		case RTE_FLOW_ACTION_TYPE_COUNT:
-			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool,
-					(priv->shared_refcnt ||
-					 priv->hws_cpool->cfg.host_cpool) ?
-					NULL : &queue, &cnt_id, age_idx);
+			/* If the port is engaged in resource sharing, do not use queue cache. */
+			cnt_queue = mlx5_hws_cnt_is_pool_shared(priv) ? NULL : &queue;
+			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &cnt_id, age_idx);
 			if (ret != 0)
 				return ret;
 			ret = mlx5_hws_cnt_pool_get_action_offset
@@ -3014,6 +3014,8 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 			  struct rte_flow_hw *flow,
 			  struct rte_flow_error *error)
 {
+	uint32_t *cnt_queue;
+
 	if (mlx5_hws_cnt_is_shared(priv->hws_cpool, flow->cnt_id)) {
 		if (flow->age_idx && !mlx5_hws_age_is_indirect(flow->age_idx)) {
 			/* Remove this AGE parameter from indirect counter. */
@@ -3024,8 +3026,10 @@ flow_hw_age_count_release(struct mlx5_priv *priv, uint32_t queue,
 		}
 		return;
 	}
+	/* If the port is engaged in resource sharing, do not use queue cache. */
+	cnt_queue = mlx5_hws_cnt_is_pool_shared(priv) ? NULL : &queue;
 	/* Put the counter first to reduce the race risk in BG thread. */
-	mlx5_hws_cnt_pool_put(priv->hws_cpool, &queue, &flow->cnt_id);
+	mlx5_hws_cnt_pool_put(priv->hws_cpool, cnt_queue, &flow->cnt_id);
 	flow->cnt_id = 0;
 	if (flow->age_idx) {
 		if (mlx5_hws_age_is_indirect(flow->age_idx)) {
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index b4f3db0533..f37a7d6151 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -553,6 +553,22 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
 	return 0;
 }
 
+/**
+ * Check if counter pool allocated for HWS is shared between ports.
+ *
+ * @param[in] priv
+ *   Pointer to the port private data structure.
+ *
+ * @return
+ *   True if counter pools is shared between ports. False otherwise.
+ */
+static __rte_always_inline bool
+mlx5_hws_cnt_is_pool_shared(struct mlx5_priv *priv)
+{
+	return priv && priv->hws_cpool &&
+	    (priv->shared_refcnt || priv->hws_cpool->cfg.host_cpool != NULL);
+}
+
 static __rte_always_inline unsigned int
 mlx5_hws_cnt_pool_get_size(struct mlx5_hws_cnt_pool *cpool)
 {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/2] net/mlx5: fix counter allocation from shared pool
  2023-06-27  8:27 [PATCH 0/2] net/mlx5: fix counter object leaks Dariusz Sosnowski
  2023-06-27  8:27 ` [PATCH 1/2] net/mlx5: fix queue used to deallocate counter Dariusz Sosnowski
@ 2023-06-27  8:27 ` Dariusz Sosnowski
  2023-07-02 14:41 ` [PATCH 0/2] net/mlx5: fix counter object leaks Raslan Darawsheh
  2 siblings, 0 replies; 4+ messages in thread
From: Dariusz Sosnowski @ 2023-06-27  8:27 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam, Suanming Mou; +Cc: dev

mlx5_hws_cnt struct represents counter objects which are used
in flow rules. This struct contains a union which stores
the following information:

- If counter object is used:
  - `share` is set to 1 if and only if counter object is used
    in an indirect action.
  - `age_idx` is set to the relevant AGE object index.
  - Both of these fields are set at the time of allocatin
    a counter object from the pool.
- If counter object is unused:
  - `query_gen_when_free` is set to the current reset cycle
    of the counter service at the time of freeing the counter object.

When ports were configured to share the flow engine resources,
counter object allocation logic in mlx5_hws_cnt_pool_get()
did not reset the `share` field.
This caused issues when previously released counter object
had the least significant bit of `query_gen_when_free` set to 1.
That counter object was treated as shared counter (indirect action),
even if it was allocated by using COUNT action directly in the flow
rule.

This patch fixes this issue by adding the explicit reset of `share`
field.

Fixes: 13ea6bdcc7ee ("net/mlx5: support counters in cross port shared mode")
Cc: viacheslavo@nvidia.com

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
 drivers/net/mlx5/mlx5_hws_cnt.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index f37a7d6151..f462665eac 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -506,6 +506,7 @@ mlx5_hws_cnt_pool_get(struct mlx5_hws_cnt_pool *cpool, uint32_t *queue,
 		__hws_cnt_query_raw(cpool, *cnt_id,
 				    &cpool->pool[iidx].reset.hits,
 				    &cpool->pool[iidx].reset.bytes);
+		cpool->pool[iidx].share = 0;
 		MLX5_ASSERT(!cpool->pool[iidx].in_used);
 		cpool->pool[iidx].in_used = true;
 		cpool->pool[iidx].age_idx = age_idx;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH 0/2] net/mlx5: fix counter object leaks
  2023-06-27  8:27 [PATCH 0/2] net/mlx5: fix counter object leaks Dariusz Sosnowski
  2023-06-27  8:27 ` [PATCH 1/2] net/mlx5: fix queue used to deallocate counter Dariusz Sosnowski
  2023-06-27  8:27 ` [PATCH 2/2] net/mlx5: fix counter allocation from shared pool Dariusz Sosnowski
@ 2023-07-02 14:41 ` Raslan Darawsheh
  2 siblings, 0 replies; 4+ messages in thread
From: Raslan Darawsheh @ 2023-07-02 14:41 UTC (permalink / raw)
  To: Dariusz Sosnowski, Matan Azrad, Slava Ovsiienko, Ori Kam, Suanming Mou
  Cc: dev

Hi,

> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Tuesday, June 27, 2023 11:28 AM
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; Suanming Mou
> <suanmingm@nvidia.com>
> Cc: dev@dpdk.org
> Subject: [PATCH 0/2] net/mlx5: fix counter object leaks
> 
> This patch series fixes the issue where after a long cycle of inserting/deleting
> flow rules using COUNT actions,
> mlx5_hws_cnt_pool_get() started returning either -EAGAIN or -ENOENT.
> Root cause of the issue was the fact that some counter objects were not
> properly returned to the shared counter pool.
> 
> Dariusz Sosnowski (2):
>   net/mlx5: fix queue used to deallocate counter
>   net/mlx5: fix counter allocation from shared pool
> 
>  drivers/net/mlx5/mlx5_flow_hw.c | 14 +++++++++-----
> drivers/net/mlx5/mlx5_hws_cnt.h | 17 +++++++++++++++++
>  2 files changed, 26 insertions(+), 5 deletions(-)
> 
> --
> 2.25.1

Series applied to next-net-mlx,
Added missing Cc stable@dpdk.org

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-07-02 14:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-27  8:27 [PATCH 0/2] net/mlx5: fix counter object leaks Dariusz Sosnowski
2023-06-27  8:27 ` [PATCH 1/2] net/mlx5: fix queue used to deallocate counter Dariusz Sosnowski
2023-06-27  8:27 ` [PATCH 2/2] net/mlx5: fix counter allocation from shared pool Dariusz Sosnowski
2023-07-02 14:41 ` [PATCH 0/2] net/mlx5: fix counter object leaks Raslan Darawsheh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).