* [PATCH 1/2] net/mlx5: fix dangling pointer to params
2024-10-30 16:30 [PATCH 0/2] net/mlx5: fix counter query loop getting stuck Dariusz Sosnowski
@ 2024-10-30 16:30 ` Dariusz Sosnowski
2024-10-30 16:30 ` [PATCH 2/2] net/mlx5: fix counter query loop getting stuck Dariusz Sosnowski
2024-11-13 13:37 ` [PATCH 0/2] " Raslan Darawsheh
2 siblings, 0 replies; 4+ messages in thread
From: Dariusz Sosnowski @ 2024-10-30 16:30 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad
Cc: Raslan Darawsheh, dev
Offending commits introduced 2 structures which held temporary
parameters for flow rule created. These parameters were passed
to "rule_acts" buffers stored inside template table and
they were used only during a call to mlx5dr_rule_create().
After mlx5dr_rule_create() finished, "rule_acts" buffer is not cleared,
for performance reasons, since on the next flow rule create
it will be overwritten.
GCC 14.2.1, when compiled with -03, issued the following warning:
```
drivers/net/mlx5/mlx5_flow_hw.c:3520:51:
error: storing the address of local variable
'ap' in '*<unknown>.<U2688>.modify_header.data'
[-Werror=dangling-pointer=]
3520 | rule_acts[pos].modify_header.data =
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
3521 | (uint8_t *)ap->mhdr_cmd;
| ~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/mlx5/mlx5_flow_hw.c: In function 'flow_hw_async_flow_create':
drivers/net/mlx5/mlx5_flow_hw.c:3925:43: note: 'ap' declared here
3925 | struct mlx5_flow_hw_action_params ap;
| ^~
drivers/net/mlx5/mlx5_flow_hw.c:3910:59: note: 'table' declared here
3910 | struct rte_flow_template_table *table,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
```
Which is technically correct. This warning could be fixed by
casting "ap->mhdr_cmd" to "void *", but this is just a workaround.
Since what mlx5 PMD cares about is that there is only one instance
of mlx5_flow_hw_action_params per queue, this patch moves this struct
from the stack to mlx5_hw_q struct.
The same is done for mlx5_flow_hw_pattern_params for similar reasons.
Fixes: 1d2744f5028d ("net/mlx5: remove flow action parameters from job")
Fixes: 57fd15fa373d ("net/mlx5: remove flow pattern from job")
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
drivers/net/mlx5/mlx5.h | 28 ++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_flow.h | 25 -------------------------
drivers/net/mlx5/mlx5_flow_hw.c | 9 +++------
3 files changed, 31 insertions(+), 31 deletions(-)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 503366580b..bc25d72c0e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -425,6 +425,30 @@ enum mlx5_hw_indirect_type {
#define MLX5_HW_MAX_ITEMS (16)
+#define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
+#define MLX5_PUSH_MAX_LEN 128
+#define MLX5_ENCAP_MAX_LEN 132
+
+/** Container for flow action data constructed during flow rule creation. */
+struct mlx5_flow_hw_action_params {
+ /** Array of constructed modify header commands. */
+ struct mlx5_modification_cmd mhdr_cmd[MLX5_MHDR_MAX_CMD];
+ /** Constructed encap/decap data buffer. */
+ uint8_t encap_data[MLX5_ENCAP_MAX_LEN];
+ /** Constructed IPv6 routing data buffer. */
+ uint8_t ipv6_push_data[MLX5_PUSH_MAX_LEN];
+};
+
+/** Container for dynamically generated flow items used during flow rule creation. */
+struct mlx5_flow_hw_pattern_params {
+ /** Array of dynamically generated flow items. */
+ struct rte_flow_item items[MLX5_HW_MAX_ITEMS];
+ /** Temporary REPRESENTED_PORT item generated by PMD. */
+ struct rte_flow_item_ethdev port_spec;
+ /** Temporary TAG item generated by PMD. */
+ struct rte_flow_item_tag tag_spec;
+};
+
/* HW steering flow management job descriptor. */
struct mlx5_hw_q_job {
uint32_t type; /* Job type. */
@@ -449,6 +473,10 @@ struct __rte_cache_aligned mlx5_hw_q {
struct rte_ring *indir_iq; /* Indirect action SW in progress queue. */
struct rte_ring *flow_transfer_pending;
struct rte_ring *flow_transfer_completed;
+ /* Action's ARGUMENT resource buffer for rule creation. */
+ struct mlx5_flow_hw_action_params ap;
+ /* Holds spec value for any implicitly added item. */
+ struct mlx5_flow_hw_pattern_params pp;
};
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 693e07218d..d4d985fa3b 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -646,9 +646,6 @@ struct mlx5_flow_dv_matcher {
struct mlx5_flow_dv_match_params mask; /**< Matcher mask. */
};
-#define MLX5_PUSH_MAX_LEN 128
-#define MLX5_ENCAP_MAX_LEN 132
-
/* Encap/decap resource structure. */
struct mlx5_flow_dv_encap_decap_resource {
struct mlx5_list_entry entry;
@@ -1420,28 +1417,6 @@ typedef int
const struct rte_flow_action *,
struct mlx5dr_rule_action *);
-#define MLX5_MHDR_MAX_CMD ((MLX5_MAX_MODIFY_NUM) * 2 + 1)
-
-/** Container for flow action data constructed during flow rule creation. */
-struct mlx5_flow_hw_action_params {
- /** Array of constructed modify header commands. */
- struct mlx5_modification_cmd mhdr_cmd[MLX5_MHDR_MAX_CMD];
- /** Constructed encap/decap data buffer. */
- uint8_t encap_data[MLX5_ENCAP_MAX_LEN];
- /** Constructed IPv6 routing data buffer. */
- uint8_t ipv6_push_data[MLX5_PUSH_MAX_LEN];
-};
-
-/** Container for dynamically generated flow items used during flow rule creation. */
-struct mlx5_flow_hw_pattern_params {
- /** Array of dynamically generated flow items. */
- struct rte_flow_item items[MLX5_HW_MAX_ITEMS];
- /** Temporary REPRESENTED_PORT item generated by PMD. */
- struct rte_flow_item_ethdev port_spec;
- /** Temporary TAG item generated by PMD. */
- struct rte_flow_item_tag tag_spec;
-};
-
/* rte flow action translate to DR action struct. */
struct mlx5_action_construct_data {
LIST_ENTRY(mlx5_action_construct_data) next;
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 2a9ef71cd8..88d6859bac 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3978,8 +3978,6 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
.burst = attr->postpone,
};
struct mlx5dr_rule_action *rule_acts;
- struct mlx5_flow_hw_action_params ap;
- struct mlx5_flow_hw_pattern_params pp;
struct rte_flow_hw *flow = NULL;
const struct rte_flow_item *rule_items;
uint32_t flow_idx = 0;
@@ -4035,14 +4033,14 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
* No need to copy and contrust a new "actions" list based on the
* user's input, in order to save the cost.
*/
- if (flow_hw_actions_construct(dev, flow, &ap,
+ if (flow_hw_actions_construct(dev, flow, &priv->hw_q[queue].ap,
&table->ats[action_template_index],
table->its[pattern_template_index]->item_flags,
flow->table, actions,
rule_acts, queue, error))
goto error;
rule_items = flow_hw_get_rule_items(dev, table, items,
- pattern_template_index, &pp);
+ pattern_template_index, &priv->hw_q[queue].pp);
if (!rule_items)
goto error;
if (likely(!rte_flow_template_table_resizable(dev->data->port_id, &table->cfg.attr))) {
@@ -4185,7 +4183,6 @@ flow_hw_async_flow_update(struct rte_eth_dev *dev,
.burst = attr->postpone,
};
struct mlx5dr_rule_action *rule_acts;
- struct mlx5_flow_hw_action_params ap;
struct rte_flow_hw *of = (struct rte_flow_hw *)flow;
struct rte_flow_hw *nf;
struct rte_flow_hw_aux *aux;
@@ -4236,7 +4233,7 @@ flow_hw_async_flow_update(struct rte_eth_dev *dev,
* No need to copy and contrust a new "actions" list based on the
* user's input, in order to save the cost.
*/
- if (flow_hw_actions_construct(dev, nf, &ap,
+ if (flow_hw_actions_construct(dev, nf, &priv->hw_q[queue].ap,
&table->ats[action_template_index],
table->its[nf->mt_idx]->item_flags,
table, actions,
--
2.39.5
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 2/2] net/mlx5: fix counter query loop getting stuck
2024-10-30 16:30 [PATCH 0/2] net/mlx5: fix counter query loop getting stuck Dariusz Sosnowski
2024-10-30 16:30 ` [PATCH 1/2] net/mlx5: fix dangling pointer to params Dariusz Sosnowski
@ 2024-10-30 16:30 ` Dariusz Sosnowski
2024-11-13 13:37 ` [PATCH 0/2] " Raslan Darawsheh
2 siblings, 0 replies; 4+ messages in thread
From: Dariusz Sosnowski @ 2024-10-30 16:30 UTC (permalink / raw)
To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad, Xiaoyu Min
Cc: Raslan Darawsheh, dev, stable
Counter service thread, responsible for refreshing counter values
stored in host memory, is running an "infinite loop" with the following
logic:
- For each port:
- Refresh port's counter pool - call to __mlx5_hws_cnt_svc().
- Perform aging checks.
- Go to sleep if time left in current cycle.
- Repeat.
__mlx5_hws_cnt_svc() used to perform counter value refresh
implemented the following logic:
1. Store number of counters waiting for reset.
2. Issue ASO WQEs to refresh all counters values.
3. Move counters from reset to reuse list.
Number of moved counters is limited by number stored in step 1 or
step 4.
4. Store number of counters waiting for reset.
5. If number of counters waiting for reset > 0, go to step 2.
Now, if an application constantly creates/destroys flow rules with
counters and even a single counter is added to reset list during step 2,
counter service thread might end up issuing ASO WQEs endlessly,
without going to sleep and respecting the configured cycle time.
This patch fixes that by remove the loop inside __mlx5_hws_cnt_svc().
As a drawback of this fix, the application must allocate enough counters
to accommodate for the cycle time. This number if roughly equal to the
expected counter release rate.
This patch also:
- Ensures that proper counter related error code is returned,
when flow rule create failed due to counter allocation problem.
- Adds debug logging to counter service thread.
- Adds documentation for counter service thread.
Fixes: 4d368e1da3a4 ("net/mlx5: support flow counter action for HWS")
Cc: jackmin@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
doc/guides/nics/mlx5.rst | 71 +++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_flow_hw.c | 17 +++++---
drivers/net/mlx5/mlx5_hws_cnt.c | 46 ++++++++++++---------
3 files changed, 110 insertions(+), 24 deletions(-)
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f82e2d75de..17c8fe70fd 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2655,3 +2655,74 @@ Destroy GENEVE TLV parser for specific port::
This command doesn't destroy the global list,
For releasing options, ``flush`` command should be used.
+
+
+Notes for flow counters
+-----------------------
+
+mlx5 PMD supports COUNT flow action, which provides an ability
+to count packets (and bytes) matched against a given flow rule.
+This section describes the high level overview of how this support is
+implemented and limitations.
+
+HW Steering flow engine
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Flow counters are allocated from HW in bulks.
+A set of bulks forms a flow counter pool managed by PMD.
+When flow counters are queried from HW,
+each counter is identified by an offset in a given bulk.
+Querying HW flow counter requires sending a request to HW,
+which will request a read of counter values for given offsets.
+HW will asynchronously provide these values through a DMA write.
+
+In order to optimize HW to SW communication,
+these requests are handled in a separate counter service thread
+spawned by mlx5 PMD.
+This service thread will refresh the counter values stored in memory,
+in cycles, each spanning ``svc_cycle_time`` milliseconds.
+By default, ``svc_cycle_time`` is set to 500.
+When applications query the COUNT flow action,
+PMD returns the values stored in host memory.
+
+mlx5 PMD manages 3 global rings of allocated counter offsets:
+
+- ``free`` ring - Counters which were not used at all.
+- ``wait_reset`` ring - Counters which were used in some flow rules,
+ but were recently freed (flow rule was destroyed or an indirect action
+ was destroyed).
+ Since count value might have changed between last counter service
+ thread cycle and the moment it was freed, the value in host memory
+ might be stale.
+ During next service thread cycle, such counters will be moved
+ to ``reuse`` ring.
+- ``reuse`` ring - Counters which were used at least once and
+ can be reused in new flow rules.
+
+When counters are assigned to a flow rule (or allocated to indirect action),
+PMD first tries to fetch a counter from ``reuse`` ring.
+If it's empty, PMD fetches a counter from ``free`` ring.
+
+Counter service thread works as follows:
+
+#. Record counters stored in ``wait_reset`` ring.
+#. Read values of all counters which were used at least once
+ or are currently in use.
+#. Move recorded counters from ``wait_reset`` to ``reuse`` ring.
+#. Sleep for ``(query time) - svc_cycle_time`` milliseconds
+#. Repeat.
+
+Because freeing a counter (by destroying a flow rule or destroying indirect
+action) does not immediately make it available for the application,
+PMD might return:
+
+- ``ENOENT`` if no counter is available in ``free``, ``reuse``
+ or ``wait_reset`` rings.
+ No counter will be available until application releases some of them.
+- ``EAGAIN`` if no counter is available in ``free`` and ``reuse`` rings,
+ but there are counters in ``wait_reset`` ring.
+ This means that after next service thread cycle new counters will be
+ available.
+
+Application has to be aware that flow rule create or indirect action create
+might need be retried.
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 88d6859bac..cebded0b04 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3734,8 +3734,11 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
case RTE_FLOW_ACTION_TYPE_COUNT:
cnt_queue = mlx5_hws_cnt_get_queue(priv, &queue);
ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &cnt_id, age_idx);
- if (ret != 0)
+ if (ret != 0) {
+ rte_flow_error_set(error, -ret, RTE_FLOW_ERROR_TYPE_ACTION,
+ action, "Failed to allocate flow counter");
goto error;
+ }
ret = mlx5_hws_cnt_pool_get_action_offset
(priv->hws_cpool,
cnt_id,
@@ -3980,6 +3983,7 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
struct mlx5dr_rule_action *rule_acts;
struct rte_flow_hw *flow = NULL;
const struct rte_flow_item *rule_items;
+ struct rte_flow_error sub_error = { 0 };
uint32_t flow_idx = 0;
uint32_t res_idx = 0;
int ret;
@@ -4037,7 +4041,7 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
&table->ats[action_template_index],
table->its[pattern_template_index]->item_flags,
flow->table, actions,
- rule_acts, queue, error))
+ rule_acts, queue, &sub_error))
goto error;
rule_items = flow_hw_get_rule_items(dev, table, items,
pattern_template_index, &priv->hw_q[queue].pp);
@@ -4074,9 +4078,12 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
mlx5_ipool_free(table->resource, res_idx);
if (flow_idx)
mlx5_ipool_free(table->flow, flow_idx);
- rte_flow_error_set(error, rte_errno,
- RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
- "fail to create rte flow");
+ if (sub_error.cause != RTE_FLOW_ERROR_TYPE_NONE && error != NULL)
+ *error = sub_error;
+ else
+ rte_flow_error_set(error, rte_errno,
+ RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+ "fail to create rte flow");
return NULL;
}
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index def0b19deb..0197c098f6 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -56,26 +56,29 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
uint32_t ret __rte_unused;
reset_cnt_num = rte_ring_count(reset_list);
- do {
- cpool->query_gen++;
- mlx5_aso_cnt_query(sh, cpool);
- zcdr.n1 = 0;
- zcdu.n1 = 0;
- ret = rte_ring_enqueue_zc_burst_elem_start(reuse_list,
- sizeof(cnt_id_t),
- reset_cnt_num, &zcdu,
- NULL);
- MLX5_ASSERT(ret == reset_cnt_num);
- ret = rte_ring_dequeue_zc_burst_elem_start(reset_list,
- sizeof(cnt_id_t),
- reset_cnt_num, &zcdr,
- NULL);
- MLX5_ASSERT(ret == reset_cnt_num);
- __hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
- rte_ring_dequeue_zc_elem_finish(reset_list, reset_cnt_num);
- rte_ring_enqueue_zc_elem_finish(reuse_list, reset_cnt_num);
+ cpool->query_gen++;
+ mlx5_aso_cnt_query(sh, cpool);
+ zcdr.n1 = 0;
+ zcdu.n1 = 0;
+ ret = rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+ sizeof(cnt_id_t),
+ reset_cnt_num, &zcdu,
+ NULL);
+ MLX5_ASSERT(ret == reset_cnt_num);
+ ret = rte_ring_dequeue_zc_burst_elem_start(reset_list,
+ sizeof(cnt_id_t),
+ reset_cnt_num, &zcdr,
+ NULL);
+ MLX5_ASSERT(ret == reset_cnt_num);
+ __hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+ rte_ring_dequeue_zc_elem_finish(reset_list, reset_cnt_num);
+ rte_ring_enqueue_zc_elem_finish(reuse_list, reset_cnt_num);
+
+ if (rte_log_can_log(mlx5_logtype, RTE_LOG_DEBUG)) {
reset_cnt_num = rte_ring_count(reset_list);
- } while (reset_cnt_num > 0);
+ DRV_LOG(DEBUG, "ibdev %s cpool %p wait_reset_cnt=%" PRIu32,
+ sh->ibdev_name, (void *)cpool, reset_cnt_num);
+ }
}
/**
@@ -325,6 +328,11 @@ mlx5_hws_cnt_svc(void *opaque)
rte_spinlock_unlock(&sh->cpool_lock);
query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
sleep_us = interval - query_us;
+ DRV_LOG(DEBUG, "ibdev %s counter service thread: "
+ "interval_us=%" PRIu64 " query_us=%" PRIu64 " "
+ "sleep_us=%" PRIu64,
+ sh->ibdev_name, interval, query_us,
+ interval > query_us ? sleep_us : 0);
if (interval > query_us)
rte_delay_us_sleep(sleep_us);
}
--
2.39.5
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 0/2] net/mlx5: fix counter query loop getting stuck
2024-10-30 16:30 [PATCH 0/2] net/mlx5: fix counter query loop getting stuck Dariusz Sosnowski
2024-10-30 16:30 ` [PATCH 1/2] net/mlx5: fix dangling pointer to params Dariusz Sosnowski
2024-10-30 16:30 ` [PATCH 2/2] net/mlx5: fix counter query loop getting stuck Dariusz Sosnowski
@ 2024-11-13 13:37 ` Raslan Darawsheh
2 siblings, 0 replies; 4+ messages in thread
From: Raslan Darawsheh @ 2024-11-13 13:37 UTC (permalink / raw)
To: Dariusz Sosnowski, Slava Ovsiienko, Bing Zhao, Ori Kam,
Suanming Mou, Matan Azrad
Cc: dev
Hi,
From: Dariusz Sosnowski <dsosnowski@nvidia.com>
Sent: Wednesday, October 30, 2024 6:30 PM
To: Slava Ovsiienko; Bing Zhao; Ori Kam; Suanming Mou; Matan Azrad
Cc: Raslan Darawsheh; dev@dpdk.org
Subject: [PATCH 0/2] net/mlx5: fix counter query loop getting stuck
Main content of this patchset is "net/mlx5: fix counter query loop getting stuck" patch.
The preceding patch is required, because the fix for counter query loop
triggered the dangling pointer compilation error in GCC 14.2.1 in an unrelated
part of the code. To be specific, the following part caused the error:
```
@@ -3978,7 +3982,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
&table->ats[action_template_index],
table->its[pattern_template_index]->item_flags,
flow->table, actions,
- rule_acts, queue, error))
+ rule_acts, queue, &sub_error))
goto error;
rule_items = flow_hw_get_rule_items(dev, table, items,
pattern_template_index, &priv->hw_q[queue].pp);
```
Dariusz Sosnowski (2):
net/mlx5: fix dangling pointer to params
net/mlx5: fix counter query loop getting stuck
doc/guides/nics/mlx5.rst | 71 +++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5.h | 28 +++++++++++++
drivers/net/mlx5/mlx5_flow.h | 25 ------------
drivers/net/mlx5/mlx5_flow_hw.c | 26 +++++++-----
drivers/net/mlx5/mlx5_hws_cnt.c | 46 ++++++++++++---------
5 files changed, 141 insertions(+), 55 deletions(-)
--
2.39.5
Series applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh
^ permalink raw reply [flat|nested] 4+ messages in thread