* [PATCH 1/3] net/mlx5: fix multi process Tx default rules
@ 2025-10-29 15:57 Gregory Etelson
2025-10-29 15:57 ` [PATCH 2/3] net/mlx5: fix control flow leakage for external SQ Gregory Etelson
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Gregory Etelson @ 2025-10-29 15:57 UTC (permalink / raw)
To: dev
Cc: getelson, mkashani, rasland, Michael Baum, dsosnowski,
Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
From: Michael Baum <michaelba@nvidia.com>
When representor matching is disabled, a default egress rule is
inserted which matches all packets, copies REG_A to REG_C_1 (when
dv_xmeta_en == 4) and jumps to group 1. All user rules start from group 1.
When 2 processes are working together, the first one creates this flow
rule and the second one fails with errno EEXIST. This renders all
user egress rules in the 2nd process invalid.
This patch changes this default rule to match on SQs.
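For illustration, a minimal sketch of the EAL devargs selecting the affected
mode; the PCI address and representor list below are placeholders, and the
devargs names (dv_flow_en, dv_xmeta_en, repr_matching_en) are the standard
mlx5 ones from mlx5.rst:

#include <rte_eal.h>
#include <rte_common.h>

int main(void)
{
	/* Placeholder PCI address and representor list. dv_flow_en=2 selects
	 * the HWS flow engine, dv_xmeta_en=4 enables 32-bit extended metadata
	 * and repr_matching_en=0 disables representor matching - the mode in
	 * which the default Tx metadata copy rule is inserted. */
	char *argv[] = {
		"app", "--proc-type=auto",
		"-a", "0000:08:00.0,dv_flow_en=2,dv_xmeta_en=4,repr_matching_en=0,representor=vf[0-1]",
	};

	if (rte_eal_init(RTE_DIM(argv), argv) < 0)
		return -1;
	return 0;
}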
Fixes: 483181f7b6dd ("net/mlx5: support device control of representor matching")
Cc: dsosnowski@nvidia.com
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/mlx5_flow.h | 4 +++-
drivers/net/mlx5/mlx5_flow_hw.c | 24 +++++++++++-------------
drivers/net/mlx5/mlx5_trigger.c | 25 +++++++++++++------------
drivers/net/mlx5/mlx5_txq.c | 8 ++++++++
4 files changed, 35 insertions(+), 26 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index ff61706054..07d2f4185c 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -3582,7 +3582,9 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
int mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev,
uint32_t sqn);
int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
-int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev);
+int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
+ uint32_t sqn,
+ bool external);
int mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external);
int mlx5_flow_hw_lacp_rx_flow(struct rte_eth_dev *dev);
int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 491a78a0de..d945c88eb0 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -10643,7 +10643,7 @@ flow_hw_create_tx_default_mreg_copy_table(struct rte_eth_dev *dev,
.priority = MLX5_HW_LOWEST_PRIO_ROOT,
.egress = 1,
},
- .nb_flows = 1, /* One default flow rule for all. */
+ .nb_flows = MLX5_HW_CTRL_FLOW_NB_RULES,
};
struct mlx5_flow_template_table_cfg tx_tbl_cfg = {
.attr = tx_tbl_attr,
@@ -16004,21 +16004,18 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
}
int
-mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
+mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
{
struct mlx5_priv *priv = dev->data->dev_private;
- struct rte_flow_item_eth promisc = {
- .hdr.dst_addr.addr_bytes = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
- .hdr.src_addr.addr_bytes = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
- .hdr.ether_type = 0,
+ struct mlx5_rte_flow_item_sq sq_spec = {
+ .queue = sqn,
};
- struct rte_flow_item eth_all[] = {
- [0] = {
- .type = RTE_FLOW_ITEM_TYPE_ETH,
- .spec = &promisc,
- .mask = &promisc,
+ struct rte_flow_item items[] = {
+ {
+ .type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+ .spec = &sq_spec,
},
- [1] = {
+ {
.type = RTE_FLOW_ITEM_TYPE_END,
},
};
@@ -16048,6 +16045,7 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
};
struct mlx5_ctrl_flow_info flow_info = {
.type = MLX5_CTRL_FLOW_TYPE_TX_META_COPY,
+ .tx_repr_sq = sqn,
};
MLX5_ASSERT(priv->master);
@@ -16057,7 +16055,7 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev)
return 0;
return flow_hw_create_ctrl_flow(dev, dev,
priv->hw_ctrl_fdb->hw_tx_meta_cpy_tbl,
- eth_all, 0, copy_reg_action, 0, &flow_info, false);
+ items, 0, copy_reg_action, 0, &flow_info, external);
}
int
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 916ac03c16..e6acb56d4d 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1606,18 +1606,6 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
unsigned int i;
int ret;
- /*
- * With extended metadata enabled, the Tx metadata copy is handled by default
- * Tx tagging flow rules, so default Tx flow rule is not needed. It is only
- * required when representor matching is disabled.
- */
- if (config->dv_esw_en &&
- !config->repr_matching &&
- config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
- priv->master) {
- if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev))
- goto error;
- }
for (i = 0; i < priv->txqs_n; ++i) {
struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
uint32_t queue;
@@ -1639,6 +1627,19 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
goto error;
}
}
+ /*
+ * With extended metadata enabled, the Tx metadata copy is handled by default
+ * Tx tagging flow rules, so default Tx flow rule is not needed. It is only
+ * required when representor matching is disabled.
+ */
+ if (config->dv_esw_en && !config->repr_matching &&
+ config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+ (priv->master || priv->representor)) {
+ if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev, queue, false)) {
+ mlx5_txq_release(dev, i);
+ goto error;
+ }
+ }
mlx5_txq_release(dev, i);
}
if (config->fdb_def_rule) {
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index b090d8274d..834ca541d5 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1459,6 +1459,14 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num);
return -rte_errno;
}
+
+ if (!priv->sh->config.repr_matching &&
+ priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+ mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev, sq_num, true)) {
+ if (sq_miss_created)
+ mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num);
+ return -rte_errno;
+ }
return 0;
}
#endif
--
2.51.0
* [PATCH 2/3] net/mlx5: fix control flow leakage for external SQ
2025-10-29 15:57 [PATCH 1/3] net/mlx5: fix multi process Tx default rules Gregory Etelson
@ 2025-10-29 15:57 ` Gregory Etelson
2025-10-29 15:57 ` [PATCH 3/3] net/mlx5: support flow metadata exchange between E-Switch and VM Gregory Etelson
2025-11-02 7:32 ` [PATCH 1/3] net/mlx5: fix multi process Tx default rules Raslan Darawsheh
2 siblings, 0 replies; 4+ messages in thread
From: Gregory Etelson @ 2025-10-29 15:57 UTC (permalink / raw)
To: dev
Cc: getelson, mkashani, rasland, Viacheslav Ovsiienko, stable,
Dariusz Sosnowski, Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad,
Xueming Li
From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
There is a private API, rte_pmd_mlx5_external_sq_enable(),
that allows an application to create a Send Queue (SQ) on its own
and then enable this queue for usage as an "external SQ".
On this enabling call, some implicit flows are created to provide
compliant SQ behavior - copy the metadata register, forward
queue-originated packets to the correct VF, etc.
These implicit flows are marked as "external" ones, and there is
no cleanup on device start and stop for this kind of flow.
Also, the PMD has no knowledge of whether an external SQ is still
in use by the application, so implicit cleanup cannot be performed.
As a result, over multiple device start/stop cycles the application
re-creates and re-enables many external SQs, causing the implicit
flow tables to overflow.
To resolve this issue, the rte_pmd_mlx5_external_sq_disable()
API is provided, which allows the application to notify the PMD
that the external SQ is no longer in use and the related implicit
flows can be dismissed.
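A minimal usage sketch of the enable/disable pairing, assuming the
application has already created its own SQ and knows its HW number
(sq_num below is a placeholder):

#include <rte_ethdev.h>
#include <rte_pmd_mlx5.h>

/* Hypothetical helper: pair enable/disable around the lifetime of an
 * application-owned SQ so the implicit control flows do not accumulate
 * over repeated device start/stop cycles. */
static int
use_external_sq(uint16_t port_id, uint32_t sq_num)
{
	int ret;

	ret = rte_pmd_mlx5_external_sq_enable(port_id, sq_num);
	if (ret < 0)
		return ret;

	/* ... send traffic through the external SQ ... */

	/* Before destroying the SQ, release the implicit flows created
	 * on enable. */
	return rte_pmd_mlx5_external_sq_disable(port_id, sq_num);
}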
Fixes: 26e1eaf2dac4 ("net/mlx5: support device control for E-Switch default rule")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
drivers/net/mlx5/mlx5_flow.h | 12 ++--
drivers/net/mlx5/mlx5_flow_hw.c | 106 +++++++++++++++++++++++++++++++-
drivers/net/mlx5/mlx5_trigger.c | 2 +-
drivers/net/mlx5/mlx5_txq.c | 55 +++++++++++++++--
drivers/net/mlx5/rte_pmd_mlx5.h | 18 ++++++
5 files changed, 181 insertions(+), 12 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 07d2f4185c..adfe84ef54 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -3580,12 +3580,16 @@ int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
uint32_t sqn, bool external);
int mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev,
- uint32_t sqn);
+ uint32_t sqn, bool external);
int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
- uint32_t sqn,
- bool external);
-int mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external);
+ uint32_t sqn, bool external);
+int mlx5_flow_hw_destroy_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
+ uint32_t sqn, bool external);
+int mlx5_flow_hw_create_tx_repr_matching_flow(struct rte_eth_dev *dev,
+ uint32_t sqn, bool external);
+int mlx5_flow_hw_destroy_tx_repr_matching_flow(struct rte_eth_dev *dev,
+ uint32_t sqn, bool external);
int mlx5_flow_hw_lacp_rx_flow(struct rte_eth_dev *dev);
int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
const struct rte_flow_actions_template_attr *attr,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index d945c88eb0..eb3dcce59d 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -15897,7 +15897,7 @@ flow_hw_is_matching_sq_miss_flow(struct mlx5_ctrl_flow_entry *cf,
}
int
-mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
+mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
{
uint16_t port_id = dev->data->port_id;
uint16_t proxy_port_id = dev->data->port_id;
@@ -15924,7 +15924,8 @@ mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
!proxy_priv->hw_ctrl_fdb->hw_esw_sq_miss_root_tbl ||
!proxy_priv->hw_ctrl_fdb->hw_esw_sq_miss_tbl)
return 0;
- cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+ cf = external ? LIST_FIRST(&proxy_priv->hw_ext_ctrl_flows) :
+ LIST_FIRST(&proxy_priv->hw_ctrl_flows);
while (cf != NULL) {
cf_next = LIST_NEXT(cf, next);
if (flow_hw_is_matching_sq_miss_flow(cf, dev, sqn)) {
@@ -16058,8 +16059,58 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t
items, 0, copy_reg_action, 0, &flow_info, external);
}
+static bool
+flow_hw_is_matching_tx_mreg_copy_flow(struct mlx5_ctrl_flow_entry *cf,
+ struct rte_eth_dev *dev,
+ uint32_t sqn)
+{
+ if (cf->owner_dev != dev)
+ return false;
+ if (cf->info.type == MLX5_CTRL_FLOW_TYPE_TX_META_COPY && cf->info.tx_repr_sq == sqn)
+ return true;
+ return false;
+}
+
+int
+mlx5_flow_hw_destroy_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
+{
+ uint16_t port_id = dev->data->port_id;
+ uint16_t proxy_port_id = dev->data->port_id;
+ struct rte_eth_dev *proxy_dev;
+ struct mlx5_priv *proxy_priv;
+ struct mlx5_ctrl_flow_entry *cf;
+ struct mlx5_ctrl_flow_entry *cf_next;
+ int ret;
+
+ ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+ if (ret) {
+ DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+ "port must be present for default SQ miss flow rules to exist.",
+ port_id);
+ return ret;
+ }
+ proxy_dev = &rte_eth_devices[proxy_port_id];
+ proxy_priv = proxy_dev->data->dev_private;
+ if (!proxy_priv->dr_ctx ||
+ !proxy_priv->hw_ctrl_fdb ||
+ !proxy_priv->hw_ctrl_fdb->hw_tx_meta_cpy_tbl)
+ return 0;
+ cf = external ? LIST_FIRST(&proxy_priv->hw_ext_ctrl_flows) :
+ LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+ while (cf != NULL) {
+ cf_next = LIST_NEXT(cf, next);
+ if (flow_hw_is_matching_tx_mreg_copy_flow(cf, dev, sqn)) {
+ claim_zero(flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow));
+ LIST_REMOVE(cf, next);
+ mlx5_free(cf);
+ }
+ cf = cf_next;
+ }
+ return 0;
+}
+
int
-mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
+mlx5_flow_hw_create_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
{
struct mlx5_priv *priv = dev->data->dev_private;
struct mlx5_rte_flow_item_sq sq_spec = {
@@ -16116,6 +16167,55 @@ mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool e
items, 0, actions, 0, &flow_info, external);
}
+static bool
+flow_hw_is_tx_matching_repr_matching_flow(struct mlx5_ctrl_flow_entry *cf,
+ struct rte_eth_dev *dev,
+ uint32_t sqn)
+{
+ if (cf->owner_dev != dev)
+ return false;
+ if (cf->info.type == MLX5_CTRL_FLOW_TYPE_TX_REPR_MATCH && cf->info.tx_repr_sq == sqn)
+ return true;
+ return false;
+}
+
+int
+mlx5_flow_hw_destroy_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
+{
+ uint16_t port_id = dev->data->port_id;
+ uint16_t proxy_port_id = dev->data->port_id;
+ struct rte_eth_dev *proxy_dev;
+ struct mlx5_priv *proxy_priv;
+ struct mlx5_ctrl_flow_entry *cf;
+ struct mlx5_ctrl_flow_entry *cf_next;
+ int ret;
+
+ ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
+ if (ret) {
+ DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
+ "port must be present for default SQ miss flow rules to exist.",
+ port_id);
+ return ret;
+ }
+ proxy_dev = &rte_eth_devices[proxy_port_id];
+ proxy_priv = proxy_dev->data->dev_private;
+ if (!proxy_priv->dr_ctx ||
+ !proxy_priv->hw_tx_repr_tagging_tbl)
+ return 0;
+ cf = external ? LIST_FIRST(&proxy_priv->hw_ext_ctrl_flows) :
+ LIST_FIRST(&proxy_priv->hw_ctrl_flows);
+ while (cf != NULL) {
+ cf_next = LIST_NEXT(cf, next);
+ if (flow_hw_is_tx_matching_repr_matching_flow(cf, dev, sqn)) {
+ claim_zero(flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow));
+ LIST_REMOVE(cf, next);
+ mlx5_free(cf);
+ }
+ cf = cf_next;
+ }
+ return 0;
+}
+
int
mlx5_flow_hw_lacp_rx_flow(struct rte_eth_dev *dev)
{
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e6acb56d4d..6acf398ccc 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1622,7 +1622,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
}
}
if (config->dv_esw_en && config->repr_matching) {
- if (mlx5_flow_hw_tx_repr_matching_flow(dev, queue, false)) {
+ if (mlx5_flow_hw_create_tx_repr_matching_flow(dev, queue, false)) {
mlx5_txq_release(dev, i);
goto error;
}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 834ca541d5..1d258f979c 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1433,7 +1433,7 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
priv = dev->data->dev_private;
if ((!priv->representor && !priv->master) ||
!priv->sh->config.dv_esw_en) {
- DRV_LOG(ERR, "Port %u must be represetnor or master port in E-Switch mode.",
+ DRV_LOG(ERR, "Port %u must be representor or master port in E-Switch mode.",
port_id);
rte_errno = EINVAL;
return -rte_errno;
@@ -1454,9 +1454,9 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
}
if (priv->sh->config.repr_matching &&
- mlx5_flow_hw_tx_repr_matching_flow(dev, sq_num, true)) {
+ mlx5_flow_hw_create_tx_repr_matching_flow(dev, sq_num, true)) {
if (sq_miss_created)
- mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num);
+ mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num, true);
return -rte_errno;
}
@@ -1464,7 +1464,7 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev, sq_num, true)) {
if (sq_miss_created)
- mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num);
+ mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num, true);
return -rte_errno;
}
return 0;
@@ -1478,6 +1478,53 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
return -rte_errno;
}
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_external_sq_disable, 25.11)
+int
+rte_pmd_mlx5_external_sq_disable(uint16_t port_id, uint32_t sq_num)
+{
+ struct rte_eth_dev *dev;
+ struct mlx5_priv *priv;
+
+ if (rte_eth_dev_is_valid_port(port_id) < 0) {
+ DRV_LOG(ERR, "There is no Ethernet device for port %u.",
+ port_id);
+ rte_errno = ENODEV;
+ return -rte_errno;
+ }
+ dev = &rte_eth_devices[port_id];
+ priv = dev->data->dev_private;
+ if ((!priv->representor && !priv->master) ||
+ !priv->sh->config.dv_esw_en) {
+ DRV_LOG(ERR, "Port %u must be representor or master port in E-Switch mode.",
+ port_id);
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+ if (sq_num == 0) {
+ DRV_LOG(ERR, "Invalid SQ number.");
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
+#ifdef HAVE_MLX5_HWS_SUPPORT
+ if (priv->sh->config.dv_flow_en == 2) {
+ if (priv->sh->config.fdb_def_rule &&
+ mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num, true))
+ return -rte_errno;
+ if (priv->sh->config.repr_matching &&
+ mlx5_flow_hw_destroy_tx_repr_matching_flow(dev, sq_num, true))
+ return -rte_errno;
+ if (!priv->sh->config.repr_matching &&
+ priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+ mlx5_flow_hw_destroy_tx_default_mreg_copy_flow(dev, sq_num, true))
+ return -rte_errno;
+ return 0;
+ }
+#endif
+ /* Not supported for software steering. */
+ rte_errno = ENOTSUP;
+ return -rte_errno;
+}
+
/**
* Set the Tx queue dynamic timestamp (mask and offset)
*
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 4d4821afae..31f99e7a78 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -484,6 +484,24 @@ typedef void (*rte_pmd_mlx5_driver_event_callback_t)(uint16_t port_id,
const void *opaque);
+/**
+ * Disable traffic for external SQ. Should be invoked by application
+ * before destroying the external SQ.
+ *
+ * @param[in] port_id
+ * The port identifier of the Ethernet device.
+ * @param[in] sq_num
+ * SQ HW number.
+ *
+ * @return
+ * 0 on success, a negative errno value otherwise and rte_errno is set.
+ * Possible values for rte_errno:
+ * - EINVAL - invalid sq_number or port type.
+ * - ENODEV - there is no Ethernet device for this port id.
+ */
+__rte_experimental
+int rte_pmd_mlx5_external_sq_disable(uint16_t port_id, uint32_t sq_num);
+
/**
* Register mlx5 driver event callback.
*
--
2.51.0
* [PATCH 3/3] net/mlx5: support flow metadata exchange between E-Switch and VM
2025-10-29 15:57 [PATCH 1/3] net/mlx5: fix multi process Tx default rules Gregory Etelson
2025-10-29 15:57 ` [PATCH 2/3] net/mlx5: fix control flow leakage for external SQ Gregory Etelson
@ 2025-10-29 15:57 ` Gregory Etelson
2025-11-02 7:32 ` [PATCH 1/3] net/mlx5: fix multi process Tx default rules Raslan Darawsheh
2 siblings, 0 replies; 4+ messages in thread
From: Gregory Etelson @ 2025-10-29 15:57 UTC (permalink / raw)
To: dev
Cc: getelson, mkashani, rasland, Dariusz Sosnowski,
Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
MLX5 port firmware wipes out flow metadata when a packet is moved
between E-Switch and VM applications.
Starting from version 47.0274, the firmware can be configured to preserve
flow metadata after a packet is transferred between E-Switch and VM
applications.
This patch allows a VM application to work with ingress and egress flow
metadata (see the usage sketch after this list):
* Support FDB-to-VPORT and VPORT-to-FDB bits in the firmware VPORT table.
* If a VM ingress flow has RSS or QUEUE actions, copy metadata from
register C1 to register B.
* For VM egress flows, copy metadata from register A to register C1.
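A minimal sketch of what the metadata exchange enables, using the
synchronous rte_flow API for brevity; port numbers, queue index and the
0x1234 metadata value are placeholders, and a HWS deployment would
normally create such rules through the template (async) API:

#include <stdint.h>
#include <rte_flow.h>

/* Hypothetical illustration: the E-Switch (transfer) rule attaches a
 * metadata value to packets forwarded to the VF, and the VM matches the
 * same value on its ingress side. */
static int
meta_exchange_sketch(uint16_t esw_port, uint16_t vf_rep_port, uint16_t vm_port)
{
	struct rte_flow_error err;
	/* FDB side: set META and forward to the VF representor. */
	struct rte_flow_attr fdb_attr = { .group = 1, .transfer = 1 };
	struct rte_flow_item fdb_pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_set_meta set_meta = { .data = 0x1234, .mask = UINT32_MAX };
	struct rte_flow_action_ethdev to_vf = { .port_id = vf_rep_port };
	struct rte_flow_action fdb_actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_SET_META, .conf = &set_meta },
		{ .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &to_vf },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	/* VM side: match the metadata on ingress and steer to a queue. */
	struct rte_flow_attr vm_attr = { .group = 1, .ingress = 1 };
	struct rte_flow_item_meta meta_spec = { .data = 0x1234 };
	struct rte_flow_item_meta meta_mask = { .data = UINT32_MAX };
	struct rte_flow_item vm_pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_META, .spec = &meta_spec, .mask = &meta_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = 0 };
	struct rte_flow_action vm_actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	if (rte_flow_create(esw_port, &fdb_attr, fdb_pattern, fdb_actions, &err) == NULL)
		return -1;
	if (rte_flow_create(vm_port, &vm_attr, vm_pattern, vm_actions, &err) == NULL)
		return -1;
	return 0;
}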
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
doc/guides/nics/mlx5.rst | 10 ++
drivers/common/mlx5/mlx5_devx_cmds.c | 31 ++++++
drivers/common/mlx5/mlx5_devx_cmds.h | 5 +
drivers/common/mlx5/mlx5_prm.h | 55 +++++++++-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/mlx5/mlx5.h | 3 +-
drivers/net/mlx5/mlx5_flow.c | 74 ++++++++++----
drivers/net/mlx5/mlx5_flow.h | 18 +++-
drivers/net/mlx5/mlx5_flow_hw.c | 144 +++++++++++++++++++++++++--
drivers/net/mlx5/mlx5_trigger.c | 13 ++-
drivers/net/mlx5/mlx5_txq.c | 2 +-
11 files changed, 322 insertions(+), 37 deletions(-)
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 20056f61d6..fde98ae993 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -2482,6 +2482,16 @@ and it should be allowed to specify zero values as parameters
for the META and MARK flow items and actions.
In the same time, zero mask has no meaning and should be rejected on validation stage.
+Starting from firmware version 47.0274, if :ref:`switchdev mode <mlx5_switchdev>` is enabled,
+flow metadata can be shared between flows in the FDB and VF domains:
+
+* If metadata was attached to an FDB flow and that flow transferred an incoming packet to a VF
+  representor, an ingress flow bound to the VF can match the metadata.
+
+* If metadata was attached to a VF egress flow, an FDB flow can match the metadata.
+
+The metadata sharing functionality is controlled by firmware configuration.
+
Requirements
^^^^^^^^^^^^
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 5622847a4a..385759230a 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -487,6 +487,34 @@ mlx5_devx_cmd_destroy(struct mlx5_devx_obj *obj)
return ret;
}
+static int
+mlx5_devx_cmd_query_esw_vport_context(void *ctx,
+ struct mlx5_hca_attr *attr)
+{
+ uint32_t in[MLX5_ST_SZ_DW(query_esw_vport_context_in)] = {0};
+ uint32_t out[MLX5_ST_SZ_DW(query_esw_vport_context_out)] = {0};
+ void *vctx;
+ int rc;
+
+ MLX5_SET(query_esw_vport_context_in, in, opcode, MLX5_CMD_OP_QUERY_ESW_VPORT_CONTEXT);
+ rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in), out, sizeof(out));
+ if (rc || MLX5_FW_STATUS(out)) {
+ DEVX_DRV_LOG(ERR, out, "query ESW vport context", NULL, 0);
+ return MLX5_DEVX_ERR_RC(rc);
+ }
+ vctx = MLX5_ADDR_OF(query_esw_vport_context_out, out, esw_vport_context);
+ attr->fdb_to_vport_reg_c = MLX5_GET(esw_vport_context, vctx, fdb_to_vport_reg_c);
+ if (attr->fdb_to_vport_reg_c != 0) {
+ attr->vport_to_fdb_metadata =
+ MLX5_GET(esw_vport_context, vctx, vport_to_fdb_metadata);
+ attr->fdb_to_vport_metadata =
+ MLX5_GET(esw_vport_context, vctx, fdb_to_vport_metadata);
+ attr->fdb_to_vport_reg_c_id =
+ MLX5_GET(esw_vport_context, vctx, fdb_to_vport_reg_c_id);
+ }
+ return 0;
+}
+
/**
* Query NIC vport context.
* Fills minimal inline attribute.
@@ -531,6 +559,8 @@ mlx5_devx_cmd_query_nic_vport_context(void *ctx,
min_wqe_inline_mode);
attr->system_image_guid = MLX5_GET64(nic_vport_context, vctx,
system_image_guid);
+ attr->vport_to_fdb_metadata = MLX5_GET(nic_vport_context, vctx, vport_to_fdb_metadata);
+ attr->fdb_to_vport_metadata = MLX5_GET(nic_vport_context, vctx, fdb_to_vport_metadata);
return 0;
}
@@ -1407,6 +1437,7 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
esw_manager_vport_number_valid);
attr->esw_mgr_vport_id =
MLX5_GET(esw_cap, hcattr, esw_manager_vport_number);
+ mlx5_devx_cmd_query_esw_vport_context(ctx, attr);
}
if (attr->eswitch_manager) {
uint32_t esw_reg, reg_c_8_15;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 01dbb40040..efae6826dc 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -262,6 +262,8 @@ struct mlx5_hca_attr {
uint32_t mini_cqe_resp_l3_l4_tag:1;
uint32_t enhanced_cqe_compression:1;
uint32_t pkt_integrity_match:1; /* 1 if HW supports integrity item */
+ uint32_t fdb_to_vport_metadata:1; /* 1 if enabled */
+ uint32_t vport_to_fdb_metadata:1; /* 1 if enabled */
struct mlx5_hca_qos_attr qos;
struct mlx5_hca_vdpa_attr vdpa;
struct mlx5_hca_flow_attr flow;
@@ -328,6 +330,9 @@ struct mlx5_hca_attr {
uint32_t fdb_unified_en:1;
uint32_t jump_fdb_rx_en:1;
uint32_t fdb_rx_set_flow_tag_stc:1;
+ uint32_t return_reg_id:16;
+ uint32_t fdb_to_vport_reg_c:1;
+ uint8_t fdb_to_vport_reg_c_id;
uint8_t max_header_modify_pattern_length;
uint64_t system_image_guid;
uint32_t log_max_conn_track_offload:5;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 2887f7354d..5db8d67cfc 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -1260,6 +1260,7 @@ enum {
MLX5_CMD_OP_INIT2INIT_QP = 0x50E,
MLX5_CMD_OP_SUSPEND_QP = 0x50F,
MLX5_CMD_OP_RESUME_QP = 0x510,
+ MLX5_CMD_OP_QUERY_ESW_VPORT_CONTEXT = 0x752,
MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
MLX5_CMD_OP_ALLOC_Q_COUNTER = 0x771,
MLX5_CMD_OP_QUERY_Q_COUNTER = 0x773,
@@ -2546,8 +2547,14 @@ struct mlx5_ifc_mac_address_layout_bits {
u8 mac_addr_31_0[0x20];
};
+/*
+ * NIC_Vport Context table
+ */
struct mlx5_ifc_nic_vport_context_bits {
- u8 reserved_at_0[0x5];
+ u8 multi_prio_sq[0x1]; /* 00h: bit 31 */
+ u8 vport_to_fdb_metadata[0x1]; /* 00h: bit 30 */
+ u8 fdb_to_vport_metadata[0x1]; /* 00h: bit 29 */
+ u8 reserved_at_0_28[0x2];
u8 min_wqe_inline_mode[0x3];
u8 reserved_at_8[0x15];
u8 disable_mc_local_lb[0x1];
@@ -2603,6 +2610,52 @@ struct mlx5_ifc_query_nic_vport_context_in_bits {
u8 reserved_at_68[0x18];
};
+/*
+ * Esw_Vport Context table
+ */
+struct mlx5_ifc_esw_vport_context_bits {
+ u8 fdb_to_vport_reg_c[0x1]; /* 00h bits 31 */
+ u8 vport_to_fdb_metadata[0x1]; /* 00h bits 30 */
+ u8 fdb_to_vport_metadata[0x1]; /* 00h bits 29 */
+ u8 vport_svlan_strip[0x1]; /* 00h bits 28 */
+ u8 vport_cvlan_strip[0x1]; /* 00h bits 27 */
+ u8 vport_svlan_insert[0x1]; /* 00h bits 26 */
+ u8 vport_cvlan_insert[0x2]; /* 00h bits 25:24 */
+ u8 fdb_to_vport_reg_c_id[0x08]; /* 00h bits 23:16*/
+ u8 reserved_at_00_16[0x10]; /* 00h bits 15:00*/
+ u8 reserved[0xfc * CHAR_BIT];
+};
+
+enum mlx5_esw_vport_metadata_reg_cmap {
+ MLX5_ESW_VPORT_METADATA_REG_C_0 = 0,
+ MLX5_ESW_VPORT_METADATA_REG_C_1 = 1,
+ MLX5_ESW_VPORT_METADATA_REG_C_2 = 2,
+ MLX5_ESW_VPORT_METADATA_REG_C_3 = 3,
+ MLX5_ESW_VPORT_METADATA_REG_C_4 = 4,
+ MLX5_ESW_VPORT_METADATA_REG_C_5 = 5,
+ MLX5_ESW_VPORT_METADATA_REG_C_6 = 6,
+ MLX5_ESW_VPORT_METADATA_REG_C_7 = 7,
+};
+
+struct mlx5_ifc_query_esw_vport_context_out_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+ u8 syndrome[0x20];
+ u8 reserved_at_40[0x40];
+ struct mlx5_ifc_esw_vport_context_bits esw_vport_context;
+};
+
+struct mlx5_ifc_query_esw_vport_context_in_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+ u8 reserved_at_04_31[0x10];
+ u8 op_mod[0x10];
+ u8 other_vport[0x1];
+ u8 reserved_at_08_30[0xf];
+ u8 vport_number[0x10];
+ u8 reserved_at_0c_31[0x20];
+};
+
struct mlx5_ifc_tisc_bits {
u8 strict_lag_tx_port_affinity[0x1];
u8 reserved_at_1[0x3];
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b018a4f0e2..6686dd7587 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1651,7 +1651,9 @@ mlx5_init_hws_flow_tags_registers(struct mlx5_dev_ctx_shared *sh)
unset |= 1 << mlx5_regc_index(REG_C_6);
if (sh->config.dv_esw_en)
unset |= 1 << mlx5_regc_index(REG_C_0);
- if (meta_mode == MLX5_XMETA_MODE_META32_HWS)
+ if (meta_mode == MLX5_XMETA_MODE_META32_HWS ||
+ mlx5_vport_rx_metadata_passing_enabled(sh) ||
+ mlx5_vport_tx_metadata_passing_enabled(sh))
unset |= 1 << mlx5_regc_index(REG_C_1);
masks &= ~unset;
for (i = 0, j = 0; i < MLX5_FLOW_HW_TAGS_MAX; i++) {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 07418b0922..e955c19217 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2032,7 +2032,8 @@ struct mlx5_priv {
rte_spinlock_t hw_ctrl_lock;
LIST_HEAD(hw_ctrl_flow, mlx5_ctrl_flow_entry) hw_ctrl_flows;
LIST_HEAD(hw_ext_ctrl_flow, mlx5_ctrl_flow_entry) hw_ext_ctrl_flows;
- struct mlx5_flow_hw_ctrl_fdb *hw_ctrl_fdb;
+ struct mlx5_flow_hw_ctrl_fdb *hw_ctrl_fdb; /* FDB control flow context */
+ struct mlx5_flow_hw_ctrl_nic *hw_ctrl_nic; /* NIC control flow context */
struct rte_flow_pattern_template *hw_tx_repr_tagging_pt;
struct rte_flow_actions_template *hw_tx_repr_tagging_at;
struct rte_flow_template_table *hw_tx_repr_tagging_tbl;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 1de398982a..2dcdddbe74 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1238,33 +1238,43 @@ mlx5_flow_get_reg_id(struct rte_eth_dev *dev,
case MLX5_HAIRPIN_TX:
return REG_A;
case MLX5_METADATA_RX:
- switch (config->dv_xmeta_en) {
- case MLX5_XMETA_MODE_LEGACY:
- return REG_B;
- case MLX5_XMETA_MODE_META16:
- return REG_C_0;
- case MLX5_XMETA_MODE_META32:
- return REG_C_1;
- case MLX5_XMETA_MODE_META32_HWS:
+ if (mlx5_vport_rx_metadata_passing_enabled(priv->sh)) {
return REG_C_1;
+ } else {
+ switch (config->dv_xmeta_en) {
+ case MLX5_XMETA_MODE_LEGACY:
+ return REG_B;
+ case MLX5_XMETA_MODE_META16:
+ return REG_C_0;
+ case MLX5_XMETA_MODE_META32:
+ return REG_C_1;
+ case MLX5_XMETA_MODE_META32_HWS:
+ return REG_C_1;
+ }
}
break;
case MLX5_METADATA_TX:
- if (config->dv_flow_en == 2 && config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
+ if ((config->dv_flow_en == 2 &&
+ config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) ||
+ mlx5_vport_tx_metadata_passing_enabled(priv->sh)) {
return REG_C_1;
} else {
return REG_A;
}
case MLX5_METADATA_FDB:
- switch (config->dv_xmeta_en) {
- case MLX5_XMETA_MODE_LEGACY:
- return REG_NON;
- case MLX5_XMETA_MODE_META16:
- return REG_C_0;
- case MLX5_XMETA_MODE_META32:
- return REG_C_1;
- case MLX5_XMETA_MODE_META32_HWS:
+ if (mlx5_esw_metadata_passing_enabled(priv->sh)) {
return REG_C_1;
+ } else {
+ switch (config->dv_xmeta_en) {
+ case MLX5_XMETA_MODE_LEGACY:
+ return REG_NON;
+ case MLX5_XMETA_MODE_META16:
+ return REG_C_0;
+ case MLX5_XMETA_MODE_META32:
+ return REG_C_1;
+ case MLX5_XMETA_MODE_META32_HWS:
+ return REG_C_1;
+ }
}
break;
case MLX5_FLOW_MARK:
@@ -12526,3 +12536,33 @@ rte_pmd_mlx5_enable_steering(void)
return 0;
}
+
+bool
+mlx5_vport_rx_metadata_passing_enabled(const struct mlx5_dev_ctx_shared *sh)
+{
+ const struct mlx5_sh_config *dev_config = &sh->config;
+ const struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+
+ return !dev_config->dv_esw_en && hca_attr->fdb_to_vport_metadata;
+}
+
+bool
+mlx5_vport_tx_metadata_passing_enabled(const struct mlx5_dev_ctx_shared *sh)
+{
+ const struct mlx5_sh_config *dev_config = &sh->config;
+ const struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+
+ return !dev_config->dv_esw_en && hca_attr->vport_to_fdb_metadata;
+}
+
+bool
+mlx5_esw_metadata_passing_enabled(const struct mlx5_dev_ctx_shared *sh)
+{
+ const struct mlx5_sh_config *dev_config = &sh->config;
+ const struct mlx5_hca_attr *hca_attr = &sh->cdev->config.hca_attr;
+ bool fdb_to_vport_metadata_on = (hca_attr->fdb_to_vport_reg_c_id &
+ RTE_BIT32(MLX5_ESW_VPORT_METADATA_REG_C_1)) != 0;
+
+ return dev_config->dv_esw_en && hca_attr->fdb_to_vport_reg_c && fdb_to_vport_metadata_on &&
+ hca_attr->vport_to_fdb_metadata && hca_attr->fdb_to_vport_metadata;
+}
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index adfe84ef54..2a7d22dfad 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1755,6 +1755,10 @@ struct rte_flow_template_table {
struct mlx5_dr_rule_action_container rule_acts[];
};
+bool mlx5_vport_rx_metadata_passing_enabled(const struct mlx5_dev_ctx_shared *sh);
+bool mlx5_vport_tx_metadata_passing_enabled(const struct mlx5_dev_ctx_shared *sh);
+bool mlx5_esw_metadata_passing_enabled(const struct mlx5_dev_ctx_shared *sh);
+
static __rte_always_inline struct mlx5dr_matcher *
mlx5_table_matcher(const struct rte_flow_template_table *table)
{
@@ -1799,6 +1803,11 @@ flow_hw_get_reg_id_by_domain(struct rte_eth_dev *dev,
sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS) {
return REG_C_1;
}
+ if ((mlx5_vport_rx_metadata_passing_enabled(sh) &&
+ domain_type == MLX5DR_TABLE_TYPE_NIC_RX) ||
+ (mlx5_vport_tx_metadata_passing_enabled(sh) &&
+ domain_type == MLX5DR_TABLE_TYPE_NIC_TX))
+ return REG_C_1;
/*
* On root table - PMD allows only egress META matching, thus
* REG_A matching is sufficient.
@@ -3015,6 +3024,12 @@ struct mlx5_flow_hw_ctrl_fdb {
struct rte_flow_template_table *hw_lacp_rx_tbl;
};
+struct mlx5_flow_hw_ctrl_nic {
+ struct rte_flow_pattern_template *tx_meta_items_tmpl;
+ struct rte_flow_actions_template *tx_meta_actions_tmpl;
+ struct rte_flow_template_table *hw_tx_meta_cpy_tbl;
+};
+
#define MLX5_CTRL_PROMISCUOUS (RTE_BIT32(0))
#define MLX5_CTRL_ALL_MULTICAST (RTE_BIT32(1))
#define MLX5_CTRL_BROADCAST (RTE_BIT32(2))
@@ -3582,8 +3597,9 @@ int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
int mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev,
uint32_t sqn, bool external);
int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
-int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
+int mlx5_flow_hw_create_fdb_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
uint32_t sqn, bool external);
+int mlx5_flow_hw_create_nic_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t sqn);
int mlx5_flow_hw_destroy_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
uint32_t sqn, bool external);
int mlx5_flow_hw_create_tx_repr_matching_flow(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index eb3dcce59d..ff68483a40 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -5324,6 +5324,15 @@ __translate_group(struct rte_eth_dev *dev,
NULL,
"group index not supported");
*table_group = group + 1;
+ } else if (mlx5_vport_tx_metadata_passing_enabled(priv->sh) && flow_attr->egress) {
+ /*
+ * If VM cross GVMI metadata Tx was enabled, PMD creates a default
+ * flow rule in the group 0 to copy metadata value.
+ */
+ if (group > MLX5_HW_MAX_EGRESS_GROUP)
+ return rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+ NULL, "group index not supported");
+ *table_group = group + 1;
} else {
*table_group = group;
}
@@ -8006,14 +8015,17 @@ __flow_hw_actions_template_create(struct rte_eth_dev *dev,
mf_masks[expand_mf_num] = quota_color_inc_mask;
expand_mf_num++;
}
- if (priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
- priv->sh->config.dv_esw_en &&
- !attr->transfer &&
+ if (attr->ingress &&
(action_flags & (MLX5_FLOW_ACTION_QUEUE | MLX5_FLOW_ACTION_RSS))) {
- /* Insert META copy */
- mf_actions[expand_mf_num] = rx_meta_copy_action;
- mf_masks[expand_mf_num] = rx_meta_copy_mask;
- expand_mf_num++;
+ if ((priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
+ priv->sh->config.dv_esw_en) ||
+ mlx5_vport_rx_metadata_passing_enabled(priv->sh)) {
+ /* Insert META copy */
+ mf_actions[expand_mf_num] = rx_meta_copy_action;
+ mf_masks[expand_mf_num] = rx_meta_copy_mask;
+ expand_mf_num++;
+ MLX5_ASSERT(expand_mf_num <= MLX5_HW_MAX_ACTS);
+ }
}
if (expand_mf_num) {
if (act_num + expand_mf_num > MLX5_HW_MAX_ACTS) {
@@ -10809,7 +10821,7 @@ flow_hw_create_lacp_rx_table(struct rte_eth_dev *dev,
* 0 on success, negative values otherwise
*/
static int
-flow_hw_create_ctrl_tables(struct rte_eth_dev *dev, struct rte_flow_error *error)
+flow_hw_create_fdb_ctrl_tables(struct rte_eth_dev *dev, struct rte_flow_error *error)
{
struct mlx5_priv *priv = dev->data->dev_private;
struct mlx5_flow_hw_ctrl_fdb *hw_ctrl_fdb;
@@ -10958,6 +10970,59 @@ flow_hw_create_ctrl_tables(struct rte_eth_dev *dev, struct rte_flow_error *error
return -EINVAL;
}
+static void
+flow_hw_cleanup_ctrl_nic_tables(struct rte_eth_dev *dev)
+{
+ struct mlx5_priv *priv = dev->data->dev_private;
+ struct mlx5_flow_hw_ctrl_nic *ctrl = priv->hw_ctrl_nic;
+
+ if (ctrl == NULL)
+ return;
+ if (ctrl->hw_tx_meta_cpy_tbl)
+ claim_zero(flow_hw_table_destroy(dev, ctrl->hw_tx_meta_cpy_tbl, NULL));
+ if (ctrl->tx_meta_items_tmpl != NULL)
+ claim_zero(flow_hw_pattern_template_destroy(dev, ctrl->tx_meta_items_tmpl, NULL));
+ if (ctrl->tx_meta_actions_tmpl != NULL)
+ claim_zero(flow_hw_actions_template_destroy(dev, ctrl->tx_meta_actions_tmpl, NULL));
+ mlx5_free(ctrl);
+ priv->hw_ctrl_nic = NULL;
+}
+
+static int
+flow_hw_create_nic_ctrl_tables(struct rte_eth_dev *dev, struct rte_flow_error *error)
+{
+ struct mlx5_priv *priv = dev->data->dev_private;
+
+ struct mlx5_flow_hw_ctrl_nic *ctrl = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*ctrl),
+ 0, SOCKET_ID_ANY);
+ if (!ctrl)
+ return rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+ "failed to allocate port control flow table");
+ priv->hw_ctrl_nic = ctrl;
+ ctrl->tx_meta_items_tmpl = flow_hw_create_tx_repr_sq_pattern_tmpl(dev, error);
+ if (ctrl->tx_meta_items_tmpl == NULL)
+ goto error;
+ ctrl->tx_meta_actions_tmpl =
+ flow_hw_create_tx_default_mreg_copy_actions_template(dev, error);
+ if (ctrl->tx_meta_actions_tmpl == NULL) {
+ rte_flow_error_set(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+ "failed to create default Tx metadata copy actions template");
+ goto error;
+ }
+ ctrl->hw_tx_meta_cpy_tbl =
+ flow_hw_create_tx_default_mreg_copy_table(dev, ctrl->tx_meta_items_tmpl,
+ ctrl->tx_meta_actions_tmpl, error);
+ if (ctrl->hw_tx_meta_cpy_tbl == NULL) {
+ rte_flow_error_set(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+ "failed to create default Tx metadata copy table");
+ }
+ return 0;
+
+error:
+ flow_hw_cleanup_ctrl_nic_tables(dev);
+ return -rte_errno;
+}
+
static void
flow_hw_ct_mng_destroy(struct rte_eth_dev *dev,
struct mlx5_aso_ct_pools_mng *ct_mng)
@@ -11697,6 +11762,7 @@ __flow_hw_resource_release(struct rte_eth_dev *dev, bool ctx_close)
flow_hw_rxq_flag_set(dev, false);
flow_hw_flush_all_ctrl_flows(dev);
flow_hw_cleanup_ctrl_fdb_tables(dev);
+ flow_hw_cleanup_ctrl_nic_tables(dev);
flow_hw_cleanup_tx_repr_tagging(dev);
flow_hw_cleanup_ctrl_rx_tables(dev);
flow_hw_action_template_drop_release(dev);
@@ -12141,12 +12207,19 @@ __flow_hw_configure(struct rte_eth_dev *dev,
NULL, "Failed to create vport actions.");
goto err;
}
- ret = flow_hw_create_ctrl_tables(dev, error);
+ ret = flow_hw_create_fdb_ctrl_tables(dev, error);
if (ret) {
rte_errno = -ret;
goto err;
}
}
+ if (mlx5_vport_tx_metadata_passing_enabled(priv->sh)) {
+ ret = flow_hw_create_nic_ctrl_tables(dev, error);
+ if (ret != 0) {
+ rte_errno = -ret;
+ goto err;
+ }
+ }
if (!priv->shared_host)
flow_hw_create_send_to_kernel_actions(priv, is_proxy);
if (port_attr->nb_conn_tracks || (host_priv && host_priv->hws_ctpool)) {
@@ -16005,7 +16078,8 @@ mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev)
}
int
-mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
+mlx5_flow_hw_create_fdb_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
+ uint32_t sqn, bool external)
{
struct mlx5_priv *priv = dev->data->dev_private;
struct mlx5_rte_flow_item_sq sq_spec = {
@@ -16059,6 +16133,56 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t
items, 0, copy_reg_action, 0, &flow_info, external);
}
+int
+mlx5_flow_hw_create_nic_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t sqn)
+{
+ struct mlx5_priv *priv = dev->data->dev_private;
+ struct mlx5_rte_flow_item_sq sq_spec = {
+ .queue = sqn,
+ };
+ struct rte_flow_item items[] = {
+ {
+ .type = (enum rte_flow_item_type)MLX5_RTE_FLOW_ITEM_TYPE_SQ,
+ .spec = &sq_spec,
+ },
+ {
+ .type = RTE_FLOW_ITEM_TYPE_END,
+ },
+ };
+ struct rte_flow_action_modify_field mreg_action = {
+ .operation = RTE_FLOW_MODIFY_SET,
+ .dst = {
+ .field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+ .tag_index = REG_C_1,
+ },
+ .src = {
+ .field = (enum rte_flow_field_id)MLX5_RTE_FLOW_FIELD_META_REG,
+ .tag_index = REG_A,
+ },
+ .width = 32,
+ };
+ struct rte_flow_action copy_reg_action[] = {
+ [0] = {
+ .type = RTE_FLOW_ACTION_TYPE_MODIFY_FIELD,
+ .conf = &mreg_action,
+ },
+ [1] = {
+ .type = RTE_FLOW_ACTION_TYPE_JUMP,
+ },
+ [2] = {
+ .type = RTE_FLOW_ACTION_TYPE_END,
+ },
+ };
+ struct mlx5_ctrl_flow_info flow_info = {
+ .type = MLX5_CTRL_FLOW_TYPE_TX_META_COPY,
+ .tx_repr_sq = sqn,
+ };
+
+ return flow_hw_create_ctrl_flow(dev, dev,
+ priv->hw_ctrl_nic->hw_tx_meta_cpy_tbl,
+ items, 0, copy_reg_action, 0, &flow_info, false);
+}
+
static bool
flow_hw_is_matching_tx_mreg_copy_flow(struct mlx5_ctrl_flow_entry *cf,
struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 6acf398ccc..996c1eb6ac 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1604,7 +1604,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
struct mlx5_sh_config *config = &priv->sh->config;
uint64_t flags = 0;
unsigned int i;
- int ret;
+ int ret = 0;
for (i = 0; i < priv->txqs_n; ++i) {
struct mlx5_txq_ctrl *txq = mlx5_txq_get(dev, i);
@@ -1635,10 +1635,13 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
if (config->dv_esw_en && !config->repr_matching &&
config->dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
(priv->master || priv->representor)) {
- if (mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev, queue, false)) {
- mlx5_txq_release(dev, i);
- goto error;
- }
+ ret = mlx5_flow_hw_create_fdb_tx_default_mreg_copy_flow(dev, queue, false);
+ } else if (mlx5_vport_tx_metadata_passing_enabled(priv->sh)) {
+ ret = mlx5_flow_hw_create_nic_tx_default_mreg_copy_flow(dev, queue);
+ }
+ if (ret != 0) {
+ mlx5_txq_release(dev, i);
+ goto error;
}
mlx5_txq_release(dev, i);
}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 1d258f979c..e20165d74e 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1462,7 +1462,7 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
if (!priv->sh->config.repr_matching &&
priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
- mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev, sq_num, true)) {
+ mlx5_flow_hw_create_fdb_tx_default_mreg_copy_flow(dev, sq_num, true)) {
if (sq_miss_created)
mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num, true);
return -rte_errno;
--
2.51.0
* Re: [PATCH 1/3] net/mlx5: fix multi process Tx default rules
2025-10-29 15:57 [PATCH 1/3] net/mlx5: fix multi process Tx default rules Gregory Etelson
2025-10-29 15:57 ` [PATCH 2/3] net/mlx5: fix control flow leakage for external SQ Gregory Etelson
2025-10-29 15:57 ` [PATCH 3/3] net/mlx5: support flow metadata exchange between E-Switch and VM Gregory Etelson
@ 2025-11-02 7:32 ` Raslan Darawsheh
2 siblings, 0 replies; 4+ messages in thread
From: Raslan Darawsheh @ 2025-11-02 7:32 UTC (permalink / raw)
To: Gregory Etelson, dev
Cc: mkashani, Michael Baum, dsosnowski, Viacheslav Ovsiienko,
Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad
Hi,
On 29/10/2025 5:57 PM, Gregory Etelson wrote:
> From: Michael Baum <michaelba@nvidia.com>
>
> When representor matching is disabled, a default egress rule is
> inserted which matches all packets, copies REG_A to REG_C_1 (when
> dv_xmeta_en == 4) and jumps to group 1. All user rules start from group 1.
>
> When 2 processes are working together, the first one creates this flow
> rule and the second one fails with errno EEXIST. This renders all
> user egress rules in the 2nd process invalid.
>
> This patch changes this default rule to match on SQs.
>
> Fixes: 483181f7b6dd ("net/mlx5: support device control of representor matching")
> Cc: dsosnowski@nvidia.com
>
> Signed-off-by: Michael Baum <michaelba@nvidia.com>
> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
>
Series applied to next-net-mlx,
Kindest regards
Raslan Darawsheh