From: Shani Peretz <shperetz@nvidia.com>
To: Slava Ovsiienko <viacheslavo@nvidia.com>,
"stable@dpdk.org" <stable@dpdk.org>
Cc: "ktraynor@redhat.com" <ktraynor@redhat.com>,
"bluca@debian.org" <bluca@debian.org>,
Xueming Li <xuemingl@nvidia.com>,
Dariusz Sosnowski <dsosnowski@nvidia.com>
Subject: RE: [PATCH 22.11] net/mlx5: fix control flow leakage for external SQ
Date: Tue, 30 Dec 2025 10:55:19 +0000
Message-ID: <MW4PR12MB7484FE7EFB521FDB035F89E3BFBCA@MW4PR12MB7484.namprd12.prod.outlook.com>
In-Reply-To: <20251118165158.1315992-1-viacheslavo@nvidia.com>
> -----Original Message-----
> From: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> Sent: Tuesday, 18 November 2025 18:52
> To: stable@dpdk.org
> Cc: ktraynor@redhat.com; bluca@debian.org; Xueming Li
> <xuemingl@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>
> Subject: [PATCH 22.11] net/mlx5: fix control flow leakage for external SQ
>
> [ upstream commit 3bf9f0f9f0beb8dcd4f3b316c3216a87bc9ab49f ]
>
> The private API rte_pmd_mlx5_external_sq_enable() allows an application
> to create a Send Queue (SQ) on its own and then enable its usage as an
> "external SQ".
>
> On this enabling call, some implicit flows are created to provide
> compliant SQ behavior - copying the metadata register, forwarding
> queue-originated packets to the correct VF, etc.
>
> These implicit flows are marked as "external", and there is no cleanup
> for this kind of flows on device start and stop. Also, the PMD has no
> knowledge of whether an external SQ is still in use by the application,
> so implicit cleanup cannot be performed.
>
> As a result, over multiple device start/stop cycles the application
> re-creates and re-enables many external SQs, causing the implicit flow
> tables to overflow.
>
> To resolve this issue, the rte_pmd_mlx5_external_sq_disable() API is
> provided. It allows the application to notify the PMD that an external
> SQ is no longer in use and that the related implicit flows can be
> dismissed.
>
> Fixes: 26e1eaf2dac4 ("net/mlx5: support device control for E-Switch default rule")
> Cc: stable@dpdk.org
>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> ---
>  drivers/net/mlx5/mlx5_flow.h    |  12 ++--
>  drivers/net/mlx5/mlx5_flow_hw.c | 106 +++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_trigger.c |   2 +-
>  drivers/net/mlx5/mlx5_txq.c     |  54 ++++++++++++++--
>  drivers/net/mlx5/rte_pmd_mlx5.h |  18 ++++++
>  drivers/net/mlx5/version.map    |   1 +
>  6 files changed, 181 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
> index e5672b41f9..234afeb193 100644
> --- a/drivers/net/mlx5/mlx5_flow.h
> +++ b/drivers/net/mlx5/mlx5_flow.h
> @@ -2632,12 +2632,16 @@ int mlx5_flow_hw_flush_ctrl_flows(struct rte_eth_dev *dev);
>  int mlx5_flow_hw_esw_create_sq_miss_flow(struct rte_eth_dev *dev,
>  					 uint32_t sqn, bool external);
>  int mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev,
> -					  uint32_t sqn);
> +					  uint32_t sqn, bool external);
>  int mlx5_flow_hw_esw_create_default_jump_flow(struct rte_eth_dev *dev);
>  int mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
> -						  uint32_t sqn,
> -						  bool external);
> -int mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external);
> +						  uint32_t sqn, bool external);
> +int mlx5_flow_hw_destroy_tx_default_mreg_copy_flow(struct rte_eth_dev *dev,
> +						   uint32_t sqn, bool external);
> +int mlx5_flow_hw_create_tx_repr_matching_flow(struct rte_eth_dev *dev,
> +					      uint32_t sqn, bool external);
> +int mlx5_flow_hw_destroy_tx_repr_matching_flow(struct rte_eth_dev *dev,
> +					       uint32_t sqn, bool external);
>  int mlx5_flow_hw_lacp_rx_flow(struct rte_eth_dev *dev);
>  int mlx5_flow_actions_validate(struct rte_eth_dev *dev,
>  			       const struct rte_flow_actions_template_attr *attr,
> diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
> index e3f6e1aa3a..a85b49d284 100644
> --- a/drivers/net/mlx5/mlx5_flow_hw.c
> +++ b/drivers/net/mlx5/mlx5_flow_hw.c
> @@ -9184,7 +9184,7 @@ flow_hw_is_matching_sq_miss_flow(struct mlx5_hw_ctrl_flow *cf,
>  }
>
>  int
> -mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
> +mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
>  {
>  	uint16_t port_id = dev->data->port_id;
>  	uint16_t proxy_port_id = dev->data->port_id;
> @@ -9211,7 +9211,8 @@ mlx5_flow_hw_esw_destroy_sq_miss_flow(struct rte_eth_dev *dev, uint32_t sqn)
>  	    !proxy_priv->hw_ctrl_fdb->hw_esw_sq_miss_root_tbl ||
>  	    !proxy_priv->hw_ctrl_fdb->hw_esw_sq_miss_tbl)
>  		return 0;
> -	cf = LIST_FIRST(&proxy_priv->hw_ctrl_flows);
> +	cf = external ? LIST_FIRST(&proxy_priv->hw_ext_ctrl_flows) :
> +			LIST_FIRST(&proxy_priv->hw_ctrl_flows);
>  	while (cf != NULL) {
>  		cf_next = LIST_NEXT(cf, next);
>  		if (flow_hw_is_matching_sq_miss_flow(cf, dev, sqn)) {
> @@ -9345,8 +9346,58 @@ mlx5_flow_hw_create_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t
>  					items, 0, copy_reg_action, 0, &flow_info, external);
>  }
>
> +static bool
> +flow_hw_is_matching_tx_mreg_copy_flow(struct mlx5_hw_ctrl_flow *cf,
> +				      struct rte_eth_dev *dev,
> +				      uint32_t sqn)
> +{
> +	if (cf->owner_dev != dev)
> +		return false;
> +	if (cf->info.type == MLX5_HW_CTRL_FLOW_TYPE_TX_META_COPY && cf->info.tx_repr_sq == sqn)
> +		return true;
> +	return false;
> +}
> +
> +int
> +mlx5_flow_hw_destroy_tx_default_mreg_copy_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
> +{
> +	uint16_t port_id = dev->data->port_id;
> +	uint16_t proxy_port_id = dev->data->port_id;
> +	struct rte_eth_dev *proxy_dev;
> +	struct mlx5_priv *proxy_priv;
> +	struct mlx5_hw_ctrl_flow *cf;
> +	struct mlx5_hw_ctrl_flow *cf_next;
> +	int ret;
> +
> +	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
> +	if (ret) {
> +		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
> +			     "port must be present for default SQ miss flow rules to exist.",
> +			     port_id);
> +		return ret;
> +	}
> +	proxy_dev = &rte_eth_devices[proxy_port_id];
> +	proxy_priv = proxy_dev->data->dev_private;
> +	if (!proxy_priv->dr_ctx ||
> +	    !proxy_priv->hw_ctrl_fdb ||
> +	    !proxy_priv->hw_ctrl_fdb->hw_tx_meta_cpy_tbl)
> +		return 0;
> +	cf = external ? LIST_FIRST(&proxy_priv->hw_ext_ctrl_flows) :
> +			LIST_FIRST(&proxy_priv->hw_ctrl_flows);
> +	while (cf != NULL) {
> +		cf_next = LIST_NEXT(cf, next);
> +		if (flow_hw_is_matching_tx_mreg_copy_flow(cf, dev, sqn)) {
> +			claim_zero(flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow));
> +			LIST_REMOVE(cf, next);
> +			mlx5_free(cf);
> +		}
> +		cf = cf_next;
> +	}
> +	return 0;
> +}
> +
>  int
> -mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
> +mlx5_flow_hw_create_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
>  {
>  	struct mlx5_priv *priv = dev->data->dev_private;
>  	struct mlx5_rte_flow_item_sq sq_spec = {
> @@ -9403,6 +9454,55 @@ mlx5_flow_hw_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool e
>  				       items, 0, actions, 0, &flow_info, external);
>  }
>
> +static bool
> +flow_hw_is_tx_matching_repr_matching_flow(struct mlx5_hw_ctrl_flow *cf,
> +					  struct rte_eth_dev *dev,
> +					  uint32_t sqn)
> +{
> +	if (cf->owner_dev != dev)
> +		return false;
> +	if (cf->info.type == MLX5_HW_CTRL_FLOW_TYPE_TX_REPR_MATCH && cf->info.tx_repr_sq == sqn)
> +		return true;
> +	return false;
> +}
> +
> +int
> +mlx5_flow_hw_destroy_tx_repr_matching_flow(struct rte_eth_dev *dev, uint32_t sqn, bool external)
> +{
> +	uint16_t port_id = dev->data->port_id;
> +	uint16_t proxy_port_id = dev->data->port_id;
> +	struct rte_eth_dev *proxy_dev;
> +	struct mlx5_priv *proxy_priv;
> +	struct mlx5_hw_ctrl_flow *cf;
> +	struct mlx5_hw_ctrl_flow *cf_next;
> +	int ret;
> +
> +	ret = rte_flow_pick_transfer_proxy(port_id, &proxy_port_id, NULL);
> +	if (ret) {
> +		DRV_LOG(ERR, "Unable to pick transfer proxy port for port %u. Transfer proxy "
> +			     "port must be present for default SQ miss flow rules to exist.",
> +			     port_id);
> +		return ret;
> +	}
> +	proxy_dev = &rte_eth_devices[proxy_port_id];
> +	proxy_priv = proxy_dev->data->dev_private;
> +	if (!proxy_priv->dr_ctx ||
> +	    !proxy_priv->hw_tx_repr_tagging_tbl)
> +		return 0;
> +	cf = external ? LIST_FIRST(&proxy_priv->hw_ext_ctrl_flows) :
> +			LIST_FIRST(&proxy_priv->hw_ctrl_flows);
> +	while (cf != NULL) {
> +		cf_next = LIST_NEXT(cf, next);
> +		if (flow_hw_is_tx_matching_repr_matching_flow(cf, dev, sqn)) {
> +			claim_zero(flow_hw_destroy_ctrl_flow(proxy_dev, cf->flow));
> +			LIST_REMOVE(cf, next);
> +			mlx5_free(cf);
> +		}
> +		cf = cf_next;
> +	}
> +	return 0;
> +}
> +
>  int
>  mlx5_flow_hw_lacp_rx_flow(struct rte_eth_dev *dev)
>  {
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 1b19f79822..f72ed7f820 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -1495,7 +1495,7 @@ mlx5_traffic_enable_hws(struct rte_eth_dev *dev)
>  			}
>  		}
>  		if (config->dv_esw_en && config->repr_matching) {
> -			if (mlx5_flow_hw_tx_repr_matching_flow(dev, queue, false)) {
> +			if (mlx5_flow_hw_create_tx_repr_matching_flow(dev, queue, false)) {
>  				mlx5_txq_release(dev, i);
>  				goto error;
>  			}
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 34c7ef400d..b5dab86e7b 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -1308,7 +1308,7 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
>  	priv = dev->data->dev_private;
>  	if ((!priv->representor && !priv->master) ||
>  	    !priv->sh->config.dv_esw_en) {
> -		DRV_LOG(ERR, "Port %u must be represetnor or master port in E-Switch mode.",
> +		DRV_LOG(ERR, "Port %u must be representor or master port in E-Switch mode.",
>  			port_id);
>  		rte_errno = EINVAL;
>  		return -rte_errno;
> @@ -1329,9 +1329,9 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
>  		}
>
>  		if (priv->sh->config.repr_matching &&
> -		    mlx5_flow_hw_tx_repr_matching_flow(dev, sq_num, true)) {
> +		    mlx5_flow_hw_create_tx_repr_matching_flow(dev, sq_num, true)) {
>  			if (sq_miss_created)
> -				mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num);
> +				mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num, true);
>  			return -rte_errno;
>  		}
>
> @@ -1339,7 +1339,7 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
>  		    priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
>  		    mlx5_flow_hw_create_tx_default_mreg_copy_flow(dev, sq_num, true)) {
>  			if (sq_miss_created)
> -				mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num);
> +				mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num, true);
>  			return -rte_errno;
>  		}
>  		return 0;
> @@ -1353,6 +1353,52 @@ rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num)
>  	return -rte_errno;
>  }
>
> +int
> +rte_pmd_mlx5_external_sq_disable(uint16_t port_id, uint32_t sq_num)
> +{
> +	struct rte_eth_dev *dev;
> +	struct mlx5_priv *priv;
> +
> +	if (rte_eth_dev_is_valid_port(port_id) < 0) {
> +		DRV_LOG(ERR, "There is no Ethernet device for port %u.",
> +			port_id);
> +		rte_errno = ENODEV;
> +		return -rte_errno;
> +	}
> +	dev = &rte_eth_devices[port_id];
> +	priv = dev->data->dev_private;
> +	if ((!priv->representor && !priv->master) ||
> +	    !priv->sh->config.dv_esw_en) {
> +		DRV_LOG(ERR, "Port %u must be representor or master port in E-Switch mode.",
> +			port_id);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	if (sq_num == 0) {
> +		DRV_LOG(ERR, "Invalid SQ number.");
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +#ifdef HAVE_MLX5_HWS_SUPPORT
> +	if (priv->sh->config.dv_flow_en == 2) {
> +		if (priv->sh->config.fdb_def_rule &&
> +		    mlx5_flow_hw_esw_destroy_sq_miss_flow(dev, sq_num, true))
> +			return -rte_errno;
> +		if (priv->sh->config.repr_matching &&
> +		    mlx5_flow_hw_destroy_tx_repr_matching_flow(dev, sq_num, true))
> +			return -rte_errno;
> +		if (!priv->sh->config.repr_matching &&
> +		    priv->sh->config.dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS &&
> +		    mlx5_flow_hw_destroy_tx_default_mreg_copy_flow(dev, sq_num, true))
> +			return -rte_errno;
> +		return 0;
> +	}
> +#endif
> +	/* Not supported for software steering. */
> +	rte_errno = ENOTSUP;
> +	return -rte_errno;
> +}
> +
>  /**
>   * Set the Tx queue dynamic timestamp (mask and offset)
>   *
> diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
> index 76c8ad73ca..6166a4d012 100644
> --- a/drivers/net/mlx5/rte_pmd_mlx5.h
> +++ b/drivers/net/mlx5/rte_pmd_mlx5.h
> @@ -161,6 +161,24 @@ int rte_pmd_mlx5_host_shaper_config(int port_id, uint8_t rate, uint32_t flags);
>  __rte_experimental
>  int rte_pmd_mlx5_external_sq_enable(uint16_t port_id, uint32_t sq_num);
>
> +/**
> + * Disable traffic for external SQ. Should be invoked by application
> + * before destroying the external SQ.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] sq_num
> + *   SQ HW number.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + *   Possible values for rte_errno:
> + *   - EINVAL - invalid sq_number or port type.
> + *   - ENODEV - there is no Ethernet device for this port id.
> + */
> +__rte_experimental
> +int rte_pmd_mlx5_external_sq_disable(uint16_t port_id, uint32_t sq_num);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/drivers/net/mlx5/version.map b/drivers/net/mlx5/version.map
> index 848270da13..6db031aff4 100644
> --- a/drivers/net/mlx5/version.map
> +++ b/drivers/net/mlx5/version.map
> @@ -15,4 +15,5 @@ EXPERIMENTAL {
>  	# added in 22.07
>  	rte_pmd_mlx5_host_shaper_config;
>  	rte_pmd_mlx5_external_sq_enable;
> +	rte_pmd_mlx5_external_sq_disable;
>  };
> --
> 2.34.1
Hey,
This patch has also been applied to 23.11.
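
For reference, here is a minimal usage sketch of the enable/disable
pairing this patch completes. The SQ itself is created by the
application (e.g. via DevX), so app_create_sq()/app_destroy_sq() below
are hypothetical placeholders, and error handling is trimmed:

    #include <stdio.h>
    #include <rte_errno.h>
    #include <rte_pmd_mlx5.h>

    /* Application-owned SQ; returns the HW SQ number (hypothetical). */
    uint32_t sq_num = app_create_sq();

    /* Create the implicit control flows for this external SQ. */
    if (rte_pmd_mlx5_external_sq_enable(port_id, sq_num) < 0)
        printf("enable failed: %s\n", rte_strerror(rte_errno));

    /* ... send traffic on the external SQ ... */

    /* Dismiss the implicit flows before destroying the SQ, so they
     * do not accumulate over repeated create/enable cycles. */
    if (rte_pmd_mlx5_external_sq_disable(port_id, sq_num) < 0)
        printf("disable failed: %s\n", rte_strerror(rte_errno));
    app_destroy_sq(sq_num);
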
Thanks,
Shani