From: Shani Peretz <shperetz@nvidia.com>
To: Maayan Kashani <mkashani@nvidia.com>,
"stable@dpdk.org" <stable@dpdk.org>
Cc: Maayan Kashani <mkashani@nvidia.com>,
Dariusz Sosnowski <dsosnowski@nvidia.com>,
Raslan Darawsheh <rasland@nvidia.com>,
Matan Azrad <matan@nvidia.com>,
Slava Ovsiienko <viacheslavo@nvidia.com>,
Bing Zhao <bingz@nvidia.com>
Subject: RE: [PATCH 22.11 v2] net/mlx5: fix device start error handling
Date: Tue, 30 Dec 2025 10:58:05 +0000 [thread overview]
Message-ID: <MW4PR12MB7484484FCABD30A89A209E4FBFBCA@MW4PR12MB7484.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20251125110927.178331-1-mkashani@nvidia.com>
> -----Original Message-----
> From: Maayan Kashani <mkashani@nvidia.com>
> Sent: Tuesday, 25 November 2025 13:09
> To: stable@dpdk.org
> Cc: Maayan Kashani <mkashani@nvidia.com>; Dariusz Sosnowski
> <dsosnowski@nvidia.com>; Raslan Darawsheh <rasland@nvidia.com>; Matan
> Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Bing
> Zhao <bingz@mellanox.com>
> Subject: [PATCH 22.11 v2] net/mlx5: fix device start error handling
>
> External email: Use caution opening links or attachments
>
>
> [ upstream commit 860f6c63dbc1 ]
>
> When mlx5_dev_start() fails partway through initialization, the error cleanup
> code unconditionally calls cleanup functions for all steps, including those that
> were never successfully initialized. This causes state corruption leading to
> incorrect behavior on subsequent start attempts.
>
> The issue manifests as:
> 1. First start attempt fails with -ENOMEM (expected) 2. Second start attempt
> returns -EINVAL instead of -ENOMEM 3. With flow isolated mode, second
> attempt incorrectly succeeds,
> leading to segfault in rte_eth_rx_burst()
>
> Root cause: The single error label cleanup path calls functions like
> mlx5_traffic_disable() and mlx5_flow_stop_default() even when their
> corresponding initialization functions (mlx5_traffic_enable() and
> mlx5_flow_start_default()) were never called due to earlier failure.
>
> For example, when mlx5_rxq_start() fails:
> - mlx5_traffic_enable() at line 1403 never executes
> - mlx5_flow_start_default() at line 1420 never executes
> - But cleanup unconditionally calls:
> * mlx5_traffic_disable() - destroys control flows list
> * mlx5_flow_stop_default() - corrupts flow metadata state
>
> This corrupts the device state, causing subsequent start attempts to fail with
> different errors or, in isolated mode, to incorrectly succeed with an improperly
> initialized device.
>
> Fix by replacing the single error label with cascading error labels (Linux kernel
> style). Each label cleans up only its corresponding step, then falls through to
> clean up earlier steps.
> This ensures only successfully initialized steps are cleaned up, maintaining
> device state consistency across failed start attempts.
>
> Bugzilla ID: 1419
> Fixes: 8db7e3b69822 ("net/mlx5: change operations for non-cached flows")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maayan Kashani <mkashani@nvidia.com>
> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> ---
> drivers/net/mlx5/mlx5_trigger.c | 60 +++++++++++++++++++++++----------
> 1 file changed, 42 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index f72ed7f820f..16c3164bcf1 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -1133,6 +1133,11 @@ mlx5_hw_representor_port_allowed_start(struct
> rte_eth_dev *dev)
>
> #endif
>
> +#define SAVE_RTE_ERRNO_AND_STOP(ret, dev) do { \
> + ret = rte_errno; \
> + (dev)->data->dev_started = 0; \
> +} while (0)
> +
> /**
> * DPDK callback to start the device.
> *
> @@ -1203,19 +1208,23 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> if (ret) {
> DRV_LOG(ERR, "port %u Tx packet pacing init failed: %s",
> dev->data->port_id, strerror(rte_errno));
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> goto error;
> }
> if (mlx5_devx_obj_ops_en(priv->sh) &&
> priv->obj_ops.lb_dummy_queue_create) {
> ret = priv->obj_ops.lb_dummy_queue_create(dev);
> - if (ret)
> - goto error;
> + if (ret) {
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto txpp_stop;
> + }
> }
> ret = mlx5_txq_start(dev);
> if (ret) {
> DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
> dev->data->port_id, strerror(rte_errno));
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto lb_dummy_queue_release;
> }
> if (priv->config.std_delay_drop || priv->config.hp_delay_drop) {
> if (!priv->sh->dev_cap.vf && !priv->sh->dev_cap.sf && @@ -1239,7
> +1248,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> if (ret) {
> DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
> dev->data->port_id, strerror(rte_errno));
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto txq_stop;
> }
> /*
> * Such step will be skipped if there is no hairpin TX queue configured @@
> -1249,7 +1259,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> if (ret) {
> DRV_LOG(ERR, "port %u hairpin auto binding failed: %s",
> dev->data->port_id, strerror(rte_errno));
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto rxq_stop;
> }
> /* Set started flag here for the following steps like control flow. */
> dev->data->dev_started = 1;
> @@ -1257,7 +1268,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> if (ret) {
> DRV_LOG(ERR, "port %u Rx interrupt vector creation failed",
> dev->data->port_id);
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto rxq_stop;
> }
> mlx5_os_stats_init(dev);
> /*
> @@ -1269,7 +1281,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> DRV_LOG(ERR,
> "port %u failed to attach indirect actions: %s",
> dev->data->port_id, rte_strerror(rte_errno));
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto rx_intr_vec_disable;
> }
> #ifdef HAVE_MLX5_HWS_SUPPORT
> if (priv->sh->config.dv_flow_en == 2) { @@ -1277,7 +1290,8 @@
> mlx5_dev_start(struct rte_eth_dev *dev)
> if (ret) {
> DRV_LOG(ERR, "port %u failed to update HWS tables",
> dev->data->port_id);
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto action_handle_detach;
> }
> }
> #endif
> @@ -1285,7 +1299,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> if (ret) {
> DRV_LOG(ERR, "port %u failed to set defaults flows",
> dev->data->port_id);
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto action_handle_detach;
> }
> /* Set a mask and offset of dynamic metadata flows into Rx queues. */
> mlx5_flow_rxq_dynf_metadata_set(dev);
> @@ -1302,12 +1317,14 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> if (ret) {
> DRV_LOG(DEBUG, "port %u failed to start default actions: %s",
> dev->data->port_id, strerror(rte_errno));
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto traffic_disable;
> }
> if (mlx5_dev_ctx_shared_mempool_subscribe(dev) != 0) {
> DRV_LOG(ERR, "port %u failed to subscribe for mempool life cycle:
> %s",
> dev->data->port_id, rte_strerror(rte_errno));
> - goto error;
> + SAVE_RTE_ERRNO_AND_STOP(ret, dev);
> + goto stop_default;
> }
> rte_wmb();
> dev->tx_pkt_burst = mlx5_select_tx_function(dev); @@ -1334,18
> +1351,25 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> priv->sh->port[priv->dev_port - 1].devx_ih_port_id =
> (uint32_t)dev->data->port_id;
> return 0;
> -error:
> - ret = rte_errno; /* Save rte_errno before cleanup. */
> - /* Rollback. */
> - dev->data->dev_started = 0;
> +stop_default:
> mlx5_flow_stop_default(dev);
> +traffic_disable:
> mlx5_traffic_disable(dev);
> - mlx5_txq_stop(dev);
> +action_handle_detach:
> + mlx5_action_handle_detach(dev);
> +rx_intr_vec_disable:
> + mlx5_rx_intr_vec_disable(dev);
> +rxq_stop:
> mlx5_rxq_stop(dev);
> +txq_stop:
> + mlx5_txq_stop(dev);
> +lb_dummy_queue_release:
> if (priv->obj_ops.lb_dummy_queue_release)
> priv->obj_ops.lb_dummy_queue_release(dev);
> - mlx5_txpp_stop(dev); /* Stop last. */
> - rte_errno = ret; /* Restore rte_errno. */
> +txpp_stop:
> + mlx5_txpp_stop(dev);
> +error:
> + rte_errno = ret;
> return -rte_errno;
> }
>
> --
> 2.43.0
Hey,
This patch has also been applied to 23.11.
Thanks,
Shani
prev parent reply other threads:[~2025-12-30 10:58 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-25 11:09 Maayan Kashani
2025-12-30 10:58 ` Shani Peretz [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=MW4PR12MB7484484FCABD30A89A209E4FBFBCA@MW4PR12MB7484.namprd12.prod.outlook.com \
--to=shperetz@nvidia.com \
--cc=bingz@nvidia.com \
--cc=dsosnowski@nvidia.com \
--cc=matan@nvidia.com \
--cc=mkashani@nvidia.com \
--cc=rasland@nvidia.com \
--cc=stable@dpdk.org \
--cc=viacheslavo@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).