* [PATCH] mlx5: fix race at mlx5_dev_close
From: hepeng @ 2024-04-11 6:17 UTC
To: dev; +Cc: hepeng.0320
From: "hepeng.0320" <hepeng.0320@bytedance.com>
mlx5_dev_close currently sets
priv->sh->port[priv->dev_port - 1].nl_ih_port_id to RTE_MAX_ETHPORTS
to prevent mlx5_dev_interrupt_nl_cb from using the port's dev_private,
because rte_eth_dev_close later frees dev_private and sets the pointer
to NULL.
However, mlx5_dev_interrupt_nl_cb runs in another thread, so I think
the race still exists: the callback may already be using dev_private
when the port ID is reset. Perhaps an easy fix is to wait for 1 ms
before freeing the shared context.
Signed-off-by: hepeng.0320 <hepeng.0320@bytedance.com>
---
drivers/net/mlx5/mlx5.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d1a6382..283162f 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2457,6 +2457,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
* mlx5_os_mac_addr_flush() uses ibdev_path for retrieving
* ifindex if Netlink fails.
*/
+
+ /* Avoid race condition if mlx5_dev_interrupt_nl_cb is running. */
+ rte_delay_us_sleep(1000);
+
mlx5_free_shared_dev_ctx(priv->sh);
if (priv->domain_id != RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID) {
unsigned int c = 0;
--
2.11.0
* Re: [PATCH] mlx5: fix race at mlx5_dev_close
From: Stephen Hemminger @ 2024-10-07 17:54 UTC
To: hepeng; +Cc: dev
On Thu, 11 Apr 2024 14:17:40 +0800
hepeng <hepeng.0320@bytedance.com> wrote:
> From: "hepeng.0320" <hepeng.0320@bytedance.com>
>
> mlx5_dev_close currently sets
> priv->sh->port[priv->dev_port - 1].nl_ih_port_id to RTE_MAX_ETHPORTS
> to prevent mlx5_dev_interrupt_nl_cb from using the port's dev_private,
> because rte_eth_dev_close later frees dev_private and sets the pointer
> to NULL.
>
> However, mlx5_dev_interrupt_nl_cb runs in another thread, so I think
> the race still exists: the callback may already be using dev_private
> when the port ID is reset. Perhaps an easy fix is to wait for 1 ms
> before freeing the shared context.
>
> Signed-off-by: hepeng.0320 <hepeng.0320@bytedance.com>
Not the best way to handle this. Adding a one second delay on shutdown
hurts some availability scenarios. Looks like mlx5 needs a more coordinated
shutdown to be safe; adding big delays is not the correct fix.
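
For illustration only, one shape a "more coordinated shutdown" could take is
sketched below. It is not mlx5 code and not a proposed patch: struct
shared_ctx, ctx_init(), event_cb() and port_close() are hypothetical names,
and a plain pthread mutex stands in for whatever synchronization the driver
would actually use. The idea is that the callback only touches the private
data while holding a lock, and close detaches the pointer under the same
lock, so no fixed delay is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared device context; not the mlx5 layout. */
struct shared_ctx {
	pthread_mutex_t lock; /* serializes the event callback against close */
	int *dev_private;     /* per-port private data, NULL once closed */
};

/* Set up the lock and allocate the private data. Returns 0 on success. */
int
ctx_init(struct shared_ctx *sh)
{
	if (pthread_mutex_init(&sh->lock, NULL) != 0)
		return -1;
	sh->dev_private = calloc(1, sizeof(*sh->dev_private));
	return sh->dev_private != NULL ? 0 : -1;
}

/*
 * Event callback, meant to run on another thread.
 * Returns 1 if the event was handled, 0 if the port was already closed.
 */
int
event_cb(struct shared_ctx *sh)
{
	int handled = 0;

	pthread_mutex_lock(&sh->lock);
	if (sh->dev_private != NULL) {
		/* Safe: close cannot free the data while we hold the lock. */
		(*sh->dev_private)++;
		handled = 1;
	}
	pthread_mutex_unlock(&sh->lock);
	return handled;
}

/* Close path: detach the private data under the lock, then free it. */
void
port_close(struct shared_ctx *sh)
{
	int *priv;

	assert(sh != NULL);
	pthread_mutex_lock(&sh->lock);
	priv = sh->dev_private;
	sh->dev_private = NULL;
	pthread_mutex_unlock(&sh->lock);
	/*
	 * Any callback that saw a non-NULL pointer has released the lock
	 * by now; later callbacks see NULL and do nothing.
	 */
	free(priv);
}
```

With this pattern the close path waits only as long as a callback that is
actually in progress, instead of sleeping for a fixed interval on every
shutdown.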
* Re: [External] Re: [PATCH] mlx5: fix race at mlx5_dev_close
From: 贺鹏 @ 2025-05-09 2:47 UTC
To: Stephen Hemminger; +Cc: dev
It's 1 ms, not 1 second.
It's a workaround, just a quick and dirty fix for anyone who needs it.
> From: "Stephen Hemminger" <stephen@networkplumber.org>
> Date: Tue, Oct 8, 2024, 01:54
> Subject: [External] Re: [PATCH] mlx5: fix race at mlx5_dev_close
> To: "hepeng" <hepeng.0320@bytedance.com>
> Cc: <dev@dpdk.org>
>
> On Thu, 11 Apr 2024 14:17:40 +0800
> hepeng <hepeng.0320@bytedance.com> wrote:
>
> > From: "hepeng.0320" <hepeng.0320@bytedance.com>
> >
> > mlx5_dev_close currently sets
> > priv->sh->port[priv->dev_port - 1].nl_ih_port_id to RTE_MAX_ETHPORTS
> > to prevent mlx5_dev_interrupt_nl_cb from using the port's dev_private,
> > because rte_eth_dev_close later frees dev_private and sets the pointer
> > to NULL.
> >
> > However, mlx5_dev_interrupt_nl_cb runs in another thread, so I think
> > the race still exists: the callback may already be using dev_private
> > when the port ID is reset. Perhaps an easy fix is to wait for 1 ms
> > before freeing the shared context.
> >
> > Signed-off-by: hepeng.0320 <hepeng.0320@bytedance.com>
>
> Not the best way to handle this. Adding a one second delay on shutdown
> hurts some availability scenarios. Looks like mlx5 needs a more coordinated
> shutdown to be safe; adding big delays is not the correct fix.