From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id B2B63466B2;
 Fri, 9 May 2025 04:47:32 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id 3EFA84026C;
 Fri, 9 May 2025 04:47:32 +0200 (CEST)
Received: from lf-1-128.ptr.blmpb.com (lf-1-128.ptr.blmpb.com [103.149.242.128])
 by mails.dpdk.org (Postfix) with ESMTP id 29B0B4026C
 for ; Fri, 9 May 2025 04:47:28 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 s=2212171451; d=bytedance.com; t=1746758842; h=from:subject:
 mime-version:from:date:message-id:subject:to:cc:reply-to:content-type:
 mime-version:in-reply-to:message-id;
 bh=Z3akpusGKcNGfQLkJIzPxvf9lW7d5ja1QNHQ3aQDVKA=;
 b=V0F6IIlSgxip+IDHovKySfWB4Hx5gOatcH11IBmdGcrjy+mspJiDEA5Ga/PMDqylGImb2i
 R4gRqj5Z86Ex27iqc1Ld4bGUvrOgwGbZS2E7eJDRFg1clXRWIZK45OFLTbnu1gA1mHqOfY
 7dQfJN8F+yGaEDJqgWvuuZNR+gdJzT++8lY8N6mN33HsxGRJbHpDHheLty23yX6bEOz0cb
 C2RazzqXPtN5mBc11/A8l1srnJNHxplX6rMFZRxCWrUdkP8r0U9Zl+bZQkoI33F04cbj0w
 JJZF7dVRlSaPkktK9KgzT1C4n4TjX9Lrxm2WTdN/Zy43cLJ4Pj4/kXvCrph8gw==
Date: Fri, 09 May 2025 10:47:19 +0800
Message-Id:
To: "Stephen Hemminger"
Mime-Version: 1.0
References: <20240411061740.16495-1-hepeng.0320@bytedance.com>
 <20241007105407.51e6d4c3@hermes.local>
X-Lms-Return-Path:
Content-Type: multipart/alternative;
 boundary=2022e3103cf89737a4d071d3d25963c5d6012813d8cbb60b883bcd8f70d3
Cc:
Subject: Re: [External] Re: [PATCH] mlx5: fix race at mlx5_dev_close
From: =?utf-8?q?=E8=B4=BA=E9=B9=8F?=
In-Reply-To: <20241007105407.51e6d4c3@hermes.local>
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org

--2022e3103cf89737a4d071d3d25963c5d6012813d8cbb60b883bcd8f70d3
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=UTF-8

It's 1ms, not 1 second.
It's a workaround, just to provide a fast and dirty fix for someone who needs this.

From: "Stephen Hemminger" <stephen@networkplumber.org>
> Date: Tue, Oct 8, 2024, 01:54
> Subject: [External] Re: [PATCH] mlx5: fix race at mlx5_dev_close
> To: "hepeng" <hepeng.0320@bytedance.com>
> Cc: <dev@dpdk.org>
> On Thu, 11 Apr 2024 14:17:40 +0800
> hepeng <hepeng.0320@bytedance.com> wrote:
>
>
> > From: "hepeng.0320" <hepeng.0320@bytedance.com>
> >
> > mlx5_dev_close currently will set priv->sh->port[priv->dev_port -
> > 1].nl_ih_port_id to RTE_MAX_ETHPORTS to avoid mlx5_dev_interrupt_nl_cb
> > to use the port's dev_private, because later the rte_eth_dev_close
> > will free the dev_private and set the pointer to NULL.
> >
> > However, since mlx5_dev_interrupt_nl_cb is running in another thread,
> > I think the race still exists. So perhaps an easy fix is to wait for
> > 1ms to avoid this race.
> >
> > Signed-off-by: hepeng.0320 <hepeng.0320@bytedance.com>
>
>
> Not the best way to handle this. Adding a one second delay on shutdown
> hurts some availability scenarios. Looks like mlx5 needs a more coordinated
> shutdown to be safe; adding big delays is not the correct fix.
>

--2022e3103cf89737a4d071d3d25963c5d6012813d8cbb60b883bcd8f70d3
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=UTF-8
It's 1ms, not 1 second.
It's a workaround, just to provide a fast and dirty fix for someone who needs this.


From: "Stephen Hemminger"<stephen@networkplumber.org>
Date: Tue, Oct 8, 2024, 01:54
Subject: [External] Re: [PATCH] mlx5: fix race at mlx5_dev_close
To: "hepeng"<hepeng.0320@bytedance.com>
Cc: <dev@dpdk.org>
On Thu, 11 Apr 2024 14:17:40 +0800
hepeng <hepeng.0320@bytedance.com> wrote:

> From: "hepeng.0320" <hepeng.0320@bytedance.com>
>
> mlx5_dev_close currently will set priv->sh->port[priv->dev_port -
> 1].nl_ih_port_id to RTE_MAX_ETHPORTS to avoid mlx5_dev_interrupt_nl_cb
> to use the port's dev_private, because later the rte_eth_dev_close
> will free the dev_private and set the pointer to NULL.
>
> However, since mlx5_dev_interrupt_nl_cb is running in another thread,
> I think the race still exists. So perhaps an easy fix is to wait for
> 1ms to avoid this race.
>
> Signed-off-by: hepeng.0320 <hepeng.0320@bytedance.com>

Not the best way to handle this. Adding a one second delay on shutdown
hurts some availability scenarios. Looks like mlx5 needs a more coordinated
shutdown to be safe; adding big delays is not the correct fix.
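
As a rough illustration of what a "more coordinated shutdown" could look like without any sleep: the callback and the close path share a lock, and the close path frees the private data only after detaching the pointer under that lock. This is not mlx5 code. The names demo_priv, demo_slot, demo_open, demo_event_cb and demo_close are hypothetical, made up for this sketch, and the real driver's locking rules are not taken into account.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a port's private data; not an mlx5 type. */
struct demo_priv {
    int link_events;   /* number of events handled by the callback */
};

/* Hypothetical slot shared by the close path and the callback thread. */
struct demo_slot {
    pthread_mutex_t lock;
    struct demo_priv *priv;   /* NULL once the port has been closed */
};

/* Hypothetical helper: allocate the private data and publish it. */
static bool demo_open(struct demo_slot *slot)
{
    struct demo_priv *priv = calloc(1, sizeof(*priv));

    if (priv == NULL)
        return false;
    pthread_mutex_lock(&slot->lock);
    assert(slot->priv == NULL);   /* the slot must not already be open */
    slot->priv = priv;
    pthread_mutex_unlock(&slot->lock);
    return true;
}

/* Callback path (the interrupt thread in the scenario discussed here).
 * Returns true if the event was delivered, false if the port is gone. */
static bool demo_event_cb(struct demo_slot *slot)
{
    bool handled = false;

    pthread_mutex_lock(&slot->lock);
    if (slot->priv != NULL) {
        /* Safe: the close path cannot free priv while we hold the lock. */
        slot->priv->link_events++;
        handled = true;
    }
    pthread_mutex_unlock(&slot->lock);
    return handled;
}

/* Close path: detach the pointer under the lock, then free it. */
static void demo_close(struct demo_slot *slot)
{
    struct demo_priv *priv;

    pthread_mutex_lock(&slot->lock);
    priv = slot->priv;
    slot->priv = NULL;
    pthread_mutex_unlock(&slot->lock);

    /* A callback that was running has released the lock by now, and any
     * later callback sees NULL, so nothing else can reach priv. */
    free(priv);
}
```

With this shape the close path waits only as long as a callback that is actually in flight, instead of a fixed delay. The cost is one lock acquisition per event; whether that fits the real driver is an assumption that would need checking.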



--2022e3103cf89737a4d071d3d25963c5d6012813d8cbb60b883bcd8f70d3--