From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2F87F46BE5; Tue, 22 Jul 2025 21:30:19 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 9F9AE40281; Tue, 22 Jul 2025 21:30:18 +0200 (CEST) Received: from mail-qv1-f51.google.com (mail-qv1-f51.google.com [209.85.219.51]) by mails.dpdk.org (Postfix) with ESMTP id D8D264003C for ; Tue, 22 Jul 2025 21:30:16 +0200 (CEST) Received: by mail-qv1-f51.google.com with SMTP id 6a1803df08f44-6fad3400ea3so54562196d6.0 for ; Tue, 22 Jul 2025 12:30:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=uetpeshawar-edu-pk.20230601.gappssmtp.com; s=20230601; t=1753212616; x=1753817416; darn=dpdk.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=QYZuKYmr1PLwtfPThmzJHGA9asZ4PU077WEt42CrZwA=; b=CKu7ztNOJ1MfR67EJaFOzfL/2HNpE9S9pL45eOCEr+zjgCz9QY4T3wn/jU6WREbGpt 7elaONfgmEKAAEMQ3qy5dSCTf6HdeVfqXdkCpjzt9hAlhQ1G8k+32QaQtnEA4f8aIXxv NYifPvTpFLFSqZZQq1kZwq06Av0sxjLdSstP0y7EyQb0ggO9/K8Dk9F5hJXwiBU5/XXK MUtKDH6T/SldOAdJhHzGaCMc+3XuyTqz5/n4v1VTqUNO4wyOYfilBKzi5o7xOgP6jQK9 Ed7zbckdDTrshSIaT4jdXJUo4OG4uAtPv5jig1M3YBb1G2FcGjjLevOjDK+97uABSQcu 8MTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753212616; x=1753817416; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=QYZuKYmr1PLwtfPThmzJHGA9asZ4PU077WEt42CrZwA=; b=sZ3YcmJRqtlTCJHs9HYx3BsGzJYsu3m6tGuzzpyuiNyIq37l9Od7FnAb9JRrwHNjMC gEPMGbxb41509q3uodCDrsqEP4cmT7yISWJPLzJ+IumeiA/wlmdbciPqMXAT7vJNes7h /6dRZRg3Llsg7lEniDd879fRdzxCJB/uZ3F5vN40s4n8/BYBefGy4YeGgV1jvJSJ0Mme PNigSpHpgL/bKVdIhlmUxQp4jE7xnuyjQt/wRN6Hzv1SjBWp/g94gWSxV7ZQ2e5jx//K PCMB1O0jSdbPNYPiJSYj2LSSKzrEWi+Ro3JOE1ex9PfBiTpo5P2NxvJSTxnke3G7Ctyk MFBQ== X-Forwarded-Encrypted: i=1; AJvYcCV58D907f+Ghn2g5oHkxeQQghiBATv1z5aWYE6t/DWq4jekqb2XwVI3YGGfA92syegHqYM=@dpdk.org X-Gm-Message-State: AOJu0Yx/RUkT1e8S8QAbca6ebt51p59sMzlbU5PyD8m9YydSDy7B4onR ryjRe4KnxGP9hC9Xo40OawawRjEPFDYafbW+PHrJia+BGeywOcAFClTofpDYmM9tZUPIW+hr2ao Aa2ojhy4om7zqBBxSDBuj1n2Qazcg/xpDlZRNcz9/Ng== X-Gm-Gg: ASbGncutOqS8UtDHSxDIz95FPeN5n8EY0fhwDkAQ9rNofIEJj+a9F0DV6M+RcP9en68 fuu5zuuZU/umTU/mFQTZIlUw3i5cMO3ybYe5rnKdvvIkg9lKB3tE38YsgRMeD6vNdcn5e4hZi4h MjIV08u83wbaNKJqPZOWP3glwlNI7d177uSktLxMmH6zRmExFIgT6Ulwrs8QX87OL3z2ev8LQSI 98jHFLTfym8Z7TpGYY7f485PsOt9Mgasb1gOcBz X-Google-Smtp-Source: AGHT+IGCVQvk5C5Kb1UKDjSqaU5ZRTey0Y5qmHfC7sFYxVCxGAG65ZBEtDo9n2alrKE/9Z+L6muGjOCpVoL6g0HJlJs= X-Received: by 2002:a05:6214:21c3:b0:6fb:619c:98ac with SMTP id 6a1803df08f44-70700739c76mr5540206d6.39.1753212615918; Tue, 22 Jul 2025 12:30:15 -0700 (PDT) MIME-Version: 1.0 References: <20250721073851.963141-1-14pwcse1224@uetpeshawar.edu.pk> <20250721105819.2ci66fl7bzikwb22@ds-vm-debian.local> <3358342.oiGErgHkdL@thomas> In-Reply-To: <3358342.oiGErgHkdL@thomas> From: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> Date: Wed, 23 Jul 2025 00:30:03 +0500 X-Gm-Features: Ac12FXyKSQcd8lSRFyw_yD_XyuJkHbiXKVed90WCf0DrdGiBpoiQEPri1ynx89I Message-ID: Subject: Re: [PATCH] net/mlx5: fix crash when secondary queries dev info after primary exits To: Thomas Monjalon Cc: Dariusz Sosnowski , Ivan Malov , Andrew Rybchenko , dev@dpdk.org, rasland@nvidia.com, dpdk stable , Viacheslav Ovsiienko , Bing Zhao , Ori Kam , Suanming Mou , Matan Azrad Content-Type: multipart/alternative; boundary="000000000000301bc2063a899f62" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org --000000000000301bc2063a899f62 Content-Type: text/plain; charset="UTF-8" I think at least this should be followed either in PMD or in ethdev layer to avoid this specific crashes. On Mon, Jul 21, 2025, 17:00 Thomas Monjalon wrote: > 21/07/2025 13:46, Ivan Malov: > > On Mon, 21 Jul 2025, Dariusz Sosnowski wrote: > > > > > + mlx5 maintainers > > > > > > Thank you for the patch. > > > > > > Could you please include other PMD maintainers (or other maintainers, > depending on changed code) > > > in the future patches? > > > There is a script which automatically adds maintainers while sending a > patch. > > > It is described in: > https://doc.dpdk.org/guides/contributing/patches.html#sending-patches > > > > > > On Mon, Jul 21, 2025 at 03:38:51AM -0400, Khadem Ullah wrote: > > >> When the primary process exits, the shared mlx5 state becomes > > >> unavailable to secondary processes. If a secondary process attempts > > >> to query device information (e.g., via testpmd), a NULL dereference > > >> may occur due to missing shared data. > > >> > > >> This patch adds a check for shared context availability and fails > > >> gracefully while preventing a crash. > > >> > > >> Fixes: e60fbd5b24fc ("mlx5: add device configure/start/stop") > > >> Cc: stable@dpdk.org > > >> > > >> Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > > >> --- > > >> drivers/net/mlx5/mlx5_ethdev.c | 6 ++++++ > > >> 1 file changed, 6 insertions(+) > > >> > > >> diff --git a/drivers/net/mlx5/mlx5_ethdev.c > b/drivers/net/mlx5/mlx5_ethdev.c > > >> index 68d1c1bfa7..1848f6536a 100644 > > >> --- a/drivers/net/mlx5/mlx5_ethdev.c > > >> +++ b/drivers/net/mlx5/mlx5_ethdev.c > > >> @@ -368,6 +368,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, > struct rte_eth_dev_info *info) > > >> * Since we need one CQ per QP, the limit is the minimum number > > >> * between the two values. > > >> */ > > >> + if (priv == NULL || priv->sh == NULL) { > > >> + DRV_LOG(ERR, > > >> + "mlx5 shared data unavailable (primary process likely > exited)"); > > >> + rte_errno = ENODEV; > > >> + return -rte_errno; > > >> + } > > > > > > I don't think it's an issue on PMD level, but rather on > > > ethdev/multi-process handling level. > > > > I may be very wrong here, but can't the PMD use its internal 'shared' > state to > > somehow memorise the fact that a secondary process has attached, in > order to not > > let the primary process close in the first place (before the secondary > process > > detaches)? Or is this going to violate some established convention? > > It looks to be a good idea. > > > > When primary process closes the port, ethdev library zeroes and frees > > > device data shared between processes. > > > ethdev port data (rte_eth_dev) on secondary is not updated so it now > points to > > > invalid data. rte_eth_dev_info_get() is not the only API call affected. > > > > > > If the primary process closes the port before exiting > > > (like testpmd does) and it exits before the secondary, > > > the any driver call seems invalid because of that use-after-free > behavior. > > > > > > @Thomas, @Andrew - Do you happen to know if doing anything on ethdev > ports > > > in secondary process after primary has gracefully exited is supported? > > No there is no statement about whether it is supported or not. > I think we should at least return an error instead of crashing. > > > --000000000000301bc2063a899f62 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
I think at least this should be followed either in PMD or= in ethdev layer to avoid this specific crashes.=C2=A0

On Mon, Jul 21, 2025, 17:00 Thomas Monjalon <thomas@monjalon.net> wrote:
21/07/2025 13:46, Ivan Malov:
> On Mon, 21 Jul 2025, Dariusz Sosnowski wrote:
>
> > + mlx5 maintainers
> >
> > Thank you for the patch.
> >
> > Could you please include other PMD maintainers (or other maintain= ers, depending on changed code)
> > in the future patches?
> > There is a script which automatically adds maintainers while send= ing a patch.
> > It is described in: https://doc.dpdk.org/guides/contributing/patches.html#sending-p= atches
> >
> > On Mon, Jul 21, 2025 at 03:38:51AM -0400, Khadem Ullah wrote:
> >> When the primary process exits, the shared mlx5 state becomes=
> >> unavailable to secondary processes. If a secondary process at= tempts
> >> to query device information (e.g., via testpmd), a NULL deref= erence
> >> may occur due to missing shared data.
> >>
> >> This patch adds a check for shared context availability and f= ails
> >> gracefully while preventing a crash.
> >>
> >> Fixes: e60fbd5b24fc ("mlx5: add device configure/start/s= top")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Khadem Ullah <14pwcse1224@uetpe= shawar.edu.pk>
> >> ---
> >>=C2=A0 drivers/net/mlx5/mlx5_ethdev.c | 6 ++++++
> >>=C2=A0 1 file changed, 6 insertions(+)
> >>
> >> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx= 5/mlx5_ethdev.c
> >> index 68d1c1bfa7..1848f6536a 100644
> >> --- a/drivers/net/mlx5/mlx5_ethdev.c
> >> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> >> @@ -368,6 +368,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *d= ev, struct rte_eth_dev_info *info)
> >>=C2=A0 =C2=A0 =C2=A0* Since we need one CQ per QP, the limit i= s the minimum number
> >>=C2=A0 =C2=A0 =C2=A0* between the two values.
> >>=C2=A0 =C2=A0 =C2=A0*/
> >> +=C2=A0 if (priv =3D=3D NULL || priv->sh =3D=3D NULL) { > >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 DRV_LOG(ERR,
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "mlx5 shared data un= available (primary process likely exited)");
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 rte_errno =3D ENODEV;
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return -rte_errno;
> >> +=C2=A0 }
> >
> > I don't think it's an issue on PMD level, but rather on > > ethdev/multi-process handling level.
>
> I may be very wrong here, but can't the PMD use its internal '= shared' state to
> somehow memorise the fact that a secondary process has attached, in or= der to not
> let the primary process close in the first place (before the secondary= process
> detaches)? Or is this going to violate some established convention?
It looks to be a good idea.

> > When primary process closes the port, ethdev library zeroes and f= rees
> > device data shared between processes.
> > ethdev port data (rte_eth_dev) on secondary is not updated so it = now points to
> > invalid data. rte_eth_dev_info_get() is not the only API call aff= ected.
> >
> > If the primary process closes the port before exiting
> > (like testpmd does) and it exits before the secondary,
> > the any driver call seems invalid because of that use-after-free = behavior.
> >
> > @Thomas, @Andrew - Do you happen to know if doing anything on eth= dev ports
> > in secondary process after primary has gracefully exited is suppo= rted?

No there is no statement about whether it is supported or not.
I think we should at least return an error instead of crashing.


--000000000000301bc2063a899f62--