From: Thomas Monjalon <thomas@monjalon.net>
To: Dariusz Sosnowski <dsosnowski@nvidia.com>,
Ivan Malov <ivan.malov@arknetworks.am>
Cc: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk>,
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
dev@dpdk.org, rasland@nvidia.com, stable@dpdk.org,
Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
Suanming Mou <suanmingm@nvidia.com>,
Matan Azrad <matan@nvidia.com>
Subject: Re: [PATCH] net/mlx5: fix crash when secondary queries dev info after primary exits
Date: Mon, 21 Jul 2025 13:59:59 +0200 [thread overview]
Message-ID: <3358342.oiGErgHkdL@thomas> (raw)
In-Reply-To: <b0c702cc-1213-d773-0ee0-804542a07e28@arknetworks.am>
21/07/2025 13:46, Ivan Malov:
> On Mon, 21 Jul 2025, Dariusz Sosnowski wrote:
>
> > + mlx5 maintainers
> >
> > Thank you for the patch.
> >
> > Could you please include other PMD maintainers (or other maintainers, depending on changed code)
> > in the future patches?
> > There is a script which automatically adds maintainers while sending a patch.
> > It is described in: https://doc.dpdk.org/guides/contributing/patches.html#sending-patches
> >
> > On Mon, Jul 21, 2025 at 03:38:51AM -0400, Khadem Ullah wrote:
> >> When the primary process exits, the shared mlx5 state becomes
> >> unavailable to secondary processes. If a secondary process attempts
> >> to query device information (e.g., via testpmd), a NULL dereference
> >> may occur due to missing shared data.
> >>
> >> This patch adds a check for shared context availability and fails
> >> gracefully while preventing a crash.
> >>
> >> Fixes: e60fbd5b24fc ("mlx5: add device configure/start/stop")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk>
> >> ---
> >> drivers/net/mlx5/mlx5_ethdev.c | 6 ++++++
> >> 1 file changed, 6 insertions(+)
> >>
> >> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> >> index 68d1c1bfa7..1848f6536a 100644
> >> --- a/drivers/net/mlx5/mlx5_ethdev.c
> >> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> >> @@ -368,6 +368,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
> >> * Since we need one CQ per QP, the limit is the minimum number
> >> * between the two values.
> >> */
> >> + if (priv == NULL || priv->sh == NULL) {
> >> + DRV_LOG(ERR,
> >> + "mlx5 shared data unavailable (primary process likely exited)");
> >> + rte_errno = ENODEV;
> >> + return -rte_errno;
> >> + }
> >
> > I don't think it's an issue on PMD level, but rather on
> > ethdev/multi-process handling level.
>
> I may be very wrong here, but can't the PMD use its internal 'shared' state to
> somehow memorise the fact that a secondary process has attached, in order to not
> let the primary process close in the first place (before the secondary process
> detaches)? Or is this going to violate some established convention?
It looks to be a good idea.
> > When primary process closes the port, ethdev library zeroes and frees
> > device data shared between processes.
> > ethdev port data (rte_eth_dev) on secondary is not updated so it now points to
> > invalid data. rte_eth_dev_info_get() is not the only API call affected.
> >
> > If the primary process closes the port before exiting
> > (like testpmd does) and it exits before the secondary,
> > the any driver call seems invalid because of that use-after-free behavior.
> >
> > @Thomas, @Andrew - Do you happen to know if doing anything on ethdev ports
> > in secondary process after primary has gracefully exited is supported?
No there is no statement about whether it is supported or not.
I think we should at least return an error instead of crashing.
next prev parent reply other threads:[~2025-07-21 12:00 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-21 7:38 Khadem Ullah
2025-07-21 10:58 ` Dariusz Sosnowski
2025-07-21 11:38 ` [PATCH] net/mlx5: fix crash when using meter in transfer flow Khadem Ullah
2025-07-21 11:46 ` [PATCH] net/mlx5: fix crash when secondary queries dev info after primary exits Ivan Malov
2025-07-21 11:59 ` Thomas Monjalon [this message]
2025-07-21 14:01 ` Dariusz Sosnowski
2025-07-21 14:50 ` Stephen Hemminger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3358342.oiGErgHkdL@thomas \
--to=thomas@monjalon.net \
--cc=14pwcse1224@uetpeshawar.edu.pk \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=bingz@nvidia.com \
--cc=dev@dpdk.org \
--cc=dsosnowski@nvidia.com \
--cc=ivan.malov@arknetworks.am \
--cc=matan@nvidia.com \
--cc=orika@nvidia.com \
--cc=rasland@nvidia.com \
--cc=stable@dpdk.org \
--cc=suanmingm@nvidia.com \
--cc=viacheslavo@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).