patches for DPDK stable branches
From: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk>
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Dariusz Sosnowski <dsosnowski@nvidia.com>,
	Ivan Malov <ivan.malov@arknetworks.am>,
	 Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	dev@dpdk.org, rasland@nvidia.com,  dpdk stable <stable@dpdk.org>,
	Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>,  Ori Kam <orika@nvidia.com>,
	Suanming Mou <suanmingm@nvidia.com>,
	Matan Azrad <matan@nvidia.com>
Subject: Re: [PATCH] net/mlx5: fix crash when secondary queries dev info after primary exits
Date: Wed, 23 Jul 2025 16:29:03 +0500
Message-ID: <CA++2-x5vmjQzUrSj_ihC5wQvicC=o8xJZSzazVGuGuRV=LG4Fg@mail.gmail.com>
In-Reply-To: <CA++2-x6o1QXgO=RiWk8vFcCRfs3ZCWc5G-yoebxCzPVzAZY9Gg@mail.gmail.com>

This patch is a logical continuation of the commit:

```
commit 30410493759f ("drivers/net: check process type in close operation")
```

In that change, @Thomas Monjalon <thomas@monjalon.net> introduced a
mechanism to prevent secondary processes from incorrectly releasing shared
resources during device close operations. That patch enforced process-type
checks in PMD close functions, ensuring only primary processes could manage
shared resources.
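
To illustrate the kind of check that commit describes, here is a minimal,
self-contained sketch. It does not reproduce the real driver code:
`enum proc_type`, `struct port` and `port_close()` are hypothetical stand-ins
for the EAL process type and a PMD close callback. In a real PMD the test
would presumably be made with rte_eal_process_type().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the EAL process type; not the real DPDK enum. */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Hypothetical per-port state. 'shared' models memory owned by the primary. */
struct port {
	enum proc_type proc; /* type of the calling process */
	int *shared;         /* resource shared between processes */
};

/* Close a port: only the primary process may release the shared resource. */
static int
port_close(struct port *p)
{
	assert(p != NULL);
	if (p->proc != PROC_PRIMARY)
		return 0; /* secondary: leave shared resources untouched */
	free(p->shared);
	p->shared = NULL;
	return 0;
}
```

With this guard, a close issued from a secondary returns success without
touching memory that the primary still owns.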

However, the secondary application does not only break on device close.
It also segfaults when "show device info all" is run from the secondary
after the primary closes:

testpmd> show device info all

********************* Infos for device 0000:03:00.0 *********************
Bus name: pci
Bus information: vendor_id=15b3, device_id=101d
Driver name: mlx5_pci
Devargs:
Connect to socket: 0

Segmentation fault (core dumped)

This patch prevents these crashes. It seems that the fix belongs in the
PMD as well as in the ethdev layer.
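
As a rough illustration of the guard (not the actual mlx5 code), the idea can
be sketched with stand-in types. `struct shared_ctx`, `struct priv`,
`struct dev`, `struct dev_info` and `dev_infos_get()` are hypothetical names
chosen for this example, and the queue limit is simplified to the minimum of
two counters, following the comment in the patched function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the driver structures; not the real mlx5 definitions. */
struct shared_ctx { unsigned int max_qp; unsigned int max_cq; };
struct priv { struct shared_ctx *sh; };
struct dev { struct priv *priv; };
struct dev_info { unsigned int max_rx_queues; };

/*
 * Fill 'info' from the shared context.
 * Returns 0 on success, or -ENODEV when the shared data is gone
 * (for example because the primary process has exited).
 */
static int
dev_infos_get(const struct dev *dev, struct dev_info *info)
{
	assert(dev != NULL && info != NULL);
	const struct priv *priv = dev->priv;

	/* Fail gracefully instead of dereferencing missing shared data. */
	if (priv == NULL || priv->sh == NULL)
		return -ENODEV;
	/* One CQ per QP is needed, so the limit is the smaller value. */
	info->max_rx_queues = priv->sh->max_qp < priv->sh->max_cq ?
			      priv->sh->max_qp : priv->sh->max_cq;
	return 0;
}
```

A caller in the secondary then sees an error code rather than a crash. Note
that a NULL check only helps when the pointer is actually reset; it cannot
detect a stale pointer to freed memory.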


On Wed, Jul 23, 2025 at 12:30 AM Khadem Ullah <
14pwcse1224@uetpeshawar.edu.pk> wrote:

> I think this should at least be handled either in the PMD or in the ethdev
> layer to avoid these specific crashes.
>
> On Mon, Jul 21, 2025, 17:00 Thomas Monjalon <thomas@monjalon.net> wrote:
>
>> 21/07/2025 13:46, Ivan Malov:
>> > On Mon, 21 Jul 2025, Dariusz Sosnowski wrote:
>> >
>> > > + mlx5 maintainers
>> > >
>> > > Thank you for the patch.
>> > >
>> > > Could you please include other PMD maintainers (or other maintainers,
>> > > depending on changed code) in the future patches?
>> > > There is a script which automatically adds maintainers while sending
>> > > a patch.
>> > > It is described in:
>> > > https://doc.dpdk.org/guides/contributing/patches.html#sending-patches
>> > >
>> > > On Mon, Jul 21, 2025 at 03:38:51AM -0400, Khadem Ullah wrote:
>> > >> When the primary process exits, the shared mlx5 state becomes
>> > >> unavailable to secondary processes. If a secondary process attempts
>> > >> to query device information (e.g., via testpmd), a NULL dereference
>> > >> may occur due to missing shared data.
>> > >>
>> > >> This patch adds a check for shared context availability and fails
>> > >> gracefully while preventing a crash.
>> > >>
>> > >> Fixes: e60fbd5b24fc ("mlx5: add device configure/start/stop")
>> > >> Cc: stable@dpdk.org
>> > >>
>> > >> Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk>
>> > >> ---
>> > >>  drivers/net/mlx5/mlx5_ethdev.c | 6 ++++++
>> > >>  1 file changed, 6 insertions(+)
>> > >>
>> > >> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
>> > >> index 68d1c1bfa7..1848f6536a 100644
>> > >> --- a/drivers/net/mlx5/mlx5_ethdev.c
>> > >> +++ b/drivers/net/mlx5/mlx5_ethdev.c
>> > >> @@ -368,6 +368,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
>> > >>     * Since we need one CQ per QP, the limit is the minimum number
>> > >>     * between the two values.
>> > >>     */
>> > >> +  if (priv == NULL || priv->sh == NULL) {
>> > >> +          DRV_LOG(ERR,
>> > >> +          "mlx5 shared data unavailable (primary process likely exited)");
>> > >> +          rte_errno = ENODEV;
>> > >> +          return -rte_errno;
>> > >> +  }
>> > >
>> > > I don't think it's an issue on PMD level, but rather on
>> > > ethdev/multi-process handling level.
>> >
>> > I may be very wrong here, but can't the PMD use its internal 'shared' state to
>> > somehow memorise the fact that a secondary process has attached, in order to not
>> > let the primary process close in the first place (before the secondary process
>> > detaches)? Or is this going to violate some established convention?
>>
>> It looks to be a good idea.
>>
>> > > When primary process closes the port, ethdev library zeroes and frees
>> > > device data shared between processes.
>> > > ethdev port data (rte_eth_dev) on secondary is not updated so it now points to
>> > > invalid data. rte_eth_dev_info_get() is not the only API call affected.
>> > >
>> > > If the primary process closes the port before exiting
>> > > (like testpmd does) and it exits before the secondary,
>> > > then any driver call seems invalid because of that use-after-free behavior.
>> > >
>> > > @Thomas, @Andrew - Do you happen to know if doing anything on ethdev ports
>> > > in secondary process after primary has gracefully exited is supported?
>>
>> No there is no statement about whether it is supported or not.
>> I think we should at least return an error instead of crashing.
>>
>>
>>

-- 
Engr. Khadem Ullah,
Software Engineer,
Dreambig Semiconductor Inc
https://dreambigsemi.com/
