I think at least this should be followed either in PMD or in ethdev layer to avoid this specific crashes. On Mon, Jul 21, 2025, 17:00 Thomas Monjalon wrote: > 21/07/2025 13:46, Ivan Malov: > > On Mon, 21 Jul 2025, Dariusz Sosnowski wrote: > > > > > + mlx5 maintainers > > > > > > Thank you for the patch. > > > > > > Could you please include other PMD maintainers (or other maintainers, > depending on changed code) > > > in the future patches? > > > There is a script which automatically adds maintainers while sending a > patch. > > > It is described in: > https://doc.dpdk.org/guides/contributing/patches.html#sending-patches > > > > > > On Mon, Jul 21, 2025 at 03:38:51AM -0400, Khadem Ullah wrote: > > >> When the primary process exits, the shared mlx5 state becomes > > >> unavailable to secondary processes. If a secondary process attempts > > >> to query device information (e.g., via testpmd), a NULL dereference > > >> may occur due to missing shared data. > > >> > > >> This patch adds a check for shared context availability and fails > > >> gracefully while preventing a crash. > > >> > > >> Fixes: e60fbd5b24fc ("mlx5: add device configure/start/stop") > > >> Cc: stable@dpdk.org > > >> > > >> Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > > >> --- > > >> drivers/net/mlx5/mlx5_ethdev.c | 6 ++++++ > > >> 1 file changed, 6 insertions(+) > > >> > > >> diff --git a/drivers/net/mlx5/mlx5_ethdev.c > b/drivers/net/mlx5/mlx5_ethdev.c > > >> index 68d1c1bfa7..1848f6536a 100644 > > >> --- a/drivers/net/mlx5/mlx5_ethdev.c > > >> +++ b/drivers/net/mlx5/mlx5_ethdev.c > > >> @@ -368,6 +368,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, > struct rte_eth_dev_info *info) > > >> * Since we need one CQ per QP, the limit is the minimum number > > >> * between the two values. > > >> */ > > >> + if (priv == NULL || priv->sh == NULL) { > > >> + DRV_LOG(ERR, > > >> + "mlx5 shared data unavailable (primary process likely > exited)"); > > >> + rte_errno = ENODEV; > > >> + return -rte_errno; > > >> + } > > > > > > I don't think it's an issue on PMD level, but rather on > > > ethdev/multi-process handling level. > > > > I may be very wrong here, but can't the PMD use its internal 'shared' > state to > > somehow memorise the fact that a secondary process has attached, in > order to not > > let the primary process close in the first place (before the secondary > process > > detaches)? Or is this going to violate some established convention? > > It looks to be a good idea. > > > > When primary process closes the port, ethdev library zeroes and frees > > > device data shared between processes. > > > ethdev port data (rte_eth_dev) on secondary is not updated so it now > points to > > > invalid data. rte_eth_dev_info_get() is not the only API call affected. > > > > > > If the primary process closes the port before exiting > > > (like testpmd does) and it exits before the secondary, > > > the any driver call seems invalid because of that use-after-free > behavior. > > > > > > @Thomas, @Andrew - Do you happen to know if doing anything on ethdev > ports > > > in secondary process after primary has gracefully exited is supported? > > No there is no statement about whether it is supported or not. > I think we should at least return an error instead of crashing. > > >