* [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer @ 2025-07-22 11:54 Khadem Ullah 2025-07-22 13:39 ` Stephen Hemminger ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Khadem Ullah @ 2025-07-22 11:54 UTC (permalink / raw) To: thomas, ferruh.yigit, andrew.rybchenko; +Cc: dev, stable, Khadem Ullah In secondary processes, accessing 'dev->data->dev_private' directly can cause a segmentation fault if the primary process has exited or the shared memory is unavailable. This patch adds a check for dev/data/dev_private and uses rte_mem_virt2phy to ensure the pointer is still valid. Fixes: bdad90d12ec8 ('ethdev: change device info get callback to return int') Cc: stable@dpdk.org Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> --- lib/ethdev/rte_ethdev.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index dd7c00bc94..03ef446a96 100644 --- a/lib/ethdev/rte_ethdev.c +++ b/lib/ethdev/rte_ethdev.c @@ -4079,6 +4079,15 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) if (dev->dev_ops->dev_infos_get == NULL) return -ENOTSUP; + if (rte_eal_process_type() == RTE_PROC_SECONDARY && + (dev == NULL || dev->data == NULL || + dev->data->dev_private == NULL || + rte_mem_virt2phy(dev->data->dev_private) == RTE_BAD_PHYS_ADDR)) { + RTE_ETHDEV_LOG_LINE(ERR, + "Secondary: dev_private not accessible (primary exited?)"); + rte_errno = ENODEV; + return -rte_errno; + } diag = dev->dev_ops->dev_infos_get(dev, dev_info); if (diag != 0) { /* Cleanup already filled in device information */ -- 2.43.0 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 11:54 [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer Khadem Ullah @ 2025-07-22 13:39 ` Stephen Hemminger 2025-07-22 14:30 ` Khadem Ullah 2025-07-23 4:29 ` Khadem Ullah 2025-07-23 4:50 ` [PATCH v2] " Khadem Ullah 2 siblings, 1 reply; 25+ messages in thread From: Stephen Hemminger @ 2025-07-22 13:39 UTC (permalink / raw) To: Khadem Ullah; +Cc: thomas, ferruh.yigit, andrew.rybchenko, dev, stable On Tue, 22 Jul 2025 07:54:39 -0400 Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > + (dev == NULL || dev->data == NULL || > + dev->data->dev_private == NULL || dev can't be NULL and checking it here will cause a Coverity warning. There are many other ethdev calls that will fail if primary dies. stats, xstats, rx/tx burst, ... I don't think it is good idea to add checks here. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 13:39 ` Stephen Hemminger @ 2025-07-22 14:30 ` Khadem Ullah 2025-07-22 15:42 ` Stephen Hemminger 0 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-22 14:30 UTC (permalink / raw) To: Stephen Hemminger Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable [-- Attachment #1: Type: text/plain, Size: 723 bytes --] Hi Stephen, Can we add only the check that fixes the segfault, or do you mean that it should be fixed at the PMD level? Best regards, Khadem On Tue, Jul 22, 2025, 18:39 Stephen Hemminger <stephen@networkplumber.org> wrote: > On Tue, 22 Jul 2025 07:54:39 -0400 > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > + (dev == NULL || dev->data == NULL || > > + dev->data->dev_private == NULL || > > dev can't be NULL and checking it here will cause a Coverity warning. > > There are many other ethdev calls that will fail if primary dies. > stats, xstats, rx/tx burst, ... > > I don't think it is good idea to add checks here. > [-- Attachment #2: Type: text/html, Size: 1290 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 14:30 ` Khadem Ullah @ 2025-07-22 15:42 ` Stephen Hemminger 2025-07-22 16:01 ` Khadem Ullah 0 siblings, 1 reply; 25+ messages in thread From: Stephen Hemminger @ 2025-07-22 15:42 UTC (permalink / raw) To: Khadem Ullah Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable On Tue, 22 Jul 2025 19:30:41 +0500 Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > Hi Stephen, > Can we add only the check that fixes the segfault, or do you mean that it > should be fixed at the PMD level? > > Best regards, > Khadem > > On Tue, Jul 22, 2025, 18:39 Stephen Hemminger <stephen@networkplumber.org> > wrote: > > > On Tue, 22 Jul 2025 07:54:39 -0400 > > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > > + (dev == NULL || dev->data == NULL || > > > + dev->data->dev_private == NULL || > > > > dev can't be NULL and checking it here will cause a Coverity warning. > > > > There are many other ethdev calls that will fail if primary dies. > > stats, xstats, rx/tx burst, ... > > > > I don't think it is good idea to add checks here. > > It needs to be fixed at the documentation level. Make sure and document what applications need to do. Rather than adding more checks in ethdev. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 15:42 ` Stephen Hemminger @ 2025-07-22 16:01 ` Khadem Ullah 2025-07-22 16:13 ` Bruce Richardson 0 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-22 16:01 UTC (permalink / raw) To: Stephen Hemminger Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable [-- Attachment #1: Type: text/plain, Size: 1682 bytes --] Thanks for the follow up. Understood. That makes sense. However, I’d like to highlight that applications should ideally be robust and interactive enough to handle all edge cases where a segfault or unexpected error might occur. While clear documentation is certainly important, relying solely on it may not be sufficient. In my view, potential segfaults should be handled explicitly in code to ensure stability and to prevent silent failures, especially in production environments. On Tue, Jul 22, 2025, 20:42 Stephen Hemminger <stephen@networkplumber.org> wrote: > On Tue, 22 Jul 2025 19:30:41 +0500 > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > Hi Stephen, > > Can we add only the check that fixes the segfault, or do you mean that it > > should be fixed at the PMD level? > > > > Best regards, > > Khadem > > > > On Tue, Jul 22, 2025, 18:39 Stephen Hemminger < > stephen@networkplumber.org> > > wrote: > > > > > On Tue, 22 Jul 2025 07:54:39 -0400 > > > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > > > > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > > > + (dev == NULL || dev->data == NULL || > > > > + dev->data->dev_private == NULL || > > > > > > dev can't be NULL and checking it here will cause a Coverity warning. > > > > > > There are many other ethdev calls that will fail if primary dies. > > > stats, xstats, rx/tx burst, ... > > > > > > I don't think it is good idea to add checks here. > > > > > It needs to be fixed at the documentation level. > Make sure and document what applications need to do. Rather than adding > more checks in ethdev. > [-- Attachment #2: Type: text/html, Size: 2510 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 16:01 ` Khadem Ullah @ 2025-07-22 16:13 ` Bruce Richardson 2025-07-22 17:04 ` Khadem Ullah 0 siblings, 1 reply; 25+ messages in thread From: Bruce Richardson @ 2025-07-22 16:13 UTC (permalink / raw) To: Khadem Ullah Cc: Stephen Hemminger, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable On Tue, Jul 22, 2025 at 09:01:42PM +0500, Khadem Ullah wrote: > Thanks for the follow up. > Understood. That makes sense. However, I’d like to highlight that > applications should ideally be robust and interactive enough to handle > all edge cases where a segfault or unexpected error might occur. While > clear documentation is certainly important, relying solely on it may > not be sufficient. In my view, potential segfaults should be handled > explicitly in code to ensure stability and to prevent silent failures, > especially in production environments. > In fairness, where stability is the main concern, I'd generally recommend avoiding multi-process entirely. Or, if it has to be used, the primary process should be a minimal slim one, that sets up the ports and memory and thereafter sleeps so that it should never crash unexpectedly! Even with that, if any secondary process dies, you'll still have all the buffers in use by that secondary process leaked, so for any multiprocess system the only safe behaviour for the system is to restart all processes if any process unexpectedly terminate. /Bruce ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 16:13 ` Bruce Richardson @ 2025-07-22 17:04 ` Khadem Ullah 2025-07-22 17:38 ` Stephen Hemminger 0 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-22 17:04 UTC (permalink / raw) To: Bruce Richardson Cc: Stephen Hemminger, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable [-- Attachment #1: Type: text/plain, Size: 1504 bytes --] Agree, but I think it's also a good practice to guard against known cases that are prone to crashes. On Tue, Jul 22, 2025 at 9:14 PM Bruce Richardson <bruce.richardson@intel.com> wrote: > On Tue, Jul 22, 2025 at 09:01:42PM +0500, Khadem Ullah wrote: > > Thanks for the follow up. > > Understood. That makes sense. However, I’d like to highlight that > > applications should ideally be robust and interactive enough to handle > > all edge cases where a segfault or unexpected error might occur. While > > clear documentation is certainly important, relying solely on it may > > not be sufficient. In my view, potential segfaults should be handled > > explicitly in code to ensure stability and to prevent silent failures, > > especially in production environments. > > > In fairness, where stability is the main concern, I'd generally recommend > avoiding multi-process entirely. Or, if it has to be used, the primary > process should be a minimal slim one, that sets up the ports and memory and > thereafter sleeps so that it should never crash unexpectedly! > > Even with that, if any secondary process dies, you'll still have all the > buffers in use by that secondary process leaked, so for any multiprocess > system the only safe behaviour for the system is to restart all processes > if any process unexpectedly terminate. > > /Bruce > -- Engr. Khadem Ullah, Software Engineer, Dreambig Semiconductor Inc https://dreambigsemi.com/ [-- Attachment #2: Type: text/html, Size: 2532 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 17:04 ` Khadem Ullah @ 2025-07-22 17:38 ` Stephen Hemminger 2025-07-22 17:53 ` Khadem Ullah 0 siblings, 1 reply; 25+ messages in thread From: Stephen Hemminger @ 2025-07-22 17:38 UTC (permalink / raw) To: Khadem Ullah Cc: Bruce Richardson, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable On Tue, 22 Jul 2025 22:04:32 +0500 Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > Agree, but I think it's also a good practice to guard against known cases > that are prone to crashes. Right but DPDK chooses performance over API safety. For example rx/tx burst doesn't check args. The point is that as a library, if application is doing something wrong returning error doesn't always help. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 17:38 ` Stephen Hemminger @ 2025-07-22 17:53 ` Khadem Ullah 2025-07-22 18:21 ` Stephen Hemminger 0 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-22 17:53 UTC (permalink / raw) To: Stephen Hemminger Cc: Bruce Richardson, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable [-- Attachment #1: Type: text/plain, Size: 727 bytes --] Right, but performance and reliability are both important. While DPDK rightly prioritizes performance, some level of reliability should still be ensured, especially to catch known issues that could lead to instability. On Tue, Jul 22, 2025, 22:38 Stephen Hemminger <stephen@networkplumber.org> wrote: > On Tue, 22 Jul 2025 22:04:32 +0500 > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > Agree, but I think it's also a good practice to guard against known cases > > that are prone to crashes. > > > Right but DPDK chooses performance over API safety. > For example rx/tx burst doesn't check args. > > The point is that as a library, if application is doing something wrong > returning error doesn't always help. > [-- Attachment #2: Type: text/html, Size: 1164 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 17:53 ` Khadem Ullah @ 2025-07-22 18:21 ` Stephen Hemminger 2025-07-22 19:03 ` Khadem Ullah 2025-07-22 19:05 ` Ivan Malov 0 siblings, 2 replies; 25+ messages in thread From: Stephen Hemminger @ 2025-07-22 18:21 UTC (permalink / raw) To: Khadem Ullah Cc: Bruce Richardson, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable On Tue, 22 Jul 2025 22:53:08 +0500 Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > Right, but performance and reliability are both important. While DPDK > rightly prioritizes performance, some level of reliability should still be > ensured, especially to catch known issues that could lead to instability. > > On Tue, Jul 22, 2025, 22:38 Stephen Hemminger <stephen@networkplumber.org> > wrote: > > > On Tue, 22 Jul 2025 22:04:32 +0500 > > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > > > Agree, but I think it's also a good practice to guard against known cases > > > that are prone to crashes. > > > > > > Right but DPDK chooses performance over API safety. > > For example rx/tx burst doesn't check args. > > > > The point is that as a library, if application is doing something wrong > > returning error doesn't always help. > > The problem is that all those values dev->data and private are shared between processes without any locking. If the API's are going to MP safe then they would require locking. The DPDK has made an explicit decision to not use locking in ethdev control or data path. You can get away with checking for dev->data being NULL on x86 where there is data consistency. But on weakly ordered platforms that is not going to work. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 18:21 ` Stephen Hemminger @ 2025-07-22 19:03 ` Khadem Ullah 2025-07-22 19:05 ` Ivan Malov 1 sibling, 0 replies; 25+ messages in thread From: Khadem Ullah @ 2025-07-22 19:03 UTC (permalink / raw) To: Stephen Hemminger Cc: Bruce Richardson, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable [-- Attachment #1: Type: text/plain, Size: 1746 bytes --] Right, those structures are accessed within each PMD, and I believe this shouldn't pose issues even on weakly ordered systems. As long as the checks are implemented using proper DPDK APIs, we can ensure correctness by enforcing appropriate memory ordering where needed, without introducing full locking overhead. On Tue, Jul 22, 2025, 23:21 Stephen Hemminger <stephen@networkplumber.org> wrote: > On Tue, 22 Jul 2025 22:53:08 +0500 > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > Right, but performance and reliability are both important. While DPDK > > rightly prioritizes performance, some level of reliability should still > be > > ensured, especially to catch known issues that could lead to instability. > > > > On Tue, Jul 22, 2025, 22:38 Stephen Hemminger < > stephen@networkplumber.org> > > wrote: > > > > > On Tue, 22 Jul 2025 22:04:32 +0500 > > > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > > > > > Agree, but I think it's also a good practice to guard against known > cases > > > > that are prone to crashes. > > > > > > > > > Right but DPDK chooses performance over API safety. > > > For example rx/tx burst doesn't check args. > > > > > > The point is that as a library, if application is doing something wrong > > > returning error doesn't always help. > > > > > The problem is that all those values dev->data and private are shared > between processes without any locking. If the API's are going to MP safe > then they would require locking. The DPDK has made an explicit decision > to not use locking in ethdev control or data path. > > You can get away with checking for dev->data being NULL on x86 where > there is data consistency. But on weakly ordered platforms that is not > going > to work. > [-- Attachment #2: Type: text/html, Size: 2546 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 18:21 ` Stephen Hemminger 2025-07-22 19:03 ` Khadem Ullah @ 2025-07-22 19:05 ` Ivan Malov 2025-07-22 22:28 ` Stephen Hemminger 1 sibling, 1 reply; 25+ messages in thread From: Ivan Malov @ 2025-07-22 19:05 UTC (permalink / raw) To: Stephen Hemminger Cc: Khadem Ullah, Bruce Richardson, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable There is a difference between control path and data path. Always has been. Yes, on data path, DPDK has historically sought better performance, but on the slow path, checks have typically been implemented, even in the flow API, with the only exception being "asynchronous flow" APIs, which are meant to be fast-path. Yes, the idea to have a "secondary process reference counter" in 'rte_device' to be either guarded with its own lock or accessed atomically by 'rte_dev_probe' and 'rte_dev_remove' (to increment and decrement/check respectively) as well as by 'rte_eth_dev_close' and 'rte_eth_dev_reset' (to decrement/check) may not be a hill to die on, to be honest, and might be wrong, so I have no strong opinion. What scares me most in this idea is that, one may still end up with certain entry points overlooked, rendering the whole effort worthless. Thank you. On Tue, 22 Jul 2025, Stephen Hemminger wrote: > On Tue, 22 Jul 2025 22:53:08 +0500 > Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > >> Right, but performance and reliability are both important. While DPDK >> rightly prioritizes performance, some level of reliability should still be >> ensured, especially to catch known issues that could lead to instability. >> >> On Tue, Jul 22, 2025, 22:38 Stephen Hemminger <stephen@networkplumber.org> >> wrote: >> >>> On Tue, 22 Jul 2025 22:04:32 +0500 >>> Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: >>> >>>> Agree, but I think it's also a good practice to guard against known cases >>>> that are prone to crashes. >>> >>> >>> Right but DPDK chooses performance over API safety. >>> For example rx/tx burst doesn't check args. >>> >>> The point is that as a library, if application is doing something wrong >>> returning error doesn't always help. >>> > > The problem is that all those values dev->data and private are shared > between processes without any locking. If the API's are going to MP safe > then they would require locking. The DPDK has made an explicit decision > to not use locking in ethdev control or data path. > > You can get away with checking for dev->data being NULL on x86 where > there is data consistency. But on weakly ordered platforms that is not going > to work. > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 19:05 ` Ivan Malov @ 2025-07-22 22:28 ` Stephen Hemminger 0 siblings, 0 replies; 25+ messages in thread From: Stephen Hemminger @ 2025-07-22 22:28 UTC (permalink / raw) To: Ivan Malov Cc: Khadem Ullah, Bruce Richardson, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable On Tue, 22 Jul 2025 23:05:06 +0400 (+04) Ivan Malov <ivan.malov@arknetworks.am> wrote: > There is a difference between control path and data path. Always has been. Yes, > on data path, DPDK has historically sought better performance, but on the slow > path, checks have typically been implemented, even in the flow API, with the > only exception being "asynchronous flow" APIs, which are meant to be fast-path. > > Yes, the idea to have a "secondary process reference counter" in 'rte_device' > to be either guarded with its own lock or accessed atomically by 'rte_dev_probe' > and 'rte_dev_remove' (to increment and decrement/check respectively) as well as > by 'rte_eth_dev_close' and 'rte_eth_dev_reset' (to decrement/check) may not be > a hill to die on, to be honest, and might be wrong, so I have no strong opinion. > > What scares me most in this idea is that, one may still end up with certain > entry points overlooked, rendering the whole effort worthless. > Please don't top post. The DPDK control has (up to now) assumed that control operations are only done from a single thread on each port. There is also the issue of hotplug but that is separate. For example, if two threads start and stop the same port bad thing happen and NIC driver's break. This is not well documented and a section needs to go into programmer's guide thread safety. The whole thread safety section is out of date, and doesn't reference RCU when it should. It also doesn't cover hot plug or weird secondary processes that fork. There is also the issue of how primary/secondary monitoring work. Right now the secondary monitors primary by periodically polling a lock file. This inherently a racy method and leads to problems. It needs to be redesigned to use a blocking method something like spawning a thread in secondary that uses some part of the existing Unix domain IPC to get notification when primary crashes or wants to exit. Ideally it would support synchronous handshake with all primaries and asynchronous case when primary crashes. The point is that bandaid's in the ethdev layer won't fix it well enough. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 11:54 [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer Khadem Ullah 2025-07-22 13:39 ` Stephen Hemminger @ 2025-07-23 4:29 ` Khadem Ullah 2025-07-23 4:50 ` [PATCH v2] " Khadem Ullah 2 siblings, 0 replies; 25+ messages in thread From: Khadem Ullah @ 2025-07-23 4:29 UTC (permalink / raw) To: thomas, ferruh.yigit, andrew.rybchenko; +Cc: dev, stable, Khadem Ullah In secondary processes, directly accessing 'dev->data->dev_private' can cause a segmentation fault if the primary process has exited or if the shared memory is no longer accessible. This patch adds a safety check using rte_mem_virt2phy(), with an unlikely() branch hint to minimize performance impact in the fast path. This ensures 'dev_private' is still valid before accessing it. Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return int") Cc: stable@dpdk.org Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> --- lib/ethdev/rte_ethdev.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index dd7c00bc94..ef5dc55f2e 100644 --- a/lib/ethdev/rte_ethdev.c +++ b/lib/ethdev/rte_ethdev.c @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) if (dev->dev_ops->dev_infos_get == NULL) return -ENOTSUP; + if (rte_eal_process_type() == RTE_PROC_SECONDARY && + unlikely(rte_mem_virt2phy(dev->data->dev_private) == RTE_BAD_PHYS_ADDR)) { + RTE_ETHDEV_LOG_LINE(ERR, + "Secondary: dev_private not accessible (primary exited?)"); + rte_errno = ENODEV; + return -rte_errno; + } diag = dev->dev_ops->dev_infos_get(dev, dev_info); if (diag != 0) { /* Cleanup already filled in device information */ -- 2.43.0 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-22 11:54 [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer Khadem Ullah 2025-07-22 13:39 ` Stephen Hemminger 2025-07-23 4:29 ` Khadem Ullah @ 2025-07-23 4:50 ` Khadem Ullah 2025-07-23 12:19 ` Khadem Ullah ` (2 more replies) 2 siblings, 3 replies; 25+ messages in thread From: Khadem Ullah @ 2025-07-23 4:50 UTC (permalink / raw) To: stephen, thomas, ferruh.yigit, andrew.rybchenko; +Cc: dev, stable, Khadem Ullah In secondary processes, directly accessing 'dev->data->dev_private' can cause a segmentation fault if the primary process has exited or if the shared memory is no longer accessible. This patch adds a safety check using rte_mem_virt2phy(), with an unlikely() branch hint to minimize performance impact in the fast path. This ensures 'dev_private' is still valid before accessing it. Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return int") Cc: stable@dpdk.org Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> --- lib/ethdev/rte_ethdev.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index dd7c00bc94..ef5dc55f2e 100644 --- a/lib/ethdev/rte_ethdev.c +++ b/lib/ethdev/rte_ethdev.c @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) if (dev->dev_ops->dev_infos_get == NULL) return -ENOTSUP; + if (rte_eal_process_type() == RTE_PROC_SECONDARY && + unlikely(rte_mem_virt2phy(dev->data->dev_private) == RTE_BAD_PHYS_ADDR)) { + RTE_ETHDEV_LOG_LINE(ERR, + "Secondary: dev_private not accessible (primary exited?)"); + rte_errno = ENODEV; + return -rte_errno; + } diag = dev->dev_ops->dev_infos_get(dev, dev_info); if (diag != 0) { /* Cleanup already filled in device information */ -- 2.43.0 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 4:50 ` [PATCH v2] " Khadem Ullah @ 2025-07-23 12:19 ` Khadem Ullah 2025-07-23 13:13 ` Khadem Ullah 2025-07-23 13:10 ` [PATCH] [PATCH v3] " Khadem Ullah 2025-07-23 14:21 ` [PATCH v2] " Stephen Hemminger 2 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-23 12:19 UTC (permalink / raw) To: stephen, thomas, ferruh.yigit, andrew.rybchenko; +Cc: dev, stable [-- Attachment #1: Type: text/plain, Size: 2453 bytes --] Secondary application not only breaking on device closing, it's also getting segfault when we do "show device info all" from secondary after primary closes: testpmd> show device info all ********************* Infos for device 0000:03:00.0 ********************* Bus name: pci Bus information: vendor_id=15b3, device_id=101d Driver name: mlx5_pci Devargs: Connect to socket: 0 Segmentation fault (core dumped) This patch prevents these crashes and it seems that these fixes should be in PMD along with the ethdev layer. Some more checks will be added in the next version to prevent "show device info all" crash. On Wed, Jul 23, 2025 at 9:50 AM Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > In secondary processes, directly accessing 'dev->data->dev_private' can > cause a segmentation fault if the primary process has exited or if the > shared memory is no longer accessible. > > This patch adds a safety check using rte_mem_virt2phy(), with an > unlikely() branch hint to minimize performance impact in the fast path. > This ensures 'dev_private' is still valid before accessing it. > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return > int") > Cc: stable@dpdk.org > > Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > --- > lib/ethdev/rte_ethdev.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c > index dd7c00bc94..ef5dc55f2e 100644 > --- a/lib/ethdev/rte_ethdev.c > +++ b/lib/ethdev/rte_ethdev.c > @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct > rte_eth_dev_info *dev_info) > > if (dev->dev_ops->dev_infos_get == NULL) > return -ENOTSUP; > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > + unlikely(rte_mem_virt2phy(dev->data->dev_private) == > RTE_BAD_PHYS_ADDR)) { > + RTE_ETHDEV_LOG_LINE(ERR, > + "Secondary: dev_private not accessible (primary > exited?)"); > + rte_errno = ENODEV; > + return -rte_errno; > + } > diag = dev->dev_ops->dev_infos_get(dev, dev_info); > if (diag != 0) { > /* Cleanup already filled in device information */ > -- > 2.43.0 > > -- Engr. Khadem Ullah, Software Engineer, Dreambig Semiconductor Inc https://dreambigsemi.com/ [-- Attachment #2: Type: text/html, Size: 3832 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 12:19 ` Khadem Ullah @ 2025-07-23 13:13 ` Khadem Ullah 2025-07-23 13:24 ` Ivan Malov 0 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-23 13:13 UTC (permalink / raw) To: stephen, thomas, ferruh.yigit, andrew.rybchenko; +Cc: dev, stable [-- Attachment #1: Type: text/plain, Size: 2773 bytes --] Please check v3 of this patch. On Wed, Jul 23, 2025 at 5:19 PM Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > Secondary application not only breaking on device closing, > it's also getting segfault when we do "show device info all" from secondary > after primary closes: > > testpmd> show device info all > > ********************* Infos for device 0000:03:00.0 ********************* > Bus name: pci > Bus information: vendor_id=15b3, device_id=101d > Driver name: mlx5_pci > Devargs: > Connect to socket: 0 > > Segmentation fault (core dumped) > > This patch prevents these crashes and it seems that these fixes should be in PMD along with the ethdev layer. Some more checks will be added in the next version to prevent "show device info all" crash. > > > > On Wed, Jul 23, 2025 at 9:50 AM Khadem Ullah < > 14pwcse1224@uetpeshawar.edu.pk> wrote: > >> In secondary processes, directly accessing 'dev->data->dev_private' can >> cause a segmentation fault if the primary process has exited or if the >> shared memory is no longer accessible. >> >> This patch adds a safety check using rte_mem_virt2phy(), with an >> unlikely() branch hint to minimize performance impact in the fast path. >> This ensures 'dev_private' is still valid before accessing it. >> >> Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return >> int") >> Cc: stable@dpdk.org >> >> Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> >> --- >> lib/ethdev/rte_ethdev.c | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c >> index dd7c00bc94..ef5dc55f2e 100644 >> --- a/lib/ethdev/rte_ethdev.c >> +++ b/lib/ethdev/rte_ethdev.c >> @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct >> rte_eth_dev_info *dev_info) >> >> if (dev->dev_ops->dev_infos_get == NULL) >> return -ENOTSUP; >> + if (rte_eal_process_type() == RTE_PROC_SECONDARY && >> + unlikely(rte_mem_virt2phy(dev->data->dev_private) == >> RTE_BAD_PHYS_ADDR)) { >> + RTE_ETHDEV_LOG_LINE(ERR, >> + "Secondary: dev_private not accessible (primary >> exited?)"); >> + rte_errno = ENODEV; >> + return -rte_errno; >> + } >> diag = dev->dev_ops->dev_infos_get(dev, dev_info); >> if (diag != 0) { >> /* Cleanup already filled in device information */ >> -- >> 2.43.0 >> >> > > -- > Engr. Khadem Ullah, > Software Engineer, > Dreambig Semiconductor Inc > https://dreambigsemi.com/ > -- Engr. Khadem Ullah, Software Engineer, Dreambig Semiconductor Inc https://dreambigsemi.com/ [-- Attachment #2: Type: text/html, Size: 4968 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 13:13 ` Khadem Ullah @ 2025-07-23 13:24 ` Ivan Malov 2025-07-23 13:26 ` Khadem Ullah 0 siblings, 1 reply; 25+ messages in thread From: Ivan Malov @ 2025-07-23 13:24 UTC (permalink / raw) To: Khadem Ullah; +Cc: stephen, thomas, ferruh.yigit, andrew.rybchenko, dev, stable [-- Attachment #1: Type: text/plain, Size: 3233 bytes --] Hi Khadem, On Wed, 23 Jul 2025, Khadem Ullah wrote: > Please check v3 of this patch. Checked. In the meantime, you're not overlooking to mark previous versions [1] with status 'SUPERSEDED' in DPDK Patchwork, are you? [1] https://patches.dpdk.org/project/dpdk/patch/20250723045022.1580829-1-14pwcse1224@uetpeshawar.edu.pk/ > > On Wed, Jul 23, 2025 at 5:19 PM Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > Secondary application not only breaking on device closing, > it's also getting segfault when we do "show device info all" from secondary > after primary closes: > testpmd> show device info all > ********************* Infos for device 0000:03:00.0 ********************* > Bus name: pci > Bus information: vendor_id=15b3, device_id=101d > Driver name: mlx5_pci > Devargs: > Connect to socket: 0 > Segmentation fault (core dumped) > This patch prevents these crashes and it seems that these fixes should be in PMD along with the ethdev layer. Some more checks will be added in the next version to prevent "show device i > nfo all" crash. > > > On Wed, Jul 23, 2025 at 9:50 AM Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > In secondary processes, directly accessing 'dev->data->dev_private' can > cause a segmentation fault if the primary process has exited or if the > shared memory is no longer accessible. > > This patch adds a safety check using rte_mem_virt2phy(), with an > unlikely() branch hint to minimize performance impact in the fast path. > This ensures 'dev_private' is still valid before accessing it. > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return int") > Cc: stable@dpdk.org > > Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > --- > lib/ethdev/rte_ethdev.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c > index dd7c00bc94..ef5dc55f2e 100644 > --- a/lib/ethdev/rte_ethdev.c > +++ b/lib/ethdev/rte_ethdev.c > @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) > > if (dev->dev_ops->dev_infos_get == NULL) > return -ENOTSUP; > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > + unlikely(rte_mem_virt2phy(dev->data->dev_private) == RTE_BAD_PHYS_ADDR)) { > + RTE_ETHDEV_LOG_LINE(ERR, > + "Secondary: dev_private not accessible (primary exited?)"); > + rte_errno = ENODEV; > + return -rte_errno; > + } > diag = dev->dev_ops->dev_infos_get(dev, dev_info); > if (diag != 0) { > /* Cleanup already filled in device information */ > -- > 2.43.0 > > > > -- > Engr. Khadem Ullah, > Software Engineer, > Dreambig Semiconductor Inc > https://dreambigsemi.com/ > > > > -- > Engr. Khadem Ullah, > Software Engineer, > Dreambig Semiconductor Inc > https://dreambigsemi.com/ > > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 13:24 ` Ivan Malov @ 2025-07-23 13:26 ` Khadem Ullah 2025-07-23 13:31 ` Ivan Malov 0 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-23 13:26 UTC (permalink / raw) To: Ivan Malov Cc: Stephen Hemminger, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable [-- Attachment #1: Type: text/plain, Size: 3522 bytes --] Thanks, no. Can, I ? On Wed, Jul 23, 2025, 18:24 Ivan Malov <ivan.malov@arknetworks.am> wrote: > Hi Khadem, > > On Wed, 23 Jul 2025, Khadem Ullah wrote: > > > Please check v3 of this patch. > > Checked. In the meantime, you're not overlooking to mark previous versions > [1] > with status 'SUPERSEDED' in DPDK Patchwork, are you? > > [1] > https://patches.dpdk.org/project/dpdk/patch/20250723045022.1580829-1-14pwcse1224@uetpeshawar.edu.pk/ > > > > > On Wed, Jul 23, 2025 at 5:19 PM Khadem Ullah < > 14pwcse1224@uetpeshawar.edu.pk> wrote: > > > > Secondary application not only breaking on device closing, > > it's also getting segfault when we do "show device info all" from > secondary > > after primary closes: > > testpmd> show device info all > > ********************* Infos for device 0000:03:00.0 ********************* > > Bus name: pci > > Bus information: vendor_id=15b3, device_id=101d > > Driver name: mlx5_pci > > Devargs: > > Connect to socket: 0 > > Segmentation fault (core dumped) > > This patch prevents these crashes and it seems that these fixes should > be in PMD along with the ethdev layer. Some more checks will be added in > the next version to prevent "show device i > > nfo all" crash. > > > > > > On Wed, Jul 23, 2025 at 9:50 AM Khadem Ullah < > 14pwcse1224@uetpeshawar.edu.pk> wrote: > > In secondary processes, directly accessing > 'dev->data->dev_private' can > > cause a segmentation fault if the primary process has exited or if > the > > shared memory is no longer accessible. > > > > This patch adds a safety check using rte_mem_virt2phy(), with an > > unlikely() branch hint to minimize performance impact in the fast > path. > > This ensures 'dev_private' is still valid before accessing it. > > > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to > return int") > > Cc: stable@dpdk.org > > > > Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > > --- > > lib/ethdev/rte_ethdev.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c > > index dd7c00bc94..ef5dc55f2e 100644 > > --- a/lib/ethdev/rte_ethdev.c > > +++ b/lib/ethdev/rte_ethdev.c > > @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, > struct rte_eth_dev_info *dev_info) > > > > if (dev->dev_ops->dev_infos_get == NULL) > > return -ENOTSUP; > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > + unlikely(rte_mem_virt2phy(dev->data->dev_private) > == RTE_BAD_PHYS_ADDR)) { > > + RTE_ETHDEV_LOG_LINE(ERR, > > + "Secondary: dev_private not accessible > (primary exited?)"); > > + rte_errno = ENODEV; > > + return -rte_errno; > > + } > > diag = dev->dev_ops->dev_infos_get(dev, dev_info); > > if (diag != 0) { > > /* Cleanup already filled in device information */ > > -- > > 2.43.0 > > > > > > > > -- > > Engr. Khadem Ullah, > > Software Engineer, > > Dreambig Semiconductor Inc > > https://dreambigsemi.com/ > > > > > > > > -- > > Engr. Khadem Ullah, > > Software Engineer, > > Dreambig Semiconductor Inc > > https://dreambigsemi.com/ > > > > [-- Attachment #2: Type: text/html, Size: 5182 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 13:26 ` Khadem Ullah @ 2025-07-23 13:31 ` Ivan Malov 0 siblings, 0 replies; 25+ messages in thread From: Ivan Malov @ 2025-07-23 13:31 UTC (permalink / raw) To: Khadem Ullah Cc: Stephen Hemminger, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable [-- Attachment #1: Type: text/plain, Size: 4393 bytes --] On Wed, 23 Jul 2025, Khadem Ullah wrote: > Thanks, no. Can, I ? Typically, the patch author (as indicated in their respective e-mail address) can sign up for patchwork account, sign in and mark any of their own patches as superseded, in order to help the delegated reviewers tell new from outdated. Thank you. > > On Wed, Jul 23, 2025, 18:24 Ivan Malov <ivan.malov@arknetworks.am> wrote: > Hi Khadem, > > On Wed, 23 Jul 2025, Khadem Ullah wrote: > > > Please check v3 of this patch. > > Checked. In the meantime, you're not overlooking to mark previous versions [1] > with status 'SUPERSEDED' in DPDK Patchwork, are you? > > [1] https://patches.dpdk.org/project/dpdk/patch/20250723045022.1580829-1-14pwcse1224@uetpeshawar.edu.pk/ > > > > > On Wed, Jul 23, 2025 at 5:19 PM Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > > > Secondary application not only breaking on device closing, > > it's also getting segfault when we do "show device info all" from secondary > > after primary closes: > > testpmd> show device info all > > ********************* Infos for device 0000:03:00.0 ********************* > > Bus name: pci > > Bus information: vendor_id=15b3, device_id=101d > > Driver name: mlx5_pci > > Devargs: > > Connect to socket: 0 > > Segmentation fault (core dumped) > > This patch prevents these crashes and it seems that these fixes should be in PMD along with the ethdev layer. Some more checks will be added in the next version to prevent > "show device i > > nfo all" crash. > > > > > > On Wed, Jul 23, 2025 at 9:50 AM Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > > In secondary processes, directly accessing 'dev->data->dev_private' can > > cause a segmentation fault if the primary process has exited or if the > > shared memory is no longer accessible. > > > > This patch adds a safety check using rte_mem_virt2phy(), with an > > unlikely() branch hint to minimize performance impact in the fast path. > > This ensures 'dev_private' is still valid before accessing it. > > > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return int") > > Cc: stable@dpdk.org > > > > Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > > --- > > lib/ethdev/rte_ethdev.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c > > index dd7c00bc94..ef5dc55f2e 100644 > > --- a/lib/ethdev/rte_ethdev.c > > +++ b/lib/ethdev/rte_ethdev.c > > @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) > > > > if (dev->dev_ops->dev_infos_get == NULL) > > return -ENOTSUP; > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > + unlikely(rte_mem_virt2phy(dev->data->dev_private) == RTE_BAD_PHYS_ADDR)) { > > + RTE_ETHDEV_LOG_LINE(ERR, > > + "Secondary: dev_private not accessible (primary exited?)"); > > + rte_errno = ENODEV; > > + return -rte_errno; > > + } > > diag = dev->dev_ops->dev_infos_get(dev, dev_info); > > if (diag != 0) { > > /* Cleanup already filled in device information */ > > -- > > 2.43.0 > > > > > > > > -- > > Engr. Khadem Ullah, > > Software Engineer, > > Dreambig Semiconductor Inc > > https://dreambigsemi.com/ > > > > > > > > -- > > Engr. Khadem Ullah, > > Software Engineer, > > Dreambig Semiconductor Inc > > https://dreambigsemi.com/ > > > > > > > ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] [PATCH v3] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 4:50 ` [PATCH v2] " Khadem Ullah 2025-07-23 12:19 ` Khadem Ullah @ 2025-07-23 13:10 ` Khadem Ullah 2025-07-23 13:19 ` Ivan Malov 2025-07-23 14:21 ` [PATCH v2] " Stephen Hemminger 2 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-23 13:10 UTC (permalink / raw) To: stephen, thomas, ferruh.yigit, andrew.rybchenko; +Cc: dev, stable, Khadem Ullah In secondary processes, directly accessing 'dev->data->dev_private' can cause a segmentation fault if the primary process has exited or if the shared memory is no longer accessible. Secondary application not only breaking on device closing, but also getting segfault when we do "show device info all" from secondary after primary closes. This patch adds safety checks while using rte_mem_virt2phy(), with an unlikely() branch hint to minimize performance impact in the fast path. This ensures 'dev_private' is still valid before accessing it. Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return int") Cc: stable@dpdk.org Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> --- lib/ethdev/rte_ethdev.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index dd7c00bc94..343e156a4f 100644 --- a/lib/ethdev/rte_ethdev.c +++ b/lib/ethdev/rte_ethdev.c @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) if (dev->dev_ops->dev_infos_get == NULL) return -ENOTSUP; + if (rte_eal_process_type() == RTE_PROC_SECONDARY && + unlikely(rte_mem_virt2phy(dev->data->dev_private) == RTE_BAD_PHYS_ADDR)) { + RTE_ETHDEV_LOG_LINE(ERR, + "Secondary: dev_private not accessible (primary exited?)"); + rte_errno = ENODEV; + return -rte_errno; + } diag = dev->dev_ops->dev_infos_get(dev, dev_info); if (diag != 0) { /* Cleanup already filled in device information */ @@ -4307,7 +4314,13 @@ rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr) port_id); return -EINVAL; } - + if (rte_eal_process_type() == RTE_PROC_SECONDARY && + (dev->data->mac_addrs == NULL)) { + RTE_ETHDEV_LOG_LINE(ERR, + "Secondary: dev_private not accessible (primary exited?)"); + rte_errno = ENODEV; + return -rte_errno; + } rte_ether_addr_copy(&dev->data->mac_addrs[0], mac_addr); rte_eth_trace_macaddr_get(port_id, mac_addr); -- 2.43.0 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [PATCH v3] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 13:10 ` [PATCH] [PATCH v3] " Khadem Ullah @ 2025-07-23 13:19 ` Ivan Malov 2025-07-23 13:34 ` Khadem Ullah 0 siblings, 1 reply; 25+ messages in thread From: Ivan Malov @ 2025-07-23 13:19 UTC (permalink / raw) To: Khadem Ullah; +Cc: stephen, thomas, ferruh.yigit, andrew.rybchenko, dev, stable Hi Khadem, On Wed, 23 Jul 2025, Khadem Ullah wrote: > In secondary processes, directly accessing 'dev->data->dev_private' can > cause a segmentation fault if the primary process has exited or if the > shared memory is no longer accessible. > > Secondary application not only breaking on device closing, > but also getting segfault when we do "show device info all" from secondary > after primary closes. > > This patch adds safety checks while using rte_mem_virt2phy(), with an > unlikely() branch hint to minimize performance impact in the fast path. > This ensures 'dev_private' is still valid before accessing it. > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return int") > Cc: stable@dpdk.org > > Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > --- > lib/ethdev/rte_ethdev.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c > index dd7c00bc94..343e156a4f 100644 > --- a/lib/ethdev/rte_ethdev.c > +++ b/lib/ethdev/rte_ethdev.c > @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info) > > if (dev->dev_ops->dev_infos_get == NULL) > return -ENOTSUP; > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > + unlikely(rte_mem_virt2phy(dev->data->dev_private) == RTE_BAD_PHYS_ADDR)) { > + RTE_ETHDEV_LOG_LINE(ERR, > + "Secondary: dev_private not accessible (primary exited?)"); > + rte_errno = ENODEV; > + return -rte_errno; > + } > diag = dev->dev_ops->dev_infos_get(dev, dev_info); > if (diag != 0) { > /* Cleanup already filled in device information */ > @@ -4307,7 +4314,13 @@ rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr) > port_id); > return -EINVAL; > } > - > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > + (dev->data->mac_addrs == NULL)) { > + RTE_ETHDEV_LOG_LINE(ERR, > + "Secondary: dev_private not accessible (primary exited?)"); > + rte_errno = ENODEV; > + return -rte_errno; > + } > rte_ether_addr_copy(&dev->data->mac_addrs[0], mac_addr); > > rte_eth_trace_macaddr_get(port_id, mac_addr); I see one more API has been augmented with the check. But community members may still argue this is not robust, as many other APIs will also fail. So, even if the task was to augment as many APIs as possible with the check, then the check would still be required to be factorised/generalised somehow. What do you think? Please also note that there are already macro invocations in many of these APIs, for example, RTE_ETH_VALID_PORTID_OR_ERR_RET. Could be convenient. Thank you. > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [PATCH v3] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 13:19 ` Ivan Malov @ 2025-07-23 13:34 ` Khadem Ullah 2025-07-23 14:22 ` Stephen Hemminger 0 siblings, 1 reply; 25+ messages in thread From: Khadem Ullah @ 2025-07-23 13:34 UTC (permalink / raw) To: Ivan Malov Cc: Stephen Hemminger, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable [-- Attachment #1: Type: text/plain, Size: 3294 bytes --] Hi Ivan, agree. I think we can atleast currently guard all the known crashes. Sure, I will check the macro and get back to you. Thank you! On Wed, Jul 23, 2025, 18:19 Ivan Malov <ivan.malov@arknetworks.am> wrote: > Hi Khadem, > > On Wed, 23 Jul 2025, Khadem Ullah wrote: > > > In secondary processes, directly accessing 'dev->data->dev_private' can > > cause a segmentation fault if the primary process has exited or if the > > shared memory is no longer accessible. > > > > Secondary application not only breaking on device closing, > > but also getting segfault when we do "show device info all" from > secondary > > after primary closes. > > > > This patch adds safety checks while using rte_mem_virt2phy(), with an > > unlikely() branch hint to minimize performance impact in the fast path. > > This ensures 'dev_private' is still valid before accessing it. > > > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return > int") > > Cc: stable@dpdk.org > > > > Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > > --- > > lib/ethdev/rte_ethdev.c | 15 ++++++++++++++- > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c > > index dd7c00bc94..343e156a4f 100644 > > --- a/lib/ethdev/rte_ethdev.c > > +++ b/lib/ethdev/rte_ethdev.c > > @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct > rte_eth_dev_info *dev_info) > > > > if (dev->dev_ops->dev_infos_get == NULL) > > return -ENOTSUP; > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > + unlikely(rte_mem_virt2phy(dev->data->dev_private) == > RTE_BAD_PHYS_ADDR)) { > > + RTE_ETHDEV_LOG_LINE(ERR, > > + "Secondary: dev_private not accessible (primary > exited?)"); > > + rte_errno = ENODEV; > > + return -rte_errno; > > + } > > diag = dev->dev_ops->dev_infos_get(dev, dev_info); > > if (diag != 0) { > > /* Cleanup already filled in device information */ > > @@ -4307,7 +4314,13 @@ rte_eth_macaddr_get(uint16_t port_id, struct > rte_ether_addr *mac_addr) > > port_id); > > return -EINVAL; > > } > > - > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > + (dev->data->mac_addrs == NULL)) { > > + RTE_ETHDEV_LOG_LINE(ERR, > > + "Secondary: dev_private not accessible (primary > exited?)"); > > + rte_errno = ENODEV; > > + return -rte_errno; > > + } > > rte_ether_addr_copy(&dev->data->mac_addrs[0], mac_addr); > > > > rte_eth_trace_macaddr_get(port_id, mac_addr); > > I see one more API has been augmented with the check. But community > members may > still argue this is not robust, as many other APIs will also fail. So, > even if > the task was to augment as many APIs as possible with the check, then the > check > would still be required to be factorised/generalised somehow. What do you > think? > > Please also note that there are already macro invocations in many of these > APIs, > for example, RTE_ETH_VALID_PORTID_OR_ERR_RET. Could be convenient. > > Thank you. > > > -- > > 2.43.0 > > > > > [-- Attachment #2: Type: text/html, Size: 4493 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] [PATCH v3] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 13:34 ` Khadem Ullah @ 2025-07-23 14:22 ` Stephen Hemminger 0 siblings, 0 replies; 25+ messages in thread From: Stephen Hemminger @ 2025-07-23 14:22 UTC (permalink / raw) To: Khadem Ullah Cc: Ivan Malov, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, dpdk stable On Wed, 23 Jul 2025 18:34:04 +0500 Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > Hi Ivan, agree. I think we can atleast currently guard all the known > crashes. > > Sure, I will check the macro and get back to you. > > Thank you! > > On Wed, Jul 23, 2025, 18:19 Ivan Malov <ivan.malov@arknetworks.am> wrote: > > > Hi Khadem, > > > > On Wed, 23 Jul 2025, Khadem Ullah wrote: > > > > > In secondary processes, directly accessing 'dev->data->dev_private' can > > > cause a segmentation fault if the primary process has exited or if the > > > shared memory is no longer accessible. > > > > > > Secondary application not only breaking on device closing, > > > but also getting segfault when we do "show device info all" from > > secondary > > > after primary closes. > > > > > > This patch adds safety checks while using rte_mem_virt2phy(), with an > > > unlikely() branch hint to minimize performance impact in the fast path. > > > This ensures 'dev_private' is still valid before accessing it. > > > > > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return > > int") > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > > > --- > > > lib/ethdev/rte_ethdev.c | 15 ++++++++++++++- > > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c > > > index dd7c00bc94..343e156a4f 100644 > > > --- a/lib/ethdev/rte_ethdev.c > > > +++ b/lib/ethdev/rte_ethdev.c > > > @@ -4079,6 +4079,13 @@ rte_eth_dev_info_get(uint16_t port_id, struct > > rte_eth_dev_info *dev_info) > > > > > > if (dev->dev_ops->dev_infos_get == NULL) > > > return -ENOTSUP; > > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > > + unlikely(rte_mem_virt2phy(dev->data->dev_private) == > > RTE_BAD_PHYS_ADDR)) { > > > + RTE_ETHDEV_LOG_LINE(ERR, > > > + "Secondary: dev_private not accessible (primary > > exited?)"); > > > + rte_errno = ENODEV; > > > + return -rte_errno; > > > + } > > > diag = dev->dev_ops->dev_infos_get(dev, dev_info); > > > if (diag != 0) { > > > /* Cleanup already filled in device information */ > > > @@ -4307,7 +4314,13 @@ rte_eth_macaddr_get(uint16_t port_id, struct > > rte_ether_addr *mac_addr) > > > port_id); > > > return -EINVAL; > > > } > > > - > > > + if (rte_eal_process_type() == RTE_PROC_SECONDARY && > > > + (dev->data->mac_addrs == NULL)) { > > > + RTE_ETHDEV_LOG_LINE(ERR, > > > + "Secondary: dev_private not accessible (primary > > exited?)"); > > > + rte_errno = ENODEV; > > > + return -rte_errno; > > > + } > > > rte_ether_addr_copy(&dev->data->mac_addrs[0], mac_addr); > > > > > > rte_eth_trace_macaddr_get(port_id, mac_addr); > > > > I see one more API has been augmented with the check. But community > > members may > > still argue this is not robust, as many other APIs will also fail. So, > > even if > > the task was to augment as many APIs as possible with the check, then the > > check > > would still be required to be factorised/generalised somehow. What do you > > think? > > > > Please also note that there are already macro invocations in many of these > > APIs, > > for example, RTE_ETH_VALID_PORTID_OR_ERR_RET. Could be convenient. > > > > Thank you. > > > > > -- > > > 2.43.0 > > > > > > > > No top posting. How are you monitoring the primary? Lets fix that ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] lib/ethdev: fix segfault in secondary process by validating dev_private pointer 2025-07-23 4:50 ` [PATCH v2] " Khadem Ullah 2025-07-23 12:19 ` Khadem Ullah 2025-07-23 13:10 ` [PATCH] [PATCH v3] " Khadem Ullah @ 2025-07-23 14:21 ` Stephen Hemminger 2 siblings, 0 replies; 25+ messages in thread From: Stephen Hemminger @ 2025-07-23 14:21 UTC (permalink / raw) To: Khadem Ullah; +Cc: thomas, ferruh.yigit, andrew.rybchenko, dev, stable On Wed, 23 Jul 2025 00:50:22 -0400 Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote: > In secondary processes, directly accessing 'dev->data->dev_private' can > cause a segmentation fault if the primary process has exited or if the > shared memory is no longer accessible. > > This patch adds a safety check using rte_mem_virt2phy(), with an > unlikely() branch hint to minimize performance impact in the fast path. > This ensures 'dev_private' is still valid before accessing it. > > Fixes: bdad90d12ec8 ("ethdev: change device info get callback to return int") > Cc: stable@dpdk.org > > Signed-off-by: Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> > --- No still a band-aid solution, and it is depending on side effects. Is your secondary process monitoring for primary process exiting? If it is, then lets fix the monitoring path. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2025-07-23 14:22 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-07-22 11:54 [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer Khadem Ullah 2025-07-22 13:39 ` Stephen Hemminger 2025-07-22 14:30 ` Khadem Ullah 2025-07-22 15:42 ` Stephen Hemminger 2025-07-22 16:01 ` Khadem Ullah 2025-07-22 16:13 ` Bruce Richardson 2025-07-22 17:04 ` Khadem Ullah 2025-07-22 17:38 ` Stephen Hemminger 2025-07-22 17:53 ` Khadem Ullah 2025-07-22 18:21 ` Stephen Hemminger 2025-07-22 19:03 ` Khadem Ullah 2025-07-22 19:05 ` Ivan Malov 2025-07-22 22:28 ` Stephen Hemminger 2025-07-23 4:29 ` Khadem Ullah 2025-07-23 4:50 ` [PATCH v2] " Khadem Ullah 2025-07-23 12:19 ` Khadem Ullah 2025-07-23 13:13 ` Khadem Ullah 2025-07-23 13:24 ` Ivan Malov 2025-07-23 13:26 ` Khadem Ullah 2025-07-23 13:31 ` Ivan Malov 2025-07-23 13:10 ` [PATCH] [PATCH v3] " Khadem Ullah 2025-07-23 13:19 ` Ivan Malov 2025-07-23 13:34 ` Khadem Ullah 2025-07-23 14:22 ` Stephen Hemminger 2025-07-23 14:21 ` [PATCH v2] " Stephen Hemminger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).