DPDK patches and discussions
From: "Varghese, Vipin" <Vipin.Varghese@amd.com>
To: Stephen Hemminger <stephen@networkplumber.org>,
	Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
	"Yigit, Ferruh" <Ferruh.Yigit@amd.com>,
	"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: RE: [PATCH v7] app/testpmd: monitor state of primary process when using secondary
Date: Mon, 11 Aug 2025 10:37:17 +0000
Message-ID: <PH7PR12MB8596C293ACF92D446DBD10508228A@PH7PR12MB8596.namprd12.prod.outlook.com>
In-Reply-To: <20250808094900.5027f034@hermes.local>

[Public]

Hi Stephen,

Thank you for sharing.

[snipped]

>
>
> On Fri,  8 Aug 2025 07:49:09 -0400
> Khadem Ullah <14pwcse1224@uetpeshawar.edu.pk> wrote:
>
> > The crashes occur on 22.11, 23.03 and 24.11, i.e. on all DPDK stable versions,
> > and on 25.07 as well.
> > Please close the primary testpmd first, before the secondary testpmd
> > application, and then try to close the secondary or execute any of the
> > following commands:
> >
> > "show device info all
> > show port stats all
> > show port xstats all
> > set fwd rxonly
> > set fwd txonly
> > start
> > etc"
> >
> > We all agree that these crashes exist. First, we tried to prevent the
> > crashes at the PMD level, but it was not feasible to add checks in
> > every PMD. Then we tried to add safety checks in the ethdev layer, but
> > that was not suitable either: once the primary closes, all references
> > to device information (pointers) would still lead to crashes.
> >
> > Then we agreed on having the secondary process monitor the primary
> > process for exit, and this is now resolved at the application level,
> > i.e. in testpmd.
> >
> > This solution is now working perfectly. We can add eal_cleanup for a
> > graceful exit.
> >
> > Best Regards,
> > Khadem
>
> Maybe this quick picture will help explain the data structures:
>
>                                │
>                                │            Huge pages (shared)
>                                │
>             rte_eth_devices[]  │
>                                │
>              ┌────────┐        │
> Primary      │        ┼────────┼───┐
> Process      │        │        │   │
>              ┌────────┐        │   │
>              │        ┼────┐   │   │
>              │        │    │   │   │              rte_eth_dev_data
>              └────────┘    │   │   │            ┌─────────────────┐
>                            │   │   │            │                 │
>                            │   │   └───────────►│                 │
>                            │   │                │              ───┼─────────────►
>                            │   │        ┌───────►                 │
>                            │   │        │       │                 │
>                            │   │        │       └─────────────────┘
>                            │   │        │
>                            │   │        │       ┌─────────────────┐
>                            │   │        │       │                 │
>                            └───┼────────┼───────►                 │
>                                │        │       │            ─────┼────────────►
>             rte_eth_devices    │        │  ┌────►                 │
>   Secondary ┌────────┐         │        │  │    │                 │
>   Process   │        ┼─────────┼────────┘  │    └─────────────────┘
>             │        │         │           │
>             ┌────────┐         │           │
>             │        ┼─────────┼───────────┘
>             │        │         │
>             └────────┘         │
>                                │
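
Here is a minimal sketch of the application-level approach Khadem describes, in which the secondary watches for the primary exiting and then refuses commands that touch ethdev state. This is not the code from the v7 patch. `primary_gone`, `monitor_tick()` and `run_port_command()` are hypothetical names. The liveness probe is passed in as a callback so the logic can be exercised without DPDK. In a real secondary, that probe could be something like `rte_eal_primary_proc_alive(NULL)` called from a periodic timer.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag, set once the monitor has seen the primary gone. */
static volatile sig_atomic_t primary_gone;

/* Hypothetical liveness probe: returns true while the primary is running. */
typedef bool (*liveness_probe)(void);

/* Call periodically (e.g. from a timer callback). The flag latches:
 * once the primary has been seen gone, it is treated as gone for good. */
static void monitor_tick(liveness_probe primary_alive)
{
    if (!primary_gone && !primary_alive())
        primary_gone = 1;
}

/* Every command that dereferences shared ethdev data goes through here,
 * e.g. "show port stats all" or "start". */
static int run_port_command(const char *name, int (*handler)(void))
{
    if (primary_gone) {
        fprintf(stderr, "primary process exited, refusing \"%s\"\n", name);
        return -1;
    }
    return handler();
}
```

The check and the command are not atomic. A primary that exits between the two can still take the secondary down, so polling narrows the window rather than closing it.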

Something has definitely changed in the way `rte_eth_dev_data` is handled in some release.
Earlier, when a secondary came up, it probed the shared `rte_eth_dev_data` memory to bring the physical devices into its local memory. Then all virtual devices belonging to the secondary were added.

Hence, adding a physical or virtual device in the primary was not reflected in the secondary once the secondary had completed rte_eal_init.
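
To make the picture and the attach behaviour above concrete, here is a toy model in plain C. It is not DPDK code. `model_dev_data` stands in for `rte_eth_dev_data` in shared hugepage memory (here just a static array). `model_eth_dev` stands in for each process's own `rte_eth_devices[]` entry. The attach-only-at-init behaviour follows the description above rather than any particular DPDK release.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_MAX_PORTS 4
#define MODEL_NAME_LEN  32

/* Hypothetical stand-in for rte_eth_dev_data: one shared copy. */
struct model_dev_data {
    char name[MODEL_NAME_LEN]; /* empty string means the slot is unused */
    void *dev_private;         /* driver state set up by the primary */
};

/* Hypothetical stand-in for rte_eth_dev: each process has its own array. */
struct model_eth_dev {
    struct model_dev_data *data; /* NULL means not attached in this process */
};

/* Simulates the shared hugepage region; a plain array in this sketch. */
static struct model_dev_data shared_data[MODEL_MAX_PORTS];

/* Primary: claim a free shared slot and point its own entry at it. */
static int primary_add_port(struct model_eth_dev *devs, const char *name,
                            void *priv)
{
    for (int i = 0; i < MODEL_MAX_PORTS; i++) {
        if (shared_data[i].name[0] == '\0') {
            snprintf(shared_data[i].name, MODEL_NAME_LEN, "%s", name);
            shared_data[i].dev_private = priv;
            devs[i].data = &shared_data[i];
            return i;
        }
    }
    return -1;
}

/* Secondary, at init only: attach to every port published so far. */
static int secondary_attach_all(struct model_eth_dev *devs)
{
    int attached = 0;
    for (int i = 0; i < MODEL_MAX_PORTS; i++) {
        if (shared_data[i].name[0] != '\0') {
            devs[i].data = &shared_data[i];
            attached++;
        }
    }
    return attached;
}
```

Both processes end up dereferencing the same shared entries, including whatever `dev_private` points to. This is why a secondary that keeps using them after the primary has closed its ports is left with stale pointers.
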
I am still trying to figure out why @Khadem Ullah mentioned:

```
Please first close primary testpmd before secondary testpmd application and try to close secondary or execute any of the following commands,

"show device info all
show port stats all
show port xstats all
set fwd rxonly
set fwd txonly
start
etc"
```

I think the reason is that all of this testing uses SIGKILL to kill the primary, rather than a close or shutdown, which would trigger the cleanup.
The rule of thumb, as I understand it: if you are shutting down the primary, always shut down the secondaries first.
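
On the SIGKILL point: no cleanup runs in the primary. However, the kernel still drops every POSIX record lock a process holds when it exits, however it exits, so a lock-based probe still notices a primary that was SIGKILLed. The sketch below shows the idea with `lockf()` on a temporary file and a forked stand-in for the primary. My understanding is that `rte_eal_primary_proc_alive()` uses a similar lock test on the EAL runtime config file, but treat that as an assumption and check the EAL sources. `lock_holder_alive()` and `start_fake_primary()` are hypothetical helpers.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if another process holds a lock on `path`, 0 otherwise
 * (including when the file cannot be opened). Not DPDK's code. */
static int lock_holder_alive(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return 0;
    /* F_TEST: 0 if unlocked or locked by us, -1 if another process holds it. */
    int ret = lockf(fd, F_TEST, 0);
    close(fd);
    return ret != 0;
}

/* Hypothetical stand-in for a primary: locks `path`, reports readiness
 * over a pipe, then sleeps until killed. Returns the child pid or -1. */
static pid_t start_fake_primary(const char *path)
{
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) { /* child: the "primary" */
        close(pipefd[0]);
        int fd = open(path, O_RDWR);
        if (fd < 0 || lockf(fd, F_LOCK, 0) != 0)
            _exit(1);
        char ready = 1;
        if (write(pipefd[1], &ready, 1) != 1)
            _exit(1);
        for (;;)
            pause(); /* no cleanup path: only a signal ends this process */
    }

    /* Parent: block until the child reports that it holds the lock. */
    close(pipefd[1]);
    char ready = 0;
    ssize_t n = read(pipefd[0], &ready, 1);
    close(pipefd[0]);
    if (n != 1) {
        waitpid(pid, NULL, 0);
        return -1;
    }
    return pid;
}
```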

@Khadem Ullah, I am happy to be available on Slack so you can recreate the issue (since you have a working setup) with the deployed version.
Please do let me know.

Regards
Vipin Varghese


Thread overview: 72+ messages
2025-07-22 11:54 [PATCH] lib/ethdev: fix segfault in secondary process by validating dev_private pointer Khadem Ullah
2025-07-22 13:39 ` Stephen Hemminger
2025-07-22 14:30   ` Khadem Ullah
2025-07-22 15:42     ` Stephen Hemminger
2025-07-22 16:01       ` Khadem Ullah
2025-07-22 16:13         ` Bruce Richardson
2025-07-22 17:04           ` Khadem Ullah
2025-07-22 17:38             ` Stephen Hemminger
2025-07-22 17:53               ` Khadem Ullah
2025-07-22 18:21                 ` Stephen Hemminger
2025-07-22 19:03                   ` Khadem Ullah
2025-07-22 19:05                   ` Ivan Malov
2025-07-22 22:28                     ` Stephen Hemminger
2025-07-23  4:29 ` Khadem Ullah
2025-07-23  4:50 ` [PATCH v2] " Khadem Ullah
2025-07-23 12:19   ` Khadem Ullah
2025-07-23 13:13     ` Khadem Ullah
2025-07-23 13:24       ` Ivan Malov
2025-07-23 13:26         ` Khadem Ullah
2025-07-23 13:31           ` Ivan Malov
2025-07-23 13:10   ` [PATCH] [PATCH v3] " Khadem Ullah
2025-07-23 13:19     ` Ivan Malov
2025-07-23 13:34       ` Khadem Ullah
2025-07-23 14:22         ` Stephen Hemminger
2025-07-24  5:49           ` Khadem Ullah
2025-07-25 13:00           ` Khadem Ullah
2025-07-25 12:55     ` [PATCH] [PATCH v4] " Khadem Ullah
2025-07-28 21:45       ` Stephen Hemminger
2025-07-29  5:42         ` Khadem Ullah
2025-07-29 21:34           ` Stephen Hemminger
2025-07-30  5:07             ` Khadem Ullah
2025-08-08  3:49               ` Varghese, Vipin
2025-08-08 15:32                 ` Stephen Hemminger
2025-08-11 10:19                   ` Varghese, Vipin
2025-08-11 10:28                     ` Khadem Ullah
2025-08-11 10:39                       ` Varghese, Vipin
2025-07-29  6:39       ` [PATCH] app/testpmd: fix segfault in secondary process by monitoring primary Khadem Ullah
2025-07-29  6:39         ` [PATCH] [PATCH v4] lib/ethdev: fix segfault in secondary process by validating dev_private pointer Khadem Ullah
2025-07-29  6:39         ` [PATCH] [PATCH v5] app/testpmd: fix segfault in secondary process by monitoring primary Khadem Ullah
2025-07-29 14:48           ` Stephen Hemminger
2025-07-29 21:48           ` Stephen Hemminger
2025-07-30  5:24             ` Khadem Ullah
2025-08-08  3:44               ` Varghese, Vipin
2025-08-08 16:17                 ` Stephen Hemminger
2025-08-11 10:23                   ` Varghese, Vipin
2025-08-11 10:27                     ` Khadem Ullah
2025-07-30  5:56           ` [PATCH] app/testpmd: monitor state of primary process when using secondary Khadem Ullah
2025-07-30  6:08           ` [PATCH v6] " Khadem Ullah
2025-08-01 22:50             ` Stephen Hemminger
2025-08-04  7:54           ` [PATCH v7] " Khadem Ullah
2025-08-04 11:33           ` Khadem Ullah
2025-08-04 15:44             ` Stephen Hemminger
2025-08-05  0:50             ` fengchengwen
2025-08-08  3:23             ` Varghese, Vipin
2025-08-08  5:44               ` Khadem Ullah
2025-08-08 10:59                 ` Varghese, Vipin
2025-08-08 11:49                   ` Khadem Ullah
2025-08-08 16:49                     ` Stephen Hemminger
2025-08-08 17:01                       ` Khadem Ullah
2025-08-11 10:37                       ` Varghese, Vipin [this message]
2025-08-11 11:14                         ` Khadem Ullah
2025-08-11 11:34                           ` Varghese, Vipin
2025-08-11 11:55                             ` Khadem Ullah
2025-08-11 14:44                               ` Varghese, Vipin
2025-08-11 17:11                                 ` Khadem Ullah
2025-08-11 10:30                     ` Varghese, Vipin
2025-08-11 10:51                       ` Khadem Ullah
2025-08-11 11:07                         ` Varghese, Vipin
2025-08-08 15:28                 ` Stephen Hemminger
2025-08-08 15:50                   ` Khadem Ullah
2025-08-08 16:10               ` Stephen Hemminger
2025-07-23 14:21   ` [PATCH v2] lib/ethdev: fix segfault in secondary process by validating dev_private pointer Stephen Hemminger
