From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"Shani Peretz" <shperetz@nvidia.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Stephen Hemminger <stephen@networkplumber.org>
Subject: RE: [RFC PATCH 0/5] Introduce mempool object new debug capabilities
Date: Mon, 25 Aug 2025 11:27:34 +0000	[thread overview]
Message-ID: <MN6PR12MB8567FCF9D54B6E55F3AF7C3BDF3EA@MN6PR12MB8567.namprd12.prod.outlook.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9FDBA@smartserver.smartshare.dk>

Hi, Morten

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Saturday, July 19, 2025 5:39 PM
> To: Shani Peretz <shperetz@nvidia.com>
> Cc: dev@dpdk.org; Stephen Hemminger <stephen@networkplumber.org>
> Subject: RE: [RFC PATCH 0/5] Introduce mempool object new debug
> capabilities
> 
> > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > Sent: Monday, 7 July 2025 14.11
> >
> > > From: Shani Peretz [mailto:shperetz@nvidia.com]
> > > Sent: Monday, 7 July 2025 07.45
> > >
> > > > From: Stephen Hemminger <stephen@networkplumber.org>
> > > > Sent: Monday, 16 June 2025 18:30
> > > >
> > > > On Mon, 16 Jun 2025 10:29:05 +0300 Shani Peretz
> > > > <shperetz@nvidia.com> wrote:
> > > >
> > > > > This feature is designed to monitor the lifecycle of mempool
> > objects
> > > > > as they move between the application and the PMD.
> > > > >
> > > > > It will allow us to track the operations and transitions of each
> > > > > mempool object throughout the system, helping in debugging and
> > > > understanding objects flow.
> > > > >
> > > > > The implementation include several key components:
> > > > > 1. Added a bitmap to mempool's header (rte_mempool_objhdr)
> > > > >    that represent the operations history.
> > > > > 2. Added functions that allow marking operations on an
> > > > >    mempool objects.
> > > > > 3. Dumps the history to a file or the console
> > > > >    (rte_mempool_objects_dump).
> > > > > 4. Added python script that can parse, analyze the data and
> > > > >    present it in an human readable format.
> > > > > 5. Added compilation flag to enable the feature.
> > > > >
> > > >
> > > > Could this not already be done with tracing infrastructure?
> > >
> > > Hey,
> > > We did consider tracing but:
> > > 	- It has limited capacity, which will result in older mbufs being
> > > lost in the tracing output while they are still in use
> > > 	- Some operations may be lost, and we might not capture the
> > > complete picture due to trace misses caused by the performance
> > overhead
> > > of tracking on the datapath as far as I understand WDYT?
> >
> > This looks like an alternative trace infrastructure, just for mempool
> > objects.

It is not so much an alternative as an orthogonal way to trace.
We gather the life history per mbuf, and it helps a lot with nailing down issues
at users' sites. In our practice it instantly helped to find an mbuf double free
and core-policy violations for rx/tx_burst routines in user applications.

> > But the list of operations is limited to basic operations on mbuf
> > mempool objects.

Agreed, but this limitation exists because only the mbuf life milestones
are well defined in DPDK:
- alloc by app / tx_burst / tx queued|tx_busy / free by PMD|app
- alloc by PMD / rx replenish / rx_burst returned / free by app

No other mempool-based object has its milestones defined so well
(an illustrative sketch of such an encoding follows below).
So, do you think we should move this history data into the mbuf object
itself?
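
For illustration only, the milestones above could be encoded as a small fixed
enum like the sketch below (the names are hypothetical, not taken from the RFC):

/* Hypothetical encoding of the mbuf life milestones listed above;
 * the names are illustrative only and not part of the RFC.
 */
enum mbuf_history_op {
	MBUF_HIST_ALLOC_BY_APP = 1,  /* allocated by the application */
	MBUF_HIST_TX_BURST,          /* passed to tx_burst */
	MBUF_HIST_TX_QUEUED,         /* queued for transmission by the PMD */
	MBUF_HIST_TX_BUSY,           /* returned by tx_burst as not sent */
	MBUF_HIST_FREE_BY_PMD,       /* freed by the PMD after completion */
	MBUF_HIST_ALLOC_BY_PMD,      /* allocated by the PMD for Rx */
	MBUF_HIST_RX_REPLENISH,      /* posted to the Rx queue */
	MBUF_HIST_RX_BURST,          /* returned to the application by rx_burst */
	MBUF_HIST_FREE_BY_APP,       /* freed by the application */
};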

> > It lacks support for other operations on mbufs, e.g. IP
> > fragmentation/defragmentation library operations, application specific

Yes, these can be added later, once we see user requests.
There are too many options to cover, and the field bit width is too limited to embrace them all 😊

> > operations, and transitions between the mempool cache and the mempool
> > backing store.
> > It also lacks support for operations on other mempool objects than
> > mbufs.
> >
> > You might better off using the trace infrastructure, or something
> > similar.
We considered using the existing trace feature.
We even tried in practice to achieve our debugging goals with trace
at users' sites. Trace:
- is overwhelming - too many events for too many mbufs
- is incomplete - sometimes long runs were needed to hit the bug conditions,
and it was hard to record the whole huge trace to files in real time,
so we usually had gaps
- impacts performance. It records full event information, which is redundant,
queries the timestamp (which may introduce execution barriers), and uses more memory
(a large linear address range, which hurts cache performance). Sometimes it is crucial
not to impact datapath performance at all in order to reproduce an issue onsite.

> > Using the trace infrastructure allows you to record more detailed
> > information along with the transitions of "owners" of each mbuf.
> >
> > I'm not opposing this RFC, but I think it is very limited, and not
> > sufficiently expandable.
It was intentionally developed to be limited, very compact and very fast.
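
As a rough sketch of what "compact and fast" means here - assuming a 4-bit
operation code shifted into a per-object 64-bit history word (my reading of
the RFC), with a hypothetical helper name:

/* Push a 4-bit operation code into a 64-bit per-object history word.
 * The newest event occupies the lowest nibble; the oldest event falls
 * off the top once 16 operations have been recorded. Sketch only.
 */
static inline void
mbuf_history_mark(uint64_t *history, uint8_t op)
{
	*history = (*history << 4) | (uint64_t)(op & 0xF);
}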

> >
> > I get the point that trace can cause old events on active mbufs to be
> > lost, and the concept of a trace buffer per mempool object is a good
> > solution to that.
> > But I think you need to be able to store much more information with
> > each transition; at least a timestamp. And if you do that, you need rdtsc
> > much more than 4 bits per event.
> >
> > Alternatively, if you do proceed with the RFC in the current form, I
> > have two key suggestions:
> > 1. Make it possible to register operations at runtime. (Look at
> > dynamic mbuf fields for inspiration.) 2. Use 8 bits for the operation,
> > instead of 4.
OK, we can do the following:
- narrow the history-gathering feature to mbufs only
- use a dynamic mbuf field for that (a minimal sketch follows below)
- as for registering operations at runtime: it would involve extra memory accesses
and affect performance, so I would prefer a fixed set of events to record,
at least in the feature's first introduction.
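
A minimal sketch of the dynamic-field variant, using the standard rte_mbuf_dyn
API (the field name and helpers below are made up; this is not the RFC code):

#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

static int hist_offset = -1;

/* Register a 64-bit history field once at startup; returns 0 on success. */
static int
mbuf_history_init(void)
{
	static const struct rte_mbuf_dynfield desc = {
		.name = "example_dynfield_mbuf_history",
		.size = sizeof(uint64_t),
		.align = __alignof__(uint64_t),
	};

	hist_offset = rte_mbuf_dynfield_register(&desc);
	return hist_offset < 0 ? -1 : 0;
}

/* Datapath access goes through the registered offset rather than a fixed
 * header field - this indirection is the extra cost mentioned above.
 */
static inline uint64_t *
mbuf_history(struct rte_mbuf *m)
{
	return RTE_MBUF_DYNFIELD(m, hist_offset, uint64_t *);
}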

> > And if you need a longer trace history, you can use the rte_bitset
> > library instead of a single uint64_t.
That would not be the best option performance-wise ☹
> 
> One more comment:
> If this feature is meant for mbuf type mempool objects only, it might be
> possible to add it to the mbuf library (and store the trace in the rte_mbuf
> structure) instead of the mempool library.

> Although that would prevent tracing internal mempool library operations,
> specifically moving the object between the mempool cache and mempool
> backing store.

We can update the mbuf free/alloc functions, so this functionality will not be lost.

With best regards,
Slava


