Including contigmem in core dumps

DPDK patches and discussions

* Including contigmem in core dumps
@ 2024-10-22 12:41 Lewis Donzis
  2024-10-22 14:47 ` Dmitry Kozlyuk
  0 siblings, 1 reply; 8+ messages in thread
From: Lewis Donzis @ 2024-10-22 12:41 UTC (permalink / raw)
  To: dev

I've been wondering why we exclude memory allocated by eal_get_virtual_area() from core dumps? (More specifically, it calls eal_mem_set_dump() to call madvise() to disable core dumps from the allocated region.) 

On many occasions, when debugging after a crash, it would have been very convenient to be able to see the contents of an mbuf or other object allocated in contigmem space. And we often avoid using the rte memory allocator just because of this. 

Is there any reason for this, or could it perhaps be a compile-time configuration option not to call madvise()? 

* Re: Including contigmem in core dumps
  2024-10-22 12:41 Including contigmem in core dumps Lewis Donzis
@ 2024-10-22 14:47 ` Dmitry Kozlyuk
  2024-10-22 15:39   ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Kozlyuk @ 2024-10-22 14:47 UTC (permalink / raw)
  To: Lewis Donzis; +Cc: dev

2024-10-22 07:41 (UTC-0500), Lewis Donzis:
> I've been wondering why we exclude memory allocated by
> eal_get_virtual_area() from core dumps? (More specifically, it calls
> eal_mem_set_dump() to call madvise() to disable core dumps from the
> allocated region.)
> 
> On many occasions, when debugging after a crash, it would have been very
> convenient to be able to see the contents of an mbuf or other object
> allocated in contigmem space. And we often avoid using the rte memory
> allocator just because of this. 
> 
> Is there any reason for this, or could it perhaps be a compile-time
> configuration option not to call madvise()? 

The commit that originally added madvise() argued that dumping everything
ended up in coredumps with "useless" data [non-mapped or unused pages]:

http://git.dpdk.org/dpdk/commit/?id=d72e4042c5ebda7af81448b387af8218136402d0

Dumping mapped pages sounds reasonable in many cases.
Not in all cases admittedly:
- legacy memory mode mapping a lot of pages that are not (yet) used;
- if packet data is confidential while the app is not.

The option to dump or not can easily be a runtime one.
The safe default however seems to be "off".

In dynamic memory mode (not FreeBSD, unfortunately),
rte_mem_event_callback_register() may be used to call madvise().
Maybe DPDK should allow such callbacks in any mode
and invoke them during initialization to make the above solution universal.
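
The madvise() toggle discussed here can be sketched in plain C without DPDK.
This is only an illustration under stated assumptions: set_region_dump() is a
hypothetical name, not a DPDK API, and the REGION_* macros are made up for the
sketch. The advice values themselves are the usual ones (MADV_DODUMP and
MADV_DONTDUMP on Linux, MADV_CORE and MADV_NOCORE on FreeBSD). A callback
registered with rte_mem_event_callback_register() could call something like
this on allocation events, assuming the address and length it is given are
page-aligned.

```c
#define _DEFAULT_SOURCE /* exposes MADV_DODUMP/MADV_DONTDUMP on glibc */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map a "dump / don't dump" choice onto the platform's madvise() advice. */
#if defined(__linux__)
#define REGION_DODUMP   MADV_DODUMP
#define REGION_DONTDUMP MADV_DONTDUMP
#elif defined(__FreeBSD__)
#define REGION_DODUMP   MADV_CORE
#define REGION_DONTDUMP MADV_NOCORE
#else
#error "no known madvise() core dump advice for this platform"
#endif

/*
 * Hypothetical helper (not a DPDK API): include or exclude
 * [addr, addr + len) from core dumps. addr should be page-aligned.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
set_region_dump(void *addr, size_t len, bool dump)
{
	return madvise(addr, len, dump ? REGION_DODUMP : REGION_DONTDUMP);
}
```

Keeping the default at "don't dump" and calling set_region_dump(addr, len, true)
only when the user asks for it would match the "safe default is off" position
above.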

* Re: Including contigmem in core dumps
  2024-10-22 14:47 ` Dmitry Kozlyuk
@ 2024-10-22 15:39   ` Stephen Hemminger
  2024-10-22 15:57     ` Morten Brørup
  2024-10-22 16:00     ` Lewis Donzis
  0 siblings, 2 replies; 8+ messages in thread
From: Stephen Hemminger @ 2024-10-22 15:39 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: Lewis Donzis, dev

On Tue, 22 Oct 2024 17:47:11 +0300
Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> 2024-10-22 07:41 (UTC-0500), Lewis Donzis:
> > I've been wondering why we exclude memory allocated by
> > eal_get_virtual_area() from core dumps? (More specifically, it calls
> > eal_mem_set_dump() to call madvise() to disable core dumps from the
> > allocated region.)
> > 
> > On many occasions, when debugging after a crash, it would have been very
> > convenient to be able to see the contents of an mbuf or other object
> > allocated in contigmem space. And we often avoid using the rte memory
> > allocator just because of this. 
> > 
> > Is there any reason for this, or could it perhaps be a compile-time
> > configuration option not to call madvise()?   
> 
> The commit that originally added madvise() argued that dumping everything
> ended up in coredumps with "useless" data [non-mapped or unused pages]:
> 
> http://git.dpdk.org/dpdk/commit/?id=d72e4042c5ebda7af81448b387af8218136402d0
> 
> Dumping mapped pages sounds reasonable in many cases.
> Not in all cases admittedly:
> - legacy memory mode mapping a lot of pages that are not (yet) used;
> - if packet data is confidential while the app is not.
> 
> The option to dump or not can easily be a runtime one.
> The safe default however seems to be "off".
> 
> In dynamic memory mode (not FreeBSD, unfortunately),
> rte_mem_event_callback_register() may be used to call madvise().
> Maybe DPDK should allow such callbacks in any mode
> and invoke them during initialization to make the above solution universal.

It is not unusual to have 2 or 4 Gigabytes of huge pages mapped.
Many embedded systems do not have 6G of extra storage available for a single core
dump, not to mention multiples. Plus any storage can be really slow on embedded
systems.

And the common scenario on Linux is to use systemd to capture and compress
core dumps.

* RE: Including contigmem in core dumps
  2024-10-22 15:39   ` Stephen Hemminger
@ 2024-10-22 15:57     ` Morten Brørup
  2024-10-22 16:00     ` Lewis Donzis
  1 sibling, 0 replies; 8+ messages in thread
From: Morten Brørup @ 2024-10-22 15:57 UTC (permalink / raw)
  To: Stephen Hemminger, Dmitry Kozlyuk; +Cc: Lewis Donzis, dev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, 22 October 2024 17.39
> 
> On Tue, 22 Oct 2024 17:47:11 +0300
> Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
> 
> > 2024-10-22 07:41 (UTC-0500), Lewis Donzis:
> > > I've been wondering why we exclude memory allocated by
> > > eal_get_virtual_area() from core dumps? (More specifically, it calls
> > > eal_mem_set_dump() to call madvise() to disable core dumps from the
> > > allocated region.)
> > >
> > > On many occasions, when debugging after a crash, it would have been very
> > > convenient to be able to see the contents of an mbuf or other object
> > > allocated in contigmem space. And we often avoid using the rte memory
> > > allocator just because of this.
> > >
> > > Is there any reason for this, or could it perhaps be a compile-time
> > > configuration option not to call madvise()?
> >
> > The commit that originally added madvise() argued that dumping everything
> > ended up in coredumps with "useless" data [non-mapped or unused pages]:
> >
> > http://git.dpdk.org/dpdk/commit/?id=d72e4042c5ebda7af81448b387af8218136402d0
> >
> > Dumping mapped pages sounds reasonable in many cases.
> > Not in all cases admittedly:
> > - legacy memory mode mapping a lot of pages that are not (yet) used;
> > - if packet data is confidential while the app is not.
> >
> > The option to dump or not can easily be a runtime one.
> > The safe default however seems to be "off".

Please feel free to submit a patch adding an EAL command line parameter to control core dump.

> >
> > In dynamic memory mode (not FreeBSD, unfortunately),
> > rte_mem_event_callback_register() may be used to call madvise().
> > Maybe DPDK should allow such callbacks in any mode
> > and invoke them during initialization to make the above solution universal.
>
> It is not unusual to have 2 or 4 Gigabytes of huge pages mapped.
> Many embedded systems do not have 6G of extra storage available for a single core
> dump, not to mention multiples. Plus any storage can be really slow on embedded
> systems.
>
> And the common scenario on Linux is to use systemd to capture and compress
> core dumps.

* Re: Including contigmem in core dumps
  2024-10-22 15:39   ` Stephen Hemminger
  2024-10-22 15:57     ` Morten Brørup
@ 2024-10-22 16:00     ` Lewis Donzis
  1 sibling, 0 replies; 8+ messages in thread
From: Lewis Donzis @ 2024-10-22 16:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Dmitry Kozlyuk, dev



----- On Oct 22, 2024, at 10:39 AM, Stephen Hemminger stephen@networkplumber.org wrote:

> On Tue, 22 Oct 2024 17:47:11 +0300
> Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
> 
>> 2024-10-22 07:41 (UTC-0500), Lewis Donzis:
>> > I've been wondering why we exclude memory allocated by
>> > eal_get_virtual_area() from core dumps? (More specifically, it calls
>> > eal_mem_set_dump() to call madvise() to disable core dumps from the
>> > allocated region.)
>> > 
>> > On many occasions, when debugging after a crash, it would have been very
>> > convenient to be able to see the contents of an mbuf or other object
>> > allocated in contigmem space. And we often avoid using the rte memory
>> > allocator just because of this.
>> > 
>> > Is there any reason for this, or could it perhaps be a compile-time
>> > configuration option not to call madvise()?
>> 
>> The commit that originally added madvise() argued that dumping everything
>> ended up in coredumps with "useless" data [non-mapped or unused pages]:
>> 
>> http://git.dpdk.org/dpdk/commit/?id=d72e4042c5ebda7af81448b387af8218136402d0
>> 
>> Dumping mapped pages sounds reasonable in many cases.
>> Not in all cases admittedly:
>> - legacy memory mode mapping a lot of pages that are not (yet) used;
>> - if packet data is confidential while the app is not.
>> 
>> The option to dump or not can easily be a runtime one.
>> The safe default however seems to be "off".
>> 
>> In dynamic memory mode (not FreeBSD, unfortunately),
>> rte_mem_event_callback_register() may be used to call madvise().
>> Maybe DPDK should allow such callbacks in any mode
>> and invoke them during initialization to make the above solution universal.
> 
> It is not unusual to have 2 or 4 Gigabytes of huge pages mapped.
> Many embedded systems do not have 6G of extra storage available for a single core
> dump, not to mention multiples. Plus any storage can be really slow on embedded
> systems.
> 
> And the common scenario on Linux is to use systemd to capture and compress
> core dumps.

Totally agree.  Most of the time, we wouldn't want this enabled because the core dumps would be huge, but in environments where we do have storage available, it can greatly help troubleshooting to be able to examine the contents of contigmem in the debugger.  Unfortunately, on FreeBSD, we don't have the luxury of systemd's ability to do this, or even to compress the core dumps, although we do have ZFS compression on the local filesystem, which does a pretty decent job on core dumps.

* Re: Including contigmem in core dumps
  2024-05-28  6:55 ` Dmitry Kozlyuk
@ 2024-05-28 13:19   ` Lewis Donzis
  0 siblings, 0 replies; 8+ messages in thread
From: Lewis Donzis @ 2024-05-28 13:19 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: dev

----- On May 28, 2024, at 1:55 AM, Dmitry Kozlyuk dmitry.kozliuk@gmail.com wrote:

> Hi Lewis,
> 
> Memory reserved by eal_get_virtual_area() is not yet useful,
> but it is very large, so by excluding it from dumps,
> DPDK prevents dumps from including large zero-filled parts.
> 
> It also makes sense to call eal_mem_set_dump(..., false)
> from eal_memalloc.c:free_seg(), because of --huge-unlink=never:
> in this mode (Linux-only), freed segments are not cleared,
> so if they were included in the dump, it would be a lot of garbage data.
> 
> Newly allocated hugepages are not included in dumps
> because this would make dumps very large by default.
> However, this could be an opt-in as a runtime option if need be.

Thanks for the clarification.  I agree that not including freed segments makes perfect sense.

But when debugging a core dump, it's sometimes really helpful to be able to see what's in the mbuf that was being processed at the time.  Perhaps it would be a useful option to be able to tell the allocator not to disable core dumps.

In the meantime, my experiments to get around this have not been fruitful.

I wondered if we could enable core dumps for mbufs by calling rte_mempool_mem_iter() on the pool returned by rte_pktmbuf_pool_create(), and have the callback function call madvise(memhdr->addr, memhdr->len, MADV_CORE).  But that didn't help, or at least the size of the core file didn't increase.

I then tried disabling the call to madvise() in the DPDK source code, and that didn't make any difference either.

Note that this is on FreeBSD, so I wonder if there's some fundamental reason that the contigmem memory doesn't get included in a core dump?
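
One detail that may be worth ruling out in the rte_mempool_mem_iter()
experiment described above: on Linux, madvise() fails with EINVAL when the
start address is not page-aligned, and a mempool chunk address is not
guaranteed to be. Whether FreeBSD is equally strict, and whether this has
anything to do with the result seen here, is not established. The sketch below
uses a hypothetical helper, page_align_range(), to widen a range outward to
page boundaries before it is passed to madvise().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: expand [addr, addr + len) outward to page
 * boundaries. page_size must be a power of two.
 */
static void
page_align_range(uintptr_t addr, size_t len, size_t page_size,
		 uintptr_t *start, size_t *aligned_len)
{
	uintptr_t mask = (uintptr_t)page_size - 1;
	/* Round the end of the range up to the next page boundary. */
	uintptr_t end = (addr + len + mask) & ~mask;

	/* Round the start of the range down to its page boundary. */
	*start = addr & ~mask;
	*aligned_len = (size_t)(end - *start);
}
```

In the callback, the aligned start and length would be passed to madvise() in
place of memhdr->addr and memhdr->len, with the page size taken from
sysconf(_SC_PAGESIZE) or from the larger page size if the mapping uses one.
Checking the return value of madvise() in the callback would also show whether
the call is being rejected. Note that widening the range also changes the dump
setting of whatever else shares those pages.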

* Re: Including contigmem in core dumps
  2024-05-28  0:34 Lewis Donzis
@ 2024-05-28  6:55 ` Dmitry Kozlyuk
  2024-05-28 13:19   ` Lewis Donzis
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Kozlyuk @ 2024-05-28  6:55 UTC (permalink / raw)
  To: Lewis Donzis; +Cc: dev

Hi Lewis,

2024-05-27 19:34 (UTC-0500), Lewis Donzis:
> I've been wondering why we exclude memory allocated by eal_get_virtual_area() from core dumps? (More specifically, it calls eal_mem_set_dump() to call madvise() to disable core dumps from the allocated region.) 
> 
> On many occasions, when debugging after a crash, it would have been very convenient to be able to see the contents of an mbuf or other object allocated in contigmem space. And we often avoid using the rte memory allocator just because of this. 
> 
> Is there any reason for this, or could it perhaps be a compile-time configuration option not to call madvise()? 

Memory reserved by eal_get_virtual_area() is not yet useful,
but it is very large, so by excluding it from dumps,
DPDK prevents dumps from including large zero-filled parts.

It also makes sense to call eal_mem_set_dump(..., false)
from eal_memalloc.c:free_seg(), because of --huge-unlink=never:
in this mode (Linux-only), freed segments are not cleared,
so if they were included in the dump, it would be a lot of garbage data.

Newly allocated hugepages are not included in dumps
because this would make dumps very large by default.
However, this could be an opt-in as a runtime option if need be.
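
The three cases above can be summarized as a small decision function. This is
only a sketch: the enum, the function name, and the dump_opt_in flag are all
hypothetical and do not exist in DPDK. The flag stands in for the runtime
option suggested above.

```c
#include <stdbool.h>

/* Hypothetical lifecycle states for a memory segment. */
enum seg_state {
	SEG_RESERVED,  /* virtual area reserved, nothing mapped yet */
	SEG_ALLOCATED, /* hugepage mapped and in use */
	SEG_FREED,     /* returned by free_seg(), contents may be stale */
};

/*
 * Hypothetical policy: reserved and freed segments are never dumped;
 * allocated segments are dumped only if the user opted in at runtime.
 */
static bool
seg_should_dump(enum seg_state state, bool dump_opt_in)
{
	return state == SEG_ALLOCATED && dump_opt_in;
}
```

The result would be what gets passed as the last argument of
eal_mem_set_dump() for the segment in question.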

* Including contigmem in core dumps
@ 2024-05-28  0:34 Lewis Donzis
  2024-05-28  6:55 ` Dmitry Kozlyuk
  0 siblings, 1 reply; 8+ messages in thread
From: Lewis Donzis @ 2024-05-28  0:34 UTC (permalink / raw)
  To: dev

I've been wondering why we exclude memory allocated by eal_get_virtual_area() from core dumps? (More specifically, it calls eal_mem_set_dump() to call madvise() to disable core dumps from the allocated region.) 

On many occasions, when debugging after a crash, it would have been very convenient to be able to see the contents of an mbuf or other object allocated in contigmem space. And we often avoid using the rte memory allocator just because of this. 

Is there any reason for this, or could it perhaps be a compile-time configuration option not to call madvise()? 
