On CentOS 7, we observed that our program (based on DPDK 19.11) creates a huge core file, 100+ GB, far bigger than the expected <4 GB, even though the system only has 16 GB of memory installed and reserves 1 GB of hugepages at boot time. This happens whether the core file is created by a program crash (segfault) or by running the gcore tool.

On CentOS 6, the same program (based on DPDK 17.05) produces a core file of the expected size.

On CentOS 7, we tried various coredump_filter bit combinations and found that only clearing bit 0 avoids the huge core size. However, with bit 0 cleared the core file is small (200 MB) but useless for debugging, e.g. the gdb bt command produces no output.

Is there a way to avoid dumping the hugepage memory while keeping the other memory in the core file?

The following is a comparison of the program's pmap output.

On CentOS 6, the hugepage mappings reside in the process user space:
...
00007f4e80000000 1048576K rw-s- /mnt/huge_1GB/rtemap_0
00007f4ec0000000 2048K rw-s- /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/resource0
00007f4ec0200000 16K rw-s- /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/resource4
00007f4ec0204000 2048K rw-s- /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/resource0
00007f4ec0404000 16K rw-s- /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/resource4
...

On CentOS 7, the hugepage mappings reside in the process system space:
...
0000000100000000 20K rw-s- config
0000000100005000 184K rw-s- fbarray_memzone
0000000100033000 4K rw-s- fbarray_memseg-1048576k-0-0
0000000140000000 1048576K rw-s- rtemap_0
0000000180000000 32505856K r---- [ anon ]
0000000940000000 4K rw-s- fbarray_memseg-1048576k-0-1
0000000980000000 33554432K r---- [ anon ]
0000001180000000 4K rw-s- fbarray_memseg-1048576k-0-2
00000011c0000000 33554432K r---- [ anon ]
00000019c0000000 4K rw-s- fbarray_memseg-1048576k-0-3
0000001a00000000 33554432K r---- [ anon ]
0000002200000000 1024K rw-s- resource0
0000002200100000 16K rw-s- resource3
0000002200104000 1024K rw-s- resource0
0000002200204000 16K rw-s- resource3
...

Thanks,
-James
UPDATE: a core file produced with 'kill -6' (SIGABRT) does not include the hugepage memory zones.

Is there a way to skip dumping the hugepage memory zones into the core file when running the gcore command?
I think you should update your DPDK to the latest release.
I fixed this issue a few months ago:

d72e4042c - mem: exclude unused memory from core dump

Thanks,
Feng Li
@feng, thank you for the info. We'll pull the fix from eal_common_memory.c:eal_get_virtual_area() and give it a try.

Regards,
James Huang
UPDATE: yes, the fix works; it uses the madvise() system call.

Regards,
James Huang
On Wed, Feb 24, 2021 at 5:00 AM Li Feng <fengli@smartx.com> wrote:
> I think you should update your dpdk to the latest.
> I have fixed this issue some months ago.
>
> d72e4042c - mem: exclude unused memory from core dump

No need to go to the latest release; this commit has been backported to 19.11 recently:
http://git.dpdk.org/dpdk-stable/commit/?h=19.11&id=d3ceba92eff783d2c995387e5ed8509045578748

You can wait for the next 19.11.x release.

--
David Marchand