Add cc dev@dpdk.org
Thanks,
Feng Li
Li Feng <fengli@smartx.com> 于2020年4月20日周一 下午3:04写道:
>
> Avoid dump all mmapped memory to coredump file when crash.
> Otherwise it will very large and it's hard to analyze with gdb.
>
> In my test, it will dump 128GiB memory to coredump file when integrated
> to spdk with default configuration.
>
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---
> lib/librte_eal/common/eal_common_memory.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index cc7d54e0c..cc6743d56 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -102,6 +102,13 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
> if (mapped_addr == MAP_FAILED && allow_shrink)
> *size -= page_sz;
>
> + if (mapped_addr != MAP_FAILED) {
> + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0)
> + RTE_LOG(INFO, EAL, "MADV_DONTDUMP advice setting failed.\n");
> + RTE_LOG(DEBUG, EAL, "madvise with MADV_DONTDUMP, addr: %p size: %ld\n",
> + mapped_addr, (size_t)map_sz);
> + }
> +
> if (mapped_addr != MAP_FAILED && addr_is_hint &&
> mapped_addr != requested_addr) {
> try++;
> --
> 2.11.0
>
--
The SmartX email address is only for business purpose. Any sent message
that is not related to the business is not authorized or permitted by
SmartX.
本邮箱为北京志凌海纳科技有限公司(SmartX)工作邮箱. 如本邮箱发出的邮件与工作无关,该邮件未得到本公司任何的明示或默示的授权.
On Mon, Apr 20, 2020 at 9:10 AM Li Feng <fengli@smartx.com> wrote: > Li Feng <fengli@smartx.com> 于2020年4月20日周一 下午3:04写道: > > > > Avoid dump all mmapped memory to coredump file when crash. > > Otherwise it will very large and it's hard to analyze with gdb. > > > > In my test, it will dump 128GiB memory to coredump file when integrated > > to spdk with default configuration. Can you test with current master which has this change? https://git.dpdk.org/dpdk/commit?id=8a4baf06c17a806696fb10aba36fce7471983028 Thanks. -- David Marchand
Thank you, Marchand,
I have just tested your patch, and it doesn't work.
The core dump file is still very very large, the same to virtual memory size.
David Marchand <david.marchand@redhat.com> 于2020年4月20日周一 下午3:13写道:
>
> On Mon, Apr 20, 2020 at 9:10 AM Li Feng <fengli@smartx.com> wrote:
> > Li Feng <fengli@smartx.com> 于2020年4月20日周一 下午3:04写道:
> > >
> > > Avoid dump all mmapped memory to coredump file when crash.
> > > Otherwise it will very large and it's hard to analyze with gdb.
> > >
> > > In my test, it will dump 128GiB memory to coredump file when integrated
> > > to spdk with default configuration.
>
> Can you test with current master which has this change?
> https://git.dpdk.org/dpdk/commit?id=8a4baf06c17a806696fb10aba36fce7471983028
>
> Thanks.
>
> --
> David Marchand
>
On Mon, Apr 20, 2020 at 11:41 AM Feng Li <lifeng1519@gmail.com> wrote: > > Thank you, Marchand, David is fine. > > I have just tested your patch, and it doesn't work. > The core dump file is still very very large, the same to virtual memory size. Please double check that the patch is in. I remember checking coredump size with my patch. This needs more investigation but I don't have time atm. Anatoly, opinion? -- David Marchand
Hi David,
Mmap with PROT_NONE does not affected the core dump size.
Here is a simple test prog:
#include <sys/mman.h>
#include <time.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char** argv) {
// FIXME(fengli): XXXXX
uint64_t gb = atoi(argv[1]);
void* ptr = mmap(0, gb << 30, PROT_NONE, MAP_PRIVATE |
MAP_ANONYMOUS, -1, 0);
if (ptr == (void*)-1) {
perror("[-] mmap failed with MAP_PRIVATE | MAP_ANONYMOUS");
exit(1);
}
while(1)
sleep(1);
return 0;
}
Thanks.
David Marchand <david.marchand@redhat.com> 于2020年4月21日周二 下午3:41写道:
>
> On Mon, Apr 20, 2020 at 11:41 AM Feng Li <lifeng1519@gmail.com> wrote:
> >
> > Thank you, Marchand,
>
> David is fine.
>
> >
> > I have just tested your patch, and it doesn't work.
> > The core dump file is still very very large, the same to virtual memory size.
>
> Please double check that the patch is in.
> I remember checking coredump size with my patch.
>
> This needs more investigation but I don't have time atm.
> Anatoly, opinion?
>
>
> --
> David Marchand
>
On 21-Apr-20 12:06 PM, Feng Li wrote:
> #include <sys/mman.h>
> #include <time.h>
> #include <stdint.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> int main(int argc, char** argv) {
> // FIXME(fengli): XXXXX
> uint64_t gb = atoi(argv[1]);
> void* ptr = mmap(0, gb << 30, PROT_NONE, MAP_PRIVATE |
> MAP_ANONYMOUS, -1, 0);
> if (ptr == (void*)-1) {
> perror("[-] mmap failed with MAP_PRIVATE | MAP_ANONYMOUS");
> exit(1);
> }
> while(1)
> sleep(1);
> return 0;
> }
DONTDUMP is available since Linux 3.4. I presume our minimum kernel
version is higher than that.
I have little idea of how dumping works, but reading the manpage for
madvise, DONTDUMP should be the way to go here. Also, reading up on
PROT_NONE, i can't find any references to this memory necessarily being
excluded from core dumps.
That said, I've run the program above, and i got a core dump sized
~100K. Do i need any special configuration to trigger core dump that
would include that anonymous memory?
--
Thanks,
Anatoly
Hi Anatoly,
This program could run like this:
gcc test_map.c
./a.out 2 &
gcore `pidof a.out`
You will get a core dump sized to 2GiB.
Thanks,
Feng Li
Burakov, Anatoly <anatoly.burakov@intel.com> 于2020年4月21日周二 下午8:19写道:
>
> On 21-Apr-20 12:06 PM, Feng Li wrote:
> > #include <sys/mman.h>
> > #include <time.h>
> > #include <stdint.h>
> > #include <unistd.h>
> > #include <stdlib.h>
> > #include <stdio.h>
> >
> > int main(int argc, char** argv) {
> > // FIXME(fengli): XXXXX
> > uint64_t gb = atoi(argv[1]);
> > void* ptr = mmap(0, gb << 30, PROT_NONE, MAP_PRIVATE |
> > MAP_ANONYMOUS, -1, 0);
> > if (ptr == (void*)-1) {
> > perror("[-] mmap failed with MAP_PRIVATE | MAP_ANONYMOUS");
> > exit(1);
> > }
> > while(1)
> > sleep(1);
> > return 0;
> > }
>
> DONTDUMP is available since Linux 3.4. I presume our minimum kernel
> version is higher than that.
>
> I have little idea of how dumping works, but reading the manpage for
> madvise, DONTDUMP should be the way to go here. Also, reading up on
> PROT_NONE, i can't find any references to this memory necessarily being
> excluded from core dumps.
>
> That said, I've run the program above, and i got a core dump sized
> ~100K. Do i need any special configuration to trigger core dump that
> would include that anonymous memory?
>
> --
> Thanks,
> Anatoly
On 21-Apr-20 5:38 PM, Feng Li wrote:
> Hi Anatoly,
>
> This program could run like this:
> gcc test_map.c
> ./a.out 2 &
> gcore `pidof a.out`
>
> You will get a core dump sized to 2GiB.
>
> Thanks,
> Feng Li
>
I did just that, and my core dump was ~100K in size. Hence my asking
about any special configuration required.
--
Thanks,
Anatoly
Really?So weird.
I have tested on 4 machines, running CentOS 7.
The core dump size is the same as the first argument GiB.
Thanks,
Feng Li
Burakov, Anatoly <anatoly.burakov@intel.com> 于2020年4月22日周三 上午1:02写道:
>
> On 21-Apr-20 5:38 PM, Feng Li wrote:
> > Hi Anatoly,
> >
> > This program could run like this:
> > gcc test_map.c
> > ./a.out 2 &
> > gcore `pidof a.out`
> >
> > You will get a core dump sized to 2GiB.
> >
> > Thanks,
> > Feng Li
> >
>
> I did just that, and my core dump was ~100K in size. Hence my asking
> about any special configuration required.
>
> --
> Thanks,
> Anatoly
--
The SmartX email address is only for business purpose. Any sent message
that is not related to the business is not authorized or permitted by
SmartX.
本邮箱为北京志凌海纳科技有限公司(SmartX)工作邮箱. 如本邮箱发出的邮件与工作无关,该邮件未得到本公司任何的明示或默示的授权.
On 22-Apr-20 4:13 AM, Li Feng wrote:
> Really?So weird.
> I have tested on 4 machines, running CentOS 7.
> The core dump size is the same as the first argument GiB.
>
> Thanks,
>
> Feng Li
>
> Burakov, Anatoly <anatoly.burakov@intel.com> 于2020年4月22日周三 上午1:02写道:
>>
>> On 21-Apr-20 5:38 PM, Feng Li wrote:
>>> Hi Anatoly,
>>>
>>> This program could run like this:
>>> gcc test_map.c
>>> ./a.out 2 &
>>> gcore `pidof a.out`
>>>
>>> You will get a core dump sized to 2GiB.
>>>
>>> Thanks,
>>> Feng Li
>>>
>>
>> I did just that, and my core dump was ~100K in size. Hence my asking
>> about any special configuration required.
>>
>> --
>> Thanks,
>> Anatoly
>
Apologies, i've looked at the wrong core dump file. I do get a 2G core
dump out of it.
However, i've added madvise call for that memory and run the exact same
command line you gave me, and i can see multiple core dump files, some
of them are 2G files. Does that mean that MADV_DONTDUMP doesn't work?
--
Thanks,
Anatoly
On 23-Apr-20 7:36 AM, Feng Li wrote:
> Hi,
> I have tested as follows, the core dump file is ~ 200KB.
> It should generate one core dump file each crash.
>
> #include <sys/mman.h>
> #include <time.h>
> #include <stdint.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> int main(int argc, char** argv) {
> // FIXME(fengli): XXXXX
> uint64_t size = 1<<30;
> void* ptr = mmap(0, size , PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> if (ptr == (void*)-1) {
> perror("[-] mmap failed with MAP_PRIVATE | MAP_ANONYMOUS");
> exit(1);
> }
> if (madvise(ptr, size , MADV_DONTDUMP) != 0)
> perror("[-] madvise failed");
> while(1)
> sleep(1);
> return 0;
> }
>
That's odd, your code works. Mine, even though it did the same thing,
didn't work the same way. My compiler must like you more than it likes
me :) (or perhaps i had a typo...)
Anyway, i can see that this indeed prevents core dumps on madvise'd
memory (i've also tested it with PROT_NONE).
I'll go ahead and ack the original patch then.
--
Thanks,
Anatoly
Avoid dump all mapped memory to a core dump file when crash. Otherwise it will very large and it's hard to analyze with gdb. In my test, it will dump 128GiB memory to a core dump file when integrated to spdk with default configuration. Signed-off-by: Li Feng <fengli@smartx.com> --- lib/librte_eal/common/eal_common_memory.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index cc7d54e0c..2d9564b28 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size, after_len = RTE_PTR_DIFF(map_end, aligned_end); if (after_len > 0) munmap(aligned_end, after_len); + + /* + * Exclude this pages from a core dump. + */ + if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0) + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", + strerror(errno)); + } else { + /* + * Exclude this pages from a core dump. + */ + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0) + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", + strerror(errno)); } return aligned_addr; -- 2.11.0 -- The SmartX email address is only for business purpose. Any sent message that is not related to the business is not authorized or permitted by SmartX. 本邮箱为北京志凌海纳科技有限公司(SmartX)工作邮箱. 如本邮箱发出的邮件与工作无关,该邮件未得到本公司任何的明示或默示的授权.
On 23-Apr-20 4:43 PM, Li Feng wrote: > Avoid dump all mapped memory to a core dump file when crash. > Otherwise it will very large and it's hard to analyze with gdb. > > In my test, it will dump 128GiB memory to a core dump file when integrated > to spdk with default configuration. Suggested rewording: Currently, even though memory is mapped with PROT_NONE, this does not cause it to be excluded from core dumps. This is counter-productive, because in a lot of cases, this memory will go unused (e.g. when the memory subsystem preallocates VA space but hasn't yet mapped physical pages into it). Use `madvise()` call with MADV_DONTDUMP parameter to exclude the unmapped memory from being dumped. > > Signed-off-by: Li Feng <fengli@smartx.com> > --- > lib/librte_eal/common/eal_common_memory.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c > index cc7d54e0c..2d9564b28 100644 > --- a/lib/librte_eal/common/eal_common_memory.c > +++ b/lib/librte_eal/common/eal_common_memory.c > @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size, > after_len = RTE_PTR_DIFF(map_end, aligned_end); > if (after_len > 0) > munmap(aligned_end, after_len); > + > + /* > + * Exclude this pages from a core dump. > + */ > + if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0) > + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", > + strerror(errno)); > + } else { > + /* > + * Exclude this pages from a core dump. > + */ > + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0) > + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", > + strerror(errno)); > } > > return aligned_addr; > For the contents of this patch, Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> However, even though this is good to have, after some more thought, i believe the fix is incomplete, because this is not the only place we're reserving anonymous memory. We're also doing so in `eal_memalloc.c:free_seg()`, so an `madvise()` call should also be added there. @David, now that i think of it, the PROT_NONE patch also was incomplete, as we only set PROT_NONE to memory that's initially reserved, but not when it's unmapped and returned back to the pool of anonymous memory. So, eal_memalloc.c should also remap anonymous memory with PROT_NONE. @Li Feng, would you be so kind as to provide a patch replacing PROT_READ with PROT_NONE in eal_memalloc.c as well? Thank you very much! -- Thanks, Anatoly
On Thu, Apr 23, 2020 at 6:34 PM Burakov, Anatoly <anatoly.burakov@intel.com> wrote: > > diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c > > index cc7d54e0c..2d9564b28 100644 > > --- a/lib/librte_eal/common/eal_common_memory.c > > +++ b/lib/librte_eal/common/eal_common_memory.c > > @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size, > > after_len = RTE_PTR_DIFF(map_end, aligned_end); > > if (after_len > 0) > > munmap(aligned_end, after_len); > > + > > + /* > > + * Exclude this pages from a core dump. > > + */ > > + if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0) > > + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", > > + strerror(errno));> + } else { > > + /* > > + * Exclude this pages from a core dump. > > + */ > > + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0) > > + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", > > + strerror(errno)); > > } > > > > return aligned_addr; > > > > For the contents of this patch, MADV_DONTDUMP does not seem POSIX, but as I said [1], there seems to be a MADV_NOCORE option on FreeBSD. 1: http://inbox.dpdk.org/dev/CAJFAV8y9YtT-7njUz+mD6U8+3XUqYrgp28KD7jy2923EpAcXrg@mail.gmail.com/ > > Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> > > However, even though this is good to have, after some more thought, i > believe the fix is incomplete, because this is not the only place we're > reserving anonymous memory. We're also doing so in > `eal_memalloc.c:free_seg()`, so an `madvise()` call should also be added > there. > > @David, now that i think of it, the PROT_NONE patch also was incomplete, > as we only set PROT_NONE to memory that's initially reserved, but not > when it's unmapped and returned back to the pool of anonymous memory. > So, eal_memalloc.c should also remap anonymous memory with PROT_NONE. I can't disagree if you say so :-). > > @Li Feng, would you be so kind as to provide a patch replacing PROT_READ > with PROT_NONE in eal_memalloc.c as well? Thank you very much! > Once we have the proper fixes, I'd like to get this Cc: stable@dpdk.org. Thanks. -- David Marchand
On 23-Apr-20 9:04 PM, David Marchand wrote: > On Thu, Apr 23, 2020 at 6:34 PM Burakov, Anatoly > <anatoly.burakov@intel.com> wrote: >>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c >>> index cc7d54e0c..2d9564b28 100644 >>> --- a/lib/librte_eal/common/eal_common_memory.c >>> +++ b/lib/librte_eal/common/eal_common_memory.c >>> @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size, >>> after_len = RTE_PTR_DIFF(map_end, aligned_end); >>> if (after_len > 0) >>> munmap(aligned_end, after_len); >>> + >>> + /* >>> + * Exclude this pages from a core dump. >>> + */ >>> + if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0) >>> + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", >>> + strerror(errno));> + } else { >>> + /* >>> + * Exclude this pages from a core dump. >>> + */ >>> + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0) >>> + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", >>> + strerror(errno)); >>> } >>> >>> return aligned_addr; >>> >> >> For the contents of this patch, > > MADV_DONTDUMP does not seem POSIX, but as I said [1], there seems to > be a MADV_NOCORE option on FreeBSD. > 1: http://inbox.dpdk.org/dev/CAJFAV8y9YtT-7njUz+mD6U8+3XUqYrgp28KD7jy2923EpAcXrg@mail.gmail.com/ > > Oh, right, so this would probably not compile on FreeBSD. Perhaps this function would have to be OS-specific after all (or call into an OS-specific madvise() after reserving the memory area). >> >> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> >> >> However, even though this is good to have, after some more thought, i >> believe the fix is incomplete, because this is not the only place we're >> reserving anonymous memory. We're also doing so in >> `eal_memalloc.c:free_seg()`, so an `madvise()` call should also be added >> there. >> >> @David, now that i think of it, the PROT_NONE patch also was incomplete, >> as we only set PROT_NONE to memory that's initially reserved, but not >> when it's unmapped and returned back to the pool of anonymous memory. >> So, eal_memalloc.c should also remap anonymous memory with PROT_NONE. > > I can't disagree if you say so :-). Nice to have that kind of power! *evil laugh* > >> >> @Li Feng, would you be so kind as to provide a patch replacing PROT_READ >> with PROT_NONE in eal_memalloc.c as well? Thank you very much! >> > > Once we have the proper fixes, I'd like to get this Cc: stable@dpdk.org. > Thanks. > > -- Thanks, Anatoly
On Fri, Apr 24, 2020 at 10:12:10AM +0100, Burakov, Anatoly wrote:
> On 23-Apr-20 9:04 PM, David Marchand wrote:
> > On Thu, Apr 23, 2020 at 6:34 PM Burakov, Anatoly
> > <anatoly.burakov@intel.com> wrote:
> > > > diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> > > > index cc7d54e0c..2d9564b28 100644
> > > > --- a/lib/librte_eal/common/eal_common_memory.c
> > > > +++ b/lib/librte_eal/common/eal_common_memory.c
> > > > @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
> > > > after_len = RTE_PTR_DIFF(map_end, aligned_end);
> > > > if (after_len > 0)
> > > > munmap(aligned_end, after_len);
> > > > +
> > > > + /*
> > > > + * Exclude this pages from a core dump.
> > > > + */
> > > > + if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0)
> > > > + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n",
> > > > + strerror(errno));> + } else {
> > > > + /*
> > > > + * Exclude this pages from a core dump.
> > > > + */
> > > > + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0)
> > > > + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n",
> > > > + strerror(errno));
> > > > }
> > > >
> > > > return aligned_addr;
> > > >
> > >
> > > For the contents of this patch,
> >
> > MADV_DONTDUMP does not seem POSIX, but as I said [1], there seems to
> > be a MADV_NOCORE option on FreeBSD.
> > 1: http://inbox.dpdk.org/dev/CAJFAV8y9YtT-7njUz+mD6U8+3XUqYrgp28KD7jy2923EpAcXrg@mail.gmail.com/
> >
> >
>
> Oh, right, so this would probably not compile on FreeBSD. Perhaps this
> function would have to be OS-specific after all (or call into an OS-specific
> madvise() after reserving the memory area).
>
Is it just a differently named flag? If so, I think a single #ifdef macro
won't kill us in the common code.
Bruce Richardson <bruce.richardson@intel.com> 于2020年4月24日周五 下午5:14写道: > > On Fri, Apr 24, 2020 at 10:12:10AM +0100, Burakov, Anatoly wrote: > > On 23-Apr-20 9:04 PM, David Marchand wrote: > > > On Thu, Apr 23, 2020 at 6:34 PM Burakov, Anatoly > > > <anatoly.burakov@intel.com> wrote: > > > > > diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c > > > > > index cc7d54e0c..2d9564b28 100644 > > > > > --- a/lib/librte_eal/common/eal_common_memory.c > > > > > +++ b/lib/librte_eal/common/eal_common_memory.c > > > > > @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size, > > > > > after_len = RTE_PTR_DIFF(map_end, aligned_end); > > > > > if (after_len > 0) > > > > > munmap(aligned_end, after_len); > > > > > + > > > > > + /* > > > > > + * Exclude this pages from a core dump. > > > > > + */ > > > > > + if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0) > > > > > + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", > > > > > + strerror(errno));> + } else { > > > > > + /* > > > > > + * Exclude this pages from a core dump. > > > > > + */ > > > > > + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0) > > > > > + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", > > > > > + strerror(errno)); > > > > > } > > > > > > > > > > return aligned_addr; > > > > > > > > > > > > > For the contents of this patch, > > > > > > MADV_DONTDUMP does not seem POSIX, but as I said [1], there seems to > > > be a MADV_NOCORE option on FreeBSD. > > > 1: http://inbox.dpdk.org/dev/CAJFAV8y9YtT-7njUz+mD6U8+3XUqYrgp28KD7jy2923EpAcXrg@mail.gmail.com/ > > > > > > > > > > Oh, right, so this would probably not compile on FreeBSD. Perhaps this > > function would have to be OS-specific after all (or call into an OS-specific > > madvise() after reserving the memory area). > > > > Is it just a differently named flag? If so, I think a single #ifdef macro > won't kill us in the common code. > Just the flag name is different. I should use RTE_EXEC_ENV_FREEBSD and RTE_EXEC_ENV_LINUX, right? Another question, in `eal_memalloc.c:alloc_seg`, I should undo the DONTMAP of the memory region. Right? @Anatoly Just few minutes, I have prepared a patch for the OS-specific code: --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -443,4 +443,20 @@ rte_option_usage(void); uint64_t eal_get_baseaddr(void); +/** + * @internal + * Exclude this pages from a core dump. + * + * @param addr + * The memory region starts. + * + * @param len + * The memory region length.. + * + * @return + * returns 0 or -errno + */ +int +eal_madvise_dontdump(void* addr, size_t len); + #endif /* _EAL_PRIVATE_H_ */ diff --git a/lib/librte_eal/freebsd/eal_memory.c b/lib/librte_eal/freebsd/eal_memory.c index a97d8f0f0..585042dde 100644 --- a/lib/librte_eal/freebsd/eal_memory.c +++ b/lib/librte_eal/freebsd/eal_memory.c @@ -534,3 +534,9 @@ rte_eal_memseg_init(void) memseg_primary_init() : memseg_secondary_init(); } + +int +eal_madvise_dontdump(void* addr, size_t len) +{ + return madvise(addr, len, MADV_NOCORE); +} diff --git a/lib/librte_eal/linux/eal_memory.c b/lib/librte_eal/linux/eal_memory.c index 7a9c97ff8..cfdbfccfe 100644 --- a/lib/librte_eal/linux/eal_memory.c +++ b/lib/librte_eal/linux/eal_memory.c @@ -2479,3 +2479,9 @@ rte_eal_memseg_init(void) #endif memseg_secondary_init(); } + +int +eal_madvise_dontdump(void* addr, size_t len) +{ + return madvise(addr, len, MADV_DONTDUMP); +}
On 24-Apr-20 10:33 AM, Feng Li wrote: > Bruce Richardson <bruce.richardson@intel.com> 于2020年4月24日周五 下午5:14写道: >> >> On Fri, Apr 24, 2020 at 10:12:10AM +0100, Burakov, Anatoly wrote: >>> On 23-Apr-20 9:04 PM, David Marchand wrote: >>>> On Thu, Apr 23, 2020 at 6:34 PM Burakov, Anatoly >>>> <anatoly.burakov@intel.com> wrote: >>>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c >>>>>> index cc7d54e0c..2d9564b28 100644 >>>>>> --- a/lib/librte_eal/common/eal_common_memory.c >>>>>> +++ b/lib/librte_eal/common/eal_common_memory.c >>>>>> @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size, >>>>>> after_len = RTE_PTR_DIFF(map_end, aligned_end); >>>>>> if (after_len > 0) >>>>>> munmap(aligned_end, after_len); >>>>>> + >>>>>> + /* >>>>>> + * Exclude this pages from a core dump. >>>>>> + */ >>>>>> + if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0) >>>>>> + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", >>>>>> + strerror(errno));> + } else { >>>>>> + /* >>>>>> + * Exclude this pages from a core dump. >>>>>> + */ >>>>>> + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0) >>>>>> + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", >>>>>> + strerror(errno)); >>>>>> } >>>>>> >>>>>> return aligned_addr; >>>>>> >>>>> >>>>> For the contents of this patch, >>>> >>>> MADV_DONTDUMP does not seem POSIX, but as I said [1], there seems to >>>> be a MADV_NOCORE option on FreeBSD. >>>> 1: http://inbox.dpdk.org/dev/CAJFAV8y9YtT-7njUz+mD6U8+3XUqYrgp28KD7jy2923EpAcXrg@mail.gmail.com/ >>>> >>>> >>> >>> Oh, right, so this would probably not compile on FreeBSD. Perhaps this >>> function would have to be OS-specific after all (or call into an OS-specific >>> madvise() after reserving the memory area). >>> >> >> Is it just a differently named flag? If so, I think a single #ifdef macro >> won't kill us in the common code. >> > Just the flag name is different. > I should use RTE_EXEC_ENV_FREEBSD and RTE_EXEC_ENV_LINUX, right? Yes, but we need this in two places, so a function call is still necessary. > > Another question, in `eal_memalloc.c:alloc_seg`, I should undo the > DONTMAP of the memory region. > Right? @Anatoly I don't think it's necessary. When you map different memory into that region, madvise() flags no longer apply. To be sure, i just tested this by adding another mmap() call after madvise() (in your test app) and remapping the same memory with MAP_FIXED, and the core dump was back to 1GB of size. So, no, i don't think you should undo anything - the system does so automatically. > > Just few minutes, I have prepared a patch for the OS-specific code: > --- a/lib/librte_eal/common/eal_private.h > +++ b/lib/librte_eal/common/eal_private.h > @@ -443,4 +443,20 @@ rte_option_usage(void); > uint64_t > eal_get_baseaddr(void); > > +/** > + * @internal > + * Exclude this pages from a core dump. > + * > + * @param addr > + * The memory region starts. > + * > + * @param len > + * The memory region length.. > + * > + * @return > + * returns 0 or -errno > + */ > +int > +eal_madvise_dontdump(void* addr, size_t len); > + > #endif /* _EAL_PRIVATE_H_ */ > diff --git a/lib/librte_eal/freebsd/eal_memory.c > b/lib/librte_eal/freebsd/eal_memory.c > index a97d8f0f0..585042dde 100644 > --- a/lib/librte_eal/freebsd/eal_memory.c > +++ b/lib/librte_eal/freebsd/eal_memory.c > @@ -534,3 +534,9 @@ rte_eal_memseg_init(void) > memseg_primary_init() : > memseg_secondary_init(); > } > + > +int > +eal_madvise_dontdump(void* addr, size_t len) > +{ > + return madvise(addr, len, MADV_NOCORE); > +} > diff --git a/lib/librte_eal/linux/eal_memory.c > b/lib/librte_eal/linux/eal_memory.c > index 7a9c97ff8..cfdbfccfe 100644 > --- a/lib/librte_eal/linux/eal_memory.c > +++ b/lib/librte_eal/linux/eal_memory.c > @@ -2479,3 +2479,9 @@ rte_eal_memseg_init(void) > #endif > memseg_secondary_init(); > } > + > +int > +eal_madvise_dontdump(void* addr, size_t len) > +{ > + return madvise(addr, len, MADV_DONTDUMP); > +} > That would work as well (with added FreeBSD code of course), however if everyone else is OK with it, i'll settle for an #ifdef in common code. -- Thanks, Anatoly
Thanks, Feng Li Burakov, Anatoly <anatoly.burakov@intel.com> 于2020年4月24日周五 下午7:00写道: > > On 24-Apr-20 10:33 AM, Feng Li wrote: > > Bruce Richardson <bruce.richardson@intel.com> 于2020年4月24日周五 下午5:14写道: > >> > >> On Fri, Apr 24, 2020 at 10:12:10AM +0100, Burakov, Anatoly wrote: > >>> On 23-Apr-20 9:04 PM, David Marchand wrote: > >>>> On Thu, Apr 23, 2020 at 6:34 PM Burakov, Anatoly > >>>> <anatoly.burakov@intel.com> wrote: > >>>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c > >>>>>> index cc7d54e0c..2d9564b28 100644 > >>>>>> --- a/lib/librte_eal/common/eal_common_memory.c > >>>>>> +++ b/lib/librte_eal/common/eal_common_memory.c > >>>>>> @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size, > >>>>>> after_len = RTE_PTR_DIFF(map_end, aligned_end); > >>>>>> if (after_len > 0) > >>>>>> munmap(aligned_end, after_len); > >>>>>> + > >>>>>> + /* > >>>>>> + * Exclude this pages from a core dump. > >>>>>> + */ > >>>>>> + if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0) > >>>>>> + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", > >>>>>> + strerror(errno));> + } else { > >>>>>> + /* > >>>>>> + * Exclude this pages from a core dump. > >>>>>> + */ > >>>>>> + if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0) > >>>>>> + RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n", > >>>>>> + strerror(errno)); > >>>>>> } > >>>>>> > >>>>>> return aligned_addr; > >>>>>> > >>>>> > >>>>> For the contents of this patch, > >>>> > >>>> MADV_DONTDUMP does not seem POSIX, but as I said [1], there seems to > >>>> be a MADV_NOCORE option on FreeBSD. > >>>> 1: http://inbox.dpdk.org/dev/CAJFAV8y9YtT-7njUz+mD6U8+3XUqYrgp28KD7jy2923EpAcXrg@mail.gmail.com/ > >>>> > >>>> > >>> > >>> Oh, right, so this would probably not compile on FreeBSD. Perhaps this > >>> function would have to be OS-specific after all (or call into an OS-specific > >>> madvise() after reserving the memory area). > >>> > >> > >> Is it just a differently named flag? If so, I think a single #ifdef macro > >> won't kill us in the common code. > >> > > Just the flag name is different. > > I should use RTE_EXEC_ENV_FREEBSD and RTE_EXEC_ENV_LINUX, right? > > Yes, but we need this in two places, so a function call is still necessary. > > > > > Another question, in `eal_memalloc.c:alloc_seg`, I should undo the > > DONTMAP of the memory region. > > Right? @Anatoly > > I don't think it's necessary. When you map different memory into that > region, madvise() flags no longer apply. To be sure, i just tested this > by adding another mmap() call after madvise() (in your test app) and > remapping the same memory with MAP_FIXED, and the core dump was back to > 1GB of size. So, no, i don't think you should undo anything - the system > does so automatically. Got it. > > > > > Just few minutes, I have prepared a patch for the OS-specific code: > > --- a/lib/librte_eal/common/eal_private.h > > +++ b/lib/librte_eal/common/eal_private.h > > @@ -443,4 +443,20 @@ rte_option_usage(void); > > uint64_t > > eal_get_baseaddr(void); > > > > +/** > > + * @internal > > + * Exclude this pages from a core dump. > > + * > > + * @param addr > > + * The memory region starts. > > + * > > + * @param len > > + * The memory region length.. > > + * > > + * @return > > + * returns 0 or -errno > > + */ > > +int > > +eal_madvise_dontdump(void* addr, size_t len); > > + > > #endif /* _EAL_PRIVATE_H_ */ > > diff --git a/lib/librte_eal/freebsd/eal_memory.c > > b/lib/librte_eal/freebsd/eal_memory.c > > index a97d8f0f0..585042dde 100644 > > --- a/lib/librte_eal/freebsd/eal_memory.c > > +++ b/lib/librte_eal/freebsd/eal_memory.c > > @@ -534,3 +534,9 @@ rte_eal_memseg_init(void) > > memseg_primary_init() : > > memseg_secondary_init(); > > } > > + > > +int > > +eal_madvise_dontdump(void* addr, size_t len) > > +{ > > + return madvise(addr, len, MADV_NOCORE); > > +} > > diff --git a/lib/librte_eal/linux/eal_memory.c > > b/lib/librte_eal/linux/eal_memory.c > > index 7a9c97ff8..cfdbfccfe 100644 > > --- a/lib/librte_eal/linux/eal_memory.c > > +++ b/lib/librte_eal/linux/eal_memory.c > > @@ -2479,3 +2479,9 @@ rte_eal_memseg_init(void) > > #endif > > memseg_secondary_init(); > > } > > + > > +int > > +eal_madvise_dontdump(void* addr, size_t len) > > +{ > > + return madvise(addr, len, MADV_DONTDUMP); > > +} > > > > That would work as well (with added FreeBSD code of course), however if > everyone else is OK with it, i'll settle for an #ifdef in common code. > > -- > Thanks, > Anatoly -- The SmartX email address is only for business purpose. Any sent message that is not related to the business is not authorized or permitted by SmartX. 本邮箱为北京志凌海纳科技有限公司(SmartX)工作邮箱. 如本邮箱发出的邮件与工作无关,该邮件未得到本公司任何的明示或默示的授权.