From: "Burakov, Anatoly"
To: Feng Li, Bruce Richardson
Cc: David Marchand, Li Feng, dev, Kyle Zhang, Yang Fan
Date: Fri, 24 Apr 2020 12:00:38 +0100
Subject: Re: [dpdk-dev] [PATCH v2] eal: add madvise to avoid dump memory

On 24-Apr-20 10:33 AM, Feng Li wrote:
> Bruce Richardson wrote on Fri, 24 Apr 2020 at 5:14 PM:
>>
>> On Fri, Apr 24, 2020 at 10:12:10AM +0100, Burakov, Anatoly wrote:
>>> On 23-Apr-20 9:04 PM, David Marchand wrote:
>>>> On Thu, Apr 23, 2020 at 6:34 PM Burakov, Anatoly
>>>> wrote:
>>>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
>>>>>> index cc7d54e0c..2d9564b28 100644
>>>>>> --- a/lib/librte_eal/common/eal_common_memory.c
>>>>>> +++ b/lib/librte_eal/common/eal_common_memory.c
>>>>>> @@ -177,6 +177,20 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
>>>>>>  		after_len = RTE_PTR_DIFF(map_end, aligned_end);
>>>>>>  		if (after_len > 0)
>>>>>>  			munmap(aligned_end, after_len);
>>>>>> +
>>>>>> +		/*
>>>>>> +		 * Exclude this pages from a core dump.
>>>>>> +		 */
>>>>>> +		if (madvise(aligned_addr, *size, MADV_DONTDUMP) != 0)
>>>>>> +			RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n",
>>>>>> +				strerror(errno));
>>>>>> +	} else {
>>>>>> +		/*
>>>>>> +		 * Exclude this pages from a core dump.
>>>>>> +		 */
>>>>>> +		if (madvise(mapped_addr, map_sz, MADV_DONTDUMP) != 0)
>>>>>> +			RTE_LOG(WARNING, EAL, "Madvise with MADV_DONTDUMP failed: %s\n",
>>>>>> +				strerror(errno));
>>>>>>  	}
>>>>>>
>>>>>>  	return aligned_addr;
>>>>>>
>>>>> For the contents of this patch,
>>>>
>>>> MADV_DONTDUMP does not seem POSIX, but as I said [1], there seems to
>>>> be a MADV_NOCORE option on FreeBSD.
>>>> 1: http://inbox.dpdk.org/dev/CAJFAV8y9YtT-7njUz+mD6U8+3XUqYrgp28KD7jy2923EpAcXrg@mail.gmail.com/
>>>>
>>>>
>>>
>>> Oh, right, so this would probably not compile on FreeBSD. Perhaps this
>>> function would have to be OS-specific after all (or call into an OS-specific
>>> madvise() after reserving the memory area).
>>>
>>
>> Is it just a differently named flag?
>> If so, I think a single #ifdef macro
>> won't kill us in the common code.
>>
> Just the flag name is different.
> I should use RTE_EXEC_ENV_FREEBSD and RTE_EXEC_ENV_LINUX, right?

Yes, but we need this in two places, so a function call is still necessary.

>
> Another question, in `eal_memalloc.c:alloc_seg`, I should undo the
> DONTMAP of the memory region.
> Right? @Anatoly

I don't think it's necessary. When you map different memory into that
region, the madvise() flags no longer apply. To be sure, I just tested
this by adding another mmap() call after madvise() (in your test app),
remapping the same memory with MAP_FIXED, and the core dump was back to
1 GB in size. So, no, I don't think you should undo anything; the system
does so automatically.

>
> Just few minutes, I have prepared a patch for the OS-specific code:
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -443,4 +443,20 @@ rte_option_usage(void);
>  uint64_t
>  eal_get_baseaddr(void);
>
> +/**
> + * @internal
> + * Exclude this pages from a core dump.
> + *
> + * @param addr
> + *   The memory region starts.
> + *
> + * @param len
> + *   The memory region length..
> + *
> + * @return
> + *   returns 0 or -errno
> + */
> +int
> +eal_madvise_dontdump(void* addr, size_t len);
> +
>  #endif /* _EAL_PRIVATE_H_ */
> diff --git a/lib/librte_eal/freebsd/eal_memory.c
> b/lib/librte_eal/freebsd/eal_memory.c
> index a97d8f0f0..585042dde 100644
> --- a/lib/librte_eal/freebsd/eal_memory.c
> +++ b/lib/librte_eal/freebsd/eal_memory.c
> @@ -534,3 +534,9 @@ rte_eal_memseg_init(void)
>  			memseg_primary_init() :
>  			memseg_secondary_init();
>  }
> +
> +int
> +eal_madvise_dontdump(void* addr, size_t len)
> +{
> +	return madvise(addr, len, MADV_NOCORE);
> +}
> diff --git a/lib/librte_eal/linux/eal_memory.c
> b/lib/librte_eal/linux/eal_memory.c
> index 7a9c97ff8..cfdbfccfe 100644
> --- a/lib/librte_eal/linux/eal_memory.c
> +++ b/lib/librte_eal/linux/eal_memory.c
> @@ -2479,3 +2479,9 @@ rte_eal_memseg_init(void)
>  #endif
>  			memseg_secondary_init();
>  }
> +
> +int
> +eal_madvise_dontdump(void* addr, size_t len)
> +{
> +	return madvise(addr, len, MADV_DONTDUMP);
> +}
>

That would work as well (with added FreeBSD code of course); however, if
everyone else is OK with it, I'll settle for an #ifdef in common code.

-- 
Thanks,
Anatoly
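A minimal sketch of the single-#ifdef option, written as standalone C so it builds outside DPDK. Assumptions: the helper name madvise_dontdump() and the macro EXAMPLE_MADV_DONTDUMP are hypothetical, and the sketch tests for the advice flags themselves rather than DPDK's RTE_EXEC_ENV_LINUX / RTE_EXEC_ENV_FREEBSD. It picks MADV_DONTDUMP on Linux or MADV_NOCORE on FreeBSD and follows the "0 or -errno" contract from the proposed eal_private.h comment; since madvise() itself returns -1 and sets errno, the helper converts that to -errno.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc/musl: expose madvise() and the MADV_* extensions */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical macro: one #ifdef maps the OS-specific flag name. */
#if defined(MADV_DONTDUMP)
#define EXAMPLE_MADV_DONTDUMP MADV_DONTDUMP /* Linux spelling */
#elif defined(MADV_NOCORE)
#define EXAMPLE_MADV_DONTDUMP MADV_NOCORE   /* FreeBSD spelling */
#endif

/*
 * Hypothetical helper: exclude [addr, addr + len) from core dumps.
 * Returns 0 on success, -errno if madvise() fails, or -ENOTSUP if the
 * platform provides neither advice flag.
 */
static int
madvise_dontdump(void *addr, size_t len)
{
	/* madvise() requires a page-aligned start address. */
	assert((uintptr_t)addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);

#ifdef EXAMPLE_MADV_DONTDUMP
	if (madvise(addr, len, EXAMPLE_MADV_DONTDUMP) != 0)
		return -errno; /* convert madvise()'s -1/errno to -errno */
	return 0;
#else
	(void)addr;
	(void)len;
	return -ENOTSUP;
#endif
}
```

Both call sites in eal_get_virtual_area() could then share it, e.g. madvise_dontdump(aligned_addr, *size) in the aligned branch and madvise_dontdump(mapped_addr, map_sz) otherwise, logging a warning on a non-zero return as the v2 patch does. Per the MAP_FIXED test described above, nothing needs to be undone when the region is later remapped.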