From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by dpdk.org (Postfix) with ESMTP id DB8D11B8A4
 for ; Mon, 9 Apr 2018 20:02:14 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga006.fm.intel.com ([10.253.24.20])
 by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 09 Apr 2018 11:02:13 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.48,427,1517904000"; d="scan'208";a="218993859"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
 by fmsmga006.fm.intel.com with ESMTP; 09 Apr 2018 11:02:10 -0700
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45])
 by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w39I29AF031440;
 Mon, 9 Apr 2018 19:02:10 +0100
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1])
 by sivswdev01.ir.intel.com with ESMTP id w39I29Og028636;
 Mon, 9 Apr 2018 19:02:09 +0100
Received: (from aburakov@localhost)
 by sivswdev01.ir.intel.com with LOCAL id w39I29IW028632;
 Mon, 9 Apr 2018 19:02:09 +0100
From: Anatoly Burakov
To: dev@dpdk.org
Cc: keith.wiles@intel.com, jianfeng.tan@intel.com, andras.kovacs@ericsson.com,
 laszlo.vadkeri@ericsson.com, benjamin.walker@intel.com,
 bruce.richardson@intel.com, thomas@monjalon.net,
 konstantin.ananyev@intel.com, kuralamudhan.ramakrishnan@intel.com,
 louise.m.daly@intel.com, nelio.laranjeiro@6wind.com, yskoh@mellanox.com,
 pepperjo@japf.ch, jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
 olivier.matz@6wind.com, shreyansh.jain@nxp.com,
 gowrishankar.m@linux.vnet.ibm.com
Date: Mon, 9 Apr 2018 19:01:13 +0100
Message-Id: <18eb522a2e573fc88dee59ca34db73200d13af3e.1523296701.git.anatoly.burakov@intel.com>
X-Mailer: git-send-email 1.7.0.7
In-Reply-To:
References:
In-Reply-To:
References:
Subject: [dpdk-dev] [PATCH v5 70/70] eal: prevent preallocated pages from being freed
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Apr 2018 18:02:15 -0000

It is reasonable to expect a DPDK process not to deallocate any pages
that were preallocated with the "-m" or "--socket-mem" flags; currently,
however, the DPDK memory subsystem does exactly that once it finds that
those pages are unused.

Fix this by marking preallocated pages as unfreeable, and by preventing
malloc from ever trying to free them.

Signed-off-by: Anatoly Burakov
Tested-by: Santosh Shukla
Tested-by: Hemant Agrawal
---
 lib/librte_eal/common/include/rte_memory.h |  3 +++
 lib/librte_eal/common/malloc_heap.c        | 23 +++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c |  7 +++++++
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 18 +++++++++++++++---
 4 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index b085a8b..a18fe27 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -83,6 +83,8 @@ typedef uint64_t rte_iova_t;
 /**
  * Physical memory segment descriptor.
  */
+#define RTE_MEMSEG_FLAG_DO_NOT_FREE (1 << 0)
+/**< Prevent this segment from being freed back to the OS. */
 struct rte_memseg {
 	RTE_STD_C11
 	union {
@@ -99,6 +101,7 @@ struct rte_memseg {
 	int32_t socket_id;          /**< NUMA socket ID. */
 	uint32_t nchannel;          /**< Number of channels. */
 	uint32_t nrank;             /**< Number of ranks. */
+	uint32_t flags;             /**< Memseg-specific flags */
 } __rte_packed;
 
 /**
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index f8daf84..41c14a8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -606,6 +606,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	void *start, *aligned_start, *end, *aligned_end;
 	size_t len, aligned_len, page_sz;
 	struct rte_memseg_list *msl;
+	unsigned int i, n_segs;
 	int ret;
 
 	if (!malloc_elem_cookies_ok(elem) || elem->state != ELEM_BUSY)
@@ -647,6 +648,28 @@ malloc_heap_free(struct malloc_elem *elem)
 	if (aligned_len < page_sz)
 		goto free_unlock;
 
+	/* we can free something. however, some of these pages may be marked as
+	 * unfreeable, so also check that as well
+	 */
+	n_segs = aligned_len / page_sz;
+	for (i = 0; i < n_segs; i++) {
+		const struct rte_memseg *tmp =
+				rte_mem_virt2memseg(aligned_start, msl);
+
+		if (tmp->flags & RTE_MEMSEG_FLAG_DO_NOT_FREE) {
+			/* this is an unfreeable segment, so move start */
+			aligned_start = RTE_PTR_ADD(tmp->addr, tmp->len);
+		}
+	}
+
+	/* recalculate length and number of segments */
+	aligned_len = RTE_PTR_DIFF(aligned_end, aligned_start);
+	n_segs = aligned_len / page_sz;
+
+	/* check if we can still free some pages */
+	if (n_segs == 0)
+		goto free_unlock;
+
 	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
 
 	/*
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 93f80bb..7bbbf30 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -806,6 +806,13 @@ eal_memalloc_free_seg_bulk(struct rte_memseg **ms, int n_segs)
 		struct free_walk_param wa;
 		int i, walk_res;
 
+		/* if this page is marked as unfreeable, fail */
+		if (cur->flags & RTE_MEMSEG_FLAG_DO_NOT_FREE) {
+			RTE_LOG(DEBUG, EAL, "Page is not allowed to be freed\n");
+			ret = -1;
+			continue;
+		}
+
 		memset(&wa, 0, sizeof(wa));
 
 		for (i = 0; i < (int)RTE_DIM(internal_config.hugepage_info);
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 7ec7129..2bd9c30 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1637,21 +1637,33 @@ eal_hugepage_init(void)
 			hp_sz_idx++) {
 		for (socket_id = 0; socket_id < RTE_MAX_NUMA_NODES;
 				socket_id++) {
+			struct rte_memseg **pages;
 			struct hugepage_info *hpi = &used_hp[hp_sz_idx];
 			unsigned int num_pages = hpi->num_pages[socket_id];
-			int num_pages_alloc;
+			int num_pages_alloc, i;
 
 			if (num_pages == 0)
 				continue;
 
+			pages = malloc(sizeof(*pages) * num_pages);
+
 			RTE_LOG(DEBUG, EAL, "Allocating %u pages of size %" PRIu64 "M on socket %i\n",
 				num_pages, hpi->hugepage_sz >> 20, socket_id);
 
-			num_pages_alloc = eal_memalloc_alloc_seg_bulk(NULL,
+			num_pages_alloc = eal_memalloc_alloc_seg_bulk(pages,
 					num_pages, hpi->hugepage_sz,
 					socket_id, true);
-			if (num_pages_alloc < 0)
+			if (num_pages_alloc < 0) {
+				free(pages);
 				return -1;
+			}
+
+			/* mark preallocated pages as unfreeable */
+			for (i = 0; i < num_pages_alloc; i++) {
+				struct rte_memseg *ms = pages[i];
+				ms->flags |= RTE_MEMSEG_FLAG_DO_NOT_FREE;
+			}
+			free(pages);
 		}
 	}
 	return 0;
-- 
2.7.4