* [dpdk-dev] [PATCH 2/3] eal/bsdapp: concatenate adjacent segments
2018-06-11 16:13 [dpdk-dev] [PATCH 1/3] eal/bsdapp: fix segment index display Anatoly Burakov
@ 2018-06-11 16:13 ` Anatoly Burakov
2018-06-11 16:13 ` [dpdk-dev] [PATCH 3/3] eal: make memory segment preallocation OS-specific Anatoly Burakov
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Anatoly Burakov @ 2018-06-11 16:13 UTC (permalink / raw)
To: dev; +Cc: Bruce Richardson
Previously, the memory allocator always left holes between mapped
contigmem segments, even if they were IOVA-contiguous. Fix this by
remembering the end IOVA address and memseg index of the previous
segment, and checking against them when mapping new contigmem
segments.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
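As a minimal illustration of the adjacency check described above (the
helper name and simplified state below are hypothetical, not the exact
EAL code):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t rte_iova_t;

    /* 'prev_end' is 0 before any segment has been mapped; after mapping
     * a segment it is advanced to physaddr + hugepage_sz.
     */
    static bool
    is_iova_adjacent(rte_iova_t physaddr, rte_iova_t prev_end)
    {
            return prev_end != 0 && physaddr == prev_end;
    }

When the check succeeds, the new segment is placed in the memseg slot
directly after the previous one instead of leaving a hole between them.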
lib/librte_eal/bsdapp/eal/eal_memory.c | 48 ++++++++++++++++----------
1 file changed, 30 insertions(+), 18 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index ca06de2f8..21a390fac 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -104,6 +104,8 @@ rte_eal_hugepage_init(void)
/* map all hugepages and sort them */
for (i = 0; i < internal_config.num_hugepage_sizes; i ++){
struct hugepage_info *hpi;
+ rte_iova_t prev_end = 0;
+ int prev_ms_idx = -1;
uint64_t page_sz, mem_needed;
unsigned int n_pages, max_pages;
@@ -124,10 +126,27 @@ rte_eal_hugepage_init(void)
int error;
size_t sysctl_size = sizeof(physaddr);
char physaddr_str[64];
+ bool is_adjacent;
+
+ /* first, check if this segment is IOVA-adjacent to
+ * the previous one.
+ */
+ snprintf(physaddr_str, sizeof(physaddr_str),
+ "hw.contigmem.physaddr.%d", j);
+ error = sysctlbyname(physaddr_str, &physaddr,
+ &sysctl_size, NULL, 0);
+ if (error < 0) {
+ RTE_LOG(ERR, EAL, "Failed to get physical addr for buffer %u "
+ "from %s\n", j, hpi->hugedir);
+ return -1;
+ }
+
+ is_adjacent = prev_end != 0 && physaddr == prev_end;
+ prev_end = physaddr + hpi->hugepage_sz;
for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS;
msl_idx++) {
- bool empty;
+ bool empty, need_hole;
msl = &mcfg->memsegs[msl_idx];
arr = &msl->memseg_arr;
@@ -136,20 +155,23 @@ rte_eal_hugepage_init(void)
empty = arr->count == 0;
- /* we need 1, plus hole if not empty */
+ /* we need a hole if this isn't an empty memseg
+ * list, and if previous segment was not
+ * adjacent to current one.
+ */
+ need_hole = !empty && !is_adjacent;
+
+ /* we need 1, plus hole if not adjacent */
ms_idx = rte_fbarray_find_next_n_free(arr,
- 0, 1 + (empty ? 1 : 0));
+ 0, 1 + (need_hole ? 1 : 0));
/* memseg list is full? */
if (ms_idx < 0)
continue;
- /* leave some space between memsegs, they are
- * not IOVA contiguous, so they shouldn't be VA
- * contiguous either.
- */
- if (!empty)
+ if (need_hole && prev_ms_idx != ms_idx - 1)
ms_idx++;
+ prev_ms_idx = ms_idx;
break;
}
@@ -178,16 +200,6 @@ rte_eal_hugepage_init(void)
return -1;
}
- snprintf(physaddr_str, sizeof(physaddr_str), "hw.contigmem"
- ".physaddr.%d", j);
- error = sysctlbyname(physaddr_str, &physaddr, &sysctl_size,
- NULL, 0);
- if (error < 0) {
- RTE_LOG(ERR, EAL, "Failed to get physical addr for buffer %u "
- "from %s\n", j, hpi->hugedir);
- return -1;
- }
-
seg->addr = addr;
seg->iova = physaddr;
seg->hugepage_sz = page_sz;
--
2.17.1
* [dpdk-dev] [PATCH 3/3] eal: make memory segment preallocation OS-specific
2018-06-11 16:13 [dpdk-dev] [PATCH 1/3] eal/bsdapp: fix segment index display Anatoly Burakov
2018-06-11 16:13 ` [dpdk-dev] [PATCH 2/3] eal/bsdapp: concatenate adjacent segments Anatoly Burakov
@ 2018-06-11 16:13 ` Anatoly Burakov
2018-06-28 11:41 ` [dpdk-dev] [PATCH v2 1/3] eal/bsdapp: fix segment index display Anatoly Burakov
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Anatoly Burakov @ 2018-06-11 16:13 UTC (permalink / raw)
To: dev; +Cc: Bruce Richardson
In a perfect world, it wouldn't matter how much memory was
preallocated, because most of it would remain private anonymous
zero-page mappings for the duration of the program. However, in
practice, due to peculiarities of FreeBSD, we need to additionally
limit memory allocation there. This patch moves segment
preallocation into EAL private functions that are implemented by
each OS-specific EAL, rather than keeping it in the common
memory-related code.
Since there is no support for growing/shrinking memory use at
runtime on FreeBSD anyway, this does not inhibit any functionality,
but it makes core dumps faster even with default settings.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
For Linuxapp, this is a 99% code move (aside from slight changes due
to code deduplication between the Linuxapp EAL and the old common
memory code), while for FreeBSD it is mostly a code move, with
changes due to dropping the 32-bit code and implementing the
FreeBSD-specific limits on memory preallocation outlined in the
commit message.
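As a rough illustration of the FreeBSD-specific limit mentioned above:
with N contigmem buffers, in the worst case every buffer is
IOVA-discontiguous with the previous one, so one hole slot is left
between each pair and at most (N*2)-1 memseg slots are needed. A sketch
of that bound (the standalone helper and its name are hypothetical):

    #include <stdint.h>

    /* worst case for one hugepage size: num_pages segments plus
     * (num_pages - 1) hole slots between non-contiguous buffers.
     */
    static uint64_t
    bsd_worst_case_prealloc(uint64_t num_pages, uint64_t hugepage_sz)
    {
            uint64_t avail_segs = (num_pages * 2) - 1;

            return avail_segs * hugepage_sz;
    }

For example, 4 contigmem buffers of 1 GB each give at most 7 slots,
i.e. roughly 7 GB of reserved VA space instead of the full
RTE_MAX_MEM_MB worth.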
lib/librte_eal/bsdapp/eal/eal_memory.c | 215 ++++++++++++
lib/librte_eal/common/eal_common_memory.c | 386 +---------------------
lib/librte_eal/common/eal_private.h | 12 +
lib/librte_eal/linuxapp/eal/eal_memory.c | 341 +++++++++++++++++++
4 files changed, 569 insertions(+), 385 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 21a390fac..3dc427bd8 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -12,6 +12,7 @@
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_log.h>
#include <rte_string_fns.h>
#include "eal_private.h"
@@ -300,3 +301,217 @@ rte_eal_using_phys_addrs(void)
{
return 0;
}
+
+static uint64_t
+get_mem_amount(uint64_t page_sz, uint64_t max_mem)
+{
+ uint64_t area_sz, max_pages;
+
+ /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */
+ max_pages = RTE_MAX_MEMSEG_PER_LIST;
+ max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem);
+
+ area_sz = RTE_MIN(page_sz * max_pages, max_mem);
+
+ /* make sure the list isn't smaller than the page size */
+ area_sz = RTE_MAX(area_sz, page_sz);
+
+ return RTE_ALIGN(area_sz, page_sz);
+}
+
+#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i"
+static int
+alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz,
+ int n_segs, int socket_id, int type_msl_idx)
+{
+ char name[RTE_FBARRAY_NAME_LEN];
+
+ snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id,
+ type_msl_idx);
+ if (rte_fbarray_init(&msl->memseg_arr, name, n_segs,
+ sizeof(struct rte_memseg))) {
+ RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n",
+ rte_strerror(rte_errno));
+ return -1;
+ }
+
+ msl->page_sz = page_sz;
+ msl->socket_id = socket_id;
+ msl->base_va = NULL;
+
+ RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n",
+ (size_t)page_sz >> 10, socket_id);
+
+ return 0;
+}
+
+static int
+alloc_va_space(struct rte_memseg_list *msl)
+{
+ uint64_t page_sz;
+ size_t mem_sz;
+ void *addr;
+ int flags = 0;
+
+#ifdef RTE_ARCH_PPC_64
+ flags |= MAP_HUGETLB;
+#endif
+
+ page_sz = msl->page_sz;
+ mem_sz = page_sz * msl->memseg_arr.len;
+
+ addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags);
+ if (addr == NULL) {
+ if (rte_errno == EADDRNOTAVAIL)
+ RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - please use '--base-virtaddr' option\n",
+ (unsigned long long)mem_sz, msl->base_va);
+ else
+ RTE_LOG(ERR, EAL, "Cannot reserve memory\n");
+ return -1;
+ }
+ msl->base_va = addr;
+
+ return 0;
+}
+
+
+static int
+memseg_primary_init(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int hpi_idx, msl_idx = 0;
+ struct rte_memseg_list *msl;
+ uint64_t max_mem, total_mem;
+
+ /* no-huge does not need this at all */
+ if (internal_config.no_hugetlbfs)
+ return 0;
+
+ /* FreeBSD has an issue where core dump will dump the entire memory
+ * contents, including anonymous zero-page memory. Therefore, while we
+ * will be limiting total amount of memory to RTE_MAX_MEM_MB, we will
+ * also be further limiting total memory amount to whatever memory is
+ * available to us through contigmem driver (plus spacing blocks).
+ *
+ * so, at each stage, we will be checking how much memory we are
+ * preallocating, and adjust all the values accordingly.
+ */
+
+ max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
+ total_mem = 0;
+
+ /* create memseg lists */
+ for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
+ hpi_idx++) {
+ uint64_t max_type_mem, total_type_mem = 0;
+ uint64_t avail_mem;
+ int type_msl_idx, max_segs, avail_segs, total_segs = 0;
+ struct hugepage_info *hpi;
+ uint64_t hugepage_sz;
+
+ hpi = &internal_config.hugepage_info[hpi_idx];
+ hugepage_sz = hpi->hugepage_sz;
+
+ /* no NUMA support on FreeBSD */
+
+ /* check if we've already exceeded total memory amount */
+ if (total_mem >= max_mem)
+ break;
+
+ /* first, calculate theoretical limits according to config */
+ max_type_mem = RTE_MIN(max_mem - total_mem,
+ (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20);
+ max_segs = RTE_MAX_MEMSEG_PER_TYPE;
+
+ /* now, limit all of that to whatever will actually be
+ * available to us, because without dynamic allocation support,
+ * all of that extra memory will be sitting there being useless
+ * and slowing down core dumps in case of a crash.
+ *
+ * we need (N*2)-1 segments because we cannot guarantee that
+ * each segment will be IOVA-contiguous with the previous one,
+ * so we will allocate more and put spaces inbetween segments
+ * that are non-contiguous.
+ */
+ avail_segs = (hpi->num_pages[0] * 2) - 1;
+ avail_mem = avail_segs * hugepage_sz;
+
+ max_type_mem = RTE_MIN(avail_mem, max_type_mem);
+ max_segs = RTE_MIN(avail_segs, max_segs);
+
+ type_msl_idx = 0;
+ while (total_type_mem < max_type_mem &&
+ total_segs < max_segs) {
+ uint64_t cur_max_mem, cur_mem;
+ unsigned int n_segs;
+
+ if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+ RTE_LOG(ERR, EAL,
+ "No more space in memseg lists, please increase %s\n",
+ RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
+ return -1;
+ }
+
+ msl = &mcfg->memsegs[msl_idx++];
+
+ cur_max_mem = max_type_mem - total_type_mem;
+
+ cur_mem = get_mem_amount(hugepage_sz,
+ cur_max_mem);
+ n_segs = cur_mem / hugepage_sz;
+
+ if (alloc_memseg_list(msl, hugepage_sz, n_segs,
+ 0, type_msl_idx))
+ return -1;
+
+ total_segs += msl->memseg_arr.len;
+ total_type_mem = total_segs * hugepage_sz;
+ type_msl_idx++;
+
+ if (alloc_va_space(msl)) {
+ RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
+ return -1;
+ }
+ }
+ total_mem += total_type_mem;
+ }
+ return 0;
+}
+
+static int
+memseg_secondary_init(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int msl_idx = 0;
+ struct rte_memseg_list *msl;
+
+ for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) {
+
+ msl = &mcfg->memsegs[msl_idx];
+
+ /* skip empty memseg lists */
+ if (msl->memseg_arr.len == 0)
+ continue;
+
+ if (rte_fbarray_attach(&msl->memseg_arr)) {
+ RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n");
+ return -1;
+ }
+
+ /* preallocate VA space */
+ if (alloc_va_space(msl)) {
+ RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n");
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+int
+rte_eal_memseg_init(void)
+{
+ return rte_eal_process_type() == RTE_PROC_PRIMARY ?
+ memseg_primary_init() :
+ memseg_secondary_init();
+}
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 4f0688f9d..4b7389ed4 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -153,382 +153,6 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
return aligned_addr;
}
-static uint64_t
-get_mem_amount(uint64_t page_sz, uint64_t max_mem)
-{
- uint64_t area_sz, max_pages;
-
- /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */
- max_pages = RTE_MAX_MEMSEG_PER_LIST;
- max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem);
-
- area_sz = RTE_MIN(page_sz * max_pages, max_mem);
-
- /* make sure the list isn't smaller than the page size */
- area_sz = RTE_MAX(area_sz, page_sz);
-
- return RTE_ALIGN(area_sz, page_sz);
-}
-
-static int
-free_memseg_list(struct rte_memseg_list *msl)
-{
- if (rte_fbarray_destroy(&msl->memseg_arr)) {
- RTE_LOG(ERR, EAL, "Cannot destroy memseg list\n");
- return -1;
- }
- memset(msl, 0, sizeof(*msl));
- return 0;
-}
-
-static int
-alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz,
- uint64_t max_mem, int socket_id, int type_msl_idx)
-{
- char name[RTE_FBARRAY_NAME_LEN];
- uint64_t mem_amount;
- int max_segs;
-
- mem_amount = get_mem_amount(page_sz, max_mem);
- max_segs = mem_amount / page_sz;
-
- snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id,
- type_msl_idx);
- if (rte_fbarray_init(&msl->memseg_arr, name, max_segs,
- sizeof(struct rte_memseg))) {
- RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n",
- rte_strerror(rte_errno));
- return -1;
- }
-
- msl->page_sz = page_sz;
- msl->socket_id = socket_id;
- msl->base_va = NULL;
-
- RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n",
- (size_t)page_sz >> 10, socket_id);
-
- return 0;
-}
-
-static int
-alloc_va_space(struct rte_memseg_list *msl)
-{
- uint64_t page_sz;
- size_t mem_sz;
- void *addr;
- int flags = 0;
-
-#ifdef RTE_ARCH_PPC_64
- flags |= MAP_HUGETLB;
-#endif
-
- page_sz = msl->page_sz;
- mem_sz = page_sz * msl->memseg_arr.len;
-
- addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags);
- if (addr == NULL) {
- if (rte_errno == EADDRNOTAVAIL)
- RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - please use '--base-virtaddr' option\n",
- (unsigned long long)mem_sz, msl->base_va);
- else
- RTE_LOG(ERR, EAL, "Cannot reserve memory\n");
- return -1;
- }
- msl->base_va = addr;
-
- return 0;
-}
-
-static int __rte_unused
-memseg_primary_init_32(void)
-{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int active_sockets, hpi_idx, msl_idx = 0;
- unsigned int socket_id, i;
- struct rte_memseg_list *msl;
- uint64_t extra_mem_per_socket, total_extra_mem, total_requested_mem;
- uint64_t max_mem;
-
- /* no-huge does not need this at all */
- if (internal_config.no_hugetlbfs)
- return 0;
-
- /* this is a giant hack, but desperate times call for desperate
- * measures. in legacy 32-bit mode, we cannot preallocate VA space,
- * because having upwards of 2 gigabytes of VA space already mapped will
- * interfere with our ability to map and sort hugepages.
- *
- * therefore, in legacy 32-bit mode, we will be initializing memseg
- * lists much later - in eal_memory.c, right after we unmap all the
- * unneeded pages. this will not affect secondary processes, as those
- * should be able to mmap the space without (too many) problems.
- */
- if (internal_config.legacy_mem)
- return 0;
-
- /* 32-bit mode is a very special case. we cannot know in advance where
- * the user will want to allocate their memory, so we have to do some
- * heuristics.
- */
- active_sockets = 0;
- total_requested_mem = 0;
- if (internal_config.force_sockets)
- for (i = 0; i < rte_socket_count(); i++) {
- uint64_t mem;
-
- socket_id = rte_socket_id_by_idx(i);
- mem = internal_config.socket_mem[socket_id];
-
- if (mem == 0)
- continue;
-
- active_sockets++;
- total_requested_mem += mem;
- }
- else
- total_requested_mem = internal_config.memory;
-
- max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
- if (total_requested_mem > max_mem) {
- RTE_LOG(ERR, EAL, "Invalid parameters: 32-bit process can at most use %uM of memory\n",
- (unsigned int)(max_mem >> 20));
- return -1;
- }
- total_extra_mem = max_mem - total_requested_mem;
- extra_mem_per_socket = active_sockets == 0 ? total_extra_mem :
- total_extra_mem / active_sockets;
-
- /* the allocation logic is a little bit convoluted, but here's how it
- * works, in a nutshell:
- * - if user hasn't specified on which sockets to allocate memory via
- * --socket-mem, we allocate all of our memory on master core socket.
- * - if user has specified sockets to allocate memory on, there may be
- * some "unused" memory left (e.g. if user has specified --socket-mem
- * such that not all memory adds up to 2 gigabytes), so add it to all
- * sockets that are in use equally.
- *
- * page sizes are sorted by size in descending order, so we can safely
- * assume that we dispense with bigger page sizes first.
- */
-
- /* create memseg lists */
- for (i = 0; i < rte_socket_count(); i++) {
- int hp_sizes = (int) internal_config.num_hugepage_sizes;
- uint64_t max_socket_mem, cur_socket_mem;
- unsigned int master_lcore_socket;
- struct rte_config *cfg = rte_eal_get_configuration();
- bool skip;
-
- socket_id = rte_socket_id_by_idx(i);
-
-#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
- if (socket_id > 0)
- break;
-#endif
-
- /* if we didn't specifically request memory on this socket */
- skip = active_sockets != 0 &&
- internal_config.socket_mem[socket_id] == 0;
- /* ...or if we didn't specifically request memory on *any*
- * socket, and this is not master lcore
- */
- master_lcore_socket = rte_lcore_to_socket_id(cfg->master_lcore);
- skip |= active_sockets == 0 && socket_id != master_lcore_socket;
-
- if (skip) {
- RTE_LOG(DEBUG, EAL, "Will not preallocate memory on socket %u\n",
- socket_id);
- continue;
- }
-
- /* max amount of memory on this socket */
- max_socket_mem = (active_sockets != 0 ?
- internal_config.socket_mem[socket_id] :
- internal_config.memory) +
- extra_mem_per_socket;
- cur_socket_mem = 0;
-
- for (hpi_idx = 0; hpi_idx < hp_sizes; hpi_idx++) {
- uint64_t max_pagesz_mem, cur_pagesz_mem = 0;
- uint64_t hugepage_sz;
- struct hugepage_info *hpi;
- int type_msl_idx, max_segs, total_segs = 0;
-
- hpi = &internal_config.hugepage_info[hpi_idx];
- hugepage_sz = hpi->hugepage_sz;
-
- /* check if pages are actually available */
- if (hpi->num_pages[socket_id] == 0)
- continue;
-
- max_segs = RTE_MAX_MEMSEG_PER_TYPE;
- max_pagesz_mem = max_socket_mem - cur_socket_mem;
-
- /* make it multiple of page size */
- max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem,
- hugepage_sz);
-
- RTE_LOG(DEBUG, EAL, "Attempting to preallocate "
- "%" PRIu64 "M on socket %i\n",
- max_pagesz_mem >> 20, socket_id);
-
- type_msl_idx = 0;
- while (cur_pagesz_mem < max_pagesz_mem &&
- total_segs < max_segs) {
- if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
- RTE_LOG(ERR, EAL,
- "No more space in memseg lists, please increase %s\n",
- RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
- return -1;
- }
-
- msl = &mcfg->memsegs[msl_idx];
-
- if (alloc_memseg_list(msl, hugepage_sz,
- max_pagesz_mem, socket_id,
- type_msl_idx)) {
- /* failing to allocate a memseg list is
- * a serious error.
- */
- RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n");
- return -1;
- }
-
- if (alloc_va_space(msl)) {
- /* if we couldn't allocate VA space, we
- * can try with smaller page sizes.
- */
- RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list, retrying with different page size\n");
- /* deallocate memseg list */
- if (free_memseg_list(msl))
- return -1;
- break;
- }
-
- total_segs += msl->memseg_arr.len;
- cur_pagesz_mem = total_segs * hugepage_sz;
- type_msl_idx++;
- msl_idx++;
- }
- cur_socket_mem += cur_pagesz_mem;
- }
- if (cur_socket_mem == 0) {
- RTE_LOG(ERR, EAL, "Cannot allocate VA space on socket %u\n",
- socket_id);
- return -1;
- }
- }
-
- return 0;
-}
-
-static int __rte_unused
-memseg_primary_init(void)
-{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int i, socket_id, hpi_idx, msl_idx = 0;
- struct rte_memseg_list *msl;
- uint64_t max_mem, total_mem;
-
- /* no-huge does not need this at all */
- if (internal_config.no_hugetlbfs)
- return 0;
-
- max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
- total_mem = 0;
-
- /* create memseg lists */
- for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
- hpi_idx++) {
- struct hugepage_info *hpi;
- uint64_t hugepage_sz;
-
- hpi = &internal_config.hugepage_info[hpi_idx];
- hugepage_sz = hpi->hugepage_sz;
-
- for (i = 0; i < (int) rte_socket_count(); i++) {
- uint64_t max_type_mem, total_type_mem = 0;
- int type_msl_idx, max_segs, total_segs = 0;
-
- socket_id = rte_socket_id_by_idx(i);
-
-#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
- if (socket_id > 0)
- break;
-#endif
-
- if (total_mem >= max_mem)
- break;
-
- max_type_mem = RTE_MIN(max_mem - total_mem,
- (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20);
- max_segs = RTE_MAX_MEMSEG_PER_TYPE;
-
- type_msl_idx = 0;
- while (total_type_mem < max_type_mem &&
- total_segs < max_segs) {
- uint64_t cur_max_mem;
- if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
- RTE_LOG(ERR, EAL,
- "No more space in memseg lists, please increase %s\n",
- RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
- return -1;
- }
-
- msl = &mcfg->memsegs[msl_idx++];
-
- cur_max_mem = max_type_mem - total_type_mem;
- if (alloc_memseg_list(msl, hugepage_sz,
- cur_max_mem, socket_id,
- type_msl_idx))
- return -1;
-
- total_segs += msl->memseg_arr.len;
- total_type_mem = total_segs * hugepage_sz;
- type_msl_idx++;
-
- if (alloc_va_space(msl)) {
- RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
- return -1;
- }
- }
- total_mem += total_type_mem;
- }
- }
- return 0;
-}
-
-static int
-memseg_secondary_init(void)
-{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int msl_idx = 0;
- struct rte_memseg_list *msl;
-
- for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) {
-
- msl = &mcfg->memsegs[msl_idx];
-
- /* skip empty memseg lists */
- if (msl->memseg_arr.len == 0)
- continue;
-
- if (rte_fbarray_attach(&msl->memseg_arr)) {
- RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n");
- return -1;
- }
-
- /* preallocate VA space */
- if (alloc_va_space(msl)) {
- RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n");
- return -1;
- }
- }
-
- return 0;
-}
-
static struct rte_memseg *
virt2memseg(const void *addr, const struct rte_memseg_list *msl)
{
@@ -918,15 +542,7 @@ rte_eal_memory_init(void)
/* lock mem hotplug here, to prevent races while we init */
rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
- retval = rte_eal_process_type() == RTE_PROC_PRIMARY ?
-#ifndef RTE_ARCH_64
- memseg_primary_init_32() :
-#else
- memseg_primary_init() :
-#endif
- memseg_secondary_init();
-
- if (retval < 0)
+ if (rte_eal_memseg_init() < 0)
goto fail;
if (eal_memalloc_init() < 0)
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d50..b742f4c58 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -46,6 +46,18 @@ void eal_log_set_default(FILE *default_log);
*/
int rte_eal_cpu_init(void);
+/**
+ * Create memseg lists
+ *
+ * This function is private to EAL.
+ *
+ * Preallocate virtual memory.
+ *
+ * @return
+ * 0 on success, negative on error
+ */
+int rte_eal_memseg_init(void);
+
/**
* Map memory
*
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index c917de1c2..b8c8a59e0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -767,6 +767,34 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end)
return 0;
}
+static uint64_t
+get_mem_amount(uint64_t page_sz, uint64_t max_mem)
+{
+ uint64_t area_sz, max_pages;
+
+ /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */
+ max_pages = RTE_MAX_MEMSEG_PER_LIST;
+ max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem);
+
+ area_sz = RTE_MIN(page_sz * max_pages, max_mem);
+
+ /* make sure the list isn't smaller than the page size */
+ area_sz = RTE_MAX(area_sz, page_sz);
+
+ return RTE_ALIGN(area_sz, page_sz);
+}
+
+static int
+free_memseg_list(struct rte_memseg_list *msl)
+{
+ if (rte_fbarray_destroy(&msl->memseg_arr)) {
+ RTE_LOG(ERR, EAL, "Cannot destroy memseg list\n");
+ return -1;
+ }
+ memset(msl, 0, sizeof(*msl));
+ return 0;
+}
+
#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i"
static int
alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz,
@@ -1840,3 +1868,316 @@ rte_eal_using_phys_addrs(void)
{
return phys_addrs_available;
}
+
+static int __rte_unused
+memseg_primary_init_32(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int active_sockets, hpi_idx, msl_idx = 0;
+ unsigned int socket_id, i;
+ struct rte_memseg_list *msl;
+ uint64_t extra_mem_per_socket, total_extra_mem, total_requested_mem;
+ uint64_t max_mem;
+
+ /* no-huge does not need this at all */
+ if (internal_config.no_hugetlbfs)
+ return 0;
+
+ /* this is a giant hack, but desperate times call for desperate
+ * measures. in legacy 32-bit mode, we cannot preallocate VA space,
+ * because having upwards of 2 gigabytes of VA space already mapped will
+ * interfere with our ability to map and sort hugepages.
+ *
+ * therefore, in legacy 32-bit mode, we will be initializing memseg
+ * lists much later - in eal_memory.c, right after we unmap all the
+ * unneeded pages. this will not affect secondary processes, as those
+ * should be able to mmap the space without (too many) problems.
+ */
+ if (internal_config.legacy_mem)
+ return 0;
+
+ /* 32-bit mode is a very special case. we cannot know in advance where
+ * the user will want to allocate their memory, so we have to do some
+ * heuristics.
+ */
+ active_sockets = 0;
+ total_requested_mem = 0;
+ if (internal_config.force_sockets)
+ for (i = 0; i < rte_socket_count(); i++) {
+ uint64_t mem;
+
+ socket_id = rte_socket_id_by_idx(i);
+ mem = internal_config.socket_mem[socket_id];
+
+ if (mem == 0)
+ continue;
+
+ active_sockets++;
+ total_requested_mem += mem;
+ }
+ else
+ total_requested_mem = internal_config.memory;
+
+ max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
+ if (total_requested_mem > max_mem) {
+ RTE_LOG(ERR, EAL, "Invalid parameters: 32-bit process can at most use %uM of memory\n",
+ (unsigned int)(max_mem >> 20));
+ return -1;
+ }
+ total_extra_mem = max_mem - total_requested_mem;
+ extra_mem_per_socket = active_sockets == 0 ? total_extra_mem :
+ total_extra_mem / active_sockets;
+
+ /* the allocation logic is a little bit convoluted, but here's how it
+ * works, in a nutshell:
+ * - if user hasn't specified on which sockets to allocate memory via
+ * --socket-mem, we allocate all of our memory on master core socket.
+ * - if user has specified sockets to allocate memory on, there may be
+ * some "unused" memory left (e.g. if user has specified --socket-mem
+ * such that not all memory adds up to 2 gigabytes), so add it to all
+ * sockets that are in use equally.
+ *
+ * page sizes are sorted by size in descending order, so we can safely
+ * assume that we dispense with bigger page sizes first.
+ */
+
+ /* create memseg lists */
+ for (i = 0; i < rte_socket_count(); i++) {
+ int hp_sizes = (int) internal_config.num_hugepage_sizes;
+ uint64_t max_socket_mem, cur_socket_mem;
+ unsigned int master_lcore_socket;
+ struct rte_config *cfg = rte_eal_get_configuration();
+ bool skip;
+
+ socket_id = rte_socket_id_by_idx(i);
+
+#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
+ if (socket_id > 0)
+ break;
+#endif
+
+ /* if we didn't specifically request memory on this socket */
+ skip = active_sockets != 0 &&
+ internal_config.socket_mem[socket_id] == 0;
+ /* ...or if we didn't specifically request memory on *any*
+ * socket, and this is not master lcore
+ */
+ master_lcore_socket = rte_lcore_to_socket_id(cfg->master_lcore);
+ skip |= active_sockets == 0 && socket_id != master_lcore_socket;
+
+ if (skip) {
+ RTE_LOG(DEBUG, EAL, "Will not preallocate memory on socket %u\n",
+ socket_id);
+ continue;
+ }
+
+ /* max amount of memory on this socket */
+ max_socket_mem = (active_sockets != 0 ?
+ internal_config.socket_mem[socket_id] :
+ internal_config.memory) +
+ extra_mem_per_socket;
+ cur_socket_mem = 0;
+
+ for (hpi_idx = 0; hpi_idx < hp_sizes; hpi_idx++) {
+ uint64_t max_pagesz_mem, cur_pagesz_mem = 0;
+ uint64_t hugepage_sz;
+ struct hugepage_info *hpi;
+ int type_msl_idx, max_segs, total_segs = 0;
+
+ hpi = &internal_config.hugepage_info[hpi_idx];
+ hugepage_sz = hpi->hugepage_sz;
+
+ /* check if pages are actually available */
+ if (hpi->num_pages[socket_id] == 0)
+ continue;
+
+ max_segs = RTE_MAX_MEMSEG_PER_TYPE;
+ max_pagesz_mem = max_socket_mem - cur_socket_mem;
+
+ /* make it multiple of page size */
+ max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem,
+ hugepage_sz);
+
+ RTE_LOG(DEBUG, EAL, "Attempting to preallocate "
+ "%" PRIu64 "M on socket %i\n",
+ max_pagesz_mem >> 20, socket_id);
+
+ type_msl_idx = 0;
+ while (cur_pagesz_mem < max_pagesz_mem &&
+ total_segs < max_segs) {
+ uint64_t cur_mem;
+ unsigned int n_segs;
+
+ if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+ RTE_LOG(ERR, EAL,
+ "No more space in memseg lists, please increase %s\n",
+ RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
+ return -1;
+ }
+
+ msl = &mcfg->memsegs[msl_idx];
+
+ cur_mem = get_mem_amount(hugepage_sz,
+ max_pagesz_mem);
+ n_segs = cur_mem / hugepage_sz;
+
+ if (alloc_memseg_list(msl, hugepage_sz, n_segs,
+ socket_id, type_msl_idx)) {
+ /* failing to allocate a memseg list is
+ * a serious error.
+ */
+ RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n");
+ return -1;
+ }
+
+ if (alloc_va_space(msl)) {
+ /* if we couldn't allocate VA space, we
+ * can try with smaller page sizes.
+ */
+ RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list, retrying with different page size\n");
+ /* deallocate memseg list */
+ if (free_memseg_list(msl))
+ return -1;
+ break;
+ }
+
+ total_segs += msl->memseg_arr.len;
+ cur_pagesz_mem = total_segs * hugepage_sz;
+ type_msl_idx++;
+ msl_idx++;
+ }
+ cur_socket_mem += cur_pagesz_mem;
+ }
+ if (cur_socket_mem == 0) {
+ RTE_LOG(ERR, EAL, "Cannot allocate VA space on socket %u\n",
+ socket_id);
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static int __rte_unused
+memseg_primary_init(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int i, socket_id, hpi_idx, msl_idx = 0;
+ struct rte_memseg_list *msl;
+ uint64_t max_mem, total_mem;
+
+ /* no-huge does not need this at all */
+ if (internal_config.no_hugetlbfs)
+ return 0;
+
+ max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
+ total_mem = 0;
+
+ /* create memseg lists */
+ for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
+ hpi_idx++) {
+ struct hugepage_info *hpi;
+ uint64_t hugepage_sz;
+
+ hpi = &internal_config.hugepage_info[hpi_idx];
+ hugepage_sz = hpi->hugepage_sz;
+
+ for (i = 0; i < (int) rte_socket_count(); i++) {
+ uint64_t max_type_mem, total_type_mem = 0;
+ int type_msl_idx, max_segs, total_segs = 0;
+
+ socket_id = rte_socket_id_by_idx(i);
+
+#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
+ if (socket_id > 0)
+ break;
+#endif
+
+ if (total_mem >= max_mem)
+ break;
+
+ max_type_mem = RTE_MIN(max_mem - total_mem,
+ (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20);
+ max_segs = RTE_MAX_MEMSEG_PER_TYPE;
+
+ type_msl_idx = 0;
+ while (total_type_mem < max_type_mem &&
+ total_segs < max_segs) {
+ uint64_t cur_max_mem, cur_mem;
+ unsigned int n_segs;
+
+ if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+ RTE_LOG(ERR, EAL,
+ "No more space in memseg lists, please increase %s\n",
+ RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
+ return -1;
+ }
+
+ msl = &mcfg->memsegs[msl_idx++];
+
+ cur_max_mem = max_type_mem - total_type_mem;
+
+ cur_mem = get_mem_amount(hugepage_sz,
+ cur_max_mem);
+ n_segs = cur_mem / hugepage_sz;
+
+ if (alloc_memseg_list(msl, hugepage_sz, n_segs,
+ socket_id, type_msl_idx))
+ return -1;
+
+ total_segs += msl->memseg_arr.len;
+ total_type_mem = total_segs * hugepage_sz;
+ type_msl_idx++;
+
+ if (alloc_va_space(msl)) {
+ RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
+ return -1;
+ }
+ }
+ total_mem += total_type_mem;
+ }
+ }
+ return 0;
+}
+
+static int
+memseg_secondary_init(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int msl_idx = 0;
+ struct rte_memseg_list *msl;
+
+ for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) {
+
+ msl = &mcfg->memsegs[msl_idx];
+
+ /* skip empty memseg lists */
+ if (msl->memseg_arr.len == 0)
+ continue;
+
+ if (rte_fbarray_attach(&msl->memseg_arr)) {
+ RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n");
+ return -1;
+ }
+
+ /* preallocate VA space */
+ if (alloc_va_space(msl)) {
+ RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n");
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+int
+rte_eal_memseg_init(void)
+{
+ return rte_eal_process_type() == RTE_PROC_PRIMARY ?
+#ifndef RTE_ARCH_64
+ memseg_primary_init_32() :
+#else
+ memseg_primary_init() :
+#endif
+ memseg_secondary_init();
+}
--
2.17.1
* [dpdk-dev] [PATCH v2 1/3] eal/bsdapp: fix segment index display
2018-06-11 16:13 [dpdk-dev] [PATCH 1/3] eal/bsdapp: fix segment index display Anatoly Burakov
2018-06-11 16:13 ` [dpdk-dev] [PATCH 2/3] eal/bsdapp: concatenate adjacent segments Anatoly Burakov
2018-06-11 16:13 ` [dpdk-dev] [PATCH 3/3] eal: make memory segment preallocation OS-specific Anatoly Burakov
@ 2018-06-28 11:41 ` Anatoly Burakov
2018-06-28 11:41 ` [dpdk-dev] [PATCH v2 2/3] eal/bsdapp: concatenate adjacent segments Anatoly Burakov
2018-06-28 11:41 ` [dpdk-dev] [PATCH v2 3/3] eal: make memory segment preallocation OS-specific Anatoly Burakov
4 siblings, 0 replies; 7+ messages in thread
From: Anatoly Burakov @ 2018-06-28 11:41 UTC (permalink / raw)
To: dev; +Cc: Bruce Richardson, stable
The segment index was set to 0 at the start but was never
incremented. This has no consequences other than the displayed
number of segments allocated at initialization. Fix this by
incrementing the index after displaying it.
Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/bsdapp/eal/eal_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index a5e034789..ca06de2f8 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -200,7 +200,7 @@ rte_eal_hugepage_init(void)
RTE_LOG(INFO, EAL, "Mapped memory segment %u @ %p: physaddr:0x%"
PRIx64", len %zu\n",
- seg_idx, addr, physaddr, page_sz);
+ seg_idx++, addr, physaddr, page_sz);
total_mem += seg->len;
}
--
2.17.1
* [dpdk-dev] [PATCH v2 2/3] eal/bsdapp: concatenate adjacent segments
2018-06-11 16:13 [dpdk-dev] [PATCH 1/3] eal/bsdapp: fix segment index display Anatoly Burakov
` (2 preceding siblings ...)
2018-06-28 11:41 ` [dpdk-dev] [PATCH v2 1/3] eal/bsdapp: fix segment index display Anatoly Burakov
@ 2018-06-28 11:41 ` Anatoly Burakov
2018-06-28 11:41 ` [dpdk-dev] [PATCH v2 3/3] eal: make memory segment preallocation OS-specific Anatoly Burakov
4 siblings, 0 replies; 7+ messages in thread
From: Anatoly Burakov @ 2018-06-28 11:41 UTC (permalink / raw)
To: dev; +Cc: Bruce Richardson
Previously, the memory allocator always left holes between mapped
contigmem segments, even if they were IOVA-contiguous. Fix this by
remembering the end IOVA address and memseg index of the previous
segment, and checking against them when mapping new contigmem
segments.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
v2:
- Fix comparison when deciding if hole is needed
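The v2 change flips the comparison: a slot is skipped only when a hole
is needed and the previous segment actually occupies the slot directly
before the one just found. A minimal sketch of the corrected decision
(hypothetical helper mirroring the logic in the diff below):

    #include <stdbool.h>

    /* 'ms_idx' is the first free slot found, 'prev_ms_idx' the slot used
     * for the previous segment (-1 if none), and 'need_hole' is true when
     * the new segment is not IOVA-adjacent to the previous one.
     */
    static int
    pick_memseg_slot(int ms_idx, int prev_ms_idx, bool need_hole)
    {
            /* if the previous segment sits right before the found slot,
             * skip one slot to leave a hole; otherwise a gap already
             * exists and nothing needs to be skipped.
             */
            if (need_hole && prev_ms_idx == ms_idx - 1)
                    ms_idx++;
            return ms_idx;
    }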
lib/librte_eal/bsdapp/eal/eal_memory.c | 48 ++++++++++++++++----------
1 file changed, 30 insertions(+), 18 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index ca06de2f8..7bf05c760 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -104,6 +104,8 @@ rte_eal_hugepage_init(void)
/* map all hugepages and sort them */
for (i = 0; i < internal_config.num_hugepage_sizes; i ++){
struct hugepage_info *hpi;
+ rte_iova_t prev_end = 0;
+ int prev_ms_idx = -1;
uint64_t page_sz, mem_needed;
unsigned int n_pages, max_pages;
@@ -124,10 +126,27 @@ rte_eal_hugepage_init(void)
int error;
size_t sysctl_size = sizeof(physaddr);
char physaddr_str[64];
+ bool is_adjacent;
+
+ /* first, check if this segment is IOVA-adjacent to
+ * the previous one.
+ */
+ snprintf(physaddr_str, sizeof(physaddr_str),
+ "hw.contigmem.physaddr.%d", j);
+ error = sysctlbyname(physaddr_str, &physaddr,
+ &sysctl_size, NULL, 0);
+ if (error < 0) {
+ RTE_LOG(ERR, EAL, "Failed to get physical addr for buffer %u "
+ "from %s\n", j, hpi->hugedir);
+ return -1;
+ }
+
+ is_adjacent = prev_end != 0 && physaddr == prev_end;
+ prev_end = physaddr + hpi->hugepage_sz;
for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS;
msl_idx++) {
- bool empty;
+ bool empty, need_hole;
msl = &mcfg->memsegs[msl_idx];
arr = &msl->memseg_arr;
@@ -136,20 +155,23 @@ rte_eal_hugepage_init(void)
empty = arr->count == 0;
- /* we need 1, plus hole if not empty */
+ /* we need a hole if this isn't an empty memseg
+ * list, and if previous segment was not
+ * adjacent to current one.
+ */
+ need_hole = !empty && !is_adjacent;
+
+ /* we need 1, plus hole if not adjacent */
ms_idx = rte_fbarray_find_next_n_free(arr,
- 0, 1 + (empty ? 1 : 0));
+ 0, 1 + (need_hole ? 1 : 0));
/* memseg list is full? */
if (ms_idx < 0)
continue;
- /* leave some space between memsegs, they are
- * not IOVA contiguous, so they shouldn't be VA
- * contiguous either.
- */
- if (!empty)
+ if (need_hole && prev_ms_idx == ms_idx - 1)
ms_idx++;
+ prev_ms_idx = ms_idx;
break;
}
@@ -178,16 +200,6 @@ rte_eal_hugepage_init(void)
return -1;
}
- snprintf(physaddr_str, sizeof(physaddr_str), "hw.contigmem"
- ".physaddr.%d", j);
- error = sysctlbyname(physaddr_str, &physaddr, &sysctl_size,
- NULL, 0);
- if (error < 0) {
- RTE_LOG(ERR, EAL, "Failed to get physical addr for buffer %u "
- "from %s\n", j, hpi->hugedir);
- return -1;
- }
-
seg->addr = addr;
seg->iova = physaddr;
seg->hugepage_sz = page_sz;
--
2.17.1
* [dpdk-dev] [PATCH v2 3/3] eal: make memory segment preallocation OS-specific
2018-06-11 16:13 [dpdk-dev] [PATCH 1/3] eal/bsdapp: fix segment index display Anatoly Burakov
` (3 preceding siblings ...)
2018-06-28 11:41 ` [dpdk-dev] [PATCH v2 2/3] eal/bsdapp: concatenate adjacent segments Anatoly Burakov
@ 2018-06-28 11:41 ` Anatoly Burakov
2018-07-12 23:00 ` Thomas Monjalon
4 siblings, 1 reply; 7+ messages in thread
From: Anatoly Burakov @ 2018-06-28 11:41 UTC (permalink / raw)
To: dev; +Cc: Bruce Richardson
In a perfect world, it wouldn't matter how much memory was
preallocated, because most of it would remain private anonymous
zero-page mappings for the duration of the program. However, in
practice, due to peculiarities of FreeBSD, we need to additionally
limit memory allocation there. This patch moves segment
preallocation into EAL private functions that are implemented by
each OS-specific EAL, rather than keeping it in the common
memory-related code.
Since there is no support for growing/shrinking memory use at
runtime on FreeBSD anyway, this does not inhibit any functionality,
but it makes core dumps faster even with default settings.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
For Linuxapp, this is a 99% code move (aside from slight changes due
to code deduplication between the Linuxapp EAL and the old common
memory code), while for FreeBSD it is mostly a code move, with
changes due to dropping the 32-bit code and implementing the
FreeBSD-specific limits on memory preallocation outlined in the
commit message.
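For orientation, the list-sizing helper moved by this patch clamps each
memseg list to the smaller of a page-count cap and a byte cap, and
never below one page. A worked instance is below; the 8192-page and
32 GB caps are illustrative assumptions, not necessarily the build-time
defaults:

    #include <stdint.h>

    static uint64_t
    example_list_size(void)
    {
            const uint64_t page_sz   = 1ULL << 30;  /* 1 GB pages */
            const uint64_t max_pages = 8192;        /* per-list page cap (assumed) */
            const uint64_t max_bytes = 32ULL << 30; /* per-list byte cap (assumed) */
            uint64_t area_sz = page_sz * max_pages; /* 8 TB before clamping */

            if (area_sz > max_bytes)
                    area_sz = max_bytes;            /* clamp to 32 GB */
            if (area_sz < page_sz)
                    area_sz = page_sz;              /* never below one page */
            return area_sz;                         /* 32 one-gigabyte segments */
    }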
lib/librte_eal/bsdapp/eal/eal_memory.c | 215 ++++++++++++
lib/librte_eal/common/eal_common_memory.c | 386 +---------------------
lib/librte_eal/common/eal_private.h | 12 +
lib/librte_eal/linuxapp/eal/eal_memory.c | 341 +++++++++++++++++++
4 files changed, 569 insertions(+), 385 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 7bf05c760..16d2bc7c3 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -12,6 +12,7 @@
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_log.h>
#include <rte_string_fns.h>
#include "eal_private.h"
@@ -300,3 +301,217 @@ rte_eal_using_phys_addrs(void)
{
return 0;
}
+
+static uint64_t
+get_mem_amount(uint64_t page_sz, uint64_t max_mem)
+{
+ uint64_t area_sz, max_pages;
+
+ /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */
+ max_pages = RTE_MAX_MEMSEG_PER_LIST;
+ max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem);
+
+ area_sz = RTE_MIN(page_sz * max_pages, max_mem);
+
+ /* make sure the list isn't smaller than the page size */
+ area_sz = RTE_MAX(area_sz, page_sz);
+
+ return RTE_ALIGN(area_sz, page_sz);
+}
+
+#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i"
+static int
+alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz,
+ int n_segs, int socket_id, int type_msl_idx)
+{
+ char name[RTE_FBARRAY_NAME_LEN];
+
+ snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id,
+ type_msl_idx);
+ if (rte_fbarray_init(&msl->memseg_arr, name, n_segs,
+ sizeof(struct rte_memseg))) {
+ RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n",
+ rte_strerror(rte_errno));
+ return -1;
+ }
+
+ msl->page_sz = page_sz;
+ msl->socket_id = socket_id;
+ msl->base_va = NULL;
+
+ RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n",
+ (size_t)page_sz >> 10, socket_id);
+
+ return 0;
+}
+
+static int
+alloc_va_space(struct rte_memseg_list *msl)
+{
+ uint64_t page_sz;
+ size_t mem_sz;
+ void *addr;
+ int flags = 0;
+
+#ifdef RTE_ARCH_PPC_64
+ flags |= MAP_HUGETLB;
+#endif
+
+ page_sz = msl->page_sz;
+ mem_sz = page_sz * msl->memseg_arr.len;
+
+ addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags);
+ if (addr == NULL) {
+ if (rte_errno == EADDRNOTAVAIL)
+ RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - please use '--base-virtaddr' option\n",
+ (unsigned long long)mem_sz, msl->base_va);
+ else
+ RTE_LOG(ERR, EAL, "Cannot reserve memory\n");
+ return -1;
+ }
+ msl->base_va = addr;
+
+ return 0;
+}
+
+
+static int
+memseg_primary_init(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int hpi_idx, msl_idx = 0;
+ struct rte_memseg_list *msl;
+ uint64_t max_mem, total_mem;
+
+ /* no-huge does not need this at all */
+ if (internal_config.no_hugetlbfs)
+ return 0;
+
+ /* FreeBSD has an issue where core dump will dump the entire memory
+ * contents, including anonymous zero-page memory. Therefore, while we
+ * will be limiting total amount of memory to RTE_MAX_MEM_MB, we will
+ * also be further limiting total memory amount to whatever memory is
+ * available to us through contigmem driver (plus spacing blocks).
+ *
+ * so, at each stage, we will be checking how much memory we are
+ * preallocating, and adjust all the values accordingly.
+ */
+
+ max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
+ total_mem = 0;
+
+ /* create memseg lists */
+ for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
+ hpi_idx++) {
+ uint64_t max_type_mem, total_type_mem = 0;
+ uint64_t avail_mem;
+ int type_msl_idx, max_segs, avail_segs, total_segs = 0;
+ struct hugepage_info *hpi;
+ uint64_t hugepage_sz;
+
+ hpi = &internal_config.hugepage_info[hpi_idx];
+ hugepage_sz = hpi->hugepage_sz;
+
+ /* no NUMA support on FreeBSD */
+
+ /* check if we've already exceeded total memory amount */
+ if (total_mem >= max_mem)
+ break;
+
+ /* first, calculate theoretical limits according to config */
+ max_type_mem = RTE_MIN(max_mem - total_mem,
+ (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20);
+ max_segs = RTE_MAX_MEMSEG_PER_TYPE;
+
+ /* now, limit all of that to whatever will actually be
+ * available to us, because without dynamic allocation support,
+ * all of that extra memory will be sitting there being useless
+ * and slowing down core dumps in case of a crash.
+ *
+ * we need (N*2)-1 segments because we cannot guarantee that
+ * each segment will be IOVA-contiguous with the previous one,
+ * so we will allocate more and put spaces inbetween segments
+ * that are non-contiguous.
+ */
+ avail_segs = (hpi->num_pages[0] * 2) - 1;
+ avail_mem = avail_segs * hugepage_sz;
+
+ max_type_mem = RTE_MIN(avail_mem, max_type_mem);
+ max_segs = RTE_MIN(avail_segs, max_segs);
+
+ type_msl_idx = 0;
+ while (total_type_mem < max_type_mem &&
+ total_segs < max_segs) {
+ uint64_t cur_max_mem, cur_mem;
+ unsigned int n_segs;
+
+ if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+ RTE_LOG(ERR, EAL,
+ "No more space in memseg lists, please increase %s\n",
+ RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
+ return -1;
+ }
+
+ msl = &mcfg->memsegs[msl_idx++];
+
+ cur_max_mem = max_type_mem - total_type_mem;
+
+ cur_mem = get_mem_amount(hugepage_sz,
+ cur_max_mem);
+ n_segs = cur_mem / hugepage_sz;
+
+ if (alloc_memseg_list(msl, hugepage_sz, n_segs,
+ 0, type_msl_idx))
+ return -1;
+
+ total_segs += msl->memseg_arr.len;
+ total_type_mem = total_segs * hugepage_sz;
+ type_msl_idx++;
+
+ if (alloc_va_space(msl)) {
+ RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
+ return -1;
+ }
+ }
+ total_mem += total_type_mem;
+ }
+ return 0;
+}
+
+static int
+memseg_secondary_init(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int msl_idx = 0;
+ struct rte_memseg_list *msl;
+
+ for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) {
+
+ msl = &mcfg->memsegs[msl_idx];
+
+ /* skip empty memseg lists */
+ if (msl->memseg_arr.len == 0)
+ continue;
+
+ if (rte_fbarray_attach(&msl->memseg_arr)) {
+ RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n");
+ return -1;
+ }
+
+ /* preallocate VA space */
+ if (alloc_va_space(msl)) {
+ RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n");
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+int
+rte_eal_memseg_init(void)
+{
+ return rte_eal_process_type() == RTE_PROC_PRIMARY ?
+ memseg_primary_init() :
+ memseg_secondary_init();
+}
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 4f0688f9d..4b7389ed4 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -153,382 +153,6 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
return aligned_addr;
}
-static uint64_t
-get_mem_amount(uint64_t page_sz, uint64_t max_mem)
-{
- uint64_t area_sz, max_pages;
-
- /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */
- max_pages = RTE_MAX_MEMSEG_PER_LIST;
- max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem);
-
- area_sz = RTE_MIN(page_sz * max_pages, max_mem);
-
- /* make sure the list isn't smaller than the page size */
- area_sz = RTE_MAX(area_sz, page_sz);
-
- return RTE_ALIGN(area_sz, page_sz);
-}
-
-static int
-free_memseg_list(struct rte_memseg_list *msl)
-{
- if (rte_fbarray_destroy(&msl->memseg_arr)) {
- RTE_LOG(ERR, EAL, "Cannot destroy memseg list\n");
- return -1;
- }
- memset(msl, 0, sizeof(*msl));
- return 0;
-}
-
-static int
-alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz,
- uint64_t max_mem, int socket_id, int type_msl_idx)
-{
- char name[RTE_FBARRAY_NAME_LEN];
- uint64_t mem_amount;
- int max_segs;
-
- mem_amount = get_mem_amount(page_sz, max_mem);
- max_segs = mem_amount / page_sz;
-
- snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id,
- type_msl_idx);
- if (rte_fbarray_init(&msl->memseg_arr, name, max_segs,
- sizeof(struct rte_memseg))) {
- RTE_LOG(ERR, EAL, "Cannot allocate memseg list: %s\n",
- rte_strerror(rte_errno));
- return -1;
- }
-
- msl->page_sz = page_sz;
- msl->socket_id = socket_id;
- msl->base_va = NULL;
-
- RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n",
- (size_t)page_sz >> 10, socket_id);
-
- return 0;
-}
-
-static int
-alloc_va_space(struct rte_memseg_list *msl)
-{
- uint64_t page_sz;
- size_t mem_sz;
- void *addr;
- int flags = 0;
-
-#ifdef RTE_ARCH_PPC_64
- flags |= MAP_HUGETLB;
-#endif
-
- page_sz = msl->page_sz;
- mem_sz = page_sz * msl->memseg_arr.len;
-
- addr = eal_get_virtual_area(msl->base_va, &mem_sz, page_sz, 0, flags);
- if (addr == NULL) {
- if (rte_errno == EADDRNOTAVAIL)
- RTE_LOG(ERR, EAL, "Could not mmap %llu bytes at [%p] - please use '--base-virtaddr' option\n",
- (unsigned long long)mem_sz, msl->base_va);
- else
- RTE_LOG(ERR, EAL, "Cannot reserve memory\n");
- return -1;
- }
- msl->base_va = addr;
-
- return 0;
-}
-
-static int __rte_unused
-memseg_primary_init_32(void)
-{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int active_sockets, hpi_idx, msl_idx = 0;
- unsigned int socket_id, i;
- struct rte_memseg_list *msl;
- uint64_t extra_mem_per_socket, total_extra_mem, total_requested_mem;
- uint64_t max_mem;
-
- /* no-huge does not need this at all */
- if (internal_config.no_hugetlbfs)
- return 0;
-
- /* this is a giant hack, but desperate times call for desperate
- * measures. in legacy 32-bit mode, we cannot preallocate VA space,
- * because having upwards of 2 gigabytes of VA space already mapped will
- * interfere with our ability to map and sort hugepages.
- *
- * therefore, in legacy 32-bit mode, we will be initializing memseg
- * lists much later - in eal_memory.c, right after we unmap all the
- * unneeded pages. this will not affect secondary processes, as those
- * should be able to mmap the space without (too many) problems.
- */
- if (internal_config.legacy_mem)
- return 0;
-
- /* 32-bit mode is a very special case. we cannot know in advance where
- * the user will want to allocate their memory, so we have to do some
- * heuristics.
- */
- active_sockets = 0;
- total_requested_mem = 0;
- if (internal_config.force_sockets)
- for (i = 0; i < rte_socket_count(); i++) {
- uint64_t mem;
-
- socket_id = rte_socket_id_by_idx(i);
- mem = internal_config.socket_mem[socket_id];
-
- if (mem == 0)
- continue;
-
- active_sockets++;
- total_requested_mem += mem;
- }
- else
- total_requested_mem = internal_config.memory;
-
- max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
- if (total_requested_mem > max_mem) {
- RTE_LOG(ERR, EAL, "Invalid parameters: 32-bit process can at most use %uM of memory\n",
- (unsigned int)(max_mem >> 20));
- return -1;
- }
- total_extra_mem = max_mem - total_requested_mem;
- extra_mem_per_socket = active_sockets == 0 ? total_extra_mem :
- total_extra_mem / active_sockets;
-
- /* the allocation logic is a little bit convoluted, but here's how it
- * works, in a nutshell:
- * - if user hasn't specified on which sockets to allocate memory via
- * --socket-mem, we allocate all of our memory on master core socket.
- * - if user has specified sockets to allocate memory on, there may be
- * some "unused" memory left (e.g. if user has specified --socket-mem
- * such that not all memory adds up to 2 gigabytes), so add it to all
- * sockets that are in use equally.
- *
- * page sizes are sorted by size in descending order, so we can safely
- * assume that we dispense with bigger page sizes first.
- */
-
- /* create memseg lists */
- for (i = 0; i < rte_socket_count(); i++) {
- int hp_sizes = (int) internal_config.num_hugepage_sizes;
- uint64_t max_socket_mem, cur_socket_mem;
- unsigned int master_lcore_socket;
- struct rte_config *cfg = rte_eal_get_configuration();
- bool skip;
-
- socket_id = rte_socket_id_by_idx(i);
-
-#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
- if (socket_id > 0)
- break;
-#endif
-
- /* if we didn't specifically request memory on this socket */
- skip = active_sockets != 0 &&
- internal_config.socket_mem[socket_id] == 0;
- /* ...or if we didn't specifically request memory on *any*
- * socket, and this is not master lcore
- */
- master_lcore_socket = rte_lcore_to_socket_id(cfg->master_lcore);
- skip |= active_sockets == 0 && socket_id != master_lcore_socket;
-
- if (skip) {
- RTE_LOG(DEBUG, EAL, "Will not preallocate memory on socket %u\n",
- socket_id);
- continue;
- }
-
- /* max amount of memory on this socket */
- max_socket_mem = (active_sockets != 0 ?
- internal_config.socket_mem[socket_id] :
- internal_config.memory) +
- extra_mem_per_socket;
- cur_socket_mem = 0;
-
- for (hpi_idx = 0; hpi_idx < hp_sizes; hpi_idx++) {
- uint64_t max_pagesz_mem, cur_pagesz_mem = 0;
- uint64_t hugepage_sz;
- struct hugepage_info *hpi;
- int type_msl_idx, max_segs, total_segs = 0;
-
- hpi = &internal_config.hugepage_info[hpi_idx];
- hugepage_sz = hpi->hugepage_sz;
-
- /* check if pages are actually available */
- if (hpi->num_pages[socket_id] == 0)
- continue;
-
- max_segs = RTE_MAX_MEMSEG_PER_TYPE;
- max_pagesz_mem = max_socket_mem - cur_socket_mem;
-
- /* make it multiple of page size */
- max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem,
- hugepage_sz);
-
- RTE_LOG(DEBUG, EAL, "Attempting to preallocate "
- "%" PRIu64 "M on socket %i\n",
- max_pagesz_mem >> 20, socket_id);
-
- type_msl_idx = 0;
- while (cur_pagesz_mem < max_pagesz_mem &&
- total_segs < max_segs) {
- if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
- RTE_LOG(ERR, EAL,
- "No more space in memseg lists, please increase %s\n",
- RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
- return -1;
- }
-
- msl = &mcfg->memsegs[msl_idx];
-
- if (alloc_memseg_list(msl, hugepage_sz,
- max_pagesz_mem, socket_id,
- type_msl_idx)) {
- /* failing to allocate a memseg list is
- * a serious error.
- */
- RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n");
- return -1;
- }
-
- if (alloc_va_space(msl)) {
- /* if we couldn't allocate VA space, we
- * can try with smaller page sizes.
- */
- RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list, retrying with different page size\n");
- /* deallocate memseg list */
- if (free_memseg_list(msl))
- return -1;
- break;
- }
-
- total_segs += msl->memseg_arr.len;
- cur_pagesz_mem = total_segs * hugepage_sz;
- type_msl_idx++;
- msl_idx++;
- }
- cur_socket_mem += cur_pagesz_mem;
- }
- if (cur_socket_mem == 0) {
- RTE_LOG(ERR, EAL, "Cannot allocate VA space on socket %u\n",
- socket_id);
- return -1;
- }
- }
-
- return 0;
-}
-
-static int __rte_unused
-memseg_primary_init(void)
-{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int i, socket_id, hpi_idx, msl_idx = 0;
- struct rte_memseg_list *msl;
- uint64_t max_mem, total_mem;
-
- /* no-huge does not need this at all */
- if (internal_config.no_hugetlbfs)
- return 0;
-
- max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
- total_mem = 0;
-
- /* create memseg lists */
- for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
- hpi_idx++) {
- struct hugepage_info *hpi;
- uint64_t hugepage_sz;
-
- hpi = &internal_config.hugepage_info[hpi_idx];
- hugepage_sz = hpi->hugepage_sz;
-
- for (i = 0; i < (int) rte_socket_count(); i++) {
- uint64_t max_type_mem, total_type_mem = 0;
- int type_msl_idx, max_segs, total_segs = 0;
-
- socket_id = rte_socket_id_by_idx(i);
-
-#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
- if (socket_id > 0)
- break;
-#endif
-
- if (total_mem >= max_mem)
- break;
-
- max_type_mem = RTE_MIN(max_mem - total_mem,
- (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20);
- max_segs = RTE_MAX_MEMSEG_PER_TYPE;
-
- type_msl_idx = 0;
- while (total_type_mem < max_type_mem &&
- total_segs < max_segs) {
- uint64_t cur_max_mem;
- if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
- RTE_LOG(ERR, EAL,
- "No more space in memseg lists, please increase %s\n",
- RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
- return -1;
- }
-
- msl = &mcfg->memsegs[msl_idx++];
-
- cur_max_mem = max_type_mem - total_type_mem;
- if (alloc_memseg_list(msl, hugepage_sz,
- cur_max_mem, socket_id,
- type_msl_idx))
- return -1;
-
- total_segs += msl->memseg_arr.len;
- total_type_mem = total_segs * hugepage_sz;
- type_msl_idx++;
-
- if (alloc_va_space(msl)) {
- RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
- return -1;
- }
- }
- total_mem += total_type_mem;
- }
- }
- return 0;
-}
-
-static int
-memseg_secondary_init(void)
-{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int msl_idx = 0;
- struct rte_memseg_list *msl;
-
- for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) {
-
- msl = &mcfg->memsegs[msl_idx];
-
- /* skip empty memseg lists */
- if (msl->memseg_arr.len == 0)
- continue;
-
- if (rte_fbarray_attach(&msl->memseg_arr)) {
- RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n");
- return -1;
- }
-
- /* preallocate VA space */
- if (alloc_va_space(msl)) {
- RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n");
- return -1;
- }
- }
-
- return 0;
-}
-
static struct rte_memseg *
virt2memseg(const void *addr, const struct rte_memseg_list *msl)
{
@@ -918,15 +542,7 @@ rte_eal_memory_init(void)
/* lock mem hotplug here, to prevent races while we init */
rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
- retval = rte_eal_process_type() == RTE_PROC_PRIMARY ?
-#ifndef RTE_ARCH_64
- memseg_primary_init_32() :
-#else
- memseg_primary_init() :
-#endif
- memseg_secondary_init();
-
- if (retval < 0)
+ if (rte_eal_memseg_init() < 0)
goto fail;
if (eal_memalloc_init() < 0)
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d50..b742f4c58 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -46,6 +46,18 @@ void eal_log_set_default(FILE *default_log);
*/
int rte_eal_cpu_init(void);
+/**
+ * Create memseg lists
+ *
+ * This function is private to EAL.
+ *
+ * Preallocate virtual memory.
+ *
+ * @return
+ * 0 on success, negative on error
+ */
+int rte_eal_memseg_init(void);
+
/**
* Map memory
*
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index c917de1c2..b8c8a59e0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -767,6 +767,34 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end)
return 0;
}
+static uint64_t
+get_mem_amount(uint64_t page_sz, uint64_t max_mem)
+{
+ uint64_t area_sz, max_pages;
+
+ /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */
+ max_pages = RTE_MAX_MEMSEG_PER_LIST;
+ max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem);
+
+ area_sz = RTE_MIN(page_sz * max_pages, max_mem);
+
+ /* make sure the list isn't smaller than the page size */
+ area_sz = RTE_MAX(area_sz, page_sz);
+
+ return RTE_ALIGN(area_sz, page_sz);
+}
+
+static int
+free_memseg_list(struct rte_memseg_list *msl)
+{
+ if (rte_fbarray_destroy(&msl->memseg_arr)) {
+ RTE_LOG(ERR, EAL, "Cannot destroy memseg list\n");
+ return -1;
+ }
+ memset(msl, 0, sizeof(*msl));
+ return 0;
+}
+
#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i"
static int
alloc_memseg_list(struct rte_memseg_list *msl, uint64_t page_sz,
@@ -1840,3 +1868,316 @@ rte_eal_using_phys_addrs(void)
{
return phys_addrs_available;
}
+
+static int __rte_unused
+memseg_primary_init_32(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int active_sockets, hpi_idx, msl_idx = 0;
+ unsigned int socket_id, i;
+ struct rte_memseg_list *msl;
+ uint64_t extra_mem_per_socket, total_extra_mem, total_requested_mem;
+ uint64_t max_mem;
+
+ /* no-huge does not need this at all */
+ if (internal_config.no_hugetlbfs)
+ return 0;
+
+ /* this is a giant hack, but desperate times call for desperate
+ * measures. in legacy 32-bit mode, we cannot preallocate VA space,
+ * because having upwards of 2 gigabytes of VA space already mapped will
+ * interfere with our ability to map and sort hugepages.
+ *
+ * therefore, in legacy 32-bit mode, we will be initializing memseg
+ * lists much later - in eal_memory.c, right after we unmap all the
+ * unneeded pages. this will not affect secondary processes, as those
+ * should be able to mmap the space without (too many) problems.
+ */
+ if (internal_config.legacy_mem)
+ return 0;
+
+ /* 32-bit mode is a very special case. we cannot know in advance where
+ * the user will want to allocate their memory, so we have to do some
+ * heuristics.
+ */
+ active_sockets = 0;
+ total_requested_mem = 0;
+ if (internal_config.force_sockets)
+ for (i = 0; i < rte_socket_count(); i++) {
+ uint64_t mem;
+
+ socket_id = rte_socket_id_by_idx(i);
+ mem = internal_config.socket_mem[socket_id];
+
+ if (mem == 0)
+ continue;
+
+ active_sockets++;
+ total_requested_mem += mem;
+ }
+ else
+ total_requested_mem = internal_config.memory;
+
+ max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
+ if (total_requested_mem > max_mem) {
+ RTE_LOG(ERR, EAL, "Invalid parameters: 32-bit process can at most use %uM of memory\n",
+ (unsigned int)(max_mem >> 20));
+ return -1;
+ }
+ total_extra_mem = max_mem - total_requested_mem;
+ extra_mem_per_socket = active_sockets == 0 ? total_extra_mem :
+ total_extra_mem / active_sockets;
+
+ /* the allocation logic is a little bit convoluted, but here's how it
+ * works, in a nutshell:
+ * - if user hasn't specified on which sockets to allocate memory via
+ * --socket-mem, we allocate all of our memory on master core socket.
+ * - if user has specified sockets to allocate memory on, there may be
+ * some "unused" memory left (e.g. if user has specified --socket-mem
+ * such that not all memory adds up to 2 gigabytes), so add it to all
+ * sockets that are in use equally.
+ *
+ * page sizes are sorted by size in descending order, so we can safely
+ * assume that we dispense with bigger page sizes first.
+ */
+
+ /* create memseg lists */
+ for (i = 0; i < rte_socket_count(); i++) {
+ int hp_sizes = (int) internal_config.num_hugepage_sizes;
+ uint64_t max_socket_mem, cur_socket_mem;
+ unsigned int master_lcore_socket;
+ struct rte_config *cfg = rte_eal_get_configuration();
+ bool skip;
+
+ socket_id = rte_socket_id_by_idx(i);
+
+#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
+ if (socket_id > 0)
+ break;
+#endif
+
+ /* if we didn't specifically request memory on this socket */
+ skip = active_sockets != 0 &&
+ internal_config.socket_mem[socket_id] == 0;
+ /* ...or if we didn't specifically request memory on *any*
+ * socket, and this is not master lcore
+ */
+ master_lcore_socket = rte_lcore_to_socket_id(cfg->master_lcore);
+ skip |= active_sockets == 0 && socket_id != master_lcore_socket;
+
+ if (skip) {
+ RTE_LOG(DEBUG, EAL, "Will not preallocate memory on socket %u\n",
+ socket_id);
+ continue;
+ }
+
+ /* max amount of memory on this socket */
+ max_socket_mem = (active_sockets != 0 ?
+ internal_config.socket_mem[socket_id] :
+ internal_config.memory) +
+ extra_mem_per_socket;
+ cur_socket_mem = 0;
+
+ for (hpi_idx = 0; hpi_idx < hp_sizes; hpi_idx++) {
+ uint64_t max_pagesz_mem, cur_pagesz_mem = 0;
+ uint64_t hugepage_sz;
+ struct hugepage_info *hpi;
+ int type_msl_idx, max_segs, total_segs = 0;
+
+ hpi = &internal_config.hugepage_info[hpi_idx];
+ hugepage_sz = hpi->hugepage_sz;
+
+ /* check if pages are actually available */
+ if (hpi->num_pages[socket_id] == 0)
+ continue;
+
+ max_segs = RTE_MAX_MEMSEG_PER_TYPE;
+ max_pagesz_mem = max_socket_mem - cur_socket_mem;
+
+ /* make it multiple of page size */
+ max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem,
+ hugepage_sz);
+
+ RTE_LOG(DEBUG, EAL, "Attempting to preallocate "
+ "%" PRIu64 "M on socket %i\n",
+ max_pagesz_mem >> 20, socket_id);
+
+ type_msl_idx = 0;
+ while (cur_pagesz_mem < max_pagesz_mem &&
+ total_segs < max_segs) {
+ uint64_t cur_mem;
+ unsigned int n_segs;
+
+ if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+ RTE_LOG(ERR, EAL,
+ "No more space in memseg lists, please increase %s\n",
+ RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
+ return -1;
+ }
+
+ msl = &mcfg->memsegs[msl_idx];
+
+ cur_mem = get_mem_amount(hugepage_sz,
+ max_pagesz_mem);
+ n_segs = cur_mem / hugepage_sz;
+
+ if (alloc_memseg_list(msl, hugepage_sz, n_segs,
+ socket_id, type_msl_idx)) {
+ /* failing to allocate a memseg list is
+ * a serious error.
+ */
+ RTE_LOG(ERR, EAL, "Cannot allocate memseg list\n");
+ return -1;
+ }
+
+ if (alloc_va_space(msl)) {
+ /* if we couldn't allocate VA space, we
+ * can try with smaller page sizes.
+ */
+ RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list, retrying with different page size\n");
+ /* deallocate memseg list */
+ if (free_memseg_list(msl))
+ return -1;
+ break;
+ }
+
+ total_segs += msl->memseg_arr.len;
+ cur_pagesz_mem = total_segs * hugepage_sz;
+ type_msl_idx++;
+ msl_idx++;
+ }
+ cur_socket_mem += cur_pagesz_mem;
+ }
+ if (cur_socket_mem == 0) {
+ RTE_LOG(ERR, EAL, "Cannot allocate VA space on socket %u\n",
+ socket_id);
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static int __rte_unused
+memseg_primary_init(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int i, socket_id, hpi_idx, msl_idx = 0;
+ struct rte_memseg_list *msl;
+ uint64_t max_mem, total_mem;
+
+ /* no-huge does not need this at all */
+ if (internal_config.no_hugetlbfs)
+ return 0;
+
+ max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
+ total_mem = 0;
+
+ /* create memseg lists */
+ for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
+ hpi_idx++) {
+ struct hugepage_info *hpi;
+ uint64_t hugepage_sz;
+
+ hpi = &internal_config.hugepage_info[hpi_idx];
+ hugepage_sz = hpi->hugepage_sz;
+
+ for (i = 0; i < (int) rte_socket_count(); i++) {
+ uint64_t max_type_mem, total_type_mem = 0;
+ int type_msl_idx, max_segs, total_segs = 0;
+
+ socket_id = rte_socket_id_by_idx(i);
+
+#ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
+ if (socket_id > 0)
+ break;
+#endif
+
+ if (total_mem >= max_mem)
+ break;
+
+ max_type_mem = RTE_MIN(max_mem - total_mem,
+ (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20);
+ max_segs = RTE_MAX_MEMSEG_PER_TYPE;
+
+ type_msl_idx = 0;
+ while (total_type_mem < max_type_mem &&
+ total_segs < max_segs) {
+ uint64_t cur_max_mem, cur_mem;
+ unsigned int n_segs;
+
+ if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+ RTE_LOG(ERR, EAL,
+ "No more space in memseg lists, please increase %s\n",
+ RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
+ return -1;
+ }
+
+ msl = &mcfg->memsegs[msl_idx++];
+
+ cur_max_mem = max_type_mem - total_type_mem;
+
+ cur_mem = get_mem_amount(hugepage_sz,
+ cur_max_mem);
+ n_segs = cur_mem / hugepage_sz;
+
+ if (alloc_memseg_list(msl, hugepage_sz, n_segs,
+ socket_id, type_msl_idx))
+ return -1;
+
+ total_segs += msl->memseg_arr.len;
+ total_type_mem = total_segs * hugepage_sz;
+ type_msl_idx++;
+
+ if (alloc_va_space(msl)) {
+ RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
+ return -1;
+ }
+ }
+ total_mem += total_type_mem;
+ }
+ }
+ return 0;
+}
+
+static int
+memseg_secondary_init(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int msl_idx = 0;
+ struct rte_memseg_list *msl;
+
+ for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) {
+
+ msl = &mcfg->memsegs[msl_idx];
+
+ /* skip empty memseg lists */
+ if (msl->memseg_arr.len == 0)
+ continue;
+
+ if (rte_fbarray_attach(&msl->memseg_arr)) {
+ RTE_LOG(ERR, EAL, "Cannot attach to primary process memseg lists\n");
+ return -1;
+ }
+
+ /* preallocate VA space */
+ if (alloc_va_space(msl)) {
+ RTE_LOG(ERR, EAL, "Cannot preallocate VA space for hugepage memory\n");
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+int
+rte_eal_memseg_init(void)
+{
+ return rte_eal_process_type() == RTE_PROC_PRIMARY ?
+#ifndef RTE_ARCH_64
+ memseg_primary_init_32() :
+#else
+ memseg_primary_init() :
+#endif
+ memseg_secondary_init();
+}
--
2.17.1
^ permalink raw reply [flat|nested] 7+ messages in thread
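To make the sizing logic in get_mem_amount() above easier to follow, here is a
minimal standalone sketch of the same clamp with worked numbers. The mem_amount
name and the two MAX_* constants below are illustrative placeholders, not the
real CONFIG_RTE_MAX_MEMSEG_PER_LIST / CONFIG_RTE_MAX_MEM_MB_PER_LIST values of
a DPDK build.

/* standalone sketch of the per-memseg-list sizing clamp; constants are
 * placeholders, not DPDK configuration values
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_MEMSEG_PER_LIST 8192ULL          /* cap on pages per list */
#define MAX_MEM_MB_PER_LIST (32ULL * 1024)   /* cap on bytes per list, in MB */

static uint64_t
mem_amount(uint64_t page_sz, uint64_t max_mem)
{
	uint64_t area_sz;

	/* never exceed the caller's remaining budget or the per-list byte cap */
	if (max_mem > (MAX_MEM_MB_PER_LIST << 20))
		max_mem = MAX_MEM_MB_PER_LIST << 20;

	/* also respect the per-list segment-count cap */
	area_sz = page_sz * MAX_MEMSEG_PER_LIST;
	if (area_sz > max_mem)
		area_sz = max_mem;

	/* but always leave room for at least one page... */
	if (area_sz < page_sz)
		area_sz = page_sz;

	/* ...and round up to a whole number of pages */
	return (area_sz + page_sz - 1) / page_sz * page_sz;
}

int
main(void)
{
	/* 2 MB pages, 1 GB budget: the 8192-page cap (16 GB) does not bite,
	 * so the list is sized to the budget -> prints 1024 MB
	 */
	printf("%llu MB\n",
	       (unsigned long long)(mem_amount(2ULL << 20, 1ULL << 30) >> 20));

	/* 1 GB pages, 512 MB budget: the budget is smaller than one page,
	 * so the list still reserves a single 1 GB page -> prints 1024 MB
	 */
	printf("%llu MB\n",
	       (unsigned long long)(mem_amount(1ULL << 30, 512ULL << 20) >> 20));
	return 0;
}

The second case shows why the final page-size floor matters: a budget smaller
than one page still reserves a whole page, so a 1 GB hugepage list never ends
up with zero segments.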
* Re: [dpdk-dev] [PATCH v2 3/3] eal: make memory segment preallocation OS-specific
2018-06-28 11:41 ` [dpdk-dev] [PATCH v2 3/3] eal: make memory segment preallocation OS-specific Anatoly Burakov
@ 2018-07-12 23:00 ` Thomas Monjalon
0 siblings, 0 replies; 7+ messages in thread
From: Thomas Monjalon @ 2018-07-12 23:00 UTC (permalink / raw)
To: Anatoly Burakov; +Cc: dev, Bruce Richardson
28/06/2018 13:41, Anatoly Burakov:
> In a perfect world, it wouldn't matter how much memory was
> preallocated, because most of it would remain private anonymous
> zero-page mappings for the duration of the program. In practice,
> however, peculiarities of FreeBSD mean memory allocation must be
> additionally limited there. This patch therefore moves segment
> preallocation out of the common memory code and into EAL-private
> functions implemented by each OS-specific EAL.
>
> Since there is no support for growing or shrinking memory use at
> runtime on FreeBSD anyway, this does not inhibit any functionality,
> and it makes core dumps faster even with default settings.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
> For Linuxapp, this is almost entirely a code move (aside from slight
> changes due to code deduplication between the Linuxapp EAL and the old
> common memory code), while for FreeBSD it is mostly a code move, with
> changes from dropping the 32-bit code and implementing the
> FreeBSD-specific limits on memory preallocation outlined in the commit
> message.
Applied, thanks
^ permalink raw reply [flat|nested] 7+ messages in thread
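As a closing illustration of the design that was applied: after this series,
the common EAL calls a single private hook, rte_eal_memseg_init(), and each
OS-specific EAL decides behind it how much memory to preallocate. The sketch
below is a simplified stand-in; process_type, primary_init and secondary_init
are placeholder names rather than DPDK symbols, and only the primary/secondary
dispatch shape is taken from the patch itself.

/* simplified stand-in for the per-OS memseg init split; stub names only */
#include <stdio.h>

enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* stand-in for rte_eal_process_type() */
static enum proc_type
process_type(void)
{
	return PROC_PRIMARY;
}

/* primary: create memseg lists and preallocate VA space, with limits
 * chosen by the OS-specific EAL (Linux 32/64-bit heuristics, or the
 * stricter FreeBSD cap described in the commit message)
 */
static int
primary_init(void)
{
	puts("primary: preallocate memseg lists (OS-specific policy)");
	return 0;
}

/* secondary: attach to the lists the primary already created */
static int
secondary_init(void)
{
	puts("secondary: attach to primary's memseg lists");
	return 0;
}

/* shape of the private hook declared in eal_private.h */
static int
memseg_init(void)
{
	return process_type() == PROC_PRIMARY ?
			primary_init() : secondary_init();
}

int
main(void)
{
	return memseg_init();
}

The benefit of the split is that FreeBSD's stricter preallocation cap can live
entirely inside its own primary-init path, without the common memory code
needing to know anything about it.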