DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
@ 2018-07-06 13:17 Anatoly Burakov
  2018-07-06 13:17 ` [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external Anatoly Burakov
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index malloc heaps by NUMA node index, rather than NUMA node itself
- Add identifier string to malloc heap, to uniquely identify it
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from named heaps

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The following limitations are present:

- No multiprocess support
- No thread safety

There is currently no way to allocate memory during initialization
stage, so even if multiprocess support is added, it is not guaranteed
to work because of underlying issues with mapping fbarrays in
secondary processes. This is not an issue in single process scenario,
but it may be an issue in a multiprocess scenario in case where
primary doesn't intend to share the externally allocated memory, yet
adding such memory could fail because some other process failed to
attach to this shared memory when it wasn't needed.

Anatoly Burakov (11):
  mem: allow memseg lists to be marked as external
  eal: add function to rerieve socket index by socket ID
  malloc: index heaps using heap ID rather than NUMA node
  malloc: add name to malloc heaps
  malloc: enable retrieving statistics from named heaps
  malloc: enable allocating from named heaps
  malloc: enable creating new malloc heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow destroying heaps
  memzone: enable reserving memory from named heaps

 config/common_base                            |   1 +
 lib/librte_eal/common/eal_common_lcore.c      |  15 +
 lib/librte_eal/common/eal_common_memory.c     |  51 +++-
 lib/librte_eal/common/eal_common_memzone.c    | 283 ++++++++++++++----
 .../common/include/rte_eal_memconfig.h        |   5 +-
 lib/librte_eal/common/include/rte_lcore.h     |  19 +-
 lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
 .../common/include/rte_malloc_heap.h          |   2 +
 lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
 lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  26 ++
 lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
 lib/librte_eal/rte_eal_version.map            |  10 +
 13 files changed, 1118 insertions(+), 109 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-10 11:18   ` Alejandro Lucero
  2018-07-06 13:17 ` [dpdk-dev] [RFC 02/11] eal: add function to rerieve socket index by socket ID Anatoly Burakov
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_memory.c     | 51 ++++++++++++++++---
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/malloc_heap.c           |  2 +-
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 4f0688f9d..835bbffb6 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -24,6 +24,21 @@
 #include "eal_private.h"
 #include "eal_internal_cfg.h"
 
+/* forward declarations for memseg walk functions. we support external segments,
+ * but for some functionality to work, we need to either skip or not skip
+ * external segments. for example, while we expect for virt2memseg to return a
+ * valid memseg even though it's an external memseg, for regular memseg walk we
+ * want to skip those because the expectation is that we will only walk the
+ * DPDK allocated memory.
+ */
+static int
+memseg_list_walk(rte_memseg_list_walk_t func, void *arg, bool skip_external);
+static int
+memseg_walk(rte_memseg_walk_t func, void *arg, bool skip_external);
+static int
+memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg,
+		bool skip_external);
+
 /*
  * Try to mmap *size bytes in /dev/zero. If it is successful, return the
  * pointer to the mmap'd area and keep *size unmodified. Else, retry
@@ -621,9 +636,9 @@ rte_mem_iova2virt(rte_iova_t iova)
 	 * as we know they are PA-contiguous as well
 	 */
 	if (internal_config.legacy_mem)
-		rte_memseg_contig_walk(find_virt_legacy, &vi);
+		memseg_contig_walk(find_virt_legacy, &vi, false);
 	else
-		rte_memseg_walk(find_virt, &vi);
+		memseg_walk(find_virt, &vi, false);
 
 	return vi.virt;
 }
@@ -787,8 +802,8 @@ rte_mem_lock_page(const void *virt)
 	return mlock((void *)aligned, page_size);
 }
 
-int __rte_experimental
-rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg)
+static int
+memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg, bool skip_external)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	int i, ms_idx, ret = 0;
@@ -803,6 +818,8 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg)
 
 		if (msl->memseg_arr.count == 0)
 			continue;
+		if (skip_external && msl->external)
+			continue;
 
 		arr = &msl->memseg_arr;
 
@@ -837,7 +854,13 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg)
 }
 
 int __rte_experimental
-rte_memseg_walk(rte_memseg_walk_t func, void *arg)
+rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg)
+{
+	return memseg_contig_walk(func, arg, true);
+}
+
+static int
+memseg_walk(rte_memseg_walk_t func, void *arg, bool skip_external)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	int i, ms_idx, ret = 0;
@@ -852,6 +875,8 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg)
 
 		if (msl->memseg_arr.count == 0)
 			continue;
+		if (skip_external && msl->external)
+			continue;
 
 		arr = &msl->memseg_arr;
 
@@ -875,7 +900,13 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg)
 }
 
 int __rte_experimental
-rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
+rte_memseg_walk(rte_memseg_walk_t func, void *arg)
+{
+	return memseg_walk(func, arg, true);
+}
+
+static int
+memseg_list_walk(rte_memseg_list_walk_t func, void *arg, bool skip_external)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	int i, ret = 0;
@@ -888,6 +919,8 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
 
 		if (msl->base_va == NULL)
 			continue;
+		if (skip_external && msl->external)
+			continue;
 
 		ret = func(msl, arg);
 		if (ret < 0) {
@@ -904,6 +937,12 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
 	return ret;
 }
 
+int __rte_experimental
+rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
+{
+	return memseg_list_walk(func, arg, true);
+}
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..4e8720ba6 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	bool external; /**< true if this list points to external memory */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index d6cf3af81..8a1f54905 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -631,7 +631,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	ret = 0;
 
 	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 02/11] eal: add function to rerieve socket index by socket ID
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
  2018-07-06 13:17 ` [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-10 13:03   ` Alejandro Lucero
  2018-07-06 13:17 ` [dpdk-dev] [RFC 03/11] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

We are preparing to switch to index heap based on heap indexes
rather than by NUMA nodes. First few indexes will be equal to
NUMA node ID indexes. For example, currently on a machine with
NUMA nodes [0, 8], heaps 0 and 8 will be active, while we want
to make it so that heaps 0 and 1 are active. However, currently
we don't have a function to map a specific NUMA node to a node
index, so add it in this patch.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_lcore.c  | 15 +++++++++++++++
 lib/librte_eal/common/include/rte_lcore.h | 19 ++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map        |  1 +
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 3167e9d79..579f5a0a1 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -132,3 +132,18 @@ rte_socket_id_by_idx(unsigned int idx)
 	}
 	return config->numa_nodes[idx];
 }
+
+int __rte_experimental
+rte_socket_idx_by_id(unsigned int socket)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	int i;
+
+	for (i = 0; i < (int) config->numa_node_count; i++) {
+		unsigned int cur_socket = config->numa_nodes[i];
+		if (cur_socket == socket)
+			return i;
+	}
+	rte_errno = EINVAL;
+	return -1;
+}
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index 6e09d9181..f58cda09a 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -156,11 +156,28 @@ rte_socket_count(void);
  *
  * @return
  *   - physical socket id as recognized by EAL
- *   - -1 on error, with errno set to EINVAL
+ *   - -1 on error, with rte_errno set to EINVAL
  */
 int __rte_experimental
 rte_socket_id_by_idx(unsigned int idx);
 
+/**
+ * Return index for a particular socket id.
+ *
+ * This will return position in list of all detected physical socket id's for a
+ * given socket. For example, on a machine with sockets [0, 8], passing
+ * 8 as a parameter will return 1.
+ *
+ * @param socket
+ *   physical socket id to return index for
+ *
+ * @return
+ *   - index of physical socket id as recognized by EAL
+ *   - -1 on error, with rte_errno set to EINVAL
+ */
+int __rte_experimental
+rte_socket_idx_by_id(unsigned int socket);
+
 /**
  * Get the ID of the physical socket of the specified lcore
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f7dd0e7bc..e7fb37b2a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -296,6 +296,7 @@ EXPERIMENTAL {
 	rte_mp_sendmsg;
 	rte_socket_count;
 	rte_socket_id_by_idx;
+	rte_socket_idx_by_id;
 	rte_vfio_dma_map;
 	rte_vfio_dma_unmap;
 	rte_vfio_get_container_fd;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 03/11] malloc: index heaps using heap ID rather than NUMA node
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
  2018-07-06 13:17 ` [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external Anatoly Burakov
  2018-07-06 13:17 ` [dpdk-dev] [RFC 02/11] eal: add function to rerieve socket index by socket ID Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-13 16:05   ` Alejandro Lucero
  2018-07-06 13:17 ` [dpdk-dev] [RFC 04/11] malloc: add name to malloc heaps Anatoly Burakov
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, srinath.mannam, scott.branden, ajit.khaparde

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_base                            |  1 +
 lib/librte_eal/common/eal_common_memzone.c    | 46 ++++++++++------
 .../common/include/rte_eal_memconfig.h        |  4 +-
 lib/librte_eal/common/malloc_heap.c           | 53 ++++++++++++-------
 lib/librte_eal/common/rte_malloc.c            | 28 ++++++----
 5 files changed, 84 insertions(+), 48 deletions(-)

diff --git a/config/common_base b/config/common_base
index fcf3a1f6f..b0e3937e0 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index faa3b0615..25c56052c 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -52,6 +52,26 @@ memzone_lookup_thread_unsafe(const char *name)
 	return NULL;
 }
 
+static size_t
+heap_max_free_elem(unsigned int heap_idx, unsigned int align)
+{
+	struct rte_malloc_socket_stats stats;
+	struct rte_mem_config *mcfg;
+	size_t len;
+
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
+
+	malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx], &stats);
+
+	len = stats.greatest_free_size;
+
+	if (len < MALLOC_ELEM_OVERHEAD + align)
+		return 0;
+
+	return len - MALLOC_ELEM_OVERHEAD - align;
+}
+
 
 /* This function will return the greatest free block if a heap has been
  * specified. If no heap has been specified, it will return the heap and
@@ -59,29 +79,23 @@ memzone_lookup_thread_unsafe(const char *name)
 static size_t
 find_heap_max_free_elem(int *s, unsigned align)
 {
-	struct rte_mem_config *mcfg;
-	struct rte_malloc_socket_stats stats;
-	int i, socket = *s;
+	unsigned int idx;
+	int socket = *s;
 	size_t len = 0;
 
-	/* get pointer to global configuration */
-	mcfg = rte_eal_get_configuration()->mem_config;
-
-	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
-		if ((socket != SOCKET_ID_ANY) && (socket != i))
+	for (idx = 0; idx < rte_socket_count(); idx++) {
+		int cur_socket = rte_socket_id_by_idx(idx);
+		if ((socket != SOCKET_ID_ANY) && (socket != cur_socket))
 			continue;
 
-		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
-		if (stats.greatest_free_size > len) {
-			len = stats.greatest_free_size;
-			*s = i;
+		size_t cur_len = heap_max_free_elem(idx, align);
+		if (cur_len > len) {
+			len = cur_len;
+			*s = cur_socket;
 		}
 	}
 
-	if (len < MALLOC_ELEM_OVERHEAD + align)
-		return 0;
-
-	return len - MALLOC_ELEM_OVERHEAD - align;
+	return len;
 }
 
 static const struct rte_memzone *
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 4e8720ba6..7e03196a6 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -71,8 +71,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 8a1f54905..e7e1838b1 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -93,9 +93,10 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = rte_socket_idx_by_id(msl->socket_id);
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -494,14 +495,20 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
+	/* return NULL if size is 0 or alignment is not power-of-2 */
+	if (size == 0 || (align && !rte_is_power_of_2(align)))
+		return NULL;
+
 	rte_spinlock_lock(&(heap->lock));
 
 	align = align == 0 ? 1 : align;
@@ -521,8 +528,13 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -538,13 +550,9 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
-	/* return NULL if size is 0 or alignment is not power-of-2 */
-	if (size == 0 || (align && !rte_is_power_of_2(align)))
-		return NULL;
-
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
@@ -557,18 +565,23 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (socket >= RTE_MAX_NUMA_NODES)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	/* turn socket ID into heap ID */
+	heap_id = rte_socket_idx_by_id(socket);
+	/* if heap id is invalid, it's a non-existent NUMA node */
+	if (heap_id < 0)
+		return NULL;
+
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
 	/* try other heaps */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -788,7 +801,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -826,7 +839,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..4387bc494 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,17 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx;
 
 	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
 		return -1;
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = rte_socket_idx_by_id(socket);
+	if (heap_idx < 0)
+		return -1;
+
+	return malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
 }
 
 /*
@@ -168,10 +174,9 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
 }
@@ -182,14 +187,17 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 04/11] malloc: add name to malloc heaps
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (2 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 03/11] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-13 16:09   ` Alejandro Lucero
  2018-07-06 13:17 ` [dpdk-dev] [RFC 05/11] malloc: enable retrieving statistics from named heaps Anatoly Burakov
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 13 +++++++++++++
 lib/librte_eal/common/rte_malloc.c              |  1 +
 3 files changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..bd64dff03 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -27,6 +28,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index e7e1838b1..8f22c062b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -848,6 +848,7 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 
 	rte_spinlock_lock(&heap->lock);
 
+	fprintf(f, "Heap name: %s\n", heap->name);
 	fprintf(f, "Heap size: 0x%zx\n", heap->total_size);
 	fprintf(f, "Heap alloc count: %u\n", heap->alloc_count);
 
@@ -864,6 +865,18 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	/* assign names to default DPDK heaps */
+	for (i = 0; i < rte_socket_count(); i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+		char heap_name[RTE_HEAP_NAME_MAX_LEN];
+		int socket_id = rte_socket_id_by_idx(i);
+
+		snprintf(heap_name, sizeof(heap_name) - 1,
+				"socket_%i", socket_id);
+		strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	}
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 4387bc494..75d6e0b4d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -198,6 +198,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 05/11] malloc: enable retrieving statistics from named heaps
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (3 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 04/11] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-13 16:25   ` Alejandro Lucero
  2018-07-06 13:17 ` [dpdk-dev] [RFC 06/11] malloc: enable allocating " Anatoly Burakov
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

Add internal functions to look up heap by name, and enable
dumping statistics for a specified named heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 19 +++++++++++--
 lib/librte_eal/common/malloc_heap.c        | 31 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  6 +++++
 lib/librte_eal/common/rte_malloc.c         | 17 ++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..7cbcd3184 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -256,13 +256,28 @@ rte_malloc_validate(const void *ptr, size_t *size);
  * @param socket_stats
  *   A structure which provides memory to store statistics
  * @return
- *   Null on error
- *   Pointer to structure storing statistics on success
+ *   0 on success
+ *   -1 on error
  */
 int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Get heap statistics for the specified heap.
+ *
+ * @param socket
+ *   An unsigned integer specifying the socket to get heap statistics for
+ * @param socket_stats
+ *   A structure which provides memory to store statistics
+ * @return
+ *   0 on success
+ *   -1 on error
+ */
+int __rte_experimental
+rte_malloc_get_stats_from_heap(const char *heap_name,
+		struct rte_malloc_socket_stats *socket_stats);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 8f22c062b..8437d33b3 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -614,6 +614,37 @@ malloc_heap_free_pages(void *aligned_start, size_t aligned_len)
 	return 0;
 }
 
+int
+malloc_heap_find_named_heap_idx(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx;
+
+	if (heap_name == NULL)
+		return -1;
+	if (strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == RTE_HEAP_NAME_MAX_LEN)
+		return -1;
+	for (heap_idx = rte_socket_count(); heap_idx < RTE_MAX_HEAPS;
+			heap_idx++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_idx];
+		if (strncmp(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN) == 0)
+			return heap_idx;
+	}
+	return -1;
+}
+
+struct malloc_heap *
+malloc_heap_find_named_heap(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx;
+
+	heap_idx = malloc_heap_find_named_heap_idx(heap_name);
+	if (heap_idx < 0)
+		return NULL;
+	return &mcfg->malloc_heaps[heap_idx];
+}
+
 int
 malloc_heap_free(struct malloc_elem *elem)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 03b801415..4f3137260 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -29,6 +29,12 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket, unsigned int flags,
 		size_t align, size_t bound, bool contig);
 
+int
+malloc_heap_find_named_heap_idx(const char *name);
+
+struct malloc_heap *
+malloc_heap_find_named_heap(const char *name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 75d6e0b4d..2508abdb1 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -165,6 +165,23 @@ rte_malloc_get_socket_stats(int socket,
 			socket_stats);
 }
 
+/*
+ * Function to retrieve data for heap on given socket
+ */
+int __rte_experimental
+rte_malloc_get_stats_from_heap(const char *heap_name,
+		struct rte_malloc_socket_stats *socket_stats)
+{
+	struct malloc_heap *heap;
+
+	heap = malloc_heap_find_named_heap(heap_name);
+
+	if (heap == NULL)
+		return -1;
+
+	return malloc_heap_get_stats(heap, socket_stats);
+}
+
 /*
  * Function to dump contents of all heaps
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index e7fb37b2a..786df1e39 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -278,6 +278,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_get_stats_from_heap;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 06/11] malloc: enable allocating from named heaps
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (4 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 05/11] malloc: enable retrieving statistics from named heaps Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-13 16:31   ` Alejandro Lucero
  2018-07-06 13:17 ` [dpdk-dev] [RFC 07/11] malloc: enable creating new malloc heaps Anatoly Burakov
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

Add new malloc API to allocate memory from heap referenced to by
specified name.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 25 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.c        |  2 +-
 lib/librte_eal/common/malloc_heap.h        |  6 ++++++
 lib/librte_eal/common/rte_malloc.c         | 11 ++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 7cbcd3184..f1bcd9b65 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -213,6 +213,31 @@ rte_zmalloc_socket(const char *type, size_t size, unsigned align, int socket);
 void *
 rte_calloc_socket(const char *type, size_t num, size_t size, unsigned align, int socket);
 
+/**
+ * This function allocates memory from a specified named heap.
+ *
+ * @param name
+ *   Name of the heap to allocate from.
+ * @param type
+ *   A string identifying the type of allocated objects (useful for debug
+ *   purposes, such as identifying the cause of a memory leak). Can be NULL.
+ * @param size
+ *   Size (in bytes) to be allocated.
+ * @param align
+ *   If 0, the return is a pointer that is suitably aligned for any kind of
+ *   variable (in the same manner as malloc()).
+ *   Otherwise, the return is a pointer that is a multiple of *align*. In
+ *   this case, it must be a power of two. (Minimum alignment is the
+ *   cacheline size, i.e. 64-bytes)
+ * @return
+ *   - NULL on error. Not enough memory, or invalid arguments (size is 0,
+ *     align is not a power of two).
+ *   - Otherwise, the pointer to the allocated object.
+ */
+__rte_experimental void *
+rte_malloc_from_heap(const char *heap_name, const char *type, size_t size,
+		unsigned int align);
+
 /**
  * Frees the memory space pointed to by the provided pointer.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 8437d33b3..a33acc252 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -494,7 +494,7 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 }
 
 /* this will try lower page sizes first */
-static void *
+void *
 malloc_heap_alloc_on_heap_id(const char *type, size_t size,
 		unsigned int heap_id, unsigned int flags, size_t align,
 		size_t bound, bool contig)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 4f3137260..a7e408c99 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -29,6 +29,12 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket, unsigned int flags,
 		size_t align, size_t bound, bool contig);
 
+/* allocate from specified heap id */
+void *
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig);
+
 int
 malloc_heap_find_named_heap_idx(const char *name);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 2508abdb1..215876a59 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -55,6 +55,17 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 			align == 0 ? 1 : align, 0, false);
 }
 
+void *
+rte_malloc_from_heap(const char *heap_name, const char *type, size_t size,
+		unsigned int align)
+{
+	int heap_idx = malloc_heap_find_named_heap_idx(heap_name);
+	if (heap_idx < 0)
+		return NULL;
+	return malloc_heap_alloc_on_heap_id(type, size, heap_idx, 0,
+			align == 0 ? 1 : align, 0, false);
+}
+
 /*
  * Allocate memory on default heap.
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 786df1e39..716a7585d 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -278,6 +278,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_from_heap;
 	rte_malloc_get_stats_from_heap;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 07/11] malloc: enable creating new malloc heaps
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (5 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 06/11] malloc: enable allocating " Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-06 13:17 ` [dpdk-dev] [RFC 08/11] malloc: allow adding memory to named heaps Anatoly Burakov
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

Add API to allow creating new malloc heaps. They will be created
with indexes higher than heaps reserved for NUMA sockets, and up to
RTE_MAX_HEAPS.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 21 ++++++++++
 lib/librte_eal/common/malloc_heap.c        | 16 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 46 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 87 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index f1bcd9b65..fa6de073a 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -253,6 +253,27 @@ rte_malloc_from_heap(const char *heap_name, const char *type, size_t size,
 void
 rte_free(void *ptr);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Concurrently creating or destroying heaps is not safe.
+ *
+ * @note This function does not need to be called in multiple processes, as
+ *   multiprocess synchronization will happen automatically as far as heap data
+ *   is concerned. However, before accessing pointers to memory in this heap, it
+ *   is responsibility of the user to ensure that the heap memory is accessible
+ *   in all processes.
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation.
+ *   - -1 on error.
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * If malloc debug is enabled, check a memory block for header
  * and trailer markers to indicate that all is well with the block.
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index a33acc252..f5d103626 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -892,6 +892,22 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index a7e408c99..aa819ef65 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -35,6 +35,9 @@ malloc_heap_alloc_on_heap_id(const char *type, size_t size,
 		unsigned int heap_id, unsigned int flags, size_t align,
 		size_t bound, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_find_named_heap_idx(const char *name);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 215876a59..e000dc5b7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -12,6 +12,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -272,3 +273,48 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i;
+
+	/* iova_addrs is allowed to be NULL */
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists. start from non-socket heaps.
+	 */
+	for (i = rte_socket_count(); i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			return -1;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	return malloc_heap_create(heap, heap_name);
+}
+
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 716a7585d..f3c375156 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -280,6 +280,7 @@ EXPERIMENTAL {
 	rte_malloc_dump_heaps;
 	rte_malloc_from_heap;
 	rte_malloc_get_stats_from_heap;
+	rte_malloc_heap_create;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 08/11] malloc: allow adding memory to named heaps
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (6 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 07/11] malloc: enable creating new malloc heaps Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-13 17:04   ` Alejandro Lucero
  2018-07-06 13:17 ` [dpdk-dev] [RFC 09/11] malloc: allow removing memory from " Anatoly Burakov
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 44 ++++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 70 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 158 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index fa6de073a..5f933993b 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -274,6 +274,50 @@ rte_free(void *ptr);
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Add more memory to heap with specified name.
+ *
+ * @note Concurrently adding memory to or removing memory from different heaps
+ *   is not safe.
+ *
+ * @note This function does not need to be called in multiple processes, as
+ *   multiprocess synchronization will happen automatically as far as heap data
+ *   is concerned. However, before accessing pointers to memory in this heap, it
+ *   is responsibility of the user to ensure that the heap memory is accessible
+ *   in all processes.
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including crashes.
+ *
+ * @note Adding memory to heap may fail in multiple processes scenario, as
+ *   attaching to ``rte_fbarray`` structures may not always be successful in
+ *   secondary processes.
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ * @param va_addr
+ *   Start of virtual area to add to the heap.
+ * @param len
+ *   Length of virtual area to add to the heap.
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Must be zero of ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory.
+ *
+ * @return
+ *   - 0 on successful creation.
+ *   - -1 on error.
+ */
+int __rte_experimental
+rte_malloc_heap_add_memory(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * If malloc debug is enabled, check a memory block for header
  * and trailer markers to indicate that all is well with the block.
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index f5d103626..29446cac9 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -892,6 +892,76 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, n_pages * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = -1; /* invalid socket ID */
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = -1; /* invalid socket ID */
+	msl->version = 0;
+	msl->external = true;
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index aa819ef65..3be4656d0 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -38,6 +38,10 @@ malloc_heap_alloc_on_heap_id(const char *type, size_t size,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_find_named_heap_idx(const char *name);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index e000dc5b7..db0f604ad 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -274,6 +274,45 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+int
+rte_malloc_heap_add_memory(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	/* iova_addrs is allowed to be NULL */
+	if (heap_name == NULL || va_addr == NULL ||
+			/* n_pages can be 0 if iova_addrs is NULL */
+			((iova_addrs != NULL) == (n_pages == 0)) ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* find our heap */
+	heap = malloc_heap_find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f3c375156..6290cc910 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -280,6 +280,7 @@ EXPERIMENTAL {
 	rte_malloc_dump_heaps;
 	rte_malloc_from_heap;
 	rte_malloc_get_stats_from_heap;
+	rte_malloc_heap_add_memory;
 	rte_malloc_heap_create;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 09/11] malloc: allow removing memory from named heaps
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (7 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 08/11] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-06 13:17 ` [dpdk-dev] [RFC 10/11] malloc: allow destroying heaps Anatoly Burakov
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 28 ++++++++++
 lib/librte_eal/common/malloc_heap.c        | 61 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 28 ++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 122 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 5f933993b..25d8d3f11 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -318,6 +318,34 @@ int __rte_experimental
 rte_malloc_heap_add_memory(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory area from heap with specified name.
+ *
+ * @note Concurrently adding or removing memory from different heaps is not
+ *   safe.
+ *
+ * @note This function does not need to be called in multiple processes, as
+ *   multiprocess synchronization will happen automatically as far as heap data
+ *   is concerned. However, before accessing pointers to memory in this heap, it
+ *   is responsibility of the user to ensure that the heap memory is accessible
+ *   in all processes.
+ *
+ * @note Memory area must be empty to allow its removal from the heap.
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ * @param va_addr
+ *   Virtual address to remove from the heap.
+ * @param len
+ *   Length of virtual area to remove from the heap.
+ *
+ * @return
+ *   - 0 on successful creation.
+ *   - -1 on error.
+ */
+int __rte_experimental
+rte_malloc_heap_remove_memory(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * If malloc debug is enabled, check a memory block for header
  * and trailer markers to indicate that all is well with the block.
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 29446cac9..27dbf6e60 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -892,6 +892,44 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	/* check if the element is unused */
+	if (elem->state != ELEM_FREE) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+
+	msl = elem->msl;
+
+	/* check if element encompasses the entire memseg list */
+	if (elem->size != len || len != (msl->page_sz * msl->memseg_arr.len)) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	memset(elem, 0, sizeof(*elem));
+
+	heap->total_size -= len;
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -962,6 +1000,29 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			elem = NULL;
+			break;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 3be4656d0..000146365 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -42,6 +42,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_find_named_heap_idx(const char *name);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index db0f604ad..8d2eb7250 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -313,6 +313,34 @@ rte_malloc_heap_add_memory(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_remove_memory(const char *heap_name, void *va_addr, size_t len)
+{
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	/* iova_addrs is allowed to be NULL */
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* find our heap */
+	heap = malloc_heap_find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 6290cc910..7ee79051f 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -282,6 +282,7 @@ EXPERIMENTAL {
 	rte_malloc_get_stats_from_heap;
 	rte_malloc_heap_add_memory;
 	rte_malloc_heap_create;
+	rte_malloc_heap_remove_memory;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 10/11] malloc: allow destroying heaps
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (8 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 09/11] malloc: allow removing memory from " Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-06 13:17 ` [dpdk-dev] [RFC 11/11] memzone: enable reserving memory from named heaps Anatoly Burakov
  2018-07-13 17:10 ` [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Burakov, Anatoly
  11 siblings, 0 replies; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

Add an API to destroy specified heap. Any memory regions still
contained within the heap will be removed first.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 21 ++++++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 29 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 +++
 lib/librte_eal/common/rte_malloc.c         | 27 ++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 81 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 25d8d3f11..239cda2ca 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -346,6 +346,27 @@ rte_malloc_heap_add_memory(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_remove_memory(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note Concurrently creating or destroying heaps is not thread-safe.
+ *
+ * @note This function does not deallocate the memory backing the heap - it only
+ *   deregisters memory from DPDK.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to malloc heap.
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation.
+ *   - -1 on error.
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * If malloc debug is enabled, check a memory block for header
  * and trailer markers to indicate that all is well with the block.
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 27dbf6e60..e447b6412 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1039,6 +1039,35 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	struct malloc_elem *elem;
+
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	elem = heap->first;
+	while (elem != NULL) {
+		struct malloc_elem *next = elem->next;
+
+		if (destroy_seg(elem, elem->size) < 0)
+			return -1;
+
+		elem = next;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* we can't memset the entire thing as we're still holding the lock */
+	LIST_INIT(heap->free_head);
+	memset(&heap->name, 0, sizeof(heap->name));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 000146365..399c9a6b1 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -38,6 +38,9 @@ malloc_heap_alloc_on_heap_id(const char *type, size_t size,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 8d2eb7250..b6beee7ce 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -385,3 +385,30 @@ rte_malloc_heap_create(const char *heap_name)
 	return malloc_heap_create(heap, heap_name);
 }
 
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* start from non-socket heaps */
+	heap = malloc_heap_find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+	rte_spinlock_unlock(&heap->lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 7ee79051f..cdde7eb3b 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -282,6 +282,7 @@ EXPERIMENTAL {
 	rte_malloc_get_stats_from_heap;
 	rte_malloc_heap_add_memory;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_remove_memory;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dpdk-dev] [RFC 11/11] memzone: enable reserving memory from named heaps
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (9 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 10/11] malloc: allow destroying heaps Anatoly Burakov
@ 2018-07-06 13:17 ` Anatoly Burakov
  2018-07-13 17:10 ` [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Burakov, Anatoly
  11 siblings, 0 replies; 25+ messages in thread
From: Anatoly Burakov @ 2018-07-06 13:17 UTC (permalink / raw)
  To: dev; +Cc: srinath.mannam, scott.branden, ajit.khaparde

Add ability to allocate memory for memzones from named heaps. The
semantics are kept similar to regular allocations, and as much of
the code as possible is shared.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c  | 237 +++++++++++++++-----
 lib/librte_eal/common/include/rte_memzone.h | 183 +++++++++++++++
 lib/librte_eal/rte_eal_version.map          |   3 +
 3 files changed, 373 insertions(+), 50 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 25c56052c..d37e7ae1d 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -98,17 +98,14 @@ find_heap_max_free_elem(int *s, unsigned align)
 	return len;
 }
 
-static const struct rte_memzone *
-memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
-		int socket_id, unsigned int flags, unsigned int align,
+static int
+common_checks(const char *name, size_t len, unsigned int align,
 		unsigned int bound)
 {
 	struct rte_memzone *mz;
 	struct rte_mem_config *mcfg;
 	struct rte_fbarray *arr;
 	size_t requested_len;
-	int mz_idx;
-	bool contig;
 
 	/* get pointer to global configuration */
 	mcfg = rte_eal_get_configuration()->mem_config;
@@ -118,14 +115,14 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	if (arr->count >= arr->len) {
 		RTE_LOG(ERR, EAL, "%s(): No more room in config\n", __func__);
 		rte_errno = ENOSPC;
-		return NULL;
+		return -1;
 	}
 
 	if (strlen(name) > sizeof(mz->name) - 1) {
 		RTE_LOG(DEBUG, EAL, "%s(): memzone <%s>: name too long\n",
 			__func__, name);
 		rte_errno = ENAMETOOLONG;
-		return NULL;
+		return -1;
 	}
 
 	/* zone already exist */
@@ -133,7 +130,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		RTE_LOG(DEBUG, EAL, "%s(): memzone <%s> already exists\n",
 			__func__, name);
 		rte_errno = EEXIST;
-		return NULL;
+		return -1;
 	}
 
 	/* if alignment is not a power of two */
@@ -141,7 +138,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		RTE_LOG(ERR, EAL, "%s(): Invalid alignment: %u\n", __func__,
 				align);
 		rte_errno = EINVAL;
-		return NULL;
+		return -1;
 	}
 
 	/* alignment less than cache size is not allowed */
@@ -151,7 +148,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	/* align length on cache boundary. Check for overflow before doing so */
 	if (len > SIZE_MAX - RTE_CACHE_LINE_MASK) {
 		rte_errno = EINVAL; /* requested size too big */
-		return NULL;
+		return -1;
 	}
 
 	len += RTE_CACHE_LINE_MASK;
@@ -163,49 +160,23 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	/* check that boundary condition is valid */
 	if (bound != 0 && (requested_len > bound || !rte_is_power_of_2(bound))) {
 		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	if (!rte_eal_has_hugepages())
-		socket_id = SOCKET_ID_ANY;
-
-	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
-	/* malloc only cares about size flags, remove contig flag from flags */
-	flags &= ~RTE_MEMZONE_IOVA_CONTIG;
-
-	if (len == 0) {
-		/* len == 0 is only allowed for non-contiguous zones */
-		if (contig) {
-			RTE_LOG(DEBUG, EAL, "Reserving zero-length contiguous memzones is not supported\n");
-			rte_errno = EINVAL;
-			return NULL;
-		}
-		if (bound != 0)
-			requested_len = bound;
-		else {
-			requested_len = find_heap_max_free_elem(&socket_id, align);
-			if (requested_len == 0) {
-				rte_errno = ENOMEM;
-				return NULL;
-			}
-		}
-	}
-
-	/* allocate memory on heap */
-	void *mz_addr = malloc_heap_alloc(NULL, requested_len, socket_id, flags,
-			align, bound, contig);
-	if (mz_addr == NULL) {
-		rte_errno = ENOMEM;
-		return NULL;
+		return -1;
 	}
+	return 0;
+}
 
+static const struct rte_memzone *
+create_memzone(const char *name, void *mz_addr, size_t requested_len)
+{
+	struct rte_mem_config *mcfg;
+	struct rte_fbarray *arr;
 	struct malloc_elem *elem = malloc_elem_from_data(mz_addr);
+	struct rte_memzone *mz;
+	int mz_idx;
+
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
+	arr = &mcfg->memzones;
 
 	/* fill the zone in config */
 	mz_idx = rte_fbarray_find_next_free(arr, 0);
@@ -236,6 +207,134 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	return mz;
 }
 
+static const struct rte_memzone *
+memzone_reserve_from_heap_aligned_thread_unsafe(const char *name, size_t len,
+		const char *heap_name, unsigned int flags, unsigned int align,
+		unsigned int bound)
+{
+	size_t requested_len = len;
+	void *mz_addr;
+	int heap_idx;
+	bool contig;
+
+	/* this function sets rte_errno */
+	if (common_checks(name, len, align, bound) < 0)
+		return NULL;
+
+	heap_idx = malloc_heap_find_named_heap_idx(heap_name);
+	if (heap_idx < 0) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
+	/* malloc only cares about size flags, remove contig flag from flags */
+	flags &= ~RTE_MEMZONE_IOVA_CONTIG;
+
+	if (len == 0) {
+		/* len == 0 is only allowed for non-contiguous zones */
+		if (contig) {
+			RTE_LOG(DEBUG, EAL, "Reserving zero-length contiguous memzones is not supported\n");
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		if (bound != 0)
+			requested_len = bound;
+		else {
+			requested_len = heap_max_free_elem(heap_idx, align);
+			if (requested_len == 0) {
+				rte_errno = ENOMEM;
+				return NULL;
+			}
+		}
+	}
+
+	/* allocate memory on heap */
+	mz_addr = malloc_heap_alloc_on_heap_id(NULL, requested_len, heap_idx,
+			flags, align, bound, contig);
+	if (mz_addr == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	return create_memzone(name, mz_addr, requested_len);
+}
+
+static const struct rte_memzone *
+memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
+		int socket_id, unsigned int flags, unsigned int align,
+		unsigned int bound)
+{
+	size_t requested_len = len;
+	bool contig;
+	void *mz_addr;
+
+	/* this function sets rte_errno */
+	if (common_checks(name, len, align, bound) < 0)
+		return NULL;
+
+	if ((socket_id != SOCKET_ID_ANY) &&
+			(socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	if (!rte_eal_has_hugepages())
+		socket_id = SOCKET_ID_ANY;
+
+	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
+	/* malloc only cares about size flags, remove contig flag from flags */
+	flags &= ~RTE_MEMZONE_IOVA_CONTIG;
+
+	if (len == 0) {
+		/* len == 0 is only allowed for non-contiguous zones */
+		if (contig) {
+			RTE_LOG(DEBUG, EAL, "Reserving zero-length contiguous memzones is not supported\n");
+			rte_errno = EINVAL;
+			return NULL;
+		}
+		if (bound != 0)
+			requested_len = bound;
+		else {
+			requested_len = find_heap_max_free_elem(&socket_id,
+					align);
+			if (requested_len == 0) {
+				rte_errno = ENOMEM;
+				return NULL;
+			}
+		}
+	}
+
+	/* allocate memory on heap */
+	mz_addr = malloc_heap_alloc(NULL, requested_len, socket_id, flags,
+			align, bound, contig);
+	if (mz_addr == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	return create_memzone(name, mz_addr, requested_len);
+}
+
+static const struct rte_memzone *
+rte_memzone_reserve_from_heap_thread_safe(const char *name, size_t len,
+		const char *heap_name, unsigned int flags, unsigned int align,
+		unsigned int bound)
+{
+	struct rte_mem_config *mcfg;
+	const struct rte_memzone *mz = NULL;
+
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
+
+	rte_rwlock_write_lock(&mcfg->mlock);
+
+	mz = memzone_reserve_from_heap_aligned_thread_unsafe(name, len,
+			heap_name, flags, align, bound);
+
+	rte_rwlock_write_unlock(&mcfg->mlock);
+
+	return mz;
+}
+
 static const struct rte_memzone *
 rte_memzone_reserve_thread_safe(const char *name, size_t len, int socket_id,
 		unsigned int flags, unsigned int align, unsigned int bound)
@@ -293,6 +392,44 @@ rte_memzone_reserve(const char *name, size_t len, int socket_id,
 					       flags, RTE_CACHE_LINE_SIZE, 0);
 }
 
+/*
+ * Return a pointer to a correctly filled memzone descriptor (with a
+ * specified alignment and boundary). If the allocation cannot be done,
+ * return NULL.
+ */
+const struct rte_memzone *
+rte_memzone_reserve_from_heap_bounded(const char *name, size_t len,
+		const char *heap_name, unsigned int flags, unsigned int align,
+		unsigned int bound)
+{
+	return rte_memzone_reserve_from_heap_thread_safe(name, len, heap_name,
+			flags, align, bound);
+}
+
+/*
+ * Return a pointer to a correctly filled memzone descriptor (with a
+ * specified alignment). If the allocation cannot be done, return NULL.
+ */
+const struct rte_memzone *
+rte_memzone_reserve_from_heap_aligned(const char *name, size_t len,
+		const char *heap_name, unsigned int flags, unsigned int align)
+{
+	return rte_memzone_reserve_from_heap_thread_safe(name, len, heap_name,
+			flags, align, 0);
+}
+
+/*
+ * Return a pointer to a correctly filled memzone descriptor. If the
+ * allocation cannot be done, return NULL.
+ */
+const struct rte_memzone *
+rte_memzone_reserve_from_heap(const char *name, size_t len,
+		const char *heap_name, unsigned int flags)
+{
+	return rte_memzone_reserve_from_heap_thread_safe(name, len, heap_name,
+			flags, RTE_CACHE_LINE_SIZE, 0);
+}
+
 int
 rte_memzone_free(const struct rte_memzone *mz)
 {
diff --git a/lib/librte_eal/common/include/rte_memzone.h b/lib/librte_eal/common/include/rte_memzone.h
index ef370fa6f..b27e5c421 100644
--- a/lib/librte_eal/common/include/rte_memzone.h
+++ b/lib/librte_eal/common/include/rte_memzone.h
@@ -258,6 +258,189 @@ const struct rte_memzone *rte_memzone_reserve_bounded(const char *name,
 			size_t len, int socket_id,
 			unsigned flags, unsigned align, unsigned bound);
 
+/**
+ * Reserve a portion of physical memory from a specified named heap.
+ *
+ * This function reserves some memory and returns a pointer to a
+ * correctly filled memzone descriptor. If the allocation cannot be
+ * done, return NULL.
+ *
+ * @note Reserving memzones with len set to 0 will only attempt to allocate
+ *   memzones from memory that is already available. It will not trigger any
+ *   new allocations.
+ *
+ * @note Reserving IOVA-contiguous memzones with len set to 0 is not currently
+ *   supported.
+ *
+ * @param name
+ *   The name of the memzone. If it already exists, the function will
+ *   fail and return NULL.
+ * @param len
+ *   The size of the memory to be reserved. If it
+ *   is 0, the biggest contiguous zone will be reserved.
+ * @param heap_name
+ *   The name of the heap to reserve memory from.
+ * @param flags
+ *   The flags parameter is used to request memzones to be
+ *   taken from specifically sized hugepages.
+ *   - RTE_MEMZONE_2MB - Reserved from 2MB pages
+ *   - RTE_MEMZONE_1GB - Reserved from 1GB pages
+ *   - RTE_MEMZONE_16MB - Reserved from 16MB pages
+ *   - RTE_MEMZONE_16GB - Reserved from 16GB pages
+ *   - RTE_MEMZONE_256KB - Reserved from 256KB pages
+ *   - RTE_MEMZONE_256MB - Reserved from 256MB pages
+ *   - RTE_MEMZONE_512MB - Reserved from 512MB pages
+ *   - RTE_MEMZONE_4GB - Reserved from 4GB pages
+ *   - RTE_MEMZONE_SIZE_HINT_ONLY - Allow alternative page size to be used if
+ *                                  the requested page size is unavailable.
+ *                                  If this flag is not set, the function
+ *                                  will return error on an unavailable size
+ *                                  request.
+ *   - RTE_MEMZONE_IOVA_CONTIG - Ensure reserved memzone is IOVA-contiguous.
+ *                               This option should be used when allocating
+ *                               memory intended for hardware rings etc.
+ * @return
+ *   A pointer to a correctly-filled read-only memzone descriptor, or NULL
+ *   on error.
+ *   On error case, rte_errno will be set appropriately:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ *    - EINVAL - invalid parameters
+ */
+__rte_experimental const struct rte_memzone *
+rte_memzone_reserve_from_heap(const char *name, size_t len,
+		const char *heap_name, unsigned int flags);
+
+/**
+ * Reserve a portion of physical memory from a specified named heap with
+ * alignment on a specified boundary.
+ *
+ * This function reserves some memory with alignment on a specified
+ * boundary, and returns a pointer to a correctly filled memzone
+ * descriptor. If the allocation cannot be done or if the alignment
+ * is not a power of 2, returns NULL.
+ *
+ * @note Reserving memzones with len set to 0 will only attempt to allocate
+ *   memzones from memory that is already available. It will not trigger any
+ *   new allocations.
+ *
+ * @note Reserving IOVA-contiguous memzones with len set to 0 is not currently
+ *   supported.
+ *
+ * @param name
+ *   The name of the memzone. If it already exists, the function will
+ *   fail and return NULL.
+ * @param len
+ *   The size of the memory to be reserved. If it
+ *   is 0, the biggest contiguous zone will be reserved.
+ * @param heap_name
+ *   The name of the heap to reserve memory from.
+ * @param flags
+ *   The flags parameter is used to request memzones to be
+ *   taken from specifically sized hugepages.
+ *   - RTE_MEMZONE_2MB - Reserved from 2MB pages
+ *   - RTE_MEMZONE_1GB - Reserved from 1GB pages
+ *   - RTE_MEMZONE_16MB - Reserved from 16MB pages
+ *   - RTE_MEMZONE_16GB - Reserved from 16GB pages
+ *   - RTE_MEMZONE_256KB - Reserved from 256KB pages
+ *   - RTE_MEMZONE_256MB - Reserved from 256MB pages
+ *   - RTE_MEMZONE_512MB - Reserved from 512MB pages
+ *   - RTE_MEMZONE_4GB - Reserved from 4GB pages
+ *   - RTE_MEMZONE_SIZE_HINT_ONLY - Allow alternative page size to be used if
+ *                                  the requested page size is unavailable.
+ *                                  If this flag is not set, the function
+ *                                  will return error on an unavailable size
+ *                                  request.
+ *   - RTE_MEMZONE_IOVA_CONTIG - Ensure reserved memzone is IOVA-contiguous.
+ *                               This option should be used when allocating
+ *                               memory intended for hardware rings etc.
+ * @param align
+ *   Alignment for resulting memzone. Must be a power of 2.
+ * @return
+ *   A pointer to a correctly-filled read-only memzone descriptor, or NULL
+ *   on error.
+ *   On error case, rte_errno will be set appropriately:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ *    - EINVAL - invalid parameters
+ */
+__rte_experimental const struct rte_memzone *
+rte_memzone_reserve_from_heap_aligned(const char *name, size_t len,
+		const char *heap_name, unsigned int flags, unsigned int align);
+
+/**
+ * Reserve a portion of physical memory from a specified named heap with
+ * specified alignment and boundary.
+ *
+ * This function reserves some memory with specified alignment and
+ * boundary, and returns a pointer to a correctly filled memzone
+ * descriptor. If the allocation cannot be done or if the alignment
+ * or boundary are not a power of 2, returns NULL.
+ * Memory buffer is reserved in a way, that it wouldn't cross specified
+ * boundary. That implies that requested length should be less or equal
+ * then boundary.
+ *
+ * @note Reserving memzones with len set to 0 will only attempt to allocate
+ *   memzones from memory that is already available. It will not trigger any
+ *   new allocations.
+ *
+ * @note Reserving IOVA-contiguous memzones with len set to 0 is not currently
+ *   supported.
+ *
+ * @param name
+ *   The name of the memzone. If it already exists, the function will
+ *   fail and return NULL.
+ * @param len
+ *   The size of the memory to be reserved. If it
+ *   is 0, the biggest contiguous zone will be reserved.
+ * @param heap_name
+ *   The name of the heap to reserve memory from.
+ * @param flags
+ *   The flags parameter is used to request memzones to be
+ *   taken from specifically sized hugepages.
+ *   - RTE_MEMZONE_2MB - Reserved from 2MB pages
+ *   - RTE_MEMZONE_1GB - Reserved from 1GB pages
+ *   - RTE_MEMZONE_16MB - Reserved from 16MB pages
+ *   - RTE_MEMZONE_16GB - Reserved from 16GB pages
+ *   - RTE_MEMZONE_256KB - Reserved from 256KB pages
+ *   - RTE_MEMZONE_256MB - Reserved from 256MB pages
+ *   - RTE_MEMZONE_512MB - Reserved from 512MB pages
+ *   - RTE_MEMZONE_4GB - Reserved from 4GB pages
+ *   - RTE_MEMZONE_SIZE_HINT_ONLY - Allow alternative page size to be used if
+ *                                  the requested page size is unavailable.
+ *                                  If this flag is not set, the function
+ *                                  will return error on an unavailable size
+ *                                  request.
+ *   - RTE_MEMZONE_IOVA_CONTIG - Ensure reserved memzone is IOVA-contiguous.
+ *                               This option should be used when allocating
+ *                               memory intended for hardware rings etc.
+ * @param align
+ *   Alignment for resulting memzone. Must be a power of 2.
+ * @param bound
+ *   Boundary for resulting memzone. Must be a power of 2 or zero.
+ *   Zero value implies no boundary condition.
+ * @return
+ *   A pointer to a correctly-filled read-only memzone descriptor, or NULL
+ *   on error.
+ *   On error case, rte_errno will be set appropriately:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ *    - EINVAL - invalid parameters
+ */
+__rte_experimental const struct rte_memzone *
+rte_memzone_reserve_from_heap_bounded(const char *name, size_t len,
+		const char *heap_name, unsigned int flags, unsigned int align,
+		unsigned int bound);
+
 /**
  * Free a memzone.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index cdde7eb3b..db1cfae6a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -294,6 +294,9 @@ EXPERIMENTAL {
 	rte_memseg_contig_walk;
 	rte_memseg_list_walk;
 	rte_memseg_walk;
+	rte_memzone_reserve_from_heap;
+	rte_memzone_reserve_from_heap_aligned;
+	rte_memzone_reserve_from_heap_bounded;
 	rte_mp_action_register;
 	rte_mp_action_unregister;
 	rte_mp_reply;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external
  2018-07-06 13:17 ` [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-07-10 11:18   ` Alejandro Lucero
  2018-07-10 11:31     ` Burakov, Anatoly
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Lucero @ 2018-07-10 11:18 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, srinath.mannam, scott.branden, Ajit Khaparde

On Fri, Jul 6, 2018 at 2:17 PM, Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/common/eal_common_memory.c     | 51 ++++++++++++++++---
>  .../common/include/rte_eal_memconfig.h        |  1 +
>  lib/librte_eal/common/malloc_heap.c           |  2 +-
>  3 files changed, 47 insertions(+), 7 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c
> b/lib/librte_eal/common/eal_common_memory.c
> index 4f0688f9d..835bbffb6 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -24,6 +24,21 @@
>  #include "eal_private.h"
>  #include "eal_internal_cfg.h"
>
> +/* forward declarations for memseg walk functions. we support external
> segments,
> + * but for some functionality to work, we need to either skip or not skip
> + * external segments. for example, while we expect for virt2memseg to
> return a
> + * valid memseg even though it's an external memseg, for regular memseg
> walk we
> + * want to skip those because the expectation is that we will only walk
> the
> + * DPDK allocated memory.
> + */
>

I do not clear understand when external segments can be used along with
hugepages segments, but if this is a possibility, should not we support
memseg walk for just external segments and also to allow any walk type as
an API? If I'm right, it seems just memseg walk skipping external memsegs
can be invoked from other files.


> +static int
> +memseg_list_walk(rte_memseg_list_walk_t func, void *arg, bool
> skip_external);
> +static int
> +memseg_walk(rte_memseg_walk_t func, void *arg, bool skip_external);
> +static int
> +memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg,
> +               bool skip_external);
> +
>  /*
>   * Try to mmap *size bytes in /dev/zero. If it is successful, return the
>   * pointer to the mmap'd area and keep *size unmodified. Else, retry
> @@ -621,9 +636,9 @@ rte_mem_iova2virt(rte_iova_t iova)
>          * as we know they are PA-contiguous as well
>          */
>         if (internal_config.legacy_mem)
> -               rte_memseg_contig_walk(find_virt_legacy, &vi);
> +               memseg_contig_walk(find_virt_legacy, &vi, false);
>         else
> -               rte_memseg_walk(find_virt, &vi);
> +               memseg_walk(find_virt, &vi, false);
>
>         return vi.virt;
>  }
> @@ -787,8 +802,8 @@ rte_mem_lock_page(const void *virt)
>         return mlock((void *)aligned, page_size);
>  }
>
> -int __rte_experimental
> -rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg)
> +static int
> +memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg, bool
> skip_external)
>  {
>         struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
>         int i, ms_idx, ret = 0;
> @@ -803,6 +818,8 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func,
> void *arg)
>
>                 if (msl->memseg_arr.count == 0)
>                         continue;
> +               if (skip_external && msl->external)
> +                       continue;
>
>                 arr = &msl->memseg_arr;
>
> @@ -837,7 +854,13 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t
> func, void *arg)
>  }
>
>  int __rte_experimental
> -rte_memseg_walk(rte_memseg_walk_t func, void *arg)
> +rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg)
> +{
> +       return memseg_contig_walk(func, arg, true);
> +}
> +
> +static int
> +memseg_walk(rte_memseg_walk_t func, void *arg, bool skip_external)
>  {
>         struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
>         int i, ms_idx, ret = 0;
> @@ -852,6 +875,8 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg)
>
>                 if (msl->memseg_arr.count == 0)
>                         continue;
> +               if (skip_external && msl->external)
> +                       continue;
>
>                 arr = &msl->memseg_arr;
>
> @@ -875,7 +900,13 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg)
>  }
>
>  int __rte_experimental
> -rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
> +rte_memseg_walk(rte_memseg_walk_t func, void *arg)
> +{
> +       return memseg_walk(func, arg, true);
> +}
> +
> +static int
> +memseg_list_walk(rte_memseg_list_walk_t func, void *arg, bool
> skip_external)
>  {
>         struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
>         int i, ret = 0;
> @@ -888,6 +919,8 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func,
> void *arg)
>
>                 if (msl->base_va == NULL)
>                         continue;
> +               if (skip_external && msl->external)
> +                       continue;
>
>                 ret = func(msl, arg);
>                 if (ret < 0) {
> @@ -904,6 +937,12 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func,
> void *arg)
>         return ret;
>  }
>
> +int __rte_experimental
> +rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
> +{
> +       return memseg_list_walk(func, arg, true);
> +}
> +
>  /* init memory subsystem */
>  int
>  rte_eal_memory_init(void)
> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h
> b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index aff0688dd..4e8720ba6 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -30,6 +30,7 @@ struct rte_memseg_list {
>                 uint64_t addr_64;
>                 /**< Makes sure addr is always 64-bits */
>         };
> +       bool external; /**< true if this list points to external memory */
>         int socket_id; /**< Socket ID for all memsegs in this list. */
>         uint64_t page_sz; /**< Page size for all memsegs in this list. */
>         volatile uint32_t version; /**< version number for multiprocess
> sync. */
> diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/
> malloc_heap.c
> index d6cf3af81..8a1f54905 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -631,7 +631,7 @@ malloc_heap_free(struct malloc_elem *elem)
>         ret = 0;
>
>         /* ...of which we can't avail if we are in legacy mode */
> -       if (internal_config.legacy_mem)
> +       if (internal_config.legacy_mem || msl->external)
>                 goto free_unlock;
>
>         /* check if we can free any memory back to the system */
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external
  2018-07-10 11:18   ` Alejandro Lucero
@ 2018-07-10 11:31     ` Burakov, Anatoly
  0 siblings, 0 replies; 25+ messages in thread
From: Burakov, Anatoly @ 2018-07-10 11:31 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dev, srinath.mannam, scott.branden, Ajit Khaparde

On 10-Jul-18 12:18 PM, Alejandro Lucero wrote:
> On Fri, Jul 6, 2018 at 2:17 PM, Anatoly Burakov <anatoly.burakov@intel.com>
> wrote:
> 
>> When we allocate and use DPDK memory, we need to be able to
>> differentiate between DPDK hugepage segments and segments that
>> were made part of DPDK but are externally allocated. Add such
>> a property to memseg lists.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   lib/librte_eal/common/eal_common_memory.c     | 51 ++++++++++++++++---
>>   .../common/include/rte_eal_memconfig.h        |  1 +
>>   lib/librte_eal/common/malloc_heap.c           |  2 +-
>>   3 files changed, 47 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/eal_common_memory.c
>> b/lib/librte_eal/common/eal_common_memory.c
>> index 4f0688f9d..835bbffb6 100644
>> --- a/lib/librte_eal/common/eal_common_memory.c
>> +++ b/lib/librte_eal/common/eal_common_memory.c
>> @@ -24,6 +24,21 @@
>>   #include "eal_private.h"
>>   #include "eal_internal_cfg.h"
>>
>> +/* forward declarations for memseg walk functions. we support external
>> segments,
>> + * but for some functionality to work, we need to either skip or not skip
>> + * external segments. for example, while we expect for virt2memseg to
>> return a
>> + * valid memseg even though it's an external memseg, for regular memseg
>> walk we
>> + * want to skip those because the expectation is that we will only walk
>> the
>> + * DPDK allocated memory.
>> + */
>>
> 
> I do not clear understand when external segments can be used along with
> hugepages segments, but if this is a possibility, should not we support
> memseg walk for just external segments and also to allow any walk type as
> an API? If I'm right, it seems just memseg walk skipping external memsegs
> can be invoked from other files.

Well, now that i think of it, every walk function will receive a memseg 
list along with memseg itself, so user code can check to skip external 
memsegs. So yes, you're correct. I'll do that in the next iteration. 
Thanks for your feedback!

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 02/11] eal: add function to rerieve socket index by socket ID
  2018-07-06 13:17 ` [dpdk-dev] [RFC 02/11] eal: add function to rerieve socket index by socket ID Anatoly Burakov
@ 2018-07-10 13:03   ` Alejandro Lucero
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Lucero @ 2018-07-10 13:03 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, srinath.mannam, scott.branden, Ajit Khaparde

On Fri, Jul 6, 2018 at 2:17 PM, Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> We are preparing to switch to index heap based on heap indexes
> rather than by NUMA nodes. First few indexes will be equal to
> NUMA node ID indexes. For example, currently on a machine with
> NUMA nodes [0, 8], heaps 0 and 8 will be active, while we want
> to make it so that heaps 0 and 1 are active. However, currently
> we don't have a function to map a specific NUMA node to a node
> index, so add it in this patch.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/common/eal_common_lcore.c  | 15 +++++++++++++++
>  lib/librte_eal/common/include/rte_lcore.h | 19 ++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map        |  1 +
>  3 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/eal_common_lcore.c
> b/lib/librte_eal/common/eal_common_lcore.c
> index 3167e9d79..579f5a0a1 100644
> --- a/lib/librte_eal/common/eal_common_lcore.c
> +++ b/lib/librte_eal/common/eal_common_lcore.c
> @@ -132,3 +132,18 @@ rte_socket_id_by_idx(unsigned int idx)
>         }
>         return config->numa_nodes[idx];
>  }
> +
> +int __rte_experimental
> +rte_socket_idx_by_id(unsigned int socket)
> +{
> +       const struct rte_config *config = rte_eal_get_configuration();
> +       int i;
> +
> +       for (i = 0; i < (int) config->numa_node_count; i++) {
> +               unsigned int cur_socket = config->numa_nodes[i];
> +               if (cur_socket == socket)
> +                       return i;
> +       }
> +       rte_errno = EINVAL;
> +       return -1;
> +}
> diff --git a/lib/librte_eal/common/include/rte_lcore.h
> b/lib/librte_eal/common/include/rte_lcore.h
> index 6e09d9181..f58cda09a 100644
> --- a/lib/librte_eal/common/include/rte_lcore.h
> +++ b/lib/librte_eal/common/include/rte_lcore.h
> @@ -156,11 +156,28 @@ rte_socket_count(void);
>   *
>   * @return
>   *   - physical socket id as recognized by EAL
> - *   - -1 on error, with errno set to EINVAL
> + *   - -1 on error, with rte_errno set to EINVAL
>   */
>  int __rte_experimental
>  rte_socket_id_by_idx(unsigned int idx);
>
> +/**
> + * Return index for a particular socket id.
> + *
> + * This will return position in list of all detected physical socket id's
> for a
> + * given socket. For example, on a machine with sockets [0, 8], passing
> + * 8 as a parameter will return 1.
> + *
> + * @param socket
> + *   physical socket id to return index for
> + *
> + * @return
> + *   - index of physical socket id as recognized by EAL
> + *   - -1 on error, with rte_errno set to EINVAL
> + */
> +int __rte_experimental
> +rte_socket_idx_by_id(unsigned int socket);
> +
>  /**
>   * Get the ID of the physical socket of the specified lcore
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_
> version.map
> index f7dd0e7bc..e7fb37b2a 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -296,6 +296,7 @@ EXPERIMENTAL {
>         rte_mp_sendmsg;
>         rte_socket_count;
>         rte_socket_id_by_idx;
> +       rte_socket_idx_by_id;
>         rte_vfio_dma_map;
>         rte_vfio_dma_unmap;
>         rte_vfio_get_container_fd;
> --
> 2.17.1
>

Acked-by: Alejandro Lucero <alejandro.lucero@netronome.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 03/11] malloc: index heaps using heap ID rather than NUMA node
  2018-07-06 13:17 ` [dpdk-dev] [RFC 03/11] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-07-13 16:05   ` Alejandro Lucero
  2018-07-13 16:08     ` Burakov, Anatoly
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Lucero @ 2018-07-13 16:05 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Thomas Monjalon, srinath.mannam, scott.branden, Ajit Khaparde

On Fri, Jul 6, 2018 at 2:17 PM, Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> Switch over all parts of EAL to use heap ID instead of NUMA node
> ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
> node's index within the detected NUMA node list.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  config/common_base                            |  1 +
>  lib/librte_eal/common/eal_common_memzone.c    | 46 ++++++++++------
>  .../common/include/rte_eal_memconfig.h        |  4 +-
>  lib/librte_eal/common/malloc_heap.c           | 53 ++++++++++++-------
>  lib/librte_eal/common/rte_malloc.c            | 28 ++++++----
>  5 files changed, 84 insertions(+), 48 deletions(-)
>
> diff --git a/config/common_base b/config/common_base
> index fcf3a1f6f..b0e3937e0 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
>  CONFIG_RTE_LIBRTE_EAL=y
>  CONFIG_RTE_MAX_LCORE=128
>  CONFIG_RTE_MAX_NUMA_NODES=8
> +CONFIG_RTE_MAX_HEAPS=32
>  CONFIG_RTE_MAX_MEMSEG_LISTS=64
>  # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
>  # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is
> smaller
> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> b/lib/librte_eal/common/eal_common_memzone.c
> index faa3b0615..25c56052c 100644
> --- a/lib/librte_eal/common/eal_common_memzone.c
> +++ b/lib/librte_eal/common/eal_common_memzone.c
> @@ -52,6 +52,26 @@ memzone_lookup_thread_unsafe(const char *name)
>         return NULL;
>  }
>
> +static size_t
> +heap_max_free_elem(unsigned int heap_idx, unsigned int align)
> +{
> +       struct rte_malloc_socket_stats stats;
> +       struct rte_mem_config *mcfg;
> +       size_t len;
> +
> +       /* get pointer to global configuration */
> +       mcfg = rte_eal_get_configuration()->mem_config;
> +
> +       malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx], &stats);
> +
> +       len = stats.greatest_free_size;
> +
> +       if (len < MALLOC_ELEM_OVERHEAD + align)
> +               return 0;
> +
> +       return len - MALLOC_ELEM_OVERHEAD - align;
> +}
> +
>
>  /* This function will return the greatest free block if a heap has been
>   * specified. If no heap has been specified, it will return the heap and
>

This description should be updated as no heap/socket is used now.


> @@ -59,29 +79,23 @@ memzone_lookup_thread_unsafe(const char *name)
>  static size_t
>  find_heap_max_free_elem(int *s, unsigned align)
>  {
> -       struct rte_mem_config *mcfg;
> -       struct rte_malloc_socket_stats stats;
> -       int i, socket = *s;
> +       unsigned int idx;
> +       int socket = *s;
>         size_t len = 0;
>
> -       /* get pointer to global configuration */
> -       mcfg = rte_eal_get_configuration()->mem_config;
> -
> -       for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
> -               if ((socket != SOCKET_ID_ANY) && (socket != i))
> +       for (idx = 0; idx < rte_socket_count(); idx++) {
> +               int cur_socket = rte_socket_id_by_idx(idx);
> +               if ((socket != SOCKET_ID_ANY) && (socket != cur_socket))
>                         continue;
>
> -               malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
> -               if (stats.greatest_free_size > len) {
> -                       len = stats.greatest_free_size;
> -                       *s = i;
> +               size_t cur_len = heap_max_free_elem(idx, align);
> +               if (cur_len > len) {
> +                       len = cur_len;
> +                       *s = cur_socket;
>                 }
>         }
>
> -       if (len < MALLOC_ELEM_OVERHEAD + align)
> -               return 0;
> -
> -       return len - MALLOC_ELEM_OVERHEAD - align;
> +       return len;
>

Is it worth to set *s to some safe value if no space at all?


>  }
>
>  static const struct rte_memzone *
> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h
> b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index 4e8720ba6..7e03196a6 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -71,8 +71,8 @@ struct rte_mem_config {
>
>         struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for
> objects */
>
> -       /* Heaps of Malloc per socket */
> -       struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
> +       /* Heaps of Malloc */
> +       struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
>
>         /* address of mem_config in primary process. used to map shared
> config into
>          * exact same address the primary process maps it.
> diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/
> malloc_heap.c
> index 8a1f54905..e7e1838b1 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -93,9 +93,10 @@ malloc_add_seg(const struct rte_memseg_list *msl,
>         struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
>         struct rte_memseg_list *found_msl;
>         struct malloc_heap *heap;
> -       int msl_idx;
> +       int msl_idx, heap_idx;
>
> -       heap = &mcfg->malloc_heaps[msl->socket_id];
> +       heap_idx = rte_socket_idx_by_id(msl->socket_id);
> +       heap = &mcfg->malloc_heaps[heap_idx];
>
>         /* msl is const, so find it */
>         msl_idx = msl - mcfg->memsegs;
> @@ -494,14 +495,20 @@ alloc_more_mem_on_socket(struct malloc_heap *heap,
> size_t size, int socket,
>
>  /* this will try lower page sizes first */
>  static void *
> -heap_alloc_on_socket(const char *type, size_t size, int socket,
> -               unsigned int flags, size_t align, size_t bound, bool
> contig)
> +malloc_heap_alloc_on_heap_id(const char *type, size_t size,
> +               unsigned int heap_id, unsigned int flags, size_t align,
> +               size_t bound, bool contig)
>  {
>         struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
> -       struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
> +       struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
>         unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
> +       int socket_id;
>         void *ret;
>
> +       /* return NULL if size is 0 or alignment is not power-of-2 */
> +       if (size == 0 || (align && !rte_is_power_of_2(align)))
> +               return NULL;
> +
>         rte_spinlock_lock(&(heap->lock));
>
>         align = align == 0 ? 1 : align;
> @@ -521,8 +528,13 @@ heap_alloc_on_socket(const char *type, size_t size,
> int socket,
>         if (ret != NULL)
>                 goto alloc_unlock;
>
> -       if (!alloc_more_mem_on_socket(heap, size, socket, flags, align,
> bound,
> -                       contig)) {
> +       socket_id = rte_socket_id_by_idx(heap_id);
> +       /* if socket ID is invalid, this is an external heap */
> +       if (socket_id < 0)
> +               goto alloc_unlock;
>

A log message would help to know what is going on. It is costly, but better
than hidden problem.


> +
> +       if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
> +                       bound, contig)) {
>                 ret = heap_alloc(heap, type, size, flags, align, bound,
> contig);
>
>                 /* this should have succeeded */
> @@ -538,13 +550,9 @@ void *
>  malloc_heap_alloc(const char *type, size_t size, int socket_arg,
>                 unsigned int flags, size_t align, size_t bound, bool
> contig)
>  {
> -       int socket, i, cur_socket;
> +       int socket, heap_id, i;
>         void *ret;
>
> -       /* return NULL if size is 0 or alignment is not power-of-2 */
> -       if (size == 0 || (align && !rte_is_power_of_2(align)))
> -               return NULL;
> -
>         if (!rte_eal_has_hugepages())
>                 socket_arg = SOCKET_ID_ANY;
>
> @@ -557,18 +565,23 @@ malloc_heap_alloc(const char *type, size_t size, int
> socket_arg,
>         if (socket >= RTE_MAX_NUMA_NODES)
>                 return NULL;
>
> -       ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
> -                       contig);
> +       /* turn socket ID into heap ID */
> +       heap_id = rte_socket_idx_by_id(socket);
> +       /* if heap id is invalid, it's a non-existent NUMA node */
> +       if (heap_id < 0)
>

same than above. A log would help.


> +               return NULL;
> +
> +       ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags,
> align,
> +                       bound, contig);
>         if (ret != NULL || socket_arg != SOCKET_ID_ANY)
>                 return ret;
>
>         /* try other heaps */
>         for (i = 0; i < (int) rte_socket_count(); i++) {
> -               cur_socket = rte_socket_id_by_idx(i);
> -               if (cur_socket == socket)
> +               if (i == heap_id)
>                         continue;
> -               ret = heap_alloc_on_socket(type, size, cur_socket, flags,
> -                               align, bound, contig);
> +               ret = malloc_heap_alloc_on_heap_id(type, size, i, flags,
> align,
> +                               bound, contig);
>                 if (ret != NULL)
>                         return ret;
>         }
> @@ -788,7 +801,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t
> size)
>  }
>
>  /*
> - * Function to retrieve data for heap on given socket
> + * Function to retrieve data for a given heap
>   */
>  int
>  malloc_heap_get_stats(struct malloc_heap *heap,
> @@ -826,7 +839,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
>  }
>
>  /*
> - * Function to retrieve data for heap on given socket
> + * Function to retrieve data for a given heap
>   */
>  void
>  malloc_heap_dump(struct malloc_heap *heap, FILE *f)
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index b51a6d111..4387bc494 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -152,11 +152,17 @@ rte_malloc_get_socket_stats(int socket,
>                 struct rte_malloc_socket_stats *socket_stats)
>  {
>         struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
> +       int heap_idx;
>
>         if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
>                 return -1;
>
> -       return malloc_heap_get_stats(&mcfg->malloc_heaps[socket],
> socket_stats);
> +       heap_idx = rte_socket_idx_by_id(socket);
> +       if (heap_idx < 0)
> +               return -1;
> +
> +       return malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
> +                       socket_stats);
>  }
>
>  /*
> @@ -168,10 +174,9 @@ rte_malloc_dump_heaps(FILE *f)
>         struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
>         unsigned int idx;
>
> -       for (idx = 0; idx < rte_socket_count(); idx++) {
> -               unsigned int socket = rte_socket_id_by_idx(idx);
> -               fprintf(f, "Heap on socket %i:\n", socket);
> -               malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
> +       for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
> +               fprintf(f, "Heap id: %u\n", idx);
> +               malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
>         }
>
>  }
> @@ -182,14 +187,17 @@ rte_malloc_dump_heaps(FILE *f)
>  void
>  rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
>  {
> -       unsigned int socket;
> +       struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
> +       unsigned int heap_id;
>         struct rte_malloc_socket_stats sock_stats;
> +
>         /* Iterate through all initialised heaps */
> -       for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
> -               if ((rte_malloc_get_socket_stats(socket, &sock_stats) <
> 0))
> -                       continue;
> +       for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
> +               struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
>
> -               fprintf(f, "Socket:%u\n", socket);
> +               malloc_heap_get_stats(heap, &sock_stats);
> +
> +               fprintf(f, "Heap id:%u\n", heap_id);
>                 fprintf(f, "\tHeap_size:%zu,\n",
> sock_stats.heap_totalsz_bytes);
>                 fprintf(f, "\tFree_size:%zu,\n",
> sock_stats.heap_freesz_bytes);
>                 fprintf(f, "\tAlloc_size:%zu,\n",
> sock_stats.heap_allocsz_bytes);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 03/11] malloc: index heaps using heap ID rather than NUMA node
  2018-07-13 16:05   ` Alejandro Lucero
@ 2018-07-13 16:08     ` Burakov, Anatoly
  0 siblings, 0 replies; 25+ messages in thread
From: Burakov, Anatoly @ 2018-07-13 16:08 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, Thomas Monjalon, srinath.mannam, scott.branden, Ajit Khaparde

On 13-Jul-18 5:05 PM, Alejandro Lucero wrote:
>> -       /* get pointer to global configuration */
>> -       mcfg = rte_eal_get_configuration()->mem_config;
>> -
>> -       for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
>> -               if ((socket != SOCKET_ID_ANY) && (socket != i))
>> +       for (idx = 0; idx < rte_socket_count(); idx++) {
>> +               int cur_socket = rte_socket_id_by_idx(idx);
>> +               if ((socket != SOCKET_ID_ANY) && (socket != cur_socket))
>>                          continue;
>>
>> -               malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
>> -               if (stats.greatest_free_size > len) {
>> -                       len = stats.greatest_free_size;
>> -                       *s = i;
>> +               size_t cur_len = heap_max_free_elem(idx, align);
>> +               if (cur_len > len) {
>> +                       len = cur_len;
>> +                       *s = cur_socket;
>>                  }
>>          }
>>
>> -       if (len < MALLOC_ELEM_OVERHEAD + align)
>> -               return 0;
>> -
>> -       return len - MALLOC_ELEM_OVERHEAD - align;
>> +       return len;
>>
> 
> Is it worth to set *s to some safe value if no space at all?

No, the value of *s is set externally anyway, and is not used of return 
value is 0.

Thanks for other comments, will fix when the next iteration comes.


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 04/11] malloc: add name to malloc heaps
  2018-07-06 13:17 ` [dpdk-dev] [RFC 04/11] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-07-13 16:09   ` Alejandro Lucero
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Lucero @ 2018-07-13 16:09 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, srinath.mannam, scott.branden, Ajit Khaparde

On Fri, Jul 6, 2018 at 2:17 PM, Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> We will need to refer to external heaps in some way. While we use
> heap ID's internally, for external API use it has to be something
> more user-friendly. So, we will be using a string to uniquely
> identify a heap.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
>  lib/librte_eal/common/malloc_heap.c             | 13 +++++++++++++
>  lib/librte_eal/common/rte_malloc.c              |  1 +
>  3 files changed, 16 insertions(+)
>
> diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h
> b/lib/librte_eal/common/include/rte_malloc_heap.h
> index d43fa9097..bd64dff03 100644
> --- a/lib/librte_eal/common/include/rte_malloc_heap.h
> +++ b/lib/librte_eal/common/include/rte_malloc_heap.h
> @@ -12,6 +12,7 @@
>
>  /* Number of free lists per heap, grouped by size. */
>  #define RTE_HEAP_NUM_FREELISTS  13
> +#define RTE_HEAP_NAME_MAX_LEN 32
>
>  /* dummy definition, for pointers */
>  struct malloc_elem;
> @@ -27,6 +28,7 @@ struct malloc_heap {
>
>         unsigned alloc_count;
>         size_t total_size;
> +       char name[RTE_HEAP_NAME_MAX_LEN];
>  } __rte_cache_aligned;
>
>  #endif /* _RTE_MALLOC_HEAP_H_ */
> diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/
> malloc_heap.c
> index e7e1838b1..8f22c062b 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -848,6 +848,7 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
>
>         rte_spinlock_lock(&heap->lock);
>
> +       fprintf(f, "Heap name: %s\n", heap->name);
>         fprintf(f, "Heap size: 0x%zx\n", heap->total_size);
>         fprintf(f, "Heap alloc count: %u\n", heap->alloc_count);
>
> @@ -864,6 +865,18 @@ int
>  rte_eal_malloc_heap_init(void)
>  {
>         struct rte_mem_config *mcfg = rte_eal_get_configuration()->
> mem_config;
> +       unsigned int i;
> +
> +       /* assign names to default DPDK heaps */
> +       for (i = 0; i < rte_socket_count(); i++) {
> +               struct malloc_heap *heap = &mcfg->malloc_heaps[i];
> +               char heap_name[RTE_HEAP_NAME_MAX_LEN];
> +               int socket_id = rte_socket_id_by_idx(i);
> +
> +               snprintf(heap_name, sizeof(heap_name) - 1,
> +                               "socket_%i", socket_id);
> +               strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
> +       }
>
>         if (register_mp_requests()) {
>                 RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess
> actions\n");
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 4387bc494..75d6e0b4d 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -198,6 +198,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char
> *type)
>                 malloc_heap_get_stats(heap, &sock_stats);
>
>                 fprintf(f, "Heap id:%u\n", heap_id);
> +               fprintf(f, "\tHeap name:%s\n", heap->name);
>                 fprintf(f, "\tHeap_size:%zu,\n",
> sock_stats.heap_totalsz_bytes);
>                 fprintf(f, "\tFree_size:%zu,\n",
> sock_stats.heap_freesz_bytes);
>                 fprintf(f, "\tAlloc_size:%zu,\n",
> sock_stats.heap_allocsz_bytes);
> --
> 2.17.1
>

Acked: Alejandro Lucero <alejandro.lucero@netronome.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 05/11] malloc: enable retrieving statistics from named heaps
  2018-07-06 13:17 ` [dpdk-dev] [RFC 05/11] malloc: enable retrieving statistics from named heaps Anatoly Burakov
@ 2018-07-13 16:25   ` Alejandro Lucero
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Lucero @ 2018-07-13 16:25 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, srinath.mannam, scott.branden, Ajit Khaparde

On Fri, Jul 6, 2018 at 2:17 PM, Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> Add internal functions to look up heap by name, and enable
> dumping statistics for a specified named heap.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/common/include/rte_malloc.h | 19 +++++++++++--
>  lib/librte_eal/common/malloc_heap.c        | 31 ++++++++++++++++++++++
>  lib/librte_eal/common/malloc_heap.h        |  6 +++++
>  lib/librte_eal/common/rte_malloc.c         | 17 ++++++++++++
>  lib/librte_eal/rte_eal_version.map         |  1 +
>  5 files changed, 72 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/rte_malloc.h
> b/lib/librte_eal/common/include/rte_malloc.h
> index a9fb7e452..7cbcd3184 100644
> --- a/lib/librte_eal/common/include/rte_malloc.h
> +++ b/lib/librte_eal/common/include/rte_malloc.h
> @@ -256,13 +256,28 @@ rte_malloc_validate(const void *ptr, size_t *size);
>   * @param socket_stats
>   *   A structure which provides memory to store statistics
>   * @return
> - *   Null on error
> - *   Pointer to structure storing statistics on success
> + *   0 on success
> + *   -1 on error
>   */
>  int
>  rte_malloc_get_socket_stats(int socket,
>                 struct rte_malloc_socket_stats *socket_stats);
>
> +/**
> + * Get heap statistics for the specified heap.
> + *
> + * @param socket
> + *   An unsigned integer specifying the socket to get heap statistics for
> + * @param socket_stats
> + *   A structure which provides memory to store statistics
> + * @return
> + *   0 on success
> + *   -1 on error
> + */
> +int __rte_experimental
> +rte_malloc_get_stats_from_heap(const char *heap_name,
> +               struct rte_malloc_socket_stats *socket_stats);
> +
>  /**
>   * Dump statistics.
>   *
> diff --git a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index 8f22c062b..8437d33b3 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -614,6 +614,37 @@ malloc_heap_free_pages(void *aligned_start, size_t
> aligned_len)
>         return 0;
>  }
>
> +int
> +malloc_heap_find_named_heap_idx(const char *heap_name)
> +{
> +       struct rte_mem_config *mcfg = rte_eal_get_configuration()->m
> em_config;
> +       int heap_idx;
> +
> +       if (heap_name == NULL)
> +               return -1;
> +       if (strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
> RTE_HEAP_NAME_MAX_LEN)
> +               return -1;
> +       for (heap_idx = rte_socket_count(); heap_idx < RTE_MAX_HEAPS;
>

Why not to allow to use a default DPDK heap here? They got names as well
and I think it is good for consistency.


> +                       heap_idx++) {
> +               struct malloc_heap *heap = &mcfg->malloc_heaps[heap_idx];
> +               if (strncmp(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN)
> == 0)
> +                       return heap_idx;
> +       }
> +       return -1;
> +}
> +
> +struct malloc_heap *
> +malloc_heap_find_named_heap(const char *heap_name)
> +{
> +       struct rte_mem_config *mcfg = rte_eal_get_configuration()->m
> em_config;
> +       int heap_idx;
> +
> +       heap_idx = malloc_heap_find_named_heap_idx(heap_name);
> +       if (heap_idx < 0)
> +               return NULL;
> +       return &mcfg->malloc_heaps[heap_idx];
> +}
> +
>  int
>  malloc_heap_free(struct malloc_elem *elem)
>  {
> diff --git a/lib/librte_eal/common/malloc_heap.h
> b/lib/librte_eal/common/malloc_heap.h
> index 03b801415..4f3137260 100644
> --- a/lib/librte_eal/common/malloc_heap.h
> +++ b/lib/librte_eal/common/malloc_heap.h
> @@ -29,6 +29,12 @@ void *
>  malloc_heap_alloc(const char *type, size_t size, int socket, unsigned int
> flags,
>                 size_t align, size_t bound, bool contig);
>
> +int
> +malloc_heap_find_named_heap_idx(const char *name);
> +
> +struct malloc_heap *
> +malloc_heap_find_named_heap(const char *name);
> +
>  int
>  malloc_heap_free(struct malloc_elem *elem);
>
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 75d6e0b4d..2508abdb1 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -165,6 +165,23 @@ rte_malloc_get_socket_stats(int socket,
>                         socket_stats);
>  }
>
> +/*
> + * Function to retrieve data for heap on given socket
> + */
> +int __rte_experimental
> +rte_malloc_get_stats_from_heap(const char *heap_name,
> +               struct rte_malloc_socket_stats *socket_stats)
> +{
> +       struct malloc_heap *heap;
> +
> +       heap = malloc_heap_find_named_heap(heap_name);
> +
> +       if (heap == NULL)
> +               return -1;
> +
> +       return malloc_heap_get_stats(heap, socket_stats);
> +}
> +
>  /*
>   * Function to dump contents of all heaps
>   */
> diff --git a/lib/librte_eal/rte_eal_version.map
> b/lib/librte_eal/rte_eal_version.map
> index e7fb37b2a..786df1e39 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -278,6 +278,7 @@ EXPERIMENTAL {
>         rte_fbarray_set_used;
>         rte_log_register_type_and_pick_level;
>         rte_malloc_dump_heaps;
> +       rte_malloc_get_stats_from_heap;
>         rte_mem_alloc_validator_register;
>         rte_mem_alloc_validator_unregister;
>         rte_mem_event_callback_register;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 06/11] malloc: enable allocating from named heaps
  2018-07-06 13:17 ` [dpdk-dev] [RFC 06/11] malloc: enable allocating " Anatoly Burakov
@ 2018-07-13 16:31   ` Alejandro Lucero
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Lucero @ 2018-07-13 16:31 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, srinath.mannam, scott.branden, Ajit Khaparde

On Fri, Jul 6, 2018 at 2:17 PM, Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> Add new malloc API to allocate memory from heap referenced to by
> specified name.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/common/include/rte_malloc.h | 25 ++++++++++++++++++++++
>  lib/librte_eal/common/malloc_heap.c        |  2 +-
>  lib/librte_eal/common/malloc_heap.h        |  6 ++++++
>  lib/librte_eal/common/rte_malloc.c         | 11 ++++++++++
>  lib/librte_eal/rte_eal_version.map         |  1 +
>  5 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/include/rte_malloc.h
> b/lib/librte_eal/common/include/rte_malloc.h
> index 7cbcd3184..f1bcd9b65 100644
> --- a/lib/librte_eal/common/include/rte_malloc.h
> +++ b/lib/librte_eal/common/include/rte_malloc.h
> @@ -213,6 +213,31 @@ rte_zmalloc_socket(const char *type, size_t size,
> unsigned align, int socket);
>  void *
>  rte_calloc_socket(const char *type, size_t num, size_t size, unsigned
> align, int socket);
>
> +/**
> + * This function allocates memory from a specified named heap.
> + *
> + * @param name
> + *   Name of the heap to allocate from.
> + * @param type
> + *   A string identifying the type of allocated objects (useful for debug
> + *   purposes, such as identifying the cause of a memory leak). Can be
> NULL.
> + * @param size
> + *   Size (in bytes) to be allocated.
> + * @param align
> + *   If 0, the return is a pointer that is suitably aligned for any kind
> of
> + *   variable (in the same manner as malloc()).
> + *   Otherwise, the return is a pointer that is a multiple of *align*. In
> + *   this case, it must be a power of two. (Minimum alignment is the
> + *   cacheline size, i.e. 64-bytes)
> + * @return
> + *   - NULL on error. Not enough memory, or invalid arguments (size is 0,
> + *     align is not a power of two).
> + *   - Otherwise, the pointer to the allocated object.
> + */
> +__rte_experimental void *
> +rte_malloc_from_heap(const char *heap_name, const char *type, size_t size,
> +               unsigned int align);
> +
>  /**
>   * Frees the memory space pointed to by the provided pointer.
>   *
> diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/
> malloc_heap.c
> index 8437d33b3..a33acc252 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -494,7 +494,7 @@ alloc_more_mem_on_socket(struct malloc_heap *heap,
> size_t size, int socket,
>  }
>
>  /* this will try lower page sizes first */
> -static void *
> +void *
>  malloc_heap_alloc_on_heap_id(const char *type, size_t size,
>                 unsigned int heap_id, unsigned int flags, size_t align,
>                 size_t bound, bool contig)
> diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/
> malloc_heap.h
> index 4f3137260..a7e408c99 100644
> --- a/lib/librte_eal/common/malloc_heap.h
> +++ b/lib/librte_eal/common/malloc_heap.h
> @@ -29,6 +29,12 @@ void *
>  malloc_heap_alloc(const char *type, size_t size, int socket, unsigned int
> flags,
>                 size_t align, size_t bound, bool contig);
>
> +/* allocate from specified heap id */
> +void *
> +malloc_heap_alloc_on_heap_id(const char *type, size_t size,
> +               unsigned int heap_id, unsigned int flags, size_t align,
> +               size_t bound, bool contig);
> +
>  int
>  malloc_heap_find_named_heap_idx(const char *name);
>
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 2508abdb1..215876a59 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -55,6 +55,17 @@ rte_malloc_socket(const char *type, size_t size,
> unsigned int align,
>                         align == 0 ? 1 : align, 0, false);
>  }
>
> +void *
> +rte_malloc_from_heap(const char *heap_name, const char *type, size_t size,
> +               unsigned int align)
> +{
> +       int heap_idx = malloc_heap_find_named_heap_idx(heap_name);
> +       if (heap_idx < 0)
> +               return NULL;
>

error log?


> +       return malloc_heap_alloc_on_heap_id(type, size, heap_idx, 0,
> +                       align == 0 ? 1 : align, 0, false);
> +}
> +
>  /*
>   * Allocate memory on default heap.
>   */
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_
> version.map
> index 786df1e39..716a7585d 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -278,6 +278,7 @@ EXPERIMENTAL {
>         rte_fbarray_set_used;
>         rte_log_register_type_and_pick_level;
>         rte_malloc_dump_heaps;
> +       rte_malloc_from_heap;
>         rte_malloc_get_stats_from_heap;
>         rte_mem_alloc_validator_register;
>         rte_mem_alloc_validator_unregister;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 08/11] malloc: allow adding memory to named heaps
  2018-07-06 13:17 ` [dpdk-dev] [RFC 08/11] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-07-13 17:04   ` Alejandro Lucero
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Lucero @ 2018-07-13 17:04 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, srinath.mannam, scott.branden, Ajit Khaparde

On Fri, Jul 6, 2018 at 2:17 PM, Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> Add an API to add externally allocated memory to malloc heap. The
> memory will be stored in memseg lists like regular DPDK memory.
> Multiple segments are allowed within a heap. If IOVA table is
> not provided, IOVA addresses are filled in with RTE_BAD_IOVA.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/common/include/rte_malloc.h | 44 ++++++++++++++
>  lib/librte_eal/common/malloc_heap.c        | 70 ++++++++++++++++++++++
>  lib/librte_eal/common/malloc_heap.h        |  4 ++
>  lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++
>  lib/librte_eal/rte_eal_version.map         |  1 +
>  5 files changed, 158 insertions(+)
>
> diff --git a/lib/librte_eal/common/include/rte_malloc.h
> b/lib/librte_eal/common/include/rte_malloc.h
> index fa6de073a..5f933993b 100644
> --- a/lib/librte_eal/common/include/rte_malloc.h
> +++ b/lib/librte_eal/common/include/rte_malloc.h
> @@ -274,6 +274,50 @@ rte_free(void *ptr);
>  int __rte_experimental
>  rte_malloc_heap_create(const char *heap_name);
>
> +/**
> + * Add more memory to heap with specified name.
> + *
> + * @note Concurrently adding memory to or removing memory from different
> heaps
> + *   is not safe.
> + *
> + * @note This function does not need to be called in multiple processes,
> as
> + *   multiprocess synchronization will happen automatically as far as
> heap data
> + *   is concerned. However, before accessing pointers to memory in this
> heap, it
> + *   is responsibility of the user to ensure that the heap memory is
> accessible
> + *   in all processes.
> + *
> + * @note Memory must be previously allocated for DPDK to be able to use
> it as a
> + *   malloc heap. Failing to do so will result in undefined behavior, up
> to and
> + *   including crashes.
> + *
> + * @note Adding memory to heap may fail in multiple processes scenario, as
> + *   attaching to ``rte_fbarray`` structures may not always be successful
> in
> + *   secondary processes.
> + *
> + * @param heap_name
> + *   Name of the heap to create.
>

Name of the heap to add memory to.


> + * @param va_addr
> + *   Start of virtual area to add to the heap.
> + * @param len
> + *   Length of virtual area to add to the heap.
> + * @param iova_addrs
> + *   Array of page IOVA addresses corresponding to each page in this
> memory
> + *   area. Can be NULL, in which case page IOVA addresses will be set to
> + *   RTE_BAD_IOVA.
> + * @param n_pages
> + *   Number of elements in the iova_addrs array. Must be zero of
> ``iova_addrs``
> + *   is NULL.
> + * @param page_sz
> + *   Page size of the underlying memory.
> + *
> + * @return
> + *   - 0 on successful creation.
> + *   - -1 on error.
> + */
> +int __rte_experimental
> +rte_malloc_heap_add_memory(const char *heap_name, void *va_addr, size_t
> len,
> +               rte_iova_t iova_addrs[], unsigned int n_pages, size_t
> page_sz);
> +
>  /**
>   * If malloc debug is enabled, check a memory block for header
>   * and trailer markers to indicate that all is well with the block.
> diff --git a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index f5d103626..29446cac9 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -892,6 +892,76 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
>         rte_spinlock_unlock(&heap->lock);
>  }
>
> +int
> +malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
> +               rte_iova_t iova_addrs[], unsigned int n_pages, size_t
> page_sz)
> +{
> +       struct rte_mem_config *mcfg = rte_eal_get_configuration()->m
> em_config;
> +       char fbarray_name[RTE_FBARRAY_NAME_LEN];
> +       struct rte_memseg_list *msl = NULL;
> +       struct rte_fbarray *arr;
> +       size_t seg_len = n_pages * page_sz;
> +       unsigned int i;
> +
> +       /* first, find a free memseg list */
> +       for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
> +               struct rte_memseg_list *tmp = &mcfg->memsegs[i];
> +               if (tmp->base_va == NULL) {
> +                       msl = tmp;
> +                       break;
> +               }
> +       }
> +       if (msl == NULL) {
> +               RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
> +               rte_errno = ENOSPC;
> +               return -1;
> +       }
> +
> +       snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
> +                       heap->name, va_addr);
> +
> +       /* create the backing fbarray */
> +       if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
> +                       sizeof(struct rte_memseg)) < 0) {
> +               RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the
> memseg list\n");
> +               return -1;
> +       }
> +       arr = &msl->memseg_arr;
> +
> +       /* fbarray created, fill it up */
> +       for (i = 0; i < n_pages; i++) {
> +               struct rte_memseg *ms;
> +
> +               rte_fbarray_set_used(arr, i);
> +               ms = rte_fbarray_get(arr, i);
> +               ms->addr = RTE_PTR_ADD(va_addr, n_pages * page_sz);
> +               ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA :
> iova_addrs[i];
> +               ms->hugepage_sz = page_sz;
> +               ms->len = page_sz;
> +               ms->nchannel = rte_memory_get_nchannel();
> +               ms->nrank = rte_memory_get_nrank();
> +               ms->socket_id = -1; /* invalid socket ID */
> +       }
> +
> +       /* set up the memseg list */
> +       msl->base_va = va_addr;
> +       msl->page_sz = page_sz;
> +       msl->socket_id = -1; /* invalid socket ID */
> +       msl->version = 0;
> +       msl->external = true;
> +
> +       /* now, add newly minted memory to the malloc heap */
> +       malloc_heap_add_memory(heap, msl, va_addr, seg_len);
> +
> +       heap->total_size += seg_len;
> +
> +       /* all done! */
> +       RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
> +                       heap->name, va_addr);
> +
> +       return 0;
> +}
> +
>  int
>  malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
>  {
> diff --git a/lib/librte_eal/common/malloc_heap.h
> b/lib/librte_eal/common/malloc_heap.h
> index aa819ef65..3be4656d0 100644
> --- a/lib/librte_eal/common/malloc_heap.h
> +++ b/lib/librte_eal/common/malloc_heap.h
> @@ -38,6 +38,10 @@ malloc_heap_alloc_on_heap_id(const char *type, size_t
> size,
>  int
>  malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
>
> +int
> +malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
> +               rte_iova_t iova_addrs[], unsigned int n_pages, size_t
> page_sz);
> +
>  int
>  malloc_heap_find_named_heap_idx(const char *name);
>
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index e000dc5b7..db0f604ad 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -274,6 +274,45 @@ rte_malloc_virt2iova(const void *addr)
>         return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
>  }
>
> +int
> +rte_malloc_heap_add_memory(const char *heap_name, void *va_addr, size_t
> len,
> +               rte_iova_t iova_addrs[], unsigned int n_pages, size_t
> page_sz)
> +{
> +       struct malloc_heap *heap = NULL;
> +       unsigned int n;
> +       int ret;
> +
> +       /* iova_addrs is allowed to be NULL */
> +       if (heap_name == NULL || va_addr == NULL ||
> +                       /* n_pages can be 0 if iova_addrs is NULL */
> +                       ((iova_addrs != NULL) == (n_pages == 0)) ||
> +                       page_sz == 0 || !rte_is_power_of_2(page_sz) ||
> +                       strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
> +                       strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
> +                               RTE_HEAP_NAME_MAX_LEN) {
> +               rte_errno = EINVAL;
> +               return -1;
> +       }
> +       /* find our heap */
> +       heap = malloc_heap_find_named_heap(heap_name);
> +       if (heap == NULL) {
> +               rte_errno = EINVAL;
> +               return -1;
> +       }
> +       n = len / page_sz;
> +       if (n != n_pages && iova_addrs != NULL) {
> +               rte_errno = EINVAL;
> +               return -1;
> +       }
> +
> +       rte_spinlock_lock(&heap->lock);
> +       ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs,
> n,
> +                       page_sz);
> +       rte_spinlock_unlock(&heap->lock);
> +
> +       return ret;
> +}
> +
>  int
>  rte_malloc_heap_create(const char *heap_name)
>  {
> diff --git a/lib/librte_eal/rte_eal_version.map
> b/lib/librte_eal/rte_eal_version.map
> index f3c375156..6290cc910 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -280,6 +280,7 @@ EXPERIMENTAL {
>         rte_malloc_dump_heaps;
>         rte_malloc_from_heap;
>         rte_malloc_get_stats_from_heap;
> +       rte_malloc_heap_add_memory;
>         rte_malloc_heap_create;
>         rte_mem_alloc_validator_register;
>         rte_mem_alloc_validator_unregister;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
  2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (10 preceding siblings ...)
  2018-07-06 13:17 ` [dpdk-dev] [RFC 11/11] memzone: enable reserving memory from named heaps Anatoly Burakov
@ 2018-07-13 17:10 ` Burakov, Anatoly
  2018-07-13 17:56   ` Wiles, Keith
  11 siblings, 1 reply; 25+ messages in thread
From: Burakov, Anatoly @ 2018-07-13 17:10 UTC (permalink / raw)
  To: dev
  Cc: srinath.mannam, scott.branden, ajit.khaparde, Thomas Monjalon,
	Shreyansh Jain, jerin.jacob, Keith Wiles

On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
> This is a proposal to enable using externally allocated memory
> in DPDK.
> 
> In a nutshell, here is what is being done here:
> 
> - Index malloc heaps by NUMA node index, rather than NUMA node itself
> - Add identifier string to malloc heap, to uniquely identify it
> - Allow creating named heaps and add/remove memory to/from those heaps
> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>    of externally allocated memory
>    - If IOVA addresses aren't provided, use RTE_BAD_IOVA
> - Allow malloc and memzones to allocate from named heaps
> 
> The responsibility to ensure memory is accessible before using it is
> on the shoulders of the user - there is no checking done with regards
> to validity of the memory (nor could there be...).
> 
> The following limitations are present:
> 
> - No multiprocess support
> - No thread safety
> 
> There is currently no way to allocate memory during initialization
> stage, so even if multiprocess support is added, it is not guaranteed
> to work because of underlying issues with mapping fbarrays in
> secondary processes. This is not an issue in single process scenario,
> but it may be an issue in a multiprocess scenario in case where
> primary doesn't intend to share the externally allocated memory, yet
> adding such memory could fail because some other process failed to
> attach to this shared memory when it wasn't needed.
> 
> Anatoly Burakov (11):
>    mem: allow memseg lists to be marked as external
>    eal: add function to rerieve socket index by socket ID
>    malloc: index heaps using heap ID rather than NUMA node
>    malloc: add name to malloc heaps
>    malloc: enable retrieving statistics from named heaps
>    malloc: enable allocating from named heaps
>    malloc: enable creating new malloc heaps
>    malloc: allow adding memory to named heaps
>    malloc: allow removing memory from named heaps
>    malloc: allow destroying heaps
>    memzone: enable reserving memory from named heaps
> 
>   config/common_base                            |   1 +
>   lib/librte_eal/common/eal_common_lcore.c      |  15 +
>   lib/librte_eal/common/eal_common_memory.c     |  51 +++-
>   lib/librte_eal/common/eal_common_memzone.c    | 283 ++++++++++++++----
>   .../common/include/rte_eal_memconfig.h        |   5 +-
>   lib/librte_eal/common/include/rte_lcore.h     |  19 +-
>   lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
>   .../common/include/rte_malloc_heap.h          |   2 +
>   lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
>   lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
>   lib/librte_eal/common/malloc_heap.h           |  26 ++
>   lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
>   lib/librte_eal/rte_eal_version.map            |  10 +
>   13 files changed, 1118 insertions(+), 109 deletions(-)
> 

So, now that the RFC is out, i would like to ask a general question.

One other thing that this patchset is missing, is the ability for data 
structures (e.g. hash, mempool, etc.) to be allocated from external 
heaps. Currently, we can kinda sorta do that with various _init() API's 
(initializing a data structure over already allocated memzone), but this 
is not ideal and is a hassle for anyone using external memory in DPDK.

There are basically four ways to approach this problem (that i can see).

First way is to change "socket ID" to mean "heap ID" everywhere. This 
has an upside of having a consistent API to allocate from internal and 
external heaps, with little to no API additions, only internal changes 
to account for the fact that "socket ID" is now "heap ID".

However, there is a massive downside to this approach: it is a *giant* 
API change, and it's also a giant *ABI-compatible* API change. Meaning, 
replacing socket ID with heap ID will not cause compile failures for old 
code, which would result in many subtle bugs in already existing 
codebases. So, while in the perfect world this would've been my 
preferred approach, realistically i think this is a very, very bad idea.

Second one is to add a separate "heap name" API's to everything. This 
has an upside of clean separation between allocation from internal and 
external heaps. (well, whether it's an upside is debatable...) This is 
the approach i expected to take when i was creating this patchset.

The downside is that we have to add new API's to every library and every 
DPDK data structure, to allow explicit allocation from external heaps. 
We will have to maintain both, and things like hardware drivers will 
need to have a way to indicate the need to allocate things from a 
particular external heap.

The third way is to expose the "heap ID" externally, and allow a single, 
unified API to reserve memory. That is, create an API that would map 
either a NUMA node ID or a heap name to an ID, and allow reserving 
memory through that ID regardless of whether it's internal or external 
memory. This would also allow to gradually phase out socket-based ID's 
in favor of heap ID API, should we choose to do so.

The downside for this is, it adds a layer of indirection between socket 
ID and reserving memory on a particular NUMA node, and it makes it hard 
to produce a single value of "heap ID" in such a way as to replicate 
current functionality of allocating with SOCKET_ID_ANY. Most likely user 
will have to explicitly try to allocate on all sockets, unless we keep 
old API's around in parallel.

Finally, a fourth way would be to abuse the socket ID to also mean 
something else, which is an approach i've seen numerous times already, 
and one that i don't like. We could register new heaps as a new, fake 
socket ID, and use that to address external heaps (each heap would get 
its own socket). So, keep current socket ID behavior, but for 
non-existent sockets it would be possible to be registered as a fake 
socket pointing to an external heap.

The upside for this approach would be that no API changes are required 
whatsoever to existing libraries - this scheme is compatible with both 
internal and external heaps without adding a separate API.

The downside is bad semantics - "special" sockets, handling of 
SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that 
happens to correspond to an existing external heap", and many other 
things that can be confusing. I don't like this option, but it's an 
option :)

Thoughts? Comments?

I myself still favor the second way, however there are good arguments to 
be made for each of these options.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
  2018-07-13 17:10 ` [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Burakov, Anatoly
@ 2018-07-13 17:56   ` Wiles, Keith
  2018-07-19 10:58     ` László Vadkerti
  0 siblings, 1 reply; 25+ messages in thread
From: Wiles, Keith @ 2018-07-13 17:56 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, srinath.mannam, scott.branden, ajit.khaparde,
	Thomas Monjalon, Shreyansh Jain, jerin.jacob



> On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:
> 
> On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
>> This is a proposal to enable using externally allocated memory
>> in DPDK.
>> In a nutshell, here is what is being done here:
>> - Index malloc heaps by NUMA node index, rather than NUMA node itself
>> - Add identifier string to malloc heap, to uniquely identify it
>> - Allow creating named heaps and add/remove memory to/from those heaps
>> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>>   of externally allocated memory
>>   - If IOVA addresses aren't provided, use RTE_BAD_IOVA
>> - Allow malloc and memzones to allocate from named heaps
>> The responsibility to ensure memory is accessible before using it is
>> on the shoulders of the user - there is no checking done with regards
>> to validity of the memory (nor could there be...).
>> The following limitations are present:
>> - No multiprocess support
>> - No thread safety
>> There is currently no way to allocate memory during initialization
>> stage, so even if multiprocess support is added, it is not guaranteed
>> to work because of underlying issues with mapping fbarrays in
>> secondary processes. This is not an issue in single process scenario,
>> but it may be an issue in a multiprocess scenario in case where
>> primary doesn't intend to share the externally allocated memory, yet
>> adding such memory could fail because some other process failed to
>> attach to this shared memory when it wasn't needed.
>> Anatoly Burakov (11):
>>   mem: allow memseg lists to be marked as external
>>   eal: add function to rerieve socket index by socket ID
>>   malloc: index heaps using heap ID rather than NUMA node
>>   malloc: add name to malloc heaps
>>   malloc: enable retrieving statistics from named heaps
>>   malloc: enable allocating from named heaps
>>   malloc: enable creating new malloc heaps
>>   malloc: allow adding memory to named heaps
>>   malloc: allow removing memory from named heaps
>>   malloc: allow destroying heaps
>>   memzone: enable reserving memory from named heaps
>>  config/common_base                            |   1 +
>>  lib/librte_eal/common/eal_common_lcore.c      |  15 +
>>  lib/librte_eal/common/eal_common_memory.c     |  51 +++-
>>  lib/librte_eal/common/eal_common_memzone.c    | 283 ++++++++++++++----
>>  .../common/include/rte_eal_memconfig.h        |   5 +-
>>  lib/librte_eal/common/include/rte_lcore.h     |  19 +-
>>  lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
>>  .../common/include/rte_malloc_heap.h          |   2 +
>>  lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
>>  lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
>>  lib/librte_eal/common/malloc_heap.h           |  26 ++
>>  lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
>>  lib/librte_eal/rte_eal_version.map            |  10 +
>>  13 files changed, 1118 insertions(+), 109 deletions(-)
> 
> So, now that the RFC is out, i would like to ask a general question.
> 
> One other thing that this patchset is missing, is the ability for data structures (e.g. hash, mempool, etc.) to be allocated from external heaps. Currently, we can kinda sorta do that with various _init() API's (initializing a data structure over already allocated memzone), but this is not ideal and is a hassle for anyone using external memory in DPDK.
> 
> There are basically four ways to approach this problem (that i can see).
> 
> First way is to change "socket ID" to mean "heap ID" everywhere. This has an upside of having a consistent API to allocate from internal and external heaps, with little to no API additions, only internal changes to account for the fact that "socket ID" is now "heap ID".
> 
> However, there is a massive downside to this approach: it is a *giant* API change, and it's also a giant *ABI-compatible* API change. Meaning, replacing socket ID with heap ID will not cause compile failures for old code, which would result in many subtle bugs in already existing codebases. So, while in the perfect world this would've been my preferred approach, realistically i think this is a very, very bad idea.
> 
> Second one is to add a separate "heap name" API's to everything. This has an upside of clean separation between allocation from internal and external heaps. (well, whether it's an upside is debatable...) This is the approach i expected to take when i was creating this patchset.
> 
> The downside is that we have to add new API's to every library and every DPDK data structure, to allow explicit allocation from external heaps. We will have to maintain both, and things like hardware drivers will need to have a way to indicate the need to allocate things from a particular external heap.
> 
> The third way is to expose the "heap ID" externally, and allow a single, unified API to reserve memory. That is, create an API that would map either a NUMA node ID or a heap name to an ID, and allow reserving memory through that ID regardless of whether it's internal or external memory. This would also allow to gradually phase out socket-based ID's in favor of heap ID API, should we choose to do so.
> 
> The downside for this is, it adds a layer of indirection between socket ID and reserving memory on a particular NUMA node, and it makes it hard to produce a single value of "heap ID" in such a way as to replicate current functionality of allocating with SOCKET_ID_ANY. Most likely user will have to explicitly try to allocate on all sockets, unless we keep old API's around in parallel.
> 
> Finally, a fourth way would be to abuse the socket ID to also mean something else, which is an approach i've seen numerous times already, and one that i don't like. We could register new heaps as a new, fake socket ID, and use that to address external heaps (each heap would get its own socket). So, keep current socket ID behavior, but for non-existent sockets it would be possible to be registered as a fake socket pointing to an external heap.
> 
> The upside for this approach would be that no API changes are required whatsoever to existing libraries - this scheme is compatible with both internal and external heaps without adding a separate API.
> 
> The downside is bad semantics - "special" sockets, handling of SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that happens to correspond to an existing external heap", and many other things that can be confusing. I don't like this option, but it's an option :)
> 
> Thoughts? Comments?

#1 is super clean, but very disruptive to everyone. Very Bad IMO
#2 is also clean, but adds a lot of new APIs that everyone needs to use or at least in the external heap cases.
#3 not sure I fully understand it, but reproducing heap IDs for testing is a problem and requires new/old APIs

#4 Very easy to add, IMO it is clean and very small disruption to developers. It does require the special handling, but I feel it is OK and can be explained in the docs. Having a socket id as an ‘int’ gives us a lot room e.g. id < 64K is normal socket and > 64K is external id.

My vote would be #4, as it seems the least risk and work. :-)

> 
> I myself still favor the second way, however there are good arguments to be made for each of these options.
> 
> -- 
> Thanks,
> Anatoly

Regards,
Keith


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
  2018-07-13 17:56   ` Wiles, Keith
@ 2018-07-19 10:58     ` László Vadkerti
  2018-07-26 13:48       ` Burakov, Anatoly
  0 siblings, 1 reply; 25+ messages in thread
From: László Vadkerti @ 2018-07-19 10:58 UTC (permalink / raw)
  To: Wiles, Keith, Burakov, Anatoly
  Cc: dev, srinath.mannam, scott.branden, ajit.khaparde,
	Thomas Monjalon, Shreyansh Jain, jerin.jacob

> On Jul 13, 2018, at 7:57 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
>
> > On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:
> >
> > On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
> >> This is a proposal to enable using externally allocated memory in
> >> DPDK.
> >> In a nutshell, here is what is being done here:
> >> - Index malloc heaps by NUMA node index, rather than NUMA node itself
> >> - Add identifier string to malloc heap, to uniquely identify it
> >> - Allow creating named heaps and add/remove memory to/from those
> >> heaps
> >> - Allocate memseg lists at runtime, to keep track of IOVA addresses
> >>   of externally allocated memory
> >>   - If IOVA addresses aren't provided, use RTE_BAD_IOVA
> >> - Allow malloc and memzones to allocate from named heaps The
> >> responsibility to ensure memory is accessible before using it is on
> >> the shoulders of the user - there is no checking done with regards to
> >> validity of the memory (nor could there be...).
> >> The following limitations are present:
> >> - No multiprocess support
> >> - No thread safety
> >> There is currently no way to allocate memory during initialization
> >> stage, so even if multiprocess support is added, it is not guaranteed
> >> to work because of underlying issues with mapping fbarrays in
> >> secondary processes. This is not an issue in single process scenario,
> >> but it may be an issue in a multiprocess scenario in case where
> >> primary doesn't intend to share the externally allocated memory, yet
> >> adding such memory could fail because some other process failed to
> >> attach to this shared memory when it wasn't needed.
> >> Anatoly Burakov (11):
> >>   mem: allow memseg lists to be marked as external
> >>   eal: add function to rerieve socket index by socket ID
> >>   malloc: index heaps using heap ID rather than NUMA node
> >>   malloc: add name to malloc heaps
> >>   malloc: enable retrieving statistics from named heaps
> >>   malloc: enable allocating from named heaps
> >>   malloc: enable creating new malloc heaps
> >>   malloc: allow adding memory to named heaps
> >>   malloc: allow removing memory from named heaps
> >>   malloc: allow destroying heaps
> >>   memzone: enable reserving memory from named heaps
> >>  config/common_base                            |   1 +
> >>  lib/librte_eal/common/eal_common_lcore.c      |  15 +
> >>  lib/librte_eal/common/eal_common_memory.c     |  51 +++-
> >>  lib/librte_eal/common/eal_common_memzone.c    | 283
> ++++++++++++++----
> >>  .../common/include/rte_eal_memconfig.h        |   5 +-
> >>  lib/librte_eal/common/include/rte_lcore.h     |  19 +-
> >>  lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
> >>  .../common/include/rte_malloc_heap.h          |   2 +
> >>  lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
> >>  lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
> >>  lib/librte_eal/common/malloc_heap.h           |  26 ++
> >>  lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
> >>  lib/librte_eal/rte_eal_version.map            |  10 +
> >>  13 files changed, 1118 insertions(+), 109 deletions(-)
> >
> > So, now that the RFC is out, i would like to ask a general question.
> >
> > One other thing that this patchset is missing, is the ability for data
> structures (e.g. hash, mempool, etc.) to be allocated from external heaps.
> Currently, we can kinda sorta do that with various _init() API's (initializing a
> data structure over already allocated memzone), but this is not ideal and is a
> hassle for anyone using external memory in DPDK.
> >
> > There are basically four ways to approach this problem (that i can see).
> >
> > First way is to change "socket ID" to mean "heap ID" everywhere. This has
> an upside of having a consistent API to allocate from internal and external
> heaps, with little to no API additions, only internal changes to account for the
> fact that "socket ID" is now "heap ID".
> >
> > However, there is a massive downside to this approach: it is a *giant* API
> change, and it's also a giant *ABI-compatible* API change. Meaning,
> replacing socket ID with heap ID will not cause compile failures for old code,
> which would result in many subtle bugs in already existing codebases. So,
> while in the perfect world this would've been my preferred approach,
> realistically i think this is a very, very bad idea.
> >
> > Second one is to add a separate "heap name" API's to everything. This has
> an upside of clean separation between allocation from internal and external
> heaps. (well, whether it's an upside is debatable...) This is the approach i
> expected to take when i was creating this patchset.
> >
> > The downside is that we have to add new API's to every library and every
> DPDK data structure, to allow explicit allocation from external heaps. We will
> have to maintain both, and things like hardware drivers will need to have a
> way to indicate the need to allocate things from a particular external heap.
> >
> > The third way is to expose the "heap ID" externally, and allow a single,
> unified API to reserve memory. That is, create an API that would map either
> a NUMA node ID or a heap name to an ID, and allow reserving memory
> through that ID regardless of whether it's internal or external memory. This
> would also allow to gradually phase out socket-based ID's in favor of heap ID
> API, should we choose to do so.
> >
> > The downside for this is, it adds a layer of indirection between socket ID
> and reserving memory on a particular NUMA node, and it makes it hard to
> produce a single value of "heap ID" in such a way as to replicate current
> functionality of allocating with SOCKET_ID_ANY. Most likely user will have to
> explicitly try to allocate on all sockets, unless we keep old API's around in
> parallel.
> >
> > Finally, a fourth way would be to abuse the socket ID to also mean
> something else, which is an approach i've seen numerous times already, and
> one that i don't like. We could register new heaps as a new, fake socket ID,
> and use that to address external heaps (each heap would get its own
> socket). So, keep current socket ID behavior, but for non-existent sockets it
> would be possible to be registered as a fake socket pointing to an external
> heap.
> >
> > The upside for this approach would be that no API changes are required
> whatsoever to existing libraries - this scheme is compatible with both internal
> and external heaps without adding a separate API.
> >
> > The downside is bad semantics - "special" sockets, handling of
> > SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that
> > happens to correspond to an existing external heap", and many other
> > things that can be confusing. I don't like this option, but it's an
> > option :)
> >
> > Thoughts? Comments?
> 
> #1 is super clean, but very disruptive to everyone. Very Bad IMO
> #2 is also clean, but adds a lot of new APIs that everyone needs to use or at
> least in the external heap cases.
> #3 not sure I fully understand it, but reproducing heap IDs for testing is a
> problem and requires new/old APIs
> 
> #4 Very easy to add, IMO it is clean and very small disruption to developers.
> It does require the special handling, but I feel it is OK and can be explained in
> the docs. Having a socket id as an ‘int’ gives us a lot room e.g. id < 64K is
> normal socket and > 64K is external id.
> 
> My vote would be #4, as it seems the least risk and work. :-)
> 
We are living with #4 (overloaded socket_ids) since ~5 years now but it indeed generates some confusion and it is a kind of hack so it may not be the best choice going forward in official releases but for sure is the easiest/simplest solution requiring the least modifications.
Using an overloaded socket_id is especially disturbing in the dump memory config printout where the user will see multiple socket ids on a single socket system or more than the available real number of sockets, however it could still be explained in the notes and the documentation.
The allocation behavior with SOCKET_ID_ANY is also a question as I think it shouldn’t roll over to allocate memory in the external heap, we especially disabled this feature in our implementation. The reason behind is that the external memory may be a limited resource where only explicit allocation requests would be allowed and also in a multi-process environment we may not want all external heaps to be mapped into all other processes address space meaning that not all heaps are accessible from every process (I’m not sure if it is planned to be supported though but it would be an important and useful feature based on our experiences).
Anyway I think the confusion with this option comes due to the misleading “socket_id” name which would not really mean socket id anymore. So we should probably just document it as pseudo socket_id  and problem solved with #4 :)

The cleanest solution in my opinion would be #1 which could be combined with #4 so that the physical socket_id could be directly passed as the heap_id (or rather call it allocation id or just location?) so that backward compatibility could also be kept.
Meaning to apply #1, change “socket_id” to “heap_id” (or “alloc_id”?) in all functions which are today expecting the socket_id to indicate the location of the allocations but keep a direct mapping from socket_id to heap_id, e.g. as Keith suggested lower range of heap_id would be equivalent to the socket_id and upper range would be the external id, this way even existing applications would still work without changing the code just by passing the socket_id in the heap_id parameter. However it is a question what would happen with the socket_id stored in data structures such as struct rte_mempool where socket_id is stored with a meaning “Socket id passed at create.”
SOCKET_ID_ANY would only mean lower range of heap_ids (physical socket ids) but not the external heap and if needed a new HEAP_ID_ANY could be introduced.

If changing heap_id to socket_id in existing functions is a big issue then one option would be to keep the original API and introduce new equivalent functions allowing to use the heap_id instead of the socket_id, e.g. rte_mempool_create would have an equivalent function to use with the heap_id instead of the socket_id.
Socket_id could then be converted to heap_id with a new function which should always be possible and can still use to direct mapping approach with lower/upper range convention.
The socket_id based functions would then just be wrappers calling the heap_id equivalent function after converting the socket_id to heap_id.
Using socket_id to indicate the location could still be relevant so the old socket_id based functions may not even need to be deprecated unless it would become hard to maintain.

Irrespective of the chosen option, external heaps should be registered/identified by name and there could be a function to fetch/lookup the id (heap_id or pseudo socket_id) by registered heap name which then could be used in the related API calls.

It would also be another work item to update all the data structures which are storing the socket_id to use it as the location identifier and I think few of them may need to store both the real physical socket_id and the heap_id, e.g. in struct lcore_config where the user may want to know the real physical socket id but want to set specific heap_id as the default allocation location for the given lcore.

> >
> > I myself still favor the second way, however there are good arguments to
> be made for each of these options.
> >
> > --
> > Thanks,
> > Anatoly
> 
> Regards,
> Keith

Thanks,
 Laszlo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
  2018-07-19 10:58     ` László Vadkerti
@ 2018-07-26 13:48       ` Burakov, Anatoly
  0 siblings, 0 replies; 25+ messages in thread
From: Burakov, Anatoly @ 2018-07-26 13:48 UTC (permalink / raw)
  To: László Vadkerti, Wiles, Keith
  Cc: dev, srinath.mannam, scott.branden, ajit.khaparde,
	Thomas Monjalon, Shreyansh Jain, jerin.jacob

On 19-Jul-18 11:58 AM, László Vadkerti wrote:
>> On Jul 13, 2018, at 7:57 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
>>
>>> On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:
>>>
>>> On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
>>>> This is a proposal to enable using externally allocated memory in
>>>> DPDK.
>>>> In a nutshell, here is what is being done here:
>>>> - Index malloc heaps by NUMA node index, rather than NUMA node itself
>>>> - Add identifier string to malloc heap, to uniquely identify it
>>>> - Allow creating named heaps and add/remove memory to/from those
>>>> heaps
>>>> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>>>>    of externally allocated memory
>>>>    - If IOVA addresses aren't provided, use RTE_BAD_IOVA
>>>> - Allow malloc and memzones to allocate from named heaps The
>>>> responsibility to ensure memory is accessible before using it is on
>>>> the shoulders of the user - there is no checking done with regards to
>>>> validity of the memory (nor could there be...).
>>>> The following limitations are present:
>>>> - No multiprocess support
>>>> - No thread safety
>>>> There is currently no way to allocate memory during initialization
>>>> stage, so even if multiprocess support is added, it is not guaranteed
>>>> to work because of underlying issues with mapping fbarrays in
>>>> secondary processes. This is not an issue in single process scenario,
>>>> but it may be an issue in a multiprocess scenario in case where
>>>> primary doesn't intend to share the externally allocated memory, yet
>>>> adding such memory could fail because some other process failed to
>>>> attach to this shared memory when it wasn't needed.
>>>> Anatoly Burakov (11):
>>>>    mem: allow memseg lists to be marked as external
>>>>    eal: add function to rerieve socket index by socket ID
>>>>    malloc: index heaps using heap ID rather than NUMA node
>>>>    malloc: add name to malloc heaps
>>>>    malloc: enable retrieving statistics from named heaps
>>>>    malloc: enable allocating from named heaps
>>>>    malloc: enable creating new malloc heaps
>>>>    malloc: allow adding memory to named heaps
>>>>    malloc: allow removing memory from named heaps
>>>>    malloc: allow destroying heaps
>>>>    memzone: enable reserving memory from named heaps
>>>>   config/common_base                            |   1 +
>>>>   lib/librte_eal/common/eal_common_lcore.c      |  15 +
>>>>   lib/librte_eal/common/eal_common_memory.c     |  51 +++-
>>>>   lib/librte_eal/common/eal_common_memzone.c    | 283
>> ++++++++++++++----
>>>>   .../common/include/rte_eal_memconfig.h        |   5 +-
>>>>   lib/librte_eal/common/include/rte_lcore.h     |  19 +-
>>>>   lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
>>>>   .../common/include/rte_malloc_heap.h          |   2 +
>>>>   lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
>>>>   lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
>>>>   lib/librte_eal/common/malloc_heap.h           |  26 ++
>>>>   lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
>>>>   lib/librte_eal/rte_eal_version.map            |  10 +
>>>>   13 files changed, 1118 insertions(+), 109 deletions(-)
>>>
>>> So, now that the RFC is out, i would like to ask a general question.
>>>
>>> One other thing that this patchset is missing, is the ability for data
>> structures (e.g. hash, mempool, etc.) to be allocated from external heaps.
>> Currently, we can kinda sorta do that with various _init() API's (initializing a
>> data structure over already allocated memzone), but this is not ideal and is a
>> hassle for anyone using external memory in DPDK.
>>>
>>> There are basically four ways to approach this problem (that i can see).
>>>
>>> First way is to change "socket ID" to mean "heap ID" everywhere. This has
>> an upside of having a consistent API to allocate from internal and external
>> heaps, with little to no API additions, only internal changes to account for the
>> fact that "socket ID" is now "heap ID".
>>>
>>> However, there is a massive downside to this approach: it is a *giant* API
>> change, and it's also a giant *ABI-compatible* API change. Meaning,
>> replacing socket ID with heap ID will not cause compile failures for old code,
>> which would result in many subtle bugs in already existing codebases. So,
>> while in the perfect world this would've been my preferred approach,
>> realistically i think this is a very, very bad idea.
>>>
>>> Second one is to add a separate "heap name" API's to everything. This has
>> an upside of clean separation between allocation from internal and external
>> heaps. (well, whether it's an upside is debatable...) This is the approach i
>> expected to take when i was creating this patchset.
>>>
>>> The downside is that we have to add new API's to every library and every
>> DPDK data structure, to allow explicit allocation from external heaps. We will
>> have to maintain both, and things like hardware drivers will need to have a
>> way to indicate the need to allocate things from a particular external heap.
>>>
>>> The third way is to expose the "heap ID" externally, and allow a single,
>> unified API to reserve memory. That is, create an API that would map either
>> a NUMA node ID or a heap name to an ID, and allow reserving memory
>> through that ID regardless of whether it's internal or external memory. This
>> would also allow to gradually phase out socket-based ID's in favor of heap ID
>> API, should we choose to do so.
>>>
>>> The downside for this is, it adds a layer of indirection between socket ID
>> and reserving memory on a particular NUMA node, and it makes it hard to
>> produce a single value of "heap ID" in such a way as to replicate current
>> functionality of allocating with SOCKET_ID_ANY. Most likely user will have to
>> explicitly try to allocate on all sockets, unless we keep old API's around in
>> parallel.
>>>
>>> Finally, a fourth way would be to abuse the socket ID to also mean
>> something else, which is an approach i've seen numerous times already, and
>> one that i don't like. We could register new heaps as a new, fake socket ID,
>> and use that to address external heaps (each heap would get its own
>> socket). So, keep current socket ID behavior, but for non-existent sockets it
>> would be possible to be registered as a fake socket pointing to an external
>> heap.
>>>
>>> The upside for this approach would be that no API changes are required
>> whatsoever to existing libraries - this scheme is compatible with both internal
>> and external heaps without adding a separate API.
>>>
>>> The downside is bad semantics - "special" sockets, handling of
>>> SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that
>>> happens to correspond to an existing external heap", and many other
>>> things that can be confusing. I don't like this option, but it's an
>>> option :)
>>>
>>> Thoughts? Comments?
>>
>> #1 is super clean, but very disruptive to everyone. Very Bad IMO
>> #2 is also clean, but adds a lot of new APIs that everyone needs to use or at
>> least in the external heap cases.
>> #3 not sure I fully understand it, but reproducing heap IDs for testing is a
>> problem and requires new/old APIs
>>
>> #4 Very easy to add, IMO it is clean and very small disruption to developers.
>> It does require the special handling, but I feel it is OK and can be explained in
>> the docs. Having a socket id as an ‘int’ gives us a lot room e.g. id < 64K is
>> normal socket and > 64K is external id.
>>
>> My vote would be #4, as it seems the least risk and work. :-)
>>
> We are living with #4 (overloaded socket_ids) since ~5 years now but it indeed generates some confusion and it is a kind of hack so it may not be the best choice going forward in official releases but for sure is the easiest/simplest solution requiring the least modifications.
> Using an overloaded socket_id is especially disturbing in the dump memory config printout where the user will see multiple socket ids on a single socket system or more than the available real number of sockets, however it could still be explained in the notes and the documentation.
> The allocation behavior with SOCKET_ID_ANY is also a question as I think it shouldn’t roll over to allocate memory in the external heap, we especially disabled this feature in our implementation. The reason behind is that the external memory may be a limited resource where only explicit allocation requests would be allowed and also in a multi-process environment we may not want all external heaps to be mapped into all other processes address space meaning that not all heaps are accessible from every process (I’m not sure if it is planned to be supported though but it would be an important and useful feature based on our experiences).

Hi Laszlo,

That depends on what you mean by "all other processes". If they are all 
part of the primary-secondary process prefix, then my plan is to enable 
private and shared heaps - i.e. a heap is either available to a single 
process, or it is available to some or all processes within a prefix.

It is also not possible to share the same area with different process 
prefixes (i.e. between two different primaries) because each of the 
processes will think it owns the entire memory and will do with it as it 
pleases. Using the same memory region with two different process 
prefixes will break many assumptions heap has - for example, it relies 
on a per-heap lock to control access to the heap, and that will not work 
if you map the same memory area into multiple primary processes. I do 
not foresee a mechanism to fix this problem within DPDK, but obviously 
if you have any suggestions, they will be considered :)

The reason we have to care about private vs. shared heaps is because of 
how DPDK handles memory management. In order for DPDK facilities such as 
rte_mem_virt2iova() or rte_memseg_walk() to work, we need to keep track 
of the pages we use for the heap - i.e. from DPDK's point of view, 
external memory behaves just like regular memory and is tracked using 
the same method of keeping page tables around (see rte_memseg_list).

These page tables need to be shared between all processes that use a 
specific heap. This introduces an inherent point of failure - you may 
mmap() the *area itself* successfully at the same address, but you may 
still fail to *attach to the page tables*, which will cause a particular 
heap to not be available in a process. This is a problem that i do not 
see a solution for at the moment, and it is something that users 
attempting to use external memory in secondary processes will have to 
deal with.

I haven't yet decided whether this should be automatic (i.e. shared 
heaps "automagically" appearing in all processes) or manual (make the 
user explicitly attach to an externally allocated heap in each process 
within the prefix). I would tend to go for the latter as it gives the 
user more control, and it is easier to implement because there's no need 
to engage IPC to make this work.

> Anyway I think the confusion with this option comes due to the misleading “socket_id” name which would not really mean socket id anymore. So we should probably just document it as pseudo socket_id  and problem solved with #4 :)
> 
> The cleanest solution in my opinion would be #1 which could be combined with #4 so that the physical socket_id could be directly passed as the heap_id (or rather call it allocation id or just location?) so that backward compatibility could also be kept.
> Meaning to apply #1, change “socket_id” to “heap_id” (or “alloc_id”?) in all functions which are today expecting the socket_id to indicate the location of the allocations but keep a direct mapping from socket_id to heap_id, e.g. as Keith suggested lower range of heap_id would be equivalent to the socket_id and upper range would be the external id, this way even existing applications would still work without changing the code just by passing the socket_id in the heap_id parameter. However it is a question what would happen with the socket_id stored in data structures such as struct rte_mempool where socket_id is stored with a meaning “Socket id passed at create.”
> SOCKET_ID_ANY would only mean lower range of heap_ids (physical socket ids) but not the external heap and if needed a new HEAP_ID_ANY could be introduced.
> 
> If changing heap_id to socket_id in existing functions is a big issue then one option would be to keep the original API and introduce new equivalent functions allowing to use the heap_id instead of the socket_id, e.g. rte_mempool_create would have an equivalent function to use with the heap_id instead of the socket_id.
> Socket_id could then be converted to heap_id with a new function which should always be possible and can still use to direct mapping approach with lower/upper range convention.
> The socket_id based functions would then just be wrappers calling the heap_id equivalent function after converting the socket_id to heap_id.
> Using socket_id to indicate the location could still be relevant so the old socket_id based functions may not even need to be deprecated unless it would become hard to maintain.
> 
> Irrespective of the chosen option, external heaps should be registered/identified by name and there could be a function to fetch/lookup the id (heap_id or pseudo socket_id) by registered heap name which then could be used in the related API calls.

So, in other words, the consesus seems to be that we need to stay with 
the old socket_id and just use weird socket ID's for external heaps. 
Okay, so be it. Less work for me implementing it :)

> 
> It would also be another work item to update all the data structures which are storing the socket_id to use it as the location identifier and I think few of them may need to store both the real physical socket_id and the heap_id, e.g. in struct lcore_config where the user may want to know the real physical socket id but want to set specific heap_id as the default allocation location for the given lcore.

I do not see physical socket ID of externally allocated memory as a 
matter of concern for DPDK. I think this information should be up to the 
user application to handle, not DPDK. From my point of view, we 
shouldn't care where the memory came from, we just facilitate using it. 
If the user chooses to store additional metadata about the memory 
somewhere else - that is his prerogative, but i don't think having a 
provision for "physical socket ID" etc for external heaps should be in DPDK.

> 
>>>
>>> I myself still favor the second way, however there are good arguments to
>> be made for each of these options.
>>>
>>> --
>>> Thanks,
>>> Anatoly
>>
>> Regards,
>> Keith
> 
> Thanks,
>   Laszlo
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-07-26 13:48 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-06 13:17 [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
2018-07-06 13:17 ` [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-07-10 11:18   ` Alejandro Lucero
2018-07-10 11:31     ` Burakov, Anatoly
2018-07-06 13:17 ` [dpdk-dev] [RFC 02/11] eal: add function to rerieve socket index by socket ID Anatoly Burakov
2018-07-10 13:03   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 03/11] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-07-13 16:05   ` Alejandro Lucero
2018-07-13 16:08     ` Burakov, Anatoly
2018-07-06 13:17 ` [dpdk-dev] [RFC 04/11] malloc: add name to malloc heaps Anatoly Burakov
2018-07-13 16:09   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 05/11] malloc: enable retrieving statistics from named heaps Anatoly Burakov
2018-07-13 16:25   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 06/11] malloc: enable allocating " Anatoly Burakov
2018-07-13 16:31   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 07/11] malloc: enable creating new malloc heaps Anatoly Burakov
2018-07-06 13:17 ` [dpdk-dev] [RFC 08/11] malloc: allow adding memory to named heaps Anatoly Burakov
2018-07-13 17:04   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 09/11] malloc: allow removing memory from " Anatoly Burakov
2018-07-06 13:17 ` [dpdk-dev] [RFC 10/11] malloc: allow destroying heaps Anatoly Burakov
2018-07-06 13:17 ` [dpdk-dev] [RFC 11/11] memzone: enable reserving memory from named heaps Anatoly Burakov
2018-07-13 17:10 ` [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Burakov, Anatoly
2018-07-13 17:56   ` Wiles, Keith
2018-07-19 10:58     ` László Vadkerti
2018-07-26 13:48       ` Burakov, Anatoly

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).